kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/12] iommufd: Add nesting infrastructure
@ 2023-03-09  8:08 Yi Liu
  2023-03-09  8:08 ` [PATCH 01/12] iommu: Add new iommu op to create domains owned by userspace Yi Liu
                   ` (12 more replies)
  0 siblings, 13 replies; 49+ messages in thread
From: Yi Liu @ 2023-03-09  8:08 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest

Nested translation is a hardware feature that is supported by many modern
IOMMU hardwares. It has two stages (stage-1, stage-2) address translation
to get access to the physical address. stage-1 translation table is owned
by userspace (e.g. by a guest OS), while stage-2 is owned by kernel. Changes
to stage-1 translation table should be followed by an IOTLB invalidation.

Take Intel VT-d as an example, the stage-1 translation table is I/O page
table. As the below diagram shows, guest I/O page table pointer in GPA
(guest physical address) is passed to host and be used to perform the stage-1
address translation. Along with it, modifications to present mappings in the
guest I/O page table should be followed with an IOTLB invalidation.

    .-------------.  .---------------------------.
    |   vIOMMU    |  | Guest I/O page table      |
    |             |  '---------------------------'
    .----------------/
    | PASID Entry |--- PASID cache flush --+
    '-------------'                        |
    |             |                        V
    |             |           I/O page table pointer in GPA
    '-------------'
Guest
------| Shadow |--------------------------|--------
      v        v                          v
Host
    .-------------.  .------------------------.
    |   pIOMMU    |  |  FS for GIOVA->GPA     |
    |             |  '------------------------'
    .----------------/  |
    | PASID Entry |     V (Nested xlate)
    '----------------\.----------------------------------.
    |             |   | SS for GPA->HPA, unmanaged domain|
    |             |   '----------------------------------'
    '-------------'
Where:
 - FS = First stage page tables
 - SS = Second stage page tables
<Intel VT-d Nested translation>

In IOMMUFD, all the translation tables are tracked by hw_pagetable (hwpt)
and each has an iommu_domain allocated from iommu driver. So in this series
hw_pagetable and iommu_domain means the same thing if no special note.
IOMMUFD has already supported allocating hw_pagetable that is linked with
an IOAS. However, nesting requires IOMMUFD to allow allocating hw_pagetable
with driver specific parameters and interface to sync stage-1 IOTLB as user
owns the stage-1 translation table.

This series is based on the iommu hw info reporting series [1]. It first
introduces new iommu op for allocating domains with user data and the op
for syncing stage-1 IOTLB, and then extend the IOMMUFD internal infrastructure
to accept user_data and parent hwpt, then relay the data to iommu core to
allocate iommu_domain. After it, extend the ioctl IOMMU_HWPT_ALLOC to accept
user data and stage-2 hwpt ID to allocate hwpt. Along with it, ioctl
IOMMU_HWPT_INVALIDATE is added to invalidate stage-1 IOTLB. This is needed
for user-managed hwpts. ioctl IOMMU_DEVICE_GET_HW_INFO is extended to report
the supported hwpt types bitmap to user. Selftest is added as well to cover
the new ioctls.

Complete code can be found in [2], QEMU could can be found in [3].

At last, this is a team work together with Nicolin Chen, Lu Baolu. Thanks
them for the help. ^_^. Look forward to your feedbacks.

base-commit: 3dfe670c94c7fc4af42e5c08cdd8a110b594e18e

[1] https://lore.kernel.org/linux-iommu/20230309075358.571567-1-yi.l.liu@intel.com/
[2] https://github.com/yiliu1765/iommufd/tree/iommufd_nesting
[3] https://github.com/yiliu1765/qemu/tree/wip/iommufd_rfcv3%2Bnesting

Thanks,
	Yi Liu

Lu Baolu (2):
  iommu: Add new iommu op to create domains owned by userspace
  iommu: Add nested domain support

Nicolin Chen (5):
  iommufd/hw_pagetable: Do not populate user-managed hw_pagetables
  iommufd/selftest: Add domain_alloc_user() support in iommu mock
  iommufd/selftest: Add coverage for IOMMU_HWPT_ALLOC with user data
  iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op
  iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl

Yi Liu (5):
  iommufd/hw_pagetable: Use domain_alloc_user op for domain allocation
  iommufd: Pass parent hwpt and user_data to
    iommufd_hw_pagetable_alloc()
  iommufd: IOMMU_HWPT_ALLOC allocation with user data
  iommufd: Add IOMMU_HWPT_INVALIDATE
  iommufd/device: Report supported hwpt_types

 drivers/iommu/iommufd/device.c                |   9 +-
 drivers/iommu/iommufd/hw_pagetable.c          | 242 +++++++++++++++++-
 drivers/iommu/iommufd/iommufd_private.h       |  16 +-
 drivers/iommu/iommufd/iommufd_test.h          |  30 +++
 drivers/iommu/iommufd/main.c                  |   7 +-
 drivers/iommu/iommufd/selftest.c              | 104 +++++++-
 include/linux/iommu.h                         |  11 +
 include/uapi/linux/iommufd.h                  |  65 +++++
 tools/testing/selftests/iommu/iommufd.c       | 126 ++++++++-
 tools/testing/selftests/iommu/iommufd_utils.h |  71 +++++
 10 files changed, 654 insertions(+), 27 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 01/12] iommu: Add new iommu op to create domains owned by userspace
  2023-03-09  8:08 [PATCH 00/12] iommufd: Add nesting infrastructure Yi Liu
@ 2023-03-09  8:08 ` Yi Liu
  2023-03-10  0:56   ` Jason Gunthorpe
  2023-03-09  8:09 ` [PATCH 02/12] iommu: Add nested domain support Yi Liu
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 49+ messages in thread
From: Yi Liu @ 2023-03-09  8:08 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest

From: Lu Baolu <baolu.lu@linux.intel.com>

Introduce a new iommu_domain op to create domains owned by userspace,
e.g. through iommufd. These domains have a few different properties
compares to kernel owned domains:

 - They may be UNMANAGED domains, but created with special parameters.
   For instance aperture size changes/number of levels, different
   IOPTE formats, or other things necessary to make a vIOMMU work

 - We have to track all the memory allocations with GFP_KERNEL_ACCOUNT
   to make the cgroup sandbox stronger

 - Device-specialty domains, such as NESTED domains can be created by
   iommufd.

The new op clearly says the domain is being created by IOMMUFD, that
the domain is intended for userspace use, and it provides a way to pass
a driver specific uAPI structure to customize the created domain to
exactly what the vIOMMU userspace driver requires.

iommu drivers that cannot support VFIO/IOMMUFD should not support this
op. This includes any driver that cannot provide a fully functional
UNMANAGED domain.

This op chooses to make the special parameters opaque to the core. This
suits the current usage model where accessing any of the IOMMU device
special parameters does require a userspace driver that matches the
kernel driver. If a need for common parameters, implemented similarly
by several drivers, arises then there is room in the design to grow a
generic parameter set as well.

This new op for now is only supposed to be used by iommufd, hence no
wrapper for it. iommufd would call the callback directly. As for domain
free, iommufd would use iommu_domain_free().

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 include/linux/iommu.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 3ef84ee359d2..a269bc62a31c 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -229,6 +229,7 @@ struct iommu_iotlb_gather {
  *           after use. Return the data buffer if success, or ERR_PTR on
  *           failure.
  * @domain_alloc: allocate iommu domain
+ * @domain_alloc_user: allocate user iommu domain
  * @probe_device: Add device to iommu driver handling
  * @release_device: Remove device from iommu driver handling
  * @probe_finalize: Do final setup work after the device is added to an IOMMU
@@ -266,6 +267,9 @@ struct iommu_ops {
 
 	/* Domain allocation and freeing by the iommu driver */
 	struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type);
+	struct iommu_domain *(*domain_alloc_user)(struct device *dev,
+						  struct iommu_domain *parent,
+						  const void *user_data);
 
 	struct iommu_device *(*probe_device)(struct device *dev);
 	void (*release_device)(struct device *dev);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 02/12] iommu: Add nested domain support
  2023-03-09  8:08 [PATCH 00/12] iommufd: Add nesting infrastructure Yi Liu
  2023-03-09  8:08 ` [PATCH 01/12] iommu: Add new iommu op to create domains owned by userspace Yi Liu
@ 2023-03-09  8:09 ` Yi Liu
  2023-03-17 10:25   ` Tian, Kevin
  2023-03-09  8:09 ` [PATCH 03/12] iommufd/hw_pagetable: Use domain_alloc_user op for domain allocation Yi Liu
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 49+ messages in thread
From: Yi Liu @ 2023-03-09  8:09 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest

From: Lu Baolu <baolu.lu@linux.intel.com>

Introduce a new domain type for a user space I/O address, which is nested
on top of another user space address represented by a UNMANAGED domain. The
mappings of a nested domain are managed by user space software, therefore
it's unnecessary to have map/unmap callbacks. But the updates of the PTEs
in the nested domain page table must be propagated to the caches on both
IOMMU (IOTLB) and devices (DevTLB).

The nested domain is allocated by the domain_alloc_user op, and attached
to the device through the existing iommu_attach_device/group() interfaces.

A new domain op, named cache_invalidate_user is added for the userspace to
flush the hardware caches for a nested domain through iommufd. No wrapper
for it, as it's only supposed to be used by iommufd.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 include/linux/iommu.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index a269bc62a31c..080278c8154d 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -67,6 +67,9 @@ struct iommu_domain_geometry {
 
 #define __IOMMU_DOMAIN_SVA	(1U << 4)  /* Shared process address space */
 
+#define __IOMMU_DOMAIN_NESTED	(1U << 5)  /* User-managed IOVA nested on
+					      a stage-2 translation        */
+
 /*
  * This are the possible domain-types
  *
@@ -92,6 +95,7 @@ struct iommu_domain_geometry {
 				 __IOMMU_DOMAIN_DMA_API |	\
 				 __IOMMU_DOMAIN_DMA_FQ)
 #define IOMMU_DOMAIN_SVA	(__IOMMU_DOMAIN_SVA)
+#define IOMMU_DOMAIN_NESTED	(__IOMMU_DOMAIN_NESTED)
 
 struct iommu_domain {
 	unsigned type;
@@ -325,6 +329,7 @@ struct iommu_ops {
  * @iotlb_sync_map: Sync mappings created recently using @map to the hardware
  * @iotlb_sync: Flush all queued ranges from the hardware TLBs and empty flush
  *            queue
+ * @cache_invalidate_user: Flush hardware TLBs caching user space IO mappings
  * @iova_to_phys: translate iova to physical address
  * @enforce_cache_coherency: Prevent any kind of DMA from bypassing IOMMU_CACHE,
  *                           including no-snoop TLPs on PCIe or other platform
@@ -354,6 +359,8 @@ struct iommu_domain_ops {
 			       size_t size);
 	void (*iotlb_sync)(struct iommu_domain *domain,
 			   struct iommu_iotlb_gather *iotlb_gather);
+	void (*cache_invalidate_user)(struct iommu_domain *domain,
+				      void *user_data);
 
 	phys_addr_t (*iova_to_phys)(struct iommu_domain *domain,
 				    dma_addr_t iova);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 03/12] iommufd/hw_pagetable: Use domain_alloc_user op for domain allocation
  2023-03-09  8:08 [PATCH 00/12] iommufd: Add nesting infrastructure Yi Liu
  2023-03-09  8:08 ` [PATCH 01/12] iommu: Add new iommu op to create domains owned by userspace Yi Liu
  2023-03-09  8:09 ` [PATCH 02/12] iommu: Add nested domain support Yi Liu
@ 2023-03-09  8:09 ` Yi Liu
  2023-03-10  1:17   ` Baolu Lu
  2023-03-09  8:09 ` [PATCH 04/12] iommufd: Pass parent hwpt and user_data to iommufd_hw_pagetable_alloc() Yi Liu
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 49+ messages in thread
From: Yi Liu @ 2023-03-09  8:09 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest

This converts IOMMUFD to use iommu_domain_alloc_user() for iommu_domain
creation.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/hw_pagetable.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 13bdab4c801b..84b4a11e62f8 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -5,6 +5,7 @@
 #include <linux/iommu.h>
 #include <uapi/linux/iommufd.h>
 
+#include "../iommu-priv.h"
 #include "iommufd_private.h"
 
 void iommufd_hw_pagetable_destroy(struct iommufd_object *obj)
@@ -55,6 +56,7 @@ struct iommufd_hw_pagetable *
 iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 			   struct iommufd_device *idev, bool immediate_attach)
 {
+	const struct iommu_ops *ops = dev_iommu_ops(idev->dev);
 	struct iommufd_hw_pagetable *hwpt;
 	int rc;
 
@@ -69,7 +71,10 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 	refcount_inc(&ioas->obj.users);
 	hwpt->ioas = ioas;
 
-	hwpt->domain = iommu_domain_alloc(idev->dev->bus);
+	if (ops->domain_alloc_user)
+		hwpt->domain = ops->domain_alloc_user(idev->dev, NULL, NULL);
+	else
+		hwpt->domain = iommu_domain_alloc(idev->dev->bus);
 	if (!hwpt->domain) {
 		rc = -ENOMEM;
 		goto out_abort;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 04/12] iommufd: Pass parent hwpt and user_data to iommufd_hw_pagetable_alloc()
  2023-03-09  8:08 [PATCH 00/12] iommufd: Add nesting infrastructure Yi Liu
                   ` (2 preceding siblings ...)
  2023-03-09  8:09 ` [PATCH 03/12] iommufd/hw_pagetable: Use domain_alloc_user op for domain allocation Yi Liu
@ 2023-03-09  8:09 ` Yi Liu
  2023-03-10  2:10   ` Baolu Lu
  2023-03-17 10:23   ` Tian, Kevin
  2023-03-09  8:09 ` [PATCH 05/12] iommufd/hw_pagetable: Do not populate user-managed hw_pagetables Yi Liu
                   ` (8 subsequent siblings)
  12 siblings, 2 replies; 49+ messages in thread
From: Yi Liu @ 2023-03-09  8:09 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest

Nested translation has stage-1 and stage-2 page tables. A stage-1 page
table is managed by user space, and it needs to work with a stage-2 page
table, which is a parent hwpt for the stage-1 hwpt.

iommu core already supports accepting parent iommu_domain and user_data
to allocate an iommu_domain. This makes iommufd_hw_pagetable_alloc() to
accept the parent hwpt and user_data, and relays them to iommu core, to
prepare for supporting hw_pagetable allocation with user_data.

Also, add a parent pointer in struct iommufd_hw_pagetable for taking and
releasing its refcount.

Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/device.c          |  2 +-
 drivers/iommu/iommufd/hw_pagetable.c    | 28 ++++++++++++++++++++++---
 drivers/iommu/iommufd/iommufd_private.h |  5 ++++-
 3 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 5c352807d946..19cd6df46c6a 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -558,7 +558,7 @@ iommufd_device_auto_get_domain(struct iommufd_device *idev,
 	}
 
 	hwpt = iommufd_hw_pagetable_alloc(idev->ictx, ioas, idev,
-					  immediate_attach);
+					  NULL, NULL, immediate_attach);
 	if (IS_ERR(hwpt)) {
 		destroy_hwpt = ERR_CAST(hwpt);
 		goto out_unlock;
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 84b4a11e62f8..16e92a1c150b 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -24,6 +24,8 @@ void iommufd_hw_pagetable_destroy(struct iommufd_object *obj)
 	if (hwpt->domain)
 		iommu_domain_free(hwpt->domain);
 
+	if (hwpt->parent)
+		refcount_dec(&hwpt->parent->obj.users);
 	refcount_dec(&hwpt->ioas->obj.users);
 }
 
@@ -46,6 +48,8 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt)
  * @ictx: iommufd context
  * @ioas: IOAS to associate the domain with
  * @idev: Device to get an iommu_domain for
+ * @parent: Optional parent HWPT to associate with the domain with
+ * @user_data: Optional user_data pointer
  * @immediate_attach: True if idev should be attached to the hwpt
  *
  * Allocate a new iommu_domain and return it as a hw_pagetable. The HWPT
@@ -54,14 +58,20 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt)
  */
 struct iommufd_hw_pagetable *
 iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
-			   struct iommufd_device *idev, bool immediate_attach)
+			   struct iommufd_device *idev,
+			   struct iommufd_hw_pagetable *parent,
+			   void *user_data, bool immediate_attach)
 {
 	const struct iommu_ops *ops = dev_iommu_ops(idev->dev);
+	struct iommu_domain *parent_domain = NULL;
 	struct iommufd_hw_pagetable *hwpt;
 	int rc;
 
 	lockdep_assert_held(&ioas->mutex);
 
+	if (parent && !ops->domain_alloc_user)
+		return ERR_PTR(-EOPNOTSUPP);
+
 	hwpt = iommufd_object_alloc(ictx, hwpt, IOMMUFD_OBJ_HW_PAGETABLE);
 	if (IS_ERR(hwpt))
 		return hwpt;
@@ -70,9 +80,15 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 	/* Pairs with iommufd_hw_pagetable_destroy() */
 	refcount_inc(&ioas->obj.users);
 	hwpt->ioas = ioas;
+	if (parent) {
+		hwpt->parent = parent;
+		parent_domain = parent->domain;
+		refcount_inc(&parent->obj.users);
+	}
 
 	if (ops->domain_alloc_user)
-		hwpt->domain = ops->domain_alloc_user(idev->dev, NULL, NULL);
+		hwpt->domain = ops->domain_alloc_user(idev->dev,
+						      parent_domain, user_data);
 	else
 		hwpt->domain = iommu_domain_alloc(idev->dev->bus);
 	if (!hwpt->domain) {
@@ -80,6 +96,11 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 		goto out_abort;
 	}
 
+	/* It must be either NESTED or UNMANAGED, depending on parent_domain */
+	if ((parent_domain && hwpt->domain->type != IOMMU_DOMAIN_NESTED) ||
+	    (!parent_domain && hwpt->domain->type != IOMMU_DOMAIN_UNMANAGED))
+		goto out_abort;
+
 	/*
 	 * Set the coherency mode before we do iopt_table_add_domain() as some
 	 * iommus have a per-PTE bit that controls it and need to decide before
@@ -150,7 +171,8 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
 	}
 
 	mutex_lock(&ioas->mutex);
-	hwpt = iommufd_hw_pagetable_alloc(ucmd->ictx, ioas, idev, false);
+	hwpt = iommufd_hw_pagetable_alloc(ucmd->ictx, ioas, idev,
+					  NULL, NULL, false);
 	mutex_unlock(&ioas->mutex);
 	if (IS_ERR(hwpt)) {
 		rc = PTR_ERR(hwpt);
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 05b5ad66f716..182c074eecdc 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -243,6 +243,7 @@ int iommufd_vfio_ioas(struct iommufd_ucmd *ucmd);
  */
 struct iommufd_hw_pagetable {
 	struct iommufd_object obj;
+	struct iommufd_hw_pagetable *parent;
 	struct iommufd_ioas *ioas;
 	struct iommu_domain *domain;
 	bool auto_domain : 1;
@@ -254,7 +255,9 @@ struct iommufd_hw_pagetable {
 
 struct iommufd_hw_pagetable *
 iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
-			   struct iommufd_device *idev, bool immediate_attach);
+			   struct iommufd_device *idev,
+			   struct iommufd_hw_pagetable *parent,
+			   void *user_data, bool immediate_attach);
 int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt);
 int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 				struct iommufd_device *idev);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 05/12] iommufd/hw_pagetable: Do not populate user-managed hw_pagetables
  2023-03-09  8:08 [PATCH 00/12] iommufd: Add nesting infrastructure Yi Liu
                   ` (3 preceding siblings ...)
  2023-03-09  8:09 ` [PATCH 04/12] iommufd: Pass parent hwpt and user_data to iommufd_hw_pagetable_alloc() Yi Liu
@ 2023-03-09  8:09 ` Yi Liu
  2023-03-10  2:25   ` Baolu Lu
  2023-03-10 15:29   ` Jason Gunthorpe
  2023-03-09  8:09 ` [PATCH 06/12] iommufd: IOMMU_HWPT_ALLOC allocation with user data Yi Liu
                   ` (7 subsequent siblings)
  12 siblings, 2 replies; 49+ messages in thread
From: Yi Liu @ 2023-03-09  8:09 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest

From: Nicolin Chen <nicolinc@nvidia.com>

A user-managed hw_pagetable does not need to get populated, since it is
managed by a guest OS. Move the iopt_table_add_domain and list_add_tail
calls into a helper, where the hwpt pointer will be redirected to its
hwpt->parent if it's available.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/hw_pagetable.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 16e92a1c150b..6e45ec0a66fa 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -43,6 +43,23 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt)
 	return 0;
 }
 
+static int iommufd_hw_pagetable_link_ioas(struct iommufd_hw_pagetable *hwpt)
+{
+	int rc;
+
+	if (hwpt->parent)
+		hwpt = hwpt->parent;
+
+	if (!list_empty(&hwpt->hwpt_item))
+		return 0;
+
+	rc = iopt_table_add_domain(&hwpt->ioas->iopt, hwpt->domain);
+	if (rc)
+		return rc;
+	list_add_tail(&hwpt->hwpt_item, &hwpt->ioas->hwpt_list);
+	return 0;
+}
+
 /**
  * iommufd_hw_pagetable_alloc() - Get an iommu_domain for a device
  * @ictx: iommufd context
@@ -131,10 +148,9 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 			goto out_unlock;
 	}
 
-	rc = iopt_table_add_domain(&hwpt->ioas->iopt, hwpt->domain);
+	rc = iommufd_hw_pagetable_link_ioas(hwpt);
 	if (rc)
 		goto out_detach;
-	list_add_tail(&hwpt->hwpt_item, &hwpt->ioas->hwpt_list);
 
 	mutex_unlock(&idev->igroup->lock);
 	return hwpt;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 06/12] iommufd: IOMMU_HWPT_ALLOC allocation with user data
  2023-03-09  8:08 [PATCH 00/12] iommufd: Add nesting infrastructure Yi Liu
                   ` (4 preceding siblings ...)
  2023-03-09  8:09 ` [PATCH 05/12] iommufd/hw_pagetable: Do not populate user-managed hw_pagetables Yi Liu
@ 2023-03-09  8:09 ` Yi Liu
  2023-03-10  3:02   ` Baolu Lu
  2023-03-09  8:09 ` [PATCH 07/12] iommufd: Add IOMMU_HWPT_INVALIDATE Yi Liu
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 49+ messages in thread
From: Yi Liu @ 2023-03-09  8:09 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest

IOMMU_HWPT_ALLOC already supports iommu_domain allocation for usersapce.
But it can only allocate hw_pagetables linked with IOAS. There are needs
to support hw_pagetable allocation with parameters specified by user. For
example, in nested translation, user needs to allocate hw_pagetable for
the stage-1 translation (e.g. a single I/O page table or a set of I/O page
tables) with user data. It also needs provide a stage-2 hw_pagetable which
is linked to the GPA IOAS.

This extends IOMMU_HWPT_ALLOC to accept user specified parameter and hwpt
ID in @pt_id field. Such as the user-managed stage-1 hwpt, which requires
a parent hwpt to point to stage-2 translation.

enum iommu_hwpt_type is defined to differentiate the user parameters use
by different usages. For the allocations that don't require user parameter,
IOMMU_HWPT_TYPE_DEFAULT is defined for backward compatibility. Other types
would be added by future iommu vendor driver extensions.

Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/hw_pagetable.c | 94 +++++++++++++++++++++++++---
 drivers/iommu/iommufd/main.c         |  2 +-
 include/uapi/linux/iommufd.h         | 30 +++++++++
 3 files changed, 115 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 6e45ec0a66fa..64e7cf7142e1 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -165,34 +165,106 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 	return ERR_PTR(rc);
 }
 
+/*
+ * size of page table type specific data, indexed by
+ * enum iommu_hwpt_type.
+ */
+static const size_t iommufd_hwpt_alloc_data_size[] = {
+	[IOMMU_HWPT_TYPE_DEFAULT] = 0,
+};
+
 int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
 {
 	struct iommu_hwpt_alloc *cmd = ucmd->cmd;
-	struct iommufd_hw_pagetable *hwpt;
+	struct iommufd_hw_pagetable *hwpt, *parent = NULL;
+	struct iommufd_object *pt_obj;
 	struct iommufd_device *idev;
 	struct iommufd_ioas *ioas;
+	const struct iommu_ops *ops;
+	void *data = NULL;
+	u32 klen;
 	int rc;
 
-	if (cmd->flags)
+	if (cmd->__reserved || cmd->flags)
 		return -EOPNOTSUPP;
 
 	idev = iommufd_get_device(ucmd, cmd->dev_id);
 	if (IS_ERR(idev))
 		return PTR_ERR(idev);
 
-	ioas = iommufd_get_ioas(ucmd, cmd->pt_id);
-	if (IS_ERR(ioas)) {
-		rc = PTR_ERR(ioas);
+	ops = dev_iommu_ops(idev->dev);
+	if (!ops) {
+		rc = -EOPNOTSUPP;
 		goto out_put_idev;
 	}
 
+	/* Only support IOMMU_HWPT_TYPE_DEFAULT for now */
+	if (cmd->data_type != IOMMU_HWPT_TYPE_DEFAULT) {
+		rc = -EINVAL;
+		goto out_put_idev;
+	}
+
+	pt_obj = iommufd_get_object(ucmd->ictx, cmd->pt_id, IOMMUFD_OBJ_ANY);
+	if (IS_ERR(pt_obj)) {
+		rc = -EINVAL;
+		goto out_put_idev;
+	}
+
+	switch (pt_obj->type) {
+	case IOMMUFD_OBJ_IOAS:
+		ioas = container_of(pt_obj, struct iommufd_ioas, obj);
+		break;
+	case IOMMUFD_OBJ_HW_PAGETABLE:
+		/* pt_id points HWPT only when data_type is !IOMMU_HWPT_TYPE_DEFAULT */
+		if (cmd->data_type == IOMMU_HWPT_TYPE_DEFAULT) {
+			rc = -EINVAL;
+			goto out_put_pt;
+		}
+
+		parent = container_of(pt_obj, struct iommufd_hw_pagetable, obj);
+		/*
+		 * Cannot allocate user-managed hwpt linking to auto_created
+		 * hwpt. If the parent hwpt is already a user-managed hwpt,
+		 * don't allocate another user-managed hwpt linking to it.
+		 */
+		if (parent->auto_domain || parent->parent) {
+			rc = -EINVAL;
+			goto out_put_pt;
+		}
+		ioas = parent->ioas;
+		break;
+	default:
+		rc = -EINVAL;
+		goto out_put_pt;
+	}
+
+	klen = iommufd_hwpt_alloc_data_size[cmd->data_type];
+	if (klen) {
+		if (!cmd->data_len) {
+			rc = -EINVAL;
+			goto out_put_pt;
+		}
+
+		data = kzalloc(klen, GFP_KERNEL);
+		if (!data) {
+			rc = -ENOMEM;
+			goto out_put_pt;
+		}
+
+		rc = copy_struct_from_user(data, klen,
+					   u64_to_user_ptr(cmd->data_uptr),
+					   cmd->data_len);
+		if (rc)
+			goto out_free_data;
+	}
+
 	mutex_lock(&ioas->mutex);
 	hwpt = iommufd_hw_pagetable_alloc(ucmd->ictx, ioas, idev,
-					  NULL, NULL, false);
+					  parent, data, false);
 	mutex_unlock(&ioas->mutex);
 	if (IS_ERR(hwpt)) {
 		rc = PTR_ERR(hwpt);
-		goto out_put_ioas;
+		goto out_free_data;
 	}
 
 	cmd->out_hwpt_id = hwpt->obj.id;
@@ -200,12 +272,14 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
 	if (rc)
 		goto out_hwpt;
 	iommufd_object_finalize(ucmd->ictx, &hwpt->obj);
-	goto out_put_ioas;
+	goto out_free_data;
 
 out_hwpt:
 	iommufd_object_abort_and_destroy(ucmd->ictx, &hwpt->obj);
-out_put_ioas:
-	iommufd_put_object(&ioas->obj);
+out_free_data:
+	kfree(data);
+out_put_pt:
+	iommufd_put_object(pt_obj);
 out_put_idev:
 	iommufd_put_object(&idev->obj);
 	return rc;
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index f079c0bda46b..7ab1e2c638a1 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -295,7 +295,7 @@ struct iommufd_ioctl_op {
 static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
 	IOCTL_OP(IOMMU_DESTROY, iommufd_destroy, struct iommu_destroy, id),
 	IOCTL_OP(IOMMU_HWPT_ALLOC, iommufd_hwpt_alloc, struct iommu_hwpt_alloc,
-		 __reserved),
+		 data_uptr),
 	IOCTL_OP(IOMMU_DEVICE_GET_HW_INFO, iommufd_device_get_hw_info,
 		 struct iommu_hw_info, __reserved),
 	IOCTL_OP(IOMMU_IOAS_ALLOC, iommufd_ioas_alloc_ioctl,
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 4ac525897b82..48781ff40a37 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -347,6 +347,14 @@ struct iommu_vfio_ioas {
 };
 #define IOMMU_VFIO_IOAS _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VFIO_IOAS)
 
+/**
+ * enum iommu_hwpt_type - IOMMU HWPT Type
+ * @IOMMU_HWPT_TYPE_DEFAULT: default
+ */
+enum iommu_hwpt_type {
+	IOMMU_HWPT_TYPE_DEFAULT,
+};
+
 /**
  * struct iommu_hwpt_alloc - ioctl(IOMMU_HWPT_ALLOC)
  * @size: sizeof(struct iommu_hwpt_alloc)
@@ -355,12 +363,31 @@ struct iommu_vfio_ioas {
  * @pt_id: The IOAS to connect this HWPT to
  * @out_hwpt_id: The ID of the new HWPT
  * @__reserved: Must be 0
+ * @data_type: One of enum iommu_hwpt_type
+ * @data_len: Length of the type specific data
+ * @data_uptr: User pointer to the type specific data
  *
  * Explicitly allocate a hardware page table object. This is the same object
  * type that is returned by iommufd_device_attach() and represents the
  * underlying iommu driver's iommu_domain kernel object.
  *
  * A normal HWPT will be created with the mappings from the given IOAS.
+ * The @data_type for its allocation can be set to IOMMU_HWPT_TYPE_DEFAULT, or
+ * another type (being listed below) to specialize a kernel-managed HWPT.
+ *
+ * A user-managed HWPT will be created from a given parent HWPT via @pt_id, in
+ * which the parent HWPT must be allocated previously via the same ioctl from a
+ * given IOAS. The @data_type must not be set to IOMMU_HWPT_TYPE_DEFAULT but a
+ * pre-defined type corresponding to the underlying IOMMU hardware.
+ *
+ * If the @data_type is set to IOMMU_HWPT_TYPE_DEFAULT, both the @data_len and
+ * the @data_uptr will be ignored. Otherwise, both of them must be given.
+ *
+ * +==============================+=====================================+===========+
+ * | @data_type                   |    Data structure in @data_uptr     |   @pt_id  |
+ * +------------------------------+-------------------------------------+-----------+
+ * | IOMMU_HWPT_TYPE_DEFAULT      |               N/A                   |    IOAS   |
+ * +------------------------------+-------------------------------------+-----------+
  */
 struct iommu_hwpt_alloc {
 	__u32 size;
@@ -369,6 +396,9 @@ struct iommu_hwpt_alloc {
 	__u32 pt_id;
 	__u32 out_hwpt_id;
 	__u32 __reserved;
+	__u32 data_type;
+	__u32 data_len;
+	__aligned_u64 data_uptr;
 };
 #define IOMMU_HWPT_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_ALLOC)
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 07/12] iommufd: Add IOMMU_HWPT_INVALIDATE
  2023-03-09  8:08 [PATCH 00/12] iommufd: Add nesting infrastructure Yi Liu
                   ` (5 preceding siblings ...)
  2023-03-09  8:09 ` [PATCH 06/12] iommufd: IOMMU_HWPT_ALLOC allocation with user data Yi Liu
@ 2023-03-09  8:09 ` Yi Liu
  2023-03-10  3:15   ` Baolu Lu
  2023-03-10 17:50   ` Jason Gunthorpe
  2023-03-09  8:09 ` [PATCH 08/12] iommufd/device: Report supported hwpt_types Yi Liu
                   ` (5 subsequent siblings)
  12 siblings, 2 replies; 49+ messages in thread
From: Yi Liu @ 2023-03-09  8:09 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest

In nested translation, the stage-1 page table is user-managed and used
by IOMMU hardware, so destroying mappings in the stage-1 page table should
be followed with an IOTLB invalidation.

This adds IOMMU_HWPT_INVALIDATE for IOTLB invalidation.

Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/hw_pagetable.c    | 56 +++++++++++++++++++++++++
 drivers/iommu/iommufd/iommufd_private.h |  9 ++++
 drivers/iommu/iommufd/main.c            |  3 ++
 include/uapi/linux/iommufd.h            | 27 ++++++++++++
 4 files changed, 95 insertions(+)

diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 64e7cf7142e1..67facca98de1 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -284,3 +284,59 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
 	iommufd_put_object(&idev->obj);
 	return rc;
 }
+
+/*
+ * size of page table type specific invalidate_info, indexed by
+ * enum iommu_hwpt_type.
+ */
+static const size_t iommufd_hwpt_invalidate_info_size[] = {};
+
+int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd)
+{
+	struct iommu_hwpt_invalidate *cmd = ucmd->cmd;
+	struct iommufd_hw_pagetable *hwpt;
+	u64 user_ptr;
+	u32 user_data_len, klen;
+	int rc = 0;
+
+	/*
+	 * For a user-managed HWPT, type should not be IOMMU_HWPT_TYPE_DEFAULT.
+	 * data_len should not exceed the size of iommufd_invalidate_buffer.
+	 */
+	if (cmd->data_type == IOMMU_HWPT_TYPE_DEFAULT || !cmd->data_len ||
+	    cmd->data_type >= ARRAY_SIZE(iommufd_hwpt_invalidate_info_size))
+		return -EOPNOTSUPP;
+
+	hwpt = iommufd_get_hwpt(ucmd, cmd->hwpt_id);
+	if (IS_ERR(hwpt))
+		return PTR_ERR(hwpt);
+
+	/* Do not allow any kernel-managed hw_pagetable */
+	if (!hwpt->parent) {
+		rc = -EINVAL;
+		goto out_put_hwpt;
+	}
+
+	klen = iommufd_hwpt_invalidate_info_size[cmd->data_type];
+	if (!klen) {
+		rc = -EINVAL;
+		goto out_put_hwpt;
+	}
+
+	/*
+	 * Copy the needed fields before reusing the ucmd buffer, this
+	 * avoids memory allocation in this path.
+	 */
+	user_ptr = cmd->data_uptr;
+	user_data_len = cmd->data_len;
+
+	rc = copy_struct_from_user(cmd, klen,
+				   u64_to_user_ptr(user_ptr), user_data_len);
+	if (rc)
+		goto out_put_hwpt;
+
+	hwpt->domain->ops->cache_invalidate_user(hwpt->domain, cmd);
+out_put_hwpt:
+	iommufd_put_object(&hwpt->obj);
+	return rc;
+}
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 182c074eecdc..d879264d1acf 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -265,6 +265,7 @@ struct iommufd_hw_pagetable *
 iommufd_hw_pagetable_detach(struct iommufd_device *idev);
 void iommufd_hw_pagetable_destroy(struct iommufd_object *obj);
 int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd);
+int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd);
 
 static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx,
 					    struct iommufd_hw_pagetable *hwpt)
@@ -276,6 +277,14 @@ static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx,
 		refcount_dec(&hwpt->obj.users);
 }
 
+static inline struct iommufd_hw_pagetable *
+iommufd_get_hwpt(struct iommufd_ucmd *ucmd, u32 id)
+{
+	return container_of(iommufd_get_object(ucmd->ictx, id,
+					       IOMMUFD_OBJ_HW_PAGETABLE),
+			    struct iommufd_hw_pagetable, obj);
+}
+
 struct iommufd_group {
 	struct kref ref;
 	struct mutex lock;
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 7ab1e2c638a1..2cf45f65b637 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -263,6 +263,7 @@ union ucmd_buffer {
 	struct iommu_destroy destroy;
 	struct iommu_hwpt_alloc hwpt;
 	struct iommu_hw_info info;
+	struct iommu_hwpt_invalidate cache;
 	struct iommu_ioas_alloc alloc;
 	struct iommu_ioas_allow_iovas allow_iovas;
 	struct iommu_ioas_copy ioas_copy;
@@ -298,6 +299,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
 		 data_uptr),
 	IOCTL_OP(IOMMU_DEVICE_GET_HW_INFO, iommufd_device_get_hw_info,
 		 struct iommu_hw_info, __reserved),
+	IOCTL_OP(IOMMU_HWPT_INVALIDATE, iommufd_hwpt_invalidate,
+		 struct iommu_hwpt_invalidate, data_uptr),
 	IOCTL_OP(IOMMU_IOAS_ALLOC, iommufd_ioas_alloc_ioctl,
 		 struct iommu_ioas_alloc, out_ioas_id),
 	IOCTL_OP(IOMMU_IOAS_ALLOW_IOVAS, iommufd_ioas_allow_iovas,
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 48781ff40a37..d0962c41f8d6 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -47,6 +47,7 @@ enum {
 	IOMMUFD_CMD_VFIO_IOAS,
 	IOMMUFD_CMD_HWPT_ALLOC,
 	IOMMUFD_CMD_DEVICE_GET_HW_INFO,
+	IOMMUFD_CMD_HWPT_INVALIDATE,
 };
 
 /**
@@ -447,4 +448,30 @@ struct iommu_hw_info {
 	__u32 __reserved;
 };
 #define IOMMU_DEVICE_GET_HW_INFO _IO(IOMMUFD_TYPE, IOMMUFD_CMD_DEVICE_GET_HW_INFO)
+
+/**
+ * struct iommu_hwpt_invalidate - ioctl(IOMMU_HWPT_INVALIDATE)
+ * @size: sizeof(struct iommu_hwpt_invalidate)
+ * @hwpt_id: HWPT ID of target hardware page table for the invalidation
+ * @data_type: One of enum iommu_hwpt_type
+ * @data_len: Length of the type specific data
+ * @data_uptr: User pointer to the type specific data
+ *
+ * Invalidate the iommu cache for user-managed page table. Modifications
+ * on user-managed page table should be followed with this operation to
+ * sync the IOTLB. This is only needed by user-managed hw_pagetables, so
+ * the @data_type should never be IOMMU_HWPT_TYPE_DEFAULT.
+ *
+ * +==============================+========================================+
+ * | @data_type                   |     Data structure in @data_uptr       |
+ * +------------------------------+----------------------------------------+
+ */
+struct iommu_hwpt_invalidate {
+	__u32 size;
+	__u32 hwpt_id;
+	__u32 data_type;
+	__u32 data_len;
+	__aligned_u64 data_uptr;
+};
+#define IOMMU_HWPT_INVALIDATE _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_INVALIDATE)
 #endif
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 08/12] iommufd/device: Report supported hwpt_types
  2023-03-09  8:08 [PATCH 00/12] iommufd: Add nesting infrastructure Yi Liu
                   ` (6 preceding siblings ...)
  2023-03-09  8:09 ` [PATCH 07/12] iommufd: Add IOMMU_HWPT_INVALIDATE Yi Liu
@ 2023-03-09  8:09 ` Yi Liu
  2023-03-10  3:30   ` Baolu Lu
  2023-03-09  8:09 ` [PATCH 09/12] iommufd/selftest: Add domain_alloc_user() support in iommu mock Yi Liu
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 49+ messages in thread
From: Yi Liu @ 2023-03-09  8:09 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest

This provides a way for userspace to probe the supported hwpt data
types by kernel. Currently, kernel only supports IOMMU_HWPT_TYPE_DEFAULT,
new types would be added per vendor drivers' extension.

Userspace that wants to allocate hw_pagetable with user data should check
this. While for the allocation without user data, no need for it. It is
supported by default.

Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/device.c          |  1 +
 drivers/iommu/iommufd/hw_pagetable.c    | 18 +++++++++++++++---
 drivers/iommu/iommufd/iommufd_private.h |  2 ++
 drivers/iommu/iommufd/main.c            |  2 +-
 include/uapi/linux/iommufd.h            |  8 ++++++++
 5 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 19cd6df46c6a..0328071dcac1 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -322,6 +322,7 @@ int iommufd_device_get_hw_info(struct iommufd_ucmd *ucmd)
 
 	cmd->out_data_type = ops->driver_type;
 	cmd->data_len = length;
+	cmd->out_hwpt_type_bitmap = iommufd_hwpt_type_bitmaps[ops->driver_type];
 
 	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
 
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 67facca98de1..160712256c64 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -173,6 +173,14 @@ static const size_t iommufd_hwpt_alloc_data_size[] = {
 	[IOMMU_HWPT_TYPE_DEFAULT] = 0,
 };
 
+/*
+ * bitmaps of supported hwpt types of by underlying iommu, indexed
+ * by ops->driver_type which is one of enum iommu_hw_info_type.
+ */
+const u64 iommufd_hwpt_type_bitmaps[] =  {
+	[IOMMU_HW_INFO_TYPE_DEFAULT] = BIT_ULL(IOMMU_HWPT_TYPE_DEFAULT),
+};
+
 int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
 {
 	struct iommu_hwpt_alloc *cmd = ucmd->cmd;
@@ -182,7 +190,7 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
 	struct iommufd_ioas *ioas;
 	const struct iommu_ops *ops;
 	void *data = NULL;
-	u32 klen;
+	u32 driver_type, klen;
 	int rc;
 
 	if (cmd->__reserved || cmd->flags)
@@ -198,8 +206,12 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
 		goto out_put_idev;
 	}
 
-	/* Only support IOMMU_HWPT_TYPE_DEFAULT for now */
-	if (cmd->data_type != IOMMU_HWPT_TYPE_DEFAULT) {
+	driver_type = ops->driver_type;
+
+	/* data_type should be a supported type by the driver */
+	if (WARN_ON(driver_type >= ARRAY_SIZE(iommufd_hwpt_type_bitmaps)) ||
+	    !((1 << cmd->data_type) &
+			iommufd_hwpt_type_bitmaps[driver_type])) {
 		rc = -EINVAL;
 		goto out_put_idev;
 	}
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index d879264d1acf..164ccfc2e6e0 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -321,6 +321,8 @@ iommufd_get_device(struct iommufd_ucmd *ucmd, u32 id)
 void iommufd_device_destroy(struct iommufd_object *obj);
 int iommufd_device_get_hw_info(struct iommufd_ucmd *ucmd);
 
+extern const u64 iommufd_hwpt_type_bitmaps[];
+
 struct iommufd_access {
 	struct iommufd_object obj;
 	struct iommufd_ctx *ictx;
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 2cf45f65b637..7ec3ceac01b3 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -298,7 +298,7 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
 	IOCTL_OP(IOMMU_HWPT_ALLOC, iommufd_hwpt_alloc, struct iommu_hwpt_alloc,
 		 data_uptr),
 	IOCTL_OP(IOMMU_DEVICE_GET_HW_INFO, iommufd_device_get_hw_info,
-		 struct iommu_hw_info, __reserved),
+		 struct iommu_hw_info, out_hwpt_type_bitmap),
 	IOCTL_OP(IOMMU_HWPT_INVALIDATE, iommufd_hwpt_invalidate,
 		 struct iommu_hwpt_invalidate, data_uptr),
 	IOCTL_OP(IOMMU_IOAS_ALLOC, iommufd_ioas_alloc_ioctl,
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index d0962c41f8d6..e2eff9c56ab3 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -421,6 +421,8 @@ enum iommu_hw_info_type {
  * @out_data_type: Output the iommu hardware info type, it is one of
  *                 enum iommu_hw_info_type.
  * @__reserved: Must be 0
+ * @out_hwpt_type_bitmap: Output the supported page table type. Each
+ *                        bit is defined in enum iommu_hwpt_type.
  *
  * Query the hardware iommu information for given device which has been
  * bound to iommufd. @data_len is the size of the buffer which captures
@@ -435,6 +437,11 @@ enum iommu_hw_info_type {
  * The @out_data_type will be filled if the ioctl succeeds. It would
  * be used to decode the data filled in the buffer pointed by @data_ptr.
  *
+ * @out_hwpt_type_bitmap reports the supported hwpt types. This differs
+ * per the @out_data_type. Userspace should check it before allocating a
+ * user-managed hw_pagetable with user data, unless it allocates a default
+ * hw_pagetable that does not need user data.
+ *
  * This is only available for the physical devices bound to iommufd as
  * only physical devices can have hardware IOMMU.
  */
@@ -446,6 +453,7 @@ struct iommu_hw_info {
 	__aligned_u64 data_ptr;
 	__u32 out_data_type;
 	__u32 __reserved;
+	__aligned_u64 out_hwpt_type_bitmap;
 };
 #define IOMMU_DEVICE_GET_HW_INFO _IO(IOMMUFD_TYPE, IOMMUFD_CMD_DEVICE_GET_HW_INFO)
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 09/12] iommufd/selftest: Add domain_alloc_user() support in iommu mock
  2023-03-09  8:08 [PATCH 00/12] iommufd: Add nesting infrastructure Yi Liu
                   ` (7 preceding siblings ...)
  2023-03-09  8:09 ` [PATCH 08/12] iommufd/device: Report supported hwpt_types Yi Liu
@ 2023-03-09  8:09 ` Yi Liu
  2023-03-09  8:09 ` [PATCH 10/12] iommufd/selftest: Add coverage for IOMMU_HWPT_ALLOC with user data Yi Liu
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 49+ messages in thread
From: Yi Liu @ 2023-03-09  8:09 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest

From: Nicolin Chen <nicolinc@nvidia.com>

Add mock_domain_alloc_user function and iommu_hwpt_selftest data structure
to support user space selftest program to allocate domains with user data.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/device.c       |  8 +++-
 drivers/iommu/iommufd/hw_pagetable.c | 31 +++++++++++--
 drivers/iommu/iommufd/iommufd_test.h | 16 +++++++
 drivers/iommu/iommufd/selftest.c     | 67 ++++++++++++++++++++++++----
 4 files changed, 109 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 0328071dcac1..f95b558f5e95 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -322,7 +322,13 @@ int iommufd_device_get_hw_info(struct iommufd_ucmd *ucmd)
 
 	cmd->out_data_type = ops->driver_type;
 	cmd->data_len = length;
-	cmd->out_hwpt_type_bitmap = iommufd_hwpt_type_bitmaps[ops->driver_type];
+
+	if (ops->driver_type != IOMMU_HW_INFO_TYPE_SELFTEST)
+		cmd->out_hwpt_type_bitmap = iommufd_hwpt_type_bitmaps[ops->driver_type];
+#ifdef CONFIG_IOMMUFD_TEST
+	else
+		cmd->out_hwpt_type_bitmap = U64_MAX; // Pretend to support all types
+#endif
 
 	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
 
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 160712256c64..0871da848447 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -7,6 +7,7 @@
 
 #include "../iommu-priv.h"
 #include "iommufd_private.h"
+#include "iommufd_test.h"
 
 void iommufd_hw_pagetable_destroy(struct iommufd_object *obj)
 {
@@ -181,6 +182,18 @@ const u64 iommufd_hwpt_type_bitmaps[] =  {
 	[IOMMU_HW_INFO_TYPE_DEFAULT] = BIT_ULL(IOMMU_HWPT_TYPE_DEFAULT),
 };
 
+/* Return true if type is supported, otherwise false */
+static inline bool
+iommufd_hwpt_type_check(enum iommu_hw_info_type driver_type,
+			enum iommu_hwpt_type type)
+{
+	if (WARN_ON(driver_type >= ARRAY_SIZE(iommufd_hwpt_type_bitmaps)))
+		return false;
+
+	return ((1 << type) &
+			iommufd_hwpt_type_bitmaps[driver_type]);
+}
+
 int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
 {
 	struct iommu_hwpt_alloc *cmd = ucmd->cmd;
@@ -191,6 +204,7 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
 	const struct iommu_ops *ops;
 	void *data = NULL;
 	u32 driver_type, klen;
+	bool support;
 	int rc;
 
 	if (cmd->__reserved || cmd->flags)
@@ -209,9 +223,13 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
 	driver_type = ops->driver_type;
 
 	/* data_type should be a supported type by the driver */
-	if (WARN_ON(driver_type >= ARRAY_SIZE(iommufd_hwpt_type_bitmaps)) ||
-	    !((1 << cmd->data_type) &
-			iommufd_hwpt_type_bitmaps[driver_type])) {
+	if (driver_type != IOMMU_HW_INFO_TYPE_SELFTEST)
+		support = iommufd_hwpt_type_check(driver_type, cmd->data_type);
+#ifdef CONFIG_IOMMUFD_TEST
+	else
+		support = true; /* selftest pretend to support all types */
+#endif
+	if (!support) {
 		rc = -EINVAL;
 		goto out_put_idev;
 	}
@@ -250,7 +268,12 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
 		goto out_put_pt;
 	}
 
-	klen = iommufd_hwpt_alloc_data_size[cmd->data_type];
+	if (cmd->data_type != IOMMU_HWPT_TYPE_SELFTTEST)
+		klen = iommufd_hwpt_alloc_data_size[cmd->data_type];
+#ifdef CONFIG_IOMMUFD_TEST
+	else
+		klen = sizeof(struct iommu_hwpt_selftest);
+#endif
 	if (klen) {
 		if (!cmd->data_len) {
 			rc = -EINVAL;
diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h
index 578691602d94..8bbb4f4fa1d7 100644
--- a/drivers/iommu/iommufd/iommufd_test.h
+++ b/drivers/iommu/iommufd/iommufd_test.h
@@ -115,4 +115,20 @@ struct iommu_hw_info_selftest {
 	__u32 test_reg;
 };
 
+/* Should not be equal to any defined value in enum iommu_hwpt_type */
+#define IOMMU_HWPT_TYPE_SELFTTEST	0xbadbeef
+
+/**
+ * struct iommu_hwpt_selftest
+ *
+ * @flags: page table entry attributes
+ * @test_config: default iotlb setup (value IOMMU_TEST_IOTLB_DEFAULT)
+ */
+struct iommu_hwpt_selftest {
+#define IOMMU_TEST_FLAG_NESTED		(1ULL << 0)
+	__u64 flags;
+#define IOMMU_TEST_IOTLB_DEFAULT	0xbadbeef
+	__u64 test_config;
+};
+
 #endif
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index b50ec3528ec1..b9c7f3bf64b5 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -84,7 +84,9 @@ void iommufd_test_syz_conv_iova_id(struct iommufd_ucmd *ucmd,
 
 struct mock_iommu_domain {
 	struct iommu_domain domain;
+	struct mock_iommu_domain *parent;
 	struct xarray pfns;
+	u32 iotlb;
 };
 
 enum selftest_obj_type {
@@ -142,26 +144,66 @@ static void *mock_domain_hw_info(struct device *dev, u32 *length)
 	return info;
 }
 
-static struct iommu_domain *mock_domain_alloc(unsigned int iommu_domain_type)
+static const struct iommu_ops mock_ops;
+static struct iommu_domain_ops domain_nested_ops;
+
+static struct iommu_domain *
+__mock_domain_alloc(unsigned int iommu_domain_type,
+		    struct mock_iommu_domain *mock_parent,
+		    const struct iommu_hwpt_selftest *user_cfg)
 {
 	struct mock_iommu_domain *mock;
 
 	if (iommu_domain_type == IOMMU_DOMAIN_BLOCKED)
 		return &mock_blocking_domain;
 
-	if (WARN_ON(iommu_domain_type != IOMMU_DOMAIN_UNMANAGED))
-		return NULL;
-
 	mock = kzalloc(sizeof(*mock), GFP_KERNEL);
 	if (!mock)
 		return NULL;
-	mock->domain.geometry.aperture_start = MOCK_APERTURE_START;
-	mock->domain.geometry.aperture_end = MOCK_APERTURE_LAST;
+	mock->parent = mock_parent;
+	mock->domain.type = iommu_domain_type;
 	mock->domain.pgsize_bitmap = MOCK_IO_PAGE_SIZE;
-	xa_init(&mock->pfns);
+	if (mock_parent) {
+		mock->iotlb = user_cfg->test_config;
+		mock->domain.ops = &domain_nested_ops;
+	} else {
+		mock->domain.geometry.aperture_start = MOCK_APERTURE_START;
+		mock->domain.geometry.aperture_end = MOCK_APERTURE_LAST;
+		mock->domain.ops = mock_ops.default_domain_ops;
+		xa_init(&mock->pfns);
+	}
 	return &mock->domain;
 }
 
+static struct iommu_domain *mock_domain_alloc(unsigned int iommu_domain_type)
+{
+	if (WARN_ON(iommu_domain_type != IOMMU_DOMAIN_BLOCKED &&
+		    iommu_domain_type != IOMMU_DOMAIN_UNMANAGED))
+		return NULL;
+	return __mock_domain_alloc(IOMMU_DOMAIN_UNMANAGED, NULL, NULL);
+}
+
+static struct iommu_domain *mock_domain_alloc_user(struct device *dev,
+						   struct iommu_domain *parent,
+						   const void *user_data)
+{
+	const struct iommu_hwpt_selftest *user_cfg = user_data;
+	struct mock_iommu_domain *mock_parent = NULL;
+	unsigned int iommu_domain_type = IOMMU_DOMAIN_UNMANAGED;
+
+	if (parent) {
+		if (parent->ops != mock_ops.default_domain_ops)
+			return NULL;
+		if (!user_cfg || !(user_cfg->flags & IOMMU_TEST_FLAG_NESTED))
+			return NULL;
+		iommu_domain_type = IOMMU_DOMAIN_NESTED;
+		mock_parent = container_of(parent,
+					   struct mock_iommu_domain, domain);
+	}
+
+	return __mock_domain_alloc(iommu_domain_type, mock_parent, user_cfg);
+}
+
 static void mock_domain_free(struct iommu_domain *domain)
 {
 	struct mock_iommu_domain *mock =
@@ -296,6 +338,7 @@ static const struct iommu_ops mock_ops = {
 	.driver_type = IOMMU_HW_INFO_TYPE_SELFTEST,
 	.hw_info = mock_domain_hw_info,
 	.domain_alloc = mock_domain_alloc,
+	.domain_alloc_user = mock_domain_alloc_user,
 	.capable = mock_domain_capable,
 	.set_platform_dma_ops = mock_domain_set_plaform_dma_ops,
 	.default_domain_ops =
@@ -308,6 +351,11 @@ static const struct iommu_ops mock_ops = {
 		},
 };
 
+static struct iommu_domain_ops domain_nested_ops = {
+	.free = mock_domain_free,
+	.attach_dev = mock_domain_nop_attach,
+};
+
 struct iommu_device mock_iommu_device = {
 	.ops = &mock_ops,
 };
@@ -324,7 +372,10 @@ get_md_pagetable(struct iommufd_ucmd *ucmd, u32 mockpt_id,
 	if (IS_ERR(obj))
 		return ERR_CAST(obj);
 	hwpt = container_of(obj, struct iommufd_hw_pagetable, obj);
-	if (hwpt->domain->ops != mock_ops.default_domain_ops) {
+	if ((hwpt->domain->type == IOMMU_DOMAIN_UNMANAGED &&
+	     hwpt->domain->ops != mock_ops.default_domain_ops) ||
+	    (hwpt->domain->type == IOMMU_DOMAIN_NESTED &&
+	     hwpt->domain->ops != &domain_nested_ops)) {
 		iommufd_put_object(&hwpt->obj);
 		return ERR_PTR(-EINVAL);
 	}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 10/12] iommufd/selftest: Add coverage for IOMMU_HWPT_ALLOC with user data
  2023-03-09  8:08 [PATCH 00/12] iommufd: Add nesting infrastructure Yi Liu
                   ` (8 preceding siblings ...)
  2023-03-09  8:09 ` [PATCH 09/12] iommufd/selftest: Add domain_alloc_user() support in iommu mock Yi Liu
@ 2023-03-09  8:09 ` Yi Liu
  2023-03-09  8:09 ` [PATCH 11/12] iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op Yi Liu
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 49+ messages in thread
From: Yi Liu @ 2023-03-09  8:09 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest

From: Nicolin Chen <nicolinc@nvidia.com>

The IOMMU_HWPT_ALLOC ioctl now supports passing user_data to allocate a
customized domain. Add its coverage for both a regular domain case and
a nested domain case.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 tools/testing/selftests/iommu/iommufd.c       | 114 +++++++++++++++++-
 tools/testing/selftests/iommu/iommufd_utils.h |  36 ++++++
 2 files changed, 149 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index d2ce2ddbdc40..783afd6ccd2e 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -125,6 +125,7 @@ TEST_F(iommufd, cmd_length)
 	TEST_LENGTH(iommu_option, IOMMU_OPTION);
 	TEST_LENGTH(iommu_vfio_ioas, IOMMU_VFIO_IOAS);
 	TEST_LENGTH(iommu_hw_info, IOMMU_DEVICE_GET_HW_INFO);
+	TEST_LENGTH(iommu_hwpt_alloc, IOMMU_HWPT_ALLOC);
 #undef TEST_LENGTH
 }
 
@@ -197,6 +198,7 @@ FIXTURE_VARIANT(iommufd_ioas)
 {
 	unsigned int mock_domains;
 	unsigned int memory_limit;
+	bool new_hwpt;
 };
 
 FIXTURE_SETUP(iommufd_ioas)
@@ -236,6 +238,12 @@ FIXTURE_VARIANT_ADD(iommufd_ioas, mock_domain)
 	.mock_domains = 1,
 };
 
+FIXTURE_VARIANT_ADD(iommufd_ioas, mock_domain_hwpt)
+{
+	.mock_domains = 1,
+	.new_hwpt = true,
+};
+
 FIXTURE_VARIANT_ADD(iommufd_ioas, two_mock_domain)
 {
 	.mock_domains = 2,
@@ -263,6 +271,93 @@ TEST_F(iommufd_ioas, ioas_destroy)
 	}
 }
 
+TEST_F(iommufd_ioas, hwpt_alloc)
+{
+	uint32_t new_hwpt_id = 0;
+
+	if (self->stdev_id && self->device_id) {
+		test_cmd_hwpt_alloc(self->device_id, self->ioas_id, &new_hwpt_id);
+		test_cmd_mock_domain_replace(self->stdev_id, new_hwpt_id);
+		/* hw_pagetable cannot be freed if a device is attached to it */
+		EXPECT_ERRNO(EBUSY, _test_ioctl_destroy(self->fd, new_hwpt_id));
+
+		/* Detach from the new hw_pagetable and try again */
+		test_cmd_mock_domain_replace(self->stdev_id, self->ioas_id);
+		test_ioctl_destroy(new_hwpt_id);
+	} else {
+		test_err_cmd_hwpt_alloc(ENOENT, self->device_id,
+					self->ioas_id, &new_hwpt_id);
+		test_err_mock_domain_replace(ENOENT,
+					     self->stdev_id, new_hwpt_id);
+	}
+}
+
+TEST_F(iommufd_ioas, nested_hwpt_alloc)
+{
+	uint32_t nested_hwpt_id[2] = {};
+	uint32_t parent_hwpt_id = 0;
+	uint32_t test_hwpt_id = 0;
+
+	if (self->device_id) {
+		/* Negative tests */
+		test_err_cmd_hwpt_alloc(ENOENT, self->ioas_id, self->device_id,
+					&test_hwpt_id);
+		test_err_cmd_hwpt_alloc(EINVAL, self->device_id,
+					self->device_id, &test_hwpt_id);
+
+		/* Allocate two nested hwpts sharing one common parent hwpt */
+		test_cmd_hwpt_alloc(self->device_id, self->ioas_id,
+				    &parent_hwpt_id);
+
+		test_cmd_hwpt_alloc_nested(self->device_id, parent_hwpt_id,
+					   &nested_hwpt_id[0]);
+		test_cmd_hwpt_alloc_nested(self->device_id, parent_hwpt_id,
+					   &nested_hwpt_id[1]);
+
+		/* Negative test: a nested hwpt on top of a nested hwpt */
+		test_err_cmd_hwpt_alloc_nested(EINVAL, self->device_id,
+					       nested_hwpt_id[0],
+					       &test_hwpt_id);
+		/* Negative test: parent hwpt now cannot be freed */
+		EXPECT_ERRNO(EBUSY,
+			     _test_ioctl_destroy(self->fd, parent_hwpt_id));
+
+		/* Attach device to nested_hwpt_id[0] that then will be busy */
+		test_cmd_mock_domain_replace(self->stdev_id,
+					     nested_hwpt_id[0]);
+		EXPECT_ERRNO(EBUSY,
+			     _test_ioctl_destroy(self->fd, nested_hwpt_id[0]));
+
+		/* Switch from nested_hwpt_id[0] to nested_hwpt_id[1] */
+		test_cmd_mock_domain_replace(self->stdev_id,
+					     nested_hwpt_id[1]);
+		EXPECT_ERRNO(EBUSY,
+			     _test_ioctl_destroy(self->fd, nested_hwpt_id[1]));
+		test_ioctl_destroy(nested_hwpt_id[0]);
+
+		/* Detach from nested_hwpt_id[1] and destroy it */
+		test_cmd_mock_domain_replace(self->stdev_id, parent_hwpt_id);
+		test_ioctl_destroy(nested_hwpt_id[1]);
+
+		/* Detach from the parent hw_pagetable and destroy it */
+		test_cmd_mock_domain_replace(self->stdev_id, self->ioas_id);
+		test_ioctl_destroy(parent_hwpt_id);
+	} else {
+		test_err_cmd_hwpt_alloc(ENOENT, self->device_id, self->ioas_id,
+					&parent_hwpt_id);
+		test_err_cmd_hwpt_alloc_nested(ENOENT, self->device_id,
+					       parent_hwpt_id,
+					       &nested_hwpt_id[0]);
+		test_err_cmd_hwpt_alloc_nested(ENOENT, self->device_id,
+					       parent_hwpt_id,
+					       &nested_hwpt_id[1]);
+		test_err_mock_domain_replace(ENOENT, self->stdev_id,
+					     nested_hwpt_id[0]);
+		test_err_mock_domain_replace(ENOENT, self->stdev_id,
+					     nested_hwpt_id[1]);
+	}
+}
+
 TEST_F(iommufd_ioas, hwpt_attach)
 {
 	/* Create a device attached directly to a hwpt */
@@ -632,6 +727,8 @@ TEST_F(iommufd_ioas, access_pin)
 			       MOCK_FLAGS_ACCESS_CREATE_NEEDS_PIN_PAGES);
 
 	for (npages = 1; npages < BUFFER_SIZE / PAGE_SIZE; npages++) {
+		uint32_t new_hwpt_id = 0;
+		uint32_t mock_device_id;
 		uint32_t mock_stdev_id;
 		uint32_t mock_hwpt_id;
 
@@ -665,12 +762,27 @@ TEST_F(iommufd_ioas, access_pin)
 				   _IOMMU_TEST_CMD(IOMMU_TEST_OP_ACCESS_PAGES),
 				   &access_cmd));
 		test_cmd_mock_domain(self->ioas_id, &mock_stdev_id,
-				     &mock_hwpt_id, NULL);
+				     &mock_hwpt_id, &mock_device_id);
 		check_map_cmd.id = mock_hwpt_id;
+		if (variant->new_hwpt) {
+			test_cmd_hwpt_alloc(mock_device_id, self->ioas_id,
+					    &new_hwpt_id);
+			test_cmd_mock_domain_replace(mock_stdev_id,
+						     new_hwpt_id);
+			check_map_cmd.id = new_hwpt_id;
+		} else {
+			check_map_cmd.id = mock_hwpt_id;
+		}
 		ASSERT_EQ(0, ioctl(self->fd,
 				   _IOMMU_TEST_CMD(IOMMU_TEST_OP_MD_CHECK_MAP),
 				   &check_map_cmd));
 
+		if (variant->new_hwpt) {
+			/* Detach from the new hwpt for its destroy() */
+			test_cmd_mock_domain_replace(mock_stdev_id,
+						     mock_hwpt_id);
+			test_ioctl_destroy(new_hwpt_id);
+		}
 		test_ioctl_destroy(mock_stdev_id);
 		test_cmd_destroy_access_pages(
 			access_cmd.id,
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index b57e1e60f69d..e0e484e1c775 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -116,6 +116,42 @@ static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id,
 
 #define test_cmd_hwpt_alloc(device_id, pt_id, hwpt_id) \
 	ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, hwpt_id))
+#define test_err_cmd_hwpt_alloc(_errno, device_id, pt_id, hwpt_id)     \
+	EXPECT_ERRNO(_errno, _test_cmd_hwpt_alloc(self->fd, device_id, \
+						  pt_id, hwpt_id))
+
+static int _test_cmd_hwpt_alloc_nested(int fd, __u32 device_id, __u32 parent_id,
+				       __u32 *hwpt_id)
+{
+	struct iommu_hwpt_selftest data = {
+		.flags = IOMMU_TEST_FLAG_NESTED,
+		.test_config = IOMMU_TEST_IOTLB_DEFAULT,
+	};
+	struct iommu_hwpt_alloc cmd = {
+		.size = sizeof(cmd),
+		.dev_id = device_id,
+		.pt_id = parent_id,
+		.data_type = IOMMU_HWPT_TYPE_SELFTTEST,
+		.data_len = sizeof(data),
+		.data_uptr = (uint64_t)&data,
+	};
+	int ret;
+
+	ret = ioctl(fd, IOMMU_HWPT_ALLOC, &cmd);
+	if (ret)
+		return ret;
+	if (hwpt_id)
+		*hwpt_id = cmd.out_hwpt_id;
+	return 0;
+}
+
+#define test_cmd_hwpt_alloc_nested(device_id, parent_id, hwpt_id)     \
+	ASSERT_EQ(0, _test_cmd_hwpt_alloc_nested(self->fd, device_id, \
+						 parent_id, hwpt_id))
+#define test_err_cmd_hwpt_alloc_nested(_errno, device_id, parent_id, hwpt_id) \
+	EXPECT_ERRNO(_errno,                                                  \
+		     _test_cmd_hwpt_alloc_nested(self->fd, device_id,         \
+						 parent_id, hwpt_id))
 
 static int _test_cmd_access_set_ioas(int fd, __u32 access_id,
 				     unsigned int ioas_id)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 11/12] iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op
  2023-03-09  8:08 [PATCH 00/12] iommufd: Add nesting infrastructure Yi Liu
                   ` (9 preceding siblings ...)
  2023-03-09  8:09 ` [PATCH 10/12] iommufd/selftest: Add coverage for IOMMU_HWPT_ALLOC with user data Yi Liu
@ 2023-03-09  8:09 ` Yi Liu
  2023-03-09  8:09 ` [PATCH 12/12] iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl Yi Liu
  2023-03-09 14:02 ` [PATCH 00/12] iommufd: Add nesting infrastructure Baolu Lu
  12 siblings, 0 replies; 49+ messages in thread
From: Yi Liu @ 2023-03-09  8:09 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest

From: Nicolin Chen <nicolinc@nvidia.com>

This allows to test whether IOTLB has been invalidated or not.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/iommufd_test.h          |  4 ++++
 drivers/iommu/iommufd/selftest.c              | 22 +++++++++++++++++++
 tools/testing/selftests/iommu/iommufd.c       |  4 ++++
 tools/testing/selftests/iommu/iommufd_utils.h | 14 ++++++++++++
 4 files changed, 44 insertions(+)

diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h
index 8bbb4f4fa1d7..4b60a6df0428 100644
--- a/drivers/iommu/iommufd/iommufd_test.h
+++ b/drivers/iommu/iommufd/iommufd_test.h
@@ -19,6 +19,7 @@ enum {
 	IOMMU_TEST_OP_SET_TEMP_MEMORY_LIMIT,
 	IOMMU_TEST_OP_MOCK_DOMAIN_REPLACE,
 	IOMMU_TEST_OP_ACCESS_SET_IOAS,
+	IOMMU_TEST_OP_MD_CHECK_IOTLB,
 };
 
 enum {
@@ -95,6 +96,9 @@ struct iommu_test_cmd {
 		struct {
 			__u32 ioas_id;
 		} access_set_ioas;
+		struct {
+			__u32 iotlb;
+		} check_iotlb;
 	};
 	__u32 last;
 };
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index b9c7f3bf64b5..a0d57f00f759 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -667,6 +667,25 @@ static int iommufd_test_md_check_refs(struct iommufd_ucmd *ucmd,
 	return 0;
 }
 
+static int iommufd_test_md_check_iotlb(struct iommufd_ucmd *ucmd,
+				       u32 mockpt_id, u32 iotlb)
+{
+	struct iommufd_hw_pagetable *hwpt;
+	struct mock_iommu_domain *mock;
+	int rc = 0;
+
+	hwpt = get_md_pagetable(ucmd, mockpt_id, &mock);
+	if (IS_ERR(hwpt))
+		return PTR_ERR(hwpt);
+
+	mock = container_of(hwpt->domain, struct mock_iommu_domain, domain);
+
+	if (iotlb != mock->iotlb)
+		rc = -EINVAL;
+	iommufd_put_object(&hwpt->obj);
+	return rc;
+}
+
 struct selftest_access {
 	struct iommufd_access *access;
 	struct file *file;
@@ -1077,6 +1096,9 @@ int iommufd_test(struct iommufd_ucmd *ucmd)
 		return iommufd_test_md_check_refs(
 			ucmd, u64_to_user_ptr(cmd->check_refs.uptr),
 			cmd->check_refs.length, cmd->check_refs.refs);
+	case IOMMU_TEST_OP_MD_CHECK_IOTLB:
+		return iommufd_test_md_check_iotlb(ucmd, cmd->id,
+						   cmd->check_iotlb.iotlb);
 	case IOMMU_TEST_OP_CREATE_ACCESS:
 		return iommufd_test_create_access(ucmd,
 						  cmd->create_access.flags);
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index 783afd6ccd2e..10f1592dc9e7 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -313,6 +313,10 @@ TEST_F(iommufd_ioas, nested_hwpt_alloc)
 					   &nested_hwpt_id[0]);
 		test_cmd_hwpt_alloc_nested(self->device_id, parent_hwpt_id,
 					   &nested_hwpt_id[1]);
+		test_cmd_hwpt_check_iotlb(nested_hwpt_id[0],
+					  IOMMU_TEST_IOTLB_DEFAULT);
+		test_cmd_hwpt_check_iotlb(nested_hwpt_id[1],
+					  IOMMU_TEST_IOTLB_DEFAULT);
 
 		/* Negative test: a nested hwpt on top of a nested hwpt */
 		test_err_cmd_hwpt_alloc_nested(EINVAL, self->device_id,
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index e0e484e1c775..211798190e5b 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -120,6 +120,20 @@ static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id,
 	EXPECT_ERRNO(_errno, _test_cmd_hwpt_alloc(self->fd, device_id, \
 						  pt_id, hwpt_id))
 
+#define test_cmd_hwpt_check_iotlb(hwpt_id, expected)                           \
+	({                                                                     \
+		struct iommu_test_cmd test_cmd = {                             \
+			.size = sizeof(test_cmd),                              \
+			.op = IOMMU_TEST_OP_MD_CHECK_IOTLB,                    \
+			.id = hwpt_id,                                         \
+			.check_iotlb = { .iotlb = expected },                  \
+		};                                                             \
+		ASSERT_EQ(0,                                                   \
+			  ioctl(self->fd,                                      \
+				_IOMMU_TEST_CMD(IOMMU_TEST_OP_MD_CHECK_IOTLB), \
+				&test_cmd));                                   \
+	})
+
 static int _test_cmd_hwpt_alloc_nested(int fd, __u32 device_id, __u32 parent_id,
 				       __u32 *hwpt_id)
 {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 12/12] iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl
  2023-03-09  8:08 [PATCH 00/12] iommufd: Add nesting infrastructure Yi Liu
                   ` (10 preceding siblings ...)
  2023-03-09  8:09 ` [PATCH 11/12] iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op Yi Liu
@ 2023-03-09  8:09 ` Yi Liu
  2023-03-09 14:02 ` [PATCH 00/12] iommufd: Add nesting infrastructure Baolu Lu
  12 siblings, 0 replies; 49+ messages in thread
From: Yi Liu @ 2023-03-09  8:09 UTC (permalink / raw)
  To: joro, alex.williamson, jgg, kevin.tian, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, iommu, linux-kernel,
	linux-kselftest

From: Nicolin Chen <nicolinc@nvidia.com>

Add a mock_domain_cache_invalidate_user() and a corresponding struct
iommu_hwpt_invalidate_selftest, to allow to test IOMMU_HWPT_INVALIDATE
from user space, by using the new IOMMU_TEST_OP_MD_CHECK_IOTLB.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/hw_pagetable.c          | 10 +++++++--
 drivers/iommu/iommufd/iommufd_test.h          | 10 +++++++++
 drivers/iommu/iommufd/selftest.c              | 15 +++++++++++++
 tools/testing/selftests/iommu/iommufd.c       |  8 +++++++
 tools/testing/selftests/iommu/iommufd_utils.h | 21 +++++++++++++++++++
 5 files changed, 62 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 0871da848447..15dd1743d888 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -339,7 +339,8 @@ int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd)
 	 * data_len should not exceed the size of iommufd_invalidate_buffer.
 	 */
 	if (cmd->data_type == IOMMU_HWPT_TYPE_DEFAULT || !cmd->data_len ||
-	    cmd->data_type >= ARRAY_SIZE(iommufd_hwpt_invalidate_info_size))
+	    (cmd->data_type != IOMMU_HWPT_TYPE_SELFTTEST &&
+	     cmd->data_type >= ARRAY_SIZE(iommufd_hwpt_invalidate_info_size)))
 		return -EOPNOTSUPP;
 
 	hwpt = iommufd_get_hwpt(ucmd, cmd->hwpt_id);
@@ -352,7 +353,12 @@ int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd)
 		goto out_put_hwpt;
 	}
 
-	klen = iommufd_hwpt_invalidate_info_size[cmd->data_type];
+	if (cmd->data_type != IOMMU_HWPT_TYPE_SELFTTEST)
+		klen = iommufd_hwpt_invalidate_info_size[cmd->data_type];
+#ifdef CONFIG_IOMMUFD_TEST
+	else
+		klen = sizeof(struct iommu_hwpt_invalidate_selftest);
+#endif
 	if (!klen) {
 		rc = -EINVAL;
 		goto out_put_hwpt;
diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h
index 4b60a6df0428..090e9d6abd87 100644
--- a/drivers/iommu/iommufd/iommufd_test.h
+++ b/drivers/iommu/iommufd/iommufd_test.h
@@ -135,4 +135,14 @@ struct iommu_hwpt_selftest {
 	__u64 test_config;
 };
 
+/**
+ * struct iommu_hwpt_invalidate_selftest
+ *
+ * @flags: invalidate flags
+ */
+struct iommu_hwpt_invalidate_selftest {
+#define IOMMU_TEST_INVALIDATE_ALL	(1ULL << 0)
+	__u64 flags;
+};
+
 #endif
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index a0d57f00f759..fe1caba9e2c5 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -351,9 +351,24 @@ static const struct iommu_ops mock_ops = {
 		},
 };
 
+static void mock_domain_cache_invalidate_user(struct iommu_domain *domain,
+					      void *user_data)
+{
+	struct iommu_hwpt_invalidate_selftest *inv_info = user_data;
+	struct mock_iommu_domain *mock =
+		container_of(domain, struct mock_iommu_domain, domain);
+
+	if (domain->type != IOMMU_DOMAIN_NESTED || !mock->parent)
+		return;
+
+	if (inv_info->flags & IOMMU_TEST_INVALIDATE_ALL)
+		mock->iotlb = 0;
+}
+
 static struct iommu_domain_ops domain_nested_ops = {
 	.free = mock_domain_free,
 	.attach_dev = mock_domain_nop_attach,
+	.cache_invalidate_user = mock_domain_cache_invalidate_user,
 };
 
 struct iommu_device mock_iommu_device = {
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index 10f1592dc9e7..89a44479835d 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -126,6 +126,7 @@ TEST_F(iommufd, cmd_length)
 	TEST_LENGTH(iommu_vfio_ioas, IOMMU_VFIO_IOAS);
 	TEST_LENGTH(iommu_hw_info, IOMMU_DEVICE_GET_HW_INFO);
 	TEST_LENGTH(iommu_hwpt_alloc, IOMMU_HWPT_ALLOC);
+	TEST_LENGTH(iommu_hwpt_invalidate, IOMMU_HWPT_INVALIDATE);
 #undef TEST_LENGTH
 }
 
@@ -326,6 +327,13 @@ TEST_F(iommufd_ioas, nested_hwpt_alloc)
 		EXPECT_ERRNO(EBUSY,
 			     _test_ioctl_destroy(self->fd, parent_hwpt_id));
 
+		/* hwpt_invalidate only supports a user-managed hwpt (nested) */
+		test_err_cmd_hwpt_invalidate(EINVAL, parent_hwpt_id);
+		test_cmd_hwpt_invalidate(nested_hwpt_id[0]);
+		test_cmd_hwpt_check_iotlb(nested_hwpt_id[0], 0);
+		test_cmd_hwpt_invalidate(nested_hwpt_id[1]);
+		test_cmd_hwpt_check_iotlb(nested_hwpt_id[1], 0);
+
 		/* Attach device to nested_hwpt_id[0] that then will be busy */
 		test_cmd_mock_domain_replace(self->stdev_id,
 					     nested_hwpt_id[0]);
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index 211798190e5b..b1592b4f2049 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -167,6 +167,27 @@ static int _test_cmd_hwpt_alloc_nested(int fd, __u32 device_id, __u32 parent_id,
 		     _test_cmd_hwpt_alloc_nested(self->fd, device_id,         \
 						 parent_id, hwpt_id))
 
+static int _test_cmd_hwpt_invalidate(int fd, __u32 hwpt_id)
+{
+	struct iommu_hwpt_invalidate_selftest data = {
+		.flags = IOMMU_TEST_INVALIDATE_ALL,
+	};
+	struct iommu_hwpt_invalidate cmd = {
+		.size = sizeof(cmd),
+		.hwpt_id = hwpt_id,
+		.data_type = IOMMU_HWPT_TYPE_SELFTTEST,
+		.data_len = sizeof(data),
+		.data_uptr = (uint64_t)&data,
+	};
+
+	return ioctl(fd, IOMMU_HWPT_INVALIDATE, &cmd);
+}
+
+#define test_cmd_hwpt_invalidate(hwpt_id) \
+	ASSERT_EQ(0, _test_cmd_hwpt_invalidate(self->fd, hwpt_id))
+#define test_err_cmd_hwpt_invalidate(_errno, hwpt_id) \
+	EXPECT_ERRNO(_errno, _test_cmd_hwpt_invalidate(self->fd, hwpt_id))
+
 static int _test_cmd_access_set_ioas(int fd, __u32 access_id,
 				     unsigned int ioas_id)
 {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH 00/12] iommufd: Add nesting infrastructure
  2023-03-09  8:08 [PATCH 00/12] iommufd: Add nesting infrastructure Yi Liu
                   ` (11 preceding siblings ...)
  2023-03-09  8:09 ` [PATCH 12/12] iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl Yi Liu
@ 2023-03-09 14:02 ` Baolu Lu
  12 siblings, 0 replies; 49+ messages in thread
From: Baolu Lu @ 2023-03-09 14:02 UTC (permalink / raw)
  To: Yi Liu, joro, alex.williamson, jgg, kevin.tian, robin.murphy
  Cc: baolu.lu, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit, iommu,
	linux-kernel, linux-kselftest

On 2023/3/9 16:08, Yi Liu wrote:
> Nested translation is a hardware feature that is supported by many modern
> IOMMU hardwares. It has two stages (stage-1, stage-2) address translation
> to get access to the physical address. stage-1 translation table is owned
> by userspace (e.g. by a guest OS), while stage-2 is owned by kernel. Changes
> to stage-1 translation table should be followed by an IOTLB invalidation.

The last sentence is not always true. If any entry of the stage-1
translation table is changed from non-present to present, the IOTLB has
no chance to cache it yet. Hence there's no need for invalidation.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 01/12] iommu: Add new iommu op to create domains owned by userspace
  2023-03-09  8:08 ` [PATCH 01/12] iommu: Add new iommu op to create domains owned by userspace Yi Liu
@ 2023-03-10  0:56   ` Jason Gunthorpe
  2023-03-29 10:56     ` Liu, Yi L
  2023-04-13  0:44     ` Nicolin Chen
  0 siblings, 2 replies; 49+ messages in thread
From: Jason Gunthorpe @ 2023-03-10  0:56 UTC (permalink / raw)
  To: Yi Liu
  Cc: joro, alex.williamson, kevin.tian, robin.murphy, baolu.lu,
	cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest

On Thu, Mar 09, 2023 at 12:08:59AM -0800, Yi Liu wrote:
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 3ef84ee359d2..a269bc62a31c 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -229,6 +229,7 @@ struct iommu_iotlb_gather {
>   *           after use. Return the data buffer if success, or ERR_PTR on
>   *           failure.
>   * @domain_alloc: allocate iommu domain
> + * @domain_alloc_user: allocate user iommu domain
>   * @probe_device: Add device to iommu driver handling
>   * @release_device: Remove device from iommu driver handling
>   * @probe_finalize: Do final setup work after the device is added to an IOMMU
> @@ -266,6 +267,9 @@ struct iommu_ops {
>  
>  	/* Domain allocation and freeing by the iommu driver */
>  	struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type);
> +	struct iommu_domain *(*domain_alloc_user)(struct device *dev,
> +						  struct iommu_domain *parent,
> +						  const void *user_data);

Since the kernel does the copy from user and manages the zero fill
compat maybe this user_data have a union like Robin suggested.

But yes, this is the idea.

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 03/12] iommufd/hw_pagetable: Use domain_alloc_user op for domain allocation
  2023-03-09  8:09 ` [PATCH 03/12] iommufd/hw_pagetable: Use domain_alloc_user op for domain allocation Yi Liu
@ 2023-03-10  1:17   ` Baolu Lu
  0 siblings, 0 replies; 49+ messages in thread
From: Baolu Lu @ 2023-03-10  1:17 UTC (permalink / raw)
  To: Yi Liu, joro, alex.williamson, jgg, kevin.tian, robin.murphy
  Cc: baolu.lu, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit, iommu,
	linux-kernel, linux-kselftest

On 3/9/23 4:09 PM, Yi Liu wrote:
> This converts IOMMUFD to use iommu_domain_alloc_user() for iommu_domain
> creation.

Nit: I don't think this is a "convert" because iommu_domain_alloc() is
still there.

> 
> Suggested-by: Jason Gunthorpe<jgg@nvidia.com>
> Co-developed-by: Nicolin Chen<nicolinc@nvidia.com>
> Signed-off-by: Nicolin Chen<nicolinc@nvidia.com>
> Signed-off-by: Yi Liu<yi.l.liu@intel.com>

Anyway, I agree with the idea,

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>

Best regards,
baolu


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 04/12] iommufd: Pass parent hwpt and user_data to iommufd_hw_pagetable_alloc()
  2023-03-09  8:09 ` [PATCH 04/12] iommufd: Pass parent hwpt and user_data to iommufd_hw_pagetable_alloc() Yi Liu
@ 2023-03-10  2:10   ` Baolu Lu
  2023-03-10 17:49     ` Jason Gunthorpe
  2023-03-17 10:23   ` Tian, Kevin
  1 sibling, 1 reply; 49+ messages in thread
From: Baolu Lu @ 2023-03-10  2:10 UTC (permalink / raw)
  To: Yi Liu, joro, alex.williamson, jgg, kevin.tian, robin.murphy
  Cc: baolu.lu, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit, iommu,
	linux-kernel, linux-kselftest

On 3/9/23 4:09 PM, Yi Liu wrote:
> Nested translation has stage-1 and stage-2 page tables. A stage-1 page
> table is managed by user space, and it needs to work with a stage-2 page
> table, which is a parent hwpt for the stage-1 hwpt.
> 
> iommu core already supports accepting parent iommu_domain and user_data
> to allocate an iommu_domain. This makes iommufd_hw_pagetable_alloc() to
> accept the parent hwpt and user_data, and relays them to iommu core, to
> prepare for supporting hw_pagetable allocation with user_data.
> 
> Also, add a parent pointer in struct iommufd_hw_pagetable for taking and
> releasing its refcount.
> 
> Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>   drivers/iommu/iommufd/device.c          |  2 +-
>   drivers/iommu/iommufd/hw_pagetable.c    | 28 ++++++++++++++++++++++---
>   drivers/iommu/iommufd/iommufd_private.h |  5 ++++-
>   3 files changed, 30 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> index 5c352807d946..19cd6df46c6a 100644
> --- a/drivers/iommu/iommufd/device.c
> +++ b/drivers/iommu/iommufd/device.c
> @@ -558,7 +558,7 @@ iommufd_device_auto_get_domain(struct iommufd_device *idev,
>   	}
>   
>   	hwpt = iommufd_hw_pagetable_alloc(idev->ictx, ioas, idev,
> -					  immediate_attach);
> +					  NULL, NULL, immediate_attach);
>   	if (IS_ERR(hwpt)) {
>   		destroy_hwpt = ERR_CAST(hwpt);
>   		goto out_unlock;
> diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
> index 84b4a11e62f8..16e92a1c150b 100644
> --- a/drivers/iommu/iommufd/hw_pagetable.c
> +++ b/drivers/iommu/iommufd/hw_pagetable.c
> @@ -24,6 +24,8 @@ void iommufd_hw_pagetable_destroy(struct iommufd_object *obj)
>   	if (hwpt->domain)
>   		iommu_domain_free(hwpt->domain);
>   
> +	if (hwpt->parent)
> +		refcount_dec(&hwpt->parent->obj.users);
>   	refcount_dec(&hwpt->ioas->obj.users);
>   }
>   
> @@ -46,6 +48,8 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt)
>    * @ictx: iommufd context
>    * @ioas: IOAS to associate the domain with
>    * @idev: Device to get an iommu_domain for
> + * @parent: Optional parent HWPT to associate with the domain with
> + * @user_data: Optional user_data pointer
>    * @immediate_attach: True if idev should be attached to the hwpt
>    *
>    * Allocate a new iommu_domain and return it as a hw_pagetable. The HWPT
> @@ -54,14 +58,20 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt)
>    */
>   struct iommufd_hw_pagetable *
>   iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
> -			   struct iommufd_device *idev, bool immediate_attach)
> +			   struct iommufd_device *idev,
> +			   struct iommufd_hw_pagetable *parent,
> +			   void *user_data, bool immediate_attach)
>   {
>   	const struct iommu_ops *ops = dev_iommu_ops(idev->dev);
> +	struct iommu_domain *parent_domain = NULL;
>   	struct iommufd_hw_pagetable *hwpt;
>   	int rc;
>   
>   	lockdep_assert_held(&ioas->mutex);
>   
> +	if (parent && !ops->domain_alloc_user)
> +		return ERR_PTR(-EOPNOTSUPP);

My understanding here is that we are checking whether domain_alloc_user
is required. It seems that as long as the caller inputs a valid
user_data or parent, domain_alloc_user is required. If so,

	if ((user_data || parent) && !ops->domain_alloc_user)
		return ERR_PTR(-EOPNOTSUPP);

Best regards,
baolu

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 05/12] iommufd/hw_pagetable: Do not populate user-managed hw_pagetables
  2023-03-09  8:09 ` [PATCH 05/12] iommufd/hw_pagetable: Do not populate user-managed hw_pagetables Yi Liu
@ 2023-03-10  2:25   ` Baolu Lu
  2023-03-10  6:50     ` Nicolin Chen
  2023-03-10 15:29   ` Jason Gunthorpe
  1 sibling, 1 reply; 49+ messages in thread
From: Baolu Lu @ 2023-03-10  2:25 UTC (permalink / raw)
  To: Yi Liu, joro, alex.williamson, jgg, kevin.tian, robin.murphy
  Cc: baolu.lu, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit, iommu,
	linux-kernel, linux-kselftest

On 3/9/23 4:09 PM, Yi Liu wrote:
> From: Nicolin Chen <nicolinc@nvidia.com>
> 
> A user-managed hw_pagetable does not need to get populated, since it is
> managed by a guest OS. Move the iopt_table_add_domain and list_add_tail
> calls into a helper, where the hwpt pointer will be redirected to its
> hwpt->parent if it's available.
> 
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>   drivers/iommu/iommufd/hw_pagetable.c | 20 ++++++++++++++++++--
>   1 file changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
> index 16e92a1c150b..6e45ec0a66fa 100644
> --- a/drivers/iommu/iommufd/hw_pagetable.c
> +++ b/drivers/iommu/iommufd/hw_pagetable.c
> @@ -43,6 +43,23 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt)
>   	return 0;
>   }
>   
> +static int iommufd_hw_pagetable_link_ioas(struct iommufd_hw_pagetable *hwpt)
> +{
> +	int rc;
> +
> +	if (hwpt->parent)
> +		hwpt = hwpt->parent;
> +
> +	if (!list_empty(&hwpt->hwpt_item))
> +		return 0;

What is above check for? Is it "the hwpt has already been inserted into
the hwpt list of its ioas in another place"?

If so, is it possible that hwpt will be deleted from the list even when
this user hwpt is still linked to the ioas?

> +
> +	rc = iopt_table_add_domain(&hwpt->ioas->iopt, hwpt->domain);
> +	if (rc)
> +		return rc;
> +	list_add_tail(&hwpt->hwpt_item, &hwpt->ioas->hwpt_list);
> +	return 0;
> +}
> +
>   /**
>    * iommufd_hw_pagetable_alloc() - Get an iommu_domain for a device
>    * @ictx: iommufd context
> @@ -131,10 +148,9 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
>   			goto out_unlock;
>   	}
>   
> -	rc = iopt_table_add_domain(&hwpt->ioas->iopt, hwpt->domain);
> +	rc = iommufd_hw_pagetable_link_ioas(hwpt);
>   	if (rc)
>   		goto out_detach;
> -	list_add_tail(&hwpt->hwpt_item, &hwpt->ioas->hwpt_list);
>   
>   	mutex_unlock(&idev->igroup->lock);
>   	return hwpt;

Best regards,
baolu

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 06/12] iommufd: IOMMU_HWPT_ALLOC allocation with user data
  2023-03-09  8:09 ` [PATCH 06/12] iommufd: IOMMU_HWPT_ALLOC allocation with user data Yi Liu
@ 2023-03-10  3:02   ` Baolu Lu
  2023-03-23  8:11     ` Liu, Yi L
  0 siblings, 1 reply; 49+ messages in thread
From: Baolu Lu @ 2023-03-10  3:02 UTC (permalink / raw)
  To: Yi Liu, joro, alex.williamson, jgg, kevin.tian, robin.murphy
  Cc: baolu.lu, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit, iommu,
	linux-kernel, linux-kselftest

On 3/9/23 4:09 PM, Yi Liu wrote:
> IOMMU_HWPT_ALLOC already supports iommu_domain allocation for usersapce.
> But it can only allocate hw_pagetables linked with IOAS. There are needs
> to support hw_pagetable allocation with parameters specified by user. For
> example, in nested translation, user needs to allocate hw_pagetable for
> the stage-1 translation (e.g. a single I/O page table or a set of I/O page
> tables) with user data. It also needs provide a stage-2 hw_pagetable which
> is linked to the GPA IOAS.
> 
> This extends IOMMU_HWPT_ALLOC to accept user specified parameter and hwpt
> ID in @pt_id field. Such as the user-managed stage-1 hwpt, which requires
> a parent hwpt to point to stage-2 translation.
> 
> enum iommu_hwpt_type is defined to differentiate the user parameters use
> by different usages. For the allocations that don't require user parameter,
> IOMMU_HWPT_TYPE_DEFAULT is defined for backward compatibility. Other types
> would be added by future iommu vendor driver extensions.
> 
> Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>   drivers/iommu/iommufd/hw_pagetable.c | 94 +++++++++++++++++++++++++---
>   drivers/iommu/iommufd/main.c         |  2 +-
>   include/uapi/linux/iommufd.h         | 30 +++++++++
>   3 files changed, 115 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
> index 6e45ec0a66fa..64e7cf7142e1 100644
> --- a/drivers/iommu/iommufd/hw_pagetable.c
> +++ b/drivers/iommu/iommufd/hw_pagetable.c
> @@ -165,34 +165,106 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
>   	return ERR_PTR(rc);
>   }
>   
> +/*
> + * size of page table type specific data, indexed by
> + * enum iommu_hwpt_type.
> + */
> +static const size_t iommufd_hwpt_alloc_data_size[] = {
> +	[IOMMU_HWPT_TYPE_DEFAULT] = 0,
> +};
> +
>   int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
>   {
>   	struct iommu_hwpt_alloc *cmd = ucmd->cmd;
> -	struct iommufd_hw_pagetable *hwpt;
> +	struct iommufd_hw_pagetable *hwpt, *parent = NULL;
> +	struct iommufd_object *pt_obj;
>   	struct iommufd_device *idev;
>   	struct iommufd_ioas *ioas;
> +	const struct iommu_ops *ops;
> +	void *data = NULL;
> +	u32 klen;
>   	int rc;

Reverse Christmas tree format. Ditto to other places in this file.

>   
> -	if (cmd->flags)
> +	if (cmd->__reserved || cmd->flags)
>   		return -EOPNOTSUPP;
>   
>   	idev = iommufd_get_device(ucmd, cmd->dev_id);
>   	if (IS_ERR(idev))
>   		return PTR_ERR(idev);
>   
> -	ioas = iommufd_get_ioas(ucmd, cmd->pt_id);
> -	if (IS_ERR(ioas)) {
> -		rc = PTR_ERR(ioas);
> +	ops = dev_iommu_ops(idev->dev);
> +	if (!ops) {
> +		rc = -EOPNOTSUPP;
>   		goto out_put_idev;
>   	}

No need to check. dev_iommu_ops() will never returns a NULL.

>   
> +	/* Only support IOMMU_HWPT_TYPE_DEFAULT for now */
> +	if (cmd->data_type != IOMMU_HWPT_TYPE_DEFAULT) {
> +		rc = -EINVAL;
> +		goto out_put_idev;
> +	}
> +
> +	pt_obj = iommufd_get_object(ucmd->ictx, cmd->pt_id, IOMMUFD_OBJ_ANY);
> +	if (IS_ERR(pt_obj)) {
> +		rc = -EINVAL;
> +		goto out_put_idev;
> +	}
> +
> +	switch (pt_obj->type) {
> +	case IOMMUFD_OBJ_IOAS:
> +		ioas = container_of(pt_obj, struct iommufd_ioas, obj);
> +		break;
> +	case IOMMUFD_OBJ_HW_PAGETABLE:
> +		/* pt_id points HWPT only when data_type is !IOMMU_HWPT_TYPE_DEFAULT */
> +		if (cmd->data_type == IOMMU_HWPT_TYPE_DEFAULT) {
> +			rc = -EINVAL;
> +			goto out_put_pt;
> +		}
> +
> +		parent = container_of(pt_obj, struct iommufd_hw_pagetable, obj);
> +		/*
> +		 * Cannot allocate user-managed hwpt linking to auto_created
> +		 * hwpt. If the parent hwpt is already a user-managed hwpt,
> +		 * don't allocate another user-managed hwpt linking to it.
> +		 */
> +		if (parent->auto_domain || parent->parent) {
> +			rc = -EINVAL;
> +			goto out_put_pt;
> +		}
> +		ioas = parent->ioas;
> +		break;
> +	default:
> +		rc = -EINVAL;
> +		goto out_put_pt;
> +	}
> +
> +	klen = iommufd_hwpt_alloc_data_size[cmd->data_type];
> +	if (klen) {
> +		if (!cmd->data_len) {
> +			rc = -EINVAL;
> +			goto out_put_pt;
> +		}

Is the user_data still valid if (cmd->data_len < klen)?

> +
> +		data = kzalloc(klen, GFP_KERNEL);
> +		if (!data) {
> +			rc = -ENOMEM;
> +			goto out_put_pt;
> +		}
> +
> +		rc = copy_struct_from_user(data, klen,
> +					   u64_to_user_ptr(cmd->data_uptr),
> +					   cmd->data_len);
> +		if (rc)
> +			goto out_free_data;
> +	}
> +
>   	mutex_lock(&ioas->mutex);
>   	hwpt = iommufd_hw_pagetable_alloc(ucmd->ictx, ioas, idev,
> -					  NULL, NULL, false);
> +					  parent, data, false);
>   	mutex_unlock(&ioas->mutex);
>   	if (IS_ERR(hwpt)) {
>   		rc = PTR_ERR(hwpt);
> -		goto out_put_ioas;
> +		goto out_free_data;
>   	}
>   
>   	cmd->out_hwpt_id = hwpt->obj.id;
> @@ -200,12 +272,14 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
>   	if (rc)
>   		goto out_hwpt;
>   	iommufd_object_finalize(ucmd->ictx, &hwpt->obj);
> -	goto out_put_ioas;
> +	goto out_free_data;
>   
>   out_hwpt:
>   	iommufd_object_abort_and_destroy(ucmd->ictx, &hwpt->obj);
> -out_put_ioas:
> -	iommufd_put_object(&ioas->obj);
> +out_free_data:
> +	kfree(data);
> +out_put_pt:
> +	iommufd_put_object(pt_obj);
>   out_put_idev:
>   	iommufd_put_object(&idev->obj);
>   	return rc;
> diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
> index f079c0bda46b..7ab1e2c638a1 100644
> --- a/drivers/iommu/iommufd/main.c
> +++ b/drivers/iommu/iommufd/main.c
> @@ -295,7 +295,7 @@ struct iommufd_ioctl_op {
>   static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
>   	IOCTL_OP(IOMMU_DESTROY, iommufd_destroy, struct iommu_destroy, id),
>   	IOCTL_OP(IOMMU_HWPT_ALLOC, iommufd_hwpt_alloc, struct iommu_hwpt_alloc,
> -		 __reserved),
> +		 data_uptr),
>   	IOCTL_OP(IOMMU_DEVICE_GET_HW_INFO, iommufd_device_get_hw_info,
>   		 struct iommu_hw_info, __reserved),
>   	IOCTL_OP(IOMMU_IOAS_ALLOC, iommufd_ioas_alloc_ioctl,
> diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
> index 4ac525897b82..48781ff40a37 100644
> --- a/include/uapi/linux/iommufd.h
> +++ b/include/uapi/linux/iommufd.h
> @@ -347,6 +347,14 @@ struct iommu_vfio_ioas {
>   };
>   #define IOMMU_VFIO_IOAS _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VFIO_IOAS)
>   
> +/**
> + * enum iommu_hwpt_type - IOMMU HWPT Type
> + * @IOMMU_HWPT_TYPE_DEFAULT: default
> + */
> +enum iommu_hwpt_type {
> +	IOMMU_HWPT_TYPE_DEFAULT,
> +};
> +
>   /**
>    * struct iommu_hwpt_alloc - ioctl(IOMMU_HWPT_ALLOC)
>    * @size: sizeof(struct iommu_hwpt_alloc)
> @@ -355,12 +363,31 @@ struct iommu_vfio_ioas {
>    * @pt_id: The IOAS to connect this HWPT to
>    * @out_hwpt_id: The ID of the new HWPT
>    * @__reserved: Must be 0
> + * @data_type: One of enum iommu_hwpt_type
> + * @data_len: Length of the type specific data
> + * @data_uptr: User pointer to the type specific data
>    *
>    * Explicitly allocate a hardware page table object. This is the same object
>    * type that is returned by iommufd_device_attach() and represents the
>    * underlying iommu driver's iommu_domain kernel object.
>    *
>    * A normal HWPT will be created with the mappings from the given IOAS.
> + * The @data_type for its allocation can be set to IOMMU_HWPT_TYPE_DEFAULT, or
> + * another type (being listed below) to specialize a kernel-managed HWPT.
> + *
> + * A user-managed HWPT will be created from a given parent HWPT via @pt_id, in
> + * which the parent HWPT must be allocated previously via the same ioctl from a
> + * given IOAS. The @data_type must not be set to IOMMU_HWPT_TYPE_DEFAULT but a
> + * pre-defined type corresponding to the underlying IOMMU hardware.
> + *
> + * If the @data_type is set to IOMMU_HWPT_TYPE_DEFAULT, both the @data_len and
> + * the @data_uptr will be ignored. Otherwise, both of them must be given.
> + *
> + * +==============================+=====================================+===========+
> + * | @data_type                   |    Data structure in @data_uptr     |   @pt_id  |
> + * +------------------------------+-------------------------------------+-----------+
> + * | IOMMU_HWPT_TYPE_DEFAULT      |               N/A                   |    IOAS   |
> + * +------------------------------+-------------------------------------+-----------+
>    */
>   struct iommu_hwpt_alloc {
>   	__u32 size;
> @@ -369,6 +396,9 @@ struct iommu_hwpt_alloc {
>   	__u32 pt_id;
>   	__u32 out_hwpt_id;
>   	__u32 __reserved;
> +	__u32 data_type;
> +	__u32 data_len;
> +	__aligned_u64 data_uptr;
>   };
>   #define IOMMU_HWPT_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_ALLOC)

Best regards,
baolu

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 07/12] iommufd: Add IOMMU_HWPT_INVALIDATE
  2023-03-09  8:09 ` [PATCH 07/12] iommufd: Add IOMMU_HWPT_INVALIDATE Yi Liu
@ 2023-03-10  3:15   ` Baolu Lu
  2023-03-14  4:12     ` Liu, Yi L
  2023-03-10 17:50   ` Jason Gunthorpe
  1 sibling, 1 reply; 49+ messages in thread
From: Baolu Lu @ 2023-03-10  3:15 UTC (permalink / raw)
  To: Yi Liu, joro, alex.williamson, jgg, kevin.tian, robin.murphy
  Cc: baolu.lu, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit, iommu,
	linux-kernel, linux-kselftest

On 3/9/23 4:09 PM, Yi Liu wrote:
> In nested translation, the stage-1 page table is user-managed and used
> by IOMMU hardware, so destroying mappings in the stage-1 page table should
> be followed with an IOTLB invalidation.

s/destroying mappings/update of any present page table entry/

> This adds IOMMU_HWPT_INVALIDATE for IOTLB invalidation.
> 
> Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>   drivers/iommu/iommufd/hw_pagetable.c    | 56 +++++++++++++++++++++++++
>   drivers/iommu/iommufd/iommufd_private.h |  9 ++++
>   drivers/iommu/iommufd/main.c            |  3 ++
>   include/uapi/linux/iommufd.h            | 27 ++++++++++++
>   4 files changed, 95 insertions(+)
> 
> diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
> index 64e7cf7142e1..67facca98de1 100644
> --- a/drivers/iommu/iommufd/hw_pagetable.c
> +++ b/drivers/iommu/iommufd/hw_pagetable.c
> @@ -284,3 +284,59 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
>   	iommufd_put_object(&idev->obj);
>   	return rc;
>   }
> +
> +/*
> + * size of page table type specific invalidate_info, indexed by
> + * enum iommu_hwpt_type.
> + */
> +static const size_t iommufd_hwpt_invalidate_info_size[] = {};
> +
> +int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd)
> +{
> +	struct iommu_hwpt_invalidate *cmd = ucmd->cmd;
> +	struct iommufd_hw_pagetable *hwpt;
> +	u64 user_ptr;
> +	u32 user_data_len, klen;
> +	int rc = 0;
> +
> +	/*
> +	 * For a user-managed HWPT, type should not be IOMMU_HWPT_TYPE_DEFAULT.
> +	 * data_len should not exceed the size of iommufd_invalidate_buffer.
> +	 */
> +	if (cmd->data_type == IOMMU_HWPT_TYPE_DEFAULT || !cmd->data_len ||
> +	    cmd->data_type >= ARRAY_SIZE(iommufd_hwpt_invalidate_info_size))

"data_len should not exceed the size of iommufd_invalidate_buffer."

How is this checked?

> +		return -EOPNOTSUPP;
> +
> +	hwpt = iommufd_get_hwpt(ucmd, cmd->hwpt_id);
> +	if (IS_ERR(hwpt))
> +		return PTR_ERR(hwpt);
> +
> +	/* Do not allow any kernel-managed hw_pagetable */
> +	if (!hwpt->parent) {
> +		rc = -EINVAL;
> +		goto out_put_hwpt;
> +	}
> +
> +	klen = iommufd_hwpt_invalidate_info_size[cmd->data_type];
> +	if (!klen) {
> +		rc = -EINVAL;
> +		goto out_put_hwpt;
> +	}
> +
> +	/*
> +	 * Copy the needed fields before reusing the ucmd buffer, this
> +	 * avoids memory allocation in this path.
> +	 */
> +	user_ptr = cmd->data_uptr;
> +	user_data_len = cmd->data_len;

Is it a valid case if "user_data_len < klen"?

> +
> +	rc = copy_struct_from_user(cmd, klen,
> +				   u64_to_user_ptr(user_ptr), user_data_len);
> +	if (rc)
> +		goto out_put_hwpt;
> +
> +	hwpt->domain->ops->cache_invalidate_user(hwpt->domain, cmd);
> +out_put_hwpt:
> +	iommufd_put_object(&hwpt->obj);
> +	return rc;
> +}
> diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
> index 182c074eecdc..d879264d1acf 100644
> --- a/drivers/iommu/iommufd/iommufd_private.h
> +++ b/drivers/iommu/iommufd/iommufd_private.h
> @@ -265,6 +265,7 @@ struct iommufd_hw_pagetable *
>   iommufd_hw_pagetable_detach(struct iommufd_device *idev);
>   void iommufd_hw_pagetable_destroy(struct iommufd_object *obj);
>   int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd);
> +int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd);
>   
>   static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx,
>   					    struct iommufd_hw_pagetable *hwpt)
> @@ -276,6 +277,14 @@ static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx,
>   		refcount_dec(&hwpt->obj.users);
>   }
>   
> +static inline struct iommufd_hw_pagetable *
> +iommufd_get_hwpt(struct iommufd_ucmd *ucmd, u32 id)
> +{
> +	return container_of(iommufd_get_object(ucmd->ictx, id,
> +					       IOMMUFD_OBJ_HW_PAGETABLE),
> +			    struct iommufd_hw_pagetable, obj);
> +}
> +
>   struct iommufd_group {
>   	struct kref ref;
>   	struct mutex lock;
> diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
> index 7ab1e2c638a1..2cf45f65b637 100644
> --- a/drivers/iommu/iommufd/main.c
> +++ b/drivers/iommu/iommufd/main.c
> @@ -263,6 +263,7 @@ union ucmd_buffer {
>   	struct iommu_destroy destroy;
>   	struct iommu_hwpt_alloc hwpt;
>   	struct iommu_hw_info info;
> +	struct iommu_hwpt_invalidate cache;
>   	struct iommu_ioas_alloc alloc;
>   	struct iommu_ioas_allow_iovas allow_iovas;
>   	struct iommu_ioas_copy ioas_copy;
> @@ -298,6 +299,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
>   		 data_uptr),
>   	IOCTL_OP(IOMMU_DEVICE_GET_HW_INFO, iommufd_device_get_hw_info,
>   		 struct iommu_hw_info, __reserved),
> +	IOCTL_OP(IOMMU_HWPT_INVALIDATE, iommufd_hwpt_invalidate,
> +		 struct iommu_hwpt_invalidate, data_uptr),
>   	IOCTL_OP(IOMMU_IOAS_ALLOC, iommufd_ioas_alloc_ioctl,
>   		 struct iommu_ioas_alloc, out_ioas_id),
>   	IOCTL_OP(IOMMU_IOAS_ALLOW_IOVAS, iommufd_ioas_allow_iovas,
> diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
> index 48781ff40a37..d0962c41f8d6 100644
> --- a/include/uapi/linux/iommufd.h
> +++ b/include/uapi/linux/iommufd.h
> @@ -47,6 +47,7 @@ enum {
>   	IOMMUFD_CMD_VFIO_IOAS,
>   	IOMMUFD_CMD_HWPT_ALLOC,
>   	IOMMUFD_CMD_DEVICE_GET_HW_INFO,
> +	IOMMUFD_CMD_HWPT_INVALIDATE,
>   };
>   
>   /**
> @@ -447,4 +448,30 @@ struct iommu_hw_info {
>   	__u32 __reserved;
>   };
>   #define IOMMU_DEVICE_GET_HW_INFO _IO(IOMMUFD_TYPE, IOMMUFD_CMD_DEVICE_GET_HW_INFO)
> +
> +/**
> + * struct iommu_hwpt_invalidate - ioctl(IOMMU_HWPT_INVALIDATE)
> + * @size: sizeof(struct iommu_hwpt_invalidate)
> + * @hwpt_id: HWPT ID of target hardware page table for the invalidation
> + * @data_type: One of enum iommu_hwpt_type
> + * @data_len: Length of the type specific data
> + * @data_uptr: User pointer to the type specific data
> + *
> + * Invalidate the iommu cache for user-managed page table. Modifications
> + * on user-managed page table should be followed with this operation to
> + * sync the IOTLB. This is only needed by user-managed hw_pagetables, so
> + * the @data_type should never be IOMMU_HWPT_TYPE_DEFAULT.
> + *
> + * +==============================+========================================+
> + * | @data_type                   |     Data structure in @data_uptr       |
> + * +------------------------------+----------------------------------------+
> + */
> +struct iommu_hwpt_invalidate {
> +	__u32 size;
> +	__u32 hwpt_id;
> +	__u32 data_type;
> +	__u32 data_len;
> +	__aligned_u64 data_uptr;
> +};
> +#define IOMMU_HWPT_INVALIDATE _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_INVALIDATE)
>   #endif

Best regards,
baolu

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 08/12] iommufd/device: Report supported hwpt_types
  2023-03-09  8:09 ` [PATCH 08/12] iommufd/device: Report supported hwpt_types Yi Liu
@ 2023-03-10  3:30   ` Baolu Lu
  2023-03-10  7:10     ` Nicolin Chen
  0 siblings, 1 reply; 49+ messages in thread
From: Baolu Lu @ 2023-03-10  3:30 UTC (permalink / raw)
  To: Yi Liu, joro, alex.williamson, jgg, kevin.tian, robin.murphy
  Cc: baolu.lu, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit, iommu,
	linux-kernel, linux-kselftest

On 3/9/23 4:09 PM, Yi Liu wrote:
> This provides a way for userspace to probe the supported hwpt data
> types by kernel. Currently, kernel only supports IOMMU_HWPT_TYPE_DEFAULT,
> new types would be added per vendor drivers' extension.
> 
> Userspace that wants to allocate hw_pagetable with user data should check
> this. While for the allocation without user data, no need for it. It is
> supported by default.
> 
> Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>   drivers/iommu/iommufd/device.c          |  1 +
>   drivers/iommu/iommufd/hw_pagetable.c    | 18 +++++++++++++++---
>   drivers/iommu/iommufd/iommufd_private.h |  2 ++
>   drivers/iommu/iommufd/main.c            |  2 +-
>   include/uapi/linux/iommufd.h            |  8 ++++++++
>   5 files changed, 27 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> index 19cd6df46c6a..0328071dcac1 100644
> --- a/drivers/iommu/iommufd/device.c
> +++ b/drivers/iommu/iommufd/device.c
> @@ -322,6 +322,7 @@ int iommufd_device_get_hw_info(struct iommufd_ucmd *ucmd)
>   
>   	cmd->out_data_type = ops->driver_type;
>   	cmd->data_len = length;
> +	cmd->out_hwpt_type_bitmap = iommufd_hwpt_type_bitmaps[ops->driver_type];
>   
>   	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
>   
> diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
> index 67facca98de1..160712256c64 100644
> --- a/drivers/iommu/iommufd/hw_pagetable.c
> +++ b/drivers/iommu/iommufd/hw_pagetable.c
> @@ -173,6 +173,14 @@ static const size_t iommufd_hwpt_alloc_data_size[] = {
>   	[IOMMU_HWPT_TYPE_DEFAULT] = 0,
>   };
>   
> +/*
> + * bitmaps of supported hwpt types of by underlying iommu, indexed
> + * by ops->driver_type which is one of enum iommu_hw_info_type.
> + */
> +const u64 iommufd_hwpt_type_bitmaps[] =  {
> +	[IOMMU_HW_INFO_TYPE_DEFAULT] = BIT_ULL(IOMMU_HWPT_TYPE_DEFAULT),
> +};

I am a bit confused here. Why do you need this array? What I read is
that you want to convert ops->driver_type to a bit position in
cmd->out_hwpt_type_bitmap.

Am I getting it right?

If so, why not just
	cmd->out_hwpt_type_bitmap = BIT_ULL(ops->driver_type);

?	

Best regards,
baolu

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 05/12] iommufd/hw_pagetable: Do not populate user-managed hw_pagetables
  2023-03-10  2:25   ` Baolu Lu
@ 2023-03-10  6:50     ` Nicolin Chen
  2023-03-10 12:51       ` Baolu Lu
  2023-03-23  8:06       ` Liu, Yi L
  0 siblings, 2 replies; 49+ messages in thread
From: Nicolin Chen @ 2023-03-10  6:50 UTC (permalink / raw)
  To: Yi Liu, Baolu Lu
  Cc: joro, alex.williamson, jgg, kevin.tian, robin.murphy, cohuck,
	eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	iommu, linux-kernel, linux-kselftest

On Fri, Mar 10, 2023 at 10:25:10AM +0800, Baolu Lu wrote:
> External email: Use caution opening links or attachments
> 
> 
> On 3/9/23 4:09 PM, Yi Liu wrote:
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > 
> > A user-managed hw_pagetable does not need to get populated, since it is
> > managed by a guest OS. Move the iopt_table_add_domain and list_add_tail
> > calls into a helper, where the hwpt pointer will be redirected to its
> > hwpt->parent if it's available.
> > 
> > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >   drivers/iommu/iommufd/hw_pagetable.c | 20 ++++++++++++++++++--
> >   1 file changed, 18 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
> > index 16e92a1c150b..6e45ec0a66fa 100644
> > --- a/drivers/iommu/iommufd/hw_pagetable.c
> > +++ b/drivers/iommu/iommufd/hw_pagetable.c
> > @@ -43,6 +43,23 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt)
> >       return 0;
> >   }
> > 
> > +static int iommufd_hw_pagetable_link_ioas(struct iommufd_hw_pagetable *hwpt)
> > +{
> > +     int rc;
> > +
> > +     if (hwpt->parent)
> > +             hwpt = hwpt->parent;
> > +
> > +     if (!list_empty(&hwpt->hwpt_item))
> > +             return 0;
> 
> What is above check for? Is it "the hwpt has already been inserted into
> the hwpt list of its ioas in another place"?
> 
> If so, is it possible that hwpt will be deleted from the list even when
> this user hwpt is still linked to the ioas?

It means that the hwpt is already linked to the ioas. And the
hwpt_item can be only empty after a destroy().

With that being said, after I think it through, perhaps Yi's
previous change removing it might be better. So, it could be:

-------------------------------------------------------------------------------
+	/*
+	 * Only a parent hwpt needs to be linked to the IOAS. And a hwpt->parent
+	 * must be linked to the IOAS already, when it's being allocated.
+	 */
 	if (hwpt->parent)
-		hwpt = hwpt->parent;
-
-	if (!list_empty(&hwpt->hwpt_item))
 		return 0;
 
-------------------------------------------------------------------------------

I was concerned about the case where a device gets attached to
the nested hwpt without staging at the parent hwpt first. But,
the link between the parent hwpt and the IOAS happened inside
the allocation function now, not attach() any more.

Thanks
Nic

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 08/12] iommufd/device: Report supported hwpt_types
  2023-03-10  3:30   ` Baolu Lu
@ 2023-03-10  7:10     ` Nicolin Chen
  2023-03-10  7:39       ` Liu, Yi L
  0 siblings, 1 reply; 49+ messages in thread
From: Nicolin Chen @ 2023-03-10  7:10 UTC (permalink / raw)
  To: Baolu Lu
  Cc: Yi Liu, joro, alex.williamson, jgg, kevin.tian, robin.murphy,
	cohuck, eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	iommu, linux-kernel, linux-kselftest

On Fri, Mar 10, 2023 at 11:30:04AM +0800, Baolu Lu wrote:
> External email: Use caution opening links or attachments
> 
> 
> On 3/9/23 4:09 PM, Yi Liu wrote:
> > This provides a way for userspace to probe the supported hwpt data
> > types by kernel. Currently, kernel only supports IOMMU_HWPT_TYPE_DEFAULT,
> > new types would be added per vendor drivers' extension.
> > 
> > Userspace that wants to allocate hw_pagetable with user data should check
> > this. While for the allocation without user data, no need for it. It is
> > supported by default.
> > 
> > Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
> > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >   drivers/iommu/iommufd/device.c          |  1 +
> >   drivers/iommu/iommufd/hw_pagetable.c    | 18 +++++++++++++++---
> >   drivers/iommu/iommufd/iommufd_private.h |  2 ++
> >   drivers/iommu/iommufd/main.c            |  2 +-
> >   include/uapi/linux/iommufd.h            |  8 ++++++++
> >   5 files changed, 27 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> > index 19cd6df46c6a..0328071dcac1 100644
> > --- a/drivers/iommu/iommufd/device.c
> > +++ b/drivers/iommu/iommufd/device.c
> > @@ -322,6 +322,7 @@ int iommufd_device_get_hw_info(struct iommufd_ucmd *ucmd)
> > 
> >       cmd->out_data_type = ops->driver_type;
> >       cmd->data_len = length;
> > +     cmd->out_hwpt_type_bitmap = iommufd_hwpt_type_bitmaps[ops->driver_type];
> > 
> >       rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
> > 
> > diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
> > index 67facca98de1..160712256c64 100644
> > --- a/drivers/iommu/iommufd/hw_pagetable.c
> > +++ b/drivers/iommu/iommufd/hw_pagetable.c
> > @@ -173,6 +173,14 @@ static const size_t iommufd_hwpt_alloc_data_size[] = {
> >       [IOMMU_HWPT_TYPE_DEFAULT] = 0,
> >   };
> > 
> > +/*
> > + * bitmaps of supported hwpt types of by underlying iommu, indexed
> > + * by ops->driver_type which is one of enum iommu_hw_info_type.
> > + */
> > +const u64 iommufd_hwpt_type_bitmaps[] =  {
> > +     [IOMMU_HW_INFO_TYPE_DEFAULT] = BIT_ULL(IOMMU_HWPT_TYPE_DEFAULT),
> > +};
> 
> I am a bit confused here. Why do you need this array? What I read is
> that you want to convert ops->driver_type to a bit position in
> cmd->out_hwpt_type_bitmap.
> 
> Am I getting it right?
> 
> If so, why not just
>        cmd->out_hwpt_type_bitmap = BIT_ULL(ops->driver_type);
> 
> ?

A driver_type would be IOMMUFD_HW_INFO_TYPExx. What's inside the
BIT_ULL is IOMMUFD_HWPT_TYPE_*. It seems to get a bit confusing
after several rounds of renaming though. And they do seem to be
a bit of duplications at the actual values, at least for now.

I recall that we only had one enum for the types, yet later did
a separation into driver_types and hwpt_types after a discussion
regarding future PASID support?

Thanks
Nic

^ permalink raw reply	[flat|nested] 49+ messages in thread

* RE: [PATCH 08/12] iommufd/device: Report supported hwpt_types
  2023-03-10  7:10     ` Nicolin Chen
@ 2023-03-10  7:39       ` Liu, Yi L
  2023-03-10  7:45         ` Nicolin Chen
  0 siblings, 1 reply; 49+ messages in thread
From: Liu, Yi L @ 2023-03-10  7:39 UTC (permalink / raw)
  To: Nicolin Chen, Baolu Lu
  Cc: joro, alex.williamson, jgg, Tian, Kevin, robin.murphy, cohuck,
	eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	iommu, linux-kernel, linux-kselftest

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Friday, March 10, 2023 3:10 PM
> 
> On Fri, Mar 10, 2023 at 11:30:04AM +0800, Baolu Lu wrote:
> > External email: Use caution opening links or attachments
> >
> >
> > On 3/9/23 4:09 PM, Yi Liu wrote:
> > > This provides a way for userspace to probe the supported hwpt data
> > > types by kernel. Currently, kernel only supports
> IOMMU_HWPT_TYPE_DEFAULT,
> > > new types would be added per vendor drivers' extension.
> > >
> > > Userspace that wants to allocate hw_pagetable with user data should
> check
> > > this. While for the allocation without user data, no need for it. It is
> > > supported by default.
> > >
> > > Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
> > > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> > > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > > ---
> > >   drivers/iommu/iommufd/device.c          |  1 +
> > >   drivers/iommu/iommufd/hw_pagetable.c    | 18 +++++++++++++++---
> > >   drivers/iommu/iommufd/iommufd_private.h |  2 ++
> > >   drivers/iommu/iommufd/main.c            |  2 +-
> > >   include/uapi/linux/iommufd.h            |  8 ++++++++
> > >   5 files changed, 27 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/iommu/iommufd/device.c
> b/drivers/iommu/iommufd/device.c
> > > index 19cd6df46c6a..0328071dcac1 100644
> > > --- a/drivers/iommu/iommufd/device.c
> > > +++ b/drivers/iommu/iommufd/device.c
> > > @@ -322,6 +322,7 @@ int iommufd_device_get_hw_info(struct
> iommufd_ucmd *ucmd)
> > >
> > >       cmd->out_data_type = ops->driver_type;
> > >       cmd->data_len = length;
> > > +     cmd->out_hwpt_type_bitmap =
> iommufd_hwpt_type_bitmaps[ops->driver_type];
> > >
> > >       rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
> > >
> > > diff --git a/drivers/iommu/iommufd/hw_pagetable.c
> b/drivers/iommu/iommufd/hw_pagetable.c
> > > index 67facca98de1..160712256c64 100644
> > > --- a/drivers/iommu/iommufd/hw_pagetable.c
> > > +++ b/drivers/iommu/iommufd/hw_pagetable.c
> > > @@ -173,6 +173,14 @@ static const size_t
> iommufd_hwpt_alloc_data_size[] = {
> > >       [IOMMU_HWPT_TYPE_DEFAULT] = 0,
> > >   };
> > >
> > > +/*
> > > + * bitmaps of supported hwpt types of by underlying iommu, indexed
> > > + * by ops->driver_type which is one of enum iommu_hw_info_type.
> > > + */
> > > +const u64 iommufd_hwpt_type_bitmaps[] =  {
> > > +     [IOMMU_HW_INFO_TYPE_DEFAULT] =
> BIT_ULL(IOMMU_HWPT_TYPE_DEFAULT),
> > > +};
> >
> > I am a bit confused here. Why do you need this array? What I read is
> > that you want to convert ops->driver_type to a bit position in
> > cmd->out_hwpt_type_bitmap.
> >
> > Am I getting it right?
> >
> > If so, why not just
> >        cmd->out_hwpt_type_bitmap = BIT_ULL(ops->driver_type);
> >
> > ?

The reason is for future extensions. If future usages need different types
of user data to allocate hwpt,  it can define a new type and corresponding
data structure. Such new usages may be using new vendor-specific page
tables or vendor-agnostic usages like Re-use of the KVM page table in
the IOMMU mentioned by IOMMUFD basic series.

https://lore.kernel.org/kvm/0-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com/
 
> A driver_type would be IOMMUFD_HW_INFO_TYPExx. What's inside the
> BIT_ULL is IOMMUFD_HWPT_TYPE_*. It seems to get a bit confusing
> after several rounds of renaming though. And they do seem to be
> a bit of duplications at the actual values, at least for now.

For now, vendor drivers only have one stage-1 page table format.
But in the future, it may change per new page table format
introduction and new usage.

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 08/12] iommufd/device: Report supported hwpt_types
  2023-03-10  7:39       ` Liu, Yi L
@ 2023-03-10  7:45         ` Nicolin Chen
  2023-03-10 17:52           ` Jason Gunthorpe
  0 siblings, 1 reply; 49+ messages in thread
From: Nicolin Chen @ 2023-03-10  7:45 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Baolu Lu, joro, alex.williamson, jgg, Tian, Kevin, robin.murphy,
	cohuck, eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	iommu, linux-kernel, linux-kselftest

On Fri, Mar 10, 2023 at 07:39:00AM +0000, Liu, Yi L wrote:
> External email: Use caution opening links or attachments
> 
> 
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Friday, March 10, 2023 3:10 PM
> >
> > On Fri, Mar 10, 2023 at 11:30:04AM +0800, Baolu Lu wrote:
> > > External email: Use caution opening links or attachments
> > >
> > >
> > > On 3/9/23 4:09 PM, Yi Liu wrote:
> > > > This provides a way for userspace to probe the supported hwpt data
> > > > types by kernel. Currently, kernel only supports
> > IOMMU_HWPT_TYPE_DEFAULT,
> > > > new types would be added per vendor drivers' extension.
> > > >
> > > > Userspace that wants to allocate hw_pagetable with user data should
> > check
> > > > this. While for the allocation without user data, no need for it. It is
> > > > supported by default.
> > > >
> > > > Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
> > > > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> > > > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > > > ---
> > > >   drivers/iommu/iommufd/device.c          |  1 +
> > > >   drivers/iommu/iommufd/hw_pagetable.c    | 18 +++++++++++++++---
> > > >   drivers/iommu/iommufd/iommufd_private.h |  2 ++
> > > >   drivers/iommu/iommufd/main.c            |  2 +-
> > > >   include/uapi/linux/iommufd.h            |  8 ++++++++
> > > >   5 files changed, 27 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/drivers/iommu/iommufd/device.c
> > b/drivers/iommu/iommufd/device.c
> > > > index 19cd6df46c6a..0328071dcac1 100644
> > > > --- a/drivers/iommu/iommufd/device.c
> > > > +++ b/drivers/iommu/iommufd/device.c
> > > > @@ -322,6 +322,7 @@ int iommufd_device_get_hw_info(struct
> > iommufd_ucmd *ucmd)
> > > >
> > > >       cmd->out_data_type = ops->driver_type;
> > > >       cmd->data_len = length;
> > > > +     cmd->out_hwpt_type_bitmap =
> > iommufd_hwpt_type_bitmaps[ops->driver_type];
> > > >
> > > >       rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
> > > >
> > > > diff --git a/drivers/iommu/iommufd/hw_pagetable.c
> > b/drivers/iommu/iommufd/hw_pagetable.c
> > > > index 67facca98de1..160712256c64 100644
> > > > --- a/drivers/iommu/iommufd/hw_pagetable.c
> > > > +++ b/drivers/iommu/iommufd/hw_pagetable.c
> > > > @@ -173,6 +173,14 @@ static const size_t
> > iommufd_hwpt_alloc_data_size[] = {
> > > >       [IOMMU_HWPT_TYPE_DEFAULT] = 0,
> > > >   };
> > > >
> > > > +/*
> > > > + * bitmaps of supported hwpt types of by underlying iommu, indexed
> > > > + * by ops->driver_type which is one of enum iommu_hw_info_type.
> > > > + */
> > > > +const u64 iommufd_hwpt_type_bitmaps[] =  {
> > > > +     [IOMMU_HW_INFO_TYPE_DEFAULT] =
> > BIT_ULL(IOMMU_HWPT_TYPE_DEFAULT),
> > > > +};
> > >
> > > I am a bit confused here. Why do you need this array? What I read is
> > > that you want to convert ops->driver_type to a bit position in
> > > cmd->out_hwpt_type_bitmap.
> > >
> > > Am I getting it right?
> > >
> > > If so, why not just
> > >        cmd->out_hwpt_type_bitmap = BIT_ULL(ops->driver_type);
> > >
> > > ?
> 
> The reason is for future extensions. If future usages need different types
> of user data to allocate hwpt,  it can define a new type and corresponding
> data structure. Such new usages may be using new vendor-specific page
> tables or vendor-agnostic usages like Re-use of the KVM page table in
> the IOMMU mentioned by IOMMUFD basic series.
> 
> https://lore.kernel.org/kvm/0-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com/
> 
> > A driver_type would be IOMMUFD_HW_INFO_TYPExx. What's inside the
> > BIT_ULL is IOMMUFD_HWPT_TYPE_*. It seems to get a bit confusing
> > after several rounds of renaming though. And they do seem to be
> > a bit of duplications at the actual values, at least for now.
> 
> For now, vendor drivers only have one stage-1 page table format.
> But in the future, it may change per new page table format
> introduction and new usage.

Yea, that's what I thought too. Yet, I am wondering a bit if
it'd be better to have an ops->hwpt_type in the drivers, v.s.
maintaining a potentially big chunk of the array here.

Thanks
Nic

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 05/12] iommufd/hw_pagetable: Do not populate user-managed hw_pagetables
  2023-03-10  6:50     ` Nicolin Chen
@ 2023-03-10 12:51       ` Baolu Lu
  2023-03-23  8:06       ` Liu, Yi L
  1 sibling, 0 replies; 49+ messages in thread
From: Baolu Lu @ 2023-03-10 12:51 UTC (permalink / raw)
  To: Nicolin Chen, Yi Liu
  Cc: baolu.lu, joro, alex.williamson, jgg, kevin.tian, robin.murphy,
	cohuck, eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	iommu, linux-kernel, linux-kselftest

On 2023/3/10 14:50, Nicolin Chen wrote:
> On Fri, Mar 10, 2023 at 10:25:10AM +0800, Baolu Lu wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> On 3/9/23 4:09 PM, Yi Liu wrote:
>>> From: Nicolin Chen <nicolinc@nvidia.com>
>>>
>>> A user-managed hw_pagetable does not need to get populated, since it is
>>> managed by a guest OS. Move the iopt_table_add_domain and list_add_tail
>>> calls into a helper, where the hwpt pointer will be redirected to its
>>> hwpt->parent if it's available.
>>>
>>> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
>>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>>> ---
>>>    drivers/iommu/iommufd/hw_pagetable.c | 20 ++++++++++++++++++--
>>>    1 file changed, 18 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
>>> index 16e92a1c150b..6e45ec0a66fa 100644
>>> --- a/drivers/iommu/iommufd/hw_pagetable.c
>>> +++ b/drivers/iommu/iommufd/hw_pagetable.c
>>> @@ -43,6 +43,23 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt)
>>>        return 0;
>>>    }
>>>
>>> +static int iommufd_hw_pagetable_link_ioas(struct iommufd_hw_pagetable *hwpt)
>>> +{
>>> +     int rc;
>>> +
>>> +     if (hwpt->parent)
>>> +             hwpt = hwpt->parent;
>>> +
>>> +     if (!list_empty(&hwpt->hwpt_item))
>>> +             return 0;
>>
>> What is above check for? Is it "the hwpt has already been inserted into
>> the hwpt list of its ioas in another place"?
>>
>> If so, is it possible that hwpt will be deleted from the list even when
>> this user hwpt is still linked to the ioas?
> 
> It means that the hwpt is already linked to the ioas. And the
> hwpt_item can be only empty after a destroy().
> 
> With that being said, after I think it through, perhaps Yi's
> previous change removing it might be better. So, it could be:
> 
> -------------------------------------------------------------------------------
> +	/*
> +	 * Only a parent hwpt needs to be linked to the IOAS. And a hwpt->parent
> +	 * must be linked to the IOAS already, when it's being allocated.
> +	 */
>   	if (hwpt->parent)
> -		hwpt = hwpt->parent;
> -
> -	if (!list_empty(&hwpt->hwpt_item))
>   		return 0;
>   
> -------------------------------------------------------------------------------
> 
> I was concerned about the case where a device gets attached to
> the nested hwpt without staging at the parent hwpt first. But,
> the link between the parent hwpt and the IOAS happened inside
> the allocation function now, not attach() any more.

Yes, it's clearer.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 05/12] iommufd/hw_pagetable: Do not populate user-managed hw_pagetables
  2023-03-09  8:09 ` [PATCH 05/12] iommufd/hw_pagetable: Do not populate user-managed hw_pagetables Yi Liu
  2023-03-10  2:25   ` Baolu Lu
@ 2023-03-10 15:29   ` Jason Gunthorpe
  2023-03-10 23:31     ` Nicolin Chen
  1 sibling, 1 reply; 49+ messages in thread
From: Jason Gunthorpe @ 2023-03-10 15:29 UTC (permalink / raw)
  To: Yi Liu
  Cc: joro, alex.williamson, kevin.tian, robin.murphy, baolu.lu,
	cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest

On Thu, Mar 09, 2023 at 12:09:03AM -0800, Yi Liu wrote:
> From: Nicolin Chen <nicolinc@nvidia.com>
> 
> A user-managed hw_pagetable does not need to get populated, since it is
> managed by a guest OS. Move the iopt_table_add_domain and list_add_tail
> calls into a helper, where the hwpt pointer will be redirected to its
> hwpt->parent if it's available.
> 
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/iommu/iommufd/hw_pagetable.c | 20 ++++++++++++++++++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
> index 16e92a1c150b..6e45ec0a66fa 100644
> --- a/drivers/iommu/iommufd/hw_pagetable.c
> +++ b/drivers/iommu/iommufd/hw_pagetable.c
> @@ -43,6 +43,23 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt)
>  	return 0;
>  }
>  
> +static int iommufd_hw_pagetable_link_ioas(struct iommufd_hw_pagetable *hwpt)
> +{
> +	int rc;
> +
> +	if (hwpt->parent)

This should be:

   hwpt->domain->type != IOMMU_DOMAIN_UNMANAGED

Ie if we asked the driver to alloc a domain and it allocated an
UNMANAGED domain then it means IOMMUFD manages the mappings and it
should be populated from the IOAS.

Arguably drivers should EOPNOTSUPP if presented with a parent in this
situation, but still this code should be clear about the purpose.

> +		hwpt = hwpt->parent;

And we definately shouldn't touch the parent. That is already setup
and owned by someone else. Just return and don't do anything.

Jason

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 04/12] iommufd: Pass parent hwpt and user_data to iommufd_hw_pagetable_alloc()
  2023-03-10  2:10   ` Baolu Lu
@ 2023-03-10 17:49     ` Jason Gunthorpe
  0 siblings, 0 replies; 49+ messages in thread
From: Jason Gunthorpe @ 2023-03-10 17:49 UTC (permalink / raw)
  To: Baolu Lu
  Cc: Yi Liu, joro, alex.williamson, kevin.tian, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest

On Fri, Mar 10, 2023 at 10:10:56AM +0800, Baolu Lu wrote:
> > @@ -54,14 +58,20 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt)
> >    */
> >   struct iommufd_hw_pagetable *
> >   iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
> > -			   struct iommufd_device *idev, bool immediate_attach)
> > +			   struct iommufd_device *idev,
> > +			   struct iommufd_hw_pagetable *parent,
> > +			   void *user_data, bool immediate_attach)
> >   {
> >   	const struct iommu_ops *ops = dev_iommu_ops(idev->dev);
> > +	struct iommu_domain *parent_domain = NULL;
> >   	struct iommufd_hw_pagetable *hwpt;
> >   	int rc;
> >   	lockdep_assert_held(&ioas->mutex);
> > +	if (parent && !ops->domain_alloc_user)
> > +		return ERR_PTR(-EOPNOTSUPP);
> 
> My understanding here is that we are checking whether domain_alloc_user
> is required. It seems that as long as the caller inputs a valid
> user_data or parent, domain_alloc_user is required. If so,
> 
> 	if ((user_data || parent) && !ops->domain_alloc_user)
> 		return ERR_PTR(-EOPNOTSUPP);

Yes

Jason

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 07/12] iommufd: Add IOMMU_HWPT_INVALIDATE
  2023-03-09  8:09 ` [PATCH 07/12] iommufd: Add IOMMU_HWPT_INVALIDATE Yi Liu
  2023-03-10  3:15   ` Baolu Lu
@ 2023-03-10 17:50   ` Jason Gunthorpe
  2023-03-14  4:14     ` Liu, Yi L
  2023-03-14  4:18     ` Liu, Yi L
  1 sibling, 2 replies; 49+ messages in thread
From: Jason Gunthorpe @ 2023-03-10 17:50 UTC (permalink / raw)
  To: Yi Liu
  Cc: joro, alex.williamson, kevin.tian, robin.murphy, baolu.lu,
	cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest

On Thu, Mar 09, 2023 at 12:09:05AM -0800, Yi Liu wrote:
> +int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd)
> +{
> +	struct iommu_hwpt_invalidate *cmd = ucmd->cmd;
> +	struct iommufd_hw_pagetable *hwpt;
> +	u64 user_ptr;
> +	u32 user_data_len, klen;
> +	int rc = 0;
> +
> +	/*
> +	 * For a user-managed HWPT, type should not be IOMMU_HWPT_TYPE_DEFAULT.
> +	 * data_len should not exceed the size of iommufd_invalidate_buffer.
> +	 */
> +	if (cmd->data_type == IOMMU_HWPT_TYPE_DEFAULT || !cmd->data_len ||
> +	    cmd->data_type >= ARRAY_SIZE(iommufd_hwpt_invalidate_info_size))
> +		return -EOPNOTSUPP;

This needs to do the standard check for zeros in unknown trailing data
bit. Check that alloc does it too

Jason

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 08/12] iommufd/device: Report supported hwpt_types
  2023-03-10  7:45         ` Nicolin Chen
@ 2023-03-10 17:52           ` Jason Gunthorpe
  2023-03-23  8:08             ` Liu, Yi L
  0 siblings, 1 reply; 49+ messages in thread
From: Jason Gunthorpe @ 2023-03-10 17:52 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Liu, Yi L, Baolu Lu, joro, alex.williamson, Tian, Kevin,
	robin.murphy, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest

On Thu, Mar 09, 2023 at 11:45:05PM -0800, Nicolin Chen wrote:

> Yea, that's what I thought too. Yet, I am wondering a bit if
> it'd be better to have an ops->hwpt_type in the drivers, v.s.
> maintaining a potentially big chunk of the array here.

It probably is, some ops->iommufd_data.XX is not a bad idea

Jason

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 05/12] iommufd/hw_pagetable: Do not populate user-managed hw_pagetables
  2023-03-10 15:29   ` Jason Gunthorpe
@ 2023-03-10 23:31     ` Nicolin Chen
  0 siblings, 0 replies; 49+ messages in thread
From: Nicolin Chen @ 2023-03-10 23:31 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Yi Liu, joro, alex.williamson, kevin.tian, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest

On Fri, Mar 10, 2023 at 11:29:14AM -0400, Jason Gunthorpe wrote:
> On Thu, Mar 09, 2023 at 12:09:03AM -0800, Yi Liu wrote:
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > 
> > A user-managed hw_pagetable does not need to get populated, since it is
> > managed by a guest OS. Move the iopt_table_add_domain and list_add_tail
> > calls into a helper, where the hwpt pointer will be redirected to its
> > hwpt->parent if it's available.
> > 
> > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >  drivers/iommu/iommufd/hw_pagetable.c | 20 ++++++++++++++++++--
> >  1 file changed, 18 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
> > index 16e92a1c150b..6e45ec0a66fa 100644
> > --- a/drivers/iommu/iommufd/hw_pagetable.c
> > +++ b/drivers/iommu/iommufd/hw_pagetable.c
> > @@ -43,6 +43,23 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt)
> >  	return 0;
> >  }
> >  
> > +static int iommufd_hw_pagetable_link_ioas(struct iommufd_hw_pagetable *hwpt)
> > +{
> > +	int rc;
> > +
> > +	if (hwpt->parent)
> 
> This should be:
> 
>    hwpt->domain->type != IOMMU_DOMAIN_UNMANAGED
> 
> Ie if we asked the driver to alloc a domain and it allocated an
> UNMANAGED domain then it means IOMMUFD manages the mappings and it
> should be populated from the IOAS.

OK. That looks better to me.

> Arguably drivers should EOPNOTSUPP if presented with a parent in this
> situation, but still this code should be clear about the purpose.
> 
> > +		hwpt = hwpt->parent;
> 
> And we definately shouldn't touch the parent. That is already setup
> and owned by someone else. Just return and don't do anything.

Yes.

Nic

^ permalink raw reply	[flat|nested] 49+ messages in thread

* RE: [PATCH 07/12] iommufd: Add IOMMU_HWPT_INVALIDATE
  2023-03-10  3:15   ` Baolu Lu
@ 2023-03-14  4:12     ` Liu, Yi L
  0 siblings, 0 replies; 49+ messages in thread
From: Liu, Yi L @ 2023-03-14  4:12 UTC (permalink / raw)
  To: Baolu Lu, joro, alex.williamson, jgg, Tian, Kevin, robin.murphy
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest

> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Friday, March 10, 2023 11:16 AM
> 
> On 3/9/23 4:09 PM, Yi Liu wrote:
> > In nested translation, the stage-1 page table is user-managed and used
> > by IOMMU hardware, so destroying mappings in the stage-1 page table
> should
> > be followed with an IOTLB invalidation.
> 
> s/destroying mappings/update of any present page table entry/

Right. Not only destroying.

> > This adds IOMMU_HWPT_INVALIDATE for IOTLB invalidation.
> >
> > Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
> > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >   drivers/iommu/iommufd/hw_pagetable.c    | 56
> +++++++++++++++++++++++++
> >   drivers/iommu/iommufd/iommufd_private.h |  9 ++++
> >   drivers/iommu/iommufd/main.c            |  3 ++
> >   include/uapi/linux/iommufd.h            | 27 ++++++++++++
> >   4 files changed, 95 insertions(+)
> >
> > diff --git a/drivers/iommu/iommufd/hw_pagetable.c
> b/drivers/iommu/iommufd/hw_pagetable.c
> > index 64e7cf7142e1..67facca98de1 100644
> > --- a/drivers/iommu/iommufd/hw_pagetable.c
> > +++ b/drivers/iommu/iommufd/hw_pagetable.c
> > @@ -284,3 +284,59 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd
> *ucmd)
> >   	iommufd_put_object(&idev->obj);
> >   	return rc;
> >   }
> > +
> > +/*
> > + * size of page table type specific invalidate_info, indexed by
> > + * enum iommu_hwpt_type.
> > + */
> > +static const size_t iommufd_hwpt_invalidate_info_size[] = {};
> > +
> > +int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd)
> > +{
> > +	struct iommu_hwpt_invalidate *cmd = ucmd->cmd;
> > +	struct iommufd_hw_pagetable *hwpt;
> > +	u64 user_ptr;
> > +	u32 user_data_len, klen;
> > +	int rc = 0;
> > +
> > +	/*
> > +	 * For a user-managed HWPT, type should not be
> IOMMU_HWPT_TYPE_DEFAULT.
> > +	 * data_len should not exceed the size of
> iommufd_invalidate_buffer.
> > +	 */
> > +	if (cmd->data_type == IOMMU_HWPT_TYPE_DEFAULT || !cmd-
> >data_len ||
> > +	    cmd->data_type >=
> ARRAY_SIZE(iommufd_hwpt_invalidate_info_size))
> 
> "data_len should not exceed the size of iommufd_invalidate_buffer."
> 
> How is this checked?

Hmmm, this is a stale comment I suppose.

> 
> > +		return -EOPNOTSUPP;
> > +
> > +	hwpt = iommufd_get_hwpt(ucmd, cmd->hwpt_id);
> > +	if (IS_ERR(hwpt))
> > +		return PTR_ERR(hwpt);
> > +
> > +	/* Do not allow any kernel-managed hw_pagetable */
> > +	if (!hwpt->parent) {
> > +		rc = -EINVAL;
> > +		goto out_put_hwpt;
> > +	}
> > +
> > +	klen = iommufd_hwpt_invalidate_info_size[cmd->data_type];
> > +	if (!klen) {
> > +		rc = -EINVAL;
> > +		goto out_put_hwpt;
> > +	}
> > +
> > +	/*
> > +	 * Copy the needed fields before reusing the ucmd buffer, this
> > +	 * avoids memory allocation in this path.
> > +	 */
> > +	user_ptr = cmd->data_uptr;
> > +	user_data_len = cmd->data_len;
> 
> Is it a valid case if "user_data_len < klen"?

Yes. e.g. an old qemu running on a new kernel which has new field
added in the end of the data structure.

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 49+ messages in thread

* RE: [PATCH 07/12] iommufd: Add IOMMU_HWPT_INVALIDATE
  2023-03-10 17:50   ` Jason Gunthorpe
@ 2023-03-14  4:14     ` Liu, Yi L
  2023-03-14  4:18     ` Liu, Yi L
  1 sibling, 0 replies; 49+ messages in thread
From: Liu, Yi L @ 2023-03-14  4:14 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: joro, alex.williamson, Tian, Kevin, robin.murphy, baolu.lu,
	cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Saturday, March 11, 2023 1:50 AM
> 
> On Thu, Mar 09, 2023 at 12:09:05AM -0800, Yi Liu wrote:
> > +int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd)
> > +{
> > +	struct iommu_hwpt_invalidate *cmd = ucmd->cmd;
> > +	struct iommufd_hw_pagetable *hwpt;
> > +	u64 user_ptr;
> > +	u32 user_data_len, klen;
> > +	int rc = 0;
> > +
> > +	/*
> > +	 * For a user-managed HWPT, type should not be
> IOMMU_HWPT_TYPE_DEFAULT.
> > +	 * data_len should not exceed the size of
> iommufd_invalidate_buffer.
> > +	 */
> > +	if (cmd->data_type == IOMMU_HWPT_TYPE_DEFAULT || !cmd-
> >data_len ||
> > +	    cmd->data_type >=
> ARRAY_SIZE(iommufd_hwpt_invalidate_info_size))
> > +		return -EOPNOTSUPP;
> 
> This needs to do the standard check for zeros in unknown trailing data
> bit. Check that alloc does it too

Yes. would add it in both path.

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 49+ messages in thread

* RE: [PATCH 07/12] iommufd: Add IOMMU_HWPT_INVALIDATE
  2023-03-10 17:50   ` Jason Gunthorpe
  2023-03-14  4:14     ` Liu, Yi L
@ 2023-03-14  4:18     ` Liu, Yi L
  2023-03-20 12:48       ` Jason Gunthorpe
  1 sibling, 1 reply; 49+ messages in thread
From: Liu, Yi L @ 2023-03-14  4:18 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: joro, alex.williamson, Tian, Kevin, robin.murphy, baolu.lu,
	cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Saturday, March 11, 2023 1:50 AM
> 
> On Thu, Mar 09, 2023 at 12:09:05AM -0800, Yi Liu wrote:
> > +int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd)
> > +{
> > +	struct iommu_hwpt_invalidate *cmd = ucmd->cmd;
> > +	struct iommufd_hw_pagetable *hwpt;
> > +	u64 user_ptr;
> > +	u32 user_data_len, klen;
> > +	int rc = 0;
> > +
> > +	/*
> > +	 * For a user-managed HWPT, type should not be
> IOMMU_HWPT_TYPE_DEFAULT.
> > +	 * data_len should not exceed the size of
> iommufd_invalidate_buffer.
> > +	 */
> > +	if (cmd->data_type == IOMMU_HWPT_TYPE_DEFAULT || !cmd-
> >data_len ||
> > +	    cmd->data_type >=
> ARRAY_SIZE(iommufd_hwpt_invalidate_info_size))
> > +		return -EOPNOTSUPP;
> 
> This needs to do the standard check for zeros in unknown trailing data
> bit. Check that alloc does it too

Maybe it has been covered by the copy_struct_from_user(). Is it?

+	/*
+	 * Copy the needed fields before reusing the ucmd buffer, this
+	 * avoids memory allocation in this path.
+	 */
+	user_ptr = cmd->data_uptr;
+	user_data_len = cmd->data_len;
+
+	rc = copy_struct_from_user(cmd, klen,
+				   u64_to_user_ptr(user_ptr), user_data_len);

Regards,
Yi Liu 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* RE: [PATCH 04/12] iommufd: Pass parent hwpt and user_data to iommufd_hw_pagetable_alloc()
  2023-03-09  8:09 ` [PATCH 04/12] iommufd: Pass parent hwpt and user_data to iommufd_hw_pagetable_alloc() Yi Liu
  2023-03-10  2:10   ` Baolu Lu
@ 2023-03-17 10:23   ` Tian, Kevin
  2023-03-20 12:47     ` Jason Gunthorpe
  1 sibling, 1 reply; 49+ messages in thread
From: Tian, Kevin @ 2023-03-17 10:23 UTC (permalink / raw)
  To: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Thursday, March 9, 2023 4:09 PM
> 
> +	/* It must be either NESTED or UNMANAGED, depending on
> parent_domain */
> +	if ((parent_domain && hwpt->domain->type !=
> IOMMU_DOMAIN_NESTED) ||
> +	    (!parent_domain && hwpt->domain->type !=
> IOMMU_DOMAIN_UNMANAGED))
> +		goto out_abort;
> +

WARN_ON()

^ permalink raw reply	[flat|nested] 49+ messages in thread

* RE: [PATCH 02/12] iommu: Add nested domain support
  2023-03-09  8:09 ` [PATCH 02/12] iommu: Add nested domain support Yi Liu
@ 2023-03-17 10:25   ` Tian, Kevin
  2023-03-18  8:34     ` Baolu Lu
  0 siblings, 1 reply; 49+ messages in thread
From: Tian, Kevin @ 2023-03-17 10:25 UTC (permalink / raw)
  To: Liu, Yi L, joro, alex.williamson, jgg, robin.murphy, baolu.lu
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Thursday, March 9, 2023 4:09 PM
> 
> From: Lu Baolu <baolu.lu@linux.intel.com>
> 
> Introduce a new domain type for a user space I/O address, which is nested

'a ... address'? let's call it 'user I/O page table'.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 02/12] iommu: Add nested domain support
  2023-03-17 10:25   ` Tian, Kevin
@ 2023-03-18  8:34     ` Baolu Lu
  0 siblings, 0 replies; 49+ messages in thread
From: Baolu Lu @ 2023-03-18  8:34 UTC (permalink / raw)
  To: Tian, Kevin, Liu, Yi L, joro, alex.williamson, jgg, robin.murphy
  Cc: baolu.lu, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit, iommu,
	linux-kernel, linux-kselftest

On 2023/3/17 18:25, Tian, Kevin wrote:
>> From: Liu, Yi L <yi.l.liu@intel.com>
>> Sent: Thursday, March 9, 2023 4:09 PM
>>
>> From: Lu Baolu <baolu.lu@linux.intel.com>
>>
>> Introduce a new domain type for a user space I/O address, which is nested
> 
> 'a ... address'? let's call it 'user I/O page table'.
> 

Okay, sure.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 04/12] iommufd: Pass parent hwpt and user_data to iommufd_hw_pagetable_alloc()
  2023-03-17 10:23   ` Tian, Kevin
@ 2023-03-20 12:47     ` Jason Gunthorpe
  2023-03-21  1:25       ` Tian, Kevin
  0 siblings, 1 reply; 49+ messages in thread
From: Jason Gunthorpe @ 2023-03-20 12:47 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Liu, Yi L, joro, alex.williamson, robin.murphy, baolu.lu, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest

On Fri, Mar 17, 2023 at 10:23:54AM +0000, Tian, Kevin wrote:
> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Thursday, March 9, 2023 4:09 PM
> > 
> > +	/* It must be either NESTED or UNMANAGED, depending on
> > parent_domain */
> > +	if ((parent_domain && hwpt->domain->type !=
> > IOMMU_DOMAIN_NESTED) ||
> > +	    (!parent_domain && hwpt->domain->type !=
> > IOMMU_DOMAIN_UNMANAGED))
> > +		goto out_abort;
> > +
> 
> WARN_ON()

Wouldn't that be userspace triggerable? It gets to pick the hwpt used.

Jason

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 07/12] iommufd: Add IOMMU_HWPT_INVALIDATE
  2023-03-14  4:18     ` Liu, Yi L
@ 2023-03-20 12:48       ` Jason Gunthorpe
  0 siblings, 0 replies; 49+ messages in thread
From: Jason Gunthorpe @ 2023-03-20 12:48 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: joro, alex.williamson, Tian, Kevin, robin.murphy, baolu.lu,
	cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest

On Tue, Mar 14, 2023 at 04:18:21AM +0000, Liu, Yi L wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Saturday, March 11, 2023 1:50 AM
> > 
> > On Thu, Mar 09, 2023 at 12:09:05AM -0800, Yi Liu wrote:
> > > +int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd)
> > > +{
> > > +	struct iommu_hwpt_invalidate *cmd = ucmd->cmd;
> > > +	struct iommufd_hw_pagetable *hwpt;
> > > +	u64 user_ptr;
> > > +	u32 user_data_len, klen;
> > > +	int rc = 0;
> > > +
> > > +	/*
> > > +	 * For a user-managed HWPT, type should not be
> > IOMMU_HWPT_TYPE_DEFAULT.
> > > +	 * data_len should not exceed the size of
> > iommufd_invalidate_buffer.
> > > +	 */
> > > +	if (cmd->data_type == IOMMU_HWPT_TYPE_DEFAULT || !cmd-
> > >data_len ||
> > > +	    cmd->data_type >=
> > ARRAY_SIZE(iommufd_hwpt_invalidate_info_size))
> > > +		return -EOPNOTSUPP;
> > 
> > This needs to do the standard check for zeros in unknown trailing data
> > bit. Check that alloc does it too
> 
> Maybe it has been covered by the copy_struct_from_user(). Is it?

Yes

Jason

^ permalink raw reply	[flat|nested] 49+ messages in thread

* RE: [PATCH 04/12] iommufd: Pass parent hwpt and user_data to iommufd_hw_pagetable_alloc()
  2023-03-20 12:47     ` Jason Gunthorpe
@ 2023-03-21  1:25       ` Tian, Kevin
  0 siblings, 0 replies; 49+ messages in thread
From: Tian, Kevin @ 2023-03-21  1:25 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Liu, Yi L, joro, alex.williamson, robin.murphy, baolu.lu, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Monday, March 20, 2023 8:47 PM
> 
> On Fri, Mar 17, 2023 at 10:23:54AM +0000, Tian, Kevin wrote:
> > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > Sent: Thursday, March 9, 2023 4:09 PM
> > >
> > > +	/* It must be either NESTED or UNMANAGED, depending on
> > > parent_domain */
> > > +	if ((parent_domain && hwpt->domain->type !=
> > > IOMMU_DOMAIN_NESTED) ||
> > > +	    (!parent_domain && hwpt->domain->type !=
> > > IOMMU_DOMAIN_UNMANAGED))
> > > +		goto out_abort;
> > > +
> >
> > WARN_ON()
> 
> Wouldn't that be userspace triggerable? It gets to pick the hwpt used.
> 

The domain type is set by iommu driver instead of userspace. A sane
driver should pass above check, if it claims to support nested. 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* RE: [PATCH 05/12] iommufd/hw_pagetable: Do not populate user-managed hw_pagetables
  2023-03-10  6:50     ` Nicolin Chen
  2023-03-10 12:51       ` Baolu Lu
@ 2023-03-23  8:06       ` Liu, Yi L
  2023-03-23  8:12         ` Nicolin Chen
  1 sibling, 1 reply; 49+ messages in thread
From: Liu, Yi L @ 2023-03-23  8:06 UTC (permalink / raw)
  To: Nicolin Chen, Baolu Lu
  Cc: joro, alex.williamson, jgg, Tian, Kevin, robin.murphy, cohuck,
	eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	iommu, linux-kernel, linux-kselftest

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Friday, March 10, 2023 2:51 PM
 > 
> On Fri, Mar 10, 2023 at 10:25:10AM +0800, Baolu Lu wrote:
> > External email: Use caution opening links or attachments
> >
> >
> > On 3/9/23 4:09 PM, Yi Liu wrote:
> > > From: Nicolin Chen <nicolinc@nvidia.com>
> > >
> > > A user-managed hw_pagetable does not need to get populated, since it
> is
> > > managed by a guest OS. Move the iopt_table_add_domain and
> list_add_tail
> > > calls into a helper, where the hwpt pointer will be redirected to its
> > > hwpt->parent if it's available.
> > >
> > > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> > > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > > ---
> > >   drivers/iommu/iommufd/hw_pagetable.c | 20 ++++++++++++++++++--
> > >   1 file changed, 18 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/iommu/iommufd/hw_pagetable.c
> b/drivers/iommu/iommufd/hw_pagetable.c
> > > index 16e92a1c150b..6e45ec0a66fa 100644
> > > --- a/drivers/iommu/iommufd/hw_pagetable.c
> > > +++ b/drivers/iommu/iommufd/hw_pagetable.c
> > > @@ -43,6 +43,23 @@ int iommufd_hw_pagetable_enforce_cc(struct
> iommufd_hw_pagetable *hwpt)
> > >       return 0;
> > >   }
> > >
> > > +static int iommufd_hw_pagetable_link_ioas(struct
> iommufd_hw_pagetable *hwpt)
> > > +{
> > > +     int rc;
> > > +
> > > +     if (hwpt->parent)
> > > +             hwpt = hwpt->parent;
> > > +
> > > +     if (!list_empty(&hwpt->hwpt_item))
> > > +             return 0;
> >
> > What is above check for? Is it "the hwpt has already been inserted into
> > the hwpt list of its ioas in another place"?
> >
> > If so, is it possible that hwpt will be deleted from the list even when
> > this user hwpt is still linked to the ioas?
> 
> It means that the hwpt is already linked to the ioas. And the
> hwpt_item can be only empty after a destroy().
> 
> With that being said, after I think it through, perhaps Yi's
> previous change removing it might be better. So, it could be:
> 
> -------------------------------------------------------------------------------
> +	/*
> +	 * Only a parent hwpt needs to be linked to the IOAS. And a hwpt-
> >parent
> +	 * must be linked to the IOAS already, when it's being allocated.
> +	 */
>  	if (hwpt->parent)
> -		hwpt = hwpt->parent;
> -
> -	if (!list_empty(&hwpt->hwpt_item))
>  		return 0;
> 
> -------------------------------------------------------------------------------
> 
> I was concerned about the case where a device gets attached to
> the nested hwpt without staging at the parent hwpt first.

I think I was convinced with the reason that this helper may be
called by allocation for both standalone s2 hwpt and the nested
hwpt. So my change was not enough. Yours covers both cases.

> But,
> the link between the parent hwpt and the IOAS happened inside
> the allocation function now, not attach() any more.

Not quite get. This helper is also called in the allocation path. Is
it? Anyhow, with Jason's comment, this helper may be reworked.
We can sync later on the next version.

Regards,
Yi Liu


^ permalink raw reply	[flat|nested] 49+ messages in thread

* RE: [PATCH 08/12] iommufd/device: Report supported hwpt_types
  2023-03-10 17:52           ` Jason Gunthorpe
@ 2023-03-23  8:08             ` Liu, Yi L
  0 siblings, 0 replies; 49+ messages in thread
From: Liu, Yi L @ 2023-03-23  8:08 UTC (permalink / raw)
  To: Jason Gunthorpe, Nicolin Chen
  Cc: Baolu Lu, joro, alex.williamson, Tian, Kevin, robin.murphy,
	cohuck, eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	iommu, linux-kernel, linux-kselftest

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Saturday, March 11, 2023 1:53 AM
> 
> On Thu, Mar 09, 2023 at 11:45:05PM -0800, Nicolin Chen wrote:
> 
> > Yea, that's what I thought too. Yet, I am wondering a bit if
> > it'd be better to have an ops->hwpt_type in the drivers, v.s.
> > maintaining a potentially big chunk of the array here.
> 
> It probably is, some ops->iommufd_data.XX is not a bad idea

Will try. 

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 49+ messages in thread

* RE: [PATCH 06/12] iommufd: IOMMU_HWPT_ALLOC allocation with user data
  2023-03-10  3:02   ` Baolu Lu
@ 2023-03-23  8:11     ` Liu, Yi L
  0 siblings, 0 replies; 49+ messages in thread
From: Liu, Yi L @ 2023-03-23  8:11 UTC (permalink / raw)
  To: Baolu Lu, joro, alex.williamson, jgg, Tian, Kevin, robin.murphy
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest

> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Friday, March 10, 2023 11:02 AM
> 
> On 3/9/23 4:09 PM, Yi Liu wrote:
> > IOMMU_HWPT_ALLOC already supports iommu_domain allocation for
> usersapce.
> > But it can only allocate hw_pagetables linked with IOAS. There are needs
> > to support hw_pagetable allocation with parameters specified by user. For
> > example, in nested translation, user needs to allocate hw_pagetable for
> > the stage-1 translation (e.g. a single I/O page table or a set of I/O page
> > tables) with user data. It also needs provide a stage-2 hw_pagetable
> which
> > is linked to the GPA IOAS.
> >
> > This extends IOMMU_HWPT_ALLOC to accept user specified parameter
> and hwpt
> > ID in @pt_id field. Such as the user-managed stage-1 hwpt, which requires
> > a parent hwpt to point to stage-2 translation.
> >
> > enum iommu_hwpt_type is defined to differentiate the user parameters
> use
> > by different usages. For the allocations that don't require user parameter,
> > IOMMU_HWPT_TYPE_DEFAULT is defined for backward compatibility.
> Other types
> > would be added by future iommu vendor driver extensions.
> >
> > Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
> > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >   drivers/iommu/iommufd/hw_pagetable.c | 94
> +++++++++++++++++++++++++---
> >   drivers/iommu/iommufd/main.c         |  2 +-
> >   include/uapi/linux/iommufd.h         | 30 +++++++++
> >   3 files changed, 115 insertions(+), 11 deletions(-)
> >
> > diff --git a/drivers/iommu/iommufd/hw_pagetable.c
> b/drivers/iommu/iommufd/hw_pagetable.c
> > index 6e45ec0a66fa..64e7cf7142e1 100644
> > --- a/drivers/iommu/iommufd/hw_pagetable.c
> > +++ b/drivers/iommu/iommufd/hw_pagetable.c
> > @@ -165,34 +165,106 @@ iommufd_hw_pagetable_alloc(struct
> iommufd_ctx *ictx, struct iommufd_ioas *ioas,
> >   	return ERR_PTR(rc);
> >   }
> >
> > +/*
> > + * size of page table type specific data, indexed by
> > + * enum iommu_hwpt_type.
> > + */
> > +static const size_t iommufd_hwpt_alloc_data_size[] = {
> > +	[IOMMU_HWPT_TYPE_DEFAULT] = 0,
> > +};
> > +
> >   int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
> >   {
> >   	struct iommu_hwpt_alloc *cmd = ucmd->cmd;
> > -	struct iommufd_hw_pagetable *hwpt;
> > +	struct iommufd_hw_pagetable *hwpt, *parent = NULL;
> > +	struct iommufd_object *pt_obj;
> >   	struct iommufd_device *idev;
> >   	struct iommufd_ioas *ioas;
> > +	const struct iommu_ops *ops;
> > +	void *data = NULL;
> > +	u32 klen;
> >   	int rc;
> 
> Reverse Christmas tree format. Ditto to other places in this file.

Got it.

> >
> > -	if (cmd->flags)
> > +	if (cmd->__reserved || cmd->flags)
> >   		return -EOPNOTSUPP;
> >
> >   	idev = iommufd_get_device(ucmd, cmd->dev_id);
> >   	if (IS_ERR(idev))
> >   		return PTR_ERR(idev);
> >
> > -	ioas = iommufd_get_ioas(ucmd, cmd->pt_id);
> > -	if (IS_ERR(ioas)) {
> > -		rc = PTR_ERR(ioas);
> > +	ops = dev_iommu_ops(idev->dev);
> > +	if (!ops) {
> > +		rc = -EOPNOTSUPP;
> >   		goto out_put_idev;
> >   	}
> 
> No need to check. dev_iommu_ops() will never returns a NULL.

Got it.

> >
> > +	/* Only support IOMMU_HWPT_TYPE_DEFAULT for now */
> > +	if (cmd->data_type != IOMMU_HWPT_TYPE_DEFAULT) {
> > +		rc = -EINVAL;
> > +		goto out_put_idev;
> > +	}
> > +
> > +	pt_obj = iommufd_get_object(ucmd->ictx, cmd->pt_id,
> IOMMUFD_OBJ_ANY);
> > +	if (IS_ERR(pt_obj)) {
> > +		rc = -EINVAL;
> > +		goto out_put_idev;
> > +	}
> > +
> > +	switch (pt_obj->type) {
> > +	case IOMMUFD_OBJ_IOAS:
> > +		ioas = container_of(pt_obj, struct iommufd_ioas, obj);
> > +		break;
> > +	case IOMMUFD_OBJ_HW_PAGETABLE:
> > +		/* pt_id points HWPT only when data_type
> is !IOMMU_HWPT_TYPE_DEFAULT */
> > +		if (cmd->data_type == IOMMU_HWPT_TYPE_DEFAULT) {
> > +			rc = -EINVAL;
> > +			goto out_put_pt;
> > +		}
> > +
> > +		parent = container_of(pt_obj, struct
> iommufd_hw_pagetable, obj);
> > +		/*
> > +		 * Cannot allocate user-managed hwpt linking to
> auto_created
> > +		 * hwpt. If the parent hwpt is already a user-managed hwpt,
> > +		 * don't allocate another user-managed hwpt linking to it.
> > +		 */
> > +		if (parent->auto_domain || parent->parent) {
> > +			rc = -EINVAL;
> > +			goto out_put_pt;
> > +		}
> > +		ioas = parent->ioas;
> > +		break;
> > +	default:
> > +		rc = -EINVAL;
> > +		goto out_put_pt;
> > +	}
> > +
> > +	klen = iommufd_hwpt_alloc_data_size[cmd->data_type];
> > +	if (klen) {
> > +		if (!cmd->data_len) {
> > +			rc = -EINVAL;
> > +			goto out_put_pt;
> > +		}
> 
> Is the user_data still valid if (cmd->data_len < klen)?

Yes, one day if kernel has new fields added in the end of the uapi
structure but qemu is using an old uapi header. In such case, the
extra byte should be zeroed.

> 
> > +
> > +		data = kzalloc(klen, GFP_KERNEL);
> > +		if (!data) {
> > +			rc = -ENOMEM;
> > +			goto out_put_pt;
> > +		}
> > +
> > +		rc = copy_struct_from_user(data, klen,
> > +					   u64_to_user_ptr(cmd->data_uptr),
> > +					   cmd->data_len);
> > +		if (rc)
> > +			goto out_free_data;
> > +	}
> > +
> >   	mutex_lock(&ioas->mutex);
> >   	hwpt = iommufd_hw_pagetable_alloc(ucmd->ictx, ioas, idev,
> > -					  NULL, NULL, false);
> > +					  parent, data, false);
> >   	mutex_unlock(&ioas->mutex);
> >   	if (IS_ERR(hwpt)) {
> >   		rc = PTR_ERR(hwpt);
> > -		goto out_put_ioas;
> > +		goto out_free_data;
> >   	}
> >
> >   	cmd->out_hwpt_id = hwpt->obj.id;
> > @@ -200,12 +272,14 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd
> *ucmd)
> >   	if (rc)
> >   		goto out_hwpt;
> >   	iommufd_object_finalize(ucmd->ictx, &hwpt->obj);
> > -	goto out_put_ioas;
> > +	goto out_free_data;
> >
> >   out_hwpt:
> >   	iommufd_object_abort_and_destroy(ucmd->ictx, &hwpt->obj);
> > -out_put_ioas:
> > -	iommufd_put_object(&ioas->obj);
> > +out_free_data:
> > +	kfree(data);
> > +out_put_pt:
> > +	iommufd_put_object(pt_obj);
> >   out_put_idev:
> >   	iommufd_put_object(&idev->obj);
> >   	return rc;
> > diff --git a/drivers/iommu/iommufd/main.c
> b/drivers/iommu/iommufd/main.c
> > index f079c0bda46b..7ab1e2c638a1 100644
> > --- a/drivers/iommu/iommufd/main.c
> > +++ b/drivers/iommu/iommufd/main.c
> > @@ -295,7 +295,7 @@ struct iommufd_ioctl_op {
> >   static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
> >   	IOCTL_OP(IOMMU_DESTROY, iommufd_destroy, struct
> iommu_destroy, id),
> >   	IOCTL_OP(IOMMU_HWPT_ALLOC, iommufd_hwpt_alloc, struct
> iommu_hwpt_alloc,
> > -		 __reserved),
> > +		 data_uptr),
> >   	IOCTL_OP(IOMMU_DEVICE_GET_HW_INFO,
> iommufd_device_get_hw_info,
> >   		 struct iommu_hw_info, __reserved),
> >   	IOCTL_OP(IOMMU_IOAS_ALLOC, iommufd_ioas_alloc_ioctl,
> > diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
> > index 4ac525897b82..48781ff40a37 100644
> > --- a/include/uapi/linux/iommufd.h
> > +++ b/include/uapi/linux/iommufd.h
> > @@ -347,6 +347,14 @@ struct iommu_vfio_ioas {
> >   };
> >   #define IOMMU_VFIO_IOAS _IO(IOMMUFD_TYPE,
> IOMMUFD_CMD_VFIO_IOAS)
> >
> > +/**
> > + * enum iommu_hwpt_type - IOMMU HWPT Type
> > + * @IOMMU_HWPT_TYPE_DEFAULT: default
> > + */
> > +enum iommu_hwpt_type {
> > +	IOMMU_HWPT_TYPE_DEFAULT,
> > +};
> > +
> >   /**
> >    * struct iommu_hwpt_alloc - ioctl(IOMMU_HWPT_ALLOC)
> >    * @size: sizeof(struct iommu_hwpt_alloc)
> > @@ -355,12 +363,31 @@ struct iommu_vfio_ioas {
> >    * @pt_id: The IOAS to connect this HWPT to
> >    * @out_hwpt_id: The ID of the new HWPT
> >    * @__reserved: Must be 0
> > + * @data_type: One of enum iommu_hwpt_type
> > + * @data_len: Length of the type specific data
> > + * @data_uptr: User pointer to the type specific data
> >    *
> >    * Explicitly allocate a hardware page table object. This is the same
> object
> >    * type that is returned by iommufd_device_attach() and represents the
> >    * underlying iommu driver's iommu_domain kernel object.
> >    *
> >    * A normal HWPT will be created with the mappings from the given IOAS.
> > + * The @data_type for its allocation can be set to
> IOMMU_HWPT_TYPE_DEFAULT, or
> > + * another type (being listed below) to specialize a kernel-managed
> HWPT.
> > + *
> > + * A user-managed HWPT will be created from a given parent HWPT via
> @pt_id, in
> > + * which the parent HWPT must be allocated previously via the same ioctl
> from a
> > + * given IOAS. The @data_type must not be set to
> IOMMU_HWPT_TYPE_DEFAULT but a
> > + * pre-defined type corresponding to the underlying IOMMU hardware.
> > + *
> > + * If the @data_type is set to IOMMU_HWPT_TYPE_DEFAULT, both the
> @data_len and
> > + * the @data_uptr will be ignored. Otherwise, both of them must be
> given.
> > + *
> > + *
> +==============================+=============================
> ========+===========+
> > + * | @data_type                   |    Data structure in @data_uptr     |   @pt_id
> |
> > + * +------------------------------+-------------------------------------+-----------+
> > + * | IOMMU_HWPT_TYPE_DEFAULT      |               N/A                   |    IOAS
> |
> > + * +------------------------------+-------------------------------------+-----------+
> >    */
> >   struct iommu_hwpt_alloc {
> >   	__u32 size;
> > @@ -369,6 +396,9 @@ struct iommu_hwpt_alloc {
> >   	__u32 pt_id;
> >   	__u32 out_hwpt_id;
> >   	__u32 __reserved;
> > +	__u32 data_type;
> > +	__u32 data_len;
> > +	__aligned_u64 data_uptr;
> >   };
> >   #define IOMMU_HWPT_ALLOC _IO(IOMMUFD_TYPE,
> IOMMUFD_CMD_HWPT_ALLOC)
> 
> Best regards,
> baolu

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 05/12] iommufd/hw_pagetable: Do not populate user-managed hw_pagetables
  2023-03-23  8:06       ` Liu, Yi L
@ 2023-03-23  8:12         ` Nicolin Chen
  2023-03-23  8:28           ` Liu, Yi L
  0 siblings, 1 reply; 49+ messages in thread
From: Nicolin Chen @ 2023-03-23  8:12 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Baolu Lu, joro, alex.williamson, jgg, Tian, Kevin, robin.murphy,
	cohuck, eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	iommu, linux-kernel, linux-kselftest

On Thu, Mar 23, 2023 at 08:06:26AM +0000, Liu, Yi L wrote:
> External email: Use caution opening links or attachments
> 
> 
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Friday, March 10, 2023 2:51 PM
>  >
> > On Fri, Mar 10, 2023 at 10:25:10AM +0800, Baolu Lu wrote:
> > > External email: Use caution opening links or attachments
> > >
> > >
> > > On 3/9/23 4:09 PM, Yi Liu wrote:
> > > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > >
> > > > A user-managed hw_pagetable does not need to get populated, since it
> > is
> > > > managed by a guest OS. Move the iopt_table_add_domain and
> > list_add_tail
> > > > calls into a helper, where the hwpt pointer will be redirected to its
> > > > hwpt->parent if it's available.
> > > >
> > > > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> > > > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > > > ---
> > > >   drivers/iommu/iommufd/hw_pagetable.c | 20 ++++++++++++++++++--
> > > >   1 file changed, 18 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/iommu/iommufd/hw_pagetable.c
> > b/drivers/iommu/iommufd/hw_pagetable.c
> > > > index 16e92a1c150b..6e45ec0a66fa 100644
> > > > --- a/drivers/iommu/iommufd/hw_pagetable.c
> > > > +++ b/drivers/iommu/iommufd/hw_pagetable.c
> > > > @@ -43,6 +43,23 @@ int iommufd_hw_pagetable_enforce_cc(struct
> > iommufd_hw_pagetable *hwpt)
> > > >       return 0;
> > > >   }
> > > >
> > > > +static int iommufd_hw_pagetable_link_ioas(struct
> > iommufd_hw_pagetable *hwpt)
> > > > +{
> > > > +     int rc;
> > > > +
> > > > +     if (hwpt->parent)
> > > > +             hwpt = hwpt->parent;
> > > > +
> > > > +     if (!list_empty(&hwpt->hwpt_item))
> > > > +             return 0;
> > >
> > > What is above check for? Is it "the hwpt has already been inserted into
> > > the hwpt list of its ioas in another place"?
> > >
> > > If so, is it possible that hwpt will be deleted from the list even when
> > > this user hwpt is still linked to the ioas?
> >
> > It means that the hwpt is already linked to the ioas. And the
> > hwpt_item can be only empty after a destroy().
> >
> > With that being said, after I think it through, perhaps Yi's
> > previous change removing it might be better. So, it could be:
> >
> > -------------------------------------------------------------------------------
> > +     /*
> > +      * Only a parent hwpt needs to be linked to the IOAS. And a hwpt-
> > >parent
> > +      * must be linked to the IOAS already, when it's being allocated.
> > +      */
> >       if (hwpt->parent)
> > -             hwpt = hwpt->parent;
> > -
> > -     if (!list_empty(&hwpt->hwpt_item))
> >               return 0;
> >
> > -------------------------------------------------------------------------------
> >
> > I was concerned about the case where a device gets attached to
> > the nested hwpt without staging at the parent hwpt first.
> 
> I think I was convinced with the reason that this helper may be
> called by allocation for both standalone s2 hwpt and the nested
> hwpt. So my change was not enough. Yours covers both cases.
> 
> > But,
> > the link between the parent hwpt and the IOAS happened inside
> > the allocation function now, not attach() any more.
> 
> Not quite get. This helper is also called in the allocation path. Is
> it? Anyhow, with Jason's comment, this helper may be reworked.
> We can sync later on the next version.

We previously had this link_ioas() in attach() routine so we
needed to make sure hwpt->parent got populated, because the
device could be attached to an S1 HWPT directly. But now this
is in the alloc() routine, so by the time an S1 HWPT is being
allocated, an S2 HWPT must be allocated first and populated
already.

Nic

^ permalink raw reply	[flat|nested] 49+ messages in thread

* RE: [PATCH 05/12] iommufd/hw_pagetable: Do not populate user-managed hw_pagetables
  2023-03-23  8:12         ` Nicolin Chen
@ 2023-03-23  8:28           ` Liu, Yi L
  0 siblings, 0 replies; 49+ messages in thread
From: Liu, Yi L @ 2023-03-23  8:28 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Baolu Lu, joro, alex.williamson, jgg, Tian, Kevin, robin.murphy,
	cohuck, eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	iommu, linux-kernel, linux-kselftest

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Thursday, March 23, 2023 4:12 PM
> 
> On Thu, Mar 23, 2023 at 08:06:26AM +0000, Liu, Yi L wrote:
> > External email: Use caution opening links or attachments
> >
> >
> > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > Sent: Friday, March 10, 2023 2:51 PM
> >  >
> > > On Fri, Mar 10, 2023 at 10:25:10AM +0800, Baolu Lu wrote:
> > > > External email: Use caution opening links or attachments
> > > >
> > > >
> > > > On 3/9/23 4:09 PM, Yi Liu wrote:
> > > > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > > >
> > > > > A user-managed hw_pagetable does not need to get populated,
> since it
> > > is
> > > > > managed by a guest OS. Move the iopt_table_add_domain and
> > > list_add_tail
> > > > > calls into a helper, where the hwpt pointer will be redirected to its
> > > > > hwpt->parent if it's available.
> > > > >
> > > > > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> > > > > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > > > > ---
> > > > >   drivers/iommu/iommufd/hw_pagetable.c | 20
> ++++++++++++++++++--
> > > > >   1 file changed, 18 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/drivers/iommu/iommufd/hw_pagetable.c
> > > b/drivers/iommu/iommufd/hw_pagetable.c
> > > > > index 16e92a1c150b..6e45ec0a66fa 100644
> > > > > --- a/drivers/iommu/iommufd/hw_pagetable.c
> > > > > +++ b/drivers/iommu/iommufd/hw_pagetable.c
> > > > > @@ -43,6 +43,23 @@ int
> iommufd_hw_pagetable_enforce_cc(struct
> > > iommufd_hw_pagetable *hwpt)
> > > > >       return 0;
> > > > >   }
> > > > >
> > > > > +static int iommufd_hw_pagetable_link_ioas(struct
> > > iommufd_hw_pagetable *hwpt)
> > > > > +{
> > > > > +     int rc;
> > > > > +
> > > > > +     if (hwpt->parent)
> > > > > +             hwpt = hwpt->parent;
> > > > > +
> > > > > +     if (!list_empty(&hwpt->hwpt_item))
> > > > > +             return 0;
> > > >
> > > > What is above check for? Is it "the hwpt has already been inserted into
> > > > the hwpt list of its ioas in another place"?
> > > >
> > > > If so, is it possible that hwpt will be deleted from the list even when
> > > > this user hwpt is still linked to the ioas?
> > >
> > > It means that the hwpt is already linked to the ioas. And the
> > > hwpt_item can be only empty after a destroy().
> > >
> > > With that being said, after I think it through, perhaps Yi's
> > > previous change removing it might be better. So, it could be:
> > >
> > > -------------------------------------------------------------------------------
> > > +     /*
> > > +      * Only a parent hwpt needs to be linked to the IOAS. And a hwpt-
> > > >parent
> > > +      * must be linked to the IOAS already, when it's being allocated.
> > > +      */
> > >       if (hwpt->parent)
> > > -             hwpt = hwpt->parent;
> > > -
> > > -     if (!list_empty(&hwpt->hwpt_item))
> > >               return 0;
> > >
> > > -------------------------------------------------------------------------------
> > >
> > > I was concerned about the case where a device gets attached to
> > > the nested hwpt without staging at the parent hwpt first.
> >
> > I think I was convinced with the reason that this helper may be
> > called by allocation for both standalone s2 hwpt and the nested
> > hwpt. So my change was not enough. Yours covers both cases.
> >
> > > But,
> > > the link between the parent hwpt and the IOAS happened inside
> > > the allocation function now, not attach() any more.
> >
> > Not quite get. This helper is also called in the allocation path. Is
> > it? Anyhow, with Jason's comment, this helper may be reworked.
> > We can sync later on the next version.
> 
> We previously had this link_ioas() in attach() routine so we
> needed to make sure hwpt->parent got populated, because the
> device could be attached to an S1 HWPT directly. But now this
> is in the alloc() routine, so by the time an S1 HWPT is being
> allocated, an S2 HWPT must be allocated first and populated
> already.

Aha, yes. 😊

^ permalink raw reply	[flat|nested] 49+ messages in thread

* RE: [PATCH 01/12] iommu: Add new iommu op to create domains owned by userspace
  2023-03-10  0:56   ` Jason Gunthorpe
@ 2023-03-29 10:56     ` Liu, Yi L
  2023-04-13  0:44     ` Nicolin Chen
  1 sibling, 0 replies; 49+ messages in thread
From: Liu, Yi L @ 2023-03-29 10:56 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: joro, alex.williamson, Tian, Kevin, robin.murphy, baolu.lu,
	cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Friday, March 10, 2023 8:56 AM
> 
> On Thu, Mar 09, 2023 at 12:08:59AM -0800, Yi Liu wrote:
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > index 3ef84ee359d2..a269bc62a31c 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -229,6 +229,7 @@ struct iommu_iotlb_gather {
> >   *           after use. Return the data buffer if success, or ERR_PTR on
> >   *           failure.
> >   * @domain_alloc: allocate iommu domain
> > + * @domain_alloc_user: allocate user iommu domain
> >   * @probe_device: Add device to iommu driver handling
> >   * @release_device: Remove device from iommu driver handling
> >   * @probe_finalize: Do final setup work after the device is added to an
> IOMMU
> > @@ -266,6 +267,9 @@ struct iommu_ops {
> >
> >  	/* Domain allocation and freeing by the iommu driver */
> >  	struct iommu_domain *(*domain_alloc)(unsigned
> iommu_domain_type);
> > +	struct iommu_domain *(*domain_alloc_user)(struct device *dev,
> > +						  struct iommu_domain
> *parent,
> > +						  const void *user_data);
> 
> Since the kernel does the copy from user and manages the zero fill
> compat maybe this user_data have a union like Robin suggested.
>
> But yes, this is the idea.

Ok. so it's a union like the below, and in this patch may be only an empty
union can be added as the struct iommu_hwpt_intel_vtd and
struct iommu_hwpt_arm_smmuv3 would be added by the vendor nesting
patch series.

union iommu_hwpt_alloc_user_data {
	struct iommu_hwpt_intel_vtd vtd;
	struct iommu_hwpt_arm_smmuv3 smmuv3;
};

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 01/12] iommu: Add new iommu op to create domains owned by userspace
  2023-03-10  0:56   ` Jason Gunthorpe
  2023-03-29 10:56     ` Liu, Yi L
@ 2023-04-13  0:44     ` Nicolin Chen
  2023-04-13 11:37       ` Jason Gunthorpe
  1 sibling, 1 reply; 49+ messages in thread
From: Nicolin Chen @ 2023-04-13  0:44 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Yi Liu, joro, alex.williamson, kevin.tian, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest

Hi Jason,

On Thu, Mar 09, 2023 at 08:56:06PM -0400, Jason Gunthorpe wrote:
> On Thu, Mar 09, 2023 at 12:08:59AM -0800, Yi Liu wrote:
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > index 3ef84ee359d2..a269bc62a31c 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -229,6 +229,7 @@ struct iommu_iotlb_gather {
> >   *           after use. Return the data buffer if success, or ERR_PTR on
> >   *           failure.
> >   * @domain_alloc: allocate iommu domain
> > + * @domain_alloc_user: allocate user iommu domain
> >   * @probe_device: Add device to iommu driver handling
> >   * @release_device: Remove device from iommu driver handling
> >   * @probe_finalize: Do final setup work after the device is added to an IOMMU
> > @@ -266,6 +267,9 @@ struct iommu_ops {
> >  
> >  	/* Domain allocation and freeing by the iommu driver */
> >  	struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type);
> > +	struct iommu_domain *(*domain_alloc_user)(struct device *dev,
> > +						  struct iommu_domain *parent,
> > +						  const void *user_data);
> 
> Since the kernel does the copy from user and manages the zero fill
> compat maybe this user_data have a union like Robin suggested.
> 
> But yes, this is the idea.
> 
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

We pass in a read-only data to this ->domain_alloc_user() while
it also returns NULL on failure, matching ->domain_alloc(). So,
there seems to be no error feedback pathway from the driver to
user space.

Robin remarked in the SMMU series that an STE configuration can
fail. So, a proper error feedback is required for this callback
too.

To return a driver/HW specific error, I think we could define a
"u8 out_error" in the user_data structure. So, we probably need
a non-const pass-in here. What do you think?

Thanks
Nic

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 01/12] iommu: Add new iommu op to create domains owned by userspace
  2023-04-13  0:44     ` Nicolin Chen
@ 2023-04-13 11:37       ` Jason Gunthorpe
  2023-04-13 15:25         ` Nicolin Chen
  0 siblings, 1 reply; 49+ messages in thread
From: Jason Gunthorpe @ 2023-04-13 11:37 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Yi Liu, joro, alex.williamson, kevin.tian, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest

On Wed, Apr 12, 2023 at 05:44:04PM -0700, Nicolin Chen wrote:
> Hi Jason,
> 
> On Thu, Mar 09, 2023 at 08:56:06PM -0400, Jason Gunthorpe wrote:
> > On Thu, Mar 09, 2023 at 12:08:59AM -0800, Yi Liu wrote:
> > > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > > index 3ef84ee359d2..a269bc62a31c 100644
> > > --- a/include/linux/iommu.h
> > > +++ b/include/linux/iommu.h
> > > @@ -229,6 +229,7 @@ struct iommu_iotlb_gather {
> > >   *           after use. Return the data buffer if success, or ERR_PTR on
> > >   *           failure.
> > >   * @domain_alloc: allocate iommu domain
> > > + * @domain_alloc_user: allocate user iommu domain
> > >   * @probe_device: Add device to iommu driver handling
> > >   * @release_device: Remove device from iommu driver handling
> > >   * @probe_finalize: Do final setup work after the device is added to an IOMMU
> > > @@ -266,6 +267,9 @@ struct iommu_ops {
> > >  
> > >  	/* Domain allocation and freeing by the iommu driver */
> > >  	struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type);
> > > +	struct iommu_domain *(*domain_alloc_user)(struct device *dev,
> > > +						  struct iommu_domain *parent,
> > > +						  const void *user_data);
> > 
> > Since the kernel does the copy from user and manages the zero fill
> > compat maybe this user_data have a union like Robin suggested.
> > 
> > But yes, this is the idea.
> > 
> > Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> 
> We pass in a read-only data to this ->domain_alloc_user() while
> it also returns NULL on failure, matching ->domain_alloc(). So,
> there seems to be no error feedback pathway from the driver to
> user space.
> 
> Robin remarked in the SMMU series that an STE configuration can
> fail. So, a proper error feedback is required for this callback
> too.
> 
> To return a driver/HW specific error, I think we could define a
> "u8 out_error" in the user_data structure. So, we probably need
> a non-const pass-in here. What do you think?

What is wrong with err_ptr?

Jason

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 01/12] iommu: Add new iommu op to create domains owned by userspace
  2023-04-13 11:37       ` Jason Gunthorpe
@ 2023-04-13 15:25         ` Nicolin Chen
  0 siblings, 0 replies; 49+ messages in thread
From: Nicolin Chen @ 2023-04-13 15:25 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Yi Liu, joro, alex.williamson, kevin.tian, robin.murphy,
	baolu.lu, cohuck, eric.auger, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, iommu, linux-kernel, linux-kselftest

On Thu, Apr 13, 2023 at 08:37:14AM -0300, Jason Gunthorpe wrote:

> > > > @@ -266,6 +267,9 @@ struct iommu_ops {
> > > >  
> > > >  	/* Domain allocation and freeing by the iommu driver */
> > > >  	struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type);
> > > > +	struct iommu_domain *(*domain_alloc_user)(struct device *dev,
> > > > +						  struct iommu_domain *parent,
> > > > +						  const void *user_data);
> > > 
> > > Since the kernel does the copy from user and manages the zero fill
> > > compat maybe this user_data have a union like Robin suggested.
> > > 
> > > But yes, this is the idea.
> > > 
> > > Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> > 
> > We pass in a read-only data to this ->domain_alloc_user() while
> > it also returns NULL on failure, matching ->domain_alloc(). So,
> > there seems to be no error feedback pathway from the driver to
> > user space.
> > 
> > Robin remarked in the SMMU series that an STE configuration can
> > fail. So, a proper error feedback is required for this callback
> > too.
> > 
> > To return a driver/HW specific error, I think we could define a
> > "u8 out_error" in the user_data structure. So, we probably need
> > a non-const pass-in here. What do you think?
> 
> What is wrong with err_ptr?

I see. That could keep the "const" then. Will try that.

Thanks!
Nic

^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread, other threads:[~2023-04-13 15:25 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-09  8:08 [PATCH 00/12] iommufd: Add nesting infrastructure Yi Liu
2023-03-09  8:08 ` [PATCH 01/12] iommu: Add new iommu op to create domains owned by userspace Yi Liu
2023-03-10  0:56   ` Jason Gunthorpe
2023-03-29 10:56     ` Liu, Yi L
2023-04-13  0:44     ` Nicolin Chen
2023-04-13 11:37       ` Jason Gunthorpe
2023-04-13 15:25         ` Nicolin Chen
2023-03-09  8:09 ` [PATCH 02/12] iommu: Add nested domain support Yi Liu
2023-03-17 10:25   ` Tian, Kevin
2023-03-18  8:34     ` Baolu Lu
2023-03-09  8:09 ` [PATCH 03/12] iommufd/hw_pagetable: Use domain_alloc_user op for domain allocation Yi Liu
2023-03-10  1:17   ` Baolu Lu
2023-03-09  8:09 ` [PATCH 04/12] iommufd: Pass parent hwpt and user_data to iommufd_hw_pagetable_alloc() Yi Liu
2023-03-10  2:10   ` Baolu Lu
2023-03-10 17:49     ` Jason Gunthorpe
2023-03-17 10:23   ` Tian, Kevin
2023-03-20 12:47     ` Jason Gunthorpe
2023-03-21  1:25       ` Tian, Kevin
2023-03-09  8:09 ` [PATCH 05/12] iommufd/hw_pagetable: Do not populate user-managed hw_pagetables Yi Liu
2023-03-10  2:25   ` Baolu Lu
2023-03-10  6:50     ` Nicolin Chen
2023-03-10 12:51       ` Baolu Lu
2023-03-23  8:06       ` Liu, Yi L
2023-03-23  8:12         ` Nicolin Chen
2023-03-23  8:28           ` Liu, Yi L
2023-03-10 15:29   ` Jason Gunthorpe
2023-03-10 23:31     ` Nicolin Chen
2023-03-09  8:09 ` [PATCH 06/12] iommufd: IOMMU_HWPT_ALLOC allocation with user data Yi Liu
2023-03-10  3:02   ` Baolu Lu
2023-03-23  8:11     ` Liu, Yi L
2023-03-09  8:09 ` [PATCH 07/12] iommufd: Add IOMMU_HWPT_INVALIDATE Yi Liu
2023-03-10  3:15   ` Baolu Lu
2023-03-14  4:12     ` Liu, Yi L
2023-03-10 17:50   ` Jason Gunthorpe
2023-03-14  4:14     ` Liu, Yi L
2023-03-14  4:18     ` Liu, Yi L
2023-03-20 12:48       ` Jason Gunthorpe
2023-03-09  8:09 ` [PATCH 08/12] iommufd/device: Report supported hwpt_types Yi Liu
2023-03-10  3:30   ` Baolu Lu
2023-03-10  7:10     ` Nicolin Chen
2023-03-10  7:39       ` Liu, Yi L
2023-03-10  7:45         ` Nicolin Chen
2023-03-10 17:52           ` Jason Gunthorpe
2023-03-23  8:08             ` Liu, Yi L
2023-03-09  8:09 ` [PATCH 09/12] iommufd/selftest: Add domain_alloc_user() support in iommu mock Yi Liu
2023-03-09  8:09 ` [PATCH 10/12] iommufd/selftest: Add coverage for IOMMU_HWPT_ALLOC with user data Yi Liu
2023-03-09  8:09 ` [PATCH 11/12] iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op Yi Liu
2023-03-09  8:09 ` [PATCH 12/12] iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl Yi Liu
2023-03-09 14:02 ` [PATCH 00/12] iommufd: Add nesting infrastructure Baolu Lu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).