KVM Archive on lore.kernel.org
* [PATCH 1/2] iommu: Add iommu_group_get/set_domain()
@ 2020-06-27  3:15 Lu Baolu
  2020-06-27  3:15 ` [PATCH 2/2] vfio/type1: Update group->domain after aux attach and detach Lu Baolu
  2020-06-29 11:56 ` [PATCH 1/2] iommu: Add iommu_group_get/set_domain() Robin Murphy
  0 siblings, 2 replies; 10+ messages in thread
From: Lu Baolu @ 2020-06-27  3:15 UTC (permalink / raw)
  To: Joerg Roedel, Alex Williamson
  Cc: Cornelia Huck, Kevin Tian, Ashok Raj, Dave Jiang, Liu Yi L,
	iommu, linux-kernel, kvm, Lu Baolu

The hardware-assisted vfio mediated device is a use case of the iommu
aux-domain. The interactions between vfio/mdev and iommu during mdev
creation and passthrough are:

- Create a group for mdev with iommu_group_alloc();
- Add the device to the group with
        group = iommu_group_alloc();
        if (IS_ERR(group))
                return PTR_ERR(group);

        ret = iommu_group_add_device(group, &mdev->dev);
        if (!ret)
                dev_info(&mdev->dev, "MDEV: group_id = %d\n",
                         iommu_group_id(group));
- Allocate an aux-domain
	iommu_domain_alloc()
- Attach the aux-domain to the physical device from which the mdev is
  created.
	iommu_aux_attach_device()

In this whole process, an iommu group is allocated for the mdev and an
iommu domain is attached to the group, but group->domain remains NULL.
As a result, iommu_get_domain_for_dev() does not work for the mdev.

This adds iommu_group_get/set_domain() so that group->domain can be
managed whenever a domain is attached or detached through the aux-domain
APIs.
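
The symptom described above can be reproduced in a small userspace model.
This is only a sketch with toy structures, not the kernel definitions: the
model_* names and the aux_domain field are hypothetical, and the only
behaviour taken from the kernel is that iommu_get_domain_for_dev() reads
the domain through the device's group.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; not the real definitions. */
struct iommu_domain { int id; };
struct iommu_group { struct iommu_domain *domain; };
struct device {
	struct device *parent;
	struct iommu_group *group;
	struct iommu_domain *aux_domain; /* hypothetical aux attachment record */
};

/* Models iommu_get_domain_for_dev(): the lookup goes through the group. */
struct iommu_domain *model_get_domain_for_dev(const struct device *dev)
{
	if (!dev->group)
		return NULL;
	return dev->group->domain;
}

/* Models iommu_aux_attach_device(): only the parent device is touched. */
int model_aux_attach(struct iommu_domain *domain, struct device *parent)
{
	parent->aux_domain = domain;
	return 0;
}

/* Returns 1 if the mdev lookup yields NULL after a successful aux attach. */
int model_lookup_fails_after_aux_attach(void)
{
	struct iommu_domain dom = { .id = 1 };
	struct iommu_group mdev_group = { .domain = NULL };
	struct device parent = { NULL, NULL, NULL };
	struct device mdev = { &parent, &mdev_group, NULL };

	if (model_aux_attach(&dom, mdev.parent))
		return -1;
	return model_get_domain_for_dev(&mdev) == NULL;
}
```

In the model, the attach succeeds on the parent while the mdev's group is
never updated, so the lookup on the mdev has nothing to return.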

Fixes: 7bd50f0cd2fd5 ("vfio/type1: Add domain at(de)taching group helpers")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/iommu.c | 28 ++++++++++++++++++++++++++++
 include/linux/iommu.h | 14 ++++++++++++++
 2 files changed, 42 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index d43120eb1dc5..e2b665303d70 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -715,6 +715,34 @@ int iommu_group_set_name(struct iommu_group *group, const char *name)
 }
 EXPORT_SYMBOL_GPL(iommu_group_set_name);
 
+/**
+ * iommu_group_get_domain - get domain of a group
+ * @group: the group
+ *
+ * This is called to get the domain of a group.
+ */
+struct iommu_domain *iommu_group_get_domain(struct iommu_group *group)
+{
+	return group->domain;
+}
+EXPORT_SYMBOL_GPL(iommu_group_get_domain);
+
+/**
+ * iommu_group_set_domain - set domain for a group
+ * @group: the group
+ * @domain: iommu domain
+ *
+ * This is called to set the domain for a group. In aux-domain case, a domain
+ * might attach or detach to an iommu group through the aux-domain apis, but
+ * the group->domain doesn't get a chance to be updated there.
+ */
+void iommu_group_set_domain(struct iommu_group *group,
+			    struct iommu_domain *domain)
+{
+	group->domain = domain;
+}
+EXPORT_SYMBOL_GPL(iommu_group_set_domain);
+
 static int iommu_create_device_direct_mappings(struct iommu_group *group,
 					       struct device *dev)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 5f0b7859d2eb..ff88d548a870 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -496,6 +496,9 @@ extern void iommu_group_set_iommudata(struct iommu_group *group,
 				      void *iommu_data,
 				      void (*release)(void *iommu_data));
 extern int iommu_group_set_name(struct iommu_group *group, const char *name);
+extern struct iommu_domain *iommu_group_get_domain(struct iommu_group *group);
+extern void iommu_group_set_domain(struct iommu_group *group,
+				   struct iommu_domain *domain);
 extern int iommu_group_add_device(struct iommu_group *group,
 				  struct device *dev);
 extern void iommu_group_remove_device(struct device *dev);
@@ -840,6 +843,17 @@ static inline int iommu_group_set_name(struct iommu_group *group,
 	return -ENODEV;
 }
 
+static inline
+struct iommu_domain *iommu_group_get_domain(struct iommu_group *group)
+{
+	return NULL;
+}
+
+static inline void iommu_group_set_domain(struct iommu_group *group,
+					  struct iommu_domain *domain)
+{
+}
+
 static inline int iommu_group_add_device(struct iommu_group *group,
 					 struct device *dev)
 {
-- 
2.17.1



* [PATCH 2/2] vfio/type1: Update group->domain after aux attach and detach
  2020-06-27  3:15 [PATCH 1/2] iommu: Add iommu_group_get/set_domain() Lu Baolu
@ 2020-06-27  3:15 ` Lu Baolu
  2020-06-29 11:56 ` [PATCH 1/2] iommu: Add iommu_group_get/set_domain() Robin Murphy
  1 sibling, 0 replies; 10+ messages in thread
From: Lu Baolu @ 2020-06-27  3:15 UTC (permalink / raw)
  To: Joerg Roedel, Alex Williamson
  Cc: Cornelia Huck, Kevin Tian, Ashok Raj, Dave Jiang, Liu Yi L,
	iommu, linux-kernel, kvm, Lu Baolu

Update group->domain whenever an aux-domain is attached to or detached
from a mediated device. Without this change, iommu_get_domain_for_dev()
will be broken for mdev devices.

Fixes: 7bd50f0cd2fd5 ("vfio/type1: Add domain at(de)taching group helpers")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/vfio/vfio_iommu_type1.c | 37 ++++++++++++++++++++++++++++-----
 1 file changed, 32 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 5e556ac9102a..e0d8802ce0c9 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -1634,10 +1634,28 @@ static int vfio_mdev_attach_domain(struct device *dev, void *data)
 
 	iommu_device = vfio_mdev_get_iommu_device(dev);
 	if (iommu_device) {
-		if (iommu_dev_feature_enabled(iommu_device, IOMMU_DEV_FEAT_AUX))
-			return iommu_aux_attach_device(domain, iommu_device);
-		else
+		if (iommu_dev_feature_enabled(iommu_device,
+					      IOMMU_DEV_FEAT_AUX)) {
+			struct iommu_group *group = iommu_group_get(dev);
+			int ret;
+
+			if (!group)
+				return -EINVAL;
+
+			if (iommu_group_get_domain(group)) {
+				iommu_group_put(group);
+				return -EBUSY;
+			}
+
+			ret = iommu_aux_attach_device(domain, iommu_device);
+			if (!ret)
+				iommu_group_set_domain(group, domain);
+
+			iommu_group_put(group);
+			return ret;
+		} else {
 			return iommu_attach_device(domain, iommu_device);
+		}
 	}
 
 	return -EINVAL;
@@ -1650,10 +1668,19 @@ static int vfio_mdev_detach_domain(struct device *dev, void *data)
 
 	iommu_device = vfio_mdev_get_iommu_device(dev);
 	if (iommu_device) {
-		if (iommu_dev_feature_enabled(iommu_device, IOMMU_DEV_FEAT_AUX))
+		if (iommu_dev_feature_enabled(iommu_device,
+					      IOMMU_DEV_FEAT_AUX)) {
+			struct iommu_group *group;
+
 			iommu_aux_detach_device(domain, iommu_device);
-		else
+			group = iommu_group_get(dev);
+			if (group) {
+				iommu_group_set_domain(group, NULL);
+				iommu_group_put(group);
+			}
+		} else {
 			iommu_detach_device(domain, iommu_device);
+		}
 	}
 
 	return 0;
-- 
2.17.1



* Re: [PATCH 1/2] iommu: Add iommu_group_get/set_domain()
  2020-06-27  3:15 [PATCH 1/2] iommu: Add iommu_group_get/set_domain() Lu Baolu
  2020-06-27  3:15 ` [PATCH 2/2] vfio/type1: Update group->domain after aux attach and detach Lu Baolu
@ 2020-06-29 11:56 ` Robin Murphy
  2020-06-30  1:03   ` Lu Baolu
  1 sibling, 1 reply; 10+ messages in thread
From: Robin Murphy @ 2020-06-29 11:56 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel, Alex Williamson
  Cc: Kevin Tian, Dave Jiang, Ashok Raj, kvm, Cornelia Huck,
	linux-kernel, iommu

On 2020-06-27 04:15, Lu Baolu wrote:
> The hardware assistant vfio mediated device is a use case of iommu
> aux-domain. The interactions between vfio/mdev and iommu during mdev
> creation and passthr are:
> 
> - Create a group for mdev with iommu_group_alloc();
> - Add the device to the group with
>          group = iommu_group_alloc();
>          if (IS_ERR(group))
>                  return PTR_ERR(group);
> 
>          ret = iommu_group_add_device(group, &mdev->dev);
>          if (!ret)
>                  dev_info(&mdev->dev, "MDEV: group_id = %d\n",
>                           iommu_group_id(group));
> - Allocate an aux-domain
> 	iommu_domain_alloc()
> - Attach the aux-domain to the physical device from which the mdev is
>    created.
> 	iommu_aux_attach_device()
> 
> In the whole process, an iommu group was allocated for the mdev and an
> iommu domain was attached to the group, but the group->domain leaves
> NULL. As the result, iommu_get_domain_for_dev() doesn't work anymore.
> 
> This adds iommu_group_get/set_domain() so that group->domain could be
> managed whenever a domain is attached or detached through the aux-domain
> api's.

Letting external callers poke around directly in the internals of 
iommu_group doesn't look right to me.

If a regular device is attached to one or more aux domains for PASID 
use, iommu_get_domain_for_dev() is still going to return the primary 
domain, so why should it be expected to behave differently for mediated 
devices? AFAICS it's perfectly legitimate to have no primary domain if 
traffic-without-PASID is invalid.

Robin.

> Fixes: 7bd50f0cd2fd5 ("vfio/type1: Add domain at(de)taching group helpers")
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>   drivers/iommu/iommu.c | 28 ++++++++++++++++++++++++++++
>   include/linux/iommu.h | 14 ++++++++++++++
>   2 files changed, 42 insertions(+)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index d43120eb1dc5..e2b665303d70 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -715,6 +715,34 @@ int iommu_group_set_name(struct iommu_group *group, const char *name)
>   }
>   EXPORT_SYMBOL_GPL(iommu_group_set_name);
>   
> +/**
> + * iommu_group_get_domain - get domain of a group
> + * @group: the group
> + *
> + * This is called to get the domain of a group.
> + */
> +struct iommu_domain *iommu_group_get_domain(struct iommu_group *group)
> +{
> +	return group->domain;
> +}
> +EXPORT_SYMBOL_GPL(iommu_group_get_domain);
> +
> +/**
> + * iommu_group_set_domain - set domain for a group
> + * @group: the group
> + * @domain: iommu domain
> + *
> + * This is called to set the domain for a group. In aux-domain case, a domain
> + * might attach or detach to an iommu group through the aux-domain apis, but
> + * the group->domain doesn't get a chance to be updated there.
> + */
> +void iommu_group_set_domain(struct iommu_group *group,
> +			    struct iommu_domain *domain)
> +{
> +	group->domain = domain;
> +}
> +EXPORT_SYMBOL_GPL(iommu_group_set_domain);
> +
>   static int iommu_create_device_direct_mappings(struct iommu_group *group,
>   					       struct device *dev)
>   {
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 5f0b7859d2eb..ff88d548a870 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -496,6 +496,9 @@ extern void iommu_group_set_iommudata(struct iommu_group *group,
>   				      void *iommu_data,
>   				      void (*release)(void *iommu_data));
>   extern int iommu_group_set_name(struct iommu_group *group, const char *name);
> +extern struct iommu_domain *iommu_group_get_domain(struct iommu_group *group);
> +extern void iommu_group_set_domain(struct iommu_group *group,
> +				   struct iommu_domain *domain);
>   extern int iommu_group_add_device(struct iommu_group *group,
>   				  struct device *dev);
>   extern void iommu_group_remove_device(struct device *dev);
> @@ -840,6 +843,17 @@ static inline int iommu_group_set_name(struct iommu_group *group,
>   	return -ENODEV;
>   }
>   
> +static inline
> +struct iommu_domain *iommu_group_get_domain(struct iommu_group *group)
> +{
> +	return NULL;
> +}
> +
> +static inline void iommu_group_set_domain(struct iommu_group *group,
> +					  struct iommu_domain *domain)
> +{
> +}
> +
>   static inline int iommu_group_add_device(struct iommu_group *group,
>   					 struct device *dev)
>   {
> 


* Re: [PATCH 1/2] iommu: Add iommu_group_get/set_domain()
  2020-06-29 11:56 ` [PATCH 1/2] iommu: Add iommu_group_get/set_domain() Robin Murphy
@ 2020-06-30  1:03   ` Lu Baolu
  2020-06-30 16:51     ` Robin Murphy
  0 siblings, 1 reply; 10+ messages in thread
From: Lu Baolu @ 2020-06-30  1:03 UTC (permalink / raw)
  To: Robin Murphy, Joerg Roedel, Alex Williamson
  Cc: baolu.lu, Kevin Tian, Dave Jiang, Ashok Raj, kvm, Cornelia Huck,
	linux-kernel, iommu

Hi Robin,

On 6/29/20 7:56 PM, Robin Murphy wrote:
> On 2020-06-27 04:15, Lu Baolu wrote:
>> The hardware assistant vfio mediated device is a use case of iommu
>> aux-domain. The interactions between vfio/mdev and iommu during mdev
>> creation and passthr are:
>>
>> - Create a group for mdev with iommu_group_alloc();
>> - Add the device to the group with
>>          group = iommu_group_alloc();
>>          if (IS_ERR(group))
>>                  return PTR_ERR(group);
>>
>>          ret = iommu_group_add_device(group, &mdev->dev);
>>          if (!ret)
>>                  dev_info(&mdev->dev, "MDEV: group_id = %d\n",
>>                           iommu_group_id(group));
>> - Allocate an aux-domain
>>     iommu_domain_alloc()
>> - Attach the aux-domain to the physical device from which the mdev is
>>    created.
>>     iommu_aux_attach_device()
>>
>> In the whole process, an iommu group was allocated for the mdev and an
>> iommu domain was attached to the group, but the group->domain leaves
>> NULL. As the result, iommu_get_domain_for_dev() doesn't work anymore.
>>
>> This adds iommu_group_get/set_domain() so that group->domain could be
>> managed whenever a domain is attached or detached through the aux-domain
>> api's.
> 
> Letting external callers poke around directly in the internals of 
> iommu_group doesn't look right to me.

Unfortunately, it seems that the vfio iommu abstraction is deeply bound
to the IOMMU subsystem. We can easily find other examples:

iommu_group_get/set_iommudata()
iommu_group_get/set_name()
...

> 
> If a regular device is attached to one or more aux domains for PASID 
> use, iommu_get_domain_for_dev() is still going to return the primary 
> domain, so why should it be expected to behave differently for mediated

Unlike the normal device attach, two devices are involved when it comes
to aux-domain.

- Parent physical device - this might be, for example, a PCIe device
with PASID support, so it is able to tag DMA transfers originating from
a subset of itself with a unique PASID. The device driver is hence able
to wrap this subset into an isolated:

- Mediated device - a fake device created by the device driver mentioned
above.

Yes. Everything you mentioned is right for the parent device. But for
the mediated device, iommu_get_domain_for_dev() doesn't work even though
it has a valid iommu_group and iommu_domain.

iommu_get_domain_for_dev() is a necessary interface for device drivers
which want to support aux-domain. For example,

	struct iommu_domain *domain;
	struct device *dev = mdev_dev(mdev);
	unsigned long pasid;

	domain = iommu_get_domain_for_dev(dev);
	if (!domain)
		return -ENODEV;

	pasid = iommu_aux_get_pasid(domain, dev->parent);
	if (pasid == IOASID_INVALID)
		return -EINVAL;

	/* Program the device context with the PASID value */
	....

Without this fix, iommu_get_domain_for_dev() always returns NULL and the
device driver has no means to support aux-domain.

Best regards,
baolu

> devices? AFAICS it's perfectly legitimate to have no primary domain if 
> traffic-without-PASID is invalid.
> 
> Robin.


* Re: [PATCH 1/2] iommu: Add iommu_group_get/set_domain()
  2020-06-30  1:03   ` Lu Baolu
@ 2020-06-30 16:51     ` Robin Murphy
  2020-07-01  7:32       ` Lu Baolu
  0 siblings, 1 reply; 10+ messages in thread
From: Robin Murphy @ 2020-06-30 16:51 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel, Alex Williamson
  Cc: Kevin Tian, Dave Jiang, Ashok Raj, kvm, Cornelia Huck,
	linux-kernel, iommu

On 2020-06-30 02:03, Lu Baolu wrote:
> Hi Robin,
> 
> On 6/29/20 7:56 PM, Robin Murphy wrote:
>> On 2020-06-27 04:15, Lu Baolu wrote:
>>> The hardware assistant vfio mediated device is a use case of iommu
>>> aux-domain. The interactions between vfio/mdev and iommu during mdev
>>> creation and passthr are:
>>>
>>> - Create a group for mdev with iommu_group_alloc();
>>> - Add the device to the group with
>>>          group = iommu_group_alloc();
>>>          if (IS_ERR(group))
>>>                  return PTR_ERR(group);
>>>
>>>          ret = iommu_group_add_device(group, &mdev->dev);
>>>          if (!ret)
>>>                  dev_info(&mdev->dev, "MDEV: group_id = %d\n",
>>>                           iommu_group_id(group));
>>> - Allocate an aux-domain
>>>     iommu_domain_alloc()
>>> - Attach the aux-domain to the physical device from which the mdev is
>>>    created.
>>>     iommu_aux_attach_device()
>>>
>>> In the whole process, an iommu group was allocated for the mdev and an
>>> iommu domain was attached to the group, but the group->domain leaves
>>> NULL. As the result, iommu_get_domain_for_dev() doesn't work anymore.
>>>
>>> This adds iommu_group_get/set_domain() so that group->domain could be
>>> managed whenever a domain is attached or detached through the aux-domain
>>> api's.
>>
>> Letting external callers poke around directly in the internals of 
>> iommu_group doesn't look right to me.
> 
> Unfortunately, it seems that the vifo iommu abstraction is deeply bound
> to the IOMMU subsystem. We can easily find other examples:
> 
> iommu_group_get/set_iommudata()
> iommu_group_get/set_name()
> ...

Sure, but those are ways for users of a group to attach useful 
information of their own to it, that doesn't matter to the IOMMU 
subsystem itself. The interface you've proposed gives callers rich new 
opportunities to fundamentally break correct operation of the API:

	dom = iommu_domain_alloc();
	iommu_attach_group(dom, grp);
	...
	iommu_group_set_domain(grp, NULL);
	// oops, leaked and can't ever detach properly now

or perhaps:

	grp = iommu_group_alloc();
	iommu_group_add_device(grp, dev);
	iommu_group_set_domain(grp, dom);
	...
	iommu_detach_group(dom, grp);
	// oops, IOMMU driver might not handle this
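
The first breakage above can be replayed in a toy userspace model. This
is a sketch under stated assumptions, not kernel code: the model_* names
are hypothetical, and hw_domain is an invented field standing in for
whatever state the IOMMU driver programmed at attach time.

```c
#include <assert.h>
#include <stddef.h>

struct iommu_domain { int id; };

struct iommu_group {
	struct iommu_domain *domain;    /* core bookkeeping */
	struct iommu_domain *hw_domain; /* hypothetical: what the driver programmed */
};

/* Models iommu_attach_group(): driver state and bookkeeping move together. */
int model_attach_group(struct iommu_domain *dom, struct iommu_group *grp)
{
	if (grp->domain)
		return -1; /* already attached */
	grp->hw_domain = dom;
	grp->domain = dom;
	return 0;
}

/* Models the proposed iommu_group_set_domain(): bookkeeping only. */
void model_group_set_domain(struct iommu_group *grp, struct iommu_domain *dom)
{
	grp->domain = dom;
}

/* Returns 1 when the bookkeeping matches the modelled driver state. */
int model_group_consistent(const struct iommu_group *grp)
{
	return grp->domain == grp->hw_domain;
}

/* Replays the first snippet: attach through the API, then clear by hand. */
int model_replay_leak(void)
{
	struct iommu_domain dom = { .id = 1 };
	struct iommu_group grp = { NULL, NULL };

	if (model_attach_group(&dom, &grp))
		return -1;
	model_group_set_domain(&grp, NULL);
	return model_group_consistent(&grp);
}
```

In the model, the group reports no domain after the manual clear while
the modelled driver state still points at the old one, which is the kind
of divergence the snippets above warn about.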

>> If a regular device is attached to one or more aux domains for PASID 
>> use, iommu_get_domain_for_dev() is still going to return the primary 
>> domain, so why should it be expected to behave differently for mediated
> 
> Unlike the normal device attach, we will encounter two devices when it
> comes to aux-domain.
> 
> - Parent physical device - this might be, for example, a PCIe device
> with PASID feature support, hence it is able to tag an unique PASID
> for DMA transfers originated from its subset. The device driver hence
> is able to wrapper this subset into an isolated:
> 
> - Mediated device - a fake device created by the device driver mentioned
> above.
> 
> Yes. All you mentioned are right for the parent device. But for mediated
> device, iommu_get_domain_for_dev() doesn't work even it has an valid
> iommu_group and iommu_domain.
> 
> iommu_get_domain_for_dev() is a necessary interface for device drivers
> which want to support aux-domain. For example,

Only if they want to follow this very specific notion of using made-up 
devices and groups to represent aux attachments. Even if a driver 
managing its own aux domains entirely privately does create child 
devices for them, it's not like it can't keep its domain pointers in 
drvdata if it wants to ;)
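
A minimal sketch of the drvdata alternative, written as a userspace model
rather than kernel code. model_set_drvdata()/model_get_drvdata() stand in
for dev_set_drvdata()/dev_get_drvdata(); struct child_state and the other
model_* helpers are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct iommu_domain { int pasid; };

struct device {
	struct device *parent;
	void *driver_data;
};

/* Userspace stand-ins for dev_set_drvdata()/dev_get_drvdata(). */
void model_set_drvdata(struct device *dev, void *data)
{
	dev->driver_data = data;
}

void *model_get_drvdata(const struct device *dev)
{
	return dev->driver_data;
}

/* Hypothetical per-child state kept privately by the parent driver. */
struct child_state {
	struct iommu_domain *aux_domain;
};

/* Record the aux domain on the child when the driver creates it. */
void model_child_init(struct device *child, struct device *parent,
		      struct child_state *state, struct iommu_domain *dom)
{
	child->parent = parent;
	state->aux_domain = dom;
	model_set_drvdata(child, state);
}

/* Retrieve the domain without consulting any iommu_group. */
struct iommu_domain *model_child_domain(const struct device *child)
{
	struct child_state *state = model_get_drvdata(child);

	return state ? state->aux_domain : NULL;
}
```

With this arrangement the driver that performed the aux attach is also
the one that remembers the domain, so nothing outside the IOMMU core has
to write to the group.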

Let's not conflate the current implementation of vfio_mdev with the 
general concepts involved here.

>            struct iommu_domain *domain;
>            struct device *dev = mdev_dev(mdev);
>        unsigned long pasid;
> 
>            domain = iommu_get_domain_for_dev(dev);
>            if (!domain)
>                    return -ENODEV;
> 
>            pasid = iommu_aux_get_pasid(domain, dev->parent);
>        if (pasid == IOASID_INVALID)
>            return -EINVAL;
> 
>        /* Program the device context with the PASID value */
>        ....
> 
> Without this fix, iommu_get_domain_for_dev() always returns NULL and the
> device driver has no means to support aux-domain.

So either the IOMMU API itself is missing the ability to do the right 
thing internally, or the mdev layer isn't using it appropriately. Either 
way, simply punching holes in the API for mdev to hack around its own 
mess doesn't seem like the best thing to do.

The initial impression I got was that it's implicitly assumed here that 
the mdev itself is attached to exactly one aux domain and nothing else, 
at which point I would wonder why it's using aux at all, but are you 
saying that in fact no attach happens with the mdev group either way, 
only to the parent device?

I'll admit I'm not hugely familiar with any of this, but it seems to me 
that the logical flow should be:

	- allocate domain
	- attach as aux to parent
	- retrieve aux domain PASID
	- create mdev child based on PASID
	- attach mdev to domain (normally)
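
The five steps above can be sketched as a toy userspace model. Everything
here is an assumption made for illustration: the model_* functions, the
aux_domain and pasid fields, and the validation done in the final attach
are hypothetical and do not correspond to an existing kernel interface.

```c
#include <assert.h>
#include <stddef.h>

struct iommu_domain { int pasid; };
struct iommu_group { struct iommu_domain *domain; };
struct device {
	struct device *parent;
	struct iommu_group *group;
	struct iommu_domain *aux_domain; /* hypothetical aux attachment record */
	int pasid;                       /* hypothetical PASID the mdev represents */
};

/* Step 2: attach the domain as aux to the parent. */
int model_aux_attach(struct iommu_domain *dom, struct device *parent)
{
	parent->aux_domain = dom;
	return 0;
}

/* Step 3: retrieve the PASID of an aux domain attached to the parent. */
int model_aux_get_pasid(const struct iommu_domain *dom,
			const struct device *parent)
{
	return parent->aux_domain == dom ? dom->pasid : -1;
}

/* Step 5: a normal attach, validated against the parent's aux attachment. */
int model_attach_mdev(struct iommu_domain *dom, struct device *mdev)
{
	if (!mdev->parent || !mdev->group)
		return -1;
	if (mdev->parent->aux_domain != dom || mdev->pasid != dom->pasid)
		return -1;
	mdev->group->domain = dom;
	return 0;
}

/* Runs the whole flow; returns 1 if the mdev's group ends up with the domain. */
int model_flow(void)
{
	struct iommu_domain dom = { .pasid = 42 };   /* step 1: allocate domain */
	struct iommu_group grp = { NULL };
	struct device parent = { NULL, NULL, NULL, 0 };
	struct device mdev = { NULL, NULL, NULL, 0 };

	if (model_aux_attach(&dom, &parent))
		return -1;
	mdev.pasid = model_aux_get_pasid(&dom, &parent);
	if (mdev.pasid < 0)
		return -1;
	mdev.parent = &parent;                       /* step 4: create mdev child */
	mdev.group = &grp;
	if (model_attach_mdev(&dom, &mdev))
		return -1;
	return grp.domain == &dom;
}
```

The point of the ordering is that the group's domain is set by an attach
call that can validate the request, rather than by a raw setter.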

Of course that might require giving the IOMMU API a proper first-class 
notion of mediated devices, such that it knows the mdev represents the 
PASID, and can recognise the mdev attach is equivalent to the earlier 
parent aux attach so not just blindly hand it down to an IOMMU driver 
that's never heard of this new device before. Or perhaps the IOMMU 
drivers do their own bookkeeping for the mdev bus, such that they do 
handle the attach call, and just validate it internally based on the 
associated parent device and PASID. Either way, the inside maintains 
self-consistency and from the outside it looks like standard API usage 
without nasty hacks.

I'm pretty sure I've heard suggestions of using mediated devices beyond 
VFIO (e.g. within the kernel itself), so chances are this is a direction 
that we'll have to take at some point anyway.

And, that said, even if people do want an immediate quick fix regardless 
of technical debt, I'd still be a lot happier to see 
iommu_group_set_domain() lightly respun as iommu_attach_mdev() ;)

Robin.


* Re: [PATCH 1/2] iommu: Add iommu_group_get/set_domain()
  2020-06-30 16:51     ` Robin Murphy
@ 2020-07-01  7:32       ` Lu Baolu
  2020-07-01 12:18         ` Robin Murphy
  0 siblings, 1 reply; 10+ messages in thread
From: Lu Baolu @ 2020-07-01  7:32 UTC (permalink / raw)
  To: Robin Murphy, Joerg Roedel, Alex Williamson
  Cc: baolu.lu, Kevin Tian, Dave Jiang, Ashok Raj, kvm, Cornelia Huck,
	linux-kernel, iommu

Hi Robin,

On 2020/7/1 0:51, Robin Murphy wrote:
> On 2020-06-30 02:03, Lu Baolu wrote:
>> Hi Robin,
>>
>> On 6/29/20 7:56 PM, Robin Murphy wrote:
>>> On 2020-06-27 04:15, Lu Baolu wrote:
>>>> The hardware assistant vfio mediated device is a use case of iommu
>>>> aux-domain. The interactions between vfio/mdev and iommu during mdev
>>>> creation and passthr are:
>>>>
>>>> - Create a group for mdev with iommu_group_alloc();
>>>> - Add the device to the group with
>>>>          group = iommu_group_alloc();
>>>>          if (IS_ERR(group))
>>>>                  return PTR_ERR(group);
>>>>
>>>>          ret = iommu_group_add_device(group, &mdev->dev);
>>>>          if (!ret)
>>>>                  dev_info(&mdev->dev, "MDEV: group_id = %d\n",
>>>>                           iommu_group_id(group));
>>>> - Allocate an aux-domain
>>>>     iommu_domain_alloc()
>>>> - Attach the aux-domain to the physical device from which the mdev is
>>>>    created.
>>>>     iommu_aux_attach_device()
>>>>
>>>> In the whole process, an iommu group was allocated for the mdev and an
>>>> iommu domain was attached to the group, but the group->domain leaves
>>>> NULL. As the result, iommu_get_domain_for_dev() doesn't work anymore.
>>>>
>>>> This adds iommu_group_get/set_domain() so that group->domain could be
>>>> managed whenever a domain is attached or detached through the 
>>>> aux-domain
>>>> api's.
>>>
>>> Letting external callers poke around directly in the internals of 
>>> iommu_group doesn't look right to me.
>>
>> Unfortunately, it seems that the vifo iommu abstraction is deeply bound
>> to the IOMMU subsystem. We can easily find other examples:
>>
>> iommu_group_get/set_iommudata()
>> iommu_group_get/set_name()
>> ...
> 
> Sure, but those are ways for users of a group to attach useful 
> information of their own to it, that doesn't matter to the IOMMU 
> subsystem itself. The interface you've proposed gives callers rich new 
> opportunities to fundamentally break correct operation of the API:
> 
>      dom = iommu_domain_alloc();
>      iommu_attach_group(dom, grp);
>      ...
>      iommu_group_set_domain(grp, NULL);
>      // oops, leaked and can't ever detach properly now
> 
> or perhaps:
> 
>      grp = iommu_group_alloc();
>      iommu_group_add_device(grp, dev);
>      iommu_group_set_domain(grp, dom);
>      ...
>      iommu_detach_group(dom, grp);
>      // oops, IOMMU driver might not handle this
> 
>>> If a regular device is attached to one or more aux domains for PASID 
>>> use, iommu_get_domain_for_dev() is still going to return the primary 
>>> domain, so why should it be expected to behave differently for mediated
>>
>> Unlike the normal device attach, we will encounter two devices when it
>> comes to aux-domain.
>>
>> - Parent physical device - this might be, for example, a PCIe device
>> with PASID feature support, hence it is able to tag an unique PASID
>> for DMA transfers originated from its subset. The device driver hence
>> is able to wrapper this subset into an isolated:
>>
>> - Mediated device - a fake device created by the device driver mentioned
>> above.
>>
>> Yes. All you mentioned are right for the parent device. But for mediated
>> device, iommu_get_domain_for_dev() doesn't work even it has an valid
>> iommu_group and iommu_domain.
>>
>> iommu_get_domain_for_dev() is a necessary interface for device drivers
>> which want to support aux-domain. For example,
> 
> Only if they want to follow this very specific notion of using made-up 
> devices and groups to represent aux attachments. Even if a driver 
> managing its own aux domains entirely privately does create child 
> devices for them, it's not like it can't keep its domain pointers in 
> drvdata if it wants to ;)
> 
> Let's not conflate the current implementation of vfio_mdev with the 
> general concepts involved here.
> 
>>            struct iommu_domain *domain;
>>            struct device *dev = mdev_dev(mdev);
>>        unsigned long pasid;
>>
>>            domain = iommu_get_domain_for_dev(dev);
>>            if (!domain)
>>                    return -ENODEV;
>>
>>            pasid = iommu_aux_get_pasid(domain, dev->parent);
>>        if (pasid == IOASID_INVALID)
>>            return -EINVAL;
>>
>>        /* Program the device context with the PASID value */
>>        ....
>>
>> Without this fix, iommu_get_domain_for_dev() always returns NULL and the
>> device driver has no means to support aux-domain.
> 
> So either the IOMMU API itself is missing the ability to do the right 
> thing internally, or the mdev layer isn't using it appropriately. Either 
> way, simply punching holes in the API for mdev to hack around its own 
> mess doesn't seem like the best thing to do.
> 
> The initial impression I got was that it's implicitly assumed here that 
> the mdev itself is attached to exactly one aux domain and nothing else, 
> at which point I would wonder why it's using aux at all, but are you 
> saying that in fact no attach happens with the mdev group either way, 
> only to the parent device?
> 
> I'll admit I'm not hugely familiar with any of this, but it seems to me 
> that the logical flow should be:
> 
>      - allocate domain
>      - attach as aux to parent
>      - retrieve aux domain PASID
>      - create mdev child based on PASID
>      - attach mdev to domain (normally)
> 
> Of course that might require giving the IOMMU API a proper first-class 
> notion of mediated devices, such that it knows the mdev represents the 
> PASID, and can recognise the mdev attach is equivalent to the earlier 
> parent aux attach so not just blindly hand it down to an IOMMU driver 
> that's never heard of this new device before. Or perhaps the IOMMU 
> drivers do their own bookkeeping for the mdev bus, such that they do 
> handle the attach call, and just validate it internally based on the 
> associated parent device and PASID. Either way, the inside maintains 
> self-consistency and from the outside it looks like standard API usage 
> without nasty hacks.
> 
> I'm pretty sure I've heard suggestions of using mediated devices beyond 
> VFIO (e.g. within the kernel itself), so chances are this is a direction 
> that we'll have to take at some point anyway.
> 
> And, that said, even if people do want an immediate quick fix regardless 
> of technical debt, I'd still be a lot happier to see 
> iommu_group_set_domain() lightly respun as iommu_attach_mdev() ;)

I get your point and agree with your concerns.

To maintain the relationship between the mdev's iommu_group and
iommu_domain, how about extending the existing aux attach API below
int iommu_aux_attach_device(struct iommu_domain *domain,
			    struct device *dev)

by adding the mdev's iommu_group?

int iommu_aux_attach_device(struct iommu_domain *domain,
			    struct device *dev,
			    struct iommu_group *group)

And, in iommu_aux_attach_device(), we would:
  - require that @group contains only a single device;
  - require that @group has no domain attached yet;
  - set @domain as the domain of @group,

just like what we've done in iommu_attach_device().
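
A rough userspace model of the checks listed above, assuming the extended
three-argument form. This is a sketch only: the ndevices and aux_domain
fields, the model_ prefix, and the choice of -EINVAL and -EBUSY as error
codes are illustrative assumptions, not part of any posted patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct iommu_domain { int id; };

struct iommu_group {
	struct iommu_domain *domain;
	int ndevices; /* hypothetical device count for the group */
};

struct device {
	struct iommu_domain *aux_domain; /* hypothetical aux attachment record */
};

/* Models the proposed iommu_aux_attach_device(domain, dev, group). */
int model_aux_attach_device(struct iommu_domain *domain, struct device *dev,
			    struct iommu_group *group)
{
	/* @group must contain exactly one device. */
	if (!group || group->ndevices != 1)
		return -EINVAL;

	/* @group must not have a domain attached yet. */
	if (group->domain)
		return -EBUSY;

	dev->aux_domain = domain; /* stands in for the driver's aux attach */
	group->domain = domain;   /* bookkeeping updated inside the API */
	return 0;
}
```

Because the group update happens inside the attach call, a caller cannot
set or clear the group's domain independently of a real attach.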

Any thoughts?

Best regards,
baolu


* Re: [PATCH 1/2] iommu: Add iommu_group_get/set_domain()
  2020-07-01  7:32       ` Lu Baolu
@ 2020-07-01 12:18         ` Robin Murphy
  2020-07-02  1:32           ` Lu Baolu
  2020-07-02  2:36           ` Lu Baolu
  0 siblings, 2 replies; 10+ messages in thread
From: Robin Murphy @ 2020-07-01 12:18 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel, Alex Williamson
  Cc: Kevin Tian, Dave Jiang, Ashok Raj, kvm, Cornelia Huck,
	linux-kernel, iommu

On 2020-07-01 08:32, Lu Baolu wrote:
> Hi Robin,
> 
> On 2020/7/1 0:51, Robin Murphy wrote:
>> On 2020-06-30 02:03, Lu Baolu wrote:
>>> Hi Robin,
>>>
>>> On 6/29/20 7:56 PM, Robin Murphy wrote:
>>>> On 2020-06-27 04:15, Lu Baolu wrote:
>>>>> The hardware assistant vfio mediated device is a use case of iommu
>>>>> aux-domain. The interactions between vfio/mdev and iommu during mdev
>>>>> creation and passthr are:
>>>>>
>>>>> - Create a group for mdev with iommu_group_alloc();
>>>>> - Add the device to the group with
>>>>>          group = iommu_group_alloc();
>>>>>          if (IS_ERR(group))
>>>>>                  return PTR_ERR(group);
>>>>>
>>>>>          ret = iommu_group_add_device(group, &mdev->dev);
>>>>>          if (!ret)
>>>>>                  dev_info(&mdev->dev, "MDEV: group_id = %d\n",
>>>>>                           iommu_group_id(group));
>>>>> - Allocate an aux-domain
>>>>>     iommu_domain_alloc()
>>>>> - Attach the aux-domain to the physical device from which the mdev is
>>>>>    created.
>>>>>     iommu_aux_attach_device()
>>>>>
>>>>> In the whole process, an iommu group was allocated for the mdev and an
>>>>> iommu domain was attached to the group, but the group->domain leaves
>>>>> NULL. As the result, iommu_get_domain_for_dev() doesn't work anymore.
>>>>>
>>>>> This adds iommu_group_get/set_domain() so that group->domain could be
>>>>> managed whenever a domain is attached or detached through the 
>>>>> aux-domain
>>>>> api's.
>>>>
>>>> Letting external callers poke around directly in the internals of 
>>>> iommu_group doesn't look right to me.
>>>
>>> Unfortunately, it seems that the vifo iommu abstraction is deeply bound
>>> to the IOMMU subsystem. We can easily find other examples:
>>>
>>> iommu_group_get/set_iommudata()
>>> iommu_group_get/set_name()
>>> ...
>>
>> Sure, but those are ways for users of a group to attach useful 
>> information of their own to it, that doesn't matter to the IOMMU 
>> subsystem itself. The interface you've proposed gives callers rich new 
>> opportunities to fundamentally break correct operation of the API:
>>
>>      dom = iommu_domain_alloc();
>>      iommu_attach_group(dom, grp);
>>      ...
>>      iommu_group_set_domain(grp, NULL);
>>      // oops, leaked and can't ever detach properly now
>>
>> or perhaps:
>>
>>      grp = iommu_group_alloc();
>>      iommu_group_add_device(grp, dev);
>>      iommu_group_set_domain(grp, dom);
>>      ...
>>      iommu_detach_group(dom, grp);
>>      // oops, IOMMU driver might not handle this
>>
>>>> If a regular device is attached to one or more aux domains for PASID 
>>>> use, iommu_get_domain_for_dev() is still going to return the primary 
>>>> domain, so why should it be expected to behave differently for mediated
>>>
>>> Unlike a normal device attach, we encounter two devices when it
>>> comes to aux-domain.
>>>
>>> - Parent physical device - this might be, for example, a PCIe device
>>> with PASID feature support, hence it is able to tag a unique PASID
>>> for DMA transfers originating from a subset of it. The device driver
>>> is hence able to wrap this subset into an isolated:
>>>
>>> - Mediated device - a fake device created by the device driver mentioned
>>> above.
>>>
>>> Yes. Everything you mentioned is right for the parent device. But for a
>>> mediated device, iommu_get_domain_for_dev() doesn't work even though it
>>> has a valid iommu_group and iommu_domain.
>>>
>>> iommu_get_domain_for_dev() is a necessary interface for device drivers
>>> which want to support aux-domain. For example,
>>
>> Only if they want to follow this very specific notion of using made-up 
>> devices and groups to represent aux attachments. Even if a driver 
>> managing its own aux domains entirely privately does create child 
>> devices for them, it's not like it can't keep its domain pointers in 
>> drvdata if it wants to ;)
>>
>> Let's not conflate the current implementation of vfio_mdev with the 
>> general concepts involved here.
>>
>>>        struct iommu_domain *domain;
>>>        struct device *dev = mdev_dev(mdev);
>>>        unsigned long pasid;
>>>
>>>        domain = iommu_get_domain_for_dev(dev);
>>>        if (!domain)
>>>                return -ENODEV;
>>>
>>>        pasid = iommu_aux_get_pasid(domain, dev->parent);
>>>        if (pasid == IOASID_INVALID)
>>>                return -EINVAL;
>>>
>>>        /* Program the device context with the PASID value */
>>>        ....
>>>
>>> Without this fix, iommu_get_domain_for_dev() always returns NULL and the
>>> device driver has no means to support aux-domain.
>>
>> So either the IOMMU API itself is missing the ability to do the right 
>> thing internally, or the mdev layer isn't using it appropriately. 
>> Either way, simply punching holes in the API for mdev to hack around 
>> its own mess doesn't seem like the best thing to do.
>>
>> The initial impression I got was that it's implicitly assumed here 
>> that the mdev itself is attached to exactly one aux domain and nothing 
>> else, at which point I would wonder why it's using aux at all, but are 
>> you saying that in fact no attach happens with the mdev group either 
>> way, only to the parent device?
>>
>> I'll admit I'm not hugely familiar with any of this, but it seems to 
>> me that the logical flow should be:
>>
>>      - allocate domain
>>      - attach as aux to parent
>>      - retrieve aux domain PASID
>>      - create mdev child based on PASID
>>      - attach mdev to domain (normally)
>>
>> Of course that might require giving the IOMMU API a proper first-class 
>> notion of mediated devices, such that it knows the mdev represents the 
>> PASID, and can recognise the mdev attach is equivalent to the earlier 
>> parent aux attach so not just blindly hand it down to an IOMMU driver 
>> that's never heard of this new device before. Or perhaps the IOMMU 
>> drivers do their own bookkeeping for the mdev bus, such that they do 
>> handle the attach call, and just validate it internally based on the 
>> associated parent device and PASID. Either way, the inside maintains 
>> self-consistency and from the outside it looks like standard API usage 
>> without nasty hacks.
>>
>> I'm pretty sure I've heard suggestions of using mediated devices 
>> beyond VFIO (e.g. within the kernel itself), so chances are this is a 
>> direction that we'll have to take at some point anyway.
>>
>> And, that said, even if people do want an immediate quick fix 
>> regardless of technical debt, I'd still be a lot happier to see 
>> iommu_group_set_domain() lightly respun as iommu_attach_mdev() ;)
> 
> I get your point and I agree with your concerns.
> 
> To maintain the relationship between the mdev's iommu_group and
> iommu_domain, how about extending the existing aux attach API below
> 
> int iommu_aux_attach_device(struct iommu_domain *domain,
>                  struct device *dev)
> 
> by adding the mdev's iommu_group?
> 
> int iommu_aux_attach_device(struct iommu_domain *domain,
>                  struct device *dev,
>                  struct iommu_group *group)
> 
> And, in iommu_aux_attach_device(), we
>   - require that @group only has a single device;
>   - require that @group hasn't already been attached to a domain;
>   - set @domain as the domain of @group.
> 
> Just like what we've done in iommu_attach_device().
> 
> Any thoughts?

Rather than pass a bare iommu_group with implicit restrictions, it might 
be neater to just pass an mdev_device, so that the IOMMU core can also 
take care of allocating and setting up the group. Then we flag the group 
internally as a special "mdev group" such that we can prevent callers 
from subsequently trying to add/remove devices or attach/detach its 
domain directly. That seems like it would make a pretty straightforward 
and robust API extension, as long as the mdev argument here is optional 
so that SVA and other aux users don't have to care. Other than the 
slightly different ordering where the caller would have to allocate the mdev
first, then finish its PASID-based configuration afterwards, I guess
it's not far off what I was thinking yesterday :)
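
To make that a bit more concrete, here is a rough user-space model of the
idea. It is not kernel code: every model_* name and the is_mdev flag are
made up for illustration, and the actual PASID programming on the parent
is elided. It only shows the bookkeeping: the core allocates the group
itself when an mdev is passed, flags it, and refuses direct manipulation
of a flagged group afterwards.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins, not the kernel's real structures. */
struct model_domain {
	int id;
};

struct model_group {
	struct model_domain *domain;
	int ndevs;
	bool is_mdev;	/* hypothetical "mdev group" flag owned by the core */
};

struct model_dev {
	struct model_group *group;
};

/* Direct group attach: refused for core-owned mdev groups. */
int model_attach_group(struct model_domain *dom, struct model_group *grp)
{
	if (grp->is_mdev)
		return -EPERM;
	if (grp->domain)
		return -EBUSY;
	grp->domain = dom;
	return 0;
}

/* Adding a device by hand: also refused for mdev groups. */
int model_group_add_device(struct model_group *grp, struct model_dev *dev)
{
	if (grp->is_mdev)
		return -EPERM;
	if (dev->group)
		return -EBUSY;
	dev->group = grp;
	grp->ndevs++;
	return 0;
}

/*
 * Aux attach with an optional mdev. With mdev == NULL this behaves like
 * a plain aux attach (SVA and friends). With an mdev, the core allocates
 * and sets up the group on the caller's behalf.
 */
int model_aux_attach(struct model_domain *dom, struct model_dev *parent,
		     struct model_dev *mdev)
{
	struct model_group *grp;

	(void)parent;	/* the parent's PASID setup is elided in this model */

	if (!mdev)
		return 0;
	if (mdev->group)
		return -EBUSY;

	grp = calloc(1, sizeof(*grp));
	if (!grp)
		return -ENOMEM;

	grp->is_mdev = true;
	grp->ndevs = 1;
	grp->domain = dom;
	mdev->group = grp;
	return 0;
}

/* Lookup now works for the mdev without any external poking. */
struct model_domain *model_get_domain_for_dev(struct model_dev *dev)
{
	return dev->group ? dev->group->domain : NULL;
}
```

In this model a caller that passes an mdev gets a working domain lookup
for it, while later attempts to attach the flagged group directly or to
add devices to it fail with -EPERM.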

Robin.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] iommu: Add iommu_group_get/set_domain()
  2020-07-01 12:18         ` Robin Murphy
@ 2020-07-02  1:32           ` Lu Baolu
  2020-07-02  2:36           ` Lu Baolu
  1 sibling, 0 replies; 10+ messages in thread
From: Lu Baolu @ 2020-07-02  1:32 UTC (permalink / raw)
  To: Robin Murphy, Joerg Roedel, Alex Williamson
  Cc: baolu.lu, Kevin Tian, Dave Jiang, Ashok Raj, kvm, Cornelia Huck,
	linux-kernel, iommu

Hello,

On 7/1/20 8:18 PM, Robin Murphy wrote:
> [...]
> 
> Rather than pass a bare iommu_group with implicit restrictions, it might 
> be neater to just pass an mdev_device, so that the IOMMU core can also 
> take care of allocating and setting up the group. Then we flag the group 
> internally as a special "mdev group" such that we can prevent callers 
> from subsequently trying to add/remove devices or attach/detach its 
> domain directly. That seems like it would make a pretty straightforward 
> and robust API extension, as long as the mdev argument here is optional 
> so that SVA and other aux users don't have to care. Other than the 
> slightly different ordering where the caller would have to allocate the mdev
> first, then finish its PASID-based configuration afterwards, I guess
> it's not far off what I was thinking yesterday :)

Hi Alex, Joerg and others, any comments here?

Best regards,
baolu

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] iommu: Add iommu_group_get/set_domain()
  2020-07-01 12:18         ` Robin Murphy
  2020-07-02  1:32           ` Lu Baolu
@ 2020-07-02  2:36           ` Lu Baolu
  2020-07-07  1:26             ` Lu Baolu
  1 sibling, 1 reply; 10+ messages in thread
From: Lu Baolu @ 2020-07-02  2:36 UTC (permalink / raw)
  To: Robin Murphy, Joerg Roedel, Alex Williamson
  Cc: baolu.lu, Kevin Tian, Dave Jiang, Ashok Raj, kvm, Cornelia Huck,
	linux-kernel, iommu

Hi Robin,

On 7/1/20 8:18 PM, Robin Murphy wrote:
> [...]
> 
> Rather than pass a bare iommu_group with implicit restrictions, it might 
> be neater to just pass an mdev_device, so that the IOMMU core can also 
> take care of allocating and setting up the group. Then we flag the group 
> internally as a special "mdev group" such that we can prevent callers 
> from subsequently trying to add/remove devices or attach/detach its 
> domain directly. That seems like it would make a pretty straightforward 
> and robust API extension, as long as the mdev argument here is optional 
> so that SVA and other aux users don't have to care. Other than the 
> slightly different ordering where the caller would have to allocate the mdev
> first, then finish its PASID-based configuration afterwards, I guess
> it's not far off what I was thinking yesterday :)

It looks good to me if we pass an *optional* made-up device instead of
an iommu_group. But it seems that vfio/mdev assumes an iommu_group exists
first and then attaches domains to the group. Hence, it's hard to move
group allocation and setup into the attach interface.

As proposed, the new iommu_aux_attach_device() might look like this:

int iommu_aux_attach_device(struct iommu_domain *domain,
                             struct device *phys_dev,
                             struct device *dev)

where,

@phys_dev: the physical device which supports IOMMU_DEV_FEAT_AUX;
@dev: a made-up device which represents the subset of resources bound to
       the aux-domain. An example use case is vfio/mdev. For cases where
       no made-up devices are used, pass NULL instead.

With @dev passed, we can

- require a single device in its group;
- require no previous attachment;
- set up the internal bookkeeping between group and domain.

iommu_aux_detach_device() needs the equivalent extension.
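
A minimal sketch of these semantics, written as a user-space model rather
than real kernel code. The sketch_* types and functions are hypothetical
stand-ins for struct iommu_domain, struct iommu_group and struct device,
and the actual aux attach on @phys_dev is left out. It only shows the
checks and the group->domain update proposed above.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures. */
struct sketch_domain {
	int id;
};

struct sketch_group {
	struct sketch_domain *domain;	/* models group->domain */
	int ndevs;			/* number of devices in the group */
};

struct sketch_dev {
	struct sketch_group *group;	/* group set up earlier by vfio/mdev */
};

int sketch_aux_attach_device(struct sketch_domain *domain,
			     struct sketch_dev *phys_dev,
			     struct sketch_dev *dev)
{
	struct sketch_group *group = NULL;

	if (!domain || !phys_dev)
		return -EINVAL;

	if (dev) {
		group = dev->group;
		if (!group)
			return -ENODEV;
		if (group->ndevs != 1)	/* single device in group */
			return -EINVAL;
		if (group->domain)	/* no previous attachment */
			return -EBUSY;
	}

	/* The aux attach on phys_dev itself would happen here (elided). */

	if (group)
		group->domain = domain;	/* link group and domain */
	return 0;
}

void sketch_aux_detach_device(struct sketch_domain *domain,
			      struct sketch_dev *phys_dev,
			      struct sketch_dev *dev)
{
	(void)phys_dev;	/* the aux detach on phys_dev is elided as well */

	if (dev && dev->group && dev->group->domain == domain)
		dev->group->domain = NULL;
}

/* Models iommu_get_domain_for_dev() for the made-up device. */
struct sketch_domain *sketch_get_domain_for_dev(struct sketch_dev *dev)
{
	return dev->group ? dev->group->domain : NULL;
}
```

With this bookkeeping in place, a lookup on the made-up device returns
the aux-domain after attach and NULL again after detach, and callers that
pass NULL for @dev are unaffected.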

Best regards,
baolu

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] iommu: Add iommu_group_get/set_domain()
  2020-07-02  2:36           ` Lu Baolu
@ 2020-07-07  1:26             ` Lu Baolu
  0 siblings, 0 replies; 10+ messages in thread
From: Lu Baolu @ 2020-07-07  1:26 UTC (permalink / raw)
  To: Robin Murphy, Joerg Roedel, Alex Williamson
  Cc: baolu.lu, Kevin Tian, Dave Jiang, Ashok Raj, kvm, Cornelia Huck,
	linux-kernel, iommu

On 7/2/20 10:36 AM, Lu Baolu wrote:
> Hi Robin,
> 
> On 7/1/20 8:18 PM, Robin Murphy wrote:
>> On 2020-07-01 08:32, Lu Baolu wrote:
>>> Hi Robin,
>>>
>>> On 2020/7/1 0:51, Robin Murphy wrote:
>>>> On 2020-06-30 02:03, Lu Baolu wrote:
>>>>> Hi Robin,
>>>>>
>>>>> On 6/29/20 7:56 PM, Robin Murphy wrote:
>>>>>> On 2020-06-27 04:15, Lu Baolu wrote:
>>>>>>> The hardware assistant vfio mediated device is a use case of iommu
>>>>>>> aux-domain. The interactions between vfio/mdev and iommu during mdev
>>>>>>> creation and passthr are:
>>>>>>>
>>>>>>> - Create a group for mdev with iommu_group_alloc();
>>>>>>> - Add the device to the group with
>>>>>>>          group = iommu_group_alloc();
>>>>>>>          if (IS_ERR(group))
>>>>>>>                  return PTR_ERR(group);
>>>>>>>
>>>>>>>          ret = iommu_group_add_device(group, &mdev->dev);
>>>>>>>          if (!ret)
>>>>>>>                  dev_info(&mdev->dev, "MDEV: group_id = %d\n",
>>>>>>>                           iommu_group_id(group));
>>>>>>> - Allocate an aux-domain
>>>>>>>     iommu_domain_alloc()
>>>>>>> - Attach the aux-domain to the physical device from which the 
>>>>>>> mdev is
>>>>>>>    created.
>>>>>>>     iommu_aux_attach_device()
>>>>>>>
>>>>>>> In the whole process, an iommu group was allocated for the mdev 
>>>>>>> and an
>>>>>>> iommu domain was attached to the group, but the group->domain leaves
>>>>>>> NULL. As the result, iommu_get_domain_for_dev() doesn't work 
>>>>>>> anymore.
>>>>>>>
>>>>>>> This adds iommu_group_get/set_domain() so that group->domain 
>>>>>>> could be
>>>>>>> managed whenever a domain is attached or detached through the 
>>>>>>> aux-domain
>>>>>>> api's.
>>>>>>
>>>>>> Letting external callers poke around directly in the internals of 
>>>>>> iommu_group doesn't look right to me.
>>>>>
>>>>> Unfortunately, it seems that the vifo iommu abstraction is deeply 
>>>>> bound
>>>>> to the IOMMU subsystem. We can easily find other examples:
>>>>>
>>>>> iommu_group_get/set_iommudata()
>>>>> iommu_group_get/set_name()
>>>>> ...
>>>>
>>>> Sure, but those are ways for users of a group to attach useful 
>>>> information of their own to it, that doesn't matter to the IOMMU 
>>>> subsystem itself. The interface you've proposed gives callers rich 
>>>> new opportunities to fundamentally break correct operation of the API:
>>>>
>>>>      dom = iommu_domain_alloc();
>>>>      iommu_attach_group(dom, grp);
>>>>      ...
>>>>      iommu_group_set_domain(grp, NULL);
>>>>      // oops, leaked and can't ever detach properly now
>>>>
>>>> or perhaps:
>>>>
>>>>      grp = iommu_group_alloc();
>>>>      iommu_group_add_device(grp, dev);
>>>>      iommu_group_set_domain(grp, dom);
>>>>      ...
>>>>      iommu_detach_group(dom, grp);
>>>>      // oops, IOMMU driver might not handle this
>>>>
>>>>>> If a regular device is attached to one or more aux domains for 
>>>>>> PASID use, iommu_get_domain_for_dev() is still going to return the 
>>>>>> primary domain, so why should it be expected to behave differently 
>>>>>> for mediated
>>>>>
>>>>> Unlike the normal device attach, we will encounter two devices when it
>>>>> comes to aux-domain.
>>>>>
>>>>> - Parent physical device - this might be, for example, a PCIe device
>>>>> with PASID feature support, hence it is able to tag an unique PASID
>>>>> for DMA transfers originated from its subset. The device driver hence
>>>>> is able to wrapper this subset into an isolated:
>>>>>
>>>>> - Mediated device - a fake device created by the device driver 
>>>>> mentioned
>>>>> above.
>>>>>
>>>>> Yes. All you mentioned are right for the parent device. But for 
>>>>> mediated
>>>>> device, iommu_get_domain_for_dev() doesn't work even it has an valid
>>>>> iommu_group and iommu_domain.
>>>>>
>>>>> iommu_get_domain_for_dev() is a necessary interface for device drivers
>>>>> which want to support aux-domain. For example,
>>>>
>>>> Only if they want to follow this very specific notion of using 
>>>> made-up devices and groups to represent aux attachments. Even if a 
>>>> driver managing its own aux domains entirely privately does create 
>>>> child devices for them, it's not like it can't keep its domain 
>>>> pointers in drvdata if it wants to ;)
>>>>
>>>> Let's not conflate the current implementation of vfio_mdev with the 
>>>> general concepts involved here.
>>>>
>>>>>            struct iommu_domain *domain;
>>>>>            struct device *dev = mdev_dev(mdev);
>>>>>        unsigned long pasid;
>>>>>
>>>>>            domain = iommu_get_domain_for_dev(dev);
>>>>>            if (!domain)
>>>>>                    return -ENODEV;
>>>>>
>>>>>            pasid = iommu_aux_get_pasid(domain, dev->parent);
>>>>>        if (pasid == IOASID_INVALID)
>>>>>            return -EINVAL;
>>>>>
>>>>>        /* Program the device context with the PASID value */
>>>>>        ....
>>>>>
>>>>> Without this fix, iommu_get_domain_for_dev() always returns NULL 
>>>>> and the
>>>>> device driver has no means to support aux-domain.
>>>>
>>>> So either the IOMMU API itself is missing the ability to do the 
>>>> right thing internally, or the mdev layer isn't using it 
>>>> appropriately. Either way, simply punching holes in the API for mdev 
>>>> to hack around its own mess doesn't seem like the best thing to do.
>>>>
>>>> The initial impression I got was that it's implicitly assumed here 
>>>> that the mdev itself is attached to exactly one aux domain and 
>>>> nothing else, at which point I would wonder why it's using aux at 
>>>> all, but are you saying that in fact no attach happens with the mdev 
>>>> group either way, only to the parent device?
>>>>
>>>> I'll admit I'm not hugely familiar with any of this, but it seems to 
>>>> me that the logical flow should be:
>>>>
>>>>      - allocate domain
>>>>      - attach as aux to parent
>>>>      - retrieve aux domain PASID
>>>>      - create mdev child based on PASID
>>>>      - attach mdev to domain (normally)
>>>>
>>>> Of course that might require giving the IOMMU API a proper 
>>>> first-class notion of mediated devices, such that it knows the mdev 
>>>> represents the PASID, and can recognise the mdev attach is 
>>>> equivalent to the earlier parent aux attach so not just blindly hand 
>>>> it down to an IOMMU driver that's never heard of this new device 
>>>> before. Or perhaps the IOMMU drivers do their own bookkeeping for 
>>>> the mdev bus, such that they do handle the attach call, and just 
>>>> validate it internally based on the associated parent device and 
>>>> PASID. Either way, the inside maintains self-consistency and from 
>>>> the outside it looks like standard API usage without nasty hacks.
>>>>
>>>> I'm pretty sure I've heard suggestions of using mediated devices 
>>>> beyond VFIO (e.g. within the kernel itself), so chances are this is 
>>>> a direction that we'll have to take at some point anyway.
>>>>
>>>> And, that said, even if people do want an immediate quick fix 
>>>> regardless of technical debt, I'd still be a lot happier to see 
>>>> iommu_group_set_domain() lightly respun as iommu_attach_mdev() ;)
>>>
>>> I get your point and agree with your concerns.
>>>
>>> To maintain the relationship between the mdev's iommu_group and
>>> iommu_domain, how about extending the existing aux_attach API below
>>>
>>> int iommu_aux_attach_device(struct iommu_domain *domain,
>>>                  struct device *dev)
>>>
>>> by adding the mdev's iommu_group?
>>>
>>> int iommu_aux_attach_device(struct iommu_domain *domain,
>>>                  struct device *dev,
>>>                  struct iommu_group *group)
>>>
>>> And, in iommu_aux_attach_device(), we require that
>>>   - @group only has a single device;
>>>   - @group hasn't been attached to any domain yet;
>>> and then set @domain as the domain of @group,
>>> just like what we've done in iommu_attach_device().
>>>
>>> Any thoughts?
>>
>> Rather than pass a bare iommu_group with implicit restrictions, it 
>> might be neater to just pass an mdev_device, so that the IOMMU core 
>> can also take care of allocating and setting up the group. Then we 
>> flag the group internally as a special "mdev group" such that we can 
>> prevent callers from subsequently trying to add/remove devices or 
>> attach/detach its domain directly. That seems like it would make a 
>> pretty straightforward and robust API extension, as long as the mdev 
>> argument here is optional so that SVA and other aux users don't have 
>> to care. Other than the slightly different ordering where the caller
>> would have to allocate the mdev first, then finish its PASID-based
>> configuration afterwards, I guess it's not far off what I was thinking 
>> yesterday :)
> 
> It looks good to me if we pass an *optional* made-up device instead of
> iommu_group. But it seems that vfio/mdev assumes an iommu_group first
> and then attaches domains to the groups. Hence, it's hard to move the
> group allocation and setting up into the attach interface.
> 
> As proposed, the new iommu_aux_attach_device() might look like this:
> 
> int iommu_aux_attach_device(struct iommu_domain *domain,
>                              struct device *phys_dev,
>                              struct device *dev)
> 
> where,
> 
> @phys_dev: The physical device which supports IOMMU_DEV_FEAT_AUX;
> @dev: a made-up device which represents the subset of resources bound to
>        the aux-domain. An example use case is vfio/mdev. For cases where
>        no made-up devices are used, pass NULL instead.
> 
> With @dev passed, we can require
> 
> - single device in group;
> - no previous attaching;
> - set up internal logistics between group and domain;
> 
> The iommu_aux_detach_device() needs the equivalent extensions.

Okay, let me send out the code first so that people can comment on the
code.

Best regards,
baolu
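
For readers following the thread, the semantics proposed above for the
three-argument iommu_aux_attach_device() can be sketched as a small
userspace model. This is only an illustration under stated assumptions,
not the code that was later posted: every model_* name and struct is
hypothetical, the error codes are guesses, and the actual attach to the
physical device is omitted. It shows the three requirements from the
proposal (single device in the group, no previous attach, group->domain
set on success) and why a lookup through the group starts working
afterwards.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct model_domain { int id; };
struct model_group  { int nr_devices; struct model_domain *domain; };
struct model_device { struct model_group *group; };

/* Conceptually what iommu_get_domain_for_dev() does: dev -> group -> domain. */
static struct model_domain *model_get_domain_for_dev(struct model_device *dev)
{
	if (!dev || !dev->group)
		return NULL;
	return dev->group->domain;
}

/* Model of the proposed attach; @dev is optional (NULL for non-mdev users). */
static int model_aux_attach_device(struct model_domain *domain,
				   struct model_device *phys_dev,
				   struct model_device *dev)
{
	if (!domain || !phys_dev)
		return -EINVAL;

	/* The hardware attach of @domain to @phys_dev is omitted here. */

	if (!dev)
		return 0;		/* no made-up device: nothing to record */
	if (!dev->group || dev->group->nr_devices != 1)
		return -EINVAL;		/* requirement: single device in group */
	if (dev->group->domain)
		return -EBUSY;		/* requirement: no previous attach */

	dev->group->domain = domain;	/* link group and domain */
	return 0;
}

/* Model of the matching detach: undo the group->domain link. */
static void model_aux_detach_device(struct model_domain *domain,
				    struct model_device *phys_dev,
				    struct model_device *dev)
{
	(void)phys_dev;			/* hardware detach omitted */
	if (dev && dev->group && dev->group->domain == domain)
		dev->group->domain = NULL;
}

/* Walks through attach, lookup and detach; returns 0 if all checks hold. */
static int model_demo(void)
{
	struct model_domain domain = { .id = 1 };
	struct model_group group = { .nr_devices = 1, .domain = NULL };
	struct model_device parent = { .group = NULL };
	struct model_device mdev = { .group = &group };

	/* Before the attach the lookup fails, as described in the thread. */
	assert(model_get_domain_for_dev(&mdev) == NULL);

	assert(model_aux_attach_device(&domain, &parent, &mdev) == 0);
	assert(model_get_domain_for_dev(&mdev) == &domain);

	/* A second attach to the same group is rejected. */
	assert(model_aux_attach_device(&domain, &parent, &mdev) == -EBUSY);

	model_aux_detach_device(&domain, &parent, &mdev);
	assert(model_get_domain_for_dev(&mdev) == NULL);
	return 0;
}
```

Calling model_demo() from a test harness exercises the whole sequence.
Keeping @dev optional in the model mirrors the point made earlier in the
thread that SVA and other aux users should not have to care about mdev.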

Thread overview: 10+ messages
2020-06-27  3:15 [PATCH 1/2] iommu: Add iommu_group_get/set_domain() Lu Baolu
2020-06-27  3:15 ` [PATCH 2/2] vfio/type1: Update group->domain after aux attach and detach Lu Baolu
2020-06-29 11:56 ` [PATCH 1/2] iommu: Add iommu_group_get/set_domain() Robin Murphy
2020-06-30  1:03   ` Lu Baolu
2020-06-30 16:51     ` Robin Murphy
2020-07-01  7:32       ` Lu Baolu
2020-07-01 12:18         ` Robin Murphy
2020-07-02  1:32           ` Lu Baolu
2020-07-02  2:36           ` Lu Baolu
2020-07-07  1:26             ` Lu Baolu
