archive mirror
 help / color / mirror / Atom feed
From: Robin Murphy <>
To: Lu Baolu <>,
	Joerg Roedel <>,
	Alex Williamson <>
Cc: Kevin Tian <>,
	Dave Jiang <>,
	Ashok Raj <>,, Cornelia Huck <>,,
Subject: Re: [PATCH 1/2] iommu: Add iommu_group_get/set_domain()
Date: Tue, 30 Jun 2020 17:51:29 +0100	[thread overview]
Message-ID: <> (raw)
In-Reply-To: <>

On 2020-06-30 02:03, Lu Baolu wrote:
> Hi Robin,
> On 6/29/20 7:56 PM, Robin Murphy wrote:
>> On 2020-06-27 04:15, Lu Baolu wrote:
>>> The hardware assistant vfio mediated device is a use case of iommu
>>> aux-domain. The interactions between vfio/mdev and iommu during mdev
>>> creation and passthr are:
>>> - Create a group for mdev with iommu_group_alloc();
>>> - Add the device to the group with
>>>          group = iommu_group_alloc();
>>>          if (IS_ERR(group))
>>>                  return PTR_ERR(group);
>>>          ret = iommu_group_add_device(group, &mdev->dev);
>>>          if (!ret)
>>>                  dev_info(&mdev->dev, "MDEV: group_id = %d\n",
>>>                           iommu_group_id(group));
>>> - Allocate an aux-domain
>>>     iommu_domain_alloc()
>>> - Attach the aux-domain to the physical device from which the mdev is
>>>    created.
>>>     iommu_aux_attach_device()
>>> In the whole process, an iommu group was allocated for the mdev and an
>>> iommu domain was attached to the group, but the group->domain leaves
>>> NULL. As the result, iommu_get_domain_for_dev() doesn't work anymore.
>>> This adds iommu_group_get/set_domain() so that group->domain could be
>>> managed whenever a domain is attached or detached through the aux-domain
>>> api's.
>> Letting external callers poke around directly in the internals of 
>> iommu_group doesn't look right to me.
> Unfortunately, it seems that the vifo iommu abstraction is deeply bound
> to the IOMMU subsystem. We can easily find other examples:
> iommu_group_get/set_iommudata()
> iommu_group_get/set_name()
> ...

Sure, but those are ways for users of a group to attach useful 
information of their own to it, that doesn't matter to the IOMMU 
subsystem itself. The interface you've proposed gives callers rich new 
opportunities to fundamentally break correct operation of the API:

	dom = iommu_domain_alloc();
	iommu_attach_group(dom, grp);
	iommu_group_set_domain(grp, NULL);
	// oops, leaked and can't ever detach properly now

or perhaps:

	grp = iommu_group_alloc();
	iommu_group_add_device(grp, dev);
	iommu_group_set_domain(grp, dom);
	iommu_detach_group(dom, grp);
	// oops, IOMMU driver might not handle this

>> If a regular device is attached to one or more aux domains for PASID 
>> use, iommu_get_domain_for_dev() is still going to return the primary 
>> domain, so why should it be expected to behave differently for mediated
> Unlike the normal device attach, we will encounter two devices when it
> comes to aux-domain.
> - Parent physical device - this might be, for example, a PCIe device
> with PASID feature support, hence it is able to tag an unique PASID
> for DMA transfers originated from its subset. The device driver hence
> is able to wrapper this subset into an isolated:
> - Mediated device - a fake device created by the device driver mentioned
> above.
> Yes. All you mentioned are right for the parent device. But for mediated
> device, iommu_get_domain_for_dev() doesn't work even it has an valid
> iommu_group and iommu_domain.
> iommu_get_domain_for_dev() is a necessary interface for device drivers
> which want to support aux-domain. For example,

Only if they want to follow this very specific notion of using made-up 
devices and groups to represent aux attachments. Even if a driver 
managing its own aux domains entirely privately does create child 
devices for them, it's not like it can't keep its domain pointers in 
drvdata if it wants to ;)

Let's not conflate the current implementation of vfio_mdev with the 
general concepts involved here.

>            struct iommu_domain *domain;
>            struct device *dev = mdev_dev(mdev);
>        unsigned long pasid;
>            domain = iommu_get_domain_for_dev(dev);
>            if (!domain)
>                    return -ENODEV;
>            pasid = iommu_aux_get_pasid(domain, dev->parent);
>        if (pasid == IOASID_INVALID)
>            return -EINVAL;
>        /* Program the device context with the PASID value */
>        ....
> Without this fix, iommu_get_domain_for_dev() always returns NULL and the
> device driver has no means to support aux-domain.

So either the IOMMU API itself is missing the ability to do the right 
thing internally, or the mdev layer isn't using it appropriately. Either 
way, simply punching holes in the API for mdev to hack around its own 
mess doesn't seem like the best thing to do.

The initial impression I got was that it's implicitly assumed here that 
the mdev itself is attached to exactly one aux domain and nothing else, 
at which point I would wonder why it's using aux at all, but are you 
saying that in fact no attach happens with the mdev group either way, 
only to the parent device?

I'll admit I'm not hugely familiar with any of this, but it seems to me 
that the logical flow should be:

	- allocate domain
	- attach as aux to parent
	- retrieve aux domain PASID
	- create mdev child based on PASID
	- attach mdev to domain (normally)

Of course that might require giving the IOMMU API a proper first-class 
notion of mediated devices, such that it knows the mdev represents the 
PASID, and can recognise the mdev attach is equivalent to the earlier 
parent aux attach so not just blindly hand it down to an IOMMU driver 
that's never heard of this new device before. Or perhaps the IOMMU 
drivers do their own bookkeeping for the mdev bus, such that they do 
handle the attach call, and just validate it internally based on the 
associated parent device and PASID. Either way, the inside maintains 
self-consistency and from the outside it looks like standard API usage 
without nasty hacks.

I'm pretty sure I've heard suggestions of using mediated devices beyond 
VFIO (e.g. within the kernel itself), so chances are this is a direction 
that we'll have to take at some point anyway.

And, that said, even if people do want an immediate quick fix regardless 
of technical debt, I'd still be a lot happier to see 
iommu_group_set_domain() lightly respun as iommu_attach_mdev() ;)

iommu mailing list

  reply	other threads:[~2020-06-30 16:51 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-27  3:15 [PATCH 1/2] iommu: Add iommu_group_get/set_domain() Lu Baolu
2020-06-27  3:15 ` [PATCH 2/2] vfio/type1: Update group->domain after aux attach and detach Lu Baolu
2020-06-29 11:56 ` [PATCH 1/2] iommu: Add iommu_group_get/set_domain() Robin Murphy
2020-06-30  1:03   ` Lu Baolu
2020-06-30 16:51     ` Robin Murphy [this message]
2020-07-01  7:32       ` Lu Baolu
2020-07-01 12:18         ` Robin Murphy
2020-07-02  1:32           ` Lu Baolu
2020-07-02  2:36           ` Lu Baolu
2020-07-07  1:26             ` Lu Baolu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \ \ \ \ \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).