archive mirror
 help / color / mirror / Atom feed
From: Lu Baolu <>
To: Robin Murphy <>,
	Joerg Roedel <>,
	Alex Williamson <>
Cc: Kevin Tian <>,
	Dave Jiang <>,
	Ashok Raj <>,, Cornelia Huck <>,,
Subject: Re: [PATCH 1/2] iommu: Add iommu_group_get/set_domain()
Date: Tue, 7 Jul 2020 09:26:48 +0800	[thread overview]
Message-ID: <> (raw)
In-Reply-To: <>

On 7/2/20 10:36 AM, Lu Baolu wrote:
> Hi Robin,
> On 7/1/20 8:18 PM, Robin Murphy wrote:
>> On 2020-07-01 08:32, Lu Baolu wrote:
>>> Hi Robin,
>>> On 2020/7/1 0:51, Robin Murphy wrote:
>>>> On 2020-06-30 02:03, Lu Baolu wrote:
>>>>> Hi Robin,
>>>>> On 6/29/20 7:56 PM, Robin Murphy wrote:
>>>>>> On 2020-06-27 04:15, Lu Baolu wrote:
>>>>>>> The hardware assistant vfio mediated device is a use case of iommu
>>>>>>> aux-domain. The interactions between vfio/mdev and iommu during mdev
>>>>>>> creation and passthr are:
>>>>>>> - Create a group for mdev with iommu_group_alloc();
>>>>>>> - Add the device to the group with
>>>>>>>          group = iommu_group_alloc();
>>>>>>>          if (IS_ERR(group))
>>>>>>>                  return PTR_ERR(group);
>>>>>>>          ret = iommu_group_add_device(group, &mdev->dev);
>>>>>>>          if (!ret)
>>>>>>>                  dev_info(&mdev->dev, "MDEV: group_id = %d\n",
>>>>>>>                           iommu_group_id(group));
>>>>>>> - Allocate an aux-domain
>>>>>>>     iommu_domain_alloc()
>>>>>>> - Attach the aux-domain to the physical device from which the 
>>>>>>> mdev is
>>>>>>>    created.
>>>>>>>     iommu_aux_attach_device()
>>>>>>> In the whole process, an iommu group was allocated for the mdev 
>>>>>>> and an
>>>>>>> iommu domain was attached to the group, but the group->domain leaves
>>>>>>> NULL. As the result, iommu_get_domain_for_dev() doesn't work 
>>>>>>> anymore.
>>>>>>> This adds iommu_group_get/set_domain() so that group->domain 
>>>>>>> could be
>>>>>>> managed whenever a domain is attached or detached through the 
>>>>>>> aux-domain
>>>>>>> api's.
>>>>>> Letting external callers poke around directly in the internals of 
>>>>>> iommu_group doesn't look right to me.
>>>>> Unfortunately, it seems that the vifo iommu abstraction is deeply 
>>>>> bound
>>>>> to the IOMMU subsystem. We can easily find other examples:
>>>>> iommu_group_get/set_iommudata()
>>>>> iommu_group_get/set_name()
>>>>> ...
>>>> Sure, but those are ways for users of a group to attach useful 
>>>> information of their own to it, that doesn't matter to the IOMMU 
>>>> subsystem itself. The interface you've proposed gives callers rich 
>>>> new opportunities to fundamentally break correct operation of the API:
>>>>      dom = iommu_domain_alloc();
>>>>      iommu_attach_group(dom, grp);
>>>>      ...
>>>>      iommu_group_set_domain(grp, NULL);
>>>>      // oops, leaked and can't ever detach properly now
>>>> or perhaps:
>>>>      grp = iommu_group_alloc();
>>>>      iommu_group_add_device(grp, dev);
>>>>      iommu_group_set_domain(grp, dom);
>>>>      ...
>>>>      iommu_detach_group(dom, grp);
>>>>      // oops, IOMMU driver might not handle this
>>>>>> If a regular device is attached to one or more aux domains for 
>>>>>> PASID use, iommu_get_domain_for_dev() is still going to return the 
>>>>>> primary domain, so why should it be expected to behave differently 
>>>>>> for mediated
>>>>> Unlike the normal device attach, we will encounter two devices when it
>>>>> comes to aux-domain.
>>>>> - Parent physical device - this might be, for example, a PCIe device
>>>>> with PASID feature support, hence it is able to tag an unique PASID
>>>>> for DMA transfers originated from its subset. The device driver hence
>>>>> is able to wrapper this subset into an isolated:
>>>>> - Mediated device - a fake device created by the device driver 
>>>>> mentioned
>>>>> above.
>>>>> Yes. All you mentioned are right for the parent device. But for 
>>>>> mediated
>>>>> device, iommu_get_domain_for_dev() doesn't work even it has an valid
>>>>> iommu_group and iommu_domain.
>>>>> iommu_get_domain_for_dev() is a necessary interface for device drivers
>>>>> which want to support aux-domain. For example,
>>>> Only if they want to follow this very specific notion of using 
>>>> made-up devices and groups to represent aux attachments. Even if a 
>>>> driver managing its own aux domains entirely privately does create 
>>>> child devices for them, it's not like it can't keep its domain 
>>>> pointers in drvdata if it wants to ;)
>>>> Let's not conflate the current implementation of vfio_mdev with the 
>>>> general concepts involved here.
>>>>>            struct iommu_domain *domain;
>>>>>            struct device *dev = mdev_dev(mdev);
>>>>>        unsigned long pasid;
>>>>>            domain = iommu_get_domain_for_dev(dev);
>>>>>            if (!domain)
>>>>>                    return -ENODEV;
>>>>>            pasid = iommu_aux_get_pasid(domain, dev->parent);
>>>>>        if (pasid == IOASID_INVALID)
>>>>>            return -EINVAL;
>>>>>        /* Program the device context with the PASID value */
>>>>>        ....
>>>>> Without this fix, iommu_get_domain_for_dev() always returns NULL 
>>>>> and the
>>>>> device driver has no means to support aux-domain.
>>>> So either the IOMMU API itself is missing the ability to do the 
>>>> right thing internally, or the mdev layer isn't using it 
>>>> appropriately. Either way, simply punching holes in the API for mdev 
>>>> to hack around its own mess doesn't seem like the best thing to do.
>>>> The initial impression I got was that it's implicitly assumed here 
>>>> that the mdev itself is attached to exactly one aux domain and 
>>>> nothing else, at which point I would wonder why it's using aux at 
>>>> all, but are you saying that in fact no attach happens with the mdev 
>>>> group either way, only to the parent device?
>>>> I'll admit I'm not hugely familiar with any of this, but it seems to 
>>>> me that the logical flow should be:
>>>>      - allocate domain
>>>>      - attach as aux to parent
>>>>      - retrieve aux domain PASID
>>>>      - create mdev child based on PASID
>>>>      - attach mdev to domain (normally)
>>>> Of course that might require giving the IOMMU API a proper 
>>>> first-class notion of mediated devices, such that it knows the mdev 
>>>> represents the PASID, and can recognise the mdev attach is 
>>>> equivalent to the earlier parent aux attach so not just blindly hand 
>>>> it down to an IOMMU driver that's never heard of this new device 
>>>> before. Or perhaps the IOMMU drivers do their own bookkeeping for 
>>>> the mdev bus, such that they do handle the attach call, and just 
>>>> validate it internally based on the associated parent device and 
>>>> PASID. Either way, the inside maintains self-consistency and from 
>>>> the outside it looks like standard API usage without nasty hacks.
>>>> I'm pretty sure I've heard suggestions of using mediated devices 
>>>> beyond VFIO (e.g. within the kernel itself), so chances are this is 
>>>> a direction that we'll have to take at some point anyway.
>>>> And, that said, even if people do want an immediate quick fix 
>>>> regardless of technical debt, I'd still be a lot happier to see 
>>>> iommu_group_set_domain() lightly respun as iommu_attach_mdev() ;)
>>> Get your point and I agree with your concerns.
>>> To maintain the relationship between mdev's iommu_group and
>>> iommu_domain, how about extending below existing aux_attach api
>>> int iommu_aux_attach_device(struct iommu_domain *domain,
>>>                  struct device *dev)
>>> by adding the mdev's iommu_group?
>>> int iommu_aux_attach_device(struct iommu_domain *domain,
>>>                  struct device *dev,
>>>                  struct iommu_group *group)
>>> And, in iommu_aux_attach_device(), we require,
>>>   - @group only has a single device;
>>>   - @group hasn't been attached by any devices;
>>>   - Set the @domain to @group
>>> Just like what we've done in iommu_attach_device().
>>> Any thoughts?
>> Rather than pass a bare iommu_group with implicit restrictions, it 
>> might be neater to just pass an mdev_device, so that the IOMMU core 
>> can also take care of allocating and setting up the group. Then we 
>> flag the group internally as a special "mdev group" such that we can 
>> prevent callers from subsequently trying to add/remove devices or 
>> attach/detach its domain directly. That seems like it would make a 
>> pretty straightforward and robust API extension, as long as the mdev 
>> argument here is optional so that SVA and other aux users don't have 
>> to care. Other than the slightly different ordering where caller would 
>> have to allocate the mdev first, then finish it's PASID-based 
>> configuration afterwards, I guess it's not far off what I was thinking 
>> yesterday :)
> It looks good to me if we pass an *optional* made-up device instead of
> iommu_group. But it seems that vfio/mdev assumes an iommu_group first
> and then attaches domains to the groups. Hence, it's hard to move the
> group allocation and setting up into the attach interface.
> As proposed, the new iommu_aux_attach_device() might look like this:
> int iommu_aux_attach_device(struct iommu_domain *domain,
>                              struct device *phys_dev,
>                              struct device *dev)
> where,
> @phys_dev: The physical device which supports IOMMU_DEV_FEAT_AUX;
> @dev: a made-up device which presents the subset resources binding to
>        the aux-domain. An example use case is vfio/mdev. For cases where
>        no made-up devices are used, pass NULL instead.
> With @dev passed, we can require
> - single device in group;
> - no previous attaching;
> - set up internal logistics between group and domain;
> The iommu_aux_detach_device() needs the equivalent extensions.

Okay, let me send out the code first so that people can comment on the

Best regards,
iommu mailing list

      reply	other threads:[~2020-07-07  1:31 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-27  3:15 [PATCH 1/2] iommu: Add iommu_group_get/set_domain() Lu Baolu
2020-06-27  3:15 ` [PATCH 2/2] vfio/type1: Update group->domain after aux attach and detach Lu Baolu
2020-06-29 11:56 ` [PATCH 1/2] iommu: Add iommu_group_get/set_domain() Robin Murphy
2020-06-30  1:03   ` Lu Baolu
2020-06-30 16:51     ` Robin Murphy
2020-07-01  7:32       ` Lu Baolu
2020-07-01 12:18         ` Robin Murphy
2020-07-02  1:32           ` Lu Baolu
2020-07-02  2:36           ` Lu Baolu
2020-07-07  1:26             ` Lu Baolu [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \ \ \ \ \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).