KVM Archive on lore.kernel.org
 help / color / Atom feed
From: Lu Baolu <baolu.lu@linux.intel.com>
To: Robin Murphy <robin.murphy@arm.com>,
	Joerg Roedel <joro@8bytes.org>,
	Alex Williamson <alex.williamson@redhat.com>
Cc: baolu.lu@linux.intel.com, Kevin Tian <kevin.tian@intel.com>,
	Dave Jiang <dave.jiang@intel.com>,
	Ashok Raj <ashok.raj@intel.com>,
	kvm@vger.kernel.org, Cornelia Huck <cohuck@redhat.com>,
	linux-kernel@vger.kernel.org, iommu@lists.linux-foundation.org
Subject: Re: [PATCH 1/2] iommu: Add iommu_group_get/set_domain()
Date: Thu, 2 Jul 2020 10:36:21 +0800
Message-ID: <5f3ad162-647c-1295-880b-6b104807ba9a@linux.intel.com> (raw)
In-Reply-To: <ffbb405b-5617-5659-3fc1-302c530aceef@arm.com>

Hi Robin,

On 7/1/20 8:18 PM, Robin Murphy wrote:
> On 2020-07-01 08:32, Lu Baolu wrote:
>> Hi Robin,
>>
>> On 2020/7/1 0:51, Robin Murphy wrote:
>>> On 2020-06-30 02:03, Lu Baolu wrote:
>>>> Hi Robin,
>>>>
>>>> On 6/29/20 7:56 PM, Robin Murphy wrote:
>>>>> On 2020-06-27 04:15, Lu Baolu wrote:
>>>>>> The hardware assistant vfio mediated device is a use case of iommu
>>>>>> aux-domain. The interactions between vfio/mdev and iommu during mdev
>>>>>> creation and passthr are:
>>>>>>
>>>>>> - Create a group for mdev with iommu_group_alloc();
>>>>>> - Add the device to the group with
>>>>>>          group = iommu_group_alloc();
>>>>>>          if (IS_ERR(group))
>>>>>>                  return PTR_ERR(group);
>>>>>>
>>>>>>          ret = iommu_group_add_device(group, &mdev->dev);
>>>>>>          if (!ret)
>>>>>>                  dev_info(&mdev->dev, "MDEV: group_id = %d\n",
>>>>>>                           iommu_group_id(group));
>>>>>> - Allocate an aux-domain
>>>>>>     iommu_domain_alloc()
>>>>>> - Attach the aux-domain to the physical device from which the mdev is
>>>>>>    created.
>>>>>>     iommu_aux_attach_device()
>>>>>>
>>>>>> In the whole process, an iommu group was allocated for the mdev 
>>>>>> and an
>>>>>> iommu domain was attached to the group, but the group->domain leaves
>>>>>> NULL. As the result, iommu_get_domain_for_dev() doesn't work anymore.
>>>>>>
>>>>>> This adds iommu_group_get/set_domain() so that group->domain could be
>>>>>> managed whenever a domain is attached or detached through the 
>>>>>> aux-domain
>>>>>> api's.
>>>>>
>>>>> Letting external callers poke around directly in the internals of 
>>>>> iommu_group doesn't look right to me.
>>>>
>>>> Unfortunately, it seems that the vifo iommu abstraction is deeply bound
>>>> to the IOMMU subsystem. We can easily find other examples:
>>>>
>>>> iommu_group_get/set_iommudata()
>>>> iommu_group_get/set_name()
>>>> ...
>>>
>>> Sure, but those are ways for users of a group to attach useful 
>>> information of their own to it, that doesn't matter to the IOMMU 
>>> subsystem itself. The interface you've proposed gives callers rich 
>>> new opportunities to fundamentally break correct operation of the API:
>>>
>>>      dom = iommu_domain_alloc();
>>>      iommu_attach_group(dom, grp);
>>>      ...
>>>      iommu_group_set_domain(grp, NULL);
>>>      // oops, leaked and can't ever detach properly now
>>>
>>> or perhaps:
>>>
>>>      grp = iommu_group_alloc();
>>>      iommu_group_add_device(grp, dev);
>>>      iommu_group_set_domain(grp, dom);
>>>      ...
>>>      iommu_detach_group(dom, grp);
>>>      // oops, IOMMU driver might not handle this
>>>
>>>>> If a regular device is attached to one or more aux domains for 
>>>>> PASID use, iommu_get_domain_for_dev() is still going to return the 
>>>>> primary domain, so why should it be expected to behave differently 
>>>>> for mediated
>>>>
>>>> Unlike the normal device attach, we will encounter two devices when it
>>>> comes to aux-domain.
>>>>
>>>> - Parent physical device - this might be, for example, a PCIe device
>>>> with PASID feature support, hence it is able to tag an unique PASID
>>>> for DMA transfers originated from its subset. The device driver hence
>>>> is able to wrapper this subset into an isolated:
>>>>
>>>> - Mediated device - a fake device created by the device driver 
>>>> mentioned
>>>> above.
>>>>
>>>> Yes. All you mentioned are right for the parent device. But for 
>>>> mediated
>>>> device, iommu_get_domain_for_dev() doesn't work even it has an valid
>>>> iommu_group and iommu_domain.
>>>>
>>>> iommu_get_domain_for_dev() is a necessary interface for device drivers
>>>> which want to support aux-domain. For example,
>>>
>>> Only if they want to follow this very specific notion of using 
>>> made-up devices and groups to represent aux attachments. Even if a 
>>> driver managing its own aux domains entirely privately does create 
>>> child devices for them, it's not like it can't keep its domain 
>>> pointers in drvdata if it wants to ;)
>>>
>>> Let's not conflate the current implementation of vfio_mdev with the 
>>> general concepts involved here.
>>>
>>>>            struct iommu_domain *domain;
>>>>            struct device *dev = mdev_dev(mdev);
>>>>        unsigned long pasid;
>>>>
>>>>            domain = iommu_get_domain_for_dev(dev);
>>>>            if (!domain)
>>>>                    return -ENODEV;
>>>>
>>>>            pasid = iommu_aux_get_pasid(domain, dev->parent);
>>>>        if (pasid == IOASID_INVALID)
>>>>            return -EINVAL;
>>>>
>>>>        /* Program the device context with the PASID value */
>>>>        ....
>>>>
>>>> Without this fix, iommu_get_domain_for_dev() always returns NULL and 
>>>> the
>>>> device driver has no means to support aux-domain.
>>>
>>> So either the IOMMU API itself is missing the ability to do the right 
>>> thing internally, or the mdev layer isn't using it appropriately. 
>>> Either way, simply punching holes in the API for mdev to hack around 
>>> its own mess doesn't seem like the best thing to do.
>>>
>>> The initial impression I got was that it's implicitly assumed here 
>>> that the mdev itself is attached to exactly one aux domain and 
>>> nothing else, at which point I would wonder why it's using aux at 
>>> all, but are you saying that in fact no attach happens with the mdev 
>>> group either way, only to the parent device?
>>>
>>> I'll admit I'm not hugely familiar with any of this, but it seems to 
>>> me that the logical flow should be:
>>>
>>>      - allocate domain
>>>      - attach as aux to parent
>>>      - retrieve aux domain PASID
>>>      - create mdev child based on PASID
>>>      - attach mdev to domain (normally)
>>>
>>> Of course that might require giving the IOMMU API a proper 
>>> first-class notion of mediated devices, such that it knows the mdev 
>>> represents the PASID, and can recognise the mdev attach is equivalent 
>>> to the earlier parent aux attach so not just blindly hand it down to 
>>> an IOMMU driver that's never heard of this new device before. Or 
>>> perhaps the IOMMU drivers do their own bookkeeping for the mdev bus, 
>>> such that they do handle the attach call, and just validate it 
>>> internally based on the associated parent device and PASID. Either 
>>> way, the inside maintains self-consistency and from the outside it 
>>> looks like standard API usage without nasty hacks.
>>>
>>> I'm pretty sure I've heard suggestions of using mediated devices 
>>> beyond VFIO (e.g. within the kernel itself), so chances are this is a 
>>> direction that we'll have to take at some point anyway.
>>>
>>> And, that said, even if people do want an immediate quick fix 
>>> regardless of technical debt, I'd still be a lot happier to see 
>>> iommu_group_set_domain() lightly respun as iommu_attach_mdev() ;)
>>
>> Get your point and I agree with your concerns.
>>
>> To maintain the relationship between mdev's iommu_group and
>> iommu_domain, how about extending below existing aux_attach api
>>
>> int iommu_aux_attach_device(struct iommu_domain *domain,
>>                  struct device *dev)
>>
>> by adding the mdev's iommu_group?
>>
>> int iommu_aux_attach_device(struct iommu_domain *domain,
>>                  struct device *dev,
>>                  struct iommu_group *group)
>>
>> And, in iommu_aux_attach_device(), we require,
>>   - @group only has a single device;
>>   - @group hasn't been attached by any devices;
>>   - Set the @domain to @group
>>
>> Just like what we've done in iommu_attach_device().
>>
>> Any thoughts?
> 
> Rather than pass a bare iommu_group with implicit restrictions, it might 
> be neater to just pass an mdev_device, so that the IOMMU core can also 
> take care of allocating and setting up the group. Then we flag the group 
> internally as a special "mdev group" such that we can prevent callers 
> from subsequently trying to add/remove devices or attach/detach its 
> domain directly. That seems like it would make a pretty straightforward 
> and robust API extension, as long as the mdev argument here is optional 
> so that SVA and other aux users don't have to care. Other than the 
> slightly different ordering where caller would have to allocate the mdev 
> first, then finish it's PASID-based configuration afterwards, I guess 
> it's not far off what I was thinking yesterday :)

It looks good to me if we pass an *optional* made-up device instead of
iommu_group. But it seems that vfio/mdev assumes an iommu_group first
and then attaches domains to the groups. Hence, it's hard to move the
group allocation and setting up into the attach interface.

As proposed, the new iommu_aux_attach_device() might look like this:

int iommu_aux_attach_device(struct iommu_domain *domain,
                             struct device *phys_dev,
                             struct device *dev)

where,

@phys_dev: The physical device which supports IOMMU_DEV_FEAT_AUX;
@dev: a made-up device which presents the subset resources binding to
       the aux-domain. An example use case is vfio/mdev. For cases where
       no made-up devices are used, pass NULL instead.

With @dev passed, we can require

- single device in group;
- no previous attaching;
- set up internal logistics between group and domain;

The iommu_aux_detach_device() needs the equivalent extensions.

Best regards,
baolu

  parent reply index

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-27  3:15 Lu Baolu
2020-06-27  3:15 ` [PATCH 2/2] vfio/type1: Update group->domain after aux attach and detach Lu Baolu
2020-06-29 11:56 ` [PATCH 1/2] iommu: Add iommu_group_get/set_domain() Robin Murphy
2020-06-30  1:03   ` Lu Baolu
2020-06-30 16:51     ` Robin Murphy
2020-07-01  7:32       ` Lu Baolu
2020-07-01 12:18         ` Robin Murphy
2020-07-02  1:32           ` Lu Baolu
2020-07-02  2:36           ` Lu Baolu [this message]
2020-07-07  1:26             ` Lu Baolu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5f3ad162-647c-1295-880b-6b104807ba9a@linux.intel.com \
    --to=baolu.lu@linux.intel.com \
    --cc=alex.williamson@redhat.com \
    --cc=ashok.raj@intel.com \
    --cc=cohuck@redhat.com \
    --cc=dave.jiang@intel.com \
    --cc=iommu@lists.linux-foundation.org \
    --cc=joro@8bytes.org \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=robin.murphy@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

KVM Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/kvm/0 kvm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 kvm kvm/ https://lore.kernel.org/kvm \
		kvm@vger.kernel.org
	public-inbox-index kvm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.kvm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git