From: Lu Baolu <baolu.lu@linux.intel.com>
To: Jason Gunthorpe <jgg@nvidia.com>, Robin Murphy <robin.murphy@arm.com>
Cc: baolu.lu@linux.intel.com,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Joerg Roedel <joro@8bytes.org>,
	Alex Williamson <alex.williamson@redhat.com>,
	Bjorn Helgaas <bhelgaas@google.com>,
	Christoph Hellwig <hch@infradead.org>,
	Kevin Tian <kevin.tian@intel.com>,
	Ashok Raj <ashok.raj@intel.com>, Will Deacon <will@kernel.org>,
	Dan Williams <dan.j.williams@intel.com>,
	rafael@kernel.org, Diana Craciun <diana.craciun@oss.nxp.com>,
	Cornelia Huck <cohuck@redhat.com>,
	Eric Auger <eric.auger@redhat.com>, Liu Yi L <yi.l.liu@intel.com>,
	Jacob jun Pan <jacob.jun.pan@intel.com>,
	Chaitanya Kulkarni <kch@nvidia.com>,
	Stuart Yoder <stuyoder@gmail.com>,
	Laurentiu Tudor <laurentiu.tudor@nxp.com>,
	Thierry Reding <thierry.reding@gmail.com>,
	David Airlie <airlied@linux.ie>, Daniel Vetter <daniel@ffwll.ch>,
	Jonathan Hunter <jonathanh@nvidia.com>,
	Li Yang <leoyang.li@nxp.com>, Dmitry Osipenko <digetx@gmail.com>,
	iommu@lists.linux-foundation.org, linux-pci@vger.kernel.org,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4 07/13] iommu: Add iommu_at[de]tach_device_shared() for multi-device groups
Date: Thu, 23 Dec 2021 13:53:24 +0800
Message-ID: <fea0fc91-ac4c-dfe4-f491-5f906bea08bd@linux.intel.com>
In-Reply-To: <20211223005712.GA1779224@nvidia.com>

Hi Robin and Jason,

On 12/23/21 8:57 AM, Jason Gunthorpe wrote:
> On Wed, Dec 22, 2021 at 08:26:34PM +0000, Robin Murphy wrote:
>> On 21/12/2021 6:46 pm, Jason Gunthorpe wrote:
>>> On Tue, Dec 21, 2021 at 04:50:56PM +0000, Robin Murphy wrote:
>>>
>>>> this proposal is the worst of both worlds, in that drivers still have to be
>>>> just as aware of groups in order to know whether to call the _shared
>>>> interface or not, except it's now entirely implicit and non-obvious.
>>>
>>> Drivers are not aware of groups, where did you see that?
>>
>> `git grep iommu_attach_group -- :^drivers/iommu :^include`
>>
>> Did I really have to explain that?
> 
> Well, yes you did, because it shows you haven't understood my
> question. After this series, all of those calls are deleted (though Lu,
> we missed one of the tegra ones in staging; let's get it in the next
> posting).

Yes, I will.

> 
> So, after this series, where do you see drivers being aware of groups?
> If things were missed, let's fix them.
> 
>>> If the driver uses multiple struct devices and intends to connect them
> all to the same domain, then it uses the _shared variant. The only
> difference between the two is that the _shared variant lacks some of
> the protections against driver abuse of the API.
>>
>> You've lost me again; how are those intentions any different? Attaching one
>> device to a private domain is a literal subset of attaching more than one
>> device to a private domain.
> 
> Yes, it is a subset, but drivers will malfunction if they are not
> designed for multi-device attachment and wrongly get it, and there is
> only one driver that actually needs this.
> 
> I maintain a big driver subsystem and have learned that grepability of
> the driver mess for special cases is quite a good thing to
> have. Forcing drivers to mark in code when they do something weird is
> an advantage, even if it causes some small API redundancy.
> 
> However, if you really feel strongly this should really be one API
> with the _shared implementation I won't argue it any further.
> 
>> So then we have the iommu_attach_group() interface for new code (and still
>> nobody has got round to updating the old code to it yet), for which
>> the
> 
> This series is going in the direction of eliminating
> iommu_attach_group() as part of the driver
> interface. iommu_attach_group() is repurposed to only be useful for
> VFIO.

We can also remove iommu_attach_group() in VFIO, because it is
essentially equivalent to:

	iommu_group_for_each_dev(group, iommu_attach_device(dev))
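
Roughly, as a sketch of that per-device walk (the callback shape
follows the existing iommu_group_for_each_dev() signature; the helper
names here are only illustrative):

	#include <linux/device.h>
	#include <linux/iommu.h>

	/* Pass the target domain through the opaque data pointer. */
	static int __attach_one_dev(struct device *dev, void *data)
	{
		struct iommu_domain *domain = data;

		return iommu_attach_device(domain, dev);
	}

	static int attach_group_by_device(struct iommu_group *group,
					  struct iommu_domain *domain)
	{
		return iommu_group_for_each_dev(group, domain,
						__attach_one_dev);
	}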

> 
>> properly, or iommu_attach_group() with a potentially better interface and
>> actual safety. The former is still more prevalent (and the interface
>> argument compelling), so if we put the new implementation behind that, with
>> the one tweak of having it set DMA_OWNER_PRIVATE_DOMAIN automatically, kill
>> off iommu_attach_group() by converting its couple of users,
> 
> This is what we did: iommu_attach_device() & _shared() are to be the
> only interfaces for drivers, and we killed off the couple of
> iommu_attach_group() users except VFIO (apart from the one missed in
> drivers/staging).
> 
>> and not only have we solved the VFIO problem but we've also finally
>> updated all the legacy code for free! Of course you can have a
>> separate version for VFIO to attach with
>> DMA_OWNER_PRIVATE_DOMAIN_USER if you like, although I still fail to
>> understand the necessity of the distinction.
> 
> And the separate version for VFIO is called 'iommu_attach_group()'.
> 
> Lu, it is probably a good idea to add an assertion here that the group
> is in DMA_OWNER_PRIVATE_DOMAIN_USER to make it clear that
> iommu_attach_group() is only for VFIO.
> 
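
An assertion along these lines could work (a rough sketch; the owner
field and enum names follow what the earlier patches in this series add
to struct iommu_group, so the exact spellings may differ):

	/* In iommu_attach_group(), before doing the actual attach: */
	if (WARN_ON(group->dma_owner != DMA_OWNER_PRIVATE_DOMAIN_USER))
		return -EPERM;
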
> VFIO has a special requirement that it be able to do:
> 
> +       ret = iommu_group_set_dma_owner(group->iommu_group,
> +                                       DMA_OWNER_PRIVATE_DOMAIN_USER, f.file);
> 
> Without having an iommu_domain to attach.
> 
> This is because of the giant special case that PPC made of VFIO's
> IOMMU code. PPC (aka vfio_iommu_spapr_tce.c) requires the group
> isolation that iommu_group_set_dma_owner() provides, but does not
> actually have an iommu_domain and cannot/does not call
> iommu_attach_group().
> 
> Fixing this is a whole other giant adventure I'm hoping David will
> help me unwind next year..
> 
> This series solves this problem by using the two-step sequence of
> iommu_group_set_dma_owner()/iommu_attach_group() and conceptually
> redefining how iommu_attach_group() works to require the external
> caller to have done the iommu_group_set_dma_owner() for it. This is
> why the series has three APIs, because the VFIO special one assumes
> external iommu_group_set_dma_owner(). It just happens that this is
> exactly the same code as iommu_attach_group() today.
> 
> As for why DMA_OWNER_PRIVATE_DOMAIN_USER exists: VFIO doesn't have
> an iommu_domain at this point, but it still needs the iommu core to
> detach the default domain. This is what the _USER does.

There is also a contract that, once the USER ownership is claimed, the
device may be accessed by userspace through its MMIO registers. So a
device can be accessible to userspace before any userspace I/O address
space is attached.

> 
> Soo..
> 
> There is another way to organize this and perhaps it does make more
> sense. I will try to sketch briefly in email, try to imagine the
> gaps..
> 
> API family (== compares to this series):
> 
>     iommu_device_use_dma_api(dev);
>       == iommu_device_set_dma_owner(dev, DMA_OWNER_DMA_API, NULL);
> 
>     iommu_group_set_dma_owner(group, file);
>       == iommu_device_set_dma_owner(dev, DMA_OWNER_PRIVATE_DOMAIN_USER,
>                                     file);
>       Always detaches all domains from the group

I hope we can drop all of the group-variant APIs, since we already have
the per-device interfaces; just iterate over all devices in the group
and call the device API.

> 
>     iommu_attach_device(domain, dev)
>       == as is in this patch
>       dev and domain are 1:1
> 
>     iommu_attach_device_shared(domain, dev)
>       == as is in this patch
>       dev and domain are N:1
>       * could just be the same as iommu_attach_device
> 
>     iommu_replace_group_domain(group, old_domain, new_domain)
>       Makes group point at new_domain. new_domain can be NULL.
> 
>     iommu_device_unuse_dma_api(dev)
>      == iommu_device_release_dma_owner() in this patch
> 
>     iommu_group_release_dma_owner(group)
>      == iommu_detach_group() && iommu_group_release_dma_owner()
> 
> VFIO would use the sequence:
> 
>     iommu_group_set_dma_owner(group, file);
>     iommu_replace_group_domain(group, NULL, domain_1);
>     iommu_replace_group_domain(group, domain_1, domain_2);
>     iommu_group_release_dma_owner(group);
> 
> Simple devices would use
> 
>     iommu_attach_device(domain, dev);
>     iommu_detach_device(domain, dev);
> 
> Tegra would use:
> 
>     iommu_attach_device_shared(domain, dev);
>     iommu_detach_device_shared(domain, dev);
>     // Or not, if people agree we should not mark this
> 
> DMA API would have the driver core dma_configure do:
>     iommu_device_use_dma_api(dev);
>     dev->driver->probe()
>     iommu_device_unuse_dma_api(dev);
> 
> It is more APIs overall, but perhaps they have a much clearer
> purpose.
> 
> I think it would be clear why iommu_group_set_dma_owner(), which
> actually does a detach, is not the same thing as iommu_attach_device().

iommu_device_set_dma_owner() will eventually call
iommu_group_set_dma_owner(). I don't get why
iommu_group_set_dma_owner() is special and needs to be kept.
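
For reference, collecting that proposed API family as plain
declarations (the argument types here are my guesses, not from the
series):

	int iommu_device_use_dma_api(struct device *dev);
	void iommu_device_unuse_dma_api(struct device *dev);

	int iommu_group_set_dma_owner(struct iommu_group *group,
				      struct file *file);
	void iommu_group_release_dma_owner(struct iommu_group *group);

	int iommu_attach_device(struct iommu_domain *domain,
				struct device *dev);
	int iommu_attach_device_shared(struct iommu_domain *domain,
				       struct device *dev);
	int iommu_replace_group_domain(struct iommu_group *group,
				       struct iommu_domain *old_domain,
				       struct iommu_domain *new_domain);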

> 
> I'm not sure if this entirely eliminates
> DMA_OWNER_PRIVATE_DOMAIN_USER, or not, but at least it isn't in the
> API.
> 
> Is it better?

Perhaps I have missed something, but I have a simpler idea. We only
need the interfaces below:

	iommu_device_set_dma_owner(dev, owner);
	iommu_device_release_dma_owner(dev, owner);
	iommu_attach_device(domain, dev, owner);
	iommu_detach_device(domain, dev);

All existing drivers calling iommu_attach_device() remain unchanged
since we already have singleton group enforcement. We only need to add
a default owner type.

For a multi-device group, like drm/tegra, the driver should claim
PRIVATE_DOMAIN ownership and call iommu_attach_device(domain, dev,
PRIVATE_DOMAIN) explicitly.

The new iommu_attach_device(domain, dev, owner) is a mix of the existing
iommu_attach_device() and the new iommu_attach_device_shared(). That
means:

	if (group_is_singleton(group))
		__iommu_attach_device(domain, dev)
	else
		__iommu_attach_device_shared(domain, dev, owner)
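
In rough code, such a combined helper might look like the following
(a sketch only; group_is_singleton(), __iommu_attach_device_shared(),
and the owner enum name are illustrative and may not match the series
exactly):

	int iommu_attach_device(struct iommu_domain *domain,
				struct device *dev,
				enum iommu_dma_owner owner)
	{
		struct iommu_group *group = iommu_group_get(dev);
		int ret;

		if (!group)
			return -ENODEV;

		if (group_is_singleton(group))
			ret = __iommu_attach_device(domain, dev);
		else
			ret = __iommu_attach_device_shared(domain, dev, owner);

		iommu_group_put(group);
		return ret;
	}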

The group-variant interfaces will then be deprecated and replaced with
the device ones.

Sorry if I missed anything.

> 
>> What VFIO wants is (conceptually[1]) "attach this device to my domain,
>> provided it and any other devices in its group are managed by a driver I
>> approve of."
> 
> Yes, sure, "conceptually". But, there are troublesome details.
> 
>> VFIO will also need a struct device anyway, because once I get back from my
>> holiday in the new year I need to start working with Simon on evolving the
>> rest of the API away from bus->iommu_ops to dev->iommu so we can finally
>> support IOMMU drivers coexisting[2].
> 
> For VFIO it would be much easier to get the ops from the struct
> iommu_group (eg via iommu_group->default_domain->ops, or whatever).
> 
>> Indeed I agree with that second point, I'm just increasingly baffled how
>> it's not clear to you that there is only one fundamental use-case here.
>> Perhaps I'm too familiar with the history to objectively see how unclear the
>> current state of things might be :/
> 
> I think it is because you are just not familiar with the dark corners
> of VFIO.
> 
> VFIO has a special case, I outlined above.
> 
>>> This is taking 426a to its logical conclusion and *removing* the
>>> group API from the drivers entirely. This is desirable because drivers
>>> cannot do anything sane with the group.
>>
>> I am in complete agreement with that (to the point of also not liking patch
>> #6).
> 
> Unfortunately patch #6 is only because of VFIO needing to use the
> group as a handle.
> 
> Jason
> 

Best regards,
baolu
