kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@nvidia.com>
To: "Tian, Kevin" <kevin.tian@intel.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com>,
	Christoph Hellwig <hch@lst.de>, Joerg Roedel <joro@8bytes.org>,
	David Woodhouse <dwmw2@infradead.org>,
	Alex Williamson <alex.williamson@redhat.com>,
	Kirti Wankhede <kwankhede@nvidia.com>,
	"Raj, Ashok" <ashok.raj@intel.com>,
	"Kumar, Sanjay K" <sanjay.k.kumar@intel.com>,
	"Pan, Jacob jun" <jacob.jun.pan@intel.com>,
	Jean-Philippe Brucker <jean-philippe.brucker@arm.com>,
	"Liu, Yi L" <yi.l.liu@intel.com>,
	"Sun, Yi Y" <yi.y.sun@intel.com>,
	"peterx@redhat.com" <peterx@redhat.com>,
	"tiwei.bie@intel.com" <tiwei.bie@intel.com>,
	"Zeng, Xin" <xin.zeng@intel.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Jacob Pan <jacob.jun.pan@linux.intel.com>
Subject: Re: [PATCH v8 7/9] vfio/mdev: Add iommu related member in mdev_device
Date: Mon, 17 May 2021 11:03:16 -0300	[thread overview]
Message-ID: <20210517140316.GN1002214@nvidia.com> (raw)
In-Reply-To: <MWHPR11MB18866988310787FE573763878C529@MWHPR11MB1886.namprd11.prod.outlook.com>

On Wed, May 12, 2021 at 07:46:05AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Wednesday, May 12, 2021 1:37 AM
> > 
> > On Tue, May 11, 2021 at 02:56:05PM +0800, Lu Baolu wrote:
> > 
> > > >     After my next series the mdev drivers will have direct access to
> > > >     the vfio_device. So an alternative to using the struct device, or
> > > >     adding 'if mdev' is to add an API to the vfio_device world to
> > > >     inject what iommu configuration is needed from that direction
> > > >     instead of trying to discover it from a struct device.
> > >
> > > Just want to make sure that I understand you correctly.
> > >
> > > We should use the existing IOMMU in-kernel APIs to connect mdev with the
> > > iommu subsystem, so that the upper lays don't need to use something
> > > like (if dev_is_mdev) to handle mdev differently. Do I get you
> > > correctly?
> > 
> > After going through all the /dev/ioasid stuff I'm pretty convinced
> > that none of the PASID use cases for mdev should need any iommu
> > connection from the mdev_device - this is an artifact of trying to
> > cram the vfio container and group model into the mdev world and is not
> > good design.
> > 
> > The PASID interfaces for /dev/ioasid should use the 'struct
> > pci_device' for everything and never pass in a mdev_device to the
> > iommu layer.
> 'struct pci_device' -> 'struct device' since /dev/ioasid also needs to support
> non-pci devices?

I don't know. PASID is a PCI concept, I half expect to have at least
some wrappers for PCI specific IOMMU APIs so there is reasonable type
safety possible. But maybe it is all general enough that isn't needed.

> I assume the so-called connection here implies using iommu_attach_device 
> to attach a device to an iommu domain. 


> Did you suggest this connection must be done by the mdev driver
> which implements vfio_device and then passing iommu domain to
> /dev/ioasid when attaching the device to an IOASID? 

Why do we need iommu domain in a uAPI at all? It is an artifact of the
current kernel implementation

> sort of like: ioctl(device_fd, VFIO_ATTACH_IOASID, ioasid, domain);

ioasid and device_fd completely describe the IOMMU parameters, don't

> If yes, this conflicts with one design in /dev/ioasid proposal that we're
> working on. In earlier discussion we agreed that each ioasid is associated
> to a singleton iommu domain and all devices that are attached to this 
> ioasid with compatible iommu capabilities just share this domain.

I think you need to stand back a bit more from the current detailed
implementation of the iommu API. /dev/ioasid needs to be able to
create IOASID objects that can be used with as many devices as
reasonable without duplicating the page tables. If or how that maps to
todays iommu layer I don't know.

Remember the uAPI is forever, the kernel internals can change.

> Baolu and I discussed below draft proposal to avoid passing mdev_device
> to the iommu layer. Please check whether it makes sense:
> // for every device attached to an ioasid
> // mdev is represented by pasid (allocated by mdev driver)
> // pf/vf has INVALID_IOASID in pasid
> struct dev_info {
> 	struct list_head		next;
> 	struct device		*device;
> 	u32			pasid;
> }

This is a list of "attachments"? sure

> // for every allocated ioasid
> struct ioasid_info {
> 	// the handle to convey iommu operations
> 	struct iommu_domain	*domain;
> 	// metadata for map/unmap
> 	struct rb_node		dma_list;
> 	// the list of attached device
> 	struct dev_info		*dev_list;
> 	...
> }

Yes, probably something basically like that
> // called by VFIO/VDPA
> int ioasid_attach_device(struct *device, u32 ioasid, u32 pasid)

'u32 ioasid' should be a 'struct ioasid_info *' and 'pasid' is not
needed because ioasif_info->dev_list[..]->pasid already stores the

Keep in mind at this API level the 'struct device *' here shuld be the
PCI device never the mdev device.

> {
> 	// allocate a new dev_info, filled with device/pasid
> 	// allocate iommu domain if it's the 1st attached device
> 	// check iommu compatibility if an domain already exists
> 	// attach the device to the iommu domain
> 	if (pasid == INVALID_IOASID)
> 		iommu_attach_device(domain, device);
> 	else
> 		iommu_aux_attach_device(domain, device, pasid);

And if device is the pci_device I don't really see how this works.

This API layer would need to create some dummy struct device to attach
the aux domain too if you want to keep re-using the stuff we have

The idea that a PASID is 1:1 with a 'struct device' is completely VFIO
centric thinking.

> // when attaching PF/VF to an ioasid
> ioctl(device_fd, VFIO_ATTACH_IOASID, ioasid);

This would have to be a (ioasif_fd, ioasid) tuple as an ioasid is
scoped within each fd.

> -> get vfio_device of device_fd
> -> ioasid_attach_device(vfio_device->dev, ioasid, INVALID_IOASID);

The device knows if it is going to use a PASID or not. The API level
here should always very explicitly indicate *device* intention.

If the device knows it is going to form DMA's with a PASID tag then it
absoultely must be completely explict and clear in the API to the
layers below that is what is happening.

'INVALID_IOASID' does not communicate that ideal to me.

> // when attaching a mdev to an ioasid
> ioctl(device_fd, VFIO_ATTACH_IOASID, ioasid);
> -> get vfio_device of device_fd
> -> find mdev_parent of vfio_device
> -> find pasid allocated to this mdev
> -> ioasid_attach_device(parent->dev, ioasid, pasid);

Again no, mdev shouldn't be involved, it is the wrong layering.


  reply	other threads:[~2021-05-17 14:03 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-25  1:30 [PATCH v8 0/9] vfio/mdev: IOMMU aware mediated device Lu Baolu
2019-03-25  1:30 ` [PATCH v8 2/9] iommu/vt-d: Make intel_iommu_enable_pasid() more generic Lu Baolu
2019-03-25  1:30 ` [PATCH v8 3/9] iommu/vt-d: Add per-device IOMMU feature ops entries Lu Baolu
2019-03-25  1:30 ` [PATCH v8 4/9] iommu/vt-d: Move common code out of iommu_attch_device() Lu Baolu
2019-03-25  1:30 ` [PATCH v8 5/9] iommu/vt-d: Aux-domain specific domain attach/detach Lu Baolu
     [not found] ` <20190325013036.18400-1-baolu.lu-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
2019-03-25  1:30   ` [PATCH v8 1/9] iommu: Add APIs for multiple domains per device Lu Baolu
2019-03-25  1:30   ` [PATCH v8 6/9] iommu/vt-d: Return ID associated with an auxiliary domain Lu Baolu
2019-03-25  1:30 ` [PATCH v8 7/9] vfio/mdev: Add iommu related member in mdev_device Lu Baolu
2019-03-26  9:32   ` Kirti Wankhede
2019-03-27 14:17     ` Parav Pandit
     [not found]       ` <VI1PR0501MB2271172964AFA3CDD6F1B02CD1580-o1MPJYiShEx8vvm6e75m2MDSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
2019-03-27 18:16         ` Alex Williamson
2019-03-26 17:42   ` Alex Williamson
2021-04-06 20:00   ` Jason Gunthorpe
2021-04-07  1:58     ` Lu Baolu
2021-04-07 11:34       ` Jason Gunthorpe
2021-05-11  6:56     ` Lu Baolu
2021-05-11 17:37       ` Jason Gunthorpe
2021-05-12  7:46         ` Tian, Kevin
2021-05-17 14:03           ` Jason Gunthorpe [this message]
2019-03-25  1:30 ` [PATCH v8 8/9] vfio/type1: Add domain at(de)taching group helpers Lu Baolu
     [not found]   ` <20190325013036.18400-9-baolu.lu-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
2019-03-26 17:42     ` Alex Williamson
2019-03-25  1:30 ` [PATCH v8 9/9] vfio/type1: Handle different mdev isolation type Lu Baolu
2019-03-26  9:33   ` Kirti Wankhede
2019-03-26 17:42   ` Alex Williamson
2019-04-11 15:19 ` [PATCH v8 0/9] vfio/mdev: IOMMU aware mediated device Joerg Roedel
2019-04-12  1:36   ` Lu Baolu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210517140316.GN1002214@nvidia.com \
    --to=jgg@nvidia.com \
    --cc=alex.williamson@redhat.com \
    --cc=ashok.raj@intel.com \
    --cc=baolu.lu@linux.intel.com \
    --cc=dwmw2@infradead.org \
    --cc=hch@lst.de \
    --cc=iommu@lists.linux-foundation.org \
    --cc=jacob.jun.pan@intel.com \
    --cc=jacob.jun.pan@linux.intel.com \
    --cc=jean-philippe.brucker@arm.com \
    --cc=joro@8bytes.org \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=kwankhede@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterx@redhat.com \
    --cc=sanjay.k.kumar@intel.com \
    --cc=tiwei.bie@intel.com \
    --cc=xin.zeng@intel.com \
    --cc=yi.l.liu@intel.com \
    --cc=yi.y.sun@intel.com \


* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).