All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alex Williamson <alex.williamson@redhat.com>
To: "Liu, Yi L" <yi.l.liu@intel.com>
Cc: "kwankhede@nvidia.com" <kwankhede@nvidia.com>,
	"Tian, Kevin" <kevin.tian@intel.com>,
	"baolu.lu@linux.intel.com" <baolu.lu@linux.intel.com>,
	"Sun, Yi Y" <yi.y.sun@intel.com>,
	"joro@8bytes.org" <joro@8bytes.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	Masahiro Yamada <yamada.masahiro@socionext.com>
Subject: Re: [PATCH v1 9/9] smaples: add vfio-mdev-pci driver
Date: Fri, 5 Jul 2019 09:55:20 -0600	[thread overview]
Message-ID: <20190705095520.548331c2@x1.home> (raw)
In-Reply-To: <A2975661238FB949B60364EF0F2C257439F1FF4E@SHSMSX104.ccr.corp.intel.com>

On Thu, 4 Jul 2019 09:11:02 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:

> Hi Alex,
> 
> > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > Sent: Thursday, July 4, 2019 1:22 AM
> > To: Liu, Yi L <yi.l.liu@intel.com>
> > Subject: Re: [PATCH v1 9/9] smaples: add vfio-mdev-pci driver
> > 
> > On Wed, 3 Jul 2019 08:25:25 +0000
> > "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> >   
> > > Hi Alex,
> > >
> > > Thanks for the comments. Have four inline responses below. And one
> > > of them need your further help. :-)  
> 
> [...]
> 
> > > > > >  
> > > > > > > > > > used iommu_attach_device() rather than iommu_attach_group()
> > > > > > > > > > for non-aux mdev iommu_device.  Is there a requirement that
> > > > > > > > > > the mdev parent device is in a singleton iommu group?  
> > > > > > > > >
> > > > > > > > > I don't think there should have such limitation. Per my
> > > > > > > > > understanding, vfio-mdev-pci should also be able to bind to
> > > > > > > > > devices which shares iommu group with other devices. vfio-pci works  
> > well  
> > > > for such devices.  
> > > > > > > > > And since the two drivers share most of the codes, I think
> > > > > > > > > vfio-mdev-pci should naturally support it as well.  
> > > > > > > >
> > > > > > > > Yes, the difference though is that vfio.c knows when devices are
> > > > > > > > in the same group, which mdev vfio.c only knows about the
> > > > > > > > non-iommu backed group, not the group that is actually used for
> > > > > > > > the iommu backing.  So we either need to enlighten vfio.c or
> > > > > > > > further abstract those details in vfio_iommu_type1.c.  
> > > > > > >
> > > > > > > Not sure if it is necessary to introduce more changes to vfio.c or
> > > > > > > vfio_iommu_type1.c. If it's only for the scenario which two
> > > > > > > devices share an iommu_group, I guess it could be supported by
> > > > > > > using __iommu_attach_device() which has no device counting for the
> > > > > > > group. But maybe I missed something here. It would be great if you
> > > > > > > can elaborate a bit for it. :-)  
> > > > > >
> > > > > > We need to use the group semantics, there's a reason
> > > > > > __iommu_attach_device() is not exposed, it's an internal helper.  I
> > > > > > think there's no way around that we need to somewhere track the
> > > > > > actual group we're attaching to and have the smarts to re-use it for
> > > > > > other devices in the same group.  
> > > > >
> > > > > Hmmm, exposing __iommu_attach_device() is not good, let's forget it.
> > > > > :-)
> > > > >  
> > > > > > > > > > If this is a simplification, then vfio-mdev-pci should not
> > > > > > > > > > bind to devices where this is violated since there's no way
> > > > > > > > > > to use the device.  Can we support it though?  
> > > > > > > > >
> > > > > > > > > yeah, I think we need to support it.  
> > >
> > > I've already made vfio-mdev-pci driver work for non-singleton iommu
> > > group. e.g. for devices in a single iommu group, I can bind the devices
> > > to eithervfio-pci or vfio-mdev-pci and then passthru them to a VM. And
> > > it will fail if user tries to passthru a vfio-mdev-pci device via vfio-pci
> > > manner "-device vfio-pci,host=01:00.1". In other words, vfio-mdev-pci
> > > device can only passthru via
> > > "-device vfio-pci,sysfsdev=/sys/bus/mdev/devices/UUID". This is what
> > > we expect.
> > >
> > > However, I encountered a problem when trying to prevent user from
> > > passthru these devices to different VMs. I've tried in my side, and I
> > > can passthru vfio-pci device and vfio-mdev-pci device to different
> > > VMs. But actually this operation should be failed. If all the devices
> > > are bound to vfio-pci, Qemu will open iommu backed group. So
> > > Qemu can check if a given group has already been used by an
> > > AddressSpace (a.ka. VM) in vfio_get_group() thus to prevent
> > > user from passthru these devices to different VMs if the devices
> > > are in the same iommu backed group. However, here for a
> > > vfio-mdev-pci device, it has a new group and group ID, Qemu
> > > will not be able to detect if the other devices (share iommu group
> > > with vfio-mdev-pci device) are passthru to existing VMs. This is the
> > > major problem for vfio-mdev-pci to support non-singleton group
> > > in my side now. Even all devices are bound to vfio-mdev-pci driver,
> > > Qemu is still unable to check since all the vfio-mdev-pci devices
> > > have a separate mdev group.
> > >
> > > To fix it, may need Qemu to do more things. E.g. If it tries to use a
> > > non-singleton iommu backed group, it needs to check if any mdev
> > > group is created and used by an existing VM. Also it needs check if
> > > iommu backed group is passthru to an existing VM when trying to
> > > use a mdev group. For singleton iommu backed group and
> > > aux-domain enabled physical device, still allow to passthru mdev
> > > group to different VMs. To achieve these checks, Qemu may need
> > > to have knowledge whether a group is iommu backed and singleton
> > > or not. Do you think it is good to expose such info to userspace? or
> > > any other idea? :-)  
> > 
> > QEMU is never responsible for isolating a group, QEMU is just a
> > userspace driver, it's vfio's responsibility to prevent the user from
> > splitting groups in ways that are not allowed.  QEMU does not know the  
> 
> yep, also my concern.
> 
> > true group association, it only knows the "virtual" group of the mdev
> > device.  QEMU will create a container and add the mdev virtual group to
> > the container.  In the kernel, the type1 backend should actually do an
> > iommu_attach_group(), attaching the iommu_device group to the domain.
> > When QEMU processes the next device, it will have a different group,
> > but (assuming no vIOMMU) it will try to attach it to the same
> > container, which should work because the iommu_device group backing the
> > mdev virtual group is already attached to this domain.
> > If we had two separate QEMU processes, each with an mdev device from a
> > common group at the iommu_device level, the type1 backend should fail
> > to attach the group to the container for the later caller.  I'd think
> > this should fail at the iommu_attach_group() call since the group we're
> > trying to attach is already attached to another domain.  
> 
> Agree with you. At first, I want to fail it in similar way with vfio-pci devices.
> For vfio-pci devices from a common group, vfio will fail the operation around
> /dev/vfio/group_id open if user tries to assign the vfio-pci devices from common
> group to multiple QEMU processes. Meanwhile, QEMU will avoid to open a
> /dev/vfio/group_id multiple times, so current vfio/QEMU works well for 
> non- singleton group (no vIOMMU). Unfortunately, looks like we have no way
> to fail vfio-mdev-pci devices in similar mechanism as each mdev has a separate
> group. So yes, I agree with you that we may fail it around the group attach
> phase. Below is my draft idea:
> 
> In vfio_iommu_type1_attach_group(), we need to do the following checks.
> 
> if (mdev_group) {
> 	if (iommu_device group enabled aux-domain) {
> 		/*
> 		  * iommu_group enabled aux-domain means the iommu_devices
> 		  * in this group are aux-domain enabled. e.g. SIOV capable devices.
> 		  * Also, I think for aux-domain enabled group, it essentially means
> 		  * the group is a singleton group as SIOV capable devices require
> 		  * to be in a singleton group.
> 		  */
> 		 iommu_aux_attach_device();
> 	} else {
> 		/*
> 		  * needs to check the group->opened in vfio.c. Just like what
> 		  * vfio_group_fopen() does. May be a new VFIO interface required
> 		  * here since the group->opened is within vfio.c.
> 		  * vfio_iommu_device_group_opened_inc() will inc group->opened, so
> 		  * that other VM will fail when trying to open the group. And another
> 		  * VFIO interface is also required to dec group->opened when VM is
> 		  * down.
> 		  */
> 		if (vfio_iommu_device_group_opened_inc(iommu_device_group))
> 			return -EBUSY;
> 		iommu_attach_gorup(iommu_device_group);
> 	}
> }
> 
> The concern here is the two new VFIO interfaces. Any thoughts on this proposal? :-)
> 
> > It's really unfortunate that we don't have the mdev inheriting the
> > iommu group of the iommu_device so that userspace can really understand
> > this relationship.  A separate group makes sense for the aux-domain
> > case, and is (I guess) not a significant issue in the case of a
> > singleton iommu_device group, but it's pretty awkward here.  Perhaps
> > this is something we should correct in design of iommu backed mdevs.  
> 
> Yeah, for aux-domain case, it is not significant issue as aux-domain essentially
> means singleton iommu_devie group. And in early time, when designing the support
> for wrap pci as a mdev, we also considered to let vfio-mdev-pci to reuse
> iommu_device group. But this results in an iommu backed group includes mdev and
> physical devices, which might also be strange. Do you think it is valuable to reconsider
> it?

From a group perspective, the cleanest solution would seem to be that
IOMMU backed mdevs w/o aux domain support should inherit the IOMMU
group of the iommu_device, but I think the barrier here is that we have
a difficult time determining if the group is "viable" in that case.
For example a group where one devices is bound to a native host driver
and the other device bound to a vfio driver would typically be
considered non-viable as it breaks the isolation guarantees.  However
I think in this configuration, the parent device is effectively
participating in the isolation and "donating" its iommu group on behalf
of the mdev device.  I don't think we can simultaneously use that iommu
group for any other purpose.  I'm sure we could come up with a way for
vifo-core to understand this relationship and add it to the white list,
I wonder though how confusing this might be to users who now understand
the group/driver requirement to be "all endpoints bound to vfio
drivers".  This might still be the best approach regardless of this.
Thanks,

Alex

  reply	other threads:[~2019-07-05 15:55 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-08 13:21 [PATCH v1 0/9] vfio_pci: wrap pci device as a mediated device Liu Yi L
2019-06-08 13:21 ` [PATCH v1 1/9] vfio_pci: move vfio_pci_is_vga/vfio_vga_disabled to header Liu Yi L
2019-06-08 13:21 ` [PATCH v1 2/9] vfio_pci: refine user config reference in vfio-pci module Liu Yi L
2019-06-08 13:21 ` [PATCH v1 3/9] vfio_pci: refine vfio_pci_driver reference in vfio_pci.c Liu Yi L
2019-06-08 13:21 ` [PATCH v1 4/9] vfio_pci: make common functions be extern Liu Yi L
2019-06-08 13:21 ` [PATCH v1 5/9] vfio_pci: duplicate vfio_pci.c Liu Yi L
2019-06-08 13:21 ` [PATCH v1 6/9] vfio_pci: shrink vfio_pci_common.c Liu Yi L
2019-06-08 13:21 ` [PATCH v1 7/9] vfio_pci: shrink vfio_pci.c Liu Yi L
2019-06-08 13:21 ` [PATCH v1 8/9] vfio/pci: protect cap/ecap_perm bits alloc/free with atomic op Liu Yi L
2019-06-08 13:21 ` [PATCH v1 9/9] smaples: add vfio-mdev-pci driver Liu Yi L
2019-06-20  4:26   ` Alex Williamson
2019-06-20 13:00     ` Liu, Yi L
2019-06-20 21:07       ` Alex Williamson
2019-06-21 10:23         ` Liu, Yi L
2019-06-21 15:57           ` Alex Williamson
2019-06-24  8:20             ` Liu, Yi L
2019-06-28 15:07               ` Alex Williamson
2019-07-03  8:25                 ` Liu, Yi L
2019-07-03 17:22                   ` Alex Williamson
2019-07-04  9:11                     ` Liu, Yi L
2019-07-05 15:55                       ` Alex Williamson [this message]
2019-07-11 12:27                         ` Liu, Yi L
2019-07-11 19:08                           ` Alex Williamson
2019-07-12 12:55                             ` Liu, Yi L
2019-07-19 20:57                               ` Alex Williamson
2019-07-26  9:04                                 ` Liu, Yi L

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190705095520.548331c2@x1.home \
    --to=alex.williamson@redhat.com \
    --cc=baolu.lu@linux.intel.com \
    --cc=joro@8bytes.org \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=kwankhede@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=yamada.masahiro@socionext.com \
    --cc=yi.l.liu@intel.com \
    --cc=yi.y.sun@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.