From: "Daniel P. Berrangé" <berrange@redhat.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Yi Liu <yi.l.liu@intel.com>,
	akrowiak@linux.ibm.com, jjherne@linux.ibm.com,
	chao.p.peng@intel.com, kvm@vger.kernel.org,
	Laine Stump <laine@redhat.com>,
	"libvir-list@redhat.com" <libvir-list@redhat.com>,
	jasowang@redhat.com, cohuck@redhat.com, thuth@redhat.com,
	peterx@redhat.com, qemu-devel@nongnu.org, pasic@linux.ibm.com,
	eric.auger@redhat.com, yi.y.sun@intel.com, nicolinc@nvidia.com,
	kevin.tian@intel.com, jgg@nvidia.com, eric.auger.pro@gmail.com,
	david@gibson.dropbear.id.au
Subject: Re: [RFC 00/18] vfio: Adopt iommufd
Date: Mon, 25 Apr 2022 11:10:14 +0100
Message-ID: <YmZzhohO81z1PVKS@redhat.com>
In-Reply-To: <20220422160943.6ff4f330.alex.williamson@redhat.com>

On Fri, Apr 22, 2022 at 04:09:43PM -0600, Alex Williamson wrote:
> [Cc +libvirt folks]
> 
> On Thu, 14 Apr 2022 03:46:52 -0700
> Yi Liu <yi.l.liu@intel.com> wrote:
> 
> > With the introduction of iommufd[1], the Linux kernel provides a generic
> > interface for userspace drivers to propagate their DMA mappings to the kernel
> > for assigned devices. This series ports the VFIO devices onto the
> > /dev/iommu uapi and lets them coexist with the legacy implementation.
> > Other devices such as vdpa and vfio mdev are not considered yet.

snip

> > The selection of the backend is made on a device basis using the new
> > iommufd option (on/off/auto). By default the iommufd backend is selected
> > if supported by the host and by QEMU (iommufd KConfig). This option is
> > currently available only for the vfio-pci device. For other types of
> > devices, it does not yet exist and the legacy BE is chosen by default.
> 
> I've discussed this a bit with Eric, but let me propose a different
> command line interface.  Libvirt generally likes to pass file
> descriptors to QEMU rather than grant it access to those files
> directly.  This was problematic with vfio-pci because libvirt can't
> easily know when QEMU will want to grab another /dev/vfio/vfio
> container.  Therefore we abandoned this approach and instead libvirt
> grants file permissions.
> 
> However, with iommufd there's no reason that QEMU ever needs more than
> a single instance of /dev/iommufd and we're using per device vfio file
> descriptors, so it seems like a good time to revisit this.

I assume access to '/dev/iommufd' gives the process somewhat elevated
privileges, such that you don't want to unconditionally give QEMU
access to this device?
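
For readers unfamiliar with the fd-passing model described above, here is a
minimal sketch of the idea, not of libvirt's or QEMU's actual code. The parent
opens a file and hands only the descriptor to a child process. Everything in
it is an assumption made for illustration: a temporary file stands in for the
device node, a Python child stands in for QEMU, and the function names are
hypothetical. It relies on POSIX descriptor inheritance.

```python
import os
import subprocess
import sys
import tempfile


def run_child_with_fd(path: str) -> str:
    """Open `path` in the parent and give the child only the descriptor."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # The child never learns the path; it receives a descriptor number.
        child_code = (
            "import os, sys\n"
            "fd = int(sys.argv[1])\n"
            "sys.stdout.write(os.read(fd, 64).decode())\n"
        )
        result = subprocess.run(
            [sys.executable, "-c", child_code, str(fd)],
            pass_fds=(fd,),  # keep this descriptor open in the child (POSIX)
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout
    finally:
        os.close(fd)


def demo() -> str:
    """Hypothetical stand-in: a temp file plays the role of the device node."""
    with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
        tmp.write("stand-in for a device node")
        path = tmp.name
    try:
        return run_child_with_fd(path)
    finally:
        os.unlink(path)
```

The point of the pattern is that the child needs no file permissions of its
own on the path; access is decided entirely by whoever opened it.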

> The interface I was considering would be to add an iommufd object to
> QEMU, so we might have a:
> 
> -device iommufd[,fd=#][,id=foo]
> 
> For non-libvirt usage this would have the ability to open /dev/iommufd
> itself if an fd is not provided.  This object could be shared with
> other iommufd users in the VM and maybe we'd allow multiple instances
> for more esoteric use cases.  [NB, maybe this should be a -object rather than
> -device since the iommufd is not a guest visible device?]

Yes, -object would be the right answer for something that's purely
a host-side backend impl selector.

> The vfio-pci device might then become:
> 
> -device vfio-pci[,host=DDDD:BB:DD.f][,sysfsdev=/sys/path/to/device][,fd=#][,iommufd=foo]
> 
> So essentially we can specify the device via host, sysfsdev, or passing
> an fd to the vfio device file.  When an iommufd object is specified,
> "foo" in the example above, each of those options would use the
> vfio-device access mechanism, essentially the same as iommufd=on in
> your example.  With the fd passing option, an iommufd object would be
> required and necessarily use device level access.
> 
> In your example, the iommufd=auto seems especially troublesome for
> libvirt because QEMU is going to have different locked memory
> requirements based on whether we're using type1 or iommufd, where the
> latter resolves the duplicate accounting issues.  libvirt needs to know
> deterministically which backend is being used, which this proposal seems
> to provide, while at the same time bringing us more in line with fd
> passing.  Thoughts?  Thanks,

Yep, I agree that libvirt needs to have more direct control over this.
This is even more important if there are notable feature differences
between the two backends.

I wonder if anyone has considered an even more distinct impl, whereby
we have a completely different device type on the backend, e.g.

  -device vfio-iommu-pci[,host=DDDD:BB:DD.f][,sysfsdev=/sys/path/to/device][,fd=#][,iommufd=foo]

If a vendor wants to fully remove the legacy impl, they can then use the
Kconfig mechanism to disable the build of the legacy impl device, while
keeping the iommu impl (or vice versa if the new iommu impl isn't considered
reliable enough for them to support yet).

Libvirt would use

   -object iommu,id=iommu0,fd=NNN
   -device vfio-iommu-pci,fd=MMM,iommu=iommu0

Non-libvirt would use a simpler

   -device vfio-iommu-pci,host=0000:03:22.1

with QEMU auto-creating an 'iommu' object in the background.
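
To make the two usages concrete, here is a rough sketch of how a management
layer might build the arguments. It is only an illustration: the
'vfio-iommu-pci' device and 'iommu' object are the names proposed in this
mail, not existing QEMU options, and the function name, parameters and
default id are hypothetical.

```python
from typing import List, Optional


def vfio_iommu_pci_args(
    host: Optional[str] = None,
    device_fd: Optional[int] = None,
    iommu_fd: Optional[int] = None,
    iommu_id: str = "iommu0",
) -> List[str]:
    """Build arguments for the proposed (not yet existing) device type."""
    if device_fd is not None:
        # fd passing: an explicit iommu object is required alongside the
        # device fd, so the backend in use is never left to auto-detection.
        if iommu_fd is None:
            raise ValueError("fd passing requires an iommu fd as well")
        return [
            "-object", f"iommu,id={iommu_id},fd={iommu_fd}",
            "-device", f"vfio-iommu-pci,fd={device_fd},iommu={iommu_id}",
        ]
    if host is not None:
        # Simple case: QEMU would create the iommu object itself.
        return ["-device", f"vfio-iommu-pci,host={host}"]
    raise ValueError("either host or device_fd must be given")
```

For example, vfio_iommu_pci_args(device_fd=31, iommu_fd=30) gives the
libvirt-style pair of options, while vfio_iommu_pci_args(host="0000:03:22.1")
gives the single -device form.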

This would fit into libvirt's existing modelling better. We currently have
a concept of a PCI assignment backend, which previously supported the
legacy PCI assignment, vs the VFIO PCI assignment. This new iommu impl
feels like a 3rd PCI assignment approach, and so fits with how we modelled
it as a different device type in the past.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

