All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@nvidia.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Yi Liu <yi.l.liu@intel.com>, "Tian, Kevin" <kevin.tian@intel.com>,
	"Peng, Chao P" <chao.p.peng@intel.com>,
	"Sun, Yi Y" <yi.y.sun@intel.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"david@gibson.dropbear.id.au" <david@gibson.dropbear.id.au>,
	"thuth@redhat.com" <thuth@redhat.com>,
	"farman@linux.ibm.com" <farman@linux.ibm.com>,
	"mjrosato@linux.ibm.com" <mjrosato@linux.ibm.com>,
	"akrowiak@linux.ibm.com" <akrowiak@linux.ibm.com>,
	"pasic@linux.ibm.com" <pasic@linux.ibm.com>,
	"jjherne@linux.ibm.com" <jjherne@linux.ibm.com>,
	"jasowang@redhat.com" <jasowang@redhat.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"nicolinc@nvidia.com" <nicolinc@nvidia.com>,
	"eric.auger@redhat.com" <eric.auger@redhat.com>,
	"eric.auger.pro@gmail.com" <eric.auger.pro@gmail.com>,
	"peterx@redhat.com" <peterx@redhat.com>
Subject: Re: [RFC 15/18] vfio/iommufd: Implement iommufd backend
Date: Tue, 26 Apr 2022 16:27:03 -0300	[thread overview]
Message-ID: <20220426192703.GS2125828@nvidia.com> (raw)
In-Reply-To: <20220426124541.5f33f357.alex.williamson@redhat.com>

On Tue, Apr 26, 2022 at 12:45:41PM -0600, Alex Williamson wrote:
> On Tue, 26 Apr 2022 11:11:56 -0300
> Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
> > On Tue, Apr 26, 2022 at 10:08:30PM +0800, Yi Liu wrote:
> > 
> > > > I think it is strange that the allowed DMA a guest can do depends on
> > > > the order how devices are plugged into the guest, and varys from
> > > > device to device?
> > > > 
> > > > IMHO it would be nicer if qemu would be able to read the new reserved
> > > > regions and unmap the conflicts before hot plugging the new device. We
> > > > don't have a kernel API to do this, maybe we should have one?  
> > > 
> > > For userspace drivers, it is fine to do it. For QEMU, it's not quite easy
> > > since the IOVA is GPA which is determined per the e820 table.  
> > 
> > Sure, that is why I said we may need a new API to get this data back
> > so userspace can fix the address map before attempting to attach the
> > new device. Currently that is not possible at all, the device attach
> > fails and userspace has no way to learn what addresses are causing
> > problems.
> 
> We have APIs to get the IOVA ranges, both with legacy vfio and the
> iommufd RFC, QEMU could compare these, but deciding to remove an
> existing mapping is not something to be done lightly. 

Not quite, you can get the IOVA ranges after you attach the device,
but device attach will fail if the new range restrictions intersect
with the existing mappings. So we don't have an easy way to learn the
new range restriction in a way that lets userspace ensure an attach
will not fail due to reserved ranged overlapping with mappings.

The best you could do is make a dummy IOAS then attach the device,
read the mappings, detatch, and then do your unmaps.

I'm imagining something like IOMMUFD_DEVICE_GET_RANGES that can be
called prior to attaching on the device ID.

> We must be absolutely certain that there is no DMA to that range
> before doing so.

Yes, but at the same time if the VM thinks it can DMA to that memory
then it is quite likely to DMA to it with the new device that doesn't
have it mapped in the first place.

It is also a bit odd that the behavior depends on the order the
devices are installed as if you plug the narrower device first then
the next device will happily use the narrower ranges, but viceversa
will get a different result.

This is why I find it bit strange that qemu doesn't check the
ranges. eg I would expect that anything declared as memory in the E820
map has to be mappable to the iommu_domain or the device should not
attach at all.

The P2P is a bit trickier, and I know we don't have a good story
because we lack ACPI description, but I would have expected the same
kind of thing. Anything P2Pable should be in the iommu_domain or the
device should not attach. As with system memory there are only certain
parts of the E820 map that an OS would use for P2P.

(ideally ACPI would indicate exactly what combinations of devices are
P2Pable and then qemu would use that drive the mandatory address
ranges in the IOAS)

> > > yeah. qemu can filter the P2P BAR mapping and just stop it in qemu. We
> > > haven't added it as it is something you will add in future. so didn't
> > > add it in this RFC. :-) Please let me know if it feels better to filter
> > > it from today.  
> > 
> > I currently hope it will use a different map API entirely and not rely
> > on discovering the P2P via the VMA. eg using a DMABUF FD or something.
> > 
> > So blocking it in qemu feels like the right thing to do.
> 
> Wait a sec, so legacy vfio supports p2p between devices, which has a
> least a couple known use cases, primarily involving GPUs for at least
> one of the peers, and we're not going to make equivalent support a
> feature requirement for iommufd?  

I said "different map API" - something like IOMMU_FD_MAP_DMABUF
perhaps.

The trouble with taking in a user pointer to MMIO memory is that it
becomes quite annoying to go from a VMA back to the actual owner
object so we can establish proper refcounting and lifetime of struct-page-less
memory. Requiring userspace to make that connection via a FD
simplifies and generalizes this.

So, qemu would say 'oh this memory is exported by VFIO, I will do
VFIO_EXPORT_DMA_BUF, then do IOMMU_FD_MAP_DMABUF, then close the FD'

For vfio_compat we'd have to build some hacky compat approach to
discover the dmabuf for vfio-pci from the VMA.

But if qemu is going this way with a new implementation I would prefer
the new implementation use the new way, when we decide what it should
be.

As I mentioned before I would like to use DMABUF since I already have
a use-case to expose DMABUF from vfio-pci to connect to RDMA. I will
post the vfio DMABUF patch I have already.

Jason

  reply	other threads:[~2022-04-26 19:27 UTC|newest]

Thread overview: 125+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-14 10:46 [RFC 00/18] vfio: Adopt iommufd Yi Liu
2022-04-14 10:46 ` Yi Liu
2022-04-14 10:46 ` [RFC 01/18] scripts/update-linux-headers: Add iommufd.h Yi Liu
2022-04-14 10:46   ` Yi Liu
2022-04-14 10:46 ` [RFC 02/18] linux-headers: Import latest vfio.h and iommufd.h Yi Liu
2022-04-14 10:46   ` Yi Liu
2022-04-14 10:46 ` [RFC 03/18] hw/vfio/pci: fix vfio_pci_hot_reset_result trace point Yi Liu
2022-04-14 10:46   ` Yi Liu
2022-04-14 10:46 ` [RFC 04/18] vfio/pci: Use vbasedev local variable in vfio_realize() Yi Liu
2022-04-14 10:46   ` Yi Liu
2022-04-14 10:46 ` [RFC 05/18] vfio/common: Rename VFIOGuestIOMMU::iommu into ::iommu_mr Yi Liu
2022-04-14 10:46   ` Yi Liu
2022-04-14 10:46 ` [RFC 06/18] vfio/common: Split common.c into common.c, container.c and as.c Yi Liu
2022-04-14 10:46 ` [RFC 07/18] vfio: Add base object for VFIOContainer Yi Liu
2022-04-14 10:46   ` Yi Liu
2022-04-29  6:29   ` David Gibson
2022-04-29  6:29     ` David Gibson
2022-05-03 13:05     ` Yi Liu
2022-04-14 10:47 ` [RFC 08/18] vfio/container: Introduce vfio_[attach/detach]_device Yi Liu
2022-04-14 10:47   ` Yi Liu
2022-04-14 10:47 ` [RFC 09/18] vfio/platform: Use vfio_[attach/detach]_device Yi Liu
2022-04-14 10:47   ` Yi Liu
2022-04-14 10:47 ` [RFC 10/18] vfio/ap: " Yi Liu
2022-04-14 10:47   ` Yi Liu
2022-04-14 10:47 ` [RFC 11/18] vfio/ccw: " Yi Liu
2022-04-14 10:47   ` Yi Liu
2022-04-14 10:47 ` [RFC 12/18] vfio/container-obj: Introduce [attach/detach]_device container callbacks Yi Liu
2022-04-14 10:47   ` Yi Liu
2022-04-14 10:47 ` [RFC 13/18] vfio/container-obj: Introduce VFIOContainer reset callback Yi Liu
2022-04-14 10:47   ` Yi Liu
2022-04-14 10:47 ` [RFC 14/18] hw/iommufd: Creation Yi Liu
2022-04-14 10:47   ` Yi Liu
2022-04-14 10:47 ` [RFC 15/18] vfio/iommufd: Implement iommufd backend Yi Liu
2022-04-14 10:47   ` Yi Liu
2022-04-22 14:58   ` Jason Gunthorpe
2022-04-22 21:33     ` Alex Williamson
2022-04-22 21:33       ` Alex Williamson
2022-04-26  9:55     ` Yi Liu
2022-04-26  9:55       ` Yi Liu
2022-04-26 10:41       ` Tian, Kevin
2022-04-26 10:41         ` Tian, Kevin
2022-04-26 13:41         ` Jason Gunthorpe
2022-04-26 14:08           ` Yi Liu
2022-04-26 14:08             ` Yi Liu
2022-04-26 14:11             ` Jason Gunthorpe
2022-04-26 18:45               ` Alex Williamson
2022-04-26 18:45                 ` Alex Williamson
2022-04-26 19:27                 ` Jason Gunthorpe [this message]
2022-04-26 20:59                   ` Alex Williamson
2022-04-26 20:59                     ` Alex Williamson
2022-04-26 23:08                     ` Jason Gunthorpe
2022-04-26 13:53       ` Jason Gunthorpe
2022-04-14 10:47 ` [RFC 16/18] vfio/iommufd: Add IOAS_COPY_DMA support Yi Liu
2022-04-14 10:47   ` Yi Liu
2022-04-14 10:47 ` [RFC 17/18] vfio/as: Allow the selection of a given iommu backend Yi Liu
2022-04-14 10:47   ` Yi Liu
2022-04-14 10:47 ` [RFC 18/18] vfio/pci: Add an iommufd option Yi Liu
2022-04-14 10:47   ` Yi Liu
2022-04-15  8:37 ` [RFC 00/18] vfio: Adopt iommufd Nicolin Chen
2022-04-17 10:30   ` Eric Auger
2022-04-17 10:30     ` Eric Auger
2022-04-19  3:26     ` Nicolin Chen
2022-04-25 19:40       ` Eric Auger
2022-04-25 19:40         ` Eric Auger
2022-04-18  8:49 ` Tian, Kevin
2022-04-18  8:49   ` Tian, Kevin
2022-04-18 12:09   ` Yi Liu
2022-04-18 12:09     ` Yi Liu
2022-04-25 19:51     ` Eric Auger
2022-04-25 19:51       ` Eric Auger
2022-04-25 19:55   ` Eric Auger
2022-04-25 19:55     ` Eric Auger
2022-04-26  8:39     ` Tian, Kevin
2022-04-26  8:39       ` Tian, Kevin
2022-04-22 22:09 ` Alex Williamson
2022-04-22 22:09   ` Alex Williamson
2022-04-25 10:10   ` Daniel P. Berrangé
2022-04-25 10:10     ` Daniel P. Berrangé
2022-04-25 13:36     ` Jason Gunthorpe
2022-04-25 14:37     ` Alex Williamson
2022-04-25 14:37       ` Alex Williamson
2022-04-26  8:37       ` Tian, Kevin
2022-04-26  8:37         ` Tian, Kevin
2022-04-26 12:33         ` Jason Gunthorpe
2022-04-26 16:21         ` Alex Williamson
2022-04-26 16:21           ` Alex Williamson
2022-04-26 16:42           ` Jason Gunthorpe
2022-04-26 19:24             ` Alex Williamson
2022-04-26 19:24               ` Alex Williamson
2022-04-26 19:36               ` Jason Gunthorpe
2022-04-28  3:21           ` Tian, Kevin
2022-04-28  3:21             ` Tian, Kevin
2022-04-28 14:24             ` Alex Williamson
2022-04-28 14:24               ` Alex Williamson
2022-04-28 16:20               ` Daniel P. Berrangé
2022-04-28 16:20                 ` Daniel P. Berrangé
2022-04-29  0:45                 ` Tian, Kevin
2022-04-29  0:45                   ` Tian, Kevin
2022-04-25 20:23   ` Eric Auger
2022-04-25 20:23     ` Eric Auger
2022-04-25 22:53     ` Alex Williamson
2022-04-25 22:53       ` Alex Williamson
2022-04-26  9:47 ` Shameerali Kolothum Thodi via
2022-04-26  9:47   ` Shameerali Kolothum Thodi
2022-04-26 11:44   ` Eric Auger
2022-04-26 11:44     ` Eric Auger
2022-04-26 12:43     ` Shameerali Kolothum Thodi
2022-04-26 12:43       ` Shameerali Kolothum Thodi via
2022-04-26 16:35       ` Alex Williamson
2022-04-26 16:35         ` Alex Williamson
2022-05-09 14:24         ` Zhangfei Gao
2022-05-10  3:17           ` Yi Liu
2022-05-10  6:51             ` Eric Auger
2022-05-10 12:35               ` Zhangfei Gao
2022-05-10 12:45                 ` Jason Gunthorpe
2022-05-10 14:08                   ` Yi Liu
2022-05-11 14:17                     ` zhangfei.gao
2022-05-12  9:01                       ` zhangfei.gao
2022-05-17  8:55                         ` Yi Liu
2022-05-18  7:22                           ` zhangfei.gao
2022-05-18 14:00                             ` Yi Liu
2022-06-28  8:14                               ` Shameerali Kolothum Thodi
2022-06-28  8:14                                 ` Shameerali Kolothum Thodi via
2022-06-28  8:58                                 ` Eric Auger
2022-05-17  8:52                       ` Yi Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220426192703.GS2125828@nvidia.com \
    --to=jgg@nvidia.com \
    --cc=akrowiak@linux.ibm.com \
    --cc=alex.williamson@redhat.com \
    --cc=chao.p.peng@intel.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=eric.auger.pro@gmail.com \
    --cc=eric.auger@redhat.com \
    --cc=farman@linux.ibm.com \
    --cc=jasowang@redhat.com \
    --cc=jjherne@linux.ibm.com \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=mjrosato@linux.ibm.com \
    --cc=nicolinc@nvidia.com \
    --cc=pasic@linux.ibm.com \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=thuth@redhat.com \
    --cc=yi.l.liu@intel.com \
    --cc=yi.y.sun@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.