From: Alex Williamson <alex.williamson@redhat.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: <cohuck@redhat.com>, <kvm@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>, <peterx@redhat.com>
Subject: Re: [RFC PATCH 10/10] vfio/type1: Register device notifier
Date: Wed, 24 Feb 2021 14:55:08 -0700
Message-ID: <20210224145508.1f0edb06@omen.home.shazbot.org>
In-Reply-To: <20210222175523.GQ4247@nvidia.com>

On Mon, 22 Feb 2021 13:55:23 -0400
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Mon, Feb 22, 2021 at 09:52:32AM -0700, Alex Williamson wrote:
> > Introduce a new default strict MMIO mapping mode where the vma for
> > a VM_PFNMAP mapping must be backed by a vfio device.  This allows
> > holding a reference to the device and registering a notifier for the
> > device, which additionally keeps the device in an IOMMU context for
> > the extent of the DMA mapping.  On notification of device release,
> > automatically drop the DMA mappings for it.
> > 
> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> >  drivers/vfio/vfio_iommu_type1.c |  124 +++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 123 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > index b34ee4b96a4a..2a16257bd5b6 100644
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -61,6 +61,11 @@ module_param_named(dma_entry_limit, dma_entry_limit, uint, 0644);
> >  MODULE_PARM_DESC(dma_entry_limit,
> >  		 "Maximum number of user DMA mappings per container (65535).");
> >  
> > +static bool strict_mmio_maps = true;
> > +module_param_named(strict_mmio_maps, strict_mmio_maps, bool, 0644);
> > +MODULE_PARM_DESC(strict_mmio_maps,
> > +		 "Restrict to safe DMA mappings of device memory (true).");  
> 
> I think this should be a kconfig; historically we've required kconfig
> to opt in to unsafe things that could violate kernel security. Someone
> building a secure boot trusted kernel system should not have an
> option for userspace to just turn off protections.

It could certainly be further protected, such that this option doesn't
exist at all depending on a Kconfig, but I think we're already risking
breaking some existing users and I'd rather allow it via an opt-in (like
we already do for lack of interrupt isolation), possibly even with a
kernel taint when used, if necessary.
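
Something like the below is roughly what I had in mind, i.e. keep the
module option but let a Kconfig pick the default and taint the kernel if
it's ever turned off.  Entirely untested, and the Kconfig symbol is just
a placeholder:

/* CONFIG_VFIO_STRICT_MMIO_MAPS is a made-up symbol for illustration */
static bool strict_mmio_maps = IS_ENABLED(CONFIG_VFIO_STRICT_MMIO_MAPS);

static int strict_mmio_maps_set(const char *val, const struct kernel_param *kp)
{
	int ret = param_set_bool(val, kp);

	/* Opting out of strict mode is a user-visible reduction in safety */
	if (!ret && !strict_mmio_maps)
		add_taint(TAINT_USER, LOCKDEP_STILL_OK);
	return ret;
}

static const struct kernel_param_ops strict_mmio_maps_ops = {
	.set = strict_mmio_maps_set,
	.get = param_get_bool,
};
module_param_cb(strict_mmio_maps, &strict_mmio_maps_ops,
		&strict_mmio_maps, 0644);
MODULE_PARM_DESC(strict_mmio_maps,
		 "Restrict to safe DMA mappings of device memory (true).");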

> > +/* Req separate object for async removal from notifier vs dropping vfio_dma */
> > +struct pfnmap_obj {
> > +	struct notifier_block	nb;
> > +	struct work_struct	work;
> > +	struct vfio_iommu	*iommu;
> > +	struct vfio_device	*device;
> > +};  
> 
> So this is basically the dmabuf; I think it would be simple enough to
> go in here and change it down the road if someone had interest.
> 
> > +static void unregister_device_bg(struct work_struct *work)
> > +{
> > +	struct pfnmap_obj *pfnmap = container_of(work, struct pfnmap_obj, work);
> > +
> > +	vfio_device_unregister_notifier(pfnmap->device, &pfnmap->nb);
> > +	vfio_device_put(pfnmap->device);  
> 
> The device_put keeps the device from becoming unregistered, but what
> happens during the hot reset case? Is this what the cover letter
> was talking about? CPU access is revoked but P2P is still possible?

Yes, this only addresses cutting off DMA mappings to a released device
where we can safely drop the DMA mapping entirely.  I think we can
consider complete removal of the mapping object safe in this case
because the user has no ongoing access to the device and after
re-opening the device it would be reasonable to expect mappings would
need to be re-established.

Doing the same around disabling device memory or a reset has a much
greater potential to break userspace.  In some of these cases QEMU will
do the right thing, but reset with dependent devices gets into
scenarios that I can't be sure about.  Other userspace drivers also
exist and I can't verify them all.  So ideally we'd want to temporarily
remove the IOMMU mapping while leaving the vfio mapping object in place,
and restore the mapping on the other side.  But I don't think the IOMMU
API currently gives us a way to do that other than a full unmap and
later remap.  So we might need to support a zombie mode for the mapping
object, or enhance the IOMMU API so we could leave the iotlb entry in
place but drop r+w permissions.
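
With only the current API, the suspend side would look something like
the below (sketch only; the "suspended" flag is hypothetical and I'm
ignoring pinned pages), plus a full re-map pass on resume, which is a
lot of churn for what's conceptually just a permission change:

	struct vfio_domain *d;

	/*
	 * Revoke DMA access but keep the vfio_dma object around as a
	 * placeholder until the device comes back.
	 */
	list_for_each_entry(d, &iommu->domain_list, next)
		iommu_unmap(d->domain, dma->iova, dma->size);
	dma->suspended = true;	/* hypothetical flag */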

At this point we're also just trying to define which error we get,
perhaps an unsupported request if we do nothing or an IOMMU fault if we
invalidate or suspend the mapping.  It's not guaranteed that one of
these has better behavior on the system than the other.
 
> > +static int vfio_device_nb_cb(struct notifier_block *nb,
> > +			     unsigned long action, void *unused)
> > +{
> > +	struct pfnmap_obj *pfnmap = container_of(nb, struct pfnmap_obj, nb);
> > +
> > +	switch (action) {
> > +	case VFIO_DEVICE_RELEASE:
> > +	{
> > +		struct vfio_dma *dma, *dma_last = NULL;
> > +		int retries = 0;
> > +again:
> > +		mutex_lock(&pfnmap->iommu->lock);
> > +		dma = pfnmap_find_dma(pfnmap);  
> 
> Feels a bit strange that the vfio_dma isn't linked to the pfnmap_obj
> instead of searching the entire list?

It does.  I've been paranoid about whether we can trust that the
vfio_dma is still valid in all cases.  I had myself convinced that if
our notifier actions expand we could get another callback before our
workqueue removes the notifier, but that might be more simply handled
by having a vfio_dma pointer that gets cleared once we remove the
vfio_dma.  I'll take another look.
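
I.e. something roughly like this (sketch only, ignoring the unmap retry
logic in the current callback):

struct pfnmap_obj {
	struct notifier_block	nb;
	struct work_struct	work;
	struct vfio_iommu	*iommu;
	struct vfio_device	*device;
	struct vfio_dma		*dma;	/* valid only under iommu->lock */
};

with the release notifier then doing:

	mutex_lock(&pfnmap->iommu->lock);
	if (pfnmap->dma)
		vfio_remove_dma(pfnmap->iommu, pfnmap->dma);
	mutex_unlock(&pfnmap->iommu->lock);

where whatever removes the vfio_dma clears pfnmap->dma under the lock.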


> > @@ -549,8 +625,48 @@ static int vaddr_get_pfn(struct vfio_iommu *iommu, struct vfio_dma *dma,
> >  		if (ret == -EAGAIN)
> >  			goto retry;  
> 
> I'd prefer this was written a bit differently; I would like it very
> much if this didn't mis-use follow_pte() by returning the pfn outside
> the lock.
> 
> vaddr_get_bar_pfn(..)
> {
>         vma = find_vma_intersection(mm, vaddr, vaddr + 1);
> 	if (!vma)
>            return -ENOENT;
>         if ((prot & IOMMU_WRITE) && !(vma->vm_flags & VM_WRITE)) // Check me
>            return -EFAULT;
>         device = vfio_device_get_from_vma(vma);
> 	if (!device)
>            return -ENOENT;
> 
> 	/*
>          * Now do the same as vfio_pci_mmap_fault() - the vm_pgoff must
> 	 * be the physical pfn when using this mechanism. Delete follow_pte() entirely.
>          */
>         pfn = (vaddr - vma->vm_start)/PAGE_SIZE + vma->vm_pgoff
> 	
>         /* de-dup device and record that we are using device's pages in the
> 	   pfnmap */
>         ...
> }


This seems to undo both:

5cbf3264bc71 ("vfio/type1: Fix VA->PA translation for PFNMAP VMAs in vaddr_get_pfn()")

(which also suggests we are going to break users without the module
option opt-in above)

And:

41311242221e ("vfio/type1: Support faulting PFNMAP vmas")

So we'd have an alternate path in the unsafe mode and we'd lose the
ability to fault in mappings.
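
For reference, what we'd be giving up is roughly this from the current
vaddr_get_pfn() path (simplified):

	vma = find_vma_intersection(mm, vaddr, vaddr + 1);
	if (vma && vma->vm_flags & VM_PFNMAP) {
		/*
		 * follow_fault_pfn() reads the pfn from the page table and
		 * uses fixup_user_fault() to fault the page in if needed,
		 * rather than assuming vm_pgoff is the base pfn of a
		 * linear mapping.
		 */
		ret = follow_fault_pfn(vma, mm, vaddr, pfn, prot & IOMMU_WRITE);
		if (ret == -EAGAIN)
			goto retry;
	}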

> This would be significantly better if it could do whole ranges instead
> of page at a time.

There are some patches I just put in from Daniel Jordan that use
batching.  Thanks,

Alex

