Re: [PATCH 1/3] vfio/type1: Support faulting PFNMAP vmas

From: Jason Gunthorpe <jgg@ziepe.ca>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	cohuck@redhat.com, peterx@redhat.com
Subject: Re: [PATCH 1/3] vfio/type1: Support faulting PFNMAP vmas
Date: Mon, 4 May 2020 12:02:48 -0300
Message-ID: <20200504150248.GW26002@ziepe.ca>
In-Reply-To: <20200504080630.293f33e8@x1.home>

On Mon, May 04, 2020 at 08:06:30AM -0600, Alex Williamson wrote:
> On Fri, 1 May 2020 20:50:33 -0300
> Jason Gunthorpe <jgg@ziepe.ca> wrote:
> 
> > On Fri, May 01, 2020 at 03:39:08PM -0600, Alex Williamson wrote:
> > > With conversion to follow_pfn(), DMA mapping a PFNMAP range depends on
> > > the range being faulted into the vma.  Add support to manually provide
> > > that, in the same way as done on KVM with hva_to_pfn_remapped().
> > > 
> > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > > ---
> > >  drivers/vfio/vfio_iommu_type1.c |   36 +++++++++++++++++++++++++++++++++---
> > >  1 file changed, 33 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > > index cc1d64765ce7..4a4cb7cd86b2 100644
> > > --- a/drivers/vfio/vfio_iommu_type1.c
> > > +++ b/drivers/vfio/vfio_iommu_type1.c
> > > @@ -317,6 +317,32 @@ static int put_pfn(unsigned long pfn, int prot)
> > >  	return 0;
> > >  }
> > >  
> > > +static int follow_fault_pfn(struct vm_area_struct *vma, struct mm_struct *mm,
> > > +			    unsigned long vaddr, unsigned long *pfn,
> > > +			    bool write_fault)
> > > +{
> > > +	int ret;
> > > +
> > > +	ret = follow_pfn(vma, vaddr, pfn);
> > > +	if (ret) {
> > > +		bool unlocked = false;
> > > +
> > > +		ret = fixup_user_fault(NULL, mm, vaddr,
> > > +				       FAULT_FLAG_REMOTE |
> > > +				       (write_fault ?  FAULT_FLAG_WRITE : 0),
> > > +				       &unlocked);
> > > +		if (unlocked)
> > > +			return -EAGAIN;
> > > +
> > > +		if (ret)
> > > +			return ret;
> > > +
> > > +		ret = follow_pfn(vma, vaddr, pfn);
> > > +	}
> > > +
> > > +	return ret;
> > > +}
> > > +
> > >  static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
> > >  			 int prot, unsigned long *pfn)
> > >  {
> > > @@ -339,12 +365,16 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
> > >  
> > >  	vaddr = untagged_addr(vaddr);
> > >  
> > > +retry:
> > >  	vma = find_vma_intersection(mm, vaddr, vaddr + 1);
> > >  
> > >  	if (vma && vma->vm_flags & VM_PFNMAP) {
> > > -		if (!follow_pfn(vma, vaddr, pfn) &&
> > > -		    is_invalid_reserved_pfn(*pfn))
> > > -			ret = 0;
> > > +		ret = follow_fault_pfn(vma, mm, vaddr, pfn, prot & IOMMU_WRITE);
> > > +		if (ret == -EAGAIN)
> > > +			goto retry;
> > > +
> > > +		if (!ret && !is_invalid_reserved_pfn(*pfn))
> > > +			ret = -EFAULT;  
> > 
> > I suggest checking vma->vm_ops == &vfio_pci_mmap_ops and adding a
> > comment that this is racy and needs to be fixed up. The ops check
> > restricts this path to other vfio BARs and should prevent some
> > abuses of this hacky thing.
> 
> We can't do that, vfio-pci is only one bus driver within the vfio
> ecosystem.

Given this flow is already hacky, maybe it is OK?
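
The ops check suggested above is just a pointer comparison on the vma's
ops table. Here is a rough userspace model of it; every toy_* name is
made up for illustration and none of this is kernel code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for vm_operations_struct and vm_area_struct. */
struct toy_vm_ops { const char *name; };
struct toy_vma {
	unsigned long flags;
	const struct toy_vm_ops *ops;
};

#define TOY_VM_PFNMAP 0x1UL

static const struct toy_vm_ops toy_vfio_pci_mmap_ops = { "vfio-pci" };
static const struct toy_vm_ops toy_other_ops = { "other" };

/* Only follow PFNMAP vmas that were created by the expected driver. */
static bool toy_may_follow_pfn(const struct toy_vma *vma)
{
	if (!vma || !(vma->flags & TOY_VM_PFNMAP))
		return false;

	return vma->ops == &toy_vfio_pci_mmap_ops;
}
```

A PFNMAP vma set up by any other driver fails the comparison, which is
both the point of the check and the limitation raised in the reply
quoted above.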

> > However, I wonder if this could just link itself into the
> > vma->private data so that when the vfio that owns the bar goes away,
> > so does the iommu mapping?
> 
> I don't really see why we wouldn't use mmu notifiers so that the vfio
> iommu backend and vfio bus driver remain independent.

mmu notifiers have tended to be complicated enough that if they can be
avoided it is usually better.

E.g. you can't just use mmu notifiers here; you have to use a whole
pinless page faulting scheme, with locking like hmm_range_fault
uses.
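
To give a feel for what that scheme involves, here is a rough userspace
model of the sequence-count retry loop. In the kernel the corresponding
calls are mmu_interval_read_begin() and mmu_interval_read_retry(); every
toy_* name below is a made-up stand-in, and the real retry check has to
run under the driver's own lock, which this sketch leaves out:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an interval notifier: each invalidation bumps seq. */
struct toy_notifier {
	unsigned long seq;
};

/* Toy "device page table" entry covered by the notifier. */
struct toy_mapping {
	unsigned long pfn;
	bool valid;
};

static unsigned long toy_read_begin(const struct toy_notifier *n)
{
	return n->seq;
}

static bool toy_read_retry(const struct toy_notifier *n, unsigned long seq)
{
	return n->seq != seq;
}

/* Invalidation: drop the mapping and force concurrent mappers to retry. */
static void toy_invalidate(struct toy_notifier *n, struct toy_mapping *m)
{
	n->seq++;
	m->valid = false;
}

/* Hypothetical lookup hook; it may race with an invalidation. */
typedef unsigned long (*toy_lookup_fn)(struct toy_notifier *n,
				       struct toy_mapping *m, void *arg);

/* Demo lookup: simulates *arg racing invalidations, then behaves. */
static unsigned long toy_racy_lookup(struct toy_notifier *n,
				     struct toy_mapping *m, void *arg)
{
	int *races_left = arg;

	if (*races_left > 0) {
		(*races_left)--;
		toy_invalidate(n, m);
	}
	return 0x1234;
}

/* Returns how many attempts it took to install a stable mapping. */
static int toy_map(struct toy_notifier *n, struct toy_mapping *m,
		   toy_lookup_fn lookup, void *arg)
{
	int attempts = 0;

	assert(n && m && lookup);

	for (;;) {
		unsigned long seq = toy_read_begin(n);
		unsigned long pfn = lookup(n, m, arg);

		attempts++;
		/* An invalidation ran since read_begin: pfn may be stale. */
		if (toy_read_retry(n, seq))
			continue;

		m->pfn = pfn;
		m->valid = true;
		return attempts;
	}
}
```

Nothing is pinned here. The mapping is only trusted because no
invalidation was observed between read_begin and the install, and that
is what makes the locking delicate.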

You also have to be very careful with locking around invalidation
of the iommu to avoid deadlock. For instance, the notifier invalidate
callback cannot do GFP_KERNEL memory allocations.
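
One common way to live with that constraint is to reserve whatever
bookkeeping the invalidation needs up front, in a context that is
allowed to allocate. This is an assumption added for illustration, not
something proposed in this thread. A rough userspace model, with
malloc() standing in for a GFP_KERNEL allocation and all toy_* names
made up:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy record describing a range queued for IOMMU unmap. */
struct toy_unmap_req {
	unsigned long iova;
	unsigned long size;
	struct toy_unmap_req *next;
};

struct toy_domain {
	struct toy_unmap_req *spare;	/* reserved at map time, unused */
	struct toy_unmap_req *pending;	/* queued by the invalidate path */
};

/* Map path: allowed to allocate, so reserve the record here. */
static int toy_map_prepare(struct toy_domain *d)
{
	struct toy_unmap_req *req = malloc(sizeof(*req));

	if (!req)
		return -1;
	req->next = d->spare;
	d->spare = req;
	return 0;
}

/* Invalidate path: never allocates, only consumes a reserved record. */
static bool toy_invalidate_range(struct toy_domain *d, unsigned long iova,
				 unsigned long size)
{
	struct toy_unmap_req *req = d->spare;

	if (!req)
		return false;
	d->spare = req->next;
	req->iova = iova;
	req->size = size;
	req->next = d->pending;
	d->pending = req;
	return true;
}

/* Later, in a safe context: process and free the queued records. */
static size_t toy_flush(struct toy_domain *d)
{
	size_t n = 0;

	while (d->pending) {
		struct toy_unmap_req *req = d->pending;

		d->pending = req->next;
		free(req);
		n++;
	}
	return n;
}
```

The invalidate path here can only fail if the map path forgot to
reserve a record. That is a bug to find in testing rather than an
allocation failure to handle under memory pressure.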

Jason