From: Alex Williamson <alex.williamson@redhat.com>
To: Jason Gunthorpe <jgg@ziepe.ca>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	cohuck@redhat.com, peterx@redhat.com
Subject: Re: [PATCH 3/3] vfio-pci: Invalidate mmaps and block MMIO access on disabled memory
Date: Tue, 5 May 2020 11:12:27 -0600
Message-ID: <20200505111227.02ac9cee@w520.home>
In-Reply-To: <20200504200123.GA26002@ziepe.ca>

On Mon, 4 May 2020 17:01:23 -0300
Jason Gunthorpe <jgg@ziepe.ca> wrote:

> On Mon, May 04, 2020 at 01:35:52PM -0600, Alex Williamson wrote:
> 
> > Ok, this all makes a lot more sense with memory_lock still in the
> > picture.  And it looks like you're not insisting on the wait_event, we
> > can block on memory_lock so long as we don't have an ordering issue.
> > I'll see what I can do.  Thanks,  
> 
> Right, you can block on the rwsem if it is ordered properly vs
> mmap_sem.

This is what I've come up with; please see if you agree with the logic:

void vfio_pci_zap_and_down_write_memory_lock(struct vfio_pci_device *vdev)
{
        struct vfio_pci_mmap_vma *mmap_vma, *tmp;

        /*
         * Lock ordering:
         * vma_lock is nested under mmap_sem for vm_ops callback paths.
         * The memory_lock semaphore is used by both code paths calling
         * into this function to zap vmas and the vm_ops.fault callback
         * to protect the memory enable state of the device.
         *
         * When zapping vmas we need to maintain the mmap_sem => vma_lock
         * ordering, which requires using vma_lock to walk vma_list to
         * acquire an mm, then dropping vma_lock to get the mmap_sem and
         * reacquiring vma_lock.  This logic is derived from similar
         * requirements in uverbs_user_mmap_disassociate().
         *
         * mmap_sem must always be the top-level lock when it is taken.
         * Therefore we can only hold the memory_lock write lock when
         * vma_list is empty, as we'd need to take mmap_sem to clear
         * entries.  vma_list can only be guaranteed empty when holding
         * vma_lock, thus memory_lock is nested under vma_lock.
         *
         * This enables the vm_ops.fault callback to acquire vma_lock,
         * followed by memory_lock read lock, while already holding
         * mmap_sem without risk of deadlock.
         */
        while (1) {
                struct mm_struct *mm = NULL;

                mutex_lock(&vdev->vma_lock);
                while (!list_empty(&vdev->vma_list)) {
                        mmap_vma = list_first_entry(&vdev->vma_list,
                                                    struct vfio_pci_mmap_vma,
                                                    vma_next);
                        mm = mmap_vma->vma->vm_mm;
                        if (mmget_not_zero(mm))
                                break;

                        list_del(&mmap_vma->vma_next);
                        kfree(mmap_vma);
                        mm = NULL;
                }

                if (!mm)
                        break;
                mutex_unlock(&vdev->vma_lock);

                down_read(&mm->mmap_sem);
                if (mmget_still_valid(mm)) {
                        mutex_lock(&vdev->vma_lock);
                        list_for_each_entry_safe(mmap_vma, tmp,
                                                 &vdev->vma_list, vma_next) {
                                struct vm_area_struct *vma = mmap_vma->vma;

                                if (vma->vm_mm != mm)
                                        continue;

                                list_del(&mmap_vma->vma_next);
                                kfree(mmap_vma);

                                zap_vma_ptes(vma, vma->vm_start,
                                             vma->vm_end - vma->vm_start);
                        }
                        mutex_unlock(&vdev->vma_lock);
                }
                up_read(&mm->mmap_sem);
                mmput(mm);
        }

        down_write(&vdev->memory_lock);
        mutex_unlock(&vdev->vma_lock);
}

As noted in the comment, the fault handler can simply do:

mutex_lock(&vdev->vma_lock);
down_read(&vdev->memory_lock);

This should be deadlock-free now, so we can drop the retry handling.
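Spelled out a bit more, the full fault path might then look roughly like
the sketch below.  __vfio_pci_memory_enabled() and __vfio_pci_add_vma()
are only placeholder names here for the memory enable state check and
the vma tracking added earlier in the series; mmap_sem is already held
by the fault path, so this ordering is exactly the one described in the
comment above:

static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
{
        struct vm_area_struct *vma = vmf->vma;
        struct vfio_pci_device *vdev = vma->vm_private_data;
        vm_fault_t ret = VM_FAULT_NOPAGE;

        mutex_lock(&vdev->vma_lock);
        down_read(&vdev->memory_lock);

        /* Placeholder: test the virtualized memory enable state */
        if (!__vfio_pci_memory_enabled(vdev)) {
                ret = VM_FAULT_SIGBUS;
                goto out_unlock;
        }

        /* Placeholder: record this vma on vma_list for later zapping */
        if (__vfio_pci_add_vma(vdev, vma)) {
                ret = VM_FAULT_OOM;
                goto out_unlock;
        }

        if (remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
                            vma->vm_end - vma->vm_start,
                            vma->vm_page_prot))
                ret = VM_FAULT_SIGBUS;

out_unlock:
        up_read(&vdev->memory_lock);
        mutex_unlock(&vdev->vma_lock);

        return ret;
}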

Paths that need to acquire memory_lock with vmas zapped (device reset,
memory bit *->0 transition) call this function, perform their
operation, then simply release with up_write(&vdev->memory_lock).  Both
the read and write versions of acquiring memory_lock can still occur
outside this function for operations that don't require flushing all
vmas or otherwise touching vma_lock or mmap_sem (e.g. read/write, MSI-X
vector table access, writing *->1 to the memory enable bit).
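For example, a memory enable bit 1->0 write in config space would then
follow a pattern roughly like (error handling elided):

        vfio_pci_zap_and_down_write_memory_lock(vdev);

        /* ... emulate the config write with user MMIO access blocked ... */

        up_write(&vdev->memory_lock);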

I still need to work on the bus reset path, as acquiring memory_lock
write locks across multiple devices seems to require try-lock behavior,
which is clearly complicated, or at least messy, in the above function.
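As a very rough, untested sketch (the devs array naming just follows
the existing bus reset walk code as an assumption, and the per-device
vma zapping step is elided), the multi-device case might look
something like:

        /*
         * Hypothetical sketch only: take memory_lock for write on every
         * affected device, unwinding if any trylock fails.  Each device
         * would also need its vmas zapped first, which is the messy part.
         */
        for (i = 0; i < devs.cur_index; i++) {
                struct vfio_pci_device *tmp =
                                vfio_device_data(devs.devices[i]);

                if (!down_write_trylock(&tmp->memory_lock)) {
                        /* Unwind the locks already taken and bail/retry */
                        while (i--) {
                                tmp = vfio_device_data(devs.devices[i]);
                                up_write(&tmp->memory_lock);
                        }
                        ret = -EBUSY;
                        break;
                }
        }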

Does this seem like it's going in a reasonable direction?  Thanks,

Alex

