From: Alex Williamson <alex.williamson@redhat.com>
To: Jason Gunthorpe <jgg@ziepe.ca>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
cohuck@redhat.com, peterx@redhat.com
Subject: Re: [PATCH 3/3] vfio-pci: Invalidate mmaps and block MMIO access on disabled memory
Date: Tue, 5 May 2020 11:12:27 -0600 [thread overview]
Message-ID: <20200505111227.02ac9cee@w520.home> (raw)
In-Reply-To: <20200504200123.GA26002@ziepe.ca>
On Mon, 4 May 2020 17:01:23 -0300
Jason Gunthorpe <jgg@ziepe.ca> wrote:
> On Mon, May 04, 2020 at 01:35:52PM -0600, Alex Williamson wrote:
>
> > Ok, this all makes a lot more sense with memory_lock still in the
> > picture. And it looks like you're not insisting on the wait_event, we
> > can block on memory_lock so long as we don't have an ordering issue.
> > I'll see what I can do. Thanks,
>
> Right, you can block on the rwsem if it is ordered properly vs
> mmap_sem.
This is what I've come up with, please see if you agree with the logic:
void vfio_pci_zap_and_down_write_memory_lock(struct vfio_pci_device *vdev)
{
struct vfio_pci_mmap_vma *mmap_vma, *tmp;
/*
* Lock ordering:
* vma_lock is nested under mmap_sem for vm_ops callback paths.
* The memory_lock semaphore is used by both code paths calling
* into this function to zap vmas and the vm_ops.fault callback
* to protect the memory enable state of the device.
*
* When zapping vmas we need to maintain the mmap_sem => vma_lock
* ordering, which requires using vma_lock to walk vma_list to
* acquire an mm, then dropping vma_lock to get the mmap_sem and
* reacquiring vma_lock. This logic is derived from similar
* requirements in uverbs_user_mmap_disassociate().
*
* mmap_sem must always be the top-level lock when it is taken.
* Therefore we can only hold the memory_lock write lock when
* vma_list is empty, as we'd need to take mmap_sem to clear
* entries. vma_list can only be guaranteed empty when holding
* vma_lock, thus memory_lock is nested under vma_lock.
*
* This enables the vm_ops.fault callback to acquire vma_lock,
* followed by memory_lock read lock, while already holding
* mmap_sem without risk of deadlock.
*/
while (1) {
struct mm_struct *mm = NULL;
mutex_lock(&vdev->vma_lock);
while (!list_empty(&vdev->vma_list)) {
mmap_vma = list_first_entry(&vdev->vma_list,
struct vfio_pci_mmap_vma,
vma_next);
mm = mmap_vma->vma->vm_mm;
if (mmget_not_zero(mm))
break;
list_del(&mmap_vma->vma_next);
kfree(mmap_vma);
mm = NULL;
}
if (!mm)
break;
mutex_unlock(&vdev->vma_lock);
down_read(&mm->mmap_sem);
if (mmget_still_valid(mm)) {
mutex_lock(&vdev->vma_lock);
list_for_each_entry_safe(mmap_vma, tmp,
&vdev->vma_list, vma_next) {
struct vm_area_struct *vma = mmap_vma->vma;
if (vma->vm_mm != mm)
continue;
list_del(&mmap_vma->vma_next);
kfree(mmap_vma);
zap_vma_ptes(vma, vma->vm_start,
vma->vm_end - vma->vm_start);
}
mutex_unlock(&vdev->vma_lock);
}
up_read(&mm->mmap_sem);
mmput(mm);
}
down_write(&vdev->memory_lock);
mutex_unlock(&vdev->vma_lock);
}
As noted in the comment, the fault handler can simply do:
mutex_lock(&vdev->vma_lock);
down_read(&vdev->memory_lock);
This should be deadlock free now, so we can drop the retry handling
Paths needing to acquire memory_lock with vmas zapped (device reset,
memory bit *->0 transition) call this function, perform their
operation, then simply release with up_write(&vdev->memory_lock). Both
the read and write version of acquiring memory_lock can still occur
outside this function for operations that don't require flushing all
vmas or otherwise touch vma_lock or mmap_sem (ex. read/write, MSI-X
vector table access, writing *->1 to memory enable bit).
I still need to work on the bus reset path as acquiring memory_lock
write locks across multiple devices seems like it requires try-lock
behavior, which is clearly complicated, or at least messy in the above
function.
Does this seem like it's going in a reasonable direction? Thanks,
Alex
next prev parent reply other threads:[~2020-05-05 17:12 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-05-01 21:38 [PATCH 0/3] vfio-pci: Block user access to disabled device MMIO Alex Williamson
2020-05-01 21:39 ` [PATCH 1/3] vfio/type1: Support faulting PFNMAP vmas Alex Williamson
2020-05-01 23:50 ` Jason Gunthorpe
2020-05-04 14:06 ` Alex Williamson
2020-05-04 15:02 ` Jason Gunthorpe
2020-05-01 21:39 ` [PATCH 2/3] vfio-pci: Fault mmaps to enable vma tracking Alex Williamson
2020-05-01 23:25 ` Jason Gunthorpe
2020-05-04 14:20 ` Alex Williamson
2020-05-04 15:05 ` Jason Gunthorpe
2020-05-04 15:53 ` Alex Williamson
2020-05-01 21:39 ` [PATCH 3/3] vfio-pci: Invalidate mmaps and block MMIO access on disabled memory Alex Williamson
2020-05-01 23:48 ` Jason Gunthorpe
2020-05-04 18:26 ` Alex Williamson
2020-05-04 18:44 ` Jason Gunthorpe
2020-05-04 19:35 ` Alex Williamson
2020-05-04 20:01 ` Jason Gunthorpe
2020-05-05 17:12 ` Alex Williamson [this message]
2020-05-05 18:36 ` Jason Gunthorpe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200505111227.02ac9cee@w520.home \
--to=alex.williamson@redhat.com \
--cc=cohuck@redhat.com \
--cc=jgg@ziepe.ca \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=peterx@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).