From: Alexandru Elisei <alexandru.elisei@arm.com>
To: Andre Przywara <andre.przywara@arm.com>
Cc: kvm@vger.kernel.org, will@kernel.org,
julien.thierry.kdev@gmail.com, sami.mujawar@arm.com,
lorenzo.pieralisi@arm.com, maz@kernel.org
Subject: Re: [PATCH v2 kvmtool 22/30] vfio: Destroy memslot when unmapping the associated VAs
Date: Mon, 9 Mar 2020 12:38:44 +0000
Message-ID: <b48f3b29-38cc-3ae9-c118-9f8d9b3528f7@arm.com>
In-Reply-To: <20200205170129.6681e14b@donnerap.cambridge.arm.com>
Hi,
On 2/5/20 5:01 PM, Andre Przywara wrote:
> On Thu, 23 Jan 2020 13:47:57 +0000
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>
> Hi,
>
>> When we want to map a device region into the guest address space, first
>> we perform an mmap on the device fd. The resulting VMA is a mapping
>> between host userspace addresses and physical addresses associated with
>> the device. Next, we create a memslot, which populates the stage 2 table
>> with the mappings between guest physical addresses and the device
>> physical addresses.
>>
>> However, when we want to unmap the device from the guest address space,
>> we only call munmap, which destroys the VMA and the stage 2 mappings,
>> but doesn't destroy the memslot and kvmtool's internal mem_bank
>> structure associated with the memslot.
>>
>> This has been perfectly fine so far, because we only unmap a device
>> region when we exit kvmtool. This will change when we add support for
>> reassignable BARs, and we will have to unmap vfio regions as the guest
>> kernel writes new addresses in the BARs. This can lead to two possible
>> problems:
>>
>> - We refuse to create a valid BAR mapping because of a stale mem_bank
>> structure which belonged to a previously unmapped region.
>>
>> - It is possible that the mmap in vfio_map_region returns the same
>> address that was used to create a memslot, but was unmapped by
>> vfio_unmap_region. Guest accesses to the device memory will fault
>> because the stage 2 mappings are missing, and this can lead to
>> performance degradation.
>>
>> Let's do the right thing and destroy the memslot and the mem_bank struct
>> associated with it when we unmap a vfio region. Set host_addr to NULL
>> after the munmap call so we won't try to unmap an address which is
>> currently used if vfio_unmap_region gets called twice.
>>
>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
>> ---
>> include/kvm/kvm.h | 2 ++
>> kvm.c | 65 ++++++++++++++++++++++++++++++++++++++++++++---
>> vfio/core.c | 6 +++++
>> 3 files changed, 69 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h
>> index 50119a8672eb..c7e57b890cdd 100644
>> --- a/include/kvm/kvm.h
>> +++ b/include/kvm/kvm.h
>> @@ -56,6 +56,7 @@ struct kvm_mem_bank {
>> void *host_addr;
>> u64 size;
>> enum kvm_mem_type type;
>> + u32 slot;
>> };
>>
>> struct kvm {
>> @@ -106,6 +107,7 @@ void kvm__irq_line(struct kvm *kvm, int irq, int level);
>> void kvm__irq_trigger(struct kvm *kvm, int irq);
>> bool kvm__emulate_io(struct kvm_cpu *vcpu, u16 port, void *data, int direction, int size, u32 count);
>> bool kvm__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr, u8 *data, u32 len, u8 is_write);
>> +int kvm__destroy_mem(struct kvm *kvm, u64 guest_phys, u64 size, void *userspace_addr);
>> int kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size, void *userspace_addr,
>> enum kvm_mem_type type);
>> static inline int kvm__register_ram(struct kvm *kvm, u64 guest_phys, u64 size,
>> diff --git a/kvm.c b/kvm.c
>> index 57c4ff98ec4c..afcf55c7bf45 100644
>> --- a/kvm.c
>> +++ b/kvm.c
>> @@ -183,20 +183,75 @@ int kvm__exit(struct kvm *kvm)
>> }
>> core_exit(kvm__exit);
>>
>> +int kvm__destroy_mem(struct kvm *kvm, u64 guest_phys, u64 size,
>> + void *userspace_addr)
>> +{
>> + struct kvm_userspace_memory_region mem;
>> + struct kvm_mem_bank *bank;
>> + int ret;
>> +
>> + list_for_each_entry(bank, &kvm->mem_banks, list)
>> + if (bank->guest_phys_addr == guest_phys &&
>> + bank->size == size && bank->host_addr == userspace_addr)
>> + break;
> Shouldn't we protect the list with some lock? I am actually not sure we have this problem already, but at least now a guest could reassign BARs concurrently on different VCPUs, in which case multiple kvm__destroy_mem() and kvm__register_dev_mem() calls might race against each other.
> I think so far we got away with it because of the currently static nature of the memslot assignment.
And the fact that I haven't tested PCI passthrough with more than one device :)
I'll protect changes to the memory banks with a lock.
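As a rough sketch of the locking idea only (this is not the code from the
next version; struct mem_bank, mem_banks_lock, bank_register() and
bank_destroy() are made-up names, and the list is a plain singly linked
list instead of kvmtool's list_head), every walk and update of the bank
list would happen with one mutex held:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for kvmtool's struct kvm_mem_bank. */
struct mem_bank {
	struct mem_bank *next;
	uint64_t guest_phys_addr;
	uint64_t size;
	void *host_addr;
};

/* Assumption: one mutex serialises every walk and update of the list. */
static pthread_mutex_t mem_banks_lock = PTHREAD_MUTEX_INITIALIZER;
static struct mem_bank *mem_banks;

/* Add a bank; returns 0 on success or -ENOMEM if the allocation fails. */
int bank_register(uint64_t guest_phys, uint64_t size, void *host_addr)
{
	struct mem_bank *bank;

	/* A zero-sized bank makes no sense: size 0 is how KVM spells "delete". */
	assert(size != 0);

	bank = calloc(1, sizeof(*bank));
	if (!bank)
		return -ENOMEM;

	bank->guest_phys_addr = guest_phys;
	bank->size = size;
	bank->host_addr = host_addr;

	pthread_mutex_lock(&mem_banks_lock);
	bank->next = mem_banks;
	mem_banks = bank;
	pthread_mutex_unlock(&mem_banks_lock);

	return 0;
}

/* Remove the exactly matching bank; returns 0, or -EINVAL if not found. */
int bank_destroy(uint64_t guest_phys, uint64_t size, void *host_addr)
{
	struct mem_bank **link, *bank;
	int ret = -EINVAL;

	pthread_mutex_lock(&mem_banks_lock);
	for (link = &mem_banks; (bank = *link) != NULL; link = &bank->next) {
		if (bank->guest_phys_addr == guest_phys &&
		    bank->size == size && bank->host_addr == host_addr) {
			/* Unlink and free while still holding the lock. */
			*link = bank->next;
			free(bank);
			ret = 0;
			break;
		}
	}
	pthread_mutex_unlock(&mem_banks_lock);

	return ret;
}
```

In kvmtool itself the lock would presumably also have to cover the overlap
check and the KVM_SET_USER_MEMORY_REGION ioctl, so that two VCPUs cannot
pick the same slot or observe a half-updated list.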
>
>> +
>> + if (&bank->list == &kvm->mem_banks) {
>> + pr_err("Region [%llx-%llx] not found", guest_phys,
>> + guest_phys + size - 1);
>> + return -EINVAL;
>> + }
>> +
>> + if (bank->type == KVM_MEM_TYPE_RESERVED) {
>> + pr_err("Cannot delete reserved region [%llx-%llx]",
>> + guest_phys, guest_phys + size - 1);
>> + return -EINVAL;
>> + }
>> +
>> + mem = (struct kvm_userspace_memory_region) {
>> + .slot = bank->slot,
>> + .guest_phys_addr = guest_phys,
>> + .memory_size = 0,
>> + .userspace_addr = (unsigned long)userspace_addr,
>> + };
>> +
>> + ret = ioctl(kvm->vm_fd, KVM_SET_USER_MEMORY_REGION, &mem);
>> + if (ret < 0)
>> + return -errno;
>> +
>> + list_del(&bank->list);
>> + free(bank);
>> + kvm->mem_slots--;
>> +
>> + return 0;
>> +}
>> +
>> int kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size,
>> void *userspace_addr, enum kvm_mem_type type)
>> {
>> struct kvm_userspace_memory_region mem;
>> struct kvm_mem_bank *merged = NULL;
>> struct kvm_mem_bank *bank;
>> + struct list_head *prev_entry;
>> + u32 slot;
>> int ret;
>>
>> - /* Check for overlap */
>> + /* Check for overlap and find first empty slot. */
>> + slot = 0;
>> + prev_entry = &kvm->mem_banks;
>> list_for_each_entry(bank, &kvm->mem_banks, list) {
>> u64 bank_end = bank->guest_phys_addr + bank->size - 1;
>> u64 end = guest_phys + size - 1;
>> - if (guest_phys > bank_end || end < bank->guest_phys_addr)
>> + if (guest_phys > bank_end || end < bank->guest_phys_addr) {
>> + /*
>> + * Keep the banks sorted ascending by slot, so it's
>> + * easier for us to find a free slot.
>> + */
>> + if (bank->slot == slot) {
>> + slot++;
>> + prev_entry = &bank->list;
>> + }
>> continue;
>> + }
>>
>> /* Merge overlapping reserved regions */
>> if (bank->type == KVM_MEM_TYPE_RESERVED &&
>> @@ -241,10 +296,11 @@ int kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size,
>> bank->host_addr = userspace_addr;
>> bank->size = size;
>> bank->type = type;
>> + bank->slot = slot;
>>
>> if (type != KVM_MEM_TYPE_RESERVED) {
>> mem = (struct kvm_userspace_memory_region) {
>> - .slot = kvm->mem_slots++,
>> + .slot = slot,
>> .guest_phys_addr = guest_phys,
>> .memory_size = size,
>> .userspace_addr = (unsigned long)userspace_addr,
>> @@ -255,7 +311,8 @@ int kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size,
>> return -errno;
>> }
>>
>> - list_add(&bank->list, &kvm->mem_banks);
>> + list_add(&bank->list, prev_entry);
>> + kvm->mem_slots++;
>>
>> return 0;
>> }
>> diff --git a/vfio/core.c b/vfio/core.c
>> index 0ed1e6fee6bf..73fdac8be675 100644
>> --- a/vfio/core.c
>> +++ b/vfio/core.c
>> @@ -256,8 +256,14 @@ int vfio_map_region(struct kvm *kvm, struct vfio_device *vdev,
>>
>> void vfio_unmap_region(struct kvm *kvm, struct vfio_region *region)
>> {
>> + u64 map_size;
>> +
>> if (region->host_addr) {
>> + map_size = ALIGN(region->info.size, PAGE_SIZE);
>> munmap(region->host_addr, region->info.size);
>> + kvm__destroy_mem(kvm, region->guest_phys_addr, map_size,
>> + region->host_addr);
> Shouldn't we destroy the memslot first, then unmap? Because in the current version we are giving a no longer valid userland address to the ioctl. I actually wonder how that passes the access_ok() check in the kernel's KVM_SET_USER_MEMORY_REGION handler.
Yes, you're right. From Documentation/virt/kvm/api.txt, section 4.35
KVM_SET_USER_MEMORY_REGION:
"[..] Memory for the region is taken starting at the address denoted by the field
userspace_addr, which must point at user addressable memory for the entire memory
slot size."
I'll put the munmap after the ioctl.
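To illustrate just the ordering (a sketch under assumptions, not the
reworked patch: struct region, destroy_memslot() and unmap_region() are
hypothetical stand-ins, and the stand-in only reads from the mapping
instead of calling the ioctl), the memslot goes away while the userspace
address is still backed by a VMA, and only then is the VMA torn down:

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical, cut-down stand-in for kvmtool's struct vfio_region. */
struct region {
	void *host_addr;
	size_t size;
};

/*
 * Stand-in for kvm__destroy_mem(). The real function issues
 * KVM_SET_USER_MEMORY_REGION with memory_size = 0; here we only read the
 * first byte, which would fault if the VMA were already gone.
 */
static int destroy_memslot(struct region *r)
{
	volatile unsigned char *p = r->host_addr;

	(void)*p;
	return 0;
}

/* Delete the memslot first, then unmap, then forget the address. */
int unmap_region(struct region *r)
{
	int ret;

	if (!r->host_addr)
		return 0;	/* Nothing mapped, or already unmapped. */

	ret = destroy_memslot(r);
	if (ret < 0)
		return ret;

	if (munmap(r->host_addr, r->size) < 0)
		return -errno;

	/* A second call must not unmap an address that may be reused. */
	r->host_addr = NULL;

	return 0;
}
```

With this order the address handed to the memslot deletion always points
at user addressable memory, and clearing host_addr keeps a repeated call
harmless.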
Thanks,
Alex
>
> Cheers,
> Andre
>
>> + region->host_addr = NULL;
>> } else if (region->is_ioport) {
>> ioport__unregister(kvm, region->port_base);
>> } else {