From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Wed, 5 Feb 2020 17:01:29 +0000
From: Andre Przywara
To: Alexandru Elisei
Cc: kvm@vger.kernel.org, will@kernel.org, julien.thierry.kdev@gmail.com,
 sami.mujawar@arm.com, lorenzo.pieralisi@arm.com, maz@kernel.org
Subject: Re: [PATCH v2 kvmtool 22/30] vfio: Destroy memslot when unmapping the associated VAs
Message-ID: <20200205170129.6681e14b@donnerap.cambridge.arm.com>
In-Reply-To: <20200123134805.1993-23-alexandru.elisei@arm.com>
References: <20200123134805.1993-1-alexandru.elisei@arm.com>
 <20200123134805.1993-23-alexandru.elisei@arm.com>
Organization: ARM

On Thu, 23 Jan 2020 13:47:57 +0000
Alexandru Elisei wrote:

Hi,

> When we want to map a device region into the guest address space, first
> we perform an mmap on the device fd. The resulting VMA is a mapping
> between host userspace addresses and physical addresses associated with
> the device. Next, we create a memslot, which populates the stage 2 table
> with the mappings between guest physical addresses and the device
> physical addresses.
>
> However, when we want to unmap the device from the guest address space,
> we only call munmap, which destroys the VMA and the stage 2 mappings,
> but doesn't destroy the memslot and kvmtool's internal mem_bank
> structure associated with the memslot.
>
> This has been perfectly fine so far, because we only unmap a device
> region when we exit kvmtool. This will change when we add support for
> reassignable BARs, and we will have to unmap vfio regions as the guest
> kernel writes new addresses in the BARs. This can lead to two possible
> problems:
>
> - We refuse to create a valid BAR mapping because of a stale mem_bank
>   structure which belonged to a previously unmapped region.
>
> - It is possible that the mmap in vfio_map_region returns the same
>   address that was used to create a memslot, but was unmapped by
>   vfio_unmap_region. Guest accesses to the device memory will fault
>   because the stage 2 mappings are missing, and this can lead to
>   performance degradation.
>
> Let's do the right thing and destroy the memslot and the mem_bank struct
> associated with it when we unmap a vfio region. Set host_addr to NULL
> after the munmap call so we won't try to unmap an address which is
> currently used if vfio_unmap_region gets called twice.
>
> Signed-off-by: Alexandru Elisei
> ---
>  include/kvm/kvm.h |  2 ++
>  kvm.c             | 65 ++++++++++++++++++++++++++++++++++++++++++++---
>  vfio/core.c       |  6 +++++
>  3 files changed, 69 insertions(+), 4 deletions(-)
>
> diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h
> index 50119a8672eb..c7e57b890cdd 100644
> --- a/include/kvm/kvm.h
> +++ b/include/kvm/kvm.h
> @@ -56,6 +56,7 @@ struct kvm_mem_bank {
>  	void *host_addr;
>  	u64 size;
>  	enum kvm_mem_type type;
> +	u32 slot;
>  };
>
>  struct kvm {
> @@ -106,6 +107,7 @@ void kvm__irq_line(struct kvm *kvm, int irq, int level);
>  void kvm__irq_trigger(struct kvm *kvm, int irq);
>  bool kvm__emulate_io(struct kvm_cpu *vcpu, u16 port, void *data, int direction, int size, u32 count);
>  bool kvm__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr, u8 *data, u32 len, u8 is_write);
> +int kvm__destroy_mem(struct kvm *kvm, u64 guest_phys, u64 size, void *userspace_addr);
>  int kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size, void *userspace_addr,
>  		      enum kvm_mem_type type);
>  static inline int kvm__register_ram(struct kvm *kvm, u64 guest_phys, u64 size,
> diff --git a/kvm.c b/kvm.c
> index 57c4ff98ec4c..afcf55c7bf45 100644
> --- a/kvm.c
> +++ b/kvm.c
> @@ -183,20 +183,75 @@ int kvm__exit(struct kvm *kvm)
>  }
>  core_exit(kvm__exit);
>
> +int kvm__destroy_mem(struct kvm *kvm, u64 guest_phys, u64 size,
> +		     void *userspace_addr)
> +{
> +	struct kvm_userspace_memory_region mem;
> +	struct kvm_mem_bank *bank;
> +	int ret;
> +
> +	list_for_each_entry(bank, &kvm->mem_banks, list)
> +		if (bank->guest_phys_addr == guest_phys &&
> +		    bank->size == size && bank->host_addr == userspace_addr)
> +			break;

Shouldn't we protect the list with some lock? I am actually not sure we
have this problem already, but at least now a guest could reassign BARs
concurrently on different VCPUs, in which case multiple
kvm__destroy_mem() and kvm__register_dev_mem() calls might race against
each other.
I think so far we got away with it because of the currently static
nature of the memslot assignment.

> +
> +	if (&bank->list == &kvm->mem_banks) {
> +		pr_err("Region [%llx-%llx] not found", guest_phys,
> +		       guest_phys + size - 1);
> +		return -EINVAL;
> +	}
> +
> +	if (bank->type == KVM_MEM_TYPE_RESERVED) {
> +		pr_err("Cannot delete reserved region [%llx-%llx]",
> +		       guest_phys, guest_phys + size - 1);
> +		return -EINVAL;
> +	}
> +
> +	mem = (struct kvm_userspace_memory_region) {
> +		.slot = bank->slot,
> +		.guest_phys_addr = guest_phys,
> +		.memory_size = 0,
> +		.userspace_addr = (unsigned long)userspace_addr,
> +	};
> +
> +	ret = ioctl(kvm->vm_fd, KVM_SET_USER_MEMORY_REGION, &mem);
> +	if (ret < 0)
> +		return -errno;
> +
> +	list_del(&bank->list);
> +	free(bank);
> +	kvm->mem_slots--;
> +
> +	return 0;
> +}
> +
>  int kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size,
>  		      void *userspace_addr, enum kvm_mem_type type)
>  {
>  	struct kvm_userspace_memory_region mem;
>  	struct kvm_mem_bank *merged = NULL;
>  	struct kvm_mem_bank *bank;
> +	struct list_head *prev_entry;
> +	u32 slot;
>  	int ret;
>
> -	/* Check for overlap */
> +	/* Check for overlap and find first empty slot. */
> +	slot = 0;
> +	prev_entry = &kvm->mem_banks;
>  	list_for_each_entry(bank, &kvm->mem_banks, list) {
>  		u64 bank_end = bank->guest_phys_addr + bank->size - 1;
>  		u64 end = guest_phys + size - 1;
> -		if (guest_phys > bank_end || end < bank->guest_phys_addr)
> +		if (guest_phys > bank_end || end < bank->guest_phys_addr) {
> +			/*
> +			 * Keep the banks sorted ascending by slot, so it's
> +			 * easier for us to find a free slot.
> +			 */
> +			if (bank->slot == slot) {
> +				slot++;
> +				prev_entry = &bank->list;
> +			}
>  			continue;
> +		}
>
>  		/* Merge overlapping reserved regions */
>  		if (bank->type == KVM_MEM_TYPE_RESERVED &&
> @@ -241,10 +296,11 @@ int kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size,
>  	bank->host_addr = userspace_addr;
>  	bank->size = size;
>  	bank->type = type;
> +	bank->slot = slot;
>
>  	if (type != KVM_MEM_TYPE_RESERVED) {
>  		mem = (struct kvm_userspace_memory_region) {
> -			.slot = kvm->mem_slots++,
> +			.slot = slot,
>  			.guest_phys_addr = guest_phys,
>  			.memory_size = size,
>  			.userspace_addr = (unsigned long)userspace_addr,
> @@ -255,7 +311,8 @@ int kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size,
>  			return -errno;
>  	}
>
> -	list_add(&bank->list, &kvm->mem_banks);
> +	list_add(&bank->list, prev_entry);
> +	kvm->mem_slots++;
>
>  	return 0;
>  }
> diff --git a/vfio/core.c b/vfio/core.c
> index 0ed1e6fee6bf..73fdac8be675 100644
> --- a/vfio/core.c
> +++ b/vfio/core.c
> @@ -256,8 +256,14 @@ int vfio_map_region(struct kvm *kvm, struct vfio_device *vdev,
>
>  void vfio_unmap_region(struct kvm *kvm, struct vfio_region *region)
>  {
> +	u64 map_size;
> +
>  	if (region->host_addr) {
> +		map_size = ALIGN(region->info.size, PAGE_SIZE);
>  		munmap(region->host_addr, region->info.size);
> +		kvm__destroy_mem(kvm, region->guest_phys_addr, map_size,
> +				 region->host_addr);

Shouldn't we destroy the memslot first, then unmap? Because in the
current version we are giving a no longer valid userland address to the
ioctl. I actually wonder how that passes the access_ok() check in the
kernel's KVM_SET_USER_MEMORY_REGION handler.

Cheers,
Andre

> +		region->host_addr = NULL;
>  	} else if (region->is_ioport) {
>  		ioport__unregister(kvm, region->port_base);
>  	} else {