Date: Mon, 3 Feb 2020 12:23:38 +0000
From: Andre Przywara
To: Alexandru Elisei
Cc: kvm@vger.kernel.org, will@kernel.org, julien.thierry.kdev@gmail.com,
 sami.mujawar@arm.com, lorenzo.pieralisi@arm.com, maz@kernel.org
Subject: Re: [PATCH v2 kvmtool 19/30] Use independent read/write locks for
 ioport and mmio
Message-ID: <20200203122338.6565e96a@donnerap.cambridge.arm.com>
In-Reply-To: <20200123134805.1993-20-alexandru.elisei@arm.com>
References: <20200123134805.1993-1-alexandru.elisei@arm.com>
 <20200123134805.1993-20-alexandru.elisei@arm.com>

On Thu, 23 Jan 2020 13:47:54 +0000
Alexandru Elisei wrote:

Hi,

> kvmtool uses brlock for protecting accesses to the ioport and mmio
> red-black trees. brlock allows concurrent reads, but only one writer,
> which is assumed not to be a VCPU thread. This is done by issuing a
> compiler barrier on read and pausing the entire virtual machine on
> writes. When KVM_BRLOCK_DEBUG is defined, brlock uses instead a pthread
> read/write lock.
>
> When we will implement reassignable BARs, the mmio or ioport mapping
> will be done as a result of a VCPU mmio access. When brlock is a
> read/write lock, it means that we will try to acquire a write lock with
> the read lock already held by the same VCPU and we will deadlock. When
> it's not, a VCPU will have to call kvm__pause, which means the virtual
> machine will stay paused forever.
>
> Let's avoid all this by using separate pthread_rwlock_t locks for the
> mmio and the ioport red-black trees and carefully choosing our read
> critical region such that modification as a result of a guest mmio
> access doesn't deadlock.
>
> Signed-off-by: Alexandru Elisei
> ---
>  ioport.c | 20 +++++++++++---------
>  mmio.c   | 26 +++++++++++++++++---------
>  2 files changed, 28 insertions(+), 18 deletions(-)
>
> diff --git a/ioport.c b/ioport.c
> index d224819c6e43..c044a80dd763 100644
> --- a/ioport.c
> +++ b/ioport.c
> @@ -2,9 +2,9 @@
>
>  #include "kvm/kvm.h"
>  #include "kvm/util.h"
> -#include "kvm/brlock.h"
>  #include "kvm/rbtree-interval.h"
>  #include "kvm/mutex.h"
> +#include "kvm/rwsem.h"
>
>  #include  /* for KVM_EXIT_* */
>  #include
> @@ -16,6 +16,8 @@
>
>  #define ioport_node(n) rb_entry(n, struct ioport, node)
>
> +static DECLARE_RWSEM(ioport_lock);
> +
>  static struct rb_root ioport_tree = RB_ROOT;
>
>  static struct ioport *ioport_search(struct rb_root *root, u64 addr)
> @@ -68,7 +70,7 @@ int ioport__register(struct kvm *kvm, u16 port, struct ioport_operations *ops, i
>  	struct ioport *entry;
>  	int r;
>
> -	br_write_lock(kvm);
> +	down_write(&ioport_lock);
>
>  	entry = ioport_search(&ioport_tree, port);
>  	if (entry) {
> @@ -96,7 +98,7 @@ int ioport__register(struct kvm *kvm, u16 port, struct ioport_operations *ops, i
>  	r = device__register(&entry->dev_hdr);
>  	if (r < 0)
>  		goto out_erase;
> -	br_write_unlock(kvm);
> +	up_write(&ioport_lock);
>
>  	return port;
>
> @@ -104,7 +106,7 @@ out_erase:
>  	rb_int_erase(&ioport_tree, &entry->node);
>  out_free:
>  	free(entry);
> -	br_write_unlock(kvm);
> +	up_write(&ioport_lock);
>  	return r;
>  }
>
> @@ -113,7 +115,7 @@ int ioport__unregister(struct kvm *kvm, u16 port)
>  	struct ioport *entry;
>  	int r;
>
> -	br_write_lock(kvm);
> +	down_write(&ioport_lock);
>
>  	r = -ENOENT;
>  	entry = ioport_search(&ioport_tree, port);
> @@ -128,7 +130,7 @@ int ioport__unregister(struct kvm *kvm, u16 port)
>  	r = 0;
>
>  done:
> -	br_write_unlock(kvm);
> +	up_write(&ioport_lock);
>
>  	return r;
>  }
> @@ -171,8 +173,10 @@ bool kvm__emulate_io(struct kvm_cpu *vcpu, u16 port, void *data, int direction,
>  	void *ptr = data;
>  	struct kvm *kvm = vcpu->kvm;
>
> -	br_read_lock(kvm);
> +	down_read(&ioport_lock);
>  	entry = ioport_search(&ioport_tree, port);
> +	up_read(&ioport_lock);
> +
>  	if (!entry)
>  		goto out;

I don't think it's valid to drop the lock that early. A concurrent
ioport__unregister() would free the entry pointer, so we have a
potential use-after-free here.

I guess you are thinking about an x86 CF8/CFC config space access here,
which in turn would take the write lock when updating an I/O BAR?

So I think the same comment that you added below on kvm__emulate_mmio()
applies here? More on this below then ....
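Just to illustrate: the obvious alternative of keeping the read lock
held across the handler call (rough sketch below, the io_in()/io_out()
dispatch loop is elided) would close the use-after-free window, but
trades it for the deadlock you describe, so it is not an option either:

	down_read(&ioport_lock);
	entry = ioport_search(&ioport_tree, port);
	if (entry) {
		/* ... dispatch to entry->ops as it is done today ... */
	}
	/*
	 * Holding the read lock until after the handler has run stops a
	 * concurrent ioport__unregister() from freeing "entry" while it
	 * is still in use ...
	 */
	up_read(&ioport_lock);
	/*
	 * ... but a CF8/CFC handler that moves an I/O BAR ends up in
	 * ioport__register()/ioport__unregister(), which takes the write
	 * lock on this very thread, so we deadlock against our own read
	 * lock.
	 */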
>
> @@ -188,8 +192,6 @@ bool kvm__emulate_io(struct kvm_cpu *vcpu, u16 port, void *data, int direction,
>  	}
>
>  out:
> -	br_read_unlock(kvm);
> -
>  	if (ret)
>  		return true;
>
> diff --git a/mmio.c b/mmio.c
> index 61e1d47a587d..4e0ff830c738 100644
> --- a/mmio.c
> +++ b/mmio.c
> @@ -1,7 +1,7 @@
>  #include "kvm/kvm.h"
>  #include "kvm/kvm-cpu.h"
>  #include "kvm/rbtree-interval.h"
> -#include "kvm/brlock.h"
> +#include "kvm/rwsem.h"
>
>  #include
>  #include
> @@ -15,6 +15,8 @@
>
>  #define mmio_node(n) rb_entry(n, struct mmio_mapping, node)
>
> +static DECLARE_RWSEM(mmio_lock);
> +
>  struct mmio_mapping {
>  	struct rb_int_node node;
>  	void (*mmio_fn)(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len, u8 is_write, void *ptr);
> @@ -61,7 +63,7 @@ static const char *to_direction(u8 is_write)
>
>  int kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, bool coalesce,
>  		void (*mmio_fn)(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len, u8 is_write, void *ptr),
> -		void *ptr)
> +		void *ptr)
>  {
>  	struct mmio_mapping *mmio;
>  	struct kvm_coalesced_mmio_zone zone;
> @@ -88,9 +90,9 @@ int kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, bool c
>  			return -errno;
>  		}
>  	}
> -	br_write_lock(kvm);
> +	down_write(&mmio_lock);
>  	ret = mmio_insert(&mmio_tree, mmio);
> -	br_write_unlock(kvm);
> +	up_write(&mmio_lock);
>
>  	return ret;
>  }
> @@ -100,10 +102,10 @@ bool kvm__deregister_mmio(struct kvm *kvm, u64 phys_addr)
>  	struct mmio_mapping *mmio;
>  	struct kvm_coalesced_mmio_zone zone;
>
> -	br_write_lock(kvm);
> +	down_write(&mmio_lock);
>  	mmio = mmio_search_single(&mmio_tree, phys_addr);
>  	if (mmio == NULL) {
> -		br_write_unlock(kvm);
> +		up_write(&mmio_lock);
>  		return false;
>  	}
>
> @@ -114,7 +116,7 @@ bool kvm__deregister_mmio(struct kvm *kvm, u64 phys_addr)
>  	ioctl(kvm->vm_fd, KVM_UNREGISTER_COALESCED_MMIO, &zone);
>
>  	rb_int_erase(&mmio_tree, &mmio->node);
> -	br_write_unlock(kvm);
> +	up_write(&mmio_lock);
>
>  	free(mmio);
>  	return true;
> @@ -124,8 +126,15 @@ bool kvm__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr, u8 *data, u32 len, u
>  {
>  	struct mmio_mapping *mmio;
>
> -	br_read_lock(vcpu->kvm);
> +	/*
> +	 * The callback might call kvm__register_mmio which takes a write lock,
> +	 * so avoid deadlocks by protecting only the node search with a reader
> +	 * lock. Note that there is still a small time window for a node to be
> +	 * deleted by another vcpu before mmio_fn gets called.
> +	 */

Do I get this right that this means the locking is not "fully" correct?
I don't think we should tolerate this.

The underlying problem seems to be that the lock protects two separate
things: the RB tree used to find the handler, but also the handlers and
their data structures themselves. So far this was feasible, but this
doesn't work any longer.

I think refcounting would be the answer here: once mmio_search() returns
an entry, a ref counter is incremented, preventing this entry from being
removed by kvm__deregister_mmio(). Once the emulation has finished, we
decrement the counter and trigger the free operation if it has reached
zero. (A rough sketch of what I have in mind is at the bottom of this
mail, after the quoted patch.)

Does that make sense?

Cheers,
Andre.

> +	down_read(&mmio_lock);
>  	mmio = mmio_search(&mmio_tree, phys_addr, len);
> +	up_read(&mmio_lock);
>
>  	if (mmio)
>  		mmio->mmio_fn(vcpu, phys_addr, data, len, is_write, mmio->ptr);
> @@ -135,7 +144,6 @@ bool kvm__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr, u8 *data, u32 len, u
>  		to_direction(is_write),
>  		(unsigned long long)phys_addr, len);
>  	}
> -	br_read_unlock(vcpu->kvm);
>
>  	return true;
>  }
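To make the refcounting idea a bit more concrete, something along these
lines (just a sketch to illustrate, not a finished patch: the "remove"
flag and the mmio_get()/mmio_put() helpers are names I made up, and for
simplicity I am protecting both the tree and the refcount with a plain
mutex from kvm/mutex.h, since the critical sections are tiny anyway):

struct mmio_mapping {
	/* ... existing members: node, mmio_fn, ptr ... */
	u32	refcount;	/* new: number of in-flight accesses */
	bool	remove;		/* new: deregistered, free on last put */
};

static DEFINE_MUTEX(mmio_lock);

/* Find the mapping and pin it, so deregistration cannot free it under us. */
static struct mmio_mapping *mmio_get(struct rb_root *root, u64 phys_addr, u32 len)
{
	struct mmio_mapping *mmio;

	mutex_lock(&mmio_lock);
	mmio = mmio_search(root, phys_addr, len);
	if (mmio)
		mmio->refcount++;
	mutex_unlock(&mmio_lock);

	return mmio;
}

/* Drop the reference; free the mapping if it was deregistered meanwhile. */
static void mmio_put(struct mmio_mapping *mmio)
{
	mutex_lock(&mmio_lock);
	if (--mmio->refcount == 0 && mmio->remove)
		free(mmio);
	mutex_unlock(&mmio_lock);
}

kvm__deregister_mmio() would then, under the same mutex, erase the node
from the tree and either free it right away (refcount == 0) or just set
"remove" and let the last mmio_put() do the free. kvm__emulate_mmio()
becomes a mmio_get() / call mmio_fn / mmio_put() sequence, and the same
scheme would work for the ioport tree.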