From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Paul E. McKenney"
Subject: Re: [patch 07/10] KVM: introduce kvm->srcu and convert kvm_set_memory_region to SRCU update
Date: Thu, 24 Sep 2009 10:28:41 -0700
Message-ID: <20090924172841.GC6265@linux.vnet.ibm.com>
References: <20090921233711.213665413@amt.cnet> <20090921234124.596305294@amt.cnet> <20090924140651.GA13623@amt.cnet>
Reply-To: paulmck@linux.vnet.ibm.com
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: kvm@vger.kernel.org, avi@redhat.com
To: Marcelo Tosatti
Return-path:
Received: from e7.ny.us.ibm.com ([32.97.182.137]:33685 "EHLO e7.ny.us.ibm.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1751700AbZIXR2j (ORCPT );
	Thu, 24 Sep 2009 13:28:39 -0400
Received: from d01relay04.pok.ibm.com (d01relay04.pok.ibm.com [9.56.227.236])
	by e7.ny.us.ibm.com (8.14.3/8.13.1) with ESMTP id n8OHQSmN014565
	for ; Thu, 24 Sep 2009 13:26:28 -0400
Received: from d01av04.pok.ibm.com (d01av04.pok.ibm.com [9.56.224.64])
	by d01relay04.pok.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id n8OHSg1m255514
	for ; Thu, 24 Sep 2009 13:28:42 -0400
Received: from d01av04.pok.ibm.com (loopback [127.0.0.1])
	by d01av04.pok.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id n8OHSg9C024839
	for ; Thu, 24 Sep 2009 13:28:42 -0400
Content-Disposition: inline
In-Reply-To: <20090924140651.GA13623@amt.cnet>
Sender: kvm-owner@vger.kernel.org
List-ID:

On Thu, Sep 24, 2009 at 11:06:51AM -0300, Marcelo Tosatti wrote:
> On Mon, Sep 21, 2009 at 08:37:18PM -0300, Marcelo Tosatti wrote:
> > Use two steps for memslot deletion: mark the slot invalid (which stops
> > instantiation of new shadow pages for that slot, but allows destruction),
> > then instantiate the new empty slot.
> >
> > Also simplifies kvm_handle_hva locking.
> >
> > Signed-off-by: Marcelo Tosatti
> >
> >
> > -	if (!npages)
> > +	if (!npages) {
> > +		slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
> > +		if (!slots)
> > +			goto out_free;
> > +		memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
> > +		if (mem->slot >= slots->nmemslots)
> > +			slots->nmemslots = mem->slot + 1;
> > +		slots->memslots[mem->slot].flags |= KVM_MEMSLOT_INVALID;
> > +
> > +		old_memslots = kvm->memslots;
> > +		rcu_assign_pointer(kvm->memslots, slots);
> > +		synchronize_srcu(&kvm->srcu);
> > +		/* From this point no new shadow pages pointing to a deleted
> > +		 * memslot will be created.
> > +		 *
> > +		 * validation of sp->gfn happens in:
> > +		 * 	- gfn_to_hva (kvm_read_guest, gfn_to_pfn)
> > +		 * 	- kvm_is_visible_gfn (mmu_check_roots)
> > +		 */
> >  		kvm_arch_flush_shadow(kvm);
> > +		kfree(old_memslots);
> > +	}
> >
> >  	r = kvm_arch_prepare_memory_region(kvm, &new, old, user_alloc);
> >  	if (r)
> >  		goto out_free;
> >
> > -	spin_lock(&kvm->mmu_lock);
> > -	if (mem->slot >= kvm->memslots->nmemslots)
> > -		kvm->memslots->nmemslots = mem->slot + 1;
> > +#ifdef CONFIG_DMAR
> > +	/* map the pages in iommu page table */
> > +	if (npages)
> > +		r = kvm_iommu_map_pages(kvm, &new);
> > +	if (r)
> > +		goto out_free;
> > +#endif
> >
> > -	*memslot = new;
> > -	spin_unlock(&kvm->mmu_lock);
> > +	slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
> > +	if (!slots)
> > +		goto out_free;
> > +	memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
> > +	if (mem->slot >= slots->nmemslots)
> > +		slots->nmemslots = mem->slot + 1;
> > +
> > +	/* actual memory is freed via old in kvm_free_physmem_slot below */
> > +	if (!npages) {
> > +		new.rmap = NULL;
> > +		new.dirty_bitmap = NULL;
> > +		for (i = 0; i < KVM_NR_PAGE_SIZES - 1; ++i)
> > +			new.lpage_info[i] = NULL;
> > +	}
> > +
> > +	slots->memslots[mem->slot] = new;
> > +	old_memslots = kvm->memslots;
> > +	rcu_assign_pointer(kvm->memslots, slots);
> > +	synchronize_srcu(&kvm->srcu);
> >
> >  	kvm_arch_commit_memory_region(kvm, mem, old, user_alloc);
>
> Paul,
>
> There is a scenario where this path, which updates KVM memory slots, is
> called relatively often.
>
> Each synchronize_srcu() call takes about 10ms (avg 3ms per
> synchronize_sched call), so this is hurting us.
>
> Is this expected? Is there any possibility for synchronize_srcu()
> optimization?
>
> There are other sides we can work on, such as reducing the memory slot
> updates, but i'm wondering what can be done regarding SRCU itself.

This is expected behavior, but there is a possible fix currently
in mainline (Linus's git tree).  The idea would be to create a
synchronize_srcu_expedited(), which starts with synchronize_srcu(), and
replaces the synchronize_sched() calls with synchronize_sched_expedited().

This could potentially reduce the overall synchronize_srcu() latency
to well under a microsecond.  The price to be paid is that each instance
of synchronize_sched_expedited() IPIs all the online CPUs, and awakens
the migration thread on each.

Would this approach likely work for you?

							Thanx, Paul
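
For readers following the quoted patch, the update sequence it uses (copy
the slot array, modify the copy, publish it, wait for a grace period, free
the old array) can be modeled in userspace roughly as below.  This is only
a sketch under stated assumptions: every model_* name is hypothetical,
model_synchronize() is a counting stub standing in for synchronize_srcu(),
and there are no concurrent readers, so it shows the ordering of the steps
rather than real SRCU behavior.

```c
/* Userspace model of the copy / modify / publish / wait / free sequence.
 * All names are hypothetical stand-ins, not the kernel API. */
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_NR_SLOTS     4
#define MODEL_SLOT_INVALID 0x1u	/* stand-in for KVM_MEMSLOT_INVALID */

struct model_slot  { unsigned long npages; unsigned int flags; };
struct model_slots { int nmemslots; struct model_slot memslots[MODEL_NR_SLOTS]; };

/* Published pointer; readers would load it with acquire ordering. */
static _Atomic(struct model_slots *) current_slots;

/* Number of grace periods waited for, to make the cost visible. */
static int grace_periods;

/* Stub for synchronize_srcu(): the real call blocks until all
 * pre-existing readers are done. Here it only counts invocations. */
static void model_synchronize(void) { grace_periods++; }

/* Copy the current array, modify one slot in the copy, publish the
 * copy, wait for readers of the old array, then free the old array. */
static int model_publish(int slot, void (*modify)(struct model_slot *))
{
	struct model_slots *old, *new;

	if (slot < 0 || slot >= MODEL_NR_SLOTS)
		return -1;
	old = atomic_load_explicit(&current_slots, memory_order_acquire);
	new = calloc(1, sizeof(*new));
	if (!new)
		return -1;
	if (old)
		memcpy(new, old, sizeof(*new));
	if (slot >= new->nmemslots)
		new->nmemslots = slot + 1;
	modify(&new->memslots[slot]);

	/* Release ordering plays the role of rcu_assign_pointer(). */
	atomic_store_explicit(&current_slots, new, memory_order_release);
	model_synchronize();
	free(old);
	return 0;
}

static void fill_slot(struct model_slot *s)    { s->npages = 16; s->flags = 0; }
static void mark_invalid(struct model_slot *s) { s->flags |= MODEL_SLOT_INVALID; }
static void clear_slot(struct model_slot *s)   { memset(s, 0, sizeof(*s)); }

/* Two-step deletion: publish the slot as invalid, then publish it empty. */
static int model_delete_slot(int slot)
{
	if (model_publish(slot, mark_invalid))
		return -1;
	return model_publish(slot, clear_slot);
}
```

In this model a deletion waits for two grace periods, one per published
array, which mirrors the two synchronize_srcu() calls on the !npages path
of the quoted patch.  At roughly 10ms per call, that is why the
grace-period latency matters for frequent memslot updates.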