Date: Thu, 15 Dec 2011 11:10:37 +0200
From: Gleb Natapov
To: Liu Ping Fan
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, avi@redhat.com,
	aliguori@us.ibm.com, mtosatti@redhat.com, jan.kiszka@web.de
Subject: Re: [PATCH v4] kvm: make vcpu life cycle separated from kvm instance
Message-ID: <20111215091037.GB21664@redhat.com>
References: <1323923328-917-1-git-send-email-kernelfans@gmail.com>
In-Reply-To: <1323923328-917-1-git-send-email-kernelfans@gmail.com>

On Thu, Dec 15, 2011 at 12:28:48PM +0800, Liu Ping Fan wrote:
> From: Liu Ping Fan
>
> Currently, a vcpu can be destructed only when the kvm instance is
> destroyed. Change this to allow vcpu destruction before the kvm
> instance's, so a vcpu MUST and CAN be destroyed before kvm's destroy.
>
I see reference counting is back.

> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index d9cfb78..71dda47 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -141,6 +141,7 @@ void vcpu_load(struct kvm_vcpu *vcpu)
>  {
>  	int cpu;
>
> +	kvm_vcpu_get(vcpu);
>  	mutex_lock(&vcpu->mutex);
>  	if (unlikely(vcpu->pid != current->pids[PIDTYPE_PID].pid)) {
>  		/* The thread running this VCPU changed. */
> @@ -163,6 +164,7 @@ void vcpu_put(struct kvm_vcpu *vcpu)
>  	preempt_notifier_unregister(&vcpu->preempt_notifier);
>  	preempt_enable();
>  	mutex_unlock(&vcpu->mutex);
> +	kvm_vcpu_put(vcpu);
>  }
>
Why are kvm_vcpu_get()/kvm_vcpu_put() needed in vcpu_load()/vcpu_put()?
As far as I can see, load/put are only called from the vcpu ioctl,
kvm_arch_vcpu_setup(), kvm_arch_vcpu_destroy() and kvm_arch_destroy_vm().
kvm_arch_vcpu_setup() and kvm_arch_vcpu_destroy() are called before the
vcpu is added to the vcpus list, so it can't be accessed by another
thread at that point. kvm_arch_destroy_vm() is called on the KVM
destruction path, when all vcpus should already have been destroyed. So
the only interesting place is the vcpu ioctl, and I think we are
protected by the fd refcount there: the vcpu fd can't be closed while an
ioctl is executing for that vcpu. Otherwise we would have a problem now
too.

> @@ -1539,12 +1547,10 @@ EXPORT_SYMBOL_GPL(kvm_resched);
>  void kvm_vcpu_on_spin(struct kvm_vcpu *me)
>  {
>  	struct kvm *kvm = me->kvm;
> -	struct kvm_vcpu *vcpu;
> -	int last_boosted_vcpu = me->kvm->last_boosted_vcpu;
> -	int yielded = 0;
> -	int pass;
> -	int i;
> -
> +	struct kvm_vcpu *vcpu, *v;
> +	struct task_struct *task = NULL;
> +	struct pid *pid;
> +	int pass, firststart, lastone, yielded;
>  	/*
>  	 * We boost the priority of a VCPU that is runnable but not
>  	 * currently running, because it got preempted by something
> @@ -1552,15 +1558,22 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
>  	 * VCPU is holding the lock that we need and will release it.
>  	 * We approximate round-robin by starting at the last boosted VCPU.
>  	 */
> -	for (pass = 0; pass < 2 && !yielded; pass++) {
> -		kvm_for_each_vcpu(i, vcpu, kvm) {
> -			struct task_struct *task = NULL;
> -			struct pid *pid;
> -			if (!pass && i < last_boosted_vcpu) {
> -				i = last_boosted_vcpu;
> +	for (pass = 0, firststart = 0; pass < 2 && !yielded; pass++) {
> +
> +		rcu_read_lock();
> +		kvm_for_each_vcpu(vcpu, kvm) {
> +			if (!pass && !firststart &&
> +			    vcpu != kvm->last_boosted_vcpu &&
> +			    kvm->last_boosted_vcpu != NULL) {
> +				vcpu = kvm->last_boosted_vcpu;
> +				firststart = 1;
>  				continue;
> -			} else if (pass && i > last_boosted_vcpu)
> +			} else if (pass && !lastone) {
> +				if (vcpu == kvm->last_boosted_vcpu)
> +					lastone = 1;
> +			} else if (pass && lastone)
>  				break;
> +
>  			if (vcpu == me)
>  				continue;
>  			if (waitqueue_active(&vcpu->wq))
> @@ -1576,15 +1589,29 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
>  				put_task_struct(task);
>  				continue;
>  			}
> +			v = kvm_vcpu_get(vcpu);
> +			if (v == NULL)
> +				continue;
> +
> +			rcu_read_unlock();
>  			if (yield_to(task, 1)) {
>  				put_task_struct(task);
> -				kvm->last_boosted_vcpu = i;
> +				mutex_lock(&kvm->lock);
> +				/* Remember to release it. */
> +				if (kvm->last_boosted_vcpu != NULL)
> +					kvm_vcpu_put(kvm->last_boosted_vcpu);
> +				kvm->last_boosted_vcpu = vcpu;
> +				mutex_unlock(&kvm->lock);
>  				yielded = 1;

I think we can be smart and protect kvm->last_boosted_vcpu with the same
rcu as vcpus, but yield_to() can sleep anyway. Hmm, maybe we should use
srcu in the first place :( Or rewrite the logic of the function somehow
to call yield_to() outside of the loop. This is a heuristic anyway.

--
			Gleb.
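
The fd-refcount protection mentioned above is just the normal ioctl
entry path. A simplified sketch of what fs/ioctl.c of this era does
(details and error paths elided), showing why kvm_vcpu_release() cannot
run while a vcpu ioctl is in flight:

	/*
	 * Sketch of the ioctl entry path: the vfs takes a reference on
	 * the file before dispatching and drops it only after the
	 * handler returns, so even a concurrent close() of the vcpu fd
	 * cannot trigger release in between.
	 */
	SYSCALL_DEFINE3(ioctl, unsigned int, fd, unsigned int, cmd,
			unsigned long, arg)
	{
		struct file *filp;
		int error = -EBADF;
		int fput_needed;

		filp = fget_light(fd, &fput_needed);	/* file reference taken */
		if (!filp)
			goto out;

		error = security_file_ioctl(filp, cmd, arg);
		if (!error)
			error = do_vfs_ioctl(filp, fd, cmd, arg); /* -> kvm_vcpu_ioctl() */

		fput_light(filp, fput_needed);	/* reference dropped only here */
	out:
		return error;
	}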
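
And a rough sketch of the last idea, restructuring kvm_vcpu_on_spin()
so the candidate is chosen under rcu_read_lock() but the sleeping
yield_to() happens outside of it. kvm_vcpu_get()/kvm_vcpu_put() and the
two-argument kvm_for_each_vcpu() are taken from the patch; the
round-robin start point and PF_VCPU check are elided, so this is not a
drop-in replacement:

	void kvm_vcpu_on_spin(struct kvm_vcpu *me)
	{
		struct kvm *kvm = me->kvm;
		struct kvm_vcpu *vcpu, *target = NULL;
		struct task_struct *task = NULL;
		struct pid *pid;

		rcu_read_lock();
		kvm_for_each_vcpu(vcpu, kvm) {
			if (vcpu == me)
				continue;
			if (waitqueue_active(&vcpu->wq))
				continue;
			/* ... round-robin start point and PF_VCPU check elided ... */
			pid = rcu_dereference(vcpu->pid);
			task = pid ? get_pid_task(pid, PIDTYPE_PID) : NULL;
			if (!task)
				continue;
			if (kvm_vcpu_get(vcpu) == NULL) {	/* being destroyed */
				put_task_struct(task);
				continue;
			}
			target = vcpu;
			break;
		}
		rcu_read_unlock();

		if (!target)
			return;

		/* Safe to sleep now: we hold our own reference on the vcpu. */
		if (yield_to(task, 1)) {
			mutex_lock(&kvm->lock);
			if (kvm->last_boosted_vcpu != NULL)
				kvm_vcpu_put(kvm->last_boosted_vcpu);
			kvm->last_boosted_vcpu = target; /* keeps target's ref */
			mutex_unlock(&kvm->lock);
		} else {
			kvm_vcpu_put(target);
		}
		put_task_struct(task);
	}

Since the whole thing is only a heuristic, boosting at most one
candidate per spin and retrying on the next pause exit is arguably good
enough.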