On Mon, 2019-10-21 at 15:58 -0700, Sean Christopherson wrote:
> Add a new helper, kvm_put_kvm_no_destroy(), to handle putting a borrowed
> reference[*] to the VM when installing a new file descriptor fails.  KVM
> expects the refcount to remain valid in this case, as the in-progress
> ioctl() has an explicit reference to the VM.  The primary motivation
> for the helper is to document that the 'kvm' pointer is still valid
> after putting the borrowed reference, e.g. to document that doing
> mutex(&kvm->lock) immediately after putting a ref to kvm isn't broken.
>
> [*] When exposing a new object to userspace via a file descriptor, e.g.
>     a new vcpu, KVM grabs a reference to itself (the VM) prior to making
>     the object visible to userspace to avoid prematurely freeing the VM
>     in the scenario where userspace immediately closes file descriptor.
>
> Signed-off-by: Sean Christopherson
> ---
>  arch/powerpc/kvm/book3s_64_mmu_hv.c |  2 +-
>  arch/powerpc/kvm/book3s_64_vio.c    |  2 +-
>  include/linux/kvm_host.h            |  1 +
>  virt/kvm/kvm_main.c                 | 16 ++++++++++++++--
>  4 files changed, 17 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index 9a75f0e1933b..68678e31c84c 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -2000,7 +2000,7 @@ int kvm_vm_ioctl_get_htab_fd(struct kvm *kvm, struct kvm_get_htab_fd *ghf)
>  	ret = anon_inode_getfd("kvm-htab", &kvm_htab_fops, ctx, rwflag | O_CLOEXEC);
>  	if (ret < 0) {
>  		kfree(ctx);
> -		kvm_put_kvm(kvm);
> +		kvm_put_kvm_no_destroy(kvm);
>  		return ret;
>  	}
>
> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
> index 5834db0a54c6..883a66e76638 100644
> --- a/arch/powerpc/kvm/book3s_64_vio.c
> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> @@ -317,7 +317,7 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
>  	if (ret >= 0)
>  		list_add_rcu(&stt->list, &kvm->arch.spapr_tce_tables);
>  	else
> -		kvm_put_kvm(kvm);
> +		kvm_put_kvm_no_destroy(kvm);
>
>  	mutex_unlock(&kvm->lock);
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 719fc3e15ea4..90a2102605ef 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -622,6 +622,7 @@ void kvm_exit(void);
>
>  void kvm_get_kvm(struct kvm *kvm);
>  void kvm_put_kvm(struct kvm *kvm);
> +void kvm_put_kvm_no_destroy(struct kvm *kvm);
>
>  static inline struct kvm_memslots *__kvm_memslots(struct kvm *kvm, int as_id)
>  {
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 67ef3f2e19e8..b8534c6b8cf6 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -772,6 +772,18 @@ void kvm_put_kvm(struct kvm *kvm)
>  }
>  EXPORT_SYMBOL_GPL(kvm_put_kvm);
>
> +/*
> + * Used to put a reference that was taken on behalf of an object associated
> + * with a user-visible file descriptor, e.g. a vcpu or device, if installation
> + * of the new file descriptor fails and the reference cannot be transferred to
> + * its final owner.  In such cases, the caller is still actively using @kvm and
> + * will fail miserably if the refcount unexpectedly hits zero.
> + */
> +void kvm_put_kvm_no_destroy(struct kvm *kvm)
> +{
> +	WARN_ON(refcount_dec_and_test(&kvm->users_count));
> +}
> +EXPORT_SYMBOL_GPL(kvm_put_kvm_no_destroy);
>
>  static int kvm_vm_release(struct inode *inode, struct file *filp)
>  {
> @@ -2679,7 +2691,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
>  	kvm_get_kvm(kvm);
>  	r = create_vcpu_fd(vcpu);
>  	if (r < 0) {
> -		kvm_put_kvm(kvm);
> +		kvm_put_kvm_no_destroy(kvm);
>  		goto unlock_vcpu_destroy;
>  	}
>
> @@ -3117,7 +3129,7 @@ static int kvm_ioctl_create_device(struct kvm *kvm,
>  	kvm_get_kvm(kvm);
>  	ret = anon_inode_getfd(ops->name, &kvm_device_fops, dev, O_RDWR | O_CLOEXEC);
>  	if (ret < 0) {
> -		kvm_put_kvm(kvm);
> +		kvm_put_kvm_no_destroy(kvm);
>  		mutex_lock(&kvm->lock);
>  		list_del(&dev->vm_node);
>  		mutex_unlock(&kvm->lock);

Hello,

I see what you are solving here, but wouldn't this behavior allow the
refcount to reach negative values? If so, isn't that a problem?

On some archs (powerpc included), refcount_dec_and_test() decrements the
counter and then tests whether the result is equal to 0. If the counter
ever reaches a negative value, that test can no longer succeed and the
memory will never be released.

For example, on archs other than x86, refcount_dec_and_test() calls
atomic_dec_and_test(), which in include/linux/atomic-fallback.h does:

	return atomic_dec_return(v) == 0;

Changing this behavior would mean either changing the behavior of the
whole atomic_*_test family, or adding a copy of the function that
replaces this '== 0' with '<= 0'.

Does this make sense? Do you need any help with this?

Kind regards,

Leonardo Brás
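
P.S. To make the concern concrete, here is a minimal userspace sketch of
the two comparisons, using C11 atomics. It is only a model of the fallback
expression quoted above: the model_* names are hypothetical, it is not
kernel code, and it ignores any extra checking the kernel's refcount_t
may add on top.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Model of the quoted fallback: atomic_dec_return(v) == 0. */
static bool model_dec_and_test(atomic_int *v)
{
	/* atomic_fetch_sub() returns the old value; subtract 1 for the new one. */
	return atomic_fetch_sub(v, 1) - 1 == 0;
}

/* Hypothetical variant that uses the '<= 0' comparison instead. */
static bool model_dec_and_test_le(atomic_int *v)
{
	return atomic_fetch_sub(v, 1) - 1 <= 0;
}

/* Walk a counter from 1 through two puts and check what each test reports. */
static void model_demo(void)
{
	atomic_int eq = 1, le = 1;

	assert(model_dec_and_test(&eq));     /* 1 -> 0: reports zero */
	assert(!model_dec_and_test(&eq));    /* 0 -> -1: '== 0' misses it */
	assert(atomic_load(&eq) == -1);      /* counter is now negative */

	assert(model_dec_and_test_le(&le));  /* 1 -> 0: reports zero */
	assert(model_dec_and_test_le(&le));  /* 0 -> -1: '<= 0' still fires */
}
```

In this model, once the counter has passed zero, the '== 0' form cannot
report the drop on any later decrement, while the '<= 0' form still does.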