On Tue, 2019-11-26 at 09:14 -0800, Sean Christopherson wrote:
> On Tue, Nov 26, 2019 at 01:44:14PM -0300, Leonardo Bras wrote:
> > On Mon, 2019-10-21 at 15:58 -0700, Sean Christopherson wrote:
> > ...
> > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > index 67ef3f2e19e8..b8534c6b8cf6 100644
> > > --- a/virt/kvm/kvm_main.c
> > > +++ b/virt/kvm/kvm_main.c
> > > @@ -772,6 +772,18 @@ void kvm_put_kvm(struct kvm *kvm)
> > >  }
> > >  EXPORT_SYMBOL_GPL(kvm_put_kvm);
> > >
> > > +/*
> > > + * Used to put a reference that was taken on behalf of an object associated
> > > + * with a user-visible file descriptor, e.g. a vcpu or device, if installation
> > > + * of the new file descriptor fails and the reference cannot be transferred to
> > > + * its final owner.  In such cases, the caller is still actively using @kvm and
> > > + * will fail miserably if the refcount unexpectedly hits zero.
> > > + */
> > > +void kvm_put_kvm_no_destroy(struct kvm *kvm)
> > > +{
> > > +	WARN_ON(refcount_dec_and_test(&kvm->users_count));
> > > +}
> > > +EXPORT_SYMBOL_GPL(kvm_put_kvm_no_destroy);
> > >
> > >  static int kvm_vm_release(struct inode *inode, struct file *filp)
> > >  {
> > > @@ -2679,7 +2691,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
> > >  	kvm_get_kvm(kvm);
> > >  	r = create_vcpu_fd(vcpu);
> > >  	if (r < 0) {
> > > -		kvm_put_kvm(kvm);
> > > +		kvm_put_kvm_no_destroy(kvm);
> > >  		goto unlock_vcpu_destroy;
> > >  	}
> > >
> > > @@ -3117,7 +3129,7 @@ static int kvm_ioctl_create_device(struct kvm *kvm,
> > >  	kvm_get_kvm(kvm);
> > >  	ret = anon_inode_getfd(ops->name, &kvm_device_fops, dev, O_RDWR | O_CLOEXEC);
> > >  	if (ret < 0) {
> > > -		kvm_put_kvm(kvm);
> > > +		kvm_put_kvm_no_destroy(kvm);
> > >  		mutex_lock(&kvm->lock);
> > >  		list_del(&dev->vm_node);
> > >  		mutex_unlock(&kvm->lock);
> >
> > Hello,
> >
> > I see what are you solving here, but would not this behavior cause the
> > refcount to
> > reach negative values?
> >
> > If so, is not there a problem? I mean, in some archs (powerpc included)
> > refcount_dec_and_test() will decrement and then test if the value is
> > equal 0. If we ever reach a negative value, this will cause that memory
> > to never be released.
> >
> > An example is that refcount_dec_and_test(), on other archs than x86,
> > will call atomic_dec_and_test(), which on include/linux/atomic-fallback.h
> > will do:
> >
> > return atomic_dec_return(v) == 0;
> >
> > To change this behavior, it would mean change the whole atomic_*_test
> > behavior, or do a copy function in order to change this '== 0' to
> > '<= 0'.
> >
> > Does it make sense? Do you need any help on this?
>
> I don't think so.  refcount_dec_and_test() will WARN on an underflow when
> the kernel is built with CONFIG_REFCOUNT_FULL=y.  I see no value in
> duplicating those sanity checks in KVM.
>
> This new helper and WARN is to explicitly catch @users_count unexpectedly
> hitting zero, which is orthogonal to an underflow (although odds are good
> that a bug that triggers the WARN in kvm_put_kvm_no_destroy() will also
> lead to an underflow).  Leaking the memory is deliberate as the alternative
> is a guaranteed use-after-free, i.e. kvm_put_kvm_no_destroy() is intended
> to be used when users_count is guaranteed to be valid after it is
> decremented.

I agree that a use-after-free is a worse problem than a memory leak, but I
think there is a way to solve this without leaking the memory either.

One option would be reordering the kvm_put_kvm(), like in this patch:
https://lkml.org/lkml/2019/11/26/517

The other would be creating a new atomic operation that checks whether the
counter is less than or equal to zero after the decrement:

	static inline bool atomic_dec_and_test_negative(atomic_t *v)
	{
		return atomic_dec_return(v) <= 0;
	}

and applying it to the generic refcount.

Do you think that would work?

Best regards,

Leonardo Bras
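
P.S. To make the difference between the '== 0' and '<= 0' tests concrete,
here is a small userspace sketch. It is only an illustration under stated
assumptions: it uses C11 <stdatomic.h> rather than the kernel's atomic_t, and
the demo_* names are made up for this mail and do not exist in the kernel.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's atomic_t (assumption, not kernel code). */
typedef struct {
	atomic_int counter;
} demo_atomic_t;

/*
 * Same test as the generic fallback quoted above: true only when the
 * value after the decrement is exactly 0.
 */
static bool demo_dec_and_test(demo_atomic_t *v)
{
	/* atomic_fetch_sub() returns the old value, so subtract 1 again. */
	return atomic_fetch_sub(&v->counter, 1) - 1 == 0;
}

/*
 * Hypothetical variant discussed in this mail: true when the value after
 * the decrement is zero or negative.
 */
static bool demo_dec_and_test_negative(demo_atomic_t *v)
{
	return atomic_fetch_sub(&v->counter, 1) - 1 <= 0;
}
```

Starting from a counter of 1, demo_dec_and_test() returns true once (1 -> 0)
and false on the next call (0 -> -1), while demo_dec_and_test_negative()
keeps returning true for every decrement that lands at or below zero.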