All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] KVM: release anon file in failure path of vm creation
@ 2016-07-12  9:38 Liu Shuo
  2016-07-12 10:24 ` Paolo Bonzini
  0 siblings, 1 reply; 9+ messages in thread
From: Liu Shuo @ 2016-07-12  9:38 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Zhang Yanmin, He Bo, Liu Shuo, Paolo Bonzini,
	Radim Krčmář

The failure of create debugfs of VM will return directly without release
the anon file. It will leak memory and file descriptors, even through
be not serious.

Signed-off-by: Liu Shuo <shuo.a.liu@intel.com>
---
 virt/kvm/kvm_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 48bd520..8322154 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -49,6 +49,7 @@
 #include <linux/slab.h>
 #include <linux/sort.h>
 #include <linux/bsearch.h>
+#include <linux/syscalls.h>
 
 #include <asm/processor.h>
 #include <asm/io.h>
@@ -3067,6 +3068,7 @@ static int kvm_dev_ioctl_create_vm(unsigned long type)
 
 	if (kvm_create_vm_debugfs(kvm, r) < 0) {
 		kvm_put_kvm(kvm);
+		sys_close(r);
 		return -ENOMEM;
 	}
 
-- 
1.9.4

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] KVM: release anon file in failure path of vm creation
  2016-07-12  9:38 [PATCH] KVM: release anon file in failure path of vm creation Liu Shuo
@ 2016-07-12 10:24 ` Paolo Bonzini
  2016-07-14 16:46   ` Al Viro
  0 siblings, 1 reply; 9+ messages in thread
From: Paolo Bonzini @ 2016-07-12 10:24 UTC (permalink / raw)
  To: Liu Shuo, linux-kernel, kvm
  Cc: Zhang Yanmin, He Bo, Liu Shuo, Radim Krčmář



On 12/07/2016 11:38, Liu Shuo wrote:
> The failure of create debugfs of VM will return directly without release
> the anon file. It will leak memory and file descriptors, even through
> be not serious.
> 
> Signed-off-by: Liu Shuo <shuo.a.liu@intel.com>
> ---
>  virt/kvm/kvm_main.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 48bd520..8322154 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -49,6 +49,7 @@
>  #include <linux/slab.h>
>  #include <linux/sort.h>
>  #include <linux/bsearch.h>
> +#include <linux/syscalls.h>
>  
>  #include <asm/processor.h>
>  #include <asm/io.h>
> @@ -3067,6 +3068,7 @@ static int kvm_dev_ioctl_create_vm(unsigned long type)
>  
>  	if (kvm_create_vm_debugfs(kvm, r) < 0) {
>  		kvm_put_kvm(kvm);
> +		sys_close(r);
>  		return -ENOMEM;
>  	}
>  
> 

Thanks, applied to kvm/master.

Paolo

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] KVM: release anon file in failure path of vm creation
  2016-07-12 10:24 ` Paolo Bonzini
@ 2016-07-14 16:46   ` Al Viro
  2016-07-14 16:56     ` Paolo Bonzini
  2016-07-15  2:22     ` Liu Shuo
  0 siblings, 2 replies; 9+ messages in thread
From: Al Viro @ 2016-07-14 16:46 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Liu Shuo, linux-kernel, kvm, Zhang Yanmin, He Bo, Liu Shuo,
	Radim Krčmář

On Tue, Jul 12, 2016 at 12:24:39PM +0200, Paolo Bonzini wrote:

> On 12/07/2016 11:38, Liu Shuo wrote:
> > The failure of create debugfs of VM will return directly without release
> > the anon file. It will leak memory and file descriptors, even through
> > be not serious.
> > 
> > Signed-off-by: Liu Shuo <shuo.a.liu@intel.com>

This is broken.

> > @@ -3067,6 +3068,7 @@ static int kvm_dev_ioctl_create_vm(unsigned long type)
> >  
> >  	if (kvm_create_vm_debugfs(kvm, r) < 0) {
> >  		kvm_put_kvm(kvm);
> > +		sys_close(r);

You have no warranty whatsoever that descriptor table has not been changed
by that point.  You should *NEVER* use sys_close() on failure exit paths
like that.  Moreover, this kvm_put_kvm() becomes a double-put, since
closing the damn file will drop that reference to kvm.

> >  		return -ENOMEM;
> >  	}
> >  
> > 
> 
> Thanks, applied to kvm/master.

Please, revert.  anon_inode_getfd() should be used only when there's no
possible failures past its call.  What you need in this case (due to fucking
ugly API - decriptor number used as part of debugfs pathname) is something
like this (against mainline):

[kvm] don't use anon_inode_getfd() before possible failures

Once anon_inode_getfd() has succeeded, it's impossible to undo in a clean
way and no, sys_close() is not usable in such cases.  Use anon_inode_getfile()
and get_unused_fd_flags() to get struct file and descriptor and do *not*
install the file into descriptor table until after the last possible failure
exit.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 48bd520..7f82c6c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3048,6 +3048,7 @@ static int kvm_dev_ioctl_create_vm(unsigned long type)
 {
 	int r;
 	struct kvm *kvm;
+	struct file *file;
 
 	kvm = kvm_create_vm(type);
 	if (IS_ERR(kvm))
@@ -3059,17 +3060,25 @@ static int kvm_dev_ioctl_create_vm(unsigned long type)
 		return r;
 	}
 #endif
-	r = anon_inode_getfd("kvm-vm", &kvm_vm_fops, kvm, O_RDWR | O_CLOEXEC);
+	r = get_unused_fd_flags(O_CLOEXEC);
 	if (r < 0) {
 		kvm_put_kvm(kvm);
 		return r;
 	}
+	file = anon_inode_getfile("kvm-vm", &kvm_vm_fops, kvm, O_RDWR);
+	if (IS_ERR(file)) {
+		put_unused_fd(r);
+		kvm_put_kvm(kvm);
+		return PTR_ERR(file);
+	}
 
 	if (kvm_create_vm_debugfs(kvm, r) < 0) {
-		kvm_put_kvm(kvm);
+		put_unused_fd(r);
+		fput(file);
 		return -ENOMEM;
 	}
 
+	fd_install(r, file);
 	return r;
 }
 

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] KVM: release anon file in failure path of vm creation
  2016-07-14 16:46   ` Al Viro
@ 2016-07-14 16:56     ` Paolo Bonzini
  2016-07-15  2:22     ` Liu Shuo
  1 sibling, 0 replies; 9+ messages in thread
From: Paolo Bonzini @ 2016-07-14 16:56 UTC (permalink / raw)
  To: Al Viro
  Cc: Liu Shuo, linux-kernel, kvm, Zhang Yanmin, He Bo, Liu Shuo,
	Radim Krčmář



On 14/07/2016 18:46, Al Viro wrote:
> On Tue, Jul 12, 2016 at 12:24:39PM +0200, Paolo Bonzini wrote:
> 
>> On 12/07/2016 11:38, Liu Shuo wrote:
>>> The failure of create debugfs of VM will return directly without release
>>> the anon file. It will leak memory and file descriptors, even through
>>> be not serious.
>>>
>>> Signed-off-by: Liu Shuo <shuo.a.liu@intel.com>
> 
> This is broken.
> 
>>> @@ -3067,6 +3068,7 @@ static int kvm_dev_ioctl_create_vm(unsigned long type)
>>>  
>>>  	if (kvm_create_vm_debugfs(kvm, r) < 0) {
>>>  		kvm_put_kvm(kvm);
>>> +		sys_close(r);
> 
> You have no warranty whatsoever that descriptor table has not been changed
> by that point.  You should *NEVER* use sys_close() on failure exit paths
> like that.  Moreover, this kvm_put_kvm() becomes a double-put, since
> closing the damn file will drop that reference to kvm.
> 
>>>  		return -ENOMEM;
>>>  	}
>>>  
>>>
>>
>> Thanks, applied to kvm/master.
> 
> Please, revert.  anon_inode_getfd() should be used only when there's no
> possible failures past its call.  What you need in this case (due to fucking
> ugly API - decriptor number used as part of debugfs pathname) is something
> like this (against mainline):

Al,

thanks very much for the explanation and the patch.  I've reverted the
original patch and installed yours instead.

Paolo

> [kvm] don't use anon_inode_getfd() before possible failures
> 
> Once anon_inode_getfd() has succeeded, it's impossible to undo in a clean
> way and no, sys_close() is not usable in such cases.  Use anon_inode_getfile()
> and get_unused_fd_flags() to get struct file and descriptor and do *not*
> install the file into descriptor table until after the last possible failure
> exit.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 48bd520..7f82c6c 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3048,6 +3048,7 @@ static int kvm_dev_ioctl_create_vm(unsigned long type)
>  {
>  	int r;
>  	struct kvm *kvm;
> +	struct file *file;
>  
>  	kvm = kvm_create_vm(type);
>  	if (IS_ERR(kvm))
> @@ -3059,17 +3060,25 @@ static int kvm_dev_ioctl_create_vm(unsigned long type)
>  		return r;
>  	}
>  #endif
> -	r = anon_inode_getfd("kvm-vm", &kvm_vm_fops, kvm, O_RDWR | O_CLOEXEC);
> +	r = get_unused_fd_flags(O_CLOEXEC);
>  	if (r < 0) {
>  		kvm_put_kvm(kvm);
>  		return r;
>  	}
> +	file = anon_inode_getfile("kvm-vm", &kvm_vm_fops, kvm, O_RDWR);
> +	if (IS_ERR(file)) {
> +		put_unused_fd(r);
> +		kvm_put_kvm(kvm);
> +		return PTR_ERR(file);
> +	}
>  
>  	if (kvm_create_vm_debugfs(kvm, r) < 0) {
> -		kvm_put_kvm(kvm);
> +		put_unused_fd(r);
> +		fput(file);
>  		return -ENOMEM;
>  	}
>  
> +	fd_install(r, file);
>  	return r;git
>  }
>  
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] KVM: release anon file in failure path of vm creation
  2016-07-14 16:46   ` Al Viro
  2016-07-14 16:56     ` Paolo Bonzini
@ 2016-07-15  2:22     ` Liu Shuo
  2016-07-15  2:26       ` Al Viro
  1 sibling, 1 reply; 9+ messages in thread
From: Liu Shuo @ 2016-07-15  2:22 UTC (permalink / raw)
  To: Al Viro
  Cc: Paolo Bonzini, linux-kernel, kvm, Zhang Yanmin, He Bo, Liu Shuo,
	Radim Krčmář

On Thu 14.Jul'16 at 17:46:47 +0100, Al Viro wrote:
>On Tue, Jul 12, 2016 at 12:24:39PM +0200, Paolo Bonzini wrote:
>
>> On 12/07/2016 11:38, Liu Shuo wrote:
>> > The failure of create debugfs of VM will return directly without release
>> > the anon file. It will leak memory and file descriptors, even through
>> > be not serious.
>> >
>> > Signed-off-by: Liu Shuo <shuo.a.liu@intel.com>
>
>This is broken.
>
>> > @@ -3067,6 +3068,7 @@ static int kvm_dev_ioctl_create_vm(unsigned long type)
>> >
>> >  	if (kvm_create_vm_debugfs(kvm, r) < 0) {
>> >  		kvm_put_kvm(kvm);
>> > +		sys_close(r);
>
>You have no warranty whatsoever that descriptor table has not been changed
>by that point.  You should *NEVER* use sys_close() on failure exit paths
Could you please elaborate why we're not sure descriptor table's changing at the point?
>like that.  Moreover, this kvm_put_kvm() becomes a double-put, since
>closing the damn file will drop that reference to kvm.
You're right. Thanks for the correction on this point.
>
>> >  		return -ENOMEM;
>> >  	}
>> >
>> >
>>
>> Thanks, applied to kvm/master.
>
>Please, revert.  anon_inode_getfd() should be used only when there's no
>possible failures past its call.  What you need in this case (due to fucking
>ugly API - decriptor number used as part of debugfs pathname) is something
>like this (against mainline):
>
>[kvm] don't use anon_inode_getfd() before possible failures
>
>Once anon_inode_getfd() has succeeded, it's impossible to undo in a clean
>way and no, sys_close() is not usable in such cases.  Use anon_inode_getfile()
>and get_unused_fd_flags() to get struct file and descriptor and do *not*
>install the file into descriptor table until after the last possible failure
>exit.
>
>Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Thanks. It's better.
>---
>diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>index 48bd520..7f82c6c 100644
>--- a/virt/kvm/kvm_main.c
>+++ b/virt/kvm/kvm_main.c
>@@ -3048,6 +3048,7 @@ static int kvm_dev_ioctl_create_vm(unsigned long type)
> {
> 	int r;
> 	struct kvm *kvm;
>+	struct file *file;
>
> 	kvm = kvm_create_vm(type);
> 	if (IS_ERR(kvm))
>@@ -3059,17 +3060,25 @@ static int kvm_dev_ioctl_create_vm(unsigned long type)
> 		return r;
> 	}
> #endif
>-	r = anon_inode_getfd("kvm-vm", &kvm_vm_fops, kvm, O_RDWR | O_CLOEXEC);
>+	r = get_unused_fd_flags(O_CLOEXEC);
> 	if (r < 0) {
> 		kvm_put_kvm(kvm);
> 		return r;
> 	}
>+	file = anon_inode_getfile("kvm-vm", &kvm_vm_fops, kvm, O_RDWR);
>+	if (IS_ERR(file)) {
>+		put_unused_fd(r);
>+		kvm_put_kvm(kvm);
>+		return PTR_ERR(file);
>+	}
>
> 	if (kvm_create_vm_debugfs(kvm, r) < 0) {
>-		kvm_put_kvm(kvm);
>+		put_unused_fd(r);
>+		fput(file);
> 		return -ENOMEM;
> 	}
>
>+	fd_install(r, file);
> 	return r;
> }
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] KVM: release anon file in failure path of vm creation
  2016-07-15  2:22     ` Liu Shuo
@ 2016-07-15  2:26       ` Al Viro
  2016-07-15  3:18         ` Liu Shuo
  0 siblings, 1 reply; 9+ messages in thread
From: Al Viro @ 2016-07-15  2:26 UTC (permalink / raw)
  To: Liu Shuo
  Cc: Paolo Bonzini, linux-kernel, kvm, Zhang Yanmin, He Bo, Liu Shuo,
	Radim Krčmář

On Fri, Jul 15, 2016 at 10:22:04AM +0800, Liu Shuo wrote:
> > You have no warranty whatsoever that descriptor table has not been changed
> > by that point.  You should *NEVER* use sys_close() on failure exit paths
> Could you please elaborate why we're not sure descriptor table's changing at the point?

Because that could be called by one thread while another (having guessed the
descriptor you are about to get) does close()/dup2()/etc.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] KVM: release anon file in failure path of vm creation
  2016-07-15  2:26       ` Al Viro
@ 2016-07-15  3:18         ` Liu Shuo
  2016-07-15  5:09           ` Al Viro
  0 siblings, 1 reply; 9+ messages in thread
From: Liu Shuo @ 2016-07-15  3:18 UTC (permalink / raw)
  To: Al Viro
  Cc: Paolo Bonzini, linux-kernel, kvm, Zhang Yanmin, He Bo, Liu Shuo,
	Radim Krčmář

On Fri 15.Jul'16 at  3:26:03 +0100, Al Viro wrote:
>On Fri, Jul 15, 2016 at 10:22:04AM +0800, Liu Shuo wrote:
>> > You have no warranty whatsoever that descriptor table has not been changed
>> > by that point.  You should *NEVER* use sys_close() on failure exit paths
>> Could you please elaborate why we're not sure descriptor table's changing at the point?
>
>Because that could be called by one thread while another (having guessed the
>descriptor you are about to get) does close()/dup2()/etc.
If there is no such thread (who operates the descriptor based on
guessing), i can think the changing is safe at the point. As the fd has
not been delivered to userspace. Am i right?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] KVM: release anon file in failure path of vm creation
  2016-07-15  3:18         ` Liu Shuo
@ 2016-07-15  5:09           ` Al Viro
  2016-07-15  7:04             ` Liu Shuo
  0 siblings, 1 reply; 9+ messages in thread
From: Al Viro @ 2016-07-15  5:09 UTC (permalink / raw)
  To: Liu Shuo
  Cc: Paolo Bonzini, linux-kernel, kvm, Zhang Yanmin, He Bo, Liu Shuo,
	Radim Krčmář

On Fri, Jul 15, 2016 at 11:18:41AM +0800, Liu Shuo wrote:

> If there is no such thread (who operates the descriptor based on
> guessing), i can think the changing is safe at the point. As the fd has
> not been delivered to userspace. Am i right?

<wry>
Expecting nice behaviour from userland code is something best avoided, really.
</wry>

All jokes aside, this other thread doesn't have to be malicious - just being
buggy would suffice.  Besides, you never know if something like userns won't
be dumped into the kernel, making your ioctl accessible to genuinely
malicious code.

The only sane approach is to treat descriptor tables as shared data structures
and postpone the insertion of struct file reference into descriptor table
until you are past all failure exits.  Including the ones related to copying
to userland - e.g. pipe(2) creates a pipe, sets up two struct file associated
with it, reserves two descriptors, copies them into userland array and only if
everything has succeeded proceeds to fd_install().  In your case passing the
descriptor to userland is not an issue (return value of ioctl(2) goes via
register and that can't fail), so the last failure exit is that after failed
attempt to create debugfs stuff.  We have to reserve the descriptor before
that (it's used as a part of debugfs directory name), so anon_inode_getfd()
is not an option - it combines reserving descriptor with fd_install().
Such situations are exactly the reason why anon_inode_getfile() is there;
anon_inode_getfd() is usable only when it is the very last thing we do
before returning the descriptor to userland.

FWIW, original code was not unreasonable - it simply treated debugfs stuff as
optional and ignored those failures.  That way anon_inode_getfd() is fine -
there's no failure exits after it.  If we want to fail when debugfs had
been enabled and we'd failed to populate it, we need to use the real primitives
behind anon_inode_getfd(), though.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] KVM: release anon file in failure path of vm creation
  2016-07-15  5:09           ` Al Viro
@ 2016-07-15  7:04             ` Liu Shuo
  0 siblings, 0 replies; 9+ messages in thread
From: Liu Shuo @ 2016-07-15  7:04 UTC (permalink / raw)
  To: Al Viro
  Cc: Paolo Bonzini, linux-kernel, kvm, Zhang Yanmin, He Bo, Liu Shuo,
	Radim Krčmář

On Fri 15.Jul'16 at  6:09:59 +0100, Al Viro wrote:
>On Fri, Jul 15, 2016 at 11:18:41AM +0800, Liu Shuo wrote:
>
>> If there is no such thread (who operates the descriptor based on
>> guessing), i can think the changing is safe at the point. As the fd has
>> not been delivered to userspace. Am i right?
>
><wry>
>Expecting nice behaviour from userland code is something best avoided, really.
></wry>
Got it! :)
>
>All jokes aside, this other thread doesn't have to be malicious - just being
>buggy would suffice.  Besides, you never know if something like userns won't
>be dumped into the kernel, making your ioctl accessible to genuinely
>malicious code.
>
>The only sane approach is to treat descriptor tables as shared data structures
>and postpone the insertion of struct file reference into descriptor table
>until you are past all failure exits.  Including the ones related to copying
>to userland - e.g. pipe(2) creates a pipe, sets up two struct file associated
>with it, reserves two descriptors, copies them into userland array and only if
>everything has succeeded proceeds to fd_install().  In your case passing the
>descriptor to userland is not an issue (return value of ioctl(2) goes via
>register and that can't fail), so the last failure exit is that after failed
>attempt to create debugfs stuff.  We have to reserve the descriptor before
>that (it's used as a part of debugfs directory name), so anon_inode_getfd()
>is not an option - it combines reserving descriptor with fd_install().
>Such situations are exactly the reason why anon_inode_getfile() is there;
>anon_inode_getfd() is usable only when it is the very last thing we do
>before returning the descriptor to userland.
Really appreciate your exhaustive explanations. Thanks.
>
>FWIW, original code was not unreasonable - it simply treated debugfs stuff as
>optional and ignored those failures.  That way anon_inode_getfd() is fine -
>there's no failure exits after it.  If we want to fail when debugfs had
>been enabled and we'd failed to populate it, we need to use the real primitives
>behind anon_inode_getfd(), though.
So, my firstly immature idea returns back.
Is the fd used by debugfs name very useful? Do we really need it here?
Alternative approach is putting the fd into the debugfs, then the code
will be clean here, as we can put anon_inode_getfd at very last of the
ioctl.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2016-07-15  7:04 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-12  9:38 [PATCH] KVM: release anon file in failure path of vm creation Liu Shuo
2016-07-12 10:24 ` Paolo Bonzini
2016-07-14 16:46   ` Al Viro
2016-07-14 16:56     ` Paolo Bonzini
2016-07-15  2:22     ` Liu Shuo
2016-07-15  2:26       ` Al Viro
2016-07-15  3:18         ` Liu Shuo
2016-07-15  5:09           ` Al Viro
2016-07-15  7:04             ` Liu Shuo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.