linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH kernel] KVM: Stop leaking memory in debugfs
@ 2021-07-30  4:32 Alexey Kardashevskiy
  2021-08-03 11:16 ` Greg KH
  0 siblings, 1 reply; 6+ messages in thread
From: Alexey Kardashevskiy @ 2021-07-30  4:32 UTC (permalink / raw)
  To: linux-kernel; +Cc: Alexey Kardashevskiy, kvm, Paolo Bonzini

The debugfs folder name is made of a supposedly unique pair of
the process pid and a VM fd. However it is possible to get a race here
which manifests in these messages:

[  471.846235] debugfs: Directory '20245-4' with parent 'kvm' already present!

debugfs_create_dir() returns an error which is handled correctly
everywhere except kvm_create_vm_debugfs() where the code allocates
stat data structs and overwrites the older values regardless.

Spotted by syzkaller. This slow memory leak produces way too many
OOM reports.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---

I am pretty sure we better fix the race but I am not quite sure what
lock is appropriate here, ideas?

---
 virt/kvm/kvm_main.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 986959833d70..89496fd8127a 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -904,6 +904,10 @@ static int kvm_create_vm_debugfs(struct kvm *kvm, int fd)
 
 	snprintf(dir_name, sizeof(dir_name), "%d-%d", task_pid_nr(current), fd);
 	kvm->debugfs_dentry = debugfs_create_dir(dir_name, kvm_debugfs_dir);
+	if (IS_ERR_OR_NULL(kvm->debugfs_dentry)) {
+		pr_err("Failed to create %s\n", dir_name);
+		return 0;
+	}
 
 	kvm->debugfs_stat_data = kcalloc(kvm_debugfs_num_entries,
 					 sizeof(*kvm->debugfs_stat_data),
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH kernel] KVM: Stop leaking memory in debugfs
  2021-07-30  4:32 [RFC PATCH kernel] KVM: Stop leaking memory in debugfs Alexey Kardashevskiy
@ 2021-08-03 11:16 ` Greg KH
  2021-08-03 12:52   ` Paolo Bonzini
  0 siblings, 1 reply; 6+ messages in thread
From: Greg KH @ 2021-08-03 11:16 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: linux-kernel, kvm, Paolo Bonzini

On Fri, Jul 30, 2021 at 02:32:17PM +1000, Alexey Kardashevskiy wrote:
> The debugfs folder name is made of a supposedly unique pair of
> the process pid and a VM fd. However it is possible to get a race here
> which manifests in these messages:
> 
> [  471.846235] debugfs: Directory '20245-4' with parent 'kvm' already present!
> 
> debugfs_create_dir() returns an error which is handled correctly
> everywhere except kvm_create_vm_debugfs() where the code allocates
> stat data structs and overwrites the older values regardless.
> 
> Spotted by syzkaller. This slow memory leak produces way too many
> OOM reports.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> 
> I am pretty sure we better fix the race but I am not quite sure what
> lock is appropriate here, ideas?
> 
> ---
>  virt/kvm/kvm_main.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 986959833d70..89496fd8127a 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -904,6 +904,10 @@ static int kvm_create_vm_debugfs(struct kvm *kvm, int fd)
>  
>  	snprintf(dir_name, sizeof(dir_name), "%d-%d", task_pid_nr(current), fd);
>  	kvm->debugfs_dentry = debugfs_create_dir(dir_name, kvm_debugfs_dir);
> +	if (IS_ERR_OR_NULL(kvm->debugfs_dentry)) {
> +		pr_err("Failed to create %s\n", dir_name);
> +		return 0;
> +	}

It should not matter if you fail a debugfs call at all.

If there is a larger race at work here, please fix that root cause, do
not paper over it by attempting to have debugfs catch the issue for you.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH kernel] KVM: Stop leaking memory in debugfs
  2021-08-03 11:16 ` Greg KH
@ 2021-08-03 12:52   ` Paolo Bonzini
  2021-08-03 13:11     ` Greg KH
  2021-08-03 13:13     ` Alexey Kardashevskiy
  0 siblings, 2 replies; 6+ messages in thread
From: Paolo Bonzini @ 2021-08-03 12:52 UTC (permalink / raw)
  To: Greg KH; +Cc: Alexey Kardashevskiy, Kernel Mailing List, Linux, kvm

On Tue, Aug 3, 2021 at 1:16 PM Greg KH <gregkh@linuxfoundation.org> wrote:
> On Fri, Jul 30, 2021 at 02:32:17PM +1000, Alexey Kardashevskiy wrote:
> >       snprintf(dir_name, sizeof(dir_name), "%d-%d", task_pid_nr(current), fd);
> >       kvm->debugfs_dentry = debugfs_create_dir(dir_name, kvm_debugfs_dir);
> > +     if (IS_ERR_OR_NULL(kvm->debugfs_dentry)) {
> > +             pr_err("Failed to create %s\n", dir_name);
> > +             return 0;
> > +     }
>
> It should not matter if you fail a debugfs call at all.
>
> If there is a larger race at work here, please fix that root cause, do
> not paper over it by attempting to have debugfs catch the issue for you.

I don't think it's a race, it's really just a bug that is intrinsic in how
the debugfs files are named.  You can just do something like this:

#include <unistd.h>
#include <stdio.h>
#include <fcntl.h>
#include <sys/wait.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>
#include <stdlib.h>
int main() {
        int kvmfd = open("/dev/kvm", O_RDONLY);
        int fd = ioctl(kvmfd, KVM_CREATE_VM, 0);
        if (fork() == 0) {
                printf("before: %d\n", fd);
                sleep(2);
        } else {
                close(fd);
                sleep(1);
                int fd = ioctl(kvmfd, KVM_CREATE_VM, 0);
                printf("after: %d\n", fd);
                wait(NULL);
        }
}

So Alexey's patch is okay and I've queued it, though with pr_warn_ratelimited
instead of pr_err.

Paolo


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH kernel] KVM: Stop leaking memory in debugfs
  2021-08-03 12:52   ` Paolo Bonzini
@ 2021-08-03 13:11     ` Greg KH
  2021-08-03 13:29       ` Paolo Bonzini
  2021-08-03 13:13     ` Alexey Kardashevskiy
  1 sibling, 1 reply; 6+ messages in thread
From: Greg KH @ 2021-08-03 13:11 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Alexey Kardashevskiy, Kernel Mailing List, Linux, kvm

On Tue, Aug 03, 2021 at 02:52:17PM +0200, Paolo Bonzini wrote:
> On Tue, Aug 3, 2021 at 1:16 PM Greg KH <gregkh@linuxfoundation.org> wrote:
> > On Fri, Jul 30, 2021 at 02:32:17PM +1000, Alexey Kardashevskiy wrote:
> > >       snprintf(dir_name, sizeof(dir_name), "%d-%d", task_pid_nr(current), fd);
> > >       kvm->debugfs_dentry = debugfs_create_dir(dir_name, kvm_debugfs_dir);
> > > +     if (IS_ERR_OR_NULL(kvm->debugfs_dentry)) {
> > > +             pr_err("Failed to create %s\n", dir_name);
> > > +             return 0;
> > > +     }
> >
> > It should not matter if you fail a debugfs call at all.
> >
> > If there is a larger race at work here, please fix that root cause, do
> > not paper over it by attempting to have debugfs catch the issue for you.
> 
> I don't think it's a race, it's really just a bug that is intrinsic in how
> the debugfs files are named.  You can just do something like this:
> 
> #include <unistd.h>
> #include <stdio.h>
> #include <fcntl.h>
> #include <sys/wait.h>
> #include <sys/ioctl.h>
> #include <linux/kvm.h>
> #include <stdlib.h>
> int main() {
>         int kvmfd = open("/dev/kvm", O_RDONLY);
>         int fd = ioctl(kvmfd, KVM_CREATE_VM, 0);
>         if (fork() == 0) {
>                 printf("before: %d\n", fd);
>                 sleep(2);
>         } else {
>                 close(fd);
>                 sleep(1);
>                 int fd = ioctl(kvmfd, KVM_CREATE_VM, 0);
>                 printf("after: %d\n", fd);
>                 wait(NULL);
>         }
> }
> 
> So Alexey's patch is okay and I've queued it, though with pr_warn_ratelimited
> instead of pr_err.

So userspace can create kvm resources with duplicate names?  That feels
wrong to me.

But if all that is "duplicate" is the debugfs kvm directory, why not ask
debugfs if it is already present before trying to create it again?  That
way you will not have debugfs complain about duplicate
files/directories.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH kernel] KVM: Stop leaking memory in debugfs
  2021-08-03 12:52   ` Paolo Bonzini
  2021-08-03 13:11     ` Greg KH
@ 2021-08-03 13:13     ` Alexey Kardashevskiy
  1 sibling, 0 replies; 6+ messages in thread
From: Alexey Kardashevskiy @ 2021-08-03 13:13 UTC (permalink / raw)
  To: Paolo Bonzini, Greg KH; +Cc: Kernel Mailing List, Linux, kvm



On 03/08/2021 22:52, Paolo Bonzini wrote:
> On Tue, Aug 3, 2021 at 1:16 PM Greg KH <gregkh@linuxfoundation.org> wrote:
>> On Fri, Jul 30, 2021 at 02:32:17PM +1000, Alexey Kardashevskiy wrote:
>>>        snprintf(dir_name, sizeof(dir_name), "%d-%d", task_pid_nr(current), fd);
>>>        kvm->debugfs_dentry = debugfs_create_dir(dir_name, kvm_debugfs_dir);
>>> +     if (IS_ERR_OR_NULL(kvm->debugfs_dentry)) {
>>> +             pr_err("Failed to create %s\n", dir_name);
>>> +             return 0;
>>> +     }
>>
>> It should not matter if you fail a debugfs call at all.
>>
>> If there is a larger race at work here, please fix that root cause, do
>> not paper over it by attempting to have debugfs catch the issue for you.
> 
> I don't think it's a race, it's really just a bug that is intrinsic in how
> the debugfs files are named.  You can just do something like this:
> 
> #include <unistd.h>
> #include <stdio.h>
> #include <fcntl.h>
> #include <sys/wait.h>
> #include <sys/ioctl.h>
> #include <linux/kvm.h>
> #include <stdlib.h>
> int main() {
>          int kvmfd = open("/dev/kvm", O_RDONLY);
>          int fd = ioctl(kvmfd, KVM_CREATE_VM, 0);
>          if (fork() == 0) {
>                  printf("before: %d\n", fd);
>                  sleep(2);
>          } else {
>                  close(fd);
>                  sleep(1);
>                  int fd = ioctl(kvmfd, KVM_CREATE_VM, 0);
>                  printf("after: %d\n", fd);
>                  wait(NULL);
>          }
> }

oh nice demo :)

although I still think there was a race when I saw it as there was no 
fork() in the picture but continuous create/destroy VM in 16 threads on 
16 VCPUs with no KVM_RUN in between.

> 
> So Alexey's patch is okay and I've queued it, though with pr_warn_ratelimited
> instead of pr_err.

Makes sense with your reproducer. Thanks,


-- 
Alexey

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH kernel] KVM: Stop leaking memory in debugfs
  2021-08-03 13:11     ` Greg KH
@ 2021-08-03 13:29       ` Paolo Bonzini
  0 siblings, 0 replies; 6+ messages in thread
From: Paolo Bonzini @ 2021-08-03 13:29 UTC (permalink / raw)
  To: Greg KH; +Cc: Alexey Kardashevskiy, Kernel Mailing List, Linux, kvm

> So userspace can create kvm resources with duplicate names?  That feels
> wrong to me.

Yes, the name is just the (pid, file descriptor). It's used only for
debugfs, and it's not really likely going to happen unless a program
does it on purpose, but the ugliness/wrongness is one of the reasons
why we now have a non-debugfs mechanism to retrieve the stats.

> But if all that is "duplicate" is the debugfs kvm directory, why not ask
> debugfs if it is already present before trying to create it again?  That
> way you will not have debugfs complain about duplicate
> files/directories.

That would also be racy; it would need a simple mutex around
debugfs_lookup and debugfs_create_dir. But it would indeed avoid the
complaints altogether, so I'll prepare a patch.  Thanks,

Paolo


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-08-03 13:29 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-30  4:32 [RFC PATCH kernel] KVM: Stop leaking memory in debugfs Alexey Kardashevskiy
2021-08-03 11:16 ` Greg KH
2021-08-03 12:52   ` Paolo Bonzini
2021-08-03 13:11     ` Greg KH
2021-08-03 13:29       ` Paolo Bonzini
2021-08-03 13:13     ` Alexey Kardashevskiy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).