linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] x86/resctrl: Fix memory leak on kernfs dir removal
@ 2020-10-26 15:09 Willem de Bruijn
  2020-10-26 16:24 ` Reinette Chatre
  0 siblings, 1 reply; 3+ messages in thread
From: Willem de Bruijn @ 2020-10-26 15:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: fenghua.yu, reinette.chatre, tglx, mingo, bp, x86, hpa, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

Resctrl takes an extra kernfs ref on directory entries, to access
the entry on cleanup in rdtgroup_kn_unlock after removing the entire
subtree with kfree_remove.

But the path takes an extra ref both on mkdir and on rmdir.

The kernfs_get on mkdir causes a memleak in the unlikely exit with
error in the same function, as no extra kernfs_put exists and no extra
rdtgroup_kn_unlock occurs.

More importantly, essentially the same happens in the normal path, as
this simple program demonstrates:

    for i in {1..200000}; do
      mkdir /sys/fs/resctrl/task1
      rmdir /sys/fs/resctrl/task1
    done
    slabtop

When taking an extra ref for the duration of kernfs_remove, it is
easiest to reason about when holding this extra ref as short as
possible. For that, the refcnt on error reason and free on umount
(rmdir_all_sub), remove the first kernfs_get on mkdir, leaving the
other on rmdir.

As the caller of rdtgroup_rmdir, kernfs_iop_rmdir, itself takes a
reference on the kernfs object, the extra reference is possibly not
needed at all.

Fixes: 60cf5e101fd4 ("x86/intel_rdt: Add mkdir to resctrl file system")
Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 4d02ec8f371e..115a86bf6bd8 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2839,14 +2839,6 @@ static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
 	}
 	rdtgrp->kn = kn;
 
-	/*
-	 * kernfs_remove() will drop the reference count on "kn" which
-	 * will free it. But we still need it to stick around for the
-	 * rdtgroup_kn_unlock(kn} call below. Take one extra reference
-	 * here, which will be dropped inside rdtgroup_kn_unlock().
-	 */
-	kernfs_get(kn);
-
 	ret = rdtgroup_kn_set_ugid(kn);
 	if (ret) {
 		rdt_last_cmd_puts("kernfs perm error\n");
-- 
2.29.0.rc1.297.gfa9743e501-goog


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] x86/resctrl: Fix memory leak on kernfs dir removal
  2020-10-26 15:09 [PATCH] x86/resctrl: Fix memory leak on kernfs dir removal Willem de Bruijn
@ 2020-10-26 16:24 ` Reinette Chatre
  2020-10-26 16:48   ` Willem de Bruijn
  0 siblings, 1 reply; 3+ messages in thread
From: Reinette Chatre @ 2020-10-26 16:24 UTC (permalink / raw)
  To: Willem de Bruijn, linux-kernel
  Cc: fenghua.yu, tglx, mingo, bp, x86, hpa, Willem de Bruijn, Xiaochen Shen

+Xiaochen

Hi Willem,

As you described in the report you sent directly to us there are indeed 
more issues than the one described here surrounding the kernfs node 
reference counting in resctrl. Xiaochen is actively working on patch(es) 
for all the issues and you could continue working with him ... now 
externally?

On 10/26/2020 8:09 AM, Willem de Bruijn wrote:
> From: Willem de Bruijn <willemb@google.com>
> 
> Resctrl takes an extra kernfs ref on directory entries, to access
> the entry on cleanup in rdtgroup_kn_unlock after removing the entire
> subtree with kfree_remove.
> 
> But the path takes an extra ref both on mkdir and on rmdir.
> 

On resource group (control as well as monitoring) creation via a mkdir 
an extra kernfs node reference is obtained to ensure that the rdtgroup 
structure remains accessible for the rdtgroup_kn_unlock() calls where it 
is removed on deletion. This symmetry ties the resource group's lifetime 
with the kernfs node. The extra kernfs node reference count is dropped 
by kernfs_put() in rdtgroup_kn_unlock() as is documented in the comment 
removed by this patch.

As you state there is an extra reference obtained in rmdir, that is 
unnecessary.

> The kernfs_get on mkdir causes a memleak in the unlikely exit with
> error in the same function, as no extra kernfs_put exists and no extra
> rdtgroup_kn_unlock occurs.

This is a bug.

> 
> More importantly, essentially the same happens in the normal path, as
> this simple program demonstrates:
> 
>      for i in {1..200000}; do
>        mkdir /sys/fs/resctrl/task1
>        rmdir /sys/fs/resctrl/task1
>      done
>      slabtop
> 
> When taking an extra ref for the duration of kernfs_remove, it is
> easiest to reason about when holding this extra ref as short as
> possible. For that, the refcnt on error reason and free on umount
> (rmdir_all_sub), remove the first kernfs_get on mkdir, leaving the
> other on rmdir.

rmdir_all_sub() may be prevented from just removing the resource group 
if there are any waiters. In this case the resource group would be 
removed by rdtgroup_kn_unlock() by the last waiter at which point a 
reference would be dropped. With this patch there would be no reference 
to drop.

Indeed, there is another issue where the kfree(rdtgrp) in 
rmdir_all_sub() (the case when there are no waiters) is missing a 
kernfs_put(). Xiaochen is meticulously working through all of this.

> 
> As the caller of rdtgroup_rmdir, kernfs_iop_rmdir, itself takes a
> reference on the kernfs object, the extra reference is possibly not
> needed at all.

This is not obvious to me. Are you referring to 
kernfs_iop_rmdir()->kernfs_get_active(kn)? That is a different reference 
(kn->active as opposed to kn->count)?

> 
> Fixes: 60cf5e101fd4 ("x86/intel_rdt: Add mkdir to resctrl file system")
> Signed-off-by: Willem de Bruijn <willemb@google.com>
> ---
>   arch/x86/kernel/cpu/resctrl/rdtgroup.c | 8 --------
>   1 file changed, 8 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 4d02ec8f371e..115a86bf6bd8 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -2839,14 +2839,6 @@ static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
>   	}
>   	rdtgrp->kn = kn;
>   
> -	/*
> -	 * kernfs_remove() will drop the reference count on "kn" which
> -	 * will free it. But we still need it to stick around for the
> -	 * rdtgroup_kn_unlock(kn} call below. Take one extra reference
> -	 * here, which will be dropped inside rdtgroup_kn_unlock().
> -	 */
> -	kernfs_get(kn);
> -
>   	ret = rdtgroup_kn_set_ugid(kn);
>   	if (ret) {
>   		rdt_last_cmd_puts("kernfs perm error\n");
> 


Reinette

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] x86/resctrl: Fix memory leak on kernfs dir removal
  2020-10-26 16:24 ` Reinette Chatre
@ 2020-10-26 16:48   ` Willem de Bruijn
  0 siblings, 0 replies; 3+ messages in thread
From: Willem de Bruijn @ 2020-10-26 16:48 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Willem de Bruijn, linux-kernel, Fenghua Yu, Thomas Gleixner,
	mingo, bp, x86, HPA, Xiaochen Shen

On Mon, Oct 26, 2020 at 12:24 PM Reinette Chatre
<reinette.chatre@intel.com> wrote:
>
> +Xiaochen
>
> Hi Willem,
>
> As you described in the report you sent directly to us there are indeed
> more issues than the one described here surrounding the kernfs node
> reference counting in resctrl. Xiaochen is actively working on patch(es)
> for all the issues and you could continue working with him ... now
> externally?

Great to hear. I wasn't aware of that. Of course. Externally or
off-list first, whichever you prefer.

For reference, one other issue occurs on mount/umount:

    for i in {1..200000}; do
        mount -t resctrl resctrl /sys/fs/resctrl;
        umount /sys/fs/resctrl;
    done

> On 10/26/2020 8:09 AM, Willem de Bruijn wrote:
> > From: Willem de Bruijn <willemb@google.com>
> >
> > Resctrl takes an extra kernfs ref on directory entries, to access
> > the entry on cleanup in rdtgroup_kn_unlock after removing the entire
> > subtree with kfree_remove.
> >
> > But the path takes an extra ref both on mkdir and on rmdir.
> >
>
> On resource group (control as well as monitoring) creation via a mkdir
> an extra kernfs node reference is obtained to ensure that the rdtgroup
> structure remains accessible for the rdtgroup_kn_unlock() calls where it
> is removed on deletion. This symmetry ties the resource group's lifetime
> with the kernfs node. The extra kernfs node reference count is dropped
> by kernfs_put() in rdtgroup_kn_unlock() as is documented in the comment
> removed by this patch.
>
> As you state there is an extra reference obtained in rmdir, that is
> unnecessary.
>
> > The kernfs_get on mkdir causes a memleak in the unlikely exit with
> > error in the same function, as no extra kernfs_put exists and no extra
> > rdtgroup_kn_unlock occurs.
>
> This is a bug.
>
> >
> > More importantly, essentially the same happens in the normal path, as
> > this simple program demonstrates:
> >
> >      for i in {1..200000}; do
> >        mkdir /sys/fs/resctrl/task1
> >        rmdir /sys/fs/resctrl/task1
> >      done
> >      slabtop
> >
> > When taking an extra ref for the duration of kernfs_remove, it is
> > easiest to reason about when holding this extra ref as short as
> > possible. For that, the refcnt on error reason and free on umount
> > (rmdir_all_sub), remove the first kernfs_get on mkdir, leaving the
> > other on rmdir.
>
> rmdir_all_sub() may be prevented from just removing the resource group
> if there are any waiters. In this case the resource group would be
> removed by rdtgroup_kn_unlock() by the last waiter at which point a
> reference would be dropped. With this patch there would be no reference
> to drop.

Ah, indeed. It would be easier to reason about if rdtgroup_kn_lock_live
takes an extra ref that rdtgroup_kn_unlock releases? But either way.
I had certainly missed that path.

> Indeed, there is another issue where the kfree(rdtgrp) in
> rmdir_all_sub() (the case when there are no waiters) is missing a
> kernfs_put(). Xiaochen is meticulously working through all of this.
>
> >
> > As the caller of rdtgroup_rmdir, kernfs_iop_rmdir, itself takes a
> > reference on the kernfs object, the extra reference is possibly not
> > needed at all.
>
> This is not obvious to me. Are you referring to
> kernfs_iop_rmdir()->kernfs_get_active(kn)? That is a different reference
> (kn->active as opposed to kn->count)?

I thought that would have the same effect of ensuring that kn can
be dereferenced safely throughout rdtgroup_rmdir. But judging from the
WARN_ONCE in kernfs_put the rules on count vs active are not quite that
simple.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2020-10-26 16:50 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-26 15:09 [PATCH] x86/resctrl: Fix memory leak on kernfs dir removal Willem de Bruijn
2020-10-26 16:24 ` Reinette Chatre
2020-10-26 16:48   ` Willem de Bruijn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).