* [PATCH v2] mm, memcg: Add a memcg_slabinfo debugfs file
@ 2019-06-19 17:16 Waiman Long
  2019-06-19 23:48 ` Shakeel Butt
  0 siblings, 1 reply; 5+ messages in thread
From: Waiman Long @ 2019-06-19 17:16 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton
  Cc: linux-mm, linux-kernel, Michal Hocko, Roman Gushchin,
	Johannes Weiner, Shakeel Butt, Vladimir Davydov, Waiman Long

There are concerns about memory leaks from extensive use of memory
cgroups as each memory cgroup creates its own set of kmem caches. There
is a possibility that the memcg kmem caches may remain even after the
memory cgroups have been offlined. Therefore, it will be useful to show
the status of each memcg kmem cache.

This patch introduces a new <debugfs>/memcg_slabinfo file which is
somewhat similar to /proc/slabinfo in format, but lists only information
about kmem caches that have child memcg kmem caches. Information
available in /proc/slabinfo is not repeated in memcg_slabinfo.

A portion of a sample output of the file was:

  # <name> <css_id[:dead]> <active_objs> <num_objs> <active_slabs> <num_slabs>
  rpc_inode_cache   root          13     51      1      1
  rpc_inode_cache     48           0      0      0      0
  fat_inode_cache   root           1     45      1      1
  fat_inode_cache     41           2     45      1      1
  xfs_inode         root         770    816     24     24
  xfs_inode           92          22     34      1      1
  xfs_inode           88:dead      1     34      1      1
  xfs_inode           89:dead     23     34      1      1
  xfs_inode           85           4     34      1      1
  xfs_inode           84           9     34      1      1

The css id of the memcg is also listed. If a memcg is not online,
the tag ":dead" will be attached as shown above.

Suggested-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Waiman Long <longman@redhat.com>
---
 mm/slab_common.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)

diff --git a/mm/slab_common.c b/mm/slab_common.c
index 58251ba63e4a..2bca1558a722 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -17,6 +17,7 @@
 #include <linux/uaccess.h>
 #include <linux/seq_file.h>
 #include <linux/proc_fs.h>
+#include <linux/debugfs.h>
 #include <asm/cacheflush.h>
 #include <asm/tlbflush.h>
 #include <asm/page.h>
@@ -1498,6 +1499,62 @@ static int __init slab_proc_init(void)
 	return 0;
 }
 module_init(slab_proc_init);
+
+#if defined(CONFIG_DEBUG_FS) && defined(CONFIG_MEMCG_KMEM)
+/*
+ * Display information about kmem caches that have child memcg caches.
+ */
+static int memcg_slabinfo_show(struct seq_file *m, void *unused)
+{
+	struct kmem_cache *s, *c;
+	struct slabinfo sinfo;
+
+	mutex_lock(&slab_mutex);
+	seq_puts(m, "# <name> <css_id[:dead]> <active_objs> <num_objs>");
+	seq_puts(m, " <active_slabs> <num_slabs>\n");
+	list_for_each_entry(s, &slab_root_caches, root_caches_node) {
+		/*
+		 * Skip kmem caches that don't have any memcg children.
+		 */
+		if (list_empty(&s->memcg_params.children))
+			continue;
+
+		memset(&sinfo, 0, sizeof(sinfo));
+		get_slabinfo(s, &sinfo);
+		seq_printf(m, "%-17s root      %6lu %6lu %6lu %6lu\n",
+			   cache_name(s), sinfo.active_objs, sinfo.num_objs,
+			   sinfo.active_slabs, sinfo.num_slabs);
+
+		for_each_memcg_cache(c, s) {
+			struct cgroup_subsys_state *css;
+			char *dead = "";
+
+			css = &c->memcg_params.memcg->css;
+			if (!(css->flags & CSS_ONLINE))
+				dead = ":dead";
+
+			memset(&sinfo, 0, sizeof(sinfo));
+			get_slabinfo(c, &sinfo);
+			seq_printf(m, "%-17s %4d%5s %6lu %6lu %6lu %6lu\n",
+				   cache_name(c), css->id, dead,
+				   sinfo.active_objs, sinfo.num_objs,
+				   sinfo.active_slabs, sinfo.num_slabs);
+		}
+	}
+	mutex_unlock(&slab_mutex);
+	return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(memcg_slabinfo);
+
+static int __init memcg_slabinfo_init(void)
+{
+	debugfs_create_file("memcg_slabinfo", S_IFREG | S_IRUGO,
+			    NULL, NULL, &memcg_slabinfo_fops);
+	return 0;
+}
+
+late_initcall(memcg_slabinfo_init);
+#endif /* CONFIG_DEBUG_FS && CONFIG_MEMCG_KMEM */
 #endif /* CONFIG_SLAB || CONFIG_SLUB_DEBUG */
 
 static __always_inline void *__do_krealloc(const void *p, size_t new_size,
-- 
2.18.1



* Re: [PATCH v2] mm, memcg: Add a memcg_slabinfo debugfs file
  2019-06-19 17:16 [PATCH v2] mm, memcg: Add a memcg_slabinfo debugfs file Waiman Long
@ 2019-06-19 23:48 ` Shakeel Butt
  2019-06-20 14:23   ` Waiman Long
  0 siblings, 1 reply; 5+ messages in thread
From: Shakeel Butt @ 2019-06-19 23:48 UTC (permalink / raw)
  To: Waiman Long
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Linux MM, LKML, Michal Hocko, Roman Gushchin,
	Johannes Weiner, Vladimir Davydov

Hi Waiman,

On Wed, Jun 19, 2019 at 10:16 AM Waiman Long <longman@redhat.com> wrote:
>
> There are concerns about memory leaks from extensive use of memory
> cgroups as each memory cgroup creates its own set of kmem caches. There
> is a possibility that the memcg kmem caches may remain even after the
> memory cgroups have been offlined. Therefore, it will be useful to show
> the status of each memcg kmem cache.
>
> This patch introduces a new <debugfs>/memcg_slabinfo file which is
> somewhat similar to /proc/slabinfo in format, but lists only information
> about kmem caches that have child memcg kmem caches. Information
> available in /proc/slabinfo is not repeated in memcg_slabinfo.
>
> A portion of a sample output of the file was:
>
>   # <name> <css_id[:dead]> <active_objs> <num_objs> <active_slabs> <num_slabs>
>   rpc_inode_cache   root          13     51      1      1
>   rpc_inode_cache     48           0      0      0      0
>   fat_inode_cache   root           1     45      1      1
>   fat_inode_cache     41           2     45      1      1
>   xfs_inode         root         770    816     24     24
>   xfs_inode           92          22     34      1      1
>   xfs_inode           88:dead      1     34      1      1
>   xfs_inode           89:dead     23     34      1      1
>   xfs_inode           85           4     34      1      1
>   xfs_inode           84           9     34      1      1
>
> The css id of the memcg is also listed. If a memcg is not online,
> the tag ":dead" will be attached as shown above.
>
> Suggested-by: Shakeel Butt <shakeelb@google.com>
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  mm/slab_common.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 57 insertions(+)
>
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 58251ba63e4a..2bca1558a722 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -17,6 +17,7 @@
>  #include <linux/uaccess.h>
>  #include <linux/seq_file.h>
>  #include <linux/proc_fs.h>
> +#include <linux/debugfs.h>
>  #include <asm/cacheflush.h>
>  #include <asm/tlbflush.h>
>  #include <asm/page.h>
> @@ -1498,6 +1499,62 @@ static int __init slab_proc_init(void)
>         return 0;
>  }
>  module_init(slab_proc_init);
> +
> +#if defined(CONFIG_DEBUG_FS) && defined(CONFIG_MEMCG_KMEM)
> +/*
> + * Display information about kmem caches that have child memcg caches.
> + */
> +static int memcg_slabinfo_show(struct seq_file *m, void *unused)
> +{
> +       struct kmem_cache *s, *c;
> +       struct slabinfo sinfo;
> +
> +       mutex_lock(&slab_mutex);

On large machines there can be thousands of memcgs and potentially
each memcg can have hundreds of kmem caches. So, the slab_mutex can be
held for a very long time.

Our internal implementation traverses the memcg tree and then
traverses 'memcg->kmem_caches' within the slab_mutex (and
cond_resched() after unlock).
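
(Not the actual implementation, just a rough sketch of that traversal
pattern, assuming the mem_cgroup_iter() helper and the per-memcg
'kmem_caches' list of this kernel version:)

    struct mem_cgroup *memcg;
    struct kmem_cache *c;
    struct slabinfo sinfo;

    for (memcg = mem_cgroup_iter(NULL, NULL, NULL); memcg;
         memcg = mem_cgroup_iter(NULL, memcg, NULL)) {
            /* hold slab_mutex only while walking this memcg's caches */
            mutex_lock(&slab_mutex);
            list_for_each_entry(c, &memcg->kmem_caches,
                                memcg_params.kmem_caches_node) {
                    memset(&sinfo, 0, sizeof(sinfo));
                    get_slabinfo(c, &sinfo);
                    /* ... emit one line for this cache ... */
            }
            mutex_unlock(&slab_mutex);
            /* give other slab_mutex users a chance between memcgs */
            cond_resched();
    }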

> +       seq_puts(m, "# <name> <css_id[:dead]> <active_objs> <num_objs>");
> +       seq_puts(m, " <active_slabs> <num_slabs>\n");
> +       list_for_each_entry(s, &slab_root_caches, root_caches_node) {
> +               /*
> +                * Skip kmem caches that don't have any memcg children.
> +                */
> +               if (list_empty(&s->memcg_params.children))
> +                       continue;
> +
> +               memset(&sinfo, 0, sizeof(sinfo));
> +               get_slabinfo(s, &sinfo);
> +               seq_printf(m, "%-17s root      %6lu %6lu %6lu %6lu\n",
> +                          cache_name(s), sinfo.active_objs, sinfo.num_objs,
> +                          sinfo.active_slabs, sinfo.num_slabs);
> +
> +               for_each_memcg_cache(c, s) {
> +                       struct cgroup_subsys_state *css;
> +                       char *dead = "";
> +
> +                       css = &c->memcg_params.memcg->css;
> +                       if (!(css->flags & CSS_ONLINE))
> +                               dead = ":dead";

Please note that Roman's kmem cache reparenting patch series has made
kmem caches of zombie memcgs a bit tricky. On memcg offlining, the
memcg kmem caches are reparented and the css->id can get recycled. So,
we want to know both that a kmem cache is reparented and which memcg it
belonged to initially. To determine if a kmem cache is reparented, we
can store a flag on the kmem cache, and for the previous memcg we can
use fhandle. However, to not make this more complicated, for now we
can just have the info that the kmem cache was reparented, i.e. it
belongs to an offlined memcg.
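
(Purely as an illustration of the flag idea, with a hypothetical
'reparented' bit rather than an actual field from Roman's series:)

    /*
     * Hypothetical flag, set when the cache is moved to the parent
     * memcg at offlining time.
     */
    if (c->memcg_params.reparented)
            dead = ":reparented";
    else if (!(css->flags & CSS_ONLINE))
            dead = ":dead";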

> +
> +                       memset(&sinfo, 0, sizeof(sinfo));
> +                       get_slabinfo(c, &sinfo);
> +                       seq_printf(m, "%-17s %4d%5s %6lu %6lu %6lu %6lu\n",
> +                                  cache_name(c), css->id, dead,
> +                                  sinfo.active_objs, sinfo.num_objs,
> +                                  sinfo.active_slabs, sinfo.num_slabs);
> +               }
> +       }
> +       mutex_unlock(&slab_mutex);
> +       return 0;
> +}
> +DEFINE_SHOW_ATTRIBUTE(memcg_slabinfo);
> +
> +static int __init memcg_slabinfo_init(void)
> +{
> +       debugfs_create_file("memcg_slabinfo", S_IFREG | S_IRUGO,
> +                           NULL, NULL, &memcg_slabinfo_fops);
> +       return 0;
> +}
> +
> +late_initcall(memcg_slabinfo_init);
> +#endif /* CONFIG_DEBUG_FS && CONFIG_MEMCG_KMEM */
>  #endif /* CONFIG_SLAB || CONFIG_SLUB_DEBUG */
>
>  static __always_inline void *__do_krealloc(const void *p, size_t new_size,
> --
> 2.18.1
>


* Re: [PATCH v2] mm, memcg: Add a memcg_slabinfo debugfs file
  2019-06-19 23:48 ` Shakeel Butt
@ 2019-06-20 14:23   ` Waiman Long
  2019-06-20 14:39     ` Shakeel Butt
  0 siblings, 1 reply; 5+ messages in thread
From: Waiman Long @ 2019-06-20 14:23 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Linux MM, LKML, Michal Hocko, Roman Gushchin,
	Johannes Weiner, Vladimir Davydov

On 6/19/19 7:48 PM, Shakeel Butt wrote:
> Hi Waiman,
>
> On Wed, Jun 19, 2019 at 10:16 AM Waiman Long <longman@redhat.com> wrote:
>> There are concerns about memory leaks from extensive use of memory
>> cgroups as each memory cgroup creates its own set of kmem caches. There
>> is a possibility that the memcg kmem caches may remain even after the
>> memory cgroups have been offlined. Therefore, it will be useful to show
>> the status of each memcg kmem cache.
>>
>> This patch introduces a new <debugfs>/memcg_slabinfo file which is
>> somewhat similar to /proc/slabinfo in format, but lists only information
>> about kmem caches that have child memcg kmem caches. Information
>> available in /proc/slabinfo is not repeated in memcg_slabinfo.
>>
>> A portion of a sample output of the file was:
>>
>>   # <name> <css_id[:dead]> <active_objs> <num_objs> <active_slabs> <num_slabs>
>>   rpc_inode_cache   root          13     51      1      1
>>   rpc_inode_cache     48           0      0      0      0
>>   fat_inode_cache   root           1     45      1      1
>>   fat_inode_cache     41           2     45      1      1
>>   xfs_inode         root         770    816     24     24
>>   xfs_inode           92          22     34      1      1
>>   xfs_inode           88:dead      1     34      1      1
>>   xfs_inode           89:dead     23     34      1      1
>>   xfs_inode           85           4     34      1      1
>>   xfs_inode           84           9     34      1      1
>>
>> The css id of the memcg is also listed. If a memcg is not online,
>> the tag ":dead" will be attached as shown above.
>>
>> Suggested-by: Shakeel Butt <shakeelb@google.com>
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>>  mm/slab_common.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 57 insertions(+)
>>
>> diff --git a/mm/slab_common.c b/mm/slab_common.c
>> index 58251ba63e4a..2bca1558a722 100644
>> --- a/mm/slab_common.c
>> +++ b/mm/slab_common.c
>> @@ -17,6 +17,7 @@
>>  #include <linux/uaccess.h>
>>  #include <linux/seq_file.h>
>>  #include <linux/proc_fs.h>
>> +#include <linux/debugfs.h>
>>  #include <asm/cacheflush.h>
>>  #include <asm/tlbflush.h>
>>  #include <asm/page.h>
>> @@ -1498,6 +1499,62 @@ static int __init slab_proc_init(void)
>>         return 0;
>>  }
>>  module_init(slab_proc_init);
>> +
>> +#if defined(CONFIG_DEBUG_FS) && defined(CONFIG_MEMCG_KMEM)
>> +/*
>> + * Display information about kmem caches that have child memcg caches.
>> + */
>> +static int memcg_slabinfo_show(struct seq_file *m, void *unused)
>> +{
>> +       struct kmem_cache *s, *c;
>> +       struct slabinfo sinfo;
>> +
>> +       mutex_lock(&slab_mutex);
> On large machines there can be thousands of memcgs and potentially
> each memcg can have hundreds of kmem caches. So, the slab_mutex can be
> held for a very long time.

But that is also what /proc/slabinfo does by doing mutex_lock() at
slab_start() and mutex_unlock() at slab_stop(). So the same problem will
happen when /proc/slabinfo is being read.
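
(For reference, the corresponding code in mm/slab_common.c is roughly:)

    void *slab_start(struct seq_file *m, loff_t *pos)
    {
            mutex_lock(&slab_mutex);        /* held across the whole read */
            return seq_list_start(&slab_root_caches, *pos);
    }

    void slab_stop(struct seq_file *m, void *p)
    {
            mutex_unlock(&slab_mutex);
    }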

When you are in a situation where reading /proc/slabinfo takes a long time
because of the large number of memcgs, the system is in some kind of
trouble anyway. I am not saying that we should never improve the
scalability of this patch. It is just that some nasty race conditions may
pop up if we release the lock and re-acquire it later. That would greatly
complicate the code to handle all those edge cases.

> Our internal implementation traverses the memcg tree and then
> traverses 'memcg->kmem_caches' within the slab_mutex (and
> cond_resched() after unlock).
For cgroup v1, setting the CONFIG_SLUB_DEBUG option allows you to
iterate and display slabinfo just for that particular memcg. I am
thinking of extending the debug controller to do a similar thing for
cgroup v2.
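
(For reference, the cgroup v1 file in question is memory.kmem.slabinfo;
its hookup in mm/memcontrol.c is roughly the following cftype entry:)

    #if defined(CONFIG_MEMCG_KMEM) && \
            (defined(CONFIG_SLAB) || defined(CONFIG_SLUB_DEBUG))
            {
                    .name = "kmem.slabinfo",
                    .seq_start = memcg_slab_start,
                    .seq_next = memcg_slab_next,
                    .seq_stop = memcg_slab_stop,
                    .seq_show = memcg_slab_show,
            },
    #endif
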
>> +       seq_puts(m, "# <name> <css_id[:dead]> <active_objs> <num_objs>");
>> +       seq_puts(m, " <active_slabs> <num_slabs>\n");
>> +       list_for_each_entry(s, &slab_root_caches, root_caches_node) {
>> +               /*
>> +                * Skip kmem caches that don't have any memcg children.
>> +                */
>> +               if (list_empty(&s->memcg_params.children))
>> +                       continue;
>> +
>> +               memset(&sinfo, 0, sizeof(sinfo));
>> +               get_slabinfo(s, &sinfo);
>> +               seq_printf(m, "%-17s root      %6lu %6lu %6lu %6lu\n",
>> +                          cache_name(s), sinfo.active_objs, sinfo.num_objs,
>> +                          sinfo.active_slabs, sinfo.num_slabs);
>> +
>> +               for_each_memcg_cache(c, s) {
>> +                       struct cgroup_subsys_state *css;
>> +                       char *dead = "";
>> +
>> +                       css = &c->memcg_params.memcg->css;
>> +                       if (!(css->flags & CSS_ONLINE))
>> +                               dead = ":dead";
> Please note that Roman's kmem cache reparenting patch series has made
> kmem caches of zombie memcgs a bit tricky. On memcg offlining, the
> memcg kmem caches are reparented and the css->id can get recycled. So,
> we want to know both that a kmem cache is reparented and which memcg it
> belonged to initially. To determine if a kmem cache is reparented, we
> can store a flag on the kmem cache, and for the previous memcg we can
> use fhandle. However, to not make this more complicated, for now we
> can just have the info that the kmem cache was reparented, i.e. it
> belongs to an offlined memcg.

I need to play with Roman's kmem cache reparenting patch a bit more to
see how to properly recognize a reparented kmem cache. What I have
noticed is that the dead kmem caches that I saw at boot up were gone
after applying his patch. So that is a good thing.

For now, I think the current patch is good enough for its purpose. I may
send a follow-up if I see something that can be improved.

Cheers,
Longman



* Re: [PATCH v2] mm, memcg: Add a memcg_slabinfo debugfs file
  2019-06-20 14:23   ` Waiman Long
@ 2019-06-20 14:39     ` Shakeel Butt
  2019-06-20 14:48       ` Waiman Long
  0 siblings, 1 reply; 5+ messages in thread
From: Shakeel Butt @ 2019-06-20 14:39 UTC (permalink / raw)
  To: Waiman Long
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Linux MM, LKML, Michal Hocko, Roman Gushchin,
	Johannes Weiner, Vladimir Davydov

On Thu, Jun 20, 2019 at 7:24 AM Waiman Long <longman@redhat.com> wrote:
>
> On 6/19/19 7:48 PM, Shakeel Butt wrote:
> > Hi Waiman,
> >
> > On Wed, Jun 19, 2019 at 10:16 AM Waiman Long <longman@redhat.com> wrote:
> >> There are concerns about memory leaks from extensive use of memory
> >> cgroups as each memory cgroup creates its own set of kmem caches. There
> >> is a possibility that the memcg kmem caches may remain even after the
> >> memory cgroups have been offlined. Therefore, it will be useful to show
> >> the status of each memcg kmem cache.
> >>
> >> This patch introduces a new <debugfs>/memcg_slabinfo file which is
> >> somewhat similar to /proc/slabinfo in format, but lists only information
> >> about kmem caches that have child memcg kmem caches. Information
> >> available in /proc/slabinfo is not repeated in memcg_slabinfo.
> >>
> >> A portion of a sample output of the file was:
> >>
> >>   # <name> <css_id[:dead]> <active_objs> <num_objs> <active_slabs> <num_slabs>
> >>   rpc_inode_cache   root          13     51      1      1
> >>   rpc_inode_cache     48           0      0      0      0
> >>   fat_inode_cache   root           1     45      1      1
> >>   fat_inode_cache     41           2     45      1      1
> >>   xfs_inode         root         770    816     24     24
> >>   xfs_inode           92          22     34      1      1
> >>   xfs_inode           88:dead      1     34      1      1
> >>   xfs_inode           89:dead     23     34      1      1
> >>   xfs_inode           85           4     34      1      1
> >>   xfs_inode           84           9     34      1      1
> >>
> >> The css id of the memcg is also listed. If a memcg is not online,
> >> the tag ":dead" will be attached as shown above.
> >>
> >> Suggested-by: Shakeel Butt <shakeelb@google.com>
> >> Signed-off-by: Waiman Long <longman@redhat.com>
> >> ---
> >>  mm/slab_common.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 57 insertions(+)
> >>
> >> diff --git a/mm/slab_common.c b/mm/slab_common.c
> >> index 58251ba63e4a..2bca1558a722 100644
> >> --- a/mm/slab_common.c
> >> +++ b/mm/slab_common.c
> >> @@ -17,6 +17,7 @@
> >>  #include <linux/uaccess.h>
> >>  #include <linux/seq_file.h>
> >>  #include <linux/proc_fs.h>
> >> +#include <linux/debugfs.h>
> >>  #include <asm/cacheflush.h>
> >>  #include <asm/tlbflush.h>
> >>  #include <asm/page.h>
> >> @@ -1498,6 +1499,62 @@ static int __init slab_proc_init(void)
> >>         return 0;
> >>  }
> >>  module_init(slab_proc_init);
> >> +
> >> +#if defined(CONFIG_DEBUG_FS) && defined(CONFIG_MEMCG_KMEM)
> >> +/*
> >> + * Display information about kmem caches that have child memcg caches.
> >> + */
> >> +static int memcg_slabinfo_show(struct seq_file *m, void *unused)
> >> +{
> >> +       struct kmem_cache *s, *c;
> >> +       struct slabinfo sinfo;
> >> +
> >> +       mutex_lock(&slab_mutex);
> > On large machines there can be thousands of memcgs and potentially
> > each memcg can have hundreds of kmem caches. So, the slab_mutex can be
> > held for a very long time.
>
> But that is also what /proc/slabinfo does by doing mutex_lock() at
> slab_start() and mutex_unlock() at slab_stop(). So the same problem will
> happen when /proc/slabinfo is being read.
>
> When you are in a situation where reading /proc/slabinfo takes a long time
> because of the large number of memcgs, the system is in some kind of
> trouble anyway. I am not saying that we should never improve the
> scalability of this patch. It is just that some nasty race conditions may
> pop up if we release the lock and re-acquire it later. That would greatly
> complicate the code to handle all those edge cases.
>

We have been using that interface and implementation for a couple of
years and have not seen any race conditions. However, I am fine with
what you have here for now. We can always come back if we think we
need to improve it.

> > Our internal implementation traverses the memcg tree and then
> > traverses 'memcg->kmem_caches' within the slab_mutex (and
> > cond_resched() after unlock).
> For cgroup v1, setting the CONFIG_SLUB_DEBUG option allows you to
> iterate and display slabinfo just for that particular memcg. I am
> thinking of extending the debug controller to do a similar thing for
> cgroup v2.

I was also planning to look into that and it seems like you are
already on it. Do CC me the patches.

> >> +       seq_puts(m, "# <name> <css_id[:dead]> <active_objs> <num_objs>");
> >> +       seq_puts(m, " <active_slabs> <num_slabs>\n");
> >> +       list_for_each_entry(s, &slab_root_caches, root_caches_node) {
> >> +               /*
> >> +                * Skip kmem caches that don't have any memcg children.
> >> +                */
> >> +               if (list_empty(&s->memcg_params.children))
> >> +                       continue;
> >> +
> >> +               memset(&sinfo, 0, sizeof(sinfo));
> >> +               get_slabinfo(s, &sinfo);
> >> +               seq_printf(m, "%-17s root      %6lu %6lu %6lu %6lu\n",
> >> +                          cache_name(s), sinfo.active_objs, sinfo.num_objs,
> >> +                          sinfo.active_slabs, sinfo.num_slabs);
> >> +
> >> +               for_each_memcg_cache(c, s) {
> >> +                       struct cgroup_subsys_state *css;
> >> +                       char *dead = "";
> >> +
> >> +                       css = &c->memcg_params.memcg->css;
> >> +                       if (!(css->flags & CSS_ONLINE))
> >> +                               dead = ":dead";
> > Please note that Roman's kmem cache reparenting patch series has made
> > kmem caches of zombie memcgs a bit tricky. On memcg offlining, the
> > memcg kmem caches are reparented and the css->id can get recycled. So,
> > we want to know both that a kmem cache is reparented and which memcg it
> > belonged to initially. To determine if a kmem cache is reparented, we
> > can store a flag on the kmem cache, and for the previous memcg we can
> > use fhandle. However, to not make this more complicated, for now we
> > can just have the info that the kmem cache was reparented, i.e. it
> > belongs to an offlined memcg.
>
> I need to play with Roman's kmem cache reparenting patch a bit more to
> see how to properly recognize a reparented kmem cache. What I have
> noticed is that the dead kmem caches that I saw at boot up were gone
> after applying his patch. So that is a good thing.
>

By gone, do you mean the kmem cache got freed, or that the kmem cache is
now part of an online parent memcg and thus no longer a dead kmem cache?

> For now, I think the current patch is good enough for its purpose. I may
> send a follow-up if I see something that can be improved.
>

I would like to see the recognition of reparented kmem caches in this
patch. However, if others are ok with the current status of the patch,
then I will not stand in the way.

thanks,
Shakeel


* Re: [PATCH v2] mm, memcg: Add a memcg_slabinfo debugfs file
  2019-06-20 14:39     ` Shakeel Butt
@ 2019-06-20 14:48       ` Waiman Long
  0 siblings, 0 replies; 5+ messages in thread
From: Waiman Long @ 2019-06-20 14:48 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Linux MM, LKML, Michal Hocko, Roman Gushchin,
	Johannes Weiner, Vladimir Davydov

On 6/20/19 10:39 AM, Shakeel Butt wrote:
> On Thu, Jun 20, 2019 at 7:24 AM Waiman Long <longman@redhat.com> wrote:
>> On 6/19/19 7:48 PM, Shakeel Butt wrote:
>>> Hi Waiman,
>>>
>>> On Wed, Jun 19, 2019 at 10:16 AM Waiman Long <longman@redhat.com> wrote:
>>>> There are concerns about memory leaks from extensive use of memory
>>>> cgroups as each memory cgroup creates its own set of kmem caches. There
>>>> is a possibility that the memcg kmem caches may remain even after the
>>>> memory cgroups have been offlined. Therefore, it will be useful to show
>>>> the status of each memcg kmem cache.
>>>>
>>>> This patch introduces a new <debugfs>/memcg_slabinfo file which is
>>>> somewhat similar to /proc/slabinfo in format, but lists only information
>>>> about kmem caches that have child memcg kmem caches. Information
>>>> available in /proc/slabinfo is not repeated in memcg_slabinfo.
>>>>
>>>> A portion of a sample output of the file was:
>>>>
>>>>   # <name> <css_id[:dead]> <active_objs> <num_objs> <active_slabs> <num_slabs>
>>>>   rpc_inode_cache   root          13     51      1      1
>>>>   rpc_inode_cache     48           0      0      0      0
>>>>   fat_inode_cache   root           1     45      1      1
>>>>   fat_inode_cache     41           2     45      1      1
>>>>   xfs_inode         root         770    816     24     24
>>>>   xfs_inode           92          22     34      1      1
>>>>   xfs_inode           88:dead      1     34      1      1
>>>>   xfs_inode           89:dead     23     34      1      1
>>>>   xfs_inode           85           4     34      1      1
>>>>   xfs_inode           84           9     34      1      1
>>>>
>>>> The css id of the memcg is also listed. If a memcg is not online,
>>>> the tag ":dead" will be attached as shown above.
>>>>
>>>> Suggested-by: Shakeel Butt <shakeelb@google.com>
>>>> Signed-off-by: Waiman Long <longman@redhat.com>
>>>> ---
>>>>  mm/slab_common.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++++
>>>>  1 file changed, 57 insertions(+)
>>>>
>>>> diff --git a/mm/slab_common.c b/mm/slab_common.c
>>>> index 58251ba63e4a..2bca1558a722 100644
>>>> --- a/mm/slab_common.c
>>>> +++ b/mm/slab_common.c
>>>> @@ -17,6 +17,7 @@
>>>>  #include <linux/uaccess.h>
>>>>  #include <linux/seq_file.h>
>>>>  #include <linux/proc_fs.h>
>>>> +#include <linux/debugfs.h>
>>>>  #include <asm/cacheflush.h>
>>>>  #include <asm/tlbflush.h>
>>>>  #include <asm/page.h>
>>>> @@ -1498,6 +1499,62 @@ static int __init slab_proc_init(void)
>>>>         return 0;
>>>>  }
>>>>  module_init(slab_proc_init);
>>>> +
>>>> +#if defined(CONFIG_DEBUG_FS) && defined(CONFIG_MEMCG_KMEM)
>>>> +/*
>>>> + * Display information about kmem caches that have child memcg caches.
>>>> + */
>>>> +static int memcg_slabinfo_show(struct seq_file *m, void *unused)
>>>> +{
>>>> +       struct kmem_cache *s, *c;
>>>> +       struct slabinfo sinfo;
>>>> +
>>>> +       mutex_lock(&slab_mutex);
>>> On large machines there can be thousands of memcgs and potentially
>>> each memcg can have hundreds of kmem caches. So, the slab_mutex can be
>>> held for a very long time.
>> But that is also what /proc/slabinfo does by doing mutex_lock() at
>> slab_start() and mutex_unlock() at slab_stop(). So the same problem will
>> happen when /proc/slabinfo is being read.
>>
>> When you are in a situation where reading /proc/slabinfo takes a long time
>> because of the large number of memcgs, the system is in some kind of
>> trouble anyway. I am not saying that we should never improve the
>> scalability of this patch. It is just that some nasty race conditions may
>> pop up if we release the lock and re-acquire it later. That would greatly
>> complicate the code to handle all those edge cases.
>>
> We have been using that interface and implementation for a couple of
> years and have not seen any race conditions. However, I am fine with
> what you have here for now. We can always come back if we think we
> need to improve it.
>
>>> Our internal implementation traverses the memcg tree and then
>>> traverses 'memcg->kmem_caches' within the slab_mutex (and
>>> cond_resched() after unlock).
>> For cgroup v1, setting the CONFIG_SLUB_DEBUG option allows you to
>> iterate and display slabinfo just for that particular memcg. I am
>> thinking of extending the debug controller to do a similar thing for
>> cgroup v2.
> I was also planning to look into that and it seems like you are
> already on it. Do CC me the patches.
>
Sure.


>>>> +       seq_puts(m, "# <name> <css_id[:dead]> <active_objs> <num_objs>");
>>>> +       seq_puts(m, " <active_slabs> <num_slabs>\n");
>>>> +       list_for_each_entry(s, &slab_root_caches, root_caches_node) {
>>>> +               /*
>>>> +                * Skip kmem caches that don't have any memcg children.
>>>> +                */
>>>> +               if (list_empty(&s->memcg_params.children))
>>>> +                       continue;
>>>> +
>>>> +               memset(&sinfo, 0, sizeof(sinfo));
>>>> +               get_slabinfo(s, &sinfo);
>>>> +               seq_printf(m, "%-17s root      %6lu %6lu %6lu %6lu\n",
>>>> +                          cache_name(s), sinfo.active_objs, sinfo.num_objs,
>>>> +                          sinfo.active_slabs, sinfo.num_slabs);
>>>> +
>>>> +               for_each_memcg_cache(c, s) {
>>>> +                       struct cgroup_subsys_state *css;
>>>> +                       char *dead = "";
>>>> +
>>>> +                       css = &c->memcg_params.memcg->css;
>>>> +                       if (!(css->flags & CSS_ONLINE))
>>>> +                               dead = ":dead";
>>> Please note that Roman's kmem cache reparenting patch series has made
>>> kmem caches of zombie memcgs a bit tricky. On memcg offlining, the
>>> memcg kmem caches are reparented and the css->id can get recycled. So,
>>> we want to know both that a kmem cache is reparented and which memcg it
>>> belonged to initially. To determine if a kmem cache is reparented, we
>>> can store a flag on the kmem cache, and for the previous memcg we can
>>> use fhandle. However, to not make this more complicated, for now we
>>> can just have the info that the kmem cache was reparented, i.e. it
>>> belongs to an offlined memcg.
>> I need to play with Roman's kmem cache reparenting patch a bit more to
>> see how to properly recognize a reparented kmem cache. What I have
>> noticed is that the dead kmem caches that I saw at boot up were gone
>> after applying his patch. So that is a good thing.
>>
> By gone, do you mean the kmem cache got freed, or that the kmem cache is
> now part of an online parent memcg and thus no longer a dead kmem cache?
I just look at the online flag of the memcg's css. All of them are
online when the iteration is being done after Roman's patch. I will
probably need to check if reparenting has happened.
>
>> For now, I think the current patch is good enough for its purpose. I may
>> send a follow-up if I see something that can be improved.
>>
> I would like to see the recognition of reparented kmem caches in this
> patch. However, if others are ok with the current status of the patch,
> then I will not stand in the way.

As I said, I will work on a follow-up patch to recognize reparenting.

Cheers,
Longman

