linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/8] memcg-vs-slab related fixes, improvements, cleanups
@ 2014-02-02 16:33 Vladimir Davydov
  2014-02-02 16:33 ` [PATCH 1/8] memcg: export kmemcg cache id via cgroup fs Vladimir Davydov
                   ` (7 more replies)
  0 siblings, 8 replies; 33+ messages in thread
From: Vladimir Davydov @ 2014-02-02 16:33 UTC (permalink / raw)
  To: akpm
  Cc: mhocko, rientjes, penberg, cl, glommer, linux-mm, linux-kernel, devel

Hi,

This patch set mostly cleanups memcg slab caches creation/destruction
paths fixing a couple of bugs in the meanwhile. However, it does
introduce some functional changes. First, it changes the memcg caches
naming convention (see patch 2). Second, it reworks sysfs layout for
memcg slub caches (see patch 6).

Comments are appreciated.

Thanks.

Vladimir Davydov (8):
  memcg: export kmemcg cache id via cgroup fs
  memcg, slab: remove cgroup name from memcg cache names
  memcg, slab: never try to merge memcg caches
  memcg, slab: separate memcg vs root cache creation paths
  slub: adjust memcg caches when creating cache alias
  slub: rework sysfs layout for memcg caches
  memcg, slab: unregister cache from memcg before starting to destroy
    it
  memcg, slab: do not destroy children caches if parent has aliases

 include/linux/memcontrol.h |   13 +--
 include/linux/slab.h       |    9 +-
 include/linux/slub_def.h   |    3 +
 mm/memcontrol.c            |   85 +++++++------------
 mm/slab.h                  |   36 ++++----
 mm/slab_common.c           |  194 ++++++++++++++++++++++++++++----------------
 mm/slub.c                  |  121 +++++++++++++++++++++------
 7 files changed, 277 insertions(+), 184 deletions(-)

-- 
1.7.10.4


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 1/8] memcg: export kmemcg cache id via cgroup fs
  2014-02-02 16:33 [PATCH 0/8] memcg-vs-slab related fixes, improvements, cleanups Vladimir Davydov
@ 2014-02-02 16:33 ` Vladimir Davydov
  2014-02-03  6:21   ` David Rientjes
  2014-02-04 14:40   ` Michal Hocko
  2014-02-02 16:33 ` [PATCH 2/8] memcg, slab: remove cgroup name from memcg cache names Vladimir Davydov
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 33+ messages in thread
From: Vladimir Davydov @ 2014-02-02 16:33 UTC (permalink / raw)
  To: akpm
  Cc: mhocko, rientjes, penberg, cl, glommer, linux-mm, linux-kernel, devel

Per-memcg kmem caches are named as follows:

  <global-cache-name>(<cgroup-kmem-id>:<cgroup-name>)

where <cgroup-kmem-id> is the unique id of the memcg the cache belongs
to, <cgroup-name> is the relative name of the memcg on the cgroup fs.
Cache names are exposed to userspace for debugging purposes (e.g. via
sysfs in case of slub or via dmesg).

Using relative names makes it impossible in general (in case the cgroup
hierarchy is not flat) to find out which memcg a particular cache
belongs to, because <cgroup-kmem-id> is not known to the user. Since
using absolute cgroup names would be an overkill, let's fix this by
exporting the id of kmem-active memcg via cgroup fs file
"memory.kmem.id".

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
---
 mm/memcontrol.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 53385cd4e6f0..91d242707404 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3113,6 +3113,14 @@ int memcg_cache_id(struct mem_cgroup *memcg)
 	return memcg ? memcg->kmemcg_id : -1;
 }
 
+static s64 mem_cgroup_cache_id_read(struct cgroup_subsys_state *css,
+				    struct cftype *cft)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
+
+	return memcg_can_account_kmem(memcg) ? memcg_cache_id(memcg) : -1;
+}
+
 static size_t memcg_caches_array_size(int num_groups)
 {
 	ssize_t size;
@@ -6301,6 +6309,10 @@ static struct cftype mem_cgroup_files[] = {
 #endif
 #ifdef CONFIG_MEMCG_KMEM
 	{
+		.name = "kmem.id",
+		.read_s64 = mem_cgroup_cache_id_read,
+	},
+	{
 		.name = "kmem.limit_in_bytes",
 		.private = MEMFILE_PRIVATE(_KMEM, RES_LIMIT),
 		.write_string = mem_cgroup_write,
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 2/8] memcg, slab: remove cgroup name from memcg cache names
  2014-02-02 16:33 [PATCH 0/8] memcg-vs-slab related fixes, improvements, cleanups Vladimir Davydov
  2014-02-02 16:33 ` [PATCH 1/8] memcg: export kmemcg cache id via cgroup fs Vladimir Davydov
@ 2014-02-02 16:33 ` Vladimir Davydov
  2014-02-04 14:45   ` Michal Hocko
  2014-02-02 16:33 ` [PATCH 3/8] memcg, slab: never try to merge memcg caches Vladimir Davydov
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 33+ messages in thread
From: Vladimir Davydov @ 2014-02-02 16:33 UTC (permalink / raw)
  To: akpm
  Cc: mhocko, rientjes, penberg, cl, glommer, linux-mm, linux-kernel, devel

The cgroup name is not informative at all in case the cgroup hierarchy
is not flat. Besides, we can always find the memcg a particular cache
belongs to by its kmemcg id, which is now exported via "memory.kmem.id"
cgroup fs file for each memcg.

So let's remove the cgroup name part from kmem caches names - it will
greatly simplify the call paths and make the code look clearer.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
---
 mm/memcontrol.c  |   63 +++++++++++++-----------------------------------------
 mm/slab_common.c |    6 +++++-
 2 files changed, 20 insertions(+), 49 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 91d242707404..3351c5b5486d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3405,44 +3405,6 @@ void mem_cgroup_destroy_cache(struct kmem_cache *cachep)
 	schedule_work(&cachep->memcg_params->destroy);
 }
 
-static struct kmem_cache *memcg_create_kmem_cache(struct mem_cgroup *memcg,
-						  struct kmem_cache *s)
-{
-	struct kmem_cache *new = NULL;
-	static char *tmp_name = NULL;
-	static DEFINE_MUTEX(mutex);	/* protects tmp_name */
-
-	BUG_ON(!memcg_can_account_kmem(memcg));
-
-	mutex_lock(&mutex);
-	/*
-	 * kmem_cache_create_memcg duplicates the given name and
-	 * cgroup_name for this name requires RCU context.
-	 * This static temporary buffer is used to prevent from
-	 * pointless shortliving allocation.
-	 */
-	if (!tmp_name) {
-		tmp_name = kmalloc(PATH_MAX, GFP_KERNEL);
-		if (!tmp_name)
-			goto out;
-	}
-
-	rcu_read_lock();
-	snprintf(tmp_name, PATH_MAX, "%s(%d:%s)", s->name,
-			 memcg_cache_id(memcg), cgroup_name(memcg->css.cgroup));
-	rcu_read_unlock();
-
-	new = kmem_cache_create_memcg(memcg, tmp_name, s->object_size, s->align,
-				      (s->flags & ~SLAB_PANIC), s->ctor, s);
-	if (new)
-		new->allocflags |= __GFP_KMEMCG;
-	else
-		new = s;
-out:
-	mutex_unlock(&mutex);
-	return new;
-}
-
 void kmem_cache_destroy_memcg_children(struct kmem_cache *s)
 {
 	struct kmem_cache *c;
@@ -3489,12 +3451,6 @@ void kmem_cache_destroy_memcg_children(struct kmem_cache *s)
 	mutex_unlock(&activate_kmem_mutex);
 }
 
-struct create_work {
-	struct mem_cgroup *memcg;
-	struct kmem_cache *cachep;
-	struct work_struct work;
-};
-
 static void mem_cgroup_destroy_all_caches(struct mem_cgroup *memcg)
 {
 	struct kmem_cache *cachep;
@@ -3512,13 +3468,24 @@ static void mem_cgroup_destroy_all_caches(struct mem_cgroup *memcg)
 	mutex_unlock(&memcg->slab_caches_mutex);
 }
 
+struct create_work {
+	struct mem_cgroup *memcg;
+	struct kmem_cache *cachep;
+	struct work_struct work;
+};
+
 static void memcg_create_cache_work_func(struct work_struct *w)
 {
-	struct create_work *cw;
+	struct create_work *cw = container_of(w, struct create_work, work);
+	struct mem_cgroup *memcg = cw->memcg;
+	struct kmem_cache *s = cw->cachep;
+	struct kmem_cache *new;
 
-	cw = container_of(w, struct create_work, work);
-	memcg_create_kmem_cache(cw->memcg, cw->cachep);
-	css_put(&cw->memcg->css);
+	new = kmem_cache_create_memcg(memcg, s->name, s->object_size, s->align,
+				      (s->flags & ~SLAB_PANIC), s->ctor, s);
+	if (new)
+		new->allocflags |= __GFP_KMEMCG;
+	css_put(&memcg->css);
 	kfree(cw);
 }
 
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 1ec3c619ba04..152d9b118b7a 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -213,7 +213,11 @@ kmem_cache_create_memcg(struct mem_cgroup *memcg, const char *name, size_t size,
 	s->align = calculate_alignment(flags, align, size);
 	s->ctor = ctor;
 
-	s->name = kstrdup(name, GFP_KERNEL);
+	if (!memcg)
+		s->name = kstrdup(name, GFP_KERNEL);
+	else
+		s->name = kasprintf(GFP_KERNEL, "%s:%d",
+				    name, memcg_cache_id(memcg));
 	if (!s->name)
 		goto out_free_cache;
 
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 3/8] memcg, slab: never try to merge memcg caches
  2014-02-02 16:33 [PATCH 0/8] memcg-vs-slab related fixes, improvements, cleanups Vladimir Davydov
  2014-02-02 16:33 ` [PATCH 1/8] memcg: export kmemcg cache id via cgroup fs Vladimir Davydov
  2014-02-02 16:33 ` [PATCH 2/8] memcg, slab: remove cgroup name from memcg cache names Vladimir Davydov
@ 2014-02-02 16:33 ` Vladimir Davydov
  2014-02-04 14:52   ` Michal Hocko
  2014-02-02 16:33 ` [PATCH 4/8] memcg, slab: separate memcg vs root cache creation paths Vladimir Davydov
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 33+ messages in thread
From: Vladimir Davydov @ 2014-02-02 16:33 UTC (permalink / raw)
  To: akpm
  Cc: mhocko, rientjes, penberg, cl, glommer, linux-mm, linux-kernel, devel

Suppose we are creating memcg cache A that could be merged with cache B
of the same memcg. Since any memcg cache has the same parameters as its
parent cache, parent caches PA and PB of memcg caches A and B must be
mergeable too. That means PA was merged with PB on creation or vice
versa, i.e. PA = PB. From that it follows that A = B, and we couldn't
even try to create cache B, because it already exists - a contradiction.

So let's remove unused code responsible for merging memcg caches.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
---
 mm/slab.h        |   21 ++++-----------------
 mm/slab_common.c |    8 +++++---
 mm/slub.c        |   19 +++++++++----------
 3 files changed, 18 insertions(+), 30 deletions(-)

diff --git a/mm/slab.h b/mm/slab.h
index 8184a7cde272..3045316b7c9d 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -55,12 +55,12 @@ extern void create_boot_cache(struct kmem_cache *, const char *name,
 struct mem_cgroup;
 #ifdef CONFIG_SLUB
 struct kmem_cache *
-__kmem_cache_alias(struct mem_cgroup *memcg, const char *name, size_t size,
-		   size_t align, unsigned long flags, void (*ctor)(void *));
+__kmem_cache_alias(const char *name, size_t size, size_t align,
+		   unsigned long flags, void (*ctor)(void *));
 #else
 static inline struct kmem_cache *
-__kmem_cache_alias(struct mem_cgroup *memcg, const char *name, size_t size,
-		   size_t align, unsigned long flags, void (*ctor)(void *))
+__kmem_cache_alias(const char *name, size_t size, size_t align,
+		   unsigned long flags, void (*ctor)(void *))
 { return NULL; }
 #endif
 
@@ -119,13 +119,6 @@ static inline bool is_root_cache(struct kmem_cache *s)
 	return !s->memcg_params || s->memcg_params->is_root_cache;
 }
 
-static inline bool cache_match_memcg(struct kmem_cache *cachep,
-				     struct mem_cgroup *memcg)
-{
-	return (is_root_cache(cachep) && !memcg) ||
-				(cachep->memcg_params->memcg == memcg);
-}
-
 static inline void memcg_bind_pages(struct kmem_cache *s, int order)
 {
 	if (!is_root_cache(s))
@@ -204,12 +197,6 @@ static inline bool is_root_cache(struct kmem_cache *s)
 	return true;
 }
 
-static inline bool cache_match_memcg(struct kmem_cache *cachep,
-				     struct mem_cgroup *memcg)
-{
-	return true;
-}
-
 static inline void memcg_bind_pages(struct kmem_cache *s, int order)
 {
 }
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 152d9b118b7a..a75834bb966d 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -200,9 +200,11 @@ kmem_cache_create_memcg(struct mem_cgroup *memcg, const char *name, size_t size,
 	 */
 	flags &= CACHE_CREATE_MASK;
 
-	s = __kmem_cache_alias(memcg, name, size, align, flags, ctor);
-	if (s)
-		goto out_unlock;
+	if (!memcg) {
+		s = __kmem_cache_alias(name, size, align, flags, ctor);
+		if (s)
+			goto out_unlock;
+	}
 
 	err = -ENOMEM;
 	s = kmem_cache_zalloc(kmem_cache, GFP_KERNEL);
diff --git a/mm/slub.c b/mm/slub.c
index 2b1a6970e46f..962abfdfde06 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3686,9 +3686,8 @@ static int slab_unmergeable(struct kmem_cache *s)
 	return 0;
 }
 
-static struct kmem_cache *find_mergeable(struct mem_cgroup *memcg, size_t size,
-		size_t align, unsigned long flags, const char *name,
-		void (*ctor)(void *))
+static struct kmem_cache *find_mergeable(size_t size, size_t align,
+		unsigned long flags, const char *name, void (*ctor)(void *))
 {
 	struct kmem_cache *s;
 
@@ -3707,11 +3706,14 @@ static struct kmem_cache *find_mergeable(struct mem_cgroup *memcg, size_t size,
 		if (slab_unmergeable(s))
 			continue;
 
+		if (!is_root_cache(s))
+			continue;
+
 		if (size > s->size)
 			continue;
 
 		if ((flags & SLUB_MERGE_SAME) != (s->flags & SLUB_MERGE_SAME))
-				continue;
+			continue;
 		/*
 		 * Check if alignment is compatible.
 		 * Courtesy of Adrian Drzewiecki
@@ -3722,21 +3724,18 @@ static struct kmem_cache *find_mergeable(struct mem_cgroup *memcg, size_t size,
 		if (s->size - size >= sizeof(void *))
 			continue;
 
-		if (!cache_match_memcg(s, memcg))
-			continue;
-
 		return s;
 	}
 	return NULL;
 }
 
 struct kmem_cache *
-__kmem_cache_alias(struct mem_cgroup *memcg, const char *name, size_t size,
-		   size_t align, unsigned long flags, void (*ctor)(void *))
+__kmem_cache_alias(const char *name, size_t size, size_t align,
+		   unsigned long flags, void (*ctor)(void *))
 {
 	struct kmem_cache *s;
 
-	s = find_mergeable(memcg, size, align, flags, name, ctor);
+	s = find_mergeable(size, align, flags, name, ctor);
 	if (s) {
 		s->refcount++;
 		/*
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 4/8] memcg, slab: separate memcg vs root cache creation paths
  2014-02-02 16:33 [PATCH 0/8] memcg-vs-slab related fixes, improvements, cleanups Vladimir Davydov
                   ` (2 preceding siblings ...)
  2014-02-02 16:33 ` [PATCH 3/8] memcg, slab: never try to merge memcg caches Vladimir Davydov
@ 2014-02-02 16:33 ` Vladimir Davydov
  2014-02-02 16:33 ` [PATCH 5/8] slub: adjust memcg caches when creating cache alias Vladimir Davydov
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 33+ messages in thread
From: Vladimir Davydov @ 2014-02-02 16:33 UTC (permalink / raw)
  To: akpm
  Cc: mhocko, rientjes, penberg, cl, glommer, linux-mm, linux-kernel, devel

Memcg-awareness turned kmem_cache_create() into a dirty interweaving of
memcg-only and except-for-memcg calls. To clean this up, let's create a
separate function handling memcg caches creation. Although this will
result in the two functions having several hunks of practically the same
code, I guess this is the case when readability fully covers the cost of
code duplication.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
---
 include/linux/memcontrol.h |    8 +--
 include/linux/slab.h       |    9 ++-
 mm/memcontrol.c            |   16 ++----
 mm/slab_common.c           |  133 ++++++++++++++++++++++++++------------------
 4 files changed, 92 insertions(+), 74 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index abd0113b6620..87b8c614798f 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -497,8 +497,8 @@ void __memcg_kmem_commit_charge(struct page *page,
 void __memcg_kmem_uncharge_pages(struct page *page, int order);
 
 int memcg_cache_id(struct mem_cgroup *memcg);
-int memcg_alloc_cache_params(struct mem_cgroup *memcg, struct kmem_cache *s,
-			     struct kmem_cache *root_cache);
+int memcg_alloc_cache_params(struct kmem_cache *s,
+		struct mem_cgroup *memcg, struct kmem_cache *root_cache);
 void memcg_free_cache_params(struct kmem_cache *s);
 void memcg_register_cache(struct kmem_cache *s);
 void memcg_unregister_cache(struct kmem_cache *s);
@@ -641,8 +641,8 @@ static inline int memcg_cache_id(struct mem_cgroup *memcg)
 	return -1;
 }
 
-static inline int memcg_alloc_cache_params(struct mem_cgroup *memcg,
-		struct kmem_cache *s, struct kmem_cache *root_cache)
+static inline int memcg_alloc_cache_params(struct kmem_cache *s,
+		struct mem_cgroup *memcg, struct kmem_cache *root_cache)
 {
 	return 0;
 }
diff --git a/include/linux/slab.h b/include/linux/slab.h
index a060142aa5f5..b8a4bad71f57 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -113,11 +113,10 @@ void __init kmem_cache_init(void);
 int slab_is_available(void);
 
 struct kmem_cache *kmem_cache_create(const char *, size_t, size_t,
-			unsigned long,
-			void (*)(void *));
-struct kmem_cache *
-kmem_cache_create_memcg(struct mem_cgroup *, const char *, size_t, size_t,
-			unsigned long, void (*)(void *), struct kmem_cache *);
+				     unsigned long, void (*)(void *));
+#ifdef CONFIG_MEMCG_KMEM
+int kmem_cache_create_memcg(struct mem_cgroup *, struct kmem_cache *);
+#endif
 void kmem_cache_destroy(struct kmem_cache *);
 int kmem_cache_shrink(struct kmem_cache *);
 void kmem_cache_free(struct kmem_cache *, void *);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 3351c5b5486d..d69c427e106b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3201,8 +3201,8 @@ int memcg_update_cache_size(struct kmem_cache *s, int num_groups)
 	return 0;
 }
 
-int memcg_alloc_cache_params(struct mem_cgroup *memcg, struct kmem_cache *s,
-			     struct kmem_cache *root_cache)
+int memcg_alloc_cache_params(struct kmem_cache *s,
+		struct mem_cgroup *memcg, struct kmem_cache *root_cache)
 {
 	size_t size;
 
@@ -3477,15 +3477,9 @@ struct create_work {
 static void memcg_create_cache_work_func(struct work_struct *w)
 {
 	struct create_work *cw = container_of(w, struct create_work, work);
-	struct mem_cgroup *memcg = cw->memcg;
-	struct kmem_cache *s = cw->cachep;
-	struct kmem_cache *new;
-
-	new = kmem_cache_create_memcg(memcg, s->name, s->object_size, s->align,
-				      (s->flags & ~SLAB_PANIC), s->ctor, s);
-	if (new)
-		new->allocflags |= __GFP_KMEMCG;
-	css_put(&memcg->css);
+
+	kmem_cache_create_memcg(cw->memcg, cw->cachep);
+	css_put(&cw->memcg->css);
 	kfree(cw);
 }
 
diff --git a/mm/slab_common.c b/mm/slab_common.c
index a75834bb966d..ade86bcddab9 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -29,8 +29,7 @@ DEFINE_MUTEX(slab_mutex);
 struct kmem_cache *kmem_cache;
 
 #ifdef CONFIG_DEBUG_VM
-static int kmem_cache_sanity_check(struct mem_cgroup *memcg, const char *name,
-				   size_t size)
+static int kmem_cache_sanity_check(const char *name, size_t size)
 {
 	struct kmem_cache *s = NULL;
 
@@ -57,13 +56,7 @@ static int kmem_cache_sanity_check(struct mem_cgroup *memcg, const char *name,
 		}
 
 #if !defined(CONFIG_SLUB) || !defined(CONFIG_SLUB_DEBUG_ON)
-		/*
-		 * For simplicity, we won't check this in the list of memcg
-		 * caches. We have control over memcg naming, and if there
-		 * aren't duplicates in the global list, there won't be any
-		 * duplicates in the memcg lists as well.
-		 */
-		if (!memcg && !strcmp(s->name, name)) {
+		if (!strcmp(s->name, name)) {
 			pr_err("%s (%s): Cache name already exists.\n",
 			       __func__, name);
 			dump_stack();
@@ -77,8 +70,7 @@ static int kmem_cache_sanity_check(struct mem_cgroup *memcg, const char *name,
 	return 0;
 }
 #else
-static inline int kmem_cache_sanity_check(struct mem_cgroup *memcg,
-					  const char *name, size_t size)
+static inline int kmem_cache_sanity_check(const char *name, size_t size)
 {
 	return 0;
 }
@@ -164,11 +156,9 @@ unsigned long calculate_alignment(unsigned long flags,
  * cacheline.  This can be beneficial if you're counting cycles as closely
  * as davem.
  */
-
 struct kmem_cache *
-kmem_cache_create_memcg(struct mem_cgroup *memcg, const char *name, size_t size,
-			size_t align, unsigned long flags, void (*ctor)(void *),
-			struct kmem_cache *parent_cache)
+kmem_cache_create(const char *name, size_t size, size_t align,
+		  unsigned long flags, void (*ctor)(void *))
 {
 	struct kmem_cache *s = NULL;
 	int err;
@@ -176,22 +166,10 @@ kmem_cache_create_memcg(struct mem_cgroup *memcg, const char *name, size_t size,
 	get_online_cpus();
 	mutex_lock(&slab_mutex);
 
-	err = kmem_cache_sanity_check(memcg, name, size);
+	err = kmem_cache_sanity_check(name, size);
 	if (err)
 		goto out_unlock;
 
-	if (memcg) {
-		/*
-		 * Since per-memcg caches are created asynchronously on first
-		 * allocation (see memcg_kmem_get_cache()), several threads can
-		 * try to create the same cache, but only one of them may
-		 * succeed. Therefore if we get here and see the cache has
-		 * already been created, we silently return NULL.
-		 */
-		if (cache_from_memcg_idx(parent_cache, memcg_cache_id(memcg)))
-			goto out_unlock;
-	}
-
 	/*
 	 * Some allocators will constraint the set of valid flags to a subset
 	 * of all flags. We expect them to define CACHE_CREATE_MASK in this
@@ -200,11 +178,9 @@ kmem_cache_create_memcg(struct mem_cgroup *memcg, const char *name, size_t size,
 	 */
 	flags &= CACHE_CREATE_MASK;
 
-	if (!memcg) {
-		s = __kmem_cache_alias(name, size, align, flags, ctor);
-		if (s)
-			goto out_unlock;
-	}
+	s = __kmem_cache_alias(name, size, align, flags, ctor);
+	if (s)
+		goto out_unlock;
 
 	err = -ENOMEM;
 	s = kmem_cache_zalloc(kmem_cache, GFP_KERNEL);
@@ -215,15 +191,11 @@ kmem_cache_create_memcg(struct mem_cgroup *memcg, const char *name, size_t size,
 	s->align = calculate_alignment(flags, align, size);
 	s->ctor = ctor;
 
-	if (!memcg)
-		s->name = kstrdup(name, GFP_KERNEL);
-	else
-		s->name = kasprintf(GFP_KERNEL, "%s:%d",
-				    name, memcg_cache_id(memcg));
+	s->name = kstrdup(name, GFP_KERNEL);
 	if (!s->name)
 		goto out_free_cache;
 
-	err = memcg_alloc_cache_params(memcg, s, parent_cache);
+	err = memcg_alloc_cache_params(s, NULL, NULL);
 	if (err)
 		goto out_free_cache;
 
@@ -240,16 +212,6 @@ out_unlock:
 	put_online_cpus();
 
 	if (err) {
-		/*
-		 * There is no point in flooding logs with warnings or
-		 * especially crashing the system if we fail to create a cache
-		 * for a memcg. In this case we will be accounting the memcg
-		 * allocation to the root cgroup until we succeed to create its
-		 * own cache, but it isn't that critical.
-		 */
-		if (!memcg)
-			return NULL;
-
 		if (flags & SLAB_PANIC)
 			panic("kmem_cache_create: Failed to create slab '%s'. Error %d\n",
 				name, err);
@@ -268,14 +230,77 @@ out_free_cache:
 	kmem_cache_free(kmem_cache, s);
 	goto out_unlock;
 }
+EXPORT_SYMBOL(kmem_cache_create);
 
-struct kmem_cache *
-kmem_cache_create(const char *name, size_t size, size_t align,
-		  unsigned long flags, void (*ctor)(void *))
+#ifdef CONFIG_MEMCG_KMEM
+/*
+ * kmem_cache_create_memcg - Create a cache for a memory cgroup.
+ * @memcg: The memory cgroup the new cache is for.
+ * @cachep: The parent of the new cache.
+ *
+ * This function creates a kmem cache that will serve allocation requests going
+ * from @memcg to @cachep. The new cache inherits properties from its parent.
+ *
+ * Returns 0 on success, -errno on failure.
+ */
+int kmem_cache_create_memcg(struct mem_cgroup *memcg, struct kmem_cache *cachep)
 {
-	return kmem_cache_create_memcg(NULL, name, size, align, flags, ctor, NULL);
+	struct kmem_cache *s;
+	int err;
+
+	get_online_cpus();
+	mutex_lock(&slab_mutex);
+
+	/*
+	 * Since per-memcg caches are created asynchronously on first
+	 * allocation (see memcg_kmem_get_cache()), several threads can try to
+	 * create the same cache, but only one of them may succeed.
+	 */
+	err = -EEXIST;
+	if (cache_from_memcg_idx(cachep, memcg_cache_id(memcg)))
+		goto out_unlock;
+
+	err = -ENOMEM;
+	s = kmem_cache_zalloc(kmem_cache, GFP_KERNEL);
+	if (!s)
+		goto out_unlock;
+
+	s->object_size = cachep->object_size;
+	s->size = cachep->size;
+	s->align = cachep->align;
+	s->ctor = cachep->ctor;
+
+	s->name = kasprintf(GFP_KERNEL, "%s:%d",
+			    cachep->name, memcg_cache_id(memcg));
+	if (!s->name)
+		goto out_free_cache;
+
+	err = memcg_alloc_cache_params(s, memcg, cachep);
+	if (err)
+		goto out_free_cache;
+
+	err = __kmem_cache_create(s, cachep->flags & ~SLAB_PANIC);
+	if (err)
+		goto out_free_cache;
+
+	s->refcount = 1;
+	s->allocflags |= __GFP_KMEMCG;
+	list_add(&s->list, &slab_caches);
+	memcg_register_cache(s);
+
+out_unlock:
+	mutex_unlock(&slab_mutex);
+	put_online_cpus();
+
+	return err;
+
+out_free_cache:
+	memcg_free_cache_params(s);
+	kfree(s->name);
+	kmem_cache_free(kmem_cache, s);
+	goto out_unlock;
 }
-EXPORT_SYMBOL(kmem_cache_create);
+#endif /* CONFIG_MEMCG_KMEM */
 
 void kmem_cache_destroy(struct kmem_cache *s)
 {
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 5/8] slub: adjust memcg caches when creating cache alias
  2014-02-02 16:33 [PATCH 0/8] memcg-vs-slab related fixes, improvements, cleanups Vladimir Davydov
                   ` (3 preceding siblings ...)
  2014-02-02 16:33 ` [PATCH 4/8] memcg, slab: separate memcg vs root cache creation paths Vladimir Davydov
@ 2014-02-02 16:33 ` Vladimir Davydov
  2014-02-02 16:33 ` [PATCH 6/8] slub: rework sysfs layout for memcg caches Vladimir Davydov
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 33+ messages in thread
From: Vladimir Davydov @ 2014-02-02 16:33 UTC (permalink / raw)
  To: akpm
  Cc: mhocko, rientjes, penberg, cl, glommer, linux-mm, linux-kernel, devel

Otherwise, kzalloc() called from a memcg won't clear the whole object.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
---
 mm/slub.c |   14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/mm/slub.c b/mm/slub.c
index 962abfdfde06..a33d88afb61d 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3737,7 +3737,11 @@ __kmem_cache_alias(const char *name, size_t size, size_t align,
 
 	s = find_mergeable(size, align, flags, name, ctor);
 	if (s) {
+		int i;
+		struct kmem_cache *c;
+
 		s->refcount++;
+
 		/*
 		 * Adjust the object sizes so that we clear
 		 * the complete object on kzalloc.
@@ -3745,6 +3749,16 @@ __kmem_cache_alias(const char *name, size_t size, size_t align,
 		s->object_size = max(s->object_size, (int)size);
 		s->inuse = max_t(int, s->inuse, ALIGN(size, sizeof(void *)));
 
+		BUG_ON(!is_root_cache(s));
+		for_each_memcg_cache_index(i) {
+			c = cache_from_memcg_idx(s, i);
+			if (!c)
+				continue;
+			c->object_size = s->object_size;
+			c->inuse = max_t(int, c->inuse,
+					 ALIGN(size, sizeof(void *)));
+		}
+
 		if (sysfs_slab_alias(s, name)) {
 			s->refcount--;
 			s = NULL;
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 6/8] slub: rework sysfs layout for memcg caches
  2014-02-02 16:33 [PATCH 0/8] memcg-vs-slab related fixes, improvements, cleanups Vladimir Davydov
                   ` (4 preceding siblings ...)
  2014-02-02 16:33 ` [PATCH 5/8] slub: adjust memcg caches when creating cache alias Vladimir Davydov
@ 2014-02-02 16:33 ` Vladimir Davydov
  2014-02-02 16:33 ` [PATCH 7/8] memcg, slab: unregister cache from memcg before starting to destroy it Vladimir Davydov
  2014-02-02 16:33 ` [PATCH 8/8] memcg, slab: do not destroy children caches if parent has aliases Vladimir Davydov
  7 siblings, 0 replies; 33+ messages in thread
From: Vladimir Davydov @ 2014-02-02 16:33 UTC (permalink / raw)
  To: akpm
  Cc: mhocko, rientjes, penberg, cl, glommer, linux-mm, linux-kernel, devel

Currently, we try to arrange sysfs entries for memcg caches in the same
manner as for global caches. Apart from turning /sys/kernel/slab into a
mess when there are a lot of kmem-active memcgs created, it actually
does not work properly - we won't create more than one link to a memcg
cache in case its parent is merged with another cache. For instance, if
A is a root cache merged with another root cache B, we will have the
following sysfs setup:

  X
  A -> X
  B -> X

where X is some unique id (see create_unique_id()). Now if memcgs M and
N start to allocate from cache A (or B, which is the same), we will get:

  X
  X:M
  X:N
  A -> X
  B -> X
  A:M -> X:M
  A:N -> X:N

Since B is an alias for A, we won't get entries B:M and B:N, which is
confusing.

It is more logical to have entries for memcg caches under the
corresponding root cache's sysfs directory. This would allow us to keep
sysfs layout clean, and avoid such inconsistencies like one described
above.

This patch does the trick. It creates a "cgroup" kset in each root cache
kobject to keep its children caches there.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
---
 include/linux/slub_def.h |    3 ++
 mm/slub.c                |   88 +++++++++++++++++++++++++++++++++++++---------
 2 files changed, 74 insertions(+), 17 deletions(-)

diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index f56bfa9e4526..f2f7398848cf 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -87,6 +87,9 @@ struct kmem_cache {
 #ifdef CONFIG_MEMCG_KMEM
 	struct memcg_cache_params *memcg_params;
 	int max_attr_size; /* for propagation, maximum size of a stored attr */
+#ifdef CONFIG_SYSFS
+	struct kset *memcg_kset;
+#endif
 #endif
 
 #ifdef CONFIG_NUMA
diff --git a/mm/slub.c b/mm/slub.c
index a33d88afb61d..c3cf0129eda5 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -5125,6 +5125,44 @@ static const struct kset_uevent_ops slab_uevent_ops = {
 
 static struct kset *slab_kset;
 
+#ifdef CONFIG_MEMCG_KMEM
+static inline struct kset *cache_kset(struct kmem_cache *s)
+{
+	if (is_root_cache(s))
+		return slab_kset;
+	return s->memcg_params->root_cache->memcg_kset;
+}
+
+static int create_cache_memcg_kset(struct kmem_cache *s)
+{
+	if (!is_root_cache(s))
+		return 0;
+	s->memcg_kset = kset_create_and_add("cgroup", NULL, &s->kobj);
+	if (!s->memcg_kset)
+		return -ENOMEM;
+	return 0;
+}
+
+static void destroy_cache_memcg_kset(struct kmem_cache *s)
+{
+	kset_unregister(s->memcg_kset);
+}
+#else
+static inline struct kset *cache_kset(struct kmem_cache *s)
+{
+	return slab_kset;
+}
+
+static inline int create_cache_memcg_kset(struct kmem_cache *s)
+{
+	return 0;
+}
+
+static inline void destroy_cache_memcg_kset(struct kmem_cache *s)
+{
+}
+#endif /* CONFIG_MEMCG_KMEM */
+
 #define ID_STR_LENGTH 64
 
 /* Create a unique string id for a slab cache:
@@ -5136,7 +5174,8 @@ static char *create_unique_id(struct kmem_cache *s)
 	char *name = kmalloc(ID_STR_LENGTH, GFP_KERNEL);
 	char *p = name;
 
-	BUG_ON(!name);
+	if (!name)
+		return NULL;
 
 	*p++ = ':';
 	/*
@@ -5160,8 +5199,7 @@ static char *create_unique_id(struct kmem_cache *s)
 
 #ifdef CONFIG_MEMCG_KMEM
 	if (!is_root_cache(s))
-		p += sprintf(p, "-%08d",
-				memcg_cache_id(s->memcg_params->memcg));
+		p += sprintf(p, ":%d", memcg_cache_id(s->memcg_params->memcg));
 #endif
 
 	BUG_ON(p > name + ID_STR_LENGTH - 1);
@@ -5180,7 +5218,8 @@ static int sysfs_slab_add(struct kmem_cache *s)
 		 * This is typically the case for debug situations. In that
 		 * case we can catch duplicate names easily.
 		 */
-		sysfs_remove_link(&slab_kset->kobj, s->name);
+		if (is_root_cache(s))
+			sysfs_remove_link(&slab_kset->kobj, s->name);
 		name = s->name;
 	} else {
 		/*
@@ -5188,28 +5227,40 @@ static int sysfs_slab_add(struct kmem_cache *s)
 		 * for the symlinks.
 		 */
 		name = create_unique_id(s);
+		if (!name)
+			return -ENOMEM;
 	}
 
-	s->kobj.kset = slab_kset;
+	s->kobj.kset = cache_kset(s);
 	err = kobject_init_and_add(&s->kobj, &slab_ktype, NULL, name);
-	if (err) {
-		kobject_put(&s->kobj);
-		return err;
-	}
+	if (err)
+		goto out_put_kobj;
+
+	err = create_cache_memcg_kset(s);
+	if (err)
+		goto out_del_kobj;
 
 	err = sysfs_create_group(&s->kobj, &slab_attr_group);
-	if (err) {
-		kobject_del(&s->kobj);
-		kobject_put(&s->kobj);
-		return err;
-	}
+	if (err)
+		goto out_del_memcg_kset;
+
 	kobject_uevent(&s->kobj, KOBJ_ADD);
-	if (!unmergeable) {
+	if (!unmergeable && is_root_cache(s)) {
 		/* Setup first alias */
 		sysfs_slab_alias(s, s->name);
-		kfree(name);
 	}
-	return 0;
+out:
+	if (!unmergeable)
+		kfree(name);
+	return err;
+
+out_del_memcg_kset:
+	destroy_cache_memcg_kset(s);
+out_del_kobj:
+	kobject_del(&s->kobj);
+out_put_kobj:
+	kobject_put(&s->kobj);
+	goto out;
 }
 
 static void sysfs_slab_remove(struct kmem_cache *s)
@@ -5221,6 +5272,7 @@ static void sysfs_slab_remove(struct kmem_cache *s)
 		 */
 		return;
 
+	destroy_cache_memcg_kset(s);
 	kobject_uevent(&s->kobj, KOBJ_REMOVE);
 	kobject_del(&s->kobj);
 	kobject_put(&s->kobj);
@@ -5242,6 +5294,8 @@ static int sysfs_slab_alias(struct kmem_cache *s, const char *name)
 {
 	struct saved_alias *al;
 
+	BUG_ON(!is_root_cache(s));
+
 	if (slab_state == FULL) {
 		/*
 		 * If we have a leftover link then remove it.
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 7/8] memcg, slab: unregister cache from memcg before starting to destroy it
  2014-02-02 16:33 [PATCH 0/8] memcg-vs-slab related fixes, improvements, cleanups Vladimir Davydov
                   ` (5 preceding siblings ...)
  2014-02-02 16:33 ` [PATCH 6/8] slub: rework sysfs layout for memcg caches Vladimir Davydov
@ 2014-02-02 16:33 ` Vladimir Davydov
  2014-02-02 16:33 ` [PATCH 8/8] memcg, slab: do not destroy children caches if parent has aliases Vladimir Davydov
  7 siblings, 0 replies; 33+ messages in thread
From: Vladimir Davydov @ 2014-02-02 16:33 UTC (permalink / raw)
  To: akpm
  Cc: mhocko, rientjes, penberg, cl, glommer, linux-mm, linux-kernel, devel

Currently, memcg_unregister_cache(), which deletes the cache being
destroyed from the memcg_slab_caches list, is called after
__kmem_cache_shutdown() (see kmem_cache_destroy()), which starts to
destroy the cache. As a result, one can access a partially destroyed
cache while traversing a memcg_slab_caches list, which can have deadly
consequences (for instance, cache_show() called for each cache on a
memcg_slab_caches list from mem_cgroup_slabinfo_read() will dereference
pointers to already freed data).

To fix this, let's move memcg_unregister_cache() before the cache
destruction process beginning, issuing memcg_register_cache() on
failure.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
---
 mm/memcontrol.c  |   12 ++++++------
 mm/slab_common.c |    3 ++-
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d69c427e106b..69e8726aae4f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3224,6 +3224,7 @@ int memcg_alloc_cache_params(struct kmem_cache *s,
 		s->memcg_params->root_cache = root_cache;
 		INIT_WORK(&s->memcg_params->destroy,
 				kmem_cache_destroy_work_func);
+		css_get(&memcg->css);
 	} else
 		s->memcg_params->is_root_cache = true;
 
@@ -3232,6 +3233,10 @@ int memcg_alloc_cache_params(struct kmem_cache *s,
 
 void memcg_free_cache_params(struct kmem_cache *s)
 {
+	if (!s->memcg_params)
+		return;
+	if (!s->memcg_params->is_root_cache)
+		css_put(&s->memcg_params->memcg->css);
 	kfree(s->memcg_params);
 }
 
@@ -3254,9 +3259,6 @@ void memcg_register_cache(struct kmem_cache *s)
 	memcg = s->memcg_params->memcg;
 	id = memcg_cache_id(memcg);
 
-	css_get(&memcg->css);
-
-
 	/*
 	 * Since readers won't lock (see cache_from_memcg_idx()), we need a
 	 * barrier here to ensure nobody will see the kmem_cache partially
@@ -3305,10 +3307,8 @@ void memcg_unregister_cache(struct kmem_cache *s)
 	 * after removing it from the memcg_slab_caches list, otherwise we can
 	 * fail to convert memcg_params_to_cache() while traversing the list.
 	 */
-	VM_BUG_ON(!root->memcg_params->memcg_caches[id]);
+	VM_BUG_ON(root->memcg_params->memcg_caches[id] != s);
 	root->memcg_params->memcg_caches[id] = NULL;
-
-	css_put(&memcg->css);
 }
 
 /*
diff --git a/mm/slab_common.c b/mm/slab_common.c
index ade86bcddab9..ea1075e65271 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -312,9 +312,9 @@ void kmem_cache_destroy(struct kmem_cache *s)
 	s->refcount--;
 	if (!s->refcount) {
 		list_del(&s->list);
+		memcg_unregister_cache(s);
 
 		if (!__kmem_cache_shutdown(s)) {
-			memcg_unregister_cache(s);
 			mutex_unlock(&slab_mutex);
 			if (s->flags & SLAB_DESTROY_BY_RCU)
 				rcu_barrier();
@@ -324,6 +324,7 @@ void kmem_cache_destroy(struct kmem_cache *s)
 			kmem_cache_free(kmem_cache, s);
 		} else {
 			list_add(&s->list, &slab_caches);
+			memcg_register_cache(s);
 			mutex_unlock(&slab_mutex);
 			printk(KERN_ERR "kmem_cache_destroy %s: Slab cache still has objects\n",
 				s->name);
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 8/8] memcg, slab: do not destroy children caches if parent has aliases
  2014-02-02 16:33 [PATCH 0/8] memcg-vs-slab related fixes, improvements, cleanups Vladimir Davydov
                   ` (6 preceding siblings ...)
  2014-02-02 16:33 ` [PATCH 7/8] memcg, slab: unregister cache from memcg before starting to destroy it Vladimir Davydov
@ 2014-02-02 16:33 ` Vladimir Davydov
  7 siblings, 0 replies; 33+ messages in thread
From: Vladimir Davydov @ 2014-02-02 16:33 UTC (permalink / raw)
  To: akpm
  Cc: mhocko, rientjes, penberg, cl, glommer, linux-mm, linux-kernel, devel

Currently we destroy children caches at the very beginning of
kmem_cache_destroy(). This is wrong, because the root cache will not
necessarily be destroyed in the end - if it has aliases (refcount > 0),
kmem_cache_destroy() will simply decrement its refcount and return. In
this case, at best we will get a bunch of warnings in dmesg, like this
one:

  kmem_cache_destroy kmalloc-32:0: Slab cache still has objects
  CPU: 1 PID: 7139 Comm: modprobe Tainted: G    B   W    3.13.0+ #117
  Hardware name:
   ffff88007d7a6368 ffff880039b07e48 ffffffff8168cc2e ffff88007d7a6d68
   ffff88007d7a6300 ffff880039b07e68 ffffffff81175e9f 0000000000000000
   ffff88007d7a6300 ffff880039b07e98 ffffffff811b67c7 ffff88003e803c00
  Call Trace:
   [<ffffffff8168cc2e>] dump_stack+0x49/0x5b
   [<ffffffff81175e9f>] kmem_cache_destroy+0xdf/0xf0
   [<ffffffff811b67c7>] kmem_cache_destroy_memcg_children+0x97/0xc0
   [<ffffffff81175dcf>] kmem_cache_destroy+0xf/0xf0
   [<ffffffffa0130b21>] xfs_mru_cache_uninit+0x21/0x30 [xfs]
   [<ffffffffa01893ea>] exit_xfs_fs+0x2e/0xc44 [xfs]
   [<ffffffff810eeb58>] SyS_delete_module+0x198/0x1f0
   [<ffffffff816994f9>] system_call_fastpath+0x16/0x1b

At worst - if kmem_cache_destroy() will race with an allocation from a
memcg cache - the kernel will panic.

This patch fixes this by moving children caches destruction after the
check if the cache has aliases. Plus, it forbids destroying a root cache
if it still has children caches, because each children cache keeps a
reference to its parent.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
---
 include/linux/memcontrol.h |    5 ---
 mm/memcontrol.c            |    2 +-
 mm/slab.h                  |   17 ++++++++--
 mm/slab_common.c           |   74 +++++++++++++++++++++++++++++---------------
 4 files changed, 65 insertions(+), 33 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 87b8c614798f..69f6b0f84cb4 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -510,7 +510,6 @@ struct kmem_cache *
 __memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp);
 
 void mem_cgroup_destroy_cache(struct kmem_cache *cachep);
-void kmem_cache_destroy_memcg_children(struct kmem_cache *s);
 
 /**
  * memcg_kmem_newpage_charge: verify if a new kmem allocation is allowed.
@@ -664,10 +663,6 @@ memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp)
 {
 	return cachep;
 }
-
-static inline void kmem_cache_destroy_memcg_children(struct kmem_cache *s)
-{
-}
 #endif /* CONFIG_MEMCG_KMEM */
 #endif /* _LINUX_MEMCONTROL_H */
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 69e8726aae4f..c3f9afaeef3e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3405,7 +3405,7 @@ void mem_cgroup_destroy_cache(struct kmem_cache *cachep)
 	schedule_work(&cachep->memcg_params->destroy);
 }
 
-void kmem_cache_destroy_memcg_children(struct kmem_cache *s)
+void __kmem_cache_destroy_memcg_children(struct kmem_cache *s)
 {
 	struct kmem_cache *c;
 	int i;
diff --git a/mm/slab.h b/mm/slab.h
index 3045316b7c9d..b5ad968020a3 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -191,7 +191,16 @@ static inline struct kmem_cache *memcg_root_cache(struct kmem_cache *s)
 		return s;
 	return s->memcg_params->root_cache;
 }
-#else
+
+extern void __kmem_cache_destroy_memcg_children(struct kmem_cache *s);
+
+static inline void kmem_cache_destroy_memcg_children(struct kmem_cache *s)
+{
+	mutex_unlock(&slab_mutex);
+	__kmem_cache_destroy_memcg_children(s);
+	mutex_lock(&slab_mutex);
+}
+#else /* !CONFIG_MEMCG_KMEM */
 static inline bool is_root_cache(struct kmem_cache *s)
 {
 	return true;
@@ -226,7 +235,11 @@ static inline struct kmem_cache *memcg_root_cache(struct kmem_cache *s)
 {
 	return s;
 }
-#endif
+
+static inline void kmem_cache_destroy_memcg_children(struct kmem_cache *s)
+{
+}
+#endif /* CONFIG_MEMCG_KMEM */
 
 static inline struct kmem_cache *cache_from_obj(struct kmem_cache *s, void *x)
 {
diff --git a/mm/slab_common.c b/mm/slab_common.c
index ea1075e65271..f83e0cf939a4 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -302,38 +302,62 @@ out_free_cache:
 }
 #endif /* CONFIG_MEMCG_KMEM */
 
-void kmem_cache_destroy(struct kmem_cache *s)
+static bool cache_has_children(struct kmem_cache *s)
 {
-	/* Destroy all the children caches if we aren't a memcg cache */
-	kmem_cache_destroy_memcg_children(s);
+	int i;
+
+	if (!is_root_cache(s))
+		return false;
+	for_each_memcg_cache_index(i) {
+		if (cache_from_memcg_idx(s, i))
+			return true;
+	}
+	return false;
+}
 
+void kmem_cache_destroy(struct kmem_cache *s)
+{
 	get_online_cpus();
 	mutex_lock(&slab_mutex);
+
 	s->refcount--;
-	if (!s->refcount) {
-		list_del(&s->list);
-		memcg_unregister_cache(s);
-
-		if (!__kmem_cache_shutdown(s)) {
-			mutex_unlock(&slab_mutex);
-			if (s->flags & SLAB_DESTROY_BY_RCU)
-				rcu_barrier();
-
-			memcg_free_cache_params(s);
-			kfree(s->name);
-			kmem_cache_free(kmem_cache, s);
-		} else {
-			list_add(&s->list, &slab_caches);
-			memcg_register_cache(s);
-			mutex_unlock(&slab_mutex);
-			printk(KERN_ERR "kmem_cache_destroy %s: Slab cache still has objects\n",
-				s->name);
-			dump_stack();
-		}
-	} else {
-		mutex_unlock(&slab_mutex);
+	if (s->refcount)
+		goto out_unlock;
+
+	list_del(&s->list);
+	memcg_unregister_cache(s);
+
+	/* Destroy all the children caches if we aren't a memcg cache */
+	kmem_cache_destroy_memcg_children(s);
+	if (cache_has_children(s))
+		goto out_undelete;
+
+	if (__kmem_cache_shutdown(s) != 0) {
+		printk(KERN_ERR "kmem_cache_destroy %s: "
+		       "Slab cache still has objects\n", s->name);
+		dump_stack();
+		goto out_undelete;
 	}
+
+	mutex_unlock(&slab_mutex);
+	if (s->flags & SLAB_DESTROY_BY_RCU)
+		rcu_barrier();
+
+	memcg_free_cache_params(s);
+	kfree(s->name);
+	kmem_cache_free(kmem_cache, s);
+	goto out_put_cpus; /* slab_mutex already unlocked */
+
+out_unlock:
+	mutex_unlock(&slab_mutex);
+out_put_cpus:
 	put_online_cpus();
+	return;
+
+out_undelete:
+	list_add(&s->list, &slab_caches);
+	memcg_register_cache(s);
+	goto out_unlock;
 }
 EXPORT_SYMBOL(kmem_cache_destroy);
 
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/8] memcg: export kmemcg cache id via cgroup fs
  2014-02-02 16:33 ` [PATCH 1/8] memcg: export kmemcg cache id via cgroup fs Vladimir Davydov
@ 2014-02-03  6:21   ` David Rientjes
  2014-02-03  6:57     ` Vladimir Davydov
  2014-02-04 14:40   ` Michal Hocko
  1 sibling, 1 reply; 33+ messages in thread
From: David Rientjes @ 2014-02-03  6:21 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: akpm, mhocko, penberg, cl, glommer, linux-mm, linux-kernel, devel

On Sun, 2 Feb 2014, Vladimir Davydov wrote:

> Per-memcg kmem caches are named as follows:
> 
>   <global-cache-name>(<cgroup-kmem-id>:<cgroup-name>)
> 
> where <cgroup-kmem-id> is the unique id of the memcg the cache belongs
> to, <cgroup-name> is the relative name of the memcg on the cgroup fs.
> Cache names are exposed to userspace for debugging purposes (e.g. via
> sysfs in case of slub or via dmesg).
> 
> Using relative names makes it impossible in general (in case the cgroup
> hierarchy is not flat) to find out which memcg a particular cache
> belongs to, because <cgroup-kmem-id> is not known to the user. Since
> using absolute cgroup names would be an overkill, let's fix this by
> exporting the id of kmem-active memcg via cgroup fs file
> "memory.kmem.id".
> 

Hmm, I'm not sure exporting additional information is the best way to do 
it only for this purpose.  I do understand the problem in naming 
collisions if the hierarchy isn't flat and we typically work around that 
by ensuring child memcgs still have a unique memcg.  This isn't only a 
problem in slab cache naming, me also avoid printing the entire absolute 
names for things like the oom killer.  So it would be nice to have 
consensus on how people are supposed to identify memcgs with a hierarchy: 
either by exporting information like the id like you do here (but leave 
the oom killer still problematic) or by insisting people name their memcgs 
with unique names if they care to differentiate them.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/8] memcg: export kmemcg cache id via cgroup fs
  2014-02-03  6:21   ` David Rientjes
@ 2014-02-03  6:57     ` Vladimir Davydov
  2014-02-03  7:19       ` Vladimir Davydov
                         ` (3 more replies)
  0 siblings, 4 replies; 33+ messages in thread
From: Vladimir Davydov @ 2014-02-03  6:57 UTC (permalink / raw)
  To: David Rientjes
  Cc: akpm, mhocko, penberg, cl, glommer, linux-mm, linux-kernel, devel

On 02/03/2014 10:21 AM, David Rientjes wrote:
> On Sun, 2 Feb 2014, Vladimir Davydov wrote:
>
>> Per-memcg kmem caches are named as follows:
>>
>>   <global-cache-name>(<cgroup-kmem-id>:<cgroup-name>)
>>
>> where <cgroup-kmem-id> is the unique id of the memcg the cache belongs
>> to, <cgroup-name> is the relative name of the memcg on the cgroup fs.
>> Cache names are exposed to userspace for debugging purposes (e.g. via
>> sysfs in case of slub or via dmesg).
>>
>> Using relative names makes it impossible in general (in case the cgroup
>> hierarchy is not flat) to find out which memcg a particular cache
>> belongs to, because <cgroup-kmem-id> is not known to the user. Since
>> using absolute cgroup names would be an overkill, let's fix this by
>> exporting the id of kmem-active memcg via cgroup fs file
>> "memory.kmem.id".
>>
> Hmm, I'm not sure exporting additional information is the best way to do 
> it only for this purpose.  I do understand the problem in naming 
> collisions if the hierarchy isn't flat and we typically work around that 
> by ensuring child memcgs still have a unique memcg.  This isn't only a 
> problem in slab cache naming, me also avoid printing the entire absolute 
> names for things like the oom killer.

AFAIU, cgroup identifiers dumped on oom (cgroup paths, currently) and
memcg slab cache names serve for different purposes. The point is oom is
a perfectly normal situation for the kernel, and info dumped to dmesg is
for admin to find out the cause of the problem (a greedy user or
cgroup). On the other hand, slab cache names are dumped to dmesg only on
extraordinary situations - like bugs in slab implementation, or double
free, or detected memory leaks - where we usually do not need the name
of the memcg that triggered the problem, because the bug is likely to be
in the kernel subsys using the cache. Plus, the names are exported to
sysfs in case of slub, again for debugging purposes, AFAIK. So IMO the
use cases for oom vs slab names are completely different - information
vs debugging - and I want to export kmem.id only for the ability of
debugging kmemcg and slab subsystems.

> So it would be nice to have 
> consensus on how people are supposed to identify memcgs with a hierarchy: 
> either by exporting information like the id like you do here (but leave 
> the oom killer still problematic) or by insisting people name their memcgs 
> with unique names if they care to differentiate them.

Anyway, I agree with you that this needs a consensus, because this is a
functional change.

Thanks.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/8] memcg: export kmemcg cache id via cgroup fs
  2014-02-03  6:57     ` Vladimir Davydov
@ 2014-02-03  7:19       ` Vladimir Davydov
  2014-02-03 10:05       ` Glauber Costa
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 33+ messages in thread
From: Vladimir Davydov @ 2014-02-03  7:19 UTC (permalink / raw)
  To: linux-kernel
  Cc: David Rientjes, akpm, mhocko, penberg, cl, glommer, linux-mm,
	devel, Johannes Weiner, Hugh Dickins

[adding Johannes Weiner and Hugh Dickins to cc in case they have
something to object against this]

On 02/03/2014 10:57 AM, Vladimir Davydov wrote:
> On 02/03/2014 10:21 AM, David Rientjes wrote:
>> On Sun, 2 Feb 2014, Vladimir Davydov wrote:
>>
>>> Per-memcg kmem caches are named as follows:
>>>
>>>   <global-cache-name>(<cgroup-kmem-id>:<cgroup-name>)
>>>
>>> where <cgroup-kmem-id> is the unique id of the memcg the cache belongs
>>> to, <cgroup-name> is the relative name of the memcg on the cgroup fs.
>>> Cache names are exposed to userspace for debugging purposes (e.g. via
>>> sysfs in case of slub or via dmesg).
>>>
>>> Using relative names makes it impossible in general (in case the cgroup
>>> hierarchy is not flat) to find out which memcg a particular cache
>>> belongs to, because <cgroup-kmem-id> is not known to the user. Since
>>> using absolute cgroup names would be an overkill, let's fix this by
>>> exporting the id of kmem-active memcg via cgroup fs file
>>> "memory.kmem.id".
>>>
>> Hmm, I'm not sure exporting additional information is the best way to do 
>> it only for this purpose.  I do understand the problem in naming 
>> collisions if the hierarchy isn't flat and we typically work around that 
>> by ensuring child memcgs still have a unique memcg.  This isn't only a 
>> problem in slab cache naming, me also avoid printing the entire absolute 
>> names for things like the oom killer.
> AFAIU, cgroup identifiers dumped on oom (cgroup paths, currently) and
> memcg slab cache names serve for different purposes. The point is oom is
> a perfectly normal situation for the kernel, and info dumped to dmesg is
> for admin to find out the cause of the problem (a greedy user or
> cgroup). On the other hand, slab cache names are dumped to dmesg only on
> extraordinary situations - like bugs in slab implementation, or double
> free, or detected memory leaks - where we usually do not need the name
> of the memcg that triggered the problem, because the bug is likely to be
> in the kernel subsys using the cache. Plus, the names are exported to
> sysfs in case of slub, again for debugging purposes, AFAIK. So IMO the
> use cases for oom vs slab names are completely different - information
> vs debugging - and I want to export kmem.id only for the ability of
> debugging kmemcg and slab subsystems.
>
>> So it would be nice to have 
>> consensus on how people are supposed to identify memcgs with a hierarchy: 
>> either by exporting information like the id like you do here (but leave 
>> the oom killer still problematic) or by insisting people name their memcgs 
>> with unique names if they care to differentiate them.
> Anyway, I agree with you that this needs a consensus, because this is a
> functional change.
>
> Thanks.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/8] memcg: export kmemcg cache id via cgroup fs
  2014-02-03  6:57     ` Vladimir Davydov
  2014-02-03  7:19       ` Vladimir Davydov
@ 2014-02-03 10:05       ` Glauber Costa
  2014-02-03 13:01         ` Vladimir Davydov
  2014-02-03 11:04       ` David Rientjes
  2014-02-04 14:44       ` Michal Hocko
  3 siblings, 1 reply; 33+ messages in thread
From: Glauber Costa @ 2014-02-03 10:05 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: David Rientjes, Andrew Morton, Michal Hocko, Pekka Enberg,
	Christoph Lameter, linux-mm, LKML, devel

On Mon, Feb 3, 2014 at 10:57 AM, Vladimir Davydov
<vdavydov@parallels.com> wrote:
> On 02/03/2014 10:21 AM, David Rientjes wrote:
>> On Sun, 2 Feb 2014, Vladimir Davydov wrote:
>>
>>> Per-memcg kmem caches are named as follows:
>>>
>>>   <global-cache-name>(<cgroup-kmem-id>:<cgroup-name>)
>>>
>>> where <cgroup-kmem-id> is the unique id of the memcg the cache belongs
>>> to, <cgroup-name> is the relative name of the memcg on the cgroup fs.
>>> Cache names are exposed to userspace for debugging purposes (e.g. via
>>> sysfs in case of slub or via dmesg).
>>>
>>> Using relative names makes it impossible in general (in case the cgroup
>>> hierarchy is not flat) to find out which memcg a particular cache
>>> belongs to, because <cgroup-kmem-id> is not known to the user. Since
>>> using absolute cgroup names would be an overkill, let's fix this by
>>> exporting the id of kmem-active memcg via cgroup fs file
>>> "memory.kmem.id".
>>>
>> Hmm, I'm not sure exporting additional information is the best way to do
>> it only for this purpose.  I do understand the problem in naming
>> collisions if the hierarchy isn't flat and we typically work around that
>> by ensuring child memcgs still have a unique memcg.  This isn't only a
>> problem in slab cache naming, me also avoid printing the entire absolute
>> names for things like the oom killer.
>
> AFAIU, cgroup identifiers dumped on oom (cgroup paths, currently) and
> memcg slab cache names serve for different purposes. The point is oom is
> a perfectly normal situation for the kernel, and info dumped to dmesg is
> for admin to find out the cause of the problem (a greedy user or
> cgroup). On the other hand, slab cache names are dumped to dmesg only on
> extraordinary situations - like bugs in slab implementation, or double
> free, or detected memory leaks - where we usually do not need the name
> of the memcg that triggered the problem, because the bug is likely to be
> in the kernel subsys using the cache. Plus, the names are exported to
> sysfs in case of slub, again for debugging purposes, AFAIK. So IMO the
> use cases for oom vs slab names are completely different - information
> vs debugging - and I want to export kmem.id only for the ability of
> debugging kmemcg and slab subsystems.
>

Then maybe it is better to wrap it into some kind of CONFIG_DEBUG wrap.
We already have other files like that.

>> So it would be nice to have
>> consensus on how people are supposed to identify memcgs with a hierarchy:
>> either by exporting information like the id like you do here (but leave
>> the oom killer still problematic) or by insisting people name their memcgs
>> with unique names if they care to differentiate them.
>
> Anyway, I agree with you that this needs a consensus, because this is a
> functional change.
>
> Thanks.



-- 
E Mare, Libertas

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/8] memcg: export kmemcg cache id via cgroup fs
  2014-02-03  6:57     ` Vladimir Davydov
  2014-02-03  7:19       ` Vladimir Davydov
  2014-02-03 10:05       ` Glauber Costa
@ 2014-02-03 11:04       ` David Rientjes
  2014-02-03 13:00         ` Vladimir Davydov
  2014-02-04 14:44       ` Michal Hocko
  3 siblings, 1 reply; 33+ messages in thread
From: David Rientjes @ 2014-02-03 11:04 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: akpm, mhocko, penberg, cl, glommer, linux-mm, linux-kernel, devel

On Mon, 3 Feb 2014, Vladimir Davydov wrote:

> AFAIU, cgroup identifiers dumped on oom (cgroup paths, currently) and
> memcg slab cache names serve for different purposes.

Sure, you may dump the name for a number of legitimate reasons, but the 
problem still exists that it's difficult to determine what memcg is being 
referenced without a flat hierarchy and unique memcg names for all 
children.

> The point is oom is
> a perfectly normal situation for the kernel, and info dumped to dmesg is
> for admin to find out the cause of the problem (a greedy user or
> cgroup).

Hmm, so if we hand out top-level memcgs to individual jobs or users, like 
our userspace does, and they are able to configure their child memcgs as 
they wish, and then they or the admin finds in the kernel log that a 
memory hog was killed from the memcg with the perfectly anonymous memcg 
name of "memcg", how do we determine what job or user triggered that kill?  
User id is not going to be conclusive in a production environment with 
shared user accounts.

> On the other hand, slab cache names are dumped to dmesg only on
> extraordinary situations - like bugs in slab implementation, or double
> free, or detected memory leaks - where we usually do not need the name
> of the memcg that triggered the problem, because the bug is likely to be
> in the kernel subsys using the cache.

There's certainly overlap here since slab leaks triggered by a particular 
workload, perhaps by usage of a particular syscall, can occur and cause 
oom killing but the problem remains that neither the memcg name nor the 
slab cache name may be conclusive to determine what job or user triggered 
the issue.  That's why we make strict demands that memcg names are always 
unique and encode several key values to identify the user and job and we 
don't rely on the parent.

I can also see the huge maintenance burden it would be to keep around a 
mapping of kmem ids to {user, job} pairs just in case we later identify a 
problem and in 99% of the cases would be just wasted storage.

> Plus, the names are exported to
> sysfs in case of slub, again for debugging purposes, AFAIK. So IMO the
> use cases for oom vs slab names are completely different - information
> vs debugging - and I want to export kmem.id only for the ability of
> debugging kmemcg and slab subsystems.
> 

Eeek, I'm not sure I agree.  I've often found that reproducing rare slab 
issues is very difficult without knowledge of the workload so that I can 
reproduce it.  Whereas X is a very large number of machines and we see 
this issue on 0.0001% of X machines, I would be required to enable this 
"debugging" aid unconditionally to ever be able to map the stored kmem id 
back to a user and job, that mapping would be extremely costly to 
maintain, and we've gained nothing if we had already demanded that 
userspace identify their memcg names with unique identifiers regardless of 
where they are in the hierarchy.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/8] memcg: export kmemcg cache id via cgroup fs
  2014-02-03 11:04       ` David Rientjes
@ 2014-02-03 13:00         ` Vladimir Davydov
  0 siblings, 0 replies; 33+ messages in thread
From: Vladimir Davydov @ 2014-02-03 13:00 UTC (permalink / raw)
  To: David Rientjes
  Cc: akpm, mhocko, penberg, cl, glommer, linux-mm, linux-kernel, devel

On 02/03/2014 03:04 PM, David Rientjes wrote:
> On Mon, 3 Feb 2014, Vladimir Davydov wrote:
>
>> AFAIU, cgroup identifiers dumped on oom (cgroup paths, currently) and
>> memcg slab cache names serve for different purposes.
> Sure, you may dump the name for a number of legitimate reasons, but the 
> problem still exists that it's difficult to determine what memcg is being 
> referenced without a flat hierarchy and unique memcg names for all 
> children.
>
>> The point is oom is
>> a perfectly normal situation for the kernel, and info dumped to dmesg is
>> for admin to find out the cause of the problem (a greedy user or
>> cgroup).
> Hmm, so if we hand out top-level memcgs to individual jobs or users, like 
> our userspace does, and they are able to configure their child memcgs as 
> they wish, and then they or the admin finds in the kernel log that a 
> memory hog was killed from the memcg with the perfectly anonymous memcg 
> name of "memcg", how do we determine what job or user triggered that kill?  
> User id is not going to be conclusive in a production environment with 
> shared user accounts.
>
>> On the other hand, slab cache names are dumped to dmesg only on
>> extraordinary situations - like bugs in slab implementation, or double
>> free, or detected memory leaks - where we usually do not need the name
>> of the memcg that triggered the problem, because the bug is likely to be
>> in the kernel subsys using the cache.
> There's certainly overlap here since slab leaks triggered by a particular 
> workload, perhaps by usage of a particular syscall, can occur and cause 
> oom killing but the problem remains that neither the memcg name nor the 
> slab cache name may be conclusive to determine what job or user triggered 
> the issue.  That's why we make strict demands that memcg names are always 
> unique and encode several key values to identify the user and job and we 
> don't rely on the parent.
>
> I can also see the huge maintenance burden it would be to keep around a 
> mapping of kmem ids to {user, job} pairs just in case we later identify a 
> problem and in 99% of the cases would be just wasted storage.
>
>> Plus, the names are exported to
>> sysfs in case of slub, again for debugging purposes, AFAIK. So IMO the
>> use cases for oom vs slab names are completely different - information
>> vs debugging - and I want to export kmem.id only for the ability of
>> debugging kmemcg and slab subsystems.
>>
> Eeek, I'm not sure I agree.  I've often found that reproducing rare slab 
> issues is very difficult without knowledge of the workload so that I can 
> reproduce it.  Whereas X is a very large number of machines and we see 
> this issue on 0.0001% of X machines, I would be required to enable this 
> "debugging" aid unconditionally to ever be able to map the stored kmem id 
> back to a user and job, that mapping would be extremely costly to 
> maintain, and we've gained nothing if we had already demanded that 
> userspace identify their memcg names with unique identifiers regardless of 
> where they are in the hierarchy.

I see your point, and it sounds quite reasonable to me. So I guess I'll
drop the patch removing the cgroup name part from slab cache names
(patch 2) and resend.

Thanks.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/8] memcg: export kmemcg cache id via cgroup fs
  2014-02-03 10:05       ` Glauber Costa
@ 2014-02-03 13:01         ` Vladimir Davydov
  0 siblings, 0 replies; 33+ messages in thread
From: Vladimir Davydov @ 2014-02-03 13:01 UTC (permalink / raw)
  To: Glauber Costa
  Cc: David Rientjes, Andrew Morton, Michal Hocko, Pekka Enberg,
	Christoph Lameter, linux-mm, LKML, devel

On 02/03/2014 02:05 PM, Glauber Costa wrote:
> On Mon, Feb 3, 2014 at 10:57 AM, Vladimir Davydov
> <vdavydov@parallels.com> wrote:
>> On 02/03/2014 10:21 AM, David Rientjes wrote:
>>> On Sun, 2 Feb 2014, Vladimir Davydov wrote:
>>>
>>>> Per-memcg kmem caches are named as follows:
>>>>
>>>>   <global-cache-name>(<cgroup-kmem-id>:<cgroup-name>)
>>>>
>>>> where <cgroup-kmem-id> is the unique id of the memcg the cache belongs
>>>> to, <cgroup-name> is the relative name of the memcg on the cgroup fs.
>>>> Cache names are exposed to userspace for debugging purposes (e.g. via
>>>> sysfs in case of slub or via dmesg).
>>>>
>>>> Using relative names makes it impossible in general (in case the cgroup
>>>> hierarchy is not flat) to find out which memcg a particular cache
>>>> belongs to, because <cgroup-kmem-id> is not known to the user. Since
>>>> using absolute cgroup names would be an overkill, let's fix this by
>>>> exporting the id of kmem-active memcg via cgroup fs file
>>>> "memory.kmem.id".
>>>>
>>> Hmm, I'm not sure exporting additional information is the best way to do
>>> it only for this purpose.  I do understand the problem in naming
>>> collisions if the hierarchy isn't flat and we typically work around that
>>> by ensuring child memcgs still have a unique memcg.  This isn't only a
>>> problem in slab cache naming, me also avoid printing the entire absolute
>>> names for things like the oom killer.
>> AFAIU, cgroup identifiers dumped on oom (cgroup paths, currently) and
>> memcg slab cache names serve for different purposes. The point is oom is
>> a perfectly normal situation for the kernel, and info dumped to dmesg is
>> for admin to find out the cause of the problem (a greedy user or
>> cgroup). On the other hand, slab cache names are dumped to dmesg only on
>> extraordinary situations - like bugs in slab implementation, or double
>> free, or detected memory leaks - where we usually do not need the name
>> of the memcg that triggered the problem, because the bug is likely to be
>> in the kernel subsys using the cache. Plus, the names are exported to
>> sysfs in case of slub, again for debugging purposes, AFAIK. So IMO the
>> use cases for oom vs slab names are completely different - information
>> vs debugging - and I want to export kmem.id only for the ability of
>> debugging kmemcg and slab subsystems.
>>
> Then maybe it is better to wrap it into some kind of CONFIG_DEBUG wrap.
> We already have other files like that.

May be. However, kmemcg ids are actually exposed to userspace even on
non-debug kernels (for instance, through /sys/kernel/slub), so I guess
it's worth having this always enabled - the overhead of this is
negligible anyway.

Thanks.

>
>>> So it would be nice to have
>>> consensus on how people are supposed to identify memcgs with a hierarchy:
>>> either by exporting information like the id like you do here (but leave
>>> the oom killer still problematic) or by insisting people name their memcgs
>>> with unique names if they care to differentiate them.
>> Anyway, I agree with you that this needs a consensus, because this is a
>> functional change.
>>
>> Thanks.
>
>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/8] memcg: export kmemcg cache id via cgroup fs
  2014-02-02 16:33 ` [PATCH 1/8] memcg: export kmemcg cache id via cgroup fs Vladimir Davydov
  2014-02-03  6:21   ` David Rientjes
@ 2014-02-04 14:40   ` Michal Hocko
  2014-02-04 14:49     ` Vladimir Davydov
  1 sibling, 1 reply; 33+ messages in thread
From: Michal Hocko @ 2014-02-04 14:40 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: akpm, rientjes, penberg, cl, glommer, linux-mm, linux-kernel, devel

On Sun 02-02-14 20:33:46, Vladimir Davydov wrote:
> Per-memcg kmem caches are named as follows:
> 
>   <global-cache-name>(<cgroup-kmem-id>:<cgroup-name>)
> 
> where <cgroup-kmem-id> is the unique id of the memcg the cache belongs
> to, <cgroup-name> is the relative name of the memcg on the cgroup fs.
> Cache names are exposed to userspace for debugging purposes (e.g. via
> sysfs in case of slub or via dmesg).

If this is only for debugging purposes then it shouldn't pollute regular
memcg cgroupfs namespace.

> Using relative names makes it impossible in general (in case the cgroup
> hierarchy is not flat) to find out which memcg a particular cache
> belongs to, because <cgroup-kmem-id> is not known to the user. Since
> using absolute cgroup names would be an overkill,

I do not consider it an overkill. We are using the full path when
dumping OOM information so I do not see any reason this should be any
different.

> let's fix this by
> exporting the id of kmem-active memcg via cgroup fs file
> "memory.kmem.id".
> 
> Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>

Nacked-by: Michal Hocko <mhocko@suse.cz>

> ---
>  mm/memcontrol.c |   12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 53385cd4e6f0..91d242707404 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3113,6 +3113,14 @@ int memcg_cache_id(struct mem_cgroup *memcg)
>  	return memcg ? memcg->kmemcg_id : -1;
>  }
>  
> +static s64 mem_cgroup_cache_id_read(struct cgroup_subsys_state *css,
> +				    struct cftype *cft)
> +{
> +	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
> +
> +	return memcg_can_account_kmem(memcg) ? memcg_cache_id(memcg) : -1;
> +}
> +
>  static size_t memcg_caches_array_size(int num_groups)
>  {
>  	ssize_t size;
> @@ -6301,6 +6309,10 @@ static struct cftype mem_cgroup_files[] = {
>  #endif
>  #ifdef CONFIG_MEMCG_KMEM
>  	{
> +		.name = "kmem.id",
> +		.read_s64 = mem_cgroup_cache_id_read,
> +	},
> +	{
>  		.name = "kmem.limit_in_bytes",
>  		.private = MEMFILE_PRIVATE(_KMEM, RES_LIMIT),
>  		.write_string = mem_cgroup_write,
> -- 
> 1.7.10.4
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/8] memcg: export kmemcg cache id via cgroup fs
  2014-02-03  6:57     ` Vladimir Davydov
                         ` (2 preceding siblings ...)
  2014-02-03 11:04       ` David Rientjes
@ 2014-02-04 14:44       ` Michal Hocko
  3 siblings, 0 replies; 33+ messages in thread
From: Michal Hocko @ 2014-02-04 14:44 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: David Rientjes, akpm, penberg, cl, glommer, linux-mm,
	linux-kernel, devel

On Mon 03-02-14 10:57:03, Vladimir Davydov wrote:
> On 02/03/2014 10:21 AM, David Rientjes wrote:
> > On Sun, 2 Feb 2014, Vladimir Davydov wrote:
> >
> >> Per-memcg kmem caches are named as follows:
> >>
> >>   <global-cache-name>(<cgroup-kmem-id>:<cgroup-name>)
> >>
> >> where <cgroup-kmem-id> is the unique id of the memcg the cache belongs
> >> to, <cgroup-name> is the relative name of the memcg on the cgroup fs.
> >> Cache names are exposed to userspace for debugging purposes (e.g. via
> >> sysfs in case of slub or via dmesg).
> >>
> >> Using relative names makes it impossible in general (in case the cgroup
> >> hierarchy is not flat) to find out which memcg a particular cache
> >> belongs to, because <cgroup-kmem-id> is not known to the user. Since
> >> using absolute cgroup names would be an overkill, let's fix this by
> >> exporting the id of kmem-active memcg via cgroup fs file
> >> "memory.kmem.id".
> >>
> > Hmm, I'm not sure exporting additional information is the best way to do 
> > it only for this purpose.  I do understand the problem in naming 
> > collisions if the hierarchy isn't flat and we typically work around that 
> > by ensuring child memcgs still have a unique memcg.  This isn't only a 
> > problem in slab cache naming, me also avoid printing the entire absolute 
> > names for things like the oom killer.
> 
> AFAIU, cgroup identifiers dumped on oom (cgroup paths, currently) and
> memcg slab cache names serve for different purposes. The point is oom is
> a perfectly normal situation for the kernel, and info dumped to dmesg is
> for admin to find out the cause of the problem (a greedy user or
> cgroup). On the other hand, slab cache names are dumped to dmesg only on
> extraordinary situations - like bugs in slab implementation, or double
> free, or detected memory leaks - where we usually do not need the name
> of the memcg that triggered the problem, because the bug is likely to be
> in the kernel subsys using the cache. Plus, the names are exported to
> sysfs in case of slub, again for debugging purposes, AFAIK. So IMO the
> use cases for oom vs slab names are completely different - information
> vs debugging - and I want to export kmem.id only for the ability of
> debugging kmemcg and slab subsystems.

I am really puzzled now. Why do you want to export the id/name then? If
the source memcg is not relevant? I would understand if you tried to
reduce your bug search place by the load which runs in the particular
memcg.

> > So it would be nice to have 
> > consensus on how people are supposed to identify memcgs with a hierarchy: 
> > either by exporting information like the id like you do here (but leave 
> > the oom killer still problematic) or by insisting people name their memcgs 
> > with unique names if they care to differentiate them.
> 
> Anyway, I agree with you that this needs a consensus, because this is a
> functional change.

I am for the full path same as we do when we dump oom information. This
is much easier to consume.

> Thanks.
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/8] memcg, slab: remove cgroup name from memcg cache names
  2014-02-02 16:33 ` [PATCH 2/8] memcg, slab: remove cgroup name from memcg cache names Vladimir Davydov
@ 2014-02-04 14:45   ` Michal Hocko
  2014-02-04 15:11     ` Vladimir Davydov
  0 siblings, 1 reply; 33+ messages in thread
From: Michal Hocko @ 2014-02-04 14:45 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: akpm, rientjes, penberg, cl, glommer, linux-mm, linux-kernel, devel

On Sun 02-02-14 20:33:47, Vladimir Davydov wrote:
> The cgroup name is not informative at all in case the cgroup hierarchy
> is not flat. Besides, we can always find the memcg a particular cache
> belongs to by its kmemcg id, which is now exported via "memory.kmem.id"
> cgroup fs file for each memcg.
> 
> So let's remove the cgroup name part from kmem caches names - it will
> greatly simplify the call paths and make the code look clearer.
> 
> Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>

I guess this one doesn't make much sense withou 1/8, right?

> ---
>  mm/memcontrol.c  |   63 +++++++++++++-----------------------------------------
>  mm/slab_common.c |    6 +++++-
>  2 files changed, 20 insertions(+), 49 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 91d242707404..3351c5b5486d 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3405,44 +3405,6 @@ void mem_cgroup_destroy_cache(struct kmem_cache *cachep)
>  	schedule_work(&cachep->memcg_params->destroy);
>  }
>  
> -static struct kmem_cache *memcg_create_kmem_cache(struct mem_cgroup *memcg,
> -						  struct kmem_cache *s)
> -{
> -	struct kmem_cache *new = NULL;
> -	static char *tmp_name = NULL;
> -	static DEFINE_MUTEX(mutex);	/* protects tmp_name */
> -
> -	BUG_ON(!memcg_can_account_kmem(memcg));
> -
> -	mutex_lock(&mutex);
> -	/*
> -	 * kmem_cache_create_memcg duplicates the given name and
> -	 * cgroup_name for this name requires RCU context.
> -	 * This static temporary buffer is used to prevent from
> -	 * pointless shortliving allocation.
> -	 */
> -	if (!tmp_name) {
> -		tmp_name = kmalloc(PATH_MAX, GFP_KERNEL);
> -		if (!tmp_name)
> -			goto out;
> -	}
> -
> -	rcu_read_lock();
> -	snprintf(tmp_name, PATH_MAX, "%s(%d:%s)", s->name,
> -			 memcg_cache_id(memcg), cgroup_name(memcg->css.cgroup));
> -	rcu_read_unlock();
> -
> -	new = kmem_cache_create_memcg(memcg, tmp_name, s->object_size, s->align,
> -				      (s->flags & ~SLAB_PANIC), s->ctor, s);
> -	if (new)
> -		new->allocflags |= __GFP_KMEMCG;
> -	else
> -		new = s;
> -out:
> -	mutex_unlock(&mutex);
> -	return new;
> -}
> -
>  void kmem_cache_destroy_memcg_children(struct kmem_cache *s)
>  {
>  	struct kmem_cache *c;
> @@ -3489,12 +3451,6 @@ void kmem_cache_destroy_memcg_children(struct kmem_cache *s)
>  	mutex_unlock(&activate_kmem_mutex);
>  }
>  
> -struct create_work {
> -	struct mem_cgroup *memcg;
> -	struct kmem_cache *cachep;
> -	struct work_struct work;
> -};
> -
>  static void mem_cgroup_destroy_all_caches(struct mem_cgroup *memcg)
>  {
>  	struct kmem_cache *cachep;
> @@ -3512,13 +3468,24 @@ static void mem_cgroup_destroy_all_caches(struct mem_cgroup *memcg)
>  	mutex_unlock(&memcg->slab_caches_mutex);
>  }
>  
> +struct create_work {
> +	struct mem_cgroup *memcg;
> +	struct kmem_cache *cachep;
> +	struct work_struct work;
> +};
> +
>  static void memcg_create_cache_work_func(struct work_struct *w)
>  {
> -	struct create_work *cw;
> +	struct create_work *cw = container_of(w, struct create_work, work);
> +	struct mem_cgroup *memcg = cw->memcg;
> +	struct kmem_cache *s = cw->cachep;
> +	struct kmem_cache *new;
>  
> -	cw = container_of(w, struct create_work, work);
> -	memcg_create_kmem_cache(cw->memcg, cw->cachep);
> -	css_put(&cw->memcg->css);
> +	new = kmem_cache_create_memcg(memcg, s->name, s->object_size, s->align,
> +				      (s->flags & ~SLAB_PANIC), s->ctor, s);
> +	if (new)
> +		new->allocflags |= __GFP_KMEMCG;
> +	css_put(&memcg->css);
>  	kfree(cw);
>  }
>  
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 1ec3c619ba04..152d9b118b7a 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -213,7 +213,11 @@ kmem_cache_create_memcg(struct mem_cgroup *memcg, const char *name, size_t size,
>  	s->align = calculate_alignment(flags, align, size);
>  	s->ctor = ctor;
>  
> -	s->name = kstrdup(name, GFP_KERNEL);
> +	if (!memcg)
> +		s->name = kstrdup(name, GFP_KERNEL);
> +	else
> +		s->name = kasprintf(GFP_KERNEL, "%s:%d",
> +				    name, memcg_cache_id(memcg));
>  	if (!s->name)
>  		goto out_free_cache;
>  
> -- 
> 1.7.10.4
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/8] memcg: export kmemcg cache id via cgroup fs
  2014-02-04 14:40   ` Michal Hocko
@ 2014-02-04 14:49     ` Vladimir Davydov
  0 siblings, 0 replies; 33+ messages in thread
From: Vladimir Davydov @ 2014-02-04 14:49 UTC (permalink / raw)
  To: Michal Hocko
  Cc: akpm, rientjes, penberg, cl, glommer, linux-mm, linux-kernel, devel

On 02/04/2014 06:40 PM, Michal Hocko wrote:
> On Sun 02-02-14 20:33:46, Vladimir Davydov wrote:
>> Per-memcg kmem caches are named as follows:
>>
>>   <global-cache-name>(<cgroup-kmem-id>:<cgroup-name>)
>>
>> where <cgroup-kmem-id> is the unique id of the memcg the cache belongs
>> to, <cgroup-name> is the relative name of the memcg on the cgroup fs.
>> Cache names are exposed to userspace for debugging purposes (e.g. via
>> sysfs in case of slub or via dmesg).
> If this is only for debugging purposes then it shouldn't pollute regular
> memcg cgroupfs namespace.
>
>> Using relative names makes it impossible in general (in case the cgroup
>> hierarchy is not flat) to find out which memcg a particular cache
>> belongs to, because <cgroup-kmem-id> is not known to the user. Since
>> using absolute cgroup names would be an overkill,
> I do not consider it an overkill. We are using the full path when
> dumping OOM information so I do not see any reason this should be any
> different.

When we dump information, we simply print the cgroup path to dmesg and
forget about it while the memcg cache's name will leave at least until
the memcg is destroyed. Basically that means PATH_MAX *
NR_KMEM_ACTIVE_MEMCGS * NR_SLAB_CACHES memory overhead at max.

Anyway, I decided to drop this patch, so please see version 2 of this
patchset (you must be in CC):

https://lkml.org/lkml/2014/2/3/268

Thanks.

>
>> let's fix this by
>> exporting the id of kmem-active memcg via cgroup fs file
>> "memory.kmem.id".
>>
>> Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
> Nacked-by: Michal Hocko <mhocko@suse.cz>
>
>> ---
>>  mm/memcontrol.c |   12 ++++++++++++
>>  1 file changed, 12 insertions(+)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 53385cd4e6f0..91d242707404 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -3113,6 +3113,14 @@ int memcg_cache_id(struct mem_cgroup *memcg)
>>  	return memcg ? memcg->kmemcg_id : -1;
>>  }
>>  
>> +static s64 mem_cgroup_cache_id_read(struct cgroup_subsys_state *css,
>> +				    struct cftype *cft)
>> +{
>> +	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
>> +
>> +	return memcg_can_account_kmem(memcg) ? memcg_cache_id(memcg) : -1;
>> +}
>> +
>>  static size_t memcg_caches_array_size(int num_groups)
>>  {
>>  	ssize_t size;
>> @@ -6301,6 +6309,10 @@ static struct cftype mem_cgroup_files[] = {
>>  #endif
>>  #ifdef CONFIG_MEMCG_KMEM
>>  	{
>> +		.name = "kmem.id",
>> +		.read_s64 = mem_cgroup_cache_id_read,
>> +	},
>> +	{
>>  		.name = "kmem.limit_in_bytes",
>>  		.private = MEMFILE_PRIVATE(_KMEM, RES_LIMIT),
>>  		.write_string = mem_cgroup_write,
>> -- 
>> 1.7.10.4
>>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/8] memcg, slab: never try to merge memcg caches
  2014-02-02 16:33 ` [PATCH 3/8] memcg, slab: never try to merge memcg caches Vladimir Davydov
@ 2014-02-04 14:52   ` Michal Hocko
  2014-02-04 14:59     ` Vladimir Davydov
  0 siblings, 1 reply; 33+ messages in thread
From: Michal Hocko @ 2014-02-04 14:52 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: akpm, rientjes, penberg, cl, glommer, linux-mm, linux-kernel, devel

On Sun 02-02-14 20:33:48, Vladimir Davydov wrote:
> Suppose we are creating memcg cache A that could be merged with cache B
> of the same memcg. Since any memcg cache has the same parameters as its
> parent cache, parent caches PA and PB of memcg caches A and B must be
> mergeable too. That means PA was merged with PB on creation or vice
> versa, i.e. PA = PB. From that it follows that A = B, and we couldn't
> even try to create cache B, because it already exists - a contradiction.

I cannot tell I understand the above but I am totally not sure about the
statement bellow.

> So let's remove unused code responsible for merging memcg caches.

How come the code was unused? find_mergeable called cache_match_memcg...

> Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
> ---
>  mm/slab.h        |   21 ++++-----------------
>  mm/slab_common.c |    8 +++++---
>  mm/slub.c        |   19 +++++++++----------
>  3 files changed, 18 insertions(+), 30 deletions(-)
> 
> diff --git a/mm/slab.h b/mm/slab.h
> index 8184a7cde272..3045316b7c9d 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -55,12 +55,12 @@ extern void create_boot_cache(struct kmem_cache *, const char *name,
>  struct mem_cgroup;
>  #ifdef CONFIG_SLUB
>  struct kmem_cache *
> -__kmem_cache_alias(struct mem_cgroup *memcg, const char *name, size_t size,
> -		   size_t align, unsigned long flags, void (*ctor)(void *));
> +__kmem_cache_alias(const char *name, size_t size, size_t align,
> +		   unsigned long flags, void (*ctor)(void *));
>  #else
>  static inline struct kmem_cache *
> -__kmem_cache_alias(struct mem_cgroup *memcg, const char *name, size_t size,
> -		   size_t align, unsigned long flags, void (*ctor)(void *))
> +__kmem_cache_alias(const char *name, size_t size, size_t align,
> +		   unsigned long flags, void (*ctor)(void *))
>  { return NULL; }
>  #endif
>  
> @@ -119,13 +119,6 @@ static inline bool is_root_cache(struct kmem_cache *s)
>  	return !s->memcg_params || s->memcg_params->is_root_cache;
>  }
>  
> -static inline bool cache_match_memcg(struct kmem_cache *cachep,
> -				     struct mem_cgroup *memcg)
> -{
> -	return (is_root_cache(cachep) && !memcg) ||
> -				(cachep->memcg_params->memcg == memcg);
> -}
> -
>  static inline void memcg_bind_pages(struct kmem_cache *s, int order)
>  {
>  	if (!is_root_cache(s))
> @@ -204,12 +197,6 @@ static inline bool is_root_cache(struct kmem_cache *s)
>  	return true;
>  }
>  
> -static inline bool cache_match_memcg(struct kmem_cache *cachep,
> -				     struct mem_cgroup *memcg)
> -{
> -	return true;
> -}
> -
>  static inline void memcg_bind_pages(struct kmem_cache *s, int order)
>  {
>  }
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 152d9b118b7a..a75834bb966d 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -200,9 +200,11 @@ kmem_cache_create_memcg(struct mem_cgroup *memcg, const char *name, size_t size,
>  	 */
>  	flags &= CACHE_CREATE_MASK;
>  
> -	s = __kmem_cache_alias(memcg, name, size, align, flags, ctor);
> -	if (s)
> -		goto out_unlock;
> +	if (!memcg) {
> +		s = __kmem_cache_alias(name, size, align, flags, ctor);
> +		if (s)
> +			goto out_unlock;
> +	}
>  
>  	err = -ENOMEM;
>  	s = kmem_cache_zalloc(kmem_cache, GFP_KERNEL);
> diff --git a/mm/slub.c b/mm/slub.c
> index 2b1a6970e46f..962abfdfde06 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3686,9 +3686,8 @@ static int slab_unmergeable(struct kmem_cache *s)
>  	return 0;
>  }
>  
> -static struct kmem_cache *find_mergeable(struct mem_cgroup *memcg, size_t size,
> -		size_t align, unsigned long flags, const char *name,
> -		void (*ctor)(void *))
> +static struct kmem_cache *find_mergeable(size_t size, size_t align,
> +		unsigned long flags, const char *name, void (*ctor)(void *))
>  {
>  	struct kmem_cache *s;
>  
> @@ -3707,11 +3706,14 @@ static struct kmem_cache *find_mergeable(struct mem_cgroup *memcg, size_t size,
>  		if (slab_unmergeable(s))
>  			continue;
>  
> +		if (!is_root_cache(s))
> +			continue;
> +
>  		if (size > s->size)
>  			continue;
>  
>  		if ((flags & SLUB_MERGE_SAME) != (s->flags & SLUB_MERGE_SAME))
> -				continue;
> +			continue;
>  		/*
>  		 * Check if alignment is compatible.
>  		 * Courtesy of Adrian Drzewiecki
> @@ -3722,21 +3724,18 @@ static struct kmem_cache *find_mergeable(struct mem_cgroup *memcg, size_t size,
>  		if (s->size - size >= sizeof(void *))
>  			continue;
>  
> -		if (!cache_match_memcg(s, memcg))
> -			continue;
> -
>  		return s;
>  	}
>  	return NULL;
>  }
>  
>  struct kmem_cache *
> -__kmem_cache_alias(struct mem_cgroup *memcg, const char *name, size_t size,
> -		   size_t align, unsigned long flags, void (*ctor)(void *))
> +__kmem_cache_alias(const char *name, size_t size, size_t align,
> +		   unsigned long flags, void (*ctor)(void *))
>  {
>  	struct kmem_cache *s;
>  
> -	s = find_mergeable(memcg, size, align, flags, name, ctor);
> +	s = find_mergeable(size, align, flags, name, ctor);
>  	if (s) {
>  		s->refcount++;
>  		/*
> -- 
> 1.7.10.4
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/8] memcg, slab: never try to merge memcg caches
  2014-02-04 14:52   ` Michal Hocko
@ 2014-02-04 14:59     ` Vladimir Davydov
  2014-02-04 15:11       ` Michal Hocko
  0 siblings, 1 reply; 33+ messages in thread
From: Vladimir Davydov @ 2014-02-04 14:59 UTC (permalink / raw)
  To: Michal Hocko
  Cc: akpm, rientjes, penberg, cl, glommer, linux-mm, linux-kernel, devel

On 02/04/2014 06:52 PM, Michal Hocko wrote:
> On Sun 02-02-14 20:33:48, Vladimir Davydov wrote:
>> Suppose we are creating memcg cache A that could be merged with cache B
>> of the same memcg. Since any memcg cache has the same parameters as its
>> parent cache, parent caches PA and PB of memcg caches A and B must be
>> mergeable too. That means PA was merged with PB on creation or vice
>> versa, i.e. PA = PB. From that it follows that A = B, and we couldn't
>> even try to create cache B, because it already exists - a contradiction.
> I cannot tell I understand the above but I am totally not sure about the
> statement bellow.
>
>> So let's remove unused code responsible for merging memcg caches.
> How come the code was unused? find_mergeable called cache_match_memcg...

Oh, sorry for misleading comment. I mean the code handling merging of
per-memcg caches is useless, AFAIU: if we find an alias for a per-memcg
cache on kmem_cache_create_memcg(), the parent of the found alias must
be the same as the parent_cache passed to kmem_cache_create_memcg(), but
if it were so, we would never proceed to the memcg cache creation,
because the cache we want to create already exists.

Thanks.

>
>> Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
>> ---
>>  mm/slab.h        |   21 ++++-----------------
>>  mm/slab_common.c |    8 +++++---
>>  mm/slub.c        |   19 +++++++++----------
>>  3 files changed, 18 insertions(+), 30 deletions(-)
>>
>> diff --git a/mm/slab.h b/mm/slab.h
>> index 8184a7cde272..3045316b7c9d 100644
>> --- a/mm/slab.h
>> +++ b/mm/slab.h
>> @@ -55,12 +55,12 @@ extern void create_boot_cache(struct kmem_cache *, const char *name,
>>  struct mem_cgroup;
>>  #ifdef CONFIG_SLUB
>>  struct kmem_cache *
>> -__kmem_cache_alias(struct mem_cgroup *memcg, const char *name, size_t size,
>> -		   size_t align, unsigned long flags, void (*ctor)(void *));
>> +__kmem_cache_alias(const char *name, size_t size, size_t align,
>> +		   unsigned long flags, void (*ctor)(void *));
>>  #else
>>  static inline struct kmem_cache *
>> -__kmem_cache_alias(struct mem_cgroup *memcg, const char *name, size_t size,
>> -		   size_t align, unsigned long flags, void (*ctor)(void *))
>> +__kmem_cache_alias(const char *name, size_t size, size_t align,
>> +		   unsigned long flags, void (*ctor)(void *))
>>  { return NULL; }
>>  #endif
>>  
>> @@ -119,13 +119,6 @@ static inline bool is_root_cache(struct kmem_cache *s)
>>  	return !s->memcg_params || s->memcg_params->is_root_cache;
>>  }
>>  
>> -static inline bool cache_match_memcg(struct kmem_cache *cachep,
>> -				     struct mem_cgroup *memcg)
>> -{
>> -	return (is_root_cache(cachep) && !memcg) ||
>> -				(cachep->memcg_params->memcg == memcg);
>> -}
>> -
>>  static inline void memcg_bind_pages(struct kmem_cache *s, int order)
>>  {
>>  	if (!is_root_cache(s))
>> @@ -204,12 +197,6 @@ static inline bool is_root_cache(struct kmem_cache *s)
>>  	return true;
>>  }
>>  
>> -static inline bool cache_match_memcg(struct kmem_cache *cachep,
>> -				     struct mem_cgroup *memcg)
>> -{
>> -	return true;
>> -}
>> -
>>  static inline void memcg_bind_pages(struct kmem_cache *s, int order)
>>  {
>>  }
>> diff --git a/mm/slab_common.c b/mm/slab_common.c
>> index 152d9b118b7a..a75834bb966d 100644
>> --- a/mm/slab_common.c
>> +++ b/mm/slab_common.c
>> @@ -200,9 +200,11 @@ kmem_cache_create_memcg(struct mem_cgroup *memcg, const char *name, size_t size,
>>  	 */
>>  	flags &= CACHE_CREATE_MASK;
>>  
>> -	s = __kmem_cache_alias(memcg, name, size, align, flags, ctor);
>> -	if (s)
>> -		goto out_unlock;
>> +	if (!memcg) {
>> +		s = __kmem_cache_alias(name, size, align, flags, ctor);
>> +		if (s)
>> +			goto out_unlock;
>> +	}
>>  
>>  	err = -ENOMEM;
>>  	s = kmem_cache_zalloc(kmem_cache, GFP_KERNEL);
>> diff --git a/mm/slub.c b/mm/slub.c
>> index 2b1a6970e46f..962abfdfde06 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -3686,9 +3686,8 @@ static int slab_unmergeable(struct kmem_cache *s)
>>  	return 0;
>>  }
>>  
>> -static struct kmem_cache *find_mergeable(struct mem_cgroup *memcg, size_t size,
>> -		size_t align, unsigned long flags, const char *name,
>> -		void (*ctor)(void *))
>> +static struct kmem_cache *find_mergeable(size_t size, size_t align,
>> +		unsigned long flags, const char *name, void (*ctor)(void *))
>>  {
>>  	struct kmem_cache *s;
>>  
>> @@ -3707,11 +3706,14 @@ static struct kmem_cache *find_mergeable(struct mem_cgroup *memcg, size_t size,
>>  		if (slab_unmergeable(s))
>>  			continue;
>>  
>> +		if (!is_root_cache(s))
>> +			continue;
>> +
>>  		if (size > s->size)
>>  			continue;
>>  
>>  		if ((flags & SLUB_MERGE_SAME) != (s->flags & SLUB_MERGE_SAME))
>> -				continue;
>> +			continue;
>>  		/*
>>  		 * Check if alignment is compatible.
>>  		 * Courtesy of Adrian Drzewiecki
>> @@ -3722,21 +3724,18 @@ static struct kmem_cache *find_mergeable(struct mem_cgroup *memcg, size_t size,
>>  		if (s->size - size >= sizeof(void *))
>>  			continue;
>>  
>> -		if (!cache_match_memcg(s, memcg))
>> -			continue;
>> -
>>  		return s;
>>  	}
>>  	return NULL;
>>  }
>>  
>>  struct kmem_cache *
>> -__kmem_cache_alias(struct mem_cgroup *memcg, const char *name, size_t size,
>> -		   size_t align, unsigned long flags, void (*ctor)(void *))
>> +__kmem_cache_alias(const char *name, size_t size, size_t align,
>> +		   unsigned long flags, void (*ctor)(void *))
>>  {
>>  	struct kmem_cache *s;
>>  
>> -	s = find_mergeable(memcg, size, align, flags, name, ctor);
>> +	s = find_mergeable(size, align, flags, name, ctor);
>>  	if (s) {
>>  		s->refcount++;
>>  		/*
>> -- 
>> 1.7.10.4
>>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/8] memcg, slab: remove cgroup name from memcg cache names
  2014-02-04 14:45   ` Michal Hocko
@ 2014-02-04 15:11     ` Vladimir Davydov
  2014-02-04 15:13       ` Michal Hocko
  0 siblings, 1 reply; 33+ messages in thread
From: Vladimir Davydov @ 2014-02-04 15:11 UTC (permalink / raw)
  To: Michal Hocko
  Cc: akpm, rientjes, penberg, cl, glommer, linux-mm, linux-kernel, devel

On 02/04/2014 06:45 PM, Michal Hocko wrote:
> On Sun 02-02-14 20:33:47, Vladimir Davydov wrote:
>> The cgroup name is not informative at all in case the cgroup hierarchy
>> is not flat. Besides, we can always find the memcg a particular cache
>> belongs to by its kmemcg id, which is now exported via "memory.kmem.id"
>> cgroup fs file for each memcg.
>>
>> So let's remove the cgroup name part from kmem caches names - it will
>> greatly simplify the call paths and make the code look clearer.
>>
>> Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
> I guess this one doesn't make much sense withou 1/8, right?

Actually, the rest of the patchset does not depend on the logic of the
first two - they only have a couple of hunks overlapped. However, there
is v2 already.

Thanks.

>
>> ---
>>  mm/memcontrol.c  |   63 +++++++++++++-----------------------------------------
>>  mm/slab_common.c |    6 +++++-
>>  2 files changed, 20 insertions(+), 49 deletions(-)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 91d242707404..3351c5b5486d 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -3405,44 +3405,6 @@ void mem_cgroup_destroy_cache(struct kmem_cache *cachep)
>>  	schedule_work(&cachep->memcg_params->destroy);
>>  }
>>  
>> -static struct kmem_cache *memcg_create_kmem_cache(struct mem_cgroup *memcg,
>> -						  struct kmem_cache *s)
>> -{
>> -	struct kmem_cache *new = NULL;
>> -	static char *tmp_name = NULL;
>> -	static DEFINE_MUTEX(mutex);	/* protects tmp_name */
>> -
>> -	BUG_ON(!memcg_can_account_kmem(memcg));
>> -
>> -	mutex_lock(&mutex);
>> -	/*
>> -	 * kmem_cache_create_memcg duplicates the given name and
>> -	 * cgroup_name for this name requires RCU context.
>> -	 * This static temporary buffer is used to prevent from
>> -	 * pointless shortliving allocation.
>> -	 */
>> -	if (!tmp_name) {
>> -		tmp_name = kmalloc(PATH_MAX, GFP_KERNEL);
>> -		if (!tmp_name)
>> -			goto out;
>> -	}
>> -
>> -	rcu_read_lock();
>> -	snprintf(tmp_name, PATH_MAX, "%s(%d:%s)", s->name,
>> -			 memcg_cache_id(memcg), cgroup_name(memcg->css.cgroup));
>> -	rcu_read_unlock();
>> -
>> -	new = kmem_cache_create_memcg(memcg, tmp_name, s->object_size, s->align,
>> -				      (s->flags & ~SLAB_PANIC), s->ctor, s);
>> -	if (new)
>> -		new->allocflags |= __GFP_KMEMCG;
>> -	else
>> -		new = s;
>> -out:
>> -	mutex_unlock(&mutex);
>> -	return new;
>> -}
>> -
>>  void kmem_cache_destroy_memcg_children(struct kmem_cache *s)
>>  {
>>  	struct kmem_cache *c;
>> @@ -3489,12 +3451,6 @@ void kmem_cache_destroy_memcg_children(struct kmem_cache *s)
>>  	mutex_unlock(&activate_kmem_mutex);
>>  }
>>  
>> -struct create_work {
>> -	struct mem_cgroup *memcg;
>> -	struct kmem_cache *cachep;
>> -	struct work_struct work;
>> -};
>> -
>>  static void mem_cgroup_destroy_all_caches(struct mem_cgroup *memcg)
>>  {
>>  	struct kmem_cache *cachep;
>> @@ -3512,13 +3468,24 @@ static void mem_cgroup_destroy_all_caches(struct mem_cgroup *memcg)
>>  	mutex_unlock(&memcg->slab_caches_mutex);
>>  }
>>  
>> +struct create_work {
>> +	struct mem_cgroup *memcg;
>> +	struct kmem_cache *cachep;
>> +	struct work_struct work;
>> +};
>> +
>>  static void memcg_create_cache_work_func(struct work_struct *w)
>>  {
>> -	struct create_work *cw;
>> +	struct create_work *cw = container_of(w, struct create_work, work);
>> +	struct mem_cgroup *memcg = cw->memcg;
>> +	struct kmem_cache *s = cw->cachep;
>> +	struct kmem_cache *new;
>>  
>> -	cw = container_of(w, struct create_work, work);
>> -	memcg_create_kmem_cache(cw->memcg, cw->cachep);
>> -	css_put(&cw->memcg->css);
>> +	new = kmem_cache_create_memcg(memcg, s->name, s->object_size, s->align,
>> +				      (s->flags & ~SLAB_PANIC), s->ctor, s);
>> +	if (new)
>> +		new->allocflags |= __GFP_KMEMCG;
>> +	css_put(&memcg->css);
>>  	kfree(cw);
>>  }
>>  
>> diff --git a/mm/slab_common.c b/mm/slab_common.c
>> index 1ec3c619ba04..152d9b118b7a 100644
>> --- a/mm/slab_common.c
>> +++ b/mm/slab_common.c
>> @@ -213,7 +213,11 @@ kmem_cache_create_memcg(struct mem_cgroup *memcg, const char *name, size_t size,
>>  	s->align = calculate_alignment(flags, align, size);
>>  	s->ctor = ctor;
>>  
>> -	s->name = kstrdup(name, GFP_KERNEL);
>> +	if (!memcg)
>> +		s->name = kstrdup(name, GFP_KERNEL);
>> +	else
>> +		s->name = kasprintf(GFP_KERNEL, "%s:%d",
>> +				    name, memcg_cache_id(memcg));
>>  	if (!s->name)
>>  		goto out_free_cache;
>>  
>> -- 
>> 1.7.10.4
>>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/8] memcg, slab: never try to merge memcg caches
  2014-02-04 14:59     ` Vladimir Davydov
@ 2014-02-04 15:11       ` Michal Hocko
  2014-02-04 15:27         ` Vladimir Davydov
  0 siblings, 1 reply; 33+ messages in thread
From: Michal Hocko @ 2014-02-04 15:11 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: akpm, rientjes, penberg, cl, glommer, linux-mm, linux-kernel, devel

On Tue 04-02-14 18:59:23, Vladimir Davydov wrote:
> On 02/04/2014 06:52 PM, Michal Hocko wrote:
> > On Sun 02-02-14 20:33:48, Vladimir Davydov wrote:
> >> Suppose we are creating memcg cache A that could be merged with cache B
> >> of the same memcg. Since any memcg cache has the same parameters as its
> >> parent cache, parent caches PA and PB of memcg caches A and B must be
> >> mergeable too. That means PA was merged with PB on creation or vice
> >> versa, i.e. PA = PB. From that it follows that A = B, and we couldn't
> >> even try to create cache B, because it already exists - a contradiction.
> > I cannot tell I understand the above but I am totally not sure about the
> > statement bellow.
> >
> >> So let's remove unused code responsible for merging memcg caches.
> > How come the code was unused? find_mergeable called cache_match_memcg...
> 
> Oh, sorry for misleading comment. I mean the code handling merging of
> per-memcg caches is useless, AFAIU: if we find an alias for a per-memcg
> cache on kmem_cache_create_memcg(), the parent of the found alias must
> be the same as the parent_cache passed to kmem_cache_create_memcg(), but
> if it were so, we would never proceed to the memcg cache creation,
> because the cache we want to create already exists.

I am still not sure I understand this correctly. So the outcome of this
patch is that compatible caches of different memcgs can be merged
together? Sorry if this is a stupid question but I am not that familiar
with this area much I am just seeing that cache_match_memcg goes away
and my understanding of the function is that it should prevent from
different memcg's caches merging.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/8] memcg, slab: remove cgroup name from memcg cache names
  2014-02-04 15:11     ` Vladimir Davydov
@ 2014-02-04 15:13       ` Michal Hocko
  0 siblings, 0 replies; 33+ messages in thread
From: Michal Hocko @ 2014-02-04 15:13 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: akpm, rientjes, penberg, cl, glommer, linux-mm, linux-kernel, devel

On Tue 04-02-14 19:11:24, Vladimir Davydov wrote:
> On 02/04/2014 06:45 PM, Michal Hocko wrote:
> > On Sun 02-02-14 20:33:47, Vladimir Davydov wrote:
> >> The cgroup name is not informative at all in case the cgroup hierarchy
> >> is not flat. Besides, we can always find the memcg a particular cache
> >> belongs to by its kmemcg id, which is now exported via "memory.kmem.id"
> >> cgroup fs file for each memcg.
> >>
> >> So let's remove the cgroup name part from kmem caches names - it will
> >> greatly simplify the call paths and make the code look clearer.
> >>
> >> Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
> > I guess this one doesn't make much sense withou 1/8, right?
> 
> Actually, the rest of the patchset does not depend on the logic of the
> first two - they only have a couple of hunks overlapped. However, there
> is v2 already.

OK, I am already switched to v2 (you are too fast...). I've just noticed
that this version has generated quite some discussion so I felt I should
react here first.

> 
> Thanks.
> 
> >
> >> ---
> >>  mm/memcontrol.c  |   63 +++++++++++++-----------------------------------------
> >>  mm/slab_common.c |    6 +++++-
> >>  2 files changed, 20 insertions(+), 49 deletions(-)
> >>
> >> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> >> index 91d242707404..3351c5b5486d 100644
> >> --- a/mm/memcontrol.c
> >> +++ b/mm/memcontrol.c
> >> @@ -3405,44 +3405,6 @@ void mem_cgroup_destroy_cache(struct kmem_cache *cachep)
> >>  	schedule_work(&cachep->memcg_params->destroy);
> >>  }
> >>  
> >> -static struct kmem_cache *memcg_create_kmem_cache(struct mem_cgroup *memcg,
> >> -						  struct kmem_cache *s)
> >> -{
> >> -	struct kmem_cache *new = NULL;
> >> -	static char *tmp_name = NULL;
> >> -	static DEFINE_MUTEX(mutex);	/* protects tmp_name */
> >> -
> >> -	BUG_ON(!memcg_can_account_kmem(memcg));
> >> -
> >> -	mutex_lock(&mutex);
> >> -	/*
> >> -	 * kmem_cache_create_memcg duplicates the given name and
> >> -	 * cgroup_name for this name requires RCU context.
> >> -	 * This static temporary buffer is used to prevent from
> >> -	 * pointless shortliving allocation.
> >> -	 */
> >> -	if (!tmp_name) {
> >> -		tmp_name = kmalloc(PATH_MAX, GFP_KERNEL);
> >> -		if (!tmp_name)
> >> -			goto out;
> >> -	}
> >> -
> >> -	rcu_read_lock();
> >> -	snprintf(tmp_name, PATH_MAX, "%s(%d:%s)", s->name,
> >> -			 memcg_cache_id(memcg), cgroup_name(memcg->css.cgroup));
> >> -	rcu_read_unlock();
> >> -
> >> -	new = kmem_cache_create_memcg(memcg, tmp_name, s->object_size, s->align,
> >> -				      (s->flags & ~SLAB_PANIC), s->ctor, s);
> >> -	if (new)
> >> -		new->allocflags |= __GFP_KMEMCG;
> >> -	else
> >> -		new = s;
> >> -out:
> >> -	mutex_unlock(&mutex);
> >> -	return new;
> >> -}
> >> -
> >>  void kmem_cache_destroy_memcg_children(struct kmem_cache *s)
> >>  {
> >>  	struct kmem_cache *c;
> >> @@ -3489,12 +3451,6 @@ void kmem_cache_destroy_memcg_children(struct kmem_cache *s)
> >>  	mutex_unlock(&activate_kmem_mutex);
> >>  }
> >>  
> >> -struct create_work {
> >> -	struct mem_cgroup *memcg;
> >> -	struct kmem_cache *cachep;
> >> -	struct work_struct work;
> >> -};
> >> -
> >>  static void mem_cgroup_destroy_all_caches(struct mem_cgroup *memcg)
> >>  {
> >>  	struct kmem_cache *cachep;
> >> @@ -3512,13 +3468,24 @@ static void mem_cgroup_destroy_all_caches(struct mem_cgroup *memcg)
> >>  	mutex_unlock(&memcg->slab_caches_mutex);
> >>  }
> >>  
> >> +struct create_work {
> >> +	struct mem_cgroup *memcg;
> >> +	struct kmem_cache *cachep;
> >> +	struct work_struct work;
> >> +};
> >> +
> >>  static void memcg_create_cache_work_func(struct work_struct *w)
> >>  {
> >> -	struct create_work *cw;
> >> +	struct create_work *cw = container_of(w, struct create_work, work);
> >> +	struct mem_cgroup *memcg = cw->memcg;
> >> +	struct kmem_cache *s = cw->cachep;
> >> +	struct kmem_cache *new;
> >>  
> >> -	cw = container_of(w, struct create_work, work);
> >> -	memcg_create_kmem_cache(cw->memcg, cw->cachep);
> >> -	css_put(&cw->memcg->css);
> >> +	new = kmem_cache_create_memcg(memcg, s->name, s->object_size, s->align,
> >> +				      (s->flags & ~SLAB_PANIC), s->ctor, s);
> >> +	if (new)
> >> +		new->allocflags |= __GFP_KMEMCG;
> >> +	css_put(&memcg->css);
> >>  	kfree(cw);
> >>  }
> >>  
> >> diff --git a/mm/slab_common.c b/mm/slab_common.c
> >> index 1ec3c619ba04..152d9b118b7a 100644
> >> --- a/mm/slab_common.c
> >> +++ b/mm/slab_common.c
> >> @@ -213,7 +213,11 @@ kmem_cache_create_memcg(struct mem_cgroup *memcg, const char *name, size_t size,
> >>  	s->align = calculate_alignment(flags, align, size);
> >>  	s->ctor = ctor;
> >>  
> >> -	s->name = kstrdup(name, GFP_KERNEL);
> >> +	if (!memcg)
> >> +		s->name = kstrdup(name, GFP_KERNEL);
> >> +	else
> >> +		s->name = kasprintf(GFP_KERNEL, "%s:%d",
> >> +				    name, memcg_cache_id(memcg));
> >>  	if (!s->name)
> >>  		goto out_free_cache;
> >>  
> >> -- 
> >> 1.7.10.4
> >>
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/8] memcg, slab: never try to merge memcg caches
  2014-02-04 15:11       ` Michal Hocko
@ 2014-02-04 15:27         ` Vladimir Davydov
  2014-02-04 15:43           ` Glauber Costa
  2014-02-06 14:07           ` Michal Hocko
  0 siblings, 2 replies; 33+ messages in thread
From: Vladimir Davydov @ 2014-02-04 15:27 UTC (permalink / raw)
  To: Michal Hocko
  Cc: akpm, rientjes, penberg, cl, glommer, linux-mm, linux-kernel, devel

On 02/04/2014 07:11 PM, Michal Hocko wrote:
> On Tue 04-02-14 18:59:23, Vladimir Davydov wrote:
>> On 02/04/2014 06:52 PM, Michal Hocko wrote:
>>> On Sun 02-02-14 20:33:48, Vladimir Davydov wrote:
>>>> Suppose we are creating memcg cache A that could be merged with cache B
>>>> of the same memcg. Since any memcg cache has the same parameters as its
>>>> parent cache, parent caches PA and PB of memcg caches A and B must be
>>>> mergeable too. That means PA was merged with PB on creation or vice
>>>> versa, i.e. PA = PB. From that it follows that A = B, and we couldn't
>>>> even try to create cache B, because it already exists - a contradiction.
>>> I cannot tell I understand the above but I am totally not sure about the
>>> statement bellow.
>>>
>>>> So let's remove unused code responsible for merging memcg caches.
>>> How come the code was unused? find_mergeable called cache_match_memcg...
>> Oh, sorry for misleading comment. I mean the code handling merging of
>> per-memcg caches is useless, AFAIU: if we find an alias for a per-memcg
>> cache on kmem_cache_create_memcg(), the parent of the found alias must
>> be the same as the parent_cache passed to kmem_cache_create_memcg(), but
>> if it were so, we would never proceed to the memcg cache creation,
>> because the cache we want to create already exists.
> I am still not sure I understand this correctly. So the outcome of this
> patch is that compatible caches of different memcgs can be merged
> together? Sorry if this is a stupid question but I am not that familiar
> with this area much I am just seeing that cache_match_memcg goes away
> and my understanding of the function is that it should prevent from
> different memcg's caches merging.

Let me try to explain how I understand it.

What is cache merging/aliasing? When we create a cache
(kmem_cache_create()), we first try to find a compatible cache that
already exists and can handle requests from the new cache. If it is, we
do not create any new caches, instead we simply increment the old cache
refcount and return it.

What about memcgs? Currently, it operates in the same way, i.e. on memcg
cache creation we also try to find a compatible cache of the same memcg
first. But if there were such a cache, they parents would have been
merged (i.e. it would be the same cache). That means we would not even
get to this memcg cache creation, because it already exists. That's why
the code handling memcg caches merging seems pointless to me.

What does this patch change? Actually, it introduces no functional
changes - it only remove the code trying to find an alias for a memcg
cache, because it will fail anyway. So this is rather a cleanup.

Thanks.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/8] memcg, slab: never try to merge memcg caches
  2014-02-04 15:27         ` Vladimir Davydov
@ 2014-02-04 15:43           ` Glauber Costa
  2014-02-04 16:04             ` Vladimir Davydov
  2014-02-06 14:07           ` Michal Hocko
  1 sibling, 1 reply; 33+ messages in thread
From: Glauber Costa @ 2014-02-04 15:43 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: Michal Hocko, Andrew Morton, David Rientjes, Pekka Enberg,
	Christoph Lameter, linux-mm, LKML, devel

On Tue, Feb 4, 2014 at 7:27 PM, Vladimir Davydov <vdavydov@parallels.com> wrote:
> On 02/04/2014 07:11 PM, Michal Hocko wrote:
>> On Tue 04-02-14 18:59:23, Vladimir Davydov wrote:
>>> On 02/04/2014 06:52 PM, Michal Hocko wrote:
>>>> On Sun 02-02-14 20:33:48, Vladimir Davydov wrote:
>>>>> Suppose we are creating memcg cache A that could be merged with cache B
>>>>> of the same memcg. Since any memcg cache has the same parameters as its
>>>>> parent cache, parent caches PA and PB of memcg caches A and B must be
>>>>> mergeable too. That means PA was merged with PB on creation or vice
>>>>> versa, i.e. PA = PB. From that it follows that A = B, and we couldn't
>>>>> even try to create cache B, because it already exists - a contradiction.
>>>> I cannot tell I understand the above but I am totally not sure about the
>>>> statement bellow.
>>>>
>>>>> So let's remove unused code responsible for merging memcg caches.
>>>> How come the code was unused? find_mergeable called cache_match_memcg...
>>> Oh, sorry for misleading comment. I mean the code handling merging of
>>> per-memcg caches is useless, AFAIU: if we find an alias for a per-memcg
>>> cache on kmem_cache_create_memcg(), the parent of the found alias must
>>> be the same as the parent_cache passed to kmem_cache_create_memcg(), but
>>> if it were so, we would never proceed to the memcg cache creation,
>>> because the cache we want to create already exists.
>> I am still not sure I understand this correctly. So the outcome of this
>> patch is that compatible caches of different memcgs can be merged
>> together? Sorry if this is a stupid question but I am not that familiar
>> with this area much I am just seeing that cache_match_memcg goes away
>> and my understanding of the function is that it should prevent from
>> different memcg's caches merging.
>
> Let me try to explain how I understand it.
>
> What is cache merging/aliasing? When we create a cache
> (kmem_cache_create()), we first try to find a compatible cache that
> already exists and can handle requests from the new cache. If it is, we
> do not create any new caches, instead we simply increment the old cache
> refcount and return it.
>
> What about memcgs? Currently, it operates in the same way, i.e. on memcg
> cache creation we also try to find a compatible cache of the same memcg
> first. But if there were such a cache, they parents would have been
> merged (i.e. it would be the same cache). That means we would not even
> get to this memcg cache creation, because it already exists. That's why
> the code handling memcg caches merging seems pointless to me.
>

IIRC, this may not always hold. Some of the properties are configurable via
sysfs, and it might be that you haven't merged two parent caches because they
properties differ, but would be fine merging the child caches.

If all properties we check are compile-time parameters, then it should be okay.


-- 
E Mare, Libertas

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/8] memcg, slab: never try to merge memcg caches
  2014-02-04 15:43           ` Glauber Costa
@ 2014-02-04 16:04             ` Vladimir Davydov
  2014-02-04 16:10               ` Glauber Costa
  0 siblings, 1 reply; 33+ messages in thread
From: Vladimir Davydov @ 2014-02-04 16:04 UTC (permalink / raw)
  To: Glauber Costa
  Cc: Michal Hocko, Andrew Morton, David Rientjes, Pekka Enberg,
	Christoph Lameter, linux-mm, LKML, devel

On 02/04/2014 07:43 PM, Glauber Costa wrote:
> On Tue, Feb 4, 2014 at 7:27 PM, Vladimir Davydov <vdavydov@parallels.com> wrote:
>> On 02/04/2014 07:11 PM, Michal Hocko wrote:
>>> On Tue 04-02-14 18:59:23, Vladimir Davydov wrote:
>>>> On 02/04/2014 06:52 PM, Michal Hocko wrote:
>>>>> On Sun 02-02-14 20:33:48, Vladimir Davydov wrote:
>>>>>> Suppose we are creating memcg cache A that could be merged with cache B
>>>>>> of the same memcg. Since any memcg cache has the same parameters as its
>>>>>> parent cache, parent caches PA and PB of memcg caches A and B must be
>>>>>> mergeable too. That means PA was merged with PB on creation or vice
>>>>>> versa, i.e. PA = PB. From that it follows that A = B, and we couldn't
>>>>>> even try to create cache B, because it already exists - a contradiction.
>>>>> I cannot tell I understand the above but I am totally not sure about the
>>>>> statement bellow.
>>>>>
>>>>>> So let's remove unused code responsible for merging memcg caches.
>>>>> How come the code was unused? find_mergeable called cache_match_memcg...
>>>> Oh, sorry for misleading comment. I mean the code handling merging of
>>>> per-memcg caches is useless, AFAIU: if we find an alias for a per-memcg
>>>> cache on kmem_cache_create_memcg(), the parent of the found alias must
>>>> be the same as the parent_cache passed to kmem_cache_create_memcg(), but
>>>> if it were so, we would never proceed to the memcg cache creation,
>>>> because the cache we want to create already exists.
>>> I am still not sure I understand this correctly. So the outcome of this
>>> patch is that compatible caches of different memcgs can be merged
>>> together? Sorry if this is a stupid question but I am not that familiar
>>> with this area much I am just seeing that cache_match_memcg goes away
>>> and my understanding of the function is that it should prevent from
>>> different memcg's caches merging.
>> Let me try to explain how I understand it.
>>
>> What is cache merging/aliasing? When we create a cache
>> (kmem_cache_create()), we first try to find a compatible cache that
>> already exists and can handle requests from the new cache. If it is, we
>> do not create any new caches, instead we simply increment the old cache
>> refcount and return it.
>>
>> What about memcgs? Currently, it operates in the same way, i.e. on memcg
>> cache creation we also try to find a compatible cache of the same memcg
>> first. But if there were such a cache, they parents would have been
>> merged (i.e. it would be the same cache). That means we would not even
>> get to this memcg cache creation, because it already exists. That's why
>> the code handling memcg caches merging seems pointless to me.
>>
> IIRC, this may not always hold. Some of the properties are configurable via
> sysfs, and it might be that you haven't merged two parent caches because they
> properties differ, but would be fine merging the child caches.
>
> If all properties we check are compile-time parameters, then it should be okay.

AFAIK, we decide if a cache should be merged only basing on its internal
parameters, such as size, ctor, flags, align (see find_mergeable()), but
they are the same for root and memcg caches.

The only way to disable slub merging is via the "slub_nomerge" kernel
parameter, so it is impossible to get a situation when parents can not
be merged, while children can.

The only point of concern may be so called boot caches
(create_boot_cache()), which are forcefully not allowed to be merged by
setting refcount = -1. There are actually only two of them kmem_cache
and kmem_cache_node used for internal slub allocations. I guess it was
done preliminary, and we should not merge them for memcgs neither.

To sum it up, if a particular root cache is allowed to be merged, it was
allowed to be merged since its creation and all its children caches are
also allowed to be merged. If merging was not allowed for a root cache
when it was created, we should not merge its children caches.

Thanks.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/8] memcg, slab: never try to merge memcg caches
  2014-02-04 16:04             ` Vladimir Davydov
@ 2014-02-04 16:10               ` Glauber Costa
  0 siblings, 0 replies; 33+ messages in thread
From: Glauber Costa @ 2014-02-04 16:10 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: Michal Hocko, Andrew Morton, David Rientjes, Pekka Enberg,
	Christoph Lameter, linux-mm, LKML, devel

On Tue, Feb 4, 2014 at 8:04 PM, Vladimir Davydov <vdavydov@parallels.com> wrote:
> On 02/04/2014 07:43 PM, Glauber Costa wrote:
>> On Tue, Feb 4, 2014 at 7:27 PM, Vladimir Davydov <vdavydov@parallels.com> wrote:
>>> On 02/04/2014 07:11 PM, Michal Hocko wrote:
>>>> On Tue 04-02-14 18:59:23, Vladimir Davydov wrote:
>>>>> On 02/04/2014 06:52 PM, Michal Hocko wrote:
>>>>>> On Sun 02-02-14 20:33:48, Vladimir Davydov wrote:
>>>>>>> Suppose we are creating memcg cache A that could be merged with cache B
>>>>>>> of the same memcg. Since any memcg cache has the same parameters as its
>>>>>>> parent cache, parent caches PA and PB of memcg caches A and B must be
>>>>>>> mergeable too. That means PA was merged with PB on creation or vice
>>>>>>> versa, i.e. PA = PB. From that it follows that A = B, and we couldn't
>>>>>>> even try to create cache B, because it already exists - a contradiction.
>>>>>> I cannot tell I understand the above but I am totally not sure about the
>>>>>> statement bellow.
>>>>>>
>>>>>>> So let's remove unused code responsible for merging memcg caches.
>>>>>> How come the code was unused? find_mergeable called cache_match_memcg...
>>>>> Oh, sorry for misleading comment. I mean the code handling merging of
>>>>> per-memcg caches is useless, AFAIU: if we find an alias for a per-memcg
>>>>> cache on kmem_cache_create_memcg(), the parent of the found alias must
>>>>> be the same as the parent_cache passed to kmem_cache_create_memcg(), but
>>>>> if it were so, we would never proceed to the memcg cache creation,
>>>>> because the cache we want to create already exists.
>>>> I am still not sure I understand this correctly. So the outcome of this
>>>> patch is that compatible caches of different memcgs can be merged
>>>> together? Sorry if this is a stupid question but I am not that familiar
>>>> with this area much I am just seeing that cache_match_memcg goes away
>>>> and my understanding of the function is that it should prevent from
>>>> different memcg's caches merging.
>>> Let me try to explain how I understand it.
>>>
>>> What is cache merging/aliasing? When we create a cache
>>> (kmem_cache_create()), we first try to find a compatible cache that
>>> already exists and can handle requests from the new cache. If it is, we
>>> do not create any new caches, instead we simply increment the old cache
>>> refcount and return it.
>>>
>>> What about memcgs? Currently, it operates in the same way, i.e. on memcg
>>> cache creation we also try to find a compatible cache of the same memcg
>>> first. But if there were such a cache, they parents would have been
>>> merged (i.e. it would be the same cache). That means we would not even
>>> get to this memcg cache creation, because it already exists. That's why
>>> the code handling memcg caches merging seems pointless to me.
>>>
>> IIRC, this may not always hold. Some of the properties are configurable via
>> sysfs, and it might be that you haven't merged two parent caches because they
>> properties differ, but would be fine merging the child caches.
>>
>> If all properties we check are compile-time parameters, then it should be okay.
>
> AFAIK, we decide if a cache should be merged only basing on its internal
> parameters, such as size, ctor, flags, align (see find_mergeable()), but
> they are the same for root and memcg caches.
>
> The only way to disable slub merging is via the "slub_nomerge" kernel
> parameter, so it is impossible to get a situation when parents can not
> be merged, while children can.
>
> The only point of concern may be so called boot caches
> (create_boot_cache()), which are forcefully not allowed to be merged by
> setting refcount = -1. There are actually only two of them kmem_cache
> and kmem_cache_node used for internal slub allocations. I guess it was
> done preliminary, and we should not merge them for memcgs neither.
>
> To sum it up, if a particular root cache is allowed to be merged, it was
> allowed to be merged since its creation and all its children caches are
> also allowed to be merged. If merging was not allowed for a root cache
> when it was created, we should not merge its children caches.
>
> Thanks.

Fair Enough.


-- 
E Mare, Libertas

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/8] memcg, slab: never try to merge memcg caches
  2014-02-04 15:27         ` Vladimir Davydov
  2014-02-04 15:43           ` Glauber Costa
@ 2014-02-06 14:07           ` Michal Hocko
  2014-02-06 14:15             ` Vladimir Davydov
  1 sibling, 1 reply; 33+ messages in thread
From: Michal Hocko @ 2014-02-06 14:07 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: akpm, rientjes, penberg, cl, glommer, linux-mm, linux-kernel, devel

On Tue 04-02-14 19:27:19, Vladimir Davydov wrote:
[...]
> What does this patch change? Actually, it introduces no functional
> changes - it only remove the code trying to find an alias for a memcg
> cache, because it will fail anyway. So this is rather a cleanup.

But this also means that two different memcgs might share the same cache
and so the pages for that cache, no? Actually it would depend on timing
because a new page would be chaged for the current allocator.

cachep->memcg_params->memcg == memcg would prevent from such a merge
previously AFAICS, or am I still confused?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/8] memcg, slab: never try to merge memcg caches
  2014-02-06 14:07           ` Michal Hocko
@ 2014-02-06 14:15             ` Vladimir Davydov
  2014-02-06 15:29               ` Michal Hocko
  0 siblings, 1 reply; 33+ messages in thread
From: Vladimir Davydov @ 2014-02-06 14:15 UTC (permalink / raw)
  To: Michal Hocko
  Cc: akpm, rientjes, penberg, cl, glommer, linux-mm, linux-kernel, devel

On 02/06/2014 06:07 PM, Michal Hocko wrote:
> On Tue 04-02-14 19:27:19, Vladimir Davydov wrote:
> [...]
>> What does this patch change? Actually, it introduces no functional
>> changes - it only remove the code trying to find an alias for a memcg
>> cache, because it will fail anyway. So this is rather a cleanup.
> But this also means that two different memcgs might share the same cache
> and so the pages for that cache, no?

No, because in this patch I explicitly forbid to merge memcg caches by
this hunk:

@@ -200,9 +200,11 @@ kmem_cache_create_memcg(struct mem_cgroup *memcg,
const char *name, size_t size,
      */
     flags &= CACHE_CREATE_MASK;
 
-    s = __kmem_cache_alias(memcg, name, size, align, flags, ctor);
-    if (s)
-        goto out_unlock;
+    if (!memcg) {
+        s = __kmem_cache_alias(name, size, align, flags, ctor);
+        if (s)
+            goto out_unlock;
+    }

Thanks.

> Actually it would depend on timing
> because a new page would be chaged for the current allocator.
>
> cachep->memcg_params->memcg == memcg would prevent from such a merge
> previously AFAICS, or am I still confused?


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/8] memcg, slab: never try to merge memcg caches
  2014-02-06 14:15             ` Vladimir Davydov
@ 2014-02-06 15:29               ` Michal Hocko
  2014-02-06 15:39                 ` Vladimir Davydov
  0 siblings, 1 reply; 33+ messages in thread
From: Michal Hocko @ 2014-02-06 15:29 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: akpm, rientjes, penberg, cl, glommer, linux-mm, linux-kernel, devel

On Thu 06-02-14 18:15:50, Vladimir Davydov wrote:
> On 02/06/2014 06:07 PM, Michal Hocko wrote:
> > On Tue 04-02-14 19:27:19, Vladimir Davydov wrote:
> > [...]
> >> What does this patch change? Actually, it introduces no functional
> >> changes - it only remove the code trying to find an alias for a memcg
> >> cache, because it will fail anyway. So this is rather a cleanup.
> > But this also means that two different memcgs might share the same cache
> > and so the pages for that cache, no?
> 
> No, because in this patch I explicitly forbid to merge memcg caches by
> this hunk:
> 
> @@ -200,9 +200,11 @@ kmem_cache_create_memcg(struct mem_cgroup *memcg,
> const char *name, size_t size,
>       */
>      flags &= CACHE_CREATE_MASK;
>  
> -    s = __kmem_cache_alias(memcg, name, size, align, flags, ctor);
> -    if (s)
> -        goto out_unlock;
> +    if (!memcg) {
> +        s = __kmem_cache_alias(name, size, align, flags, ctor);
> +        if (s)
> +            goto out_unlock;
> +    }

Ohh, that was the missing part. Thanks and sorry I have missed it. Maybe
it is worth mentioning in the changelog?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/8] memcg, slab: never try to merge memcg caches
  2014-02-06 15:29               ` Michal Hocko
@ 2014-02-06 15:39                 ` Vladimir Davydov
  0 siblings, 0 replies; 33+ messages in thread
From: Vladimir Davydov @ 2014-02-06 15:39 UTC (permalink / raw)
  To: Michal Hocko
  Cc: akpm, rientjes, penberg, cl, glommer, linux-mm, linux-kernel, devel

On 02/06/2014 07:29 PM, Michal Hocko wrote:
> On Thu 06-02-14 18:15:50, Vladimir Davydov wrote:
>> On 02/06/2014 06:07 PM, Michal Hocko wrote:
>>> On Tue 04-02-14 19:27:19, Vladimir Davydov wrote:
>>> [...]
>>>> What does this patch change? Actually, it introduces no functional
>>>> changes - it only remove the code trying to find an alias for a memcg
>>>> cache, because it will fail anyway. So this is rather a cleanup.
>>> But this also means that two different memcgs might share the same cache
>>> and so the pages for that cache, no?
>> No, because in this patch I explicitly forbid to merge memcg caches by
>> this hunk:
>>
>> @@ -200,9 +200,11 @@ kmem_cache_create_memcg(struct mem_cgroup *memcg,
>> const char *name, size_t size,
>>       */
>>      flags &= CACHE_CREATE_MASK;
>>  
>> -    s = __kmem_cache_alias(memcg, name, size, align, flags, ctor);
>> -    if (s)
>> -        goto out_unlock;
>> +    if (!memcg) {
>> +        s = __kmem_cache_alias(name, size, align, flags, ctor);
>> +        if (s)
>> +            goto out_unlock;
>> +    }
> Ohh, that was the missing part. Thanks and sorry I have missed it.

Never mind.

> Maybe it is worth mentioning in the changelog?

Hmm, changelog? This hunk was there from the very beginning :-/

Anyway, I'm going to expand this patch's comment, because it's too short
and difficult to understand.

Thanks.

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2014-02-06 15:39 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-02-02 16:33 [PATCH 0/8] memcg-vs-slab related fixes, improvements, cleanups Vladimir Davydov
2014-02-02 16:33 ` [PATCH 1/8] memcg: export kmemcg cache id via cgroup fs Vladimir Davydov
2014-02-03  6:21   ` David Rientjes
2014-02-03  6:57     ` Vladimir Davydov
2014-02-03  7:19       ` Vladimir Davydov
2014-02-03 10:05       ` Glauber Costa
2014-02-03 13:01         ` Vladimir Davydov
2014-02-03 11:04       ` David Rientjes
2014-02-03 13:00         ` Vladimir Davydov
2014-02-04 14:44       ` Michal Hocko
2014-02-04 14:40   ` Michal Hocko
2014-02-04 14:49     ` Vladimir Davydov
2014-02-02 16:33 ` [PATCH 2/8] memcg, slab: remove cgroup name from memcg cache names Vladimir Davydov
2014-02-04 14:45   ` Michal Hocko
2014-02-04 15:11     ` Vladimir Davydov
2014-02-04 15:13       ` Michal Hocko
2014-02-02 16:33 ` [PATCH 3/8] memcg, slab: never try to merge memcg caches Vladimir Davydov
2014-02-04 14:52   ` Michal Hocko
2014-02-04 14:59     ` Vladimir Davydov
2014-02-04 15:11       ` Michal Hocko
2014-02-04 15:27         ` Vladimir Davydov
2014-02-04 15:43           ` Glauber Costa
2014-02-04 16:04             ` Vladimir Davydov
2014-02-04 16:10               ` Glauber Costa
2014-02-06 14:07           ` Michal Hocko
2014-02-06 14:15             ` Vladimir Davydov
2014-02-06 15:29               ` Michal Hocko
2014-02-06 15:39                 ` Vladimir Davydov
2014-02-02 16:33 ` [PATCH 4/8] memcg, slab: separate memcg vs root cache creation paths Vladimir Davydov
2014-02-02 16:33 ` [PATCH 5/8] slub: adjust memcg caches when creating cache alias Vladimir Davydov
2014-02-02 16:33 ` [PATCH 6/8] slub: rework sysfs layout for memcg caches Vladimir Davydov
2014-02-02 16:33 ` [PATCH 7/8] memcg, slab: unregister cache from memcg before starting to destroy it Vladimir Davydov
2014-02-02 16:33 ` [PATCH 8/8] memcg, slab: do not destroy children caches if parent has aliases Vladimir Davydov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).