* [PATCH -mm 0/4] memcg-vs-slab cleanup
@ 2014-04-09 15:02 ` Vladimir Davydov
  0 siblings, 0 replies; 20+ messages in thread
From: Vladimir Davydov @ 2014-04-09 15:02 UTC (permalink / raw)
  To: akpm; +Cc: mhocko, hannes, glommer, cl, penberg, linux-kernel, linux-mm, devel

Hi,

This patchset does a bit of cleanup in the kmemcg implementation,
hopefully making it more readable.

It requires the following two patches applied on top of the mmotm tree:
  [PATCH -mm v2 1/2] sl[au]b: charge slabs to kmemcg explicitly
  [PATCH -mm v2.2] mm: get rid of __GFP_KMEMCG
(see https://lkml.org/lkml/2014/4/1/100)

Thanks,

Vladimir Davydov (4):
  memcg, slab: do not schedule cache destruction when last page goes
    away
  memcg, slab: merge memcg_{bind,release}_pages to
    memcg_{un}charge_slab
  memcg, slab: change memcg::slab_caches_mutex vs slab_mutex locking
    order
  memcg, slab: remove memcg_cache_params::destroy work

 include/linux/memcontrol.h |   15 +---
 include/linux/slab.h       |    8 +-
 mm/memcontrol.c            |  209 +++++++++++++++++---------------------------
 mm/slab.c                  |    2 -
 mm/slab.h                  |   28 +-----
 mm/slab_common.c           |   22 ++---
 mm/slub.c                  |    2 -
 7 files changed, 94 insertions(+), 192 deletions(-)

-- 
1.7.10.4


* [PATCH -mm 1/4] memcg, slab: do not schedule cache destruction when last page goes away
  2014-04-09 15:02 ` Vladimir Davydov
@ 2014-04-09 15:02   ` Vladimir Davydov
  -1 siblings, 0 replies; 20+ messages in thread
From: Vladimir Davydov @ 2014-04-09 15:02 UTC (permalink / raw)
  To: akpm; +Cc: mhocko, hannes, glommer, cl, penberg, linux-kernel, linux-mm, devel

After a memcg is offlined, we mark those of its kmem caches that cannot
be deleted right away (they still have pending objects) as dead by
setting the memcg_cache_params::dead flag, so that memcg_release_pages
will schedule cache destruction (memcg_cache_params::destroy) as soon
as the last slab of the cache is freed (memcg_cache_params::nr_pages
drops to zero).
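
For reference, the path being removed currently looks like this
(condensed from the mm/slab.h and mm/memcontrol.c hunks below, with
comments and the special cases trimmed):

        static inline void memcg_release_pages(struct kmem_cache *s, int order)
        {
                if (is_root_cache(s))
                        return;
                /* last slab gone: try to destroy the dead cache asynchronously */
                if (atomic_sub_and_test((1 << order), &s->memcg_params->nr_pages))
                        mem_cgroup_destroy_cache(s);
        }

        void mem_cgroup_destroy_cache(struct kmem_cache *cachep)
        {
                if (!cachep->memcg_params->dead)
                        return;
                /* refrain from queueing if destruction is already pending */
                if (work_pending(&cachep->memcg_params->destroy))
                        return;
                /* defer to a workqueue, we may be in a context that cannot sleep */
                schedule_work(&cachep->memcg_params->destroy);
        }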

I guess the idea was to destroy the caches as soon as possible, i.e.
immediately after the last object is freed. However, it just doesn't
work that way, because kmem caches always keep some pages around for
performance reasons, so nr_pages never drops to zero unless the cache
is shrunk explicitly with kmem_cache_shrink. Of course, we could track
the total number of objects in the cache, or check on every
kmem_cache_free whether all slabs allocated for the cache are empty and
schedule destruction if so, but that would be too costly.

Thus we have a piece of code that only works when kmem_cache_shrink is
called explicitly, yet complicates the whole picture a lot. Moreover,
it is in fact racy: for instance, kmem_cache_shrink may free the last
slab, and hence schedule cache destruction, before it finishes checking
that the cache is empty, which can lead to a use-after-free.

So I propose to remove this async cache destruction from
memcg_release_pages and instead explicitly check whether the cache is
empty after calling kmem_cache_shrink. This simplifies things a lot
without introducing any functional changes.
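
With that, the destroy worker boils down to the following (just a
sketch of the resulting code, see the mm/memcontrol.c hunk below):

        static void kmem_cache_destroy_work_func(struct work_struct *w)
        {
                struct kmem_cache *cachep;
                struct memcg_cache_params *p;

                p = container_of(w, struct memcg_cache_params, destroy);
                cachep = memcg_params_to_cache(p);

                /* reclaim empty slabs; if nothing is left, drop the cache */
                kmem_cache_shrink(cachep);
                if (atomic_read(&cachep->memcg_params->nr_pages) == 0)
                        kmem_cache_destroy(cachep);
        }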

As for dead memcg caches (i.e. those that are left hanging around after
the memcg is offlined because they still have objects), I suppose we
should reap them either periodically or on vmpressure, as Glauber
suggested initially. I'm going to implement this later.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
---
 include/linux/memcontrol.h |    1 -
 include/linux/slab.h       |    2 --
 mm/memcontrol.c            |   63 ++------------------------------------------
 mm/slab.h                  |    7 ++---
 4 files changed, 4 insertions(+), 69 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 13acdb5259f5..41b381412367 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -515,7 +515,6 @@ __memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp);
 int memcg_charge_kmem(struct mem_cgroup *memcg, gfp_t gfp, u64 size);
 void memcg_uncharge_kmem(struct mem_cgroup *memcg, u64 size);
 
-void mem_cgroup_destroy_cache(struct kmem_cache *cachep);
 int __kmem_cache_destroy_memcg_children(struct kmem_cache *s);
 
 /**
diff --git a/include/linux/slab.h b/include/linux/slab.h
index eaaf2f69c54d..0217e05e1d83 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -513,7 +513,6 @@ static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node)
  * @memcg: pointer to the memcg this cache belongs to
  * @list: list_head for the list of all caches in this memcg
  * @root_cache: pointer to the global, root cache, this cache was derived from
- * @dead: set to true after the memcg dies; the cache may still be around.
  * @nr_pages: number of pages that belongs to this cache.
  * @destroy: worker to be called whenever we are ready, or believe we may be
  *           ready, to destroy this cache.
@@ -529,7 +528,6 @@ struct memcg_cache_params {
 			struct mem_cgroup *memcg;
 			struct list_head list;
 			struct kmem_cache *root_cache;
-			bool dead;
 			atomic_t nr_pages;
 			struct work_struct destroy;
 		};
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ba30b26fc483..78431620d225 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3267,60 +3267,11 @@ static void kmem_cache_destroy_work_func(struct work_struct *w)
 
 	cachep = memcg_params_to_cache(p);
 
-	/*
-	 * If we get down to 0 after shrink, we could delete right away.
-	 * However, memcg_release_pages() already puts us back in the workqueue
-	 * in that case. If we proceed deleting, we'll get a dangling
-	 * reference, and removing the object from the workqueue in that case
-	 * is unnecessary complication. We are not a fast path.
-	 *
-	 * Note that this case is fundamentally different from racing with
-	 * shrink_slab(): if memcg_cgroup_destroy_cache() is called in
-	 * kmem_cache_shrink, not only we would be reinserting a dead cache
-	 * into the queue, but doing so from inside the worker racing to
-	 * destroy it.
-	 *
-	 * So if we aren't down to zero, we'll just schedule a worker and try
-	 * again
-	 */
-	if (atomic_read(&cachep->memcg_params->nr_pages) != 0)
-		kmem_cache_shrink(cachep);
-	else
+	kmem_cache_shrink(cachep);
+	if (atomic_read(&cachep->memcg_params->nr_pages) == 0)
 		kmem_cache_destroy(cachep);
 }
 
-void mem_cgroup_destroy_cache(struct kmem_cache *cachep)
-{
-	if (!cachep->memcg_params->dead)
-		return;
-
-	/*
-	 * There are many ways in which we can get here.
-	 *
-	 * We can get to a memory-pressure situation while the delayed work is
-	 * still pending to run. The vmscan shrinkers can then release all
-	 * cache memory and get us to destruction. If this is the case, we'll
-	 * be executed twice, which is a bug (the second time will execute over
-	 * bogus data). In this case, cancelling the work should be fine.
-	 *
-	 * But we can also get here from the worker itself, if
-	 * kmem_cache_shrink is enough to shake all the remaining objects and
-	 * get the page count to 0. In this case, we'll deadlock if we try to
-	 * cancel the work (the worker runs with an internal lock held, which
-	 * is the same lock we would hold for cancel_work_sync().)
-	 *
-	 * Since we can't possibly know who got us here, just refrain from
-	 * running if there is already work pending
-	 */
-	if (work_pending(&cachep->memcg_params->destroy))
-		return;
-	/*
-	 * We have to defer the actual destroying to a workqueue, because
-	 * we might currently be in a context that cannot sleep.
-	 */
-	schedule_work(&cachep->memcg_params->destroy);
-}
-
 int __kmem_cache_destroy_memcg_children(struct kmem_cache *s)
 {
 	struct kmem_cache *c;
@@ -3346,16 +3297,7 @@ int __kmem_cache_destroy_memcg_children(struct kmem_cache *s)
 		 * We will now manually delete the caches, so to avoid races
 		 * we need to cancel all pending destruction workers and
 		 * proceed with destruction ourselves.
-		 *
-		 * kmem_cache_destroy() will call kmem_cache_shrink internally,
-		 * and that could spawn the workers again: it is likely that
-		 * the cache still have active pages until this very moment.
-		 * This would lead us back to mem_cgroup_destroy_cache.
-		 *
-		 * But that will not execute at all if the "dead" flag is not
-		 * set, so flip it down to guarantee we are in control.
 		 */
-		c->memcg_params->dead = false;
 		cancel_work_sync(&c->memcg_params->destroy);
 		kmem_cache_destroy(c);
 
@@ -3377,7 +3319,6 @@ static void mem_cgroup_destroy_all_caches(struct mem_cgroup *memcg)
 	mutex_lock(&memcg->slab_caches_mutex);
 	list_for_each_entry(params, &memcg->memcg_slab_caches, list) {
 		cachep = memcg_params_to_cache(params);
-		cachep->memcg_params->dead = true;
 		schedule_work(&cachep->memcg_params->destroy);
 	}
 	mutex_unlock(&memcg->slab_caches_mutex);
diff --git a/mm/slab.h b/mm/slab.h
index 3db3c52f80a2..efe14d420010 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -127,11 +127,8 @@ static inline void memcg_bind_pages(struct kmem_cache *s, int order)
 
 static inline void memcg_release_pages(struct kmem_cache *s, int order)
 {
-	if (is_root_cache(s))
-		return;
-
-	if (atomic_sub_and_test((1 << order), &s->memcg_params->nr_pages))
-		mem_cgroup_destroy_cache(s);
+	if (!is_root_cache(s))
+		atomic_sub(1 << order, &s->memcg_params->nr_pages);
 }
 
 static inline bool slab_equal_or_root(struct kmem_cache *s,
-- 
1.7.10.4


* [PATCH -mm 2/4] memcg, slab: merge memcg_{bind,release}_pages to memcg_{un}charge_slab
  2014-04-09 15:02 ` Vladimir Davydov
@ 2014-04-09 15:02   ` Vladimir Davydov
  -1 siblings, 0 replies; 20+ messages in thread
From: Vladimir Davydov @ 2014-04-09 15:02 UTC (permalink / raw)
  To: akpm; +Cc: mhocko, hannes, glommer, cl, penberg, linux-kernel, linux-mm, devel

Currently we have two pairs of kmemcg-related functions that are called
on slab alloc/free. The first is memcg_{bind,release}_pages, which
counts the total number of pages allocated for a kmem cache. The second
is memcg_{un}charge_slab, which {un}charges slab pages to the kmemcg
resource counter. Let's just merge them to keep the code clean.
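
After the merge, the non-root charge/uncharge path becomes the
following (see the mm/memcontrol.c hunk below):

        int __memcg_charge_slab(struct kmem_cache *cachep, gfp_t gfp, int order)
        {
                int res;

                /* charge the kmemcg counter and account the pages in one place */
                res = memcg_charge_kmem(cachep->memcg_params->memcg, gfp,
                                        PAGE_SIZE << order);
                if (!res)
                        atomic_add(1 << order, &cachep->memcg_params->nr_pages);
                return res;
        }

        void __memcg_uncharge_slab(struct kmem_cache *cachep, int order)
        {
                memcg_uncharge_kmem(cachep->memcg_params->memcg, PAGE_SIZE << order);
                atomic_sub(1 << order, &cachep->memcg_params->nr_pages);
        }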

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
---
 include/linux/memcontrol.h |    4 ++--
 mm/memcontrol.c            |   22 ++++++++++++++++++++--
 mm/slab.c                  |    2 --
 mm/slab.h                  |   25 ++-----------------------
 mm/slub.c                  |    2 --
 5 files changed, 24 insertions(+), 31 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 41b381412367..da87fa2124cf 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -512,8 +512,8 @@ void memcg_update_array_size(int num_groups);
 struct kmem_cache *
 __memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp);
 
-int memcg_charge_kmem(struct mem_cgroup *memcg, gfp_t gfp, u64 size);
-void memcg_uncharge_kmem(struct mem_cgroup *memcg, u64 size);
+int __memcg_charge_slab(struct kmem_cache *cachep, gfp_t gfp, int order);
+void __memcg_uncharge_slab(struct kmem_cache *cachep, int order);
 
 int __kmem_cache_destroy_memcg_children(struct kmem_cache *s);
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 78431620d225..253d6a064d89 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2944,7 +2944,7 @@ static int mem_cgroup_slabinfo_read(struct seq_file *m, void *v)
 }
 #endif
 
-int memcg_charge_kmem(struct mem_cgroup *memcg, gfp_t gfp, u64 size)
+static int memcg_charge_kmem(struct mem_cgroup *memcg, gfp_t gfp, u64 size)
 {
 	struct res_counter *fail_res;
 	int ret = 0;
@@ -2982,7 +2982,7 @@ int memcg_charge_kmem(struct mem_cgroup *memcg, gfp_t gfp, u64 size)
 	return ret;
 }
 
-void memcg_uncharge_kmem(struct mem_cgroup *memcg, u64 size)
+static void memcg_uncharge_kmem(struct mem_cgroup *memcg, u64 size)
 {
 	res_counter_uncharge(&memcg->res, size);
 	if (do_swap_account)
@@ -3380,6 +3380,24 @@ static void memcg_create_cache_enqueue(struct mem_cgroup *memcg,
 	__memcg_create_cache_enqueue(memcg, cachep);
 	memcg_resume_kmem_account();
 }
+
+int __memcg_charge_slab(struct kmem_cache *cachep, gfp_t gfp, int order)
+{
+	int res;
+
+	res = memcg_charge_kmem(cachep->memcg_params->memcg, gfp,
+				PAGE_SIZE << order);
+	if (!res)
+		atomic_add(1 << order, &cachep->memcg_params->nr_pages);
+	return res;
+}
+
+void __memcg_uncharge_slab(struct kmem_cache *cachep, int order)
+{
+	memcg_uncharge_kmem(cachep->memcg_params->memcg, PAGE_SIZE << order);
+	atomic_sub(1 << order, &cachep->memcg_params->nr_pages);
+}
+
 /*
  * Return the kmem_cache we're supposed to use for a slab allocation.
  * We try to use the current memcg's version of the cache.
diff --git a/mm/slab.c b/mm/slab.c
index af126a37dafd..cd645dbaab0a 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1689,7 +1689,6 @@ static struct page *kmem_getpages(struct kmem_cache *cachep, gfp_t flags,
 	__SetPageSlab(page);
 	if (page->pfmemalloc)
 		SetPageSlabPfmemalloc(page);
-	memcg_bind_pages(cachep, cachep->gfporder);
 
 	if (kmemcheck_enabled && !(cachep->flags & SLAB_NOTRACK)) {
 		kmemcheck_alloc_shadow(page, cachep->gfporder, flags, nodeid);
@@ -1725,7 +1724,6 @@ static void kmem_freepages(struct kmem_cache *cachep, struct page *page)
 	page_mapcount_reset(page);
 	page->mapping = NULL;
 
-	memcg_release_pages(cachep, cachep->gfporder);
 	if (current->reclaim_state)
 		current->reclaim_state->reclaimed_slab += nr_freed;
 	__free_pages(page, cachep->gfporder);
diff --git a/mm/slab.h b/mm/slab.h
index efe14d420010..11eb623f0e61 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -119,18 +119,6 @@ static inline bool is_root_cache(struct kmem_cache *s)
 	return !s->memcg_params || s->memcg_params->is_root_cache;
 }
 
-static inline void memcg_bind_pages(struct kmem_cache *s, int order)
-{
-	if (!is_root_cache(s))
-		atomic_add(1 << order, &s->memcg_params->nr_pages);
-}
-
-static inline void memcg_release_pages(struct kmem_cache *s, int order)
-{
-	if (!is_root_cache(s))
-		atomic_sub(1 << order, &s->memcg_params->nr_pages);
-}
-
 static inline bool slab_equal_or_root(struct kmem_cache *s,
 					struct kmem_cache *p)
 {
@@ -196,8 +184,7 @@ static __always_inline int memcg_charge_slab(struct kmem_cache *s,
 		return 0;
 	if (is_root_cache(s))
 		return 0;
-	return memcg_charge_kmem(s->memcg_params->memcg, gfp,
-				 PAGE_SIZE << order);
+	return __memcg_charge_slab(s, gfp, order);
 }
 
 static __always_inline void memcg_uncharge_slab(struct kmem_cache *s, int order)
@@ -206,7 +193,7 @@ static __always_inline void memcg_uncharge_slab(struct kmem_cache *s, int order)
 		return;
 	if (is_root_cache(s))
 		return;
-	memcg_uncharge_kmem(s->memcg_params->memcg, PAGE_SIZE << order);
+	__memcg_uncharge_slab(s, order);
 }
 #else
 static inline bool is_root_cache(struct kmem_cache *s)
@@ -214,14 +201,6 @@ static inline bool is_root_cache(struct kmem_cache *s)
 	return true;
 }
 
-static inline void memcg_bind_pages(struct kmem_cache *s, int order)
-{
-}
-
-static inline void memcg_release_pages(struct kmem_cache *s, int order)
-{
-}
-
 static inline bool slab_equal_or_root(struct kmem_cache *s,
 				      struct kmem_cache *p)
 {
diff --git a/mm/slub.c b/mm/slub.c
index fa7a1817835e..c17d8b9975be 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1427,7 +1427,6 @@ static struct page *new_slab(struct kmem_cache *s, gfp_t flags, int node)
 
 	order = compound_order(page);
 	inc_slabs_node(s, page_to_nid(page), page->objects);
-	memcg_bind_pages(s, order);
 	page->slab_cache = s;
 	__SetPageSlab(page);
 	if (page->pfmemalloc)
@@ -1478,7 +1477,6 @@ static void __free_slab(struct kmem_cache *s, struct page *page)
 	__ClearPageSlabPfmemalloc(page);
 	__ClearPageSlab(page);
 
-	memcg_release_pages(s, order);
 	page_mapcount_reset(page);
 	if (current->reclaim_state)
 		current->reclaim_state->reclaimed_slab += pages;
-- 
1.7.10.4


* [PATCH -mm 3/4] memcg, slab: change memcg::slab_caches_mutex vs slab_mutex locking order
  2014-04-09 15:02 ` Vladimir Davydov
@ 2014-04-09 15:02   ` Vladimir Davydov
  -1 siblings, 0 replies; 20+ messages in thread
From: Vladimir Davydov @ 2014-04-09 15:02 UTC (permalink / raw)
  To: akpm; +Cc: mhocko, hannes, glommer, cl, penberg, linux-kernel, linux-mm, devel

When creating or destroying a kmem cache for a memcg we must hold two
locks: memcg::slab_caches_mutex, which protects the memcg's caches
list, and slab_mutex, which is taken by kmem_cache_create/destroy.
Currently we take slab_mutex first and then, in
memcg_{un}register_cache, take the memcg's slab_caches_mutex in order
to synchronize changes to the memcg caches list.

Such a locking order creates an ugly dependency, which prevents us from
calling kmem_cache_shrink/destroy while iterating over a per-memcg
caches list, because we must hold memcg::slab_caches_mutex to protect
against changes to the list. As a result, in
mem_cgroup_destroy_all_caches we can't just shrink and destroy a memcg
cache once it becomes empty. Instead, we schedule an async work item
that does the trick.

This patch reverses the locking order of the two mutexes so that we can
get rid of the memcg cache destruction work (this is done in the next
patch).
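
To illustrate the new nesting, per-memcg cache creation now looks
roughly like this (a condensed sketch of memcg_kmem_create_cache from
the hunk below; the memcg_caches array update under activate_kmem_mutex
is omitted):

        mutex_lock(&memcg->slab_caches_mutex);
        if (!cache_from_memcg_idx(root_cache, id)) {
                /* takes slab_mutex inside, i.e. slab_caches_mutex -> slab_mutex */
                cachep = kmem_cache_create_memcg(memcg, root_cache);
                if (cachep)
                        list_add(&cachep->memcg_params->list,
                                 &memcg->memcg_slab_caches);
        }
        mutex_unlock(&memcg->slab_caches_mutex);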

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
---
 include/linux/memcontrol.h |   10 -----
 include/linux/slab.h       |    3 +-
 mm/memcontrol.c            |   89 +++++++++++++++++++++++---------------------
 mm/slab_common.c           |   22 ++++-------
 4 files changed, 56 insertions(+), 68 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index da87fa2124cf..d815b26f8086 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -503,8 +503,6 @@ char *memcg_create_cache_name(struct mem_cgroup *memcg,
 int memcg_alloc_cache_params(struct mem_cgroup *memcg, struct kmem_cache *s,
 			     struct kmem_cache *root_cache);
 void memcg_free_cache_params(struct kmem_cache *s);
-void memcg_register_cache(struct kmem_cache *s);
-void memcg_unregister_cache(struct kmem_cache *s);
 
 int memcg_update_cache_size(struct kmem_cache *s, int num_groups);
 void memcg_update_array_size(int num_groups);
@@ -646,14 +644,6 @@ static inline void memcg_free_cache_params(struct kmem_cache *s)
 {
 }
 
-static inline void memcg_register_cache(struct kmem_cache *s)
-{
-}
-
-static inline void memcg_unregister_cache(struct kmem_cache *s)
-{
-}
-
 static inline struct kmem_cache *
 memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp)
 {
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 0217e05e1d83..22ebcf475814 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -116,7 +116,8 @@ struct kmem_cache *kmem_cache_create(const char *, size_t, size_t,
 			unsigned long,
 			void (*)(void *));
 #ifdef CONFIG_MEMCG_KMEM
-void kmem_cache_create_memcg(struct mem_cgroup *, struct kmem_cache *);
+struct kmem_cache *kmem_cache_create_memcg(struct mem_cgroup *,
+					   struct kmem_cache *);
 #endif
 void kmem_cache_destroy(struct kmem_cache *);
 int kmem_cache_shrink(struct kmem_cache *);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 253d6a064d89..9621157f2e5b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3156,24 +3156,33 @@ void memcg_free_cache_params(struct kmem_cache *s)
 	kfree(s->memcg_params);
 }
 
-void memcg_register_cache(struct kmem_cache *s)
+static void memcg_kmem_create_cache(struct mem_cgroup *memcg,
+				    struct kmem_cache *root_cache)
 {
-	struct kmem_cache *root;
-	struct mem_cgroup *memcg;
+	struct kmem_cache *cachep;
 	int id;
+	
+	id = memcg_cache_id(memcg);
 
-	if (is_root_cache(s))
-		return;
+	mutex_lock(&memcg->slab_caches_mutex);
+	/*
+	 * Since per-memcg caches are created asynchronously on first
+	 * allocation (see memcg_kmem_get_cache()), several threads can try to
+	 * create the same cache, but only one of them may succeed.
+	 */
+	if (cache_from_memcg_idx(root_cache, id))
+		goto out;
 
+	cachep = kmem_cache_create_memcg(memcg, root_cache);
 	/*
-	 * Holding the slab_mutex assures nobody will touch the memcg_caches
-	 * array while we are modifying it.
+	 * If we could not create a memcg cache, do not complain, because
+	 * that's not critical at all as we can always proceed with the root
+	 * cache.
 	 */
-	lockdep_assert_held(&slab_mutex);
+	if (!cachep)
+		goto out;
 
-	root = s->memcg_params->root_cache;
-	memcg = s->memcg_params->memcg;
-	id = memcg_cache_id(memcg);
+	list_add(&cachep->memcg_params->list, &memcg->memcg_slab_caches);
 
 	/*
 	 * Since readers won't lock (see cache_from_memcg_idx()), we need a
@@ -3183,48 +3192,45 @@ void memcg_register_cache(struct kmem_cache *s)
 	smp_wmb();
 
 	/*
-	 * Initialize the pointer to this cache in its parent's memcg_params
-	 * before adding it to the memcg_slab_caches list, otherwise we can
-	 * fail to convert memcg_params_to_cache() while traversing the list.
+	 * Holding the activate_kmem_mutex assures nobody will touch the
+	 * memcg_caches array while we are modifying it.
 	 */
-	VM_BUG_ON(root->memcg_params->memcg_caches[id]);
-	root->memcg_params->memcg_caches[id] = s;
-
-	mutex_lock(&memcg->slab_caches_mutex);
-	list_add(&s->memcg_params->list, &memcg->memcg_slab_caches);
+	mutex_lock(&activate_kmem_mutex);
+	BUG_ON(root_cache->memcg_params->memcg_caches[id]);
+	root_cache->memcg_params->memcg_caches[id] = cachep;
+	mutex_unlock(&activate_kmem_mutex);
+out:
 	mutex_unlock(&memcg->slab_caches_mutex);
 }
 
-void memcg_unregister_cache(struct kmem_cache *s)
+static void memcg_kmem_destroy_cache(struct kmem_cache *cachep)
 {
-	struct kmem_cache *root;
+	struct kmem_cache *root_cache;
 	struct mem_cgroup *memcg;
 	int id;
 
-	if (is_root_cache(s))
-		return;
+	BUG_ON(is_root_cache(cachep));
 
-	/*
-	 * Holding the slab_mutex assures nobody will touch the memcg_caches
-	 * array while we are modifying it.
-	 */
-	lockdep_assert_held(&slab_mutex);
-
-	root = s->memcg_params->root_cache;
-	memcg = s->memcg_params->memcg;
+	root_cache = cachep->memcg_params->root_cache;
+	memcg = cachep->memcg_params->memcg;
 	id = memcg_cache_id(memcg);
 
 	mutex_lock(&memcg->slab_caches_mutex);
-	list_del(&s->memcg_params->list);
-	mutex_unlock(&memcg->slab_caches_mutex);
 
 	/*
-	 * Clear the pointer to this cache in its parent's memcg_params only
-	 * after removing it from the memcg_slab_caches list, otherwise we can
-	 * fail to convert memcg_params_to_cache() while traversing the list.
+	 * Holding the activate_kmem_mutex assures nobody will touch the
+	 * memcg_caches array while we are modifying it.
 	 */
-	VM_BUG_ON(root->memcg_params->memcg_caches[id] != s);
-	root->memcg_params->memcg_caches[id] = NULL;
+	mutex_lock(&activate_kmem_mutex);
+	BUG_ON(root_cache->memcg_params->memcg_caches[id] != cachep);
+	root_cache->memcg_params->memcg_caches[id] = NULL;
+	mutex_unlock(&activate_kmem_mutex);
+
+	list_del(&cachep->memcg_params->list);
+
+	kmem_cache_destroy(cachep);
+
+	mutex_unlock(&memcg->slab_caches_mutex);
 }
 
 /*
@@ -3264,12 +3270,11 @@ static void kmem_cache_destroy_work_func(struct work_struct *w)
 	struct memcg_cache_params *p;
 
 	p = container_of(w, struct memcg_cache_params, destroy);
-
 	cachep = memcg_params_to_cache(p);
 
 	kmem_cache_shrink(cachep);
 	if (atomic_read(&cachep->memcg_params->nr_pages) == 0)
-		kmem_cache_destroy(cachep);
+		memcg_kmem_destroy_cache(cachep);
 }
 
 int __kmem_cache_destroy_memcg_children(struct kmem_cache *s)
@@ -3299,7 +3304,7 @@ int __kmem_cache_destroy_memcg_children(struct kmem_cache *s)
 		 * proceed with destruction ourselves.
 		 */
 		cancel_work_sync(&c->memcg_params->destroy);
-		kmem_cache_destroy(c);
+		memcg_kmem_destroy_cache(c);
 
 		if (cache_from_memcg_idx(s, i))
 			failed++;
@@ -3336,7 +3341,7 @@ static void memcg_create_cache_work_func(struct work_struct *w)
 	struct mem_cgroup *memcg = cw->memcg;
 	struct kmem_cache *cachep = cw->cachep;
 
-	kmem_cache_create_memcg(memcg, cachep);
+	memcg_kmem_create_cache(memcg, cachep);
 	css_put(&memcg->css);
 	kfree(cw);
 }
diff --git a/mm/slab_common.c b/mm/slab_common.c
index cab4c49b3e8c..b3968ca1e55d 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -160,7 +160,6 @@ do_kmem_cache_create(char *name, size_t object_size, size_t size, size_t align,
 
 	s->refcount = 1;
 	list_add(&s->list, &slab_caches);
-	memcg_register_cache(s);
 out:
 	if (err)
 		return ERR_PTR(err);
@@ -266,22 +265,15 @@ EXPORT_SYMBOL(kmem_cache_create);
  * requests going from @memcg to @root_cache. The new cache inherits properties
  * from its parent.
  */
-void kmem_cache_create_memcg(struct mem_cgroup *memcg, struct kmem_cache *root_cache)
+struct kmem_cache *kmem_cache_create_memcg(struct mem_cgroup *memcg,
+					   struct kmem_cache *root_cache)
 {
-	struct kmem_cache *s;
+	struct kmem_cache *s = NULL;
 	char *cache_name;
 
 	get_online_cpus();
 	mutex_lock(&slab_mutex);
 
-	/*
-	 * Since per-memcg caches are created asynchronously on first
-	 * allocation (see memcg_kmem_get_cache()), several threads can try to
-	 * create the same cache, but only one of them may succeed.
-	 */
-	if (cache_from_memcg_idx(root_cache, memcg_cache_id(memcg)))
-		goto out_unlock;
-
 	cache_name = memcg_create_cache_name(memcg, root_cache);
 	if (!cache_name)
 		goto out_unlock;
@@ -290,12 +282,15 @@ void kmem_cache_create_memcg(struct mem_cgroup *memcg, struct kmem_cache *root_c
 				 root_cache->size, root_cache->align,
 				 root_cache->flags, root_cache->ctor,
 				 memcg, root_cache);
-	if (IS_ERR(s))
+	if (IS_ERR(s)) {
 		kfree(cache_name);
+		s = NULL;
+	}
 
 out_unlock:
 	mutex_unlock(&slab_mutex);
 	put_online_cpus();
+	return s;
 }
 
 static int kmem_cache_destroy_memcg_children(struct kmem_cache *s)
@@ -332,11 +327,8 @@ void kmem_cache_destroy(struct kmem_cache *s)
 		goto out_unlock;
 
 	list_del(&s->list);
-	memcg_unregister_cache(s);
-
 	if (__kmem_cache_shutdown(s) != 0) {
 		list_add(&s->list, &slab_caches);
-		memcg_register_cache(s);
 		printk(KERN_ERR "kmem_cache_destroy %s: "
 		       "Slab cache still has objects\n", s->name);
 		dump_stack();
-- 
1.7.10.4


* [PATCH -mm 4/4] memcg, slab: remove memcg_cache_params::destroy work
  2014-04-09 15:02 ` Vladimir Davydov
@ 2014-04-09 15:02   ` Vladimir Davydov
  -1 siblings, 0 replies; 20+ messages in thread
From: Vladimir Davydov @ 2014-04-09 15:02 UTC (permalink / raw)
  To: akpm; +Cc: mhocko, hannes, glommer, cl, penberg, linux-kernel, linux-mm, devel

There is no need to schedule a work item for kmem_cache_shrink/destroy
from mem_cgroup_destroy_all_caches any more, because we can now call
them directly under the memcg's slab_caches_mutex.
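
With the work item gone, cache teardown on memcg offline becomes a
plain loop (condensed from the mem_cgroup_destroy_all_caches hunk
below):

        mutex_lock(&memcg->slab_caches_mutex);
        list_for_each_entry_safe(params, tmp, &memcg->memcg_slab_caches, list) {
                cachep = memcg_params_to_cache(params);
                kmem_cache_shrink(cachep);
                /* only caches that became empty can be destroyed right away */
                if (atomic_read(&cachep->memcg_params->nr_pages) == 0)
                        __memcg_kmem_destroy_cache(cachep);
        }
        mutex_unlock(&memcg->slab_caches_mutex);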

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
---
 include/linux/slab.h |    3 ---
 mm/memcontrol.c      |   45 +++++++++++++++++----------------------------
 2 files changed, 17 insertions(+), 31 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 22ebcf475814..5b48ef7c0cea 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -515,8 +515,6 @@ static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node)
  * @list: list_head for the list of all caches in this memcg
  * @root_cache: pointer to the global, root cache, this cache was derived from
  * @nr_pages: number of pages that belongs to this cache.
- * @destroy: worker to be called whenever we are ready, or believe we may be
- *           ready, to destroy this cache.
  */
 struct memcg_cache_params {
 	bool is_root_cache;
@@ -530,7 +528,6 @@ struct memcg_cache_params {
 			struct list_head list;
 			struct kmem_cache *root_cache;
 			atomic_t nr_pages;
-			struct work_struct destroy;
 		};
 	};
 };
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 9621157f2e5b..443a5ff0d923 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3040,8 +3040,6 @@ void memcg_update_array_size(int num)
 		memcg_limited_groups_array_size = memcg_caches_array_size(num);
 }
 
-static void kmem_cache_destroy_work_func(struct work_struct *w);
-
 int memcg_update_cache_size(struct kmem_cache *s, int num_groups)
 {
 	struct memcg_cache_params *cur_params = s->memcg_params;
@@ -3138,8 +3136,6 @@ int memcg_alloc_cache_params(struct mem_cgroup *memcg, struct kmem_cache *s,
 	if (memcg) {
 		s->memcg_params->memcg = memcg;
 		s->memcg_params->root_cache = root_cache;
-		INIT_WORK(&s->memcg_params->destroy,
-				kmem_cache_destroy_work_func);
 		css_get(&memcg->css);
 	} else
 		s->memcg_params->is_root_cache = true;
@@ -3203,7 +3199,7 @@ out:
 	mutex_unlock(&memcg->slab_caches_mutex);
 }
 
-static void memcg_kmem_destroy_cache(struct kmem_cache *cachep)
+static void __memcg_kmem_destroy_cache(struct kmem_cache *cachep)
 {
 	struct kmem_cache *root_cache;
 	struct mem_cgroup *memcg;
@@ -3215,7 +3211,7 @@ static void memcg_kmem_destroy_cache(struct kmem_cache *cachep)
 	memcg = cachep->memcg_params->memcg;
 	id = memcg_cache_id(memcg);
 
-	mutex_lock(&memcg->slab_caches_mutex);
+	lockdep_assert_held(&memcg->slab_caches_mutex);
 
 	/*
 	 * Holding the activate_kmem_mutex assures nobody will touch the
@@ -3229,7 +3225,17 @@ static void memcg_kmem_destroy_cache(struct kmem_cache *cachep)
 	list_del(&cachep->memcg_params->list);
 
 	kmem_cache_destroy(cachep);
+}
+
+static void memcg_kmem_destroy_cache(struct kmem_cache *cachep)
+{
+	struct mem_cgroup *memcg;
+
+	BUG_ON(is_root_cache(cachep));
+	memcg = cachep->memcg_params->memcg;
 
+	mutex_lock(&memcg->slab_caches_mutex);
+	__memcg_kmem_destroy_cache(cachep);
 	mutex_unlock(&memcg->slab_caches_mutex);
 }
 
@@ -3264,19 +3270,6 @@ static inline void memcg_resume_kmem_account(void)
 	current->memcg_kmem_skip_account--;
 }
 
-static void kmem_cache_destroy_work_func(struct work_struct *w)
-{
-	struct kmem_cache *cachep;
-	struct memcg_cache_params *p;
-
-	p = container_of(w, struct memcg_cache_params, destroy);
-	cachep = memcg_params_to_cache(p);
-
-	kmem_cache_shrink(cachep);
-	if (atomic_read(&cachep->memcg_params->nr_pages) == 0)
-		memcg_kmem_destroy_cache(cachep);
-}
-
 int __kmem_cache_destroy_memcg_children(struct kmem_cache *s)
 {
 	struct kmem_cache *c;
@@ -3298,12 +3291,6 @@ int __kmem_cache_destroy_memcg_children(struct kmem_cache *s)
 		if (!c)
 			continue;
 
-		/*
-		 * We will now manually delete the caches, so to avoid races
-		 * we need to cancel all pending destruction workers and
-		 * proceed with destruction ourselves.
-		 */
-		cancel_work_sync(&c->memcg_params->destroy);
 		memcg_kmem_destroy_cache(c);
 
 		if (cache_from_memcg_idx(s, i))
@@ -3316,15 +3303,17 @@ int __kmem_cache_destroy_memcg_children(struct kmem_cache *s)
 static void mem_cgroup_destroy_all_caches(struct mem_cgroup *memcg)
 {
 	struct kmem_cache *cachep;
-	struct memcg_cache_params *params;
+	struct memcg_cache_params *params, *tmp;
 
 	if (!memcg_kmem_is_active(memcg))
 		return;
 
 	mutex_lock(&memcg->slab_caches_mutex);
-	list_for_each_entry(params, &memcg->memcg_slab_caches, list) {
+	list_for_each_entry_safe(params, tmp, &memcg->memcg_slab_caches, list) {
 		cachep = memcg_params_to_cache(params);
-		schedule_work(&cachep->memcg_params->destroy);
+		kmem_cache_shrink(cachep);
+		if (atomic_read(&cachep->memcg_params->nr_pages) == 0)
+			__memcg_kmem_destroy_cache(cachep);
 	}
 	mutex_unlock(&memcg->slab_caches_mutex);
 }
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH -mm 1/4] memcg, slab: do not schedule cache destruction when last page goes away
  2014-04-09 15:02   ` Vladimir Davydov
@ 2014-04-15  2:16     ` Johannes Weiner
  -1 siblings, 0 replies; 20+ messages in thread
From: Johannes Weiner @ 2014-04-15  2:16 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: akpm, mhocko, glommer, cl, penberg, linux-kernel, linux-mm, devel

On Wed, Apr 09, 2014 at 07:02:30PM +0400, Vladimir Davydov wrote:
> After the memcg is offlined, we mark its kmem caches that cannot be
> deleted right now due to pending objects as dead by setting the
> memcg_cache_params::dead flag, so that memcg_release_pages will schedule
> cache destruction (memcg_cache_params::destroy) as soon as the last slab
> of the cache is freed (memcg_cache_params::nr_pages drops to zero).
> 
> I guess the idea was to destroy the caches as soon as possible, i.e.
> immediately after freeing the last object. However, it just doesn't work
> that way, because kmem caches always preserve some pages for the sake of
> performance, so that nr_pages never gets to zero unless the cache is
> shrunk explicitly using kmem_cache_shrink. Of course, we could account
> the total number of objects on the cache or check if all the slabs
> allocated for the cache are empty on kmem_cache_free and schedule
> destruction if so, but that would be too costly.
> 
> Thus we have a piece of code that works only when we explicitly call
> kmem_cache_shrink, but complicates the whole picture a lot. Moreover,
> it's racy in fact. For instance, kmem_cache_shrink may free the last
> slab and thus schedule cache destruction before it finishes checking
> that the cache is empty, which can lead to use-after-free.
> 
> So I propose to remove this async cache destruction from
> memcg_release_pages, and check if the cache is empty explicitly after
> calling kmem_cache_shrink instead. This will simplify things a lot w/o
> introducing any functional changes.
> 
> And regarding dead memcg caches (i.e. those that are left hanging around
> after memcg offline for they have objects), I suppose we should reap
> them either periodically or on vmpressure as Glauber suggested
> initially. I'm going to implement this later.

memcg_release_pages() can be called after cgroup destruction, and thus
it *must* ensure that the now-empty cache is destroyed - or we'll leak
it.

There is no excuse to downgrade to periodic reaping when we already
directly hook into the event that makes the cache empty.  If slab
needs to hold on to the cache for slightly longer than the final
memcg_release_pages(), then it should grab a refcount to it.
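
Roughly, just to illustrate the refcounting idea (neither these helpers
nor the refcnt field exist today, this is only a sketch):

static void memcg_cache_get(struct kmem_cache *s)
{
	/* refcnt is a hypothetical field in memcg_cache_params */
	atomic_inc(&s->memcg_params->refcnt);
}

static void memcg_cache_put(struct kmem_cache *s)
{
	/* the last put, e.g. from the final memcg_release_pages(), frees it */
	if (atomic_dec_and_test(&s->memcg_params->refcnt))
		memcg_kmem_destroy_cache(s);
}

css offline would then just drop the reference taken at cache creation
instead of destroying the cache outright.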

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH -mm 1/4] memcg, slab: do not schedule cache destruction when last page goes away
  2014-04-15  2:16     ` Johannes Weiner
@ 2014-04-15  6:24       ` Vladimir Davydov
  -1 siblings, 0 replies; 20+ messages in thread
From: Vladimir Davydov @ 2014-04-15  6:24 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: akpm, mhocko, glommer, cl, penberg, linux-kernel, linux-mm, devel

Hi Johannes,

On 04/15/2014 06:16 AM, Johannes Weiner wrote:
> On Wed, Apr 09, 2014 at 07:02:30PM +0400, Vladimir Davydov wrote:
>> After the memcg is offlined, we mark its kmem caches that cannot be
>> deleted right now due to pending objects as dead by setting the
>> memcg_cache_params::dead flag, so that memcg_release_pages will schedule
>> cache destruction (memcg_cache_params::destroy) as soon as the last slab
>> of the cache is freed (memcg_cache_params::nr_pages drops to zero).
>>
>> I guess the idea was to destroy the caches as soon as possible, i.e.
>> immediately after freeing the last object. However, it just doesn't work
>> that way, because kmem caches always preserve some pages for the sake of
>> performance, so that nr_pages never gets to zero unless the cache is
>> shrunk explicitly using kmem_cache_shrink. Of course, we could account
>> the total number of objects on the cache or check if all the slabs
>> allocated for the cache are empty on kmem_cache_free and schedule
>> destruction if so, but that would be too costly.
>>
>> Thus we have a piece of code that works only when we explicitly call
>> kmem_cache_shrink, but complicates the whole picture a lot. Moreover,
>> it's racy in fact. For instance, kmem_cache_shrink may free the last
>> slab and thus schedule cache destruction before it finishes checking
>> that the cache is empty, which can lead to use-after-free.
>>
>> So I propose to remove this async cache destruction from
>> memcg_release_pages, and check if the cache is empty explicitly after
>> calling kmem_cache_shrink instead. This will simplify things a lot w/o
>> introducing any functional changes.
>>
>> And regarding dead memcg caches (i.e. those that are left hanging around
>> after memcg offline for they have objects), I suppose we should reap
>> them either periodically or on vmpressure as Glauber suggested
>> initially. I'm going to implement this later.
> memcg_release_pages() can be called after cgroup destruction, and thus
> it *must* ensure that the now-empty cache is destroyed - or we'll leak
> it.

But the problem is that we already leak *all* per-memcg caches that are
not empty by the time the memcg is offlined, because even after all
objects of such a cache are freed, it will still have some pages cached
per-cpu/node in both the slab and slub implementations. That is, at
present the code scheduling cache destruction from memcg_release_pages
only works when we call kmem_cache_shrink.

I propose to remove this piece of code from memcg_release_pages and
instead call kmem_cache_destroy explicitly after kmem_cache_shrink if
the cache becomes empty. The caches that still have objects after memcg
offline should be shrunk either periodically or on vmpressure. That
would greatly simplify synchronization.
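
In code, that is roughly what mem_cgroup_destroy_all_caches ends up
doing at the end of this series (see patch 4):

	mutex_lock(&memcg->slab_caches_mutex);
	list_for_each_entry_safe(params, tmp, &memcg->memcg_slab_caches, list) {
		cachep = memcg_params_to_cache(params);
		kmem_cache_shrink(cachep);
		if (atomic_read(&cachep->memcg_params->nr_pages) == 0)
			__memcg_kmem_destroy_cache(cachep);
	}
	mutex_unlock(&memcg->slab_caches_mutex);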

> There is no excuse to downgrade to periodic reaping when we already
> directly hook into the event that makes the cache empty.  If slab
> needs to hold on to the cache for slightly longer than the final
> memcg_release_pages(), then it should grab a refcount to it.

IMO that wouldn't be a downgrade - the code in memcg_release_pages
already does not work as expected, at least it doesn't initiate cache
destruction when the last object goes away.

If we really want to destroy caches as soon as possible, we could:

1) Count active objects per cache instead of pages as we do now. That
would be too costly IMO.

2) When freeing an object of a dead memcg cache, initiate a thorough
check whether the cache is really empty and destroy it if so. That could
be implemented by poking the reaping thread on kfree (rough sketch
below), and it actually does not require the schedule_work in
memcg_release_pages IMO.
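
Just to illustrate option 2 (the reap_work field and the free-path hook
are made up, nothing like this exists in this series):

static void memcg_cache_reap_func(struct work_struct *w)
{
	struct memcg_cache_params *p =
		container_of(w, struct memcg_cache_params, reap_work);
	struct kmem_cache *cachep = memcg_params_to_cache(p);

	kmem_cache_shrink(cachep);
	if (atomic_read(&p->nr_pages) == 0)
		memcg_kmem_destroy_cache(cachep);
}

/* hypothetical hook called from kmem_cache_free() for dead caches only */
static inline void memcg_poke_reaper(struct kmem_cache *cachep)
{
	if (unlikely(cachep->memcg_params->dead))
		schedule_work(&cachep->memcg_params->reap_work);
}

The branch would only fire for dead caches, but it is still an extra
check on the free fast path.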

Thanks.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH -mm 1/4] memcg, slab: do not schedule cache destruction when last page goes away
  2014-04-15  6:24       ` Vladimir Davydov
@ 2014-04-15 15:17         ` Christoph Lameter
  -1 siblings, 0 replies; 20+ messages in thread
From: Christoph Lameter @ 2014-04-15 15:17 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: Johannes Weiner, akpm, mhocko, glommer, penberg, linux-kernel,
	linux-mm, devel

On Tue, 15 Apr 2014, Vladimir Davydov wrote:

> 2) When freeing an object of a dead memcg cache, initiate thorough check
> if the cache is really empty and destroy it then. That could be
> implemented by poking the reaping thread on kfree, and actually does not
> require the schedule_work in memcg_release_pages IMO.

There is already logic in both slub and slab that does that on cache
close.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH -mm 1/4] memcg, slab: do not schedule cache destruction when last page goes away
  2014-04-15 15:17         ` Christoph Lameter
@ 2014-04-15 19:08           ` Vladimir Davydov
  -1 siblings, 0 replies; 20+ messages in thread
From: Vladimir Davydov @ 2014-04-15 19:08 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Johannes Weiner, akpm, mhocko, glommer, penberg, linux-kernel,
	linux-mm, devel

Hi Christoph,

15.04.2014 19:17, Christoph Lameter:
> On Tue, 15 Apr 2014, Vladimir Davydov wrote:
>
>> 2) When freeing an object of a dead memcg cache, initiate thorough check
>> if the cache is really empty and destroy it then. That could be
>> implemented by poking the reaping thread on kfree, and actually does not
>> require the schedule_work in memcg_release_pages IMO.
>
> There is already logic in both slub and slab that does that on cache
> close.

Yeah, but here the question is when we should close caches left after
memcg offline. Obviously we should do it after all objects of such a
cache have gone, but when exactly? Do it immediately after the last
kfree (we would have to count objects per cache then, AFAIU), or maybe
check periodically (or on vmpressure) that the cache is empty by issuing
kmem_cache_shrink and checking whether memcg_params::nr_pages == 0?
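
The periodic variant could look something like this (an untested
sketch; the kmem_reap_work member and the interval are made up):

static void memcg_reap_dead_caches_func(struct work_struct *w)
{
	/* only meant for an offline memcg's leftover dead caches */
	struct mem_cgroup *memcg = container_of(to_delayed_work(w),
					struct mem_cgroup, kmem_reap_work);
	struct memcg_cache_params *params, *tmp;

	mutex_lock(&memcg->slab_caches_mutex);
	list_for_each_entry_safe(params, tmp, &memcg->memcg_slab_caches, list) {
		struct kmem_cache *cachep = memcg_params_to_cache(params);

		kmem_cache_shrink(cachep);
		if (atomic_read(&params->nr_pages) == 0)
			__memcg_kmem_destroy_cache(cachep);
	}
	/* keep polling until the last dead cache is gone */
	if (!list_empty(&memcg->memcg_slab_caches))
		schedule_delayed_work(&memcg->kmem_reap_work, 5 * HZ);
	mutex_unlock(&memcg->slab_caches_mutex);
}

The vmpressure variant would be the same loop, just kicked from the
vmpressure notification path instead of a timer.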

Thanks.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH -mm 1/4] memcg, slab: do not schedule cache destruction when last page goes away
  2014-04-15 19:08           ` Vladimir Davydov
@ 2014-04-15 19:32             ` Christoph Lameter
  -1 siblings, 0 replies; 20+ messages in thread
From: Christoph Lameter @ 2014-04-15 19:32 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: Johannes Weiner, akpm, mhocko, glommer, penberg, linux-kernel,
	linux-mm, devel

On Tue, 15 Apr 2014, Vladimir Davydov wrote:

> > There is already logic in both slub and slab that does that on cache
> > close.
>
> Yeah, but here the question is when we should close caches left after memcg
> offline. Obviously we should do it after all objects of such a cache have
> gone, but when exactly? Do it immediately after the last kfree (have to count
> objects per cache then AFAIU) or may be check periodically (or on vmpressure)
> that the cache is empty by issuing kmem_cache_shrink and looking if
> memcg_params::nr_pages = 0?

I guess checking once in a while is fine if you have no other way to
determine this. A hook in kfree() would impact all users.


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2014-04-15 19:32 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-04-09 15:02 [PATCH -mm 0/4] memcg-vs-slab cleanup Vladimir Davydov
2014-04-09 15:02 ` [PATCH -mm 1/4] memcg, slab: do not schedule cache destruction when last page goes away Vladimir Davydov
2014-04-15  2:16   ` Johannes Weiner
2014-04-15  6:24     ` Vladimir Davydov
2014-04-15 15:17       ` Christoph Lameter
2014-04-15 19:08         ` Vladimir Davydov
2014-04-15 19:32           ` Christoph Lameter
2014-04-09 15:02 ` [PATCH -mm 2/4] memcg, slab: merge memcg_{bind,release}_pages to memcg_{un}charge_slab Vladimir Davydov
2014-04-09 15:02 ` [PATCH -mm 3/4] memcg, slab: change memcg::slab_caches_mutex vs slab_mutex locking order Vladimir Davydov
2014-04-09 15:02 ` [PATCH -mm 4/4] memcg, slab: remove memcg_cache_params::destroy work Vladimir Davydov
