* [PATCH RFC -mm v2 0/3] kmemcg: simplify work-flow (was "memcg-vs-slab cleanup")
@ 2014-04-18  8:04 ` Vladimir Davydov
  0 siblings, 0 replies; 30+ messages in thread
From: Vladimir Davydov @ 2014-04-18  8:04 UTC (permalink / raw)
  To: mhocko, hannes; +Cc: akpm, glommer, cl, penberg, linux-kernel, linux-mm, devel

Hi Michal, Johannes,

This patch set is part of the preparations for kmemcg re-parenting. It
aims to simplify kmemcg work-flows and synchronization.

First, it removes async per memcg cache destruction (see patches 1, 2).
Now caches are only destroyed on memcg offline. That means caches that
are not empty on memcg offline will be leaked. However, they are
effectively leaked already, because memcg_cache_params::nr_pages
normally never drops to 0, so the destruction work is never scheduled
unless kmem_cache_shrink is called explicitly. In the future I plan to
reap such dead caches either periodically or on vmpressure.

Second, it replaces the per memcg slab_caches_mutex's with a global
memcg_slab_mutex, which is held over the whole per memcg cache
creation/destruction path and is taken before the slab_mutex (see patch
3). This greatly simplifies synchronization among the various per memcg
cache creation/destruction paths.
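
For illustration, here is a minimal sketch of the resulting lock
ordering on the asynchronous cache creation path (the names match patch
3; this is only a sketch, not part of the series itself):

	mutex_lock(&memcg_slab_mutex);
	memcg_kmem_create_cache(memcg, cachep);	/* takes slab_mutex internally */
	mutex_unlock(&memcg_slab_mutex);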

I really need your help, because I'm far from sure that what I'm doing
here is right. So I would appreciate it if you could look through the
patches and share your thoughts on the design changes they introduce.

Thanks,

Vladimir Davydov (3):
  memcg, slab: do not schedule cache destruction when last page goes
    away
  memcg, slab: merge memcg_{bind,release}_pages to
    memcg_{un}charge_slab
  memcg, slab: simplify synchronization scheme

 include/linux/memcontrol.h |   15 +--
 include/linux/slab.h       |    8 +-
 mm/memcontrol.c            |  231 +++++++++++++++-----------------------------
 mm/slab.c                  |    2 -
 mm/slab.h                  |   28 +-----
 mm/slab_common.c           |   22 ++---
 mm/slub.c                  |    2 -
 7 files changed, 92 insertions(+), 216 deletions(-)

-- 
1.7.10.4


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH RFC -mm v2 1/3] memcg, slab: do not schedule cache destruction when last page goes away
  2014-04-18  8:04 ` Vladimir Davydov
@ 2014-04-18  8:04   ` Vladimir Davydov
  -1 siblings, 0 replies; 30+ messages in thread
From: Vladimir Davydov @ 2014-04-18  8:04 UTC (permalink / raw)
  To: mhocko, hannes; +Cc: akpm, glommer, cl, penberg, linux-kernel, linux-mm, devel

After a memcg is offlined, we mark its kmem caches that cannot be
deleted right now due to pending objects as dead by setting the
memcg_cache_params::dead flag, so that memcg_release_pages will schedule
cache destruction (memcg_cache_params::destroy) as soon as the last slab
of the cache is freed (memcg_cache_params::nr_pages drops to zero).

I guess the idea was to destroy the caches as soon as possible, i.e.
immediately after freeing the last object. However, it just doesn't work
that way, because kmem caches always keep some pages around for the sake
of performance, so nr_pages never gets to zero unless the cache is
shrunk explicitly using kmem_cache_shrink. Of course, we could track the
total number of objects in the cache, or check on every kmem_cache_free
whether all the slabs allocated for the cache are empty, and schedule
destruction if so, but that would be too costly.

Thus we have a piece of code that only works when we explicitly call
kmem_cache_shrink, yet complicates the whole picture a lot. Moreover, it
is in fact racy: kmem_cache_shrink may free the last slab and thus
schedule cache destruction before it has finished checking that the
cache is empty, which can lead to a use-after-free.

So I propose to remove this async cache destruction from
memcg_release_pages, and instead check whether the cache is empty
explicitly after calling kmem_cache_shrink. This will simplify things a
lot without introducing any functional changes.
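
In other words, the destruction worker boils down to a plain
shrink-then-check (this mirrors the kmem_cache_destroy_work_func hunk
below):

	kmem_cache_shrink(cachep);
	if (atomic_read(&cachep->memcg_params->nr_pages) == 0)
		kmem_cache_destroy(cachep);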

As for dead memcg caches (i.e. those left hanging around after memcg
offline because they still have objects), I suppose we should reap them
either periodically or on vmpressure, as Glauber suggested initially.
I'm going to implement this later.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
---
 include/linux/memcontrol.h |    1 -
 include/linux/slab.h       |    2 --
 mm/memcontrol.c            |   63 ++------------------------------------------
 mm/slab.h                  |    7 ++---
 4 files changed, 4 insertions(+), 69 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 5155d09e749d..087a45314181 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -509,7 +509,6 @@ __memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp);
 int memcg_charge_kmem(struct mem_cgroup *memcg, gfp_t gfp, u64 size);
 void memcg_uncharge_kmem(struct mem_cgroup *memcg, u64 size);
 
-void mem_cgroup_destroy_cache(struct kmem_cache *cachep);
 int __kmem_cache_destroy_memcg_children(struct kmem_cache *s);
 
 /**
diff --git a/include/linux/slab.h b/include/linux/slab.h
index a6aab2c0dfc5..905541dd3778 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -524,7 +524,6 @@ static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node)
  * @memcg: pointer to the memcg this cache belongs to
  * @list: list_head for the list of all caches in this memcg
  * @root_cache: pointer to the global, root cache, this cache was derived from
- * @dead: set to true after the memcg dies; the cache may still be around.
  * @nr_pages: number of pages that belongs to this cache.
  * @destroy: worker to be called whenever we are ready, or believe we may be
  *           ready, to destroy this cache.
@@ -540,7 +539,6 @@ struct memcg_cache_params {
 			struct mem_cgroup *memcg;
 			struct list_head list;
 			struct kmem_cache *root_cache;
-			bool dead;
 			atomic_t nr_pages;
 			struct work_struct destroy;
 		};
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e59f5729e5e6..81ecb0de95dd 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3267,60 +3267,11 @@ static void kmem_cache_destroy_work_func(struct work_struct *w)
 
 	cachep = memcg_params_to_cache(p);
 
-	/*
-	 * If we get down to 0 after shrink, we could delete right away.
-	 * However, memcg_release_pages() already puts us back in the workqueue
-	 * in that case. If we proceed deleting, we'll get a dangling
-	 * reference, and removing the object from the workqueue in that case
-	 * is unnecessary complication. We are not a fast path.
-	 *
-	 * Note that this case is fundamentally different from racing with
-	 * shrink_slab(): if memcg_cgroup_destroy_cache() is called in
-	 * kmem_cache_shrink, not only we would be reinserting a dead cache
-	 * into the queue, but doing so from inside the worker racing to
-	 * destroy it.
-	 *
-	 * So if we aren't down to zero, we'll just schedule a worker and try
-	 * again
-	 */
-	if (atomic_read(&cachep->memcg_params->nr_pages) != 0)
-		kmem_cache_shrink(cachep);
-	else
+	kmem_cache_shrink(cachep);
+	if (atomic_read(&cachep->memcg_params->nr_pages) == 0)
 		kmem_cache_destroy(cachep);
 }
 
-void mem_cgroup_destroy_cache(struct kmem_cache *cachep)
-{
-	if (!cachep->memcg_params->dead)
-		return;
-
-	/*
-	 * There are many ways in which we can get here.
-	 *
-	 * We can get to a memory-pressure situation while the delayed work is
-	 * still pending to run. The vmscan shrinkers can then release all
-	 * cache memory and get us to destruction. If this is the case, we'll
-	 * be executed twice, which is a bug (the second time will execute over
-	 * bogus data). In this case, cancelling the work should be fine.
-	 *
-	 * But we can also get here from the worker itself, if
-	 * kmem_cache_shrink is enough to shake all the remaining objects and
-	 * get the page count to 0. In this case, we'll deadlock if we try to
-	 * cancel the work (the worker runs with an internal lock held, which
-	 * is the same lock we would hold for cancel_work_sync().)
-	 *
-	 * Since we can't possibly know who got us here, just refrain from
-	 * running if there is already work pending
-	 */
-	if (work_pending(&cachep->memcg_params->destroy))
-		return;
-	/*
-	 * We have to defer the actual destroying to a workqueue, because
-	 * we might currently be in a context that cannot sleep.
-	 */
-	schedule_work(&cachep->memcg_params->destroy);
-}
-
 int __kmem_cache_destroy_memcg_children(struct kmem_cache *s)
 {
 	struct kmem_cache *c;
@@ -3346,16 +3297,7 @@ int __kmem_cache_destroy_memcg_children(struct kmem_cache *s)
 		 * We will now manually delete the caches, so to avoid races
 		 * we need to cancel all pending destruction workers and
 		 * proceed with destruction ourselves.
-		 *
-		 * kmem_cache_destroy() will call kmem_cache_shrink internally,
-		 * and that could spawn the workers again: it is likely that
-		 * the cache still have active pages until this very moment.
-		 * This would lead us back to mem_cgroup_destroy_cache.
-		 *
-		 * But that will not execute at all if the "dead" flag is not
-		 * set, so flip it down to guarantee we are in control.
 		 */
-		c->memcg_params->dead = false;
 		cancel_work_sync(&c->memcg_params->destroy);
 		kmem_cache_destroy(c);
 
@@ -3377,7 +3319,6 @@ static void mem_cgroup_destroy_all_caches(struct mem_cgroup *memcg)
 	mutex_lock(&memcg->slab_caches_mutex);
 	list_for_each_entry(params, &memcg->memcg_slab_caches, list) {
 		cachep = memcg_params_to_cache(params);
-		cachep->memcg_params->dead = true;
 		schedule_work(&cachep->memcg_params->destroy);
 	}
 	mutex_unlock(&memcg->slab_caches_mutex);
diff --git a/mm/slab.h b/mm/slab.h
index 3db3c52f80a2..efe14d420010 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -127,11 +127,8 @@ static inline void memcg_bind_pages(struct kmem_cache *s, int order)
 
 static inline void memcg_release_pages(struct kmem_cache *s, int order)
 {
-	if (is_root_cache(s))
-		return;
-
-	if (atomic_sub_and_test((1 << order), &s->memcg_params->nr_pages))
-		mem_cgroup_destroy_cache(s);
+	if (!is_root_cache(s))
+		atomic_sub(1 << order, &s->memcg_params->nr_pages);
 }
 
 static inline bool slab_equal_or_root(struct kmem_cache *s,
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH RFC -mm v2 2/3] memcg, slab: merge memcg_{bind,release}_pages to memcg_{un}charge_slab
  2014-04-18  8:04 ` Vladimir Davydov
@ 2014-04-18  8:04   ` Vladimir Davydov
  -1 siblings, 0 replies; 30+ messages in thread
From: Vladimir Davydov @ 2014-04-18  8:04 UTC (permalink / raw)
  To: mhocko, hannes; +Cc: akpm, glommer, cl, penberg, linux-kernel, linux-mm, devel

Currently we have two pairs of kmemcg-related functions that are called
on slab alloc/free. The first is memcg_{bind,release}_pages, which
counts the total number of pages allocated for a kmem cache. The second
is memcg_{un}charge_slab, which {un}charges slab pages to the kmemcg
resource counter. Let's just merge them to keep the code clean.
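
For illustration, after the merge charging and page accounting happen in
one place, roughly like this (this mirrors the __memcg_charge_slab and
__memcg_uncharge_slab hunks below):

	/* on slab allocation: charge first, then account the pages */
	if (!memcg_charge_kmem(cachep->memcg_params->memcg, gfp,
			       PAGE_SIZE << order))
		atomic_add(1 << order, &cachep->memcg_params->nr_pages);

	/* on slab free: the reverse */
	memcg_uncharge_kmem(cachep->memcg_params->memcg, PAGE_SIZE << order);
	atomic_sub(1 << order, &cachep->memcg_params->nr_pages);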

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
---
 include/linux/memcontrol.h |    4 ++--
 mm/memcontrol.c            |   22 ++++++++++++++++++++--
 mm/slab.c                  |    2 --
 mm/slab.h                  |   25 ++-----------------------
 mm/slub.c                  |    2 --
 5 files changed, 24 insertions(+), 31 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 087a45314181..d38d190f4cec 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -506,8 +506,8 @@ void memcg_update_array_size(int num_groups);
 struct kmem_cache *
 __memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp);
 
-int memcg_charge_kmem(struct mem_cgroup *memcg, gfp_t gfp, u64 size);
-void memcg_uncharge_kmem(struct mem_cgroup *memcg, u64 size);
+int __memcg_charge_slab(struct kmem_cache *cachep, gfp_t gfp, int order);
+void __memcg_uncharge_slab(struct kmem_cache *cachep, int order);
 
 int __kmem_cache_destroy_memcg_children(struct kmem_cache *s);
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 81ecb0de95dd..5221347b0e1b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2944,7 +2944,7 @@ static int mem_cgroup_slabinfo_read(struct seq_file *m, void *v)
 }
 #endif
 
-int memcg_charge_kmem(struct mem_cgroup *memcg, gfp_t gfp, u64 size)
+static int memcg_charge_kmem(struct mem_cgroup *memcg, gfp_t gfp, u64 size)
 {
 	struct res_counter *fail_res;
 	int ret = 0;
@@ -2982,7 +2982,7 @@ int memcg_charge_kmem(struct mem_cgroup *memcg, gfp_t gfp, u64 size)
 	return ret;
 }
 
-void memcg_uncharge_kmem(struct mem_cgroup *memcg, u64 size)
+static void memcg_uncharge_kmem(struct mem_cgroup *memcg, u64 size)
 {
 	res_counter_uncharge(&memcg->res, size);
 	if (do_swap_account)
@@ -3380,6 +3380,24 @@ static void memcg_create_cache_enqueue(struct mem_cgroup *memcg,
 	__memcg_create_cache_enqueue(memcg, cachep);
 	memcg_resume_kmem_account();
 }
+
+int __memcg_charge_slab(struct kmem_cache *cachep, gfp_t gfp, int order)
+{
+	int res;
+
+	res = memcg_charge_kmem(cachep->memcg_params->memcg, gfp,
+				PAGE_SIZE << order);
+	if (!res)
+		atomic_add(1 << order, &cachep->memcg_params->nr_pages);
+	return res;
+}
+
+void __memcg_uncharge_slab(struct kmem_cache *cachep, int order)
+{
+	memcg_uncharge_kmem(cachep->memcg_params->memcg, PAGE_SIZE << order);
+	atomic_sub(1 << order, &cachep->memcg_params->nr_pages);
+}
+
 /*
  * Return the kmem_cache we're supposed to use for a slab allocation.
  * We try to use the current memcg's version of the cache.
diff --git a/mm/slab.c b/mm/slab.c
index cbcd2fa7af2f..fa28f86ac16f 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1706,7 +1706,6 @@ static struct page *kmem_getpages(struct kmem_cache *cachep, gfp_t flags,
 	__SetPageSlab(page);
 	if (page->pfmemalloc)
 		SetPageSlabPfmemalloc(page);
-	memcg_bind_pages(cachep, cachep->gfporder);
 
 	if (kmemcheck_enabled && !(cachep->flags & SLAB_NOTRACK)) {
 		kmemcheck_alloc_shadow(page, cachep->gfporder, flags, nodeid);
@@ -1742,7 +1741,6 @@ static void kmem_freepages(struct kmem_cache *cachep, struct page *page)
 	page_mapcount_reset(page);
 	page->mapping = NULL;
 
-	memcg_release_pages(cachep, cachep->gfporder);
 	if (current->reclaim_state)
 		current->reclaim_state->reclaimed_slab += nr_freed;
 	__free_pages(page, cachep->gfporder);
diff --git a/mm/slab.h b/mm/slab.h
index efe14d420010..11eb623f0e61 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -119,18 +119,6 @@ static inline bool is_root_cache(struct kmem_cache *s)
 	return !s->memcg_params || s->memcg_params->is_root_cache;
 }
 
-static inline void memcg_bind_pages(struct kmem_cache *s, int order)
-{
-	if (!is_root_cache(s))
-		atomic_add(1 << order, &s->memcg_params->nr_pages);
-}
-
-static inline void memcg_release_pages(struct kmem_cache *s, int order)
-{
-	if (!is_root_cache(s))
-		atomic_sub(1 << order, &s->memcg_params->nr_pages);
-}
-
 static inline bool slab_equal_or_root(struct kmem_cache *s,
 					struct kmem_cache *p)
 {
@@ -196,8 +184,7 @@ static __always_inline int memcg_charge_slab(struct kmem_cache *s,
 		return 0;
 	if (is_root_cache(s))
 		return 0;
-	return memcg_charge_kmem(s->memcg_params->memcg, gfp,
-				 PAGE_SIZE << order);
+	return __memcg_charge_slab(s, gfp, order);
 }
 
 static __always_inline void memcg_uncharge_slab(struct kmem_cache *s, int order)
@@ -206,7 +193,7 @@ static __always_inline void memcg_uncharge_slab(struct kmem_cache *s, int order)
 		return;
 	if (is_root_cache(s))
 		return;
-	memcg_uncharge_kmem(s->memcg_params->memcg, PAGE_SIZE << order);
+	__memcg_uncharge_slab(s, order);
 }
 #else
 static inline bool is_root_cache(struct kmem_cache *s)
@@ -214,14 +201,6 @@ static inline bool is_root_cache(struct kmem_cache *s)
 	return true;
 }
 
-static inline void memcg_bind_pages(struct kmem_cache *s, int order)
-{
-}
-
-static inline void memcg_release_pages(struct kmem_cache *s, int order)
-{
-}
-
 static inline bool slab_equal_or_root(struct kmem_cache *s,
 				      struct kmem_cache *p)
 {
diff --git a/mm/slub.c b/mm/slub.c
index fa7a1817835e..c17d8b9975be 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1427,7 +1427,6 @@ static struct page *new_slab(struct kmem_cache *s, gfp_t flags, int node)
 
 	order = compound_order(page);
 	inc_slabs_node(s, page_to_nid(page), page->objects);
-	memcg_bind_pages(s, order);
 	page->slab_cache = s;
 	__SetPageSlab(page);
 	if (page->pfmemalloc)
@@ -1478,7 +1477,6 @@ static void __free_slab(struct kmem_cache *s, struct page *page)
 	__ClearPageSlabPfmemalloc(page);
 	__ClearPageSlab(page);
 
-	memcg_release_pages(s, order);
 	page_mapcount_reset(page);
 	if (current->reclaim_state)
 		current->reclaim_state->reclaimed_slab += pages;
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH RFC -mm v2 3/3] memcg, slab: simplify synchronization scheme
  2014-04-18  8:04 ` Vladimir Davydov
@ 2014-04-18  8:04   ` Vladimir Davydov
  -1 siblings, 0 replies; 30+ messages in thread
From: Vladimir Davydov @ 2014-04-18  8:04 UTC (permalink / raw)
  To: mhocko, hannes; +Cc: akpm, glommer, cl, penberg, linux-kernel, linux-mm, devel

At present, we have the following mutexes protecting data related to
per memcg kmem caches:

 - slab_mutex. This one is held during the whole kmem cache creation and
   destruction paths. We also take it when updating the per root cache
   memcg_caches arrays (see memcg_update_all_caches). As a result,
   holding it guarantees there will be no changes to any kmem cache
   (including per memcg ones). Why do we need something else then? The
   point is that it is private to the slab implementation and has some
   internal dependencies on other locks (get_online_cpus). So we just
   don't want to rely upon it and prefer to introduce additional mutexes
   instead.

 - activate_kmem_mutex. Initially it was added to synchronize kmem limit
   initialization (memcg_activate_kmem). However, since we can grow the
   per root cache memcg_caches arrays only on kmem limit initialization
   (see memcg_update_all_caches), we also employ it to protect against
   memcg_caches arrays relocation (e.g. see
   __kmem_cache_destroy_memcg_children).

 - We have a convention not to take slab_mutex in memcontrol.c, but we
   want to walk over per memcg memcg_slab_caches lists there (e.g. for
   destroying all memcg caches on offline). So we have per memcg
   slab_caches_mutex's protecting those lists.

The mutexes are taken in the following order:

   activate_kmem_mutex -> slab_mutex -> memcg::slab_caches_mutex

Such a synchronization scheme has a number of flaws, for instance:

 - We can't call kmem_cache_{destroy,shrink} while walking over a
   memcg::memcg_slab_caches list due to the locking order. As a result,
   in mem_cgroup_destroy_all_caches we schedule the
   memcg_cache_params::destroy work, which shrinks and destroys the
   cache.

 - We don't have a mutex to synchronize per memcg caches destruction
   between memcg offline (mem_cgroup_destroy_all_caches) and root cache
   destruction (__kmem_cache_destroy_memcg_children). Currently we just
   don't bother about it.

This patch simplifies things by substituting the per memcg
slab_caches_mutex's with a global memcg_slab_mutex. It is held whenever
a per memcg cache is created or destroyed, so it protects the per root
cache memcg_caches arrays and the per memcg memcg_slab_caches lists. The
locking order is as follows:

   activate_kmem_mutex -> memcg_slab_mutex -> slab_mutex

This allows us to call kmem_cache_{create,shrink,destroy} under the
memcg_slab_mutex. As a result, we don't need memcg_cache_params::destroy
work any more - we can simply destroy caches while iterating over a per
memcg slab caches list.
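
Concretely, memcg offline can now shrink and destroy caches directly
while walking the list, roughly as in the mem_cgroup_destroy_all_caches
hunk below:

	mutex_lock(&memcg_slab_mutex);
	list_for_each_entry_safe(params, tmp, &memcg->memcg_slab_caches, list) {
		cachep = memcg_params_to_cache(params);
		kmem_cache_shrink(cachep);
		if (atomic_read(&cachep->memcg_params->nr_pages) == 0)
			memcg_kmem_destroy_cache(cachep);
	}
	mutex_unlock(&memcg_slab_mutex);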

Also using the global mutex simplifies synchronization between
concurrent per memcg caches creation/destruction, e.g.
mem_cgroup_destroy_all_caches vs __kmem_cache_destroy_memcg_children.

The downside of this is that we replace the per-memcg
slab_caches_mutex's with a hammer-like global mutex, but since we
already take either the slab_mutex or the cgroup_mutex along with a
memcg::slab_caches_mutex, it shouldn't hurt concurrency a lot.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
---
 include/linux/memcontrol.h |   10 ---
 include/linux/slab.h       |    6 +-
 mm/memcontrol.c            |  150 +++++++++++++++++---------------------------
 mm/slab_common.c           |   22 +++----
 4 files changed, 68 insertions(+), 120 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index d38d190f4cec..1fa23244fe37 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -497,8 +497,6 @@ char *memcg_create_cache_name(struct mem_cgroup *memcg,
 int memcg_alloc_cache_params(struct mem_cgroup *memcg, struct kmem_cache *s,
 			     struct kmem_cache *root_cache);
 void memcg_free_cache_params(struct kmem_cache *s);
-void memcg_register_cache(struct kmem_cache *s);
-void memcg_unregister_cache(struct kmem_cache *s);
 
 int memcg_update_cache_size(struct kmem_cache *s, int num_groups);
 void memcg_update_array_size(int num_groups);
@@ -640,14 +638,6 @@ static inline void memcg_free_cache_params(struct kmem_cache *s)
 {
 }
 
-static inline void memcg_register_cache(struct kmem_cache *s)
-{
-}
-
-static inline void memcg_unregister_cache(struct kmem_cache *s)
-{
-}
-
 static inline struct kmem_cache *
 memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp)
 {
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 905541dd3778..ecbec9ccb80d 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -116,7 +116,8 @@ struct kmem_cache *kmem_cache_create(const char *, size_t, size_t,
 			unsigned long,
 			void (*)(void *));
 #ifdef CONFIG_MEMCG_KMEM
-void kmem_cache_create_memcg(struct mem_cgroup *, struct kmem_cache *);
+struct kmem_cache *kmem_cache_create_memcg(struct mem_cgroup *,
+					   struct kmem_cache *);
 #endif
 void kmem_cache_destroy(struct kmem_cache *);
 int kmem_cache_shrink(struct kmem_cache *);
@@ -525,8 +526,6 @@ static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node)
  * @list: list_head for the list of all caches in this memcg
  * @root_cache: pointer to the global, root cache, this cache was derived from
  * @nr_pages: number of pages that belongs to this cache.
- * @destroy: worker to be called whenever we are ready, or believe we may be
- *           ready, to destroy this cache.
  */
 struct memcg_cache_params {
 	bool is_root_cache;
@@ -540,7 +539,6 @@ struct memcg_cache_params {
 			struct list_head list;
 			struct kmem_cache *root_cache;
 			atomic_t nr_pages;
-			struct work_struct destroy;
 		};
 	};
 };
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 5221347b0e1b..900f968b41ee 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -357,10 +357,9 @@ struct mem_cgroup {
 	struct cg_proto tcp_mem;
 #endif
 #if defined(CONFIG_MEMCG_KMEM)
-	/* analogous to slab_common's slab_caches list. per-memcg */
+	/* analogous to slab_common's slab_caches list, but per-memcg;
+	 * protected by memcg_slab_mutex */
 	struct list_head memcg_slab_caches;
-	/* Not a spinlock, we can take a lot of time walking the list */
-	struct mutex slab_caches_mutex;
         /* Index in the kmem_cache->memcg_params->memcg_caches array */
 	int kmemcg_id;
 #endif
@@ -2903,6 +2902,12 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *memcg,
 static DEFINE_MUTEX(set_limit_mutex);
 
 #ifdef CONFIG_MEMCG_KMEM
+/*
+ * The memcg_slab_mutex is held whenever a per memcg kmem cache is created or
+ * destroyed. It protects memcg_caches arrays and memcg_slab_caches lists.
+ */
+static DEFINE_MUTEX(memcg_slab_mutex);
+
 static DEFINE_MUTEX(activate_kmem_mutex);
 
 static inline bool memcg_can_account_kmem(struct mem_cgroup *memcg)
@@ -2935,10 +2940,10 @@ static int mem_cgroup_slabinfo_read(struct seq_file *m, void *v)
 
 	print_slabinfo_header(m);
 
-	mutex_lock(&memcg->slab_caches_mutex);
+	mutex_lock(&memcg_slab_mutex);
 	list_for_each_entry(params, &memcg->memcg_slab_caches, list)
 		cache_show(memcg_params_to_cache(params), m);
-	mutex_unlock(&memcg->slab_caches_mutex);
+	mutex_unlock(&memcg_slab_mutex);
 
 	return 0;
 }
@@ -3040,8 +3045,6 @@ void memcg_update_array_size(int num)
 		memcg_limited_groups_array_size = memcg_caches_array_size(num);
 }
 
-static void kmem_cache_destroy_work_func(struct work_struct *w);
-
 int memcg_update_cache_size(struct kmem_cache *s, int num_groups)
 {
 	struct memcg_cache_params *cur_params = s->memcg_params;
@@ -3138,8 +3141,6 @@ int memcg_alloc_cache_params(struct mem_cgroup *memcg, struct kmem_cache *s,
 	if (memcg) {
 		s->memcg_params->memcg = memcg;
 		s->memcg_params->root_cache = root_cache;
-		INIT_WORK(&s->memcg_params->destroy,
-				kmem_cache_destroy_work_func);
 		css_get(&memcg->css);
 	} else
 		s->memcg_params->is_root_cache = true;
@@ -3156,24 +3157,34 @@ void memcg_free_cache_params(struct kmem_cache *s)
 	kfree(s->memcg_params);
 }
 
-void memcg_register_cache(struct kmem_cache *s)
+static void memcg_kmem_create_cache(struct mem_cgroup *memcg,
+				    struct kmem_cache *root_cache)
 {
-	struct kmem_cache *root;
-	struct mem_cgroup *memcg;
+	struct kmem_cache *cachep;
 	int id;
 
-	if (is_root_cache(s))
+	lockdep_assert_held(&memcg_slab_mutex);
+
+	id = memcg_cache_id(memcg);
+
+	/*
+	 * Since per-memcg caches are created asynchronously on first
+	 * allocation (see memcg_kmem_get_cache()), several threads can try to
+	 * create the same cache, but only one of them may succeed.
+	 */
+	if (cache_from_memcg_idx(root_cache, id))
 		return;
 
+	cachep = kmem_cache_create_memcg(memcg, root_cache);
 	/*
-	 * Holding the slab_mutex assures nobody will touch the memcg_caches
-	 * array while we are modifying it.
+	 * If we could not create a memcg cache, do not complain, because
+	 * that's not critical at all as we can always proceed with the root
+	 * cache.
 	 */
-	lockdep_assert_held(&slab_mutex);
+	if (!cachep)
+		return;
 
-	root = s->memcg_params->root_cache;
-	memcg = s->memcg_params->memcg;
-	id = memcg_cache_id(memcg);
+	list_add(&cachep->memcg_params->list, &memcg->memcg_slab_caches);
 
 	/*
 	 * Since readers won't lock (see cache_from_memcg_idx()), we need a
@@ -3182,49 +3193,30 @@ void memcg_register_cache(struct kmem_cache *s)
 	 */
 	smp_wmb();
 
-	/*
-	 * Initialize the pointer to this cache in its parent's memcg_params
-	 * before adding it to the memcg_slab_caches list, otherwise we can
-	 * fail to convert memcg_params_to_cache() while traversing the list.
-	 */
-	VM_BUG_ON(root->memcg_params->memcg_caches[id]);
-	root->memcg_params->memcg_caches[id] = s;
-
-	mutex_lock(&memcg->slab_caches_mutex);
-	list_add(&s->memcg_params->list, &memcg->memcg_slab_caches);
-	mutex_unlock(&memcg->slab_caches_mutex);
+	BUG_ON(root_cache->memcg_params->memcg_caches[id]);
+	root_cache->memcg_params->memcg_caches[id] = cachep;
 }
 
-void memcg_unregister_cache(struct kmem_cache *s)
+static void memcg_kmem_destroy_cache(struct kmem_cache *cachep)
 {
-	struct kmem_cache *root;
+	struct kmem_cache *root_cache;
 	struct mem_cgroup *memcg;
 	int id;
 
-	if (is_root_cache(s))
-		return;
+	lockdep_assert_held(&memcg_slab_mutex);
 
-	/*
-	 * Holding the slab_mutex assures nobody will touch the memcg_caches
-	 * array while we are modifying it.
-	 */
-	lockdep_assert_held(&slab_mutex);
+	BUG_ON(is_root_cache(cachep));
 
-	root = s->memcg_params->root_cache;
-	memcg = s->memcg_params->memcg;
+	root_cache = cachep->memcg_params->root_cache;
+	memcg = cachep->memcg_params->memcg;
 	id = memcg_cache_id(memcg);
 
-	mutex_lock(&memcg->slab_caches_mutex);
-	list_del(&s->memcg_params->list);
-	mutex_unlock(&memcg->slab_caches_mutex);
+	BUG_ON(root_cache->memcg_params->memcg_caches[id] != cachep);
+	root_cache->memcg_params->memcg_caches[id] = NULL;
 
-	/*
-	 * Clear the pointer to this cache in its parent's memcg_params only
-	 * after removing it from the memcg_slab_caches list, otherwise we can
-	 * fail to convert memcg_params_to_cache() while traversing the list.
-	 */
-	VM_BUG_ON(root->memcg_params->memcg_caches[id] != s);
-	root->memcg_params->memcg_caches[id] = NULL;
+	list_del(&cachep->memcg_params->list);
+
+	kmem_cache_destroy(cachep);
 }
 
 /*
@@ -3258,70 +3250,42 @@ static inline void memcg_resume_kmem_account(void)
 	current->memcg_kmem_skip_account--;
 }
 
-static void kmem_cache_destroy_work_func(struct work_struct *w)
-{
-	struct kmem_cache *cachep;
-	struct memcg_cache_params *p;
-
-	p = container_of(w, struct memcg_cache_params, destroy);
-
-	cachep = memcg_params_to_cache(p);
-
-	kmem_cache_shrink(cachep);
-	if (atomic_read(&cachep->memcg_params->nr_pages) == 0)
-		kmem_cache_destroy(cachep);
-}
-
 int __kmem_cache_destroy_memcg_children(struct kmem_cache *s)
 {
 	struct kmem_cache *c;
 	int i, failed = 0;
 
-	/*
-	 * If the cache is being destroyed, we trust that there is no one else
-	 * requesting objects from it. Even if there are, the sanity checks in
-	 * kmem_cache_destroy should caught this ill-case.
-	 *
-	 * Still, we don't want anyone else freeing memcg_caches under our
-	 * noses, which can happen if a new memcg comes to life. As usual,
-	 * we'll take the activate_kmem_mutex to protect ourselves against
-	 * this.
-	 */
-	mutex_lock(&activate_kmem_mutex);
+	mutex_lock(&memcg_slab_mutex);
 	for_each_memcg_cache_index(i) {
 		c = cache_from_memcg_idx(s, i);
 		if (!c)
 			continue;
 
-		/*
-		 * We will now manually delete the caches, so to avoid races
-		 * we need to cancel all pending destruction workers and
-		 * proceed with destruction ourselves.
-		 */
-		cancel_work_sync(&c->memcg_params->destroy);
-		kmem_cache_destroy(c);
+		memcg_kmem_destroy_cache(c);
 
 		if (cache_from_memcg_idx(s, i))
 			failed++;
 	}
-	mutex_unlock(&activate_kmem_mutex);
+	mutex_unlock(&memcg_slab_mutex);
 	return failed;
 }
 
 static void mem_cgroup_destroy_all_caches(struct mem_cgroup *memcg)
 {
 	struct kmem_cache *cachep;
-	struct memcg_cache_params *params;
+	struct memcg_cache_params *params, *tmp;
 
 	if (!memcg_kmem_is_active(memcg))
 		return;
 
-	mutex_lock(&memcg->slab_caches_mutex);
-	list_for_each_entry(params, &memcg->memcg_slab_caches, list) {
+	mutex_lock(&memcg_slab_mutex);
+	list_for_each_entry_safe(params, tmp, &memcg->memcg_slab_caches, list) {
 		cachep = memcg_params_to_cache(params);
-		schedule_work(&cachep->memcg_params->destroy);
+		kmem_cache_shrink(cachep);
+		if (atomic_read(&cachep->memcg_params->nr_pages) == 0)
+			memcg_kmem_destroy_cache(cachep);
 	}
-	mutex_unlock(&memcg->slab_caches_mutex);
+	mutex_unlock(&memcg_slab_mutex);
 }
 
 struct create_work {
@@ -3336,7 +3300,10 @@ static void memcg_create_cache_work_func(struct work_struct *w)
 	struct mem_cgroup *memcg = cw->memcg;
 	struct kmem_cache *cachep = cw->cachep;
 
-	kmem_cache_create_memcg(memcg, cachep);
+	mutex_lock(&memcg_slab_mutex);
+	memcg_kmem_create_cache(memcg, cachep);
+	mutex_unlock(&memcg_slab_mutex);
+
 	css_put(&memcg->css);
 	kfree(cw);
 }
@@ -5021,13 +4988,14 @@ static int __memcg_activate_kmem(struct mem_cgroup *memcg,
 	 * Make sure we have enough space for this cgroup in each root cache's
 	 * memcg_params.
 	 */
+	mutex_lock(&memcg_slab_mutex);
 	err = memcg_update_all_caches(memcg_id + 1);
+	mutex_unlock(&memcg_slab_mutex);
 	if (err)
 		goto out_rmid;
 
 	memcg->kmemcg_id = memcg_id;
 	INIT_LIST_HEAD(&memcg->memcg_slab_caches);
-	mutex_init(&memcg->slab_caches_mutex);
 
 	/*
 	 * We couldn't have accounted to this cgroup, because it hasn't got the
diff --git a/mm/slab_common.c b/mm/slab_common.c
index cab4c49b3e8c..b3968ca1e55d 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -160,7 +160,6 @@ do_kmem_cache_create(char *name, size_t object_size, size_t size, size_t align,
 
 	s->refcount = 1;
 	list_add(&s->list, &slab_caches);
-	memcg_register_cache(s);
 out:
 	if (err)
 		return ERR_PTR(err);
@@ -266,22 +265,15 @@ EXPORT_SYMBOL(kmem_cache_create);
  * requests going from @memcg to @root_cache. The new cache inherits properties
  * from its parent.
  */
-void kmem_cache_create_memcg(struct mem_cgroup *memcg, struct kmem_cache *root_cache)
+struct kmem_cache *kmem_cache_create_memcg(struct mem_cgroup *memcg,
+					   struct kmem_cache *root_cache)
 {
-	struct kmem_cache *s;
+	struct kmem_cache *s = NULL;
 	char *cache_name;
 
 	get_online_cpus();
 	mutex_lock(&slab_mutex);
 
-	/*
-	 * Since per-memcg caches are created asynchronously on first
-	 * allocation (see memcg_kmem_get_cache()), several threads can try to
-	 * create the same cache, but only one of them may succeed.
-	 */
-	if (cache_from_memcg_idx(root_cache, memcg_cache_id(memcg)))
-		goto out_unlock;
-
 	cache_name = memcg_create_cache_name(memcg, root_cache);
 	if (!cache_name)
 		goto out_unlock;
@@ -290,12 +282,15 @@ void kmem_cache_create_memcg(struct mem_cgroup *memcg, struct kmem_cache *root_c
 				 root_cache->size, root_cache->align,
 				 root_cache->flags, root_cache->ctor,
 				 memcg, root_cache);
-	if (IS_ERR(s))
+	if (IS_ERR(s)) {
 		kfree(cache_name);
+		s = NULL;
+	}
 
 out_unlock:
 	mutex_unlock(&slab_mutex);
 	put_online_cpus();
+	return s;
 }
 
 static int kmem_cache_destroy_memcg_children(struct kmem_cache *s)
@@ -332,11 +327,8 @@ void kmem_cache_destroy(struct kmem_cache *s)
 		goto out_unlock;
 
 	list_del(&s->list);
-	memcg_unregister_cache(s);
-
 	if (__kmem_cache_shutdown(s) != 0) {
 		list_add(&s->list, &slab_caches);
-		memcg_register_cache(s);
 		printk(KERN_ERR "kmem_cache_destroy %s: "
 		       "Slab cache still has objects\n", s->name);
 		dump_stack();
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 30+ messages in thread

 
-	/*
-	 * Initialize the pointer to this cache in its parent's memcg_params
-	 * before adding it to the memcg_slab_caches list, otherwise we can
-	 * fail to convert memcg_params_to_cache() while traversing the list.
-	 */
-	VM_BUG_ON(root->memcg_params->memcg_caches[id]);
-	root->memcg_params->memcg_caches[id] = s;
-
-	mutex_lock(&memcg->slab_caches_mutex);
-	list_add(&s->memcg_params->list, &memcg->memcg_slab_caches);
-	mutex_unlock(&memcg->slab_caches_mutex);
+	BUG_ON(root_cache->memcg_params->memcg_caches[id]);
+	root_cache->memcg_params->memcg_caches[id] = cachep;
 }
 
-void memcg_unregister_cache(struct kmem_cache *s)
+static void memcg_kmem_destroy_cache(struct kmem_cache *cachep)
 {
-	struct kmem_cache *root;
+	struct kmem_cache *root_cache;
 	struct mem_cgroup *memcg;
 	int id;
 
-	if (is_root_cache(s))
-		return;
+	lockdep_assert_held(&memcg_slab_mutex);
 
-	/*
-	 * Holding the slab_mutex assures nobody will touch the memcg_caches
-	 * array while we are modifying it.
-	 */
-	lockdep_assert_held(&slab_mutex);
+	BUG_ON(is_root_cache(cachep));
 
-	root = s->memcg_params->root_cache;
-	memcg = s->memcg_params->memcg;
+	root_cache = cachep->memcg_params->root_cache;
+	memcg = cachep->memcg_params->memcg;
 	id = memcg_cache_id(memcg);
 
-	mutex_lock(&memcg->slab_caches_mutex);
-	list_del(&s->memcg_params->list);
-	mutex_unlock(&memcg->slab_caches_mutex);
+	BUG_ON(root_cache->memcg_params->memcg_caches[id] != cachep);
+	root_cache->memcg_params->memcg_caches[id] = NULL;
 
-	/*
-	 * Clear the pointer to this cache in its parent's memcg_params only
-	 * after removing it from the memcg_slab_caches list, otherwise we can
-	 * fail to convert memcg_params_to_cache() while traversing the list.
-	 */
-	VM_BUG_ON(root->memcg_params->memcg_caches[id] != s);
-	root->memcg_params->memcg_caches[id] = NULL;
+	list_del(&cachep->memcg_params->list);
+
+	kmem_cache_destroy(cachep);
 }
 
 /*
@@ -3258,70 +3250,42 @@ static inline void memcg_resume_kmem_account(void)
 	current->memcg_kmem_skip_account--;
 }
 
-static void kmem_cache_destroy_work_func(struct work_struct *w)
-{
-	struct kmem_cache *cachep;
-	struct memcg_cache_params *p;
-
-	p = container_of(w, struct memcg_cache_params, destroy);
-
-	cachep = memcg_params_to_cache(p);
-
-	kmem_cache_shrink(cachep);
-	if (atomic_read(&cachep->memcg_params->nr_pages) == 0)
-		kmem_cache_destroy(cachep);
-}
-
 int __kmem_cache_destroy_memcg_children(struct kmem_cache *s)
 {
 	struct kmem_cache *c;
 	int i, failed = 0;
 
-	/*
-	 * If the cache is being destroyed, we trust that there is no one else
-	 * requesting objects from it. Even if there are, the sanity checks in
-	 * kmem_cache_destroy should caught this ill-case.
-	 *
-	 * Still, we don't want anyone else freeing memcg_caches under our
-	 * noses, which can happen if a new memcg comes to life. As usual,
-	 * we'll take the activate_kmem_mutex to protect ourselves against
-	 * this.
-	 */
-	mutex_lock(&activate_kmem_mutex);
+	mutex_lock(&memcg_slab_mutex);
 	for_each_memcg_cache_index(i) {
 		c = cache_from_memcg_idx(s, i);
 		if (!c)
 			continue;
 
-		/*
-		 * We will now manually delete the caches, so to avoid races
-		 * we need to cancel all pending destruction workers and
-		 * proceed with destruction ourselves.
-		 */
-		cancel_work_sync(&c->memcg_params->destroy);
-		kmem_cache_destroy(c);
+		memcg_kmem_destroy_cache(c);
 
 		if (cache_from_memcg_idx(s, i))
 			failed++;
 	}
-	mutex_unlock(&activate_kmem_mutex);
+	mutex_unlock(&memcg_slab_mutex);
 	return failed;
 }
 
 static void mem_cgroup_destroy_all_caches(struct mem_cgroup *memcg)
 {
 	struct kmem_cache *cachep;
-	struct memcg_cache_params *params;
+	struct memcg_cache_params *params, *tmp;
 
 	if (!memcg_kmem_is_active(memcg))
 		return;
 
-	mutex_lock(&memcg->slab_caches_mutex);
-	list_for_each_entry(params, &memcg->memcg_slab_caches, list) {
+	mutex_lock(&memcg_slab_mutex);
+	list_for_each_entry_safe(params, tmp, &memcg->memcg_slab_caches, list) {
 		cachep = memcg_params_to_cache(params);
-		schedule_work(&cachep->memcg_params->destroy);
+		kmem_cache_shrink(cachep);
+		if (atomic_read(&cachep->memcg_params->nr_pages) == 0)
+			memcg_kmem_destroy_cache(cachep);
 	}
-	mutex_unlock(&memcg->slab_caches_mutex);
+	mutex_unlock(&memcg_slab_mutex);
 }
 
 struct create_work {
@@ -3336,7 +3300,10 @@ static void memcg_create_cache_work_func(struct work_struct *w)
 	struct mem_cgroup *memcg = cw->memcg;
 	struct kmem_cache *cachep = cw->cachep;
 
-	kmem_cache_create_memcg(memcg, cachep);
+	mutex_lock(&memcg_slab_mutex);
+	memcg_kmem_create_cache(memcg, cachep);
+	mutex_unlock(&memcg_slab_mutex);
+
 	css_put(&memcg->css);
 	kfree(cw);
 }
@@ -5021,13 +4988,14 @@ static int __memcg_activate_kmem(struct mem_cgroup *memcg,
 	 * Make sure we have enough space for this cgroup in each root cache's
 	 * memcg_params.
 	 */
+	mutex_lock(&memcg_slab_mutex);
 	err = memcg_update_all_caches(memcg_id + 1);
+	mutex_unlock(&memcg_slab_mutex);
 	if (err)
 		goto out_rmid;
 
 	memcg->kmemcg_id = memcg_id;
 	INIT_LIST_HEAD(&memcg->memcg_slab_caches);
-	mutex_init(&memcg->slab_caches_mutex);
 
 	/*
 	 * We couldn't have accounted to this cgroup, because it hasn't got the
diff --git a/mm/slab_common.c b/mm/slab_common.c
index cab4c49b3e8c..b3968ca1e55d 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -160,7 +160,6 @@ do_kmem_cache_create(char *name, size_t object_size, size_t size, size_t align,
 
 	s->refcount = 1;
 	list_add(&s->list, &slab_caches);
-	memcg_register_cache(s);
 out:
 	if (err)
 		return ERR_PTR(err);
@@ -266,22 +265,15 @@ EXPORT_SYMBOL(kmem_cache_create);
  * requests going from @memcg to @root_cache. The new cache inherits properties
  * from its parent.
  */
-void kmem_cache_create_memcg(struct mem_cgroup *memcg, struct kmem_cache *root_cache)
+struct kmem_cache *kmem_cache_create_memcg(struct mem_cgroup *memcg,
+					   struct kmem_cache *root_cache)
 {
-	struct kmem_cache *s;
+	struct kmem_cache *s = NULL;
 	char *cache_name;
 
 	get_online_cpus();
 	mutex_lock(&slab_mutex);
 
-	/*
-	 * Since per-memcg caches are created asynchronously on first
-	 * allocation (see memcg_kmem_get_cache()), several threads can try to
-	 * create the same cache, but only one of them may succeed.
-	 */
-	if (cache_from_memcg_idx(root_cache, memcg_cache_id(memcg)))
-		goto out_unlock;
-
 	cache_name = memcg_create_cache_name(memcg, root_cache);
 	if (!cache_name)
 		goto out_unlock;
@@ -290,12 +282,15 @@ void kmem_cache_create_memcg(struct mem_cgroup *memcg, struct kmem_cache *root_c
 				 root_cache->size, root_cache->align,
 				 root_cache->flags, root_cache->ctor,
 				 memcg, root_cache);
-	if (IS_ERR(s))
+	if (IS_ERR(s)) {
 		kfree(cache_name);
+		s = NULL;
+	}
 
 out_unlock:
 	mutex_unlock(&slab_mutex);
 	put_online_cpus();
+	return s;
 }
 
 static int kmem_cache_destroy_memcg_children(struct kmem_cache *s)
@@ -332,11 +327,8 @@ void kmem_cache_destroy(struct kmem_cache *s)
 		goto out_unlock;
 
 	list_del(&s->list);
-	memcg_unregister_cache(s);
-
 	if (__kmem_cache_shutdown(s) != 0) {
 		list_add(&s->list, &slab_caches);
-		memcg_register_cache(s);
 		printk(KERN_ERR "kmem_cache_destroy %s: "
 		       "Slab cache still has objects\n", s->name);
 		dump_stack();
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC -mm v2 0/3] kmemcg: simplify work-flow (was "memcg-vs-slab cleanup")
  2014-04-18  8:04 ` Vladimir Davydov
@ 2014-04-18  8:08   ` Vladimir Davydov
  -1 siblings, 0 replies; 30+ messages in thread
From: Vladimir Davydov @ 2014-04-18  8:08 UTC (permalink / raw)
  To: mhocko, hannes; +Cc: akpm, glommer, cl, penberg, linux-kernel, linux-mm, devel

On 04/18/2014 12:04 PM, Vladimir Davydov wrote:
> Hi Michal, Johannes,
>
> This patch-set is a part of preparations for kmemcg re-parenting. It
> targets at simplifying kmemcg work-flows and synchronization.
>
> First, it removes async per memcg cache destruction (see patches 1, 2).
> Now caches are only destroyed on memcg offline. That means the caches
> that are not empty on memcg offline will be leaked. However, they are
> already leaked, because memcg_cache_params::nr_pages normally never
> drops to 0 so the destruction work is never scheduled except
> kmem_cache_shrink is called explicitly. In the future I'm planning
> reaping such dead caches on vmpressure or periodically.
>
> Second, it substitutes per memcg slab_caches_mutex's with the global
> memcg_slab_mutex, which should be taken during the whole per memcg cache
> creation/destruction path before the slab_mutex (see patch 3). This
> greatly simplifies synchronization among various per memcg cache
> creation/destruction paths.

v1 can be found here: https://lkml.org/lkml/2014/4/9/298

Changes in v2:
  - substitute per memcg slab_caches_mutex's with the global
    memcg_slab_mutex and re-split the set.

>
> I really need your help, because I'm far not sure if what I'm doing here
> is right. So I would appreciate if you could look through the patches
> and share your thoughts about the design changes they introduce.
>
> Thanks,
>
> Vladimir Davydov (3):
>    memcg, slab: do not schedule cache destruction when last page goes
>      away
>    memcg, slab: merge memcg_{bind,release}_pages to
>      memcg_{un}charge_slab
>    memcg, slab: simplify synchronization scheme
>
>   include/linux/memcontrol.h |   15 +--
>   include/linux/slab.h       |    8 +-
>   mm/memcontrol.c            |  231 +++++++++++++++-----------------------------
>   mm/slab.c                  |    2 -
>   mm/slab.h                  |   28 +-----
>   mm/slab_common.c           |   22 ++---
>   mm/slub.c                  |    2 -
>   7 files changed, 92 insertions(+), 216 deletions(-)
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC -mm v2 0/3] kmemcg: simplify work-flow (was "memcg-vs-slab cleanup")
  2014-04-18  8:04 ` Vladimir Davydov
@ 2014-04-18 13:23   ` Johannes Weiner
  -1 siblings, 0 replies; 30+ messages in thread
From: Johannes Weiner @ 2014-04-18 13:23 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: mhocko, akpm, glommer, cl, penberg, linux-kernel, linux-mm, devel

Hi Vladimir,

On Fri, Apr 18, 2014 at 12:04:46PM +0400, Vladimir Davydov wrote:
> Hi Michal, Johannes,
> 
> This patch-set is a part of preparations for kmemcg re-parenting. It
> targets at simplifying kmemcg work-flows and synchronization.
> 
> First, it removes async per memcg cache destruction (see patches 1, 2).
> Now caches are only destroyed on memcg offline. That means the caches
> that are not empty on memcg offline will be leaked. However, they are
> already leaked, because memcg_cache_params::nr_pages normally never
> drops to 0 so the destruction work is never scheduled except
> kmem_cache_shrink is called explicitly. In the future I'm planning
> reaping such dead caches on vmpressure or periodically.

I like the synchronous handling on css destruction, but the periodic
reaping part still bothers me.  If there is absolutely 0 use for these
caches remaining, they shouldn't hang around until we encounter memory
pressure or until some arbitrary timer fires.

Would it be feasible to implement cache merging in both slub and slab,
so that upon css destruction the child's cache's remaining slabs could
be moved to the parent's cache?  If the parent doesn't have one, just
reparent the whole cache.
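
Just to make the second variant concrete, I mean something along these
lines (purely illustrative, ignoring locking and css reference counting,
and assuming the existing parent_mem_cgroup() helper plus the field
names from the current code):

	/* on css destruction, hand the child's cache over to the parent */
	struct mem_cgroup *parent = parent_mem_cgroup(memcg);

	list_move(&cachep->memcg_params->list, &parent->memcg_slab_caches);
	cachep->memcg_params->memcg = parent;
	/* the dying memcg's slot in the root cache is no longer used */
	root_cache->memcg_params->memcg_caches[memcg_cache_id(memcg)] = NULL;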

> Second, it substitutes per memcg slab_caches_mutex's with the global
> memcg_slab_mutex, which should be taken during the whole per memcg cache
> creation/destruction path before the slab_mutex (see patch 3). This
> greatly simplifies synchronization among various per memcg cache
> creation/destruction paths.

This sounds reasonable.  I'll go look at the code.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC -mm v2 1/3] memcg, slab: do not schedule cache destruction when last page goes away
  2014-04-18  8:04   ` Vladimir Davydov
@ 2014-04-18 13:41     ` Johannes Weiner
  -1 siblings, 0 replies; 30+ messages in thread
From: Johannes Weiner @ 2014-04-18 13:41 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: mhocko, akpm, glommer, cl, penberg, linux-kernel, linux-mm, devel

On Fri, Apr 18, 2014 at 12:04:47PM +0400, Vladimir Davydov wrote:
> After a memcg is offlined, we mark its kmem caches that cannot be
> deleted right now due to pending objects as dead by setting the
> memcg_cache_params::dead flag, so that memcg_release_pages will schedule
> cache destruction (memcg_cache_params::destroy) as soon as the last slab
> of the cache is freed (memcg_cache_params::nr_pages drops to zero).
> 
> I guess the idea was to destroy the caches as soon as possible, i.e.
> immediately after freeing the last object. However, it just doesn't work
> that way, because kmem caches always preserve some pages for the sake of
> performance, so that nr_pages never gets to zero unless the cache is
> shrunk explicitly using kmem_cache_shrink. Of course, we could account
> the total number of objects on the cache or check if all the slabs
> allocated for the cache are empty on kmem_cache_free and schedule
> destruction if so, but that would be too costly.
> 
> Thus we have a piece of code that works only when we explicitly call
> kmem_cache_shrink, but complicates the whole picture a lot. Moreover,
> it's racy in fact. For instance, kmem_cache_shrink may free the last
> slab and thus schedule cache destruction before it finishes checking
> that the cache is empty, which can lead to use-after-free.

Can't this still happen when the last object free races with css
destruction?  IIRC, you were worried in the past that slab/slub might
need a refcount to the cache to prevent this.  What changed?

> So I propose to remove this async cache destruction from
> memcg_release_pages, and check if the cache is empty explicitly after
> calling kmem_cache_shrink instead. This will simplify things a lot w/o
> introducing any functional changes.

Agreed.

> Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC -mm v2 2/3] memcg, slab: merge memcg_{bind,release}_pages to memcg_{un}charge_slab
  2014-04-18  8:04   ` Vladimir Davydov
@ 2014-04-18 13:44     ` Johannes Weiner
  -1 siblings, 0 replies; 30+ messages in thread
From: Johannes Weiner @ 2014-04-18 13:44 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: mhocko, akpm, glommer, cl, penberg, linux-kernel, linux-mm, devel

On Fri, Apr 18, 2014 at 12:04:48PM +0400, Vladimir Davydov wrote:
> Currently we have two pairs of kmemcg-related functions that are called
> on slab alloc/free. The first is memcg_{bind,release}_pages that count
> the total number of pages allocated on a kmem cache. The second is
> memcg_{un}charge_slab that {un}charge slab pages to kmemcg resource
> counter. Let's just merge them to keep the code clean.
> 
> Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
> ---
>  include/linux/memcontrol.h |    4 ++--
>  mm/memcontrol.c            |   22 ++++++++++++++++++++--
>  mm/slab.c                  |    2 --
>  mm/slab.h                  |   25 ++-----------------------
>  mm/slub.c                  |    2 --
>  5 files changed, 24 insertions(+), 31 deletions(-)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 087a45314181..d38d190f4cec 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -506,8 +506,8 @@ void memcg_update_array_size(int num_groups);
>  struct kmem_cache *
>  __memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp);
>  
> -int memcg_charge_kmem(struct mem_cgroup *memcg, gfp_t gfp, u64 size);
> -void memcg_uncharge_kmem(struct mem_cgroup *memcg, u64 size);
> +int __memcg_charge_slab(struct kmem_cache *cachep, gfp_t gfp, int order);
> +void __memcg_uncharge_slab(struct kmem_cache *cachep, int order);

I like the patch overall, but why the __prefix and not just
memcg_charge_slab() and memcg_uncharge_slab()?

Not a show stopper, though:
Acked-by: Johannes Weiner <hannes@cmpxchg.org>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC -mm v2 3/3] memcg, slab: simplify synchronization scheme
  2014-04-18  8:04   ` Vladimir Davydov
@ 2014-04-18 14:17     ` Johannes Weiner
  -1 siblings, 0 replies; 30+ messages in thread
From: Johannes Weiner @ 2014-04-18 14:17 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: mhocko, akpm, glommer, cl, penberg, linux-kernel, linux-mm, devel

I like this patch, but the API names are confusing.  Could we fix up
that whole thing by any chance?  Some suggestions below, but they
might only be marginally better...

On Fri, Apr 18, 2014 at 12:04:49PM +0400, Vladimir Davydov wrote:
> @@ -3156,24 +3157,34 @@ void memcg_free_cache_params(struct kmem_cache *s)
>  	kfree(s->memcg_params);
>  }
>  
> -void memcg_register_cache(struct kmem_cache *s)
> +static void memcg_kmem_create_cache(struct mem_cgroup *memcg,
> +				    struct kmem_cache *root_cache)

memcg_copy_kmem_cache()?

> @@ -3182,49 +3193,30 @@ void memcg_register_cache(struct kmem_cache *s)
>  	 */
>  	smp_wmb();
>  
> -	/*
> -	 * Initialize the pointer to this cache in its parent's memcg_params
> -	 * before adding it to the memcg_slab_caches list, otherwise we can
> -	 * fail to convert memcg_params_to_cache() while traversing the list.
> -	 */
> -	VM_BUG_ON(root->memcg_params->memcg_caches[id]);
> -	root->memcg_params->memcg_caches[id] = s;
> -
> -	mutex_lock(&memcg->slab_caches_mutex);
> -	list_add(&s->memcg_params->list, &memcg->memcg_slab_caches);
> -	mutex_unlock(&memcg->slab_caches_mutex);
> +	BUG_ON(root_cache->memcg_params->memcg_caches[id]);
> +	root_cache->memcg_params->memcg_caches[id] = cachep;
>  }
>  
> -void memcg_unregister_cache(struct kmem_cache *s)
> +static void memcg_kmem_destroy_cache(struct kmem_cache *cachep)

memcg_destroy_kmem_cache()?

> @@ -3258,70 +3250,42 @@ static inline void memcg_resume_kmem_account(void)
>  	current->memcg_kmem_skip_account--;
>  }
>  
> -static void kmem_cache_destroy_work_func(struct work_struct *w)
> -{
> -	struct kmem_cache *cachep;
> -	struct memcg_cache_params *p;
> -
> -	p = container_of(w, struct memcg_cache_params, destroy);
> -
> -	cachep = memcg_params_to_cache(p);
> -
> -	kmem_cache_shrink(cachep);
> -	if (atomic_read(&cachep->memcg_params->nr_pages) == 0)
> -		kmem_cache_destroy(cachep);
> -}
> -
>  int __kmem_cache_destroy_memcg_children(struct kmem_cache *s)

kmem_cache_destroy_memcg_copies()?

>  static void mem_cgroup_destroy_all_caches(struct mem_cgroup *memcg)

memcg_destroy_kmem_cache_copies()?

> @@ -266,22 +265,15 @@ EXPORT_SYMBOL(kmem_cache_create);
>   * requests going from @memcg to @root_cache. The new cache inherits properties
>   * from its parent.
>   */
> -void kmem_cache_create_memcg(struct mem_cgroup *memcg, struct kmem_cache *root_cache)
> +struct kmem_cache *kmem_cache_create_memcg(struct mem_cgroup *memcg,
> +					   struct kmem_cache *root_cache)

kmem_cache_request_memcg_copy()?

>  {
> -	struct kmem_cache *s;
> +	struct kmem_cache *s = NULL;
>  	char *cache_name;
>  
>  	get_online_cpus();
>  	mutex_lock(&slab_mutex);
>  
> -	/*
> -	 * Since per-memcg caches are created asynchronously on first
> -	 * allocation (see memcg_kmem_get_cache()), several threads can try to
> -	 * create the same cache, but only one of them may succeed.
> -	 */
> -	if (cache_from_memcg_idx(root_cache, memcg_cache_id(memcg)))
> -		goto out_unlock;
> -
>  	cache_name = memcg_create_cache_name(memcg, root_cache);

memcg_name_kmem_cache()?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC -mm v2 0/3] kmemcg: simplify work-flow (was "memcg-vs-slab cleanup")
  2014-04-18 13:23   ` Johannes Weiner
@ 2014-04-18 16:04     ` Vladimir Davydov
  -1 siblings, 0 replies; 30+ messages in thread
From: Vladimir Davydov @ 2014-04-18 16:04 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: mhocko, akpm, glommer, cl, penberg, linux-kernel, linux-mm, devel

On 04/18/2014 05:23 PM, Johannes Weiner wrote:
>> First, it removes async per memcg cache destruction (see patches 1, 2).
>> Now caches are only destroyed on memcg offline. That means the caches
>> that are not empty on memcg offline will be leaked. However, they are
>> already leaked, because memcg_cache_params::nr_pages normally never
>> drops to 0 so the destruction work is never scheduled except
>> kmem_cache_shrink is called explicitly. In the future I'm planning
>> reaping such dead caches on vmpressure or periodically.
>
> I like the synchronous handling on css destruction, but the periodical
> reaping part still bothers me.  If there is absolutely 0 use for these
> caches remaining, they shouldn't hang around until we encounter memory
> pressure or a random time interval.

Agree.

> Would it be feasible to implement cache merging in both slub and slab,
> so that upon css destruction the child's cache's remaining slabs could
> be moved to the parent's cache?  If the parent doesn't have one, just
> reparent the whole cache.

Interesting idea. That would definitely look neater than periodic
reaping. But I suspect it won't be easy to do, because synchronization
in sl[au]b is a subtle thing. I'll take a closer look at slab's
internals to see whether it's feasible.

>
>> Second, it substitutes per memcg slab_caches_mutex's with the global
>> memcg_slab_mutex, which should be taken during the whole per memcg cache
>> creation/destruction path before the slab_mutex (see patch 3). This
>> greatly simplifies synchronization among various per memcg cache
>> creation/destruction paths.
>
> This sounds reasonable.  I'll go look at the code.

Thank you!

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC -mm v2 1/3] memcg, slab: do not schedule cache destruction when last page goes away
  2014-04-18 13:41     ` Johannes Weiner
@ 2014-04-18 16:05       ` Vladimir Davydov
  -1 siblings, 0 replies; 30+ messages in thread
From: Vladimir Davydov @ 2014-04-18 16:05 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: mhocko, akpm, glommer, cl, penberg, linux-kernel, linux-mm, devel

On 04/18/2014 05:41 PM, Johannes Weiner wrote:
>> Thus we have a piece of code that works only when we explicitly call
>> kmem_cache_shrink, but complicates the whole picture a lot. Moreover,
>> it's racy in fact. For instance, kmem_cache_shrink may free the last
>> slab and thus schedule cache destruction before it finishes checking
>> that the cache is empty, which can lead to use-after-free.
>
> Can't this still happen when the last object free races with css
> destruction?

AFAIU, yes, it still can happen, but we have fewer places to fix now. I'm
planning to sort this out by rearranging operations inside
kmem_cache_free so that we do not touch the cache after we've
decremented memcg_cache_params::nr_pages and made the cache potentially
destroyable.
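
Roughly this kind of ordering (just a sketch of the idea, not the actual
code; the point is only that the nr_pages decrement must be the very
last access to the cache):

static void __memcg_uncharge_slab(struct kmem_cache *cachep, int order)
{
	struct mem_cgroup *memcg = cachep->memcg_params->memcg;

	memcg_uncharge_kmem(memcg, PAGE_SIZE << order);
	/*
	 * Must be the last access to cachep: once nr_pages drops to zero,
	 * memcg offline may destroy the cache.
	 */
	atomic_sub(1 << order, &cachep->memcg_params->nr_pages);
}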

Or, if we could reparent individual slabs as you proposed earlier, we
wouldn't have to worry about this at all any more, nor about per memcg
cache destruction.

Thanks.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC -mm v2 2/3] memcg, slab: merge memcg_{bind,release}_pages to memcg_{un}charge_slab
  2014-04-18 13:44     ` Johannes Weiner
@ 2014-04-18 16:07       ` Vladimir Davydov
  -1 siblings, 0 replies; 30+ messages in thread
From: Vladimir Davydov @ 2014-04-18 16:07 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: mhocko, akpm, glommer, cl, penberg, linux-kernel, linux-mm, devel

On 04/18/2014 05:44 PM, Johannes Weiner wrote:
> On Fri, Apr 18, 2014 at 12:04:48PM +0400, Vladimir Davydov wrote:
>> Currently we have two pairs of kmemcg-related functions that are called
>> on slab alloc/free. The first is memcg_{bind,release}_pages that count
>> the total number of pages allocated on a kmem cache. The second is
>> memcg_{un}charge_slab that {un}charge slab pages to kmemcg resource
>> counter. Let's just merge them to keep the code clean.
>>
>> Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
>> ---
>>   include/linux/memcontrol.h |    4 ++--
>>   mm/memcontrol.c            |   22 ++++++++++++++++++++--
>>   mm/slab.c                  |    2 --
>>   mm/slab.h                  |   25 ++-----------------------
>>   mm/slub.c                  |    2 --
>>   5 files changed, 24 insertions(+), 31 deletions(-)
>>
>> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
>> index 087a45314181..d38d190f4cec 100644
>> --- a/include/linux/memcontrol.h
>> +++ b/include/linux/memcontrol.h
>> @@ -506,8 +506,8 @@ void memcg_update_array_size(int num_groups);
>>   struct kmem_cache *
>>   __memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp);
>>
>> -int memcg_charge_kmem(struct mem_cgroup *memcg, gfp_t gfp, u64 size);
>> -void memcg_uncharge_kmem(struct mem_cgroup *memcg, u64 size);
>> +int __memcg_charge_slab(struct kmem_cache *cachep, gfp_t gfp, int order);
>> +void __memcg_uncharge_slab(struct kmem_cache *cachep, int order);
>
> I like the patch overall, but why the __prefix and not just
> memcg_charge_slab() and memcg_uncharge_slab()?

Because I have memcg_{un}charge_slab (without underscores) in mm/slab.h.
Those functions are inline so that we only issue a function call if the
memcg_kmem_enabled static key is on and the cache is not a global one.
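
I.e. something like this in mm/slab.h (a sketch, assuming the helpers
introduced by this series):

static __always_inline int memcg_charge_slab(struct kmem_cache *s,
					     gfp_t gfp, int order)
{
	if (!memcg_kmem_enabled())
		return 0;
	if (is_root_cache(s))
		return 0;
	/* only pay for an out-of-line call in the accounted case */
	return __memcg_charge_slab(s, gfp, order);
}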

Actually I'm not sure if we really need such an optimization in slab
allocation/free paths, which are not very hot, but it wouldn't hurt,
would it?

Thanks.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC -mm v2 3/3] memcg, slab: simplify synchronization scheme
  2014-04-18 14:17     ` Johannes Weiner
@ 2014-04-18 16:08       ` Vladimir Davydov
  -1 siblings, 0 replies; 30+ messages in thread
From: Vladimir Davydov @ 2014-04-18 16:08 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: mhocko, akpm, glommer, cl, penberg, linux-kernel, linux-mm, devel

On 04/18/2014 06:17 PM, Johannes Weiner wrote:
> I like this patch, but the API names are confusing.  Could we fix up
> that whole thing by any chance?  Some suggestions below, but they
> might only be marginally better...

Yeah, names are inconsistent in kmemcg and desperately want improvement:

mem_cgroup_destroy_all_caches
kmem_cgroup_css_offline
memcg_kmem_get_cache
memcg_charge_kmem
memcg_create_cache_name

I've been thinking about cleaning this up for some time, but couldn't
bring myself to do it. I think it can't wait any longer now, so my next
patch set will rework kmemcg naming.

Can we apply this patch as is for now? It'd be more convenient for me to
rework naming on top of the end picture.

Thanks.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC -mm v2 3/3] memcg, slab: simplify synchronization scheme
  2014-04-18 16:08       ` Vladimir Davydov
@ 2014-04-18 18:26         ` Johannes Weiner
  -1 siblings, 0 replies; 30+ messages in thread
From: Johannes Weiner @ 2014-04-18 18:26 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: mhocko, akpm, glommer, cl, penberg, linux-kernel, linux-mm, devel

On Fri, Apr 18, 2014 at 08:08:17PM +0400, Vladimir Davydov wrote:
> On 04/18/2014 06:17 PM, Johannes Weiner wrote:
> > I like this patch, but the API names are confusing.  Could we fix up
> > that whole thing by any chance?  Some suggestions below, but they
> > might only be marginally better...
> 
> Yeah, names are inconsistent in kmemcg and desperately want improvement:
> 
> mem_cgroup_destroy_all_caches
> kmem_cgroup_css_offline
> memcg_kmem_get_cache
> memcg_charge_kmem
> memcg_create_cache_name
> 
> I've been thinking on cleaning this up for some time, but couldn't make
> up my mind to do this. I think it cannot wait any more now, so my next
> patch set will rework kmemcg naming.
> 
> Can we apply this patch as is for now? It'd be more convenient for me to
> rework naming on top of the end picture.

Yes, absolutely.  For this patch:

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC -mm v2 0/3] kmemcg: simplify work-flow (was "memcg-vs-slab cleanup")
  2014-04-18  8:04 ` Vladimir Davydov
@ 2014-04-20 10:32   ` Vladimir Davydov
  -1 siblings, 0 replies; 30+ messages in thread
From: Vladimir Davydov @ 2014-04-20 10:32 UTC (permalink / raw)
  To: akpm; +Cc: mhocko, hannes, glommer, cl, penberg, linux-kernel, linux-mm, devel

Hi Andrew,

Please do not apply this set for the time being, because I still have
doubts about the end picture I had in mind when submitting it.

Besides, it won't apply cleanly on top of the mm tree after "slab:
get_online_mems for kmem_cache_{create,destroy,shrink}" has been merged.

I'm going to spend a bit more time thinking about and discussing the
problem, and then I will either rework or just rebase the patch set and
resend.

Thanks.

On 04/18/2014 12:04 PM, Vladimir Davydov wrote:
> Hi Michal, Johannes,
> 
> This patch-set is a part of preparations for kmemcg re-parenting. It
> targets at simplifying kmemcg work-flows and synchronization.
> 
> First, it removes async per memcg cache destruction (see patches 1, 2).
> Now caches are only destroyed on memcg offline. That means the caches
> that are not empty on memcg offline will be leaked. However, they are
> already leaked, because memcg_cache_params::nr_pages normally never
> drops to 0 so the destruction work is never scheduled except
> kmem_cache_shrink is called explicitly. In the future I'm planning
> reaping such dead caches on vmpressure or periodically.
> 
> Second, it substitutes per memcg slab_caches_mutex's with the global
> memcg_slab_mutex, which should be taken during the whole per memcg cache
> creation/destruction path before the slab_mutex (see patch 3). This
> greatly simplifies synchronization among various per memcg cache
> creation/destruction paths.
> 
> I really need your help, because I'm far not sure if what I'm doing here
> is right. So I would appreciate if you could look through the patches
> and share your thoughts about the design changes they introduce.
> 
> Thanks,
> 
> Vladimir Davydov (3):
>   memcg, slab: do not schedule cache destruction when last page goes
>     away
>   memcg, slab: merge memcg_{bind,release}_pages to
>     memcg_{un}charge_slab
>   memcg, slab: simplify synchronization scheme
> 
>  include/linux/memcontrol.h |   15 +--
>  include/linux/slab.h       |    8 +-
>  mm/memcontrol.c            |  231 +++++++++++++++-----------------------------
>  mm/slab.c                  |    2 -
>  mm/slab.h                  |   28 +-----
>  mm/slab_common.c           |   22 ++---
>  mm/slub.c                  |    2 -
>  7 files changed, 92 insertions(+), 216 deletions(-)
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread
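
As a rough illustration of the first point in the quoted cover letter: once
async destruction is gone, offline-time teardown amounts to walking the
cgroup's list of caches and destroying each one, and a cache that still
holds objects at that point is simply leaked until reaping on vmpressure or
a periodic reaper is added. The names in this sketch are made up for
illustration and are not the actual kmemcg symbols:

#include <linux/list.h>
#include <linux/slab.h>

/* Stand-in for the per-memcg cache bookkeeping; illustrative only. */
struct memcg_cache_entry {
        struct list_head list;
        struct kmem_cache *cachep;
};

static void memcg_destroy_all_caches_sketch(struct list_head *memcg_caches)
{
        struct memcg_cache_entry *entry, *tmp;

        list_for_each_entry_safe(entry, tmp, memcg_caches, list) {
                /* Release empty slabs before attempting destruction. */
                kmem_cache_shrink(entry->cachep);
                list_del(&entry->list);
                /*
                 * kmem_cache_destroy() refuses to free a cache that still
                 * has live objects, so such a cache stays around (i.e. is
                 * leaked) rather than being destroyed asynchronously later.
                 */
                kmem_cache_destroy(entry->cachep);
                kfree(entry);
        }
}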

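The second point can be sketched in the same spirit: one global mutex wraps
each per-memcg cache creation and destruction path, and slab_mutex is only
ever taken nested inside it, because kmem_cache_create() and
kmem_cache_destroy() acquire it internally. The lock order is therefore
always memcg_slab_mutex -> slab_mutex. Again, the names here are
illustrative rather than the real mm/memcontrol.c code:

#include <linux/mutex.h>
#include <linux/slab.h>

static DEFINE_MUTEX(memcg_slab_mutex_sketch);

static struct kmem_cache *memcg_create_cache_sketch(const char *name,
                                                    size_t size, size_t align,
                                                    unsigned long flags)
{
        struct kmem_cache *s;

        mutex_lock(&memcg_slab_mutex_sketch);
        /* kmem_cache_create() takes slab_mutex nested inside our mutex. */
        s = kmem_cache_create(name, size, align, flags, NULL);
        mutex_unlock(&memcg_slab_mutex_sketch);
        return s;
}

static void memcg_destroy_cache_sketch(struct kmem_cache *s)
{
        mutex_lock(&memcg_slab_mutex_sketch);
        /* Same order on destruction: slab_mutex inside the global mutex. */
        kmem_cache_destroy(s);
        mutex_unlock(&memcg_slab_mutex_sketch);
}

Because both paths take the same lock outside slab_mutex, concurrent
per-memcg cache creation and offline-time destruction are serialized by a
single lock, which is the simplification the cover letter is after.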

end of thread, other threads:[~2014-04-20 10:32 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-04-18  8:04 [PATCH RFC -mm v2 0/3] kmemcg: simplify work-flow (was "memcg-vs-slab cleanup") Vladimir Davydov
2014-04-18  8:04 ` [PATCH RFC -mm v2 1/3] memcg, slab: do not schedule cache destruction when last page goes away Vladimir Davydov
2014-04-18 13:41   ` Johannes Weiner
2014-04-18 16:05     ` Vladimir Davydov
2014-04-18  8:04 ` [PATCH RFC -mm v2 2/3] memcg, slab: merge memcg_{bind,release}_pages to memcg_{un}charge_slab Vladimir Davydov
2014-04-18 13:44   ` Johannes Weiner
2014-04-18 16:07     ` Vladimir Davydov
2014-04-18  8:04 ` [PATCH RFC -mm v2 3/3] memcg, slab: simplify synchronization scheme Vladimir Davydov
2014-04-18 14:17   ` Johannes Weiner
2014-04-18 16:08     ` Vladimir Davydov
2014-04-18 18:26       ` Johannes Weiner
2014-04-18  8:08 ` [PATCH RFC -mm v2 0/3] kmemcg: simplify work-flow (was "memcg-vs-slab cleanup") Vladimir Davydov
2014-04-18 13:23 ` Johannes Weiner
2014-04-18 16:04   ` Vladimir Davydov
2014-04-20 10:32 ` Vladimir Davydov
