* [PATCH -mm v2 0/8] memcg/slab: reintroduce dead cache self-destruction
From: Vladimir Davydov @ 2014-06-06 13:22 UTC
  To: akpm
  Cc: cl, iamjoonsoo.kim, rientjes, penberg, hannes, mhocko,
	linux-kernel, linux-mm

Hi,

When a memcg is turned offline, some of its kmem caches can still have
active objects and therefore cannot be destroyed immediately. Currently,
we simply leak such caches along with the owner memcg, which is bad and
should be resolved.

It would be ideal if we could move all slab pages of such dead caches
to the root/parent cache on memcg offline. However, when I tried to
implement such re-parenting, Christoph pointed out that its overhead
would be unacceptable, at least for SLUB (see
https://lkml.org/lkml/2014/5/13/446).

The problem with re-parenting individual slabs is that it requires
tracking of all slabs allocated to a cache, but SLUB doesn't track full
slabs unless debugging is enabled (!debug). Changing this behavior would
significantly degrade the regular alloc/free paths, because it would
make alloc/free take per-node list locks more often.

After pondering this problem for some time, I think we should return to
dead cache self-destruction, i.e. scheduling cache destruction work when
the last slab page is freed.

This is the behavior we had before commit 5bd93da9917f ("memcg, slab:
simplify synchronization scheme"). It was removed because it simply
didn't work: SL[AU]B are implemented in such a way that they don't
discard empty slabs immediately, but prefer to keep them cached for an
indefinite time to speed up further allocations.

However, we can change this without noticeable performance impact for
both SLAB and SLUB by making them drop free slabs as soon as they become
empty. Since dead caches should never be allocated from, removing empty
slabs from them shouldn't cause any noticeable performance degradation.
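
For orientation, here is the core of the scheme in condensed form. This is
a simplified sketch assembled from the diffs in patches 2, 3 and 6 below,
not the literal hunks; locking and the charge/uncharge bookkeeping are
omitted:

  /* cache creation: the owner memcg holds one reference */
  atomic_long_set(&s->memcg_params->refcnt, 1);

  /* slab allocation (__memcg_charge_slab): each slab takes a reference */
  atomic_long_inc(&cachep->memcg_params->refcnt);

  /* memcg offline (memcg_unregister_all_caches): mark the cache dead,
   * shrink it, and drop the memcg's reference; if that was the last one,
   * destroy the cache right away */
  cachep->memcg_params->dead = true;
  kmem_cache_shrink(cachep);
  if (atomic_long_dec_and_test(&cachep->memcg_params->refcnt))
          memcg_unregister_cache(cachep);

  /* slab free (__memcg_uncharge_slab): if the last reference is dropped
   * after offline, defer destruction to a workqueue, preceded by an
   * rcu-sched grace period (patches 5 and 6) */
  if (unlikely(atomic_long_dec_and_test(&cachep->memcg_params->refcnt)))
          call_rcu_sched(&cachep->memcg_params->rcu_head,
                         memcg_unregister_cache_rcu_func);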

So, this patch set reintroduces dead cache self-destruction and adds
some tweaks to SL[AU]B to prevent dead caches from hanging around
indefinitely. It is organized as follows:

 - patches 1-3 reintroduce dead memcg cache self-destruction;
 - patch 4 makes SLUB's version of kmem_cache_shrink always drop empty
   slabs, even if it fails to allocate a temporary array;
 - patches 5 and 6 fix possible use-after-free connected with
   asynchronous cache destruction;
 - patches 7 and 8 disable caching of empty slabs for dead memcg caches
   for SLUB and SLAB respectively.

Note, this doesn't resolve the problem of memcgs pinned by dead kmem
caches. I'm planning to solve this by re-parenting dead kmem caches to
the parent memcg.

v2:
 - fix use-after-free connected with asynchronous cache destruction;
 - less intrusive version of SLUB's kmem_cache_shrink fix;
 - simplify disabling of free slabs caching for SLUB (Joonsoo);
 - disable free slabs caching instead of using cache_reap for SLAB
   (Christoph).

v1: https://lkml.org/lkml/2014/5/30/264

Thanks,

Vladimir Davydov (8):
  memcg: cleanup memcg_cache_params refcnt usage
  memcg: destroy kmem caches when last slab is freed
  memcg: mark caches that belong to offline memcgs as dead
  slub: don't fail kmem_cache_shrink if slab placement optimization
    fails
  slub: make slab_free non-preemptable
  memcg: wait for kfree's to finish before destroying cache
  slub: make dead memcg caches discard free slabs immediately
  slab: make dead memcg caches discard free slabs immediately

 include/linux/slab.h |   14 +++++----
 mm/memcontrol.c      |   57 +++++++++++++++++++++++++++++++---
 mm/slab.c            |   83 +++++++++++++++++++++++++++++++++++++-------------
 mm/slab.h            |   10 ++++++
 mm/slub.c            |   49 +++++++++++++++++++++--------
 5 files changed, 170 insertions(+), 43 deletions(-)

-- 
1.7.10.4


* [PATCH -mm v2 1/8] memcg: cleanup memcg_cache_params refcnt usage
From: Vladimir Davydov @ 2014-06-06 13:22 UTC
  To: akpm
  Cc: cl, iamjoonsoo.kim, rientjes, penberg, hannes, mhocko,
	linux-kernel, linux-mm

Currently, we count the number of pages allocated to a per memcg cache
in memcg_cache_params->nr_pages. We only use this counter to find out if
the cache is empty and can be destroyed. So let's rename it to refcnt
and make it count not pages, but slabs so that we can use atomic_inc/dec
instead of atomic_add/sub in memcg_charge/uncharge_slab.

Also, as the number of slabs can theoretically exceed INT_MAX, let's use
atomic_long for the counter.
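
(For scale: with the smallest possible slabs of one 4 KiB page each, 2^31
slabs correspond to 2^31 * 4 KiB = 8 TiB of slab memory in a single cache.)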

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Christoph Lameter <cl@linux.com>
---
 include/linux/slab.h |    4 ++--
 mm/memcontrol.c      |    6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 1d9abb7d22a0..1985bd9bec7d 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -526,7 +526,7 @@ static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node)
  * @memcg: pointer to the memcg this cache belongs to
  * @list: list_head for the list of all caches in this memcg
  * @root_cache: pointer to the global, root cache, this cache was derived from
- * @nr_pages: number of pages that belongs to this cache.
+ * @refcnt: reference counter
  */
 struct memcg_cache_params {
 	bool is_root_cache;
@@ -539,7 +539,7 @@ struct memcg_cache_params {
 			struct mem_cgroup *memcg;
 			struct list_head list;
 			struct kmem_cache *root_cache;
-			atomic_t nr_pages;
+			atomic_long_t refcnt;
 		};
 	};
 };
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 15bda8133ff9..98a24e5ea4b5 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3279,7 +3279,7 @@ static void memcg_unregister_all_caches(struct mem_cgroup *memcg)
 	list_for_each_entry_safe(params, tmp, &memcg->memcg_slab_caches, list) {
 		cachep = memcg_params_to_cache(params);
 		kmem_cache_shrink(cachep);
-		if (atomic_read(&cachep->memcg_params->nr_pages) == 0)
+		if (atomic_long_read(&cachep->memcg_params->refcnt) == 0)
 			memcg_unregister_cache(cachep);
 	}
 	mutex_unlock(&memcg_slab_mutex);
@@ -3353,14 +3353,14 @@ int __memcg_charge_slab(struct kmem_cache *cachep, gfp_t gfp, int order)
 	res = memcg_charge_kmem(cachep->memcg_params->memcg, gfp,
 				PAGE_SIZE << order);
 	if (!res)
-		atomic_add(1 << order, &cachep->memcg_params->nr_pages);
+		atomic_long_inc(&cachep->memcg_params->refcnt);
 	return res;
 }
 
 void __memcg_uncharge_slab(struct kmem_cache *cachep, int order)
 {
 	memcg_uncharge_kmem(cachep->memcg_params->memcg, PAGE_SIZE << order);
-	atomic_sub(1 << order, &cachep->memcg_params->nr_pages);
+	atomic_long_dec(&cachep->memcg_params->refcnt);
 }
 
 /*
-- 
1.7.10.4


* [PATCH -mm v2 2/8] memcg: destroy kmem caches when last slab is freed
From: Vladimir Davydov @ 2014-06-06 13:22 UTC
  To: akpm
  Cc: cl, iamjoonsoo.kim, rientjes, penberg, hannes, mhocko,
	linux-kernel, linux-mm

When the memcg_cache_params->refcnt goes to 0, schedule the worker that
will unregister the cache. To prevent this from happening when the owner
memcg is alive, keep the refcnt incremented during memcg lifetime.

Note, this doesn't guarantee that a cache belonging to a dead memcg will
go away as soon as its last object is freed, because the SL[AU]B
implementations can cache empty slabs for performance reasons. Hence the
cache may hang around indefinitely after memcg offline. This is to be
resolved by the next patches.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Christoph Lameter <cl@linux.com>
---
 include/linux/slab.h |    2 ++
 mm/memcontrol.c      |   22 ++++++++++++++++++++--
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 1985bd9bec7d..d9716fdc8211 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -527,6 +527,7 @@ static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node)
  * @list: list_head for the list of all caches in this memcg
  * @root_cache: pointer to the global, root cache, this cache was derived from
  * @refcnt: reference counter
+ * @unregister_work: worker to destroy the cache
  */
 struct memcg_cache_params {
 	bool is_root_cache;
@@ -540,6 +541,7 @@ struct memcg_cache_params {
 			struct list_head list;
 			struct kmem_cache *root_cache;
 			atomic_long_t refcnt;
+			struct work_struct unregister_work;
 		};
 	};
 };
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 98a24e5ea4b5..886b5b414958 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3114,6 +3114,8 @@ int memcg_update_cache_size(struct kmem_cache *s, int num_groups)
 	return 0;
 }
 
+static void memcg_unregister_cache_func(struct work_struct *work);
+
 int memcg_alloc_cache_params(struct mem_cgroup *memcg, struct kmem_cache *s,
 			     struct kmem_cache *root_cache)
 {
@@ -3135,6 +3137,9 @@ int memcg_alloc_cache_params(struct mem_cgroup *memcg, struct kmem_cache *s,
 	if (memcg) {
 		s->memcg_params->memcg = memcg;
 		s->memcg_params->root_cache = root_cache;
+		atomic_long_set(&s->memcg_params->refcnt, 1);
+		INIT_WORK(&s->memcg_params->unregister_work,
+			  memcg_unregister_cache_func);
 		css_get(&memcg->css);
 	} else
 		s->memcg_params->is_root_cache = true;
@@ -3216,6 +3221,17 @@ static void memcg_unregister_cache(struct kmem_cache *cachep)
 	kmem_cache_destroy(cachep);
 }
 
+static void memcg_unregister_cache_func(struct work_struct *work)
+{
+	struct memcg_cache_params *params =
+		container_of(work, struct memcg_cache_params, unregister_work);
+	struct kmem_cache *cachep = memcg_params_to_cache(params);
+
+	mutex_lock(&memcg_slab_mutex);
+	memcg_unregister_cache(cachep);
+	mutex_unlock(&memcg_slab_mutex);
+}
+
 /*
  * During the creation a new cache, we need to disable our accounting mechanism
  * altogether. This is true even if we are not creating, but rather just
@@ -3279,7 +3295,7 @@ static void memcg_unregister_all_caches(struct mem_cgroup *memcg)
 	list_for_each_entry_safe(params, tmp, &memcg->memcg_slab_caches, list) {
 		cachep = memcg_params_to_cache(params);
 		kmem_cache_shrink(cachep);
-		if (atomic_long_read(&cachep->memcg_params->refcnt) == 0)
+		if (atomic_long_dec_and_test(&cachep->memcg_params->refcnt))
 			memcg_unregister_cache(cachep);
 	}
 	mutex_unlock(&memcg_slab_mutex);
@@ -3360,7 +3376,9 @@ int __memcg_charge_slab(struct kmem_cache *cachep, gfp_t gfp, int order)
 void __memcg_uncharge_slab(struct kmem_cache *cachep, int order)
 {
 	memcg_uncharge_kmem(cachep->memcg_params->memcg, PAGE_SIZE << order);
-	atomic_long_dec(&cachep->memcg_params->refcnt);
+
+	if (unlikely(atomic_long_dec_and_test(&cachep->memcg_params->refcnt)))
+		schedule_work(&cachep->memcg_params->unregister_work);
 }
 
 /*
-- 
1.7.10.4


* [PATCH -mm v2 3/8] memcg: mark caches that belong to offline memcgs as dead
From: Vladimir Davydov @ 2014-06-06 13:22 UTC
  To: akpm
  Cc: cl, iamjoonsoo.kim, rientjes, penberg, hannes, mhocko,
	linux-kernel, linux-mm

The new `dead' flag and the memcg_cache_dead() helper introduced here will
be used by the next patches to detect caches that belong to offline memcgs.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Christoph Lameter <cl@linux.com>
---
 include/linux/slab.h |    2 ++
 mm/memcontrol.c      |    1 +
 mm/slab.h            |   10 ++++++++++
 3 files changed, 13 insertions(+)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index d9716fdc8211..d99d5212b815 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -527,6 +527,7 @@ static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node)
  * @list: list_head for the list of all caches in this memcg
  * @root_cache: pointer to the global, root cache, this cache was derived from
  * @refcnt: reference counter
+ * @dead: set to true when owner memcg is turned offline
  * @unregister_work: worker to destroy the cache
  */
 struct memcg_cache_params {
@@ -541,6 +542,7 @@ struct memcg_cache_params {
 			struct list_head list;
 			struct kmem_cache *root_cache;
 			atomic_long_t refcnt;
+			bool dead;
 			struct work_struct unregister_work;
 		};
 	};
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 886b5b414958..ed42fd1105a5 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3294,6 +3294,7 @@ static void memcg_unregister_all_caches(struct mem_cgroup *memcg)
 	mutex_lock(&memcg_slab_mutex);
 	list_for_each_entry_safe(params, tmp, &memcg->memcg_slab_caches, list) {
 		cachep = memcg_params_to_cache(params);
+		cachep->memcg_params->dead = true;
 		kmem_cache_shrink(cachep);
 		if (atomic_long_dec_and_test(&cachep->memcg_params->refcnt))
 			memcg_unregister_cache(cachep);
diff --git a/mm/slab.h b/mm/slab.h
index 961a3fb1f5a2..9515cc520bf8 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -121,6 +121,11 @@ static inline bool is_root_cache(struct kmem_cache *s)
 	return !s->memcg_params || s->memcg_params->is_root_cache;
 }
 
+static inline bool memcg_cache_dead(struct kmem_cache *s)
+{
+	return !is_root_cache(s) && s->memcg_params->dead;
+}
+
 static inline bool slab_equal_or_root(struct kmem_cache *s,
 					struct kmem_cache *p)
 {
@@ -203,6 +208,11 @@ static inline bool is_root_cache(struct kmem_cache *s)
 	return true;
 }
 
+static inline bool memcg_cache_dead(struct kmem_cache *s)
+{
+	return false;
+}
+
 static inline bool slab_equal_or_root(struct kmem_cache *s,
 				      struct kmem_cache *p)
 {
-- 
1.7.10.4


* [PATCH -mm v2 4/8] slub: don't fail kmem_cache_shrink if slab placement optimization fails
From: Vladimir Davydov @ 2014-06-06 13:22 UTC
  To: akpm
  Cc: cl, iamjoonsoo.kim, rientjes, penberg, hannes, mhocko,
	linux-kernel, linux-mm

SLUB's kmem_cache_shrink not only removes empty slabs from the cache,
but also sorts slabs by the number of objects in use to cope with
fragmentation. To achieve that, it tries to allocate a temporary array.
If that allocation fails, the whole procedure is aborted.

This is unacceptable for kmemcg, where we want to be sure that all empty
slabs are removed from the cache on memcg offline, so let's just skip
the slab placement optimization step if the allocation fails, but still
get rid of empty slabs.
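
Condensed, the fallback added by the hunks below works as follows (sketch,
not the literal diff): with objects set to 1 there is only bucket 0 in the
"array", so only completely empty slabs (page->inuse == 0) are collected
and later discarded, while partially used slabs stay on the partial list:

  if (!slabs_by_inuse) {
          slabs_by_inuse = &empty_slabs;  /* single on-stack bucket */
          objects = 1;                    /* only bucket 0 exists   */
  }
  ...
  if (page->inuse < objects)              /* i.e. only if inuse == 0 */
          list_move(&page->lru, slabs_by_inuse + page->inuse);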

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Christoph Lameter <cl@linux.com>
---
 mm/slub.c |   19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index d96faa2464c3..35741592be8c 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3404,12 +3404,20 @@ int __kmem_cache_shrink(struct kmem_cache *s)
 	struct page *page;
 	struct page *t;
 	int objects = oo_objects(s->max);
+	struct list_head empty_slabs;
 	struct list_head *slabs_by_inuse =
 		kmalloc(sizeof(struct list_head) * objects, GFP_KERNEL);
 	unsigned long flags;
 
-	if (!slabs_by_inuse)
-		return -ENOMEM;
+	if (!slabs_by_inuse) {
+		/*
+		 * Do not fail shrinking empty slabs if allocation of the
+		 * temporary array failed. Just skip the slab placement
+		 * optimization then.
+		 */
+		slabs_by_inuse = &empty_slabs;
+		objects = 1;
+	}
 
 	flush_all(s);
 	for_each_node_state(node, N_NORMAL_MEMORY) {
@@ -3430,7 +3438,9 @@ int __kmem_cache_shrink(struct kmem_cache *s)
 		 * list_lock. page->inuse here is the upper limit.
 		 */
 		list_for_each_entry_safe(page, t, &n->partial, lru) {
-			list_move(&page->lru, slabs_by_inuse + page->inuse);
+			if (page->inuse < objects)
+				list_move(&page->lru,
+					  slabs_by_inuse + page->inuse);
 			if (!page->inuse)
 				n->nr_partial--;
 		}
@@ -3449,7 +3459,8 @@ int __kmem_cache_shrink(struct kmem_cache *s)
 			discard_slab(s, page);
 	}
 
-	kfree(slabs_by_inuse);
+	if (slabs_by_inuse != &empty_slabs)
+		kfree(slabs_by_inuse);
 	return 0;
 }
 
-- 
1.7.10.4


* [PATCH -mm v2 5/8] slub: make slab_free non-preemptable
From: Vladimir Davydov @ 2014-06-06 13:22 UTC
  To: akpm
  Cc: cl, iamjoonsoo.kim, rientjes, penberg, hannes, mhocko,
	linux-kernel, linux-mm

Since per-memcg cache destruction is scheduled when the last slab is
freed, to avoid a use-after-free in kmem_cache_free we should either
rearrange the code in kmem_cache_free so that it won't dereference the
cache pointer after freeing the object, or wait for all kmem_cache_free's
to complete before proceeding to cache destruction.

The former approach isn't a good option from the future development
point of view, because every modification to kmem_cache_free would then
have to be made with great care. Hence we should provide a way to wait
for all currently executing kmem_cache_free's to finish.

This patch makes SLUB's implementation of kmem_cache_free
non-preemptable. As a result, synchronize_sched() will work as a barrier
against kmem_cache_free's in flight, so that issuing it before cache
destruction will protect us against the use-after-free.

This won't affect the performance of kmem_cache_free, because we already
disable preemption there, and this patch only moves preempt_enable to
the end of the function. Nor should it affect system latency, because
kmem_cache_free is extremely short, even in its slow path.

SLAB's version of kmem_cache_free already proceeds with irqs disabled,
so nothing to be done there.
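
In other words, after this change the whole of slab_free runs inside a
preempt_disable()/preempt_enable() section, i.e. inside an rcu-sched
read-side critical section. A minimal sketch of the pairing this enables
(patch 6 adds the real call sites):

  /* cache destruction side: wait for every kmem_cache_free that might
   * still dereference the cache, then tear it down */
  synchronize_sched();
  memcg_unregister_cache(cachep);       /* -> kmem_cache_destroy(cachep) */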

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
---
 mm/slub.c |   10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 35741592be8c..e46d6abe8a68 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2673,18 +2673,11 @@ static __always_inline void slab_free(struct kmem_cache *s,
 
 	slab_free_hook(s, x);
 
-redo:
-	/*
-	 * Determine the currently cpus per cpu slab.
-	 * The cpu may change afterward. However that does not matter since
-	 * data is retrieved via this pointer. If we are on the same cpu
-	 * during the cmpxchg then the free will succedd.
-	 */
 	preempt_disable();
+redo:
 	c = this_cpu_ptr(s->cpu_slab);
 
 	tid = c->tid;
-	preempt_enable();
 
 	if (likely(page == c->page)) {
 		set_freepointer(s, object, c->freelist);
@@ -2701,6 +2694,7 @@ redo:
 	} else
 		__slab_free(s, page, x, addr);
 
+	preempt_enable();
 }
 
 void kmem_cache_free(struct kmem_cache *s, void *x)
-- 
1.7.10.4


* [PATCH -mm v2 6/8] memcg: wait for kfree's to finish before destroying cache
From: Vladimir Davydov @ 2014-06-06 13:22 UTC
  To: akpm
  Cc: cl, iamjoonsoo.kim, rientjes, penberg, hannes, mhocko,
	linux-kernel, linux-mm

kmem_cache_free doesn't expect that the cache can be destroyed as soon
as the object is freed; e.g. SLUB's implementation may want to update
cache stats after putting the object on the free list.

Therefore we should wait for all kmem_cache_free's to finish before
proceeding to cache destruction. Since both the SLAB and SLUB versions
of kmem_cache_free are non-preemptable, we wait for an rcu-sched grace
period to elapse.
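
Condensed from the diff below, the two destruction paths and how they order
against in-flight kmem_cache_free (sketch, details and locking omitted):

  /* synchronous path, memcg offline (memcg_unregister_all_caches):
   * collect the caches that are already empty, wait once, destroy them */
  synchronize_sched();
  memcg_unregister_cache(cachep);

  /* asynchronous path, last slab freed after offline
   * (__memcg_uncharge_slab): the rcu-sched callback cannot run before a
   * grace period has elapsed, and it merely schedules the unregister work */
  call_rcu_sched(&cachep->memcg_params->rcu_head,
                 memcg_unregister_cache_rcu_func);
  /* -> schedule_work(&params->unregister_work)
   * -> memcg_unregister_cache() under memcg_slab_mutex */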

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
---
 include/linux/slab.h |    6 ++----
 mm/memcontrol.c      |   34 ++++++++++++++++++++++++++++++++--
 2 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index d99d5212b815..68b1feaba9d6 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -532,11 +532,9 @@ static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node)
  */
 struct memcg_cache_params {
 	bool is_root_cache;
+	struct rcu_head rcu_head;
 	union {
-		struct {
-			struct rcu_head rcu_head;
-			struct kmem_cache *memcg_caches[0];
-		};
+		struct kmem_cache *memcg_caches[0];
 		struct {
 			struct mem_cgroup *memcg;
 			struct list_head list;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ed42fd1105a5..5b3543a9257e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3232,6 +3232,14 @@ static void memcg_unregister_cache_func(struct work_struct *work)
 	mutex_unlock(&memcg_slab_mutex);
 }
 
+static void memcg_unregister_cache_rcu_func(struct rcu_head *rcu)
+{
+	struct memcg_cache_params *params =
+		container_of(rcu, struct memcg_cache_params, rcu_head);
+
+	schedule_work(&params->unregister_work);
+}
+
 /*
  * During the creation a new cache, we need to disable our accounting mechanism
  * altogether. This is true even if we are not creating, but rather just
@@ -3287,6 +3295,7 @@ static void memcg_unregister_all_caches(struct mem_cgroup *memcg)
 {
 	struct kmem_cache *cachep;
 	struct memcg_cache_params *params, *tmp;
+	LIST_HEAD(empty_caches);
 
 	if (!memcg_kmem_is_active(memcg))
 		return;
@@ -3297,7 +3306,26 @@ static void memcg_unregister_all_caches(struct mem_cgroup *memcg)
 		cachep->memcg_params->dead = true;
 		kmem_cache_shrink(cachep);
 		if (atomic_long_dec_and_test(&cachep->memcg_params->refcnt))
-			memcg_unregister_cache(cachep);
+			list_move(&cachep->memcg_params->list, &empty_caches);
+	}
+
+	/*
+	 * kmem_cache_free doesn't expect that the cache can be destroyed as
+	 * soon as the object is freed, e.g. SLUB's implementation may want to
+	 * update cache stats after putting the object to the free list.
+	 *
+	 * Therefore we should wait for all kmem_cache_free's to finish before
+	 * proceeding to cache destruction. Since both SLAB and SLUB versions
+	 * of kmem_cache_free are non-preemptable, we wait for rcu-sched grace
+	 * period to elapse.
+	 */
+	synchronize_sched();
+
+	while (!list_empty(&empty_caches)) {
+		params = list_first_entry(&empty_caches,
+					  struct memcg_cache_params, list);
+		cachep = memcg_params_to_cache(params);
+		memcg_unregister_cache(cachep);
 	}
 	mutex_unlock(&memcg_slab_mutex);
 }
@@ -3379,7 +3407,9 @@ void __memcg_uncharge_slab(struct kmem_cache *cachep, int order)
 	memcg_uncharge_kmem(cachep->memcg_params->memcg, PAGE_SIZE << order);
 
 	if (unlikely(atomic_long_dec_and_test(&cachep->memcg_params->refcnt)))
-		schedule_work(&cachep->memcg_params->unregister_work);
+		/* see memcg_unregister_all_caches */
+		call_rcu_sched(&cachep->memcg_params->rcu_head,
+			       memcg_unregister_cache_rcu_func);
 }
 
 /*
-- 
1.7.10.4


* [PATCH -mm v2 7/8] slub: make dead memcg caches discard free slabs immediately
From: Vladimir Davydov @ 2014-06-06 13:22 UTC
  To: akpm
  Cc: cl, iamjoonsoo.kim, rientjes, penberg, hannes, mhocko,
	linux-kernel, linux-mm

Since a dead memcg cache is destroyed only after the last slab allocated
to it is freed, we must disable caching of empty slabs for such caches,
otherwise they would hang around forever.

This patch makes SLUB discard a dead memcg cache's slabs as soon as they
become empty. To achieve that, it disables per-cpu partial lists for
dead caches (see put_cpu_partial) and forbids keeping empty slabs on
per-node partial lists by setting the cache's min_partial to 0 on
kmem_cache_shrink, which is always called on memcg offline (see
memcg_unregister_all_caches).
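
Condensed, the two knobs added by the hunks below (sketch, not the literal
diff):

  /* put_cpu_partial(): a dead cache must not park empty slabs on the
   * per-cpu partial lists, so unfreeze/flush them right away */
  if (memcg_cache_dead(s))
          flush_all(s);   /* or unfreeze this cpu's partial list in place */

  /* __kmem_cache_shrink(): with min_partial == 0 the per-node partial
   * lists are not allowed to keep empty slabs around either */
  if (memcg_cache_dead(s))
          s->min_partial = 0;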

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Thanks-to: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/slub.c |   20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/mm/slub.c b/mm/slub.c
index e46d6abe8a68..1dad7e2c586a 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2015,6 +2015,8 @@ static void unfreeze_partials(struct kmem_cache *s,
 #endif
 }
 
+static void flush_all(struct kmem_cache *s);
+
 /*
  * Put a page that was just frozen (in __slab_free) into a partial page
  * slot if available. This is done without interrupts disabled and without
@@ -2064,6 +2066,21 @@ static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
 
 	} while (this_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page)
 								!= oldpage);
+
+	if (memcg_cache_dead(s)) {
+		bool done = false;
+		unsigned long flags;
+
+		local_irq_save(flags);
+		if (this_cpu_read(s->cpu_slab->partial) == page) {
+			unfreeze_partials(s, this_cpu_ptr(s->cpu_slab));
+			done = true;
+		}
+		local_irq_restore(flags);
+
+		if (!done)
+			flush_all(s);
+	}
 #endif
 }
 
@@ -3403,6 +3420,9 @@ int __kmem_cache_shrink(struct kmem_cache *s)
 		kmalloc(sizeof(struct list_head) * objects, GFP_KERNEL);
 	unsigned long flags;
 
+	if (memcg_cache_dead(s))
+		s->min_partial = 0;
+
 	if (!slabs_by_inuse) {
 		/*
 		 * Do not fail shrinking empty slabs if allocation of the
-- 
1.7.10.4


* [PATCH -mm v2 8/8] slab: make dead memcg caches discard free slabs immediately
From: Vladimir Davydov @ 2014-06-06 13:22 UTC
  To: akpm
  Cc: cl, iamjoonsoo.kim, rientjes, penberg, hannes, mhocko,
	linux-kernel, linux-mm

Since a dead memcg cache is destroyed only after the last slab allocated
to it is freed, we must disable caching of empty slabs for such caches,
otherwise they would hang around forever.

This patch makes SLAB discard a dead memcg cache's slabs as soon as they
become empty. To achieve that, it disables the per-cpu free object arrays
by setting array_cache->limit to 0 on each cpu and sets the per-node
free_limit to 0 in order to zap the slabs_free lists. This is done on
kmem_cache_shrink (in do_drain, drain_array, drain_alien_cache, and
drain_freelist to be exact), which is always called on memcg offline (see
memcg_unregister_all_caches).

Note, since array_cache->limit and kmem_cache_node->free_limit are
per-cpu/per-node and, as a result, may be updated on cpu/node
online/offline, we have to patch every place where these limits are
initialized.
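
Condensed, the limits that are zeroed for a dead cache by the hunks below
(sketch, not the literal diff):

  /* per-cpu and alien object arrays: with limit == 0 every free bypasses
   * the array and goes straight to free_block() */
  if (memcg_cache_dead(cachep))
          ac->limit = 0;          /* do_drain(), drain_alien_cache(), and
                                   * newly allocated array_caches         */

  /* per-node free list: with free_limit == 0 free slabs are not kept on
   * slabs_free but destroyed */
  if (memcg_cache_dead(cachep))
          n->free_limit = 0;      /* drain_freelist() */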

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
---
 mm/slab.c |   83 +++++++++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 62 insertions(+), 21 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 9ca3b87edabc..80117a13b899 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -740,7 +740,8 @@ static void start_cpu_timer(int cpu)
 	}
 }
 
-static struct array_cache *alloc_arraycache(int node, int entries,
+static struct array_cache *alloc_arraycache(struct kmem_cache *cachep,
+					    int node, int entries,
 					    int batchcount, gfp_t gfp)
 {
 	int memsize = sizeof(void *) * entries + sizeof(struct array_cache);
@@ -757,7 +758,7 @@ static struct array_cache *alloc_arraycache(int node, int entries,
 	kmemleak_no_scan(nc);
 	if (nc) {
 		nc->avail = 0;
-		nc->limit = entries;
+		nc->limit = memcg_cache_dead(cachep) ? 0 : entries;
 		nc->batchcount = batchcount;
 		nc->touched = 0;
 		spin_lock_init(&nc->lock);
@@ -909,7 +910,8 @@ static int transfer_objects(struct array_cache *to,
 #define drain_alien_cache(cachep, alien) do { } while (0)
 #define reap_alien(cachep, n) do { } while (0)
 
-static inline struct array_cache **alloc_alien_cache(int node, int limit, gfp_t gfp)
+static inline struct array_cache **
+alloc_alien_cache(struct kmem_cache *cachep, int node, int limit, gfp_t gfp)
 {
 	return (struct array_cache **)BAD_ALIEN_MAGIC;
 }
@@ -940,7 +942,8 @@ static inline void *____cache_alloc_node(struct kmem_cache *cachep,
 static void *____cache_alloc_node(struct kmem_cache *, gfp_t, int);
 static void *alternate_node_alloc(struct kmem_cache *, gfp_t);
 
-static struct array_cache **alloc_alien_cache(int node, int limit, gfp_t gfp)
+static struct array_cache **alloc_alien_cache(struct kmem_cache *cachep,
+					      int node, int limit, gfp_t gfp)
 {
 	struct array_cache **ac_ptr;
 	int memsize = sizeof(void *) * nr_node_ids;
@@ -953,7 +956,8 @@ static struct array_cache **alloc_alien_cache(int node, int limit, gfp_t gfp)
 		for_each_node(i) {
 			if (i == node || !node_online(i))
 				continue;
-			ac_ptr[i] = alloc_arraycache(node, limit, 0xbaadf00d, gfp);
+			ac_ptr[i] = alloc_arraycache(cachep, node, limit,
+						     0xbaadf00d, gfp);
 			if (!ac_ptr[i]) {
 				for (i--; i >= 0; i--)
 					kfree(ac_ptr[i]);
@@ -1026,6 +1030,8 @@ static void drain_alien_cache(struct kmem_cache *cachep,
 		if (ac) {
 			spin_lock_irqsave(&ac->lock, flags);
 			__drain_alien_cache(cachep, ac, i);
+			if (memcg_cache_dead(cachep))
+				ac->limit = 0;
 			spin_unlock_irqrestore(&ac->lock, flags);
 		}
 	}
@@ -1037,6 +1043,7 @@ static inline int cache_free_alien(struct kmem_cache *cachep, void *objp)
 	struct kmem_cache_node *n;
 	struct array_cache *alien = NULL;
 	int node;
+	bool freed_alien = false;
 
 	node = numa_mem_id();
 
@@ -1053,12 +1060,18 @@ static inline int cache_free_alien(struct kmem_cache *cachep, void *objp)
 		alien = n->alien[nodeid];
 		spin_lock(&alien->lock);
 		if (unlikely(alien->avail == alien->limit)) {
+			if (!alien->limit)
+				goto out;
 			STATS_INC_ACOVERFLOW(cachep);
 			__drain_alien_cache(cachep, alien, nodeid);
 		}
 		ac_put_obj(cachep, alien, objp);
+		freed_alien = true;
+out:
 		spin_unlock(&alien->lock);
-	} else {
+	}
+
+	if (!freed_alien) {
 		spin_lock(&(cachep->node[nodeid])->list_lock);
 		free_block(cachep, &objp, 1, nodeid);
 		spin_unlock(&(cachep->node[nodeid])->list_lock);
@@ -1067,6 +1080,13 @@ static inline int cache_free_alien(struct kmem_cache *cachep, void *objp)
 }
 #endif
 
+static void init_cache_node_free_limit(struct kmem_cache *cachep,
+				       struct kmem_cache_node *n, int node)
+{
+	n->free_limit = memcg_cache_dead(cachep) ? 0 :
+		(1 + nr_cpus_node(node)) * cachep->batchcount + cachep->num;
+}
+
 /*
  * Allocates and initializes node for a node on each slab cache, used for
  * either memory or cpu hotplug.  If memory is being hot-added, the kmem_cache_node
@@ -1105,9 +1125,7 @@ static int init_cache_node_node(int node)
 		}
 
 		spin_lock_irq(&cachep->node[node]->list_lock);
-		cachep->node[node]->free_limit =
-			(1 + nr_cpus_node(node)) *
-			cachep->batchcount + cachep->num;
+		init_cache_node_free_limit(cachep, cachep->node[node], node);
 		spin_unlock_irq(&cachep->node[node]->list_lock);
 	}
 	return 0;
@@ -1142,7 +1160,8 @@ static void cpuup_canceled(long cpu)
 		spin_lock_irq(&n->list_lock);
 
 		/* Free limit for this kmem_cache_node */
-		n->free_limit -= cachep->batchcount;
+		if (n->free_limit >= cachep->batchcount)
+			n->free_limit -= cachep->batchcount;
 		if (nc)
 			free_block(cachep, nc->entry, nc->avail, node);
 
@@ -1210,12 +1229,12 @@ static int cpuup_prepare(long cpu)
 		struct array_cache *shared = NULL;
 		struct array_cache **alien = NULL;
 
-		nc = alloc_arraycache(node, cachep->limit,
+		nc = alloc_arraycache(cachep, node, cachep->limit,
 					cachep->batchcount, GFP_KERNEL);
 		if (!nc)
 			goto bad;
 		if (cachep->shared) {
-			shared = alloc_arraycache(node,
+			shared = alloc_arraycache(cachep, node,
 				cachep->shared * cachep->batchcount,
 				0xbaadf00d, GFP_KERNEL);
 			if (!shared) {
@@ -1224,7 +1243,8 @@ static int cpuup_prepare(long cpu)
 			}
 		}
 		if (use_alien_caches) {
-			alien = alloc_alien_cache(node, cachep->limit, GFP_KERNEL);
+			alien = alloc_alien_cache(cachep, node,
+						  cachep->limit, GFP_KERNEL);
 			if (!alien) {
 				kfree(shared);
 				kfree(nc);
@@ -2415,6 +2435,9 @@ static void do_drain(void *arg)
 	free_block(cachep, ac->entry, ac->avail, node);
 	spin_unlock(&cachep->node[node]->list_lock);
 	ac->avail = 0;
+
+	if (memcg_cache_dead(cachep))
+		ac->limit = 0;
 }
 
 static void drain_cpu_caches(struct kmem_cache *cachep)
@@ -2450,6 +2473,12 @@ static int drain_freelist(struct kmem_cache *cache,
 	int nr_freed;
 	struct page *page;
 
+	if (memcg_cache_dead(cache) && n->free_limit) {
+		spin_lock_irq(&n->list_lock);
+		n->free_limit = 0;
+		spin_unlock_irq(&n->list_lock);
+	}
+
 	nr_freed = 0;
 	while (nr_freed < tofree && !list_empty(&n->slabs_free)) {
 
@@ -3468,9 +3497,16 @@ static inline void __cache_free(struct kmem_cache *cachep, void *objp,
 
 	if (likely(ac->avail < ac->limit)) {
 		STATS_INC_FREEHIT(cachep);
-	} else {
+	} else if (ac->limit) {
 		STATS_INC_FREEMISS(cachep);
 		cache_flusharray(cachep, ac);
+	} else {
+		int nodeid = page_to_nid(virt_to_page(objp));
+
+		spin_lock(&(cachep->node[nodeid])->list_lock);
+		free_block(cachep, &objp, 1, nodeid);
+		spin_unlock(&(cachep->node[nodeid])->list_lock);
+		return;
 	}
 
 	ac_put_obj(cachep, ac, objp);
@@ -3698,14 +3734,15 @@ static int alloc_kmem_cache_node(struct kmem_cache *cachep, gfp_t gfp)
 	for_each_online_node(node) {
 
                 if (use_alien_caches) {
-                        new_alien = alloc_alien_cache(node, cachep->limit, gfp);
+                        new_alien = alloc_alien_cache(cachep, node,
+						      cachep->limit, gfp);
                         if (!new_alien)
                                 goto fail;
                 }
 
 		new_shared = NULL;
 		if (cachep->shared) {
-			new_shared = alloc_arraycache(node,
+			new_shared = alloc_arraycache(cachep, node,
 				cachep->shared*cachep->batchcount,
 					0xbaadf00d, gfp);
 			if (!new_shared) {
@@ -3729,8 +3766,7 @@ static int alloc_kmem_cache_node(struct kmem_cache *cachep, gfp_t gfp)
 				n->alien = new_alien;
 				new_alien = NULL;
 			}
-			n->free_limit = (1 + nr_cpus_node(node)) *
-					cachep->batchcount + cachep->num;
+			init_cache_node_free_limit(cachep, n, node);
 			spin_unlock_irq(&n->list_lock);
 			kfree(shared);
 			free_alien_cache(new_alien);
@@ -3748,8 +3784,7 @@ static int alloc_kmem_cache_node(struct kmem_cache *cachep, gfp_t gfp)
 				((unsigned long)cachep) % REAPTIMEOUT_NODE;
 		n->shared = new_shared;
 		n->alien = new_alien;
-		n->free_limit = (1 + nr_cpus_node(node)) *
-					cachep->batchcount + cachep->num;
+		init_cache_node_free_limit(cachep, n, node);
 		cachep->node[node] = n;
 	}
 	return 0;
@@ -3803,7 +3838,7 @@ static int __do_tune_cpucache(struct kmem_cache *cachep, int limit,
 		return -ENOMEM;
 
 	for_each_online_cpu(i) {
-		new->new[i] = alloc_arraycache(cpu_to_mem(i), limit,
+		new->new[i] = alloc_arraycache(cachep, cpu_to_mem(i), limit,
 						batchcount, gfp);
 		if (!new->new[i]) {
 			for (i--; i >= 0; i--)
@@ -3937,6 +3972,12 @@ static void drain_array(struct kmem_cache *cachep, struct kmem_cache_node *n,
 {
 	int tofree;
 
+	if (memcg_cache_dead(cachep) && ac && ac->limit) {
+		spin_lock_irq(&n->list_lock);
+		ac->limit = 0;
+		spin_unlock_irq(&n->list_lock);
+	}
+
 	if (!ac || !ac->avail)
 		return;
 	if (ac->touched && !force) {
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCH -mm v2 5/8] slub: make slab_free non-preemptable
  2014-06-06 13:22   ` Vladimir Davydov
@ 2014-06-06 14:46     ` Christoph Lameter
  -1 siblings, 0 replies; 60+ messages in thread
From: Christoph Lameter @ 2014-06-06 14:46 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: akpm, iamjoonsoo.kim, rientjes, penberg, hannes, mhocko,
	linux-kernel, linux-mm

On Fri, 6 Jun 2014, Vladimir Davydov wrote:

> This patch makes SLUB's implementation of kmem_cache_free
> non-preemptable. As a result, synchronize_sched() will work as a barrier
> against kmem_cache_free's in flight, so that issuing it before cache
> destruction will protect us against the use-after-free.


Subject: slub: reenable preemption before the freeing of slabs from slab_free

I would prefer to call the page allocator with preemption enabled if possible.

Signed-off-by: Christoph Lameter <cl@linux.com>

Index: linux/mm/slub.c
===================================================================
--- linux.orig/mm/slub.c	2014-05-29 11:45:32.065859887 -0500
+++ linux/mm/slub.c	2014-06-06 09:45:12.822480834 -0500
@@ -1998,6 +1998,7 @@
 	if (n)
 		spin_unlock(&n->list_lock);

+	preempt_enable();
 	while (discard_page) {
 		page = discard_page;
 		discard_page = discard_page->next;
@@ -2006,6 +2007,7 @@
 		discard_slab(s, page);
 		stat(s, FREE_SLAB);
 	}
+	preempt_disable();
 #endif
 }


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH -mm v2 7/8] slub: make dead memcg caches discard free slabs immediately
  2014-06-06 13:22   ` Vladimir Davydov
@ 2014-06-06 14:48     ` Christoph Lameter
  -1 siblings, 0 replies; 60+ messages in thread
From: Christoph Lameter @ 2014-06-06 14:48 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: akpm, iamjoonsoo.kim, rientjes, penberg, hannes, mhocko,
	linux-kernel, linux-mm

On Fri, 6 Jun 2014, Vladimir Davydov wrote:

> Since a dead memcg cache is destroyed only after the last slab allocated
> to it is freed, we must disable caching of empty slabs for such caches,
> otherwise they will be hanging around forever.

Acked-by: Christoph Lameter <cl@linux.com>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH -mm v2 8/8] slab: make dead memcg caches discard free slabs immediately
  2014-06-06 13:22   ` Vladimir Davydov
@ 2014-06-06 14:52     ` Christoph Lameter
  -1 siblings, 0 replies; 60+ messages in thread
From: Christoph Lameter @ 2014-06-06 14:52 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: akpm, iamjoonsoo.kim, rientjes, penberg, hannes, mhocko,
	linux-kernel, linux-mm

On Fri, 6 Jun 2014, Vladimir Davydov wrote:

> @@ -740,7 +740,8 @@ static void start_cpu_timer(int cpu)
>  	}
>  }
>
> -static struct array_cache *alloc_arraycache(int node, int entries,
> +static struct array_cache *alloc_arraycache(struct kmem_cache *cachep,
> +					    int node, int entries,
>  					    int batchcount, gfp_t gfp)
>  {
>  	int memsize = sizeof(void *) * entries + sizeof(struct array_cache);

If you pass struct kmem_cache * into alloc_arraycache then we do not need
to pass entries or batchcount because they are available in struct
kmem_cache.

Otherwise this patch looks a bit too large to me. Simplify a bit?


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH -mm v2 5/8] slub: make slab_free non-preemptable
  2014-06-06 14:46     ` Christoph Lameter
@ 2014-06-09 12:52       ` Vladimir Davydov
  -1 siblings, 0 replies; 60+ messages in thread
From: Vladimir Davydov @ 2014-06-09 12:52 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: akpm, iamjoonsoo.kim, rientjes, penberg, hannes, mhocko,
	linux-kernel, linux-mm

On Fri, Jun 06, 2014 at 09:46:57AM -0500, Christoph Lameter wrote:
> On Fri, 6 Jun 2014, Vladimir Davydov wrote:
> 
> > This patch makes SLUB's implementation of kmem_cache_free
> > non-preemptable. As a result, synchronize_sched() will work as a barrier
> > against kmem_cache_free's in flight, so that issuing it before cache
> > destruction will protect us against the use-after-free.
> 
> 
> Subject: slub: reenable preemption before the freeing of slabs from slab_free
> 
> I would prefer to call the page allocator with preemption enabled if possible.
> 
> Signed-off-by: Christoph Lameter <cl@linux.com>
> 
> Index: linux/mm/slub.c
> ===================================================================
> --- linux.orig/mm/slub.c	2014-05-29 11:45:32.065859887 -0500
> +++ linux/mm/slub.c	2014-06-06 09:45:12.822480834 -0500
> @@ -1998,6 +1998,7 @@
>  	if (n)
>  		spin_unlock(&n->list_lock);
> 
> +	preempt_enable();

The whole function (unfreeze_partials) is currently called with irqs
off, so this is effectively a no-op. I guess we can restore irqs here
though.

>  	while (discard_page) {
>  		page = discard_page;
>  		discard_page = discard_page->next;
> @@ -2006,6 +2007,7 @@
>  		discard_slab(s, page);

If we just freed the last slab of the cache and then get preempted
(suppose we restored irqs above), nothing will prevent the cache from
destruction, which may result in use-after-free below. We need to be
more cautious if we want to call into the page allocator with preemption
and irqs on.

However, I still don't understand what's the point in it. We *already*
call discard_slab with irqs disabled, which is harder, and it hasn't
caused any problems AFAIK. Moreover, even if we enabled preemption/irqs,
it wouldn't guarantee that discard_slab would always be called with
preemption/irqs on, because the whole function - I mean kmem_cache_free
- can be called with preemption/irqs disabled.

So my point is that it would only complicate the code.

Thanks.

>  		stat(s, FREE_SLAB);
>  	}
> +	preempt_disable();
>  #endif
>  }
> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH -mm v2 8/8] slab: make dead memcg caches discard free slabs immediately
  2014-06-06 14:52     ` Christoph Lameter
@ 2014-06-09 13:04       ` Vladimir Davydov
  -1 siblings, 0 replies; 60+ messages in thread
From: Vladimir Davydov @ 2014-06-09 13:04 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: akpm, iamjoonsoo.kim, rientjes, penberg, hannes, mhocko,
	linux-kernel, linux-mm

On Fri, Jun 06, 2014 at 09:52:17AM -0500, Christoph Lameter wrote:
> On Fri, 6 Jun 2014, Vladimir Davydov wrote:
> 
> > @@ -740,7 +740,8 @@ static void start_cpu_timer(int cpu)
> >  	}
> >  }
> >
> > -static struct array_cache *alloc_arraycache(int node, int entries,
> > +static struct array_cache *alloc_arraycache(struct kmem_cache *cachep,
> > +					    int node, int entries,
> >  					    int batchcount, gfp_t gfp)
> >  {
> >  	int memsize = sizeof(void *) * entries + sizeof(struct array_cache);
> 
> If you pass struct kmem_cache * into alloc_arraycache then we do not need
> to pass entries or batchcount because they are available in struct
> kmem_cache.

Seems you're right. Will rework.

> Otherwise this patch looks a bit too large to me. Simplify a bit?

Yeah, exactly. I will split it and resend.

Thank you.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH -mm v2 5/8] slub: make slab_free non-preemptable
  2014-06-09 12:52       ` Vladimir Davydov
@ 2014-06-09 13:52         ` Christoph Lameter
  -1 siblings, 0 replies; 60+ messages in thread
From: Christoph Lameter @ 2014-06-09 13:52 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: akpm, iamjoonsoo.kim, rientjes, penberg, hannes, mhocko,
	linux-kernel, linux-mm

On Mon, 9 Jun 2014, Vladimir Davydov wrote:

> The whole function (unfreeze_partials) is currently called with irqs
> off, so this is effectively a no-op. I guess we can restore irqs here
> though.

We could move the local_irq_save from put_cpu_partial() into
unfreeze_partials().
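
Roughly like this, I mean -- only a sketch of the idea against the hunk
quoted above, not a tested patch; "flags" is assumed to become a local
of unfreeze_partials(), saved near its top with local_irq_save(flags):

	if (n)
		spin_unlock(&n->list_lock);

	/* irq state is now owned by unfreeze_partials() itself */
	local_irq_restore(flags);

	while (discard_page) {
		page = discard_page;
		discard_page = discard_page->next;

		discard_slab(s, page);
		stat(s, FREE_SLAB);
	}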

> If we just freed the last slab of the cache and then get preempted
> (suppose we restored irqs above), nothing will prevent the cache from
> destruction, which may result in use-after-free below. We need to be
> more cautious if we want to call into the page allocator with preemption
> and irqs on.

Hmmm. Ok.
>
> However, I still don't understand what's the point in it. We *already*
> call discard_slab with irqs disabled, which is harder, and it hasn't
> caused any problems AFAIK. Moreover, even if we enabled preemption/irqs,
> it wouldn't guarantee that discard_slab would always be called with
> preemption/irqs on, because the whole function - I mean kmem_cache_free
> - can be called with preemption/irqs disabled.
>
> So my point is that it would only complicate the code.

Ok.

Acked-by: Christoph Lameter <cl@linux.com>


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH -mm v2 8/8] slab: make dead memcg caches discard free slabs immediately
  2014-06-06 13:22   ` Vladimir Davydov
@ 2014-06-10  7:43     ` Joonsoo Kim
  -1 siblings, 0 replies; 60+ messages in thread
From: Joonsoo Kim @ 2014-06-10  7:43 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: akpm, cl, rientjes, penberg, hannes, mhocko, linux-kernel, linux-mm

On Fri, Jun 06, 2014 at 05:22:45PM +0400, Vladimir Davydov wrote:
> Since a dead memcg cache is destroyed only after the last slab allocated
> to it is freed, we must disable caching of empty slabs for such caches,
> otherwise they will be hanging around forever.
> 
> This patch makes SLAB discard dead memcg caches' slabs as soon as they
> become empty. To achieve that, it disables per cpu free object arrays by
> setting array_cache->limit to 0 on each cpu and sets per node free_limit
> to 0 in order to zap slabs_free lists. This is done on kmem_cache_shrink
> (in do_drain, drain_array, drain_alien_cache, and drain_freelist to be
> more exact), which is always called on memcg offline (see
> memcg_unregister_all_caches)
> 
> Note, since array_cache->limit and kmem_cache_node->free_limit are per
> cpu/node and, as a result, they may be updated on cpu/node
> online/offline, we have to patch every place where the limits are
> initialized.

Hello,

You mentioned that disabling per cpu arrays would degrade performance.
But, this patch is implemented to disable per cpu arrays. Is there any
reason to do it like this? How about not disabling per cpu arrays and
others? Leaving it as is makes the patch less intrusive and has low
impact on performance. I guess that amount of reclaimed memory has no
big difference between both approaches.

Thanks.


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH -mm v2 3/8] memcg: mark caches that belong to offline memcgs as dead
  2014-06-06 13:22   ` Vladimir Davydov
@ 2014-06-10  7:48     ` Joonsoo Kim
  -1 siblings, 0 replies; 60+ messages in thread
From: Joonsoo Kim @ 2014-06-10  7:48 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: akpm, cl, rientjes, penberg, hannes, mhocko, linux-kernel, linux-mm

On Fri, Jun 06, 2014 at 05:22:40PM +0400, Vladimir Davydov wrote:
> This will be used by the next patches.
> 
> Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
> Acked-by: Christoph Lameter <cl@linux.com>
> ---
>  include/linux/slab.h |    2 ++
>  mm/memcontrol.c      |    1 +
>  mm/slab.h            |   10 ++++++++++
>  3 files changed, 13 insertions(+)
> 
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index d9716fdc8211..d99d5212b815 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -527,6 +527,7 @@ static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node)
>   * @list: list_head for the list of all caches in this memcg
>   * @root_cache: pointer to the global, root cache, this cache was derived from
>   * @refcnt: reference counter
> + * @dead: set to true when owner memcg is turned offline
>   * @unregister_work: worker to destroy the cache
>   */
>  struct memcg_cache_params {
> @@ -541,6 +542,7 @@ struct memcg_cache_params {
>  			struct list_head list;
>  			struct kmem_cache *root_cache;
>  			atomic_long_t refcnt;
> +			bool dead;
>  			struct work_struct unregister_work;
>  		};
>  	};
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 886b5b414958..ed42fd1105a5 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3294,6 +3294,7 @@ static void memcg_unregister_all_caches(struct mem_cgroup *memcg)
>  	mutex_lock(&memcg_slab_mutex);
>  	list_for_each_entry_safe(params, tmp, &memcg->memcg_slab_caches, list) {
>  		cachep = memcg_params_to_cache(params);
> +		cachep->memcg_params->dead = true;

I guess that this needs smp_wmb() and memcg_cache_dead() needs
smp_rmb(), since we could call memcg_cache_dead() without holding any locks.
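
I mean the usual publish/consume pairing, roughly like below (placement
is only illustrative, and the bool return type of memcg_cache_dead() is
assumed):

	/* memcg offline path, after marking the cache dead */
	cachep->memcg_params->dead = true;
	smp_wmb();	/* pairs with smp_rmb() in memcg_cache_dead() */

	/* reader side */
	static inline bool memcg_cache_dead(struct kmem_cache *s)
	{
		bool dead = s->memcg_params->dead;

		smp_rmb();	/* pairs with smp_wmb() above */
		return dead;
	}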

Thanks.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH -mm v2 7/8] slub: make dead memcg caches discard free slabs immediately
  2014-06-06 13:22   ` Vladimir Davydov
@ 2014-06-10  8:09     ` Joonsoo Kim
  -1 siblings, 0 replies; 60+ messages in thread
From: Joonsoo Kim @ 2014-06-10  8:09 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: akpm, cl, rientjes, penberg, hannes, mhocko, linux-kernel, linux-mm

On Fri, Jun 06, 2014 at 05:22:44PM +0400, Vladimir Davydov wrote:
> Since a dead memcg cache is destroyed only after the last slab allocated
> to it is freed, we must disable caching of empty slabs for such caches,
> otherwise they will be hanging around forever.
> 
> This patch makes SLUB discard dead memcg caches' slabs as soon as they
> become empty. To achieve that, it disables per cpu partial lists for
> dead caches (see put_cpu_partial) and forbids keeping empty slabs on per
> node partial lists by setting cache's min_partial to 0 on
> kmem_cache_shrink, which is always called on memcg offline (see
> memcg_unregister_all_caches).
> 
> Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
> Thanks-to: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
>  mm/slub.c |   20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index e46d6abe8a68..1dad7e2c586a 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2015,6 +2015,8 @@ static void unfreeze_partials(struct kmem_cache *s,
>  #endif
>  }
>  
> +static void flush_all(struct kmem_cache *s);
> +
>  /*
>   * Put a page that was just frozen (in __slab_free) into a partial page
>   * slot if available. This is done without interrupts disabled and without
> @@ -2064,6 +2066,21 @@ static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
>  
>  	} while (this_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page)
>  								!= oldpage);
> +
> +	if (memcg_cache_dead(s)) {
> +               bool done = false;
> +               unsigned long flags;
> +
> +               local_irq_save(flags);
> +               if (this_cpu_read(s->cpu_slab->partial) == page) {
> +                       unfreeze_partials(s, this_cpu_ptr(s->cpu_slab));
> +                       done = true;
> +               }
> +               local_irq_restore(flags);
> +
> +               if (!done)
> +                       flush_all(s);
> +	}

Now, slab_free() is non-preemptable so flush_all() isn't needed.
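
So the tail could probably be reduced to something like this (just a
sketch under that assumption, not a tested change):

	if (memcg_cache_dead(s)) {
		unsigned long flags;

		/*
		 * slab_free() now runs with preemption disabled, so we
		 * are still on the cpu whose partial list got the page
		 * above -- draining it locally is enough.
		 */
		local_irq_save(flags);
		unfreeze_partials(s, this_cpu_ptr(s->cpu_slab));
		local_irq_restore(flags);
	}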

Thanks.


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH -mm v2 8/8] slab: make dead memcg caches discard free slabs immediately
  2014-06-10  7:43     ` Joonsoo Kim
@ 2014-06-10 10:03       ` Vladimir Davydov
  -1 siblings, 0 replies; 60+ messages in thread
From: Vladimir Davydov @ 2014-06-10 10:03 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: akpm, cl, rientjes, penberg, hannes, mhocko, linux-kernel, linux-mm

Hi,

On Tue, Jun 10, 2014 at 04:43:17PM +0900, Joonsoo Kim wrote:
> You mentioned that disabling per cpu arrays would degrade performance.
> But, this patch is implemented to disable per cpu arrays. Is there any
> reason to do it like this? How about not disabling per cpu arrays and
> others? Leaving it as is makes the patch less intrusive and has low
> impact on performance. I guess that amount of reclaimed memory has no
> big difference between both approaches.

Frankly, I incline to shrinking dead SLAB caches periodically from
cache_reap too, because it looks neater and less intrusive to me. Also
it has zero performance impact, which is nice.

However, Christoph proposed to disable per cpu arrays for dead caches,
similarly to SLUB, and I decided to give it a try, just to see the end
code we'd have with it.

I'm still not quite sure which way we should choose though...

Thanks.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH -mm v2 3/8] memcg: mark caches that belong to offline memcgs as dead
  2014-06-10  7:48     ` Joonsoo Kim
@ 2014-06-10 10:06       ` Vladimir Davydov
  -1 siblings, 0 replies; 60+ messages in thread
From: Vladimir Davydov @ 2014-06-10 10:06 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: akpm, cl, rientjes, penberg, hannes, mhocko, linux-kernel, linux-mm

On Tue, Jun 10, 2014 at 04:48:40PM +0900, Joonsoo Kim wrote:
> On Fri, Jun 06, 2014 at 05:22:40PM +0400, Vladimir Davydov wrote:
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 886b5b414958..ed42fd1105a5 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -3294,6 +3294,7 @@ static void memcg_unregister_all_caches(struct mem_cgroup *memcg)
> >  	mutex_lock(&memcg_slab_mutex);
> >  	list_for_each_entry_safe(params, tmp, &memcg->memcg_slab_caches, list) {
> >  		cachep = memcg_params_to_cache(params);
> > +		cachep->memcg_params->dead = true;
> 
> I guess that this needs smp_wmb() and memcg_cache_dead() needs
> smp_rmb(), since we could call memcg_cache_dead() without holding any locks.

Good catch! Actually, I thought we always call on_each_cpu, which works
effectively as a full memory barrier, from kmem_cache_shrink, but that's
not always true for SLUB, so we do need the barriers here. Will fix in
the next iteration.

Thanks.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH -mm v2 7/8] slub: make dead memcg caches discard free slabs immediately
  2014-06-10  8:09     ` Joonsoo Kim
@ 2014-06-10 10:09       ` Vladimir Davydov
  -1 siblings, 0 replies; 60+ messages in thread
From: Vladimir Davydov @ 2014-06-10 10:09 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: akpm, cl, rientjes, penberg, hannes, mhocko, linux-kernel, linux-mm

On Tue, Jun 10, 2014 at 05:09:35PM +0900, Joonsoo Kim wrote:
> On Fri, Jun 06, 2014 at 05:22:44PM +0400, Vladimir Davydov wrote:
> > @@ -2064,6 +2066,21 @@ static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
> >  
> >  	} while (this_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page)
> >  								!= oldpage);
> > +
> > +	if (memcg_cache_dead(s)) {
> > +               bool done = false;
> > +               unsigned long flags;
> > +
> > +               local_irq_save(flags);
> > +               if (this_cpu_read(s->cpu_slab->partial) == page) {
> > +                       unfreeze_partials(s, this_cpu_ptr(s->cpu_slab));
> > +                       done = true;
> > +               }
> > +               local_irq_restore(flags);
> > +
> > +               if (!done)
> > +                       flush_all(s);
> > +	}
> 
> Now, slab_free() is non-preemptable so flush_all() isn't needed.

Right! Will fix.

Thanks.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH -mm v2 8/8] slab: make dead memcg caches discard free slabs immediately
  2014-06-10 10:03       ` Vladimir Davydov
@ 2014-06-10 14:26         ` Christoph Lameter
  -1 siblings, 0 replies; 60+ messages in thread
From: Christoph Lameter @ 2014-06-10 14:26 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: Joonsoo Kim, akpm, rientjes, penberg, hannes, mhocko,
	linux-kernel, linux-mm

On Tue, 10 Jun 2014, Vladimir Davydov wrote:

> Frankly, I incline to shrinking dead SLAB caches periodically from
> cache_reap too, because it looks neater and less intrusive to me. Also
> it has zero performance impact, which is nice.
>
> However, Christoph proposed to disable per cpu arrays for dead caches,
> similarly to SLUB, and I decided to give it a try, just to see the end
> code we'd have with it.
>
> I'm still not quite sure which way we should choose though...

Which one is cleaner?


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH -mm v2 8/8] slab: make dead memcg caches discard free slabs immediately
  2014-06-10 14:26         ` Christoph Lameter
@ 2014-06-10 15:18           ` Vladimir Davydov
  -1 siblings, 0 replies; 60+ messages in thread
From: Vladimir Davydov @ 2014-06-10 15:18 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Joonsoo Kim, akpm, rientjes, penberg, hannes, mhocko,
	linux-kernel, linux-mm

On Tue, Jun 10, 2014 at 09:26:19AM -0500, Christoph Lameter wrote:
> On Tue, 10 Jun 2014, Vladimir Davydov wrote:
> 
> > Frankly, I incline to shrinking dead SLAB caches periodically from
> > cache_reap too, because it looks neater and less intrusive to me. Also
> > it has zero performance impact, which is nice.
> >
> > However, Christoph proposed to disable per cpu arrays for dead caches,
> > similarly to SLUB, and I decided to give it a try, just to see the end
> > code we'd have with it.
> >
> > I'm still not quite sure which way we should choose though...
> 
> Which one is cleaner?

To shrink dead caches aggressively, we only need to modify cache_reap
(see https://lkml.org/lkml/2014/5/30/271).

To zap object arrays for dead caches (this is what this patch does), we
have to:
 - set array_cache->limit to 0 for each per cpu, shared, and alien array
   cache on kmem_cache_shrink;
 - make cpu/node hotplug paths initialize new array cache sizes to 0;
 - make free paths (__cache_free, cache_free_alien) handle a zero array
   cache size properly, because currently they don't.

So IMO the first one (reaping dead caches periodically) requires fewer
modifications and therefore is cleaner.
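
Just to give an idea of the amount of surgery the second option needs on
the kmem_cache_shrink side, here is a rough sketch (illustration only:
field and helper names are taken from mm/slab.c as far as I remember, so
double-check them, and all locking plus the hotplug and free-path changes
from the list above are deliberately left out):

        /* Sketch: disable the per cpu object arrays of a dead cache. */
        static void zap_array_caches(struct kmem_cache *cachep)
        {
                int cpu;

                for_each_online_cpu(cpu) {
                        struct array_cache *ac = cachep->array[cpu];

                        /* Nothing may be cached here any more. */
                        ac->limit = 0;
                        ac->batchcount = 1;
                }

                /*
                 * The shared and alien arrays need the same treatment,
                 * and whatever is already cached has to be drained.
                 */
                drain_cpu_caches(cachep);
        }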

Thanks.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH -mm v2 8/8] slab: make dead memcg caches discard free slabs immediately
  2014-06-10 15:18           ` Vladimir Davydov
@ 2014-06-11  8:11             ` Joonsoo Kim
  -1 siblings, 0 replies; 60+ messages in thread
From: Joonsoo Kim @ 2014-06-11  8:11 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: Christoph Lameter, akpm, rientjes, penberg, hannes, mhocko,
	linux-kernel, linux-mm

On Tue, Jun 10, 2014 at 07:18:34PM +0400, Vladimir Davydov wrote:
> On Tue, Jun 10, 2014 at 09:26:19AM -0500, Christoph Lameter wrote:
> > On Tue, 10 Jun 2014, Vladimir Davydov wrote:
> > 
> > > Frankly, I incline to shrinking dead SLAB caches periodically from
> > > cache_reap too, because it looks neater and less intrusive to me. Also
> > > it has zero performance impact, which is nice.
> > >
> > > However, Christoph proposed to disable per cpu arrays for dead caches,
> > > similarly to SLUB, and I decided to give it a try, just to see the end
> > > code we'd have with it.
> > >
> > > I'm still not quite sure which way we should choose though...
> > 
> > Which one is cleaner?
> 
> To shrink dead caches aggressively, we only need to modify cache_reap
> (see https://lkml.org/lkml/2014/5/30/271).
> 
> To zap object arrays for dead caches (this is what this patch does), we
> have to:
>  - set array_cache->limit to 0 for each per cpu, shared, and alien array
>    cache on kmem_cache_shrink;
>  - make cpu/node hotplug paths initialize new array cache sizes to 0;
>  - make free paths (__cache_free, cache_free_alien) handle a zero array
>    cache size properly, because currently they don't.
> 
> So IMO the first one (reaping dead caches periodically) requires fewer
> modifications and therefore is cleaner.

Yeah, I also like the first one.

Thanks.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH -mm v2 8/8] slab: make dead memcg caches discard free slabs immediately
  2014-06-10 15:18           ` Vladimir Davydov
@ 2014-06-11 21:24             ` Vladimir Davydov
  -1 siblings, 0 replies; 60+ messages in thread
From: Vladimir Davydov @ 2014-06-11 21:24 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Joonsoo Kim, akpm, rientjes, penberg, hannes, mhocko,
	linux-kernel, linux-mm

On Tue, Jun 10, 2014 at 07:18:34PM +0400, Vladimir Davydov wrote:
> On Tue, Jun 10, 2014 at 09:26:19AM -0500, Christoph Lameter wrote:
> > On Tue, 10 Jun 2014, Vladimir Davydov wrote:
> > 
> > > Frankly, I incline to shrinking dead SLAB caches periodically from
> > > cache_reap too, because it looks neater and less intrusive to me. Also
> > > it has zero performance impact, which is nice.
> > >
> > > However, Christoph proposed to disable per cpu arrays for dead caches,
> > > similarly to SLUB, and I decided to give it a try, just to see the end
> > > code we'd have with it.
> > >
> > > I'm still not quite sure which way we should choose though...
> > 
> > Which one is cleaner?
> 
> To shrink dead caches aggressively, we only need to modify cache_reap
> (see https://lkml.org/lkml/2014/5/30/271).

Hmm, reap_alien, which is called from cache_reap to shrink per node
alien object arrays, only processes one node at a time. That means with
the patch I gave a link to above it will take up to
(REAPTIMEOUT_AC*nr_online_nodes) seconds to destroy a virtually empty
dead cache, which may be quite long on large machines. Of course, we can
make reap_alien walk over all alien caches of the current node, but that
will probably hurt performance...
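
Just to put a rough number on it: assuming REAPTIMEOUT_AC is on the order
of 2 seconds (please double-check the actual value in mm/slab.c), a 16-node
box would need up to 2s * 16 nodes ~= half a minute to get rid of an almost
empty dead cache, and a 64-node one up to 2s * 64 nodes ~= 2 minutes.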

> 
> To zap object arrays for dead caches (this is what this patch does), we
> have to:
>  - set array_cache->limit to 0 for each per cpu, shared, and alien array
>    cache on kmem_cache_shrink;
>  - make cpu/node hotplug paths initialize new array cache sizes to 0;
>  - make free paths (__cache_free, cache_free_alien) handle a zero array
>    cache size properly, because currently they don't.
> 
> So IMO the first one (reaping dead caches periodically) requires fewer
> modifications and therefore is cleaner.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH -mm v2 8/8] slab: make dead memcg caches discard free slabs immediately
  2014-06-11 21:24             ` Vladimir Davydov
@ 2014-06-12  6:53               ` Joonsoo Kim
  -1 siblings, 0 replies; 60+ messages in thread
From: Joonsoo Kim @ 2014-06-12  6:53 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: Christoph Lameter, akpm, rientjes, penberg, hannes, mhocko,
	linux-kernel, linux-mm

On Thu, Jun 12, 2014 at 01:24:34AM +0400, Vladimir Davydov wrote:
> On Tue, Jun 10, 2014 at 07:18:34PM +0400, Vladimir Davydov wrote:
> > On Tue, Jun 10, 2014 at 09:26:19AM -0500, Christoph Lameter wrote:
> > > On Tue, 10 Jun 2014, Vladimir Davydov wrote:
> > > 
> > > > Frankly, I incline to shrinking dead SLAB caches periodically from
> > > > cache_reap too, because it looks neater and less intrusive to me. Also
> > > > it has zero performance impact, which is nice.
> > > >
> > > > However, Christoph proposed to disable per cpu arrays for dead caches,
> > > > similarly to SLUB, and I decided to give it a try, just to see the end
> > > > code we'd have with it.
> > > >
> > > > I'm still not quite sure which way we should choose though...
> > > 
> > > Which one is cleaner?
> > 
> > To shrink dead caches aggressively, we only need to modify cache_reap
> > (see https://lkml.org/lkml/2014/5/30/271).
> 
> Hmm, reap_alien, which is called from cache_reap to shrink per node
> alien object arrays, only processes one node at a time. That means with
> the patch I gave a link to above it will take up to
> (REAPTIMEOUT_AC*nr_online_nodes) seconds to destroy a virtually empty
> dead cache, which may be quite long on large machines. Of course, we can
> make reap_alien walk over all alien caches of the current node, but that
> will probably hurt performance...

Hmm, we would probably have only a few objects on other nodes, wouldn't we?

BTW, I have a question about cache_reap(). If there are many kmemcg
users, we would have a lot of slab caches, and just traversing the slab
cache list could take some time. Isn't that a problem?

Thanks.


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH -mm v2 5/8] slub: make slab_free non-preemptable
  2014-06-06 13:22   ` Vladimir Davydov
@ 2014-06-12  6:58     ` Joonsoo Kim
  -1 siblings, 0 replies; 60+ messages in thread
From: Joonsoo Kim @ 2014-06-12  6:58 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: akpm, cl, rientjes, penberg, hannes, mhocko, linux-kernel, linux-mm

On Fri, Jun 06, 2014 at 05:22:42PM +0400, Vladimir Davydov wrote:
> Since per memcg cache destruction is scheduled when the last slab is
> freed, to avoid use-after-free in kmem_cache_free we should either
> rearrange code in kmem_cache_free so that it won't dereference the cache
> ptr after freeing the object, or wait for all kmem_cache_free's to
> complete before proceeding to cache destruction.
> 
> The former approach isn't a good option from the future development
> point of view, because every modifications to kmem_cache_free must be
> done with great care then. Hence we should provide a method to wait for
> all currently executing kmem_cache_free's to finish.
> 
> This patch makes SLUB's implementation of kmem_cache_free
> non-preemptable. As a result, synchronize_sched() will work as a barrier
> against kmem_cache_free's in flight, so that issuing it before cache
> destruction will protect us against the use-after-free.
> 
> This won't affect performance of kmem_cache_free, because we already
> disable preemption there, and this patch only moves preempt_enable to
> the end of the function. Neither should it affect the system latency,
> because kmem_cache_free is extremely short, even in its slow path.
> 
> SLAB's version of kmem_cache_free already proceeds with irqs disabled,
> so nothing to be done there.
> 
> Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
> ---
>  mm/slub.c |   10 ++--------
>  1 file changed, 2 insertions(+), 8 deletions(-)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index 35741592be8c..e46d6abe8a68 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2673,18 +2673,11 @@ static __always_inline void slab_free(struct kmem_cache *s,
>  
>  	slab_free_hook(s, x);
>  
> -redo:
> -	/*
> -	 * Determine the currently cpus per cpu slab.
> -	 * The cpu may change afterward. However that does not matter since
> -	 * data is retrieved via this pointer. If we are on the same cpu
> -	 * during the cmpxchg then the free will succedd.
> -	 */
>  	preempt_disable();

Hello,

Could you add a code comment explaining why this preempt_disable/enable()
is needed? There is no clue in the code that kmemcg depends on it, so
readers cannot understand why it is here.

If possible, please add a similar comment to slab_alloc in mm/slab.c.

Thanks.


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH -mm v2 8/8] slab: make dead memcg caches discard free slabs immediately
  2014-06-12  6:53               ` Joonsoo Kim
@ 2014-06-12 10:02                 ` Vladimir Davydov
  -1 siblings, 0 replies; 60+ messages in thread
From: Vladimir Davydov @ 2014-06-12 10:02 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Christoph Lameter, akpm, rientjes, penberg, hannes, mhocko,
	linux-kernel, linux-mm

On Thu, Jun 12, 2014 at 03:53:45PM +0900, Joonsoo Kim wrote:
> On Thu, Jun 12, 2014 at 01:24:34AM +0400, Vladimir Davydov wrote:
> > On Tue, Jun 10, 2014 at 07:18:34PM +0400, Vladimir Davydov wrote:
> > > On Tue, Jun 10, 2014 at 09:26:19AM -0500, Christoph Lameter wrote:
> > > > On Tue, 10 Jun 2014, Vladimir Davydov wrote:
> > > > 
> > > > > Frankly, I incline to shrinking dead SLAB caches periodically from
> > > > > cache_reap too, because it looks neater and less intrusive to me. Also
> > > > > it has zero performance impact, which is nice.
> > > > >
> > > > > However, Christoph proposed to disable per cpu arrays for dead caches,
> > > > > similarly to SLUB, and I decided to give it a try, just to see the end
> > > > > code we'd have with it.
> > > > >
> > > > > I'm still not quite sure which way we should choose though...
> > > > 
> > > > Which one is cleaner?
> > > 
> > > To shrink dead caches aggressively, we only need to modify cache_reap
> > > (see https://lkml.org/lkml/2014/5/30/271).
> > 
> > Hmm, reap_alien, which is called from cache_reap to shrink per node
> > alien object arrays, only processes one node at a time. That means with
> > the patch I gave a link to above it will take up to
> > (REAPTIMEOUT_AC*nr_online_nodes) seconds to destroy a virtually empty
> > dead cache, which may be quite long on large machines. Of course, we can
> > make reap_alien walk over all alien caches of the current node, but that
> > will probably hurt performance...
> 
> Hmm, we would probably have only a few objects on other nodes, wouldn't we?

I think so, but those few objects will prevent the cache from being
destroyed until they are reaped, which may take a long time.

> BTW, I have a question about cache_reap(). If there are many kmemcg
> users, we would have a lot of slab caches, and just traversing the slab
> cache list could take some time. Isn't that a problem?

This may be a problem. Since a cache stays alive while it has at least
one active object, there may be throngs of dead caches on the list; in
fact, their number isn't even bounded by the number of memcgs. This can
slow down cache reaping and result in noticeable memory pressure. Also,
it will delay destruction of dead caches, making the situation even
worse. And we can't simply delete dead caches from the list, because
then they would never be reaped...

OTOH, if we disable per cpu arrays for dead caches, we won't have to
reap them and therefore can remove them from the slab_caches list. Then
the number of caches on the list will be bounded by the number of memcgs
multiplied by a constant. Although it may still be quite large, it will
at least be predictable: the more kmem-active memcgs you have, the more
memory you need, which sounds reasonable to me.
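
(The unhooking itself would be trivial; assuming the usual slab_mutex and
slab_caches conventions from mm/slab_common.c, it boils down to something
like the sketch below. The interesting part is guaranteeing that the cache
really no longer needs periodic reaping at that point.)

        /*
         * Sketch: once a dead cache needs no periodic reaping, drop it
         * from the global list so cache_reap() never looks at it again.
         */
        mutex_lock(&slab_mutex);
        list_del(&cachep->list);
        mutex_unlock(&slab_mutex);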

Regarding the slowdown introduced by disabling per cpu arrays, I guess
it shouldn't be critical, because, as dead caches are never allocated
from, the number of kfree's left after death is quite limited.

So, everything isn't that straightforward yet...

I think I'll try to simplify the patch that disables per cpu arrays for
dead caches and send implementations of both approaches with their pros
and cons outlined in the next iteration, so that we can compare them
side by side.

Thanks.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH -mm v2 5/8] slub: make slab_free non-preemptable
  2014-06-12  6:58     ` Joonsoo Kim
@ 2014-06-12 10:03       ` Vladimir Davydov
  -1 siblings, 0 replies; 60+ messages in thread
From: Vladimir Davydov @ 2014-06-12 10:03 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: akpm, cl, rientjes, penberg, hannes, mhocko, linux-kernel, linux-mm

Hi,

On Thu, Jun 12, 2014 at 03:58:42PM +0900, Joonsoo Kim wrote:
> On Fri, Jun 06, 2014 at 05:22:42PM +0400, Vladimir Davydov wrote:
> > @@ -2673,18 +2673,11 @@ static __always_inline void slab_free(struct kmem_cache *s,
> >  
> >  	slab_free_hook(s, x);
> >  
> > -redo:
> > -	/*
> > -	 * Determine the currently cpus per cpu slab.
> > -	 * The cpu may change afterward. However that does not matter since
> > -	 * data is retrieved via this pointer. If we are on the same cpu
> > -	 * during the cmpxchg then the free will succedd.
> > -	 */
> >  	preempt_disable();
> 
> Could you add a code comment explaining why this preempt_disable/enable()
> is needed? There is no clue in the code that kmemcg depends on it, so
> readers cannot understand why it is here.
>
> If possible, please add a similar comment to slab_alloc in mm/slab.c.

Sure.
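
Something along these lines, I suppose (just a draft of the wording, to be
settled in the next iteration):

        /*
         * We could be freeing an object of a dead memcg cache. Such a
         * cache is scheduled for destruction as soon as its last object
         * is freed, and the destruction path issues synchronize_sched()
         * to wait for frees still in flight. Keeping preemption disabled
         * across the whole free path is what makes that
         * synchronize_sched() a reliable barrier, so this section must
         * not be narrowed.
         */
        preempt_disable();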

Thanks.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH -mm v2 8/8] slab: make dead memcg caches discard free slabs immediately
  2014-06-12  6:53               ` Joonsoo Kim
@ 2014-06-13 16:34                 ` Christoph Lameter
  -1 siblings, 0 replies; 60+ messages in thread
From: Christoph Lameter @ 2014-06-13 16:34 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Vladimir Davydov, akpm, rientjes, penberg, hannes, mhocko,
	linux-kernel, linux-mm

On Thu, 12 Jun 2014, Joonsoo Kim wrote:

> BTW, I have a question about cache_reap(). If there are many kmemcg
> users, we would have a lot of slab caches, and just traversing the slab
> cache list could take some time. Isn't that a problem?

It's a big problem and one of the reasons that SLUB was developed. Cache
reaping caused noticeable random delays in processing, which significantly
impacted SGI's HPC loads.



^ permalink raw reply	[flat|nested] 60+ messages in thread

end of thread, other threads:[~2014-06-13 16:34 UTC | newest]

Thread overview: 60+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-06 13:22 [PATCH -mm v2 0/8] memcg/slab: reintroduce dead cache self-destruction Vladimir Davydov
2014-06-06 13:22 ` Vladimir Davydov
2014-06-06 13:22 ` [PATCH -mm v2 1/8] memcg: cleanup memcg_cache_params refcnt usage Vladimir Davydov
2014-06-06 13:22   ` Vladimir Davydov
2014-06-06 13:22 ` [PATCH -mm v2 2/8] memcg: destroy kmem caches when last slab is freed Vladimir Davydov
2014-06-06 13:22   ` Vladimir Davydov
2014-06-06 13:22 ` [PATCH -mm v2 3/8] memcg: mark caches that belong to offline memcgs as dead Vladimir Davydov
2014-06-06 13:22   ` Vladimir Davydov
2014-06-10  7:48   ` Joonsoo Kim
2014-06-10  7:48     ` Joonsoo Kim
2014-06-10 10:06     ` Vladimir Davydov
2014-06-10 10:06       ` Vladimir Davydov
2014-06-06 13:22 ` [PATCH -mm v2 4/8] slub: don't fail kmem_cache_shrink if slab placement optimization fails Vladimir Davydov
2014-06-06 13:22   ` Vladimir Davydov
2014-06-06 13:22 ` [PATCH -mm v2 5/8] slub: make slab_free non-preemptable Vladimir Davydov
2014-06-06 13:22   ` Vladimir Davydov
2014-06-06 14:46   ` Christoph Lameter
2014-06-06 14:46     ` Christoph Lameter
2014-06-09 12:52     ` Vladimir Davydov
2014-06-09 12:52       ` Vladimir Davydov
2014-06-09 13:52       ` Christoph Lameter
2014-06-09 13:52         ` Christoph Lameter
2014-06-12  6:58   ` Joonsoo Kim
2014-06-12  6:58     ` Joonsoo Kim
2014-06-12 10:03     ` Vladimir Davydov
2014-06-12 10:03       ` Vladimir Davydov
2014-06-06 13:22 ` [PATCH -mm v2 6/8] memcg: wait for kfree's to finish before destroying cache Vladimir Davydov
2014-06-06 13:22   ` Vladimir Davydov
2014-06-06 13:22 ` [PATCH -mm v2 7/8] slub: make dead memcg caches discard free slabs immediately Vladimir Davydov
2014-06-06 13:22   ` Vladimir Davydov
2014-06-06 14:48   ` Christoph Lameter
2014-06-06 14:48     ` Christoph Lameter
2014-06-10  8:09   ` Joonsoo Kim
2014-06-10  8:09     ` Joonsoo Kim
2014-06-10 10:09     ` Vladimir Davydov
2014-06-10 10:09       ` Vladimir Davydov
2014-06-06 13:22 ` [PATCH -mm v2 8/8] slab: " Vladimir Davydov
2014-06-06 13:22   ` Vladimir Davydov
2014-06-06 14:52   ` Christoph Lameter
2014-06-06 14:52     ` Christoph Lameter
2014-06-09 13:04     ` Vladimir Davydov
2014-06-09 13:04       ` Vladimir Davydov
2014-06-10  7:43   ` Joonsoo Kim
2014-06-10  7:43     ` Joonsoo Kim
2014-06-10 10:03     ` Vladimir Davydov
2014-06-10 10:03       ` Vladimir Davydov
2014-06-10 14:26       ` Christoph Lameter
2014-06-10 14:26         ` Christoph Lameter
2014-06-10 15:18         ` Vladimir Davydov
2014-06-10 15:18           ` Vladimir Davydov
2014-06-11  8:11           ` Joonsoo Kim
2014-06-11  8:11             ` Joonsoo Kim
2014-06-11 21:24           ` Vladimir Davydov
2014-06-11 21:24             ` Vladimir Davydov
2014-06-12  6:53             ` Joonsoo Kim
2014-06-12  6:53               ` Joonsoo Kim
2014-06-12 10:02               ` Vladimir Davydov
2014-06-12 10:02                 ` Vladimir Davydov
2014-06-13 16:34               ` Christoph Lameter
2014-06-13 16:34                 ` Christoph Lameter
