All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC] simple system for enable/disable slabs being tracked by memcg.
@ 2012-03-28 16:42 Glauber Costa
  2012-03-29  0:01 ` Suleiman Souhlal
  2012-03-29  0:03 ` KAMEZAWA Hiroyuki
  0 siblings, 2 replies; 6+ messages in thread
From: Glauber Costa @ 2012-03-28 16:42 UTC (permalink / raw)
  To: linux-mm
  Cc: Glauber Costa, Suleiman Souhlal, Hiroyouki Kamezawa,
	Johannes Weiner, Michal Hocko

Hi.

This is a proposal I've got for how to finally settle down the
question of which slabs should be tracked. The patch I am providing
is for discussion only, and should apply ontop of Suleiman's latest
version posted to the list.

The idea is to create a new file, memory.kmem.slabs_allowed.
I decided not to overload the slabinfo file for that, but I can,
if you ultimately want to. I just think it is cleaner this way.
As a small rationale, I'd like to somehow show which caches are
available but disabled. And yet, keep the format compatible with
/proc/slabinfo.

Reading from this file will provide this information
Writers should write a string:
 [+-]cache_name

The wild card * is accepted, but only that. I am leaving
any complex processing to userspace.

The * wildcard, though, is nice. It allows us to do:
 -* (disable all)
 +cache1
 +cache2

and so on.

Part of this patch is actually converting the slab pointers in memcg
to a complex memcg-specific structure that can hold a disabled pointer.

We could actually store it in a free bit in the address, but that is
a first version. Let me know if this is how you would like me to tackle
this.

With a system like this (either this, or something alike), my opposition
to Suleiman's idea of tracking everything under the sun basically vanishes,
since I can then selectively disable most of them.

I still prefer a special kmalloc call than a GFP flag, though.

Signed-off-by: Glauber Costa <glommer@parallels.com>
CC: Suleiman Souhlal <suleiman@google.com>
CC: Hiroyouki Kamezawa <kamezawa.hiroyu@jp.fujitsu.com>
CC: Johannes Weiner <hannes@cmpxchg.org>
CC: Michal Hocko <mhocko@suse.cz>
---
 include/linux/memcontrol.h |   17 ++++++++
 include/linux/slab.h       |   13 ++++++
 mm/memcontrol.c            |   87 ++++++++++++++++++++++++++++++++++----
 mm/slab.c                  |   99 ++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 207 insertions(+), 9 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index f5458b0..acd38a5 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -427,6 +427,9 @@ bool mem_cgroup_charge_slab(struct kmem_cache *cachep, gfp_t gfp, size_t size);
 void mem_cgroup_uncharge_slab(struct kmem_cache *cachep, size_t size);
 void mem_cgroup_flush_cache_create_queue(void);
 void mem_cgroup_remove_child_kmem_cache(struct kmem_cache *cachep, int id);
+int mem_cgroup_slab_allowed(struct mem_cgroup *memcg, int id);
+void mem_cgroup_slab_allow(struct mem_cgroup *memcg, int id);
+void mem_cgroup_slab_disallow(struct mem_cgroup *memcg, int id);
 #else /* CONFIG_CGROUP_MEM_RES_CTLR_KMEM */
 static inline void sock_update_memcg(struct sock *sk)
 {
@@ -456,6 +459,20 @@ static inline void
 mem_cgroup_flush_cache_create_queue(void)
 {
 }
+
+int mem_cgroup_slab_allowed(struct mem_cgroup *memcg, int id)
+{
+	return 0;
+}
+
+void mem_cgroup_slab_disallow(struct mem_cgroup *memcg, int id)
+{
+}
+
+void mem_cgroup_slab_allow(struct mem_cgroup *memcg, int id)
+{
+}
+
 #endif /* CONFIG_CGROUP_MEM_RES_CTLR_KMEM */
 #endif /* _LINUX_MEMCONTROL_H */
 
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 0ff5ee2..3106843 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -380,6 +380,8 @@ void kmem_cache_drop_ref(struct kmem_cache *cachep);
 
 void *kmalloc_no_account(size_t size, gfp_t flags);
 
+int mem_cgroup_tune_slab(struct mem_cgroup *mem, const char *buffer);
+int mem_cgroup_probe_slab(struct mem_cgroup *mem, struct seq_file *m);
 #else /* !CONFIG_CGROUP_MEM_RES_CTLR_KMEM || !CONFIG_SLAB */
 
 #define MAX_KMEM_CACHE_TYPES 0
@@ -407,6 +409,17 @@ mem_cgroup_slabinfo(struct mem_cgroup *mem, struct seq_file *m)
 	return 0;
 }
 
+static inline int mem_cgroup_tune_slab(struct mem_cgroup *mem, const char *buffer)
+{
+	return 0;
+}
+
+static inline int mem_cgroup_probe_slab(struct mem_cgroup *mem, const char *buffer)
+{
+	return 0;
+}
+
+
 #endif /* CONFIG_CGROUP_MEM_RES_CTLR_KMEM && CONFIG_SLAB */
 
 #endif	/* _LINUX_SLAB_H */
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ba042d9..e8c6a92 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -226,6 +226,11 @@ enum memcg_flags {
 					 */
 };
 
+struct memcg_slab {
+	struct kmem_cache *cache;
+	bool disabled;
+};
+
 /*
  * The memory controller data structure. The memory controller controls both
  * page cache and RSS per cgroup. We would eventually like to provide
@@ -305,7 +310,7 @@ struct mem_cgroup {
 
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_KMEM
 	/* Slab accounting */
-	struct kmem_cache *slabs[MAX_KMEM_CACHE_TYPES];
+	struct memcg_slab slabs[MAX_KMEM_CACHE_TYPES];
 #endif
 };
 
@@ -4671,6 +4676,21 @@ static int mem_cgroup_independent_kmem_limit_write(struct cgroup *cgrp,
 	return 0;
 }
 
+int mem_cgroup_slab_allowed(struct mem_cgroup *memcg, int idx)
+{
+	return !memcg->slabs[idx].disabled;
+}
+
+void mem_cgroup_slab_allow(struct mem_cgroup *memcg, int idx)
+{
+	memcg->slabs[idx].disabled = false;
+}
+
+void mem_cgroup_slab_disallow(struct mem_cgroup *memcg, int idx)
+{
+	memcg->slabs[idx].disabled = true;
+}
+
 static int
 mem_cgroup_slabinfo_show(struct cgroup *cgroup, struct cftype *ctf,
     struct seq_file *m)
@@ -4685,6 +4705,35 @@ mem_cgroup_slabinfo_show(struct cgroup *cgroup, struct cftype *ctf,
 	return mem_cgroup_slabinfo(mem, m);
 }
 
+static int mem_cgroup_slabs_read(struct cgroup *cgroup, struct cftype *ctf,
+				 struct seq_file *m)
+{
+	struct mem_cgroup *mem;
+
+	mem  = mem_cgroup_from_cont(cgroup);
+
+	if (mem == root_mem_cgroup)
+		return -EINVAL;
+
+	if (!list_empty(&cgroup->children))
+		return -EBUSY;
+
+	return mem_cgroup_probe_slab(mem, m);
+}
+
+static int mem_cgroup_slabs_write(struct cgroup *cgroup, struct cftype *cft,
+				  const char *buffer)
+{
+	struct mem_cgroup *mem;
+
+	mem  = mem_cgroup_from_cont(cgroup);
+
+	if (mem == root_mem_cgroup)
+		return -EINVAL;
+
+	return mem_cgroup_tune_slab(mem, buffer);
+}
+
 static struct cftype kmem_cgroup_files[] = {
 	{
 		.name = "kmem.independent_kmem_limit",
@@ -4706,6 +4755,12 @@ static struct cftype kmem_cgroup_files[] = {
 		.name = "kmem.slabinfo",
 		.read_seq_string = mem_cgroup_slabinfo_show,
 	},
+	{
+		.name = "kmem.slabs_allowed",
+		.read_seq_string = mem_cgroup_slabs_read,
+		.write_string = mem_cgroup_slabs_write,
+	},
+
 };
 
 static int register_kmem_files(struct cgroup *cont, struct cgroup_subsys *ss)
@@ -5765,7 +5820,7 @@ memcg_create_kmem_cache(struct mem_cgroup *memcg, struct kmem_cache *cachep)
 	 * This should behave as a write barrier, so we should be fine
 	 * with RCU.
 	 */
-	if (cmpxchg(&memcg->slabs[idx], NULL, new_cachep) != NULL) {
+	if (cmpxchg(&memcg->slabs[idx].cache, NULL, new_cachep) != NULL) {
 		kmem_cache_destroy(new_cachep);
 		return cachep;
 	}
@@ -5838,6 +5893,7 @@ memcg_create_cache_enqueue(struct mem_cgroup *memcg, struct kmem_cache *cachep)
 {
 	struct create_work *cw;
 	unsigned long flags;
+	int idx;
 
 	spin_lock_irqsave(&create_queue_lock, flags);
 	list_for_each_entry(cw, &create_queue, list) {
@@ -5848,6 +5904,14 @@ memcg_create_cache_enqueue(struct mem_cgroup *memcg, struct kmem_cache *cachep)
 	}
 	spin_unlock_irqrestore(&create_queue_lock, flags);
 
+	/*
+	 * If this cache is disabled, it basically means we are doing
+	 * global accounting for that particular cache. So skip it
+	 */
+	idx = cachep->memcg_params.id;
+	if (memcg->slabs[idx].disabled)
+		return;
+
 	/* The corresponding put will be done in the workqueue. */
 	if (!css_tryget(&memcg->css))
 		return;
@@ -5912,18 +5976,18 @@ mem_cgroup_get_kmem_cache(struct kmem_cache *cachep, gfp_t gfp)
 
 	VM_BUG_ON(idx == -1);
 
-	if (rcu_access_pointer(memcg->slabs[idx]) == NULL) {
+	if (rcu_access_pointer(memcg->slabs[idx].cache) == NULL) {
 		memcg_create_cache_enqueue(memcg, cachep);
 		return cachep;
 	}
 
-	return rcu_dereference(memcg->slabs[idx]);
+	return rcu_dereference(memcg->slabs[idx].cache);
 }
 
 void
 mem_cgroup_remove_child_kmem_cache(struct kmem_cache *cachep, int id)
 {
-	rcu_assign_pointer(cachep->memcg_params.memcg->slabs[id], NULL);
+	rcu_assign_pointer(cachep->memcg_params.memcg->slabs[id].cache, NULL);
 }
 
 bool
@@ -5966,10 +6030,15 @@ mem_cgroup_uncharge_slab(struct kmem_cache *cachep, size_t size)
 static void
 memcg_slab_init(struct mem_cgroup *memcg)
 {
+	struct mem_cgroup *parent = parent_mem_cgroup(memcg);
 	int i;
 
-	for (i = 0; i < MAX_KMEM_CACHE_TYPES; i++)
-		rcu_assign_pointer(memcg->slabs[i], NULL);
+	for (i = 0; i < MAX_KMEM_CACHE_TYPES; i++) {
+		rcu_assign_pointer(memcg->slabs[i].cache, NULL);
+		if (parent)
+			memcg->slabs[i].disabled =
+			parent->slabs[i].disabled;
+	}
 }
 
 /*
@@ -5988,9 +6057,9 @@ memcg_slab_move(struct mem_cgroup *memcg)
 	mem_cgroup_flush_cache_create_queue();
 
 	for (i = 0; i < MAX_KMEM_CACHE_TYPES; i++) {
-		cachep = rcu_access_pointer(memcg->slabs[i]);
+		cachep = rcu_access_pointer(memcg->slabs[i].cache);
 		if (cachep != NULL) {
-			rcu_assign_pointer(memcg->slabs[i], NULL);
+			rcu_assign_pointer(memcg->slabs[i].cache, NULL);
 			cachep->memcg_params.memcg = NULL;
 
 			/* The space for this is already allocated */
diff --git a/mm/slab.c b/mm/slab.c
index 1b35799..1bf13f1 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -4766,6 +4766,105 @@ mem_cgroup_slabinfo(struct mem_cgroup *mem, struct seq_file *m)
 
 	return 0;
 }
+
+int mem_cgroup_probe_slab(struct mem_cgroup *memcg, struct seq_file *m)
+{
+	struct kmem_cache *cachep;
+
+	seq_printf(m, "Available slabs:\n");
+
+	mutex_lock(&cache_chain_mutex);
+	list_for_each_entry(cachep, &cache_chain, next) {
+		bool allowed;
+		if (cachep->memcg_params.id == -1)
+			continue;
+
+		if (!(cachep->flags & SLAB_MEMCG_ACCT))
+			continue;
+
+		allowed = !mem_cgroup_slab_allowed(memcg, cachep->memcg_params.id);
+
+		seq_printf(m, "%c %-17s\n", allowed ? '*' : ' ', cachep->name);
+	}
+	mutex_unlock(&cache_chain_mutex);
+
+	return 0;
+}
+
+/*
+ * selects which slabs are tracked in this memcg, from the pool of available
+ * slabs.
+ *
+ * Not worth to implement a full regex parser. Pre-processing can be done in
+ * userspace if needed. It helps, however, to at least have a * wildcard for
+ * groups of cache, like size-*.
+ *
+ * The first character is either a + or a -, meaning either add or remove
+ * a particular cache from the list of tracked caches.
+ */
+int mem_cgroup_tune_slab(struct mem_cgroup *mem, const char *buffer)
+{
+	struct kmem_cache *cachep;
+	int op;
+	int ret = -EINVAL;
+
+	if (!buffer)
+		return ret;
+
+	if (*buffer == '+' )
+		op = 1;
+	else if (*buffer == '-')
+		op = 0;
+	else
+		return ret;
+	
+	buffer++;
+
+	mutex_lock(&cache_chain_mutex);
+	list_for_each_entry(cachep, &cache_chain, next) {
+		const char *cname = cachep->name;
+		const char *ptr = buffer;
+		const char *next = NULL;
+
+		if (cachep->memcg_params.id == -1)
+			continue;
+
+		while (*ptr && *cname) {
+			if (*ptr == '*') {
+				if (!next) {
+					next = ptr;
+					next++;
+				} 
+				if (*next == *cname) {
+					next = NULL;
+					ptr++;	
+					continue;
+				}
+				cname++;
+				if (!*cname)
+					ptr++;
+				continue;
+			} else if (*ptr != *cname)
+				break;
+			ptr++;
+			cname++;
+		}
+		if (*cname || *ptr)
+			continue;
+		ret = 0;
+
+		if (op == 0)
+			mem_cgroup_slab_disallow(mem, cachep->memcg_params.id);
+		else if (op == 1)
+			mem_cgroup_slab_allow(mem, cachep->memcg_params.id);
+		else
+			BUG();
+	}
+
+	mutex_unlock(&cache_chain_mutex);
+	return ret;
+}
+
 #endif /* CONFIG_CGROUP_MEM_RES_CTLR_KMEM */
 
 #ifdef CONFIG_DEBUG_SLAB_LEAK
-- 
1.7.4.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [RFC] simple system for enable/disable slabs being tracked by memcg.
  2012-03-28 16:42 [RFC] simple system for enable/disable slabs being tracked by memcg Glauber Costa
@ 2012-03-29  0:01 ` Suleiman Souhlal
  2012-03-29 11:10   ` Glauber Costa
  2012-03-29  0:03 ` KAMEZAWA Hiroyuki
  1 sibling, 1 reply; 6+ messages in thread
From: Suleiman Souhlal @ 2012-03-29  0:01 UTC (permalink / raw)
  To: Glauber Costa; +Cc: linux-mm, Hiroyouki Kamezawa, Johannes Weiner, Michal Hocko

Hi Glauber,

On Wed, Mar 28, 2012 at 9:42 AM, Glauber Costa <glommer@parallels.com> wrote:
> Hi.
>
> This is a proposal I've got for how to finally settle down the
> question of which slabs should be tracked. The patch I am providing
> is for discussion only, and should apply ontop of Suleiman's latest
> version posted to the list.
>
> The idea is to create a new file, memory.kmem.slabs_allowed.
> I decided not to overload the slabinfo file for that, but I can,
> if you ultimately want to. I just think it is cleaner this way.
> As a small rationale, I'd like to somehow show which caches are
> available but disabled. And yet, keep the format compatible with
> /proc/slabinfo.
>
> Reading from this file will provide this information
> Writers should write a string:
>  [+-]cache_name
>
> The wild card * is accepted, but only that. I am leaving
> any complex processing to userspace.
>
> The * wildcard, though, is nice. It allows us to do:
>  -* (disable all)
>  +cache1
>  +cache2
>
> and so on.
>
> Part of this patch is actually converting the slab pointers in memcg
> to a complex memcg-specific structure that can hold a disabled pointer.
>
> We could actually store it in a free bit in the address, but that is
> a first version. Let me know if this is how you would like me to tackle
> this.
>
> With a system like this (either this, or something alike), my opposition
> to Suleiman's idea of tracking everything under the sun basically vanishes,
> since I can then selectively disable most of them.
>
> I still prefer a special kmalloc call than a GFP flag, though.

How would something like this interact with slab types that will have
a per-memcg shrinker?
Only do memcg shrinking for a slab type if it's not disabled?

While I like the idea of making it configurable by the user, I wonder
if we should be adding even more complexity to an already large
patchset, at this point.
I am also afraid that we might make this too hard setup correctly and use.

If it's ok, I'd prefer to keep going with a slab flag being passed to
kmem_cache_create, to determine if a slab type should be accounted or
not (opt-in), for now.

-- Suleiman

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC] simple system for enable/disable slabs being tracked by memcg.
  2012-03-28 16:42 [RFC] simple system for enable/disable slabs being tracked by memcg Glauber Costa
  2012-03-29  0:01 ` Suleiman Souhlal
@ 2012-03-29  0:03 ` KAMEZAWA Hiroyuki
  2012-03-29 11:22   ` Glauber Costa
  2012-03-29 11:51   ` Glauber Costa
  1 sibling, 2 replies; 6+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-29  0:03 UTC (permalink / raw)
  To: Glauber Costa; +Cc: linux-mm, Suleiman Souhlal, Johannes Weiner, Michal Hocko

(2012/03/29 1:42), Glauber Costa wrote:

> Hi.
> 
> This is a proposal I've got for how to finally settle down the
> question of which slabs should be tracked. The patch I am providing
> is for discussion only, and should apply ontop of Suleiman's latest
> version posted to the list.
> 
> The idea is to create a new file, memory.kmem.slabs_allowed.
> I decided not to overload the slabinfo file for that, but I can,
> if you ultimately want to. I just think it is cleaner this way.
> As a small rationale, I'd like to somehow show which caches are
> available but disabled. And yet, keep the format compatible with
> /proc/slabinfo.
> 
> Reading from this file will provide this information
> Writers should write a string:
>  [+-]cache_name
> 
> The wild card * is accepted, but only that. I am leaving
> any complex processing to userspace.
> 
> The * wildcard, though, is nice. It allows us to do:
>  -* (disable all)
>  +cache1
>  +cache2
> 

I like to pass a word 'all' explicitly rather than wildcard..

Hmm, but having private format of list is good ?
In another idea, how about having 3 files as device cgroup ?

	memory.kmem.slabs.allow   (similar to device.allow)
	memory.kmem.slabs.deny    (similar to device.deny)
	memory.kmem.slabs.list	    (similar to device.list)

BTW, when a slab which is accounted is changed to be unaccounted,
res_counter.usage will decrease properly ?

small comments in below.


> and so on.
> 
> Part of this patch is actually converting the slab pointers in memcg
> to a complex memcg-specific structure that can hold a disabled pointer.
> 
> We could actually store it in a free bit in the address, but that is
> a first version. Let me know if this is how you would like me to tackle
> this.
> 
> With a system like this (either this, or something alike), my opposition
> to Suleiman's idea of tracking everything under the sun basically vanishes,
> since I can then selectively disable most of them.
> 
> I still prefer a special kmalloc call than a GFP flag, though.
> 
> Signed-off-by: Glauber Costa <glommer@parallels.com>
> CC: Suleiman Souhlal <suleiman@google.com>
> CC: Hiroyouki Kamezawa <kamezawa.hiroyu@jp.fujitsu.com>
> CC: Johannes Weiner <hannes@cmpxchg.org>
> CC: Michal Hocko <mhocko@suse.cz>
> ---
>  include/linux/memcontrol.h |   17 ++++++++
>  include/linux/slab.h       |   13 ++++++
>  mm/memcontrol.c            |   87 ++++++++++++++++++++++++++++++++++----
>  mm/slab.c                  |   99 ++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 207 insertions(+), 9 deletions(-)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index f5458b0..acd38a5 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -427,6 +427,9 @@ bool mem_cgroup_charge_slab(struct kmem_cache *cachep, gfp_t gfp, size_t size);
>  void mem_cgroup_uncharge_slab(struct kmem_cache *cachep, size_t size);
>  void mem_cgroup_flush_cache_create_queue(void);
>  void mem_cgroup_remove_child_kmem_cache(struct kmem_cache *cachep, int id);
> +int mem_cgroup_slab_allowed(struct mem_cgroup *memcg, int id);
> +void mem_cgroup_slab_allow(struct mem_cgroup *memcg, int id);
> +void mem_cgroup_slab_disallow(struct mem_cgroup *memcg, int id);
>  #else /* CONFIG_CGROUP_MEM_RES_CTLR_KMEM */
>  static inline void sock_update_memcg(struct sock *sk)
>  {
> @@ -456,6 +459,20 @@ static inline void
>  mem_cgroup_flush_cache_create_queue(void)
>  {
>  }
> +
> +int mem_cgroup_slab_allowed(struct mem_cgroup *memcg, int id)
> +{
> +	return 0;
> +}
> +
> +void mem_cgroup_slab_disallow(struct mem_cgroup *memcg, int id)
> +{
> +}
> +
> +void mem_cgroup_slab_allow(struct mem_cgroup *memcg, int id)
> +{
> +}
> +
>  #endif /* CONFIG_CGROUP_MEM_RES_CTLR_KMEM */
>  #endif /* _LINUX_MEMCONTROL_H */
>  
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 0ff5ee2..3106843 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -380,6 +380,8 @@ void kmem_cache_drop_ref(struct kmem_cache *cachep);
>  
>  void *kmalloc_no_account(size_t size, gfp_t flags);
>  
> +int mem_cgroup_tune_slab(struct mem_cgroup *mem, const char *buffer);
> +int mem_cgroup_probe_slab(struct mem_cgroup *mem, struct seq_file *m);
>  #else /* !CONFIG_CGROUP_MEM_RES_CTLR_KMEM || !CONFIG_SLAB */
>  
>  #define MAX_KMEM_CACHE_TYPES 0
> @@ -407,6 +409,17 @@ mem_cgroup_slabinfo(struct mem_cgroup *mem, struct seq_file *m)
>  	return 0;
>  }
>  
> +static inline int mem_cgroup_tune_slab(struct mem_cgroup *mem, const char *buffer)
> +{
> +	return 0;
> +}
> +
> +static inline int mem_cgroup_probe_slab(struct mem_cgroup *mem, const char *buffer)
> +{
> +	return 0;
> +}
> +
> +
>  #endif /* CONFIG_CGROUP_MEM_RES_CTLR_KMEM && CONFIG_SLAB */
>  
>  #endif	/* _LINUX_SLAB_H */
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index ba042d9..e8c6a92 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -226,6 +226,11 @@ enum memcg_flags {
>  					 */
>  };
>  
> +struct memcg_slab {
> +	struct kmem_cache *cache;
> +	bool disabled;
> +};
> +
>  /*
>   * The memory controller data structure. The memory controller controls both
>   * page cache and RSS per cgroup. We would eventually like to provide
> @@ -305,7 +310,7 @@ struct mem_cgroup {
>  
>  #ifdef CONFIG_CGROUP_MEM_RES_CTLR_KMEM
>  	/* Slab accounting */
> -	struct kmem_cache *slabs[MAX_KMEM_CACHE_TYPES];
> +	struct memcg_slab slabs[MAX_KMEM_CACHE_TYPES];
>  #endif
>  };
>  
> @@ -4671,6 +4676,21 @@ static int mem_cgroup_independent_kmem_limit_write(struct cgroup *cgrp,
>  	return 0;
>  }
>  
> +int mem_cgroup_slab_allowed(struct mem_cgroup *memcg, int idx)
> +{
> +	return !memcg->slabs[idx].disabled;
> +}
> +
> +void mem_cgroup_slab_allow(struct mem_cgroup *memcg, int idx)
> +{
> +	memcg->slabs[idx].disabled = false;
> +}
> +
> +void mem_cgroup_slab_disallow(struct mem_cgroup *memcg, int idx)
> +{
> +	memcg->slabs[idx].disabled = true;
> +}
> +


>  static int
>  mem_cgroup_slabinfo_show(struct cgroup *cgroup, struct cftype *ctf,
>      struct seq_file *m)
> @@ -4685,6 +4705,35 @@ mem_cgroup_slabinfo_show(struct cgroup *cgroup, struct cftype *ctf,
>  	return mem_cgroup_slabinfo(mem, m);
>  }
>  
> +static int mem_cgroup_slabs_read(struct cgroup *cgroup, struct cftype *ctf,
> +				 struct seq_file *m)
> +{
> +	struct mem_cgroup *mem;
> +
> +	mem  = mem_cgroup_from_cont(cgroup);
> +
> +	if (mem == root_mem_cgroup)
> +		return -EINVAL;
> +
> +	if (!list_empty(&cgroup->children))
> +		return -EBUSY;
> +
> +	return mem_cgroup_probe_slab(mem, m);
> +}
> +
> +static int mem_cgroup_slabs_write(struct cgroup *cgroup, struct cftype *cft,
> +				  const char *buffer)
> +{
> +	struct mem_cgroup *mem;
> +
> +	mem  = mem_cgroup_from_cont(cgroup);
> +
> +	if (mem == root_mem_cgroup)
> +		return -EINVAL;
> +
> +	return mem_cgroup_tune_slab(mem, buffer);
> +}
> +
>  static struct cftype kmem_cgroup_files[] = {
>  	{
>  		.name = "kmem.independent_kmem_limit",
> @@ -4706,6 +4755,12 @@ static struct cftype kmem_cgroup_files[] = {
>  		.name = "kmem.slabinfo",
>  		.read_seq_string = mem_cgroup_slabinfo_show,
>  	},
> +	{
> +		.name = "kmem.slabs_allowed",
> +		.read_seq_string = mem_cgroup_slabs_read,
> +		.write_string = mem_cgroup_slabs_write,
> +	},
> +
>  };
>  
>  static int register_kmem_files(struct cgroup *cont, struct cgroup_subsys *ss)
> @@ -5765,7 +5820,7 @@ memcg_create_kmem_cache(struct mem_cgroup *memcg, struct kmem_cache *cachep)
>  	 * This should behave as a write barrier, so we should be fine
>  	 * with RCU.
>  	 */
> -	if (cmpxchg(&memcg->slabs[idx], NULL, new_cachep) != NULL) {
> +	if (cmpxchg(&memcg->slabs[idx].cache, NULL, new_cachep) != NULL) {
>  		kmem_cache_destroy(new_cachep);
>  		return cachep;
>  	}


I'm sorry if I misunderstand.... can we use cmpxchg in generic code of the kernel ?
We need to put this under #if defined(__HAVE_ARCH_CMPXCHG) ?


Thanks,
-Kame

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC] simple system for enable/disable slabs being tracked by memcg.
  2012-03-29  0:01 ` Suleiman Souhlal
@ 2012-03-29 11:10   ` Glauber Costa
  0 siblings, 0 replies; 6+ messages in thread
From: Glauber Costa @ 2012-03-29 11:10 UTC (permalink / raw)
  To: Suleiman Souhlal
  Cc: linux-mm, Hiroyouki Kamezawa, Johannes Weiner, Michal Hocko

On 03/29/2012 02:01 AM, Suleiman Souhlal wrote:
> Hi Glauber,
>
> On Wed, Mar 28, 2012 at 9:42 AM, Glauber Costa<glommer@parallels.com>  wrote:
>> Hi.
>>
>> This is a proposal I've got for how to finally settle down the
>> question of which slabs should be tracked. The patch I am providing
>> is for discussion only, and should apply ontop of Suleiman's latest
>> version posted to the list.
>>
>> The idea is to create a new file, memory.kmem.slabs_allowed.
>> I decided not to overload the slabinfo file for that, but I can,
>> if you ultimately want to. I just think it is cleaner this way.
>> As a small rationale, I'd like to somehow show which caches are
>> available but disabled. And yet, keep the format compatible with
>> /proc/slabinfo.
>>
>> Reading from this file will provide this information
>> Writers should write a string:
>>   [+-]cache_name
>>
>> The wild card * is accepted, but only that. I am leaving
>> any complex processing to userspace.
>>
>> The * wildcard, though, is nice. It allows us to do:
>>   -* (disable all)
>>   +cache1
>>   +cache2
>>
>> and so on.
>>
>> Part of this patch is actually converting the slab pointers in memcg
>> to a complex memcg-specific structure that can hold a disabled pointer.
>>
>> We could actually store it in a free bit in the address, but that is
>> a first version. Let me know if this is how you would like me to tackle
>> this.
>>
>> With a system like this (either this, or something alike), my opposition
>> to Suleiman's idea of tracking everything under the sun basically vanishes,
>> since I can then selectively disable most of them.
>>
>> I still prefer a special kmalloc call than a GFP flag, though.
>
> How would something like this interact with slab types that will have
> a per-memcg shrinker?
> Only do memcg shrinking for a slab type if it's not disabled?

The idea is that if the slab type is disabled, it should not even be 
created.

I actually plan to include some tests to disallow disabling slabs after 
either they are created already, or we have tasks in the cgroup.

Regardless of the path, this is only sane if it is a setup-like thing 
much like it is for use_hierarchy today.

Enabling a disabled cache, though, can always be fine...

So if the cache was never created, there is nothing to worry about wrt 
shrinkers.

> While I like the idea of making it configurable by the user, I wonder
> if we should be adding even more complexity to an already large
> patchset, at this point.
I don't see a reason not to, specially because it is pretty 
self-contained. Moreover, one of the reasons I moved forward with this, 
is that I believe the kind of interfaces we'll have at hand can 
interfere in some design decisions.

Now, whether or not we should *merge* them all at the same time, is a 
different story. We do can phase it if needed.

> I am also afraid that we might make this too hard setup correctly and use.

Can you please try to make this a bit more sound? At this point, I think 
that getting the interface right is more important than the 
implementation, so your concerns would be a very nice outcome of this 
discussion.

> If it's ok, I'd prefer to keep going with a slab flag being passed to
> kmem_cache_create, to determine if a slab type should be accounted or
> not (opt-in), for now.
See, now that's something I don't agree with.

It's too important of a change for us to keep flipping. So I believe if
everybody agrees with a general interface for disabling/enabling the 
caches, we should design with that in mind (even if we phase the 
merging), and stick to it.

What works best for your use case, I'll leave to you: if it is tracking 
everything, or have a flag for the tracked ones.

But let's make the decision based on how it should look like, not as an 
intermediate step.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC] simple system for enable/disable slabs being tracked by memcg.
  2012-03-29  0:03 ` KAMEZAWA Hiroyuki
@ 2012-03-29 11:22   ` Glauber Costa
  2012-03-29 11:51   ` Glauber Costa
  1 sibling, 0 replies; 6+ messages in thread
From: Glauber Costa @ 2012-03-29 11:22 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, Suleiman Souhlal, Johannes Weiner, Michal Hocko

On 03/29/2012 02:03 AM, KAMEZAWA Hiroyuki wrote:
> (2012/03/29 1:42), Glauber Costa wrote:
> 
>> Hi.
>>
>> This is a proposal I've got for how to finally settle down the
>> question of which slabs should be tracked. The patch I am providing
>> is for discussion only, and should apply ontop of Suleiman's latest
>> version posted to the list.
>>
>> The idea is to create a new file, memory.kmem.slabs_allowed.
>> I decided not to overload the slabinfo file for that, but I can,
>> if you ultimately want to. I just think it is cleaner this way.
>> As a small rationale, I'd like to somehow show which caches are
>> available but disabled. And yet, keep the format compatible with
>> /proc/slabinfo.
>>
>> Reading from this file will provide this information
>> Writers should write a string:
>>   [+-]cache_name
>>
>> The wild card * is accepted, but only that. I am leaving
>> any complex processing to userspace.
>>
>> The * wildcard, though, is nice. It allows us to do:
>>   -* (disable all)
>>   +cache1
>>   +cache2
>>
> 
> I like to pass a word 'all' explicitly rather than wildcard..
I don't for a very simple reason:

We don't have a cache called "all", but one could very well come up with
one in a driver.
And then we have problems.

Now, it is a lot less likely that someone will ever create a cache
called "*"

Also, the wildcard allows us to do things like size-*. But that it is
not terribly important,
since I am leaving all the heavy processing for userspace, though. It is
convenient and I
like it, but if you really oppose, we can have * to mean all, but not
allow the * symbol to be
used as a wildcard in the middle of a string.

We could, of course, not even have it, and simply rely on userspace to
read all
caches from the available list and move them to the deny list.

But I think the "disable all" and "enable all" are somewhat special.
There can
be races associated with caches appearing in the middle of those two
operations,
and this semantics avoids them.

That actually applies to the use of * as a wildcard as well. The only
sane way to say:
"Enable/Disable" all size-type caches, is to have it done atomically.

> 
> Hmm, but having private format of list is good ?
> In another idea, how about having 3 files as device cgroup ?
> 
> 	memory.kmem.slabs.allow   (similar to device.allow)
> 	memory.kmem.slabs.deny    (similar to device.deny)
> 	memory.kmem.slabs.list	    (similar to device.list)
> 
> BTW, when a slab which is accounted is changed to be unaccounted,
> res_counter.usage will decrease properly ?
> 
> small comments in below.
> 
> 

>> -	if (cmpxchg(&memcg->slabs[idx], NULL, new_cachep) != NULL) {
>> +	if (cmpxchg(&memcg->slabs[idx].cache, NULL, new_cachep) != NULL) {
>>   		kmem_cache_destroy(new_cachep);
>>   		return cachep;
>>   	}
> 
> 
> I'm sorry if I misunderstand.... can we use cmpxchg in generic code of the kernel ?
> We need to put this under #if defined(__HAVE_ARCH_CMPXCHG) ?
> 
> 

The cmpxchg was already there. I am just patching the cache access.
Now for the question itself, If I'm not mistaken this macro should be
safe. xchg() seems
to be used quite extensively, and I would expect a emulation function to
be provided for arches
lacking cmpxchg ?

Still, is better to have Suleiman to confirm this.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC] simple system for enable/disable slabs being tracked by memcg.
  2012-03-29  0:03 ` KAMEZAWA Hiroyuki
  2012-03-29 11:22   ` Glauber Costa
@ 2012-03-29 11:51   ` Glauber Costa
  1 sibling, 0 replies; 6+ messages in thread
From: Glauber Costa @ 2012-03-29 11:51 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, Suleiman Souhlal, Johannes Weiner, Michal Hocko

On 03/29/2012 02:03 AM, KAMEZAWA Hiroyuki wrote:
> Hmm, but having private format of list is good ?
> In another idea, how about having 3 files as device cgroup ?
> 
> 	memory.kmem.slabs.allow   (similar to device.allow)
> 	memory.kmem.slabs.deny    (similar to device.deny)
> 	memory.kmem.slabs.list	    (similar to device.list)
> 
I forgot to comment on this point.

I fear #files explosion, but since one the drawbacks of our beloved cgroups
IMHO is the sheer inconsistency among them (which we're trying to fix), if
there is precedence for that, we can use it. Certainly solves the problem.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2012-03-29 11:52 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-03-28 16:42 [RFC] simple system for enable/disable slabs being tracked by memcg Glauber Costa
2012-03-29  0:01 ` Suleiman Souhlal
2012-03-29 11:10   ` Glauber Costa
2012-03-29  0:03 ` KAMEZAWA Hiroyuki
2012-03-29 11:22   ` Glauber Costa
2012-03-29 11:51   ` Glauber Costa

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.