* [PATCH] memcg: towards I/O aware memcg v5
@ 2010-08-20  9:55 ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 87+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-08-20  9:55 UTC (permalink / raw)
  To: linux-mm
  Cc: nishimura, balbir, gthelen, m-ikeda, akpm, linux-kernel,
	kamezawa.hiroyuki

This is v5.

Sorry for the delay... I took some time to reset myself, and several
changes have been added. I think this version is simpler than v4.

Major changes from v4 are:
 a) added kernel/cgroup.c hooks again (needed for b).
 b) made the code RCU-aware; the previous version seemed dangerous in an
    extreme case.

The code has been updated accordingly; most of the changes are related
to RCU.

Patch brief view (a sketch of the core idea follows the list):
 1. add hooks to kernel/cgroup.c for ID management.
 2. use an ID-indexed array in memcg.
 3. record the ID in page_cgroup rather than a pointer.
 4. make update_file_mapped an RCU-aware routine instead of using a
    spinlock.
 5. make update_file_mapped a general-purpose function.
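
For reference, the core idea of the series in a minimal sketch (names
as introduced by the patches below):

	/* ID -> memcg map, published with rcu_assign_pointer() */
	static struct mem_cgroup *mem_cgroups[NR_MEMCG_GROUPS] __read_mostly;

	/* page_cgroup stores a 2-byte ID instead of an 8-byte pointer */
	rcu_read_lock();
	mem = id_to_memcg(pc->mem_cgroup);	/* O(1) array index */
	rcu_read_unlock();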


Thanks,
-Kame



* [PATCH 1/5] cgroup: ID notification call back
  2010-08-20  9:55 ` KAMEZAWA Hiroyuki
@ 2010-08-20  9:58   ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 87+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-08-20  9:58 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, nishimura, balbir, gthelen, m-ikeda, akpm,
	linux-kernel, kamezawa.hiroyuki, menage, lizf

CC'ed to Paul Menage and Li Zefan.
==
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

When a cgroup subsystem uses IDs (ss->use_id == 1), each css's ID is
assigned after a successful call of ->create(). The css_id is tightly
coupled with the css struct itself, but the css is allocated by the
->create() call, IOW, by a per-subsystem special allocation.

To let a subsystem learn its css_id (it is not yet assigned when
->create() runs), this patch adds an id_attached() callback, invoked
right after css_ID allocation. This will be used by the memory cgroup's
quick-lookup routine.

Other implementations are possible, such as
	- passing the ID to ->create(), or
	- adding a post_create() callback,
	etc.
But considering how dirty the code would get, this straightforward patch
seems good to me. If someone wants post_create(), this patch can be
replaced.
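
As an illustration, a subsystem using css IDs (ss->use_id == 1) could
hook the new callback like this (a minimal sketch; the foo_* names are
hypothetical):

	static void foo_id_attached(struct cgroup_subsys *ss,
				    struct cgroup *cgrp)
	{
		struct cgroup_subsys_state *css = cgrp->subsys[ss->subsys_id];

		/* css->id is valid here; cache css_id(css) for fast lookup */
		foo_register_id(css_id(css), css);
	}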

Changelog: 20100820
 - new approach.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 Documentation/cgroups/cgroups.txt |   10 ++++++++++
 include/linux/cgroup.h            |    1 +
 kernel/cgroup.c                   |    5 ++++-
 3 files changed, 15 insertions(+), 1 deletion(-)

Index: mmotm-0811/Documentation/cgroups/cgroups.txt
===================================================================
--- mmotm-0811.orig/Documentation/cgroups/cgroups.txt
+++ mmotm-0811/Documentation/cgroups/cgroups.txt
@@ -621,6 +621,16 @@ and root cgroup. Currently this will onl
 the default hierarchy (which never has sub-cgroups) and a hierarchy
 that is being created/destroyed (and hence has no sub-cgroups).
 
+void id_attached(struct cgroup_subsys *ss, struct cgroup *root)
+(cgroup_mutex and ss->hierarchy_mutex held by caller)
+(called only when ss->use_id=1)
+
+Called when a css_id has been attached to a css. Because the css_id is
+assigned to the "css", the css_id is not available until ->create()
+has been called. If a subsystem wants to make use of the ID at creation
+time, use this handler; it is called right after the css_id is assigned
+to the css. It need not be implemented in general (see memcontrol.c).
+
 4. Questions
 ============
 
Index: mmotm-0811/include/linux/cgroup.h
===================================================================
--- mmotm-0811.orig/include/linux/cgroup.h
+++ mmotm-0811/include/linux/cgroup.h
@@ -475,6 +475,7 @@ struct cgroup_subsys {
 			struct cgroup *cgrp);
 	void (*post_clone)(struct cgroup_subsys *ss, struct cgroup *cgrp);
 	void (*bind)(struct cgroup_subsys *ss, struct cgroup *root);
+	void (*id_attached)(struct cgroup_subsys *ss, struct cgroup *cgrp);
 
 	int subsys_id;
 	int active;
Index: mmotm-0811/kernel/cgroup.c
===================================================================
--- mmotm-0811.orig/kernel/cgroup.c
+++ mmotm-0811/kernel/cgroup.c
@@ -4618,6 +4618,8 @@ static int __init_or_module cgroup_init_
 	newid->stack[0] = newid->id;
 	newid->css = rootcss;
 	rootcss->id = newid;
+	if (ss->id_attached)
+		ss->id_attached(ss, dummytop);
 	return 0;
 }
 
@@ -4646,7 +4648,8 @@ static int alloc_css_id(struct cgroup_su
 	 * see cgroup_populate_dir()
 	 */
 	rcu_assign_pointer(child_css->id, child_id);
-
+	if (ss->id_attached)
+		ss->id_attached(ss, child);
 	return 0;
 }
 



* [PATCH 2/5] memcg: use array and ID for quick look up
  2010-08-20  9:55 ` KAMEZAWA Hiroyuki
@ 2010-08-20  9:59   ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 87+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-08-20  9:59 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, nishimura, balbir, gthelen, m-ikeda, akpm,
	linux-kernel, kamezawa.hiroyuki

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Now, each memory cgroup has an ID, and we make use of it for
 - hierarchy walks,
 - swap recording.

This patch makes more use of it. The final purpose is to replace
page_cgroup->mem_cgroup's pointer with an unsigned short.

This patch caches memcg pointers in an array indexed by ID. With this,
we don't have to call css_lookup(), which requires a radix-tree walk.
This saves some overhead when looking up a memcg via its ID.
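
For comparison, the two lookup paths are roughly (a sketch, error
handling omitted):

	/* before: radix-tree walk inside css_lookup() */
	css = css_lookup(&mem_cgroup_subsys, id);
	mem = css ? container_of(css, struct mem_cgroup, css) : NULL;

	/* after: direct, RCU-protected array index */
	rcu_read_lock();
	mem = id_to_memcg(id);
	rcu_read_unlock();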

Changelog: 20100811
 - adjusted onto mmotm-2010-08-11
 - fixed RCU related parts.
 - use the id_attached() callback.

Changelog: 20100804
 - fixed description in init/Kconfig

Changelog: 20100730
 - fixed rcu_read_unlock() placement.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 init/Kconfig    |   10 ++++++
 mm/memcontrol.c |   83 ++++++++++++++++++++++++++++++++++++++++++--------------
 2 files changed, 73 insertions(+), 20 deletions(-)

Index: mmotm-0811/mm/memcontrol.c
===================================================================
--- mmotm-0811.orig/mm/memcontrol.c
+++ mmotm-0811/mm/memcontrol.c
@@ -195,6 +195,7 @@ static void mem_cgroup_oom_notify(struct
  */
 struct mem_cgroup {
 	struct cgroup_subsys_state css;
+	int	valid; /* for checking validity under RCU access */
 	/*
 	 * the counter to account for memory usage
 	 */
@@ -294,6 +295,29 @@ static bool move_file(void)
 					&mc.to->move_charge_at_immigrate);
 }
 
+/* 0 is unused */
+static atomic_t mem_cgroup_num;
+#define NR_MEMCG_GROUPS (CONFIG_MEM_CGROUP_MAX_GROUPS + 1)
+static struct mem_cgroup *mem_cgroups[NR_MEMCG_GROUPS] __read_mostly;
+
+/* Must be called under rcu_read_lock */
+static struct mem_cgroup *id_to_memcg(unsigned short id)
+{
+	struct mem_cgroup *ret;
+	/* see mem_cgroup_free() */
+	ret = rcu_dereference_check(mem_cgroups[id], rcu_read_lock_held());
+	if (likely(ret && ret->valid))
+		return ret;
+	return NULL;
+}
+
+static void register_memcg_id(struct mem_cgroup *mem)
+{
+	int id = css_id(&mem->css);
+	rcu_assign_pointer(mem_cgroups[id], mem);
+	VM_BUG_ON(!mem->valid);
+}
+
 /*
  * Maximum loops in mem_cgroup_hierarchical_reclaim(), used for soft
  * limit reclaim to prevent infinite loops, if they ever occur.
@@ -1847,18 +1871,7 @@ static void mem_cgroup_cancel_charge(str
  * it's concern. (dropping refcnt from swap can be called against removed
  * memcg.)
  */
-static struct mem_cgroup *mem_cgroup_lookup(unsigned short id)
-{
-	struct cgroup_subsys_state *css;
 
-	/* ID 0 is unused ID */
-	if (!id)
-		return NULL;
-	css = css_lookup(&mem_cgroup_subsys, id);
-	if (!css)
-		return NULL;
-	return container_of(css, struct mem_cgroup, css);
-}
 
 struct mem_cgroup *try_get_mem_cgroup_from_page(struct page *page)
 {
@@ -1879,7 +1892,7 @@ struct mem_cgroup *try_get_mem_cgroup_fr
 		ent.val = page_private(page);
 		id = lookup_swap_cgroup(ent);
 		rcu_read_lock();
-		mem = mem_cgroup_lookup(id);
+		mem = id_to_memcg(id);
 		if (mem && !css_tryget(&mem->css))
 			mem = NULL;
 		rcu_read_unlock();
@@ -2231,7 +2244,7 @@ __mem_cgroup_commit_charge_swapin(struct
 
 		id = swap_cgroup_record(ent, 0);
 		rcu_read_lock();
-		memcg = mem_cgroup_lookup(id);
+		memcg = id_to_memcg(id);
 		if (memcg) {
 			/*
 			 * This recorded memcg can be obsolete one. So, avoid
@@ -2240,9 +2253,10 @@ __mem_cgroup_commit_charge_swapin(struct
 			if (!mem_cgroup_is_root(memcg))
 				res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
 			mem_cgroup_swap_statistics(memcg, false);
+			rcu_read_unlock();
 			mem_cgroup_put(memcg);
-		}
-		rcu_read_unlock();
+		} else
+			rcu_read_unlock();
 	}
 	/*
 	 * At swapin, we may charge account against cgroup which has no tasks.
@@ -2495,7 +2509,7 @@ void mem_cgroup_uncharge_swap(swp_entry_
 
 	id = swap_cgroup_record(ent, 0);
 	rcu_read_lock();
-	memcg = mem_cgroup_lookup(id);
+	memcg = id_to_memcg(id);
 	if (memcg) {
 		/*
 		 * We uncharge this because swap is freed.
@@ -2504,9 +2518,10 @@ void mem_cgroup_uncharge_swap(swp_entry_
 		if (!mem_cgroup_is_root(memcg))
 			res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
 		mem_cgroup_swap_statistics(memcg, false);
+		rcu_read_unlock();
 		mem_cgroup_put(memcg);
-	}
-	rcu_read_unlock();
+	} else
+		rcu_read_unlock();
 }
 
 /**
@@ -4010,6 +4025,9 @@ static struct mem_cgroup *mem_cgroup_all
 	struct mem_cgroup *mem;
 	int size = sizeof(struct mem_cgroup);
 
+	if (atomic_read(&mem_cgroup_num) == NR_MEMCG_GROUPS)
+		return NULL;
+
 	/* Can be very big if MAX_NUMNODES is very big */
 	if (size < PAGE_SIZE)
 		mem = kmalloc(size, GFP_KERNEL);
@@ -4020,6 +4038,7 @@ static struct mem_cgroup *mem_cgroup_all
 		return NULL;
 
 	memset(mem, 0, size);
+	mem->valid = 1;
 	mem->stat = alloc_percpu(struct mem_cgroup_stat_cpu);
 	if (!mem->stat) {
 		if (size < PAGE_SIZE)
@@ -4049,6 +4068,7 @@ static void __mem_cgroup_free(struct mem
 	mem_cgroup_remove_from_trees(mem);
 	free_css_id(&mem_cgroup_subsys, &mem->css);
 
+	atomic_dec(&mem_cgroup_num);
 	for_each_node_state(node, N_POSSIBLE)
 		free_mem_cgroup_per_zone_info(mem, node);
 
@@ -4059,6 +4079,19 @@ static void __mem_cgroup_free(struct mem
 		vfree(mem);
 }
 
+static void mem_cgroup_free(struct mem_cgroup *mem)
+{
+	/* No more lookup */
+	mem->valid = 0;
+	rcu_assign_pointer(mem_cgroups[css_id(&mem->css)], NULL);
+	/*
+	 * Because we call vfree() etc., use synchronize_rcu() rather than
+	 * call_rcu().
+	 */
+	synchronize_rcu();
+	__mem_cgroup_free(mem);
+}
+
 static void mem_cgroup_get(struct mem_cgroup *mem)
 {
 	atomic_inc(&mem->refcnt);
@@ -4068,7 +4101,7 @@ static void __mem_cgroup_put(struct mem_
 {
 	if (atomic_sub_and_test(count, &mem->refcnt)) {
 		struct mem_cgroup *parent = parent_mem_cgroup(mem);
-		__mem_cgroup_free(mem);
+		mem_cgroup_free(mem);
 		if (parent)
 			mem_cgroup_put(parent);
 	}
@@ -4184,13 +4217,22 @@ mem_cgroup_create(struct cgroup_subsys *
 	atomic_set(&mem->refcnt, 1);
 	mem->move_charge_at_immigrate = 0;
 	mutex_init(&mem->thresholds_lock);
+	atomic_inc(&mem_cgroup_num);
 	return &mem->css;
 free_out:
-	__mem_cgroup_free(mem);
+	mem_cgroup_free(mem);
 	root_mem_cgroup = NULL;
 	return ERR_PTR(error);
 }
 
+static void
+mem_cgroup_id_attached(struct cgroup_subsys *ss, struct cgroup *cont)
+{
+	struct mem_cgroup *mem = mem_cgroup_from_cont(cont);
+
+	register_memcg_id(mem);
+}
+
 static int mem_cgroup_pre_destroy(struct cgroup_subsys *ss,
 					struct cgroup *cont)
 {
@@ -4714,6 +4756,7 @@ struct cgroup_subsys mem_cgroup_subsys =
 	.can_attach = mem_cgroup_can_attach,
 	.cancel_attach = mem_cgroup_cancel_attach,
 	.attach = mem_cgroup_move_task,
+	.id_attached = mem_cgroup_id_attached,
 	.early_init = 0,
 	.use_id = 1,
 };
Index: mmotm-0811/init/Kconfig
===================================================================
--- mmotm-0811.orig/init/Kconfig
+++ mmotm-0811/init/Kconfig
@@ -594,6 +594,16 @@ config CGROUP_MEM_RES_CTLR_SWAP
 	  Now, memory usage of swap_cgroup is 2 bytes per entry. If swap page
 	  size is 4096bytes, 512k per 1Gbytes of swap.
 
+config MEM_CGROUP_MAX_GROUPS
+	int "Maximum number of memory cgroups on a system"
+	range 1 65535
+	default 8192 if 64BIT
+	default 2048 if 32BIT
+	help
+	  The memory cgroup controller limits the number of cgroups
+	  that can be created. The more groups you allow, the more
+	  memory (one pointer per group) will be consumed.
+
 menuconfig CGROUP_SCHED
 	bool "Group CPU scheduler"
 	depends on EXPERIMENTAL && CGROUPS



* [PATCH] memcg: use ID in page_cgroup
  2010-08-20  9:55 ` KAMEZAWA Hiroyuki
@ 2010-08-20 10:01   ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 87+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-08-20 10:01 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, nishimura, balbir, gthelen, m-ikeda, akpm,
	linux-kernel, kamezawa.hiroyuki


I have an idea to remove the page_cgroup->page pointer, an 8-byte
reduction per page. But that will come after this work.
==
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Now, the address of a memory cgroup can be derived from its ID without
a complex lookup. This patch replaces pc->mem_cgroup's pointer with an
unsigned short. On 64-bit architectures, this frees 6 more bytes per
page_cgroup: 2 bytes are used for blkio-cgroup's page tracking, and the
remaining 4 bytes will be used for some lightweight concurrent access.

We may be able to pack this ID into the flags field, but let's go step
by step.
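
Call sites then follow one of two patterns (a sketch based on the hunks
below); under lock_page_cgroup() or zone->lru_lock the lookup passes
safe==true, otherwise the caller must hold rcu_read_lock():

	/* under lock_page_cgroup(pc) or zone->lru_lock */
	mem = id_to_memcg(pc->mem_cgroup, true);

	/* elsewhere, e.g. resolving a swap record */
	rcu_read_lock();
	mem = id_to_memcg(id, false);
	if (mem && !css_tryget(&mem->css))
		mem = NULL;
	rcu_read_unlock();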

Changelog: 20100811
 - using new RCU APIs, such as rcu_dereference_check().
Changelog: 20100804
 - added comments to page_cgroup.h
Changelog: 20100730
 - fixed some garbage added by debug code in early stage

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/page_cgroup.h |    6 ++++-
 mm/memcontrol.c             |   52 +++++++++++++++++++++++++++-----------------
 mm/page_cgroup.c            |    2 -
 3 files changed, 38 insertions(+), 22 deletions(-)

Index: mmotm-0811/include/linux/page_cgroup.h
===================================================================
--- mmotm-0811.orig/include/linux/page_cgroup.h
+++ mmotm-0811/include/linux/page_cgroup.h
@@ -9,10 +9,14 @@
  * page_cgroup helps us identify information about the cgroup
  * All page cgroups are allocated at boot or memory hotplug event,
  * then the page cgroup for pfn always exists.
+ *
+ * TODO: The cgroup ID can probably be packed into "flags", but there
+ * would be a race between assigning the ID and flag set/clear. Be careful.
  */
 struct page_cgroup {
 	unsigned long flags;
-	struct mem_cgroup *mem_cgroup;
+	unsigned short mem_cgroup;	/* ID of assigned memory cgroup */
+	unsigned short blk_cgroup;	/* Not used yet, but will be. */
 	struct page *page;
 	struct list_head lru;		/* per cgroup LRU list */
 };
Index: mmotm-0811/mm/page_cgroup.c
===================================================================
--- mmotm-0811.orig/mm/page_cgroup.c
+++ mmotm-0811/mm/page_cgroup.c
@@ -15,7 +15,7 @@ static void __meminit
 __init_page_cgroup(struct page_cgroup *pc, unsigned long pfn)
 {
 	pc->flags = 0;
-	pc->mem_cgroup = NULL;
+	pc->mem_cgroup = 0;
 	pc->page = pfn_to_page(pfn);
 	INIT_LIST_HEAD(&pc->lru);
 }
Index: mmotm-0811/mm/memcontrol.c
===================================================================
--- mmotm-0811.orig/mm/memcontrol.c
+++ mmotm-0811/mm/memcontrol.c
@@ -300,12 +300,13 @@ static atomic_t mem_cgroup_num;
 #define NR_MEMCG_GROUPS (CONFIG_MEM_CGROUP_MAX_GROUPS + 1)
 static struct mem_cgroup *mem_cgroups[NR_MEMCG_GROUPS] __read_mostly;
 
-/* Must be called under rcu_read_lock */
-static struct mem_cgroup *id_to_memcg(unsigned short id)
+/* Must be called under rcu_read_lock(); pass safe==true when under a lock */
+static struct mem_cgroup *id_to_memcg(unsigned short id, bool safe)
 {
 	struct mem_cgroup *ret;
 	/* see mem_cgroup_free() */
-	ret = rcu_dereference_check(mem_cgroups[id], rcu_read_lock_held());
+	ret = rcu_dereference_check(mem_cgroups[id],
+				rcu_read_lock_held() || safe);
 	if (likely(ret && ret->valid))
 		return ret;
 	return NULL;
@@ -381,7 +382,12 @@ struct cgroup_subsys_state *mem_cgroup_c
 static struct mem_cgroup_per_zone *
 page_cgroup_zoneinfo(struct page_cgroup *pc)
 {
-	struct mem_cgroup *mem = pc->mem_cgroup;
+	/*
+	 * The caller must guarantee that this "pc" is under a lock. In the
+	 * typical case, this function is called by an LRU function with
+	 * zone->lru_lock held, so this is a safe access.
+	 */
+	struct mem_cgroup *mem = id_to_memcg(pc->mem_cgroup, true);
 	int nid = page_cgroup_nid(pc);
 	int zid = page_cgroup_zid(pc);
 
@@ -723,6 +729,11 @@ static inline bool mem_cgroup_is_root(st
 	return (mem == root_mem_cgroup);
 }
 
+static inline bool mem_cgroup_is_rootid(unsigned short id)
+{
+	return (id == 1);
+}
+
 /*
  * Following LRU functions are allowed to be used without PCG_LOCK.
  * Operations are called by routine of global LRU independently from memcg.
@@ -755,7 +766,7 @@ void mem_cgroup_del_lru_list(struct page
 	 */
 	mz = page_cgroup_zoneinfo(pc);
 	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
-	if (mem_cgroup_is_root(pc->mem_cgroup))
+	if (mem_cgroup_is_rootid(pc->mem_cgroup))
 		return;
 	VM_BUG_ON(list_empty(&pc->lru));
 	list_del_init(&pc->lru);
@@ -782,7 +793,7 @@ void mem_cgroup_rotate_lru_list(struct p
 	 */
 	smp_rmb();
 	/* unused or root page is not rotated. */
-	if (!PageCgroupUsed(pc) || mem_cgroup_is_root(pc->mem_cgroup))
+	if (!PageCgroupUsed(pc) || mem_cgroup_is_rootid(pc->mem_cgroup))
 		return;
 	mz = page_cgroup_zoneinfo(pc);
 	list_move(&pc->lru, &mz->lists[lru]);
@@ -808,7 +819,7 @@ void mem_cgroup_add_lru_list(struct page
 	mz = page_cgroup_zoneinfo(pc);
 	MEM_CGROUP_ZSTAT(mz, lru) += 1;
 	SetPageCgroupAcctLRU(pc);
-	if (mem_cgroup_is_root(pc->mem_cgroup))
+	if (mem_cgroup_is_rootid(pc->mem_cgroup))
 		return;
 	list_add(&pc->lru, &mz->lists[lru]);
 }
@@ -1497,7 +1508,7 @@ void mem_cgroup_update_file_mapped(struc
 		return;
 
 	lock_page_cgroup(pc);
-	mem = pc->mem_cgroup;
+	mem = id_to_memcg(pc->mem_cgroup, true);
 	if (!mem || !PageCgroupUsed(pc))
 		goto done;
 
@@ -1885,14 +1896,14 @@ struct mem_cgroup *try_get_mem_cgroup_fr
 	pc = lookup_page_cgroup(page);
 	lock_page_cgroup(pc);
 	if (PageCgroupUsed(pc)) {
-		mem = pc->mem_cgroup;
+		mem = id_to_memcg(pc->mem_cgroup, true);
 		if (mem && !css_tryget(&mem->css))
 			mem = NULL;
 	} else if (PageSwapCache(page)) {
 		ent.val = page_private(page);
 		id = lookup_swap_cgroup(ent);
 		rcu_read_lock();
-		mem = id_to_memcg(id);
+		mem = id_to_memcg(id, false);
 		if (mem && !css_tryget(&mem->css))
 			mem = NULL;
 		rcu_read_unlock();
@@ -1921,7 +1932,7 @@ static void __mem_cgroup_commit_charge(s
 		return;
 	}
 
-	pc->mem_cgroup = mem;
+	pc->mem_cgroup = css_id(&mem->css);
 	/*
 	 * We access a page_cgroup asynchronously without lock_page_cgroup().
 	 * Especially when a page_cgroup is taken from a page, pc->mem_cgroup
@@ -1979,7 +1990,7 @@ static void __mem_cgroup_move_account(st
 	VM_BUG_ON(PageLRU(pc->page));
 	VM_BUG_ON(!PageCgroupLocked(pc));
 	VM_BUG_ON(!PageCgroupUsed(pc));
-	VM_BUG_ON(pc->mem_cgroup != from);
+	VM_BUG_ON(id_to_memcg(pc->mem_cgroup, true) != from);
 
 	if (PageCgroupFileMapped(pc)) {
 		/* Update mapped_file data for mem_cgroup */
@@ -1994,7 +2005,7 @@ static void __mem_cgroup_move_account(st
 		mem_cgroup_cancel_charge(from);
 
 	/* caller should have done css_get */
-	pc->mem_cgroup = to;
+	pc->mem_cgroup = css_id(&to->css);
 	mem_cgroup_charge_statistics(to, pc, true);
 	/*
 	 * We charges against "to" which may not have any tasks. Then, "to"
@@ -2014,11 +2025,11 @@ static int mem_cgroup_move_account(struc
 {
 	int ret = -EINVAL;
 	lock_page_cgroup(pc);
-	if (PageCgroupUsed(pc) && pc->mem_cgroup == from) {
+	if (PageCgroupUsed(pc) && id_to_memcg(pc->mem_cgroup, true) == from) {
 		__mem_cgroup_move_account(pc, from, to, uncharge);
 		ret = 0;
 	}
 	unlock_page_cgroup(pc);
 	/*
 	 * check events
 	 */
@@ -2244,7 +2255,7 @@ __mem_cgroup_commit_charge_swapin(struct
 
 		id = swap_cgroup_record(ent, 0);
 		rcu_read_lock();
-		memcg = id_to_memcg(id);
+		memcg = id_to_memcg(id, false);
 		if (memcg) {
 			/*
 			 * This recorded memcg can be obsolete one. So, avoid
@@ -2354,7 +2365,7 @@ __mem_cgroup_uncharge_common(struct page
 
 	lock_page_cgroup(pc);
 
-	mem = pc->mem_cgroup;
+	mem = id_to_memcg(pc->mem_cgroup, true);
 
 	if (!PageCgroupUsed(pc))
 		goto unlock_out;
@@ -2509,7 +2520,7 @@ void mem_cgroup_uncharge_swap(swp_entry_
 
 	id = swap_cgroup_record(ent, 0);
 	rcu_read_lock();
-	memcg = id_to_memcg(id);
+	memcg = id_to_memcg(id, false);
 	if (memcg) {
 		/*
 		 * We uncharge this because swap is freed.
@@ -2600,7 +2611,7 @@ int mem_cgroup_prepare_migration(struct 
 	pc = lookup_page_cgroup(page);
 	lock_page_cgroup(pc);
 	if (PageCgroupUsed(pc)) {
-		mem = pc->mem_cgroup;
+		mem = id_to_memcg(pc->mem_cgroup, true);
 		css_get(&mem->css);
 		/*
 		 * At migrating an anonymous page, its mapcount goes down
@@ -4442,7 +4453,8 @@ static int is_target_pte_for_mc(struct v
 		 * mem_cgroup_move_account() checks the pc is valid or not under
 		 * the lock.
 		 */
-		if (PageCgroupUsed(pc) && pc->mem_cgroup == mc.from) {
+		if (PageCgroupUsed(pc) &&
+			id_to_memcg(pc->mem_cgroup, true) == mc.from) {
 			ret = MC_TARGET_PAGE;
 			if (target)
 				target->page = page;



* [PATCH 4/5] memcg: lockless update of file_mapped
  2010-08-20  9:55 ` KAMEZAWA Hiroyuki
@ 2010-08-20 10:02   ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 87+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-08-20 10:02 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, nishimura, balbir, gthelen, m-ikeda, akpm,
	linux-kernel, kamezawa.hiroyuki

No changes from v4.
==
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

When accounting file events per memory cgroup, we need to find the
memory cgroup via page_cgroup->mem_cgroup. Currently, we use
lock_page_cgroup() for that.

But considering the context in which the page_cgroup of a file page is
accessed, we can use an alternative lightweight mutual exclusion in most
cases. When handling file caches, the only race we have to take care of
is "moving" an account, IOW, overwriting page_cgroup->mem_cgroup.
Because file status updates are done while the page cache is in a stable
state, we don't have to worry about races with charge/uncharge.

Unlike charge/uncharge, "move" does not happen often. It happens only on
rmdir() and on task moving (with special settings).
This patch adds a race checker for file-cache-status accounting vs.
account moving. A new per-cpu, per-memcg counter MEM_CGROUP_ON_MOVE is
added. The routine for account moving:
  1. increments it before starting the move,
  2. calls synchronize_rcu(),
  3. decrements it after the move ends.
By this, the file-status-counting routine can check whether it needs to
call lock_page_cgroup(). In most cases, it doesn't. (A reader-side
sketch follows below.)
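
The reader side then follows this pattern (a minimal sketch using names
from the hunks below):

	bool need_lock = false;

	rcu_read_lock();
	mem = id_to_memcg(pc->mem_cgroup, true);
	if (mem && mem_cgroup_is_moved(mem)) {
		/* a move may be in flight; serialize with move_account */
		lock_page_cgroup(pc);
		need_lock = true;
		mem = id_to_memcg(pc->mem_cgroup, true);
	}
	/* ... update per-cpu counters with this_cpu_inc()/this_cpu_dec() ... */
	if (need_lock)
		unlock_page_cgroup(pc);
	rcu_read_unlock();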


Changelog: 20100804
 - added a comment for possible optimization hint.
Changelog: 20100730
 - some cleanup.
Changelog: 20100729
 - replaced __this_cpu_xxx() with this_cpu_xxx
   (because we don't call spinlock)
 - added VM_BUG_ON().

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 mm/memcontrol.c |   79 +++++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 67 insertions(+), 12 deletions(-)

Index: mmotm-0811/mm/memcontrol.c
===================================================================
--- mmotm-0811.orig/mm/memcontrol.c
+++ mmotm-0811/mm/memcontrol.c
@@ -90,6 +90,7 @@ enum mem_cgroup_stat_index {
 	MEM_CGROUP_STAT_PGPGOUT_COUNT,	/* # of pages paged out */
 	MEM_CGROUP_STAT_SWAPOUT, /* # of pages, swapped out */
 	MEM_CGROUP_EVENTS,	/* incremented at every  pagein/pageout */
+	MEM_CGROUP_ON_MOVE,   /* A check for locking move account/status */
 
 	MEM_CGROUP_STAT_NSTATS,
 };
@@ -1086,7 +1087,50 @@ static unsigned int get_swappiness(struc
 	return swappiness;
 }
 
-/* A routine for testing mem is not under move_account */
+static void mem_cgroup_start_move(struct mem_cgroup *mem)
+{
+	int cpu;
+	/* for fast checking in mem_cgroup_update_file_stat() etc. */
+	spin_lock(&mc.lock);
+	/* TODO: Can we optimize this by for_each_online_cpu() ? */
+	for_each_possible_cpu(cpu)
+		per_cpu(mem->stat->count[MEM_CGROUP_ON_MOVE], cpu) += 1;
+	spin_unlock(&mc.lock);
+
+	synchronize_rcu();
+}
+
+static void mem_cgroup_end_move(struct mem_cgroup *mem)
+{
+	int cpu;
+
+	if (!mem)
+		return;
+	/* for fast checking in mem_cgroup_update_file_stat() etc. */
+	spin_lock(&mc.lock);
+	for_each_possible_cpu(cpu)
+		per_cpu(mem->stat->count[MEM_CGROUP_ON_MOVE], cpu) -= 1;
+	spin_unlock(&mc.lock);
+}
+
+/*
+ * mem_cgroup_is_moved -- checks whether a cgroup is the mc.from target.
+ *                        Used for avoiding races.
+ * mem_cgroup_under_move -- checks whether a cgroup is mc.from or mc.to,
+ *			    or under their hierarchy. Used for waiting at
+ *			    memory pressure.
+ * The result of is_moved can be trusted until rcu_read_unlock().
+ * The caller must do:
+ *	rcu_read_lock();
+ *	result = mem_cgroup_is_moved();
+ *	.....make use of result here....
+ *	rcu_read_unlock();
+ */
+static bool mem_cgroup_is_moved(struct mem_cgroup *mem)
+{
+	VM_BUG_ON(!rcu_read_lock_held());
+	return this_cpu_read(mem->stat->count[MEM_CGROUP_ON_MOVE]) > 0;
+}
 
 static bool mem_cgroup_under_move(struct mem_cgroup *mem)
 {
@@ -1502,29 +1546,36 @@ void mem_cgroup_update_file_mapped(struc
 {
 	struct mem_cgroup *mem;
 	struct page_cgroup *pc;
+	bool need_lock = false;
 
 	pc = lookup_page_cgroup(page);
 	if (unlikely(!pc))
 		return;
-
-	lock_page_cgroup(pc);
+	rcu_read_lock();
 	mem = id_to_memcg(pc->mem_cgroup, true);
-	if (!mem || !PageCgroupUsed(pc))
+	if (likely(mem)) {
+		if (mem_cgroup_is_moved(mem)) {
+			/* need to serialize with move_account */
+			lock_page_cgroup(pc);
+			need_lock = true;
+			mem = id_to_memcg(pc->mem_cgroup, true);
+			if (unlikely(!mem))
+				goto done;
+		}
+	}
+	if (unlikely(!PageCgroupUsed(pc)))
 		goto done;
-
-	/*
-	 * Preemption is already disabled. We can use __this_cpu_xxx
-	 */
 	if (val > 0) {
-		__this_cpu_inc(mem->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
+		this_cpu_inc(mem->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
 		SetPageCgroupFileMapped(pc);
 	} else {
-		__this_cpu_dec(mem->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
+		this_cpu_dec(mem->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
 		ClearPageCgroupFileMapped(pc);
 	}
-
 done:
-	unlock_page_cgroup(pc);
+	if (need_lock)
+		unlock_page_cgroup(pc);
+	rcu_read_unlock();
 }
 
 /*
@@ -3064,6 +3115,7 @@ move_account:
 		lru_add_drain_all();
 		drain_all_stock_sync();
 		ret = 0;
+		mem_cgroup_start_move(mem);
 		for_each_node_state(node, N_HIGH_MEMORY) {
 			for (zid = 0; !ret && zid < MAX_NR_ZONES; zid++) {
 				enum lru_list l;
@@ -3077,6 +3129,7 @@ move_account:
 			if (ret)
 				break;
 		}
+		mem_cgroup_end_move(mem);
 		memcg_oom_recover(mem);
 		/* it seems parent cgroup doesn't have enough mem */
 		if (ret == -ENOMEM)
@@ -4563,6 +4616,7 @@ static void mem_cgroup_clear_mc(void)
 	mc.to = NULL;
 	mc.moving_task = NULL;
 	spin_unlock(&mc.lock);
+	mem_cgroup_end_move(from);
 	memcg_oom_recover(from);
 	memcg_oom_recover(to);
 	wake_up_all(&mc.waitq);
@@ -4593,6 +4647,7 @@ static int mem_cgroup_can_attach(struct 
 			VM_BUG_ON(mc.moved_charge);
 			VM_BUG_ON(mc.moved_swap);
 			VM_BUG_ON(mc.moving_task);
+			mem_cgroup_start_move(from);
 			spin_lock(&mc.lock);
 			mc.from = from;
 			mc.to = mem;


^ permalink raw reply	[flat|nested] 87+ messages in thread

* [PATCH 5/5] memcg: generic file accounting update function
  2010-08-20  9:55 ` KAMEZAWA Hiroyuki
@ 2010-08-20 10:03   ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 87+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-08-20 10:03 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, nishimura, balbir, gthelen, m-ikeda, akpm,
	linux-kernel, kamezawa.hiroyuki

No changes from v4.
==
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Prepare for adding new status bits for file caches (dirty, writeback,
etc.). Use a unified macro and more generic names; all counters will
follow the same update rule.
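
With this in place, adding a new file-stat counter becomes a one-line
wrapper (a hypothetical sketch; MEM_CGROUP_FSTAT_FILE_DIRTY is not part
of this series):

	void mem_cgroup_update_file_dirty(struct page *page, int val)
	{
		mem_cgroup_update_file_stat(page,
				MEM_CGROUP_FSTAT_FILE_DIRTY, val);
	}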

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/memcontrol.h  |   24 ++++++++++++++++++++++
 include/linux/page_cgroup.h |   19 ++++++++++++------
 mm/memcontrol.c             |   46 ++++++++++++++++++--------------------------
 3 files changed, 56 insertions(+), 33 deletions(-)

Index: mmotm-0811/include/linux/memcontrol.h
===================================================================
--- mmotm-0811.orig/include/linux/memcontrol.h
+++ mmotm-0811/include/linux/memcontrol.h
@@ -25,6 +25,30 @@ struct page_cgroup;
 struct page;
 struct mm_struct;
 
+/*
+ * Per-cpu Statistics for memory cgroup.
+ */
+enum mem_cgroup_stat_index {
+	/*
+	 * For MEM_CONTAINER_TYPE_ALL, usage = pagecache + rss.
+	 */
+	MEM_CGROUP_STAT_CACHE,		/* # of pages charged as cache */
+	MEM_CGROUP_STAT_RSS,	   /* # of pages charged as anon rss */
+	MEM_CGROUP_STAT_PGPGIN_COUNT,	/* # of pages paged in */
+	MEM_CGROUP_STAT_PGPGOUT_COUNT,	/* # of pages paged out */
+	MEM_CGROUP_STAT_SWAPOUT, /* # of pages, swapped out */
+	MEM_CGROUP_EVENTS,	/* incremented at every  pagein/pageout */
+	MEM_CGROUP_ON_MOVE,   /* A check for locking move account/status */
+	/* When you add a new member for file-stat, please update page_cgroup.h */
+	MEM_CGROUP_FSTAT_BASE,
+	MEM_CGROUP_FSTAT_FILE_MAPPED = MEM_CGROUP_FSTAT_BASE,
+	MEM_CGROUP_FSTAT_END,
+	MEM_CGROUP_STAT_NSTATS = MEM_CGROUP_FSTAT_END,
+};
+
+#define MEMCG_FSTAT_IDX(idx)	((idx) - MEM_CGROUP_FSTAT_BASE)
+#define NR_FILE_FLAGS_MEMCG ((MEM_CGROUP_FSTAT_END - MEM_CGROUP_FSTAT_BASE))
+
 extern unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
 					struct list_head *dst,
 					unsigned long *scanned, int order,
Index: mmotm-0811/mm/memcontrol.c
===================================================================
--- mmotm-0811.orig/mm/memcontrol.c
+++ mmotm-0811/mm/memcontrol.c
@@ -76,24 +76,6 @@ static int really_do_swap_account __init
 #define THRESHOLDS_EVENTS_THRESH (7) /* once in 128 */
 #define SOFTLIMIT_EVENTS_THRESH (10) /* once in 1024 */
 
-/*
- * Statistics for memory cgroup.
- */
-enum mem_cgroup_stat_index {
-	/*
-	 * For MEM_CONTAINER_TYPE_ALL, usage = pagecache + rss.
-	 */
-	MEM_CGROUP_STAT_CACHE, 	   /* # of pages charged as cache */
-	MEM_CGROUP_STAT_RSS,	   /* # of pages charged as anon rss */
-	MEM_CGROUP_STAT_FILE_MAPPED,  /* # of pages charged as file rss */
-	MEM_CGROUP_STAT_PGPGIN_COUNT,	/* # of pages paged in */
-	MEM_CGROUP_STAT_PGPGOUT_COUNT,	/* # of pages paged out */
-	MEM_CGROUP_STAT_SWAPOUT, /* # of pages, swapped out */
-	MEM_CGROUP_EVENTS,	/* incremented at every  pagein/pageout */
-	MEM_CGROUP_ON_MOVE,   /* A check for locking move account/status */
-
-	MEM_CGROUP_STAT_NSTATS,
-};
 
 struct mem_cgroup_stat_cpu {
 	s64 count[MEM_CGROUP_STAT_NSTATS];
@@ -1545,7 +1527,8 @@ bool mem_cgroup_handle_oom(struct mem_cg
  * Currently used to update mapped file statistics, but the routine can be
  * generalized to update other statistics as well.
  */
-void mem_cgroup_update_file_mapped(struct page *page, int val)
+static void
+mem_cgroup_update_file_stat(struct page *page, unsigned int idx, int val)
 {
 	struct mem_cgroup *mem;
 	struct page_cgroup *pc;
@@ -1569,11 +1552,11 @@ void mem_cgroup_update_file_mapped(struc
 	if (unlikely(!PageCgroupUsed(pc)))
 		goto done;
 	if (val > 0) {
-		this_cpu_inc(mem->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
-		SetPageCgroupFileMapped(pc);
+		this_cpu_inc(mem->stat->count[idx]);
+		set_bit(fflag_idx(MEMCG_FSTAT_IDX(idx)), &pc->flags);
 	} else {
-		this_cpu_dec(mem->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
-		ClearPageCgroupFileMapped(pc);
+		this_cpu_dec(mem->stat->count[idx]);
+		clear_bit(fflag_idx(MEMCG_FSTAT_IDX(idx)), &pc->flags);
 	}
 done:
 	if (need_lock)
@@ -1581,6 +1564,12 @@ done:
 	rcu_read_unlock();
 }
 
+void mem_cgroup_update_file_mapped(struct page *page, int val)
+{
+	return mem_cgroup_update_file_stat(page,
+		MEM_CGROUP_FSTAT_FILE_MAPPED, val);
+}
+
 /*
  * size of first charge trial. "32" comes from vmscan.c's magic value.
  * TODO: maybe necessary to use big numbers in big irons.
@@ -2040,17 +2029,20 @@ static void __mem_cgroup_commit_charge(s
 static void __mem_cgroup_move_account(struct page_cgroup *pc,
 	struct mem_cgroup *from, struct mem_cgroup *to, bool uncharge)
 {
+	int i;
 	VM_BUG_ON(from == to);
 	VM_BUG_ON(PageLRU(pc->page));
 	VM_BUG_ON(!PageCgroupLocked(pc));
 	VM_BUG_ON(!PageCgroupUsed(pc));
 	VM_BUG_ON(id_to_memcg(pc->mem_cgroup, true) != from);
 
-	if (PageCgroupFileMapped(pc)) {
+	for (i = MEM_CGROUP_FSTAT_BASE; i < MEM_CGROUP_FSTAT_END; ++i) {
+		if (!test_bit(fflag_idx(MEMCG_FSTAT_IDX(i)), &pc->flags))
+			continue;
 		/* Update mapped_file data for mem_cgroup */
 		preempt_disable();
-		__this_cpu_dec(from->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
-		__this_cpu_inc(to->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
+		__this_cpu_dec(from->stat->count[i]);
+		__this_cpu_inc(to->stat->count[i]);
 		preempt_enable();
 	}
 	mem_cgroup_charge_statistics(from, pc, false);
@@ -3479,7 +3471,7 @@ static int mem_cgroup_get_local_stat(str
 	s->stat[MCS_CACHE] += val * PAGE_SIZE;
 	val = mem_cgroup_read_stat(mem, MEM_CGROUP_STAT_RSS);
 	s->stat[MCS_RSS] += val * PAGE_SIZE;
-	val = mem_cgroup_read_stat(mem, MEM_CGROUP_STAT_FILE_MAPPED);
+	val = mem_cgroup_read_stat(mem, MEM_CGROUP_FSTAT_FILE_MAPPED);
 	s->stat[MCS_FILE_MAPPED] += val * PAGE_SIZE;
 	val = mem_cgroup_read_stat(mem, MEM_CGROUP_STAT_PGPGIN_COUNT);
 	s->stat[MCS_PGPGIN] += val;
Index: mmotm-0811/include/linux/page_cgroup.h
===================================================================
--- mmotm-0811.orig/include/linux/page_cgroup.h
+++ mmotm-0811/include/linux/page_cgroup.h
@@ -3,6 +3,7 @@
 
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR
 #include <linux/bit_spinlock.h>
+#include <linux/memcontrol.h> /* for flags */
 /*
  * Page Cgroup can be considered as an extended mem_map.
  * A page_cgroup page is associated with every page descriptor. The
@@ -43,10 +44,22 @@ enum {
 	PCG_CACHE, /* charged as cache */
 	PCG_USED, /* this object is in use. */
 	PCG_ACCT_LRU, /* page has been accounted for */
-	PCG_FILE_MAPPED, /* page is accounted as "mapped" */
 	PCG_MIGRATION, /* under page migration */
+	PCG_FILE_FLAGS_MEMCG, /* see memcontrol.h */
+	PCG_FILE_FLAGS_MEMCG_END
+		= PCG_FILE_FLAGS_MEMCG + NR_FILE_FLAGS_MEMCG - 1,
 };
 
+/*
+ * file-stat flags are defined to match memcg's stat information.
+ * Here, we just define a helper for indexing.
+ */
+static inline int fflag_idx(int idx)
+{
+	VM_BUG_ON((idx) >= NR_FILE_FLAGS_MEMCG);
+	return (idx) + PCG_FILE_FLAGS_MEMCG;
+}
+
 #define TESTPCGFLAG(uname, lname)			\
 static inline int PageCgroup##uname(struct page_cgroup *pc)	\
 	{ return test_bit(PCG_##lname, &pc->flags); }
@@ -79,11 +92,6 @@ CLEARPCGFLAG(AcctLRU, ACCT_LRU)
 TESTPCGFLAG(AcctLRU, ACCT_LRU)
 TESTCLEARPCGFLAG(AcctLRU, ACCT_LRU)
 
-
-SETPCGFLAG(FileMapped, FILE_MAPPED)
-CLEARPCGFLAG(FileMapped, FILE_MAPPED)
-TESTPCGFLAG(FileMapped, FILE_MAPPED)
-
 SETPCGFLAG(Migration, MIGRATION)
 CLEARPCGFLAG(Migration, MIGRATION)
 TESTPCGFLAG(Migration, MIGRATION)


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH] memcg: use ID in page_cgroup
  2010-08-20 10:01   ` KAMEZAWA Hiroyuki
@ 2010-08-20 10:05   ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 87+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-08-20 10:05 UTC (permalink / raw)
  To: linux-kernel


But the title... this was [PATCH 3/5].
Sorry.
-kame


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 2/5] memcg: use array and ID for quick look up
  2010-08-20  9:59   ` KAMEZAWA Hiroyuki
@ 2010-08-23  3:35     ` Daisuke Nishimura
  -1 siblings, 0 replies; 87+ messages in thread
From: Daisuke Nishimura @ 2010-08-23  3:35 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, balbir, gthelen, m-ikeda, akpm, linux-kernel,
	kamezawa.hiroyuki, Daisuke Nishimura

Hi,

> +/* 0 is unused */
> +static atomic_t mem_cgroup_num;
> +#define NR_MEMCG_GROUPS (CONFIG_MEM_CGROUP_MAX_GROUPS + 1)
> +static struct mem_cgroup *mem_cgroups[NR_MEMCG_GROUPS] __read_mostly;
> +
> +/* Must be called under rcu_read_lock */
> +static struct mem_cgroup *id_to_memcg(unsigned short id)
> +{
> +	struct mem_cgroup *ret;
> +	/* see mem_cgroup_free() */
> +	ret = rcu_dereference_check(mem_cgroups[id], rcu_read_lock_held());
> +	if (likely(ret && ret->valid))
> +		return ret;
> +	return NULL;
> +}
> +
I prefer "mem" to "ret".

> @@ -2231,7 +2244,7 @@ __mem_cgroup_commit_charge_swapin(struct
>  
>  		id = swap_cgroup_record(ent, 0);
>  		rcu_read_lock();
> -		memcg = mem_cgroup_lookup(id);
> +		memcg = id_to_memcg(id);
>  		if (memcg) {
>  			/*
>  			 * This recorded memcg can be obsolete one. So, avoid
> @@ -2240,9 +2253,10 @@ __mem_cgroup_commit_charge_swapin(struct
>  			if (!mem_cgroup_is_root(memcg))
>  				res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
>  			mem_cgroup_swap_statistics(memcg, false);
> +			rcu_read_unlock();
>  			mem_cgroup_put(memcg);
> -		}
> -		rcu_read_unlock();
> +		} else
> +			rcu_read_unlock();
>  	}
>  	/*
>  	 * At swapin, we may charge account against cgroup which has no tasks.
> @@ -2495,7 +2509,7 @@ void mem_cgroup_uncharge_swap(swp_entry_
>  
>  	id = swap_cgroup_record(ent, 0);
>  	rcu_read_lock();
> -	memcg = mem_cgroup_lookup(id);
> +	memcg = id_to_memcg(id);
>  	if (memcg) {
>  		/*
>  		 * We uncharge this because swap is freed.
> @@ -2504,9 +2518,10 @@ void mem_cgroup_uncharge_swap(swp_entry_
>  		if (!mem_cgroup_is_root(memcg))
>  			res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
>  		mem_cgroup_swap_statistics(memcg, false);
> +		rcu_read_unlock();
>  		mem_cgroup_put(memcg);
> -	}
> -	rcu_read_unlock();
> +	} else
> +		rcu_read_unlock();
>  }
>  
>  /**
Could you explain why we need rcu_read_unlock() before mem_cgroup_put() ?
I suspect that it's because mem_cgroup_put() can free the memcg, but do we
need mem->valid then ?


Thanks,
Daisuke Nishimura.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH] memcg: use ID in page_cgroup
  2010-08-20 10:01   ` KAMEZAWA Hiroyuki
@ 2010-08-23  5:32     ` Daisuke Nishimura
  -1 siblings, 0 replies; 87+ messages in thread
From: Daisuke Nishimura @ 2010-08-23  5:32 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, balbir, gthelen, m-ikeda, akpm, linux-kernel,
	kamezawa.hiroyuki, Daisuke Nishimura

On Fri, 20 Aug 2010 19:01:32 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> 
> I have an idea to remove page_cgroup->page pointer, 8bytes reduction per page.
> But it will be after this work.
Another off topic. I think we can reduce the size of mem_cgroup by packing
some boolean members into one "unsigned long flags".
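
For instance (bit names below are illustrative only, not a concrete
proposal):

	/* in struct mem_cgroup, replacing several bool members: */
	unsigned long flags;

	enum {
		MEMCG_FLAG_USE_HIERARCHY,
		MEMCG_FLAG_OOM_KILL_DISABLE,
	};

	/* accessed via test_bit()/set_bit()/clear_bit() on &mem->flags */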

> @@ -300,12 +300,13 @@ static atomic_t mem_cgroup_num;
>  #define NR_MEMCG_GROUPS (CONFIG_MEM_CGROUP_MAX_GROUPS + 1)
>  static struct mem_cgroup *mem_cgroups[NR_MEMCG_GROUPS] __read_mostly;
>  
> -/* Must be called under rcu_read_lock */
> -static struct mem_cgroup *id_to_memcg(unsigned short id)
> +/* Must be called under rcu_read_lock, set safe==true if under lock */
Do you mean, "Set safe==true if we can ensure by some locks that the id can be
safely dereferenced without rcu_read_lock", right ?

> +static struct mem_cgroup *id_to_memcg(unsigned short id, bool safe)
>  {
>  	struct mem_cgroup *ret;
>  	/* see mem_cgroup_free() */
> > -	ret = rcu_dereference_check(mem_cgroups[id], rcu_read_lock_held());
> > +	ret = rcu_dereference_check(mem_cgroups[id],
> > +				rcu_read_lock_held() || safe);
>  	if (likely(ret && ret->valid))
>  		return ret;
>  	return NULL;

(snip)
> @@ -723,6 +729,11 @@ static inline bool mem_cgroup_is_root(st
>  	return (mem == root_mem_cgroup);
>  }
>  
> +static inline bool mem_cgroup_is_rootid(unsigned short id)
> +{
> +	return (id == 1);
> +}
> +
It might be better to add

	BUG_ON(newid->id != 1)

in cgroup.c::cgroup_init_idr().


Thanks,
Daisuke Nishimura.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 4/5] memcg: lockless update of file_mapped
  2010-08-20 10:02   ` KAMEZAWA Hiroyuki
@ 2010-08-23  8:50     ` Daisuke Nishimura
  -1 siblings, 0 replies; 87+ messages in thread
From: Daisuke Nishimura @ 2010-08-23  8:50 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, balbir, gthelen, m-ikeda, akpm, linux-kernel,
	kamezawa.hiroyuki, Daisuke Nishimura

This patch looks good to me, but I have one question.

Why do we need to acquire mc.lock inside mem_cgroup_(start|end)_move() ?
These functions don't access mc.*.

Thanks,
Daisuke Nishimura.

On Fri, 20 Aug 2010 19:02:56 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> No changes from v4.
> ==
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> When accounting file events per memory cgroup, we need to find the memory cgroup
> via page_cgroup->mem_cgroup. Now, we use lock_page_cgroup().
> 
> But, considering the context in which page-cgroups for files are accessed,
> we can use an alternative light-weight mutual exclusion in most cases.
> At handling file-caches, the only race we have to take care of is "moving"
> account, IOW, overwriting page_cgroup->mem_cgroup. Because file status
> update is done while the page-cache is in stable state, we don't have to
> take care of race with charge/uncharge.
> 
> Unlike charge/uncharge, "move" does not happen so frequently. It happens only
> at rmdir() and task-moving (with special settings).
> This patch adds a race-checker for file-cache-status accounting v.s. account
> moving. The new per-cpu-per-memcg counter MEM_CGROUP_ON_MOVE is added.
> The routine for account move 
>   1. Increment it before start moving
>   2. Call synchronize_rcu()
>   3. Decrement it after the end of moving.
> By this, the file-status-counting routine can check whether it needs to call
> lock_page_cgroup(). In most cases, it doesn't need to call it.
> 
> 
> Changelog: 20100804
>  - added a comment for possible optimization hint.
> Changelog: 20100730
>  - some cleanup.
> Changelog: 20100729
>  - replaced __this_cpu_xxx() with this_cpu_xxx
>    (because we are not under a spinlock)
>  - added VM_BUG_ON().
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  mm/memcontrol.c |   79 +++++++++++++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 67 insertions(+), 12 deletions(-)
> 
> Index: mmotm-0811/mm/memcontrol.c
> ===================================================================
> --- mmotm-0811.orig/mm/memcontrol.c
> +++ mmotm-0811/mm/memcontrol.c
> @@ -90,6 +90,7 @@ enum mem_cgroup_stat_index {
>  	MEM_CGROUP_STAT_PGPGOUT_COUNT,	/* # of pages paged out */
>  	MEM_CGROUP_STAT_SWAPOUT, /* # of pages, swapped out */
>  	MEM_CGROUP_EVENTS,	/* incremented at every  pagein/pageout */
> +	MEM_CGROUP_ON_MOVE,   /* A check for locking move account/status */
>  
>  	MEM_CGROUP_STAT_NSTATS,
>  };
> @@ -1086,7 +1087,50 @@ static unsigned int get_swappiness(struc
>  	return swappiness;
>  }
>  
> -/* A routine for testing mem is not under move_account */
> +static void mem_cgroup_start_move(struct mem_cgroup *mem)
> +{
> +	int cpu;
> +	/* for fast checking in mem_cgroup_update_file_stat() etc..*/
> +	spin_lock(&mc.lock);
> +	/* TODO: Can we optimize this by for_each_online_cpu() ? */
> +	for_each_possible_cpu(cpu)
> +		per_cpu(mem->stat->count[MEM_CGROUP_ON_MOVE], cpu) += 1;
> +	spin_unlock(&mc.lock);
> +
> +	synchronize_rcu();
> +}
> +
> +static void mem_cgroup_end_move(struct mem_cgroup *mem)
> +{
> +	int cpu;
> +
> +	if (!mem)
> +		return;
> +	/* for fast checking in mem_cgroup_update_file_stat() etc..*/
> +	spin_lock(&mc.lock);
> +	for_each_possible_cpu(cpu)
> +		per_cpu(mem->stat->count[MEM_CGROUP_ON_MOVE], cpu) -= 1;
> +	spin_unlock(&mc.lock);
> +}
> +
> +/*
> + * mem_cgroup_is_moved -- checks whether a cgroup is the mc.from target.
> + *                          Used for avoiding races.
> + * mem_cgroup_under_move -- checks whether a cgroup is mc.from or mc.to, or
> + *			    under their hierarchy. Used for waiting at
> + *			    memory pressure.
> + * Result of is_moved can be trusted until the end of rcu_read_unlock().
> + * The caller must do
> + *	rcu_read_lock();
> + *	result = mem_cgroup_is_moved();
> + *	.....make use of result here....
> + *	rcu_read_unlock();
> + */
> +static bool mem_cgroup_is_moved(struct mem_cgroup *mem)
> +{
> +	VM_BUG_ON(!rcu_read_lock_held());
> +	return this_cpu_read(mem->stat->count[MEM_CGROUP_ON_MOVE]) > 0;
> +}
>  
>  static bool mem_cgroup_under_move(struct mem_cgroup *mem)
>  {
> @@ -1502,29 +1546,36 @@ void mem_cgroup_update_file_mapped(struc
>  {
>  	struct mem_cgroup *mem;
>  	struct page_cgroup *pc;
> +	bool need_lock = false;
>  
>  	pc = lookup_page_cgroup(page);
>  	if (unlikely(!pc))
>  		return;
> -
> -	lock_page_cgroup(pc);
> +	rcu_read_lock();
>  	mem = id_to_memcg(pc->mem_cgroup, true);
> -	if (!mem || !PageCgroupUsed(pc))
> +	if (likely(mem)) {
> +		if (mem_cgroup_is_moved(mem)) {
> +			/* need to serialize with move_account */
> +			lock_page_cgroup(pc);
> +			need_lock = true;
> +			mem = id_to_memcg(pc->mem_cgroup, true);
> +			if (unlikely(!mem))
> +				goto done;
> +		}
> +	}
> +	if (unlikely(!PageCgroupUsed(pc)))
>  		goto done;
> -
> -	/*
> -	 * Preemption is already disabled. We can use __this_cpu_xxx
> -	 */
>  	if (val > 0) {
> -		__this_cpu_inc(mem->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
> +		this_cpu_inc(mem->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
>  		SetPageCgroupFileMapped(pc);
>  	} else {
> -		__this_cpu_dec(mem->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
> +		this_cpu_dec(mem->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
>  		ClearPageCgroupFileMapped(pc);
>  	}
> -
>  done:
> -	unlock_page_cgroup(pc);
> +	if (need_lock)
> +		unlock_page_cgroup(pc);
> +	rcu_read_unlock();
>  }
>  
>  /*
> @@ -3064,6 +3115,7 @@ move_account:
>  		lru_add_drain_all();
>  		drain_all_stock_sync();
>  		ret = 0;
> +		mem_cgroup_start_move(mem);
>  		for_each_node_state(node, N_HIGH_MEMORY) {
>  			for (zid = 0; !ret && zid < MAX_NR_ZONES; zid++) {
>  				enum lru_list l;
> @@ -3077,6 +3129,7 @@ move_account:
>  			if (ret)
>  				break;
>  		}
> +		mem_cgroup_end_move(mem);
>  		memcg_oom_recover(mem);
>  		/* it seems parent cgroup doesn't have enough mem */
>  		if (ret == -ENOMEM)
> @@ -4563,6 +4616,7 @@ static void mem_cgroup_clear_mc(void)
>  	mc.to = NULL;
>  	mc.moving_task = NULL;
>  	spin_unlock(&mc.lock);
> +	mem_cgroup_end_move(from);
>  	memcg_oom_recover(from);
>  	memcg_oom_recover(to);
>  	wake_up_all(&mc.waitq);
> @@ -4593,6 +4647,7 @@ static int mem_cgroup_can_attach(struct 
>  			VM_BUG_ON(mc.moved_charge);
>  			VM_BUG_ON(mc.moved_swap);
>  			VM_BUG_ON(mc.moving_task);
> +			mem_cgroup_start_move(from);
>  			spin_lock(&mc.lock);
>  			mc.from = from;
>  			mc.to = mem;
> 

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 4/5] memcg: lockless update of file_mapped
  2010-08-23  8:50     ` Daisuke Nishimura
@ 2010-08-23 23:49       ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 87+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-08-23 23:49 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: linux-mm, balbir, gthelen, m-ikeda, akpm, linux-kernel,
	kamezawa.hiroyuki

On Mon, 23 Aug 2010 17:50:15 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:

> This patch looks good to me, but I have one question.
> 
> Why do we need to acquire mc.lock inside mem_cgroup_(start|end)_move() ?
> These functions don't access mc.*.
> 

Just reusing a lock to update the status. If you don't like it, I'll add a new lock.

Thanks,
-Kame

> Thanks,
> Daisuke Nishimura.
> 
> On Fri, 20 Aug 2010 19:02:56 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> 
> > No changes from v4.
> > ==
> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > 
> > When accounting file events per memory cgroup, we need to find the memory cgroup
> > via page_cgroup->mem_cgroup. Now, we use lock_page_cgroup().
> > 
> > But, considering the context in which page-cgroups for files are accessed,
> > we can use an alternative light-weight mutual exclusion in most cases.
> > At handling file-caches, the only race we have to take care of is "moving"
> > account, IOW, overwriting page_cgroup->mem_cgroup. Because file status
> > update is done while the page-cache is in stable state, we don't have to
> > take care of race with charge/uncharge.
> > 
> > Unlike charge/uncharge, "move" does not happen so frequently. It happens only
> > at rmdir() and task-moving (with special settings).
> > This patch adds a race-checker for file-cache-status accounting v.s. account
> > moving. The new per-cpu-per-memcg counter MEM_CGROUP_ON_MOVE is added.
> > The routine for account move 
> >   1. Increment it before start moving
> >   2. Call synchronize_rcu()
> >   3. Decrement it after the end of moving.
> > By this, the file-status-counting routine can check whether it needs to call
> > lock_page_cgroup(). In most cases, it doesn't need to call it.
> > 
> > 
> > Changelog: 20100804
> >  - added a comment for possible optimization hint.
> > Changelog: 20100730
> >  - some cleanup.
> > Changelog: 20100729
> >  - replaced __this_cpu_xxx() with this_cpu_xxx
> >    (because we are not under a spinlock)
> >  - added VM_BUG_ON().
> > 
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > ---
> >  mm/memcontrol.c |   79 +++++++++++++++++++++++++++++++++++++++++++++++---------
> >  1 file changed, 67 insertions(+), 12 deletions(-)
> > 
> > Index: mmotm-0811/mm/memcontrol.c
> > ===================================================================
> > --- mmotm-0811.orig/mm/memcontrol.c
> > +++ mmotm-0811/mm/memcontrol.c
> > @@ -90,6 +90,7 @@ enum mem_cgroup_stat_index {
> >  	MEM_CGROUP_STAT_PGPGOUT_COUNT,	/* # of pages paged out */
> >  	MEM_CGROUP_STAT_SWAPOUT, /* # of pages, swapped out */
> >  	MEM_CGROUP_EVENTS,	/* incremented at every  pagein/pageout */
> > +	MEM_CGROUP_ON_MOVE,   /* A check for locking move account/status */
> >  
> >  	MEM_CGROUP_STAT_NSTATS,
> >  };
> > @@ -1086,7 +1087,50 @@ static unsigned int get_swappiness(struc
> >  	return swappiness;
> >  }
> >  
> > -/* A routine for testing mem is not under move_account */
> > +static void mem_cgroup_start_move(struct mem_cgroup *mem)
> > +{
> > +	int cpu;
> > +	/* for fast checking in mem_cgroup_update_file_stat() etc..*/
> > +	spin_lock(&mc.lock);
> > +	/* TODO: Can we optimize this by for_each_online_cpu() ? */
> > +	for_each_possible_cpu(cpu)
> > +		per_cpu(mem->stat->count[MEM_CGROUP_ON_MOVE], cpu) += 1;
> > +	spin_unlock(&mc.lock);
> > +
> > +	synchronize_rcu();
> > +}
> > +
> > +static void mem_cgroup_end_move(struct mem_cgroup *mem)
> > +{
> > +	int cpu;
> > +
> > +	if (!mem)
> > +		return;
> > +	/* for fast checking in mem_cgroup_update_file_stat() etc..*/
> > +	spin_lock(&mc.lock);
> > +	for_each_possible_cpu(cpu)
> > +		per_cpu(mem->stat->count[MEM_CGROUP_ON_MOVE], cpu) -= 1;
> > +	spin_unlock(&mc.lock);
> > +}
> > +
> > +/*
> > + * mem_cgroup_is_moved -- checks whether a cgroup is the mc.from target.
> > + *                          Used for avoiding races.
> > + * mem_cgroup_under_move -- checks whether a cgroup is mc.from or mc.to, or
> > + *			    under their hierarchy. Used for waiting at
> > + *			    memory pressure.
> > + * Result of is_moved can be trusted until the end of rcu_read_unlock().
> > + * The caller must do
> > + *	rcu_read_lock();
> > + *	result = mem_cgroup_is_moved();
> > + *	.....make use of result here....
> > + *	rcu_read_unlock();
> > + */
> > +static bool mem_cgroup_is_moved(struct mem_cgroup *mem)
> > +{
> > +	VM_BUG_ON(!rcu_read_lock_held());
> > +	return this_cpu_read(mem->stat->count[MEM_CGROUP_ON_MOVE]) > 0;
> > +}
> >  
> >  static bool mem_cgroup_under_move(struct mem_cgroup *mem)
> >  {
> > @@ -1502,29 +1546,36 @@ void mem_cgroup_update_file_mapped(struc
> >  {
> >  	struct mem_cgroup *mem;
> >  	struct page_cgroup *pc;
> > +	bool need_lock = false;
> >  
> >  	pc = lookup_page_cgroup(page);
> >  	if (unlikely(!pc))
> >  		return;
> > -
> > -	lock_page_cgroup(pc);
> > +	rcu_read_lock();
> >  	mem = id_to_memcg(pc->mem_cgroup, true);
> > -	if (!mem || !PageCgroupUsed(pc))
> > +	if (likely(mem)) {
> > +		if (mem_cgroup_is_moved(mem)) {
> > +			/* need to serialize with move_account */
> > +			lock_page_cgroup(pc);
> > +			need_lock = true;
> > +			mem = id_to_memcg(pc->mem_cgroup, true);
> > +			if (unlikely(!mem))
> > +				goto done;
> > +		}
> > +	}
> > +	if (unlikely(!PageCgroupUsed(pc)))
> >  		goto done;
> > -
> > -	/*
> > -	 * Preemption is already disabled. We can use __this_cpu_xxx
> > -	 */
> >  	if (val > 0) {
> > -		__this_cpu_inc(mem->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
> > +		this_cpu_inc(mem->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
> >  		SetPageCgroupFileMapped(pc);
> >  	} else {
> > -		__this_cpu_dec(mem->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
> > +		this_cpu_dec(mem->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);
> >  		ClearPageCgroupFileMapped(pc);
> >  	}
> > -
> >  done:
> > -	unlock_page_cgroup(pc);
> > +	if (need_lock)
> > +		unlock_page_cgroup(pc);
> > +	rcu_read_unlock();
> >  }
> >  
> >  /*
> > @@ -3064,6 +3115,7 @@ move_account:
> >  		lru_add_drain_all();
> >  		drain_all_stock_sync();
> >  		ret = 0;
> > +		mem_cgroup_start_move(mem);
> >  		for_each_node_state(node, N_HIGH_MEMORY) {
> >  			for (zid = 0; !ret && zid < MAX_NR_ZONES; zid++) {
> >  				enum lru_list l;
> > @@ -3077,6 +3129,7 @@ move_account:
> >  			if (ret)
> >  				break;
> >  		}
> > +		mem_cgroup_end_move(mem);
> >  		memcg_oom_recover(mem);
> >  		/* it seems parent cgroup doesn't have enough mem */
> >  		if (ret == -ENOMEM)
> > @@ -4563,6 +4616,7 @@ static void mem_cgroup_clear_mc(void)
> >  	mc.to = NULL;
> >  	mc.moving_task = NULL;
> >  	spin_unlock(&mc.lock);
> > +	mem_cgroup_end_move(from);
> >  	memcg_oom_recover(from);
> >  	memcg_oom_recover(to);
> >  	wake_up_all(&mc.waitq);
> > @@ -4593,6 +4647,7 @@ static int mem_cgroup_can_attach(struct 
> >  			VM_BUG_ON(mc.moved_charge);
> >  			VM_BUG_ON(mc.moved_swap);
> >  			VM_BUG_ON(mc.moving_task);
> > +			mem_cgroup_start_move(from);
> >  			spin_lock(&mc.lock);
> >  			mc.from = from;
> >  			mc.to = mem;
> > 
> 


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 2/5] memcg: use array and ID for quick look up
  2010-08-23  3:35     ` Daisuke Nishimura
@ 2010-08-23 23:51       ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 87+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-08-23 23:51 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: linux-mm, balbir, gthelen, m-ikeda, akpm, linux-kernel,
	kamezawa.hiroyuki

On Mon, 23 Aug 2010 12:35:33 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:

> Hi,
> 
> > +/* 0 is unused */
> > +static atomic_t mem_cgroup_num;
> > +#define NR_MEMCG_GROUPS (CONFIG_MEM_CGROUP_MAX_GROUPS + 1)
> > +static struct mem_cgroup *mem_cgroups[NR_MEMCG_GROUPS] __read_mostly;
> > +
> > +/* Must be called under rcu_read_lock */
> > +static struct mem_cgroup *id_to_memcg(unsigned short id)
> > +{
> > +	struct mem_cgroup *ret;
> > +	/* see mem_cgroup_free() */
> > > +	ret = rcu_dereference_check(mem_cgroups[id], rcu_read_lock_held());
> > +	if (likely(ret && ret->valid))
> > +		return ret;
> > +	return NULL;
> > +}
> > +
> I prefer "mem" to "ret".
> 
Hmm, ok.


> > @@ -2231,7 +2244,7 @@ __mem_cgroup_commit_charge_swapin(struct
> >  
> >  		id = swap_cgroup_record(ent, 0);
> >  		rcu_read_lock();
> > -		memcg = mem_cgroup_lookup(id);
> > +		memcg = id_to_memcg(id);
> >  		if (memcg) {
> >  			/*
> >  			 * This recorded memcg can be obsolete one. So, avoid
> > @@ -2240,9 +2253,10 @@ __mem_cgroup_commit_charge_swapin(struct
> >  			if (!mem_cgroup_is_root(memcg))
> >  				res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
> >  			mem_cgroup_swap_statistics(memcg, false);
> > +			rcu_read_unlock();
> >  			mem_cgroup_put(memcg);
> > -		}
> > -		rcu_read_unlock();
> > +		} else
> > +			rcu_read_unlock();
> >  	}
> >  	/*
> >  	 * At swapin, we may charge account against cgroup which has no tasks.
> > @@ -2495,7 +2509,7 @@ void mem_cgroup_uncharge_swap(swp_entry_
> >  
> >  	id = swap_cgroup_record(ent, 0);
> >  	rcu_read_lock();
> > -	memcg = mem_cgroup_lookup(id);
> > +	memcg = id_to_memcg(id);
> >  	if (memcg) {
> >  		/*
> >  		 * We uncharge this because swap is freed.
> > @@ -2504,9 +2518,10 @@ void mem_cgroup_uncharge_swap(swp_entry_
> >  		if (!mem_cgroup_is_root(memcg))
> >  			res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
> >  		mem_cgroup_swap_statistics(memcg, false);
> > +		rcu_read_unlock();
> >  		mem_cgroup_put(memcg);
> > -	}
> > -	rcu_read_unlock();
> > +	} else
> > +		rcu_read_unlock();
> >  }
> >  
> >  /**
> Could you explain why we need rcu_read_unlock() before mem_cgroup_put() ?
> I suspect that it's because mem_cgroup_put() can free the memcg, but do we
> need mem->valid then ?
> 
mem_cgroup_put() may call synchronize_rcu(). So, we have to unlock before it.
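
To illustrate (a sketch, not code from this series): the reversed ordering
would be a self-deadlock, because synchronize_rcu() waits for every RCU
read-side critical section to finish, including our own:

	rcu_read_lock();
	memcg = id_to_memcg(id);
	/* ... uncharge, update statistics ... */
	mem_cgroup_put(memcg);	/* may call synchronize_rcu(), which waits
				 * for all readers, including this CPU's
				 * still-open read-side section: deadlock */
	rcu_read_unlock();	/* would never be reached */

So the read-side section has to be closed before the final put.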

Thanks,
-Kame




^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH] memcg: use ID in page_cgroup
  2010-08-23  5:32     ` Daisuke Nishimura
@ 2010-08-23 23:52       ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 87+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-08-23 23:52 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: linux-mm, balbir, gthelen, m-ikeda, akpm, linux-kernel,
	kamezawa.hiroyuki

On Mon, 23 Aug 2010 14:32:37 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:

> On Fri, 20 Aug 2010 19:01:32 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> 
> > 
> > I have an idea to remove page_cgroup->page pointer, 8bytes reduction per page.
> > But it will be after this work.
> Another off topic. I think we can reduce the size of mem_cgroup by packing
> some boolean members into one "unsigned long flags".
> 
I'll use "flags" to remove struct page pointer.


> > @@ -300,12 +300,13 @@ static atomic_t mem_cgroup_num;
> >  #define NR_MEMCG_GROUPS (CONFIG_MEM_CGROUP_MAX_GROUPS + 1)
> >  static struct mem_cgroup *mem_cgroups[NR_MEMCG_GROUPS] __read_mostly;
> >  
> > -/* Must be called under rcu_read_lock */
> > -static struct mem_cgroup *id_to_memcg(unsigned short id)
> > +/* Must be called under rcu_read_lock, set safe==true if under lock */
> Do you mean, "Set safe==true if we can ensure by some locks that the id can be
> safely dereferenced without rcu_read_lock", right ?
> 

yes. that's just for rcu_dereference_check().
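
(So a hypothetical call site that holds such a pinning lock instead of
rcu_read_lock() could do -- a sketch only, reusing the names from the hunk
below:)

	lock_page_cgroup(pc);
	/* pc is pinned by the lock, so the id can't be reused concurrently */
	mem = id_to_memcg(id, true);
	unlock_page_cgroup(pc);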


> > +static struct mem_cgroup *id_to_memcg(unsigned short id, bool safe)
> >  {
> >  	struct mem_cgroup *ret;
> >  	/* see mem_cgroup_free() */
> > -	ret = rcu_dereference_check(mem_cgroups[id], rch_read_lock_held());
> > +	ret = rcu_dereference_check(mem_cgroups[id],
> > +				rch_read_lock_held() || safe);
> >  	if (likely(ret && ret->valid))
> >  		return ret;
> >  	return NULL;
> 
> (snip)
> > @@ -723,6 +729,11 @@ static inline bool mem_cgroup_is_root(st
> >  	return (mem == root_mem_cgroup);
> >  }
> >  
> > +static inline bool mem_cgroup_is_rootid(unsigned short id)
> > +{
> > +	return (id == 1);
> > +}
> > +
> It might be better to add
> 
> 	BUG_ON(newid->id != 1)
> 
> in cgroup.c::cgroup_init_idr().
> 

Why ??

Thanks,
-Kame




^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 2/5] memcg: use array and ID for quick look up
  2010-08-23 23:51       ` KAMEZAWA Hiroyuki
@ 2010-08-24  0:19         ` Daisuke Nishimura
  -1 siblings, 0 replies; 87+ messages in thread
From: Daisuke Nishimura @ 2010-08-24  0:19 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, balbir, gthelen, m-ikeda, akpm, linux-kernel,
	kamezawa.hiroyuki, Daisuke Nishimura

> > > @@ -2231,7 +2244,7 @@ __mem_cgroup_commit_charge_swapin(struct
> > >  
> > >  		id = swap_cgroup_record(ent, 0);
> > >  		rcu_read_lock();
> > > -		memcg = mem_cgroup_lookup(id);
> > > +		memcg = id_to_memcg(id);
> > >  		if (memcg) {
> > >  			/*
> > >  			 * This recorded memcg can be obsolete one. So, avoid
> > > @@ -2240,9 +2253,10 @@ __mem_cgroup_commit_charge_swapin(struct
> > >  			if (!mem_cgroup_is_root(memcg))
> > >  				res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
> > >  			mem_cgroup_swap_statistics(memcg, false);
> > > +			rcu_read_unlock();
> > >  			mem_cgroup_put(memcg);
> > > -		}
> > > -		rcu_read_unlock();
> > > +		} else
> > > +			rcu_read_unlock();
> > >  	}
> > >  	/*
> > >  	 * At swapin, we may charge account against cgroup which has no tasks.
> > > @@ -2495,7 +2509,7 @@ void mem_cgroup_uncharge_swap(swp_entry_
> > >  
> > >  	id = swap_cgroup_record(ent, 0);
> > >  	rcu_read_lock();
> > > -	memcg = mem_cgroup_lookup(id);
> > > +	memcg = id_to_memcg(id);
> > >  	if (memcg) {
> > >  		/*
> > >  		 * We uncharge this because swap is freed.
> > > @@ -2504,9 +2518,10 @@ void mem_cgroup_uncharge_swap(swp_entry_
> > >  		if (!mem_cgroup_is_root(memcg))
> > >  			res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
> > >  		mem_cgroup_swap_statistics(memcg, false);
> > > +		rcu_read_unlock();
> > >  		mem_cgroup_put(memcg);
> > > -	}
> > > -	rcu_read_unlock();
> > > +	} else
> > > +		rcu_read_unlock();
> > >  }
> > >  
> > >  /**
> > Could you explain why we need rcu_read_unlock() before mem_cgroup_put() ?
> > I suspect that it's because mem_cgroup_put() can free the memcg, but do we
> > need mem->valid then ?
> > 
> mem_cgroup_put() may call synchronize_rcu(). So, we have to unlock before it.
> 
Ah, I see. Thank you for your explanation.

Daisuke Nishimura.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 4/5] memcg: lockless update of file_mapped
  2010-08-23 23:49       ` KAMEZAWA Hiroyuki
@ 2010-08-24  0:19         ` Daisuke Nishimura
  -1 siblings, 0 replies; 87+ messages in thread
From: Daisuke Nishimura @ 2010-08-24  0:19 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Daisuke Nishimura, linux-mm, balbir, gthelen, m-ikeda, akpm,
	linux-kernel, kamezawa.hiroyuki

On Tue, 24 Aug 2010 08:49:16 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> On Mon, 23 Aug 2010 17:50:15 +0900
> Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> 
> > This patch looks good to me, but I have one question.
> > 
> > Why do we need to acquire sc.lock inside mem_cgroup_(start|end)_move() ?
> > These functions don't access mc.*.
> > 
> 
> just reusing a lock to update status. If you don't like it, I'll add a new lock.
> 
I see. I think it would be enough just to add some comments about it.
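
(For instance, something along these lines -- wording hypothetical:)

	/*
	 * The lock taken here is only reused to serialize the status
	 * update; mem_cgroup_(start|end)_move() do not access mc.* itself.
	 */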


Thanks,
Daisuke Nishimura.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH] memcg: use ID in page_cgroup
  2010-08-23 23:52       ` KAMEZAWA Hiroyuki
@ 2010-08-24  1:14         ` Daisuke Nishimura
  -1 siblings, 0 replies; 87+ messages in thread
From: Daisuke Nishimura @ 2010-08-24  1:14 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, balbir, gthelen, m-ikeda, akpm, linux-kernel,
	kamezawa.hiroyuki, Daisuke Nishimura

> > > @@ -723,6 +729,11 @@ static inline bool mem_cgroup_is_root(st
> > >  	return (mem == root_mem_cgroup);
> > >  }
> > >  
> > > +static inline bool mem_cgroup_is_rootid(unsigned short id)
> > > +{
> > > +	return (id == 1);
> > > +}
> > > +
> > It might be better to add
> > 
> > 	BUG_ON(newid->id != 1)
> > 
> > in cgroup.c::cgroup_init_idr().
> > 
> 
> Why ??
> 
Just to make sure that the root css has id==1. mem_cgroup_is_rootid() makes
use of that fact.
I'm sorry if I'm missing something.

Thanks,
Daisuke Nishimura.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH] memcg: use ID in page_cgroup
  2010-08-24  1:14         ` Daisuke Nishimura
@ 2010-08-24  1:54           ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 87+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-08-24  1:54 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: linux-mm, balbir, gthelen, m-ikeda, akpm, linux-kernel,
	kamezawa.hiroyuki

On Tue, 24 Aug 2010 10:14:25 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:

> > > > @@ -723,6 +729,11 @@ static inline bool mem_cgroup_is_root(st
> > > >  	return (mem == root_mem_cgroup);
> > > >  }
> > > >  
> > > > +static inline bool mem_cgroup_is_rootid(unsigned short id)
> > > > +{
> > > > +	return (id == 1);
> > > > +}
> > > > +
> > > It might be better to add
> > > 
> > > 	BUG_ON(newid->id != 1)
> > > 
> > > in cgroup.c::cgroup_init_idr().
> > > 
> > 
> > Why ??
> > 
> Just to make sure that the root css has id==1. mem_cgroup_is_rootid() makes
> use of that fact.
> I'm sorry if I'm missing something.
> 

Hmm. The function allocating ID does

4530 static struct css_id *get_new_cssid(struct cgroup_subsys *ss, int depth)
4531 {
==
4546         spin_lock(&ss->id_lock);
4547         /* Don't use 0. allocates an ID of 1-65535 */
4548         error = idr_get_new_above(&ss->idr, newid, 1, &myid);
4549         spin_unlock(&ss->id_lock);
==

and always allocates an ID of 1 or above.

Adding BUG_ON(newid->id != 1) would mean that we doubt the bitmap function and
consider the possibility that newid->id == 0.

But, we're 100% sure that it never happens.

I don't think adding a comment is the right thing to do.

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH] memcg: use ID in page_cgroup
  2010-08-24  1:54           ` KAMEZAWA Hiroyuki
@ 2010-08-24  4:04             ` Daisuke Nishimura
  -1 siblings, 0 replies; 87+ messages in thread
From: Daisuke Nishimura @ 2010-08-24  4:04 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, balbir, gthelen, m-ikeda, akpm, linux-kernel,
	kamezawa.hiroyuki, Daisuke Nishimura

On Tue, 24 Aug 2010 10:54:05 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> On Tue, 24 Aug 2010 10:14:25 +0900
> Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> 
> > > > > @@ -723,6 +729,11 @@ static inline bool mem_cgroup_is_root(st
> > > > >  	return (mem == root_mem_cgroup);
> > > > >  }
> > > > >  
> > > > > +static inline bool mem_cgroup_is_rootid(unsigned short id)
> > > > > +{
> > > > > +	return (id == 1);
> > > > > +}
> > > > > +
> > > > It might be better to add
> > > > 
> > > > 	BUG_ON(newid->id != 1)
> > > > 
> > > > in cgroup.c::cgroup_init_idr().
> > > > 
> > > 
> > > Why ??
> > > 
> > Just to make sure that the root css has id==1. mem_cgroup_is_rootid() makes
> > use of that fact.
> > I'm sorry if I'm missing something.
> > 
> 
> Hmm. The function allocating ID does
> 
> 4530 static struct css_id *get_new_cssid(struct cgroup_subsys *ss, int depth)
> 4531 {
> ==
> 4546         spin_lock(&ss->id_lock);
> 4547         /* Don't use 0. allocates an ID of 1-65535 */
> 4548         error = idr_get_new_above(&ss->idr, newid, 1, &myid);
> 4549         spin_unlock(&ss->id_lock);
> ==
> 
> and always allocates an ID of 1 or above.
> 
> Adding BUG_ON(newid->id != 1) would mean that we doubt the bitmap function and
> consider the possibility that newid->id == 0.
> 
> But, we're 100% sure that it never happens.
> 
> I don't think adding a comment is the right thing to do.
> 
Okay, I don't have a strong requirement to add BUG_ON() anyway.

These patches look good to me except for some minor points I've commented on.

Thanks,
Daisuke Nishimura.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH] memcg: use ID in page_cgroup
  2010-08-24  4:04             ` Daisuke Nishimura
@ 2010-08-24  6:05               ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 87+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-08-24  6:05 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: linux-mm, balbir, gthelen, m-ikeda, akpm, linux-kernel,
	kamezawa.hiroyuki

On Tue, 24 Aug 2010 13:04:02 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:

> On Tue, 24 Aug 2010 10:54:05 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> 
> > On Tue, 24 Aug 2010 10:14:25 +0900
> > Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> > 
> > > > > > @@ -723,6 +729,11 @@ static inline bool mem_cgroup_is_root(st
> > > > > >  	return (mem == root_mem_cgroup);
> > > > > >  }
> > > > > >  
> > > > > > +static inline bool mem_cgroup_is_rootid(unsigned short id)
> > > > > > +{
> > > > > > +	return (id == 1);
> > > > > > +}
> > > > > > +
> > > > > It might be better to add
> > > > > 
> > > > > 	BUG_ON(newid->id != 1)
> > > > > 
> > > > > in cgroup.c::cgroup_init_idr().
> > > > > 
> > > > 
> > > > Why ??
> > > > 
> > > Just to make sure that the root css has id==1. mem_cgroup_is_rootid() makes
> > > use of that fact.
> > > I'm sorry if I'm missing something.
> > > 
> > 
> > Hmm. The function allocating ID does
> > 
> > 4530 static struct css_id *get_new_cssid(struct cgroup_subsys *ss, int depth)
> > 4531 {
> > ==
> > 4546         spin_lock(&ss->id_lock);
> > 4547         /* Don't use 0. allocates an ID of 1-65535 */
> > 4548         error = idr_get_new_above(&ss->idr, newid, 1, &myid);
> > 4549         spin_unlock(&ss->id_lock);
> > ==
> > 
> > and always allocates an ID of 1 or above.
> > 
> > Adding BUG_ON(newid->id != 1) would mean that we doubt the bitmap function and
> > consider the possibility that newid->id == 0.
> > 
> > But, we're 100% sure that it never happens.
> > 
> > I don't think adding a comment is the right thing to do.
> > 
> Okay, I don't have a strong requirement to add BUG_ON() anyway.
> 
> These patches look good to me except for some minor points I've commented on.
> 

Thank you for the review. I'll repost after applying your feedback.

-Kame


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 1/5] cgroup: ID notification call back
  2010-08-24  7:19     ` Greg Thelen
@ 2010-08-24  7:18       ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 87+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-08-24  7:18 UTC (permalink / raw)
  To: Greg Thelen
  Cc: linux-mm, nishimura, balbir, m-ikeda, akpm, linux-kernel,
	kamezawa.hiroyuki, menage, lizf

On Tue, 24 Aug 2010 00:19:01 -0700
Greg Thelen <gthelen@google.com> wrote:

> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> writes:
> 
> > CC'ed to Paul Menage and Li Zefan.
> > ==
> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> >
> > When cgroup subsystem use ID (ss->use_id==1), each css's ID is assigned
> > after successful call of ->create(). css_ID is tightly coupled with
> > css struct itself but it is allocated by ->create() call, IOW,
> > per-subsystem special allocations.
> >
> > To know css_id before creation, this patch adds id_attached() callback.
> > after css_ID allocation. This will be used by memory cgroup's quick lookup
> > routine.
> >
> > Maybe you can think of other implementations as
> > 	- pass ID to ->create()
> > 	or
> > 	- add post_create()
> > 	etc...
> > But when considering dirtiness of codes, this straightforward patch seems
> > good to me. If someone wants post_create(), this patch can be replaced.
> >
> > Changelog: 20100820
> >  - new approarch.
> >
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > ---
> >  Documentation/cgroups/cgroups.txt |   10 ++++++++++
> >  include/linux/cgroup.h            |    1 +
> >  kernel/cgroup.c                   |    5 ++++-
> >  3 files changed, 15 insertions(+), 1 deletion(-)
> >
> > Index: mmotm-0811/Documentation/cgroups/cgroups.txt
> > ===================================================================
> > --- mmotm-0811.orig/Documentation/cgroups/cgroups.txt
> > +++ mmotm-0811/Documentation/cgroups/cgroups.txt
> > @@ -621,6 +621,16 @@ and root cgroup. Currently this will onl
> >  the default hierarchy (which never has sub-cgroups) and a hierarchy
> >  that is being created/destroyed (and hence has no sub-cgroups).
> >  
> > +void id_attached(struct cgroup_subsys *ss, struct cgroup *root)
> > +(cgroup_mutex and ss->hierarchy_mutex held by caller)
> > +(called only when ss->use_id=1)
> > +
> > +Called when css_id is attached to css. Because css_id is assigned
> > +against "css", css_id is not available until ->create() is called.
> > +If subsystem wants to make use of ID at createtion time, use
> 
> Minor spelling correction: s/createtion/creation/
> 
ok.

> > +this handler. This handler will be called after css_id is assigned
> > +to css. Not necessary to be implemented in usual(see memcontrol.c)
> 
> Maybe this sounds better?
> 
> "This handler is not usually needed.  See memcontrol.c for an example."
> 

Sure. will fix.

> > +
> >  4. Questions
> >  ============
> >  
> > Index: mmotm-0811/include/linux/cgroup.h
> > ===================================================================
> > --- mmotm-0811.orig/include/linux/cgroup.h
> > +++ mmotm-0811/include/linux/cgroup.h
> > @@ -475,6 +475,7 @@ struct cgroup_subsys {
> >  			struct cgroup *cgrp);
> >  	void (*post_clone)(struct cgroup_subsys *ss, struct cgroup *cgrp);
> >  	void (*bind)(struct cgroup_subsys *ss, struct cgroup *root);
> > +	void (*id_attached)(struct cgroup_subsys *ss, struct cgroup *cgrp);
> >  
> >  	int subsys_id;
> >  	int active;
> > Index: mmotm-0811/kernel/cgroup.c
> > ===================================================================
> > --- mmotm-0811.orig/kernel/cgroup.c
> > +++ mmotm-0811/kernel/cgroup.c
> > @@ -4618,6 +4618,8 @@ static int __init_or_module cgroup_init_
> >  	newid->stack[0] = newid->id;
> >  	newid->css = rootcss;
> >  	rootcss->id = newid;
> > +	if (ss->id_attached)
> > +		ss->id_attached(ss, dummytop);
> >  	return 0;
> >  }
> >  
> > @@ -4646,7 +4648,8 @@ static int alloc_css_id(struct cgroup_su
> >  	 * see cgroup_populate_dir()
> >  	 */
> >  	rcu_assign_pointer(child_css->id, child_id);
> > -
> > +	if (ss->id_attached)
> > +		ss->id_attached(ss, child);
> >  	return 0;
> >  }
> >  
> 

Thank you for the review.

-Kame


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 1/5] cgroup: ID notification call back
  2010-08-20  9:58   ` KAMEZAWA Hiroyuki
@ 2010-08-24  7:19     ` Greg Thelen
  -1 siblings, 0 replies; 87+ messages in thread
From: Greg Thelen @ 2010-08-24  7:19 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, nishimura, balbir, m-ikeda, akpm, linux-kernel,
	kamezawa.hiroyuki, menage, lizf

KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> writes:

> CC'ed to Paul Menage and Li Zefan.
> ==
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
> When cgroup subsystem use ID (ss->use_id==1), each css's ID is assigned
> after successful call of ->create(). css_ID is tightly coupled with
> css struct itself but it is allocated by ->create() call, IOW,
> per-subsystem special allocations.
>
> To know css_id before creation, this patch adds id_attached() callback.
> after css_ID allocation. This will be used by memory cgroup's quick lookup
> routine.
>
> Maybe you can think of other implementations as
> 	- pass ID to ->create()
> 	or
> 	- add post_create()
> 	etc...
> But when considering dirtiness of codes, this straightforward patch seems
> good to me. If someone wants post_create(), this patch can be replaced.
>
> Changelog: 20100820
>  - new approarch.
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  Documentation/cgroups/cgroups.txt |   10 ++++++++++
>  include/linux/cgroup.h            |    1 +
>  kernel/cgroup.c                   |    5 ++++-
>  3 files changed, 15 insertions(+), 1 deletion(-)
>
> Index: mmotm-0811/Documentation/cgroups/cgroups.txt
> ===================================================================
> --- mmotm-0811.orig/Documentation/cgroups/cgroups.txt
> +++ mmotm-0811/Documentation/cgroups/cgroups.txt
> @@ -621,6 +621,16 @@ and root cgroup. Currently this will onl
>  the default hierarchy (which never has sub-cgroups) and a hierarchy
>  that is being created/destroyed (and hence has no sub-cgroups).
>  
> +void id_attached(struct cgroup_subsys *ss, struct cgroup *root)
> +(cgroup_mutex and ss->hierarchy_mutex held by caller)
> +(called only when ss->use_id=1)
> +
> +Called when css_id is attached to css. Because css_id is assigned
> +against "css", css_id is not available until ->create() is called.
> +If subsystem wants to make use of ID at createtion time, use

Minor spelling correction: s/createtion/creation/

> +this handler. This handler will be called after css_id is assigned
> +to css. Not necessary to be implemented in usual(see memcontrol.c)

Maybe this sounds better?

"This handler is not usually needed.  See memcontrol.c for an example."

> +
>  4. Questions
>  ============
>  
> Index: mmotm-0811/include/linux/cgroup.h
> ===================================================================
> --- mmotm-0811.orig/include/linux/cgroup.h
> +++ mmotm-0811/include/linux/cgroup.h
> @@ -475,6 +475,7 @@ struct cgroup_subsys {
>  			struct cgroup *cgrp);
>  	void (*post_clone)(struct cgroup_subsys *ss, struct cgroup *cgrp);
>  	void (*bind)(struct cgroup_subsys *ss, struct cgroup *root);
> +	void (*id_attached)(struct cgroup_subsys *ss, struct cgroup *cgrp);
>  
>  	int subsys_id;
>  	int active;
> Index: mmotm-0811/kernel/cgroup.c
> ===================================================================
> --- mmotm-0811.orig/kernel/cgroup.c
> +++ mmotm-0811/kernel/cgroup.c
> @@ -4618,6 +4618,8 @@ static int __init_or_module cgroup_init_
>  	newid->stack[0] = newid->id;
>  	newid->css = rootcss;
>  	rootcss->id = newid;
> +	if (ss->id_attached)
> +		ss->id_attached(ss, dummytop);
>  	return 0;
>  }
>  
> @@ -4646,7 +4648,8 @@ static int alloc_css_id(struct cgroup_su
>  	 * see cgroup_populate_dir()
>  	 */
>  	rcu_assign_pointer(child_css->id, child_id);
> -
> +	if (ss->id_attached)
> +		ss->id_attached(ss, child);
>  	return 0;
>  }
>  

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 2/5] memcg: use array and ID for quick look up
  2010-08-24  7:44     ` Greg Thelen
@ 2010-08-24  7:42       ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 87+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-08-24  7:42 UTC (permalink / raw)
  To: Greg Thelen
  Cc: linux-mm, nishimura, balbir, m-ikeda, akpm, linux-kernel,
	kamezawa.hiroyuki

On Tue, 24 Aug 2010 00:44:59 -0700
Greg Thelen <gthelen@google.com> wrote:

> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> writes:
> 
> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> >
> > Now, memory cgroup has an ID per cgroup and make use of it at
> >  - hierarchy walk,
> >  - swap recording.
> >
> > This patch is for making more use of it. The final purpose is
> > to replace page_cgroup->mem_cgroup's pointer to an unsigned short.
> >
> > This patch caches a pointer of memcg in an array. By this, we
> > don't have to call css_lookup() which requires radix-hash walk.
> > This saves some amount of memory footprint at lookup memcg via id.
> >
> > Changelog: 20100811
> >  - adjusted onto mmotm-2010-08-11
> >  - fixed RCU related parts.
> >  - use attach_id() callback.
> >
> > Changelog: 20100804
> >  - fixed description in init/Kconfig
> >
> > Changelog: 20100730
> >  - fixed rcu_read_unlock() placement.
> >
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > ---
> >  init/Kconfig    |   10 ++++++
> >  mm/memcontrol.c |   83 ++++++++++++++++++++++++++++++++++++++++++--------------
> >  2 files changed, 73 insertions(+), 20 deletions(-)
> >
> > Index: mmotm-0811/mm/memcontrol.c
> > ===================================================================
> > --- mmotm-0811.orig/mm/memcontrol.c
> > +++ mmotm-0811/mm/memcontrol.c
> > @@ -195,6 +195,7 @@ static void mem_cgroup_oom_notify(struct
> >   */
> >  struct mem_cgroup {
> >  	struct cgroup_subsys_state css;
> > +	int	valid; /* for checking validness under RCU access.*/
> >  	/*
> >  	 * the counter to account for memory usage
> >  	 */
> > @@ -294,6 +295,29 @@ static bool move_file(void)
> >  					&mc.to->move_charge_at_immigrate);
> >  }
> >  
> > +/* 0 is unused */
> > +static atomic_t mem_cgroup_num;
> > +#define NR_MEMCG_GROUPS (CONFIG_MEM_CGROUP_MAX_GROUPS + 1)
> > +static struct mem_cgroup *mem_cgroups[NR_MEMCG_GROUPS] __read_mostly;
> > +
> > +/* Must be called under rcu_read_lock */
> > +static struct mem_cgroup *id_to_memcg(unsigned short id)
> > +{
> > +	struct mem_cgroup *ret;
> > +	/* see mem_cgroup_free() */
> > +	ret = rcu_dereference_check(mem_cgroups[id], rch_read_lock_held());
> 
> I think this should be rcu_read_lock_held() instead of rch_read_lock_held()?
> 
yes, maybe overwritten by a following patch.. thank you for finding it.
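
(For reference, the helper with this fix folded in -- and with "ret" renamed
to "mem", as suggested earlier in the thread -- would read, as a sketch:)

	/* Must be called under rcu_read_lock */
	static struct mem_cgroup *id_to_memcg(unsigned short id)
	{
		struct mem_cgroup *mem;
		/* see mem_cgroup_free() */
		mem = rcu_dereference_check(mem_cgroups[id],
					    rcu_read_lock_held());
		if (likely(mem && mem->valid))
			return mem;
		return NULL;
	}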


Thanks,
-Kame


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 2/5] memcg: use array and ID for quick look up
  2010-08-20  9:59   ` KAMEZAWA Hiroyuki
@ 2010-08-24  7:44     ` Greg Thelen
  -1 siblings, 0 replies; 87+ messages in thread
From: Greg Thelen @ 2010-08-24  7:44 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, nishimura, balbir, m-ikeda, akpm, linux-kernel,
	kamezawa.hiroyuki

KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> writes:

> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
> Now, memory cgroup has an ID per cgroup and make use of it at
>  - hierarchy walk,
>  - swap recording.
>
> This patch is for making more use of it. The final purpose is
> to replace page_cgroup->mem_cgroup's pointer to an unsigned short.
>
> This patch caches a pointer of memcg in an array. By this, we
> don't have to call css_lookup() which requires radix-hash walk.
> This saves some amount of memory footprint at lookup memcg via id.
>
> Changelog: 20100811
>  - adjusted onto mmotm-2010-08-11
>  - fixed RCU related parts.
>  - use attach_id() callback.
>
> Changelog: 20100804
>  - fixed description in init/Kconfig
>
> Changelog: 20100730
>  - fixed rcu_read_unlock() placement.
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  init/Kconfig    |   10 ++++++
>  mm/memcontrol.c |   83 ++++++++++++++++++++++++++++++++++++++++++--------------
>  2 files changed, 73 insertions(+), 20 deletions(-)
>
> Index: mmotm-0811/mm/memcontrol.c
> ===================================================================
> --- mmotm-0811.orig/mm/memcontrol.c
> +++ mmotm-0811/mm/memcontrol.c
> @@ -195,6 +195,7 @@ static void mem_cgroup_oom_notify(struct
>   */
>  struct mem_cgroup {
>  	struct cgroup_subsys_state css;
> +	int	valid; /* for checking validness under RCU access.*/
>  	/*
>  	 * the counter to account for memory usage
>  	 */
> @@ -294,6 +295,29 @@ static bool move_file(void)
>  					&mc.to->move_charge_at_immigrate);
>  }
>  
> +/* 0 is unused */
> +static atomic_t mem_cgroup_num;
> +#define NR_MEMCG_GROUPS (CONFIG_MEM_CGROUP_MAX_GROUPS + 1)
> +static struct mem_cgroup *mem_cgroups[NR_MEMCG_GROUPS] __read_mostly;
> +
> +/* Must be called under rcu_read_lock */
> +static struct mem_cgroup *id_to_memcg(unsigned short id)
> +{
> +	struct mem_cgroup *ret;
> +	/* see mem_cgroup_free() */
> +	ret = rcu_dereference_check(mem_cgroups[id], rch_read_lock_held());

I think this should be rcu_read_lock_held() instead of rch_read_lock_held()?

> +	if (likely(ret && ret->valid))
> +		return ret;
> +	return NULL;
> +}
> +
> +static void register_memcg_id(struct mem_cgroup *mem)
> +{
> +	int id = css_id(&mem->css);
> +	rcu_assign_pointer(mem_cgroups[id], mem);
> +	VM_BUG_ON(!mem->valid);
> +}
> +
>  /*
>   * Maximum loops in mem_cgroup_hierarchical_reclaim(), used for soft
>   * limit reclaim to prevent infinite loops, if they ever occur.
> @@ -1847,18 +1871,7 @@ static void mem_cgroup_cancel_charge(str
>   * it's concern. (dropping refcnt from swap can be called against removed
>   * memcg.)
>   */
> -static struct mem_cgroup *mem_cgroup_lookup(unsigned short id)
> -{
> -	struct cgroup_subsys_state *css;
>  
> -	/* ID 0 is unused ID */
> -	if (!id)
> -		return NULL;
> -	css = css_lookup(&mem_cgroup_subsys, id);
> -	if (!css)
> -		return NULL;
> -	return container_of(css, struct mem_cgroup, css);
> -}
>  
>  struct mem_cgroup *try_get_mem_cgroup_from_page(struct page *page)
>  {
> @@ -1879,7 +1892,7 @@ struct mem_cgroup *try_get_mem_cgroup_fr
>  		ent.val = page_private(page);
>  		id = lookup_swap_cgroup(ent);
>  		rcu_read_lock();
> -		mem = mem_cgroup_lookup(id);
> +		mem = id_to_memcg(id);
>  		if (mem && !css_tryget(&mem->css))
>  			mem = NULL;
>  		rcu_read_unlock();
> @@ -2231,7 +2244,7 @@ __mem_cgroup_commit_charge_swapin(struct
>  
>  		id = swap_cgroup_record(ent, 0);
>  		rcu_read_lock();
> -		memcg = mem_cgroup_lookup(id);
> +		memcg = id_to_memcg(id);
>  		if (memcg) {
>  			/*
>  			 * This recorded memcg can be obsolete one. So, avoid
> @@ -2240,9 +2253,10 @@ __mem_cgroup_commit_charge_swapin(struct
>  			if (!mem_cgroup_is_root(memcg))
>  				res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
>  			mem_cgroup_swap_statistics(memcg, false);
> +			rcu_read_unlock();
>  			mem_cgroup_put(memcg);
> -		}
> -		rcu_read_unlock();
> +		} else
> +			rcu_read_unlock();
>  	}
>  	/*
>  	 * At swapin, we may charge account against cgroup which has no tasks.
> @@ -2495,7 +2509,7 @@ void mem_cgroup_uncharge_swap(swp_entry_
>  
>  	id = swap_cgroup_record(ent, 0);
>  	rcu_read_lock();
> -	memcg = mem_cgroup_lookup(id);
> +	memcg = id_to_memcg(id);
>  	if (memcg) {
>  		/*
>  		 * We uncharge this because swap is freed.
> @@ -2504,9 +2518,10 @@ void mem_cgroup_uncharge_swap(swp_entry_
>  		if (!mem_cgroup_is_root(memcg))
>  			res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
>  		mem_cgroup_swap_statistics(memcg, false);
> +		rcu_read_unlock();
>  		mem_cgroup_put(memcg);
> -	}
> -	rcu_read_unlock();
> +	} else
> +		rcu_read_unlock();
>  }
>  
>  /**
> @@ -4010,6 +4025,9 @@ static struct mem_cgroup *mem_cgroup_all
>  	struct mem_cgroup *mem;
>  	int size = sizeof(struct mem_cgroup);
>  
> +	if (atomic_read(&mem_cgroup_num) == NR_MEMCG_GROUPS)
> +		return NULL;
> +
>  	/* Can be very big if MAX_NUMNODES is very big */
>  	if (size < PAGE_SIZE)
>  		mem = kmalloc(size, GFP_KERNEL);
> @@ -4020,6 +4038,7 @@ static struct mem_cgroup *mem_cgroup_all
>  		return NULL;
>  
>  	memset(mem, 0, size);
> +	mem->valid = 1;
>  	mem->stat = alloc_percpu(struct mem_cgroup_stat_cpu);
>  	if (!mem->stat) {
>  		if (size < PAGE_SIZE)
> @@ -4049,6 +4068,7 @@ static void __mem_cgroup_free(struct mem
>  	mem_cgroup_remove_from_trees(mem);
>  	free_css_id(&mem_cgroup_subsys, &mem->css);
>  
> +	atomic_dec(&mem_cgroup_num);
>  	for_each_node_state(node, N_POSSIBLE)
>  		free_mem_cgroup_per_zone_info(mem, node);
>  
> @@ -4059,6 +4079,19 @@ static void __mem_cgroup_free(struct mem
>  		vfree(mem);
>  }
>  
> +static void mem_cgroup_free(struct mem_cgroup *mem)
> +{
> +	/* No more lookup */
> +	mem->valid = 0;
> +	rcu_assign_pointer(mem_cgroups[css_id(&mem->css)], NULL);
> +	/*
> +	 * Because we call vfree() etc...use synchronize_rcu() rather than
> + 	 * call_rcu();
> + 	 */
> +	synchronize_rcu();
> +	__mem_cgroup_free(mem);
> +}
> +
>  static void mem_cgroup_get(struct mem_cgroup *mem)
>  {
>  	atomic_inc(&mem->refcnt);
> @@ -4068,7 +4101,7 @@ static void __mem_cgroup_put(struct mem_
>  {
>  	if (atomic_sub_and_test(count, &mem->refcnt)) {
>  		struct mem_cgroup *parent = parent_mem_cgroup(mem);
> -		__mem_cgroup_free(mem);
> +		mem_cgroup_free(mem);
>  		if (parent)
>  			mem_cgroup_put(parent);
>  	}
> @@ -4184,13 +4217,22 @@ mem_cgroup_create(struct cgroup_subsys *
>  	atomic_set(&mem->refcnt, 1);
>  	mem->move_charge_at_immigrate = 0;
>  	mutex_init(&mem->thresholds_lock);
> +	atomic_inc(&mem_cgroup_num);
>  	return &mem->css;
>  free_out:
> -	__mem_cgroup_free(mem);
> +	mem_cgroup_free(mem);
>  	root_mem_cgroup = NULL;
>  	return ERR_PTR(error);
>  }
>  
> +static void
> +mem_cgroup_id_attached(struct cgroup_subsys *ss, struct cgroup *cont)
> +{
> +	struct mem_cgroup *mem = mem_cgroup_from_cont(cont);
> +
> +	register_memcg_id(mem);
> +}
> +
>  static int mem_cgroup_pre_destroy(struct cgroup_subsys *ss,
>  					struct cgroup *cont)
>  {
> @@ -4714,6 +4756,7 @@ struct cgroup_subsys mem_cgroup_subsys =
>  	.can_attach = mem_cgroup_can_attach,
>  	.cancel_attach = mem_cgroup_cancel_attach,
>  	.attach = mem_cgroup_move_task,
> +	.id_attached = mem_cgroup_id_attached,
>  	.early_init = 0,
>  	.use_id = 1,
>  };
> Index: mmotm-0811/init/Kconfig
> ===================================================================
> --- mmotm-0811.orig/init/Kconfig
> +++ mmotm-0811/init/Kconfig
> @@ -594,6 +594,16 @@ config CGROUP_MEM_RES_CTLR_SWAP
>  	  Now, memory usage of swap_cgroup is 2 bytes per entry. If swap page
>  	  size is 4096bytes, 512k per 1Gbytes of swap.
>  
> +config MEM_CGROUP_MAX_GROUPS
> +	int "Maximum number of memory cgroups on a system"
> +	range 1 65535
> +	default 8192 if 64BIT
> +	default 2048 if 32BIT
> +	help
> +	  Memory cgroup has limitation of the number of groups created.
> +	  Please select your favorite value. The more you allow, the more
> +	  memory(a pointer per group) will be consumed.
> +
>  menuconfig CGROUP_SCHED
>  	bool "Group CPU scheduler"
>  	depends on EXPERIMENTAL && CGROUPS
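
As a ballpark for the new Kconfig knob: with the 64BIT default of 8192
groups, the lookup array is NR_MEMCG_GROUPS = 8192 + 1 pointers, i.e.
8193 * 8 bytes, roughly 64KiB of statically allocated memory on 64-bit
(assuming 8-byte pointers).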

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH] memcg: towards I/O aware memcg v5
  2010-08-20  9:55 ` KAMEZAWA Hiroyuki
@ 2010-08-24  7:46   ` Balbir Singh
  -1 siblings, 0 replies; 87+ messages in thread
From: Balbir Singh @ 2010-08-24  7:46 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, nishimura, gthelen, m-ikeda, akpm, linux-kernel,
	kamezawa.hiroyuki

* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2010-08-20 18:55:52]:

> This is v5.
> 
> Sorry for delaying...but I had time for resetting myself and..several
> changes are added. I think this version is simpler than v4.
> 
> Major changes from v4 is 
>  a) added kernel/cgroup.c hooks again. (for b)
>  b) make RCU aware. previous version seems dangerous in an extreme case.
> 
> Then, codes are updated. Most of changes are related to RCU.
> 
> Patch brief view:
>  1. add hooks to kernel/cgroup.c for ID management.
>  2. use ID-array in memcg.
>  3. record ID to page_cgroup rather than pointer.
>  4. make update_file_mapped to be RCU aware routine instead of spinlock.
>  5. make update_file_mapped as general-purpose function.
>

Thanks for being persistent; I will review the patches and add comments
to the relevant ones.

-- 
	Three Cheers,
	Balbir

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH] memcg: use ID in page_cgroup
  2010-08-20 10:01   ` KAMEZAWA Hiroyuki
@ 2010-08-24  7:47     ` Greg Thelen
  -1 siblings, 0 replies; 87+ messages in thread
From: Greg Thelen @ 2010-08-24  7:47 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, nishimura, balbir, m-ikeda, akpm, linux-kernel,
	kamezawa.hiroyuki

KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> writes:

> I have an idea to remove the page_cgroup->page pointer, an 8-byte reduction per page.
> But it will be after this work.
> ==
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
> Now, addresses of memory cgroups can be calculated from their IDs without a complex lookup.
> This patch replaces pc->mem_cgroup, changing it from a pointer to an unsigned short.
> On 64bit architectures, this offers us 6 more bytes of room per page_cgroup.
> 2 bytes are used for blkio-cgroup's page tracking. 4 more bytes will be used for
> some light-weight concurrent access.
>
> We may be able to move this id into the flags field but ...go step by step.
>
> Changelog: 20100811
>  - using new rcu APIs, as rcu_dereference_check() etc.
> Changelog: 20100804
>  - added comments to page_cgroup.h
> Changelog: 20100730
>  - fixed some garbage added by debug code in early stage
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  include/linux/page_cgroup.h |    6 ++++-
>  mm/memcontrol.c             |   52 +++++++++++++++++++++++++++-----------------
>  mm/page_cgroup.c            |    2 -
>  3 files changed, 38 insertions(+), 22 deletions(-)
>
> Index: mmotm-0811/include/linux/page_cgroup.h
> ===================================================================
> --- mmotm-0811.orig/include/linux/page_cgroup.h
> +++ mmotm-0811/include/linux/page_cgroup.h
> @@ -9,10 +9,14 @@
>   * page_cgroup helps us identify information about the cgroup
>   * All page cgroups are allocated at boot or memory hotplug event,
>   * then the page cgroup for pfn always exists.
> + *
> + * TODO: It seems ID for cgroup can be packed into "flags". But there will
> + * be race between assigning ID <-> set/clear flags. Please be careful.
>   */
>  struct page_cgroup {
>  	unsigned long flags;
> -	struct mem_cgroup *mem_cgroup;
> +	unsigned short mem_cgroup;	/* ID of assigned memory cgroup */
> +	unsigned short blk_cgroup;	/* Not Used..but will be. */
>  	struct page *page;
>  	struct list_head lru;		/* per cgroup LRU list */
>  };
> Index: mmotm-0811/mm/page_cgroup.c
> ===================================================================
> --- mmotm-0811.orig/mm/page_cgroup.c
> +++ mmotm-0811/mm/page_cgroup.c
> @@ -15,7 +15,7 @@ static void __meminit
>  __init_page_cgroup(struct page_cgroup *pc, unsigned long pfn)
>  {
>  	pc->flags = 0;
> -	pc->mem_cgroup = NULL;
> +	pc->mem_cgroup = 0;
>  	pc->page = pfn_to_page(pfn);
>  	INIT_LIST_HEAD(&pc->lru);
>  }
> Index: mmotm-0811/mm/memcontrol.c
> ===================================================================
> --- mmotm-0811.orig/mm/memcontrol.c
> +++ mmotm-0811/mm/memcontrol.c
> @@ -300,12 +300,13 @@ static atomic_t mem_cgroup_num;
>  #define NR_MEMCG_GROUPS (CONFIG_MEM_CGROUP_MAX_GROUPS + 1)
>  static struct mem_cgroup *mem_cgroups[NR_MEMCG_GROUPS] __read_mostly;
>  
> -/* Must be called under rcu_read_lock */
> -static struct mem_cgroup *id_to_memcg(unsigned short id)
> +/* Must be called under rcu_read_lock, set safe==true if under lock */
> +static struct mem_cgroup *id_to_memcg(unsigned short id, bool safe)
>  {
>  	struct mem_cgroup *ret;
>  	/* see mem_cgroup_free() */
> -	ret = rcu_dereference_check(mem_cgroups[id], rch_read_lock_held());
> +	ret = rcu_dereference_check(mem_cgroups[id],
> +				rch_read_lock_held() || safe);
>  	if (likely(ret && ret->valid))
>  		return ret;
>  	return NULL;
> @@ -381,7 +382,12 @@ struct cgroup_subsys_state *mem_cgroup_c
>  static struct mem_cgroup_per_zone *
>  page_cgroup_zoneinfo(struct page_cgroup *pc)
>  {
> -	struct mem_cgroup *mem = pc->mem_cgroup;
> +	/*
> +	 * The caller should guarantee this "pc" is under lock. In typical
> +	 * case, this function is called by lru function with zone->lru_lock.
> +	 * It is a safe access.
> +	 */
> +	struct mem_cgroup *mem = id_to_memcg(pc->mem_cgroup, true);
>  	int nid = page_cgroup_nid(pc);
>  	int zid = page_cgroup_zid(pc);
>  
> @@ -723,6 +729,11 @@ static inline bool mem_cgroup_is_root(st
>  	return (mem == root_mem_cgroup);
>  }
>  
> +static inline bool mem_cgroup_is_rootid(unsigned short id)
> +{
> +	return (id == 1);
> +}
> +
>  /*
>   * Following LRU functions are allowed to be used without PCG_LOCK.
>   * Operations are called by routine of global LRU independently from memcg.
> @@ -755,7 +766,7 @@ void mem_cgroup_del_lru_list(struct page
>  	 */
>  	mz = page_cgroup_zoneinfo(pc);
>  	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
> -	if (mem_cgroup_is_root(pc->mem_cgroup))
> +	if (mem_cgroup_is_rootid(pc->mem_cgroup))
>  		return;
>  	VM_BUG_ON(list_empty(&pc->lru));
>  	list_del_init(&pc->lru);
> @@ -782,7 +793,7 @@ void mem_cgroup_rotate_lru_list(struct p
>  	 */
>  	smp_rmb();
>  	/* unused or root page is not rotated. */
> -	if (!PageCgroupUsed(pc) || mem_cgroup_is_root(pc->mem_cgroup))
> +	if (!PageCgroupUsed(pc) || mem_cgroup_is_rootid(pc->mem_cgroup))
>  		return;
>  	mz = page_cgroup_zoneinfo(pc);
>  	list_move(&pc->lru, &mz->lists[lru]);
> @@ -808,7 +819,7 @@ void mem_cgroup_add_lru_list(struct page
>  	mz = page_cgroup_zoneinfo(pc);
>  	MEM_CGROUP_ZSTAT(mz, lru) += 1;
>  	SetPageCgroupAcctLRU(pc);
> -	if (mem_cgroup_is_root(pc->mem_cgroup))
> +	if (mem_cgroup_is_rootid(pc->mem_cgroup))
>  		return;
>  	list_add(&pc->lru, &mz->lists[lru]);
>  }
> @@ -1497,7 +1508,7 @@ void mem_cgroup_update_file_mapped(struc
>  		return;
>  
>  	lock_page_cgroup(pc);
> -	mem = pc->mem_cgroup;
> +	mem = id_to_memcg(pc->mem_cgroup, true);
>  	if (!mem || !PageCgroupUsed(pc))
>  		goto done;
>  
> @@ -1885,14 +1896,14 @@ struct mem_cgroup *try_get_mem_cgroup_fr
>  	pc = lookup_page_cgroup(page);
>  	lock_page_cgroup(pc);
>  	if (PageCgroupUsed(pc)) {
> -		mem = pc->mem_cgroup;
> +		mem = id_to_memcg(pc->mem_cgroup, true);
>  		if (mem && !css_tryget(&mem->css))
>  			mem = NULL;
>  	} else if (PageSwapCache(page)) {
>  		ent.val = page_private(page);
>  		id = lookup_swap_cgroup(ent);
>  		rcu_read_lock();
> -		mem = id_to_memcg(id);
> +		mem = id_to_memcg(id, false);
>  		if (mem && !css_tryget(&mem->css))
>  			mem = NULL;
>  		rcu_read_unlock();
> @@ -1921,7 +1932,7 @@ static void __mem_cgroup_commit_charge(s
>  		return;
>  	}
>  
> -	pc->mem_cgroup = mem;
> +	pc->mem_cgroup = css_id(&mem->css);
>  	/*
>  	 * We access a page_cgroup asynchronously without lock_page_cgroup().
>  	 * Especially when a page_cgroup is taken from a page, pc->mem_cgroup
> @@ -1979,7 +1990,7 @@ static void __mem_cgroup_move_account(st
>  	VM_BUG_ON(PageLRU(pc->page));
>  	VM_BUG_ON(!PageCgroupLocked(pc));

Should this be VM_BUG_ON(!rcu_read_lock_held())?  I suspect that
mem_cgroup_move_account() should grab rcu read lock (see my comment below).

>  	VM_BUG_ON(!PageCgroupUsed(pc));
> -	VM_BUG_ON(pc->mem_cgroup != from);
> +	VM_BUG_ON(id_to_memcg(pc->mem_cgroup, true) != from);
>  
>  	if (PageCgroupFileMapped(pc)) {
>  		/* Update mapped_file data for mem_cgroup */
> @@ -1994,7 +2005,7 @@ static void __mem_cgroup_move_account(st
>  		mem_cgroup_cancel_charge(from);
>  
>  	/* caller should have done css_get */
> -	pc->mem_cgroup = to;
> +	pc->mem_cgroup = css_id(&to->css);
>  	mem_cgroup_charge_statistics(to, pc, true);
>  	/*
>  	 * We charges against "to" which may not have any tasks. Then, "to"
> @@ -2014,11 +2025,11 @@ static int mem_cgroup_move_account(struc
>  {
>  	int ret = -EINVAL;
>  	lock_page_cgroup(pc);

Should this be rcu_read_lock() instead of lock_page_cgroup()?
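
To make the question concrete, an untested sketch of what I am thinking of
(reusing id_to_memcg() from this patch; safe=false because we would only be
holding the rcu read lock here):

	rcu_read_lock();
	if (PageCgroupUsed(pc) && id_to_memcg(pc->mem_cgroup, false) == from) {
		__mem_cgroup_move_account(pc, from, to, uncharge);
		ret = 0;
	}
	rcu_read_unlock();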

> -	if (PageCgroupUsed(pc) && pc->mem_cgroup == from) {
> +	if (PageCgroupUsed(pc) && id_to_memcg(pc->mem_cgroup, true) == from) {
>  		__mem_cgroup_move_account(pc, from, to, uncharge);
>  		ret = 0;
>  	}
> -	unlock_page_cgroup(pc);
> +	rcu_read_unlock();
>  	/*
>  	 * check events
>  	 */
> @@ -2244,7 +2255,7 @@ __mem_cgroup_commit_charge_swapin(struct
>  
>  		id = swap_cgroup_record(ent, 0);
>  		rcu_read_lock();
> -		memcg = id_to_memcg(id);
> +		memcg = id_to_memcg(id, false);
>  		if (memcg) {
>  			/*
>  			 * This recorded memcg can be obsolete one. So, avoid
> @@ -2354,7 +2365,7 @@ __mem_cgroup_uncharge_common(struct page
>  
>  	lock_page_cgroup(pc);
>  
> -	mem = pc->mem_cgroup;
> +	mem = id_to_memcg(pc->mem_cgroup, true);
>  
>  	if (!PageCgroupUsed(pc))
>  		goto unlock_out;
> @@ -2509,7 +2520,7 @@ void mem_cgroup_uncharge_swap(swp_entry_
>  
>  	id = swap_cgroup_record(ent, 0);
>  	rcu_read_lock();
> -	memcg = id_to_memcg(id);
> +	memcg = id_to_memcg(id, false);
>  	if (memcg) {
>  		/*
>  		 * We uncharge this because swap is freed.
> @@ -2600,7 +2611,7 @@ int mem_cgroup_prepare_migration(struct 
>  	pc = lookup_page_cgroup(page);
>  	lock_page_cgroup(pc);
>  	if (PageCgroupUsed(pc)) {
> -		mem = pc->mem_cgroup;
> +		mem = id_to_memcg(pc->mem_cgroup, true);
>  		css_get(&mem->css);
>  		/*
>  		 * At migrating an anonymous page, its mapcount goes down
> @@ -4442,7 +4453,8 @@ static int is_target_pte_for_mc(struct v
>  		 * mem_cgroup_move_account() checks the pc is valid or not under
>  		 * the lock.
>  		 */
> -		if (PageCgroupUsed(pc) && pc->mem_cgroup == mc.from) {
> +		if (PageCgroupUsed(pc) &&
> +			id_to_memcg(pc->mem_cgroup, true) == mc.from) {
>  			ret = MC_TARGET_PAGE;
>  			if (target)
>  				target->page = page;

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH] memcg: use ID in page_cgroup
  2010-08-24  7:47     ` Greg Thelen
@ 2010-08-24  7:51       ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 87+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-08-24  7:51 UTC (permalink / raw)
  To: Greg Thelen
  Cc: linux-mm, nishimura, balbir, m-ikeda, akpm, linux-kernel,
	kamezawa.hiroyuki

On Tue, 24 Aug 2010 00:47:50 -0700
Greg Thelen <gthelen@google.com> wrote:

> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> writes:
> 
> > I have an idea to remove the page_cgroup->page pointer, an 8-byte reduction per page.
> > But it will be after this work.
> > ==
> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> >
> > Now, addresses of memory cgroups can be calculated from their IDs without a complex lookup.
> > This patch replaces pc->mem_cgroup, changing it from a pointer to an unsigned short.
> > On 64bit architectures, this offers us 6 more bytes of room per page_cgroup.
> > 2 bytes are used for blkio-cgroup's page tracking. 4 more bytes will be used for
> > some light-weight concurrent access.
> >
> > We may be able to move this id into the flags field but ...go step by step.
> >
> > Changelog: 20100811
> >  - using new rcu APIs, as rcu_dereference_check() etc.
> > Changelog: 20100804
> >  - added comments to page_cgroup.h
> > Changelog: 20100730
> >  - fixed some garbage added by debug code in early stage
> >
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > ---
> >  include/linux/page_cgroup.h |    6 ++++-
> >  mm/memcontrol.c             |   52 +++++++++++++++++++++++++++-----------------
> >  mm/page_cgroup.c            |    2 -
> >  3 files changed, 38 insertions(+), 22 deletions(-)
> >
> > Index: mmotm-0811/include/linux/page_cgroup.h
> > ===================================================================
> > --- mmotm-0811.orig/include/linux/page_cgroup.h
> > +++ mmotm-0811/include/linux/page_cgroup.h
> > @@ -9,10 +9,14 @@
> >   * page_cgroup helps us identify information about the cgroup
> >   * All page cgroups are allocated at boot or memory hotplug event,
> >   * then the page cgroup for pfn always exists.
> > + *
> > + * TODO: It seems ID for cgroup can be packed into "flags". But there will
> > + * be race between assigning ID <-> set/clear flags. Please be careful.
> >   */
> >  struct page_cgroup {
> >  	unsigned long flags;
> > -	struct mem_cgroup *mem_cgroup;
> > +	unsigned short mem_cgroup;	/* ID of assigned memory cgroup */
> > +	unsigned short blk_cgroup;	/* Not Used..but will be. */
> >  	struct page *page;
> >  	struct list_head lru;		/* per cgroup LRU list */
> >  };
> > Index: mmotm-0811/mm/page_cgroup.c
> > ===================================================================
> > --- mmotm-0811.orig/mm/page_cgroup.c
> > +++ mmotm-0811/mm/page_cgroup.c
> > @@ -15,7 +15,7 @@ static void __meminit
> >  __init_page_cgroup(struct page_cgroup *pc, unsigned long pfn)
> >  {
> >  	pc->flags = 0;
> > -	pc->mem_cgroup = NULL;
> > +	pc->mem_cgroup = 0;
> >  	pc->page = pfn_to_page(pfn);
> >  	INIT_LIST_HEAD(&pc->lru);
> >  }
> > Index: mmotm-0811/mm/memcontrol.c
> > ===================================================================
> > --- mmotm-0811.orig/mm/memcontrol.c
> > +++ mmotm-0811/mm/memcontrol.c
> > @@ -300,12 +300,13 @@ static atomic_t mem_cgroup_num;
> >  #define NR_MEMCG_GROUPS (CONFIG_MEM_CGROUP_MAX_GROUPS + 1)
> >  static struct mem_cgroup *mem_cgroups[NR_MEMCG_GROUPS] __read_mostly;
> >  
> > -/* Must be called under rcu_read_lock */
> > -static struct mem_cgroup *id_to_memcg(unsigned short id)
> > +/* Must be called under rcu_read_lock, set safe==true if under lock */
> > +static struct mem_cgroup *id_to_memcg(unsigned short id, bool safe)
> >  {
> >  	struct mem_cgroup *ret;
> >  	/* see mem_cgroup_free() */
> > -	ret = rcu_dereference_check(mem_cgroups[id], rch_read_lock_held());
> > +	ret = rcu_dereference_check(mem_cgroups[id],
> > +				rch_read_lock_held() || safe);
> >  	if (likely(ret && ret->valid))
> >  		return ret;
> >  	return NULL;
> > @@ -381,7 +382,12 @@ struct cgroup_subsys_state *mem_cgroup_c
> >  static struct mem_cgroup_per_zone *
> >  page_cgroup_zoneinfo(struct page_cgroup *pc)
> >  {
> > -	struct mem_cgroup *mem = pc->mem_cgroup;
> > +	/*
> > +	 * The caller should guarantee this "pc" is under lock. In typical
> > +	 * case, this function is called by lru function with zone->lru_lock.
> > +	 * It is a safe access.
> > +	 */
> > +	struct mem_cgroup *mem = id_to_memcg(pc->mem_cgroup, true);
> >  	int nid = page_cgroup_nid(pc);
> >  	int zid = page_cgroup_zid(pc);
> >  
> > @@ -723,6 +729,11 @@ static inline bool mem_cgroup_is_root(st
> >  	return (mem == root_mem_cgroup);
> >  }
> >  
> > +static inline bool mem_cgroup_is_rootid(unsigned short id)
> > +{
> > +	return (id == 1);
> > +}
> > +
> >  /*
> >   * Following LRU functions are allowed to be used without PCG_LOCK.
> >   * Operations are called by routine of global LRU independently from memcg.
> > @@ -755,7 +766,7 @@ void mem_cgroup_del_lru_list(struct page
> >  	 */
> >  	mz = page_cgroup_zoneinfo(pc);
> >  	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
> > -	if (mem_cgroup_is_root(pc->mem_cgroup))
> > +	if (mem_cgroup_is_rootid(pc->mem_cgroup))
> >  		return;
> >  	VM_BUG_ON(list_empty(&pc->lru));
> >  	list_del_init(&pc->lru);
> > @@ -782,7 +793,7 @@ void mem_cgroup_rotate_lru_list(struct p
> >  	 */
> >  	smp_rmb();
> >  	/* unused or root page is not rotated. */
> > -	if (!PageCgroupUsed(pc) || mem_cgroup_is_root(pc->mem_cgroup))
> > +	if (!PageCgroupUsed(pc) || mem_cgroup_is_rootid(pc->mem_cgroup))
> >  		return;
> >  	mz = page_cgroup_zoneinfo(pc);
> >  	list_move(&pc->lru, &mz->lists[lru]);
> > @@ -808,7 +819,7 @@ void mem_cgroup_add_lru_list(struct page
> >  	mz = page_cgroup_zoneinfo(pc);
> >  	MEM_CGROUP_ZSTAT(mz, lru) += 1;
> >  	SetPageCgroupAcctLRU(pc);
> > -	if (mem_cgroup_is_root(pc->mem_cgroup))
> > +	if (mem_cgroup_is_rootid(pc->mem_cgroup))
> >  		return;
> >  	list_add(&pc->lru, &mz->lists[lru]);
> >  }
> > @@ -1497,7 +1508,7 @@ void mem_cgroup_update_file_mapped(struc
> >  		return;
> >  
> >  	lock_page_cgroup(pc);
> > -	mem = pc->mem_cgroup;
> > +	mem = id_to_memcg(pc->mem_cgroup, true);
> >  	if (!mem || !PageCgroupUsed(pc))
> >  		goto done;
> >  
> > @@ -1885,14 +1896,14 @@ struct mem_cgroup *try_get_mem_cgroup_fr
> >  	pc = lookup_page_cgroup(page);
> >  	lock_page_cgroup(pc);
> >  	if (PageCgroupUsed(pc)) {
> > -		mem = pc->mem_cgroup;
> > +		mem = id_to_memcg(pc->mem_cgroup, true);
> >  		if (mem && !css_tryget(&mem->css))
> >  			mem = NULL;
> >  	} else if (PageSwapCache(page)) {
> >  		ent.val = page_private(page);
> >  		id = lookup_swap_cgroup(ent);
> >  		rcu_read_lock();
> > -		mem = id_to_memcg(id);
> > +		mem = id_to_memcg(id, false);
> >  		if (mem && !css_tryget(&mem->css))
> >  			mem = NULL;
> >  		rcu_read_unlock();
> > @@ -1921,7 +1932,7 @@ static void __mem_cgroup_commit_charge(s
> >  		return;
> >  	}
> >  
> > -	pc->mem_cgroup = mem;
> > +	pc->mem_cgroup = css_id(&mem->css);
> >  	/*
> >  	 * We access a page_cgroup asynchronously without lock_page_cgroup().
> >  	 * Especially when a page_cgroup is taken from a page, pc->mem_cgroup
> > @@ -1979,7 +1990,7 @@ static void __mem_cgroup_move_account(st
> >  	VM_BUG_ON(PageLRU(pc->page));
> >  	VM_BUG_ON(!PageCgroupLocked(pc));
> 
> Should this be VM_BUG_ON(!rcu_read_lock_held())?  I suspect that
> mem_cgroup_move_account() should grab rcu read lock (see my comment below).
> 

No. (see below)


> >  	VM_BUG_ON(!PageCgroupUsed(pc));
> > -	VM_BUG_ON(pc->mem_cgroup != from);
> > +	VM_BUG_ON(id_to_memcg(pc->mem_cgroup, true) != from);
> >  
> >  	if (PageCgroupFileMapped(pc)) {
> >  		/* Update mapped_file data for mem_cgroup */
> > @@ -1994,7 +2005,7 @@ static void __mem_cgroup_move_account(st
> >  		mem_cgroup_cancel_charge(from);
> >  
> >  	/* caller should have done css_get */
> > -	pc->mem_cgroup = to;
> > +	pc->mem_cgroup = css_id(&to->css);
> >  	mem_cgroup_charge_statistics(to, pc, true);
> >  	/*
> >  	 * We charges against "to" which may not have any tasks. Then, "to"
> > @@ -2014,11 +2025,11 @@ static int mem_cgroup_move_account(struc
> >  {
> >  	int ret = -EINVAL;
> >  	lock_page_cgroup(pc);
> 
> Should this be rcu_read_lock() instead of lock_page_cgroup()?
> 

No.
 - lock_page_cgroup() is for keeping the page_cgroup's status stable.
 - rcu_read_lock() is for delaying the freeing of the mem_cgroup object.

rcu_read_lock() just delays freeing of the object; it does not prevent
racy updates. All _updates_ require a proper lock or speculative logic such
as atomic_inc_not_zero() etc... Basically, RCU here is for avoiding
use-after-free.
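
To illustrate with code that is already in this patch (this is just the
pattern from try_get_mem_cgroup_from_page(), nothing new): the rcu read
side only guarantees the memcg is not freed while we look at it; using it
beyond that window still needs the speculative refcount:

	rcu_read_lock();
	mem = id_to_memcg(id, false);
	/* speculative get; fails if the group is already being destroyed */
	if (mem && !css_tryget(&mem->css))
		mem = NULL;
	rcu_read_unlock();
	/* if non-NULL, mem is now pinned by its css refcount, not by RCU */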

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH] memcg: towards I/O aware memcg v5
  2010-08-24  7:46   ` Balbir Singh
@ 2010-08-24  7:59     ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 87+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-08-24  7:59 UTC (permalink / raw)
  To: balbir
  Cc: linux-mm, nishimura, gthelen, m-ikeda, akpm, linux-kernel,
	kamezawa.hiroyuki

On Tue, 24 Aug 2010 13:16:17 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2010-08-20 18:55:52]:
> 
> > This is v5.
> > 
> > Sorry for delaying...but I had time for resetting myself and..several
> > changes are added. I think this version is simpler than v4.
> > 
> > Major changes from v4 is 
> >  a) added kernel/cgroup.c hooks again. (for b)
> >  b) make RCU aware. previous version seems dangerous in an extreme case.
> > 
> > Then, codes are updated. Most of changes are related to RCU.
> > 
> > Patch brief view:
> >  1. add hooks to kernel/cgroup.c for ID management.
> >  2. use ID-array in memcg.
> >  3. record ID to page_cgroup rather than pointer.
> >  4. make update_file_mapped to be RCU aware routine instead of spinlock.
> >  5. make update_file_mapped as general-purpose function.
> >
> 
> Thanks for being persistent; I will review the patches and add comments
> to the relevant ones.
> 

Thank you. I may be able to post v6 tomorrow.

-Kame


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH] memcg: use ID in page_cgroup
  2010-08-24  7:51       ` KAMEZAWA Hiroyuki
@ 2010-08-24  8:35         ` Greg Thelen
  -1 siblings, 0 replies; 87+ messages in thread
From: Greg Thelen @ 2010-08-24  8:35 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, nishimura, balbir, m-ikeda, akpm, linux-kernel,
	kamezawa.hiroyuki

KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> writes:

> On Tue, 24 Aug 2010 00:47:50 -0700
> Greg Thelen <gthelen@google.com> wrote:
>
>> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> writes:
>> 
>> > I have an idea to remove the page_cgroup->page pointer, an 8-byte reduction per page.
>> > But it will be after this work.
>> > ==
>> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>> >
>> > Now, addresses of memory cgroups can be calculated from their IDs without a complex lookup.
>> > This patch replaces pc->mem_cgroup, changing it from a pointer to an unsigned short.
>> > On 64bit architectures, this offers us 6 more bytes of room per page_cgroup.
>> > 2 bytes are used for blkio-cgroup's page tracking. 4 more bytes will be used for
>> > some light-weight concurrent access.
>> >
>> > We may be able to move this id into the flags field but ...go step by step.
>> >
>> > Changelog: 20100811
>> >  - using new rcu APIs, as rcu_dereference_check() etc.
>> > Changelog: 20100804
>> >  - added comments to page_cgroup.h
>> > Changelog: 20100730
>> >  - fixed some garbage added by debug code in early stage
>> >
>> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>> > ---
>> >  include/linux/page_cgroup.h |    6 ++++-
>> >  mm/memcontrol.c             |   52 +++++++++++++++++++++++++++-----------------
>> >  mm/page_cgroup.c            |    2 -
>> >  3 files changed, 38 insertions(+), 22 deletions(-)
>> >
>> > Index: mmotm-0811/include/linux/page_cgroup.h
>> > ===================================================================
>> > --- mmotm-0811.orig/include/linux/page_cgroup.h
>> > +++ mmotm-0811/include/linux/page_cgroup.h
>> > @@ -9,10 +9,14 @@
>> >   * page_cgroup helps us identify information about the cgroup
>> >   * All page cgroups are allocated at boot or memory hotplug event,
>> >   * then the page cgroup for pfn always exists.
>> > + *
>> > + * TODO: It seems ID for cgroup can be packed into "flags". But there will
>> > + * be race between assigning ID <-> set/clear flags. Please be careful.
>> >   */
>> >  struct page_cgroup {
>> >  	unsigned long flags;
>> > -	struct mem_cgroup *mem_cgroup;
>> > +	unsigned short mem_cgroup;	/* ID of assigned memory cgroup */
>> > +	unsigned short blk_cgroup;	/* Not Used..but will be. */
>> >  	struct page *page;
>> >  	struct list_head lru;		/* per cgroup LRU list */
>> >  };
>> > Index: mmotm-0811/mm/page_cgroup.c
>> > ===================================================================
>> > --- mmotm-0811.orig/mm/page_cgroup.c
>> > +++ mmotm-0811/mm/page_cgroup.c
>> > @@ -15,7 +15,7 @@ static void __meminit
>> >  __init_page_cgroup(struct page_cgroup *pc, unsigned long pfn)
>> >  {
>> >  	pc->flags = 0;
>> > -	pc->mem_cgroup = NULL;
>> > +	pc->mem_cgroup = 0;
>> >  	pc->page = pfn_to_page(pfn);
>> >  	INIT_LIST_HEAD(&pc->lru);
>> >  }
>> > Index: mmotm-0811/mm/memcontrol.c
>> > ===================================================================
>> > --- mmotm-0811.orig/mm/memcontrol.c
>> > +++ mmotm-0811/mm/memcontrol.c
>> > @@ -300,12 +300,13 @@ static atomic_t mem_cgroup_num;
>> >  #define NR_MEMCG_GROUPS (CONFIG_MEM_CGROUP_MAX_GROUPS + 1)
>> >  static struct mem_cgroup *mem_cgroups[NR_MEMCG_GROUPS] __read_mostly;
>> >  
>> > -/* Must be called under rcu_read_lock */
>> > -static struct mem_cgroup *id_to_memcg(unsigned short id)
>> > +/* Must be called under rcu_read_lock, set safe==true if under lock */
>> > +static struct mem_cgroup *id_to_memcg(unsigned short id, bool safe)
>> >  {
>> >  	struct mem_cgroup *ret;
>> >  	/* see mem_cgroup_free() */
>> > -	ret = rcu_dereference_check(mem_cgroups[id], rcu_read_lock_held());
>> > +	ret = rcu_dereference_check(mem_cgroups[id],
>> > +				rcu_read_lock_held() || safe);
>> >  	if (likely(ret && ret->valid))
>> >  		return ret;
>> >  	return NULL;
>> > @@ -381,7 +382,12 @@ struct cgroup_subsys_state *mem_cgroup_c
>> >  static struct mem_cgroup_per_zone *
>> >  page_cgroup_zoneinfo(struct page_cgroup *pc)
>> >  {
>> > -	struct mem_cgroup *mem = pc->mem_cgroup;
>> > +	/*
>> > +	 * The caller should guarantee this "pc" is under lock. In typical
>> > +	 * case, this function is called by lru function with zone->lru_lock.
>> > +	 * It is a safe access.
>> > +	 */
>> > +	struct mem_cgroup *mem = id_to_memcg(pc->mem_cgroup, true);
>> >  	int nid = page_cgroup_nid(pc);
>> >  	int zid = page_cgroup_zid(pc);
>> >  
>> > @@ -723,6 +729,11 @@ static inline bool mem_cgroup_is_root(st
>> >  	return (mem == root_mem_cgroup);
>> >  }
>> >  
>> > +static inline bool mem_cgroup_is_rootid(unsigned short id)
>> > +{
>> > +	return (id == 1);
>> > +}
>> > +
>> >  /*
>> >   * Following LRU functions are allowed to be used without PCG_LOCK.
>> >   * Operations are called by routine of global LRU independently from memcg.
>> > @@ -755,7 +766,7 @@ void mem_cgroup_del_lru_list(struct page
>> >  	 */
>> >  	mz = page_cgroup_zoneinfo(pc);
>> >  	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
>> > -	if (mem_cgroup_is_root(pc->mem_cgroup))
>> > +	if (mem_cgroup_is_rootid(pc->mem_cgroup))
>> >  		return;
>> >  	VM_BUG_ON(list_empty(&pc->lru));
>> >  	list_del_init(&pc->lru);
>> > @@ -782,7 +793,7 @@ void mem_cgroup_rotate_lru_list(struct p
>> >  	 */
>> >  	smp_rmb();
>> >  	/* unused or root page is not rotated. */
>> > -	if (!PageCgroupUsed(pc) || mem_cgroup_is_root(pc->mem_cgroup))
>> > +	if (!PageCgroupUsed(pc) || mem_cgroup_is_rootid(pc->mem_cgroup))
>> >  		return;
>> >  	mz = page_cgroup_zoneinfo(pc);
>> >  	list_move(&pc->lru, &mz->lists[lru]);
>> > @@ -808,7 +819,7 @@ void mem_cgroup_add_lru_list(struct page
>> >  	mz = page_cgroup_zoneinfo(pc);
>> >  	MEM_CGROUP_ZSTAT(mz, lru) += 1;
>> >  	SetPageCgroupAcctLRU(pc);
>> > -	if (mem_cgroup_is_root(pc->mem_cgroup))
>> > +	if (mem_cgroup_is_rootid(pc->mem_cgroup))
>> >  		return;
>> >  	list_add(&pc->lru, &mz->lists[lru]);
>> >  }
>> > @@ -1497,7 +1508,7 @@ void mem_cgroup_update_file_mapped(struc
>> >  		return;
>> >  
>> >  	lock_page_cgroup(pc);
>> > -	mem = pc->mem_cgroup;
>> > +	mem = id_to_memcg(pc->mem_cgroup, true);
>> >  	if (!mem || !PageCgroupUsed(pc))
>> >  		goto done;
>> >  
>> > @@ -1885,14 +1896,14 @@ struct mem_cgroup *try_get_mem_cgroup_fr
>> >  	pc = lookup_page_cgroup(page);
>> >  	lock_page_cgroup(pc);
>> >  	if (PageCgroupUsed(pc)) {
>> > -		mem = pc->mem_cgroup;
>> > +		mem = id_to_memcg(pc->mem_cgroup, true);
>> >  		if (mem && !css_tryget(&mem->css))
>> >  			mem = NULL;
>> >  	} else if (PageSwapCache(page)) {
>> >  		ent.val = page_private(page);
>> >  		id = lookup_swap_cgroup(ent);
>> >  		rcu_read_lock();
>> > -		mem = id_to_memcg(id);
>> > +		mem = id_to_memcg(id, false);
>> >  		if (mem && !css_tryget(&mem->css))
>> >  			mem = NULL;
>> >  		rcu_read_unlock();
>> > @@ -1921,7 +1932,7 @@ static void __mem_cgroup_commit_charge(s
>> >  		return;
>> >  	}
>> >  
>> > -	pc->mem_cgroup = mem;
>> > +	pc->mem_cgroup = css_id(&mem->css);
>> >  	/*
>> >  	 * We access a page_cgroup asynchronously without lock_page_cgroup().
>> >  	 * Especially when a page_cgroup is taken from a page, pc->mem_cgroup
>> > @@ -1979,7 +1990,7 @@ static void __mem_cgroup_move_account(st
>> >  	VM_BUG_ON(PageLRU(pc->page));
>> >  	VM_BUG_ON(!PageCgroupLocked(pc));
>> 
>> Should this be VM_BUG_ON(!rcu_read_lock_held())?  I suspect that
>> mem_cgroup_move_account() should grab rcu read lock (see my comment below).
>> 
>
> No. (see below)
>
>
>> >  	VM_BUG_ON(!PageCgroupUsed(pc));
>> > -	VM_BUG_ON(pc->mem_cgroup != from);
>> > +	VM_BUG_ON(id_to_memcg(pc->mem_cgroup, true) != from);
>> >  
>> >  	if (PageCgroupFileMapped(pc)) {
>> >  		/* Update mapped_file data for mem_cgroup */
>> > @@ -1994,7 +2005,7 @@ static void __mem_cgroup_move_account(st
>> >  		mem_cgroup_cancel_charge(from);
>> >  
>> >  	/* caller should have done css_get */
>> > -	pc->mem_cgroup = to;
>> > +	pc->mem_cgroup = css_id(&to->css);
>> >  	mem_cgroup_charge_statistics(to, pc, true);
>> >  	/*
>> >  	 * We charges against "to" which may not have any tasks. Then, "to"
>> > @@ -2014,11 +2025,11 @@ static int mem_cgroup_move_account(struc
>> >  {
>> >  	int ret = -EINVAL;
>> >  	lock_page_cgroup(pc);
>> 
>> Should this be rcu_read_lock() instead of lock_page_cgroup()?
>> 
>
> No. 
>  - lock_page_cgroup() is for keeping the page_cgroup's status stable.
>  - rcu_read_lock() is for delaying the freeing of the mem_cgroup object.
>
> rcu_read_lock() just delays discarding the object; it is not for avoiding
> racy updates. All _updates_ require a proper lock or speculative logic such
> as atomic_inc_not_zero() etc... Basically, RCU is for avoiding use-after-free.
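
So, if I follow, the free side (per the "see mem_cgroup_free()" comment in
id_to_memcg()) unpublishes the pointer and waits a grace period before
freeing -- something like this sketch (my reconstruction, not code from your
patch; the exact calls are guesses):

	static void mem_cgroup_free(struct mem_cgroup *mem)
	{
		unsigned short id = css_id(&mem->css);

		/* unpublish: later id_to_memcg(id, ...) calls return NULL */
		rcu_assign_pointer(mem_cgroups[id], NULL);
		/* wait until every rcu_read_lock() reader has finished */
		synchronize_rcu();
		/* no reader can still hold a pointer to mem; safe to free */
		kfree(mem);
	}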


Thanks for the info.  Referring to your original patch:
> @@ -2014,11 +2025,11 @@ static int mem_cgroup_move_account(struc
>  {
>  	int ret = -EINVAL;
>  	lock_page_cgroup(pc);
> -	if (PageCgroupUsed(pc) && pc->mem_cgroup == from) {
> +	if (PageCgroupUsed(pc) && id_to_memcg(pc->mem_cgroup, true) == from) {
>  		__mem_cgroup_move_account(pc, from, to, uncharge);
>  		ret = 0;
>  	}
> -	unlock_page_cgroup(pc);
> +	rcu_read_unlock();
> 

It seems like mem_cgroup_move_account() is not balanced.  Why is
lock_page_cgroup(pc) used to lock but rcu_read_unlock() used to unlock?

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH] memcg: use ID in page_cgroup
  2010-08-24  8:35         ` Greg Thelen
@ 2010-08-24  8:38           ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 87+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-08-24  8:38 UTC (permalink / raw)
  To: Greg Thelen
  Cc: linux-mm, nishimura, balbir, m-ikeda, akpm, linux-kernel,
	kamezawa.hiroyuki

On Tue, 24 Aug 2010 01:35:37 -0700
Greg Thelen <gthelen@google.com> wrote:

> > rcu_read_lock() just delays discarding the object; it is not for avoiding
> > racy updates. All _updates_ require a proper lock or speculative logic such
> > as atomic_inc_not_zero() etc... Basically, RCU is for avoiding use-after-free.
> 
> 
> Thanks for the info.  Referring to your original patch:
> > @@ -2014,11 +2025,11 @@ static int mem_cgroup_move_account(struc
> >  {
> >  	int ret = -EINVAL;
> >  	lock_page_cgroup(pc);
> > -	if (PageCgroupUsed(pc) && pc->mem_cgroup == from) {
> > +	if (PageCgroupUsed(pc) && id_to_memcg(pc->mem_cgroup, true) == from) {
> >  		__mem_cgroup_move_account(pc, from, to, uncharge);
> >  		ret = 0;
> >  	}
> > -	unlock_page_cgroup(pc);
> > +	rcu_read_unlock();
> > 
> 
> It seems like mem_cgroup_move_account() is not balanced.  Why is
> lock_page_cgroup(pc) used to lock but rcu_read_unlock() used to unlock?
> 

Nice catch. It's a bug. It seems my eyes were corrupted..
It will be fixed in the next version. Sorry for the bad code.
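
The fix is simply to keep the pair balanced, i.e. (untested):

	lock_page_cgroup(pc);
	if (PageCgroupUsed(pc) && id_to_memcg(pc->mem_cgroup, true) == from) {
		__mem_cgroup_move_account(pc, from, to, uncharge);
		ret = 0;
	}
	unlock_page_cgroup(pc);	/* pc is stable under this lock; no RCU needed */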

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 1/5] cgroup: ID notification call back
  2010-08-20  9:58   ` KAMEZAWA Hiroyuki
@ 2010-08-24  9:04     ` Li Zefan
  0 siblings, 0 replies; 87+ messages in thread
From: Li Zefan @ 2010-08-24  9:04 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, nishimura, balbir, gthelen, m-ikeda, akpm,
	linux-kernel, kamezawa.hiroyuki, menage

KAMEZAWA Hiroyuki wrote:
> CC'ed to Paul Menage and Li Zefan.
> ==
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> When a cgroup subsystem uses IDs (ss->use_id==1), each css's ID is assigned
> after a successful call of ->create(). A css_ID is tightly coupled with the
> css struct itself, but the css is allocated by the ->create() call, IOW, by
> a per-subsystem special allocation.
> 
> To know the css_id before creation, this patch adds an id_attached() callback,
> invoked after css_ID allocation. This will be used by memory cgroup's quick
> lookup routine.
> 
> Maybe you can think of other implementations, e.g.
> 	- pass the ID to ->create()
> 	or
> 	- add a post_create()
> 	etc...
> But considering the dirtiness of the code, this straightforward patch seems
> good to me. If someone wants post_create(), this patch can be replaced.
> 
> Changelog: 20100820
>  - new approach.
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Acked-by: Li Zefan <lizf@cn.fujitsu.com>

...
>  
> Index: mmotm-0811/include/linux/cgroup.h
> ===================================================================
> --- mmotm-0811.orig/include/linux/cgroup.h
> +++ mmotm-0811/include/linux/cgroup.h
> @@ -475,6 +475,7 @@ struct cgroup_subsys {
>  			struct cgroup *cgrp);
>  	void (*post_clone)(struct cgroup_subsys *ss, struct cgroup *cgrp);
>  	void (*bind)(struct cgroup_subsys *ss, struct cgroup *root);
> +	void (*id_attached)(struct cgroup_subsys *ss, struct cgroup *cgrp);

Maybe passing the id number to id_attached() would be better.

And actually the @ss argument is not necessary, because the memcg's
id_attached() handler of course knows it's dealing with the memory
cgroup subsystem.

So I suspect we can just remove all the @ss from all the callbacks..
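
IOW, something like this (just a sketch of the signature):

	void (*id_attached)(struct cgroup *cgrp, int id);

Then the handler gets the allocated id directly instead of looking it up.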


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 1/5] cgroup: ID notification call back
  2010-08-24  9:04     ` Li Zefan
@ 2010-08-24 23:58       ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 87+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-08-24 23:58 UTC (permalink / raw)
  To: Li Zefan
  Cc: linux-mm, nishimura, balbir, gthelen, m-ikeda, akpm,
	linux-kernel, kamezawa.hiroyuki, menage

On Tue, 24 Aug 2010 17:04:52 +0800
Li Zefan <lizf@cn.fujitsu.com> wrote:

> KAMEZAWA Hiroyuki wrote:
> > CC'ed to Paul Menage and Li Zefan.
> > ==
> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > 
> > When a cgroup subsystem uses IDs (ss->use_id==1), each css's ID is assigned
> > after a successful call of ->create(). A css_ID is tightly coupled with the
> > css struct itself, but the css is allocated by the ->create() call, IOW, by
> > a per-subsystem special allocation.
> > 
> > To know the css_id before creation, this patch adds an id_attached() callback,
> > invoked after css_ID allocation. This will be used by memory cgroup's quick
> > lookup routine.
> > 
> > Maybe you can think of other implementations, e.g.
> > 	- pass the ID to ->create()
> > 	or
> > 	- add a post_create()
> > 	etc...
> > But considering the dirtiness of the code, this straightforward patch seems
> > good to me. If someone wants post_create(), this patch can be replaced.
> > 
> > Changelog: 20100820
> >  - new approach.
> > 
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> Acked-by: Li Zefan <lizf@cn.fujitsu.com>
> 
> ...
> >  
> > Index: mmotm-0811/include/linux/cgroup.h
> > ===================================================================
> > --- mmotm-0811.orig/include/linux/cgroup.h
> > +++ mmotm-0811/include/linux/cgroup.h
> > @@ -475,6 +475,7 @@ struct cgroup_subsys {
> >  			struct cgroup *cgrp);
> >  	void (*post_clone)(struct cgroup_subsys *ss, struct cgroup *cgrp);
> >  	void (*bind)(struct cgroup_subsys *ss, struct cgroup *root);
> > +	void (*id_attached)(struct cgroup_subsys *ss, struct cgroup *cgrp);
> 
> Maybe passing the id number to id_attached() would be better.
> 
ok.

> And actually the @ss argument is not necessary, because the memcg's
> id_attached() handler of course knows it's dealing with the memory
> cgroup subsystem.
> 
> So I suspect we can just remove all the @ss from all the callbacks..
> 

I agree. But could you write the "remove ss" patch later?
It seems to be a design change. I'll leave "ss" as it is for now.

Thanks
-Kame



^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 1/5] cgroup: ID notification call back
  2010-08-20  9:58   ` KAMEZAWA Hiroyuki
@ 2010-08-25  0:09     ` Paul Menage
  0 siblings, 0 replies; 87+ messages in thread
From: Paul Menage @ 2010-08-25  0:09 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, nishimura, balbir, gthelen, m-ikeda, akpm,
	linux-kernel, kamezawa.hiroyuki, lizf

On Fri, Aug 20, 2010 at 2:58 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> CC'ed to Paul Menage and Li Zefan.
> ==
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
> When a cgroup subsystem uses IDs (ss->use_id==1), each css's ID is assigned
> after a successful call of ->create(). A css_ID is tightly coupled with the
> css struct itself, but the css is allocated by the ->create() call, IOW, by
> a per-subsystem special allocation.
>
> To know the css_id before creation, this patch adds an id_attached() callback,
> invoked after css_ID allocation. This will be used by memory cgroup's quick
> lookup routine.
>
> Maybe you can think of other implementations, e.g.
>        - pass the ID to ->create()
>        or
>        - add a post_create()
>        etc...
> But considering the dirtiness of the code, this straightforward patch seems
> good to me. If someone wants post_create(), this patch can be replaced.

I think I'd prefer the approach where any necessary css_ids are
allocated prior to calling any create methods (which gives the
additional advantage of removing the need to roll back partial
creation of a cgroup in the event of alloc_css_id() failing) and then
passed in to the create() method. The main cgroups framework would
still be responsible for actually filling the css->id field with the
allocated id.
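
i.e., roughly (sketch only, not a final signature):

	/* css_id is allocated by the cgroups core before ->create() runs */
	struct cgroup_subsys_state *(*create)(struct cgroup_subsys *ss,
					      struct cgroup *cgrp,
					      int css_id);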

Paul

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 1/5] cgroup: ID notification call back
  2010-08-24  9:04     ` Li Zefan
@ 2010-08-25  0:11       ` Paul Menage
  0 siblings, 0 replies; 87+ messages in thread
From: Paul Menage @ 2010-08-25  0:11 UTC (permalink / raw)
  To: Li Zefan
  Cc: KAMEZAWA Hiroyuki, linux-mm, nishimura, balbir, gthelen, m-ikeda,
	akpm, linux-kernel, kamezawa.hiroyuki

On Tue, Aug 24, 2010 at 2:04 AM, Li Zefan <lizf@cn.fujitsu.com> wrote:
>
> Maybe passing the id number to id_attached() would be better.
>
> And actually the @ss argument is not necessary, because the memcg's
> id_attached() handler of course knows it's dealing with the memory
> cgroup subsystem.
>
> So I suspect we can just remove all the @ss from all the callbacks..

Yes, I don't think any subsystem uses these. They date originally
from when, as part of the initial cgroups framework, I included a
library that could wrap a mostly-unmodified CKRM resource controller
into a cgroups subsystem, at which point the callback code didn't
necessarily know which subsystem it was being called for. But that's
obsolete now.

Paul

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 1/5] cgroup: ID notification call back
  2010-08-25  0:11       ` Paul Menage
@ 2010-08-25  0:17         ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 87+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-08-25  0:17 UTC (permalink / raw)
  To: Paul Menage
  Cc: Li Zefan, linux-mm, nishimura, balbir, gthelen, m-ikeda, akpm,
	linux-kernel, kamezawa.hiroyuki

On Tue, 24 Aug 2010 17:11:22 -0700
Paul Menage <menage@google.com> wrote:

> On Tue, Aug 24, 2010 at 2:04 AM, Li Zefan <lizf@cn.fujitsu.com> wrote:
> >
> > Maybe passing the id number to id_attached() would be better.
> >
> > And actually the @ss argument is not necessary, because the memcg's
> > id_attached() handler of course knows it's dealing with the memory
> > cgroup subsystem.
> >
> > So I suspect we can just remove all the @ss from all the callbacks..
> 
> Yes, I don't think any subsystem uses these. They date originally
> from when, as part of the initial cgroups framework, I included a
> library that could wrap a mostly-unmodified CKRM resource controller
> into a cgroups subsystem, at which point the callback code didn't
> necessarily know which subsystem it was being called for. But that's
> obsolete now.
> 

Hmm, then, should I remove it in the next version, or leave it as it is
for now, to be removed by a separate total cleanup?

(IOW, is a temporary mixture of inconsistent interfaces O.K.?)

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 1/5] cgroup: ID notification call back
  2010-08-25  0:09     ` Paul Menage
@ 2010-08-25  0:20       ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 87+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-08-25  0:20 UTC (permalink / raw)
  To: Paul Menage
  Cc: linux-mm, nishimura, balbir, gthelen, m-ikeda, akpm,
	linux-kernel, kamezawa.hiroyuki, lizf

On Tue, 24 Aug 2010 17:09:25 -0700
Paul Menage <menage@google.com> wrote:

> On Fri, Aug 20, 2010 at 2:58 AM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > CC'ed to Paul Menage and Li Zefan.
> > ==
> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> >
> > When a cgroup subsystem uses IDs (ss->use_id==1), each css's ID is assigned
> > after a successful call of ->create(). A css_ID is tightly coupled with the
> > css struct itself, but the css is allocated by the ->create() call, IOW, by
> > a per-subsystem special allocation.
> >
> > To know the css_id before creation, this patch adds an id_attached() callback,
> > invoked after css_ID allocation. This will be used by memory cgroup's quick
> > lookup routine.
> >
> > Maybe you can think of other implementations, e.g.
> >        - pass the ID to ->create()
> >        or
> >        - add a post_create()
> >        etc...
> > But considering the dirtiness of the code, this straightforward patch seems
> > good to me. If someone wants post_create(), this patch can be replaced.
> 
> I think I'd prefer the approach where any necessary css_ids are
> allocated prior to calling any create methods (which gives the
> additional advantage of removing the need to roll back partial
> creation of a cgroup in the event of alloc_css_id() failing) and then
> passed in to the create() method. The main cgroups framework would
> still be responsible for actually filling the css->id field with the
> allocated id.
> 

Hmm, sure. I'll change the ->create() interface.  O.K. ?

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 1/5] cgroup: ID notification call back
  2010-08-25  0:17         ` KAMEZAWA Hiroyuki
@ 2010-08-25  0:25           ` Paul Menage
  0 siblings, 0 replies; 87+ messages in thread
From: Paul Menage @ 2010-08-25  0:25 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Li Zefan, linux-mm, nishimura, balbir, gthelen, m-ikeda, akpm,
	linux-kernel, kamezawa.hiroyuki

On Tue, Aug 24, 2010 at 5:17 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
>
> Hmm, then, should I remove it in the next version, or leave it as it is
> for now, to be removed by a separate total cleanup?

I think these are two separate issues. A cleanup to remove the ss
parameters from the existing callback methods would be separate from
your proposed id callback, which I think is better done (as I
mentioned above) by passing the id to create().

Paul

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 1/5] cgroup: ID notification call back
  2010-08-25  0:20       ` KAMEZAWA Hiroyuki
@ 2010-08-25  0:34         ` Paul Menage
  0 siblings, 0 replies; 87+ messages in thread
From: Paul Menage @ 2010-08-25  0:34 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, nishimura, balbir, gthelen, m-ikeda, akpm,
	linux-kernel, kamezawa.hiroyuki, lizf

On Tue, Aug 24, 2010 at 5:20 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
>
> Hmm, sure. I'll change the ->create() interface.  O.K. ?
>

Hmm. An alternative (possibly cleaner) would be:

1) add a css_size field in cgroup_subsys that contains the size of the
per-subsystem structure
2) change cgroups to allocate and populate the css *before* calling
create(), since it now knows the actual size
3) simplify all the subsystem create() methods since they no longer
have to worry about allocation or out-of-memory handling
4) also add a top_css field in cgroups that allows cpusets to use the
statically-allocated top_cpuset since it's initialized prior to memory
allocation being reliable

This avoids us having to pass in any new parameters to the create()
method in future since they can be populated in the CSS.
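
With (2) and (3), a subsystem's create() shrinks to pure initialization.
Roughly, taking cpusets as an example (sketch only; assumes the usual
cgroup_cs() container_of helper, and that the core has already allocated,
zeroed and attached the css before calling create()):

	static int cpuset_create(struct cgroup_subsys *ss, struct cgroup *cgrp)
	{
		struct cpuset *cs = cgroup_cs(cgrp);

		fmeter_init(&cs->fmeter);
		cs->relax_domain_level = -1;
		return 0;	/* no allocation here, so no rollback path */
	}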

Paul

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 1/5] cgroup: ID notification call back
  2010-08-25  0:34         ` Paul Menage
@ 2010-08-25  0:37           ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 87+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-08-25  0:37 UTC (permalink / raw)
  To: Paul Menage
  Cc: linux-mm, nishimura, balbir, gthelen, m-ikeda, akpm,
	linux-kernel, kamezawa.hiroyuki, lizf

On Tue, 24 Aug 2010 17:34:38 -0700
Paul Menage <menage@google.com> wrote:

> On Tue, Aug 24, 2010 at 5:20 PM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> >
> > Hmm, sure. I'll change the ->create() interface.  O.K. ?
> >
> 
> Hmm. An alternative (possibly cleaner) would be:
> 
> 1) add a css_size field in cgroup_subsys that contains the size of the
> per-subsystem structure
> 2) change cgroups to allocate and populate the css *before* calling
> create(), since it now knows the actual size
> 3) simplify all the subsystem create() methods since they no longer
> have to worry about allocation or out-of-memory handling
> 4) also add a top_css field in cgroups that allows cpusets to use the
> statically-allocated top_cpuset since it's initialized prior to memory
> allocation being reliable
> 
> This avoids us having to pass in any new parameters to the create()
> method in future since they can be populated in the CSS.
> 

Ou...I'm sorry, but I would like to use id_attached() for now.
Forgive me, the above seems like a big change.
I'd like to write a series of patches to do the above later,
at least as a trial.

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 1/5] cgroup: ID notification call back
  2010-08-25  0:37           ` KAMEZAWA Hiroyuki
@ 2010-08-25  0:46             ` Paul Menage
  0 siblings, 0 replies; 87+ messages in thread
From: Paul Menage @ 2010-08-25  0:46 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, nishimura, balbir, gthelen, m-ikeda, akpm,
	linux-kernel, kamezawa.hiroyuki, lizf

On Tue, Aug 24, 2010 at 5:37 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
>
> Ou...I'm sorry, but I would like to use id_attached() for now.
> Forgive me, the above seems like a big change.
> I'd like to write a series of patches to do the above later,
> at least as a trial.
>

Sure, for testing outside the tree. I don't think introducing new code
into mainline that's intentionally ugly and cutting corners is a good
idea.

Paul

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 1/5] cgroup: ID notification call back
  2010-08-25  0:46             ` Paul Menage
@ 2010-08-25  1:03               ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 87+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-08-25  1:03 UTC (permalink / raw)
  To: Paul Menage
  Cc: linux-mm, nishimura, balbir, gthelen, m-ikeda, akpm,
	linux-kernel, kamezawa.hiroyuki, lizf

On Tue, 24 Aug 2010 17:46:03 -0700
Paul Menage <menage@google.com> wrote:

> On Tue, Aug 24, 2010 at 5:37 PM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> >
> > Ou...I'm sorry, but I would like to use id_attached() for now.
> > Forgive me, the above seems like a big change.
> > I'd like to write a series of patches to do the above later,
> > at least as a trial.
> >
> 
> Sure, for testing outside the tree. I don't think introducing new code
> into mainline that's intentionally ugly and cutting corners is a good
> idea.
> 

Hmm. How does this pseudo code look? This passes the "new id" via the
cgroup->subsys[] array at creation. (Using a union might be better.)

---
 include/linux/cgroup.h |    1 +
 kernel/cgroup.c        |   26 +++++++++++++++++++-------
 2 files changed, 20 insertions(+), 7 deletions(-)

Index: mmotm-0811/include/linux/cgroup.h
===================================================================
--- mmotm-0811.orig/include/linux/cgroup.h
+++ mmotm-0811/include/linux/cgroup.h
@@ -508,6 +508,7 @@ struct cgroup_subsys {
 	struct cgroupfs_root *root;
 	struct list_head sibling;
 	/* used when use_id == true */
+	int max_id;
 	struct idr idr;
 	spinlock_t id_lock;
 
Index: mmotm-0811/kernel/cgroup.c
===================================================================
--- mmotm-0811.orig/kernel/cgroup.c
+++ mmotm-0811/kernel/cgroup.c
@@ -3335,19 +3335,31 @@ static long cgroup_create(struct cgroup 
 		set_bit(CGRP_NOTIFY_ON_RELEASE, &cgrp->flags);
 
 	for_each_subsys(root, ss) {
-		struct cgroup_subsys_state *css = ss->create(ss, cgrp);
+		struct cgroup_subsys_state *css;
 
-		if (IS_ERR(css)) {
-			err = PTR_ERR(css);
-			goto err_destroy;
-		}
-		init_cgroup_css(css, ss, cgrp);
 		if (ss->use_id) {
 			err = alloc_css_id(ss, parent, cgrp);
 			if (err)
 				goto err_destroy;
+			/*
+ 			 * Here, created css_id is recorded into
+ 			 * cgrp->subsys[ss->subsys_id]
+			 * array and passed to subsystem.
+			 */
+		} else
+			cgrp->subsys[ss->subsys_id] = NULL;
+
+		css = ss->create(ss, cgrp);
+
+		if (IS_ERR(css)) {
+			/* forget preallocated id */
+			if (cgrp->subsys[ss->subsys_id])
+				free_css_id_direct((struct css_id *)
+					cgrp->subsys[ss->subsys_id]);
+			err = PTR_ERR(css);
+			goto err_destroy;
 		}
-		/* At error, ->destroy() callback has to free assigned ID. */
+		init_cgroup_css(css, ss, cgrp);
 	}
 
 	cgroup_lock_hierarchy(root);

	


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 1/5] cgroup: ID notification call back
  2010-08-25  1:03               ` KAMEZAWA Hiroyuki
@ 2010-08-25  1:35                 ` Paul Menage
  -1 siblings, 0 replies; 87+ messages in thread
From: Paul Menage @ 2010-08-25  1:35 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, nishimura, balbir, gthelen, m-ikeda, akpm,
	linux-kernel, kamezawa.hiroyuki, lizf

On Tue, Aug 24, 2010 at 6:03 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
>
> Hmm. How this pseudo code looks like ? This passes "new id" via
> cgroup->subsys[array] at creation. (Using union will be better, maybe).
>

That's rather ugly. I was thinking of something more like this. (Not
even compiled yet, and the only subsystem updated is cpuset).

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index ed3e92e..063d9f2 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -458,8 +458,7 @@ void cgroup_release_and_wakeup_rmdir(struct cgroup_subsys_state *css);
  */

 struct cgroup_subsys {
-	struct cgroup_subsys_state *(*create)(struct cgroup_subsys *ss,
-						  struct cgroup *cgrp);
+	int (*create)(struct cgroup_subsys *ss, struct cgroup *cgrp);
 	int (*pre_destroy)(struct cgroup_subsys *ss, struct cgroup *cgrp);
 	void (*destroy)(struct cgroup_subsys *ss, struct cgroup *cgrp);
 	int (*can_attach)(struct cgroup_subsys *ss, struct cgroup *cgrp,
@@ -513,6 +512,12 @@ struct cgroup_subsys {

 	/* should be defined only by modular subsystems */
 	struct module *module;
+
+	/* Total size of the subsystem's CSS object */
+	size_t css_size;
+
+	/* If non-NULL, the CSS to use for the root cgroup */
+	struct cgroup_subsys_state *root_css;
 };

 #define SUBSYS(_x) extern struct cgroup_subsys _x ## _subsys;
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 192f88c..c589a41 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -3307,6 +3307,7 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
 			     mode_t mode)
 {
 	struct cgroup *cgrp;
+	struct cgroup_subsys_state *new_css[CGROUP_SUBSYS_COUNT] = {};
 	struct cgroupfs_root *root = parent->root;
 	int err = 0;
 	struct cgroup_subsys *ss;
@@ -3325,6 +3325,16 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,

 	mutex_lock(&cgroup_mutex);

+	for_each_subsys(root, ss) {
+		int id = ss->subsys_id;
+		new_css[id] = kzalloc(ss->css_size, GFP_KERNEL);
+		if (!new_css[id]) {
+			/* Failed to allocate memory */
+			err = -ENOMEM;
+			goto err_destroy;
+		}
+	}
+
 	init_cgroup_housekeeping(cgrp);

 	cgrp->parent = parent;
@@ -3335,19 +3346,19 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
 		set_bit(CGRP_NOTIFY_ON_RELEASE, &cgrp->flags);

 	for_each_subsys(root, ss) {
-		struct cgroup_subsys_state *css = ss->create(ss, cgrp);
-
-		if (IS_ERR(css)) {
-			err = PTR_ERR(css);
-			goto err_destroy;
-		}
-		init_cgroup_css(css, ss, cgrp);
+		int id = ss->subsys_id;
+		init_cgroup_css(new_css[id], ss, cgrp);
 		if (ss->use_id) {
 			err = alloc_css_id(ss, parent, cgrp);
 			if (err)
 				goto err_destroy;
 		}
-		/* At error, ->destroy() callback has to free assigned ID. */
+		err = ss->create(ss, cgrp);
+		if (err) {
+			free_css_id(ss, new_css[id]);
+			goto err_destroy;
+		}
+		new_css[id] = NULL;
 	}

 	cgroup_lock_hierarchy(root);
@@ -3380,7 +3391,10 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
  err_destroy:

 	for_each_subsys(root, ss) {
-		if (cgrp->subsys[ss->subsys_id])
+		int id = ss->subsys_id;
+		if (new_css[id])
+			kfree(new_css[id]);
+		else if (cgrp->subsys[id])
 			ss->destroy(ss, cgrp);
 	}

@@ -3607,11 +3621,16 @@ static void __init cgroup_init_subsys(struct cgroup_subsys *ss)
 	/* Create the top cgroup state for this subsystem */
 	list_add(&ss->sibling, &rootnode.subsys_list);
 	ss->root = &rootnode;
-	css = ss->create(ss, dummytop);
+	if (ss->root_css)
+		css = ss->root_css;
+	else
+		css = kzalloc(ss->css_size, GFP_KERNEL);
 	/* We don't handle early failures gracefully */
-	BUG_ON(IS_ERR(css));
+	BUG_ON(!css);
 	init_cgroup_css(css, ss, dummytop);

+	BUG_ON(ss->create(ss, dummytop));
+
 	/* Update the init_css_set to contain a subsys
 	 * pointer to this state - since the subsystem is
 	 * newly registered, all tasks and hence the
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index b23c097..7720a79 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -1871,24 +1871,16 @@ static void cpuset_post_clone(struct cgroup_subsys *ss,
  *	cont:	control group that the new cpuset will be part of
  */

-static struct cgroup_subsys_state *cpuset_create(
-	struct cgroup_subsys *ss,
-	struct cgroup *cont)
+static int cpuset_create(struct cgroup_subsys *ss, struct cgroup *cont)
 {
-	struct cpuset *cs;
-	struct cpuset *parent;
-
-	if (!cont->parent) {
-		return &top_cpuset.css;
-	}
-	parent = cgroup_cs(cont->parent);
-	cs = kmalloc(sizeof(*cs), GFP_KERNEL);
-	if (!cs)
-		return ERR_PTR(-ENOMEM);
-	if (!alloc_cpumask_var(&cs->cpus_allowed, GFP_KERNEL)) {
-		kfree(cs);
-		return ERR_PTR(-ENOMEM);
-	}
+	struct cpuset *cs = cgroup_cs(cont);
+	struct cpuset *parent;
+
+	if (!cont->parent)	/* top_cpuset is set up statically */
+		return 0;
+	parent = cgroup_cs(cont->parent);
+	if (!alloc_cpumask_var(&cs->cpus_allowed, GFP_KERNEL))
+		return -ENOMEM;

 	cs->flags = 0;
 	if (is_spread_page(parent))
@@ -1903,7 +1891,7 @@ static struct cgroup_subsys_state *cpuset_create(

 	cs->parent = parent;
 	number_of_cpusets++;
-	return &cs->css ;
+	return 0;
 }

 /*
@@ -1934,6 +1922,8 @@ struct cgroup_subsys cpuset_subsys = {
 	.post_clone = cpuset_post_clone,
 	.subsys_id = cpuset_subsys_id,
 	.early_init = 1,
+	.css_size = sizeof(struct cpuset),
+	.root_css = &top_cpuset.css,
 };

 /**
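
To make the shape of this proposal concrete, a subsystem with no static root
object would shrink to roughly the following. This is an uncompiled sketch of
the proposed API only; every name is illustrative, and example_subsys_id
stands in for the enum value that SUBSYS() would generate:

/* Hypothetical subsystem under the proposed int-returning ->create(). */
struct example_state {
	struct cgroup_subsys_state css;	/* still embedded, as today */
	unsigned long counter;
};

static int example_create(struct cgroup_subsys *ss, struct cgroup *cgrp)
{
	/* core kzalloc()ed css_size bytes and installed the css already */
	struct example_state *st =
		container_of(cgrp->subsys[ss->subsys_id],
			     struct example_state, css);

	st->counter = 0;	/* only subsystem-specific init is left */
	return 0;
}

struct cgroup_subsys example_subsys = {
	.name		= "example",
	.create		= example_create,
	.subsys_id	= example_subsys_id,
	.css_size	= sizeof(struct example_state),
};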

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* Re: [PATCH 1/5] cgroup: ID notification call back
  2010-08-25  1:35                 ` Paul Menage
@ 2010-08-25  1:42                   ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 87+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-08-25  1:42 UTC (permalink / raw)
  To: Paul Menage
  Cc: linux-mm, nishimura, balbir, gthelen, m-ikeda, akpm,
	linux-kernel, kamezawa.hiroyuki, lizf

On Tue, 24 Aug 2010 18:35:00 -0700
Paul Menage <menage@google.com> wrote:

> On Tue, Aug 24, 2010 at 6:03 PM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> >
> > Hmm. How this pseudo code looks like ? This passes "new id" via
> > cgroup->subsys[array] at creation. (Using union will be better, maybe).
> >
> 
> That's rather ugly. I was thinking of something more like this. (Not
> even compiled yet, and the only subsystem updated is cpuset).
> 

Hmm, but placing the css and the subsystem's own structure on different cache
lines can increase cacheline/TLB misses, I think.

I wonder if I should stop this patch series and do something smaller.
I think having each subsystem call alloc_css_id() from its own ->create(), at
its own discretion, is much better and cleaner (as in my original design):

mem_cgroup_create()
{

	cgroup_attach_css_id(ss, cgrp, &mem->css);
}

And then, there will be no difficulty.
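
Spelled out with allocation and error handling, that sketch might read as
follows. cgroup_attach_css_id() is only the helper being proposed here, not
an existing function; mem_cgroup_alloc() and __mem_cgroup_free() are the
existing memcg allocation paths:

static struct cgroup_subsys_state * __ref
mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
{
	struct mem_cgroup *mem;

	mem = mem_cgroup_alloc();
	if (!mem)
		return ERR_PTR(-ENOMEM);
	/* the subsystem itself decides to attach an ID to its css */
	if (cgroup_attach_css_id(ss, cont, &mem->css)) {
		__mem_cgroup_free(mem);
		return ERR_PTR(-ENOMEM);
	}
	/* ... rest of the existing initialization ... */
	return &mem->css;
}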

Do we have to call alloc_css_id() in kernel/cgroup.c ?

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 1/5] cgroup: ID notification call back
  2010-08-25  1:42                   ` KAMEZAWA Hiroyuki
@ 2010-08-25  1:52                     ` Paul Menage
  -1 siblings, 0 replies; 87+ messages in thread
From: Paul Menage @ 2010-08-25  1:52 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, nishimura, balbir, gthelen, m-ikeda, akpm,
	linux-kernel, kamezawa.hiroyuki, lizf

On Tue, Aug 24, 2010 at 6:42 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
>
> Hmm, but placing css and subsystem's its own structure in different cache line
> can increase cacheline/TLB miss, I think.

My patch shouldn't affect the memory placement of any structures.
struct cgroup_subsys_state is still embedded in the per-subsystem
state.
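
That is, the layout stays as it is today: the css is embedded as the first
member of the subsystem's state, one allocation covers both, and
container_of() recovers the outer structure. A tiny illustration (the names
are hypothetical):

struct example_state {
	struct cgroup_subsys_state css;	/* same allocation, same cache line */
	int private_data;
};

static inline struct example_state *
example_from_css(struct cgroup_subsys_state *css)
{
	return container_of(css, struct example_state, css);
}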

>
> Do we have to call alloc_css_id() in kernel/cgroup.c ?

I guess not, if no-one's using it except for memcg. The general
approach of allocating the CSS in cgroup.c rather than in every
subsystem is something that I'd like to do separately, though.

Paul

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 1/5] cgroup: ID notification call back
  2010-08-25  1:52                     ` Paul Menage
@ 2010-08-25  2:29                       ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 87+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-08-25  2:29 UTC (permalink / raw)
  To: Paul Menage
  Cc: linux-mm, nishimura, balbir, gthelen, m-ikeda, akpm,
	linux-kernel, kamezawa.hiroyuki, lizf

On Tue, 24 Aug 2010 18:52:20 -0700
Paul Menage <menage@google.com> wrote:

> On Tue, Aug 24, 2010 at 6:42 PM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> >
> > Hmm, but placing css and subsystem's its own structure in different cache line
> > can increase cacheline/TLB miss, I think.
> 
> My patch shouldn't affect the memory placement of any structures.
> struct cgroup_subsys_state is still embedded in the per-subsystem
> state.
> 
> >
> > Do we have to call alloc_css_id() in kernel/cgroup.c ?
> 
> I guess not, if no-one's using it except for memcg. The general
> approach of allocating the CSS in cgroup.c rather than in every
> subsystem is something that I'd like to do separately, though.
> 

I'll use this for v6.
When you move the css allocation up to kernel/cgroup.c, this can be moved, too.
My regret is that I should have argued that CSS_ID is very tightly coupled
with the css itself, and should have used this design from the first version.

-Kame

==

---
 block/blk-cgroup.c     |    6 ++++++
 include/linux/cgroup.h |   16 +++++++++-------
 kernel/cgroup.c        |   35 +++++++++--------------------------
 mm/memcontrol.c        |    4 ++++
 4 files changed, 28 insertions(+), 33 deletions(-)

Index: mmotm-0811/kernel/cgroup.c
===================================================================
--- mmotm-0811.orig/kernel/cgroup.c
+++ mmotm-0811/kernel/cgroup.c
@@ -770,9 +770,6 @@ static struct backing_dev_info cgroup_ba
 	.capabilities	= BDI_CAP_NO_ACCT_AND_WRITEBACK,
 };
 
-static int alloc_css_id(struct cgroup_subsys *ss,
-			struct cgroup *parent, struct cgroup *child);
-
 static struct inode *cgroup_new_inode(mode_t mode, struct super_block *sb)
 {
 	struct inode *inode = new_inode(sb);
@@ -3257,7 +3254,8 @@ static void init_cgroup_css(struct cgrou
 	css->cgroup = cgrp;
 	atomic_set(&css->refcnt, 1);
 	css->flags = 0;
-	css->id = NULL;
+	if (!ss->use_id)
+		css->id = NULL;
 	if (cgrp == dummytop)
 		set_bit(CSS_ROOT, &css->flags);
 	BUG_ON(cgrp->subsys[ss->subsys_id]);
@@ -3342,11 +3340,6 @@ static long cgroup_create(struct cgroup 
 			goto err_destroy;
 		}
 		init_cgroup_css(css, ss, cgrp);
-		if (ss->use_id) {
-			err = alloc_css_id(ss, parent, cgrp);
-			if (err)
-				goto err_destroy;
-		}
 		/* At error, ->destroy() callback has to free assigned ID. */
 	}
 
@@ -3709,17 +3702,6 @@ int __init_or_module cgroup_load_subsys(
 
 	/* our new subsystem will be attached to the dummy hierarchy. */
 	init_cgroup_css(css, ss, dummytop);
-	/* init_idr must be after init_cgroup_css because it sets css->id. */
-	if (ss->use_id) {
-		int ret = cgroup_init_idr(ss, css);
-		if (ret) {
-			dummytop->subsys[ss->subsys_id] = NULL;
-			ss->destroy(ss, dummytop);
-			subsys[i] = NULL;
-			mutex_unlock(&cgroup_mutex);
-			return ret;
-		}
-	}
 
 	/*
 	 * Now we need to entangle the css into the existing css_sets. unlike
@@ -3888,8 +3870,6 @@ int __init cgroup_init(void)
 		struct cgroup_subsys *ss = subsys[i];
 		if (!ss->early_init)
 			cgroup_init_subsys(ss);
-		if (ss->use_id)
-			cgroup_init_idr(ss, init_css_set.subsys[ss->subsys_id]);
 	}
 
 	/* Add init_css_set to the hash table */
@@ -4603,8 +4583,8 @@ err_out:
 
 }
 
-static int __init_or_module cgroup_init_idr(struct cgroup_subsys *ss,
-					    struct cgroup_subsys_state *rootcss)
+static int cgroup_init_idr(struct cgroup_subsys *ss,
+			    struct cgroup_subsys_state *rootcss)
 {
 	struct css_id *newid;
 
@@ -4621,13 +4601,16 @@ static int __init_or_module cgroup_init_
 	return 0;
 }
 
-static int alloc_css_id(struct cgroup_subsys *ss, struct cgroup *parent,
-			struct cgroup *child)
+int alloc_css_id(struct cgroup_subsys *ss, struct cgroup *cgrp)
 {
 	int subsys_id, i, depth = 0;
+	struct cgroup *parent = cgrp->parent;
 	struct cgroup_subsys_state *parent_css, *child_css;
 	struct css_id *child_id, *parent_id;
 
+	if (cgrp->parent == NULL)
+		return cgroup_init_idr(ss, cgrp->subsys[ss->subsys_id]);
+
 	subsys_id = ss->subsys_id;
 	parent_css = parent->subsys[subsys_id];
-	child_css = child->subsys[subsys_id];
+	child_css = cgrp->subsys[subsys_id];
Index: mmotm-0811/include/linux/cgroup.h
===================================================================
--- mmotm-0811.orig/include/linux/cgroup.h
+++ mmotm-0811/include/linux/cgroup.h
@@ -583,9 +583,11 @@ int cgroup_attach_task_current_cg(struct
 /*
  * CSS ID is ID for cgroup_subsys_state structs under subsys. This only works
  * if cgroup_subsys.use_id == true. It can be used for looking up and scanning.
- * CSS ID is assigned at cgroup allocation (create) automatically
- * and removed when subsys calls free_css_id() function. This is because
- * the lifetime of cgroup_subsys_state is subsys's matter.
+ * CSS ID must be assigned by the subsys itself at cgroup creation and deleted
+ * when the subsys calls the free_css_id() function. This is because the
+ * lifetime of cgroup_subsys_state is the subsys's matter.
+ *
+ * ID->css look up is available after cgroup's directory is populated.
  *
  * Looking up and scanning function should be called under rcu_read_lock().
  * Taking cgroup_mutex()/hierarchy_mutex() is not necessary for following calls.
@@ -593,10 +595,10 @@ int cgroup_attach_task_current_cg(struct
  * destroyed". The caller should check css and cgroup's status.
  */
 
-/*
- * Typically Called at ->destroy(), or somewhere the subsys frees
- * cgroup_subsys_state.
- */
+/* Should be called in ->create() by subsys itself */
+int alloc_css_id(struct cgroup_subsys *ss, struct cgroup *cgrp);
+
+/* Typically called at ->destroy(), or wherever the subsys frees the css */
 void free_css_id(struct cgroup_subsys *ss, struct cgroup_subsys_state *css);
 
 /* Find a cgroup_subsys_state which has given ID */
Index: mmotm-0811/mm/memcontrol.c
===================================================================
--- mmotm-0811.orig/mm/memcontrol.c
+++ mmotm-0811/mm/memcontrol.c
@@ -4141,6 +4141,10 @@ mem_cgroup_create(struct cgroup_subsys *
 		if (alloc_mem_cgroup_per_zone_info(mem, node))
 			goto free_out;
 
+	if (alloc_css_id(ss, cont))
+		goto free_out;
+	/* Here, css_id(&mem->css) works. but css_lookup(id)->mem doesn't */
+
 	/* root ? */
 	if (cont->parent == NULL) {
 		int cpu;
Index: mmotm-0811/block/blk-cgroup.c
===================================================================
--- mmotm-0811.orig/block/blk-cgroup.c
+++ mmotm-0811/block/blk-cgroup.c
@@ -958,6 +958,7 @@ blkiocg_create(struct cgroup_subsys *sub
 {
 	struct blkio_cgroup *blkcg;
 	struct cgroup *parent = cgroup->parent;
+	int ret;
 
 	if (!parent) {
 		blkcg = &blkio_root_cgroup;
@@ -971,6 +972,11 @@ blkiocg_create(struct cgroup_subsys *sub
 	blkcg = kzalloc(sizeof(*blkcg), GFP_KERNEL);
 	if (!blkcg)
 		return ERR_PTR(-ENOMEM);
+	ret = alloc_css_id(subsys, cgroup);
+	if (ret) {
+		kfree(blkcg);
+		return ERR_PTR(ret);
+	}
 
 	blkcg->weight = BLKIO_WEIGHT_DEFAULT;
 done:
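
With css_id() valid right after alloc_css_id(), the reverse lookup becomes
possible once the cgroup directory is populated. A minimal sketch of that
lookup, along the lines of the existing mem_cgroup_lookup() in
mm/memcontrol.c (the function name here is hypothetical):

/* Must be called under rcu_read_lock(); see the comment in the hunk above. */
static struct mem_cgroup *mem_cgroup_from_id(unsigned short id)
{
	struct cgroup_subsys_state *css;

	css = css_lookup(&mem_cgroup_subsys, id);
	if (!css)
		return NULL;
	return container_of(css, struct mem_cgroup, css);
}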


^ permalink raw reply	[flat|nested] 87+ messages in thread

end of thread, other threads:[~2010-08-25  2:34 UTC | newest]

Thread overview: 87+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-08-20  9:55 [PATCH] memcg: towards I/O aware memcg v5 KAMEZAWA Hiroyuki
2010-08-20  9:58 ` [PATCH 1/5] cgroup: ID notification call back KAMEZAWA Hiroyuki
2010-08-24  7:19   ` Greg Thelen
2010-08-24  7:18     ` KAMEZAWA Hiroyuki
2010-08-24  9:04   ` Li Zefan
2010-08-24 23:58     ` KAMEZAWA Hiroyuki
2010-08-25  0:11     ` Paul Menage
2010-08-25  0:17       ` KAMEZAWA Hiroyuki
2010-08-25  0:25         ` Paul Menage
2010-08-25  0:09   ` Paul Menage
2010-08-25  0:20     ` KAMEZAWA Hiroyuki
2010-08-25  0:34       ` Paul Menage
2010-08-25  0:37         ` KAMEZAWA Hiroyuki
2010-08-25  0:46           ` Paul Menage
2010-08-25  1:03             ` KAMEZAWA Hiroyuki
2010-08-25  1:35               ` Paul Menage
2010-08-25  1:42                 ` KAMEZAWA Hiroyuki
2010-08-25  1:52                   ` Paul Menage
2010-08-25  2:29                     ` KAMEZAWA Hiroyuki
2010-08-20  9:59 ` [PATCH 2/5] memcg: use array and ID for quick look up KAMEZAWA Hiroyuki
2010-08-23  3:35   ` Daisuke Nishimura
2010-08-23 23:51     ` KAMEZAWA Hiroyuki
2010-08-24  0:19       ` Daisuke Nishimura
2010-08-24  7:44   ` Greg Thelen
2010-08-24  7:42     ` KAMEZAWA Hiroyuki
2010-08-20 10:01 ` [PATCH] memcg: use ID in page_cgroup KAMEZAWA Hiroyuki
2010-08-20 10:05   ` KAMEZAWA Hiroyuki
2010-08-23  5:32   ` Daisuke Nishimura
2010-08-23 23:52     ` KAMEZAWA Hiroyuki
2010-08-24  1:14       ` Daisuke Nishimura
2010-08-24  1:54         ` KAMEZAWA Hiroyuki
2010-08-24  4:04           ` Daisuke Nishimura
2010-08-24  6:05             ` KAMEZAWA Hiroyuki
2010-08-24  7:47   ` Greg Thelen
2010-08-24  7:51     ` KAMEZAWA Hiroyuki
2010-08-24  8:35       ` Greg Thelen
2010-08-24  8:38         ` KAMEZAWA Hiroyuki
2010-08-20 10:02 ` [PATCH 4/5] memcg: lockless update of file_mapped KAMEZAWA Hiroyuki
2010-08-23  8:50   ` Daisuke Nishimura
2010-08-23 23:49     ` KAMEZAWA Hiroyuki
2010-08-24  0:19       ` Daisuke Nishimura
2010-08-20 10:03 ` [PATCH 5/5] memcg: generic file accounting update function KAMEZAWA Hiroyuki
2010-08-24  7:46 ` [PATCH] memcg: towards I/O aware memcg v5 Balbir Singh
2010-08-24  7:59   ` KAMEZAWA Hiroyuki
