* [RFC][PATCH 0/6 v2] reducing page_cgroup size
From: KAMEZAWA Hiroyuki @ 2012-03-28 10:44 UTC
  To: linux-mm
  Cc: Linux Kernel, Michal Hocko, Johannes Weiner, Andrew Morton,
	Han Ying, Glauber Costa, Konstantin Khlebnikov, Suleiman Souhlal

Hi, here is v2 (still tagged RFC, because I changed many parts).
I'd like to post v3 without the 'RFC' tag after the LSF/MM Summit.

This series reduces the size of 'struct page_cgroup' to 8 bytes.
v2 contains 6 patches; it cleans up the code, fixes a race that was
in v1, and adds an experimental attempt to integrate struct page_cgroup
into struct page.
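
For reference, the net effect on the structure (a sketch; the exact
encoding is introduced in patch 4/6):

	/* before: two words, 16 bytes on 64-bit */
	struct page_cgroup {
		unsigned long flags;
		struct mem_cgroup *mem_cgroup;
	};

	/* after: one word; the memcg pointer lives in the upper bits
	 * of 'flags', above the flag bits (see 4/6) */
	struct page_cgroup {
		unsigned long flags;
	};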

The patches are:

1/6 .... add methods to access pc->mem_cgroup
2/6 .... add pc_set_mem_cgroup_and_flags() to set ->mem_cgroup and flags in one call
3/6 .... add PageCgroupReset() for handling a race
4/6 .... reduce the size of struct page_cgroup by removing ->mem_cgroup
5/6 .... remove an unnecessary memory barrier
6/6 .... add CONFIG_INTEGRATED_PAGE_CGROUP to place page_cgroup in struct page

Regards,
-Kame



* [RFC][PATCH 1/6] memcg: add methods to access pc->mem_cgroup
From: KAMEZAWA Hiroyuki @ 2012-03-28 10:47 UTC
  To: linux-mm
  Cc: Linux Kernel, Michal Hocko, Johannes Weiner, Andrew Morton,
	Han Ying, Glauber Costa, Konstantin Khlebnikov, Suleiman Souhlal


In order to encode pc->mem_cgroup and pc->flags into a single word,
accessor functions for pc->mem_cgroup are required.

This patch replaces direct access to pc->mem_cgroup with
 pc_to_mem_cgroup(pc)          : pc->mem_cgroup
 pc_set_mem_cgroup(pc, memcg)  : pc->mem_cgroup = memcg
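
A typical call site changes like this (an illustrative pair taken from
the mm/memcontrol.c hunks below):

	/* before */
	memcg = pc->mem_cgroup;
	/* after */
	memcg = pc_to_mem_cgroup(pc);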

Changelog:
 - rebased onto linux-next (Mar 27, 2012); handle THP page moves

Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/page_cgroup.h |   12 +++++++
 mm/memcontrol.c             |   71 ++++++++++++++++++++++--------------------
 2 files changed, 49 insertions(+), 34 deletions(-)

diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
index a88cdba..92768cb 100644
--- a/include/linux/page_cgroup.h
+++ b/include/linux/page_cgroup.h
@@ -82,6 +82,18 @@ static inline void unlock_page_cgroup(struct page_cgroup *pc)
 	bit_spin_unlock(PCG_LOCK, &pc->flags);
 }
 
+
+static inline struct mem_cgroup* pc_to_mem_cgroup(struct page_cgroup *pc)
+{
+	return pc->mem_cgroup;
+}
+
+static inline void
+pc_set_mem_cgroup(struct page_cgroup *pc, struct mem_cgroup *memcg)
+{
+	pc->mem_cgroup = memcg;
+}
+
 #else /* CONFIG_CGROUP_MEM_RES_CTLR */
 struct page_cgroup;
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b2ee6df..8077460 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1034,9 +1034,9 @@ struct lruvec *mem_cgroup_zone_lruvec(struct zone *zone,
 /*
  * Following LRU functions are allowed to be used without PCG_LOCK.
  * Operations are called by routine of global LRU independently from memcg.
- * What we have to take care of here is validness of pc->mem_cgroup.
+ * What we have to take care of here is validness of pc's mem_cgroup.
  *
- * Changes to pc->mem_cgroup happens when
+ * Changes to pc's mem_cgroup happens when
  * 1. charge
  * 2. moving account
  * In typical case, "charge" is done before add-to-lru. Exception is SwapCache.
@@ -1068,7 +1068,7 @@ struct lruvec *mem_cgroup_lru_add_list(struct zone *zone, struct page *page,
 		return &zone->lruvec;
 
 	pc = lookup_page_cgroup(page);
-	memcg = pc->mem_cgroup;
+	memcg = pc_to_mem_cgroup(pc);
 
 	/*
 	 * Surreptitiously switch any uncharged page to root:
@@ -1077,10 +1077,12 @@ struct lruvec *mem_cgroup_lru_add_list(struct zone *zone, struct page *page,
 	 *
 	 * Our caller holds lru_lock, and PageCgroupUsed is updated
 	 * under page_cgroup lock: between them, they make all uses
-	 * of pc->mem_cgroup safe.
+	 * of pc's mem_cgroup safe.
 	 */
-	if (!PageCgroupUsed(pc) && memcg != root_mem_cgroup)
-		pc->mem_cgroup = memcg = root_mem_cgroup;
+	if (!PageCgroupUsed(pc) && memcg != root_mem_cgroup) {
+		pc_set_mem_cgroup(pc, root_mem_cgroup);
+		memcg = root_mem_cgroup;
+	}
 
 	mz = page_cgroup_zoneinfo(memcg, page);
 	/* compound_order() is stabilized through lru_lock */
@@ -1108,7 +1110,7 @@ void mem_cgroup_lru_del_list(struct page *page, enum lru_list lru)
 		return;
 
 	pc = lookup_page_cgroup(page);
-	memcg = pc->mem_cgroup;
+	memcg = pc_to_mem_cgroup(pc);
 	VM_BUG_ON(!memcg);
 	mz = page_cgroup_zoneinfo(memcg, page);
 	/* huge page split is done under lru_lock. so, we have no races. */
@@ -1255,9 +1257,9 @@ mem_cgroup_get_reclaim_stat_from_page(struct page *page)
 	pc = lookup_page_cgroup(page);
 	if (!PageCgroupUsed(pc))
 		return NULL;
-	/* Ensure pc->mem_cgroup is visible after reading PCG_USED. */
+	/* Ensure pc's mem_cgroup is visible after reading PCG_USED. */
 	smp_rmb();
-	mz = page_cgroup_zoneinfo(pc->mem_cgroup, page);
+	mz = page_cgroup_zoneinfo(pc_to_mem_cgroup(pc), page);
 	return &mz->reclaim_stat;
 }
 
@@ -1334,7 +1336,7 @@ static void mem_cgroup_end_move(struct mem_cgroup *memcg)
  *
  * mem_cgroup_stolen() -  checking whether a cgroup is mc.from or not. This
  *			  is used for avoiding races in accounting.  If true,
- *			  pc->mem_cgroup may be overwritten.
+ *			  pc's mem_cgroup may be overwritten.
  *
  * mem_cgroup_under_move() - checking a cgroup is mc.from or mc.to or
  *			  under hierarchy of moving cgroups. This is for
@@ -1907,8 +1909,8 @@ bool mem_cgroup_handle_oom(struct mem_cgroup *memcg, gfp_t mask, int order)
  * file-stat operations happen after a page is attached to radix-tree. There
  * are no race with "charge".
  *
- * Considering "uncharge", we know that memcg doesn't clear pc->mem_cgroup
- * at "uncharge" intentionally. So, we always see valid pc->mem_cgroup even
+ * Considering "uncharge", we know that memcg doesn't clear pc's mem_cgroup
+ * at "uncharge" intentionally. So, we always see valid pc's mem_cgroup even
  * if there are race with "uncharge". Statistics itself is properly handled
  * by flags.
  *
@@ -1925,7 +1927,7 @@ void __mem_cgroup_begin_update_page_stat(struct page *page,
 
 	pc = lookup_page_cgroup(page);
 again:
-	memcg = pc->mem_cgroup;
+	memcg = pc_to_mem_cgroup(pc);
 	if (unlikely(!memcg || !PageCgroupUsed(pc)))
 		return;
 	/*
@@ -1938,7 +1940,7 @@ again:
 		return;
 
 	move_lock_mem_cgroup(memcg, flags);
-	if (memcg != pc->mem_cgroup || !PageCgroupUsed(pc)) {
+	if (memcg != pc_to_mem_cgroup(pc) || !PageCgroupUsed(pc)) {
 		move_unlock_mem_cgroup(memcg, flags);
 		goto again;
 	}
@@ -1950,11 +1952,11 @@ void __mem_cgroup_end_update_page_stat(struct page *page, unsigned long *flags)
 	struct page_cgroup *pc = lookup_page_cgroup(page);
 
 	/*
-	 * It's guaranteed that pc->mem_cgroup never changes while
-	 * lock is held because a routine modifies pc->mem_cgroup
+	 * It's guaranteed that pc's mem_cgroup never changes while
+	 * lock is held because a routine modifies pc's mem_cgroup
 	 * should take move_lock_page_cgroup().
 	 */
-	move_unlock_mem_cgroup(pc->mem_cgroup, flags);
+	move_unlock_mem_cgroup(pc_to_mem_cgroup(pc), flags);
 }
 
 void mem_cgroup_update_page_stat(struct page *page,
@@ -1967,7 +1969,7 @@ void mem_cgroup_update_page_stat(struct page *page,
 	if (mem_cgroup_disabled())
 		return;
 
-	memcg = pc->mem_cgroup;
+	memcg = pc_to_mem_cgroup(pc);
 	if (unlikely(!memcg || !PageCgroupUsed(pc)))
 		return;
 
@@ -2264,7 +2266,7 @@ static int mem_cgroup_do_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
  * has TIF_MEMDIE, this function returns -EINTR while writing root_mem_cgroup
  * to *ptr. There are two reasons for this. 1: fatal threads should quit as soon
  * as possible without any hazards. 2: all pages should have a valid
- * pc->mem_cgroup. If mm is NULL and the caller doesn't pass a valid memcg
+ * pc's mem_cgroup. If mm is NULL and the caller doesn't pass a valid memcg
  * pointer, that is treated as a charge to root_mem_cgroup.
  *
  * So __mem_cgroup_try_charge() will return
@@ -2457,7 +2459,7 @@ struct mem_cgroup *try_get_mem_cgroup_from_page(struct page *page)
 	pc = lookup_page_cgroup(page);
 	lock_page_cgroup(pc);
 	if (PageCgroupUsed(pc)) {
-		memcg = pc->mem_cgroup;
+		memcg = pc_to_mem_cgroup(pc);
 		if (memcg && !css_tryget(&memcg->css))
 			memcg = NULL;
 	} else if (PageSwapCache(page)) {
@@ -2509,11 +2511,11 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *memcg,
 		}
 	}
 
-	pc->mem_cgroup = memcg;
+	pc_set_mem_cgroup(pc, memcg);
 	/*
 	 * We access a page_cgroup asynchronously without lock_page_cgroup().
-	 * Especially when a page_cgroup is taken from a page, pc->mem_cgroup
-	 * is accessed after testing USED bit. To make pc->mem_cgroup visible
+	 * Especially when a page_cgroup is taken from a page, pc's mem_cgroup
+	 * is accessed after testing USED bit. To make pc's mem_cgroup visible
 	 * before USED bit, we need memory barrier here.
 	 * See mem_cgroup_add_lru_list(), etc.
  	 */
@@ -2558,13 +2560,14 @@ void mem_cgroup_split_huge_fixup(struct page *head)
 {
 	struct page_cgroup *head_pc = lookup_page_cgroup(head);
 	struct page_cgroup *pc;
+	struct mem_cgroup *memcg = pc_to_mem_cgroup(head_pc);
 	int i;
 
 	if (mem_cgroup_disabled())
 		return;
 	for (i = 1; i < HPAGE_PMD_NR; i++) {
 		pc = head_pc + i;
-		pc->mem_cgroup = head_pc->mem_cgroup;
+		pc_set_mem_cgroup(pc, memcg);
 		smp_wmb();/* see __commit_charge() */
 		pc->flags = head_pc->flags & ~PCGF_NOCOPY_AT_SPLIT;
 	}
@@ -2615,7 +2618,7 @@ static int mem_cgroup_move_account(struct page *page,
 	lock_page_cgroup(pc);
 
 	ret = -EINVAL;
-	if (!PageCgroupUsed(pc) || pc->mem_cgroup != from)
+	if (!PageCgroupUsed(pc) || pc_to_mem_cgroup(pc) != from)
 		goto unlock;
 
 	move_lock_mem_cgroup(from, &flags);
@@ -2633,7 +2636,7 @@ static int mem_cgroup_move_account(struct page *page,
 		__mem_cgroup_cancel_charge(from, nr_pages);
 
 	/* caller should have done css_get */
-	pc->mem_cgroup = to;
+	pc_set_mem_cgroup(pc, to);
 	mem_cgroup_charge_statistics(to, anon, nr_pages);
 	/*
 	 * We charges against "to" which may not have any tasks. Then, "to"
@@ -2976,7 +2979,7 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
 
 	lock_page_cgroup(pc);
 
-	memcg = pc->mem_cgroup;
+	memcg = pc_to_mem_cgroup(pc);
 
 	if (!PageCgroupUsed(pc))
 		goto unlock_out;
@@ -3012,7 +3015,7 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
 
 	ClearPageCgroupUsed(pc);
 	/*
-	 * pc->mem_cgroup is not cleared here. It will be accessed when it's
+	 * pc's mem_cgroup is not cleared here. It will be accessed when it's
 	 * freed from LRU. This is safe because uncharged page is expected not
 	 * to be reused (freed soon). Exception is SwapCache, it's handled by
 	 * special functions.
@@ -3234,7 +3237,7 @@ int mem_cgroup_prepare_migration(struct page *page,
 	pc = lookup_page_cgroup(page);
 	lock_page_cgroup(pc);
 	if (PageCgroupUsed(pc)) {
-		memcg = pc->mem_cgroup;
+		memcg = pc_to_mem_cgroup(pc);
 		css_get(&memcg->css);
 		/*
 		 * At migrating an anonymous page, its mapcount goes down
@@ -3379,7 +3382,7 @@ void mem_cgroup_replace_page_cache(struct page *oldpage,
 	pc = lookup_page_cgroup(oldpage);
 	/* fix accounting on old pages */
 	lock_page_cgroup(pc);
-	memcg = pc->mem_cgroup;
+	memcg = pc_to_mem_cgroup(pc);
 	mem_cgroup_charge_statistics(memcg, false, -1);
 	ClearPageCgroupUsed(pc);
 	unlock_page_cgroup(pc);
@@ -3390,7 +3393,7 @@ void mem_cgroup_replace_page_cache(struct page *oldpage,
 	/*
 	 * Even if newpage->mapping was NULL before starting replacement,
 	 * the newpage may be on LRU(or pagevec for LRU) already. We lock
-	 * LRU while we overwrite pc->mem_cgroup.
+	 * LRU while we overwrite pc's mem_cgroup.
 	 */
 	__mem_cgroup_commit_charge(memcg, newpage, 1, pc, type, true);
 }
@@ -3426,7 +3429,7 @@ void mem_cgroup_print_bad_page(struct page *page)
 	pc = lookup_page_cgroup_used(page);
 	if (pc) {
 		printk(KERN_ALERT "pc:%p pc->flags:%lx pc->mem_cgroup:%p\n",
-		       pc, pc->flags, pc->mem_cgroup);
+		       pc, pc->flags, pc_to_mem_cgroup(pc));
 	}
 }
 #endif
@@ -5238,7 +5241,7 @@ static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma,
 		 * mem_cgroup_move_account() checks the pc is valid or not under
 		 * the lock.
 		 */
-		if (PageCgroupUsed(pc) && pc->mem_cgroup == mc.from) {
+		if (PageCgroupUsed(pc) && pc_to_mem_cgroup(pc) == mc.from) {
 			ret = MC_TARGET_PAGE;
 			if (target)
 				target->page = page;
@@ -5274,7 +5277,7 @@ static enum mc_target_type get_mctgt_type_thp(struct vm_area_struct *vma,
 	if (!move_anon())
 		return ret;
 	pc = lookup_page_cgroup(page);
-	if (PageCgroupUsed(pc) && pc->mem_cgroup == mc.from) {
+	if (PageCgroupUsed(pc) && pc_to_mem_cgroup(pc) == mc.from) {
 		ret = MC_TARGET_PAGE;
 		if (target) {
 			get_page(page);
-- 
1.7.4.1




* [RFC][PATCH 2/6] memcg: add pc_set_mem_cgroup_and_flags()
From: KAMEZAWA Hiroyuki @ 2012-03-28 10:51 UTC
  To: linux-mm
  Cc: Linux Kernel, Michal Hocko, Johannes Weiner, Andrew Morton,
	Han Ying, Glauber Costa, Konstantin Khlebnikov, Suleiman Souhlal

Consolidate the code that sets pc->mem_cgroup and the USED bit, which
requires an smp_wmb() between the two stores, into a single helper.
Also remove the PCGF_NOCOPY_AT_SPLIT macro, which no longer helps
readability.
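
With the helper, the commit-charge path collapses from three steps to
one call (a sketch based on the hunks below; the smp_wmb() moves into
the helper, and PCG_LOCK is included because the caller holds
lock_page_cgroup(), which lives in the same flags word):

	/* before */
	pc_set_mem_cgroup(pc, memcg);
	smp_wmb();		/* make the pointer visible before USED */
	SetPageCgroupUsed(pc);

	/* after */
	pc_set_mem_cgroup_and_flags(pc, memcg, BIT(PCG_USED) | BIT(PCG_LOCK));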

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/page_cgroup.h |   18 ++++++++++++++++++
 mm/memcontrol.c             |   18 ++++--------------
 2 files changed, 22 insertions(+), 14 deletions(-)

diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
index 92768cb..2707809 100644
--- a/include/linux/page_cgroup.h
+++ b/include/linux/page_cgroup.h
@@ -1,6 +1,8 @@
 #ifndef __LINUX_PAGE_CGROUP_H
 #define __LINUX_PAGE_CGROUP_H
 
+#include <linux/smp.h>
+
 enum {
 	/* flags for mem_cgroup */
 	PCG_LOCK,  /* Lock for pc->mem_cgroup and following bits. */
@@ -94,6 +96,22 @@ pc_set_mem_cgroup(struct page_cgroup *pc, struct mem_cgroup *memcg)
 	pc->mem_cgroup = memcg;
 }
 
+static inline void
+pc_set_mem_cgroup_and_flags(struct page_cgroup *pc, struct mem_cgroup *memcg,
+			unsigned long flags)
+{
+	pc->mem_cgroup = memcg;
+	/*
+	 * We access a page_cgroup asynchronously without lock_page_cgroup().
+	 * Especially when a page_cgroup is taken from a page, pc's mem_cgroup
+	 * is accessed after testing USED bit. To make pc's mem_cgroup visible
+	 * before USED bit, we need memory barrier here.
+	 * See mem_cgroup_add_lru_list(), etc.
+	 */
+	smp_wmb();
+	pc->flags = flags;
+}
+
 #else /* CONFIG_CGROUP_MEM_RES_CTLR */
 struct page_cgroup;
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 8077460..d366b60 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2511,16 +2511,7 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *memcg,
 		}
 	}
 
-	pc_set_mem_cgroup(pc, memcg);
-	/*
-	 * We access a page_cgroup asynchronously without lock_page_cgroup().
-	 * Especially when a page_cgroup is taken from a page, pc's mem_cgroup
-	 * is accessed after testing USED bit. To make pc's mem_cgroup visible
-	 * before USED bit, we need memory barrier here.
-	 * See mem_cgroup_add_lru_list(), etc.
- 	 */
-	smp_wmb();
-	SetPageCgroupUsed(pc);
+	pc_set_mem_cgroup_and_flags(pc, memcg, BIT(PCG_USED) | BIT(PCG_LOCK));
 
 	if (lrucare) {
 		if (was_on_lru) {
@@ -2549,7 +2540,6 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *memcg,
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 
-#define PCGF_NOCOPY_AT_SPLIT ((1 << PCG_LOCK) | (1 << PCG_MIGRATION))
 /*
  * Because tail pages are not marked as "used", set it. We're under
  * zone->lru_lock, 'splitting on pmd' and compound_lock.
@@ -2565,11 +2555,11 @@ void mem_cgroup_split_huge_fixup(struct page *head)
 
 	if (mem_cgroup_disabled())
 		return;
+	if (!PageCgroupUsed(head_pc))
+		return;
 	for (i = 1; i < HPAGE_PMD_NR; i++) {
 		pc = head_pc + i;
-		pc_set_mem_cgroup(pc, memcg);
-		smp_wmb();/* see __commit_charge() */
-		pc->flags = head_pc->flags & ~PCGF_NOCOPY_AT_SPLIT;
+		pc_set_mem_cgroup_and_flags(pc, memcg, BIT(PCG_USED));
 	}
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
-- 
1.7.4.1




* [RFC][PATCH 3/6] memcg: add PageCgroupReset()
From: KAMEZAWA Hiroyuki @ 2012-03-28 10:57 UTC
  To: linux-mm
  Cc: Linux Kernel, Michal Hocko, Johannes Weiner, Andrew Morton,
	Han Ying, Glauber Costa, Konstantin Khlebnikov, Suleiman Souhlal


The commit "memcg: simplify LRU handling by new rule" removed PCG_ACCT_LRU,
and the bug it introduced was fixed by "memcg: fix GPF when cgroup removal
races with last exit".

That work was meant to reduce the number of flags in pc->flags; we are now
down to 3 bits. This patch adds a new flag back, I'm sorry. (Considering
kmalloc() alignment, we will be able to have 5 bits.)

This patch adds PCG_RESET, which is similar to PCG_ACCT_LRU. It is set
when mem_cgroup_add_lru_list() finds it cannot trust pc's mem_cgroup.

The reason this patch adds a (renamed) flag again is to prepare for merging
pc->flags and pc->mem_cgroup. Assume pc's mem_cgroup is encoded as

	mem_cgroup = pc->flags & ~0x7

Updating multiple bits of pc->flags without taking lock_page_cgroup()
is very dangerous, and mem_cgroup_add_lru_list() updates pc's mem_cgroup
without taking the lock. Hence the RESET bit. With it, pc_to_mem_cgroup()
becomes

	if (PageCgroupReset(pc))
		return root_mem_cgroup;
	return pc->mem_cgroup;

Setting the Reset bit can be done atomically with set_bit(), and the bit
is cleared when the USED bit is set.

Considering kmalloc()'s alignment, having 4 bits of flags will be OK.
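
To illustrate why the flag is needed (a sketch assuming the encoding
above, where the memcg pointer and the flag bits share one word):

	/* Unsafe without lock_page_cgroup(): rewriting the pointer part
	 * is a non-atomic read-modify-write of the whole word, and can
	 * clobber a concurrent bit_spin_lock()/unlock() on PCG_LOCK. */
	pc->flags = (pc->flags & 0x7) | (unsigned long)root_mem_cgroup;

	/* Safe: set_bit() is an atomic RMW on a single bit and cannot
	 * lose concurrent updates to the other bits. */
	SetPageCgroupReset(pc);	/* set_bit(PCG_RESET, &pc->flags) */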

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/page_cgroup.h |   15 ++++++++-------
 mm/memcontrol.c             |    5 +++--
 2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
index 2707809..3f3b4ff 100644
--- a/include/linux/page_cgroup.h
+++ b/include/linux/page_cgroup.h
@@ -8,6 +8,7 @@ enum {
 	PCG_LOCK,  /* Lock for pc->mem_cgroup and following bits. */
 	PCG_USED, /* this object is in use. */
 	PCG_MIGRATION, /* under page migration */
+	PCG_RESET,     /* have been reset to root_mem_cgroup */
 	__NR_PCG_FLAGS,
 };
 
@@ -70,6 +71,9 @@ SETPCGFLAG(Migration, MIGRATION)
 CLEARPCGFLAG(Migration, MIGRATION)
 TESTPCGFLAG(Migration, MIGRATION)
 
+TESTPCGFLAG(Reset, RESET)
+SETPCGFLAG(Reset, RESET)
+
 static inline void lock_page_cgroup(struct page_cgroup *pc)
 {
 	/*
@@ -84,16 +88,13 @@ static inline void unlock_page_cgroup(struct page_cgroup *pc)
 	bit_spin_unlock(PCG_LOCK, &pc->flags);
 }
 
+extern struct mem_cgroup*  root_mem_cgroup;
 
 static inline struct mem_cgroup* pc_to_mem_cgroup(struct page_cgroup *pc)
 {
-	return pc->mem_cgroup;
-}
-
-static inline void
-pc_set_mem_cgroup(struct page_cgroup *pc, struct mem_cgroup *memcg)
-{
-	pc->mem_cgroup = memcg;
+	if (likely(!PageCgroupReset(pc)))
+		return pc->mem_cgroup;
+	return root_mem_cgroup;
 }
 
 static inline void
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d366b60..622fd2e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1080,7 +1080,8 @@ struct lruvec *mem_cgroup_lru_add_list(struct zone *zone, struct page *page,
 	 * of pc's mem_cgroup safe.
 	 */
 	if (!PageCgroupUsed(pc) && memcg != root_mem_cgroup) {
-		pc_set_mem_cgroup(pc, root_mem_cgroup);
+		/* this reset bit is cleared when the page is charged */
+		SetPageCgroupReset(pc);
 		memcg = root_mem_cgroup;
 	}
 
@@ -2626,7 +2627,7 @@ static int mem_cgroup_move_account(struct page *page,
 		__mem_cgroup_cancel_charge(from, nr_pages);
 
 	/* caller should have done css_get */
-	pc_set_mem_cgroup(pc, to);
+	pc_set_mem_cgroup_and_flags(pc, to, BIT(PCG_USED) | BIT(PCG_LOCK));
 	mem_cgroup_charge_statistics(to, anon, nr_pages);
 	/*
 	 * We charges against "to" which may not have any tasks. Then, "to"
-- 
1.7.4.1




* [RFC][PATCH 4/6] memcg: remove mem_cgroup pointer from page_cgroup
From: KAMEZAWA Hiroyuki @ 2012-03-28 10:59 UTC
  To: linux-mm
  Cc: Linux Kernel, Michal Hocko, Johannes Weiner, Andrew Morton,
	Han Ying, Glauber Costa, Konstantin Khlebnikov, Suleiman Souhlal


This patch removes pc->mem_cgroup and merges the pointer into pc->flags:

 63                           4     0
  | pointer to memcg..........|flags|

After this, memory cgroup's overhead is 8 (4 on 32-bit) bytes per page.
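
Encoding and decoding are then a single OR and mask (as in the
page_cgroup.h hunk below); this relies on mem_cgroup objects being
aligned so that the low flag bits of the pointer are always zero:

	pc->flags = (unsigned long)memcg | flags;			/* encode */
	memcg = (struct mem_cgroup *)(pc->flags & ~PCG_FLAGS_MASK);	/* decode */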

Changelog:
 - added BUILD_BUG_ON()
 - update comments in Kconfig

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/page_cgroup.h |   21 +++++++++------------
 init/Kconfig                |    2 +-
 2 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
index 3f3b4ff..7e3a3c7 100644
--- a/include/linux/page_cgroup.h
+++ b/include/linux/page_cgroup.h
@@ -3,6 +3,10 @@
 
 #include <linux/smp.h>
 
+/*
+ * These flags are encoded into low bits of unsigned long,
+ * ORed with a pointer to mem_cgroup.
+ */
 enum {
 	/* flags for mem_cgroup */
 	PCG_LOCK,  /* Lock for pc->mem_cgroup and following bits. */
@@ -12,6 +16,8 @@ enum {
 	__NR_PCG_FLAGS,
 };
 
+#define PCG_FLAGS_MASK	((1 << __NR_PCG_FLAGS) - 1)
+
 #ifndef __GENERATING_BOUNDS_H
 #include <generated/bounds.h>
 
@@ -27,7 +33,6 @@ enum {
  */
 struct page_cgroup {
 	unsigned long flags;
-	struct mem_cgroup *mem_cgroup;
 };
 
 void __meminit pgdat_page_cgroup_init(struct pglist_data *pgdat);
@@ -93,7 +98,7 @@ extern struct mem_cgroup*  root_mem_cgroup;
 static inline struct mem_cgroup* pc_to_mem_cgroup(struct page_cgroup *pc)
 {
 	if (likely(!PageCgroupReset(pc)))
-		return pc->mem_cgroup;
+		return (struct mem_cgroup*)(pc->flags & ~PCG_FLAGS_MASK);
 	return root_mem_cgroup;
 }
 
@@ -101,16 +106,8 @@ static inline void
 pc_set_mem_cgroup_and_flags(struct page_cgroup *pc, struct mem_cgroup *memcg,
 			unsigned long flags)
 {
-	pc->mem_cgroup = memcg;
-	/*
-	 * We access a page_cgroup asynchronously without lock_page_cgroup().
-	 * Especially when a page_cgroup is taken from a page, pc's mem_cgroup
-	 * is accessed after testing USED bit. To make pc's mem_cgroup visible
-	 * before USED bit, we need memory barrier here.
-	 * See mem_cgroup_add_lru_list(), etc.
-	 */
-	smp_wmb();
-	pc->flags = flags;
+	BUILD_BUG_ON(__NR_PCG_FLAGS > 5); /* assume 32byte alignment. */
+	pc->flags = (unsigned long)memcg | flags;
 }
 
 #else /* CONFIG_CGROUP_MEM_RES_CTLR */
diff --git a/init/Kconfig b/init/Kconfig
index 6cfd71d..e0bfe92 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -651,7 +651,7 @@ config CGROUP_MEM_RES_CTLR
 
 	  Note that setting this option increases fixed memory overhead
 	  associated with each page of memory in the system. By this,
-	  20(40)bytes/PAGE_SIZE on 32(64)bit system will be occupied by memory
+	  4(8)bytes/PAGE_SIZE on 32(64)bit system will be occupied by memory
 	  usage tracking struct at boot. Total amount of this is printed out
 	  at boot.
 
-- 
1.7.4.1




* [RFC][PATCH 5/6] memcg: remove unnecessary memory barrier.
From: KAMEZAWA Hiroyuki @ 2012-03-28 11:00 UTC
  To: linux-mm
  Cc: Linux Kernel, Michal Hocko, Johannes Weiner, Andrew Morton,
	Han Ying, Glauber Costa, Konstantin Khlebnikov, Suleiman Souhlal

Now that the Used bit and the pointer to the memory cgroup are set by a
single store, the memory barrier ordering the Used bit against pc's
mem_cgroup is no longer necessary.
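
For reference, this is the pairing being retired (a sketch of the
pre-series scheme):

	/* writer (commit charge) */
	pc->mem_cgroup = memcg;
	smp_wmb();			/* pointer visible before USED */
	SetPageCgroupUsed(pc);

	/* reader (mem_cgroup_get_reclaim_stat_from_page) */
	if (!PageCgroupUsed(pc))
		return NULL;
	smp_rmb();			/* read USED before the pointer */
	mz = page_cgroup_zoneinfo(pc_to_mem_cgroup(pc), page);

Since patch 4/6, both stores have become one assignment to pc->flags,
so the reader's smp_rmb() has nothing left to pair with.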

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 mm/memcontrol.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 622fd2e..767bef3 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1258,8 +1258,6 @@ mem_cgroup_get_reclaim_stat_from_page(struct page *page)
 	pc = lookup_page_cgroup(page);
 	if (!PageCgroupUsed(pc))
 		return NULL;
-	/* Ensure pc's mem_cgroup is visible after reading PCG_USED. */
-	smp_rmb();
 	mz = page_cgroup_zoneinfo(pc_to_mem_cgroup(pc), page);
 	return &mz->reclaim_stat;
 }
-- 
1.7.4.1




* [RFC][PATCH 6/6] memcg: config for integrate page_cgroup into memmap
From: KAMEZAWA Hiroyuki @ 2012-03-28 11:06 UTC
  To: linux-mm
  Cc: Linux Kernel, Michal Hocko, Johannes Weiner, Andrew Morton,
	Han Ying, Glauber Costa, Konstantin Khlebnikov, Suleiman Souhlal

This is an experimental patch.
Considering 32-bit archs, I think this should be a CONFIG option...
==
Now, struct page_cgroup is an 8-byte object allocated per page.
This patch adds a config option to allocate page_cgroup inside struct page.

Pros:
  - lookup_page_cgroup() is almost zero cost (see the sketch below).
  - the implementation seems very natural.
Cons:
  - the size of 'struct page' increases (to 64 bytes in the typical case).
  - cgroup_disable=memory no longer lets users avoid the 8 bytes of
    per-page overhead.
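
With the option enabled, the lookup is just address arithmetic on the
page itself (taken from the page_cgroup.h hunk below), instead of the
table lookup that the separate allocation requires:

	static inline struct page_cgroup *lookup_page_cgroup(struct page *page)
	{
		return (struct page_cgroup *)&page->page_cgroup;
	}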

Tested 'kernel make' on tmpfs.

Config=n
 Performance counter stats for 'make -j 8':

 1,180,857,100,495 instructions              #    0.00  insns per cycle
       923,084,678 cache-misses

      71.346377273 seconds time elapsed

Config=y
Performance counter stats for 'make -j 8':

 1,178,404,304,530 instructions              #    0.00  insns per cycle
       911,098,615 cache-misses

      71.607477840 seconds time elapsed

Instructions and cache misses decreased to some extent, but there is no
visible change in total execution time...

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/mm_types.h    |    4 +++-
 include/linux/page_cgroup.h |   33 ++++++++++++++++++++++++++++++++-
 init/Kconfig                |   14 ++++++++++++++
 mm/memcontrol.c             |    3 ++-
 mm/page_alloc.c             |    1 +
 mm/page_cgroup.c            |    3 ++-
 6 files changed, 54 insertions(+), 4 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 76bbdaf..2beda78 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -141,7 +141,9 @@ struct page {
 #ifdef CONFIG_WANT_PAGE_DEBUG_FLAGS
 	unsigned long debug_flags;	/* Use atomic bitops on this */
 #endif
-
+#ifdef CONFIG_INTEGRATED_PAGE_CGROUP
+	unsigned long page_cgroup;	/* see page_cgroup.h */
+#endif
 #ifdef CONFIG_KMEMCHECK
 	/*
 	 * kmemcheck wants to track the status of each byte in a page; this
diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
index 7e3a3c7..0e02632 100644
--- a/include/linux/page_cgroup.h
+++ b/include/linux/page_cgroup.h
@@ -35,8 +35,9 @@ struct page_cgroup {
 	unsigned long flags;
 };
 
-void __meminit pgdat_page_cgroup_init(struct pglist_data *pgdat);
 
+#ifndef CONFIG_INTEGRATED_PAGE_CGROUP
+void __meminit pgdat_page_cgroup_init(struct pglist_data *pgdat);
 #ifdef CONFIG_SPARSEMEM
 static inline void __init page_cgroup_init_flatmem(void)
 {
@@ -51,6 +52,36 @@ static inline void __init page_cgroup_init(void)
 
 struct page_cgroup *lookup_page_cgroup(struct page *page);
 struct page *lookup_cgroup_page(struct page_cgroup *pc);
+static inline void memmap_init_cgroup(struct page *page)
+{
+}
+#else
+static inline struct page_cgroup *lookup_page_cgroup(struct page *page)
+{
+	return (struct page_cgroup*)&page->page_cgroup;
+}
+
+static inline struct page *lookup_cgroup_page(struct page_cgroup *pc)
+{
+	return container_of((unsigned long*)pc, struct page, page_cgroup);
+}
+
+static inline void memmap_init_cgroup(struct page *page)
+{
+	page->page_cgroup = 0;
+}
+
+static inline void __init page_cgroup_init_flatmem(void)
+{
+}
+static inline void __init page_cgroup_init(void)
+{
+}
+
+static inline void pgdat_page_cgroup_init(struct pglist_data *pgdat)
+{
+}
+#endif
 
 #define TESTPCGFLAG(uname, lname)			\
 static inline int PageCgroup##uname(struct page_cgroup *pc)	\
diff --git a/init/Kconfig b/init/Kconfig
index e0bfe92..99514c2 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -694,6 +694,20 @@ config CGROUP_MEM_RES_CTLR_SWAP_ENABLED
 	  For those who want to have the feature enabled by default should
 	  select this option (if, for some reason, they need to disable it
 	  then swapaccount=0 does the trick).
+
+config INTEGRATED_PAGE_CGROUP
+	bool "record memory cgroup information into struct page"
+	depends on CGROUP_MEM_RES_CTLR
+	default n
+	help
+	  Memory Resource Controller consumes 4/(8 if 64bit)bytes per page.
+	  It's independent of 'struct page'. If you say Y here, memory cgroup
+	  information is recorded into struct page and increase size of it
+	  4/8 bytes. With this, cpu overheads in runtime will be reduced
+	  but you cannot avoid above overheads of 4/8 bytes per page by boot
+	  option. If unsure, say N.
+
+
 config CGROUP_MEM_RES_CTLR_KMEM
 	bool "Memory Resource Controller Kernel Memory accounting (EXPERIMENTAL)"
 	depends on CGROUP_MEM_RES_CTLR && EXPERIMENTAL
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 767bef3..0c5b15c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2557,7 +2557,8 @@ void mem_cgroup_split_huge_fixup(struct page *head)
 	if (!PageCgroupUsed(head_pc))
 		return;
 	for (i = 1; i < HPAGE_PMD_NR; i++) {
-		pc = head_pc + i;
+		/* page struct is contiguous in hugepage. */
+		pc = lookup_page_cgroup(head + i);
 		pc_set_mem_cgroup_and_flags(pc, memcg, BIT(PCG_USED));
 	}
 }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0b37873..9be94df 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3682,6 +3682,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 		if (!is_highmem_idx(zone))
 			set_page_address(page, __va(pfn << PAGE_SHIFT));
 #endif
+		memmap_init_cgroup(page);
 	}
 }
 
diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
index 1ccbd71..036c8ea 100644
--- a/mm/page_cgroup.c
+++ b/mm/page_cgroup.c
@@ -11,6 +11,7 @@
 #include <linux/swapops.h>
 #include <linux/kmemleak.h>
 
+#ifndef CONFIG_INTEGRATED_PAGE_CGROUP
 static unsigned long total_usage;
 
 #if !defined(CONFIG_SPARSEMEM)
@@ -315,7 +316,7 @@ void __meminit pgdat_page_cgroup_init(struct pglist_data *pgdat)
 }
 
 #endif
-
+#endif
 
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
 
-- 
1.7.4.1




^ permalink raw reply related	[flat|nested] 24+ messages in thread
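
An aside on the lookup change above: a minimal userspace sketch of the
round-trip that CONFIG_INTEGRATED_PAGE_CGROUP relies on. The struct
layouts are simplified stand-ins, not the kernel definitions; only the
lookup_page_cgroup()/lookup_cgroup_page() pair mirrors the patch.

/* Userspace model of the integrated lookup (simplified, illustrative). */
#include <assert.h>
#include <stddef.h>

struct page_cgroup { unsigned long flags; };	/* the 8-byte goal */

struct page {
	unsigned long flags;
	unsigned long page_cgroup;	/* overlaid by struct page_cgroup */
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static struct page_cgroup *lookup_page_cgroup(struct page *page)
{
	return (struct page_cgroup *)&page->page_cgroup;
}

static struct page *lookup_cgroup_page(struct page_cgroup *pc)
{
	return container_of((unsigned long *)pc, struct page, page_cgroup);
}

int main(void)
{
	struct page page = { .page_cgroup = 0 };
	struct page_cgroup *pc = lookup_page_cgroup(&page);

	/* Both directions are pure address arithmetic: no section table
	 * walk and no extra cache line for the descriptor itself. */
	assert(lookup_cgroup_page(pc) == &page);
	return 0;
}

That address arithmetic is where the reduced instruction and cache-miss
counts in the changelog's kernel-build measurements come from.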

* Re: [RFC][PATCH 2/6] memcg: add pc_set_mem_cgroup_and_flags()
  2012-03-28 10:51   ` KAMEZAWA Hiroyuki
@ 2012-04-17 21:17     ` Ying Han
  -1 siblings, 0 replies; 24+ messages in thread
From: Ying Han @ 2012-04-17 21:17 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, Linux Kernel, Michal Hocko, Johannes Weiner,
	Andrew Morton, Glauber Costa, Konstantin Khlebnikov,
	Suleiman Souhlal

On Wed, Mar 28, 2012 at 3:51 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> Consolidate the code for setting pc->mem_cgroup and the USED bit, which requires smp_wmb().
> And remove the PCGF_NOCOPY_AT_SPLIT macro, which no longer helps code readability.
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  include/linux/page_cgroup.h |   18 ++++++++++++++++++
>  mm/memcontrol.c             |   18 ++++--------------
>  2 files changed, 22 insertions(+), 14 deletions(-)
>
> diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
> index 92768cb..2707809 100644
> --- a/include/linux/page_cgroup.h
> +++ b/include/linux/page_cgroup.h
> @@ -1,6 +1,8 @@
>  #ifndef __LINUX_PAGE_CGROUP_H
>  #define __LINUX_PAGE_CGROUP_H
>
> +#include <linux/smp.h>
> +
>  enum {
>        /* flags for mem_cgroup */
>        PCG_LOCK,  /* Lock for pc->mem_cgroup and following bits. */
> @@ -94,6 +96,22 @@ pc_set_mem_cgroup(struct page_cgroup *pc, struct mem_cgroup *memcg)
>        pc->mem_cgroup = memcg;
>  }
>
> +static inline void
> +pc_set_mem_cgroup_and_flags(struct page_cgroup *pc, struct mem_cgroup *memcg,
> +                       unsigned long flags)
> +{
> +       pc->mem_cgroup = memcg;
> +       /*
> +        * We access a page_cgroup asynchronously without lock_page_cgroup().
> +        * Especially when a page_cgroup is taken from a page, pc's mem_cgroup
> +        * is accessed after testing USED bit. To make pc's mem_cgroup visible
> +        * before USED bit, we need memory barrier here.
> +        * See mem_cgroup_add_lru_list(), etc.
> +        */
> +       smp_wmb();
> +       pc->flags = flags;
> +}
> +
>  #else /* CONFIG_CGROUP_MEM_RES_CTLR */
>  struct page_cgroup;
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 8077460..d366b60 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2511,16 +2511,7 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *memcg,
>                }
>        }
>
> -       pc_set_mem_cgroup(pc, memcg);
> -       /*
> -        * We access a page_cgroup asynchronously without lock_page_cgroup().
> -        * Especially when a page_cgroup is taken from a page, pc's mem_cgroup
> -        * is accessed after testing USED bit. To make pc's mem_cgroup visible
> -        * before USED bit, we need memory barrier here.
> -        * See mem_cgroup_add_lru_list(), etc.
> -        */
> -       smp_wmb();
> -       SetPageCgroupUsed(pc);

I might be confused. We removed this SetPageCgroupUsed() but aren't
adding it back elsewhere?

--Ying

> +       pc_set_mem_cgroup_and_flags(pc, memcg, BIT(PCG_USED) | BIT(PCG_LOCK));
>
>        if (lrucare) {
>                if (was_on_lru) {
> @@ -2549,7 +2540,6 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *memcg,
>
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>
> -#define PCGF_NOCOPY_AT_SPLIT ((1 << PCG_LOCK) | (1 << PCG_MIGRATION))
>  /*
>  * Because tail pages are not marked as "used", set it. We're under
>  * zone->lru_lock, 'splitting on pmd' and compound_lock.
> @@ -2565,11 +2555,11 @@ void mem_cgroup_split_huge_fixup(struct page *head)
>
>        if (mem_cgroup_disabled())
>                return;
> +       if (!PageCgroupUsed(head_pc))
> +               return;
>        for (i = 1; i < HPAGE_PMD_NR; i++) {
>                pc = head_pc + i;
> -               pc_set_mem_cgroup(pc, memcg);
> -               smp_wmb();/* see __commit_charge() */
> -               pc->flags = head_pc->flags & ~PCGF_NOCOPY_AT_SPLIT;
> +               pc_set_mem_cgroup_and_flags(pc, memcg, BIT(PCG_USED));
>        }
>  }
>  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
> --
> 1.7.4.1
>
>

^ permalink raw reply	[flat|nested] 24+ messages in thread
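
A sketch of the ordering contract this message discusses, with C11
fences standing in for the kernel's smp_wmb()/smp_rmb(). The reader
function is hypothetical, modelled on the mem_cgroup_add_lru_list()
pattern the comment cites; the plain (non-atomic) accesses mirror the
kernel idiom rather than strictly conforming C11.

/* Publish pc->mem_cgroup strictly before the USED bit; read them in
 * the reverse order. */
#include <stdatomic.h>
#include <stdio.h>

struct mem_cgroup { int id; };

#define BIT(nr)		(1UL << (nr))
#define PCG_LOCK	0
#define PCG_USED	1

struct page_cgroup {
	unsigned long flags;
	struct mem_cgroup *mem_cgroup;
};

/* Writer side, as in pc_set_mem_cgroup_and_flags(). */
static void pc_publish(struct page_cgroup *pc, struct mem_cgroup *memcg,
		       unsigned long flags)
{
	pc->mem_cgroup = memcg;
	atomic_thread_fence(memory_order_release);	/* ~smp_wmb() */
	pc->flags = flags;
}

/* Reader side: only trust ->mem_cgroup after seeing the USED bit. */
static struct mem_cgroup *pc_observe(struct page_cgroup *pc)
{
	if (!(pc->flags & BIT(PCG_USED)))
		return NULL;
	atomic_thread_fence(memory_order_acquire);	/* ~smp_rmb() */
	return pc->mem_cgroup;
}

int main(void)
{
	static struct mem_cgroup memcg = { .id = 1 };
	static struct page_cgroup pc;

	pc_publish(&pc, &memcg, BIT(PCG_USED) | BIT(PCG_LOCK));
	printf("observed memcg id %d\n", pc_observe(&pc)->id);
	return 0;
}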

* Re: [RFC][PATCH 3/6] memcg: add PageCgroupReset()
  2012-03-28 10:57   ` KAMEZAWA Hiroyuki
@ 2012-04-17 21:25     ` Ying Han
  -1 siblings, 0 replies; 24+ messages in thread
From: Ying Han @ 2012-04-17 21:25 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, Linux Kernel, Michal Hocko, Johannes Weiner,
	Andrew Morton, Glauber Costa, Konstantin Khlebnikov,
	Suleiman Souhlal

On Wed, Mar 28, 2012 at 3:57 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
>
>  The commit "memcg: simplify LRU handling by new rule" removed PCG_ACCT_LRU,
>  and the bug it introduced was fixed by "memcg: fix GPF when cgroup removal
>  races with last exit".
>
> That work was for reducing the flags in pc->flags; we now have 3 bits of
> flags, but this patch adds a new one, I'm sorry. (Considering kmalloc()'s
> alignment, we'll be able to have 5 bits.)
>
> This patch adds PCG_RESET which is similar to PCG_ACCT_LRU. This is set
> when mem_cgroup_add_lru_list() finds we cannot trust the pc's mem_cgroup.

Do we still need the new flag? I assume some of the upcoming patches
will guarantee that pc->mem_cgroup can be trusted.

--Ying
>
> The reason why this patch adds a (renamed) flag again is for merging
> pc->flags and pc->mem_cgroup. Assume pc's mem_cgroup is encoded as
>
>        mem_cgroup = pc->flags & ~0x7
>
> Updating multiple bits of pc->flags without taking lock_page_cgroup()
> is very dangerous, and mem_cgroup_add_lru_list() updates pc->mem_cgroup
> without taking the lock. So I add a RESET bit. After this, pc_to_mem_cgroup()
> is written as
>
>        if (PageCgroupReset(pc))
>                return root_mem_cgroup;
>        return pc->mem_cgroup;
>
> The Reset bit can be set atomically with set_bit(), and it is cleared
> when the USED bit is set.
>
> Considering kmalloc()'s alignment, having 4 bits of flags will be OK.
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  include/linux/page_cgroup.h |   15 ++++++++-------
>  mm/memcontrol.c             |    5 +++--
>  2 files changed, 11 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
> index 2707809..3f3b4ff 100644
> --- a/include/linux/page_cgroup.h
> +++ b/include/linux/page_cgroup.h
> @@ -8,6 +8,7 @@ enum {
>        PCG_LOCK,  /* Lock for pc->mem_cgroup and following bits. */
>        PCG_USED, /* this object is in use. */
>        PCG_MIGRATION, /* under page migration */
> +       PCG_RESET,     /* have been reset to root_mem_cgroup */
>        __NR_PCG_FLAGS,
>  };
>
> @@ -70,6 +71,9 @@ SETPCGFLAG(Migration, MIGRATION)
>  CLEARPCGFLAG(Migration, MIGRATION)
>  TESTPCGFLAG(Migration, MIGRATION)
>
> +TESTPCGFLAG(Reset, RESET)
> +SETPCGFLAG(Reset, RESET)
> +
>  static inline void lock_page_cgroup(struct page_cgroup *pc)
>  {
>        /*
> @@ -84,16 +88,13 @@ static inline void unlock_page_cgroup(struct page_cgroup *pc)
>        bit_spin_unlock(PCG_LOCK, &pc->flags);
>  }
>
> +extern struct mem_cgroup*  root_mem_cgroup;
>
>  static inline struct mem_cgroup* pc_to_mem_cgroup(struct page_cgroup *pc)
>  {
> -       return pc->mem_cgroup;
> -}
> -
> -static inline void
> -pc_set_mem_cgroup(struct page_cgroup *pc, struct mem_cgroup *memcg)
> -{
> -       pc->mem_cgroup = memcg;
> +       if (likely(!PageCgroupReset(pc)))
> +               return pc->mem_cgroup;
> +       return root_mem_cgroup;
>  }
>
>  static inline void
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index d366b60..622fd2e 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1080,7 +1080,8 @@ struct lruvec *mem_cgroup_lru_add_list(struct zone *zone, struct page *page,
>         * of pc's mem_cgroup safe.
>         */
>        if (!PageCgroupUsed(pc) && memcg != root_mem_cgroup) {
> -               pc_set_mem_cgroup(pc, root_mem_cgroup);
> +               /* this reset bit is cleared when the page is charged */
> +               SetPageCgroupReset(pc);
>                memcg = root_mem_cgroup;
>        }
>
> @@ -2626,7 +2627,7 @@ static int mem_cgroup_move_account(struct page *page,
>                __mem_cgroup_cancel_charge(from, nr_pages);
>
>        /* caller should have done css_get */
> -       pc_set_mem_cgroup(pc, to);
> +       pc_set_mem_cgroup_and_flags(pc, to, BIT(PCG_USED) | BIT(PCG_LOCK));
>        mem_cgroup_charge_statistics(to, anon, nr_pages);
>        /*
>         * We charges against "to" which may not have any tasks. Then, "to"
> --
> 1.7.4.1
>
>

^ permalink raw reply	[flat|nested] 24+ messages in thread
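
For readers following the encoding plan in the changelog above: a
userspace model of packing the mem_cgroup pointer and the PCG_* flags
into one unsigned long (mem_cgroup = pc->flags & ~0x7). The helper
names and the standalone word are illustrative, not the kernel code;
the low-3-bits-free assumption is the kmalloc() alignment the
changelog relies on.

/* Pointer and low-bit flags sharing a single word. */
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

#define PCG_LOCK	0
#define PCG_USED	1
#define PCG_MIGRATION	2
#define PCG_FLAG_MASK	0x7UL		/* low 3 bits carry flags */

struct mem_cgroup { int id; };

struct page_cgroup { unsigned long word; };	/* pointer | flags */

static void pc_set(struct page_cgroup *pc, struct mem_cgroup *memcg,
		   unsigned long flags)
{
	assert(((uintptr_t)memcg & PCG_FLAG_MASK) == 0);  /* alignment */
	pc->word = (uintptr_t)memcg | flags;
}

static struct mem_cgroup *pc_to_mem_cgroup(struct page_cgroup *pc)
{
	return (struct mem_cgroup *)(pc->word & ~PCG_FLAG_MASK);
}

int main(void)
{
	static alignas(8) struct mem_cgroup memcg = { .id = 1 };
	struct page_cgroup pc;

	pc_set(&pc, &memcg, (1UL << PCG_USED) | (1UL << PCG_LOCK));
	assert(pc_to_mem_cgroup(&pc) == &memcg);
	assert(pc.word & (1UL << PCG_USED));
	return 0;
}

Because setting or clearing individual bits in this shared word is no
longer independent of the pointer, updates without lock_page_cgroup()
become dangerous -- which is the motivation for the atomically set
RESET bit above.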

* Re: [RFC][PATCH 2/6] memcg: add pc_set_mem_cgroup_and_flags()
  2012-04-17 21:17     ` Ying Han
@ 2012-04-18  7:23       ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 24+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-18  7:23 UTC (permalink / raw)
  To: Ying Han
  Cc: linux-mm, Linux Kernel, Michal Hocko, Johannes Weiner,
	Andrew Morton, Glauber Costa, Konstantin Khlebnikov,
	Suleiman Souhlal

(2012/04/18 6:17), Ying Han wrote:

> On Wed, Mar 28, 2012 at 3:51 AM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>> Consolidate the code for setting pc->mem_cgroup and the USED bit, which requires smp_wmb().
>> And remove the PCGF_NOCOPY_AT_SPLIT macro, which no longer helps code readability.
>>
>> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>> ---
>>  include/linux/page_cgroup.h |   18 ++++++++++++++++++
>>  mm/memcontrol.c             |   18 ++++--------------
>>  2 files changed, 22 insertions(+), 14 deletions(-)
>>
>> diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
>> index 92768cb..2707809 100644
>> --- a/include/linux/page_cgroup.h
>> +++ b/include/linux/page_cgroup.h
>> @@ -1,6 +1,8 @@
>>  #ifndef __LINUX_PAGE_CGROUP_H
>>  #define __LINUX_PAGE_CGROUP_H
>>
>> +#include <linux/smp.h>
>> +
>>  enum {
>>        /* flags for mem_cgroup */
>>        PCG_LOCK,  /* Lock for pc->mem_cgroup and following bits. */
>> @@ -94,6 +96,22 @@ pc_set_mem_cgroup(struct page_cgroup *pc, struct mem_cgroup *memcg)
>>        pc->mem_cgroup = memcg;
>>  }
>>
>> +static inline void
>> +pc_set_mem_cgroup_and_flags(struct page_cgroup *pc, struct mem_cgroup *memcg,
>> +                       unsigned long flags)
>> +{
>> +       pc->mem_cgroup = memcg;
>> +       /*
>> +        * We access a page_cgroup asynchronously without lock_page_cgroup().
>> +        * Especially when a page_cgroup is taken from a page, pc's mem_cgroup
>> +        * is accessed after testing USED bit. To make pc's mem_cgroup visible
>> +        * before USED bit, we need memory barrier here.
>> +        * See mem_cgroup_add_lru_list(), etc.
>> +        */
>> +       smp_wmb();
>> +       pc->flags = flags;
>> +}
>> +
>>  #else /* CONFIG_CGROUP_MEM_RES_CTLR */
>>  struct page_cgroup;
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 8077460..d366b60 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -2511,16 +2511,7 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *memcg,
>>                }
>>        }
>>
>> -       pc_set_mem_cgroup(pc, memcg);
>> -       /*
>> -        * We access a page_cgroup asynchronously without lock_page_cgroup().
>> -        * Especially when a page_cgroup is taken from a page, pc's mem_cgroup
>> -        * is accessed after testing USED bit. To make pc's mem_cgroup visible
>> -        * before USED bit, we need memory barrier here.
>> -        * See mem_cgroup_add_lru_list(), etc.
>> -        */
>> -       smp_wmb();
>> -       SetPageCgroupUsed(pc);
> 
> I might be confused. We removed this SetPageCgroupUsed() but aren't
> adding it back elsewhere?
> 
> --Ying
> 
>> +       pc_set_mem_cgroup_and_flags(pc, memcg, BIT(PCG_USED) | BIT(PCG_LOCK));


Added here. This sets  

  | memcg pointer | Used | Locked |


Thanks,
-Kame





^ permalink raw reply	[flat|nested] 24+ messages in thread
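
To make the answer concrete: a toy model showing that writing the whole
flags word with BIT(PCG_USED) included subsumes the old
SetPageCgroupUsed() call. PageCgroupUsed() here is hand-expanded from
the TESTPCGFLAG() macro pattern quoted earlier, using a plain bit test
instead of the kernel's test_bit().

#include <assert.h>

#define BIT(nr)		(1UL << (nr))
#define PCG_LOCK	0
#define PCG_USED	1

struct page_cgroup { unsigned long flags; };

static int PageCgroupUsed(struct page_cgroup *pc)
{
	return !!(pc->flags & BIT(PCG_USED));
}

int main(void)
{
	struct page_cgroup pc = { .flags = 0 };

	/* One store sets both bits, as in the new
	 * __mem_cgroup_commit_charge() (the mem_cgroup store and the
	 * smp_wmb() are omitted here). */
	pc.flags = BIT(PCG_USED) | BIT(PCG_LOCK);
	assert(PageCgroupUsed(&pc));
	return 0;
}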

* Re: [RFC][PATCH 3/6] memcg: add PageCgroupReset()
  2012-04-17 21:25     ` Ying Han
@ 2012-04-18  7:34       ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 24+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-04-18  7:34 UTC (permalink / raw)
  To: Ying Han
  Cc: linux-mm, Linux Kernel, Michal Hocko, Johannes Weiner,
	Andrew Morton, Glauber Costa, Konstantin Khlebnikov,
	Suleiman Souhlal

(2012/04/18 6:25), Ying Han wrote:

> On Wed, Mar 28, 2012 at 3:57 AM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>>
>>  The commit "memcg: simplify LRU handling by new rule" removed PCG_ACCT_LRU,
>>  and the bug it introduced was fixed by "memcg: fix GPF when cgroup removal
>>  races with last exit".
>>
>> That work was for reducing the flags in pc->flags; we now have 3 bits of
>> flags, but this patch adds a new one, I'm sorry. (Considering kmalloc()'s
>> alignment, we'll be able to have 5 bits.)
>>
>> This patch adds PCG_RESET which is similar to PCG_ACCT_LRU. This is set
>> when mem_cgroup_add_lru_list() finds we cannot trust the pc's mem_cgroup.
> 
> Do we still need the new flag? I assume some of the upcoming patches
> will guarantee that pc->mem_cgroup can be trusted.
> 


If per-lruvec locking can do reference-count GC, so that a memcg is never
freed while there are pages pointing to it, then I agree it would let us
remove this whole 'move to root' logic. We'll not require a new flag.

It means that a memcg cannot be freed until no page(_cgroup) points to the
dead memcg. mem_cgroup_reset_owner() is removed, now.

We need to handle 2 cases:
  - newly allocated pages which are linked to a memcg before use.
  - unused pages which are nonetheless added to the LRU by some lazy logic.

And make a guarantee:
  - pages added to the LRU of a dead memcg will be freed or moved to
    another cgroup, soon.

I have no good idea.

Thanks,
-Kame


> --Ying
>>
>> The reason why this patch adds a (renamed) flag again is for merging
>> pc->flags and pc->mem_cgroup. Assume pc's mem_cgroup is encoded as
>>
>>        mem_cgroup = pc->flags & ~0x7
>>
>> Updating multiple bits of pc->flags without taking lock_page_cgroup()
>> is very dangerous, and mem_cgroup_add_lru_list() updates pc->mem_cgroup
>> without taking the lock. So I add a RESET bit. After this, pc_to_mem_cgroup()
>> is written as
>>
>>        if (PageCgroupReset(pc))
>>                return root_mem_cgroup;
>>        return pc->mem_cgroup;
>>
>> The Reset bit can be set atomically with set_bit(), and it is cleared
>> when the USED bit is set.
>>
>> Considering kmalloc()'s alignment, having 4 bits of flags will be OK.
>>
>> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>> ---
>>  include/linux/page_cgroup.h |   15 ++++++++-------
>>  mm/memcontrol.c             |    5 +++--
>>  2 files changed, 11 insertions(+), 9 deletions(-)
>>
>> diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
>> index 2707809..3f3b4ff 100644
>> --- a/include/linux/page_cgroup.h
>> +++ b/include/linux/page_cgroup.h
>> @@ -8,6 +8,7 @@ enum {
>>        PCG_LOCK,  /* Lock for pc->mem_cgroup and following bits. */
>>        PCG_USED, /* this object is in use. */
>>        PCG_MIGRATION, /* under page migration */
>> +       PCG_RESET,     /* have been reset to root_mem_cgroup */
>>        __NR_PCG_FLAGS,
>>  };
>>
>> @@ -70,6 +71,9 @@ SETPCGFLAG(Migration, MIGRATION)
>>  CLEARPCGFLAG(Migration, MIGRATION)
>>  TESTPCGFLAG(Migration, MIGRATION)
>>
>> +TESTPCGFLAG(Reset, RESET)
>> +SETPCGFLAG(Reset, RESET)
>> +
>>  static inline void lock_page_cgroup(struct page_cgroup *pc)
>>  {
>>        /*
>> @@ -84,16 +88,13 @@ static inline void unlock_page_cgroup(struct page_cgroup *pc)
>>        bit_spin_unlock(PCG_LOCK, &pc->flags);
>>  }
>>
>> +extern struct mem_cgroup*  root_mem_cgroup;
>>
>>  static inline struct mem_cgroup* pc_to_mem_cgroup(struct page_cgroup *pc)
>>  {
>> -       return pc->mem_cgroup;
>> -}
>> -
>> -static inline void
>> -pc_set_mem_cgroup(struct page_cgroup *pc, struct mem_cgroup *memcg)
>> -{
>> -       pc->mem_cgroup = memcg;
>> +       if (likely(!PageCgroupReset(pc)))
>> +               return pc->mem_cgroup;
>> +       return root_mem_cgroup;
>>  }
>>
>>  static inline void
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index d366b60..622fd2e 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -1080,7 +1080,8 @@ struct lruvec *mem_cgroup_lru_add_list(struct zone *zone, struct page *page,
>>         * of pc's mem_cgroup safe.
>>         */
>>        if (!PageCgroupUsed(pc) && memcg != root_mem_cgroup) {
>> -               pc_set_mem_cgroup(pc, root_mem_cgroup);
>> +               /* this reset bit is cleared when the page is charged */
>> +               SetPageCgroupReset(pc);
>>                memcg = root_mem_cgroup;
>>        }
>>
>> @@ -2626,7 +2627,7 @@ static int mem_cgroup_move_account(struct page *page,
>>                __mem_cgroup_cancel_charge(from, nr_pages);
>>
>>        /* caller should have done css_get */
>> -       pc_set_mem_cgroup(pc, to);
>> +       pc_set_mem_cgroup_and_flags(pc, to, BIT(PCG_USED) | BIT(PCG_LOCK));
>>        mem_cgroup_charge_statistics(to, anon, nr_pages);
>>        /*
>>         * We charges against "to" which may not have any tasks. Then, "to"
>> --
>> 1.7.4.1
>>
>>
> 
> 




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC][PATCH 2/6] memcg: add pc_set_mem_cgroup_and_flags()
  2012-04-18  7:23       ` KAMEZAWA Hiroyuki
@ 2012-04-18 18:18         ` Ying Han
  -1 siblings, 0 replies; 24+ messages in thread
From: Ying Han @ 2012-04-18 18:18 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, Linux Kernel, Michal Hocko, Johannes Weiner,
	Andrew Morton, Glauber Costa, Konstantin Khlebnikov,
	Suleiman Souhlal

On Wed, Apr 18, 2012 at 12:23 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> (2012/04/18 6:17), Ying Han wrote:
>
>> On Wed, Mar 28, 2012 at 3:51 AM, KAMEZAWA Hiroyuki
>> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>>> Consolidate the code for setting pc->mem_cgroup and the USED bit, which requires smp_wmb().
>>> And remove the PCGF_NOCOPY_AT_SPLIT macro, which no longer helps code readability.
>>>
>>> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>>> ---
>>>  include/linux/page_cgroup.h |   18 ++++++++++++++++++
>>>  mm/memcontrol.c             |   18 ++++--------------
>>>  2 files changed, 22 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
>>> index 92768cb..2707809 100644
>>> --- a/include/linux/page_cgroup.h
>>> +++ b/include/linux/page_cgroup.h
>>> @@ -1,6 +1,8 @@
>>>  #ifndef __LINUX_PAGE_CGROUP_H
>>>  #define __LINUX_PAGE_CGROUP_H
>>>
>>> +#include <linux/smp.h>
>>> +
>>>  enum {
>>>        /* flags for mem_cgroup */
>>>        PCG_LOCK,  /* Lock for pc->mem_cgroup and following bits. */
>>> @@ -94,6 +96,22 @@ pc_set_mem_cgroup(struct page_cgroup *pc, struct mem_cgroup *memcg)
>>>        pc->mem_cgroup = memcg;
>>>  }
>>>
>>> +static inline void
>>> +pc_set_mem_cgroup_and_flags(struct page_cgroup *pc, struct mem_cgroup *memcg,
>>> +                       unsigned long flags)
>>> +{
>>> +       pc->mem_cgroup = memcg;
>>> +       /*
>>> +        * We access a page_cgroup asynchronously without lock_page_cgroup().
>>> +        * Especially when a page_cgroup is taken from a page, pc's mem_cgroup
>>> +        * is accessed after testing USED bit. To make pc's mem_cgroup visible
>>> +        * before USED bit, we need memory barrier here.
>>> +        * See mem_cgroup_add_lru_list(), etc.
>>> +        */
>>> +       smp_wmb();
>>> +       pc->flags = flags;
>>> +}
>>> +
>>>  #else /* CONFIG_CGROUP_MEM_RES_CTLR */
>>>  struct page_cgroup;
>>>
>>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>>> index 8077460..d366b60 100644
>>> --- a/mm/memcontrol.c
>>> +++ b/mm/memcontrol.c
>>> @@ -2511,16 +2511,7 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *memcg,
>>>                }
>>>        }
>>>
>>> -       pc_set_mem_cgroup(pc, memcg);
>>> -       /*
>>> -        * We access a page_cgroup asynchronously without lock_page_cgroup().
>>> -        * Especially when a page_cgroup is taken from a page, pc's mem_cgroup
>>> -        * is accessed after testing USED bit. To make pc's mem_cgroup visible
>>> -        * before USED bit, we need memory barrier here.
>>> -        * See mem_cgroup_add_lru_list(), etc.
>>> -        */
>>> -       smp_wmb();
>>> -       SetPageCgroupUsed(pc);
>>
>> I might be confused. We removed this SetPageCgroupUsed() but aren't
>> adding it back elsewhere?
>>
>> --Ying
>>
>>> +       pc_set_mem_cgroup_and_flags(pc, memcg, BIT(PCG_USED) | BIT(PCG_LOCK));
>
>
> Added here. This sets
>
>  | memcg pointer | Used | Locked |

Ah. I missed that part.

Thanks

--Ying

>
>
> Thanks,
> -Kame
>
>
>
>

^ permalink raw reply	[flat|nested] 24+ messages in thread
