* [RFC][PATCH 0/3] page cgroup diet
@ 2012-03-19  7:56 KAMEZAWA Hiroyuki
  2012-03-19  7:59 ` [RFC][PATCH 1/3] memcg: add methods to access pc->mem_cgroup KAMEZAWA Hiroyuki
                   ` (3 more replies)
  0 siblings, 4 replies; 32+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-19  7:56 UTC (permalink / raw)
  To: linux-mm
  Cc: cgroups, Johannes Weiner, Michal Hocko, Hugh Dickins, Han Ying,
	Glauber Costa, Aneesh Kumar K.V, Andrew Morton, suleiman,
	n-horiguchi, khlebnikov, Tejun Heo

This is just an RFC... testing is not sufficient yet.

I know it's the merge window... this post is just for sharing the idea.

This patch set merges pc->flags and pc->mem_cgroup into a single word. Then
memcg's overhead will be 8 bytes per 4096-byte page.

Because this patch will affect all memory cgroup developers, I'd like to
show the patches before the MM Summit. I think we can agree on the direction
of reducing the size of page_cgroup... and finally integrating it into
'struct page' (and removing the cgroup_disable= boot option...).

Patch 1/3 - introduce pc_to_mem_cgroup and hide pc->mem_cgroup
Patch 2/3 - remove pc->mem_cgroup
Patch 3/3 - remove memory barriers.

I'm now wondering when this change should be merged....


Thanks,
-Kame


* [RFC][PATCH 1/3] memcg: add methods to access pc->mem_cgroup
  2012-03-19  7:56 [RFC][PATCH 0/3] page cgroup diet KAMEZAWA Hiroyuki
@ 2012-03-19  7:59 ` KAMEZAWA Hiroyuki
  2012-03-19 10:58   ` Glauber Costa
  2012-03-22 13:11     ` Michal Hocko
  2012-03-19  8:01 ` [RFC][PATCH 2/3] memcg: reduce size of struct page_cgroup KAMEZAWA Hiroyuki
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 32+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-19  7:59 UTC (permalink / raw)
  To: linux-mm
  Cc: cgroups, Johannes Weiner, Michal Hocko, Hugh Dickins, Han Ying,
	Glauber Costa, Aneesh Kumar K.V, Andrew Morton, suleiman,
	n-horiguchi, khlebnikov, Tejun Heo

In order to encode pc->mem_cgroup and pc->flags into a single word,
accessor functions for pc->mem_cgroup are required.

This patch replaces accesses to pc->mem_cgroup with
 pc_to_mem_cgroup(pc)          : pc->mem_cgroup
 pc_set_mem_cgroup(pc, memcg)  : pc->mem_cgroup = memcg

A following patch will remove pc->mem_cgroup.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/page_cgroup.h |   12 +++++++
 mm/memcontrol.c             |   69 ++++++++++++++++++++++--------------------
 2 files changed, 48 insertions(+), 33 deletions(-)

diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
index a88cdba..92768cb 100644
--- a/include/linux/page_cgroup.h
+++ b/include/linux/page_cgroup.h
@@ -82,6 +82,18 @@ static inline void unlock_page_cgroup(struct page_cgroup *pc)
 	bit_spin_unlock(PCG_LOCK, &pc->flags);
 }
 
+
+static inline struct mem_cgroup* pc_to_mem_cgroup(struct page_cgroup *pc)
+{
+	return pc->mem_cgroup;
+}
+
+static inline void
+pc_set_mem_cgroup(struct page_cgroup *pc, struct mem_cgroup *memcg)
+{
+	pc->mem_cgroup = memcg;
+}
+
 #else /* CONFIG_CGROUP_MEM_RES_CTLR */
 struct page_cgroup;
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c65e6bc..124fec9 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1014,9 +1014,9 @@ struct lruvec *mem_cgroup_zone_lruvec(struct zone *zone,
 /*
  * Following LRU functions are allowed to be used without PCG_LOCK.
  * Operations are called by routine of global LRU independently from memcg.
- * What we have to take care of here is validness of pc->mem_cgroup.
+ * What we have to take care of here is validness of pc's mem_cgroup.
  *
- * Changes to pc->mem_cgroup happens when
+ * Changes to pc's mem_cgroup happens when
  * 1. charge
  * 2. moving account
  * In typical case, "charge" is done before add-to-lru. Exception is SwapCache.
@@ -1048,7 +1048,7 @@ struct lruvec *mem_cgroup_lru_add_list(struct zone *zone, struct page *page,
 		return &zone->lruvec;
 
 	pc = lookup_page_cgroup(page);
-	memcg = pc->mem_cgroup;
+	memcg = pc_to_mem_cgroup(pc);
 
 	/*
 	 * Surreptitiously switch any uncharged page to root:
@@ -1057,10 +1057,12 @@ struct lruvec *mem_cgroup_lru_add_list(struct zone *zone, struct page *page,
 	 *
 	 * Our caller holds lru_lock, and PageCgroupUsed is updated
 	 * under page_cgroup lock: between them, they make all uses
-	 * of pc->mem_cgroup safe.
+	 * of pc's mem_cgroup safe.
 	 */
-	if (!PageCgroupUsed(pc) && memcg != root_mem_cgroup)
-		pc->mem_cgroup = memcg = root_mem_cgroup;
+	if (!PageCgroupUsed(pc) && memcg != root_mem_cgroup) {
+		pc_set_mem_cgroup(pc, root_mem_cgroup);
+		memcg = root_mem_cgroup;
+	}
 
 	mz = page_cgroup_zoneinfo(memcg, page);
 	/* compound_order() is stabilized through lru_lock */
@@ -1088,7 +1090,7 @@ void mem_cgroup_lru_del_list(struct page *page, enum lru_list lru)
 		return;
 
 	pc = lookup_page_cgroup(page);
-	memcg = pc->mem_cgroup;
+	memcg = pc_to_mem_cgroup(pc);
 	VM_BUG_ON(!memcg);
 	mz = page_cgroup_zoneinfo(memcg, page);
 	/* huge page split is done under lru_lock. so, we have no races. */
@@ -1235,9 +1237,9 @@ mem_cgroup_get_reclaim_stat_from_page(struct page *page)
 	pc = lookup_page_cgroup(page);
 	if (!PageCgroupUsed(pc))
 		return NULL;
-	/* Ensure pc->mem_cgroup is visible after reading PCG_USED. */
+	/* Ensure pc's mem_cgroup is visible after reading PCG_USED. */
 	smp_rmb();
-	mz = page_cgroup_zoneinfo(pc->mem_cgroup, page);
+	mz = page_cgroup_zoneinfo(pc_to_mem_cgroup(pc), page);
 	return &mz->reclaim_stat;
 }
 
@@ -1314,7 +1316,7 @@ static void mem_cgroup_end_move(struct mem_cgroup *memcg)
  *
  * mem_cgroup_stolen() -  checking whether a cgroup is mc.from or not. This
  *			  is used for avoiding races in accounting.  If true,
- *			  pc->mem_cgroup may be overwritten.
+ *			  pc's mem_cgroup may be overwritten.
  *
  * mem_cgroup_under_move() - checking a cgroup is mc.from or mc.to or
  *			  under hierarchy of moving cgroups. This is for
@@ -1887,8 +1889,8 @@ bool mem_cgroup_handle_oom(struct mem_cgroup *memcg, gfp_t mask)
  * file-stat operations happen after a page is attached to radix-tree. There
  * are no race with "charge".
  *
- * Considering "uncharge", we know that memcg doesn't clear pc->mem_cgroup
- * at "uncharge" intentionally. So, we always see valid pc->mem_cgroup even
+ * Considering "uncharge", we know that memcg doesn't clear pc's mem_cgroup
+ * at "uncharge" intentionally. So, we always see valid pc's mem_cgroup even
  * if there are race with "uncharge". Statistics itself is properly handled
  * by flags.
  *
@@ -1905,7 +1907,7 @@ void __mem_cgroup_begin_update_page_stat(struct page *page,
 
 	pc = lookup_page_cgroup(page);
 again:
-	memcg = pc->mem_cgroup;
+	memcg = pc_to_mem_cgroup(pc);
 	if (unlikely(!memcg || !PageCgroupUsed(pc)))
 		return;
 	/*
@@ -1918,7 +1920,7 @@ again:
 		return;
 
 	move_lock_mem_cgroup(memcg, flags);
-	if (memcg != pc->mem_cgroup || !PageCgroupUsed(pc)) {
+	if (memcg != pc_to_mem_cgroup(pc) || !PageCgroupUsed(pc)) {
 		move_unlock_mem_cgroup(memcg, flags);
 		goto again;
 	}
@@ -1930,11 +1932,11 @@ void __mem_cgroup_end_update_page_stat(struct page *page, unsigned long *flags)
 	struct page_cgroup *pc = lookup_page_cgroup(page);
 
 	/*
-	 * It's guaranteed that pc->mem_cgroup never changes while
-	 * lock is held because a routine modifies pc->mem_cgroup
+	 * It's guaranteed that pc's mem_cgroup never changes while
+	 * lock is held because a routine modifies pc's mem_cgroup
 	 * should take move_lock_page_cgroup().
 	 */
-	move_unlock_mem_cgroup(pc->mem_cgroup, flags);
+	move_unlock_mem_cgroup(pc_to_mem_cgroup(pc), flags);
 }
 
 void mem_cgroup_update_page_stat(struct page *page,
@@ -1947,7 +1949,7 @@ void mem_cgroup_update_page_stat(struct page *page,
 	if (mem_cgroup_disabled())
 		return;
 
-	memcg = pc->mem_cgroup;
+	memcg = pc_to_mem_cgroup(pc);
 	if (unlikely(!memcg || !PageCgroupUsed(pc)))
 		return;
 
@@ -2244,7 +2246,7 @@ static int mem_cgroup_do_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
  * has TIF_MEMDIE, this function returns -EINTR while writing root_mem_cgroup
  * to *ptr. There are two reasons for this. 1: fatal threads should quit as soon
  * as possible without any hazards. 2: all pages should have a valid
- * pc->mem_cgroup. If mm is NULL and the caller doesn't pass a valid memcg
+ * pc's mem_cgroup. If mm is NULL and the caller doesn't pass a valid memcg
  * pointer, that is treated as a charge to root_mem_cgroup.
  *
  * So __mem_cgroup_try_charge() will return
@@ -2437,7 +2439,7 @@ struct mem_cgroup *try_get_mem_cgroup_from_page(struct page *page)
 	pc = lookup_page_cgroup(page);
 	lock_page_cgroup(pc);
 	if (PageCgroupUsed(pc)) {
-		memcg = pc->mem_cgroup;
+		memcg = pc_to_mem_cgroup(pc);
 		if (memcg && !css_tryget(&memcg->css))
 			memcg = NULL;
 	} else if (PageSwapCache(page)) {
@@ -2489,11 +2491,11 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *memcg,
 		}
 	}
 
-	pc->mem_cgroup = memcg;
+	pc_set_mem_cgroup(pc, memcg);
 	/*
 	 * We access a page_cgroup asynchronously without lock_page_cgroup().
-	 * Especially when a page_cgroup is taken from a page, pc->mem_cgroup
-	 * is accessed after testing USED bit. To make pc->mem_cgroup visible
+	 * Especially when a page_cgroup is taken from a page, pc's mem_cgroup
+	 * is accessed after testing USED bit. To make pc's mem_cgroup visible
 	 * before USED bit, we need memory barrier here.
 	 * See mem_cgroup_add_lru_list(), etc.
  	 */
@@ -2538,13 +2540,14 @@ void mem_cgroup_split_huge_fixup(struct page *head)
 {
 	struct page_cgroup *head_pc = lookup_page_cgroup(head);
 	struct page_cgroup *pc;
+	struct mem_cgroup *memcg = pc_to_mem_cgroup(head_pc);
 	int i;
 
 	if (mem_cgroup_disabled())
 		return;
 	for (i = 1; i < HPAGE_PMD_NR; i++) {
 		pc = head_pc + i;
-		pc->mem_cgroup = head_pc->mem_cgroup;
+		pc_set_mem_cgroup(pc, memcg);
 		smp_wmb();/* see __commit_charge() */
 		pc->flags = head_pc->flags & ~PCGF_NOCOPY_AT_SPLIT;
 	}
@@ -2595,7 +2598,7 @@ static int mem_cgroup_move_account(struct page *page,
 	lock_page_cgroup(pc);
 
 	ret = -EINVAL;
-	if (!PageCgroupUsed(pc) || pc->mem_cgroup != from)
+	if (!PageCgroupUsed(pc) || pc_to_mem_cgroup(pc) != from)
 		goto unlock;
 
 	move_lock_mem_cgroup(from, &flags);
@@ -2613,7 +2616,7 @@ static int mem_cgroup_move_account(struct page *page,
 		__mem_cgroup_cancel_charge(from, nr_pages);
 
 	/* caller should have done css_get */
-	pc->mem_cgroup = to;
+	pc_set_mem_cgroup(pc, to);
 	mem_cgroup_charge_statistics(to, anon, nr_pages);
 	/*
 	 * We charges against "to" which may not have any tasks. Then, "to"
@@ -2956,7 +2959,7 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
 
 	lock_page_cgroup(pc);
 
-	memcg = pc->mem_cgroup;
+	memcg = pc_to_mem_cgroup(pc);
 
 	if (!PageCgroupUsed(pc))
 		goto unlock_out;
@@ -2992,7 +2995,7 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
 
 	ClearPageCgroupUsed(pc);
 	/*
-	 * pc->mem_cgroup is not cleared here. It will be accessed when it's
+	 * pc's mem_cgroup is not cleared here. It will be accessed when it's
 	 * freed from LRU. This is safe because uncharged page is expected not
 	 * to be reused (freed soon). Exception is SwapCache, it's handled by
 	 * special functions.
@@ -3214,7 +3217,7 @@ int mem_cgroup_prepare_migration(struct page *page,
 	pc = lookup_page_cgroup(page);
 	lock_page_cgroup(pc);
 	if (PageCgroupUsed(pc)) {
-		memcg = pc->mem_cgroup;
+		memcg = pc_to_mem_cgroup(pc);
 		css_get(&memcg->css);
 		/*
 		 * At migrating an anonymous page, its mapcount goes down
@@ -3359,7 +3362,7 @@ void mem_cgroup_replace_page_cache(struct page *oldpage,
 	pc = lookup_page_cgroup(oldpage);
 	/* fix accounting on old pages */
 	lock_page_cgroup(pc);
-	memcg = pc->mem_cgroup;
+	memcg = pc_to_mem_cgroup(pc);
 	mem_cgroup_charge_statistics(memcg, false, -1);
 	ClearPageCgroupUsed(pc);
 	unlock_page_cgroup(pc);
@@ -3370,7 +3373,7 @@ void mem_cgroup_replace_page_cache(struct page *oldpage,
 	/*
 	 * Even if newpage->mapping was NULL before starting replacement,
 	 * the newpage may be on LRU(or pagevec for LRU) already. We lock
-	 * LRU while we overwrite pc->mem_cgroup.
+	 * LRU while we overwrite pc's mem_cgroup.
 	 */
 	__mem_cgroup_commit_charge(memcg, newpage, 1, pc, type, true);
 }
@@ -3406,7 +3409,7 @@ void mem_cgroup_print_bad_page(struct page *page)
 	pc = lookup_page_cgroup_used(page);
 	if (pc) {
 		printk(KERN_ALERT "pc:%p pc->flags:%lx pc->mem_cgroup:%p\n",
-		       pc, pc->flags, pc->mem_cgroup);
+		       pc, pc->flags, pc_to_mem_cgroup(pc));
 	}
 }
 #endif
@@ -5197,7 +5200,7 @@ static int is_target_pte_for_mc(struct vm_area_struct *vma,
 		 * mem_cgroup_move_account() checks the pc is valid or not under
 		 * the lock.
 		 */
-		if (PageCgroupUsed(pc) && pc->mem_cgroup == mc.from) {
+		if (PageCgroupUsed(pc) && pc_to_mem_cgroup(pc) == mc.from) {
 			ret = MC_TARGET_PAGE;
 			if (target)
 				target->page = page;
-- 
1.7.4.1



* [RFC][PATCH 2/3] memcg: reduce size of struct page_cgroup.
  2012-03-19  7:56 [RFC][PATCH 0/3] page cgroup diet KAMEZAWA Hiroyuki
  2012-03-19  7:59 ` [RFC][PATCH 1/3] memcg: add methods to access pc->mem_cgroup KAMEZAWA Hiroyuki
@ 2012-03-19  8:01 ` KAMEZAWA Hiroyuki
  2012-03-19 22:20   ` Suleiman Souhlal
  2012-03-22 13:11     ` Michal Hocko
  2012-03-19  8:03   ` KAMEZAWA Hiroyuki
  2012-03-19 19:59   ` Konstantin Khlebnikov
  3 siblings, 2 replies; 32+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-19  8:01 UTC (permalink / raw)
  To: linux-mm
  Cc: cgroups, Johannes Weiner, Michal Hocko, Hugh Dickins, Han Ying,
	Glauber Costa, Aneesh Kumar K.V, Andrew Morton, suleiman,
	n-horiguchi, khlebnikov, Tejun Heo

Now, page_cgroup->flags uses only 3 bits. Considering the alignment of
struct mem_cgroup, which is allocated by kmalloc(), we can encode the
pointer to mem_cgroup and the flags into a single word.

After this patch, pc->flags is encoded as

 63                         3 2   0
  | pointer to memcg.........|flags|
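
For illustration, here is a rough user-space sketch of the encode/decode
step (my sketch, not part of the patch; pc_encode/pc_decode_memcg are
made-up names, and it assumes kmalloc() returns pointers aligned to at
least 8 bytes so the low 3 bits are free):

  #include <assert.h>

  struct mem_cgroup;                         /* opaque in this sketch */

  #define PCG_FLAGS_MASK 0x7UL               /* low 3 bits hold the flags */

  static unsigned long pc_encode(struct mem_cgroup *memcg, unsigned long bits)
  {
          unsigned long p = (unsigned long)memcg;

          assert((p & PCG_FLAGS_MASK) == 0); /* alignment keeps low bits clear */
          return p | (bits & PCG_FLAGS_MASK);
  }

  static struct mem_cgroup *pc_decode_memcg(unsigned long word)
  {
          return (struct mem_cgroup *)(word & ~PCG_FLAGS_MASK);
  }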

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/page_cgroup.h |   15 ++++++++++++---
 1 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
index 92768cb..bca5447 100644
--- a/include/linux/page_cgroup.h
+++ b/include/linux/page_cgroup.h
@@ -1,6 +1,10 @@
 #ifndef __LINUX_PAGE_CGROUP_H
 #define __LINUX_PAGE_CGROUP_H
 
+/*
+ * Because these flags are encoded into ->flags together with a pointer,
+ * we cannot have too many flags.
+ */
 enum {
 	/* flags for mem_cgroup */
 	PCG_LOCK,  /* Lock for pc->mem_cgroup and following bits. */
@@ -9,6 +13,8 @@ enum {
 	__NR_PCG_FLAGS,
 };
 
+#define PCG_FLAGS_MASK	((1 << __NR_PCG_FLAGS) - 1)
+
 #ifndef __GENERATING_BOUNDS_H
 #include <generated/bounds.h>
 
@@ -21,10 +27,12 @@ enum {
  * page_cgroup helps us identify information about the cgroup
  * All page cgroups are allocated at boot or memory hotplug event,
  * then the page cgroup for pfn always exists.
+ *
+ * The flags and a pointer to the memory cgroup are encoded into ->flags.
+ * The lower 3 bits are used for flags; the rest holds a pointer to the memcg.
  */
 struct page_cgroup {
 	unsigned long flags;
-	struct mem_cgroup *mem_cgroup;
 };
 
 void __meminit pgdat_page_cgroup_init(struct pglist_data *pgdat);
@@ -85,13 +93,14 @@ static inline void unlock_page_cgroup(struct page_cgroup *pc)
 
 static inline struct mem_cgroup* pc_to_mem_cgroup(struct page_cgroup *pc)
 {
-	return pc->mem_cgroup;
+	return (struct mem_cgroup *)(pc->flags & ~PCG_FLAGS_MASK);
 }
 
 static inline void
 pc_set_mem_cgroup(struct page_cgroup *pc, struct mem_cgroup *memcg)
 {
-	pc->mem_cgroup = memcg;
+	unsigned long bits = pc->flags & PCG_FLAGS_MASK;
+	pc->flags = (unsigned long)memcg | bits;
 }
 
 #else /* CONFIG_CGROUP_MEM_RES_CTLR */
-- 
1.7.4.1



* [RFC][PATCH 3/3] memcg: atomic update of memcg pointer and other bits.
@ 2012-03-19  8:03   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 32+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-19  8:03 UTC (permalink / raw)
  To: linux-mm
  Cc: cgroups, Johannes Weiner, Michal Hocko, Hugh Dickins, Han Ying,
	Glauber Costa, Aneesh Kumar K.V, Andrew Morton, suleiman,
	n-horiguchi, khlebnikov, Tejun Heo

Because the pointer to memcg and the flags are now in the same word,
they can be updated at the same time. Then we can remove the memory
barriers that were used to fix races.
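
As a sketch of the idea (mine, using C11 atomics as a stand-in for the
kernel primitives, with a hypothetical PCG_USED bit value): with two
separate fields, the pointer store had to be ordered before the USED-bit
store; with one word, a single store publishes both at once, so there is
no window where USED is set but the pointer is stale.

  #include <stdatomic.h>

  #define PCG_USED_BIT 0x2UL        /* hypothetical bit value for PCG_USED */

  /*
   * Old scheme (two fields), roughly:
   *         pc->mem_cgroup = memcg;
   *         smp_wmb();              // pointer visible before the USED bit
   *         SetPageCgroupUsed(pc);
   *
   * New scheme (one word): the atomicity of the single store makes the
   * intermediate state unobservable, so the barrier can go away.
   */
  static void publish(_Atomic unsigned long *flags_word, unsigned long memcg_bits)
  {
          atomic_store_explicit(flags_word, memcg_bits | PCG_USED_BIT,
                                memory_order_relaxed);
  }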

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/page_cgroup.h |    4 ++--
 mm/memcontrol.c             |   22 ++++------------------
 2 files changed, 6 insertions(+), 20 deletions(-)

diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
index bca5447..e05f157 100644
--- a/include/linux/page_cgroup.h
+++ b/include/linux/page_cgroup.h
@@ -97,9 +97,9 @@ static inline struct mem_cgroup* pc_to_mem_cgroup(struct page_cgroup *pc)
 }
 
 static inline void
-pc_set_mem_cgroup(struct page_cgroup *pc, struct mem_cgroup *memcg)
+pc_set_mem_cgroup(struct page_cgroup *pc,
+		struct mem_cgroup *memcg, unsigned long bits)
 {
-	unsigned long bits = pc->flags & PCG_FLAGS_MASK;
 	pc->flags = (unsigned long)memcg | bits;
 }
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 124fec9..603a476 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1060,7 +1060,7 @@ struct lruvec *mem_cgroup_lru_add_list(struct zone *zone, struct page *page,
 	 * of pc's mem_cgroup safe.
 	 */
 	if (!PageCgroupUsed(pc) && memcg != root_mem_cgroup) {
-		pc_set_mem_cgroup(pc, root_mem_cgroup);
+		pc_set_mem_cgroup(pc, root_mem_cgroup, 0);
 		memcg = root_mem_cgroup;
 	}
 
@@ -1237,8 +1237,6 @@ mem_cgroup_get_reclaim_stat_from_page(struct page *page)
 	pc = lookup_page_cgroup(page);
 	if (!PageCgroupUsed(pc))
 		return NULL;
-	/* Ensure pc's mem_cgroup is visible after reading PCG_USED. */
-	smp_rmb();
 	mz = page_cgroup_zoneinfo(pc_to_mem_cgroup(pc), page);
 	return &mz->reclaim_stat;
 }
@@ -2491,16 +2489,7 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *memcg,
 		}
 	}
 
-	pc_set_mem_cgroup(pc, memcg);
-	/*
-	 * We access a page_cgroup asynchronously without lock_page_cgroup().
-	 * Especially when a page_cgroup is taken from a page, pc's mem_cgroup
-	 * is accessed after testing USED bit. To make pc's mem_cgroup visible
-	 * before USED bit, we need memory barrier here.
-	 * See mem_cgroup_add_lru_list(), etc.
- 	 */
-	smp_wmb();
-	SetPageCgroupUsed(pc);
+	pc_set_mem_cgroup(pc, memcg, BIT(PCG_USED) | BIT(PCG_LOCK));
 
 	if (lrucare) {
 		if (was_on_lru) {
@@ -2529,7 +2518,6 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *memcg,
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 
-#define PCGF_NOCOPY_AT_SPLIT ((1 << PCG_LOCK) | (1 << PCG_MIGRATION))
 /*
  * Because tail pages are not marked as "used", set it. We're under
  * zone->lru_lock, 'splitting on pmd' and compound_lock.
@@ -2547,9 +2535,7 @@ void mem_cgroup_split_huge_fixup(struct page *head)
 		return;
 	for (i = 1; i < HPAGE_PMD_NR; i++) {
 		pc = head_pc + i;
-		pc_set_mem_cgroup(pc, memcg);
-		smp_wmb();/* see __commit_charge() */
-		pc->flags = head_pc->flags & ~PCGF_NOCOPY_AT_SPLIT;
+		pc_set_mem_cgroup(pc, memcg, BIT(PCG_USED));
 	}
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
@@ -2616,7 +2602,7 @@ static int mem_cgroup_move_account(struct page *page,
 		__mem_cgroup_cancel_charge(from, nr_pages);
 
 	/* caller should have done css_get */
-	pc_set_mem_cgroup(pc, to);
+	pc_set_mem_cgroup(pc, to, BIT(PCG_USED) | BIT(PCG_LOCK));
 	mem_cgroup_charge_statistics(to, anon, nr_pages);
 	/*
 	 * We charges against "to" which may not have any tasks. Then, "to"
-- 
1.7.4.1



* Re: [RFC][PATCH 1/3] memcg: add methods to access pc->mem_cgroup
  2012-03-19  7:59 ` [RFC][PATCH 1/3] memcg: add methods to access pc->mem_cgroup KAMEZAWA Hiroyuki
@ 2012-03-19 10:58   ` Glauber Costa
  2012-03-19 12:11       ` KAMEZAWA Hiroyuki
  2012-03-19 15:33       ` Michal Hocko
  2012-03-22 13:11     ` Michal Hocko
  1 sibling, 2 replies; 32+ messages in thread
From: Glauber Costa @ 2012-03-19 10:58 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, cgroups, Johannes Weiner, Michal Hocko, Hugh Dickins,
	Han Ying, Aneesh Kumar K.V, Andrew Morton, suleiman, n-horiguchi,
	khlebnikov, Tejun Heo

On 03/19/2012 11:59 AM, KAMEZAWA Hiroyuki wrote:
> In order to encode pc->mem_cgroup and pc->flags to be in a word,
> access function to pc->mem_cgroup is required.
> 
> This patch replaces access to pc->mem_cgroup with
>   pc_to_mem_cgroup(pc)          : pc->mem_cgroup
>   pc_set_mem_cgroup(pc, memcg)  : pc->mem_cgroup = memcg
> 
> Following patch will remove pc->mem_cgroup.
> 
> Signed-off-by: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
Kame,

I can't see a reason not to merge this patch right now, regardless of
the other ones.


* Re: [RFC][PATCH 1/3] memcg: add methods to access pc->mem_cgroup
@ 2012-03-19 12:11       ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 32+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-19 12:11 UTC (permalink / raw)
  To: Glauber Costa
  Cc: linux-mm, cgroups, Johannes Weiner, Michal Hocko, Hugh Dickins,
	Han Ying, Aneesh Kumar K.V, Andrew Morton, suleiman, n-horiguchi,
	khlebnikov, Tejun Heo

(2012/03/19 19:58), Glauber Costa wrote:

> On 03/19/2012 11:59 AM, KAMEZAWA Hiroyuki wrote:
>> In order to encode pc->mem_cgroup and pc->flags to be in a word,
>> access function to pc->mem_cgroup is required.
>>
>> This patch replaces access to pc->mem_cgroup with
>>   pc_to_mem_cgroup(pc)          : pc->mem_cgroup
>>   pc_set_mem_cgroup(pc, memcg)  : pc->mem_cgroup = memcg
>>
>> Following patch will remove pc->mem_cgroup.
>>
>> Signed-off-by: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
> Kame,
> 
> I can't see a reason not to merge this patch right now, regardless of
> the other ones.
> 

Ok, if the function names seem good, I'll post again.

Thanks,
-Kame 


* Re: [RFC][PATCH 1/3] memcg: add methods to access pc->mem_cgroup
  2012-03-19 12:11       ` KAMEZAWA Hiroyuki
@ 2012-03-19 12:29       ` Glauber Costa
  -1 siblings, 0 replies; 32+ messages in thread
From: Glauber Costa @ 2012-03-19 12:29 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, cgroups, Johannes Weiner, Michal Hocko, Hugh Dickins,
	Han Ying, Aneesh Kumar K.V, Andrew Morton, suleiman, n-horiguchi,
	khlebnikov, Tejun Heo

On 03/19/2012 04:11 PM, KAMEZAWA Hiroyuki wrote:
> (2012/03/19 19:58), Glauber Costa wrote:
> 
>> On 03/19/2012 11:59 AM, KAMEZAWA Hiroyuki wrote:
>>> In order to encode pc->mem_cgroup and pc->flags to be in a word,
>>> access function to pc->mem_cgroup is required.
>>>
>>> This patch replaces access to pc->mem_cgroup with
>>>    pc_to_mem_cgroup(pc)          : pc->mem_cgroup
>>>    pc_set_mem_cgroup(pc, memcg)  : pc->mem_cgroup = memcg
>>>
>>> Following patch will remove pc->mem_cgroup.
>>>
>>> Signed-off-by: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
>> Kame,
>>
>> I can't see a reason not to merge this patch right now, regardless of
>> the other ones.
>>
> 
> Ok, if names of functions seems good, I'll post again.
> 
Acked-by: Glauber Costa <glommer@parallels.com>

just in case


* Re: [RFC][PATCH 1/3] memcg: add methods to access pc->mem_cgroup
@ 2012-03-19 15:33       ` Michal Hocko
  0 siblings, 0 replies; 32+ messages in thread
From: Michal Hocko @ 2012-03-19 15:33 UTC (permalink / raw)
  To: Glauber Costa
  Cc: KAMEZAWA Hiroyuki, linux-mm, cgroups, Johannes Weiner,
	Hugh Dickins, Han Ying, Aneesh Kumar K.V, Andrew Morton,
	suleiman, n-horiguchi, khlebnikov, Tejun Heo

On Mon 19-03-12 14:58:00, Glauber Costa wrote:
> On 03/19/2012 11:59 AM, KAMEZAWA Hiroyuki wrote:
> > In order to encode pc->mem_cgroup and pc->flags to be in a word,
> > access function to pc->mem_cgroup is required.
> > 
> > This patch replaces access to pc->mem_cgroup with
> >   pc_to_mem_cgroup(pc)          : pc->mem_cgroup
> >   pc_set_mem_cgroup(pc, memcg)  : pc->mem_cgroup = memcg
> > 
> > Following patch will remove pc->mem_cgroup.
> > 
> > Signed-off-by: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
> Kame,
> 
> I can't see a reason not to merge this patch right now, regardless of
> the other ones.
 
I am not so sure about that. The patch doesn't do much on its own, and
the reference to the "following patch" might be confusing. Does it actually
help to rush it in now?
-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic


* Re: [RFC][PATCH 1/3] memcg: add methods to access pc->mem_cgroup
  2012-03-19 15:33       ` Michal Hocko
@ 2012-03-19 15:34       ` Glauber Costa
  -1 siblings, 0 replies; 32+ messages in thread
From: Glauber Costa @ 2012-03-19 15:34 UTC (permalink / raw)
  To: Michal Hocko
  Cc: KAMEZAWA Hiroyuki, linux-mm, cgroups, Johannes Weiner,
	Hugh Dickins, Han Ying, Aneesh Kumar K.V, Andrew Morton,
	suleiman, n-horiguchi, khlebnikov, Tejun Heo

On 03/19/2012 07:33 PM, Michal Hocko wrote:
> On Mon 19-03-12 14:58:00, Glauber Costa wrote:
>> On 03/19/2012 11:59 AM, KAMEZAWA Hiroyuki wrote:
>>> In order to encode pc->mem_cgroup and pc->flags to be in a word,
>>> access function to pc->mem_cgroup is required.
>>>
>>> This patch replaces access to pc->mem_cgroup with
>>>    pc_to_mem_cgroup(pc)          : pc->mem_cgroup
>>>    pc_set_mem_cgroup(pc, memcg)  : pc->mem_cgroup = memcg
>>>
>>> Following patch will remove pc->mem_cgroup.
>>>
>>> Signed-off-by: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
>> Kame,
>>
>> I can't see a reason not to merge this patch right now, regardless of
>> the other ones.
>
> I am not so sure about that. The patch doesn't do much on its own and
> reference to the "following patch" might be confusing. Does it actually
> help to rush it now?

The changelog can be reworked, that's for sure.
But viewed as a style change, it does do good IMHO.

Of course, that is as much an argument to get it in as it is an argument
not to...


* Re: [RFC][PATCH 0/3] page cgroup diet
@ 2012-03-19 19:59   ` Konstantin Khlebnikov
  0 siblings, 0 replies; 32+ messages in thread
From: Konstantin Khlebnikov @ 2012-03-19 19:59 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, cgroups, Johannes Weiner, Michal Hocko, Hugh Dickins,
	Han Ying, Glauber Costa, Aneesh Kumar K.V, Andrew Morton,
	suleiman, n-horiguchi, Tejun Heo

KAMEZAWA Hiroyuki wrote:
> This is just an RFC...test is not enough yet.
> 
> I know it's merge window..this post is just for sharing idea.
> 
> This patch merges pc->flags and pc->mem_cgroup into a word. Then,
> memcg's overhead will be 8bytes per page(4096bytes?).
> 
> Because this patch will affect all memory cgroup developers, I'd like to
> show patches before MM Summit. I think we can agree the direction to
> reduce size of page_cgroup..and finally integrate into 'struct page'
> (and remove cgroup_disable= boot option...)
> 
> Patch 1/3 - introduce pc_to_mem_cgroup and hide pc->mem_cgroup
> Patch 2/3 - remove pc->mem_cgroup
> Patch 3/3 - remove memory barriers.
> 
> I'm now wondering when this change should be merged....
> 

This is cool, but maybe we should skip this temporary step and merge all this stuff into page->flags.
I think we can replace the zone-id and node-id in page->flags with a cumulative, dynamically allocated
lruvec-id, so there would be enough space for hundreds of cgroups even on 32-bit systems.

After the lru_lock splitting, page-to-lruvec translation will be used much more frequently than
page-to-zone, so the page->zone and page->node translations can be implemented as
page->lruvec->zone and page->lruvec->node.
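
A hypothetical sketch of that lookup (all names here are invented for
illustration, not an existing API): the id stored in page->flags would
index a table of lruvecs, and zone/node would be reached through it:

  #define LRUVEC_ID_SHIFT 8                      /* hypothetical field layout */
  #define LRUVEC_ID_MASK  (0xfffUL << LRUVEC_ID_SHIFT)

  extern struct lruvec *lruvec_table[];          /* filled at cgroup creation */

  static inline struct lruvec *page_lruvec(struct page *page)
  {
          unsigned long id = (page->flags & LRUVEC_ID_MASK) >> LRUVEC_ID_SHIFT;

          return lruvec_table[id];
  }

  /* page_zone(page) would then become page_lruvec(page)->zone, and so on. */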


* Re: [RFC][PATCH 2/3] memcg: reduce size of struct page_cgroup.
  2012-03-19  8:01 ` [RFC][PATCH 2/3] memcg: reduce size of struct page_cgroup KAMEZAWA Hiroyuki
@ 2012-03-19 22:20   ` Suleiman Souhlal
  2012-03-21  0:47       ` KAMEZAWA Hiroyuki
  2012-03-22 13:11     ` Michal Hocko
  1 sibling, 1 reply; 32+ messages in thread
From: Suleiman Souhlal @ 2012-03-19 22:20 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, cgroups, Johannes Weiner, Michal Hocko, Hugh Dickins,
	Han Ying, Glauber Costa, Aneesh Kumar K.V, Andrew Morton,
	n-horiguchi, khlebnikov, Tejun Heo

2012/3/19 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>:
> Now, page_cgroup->flags has only 3bits. Considering alignment of
> struct mem_cgroup, which is allocated by kmalloc(), we can encode
> pointer to mem_cgroup and flags into a word.
>
> After this patch, pc->flags is encoded as
>
>  63                           2     0
>  | pointer to memcg..........|flags|
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  include/linux/page_cgroup.h |   15 ++++++++++++---
>  1 files changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
> index 92768cb..bca5447 100644
> --- a/include/linux/page_cgroup.h
> +++ b/include/linux/page_cgroup.h
> @@ -1,6 +1,10 @@
>  #ifndef __LINUX_PAGE_CGROUP_H
>  #define __LINUX_PAGE_CGROUP_H
>
> +/*
> + * Because these flags are encoded into ->flags with a pointer,
> + * we cannot have too much flags.
> + */
>  enum {
>        /* flags for mem_cgroup */
>        PCG_LOCK,  /* Lock for pc->mem_cgroup and following bits. */
> @@ -9,6 +13,8 @@ enum {
>        __NR_PCG_FLAGS,
>  };
>
> +#define PCG_FLAGS_MASK ((1 << __NR_PCG_FLAGS) - 1)
> +
>  #ifndef __GENERATING_BOUNDS_H
>  #include <generated/bounds.h>
>
> @@ -21,10 +27,12 @@ enum {
>  * page_cgroup helps us identify information about the cgroup
>  * All page cgroups are allocated at boot or memory hotplug event,
>  * then the page cgroup for pfn always exists.
> + *
> + * flags and a pointer to memory cgroup are encoded into ->flags.
> + * Lower 3bits are used for flags and others are used for a pointer to memcg.

Would it be worth adding a BUILD_BUG_ON(__NR_PCG_FLAGS > 3)?
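
For illustration, such a check could sit in the setter itself (a sketch
only, not the posted code), so that growing the flag set past the 3 bits
guaranteed by kmalloc() alignment fails at compile time instead of
corrupting the pointer:

  static inline void
  pc_set_mem_cgroup(struct page_cgroup *pc, struct mem_cgroup *memcg)
  {
          /* kmalloc() alignment only guarantees 3 free low bits */
          BUILD_BUG_ON(__NR_PCG_FLAGS > 3);
          pc->flags = (unsigned long)memcg | (pc->flags & PCG_FLAGS_MASK);
  }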

-- Suleiman


* Re: [RFC][PATCH 2/3] memcg: reduce size of struct page_cgroup.
@ 2012-03-21  0:47       ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 32+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-21  0:47 UTC (permalink / raw)
  To: Suleiman Souhlal
  Cc: linux-mm, cgroups, Johannes Weiner, Michal Hocko, Hugh Dickins,
	Han Ying, Glauber Costa, Aneesh Kumar K.V, Andrew Morton,
	n-horiguchi, khlebnikov, Tejun Heo

(2012/03/20 7:20), Suleiman Souhlal wrote:

> 2012/3/19 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>:
>> Now, page_cgroup->flags has only 3bits. Considering alignment of
>> struct mem_cgroup, which is allocated by kmalloc(), we can encode
>> pointer to mem_cgroup and flags into a word.
>>
>> After this patch, pc->flags is encoded as
>>
>>  63                           2     0
>>  | pointer to memcg..........|flags|
>>
>> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>> ---
>>  include/linux/page_cgroup.h |   15 ++++++++++++---
>>  1 files changed, 12 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
>> index 92768cb..bca5447 100644
>> --- a/include/linux/page_cgroup.h
>> +++ b/include/linux/page_cgroup.h
>> @@ -1,6 +1,10 @@
>>  #ifndef __LINUX_PAGE_CGROUP_H
>>  #define __LINUX_PAGE_CGROUP_H
>>
>> +/*
>> + * Because these flags are encoded into ->flags with a pointer,
>> + * we cannot have too much flags.
>> + */
>>  enum {
>>        /* flags for mem_cgroup */
>>        PCG_LOCK,  /* Lock for pc->mem_cgroup and following bits. */
>> @@ -9,6 +13,8 @@ enum {
>>        __NR_PCG_FLAGS,
>>  };
>>
>> +#define PCG_FLAGS_MASK ((1 << __NR_PCG_FLAGS) - 1)
>> +
>>  #ifndef __GENERATING_BOUNDS_H
>>  #include <generated/bounds.h>
>>
>> @@ -21,10 +27,12 @@ enum {
>>  * page_cgroup helps us identify information about the cgroup
>>  * All page cgroups are allocated at boot or memory hotplug event,
>>  * then the page cgroup for pfn always exists.
>> + *
>> + * flags and a pointer to memory cgroup are encoded into ->flags.
>> + * Lower 3bits are used for flags and others are used for a pointer to memcg.
> 
> Would it be worth adding a BUILD_BUG_ON(__NR_PCG_FLAGS > 3) ?
> 


Ok, I'll add that.

Thanks,
-Kame


* Re: [RFC][PATCH 0/3] page cgroup diet
@ 2012-03-21  1:02     ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 32+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-21  1:02 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: linux-mm, cgroups, Johannes Weiner, Michal Hocko, Hugh Dickins,
	Han Ying, Glauber Costa, Aneesh Kumar K.V, Andrew Morton,
	suleiman, n-horiguchi, Tejun Heo

(2012/03/20 4:59), Konstantin Khlebnikov wrote:

> KAMEZAWA Hiroyuki wrote:
>> This is just an RFC...test is not enough yet.
>>
>> I know it's merge window..this post is just for sharing idea.
>>
>> This patch merges pc->flags and pc->mem_cgroup into a word. Then,
>> memcg's overhead will be 8bytes per page(4096bytes?).
>>
>> Because this patch will affect all memory cgroup developers, I'd like to
>> show patches before MM Summit. I think we can agree the direction to
>> reduce size of page_cgroup..and finally integrate into 'struct page'
>> (and remove cgroup_disable= boot option...)
>>
>> Patch 1/3 - introduce pc_to_mem_cgroup and hide pc->mem_cgroup
>> Patch 2/3 - remove pc->mem_cgroup
>> Patch 3/3 - remove memory barriers.
>>
>> I'm now wondering when this change should be merged....
>>
> 
> This is cool, but maybe we should skip this temporary step and merge all this stuff into page->flags.


Why should we skip and delay reducing the size of page_cgroup,
which is considered a very big problem?

> I think we can replace zone-id and node-id in page->flags with cumulative dynamically allocated lruvec-id,
> so there will be enough space for hundred cgroups even on 32-bit systems.


Where does the section-id go?
IIUC, page->section->zone/node is currently calculated when CONFIG_SPARSEMEM is set.

BTW, I doubt that we can modify page->flags dynamically with multi-bit operations... using
cmpxchg on each page when it's charged/uncharged/etc.?

> 
> After lru_lock splitting page to lruvec translation will be much frequently used than page to zone,
> so page->zone and page->node translations can be implemented as page->lruvec->zone and page->lruvec->node.
> 

And would we need to take rcu_read_lock() around page_zone()?

Thanks,
-Kame




* Re: [RFC][PATCH 1/3] memcg: add methods to access pc->mem_cgroup
@ 2012-03-21  1:06         ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 32+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-21  1:06 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Glauber Costa, linux-mm, cgroups, Johannes Weiner, Hugh Dickins,
	Han Ying, Aneesh Kumar K.V, Andrew Morton, suleiman, n-horiguchi,
	khlebnikov, Tejun Heo

(2012/03/20 0:33), Michal Hocko wrote:

> On Mon 19-03-12 14:58:00, Glauber Costa wrote:
>> On 03/19/2012 11:59 AM, KAMEZAWA Hiroyuki wrote:
>>> In order to encode pc->mem_cgroup and pc->flags to be in a word,
>>> access function to pc->mem_cgroup is required.
>>>
>>> This patch replaces access to pc->mem_cgroup with
>>>   pc_to_mem_cgroup(pc)          : pc->mem_cgroup
>>>   pc_set_mem_cgroup(pc, memcg)  : pc->mem_cgroup = memcg
>>>
>>> Following patch will remove pc->mem_cgroup.
>>>
>>> Signed-off-by: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
>> Kame,
>>
>> I can't see a reason not to merge this patch right now, regardless of
>> the other ones.
>  
> I am not so sure about that. The patch doesn't do much on its own and
> reference to the "following patch" might be confusing. Does it actually
> help to rush it now?


Hm, it sounds like I should post the full series (dropping the RFC tag) and
finish the page cgroup diet ASAP.

I'll continue updates and tests.

Thanks,
-Kame



* Re: [RFC][PATCH 0/3] page cgroup diet
@ 2012-03-21  6:13       ` Konstantin Khlebnikov
  0 siblings, 0 replies; 32+ messages in thread
From: Konstantin Khlebnikov @ 2012-03-21  6:13 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, cgroups, Johannes Weiner, Michal Hocko, Hugh Dickins,
	Han Ying, Glauber Costa, Aneesh Kumar K.V, Andrew Morton,
	suleiman, n-horiguchi, Tejun Heo

KAMEZAWA Hiroyuki wrote:
> (2012/03/20 4:59), Konstantin Khlebnikov wrote:
> 
>> KAMEZAWA Hiroyuki wrote:
>>> This is just an RFC...test is not enough yet.
>>>
>>> I know it's merge window..this post is just for sharing idea.
>>>
>>> This patch merges pc->flags and pc->mem_cgroup into a word. Then,
>>> memcg's overhead will be 8bytes per page(4096bytes?).
>>>
>>> Because this patch will affect all memory cgroup developers, I'd like to
>>> show patches before MM Summit. I think we can agree the direction to
>>> reduce size of page_cgroup..and finally integrate into 'struct page'
>>> (and remove cgroup_disable= boot option...)
>>>
>>> Patch 1/3 - introduce pc_to_mem_cgroup and hide pc->mem_cgroup
>>> Patch 2/3 - remove pc->mem_cgroup
>>> Patch 3/3 - remove memory barriers.
>>>
>>> I'm now wondering when this change should be merged....
>>>
>>
>> This is cool, but maybe we should skip this temporary step and merge all this stuff into page->flags.
> 
> 
> Why we should skip and delay reduction of size of page_cgroup
> which is considered as very big problem ?

I think it would be better to solve the problem completely and kill page_cgroup in one step.

> 
>> I think we can replace the zone-id and node-id in page->flags with a cumulative, dynamically allocated lruvec-id,
>> so there will be enough space for hundreds of cgroups even on 32-bit systems.
> 
> 
> Where is the section-id?
> IIUC, page->section->zone/node is currently calculated if CONFIG_SPARSEMEM.

Yeah, sections are the biggest problem there. I hope we can unravel this knot.
In the worst case we can extend page->flags up to 64 bits.

> 
> BTW, I doubt that we can modify page->flags dynamically with multi-bit operations... using
> cmpxchg for each page when it's charged/uncharged/etc.?

We can do atomic_xor(&page->flags, new-lruvec-id ^ old-lruvec-id) or
atomic_add(&page->flags, new-lruvec-id - old-lruvec-id); both should be faster than cmpxchg.
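
A minimal sketch of that xor trick, just to make the idea concrete (the
field layout and atomic_long_xor() are assumptions for illustration, not
existing interfaces):

/* Sketch only: LRUVEC_ID_SHIFT and atomic_long_xor() are assumed here. */
static inline void page_move_lruvec_id(struct page *page,
				       unsigned long old_id,
				       unsigned long new_id)
{
	/*
	 * XOR with (old ^ new) flips exactly the bits that differ between
	 * the two ids and leaves every other page->flags bit untouched,
	 * so no cmpxchg loop is needed while old_id is known to be current.
	 */
	atomic_long_xor((old_id ^ new_id) << LRUVEC_ID_SHIFT,
			(atomic_long_t *)&page->flags);
}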

> 
>>
>> After the lru_lock split, page-to-lruvec translation will be used much more frequently than page-to-zone,
>> so the page->zone and page->node translations can be implemented as page->lruvec->zone and page->lruvec->node.
>>
> 
> And do we need to take rcu_read_lock() around page_zone()?

Hmm, it depends. For kernel pages there will be a pointer to the root lruvec, so no protection is required.
If we hold lru_lock, we don't need rcu_read_lock() either.
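
Roughly, the translation could then look like this (page_lruvec(),
lruvec_table[] and the zone back-pointer are all hypothetical names for
the proposed scheme):

/* Sketch under the assumptions above. */
static inline struct lruvec *page_lruvec(struct page *page)
{
	/* decode the dynamically allocated lruvec-id and look it up */
	return lruvec_table[(page->flags >> LRUVEC_ID_SHIFT) & LRUVEC_ID_MASK];
}

static inline struct zone *page_zone(struct page *page)
{
	/* assumes lruvec gains a back-pointer to its zone */
	return page_lruvec(page)->zone;
}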

> 
> Thanks,
> -Kame
> 
> 
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH 0/3] page cgroup diet
@ 2012-03-21  6:30         ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 32+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-21  6:30 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: linux-mm, cgroups, Johannes Weiner, Michal Hocko, Hugh Dickins,
	Han Ying, Glauber Costa, Aneesh Kumar K.V, Andrew Morton,
	suleiman, n-horiguchi, Tejun Heo

(2012/03/21 15:13), Konstantin Khlebnikov wrote:

> KAMEZAWA Hiroyuki wrote:
>> (2012/03/20 4:59), Konstantin Khlebnikov wrote:
>>
>>> KAMEZAWA Hiroyuki wrote:
>>>> This is just an RFC...test is not enough yet.
>>>>
>>>> I know it's merge window..this post is just for sharing idea.
>>>>
>>>> This patch merges pc->flags and pc->mem_cgroup into a word. Then,
>>>> memcg's overhead will be 8bytes per page(4096bytes?).
>>>>
>>>> Because this patch will affect all memory cgroup developers, I'd like to
>>>> show patches before MM Summit. I think we can agree the direction to
>>>> reduce size of page_cgroup..and finally integrate into 'struct page'
>>>> (and remove cgroup_disable= boot option...)
>>>>
>>>> Patch 1/3 - introduce pc_to_mem_cgroup and hide pc->mem_cgroup
>>>> Patch 2/3 - remove pc->mem_cgroup
>>>> Patch 3/3 - remove memory barriers.
>>>>
>>>> I'm now wondering when this change should be merged....
>>>>
>>>
>>> This is cool, but maybe we should skip this temporary step and merge all this stuff into page->flags.
>>
>>
>> Why should we skip this and delay reducing the size of page_cgroup,
>> which is considered a very big problem?
> 
> I think it would be better to solve the problem completely and kill page_cgroup in one step.
> 


I like going step-by-step rather than taking one big step. I divided this page_cgroup diet patch set
into 3 steps (2012-Jan, 2012-Feb, and the next one is the last).

This change may introduce new races, as step 1 and step 2 did (thanks to Hugh Dickins!).

Finally, IIUC, 'struct page' has 8 bytes of padding now. I think we can use that space
if we can agree on it.

Hm. BTW, putting an lruvec id in page->flags implies you add a dynamic table lookup to
free_page(), and you need to reset page->flags to point to a valid
lruvec at page allocation. Please take care with performance.

Thanks,
-Kame

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH 2/3] memcg: reduce size of struct page_cgroup.
@ 2012-03-22 13:11     ` Michal Hocko
  0 siblings, 0 replies; 32+ messages in thread
From: Michal Hocko @ 2012-03-22 13:11 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, cgroups, Johannes Weiner, Hugh Dickins, Han Ying,
	Glauber Costa, Aneesh Kumar K.V, Andrew Morton, suleiman,
	n-horiguchi, khlebnikov, Tejun Heo

On Mon 19-03-12 17:01:27, KAMEZAWA Hiroyuki wrote:
> Now, page_cgroup->flags has only 3bits. Considering alignment of
> struct mem_cgroup, which is allocated by kmalloc(), we can encode
> pointer to mem_cgroup and flags into a word.
                                  into a single word.

> 
> After this patch, pc->flags is encoded as
> 
>  63                           2     0
>   | pointer to memcg..........|flags|

Looks good.
Acked-by: Michal Hocko <mhocko@suse.cz>

> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  include/linux/page_cgroup.h |   15 ++++++++++++---
>  1 files changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
> index 92768cb..bca5447 100644
> --- a/include/linux/page_cgroup.h
> +++ b/include/linux/page_cgroup.h
> @@ -1,6 +1,10 @@
>  #ifndef __LINUX_PAGE_CGROUP_H
>  #define __LINUX_PAGE_CGROUP_H
>  
> +/*
> + * Because these flags are encoded into ->flags with a pointer,
> + * we cannot have too much flags.
                     ^^^^^^^^^^^^^^ 
cannot have too many flags 
but an explicit BUILD_BUG_ON would be much more precise than a comment I
guess.

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH 1/3] memcg: add methods to access pc->mem_cgroup
@ 2012-03-22 13:11     ` Michal Hocko
  0 siblings, 0 replies; 32+ messages in thread
From: Michal Hocko @ 2012-03-22 13:11 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, cgroups, Johannes Weiner, Hugh Dickins, Han Ying,
	Glauber Costa, Aneesh Kumar K.V, Andrew Morton, suleiman,
	n-horiguchi, khlebnikov, Tejun Heo

On Mon 19-03-12 16:59:47, KAMEZAWA Hiroyuki wrote:
> In order to encode pc->mem_cgroup and pc->flags to be in a word,
> access function to pc->mem_cgroup is required.
> 
> This patch replaces access to pc->mem_cgroup with
>  pc_to_mem_cgroup(pc)          : pc->mem_cgroup
>  pc_set_mem_cgroup(pc, memcg)  : pc->mem_cgroup = memcg
> 
> Following patch will remove pc->mem_cgroup.

Acked-by: Michal Hocko <mhocko@suse.cz>

> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  include/linux/page_cgroup.h |   12 +++++++
>  mm/memcontrol.c             |   69 ++++++++++++++++++++++--------------------
>  2 files changed, 48 insertions(+), 33 deletions(-)
> 
> diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
> index a88cdba..92768cb 100644
> --- a/include/linux/page_cgroup.h
> +++ b/include/linux/page_cgroup.h
> @@ -82,6 +82,18 @@ static inline void unlock_page_cgroup(struct page_cgroup *pc)
>  	bit_spin_unlock(PCG_LOCK, &pc->flags);
>  }
>  
> +
> +static inline struct mem_cgroup* pc_to_mem_cgroup(struct page_cgroup *pc)
> +{
> +	return pc->mem_cgroup;
> +}
> +
> +static inline void
> +pc_set_mem_cgroup(struct page_cgroup *pc, struct mem_cgroup *memcg)
> +{
> +	pc->mem_cgroup = memcg;
> +}
> +
>  #else /* CONFIG_CGROUP_MEM_RES_CTLR */
>  struct page_cgroup;
>  
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index c65e6bc..124fec9 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1014,9 +1014,9 @@ struct lruvec *mem_cgroup_zone_lruvec(struct zone *zone,
>  /*
>   * Following LRU functions are allowed to be used without PCG_LOCK.
>   * Operations are called by routine of global LRU independently from memcg.
> - * What we have to take care of here is validness of pc->mem_cgroup.
> + * What we have to take care of here is validness of pc's mem_cgroup.
>   *
> - * Changes to pc->mem_cgroup happens when
> + * Changes to pc's mem_cgroup happens when
>   * 1. charge
>   * 2. moving account
>   * In typical case, "charge" is done before add-to-lru. Exception is SwapCache.
> @@ -1048,7 +1048,7 @@ struct lruvec *mem_cgroup_lru_add_list(struct zone *zone, struct page *page,
>  		return &zone->lruvec;
>  
>  	pc = lookup_page_cgroup(page);
> -	memcg = pc->mem_cgroup;
> +	memcg = pc_to_mem_cgroup(pc);
>  
>  	/*
>  	 * Surreptitiously switch any uncharged page to root:
> @@ -1057,10 +1057,12 @@ struct lruvec *mem_cgroup_lru_add_list(struct zone *zone, struct page *page,
>  	 *
>  	 * Our caller holds lru_lock, and PageCgroupUsed is updated
>  	 * under page_cgroup lock: between them, they make all uses
> -	 * of pc->mem_cgroup safe.
> +	 * of pc's mem_cgroup safe.
>  	 */
> -	if (!PageCgroupUsed(pc) && memcg != root_mem_cgroup)
> -		pc->mem_cgroup = memcg = root_mem_cgroup;
> +	if (!PageCgroupUsed(pc) && memcg != root_mem_cgroup) {
> +		pc_set_mem_cgroup(pc, root_mem_cgroup);
> +		memcg = root_mem_cgroup;
> +	}
>  
>  	mz = page_cgroup_zoneinfo(memcg, page);
>  	/* compound_order() is stabilized through lru_lock */
> @@ -1088,7 +1090,7 @@ void mem_cgroup_lru_del_list(struct page *page, enum lru_list lru)
>  		return;
>  
>  	pc = lookup_page_cgroup(page);
> -	memcg = pc->mem_cgroup;
> +	memcg = pc_to_mem_cgroup(pc);
>  	VM_BUG_ON(!memcg);
>  	mz = page_cgroup_zoneinfo(memcg, page);
>  	/* huge page split is done under lru_lock. so, we have no races. */
> @@ -1235,9 +1237,9 @@ mem_cgroup_get_reclaim_stat_from_page(struct page *page)
>  	pc = lookup_page_cgroup(page);
>  	if (!PageCgroupUsed(pc))
>  		return NULL;
> -	/* Ensure pc->mem_cgroup is visible after reading PCG_USED. */
> +	/* Ensure pc's mem_cgroup is visible after reading PCG_USED. */
>  	smp_rmb();
> -	mz = page_cgroup_zoneinfo(pc->mem_cgroup, page);
> +	mz = page_cgroup_zoneinfo(pc_to_mem_cgroup(pc), page);
>  	return &mz->reclaim_stat;
>  }
>  
> @@ -1314,7 +1316,7 @@ static void mem_cgroup_end_move(struct mem_cgroup *memcg)
>   *
>   * mem_cgroup_stolen() -  checking whether a cgroup is mc.from or not. This
>   *			  is used for avoiding races in accounting.  If true,
> - *			  pc->mem_cgroup may be overwritten.
> + *			  pc's mem_cgroup may be overwritten.
>   *
>   * mem_cgroup_under_move() - checking a cgroup is mc.from or mc.to or
>   *			  under hierarchy of moving cgroups. This is for
> @@ -1887,8 +1889,8 @@ bool mem_cgroup_handle_oom(struct mem_cgroup *memcg, gfp_t mask)
>   * file-stat operations happen after a page is attached to radix-tree. There
>   * are no race with "charge".
>   *
> - * Considering "uncharge", we know that memcg doesn't clear pc->mem_cgroup
> - * at "uncharge" intentionally. So, we always see valid pc->mem_cgroup even
> + * Considering "uncharge", we know that memcg doesn't clear pc's mem_cgroup
> + * at "uncharge" intentionally. So, we always see valid pc's mem_cgroup even
>   * if there are race with "uncharge". Statistics itself is properly handled
>   * by flags.
>   *
> @@ -1905,7 +1907,7 @@ void __mem_cgroup_begin_update_page_stat(struct page *page,
>  
>  	pc = lookup_page_cgroup(page);
>  again:
> -	memcg = pc->mem_cgroup;
> +	memcg = pc_to_mem_cgroup(pc);
>  	if (unlikely(!memcg || !PageCgroupUsed(pc)))
>  		return;
>  	/*
> @@ -1918,7 +1920,7 @@ again:
>  		return;
>  
>  	move_lock_mem_cgroup(memcg, flags);
> -	if (memcg != pc->mem_cgroup || !PageCgroupUsed(pc)) {
> +	if (memcg != pc_to_mem_cgroup(pc) || !PageCgroupUsed(pc)) {
>  		move_unlock_mem_cgroup(memcg, flags);
>  		goto again;
>  	}
> @@ -1930,11 +1932,11 @@ void __mem_cgroup_end_update_page_stat(struct page *page, unsigned long *flags)
>  	struct page_cgroup *pc = lookup_page_cgroup(page);
>  
>  	/*
> -	 * It's guaranteed that pc->mem_cgroup never changes while
> -	 * lock is held because a routine modifies pc->mem_cgroup
> +	 * It's guaranteed that pc's mem_cgroup never changes while
> +	 * lock is held because a routine modifies pc's mem_cgroup
>  	 * should take move_lock_page_cgroup().
>  	 */
> -	move_unlock_mem_cgroup(pc->mem_cgroup, flags);
> +	move_unlock_mem_cgroup(pc_to_mem_cgroup(pc), flags);
>  }
>  
>  void mem_cgroup_update_page_stat(struct page *page,
> @@ -1947,7 +1949,7 @@ void mem_cgroup_update_page_stat(struct page *page,
>  	if (mem_cgroup_disabled())
>  		return;
>  
> -	memcg = pc->mem_cgroup;
> +	memcg = pc_to_mem_cgroup(pc);
>  	if (unlikely(!memcg || !PageCgroupUsed(pc)))
>  		return;
>  
> @@ -2244,7 +2246,7 @@ static int mem_cgroup_do_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
>   * has TIF_MEMDIE, this function returns -EINTR while writing root_mem_cgroup
>   * to *ptr. There are two reasons for this. 1: fatal threads should quit as soon
>   * as possible without any hazards. 2: all pages should have a valid
> - * pc->mem_cgroup. If mm is NULL and the caller doesn't pass a valid memcg
> + * pc's mem_cgroup. If mm is NULL and the caller doesn't pass a valid memcg
>   * pointer, that is treated as a charge to root_mem_cgroup.
>   *
>   * So __mem_cgroup_try_charge() will return
> @@ -2437,7 +2439,7 @@ struct mem_cgroup *try_get_mem_cgroup_from_page(struct page *page)
>  	pc = lookup_page_cgroup(page);
>  	lock_page_cgroup(pc);
>  	if (PageCgroupUsed(pc)) {
> -		memcg = pc->mem_cgroup;
> +		memcg = pc_to_mem_cgroup(pc);
>  		if (memcg && !css_tryget(&memcg->css))
>  			memcg = NULL;
>  	} else if (PageSwapCache(page)) {
> @@ -2489,11 +2491,11 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *memcg,
>  		}
>  	}
>  
> -	pc->mem_cgroup = memcg;
> +	pc_set_mem_cgroup(pc, memcg);
>  	/*
>  	 * We access a page_cgroup asynchronously without lock_page_cgroup().
> -	 * Especially when a page_cgroup is taken from a page, pc->mem_cgroup
> -	 * is accessed after testing USED bit. To make pc->mem_cgroup visible
> +	 * Especially when a page_cgroup is taken from a page, pc's mem_cgroup
> +	 * is accessed after testing USED bit. To make pc's mem_cgroup visible
>  	 * before USED bit, we need memory barrier here.
>  	 * See mem_cgroup_add_lru_list(), etc.
>   	 */
> @@ -2538,13 +2540,14 @@ void mem_cgroup_split_huge_fixup(struct page *head)
>  {
>  	struct page_cgroup *head_pc = lookup_page_cgroup(head);
>  	struct page_cgroup *pc;
> +	struct mem_cgroup *memcg = pc_to_mem_cgroup(head_pc);
>  	int i;
>  
>  	if (mem_cgroup_disabled())
>  		return;
>  	for (i = 1; i < HPAGE_PMD_NR; i++) {
>  		pc = head_pc + i;
> -		pc->mem_cgroup = head_pc->mem_cgroup;
> +		pc_set_mem_cgroup(pc, memcg);
>  		smp_wmb();/* see __commit_charge() */
>  		pc->flags = head_pc->flags & ~PCGF_NOCOPY_AT_SPLIT;
>  	}
> @@ -2595,7 +2598,7 @@ static int mem_cgroup_move_account(struct page *page,
>  	lock_page_cgroup(pc);
>  
>  	ret = -EINVAL;
> -	if (!PageCgroupUsed(pc) || pc->mem_cgroup != from)
> +	if (!PageCgroupUsed(pc) || pc_to_mem_cgroup(pc) != from)
>  		goto unlock;
>  
>  	move_lock_mem_cgroup(from, &flags);
> @@ -2613,7 +2616,7 @@ static int mem_cgroup_move_account(struct page *page,
>  		__mem_cgroup_cancel_charge(from, nr_pages);
>  
>  	/* caller should have done css_get */
> -	pc->mem_cgroup = to;
> +	pc_set_mem_cgroup(pc, to);
>  	mem_cgroup_charge_statistics(to, anon, nr_pages);
>  	/*
>  	 * We charges against "to" which may not have any tasks. Then, "to"
> @@ -2956,7 +2959,7 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
>  
>  	lock_page_cgroup(pc);
>  
> -	memcg = pc->mem_cgroup;
> +	memcg = pc_to_mem_cgroup(pc);
>  
>  	if (!PageCgroupUsed(pc))
>  		goto unlock_out;
> @@ -2992,7 +2995,7 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
>  
>  	ClearPageCgroupUsed(pc);
>  	/*
> -	 * pc->mem_cgroup is not cleared here. It will be accessed when it's
> +	 * pc's mem_cgroup is not cleared here. It will be accessed when it's
>  	 * freed from LRU. This is safe because uncharged page is expected not
>  	 * to be reused (freed soon). Exception is SwapCache, it's handled by
>  	 * special functions.
> @@ -3214,7 +3217,7 @@ int mem_cgroup_prepare_migration(struct page *page,
>  	pc = lookup_page_cgroup(page);
>  	lock_page_cgroup(pc);
>  	if (PageCgroupUsed(pc)) {
> -		memcg = pc->mem_cgroup;
> +		memcg = pc_to_mem_cgroup(pc);
>  		css_get(&memcg->css);
>  		/*
>  		 * At migrating an anonymous page, its mapcount goes down
> @@ -3359,7 +3362,7 @@ void mem_cgroup_replace_page_cache(struct page *oldpage,
>  	pc = lookup_page_cgroup(oldpage);
>  	/* fix accounting on old pages */
>  	lock_page_cgroup(pc);
> -	memcg = pc->mem_cgroup;
> +	memcg = pc_to_mem_cgroup(pc);
>  	mem_cgroup_charge_statistics(memcg, false, -1);
>  	ClearPageCgroupUsed(pc);
>  	unlock_page_cgroup(pc);
> @@ -3370,7 +3373,7 @@ void mem_cgroup_replace_page_cache(struct page *oldpage,
>  	/*
>  	 * Even if newpage->mapping was NULL before starting replacement,
>  	 * the newpage may be on LRU(or pagevec for LRU) already. We lock
> -	 * LRU while we overwrite pc->mem_cgroup.
> +	 * LRU while we overwrite pc's mem_cgroup.
>  	 */
>  	__mem_cgroup_commit_charge(memcg, newpage, 1, pc, type, true);
>  }
> @@ -3406,7 +3409,7 @@ void mem_cgroup_print_bad_page(struct page *page)
>  	pc = lookup_page_cgroup_used(page);
>  	if (pc) {
>  		printk(KERN_ALERT "pc:%p pc->flags:%lx pc->mem_cgroup:%p\n",
> -		       pc, pc->flags, pc->mem_cgroup);
> +		       pc, pc->flags, pc_to_mem_cgroup(pc));
>  	}
>  }
>  #endif
> @@ -5197,7 +5200,7 @@ static int is_target_pte_for_mc(struct vm_area_struct *vma,
>  		 * mem_cgroup_move_account() checks the pc is valid or not under
>  		 * the lock.
>  		 */
> -		if (PageCgroupUsed(pc) && pc->mem_cgroup == mc.from) {
> +		if (PageCgroupUsed(pc) && pc_to_mem_cgroup(pc) == mc.from) {
>  			ret = MC_TARGET_PAGE;
>  			if (target)
>  				target->page = page;
> -- 
> 1.7.4.1
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe cgroups" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH 3/3] memcg: atomic update of memcg pointer and other bits.
  2012-03-19  8:03   ` KAMEZAWA Hiroyuki
@ 2012-03-22 13:38   ` Michal Hocko
  2012-03-23  1:03     ` KAMEZAWA Hiroyuki
  -1 siblings, 1 reply; 32+ messages in thread
From: Michal Hocko @ 2012-03-22 13:38 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, cgroups, Johannes Weiner, Hugh Dickins, Han Ying,
	Glauber Costa, Aneesh Kumar K.V, Andrew Morton, suleiman,
	n-horiguchi, khlebnikov, Tejun Heo

On Mon 19-03-12 17:03:42, KAMEZAWA Hiroyuki wrote:
[...]
> @@ -1237,8 +1237,6 @@ mem_cgroup_get_reclaim_stat_from_page(struct page *page)
>  	pc = lookup_page_cgroup(page);
>  	if (!PageCgroupUsed(pc))
>  		return NULL;
> -	/* Ensure pc's mem_cgroup is visible after reading PCG_USED. */
> -	smp_rmb();
>  	mz = page_cgroup_zoneinfo(pc_to_mem_cgroup(pc), page);
>  	return &mz->reclaim_stat;
>  }
> @@ -2491,16 +2489,7 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *memcg,
>  		}
>  	}
>  
> -	pc_set_mem_cgroup(pc, memcg);
> -	/*
> -	 * We access a page_cgroup asynchronously without lock_page_cgroup().
> -	 * Especially when a page_cgroup is taken from a page, pc's mem_cgroup
> -	 * is accessed after testing USED bit. To make pc's mem_cgroup visible
> -	 * before USED bit, we need memory barrier here.
> -	 * See mem_cgroup_add_lru_list(), etc.
> - 	 */
> -	smp_wmb();
> -	SetPageCgroupUsed(pc);
> +	pc_set_mem_cgroup(pc, memcg, BIT(PCG_USED) | BIT(PCG_LOCK));

This is not nice. Maybe we need two variants (pc_set_mem_cgroup[_flags])?

>  	if (lrucare) {
>  		if (was_on_lru) {
> @@ -2529,7 +2518,6 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *memcg,
>  
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  
> -#define PCGF_NOCOPY_AT_SPLIT ((1 << PCG_LOCK) | (1 << PCG_MIGRATION))
>  /*
>   * Because tail pages are not marked as "used", set it. We're under
>   * zone->lru_lock, 'splitting on pmd' and compound_lock.
> @@ -2547,9 +2535,7 @@ void mem_cgroup_split_huge_fixup(struct page *head)
>  		return;
>  	for (i = 1; i < HPAGE_PMD_NR; i++) {
>  		pc = head_pc + i;
> -		pc_set_mem_cgroup(pc, memcg);
> -		smp_wmb();/* see __commit_charge() */
> -		pc->flags = head_pc->flags & ~PCGF_NOCOPY_AT_SPLIT;
> +		pc_set_mem_cgroup(pc, memcg, BIT(PCG_USED));

Maybe it would be cleaner to remove PCGF_NOCOPY_AT_SPLIT in a separate patch with 
VM_BUG_ON(!(head_pc->flags & BIT(PCG_USED)))?

>  	}
>  }
>  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
> @@ -2616,7 +2602,7 @@ static int mem_cgroup_move_account(struct page *page,
>  		__mem_cgroup_cancel_charge(from, nr_pages);
>  
>  	/* caller should have done css_get */
> -	pc_set_mem_cgroup(pc, to);
> +	pc_set_mem_cgroup(pc, to, BIT(PCG_USED) | BIT(PCG_LOCK));

Same here.
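
For reference, the combined setter this relies on could be as simple as
the following (a hypothetical body, not the actual patch; the point is
that the pointer and flag bits are published by one plain word store,
which is what lets the smp_wmb()/smp_rmb() pairs go away):

static inline void pc_set_mem_cgroup(struct page_cgroup *pc,
				     struct mem_cgroup *memcg,
				     unsigned long flags)
{
	/* a single word store: PCG_USED can never become visible
	 * before the memcg pointer it belongs to */
	ACCESS_ONCE(pc->flags) = (unsigned long)memcg | flags;
}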

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH 3/3] memcg: atomic update of memcg pointer and other bits.
  2012-03-22 13:38   ` Michal Hocko
@ 2012-03-23  1:03     ` KAMEZAWA Hiroyuki
  2012-03-23  8:54       ` Michal Hocko
  0 siblings, 1 reply; 32+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-03-23  1:03 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, cgroups, Johannes Weiner, Hugh Dickins, Han Ying,
	Glauber Costa, Aneesh Kumar K.V, Andrew Morton, suleiman,
	n-horiguchi, khlebnikov, Tejun Heo

(2012/03/22 22:38), Michal Hocko wrote:

> On Mon 19-03-12 17:03:42, KAMEZAWA Hiroyuki wrote:
> [...]
>> @@ -1237,8 +1237,6 @@ mem_cgroup_get_reclaim_stat_from_page(struct page *page)
>>  	pc = lookup_page_cgroup(page);
>>  	if (!PageCgroupUsed(pc))
>>  		return NULL;
>> -	/* Ensure pc's mem_cgroup is visible after reading PCG_USED. */
>> -	smp_rmb();
>>  	mz = page_cgroup_zoneinfo(pc_to_mem_cgroup(pc), page);
>>  	return &mz->reclaim_stat;
>>  }
>> @@ -2491,16 +2489,7 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *memcg,
>>  		}
>>  	}
>>  
>> -	pc_set_mem_cgroup(pc, memcg);
>> -	/*
>> -	 * We access a page_cgroup asynchronously without lock_page_cgroup().
>> -	 * Especially when a page_cgroup is taken from a page, pc's mem_cgroup
>> -	 * is accessed after testing USED bit. To make pc's mem_cgroup visible
>> -	 * before USED bit, we need memory barrier here.
>> -	 * See mem_cgroup_add_lru_list(), etc.
>> - 	 */
>> -	smp_wmb();
>> -	SetPageCgroupUsed(pc);
>> +	pc_set_mem_cgroup(pc, memcg, BIT(PCG_USED) | BIT(PCG_LOCK));
> 
> This is not nice. Maybe we need two variants (pc_set_mem_cgroup[_flags])?
> 


Sure. I'll add that.


>>  	if (lrucare) {
>>  		if (was_on_lru) {
>> @@ -2529,7 +2518,6 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *memcg,
>>  
>>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>  
>> -#define PCGF_NOCOPY_AT_SPLIT ((1 << PCG_LOCK) | (1 << PCG_MIGRATION))
>>  /*
>>   * Because tail pages are not marked as "used", set it. We're under
>>   * zone->lru_lock, 'splitting on pmd' and compound_lock.
>> @@ -2547,9 +2535,7 @@ void mem_cgroup_split_huge_fixup(struct page *head)
>>  		return;
>>  	for (i = 1; i < HPAGE_PMD_NR; i++) {
>>  		pc = head_pc + i;
>> -		pc_set_mem_cgroup(pc, memcg);
>> -		smp_wmb();/* see __commit_charge() */
>> -		pc->flags = head_pc->flags & ~PCGF_NOCOPY_AT_SPLIT;
>> +		pc_set_mem_cgroup(pc, memcg, BIT(PCG_USED));
> 
> Maybe it would be cleaner to remove PCGF_NOCOPY_AT_SPLIT in a separate patch with 
> > VM_BUG_ON(!(head_pc->flags & BIT(PCG_USED)))?
> 


Hm, ok. I'll divide this patch.

>>  	}
>>  }
>>  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>> @@ -2616,7 +2602,7 @@ static int mem_cgroup_move_account(struct page *page,
>>  		__mem_cgroup_cancel_charge(from, nr_pages);
>>  
>>  	/* caller should have done css_get */
>> -	pc_set_mem_cgroup(pc, to);
>> +	pc_set_mem_cgroup(pc, to, BIT(PCG_USED) | BIT(PCG_LOCK));
> 
> Same here.
> 


pc_set_mem_cgroup_flags() ?

Thanks,
-Kame

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC][PATCH 3/3] memcg: atomic update of memcg pointer and other bits.
  2012-03-23  1:03     ` KAMEZAWA Hiroyuki
@ 2012-03-23  8:54       ` Michal Hocko
  0 siblings, 0 replies; 32+ messages in thread
From: Michal Hocko @ 2012-03-23  8:54 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, cgroups, Johannes Weiner, Hugh Dickins, Han Ying,
	Glauber Costa, Aneesh Kumar K.V, Andrew Morton, suleiman,
	n-horiguchi, khlebnikov, Tejun Heo

On Fri 23-03-12 10:03:13, KAMEZAWA Hiroyuki wrote:
> (2012/03/22 22:38), Michal Hocko wrote:
[...]
> >>  	if (lrucare) {
> >>  		if (was_on_lru) {
> >> @@ -2529,7 +2518,6 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *memcg,
> >>  
> >>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> >>  
> >> -#define PCGF_NOCOPY_AT_SPLIT ((1 << PCG_LOCK) | (1 << PCG_MIGRATION))
> >>  /*
> >>   * Because tail pages are not marked as "used", set it. We're under
> >>   * zone->lru_lock, 'splitting on pmd' and compound_lock.
> >> @@ -2547,9 +2535,7 @@ void mem_cgroup_split_huge_fixup(struct page *head)
> >>  		return;
> >>  	for (i = 1; i < HPAGE_PMD_NR; i++) {
> >>  		pc = head_pc + i;
> >> -		pc_set_mem_cgroup(pc, memcg);
> >> -		smp_wmb();/* see __commit_charge() */
> >> -		pc->flags = head_pc->flags & ~PCGF_NOCOPY_AT_SPLIT;
> >> +		pc_set_mem_cgroup(pc, memcg, BIT(PCG_USED));
> > 
> > Maybe it would be cleaner to remove PCGF_NOCOPY_AT_SPLIT in a separate patch with 
> > VM_BUG_ON(!(head_pc->flags & BIT(PCG_USED)))?
> > 
> 
> 
> Hm, ok. I'll divide this patch.

Thanks!

> 
> >>  	}
> >>  }
> >>  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
> >> @@ -2616,7 +2602,7 @@ static int mem_cgroup_move_account(struct page *page,
> >>  		__mem_cgroup_cancel_charge(from, nr_pages);
> >>  
> >>  	/* caller should have done css_get */
> >> -	pc_set_mem_cgroup(pc, to);
> >> +	pc_set_mem_cgroup(pc, to, BIT(PCG_USED) | BIT(PCG_LOCK));
> > 
> > Same here.
> > 
> 
> 
> pc_set_mem_cgroup_flags() ?

This sounds like we set only the flags, but to be honest I haven't come up with a
better name that wouldn't be terribly long either.
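
One possible split, as a naming sketch only (pc_init_mem_cgroup() is a
made-up name; PCG_FLAGS_MASK as in the earlier sketch):

/* fresh (re)initialisation: pointer and flags in one store */
static inline void
pc_init_mem_cgroup(struct page_cgroup *pc, struct mem_cgroup *memcg,
		   unsigned long flags)
{
	pc->flags = (unsigned long)memcg | flags;
}

/* update the pointer only, preserving the current flag bits */
static inline void
pc_set_mem_cgroup(struct page_cgroup *pc, struct mem_cgroup *memcg)
{
	pc->flags = (unsigned long)memcg | (pc->flags & PCG_FLAGS_MASK);
}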
-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2012-03-23  8:54 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-03-19  7:56 [RFC][PATCH 0/3] page cgroup diet KAMEZAWA Hiroyuki
2012-03-19  7:59 ` [RFC][PATCH 1/3] memcg: add methods to access pc->mem_cgroup KAMEZAWA Hiroyuki
2012-03-19 10:58   ` Glauber Costa
2012-03-19 12:11     ` KAMEZAWA Hiroyuki
2012-03-19 12:29       ` Glauber Costa
2012-03-19 15:33     ` Michal Hocko
2012-03-19 15:34       ` Glauber Costa
2012-03-21  1:06       ` KAMEZAWA Hiroyuki
2012-03-22 13:11   ` Michal Hocko
2012-03-19  8:01 ` [RFC][PATCH 2/3] memcg: reduce size of struct page_cgroup KAMEZAWA Hiroyuki
2012-03-19 22:20   ` Suleiman Souhlal
2012-03-21  0:47     ` KAMEZAWA Hiroyuki
2012-03-22 13:11   ` Michal Hocko
2012-03-19  8:03 ` [RFC][PATCH 3/3] memcg: atomic update of memcg pointer and other bits. KAMEZAWA Hiroyuki
2012-03-22 13:38   ` Michal Hocko
2012-03-23  1:03     ` KAMEZAWA Hiroyuki
2012-03-23  8:54       ` Michal Hocko
2012-03-19 19:59 ` [RFC][PATCH 0/3] page cgroup diet Konstantin Khlebnikov
2012-03-21  1:02   ` KAMEZAWA Hiroyuki
2012-03-21  6:13     ` Konstantin Khlebnikov
2012-03-21  6:30       ` KAMEZAWA Hiroyuki
