* [patch 0/8] mm: memcg fixlets for 3.3
@ 2011-11-23 15:42 Johannes Weiner
  2011-11-23 15:42 ` [patch 1/8] mm: oom_kill: remove memcg argument from oom_kill_task() Johannes Weiner
                   ` (8 more replies)
  0 siblings, 9 replies; 45+ messages in thread
From: Johannes Weiner @ 2011-11-23 15:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: KAMEZAWA Hiroyuki, Michal Hocko, Balbir Singh, cgroups, linux-mm,
	linux-kernel

Here are some minor memcg-related cleanups and optimizations, nothing
too exciting.  The bulk of the diffstat comes from renaming the
remaining variables that describe a (struct mem_cgroup *) to "memcg".
The rest cuts down on the (un)charge fastpaths, as people are starting
to get annoyed by those functions showing up in the profiles of their
non-memcg workloads.  More is to come, but I wanted to get the more
obvious bits out of the way.

 include/linux/memcontrol.h  |   16 ++++----
 include/linux/oom.h         |    2 +-
 include/linux/page_cgroup.h |   20 ++++++---
 include/linux/rmap.h        |    4 +-
 mm/memcontrol.c             |   97 ++++++++++++++++---------------------------
 mm/oom_kill.c               |   42 +++++++++---------
 mm/rmap.c                   |   20 ++++----
 mm/swapfile.c               |    9 ++--
 mm/vmscan.c                 |   12 +++---
 9 files changed, 103 insertions(+), 119 deletions(-)


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [patch 1/8] mm: oom_kill: remove memcg argument from oom_kill_task()
  2011-11-23 15:42 [patch 0/8] mm: memcg fixlets for 3.3 Johannes Weiner
@ 2011-11-23 15:42 ` Johannes Weiner
  2011-11-23 22:35   ` David Rientjes
                     ` (3 more replies)
  2011-11-23 15:42 ` [patch 2/8] mm: unify remaining mem_cont, mem, etc. variable names to memcg Johannes Weiner
                   ` (7 subsequent siblings)
  8 siblings, 4 replies; 45+ messages in thread
From: Johannes Weiner @ 2011-11-23 15:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: KAMEZAWA Hiroyuki, Michal Hocko, Balbir Singh, cgroups, linux-mm,
	linux-kernel

From: Johannes Weiner <jweiner@redhat.com>

The memcg argument of oom_kill_task() hasn't been used since 341aea2
'oom-kill: remove boost_dying_task_prio()'.  Kill it.

Signed-off-by: Johannes Weiner <jweiner@redhat.com>
---
 mm/oom_kill.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 471dedb..fd9e303 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -423,7 +423,7 @@ static void dump_header(struct task_struct *p, gfp_t gfp_mask, int order,
 }
 
 #define K(x) ((x) << (PAGE_SHIFT-10))
-static int oom_kill_task(struct task_struct *p, struct mem_cgroup *mem)
+static int oom_kill_task(struct task_struct *p)
 {
 	struct task_struct *q;
 	struct mm_struct *mm;
@@ -522,7 +522,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 		}
 	} while_each_thread(p, t);
 
-	return oom_kill_task(victim, mem);
+	return oom_kill_task(victim);
 }
 
 /*
-- 
1.7.6.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [patch 2/8] mm: unify remaining mem_cont, mem, etc. variable names to memcg
  2011-11-23 15:42 [patch 0/8] mm: memcg fixlets for 3.3 Johannes Weiner
  2011-11-23 15:42 ` [patch 1/8] mm: oom_kill: remove memcg argument from oom_kill_task() Johannes Weiner
@ 2011-11-23 15:42 ` Johannes Weiner
  2011-11-23 22:40   ` David Rientjes
                     ` (3 more replies)
  2011-11-23 15:42 ` [patch 3/8] mm: memcg: clean up fault accounting Johannes Weiner
                   ` (6 subsequent siblings)
  8 siblings, 4 replies; 45+ messages in thread
From: Johannes Weiner @ 2011-11-23 15:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: KAMEZAWA Hiroyuki, Michal Hocko, Balbir Singh, cgroups, linux-mm,
	linux-kernel

From: Johannes Weiner <jweiner@redhat.com>

Signed-off-by: Johannes Weiner <jweiner@redhat.com>
---
 include/linux/memcontrol.h |   16 ++++++------
 include/linux/oom.h        |    2 +-
 include/linux/rmap.h       |    4 +-
 mm/memcontrol.c            |   52 ++++++++++++++++++++++---------------------
 mm/oom_kill.c              |   38 ++++++++++++++++----------------
 mm/rmap.c                  |   20 ++++++++--------
 mm/swapfile.c              |    9 ++++---
 mm/vmscan.c                |   12 +++++-----
 8 files changed, 78 insertions(+), 75 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 2bf7698..a072ebe 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -54,10 +54,10 @@ extern int mem_cgroup_newpage_charge(struct page *page, struct mm_struct *mm,
 				gfp_t gfp_mask);
 /* for swap handling */
 extern int mem_cgroup_try_charge_swapin(struct mm_struct *mm,
-		struct page *page, gfp_t mask, struct mem_cgroup **ptr);
+		struct page *page, gfp_t mask, struct mem_cgroup **memcgp);
 extern void mem_cgroup_commit_charge_swapin(struct page *page,
-					struct mem_cgroup *ptr);
-extern void mem_cgroup_cancel_charge_swapin(struct mem_cgroup *ptr);
+					struct mem_cgroup *memcg);
+extern void mem_cgroup_cancel_charge_swapin(struct mem_cgroup *memcg);
 
 extern int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
 					gfp_t gfp_mask);
@@ -98,7 +98,7 @@ extern struct cgroup_subsys_state *mem_cgroup_css(struct mem_cgroup *memcg);
 
 extern int
 mem_cgroup_prepare_migration(struct page *page,
-	struct page *newpage, struct mem_cgroup **ptr, gfp_t gfp_mask);
+	struct page *newpage, struct mem_cgroup **memcgp, gfp_t gfp_mask);
 extern void mem_cgroup_end_migration(struct mem_cgroup *memcg,
 	struct page *oldpage, struct page *newpage, bool migration_ok);
 
@@ -181,17 +181,17 @@ static inline int mem_cgroup_cache_charge(struct page *page,
 }
 
 static inline int mem_cgroup_try_charge_swapin(struct mm_struct *mm,
-		struct page *page, gfp_t gfp_mask, struct mem_cgroup **ptr)
+		struct page *page, gfp_t gfp_mask, struct mem_cgroup **memcgp)
 {
 	return 0;
 }
 
 static inline void mem_cgroup_commit_charge_swapin(struct page *page,
-					  struct mem_cgroup *ptr)
+					  struct mem_cgroup *memcg)
 {
 }
 
-static inline void mem_cgroup_cancel_charge_swapin(struct mem_cgroup *ptr)
+static inline void mem_cgroup_cancel_charge_swapin(struct mem_cgroup *memcg)
 {
 }
 
@@ -270,7 +270,7 @@ static inline struct cgroup_subsys_state
 
 static inline int
 mem_cgroup_prepare_migration(struct page *page, struct page *newpage,
-	struct mem_cgroup **ptr, gfp_t gfp_mask)
+	struct mem_cgroup **memcgp, gfp_t gfp_mask)
 {
 	return 0;
 }
diff --git a/include/linux/oom.h b/include/linux/oom.h
index 6f9d04a..552fba9 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -43,7 +43,7 @@ enum oom_constraint {
 extern void compare_swap_oom_score_adj(int old_val, int new_val);
 extern int test_set_oom_score_adj(int new_val);
 
-extern unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
+extern unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
 			const nodemask_t *nodemask, unsigned long totalpages);
 extern int try_set_zonelist_oom(struct zonelist *zonelist, gfp_t gfp_flags);
 extern void clear_zonelist_oom(struct zonelist *zonelist, gfp_t gfp_flags);
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 2148b12..5ee84fb 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -157,7 +157,7 @@ static inline void page_dup_rmap(struct page *page)
  * Called from mm/vmscan.c to handle paging out
  */
 int page_referenced(struct page *, int is_locked,
-			struct mem_cgroup *cnt, unsigned long *vm_flags);
+			struct mem_cgroup *memcg, unsigned long *vm_flags);
 int page_referenced_one(struct page *, struct vm_area_struct *,
 	unsigned long address, unsigned int *mapcount, unsigned long *vm_flags);
 
@@ -235,7 +235,7 @@ int rmap_walk(struct page *page, int (*rmap_one)(struct page *,
 #define anon_vma_link(vma)	do {} while (0)
 
 static inline int page_referenced(struct page *page, int is_locked,
-				  struct mem_cgroup *cnt,
+				  struct mem_cgroup *memcg,
 				  unsigned long *vm_flags)
 {
 	*vm_flags = 0;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f524660..473b99f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2778,12 +2778,12 @@ int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
  */
 int mem_cgroup_try_charge_swapin(struct mm_struct *mm,
 				 struct page *page,
-				 gfp_t mask, struct mem_cgroup **ptr)
+				 gfp_t mask, struct mem_cgroup **memcgp)
 {
 	struct mem_cgroup *memcg;
 	int ret;
 
-	*ptr = NULL;
+	*memcgp = NULL;
 
 	if (mem_cgroup_disabled())
 		return 0;
@@ -2801,27 +2801,27 @@ int mem_cgroup_try_charge_swapin(struct mm_struct *mm,
 	memcg = try_get_mem_cgroup_from_page(page);
 	if (!memcg)
 		goto charge_cur_mm;
-	*ptr = memcg;
-	ret = __mem_cgroup_try_charge(NULL, mask, 1, ptr, true);
+	*memcgp = memcg;
+	ret = __mem_cgroup_try_charge(NULL, mask, 1, memcgp, true);
 	css_put(&memcg->css);
 	return ret;
 charge_cur_mm:
 	if (unlikely(!mm))
 		mm = &init_mm;
-	return __mem_cgroup_try_charge(mm, mask, 1, ptr, true);
+	return __mem_cgroup_try_charge(mm, mask, 1, memcgp, true);
 }
 
 static void
-__mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *ptr,
+__mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *memcg,
 					enum charge_type ctype)
 {
 	if (mem_cgroup_disabled())
 		return;
-	if (!ptr)
+	if (!memcg)
 		return;
-	cgroup_exclude_rmdir(&ptr->css);
+	cgroup_exclude_rmdir(&memcg->css);
 
-	__mem_cgroup_commit_charge_lrucare(page, ptr, ctype);
+	__mem_cgroup_commit_charge_lrucare(page, memcg, ctype);
 	/*
 	 * Now swap is on-memory. This means this page may be
 	 * counted both as mem and swap....double count.
@@ -2831,21 +2831,22 @@ __mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *ptr,
 	 */
 	if (do_swap_account && PageSwapCache(page)) {
 		swp_entry_t ent = {.val = page_private(page)};
+		struct mem_cgroup *swap_memcg;
 		unsigned short id;
-		struct mem_cgroup *memcg;
 
 		id = swap_cgroup_record(ent, 0);
 		rcu_read_lock();
-		memcg = mem_cgroup_lookup(id);
-		if (memcg) {
+		swap_memcg = mem_cgroup_lookup(id);
+		if (swap_memcg) {
 			/*
 			 * This recorded memcg can be obsolete one. So, avoid
 			 * calling css_tryget
 			 */
-			if (!mem_cgroup_is_root(memcg))
-				res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
-			mem_cgroup_swap_statistics(memcg, false);
-			mem_cgroup_put(memcg);
+			if (!mem_cgroup_is_root(swap_memcg))
+				res_counter_uncharge(&swap_memcg->memsw,
+						     PAGE_SIZE);
+			mem_cgroup_swap_statistics(swap_memcg, false);
+			mem_cgroup_put(swap_memcg);
 		}
 		rcu_read_unlock();
 	}
@@ -2854,13 +2855,14 @@ __mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *ptr,
 	 * So, rmdir()->pre_destroy() can be called while we do this charge.
 	 * In that case, we need to call pre_destroy() again. check it here.
 	 */
-	cgroup_release_and_wakeup_rmdir(&ptr->css);
+	cgroup_release_and_wakeup_rmdir(&memcg->css);
 }
 
-void mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *ptr)
+void mem_cgroup_commit_charge_swapin(struct page *page,
+				     struct mem_cgroup *memcg)
 {
-	__mem_cgroup_commit_charge_swapin(page, ptr,
-					MEM_CGROUP_CHARGE_TYPE_MAPPED);
+	__mem_cgroup_commit_charge_swapin(page, memcg,
+					  MEM_CGROUP_CHARGE_TYPE_MAPPED);
 }
 
 void mem_cgroup_cancel_charge_swapin(struct mem_cgroup *memcg)
@@ -3189,14 +3191,14 @@ static inline int mem_cgroup_move_swap_account(swp_entry_t entry,
  * page belongs to.
  */
 int mem_cgroup_prepare_migration(struct page *page,
-	struct page *newpage, struct mem_cgroup **ptr, gfp_t gfp_mask)
+	struct page *newpage, struct mem_cgroup **memcgp, gfp_t gfp_mask)
 {
 	struct mem_cgroup *memcg = NULL;
 	struct page_cgroup *pc;
 	enum charge_type ctype;
 	int ret = 0;
 
-	*ptr = NULL;
+	*memcgp = NULL;
 
 	VM_BUG_ON(PageTransHuge(page));
 	if (mem_cgroup_disabled())
@@ -3247,10 +3249,10 @@ int mem_cgroup_prepare_migration(struct page *page,
 	if (!memcg)
 		return 0;
 
-	*ptr = memcg;
-	ret = __mem_cgroup_try_charge(NULL, gfp_mask, 1, ptr, false);
+	*memcgp = memcg;
+	ret = __mem_cgroup_try_charge(NULL, gfp_mask, 1, memcgp, false);
 	css_put(&memcg->css);/* drop extra refcnt */
-	if (ret || *ptr == NULL) {
+	if (ret || *memcgp == NULL) {
 		if (PageAnon(page)) {
 			lock_page_cgroup(pc);
 			ClearPageCgroupMigration(pc);
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index fd9e303..307351e 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -146,7 +146,7 @@ struct task_struct *find_lock_task_mm(struct task_struct *p)
 
 /* return true if the task is not adequate as candidate victim task. */
 static bool oom_unkillable_task(struct task_struct *p,
-		const struct mem_cgroup *mem, const nodemask_t *nodemask)
+		const struct mem_cgroup *memcg, const nodemask_t *nodemask)
 {
 	if (is_global_init(p))
 		return true;
@@ -154,7 +154,7 @@ static bool oom_unkillable_task(struct task_struct *p,
 		return true;
 
 	/* When mem_cgroup_out_of_memory() and p is not member of the group */
-	if (mem && !task_in_mem_cgroup(p, mem))
+	if (memcg && !task_in_mem_cgroup(p, memcg))
 		return true;
 
 	/* p may not have freeable memory in nodemask */
@@ -173,12 +173,12 @@ static bool oom_unkillable_task(struct task_struct *p,
  * predictable as possible.  The goal is to return the highest value for the
  * task consuming the most memory to avoid subsequent oom failures.
  */
-unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
+unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
 		      const nodemask_t *nodemask, unsigned long totalpages)
 {
 	int points;
 
-	if (oom_unkillable_task(p, mem, nodemask))
+	if (oom_unkillable_task(p, memcg, nodemask))
 		return 0;
 
 	p = find_lock_task_mm(p);
@@ -297,7 +297,7 @@ static enum oom_constraint constrained_alloc(struct zonelist *zonelist,
  * (not docbooked, we don't want this one cluttering up the manual)
  */
 static struct task_struct *select_bad_process(unsigned int *ppoints,
-		unsigned long totalpages, struct mem_cgroup *mem,
+		unsigned long totalpages, struct mem_cgroup *memcg,
 		const nodemask_t *nodemask)
 {
 	struct task_struct *g, *p;
@@ -309,7 +309,7 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
 
 		if (p->exit_state)
 			continue;
-		if (oom_unkillable_task(p, mem, nodemask))
+		if (oom_unkillable_task(p, memcg, nodemask))
 			continue;
 
 		/*
@@ -353,7 +353,7 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
 			}
 		}
 
-		points = oom_badness(p, mem, nodemask, totalpages);
+		points = oom_badness(p, memcg, nodemask, totalpages);
 		if (points > *ppoints) {
 			chosen = p;
 			*ppoints = points;
@@ -376,14 +376,14 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
  *
  * Call with tasklist_lock read-locked.
  */
-static void dump_tasks(const struct mem_cgroup *mem, const nodemask_t *nodemask)
+static void dump_tasks(const struct mem_cgroup *memcg, const nodemask_t *nodemask)
 {
 	struct task_struct *p;
 	struct task_struct *task;
 
 	pr_info("[ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name\n");
 	for_each_process(p) {
-		if (oom_unkillable_task(p, mem, nodemask))
+		if (oom_unkillable_task(p, memcg, nodemask))
 			continue;
 
 		task = find_lock_task_mm(p);
@@ -406,7 +406,7 @@ static void dump_tasks(const struct mem_cgroup *mem, const nodemask_t *nodemask)
 }
 
 static void dump_header(struct task_struct *p, gfp_t gfp_mask, int order,
-			struct mem_cgroup *mem, const nodemask_t *nodemask)
+			struct mem_cgroup *memcg, const nodemask_t *nodemask)
 {
 	task_lock(current);
 	pr_warning("%s invoked oom-killer: gfp_mask=0x%x, order=%d, "
@@ -416,10 +416,10 @@ static void dump_header(struct task_struct *p, gfp_t gfp_mask, int order,
 	cpuset_print_task_mems_allowed(current);
 	task_unlock(current);
 	dump_stack();
-	mem_cgroup_print_oom_info(mem, p);
+	mem_cgroup_print_oom_info(memcg, p);
 	show_mem(SHOW_MEM_FILTER_NODES);
 	if (sysctl_oom_dump_tasks)
-		dump_tasks(mem, nodemask);
+		dump_tasks(memcg, nodemask);
 }
 
 #define K(x) ((x) << (PAGE_SHIFT-10))
@@ -473,7 +473,7 @@ static int oom_kill_task(struct task_struct *p)
 
 static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 			    unsigned int points, unsigned long totalpages,
-			    struct mem_cgroup *mem, nodemask_t *nodemask,
+			    struct mem_cgroup *memcg, nodemask_t *nodemask,
 			    const char *message)
 {
 	struct task_struct *victim = p;
@@ -482,7 +482,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 	unsigned int victim_points = 0;
 
 	if (printk_ratelimit())
-		dump_header(p, gfp_mask, order, mem, nodemask);
+		dump_header(p, gfp_mask, order, memcg, nodemask);
 
 	/*
 	 * If the task is already exiting, don't alarm the sysadmin or kill
@@ -513,7 +513,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 			/*
 			 * oom_badness() returns 0 if the thread is unkillable
 			 */
-			child_points = oom_badness(child, mem, nodemask,
+			child_points = oom_badness(child, memcg, nodemask,
 								totalpages);
 			if (child_points > victim_points) {
 				victim = child;
@@ -550,7 +550,7 @@ static void check_panic_on_oom(enum oom_constraint constraint, gfp_t gfp_mask,
 }
 
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR
-void mem_cgroup_out_of_memory(struct mem_cgroup *mem, gfp_t gfp_mask)
+void mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask)
 {
 	unsigned long limit;
 	unsigned int points = 0;
@@ -567,14 +567,14 @@ void mem_cgroup_out_of_memory(struct mem_cgroup *mem, gfp_t gfp_mask)
 	}
 
 	check_panic_on_oom(CONSTRAINT_MEMCG, gfp_mask, 0, NULL);
-	limit = mem_cgroup_get_limit(mem) >> PAGE_SHIFT;
+	limit = mem_cgroup_get_limit(memcg) >> PAGE_SHIFT;
 	read_lock(&tasklist_lock);
 retry:
-	p = select_bad_process(&points, limit, mem, NULL);
+	p = select_bad_process(&points, limit, memcg, NULL);
 	if (!p || PTR_ERR(p) == -1UL)
 		goto out;
 
-	if (oom_kill_process(p, gfp_mask, 0, points, limit, mem, NULL,
+	if (oom_kill_process(p, gfp_mask, 0, points, limit, memcg, NULL,
 				"Memory cgroup out of memory"))
 		goto retry;
 out:
diff --git a/mm/rmap.c b/mm/rmap.c
index a4fd368..c13791b 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -728,7 +728,7 @@ out:
 }
 
 static int page_referenced_anon(struct page *page,
-				struct mem_cgroup *mem_cont,
+				struct mem_cgroup *memcg,
 				unsigned long *vm_flags)
 {
 	unsigned int mapcount;
@@ -751,7 +751,7 @@ static int page_referenced_anon(struct page *page,
 		 * counting on behalf of references from different
 		 * cgroups
 		 */
-		if (mem_cont && !mm_match_cgroup(vma->vm_mm, mem_cont))
+		if (memcg && !mm_match_cgroup(vma->vm_mm, memcg))
 			continue;
 		referenced += page_referenced_one(page, vma, address,
 						  &mapcount, vm_flags);
@@ -766,7 +766,7 @@ static int page_referenced_anon(struct page *page,
 /**
  * page_referenced_file - referenced check for object-based rmap
  * @page: the page we're checking references on.
- * @mem_cont: target memory controller
+ * @memcg: target memory control group
  * @vm_flags: collect encountered vma->vm_flags who actually referenced the page
  *
  * For an object-based mapped page, find all the places it is mapped and
@@ -777,7 +777,7 @@ static int page_referenced_anon(struct page *page,
  * This function is only called from page_referenced for object-based pages.
  */
 static int page_referenced_file(struct page *page,
-				struct mem_cgroup *mem_cont,
+				struct mem_cgroup *memcg,
 				unsigned long *vm_flags)
 {
 	unsigned int mapcount;
@@ -819,7 +819,7 @@ static int page_referenced_file(struct page *page,
 		 * counting on behalf of references from different
 		 * cgroups
 		 */
-		if (mem_cont && !mm_match_cgroup(vma->vm_mm, mem_cont))
+		if (memcg && !mm_match_cgroup(vma->vm_mm, memcg))
 			continue;
 		referenced += page_referenced_one(page, vma, address,
 						  &mapcount, vm_flags);
@@ -835,7 +835,7 @@ static int page_referenced_file(struct page *page,
  * page_referenced - test if the page was referenced
  * @page: the page to test
  * @is_locked: caller holds lock on the page
- * @mem_cont: target memory controller
+ * @memcg: target memory cgroup
  * @vm_flags: collect encountered vma->vm_flags who actually referenced the page
  *
  * Quick test_and_clear_referenced for all mappings to a page,
@@ -843,7 +843,7 @@ static int page_referenced_file(struct page *page,
  */
 int page_referenced(struct page *page,
 		    int is_locked,
-		    struct mem_cgroup *mem_cont,
+		    struct mem_cgroup *memcg,
 		    unsigned long *vm_flags)
 {
 	int referenced = 0;
@@ -859,13 +859,13 @@ int page_referenced(struct page *page,
 			}
 		}
 		if (unlikely(PageKsm(page)))
-			referenced += page_referenced_ksm(page, mem_cont,
+			referenced += page_referenced_ksm(page, memcg,
 								vm_flags);
 		else if (PageAnon(page))
-			referenced += page_referenced_anon(page, mem_cont,
+			referenced += page_referenced_anon(page, memcg,
 								vm_flags);
 		else if (page->mapping)
-			referenced += page_referenced_file(page, mem_cont,
+			referenced += page_referenced_file(page, memcg,
 								vm_flags);
 		if (we_locked)
 			unlock_page(page);
diff --git a/mm/swapfile.c b/mm/swapfile.c
index b1cd120..c2e1312 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -847,12 +847,13 @@ unsigned int count_swap_pages(int type, int free)
 static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
 		unsigned long addr, swp_entry_t entry, struct page *page)
 {
-	struct mem_cgroup *ptr;
+	struct mem_cgroup *memcg;
 	spinlock_t *ptl;
 	pte_t *pte;
 	int ret = 1;
 
-	if (mem_cgroup_try_charge_swapin(vma->vm_mm, page, GFP_KERNEL, &ptr)) {
+	if (mem_cgroup_try_charge_swapin(vma->vm_mm, page,
+					 GFP_KERNEL, &memcg)) {
 		ret = -ENOMEM;
 		goto out_nolock;
 	}
@@ -860,7 +861,7 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
 	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
 	if (unlikely(!pte_same(*pte, swp_entry_to_pte(entry)))) {
 		if (ret > 0)
-			mem_cgroup_cancel_charge_swapin(ptr);
+			mem_cgroup_cancel_charge_swapin(memcg);
 		ret = 0;
 		goto out;
 	}
@@ -871,7 +872,7 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
 	set_pte_at(vma->vm_mm, addr, pte,
 		   pte_mkold(mk_pte(page, vma->vm_page_prot)));
 	page_add_anon_rmap(page, vma, addr);
-	mem_cgroup_commit_charge_swapin(page, ptr);
+	mem_cgroup_commit_charge_swapin(page, memcg);
 	swap_free(entry);
 	/*
 	 * Move the page to the active list so it is not
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 21ce3cb..855c450 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2387,7 +2387,7 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
 
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR
 
-unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
+unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *memcg,
 						gfp_t gfp_mask, bool noswap,
 						struct zone *zone,
 						unsigned long *nr_scanned)
@@ -2399,10 +2399,10 @@ unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
 		.may_unmap = 1,
 		.may_swap = !noswap,
 		.order = 0,
-		.target_mem_cgroup = mem,
+		.target_mem_cgroup = memcg,
 	};
 	struct mem_cgroup_zone mz = {
-		.mem_cgroup = mem,
+		.mem_cgroup = memcg,
 		.zone = zone,
 	};
 
@@ -2428,7 +2428,7 @@ unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
 	return sc.nr_reclaimed;
 }
 
-unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
+unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
 					   gfp_t gfp_mask,
 					   bool noswap)
 {
@@ -2441,7 +2441,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
 		.may_swap = !noswap,
 		.nr_to_reclaim = SWAP_CLUSTER_MAX,
 		.order = 0,
-		.target_mem_cgroup = mem_cont,
+		.target_mem_cgroup = memcg,
 		.nodemask = NULL, /* we don't care the placement */
 		.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
 				(GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK),
@@ -2455,7 +2455,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
 	 * take care of from where we get pages. So the node where we start the
 	 * scan does not need to be the current node.
 	 */
-	nid = mem_cgroup_select_victim_node(mem_cont);
+	nid = mem_cgroup_select_victim_node(memcg);
 
 	zonelist = NODE_DATA(nid)->node_zonelists;
 
-- 
1.7.6.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [patch 3/8] mm: memcg: clean up fault accounting
  2011-11-23 15:42 [patch 0/8] mm: memcg fixlets for 3.3 Johannes Weiner
  2011-11-23 15:42 ` [patch 1/8] mm: oom_kill: remove memcg argument from oom_kill_task() Johannes Weiner
  2011-11-23 15:42 ` [patch 2/8] mm: unify remaining mem_cont, mem, etc. variable names to memcg Johannes Weiner
@ 2011-11-23 15:42 ` Johannes Weiner
  2011-11-24  0:00   ` KAMEZAWA Hiroyuki
                     ` (2 more replies)
  2011-11-23 15:42 ` [patch 4/8] mm: memcg: lookup_page_cgroup (almost) never returns NULL Johannes Weiner
                   ` (5 subsequent siblings)
  8 siblings, 3 replies; 45+ messages in thread
From: Johannes Weiner @ 2011-11-23 15:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: KAMEZAWA Hiroyuki, Michal Hocko, Balbir Singh, cgroups, linux-mm,
	linux-kernel

From: Johannes Weiner <jweiner@redhat.com>

The fault accounting functions have a single, memcg-internal user, so
they don't need to be global.  In fact, their one-line bodies can be
directly folded into the caller.  And since faults happen one at a
time, use this_cpu_inc() directly instead of this_cpu_add(foo, 1).
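
A sketch of what the fold changes at one callsite (event array layout
as in the hunk below):

	/* before: generic per-cpu add with a constant operand */
	this_cpu_add(memcg->stat->events[MEM_CGROUP_EVENTS_PGFAULT], 1);
	/* after: dedicated per-cpu increment, same result */
	this_cpu_inc(memcg->stat->events[MEM_CGROUP_EVENTS_PGFAULT]);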

Signed-off-by: Johannes Weiner <jweiner@redhat.com>
---
 mm/memcontrol.c |   14 ++------------
 1 files changed, 2 insertions(+), 12 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 473b99f..d825af9 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -589,16 +589,6 @@ static void mem_cgroup_swap_statistics(struct mem_cgroup *memcg,
 	this_cpu_add(memcg->stat->count[MEM_CGROUP_STAT_SWAPOUT], val);
 }
 
-void mem_cgroup_pgfault(struct mem_cgroup *memcg, int val)
-{
-	this_cpu_add(memcg->stat->events[MEM_CGROUP_EVENTS_PGFAULT], val);
-}
-
-void mem_cgroup_pgmajfault(struct mem_cgroup *memcg, int val)
-{
-	this_cpu_add(memcg->stat->events[MEM_CGROUP_EVENTS_PGMAJFAULT], val);
-}
-
 static unsigned long mem_cgroup_read_events(struct mem_cgroup *memcg,
 					    enum mem_cgroup_events_index idx)
 {
@@ -913,10 +903,10 @@ void mem_cgroup_count_vm_event(struct mm_struct *mm, enum vm_event_item idx)
 
 	switch (idx) {
 	case PGMAJFAULT:
-		mem_cgroup_pgmajfault(memcg, 1);
+		this_cpu_inc(memcg->stat->events[MEM_CGROUP_EVENTS_PGMAJFAULT]);
 		break;
 	case PGFAULT:
-		mem_cgroup_pgfault(memcg, 1);
+		this_cpu_inc(memcg->stat->events[MEM_CGROUP_EVENTS_PGFAULT]);
 		break;
 	default:
 		BUG();
-- 
1.7.6.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [patch 4/8] mm: memcg: lookup_page_cgroup (almost) never returns NULL
  2011-11-23 15:42 [patch 0/8] mm: memcg fixlets for 3.3 Johannes Weiner
                   ` (2 preceding siblings ...)
  2011-11-23 15:42 ` [patch 3/8] mm: memcg: clean up fault accounting Johannes Weiner
@ 2011-11-23 15:42 ` Johannes Weiner
  2011-11-24  0:01   ` KAMEZAWA Hiroyuki
                     ` (2 more replies)
  2011-11-23 15:42 ` [patch 5/8] mm: memcg: remove unneeded checks from newpage_charge() Johannes Weiner
                   ` (4 subsequent siblings)
  8 siblings, 3 replies; 45+ messages in thread
From: Johannes Weiner @ 2011-11-23 15:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: KAMEZAWA Hiroyuki, Michal Hocko, Balbir Singh, cgroups, linux-mm,
	linux-kernel

From: Johannes Weiner <jweiner@redhat.com>

Pages have their corresponding page_cgroup descriptors set up before
they are used in userspace, and thus managed by a memory cgroup.

The only time where lookup_page_cgroup() can return NULL is in the
page sanity checking code that executes while feeding pages into the
page allocator for the first time.

Remove the NULL checks against lookup_page_cgroup() results from all
callsites where we know that corresponding page_cgroup descriptors
must be allocated.

Signed-off-by: Johannes Weiner <jweiner@redhat.com>
---
 mm/memcontrol.c |    8 ++------
 1 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d825af9..d4d139a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1894,9 +1894,6 @@ void mem_cgroup_update_page_stat(struct page *page,
 	bool need_unlock = false;
 	unsigned long uninitialized_var(flags);
 
-	if (unlikely(!pc))
-		return;
-
 	rcu_read_lock();
 	memcg = pc->mem_cgroup;
 	if (unlikely(!memcg || !PageCgroupUsed(pc)))
@@ -2669,8 +2666,6 @@ static int mem_cgroup_charge_common(struct page *page, struct mm_struct *mm,
 	}
 
 	pc = lookup_page_cgroup(page);
-	BUG_ON(!pc); /* XXX: remove this and move pc lookup into commit */
-
 	ret = __mem_cgroup_try_charge(mm, gfp_mask, nr_pages, &memcg, oom);
 	if (ret || !memcg)
 		return ret;
@@ -2942,7 +2937,7 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
 	 * Check if our page_cgroup is valid
 	 */
 	pc = lookup_page_cgroup(page);
-	if (unlikely(!pc || !PageCgroupUsed(pc)))
+	if (unlikely(!PageCgroupUsed(pc)))
 		return NULL;
 
 	lock_page_cgroup(pc);
@@ -3326,6 +3321,7 @@ static struct page_cgroup *lookup_page_cgroup_used(struct page *page)
 	struct page_cgroup *pc;
 
 	pc = lookup_page_cgroup(page);
+	/* Can be NULL while bootstrapping the page allocator */
 	if (likely(pc) && PageCgroupUsed(pc))
 		return pc;
 	return NULL;
-- 
1.7.6.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [patch 5/8] mm: memcg: remove unneeded checks from newpage_charge()
  2011-11-23 15:42 [patch 0/8] mm: memcg fixlets for 3.3 Johannes Weiner
                   ` (3 preceding siblings ...)
  2011-11-23 15:42 ` [patch 4/8] mm: memcg: lookup_page_cgroup (almost) never returns NULL Johannes Weiner
@ 2011-11-23 15:42 ` Johannes Weiner
  2011-11-24  0:04   ` KAMEZAWA Hiroyuki
  2011-11-23 15:42 ` [patch 6/8] mm: memcg: remove unneeded checks from uncharge_page() Johannes Weiner
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 45+ messages in thread
From: Johannes Weiner @ 2011-11-23 15:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: KAMEZAWA Hiroyuki, Michal Hocko, Balbir Singh, cgroups, linux-mm,
	linux-kernel

From: Johannes Weiner <jweiner@redhat.com>

All callsites pass in freshly allocated pages and a valid mm.  As a
result, all checks pertaining to the page's mapcount, page->mapping or
the fallback to init_mm are unneeded.
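
For context, the shape of a typical callsite, abridged from
do_anonymous_page() (surrounding code and error labels elided):

	/* the page is freshly allocated and mm is the faulting task's
	 * mm, so none of the removed checks can trigger here */
	page = alloc_zeroed_user_highpage_movable(vma, address);
	if (!page)
		goto oom;
	__SetPageUptodate(page);
	if (mem_cgroup_newpage_charge(page, mm, GFP_KERNEL))
		goto oom_free_page;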

Signed-off-by: Johannes Weiner <jweiner@redhat.com>
---
 mm/memcontrol.c |   13 +------------
 1 files changed, 1 insertions(+), 12 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d4d139a..0d10be4 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2679,19 +2679,8 @@ int mem_cgroup_newpage_charge(struct page *page,
 {
 	if (mem_cgroup_disabled())
 		return 0;
-	/*
-	 * If already mapped, we don't have to account.
-	 * If page cache, page->mapping has address_space.
-	 * But page->mapping may have out-of-use anon_vma pointer,
-	 * detecit it by PageAnon() check. newly-mapped-anon's page->mapping
-	 * is NULL.
-  	 */
-	if (page_mapped(page) || (page->mapping && !PageAnon(page)))
-		return 0;
-	if (unlikely(!mm))
-		mm = &init_mm;
 	return mem_cgroup_charge_common(page, mm, gfp_mask,
-				MEM_CGROUP_CHARGE_TYPE_MAPPED);
+					MEM_CGROUP_CHARGE_TYPE_MAPPED);
 }
 
 static void
-- 
1.7.6.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [patch 6/8] mm: memcg: remove unneeded checks from uncharge_page()
  2011-11-23 15:42 [patch 0/8] mm: memcg fixlets for 3.3 Johannes Weiner
                   ` (4 preceding siblings ...)
  2011-11-23 15:42 ` [patch 5/8] mm: memcg: remove unneeded checks from newpage_charge() Johannes Weiner
@ 2011-11-23 15:42 ` Johannes Weiner
  2011-11-24  0:06   ` KAMEZAWA Hiroyuki
  2011-11-23 15:42 ` [patch 7/8] mm: memcg: modify PageCgroupAcctLRU non-atomically Johannes Weiner
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 45+ messages in thread
From: Johannes Weiner @ 2011-11-23 15:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: KAMEZAWA Hiroyuki, Michal Hocko, Balbir Singh, cgroups, linux-mm,
	linux-kernel

From: Johannes Weiner <jweiner@redhat.com>

mem_cgroup_uncharge_page() is only called on either freshly allocated
pages without page->mapping or on rmapped PageAnon() pages.  There is
no need to check for a page->mapping that is not an anon_vma.

Signed-off-by: Johannes Weiner <jweiner@redhat.com>
---
 mm/memcontrol.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 0d10be4..b9a3b94 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2989,8 +2989,6 @@ void mem_cgroup_uncharge_page(struct page *page)
 	/* early check. */
 	if (page_mapped(page))
 		return;
-	if (page->mapping && !PageAnon(page))
-		return;
 	__mem_cgroup_uncharge_common(page, MEM_CGROUP_CHARGE_TYPE_MAPPED);
 }
 
-- 
1.7.6.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [patch 7/8] mm: memcg: modify PageCgroupAcctLRU non-atomically
  2011-11-23 15:42 [patch 0/8] mm: memcg fixlets for 3.3 Johannes Weiner
                   ` (5 preceding siblings ...)
  2011-11-23 15:42 ` [patch 6/8] mm: memcg: remove unneeded checks from uncharge_page() Johannes Weiner
@ 2011-11-23 15:42 ` Johannes Weiner
  2011-11-23 18:52   ` Hugh Dickins
  2011-11-24  0:09   ` KAMEZAWA Hiroyuki
  2011-11-23 15:42 ` [patch 8/8] mm: memcg: modify PageCgroupCache non-atomically Johannes Weiner
  2011-11-24  6:09 ` [patch 0/8] mm: memcg fixlets for 3.3 Balbir Singh
  8 siblings, 2 replies; 45+ messages in thread
From: Johannes Weiner @ 2011-11-23 15:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: KAMEZAWA Hiroyuki, Michal Hocko, Balbir Singh, cgroups, linux-mm,
	linux-kernel

From: Johannes Weiner <jweiner@redhat.com>

This bit is protected by zone->lru_lock; there is no need for locked
operations when setting and clearing it.
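
The change boils down to the following (sketch; on x86, for instance,
set_bit() emits a lock-prefixed instruction while __set_bit() compiles
to a plain read-modify-write):

	set_bit(PCG_ACCT_LRU, &pc->flags);	/* atomic RMW */
	__set_bit(PCG_ACCT_LRU, &pc->flags);	/* non-atomic; relies on
						 * zone->lru_lock */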

Signed-off-by: Johannes Weiner <jweiner@redhat.com>
---
 include/linux/page_cgroup.h |   16 ++++++++++++----
 mm/memcontrol.c             |    4 ++--
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
index aaa60da..a0bc9d0 100644
--- a/include/linux/page_cgroup.h
+++ b/include/linux/page_cgroup.h
@@ -57,14 +57,23 @@ static inline int PageCgroup##uname(struct page_cgroup *pc)	\
 #define SETPCGFLAG(uname, lname)			\
 static inline void SetPageCgroup##uname(struct page_cgroup *pc)\
 	{ set_bit(PCG_##lname, &pc->flags);  }
+#define __SETPCGFLAG(uname, lname)			\
+static inline void __SetPageCgroup##uname(struct page_cgroup *pc)\
+	{ __set_bit(PCG_##lname, &pc->flags);  }
 
 #define CLEARPCGFLAG(uname, lname)			\
 static inline void ClearPageCgroup##uname(struct page_cgroup *pc)	\
 	{ clear_bit(PCG_##lname, &pc->flags);  }
+#define __CLEARPCGFLAG(uname, lname)			\
+static inline void __ClearPageCgroup##uname(struct page_cgroup *pc)	\
+	{ __clear_bit(PCG_##lname, &pc->flags);  }
 
 #define TESTCLEARPCGFLAG(uname, lname)			\
 static inline int TestClearPageCgroup##uname(struct page_cgroup *pc)	\
 	{ return test_and_clear_bit(PCG_##lname, &pc->flags);  }
+#define __TESTCLEARPCGFLAG(uname, lname)			\
+static inline int __TestClearPageCgroup##uname(struct page_cgroup *pc)	\
+	{ return __test_and_clear_bit(PCG_##lname, &pc->flags);  }
 
 /* Cache flag is set only once (at allocation) */
 TESTPCGFLAG(Cache, CACHE)
@@ -75,11 +84,10 @@ TESTPCGFLAG(Used, USED)
 CLEARPCGFLAG(Used, USED)
 SETPCGFLAG(Used, USED)
 
-SETPCGFLAG(AcctLRU, ACCT_LRU)
-CLEARPCGFLAG(AcctLRU, ACCT_LRU)
+__SETPCGFLAG(AcctLRU, ACCT_LRU)
+__CLEARPCGFLAG(AcctLRU, ACCT_LRU)
 TESTPCGFLAG(AcctLRU, ACCT_LRU)
-TESTCLEARPCGFLAG(AcctLRU, ACCT_LRU)
-
+__TESTCLEARPCGFLAG(AcctLRU, ACCT_LRU)
 
 SETPCGFLAG(FileMapped, FILE_MAPPED)
 CLEARPCGFLAG(FileMapped, FILE_MAPPED)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b9a3b94..51aba19 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -995,7 +995,7 @@ struct lruvec *mem_cgroup_lru_add_list(struct zone *zone, struct page *page,
 		/* Ensure pc->mem_cgroup is visible after reading PCG_USED. */
 		smp_rmb();
 		memcg = pc->mem_cgroup;
-		SetPageCgroupAcctLRU(pc);
+		__SetPageCgroupAcctLRU(pc);
 	} else
 		memcg = root_mem_cgroup;
 	mz = page_cgroup_zoneinfo(memcg, page);
@@ -1031,7 +1031,7 @@ void mem_cgroup_lru_del_list(struct page *page, enum lru_list lru)
 	 * LRU-accounting happened against pc->mem_cgroup or
 	 * root_mem_cgroup.
 	 */
-	if (TestClearPageCgroupAcctLRU(pc)) {
+	if (__TestClearPageCgroupAcctLRU(pc)) {
 		VM_BUG_ON(!pc->mem_cgroup);
 		memcg = pc->mem_cgroup;
 	} else
-- 
1.7.6.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [patch 8/8] mm: memcg: modify PageCgroupCache non-atomically
  2011-11-23 15:42 [patch 0/8] mm: memcg fixlets for 3.3 Johannes Weiner
                   ` (6 preceding siblings ...)
  2011-11-23 15:42 ` [patch 7/8] mm: memcg: modify PageCgroupAcctLRU non-atomically Johannes Weiner
@ 2011-11-23 15:42 ` Johannes Weiner
  2011-11-24  0:13   ` KAMEZAWA Hiroyuki
  2011-11-24  6:09 ` [patch 0/8] mm: memcg fixlets for 3.3 Balbir Singh
  8 siblings, 1 reply; 45+ messages in thread
From: Johannes Weiner @ 2011-11-23 15:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: KAMEZAWA Hiroyuki, Michal Hocko, Balbir Singh, cgroups, linux-mm,
	linux-kernel

From: Johannes Weiner <jweiner@redhat.com>

This bit is protected by lock_page_cgroup(); there is no need for
locked operations when setting and clearing it.
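
The serialization assumption, sketched from
__mem_cgroup_commit_charge() (abridged; see the hunk below):

	lock_page_cgroup(pc);
	/* ... */
	__SetPageCgroupCache(pc);	/* only ever flipped under this lock */
	SetPageCgroupUsed(pc);
	/* ... */
	unlock_page_cgroup(pc);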

Signed-off-by: Johannes Weiner <jweiner@redhat.com>
---
 include/linux/page_cgroup.h |    4 ++--
 mm/memcontrol.c             |    4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
index a0bc9d0..14ddcaf 100644
--- a/include/linux/page_cgroup.h
+++ b/include/linux/page_cgroup.h
@@ -77,8 +77,8 @@ static inline int __TestClearPageCgroup##uname(struct page_cgroup *pc)	\
 
 /* Cache flag is set only once (at allocation) */
 TESTPCGFLAG(Cache, CACHE)
-CLEARPCGFLAG(Cache, CACHE)
-SETPCGFLAG(Cache, CACHE)
+__CLEARPCGFLAG(Cache, CACHE)
+__SETPCGFLAG(Cache, CACHE)
 
 TESTPCGFLAG(Used, USED)
 CLEARPCGFLAG(Used, USED)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 51aba19..8cd1d1c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2444,11 +2444,11 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *memcg,
 	switch (ctype) {
 	case MEM_CGROUP_CHARGE_TYPE_CACHE:
 	case MEM_CGROUP_CHARGE_TYPE_SHMEM:
-		SetPageCgroupCache(pc);
+		__SetPageCgroupCache(pc);
 		SetPageCgroupUsed(pc);
 		break;
 	case MEM_CGROUP_CHARGE_TYPE_MAPPED:
-		ClearPageCgroupCache(pc);
+		__ClearPageCgroupCache(pc);
 		SetPageCgroupUsed(pc);
 		break;
 	default:
-- 
1.7.6.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [patch 7/8] mm: memcg: modify PageCgroupAcctLRU non-atomically
  2011-11-23 15:42 ` [patch 7/8] mm: memcg: modify PageCgroupAcctLRU non-atomically Johannes Weiner
@ 2011-11-23 18:52   ` Hugh Dickins
  2011-11-24  8:53     ` Johannes Weiner
  2011-11-24  0:09   ` KAMEZAWA Hiroyuki
  1 sibling, 1 reply; 45+ messages in thread
From: Hugh Dickins @ 2011-11-23 18:52 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, KAMEZAWA Hiroyuki, Michal Hocko, Balbir Singh,
	cgroups, linux-mm, linux-kernel

On Wed, 23 Nov 2011, Johannes Weiner wrote:

> From: Johannes Weiner <jweiner@redhat.com>
> 
> This bit is protected by zone->lru_lock; there is no need for locked
> operations when setting and clearing it.
> 
> Signed-off-by: Johannes Weiner <jweiner@redhat.com>

Unless there are special considerations which you have not mentioned at
all in the description above, this 7/8 and the similar 8/8 are mistaken.

The atomic operation is not for guaranteeing the setting and clearing
of the bit in question: it's for guaranteeing that you don't accidentally
set or clear any of the other bits in the same word when you're doing so,
if another task is updating them at the same time as you're doing this.

There are circumstances when non-atomic shortcuts can be taken, when
you're sure the field cannot yet be visible to other tasks (we do that
when setting PageLocked on a freshly allocated page, for example - but
even then have to rely on others using get_page_unless_zero properly).
But I don't think that's the case here.
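
Concretely, a hypothetical lost-update interleaving on one flags word,
written out linearly (pc->flags initially 0):

	unsigned long tmp = pc->flags;		/* CPU0 reads 0 */
	set_bit(PCG_FILE_MAPPED, &pc->flags);	/* CPU1 sets its bit atomically */
	tmp |= 1UL << PCG_ACCT_LRU;		/* CPU0 works on its stale copy */
	pc->flags = tmp;			/* CPU0's store wipes FILE_MAPPED */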

Hugh

> ---
>  include/linux/page_cgroup.h |   16 ++++++++++++----
>  mm/memcontrol.c             |    4 ++--
>  2 files changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
> index aaa60da..a0bc9d0 100644
> --- a/include/linux/page_cgroup.h
> +++ b/include/linux/page_cgroup.h
> @@ -57,14 +57,23 @@ static inline int PageCgroup##uname(struct page_cgroup *pc)	\
>  #define SETPCGFLAG(uname, lname)			\
>  static inline void SetPageCgroup##uname(struct page_cgroup *pc)\
>  	{ set_bit(PCG_##lname, &pc->flags);  }
> +#define __SETPCGFLAG(uname, lname)			\
> +static inline void __SetPageCgroup##uname(struct page_cgroup *pc)\
> +	{ __set_bit(PCG_##lname, &pc->flags);  }
>  
>  #define CLEARPCGFLAG(uname, lname)			\
>  static inline void ClearPageCgroup##uname(struct page_cgroup *pc)	\
>  	{ clear_bit(PCG_##lname, &pc->flags);  }
> +#define __CLEARPCGFLAG(uname, lname)			\
> +static inline void __ClearPageCgroup##uname(struct page_cgroup *pc)	\
> +	{ __clear_bit(PCG_##lname, &pc->flags);  }
>  
>  #define TESTCLEARPCGFLAG(uname, lname)			\
>  static inline int TestClearPageCgroup##uname(struct page_cgroup *pc)	\
>  	{ return test_and_clear_bit(PCG_##lname, &pc->flags);  }
> +#define __TESTCLEARPCGFLAG(uname, lname)			\
> +static inline int __TestClearPageCgroup##uname(struct page_cgroup *pc)	\
> +	{ return __test_and_clear_bit(PCG_##lname, &pc->flags);  }
>  
>  /* Cache flag is set only once (at allocation) */
>  TESTPCGFLAG(Cache, CACHE)
> @@ -75,11 +84,10 @@ TESTPCGFLAG(Used, USED)
>  CLEARPCGFLAG(Used, USED)
>  SETPCGFLAG(Used, USED)
>  
> -SETPCGFLAG(AcctLRU, ACCT_LRU)
> -CLEARPCGFLAG(AcctLRU, ACCT_LRU)
> +__SETPCGFLAG(AcctLRU, ACCT_LRU)
> +__CLEARPCGFLAG(AcctLRU, ACCT_LRU)
>  TESTPCGFLAG(AcctLRU, ACCT_LRU)
> -TESTCLEARPCGFLAG(AcctLRU, ACCT_LRU)
> -
> +__TESTCLEARPCGFLAG(AcctLRU, ACCT_LRU)
>  
>  SETPCGFLAG(FileMapped, FILE_MAPPED)
>  CLEARPCGFLAG(FileMapped, FILE_MAPPED)
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index b9a3b94..51aba19 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -995,7 +995,7 @@ struct lruvec *mem_cgroup_lru_add_list(struct zone *zone, struct page *page,
>  		/* Ensure pc->mem_cgroup is visible after reading PCG_USED. */
>  		smp_rmb();
>  		memcg = pc->mem_cgroup;
> -		SetPageCgroupAcctLRU(pc);
> +		__SetPageCgroupAcctLRU(pc);
>  	} else
>  		memcg = root_mem_cgroup;
>  	mz = page_cgroup_zoneinfo(memcg, page);
> @@ -1031,7 +1031,7 @@ void mem_cgroup_lru_del_list(struct page *page, enum lru_list lru)
>  	 * LRU-accounting happened against pc->mem_cgroup or
>  	 * root_mem_cgroup.
>  	 */
> -	if (TestClearPageCgroupAcctLRU(pc)) {
> +	if (__TestClearPageCgroupAcctLRU(pc)) {
>  		VM_BUG_ON(!pc->mem_cgroup);
>  		memcg = pc->mem_cgroup;
>  	} else
> -- 
> 1.7.6.4
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 1/8] mm: oom_kill: remove memcg argument from oom_kill_task()
  2011-11-23 15:42 ` [patch 1/8] mm: oom_kill: remove memcg argument from oom_kill_task() Johannes Weiner
@ 2011-11-23 22:35   ` David Rientjes
  2011-11-23 23:57   ` KAMEZAWA Hiroyuki
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 45+ messages in thread
From: David Rientjes @ 2011-11-23 22:35 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, KAMEZAWA Hiroyuki, Michal Hocko, Balbir Singh,
	cgroups, linux-mm, linux-kernel

On Wed, 23 Nov 2011, Johannes Weiner wrote:

> From: Johannes Weiner <jweiner@redhat.com>
> 
> The memcg argument of oom_kill_task() hasn't been used since 341aea2
> 'oom-kill: remove boost_dying_task_prio()'.  Kill it.
> 
> Signed-off-by: Johannes Weiner <jweiner@redhat.com>

Acked-by: David Rientjes <rientjes@google.com>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 2/8] mm: unify remaining mem_cont, mem, etc. variable names to memcg
  2011-11-23 15:42 ` [patch 2/8] mm: unify remaining mem_cont, mem, etc. variable names to memcg Johannes Weiner
@ 2011-11-23 22:40   ` David Rientjes
  2011-11-23 23:58   ` KAMEZAWA Hiroyuki
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 45+ messages in thread
From: David Rientjes @ 2011-11-23 22:40 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, KAMEZAWA Hiroyuki, Michal Hocko, Balbir Singh,
	cgroups, linux-mm, linux-kernel

On Wed, 23 Nov 2011, Johannes Weiner wrote:

> From: Johannes Weiner <jweiner@redhat.com>
> 
> Signed-off-by: Johannes Weiner <jweiner@redhat.com>

Acked-by: David Rientjes <rientjes@google.com>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 1/8] mm: oom_kill: remove memcg argument from oom_kill_task()
  2011-11-23 15:42 ` [patch 1/8] mm: oom_kill: remove memcg argument from oom_kill_task() Johannes Weiner
  2011-11-23 22:35   ` David Rientjes
@ 2011-11-23 23:57   ` KAMEZAWA Hiroyuki
  2011-11-24  9:07   ` Michal Hocko
  2011-11-28  0:37   ` Balbir Singh
  3 siblings, 0 replies; 45+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-11-23 23:57 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Michal Hocko, Balbir Singh, cgroups, linux-mm,
	linux-kernel

On Wed, 23 Nov 2011 16:42:24 +0100
Johannes Weiner <hannes@cmpxchg.org> wrote:

> From: Johannes Weiner <jweiner@redhat.com>
> 
> The memcg argument of oom_kill_task() hasn't been used since 341aea2
> 'oom-kill: remove boost_dying_task_prio()'.  Kill it.
> 
> Signed-off-by: Johannes Weiner <jweiner@redhat.com>

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> 


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 2/8] mm: unify remaining mem_cont, mem, etc. variable names to memcg
  2011-11-23 15:42 ` [patch 2/8] mm: unify remaining mem_cont, mem, etc. variable names to memcg Johannes Weiner
  2011-11-23 22:40   ` David Rientjes
@ 2011-11-23 23:58   ` KAMEZAWA Hiroyuki
  2011-11-24  9:17   ` Michal Hocko
  2011-11-28  0:42   ` Balbir Singh
  3 siblings, 0 replies; 45+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-11-23 23:58 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Michal Hocko, Balbir Singh, cgroups, linux-mm,
	linux-kernel

On Wed, 23 Nov 2011 16:42:25 +0100
Johannes Weiner <hannes@cmpxchg.org> wrote:

> From: Johannes Weiner <jweiner@redhat.com>
> 
> Signed-off-by: Johannes Weiner <jweiner@redhat.com>

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 3/8] mm: memcg: clean up fault accounting
  2011-11-23 15:42 ` [patch 3/8] mm: memcg: clean up fault accounting Johannes Weiner
@ 2011-11-24  0:00   ` KAMEZAWA Hiroyuki
  2011-11-24  9:33   ` Michal Hocko
  2011-11-28  0:45   ` Balbir Singh
  2 siblings, 0 replies; 45+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-11-24  0:00 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Michal Hocko, Balbir Singh, cgroups, linux-mm,
	linux-kernel

On Wed, 23 Nov 2011 16:42:26 +0100
Johannes Weiner <hannes@cmpxchg.org> wrote:

> From: Johannes Weiner <jweiner@redhat.com>
> 
> The fault accounting functions have a single, memcg-internal user, so
> they don't need to be global.  In fact, their one-line bodies can be
> directly folded into the caller.  And since faults happen one at a
> time, use this_cpu_inc() directly instead of this_cpu_add(foo, 1).
> 
> Signed-off-by: Johannes Weiner <jweiner@redhat.com>

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

I'm not sure why Ying Han used this style.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 4/8] mm: memcg: lookup_page_cgroup (almost) never returns NULL
  2011-11-23 15:42 ` [patch 4/8] mm: memcg: lookup_page_cgroup (almost) never returns NULL Johannes Weiner
@ 2011-11-24  0:01   ` KAMEZAWA Hiroyuki
  2011-11-24  9:52   ` Michal Hocko
  2011-11-28  7:03   ` Balbir Singh
  2 siblings, 0 replies; 45+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-11-24  0:01 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Michal Hocko, Balbir Singh, cgroups, linux-mm,
	linux-kernel

On Wed, 23 Nov 2011 16:42:27 +0100
Johannes Weiner <hannes@cmpxchg.org> wrote:

> From: Johannes Weiner <jweiner@redhat.com>
> 
> Pages have their corresponding page_cgroup descriptors set up before
> they are used in userspace, and thus managed by a memory cgroup.
> 
> The only time where lookup_page_cgroup() can return NULL is in the
> page sanity checking code that executes while feeding pages into the
> page allocator for the first time.
> 
> Remove the NULL checks against lookup_page_cgroup() results from all
> callsites where we know that corresponding page_cgroup descriptors
> must be allocated.
> 
> Signed-off-by: Johannes Weiner <jweiner@redhat.com>

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 5/8] mm: memcg: remove unneeded checks from newpage_charge()
  2011-11-23 15:42 ` [patch 5/8] mm: memcg: remove unneeded checks from newpage_charge() Johannes Weiner
@ 2011-11-24  0:04   ` KAMEZAWA Hiroyuki
  2011-11-24  9:04     ` Johannes Weiner
  0 siblings, 1 reply; 45+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-11-24  0:04 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Michal Hocko, Balbir Singh, cgroups, linux-mm,
	linux-kernel

On Wed, 23 Nov 2011 16:42:28 +0100
Johannes Weiner <hannes@cmpxchg.org> wrote:

> From: Johannes Weiner <jweiner@redhat.com>
> 
> All callsites pass in freshly allocated pages and a valid mm.  As a
> result, all checks pertaining to the page's mapcount, page->mapping or
> the fallback to init_mm are unneeded.
> 
> Signed-off-by: Johannes Weiner <jweiner@redhat.com>

Hmm, it's true now.  But to make our assumption clear to all readers
of the code, could you add

VM_BUG_ON(!mm || page_mapped(page) || (page->mapping && !PageAnon(page))) ?
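
i.e. something like this on top of your patch (sketch):

	int mem_cgroup_newpage_charge(struct page *page,
				      struct mm_struct *mm, gfp_t gfp_mask)
	{
		if (mem_cgroup_disabled())
			return 0;
		VM_BUG_ON(page_mapped(page) ||
			  (page->mapping && !PageAnon(page)) || !mm);
		return mem_cgroup_charge_common(page, mm, gfp_mask,
						MEM_CGROUP_CHARGE_TYPE_MAPPED);
	}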

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

> ---
>  mm/memcontrol.c |   13 +------------
>  1 files changed, 1 insertions(+), 12 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index d4d139a..0d10be4 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2679,19 +2679,8 @@ int mem_cgroup_newpage_charge(struct page *page,
>  {
>  	if (mem_cgroup_disabled())
>  		return 0;
> -	/*
> -	 * If already mapped, we don't have to account.
> -	 * If page cache, page->mapping has address_space.
> -	 * But page->mapping may have out-of-use anon_vma pointer,
> -	 * detecit it by PageAnon() check. newly-mapped-anon's page->mapping
> -	 * is NULL.
> -  	 */
> -	if (page_mapped(page) || (page->mapping && !PageAnon(page)))
> -		return 0;
> -	if (unlikely(!mm))
> -		mm = &init_mm;
>  	return mem_cgroup_charge_common(page, mm, gfp_mask,
> -				MEM_CGROUP_CHARGE_TYPE_MAPPED);
> +					MEM_CGROUP_CHARGE_TYPE_MAPPED);
>  }
>  
>  static void
> -- 
> 1.7.6.4
> 


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 6/8] mm: memcg: remove unneeded checks from uncharge_page()
  2011-11-23 15:42 ` [patch 6/8] mm: memcg: remove unneeded checks from uncharge_page() Johannes Weiner
@ 2011-11-24  0:06   ` KAMEZAWA Hiroyuki
  2011-11-24  9:06     ` Johannes Weiner
  0 siblings, 1 reply; 45+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-11-24  0:06 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Michal Hocko, Balbir Singh, cgroups, linux-mm,
	linux-kernel

On Wed, 23 Nov 2011 16:42:29 +0100
Johannes Weiner <hannes@cmpxchg.org> wrote:

> From: Johannes Weiner <jweiner@redhat.com>
> 
> mem_cgroup_uncharge_page() is only called on either freshly allocated
> pages without page->mapping or on rmapped PageAnon() pages.  There is
> no need to check for a page->mapping that is not an anon_vma.
> 
> Signed-off-by: Johannes Weiner <jweiner@redhat.com>

To make our assumption clearer to readers of the code,
please add VM_BUG_ON(page->mapping && !PageAnon(page)).
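
i.e. (sketch, on top of your patch):

	void mem_cgroup_uncharge_page(struct page *page)
	{
		/* early check. */
		if (page_mapped(page))
			return;
		VM_BUG_ON(page->mapping && !PageAnon(page));
		__mem_cgroup_uncharge_common(page, MEM_CGROUP_CHARGE_TYPE_MAPPED);
	}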

Anyway,
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


> ---
>  mm/memcontrol.c |    2 --
>  1 files changed, 0 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 0d10be4..b9a3b94 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2989,8 +2989,6 @@ void mem_cgroup_uncharge_page(struct page *page)
>  	/* early check. */
>  	if (page_mapped(page))
>  		return;
> -	if (page->mapping && !PageAnon(page))
> -		return;
>  	__mem_cgroup_uncharge_common(page, MEM_CGROUP_CHARGE_TYPE_MAPPED);
>  }
>  
> -- 
> 1.7.6.4
> 


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 7/8] mm: memcg: modify PageCgroupAcctLRU non-atomically
  2011-11-23 15:42 ` [patch 7/8] mm: memcg: modify PageCgroupAcctLRU non-atomically Johannes Weiner
  2011-11-23 18:52   ` Hugh Dickins
@ 2011-11-24  0:09   ` KAMEZAWA Hiroyuki
  2011-11-24  8:55     ` Johannes Weiner
  1 sibling, 1 reply; 45+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-11-24  0:09 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Michal Hocko, Balbir Singh, cgroups, linux-mm,
	linux-kernel

On Wed, 23 Nov 2011 16:42:30 +0100
Johannes Weiner <hannes@cmpxchg.org> wrote:

> From: Johannes Weiner <jweiner@redhat.com>
> 
> This bit is protected by zone->lru_lock; there is no need for locked
> operations when setting and clearing it.
> 
> Signed-off-by: Johannes Weiner <jweiner@redhat.com>

These atomic ops are for avoiding races with other operations, such as
lock_page_cgroup() or the other Set/ClearPageCgroup helpers....

Do I misunderstand the atomic vs. non-atomic ops race?

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 8/8] mm: memcg: modify PageCgroupCache non-atomically
  2011-11-23 15:42 ` [patch 8/8] mm: memcg: modify PageCgroupCache non-atomically Johannes Weiner
@ 2011-11-24  0:13   ` KAMEZAWA Hiroyuki
  2011-11-24  9:13     ` Johannes Weiner
  0 siblings, 1 reply; 45+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-11-24  0:13 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Michal Hocko, Balbir Singh, cgroups, linux-mm,
	linux-kernel

On Wed, 23 Nov 2011 16:42:31 +0100
Johannes Weiner <hannes@cmpxchg.org> wrote:

> From: Johannes Weiner <jweiner@redhat.com>
> 
> This bit is protected by lock_page_cgroup(); there is no need for
> locked operations when setting and clearing it.
> 
> Signed-off-by: Johannes Weiner <jweiner@redhat.com>

Hm, non-atomic ops on pc->flags seem dangerous.
How about trying to remove PCG_CACHE ? Maybe we can depend on PageAnon(page),
since we see the 'page' on memcg->lru now.
I'm sorry, I forget why we needed the PCG_CACHE flag..
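
Something like this, perhaps (hypothetical helper, not in the tree):

	/* derive "cache" from the page itself instead of a PCG flag;
	 * assuming the page is still visible on the memcg LRU */
	static inline bool page_is_memcg_cache(struct page *page)
	{
		return !PageAnon(page);
	}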

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 0/8] mm: memcg fixlets for 3.3
  2011-11-23 15:42 [patch 0/8] mm: memcg fixlets for 3.3 Johannes Weiner
                   ` (7 preceding siblings ...)
  2011-11-23 15:42 ` [patch 8/8] mm: memcg: modify PageCgroupCache non-atomically Johannes Weiner
@ 2011-11-24  6:09 ` Balbir Singh
  2011-11-24  9:45   ` Johannes Weiner
  8 siblings, 1 reply; 45+ messages in thread
From: Balbir Singh @ 2011-11-24  6:09 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, KAMEZAWA Hiroyuki, Michal Hocko, cgroups,
	linux-mm, linux-kernel

On Wed, Nov 23, 2011 at 9:12 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> Here are some minor memcg-related cleanups and optimizations, nothing
> too exciting.  The bulk of the diffstat comes from renaming the
> remaining variables that describe a (struct mem_cgroup *) to "memcg".
> The rest cuts down on the (un)charge fastpaths, as people start to get
> annoyed by those functions showing up in the profiles of their
> non-memcg workloads.  More is to come, but I wanted to get the more
> obvious bits out of the way.

Hi, Johannes

The renaming was a separate patch sent by Raghavendra as well; not
sure if you've seen it. What tests are you using to test these
patches?

Balbir

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 7/8] mm: memcg: modify PageCgroupAcctLRU non-atomically
  2011-11-23 18:52   ` Hugh Dickins
@ 2011-11-24  8:53     ` Johannes Weiner
  0 siblings, 0 replies; 45+ messages in thread
From: Johannes Weiner @ 2011-11-24  8:53 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, KAMEZAWA Hiroyuki, Michal Hocko, Balbir Singh,
	cgroups, linux-mm, linux-kernel

On Wed, Nov 23, 2011 at 10:52:39AM -0800, Hugh Dickins wrote:
> On Wed, 23 Nov 2011, Johannes Weiner wrote:
> 
> > From: Johannes Weiner <jweiner@redhat.com>
> > 
> > This bit is protected by zone->lru_lock, there is no need for locked
> > operations when setting and clearing it.
> > 
> > Signed-off-by: Johannes Weiner <jweiner@redhat.com>
> 
> Unless there are special considerations which you have not mentioned at
> all in the description above, this 7/8 and the similar 8/8 are mistaken.
> 
> The atomic operation is not for guaranteeing the setting and clearing
> of the bit in question: it's for guaranteeing that you don't accidentally
> set or clear any of the other bits in the same word when you're doing so,
> if another task is updating them at the same time as you're doing this.
> 
> There are circumstances when non-atomic shortcuts can be taken, when
> you're sure the field cannot yet be visible to other tasks (we do that
> when setting PageLocked on a freshly allocated page, for example - but
> even then have to rely on others using get_page_unless_zero properly).
> But I don't think that's the case here.

I have no idea how I could overlook this.  You are, of course, right.

That said, I *think* that it is safe for PageCgroupCache because
nobody else should be modifying any pc->flags concurrently:

PCG_LOCK: by definition exclusive and held during setting and clearing
PCG_CACHE

PCG_CACHE: serialized by PCG_LOCK

PCG_USED: serialized by PCG_LOCK

PCG_MIGRATION: serialized by PCG_LOCK

PCG_MOVE_LOCK: 1. update_page_stat() is only called against file
pages, and the page lock serializes charging against the mapping.  The
page is also charged before establishing a pte mapping, so an unmap
can not race with a charge.  2. split_huge_fixup runs against already
charged pages.  3. move_account is serialized both by LRU isolation
during charging and by PCG_LOCK

PCG_FILE_MAPPED: same as PCG_MOVE_LOCK's 1.

PCG_ACCT_LRU: pages are isolated from the LRU during charging

But all this is obviously anything but robust, and so I retract the
broken 7/8 and the might-actually-work 8/8.
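
For the archives, here is a minimal userspace sketch (illustration
only, not kernel code; names made up) of the lost update Hugh
describes, with two threads doing a plain read-modify-write on the
same flags word:

	#include <pthread.h>
	#include <stdio.h>

	static unsigned long flags;	/* stands in for pc->flags */

	static void *set_bit0(void *arg)
	{
		flags |= 1UL << 0;	/* non-atomic: load, or, store */
		return NULL;
	}

	static void *set_bit1(void *arg)
	{
		flags |= 1UL << 1;	/* can overwrite the store above */
		return NULL;
	}

	int main(void)
	{
		pthread_t a, b;

		pthread_create(&a, NULL, set_bit0, NULL);
		pthread_create(&b, NULL, set_bit1, NULL);
		pthread_join(a, NULL);
		pthread_join(b, NULL);
		/* with unlucky timing, only one bit survives */
		printf("flags = %lx\n", flags);
		return 0;
	}

set_bit()/clear_bit() make the whole read-modify-write of the word
atomic, which is why holding a lock that serializes only *one* of the
bits in it is not enough.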

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 7/8] mm: memcg: modify PageCgroupAcctLRU non-atomically
  2011-11-24  0:09   ` KAMEZAWA Hiroyuki
@ 2011-11-24  8:55     ` Johannes Weiner
  0 siblings, 0 replies; 45+ messages in thread
From: Johannes Weiner @ 2011-11-24  8:55 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, Michal Hocko, Balbir Singh, cgroups, linux-mm,
	linux-kernel

On Thu, Nov 24, 2011 at 09:09:15AM +0900, KAMEZAWA Hiroyuki wrote:
> On Wed, 23 Nov 2011 16:42:30 +0100
> Johannes Weiner <hannes@cmpxchg.org> wrote:
> 
> > From: Johannes Weiner <jweiner@redhat.com>
> > 
> > This bit is protected by zone->lru_lock, there is no need for locked
> > operations when setting and clearing it.
> > 
> > Signed-off-by: Johannes Weiner <jweiner@redhat.com>
> 
> These atomic ops are for avoiding races with other operations on the
> same flags word, such as lock_page_cgroup() or the other
> Set/ClearPageCgroup helpers.
> 
> Or do I misunderstand the race between atomic and non-atomic ops?

Nope, you are spot-on.  I'm the cretin. ;-) See my reply to Hugh's
email.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 5/8] mm: memcg: remove unneeded checks from newpage_charge()
  2011-11-24  0:04   ` KAMEZAWA Hiroyuki
@ 2011-11-24  9:04     ` Johannes Weiner
  2011-11-24 10:30       ` Michal Hocko
  0 siblings, 1 reply; 45+ messages in thread
From: Johannes Weiner @ 2011-11-24  9:04 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, Michal Hocko, Balbir Singh, cgroups, linux-mm,
	linux-kernel

On Thu, Nov 24, 2011 at 09:04:43AM +0900, KAMEZAWA Hiroyuki wrote:
> On Wed, 23 Nov 2011 16:42:28 +0100
> Johannes Weiner <hannes@cmpxchg.org> wrote:
> 
> > From: Johannes Weiner <jweiner@redhat.com>
> > 
> > All callsites pass in freshly allocated pages and a valid mm.  As a
> > result, all checks pertaining to the page's mapcount, page->mapping or
> > the fallback to init_mm are unneeded.
> > 
> > Signed-off-by: Johannes Weiner <jweiner@redhat.com>
> 
> Hmm, it's true now. But to make our assumption clear to all readers of
> the code, could you add
> 
> VM_BUG_ON(!mm || page_mapped(page) || (page->mapping && !PageAnon(page))) ?

Of course.  Please find the delta patch below.  I broke them up into
three separate checks to make the problem easier to find if the BUG
triggers.

> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Thank you.

---
From: Johannes Weiner <jweiner@redhat.com>
Subject: mm: memcg: remove unneeded checks from newpage_charge() fix

Document page state assumptions and catch if they are violated in
newpage_charge().

Signed-off-by: Johannes Weiner <jweiner@redhat.com>
---
 mm/memcontrol.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 0d10be4..f338018 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2679,6 +2679,9 @@ int mem_cgroup_newpage_charge(struct page *page,
 {
 	if (mem_cgroup_disabled())
 		return 0;
+	VM_BUG_ON(page_mapped(page));
+	VM_BUG_ON(page->mapping && !PageAnon(page));
+	VM_BUG_ON(!mm);
 	return mem_cgroup_charge_common(page, mm, gfp_mask,
 					MEM_CGROUP_CHARGE_TYPE_MAPPED);
 }
-- 
1.7.6.4

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [patch 6/8] mm: memcg: remove unneeded checks from uncharge_page()
  2011-11-24  0:06   ` KAMEZAWA Hiroyuki
@ 2011-11-24  9:06     ` Johannes Weiner
  2011-11-24 10:34       ` Michal Hocko
  0 siblings, 1 reply; 45+ messages in thread
From: Johannes Weiner @ 2011-11-24  9:06 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, Michal Hocko, Balbir Singh, cgroups, linux-mm,
	linux-kernel

On Thu, Nov 24, 2011 at 09:06:19AM +0900, KAMEZAWA Hiroyuki wrote:
> On Wed, 23 Nov 2011 16:42:29 +0100
> Johannes Weiner <hannes@cmpxchg.org> wrote:
> 
> > From: Johannes Weiner <jweiner@redhat.com>
> > 
> > mem_cgroup_uncharge_page() is only called on either freshly allocated
> > pages without page->mapping or on rmapped PageAnon() pages.  There is
> > no need to check for a page->mapping that is not an anon_vma.
> > 
> > Signed-off-by: Johannes Weiner <jweiner@redhat.com>
> 
> To make our assumption clearer to readers of the code,
> VM_BUG_ON(page->mapping && !PageAnon(page)) please.

Yep, delta patch below.

> Anyway,
> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Thanks!

---
From: Johannes Weiner <jweiner@redhat.com>
Subject: mm: memcg: remove unneeded checks from uncharge_page() fix

Document page state assumptions and catch if they are violated in
uncharge_page().

Signed-off-by: Johannes Weiner <jweiner@redhat.com>
---
 mm/memcontrol.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2f1fdc4..872dae1 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2992,6 +2992,7 @@ void mem_cgroup_uncharge_page(struct page *page)
 	/* early check. */
 	if (page_mapped(page))
 		return;
+	VM_BUG_ON(page->mapping && !PageAnon(page));
 	__mem_cgroup_uncharge_common(page, MEM_CGROUP_CHARGE_TYPE_MAPPED);
 }
 
-- 
1.7.6.4


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [patch 1/8] mm: oom_kill: remove memcg argument from oom_kill_task()
  2011-11-23 15:42 ` [patch 1/8] mm: oom_kill: remove memcg argument from oom_kill_task() Johannes Weiner
  2011-11-23 22:35   ` David Rientjes
  2011-11-23 23:57   ` KAMEZAWA Hiroyuki
@ 2011-11-24  9:07   ` Michal Hocko
  2011-11-28  0:37   ` Balbir Singh
  3 siblings, 0 replies; 45+ messages in thread
From: Michal Hocko @ 2011-11-24  9:07 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, KAMEZAWA Hiroyuki, Balbir Singh, cgroups,
	linux-mm, linux-kernel

On Wed 23-11-11 16:42:24, Johannes Weiner wrote:
> From: Johannes Weiner <jweiner@redhat.com>
> 
> The memcg argument of oom_kill_task() hasn't been used since 341aea2
> 'oom-kill: remove boost_dying_task_prio()'.  Kill it.
> 
> Signed-off-by: Johannes Weiner <jweiner@redhat.com>

Right you are.

Acked-by: Michal Hocko <mhocko@suse.cz>

> ---
>  mm/oom_kill.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 471dedb..fd9e303 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -423,7 +423,7 @@ static void dump_header(struct task_struct *p, gfp_t gfp_mask, int order,
>  }
>  
>  #define K(x) ((x) << (PAGE_SHIFT-10))
> -static int oom_kill_task(struct task_struct *p, struct mem_cgroup *mem)
> +static int oom_kill_task(struct task_struct *p)
>  {
>  	struct task_struct *q;
>  	struct mm_struct *mm;
> @@ -522,7 +522,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
>  		}
>  	} while_each_thread(p, t);
>  
> -	return oom_kill_task(victim, mem);
> +	return oom_kill_task(victim);
>  }
>  
>  /*
> -- 
> 1.7.6.4
> 

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 8/8] mm: memcg: modify PageCgroupCache non-atomically
  2011-11-24  0:13   ` KAMEZAWA Hiroyuki
@ 2011-11-24  9:13     ` Johannes Weiner
  0 siblings, 0 replies; 45+ messages in thread
From: Johannes Weiner @ 2011-11-24  9:13 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, Michal Hocko, Balbir Singh, cgroups, linux-mm,
	linux-kernel

On Thu, Nov 24, 2011 at 09:13:28AM +0900, KAMEZAWA Hiroyuki wrote:
> On Wed, 23 Nov 2011 16:42:31 +0100
> Johannes Weiner <hannes@cmpxchg.org> wrote:
> 
> > From: Johannes Weiner <jweiner@redhat.com>
> > 
> > This bit is protected by lock_page_cgroup(), there is no need for
> > locked operations when setting and clearing it.
> > 
> > Signed-off-by: Johannes Weiner <jweiner@redhat.com>
> 
> Hm. Non-atomic ops on pc->flags seem dangerous.
> How about trying to remove PCG_CACHE instead? Maybe we can depend on
> PageAnon(page); we see the 'page' on memcg->lru now.
> I'm sorry, I forgot why we needed the PCG_CACHE flag...

The problem is that we charge/uncharge pages that are not fully
rmapped, and so PageAnon() is not reliable.  I forgot if there are more
places, but the commit_charge in migration was a prominent one.

I have a patch set that reworks migration to only commit pages that
are fully rmapped, but it clashed with the THP patches and I didn't see
enough value in fixing it up.  But I should probably revive it, because
it makes some things simpler.

As I replied to Hugh, it might even work for PCG_CACHE, but it's
definitely dangerous and not worth the complex dependencies it brings
on other parts of the code, so please consider 7/8 and 8/8 dropped.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 2/8] mm: unify remaining mem_cont, mem, etc. variable names to memcg
  2011-11-23 15:42 ` [patch 2/8] mm: unify remaining mem_cont, mem, etc. variable names to memcg Johannes Weiner
  2011-11-23 22:40   ` David Rientjes
  2011-11-23 23:58   ` KAMEZAWA Hiroyuki
@ 2011-11-24  9:17   ` Michal Hocko
  2011-11-28  0:42   ` Balbir Singh
  3 siblings, 0 replies; 45+ messages in thread
From: Michal Hocko @ 2011-11-24  9:17 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, KAMEZAWA Hiroyuki, Balbir Singh, cgroups,
	linux-mm, linux-kernel

On Wed 23-11-11 16:42:25, Johannes Weiner wrote:
> From: Johannes Weiner <jweiner@redhat.com>
> 
> Signed-off-by: Johannes Weiner <jweiner@redhat.com>

Acked-by: Michal Hocko <mhocko@suse.cz>

> ---
>  include/linux/memcontrol.h |   16 ++++++------
>  include/linux/oom.h        |    2 +-
>  include/linux/rmap.h       |    4 +-
>  mm/memcontrol.c            |   52 ++++++++++++++++++++++---------------------
>  mm/oom_kill.c              |   38 ++++++++++++++++----------------
>  mm/rmap.c                  |   20 ++++++++--------
>  mm/swapfile.c              |    9 ++++---
>  mm/vmscan.c                |   12 +++++-----
>  8 files changed, 78 insertions(+), 75 deletions(-)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 2bf7698..a072ebe 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -54,10 +54,10 @@ extern int mem_cgroup_newpage_charge(struct page *page, struct mm_struct *mm,
>  				gfp_t gfp_mask);
>  /* for swap handling */
>  extern int mem_cgroup_try_charge_swapin(struct mm_struct *mm,
> -		struct page *page, gfp_t mask, struct mem_cgroup **ptr);
> +		struct page *page, gfp_t mask, struct mem_cgroup **memcgp);
>  extern void mem_cgroup_commit_charge_swapin(struct page *page,
> -					struct mem_cgroup *ptr);
> -extern void mem_cgroup_cancel_charge_swapin(struct mem_cgroup *ptr);
> +					struct mem_cgroup *memcg);
> +extern void mem_cgroup_cancel_charge_swapin(struct mem_cgroup *memcg);
>  
>  extern int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
>  					gfp_t gfp_mask);
> @@ -98,7 +98,7 @@ extern struct cgroup_subsys_state *mem_cgroup_css(struct mem_cgroup *memcg);
>  
>  extern int
>  mem_cgroup_prepare_migration(struct page *page,
> -	struct page *newpage, struct mem_cgroup **ptr, gfp_t gfp_mask);
> +	struct page *newpage, struct mem_cgroup **memcgp, gfp_t gfp_mask);
>  extern void mem_cgroup_end_migration(struct mem_cgroup *memcg,
>  	struct page *oldpage, struct page *newpage, bool migration_ok);
>  
> @@ -181,17 +181,17 @@ static inline int mem_cgroup_cache_charge(struct page *page,
>  }
>  
>  static inline int mem_cgroup_try_charge_swapin(struct mm_struct *mm,
> -		struct page *page, gfp_t gfp_mask, struct mem_cgroup **ptr)
> +		struct page *page, gfp_t gfp_mask, struct mem_cgroup **memcgp)
>  {
>  	return 0;
>  }
>  
>  static inline void mem_cgroup_commit_charge_swapin(struct page *page,
> -					  struct mem_cgroup *ptr)
> +					  struct mem_cgroup *memcg)
>  {
>  }
>  
> -static inline void mem_cgroup_cancel_charge_swapin(struct mem_cgroup *ptr)
> +static inline void mem_cgroup_cancel_charge_swapin(struct mem_cgroup *memcg)
>  {
>  }
>  
> @@ -270,7 +270,7 @@ static inline struct cgroup_subsys_state
>  
>  static inline int
>  mem_cgroup_prepare_migration(struct page *page, struct page *newpage,
> -	struct mem_cgroup **ptr, gfp_t gfp_mask)
> +	struct mem_cgroup **memcgp, gfp_t gfp_mask)
>  {
>  	return 0;
>  }
> diff --git a/include/linux/oom.h b/include/linux/oom.h
> index 6f9d04a..552fba9 100644
> --- a/include/linux/oom.h
> +++ b/include/linux/oom.h
> @@ -43,7 +43,7 @@ enum oom_constraint {
>  extern void compare_swap_oom_score_adj(int old_val, int new_val);
>  extern int test_set_oom_score_adj(int new_val);
>  
> -extern unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
> +extern unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
>  			const nodemask_t *nodemask, unsigned long totalpages);
>  extern int try_set_zonelist_oom(struct zonelist *zonelist, gfp_t gfp_flags);
>  extern void clear_zonelist_oom(struct zonelist *zonelist, gfp_t gfp_flags);
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index 2148b12..5ee84fb 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -157,7 +157,7 @@ static inline void page_dup_rmap(struct page *page)
>   * Called from mm/vmscan.c to handle paging out
>   */
>  int page_referenced(struct page *, int is_locked,
> -			struct mem_cgroup *cnt, unsigned long *vm_flags);
> +			struct mem_cgroup *memcg, unsigned long *vm_flags);
>  int page_referenced_one(struct page *, struct vm_area_struct *,
>  	unsigned long address, unsigned int *mapcount, unsigned long *vm_flags);
>  
> @@ -235,7 +235,7 @@ int rmap_walk(struct page *page, int (*rmap_one)(struct page *,
>  #define anon_vma_link(vma)	do {} while (0)
>  
>  static inline int page_referenced(struct page *page, int is_locked,
> -				  struct mem_cgroup *cnt,
> +				  struct mem_cgroup *memcg,
>  				  unsigned long *vm_flags)
>  {
>  	*vm_flags = 0;
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index f524660..473b99f 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2778,12 +2778,12 @@ int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
>   */
>  int mem_cgroup_try_charge_swapin(struct mm_struct *mm,
>  				 struct page *page,
> -				 gfp_t mask, struct mem_cgroup **ptr)
> +				 gfp_t mask, struct mem_cgroup **memcgp)
>  {
>  	struct mem_cgroup *memcg;
>  	int ret;
>  
> -	*ptr = NULL;
> +	*memcgp = NULL;
>  
>  	if (mem_cgroup_disabled())
>  		return 0;
> @@ -2801,27 +2801,27 @@ int mem_cgroup_try_charge_swapin(struct mm_struct *mm,
>  	memcg = try_get_mem_cgroup_from_page(page);
>  	if (!memcg)
>  		goto charge_cur_mm;
> -	*ptr = memcg;
> -	ret = __mem_cgroup_try_charge(NULL, mask, 1, ptr, true);
> +	*memcgp = memcg;
> +	ret = __mem_cgroup_try_charge(NULL, mask, 1, memcgp, true);
>  	css_put(&memcg->css);
>  	return ret;
>  charge_cur_mm:
>  	if (unlikely(!mm))
>  		mm = &init_mm;
> -	return __mem_cgroup_try_charge(mm, mask, 1, ptr, true);
> +	return __mem_cgroup_try_charge(mm, mask, 1, memcgp, true);
>  }
>  
>  static void
> -__mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *ptr,
> +__mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *memcg,
>  					enum charge_type ctype)
>  {
>  	if (mem_cgroup_disabled())
>  		return;
> -	if (!ptr)
> +	if (!memcg)
>  		return;
> -	cgroup_exclude_rmdir(&ptr->css);
> +	cgroup_exclude_rmdir(&memcg->css);
>  
> -	__mem_cgroup_commit_charge_lrucare(page, ptr, ctype);
> +	__mem_cgroup_commit_charge_lrucare(page, memcg, ctype);
>  	/*
>  	 * Now swap is on-memory. This means this page may be
>  	 * counted both as mem and swap....double count.
> @@ -2831,21 +2831,22 @@ __mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *ptr,
>  	 */
>  	if (do_swap_account && PageSwapCache(page)) {
>  		swp_entry_t ent = {.val = page_private(page)};
> +		struct mem_cgroup *swap_memcg;
>  		unsigned short id;
> -		struct mem_cgroup *memcg;
>  
>  		id = swap_cgroup_record(ent, 0);
>  		rcu_read_lock();
> -		memcg = mem_cgroup_lookup(id);
> -		if (memcg) {
> +		swap_memcg = mem_cgroup_lookup(id);
> +		if (swap_memcg) {
>  			/*
>  			 * This recorded memcg can be obsolete one. So, avoid
>  			 * calling css_tryget
>  			 */
> -			if (!mem_cgroup_is_root(memcg))
> -				res_counter_uncharge(&memcg->memsw, PAGE_SIZE);
> -			mem_cgroup_swap_statistics(memcg, false);
> -			mem_cgroup_put(memcg);
> +			if (!mem_cgroup_is_root(swap_memcg))
> +				res_counter_uncharge(&swap_memcg->memsw,
> +						     PAGE_SIZE);
> +			mem_cgroup_swap_statistics(swap_memcg, false);
> +			mem_cgroup_put(swap_memcg);
>  		}
>  		rcu_read_unlock();
>  	}
> @@ -2854,13 +2855,14 @@ __mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *ptr,
>  	 * So, rmdir()->pre_destroy() can be called while we do this charge.
>  	 * In that case, we need to call pre_destroy() again. check it here.
>  	 */
> -	cgroup_release_and_wakeup_rmdir(&ptr->css);
> +	cgroup_release_and_wakeup_rmdir(&memcg->css);
>  }
>  
> -void mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *ptr)
> +void mem_cgroup_commit_charge_swapin(struct page *page,
> +				     struct mem_cgroup *memcg)
>  {
> -	__mem_cgroup_commit_charge_swapin(page, ptr,
> -					MEM_CGROUP_CHARGE_TYPE_MAPPED);
> +	__mem_cgroup_commit_charge_swapin(page, memcg,
> +					  MEM_CGROUP_CHARGE_TYPE_MAPPED);
>  }
>  
>  void mem_cgroup_cancel_charge_swapin(struct mem_cgroup *memcg)
> @@ -3189,14 +3191,14 @@ static inline int mem_cgroup_move_swap_account(swp_entry_t entry,
>   * page belongs to.
>   */
>  int mem_cgroup_prepare_migration(struct page *page,
> -	struct page *newpage, struct mem_cgroup **ptr, gfp_t gfp_mask)
> +	struct page *newpage, struct mem_cgroup **memcgp, gfp_t gfp_mask)
>  {
>  	struct mem_cgroup *memcg = NULL;
>  	struct page_cgroup *pc;
>  	enum charge_type ctype;
>  	int ret = 0;
>  
> -	*ptr = NULL;
> +	*memcgp = NULL;
>  
>  	VM_BUG_ON(PageTransHuge(page));
>  	if (mem_cgroup_disabled())
> @@ -3247,10 +3249,10 @@ int mem_cgroup_prepare_migration(struct page *page,
>  	if (!memcg)
>  		return 0;
>  
> -	*ptr = memcg;
> -	ret = __mem_cgroup_try_charge(NULL, gfp_mask, 1, ptr, false);
> +	*memcgp = memcg;
> +	ret = __mem_cgroup_try_charge(NULL, gfp_mask, 1, memcgp, false);
>  	css_put(&memcg->css);/* drop extra refcnt */
> -	if (ret || *ptr == NULL) {
> +	if (ret || *memcgp == NULL) {
>  		if (PageAnon(page)) {
>  			lock_page_cgroup(pc);
>  			ClearPageCgroupMigration(pc);
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index fd9e303..307351e 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -146,7 +146,7 @@ struct task_struct *find_lock_task_mm(struct task_struct *p)
>  
>  /* return true if the task is not adequate as candidate victim task. */
>  static bool oom_unkillable_task(struct task_struct *p,
> -		const struct mem_cgroup *mem, const nodemask_t *nodemask)
> +		const struct mem_cgroup *memcg, const nodemask_t *nodemask)
>  {
>  	if (is_global_init(p))
>  		return true;
> @@ -154,7 +154,7 @@ static bool oom_unkillable_task(struct task_struct *p,
>  		return true;
>  
>  	/* When mem_cgroup_out_of_memory() and p is not member of the group */
> -	if (mem && !task_in_mem_cgroup(p, mem))
> +	if (memcg && !task_in_mem_cgroup(p, memcg))
>  		return true;
>  
>  	/* p may not have freeable memory in nodemask */
> @@ -173,12 +173,12 @@ static bool oom_unkillable_task(struct task_struct *p,
>   * predictable as possible.  The goal is to return the highest value for the
>   * task consuming the most memory to avoid subsequent oom failures.
>   */
> -unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
> +unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
>  		      const nodemask_t *nodemask, unsigned long totalpages)
>  {
>  	int points;
>  
> -	if (oom_unkillable_task(p, mem, nodemask))
> +	if (oom_unkillable_task(p, memcg, nodemask))
>  		return 0;
>  
>  	p = find_lock_task_mm(p);
> @@ -297,7 +297,7 @@ static enum oom_constraint constrained_alloc(struct zonelist *zonelist,
>   * (not docbooked, we don't want this one cluttering up the manual)
>   */
>  static struct task_struct *select_bad_process(unsigned int *ppoints,
> -		unsigned long totalpages, struct mem_cgroup *mem,
> +		unsigned long totalpages, struct mem_cgroup *memcg,
>  		const nodemask_t *nodemask)
>  {
>  	struct task_struct *g, *p;
> @@ -309,7 +309,7 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
>  
>  		if (p->exit_state)
>  			continue;
> -		if (oom_unkillable_task(p, mem, nodemask))
> +		if (oom_unkillable_task(p, memcg, nodemask))
>  			continue;
>  
>  		/*
> @@ -353,7 +353,7 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
>  			}
>  		}
>  
> -		points = oom_badness(p, mem, nodemask, totalpages);
> +		points = oom_badness(p, memcg, nodemask, totalpages);
>  		if (points > *ppoints) {
>  			chosen = p;
>  			*ppoints = points;
> @@ -376,14 +376,14 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
>   *
>   * Call with tasklist_lock read-locked.
>   */
> -static void dump_tasks(const struct mem_cgroup *mem, const nodemask_t *nodemask)
> +static void dump_tasks(const struct mem_cgroup *memcg, const nodemask_t *nodemask)
>  {
>  	struct task_struct *p;
>  	struct task_struct *task;
>  
>  	pr_info("[ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name\n");
>  	for_each_process(p) {
> -		if (oom_unkillable_task(p, mem, nodemask))
> +		if (oom_unkillable_task(p, memcg, nodemask))
>  			continue;
>  
>  		task = find_lock_task_mm(p);
> @@ -406,7 +406,7 @@ static void dump_tasks(const struct mem_cgroup *mem, const nodemask_t *nodemask)
>  }
>  
>  static void dump_header(struct task_struct *p, gfp_t gfp_mask, int order,
> -			struct mem_cgroup *mem, const nodemask_t *nodemask)
> +			struct mem_cgroup *memcg, const nodemask_t *nodemask)
>  {
>  	task_lock(current);
>  	pr_warning("%s invoked oom-killer: gfp_mask=0x%x, order=%d, "
> @@ -416,10 +416,10 @@ static void dump_header(struct task_struct *p, gfp_t gfp_mask, int order,
>  	cpuset_print_task_mems_allowed(current);
>  	task_unlock(current);
>  	dump_stack();
> -	mem_cgroup_print_oom_info(mem, p);
> +	mem_cgroup_print_oom_info(memcg, p);
>  	show_mem(SHOW_MEM_FILTER_NODES);
>  	if (sysctl_oom_dump_tasks)
> -		dump_tasks(mem, nodemask);
> +		dump_tasks(memcg, nodemask);
>  }
>  
>  #define K(x) ((x) << (PAGE_SHIFT-10))
> @@ -473,7 +473,7 @@ static int oom_kill_task(struct task_struct *p)
>  
>  static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
>  			    unsigned int points, unsigned long totalpages,
> -			    struct mem_cgroup *mem, nodemask_t *nodemask,
> +			    struct mem_cgroup *memcg, nodemask_t *nodemask,
>  			    const char *message)
>  {
>  	struct task_struct *victim = p;
> @@ -482,7 +482,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
>  	unsigned int victim_points = 0;
>  
>  	if (printk_ratelimit())
> -		dump_header(p, gfp_mask, order, mem, nodemask);
> +		dump_header(p, gfp_mask, order, memcg, nodemask);
>  
>  	/*
>  	 * If the task is already exiting, don't alarm the sysadmin or kill
> @@ -513,7 +513,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
>  			/*
>  			 * oom_badness() returns 0 if the thread is unkillable
>  			 */
> -			child_points = oom_badness(child, mem, nodemask,
> +			child_points = oom_badness(child, memcg, nodemask,
>  								totalpages);
>  			if (child_points > victim_points) {
>  				victim = child;
> @@ -550,7 +550,7 @@ static void check_panic_on_oom(enum oom_constraint constraint, gfp_t gfp_mask,
>  }
>  
>  #ifdef CONFIG_CGROUP_MEM_RES_CTLR
> -void mem_cgroup_out_of_memory(struct mem_cgroup *mem, gfp_t gfp_mask)
> +void mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask)
>  {
>  	unsigned long limit;
>  	unsigned int points = 0;
> @@ -567,14 +567,14 @@ void mem_cgroup_out_of_memory(struct mem_cgroup *mem, gfp_t gfp_mask)
>  	}
>  
>  	check_panic_on_oom(CONSTRAINT_MEMCG, gfp_mask, 0, NULL);
> -	limit = mem_cgroup_get_limit(mem) >> PAGE_SHIFT;
> +	limit = mem_cgroup_get_limit(memcg) >> PAGE_SHIFT;
>  	read_lock(&tasklist_lock);
>  retry:
> -	p = select_bad_process(&points, limit, mem, NULL);
> +	p = select_bad_process(&points, limit, memcg, NULL);
>  	if (!p || PTR_ERR(p) == -1UL)
>  		goto out;
>  
> -	if (oom_kill_process(p, gfp_mask, 0, points, limit, mem, NULL,
> +	if (oom_kill_process(p, gfp_mask, 0, points, limit, memcg, NULL,
>  				"Memory cgroup out of memory"))
>  		goto retry;
>  out:
> diff --git a/mm/rmap.c b/mm/rmap.c
> index a4fd368..c13791b 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -728,7 +728,7 @@ out:
>  }
>  
>  static int page_referenced_anon(struct page *page,
> -				struct mem_cgroup *mem_cont,
> +				struct mem_cgroup *memcg,
>  				unsigned long *vm_flags)
>  {
>  	unsigned int mapcount;
> @@ -751,7 +751,7 @@ static int page_referenced_anon(struct page *page,
>  		 * counting on behalf of references from different
>  		 * cgroups
>  		 */
> -		if (mem_cont && !mm_match_cgroup(vma->vm_mm, mem_cont))
> +		if (memcg && !mm_match_cgroup(vma->vm_mm, memcg))
>  			continue;
>  		referenced += page_referenced_one(page, vma, address,
>  						  &mapcount, vm_flags);
> @@ -766,7 +766,7 @@ static int page_referenced_anon(struct page *page,
>  /**
>   * page_referenced_file - referenced check for object-based rmap
>   * @page: the page we're checking references on.
> - * @mem_cont: target memory controller
> + * @memcg: target memory control group
>   * @vm_flags: collect encountered vma->vm_flags who actually referenced the page
>   *
>   * For an object-based mapped page, find all the places it is mapped and
> @@ -777,7 +777,7 @@ static int page_referenced_anon(struct page *page,
>   * This function is only called from page_referenced for object-based pages.
>   */
>  static int page_referenced_file(struct page *page,
> -				struct mem_cgroup *mem_cont,
> +				struct mem_cgroup *memcg,
>  				unsigned long *vm_flags)
>  {
>  	unsigned int mapcount;
> @@ -819,7 +819,7 @@ static int page_referenced_file(struct page *page,
>  		 * counting on behalf of references from different
>  		 * cgroups
>  		 */
> -		if (mem_cont && !mm_match_cgroup(vma->vm_mm, mem_cont))
> +		if (memcg && !mm_match_cgroup(vma->vm_mm, memcg))
>  			continue;
>  		referenced += page_referenced_one(page, vma, address,
>  						  &mapcount, vm_flags);
> @@ -835,7 +835,7 @@ static int page_referenced_file(struct page *page,
>   * page_referenced - test if the page was referenced
>   * @page: the page to test
>   * @is_locked: caller holds lock on the page
> - * @mem_cont: target memory controller
> + * @memcg: target memory cgroup
>   * @vm_flags: collect encountered vma->vm_flags who actually referenced the page
>   *
>   * Quick test_and_clear_referenced for all mappings to a page,
> @@ -843,7 +843,7 @@ static int page_referenced_file(struct page *page,
>   */
>  int page_referenced(struct page *page,
>  		    int is_locked,
> -		    struct mem_cgroup *mem_cont,
> +		    struct mem_cgroup *memcg,
>  		    unsigned long *vm_flags)
>  {
>  	int referenced = 0;
> @@ -859,13 +859,13 @@ int page_referenced(struct page *page,
>  			}
>  		}
>  		if (unlikely(PageKsm(page)))
> -			referenced += page_referenced_ksm(page, mem_cont,
> +			referenced += page_referenced_ksm(page, memcg,
>  								vm_flags);
>  		else if (PageAnon(page))
> -			referenced += page_referenced_anon(page, mem_cont,
> +			referenced += page_referenced_anon(page, memcg,
>  								vm_flags);
>  		else if (page->mapping)
> -			referenced += page_referenced_file(page, mem_cont,
> +			referenced += page_referenced_file(page, memcg,
>  								vm_flags);
>  		if (we_locked)
>  			unlock_page(page);
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index b1cd120..c2e1312 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -847,12 +847,13 @@ unsigned int count_swap_pages(int type, int free)
>  static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
>  		unsigned long addr, swp_entry_t entry, struct page *page)
>  {
> -	struct mem_cgroup *ptr;
> +	struct mem_cgroup *memcg;
>  	spinlock_t *ptl;
>  	pte_t *pte;
>  	int ret = 1;
>  
> -	if (mem_cgroup_try_charge_swapin(vma->vm_mm, page, GFP_KERNEL, &ptr)) {
> +	if (mem_cgroup_try_charge_swapin(vma->vm_mm, page,
> +					 GFP_KERNEL, &memcg)) {
>  		ret = -ENOMEM;
>  		goto out_nolock;
>  	}
> @@ -860,7 +861,7 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
>  	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
>  	if (unlikely(!pte_same(*pte, swp_entry_to_pte(entry)))) {
>  		if (ret > 0)
> -			mem_cgroup_cancel_charge_swapin(ptr);
> +			mem_cgroup_cancel_charge_swapin(memcg);
>  		ret = 0;
>  		goto out;
>  	}
> @@ -871,7 +872,7 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
>  	set_pte_at(vma->vm_mm, addr, pte,
>  		   pte_mkold(mk_pte(page, vma->vm_page_prot)));
>  	page_add_anon_rmap(page, vma, addr);
> -	mem_cgroup_commit_charge_swapin(page, ptr);
> +	mem_cgroup_commit_charge_swapin(page, memcg);
>  	swap_free(entry);
>  	/*
>  	 * Move the page to the active list so it is not
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 21ce3cb..855c450 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2387,7 +2387,7 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
>  
>  #ifdef CONFIG_CGROUP_MEM_RES_CTLR
>  
> -unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
> +unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *memcg,
>  						gfp_t gfp_mask, bool noswap,
>  						struct zone *zone,
>  						unsigned long *nr_scanned)
> @@ -2399,10 +2399,10 @@ unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
>  		.may_unmap = 1,
>  		.may_swap = !noswap,
>  		.order = 0,
> -		.target_mem_cgroup = mem,
> +		.target_mem_cgroup = memcg,
>  	};
>  	struct mem_cgroup_zone mz = {
> -		.mem_cgroup = mem,
> +		.mem_cgroup = memcg,
>  		.zone = zone,
>  	};
>  
> @@ -2428,7 +2428,7 @@ unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
>  	return sc.nr_reclaimed;
>  }
>  
> -unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
> +unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
>  					   gfp_t gfp_mask,
>  					   bool noswap)
>  {
> @@ -2441,7 +2441,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
>  		.may_swap = !noswap,
>  		.nr_to_reclaim = SWAP_CLUSTER_MAX,
>  		.order = 0,
> -		.target_mem_cgroup = mem_cont,
> +		.target_mem_cgroup = memcg,
>  		.nodemask = NULL, /* we don't care the placement */
>  		.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
>  				(GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK),
> @@ -2455,7 +2455,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
>  	 * take care of from where we get pages. So the node where we start the
>  	 * scan does not need to be the current node.
>  	 */
> -	nid = mem_cgroup_select_victim_node(mem_cont);
> +	nid = mem_cgroup_select_victim_node(memcg);
>  
>  	zonelist = NODE_DATA(nid)->node_zonelists;
>  
> -- 
> 1.7.6.4
> 

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 3/8] mm: memcg: clean up fault accounting
  2011-11-23 15:42 ` [patch 3/8] mm: memcg: clean up fault accounting Johannes Weiner
  2011-11-24  0:00   ` KAMEZAWA Hiroyuki
@ 2011-11-24  9:33   ` Michal Hocko
  2011-11-24  9:51     ` Johannes Weiner
  2011-11-28  0:45   ` Balbir Singh
  2 siblings, 1 reply; 45+ messages in thread
From: Michal Hocko @ 2011-11-24  9:33 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, KAMEZAWA Hiroyuki, Balbir Singh, cgroups,
	linux-mm, linux-kernel

On Wed 23-11-11 16:42:26, Johannes Weiner wrote:
> From: Johannes Weiner <jweiner@redhat.com>
> 
> The fault accounting functions have a single, memcg-internal user, so
> they don't need to be global.  In fact, their one-line bodies can be
> directly folded into the caller.  

At first I thought that this doesn't help much because the generated
code should be exactly the same, but thinking about it some more, it
makes sense.
We should have a single place where we account for events. Maybe we
should also include the accounting done in mem_cgroup_charge_statistics
(this would, however, mean that mem_cgroup_count_vm_event would have to
be split). What do you think?
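
Just to sketch what I have in mind (completely untested, the helper
name is made up):

	/* single funnel for all memcg event accounting */
	static inline void mem_cgroup_event(struct mem_cgroup *memcg,
					    enum mem_cgroup_events_index idx)
	{
		this_cpu_inc(memcg->stat->events[idx]);
	}

Both mem_cgroup_count_vm_event() and mem_cgroup_charge_statistics()
would then go through it instead of touching memcg->stat->events
directly.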

> And since faults happen one at a time, use this_cpu_inc() directly
> instead of this_cpu_add(foo, 1).

The generated code will be the same, but it is easier to read, so agreed.

> 
> Signed-off-by: Johannes Weiner <jweiner@redhat.com>

Anyway
Acked-by: Michal Hocko <mhocko@suse.cz>

> ---
>  mm/memcontrol.c |   14 ++------------
>  1 files changed, 2 insertions(+), 12 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 473b99f..d825af9 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -589,16 +589,6 @@ static void mem_cgroup_swap_statistics(struct mem_cgroup *memcg,
>  	this_cpu_add(memcg->stat->count[MEM_CGROUP_STAT_SWAPOUT], val);
>  }
>  
> -void mem_cgroup_pgfault(struct mem_cgroup *memcg, int val)
> -{
> -	this_cpu_add(memcg->stat->events[MEM_CGROUP_EVENTS_PGFAULT], val);
> -}
> -
> -void mem_cgroup_pgmajfault(struct mem_cgroup *memcg, int val)
> -{
> -	this_cpu_add(memcg->stat->events[MEM_CGROUP_EVENTS_PGMAJFAULT], val);
> -}
> -
>  static unsigned long mem_cgroup_read_events(struct mem_cgroup *memcg,
>  					    enum mem_cgroup_events_index idx)
>  {
> @@ -913,10 +903,10 @@ void mem_cgroup_count_vm_event(struct mm_struct *mm, enum vm_event_item idx)
>  
>  	switch (idx) {
>  	case PGMAJFAULT:
> -		mem_cgroup_pgmajfault(memcg, 1);
> +		this_cpu_inc(memcg->stat->events[MEM_CGROUP_EVENTS_PGMAJFAULT]);
>  		break;
>  	case PGFAULT:
> -		mem_cgroup_pgfault(memcg, 1);
> +		this_cpu_inc(memcg->stat->events[MEM_CGROUP_EVENTS_PGFAULT]);
>  		break;
>  	default:
>  		BUG();
> -- 
> 1.7.6.4
> 

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 0/8] mm: memcg fixlets for 3.3
  2011-11-24  6:09 ` [patch 0/8] mm: memcg fixlets for 3.3 Balbir Singh
@ 2011-11-24  9:45   ` Johannes Weiner
  0 siblings, 0 replies; 45+ messages in thread
From: Johannes Weiner @ 2011-11-24  9:45 UTC (permalink / raw)
  To: Balbir Singh
  Cc: Andrew Morton, KAMEZAWA Hiroyuki, Michal Hocko, cgroups,
	linux-mm, linux-kernel

On Thu, Nov 24, 2011 at 11:39:39AM +0530, Balbir Singh wrote:
> On Wed, Nov 23, 2011 at 9:12 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> >
> > Here are some minor memcg-related cleanups and optimizations, nothing
> > too exciting.  The bulk of the diffstat comes from renaming the
> > remaining variables that describe a (struct mem_cgroup *) to "memcg".
> > The rest cuts down on the (un)charge fastpaths, as people start to get
> > annoyed by those functions showing up in the profiles of their
> > non-memcg workloads.  More is to come, but I wanted to get the more
> > obvious bits out of the way.
> 
> Hi, Johannes
> 
> The renaming was a separate patch sent by Raghavendra as well; not
> sure if you've seen it.

I did and they are already in -mm, but unless I miss something, those
were only for memcontrol.[ch].  My patch is for the rest of mm.

> What tests are you using to test these patches?

I usually run concurrent kernbench jobs in separate memcgs as a smoke
test with these tools:

	http://git.cmpxchg.org/?p=statutils.git;a=summary

"runtest" takes a job-spec file that looks a bit like RPM spec to
define workloads with preparation and cleanup phases, and data collectors.
The memcg kernbench job I use is in the examples directory.  You just
need to put separate kernel source directories into place (linux-`seq
-w 04`) and then launch it like this:

	runtest -s memcg-kernbench.load `uname -r`

which runs the test and collects memory.stat of the parent memcg
every second; you can then evaluate the collected data further with
the other tools:

	readdict < `uname -r`-memory.stat.data | columns 14 15 | plot

for example, where readdict translates the "key value" lines of
memory.stat into tables where each value is on its own row.  Columns
14 and 15 are total_cache and total_rss (find out with cat -n -- yeah,
still a bit rough).  You need python-matplotlib for plot to work.
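
To give an idea of the input (numbers made up), each sample appended
to the .data file is just the "key value" lines of memory.stat:

	cache 81920
	rss 40960
	...
	total_cache 163840
	total_rss 81920

readdict flattens those key/value pairs into a plain table that
"columns" and "plot" can consume.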

Multiple runs can be collected into the same logfiles; the
ever-increasing counters can then be folded with the "events" tool.
For example, to find
the average fault count, you would do something like this (19 =
total_pgfault, 20 = total_pgmajfault):

	for x in `seq 10`; do runtest -s foo.load foo; done
	readdict < foo-memory.stat.data | columns 19 20 | events | mean -s

Oh, and workload runtime is always recorded in NAME.time, so

	events < `uname -r`.time

gives you the timings of each run, which you can then further process
with "mean" or "median" again.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 3/8] mm: memcg: clean up fault accounting
  2011-11-24  9:33   ` Michal Hocko
@ 2011-11-24  9:51     ` Johannes Weiner
  0 siblings, 0 replies; 45+ messages in thread
From: Johannes Weiner @ 2011-11-24  9:51 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, KAMEZAWA Hiroyuki, Balbir Singh, cgroups,
	linux-mm, linux-kernel

On Thu, Nov 24, 2011 at 10:33:49AM +0100, Michal Hocko wrote:
> On Wed 23-11-11 16:42:26, Johannes Weiner wrote:
> > From: Johannes Weiner <jweiner@redhat.com>
> > 
> > The fault accounting functions have a single, memcg-internal user, so
> > they don't need to be global.  In fact, their one-line bodies can be
> > directly folded into the caller.  
> 
> At first I thought that this doesn't help much because the generated
> code should be exactly the same, but thinking about it some more, it
> makes sense.
> We should have a single place where we account for events. Maybe we
> should also include the accounting done in mem_cgroup_charge_statistics
> (this would, however, mean that mem_cgroup_count_vm_event would have to
> be split). What do you think?

I'm all for unifying all the stats crap into a single place.
Ideally, we should have been able to put memcg hooks below
count_vm_event*, but maybe that ship has sailed with PGPGIN/PGPGOUT
having different meanings between memcg and the rest of the system :/

Anything in that direction is an improvement, IMO.

> > And since faults happen one at a time, use this_cpu_inc() directly
> > instead of this_cpu_add(foo, 1).
> 
> The generated code will be the same, but it is easier to read, so agreed.

And it fits within 80 columns :-)

> > Signed-off-by: Johannes Weiner <jweiner@redhat.com>
> 
> Anyway
> Acked-by: Michal Hocko <mhocko@suse.cz>

Thanks.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 4/8] mm: memcg: lookup_page_cgroup (almost) never returns NULL
  2011-11-23 15:42 ` [patch 4/8] mm: memcg: lookup_page_cgroup (almost) never returns NULL Johannes Weiner
  2011-11-24  0:01   ` KAMEZAWA Hiroyuki
@ 2011-11-24  9:52   ` Michal Hocko
  2011-11-24 10:05     ` Johannes Weiner
  2011-11-28  7:03   ` Balbir Singh
  2 siblings, 1 reply; 45+ messages in thread
From: Michal Hocko @ 2011-11-24  9:52 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, KAMEZAWA Hiroyuki, Balbir Singh, cgroups,
	linux-mm, linux-kernel

On Wed 23-11-11 16:42:27, Johannes Weiner wrote:
> From: Johannes Weiner <jweiner@redhat.com>
> 
> Pages have their corresponding page_cgroup descriptors set up before
> they are used in userspace, and thus managed by a memory cgroup.
> 
> The only time where lookup_page_cgroup() can return NULL is in the
> page sanity checking code that executes while feeding pages into the
> page allocator for the first time.
> 
> Remove the NULL checks against lookup_page_cgroup() results from all
> callsites where we know that corresponding page_cgroup descriptors
> must be allocated.

OK, shouldn't we add

diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
index 2d123f9..cb93f64 100644
--- a/mm/page_cgroup.c
+++ b/mm/page_cgroup.c
@@ -35,8 +35,7 @@ struct page_cgroup *lookup_page_cgroup(struct page *page)
 	struct page_cgroup *base;
 
 	base = NODE_DATA(page_to_nid(page))->node_page_cgroup;
-	if (unlikely(!base))
-		return NULL;
+	BUG_ON(!base);
 
 	offset = pfn - NODE_DATA(page_to_nid(page))->node_start_pfn;
 	return base + offset;
@@ -112,8 +111,7 @@ struct page_cgroup *lookup_page_cgroup(struct page *page)
 	unsigned long pfn = page_to_pfn(page);
 	struct mem_section *section = __pfn_to_section(pfn);
 
-	if (!section->page_cgroup)
-		return NULL;
+	BUG_ON(!section->page_cgroup);
 	return section->page_cgroup + pfn;
 }
 
just to make it explicit?

> 
> Signed-off-by: Johannes Weiner <jweiner@redhat.com>

Other than that
Acked-by: Michal Hocko <mhocko@suse.cz>

> ---
>  mm/memcontrol.c |    8 ++------
>  1 files changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index d825af9..d4d139a 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1894,9 +1894,6 @@ void mem_cgroup_update_page_stat(struct page *page,
>  	bool need_unlock = false;
>  	unsigned long uninitialized_var(flags);
>  
> -	if (unlikely(!pc))
> -		return;
> -
>  	rcu_read_lock();
>  	memcg = pc->mem_cgroup;
>  	if (unlikely(!memcg || !PageCgroupUsed(pc)))
> @@ -2669,8 +2666,6 @@ static int mem_cgroup_charge_common(struct page *page, struct mm_struct *mm,
>  	}
>  
>  	pc = lookup_page_cgroup(page);
> -	BUG_ON(!pc); /* XXX: remove this and move pc lookup into commit */
> -
>  	ret = __mem_cgroup_try_charge(mm, gfp_mask, nr_pages, &memcg, oom);
>  	if (ret || !memcg)
>  		return ret;
> @@ -2942,7 +2937,7 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
>  	 * Check if our page_cgroup is valid
>  	 */
>  	pc = lookup_page_cgroup(page);
> -	if (unlikely(!pc || !PageCgroupUsed(pc)))
> +	if (unlikely(!PageCgroupUsed(pc)))
>  		return NULL;
>  
>  	lock_page_cgroup(pc);
> @@ -3326,6 +3321,7 @@ static struct page_cgroup *lookup_page_cgroup_used(struct page *page)
>  	struct page_cgroup *pc;
>  
>  	pc = lookup_page_cgroup(page);
> +	/* Can be NULL while bootstrapping the page allocator */
>  	if (likely(pc) && PageCgroupUsed(pc))
>  		return pc;
>  	return NULL;
> -- 
> 1.7.6.4
> 

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [patch 4/8] mm: memcg: lookup_page_cgroup (almost) never returns NULL
  2011-11-24  9:52   ` Michal Hocko
@ 2011-11-24 10:05     ` Johannes Weiner
  2011-11-24 10:26       ` Michal Hocko
  0 siblings, 1 reply; 45+ messages in thread
From: Johannes Weiner @ 2011-11-24 10:05 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, KAMEZAWA Hiroyuki, Balbir Singh, cgroups,
	linux-mm, linux-kernel

On Thu, Nov 24, 2011 at 10:52:51AM +0100, Michal Hocko wrote:
> On Wed 23-11-11 16:42:27, Johannes Weiner wrote:
> > From: Johannes Weiner <jweiner@redhat.com>
> > 
> > Pages have their corresponding page_cgroup descriptors set up before
> > they are used in userspace, and thus managed by a memory cgroup.
> > 
> > The only time where lookup_page_cgroup() can return NULL is in the
> > page sanity checking code that executes while feeding pages into the
> > page allocator for the first time.
> > 
> > Remove the NULL checks against lookup_page_cgroup() results from all
> > callsites where we know that corresponding page_cgroup descriptors
> > must be allocated.
> 
> OK, shouldn't we add
> 
> diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
> index 2d123f9..cb93f64 100644
> --- a/mm/page_cgroup.c
> +++ b/mm/page_cgroup.c
> @@ -35,8 +35,7 @@ struct page_cgroup *lookup_page_cgroup(struct page *page)
>  	struct page_cgroup *base;
>  
>  	base = NODE_DATA(page_to_nid(page))->node_page_cgroup;
> -	if (unlikely(!base))
> -		return NULL;
> +	BUG_ON(!base);
>  
>  	offset = pfn - NODE_DATA(page_to_nid(page))->node_start_pfn;
>  	return base + offset;
> @@ -112,8 +111,7 @@ struct page_cgroup *lookup_page_cgroup(struct page *page)
>  	unsigned long pfn = page_to_pfn(page);
>  	struct mem_section *section = __pfn_to_section(pfn);
>  
> -	if (!section->page_cgroup)
> -		return NULL;
> +	BUG_ON(!section->page_cgroup);
>  	return section->page_cgroup + pfn;
>  }
>  
> just to make it explicit?

No, see the last hunk in this patch.  It's actually possible to hit
this case, although only while feeding fresh pages into the allocator:

> > @@ -3326,6 +3321,7 @@ static struct page_cgroup *lookup_page_cgroup_used(struct page *page)
> >  	struct page_cgroup *pc;
> >  
> >  	pc = lookup_page_cgroup(page);
> > +	/* Can be NULL while bootstrapping the page allocator */
> >  	if (likely(pc) && PageCgroupUsed(pc))
> >  		return pc;
> >  	return NULL;

We could add a lookup_page_cgroup_safe() for this DEBUG_VM-only
callsite as an optimization separately and remove the NULL check from
lookup_page_cgroup() itself.  But this patch was purely about removing
the actively misleading checks.
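
Purely to illustrate the idea (this helper does not exist anywhere,
and the sketch only covers the sparsemem flavor):

	/* tolerates pfns whose page_cgroup is not allocated yet */
	static struct page_cgroup *lookup_page_cgroup_safe(struct page *page)
	{
		unsigned long pfn = page_to_pfn(page);
		struct mem_section *section = __pfn_to_section(pfn);

		if (!section->page_cgroup)
			return NULL;	/* page allocator bootstrap */
		return section->page_cgroup + pfn;
	}

lookup_page_cgroup() itself could then assume that the descriptors
are always there.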

> > Signed-off-by: Johannes Weiner <jweiner@redhat.com>
> 
> Other than that
> Acked-by: Michal Hocko <mhocko@suse.cz>

Thanks.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 4/8] mm: memcg: lookup_page_cgroup (almost) never returns NULL
  2011-11-24 10:05     ` Johannes Weiner
@ 2011-11-24 10:26       ` Michal Hocko
  2011-11-28  9:15         ` Johannes Weiner
  0 siblings, 1 reply; 45+ messages in thread
From: Michal Hocko @ 2011-11-24 10:26 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, KAMEZAWA Hiroyuki, Balbir Singh, cgroups,
	linux-mm, linux-kernel

On Thu 24-11-11 11:05:49, Johannes Weiner wrote:
> On Thu, Nov 24, 2011 at 10:52:51AM +0100, Michal Hocko wrote:
> > On Wed 23-11-11 16:42:27, Johannes Weiner wrote:
> > > From: Johannes Weiner <jweiner@redhat.com>
> > > 
> > > Pages have their corresponding page_cgroup descriptors set up before
> > > they are used in userspace, and thus managed by a memory cgroup.
> > > 
> > > The only time where lookup_page_cgroup() can return NULL is in the
> > > page sanity checking code that executes while feeding pages into the
> > > page allocator for the first time.
> > > 
> > > Remove the NULL checks against lookup_page_cgroup() results from all
> > > callsites where we know that corresponding page_cgroup descriptors
> > > must be allocated.
> > 
> > OK, shouldn't we add
> > 
> > diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
> > index 2d123f9..cb93f64 100644
> > --- a/mm/page_cgroup.c
> > +++ b/mm/page_cgroup.c
> > @@ -35,8 +35,7 @@ struct page_cgroup *lookup_page_cgroup(struct page *page)
> >  	struct page_cgroup *base;
> >  
> >  	base = NODE_DATA(page_to_nid(page))->node_page_cgroup;
> > -	if (unlikely(!base))
> > -		return NULL;
> > +	BUG_ON(!base);
> >  
> >  	offset = pfn - NODE_DATA(page_to_nid(page))->node_start_pfn;
> >  	return base + offset;
> > @@ -112,8 +111,7 @@ struct page_cgroup *lookup_page_cgroup(struct page *page)
> >  	unsigned long pfn = page_to_pfn(page);
> >  	struct mem_section *section = __pfn_to_section(pfn);
> >  
> > -	if (!section->page_cgroup)
> > -		return NULL;
> > +	BUG_ON(!section->page_cgroup);
> >  	return section->page_cgroup + pfn;
> >  }
> >  
> > just to make it explicit?
> 
> No, see the last hunk in this patch.  It's actually possible to hit
> this case, although only while feeding fresh pages into the allocator:

Bahh. Yes, I had noticed the hunk, but then I started thinking about
how to make the NULL case explicit and totally forgot about it.
Sorry about the noise.

> 
> > > @@ -3326,6 +3321,7 @@ static struct page_cgroup *lookup_page_cgroup_used(struct page *page)
> > >  	struct page_cgroup *pc;
> > >  
> > >  	pc = lookup_page_cgroup(page);
> > > +	/* Can be NULL while bootstrapping the page allocator */
> > >  	if (likely(pc) && PageCgroupUsed(pc))
> > >  		return pc;
> > >  	return NULL;
> 
> We could add a lookup_page_cgroup_safe() for this DEBUG_VM-only
> callsite as an optimization separately and remove the NULL check from
> lookup_page_cgroup() itself.  But this patch was purely about removing
> the actively misleading checks.

Yes, but I am not sure whether the code duplication is worth it. Let's
just stick with the current form. Maybe just move the comment about
when it can be NULL into lookup_page_cgroup() directly?

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 5/8] mm: memcg: remove unneeded checks from newpage_charge()
  2011-11-24  9:04     ` Johannes Weiner
@ 2011-11-24 10:30       ` Michal Hocko
  2011-11-24 11:58         ` Johannes Weiner
  0 siblings, 1 reply; 45+ messages in thread
From: Michal Hocko @ 2011-11-24 10:30 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, Balbir Singh, cgroups,
	linux-mm, linux-kernel

On Thu 24-11-11 10:04:09, Johannes Weiner wrote:
> On Thu, Nov 24, 2011 at 09:04:43AM +0900, KAMEZAWA Hiroyuki wrote:
> > On Wed, 23 Nov 2011 16:42:28 +0100
> > Johannes Weiner <hannes@cmpxchg.org> wrote:
> > 
> > > From: Johannes Weiner <jweiner@redhat.com>
> > > 
> > > All callsites pass in freshly allocated pages and a valid mm.  As a
> > > result, all checks pertaining to the page's mapcount, page->mapping or
> > > the fallback to init_mm are unneeded.
> > > 
> > > Signed-off-by: Johannes Weiner <jweiner@redhat.com>
> > 
> > Hmm, it's true now. But to make our assumption clear to all readers of
> > the code, could you add
> > 
> > VM_BUG_ON(!mm || page_mapped(page) || (page->mapping && !PageAnon(page))) ?
> 
> Of course.  Please find the delta patch below.  I broke them up into
> three separate checks to make the problem easier to find if the BUG
> triggers.

Sounds reasonable.

> 
> > Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> Thank you.
> 
> ---
> From: Johannes Weiner <jweiner@redhat.com>
> Subject: mm: memcg: remove unneeded checks from newpage_charge() fix
> 
> Document page state assumptions and catch if they are violated in
> newpage_charge().
> 
> Signed-off-by: Johannes Weiner <jweiner@redhat.com>

I assume you are going to fold it into the previous one.
Acked-by: Michal Hocko <mhocko@suse.cz>

> ---
>  mm/memcontrol.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 0d10be4..f338018 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2679,6 +2679,9 @@ int mem_cgroup_newpage_charge(struct page *page,
>  {
>  	if (mem_cgroup_disabled())
>  		return 0;
> +	VM_BUG_ON(page_mapped(page));
> +	VM_BUG_ON(page->mapping && !PageAnon(page));
> +	VM_BUG_ON(!mm);
>  	return mem_cgroup_charge_common(page, mm, gfp_mask,
>  					MEM_CGROUP_CHARGE_TYPE_MAPPED);
>  }
> -- 
> 1.7.6.4

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 6/8] mm: memcg: remove unneeded checks from uncharge_page()
  2011-11-24  9:06     ` Johannes Weiner
@ 2011-11-24 10:34       ` Michal Hocko
  0 siblings, 0 replies; 45+ messages in thread
From: Michal Hocko @ 2011-11-24 10:34 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, Balbir Singh, cgroups,
	linux-mm, linux-kernel

On Thu 24-11-11 10:06:39, Johannes Weiner wrote:
> On Thu, Nov 24, 2011 at 09:06:19AM +0900, KAMEZAWA Hiroyuki wrote:
> > On Wed, 23 Nov 2011 16:42:29 +0100
> > Johannes Weiner <hannes@cmpxchg.org> wrote:
> > 
> > > From: Johannes Weiner <jweiner@redhat.com>
> > > 
> > > mem_cgroup_uncharge_page() is only called on either freshly allocated
> > > pages without page->mapping or on rmapped PageAnon() pages.  There is
> > > no need to check for a page->mapping that is not an anon_vma.
> > > 
> > > Signed-off-by: Johannes Weiner <jweiner@redhat.com>
> > 
> > To make our assumption clearer to readers of the code,
> > VM_BUG_ON(page->mapping && !PageAnon(page)) please.
> 
> Yep, delta patch below.
> 
> > Anyway,
> > Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> Thanks!
> 
> ---
> From: Johannes Weiner <jweiner@redhat.com>
> Subject: mm: memcg: remove unneeded checks from uncharge_page() fix
> 
> Document page state assumptions and catch if they are violated in
> uncharge_page().
> 
> Signed-off-by: Johannes Weiner <jweiner@redhat.com>

Same here
Acked-by: Michal Hocko <mhocko@suse.cz>

> ---
>  mm/memcontrol.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 2f1fdc4..872dae1 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2992,6 +2992,7 @@ void mem_cgroup_uncharge_page(struct page *page)
>  	/* early check. */
>  	if (page_mapped(page))
>  		return;
> +	VM_BUG_ON(page->mapping && !PageAnon(page));
>  	__mem_cgroup_uncharge_common(page, MEM_CGROUP_CHARGE_TYPE_MAPPED);
>  }
>  
> -- 
> 1.7.6.4
> 

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 5/8] mm: memcg: remove unneeded checks from newpage_charge()
  2011-11-24 10:30       ` Michal Hocko
@ 2011-11-24 11:58         ` Johannes Weiner
  0 siblings, 0 replies; 45+ messages in thread
From: Johannes Weiner @ 2011-11-24 11:58 UTC (permalink / raw)
  To: Michal Hocko
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, Balbir Singh, cgroups,
	linux-mm, linux-kernel

On Thu, Nov 24, 2011 at 11:30:49AM +0100, Michal Hocko wrote:
> On Thu 24-11-11 10:04:09, Johannes Weiner wrote:
> > From: Johannes Weiner <jweiner@redhat.com>
> > Subject: mm: memcg: remove unneeded checks from newpage_charge() fix
> > 
> > Document page state assumptions and catch if they are violated in
> > newpage_charge().
> > 
> > Signed-off-by: Johannes Weiner <jweiner@redhat.com>
> 
> I assume you are going to fold it into the previous one.
> Acked-by: Michal Hocko <mhocko@suse.cz>

Thanks.

I'm under the assumption that incremental patches are better for
Andrew, but I forgot why.

But yes, this should be folded before going upstream.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 1/8] mm: oom_kill: remove memcg argument from oom_kill_task()
  2011-11-23 15:42 ` [patch 1/8] mm: oom_kill: remove memcg argument from oom_kill_task() Johannes Weiner
                     ` (2 preceding siblings ...)
  2011-11-24  9:07   ` Michal Hocko
@ 2011-11-28  0:37   ` Balbir Singh
  3 siblings, 0 replies; 45+ messages in thread
From: Balbir Singh @ 2011-11-28  0:37 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, KAMEZAWA Hiroyuki, Michal Hocko, cgroups,
	linux-mm, linux-kernel

On Wed, Nov 23, 2011 at 9:12 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> From: Johannes Weiner <jweiner@redhat.com>
>
> The memcg argument of oom_kill_task() hasn't been used since 341aea2
> 'oom-kill: remove boost_dying_task_prio()'.  Kill it.
>
> Signed-off-by: Johannes Weiner <jweiner@redhat.com>
> ---
>  mm/oom_kill.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 471dedb..fd9e303 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -423,7 +423,7 @@ static void dump_header(struct task_struct *p, gfp_t gfp_mask, int order,
>  }
>
>  #define K(x) ((x) << (PAGE_SHIFT-10))
> -static int oom_kill_task(struct task_struct *p, struct mem_cgroup *mem)
> +static int oom_kill_task(struct task_struct *p)
>  {
>        struct task_struct *q;
>        struct mm_struct *mm;
> @@ -522,7 +522,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
>                }
>        } while_each_thread(p, t);
>
> -       return oom_kill_task(victim, mem);
> +       return oom_kill_task(victim);
>  }
>

Looks good!

Balbir Singh

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 2/8] mm: unify remaining mem_cont, mem, etc. variable names to memcg
  2011-11-23 15:42 ` [patch 2/8] mm: unify remaining mem_cont, mem, etc. variable names to memcg Johannes Weiner
                     ` (2 preceding siblings ...)
  2011-11-24  9:17   ` Michal Hocko
@ 2011-11-28  0:42   ` Balbir Singh
  3 siblings, 0 replies; 45+ messages in thread
From: Balbir Singh @ 2011-11-28  0:42 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, KAMEZAWA Hiroyuki, Michal Hocko, cgroups,
	linux-mm, linux-kernel

On Wed, Nov 23, 2011 at 9:12 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> From: Johannes Weiner <jweiner@redhat.com>
>
> Signed-off-by: Johannes Weiner <jweiner@redhat.com>


I am not very comfortable with memcgp; I'd prefer the name to
represent something more useful, especially when it is an output
parameter. Having said that, I don't think it is a stopper for the
patch.

Balbir Singh.

^ permalink raw reply	[flat|nested] 45+ messages in thread
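
For context on the naming concern: the "p" suffix marks a pointer the
function fills in, i.e. an output parameter.  A hedged sketch of the
convention; the signature below is illustrative only, not copied from
the series:

/* "memcgp" receives the charged memcg on success (output parameter): */
static int __mem_cgroup_try_charge(struct mm_struct *mm, gfp_t gfp_mask,
				   unsigned int nr_pages, bool oom,
				   struct mem_cgroup **memcgp);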

* Re: [patch 3/8] mm: memcg: clean up fault accounting
  2011-11-23 15:42 ` [patch 3/8] mm: memcg: clean up fault accounting Johannes Weiner
  2011-11-24  0:00   ` KAMEZAWA Hiroyuki
  2011-11-24  9:33   ` Michal Hocko
@ 2011-11-28  0:45   ` Balbir Singh
  2 siblings, 0 replies; 45+ messages in thread
From: Balbir Singh @ 2011-11-28  0:45 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, KAMEZAWA Hiroyuki, Michal Hocko, cgroups,
	linux-mm, linux-kernel

On Wed, Nov 23, 2011 at 9:12 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> From: Johannes Weiner <jweiner@redhat.com>
>
> The fault accounting functions have a single, memcg-internal user, so
> they don't need to be global.  In fact, their one-line bodies can be
> directly folded into the caller.  And since faults happen one at a
> time, use this_cpu_inc() directly instead of this_cpu_add(foo, 1).
>

Acked-by: Balbir Singh <bsingharora@gmail.com>

Balbir Singh

^ permalink raw reply	[flat|nested] 45+ messages in thread
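
For illustration, a minimal sketch of the fold described in the quoted
changelog; the helper and field names here are assumptions based on
the description, not quoted from the patch:

/* Before: a global one-line helper, always called with a count of 1. */
void mem_cgroup_pgfault(struct mem_cgroup *memcg, int val)
{
	this_cpu_add(memcg->stat->events[MEM_CGROUP_EVENTS_PGFAULT], val);
}

	/* ... at the single memcg-internal call site: */
	mem_cgroup_pgfault(memcg, 1);

/* After: folded into the caller; faults happen one at a time, so the
 * dedicated increment form suffices:
 */
	this_cpu_inc(memcg->stat->events[MEM_CGROUP_EVENTS_PGFAULT]);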

* Re: [patch 4/8] mm: memcg: lookup_page_cgroup (almost) never returns NULL
  2011-11-23 15:42 ` [patch 4/8] mm: memcg: lookup_page_cgroup (almost) never returns NULL Johannes Weiner
  2011-11-24  0:01   ` KAMEZAWA Hiroyuki
  2011-11-24  9:52   ` Michal Hocko
@ 2011-11-28  7:03   ` Balbir Singh
  2011-11-28  9:17     ` Johannes Weiner
  2 siblings, 1 reply; 45+ messages in thread
From: Balbir Singh @ 2011-11-28  7:03 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, KAMEZAWA Hiroyuki, Michal Hocko, cgroups,
	linux-mm, linux-kernel

On Wed, Nov 23, 2011 at 9:12 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> From: Johannes Weiner <jweiner@redhat.com>
>
> Pages have their corresponding page_cgroup descriptors set up before
> they are used in userspace, and thus managed by a memory cgroup.
>
> The only time where lookup_page_cgroup() can return NULL is in the
> page sanity checking code that executes while feeding pages into the
> page allocator for the first time.
>

This is a legacy check from the days when we allocated PC during fault
time on demand. It might make sense to assert on !pc in DEBUG_VM mode
at some point in the future.

Balbir

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 4/8] mm: memcg: lookup_page_cgroup (almost) never returns NULL
  2011-11-24 10:26       ` Michal Hocko
@ 2011-11-28  9:15         ` Johannes Weiner
  2011-11-28  9:34           ` Johannes Weiner
  0 siblings, 1 reply; 45+ messages in thread
From: Johannes Weiner @ 2011-11-28  9:15 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, KAMEZAWA Hiroyuki, Balbir Singh, cgroups,
	linux-mm, linux-kernel

On Thu, Nov 24, 2011 at 11:26:06AM +0100, Michal Hocko wrote:
> On Thu 24-11-11 11:05:49, Johannes Weiner wrote:
> > On Thu, Nov 24, 2011 at 10:52:51AM +0100, Michal Hocko wrote:
> > > On Wed 23-11-11 16:42:27, Johannes Weiner wrote:
> > > > From: Johannes Weiner <jweiner@redhat.com>
> > > > 
> > > > Pages have their corresponding page_cgroup descriptors set up before
> > > > they are used in userspace, and thus managed by a memory cgroup.
> > > > 
> > > > The only time where lookup_page_cgroup() can return NULL is in the
> > > > page sanity checking code that executes while feeding pages into the
> > > > page allocator for the first time.
> > > > 
> > > > Remove the NULL checks against lookup_page_cgroup() results from all
> > > > callsites where we know that corresponding page_cgroup descriptors
> > > > must be allocated.
> > > 
> > > OK, shouldn't we add
> > > 
> > > diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
> > > index 2d123f9..cb93f64 100644
> > > --- a/mm/page_cgroup.c
> > > +++ b/mm/page_cgroup.c
> > > @@ -35,8 +35,7 @@ struct page_cgroup *lookup_page_cgroup(struct page *page)
> > >  	struct page_cgroup *base;
> > >  
> > >  	base = NODE_DATA(page_to_nid(page))->node_page_cgroup;
> > > -	if (unlikely(!base))
> > > -		return NULL;
> > > +	BUG_ON(!base);
> > >  
> > >  	offset = pfn - NODE_DATA(page_to_nid(page))->node_start_pfn;
> > >  	return base + offset;
> > > @@ -112,8 +111,7 @@ struct page_cgroup *lookup_page_cgroup(struct page *page)
> > >  	unsigned long pfn = page_to_pfn(page);
> > >  	struct mem_section *section = __pfn_to_section(pfn);
> > >  
> > > -	if (!section->page_cgroup)
> > > -		return NULL;
> > > +	BUG_ON(!section->page_cgroup);
> > >  	return section->page_cgroup + pfn;
> > >  }
> > >  
> > > just to make it explicit?
> > 
> > No, see the last hunk in this patch.  It's actually possible for this
> > to run, although only while feeding fresh pages into the allocator:
> 
> Bahh. Yes, I have noticed the hunk but then I started thinking about
> how to make the NULL case explicit and totally forgot about that.
> Sorry about the noise.
> 
> > 
> > > > @@ -3326,6 +3321,7 @@ static struct page_cgroup *lookup_page_cgroup_used(struct page *page)
> > > >  	struct page_cgroup *pc;
> > > >  
> > > >  	pc = lookup_page_cgroup(page);
> > > > +	/* Can be NULL while bootstrapping the page allocator */
> > > >  	if (likely(pc) && PageCgroupUsed(pc))
> > > >  		return pc;
> > > >  	return NULL;
> > 
> > We could add a lookup_page_cgroup_safe() for this DEBUG_VM-only
> > callsite as an optimization separately and remove the NULL check from
> > lookup_page_cgroup() itself.  But this patch was purely about removing
> > the actively misleading checks.
> 
> Yes, but I am not sure whether code duplication is worth it. Let's just
> stick with current form. Maybe just move the comment when it can be NULL
> to the lookup_page_cgroup directly?

Don't underestimate it; this function is used quite heavily, while the
case of the array being NULL is a minor fraction of all calls.  But
it's for another patch, anyway.

The case where lookup_page_cgroup() returns NULL is kinda obvious
to me when looking directly at the function itself, because the arrays
are allocated just a few lines below.  But care to send a patch?

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch 4/8] mm: memcg: lookup_page_cgroup (almost) never returns NULL
  2011-11-28  7:03   ` Balbir Singh
@ 2011-11-28  9:17     ` Johannes Weiner
  0 siblings, 0 replies; 45+ messages in thread
From: Johannes Weiner @ 2011-11-28  9:17 UTC (permalink / raw)
  To: Balbir Singh
  Cc: Andrew Morton, KAMEZAWA Hiroyuki, Michal Hocko, cgroups,
	linux-mm, linux-kernel

On Mon, Nov 28, 2011 at 12:33:31PM +0530, Balbir Singh wrote:
> On Wed, Nov 23, 2011 at 9:12 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> > From: Johannes Weiner <jweiner@redhat.com>
> >
> > Pages have their corresponding page_cgroup descriptors set up before
> > they are used in userspace, and thus managed by a memory cgroup.
> >
> > The only time where lookup_page_cgroup() can return NULL is in the
> > page sanity checking code that executes while feeding pages into the
> > page allocator for the first time.
> >
> 
> This is a legacy check from the days when we allocated PC during fault
> time on demand. It might make sense to assert on !pc in DEBUG_VM mode
> at some point in the future.

I don't think a BUG_ON carries more information than a null-pointer
dereference; both produce a fatal oops with a full backtrace.

^ permalink raw reply	[flat|nested] 45+ messages in thread
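
To spell the argument out: dereferencing a NULL page_cgroup pointer
already oopses with a full backtrace, so an explicit assertion on the
lookup result would only duplicate that.  A sketch with a hypothetical
call site:

struct page_cgroup *pc = lookup_page_cgroup(page);

BUG_ON(!pc);		/* explicit assertion: fatal oops + backtrace */
SetPageCgroupUsed(pc);	/* without the assertion, a NULL pc oopses
			 * right here, with the same backtrace */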

* Re: [patch 4/8] mm: memcg: lookup_page_cgroup (almost) never returns NULL
  2011-11-28  9:15         ` Johannes Weiner
@ 2011-11-28  9:34           ` Johannes Weiner
  2011-11-28 10:12             ` Michal Hocko
  0 siblings, 1 reply; 45+ messages in thread
From: Johannes Weiner @ 2011-11-28  9:34 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, KAMEZAWA Hiroyuki, Balbir Singh, cgroups,
	linux-mm, linux-kernel

On Mon, Nov 28, 2011 at 10:15:18AM +0100, Johannes Weiner wrote:
> On Thu, Nov 24, 2011 at 11:26:06AM +0100, Michal Hocko wrote:
> > On Thu 24-11-11 11:05:49, Johannes Weiner wrote:
> > > On Thu, Nov 24, 2011 at 10:52:51AM +0100, Michal Hocko wrote:
> > > > On Wed 23-11-11 16:42:27, Johannes Weiner wrote:
> > > > > From: Johannes Weiner <jweiner@redhat.com>
> > > > > 
> > > > > Pages have their corresponding page_cgroup descriptors set up before
> > > > > they are used in userspace, and thus managed by a memory cgroup.
> > > > > 
> > > > > The only time where lookup_page_cgroup() can return NULL is in the
> > > > > page sanity checking code that executes while feeding pages into the
> > > > > page allocator for the first time.
> > > > > 
> > > > > Remove the NULL checks against lookup_page_cgroup() results from all
> > > > > callsites where we know that corresponding page_cgroup descriptors
> > > > > must be allocated.
> > > > 
> > > > OK, shouldn't we add
> > > > 
> > > > diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
> > > > index 2d123f9..cb93f64 100644
> > > > --- a/mm/page_cgroup.c
> > > > +++ b/mm/page_cgroup.c
> > > > @@ -35,8 +35,7 @@ struct page_cgroup *lookup_page_cgroup(struct page *page)
> > > >  	struct page_cgroup *base;
> > > >  
> > > >  	base = NODE_DATA(page_to_nid(page))->node_page_cgroup;
> > > > -	if (unlikely(!base))
> > > > -		return NULL;
> > > > +	BUG_ON(!base);
> > > >  
> > > >  	offset = pfn - NODE_DATA(page_to_nid(page))->node_start_pfn;
> > > >  	return base + offset;
> > > > @@ -112,8 +111,7 @@ struct page_cgroup *lookup_page_cgroup(struct page *page)
> > > >  	unsigned long pfn = page_to_pfn(page);
> > > >  	struct mem_section *section = __pfn_to_section(pfn);
> > > >  
> > > > -	if (!section->page_cgroup)
> > > > -		return NULL;
> > > > +	BUG_ON(!section->page_cgroup);
> > > >  	return section->page_cgroup + pfn;
> > > >  }
> > > >  
> > > > just to make it explicit?
> > > 
> > > No, see the last hunk in this patch.  It's actually possible for this
> > > to run, although only while feeding fresh pages into the allocator:
> > 
> > Bahh. Yes, I have noticed the hunk but then I started thinking about
> > how to make the NULL case explicit and totally forgot about that.
> > Sorry about the noise.
> > 
> > > 
> > > > > @@ -3326,6 +3321,7 @@ static struct page_cgroup *lookup_page_cgroup_used(struct page *page)
> > > > >  	struct page_cgroup *pc;
> > > > >  
> > > > >  	pc = lookup_page_cgroup(page);
> > > > > +	/* Can be NULL while bootstrapping the page allocator */
> > > > >  	if (likely(pc) && PageCgroupUsed(pc))
> > > > >  		return pc;
> > > > >  	return NULL;
> > > 
> > > We could add a lookup_page_cgroup_safe() for this DEBUG_VM-only
> > > callsite as an optimization separately and remove the NULL check from
> > > lookup_page_cgroup() itself.  But this patch was purely about removing
> > > the actively misleading checks.
> > 
> > Yes, but I am not sure whether code duplication is worth it. Let's just
> > stick with current form. Maybe just move the comment when it can be NULL
> > to the lookup_page_cgroup directly?
> 
> Don't underestimate it; this function is used quite heavily, while the
> case of the array being NULL is a minor fraction of all calls.  But
> it's for another patch, anyway.

Hm, how about this?

diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
index a14655d..58405ca 100644
--- a/mm/page_cgroup.c
+++ b/mm/page_cgroup.c
@@ -28,9 +28,16 @@ struct page_cgroup *lookup_page_cgroup(struct page *page)
 	struct page_cgroup *base;
 
 	base = NODE_DATA(page_to_nid(page))->node_page_cgroup;
+#ifdef CONFIG_DEBUG_VM
+	/*
+	 * The sanity checks the page allocator does upon freeing a
+	 * page can reach here before the page_cgroup arrays are
+	 * allocated when feeding a range of pages to the allocator
+	 * for the first time during bootup or memory hotplug.
+	 */
 	if (unlikely(!base))
 		return NULL;
-
+#endif
 	offset = pfn - NODE_DATA(page_to_nid(page))->node_start_pfn;
 	return base + offset;
 }
@@ -87,9 +94,16 @@ struct page_cgroup *lookup_page_cgroup(struct page *page)
 {
 	unsigned long pfn = page_to_pfn(page);
 	struct mem_section *section = __pfn_to_section(pfn);
-
+#ifdef CONFIG_DEBUG_VM
+	/*
+	 * The sanity checks the page allocator does upon freeing a
+	 * page can reach here before the page_cgroup arrays are
+	 * allocated when feeding a range of pages to the allocator
+	 * for the first time during bootup or memory hotplug.
+	 */
 	if (!section->page_cgroup)
 		return NULL;
+#endif
 	return section->page_cgroup + pfn;
 }
 

^ permalink raw reply related	[flat|nested] 45+ messages in thread
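
One consequence of the CONFIG_DEBUG_VM guard worth spelling out: the
only code that can observe the page_cgroup arrays as NULL is the page
allocator's sanity checking, which is itself DEBUG_VM-only per the
discussion above, so a production build can drop the test entirely.
The sparsemem variant then compiles down to branch-free pointer
arithmetic, as derived from the quoted patch:

struct page_cgroup *lookup_page_cgroup(struct page *page)
{
	unsigned long pfn = page_to_pfn(page);
	struct mem_section *section = __pfn_to_section(pfn);

	/* No NULL check: it is compiled away in !CONFIG_DEBUG_VM builds. */
	return section->page_cgroup + pfn;
}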

* Re: [patch 4/8] mm: memcg: lookup_page_cgroup (almost) never returns NULL
  2011-11-28  9:34           ` Johannes Weiner
@ 2011-11-28 10:12             ` Michal Hocko
  0 siblings, 0 replies; 45+ messages in thread
From: Michal Hocko @ 2011-11-28 10:12 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, KAMEZAWA Hiroyuki, Balbir Singh, cgroups,
	linux-mm, linux-kernel

On Mon 28-11-11 10:34:35, Johannes Weiner wrote:
> On Mon, Nov 28, 2011 at 10:15:18AM +0100, Johannes Weiner wrote:
> > On Thu, Nov 24, 2011 at 11:26:06AM +0100, Michal Hocko wrote:
> > > On Thu 24-11-11 11:05:49, Johannes Weiner wrote:
> > > > On Thu, Nov 24, 2011 at 10:52:51AM +0100, Michal Hocko wrote:
> > > > > On Wed 23-11-11 16:42:27, Johannes Weiner wrote:
[...]
> > > > > > @@ -3326,6 +3321,7 @@ static struct page_cgroup *lookup_page_cgroup_used(struct page *page)
> > > > > >  	struct page_cgroup *pc;
> > > > > >  
> > > > > >  	pc = lookup_page_cgroup(page);
> > > > > > +	/* Can be NULL while bootstrapping the page allocator */
> > > > > >  	if (likely(pc) && PageCgroupUsed(pc))
> > > > > >  		return pc;
> > > > > >  	return NULL;
> > > > 
> > > > We could add a lookup_page_cgroup_safe() for this DEBUG_VM-only
> > > > callsite as an optimization separately and remove the NULL check from
> > > > lookup_page_cgroup() itself.  But this patch was purely about removing
> > > > the actively misleading checks.
> > > 
> > > Yes, but I am not sure whether code duplication is worth it. Let's just
> > > stick with current form. Maybe just move the comment when it can be NULL
> > > to the lookup_page_cgroup directly?
> > 
> > Don't underestimate it; this function is used quite heavily, while the
> > case of the array being NULL is a minor fraction of all calls.  But
> > it's for another patch, anyway.
> 
> Hm, how about this?

Yes, makes sense.
Thanks.

> 
> diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
> index a14655d..58405ca 100644
> --- a/mm/page_cgroup.c
> +++ b/mm/page_cgroup.c
> @@ -28,9 +28,16 @@ struct page_cgroup *lookup_page_cgroup(struct page *page)
>  	struct page_cgroup *base;
>  
>  	base = NODE_DATA(page_to_nid(page))->node_page_cgroup;
> +#ifdef CONFIG_DEBUG_VM
> +	/*
> +	 * The sanity checks the page allocator does upon freeing a
> +	 * page can reach here before the page_cgroup arrays are
> +	 * allocated when feeding a range of pages to the allocator
> +	 * for the first time during bootup or memory hotplug.
> +	 */
>  	if (unlikely(!base))
>  		return NULL;
> -
> +#endif
>  	offset = pfn - NODE_DATA(page_to_nid(page))->node_start_pfn;
>  	return base + offset;
>  }
> @@ -87,9 +94,16 @@ struct page_cgroup *lookup_page_cgroup(struct page *page)
>  {
>  	unsigned long pfn = page_to_pfn(page);
>  	struct mem_section *section = __pfn_to_section(pfn);
> -
> +#ifdef CONFIG_DEBUG_VM
> +	/*
> +	 * The sanity checks the page allocator does upon freeing a
> +	 * page can reach here before the page_cgroup arrays are
> +	 * allocated when feeding a range of pages to the allocator
> +	 * for the first time during bootup or memory hotplug.
> +	 */
>  	if (!section->page_cgroup)
>  		return NULL;
> +#endif
>  	return section->page_cgroup + pfn;
>  }
>  

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2011-11-28 10:12 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-11-23 15:42 [patch 0/8] mm: memcg fixlets for 3.3 Johannes Weiner
2011-11-23 15:42 ` [patch 1/8] mm: oom_kill: remove memcg argument from oom_kill_task() Johannes Weiner
2011-11-23 22:35   ` David Rientjes
2011-11-23 23:57   ` KAMEZAWA Hiroyuki
2011-11-24  9:07   ` Michal Hocko
2011-11-28  0:37   ` Balbir Singh
2011-11-23 15:42 ` [patch 2/8] mm: unify remaining mem_cont, mem, etc. variable names to memcg Johannes Weiner
2011-11-23 22:40   ` David Rientjes
2011-11-23 23:58   ` KAMEZAWA Hiroyuki
2011-11-24  9:17   ` Michal Hocko
2011-11-28  0:42   ` Balbir Singh
2011-11-23 15:42 ` [patch 3/8] mm: memcg: clean up fault accounting Johannes Weiner
2011-11-24  0:00   ` KAMEZAWA Hiroyuki
2011-11-24  9:33   ` Michal Hocko
2011-11-24  9:51     ` Johannes Weiner
2011-11-28  0:45   ` Balbir Singh
2011-11-23 15:42 ` [patch 4/8] mm: memcg: lookup_page_cgroup (almost) never returns NULL Johannes Weiner
2011-11-24  0:01   ` KAMEZAWA Hiroyuki
2011-11-24  9:52   ` Michal Hocko
2011-11-24 10:05     ` Johannes Weiner
2011-11-24 10:26       ` Michal Hocko
2011-11-28  9:15         ` Johannes Weiner
2011-11-28  9:34           ` Johannes Weiner
2011-11-28 10:12             ` Michal Hocko
2011-11-28  7:03   ` Balbir Singh
2011-11-28  9:17     ` Johannes Weiner
2011-11-23 15:42 ` [patch 5/8] mm: memcg: remove unneeded checks from newpage_charge() Johannes Weiner
2011-11-24  0:04   ` KAMEZAWA Hiroyuki
2011-11-24  9:04     ` Johannes Weiner
2011-11-24 10:30       ` Michal Hocko
2011-11-24 11:58         ` Johannes Weiner
2011-11-23 15:42 ` [patch 6/8] mm: memcg: remove unneeded checks from uncharge_page() Johannes Weiner
2011-11-24  0:06   ` KAMEZAWA Hiroyuki
2011-11-24  9:06     ` Johannes Weiner
2011-11-24 10:34       ` Michal Hocko
2011-11-23 15:42 ` [patch 7/8] mm: memcg: modify PageCgroupAcctLRU non-atomically Johannes Weiner
2011-11-23 18:52   ` Hugh Dickins
2011-11-24  8:53     ` Johannes Weiner
2011-11-24  0:09   ` KAMEZAWA Hiroyuki
2011-11-24  8:55     ` Johannes Weiner
2011-11-23 15:42 ` [patch 8/8] mm: memcg: modify PageCgroupCache non-atomically Johannes Weiner
2011-11-24  0:13   ` KAMEZAWA Hiroyuki
2011-11-24  9:13     ` Johannes Weiner
2011-11-24  6:09 ` [patch 0/8] mm: memcg fixlets for 3.3 Balbir Singh
2011-11-24  9:45   ` Johannes Weiner
