* [PATCH 0/4] memcg: ready-to-go series (was memcg update v6)
@ 2008-09-29 10:19 KAMEZAWA Hiroyuki
  2008-09-29 10:21 ` [PATCH 1/4] memcg: account swap cache under lock KAMEZAWA Hiroyuki
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-09-29 10:19 UTC (permalink / raw)
  To: linux-mm; +Cc: LKML, balbir, xemul, nishimura

These 4 patches are cut out from the memcg update v5 series
(so this is part of v6).

I think we have reached agreement on these 4.

Please ack if they look OK.

[1/4] ...  account swap under lock
[2/4] ...  make page->mapping NULL before uncharging cache.
[3/4] ...  avoid accounting not-on-LRU pages.
[4/4] ...  optimize cpu stat

I still have 6 more patches, but they are under test and need review and discussion.

Thanks,
-Kame



* [PATCH 1/4] memcg: account swap cache under lock
  2008-09-29 10:19 [PATCH 0/4] memcg: ready-to-go series (was memcg update v6) KAMEZAWA Hiroyuki
@ 2008-09-29 10:21 ` KAMEZAWA Hiroyuki
  2008-09-29 11:33   ` Daisuke Nishimura
  2008-09-30  8:05   ` Balbir Singh
  2008-09-29 10:22 ` [PATCH 2/4] memcg: set page->mapping NULL before uncharge KAMEZAWA Hiroyuki
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 17+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-09-29 10:21 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, LKML, balbir, xemul, nishimura

While page-cache charge/uncharge is done under lock_page(), swap-cache
charge is not. (An anonymous page is charged when it is newly allocated.)

This patch moves do_swap_page()'s mem_cgroup_charge() call under the page
lock. I don't see any actual problem *now*, but this fix is good for the
future because it avoids unnecessary racy states.
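
With the patch applied, the relevant part of do_swap_page() reads roughly
as follows (a sketch reconstructed from the hunk below; error paths and
surrounding context are elided):

	mark_page_accessed(page);

	lock_page(page);
	delayacct_clear_flag(DELAYACCT_PF_SWAPIN);

	/* the charge is now done with the page locked */
	if (mem_cgroup_charge(page, mm, GFP_KERNEL)) {
		ret = VM_FAULT_OOM;
		unlock_page(page);
		goto out;
	}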


Changelog: (v5) -> (v6)
 - mark_page_accessed() is moved before lock_page().
 - fixed missing unlock_page()
(no other changes from the previous version)

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

 mm/memory.c |   11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

Index: mmotm-2.6.27-rc7+/mm/memory.c
===================================================================
--- mmotm-2.6.27-rc7+.orig/mm/memory.c
+++ mmotm-2.6.27-rc7+/mm/memory.c
@@ -2326,16 +2326,17 @@ static int do_swap_page(struct mm_struct
 		count_vm_event(PGMAJFAULT);
 	}
 
+	mark_page_accessed(page);
+
+	lock_page(page);
+	delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
+
 	if (mem_cgroup_charge(page, mm, GFP_KERNEL)) {
-		delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
 		ret = VM_FAULT_OOM;
+		unlock_page(page);
 		goto out;
 	}
 
-	mark_page_accessed(page);
-	lock_page(page);
-	delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
-
 	/*
 	 * Back out if somebody else already faulted in this pte.
 	 */



* [PATCH 2/4] memcg: set page->mapping NULL before uncharge
  2008-09-29 10:19 [PATCH 0/4] memcg: ready-to-go series (was memcg update v6) KAMEZAWA Hiroyuki
  2008-09-29 10:21 ` [PATCH 1/4] memcg: account swap cache under lock KAMEZAWA Hiroyuki
@ 2008-09-29 10:22 ` KAMEZAWA Hiroyuki
  2008-09-29 11:39   ` Daisuke Nishimura
  2008-10-01  3:50   ` Balbir Singh
  2008-09-29 10:23 ` [PATCH 3/4] memcg: avoid account not-on-LRU pages KAMEZAWA Hiroyuki
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 17+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-09-29 10:22 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, LKML, balbir, xemul, nishimura, Andrew Morton

This patch makes page->mapping NULL before
mem_cgroup_uncharge_cache_page() is called.

"page->mapping == NULL" is a good check for "whether the page is still
in the radix-tree or not".
This patch also adds VM_BUG_ON() to mem_cgroup_uncharge_cache_page().
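
After this change the removal path clears page->mapping before the memcg
uncharge, so the uncharge always sees a page that is already off the
radix-tree (a sketch of the resulting mm/filemap.c code, reconstructed
from the hunks below):

	radix_tree_delete(&mapping->page_tree, page->index);
	page->mapping = NULL;		/* the page is no longer in the radix-tree */
	mapping->nrpages--;
	__dec_zone_page_state(page, NR_FILE_PAGES);
	BUG_ON(page_mapped(page));
	mem_cgroup_uncharge_cache_page(page);	/* can now assert !page->mapping */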


Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

 mm/filemap.c    |    2 +-
 mm/memcontrol.c |    1 +
 mm/migrate.c    |    9 +++++++--
 3 files changed, 9 insertions(+), 3 deletions(-)

Index: mmotm-2.6.27-rc7+/mm/filemap.c
===================================================================
--- mmotm-2.6.27-rc7+.orig/mm/filemap.c
+++ mmotm-2.6.27-rc7+/mm/filemap.c
@@ -116,12 +116,12 @@ void __remove_from_page_cache(struct pag
 {
 	struct address_space *mapping = page->mapping;
 
-	mem_cgroup_uncharge_cache_page(page);
 	radix_tree_delete(&mapping->page_tree, page->index);
 	page->mapping = NULL;
 	mapping->nrpages--;
 	__dec_zone_page_state(page, NR_FILE_PAGES);
 	BUG_ON(page_mapped(page));
+	mem_cgroup_uncharge_cache_page(page);
 
 	/*
 	 * Some filesystems seem to re-dirty the page even after
Index: mmotm-2.6.27-rc7+/mm/memcontrol.c
===================================================================
--- mmotm-2.6.27-rc7+.orig/mm/memcontrol.c
+++ mmotm-2.6.27-rc7+/mm/memcontrol.c
@@ -734,6 +734,7 @@ void mem_cgroup_uncharge_page(struct pag
 void mem_cgroup_uncharge_cache_page(struct page *page)
 {
 	VM_BUG_ON(page_mapped(page));
+	VM_BUG_ON(page->mapping);
 	__mem_cgroup_uncharge_common(page, MEM_CGROUP_CHARGE_TYPE_CACHE);
 }
 
Index: mmotm-2.6.27-rc7+/mm/migrate.c
===================================================================
--- mmotm-2.6.27-rc7+.orig/mm/migrate.c
+++ mmotm-2.6.27-rc7+/mm/migrate.c
@@ -330,8 +330,6 @@ static int migrate_page_move_mapping(str
 	__inc_zone_page_state(newpage, NR_FILE_PAGES);
 
 	spin_unlock_irq(&mapping->tree_lock);
-	if (!PageSwapCache(newpage))
-		mem_cgroup_uncharge_cache_page(page);
 
 	return 0;
 }
@@ -341,6 +339,8 @@ static int migrate_page_move_mapping(str
  */
 static void migrate_page_copy(struct page *newpage, struct page *page)
 {
+	int anon;
+
 	copy_highpage(newpage, page);
 
 	if (PageError(page))
@@ -378,8 +378,13 @@ static void migrate_page_copy(struct pag
 #endif
 	ClearPagePrivate(page);
 	set_page_private(page, 0);
+	/* page->mapping contains a flag for PageAnon() */
+	anon = PageAnon(page);
 	page->mapping = NULL;
 
+	if (!anon) /* This page was removed from radix-tree. */
+		mem_cgroup_uncharge_cache_page(page);
+
 	/*
 	 * If any waiters have accumulated on the new page then
 	 * wake them up.



* [PATCH 3/4] memcg: avoid account not-on-LRU pages
  2008-09-29 10:19 [PATCH 0/4] memcg: ready-to-go series (was memcg update v6) KAMEZAWA Hiroyuki
  2008-09-29 10:21 ` [PATCH 1/4] memcg: account swap cache under lock KAMEZAWA Hiroyuki
  2008-09-29 10:22 ` [PATCH 2/4] memcg: set page->mapping NULL before uncharge KAMEZAWA Hiroyuki
@ 2008-09-29 10:23 ` KAMEZAWA Hiroyuki
  2008-09-29 11:19   ` Daisuke Nishimura
                     ` (3 more replies)
  2008-09-29 10:24 ` [PATCH 4/4] memcg: optimize cpustat KAMEZAWA Hiroyuki
  2008-10-06 17:15 ` [PATCH 0/4] memcg: ready-to-go series (was memcg update v6) Balbir Singh
  4 siblings, 4 replies; 17+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-09-29 10:23 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, LKML, balbir, xemul, nishimura, Andrew Morton

There are not-on-LRU pages which can be mapped, and they are not worth
accounting (because we can't shrink them, and handling such special cases
would need dirty code). We'd like to make use of the usual
objrmap/radix-tree protocol and don't want to account pages that are out
of the VM's control.

When special_mapping_fault() is called, page->mapping tends to be NULL,
so the page is charged as an anonymous page.
insert_page() also handles some special pages from drivers.

This patch avoids accounting such special pages.
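
The effect in __do_fault(), sketched from the hunks below (surrounding
context elided): the charge now happens only on the path that allocates a
new private page for the fault, and the error path uncharges only what
was actually charged:

	/* inside the path where a new page is allocated for the fault */
	if (mem_cgroup_charge(page, mm, GFP_KERNEL)) {
		ret = VM_FAULT_OOM;
		page_cache_release(page);
		goto out;
	}
	charged = 1;

	/* ... and on the failure path at the end of __do_fault() */
	if (charged)
		mem_cgroup_uncharge_page(page);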

Changelog: v5 -> v6
  - modified Documentation.
  - fixed to charge only when a page is newly allocated.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

 Documentation/controllers/memory.txt |   24 ++++++++++++++++--------
 mm/memory.c                          |   29 +++++++++++++----------------
 mm/rmap.c                            |    4 ++--
 3 files changed, 31 insertions(+), 26 deletions(-)

Index: mmotm-2.6.27-rc7+/mm/memory.c
===================================================================
--- mmotm-2.6.27-rc7+.orig/mm/memory.c
+++ mmotm-2.6.27-rc7+/mm/memory.c
@@ -1323,18 +1323,14 @@ static int insert_page(struct vm_area_st
 	pte_t *pte;
 	spinlock_t *ptl;
 
-	retval = mem_cgroup_charge(page, mm, GFP_KERNEL);
-	if (retval)
-		goto out;
-
 	retval = -EINVAL;
 	if (PageAnon(page))
-		goto out_uncharge;
+		goto out;
 	retval = -ENOMEM;
 	flush_dcache_page(page);
 	pte = get_locked_pte(mm, addr, &ptl);
 	if (!pte)
-		goto out_uncharge;
+		goto out;
 	retval = -EBUSY;
 	if (!pte_none(*pte))
 		goto out_unlock;
@@ -1350,8 +1346,6 @@ static int insert_page(struct vm_area_st
 	return retval;
 out_unlock:
 	pte_unmap_unlock(pte, ptl);
-out_uncharge:
-	mem_cgroup_uncharge_page(page);
 out:
 	return retval;
 }
@@ -2463,6 +2457,7 @@ static int __do_fault(struct mm_struct *
 	struct page *page;
 	pte_t entry;
 	int anon = 0;
+	int charged = 0;
 	struct page *dirty_page = NULL;
 	struct vm_fault vmf;
 	int ret;
@@ -2503,6 +2498,12 @@ static int __do_fault(struct mm_struct *
 				ret = VM_FAULT_OOM;
 				goto out;
 			}
+			if (mem_cgroup_charge(page, mm, GFP_KERNEL)) {
+				ret = VM_FAULT_OOM;
+				page_cache_release(page);
+				goto out;
+			}
+			charged = 1;
 			/*
 			 * Don't let another task, with possibly unlocked vma,
 			 * keep the mlocked page.
@@ -2543,11 +2544,6 @@ static int __do_fault(struct mm_struct *
 
 	}
 
-	if (mem_cgroup_charge(page, mm, GFP_KERNEL)) {
-		ret = VM_FAULT_OOM;
-		goto out;
-	}
-
 	page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
 
 	/*
@@ -2585,10 +2581,11 @@ static int __do_fault(struct mm_struct *
 		/* no need to invalidate: a not-present page won't be cached */
 		update_mmu_cache(vma, address, entry);
 	} else {
-		mem_cgroup_uncharge_page(page);
-		if (anon)
+		if (charged)
+			mem_cgroup_uncharge_page(page);
+		if (anon) {
 			page_cache_release(page);
-		else
+		} else
 			anon = 1; /* no anon but release faulted_page */
 	}
 
Index: mmotm-2.6.27-rc7+/mm/rmap.c
===================================================================
--- mmotm-2.6.27-rc7+.orig/mm/rmap.c
+++ mmotm-2.6.27-rc7+/mm/rmap.c
@@ -725,8 +725,8 @@ void page_remove_rmap(struct page *page,
 			page_clear_dirty(page);
 			set_page_dirty(page);
 		}
-
-		mem_cgroup_uncharge_page(page);
+		if (PageAnon(page))
+			mem_cgroup_uncharge_page(page);
 		__dec_zone_page_state(page,
 			PageAnon(page) ? NR_ANON_PAGES : NR_FILE_MAPPED);
 		/*
Index: mmotm-2.6.27-rc7+/Documentation/controllers/memory.txt
===================================================================
--- mmotm-2.6.27-rc7+.orig/Documentation/controllers/memory.txt
+++ mmotm-2.6.27-rc7+/Documentation/controllers/memory.txt
@@ -112,14 +112,22 @@ the per cgroup LRU.
 
 2.2.1 Accounting details
 
-All mapped pages (RSS) and unmapped user pages (Page Cache) are accounted.
-RSS pages are accounted at the time of page_add_*_rmap() unless they've already
-been accounted for earlier. A file page will be accounted for as Page Cache;
-it's mapped into the page tables of a process, duplicate accounting is carefully
-avoided. Page Cache pages are accounted at the time of add_to_page_cache().
-The corresponding routines that remove a page from the page tables or removes
-a page from Page Cache is used to decrement the accounting counters of the
-cgroup.
+All mapped anon pages (RSS) and cache pages (Page Cache) are accounted.
+(Some pages which can never be reclaimed and will not be on the global LRU
+ are not accounted. We only account pages under usual VM management.)
+
+RSS pages are accounted at page fault unless they've already been accounted
+for earlier. A file page will be accounted for as Page Cache when it's
+inserted into the inode (radix-tree). While it's mapped into the page tables
+of processes, duplicate accounting is carefully avoided.
+
+An RSS page is unaccounted when it's fully unmapped. A PageCache page is
+unaccounted when it's removed from the radix-tree.
+
+At page migration, accounting information is kept.
+
+Note: we only account pages on the LRU because our purpose is to control the
+amount of used pages; not-on-LRU pages tend to be out of the VM's control.
 
 2.3 Shared Page Accounting
 



* [PATCH 4/4] memcg: optimize cpustat
  2008-09-29 10:19 [PATCH 0/4] memcg: ready-to-go series (was memcg update v6) KAMEZAWA Hiroyuki
                   ` (2 preceding siblings ...)
  2008-09-29 10:23 ` [PATCH 3/4] memcg: avoid account not-on-LRU pages KAMEZAWA Hiroyuki
@ 2008-09-29 10:24 ` KAMEZAWA Hiroyuki
  2008-09-29 11:44   ` Daisuke Nishimura
  2008-10-06 17:15 ` [PATCH 0/4] memcg: ready-to-go series (was memcg update v6) Balbir Singh
  4 siblings, 1 reply; 17+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-09-29 10:24 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, LKML, balbir, xemul, nishimura, Andrew Morton

Some obvious optimizations to memcg.

I found that mem_cgroup_charge_statistics() is a little big (in object
code) and does unnecessary address calculation.
This patch reduces the size of this function.

Also, res_counter_charge() is 'likely' to succeed.
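
The idea behind the statistics change, sketched outside the diff
(illustrative fragment based on the hunks below): the caller looks the
per-cpu slot up once, and each update becomes a plain indexed add:

	/* before: every call recomputed smp_processor_id() and the offset */
	stat->cpustat[smp_processor_id()].count[idx] += val;

	/* after: the per-cpu slot is computed once in the caller ... */
	struct mem_cgroup_stat_cpu *cpustat = &stat->cpustat[smp_processor_id()];

	/* ... and each update is a simple add on that slot */
	cpustat->count[idx] += val;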

Changelog: v5 -> v6
 - patch series was reordered and needed some adjustment; no changes in logic.
Changelog: v3 -> v4
 - merged with another leaf patch.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>


 mm/memcontrol.c |   18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

Index: mmotm-2.6.27-rc7+/mm/memcontrol.c
===================================================================
--- mmotm-2.6.27-rc7+.orig/mm/memcontrol.c
+++ mmotm-2.6.27-rc7+/mm/memcontrol.c
@@ -66,11 +66,10 @@ struct mem_cgroup_stat {
 /*
  * For accounting under irq disable, no need for increment preempt count.
  */
-static void __mem_cgroup_stat_add_safe(struct mem_cgroup_stat *stat,
+static inline void __mem_cgroup_stat_add_safe(struct mem_cgroup_stat_cpu *stat,
 		enum mem_cgroup_stat_index idx, int val)
 {
-	int cpu = smp_processor_id();
-	stat->cpustat[cpu].count[idx] += val;
+	stat->count[idx] += val;
 }
 
 static s64 mem_cgroup_read_stat(struct mem_cgroup_stat *stat,
@@ -190,18 +189,21 @@ static void mem_cgroup_charge_statistics
 {
 	int val = (charge)? 1 : -1;
 	struct mem_cgroup_stat *stat = &mem->stat;
+	struct mem_cgroup_stat_cpu *cpustat;
 
 	VM_BUG_ON(!irqs_disabled());
+
+	cpustat = &stat->cpustat[smp_processor_id()];
 	if (flags & PAGE_CGROUP_FLAG_CACHE)
-		__mem_cgroup_stat_add_safe(stat, MEM_CGROUP_STAT_CACHE, val);
+		__mem_cgroup_stat_add_safe(cpustat, MEM_CGROUP_STAT_CACHE, val);
 	else
-		__mem_cgroup_stat_add_safe(stat, MEM_CGROUP_STAT_RSS, val);
+		__mem_cgroup_stat_add_safe(cpustat, MEM_CGROUP_STAT_RSS, val);
 
 	if (charge)
-		__mem_cgroup_stat_add_safe(stat,
+		__mem_cgroup_stat_add_safe(cpustat,
 				MEM_CGROUP_STAT_PGPGIN_COUNT, 1);
 	else
-		__mem_cgroup_stat_add_safe(stat,
+		__mem_cgroup_stat_add_safe(cpustat,
 				MEM_CGROUP_STAT_PGPGOUT_COUNT, 1);
 }
 
@@ -558,7 +560,7 @@ static int mem_cgroup_charge_common(stru
 		css_get(&memcg->css);
 	}
 
-	while (res_counter_charge(&mem->res, PAGE_SIZE)) {
+	while (unlikely(res_counter_charge(&mem->res, PAGE_SIZE))) {
 		if (!(gfp_mask & __GFP_WAIT))
 			goto out;
 



* Re: [PATCH 3/4] memcg: avoid account not-on-LRU pages
  2008-09-29 10:23 ` [PATCH 3/4] memcg: avoid account not-on-LRU pages KAMEZAWA Hiroyuki
@ 2008-09-29 11:19   ` Daisuke Nishimura
  2008-09-29 11:59   ` kamezawa.hiroyu
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 17+ messages in thread
From: Daisuke Nishimura @ 2008-09-29 11:19 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: nishimura, linux-mm, LKML, balbir, xemul, Andrew Morton

On Mon, 29 Sep 2008 19:23:39 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> There are not-on-LRU pages which can be mapped, and they are not worth
> accounting (because we can't shrink them, and handling such special cases
> would need dirty code). We'd like to make use of the usual
> objrmap/radix-tree protocol and don't want to account pages that are out
> of the VM's control.
> 
> When special_mapping_fault() is called, page->mapping tends to be NULL,
> so the page is charged as an anonymous page.
> insert_page() also handles some special pages from drivers.
> 
> This patch avoids accounting such special pages.
> 
> Changelog: v5 -> v6
>   - modified Documentation.
>   - fixed to charge only when a page is newly allocated.
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
>  Documentation/controllers/memory.txt |   24 ++++++++++++++++--------
>  mm/memory.c                          |   29 +++++++++++++----------------
>  mm/rmap.c                            |    4 ++--
>  3 files changed, 31 insertions(+), 26 deletions(-)
> 
> Index: mmotm-2.6.27-rc7+/mm/memory.c
> ===================================================================
> --- mmotm-2.6.27-rc7+.orig/mm/memory.c
> +++ mmotm-2.6.27-rc7+/mm/memory.c
> @@ -1323,18 +1323,14 @@ static int insert_page(struct vm_area_st
>  	pte_t *pte;
>  	spinlock_t *ptl;
>  
> -	retval = mem_cgroup_charge(page, mm, GFP_KERNEL);
> -	if (retval)
> -		goto out;
> -
>  	retval = -EINVAL;
>  	if (PageAnon(page))
> -		goto out_uncharge;
> +		goto out;
>  	retval = -ENOMEM;
>  	flush_dcache_page(page);
>  	pte = get_locked_pte(mm, addr, &ptl);
>  	if (!pte)
> -		goto out_uncharge;
> +		goto out;
>  	retval = -EBUSY;
>  	if (!pte_none(*pte))
>  		goto out_unlock;
> @@ -1350,8 +1346,6 @@ static int insert_page(struct vm_area_st
>  	return retval;
>  out_unlock:
>  	pte_unmap_unlock(pte, ptl);
> -out_uncharge:
> -	mem_cgroup_uncharge_page(page);
>  out:
>  	return retval;
>  }
> @@ -2463,6 +2457,7 @@ static int __do_fault(struct mm_struct *
>  	struct page *page;
>  	pte_t entry;
>  	int anon = 0;
> +	int charged = 0;
>  	struct page *dirty_page = NULL;
>  	struct vm_fault vmf;
>  	int ret;
> @@ -2503,6 +2498,12 @@ static int __do_fault(struct mm_struct *
>  				ret = VM_FAULT_OOM;
>  				goto out;
>  			}
> +			if (mem_cgroup_charge(page, mm, GFP_KERNEL)) {
> +				ret = VM_FAULT_OOM;
> +				page_cache_release(page);
> +				goto out;
> +			}
> +			charged = 1;
>  			/*
>  			 * Don't let another task, with possibly unlocked vma,
>  			 * keep the mlocked page.
> @@ -2543,11 +2544,6 @@ static int __do_fault(struct mm_struct *
>  
>  	}
>  
> -	if (mem_cgroup_charge(page, mm, GFP_KERNEL)) {
> -		ret = VM_FAULT_OOM;
> -		goto out;
> -	}
> -
>  	page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
>  
>  	/*
> @@ -2585,10 +2581,11 @@ static int __do_fault(struct mm_struct *
>  		/* no need to invalidate: a not-present page won't be cached */
>  		update_mmu_cache(vma, address, entry);
>  	} else {
> -		mem_cgroup_uncharge_page(page);
> -		if (anon)
> +		if (charged)
> +			mem_cgroup_uncharge_page(page);
> +		if (anon) {
>  			page_cache_release(page);
> -		else
> +		} else
>  			anon = 1; /* no anon but release faulted_page */
>  	}
>

checkpatch reports a warning here.

I think it should be like this:

@@ -2585,7 +2581,8 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		/* no need to invalidate: a not-present page won't be cached */
 		update_mmu_cache(vma, address, entry);
 	} else {
-		mem_cgroup_uncharge_page(page);
+		if (charged)
+			mem_cgroup_uncharge_page(page);
 		if (anon)
 			page_cache_release(page);
 		else


Thanks,
Daisuke Nishimura.

> Index: mmotm-2.6.27-rc7+/mm/rmap.c
> ===================================================================
> --- mmotm-2.6.27-rc7+.orig/mm/rmap.c
> +++ mmotm-2.6.27-rc7+/mm/rmap.c
> @@ -725,8 +725,8 @@ void page_remove_rmap(struct page *page,
>  			page_clear_dirty(page);
>  			set_page_dirty(page);
>  		}
> -
> -		mem_cgroup_uncharge_page(page);
> +		if (PageAnon(page))
> +			mem_cgroup_uncharge_page(page);
>  		__dec_zone_page_state(page,
>  			PageAnon(page) ? NR_ANON_PAGES : NR_FILE_MAPPED);
>  		/*
> Index: mmotm-2.6.27-rc7+/Documentation/controllers/memory.txt
> ===================================================================
> --- mmotm-2.6.27-rc7+.orig/Documentation/controllers/memory.txt
> +++ mmotm-2.6.27-rc7+/Documentation/controllers/memory.txt
> @@ -112,14 +112,22 @@ the per cgroup LRU.
>  
>  2.2.1 Accounting details
>  
> -All mapped pages (RSS) and unmapped user pages (Page Cache) are accounted.
> -RSS pages are accounted at the time of page_add_*_rmap() unless they've already
> -been accounted for earlier. A file page will be accounted for as Page Cache;
> -it's mapped into the page tables of a process, duplicate accounting is carefully
> -avoided. Page Cache pages are accounted at the time of add_to_page_cache().
> -The corresponding routines that remove a page from the page tables or removes
> -a page from Page Cache is used to decrement the accounting counters of the
> -cgroup.
> +All mapped anon pages (RSS) and cache pages (Page Cache) are accounted.
> +(Some pages which can never be reclaimed and will not be on the global LRU
> + are not accounted. We only account pages under usual VM management.)
> +
> +RSS pages are accounted at page fault unless they've already been accounted
> +for earlier. A file page will be accounted for as Page Cache when it's
> +inserted into the inode (radix-tree). While it's mapped into the page tables
> +of processes, duplicate accounting is carefully avoided.
> +
> +An RSS page is unaccounted when it's fully unmapped. A PageCache page is
> +unaccounted when it's removed from the radix-tree.
> +
> +At page migration, accounting information is kept.
> +
> +Note: we only account pages on the LRU because our purpose is to control the
> +amount of used pages; not-on-LRU pages tend to be out of the VM's control.
>  
>  2.3 Shared Page Accounting
>  
> 


* Re: [PATCH 1/4] memcg: account swap cache under lock
  2008-09-29 10:21 ` [PATCH 1/4] memcg: account swap cache under lock KAMEZAWA Hiroyuki
@ 2008-09-29 11:33   ` Daisuke Nishimura
  2008-09-30  8:05   ` Balbir Singh
  1 sibling, 0 replies; 17+ messages in thread
From: Daisuke Nishimura @ 2008-09-29 11:33 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: nishimura, linux-mm, LKML, balbir, xemul

On Mon, 29 Sep 2008 19:21:23 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> While page-cache charge/uncharge is done under lock_page(), swap-cache
> charge is not. (An anonymous page is charged when it is newly allocated.)
> 
> This patch moves do_swap_page()'s mem_cgroup_charge() call under the page
> lock. I don't see any actual problem *now*, but this fix is good for the
> future because it avoids unnecessary racy states.
> 
> 
> Changelog: (v5) -> (v6)
>  - mark_page_accessed() is moved before lock_page().
>  - fixed missing unlock_page()
> (no other changes from the previous version)
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
Looks good to me.

	Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>

>  mm/memory.c |   11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> Index: mmotm-2.6.27-rc7+/mm/memory.c
> ===================================================================
> --- mmotm-2.6.27-rc7+.orig/mm/memory.c
> +++ mmotm-2.6.27-rc7+/mm/memory.c
> @@ -2326,16 +2326,17 @@ static int do_swap_page(struct mm_struct
>  		count_vm_event(PGMAJFAULT);
>  	}
>  
> +	mark_page_accessed(page);
> +
> +	lock_page(page);
> +	delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
> +
>  	if (mem_cgroup_charge(page, mm, GFP_KERNEL)) {
> -		delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
>  		ret = VM_FAULT_OOM;
> +		unlock_page(page);
>  		goto out;
>  	}
>  
> -	mark_page_accessed(page);
> -	lock_page(page);
> -	delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
> -
>  	/*
>  	 * Back out if somebody else already faulted in this pte.
>  	 */
> 


* Re: [PATCH 2/4] memcg: set page->mapping NULL before uncharge
  2008-09-29 10:22 ` [PATCH 2/4] memcg: set page->mapping NULL before uncharge KAMEZAWA Hiroyuki
@ 2008-09-29 11:39   ` Daisuke Nishimura
  2008-10-01  3:50   ` Balbir Singh
  1 sibling, 0 replies; 17+ messages in thread
From: Daisuke Nishimura @ 2008-09-29 11:39 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: nishimura, linux-mm, LKML, balbir, xemul, Andrew Morton

On Mon, 29 Sep 2008 19:22:40 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> This patch makes page->mapping NULL before
> mem_cgroup_uncharge_cache_page() is called.
> 
> "page->mapping == NULL" is a good check for "whether the page is still
> in the radix-tree or not".
> This patch also adds VM_BUG_ON() to mem_cgroup_uncharge_cache_page().
> 
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
Looks good to me.

	Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>

>  mm/filemap.c    |    2 +-
>  mm/memcontrol.c |    1 +
>  mm/migrate.c    |    9 +++++++--
>  3 files changed, 9 insertions(+), 3 deletions(-)
> 
> Index: mmotm-2.6.27-rc7+/mm/filemap.c
> ===================================================================
> --- mmotm-2.6.27-rc7+.orig/mm/filemap.c
> +++ mmotm-2.6.27-rc7+/mm/filemap.c
> @@ -116,12 +116,12 @@ void __remove_from_page_cache(struct pag
>  {
>  	struct address_space *mapping = page->mapping;
>  
> -	mem_cgroup_uncharge_cache_page(page);
>  	radix_tree_delete(&mapping->page_tree, page->index);
>  	page->mapping = NULL;
>  	mapping->nrpages--;
>  	__dec_zone_page_state(page, NR_FILE_PAGES);
>  	BUG_ON(page_mapped(page));
> +	mem_cgroup_uncharge_cache_page(page);
>  
>  	/*
>  	 * Some filesystems seem to re-dirty the page even after
> Index: mmotm-2.6.27-rc7+/mm/memcontrol.c
> ===================================================================
> --- mmotm-2.6.27-rc7+.orig/mm/memcontrol.c
> +++ mmotm-2.6.27-rc7+/mm/memcontrol.c
> @@ -734,6 +734,7 @@ void mem_cgroup_uncharge_page(struct pag
>  void mem_cgroup_uncharge_cache_page(struct page *page)
>  {
>  	VM_BUG_ON(page_mapped(page));
> +	VM_BUG_ON(page->mapping);
>  	__mem_cgroup_uncharge_common(page, MEM_CGROUP_CHARGE_TYPE_CACHE);
>  }
>  
> Index: mmotm-2.6.27-rc7+/mm/migrate.c
> ===================================================================
> --- mmotm-2.6.27-rc7+.orig/mm/migrate.c
> +++ mmotm-2.6.27-rc7+/mm/migrate.c
> @@ -330,8 +330,6 @@ static int migrate_page_move_mapping(str
>  	__inc_zone_page_state(newpage, NR_FILE_PAGES);
>  
>  	spin_unlock_irq(&mapping->tree_lock);
> -	if (!PageSwapCache(newpage))
> -		mem_cgroup_uncharge_cache_page(page);
>  
>  	return 0;
>  }
> @@ -341,6 +339,8 @@ static int migrate_page_move_mapping(str
>   */
>  static void migrate_page_copy(struct page *newpage, struct page *page)
>  {
> +	int anon;
> +
>  	copy_highpage(newpage, page);
>  
>  	if (PageError(page))
> @@ -378,8 +378,13 @@ static void migrate_page_copy(struct pag
>  #endif
>  	ClearPagePrivate(page);
>  	set_page_private(page, 0);
> +	/* page->mapping contains a flag for PageAnon() */
> +	anon = PageAnon(page);
>  	page->mapping = NULL;
>  
> +	if (!anon) /* This page was removed from radix-tree. */
> +		mem_cgroup_uncharge_cache_page(page);
> +
>  	/*
>  	 * If any waiters have accumulated on the new page then
>  	 * wake them up.
> 


* Re: [PATCH 4/4] memcg: optimize cpustat
  2008-09-29 10:24 ` [PATCH 4/4] memcg: optimize cpustat KAMEZAWA Hiroyuki
@ 2008-09-29 11:44   ` Daisuke Nishimura
  0 siblings, 0 replies; 17+ messages in thread
From: Daisuke Nishimura @ 2008-09-29 11:44 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: nishimura, linux-mm, LKML, balbir, xemul, Andrew Morton

On Mon, 29 Sep 2008 19:24:30 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> Some obvious optimizations to memcg.
> 
> I found that mem_cgroup_charge_statistics() is a little big (in object
> code) and does unnecessary address calculation.
> This patch reduces the size of this function.
> 
> Also, res_counter_charge() is 'likely' to succeed.
> 
> Changelog: v5 -> v6
>  - patch series was reordered and needed some adjustment; no changes in logic.
> Changelog: v3 -> v4
>  - merged with another leaf patch.
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> 
> 
	Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>

I'll test with all the patches applied tonight.


Thanks,
Daisuke Nishimura.

>  mm/memcontrol.c |   18 ++++++++++--------
>  1 file changed, 10 insertions(+), 8 deletions(-)
> 
> Index: mmotm-2.6.27-rc7+/mm/memcontrol.c
> ===================================================================
> --- mmotm-2.6.27-rc7+.orig/mm/memcontrol.c
> +++ mmotm-2.6.27-rc7+/mm/memcontrol.c
> @@ -66,11 +66,10 @@ struct mem_cgroup_stat {
>  /*
>   * For accounting under irq disable, no need for increment preempt count.
>   */
> -static void __mem_cgroup_stat_add_safe(struct mem_cgroup_stat *stat,
> +static inline void __mem_cgroup_stat_add_safe(struct mem_cgroup_stat_cpu *stat,
>  		enum mem_cgroup_stat_index idx, int val)
>  {
> -	int cpu = smp_processor_id();
> -	stat->cpustat[cpu].count[idx] += val;
> +	stat->count[idx] += val;
>  }
>  
>  static s64 mem_cgroup_read_stat(struct mem_cgroup_stat *stat,
> @@ -190,18 +189,21 @@ static void mem_cgroup_charge_statistics
>  {
>  	int val = (charge)? 1 : -1;
>  	struct mem_cgroup_stat *stat = &mem->stat;
> +	struct mem_cgroup_stat_cpu *cpustat;
>  
>  	VM_BUG_ON(!irqs_disabled());
> +
> +	cpustat = &stat->cpustat[smp_processor_id()];
>  	if (flags & PAGE_CGROUP_FLAG_CACHE)
> -		__mem_cgroup_stat_add_safe(stat, MEM_CGROUP_STAT_CACHE, val);
> +		__mem_cgroup_stat_add_safe(cpustat, MEM_CGROUP_STAT_CACHE, val);
>  	else
> -		__mem_cgroup_stat_add_safe(stat, MEM_CGROUP_STAT_RSS, val);
> +		__mem_cgroup_stat_add_safe(cpustat, MEM_CGROUP_STAT_RSS, val);
>  
>  	if (charge)
> -		__mem_cgroup_stat_add_safe(stat,
> +		__mem_cgroup_stat_add_safe(cpustat,
>  				MEM_CGROUP_STAT_PGPGIN_COUNT, 1);
>  	else
> -		__mem_cgroup_stat_add_safe(stat,
> +		__mem_cgroup_stat_add_safe(cpustat,
>  				MEM_CGROUP_STAT_PGPGOUT_COUNT, 1);
>  }
>  
> @@ -558,7 +560,7 @@ static int mem_cgroup_charge_common(stru
>  		css_get(&memcg->css);
>  	}
>  
> -	while (res_counter_charge(&mem->res, PAGE_SIZE)) {
> +	while (unlikely(res_counter_charge(&mem->res, PAGE_SIZE))) {
>  		if (!(gfp_mask & __GFP_WAIT))
>  			goto out;
>  
> 


* Re: Re: [PATCH 3/4] memcg: avoid account not-on-LRU pages
  2008-09-29 10:23 ` [PATCH 3/4] memcg: avoid account not-on-LRU pages KAMEZAWA Hiroyuki
  2008-09-29 11:19   ` Daisuke Nishimura
@ 2008-09-29 11:59   ` kamezawa.hiroyu
  2008-09-30  1:17   ` [PATCH/stylefix " KAMEZAWA Hiroyuki
  2008-09-30  8:14   ` [PATCH " Balbir Singh
  3 siblings, 0 replies; 17+ messages in thread
From: kamezawa.hiroyu @ 2008-09-29 11:59 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: KAMEZAWA Hiroyuki, nishimura, linux-mm, LKML, balbir, xemul,
	Andrew Morton

----- Original Message -----
>On Mon, 29 Sep 2008 19:23:39 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>> There are not-on-LRU pages which can be mapped, and they are not worth
>> accounting (because we can't shrink them, and handling such special cases
>> would need dirty code). We'd like to make use of the usual
>> objrmap/radix-tree protocol and don't want to account pages that are out
>> of the VM's control.
>> 
>> When special_mapping_fault() is called, page->mapping tends to be NULL,
>> so the page is charged as an anonymous page.
>> insert_page() also handles some special pages from drivers.
>> 
>> This patch avoids accounting such special pages.
>> 
>> Changelog: v5 -> v6
>>   - modified Documentation.
>>   - fixed to charge only when a page is newly allocated.
>> 
>> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>> 
>>  Documentation/controllers/memory.txt |   24 ++++++++++++++++--------
>>  mm/memory.c                          |   29 +++++++++++++----------------
>>  mm/rmap.c                            |    4 ++--
>>  3 files changed, 31 insertions(+), 26 deletions(-)
>> 
>> Index: mmotm-2.6.27-rc7+/mm/memory.c
>> ===================================================================
>> --- mmotm-2.6.27-rc7+.orig/mm/memory.c
>> +++ mmotm-2.6.27-rc7+/mm/memory.c
>> @@ -1323,18 +1323,14 @@ static int insert_page(struct vm_area_st
>>  	pte_t *pte;
>>  	spinlock_t *ptl;
>>  
>> -	retval = mem_cgroup_charge(page, mm, GFP_KERNEL);
>> -	if (retval)
>> -		goto out;
>> -
>>  	retval = -EINVAL;
>>  	if (PageAnon(page))
>> -		goto out_uncharge;
>> +		goto out;
>>  	retval = -ENOMEM;
>>  	flush_dcache_page(page);
>>  	pte = get_locked_pte(mm, addr, &ptl);
>>  	if (!pte)
>> -		goto out_uncharge;
>> +		goto out;
>>  	retval = -EBUSY;
>>  	if (!pte_none(*pte))
>>  		goto out_unlock;
>> @@ -1350,8 +1346,6 @@ static int insert_page(struct vm_area_st
>>  	return retval;
>>  out_unlock:
>>  	pte_unmap_unlock(pte, ptl);
>> -out_uncharge:
>> -	mem_cgroup_uncharge_page(page);
>>  out:
>>  	return retval;
>>  }
>> @@ -2463,6 +2457,7 @@ static int __do_fault(struct mm_struct *
>>  	struct page *page;
>>  	pte_t entry;
>>  	int anon = 0;
>> +	int charged = 0;
>>  	struct page *dirty_page = NULL;
>>  	struct vm_fault vmf;
>>  	int ret;
>> @@ -2503,6 +2498,12 @@ static int __do_fault(struct mm_struct *
>>  				ret = VM_FAULT_OOM;
>>  				goto out;
>>  			}
>> +			if (mem_cgroup_charge(page, mm, GFP_KERNEL)) {
>> +				ret = VM_FAULT_OOM;
>> +				page_cache_release(page);
>> +				goto out;
>> +			}
>> +			charged = 1;
>>  			/*
>>  			 * Don't let another task, with possibly unlocked vma,
>>  			 * keep the mlocked page.
>> @@ -2543,11 +2544,6 @@ static int __do_fault(struct mm_struct *
>>  
>>  	}
>>  
>> -	if (mem_cgroup_charge(page, mm, GFP_KERNEL)) {
>> -		ret = VM_FAULT_OOM;
>> -		goto out;
>> -	}
>> -
>>  	page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
>>  
>>  	/*
>> @@ -2585,10 +2581,11 @@ static int __do_fault(struct mm_struct *
>>  		/* no need to invalidate: a not-present page won't be cached */
>>  		update_mmu_cache(vma, address, entry);
>>  	} else {
>> -		mem_cgroup_uncharge_page(page);
>> -		if (anon)
>> +		if (charged)
>> +			mem_cgroup_uncharge_page(page);
>> +		if (anon) {
>>  			page_cache_release(page);
>> -		else
>> +		} else
>>  			anon = 1; /* no anon but release faulted_page */
>>  	}
>>
>
>checkpatch reports a warning here.
>
>I think it should be like this:
>
Oh, thanks. I'll post a fixed one tomorrow.

Thanks,
-Kame

>@@ -2585,7 +2581,8 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> 		/* no need to invalidate: a not-present page won't be cached */
> 		update_mmu_cache(vma, address, entry);
> 	} else {
>-		mem_cgroup_uncharge_page(page);
>+		if (charged)
>+			mem_cgroup_uncharge_page(page);
> 		if (anon)
> 			page_cache_release(page);
> 		else
>
>
>Thanks,
>Daisuke Nishimura.
>
>> Index: mmotm-2.6.27-rc7+/mm/rmap.c
>> ===================================================================
>> --- mmotm-2.6.27-rc7+.orig/mm/rmap.c
>> +++ mmotm-2.6.27-rc7+/mm/rmap.c
>> @@ -725,8 +725,8 @@ void page_remove_rmap(struct page *page,
>>  			page_clear_dirty(page);
>>  			set_page_dirty(page);
>>  		}
>> -
>> -		mem_cgroup_uncharge_page(page);
>> +		if (PageAnon(page))
>> +			mem_cgroup_uncharge_page(page);
>>  		__dec_zone_page_state(page,
>>  			PageAnon(page) ? NR_ANON_PAGES : NR_FILE_MAPPED);
>>  		/*
>> Index: mmotm-2.6.27-rc7+/Documentation/controllers/memory.txt
>> ===================================================================
>> --- mmotm-2.6.27-rc7+.orig/Documentation/controllers/memory.txt
>> +++ mmotm-2.6.27-rc7+/Documentation/controllers/memory.txt
>> @@ -112,14 +112,22 @@ the per cgroup LRU.
>>  
>>  2.2.1 Accounting details
>>  
>> -All mapped pages (RSS) and unmapped user pages (Page Cache) are accounted.
>> -RSS pages are accounted at the time of page_add_*_rmap() unless they've already
>> -been accounted for earlier. A file page will be accounted for as Page Cache;
>> -it's mapped into the page tables of a process, duplicate accounting is carefully
>> -avoided. Page Cache pages are accounted at the time of add_to_page_cache().
>> -The corresponding routines that remove a page from the page tables or removes
>> -a page from Page Cache is used to decrement the accounting counters of the
>> -cgroup.
>> +All mapped anon pages (RSS) and cache pages (Page Cache) are accounted.
>> +(Some pages which can never be reclaimed and will not be on the global LRU
>> + are not accounted. We only account pages under usual VM management.)
>> +
>> +RSS pages are accounted at page fault unless they've already been accounted
>> +for earlier. A file page will be accounted for as Page Cache when it's
>> +inserted into the inode (radix-tree). While it's mapped into the page tables
>> +of processes, duplicate accounting is carefully avoided.
>> +
>> +An RSS page is unaccounted when it's fully unmapped. A PageCache page is
>> +unaccounted when it's removed from the radix-tree.
>> +
>> +At page migration, accounting information is kept.
>> +
>> +Note: we only account pages on the LRU because our purpose is to control the
>> +amount of used pages; not-on-LRU pages tend to be out of the VM's control.
>>  
>>  2.3 Shared Page Accounting
>>  
>> 



* [PATCH/stylefix 3/4] memcg: avoid account not-on-LRU pages
  2008-09-29 10:23 ` [PATCH 3/4] memcg: avoid account not-on-LRU pages KAMEZAWA Hiroyuki
  2008-09-29 11:19   ` Daisuke Nishimura
  2008-09-29 11:59   ` kamezawa.hiroyu
@ 2008-09-30  1:17   ` KAMEZAWA Hiroyuki
  2008-10-01  3:49     ` Balbir Singh
  2008-09-30  8:14   ` [PATCH " Balbir Singh
  3 siblings, 1 reply; 17+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-09-30  1:17 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, LKML, balbir, xemul, nishimura, Andrew Morton

This is the coding-style-fixed version. Thank you, Nishimura-san.
-Kame
==
There are not-on-LRU pages which can be mapped, and they are not worth
accounting (because we can't shrink them, and handling such special cases
would need dirty code). We'd like to make use of the usual
objrmap/radix-tree protocol and don't want to account pages that are out
of the VM's control.

When special_mapping_fault() is called, page->mapping tends to be NULL,
so the page is charged as an anonymous page.
insert_page() also handles some special pages from drivers.

This patch avoids accounting such special pages.

Changelog: v5 -> v6
  - modified Documentation.
  - fixed to charge only when a page is newly allocated.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

 Documentation/controllers/memory.txt |   24 ++++++++++++++++--------
 mm/memory.c                          |   25 +++++++++++--------------
 mm/rmap.c                            |    4 ++--
 3 files changed, 29 insertions(+), 24 deletions(-)

Index: mmotm-2.6.27-rc7+/mm/memory.c
===================================================================
--- mmotm-2.6.27-rc7+.orig/mm/memory.c
+++ mmotm-2.6.27-rc7+/mm/memory.c
@@ -1323,18 +1323,14 @@ static int insert_page(struct vm_area_st
 	pte_t *pte;
 	spinlock_t *ptl;
 
-	retval = mem_cgroup_charge(page, mm, GFP_KERNEL);
-	if (retval)
-		goto out;
-
 	retval = -EINVAL;
 	if (PageAnon(page))
-		goto out_uncharge;
+		goto out;
 	retval = -ENOMEM;
 	flush_dcache_page(page);
 	pte = get_locked_pte(mm, addr, &ptl);
 	if (!pte)
-		goto out_uncharge;
+		goto out;
 	retval = -EBUSY;
 	if (!pte_none(*pte))
 		goto out_unlock;
@@ -1350,8 +1346,6 @@ static int insert_page(struct vm_area_st
 	return retval;
 out_unlock:
 	pte_unmap_unlock(pte, ptl);
-out_uncharge:
-	mem_cgroup_uncharge_page(page);
 out:
 	return retval;
 }
@@ -2463,6 +2457,7 @@ static int __do_fault(struct mm_struct *
 	struct page *page;
 	pte_t entry;
 	int anon = 0;
+	int charged = 0;
 	struct page *dirty_page = NULL;
 	struct vm_fault vmf;
 	int ret;
@@ -2503,6 +2498,12 @@ static int __do_fault(struct mm_struct *
 				ret = VM_FAULT_OOM;
 				goto out;
 			}
+			if (mem_cgroup_charge(page, mm, GFP_KERNEL)) {
+				ret = VM_FAULT_OOM;
+				page_cache_release(page);
+				goto out;
+			}
+			charged = 1;
 			/*
 			 * Don't let another task, with possibly unlocked vma,
 			 * keep the mlocked page.
@@ -2543,11 +2544,6 @@ static int __do_fault(struct mm_struct *
 
 	}
 
-	if (mem_cgroup_charge(page, mm, GFP_KERNEL)) {
-		ret = VM_FAULT_OOM;
-		goto out;
-	}
-
 	page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
 
 	/*
@@ -2585,7 +2581,8 @@ static int __do_fault(struct mm_struct *
 		/* no need to invalidate: a not-present page won't be cached */
 		update_mmu_cache(vma, address, entry);
 	} else {
-		mem_cgroup_uncharge_page(page);
+		if (charged)
+			mem_cgroup_uncharge_page(page);
 		if (anon)
 			page_cache_release(page);
 		else
Index: mmotm-2.6.27-rc7+/mm/rmap.c
===================================================================
--- mmotm-2.6.27-rc7+.orig/mm/rmap.c
+++ mmotm-2.6.27-rc7+/mm/rmap.c
@@ -725,8 +725,8 @@ void page_remove_rmap(struct page *page,
 			page_clear_dirty(page);
 			set_page_dirty(page);
 		}
-
-		mem_cgroup_uncharge_page(page);
+		if (PageAnon(page))
+			mem_cgroup_uncharge_page(page);
 		__dec_zone_page_state(page,
 			PageAnon(page) ? NR_ANON_PAGES : NR_FILE_MAPPED);
 		/*
Index: mmotm-2.6.27-rc7+/Documentation/controllers/memory.txt
===================================================================
--- mmotm-2.6.27-rc7+.orig/Documentation/controllers/memory.txt
+++ mmotm-2.6.27-rc7+/Documentation/controllers/memory.txt
@@ -112,14 +112,22 @@ the per cgroup LRU.
 
 2.2.1 Accounting details
 
-All mapped pages (RSS) and unmapped user pages (Page Cache) are accounted.
-RSS pages are accounted at the time of page_add_*_rmap() unless they've already
-been accounted for earlier. A file page will be accounted for as Page Cache;
-it's mapped into the page tables of a process, duplicate accounting is carefully
-avoided. Page Cache pages are accounted at the time of add_to_page_cache().
-The corresponding routines that remove a page from the page tables or removes
-a page from Page Cache is used to decrement the accounting counters of the
-cgroup.
+All mapped anon pages (RSS) and cache pages (Page Cache) are accounted.
+(Some pages which can never be reclaimed and will not be on the global LRU
+ are not accounted. We only account pages under usual VM management.)
+
+RSS pages are accounted at page fault unless they've already been accounted
+for earlier. A file page will be accounted for as Page Cache when it's
+inserted into the inode (radix-tree). While it's mapped into the page tables
+of processes, duplicate accounting is carefully avoided.
+
+An RSS page is unaccounted when it's fully unmapped. A PageCache page is
+unaccounted when it's removed from the radix-tree.
+
+At page migration, accounting information is kept.
+
+Note: we only account pages on the LRU because our purpose is to control the
+amount of used pages; not-on-LRU pages tend to be out of the VM's control.
 
 2.3 Shared Page Accounting
 



* Re: [PATCH 1/4] memcg: account swap cache under lock
  2008-09-29 10:21 ` [PATCH 1/4] memcg: account swap cache under lock KAMEZAWA Hiroyuki
  2008-09-29 11:33   ` Daisuke Nishimura
@ 2008-09-30  8:05   ` Balbir Singh
  1 sibling, 0 replies; 17+ messages in thread
From: Balbir Singh @ 2008-09-30  8:05 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, LKML, xemul, nishimura

KAMEZAWA Hiroyuki wrote:
> While page-cache charge/uncharge is done under lock_page(), swap-cache
> charge is not. (An anonymous page is charged when it is newly allocated.)
> 
> This patch moves do_swap_page()'s mem_cgroup_charge() call under the page
> lock. I don't see any actual problem *now*, but this fix is good for the
> future because it avoids unnecessary racy states.
> 
> 
> Changelog: (v5) -> (v6)
>  - mark_page_accessed() is moved before lock_page().
>  - fixed missing unlock_page()
> (no other changes from the previous version)

Looks good to me.

Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>

-- 
	Balbir


* Re: [PATCH 3/4] memcg: avoid account not-on-LRU pages
  2008-09-29 10:23 ` [PATCH 3/4] memcg: avoid account not-on-LRU pages KAMEZAWA Hiroyuki
                     ` (2 preceding siblings ...)
  2008-09-30  1:17   ` [PATCH/stylefix " KAMEZAWA Hiroyuki
@ 2008-09-30  8:14   ` Balbir Singh
  3 siblings, 0 replies; 17+ messages in thread
From: Balbir Singh @ 2008-09-30  8:14 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, LKML, xemul, nishimura, Andrew Morton

KAMEZAWA Hiroyuki wrote:
> +		if (charged)
> +			mem_cgroup_uncharge_page(page);
> +		if (anon) {
>  			page_cache_release(page);
> -		else
> +		} else
>  			anon = 1; /* no anon but release faulted_page */
>  	}

Coding style: braces around a single-line if/else; please recheck.

-- 
	Balbir


* Re: [PATCH/stylefix 3/4] memcg: avoid account not-on-LRU pages
  2008-09-30  1:17   ` [PATCH/stylefix " KAMEZAWA Hiroyuki
@ 2008-10-01  3:49     ` Balbir Singh
  2008-10-01  4:50       ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 17+ messages in thread
From: Balbir Singh @ 2008-10-01  3:49 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, LKML, xemul, nishimura, Andrew Morton

KAMEZAWA Hiroyuki wrote:
> This is the coding-style-fixed version. Thank you, Nishimura-san.
> -Kame
> ==
> There are not-on-LRU pages which can be mapped, and they are not worth
> accounting (because we can't shrink them, and handling such special cases
> would need dirty code). We'd like to make use of the usual
> objrmap/radix-tree protocol and don't want to account pages that are out
> of the VM's control.
> 
> When special_mapping_fault() is called, page->mapping tends to be NULL,
> so the page is charged as an anonymous page.
> insert_page() also handles some special pages from drivers.
> 
> This patch avoids accounting such special pages.
> 
> Changelog: v5 -> v6
>   - modified Documentation.
>   - fixed to charge only when a page is newly allocated.
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 

[snip]
> @@ -2463,6 +2457,7 @@ static int __do_fault(struct mm_struct *
>  	struct page *page;
>  	pte_t entry;
>  	int anon = 0;
> +	int charged = 0;
>  	struct page *dirty_page = NULL;
>  	struct vm_fault vmf;
>  	int ret;
> @@ -2503,6 +2498,12 @@ static int __do_fault(struct mm_struct *
>  				ret = VM_FAULT_OOM;
>  				goto out;
>  			}
> +			if (mem_cgroup_charge(page, mm, GFP_KERNEL)) {
> +				ret = VM_FAULT_OOM;
> +				page_cache_release(page);
> +				goto out;
> +			}
> +			charged = 1;

If I understand this correctly, we now account only when the VMA is not shared?
Seems reasonable, since we don't allocate a page otherwise.


[snip]


> Index: mmotm-2.6.27-rc7+/mm/rmap.c
> ===================================================================
> --- mmotm-2.6.27-rc7+.orig/mm/rmap.c
> +++ mmotm-2.6.27-rc7+/mm/rmap.c
> @@ -725,8 +725,8 @@ void page_remove_rmap(struct page *page,
>  			page_clear_dirty(page);
>  			set_page_dirty(page);
>  		}
> -
> -		mem_cgroup_uncharge_page(page);
> +		if (PageAnon(page))
> +			mem_cgroup_uncharge_page(page);

Is the change because we expect the page to get directly uncharged when it is
removed from the cache, i.e., page->mapping is set to NULL before uncharge?

Looks good to me, I am yet to test it though.

Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>


-- 
	Balbir


* Re: [PATCH 2/4] memcg: set page->mapping NULL before uncharge
  2008-09-29 10:22 ` [PATCH 2/4] memcg: set page->mapping NULL before uncharge KAMEZAWA Hiroyuki
  2008-09-29 11:39   ` Daisuke Nishimura
@ 2008-10-01  3:50   ` Balbir Singh
  1 sibling, 0 replies; 17+ messages in thread
From: Balbir Singh @ 2008-10-01  3:50 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, LKML, xemul, nishimura, Andrew Morton

KAMEZAWA Hiroyuki wrote:
> This patch makes page->mapping NULL before
> mem_cgroup_uncharge_cache_page() is called.
> 
> "page->mapping == NULL" is a good check for "whether the page is still
> in the radix-tree or not".
> This patch also adds VM_BUG_ON() to mem_cgroup_uncharge_cache_page().
> 
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Looks good to me.

Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>

-- 
	Balbir


* Re: [PATCH/stylefix 3/4] memcg: avoid account not-on-LRU pages
  2008-10-01  3:49     ` Balbir Singh
@ 2008-10-01  4:50       ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 17+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-10-01  4:50 UTC (permalink / raw)
  To: balbir; +Cc: linux-mm, LKML, xemul, nishimura, Andrew Morton

On Wed, 01 Oct 2008 09:19:10 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> KAMEZAWA Hiroyuki wrote:
> > This is the coding-style-fixed version. Thank you, Nishimura-san.
> > -Kame
> > ==
> > There are not-on-LRU pages which can be mapped, and they are not worth
> > accounting (because we can't shrink them, and handling such special cases
> > would need dirty code). We'd like to make use of the usual
> > objrmap/radix-tree protocol and don't want to account pages that are out
> > of the VM's control.
> > 
> > When special_mapping_fault() is called, page->mapping tends to be NULL,
> > so the page is charged as an anonymous page.
> > insert_page() also handles some special pages from drivers.
> > 
> > This patch avoids accounting such special pages.
> > 
> > Changelog: v5 -> v6
> >   - modified Documentation.
> >   - fixed to charge only when a page is newly allocated.
> > 
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > 
> 
> [snip]
> > @@ -2463,6 +2457,7 @@ static int __do_fault(struct mm_struct *
> >  	struct page *page;
> >  	pte_t entry;
> >  	int anon = 0;
> > +	int charged = 0;
> >  	struct page *dirty_page = NULL;
> >  	struct vm_fault vmf;
> >  	int ret;
> > @@ -2503,6 +2498,12 @@ static int __do_fault(struct mm_struct *
> >  				ret = VM_FAULT_OOM;
> >  				goto out;
> >  			}
> > +			if (mem_cgroup_charge(page, mm, GFP_KERNEL)) {
> > +				ret = VM_FAULT_OOM;
> > +				page_cache_release(page);
> > +				goto out;
> > +			}
> > +			charged = 1;
> 
> If I understand this correctly, we now account only when the VMA is not shared?
> Seems reasonable, since we don't allocate a page otherwise.
> 
We charge when we allocate a new page. If it's a page-cache page, it's accounted when it is inserted into the radix-tree.

> 
> [snip]
> 
> 
> > Index: mmotm-2.6.27-rc7+/mm/rmap.c
> > ===================================================================
> > --- mmotm-2.6.27-rc7+.orig/mm/rmap.c
> > +++ mmotm-2.6.27-rc7+/mm/rmap.c
> > @@ -725,8 +725,8 @@ void page_remove_rmap(struct page *page,
> >  			page_clear_dirty(page);
> >  			set_page_dirty(page);
> >  		}
> > -
> > -		mem_cgroup_uncharge_page(page);
> > +		if (PageAnon(page))
> > +			mem_cgroup_uncharge_page(page);
> 
> Is the change because we expect the page to get directly uncharged when it is
> removed from the cache, i.e., page->mapping is set to NULL before uncharge?
> 
Yes. I expect the deletion from the radix-tree to catch page-cache pages.
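
The division of labor, sketched from the hunks in this patch and in patch
2/4: page_remove_rmap() now uncharges only anonymous pages, while a
page-cache page is uncharged when it leaves the radix-tree:

	/* rmap side: last unmap of an anonymous page */
	if (PageAnon(page))
		mem_cgroup_uncharge_page(page);

	/* radix-tree side (__remove_from_page_cache): */
	page->mapping = NULL;
	...
	mem_cgroup_uncharge_cache_page(page);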

> Looks good to me, I am yet to test it though.
> 
Thanks,
-Kame

> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> 
> 
> -- 
> 	Balbir
> 



* Re: [PATCH 0/4] memcg: ready-to-go series (was memcg update v6)
  2008-09-29 10:19 [PATCH 0/4] memcg: ready-to-go series (was memcg update v6) KAMEZAWA Hiroyuki
                   ` (3 preceding siblings ...)
  2008-09-29 10:24 ` [PATCH 4/4] memcg: optimize cpustat KAMEZAWA Hiroyuki
@ 2008-10-06 17:15 ` Balbir Singh
  4 siblings, 0 replies; 17+ messages in thread
From: Balbir Singh @ 2008-10-06 17:15 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki, Andrew Morton; +Cc: linux-mm, LKML, xemul, nishimura

KAMEZAWA Hiroyuki wrote:
> These 4 patches are cut out from the memcg update v5 series
> (so this is part of v6).
> 
> I think we have reached agreement on these 4.
> 
> Please ack if they look OK.
> 
> [1/4] ...  account swap under lock
> [2/4] ...  make page->mapping NULL before uncharging cache.
> [3/4] ...  avoid accounting not-on-LRU pages.
> [4/4] ...  optimize cpu stat
> 
> I still have 6 more patches, but they are under test and need review and discussion.

Hi Andrew,

Could you please pick up this patchset? Following it is a very important set of
patches that removes struct page's page_cgroup member.


-- 
	Balbir
