* [PATCH 0/4] memcg: simplify LRU handling.
@ 2011-12-14  7:47 KAMEZAWA Hiroyuki
  2011-12-14  7:49 ` [PATCH 1/4] memcg: simplify page cache charging KAMEZAWA Hiroyuki
                   ` (4 more replies)
  0 siblings, 5 replies; 22+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-12-14  7:47 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, hannes, Michal Hocko, akpm, Hugh Dickins, Ying Han


This series is on top of linux-next +
memcg-add-mem_cgroup_replace_page_cache-to-fix-lru-issue.patch

The 1st purpose of this series is to reduce the overhead of
mem_cgroup_add/del_lru. They use some atomic ops. After this series,
the LRU handling routine will be

==
struct lruvec *mem_cgroup_lru_add_list(struct zone *zone, struct page *page,
                                       enum lru_list lru)
{
        struct mem_cgroup_per_zone *mz;
        struct mem_cgroup *memcg;
        struct page_cgroup *pc;

        if (mem_cgroup_disabled())
                return &zone->lruvec;

        pc = lookup_page_cgroup(page);
        memcg = pc->mem_cgroup;
        VM_BUG_ON(!memcg);
        mz = page_cgroup_zoneinfo(memcg, page);
        /* compound_order() is stabilized through lru_lock */
        MEM_CGROUP_ZSTAT(mz, lru) += 1 << compound_order(page);
        return &mz->lruvec;
}
==

simple, with no atomic ops. Because of Johannes' work in linux-next,
this can be achieved in a very straightforward way.
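
For reference, the delete side ends up equally simple. This is a sketch
reconstructed from the hunks in patch 4/4 (the function prologue is
assumed unchanged from linux-next), so treat it as illustrative rather
than final code:

==
void mem_cgroup_lru_del_list(struct page *page, enum lru_list lru)
{
        struct mem_cgroup_per_zone *mz;
        struct mem_cgroup *memcg;
        struct page_cgroup *pc;

        if (mem_cgroup_disabled())
                return;

        pc = lookup_page_cgroup(page);
        memcg = pc->mem_cgroup;
        VM_BUG_ON(!memcg);
        mz = page_cgroup_zoneinfo(memcg, page);
        /* huge page split is done under lru_lock. so, we have no races. */
        MEM_CGROUP_ZSTAT(mz, lru) -= 1 << compound_order(page);
}
==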

Thanks,
-Kame




* [PATCH 1/4] memcg: simplify page cache charging.
  2011-12-14  7:47 [PATCH 0/4] memcg: simplify LRU handling KAMEZAWA Hiroyuki
@ 2011-12-14  7:49 ` KAMEZAWA Hiroyuki
  2011-12-16 22:28   ` Andrew Morton
                     ` (2 more replies)
  2011-12-14  7:50 ` [PATCH 2/4] memcg: simplify corner case handling of LRU KAMEZAWA Hiroyuki
                   ` (3 subsequent siblings)
  4 siblings, 3 replies; 22+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-12-14  7:49 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, hannes, Michal Hocko, akpm, Hugh Dickins,
	Ying Han

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

This patch is a cleanup. No functional/logical changes.

Because of commit ef6a3c6311, FUSE uses replace_page_cache() instead
of add_to_page_cache(). As a result, mem_cgroup_cache_charge() is no
longer called against FUSE's pages from splice.

So, now, mem_cgroup_cache_charge() doesn't receive a page on the LRU
unless it's SwapCache. For checking, WARN_ON_ONCE(PageLRU(page)) is added.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 mm/memcontrol.c |   31 +++++++++----------------------
 1 files changed, 9 insertions(+), 22 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a9e92a6..947c62c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2710,6 +2710,7 @@ int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
 				gfp_t gfp_mask)
 {
 	struct mem_cgroup *memcg = NULL;
+	enum charge_type type = MEM_CGROUP_CHARGE_TYPE_CACHE;
 	int ret;
 
 	if (mem_cgroup_disabled())
@@ -2719,31 +2720,17 @@ int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
 
 	if (unlikely(!mm))
 		mm = &init_mm;
+	if (!page_is_file_cache(page))
+		type = MEM_CGROUP_CHARGE_TYPE_SHMEM;
 
-	if (page_is_file_cache(page)) {
-		ret = __mem_cgroup_try_charge(mm, gfp_mask, 1, &memcg, true);
-		if (ret || !memcg)
-			return ret;
-
-		/*
-		 * FUSE reuses pages without going through the final
-		 * put that would remove them from the LRU list, make
-		 * sure that they get relinked properly.
-		 */
-		__mem_cgroup_commit_charge_lrucare(page, memcg,
-					MEM_CGROUP_CHARGE_TYPE_CACHE);
-		return ret;
-	}
-	/* shmem */
-	if (PageSwapCache(page)) {
+	if (!PageSwapCache(page)) {
+		ret = mem_cgroup_charge_common(page, mm, gfp_mask, type);
+		WARN_ON_ONCE(PageLRU(page));
+	} else { /* page is swapcache/shmem */
 		ret = mem_cgroup_try_charge_swapin(mm, page, gfp_mask, &memcg);
 		if (!ret)
-			__mem_cgroup_commit_charge_swapin(page, memcg,
-					MEM_CGROUP_CHARGE_TYPE_SHMEM);
-	} else
-		ret = mem_cgroup_charge_common(page, mm, gfp_mask,
-					MEM_CGROUP_CHARGE_TYPE_SHMEM);
-
+			__mem_cgroup_commit_charge_swapin(page, memcg, type);
+	}
 	return ret;
 }
 
-- 
1.7.4.1




* [PATCH 2/4] memcg: simplify corner case handling of LRU.
  2011-12-14  7:47 [PATCH 0/4] memcg: simplify LRU handling KAMEZAWA Hiroyuki
  2011-12-14  7:49 ` [PATCH 1/4] memcg: simplify page cache charging KAMEZAWA Hiroyuki
@ 2011-12-14  7:50 ` KAMEZAWA Hiroyuki
  2011-12-19 15:14   ` Johannes Weiner
  2011-12-20 15:05   ` Michal Hocko
  2011-12-14  7:51 ` [PATCH 3/4] memcg: clear pc->mem_cgroup if necessary KAMEZAWA Hiroyuki
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 22+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-12-14  7:50 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, hannes, Michal Hocko, akpm, Hugh Dickins,
	Ying Han

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

This patch simplifies LRU handling of the racy case (memcg+SwapCache).
At charging, SwapCache tends to be on the LRU already. So, before
overwriting pc->mem_cgroup, the page must be removed from the LRU and
added back to the LRU later.

This patch does
        spin_lock(zone->lru_lock);
        if (PageLRU(page))
                remove from LRU
        overwrite pc->mem_cgroup
        if (PageLRU(page))
                add to new LRU.
        spin_unlock(zone->lru_lock);

This guarantees that no page is on the LRU while pc->mem_cgroup is
being modified. This patch also unifies the LRU handling of
replace_page_cache() and swapin.

Changelog:
 - modify PageLRU flag correctly.
 - handle replace_page_cache.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 mm/memcontrol.c |  109 ++++++++-----------------------------------------------
 1 files changed, 16 insertions(+), 93 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 947c62c..7a857e8 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1071,86 +1071,6 @@ struct lruvec *mem_cgroup_lru_move_lists(struct zone *zone,
 }
 
 /*
- * At handling SwapCache and other FUSE stuff, pc->mem_cgroup may be changed
- * while it's linked to lru because the page may be reused after it's fully
- * uncharged. To handle that, unlink page_cgroup from LRU when charge it again.
- * It's done under lock_page and expected that zone->lru_lock isnever held.
- */
-static void mem_cgroup_lru_del_before_commit(struct page *page)
-{
-	enum lru_list lru;
-	unsigned long flags;
-	struct zone *zone = page_zone(page);
-	struct page_cgroup *pc = lookup_page_cgroup(page);
-
-	/*
-	 * Doing this check without taking ->lru_lock seems wrong but this
-	 * is safe. Because if page_cgroup's USED bit is unset, the page
-	 * will not be added to any memcg's LRU. If page_cgroup's USED bit is
-	 * set, the commit after this will fail, anyway.
-	 * This all charge/uncharge is done under some mutual execustion.
-	 * So, we don't need to taking care of changes in USED bit.
-	 */
-	if (likely(!PageLRU(page)))
-		return;
-
-	spin_lock_irqsave(&zone->lru_lock, flags);
-	lru = page_lru(page);
-	/*
-	 * The uncharged page could still be registered to the LRU of
-	 * the stale pc->mem_cgroup.
-	 *
-	 * As pc->mem_cgroup is about to get overwritten, the old LRU
-	 * accounting needs to be taken care of.  Let root_mem_cgroup
-	 * babysit the page until the new memcg is responsible for it.
-	 *
-	 * The PCG_USED bit is guarded by lock_page() as the page is
-	 * swapcache/pagecache.
-	 */
-	if (PageLRU(page) && PageCgroupAcctLRU(pc) && !PageCgroupUsed(pc)) {
-		del_page_from_lru_list(zone, page, lru);
-		add_page_to_lru_list(zone, page, lru);
-	}
-	spin_unlock_irqrestore(&zone->lru_lock, flags);
-}
-
-static void mem_cgroup_lru_add_after_commit(struct page *page)
-{
-	enum lru_list lru;
-	unsigned long flags;
-	struct zone *zone = page_zone(page);
-	struct page_cgroup *pc = lookup_page_cgroup(page);
-	/*
-	 * putback:				charge:
-	 * SetPageLRU				SetPageCgroupUsed
-	 * smp_mb				smp_mb
-	 * PageCgroupUsed && add to memcg LRU	PageLRU && add to memcg LRU
-	 *
-	 * Ensure that one of the two sides adds the page to the memcg
-	 * LRU during a race.
-	 */
-	smp_mb();
-	/* taking care of that the page is added to LRU while we commit it */
-	if (likely(!PageLRU(page)))
-		return;
-	spin_lock_irqsave(&zone->lru_lock, flags);
-	lru = page_lru(page);
-	/*
-	 * If the page is not on the LRU, someone will soon put it
-	 * there.  If it is, and also already accounted for on the
-	 * memcg-side, it must be on the right lruvec as setting
-	 * pc->mem_cgroup and PageCgroupUsed is properly ordered.
-	 * Otherwise, root_mem_cgroup has been babysitting the page
-	 * during the charge.  Move it to the new memcg now.
-	 */
-	if (PageLRU(page) && !PageCgroupAcctLRU(pc)) {
-		del_page_from_lru_list(zone, page, lru);
-		add_page_to_lru_list(zone, page, lru);
-	}
-	spin_unlock_irqrestore(&zone->lru_lock, flags);
-}
-
-/*
  * Checks whether given mem is same or in the root_mem_cgroup's
  * hierarchy subtree
  */
@@ -2695,14 +2615,27 @@ __mem_cgroup_commit_charge_lrucare(struct page *page, struct mem_cgroup *memcg,
 					enum charge_type ctype)
 {
 	struct page_cgroup *pc = lookup_page_cgroup(page);
+	struct zone *zone = page_zone(page);
+	unsigned long flags;
+	bool removed = false;
+
 	/*
 	 * In some case, SwapCache, FUSE(splice_buf->radixtree), the page
 	 * is already on LRU. It means the page may on some other page_cgroup's
 	 * LRU. Take care of it.
 	 */
-	mem_cgroup_lru_del_before_commit(page);
+	spin_lock_irqsave(&zone->lru_lock, flags);
+	if (PageLRU(page)) {
+		del_page_from_lru_list(zone, page, page_lru(page));
+		ClearPageLRU(page);
+		removed = true;
+	}
 	__mem_cgroup_commit_charge(memcg, page, 1, pc, ctype);
-	mem_cgroup_lru_add_after_commit(page);
+	if (removed) {
+		add_page_to_lru_list(zone, page, page_lru(page));
+		SetPageLRU(page);
+	}
+	spin_unlock_irqrestore(&zone->lru_lock, flags);
 	return;
 }
 
@@ -3303,9 +3236,7 @@ void mem_cgroup_replace_page_cache(struct page *oldpage,
 {
 	struct mem_cgroup *memcg;
 	struct page_cgroup *pc;
-	struct zone *zone;
 	enum charge_type type = MEM_CGROUP_CHARGE_TYPE_CACHE;
-	unsigned long flags;
 
 	pc = lookup_page_cgroup(oldpage);
 	/* fix accounting on old pages */
@@ -3318,20 +3249,12 @@ void mem_cgroup_replace_page_cache(struct page *oldpage,
 	if (PageSwapBacked(oldpage))
 		type = MEM_CGROUP_CHARGE_TYPE_SHMEM;
 
-	zone = page_zone(newpage);
-	pc = lookup_page_cgroup(newpage);
 	/*
 	 * Even if newpage->mapping was NULL before starting replacement,
 	 * the newpage may be on LRU(or pagevec for LRU) already. We lock
 	 * LRU while we overwrite pc->mem_cgroup.
 	 */
-	spin_lock_irqsave(&zone->lru_lock, flags);
-	if (PageLRU(newpage))
-		del_page_from_lru_list(zone, newpage, page_lru(newpage));
-	__mem_cgroup_commit_charge(memcg, newpage, 1, pc, type);
-	if (PageLRU(newpage))
-		add_page_to_lru_list(zone, newpage, page_lru(newpage));
-	spin_unlock_irqrestore(&zone->lru_lock, flags);
+	__mem_cgroup_commit_charge_lrucare(newpage, memcg, type);
 }
 
 #ifdef CONFIG_DEBUG_VM
-- 
1.7.4.1




* [PATCH 3/4] memcg: clear pc->mem_cgroup if necessary.
  2011-12-14  7:47 [PATCH 0/4] memcg: simplify LRU handling KAMEZAWA Hiroyuki
  2011-12-14  7:49 ` [PATCH 1/4] memcg: simplify page cache charging KAMEZAWA Hiroyuki
  2011-12-14  7:50 ` [PATCH 2/4] memcg: simplify corner case handling of LRU KAMEZAWA Hiroyuki
@ 2011-12-14  7:51 ` KAMEZAWA Hiroyuki
  2011-12-16 22:30   ` Andrew Morton
                     ` (2 more replies)
  2011-12-14  7:52 ` [PATCH 4/4] memcg: simplify LRU handling by new rule KAMEZAWA Hiroyuki
  2011-12-19 15:56 ` [PATCH 0/4] memcg: simplify LRU handling Johannes Weiner
  4 siblings, 3 replies; 22+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-12-14  7:51 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, hannes, Michal Hocko, akpm, Hugh Dickins,
	Ying Han

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

This is a preparation for removing the PCG_ACCT_LRU flag in page_cgroup
and reducing atomic ops/complexity in memcg LRU handling.

In some cases, pages are added to the LRU before being charged to a
memcg, and such pages are not classified to a memory cgroup at LRU
addition. Now, the lru where the page should be added is determined by
a bit in page_cgroup->flags and by pc->mem_cgroup. I'd like to remove
the check of the flag.

pc->mem_cgroup may contain a stale pointer in the case where pages are
added to the LRU before classification. To handle this, this patch
resets pc->mem_cgroup to root_mem_cgroup before LRU additions.
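
For reference, the flag check to be removed is the one on the LRU-add
path; roughly, as it appears in the hunk that patch 4/4 deletes from
mem_cgroup_lru_add_list():

        if (PageCgroupUsed(pc)) {
                /* Ensure pc->mem_cgroup is visible after reading PCG_USED. */
                smp_rmb();
                memcg = pc->mem_cgroup;
                SetPageCgroupAcctLRU(pc);
        } else
                memcg = root_mem_cgroup;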

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/memcontrol.h |    5 +++++
 mm/ksm.c                   |    1 +
 mm/memcontrol.c            |   14 ++++++++++++++
 mm/swap_state.c            |    1 +
 4 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index bd3b102..7428409 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -126,6 +126,7 @@ extern void mem_cgroup_print_oom_info(struct mem_cgroup *memcg,
 extern void mem_cgroup_replace_page_cache(struct page *oldpage,
 					struct page *newpage);
 
+extern void mem_cgroup_reset_owner(struct page *page);
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
 extern int do_swap_account;
 #endif
@@ -388,6 +389,10 @@ static inline void mem_cgroup_replace_page_cache(struct page *oldpage,
 				struct page *newpage)
 {
 }
+
+static inline void mem_cgroup_reset_owner(struct page *page);
+{
+}
 #endif /* CONFIG_CGROUP_MEM_CONT */
 
 #if !defined(CONFIG_CGROUP_MEM_RES_CTLR) || !defined(CONFIG_DEBUG_VM)
diff --git a/mm/ksm.c b/mm/ksm.c
index a6d3fb7..480983d 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1571,6 +1571,7 @@ struct page *ksm_does_need_to_copy(struct page *page,
 
 	new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address);
 	if (new_page) {
+		mem_cgroup_reset_owner(new_page);
 		copy_user_highpage(new_page, page, address, vma);
 
 		SetPageDirty(new_page);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7a857e8..2ae973d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3024,6 +3024,20 @@ void mem_cgroup_uncharge_swap(swp_entry_t ent)
 	rcu_read_unlock();
 }
 
+/*
+ * A function for resetting pc->mem_cgroup for newly allocated pages.
+ * This function should be called if the newpage will be added to LRU
+ * before accounting starts.
+ */
+void mem_cgroup_reset_owner(struct page *newpage)
+{
+	struct page_cgroup *pc;
+
+	pc = lookup_page_cgroup(newpage);
+	VM_BUG_ON(PageCgroupUsed(pc));
+	pc->mem_cgroup = root_mem_cgroup;
+}
+
 /**
  * mem_cgroup_move_swap_account - move swap charge and swap_cgroup's record.
  * @entry: swap entry to be moved
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 78cc4d1..747539e 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -301,6 +301,7 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 			new_page = alloc_page_vma(gfp_mask, vma, addr);
 			if (!new_page)
 				break;		/* Out of memory */
+			mem_cgroup_reset_owner(new_page);
 		}
 
 		/*
-- 
1.7.4.1




* [PATCH 4/4] memcg: simplify LRU handling by new rule
  2011-12-14  7:47 [PATCH 0/4] memcg: simplify LRU handling KAMEZAWA Hiroyuki
                   ` (2 preceding siblings ...)
  2011-12-14  7:51 ` [PATCH 3/4] memcg: clear pc->mem_cgroup if necessary KAMEZAWA Hiroyuki
@ 2011-12-14  7:52 ` KAMEZAWA Hiroyuki
  2011-12-19 15:48   ` Johannes Weiner
  2011-12-20 16:16   ` Michal Hocko
  2011-12-19 15:56 ` [PATCH 0/4] memcg: simplify LRU handling Johannes Weiner
  4 siblings, 2 replies; 22+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-12-14  7:52 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, hannes, Michal Hocko, akpm, Hugh Dickins,
	Ying Han

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Now, at LRU handling, the memory cgroup needs to do complicated work
to see a valid pc->mem_cgroup, which may be overwritten.

This patch relaxes the protocol. This patch guarantees
   - when pc->mem_cgroup is overwritten, the page must not be on the LRU.

By this, the LRU routines can trust pc->mem_cgroup and don't need to
check bits on pc->flags. This new rule may add a small overhead to
swapin, but in most cases, LRU handling gets faster.

After this patch, the PCG_ACCT_LRU bit is obsolete and removed.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/page_cgroup.h |    8 -----
 mm/memcontrol.c             |   72 ++++++++++--------------------------------
 2 files changed, 17 insertions(+), 63 deletions(-)

diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
index aaa60da..2cddacf 100644
--- a/include/linux/page_cgroup.h
+++ b/include/linux/page_cgroup.h
@@ -10,8 +10,6 @@ enum {
 	/* flags for mem_cgroup and file and I/O status */
 	PCG_MOVE_LOCK, /* For race between move_account v.s. following bits */
 	PCG_FILE_MAPPED, /* page is accounted as "mapped" */
-	/* No lock in page_cgroup */
-	PCG_ACCT_LRU, /* page has been accounted for (under lru_lock) */
 	__NR_PCG_FLAGS,
 };
 
@@ -75,12 +73,6 @@ TESTPCGFLAG(Used, USED)
 CLEARPCGFLAG(Used, USED)
 SETPCGFLAG(Used, USED)
 
-SETPCGFLAG(AcctLRU, ACCT_LRU)
-CLEARPCGFLAG(AcctLRU, ACCT_LRU)
-TESTPCGFLAG(AcctLRU, ACCT_LRU)
-TESTCLEARPCGFLAG(AcctLRU, ACCT_LRU)
-
-
 SETPCGFLAG(FileMapped, FILE_MAPPED)
 CLEARPCGFLAG(FileMapped, FILE_MAPPED)
 TESTPCGFLAG(FileMapped, FILE_MAPPED)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2ae973d..d5e21e7 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -974,30 +974,8 @@ struct lruvec *mem_cgroup_lru_add_list(struct zone *zone, struct page *page,
 		return &zone->lruvec;
 
 	pc = lookup_page_cgroup(page);
-	VM_BUG_ON(PageCgroupAcctLRU(pc));
-	/*
-	 * putback:				charge:
-	 * SetPageLRU				SetPageCgroupUsed
-	 * smp_mb				smp_mb
-	 * PageCgroupUsed && add to memcg LRU	PageLRU && add to memcg LRU
-	 *
-	 * Ensure that one of the two sides adds the page to the memcg
-	 * LRU during a race.
-	 */
-	smp_mb();
-	/*
-	 * If the page is uncharged, it may be freed soon, but it
-	 * could also be swap cache (readahead, swapoff) that needs to
-	 * be reclaimable in the future.  root_mem_cgroup will babysit
-	 * it for the time being.
-	 */
-	if (PageCgroupUsed(pc)) {
-		/* Ensure pc->mem_cgroup is visible after reading PCG_USED. */
-		smp_rmb();
-		memcg = pc->mem_cgroup;
-		SetPageCgroupAcctLRU(pc);
-	} else
-		memcg = root_mem_cgroup;
+	memcg = pc->mem_cgroup;
+	VM_BUG_ON(!memcg);
 	mz = page_cgroup_zoneinfo(memcg, page);
 	/* compound_order() is stabilized through lru_lock */
 	MEM_CGROUP_ZSTAT(mz, lru) += 1 << compound_order(page);
@@ -1024,18 +1002,8 @@ void mem_cgroup_lru_del_list(struct page *page, enum lru_list lru)
 		return;
 
 	pc = lookup_page_cgroup(page);
-	/*
-	 * root_mem_cgroup babysits uncharged LRU pages, but
-	 * PageCgroupUsed is cleared when the page is about to get
-	 * freed.  PageCgroupAcctLRU remembers whether the
-	 * LRU-accounting happened against pc->mem_cgroup or
-	 * root_mem_cgroup.
-	 */
-	if (TestClearPageCgroupAcctLRU(pc)) {
-		VM_BUG_ON(!pc->mem_cgroup);
-		memcg = pc->mem_cgroup;
-	} else
-		memcg = root_mem_cgroup;
+	memcg = pc->mem_cgroup;
+	VM_BUG_ON(!memcg);
 	mz = page_cgroup_zoneinfo(memcg, page);
 	/* huge page split is done under lru_lock. so, we have no races. */
 	MEM_CGROUP_ZSTAT(mz, lru) -= 1 << compound_order(page);
@@ -2377,6 +2345,7 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *memcg,
 
 	mem_cgroup_charge_statistics(memcg, PageCgroupCache(pc), nr_pages);
 	unlock_page_cgroup(pc);
+	WARN_ON_ONCE(PageLRU(page));
 	/*
 	 * "charge_statistics" updated event counter. Then, check it.
 	 * Insert ancestor (and ancestor's ancestors), to softlimit RB-tree.
@@ -2388,7 +2357,7 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *memcg,
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 
 #define PCGF_NOCOPY_AT_SPLIT ((1 << PCG_LOCK) | (1 << PCG_MOVE_LOCK) |\
-			(1 << PCG_ACCT_LRU) | (1 << PCG_MIGRATION))
+			(1 << PCG_MIGRATION))
 /*
  * Because tail pages are not marked as "used", set it. We're under
  * zone->lru_lock, 'splitting on pmd' and compound_lock.
@@ -2399,6 +2368,8 @@ void mem_cgroup_split_huge_fixup(struct page *head)
 {
 	struct page_cgroup *head_pc = lookup_page_cgroup(head);
 	struct page_cgroup *pc;
+	struct mem_cgroup_per_zone *mz;
+	enum lru_list lru;
 	int i;
 
 	if (mem_cgroup_disabled())
@@ -2407,23 +2378,15 @@ void mem_cgroup_split_huge_fixup(struct page *head)
 		pc = head_pc + i;
 		pc->mem_cgroup = head_pc->mem_cgroup;
 		smp_wmb();/* see __commit_charge() */
-		/*
-		 * LRU flags cannot be copied because we need to add tail
-		 * page to LRU by generic call and our hooks will be called.
-		 */
 		pc->flags = head_pc->flags & ~PCGF_NOCOPY_AT_SPLIT;
 	}
-
-	if (PageCgroupAcctLRU(head_pc)) {
-		enum lru_list lru;
-		struct mem_cgroup_per_zone *mz;
-		/*
-		 * We hold lru_lock, then, reduce counter directly.
-		 */
-		lru = page_lru(head);
-		mz = page_cgroup_zoneinfo(head_pc->mem_cgroup, head);
-		MEM_CGROUP_ZSTAT(mz, lru) -= HPAGE_PMD_NR - 1;
-	}
+	/* 
+	 * Tail pages will be added to LRU.
+	 * We hold lru_lock, then reduce the counter directly.
+	 */
+	lru = page_lru(head);
+	mz = page_cgroup_zoneinfo(head_pc->mem_cgroup, head);
+	MEM_CGROUP_ZSTAT(mz, lru) -= HPAGE_PMD_NR - 1;
 }
 #endif
 
@@ -2656,10 +2619,9 @@ int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
 	if (!page_is_file_cache(page))
 		type = MEM_CGROUP_CHARGE_TYPE_SHMEM;
 
-	if (!PageSwapCache(page)) {
+	if (!PageSwapCache(page))
 		ret = mem_cgroup_charge_common(page, mm, gfp_mask, type);
-		WARN_ON_ONCE(PageLRU(page));
-	} else { /* page is swapcache/shmem */
+	else { /* page is swapcache/shmem */
 		ret = mem_cgroup_try_charge_swapin(mm, page, gfp_mask, &memcg);
 		if (!ret)
 			__mem_cgroup_commit_charge_swapin(page, memcg, type);
-- 
1.7.4.1




* Re: [PATCH 1/4] memcg: simplify page cache charging.
  2011-12-14  7:49 ` [PATCH 1/4] memcg: simplify page cache charging KAMEZAWA Hiroyuki
@ 2011-12-16 22:28   ` Andrew Morton
  2011-12-19  0:01     ` KAMEZAWA Hiroyuki
  2011-12-19 15:04   ` Johannes Weiner
  2011-12-20 15:33   ` Michal Hocko
  2 siblings, 1 reply; 22+ messages in thread
From: Andrew Morton @ 2011-12-16 22:28 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, hannes, Michal Hocko, Hugh Dickins, Ying Han

On Wed, 14 Dec 2011 16:49:22 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> Because of commit ef6a3c6311, FUSE uses replace_page_cache() instead
> of add_to_page_cache(). As a result, mem_cgroup_cache_charge() is no
> longer called against FUSE's pages from splice.

Speaking of ef6a3c6311 ("mm: add replace_page_cache_page() function"),
may I pathetically remind people that it's rather inefficient?

http://lkml.indiana.edu/hypermail/linux/kernel/1109.1/00375.html


* Re: [PATCH 3/4] memcg: clear pc->mem_cgroup if necessary.
  2011-12-14  7:51 ` [PATCH 3/4] memcg: clear pc->mem_cgroup if necessary KAMEZAWA Hiroyuki
@ 2011-12-16 22:30   ` Andrew Morton
  2011-12-19 15:37   ` Johannes Weiner
  2011-12-20 15:56   ` Michal Hocko
  2 siblings, 0 replies; 22+ messages in thread
From: Andrew Morton @ 2011-12-16 22:30 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, hannes, Michal Hocko, Hugh Dickins, Ying Han

On Wed, 14 Dec 2011 16:51:24 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> @@ -388,6 +389,10 @@ static inline void mem_cgroup_replace_page_cache(struct page *oldpage,
>  				struct page *newpage)
>  {
>  }
> +
> +static inline void mem_cgroup_reset_owner(struct page *page);
> +{
> +}
>  #endif /* CONFIG_CGROUP_MEM_CONT */
>  

Please print out Documentation/SubmitChecklist, tape it to your desk!


* Re: [PATCH 1/4] memcg: simplify page cache charging.
  2011-12-16 22:28   ` Andrew Morton
@ 2011-12-19  0:01     ` KAMEZAWA Hiroyuki
  2011-12-20 21:58       ` Andrew Morton
  0 siblings, 1 reply; 22+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-12-19  0:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, hannes, Michal Hocko, Hugh Dickins, Ying Han

On Fri, 16 Dec 2011 14:28:14 -0800
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Wed, 14 Dec 2011 16:49:22 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> 
> > Because of commit ef6a3c6311, FUSE uses replace_page_cache() instead
> > of add_to_page_cache(). As a result, mem_cgroup_cache_charge() is no
> > longer called against FUSE's pages from splice.
> 
> Speaking of ef6a3c6311 ("mm: add replace_page_cache_page() function"),
> may I pathetically remind people that it's rather inefficient?
> 
> http://lkml.indiana.edu/hypermail/linux/kernel/1109.1/00375.html
> 

IIRC, people said it was inefficient because it used memcg's
page-migration code for fixing up accounting. Now, we have added
replace-page-cache handling for memcg in
memcg-add-mem_cgroup_replace_page_cache-to-fix-lru-issue.patch

So, I think the problem originally mentioned is fixed.

I'll reconsider LRU handling optimization after things look good.

Thanks,
-Kame



* Re: [PATCH 1/4] memcg: simplify page cache charging.
  2011-12-14  7:49 ` [PATCH 1/4] memcg: simplify page cache charging KAMEZAWA Hiroyuki
  2011-12-16 22:28   ` Andrew Morton
@ 2011-12-19 15:04   ` Johannes Weiner
  2011-12-20 15:33   ` Michal Hocko
  2 siblings, 0 replies; 22+ messages in thread
From: Johannes Weiner @ 2011-12-19 15:04 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, Michal Hocko, akpm, Hugh Dickins, Ying Han

On Wed, Dec 14, 2011 at 04:49:22PM +0900, KAMEZAWA Hiroyuki wrote:
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> This patch is a cleanup. No functional/logical changes.
> 
> Because of commit ef6a3c6311, FUSE uses replace_page_cache() instead
> of add_to_page_cache(). As a result, mem_cgroup_cache_charge() is no
> longer called against FUSE's pages from splice.
> 
> So, now, mem_cgroup_cache_charge() doesn't receive a page on the LRU
> unless it's SwapCache. For checking, WARN_ON_ONCE(PageLRU(page)) is added.
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

I like this.

Acked-by: Johannes Weiner <hannes@cmpxchg.org>


* Re: [PATCH 2/4] memcg: simplify corner case handling of LRU.
  2011-12-14  7:50 ` [PATCH 2/4] memcg: simplify corner case handling of LRU KAMEZAWA Hiroyuki
@ 2011-12-19 15:14   ` Johannes Weiner
  2011-12-20 15:05   ` Michal Hocko
  1 sibling, 0 replies; 22+ messages in thread
From: Johannes Weiner @ 2011-12-19 15:14 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, Michal Hocko, akpm, Hugh Dickins, Ying Han

On Wed, Dec 14, 2011 at 04:50:32PM +0900, KAMEZAWA Hiroyuki wrote:
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> This patch simplifies LRU handling of the racy case (memcg+SwapCache).
> At charging, SwapCache tends to be on the LRU already. So, before
> overwriting pc->mem_cgroup, the page must be removed from the LRU and
> added back to the LRU later.
> 
> This patch does
>         spin_lock(zone->lru_lock);
>         if (PageLRU(page))
>                 remove from LRU
>         overwrite pc->mem_cgroup
>         if (PageLRU(page))
>                 add to new LRU.
>         spin_unlock(zone->lru_lock);

Not quite.  It also clears PageLRU in between.

Since it doesn't release the lru lock until the page is back on the
list, couldn't we just leave that bit alone like
mem_cgroup_replace_page_cache() did?
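
Concretely, something like the following sketch, which assumes the
lru_lock is held across the commit (it mirrors the hunk this series
removes from mem_cgroup_replace_page_cache()):

        spin_lock_irqsave(&zone->lru_lock, flags);
        if (PageLRU(page))
                del_page_from_lru_list(zone, page, page_lru(page));
        __mem_cgroup_commit_charge(memcg, page, 1, pc, ctype);
        if (PageLRU(page))
                add_page_to_lru_list(zone, page, page_lru(page));
        spin_unlock_irqrestore(&zone->lru_lock, flags);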

That said, thanks for removing this mind-boggling complexity; the code
is much better off with this patch.

Feel free to add my

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

Relevant hunks for reference:

> @@ -2695,14 +2615,27 @@ __mem_cgroup_commit_charge_lrucare(struct page *page, struct mem_cgroup *memcg,
>  					enum charge_type ctype)
>  {
>  	struct page_cgroup *pc = lookup_page_cgroup(page);
> +	struct zone *zone = page_zone(page);
> +	unsigned long flags;
> +	bool removed = false;
> +
>  	/*
>  	 * In some case, SwapCache, FUSE(splice_buf->radixtree), the page
>  	 * is already on LRU. It means the page may on some other page_cgroup's
>  	 * LRU. Take care of it.
>  	 */
> -	mem_cgroup_lru_del_before_commit(page);
> +	spin_lock_irqsave(&zone->lru_lock, flags);
> +	if (PageLRU(page)) {
> +		del_page_from_lru_list(zone, page, page_lru(page));
> +		ClearPageLRU(page);
> +		removed = true;
> +	}
>  	__mem_cgroup_commit_charge(memcg, page, 1, pc, ctype);
> -	mem_cgroup_lru_add_after_commit(page);
> +	if (removed) {
> +		add_page_to_lru_list(zone, page, page_lru(page));
> +		SetPageLRU(page);
> +	}
> +	spin_unlock_irqrestore(&zone->lru_lock, flags);
>  	return;
>  }
>  
> @@ -3303,9 +3236,7 @@ void mem_cgroup_replace_page_cache(struct page *oldpage,
>  {
>  	struct mem_cgroup *memcg;
>  	struct page_cgroup *pc;
> -	struct zone *zone;
>  	enum charge_type type = MEM_CGROUP_CHARGE_TYPE_CACHE;
> -	unsigned long flags;
>  
>  	pc = lookup_page_cgroup(oldpage);
>  	/* fix accounting on old pages */
> @@ -3318,20 +3249,12 @@ void mem_cgroup_replace_page_cache(struct page *oldpage,
>  	if (PageSwapBacked(oldpage))
>  		type = MEM_CGROUP_CHARGE_TYPE_SHMEM;
>  
> -	zone = page_zone(newpage);
> -	pc = lookup_page_cgroup(newpage);
>  	/*
>  	 * Even if newpage->mapping was NULL before starting replacement,
>  	 * the newpage may be on LRU(or pagevec for LRU) already. We lock
>  	 * LRU while we overwrite pc->mem_cgroup.
>  	 */
> -	spin_lock_irqsave(&zone->lru_lock, flags);
> -	if (PageLRU(newpage))
> -		del_page_from_lru_list(zone, newpage, page_lru(newpage));
> -	__mem_cgroup_commit_charge(memcg, newpage, 1, pc, type);
> -	if (PageLRU(newpage))
> -		add_page_to_lru_list(zone, newpage, page_lru(newpage));
> -	spin_unlock_irqrestore(&zone->lru_lock, flags);
> +	__mem_cgroup_commit_charge_lrucare(newpage, memcg, type);
>  }
>  
>  #ifdef CONFIG_DEBUG_VM


* Re: [PATCH 3/4] memcg: clear pc->mem_cgroup if necessary.
  2011-12-14  7:51 ` [PATCH 3/4] memcg: clear pc->mem_cgroup if necessary KAMEZAWA Hiroyuki
  2011-12-16 22:30   ` Andrew Morton
@ 2011-12-19 15:37   ` Johannes Weiner
  2011-12-20  0:35     ` Hiroyuki Kamezawa
  2011-12-20 15:56   ` Michal Hocko
  2 siblings, 1 reply; 22+ messages in thread
From: Johannes Weiner @ 2011-12-19 15:37 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, Michal Hocko, akpm, Hugh Dickins, Ying Han

On Wed, Dec 14, 2011 at 04:51:24PM +0900, KAMEZAWA Hiroyuki wrote:
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> This is a preparation for removing the PCG_ACCT_LRU flag in page_cgroup
> and reducing atomic ops/complexity in memcg LRU handling.
> 
> In some cases, pages are added to the LRU before being charged to a
> memcg, and such pages are not classified to a memory cgroup at LRU
> addition. Now, the lru where the page should be added is determined by
> a bit in page_cgroup->flags and by pc->mem_cgroup. I'd like to remove
> the check of the flag.
> 
> pc->mem_cgroup may contain a stale pointer in the case where pages are
> added to the LRU before classification. To handle this, this patch
> resets pc->mem_cgroup to root_mem_cgroup before LRU additions.
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

The followup compilation fixes aside, I agree.  But the sites where
the owner is actually reset are really not too obvious.  How about the
comment patch below?

Otherwise,

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

---
From: Johannes Weiner <hannes@cmpxchg.org>
Subject: memcg: clear pc->mem_cgroup if necessary fix

Add comments to the clearing sites.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---

diff --git a/mm/ksm.c b/mm/ksm.c
index 5c2f0bd..f0ee5bf 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1571,6 +1571,15 @@ struct page *ksm_does_need_to_copy(struct page *page,
 
 	new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address);
 	if (new_page) {
+		/*
+		 * The memcg-specific accounting when moving
+		 * pages around the LRU lists relies on the
+		 * page's owner (memcg) to be valid.  Usually,
+		 * pages are assigned to a new owner before
+		 * being put on the LRU list, but since this
+		 * is not the case here, the stale owner from
+		 * a previous allocation cycle must be reset.
+		 */
 		mem_cgroup_reset_owner(new_page);
 		copy_user_highpage(new_page, page, address, vma);
 
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 730c4c7..44ccfd2 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -302,6 +302,15 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 			new_page = alloc_page_vma(gfp_mask, vma, addr);
 			if (!new_page)
 				break;		/* Out of memory */
+			/*
+			 * The memcg-specific accounting when moving
+			 * pages around the LRU lists relies on the
+			 * page's owner (memcg) to be valid.  Usually,
+			 * pages are assigned to a new owner before
+			 * being put on the LRU list, but since this
+			 * is not the case here, the stale owner from
+			 * a previous allocation cycle must be reset.
+			 */
 			mem_cgroup_reset_owner(new_page);
 		}
 


* Re: [PATCH 4/4] memcg: simplify LRU handling by new rule
  2011-12-14  7:52 ` [PATCH 4/4] memcg: simplify LRU handling by new rule KAMEZAWA Hiroyuki
@ 2011-12-19 15:48   ` Johannes Weiner
  2011-12-20 16:16   ` Michal Hocko
  1 sibling, 0 replies; 22+ messages in thread
From: Johannes Weiner @ 2011-12-19 15:48 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, Michal Hocko, akpm, Hugh Dickins, Ying Han

On Wed, Dec 14, 2011 at 04:52:26PM +0900, KAMEZAWA Hiroyuki wrote:
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> Now, at LRU handling, the memory cgroup needs to do complicated work
> to see a valid pc->mem_cgroup, which may be overwritten.
> 
> This patch relaxes the protocol. This patch guarantees
>    - when pc->mem_cgroup is overwritten, the page must not be on the LRU.
> 
> By this, the LRU routines can trust pc->mem_cgroup and don't need to
> check bits on pc->flags. This new rule may add a small overhead to
> swapin, but in most cases, LRU handling gets faster.
> 
> After this patch, the PCG_ACCT_LRU bit is obsolete and removed.
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
> ---
>  include/linux/page_cgroup.h |    8 -----
>  mm/memcontrol.c             |   72 ++++++++++--------------------------------
>  2 files changed, 17 insertions(+), 63 deletions(-)

This, too, speaks for itself and the logic seems sound to me.

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

Minor style things:

> @@ -974,30 +974,8 @@ struct lruvec *mem_cgroup_lru_add_list(struct zone *zone, struct page *page,
>  		return &zone->lruvec;
>  
>  	pc = lookup_page_cgroup(page);
> -	VM_BUG_ON(PageCgroupAcctLRU(pc));
> -	/*
> -	 * putback:				charge:
> -	 * SetPageLRU				SetPageCgroupUsed
> -	 * smp_mb				smp_mb
> -	 * PageCgroupUsed && add to memcg LRU	PageLRU && add to memcg LRU
> -	 *
> -	 * Ensure that one of the two sides adds the page to the memcg
> -	 * LRU during a race.
> -	 */
> -	smp_mb();
> -	/*
> -	 * If the page is uncharged, it may be freed soon, but it
> -	 * could also be swap cache (readahead, swapoff) that needs to
> -	 * be reclaimable in the future.  root_mem_cgroup will babysit
> -	 * it for the time being.
> -	 */
> -	if (PageCgroupUsed(pc)) {
> -		/* Ensure pc->mem_cgroup is visible after reading PCG_USED. */
> -		smp_rmb();
> -		memcg = pc->mem_cgroup;
> -		SetPageCgroupAcctLRU(pc);
> -	} else
> -		memcg = root_mem_cgroup;
> +	memcg = pc->mem_cgroup;
> +	VM_BUG_ON(!memcg);
>  	mz = page_cgroup_zoneinfo(memcg, page);

I think the memcg local variable is not really needed anymore.

Also, please don't add bug-ons for simple NULL tests, they are
redundant when the dereference would blow up just as well.
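
Combining both points, an untested sketch of what the prologue could
shrink to:

        pc = lookup_page_cgroup(page);
        /* a NULL pc->mem_cgroup would oops right here anyway */
        mz = page_cgroup_zoneinfo(pc->mem_cgroup, page);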

> @@ -2399,6 +2368,8 @@ void mem_cgroup_split_huge_fixup(struct page *head)
>  {
>  	struct page_cgroup *head_pc = lookup_page_cgroup(head);
>  	struct page_cgroup *pc;
> +	struct mem_cgroup_per_zone *mz;
> +	enum lru_list lru;
>  	int i;

You broke the reverse Christmas tree sorting!
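
For reference, reverse Christmas tree ordering means sorting the locals
from longest declaration line to shortest; the hunk above, reordered
that way, would look like:

        struct page_cgroup *head_pc = lookup_page_cgroup(head);
        struct mem_cgroup_per_zone *mz;
        struct page_cgroup *pc;
        enum lru_list lru;
        int i;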


* Re: [PATCH 0/4] memcg: simplify LRU handling.
  2011-12-14  7:47 [PATCH 0/4] memcg: simplify LRU handling KAMEZAWA Hiroyuki
                   ` (3 preceding siblings ...)
  2011-12-14  7:52 ` [PATCH 4/4] memcg: simplify LRU handling by new rule KAMEZAWA Hiroyuki
@ 2011-12-19 15:56 ` Johannes Weiner
  4 siblings, 0 replies; 22+ messages in thread
From: Johannes Weiner @ 2011-12-19 15:56 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, Michal Hocko, akpm, Hugh Dickins, Ying Han

On Wed, Dec 14, 2011 at 04:47:34PM +0900, KAMEZAWA Hiroyuki wrote:
> 
> This series is on top of linux-next +
> memcg-add-mem_cgroup_replace_page_cache-to-fix-lru-issue.patch
> 
> The 1st purpose of this series is to reduce the overhead of
> mem_cgroup_add/del_lru. They use some atomic ops.

Which is noticeable.

With a simple sparse file cat, mem_cgroup_lru_add_list() went from

     1.12%      cat  [kernel.kallsyms]    [k] mem_cgroup_lru_add_list

to

     0.31%      cat  [kernel.kallsyms]    [k] mem_cgroup_lru_add_list

and real time went down, too:

5 runs		min	median	max	in seconds
vanilla:	7.762	7.782	7.816
patched:	7.622	7.631	7.660


* Re: [PATCH 3/4] memcg: clear pc->mem_cgroup if necessary.
  2011-12-19 15:37   ` Johannes Weiner
@ 2011-12-20  0:35     ` Hiroyuki Kamezawa
  0 siblings, 0 replies; 22+ messages in thread
From: Hiroyuki Kamezawa @ 2011-12-20  0:35 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: KAMEZAWA Hiroyuki, linux-mm, linux-kernel, Michal Hocko, akpm,
	Hugh Dickins, Ying Han

2011/12/20 Johannes Weiner <hannes@cmpxchg.org>:
> On Wed, Dec 14, 2011 at 04:51:24PM +0900, KAMEZAWA Hiroyuki wrote:
>> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>>
>> This is a preparation for removing the PCG_ACCT_LRU flag in page_cgroup
>> and reducing atomic ops/complexity in memcg LRU handling.
>>
>> In some cases, pages are added to the LRU before being charged to a
>> memcg, and such pages are not classified to a memory cgroup at LRU
>> addition. Now, the lru where the page should be added is determined by
>> a bit in page_cgroup->flags and by pc->mem_cgroup. I'd like to remove
>> the check of the flag.
>>
>> pc->mem_cgroup may contain a stale pointer in the case where pages are
>> added to the LRU before classification. To handle this, this patch
>> resets pc->mem_cgroup to root_mem_cgroup before LRU additions.
>>
>> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
> The followup compilation fixes aside, I agree.  But the sites where
> the owner is actually reset are really not too obvious.  How about the
> comment patch below?
>
> Otherwise,
>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
>
> ---
> From: Johannes Weiner <hannes@cmpxchg.org>
> Subject: memcg: clear pc->mem_cgorup if necessary fix
>
> Add comments to the clearing sites.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Ah, yes. This seems better. Thank you.
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


* Re: [PATCH 2/4] memcg: simplify corner case handling of LRU.
  2011-12-14  7:50 ` [PATCH 2/4] memcg: simplify corner case handling of LRU KAMEZAWA Hiroyuki
  2011-12-19 15:14   ` Johannes Weiner
@ 2011-12-20 15:05   ` Michal Hocko
  1 sibling, 0 replies; 22+ messages in thread
From: Michal Hocko @ 2011-12-20 15:05 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, hannes, akpm, Hugh Dickins, Ying Han

On Wed 14-12-11 16:50:32, KAMEZAWA Hiroyuki wrote:
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> This patch simplifies LRU handling of the racy case (memcg+SwapCache).
> At charging, SwapCache tends to be on the LRU already. So, before
> overwriting pc->mem_cgroup, the page must be removed from the LRU and
> added back to the LRU later.
> 
> This patch does
>         spin_lock(zone->lru_lock);
>         if (PageLRU(page))
>                 remove from LRU
>         overwrite pc->mem_cgroup
>         if (PageLRU(page))
>                 add to new LRU.
>         spin_unlock(zone->lru_lock);
> 
> This guarantees that no page is on the LRU while pc->mem_cgroup is
> being modified. This patch also unifies the LRU handling of
> replace_page_cache() and swapin.
> 
> Changelog:
>  - modify PageLRU flag correctly.
>  - handle replace_page_cache.
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  mm/memcontrol.c |  109 ++++++++-----------------------------------------------
>  1 files changed, 16 insertions(+), 93 deletions(-)

Wow, really nice. I always hated the {before,after}_commit thingies.
They were just too complex.

After the ClearPageLRU && SetPageLRU cleanup already mentioned by
Johannes, feel free to add my
Acked-by: Michal Hocko <mhocko@suse.cz>

> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 947c62c..7a857e8 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1071,86 +1071,6 @@ struct lruvec *mem_cgroup_lru_move_lists(struct zone *zone,
>  }
>  
>  /*
> - * At handling SwapCache and other FUSE stuff, pc->mem_cgroup may be changed
> - * while it's linked to lru because the page may be reused after it's fully
> - * uncharged. To handle that, unlink page_cgroup from LRU when charge it again.
> - * It's done under lock_page and expected that zone->lru_lock isnever held.
> - */
> -static void mem_cgroup_lru_del_before_commit(struct page *page)
> -{
> -	enum lru_list lru;
> -	unsigned long flags;
> -	struct zone *zone = page_zone(page);
> -	struct page_cgroup *pc = lookup_page_cgroup(page);
> -
> -	/*
> -	 * Doing this check without taking ->lru_lock seems wrong but this
> -	 * is safe. Because if page_cgroup's USED bit is unset, the page
> -	 * will not be added to any memcg's LRU. If page_cgroup's USED bit is
> -	 * set, the commit after this will fail, anyway.
> -	 * This all charge/uncharge is done under some mutual execustion.
> -	 * So, we don't need to taking care of changes in USED bit.
> -	 */
> -	if (likely(!PageLRU(page)))
> -		return;
> -
> -	spin_lock_irqsave(&zone->lru_lock, flags);
> -	lru = page_lru(page);
> -	/*
> -	 * The uncharged page could still be registered to the LRU of
> -	 * the stale pc->mem_cgroup.
> -	 *
> -	 * As pc->mem_cgroup is about to get overwritten, the old LRU
> -	 * accounting needs to be taken care of.  Let root_mem_cgroup
> -	 * babysit the page until the new memcg is responsible for it.
> -	 *
> -	 * The PCG_USED bit is guarded by lock_page() as the page is
> -	 * swapcache/pagecache.
> -	 */
> -	if (PageLRU(page) && PageCgroupAcctLRU(pc) && !PageCgroupUsed(pc)) {
> -		del_page_from_lru_list(zone, page, lru);
> -		add_page_to_lru_list(zone, page, lru);
> -	}
> -	spin_unlock_irqrestore(&zone->lru_lock, flags);
> -}
> -
> -static void mem_cgroup_lru_add_after_commit(struct page *page)
> -{
> -	enum lru_list lru;
> -	unsigned long flags;
> -	struct zone *zone = page_zone(page);
> -	struct page_cgroup *pc = lookup_page_cgroup(page);
> -	/*
> -	 * putback:				charge:
> -	 * SetPageLRU				SetPageCgroupUsed
> -	 * smp_mb				smp_mb
> -	 * PageCgroupUsed && add to memcg LRU	PageLRU && add to memcg LRU
> -	 *
> -	 * Ensure that one of the two sides adds the page to the memcg
> -	 * LRU during a race.
> -	 */
> -	smp_mb();
> -	/* taking care of that the page is added to LRU while we commit it */
> -	if (likely(!PageLRU(page)))
> -		return;
> -	spin_lock_irqsave(&zone->lru_lock, flags);
> -	lru = page_lru(page);
> -	/*
> -	 * If the page is not on the LRU, someone will soon put it
> -	 * there.  If it is, and also already accounted for on the
> -	 * memcg-side, it must be on the right lruvec as setting
> -	 * pc->mem_cgroup and PageCgroupUsed is properly ordered.
> -	 * Otherwise, root_mem_cgroup has been babysitting the page
> -	 * during the charge.  Move it to the new memcg now.
> -	 */
> -	if (PageLRU(page) && !PageCgroupAcctLRU(pc)) {
> -		del_page_from_lru_list(zone, page, lru);
> -		add_page_to_lru_list(zone, page, lru);
> -	}
> -	spin_unlock_irqrestore(&zone->lru_lock, flags);
> -}
> -
> -/*
>   * Checks whether given mem is same or in the root_mem_cgroup's
>   * hierarchy subtree
>   */
> @@ -2695,14 +2615,27 @@ __mem_cgroup_commit_charge_lrucare(struct page *page, struct mem_cgroup *memcg,
>  					enum charge_type ctype)
>  {
>  	struct page_cgroup *pc = lookup_page_cgroup(page);
> +	struct zone *zone = page_zone(page);
> +	unsigned long flags;
> +	bool removed = false;
> +
>  	/*
>  	 * In some case, SwapCache, FUSE(splice_buf->radixtree), the page
>  	 * is already on LRU. It means the page may on some other page_cgroup's
>  	 * LRU. Take care of it.
>  	 */
> -	mem_cgroup_lru_del_before_commit(page);
> +	spin_lock_irqsave(&zone->lru_lock, flags);
> +	if (PageLRU(page)) {
> +		del_page_from_lru_list(zone, page, page_lru(page));
> +		ClearPageLRU(page);
> +		removed = true;
> +	}
>  	__mem_cgroup_commit_charge(memcg, page, 1, pc, ctype);
> -	mem_cgroup_lru_add_after_commit(page);
> +	if (removed) {
> +		add_page_to_lru_list(zone, page, page_lru(page));
> +		SetPageLRU(page);
> +	}
> +	spin_unlock_irqrestore(&zone->lru_lock, flags);
>  	return;
>  }
>  
> @@ -3303,9 +3236,7 @@ void mem_cgroup_replace_page_cache(struct page *oldpage,


>  {
>  	struct mem_cgroup *memcg;
>  	struct page_cgroup *pc;
> -	struct zone *zone;
>  	enum charge_type type = MEM_CGROUP_CHARGE_TYPE_CACHE;
> -	unsigned long flags;
>  
>  	pc = lookup_page_cgroup(oldpage);
>  	/* fix accounting on old pages */
> @@ -3318,20 +3249,12 @@ void mem_cgroup_replace_page_cache(struct page *oldpage,
>  	if (PageSwapBacked(oldpage))
>  		type = MEM_CGROUP_CHARGE_TYPE_SHMEM;
>  
> -	zone = page_zone(newpage);
> -	pc = lookup_page_cgroup(newpage);
>  	/*
>  	 * Even if newpage->mapping was NULL before starting replacement,
>  	 * the newpage may be on LRU(or pagevec for LRU) already. We lock
>  	 * LRU while we overwrite pc->mem_cgroup.
>  	 */
> -	spin_lock_irqsave(&zone->lru_lock, flags);
> -	if (PageLRU(newpage))
> -		del_page_from_lru_list(zone, newpage, page_lru(newpage));
> -	__mem_cgroup_commit_charge(memcg, newpage, 1, pc, type);
> -	if (PageLRU(newpage))
> -		add_page_to_lru_list(zone, newpage, page_lru(newpage));
> -	spin_unlock_irqrestore(&zone->lru_lock, flags);
> +	__mem_cgroup_commit_charge_lrucare(newpage, memcg, type);
>  }
>  
>  #ifdef CONFIG_DEBUG_VM
> -- 
> 1.7.4.1
> 
> 

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic


* Re: [PATCH 1/4] memcg: simplify page cache charging.
  2011-12-14  7:49 ` [PATCH 1/4] memcg: simplify page cache charging KAMEZAWA Hiroyuki
  2011-12-16 22:28   ` Andrew Morton
  2011-12-19 15:04   ` Johannes Weiner
@ 2011-12-20 15:33   ` Michal Hocko
  2 siblings, 0 replies; 22+ messages in thread
From: Michal Hocko @ 2011-12-20 15:33 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, hannes, akpm, Hugh Dickins, Ying Han

On Wed 14-12-11 16:49:22, KAMEZAWA Hiroyuki wrote:
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> This patch is a cleanup. No functional/logical changes.
> 
> Because of commit ef6a3c6311, FUSE uses replace_page_cache() instead
> of add_to_page_cache(). As a result, mem_cgroup_cache_charge() is no
> longer called against FUSE's pages from splice.
> 
> So, now, mem_cgroup_cache_charge() doesn't receive a page on the LRU
> unless it's SwapCache.

too many negations make it hard to read. What about:

mem_cgroup_cache_charge() gets pages that are not on the LRU, with the
exception of PageSwapCache pages.

> For checking, WARN_ON_ONCE(PageLRU(page)) is added.
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Makes sense.
Acked-by: Michal Hocko <mhocko@suse.cz>

> ---
>  mm/memcontrol.c |   31 +++++++++----------------------
>  1 files changed, 9 insertions(+), 22 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index a9e92a6..947c62c 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2710,6 +2710,7 @@ int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
>  				gfp_t gfp_mask)
>  {
>  	struct mem_cgroup *memcg = NULL;
> +	enum charge_type type = MEM_CGROUP_CHARGE_TYPE_CACHE;
>  	int ret;
>  
>  	if (mem_cgroup_disabled())
> @@ -2719,31 +2720,17 @@ int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
>  
>  	if (unlikely(!mm))
>  		mm = &init_mm;
> +	if (!page_is_file_cache(page))
> +		type = MEM_CGROUP_CHARGE_TYPE_SHMEM;
>  
> -	if (page_is_file_cache(page)) {
> -		ret = __mem_cgroup_try_charge(mm, gfp_mask, 1, &memcg, true);
> -		if (ret || !memcg)
> -			return ret;
> -
> -		/*
> -		 * FUSE reuses pages without going through the final
> -		 * put that would remove them from the LRU list, make
> -		 * sure that they get relinked properly.
> -		 */
> -		__mem_cgroup_commit_charge_lrucare(page, memcg,
> -					MEM_CGROUP_CHARGE_TYPE_CACHE);
> -		return ret;
> -	}
> -	/* shmem */
> -	if (PageSwapCache(page)) {
> +	if (!PageSwapCache(page)) {
> +		ret = mem_cgroup_charge_common(page, mm, gfp_mask, type);
> +		WARN_ON_ONCE(PageLRU(page));
> +	} else { /* page is swapcache/shmem */
>  		ret = mem_cgroup_try_charge_swapin(mm, page, gfp_mask, &memcg);
>  		if (!ret)
> -			__mem_cgroup_commit_charge_swapin(page, memcg,
> -					MEM_CGROUP_CHARGE_TYPE_SHMEM);
> -	} else
> -		ret = mem_cgroup_charge_common(page, mm, gfp_mask,
> -					MEM_CGROUP_CHARGE_TYPE_SHMEM);
> -
> +			__mem_cgroup_commit_charge_swapin(page, memcg, type);
> +	}
>  	return ret;
>  }
>  
> -- 
> 1.7.4.1
> 
> 

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic


* Re: [PATCH 3/4] memcg: clear pc->mem_cgroup if necessary.
  2011-12-14  7:51 ` [PATCH 3/4] memcg: clear pc->mem_cgroup if necessary KAMEZAWA Hiroyuki
  2011-12-16 22:30   ` Andrew Morton
  2011-12-19 15:37   ` Johannes Weiner
@ 2011-12-20 15:56   ` Michal Hocko
  2 siblings, 0 replies; 22+ messages in thread
From: Michal Hocko @ 2011-12-20 15:56 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, hannes, akpm, Hugh Dickins, Ying Han

On Wed 14-12-11 16:51:24, KAMEZAWA Hiroyuki wrote:
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> This is a preparation for removing the PCG_ACCT_LRU flag in page_cgroup
> and reducing atomic ops/complexity in memcg LRU handling.
> 
> In some cases, pages are added to the LRU before being charged to a
> memcg, and such pages are not classified to a memory cgroup at LRU
> addition. Now, the lru where the page should be added is determined by
> a bit in page_cgroup->flags and by pc->mem_cgroup. I'd like to remove
> the check of the flag.
> 
> pc->mem_cgroup may contain a stale pointer in the case where pages are
> added to the LRU before classification. To handle this, this patch
> resets pc->mem_cgroup to root_mem_cgroup before LRU additions.
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

With Johannes' comments, plus ksm.c needing to include memcontrol.h, I guess.
Acked-by: Michal Hocko <mhocko@suse.cz>

> ---
>  include/linux/memcontrol.h |    5 +++++
>  mm/ksm.c                   |    1 +
>  mm/memcontrol.c            |   14 ++++++++++++++
>  mm/swap_state.c            |    1 +
>  4 files changed, 21 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index bd3b102..7428409 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -126,6 +126,7 @@ extern void mem_cgroup_print_oom_info(struct mem_cgroup *memcg,
>  extern void mem_cgroup_replace_page_cache(struct page *oldpage,
>  					struct page *newpage);
>  
> +extern void mem_cgroup_reset_owner(struct page *page);
>  #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
>  extern int do_swap_account;
>  #endif
> @@ -388,6 +389,10 @@ static inline void mem_cgroup_replace_page_cache(struct page *oldpage,
>  				struct page *newpage)
>  {
>  }
> +
> +static inline void mem_cgroup_reset_owner(struct page *page);
> +{
> +}
>  #endif /* CONFIG_CGROUP_MEM_CONT */
>  
>  #if !defined(CONFIG_CGROUP_MEM_RES_CTLR) || !defined(CONFIG_DEBUG_VM)
> diff --git a/mm/ksm.c b/mm/ksm.c
> index a6d3fb7..480983d 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -1571,6 +1571,7 @@ struct page *ksm_does_need_to_copy(struct page *page,
>  
>  	new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address);
>  	if (new_page) {
> +		mem_cgroup_reset_owner(new_page);
>  		copy_user_highpage(new_page, page, address, vma);
>  
>  		SetPageDirty(new_page);
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 7a857e8..2ae973d 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3024,6 +3024,20 @@ void mem_cgroup_uncharge_swap(swp_entry_t ent)
>  	rcu_read_unlock();
>  }
>  
> +/*
> + * A function for resetting pc->mem_cgroup for newly allocated pages.
> + * This function should be called if the newpage will be added to LRU
> + * before accounting starts.
> + */
> +void mem_cgroup_reset_owner(struct page *newpage)
> +{
> +	struct page_cgroup *pc;
> +
> +	pc = lookup_page_cgroup(newpage);
> +	VM_BUG_ON(PageCgroupUsed(pc));
> +	pc->mem_cgroup = root_mem_cgroup;
> +}
> +
>  /**
>   * mem_cgroup_move_swap_account - move swap charge and swap_cgroup's record.
>   * @entry: swap entry to be moved
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index 78cc4d1..747539e 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -301,6 +301,7 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
>  			new_page = alloc_page_vma(gfp_mask, vma, addr);
>  			if (!new_page)
>  				break;		/* Out of memory */
> +			mem_cgroup_reset_owner(new_page);
>  		}
>  
>  		/*
> -- 
> 1.7.4.1
> 
> 

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic


* Re: [PATCH 4/4] memcg: simplify LRU handling by new rule
  2011-12-14  7:52 ` [PATCH 4/4] memcg: simplify LRU handling by new rule KAMEZAWA Hiroyuki
  2011-12-19 15:48   ` Johannes Weiner
@ 2011-12-20 16:16   ` Michal Hocko
  2011-12-21  0:09     ` KAMEZAWA Hiroyuki
  1 sibling, 1 reply; 22+ messages in thread
From: Michal Hocko @ 2011-12-20 16:16 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, hannes, akpm, Hugh Dickins, Ying Han

On Wed 14-12-11 16:52:26, KAMEZAWA Hiroyuki wrote:
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> Now, at LRU handling, the memory cgroup needs to do complicated work
> to see a valid pc->mem_cgroup, which may be overwritten.
> 
> This patch relaxes the protocol. This patch guarantees
>    - when pc->mem_cgroup is overwritten, the page must not be on the LRU.

How does the patch guarantee that? I do not see any enforcement. In
fact, we depend on the previous patches, don't we?

> 
> By this, the LRU routines can trust pc->mem_cgroup and don't need to
> check bits on pc->flags. This new rule may add a small overhead to
> swapin, but in most cases, LRU handling gets faster.
> 
> After this patch, the PCG_ACCT_LRU bit is obsolete and removed.

It makes things much simpler. I just think it needs a better
description.

> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  include/linux/page_cgroup.h |    8 -----
>  mm/memcontrol.c             |   72 ++++++++++--------------------------------
>  2 files changed, 17 insertions(+), 63 deletions(-)
> 
> diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
> index aaa60da..2cddacf 100644
> --- a/include/linux/page_cgroup.h
> +++ b/include/linux/page_cgroup.h
> @@ -10,8 +10,6 @@ enum {
>  	/* flags for mem_cgroup and file and I/O status */
>  	PCG_MOVE_LOCK, /* For race between move_account v.s. following bits */
>  	PCG_FILE_MAPPED, /* page is accounted as "mapped" */
> -	/* No lock in page_cgroup */
> -	PCG_ACCT_LRU, /* page has been accounted for (under lru_lock) */
>  	__NR_PCG_FLAGS,
>  };
>  
> @@ -75,12 +73,6 @@ TESTPCGFLAG(Used, USED)
>  CLEARPCGFLAG(Used, USED)
>  SETPCGFLAG(Used, USED)
>  
> -SETPCGFLAG(AcctLRU, ACCT_LRU)
> -CLEARPCGFLAG(AcctLRU, ACCT_LRU)
> -TESTPCGFLAG(AcctLRU, ACCT_LRU)
> -TESTCLEARPCGFLAG(AcctLRU, ACCT_LRU)
> -
> -
>  SETPCGFLAG(FileMapped, FILE_MAPPED)
>  CLEARPCGFLAG(FileMapped, FILE_MAPPED)
>  TESTPCGFLAG(FileMapped, FILE_MAPPED)
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 2ae973d..d5e21e7 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -974,30 +974,8 @@ struct lruvec *mem_cgroup_lru_add_list(struct zone *zone, struct page *page,
>  		return &zone->lruvec;
>  
>  	pc = lookup_page_cgroup(page);
> -	VM_BUG_ON(PageCgroupAcctLRU(pc));
> -	/*
> -	 * putback:				charge:
> -	 * SetPageLRU				SetPageCgroupUsed
> -	 * smp_mb				smp_mb
> -	 * PageCgroupUsed && add to memcg LRU	PageLRU && add to memcg LRU
> -	 *
> -	 * Ensure that one of the two sides adds the page to the memcg
> -	 * LRU during a race.
> -	 */
> -	smp_mb();
> -	/*
> -	 * If the page is uncharged, it may be freed soon, but it
> -	 * could also be swap cache (readahead, swapoff) that needs to
> -	 * be reclaimable in the future.  root_mem_cgroup will babysit
> -	 * it for the time being.
> -	 */
> -	if (PageCgroupUsed(pc)) {
> -		/* Ensure pc->mem_cgroup is visible after reading PCG_USED. */
> -		smp_rmb();
> -		memcg = pc->mem_cgroup;
> -		SetPageCgroupAcctLRU(pc);
> -	} else
> -		memcg = root_mem_cgroup;
> +	memcg = pc->mem_cgroup;
> +	VM_BUG_ON(!memcg);
>  	mz = page_cgroup_zoneinfo(memcg, page);
>  	/* compound_order() is stabilized through lru_lock */
>  	MEM_CGROUP_ZSTAT(mz, lru) += 1 << compound_order(page);
> @@ -1024,18 +1002,8 @@ void mem_cgroup_lru_del_list(struct page *page, enum lru_list lru)
>  		return;
>  
>  	pc = lookup_page_cgroup(page);
> -	/*
> -	 * root_mem_cgroup babysits uncharged LRU pages, but
> -	 * PageCgroupUsed is cleared when the page is about to get
> -	 * freed.  PageCgroupAcctLRU remembers whether the
> -	 * LRU-accounting happened against pc->mem_cgroup or
> -	 * root_mem_cgroup.
> -	 */
> -	if (TestClearPageCgroupAcctLRU(pc)) {
> -		VM_BUG_ON(!pc->mem_cgroup);
> -		memcg = pc->mem_cgroup;
> -	} else
> -		memcg = root_mem_cgroup;
> +	memcg = pc->mem_cgroup;
> +	VM_BUG_ON(!memcg);
>  	mz = page_cgroup_zoneinfo(memcg, page);
>  	/* huge page split is done under lru_lock. so, we have no races. */
>  	MEM_CGROUP_ZSTAT(mz, lru) -= 1 << compound_order(page);
> @@ -2377,6 +2345,7 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *memcg,
>  
>  	mem_cgroup_charge_statistics(memcg, PageCgroupCache(pc), nr_pages);
>  	unlock_page_cgroup(pc);
> +	WARN_ON_ONCE(PageLRU(page));
>  	/*
>  	 * "charge_statistics" updated event counter. Then, check it.
>  	 * Insert ancestor (and ancestor's ancestors), to softlimit RB-tree.
> @@ -2388,7 +2357,7 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *memcg,
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  
>  #define PCGF_NOCOPY_AT_SPLIT ((1 << PCG_LOCK) | (1 << PCG_MOVE_LOCK) |\
> -			(1 << PCG_ACCT_LRU) | (1 << PCG_MIGRATION))
> +			(1 << PCG_MIGRATION))
>  /*
>   * Because tail pages are not marked as "used", set it. We're under
>   * zone->lru_lock, 'splitting on pmd' and compound_lock.
> @@ -2399,6 +2368,8 @@ void mem_cgroup_split_huge_fixup(struct page *head)
>  {
>  	struct page_cgroup *head_pc = lookup_page_cgroup(head);
>  	struct page_cgroup *pc;
> +	struct mem_cgroup_per_zone *mz;
> +	enum lru_list lru;
>  	int i;
>  
>  	if (mem_cgroup_disabled())
> @@ -2407,23 +2378,15 @@ void mem_cgroup_split_huge_fixup(struct page *head)
>  		pc = head_pc + i;
>  		pc->mem_cgroup = head_pc->mem_cgroup;
>  		smp_wmb();/* see __commit_charge() */
> -		/*
> -		 * LRU flags cannot be copied because we need to add tail
> -		 * page to LRU by generic call and our hooks will be called.
> -		 */
>  		pc->flags = head_pc->flags & ~PCGF_NOCOPY_AT_SPLIT;
>  	}
> -
> -	if (PageCgroupAcctLRU(head_pc)) {
> -		enum lru_list lru;
> -		struct mem_cgroup_per_zone *mz;
> -		/*
> -		 * We hold lru_lock, then, reduce counter directly.
> -		 */
> -		lru = page_lru(head);
> -		mz = page_cgroup_zoneinfo(head_pc->mem_cgroup, head);
> -		MEM_CGROUP_ZSTAT(mz, lru) -= HPAGE_PMD_NR - 1;
> -	}
> +	/*
> +	 * Tail pages will be added to the LRU.
> +	 * We hold lru_lock, so we can reduce the counter directly.
> +	 */
> +	lru = page_lru(head);
> +	mz = page_cgroup_zoneinfo(head_pc->mem_cgroup, head);
> +	MEM_CGROUP_ZSTAT(mz, lru) -= HPAGE_PMD_NR - 1;
>  }
>  #endif
>  
> @@ -2656,10 +2619,9 @@ int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
>  	if (!page_is_file_cache(page))
>  		type = MEM_CGROUP_CHARGE_TYPE_SHMEM;
>  
> -	if (!PageSwapCache(page)) {
> +	if (!PageSwapCache(page))
>  		ret = mem_cgroup_charge_common(page, mm, gfp_mask, type);
> -		WARN_ON_ONCE(PageLRU(page));
> -	} else { /* page is swapcache/shmem */
> +	else { /* page is swapcache/shmem */
>  		ret = mem_cgroup_try_charge_swapin(mm, page, gfp_mask, &memcg);
>  		if (!ret)
>  			__mem_cgroup_commit_charge_swapin(page, memcg, type);
> -- 
> 1.7.4.1
> 
> 

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic


* Re: [PATCH 1/4] memcg: simplify page cache charging.
  2011-12-19  0:01     ` KAMEZAWA Hiroyuki
@ 2011-12-20 21:58       ` Andrew Morton
  2011-12-21  0:01         ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 22+ messages in thread
From: Andrew Morton @ 2011-12-20 21:58 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, hannes, Michal Hocko, Hugh Dickins, Ying Han

On Mon, 19 Dec 2011 09:01:22 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> On Fri, 16 Dec 2011 14:28:14 -0800
> Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > On Wed, 14 Dec 2011 16:49:22 +0900
> > KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > 
> > > Because of commit ef6a3c6311, FUSE uses replace_page_cache() instead
> > > of add_to_page_cache(). Then, mem_cgroup_cache_charge() is not
> > > called against FUSE's pages from splice.
> > 
> > Speaking of ef6a3c6311 ("mm: add replace_page_cache_page() function"),
> > may I pathetically remind people that it's rather inefficient?
> > 
> > http://lkml.indiana.edu/hypermail/linux/kernel/1109.1/00375.html
> > 
> 
> IIRC, people said it was inefficient because it used the memcg
> page-migration code to fix up accounting. Now, we have added
> replace-page-cache handling for memcg in
> memcg-add-mem_cgroup_replace_page_cache-to-fix-lru-issue.patch
> 
> So, I think the problem originally mentioned is fixed.
> 

No, the inefficiency in replace_page_cache_page() is still there.  Two
identical walks down the radix tree, a pointless decrement then
increment of mapping->nrpages, two writes to page->mapping, an often
pointless decrement then increment of NR_FILE_PAGES, and probably other things.
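
IOW, the shape is roughly this (hand-waving, untested sketch against the
old radix-tree API; the function name is invented, and the real
mm/filemap.c code differs in details such as refcounting and the
->freepage callback):

==
/* Sketch of the delete+insert pattern criticized above -- not the real code. */
static void replace_page_cache_sketch(struct address_space *mapping,
				      struct page *old, struct page *new,
				      pgoff_t offset)
{
	spin_lock_irq(&mapping->tree_lock);

	/* walk #1 down the radix tree */
	radix_tree_delete(&mapping->page_tree, offset);
	old->mapping = NULL;		/* write #1 to a page->mapping */
	mapping->nrpages--;		/* decrement ... */
	__dec_zone_page_state(old, NR_FILE_PAGES);

	/* walk #2 down the very same path */
	radix_tree_insert(&mapping->page_tree, offset, new);
	new->mapping = mapping;		/* write #2 */
	new->index = offset;
	mapping->nrpages++;		/* ... then increment again */
	__inc_zone_page_state(new, NR_FILE_PAGES);

	spin_unlock_irq(&mapping->tree_lock);
}
==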




* Re: [PATCH 1/4] memcg: simplify page cache charging.
  2011-12-20 21:58       ` Andrew Morton
@ 2011-12-21  0:01         ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 22+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-12-21  0:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, hannes, Michal Hocko, Hugh Dickins, Ying Han

On Tue, 20 Dec 2011 13:58:17 -0800
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Mon, 19 Dec 2011 09:01:22 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> 
> > On Fri, 16 Dec 2011 14:28:14 -0800
> > Andrew Morton <akpm@linux-foundation.org> wrote:
> > 
> > > On Wed, 14 Dec 2011 16:49:22 +0900
> > > KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > > 
> > > > Because of commit ef6a3c6311, FUSE uses replace_page_cache() instead
> > > > of add_to_page_cache(). Then, mem_cgroup_cache_charge() is not
> > > > called against FUSE's pages from splice.
> > > 
> > > Speaking of ef6a3c6311 ("mm: add replace_page_cache_page() function"),
> > > may I pathetically remind people that it's rather inefficient?
> > > 
> > > http://lkml.indiana.edu/hypermail/linux/kernel/1109.1/00375.html
> > > 
> > 
> > IIRC, people said it was inefficient because it used the memcg
> > page-migration code to fix up accounting. Now, we have added
> > replace-page-cache handling for memcg in
> > memcg-add-mem_cgroup_replace_page_cache-to-fix-lru-issue.patch
> > 
> > So, I think the problem originally mentioned is fixed.
> > 
> 
> No, the inefficiency in replace_page_cache_page() is still there.  Two
> identical walks down the radix tree, a pointless decrement then
> increment of mapping->nrpages, two writes to page->mapping, an often
> pointless decrement then increment of NR_FILE_PAGES, and probably other things.
> 

Hmm, then replace_page_cache_page() itself has some problems.
I'll look into that.
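
Just to record the direction -- an untested sketch, not a patch: do the
replacement in a single tree walk and leave nrpages/NR_FILE_PAGES alone,
since the number of occupied slots does not change. The helper name is
invented; refcounting and the memcg fixup are omitted here.

==
static void replace_in_one_walk(struct address_space *mapping,
				struct page *old, struct page *new,
				pgoff_t offset)
{
	void **slot;

	spin_lock_irq(&mapping->tree_lock);
	slot = radix_tree_lookup_slot(&mapping->page_tree, offset);
	BUG_ON(!slot);
	/* single walk; no counter churn, the slot count is unchanged */
	radix_tree_replace_slot(slot, new);
	new->mapping = mapping;
	new->index = offset;
	old->mapping = NULL;
	spin_unlock_irq(&mapping->tree_lock);
}
==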

Thanks,
-Kame






* Re: [PATCH 4/4] memcg: simplify LRU handling by new rule
  2011-12-20 16:16   ` Michal Hocko
@ 2011-12-21  0:09     ` KAMEZAWA Hiroyuki
  2011-12-21  6:56       ` Michal Hocko
  0 siblings, 1 reply; 22+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-12-21  0:09 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-mm, linux-kernel, hannes, akpm, Hugh Dickins, Ying Han

On Tue, 20 Dec 2011 17:16:15 +0100
Michal Hocko <mhocko@suse.cz> wrote:

> On Wed 14-12-11 16:52:26, KAMEZAWA Hiroyuki wrote:
> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > 
> > Currently, for LRU handling, the memory cgroup needs to do complicated
> > work to see a valid pc->mem_cgroup, which may be overwritten.
> > 
> > This patch relaxes the protocol. It guarantees that
> >    - when pc->mem_cgroup is overwritten, the page must not be on the LRU.
> 
> How does the patch guarantee that? I do not see any enforcement. In fact,
> we depend on the previous patches, don't we?
> 

Ah, yes. We depend on the previous patches in the series.


> > 
> > With this, the LRU routine can trust pc->mem_cgroup and does not need
> > to check bits on pc->flags. This new rule may add a small overhead to
> > swapin, but in most cases, LRU handling gets faster.
> > 
> > After this patch, the PCG_ACCT_LRU bit is obsolete and removed.
> 
> It makes things much simpler. I just think it needs a better
> description.
> 

O.K.

99% of memcg charging is done by the following call path.

   - alloc_page() -> charge() -> map/enter radix-tree -> add to LRU.

We need to take care of some special cases.

   - SwapCache - newly allocated/fully unmapped pages are added to LRU
                 before charge.
     => handled by previous patch.
   - FUSE      - unused pages are reused.
     => handled by previous patch.

   - move_account
     => we do isolate_page().

Now, we can guarantee that pc->mem_cgroup is only (re)set while the page
is not on the LRU, or under zone->lru_lock after isolating the page from
the LRU.
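
The invariant, as an untested sketch (not part of the series; the helper
name is invented):

==
/*
 * The new rule: pc->mem_cgroup may only be (re)written while the page
 * is off the LRU, so mem_cgroup_lru_add_list()/mem_cgroup_lru_del_list()
 * can read it without checking bits on pc->flags.
 */
static void memcg_set_owner_sketch(struct page *page,
				   struct page_cgroup *pc,
				   struct mem_cgroup *memcg)
{
	VM_BUG_ON(PageLRU(page));	/* writer side: must be off the LRU */
	pc->mem_cgroup = memcg;
}
==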

I'll add some Documentation to... memcg_debug.txt

Thanks,
-Kame


* Re: [PATCH 4/4] memcg: simplify LRU handling by new rule
  2011-12-21  0:09     ` KAMEZAWA Hiroyuki
@ 2011-12-21  6:56       ` Michal Hocko
  0 siblings, 0 replies; 22+ messages in thread
From: Michal Hocko @ 2011-12-21  6:56 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, hannes, akpm, Hugh Dickins, Ying Han

On Wed 21-12-11 09:09:41, KAMEZAWA Hiroyuki wrote:
> On Tue, 20 Dec 2011 17:16:15 +0100
> Michal Hocko <mhocko@suse.cz> wrote:
> 
> > On Wed 14-12-11 16:52:26, KAMEZAWA Hiroyuki wrote:
> > > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > > 
> > > Currently, for LRU handling, the memory cgroup needs to do complicated
> > > work to see a valid pc->mem_cgroup, which may be overwritten.
> > > 
> > > This patch relaxes the protocol. It guarantees that
> > >    - when pc->mem_cgroup is overwritten, the page must not be on the LRU.
> > 
> > How does the patch guarantee that? I do not see any enforcement. In fact,
> > we depend on the previous patches, don't we?
> > 
> 
> Ah, yes. We depend on the previous patches in the series.
> 
> 
> > > 
> > > With this, the LRU routine can trust pc->mem_cgroup and does not need
> > > to check bits on pc->flags. This new rule may add a small overhead to
> > > swapin, but in most cases, LRU handling gets faster.
> > > 
> > > After this patch, the PCG_ACCT_LRU bit is obsolete and removed.
> > 
> > It makes things much simpler. I just think it needs a better
> > description.
> > 
> 
> O.K.
> 
> 99% of memcg charging is done by the following call path.
> 
>    - alloc_page() -> charge() -> map/enter radix-tree -> add to LRU.
> 
> We need to take care of some special cases.
> 
>    - SwapCache - newly allocated/fully unmapped pages are added to LRU
>                  before charge.
>      => handled by previous patch.
>    - FUSE      - unused pages are reused.
>      => handled by previous patch.
> 
>    - move_account
>      => we do isolate_page().
> 
> Now, we can guarantee that pc->mem_cgroup is only (re)set while the page
> is not on the LRU, or under zone->lru_lock after isolating the page from
> the LRU.
> 
> I'll add some Documentation to... memcg_debug.txt

Yes, much better.
Btw. I forgot to add
Acked-by: Michal Hocko <mhocko@suse.cz>

Thanks
-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic


Thread overview: 22+ messages
2011-12-14  7:47 [PATCH 0/4] memcg: simplify LRU handling KAMEZAWA Hiroyuki
2011-12-14  7:49 ` [PATCH 1/4] memcg: simplify page cache charging KAMEZAWA Hiroyuki
2011-12-16 22:28   ` Andrew Morton
2011-12-19  0:01     ` KAMEZAWA Hiroyuki
2011-12-20 21:58       ` Andrew Morton
2011-12-21  0:01         ` KAMEZAWA Hiroyuki
2011-12-19 15:04   ` Johannes Weiner
2011-12-20 15:33   ` Michal Hocko
2011-12-14  7:50 ` [PATCH 2/4] memcg: simplify corner case handling of LRU KAMEZAWA Hiroyuki
2011-12-19 15:14   ` Johannes Weiner
2011-12-20 15:05   ` Michal Hocko
2011-12-14  7:51 ` [PATCH 3/4] memcg: clear pc->mem_cgorup if necessary KAMEZAWA Hiroyuki
2011-12-16 22:30   ` Andrew Morton
2011-12-19 15:37   ` Johannes Weiner
2011-12-20  0:35     ` Hiroyuki Kamezawa
2011-12-20 15:56   ` Michal Hocko
2011-12-14  7:52 ` [PATCH 4/4] memcg: simplify LRU handling by new rule KAMEZAWA Hiroyuki
2011-12-19 15:48   ` Johannes Weiner
2011-12-20 16:16   ` Michal Hocko
2011-12-21  0:09     ` KAMEZAWA Hiroyuki
2011-12-21  6:56       ` Michal Hocko
2011-12-19 15:56 ` [PATCH 0/4] memcg: simplify LRU handling Johannes Weiner
