* [PATCH V5 1/5] mm: memcg softlimit reclaim rework
@ 2012-06-18 16:47 Ying Han
  2012-06-18 16:47 ` [PATCH V2 2/5] mm: memcg set soft_limit_in_bytes to 0 by default Ying Han
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Ying Han @ 2012-06-18 16:47 UTC (permalink / raw)
  To: Michal Hocko, Johannes Weiner, Mel Gorman, KAMEZAWA Hiroyuki,
	Rik van Riel, Hillf Danton, Hugh Dickins, Dan Magenheimer,
	Andrew Morton
  Cc: linux-mm

This patch reverts the existing softlimit reclaim implementation and
instead integrates softlimit reclaim into the existing global reclaim logic.

The new softlimit reclaim includes the following changes:

1. add function should_reclaim_mem_cgroup()

Add the filter function should_reclaim_mem_cgroup() under the common function
shrink_zone(). The latter is called from both per-memcg reclaim and global
reclaim.

Today the softlimit takes effect only under global memory pressure. The memcgs
get a free run above their softlimit until there is global memory contention.
This patch doesn't change the semantics.

Under global reclaim, we try to skip reclaiming from a memcg under its
softlimit. To prevent reclaim from trying too hard on memcgs (above their
softlimit) with only hard-to-reclaim pages, the reclaim priority is used to
bypass the softlimit check. This is a trade-off between system performance and
resource isolation.
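
For reference, the resulting per-memcg check in shrink_zone() condenses to
the sketch below (names taken from the vmscan.c hunk in this patch; shown
here only as an illustrative excerpt, not a standalone function):

        /*
         * Reclaim from this memcg if this is targeted (non-global) reclaim,
         * if we are already trying hard (priority below DEF_PRIORITY - 2),
         * or if the memcg (or an ancestor) exceeds its softlimit.
         */
        if (!global_reclaim(sc) || sc->priority < DEF_PRIORITY - 2 ||
            should_reclaim_mem_cgroup(memcg))
                shrink_lruvec(lruvec, sc);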

2. "hierarchical" softlimit reclaim

This is consistent with how softlimit was previously implemented, where the
pressure is put on the whole hierarchy as long as the "root" of the hierarchy
is over its softlimit.

This part is not in my previous posts, and is quite different from my
understanding of softlimit reclaim. After quite a lot of discussion with
Johannes and Michal, I decided to go with it for now. It is designed to work
with both trusted and untrusted setups.

What are the trusted and untrusted setups?

Case 1: The administrator is the only one setting up the limits, and he also
expects the memory under each cgroup's softlimit to be guaranteed:

Consider the following:

root (soft: unlimited, use_hierarchy = 1)
  -- A (soft: unlimited, usage 22G)
      -- A1 (soft: 10G, usage 17G)
      -- A2 (soft: 6G, usage 5G)
  -- B (soft: 16G, usage 10G)

Here A1 is above its softlimit while none of its ancestors is, so global
reclaim will pick only A1 to reclaim from first.

Case 2: An untrusted environment where a cgroup can change its own softlimit,
or the administrator could make mistakes. In that case, we still want to
attack the mis-configured child if its parent is above its softlimit.

Consider the following:

root (soft: unlimited, use_hierarchy = 1)
  -- A (soft: 16G, usage 22G)
      -- A1 (soft: 10G, usage 17G)
      -- A2 (soft: 1000G, usage 5G)
  -- B (soft: 16G, usage 10G)

Here A2 has set its softlimit far higher than its parent's, but the current
logic makes sure to still attack it when A exceeds its softlimit.
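
To make the hierarchy walk concrete, below is a small userspace model of the
decision (illustrative only; the struct, helper and macro names here are made
up to mirror should_reclaim_mem_cgroup() from this patch, they are not kernel
code):

#include <stdbool.h>
#include <stdio.h>

struct group {
        const char *name;
        unsigned long long usage, soft_limit;   /* bytes */
        struct group *parent;                   /* NULL for the root */
};

/* Eligible if the group or any ancestor below the root exceeds its softlimit */
static bool should_reclaim(const struct group *g)
{
        for (; g && g->parent; g = g->parent)   /* stop before the root */
                if (g->usage > g->soft_limit)
                        return true;
        return false;
}

#define GiB (1ULL << 30)

int main(void)
{
        /* the "case 2" hierarchy above: A over its softlimit, A2 mis-configured */
        struct group root = { "root", 0, ~0ULL, NULL };
        struct group A    = { "A",  22 * GiB,   16 * GiB, &root };
        struct group A1   = { "A1", 17 * GiB,   10 * GiB, &A };
        struct group A2   = { "A2",  5 * GiB, 1000 * GiB, &A };
        struct group B    = { "B",  10 * GiB,   16 * GiB, &root };
        const struct group *all[] = { &A, &A1, &A2, &B };

        for (unsigned int i = 0; i < 4; i++)    /* A, A1, A2: reclaim; B: skip */
                printf("%s: %s\n", all[i]->name,
                       should_reclaim(all[i]) ? "reclaim" : "skip");
        return 0;
}

Raising A's softlimit back to "unlimited" turns this into case 1, where only
A1 remains eligible.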

v4..v5:
1. rebased the patchset on the memcg-dev tree
2. applied KOSAKI's patch to do_try_to_free_pages()

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Ying Han <yinghan@google.com>
---
 include/linux/memcontrol.h |   18 +-
 include/linux/swap.h       |    4 -
 mm/memcontrol.c            |  455 +++-----------------------------------------
 mm/vmscan.c                |   82 ++-------
 4 files changed, 49 insertions(+), 510 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 0e495bc..2b0a4f2 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -124,6 +124,8 @@ extern void mem_cgroup_print_oom_info(struct mem_cgroup *memcg,
 extern void mem_cgroup_replace_page_cache(struct page *oldpage,
 					struct page *newpage);
 
+extern bool should_reclaim_mem_cgroup(struct mem_cgroup *memcg);
+
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
 extern int do_swap_account;
 #endif
@@ -179,9 +181,6 @@ static inline void mem_cgroup_dec_page_stat(struct page *page,
 	mem_cgroup_update_page_stat(page, idx, -1);
 }
 
-unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
-						gfp_t gfp_mask,
-						unsigned long *total_scanned);
 u64 mem_cgroup_get_limit(struct mem_cgroup *memcg);
 
 void mem_cgroup_count_vm_event(struct mm_struct *mm, enum vm_event_item idx);
@@ -359,14 +358,6 @@ static inline void mem_cgroup_dec_page_stat(struct page *page,
 }
 
 static inline
-unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
-					    gfp_t gfp_mask,
-					    unsigned long *total_scanned)
-{
-	return 0;
-}
-
-static inline
 u64 mem_cgroup_get_limit(struct mem_cgroup *memcg)
 {
 	return 0;
@@ -384,6 +375,11 @@ static inline void mem_cgroup_replace_page_cache(struct page *oldpage,
 				struct page *newpage)
 {
 }
+static inline
+bool should_reclaim_mem_cgroup(struct mem_cgroup *memcg)
+{
+	return true;
+}
 #endif /* CONFIG_CGROUP_MEM_CONT */
 
 #if !defined(CONFIG_CGROUP_MEM_RES_CTLR) || !defined(CONFIG_DEBUG_VM)
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 35c545c..8f38222 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -254,10 +254,6 @@ extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
 extern int __isolate_lru_page(struct page *page, isolate_mode_t mode);
 extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem,
 						  gfp_t gfp_mask, bool noswap);
-extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
-						gfp_t gfp_mask, bool noswap,
-						struct zone *zone,
-						unsigned long *nr_scanned);
 extern unsigned long shrink_all_memory(unsigned long nr_pages);
 extern int vm_swappiness;
 extern int remove_mapping(struct address_space *mapping, struct page *page);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c9d897a..dfe8fc3 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -35,7 +35,6 @@
 #include <linux/limits.h>
 #include <linux/export.h>
 #include <linux/mutex.h>
-#include <linux/rbtree.h>
 #include <linux/slab.h>
 #include <linux/swap.h>
 #include <linux/swapops.h>
@@ -131,7 +130,6 @@ static const char * const mem_cgroup_events_names[] = {
  */
 enum mem_cgroup_events_target {
 	MEM_CGROUP_TARGET_THRESH,
-	MEM_CGROUP_TARGET_SOFTLIMIT,
 	MEM_CGROUP_TARGET_NUMAINFO,
 	MEM_CGROUP_NTARGETS,
 };
@@ -161,13 +159,6 @@ struct mem_cgroup_per_zone {
 	unsigned long		lru_size[NR_LRU_LISTS];
 
 	struct mem_cgroup_reclaim_iter reclaim_iter[DEF_PRIORITY + 1];
-
-	struct rb_node		tree_node;	/* RB tree node */
-	unsigned long long	usage_in_excess;/* Set to the value by which */
-						/* the soft limit is exceeded*/
-	bool			on_tree;
-	struct mem_cgroup	*memcg;		/* Back pointer, we cannot */
-						/* use container_of	   */
 };
 
 struct mem_cgroup_per_node {
@@ -178,26 +169,6 @@ struct mem_cgroup_lru_info {
 	struct mem_cgroup_per_node *nodeinfo[MAX_NUMNODES];
 };
 
-/*
- * Cgroups above their limits are maintained in a RB-Tree, independent of
- * their hierarchy representation
- */
-
-struct mem_cgroup_tree_per_zone {
-	struct rb_root rb_root;
-	spinlock_t lock;
-};
-
-struct mem_cgroup_tree_per_node {
-	struct mem_cgroup_tree_per_zone rb_tree_per_zone[MAX_NR_ZONES];
-};
-
-struct mem_cgroup_tree {
-	struct mem_cgroup_tree_per_node *rb_tree_per_node[MAX_NUMNODES];
-};
-
-static struct mem_cgroup_tree soft_limit_tree __read_mostly;
-
 struct mem_cgroup_threshold {
 	struct eventfd_ctx *eventfd;
 	u64 threshold;
@@ -383,12 +354,7 @@ static bool move_file(void)
 					&mc.to->move_charge_at_immigrate);
 }
 
-/*
- * Maximum loops in mem_cgroup_hierarchical_reclaim(), used for soft
- * limit reclaim to prevent infinite loops, if they ever occur.
- */
 #define	MEM_CGROUP_MAX_RECLAIM_LOOPS		100
-#define	MEM_CGROUP_MAX_SOFT_LIMIT_RECLAIM_LOOPS	2
 
 enum charge_type {
 	MEM_CGROUP_CHARGE_TYPE_CACHE = 0,
@@ -425,12 +391,12 @@ enum charge_type {
 static void mem_cgroup_get(struct mem_cgroup *memcg);
 static void mem_cgroup_put(struct mem_cgroup *memcg);
 
+static bool mem_cgroup_is_root(struct mem_cgroup *memcg);
 /* Writing them here to avoid exposing memcg's inner layout */
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_KMEM
 #include <net/sock.h>
 #include <net/ip.h>
 
-static bool mem_cgroup_is_root(struct mem_cgroup *memcg);
 void sock_update_memcg(struct sock *sk)
 {
 	if (mem_cgroup_sockets_enabled) {
@@ -522,164 +488,6 @@ page_cgroup_zoneinfo(struct mem_cgroup *memcg, struct page *page)
 	return mem_cgroup_zoneinfo(memcg, nid, zid);
 }
 
-static struct mem_cgroup_tree_per_zone *
-soft_limit_tree_node_zone(int nid, int zid)
-{
-	return &soft_limit_tree.rb_tree_per_node[nid]->rb_tree_per_zone[zid];
-}
-
-static struct mem_cgroup_tree_per_zone *
-soft_limit_tree_from_page(struct page *page)
-{
-	int nid = page_to_nid(page);
-	int zid = page_zonenum(page);
-
-	return &soft_limit_tree.rb_tree_per_node[nid]->rb_tree_per_zone[zid];
-}
-
-static void
-__mem_cgroup_insert_exceeded(struct mem_cgroup *memcg,
-				struct mem_cgroup_per_zone *mz,
-				struct mem_cgroup_tree_per_zone *mctz,
-				unsigned long long new_usage_in_excess)
-{
-	struct rb_node **p = &mctz->rb_root.rb_node;
-	struct rb_node *parent = NULL;
-	struct mem_cgroup_per_zone *mz_node;
-
-	if (mz->on_tree)
-		return;
-
-	mz->usage_in_excess = new_usage_in_excess;
-	if (!mz->usage_in_excess)
-		return;
-	while (*p) {
-		parent = *p;
-		mz_node = rb_entry(parent, struct mem_cgroup_per_zone,
-					tree_node);
-		if (mz->usage_in_excess < mz_node->usage_in_excess)
-			p = &(*p)->rb_left;
-		/*
-		 * We can't avoid mem cgroups that are over their soft
-		 * limit by the same amount
-		 */
-		else if (mz->usage_in_excess >= mz_node->usage_in_excess)
-			p = &(*p)->rb_right;
-	}
-	rb_link_node(&mz->tree_node, parent, p);
-	rb_insert_color(&mz->tree_node, &mctz->rb_root);
-	mz->on_tree = true;
-}
-
-static void
-__mem_cgroup_remove_exceeded(struct mem_cgroup *memcg,
-				struct mem_cgroup_per_zone *mz,
-				struct mem_cgroup_tree_per_zone *mctz)
-{
-	if (!mz->on_tree)
-		return;
-	rb_erase(&mz->tree_node, &mctz->rb_root);
-	mz->on_tree = false;
-}
-
-static void
-mem_cgroup_remove_exceeded(struct mem_cgroup *memcg,
-				struct mem_cgroup_per_zone *mz,
-				struct mem_cgroup_tree_per_zone *mctz)
-{
-	spin_lock(&mctz->lock);
-	__mem_cgroup_remove_exceeded(memcg, mz, mctz);
-	spin_unlock(&mctz->lock);
-}
-
-
-static void mem_cgroup_update_tree(struct mem_cgroup *memcg, struct page *page)
-{
-	unsigned long long excess;
-	struct mem_cgroup_per_zone *mz;
-	struct mem_cgroup_tree_per_zone *mctz;
-	int nid = page_to_nid(page);
-	int zid = page_zonenum(page);
-	mctz = soft_limit_tree_from_page(page);
-
-	/*
-	 * Necessary to update all ancestors when hierarchy is used.
-	 * because their event counter is not touched.
-	 */
-	for (; memcg; memcg = parent_mem_cgroup(memcg)) {
-		mz = mem_cgroup_zoneinfo(memcg, nid, zid);
-		excess = res_counter_soft_limit_excess(&memcg->res);
-		/*
-		 * We have to update the tree if mz is on RB-tree or
-		 * mem is over its softlimit.
-		 */
-		if (excess || mz->on_tree) {
-			spin_lock(&mctz->lock);
-			/* if on-tree, remove it */
-			if (mz->on_tree)
-				__mem_cgroup_remove_exceeded(memcg, mz, mctz);
-			/*
-			 * Insert again. mz->usage_in_excess will be updated.
-			 * If excess is 0, no tree ops.
-			 */
-			__mem_cgroup_insert_exceeded(memcg, mz, mctz, excess);
-			spin_unlock(&mctz->lock);
-		}
-	}
-}
-
-static void mem_cgroup_remove_from_trees(struct mem_cgroup *memcg)
-{
-	int node, zone;
-	struct mem_cgroup_per_zone *mz;
-	struct mem_cgroup_tree_per_zone *mctz;
-
-	for_each_node(node) {
-		for (zone = 0; zone < MAX_NR_ZONES; zone++) {
-			mz = mem_cgroup_zoneinfo(memcg, node, zone);
-			mctz = soft_limit_tree_node_zone(node, zone);
-			mem_cgroup_remove_exceeded(memcg, mz, mctz);
-		}
-	}
-}
-
-static struct mem_cgroup_per_zone *
-__mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_zone *mctz)
-{
-	struct rb_node *rightmost = NULL;
-	struct mem_cgroup_per_zone *mz;
-
-retry:
-	mz = NULL;
-	rightmost = rb_last(&mctz->rb_root);
-	if (!rightmost)
-		goto done;		/* Nothing to reclaim from */
-
-	mz = rb_entry(rightmost, struct mem_cgroup_per_zone, tree_node);
-	/*
-	 * Remove the node now but someone else can add it back,
-	 * we will to add it back at the end of reclaim to its correct
-	 * position in the tree.
-	 */
-	__mem_cgroup_remove_exceeded(mz->memcg, mz, mctz);
-	if (!res_counter_soft_limit_excess(&mz->memcg->res) ||
-		!css_tryget(&mz->memcg->css))
-		goto retry;
-done:
-	return mz;
-}
-
-static struct mem_cgroup_per_zone *
-mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_zone *mctz)
-{
-	struct mem_cgroup_per_zone *mz;
-
-	spin_lock(&mctz->lock);
-	mz = __mem_cgroup_largest_soft_limit_node(mctz);
-	spin_unlock(&mctz->lock);
-	return mz;
-}
-
 /*
  * Implementation Note: reading percpu statistics for memcg.
  *
@@ -833,9 +641,6 @@ static bool mem_cgroup_event_ratelimit(struct mem_cgroup *memcg,
 		case MEM_CGROUP_TARGET_THRESH:
 			next = val + THRESHOLDS_EVENTS_TARGET;
 			break;
-		case MEM_CGROUP_TARGET_SOFTLIMIT:
-			next = val + SOFTLIMIT_EVENTS_TARGET;
-			break;
 		case MEM_CGROUP_TARGET_NUMAINFO:
 			next = val + NUMAINFO_EVENTS_TARGET;
 			break;
@@ -858,11 +663,8 @@ static void memcg_check_events(struct mem_cgroup *memcg, struct page *page)
 	/* threshold event is triggered in finer grain than soft limit */
 	if (unlikely(mem_cgroup_event_ratelimit(memcg,
 						MEM_CGROUP_TARGET_THRESH))) {
-		bool do_softlimit;
 		bool do_numainfo __maybe_unused;
 
-		do_softlimit = mem_cgroup_event_ratelimit(memcg,
-						MEM_CGROUP_TARGET_SOFTLIMIT);
 #if MAX_NUMNODES > 1
 		do_numainfo = mem_cgroup_event_ratelimit(memcg,
 						MEM_CGROUP_TARGET_NUMAINFO);
@@ -870,8 +672,6 @@ static void memcg_check_events(struct mem_cgroup *memcg, struct page *page)
 		preempt_enable();
 
 		mem_cgroup_threshold(memcg);
-		if (unlikely(do_softlimit))
-			mem_cgroup_update_tree(memcg, page);
 #if MAX_NUMNODES > 1
 		if (unlikely(do_numainfo))
 			atomic_inc(&memcg->numainfo_events);
@@ -922,6 +722,31 @@ struct mem_cgroup *try_get_mem_cgroup_from_mm(struct mm_struct *mm)
 	return memcg;
 }
 
+bool should_reclaim_mem_cgroup(struct mem_cgroup *memcg)
+{
+	if (mem_cgroup_disabled())
+		return true;
+
+	/*
+	 * We treat the root cgroup special here to always reclaim pages.
+	 * Now root cgroup has its own lru, and the only chance to reclaim
+	 * pages from it is through global reclaim. note, root cgroup does
+	 * not trigger targeted reclaim.
+	 */
+	if (mem_cgroup_is_root(memcg))
+		return true;
+
+	for (; memcg; memcg = parent_mem_cgroup(memcg)) {
+		/* This is global reclaim, stop at root cgroup */
+		if (mem_cgroup_is_root(memcg))
+			break;
+		if (res_counter_soft_limit_excess(&memcg->res))
+			return true;
+	}
+
+	return false;
+}
+
 /**
  * mem_cgroup_iter - iterate over memory cgroup hierarchy
  * @root: hierarchy root
@@ -1614,106 +1439,13 @@ int mem_cgroup_select_victim_node(struct mem_cgroup *memcg)
 	return node;
 }
 
-/*
- * Check all nodes whether it contains reclaimable pages or not.
- * For quick scan, we make use of scan_nodes. This will allow us to skip
- * unused nodes. But scan_nodes is lazily updated and may not cotain
- * enough new information. We need to do double check.
- */
-static bool mem_cgroup_reclaimable(struct mem_cgroup *memcg, bool noswap)
-{
-	int nid;
-
-	/*
-	 * quick check...making use of scan_node.
-	 * We can skip unused nodes.
-	 */
-	if (!nodes_empty(memcg->scan_nodes)) {
-		for (nid = first_node(memcg->scan_nodes);
-		     nid < MAX_NUMNODES;
-		     nid = next_node(nid, memcg->scan_nodes)) {
-
-			if (test_mem_cgroup_node_reclaimable(memcg, nid, noswap))
-				return true;
-		}
-	}
-	/*
-	 * Check rest of nodes.
-	 */
-	for_each_node_state(nid, N_HIGH_MEMORY) {
-		if (node_isset(nid, memcg->scan_nodes))
-			continue;
-		if (test_mem_cgroup_node_reclaimable(memcg, nid, noswap))
-			return true;
-	}
-	return false;
-}
-
 #else
 int mem_cgroup_select_victim_node(struct mem_cgroup *memcg)
 {
 	return 0;
 }
-
-static bool mem_cgroup_reclaimable(struct mem_cgroup *memcg, bool noswap)
-{
-	return test_mem_cgroup_node_reclaimable(memcg, 0, noswap);
-}
 #endif
 
-static int mem_cgroup_soft_reclaim(struct mem_cgroup *root_memcg,
-				   struct zone *zone,
-				   gfp_t gfp_mask,
-				   unsigned long *total_scanned)
-{
-	struct mem_cgroup *victim = NULL;
-	int total = 0;
-	int loop = 0;
-	unsigned long excess;
-	unsigned long nr_scanned;
-	struct mem_cgroup_reclaim_cookie reclaim = {
-		.zone = zone,
-		.priority = 0,
-	};
-
-	excess = res_counter_soft_limit_excess(&root_memcg->res) >> PAGE_SHIFT;
-
-	while (1) {
-		victim = mem_cgroup_iter(root_memcg, victim, &reclaim);
-		if (!victim) {
-			loop++;
-			if (loop >= 2) {
-				/*
-				 * If we have not been able to reclaim
-				 * anything, it might because there are
-				 * no reclaimable pages under this hierarchy
-				 */
-				if (!total)
-					break;
-				/*
-				 * We want to do more targeted reclaim.
-				 * excess >> 2 is not to excessive so as to
-				 * reclaim too much, nor too less that we keep
-				 * coming back to reclaim from this cgroup
-				 */
-				if (total >= (excess >> 2) ||
-					(loop > MEM_CGROUP_MAX_RECLAIM_LOOPS))
-					break;
-			}
-			continue;
-		}
-		if (!mem_cgroup_reclaimable(victim, false))
-			continue;
-		total += mem_cgroup_shrink_node_zone(victim, gfp_mask, false,
-						     zone, &nr_scanned);
-		*total_scanned += nr_scanned;
-		if (!res_counter_soft_limit_excess(&root_memcg->res))
-			break;
-	}
-	mem_cgroup_iter_break(root_memcg, victim);
-	return total;
-}
-
 /*
  * Check OOM-Killer is already running under our hierarchy.
  * If someone is running, return false.
@@ -2547,8 +2279,6 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *memcg,
 
 	/*
 	 * "charge_statistics" updated event counter. Then, check it.
-	 * Insert ancestor (and ancestor's ancestors), to softlimit RB-tree.
-	 * if they exceeds softlimit.
 	 */
 	memcg_check_events(memcg, page);
 }
@@ -3702,98 +3432,6 @@ static int mem_cgroup_resize_memsw_limit(struct mem_cgroup *memcg,
 	return ret;
 }
 
-unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
-					    gfp_t gfp_mask,
-					    unsigned long *total_scanned)
-{
-	unsigned long nr_reclaimed = 0;
-	struct mem_cgroup_per_zone *mz, *next_mz = NULL;
-	unsigned long reclaimed;
-	int loop = 0;
-	struct mem_cgroup_tree_per_zone *mctz;
-	unsigned long long excess;
-	unsigned long nr_scanned;
-
-	if (order > 0)
-		return 0;
-
-	mctz = soft_limit_tree_node_zone(zone_to_nid(zone), zone_idx(zone));
-	/*
-	 * This loop can run a while, specially if mem_cgroup's continuously
-	 * keep exceeding their soft limit and putting the system under
-	 * pressure
-	 */
-	do {
-		if (next_mz)
-			mz = next_mz;
-		else
-			mz = mem_cgroup_largest_soft_limit_node(mctz);
-		if (!mz)
-			break;
-
-		nr_scanned = 0;
-		reclaimed = mem_cgroup_soft_reclaim(mz->memcg, zone,
-						    gfp_mask, &nr_scanned);
-		nr_reclaimed += reclaimed;
-		*total_scanned += nr_scanned;
-		spin_lock(&mctz->lock);
-
-		/*
-		 * If we failed to reclaim anything from this memory cgroup
-		 * it is time to move on to the next cgroup
-		 */
-		next_mz = NULL;
-		if (!reclaimed) {
-			do {
-				/*
-				 * Loop until we find yet another one.
-				 *
-				 * By the time we get the soft_limit lock
-				 * again, someone might have aded the
-				 * group back on the RB tree. Iterate to
-				 * make sure we get a different mem.
-				 * mem_cgroup_largest_soft_limit_node returns
-				 * NULL if no other cgroup is present on
-				 * the tree
-				 */
-				next_mz =
-				__mem_cgroup_largest_soft_limit_node(mctz);
-				if (next_mz == mz)
-					css_put(&next_mz->memcg->css);
-				else /* next_mz == NULL or other memcg */
-					break;
-			} while (1);
-		}
-		__mem_cgroup_remove_exceeded(mz->memcg, mz, mctz);
-		excess = res_counter_soft_limit_excess(&mz->memcg->res);
-		/*
-		 * One school of thought says that we should not add
-		 * back the node to the tree if reclaim returns 0.
-		 * But our reclaim could return 0, simply because due
-		 * to priority we are exposing a smaller subset of
-		 * memory to reclaim from. Consider this as a longer
-		 * term TODO.
-		 */
-		/* If excess == 0, no tree ops */
-		__mem_cgroup_insert_exceeded(mz->memcg, mz, mctz, excess);
-		spin_unlock(&mctz->lock);
-		css_put(&mz->memcg->css);
-		loop++;
-		/*
-		 * Could not reclaim anything and there are no more
-		 * mem cgroups to try or we seem to be looping without
-		 * reclaiming anything.
-		 */
-		if (!nr_reclaimed &&
-			(next_mz == NULL ||
-			loop > MEM_CGROUP_MAX_SOFT_LIMIT_RECLAIM_LOOPS))
-			break;
-	} while (!nr_reclaimed);
-	if (next_mz)
-		css_put(&next_mz->memcg->css);
-	return nr_reclaimed;
-}
-
 /*
  * This routine traverse page_cgroup in given list and drop them all.
  * *And* this routine doesn't reclaim page itself, just removes page_cgroup.
@@ -4886,9 +4524,6 @@ static int alloc_mem_cgroup_per_zone_info(struct mem_cgroup *memcg, int node)
 	for (zone = 0; zone < MAX_NR_ZONES; zone++) {
 		mz = &pn->zoneinfo[zone];
 		lruvec_init(&mz->lruvec, &NODE_DATA(node)->node_zones[zone]);
-		mz->usage_in_excess = 0;
-		mz->on_tree = false;
-		mz->memcg = memcg;
 	}
 	memcg->info.nodeinfo[node] = pn;
 	return 0;
@@ -4980,7 +4615,6 @@ static void __mem_cgroup_free(struct mem_cgroup *memcg)
 {
 	int node;
 
-	mem_cgroup_remove_from_trees(memcg);
 	free_css_id(&mem_cgroup_subsys, &memcg->css);
 
 	for_each_node(node)
@@ -5033,41 +4667,6 @@ static void __init enable_swap_cgroup(void)
 }
 #endif
 
-static int mem_cgroup_soft_limit_tree_init(void)
-{
-	struct mem_cgroup_tree_per_node *rtpn;
-	struct mem_cgroup_tree_per_zone *rtpz;
-	int tmp, node, zone;
-
-	for_each_node(node) {
-		tmp = node;
-		if (!node_state(node, N_NORMAL_MEMORY))
-			tmp = -1;
-		rtpn = kzalloc_node(sizeof(*rtpn), GFP_KERNEL, tmp);
-		if (!rtpn)
-			goto err_cleanup;
-
-		soft_limit_tree.rb_tree_per_node[node] = rtpn;
-
-		for (zone = 0; zone < MAX_NR_ZONES; zone++) {
-			rtpz = &rtpn->rb_tree_per_zone[zone];
-			rtpz->rb_root = RB_ROOT;
-			spin_lock_init(&rtpz->lock);
-		}
-	}
-	return 0;
-
-err_cleanup:
-	for_each_node(node) {
-		if (!soft_limit_tree.rb_tree_per_node[node])
-			break;
-		kfree(soft_limit_tree.rb_tree_per_node[node]);
-		soft_limit_tree.rb_tree_per_node[node] = NULL;
-	}
-	return 1;
-
-}
-
 static struct cgroup_subsys_state * __ref
 mem_cgroup_create(struct cgroup *cont)
 {
@@ -5089,8 +4688,6 @@ mem_cgroup_create(struct cgroup *cont)
 		int cpu;
 		enable_swap_cgroup();
 		parent = NULL;
-		if (mem_cgroup_soft_limit_tree_init())
-			goto free_out;
 		root_mem_cgroup = memcg;
 		for_each_possible_cpu(cpu) {
 			struct memcg_stock_pcp *stock =
diff --git a/mm/vmscan.c b/mm/vmscan.c
index b90ec2b..8c367e1 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1832,7 +1832,22 @@ static void shrink_zone(struct zone *zone, struct scan_control *sc)
 	do {
 		struct lruvec *lruvec = mem_cgroup_zone_lruvec(zone, memcg);
 
-		shrink_lruvec(lruvec, sc);
+		/*
+		 * Reclaim from mem_cgroup if any of these conditions are met:
+		 * - this is a targetted reclaim ( not global reclaim)
+		 * - reclaim priority is less than  DEF_PRIORITY - 2
+		 * - mem_cgroup or its ancestor ( not including root cgroup)
+		 * exceeds its soft limit
+		 *
+		 * Note: The priority check is a balance of how hard to
+		 * preserve the pages under softlimit. If the memcgs of the
+		 * zone having trouble to reclaim pages above their softlimit,
+		 * we have to reclaim under softlimit instead of burning more
+		 * cpu cycles.
+		 */
+		if (!global_reclaim(sc) || sc->priority < DEF_PRIORITY - 2 ||
+				should_reclaim_mem_cgroup(memcg))
+			shrink_lruvec(lruvec, sc);
 
 		/*
 		 * Limit reclaim has historically picked one memcg and
@@ -1913,8 +1928,6 @@ static bool shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
 {
 	struct zoneref *z;
 	struct zone *zone;
-	unsigned long nr_soft_reclaimed;
-	unsigned long nr_soft_scanned;
 	bool aborted_reclaim = false;
 
 	/*
@@ -1954,18 +1967,6 @@ static bool shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
 					continue;
 				}
 			}
-			/*
-			 * This steals pages from memory cgroups over softlimit
-			 * and returns the number of reclaimed pages and
-			 * scanned pages. This works for global memory pressure
-			 * and balancing, not for a memcg's limit.
-			 */
-			nr_soft_scanned = 0;
-			nr_soft_reclaimed = mem_cgroup_soft_limit_reclaim(zone,
-						sc->order, sc->gfp_mask,
-						&nr_soft_scanned);
-			sc->nr_reclaimed += nr_soft_reclaimed;
-			sc->nr_scanned += nr_soft_scanned;
 			/* need some check for avoid more shrink_zone() */
 		}
 
@@ -2143,45 +2144,6 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
 
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR
 
-unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *memcg,
-						gfp_t gfp_mask, bool noswap,
-						struct zone *zone,
-						unsigned long *nr_scanned)
-{
-	struct scan_control sc = {
-		.nr_scanned = 0,
-		.nr_to_reclaim = SWAP_CLUSTER_MAX,
-		.may_writepage = !laptop_mode,
-		.may_unmap = 1,
-		.may_swap = !noswap,
-		.order = 0,
-		.priority = 0,
-		.target_mem_cgroup = memcg,
-	};
-	struct lruvec *lruvec = mem_cgroup_zone_lruvec(zone, memcg);
-
-	sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
-			(GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
-
-	trace_mm_vmscan_memcg_softlimit_reclaim_begin(sc.order,
-						      sc.may_writepage,
-						      sc.gfp_mask);
-
-	/*
-	 * NOTE: Although we can get the priority field, using it
-	 * here is not a good idea, since it limits the pages we can scan.
-	 * if we don't reclaim here, the shrink_zone from balance_pgdat
-	 * will pick up pages from other mem cgroup's as well. We hack
-	 * the priority and make it zero.
-	 */
-	shrink_lruvec(lruvec, &sc);
-
-	trace_mm_vmscan_memcg_softlimit_reclaim_end(sc.nr_reclaimed);
-
-	*nr_scanned = sc.nr_scanned;
-	return sc.nr_reclaimed;
-}
-
 unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
 					   gfp_t gfp_mask,
 					   bool noswap)
@@ -2352,8 +2314,6 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order,
 	int end_zone = 0;	/* Inclusive.  0 = ZONE_DMA */
 	unsigned long total_scanned;
 	struct reclaim_state *reclaim_state = current->reclaim_state;
-	unsigned long nr_soft_reclaimed;
-	unsigned long nr_soft_scanned;
 	struct scan_control sc = {
 		.gfp_mask = GFP_KERNEL,
 		.may_unmap = 1,
@@ -2455,16 +2415,6 @@ loop_again:
 
 			sc.nr_scanned = 0;
 
-			nr_soft_scanned = 0;
-			/*
-			 * Call soft limit reclaim before calling shrink_zone.
-			 */
-			nr_soft_reclaimed = mem_cgroup_soft_limit_reclaim(zone,
-							order, sc.gfp_mask,
-							&nr_soft_scanned);
-			sc.nr_reclaimed += nr_soft_reclaimed;
-			total_scanned += nr_soft_scanned;
-
 			/*
 			 * We put equal pressure on every zone, unless
 			 * one zone has way too many pages free
-- 
1.7.7.3


* [PATCH V2 2/5] mm: memcg set soft_limit_in_bytes to 0 by default
  2012-06-18 16:47 [PATCH V5 1/5] mm: memcg softlimit reclaim rework Ying Han
@ 2012-06-18 16:47 ` Ying Han
  2012-06-18 16:47 ` [PATCH V5 3/5] mm: memcg detect no memcgs above softlimit under zone reclaim Ying Han
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Ying Han @ 2012-06-18 16:47 UTC (permalink / raw)
  To: Michal Hocko, Johannes Weiner, Mel Gorman, KAMEZAWA Hiroyuki,
	Rik van Riel, Hillf Danton, Hugh Dickins, Dan Magenheimer,
	Andrew Morton
  Cc: linux-mm

This idea is based on a discussion with Michal and Johannes at LSF.

1. If the soft limits are all set to MAX, the first three priority iterations
are wasted without scanning anything.

2. By default every memcg is eligible for softlimit reclaim, and we can still
set the value to MAX for a special memcg which should be immune to soft limit
reclaim.

There is a behavior change after this patch: (N == DEF_PRIORITY - 2)

        A: usage > softlimit        B: usage <= softlimit        U: softlimit unset
old:    reclaim at each priority    reclaim when priority < N    reclaim when priority < N
new:    reclaim at each priority    reclaim when priority < N    reclaim at each priority
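
The "U" column changes because, with a default of 0, any group that has
charged at least one page reports a positive soft-limit excess. A simplified
stand-in for res_counter_soft_limit_excess() (locking omitted, illustration
only) shows why:

        static unsigned long long soft_limit_excess(struct res_counter *cnt)
        {
                if (cnt->usage <= cnt->soft_limit)      /* soft_limit now defaults to 0 */
                        return 0;
                return cnt->usage - cnt->soft_limit;    /* any charge => positive excess */
        }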

Note: I could leave counter->soft_limit uninitialized, since all the callers
of res_counter_init() pass the memcg as a pre-zeroed structure. However, it is
better not to rely on that.

Signed-off-by: Ying Han <yinghan@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

---
 kernel/res_counter.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/res_counter.c b/kernel/res_counter.c
index d9ea45e..9cbffce 100644
--- a/kernel/res_counter.c
+++ b/kernel/res_counter.c
@@ -18,7 +18,7 @@ void res_counter_init(struct res_counter *counter, struct res_counter *parent)
 {
 	spin_lock_init(&counter->lock);
 	counter->limit = RESOURCE_MAX;
-	counter->soft_limit = RESOURCE_MAX;
+	counter->soft_limit = 0;
 	counter->parent = parent;
 }
 
-- 
1.7.7.3


* [PATCH V5 3/5] mm: memcg detect no memcgs above softlimit under zone reclaim
  2012-06-18 16:47 [PATCH V5 1/5] mm: memcg softlimit reclaim rework Ying Han
  2012-06-18 16:47 ` [PATCH V2 2/5] mm: memcg set soft_limit_in_bytes to 0 by default Ying Han
@ 2012-06-18 16:47 ` Ying Han
  2012-06-18 16:47 ` [PATCH V5 4/5] mm, vmscan: fix do_try_to_free_pages() livelock Ying Han
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Ying Han @ 2012-06-18 16:47 UTC (permalink / raw)
  To: Michal Hocko, Johannes Weiner, Mel Gorman, KAMEZAWA Hiroyuki,
	Rik van Riel, Hillf Danton, Hugh Dickins, Dan Magenheimer,
	Andrew Morton
  Cc: linux-mm

In a memcg kernel, a cgroup under its softlimit is not targeted by global
reclaim. It is possible that all memcgs are under their softlimit for a
particular zone. If that is the case, the current implementation will burn
extra cpu cycles without making forward progress.

The idea is from the LSF discussion: we detect this after the first round of
scanning and restart the reclaim without looking at the softlimit at all. This
allows us to make forward progress in shrink_zone().
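
Condensed from the diff below, the resulting flow in shrink_zone() is roughly
the following (the limit-reclaim early break is omitted for brevity):

        restart:
                over_softlimit = false;
                memcg = mem_cgroup_iter(root, NULL, &reclaim);
                do {
                        if (ignore_softlimit || !global_reclaim(sc) ||
                            sc->priority < DEF_PRIORITY - 2 ||
                            should_reclaim_mem_cgroup(memcg)) {
                                shrink_lruvec(lruvec, sc);
                                over_softlimit = true;
                        }
                        memcg = mem_cgroup_iter(root, memcg, &reclaim);
                } while (memcg);

                /* nothing was eligible in this zone: retry once, ignoring softlimit */
                if (!over_softlimit) {
                        ignore_softlimit = true;
                        goto restart;
                }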

Signed-off-by: Ying Han <yinghan@google.com>
---
 mm/vmscan.c |   17 +++++++++++++++--
 1 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 8c367e1..51f8cc9 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1827,6 +1827,10 @@ static void shrink_zone(struct zone *zone, struct scan_control *sc)
 		.priority = sc->priority,
 	};
 	struct mem_cgroup *memcg;
+	bool over_softlimit, ignore_softlimit = false;
+
+restart:
+	over_softlimit = false;
 
 	memcg = mem_cgroup_iter(root, NULL, &reclaim);
 	do {
@@ -1845,10 +1849,14 @@ static void shrink_zone(struct zone *zone, struct scan_control *sc)
 		 * we have to reclaim under softlimit instead of burning more
 		 * cpu cycles.
 		 */
-		if (!global_reclaim(sc) || sc->priority < DEF_PRIORITY - 2 ||
-				should_reclaim_mem_cgroup(memcg))
+		if (ignore_softlimit || !global_reclaim(sc) ||
+				sc->priority < DEF_PRIORITY - 2 ||
+				should_reclaim_mem_cgroup(memcg)) {
 			shrink_lruvec(lruvec, sc);
 
+			over_softlimit = true;
+		}
+
 		/*
 		 * Limit reclaim has historically picked one memcg and
 		 * scanned it with decreasing priority levels until
@@ -1865,6 +1873,11 @@ static void shrink_zone(struct zone *zone, struct scan_control *sc)
 		}
 		memcg = mem_cgroup_iter(root, memcg, &reclaim);
 	} while (memcg);
+
+	if (!over_softlimit) {
+		ignore_softlimit = true;
+		goto restart;
+	}
 }
 
 /* Returns true if compaction should go ahead for a high-order request */
-- 
1.7.7.3


* [PATCH V5 4/5] mm, vmscan: fix do_try_to_free_pages() livelock
  2012-06-18 16:47 [PATCH V5 1/5] mm: memcg softlimit reclaim rework Ying Han
  2012-06-18 16:47 ` [PATCH V2 2/5] mm: memcg set soft_limit_in_bytes to 0 by default Ying Han
  2012-06-18 16:47 ` [PATCH V5 3/5] mm: memcg detect no memcgs above softlimit under zone reclaim Ying Han
@ 2012-06-18 16:47 ` Ying Han
  2012-06-19 18:29   ` KOSAKI Motohiro
  2012-06-18 16:47 ` [PATCH V5 5/5] mm: memcg discount pages under softlimit from per-zone reclaimable_pages Ying Han
  2012-06-19 11:29 ` [PATCH V5 1/5] mm: memcg softlimit reclaim rework Johannes Weiner
  4 siblings, 1 reply; 14+ messages in thread
From: Ying Han @ 2012-06-18 16:47 UTC (permalink / raw)
  To: Michal Hocko, Johannes Weiner, Mel Gorman, KAMEZAWA Hiroyuki,
	Rik van Riel, Hillf Danton, Hugh Dickins, Dan Magenheimer,
	Andrew Morton
  Cc: linux-mm, KOSAKI Motohiro

Currently, do_try_to_free_pages() can enter a livelock, because vmscan
now has two conflicting policies.

1) kswapd sleeps when it couldn't reclaim any page even after
   reaching priority 0. This is to avoid an infinite loop in
   kswapd(). That said, kswapd assumes direct reclaim makes enough
   free pages via either regular page reclaim or the oom-killer.
   This logic makes a kswapd -> direct-reclaim dependency.
2) Direct reclaim continues to reclaim without the oom-killer until
   kswapd turns on zone->all_unreclaimable. This is to avoid a
   too-early oom-kill.
   This logic makes a direct-reclaim -> kswapd dependency.

In the worst case, direct reclaim may continue page reclaim forever
when kswapd is asleep and no other thread wakes it up.

We can't turn on zone->all_unreclaimable because doing so is racy:
the direct reclaim path doesn't take any lock. Thus this patch removes
the zone->all_unreclaimable field completely and recalculates it every
time.

Note: we can't take the approach of direct reclaim looking at
zone->pages_scanned directly while kswapd continues to use
zone->all_unreclaimable, because that is racy. Commit 929bea7c71
(vmscan: all_unreclaimable() use zone->all_unreclaimable as a name)
describes the details.
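
The recalculated check is cheap; as added to mm_inline.h by this patch it
boils down to:

        static inline bool zone_reclaimable(struct zone *zone)
        {
                /* unreclaimable once we have scanned six times the number of
                 * reclaimable pages without making progress */
                return zone->pages_scanned < zone_reclaimable_pages(zone) * 6;
        }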

Reported-by: Aaditya Kumar <aaditya.kumar.30@gmail.com>
Reported-by: Ying Han <yinghan@google.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Rik van Riel <riel@redhat.com>
---
 include/linux/mm_inline.h |   19 ++++++++++++++++
 include/linux/mmzone.h    |    2 +-
 include/linux/vmstat.h    |    1 -
 mm/page-writeback.c       |    2 +
 mm/page_alloc.c           |    5 +--
 mm/vmscan.c               |   51 +++++++++++++-------------------------------
 mm/vmstat.c               |    3 +-
 7 files changed, 41 insertions(+), 42 deletions(-)

diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index 1397ccf..5cb796c 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -2,6 +2,7 @@
 #define LINUX_MM_INLINE_H
 
 #include <linux/huge_mm.h>
+#include <linux/swap.h>
 
 /**
  * page_is_file_cache - should the page be on a file LRU or anon LRU?
@@ -99,4 +100,22 @@ static __always_inline enum lru_list page_lru(struct page *page)
 	return lru;
 }
 
+static inline unsigned long zone_reclaimable_pages(struct zone *zone)
+{
+	int nr;
+
+	nr = zone_page_state(zone, NR_ACTIVE_FILE) +
+	     zone_page_state(zone, NR_INACTIVE_FILE);
+
+	if (nr_swap_pages > 0)
+		nr += zone_page_state(zone, NR_ACTIVE_ANON) +
+		      zone_page_state(zone, NR_INACTIVE_ANON);
+
+		return nr;
+}
+
+static inline bool zone_reclaimable(struct zone *zone)
+{
+	return zone->pages_scanned < zone_reclaimable_pages(zone) * 6;
+}
 #endif
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 0ee785d..b380ec3 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -343,7 +343,7 @@ struct zone {
 	 * free areas of different sizes
 	 */
 	spinlock_t		lock;
-	int                     all_unreclaimable; /* All pages pinned */
+
 #ifdef CONFIG_MEMORY_HOTPLUG
 	/* see spanned/present_pages for more description */
 	seqlock_t		span_seqlock;
diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 65efb92..9607256 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -140,7 +140,6 @@ static inline unsigned long zone_page_state_snapshot(struct zone *zone,
 }
 
 extern unsigned long global_reclaimable_pages(void);
-extern unsigned long zone_reclaimable_pages(struct zone *zone);
 
 #ifdef CONFIG_NUMA
 /*
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 26adea8..e869f8a 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -34,6 +34,8 @@
 #include <linux/syscalls.h>
 #include <linux/buffer_head.h> /* __set_page_dirty_buffers */
 #include <linux/pagevec.h>
+#include <linux/mm_inline.h>
+
 #include <trace/events/writeback.h>
 
 /*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 164a6d2..aec24de 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -58,6 +58,7 @@
 #include <linux/memcontrol.h>
 #include <linux/prefetch.h>
 #include <linux/page-debug-flags.h>
+#include <linux/mm_inline.h>
 
 #include <asm/tlbflush.h>
 #include <asm/div64.h>
@@ -637,7 +638,6 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 	int to_free = count;
 
 	spin_lock(&zone->lock);
-	zone->all_unreclaimable = 0;
 	zone->pages_scanned = 0;
 
 	while (to_free) {
@@ -679,7 +679,6 @@ static void free_one_page(struct zone *zone, struct page *page, int order,
 				int migratetype)
 {
 	spin_lock(&zone->lock);
-	zone->all_unreclaimable = 0;
 	zone->pages_scanned = 0;
 
 	__free_one_page(page, zone, order, migratetype);
@@ -2814,7 +2813,7 @@ void show_free_areas(unsigned int filter)
 			K(zone_page_state(zone, NR_BOUNCE)),
 			K(zone_page_state(zone, NR_WRITEBACK_TEMP)),
 			zone->pages_scanned,
-			(zone->all_unreclaimable ? "yes" : "no")
+			(zone_reclaimable(zone) ? "yes" : "no")
 			);
 		printk("lowmem_reserve[]:");
 		for (i = 0; i < MAX_NR_ZONES; i++)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 51f8cc9..b95344c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1592,7 +1592,7 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
 	 * latencies, so it's better to scan a minimum amount there as
 	 * well.
 	 */
-	if (current_is_kswapd() && zone->all_unreclaimable)
+	if (current_is_kswapd() && !zone_reclaimable(zone))
 		force_scan = true;
 	if (!global_reclaim(sc))
 		force_scan = true;
@@ -1962,8 +1962,8 @@ static bool shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
 		if (global_reclaim(sc)) {
 			if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
 				continue;
-			if (zone->all_unreclaimable &&
-					sc->priority != DEF_PRIORITY)
+			if (!zone_reclaimable(zone) &&
+			    sc->priority != DEF_PRIORITY)
 				continue;	/* Let kswapd poll it */
 			if (COMPACTION_BUILD) {
 				/*
@@ -1989,11 +1989,6 @@ static bool shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
 	return aborted_reclaim;
 }
 
-static bool zone_reclaimable(struct zone *zone)
-{
-	return zone->pages_scanned < zone_reclaimable_pages(zone) * 6;
-}
-
 /* All zones in zonelist are unreclaimable? */
 static bool all_unreclaimable(struct zonelist *zonelist,
 		struct scan_control *sc)
@@ -2007,7 +2002,7 @@ static bool all_unreclaimable(struct zonelist *zonelist,
 			continue;
 		if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
 			continue;
-		if (!zone->all_unreclaimable)
+		if (zone_reclaimable(zone))
 			return false;
 	}
 
@@ -2274,7 +2269,7 @@ static bool sleeping_prematurely(pg_data_t *pgdat, int order, long remaining,
 		 * they must be considered balanced here as well if kswapd
 		 * is to sleep
 		 */
-		if (zone->all_unreclaimable) {
+		if (zone_reclaimable(zone)) {
 			balanced += zone->present_pages;
 			continue;
 		}
@@ -2366,8 +2361,8 @@ loop_again:
 			if (!populated_zone(zone))
 				continue;
 
-			if (zone->all_unreclaimable &&
-			    sc.priority != DEF_PRIORITY)
+			if (!zone_reclaimable(zone) &&
+					sc.priority != DEF_PRIORITY)
 				continue;
 
 			/*
@@ -2416,14 +2411,14 @@ loop_again:
 		 */
 		for (i = 0; i <= end_zone; i++) {
 			struct zone *zone = pgdat->node_zones + i;
-			int nr_slab, testorder;
+			int testorder;
 			unsigned long balance_gap;
 
 			if (!populated_zone(zone))
 				continue;
 
-			if (zone->all_unreclaimable &&
-			    sc.priority != DEF_PRIORITY)
+			if (!zone_reclaimable(zone) &&
+					sc.priority != DEF_PRIORITY)
 				continue;
 
 			sc.nr_scanned = 0;
@@ -2460,12 +2455,10 @@ loop_again:
 				shrink_zone(zone, &sc);
 
 				reclaim_state->reclaimed_slab = 0;
-				nr_slab = shrink_slab(&shrink, sc.nr_scanned, lru_pages);
+				shrink_slab(&shrink, sc.nr_scanned, lru_pages);
 				sc.nr_reclaimed += reclaim_state->reclaimed_slab;
 				total_scanned += sc.nr_scanned;
 
-				if (nr_slab == 0 && !zone_reclaimable(zone))
-					zone->all_unreclaimable = 1;
 			}
 
 			/*
@@ -2477,7 +2470,7 @@ loop_again:
 			    total_scanned > sc.nr_reclaimed + sc.nr_reclaimed / 2)
 				sc.may_writepage = 1;
 
-			if (zone->all_unreclaimable) {
+			if (!zone_reclaimable(zone)) {
 				if (end_zone && end_zone == i)
 					end_zone--;
 				continue;
@@ -2579,8 +2572,8 @@ out:
 			if (!populated_zone(zone))
 				continue;
 
-			if (zone->all_unreclaimable &&
-			    sc.priority != DEF_PRIORITY)
+			if (!zone_reclaimable(zone) &&
+					sc.priority != DEF_PRIORITY)
 				continue;
 
 			/* Would compaction fail due to lack of free memory? */
@@ -2813,20 +2806,6 @@ unsigned long global_reclaimable_pages(void)
 	return nr;
 }
 
-unsigned long zone_reclaimable_pages(struct zone *zone)
-{
-	int nr;
-
-	nr = zone_page_state(zone, NR_ACTIVE_FILE) +
-	     zone_page_state(zone, NR_INACTIVE_FILE);
-
-	if (nr_swap_pages > 0)
-		nr += zone_page_state(zone, NR_ACTIVE_ANON) +
-		      zone_page_state(zone, NR_INACTIVE_ANON);
-
-	return nr;
-}
-
 #ifdef CONFIG_HIBERNATION
 /*
  * Try to free `nr_to_reclaim' of memory, system-wide, and return the number of
@@ -3121,7 +3100,7 @@ int zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 	    zone_page_state(zone, NR_SLAB_RECLAIMABLE) <= zone->min_slab_pages)
 		return ZONE_RECLAIM_FULL;
 
-	if (zone->all_unreclaimable)
+	if (!zone_reclaimable(zone))
 		return ZONE_RECLAIM_FULL;
 
 	/*
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 7db1b9b..f6800a1 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -19,6 +19,7 @@
 #include <linux/math64.h>
 #include <linux/writeback.h>
 #include <linux/compaction.h>
+#include <linux/mm_inline.h>
 
 #ifdef CONFIG_VM_EVENT_COUNTERS
 DEFINE_PER_CPU(struct vm_event_state, vm_event_states) = {{0}};
@@ -1019,7 +1020,7 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
 		   "\n  all_unreclaimable: %u"
 		   "\n  start_pfn:         %lu"
 		   "\n  inactive_ratio:    %u",
-		   zone->all_unreclaimable,
+		   !zone_reclaimable(zone),
 		   zone->zone_start_pfn,
 		   zone->inactive_ratio);
 	seq_putc(m, '\n');
-- 
1.7.7.3


* [PATCH V5 5/5] mm: memcg discount pages under softlimit from per-zone reclaimable_pages
  2012-06-18 16:47 [PATCH V5 1/5] mm: memcg softlimit reclaim rework Ying Han
                   ` (2 preceding siblings ...)
  2012-06-18 16:47 ` [PATCH V5 4/5] mm, vmscan: fix do_try_to_free_pages() livelock Ying Han
@ 2012-06-18 16:47 ` Ying Han
  2012-06-19 12:05   ` Johannes Weiner
  2012-06-19 11:29 ` [PATCH V5 1/5] mm: memcg softlimit reclaim rework Johannes Weiner
  4 siblings, 1 reply; 14+ messages in thread
From: Ying Han @ 2012-06-18 16:47 UTC (permalink / raw)
  To: Michal Hocko, Johannes Weiner, Mel Gorman, KAMEZAWA Hiroyuki,
	Rik van Riel, Hillf Danton, Hugh Dickins, Dan Magenheimer,
	Andrew Morton
  Cc: linux-mm

The function zone_reclaimable() marks zone->all_unreclaimable based on
per-zone pages_scanned and reclaimable_pages. If all_unreclaimable is true,
alloc_pages could go to OOM instead of getting stuck in page reclaim.

In a memcg kernel, a cgroup under its softlimit is not targeted by global
reclaim. So we need to remove those pages from reclaimable_pages, otherwise
the reclaim mechanism will get stuck trying to reclaim from an
all_unreclaimable zone.
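
Condensed from the mm_inline.h hunk below, the per-zone count now walks the
memcg hierarchy and only sums the LRU pages of groups that are eligible for
reclaim (only the file LRUs are shown; the anon LRUs are added the same way
when swap is available):

        memcg = mem_cgroup_iter(NULL, NULL, NULL);
        do {
                struct lruvec *lruvec = mem_cgroup_zone_lruvec(zone, memcg);

                if (should_reclaim_mem_cgroup(memcg))
                        nr += get_lru_size(lruvec, LRU_INACTIVE_FILE) +
                              get_lru_size(lruvec, LRU_ACTIVE_FILE);
                memcg = mem_cgroup_iter(NULL, memcg, NULL);
        } while (memcg);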

Signed-off-by: Ying Han <yinghan@google.com>
---
 include/linux/mm_inline.h |   32 +++++++++++++++++++++++++-------
 mm/vmscan.c               |    8 --------
 2 files changed, 25 insertions(+), 15 deletions(-)

diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index 5cb796c..521a498 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -100,18 +100,36 @@ static __always_inline enum lru_list page_lru(struct page *page)
 	return lru;
 }
 
+static inline unsigned long get_lru_size(struct lruvec *lruvec,
+					 enum lru_list lru)
+{
+	if (!mem_cgroup_disabled())
+		return mem_cgroup_get_lru_size(lruvec, lru);
+
+	return zone_page_state(lruvec_zone(lruvec), NR_LRU_BASE + lru);
+}
+
 static inline unsigned long zone_reclaimable_pages(struct zone *zone)
 {
-	int nr;
+	int nr = 0;
+	struct mem_cgroup *memcg;
+
+	memcg = mem_cgroup_iter(NULL, NULL, NULL);
+	do {
+		struct lruvec *lruvec = mem_cgroup_zone_lruvec(zone, memcg);
 
-	nr = zone_page_state(zone, NR_ACTIVE_FILE) +
-	     zone_page_state(zone, NR_INACTIVE_FILE);
+		if (should_reclaim_mem_cgroup(memcg)) {
+			nr += get_lru_size(lruvec, LRU_INACTIVE_FILE) +
+			      get_lru_size(lruvec, LRU_ACTIVE_FILE);
 
-	if (nr_swap_pages > 0)
-		nr += zone_page_state(zone, NR_ACTIVE_ANON) +
-		      zone_page_state(zone, NR_INACTIVE_ANON);
+			if (nr_swap_pages > 0)
+				nr += get_lru_size(lruvec, LRU_ACTIVE_ANON) +
+				      get_lru_size(lruvec, LRU_INACTIVE_ANON);
+		}
+		memcg = mem_cgroup_iter(NULL, memcg, NULL);
+	} while (memcg);
 
-		return nr;
+	return nr;
 }
 
 static inline bool zone_reclaimable(struct zone *zone)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index b95344c..4a44890 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -145,14 +145,6 @@ static bool global_reclaim(struct scan_control *sc)
 }
 #endif
 
-static unsigned long get_lru_size(struct lruvec *lruvec, enum lru_list lru)
-{
-	if (!mem_cgroup_disabled())
-		return mem_cgroup_get_lru_size(lruvec, lru);
-
-	return zone_page_state(lruvec_zone(lruvec), NR_LRU_BASE + lru);
-}
-
 /*
  * Add a shrinker callback to be called from the vm
  */
-- 
1.7.7.3


* Re: [PATCH V5 1/5] mm: memcg softlimit reclaim rework
  2012-06-18 16:47 [PATCH V5 1/5] mm: memcg softlimit reclaim rework Ying Han
                   ` (3 preceding siblings ...)
  2012-06-18 16:47 ` [PATCH V5 5/5] mm: memcg discount pages under softlimit from per-zone reclaimable_pages Ying Han
@ 2012-06-19 11:29 ` Johannes Weiner
  2012-06-20  3:45   ` Ying Han
  4 siblings, 1 reply; 14+ messages in thread
From: Johannes Weiner @ 2012-06-19 11:29 UTC (permalink / raw)
  To: Ying Han
  Cc: Michal Hocko, Mel Gorman, KAMEZAWA Hiroyuki, Rik van Riel,
	Hillf Danton, Hugh Dickins, Dan Magenheimer, Andrew Morton,
	linux-mm

On Mon, Jun 18, 2012 at 09:47:27AM -0700, Ying Han wrote:
> This patch reverts the existing softlimit reclaim implementation and
> instead integrates softlimit reclaim into the existing global reclaim logic.
> 
> The new softlimit reclaim includes the following changes:
> 
> 1. add function should_reclaim_mem_cgroup()
> 
> Add the filter function should_reclaim_mem_cgroup() under the common function
> shrink_zone(). The latter is called from both per-memcg reclaim and global
> reclaim.
> 
> Today the softlimit takes effect only under global memory pressure. The memcgs
> get a free run above their softlimit until there is global memory contention.
> This patch doesn't change the semantics.

But it's quite a performance regression.  Maybe it would be better
after all to combine this change with 'make 0 the default'?

Yes, I was the one asking for the changes to be separated, if
possible, but I didn't mean regressing in between.  No forward
dependencies in patch series, please.

> Under global reclaim, we try to skip reclaiming from a memcg under its
> softlimit. To prevent reclaim from trying too hard on memcgs (above their
> softlimit) with only hard-to-reclaim pages, the reclaim priority is used to
> bypass the softlimit check. This is a trade-off between system performance and
> resource isolation.
> 
> 2. "hierarchical" softlimit reclaim
>
> This is consistent with how softlimit was previously implemented, where the
> pressure is put on the whole hierarchy as long as the "root" of the hierarchy
> is over its softlimit.
> 
> This part is not in my previous posts, and is quite different from my
> understanding of softlimit reclaim. After quite a lot of discussion with
> Johannes and Michal, I decided to go with it for now. It is designed to work
> with both trusted and untrusted setups.

This may be really confusing to someone uninvolved reading the
changelog as it doesn't have anything to do with what the patch
actually does.

It may be better to include past discussion outcomes in the
introductory email of a series.

> @@ -870,8 +672,6 @@ static void memcg_check_events(struct mem_cgroup *memcg, struct page *page)
>  		preempt_enable();
>  
>  		mem_cgroup_threshold(memcg);
> -		if (unlikely(do_softlimit))
> -			mem_cgroup_update_tree(memcg, page);
>  #if MAX_NUMNODES > 1
>  		if (unlikely(do_numainfo))
>  			atomic_inc(&memcg->numainfo_events);
> @@ -922,6 +722,31 @@ struct mem_cgroup *try_get_mem_cgroup_from_mm(struct mm_struct *mm)
>  	return memcg;
>  }
>  
> +bool should_reclaim_mem_cgroup(struct mem_cgroup *memcg)

I'm not too fond of the magical name.  The API provides information
about soft limits; the decision should rest with vmscan.c.

mem_cgroup_over_soft_limit() e.g.?

> +{
> +	if (mem_cgroup_disabled())
> +		return true;
> +
> +	/*
> +	 * We treat the root cgroup special here to always reclaim pages.
> +	 * Now root cgroup has its own lru, and the only chance to reclaim
> +	 * pages from it is through global reclaim. note, root cgroup does
> +	 * not trigger targeted reclaim.
> +	 */
> +	if (mem_cgroup_is_root(memcg))
> +		return true;

With the soft limit at 0, the comment is no longer accurate because
this check turns into a simple optimization.  We could check the
res_counter soft limit, which would always result in the root group
being above the limit, but we take the short cut.

> +	for (; memcg; memcg = parent_mem_cgroup(memcg)) {
> +		/* This is global reclaim, stop at root cgroup */
> +		if (mem_cgroup_is_root(memcg))
> +			break;

I don't see why you add this check and the comment does not help.

> +		if (res_counter_soft_limit_excess(&memcg->res))
> +			return true;
> +	}
> +
> +	return false;
> +}
> +
>  /**
>   * mem_cgroup_iter - iterate over memory cgroup hierarchy
>   * @root: hierarchy root


* Re: [PATCH V5 5/5] mm: memcg discount pages under softlimit from per-zone reclaimable_pages
  2012-06-18 16:47 ` [PATCH V5 5/5] mm: memcg discount pages under softlimit from per-zone reclaimable_pages Ying Han
@ 2012-06-19 12:05   ` Johannes Weiner
  2012-06-20  3:51     ` Ying Han
  2012-06-25 21:00     ` Ying Han
  0 siblings, 2 replies; 14+ messages in thread
From: Johannes Weiner @ 2012-06-19 12:05 UTC (permalink / raw)
  To: Ying Han
  Cc: Michal Hocko, Mel Gorman, KAMEZAWA Hiroyuki, Rik van Riel,
	Hillf Danton, Hugh Dickins, Dan Magenheimer, Andrew Morton,
	linux-mm

On Mon, Jun 18, 2012 at 09:47:31AM -0700, Ying Han wrote:
> The function zone_reclaimable() marks zone->all_unreclaimable based on
> per-zone pages_scanned and reclaimable_pages. If all_unreclaimable is true,
> alloc_pages could go to OOM instead of getting stuck in page reclaim.

There is no zone->all_unreclaimable at this point, you removed it in
the previous patch.

> In a memcg kernel, a cgroup under its softlimit is not targeted by global
> reclaim. So we need to remove those pages from reclaimable_pages, otherwise
> the reclaim mechanism will get stuck trying to reclaim from an
> all_unreclaimable zone.

Can't you check if zone->pages_scanned changed in between reclaim
runs?

Or sum up the scanned and reclaimable pages encountered while
iterating the hierarchy during regular reclaim and then use those
numbers in the equation instead of the per-zone counters?

Walking the full global hierarchy in all the places where we check if
a zone is reclaimable is a scalability nightmare.

> @@ -100,18 +100,36 @@ static __always_inline enum lru_list page_lru(struct page *page)
>  	return lru;
>  }
>  
> +static inline unsigned long get_lru_size(struct lruvec *lruvec,
> +					 enum lru_list lru)
> +{
> +	if (!mem_cgroup_disabled())
> +		return mem_cgroup_get_lru_size(lruvec, lru);
> +
> +	return zone_page_state(lruvec_zone(lruvec), NR_LRU_BASE + lru);
> +}
> +
>  static inline unsigned long zone_reclaimable_pages(struct zone *zone)
>  {
> -	int nr;
> +	int nr = 0;
> +	struct mem_cgroup *memcg;
> +
> +	memcg = mem_cgroup_iter(NULL, NULL, NULL);
> +	do {
> +		struct lruvec *lruvec = mem_cgroup_zone_lruvec(zone, memcg);
>  
> -	nr = zone_page_state(zone, NR_ACTIVE_FILE) +
> -	     zone_page_state(zone, NR_INACTIVE_FILE);
> +		if (should_reclaim_mem_cgroup(memcg)) {
> +			nr += get_lru_size(lruvec, LRU_INACTIVE_FILE) +
> +			      get_lru_size(lruvec, LRU_ACTIVE_FILE);

Sometimes, the number of reclaimable pages DOES include those of groups
for which should_reclaim_mem_cgroup() is false: when the priority
level is <= DEF_PRIORITY - 2, as you defined in 1/5!  This means that
you consider pages you just scanned unreclaimable, which can result in
the zone being unreclaimable after the DEF_PRIORITY - 2 cycle, no?


* Re: [PATCH V5 4/5] mm, vmscan: fix do_try_to_free_pages() livelock
  2012-06-18 16:47 ` [PATCH V5 4/5] mm, vmscan: fix do_try_to_free_pages() livelock Ying Han
@ 2012-06-19 18:29   ` KOSAKI Motohiro
  2012-06-20  3:29     ` Ying Han
  0 siblings, 1 reply; 14+ messages in thread
From: KOSAKI Motohiro @ 2012-06-19 18:29 UTC (permalink / raw)
  To: yinghan
  Cc: mhocko, hannes, mel, kamezawa.hiroyu, riel, dhillf, hughd,
	dan.magenheimer, akpm, linux-mm, kosaki.motohiro

On 6/18/2012 12:47 PM, Ying Han wrote:
> Currently, do_try_to_free_pages() can enter a livelock, because vmscan
> now has two conflicting policies.
> 
> 1) kswapd sleeps when it couldn't reclaim any page even after
>    reaching priority 0. This is to avoid an infinite loop in
>    kswapd(). That said, kswapd assumes direct reclaim makes enough
>    free pages via either regular page reclaim or the oom-killer.
>    This logic makes a kswapd -> direct-reclaim dependency.
> 2) Direct reclaim continues to reclaim without the oom-killer until
>    kswapd turns on zone->all_unreclaimable. This is to avoid a
>    too-early oom-kill.
>    This logic makes a direct-reclaim -> kswapd dependency.
> 
> In the worst case, direct reclaim may continue page reclaim forever
> when kswapd is asleep and no other thread wakes it up.
> 
> We can't turn on zone->all_unreclaimable because doing so is racy:
> the direct reclaim path doesn't take any lock. Thus this patch removes
> the zone->all_unreclaimable field completely and recalculates it every
> time.
> 
> Note: we can't take the approach of direct reclaim looking at
> zone->pages_scanned directly while kswapd continues to use
> zone->all_unreclaimable, because that is racy. Commit 929bea7c71
> (vmscan: all_unreclaimable() use zone->all_unreclaimable as a name)
> describes the details.
> 
> Reported-by: Aaditya Kumar <aaditya.kumar.30@gmail.com>
> Reported-by: Ying Han <yinghan@google.com>
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Acked-by: Rik van Riel <riel@redhat.com>

Please drop this. I've got some review comments about this patch and
I need to respin it. But thank you for paying attention to it.




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH V5 4/5] mm, vmscan: fix do_try_to_free_pages() livelock
  2012-06-19 18:29   ` KOSAKI Motohiro
@ 2012-06-20  3:29     ` Ying Han
  0 siblings, 0 replies; 14+ messages in thread
From: Ying Han @ 2012-06-20  3:29 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: mhocko, hannes, mel, kamezawa.hiroyu, riel, dhillf, hughd,
	dan.magenheimer, akpm, linux-mm

On Tue, Jun 19, 2012 at 11:29 AM, KOSAKI Motohiro
<kosaki.motohiro@jp.fujitsu.com> wrote:
> On 6/18/2012 12:47 PM, Ying Han wrote:
>> Currently, do_try_to_free_pages() can enter livelock. Because of,
>> now vmscan has two conflicted policies.
>>
>> 1) kswapd sleep when it couldn't reclaim any page even though
>>    reach priority 0. This is because to avoid kswapd() infinite
>>    loop. That said, kswapd assume direct reclaim makes enough
>>    free pages either regular page reclaim or oom-killer.
>>    This logic makes kswapd -> direct-reclaim dependency.
>> 2) direct reclaim continue to reclaim without oom-killer until
>>    kswapd turn on zone->all_unreclaimble. This is because
>>    to avoid too early oom-kill.
>>    This logic makes direct-reclaim -> kswapd dependency.
>>
>> In worst case, direct-reclaim may continue to page reclaim forever
>> when kswapd is slept and any other thread don't wakeup kswapd.
>>
>> We can't turn on zone->all_unreclaimable because this is racy.
>> direct reclaim path don't take any lock. Thus this patch removes
>> zone->all_unreclaimable field completely and recalculates every
>> time.
>>
>> Note: we can't take the idea that direct-reclaim see zone->pages_scanned
>> directly and kswapd continue to use zone->all_unreclaimable. Because,
>> it is racy. commit 929bea7c71 (vmscan: all_unreclaimable() use
>> zone->all_unreclaimable as a name) describes the detail.
>>
>> Reported-by: Aaditya Kumar <aaditya.kumar.30@gmail.com>
>> Reported-by: Ying Han <yinghan@google.com>
>> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>> Acked-by: Rik van Riel <riel@redhat.com>
>
> Please drop this. I've got some review comment about this patch and
> i need respin. but thank you for paying attention this.

Thanks for the heads up. Are you working on a new version of it? I
ask since I included this patch in my softlimit reclaim patchset as a
replacement for one I had.

--Ying
>
>
>


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH V5 1/5] mm: memcg softlimit reclaim rework
  2012-06-19 11:29 ` [PATCH V5 1/5] mm: memcg softlimit reclaim rework Johannes Weiner
@ 2012-06-20  3:45   ` Ying Han
  2012-06-20  8:53     ` Johannes Weiner
  0 siblings, 1 reply; 14+ messages in thread
From: Ying Han @ 2012-06-20  3:45 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Michal Hocko, Mel Gorman, KAMEZAWA Hiroyuki, Rik van Riel,
	Hillf Danton, Hugh Dickins, Dan Magenheimer, Andrew Morton,
	linux-mm

On Tue, Jun 19, 2012 at 4:29 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> On Mon, Jun 18, 2012 at 09:47:27AM -0700, Ying Han wrote:
>> This patch reverts all the existing softlimit reclaim implementations and
>> instead integrates the softlimit reclaim into existing global reclaim logic.
>>
>> The new softlimit reclaim includes the following changes:
>>
>> 1. add function should_reclaim_mem_cgroup()
>>
>> Add the filter function should_reclaim_mem_cgroup() under the common function
>> shrink_zone(). The later one is being called both from per-memcg reclaim as
>> well as global reclaim.
>>
>> Today the softlimit takes effect only under global memory pressure. The memcgs
>> get free run above their softlimit until there is a global memory contention.
>> This patch doesn't change the semantics.
>
> But it's quite a performance regression.  Maybe it would be better
> after all to combine this change with 'make 0 the default'?
>
> Yes, I was the one asking for the changes to be separated, if
> possible, but I didn't mean regressing in between.  No forward
> dependencies in patch series, please.

Ok, I have no problem squashing that patch in next time.

>
>> Under the global reclaim, we try to skip reclaiming from a memcg under its
>> softlimit. To prevent reclaim from trying too hard on hitting memcgs
>> (above softlimit) w/ only hard-to-reclaim pages, the reclaim priority is used
>> to skip the softlimit check. This is a trade-off of system performance and
>> resource isolation.
>>
>> 2. "hierarchical" softlimit reclaim
>>
>> This is consistant to how softlimit was previously implemented, where the
>> pressure is put for the whole hiearchy as long as the "root" of the hierarchy
>> over its softlimit.
>>
>> This part is not in my previous posts, and is quite different from my
>> understanding of softlimit reclaim. After quite a lot of discussions with
>> Johannes and Michal, i decided to go with it for now. And this is designed
>> to work with both trusted setups and untrusted setups.
>
> This may be really confusing to someone uninvolved reading the
> changelog as it doesn't have anything to do with what the patch
> actually does.
>
> It may be better to include past discussion outcomes in the
> introductary email of a series.

I will try to include some of the points from our last discussion in
the commit log.

>> @@ -870,8 +672,6 @@ static void memcg_check_events(struct mem_cgroup *memcg, struct page *page)
>>               preempt_enable();
>>
>>               mem_cgroup_threshold(memcg);
>> -             if (unlikely(do_softlimit))
>> -                     mem_cgroup_update_tree(memcg, page);
>>  #if MAX_NUMNODES > 1
>>               if (unlikely(do_numainfo))
>>                       atomic_inc(&memcg->numainfo_events);
>> @@ -922,6 +722,31 @@ struct mem_cgroup *try_get_mem_cgroup_from_mm(struct mm_struct *mm)
>>       return memcg;
>>  }
>>
>> +bool should_reclaim_mem_cgroup(struct mem_cgroup *memcg)
>
> I'm not too fond of the magical name.  The API provides an information
> about soft limits, the decision should rest with vmscan.c.
>
> mem_cgroup_over_soft_limit() e.g.?

That is fine w/ me.

>
>> +{
>> +     if (mem_cgroup_disabled())
>> +             return true;
>> +
>> +     /*
>> +      * We treat the root cgroup special here to always reclaim pages.
>> +      * Now root cgroup has its own lru, and the only chance to reclaim
>> +      * pages from it is through global reclaim. note, root cgroup does
>> +      * not trigger targeted reclaim.
>> +      */
>> +     if (mem_cgroup_is_root(memcg))
>> +             return true;
>
> With the soft limit at 0, the comment is no longer accurate because
> this check turns into a simple optimization.  We could check the
> res_counter soft limit, which would always result in the root group
> being above the limit, but we take the short cut.

For the root group, my intention here is to always reclaim pages from
it regardless of the softlimit setting, and the reason is exactly the
one in the comment. If the softlimit is set to 0 by default, I agree
this is then just a shortcut.

Is there anything you suggest I change here?

>
>> +     for (; memcg; memcg = parent_mem_cgroup(memcg)) {
>> +             /* This is global reclaim, stop at root cgroup */
>> +             if (mem_cgroup_is_root(memcg))
>> +                     break;
>
> I don't see why you add this check and the comment does not help.

The root cgroup would have its softlimit set to 0 (in most cases),
and not skipping root would make every memcg reclaimable here.

Thank you for reviewing!

--Ying
>
>> +             if (res_counter_soft_limit_excess(&memcg->res))
>> +                     return true;
>> +     }
>> +
>> +     return false;
>> +}
>> +
>>  /**
>>   * mem_cgroup_iter - iterate over memory cgroup hierarchy
>>   * @root: hierarchy root


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH V5 5/5] mm: memcg discount pages under softlimit from per-zone reclaimable_pages
  2012-06-19 12:05   ` Johannes Weiner
@ 2012-06-20  3:51     ` Ying Han
  2012-06-25 21:00     ` Ying Han
  1 sibling, 0 replies; 14+ messages in thread
From: Ying Han @ 2012-06-20  3:51 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Michal Hocko, Mel Gorman, KAMEZAWA Hiroyuki, Rik van Riel,
	Hillf Danton, Hugh Dickins, Dan Magenheimer, Andrew Morton,
	linux-mm

On Tue, Jun 19, 2012 at 5:05 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> On Mon, Jun 18, 2012 at 09:47:31AM -0700, Ying Han wrote:
>> The function zone_reclaimable() marks zone->all_unreclaimable based on
>> per-zone pages_scanned and reclaimable_pages. If all_unreclaimable is true,
>> alloc_pages could go to OOM instead of getting stuck in page reclaim.
>
> There is no zone->all_unreclaimable at this point, you removed it in
> the previous patch.

Ah, I forgot to update the commit log after applying the recent patch from Kosaki.

>> In memcg kernel, cgroup under its softlimit is not targeted under global
>> reclaim. So we need to remove those pages from reclaimable_pages, otherwise
>> it will cause reclaim mechanism to get stuck trying to reclaim from
>> all_unreclaimable zone.
>
> Can't you check if zone->pages_scanned changed in between reclaim
> runs?
>
> Or sum up the scanned and reclaimable pages encountered while
> iterating the hierarchy during regular reclaim and then use those
> numbers in the equation instead of the per-zone counters?
>
> Walking the full global hierarchy in all the places where we check if
> a zone is reclaimable is a scalability nightmare.

I agree on that; I will explore this a bit more.

>
>> @@ -100,18 +100,36 @@ static __always_inline enum lru_list page_lru(struct page *page)
>>       return lru;
>>  }
>>
>> +static inline unsigned long get_lru_size(struct lruvec *lruvec,
>> +                                      enum lru_list lru)
>> +{
>> +     if (!mem_cgroup_disabled())
>> +             return mem_cgroup_get_lru_size(lruvec, lru);
>> +
>> +     return zone_page_state(lruvec_zone(lruvec), NR_LRU_BASE + lru);
>> +}
>> +
>>  static inline unsigned long zone_reclaimable_pages(struct zone *zone)
>>  {
>> -     int nr;
>> +     int nr = 0;
>> +     struct mem_cgroup *memcg;
>> +
>> +     memcg = mem_cgroup_iter(NULL, NULL, NULL);
>> +     do {
>> +             struct lruvec *lruvec = mem_cgroup_zone_lruvec(zone, memcg);
>>
>> -     nr = zone_page_state(zone, NR_ACTIVE_FILE) +
>> -          zone_page_state(zone, NR_INACTIVE_FILE);
>> +             if (should_reclaim_mem_cgroup(memcg)) {
>> +                     nr += get_lru_size(lruvec, LRU_INACTIVE_FILE) +
>> +                           get_lru_size(lruvec, LRU_ACTIVE_FILE);
>
> Sometimes, the number of reclaimable pages DO include those of groups
> for which should_reclaim_mem_cgroup() is false: when the priority
> level is <= DEF_PRIORITY - 2, as you defined in 1/5!  This means that
> you consider pages you just scanned unreclaimable, which can result in
> the zone being unreclaimable after the DEF_PRIORITY - 2 cycle, no?

That is true, and I thought about it as well. I could also add the
priority check here, so that pages of memcgs below their softlimit are
only counted when the priority is < DEF_PRIORITY - 2.
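
Something along these lines, purely as a sketch: the priority argument
is hypothetical and would need to be threaded through from the callers,
and the exact boundary follows the wording above.

        static unsigned long zone_reclaimable_pages(struct zone *zone, int priority)
        {
                unsigned long nr = 0;
                struct mem_cgroup *memcg;

                memcg = mem_cgroup_iter(NULL, NULL, NULL);
                do {
                        struct lruvec *lruvec = mem_cgroup_zone_lruvec(zone, memcg);

                        /*
                         * Once global reclaim bypasses the softlimit filter
                         * (priority below DEF_PRIORITY - 2), every memcg's
                         * pages are fair game, so count them all; otherwise
                         * only count memcgs the filter lets reclaim touch.
                         */
                        if (priority < DEF_PRIORITY - 2 ||
                            should_reclaim_mem_cgroup(memcg))
                                nr += get_lru_size(lruvec, LRU_INACTIVE_FILE) +
                                      get_lru_size(lruvec, LRU_ACTIVE_FILE);

                        memcg = mem_cgroup_iter(NULL, memcg, NULL);
                } while (memcg);

                return nr;
        }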

--Ying


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH V5 1/5] mm: memcg softlimit reclaim rework
  2012-06-20  3:45   ` Ying Han
@ 2012-06-20  8:53     ` Johannes Weiner
  2012-06-20 14:59       ` Ying Han
  0 siblings, 1 reply; 14+ messages in thread
From: Johannes Weiner @ 2012-06-20  8:53 UTC (permalink / raw)
  To: Ying Han
  Cc: Michal Hocko, Mel Gorman, KAMEZAWA Hiroyuki, Rik van Riel,
	Hillf Danton, Hugh Dickins, Dan Magenheimer, Andrew Morton,
	linux-mm

On Tue, Jun 19, 2012 at 08:45:03PM -0700, Ying Han wrote:
> On Tue, Jun 19, 2012 at 4:29 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> > On Mon, Jun 18, 2012 at 09:47:27AM -0700, Ying Han wrote:
> >> +{
> >> +     if (mem_cgroup_disabled())
> >> +             return true;
> >> +
> >> +     /*
> >> +      * We treat the root cgroup special here to always reclaim pages.
> >> +      * Now root cgroup has its own lru, and the only chance to reclaim
> >> +      * pages from it is through global reclaim. note, root cgroup does
> >> +      * not trigger targeted reclaim.
> >> +      */
> >> +     if (mem_cgroup_is_root(memcg))
> >> +             return true;
> >
> > With the soft limit at 0, the comment is no longer accurate because
> > this check turns into a simple optimization.  We could check the
> > res_counter soft limit, which would always result in the root group
> > being above the limit, but we take the short cut.
> 
> For root group, my intention here is always reclaim pages from it
> regardless of the softlimit setting. And the reason is exactly the one
> in the comment. If the softlimit is set to 0 as default, I agree this
> is then a short cut.
> 
> Anything you suggest that I need to change here?

Well, not in this patch as it stands.  But once you squash the '0 per
default', it may be good to note that this is a shortcut.

> >> +     for (; memcg; memcg = parent_mem_cgroup(memcg)) {
> >> +             /* This is global reclaim, stop at root cgroup */
> >> +             if (mem_cgroup_is_root(memcg))
> >> +                     break;
> >
> > I don't see why you add this check and the comment does not help.
> 
> The root cgroup would have softlimit set to 0 ( in most of the cases
> ), and not skipping root will make everyone reclaimable here.

Only if root_mem_cgroup->use_hierarchy is set.  At the same time, we
usually behave as if this was the case, in accounting and reclaim.

Right now we allow setting the soft limit in root_mem_cgroup but it
does not make any sense.  After your patch, even less so, because of
these shortcut checks that now actually change semantics.  Could we
make this more consistent for users and forbid setting a soft limit in
root_mem_cgroup?  Patch below.

The reason this behaves differently from hard limits is because the
soft limits now have double meaning; they are upper limit and minimum
guarantee at the same time.  The unchangeable defaults in the root
cgroup should be "no guarantee" and "unlimited soft limit" at the same
time, but that is obviously not possible if these are opposing range
ends of the same knob.  So we pick no guarantees, always up for
reclaim when looking top down but also behave as if the soft limit was
unlimited in the root cgroup when looking bottom up.

This is what the second check does.  But I think it needs a clearer
comment.
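
For illustration only (this is not part of the patch appended below),
the walk could carry a comment along these lines; the wording is mine:

        for (; memcg; memcg = parent_mem_cgroup(memcg)) {
                /*
                 * Global reclaim: the root cgroup offers no guarantee and
                 * its soft limit is treated as unlimited when walking up
                 * the hierarchy, so stop here instead of letting its 0
                 * soft limit make every descendant look reclaimable.
                 */
                if (mem_cgroup_is_root(memcg))
                        break;
                if (res_counter_soft_limit_excess(&memcg->res))
                        return true;
        }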

---
From: Johannes Weiner <hannes@cmpxchg.org>
Subject: mm: memcg: forbid setting soft limit on root cgroup

Setting a soft limit in the root cgroup does not make sense, as soft
limits are enforced hierarchically and the root cgroup is the
hierarchical parent of every other cgroup.  It would not provide the
discrimination between groups that soft limits are usually used for.

With the current implementation of soft limits, it would only make
global reclaim more aggressive compared to target reclaim, but we
absolutely don't want anyone to rely on this behaviour.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ac35bcc..21c45a0 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3905,6 +3967,10 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
 			ret = mem_cgroup_resize_memsw_limit(memcg, val);
 		break;
 	case RES_SOFT_LIMIT:
+		if (mem_cgroup_is_root(memcg)) { /* Can't set limit on root */
+			ret = -EINVAL;
+			break;
+		}
 		ret = res_counter_memparse_write_strategy(buffer, &val);
 		if (ret)
 			break;


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH V5 1/5] mm: memcg softlimit reclaim rework
  2012-06-20  8:53     ` Johannes Weiner
@ 2012-06-20 14:59       ` Ying Han
  0 siblings, 0 replies; 14+ messages in thread
From: Ying Han @ 2012-06-20 14:59 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Michal Hocko, Mel Gorman, KAMEZAWA Hiroyuki, Rik van Riel,
	Hillf Danton, Hugh Dickins, Dan Magenheimer, Andrew Morton,
	linux-mm

On Wed, Jun 20, 2012 at 1:53 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> On Tue, Jun 19, 2012 at 08:45:03PM -0700, Ying Han wrote:
>> On Tue, Jun 19, 2012 at 4:29 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
>> > On Mon, Jun 18, 2012 at 09:47:27AM -0700, Ying Han wrote:
>> >> +{
>> >> +     if (mem_cgroup_disabled())
>> >> +             return true;
>> >> +
>> >> +     /*
>> >> +      * We treat the root cgroup special here to always reclaim pages.
>> >> +      * Now root cgroup has its own lru, and the only chance to reclaim
>> >> +      * pages from it is through global reclaim. note, root cgroup does
>> >> +      * not trigger targeted reclaim.
>> >> +      */
>> >> +     if (mem_cgroup_is_root(memcg))
>> >> +             return true;
>> >
>> > With the soft limit at 0, the comment is no longer accurate because
>> > this check turns into a simple optimization.  We could check the
>> > res_counter soft limit, which would always result in the root group
>> > being above the limit, but we take the short cut.
>>
>> For root group, my intention here is always reclaim pages from it
>> regardless of the softlimit setting. And the reason is exactly the one
>> in the comment. If the softlimit is set to 0 as default, I agree this
>> is then a short cut.
>>
>> Anything you suggest that I need to change here?
>
> Well, not in this patch as it stands.  But once you squash the '0 per
> default', it may be good to note that this is a shortcut.

Will include some notes next time.

>
>> >> +     for (; memcg; memcg = parent_mem_cgroup(memcg)) {
>> >> +             /* This is global reclaim, stop at root cgroup */
>> >> +             if (mem_cgroup_is_root(memcg))
>> >> +                     break;
>> >
>> > I don't see why you add this check and the comment does not help.
>>
>> The root cgroup would have softlimit set to 0 ( in most of the cases
>> ), and not skipping root will make everyone reclaimable here.
>
> Only if root_mem_cgroup->use_hierarchy is set.  At the same time, we
> usually behave as if this was the case, in accounting and reclaim.
>
> Right now we allow setting the soft limit in root_mem_cgroup but it
> does not make any sense.  After your patch, even less so, because of
> these shortcut checks that now actually change semantics.  Could we
> make this more consistent to users and forbid setting as soft limit in
> root_mem_cgroup?  Patch below.
>
> The reason this behaves differently from hard limits is because the
> soft limits now have double meaning; they are upper limit and minimum
> guarantee at the same time.  The unchangeable defaults in the root
> cgroup should be "no guarantee" and "unlimited soft limit" at the same
> time, but that is obviously not possible if these are opposing range
> ends of the same knob.  So we pick no guarantees, always up for
> reclaim when looking top down but also behave as if the soft limit was
> unlimited in the root cgroup when looking bottom up.
>
> This is what the second check does.  But I think it needs a clearer
> comment.
>
> ---
> From: Johannes Weiner <hannes@cmpxchg.org>
> Subject: mm: memcg: forbid setting soft limit on root cgroup
>
> Setting a soft limit in the root cgroup does not make sense, as soft
> limits are enforced hierarchically and the root cgroup is the
> hierarchical parent of every other cgroup.  It would not provide the
> discrimination between groups that soft limits are usually used for.
>
> With the current implementation of soft limits, it would only make
> global reclaim more aggressive compared to target reclaim, but we
> absolutely don't want anyone to rely on this behaviour.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index ac35bcc..21c45a0 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3905,6 +3967,10 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
>                        ret = mem_cgroup_resize_memsw_limit(memcg, val);
>                break;
>        case RES_SOFT_LIMIT:
> +               if (mem_cgroup_is_root(memcg)) { /* Can't set limit on root */
> +                       ret = -EINVAL;
> +                       break;
> +               }
>                ret = res_counter_memparse_write_strategy(buffer, &val);
>                if (ret)
>                        break;

Thanks, the patch makes sense to me and I will include it in the next post.

--Ying


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH V5 5/5] mm: memcg discount pages under softlimit from per-zone reclaimable_pages
  2012-06-19 12:05   ` Johannes Weiner
  2012-06-20  3:51     ` Ying Han
@ 2012-06-25 21:00     ` Ying Han
  1 sibling, 0 replies; 14+ messages in thread
From: Ying Han @ 2012-06-25 21:00 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Michal Hocko, Mel Gorman, KAMEZAWA Hiroyuki, Rik van Riel,
	Hillf Danton, Hugh Dickins, Dan Magenheimer, Andrew Morton,
	linux-mm

On Tue, Jun 19, 2012 at 5:05 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> On Mon, Jun 18, 2012 at 09:47:31AM -0700, Ying Han wrote:
>> The function zone_reclaimable() marks zone->all_unreclaimable based on
>> per-zone pages_scanned and reclaimable_pages. If all_unreclaimable is true,
>> alloc_pages could go to OOM instead of getting stuck in page reclaim.
>
> There is no zone->all_unreclaimable at this point, you removed it in
> the previous patch.
>
>> In memcg kernel, cgroup under its softlimit is not targeted under global
>> reclaim. So we need to remove those pages from reclaimable_pages, otherwise
>> it will cause reclaim mechanism to get stuck trying to reclaim from
>> all_unreclaimable zone.
>
> Can't you check if zone->pages_scanned changed in between reclaim
> runs?
>
> Or sum up the scanned and reclaimable pages encountered while
> iterating the hierarchy during regular reclaim and then use those
> numbers in the equation instead of the per-zone counters?
>
> Walking the full global hierarchy in all the places where we check if
> a zone is reclaimable is a scalability nightmare.

One way to solve this is to record the per-zone reclaimable pages
(the sum of reclaimable pages of memcgs above their softlimits) after
each shrink_zone(). The latter function already walks the memcg
hierarchy and checks the softlimit, so we don't need to do it again.
The new value, pages_reclaimed, is recorded per-zone, and the caller
side could use it to compare with zone->pages_scanned.
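
A rough sketch of that bookkeeping, reusing the helpers from this
patch; the zone->pages_reclaimed field is the hypothetical new value
mentioned above:

        /* Recorded by shrink_zone(), which already walks the hierarchy: */
        static void shrink_zone(struct zone *zone, struct scan_control *sc)
        {
                unsigned long nr_reclaimable = 0;
                struct mem_cgroup *memcg = mem_cgroup_iter(NULL, NULL, NULL);

                do {
                        struct lruvec *lruvec = mem_cgroup_zone_lruvec(zone, memcg);

                        if (should_reclaim_mem_cgroup(memcg)) {
                                nr_reclaimable +=
                                        get_lru_size(lruvec, LRU_INACTIVE_FILE) +
                                        get_lru_size(lruvec, LRU_ACTIVE_FILE);
                                /* ... existing per-memcg reclaim ... */
                        }
                        memcg = mem_cgroup_iter(NULL, memcg, NULL);
                } while (memcg);

                zone->pages_reclaimed = nr_reclaimable;
        }

        /*
         * Caller side, keeping the existing factor-of-six heuristic and
         * avoiding a second hierarchy walk:
         */
        static bool zone_reclaimable(struct zone *zone)
        {
                return zone->pages_scanned < zone->pages_reclaimed * 6;
        }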

While running tests on the patch, it turned out that I cannot
reproduce the problem (machine hang while over-committing the
softlimit) even without the patch. Then I realized that the problem
only exists in our internal version, where we don't have the check
"sc->priority < DEF_PRIORITY - 2" to bypass the softlimit check. The
reason we did it that way was to guarantee no global pressure on
high-priority memcgs. So in that case, global reclaim can never steal
any pages from any memcgs and the system can easily hang.

This is not the case in the version I am posting here. The patches
guarantee we don't keep looping over memcgs that are all under their
softlimit by:
1. detecting whether no memcg is above its softlimit and, if so,
skipping the softlimit check
2. only applying the softlimit check while the priority is >=
DEF_PRIORITY - 2
(a rough sketch of how these two guards combine follows below)
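
For illustration, a sketch of how those two guards might combine.
mem_cgroups_over_soft_limit() stands in for the detection added in 3/5;
the helper name and exact boundary are mine, not the patch's:

        static bool skip_memcg_in_global_reclaim(struct mem_cgroup *memcg,
                                                 struct scan_control *sc)
        {
                /* Past the first passes, reclaim everyone to avoid stalling. */
                if (sc->priority < DEF_PRIORITY - 2)
                        return false;

                /* If nobody is over its softlimit, filtering would skip all. */
                if (!mem_cgroups_over_soft_limit())
                        return false;

                /* Otherwise, skip memcgs still under their softlimit. */
                return !should_reclaim_mem_cgroup(memcg);
        }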

In summary, the problem described in this patch doesn't actually
exist, so I am thinking of dropping this one in my next post. Please
comment.

--Ying

>> @@ -100,18 +100,36 @@ static __always_inline enum lru_list page_lru(struct page *page)
>>       return lru;
>>  }
>>
>> +static inline unsigned long get_lru_size(struct lruvec *lruvec,
>> +                                      enum lru_list lru)
>> +{
>> +     if (!mem_cgroup_disabled())
>> +             return mem_cgroup_get_lru_size(lruvec, lru);
>> +
>> +     return zone_page_state(lruvec_zone(lruvec), NR_LRU_BASE + lru);
>> +}
>> +
>>  static inline unsigned long zone_reclaimable_pages(struct zone *zone)
>>  {
>> -     int nr;
>> +     int nr = 0;
>> +     struct mem_cgroup *memcg;
>> +
>> +     memcg = mem_cgroup_iter(NULL, NULL, NULL);
>> +     do {
>> +             struct lruvec *lruvec = mem_cgroup_zone_lruvec(zone, memcg);
>>
>> -     nr = zone_page_state(zone, NR_ACTIVE_FILE) +
>> -          zone_page_state(zone, NR_INACTIVE_FILE);
>> +             if (should_reclaim_mem_cgroup(memcg)) {
>> +                     nr += get_lru_size(lruvec, LRU_INACTIVE_FILE) +
>> +                           get_lru_size(lruvec, LRU_ACTIVE_FILE);
>
> Sometimes, the number of reclaimable pages DO include those of groups
> for which should_reclaim_mem_cgroup() is false: when the priority
> level is <= DEF_PRIORITY - 2, as you defined in 1/5!  This means that
> you consider pages you just scanned unreclaimable, which can result in
> the zone being unreclaimable after the DEF_PRIORITY - 2 cycle, no?


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2012-06-25 21:00 UTC | newest]

Thread overview: 14+ messages
2012-06-18 16:47 [PATCH V5 1/5] mm: memcg softlimit reclaim rework Ying Han
2012-06-18 16:47 ` [PATCH V2 2/5] mm: memcg set soft_limit_in_bytes to 0 by default Ying Han
2012-06-18 16:47 ` [PATCH V5 3/5] mm: memcg detect no memcgs above softlimit under zone reclaim Ying Han
2012-06-18 16:47 ` [PATCH V5 4/5] mm, vmscan: fix do_try_to_free_pages() livelock Ying Han
2012-06-19 18:29   ` KOSAKI Motohiro
2012-06-20  3:29     ` Ying Han
2012-06-18 16:47 ` [PATCH V5 5/5] mm: memcg discount pages under softlimit from per-zone reclaimable_pages Ying Han
2012-06-19 12:05   ` Johannes Weiner
2012-06-20  3:51     ` Ying Han
2012-06-25 21:00     ` Ying Han
2012-06-19 11:29 ` [PATCH V5 1/5] mm: memcg softlimit reclaim rework Johannes Weiner
2012-06-20  3:45   ` Ying Han
2012-06-20  8:53     ` Johannes Weiner
2012-06-20 14:59       ` Ying Han
