* [RFC][PATCH 0/5] memcg softlimit (Another one) v4
@ 2009-03-12  0:52 ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 68+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-03-12  0:52 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, balbir, nishimura, kosaki.motohiro

Hi, this is a patch series that implements softlimit for memcg.

I did some cleanup and bug fixes.

Anyway, I still have to look into the details of the "LRU scan algorithm"
after this.

How this works:

 (1) Set a softlimit threshold on the memcg.
     #echo 400M > /cgroups/my_group/memory.softlimit_in_bytes

 (2) Define the group's priority as a softlimit victim.
     #echo 3 > /cgroups/my_group/memory.softlimit_priority
     0 is the lowest, 8 is the highest.
     If set to "8", the softlimit feature ignores this group.
     The default value is "8".

 (3) Add some memory pressure and make kswapd() work.
     kswapd will reclaim memory from victims with regard to their priority.

Simple test on my 2-CPU x86-64 box with 1.6GB of memory (...VMware)

  While a process in one group mallocs 800MB of memory, touches it, and
  sleeps, run a kernel "make -j 16" under a victim cgroup with
  softlimit=300M, priority=3.

  Without softlimit => 400MB of the malloc'ed memory is swapped out.
  With softlimit    =>  80MB of the malloc'ed memory is swapped out.

I think the 80MB of swap comes from the direct memory reclaim path, and
this does not seem to be a terrible result.

I'll do more tests on other hosts. Any comments are welcome.

Thanks,
-Kame


* [BUGFIX][PATCH 1/5] memcg use correct scan number at reclaim
  2009-03-12  0:52 ` KAMEZAWA Hiroyuki
@ 2009-03-12  0:55   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 68+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-03-12  0:55 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, balbir, nishimura, kosaki.motohiro, akpm

Andrew, this [1/5] is a bug fix; the others are not.

==
From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

Even when page reclaim runs under a mem_cgroup, the number of pages to
scan is determined by the status of the global LRU. Fix that.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 mm/vmscan.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: mmotm-2.6.29-Mar10/mm/vmscan.c
===================================================================
--- mmotm-2.6.29-Mar10.orig/mm/vmscan.c
+++ mmotm-2.6.29-Mar10/mm/vmscan.c
@@ -1470,7 +1470,7 @@ static void shrink_zone(int priority, st
 		int file = is_file_lru(l);
 		int scan;
 
-		scan = zone_page_state(zone, NR_LRU_BASE + l);
+		scan = zone_nr_pages(zone, sc, l);
 		if (priority) {
 			scan >>= priority;
 			scan = (scan * percent[file]) / 100;


* [RFC][PATCH 2/5] add softlimit to res_counter
  2009-03-12  0:52 ` KAMEZAWA Hiroyuki
@ 2009-03-12  0:56   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 68+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-03-12  0:56 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, balbir, nishimura, kosaki.motohiro

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Adds an interface for defining a softlimit per memcg. (No handler in this patch.)
The softlimit parameter itself is added to res_counter, and
 res_counter_set_softlimit() and
 res_counter_check_under_softlimit() are provided as the interface.
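
As a rough illustration (not part of this patch): a reclaim-side caller is
expected to consume the check roughly as in the sketch below.
memcg_over_softlimit() is a hypothetical helper invented for this sketch;
only the res_counter_*_softlimit() calls are what this patch provides.

	/*
	 * Hypothetical sketch, not part of the patch: decide whether "mem"
	 * is a legitimate softlimit victim.
	 * res_counter_check_under_softlimit() walks up the res_counter
	 * hierarchy and returns true only while this counter and all of
	 * its parents are under their softlimits.
	 */
	static bool memcg_over_softlimit(struct mem_cgroup *mem)
	{
		return !res_counter_check_under_softlimit(&mem->res);
	}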


Changelog v2->v3:
 - softlimit was moved into res_counter.
Changelog v1->v2:
 - For refactoring, divided the patch into 2 parts; this patch just
   covers the memory.softlimit interface.
 - Removed the governor-detect routine; it was buggy by design.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/res_counter.h |    9 +++++++++
 kernel/res_counter.c        |   29 +++++++++++++++++++++++++++++
 mm/memcontrol.c             |   12 ++++++++++++
 3 files changed, 50 insertions(+)

Index: mmotm-2.6.29-Mar10/mm/memcontrol.c
===================================================================
--- mmotm-2.6.29-Mar10.orig/mm/memcontrol.c
+++ mmotm-2.6.29-Mar10/mm/memcontrol.c
@@ -2002,6 +2002,12 @@ static int mem_cgroup_write(struct cgrou
 		else
 			ret = mem_cgroup_resize_memsw_limit(memcg, val);
 		break;
+	case RES_SOFTLIMIT:
+		ret = res_counter_memparse_write_strategy(buffer, &val);
+		if (ret)
+			break;
+		ret = res_counter_set_softlimit(&memcg->res, val);
+		break;
 	default:
 		ret = -EINVAL; /* should be BUG() ? */
 		break;
@@ -2251,6 +2257,12 @@ static struct cftype mem_cgroup_files[] 
 		.read_u64 = mem_cgroup_read,
 	},
 	{
+		.name = "softlimit_in_bytes",
+		.private = MEMFILE_PRIVATE(_MEM, RES_SOFTLIMIT),
+		.write_string = mem_cgroup_write,
+		.read_u64 = mem_cgroup_read,
+	},
+	{
 		.name = "failcnt",
 		.private = MEMFILE_PRIVATE(_MEM, RES_FAILCNT),
 		.trigger = mem_cgroup_reset,
Index: mmotm-2.6.29-Mar10/include/linux/res_counter.h
===================================================================
--- mmotm-2.6.29-Mar10.orig/include/linux/res_counter.h
+++ mmotm-2.6.29-Mar10/include/linux/res_counter.h
@@ -39,6 +39,10 @@ struct res_counter {
 	 */
 	unsigned long long failcnt;
 	/*
+	 * the softlimit.
+	 */
+	unsigned long long softlimit;
+	/*
 	 * the lock to protect all of the above.
 	 * the routines below consider this to be IRQ-safe
 	 */
@@ -85,6 +89,7 @@ enum {
 	RES_MAX_USAGE,
 	RES_LIMIT,
 	RES_FAILCNT,
+	RES_SOFTLIMIT,
 };
 
 /*
@@ -178,4 +183,8 @@ static inline int res_counter_set_limit(
 	return ret;
 }
 
+/* res_counter's softlimit check can handle hierarchy in a proper way */
+int res_counter_set_softlimit(struct res_counter *cnt, unsigned long long val);
+bool res_counter_check_under_softlimit(struct res_counter *cnt);
+
 #endif
Index: mmotm-2.6.29-Mar10/kernel/res_counter.c
===================================================================
--- mmotm-2.6.29-Mar10.orig/kernel/res_counter.c
+++ mmotm-2.6.29-Mar10/kernel/res_counter.c
@@ -20,6 +20,7 @@ void res_counter_init(struct res_counter
 	spin_lock_init(&counter->lock);
 	counter->limit = (unsigned long long)LLONG_MAX;
 	counter->parent = parent;
+	counter->softlimit = (unsigned long long)LLONG_MAX;
 }
 
 int res_counter_charge_locked(struct res_counter *counter, unsigned long val)
@@ -88,6 +89,32 @@ void res_counter_uncharge(struct res_cou
 	local_irq_restore(flags);
 }
 
+int res_counter_set_softlimit(struct res_counter *cnt, unsigned long long val)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&cnt->lock, flags);
+	cnt->softlimit = val;
+	spin_unlock_irqrestore(&cnt->lock, flags);
+	return 0;
+}
+
+bool res_counter_check_under_softlimit(struct res_counter *cnt)
+{
+	struct res_counter *c;
+	unsigned long flags;
+	bool ret = true;
+
+	local_irq_save(flags);
+	for (c = cnt; ret && c != NULL; c = c->parent) {
+		spin_lock(&c->lock);
+		if (c->softlimit < c->usage)
+			ret = false;
+		spin_unlock(&c->lock);
+	}
+	local_irq_restore(flags);
+	return ret;
+}
 
 static inline unsigned long long *
 res_counter_member(struct res_counter *counter, int member)
@@ -101,6 +128,8 @@ res_counter_member(struct res_counter *c
 		return &counter->limit;
 	case RES_FAILCNT:
 		return &counter->failcnt;
+	case RES_SOFTLIMIT:
+		return &counter->softlimit;
 	};
 
 	BUG();


* [RFC][PATCH 3/5] memcg per zone softlimit scheduler core
  2009-03-12  0:52 ` KAMEZAWA Hiroyuki
@ 2009-03-12  0:57   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 68+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-03-12  0:57 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, balbir, nishimura, kosaki.motohiro

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

This patch implements a per-zone queue for softlimit and adds some
members to memcg.
(This patch adds softlimit_priority, but the interface to modify it is
 in another patch.)

There are the following requirements to implement softlimit:
  - softlimit has to check the whole usage of a memcg against its softlimit.
  - hierarchy should be handled.
  - We need to know the per-zone usage when making a cgroup a victim.
  - Keeping the behavior predictable for users is important.
  - We want to avoid too much scanning and global locks.

Considering the above, this patch's softlimit handling concept is
(see the consumer sketch after this list):
  - Handle softlimit by a priority queue.
  - Use a per-zone priority queue.
  - The victim selection algorithm is static-priority round robin.
  - Prepare 2 queues, an Active queue and an Inactive queue.
    If an entry on the Active queue doesn't hit the condition for softlimit,
    it's moved to the Inactive queue.
  - When reschedule_all() is called, the Inactive queues are merged back
    into the Active queue to check everything again.
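
As a rough sketch of the intended consumer (illustration only; the real
caller is wired into kswapd by patch 5 of this series, and
reclaim_some_pages() below is a placeholder, not a real kernel function):

	/* Sketch: drive the softlimit queue for one (nid, zid) pair. */
	static void softlimit_shrink_one(int nid, int zid)
	{
		struct mem_cgroup *mem;
		unsigned long reclaimed;

		mem = mem_cgroup_schedule(nid, zid);	/* pop a victim */
		if (!mem)
			return;
		reclaimed = reclaim_some_pages(mem);	/* placeholder */
		/* hint == true requeues to Active, false parks on Inactive */
		mem_cgroup_schedule_end(nid, zid, mem, reclaimed > 0);
	}

kswapd is then expected to call mem_cgroup_reschedule_all(nid) periodically
so that entries parked on the Inactive queues are checked again.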

For ease of review, the user interface etc. is in other patches.

Changelog v2->v3:
 - removed global rwsem.
 - renamed some definitions.
 - fixed a problem in the case where the memory cgroup is disabled.
 - almost all comments were rewritten.
 - removed sl_state from the per-zone struct; added queue->victim.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/memcontrol.h |   20 +++
 mm/memcontrol.c            |  232 ++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 250 insertions(+), 2 deletions(-)

Index: mmotm-2.6.29-Mar10/mm/memcontrol.c
===================================================================
--- mmotm-2.6.29-Mar10.orig/mm/memcontrol.c
+++ mmotm-2.6.29-Mar10/mm/memcontrol.c
@@ -116,6 +116,9 @@ struct mem_cgroup_per_zone {
 	unsigned long		count[NR_LRU_LISTS];
 
 	struct zone_reclaim_stat reclaim_stat;
+	/* For softlimit per-zone queue. See softlimit handling code. */
+	struct mem_cgroup *mem;
+	struct list_head sl_queue;
 };
 /* Macro for accessing counter */
 #define MEM_CGROUP_ZSTAT(mz, idx)	((mz)->count[(idx)])
@@ -175,7 +178,11 @@ struct mem_cgroup {
 	atomic_t	refcnt;
 
 	unsigned int	swappiness;
-
+	/*
+	 * priority of softlimit.
+	 */
+	int softlimit_priority;
+	struct mutex softlimit_mutex;
 	/*
 	 * statistics. This must be placed at the end of memcg.
 	 */
@@ -1916,6 +1923,221 @@ int mem_cgroup_force_empty_write(struct 
 	return mem_cgroup_force_empty(mem_cgroup_from_cont(cont), true);
 }
 
+/*
+ * SoftLimit
+ */
+/*
+ * Priority of softlimit. This is a scheduling parameter for softlimit victim
+ * selection logic. Low number is low priority. If priority is maximum, the
+ * cgroup will never be victim at softlimit memory reclaiming.
+ */
+#define SOFTLIMIT_MAXPRI (8)
+
+/* Name of queue for softlimit */
+enum {
+	SLQ_ACTIVE, /* queue for candidates of softlimit victim */
+	SLQ_INACTIVE, /* queue for not-candidates of softlimit victim */
+	SLQ_NUM,
+};
+/*
+ * On this queue, mem_cgroup_per_zone will be enqueued (sl_queue is used.)
+ * mz can take following 4 state.
+ * softlimitq_zone->victim == mz (selected by kswapd) or
+ * on ACTIVE queue (candidates for victim)
+ * on INACTIVE queue (not candidates for victim but prirority is not the highest
+ * out-of-queue (has the maximum priority or on some transition status)
+ */
+struct softlimitq_zone {
+	spinlock_t lock;
+	struct mem_cgroup_per_zone *victim;
+	struct list_head queue[SLQ_NUM][SOFTLIMIT_MAXPRI];
+};
+
+struct softlimitq_node {
+	struct softlimitq_zone zone[MAX_NR_ZONES];
+};
+
+struct softlimitq_node *softlimitq[MAX_NUMNODES];
+
+/* Return queue head for zone */
+static inline struct softlimitq_zone *softlimit_queue(int nid, int zid)
+{
+	return &softlimitq[nid]->zone[zid];
+}
+
+static void __init softlimitq_init(void)
+{
+	struct softlimitq_node *sqn;
+	struct softlimitq_zone *sqz;
+	int nid, zid, i;
+
+	for_each_node_state(nid, N_POSSIBLE) {
+		int tmp = nid;
+
+		if (!node_state(tmp, N_NORMAL_MEMORY))
+			tmp = -1;
+		sqn = kmalloc_node(sizeof(*sqn), GFP_KERNEL, tmp);
+		BUG_ON(!sqn);
+		for (zid = 0; zid < MAX_NR_ZONES; zid++) {
+			sqz = &sqn->zone[zid];
+			spin_lock_init(&sqz->lock);
+			sqz->victim = NULL;
+			for (i = 0; i < SOFTLIMIT_MAXPRI;i++) {
+				INIT_LIST_HEAD(&sqz->queue[SLQ_ACTIVE][i]);
+				INIT_LIST_HEAD(&sqz->queue[SLQ_INACTIVE][i]);
+			}
+		}
+		softlimitq[nid] = sqn;
+	}
+}
+
+/*
+ * Add (or remove) all mz of a mem_cgroup to the queue. Open-coded to
+ * handle a racy corner case. Called by the softlimit_priority user interface.
+ */
+static void memcg_softlimit_requeue(struct mem_cgroup *mem, int prio)
+{
+	int nid, zid;
+
+	/*
+	 * This mutex is for serializing multiple writers to softlimit file...
+	 * pessimistic but necessary for sanity.
+	 */
+	mutex_lock(&mem->softlimit_mutex);
+	mem->softlimit_priority = prio;
+
+	for_each_node_state(nid, N_POSSIBLE) {
+		for (zid = 0; zid < MAX_NR_ZONES; zid++) {
+			struct softlimitq_zone *sqz;
+			struct mem_cgroup_per_zone *mz;
+
+			sqz = softlimit_queue(nid, zid);
+			mz = mem_cgroup_zoneinfo(mem, nid, zid);
+			spin_lock(&sqz->lock);
+			/* If now grabbed by kswapd(), nothing to do */
+			if (sqz->victim != mz) {
+				list_del_init(&mz->sl_queue);
+				if (prio < SOFTLIMIT_MAXPRI)
+					list_add_tail(&mz->sl_queue,
+						&sqz->queue[SLQ_ACTIVE][prio]);
+			}
+			spin_unlock(&sqz->lock);
+		}
+	}
+	mutex_unlock(&mem->softlimit_mutex);
+}
+
+/*
+ * Join inactive list to active list to restart schedule and
+ * refresh queue information
+ */
+static void __softlimit_join_queue(int nid, int zid)
+{
+	struct softlimitq_zone *sqz = softlimit_queue(nid, zid);
+	int i;
+
+	spin_lock(&sqz->lock);
+	for (i = 0; i < SOFTLIMIT_MAXPRI; i++)
+		list_splice_tail_init(&sqz->queue[SLQ_INACTIVE][i],
+				      &sqz->queue[SLQ_ACTIVE][i]);
+	spin_unlock(&sqz->lock);
+}
+
+/* Return # of evictable pages in the zone */
+static int mz_evictable_usage(struct mem_cgroup_per_zone *mz)
+{
+	long usage = 0;
+
+	if (nr_swap_pages) {
+		usage += MEM_CGROUP_ZSTAT(mz, LRU_ACTIVE_ANON);
+		usage += MEM_CGROUP_ZSTAT(mz, LRU_INACTIVE_ANON);
+	}
+	usage += MEM_CGROUP_ZSTAT(mz, LRU_ACTIVE_FILE);
+	usage += MEM_CGROUP_ZSTAT(mz, LRU_INACTIVE_FILE);
+
+	return usage;
+}
+
+struct mem_cgroup *mem_cgroup_schedule(int nid, int zid)
+{
+	struct softlimitq_zone *sqz;
+	struct mem_cgroup_per_zone *mz;
+	struct mem_cgroup *mem, *ret;
+	int prio;
+
+	if (mem_cgroup_disabled())
+		return NULL;
+	sqz = softlimit_queue(nid, zid);
+	ret = NULL;
+	spin_lock(&sqz->lock);
+	for (prio = 0; prio < SOFTLIMIT_MAXPRI; prio++) {
+		if (list_empty(&sqz->queue[SLQ_ACTIVE][prio]))
+			continue;
+		mz = list_first_entry(&sqz->queue[SLQ_ACTIVE][prio],
+				      struct mem_cgroup_per_zone, sl_queue);
+		list_del_init(&mz->sl_queue);
+		/*
+		 * Victim will be selected if
+		 * 1. it has memory in this zone.
+		 * 2. usage is bigger than softlimit
+		 * 3. it's not obsolete.
+		 */
+		if (mz_evictable_usage(mz)) {
+			mem = mz->mem;
+			if (!res_counter_check_under_softlimit(&mem->res)
+			    && css_tryget(&mem->css)) {
+				sqz->victim = mz;
+				ret = mem;
+				break;
+			}
+		}
+		/* This is not a candidate. enqueue this to INACTIVE list */
+		list_add_tail(&mz->sl_queue, &sqz->queue[SLQ_INACTIVE][prio]);
+	}
+	spin_unlock(&sqz->lock);
+	return ret;
+}
+
+/* requeue selected victim */
+void
+mem_cgroup_schedule_end(int nid, int zid, struct mem_cgroup *mem, bool hint)
+{
+	struct mem_cgroup_per_zone *mz;
+	struct softlimitq_zone *sqz;
+	long usage;
+	int prio;
+
+	if (!mem)
+		return;
+
+	sqz = softlimit_queue(nid, zid);
+	mz = mem_cgroup_zoneinfo(mem, nid, zid);
+	spin_lock(&sqz->lock);
+	/* clear information */
+	sqz->victim = NULL;
+	prio = mem->softlimit_priority;
+	/* priority can be changed */
+	if (prio == SOFTLIMIT_MAXPRI)
+		goto out;
+
+	usage = mz_evictable_usage(mz);
+	/* worth being requeued? */
+	if (hint)
+		list_add_tail(&mz->sl_queue, &sqz->queue[SLQ_ACTIVE][prio]);
+	else
+		list_add_tail(&mz->sl_queue, &sqz->queue[SLQ_INACTIVE][prio]);
+out:
+	spin_unlock(&sqz->lock);
+	css_put(&mem->css);
+}
+
+void mem_cgroup_reschedule_all(int nid)
+{
+	int zid;
+
+	for (zid = 0; zid < MAX_NR_ZONES; zid++)
+		__softlimit_join_queue(nid, zid);
+}
 
 static u64 mem_cgroup_hierarchy_read(struct cgroup *cont, struct cftype *cft)
 {
@@ -2356,6 +2578,8 @@ static int alloc_mem_cgroup_per_zone_inf
 		mz = &pn->zoneinfo[zone];
 		for_each_lru(l)
 			INIT_LIST_HEAD(&mz->lists[l]);
+		INIT_LIST_HEAD(&mz->sl_queue);
+		mz->mem = mem;
 	}
 	return 0;
 }
@@ -2466,6 +2690,7 @@ mem_cgroup_create(struct cgroup_subsys *
 	/* root ? */
 	if (cont->parent == NULL) {
 		enable_swap_cgroup();
+		softlimitq_init();
 		parent = NULL;
 	} else {
 		parent = mem_cgroup_from_cont(cont->parent);
@@ -2487,6 +2712,8 @@ mem_cgroup_create(struct cgroup_subsys *
 		res_counter_init(&mem->memsw, NULL);
 	}
 	mem->last_scanned_child = 0;
+	mem->softlimit_priority = SOFTLIMIT_MAXPRI;
+	mutex_init(&mem->softlimit_mutex);
 	spin_lock_init(&mem->reclaim_param_lock);
 
 	if (parent)
@@ -2510,7 +2737,8 @@ static void mem_cgroup_destroy(struct cg
 				struct cgroup *cont)
 {
 	struct mem_cgroup *mem = mem_cgroup_from_cont(cont);
-
+	/* By calling this with MAXPRI, mz->sl_queue will be removed */
+	memcg_softlimit_requeue(mem, SOFTLIMIT_MAXPRI);
 	mem_cgroup_put(mem);
 }
 
Index: mmotm-2.6.29-Mar10/include/linux/memcontrol.h
===================================================================
--- mmotm-2.6.29-Mar10.orig/include/linux/memcontrol.h
+++ mmotm-2.6.29-Mar10/include/linux/memcontrol.h
@@ -117,6 +117,12 @@ static inline bool mem_cgroup_disabled(v
 
 extern bool mem_cgroup_oom_called(struct task_struct *task);
 
+/* softlimit */
+struct mem_cgroup *mem_cgroup_schedule(int nid, int zid);
+void mem_cgroup_schedule_end(int nid, int zid,
+		struct mem_cgroup *mem, bool hint);
+void mem_cgroup_reschedule_all(int nid);
+
 #else /* CONFIG_CGROUP_MEM_RES_CTLR */
 struct mem_cgroup;
 
@@ -264,6 +270,20 @@ mem_cgroup_print_oom_info(struct mem_cgr
 {
 }
 
+static inline struct mem_cgroup *mem_cgroup_schedule(int nid, int zid)
+{
+	return NULL;
+}
+
+static inline void mem_cgroup_schedule_end(int nid, int zid,
+	struct mem_cgroup *mem, bool hint)
+{
+}
+
+static inline void mem_cgroup_reschedule_all(int nid)
+{
+}
+
 #endif /* CONFIG_CGROUP_MEM_CONT */
 
 #endif /* _LINUX_MEMCONTROL_H */


* [RFC][PATCH 4/5] memcg softlimit_priority
  2009-03-12  0:52 ` KAMEZAWA Hiroyuki
@ 2009-03-12  0:58   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 68+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-03-12  0:58 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, balbir, nishimura, kosaki.motohiro

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

An interface to set/read the softlimit priority of a cgroup.

Changelog: v2->v3
 - removed complicated handling of hierarchy,
   i.e. changes in priority don't affect children.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 mm/memcontrol.c |   31 +++++++++++++++++++++++++++++--
 1 file changed, 29 insertions(+), 2 deletions(-)

Index: mmotm-2.6.29-Mar10/mm/memcontrol.c
===================================================================
--- mmotm-2.6.29-Mar10.orig/mm/memcontrol.c
+++ mmotm-2.6.29-Mar10/mm/memcontrol.c
@@ -217,6 +217,8 @@ pcg_default_flags[NR_CHARGE_TYPE] = {
 #define MEMFILE_TYPE(val)	(((val) >> 16) & 0xffff)
 #define MEMFILE_ATTR(val)	((val) & 0xffff)
 
+#define MEM_SOFTLIMIT_PRIO     (0x10)
+
 static void mem_cgroup_get(struct mem_cgroup *mem);
 static void mem_cgroup_put(struct mem_cgroup *mem);
 static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem);
@@ -2187,7 +2189,14 @@ static u64 mem_cgroup_read(struct cgroup
 	name = MEMFILE_ATTR(cft->private);
 	switch (type) {
 	case _MEM:
-		val = res_counter_read_u64(&mem->res, name);
+		switch (name) {
+		case MEM_SOFTLIMIT_PRIO:
+			val = mem->softlimit_priority;
+			break;
+		default:
+			val = res_counter_read_u64(&mem->res, name);
+			break;
+		}
 		break;
 	case _MEMSWAP:
 		if (do_swap_account)
@@ -2290,6 +2299,18 @@ static int mem_cgroup_reset(struct cgrou
 	return 0;
 }
 
+static int mem_cgroup_write_softlimit_priority(struct cgroup *cgrp,
+					struct cftype *cft,
+					u64 val)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_cont(cgrp);
+	int priority = (int)val;
+
+	if ((priority < 0) || (priority > SOFTLIMIT_MAXPRI))
+		return -EINVAL;
+	memcg_softlimit_requeue(memcg, priority);
+	return 0;
+}
 
 /* For read statistics */
 enum {
@@ -2485,6 +2506,12 @@ static struct cftype mem_cgroup_files[] 
 		.read_u64 = mem_cgroup_read,
 	},
 	{
+		.name = "softlimit_priority",
+		.private = MEMFILE_PRIVATE(_MEM, MEM_SOFTLIMIT_PRIO),
+		.write_u64 = mem_cgroup_write_softlimit_priority,
+		.read_u64 = mem_cgroup_read,
+	},
+	{
 		.name = "failcnt",
 		.private = MEMFILE_PRIVATE(_MEM, RES_FAILCNT),
 		.trigger = mem_cgroup_reset,
@@ -2711,8 +2738,8 @@ mem_cgroup_create(struct cgroup_subsys *
 		res_counter_init(&mem->res, NULL);
 		res_counter_init(&mem->memsw, NULL);
 	}
-	mem->last_scanned_child = 0;
 	mem->softlimit_priority = SOFTLIMIT_MAXPRI;
+	mem->last_scanned_child = 0;
 	mutex_init(&mem->softlimit_mutex);
 	spin_lock_init(&mem->reclaim_param_lock);
 


* [RFC][PATCH 5/5] memcg softlimit hooks to kswapd
  2009-03-12  0:52 ` KAMEZAWA Hiroyuki
@ 2009-03-12  1:00   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 68+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-03-12  1:00 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, balbir, nishimura, kosaki.motohiro

This patch needs MORE investigation...

==
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

This patch adds hooks for memcg's softlimit to kswapd().

The softlimit handler is called...
  - before the generic shrink_zone() is called.
  - The number of pages to be scanned depends on priority.
  - If there is not enough progress, the selected memcg is moved to the
    INACTIVE queue.
  - At each call of balance_pgdat(), the softlimit queue is rebalanced.

Changelog: v3 -> v4
 - moved "sc" to be a local variable.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 mm/vmscan.c |   52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

Index: mmotm-2.6.29-Mar10/mm/vmscan.c
===================================================================
--- mmotm-2.6.29-Mar10.orig/mm/vmscan.c
+++ mmotm-2.6.29-Mar10/mm/vmscan.c
@@ -1733,6 +1733,49 @@ unsigned long try_to_free_mem_cgroup_pag
 }
 #endif
 
+static void shrink_zone_softlimit(struct zone *zone, int order, int priority,
+			   int target, int end_zone)
+{
+	int scan = SWAP_CLUSTER_MAX;
+	int nid = zone->zone_pgdat->node_id;
+	int zid = zone_idx(zone);
+	struct mem_cgroup *mem;
+	struct scan_control sc =  {
+		.gfp_mask = GFP_KERNEL,
+		.may_writepage = !laptop_mode,
+		.swap_cluster_max = SWAP_CLUSTER_MAX,
+		.may_unmap = 1,
+		.swappiness = vm_swappiness,
+		.order = order,
+		.mem_cgroup = NULL,
+		.isolate_pages = mem_cgroup_isolate_pages,
+	};
+
+	scan = target * 2;
+
+	sc.nr_scanned = 0;
+	sc.nr_reclaimed = 0;
+	while (scan > 0) {
+		if (zone_watermark_ok(zone, order, target, end_zone, 0))
+			break;
+		mem = mem_cgroup_schedule(nid, zid);
+		if (!mem)
+			return;
+		sc.mem_cgroup = mem;
+
+		sc.nr_reclaimed = 0;
+		shrink_zone(priority, zone, &sc);
+
+		if (sc.nr_reclaimed >= SWAP_CLUSTER_MAX/2)
+			mem_cgroup_schedule_end(nid, zid, mem, true);
+		else
+			mem_cgroup_schedule_end(nid, zid, mem, false);
+
+		scan -= sc.nr_scanned;
+	}
+
+	return;
+}
 /*
  * For kswapd, balance_pgdat() will work across all this node's zones until
  * they are all at pages_high.
@@ -1776,6 +1819,8 @@ static unsigned long balance_pgdat(pg_da
 	 */
 	int temp_priority[MAX_NR_ZONES];
 
+	/* Refill softlimit queue */
+	mem_cgroup_reschedule_all(pgdat->node_id);
 loop_again:
 	total_scanned = 0;
 	sc.nr_reclaimed = 0;
@@ -1856,6 +1901,13 @@ loop_again:
 					       end_zone, 0))
 				all_zones_ok = 0;
 			temp_priority[i] = priority;
+
+			/*
+			 * Try soft limit first.  This reclaims pages
+			 * with regard to the user's hint.
+			 */
+			shrink_zone_softlimit(zone, order, priority,
+					       8 * zone->pages_high, end_zone);
 			sc.nr_scanned = 0;
 			note_zone_scanning_priority(zone, priority);
 			/*


* [RFC][PATCH 6/5] softlimit document
  2009-03-12  0:52 ` KAMEZAWA Hiroyuki
@ 2009-03-12  1:01   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 68+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-03-12  1:01 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, balbir, nishimura, kosaki.motohiro

Sorry...6th patch.
==
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Documentation for softlimit

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 Documentation/cgroups/memory.txt |   19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

Index: mmotm-2.6.29-Mar10/Documentation/cgroups/memory.txt
===================================================================
--- mmotm-2.6.29-Mar10.orig/Documentation/cgroups/memory.txt
+++ mmotm-2.6.29-Mar10/Documentation/cgroups/memory.txt
@@ -322,6 +322,25 @@ will be charged as a new owner of it.
   - a cgroup which uses hierarchy and it has child cgroup.
   - a cgroup which uses hierarchy and not the root of hierarchy.
 
+5.4 softlimit
+  Memory cgroup supports softlimit and has 2 params for control.
+    - memory.softlimit_in_bytes
+	- softlimit to this cgroup.
+    - memory.softlimit_priority.
+	- priority of this cgroup at softlimit reclaim.
+	  Allowed priority level is 3-0 and 3 is the lowest.
+	  If 0, this cgroup will not be target of softlimit.
+
+  At memory shortage of the system (or a local node/zone), softlimit helps
+  kswapd(), the global memory reclaim kernel thread, by informing kswapd of
+  victim cgroups to be shrunk.
+
+  Victim selection logic:
+  The kernel searches from the lowest priority (3) up to the highest (1).
+  If it finds a cgroup which has usage larger than its softlimit, it steals
+  memory from it.
+  If multiple cgroups are at the same priority, each cgroup will be a
+  victim in turn.
 
 6. Hierarchy support
 


* Re: [RFC][PATCH 6/5] softlimit document
  2009-03-12  1:01   ` KAMEZAWA Hiroyuki
@ 2009-03-12  1:54     ` Li Zefan
  0 siblings, 0 replies; 68+ messages in thread
From: Li Zefan @ 2009-03-12  1:54 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, balbir, nishimura, kosaki.motohiro

> +    - memory.softlimit_priority.
> +	- priority of this cgroup at softlimit reclaim.
> +	  Allowed priority level is 3-0 and 3 is the lowest.
> +	  If 0, this cgroup will not be target of softlimit.
> +

Seems this document is the older one...



* Re: [RFC][PATCH 6/5] softlimit document
  2009-03-12  1:54     ` Li Zefan
@ 2009-03-12  2:01       ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 68+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-03-12  2:01 UTC (permalink / raw)
  To: Li Zefan; +Cc: linux-mm, linux-kernel, balbir, nishimura, kosaki.motohiro

On Thu, 12 Mar 2009 09:54:02 +0800
Li Zefan <lizf@cn.fujitsu.com> wrote:

> > +    - memory.softlimit_priority.
> > +	- priority of this cgroup at softlimit reclaim.
> > +	  Allowed priority level is 3-0 and 3 is the lowest.
> > +	  If 0, this cgroup will not be target of softlimit.
> > +
> 
> Seems this document is the older one...
> 
Ouch..my merge miss...please ignore this 6/5.

Thanks,
-Kame



* Re: [RFC][PATCH 0/5] memcg softlimit (Another one) v4
  2009-03-12  0:52 ` KAMEZAWA Hiroyuki
@ 2009-03-12  3:46   ` Balbir Singh
  -1 siblings, 0 replies; 68+ messages in thread
From: Balbir Singh @ 2009-03-12  3:46 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, linux-kernel, nishimura, kosaki.motohiro

* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-03-12 09:52:47]:

> Hi, this is a patch for implemnt softlimit to memcg.
> 
> I did some clean up and bug fixes. 
> 
> Anyway I have to look into details of "LRU scan algorithm" after this.
> 
> How this works:
> 
>  (1) Set softlimit threshold to memcg.
>      #echo 400M > /cgroups/my_group/memory.softlimit_in_bytes.
> 
>  (2) Define priority as victim.
>      #echo 3 > /cgroups/my_group/memory.softlimit_priority.
>      0 is the lowest, 8 is the highest.
>      If "8", softlimit feature ignore this group.
>      default value is "8".
> 
>  (3) Add some memory pressure and make kswapd() work.
>      kswapd will reclaim memory from victims paying regard to priority.
> 
> Simple test on my 2cpu 86-64 box with 1.6Gbytes of memory (...vmware)
> 
>   While a process malloc 800MB of memory and touch it and sleep in a group,
>   run kernel make -j 16 under a victim cgroup with softlimit=300M, priority=3.
> 
>   Without softlimit => 400MB of malloc'ed memory are swapped out.
>   With softlimit    =>  80MB of malloc'ed memory are swapped out. 
> 
> I think 80MB of swap is from direct memory reclaim path. And this
> seems not to be terrible result.
> 
> I'll do more test on other hosts. Any comments are welcome.
>

I've tested so far by creating two cgroups and then:

a. Assigning limits of 1G and 2G and running a memory allocation and
   touch test (see the sketch after this list)
b. Same as (a) with 1G and 1G
c. Same as (a) with 0 and 1G
d. Same as (a) with 0 and 0
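
A minimal sketch of the kind of allocate-and-touch program these
scenarios use (the default size, page step, and sleep duration are
assumptions for illustration, not the exact test):

/* alloc_touch.c: allocate N MB, touch every page, then hold them.
 * Illustrative only; run it from a task already attached to the
 * cgroup so that the pages are charged there. */
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	size_t mb = (argc > 1) ? strtoul(argv[1], NULL, 0) : 800;
	size_t len = mb << 20;
	size_t off;
	char *buf = malloc(len);

	if (!buf)
		return 1;
	for (off = 0; off < len; off += 4096)
		buf[off] = 1;	/* fault in each page so it is charged */
	sleep(600);		/* keep the pages resident as a reclaim target */
	return 0;
}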

More comments in individual patches.

-- 
	Balbir

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BUGFIX][PATCH 1/5] memcg use correct scan number at reclaim
  2009-03-12  0:55   ` KAMEZAWA Hiroyuki
@ 2009-03-12  3:49     ` Balbir Singh
  0 siblings, 0 replies; 68+ messages in thread
From: Balbir Singh @ 2009-03-12  3:49 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, nishimura, kosaki.motohiro, akpm

* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-03-12 09:55:16]:

> Andrew, this [1/5] is a bug fix, others are not.
> 
> ==
> From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> 
> Even when page reclaim is under mem_cgroup, # of scan page is determined by
> status of global LRU. Fix that.
> 
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  mm/vmscan.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Index: mmotm-2.6.29-Mar10/mm/vmscan.c
> ===================================================================
> --- mmotm-2.6.29-Mar10.orig/mm/vmscan.c
> +++ mmotm-2.6.29-Mar10/mm/vmscan.c
> @@ -1470,7 +1470,7 @@ static void shrink_zone(int priority, st
>  		int file = is_file_lru(l);
>  		int scan;
> 
> -		scan = zone_page_state(zone, NR_LRU_BASE + l);
> +		scan = zone_nr_pages(zone, sc, l);

I have the exact same patch in my patch queue. BTW, mem_cgroup_zone_nr_pages is
buggy. We don't hold any sort of lock while extracting
MEM_CGROUP_ZSTAT (ideally we need zone->lru_lock). Without that, how do
we guarantee that MEM_CGROUP_ZSTAT is not changing at the same time as
we are reading it?

>  		if (priority) {
>  			scan >>= priority;
>  			scan = (scan * percent[file]) / 100;
> 
> 

-- 
	Balbir

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BUGFIX][PATCH 1/5] memcg use correct scan number at reclaim
  2009-03-12  3:49     ` Balbir Singh
@ 2009-03-12  3:51       ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 68+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-03-12  3:51 UTC (permalink / raw)
  To: balbir; +Cc: linux-mm, linux-kernel, nishimura, kosaki.motohiro, akpm

On Thu, 12 Mar 2009 09:19:18 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-03-12 09:55:16]:
> 
> > Andrew, this [1/5] is a bug fix, others are not.
> > 
> > ==
> > From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > 
> > Even when page reclaim is under mem_cgroup, # of scan page is determined by
> > status of global LRU. Fix that.
> > 
> > Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > ---
> >  mm/vmscan.c |    2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > Index: mmotm-2.6.29-Mar10/mm/vmscan.c
> > ===================================================================
> > --- mmotm-2.6.29-Mar10.orig/mm/vmscan.c
> > +++ mmotm-2.6.29-Mar10/mm/vmscan.c
> > @@ -1470,7 +1470,7 @@ static void shrink_zone(int priority, st
> >  		int file = is_file_lru(l);
> >  		int scan;
> > 
> > -		scan = zone_page_state(zone, NR_LRU_BASE + l);
> > +		scan = zone_nr_pages(zone, sc, l);
> 
> I have the exact same patch in my patch queue. BTW, mem_cgroup_zone_nr_pages is
> buggy. We don't hold any sort of lock while extracting
> MEM_CGROUP_ZSTAT (ideally we need zone->lru_lock). Without that, how do
> we guarantee that MEM_CGROUP_ZSTAT is not changing at the same time as
> we are reading it?
> 
Is it a big problem? We don't need a very precise value, and ZSTAT only
sees increments and decrements. So I tend to ignore this small race.
(And it's an unsigned long, not a long long.)

Thanks,
-Kame


> >  		if (priority) {
> >  			scan >>= priority;
> >  			scan = (scan * percent[file]) / 100;
> > 
> > 
> 
> -- 
> 	Balbir
> 


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC][PATCH 2/5] add softlimit to res_counter
  2009-03-12  0:56   ` KAMEZAWA Hiroyuki
@ 2009-03-12  3:54     ` Balbir Singh
  0 siblings, 0 replies; 68+ messages in thread
From: Balbir Singh @ 2009-03-12  3:54 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, linux-kernel, nishimura, kosaki.motohiro

* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-03-12 09:56:12]:

> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> Adds an interface for defining a softlimit per memcg. (No handler in this patch.)
> The softlimit parameter itself is added to res_counter, and
>  res_counter_set_softlimit() and
>  res_counter_check_under_softlimit() are provided as an interface.
> 
> 
> Changelog v2->v3:
>  - softlimit is moved to res_counter

Good, this is very similar to the patch I have in my post as well. Please feel
free to add my signed-off-by on this patch, but please see below for
comments.

> Changelog v1->v2:
>  - For refactoring, divided a patch into 2 part and this patch just
>    involves memory.softlimit interface.
>  - Removed governor-detect routine, it was buggy in design.
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  include/linux/res_counter.h |    9 +++++++++
>  kernel/res_counter.c        |   29 +++++++++++++++++++++++++++++
>  mm/memcontrol.c             |   12 ++++++++++++
>  3 files changed, 50 insertions(+)
> 
> Index: mmotm-2.6.29-Mar10/mm/memcontrol.c
> ===================================================================
> --- mmotm-2.6.29-Mar10.orig/mm/memcontrol.c
> +++ mmotm-2.6.29-Mar10/mm/memcontrol.c
> @@ -2002,6 +2002,12 @@ static int mem_cgroup_write(struct cgrou
>  		else
>  			ret = mem_cgroup_resize_memsw_limit(memcg, val);
>  		break;
> +	case RES_SOFTLIMIT:
> +		ret = res_counter_memparse_write_strategy(buffer, &val);
> +		if (ret)
> +			break;
> +		ret = res_counter_set_softlimit(&memcg->res, val);
> +		break;
>  	default:
>  		ret = -EINVAL; /* should be BUG() ? */
>  		break;
> @@ -2251,6 +2257,12 @@ static struct cftype mem_cgroup_files[] 
>  		.read_u64 = mem_cgroup_read,
>  	},
>  	{
> +		.name = "softlimit_in_bytes",
> +		.private = MEMFILE_PRIVATE(_MEM, RES_SOFTLIMIT),
> +		.write_string = mem_cgroup_write,
> +		.read_u64 = mem_cgroup_read,
> +	},
> +	{
>  		.name = "failcnt",
>  		.private = MEMFILE_PRIVATE(_MEM, RES_FAILCNT),
>  		.trigger = mem_cgroup_reset,
> Index: mmotm-2.6.29-Mar10/include/linux/res_counter.h
> ===================================================================
> --- mmotm-2.6.29-Mar10.orig/include/linux/res_counter.h
> +++ mmotm-2.6.29-Mar10/include/linux/res_counter.h
> @@ -39,6 +39,10 @@ struct res_counter {
>  	 */
>  	unsigned long long failcnt;
>  	/*
> +	 * the softlimit.
> +	 */
> +	unsigned long long softlimit;
> +	/*
>  	 * the lock to protect all of the above.
>  	 * the routines below consider this to be IRQ-safe
>  	 */
> @@ -85,6 +89,7 @@ enum {
>  	RES_MAX_USAGE,
>  	RES_LIMIT,
>  	RES_FAILCNT,
> +	RES_SOFTLIMIT,
>  };
> 
>  /*
> @@ -178,4 +183,8 @@ static inline int res_counter_set_limit(
>  	return ret;
>  }
> 
> +/* res_counter's softlimit check can handles hierarchy in proper way */
> +int res_counter_set_softlimit(struct res_counter *cnt, unsigned long long val);
> +bool res_counter_check_under_softlimit(struct res_counter *cnt);
> +
>  #endif
> Index: mmotm-2.6.29-Mar10/kernel/res_counter.c
> ===================================================================
> --- mmotm-2.6.29-Mar10.orig/kernel/res_counter.c
> +++ mmotm-2.6.29-Mar10/kernel/res_counter.c
> @@ -20,6 +20,7 @@ void res_counter_init(struct res_counter
>  	spin_lock_init(&counter->lock);
>  	counter->limit = (unsigned long long)LLONG_MAX;
>  	counter->parent = parent;
> +	counter->softlimit = (unsigned long long)LLONG_MAX;
>  }
> 
>  int res_counter_charge_locked(struct res_counter *counter, unsigned long val)
> @@ -88,6 +89,32 @@ void res_counter_uncharge(struct res_cou
>  	local_irq_restore(flags);
>  }
> 
> +int res_counter_set_softlimit(struct res_counter *cnt, unsigned long long val)
> +{
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&cnt->lock, flags);
> +	cnt->softlimit = val;
> +	spin_unlock_irqrestore(&cnt->lock, flags);
> +	return 0;
> +}
> +
> +bool res_counter_check_under_softlimit(struct res_counter *cnt)
> +{
> +	struct res_counter *c;
> +	unsigned long flags;
> +	bool ret = true;
> +
> +	local_irq_save(flags);
> +	for (c = cnt; ret && c != NULL; c = c->parent) {
> +		spin_lock(&c->lock);
> +		if (c->softlimit < c->usage)
> +			ret = false;

So if a child was under the soft limit and the parent is *not*, we
_override_ ret and return false?

> +		spin_unlock(&c->lock);
> +	}
> +	local_irq_restore(flags);
> +	return ret;
> +}

Why is the check_under_softlimit hierarchical? BTW, this patch is
buggy. See above.

> 
>  static inline unsigned long long *
>  res_counter_member(struct res_counter *counter, int member)
> @@ -101,6 +128,8 @@ res_counter_member(struct res_counter *c
>  		return &counter->limit;
>  	case RES_FAILCNT:
>  		return &counter->failcnt;
> +	case RES_SOFTLIMIT:
> +		return &counter->softlimit;
>  	};
> 
>  	BUG();
> 
> 
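
For reference, the new file is driven from userspace the same way as the
examples in the cover letter; a minimal sketch (the mount point and
group name are assumptions taken from those examples):

/* set_softlimit.c: write a soft limit; the value is parsed on the
 * kernel side by res_counter_memparse_write_strategy(). */
#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/cgroups/my_group/memory.softlimit_in_bytes", "w");

	if (!f) {
		perror("fopen");
		return 1;
	}
	fprintf(f, "400M\n");
	return fclose(f) ? 1 : 0;
}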

-- 
	Balbir

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC][PATCH 5/5] memcg softlimit hooks to kswapd
  2009-03-12  1:00   ` KAMEZAWA Hiroyuki
@ 2009-03-12  3:58     ` Balbir Singh
  0 siblings, 0 replies; 68+ messages in thread
From: Balbir Singh @ 2009-03-12  3:58 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, linux-kernel, nishimura, kosaki.motohiro

* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-03-12 10:00:08]:

> This patch needs MORE investigation...
> 
> ==
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> This patch adds hooks for memcg's softlimit to kswapd().
> 
> Softlimit handler is called...
>   - before generic shrink_zone() is called.
>   - # of pages to be scanned depends on priority.
>   - If not enough progress, selected memcg will be moved to UNUSED queue.
>   - at each call for balance_pgdat(), softlimit queue is rebalanced.
> 
> Changelog: v3 -> v4
>  - move "sc" as local variable
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  mm/vmscan.c |   52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 52 insertions(+)
> 
> Index: mmotm-2.6.29-Mar10/mm/vmscan.c
> ===================================================================
> --- mmotm-2.6.29-Mar10.orig/mm/vmscan.c
> +++ mmotm-2.6.29-Mar10/mm/vmscan.c
> @@ -1733,6 +1733,49 @@ unsigned long try_to_free_mem_cgroup_pag
>  }
>  #endif
> 
> +static void shrink_zone_softlimit(struct zone *zone, int order, int priority,
> +			   int target, int end_zone)
> +{
> +	int scan = SWAP_CLUSTER_MAX;
> +	int nid = zone->zone_pgdat->node_id;
> +	int zid = zone_idx(zone);
> +	struct mem_cgroup *mem;
> +	struct scan_control sc =  {
> +		.gfp_mask = GFP_KERNEL,
> +		.may_writepage = !laptop_mode,
> +		.swap_cluster_max = SWAP_CLUSTER_MAX,
> +		.may_unmap = 1,
> +		.swappiness = vm_swappiness,
> +		.order = order,
> +		.mem_cgroup = NULL,
> +		.isolate_pages = mem_cgroup_isolate_pages,
> +	};
> +
> +	scan = target * 2;
> +
> +	sc.nr_scanned = 0;
> +	sc.nr_reclaimed = 0;
> +	while (scan > 0) {
> +		if (zone_watermark_ok(zone, order, target, end_zone, 0))
> +			break;
> +		mem = mem_cgroup_schedule(nid, zid);
> +		if (!mem)
> +			return;
> +		sc.mem_cgroup = mem;
> +
> +		sc.nr_reclaimed = 0;
> +		shrink_zone(priority, zone, &sc);
> +
> +		if (sc.nr_reclaimed >= SWAP_CLUSTER_MAX/2)
> +			mem_cgroup_schedule_end(nid, zid, mem, true);
> +		else
> +			mem_cgroup_schedule_end(nid, zid, mem, false);
> +
> +		scan -= sc.nr_scanned;
> +	}
> +
> +	return;
> +}

I experimented a *lot* with zone reclaim and found it to be not so
effective. Here is why:

1. We have no control over priority or how much to scan; that is
controlled by balance_pgdat(). If we find that we are unable to scan
anything, we continue scanning with the scan > 0 check, but we scan
the same pages and the same number, because shrink_zone() does scan >>
priority.
2. If we fail to reclaim pages in shrink_zone_softlimit(), shrink_zone()
will reclaim pages independent of the soft limit for us.

I spent a couple of days looking at zone-based reclaim, but ran into
(1) and (2) above.
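
To make (1) concrete: the scan target shrink_zone() derives is just the
LRU size shifted down by the priority, so retrying at the same priority
produces the same number every pass. A standalone sketch (the LRU size
is an assumed example; the countdown from 12 mirrors DEF_PRIORITY):

/* scan_shift.c: illustrate shrink_zone()'s scan >> priority rule. */
#include <stdio.h>

int main(void)
{
	unsigned long lru_pages = 1UL << 20;	/* assumed: 1M LRU pages */
	int priority;

	for (priority = 12; priority >= 0; priority--)
		printf("priority %2d -> scan target %lu pages\n",
		       priority, lru_pages >> priority);
	return 0;
}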

>  /*
>   * For kswapd, balance_pgdat() will work across all this node's zones until
>   * they are all at pages_high.
> @@ -1776,6 +1819,8 @@ static unsigned long balance_pgdat(pg_da
>  	 */
>  	int temp_priority[MAX_NR_ZONES];
> 
> +	/* Refill softlimit queue */
> +	mem_cgroup_reschedule_all(pgdat->node_id);
>  loop_again:
>  	total_scanned = 0;
>  	sc.nr_reclaimed = 0;
> @@ -1856,6 +1901,13 @@ loop_again:
>  					       end_zone, 0))
>  				all_zones_ok = 0;
>  			temp_priority[i] = priority;
> +
> +			/*
> +			 * Try soft limit at first.  This reclaims page
> +			 * with regard to user's hint.
> +			 */
> +			shrink_zone_softlimit(zone, order, priority,
> +					       8 * zone->pages_high, end_zone);
>  			sc.nr_scanned = 0;
>  			note_zone_scanning_priority(zone, priority);
>  			/*
> 
> 

-- 
	Balbir

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC][PATCH 2/5] add softlimit to res_counter
  2009-03-12  3:54     ` Balbir Singh
@ 2009-03-12  3:58       ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 68+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-03-12  3:58 UTC (permalink / raw)
  To: balbir; +Cc: linux-mm, linux-kernel, nishimura, kosaki.motohiro

On Thu, 12 Mar 2009 09:24:44 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

>
> > +int res_counter_set_softlimit(struct res_counter *cnt, unsigned long long val)
> > +{
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(&cnt->lock, flags);
> > +	cnt->softlimit = val;
> > +	spin_unlock_irqrestore(&cnt->lock, flags);
> > +	return 0;
> > +}
> > +
> > +bool res_counter_check_under_softlimit(struct res_counter *cnt)
> > +{
> > +	struct res_counter *c;
> > +	unsigned long flags;
> > +	bool ret = true;
> > +
> > +	local_irq_save(flags);
> > +	for (c = cnt; ret && c != NULL; c = c->parent) {
> > +		spin_lock(&c->lock);
> > +		if (c->softlimit < c->usage)
> > +			ret = false;
> 
> So if a child was under the soft limit and the parent is *not*, we
> _override_ ret and return false?
> 
Yes. If you don't want this behavior, I'll rename this to
res_counter_check_under_softlimit_hierarchical().


> > +		spin_unlock(&c->lock);
> > +	}
> > +	local_irq_restore(flags);
> > +	return ret;
> > +}
> 
> Why is the check_under_softlimit hierarchical? 

When checking whether a mem_cgroup is a candidate for softlimit reclaim,
we need to check all of its parents.

> BTW, this patch is buggy. See above.
> 

Not buggy. It just meets my requirement.


Thanks,
-Kame


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BUGFIX][PATCH 1/5] memcg use correct scan number at reclaim
  2009-03-12  3:51       ` KAMEZAWA Hiroyuki
@ 2009-03-12  4:00         ` Balbir Singh
  0 siblings, 0 replies; 68+ messages in thread
From: Balbir Singh @ 2009-03-12  4:00 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, nishimura, kosaki.motohiro, akpm

* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-03-12 12:51:24]:

> On Thu, 12 Mar 2009 09:19:18 +0530
> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> 
> > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-03-12 09:55:16]:
> > 
> > > Andrew, this [1/5] is a bug fix, others are not.
> > > 
> > > ==
> > > From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > > 
> > > Even when page reclaim is under mem_cgroup, # of scan page is determined by
> > > status of global LRU. Fix that.
> > > 
> > > Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > > ---
> > >  mm/vmscan.c |    2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > Index: mmotm-2.6.29-Mar10/mm/vmscan.c
> > > ===================================================================
> > > --- mmotm-2.6.29-Mar10.orig/mm/vmscan.c
> > > +++ mmotm-2.6.29-Mar10/mm/vmscan.c
> > > @@ -1470,7 +1470,7 @@ static void shrink_zone(int priority, st
> > >  		int file = is_file_lru(l);
> > >  		int scan;
> > > 
> > > -		scan = zone_page_state(zone, NR_LRU_BASE + l);
> > > +		scan = zone_nr_pages(zone, sc, l);
> > 
> > I have the exact same patch in my patch queue. BTW, mem_cgroup_zone_nr_pages is
> > buggy. We don't hold any sort of lock while extracting
> > MEM_CGROUP_ZSTAT (ideally we need zone->lru_lock). Without that, how do
> > we guarantee that MEM_CGROUP_ZSTAT is not changing at the same time as
> > we are reading it?
> > 
> Is it a big problem? We don't need a very precise value, and ZSTAT only
> sees increments and decrements. So I tend to ignore this small race.
> (And it's an unsigned long, not a long long.)
>

The assumption is that an unsigned long read is atomic even on 32-bit
systems? What if we get preempted in the middle of reading the data
and don't get back to it for a long time? The data can be highly
inaccurate, no?

-- 
	Balbir

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC][PATCH 5/5] memcg softlimit hooks to kswapd
  2009-03-12  3:58     ` Balbir Singh
@ 2009-03-12  4:02       ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 68+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-03-12  4:02 UTC (permalink / raw)
  To: balbir; +Cc: linux-mm, linux-kernel, nishimura, kosaki.motohiro

On Thu, 12 Mar 2009 09:28:37 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-03-12 10:00:08]:

> > +	return;
> > +}
> 
> I experimented a *lot* with zone reclaim and found it to be not so
> effective. Here is why:
> 
> 1. We have no control over priority or how much to scan; that is
> controlled by balance_pgdat(). If we find that we are unable to scan
> anything, we continue scanning with the scan > 0 check, but we scan
> the same pages and the same number, because shrink_zone() does scan >>
> priority.

If sc->nr_reclaimed == 0, "false" is passed to mem_cgroup_schedule_end(),
and the memcg is moved to the INACTIVE queue (so it does not appear here again).


> 2. If we fail to reclaim pages in shrink_zone_softlimit(), shrink_zone()
> will reclaim pages independent of the soft limit for us.
> 
Yes, it's intentional behavior.

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BUGFIX][PATCH 1/5] memcg use correct scan number at reclaim
  2009-03-12  4:00         ` Balbir Singh
@ 2009-03-12  4:05           ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 68+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-03-12  4:05 UTC (permalink / raw)
  To: balbir; +Cc: linux-mm, linux-kernel, nishimura, kosaki.motohiro, akpm

On Thu, 12 Mar 2009 09:30:54 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-03-12 12:51:24]:
> 
> > On Thu, 12 Mar 2009 09:19:18 +0530
> > Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> > 
> > > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-03-12 09:55:16]:
> > > 
> > > > Andrew, this [1/5] is a bug fix, others are not.
> > > > 
> > > > ==
> > > > From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > > > 
> > > > Even when page reclaim is under mem_cgroup, # of scan page is determined by
> > > > status of global LRU. Fix that.
> > > > 
> > > > Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > > > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > > > ---
> > > >  mm/vmscan.c |    2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > Index: mmotm-2.6.29-Mar10/mm/vmscan.c
> > > > ===================================================================
> > > > --- mmotm-2.6.29-Mar10.orig/mm/vmscan.c
> > > > +++ mmotm-2.6.29-Mar10/mm/vmscan.c
> > > > @@ -1470,7 +1470,7 @@ static void shrink_zone(int priority, st
> > > >  		int file = is_file_lru(l);
> > > >  		int scan;
> > > > 
> > > > -		scan = zone_page_state(zone, NR_LRU_BASE + l);
> > > > +		scan = zone_nr_pages(zone, sc, l);
> > > 
> > > I have the exact same patch in my patch queue. BTW, mem_cgroup_zone_nr_pages is
> > > buggy. We don't hold any sort of lock while extracting
> > > MEM_CGROUP_ZSTAT (ideally we need zone->lru_lock). Without that, how do
> > > we guarantee that MEM_CGROUP_ZSTAT is not changing at the same time as
> > > we are reading it?
> > > 
> > Is it a big problem? We don't need a very precise value, and ZSTAT only
> > sees increments and decrements. So I tend to ignore this small race.
> > (And it's an unsigned long, not a long long.)
> >
> 
> The assumption is that an unsigned long read is atomic even on 32-bit
> systems? What if we get preempted in the middle of reading the data
> and don't get back to it for a long time? The data can be highly
> inaccurate, no?
> 
Hmm, would preempt_disable() be appropriate?

But shrink_zone() itself works on the value read at that moment and
doesn't account for changes caused by preemption... so it's not a
problem specific to memcg.

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC][PATCH 2/5] add softlimit to res_counter
  2009-03-12  3:58       ` KAMEZAWA Hiroyuki
@ 2009-03-12  4:10         ` Balbir Singh
  0 siblings, 0 replies; 68+ messages in thread
From: Balbir Singh @ 2009-03-12  4:10 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, linux-kernel, nishimura, kosaki.motohiro

* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-03-12 12:58:39]:

> On Thu, 12 Mar 2009 09:24:44 +0530
> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> 
> >
> > > +int res_counter_set_softlimit(struct res_counter *cnt, unsigned long long val)
> > > +{
> > > +	unsigned long flags;
> > > +
> > > +	spin_lock_irqsave(&cnt->lock, flags);
> > > +	cnt->softlimit = val;
> > > +	spin_unlock_irqrestore(&cnt->lock, flags);
> > > +	return 0;
> > > +}
> > > +
> > > +bool res_counter_check_under_softlimit(struct res_counter *cnt)
> > > +{
> > > +	struct res_counter *c;
> > > +	unsigned long flags;
> > > +	bool ret = true;
> > > +
> > > +	local_irq_save(flags);
> > > +	for (c = cnt; ret && c != NULL; c = c->parent) {
> > > +		spin_lock(&c->lock);
> > > +		if (c->softlimit < c->usage)
> > > +			ret = false;
> > 
> > So if a child was under the soft limit and the parent is *not*, we
> > _override_ ret and return false?
> > 
> Yes. If you don't want this behavior, I'll rename this to
> res_counter_check_under_softlimit_hierarchical().
> 

That is a nicer name.

> 
> > > +		spin_unlock(&c->lock);
> > > +	}
> > > +	local_irq_restore(flags);
> > > +	return ret;
> > > +}
> > 
> > Why is the check_under_softlimit hierarchical? 
> 
> When checking whether a mem_cgroup is a candidate for softlimit reclaim,
> we need to check all of its parents.
> 
> > BTW, this patch is buggy. See above.
> > 
> 
> Not buggy. It just meets my requirement.

Correct me if I am wrong, but this boils down to checking if the top
root is above its soft limit? Instead of checking all the way up in
the hierarchy, can't we do a conditional check for

        c->parent == NULL && (c->softlimit < c->usage)

BTW, I would prefer to split the word softlimit into soft_limit; it is
more readable that way.


-- 
	Balbir

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BUGFIX][PATCH 1/5] memcg use correct scan number at reclaim
  2009-03-12  4:05           ` KAMEZAWA Hiroyuki
@ 2009-03-12  4:14             ` Balbir Singh
  0 siblings, 0 replies; 68+ messages in thread
From: Balbir Singh @ 2009-03-12  4:14 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, nishimura, kosaki.motohiro, akpm

* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-03-12 13:05:56]:

> On Thu, 12 Mar 2009 09:30:54 +0530
> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> 
> > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-03-12 12:51:24]:
> > 
> > > On Thu, 12 Mar 2009 09:19:18 +0530
> > > Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> > > 
> > > > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-03-12 09:55:16]:
> > > > 
> > > > > Andrew, this [1/5] is a bug fix, others are not.
> > > > > 
> > > > > ==
> > > > > From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > > > > 
> > > > > Even when page reclaim is under mem_cgroup, # of scan page is determined by
> > > > > status of global LRU. Fix that.
> > > > > 
> > > > > Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > > > > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > > > > ---
> > > > >  mm/vmscan.c |    2 +-
> > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > 
> > > > > Index: mmotm-2.6.29-Mar10/mm/vmscan.c
> > > > > ===================================================================
> > > > > --- mmotm-2.6.29-Mar10.orig/mm/vmscan.c
> > > > > +++ mmotm-2.6.29-Mar10/mm/vmscan.c
> > > > > @@ -1470,7 +1470,7 @@ static void shrink_zone(int priority, st
> > > > >  		int file = is_file_lru(l);
> > > > >  		int scan;
> > > > > 
> > > > > -		scan = zone_page_state(zone, NR_LRU_BASE + l);
> > > > > +		scan = zone_nr_pages(zone, sc, l);
> > > > 
> > > > I have the exact same patch in my patch queue. BTW, mem_cgroup_zone_nr_pages is
> > > > buggy. We don't hold any sort of lock while extracting
> > > > MEM_CGROUP_ZSTAT (ideally we need zone->lru_lock). Without that, how do
> > > > we guarantee that MEM_CGROUP_ZSTAT is not changing at the same time as
> > > > we are reading it?
> > > > 
> > > Is it a big problem? We don't need a very precise value, and ZSTAT only
> > > sees increments and decrements. So I tend to ignore this small race.
> > > (And it's an unsigned long, not a long long.)
> > >
> > 
> > The assumption is that an unsigned long read is atomic even on 32-bit
> > systems? What if we get preempted in the middle of reading the data
> > and don't get back to it for a long time? The data can be highly
> > inaccurate, no?
> > 
> Hmm, would preempt_disable() be appropriate?
> 
> But shrink_zone() itself works on the value read at that moment and
> doesn't account for changes caused by preemption... so it's not a
> problem specific to memcg.
>

You'll end up reclaiming based on old, stale data. shrink_zone() itself
maintains atomic data for zones.

I think the assumption that an unsigned long read is atomic is quite
reasonable, but I want to validate this across architectures. Does anyone
know the correct answer?
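
To illustrate what a torn read would look like if the counter were wider
than the machine word, here is a minimal userspace sketch (an assumed
model, not the kernel code: it updates the two halves of a simulated
wide counter separately, which is exactly what a single aligned
unsigned long load avoids):

/* torn_read.c: the writer always stores equal halves; observing
 * unequal halves is a torn read. Deliberate data race, illustrative
 * only; build with -pthread. */
#include <inttypes.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

static volatile uint32_t lo, hi;	/* halves of a simulated wide counter */

static void *writer(void *arg)
{
	(void)arg;
	for (;;) {
		lo = 0x11111111; hi = 0x11111111;
		lo = 0x22222222; hi = 0x22222222;
	}
	return NULL;
}

int main(void)
{
	pthread_t t;
	long i;

	pthread_create(&t, NULL, writer, NULL);
	for (i = 0; i < 100000000L; i++) {
		uint32_t l = lo, h = hi;

		if (l != h) {	/* halves came from different stores */
			printf("torn read: hi=%08" PRIx32 " lo=%08" PRIx32 "\n",
			       h, l);
			return 0;
		}
	}
	return 0;
}

A single aligned word-sized load does not split like this, which is the
property the unlocked ZSTAT read is relying on.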

-- 
	Balbir

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC][PATCH 2/5] add softlimit to res_counter
  2009-03-12  4:10         ` Balbir Singh
@ 2009-03-12  4:14           ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 68+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-03-12  4:14 UTC (permalink / raw)
  To: balbir; +Cc: linux-mm, linux-kernel, nishimura, kosaki.motohiro

On Thu, 12 Mar 2009 09:40:38 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> Correct me if I am wrong, but this boils down to checking if the top
> root is above its soft limit?

  Level_1    soft limit=400M
    Level_2  soft limit=200M
      Level_3  no soft limit
      Level_3  soft limit=100M
    Level_2  soft limit=200M
    Level_2  soft limit=200M

When checking Level_3, we need to check Level_2 and Level_1.
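
A minimal sketch of that check (the field names are illustrative, not
the exact res_counter layout): the walk has to visit every ancestor,
since an intermediate level such as Level_2 can be over its soft limit
even while the top root is not:

	struct res_counter {
		unsigned long usage;
		unsigned long soft_limit;
		struct res_counter *parent;	/* NULL at the root */
	};

	/* true if this counter or any ancestor exceeds its soft limit */
	static int soft_limit_exceeded(struct res_counter *c)
	{
		for (; c; c = c->parent)
			if (c->usage > c->soft_limit)
				return 1;
		return 0;
	}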


> Instead of checking all the way up in
> the hierarchy, can't we do a conditional check for
> 
>         c->parent == NULL && (c->softlimit < c->usage)
> 
> BTW, I would prefer to split the word softlimit into soft_limit; it is
> more readable that way.
> 
Ok, it will give me tons of hunks, but I will do it ;)

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BUGFIX][PATCH 1/5] memcg use correct scan number at reclaim
  2009-03-12  4:14             ` Balbir Singh
@ 2009-03-12  4:17               ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 68+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-03-12  4:17 UTC (permalink / raw)
  To: balbir; +Cc: linux-mm, linux-kernel, nishimura, kosaki.motohiro, akpm

On Thu, 12 Mar 2009 09:44:14 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-03-12 13:05:56]:
> 
> > On Thu, 12 Mar 2009 09:30:54 +0530
> > Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> > 
> > > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-03-12 12:51:24]:
> > > 
> > > > On Thu, 12 Mar 2009 09:19:18 +0530
> > > > Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> > > > 
> > > > > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-03-12 09:55:16]:
> > > > > 
> > > > > > Andrew, this [1/5] is a bug fix, others are not.
> > > > > > 
> > > > > > ==
> > > > > > From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > > > > > 
> > > > > > Even when page reclaim is under mem_cgroup, # of scan page is determined by
> > > > > > status of global LRU. Fix that.
> > > > > > 
> > > > > > Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > > > > > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > > > > > ---
> > > > > >  mm/vmscan.c |    2 +-
> > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > 
> > > > > > Index: mmotm-2.6.29-Mar10/mm/vmscan.c
> > > > > > ===================================================================
> > > > > > --- mmotm-2.6.29-Mar10.orig/mm/vmscan.c
> > > > > > +++ mmotm-2.6.29-Mar10/mm/vmscan.c
> > > > > > @@ -1470,7 +1470,7 @@ static void shrink_zone(int priority, st
> > > > > >  		int file = is_file_lru(l);
> > > > > >  		int scan;
> > > > > > 
> > > > > > -		scan = zone_page_state(zone, NR_LRU_BASE + l);
> > > > > > +		scan = zone_nr_pages(zone, sc, l);
> > > > > 
> > > > > I have the exact same patch in my patch queue. BTW, mem_cgroup_zone_nr_pages is
> > > > > buggy. We don't hold any sort of lock while extracting
> > > > > MEM_CGROUP_ZSTAT (ideally we need zone->lru_lock). Without that how do
> > > > > we guarantee that MEM_CGROUP_ZSTAT is not changing at the same time as
> > > > > we are reading it?
> > > > > 
> > > > Is it a big problem? We don't need a very precise value, and ZSTAT only
> > > > has increments/decrements. So I tend to ignore this small race.
> > > > (And it's unsigned long, not long long.)
> > > >
> > > 
> > > The assumption is that unsigned long read is atomic even on 32 bit
> > > systems? What if we get pre-empted in the middle of reading the data
> > > and don't come back for a long time? The data can be highly inaccurate.
> > > No? 
> > > 
> > Hmm, would preempt_disable() be appropriate?
> > 
> > But shrink_zone() itself works on the value read at this point and
> > doesn't take changes caused by preemption into account... so it's not
> > a problem specific to memcg.
> >
> 
> You'll end up reclaiming based on old stale data. shrink_zone itself
> maintains atomic data for zones.
> 
IIUC, # of pages to be scanned is just determined once, here.

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC][PATCH 0/5] memcg softlimit (Another one) v4
  2009-03-12  3:46   ` Balbir Singh
@ 2009-03-12  4:39     ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 68+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-03-12  4:39 UTC (permalink / raw)
  To: balbir; +Cc: linux-mm, linux-kernel, nishimura, kosaki.motohiro

On Thu, 12 Mar 2009 09:16:47 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-03-12 09:52:47]:
> I've tested so far by
> 
> Creating two cgroups and then 
> 
> a. Assigning limits of 1G and 2G and running a memory allocation and
> touch test
softlimit?

> b. Same as (a) with 1G and 1G
> c. Same as (a) with 0 and 1G
> d. Same as (a) with 0 and 0
> 
> More comments in individual patches.
> 
Then,
  1. what's the number of active threads?
  2. what's the number of cpus?
  3. what's the numa configuration, if numa?
  4. what's the zone configuration?
  5. what's the arch?
  6. what's the amount of total memory?
  7. Do you find a difference in behavior with and without softlimit?
  8. Have you tested *this* version?

Thanks,
-Kame


> -- 
> 	Balbir
> 


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC][PATCH 5/5] memcg softlimit hooks to kswapd
  2009-03-12  1:00   ` KAMEZAWA Hiroyuki
@ 2009-03-12  4:59     ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 68+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-03-12  4:59 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, balbir, nishimura, kosaki.motohiro

On Thu, 12 Mar 2009 10:00:08 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> This patch needs MORE investigation...
> 
> ==
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> This patch adds hooks for memcg's softlimit to kswapd().
> 
> The softlimit handler is called...
>   - before generic shrink_zone() is called.
>   - # of pages to be scanned depends on priority.
>   - If there is not enough progress, the selected memcg is moved to the UNUSED queue.
>   - at each call of balance_pgdat(), the softlimit queue is rebalanced.
> 
> Changelog: v3 -> v4
>  - move "sc" as local variable
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  mm/vmscan.c |   52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 52 insertions(+)
> 
> Index: mmotm-2.6.29-Mar10/mm/vmscan.c
> ===================================================================
> --- mmotm-2.6.29-Mar10.orig/mm/vmscan.c
> +++ mmotm-2.6.29-Mar10/mm/vmscan.c
> @@ -1733,6 +1733,49 @@ unsigned long try_to_free_mem_cgroup_pag
>  }
>  #endif
>  
> +static void shrink_zone_softlimit(struct zone *zone, int order, int priority,
> +			   int target, int end_zone)
> +{
> +	int scan = SWAP_CLUSTER_MAX;
> +	int nid = zone->zone_pgdat->node_id;
> +	int zid = zone_idx(zone);
> +	struct mem_cgroup *mem;
> +	struct scan_control sc =  {
> +		.gfp_mask = GFP_KERNEL,
> +		.may_writepage = !laptop_mode,
> +		.swap_cluster_max = SWAP_CLUSTER_MAX,
> +		.may_unmap = 1,
> +		.swappiness = vm_swappiness,
> +		.order = order,
> +		.mem_cgroup = NULL,
> +		.isolate_pages = mem_cgroup_isolate_pages,
> +	};
> +
> +	scan = target * 2;
> +
> +	sc.nr_scanned = 0;
> +	sc.nr_reclaimed = 0;
> +	while (scan > 0) {
> +		if (zone_watermark_ok(zone, order, target, end_zone, 0))
> +			break;
> +		mem = mem_cgroup_schedule(nid, zid);
> +		if (!mem)
> +			return;
> +		sc.mem_cgroup = mem;
> +
> +		sc.nr_reclaimed = 0;
needs
  sc.nr_scanned = 0;
here as well; shrink_zone() accumulates into sc->nr_scanned, so the
"scan -= sc.nr_scanned" below would otherwise subtract the cumulative
count on every iteration.
-Kame
> +		shrink_zone(priority, zone, &sc);
> +
> +		if (sc.nr_reclaimed >= SWAP_CLUSTER_MAX/2)
> +			mem_cgroup_schedule_end(nid, zid, mem, true);
> +		else
> +			mem_cgroup_schedule_end(nid, zid, mem, false);
> +
> +		scan -= sc.nr_scanned;
> +	}
> +
> +	return;
> +}
>  /*
>   * For kswapd, balance_pgdat() will work across all this node's zones until
>   * they are all at pages_high.
> @@ -1776,6 +1819,8 @@ static unsigned long balance_pgdat(pg_da
>  	 */
>  	int temp_priority[MAX_NR_ZONES];
>  
> +	/* Refill softlimit queue */
> +	mem_cgroup_reschedule_all(pgdat->node_id);
>  loop_again:
>  	total_scanned = 0;
>  	sc.nr_reclaimed = 0;
> @@ -1856,6 +1901,13 @@ loop_again:
>  					       end_zone, 0))
>  				all_zones_ok = 0;
>  			temp_priority[i] = priority;
> +
> +			/*
> +			 * Try soft limit at first.  This reclaims page
> +			 * with regard to user's hint.
> +			 */
> +			shrink_zone_softlimit(zone, order, priority,
> +					       8 * zone->pages_high, end_zone);
>  			sc.nr_scanned = 0;
>  			note_zone_scanning_priority(zone, priority);
>  			/*
> 
> 
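
Folded into the quoted patch, the fix being pointed out above would
look roughly like this (illustrative hunk, not a posted patch):

	 		sc.mem_cgroup = mem;
	 
	 		sc.nr_reclaimed = 0;
	+		sc.nr_scanned = 0;
	 		shrink_zone(priority, zone, &sc);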


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC][PATCH 0/5] memcg softlimit (Another one) v4
  2009-03-12  4:39     ` KAMEZAWA Hiroyuki
@ 2009-03-12  5:04       ` Balbir Singh
  -1 siblings, 0 replies; 68+ messages in thread
From: Balbir Singh @ 2009-03-12  5:04 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, linux-kernel, nishimura, kosaki.motohiro

* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-03-12 13:39:49]:

> On Thu, 12 Mar 2009 09:16:47 +0530
> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> 
> > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-03-12 09:52:47]:
> > I've tested so far by
> > 
> > Creating two cgroups and then 
> > 
> > a. Assigning limits of 1G and 2G and running a memory allocation and
> > touch test
> softlimit?
>

Yes
 
> > b. Same as (a) with 1G and 1G
> > c. Same as (a) with 0 and 1G
> > d. Same as (a) with 0 and 0
> > 
> > More comments in individual patches.
> > 
> Then,
>   1. what's the number of active threads?

One for each process in the two groups

>   2. what's the number of cpus?

4

>   3. what's the numa configuration, if numa?

Fake NUMA with nodes = 4; I have DMA, DMA32 and NORMAL split across
nodes.

>   4. what's the zone configuration?
>   5. what's the arch?
>   6. what's the amount of total memory?

I have 4GB on an x86-64 system (Quad Core)

>   7. Do you find a difference in behavior with and without softlimit?

Very much so.. I see the resources being shared as defined by soft
limits.

>   8. Have you tested *this* version?
> 

Not yet.. you just posted it. I am testing my v5, which I'll post
soon. I am seeing very good results with v5. I'll test yours later
today.

> Thanks,
> -Kame

-- 
	Balbir

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC][PATCH 0/5] memcg softlimit (Another one) v4
  2009-03-12  5:04       ` Balbir Singh
@ 2009-03-12  5:32         ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 68+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-03-12  5:32 UTC (permalink / raw)
  To: balbir; +Cc: linux-mm, linux-kernel, nishimura, kosaki.motohiro

On Thu, 12 Mar 2009 10:34:23 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> Not yet.. you just posted it. I am testing my v5, which I'll post
> soon. I am seeing very good results with v5. I'll test yours later
> today.
> 

If "hooks" to usual path doesn't exist and there are no global locks,
I don't have much concern with your version.
But 'sorting' seems to be overkill to me.

I'm sorry if my responce to your patch is delayed. I may not be in office.

Thanks,
-Kame




^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BUGFIX][PATCH 1/5] memcg use correct scan number at reclaim
  2009-03-12  4:17               ` KAMEZAWA Hiroyuki
@ 2009-03-12  7:45                 ` KOSAKI Motohiro
  -1 siblings, 0 replies; 68+ messages in thread
From: KOSAKI Motohiro @ 2009-03-12  7:45 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: kosaki.motohiro, balbir, linux-mm, linux-kernel, nishimura, akpm

> On Thu, 12 Mar 2009 09:44:14 +0530
> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> 
> > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-03-12 13:05:56]:
> > 
> > > On Thu, 12 Mar 2009 09:30:54 +0530
> > > Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> > > 
> > > > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-03-12 12:51:24]:
> > > > 
> > > > > On Thu, 12 Mar 2009 09:19:18 +0530
> > > > > Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> > > > > 
> > > > > > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-03-12 09:55:16]:
> > > > > > 
> > > > > > > Andrew, this [1/5] is a bug fix, others are not.
> > > > > > > 
> > > > > > > ==
> > > > > > > From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > > > > > > 
> > > > > > > Even when page reclaim is under mem_cgroup, # of scan page is determined by
> > > > > > > status of global LRU. Fix that.
> > > > > > > 
> > > > > > > Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > > > > > > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > > > > > > ---
> > > > > > >  mm/vmscan.c |    2 +-
> > > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > 
> > > > > > > Index: mmotm-2.6.29-Mar10/mm/vmscan.c
> > > > > > > ===================================================================
> > > > > > > --- mmotm-2.6.29-Mar10.orig/mm/vmscan.c
> > > > > > > +++ mmotm-2.6.29-Mar10/mm/vmscan.c
> > > > > > > @@ -1470,7 +1470,7 @@ static void shrink_zone(int priority, st
> > > > > > >  		int file = is_file_lru(l);
> > > > > > >  		int scan;
> > > > > > > 
> > > > > > > -		scan = zone_page_state(zone, NR_LRU_BASE + l);
> > > > > > > +		scan = zone_nr_pages(zone, sc, l);
> > > > > > 
> > > > > > I have the exact same patch in my patch queue. BTW, mem_cgroup_zone_nr_pages is
> > > > > > buggy. We don't hold any sort of lock while extracting
> > > > > > MEM_CGROUP_ZSTAT (ideally we need zone->lru_lock). Without that how do
> > > > > > we guarantee that MEM_CGROUP_ZSTAT is not changing at the same time as
> > > > > > we are reading it?
> > > > > > 
> > > > > Is it a big problem? We don't need a very precise value, and ZSTAT only
> > > > > has increments/decrements. So I tend to ignore this small race.
> > > > > (And it's unsigned long, not long long.)
> > > > >
> > > > 
> > > > The assumption is that unsigned long read is atomic even on 32 bit
> > > > systems? What if we get pre-empted in the middle of reading the data
> > > > and don't come back for a long time? The data can be highly inaccurate.
> > > > No? 
> > > > 
> > > Hmm, would preempt_disable() be appropriate?
> > > 
> > > But shrink_zone() itself works on the value read at this point and
> > > doesn't take changes caused by preemption into account... so it's not
> > > a problem specific to memcg.
> > >
> > 
> > You'll end up reclaiming based on old stale data. shrink_zone itself
> > maintains atomic data for zones.
> > 
> IIUC, # of pages to be scanned is just determined once, here.

In this case, lockless is the right behavior.
Locklessness is more valuable than a precise ZSTAT; the end user
can't observe this race.





^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC][PATCH 0/5] memcg softlimit (Another one) v4
  2009-03-12  5:32         ` KAMEZAWA Hiroyuki
@ 2009-03-12  8:26           ` Balbir Singh
  -1 siblings, 0 replies; 68+ messages in thread
From: Balbir Singh @ 2009-03-12  8:26 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, linux-kernel, nishimura, kosaki.motohiro

* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-03-12 14:32:12]:

> On Thu, 12 Mar 2009 10:34:23 +0530
> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> 
> > Not yet.. you just posted it. I am testing my v5, which I'll post
> > soon. I am seeing very good results with v5. I'll test yours later
> > today.
> > 
> 
> If "hooks" to usual path doesn't exist and there are no global locks,
> I don't have much concern with your version.

Good to know. I think it is always good to have competing patches and
then collaborating and getting the best in.

> But 'sorting' seems to be overkill to me.
> 

Sorting is very useful, specially if you have many cgroups. Without
sorting, how do we select what group to select first.

> I'm sorry if my responce to your patch is delayed. I may not be in office.
>

No problem 

-- 
	Balbir

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC][PATCH 0/5] memcg softlimit (Another one) v4
  2009-03-12  8:26           ` Balbir Singh
@ 2009-03-12  8:45             ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 68+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-03-12  8:45 UTC (permalink / raw)
  To: balbir; +Cc: linux-mm, linux-kernel, nishimura, kosaki.motohiro

On Thu, 12 Mar 2009 13:56:46 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-03-12 14:32:12]:
> 
> > On Thu, 12 Mar 2009 10:34:23 +0530
> > Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> > 
> > > Not yet.. you just posted it. I am testing my v5, which I'll post
> > > soon. I am seeing very good results with v5. I'll test yours later
> > > today.
> > > 
> > 
> > If "hooks" to usual path doesn't exist and there are no global locks,
> > I don't have much concern with your version.
> 
> Good to know. I think it is always good to have competing patches and
> then collaborating and getting the best in.
> 
> > But 'sorting' seems to be overkill to me.
> > 
> 
> Sorting is very useful, specially if you have many cgroups. Without
> sorting, how do we select what group to select first.
> 
As I explained, if round-robin works well, ordering has no meaning.
It's just a difference in what counts as fairness.

  1. In your method, reclaiming first from the user that exceeds its limit
     the most is fair.
  2. In my method, reclaiming from each cgroup in round robin is fair.

No big issue to users if the kernel policy is fixed.
The reason I take "2" is that a memcg's usage doesn't reflect its
usage in the zone, so there is no big difference between 1 and 2 on NUMA.
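
A minimal sketch of the round-robin selection described here (queue
and field names are illustrative; soft_limit_list is an assumed
list_head inside struct mem_cgroup): taking the head and rotating it
to the tail makes repeated calls cycle through all victim groups
evenly, so no ordering metric is needed:

	struct softlimit_queue {
		spinlock_t lock;
		struct list_head victims;	/* memcgs over their soft limit */
	};

	static struct mem_cgroup *victim_next(struct softlimit_queue *q)
	{
		struct mem_cgroup *mem = NULL;

		spin_lock(&q->lock);
		if (!list_empty(&q->victims)) {
			mem = list_first_entry(&q->victims,
					struct mem_cgroup, soft_limit_list);
			/* rotate so the next caller gets the next group */
			list_move_tail(&mem->soft_limit_list, &q->victims);
		}
		spin_unlock(&q->lock);
		return mem;
	}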


Thanks,
-Kame


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BUGFIX][PATCH 1/5] memcg use correct scan number at reclaim
  2009-03-12  7:45                 ` KOSAKI Motohiro
@ 2009-03-12  9:45                   ` Balbir Singh
  -1 siblings, 0 replies; 68+ messages in thread
From: Balbir Singh @ 2009-03-12  9:45 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: KAMEZAWA Hiroyuki, linux-mm, linux-kernel, nishimura, akpm

* KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> [2009-03-12 16:45:59]:

> > On Thu, 12 Mar 2009 09:44:14 +0530
> > Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> > 
> > > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-03-12 13:05:56]:
> > > 
> > > > On Thu, 12 Mar 2009 09:30:54 +0530
> > > > Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> > > > 
> > > > > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-03-12 12:51:24]:
> > > > > 
> > > > > > On Thu, 12 Mar 2009 09:19:18 +0530
> > > > > > Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> > > > > > 
> > > > > > > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-03-12 09:55:16]:
> > > > > > > 
> > > > > > > > Andrew, this [1/5] is a bug fix, others are not.
> > > > > > > > 
> > > > > > > > ==
> > > > > > > > From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > > > > > > > 
> > > > > > > > Even when page reclaim is under mem_cgroup, # of scan page is determined by
> > > > > > > > status of global LRU. Fix that.
> > > > > > > > 
> > > > > > > > Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > > > > > > > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > > > > > > > ---
> > > > > > > >  mm/vmscan.c |    2 +-
> > > > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > > 
> > > > > > > > Index: mmotm-2.6.29-Mar10/mm/vmscan.c
> > > > > > > > ===================================================================
> > > > > > > > --- mmotm-2.6.29-Mar10.orig/mm/vmscan.c
> > > > > > > > +++ mmotm-2.6.29-Mar10/mm/vmscan.c
> > > > > > > > @@ -1470,7 +1470,7 @@ static void shrink_zone(int priority, st
> > > > > > > >  		int file = is_file_lru(l);
> > > > > > > >  		int scan;
> > > > > > > > 
> > > > > > > > -		scan = zone_page_state(zone, NR_LRU_BASE + l);
> > > > > > > > +		scan = zone_nr_pages(zone, sc, l);
> > > > > > > 
> > > > > > > I have the exact same patch in my patch queue. BTW, mem_cgroup_zone_nr_pages is
> > > > > > > buggy. We don't hold any sort of lock while extracting
> > > > > > > MEM_CGROUP_ZSTAT (ideally we need zone->lru_lock). Without that how do
> > > > > > > we guarantee that MEM_CGROUP_ZSTAT is not changing at the same time as
> > > > > > > we are reading it?
> > > > > > > 
> > > > > > Is it a big problem? We don't need a very precise value, and ZSTAT only
> > > > > > has increments/decrements. So I tend to ignore this small race.
> > > > > > (And it's unsigned long, not long long.)
> > > > > >
> > > > > 
> > > > > The assumption is that unsigned long read is atomic even on 32 bit
> > > > > systems? What if we get pre-empted in the middle of reading the data
> > > > > and don't come back for a long time? The data can be highly inaccurate.
> > > > > No? 
> > > > > 
> > > > Hmm, would preempt_disable() be appropriate?
> > > > 
> > > > But shrink_zone() itself works on the value read at this point and
> > > > doesn't take changes caused by preemption into account... so it's not
> > > > a problem specific to memcg.
> > > >
> > > 
> > > You'll end up reclaiming based on old stale data. shrink_zone itself
> > > maintains atomic data for zones.
> > > 
> > IIUC, # of pages to be scanned is just determined once, here.
> 
> In this case, lockless is the right behavior.
> Locklessness is more valuable than a precise ZSTAT; the end user
> can't observe this race.
>

Lockless works fine provided the data is correctly aligned. I need to
check this out more thoroughly.

-- 
	Balbir

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC][PATCH 0/5] memcg softlimit (Another one) v4
  2009-03-12  8:45             ` KAMEZAWA Hiroyuki
@ 2009-03-12  9:53               ` Balbir Singh
  -1 siblings, 0 replies; 68+ messages in thread
From: Balbir Singh @ 2009-03-12  9:53 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, linux-kernel, nishimura, kosaki.motohiro

* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-03-12 17:45:44]:

> On Thu, 12 Mar 2009 13:56:46 +0530
> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> 
> > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-03-12 14:32:12]:
> > 
> > > On Thu, 12 Mar 2009 10:34:23 +0530
> > > Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> > > 
> > > > Not yet.. you just posted it. I am testing my v5, which I'll post
> > > > soon. I am seeing very good results with v5. I'll test yours later
> > > > today.
> > > > 
> > > 
> > > If "hooks" to usual path doesn't exist and there are no global locks,
> > > I don't have much concern with your version.
> > 
> > Good to know. I think it is always good to have competing patches and
> > then collaborating and getting the best in.
> > 
> > > But 'sorting' seems to be overkill to me.
> > > 
> > 
> > Sorting is very useful, specially if you have many cgroups. Without
> > sorting, how do we select what group to select first.
> > 
> As I explained, if round-robin works well, ordering has no meaning.
> It's just a difference in what counts as fairness.
> 
>   1. In your method, reclaiming first from the user that exceeds its limit
>      the most is fair.
>   2. In my method, reclaiming from each cgroup in round robin is fair.
> 
> No big issue to users if the kernel policy is fixed.
> The reason I take "2" is that a memcg's usage doesn't reflect its
> usage in the zone, so there is no big difference between 1 and 2 on NUMA.
>

Round robin can be bad for soft limits. If an application started up
way ahead of others, but had a small soft limit, we would like
resources to be properly allocated when the second application comes
up. As the number of cgroups increases, selecting the correct cgroup to
reclaim from is going to be a challenge without sorting.
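
A sketch of the sorted selection being argued for (illustrative only,
not the actual v5 code): keep candidates in an rbtree keyed by how far
they exceed their soft limit, and always reclaim from the worst
offender first:

	struct soft_limit_node {
		struct rb_node node;		/* keyed by excess, ascending */
		unsigned long excess;		/* usage - soft_limit */
		struct mem_cgroup *mem;
	};

	static struct mem_cgroup *victim_worst(struct rb_root *root)
	{
		struct rb_node *last = rb_last(root);	/* largest excess */

		if (!last)
			return NULL;
		return rb_entry(last, struct soft_limit_node, node)->mem;
	}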

Having said that, I need to do more testing with your patches. 

-- 
	Balbir

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BUGFIX][PATCH 1/5] memcg use correct scan number at reclaim
  2009-03-12  9:45                   ` Balbir Singh
@ 2009-03-12 11:23                     ` KOSAKI Motohiro
  -1 siblings, 0 replies; 68+ messages in thread
From: KOSAKI Motohiro @ 2009-03-12 11:23 UTC (permalink / raw)
  To: balbir; +Cc: KAMEZAWA Hiroyuki, linux-mm, linux-kernel, nishimura, akpm

>> > IIUC, # of pages to be scanned is just determined once, here.
>>
>> In this case, lockless is the right behavior.
>> Locklessness is more valuable than a precise ZSTAT; the end user
>> can't observe this race.
>>
>
> Lockless works fine provided the data is correctly aligned. I need to
> check this out more thoroughly.

Thanks a lot :)

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC][PATCH 0/5] memcg softlimit (Another one) v4
  2009-03-12  0:52 ` KAMEZAWA Hiroyuki
@ 2009-03-14 18:52   ` Balbir Singh
  -1 siblings, 0 replies; 68+ messages in thread
From: Balbir Singh @ 2009-03-14 18:52 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, linux-kernel, nishimura, kosaki.motohiro

* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-03-12 09:52:47]:

> Hi, this is a patch to implement softlimit for memcg.
> 
> I did some clean up and bug fixes. 
> 
> Anyway I have to look into details of "LRU scan algorithm" after this.
> 
> How this works:
> 
>  (1) Set softlimit threshold to memcg.
>      #echo 400M > /cgroups/my_group/memory.softlimit_in_bytes.
> 
>  (2) Define priority as victim.
>      #echo 3 > /cgroups/my_group/memory.softlimit_priority.
>      0 is the lowest, 8 is the highest.
>      If "8", softlimit feature ignore this group.
>      default value is "8".
> 
>  (3) Add some memory pressure and make kswapd() work.
>      kswapd will reclaim memory from victims paying regard to priority.
> 
> Simple test on my 2cpu 86-64 box with 1.6Gbytes of memory (...vmware)
> 
>   While a process malloc 800MB of memory and touch it and sleep in a group,
>   run kernel make -j 16 under a victim cgroup with softlimit=300M, priority=3.
> 
>   Without softlimit => 400MB of malloc'ed memory are swapped out.
>   With softlimit    =>  80MB of malloc'ed memory are swapped out. 
> 
> I think 80MB of swap is from direct memory reclaim path. And this
> seems not to be terrible result.
> 
> I'll do more test on other hosts. Any comments are welcome.

Hi, Kamezawa-San,

I tried some simple tests with this patch and the results are
nowhere close to what I expected.

1. My setup is 4GB RAM with 4 CPUs and I boot with numa=fake=4
2. I set up my cgroups as follows:
   a. created /a and /b and set memory.use_hierarchy=1
   b. created /a/x and /b/x, set their memory.softlimit_priority=1
   c. set softlimit_in_bytes for a/x to 1G and b/x to 2G
   d. I assigned tasks to a/x and b/x

I expected the tasks in a/x and b/x to get memory distributed in the
ratio 1:2. Here is what I found:

1. The task in a/x got more memory than the task in b/x even though
   I started the task in b/x first
2. Even changing softlimit_priority (increased for b) did not help much


-- 
	Balbir

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC][PATCH 0/5] memcg softlimit (Another one) v4
  2009-03-14 18:52   ` Balbir Singh
@ 2009-03-16  0:10     ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 68+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-03-16  0:10 UTC (permalink / raw)
  To: balbir; +Cc: linux-mm, linux-kernel, nishimura, kosaki.motohiro

On Sun, 15 Mar 2009 00:22:46 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-03-12 09:52:47]:
> 
> > Hi, this is a patch implementing softlimit for memcg.
> > 
> > I did some cleanup and bug fixes.
> > 
> > Anyway, I have to look into the details of the "LRU scan algorithm" after this.
> > 
> > How this works:
> > 
> >  (1) Set a softlimit threshold on a memcg.
> >      #echo 400M > /cgroups/my_group/memory.softlimit_in_bytes
> > 
> >  (2) Define the group's priority as a victim.
> >      #echo 3 > /cgroups/my_group/memory.softlimit_priority
> >      0 is the lowest, 8 is the highest.
> >      If "8", the softlimit feature ignores this group.
> >      The default value is "8".
> > 
> >  (3) Add some memory pressure to make kswapd() work.
> >      kswapd will reclaim memory from victims, paying regard to priority.
> > 
> > Simple test on my 2-CPU x86-64 box with 1.6GB of memory (...vmware):
> > 
> >   While a process mallocs 800MB of memory, touches it, and sleeps in one group,
> >   run a kernel make -j 16 under a victim cgroup with softlimit=300M, priority=3.
> > 
> >   Without softlimit => 400MB of malloc'ed memory is swapped out.
> >   With softlimit    =>  80MB of malloc'ed memory is swapped out.
> > 
> > I think the 80MB of swap comes from the direct memory reclaim path, and this
> > does not seem to be a terrible result.
> > 
> > I'll do more tests on other hosts. Any comments are welcome.
> 
> Hi, Kamezawa-San,
> 
> I tried some simple tests with this patch, and the results are
> nowhere close to what I expected.
> 
> 1. My setup is 4GB RAM with 4 CPUs, booted with numa=fake=4.
> 2. I set up my cgroups as follows:
>    a. created /a and /b and set memory.use_hierarchy=1
>    b. created /a/x and /b/x, set their memory.softlimit_priority=1
>    c. set softlimit_in_bytes for a/x to 1G and for b/x to 2G
>    d. I assigned tasks to a/x and b/x
> 
> I expected the tasks in a/x and b/x to get memory distributed in the
> ratio 1:2. Here is what I found:
> 
> 1. The task in a/x got more memory than the task in b/x, even though
>    I started the task in b/x first.
> 2. Even changing softlimit_priority (increasing it for b) did not help much.
> 

Thank you, I'll rewrite it all. But can 1G/2G of usage make kswapd() run
on a 4GB host? The memory distribution you end up with depends on per-zone
usage, and if both a/x's and b/x's usage are always over their softlimits,
the result will never be 1:2, because any group over its softlimit can
become a victim and be reclaimed in round-robin.
Anyway, softlimit_priority does not seem to be a good interface; I'll remove it.
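
To illustrate with made-up numbers: if a/x is at 1.2G (0.2G over its 1G
softlimit) and b/x is at 3.0G (1.0G over its 2G softlimit), both are on
the victim list, and round-robin reclaim takes pages from each at the
same rate, no matter how far over its limit each group is. So while both
stay over their softlimits, the usage split reflects how fast each task
allocates, not the 1:2 ratio of the limits.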

Thanks,
-Kame


end of thread

Thread overview: 68+ messages
2009-03-12  0:52 [RFC][PATCH 0/5] memcg softlimit (Another one) v4 KAMEZAWA Hiroyuki
2009-03-12  0:55 ` [BUGFIX][PATCH 1/5] memcg use correct scan number at reclaim KAMEZAWA Hiroyuki
2009-03-12  3:49   ` Balbir Singh
2009-03-12  3:51     ` KAMEZAWA Hiroyuki
2009-03-12  4:00       ` Balbir Singh
2009-03-12  4:05         ` KAMEZAWA Hiroyuki
2009-03-12  4:14           ` Balbir Singh
2009-03-12  4:17             ` KAMEZAWA Hiroyuki
2009-03-12  7:45               ` KOSAKI Motohiro
2009-03-12  9:45                 ` Balbir Singh
2009-03-12 11:23                   ` KOSAKI Motohiro
2009-03-12  0:56 ` [RFC][PATCH 2/5] add softlimit to res_counter KAMEZAWA Hiroyuki
2009-03-12  3:54   ` Balbir Singh
2009-03-12  3:58     ` KAMEZAWA Hiroyuki
2009-03-12  4:10       ` Balbir Singh
2009-03-12  4:14         ` KAMEZAWA Hiroyuki
2009-03-12  0:57 ` [RFC][PATCH 3/5] memcg per zone softlimit scheduler core KAMEZAWA Hiroyuki
2009-03-12  0:58 ` [RFC][PATCH 4/5] memcg softlimit_priority KAMEZAWA Hiroyuki
2009-03-12  1:00 ` [RFC][PATCH 5/5] memcg softlimit hooks to kswapd KAMEZAWA Hiroyuki
2009-03-12  3:58   ` Balbir Singh
2009-03-12  4:02     ` KAMEZAWA Hiroyuki
2009-03-12  4:59   ` KAMEZAWA Hiroyuki
2009-03-12  1:01 ` [RFC][PATCH 6/5] softlimit document KAMEZAWA Hiroyuki
2009-03-12  1:54   ` Li Zefan
2009-03-12  2:01     ` KAMEZAWA Hiroyuki
2009-03-12  3:46 ` [RFC][PATCH 0/5] memcg softlimit (Another one) v4 Balbir Singh
2009-03-12  4:39   ` KAMEZAWA Hiroyuki
2009-03-12  5:04     ` Balbir Singh
2009-03-12  5:32       ` KAMEZAWA Hiroyuki
2009-03-12  8:26         ` Balbir Singh
2009-03-12  8:45           ` KAMEZAWA Hiroyuki
2009-03-12  9:53             ` Balbir Singh
2009-03-14 18:52 ` Balbir Singh
2009-03-16  0:10   ` KAMEZAWA Hiroyuki
