All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] memcg: Reduce usage at change limit
@ 2008-06-17  3:31 ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 24+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-17  3:31 UTC (permalink / raw)
  To: linux-mm
  Cc: LKML, balbir, xemul, menage, lizf, yamamoto, nishimura, Andrew Morton

Hi, this is a patch for reducing usage at the change of limit for memcg.
A cut out from hierarchy patch set.I added Andrew Morton to CC.

I think this version is enough neat and small. 
This patch implements only necessary things.

Change log (hierarchy set's) v4 -> (this version) v5.
 - just for changing limit.
 - avoid to add rich limit handlers to res_counter because..
   1. memcg is maybe an only user which shrink_usage can be implemented.
   2. Many objections ;)

This patch adds feedback control at set_limit as following

   -> user's request to set limit to 'val'
      1. try to set limit to 'val'
      2. at success goto 6.
      3. try to shrink usage...
      4. if we cannot make progress any more, return -EBUSY.
      5. goto 1.
      6. successs.

Tested on x86-64.

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 0/2] memcg: Reduce usage at change limit
@ 2008-06-17  3:31 ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 24+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-17  3:31 UTC (permalink / raw)
  To: linux-mm
  Cc: LKML, balbir, xemul, menage, lizf, yamamoto, nishimura, Andrew Morton

Hi, this is a patch for reducing usage at the change of limit for memcg.
A cut out from hierarchy patch set.I added Andrew Morton to CC.

I think this version is enough neat and small. 
This patch implements only necessary things.

Change log (hierarchy set's) v4 -> (this version) v5.
 - just for changing limit.
 - avoid to add rich limit handlers to res_counter because..
   1. memcg is maybe an only user which shrink_usage can be implemented.
   2. Many objections ;)

This patch adds feedback control at set_limit as following

   -> user's request to set limit to 'val'
      1. try to set limit to 'val'
      2. at success goto 6.
      3. try to shrink usage...
      4. if we cannot make progress any more, return -EBUSY.
      5. goto 1.
      6. successs.

Tested on x86-64.

Thanks,
-Kame

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/2] memcg: res counter set limit
  2008-06-17  3:31 ` KAMEZAWA Hiroyuki
@ 2008-06-17  3:33   ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 24+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-17  3:33 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, LKML, balbir, xemul, menage, lizf, yamamoto, nishimura,
	Andrew Morton

Helper function of res_counter for reducing usage in subsys(memcg).

Changelog xxx -> v5.
 - new file.

Background:
 In v3, I was asked to implement generic ones to res_counter.
 In v4. I was asked not to implement generic ones because memcg is only
 controller which can reduce usage by the kernel. Okay, maybe make sense.
 In this version, adds only necessary helpers to res_counter in 
 not-invasive manner.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

---
 include/linux/res_counter.h |    7 +++++++
 kernel/res_counter.c        |   24 ++++++++++++++++++++++++
 2 files changed, 31 insertions(+)

Index: mm-2.6.26-rc5-mm3/include/linux/res_counter.h
===================================================================
--- mm-2.6.26-rc5-mm3.orig/include/linux/res_counter.h
+++ mm-2.6.26-rc5-mm3/include/linux/res_counter.h
@@ -136,6 +136,12 @@ static inline bool res_counter_check_und
 	return ret;
 }
 
+/*
+ * set new limit to the val. if usage > val, returns -EBUSY.
+ * returns 0 at success.
+ */
+int res_counter_set_limit(struct res_counter *cnt, unsigned long long limit);
+
 static inline void res_counter_reset_max(struct res_counter *cnt)
 {
 	unsigned long flags;
@@ -153,4 +159,5 @@ static inline void res_counter_reset_fai
 	cnt->failcnt = 0;
 	spin_unlock_irqrestore(&cnt->lock, flags);
 }
+
 #endif
Index: mm-2.6.26-rc5-mm3/kernel/res_counter.c
===================================================================
--- mm-2.6.26-rc5-mm3.orig/kernel/res_counter.c
+++ mm-2.6.26-rc5-mm3/kernel/res_counter.c
@@ -143,3 +143,27 @@ out_free:
 out:
 	return ret;
 }
+
+
+/**
+ * res_counter_set_limit - set limit of res_counter.
+ * @cnt: the res_counter
+ * @limit: the new limit
+ *
+ * Note that res_coutner_write() allows the same kind of operation.
+ * But this returns -EBUSY when new limit < usage. If you want strict control
+ * of limit, please use this.
+ */
+int res_counter_set_limit(struct res_counter *cnt, unsigned long long limit)
+{
+	unsigned long flags;
+	int ret = -EBUSY;
+
+	spin_lock_irqsave(&cnt->lock, flags);
+	if (cnt->usage <= limit) {
+		cnt->limit = limit;
+		ret = 0;
+	}
+	spin_unlock_irqrestore(&cnt->lock, flags);
+	return ret;
+}


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/2] memcg: res counter set limit
@ 2008-06-17  3:33   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 24+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-17  3:33 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, LKML, balbir, xemul, menage, lizf, yamamoto, nishimura,
	Andrew Morton

Helper function of res_counter for reducing usage in subsys(memcg).

Changelog xxx -> v5.
 - new file.

Background:
 In v3, I was asked to implement generic ones to res_counter.
 In v4. I was asked not to implement generic ones because memcg is only
 controller which can reduce usage by the kernel. Okay, maybe make sense.
 In this version, adds only necessary helpers to res_counter in 
 not-invasive manner.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

---
 include/linux/res_counter.h |    7 +++++++
 kernel/res_counter.c        |   24 ++++++++++++++++++++++++
 2 files changed, 31 insertions(+)

Index: mm-2.6.26-rc5-mm3/include/linux/res_counter.h
===================================================================
--- mm-2.6.26-rc5-mm3.orig/include/linux/res_counter.h
+++ mm-2.6.26-rc5-mm3/include/linux/res_counter.h
@@ -136,6 +136,12 @@ static inline bool res_counter_check_und
 	return ret;
 }
 
+/*
+ * set new limit to the val. if usage > val, returns -EBUSY.
+ * returns 0 at success.
+ */
+int res_counter_set_limit(struct res_counter *cnt, unsigned long long limit);
+
 static inline void res_counter_reset_max(struct res_counter *cnt)
 {
 	unsigned long flags;
@@ -153,4 +159,5 @@ static inline void res_counter_reset_fai
 	cnt->failcnt = 0;
 	spin_unlock_irqrestore(&cnt->lock, flags);
 }
+
 #endif
Index: mm-2.6.26-rc5-mm3/kernel/res_counter.c
===================================================================
--- mm-2.6.26-rc5-mm3.orig/kernel/res_counter.c
+++ mm-2.6.26-rc5-mm3/kernel/res_counter.c
@@ -143,3 +143,27 @@ out_free:
 out:
 	return ret;
 }
+
+
+/**
+ * res_counter_set_limit - set limit of res_counter.
+ * @cnt: the res_counter
+ * @limit: the new limit
+ *
+ * Note that res_coutner_write() allows the same kind of operation.
+ * But this returns -EBUSY when new limit < usage. If you want strict control
+ * of limit, please use this.
+ */
+int res_counter_set_limit(struct res_counter *cnt, unsigned long long limit)
+{
+	unsigned long flags;
+	int ret = -EBUSY;
+
+	spin_lock_irqsave(&cnt->lock, flags);
+	if (cnt->usage <= limit) {
+		cnt->limit = limit;
+		ret = 0;
+	}
+	spin_unlock_irqrestore(&cnt->lock, flags);
+	return ret;
+}

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 2/2] memcg: reduce usage at change limit
  2008-06-17  3:31 ` KAMEZAWA Hiroyuki
@ 2008-06-17  3:36   ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 24+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-17  3:36 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, LKML, balbir, xemul, menage, lizf, yamamoto, nishimura,
	Andrew Morton

Reduce the usage of res_counter at the change of limit.

Changelog v4 -> v5.
 - moved "feedback" alogrithm from res_counter to memcg.

Background:
 - Now, mem->usage is not reduced at limit change. So, the users will see
   usage > limit case in memcg every time. This patch fixes it.

 Before:
 - no usage change at setting limit.
 - setting limit always returns 0 even if usage can never be less than zero.
   (This can happen when memory is locked or there is no swap.)
 - This is BUG, I think.
 After:
 - usage will be less than new limit at limit change.
 - set limit returns -EBUSY when the usage cannot be reduced.


Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

---
 Documentation/controllers/memory.txt |    3 -
 mm/memcontrol.c                      |   68 ++++++++++++++++++++++++++++-------
 2 files changed, 56 insertions(+), 15 deletions(-)

Index: mm-2.6.26-rc5-mm3/mm/memcontrol.c
===================================================================
--- mm-2.6.26-rc5-mm3.orig/mm/memcontrol.c
+++ mm-2.6.26-rc5-mm3/mm/memcontrol.c
@@ -852,18 +852,30 @@ out:
 	css_put(&mem->css);
 	return ret;
 }
+/*
+ * try to set limit and reduce usage if necessary.
+ * returns 0 at success.
+ * returns -EBUSY if memory cannot be dropped.
+ */
 
-static int mem_cgroup_write_strategy(char *buf, unsigned long long *tmp)
+static inline int mem_cgroup_resize_limit(struct cgroup *cont,
+					unsigned long long val)
 {
-	*tmp = memparse(buf, &buf);
-	if (*buf != '\0')
-		return -EINVAL;
+	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
+	int retry_count = 0;
+	int progress;
 
-	/*
-	 * Round up the value to the closest page size
-	 */
-	*tmp = ((*tmp + PAGE_SIZE - 1) >> PAGE_SHIFT) << PAGE_SHIFT;
-	return 0;
+retry:
+	if (!res_counter_set_limit(&memcg->res, val))
+		return 0;
+	if (retry_count == MEM_CGROUP_RECLAIM_RETRIES)
+		return -EBUSY;
+
+	cond_resched();
+	progress = try_to_free_mem_cgroup_pages(memcg, GFP_KERNEL);
+	if (!progress)
+		retry_count++;
+	goto retry;
 }
 
 static u64 mem_cgroup_read(struct cgroup *cont, struct cftype *cft)
@@ -874,11 +886,41 @@ static u64 mem_cgroup_read(struct cgroup
 
 static ssize_t mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
 				struct file *file, const char __user *userbuf,
-				size_t nbytes, loff_t *ppos)
+				size_t bbytes, loff_t *ppos)
 {
-	return res_counter_write(&mem_cgroup_from_cont(cont)->res,
-				cft->private, userbuf, nbytes, ppos,
-				mem_cgroup_write_strategy);
+	char *buf, *end;
+	unsigned long long val;
+	int ret;
+
+	buf = kmalloc(bbytes + 1, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+	buf[bbytes] = '\0';
+	ret = -EFAULT;
+	if (copy_from_user(buf, userbuf, bbytes))
+		goto out;
+
+	ret = -EINVAL;
+	strstrip(buf);
+	val = memparse(buf, &end);
+	if (*end != '\0')
+		goto out;
+	/* Round to page size */
+	val = ((val + PAGE_SIZE - 1) >> PAGE_SHIFT) << PAGE_SHIFT;
+
+	switch(cft->private) {
+	case RES_LIMIT:
+		ret = mem_cgroup_resize_limit(cont, val);
+		break;
+	default:
+		ret = -EINVAL;
+		goto out;
+	}
+	if (!ret)
+		ret = bbytes;
+out:
+	kfree(buf);
+	return ret;
 }
 
 static int mem_cgroup_reset(struct cgroup *cont, unsigned int event)
Index: mm-2.6.26-rc5-mm3/Documentation/controllers/memory.txt
===================================================================
--- mm-2.6.26-rc5-mm3.orig/Documentation/controllers/memory.txt
+++ mm-2.6.26-rc5-mm3/Documentation/controllers/memory.txt
@@ -242,8 +242,7 @@ rmdir() if there are no tasks.
 1. Add support for accounting huge pages (as a separate controller)
 2. Make per-cgroup scanner reclaim not-shared pages first
 3. Teach controller to account for shared-pages
-4. Start reclamation when the limit is lowered
-5. Start reclamation in the background when the limit is
+4. Start reclamation in the background when the limit is
    not yet hit but the usage is getting closer
 
 Summary


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 2/2] memcg: reduce usage at change limit
@ 2008-06-17  3:36   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 24+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-17  3:36 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, LKML, balbir, xemul, menage, lizf, yamamoto, nishimura,
	Andrew Morton

Reduce the usage of res_counter at the change of limit.

Changelog v4 -> v5.
 - moved "feedback" alogrithm from res_counter to memcg.

Background:
 - Now, mem->usage is not reduced at limit change. So, the users will see
   usage > limit case in memcg every time. This patch fixes it.

 Before:
 - no usage change at setting limit.
 - setting limit always returns 0 even if usage can never be less than zero.
   (This can happen when memory is locked or there is no swap.)
 - This is BUG, I think.
 After:
 - usage will be less than new limit at limit change.
 - set limit returns -EBUSY when the usage cannot be reduced.


Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

---
 Documentation/controllers/memory.txt |    3 -
 mm/memcontrol.c                      |   68 ++++++++++++++++++++++++++++-------
 2 files changed, 56 insertions(+), 15 deletions(-)

Index: mm-2.6.26-rc5-mm3/mm/memcontrol.c
===================================================================
--- mm-2.6.26-rc5-mm3.orig/mm/memcontrol.c
+++ mm-2.6.26-rc5-mm3/mm/memcontrol.c
@@ -852,18 +852,30 @@ out:
 	css_put(&mem->css);
 	return ret;
 }
+/*
+ * try to set limit and reduce usage if necessary.
+ * returns 0 at success.
+ * returns -EBUSY if memory cannot be dropped.
+ */
 
-static int mem_cgroup_write_strategy(char *buf, unsigned long long *tmp)
+static inline int mem_cgroup_resize_limit(struct cgroup *cont,
+					unsigned long long val)
 {
-	*tmp = memparse(buf, &buf);
-	if (*buf != '\0')
-		return -EINVAL;
+	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
+	int retry_count = 0;
+	int progress;
 
-	/*
-	 * Round up the value to the closest page size
-	 */
-	*tmp = ((*tmp + PAGE_SIZE - 1) >> PAGE_SHIFT) << PAGE_SHIFT;
-	return 0;
+retry:
+	if (!res_counter_set_limit(&memcg->res, val))
+		return 0;
+	if (retry_count == MEM_CGROUP_RECLAIM_RETRIES)
+		return -EBUSY;
+
+	cond_resched();
+	progress = try_to_free_mem_cgroup_pages(memcg, GFP_KERNEL);
+	if (!progress)
+		retry_count++;
+	goto retry;
 }
 
 static u64 mem_cgroup_read(struct cgroup *cont, struct cftype *cft)
@@ -874,11 +886,41 @@ static u64 mem_cgroup_read(struct cgroup
 
 static ssize_t mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
 				struct file *file, const char __user *userbuf,
-				size_t nbytes, loff_t *ppos)
+				size_t bbytes, loff_t *ppos)
 {
-	return res_counter_write(&mem_cgroup_from_cont(cont)->res,
-				cft->private, userbuf, nbytes, ppos,
-				mem_cgroup_write_strategy);
+	char *buf, *end;
+	unsigned long long val;
+	int ret;
+
+	buf = kmalloc(bbytes + 1, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+	buf[bbytes] = '\0';
+	ret = -EFAULT;
+	if (copy_from_user(buf, userbuf, bbytes))
+		goto out;
+
+	ret = -EINVAL;
+	strstrip(buf);
+	val = memparse(buf, &end);
+	if (*end != '\0')
+		goto out;
+	/* Round to page size */
+	val = ((val + PAGE_SIZE - 1) >> PAGE_SHIFT) << PAGE_SHIFT;
+
+	switch(cft->private) {
+	case RES_LIMIT:
+		ret = mem_cgroup_resize_limit(cont, val);
+		break;
+	default:
+		ret = -EINVAL;
+		goto out;
+	}
+	if (!ret)
+		ret = bbytes;
+out:
+	kfree(buf);
+	return ret;
 }
 
 static int mem_cgroup_reset(struct cgroup *cont, unsigned int event)
Index: mm-2.6.26-rc5-mm3/Documentation/controllers/memory.txt
===================================================================
--- mm-2.6.26-rc5-mm3.orig/Documentation/controllers/memory.txt
+++ mm-2.6.26-rc5-mm3/Documentation/controllers/memory.txt
@@ -242,8 +242,7 @@ rmdir() if there are no tasks.
 1. Add support for accounting huge pages (as a separate controller)
 2. Make per-cgroup scanner reclaim not-shared pages first
 3. Teach controller to account for shared-pages
-4. Start reclamation when the limit is lowered
-5. Start reclamation in the background when the limit is
+4. Start reclamation in the background when the limit is
    not yet hit but the usage is getting closer
 
 Summary

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/2] memcg: res counter set limit
  2008-06-17  3:33   ` KAMEZAWA Hiroyuki
@ 2008-06-17  3:40     ` Balbir Singh
  -1 siblings, 0 replies; 24+ messages in thread
From: Balbir Singh @ 2008-06-17  3:40 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, LKML, xemul, menage, lizf, yamamoto, nishimura, Andrew Morton

KAMEZAWA Hiroyuki wrote:
> Helper function of res_counter for reducing usage in subsys(memcg).
> 
> Changelog xxx -> v5.
>  - new file.
> 
> Background:
>  In v3, I was asked to implement generic ones to res_counter.
>  In v4. I was asked not to implement generic ones because memcg is only
>  controller which can reduce usage by the kernel. Okay, maybe make sense.
>  In this version, adds only necessary helpers to res_counter in 
>  not-invasive manner.
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> ---
>  include/linux/res_counter.h |    7 +++++++
>  kernel/res_counter.c        |   24 ++++++++++++++++++++++++
>  2 files changed, 31 insertions(+)
> 
> Index: mm-2.6.26-rc5-mm3/include/linux/res_counter.h
> ===================================================================
> --- mm-2.6.26-rc5-mm3.orig/include/linux/res_counter.h
> +++ mm-2.6.26-rc5-mm3/include/linux/res_counter.h
> @@ -136,6 +136,12 @@ static inline bool res_counter_check_und
>  	return ret;
>  }
> 
> +/*
> + * set new limit to the val. if usage > val, returns -EBUSY.
> + * returns 0 at success.
> + */
> +int res_counter_set_limit(struct res_counter *cnt, unsigned long long limit);
> +
>  static inline void res_counter_reset_max(struct res_counter *cnt)
>  {
>  	unsigned long flags;
> @@ -153,4 +159,5 @@ static inline void res_counter_reset_fai
>  	cnt->failcnt = 0;
>  	spin_unlock_irqrestore(&cnt->lock, flags);
>  }
> +

I don't understand this extra newline here

>  #endif
> Index: mm-2.6.26-rc5-mm3/kernel/res_counter.c
> ===================================================================
> --- mm-2.6.26-rc5-mm3.orig/kernel/res_counter.c
> +++ mm-2.6.26-rc5-mm3/kernel/res_counter.c
> @@ -143,3 +143,27 @@ out_free:
>  out:
>  	return ret;
>  }
> +
> +
> +/**
> + * res_counter_set_limit - set limit of res_counter.
> + * @cnt: the res_counter
> + * @limit: the new limit
> + *
> + * Note that res_coutner_write() allows the same kind of operation.
> + * But this returns -EBUSY when new limit < usage. If you want strict control
> + * of limit, please use this.
> + */
> +int res_counter_set_limit(struct res_counter *cnt, unsigned long long limit)
> +{
> +	unsigned long flags;
> +	int ret = -EBUSY;
> +
> +	spin_lock_irqsave(&cnt->lock, flags);
> +	if (cnt->usage <= limit) {
> +		cnt->limit = limit;
> +		ret = 0;
> +	}
> +	spin_unlock_irqrestore(&cnt->lock, flags);
> +	return ret;
> +}

Reviewed-by: Balbir Singh <balbir@linux.vnet.ibm.com>

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/2] memcg: res counter set limit
@ 2008-06-17  3:40     ` Balbir Singh
  0 siblings, 0 replies; 24+ messages in thread
From: Balbir Singh @ 2008-06-17  3:40 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, LKML, xemul, menage, lizf, yamamoto, nishimura, Andrew Morton

KAMEZAWA Hiroyuki wrote:
> Helper function of res_counter for reducing usage in subsys(memcg).
> 
> Changelog xxx -> v5.
>  - new file.
> 
> Background:
>  In v3, I was asked to implement generic ones to res_counter.
>  In v4. I was asked not to implement generic ones because memcg is only
>  controller which can reduce usage by the kernel. Okay, maybe make sense.
>  In this version, adds only necessary helpers to res_counter in 
>  not-invasive manner.
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> ---
>  include/linux/res_counter.h |    7 +++++++
>  kernel/res_counter.c        |   24 ++++++++++++++++++++++++
>  2 files changed, 31 insertions(+)
> 
> Index: mm-2.6.26-rc5-mm3/include/linux/res_counter.h
> ===================================================================
> --- mm-2.6.26-rc5-mm3.orig/include/linux/res_counter.h
> +++ mm-2.6.26-rc5-mm3/include/linux/res_counter.h
> @@ -136,6 +136,12 @@ static inline bool res_counter_check_und
>  	return ret;
>  }
> 
> +/*
> + * set new limit to the val. if usage > val, returns -EBUSY.
> + * returns 0 at success.
> + */
> +int res_counter_set_limit(struct res_counter *cnt, unsigned long long limit);
> +
>  static inline void res_counter_reset_max(struct res_counter *cnt)
>  {
>  	unsigned long flags;
> @@ -153,4 +159,5 @@ static inline void res_counter_reset_fai
>  	cnt->failcnt = 0;
>  	spin_unlock_irqrestore(&cnt->lock, flags);
>  }
> +

I don't understand this extra newline here

>  #endif
> Index: mm-2.6.26-rc5-mm3/kernel/res_counter.c
> ===================================================================
> --- mm-2.6.26-rc5-mm3.orig/kernel/res_counter.c
> +++ mm-2.6.26-rc5-mm3/kernel/res_counter.c
> @@ -143,3 +143,27 @@ out_free:
>  out:
>  	return ret;
>  }
> +
> +
> +/**
> + * res_counter_set_limit - set limit of res_counter.
> + * @cnt: the res_counter
> + * @limit: the new limit
> + *
> + * Note that res_coutner_write() allows the same kind of operation.
> + * But this returns -EBUSY when new limit < usage. If you want strict control
> + * of limit, please use this.
> + */
> +int res_counter_set_limit(struct res_counter *cnt, unsigned long long limit)
> +{
> +	unsigned long flags;
> +	int ret = -EBUSY;
> +
> +	spin_lock_irqsave(&cnt->lock, flags);
> +	if (cnt->usage <= limit) {
> +		cnt->limit = limit;
> +		ret = 0;
> +	}
> +	spin_unlock_irqrestore(&cnt->lock, flags);
> +	return ret;
> +}

Reviewed-by: Balbir Singh <balbir@linux.vnet.ibm.com>

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] memcg: reduce usage at change limit
  2008-06-17  3:36   ` KAMEZAWA Hiroyuki
@ 2008-06-17  3:46     ` Balbir Singh
  -1 siblings, 0 replies; 24+ messages in thread
From: Balbir Singh @ 2008-06-17  3:46 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, LKML, xemul, menage, lizf, yamamoto, nishimura, Andrew Morton

KAMEZAWA Hiroyuki wrote:
> Reduce the usage of res_counter at the change of limit.
> 
> Changelog v4 -> v5.
>  - moved "feedback" alogrithm from res_counter to memcg.
> 
> Background:
>  - Now, mem->usage is not reduced at limit change. So, the users will see
>    usage > limit case in memcg every time. This patch fixes it.
> 
>  Before:
>  - no usage change at setting limit.
>  - setting limit always returns 0 even if usage can never be less than zero.
>    (This can happen when memory is locked or there is no swap.)
>  - This is BUG, I think.
>  After:
>  - usage will be less than new limit at limit change.
>  - set limit returns -EBUSY when the usage cannot be reduced.
> 
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> ---
>  Documentation/controllers/memory.txt |    3 -
>  mm/memcontrol.c                      |   68 ++++++++++++++++++++++++++++-------
>  2 files changed, 56 insertions(+), 15 deletions(-)
> 
> Index: mm-2.6.26-rc5-mm3/mm/memcontrol.c
> ===================================================================
> --- mm-2.6.26-rc5-mm3.orig/mm/memcontrol.c
> +++ mm-2.6.26-rc5-mm3/mm/memcontrol.c
> @@ -852,18 +852,30 @@ out:
>  	css_put(&mem->css);
>  	return ret;
>  }
> +/*
> + * try to set limit and reduce usage if necessary.
> + * returns 0 at success.
> + * returns -EBUSY if memory cannot be dropped.
> + */
> 
> -static int mem_cgroup_write_strategy(char *buf, unsigned long long *tmp)
> +static inline int mem_cgroup_resize_limit(struct cgroup *cont,
> +					unsigned long long val)
>  {
> -	*tmp = memparse(buf, &buf);
> -	if (*buf != '\0')
> -		return -EINVAL;
> +	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
> +	int retry_count = 0;
> +	int progress;
> 
> -	/*
> -	 * Round up the value to the closest page size
> -	 */
> -	*tmp = ((*tmp + PAGE_SIZE - 1) >> PAGE_SHIFT) << PAGE_SHIFT;
> -	return 0;
> +retry:
> +	if (!res_counter_set_limit(&memcg->res, val))
> +		return 0;
> +	if (retry_count == MEM_CGROUP_RECLAIM_RETRIES)
> +		return -EBUSY;
> +
> +	cond_resched();

Do we really need this? We do have cond_resched in shrink_page_list(),
shrink_active_list(), do we need it here as well?

> +	progress = try_to_free_mem_cgroup_pages(memcg, GFP_KERNEL);
> +	if (!progress)
> +		retry_count++;
> +	goto retry;

I don't like upward goto's. Can't we convert this to a nice do {} while or
while() loop?

>  }
> 
>  static u64 mem_cgroup_read(struct cgroup *cont, struct cftype *cft)
> @@ -874,11 +886,41 @@ static u64 mem_cgroup_read(struct cgroup
> 
>  static ssize_t mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
>  				struct file *file, const char __user *userbuf,
> -				size_t nbytes, loff_t *ppos)
> +				size_t bbytes, loff_t *ppos)
>  {
> -	return res_counter_write(&mem_cgroup_from_cont(cont)->res,
> -				cft->private, userbuf, nbytes, ppos,
> -				mem_cgroup_write_strategy);
> +	char *buf, *end;
> +	unsigned long long val;
> +	int ret;
> +
> +	buf = kmalloc(bbytes + 1, GFP_KERNEL);
> +	if (!buf)
> +		return -ENOMEM;
> +	buf[bbytes] = '\0';
> +	ret = -EFAULT;
> +	if (copy_from_user(buf, userbuf, bbytes))
> +		goto out;
> +
> +	ret = -EINVAL;
> +	strstrip(buf);
> +	val = memparse(buf, &end);
> +	if (*end != '\0')
> +		goto out;
> +	/* Round to page size */
> +	val = ((val + PAGE_SIZE - 1) >> PAGE_SHIFT) << PAGE_SHIFT;
> +
> +	switch(cft->private) {
> +	case RES_LIMIT:
> +		ret = mem_cgroup_resize_limit(cont, val);
> +		break;
> +	default:
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +	if (!ret)
> +		ret = bbytes;
> +out:
> +	kfree(buf);
> +	return ret;
>  }
> 
>  static int mem_cgroup_reset(struct cgroup *cont, unsigned int event)
> Index: mm-2.6.26-rc5-mm3/Documentation/controllers/memory.txt
> ===================================================================
> --- mm-2.6.26-rc5-mm3.orig/Documentation/controllers/memory.txt
> +++ mm-2.6.26-rc5-mm3/Documentation/controllers/memory.txt
> @@ -242,8 +242,7 @@ rmdir() if there are no tasks.
>  1. Add support for accounting huge pages (as a separate controller)
>  2. Make per-cgroup scanner reclaim not-shared pages first
>  3. Teach controller to account for shared-pages
> -4. Start reclamation when the limit is lowered
> -5. Start reclamation in the background when the limit is
> +4. Start reclamation in the background when the limit is
>     not yet hit but the usage is getting closer

Except for the minor nits

Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] memcg: reduce usage at change limit
@ 2008-06-17  3:46     ` Balbir Singh
  0 siblings, 0 replies; 24+ messages in thread
From: Balbir Singh @ 2008-06-17  3:46 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, LKML, xemul, menage, lizf, yamamoto, nishimura, Andrew Morton

KAMEZAWA Hiroyuki wrote:
> Reduce the usage of res_counter at the change of limit.
> 
> Changelog v4 -> v5.
>  - moved "feedback" alogrithm from res_counter to memcg.
> 
> Background:
>  - Now, mem->usage is not reduced at limit change. So, the users will see
>    usage > limit case in memcg every time. This patch fixes it.
> 
>  Before:
>  - no usage change at setting limit.
>  - setting limit always returns 0 even if usage can never be less than zero.
>    (This can happen when memory is locked or there is no swap.)
>  - This is BUG, I think.
>  After:
>  - usage will be less than new limit at limit change.
>  - set limit returns -EBUSY when the usage cannot be reduced.
> 
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> ---
>  Documentation/controllers/memory.txt |    3 -
>  mm/memcontrol.c                      |   68 ++++++++++++++++++++++++++++-------
>  2 files changed, 56 insertions(+), 15 deletions(-)
> 
> Index: mm-2.6.26-rc5-mm3/mm/memcontrol.c
> ===================================================================
> --- mm-2.6.26-rc5-mm3.orig/mm/memcontrol.c
> +++ mm-2.6.26-rc5-mm3/mm/memcontrol.c
> @@ -852,18 +852,30 @@ out:
>  	css_put(&mem->css);
>  	return ret;
>  }
> +/*
> + * try to set limit and reduce usage if necessary.
> + * returns 0 at success.
> + * returns -EBUSY if memory cannot be dropped.
> + */
> 
> -static int mem_cgroup_write_strategy(char *buf, unsigned long long *tmp)
> +static inline int mem_cgroup_resize_limit(struct cgroup *cont,
> +					unsigned long long val)
>  {
> -	*tmp = memparse(buf, &buf);
> -	if (*buf != '\0')
> -		return -EINVAL;
> +	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
> +	int retry_count = 0;
> +	int progress;
> 
> -	/*
> -	 * Round up the value to the closest page size
> -	 */
> -	*tmp = ((*tmp + PAGE_SIZE - 1) >> PAGE_SHIFT) << PAGE_SHIFT;
> -	return 0;
> +retry:
> +	if (!res_counter_set_limit(&memcg->res, val))
> +		return 0;
> +	if (retry_count == MEM_CGROUP_RECLAIM_RETRIES)
> +		return -EBUSY;
> +
> +	cond_resched();

Do we really need this? We do have cond_resched in shrink_page_list(),
shrink_active_list(), do we need it here as well?

> +	progress = try_to_free_mem_cgroup_pages(memcg, GFP_KERNEL);
> +	if (!progress)
> +		retry_count++;
> +	goto retry;

I don't like upward goto's. Can't we convert this to a nice do {} while or
while() loop?

>  }
> 
>  static u64 mem_cgroup_read(struct cgroup *cont, struct cftype *cft)
> @@ -874,11 +886,41 @@ static u64 mem_cgroup_read(struct cgroup
> 
>  static ssize_t mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
>  				struct file *file, const char __user *userbuf,
> -				size_t nbytes, loff_t *ppos)
> +				size_t bbytes, loff_t *ppos)
>  {
> -	return res_counter_write(&mem_cgroup_from_cont(cont)->res,
> -				cft->private, userbuf, nbytes, ppos,
> -				mem_cgroup_write_strategy);
> +	char *buf, *end;
> +	unsigned long long val;
> +	int ret;
> +
> +	buf = kmalloc(bbytes + 1, GFP_KERNEL);
> +	if (!buf)
> +		return -ENOMEM;
> +	buf[bbytes] = '\0';
> +	ret = -EFAULT;
> +	if (copy_from_user(buf, userbuf, bbytes))
> +		goto out;
> +
> +	ret = -EINVAL;
> +	strstrip(buf);
> +	val = memparse(buf, &end);
> +	if (*end != '\0')
> +		goto out;
> +	/* Round to page size */
> +	val = ((val + PAGE_SIZE - 1) >> PAGE_SHIFT) << PAGE_SHIFT;
> +
> +	switch(cft->private) {
> +	case RES_LIMIT:
> +		ret = mem_cgroup_resize_limit(cont, val);
> +		break;
> +	default:
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +	if (!ret)
> +		ret = bbytes;
> +out:
> +	kfree(buf);
> +	return ret;
>  }
> 
>  static int mem_cgroup_reset(struct cgroup *cont, unsigned int event)
> Index: mm-2.6.26-rc5-mm3/Documentation/controllers/memory.txt
> ===================================================================
> --- mm-2.6.26-rc5-mm3.orig/Documentation/controllers/memory.txt
> +++ mm-2.6.26-rc5-mm3/Documentation/controllers/memory.txt
> @@ -242,8 +242,7 @@ rmdir() if there are no tasks.
>  1. Add support for accounting huge pages (as a separate controller)
>  2. Make per-cgroup scanner reclaim not-shared pages first
>  3. Teach controller to account for shared-pages
> -4. Start reclamation when the limit is lowered
> -5. Start reclamation in the background when the limit is
> +4. Start reclamation in the background when the limit is
>     not yet hit but the usage is getting closer

Except for the minor nits

Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/2] memcg: res counter set limit
  2008-06-17  3:40     ` Balbir Singh
@ 2008-06-17  4:05       ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 24+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-17  4:05 UTC (permalink / raw)
  To: balbir
  Cc: linux-mm, LKML, xemul, menage, lizf, yamamoto, nishimura, Andrew Morton

On Tue, 17 Jun 2008 09:10:38 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> KAMEZAWA Hiroyuki wrote:
> > Helper function of res_counter for reducing usage in subsys(memcg).
> > 
> > Changelog xxx -> v5.
> >  - new file.
> > 
> > Background:
> >  In v3, I was asked to implement generic ones to res_counter.
> >  In v4. I was asked not to implement generic ones because memcg is only
> >  controller which can reduce usage by the kernel. Okay, maybe make sense.
> >  In this version, adds only necessary helpers to res_counter in 
> >  not-invasive manner.
> > 
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > 
> > ---
> >  include/linux/res_counter.h |    7 +++++++
> >  kernel/res_counter.c        |   24 ++++++++++++++++++++++++
> >  2 files changed, 31 insertions(+)
> > 
> > Index: mm-2.6.26-rc5-mm3/include/linux/res_counter.h
> > ===================================================================
> > --- mm-2.6.26-rc5-mm3.orig/include/linux/res_counter.h
> > +++ mm-2.6.26-rc5-mm3/include/linux/res_counter.h
> > @@ -136,6 +136,12 @@ static inline bool res_counter_check_und
> >  	return ret;
> >  }
> > 
> > +/*
> > + * set new limit to the val. if usage > val, returns -EBUSY.
> > + * returns 0 at success.
> > + */
> > +int res_counter_set_limit(struct res_counter *cnt, unsigned long long limit);
> > +
> >  static inline void res_counter_reset_max(struct res_counter *cnt)
> >  {
> >  	unsigned long flags;
> > @@ -153,4 +159,5 @@ static inline void res_counter_reset_fai
> >  	cnt->failcnt = 0;
> >  	spin_unlock_irqrestore(&cnt->lock, flags);
> >  }
> > +
> 
> I don't understand this extra newline here
> 
Sorry. I'll remove this ;)

Thanks,
-Kame


> >  #endif
> > Index: mm-2.6.26-rc5-mm3/kernel/res_counter.c
> > ===================================================================
> > --- mm-2.6.26-rc5-mm3.orig/kernel/res_counter.c
> > +++ mm-2.6.26-rc5-mm3/kernel/res_counter.c
> > @@ -143,3 +143,27 @@ out_free:
> >  out:
> >  	return ret;
> >  }
> > +
> > +
> > +/**
> > + * res_counter_set_limit - set limit of res_counter.
> > + * @cnt: the res_counter
> > + * @limit: the new limit
> > + *
> > + * Note that res_coutner_write() allows the same kind of operation.
> > + * But this returns -EBUSY when new limit < usage. If you want strict control
> > + * of limit, please use this.
> > + */
> > +int res_counter_set_limit(struct res_counter *cnt, unsigned long long limit)
> > +{
> > +	unsigned long flags;
> > +	int ret = -EBUSY;
> > +
> > +	spin_lock_irqsave(&cnt->lock, flags);
> > +	if (cnt->usage <= limit) {
> > +		cnt->limit = limit;
> > +		ret = 0;
> > +	}
> > +	spin_unlock_irqrestore(&cnt->lock, flags);
> > +	return ret;
> > +}
> 
> Reviewed-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> 
> -- 
> 	Warm Regards,
> 	Balbir Singh
> 	Linux Technology Center
> 	IBM, ISTL
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/2] memcg: res counter set limit
@ 2008-06-17  4:05       ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 24+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-17  4:05 UTC (permalink / raw)
  To: balbir
  Cc: linux-mm, LKML, xemul, menage, lizf, yamamoto, nishimura, Andrew Morton

On Tue, 17 Jun 2008 09:10:38 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> KAMEZAWA Hiroyuki wrote:
> > Helper function of res_counter for reducing usage in subsys(memcg).
> > 
> > Changelog xxx -> v5.
> >  - new file.
> > 
> > Background:
> >  In v3, I was asked to implement generic ones to res_counter.
> >  In v4. I was asked not to implement generic ones because memcg is only
> >  controller which can reduce usage by the kernel. Okay, maybe make sense.
> >  In this version, adds only necessary helpers to res_counter in 
> >  not-invasive manner.
> > 
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > 
> > ---
> >  include/linux/res_counter.h |    7 +++++++
> >  kernel/res_counter.c        |   24 ++++++++++++++++++++++++
> >  2 files changed, 31 insertions(+)
> > 
> > Index: mm-2.6.26-rc5-mm3/include/linux/res_counter.h
> > ===================================================================
> > --- mm-2.6.26-rc5-mm3.orig/include/linux/res_counter.h
> > +++ mm-2.6.26-rc5-mm3/include/linux/res_counter.h
> > @@ -136,6 +136,12 @@ static inline bool res_counter_check_und
> >  	return ret;
> >  }
> > 
> > +/*
> > + * set new limit to the val. if usage > val, returns -EBUSY.
> > + * returns 0 at success.
> > + */
> > +int res_counter_set_limit(struct res_counter *cnt, unsigned long long limit);
> > +
> >  static inline void res_counter_reset_max(struct res_counter *cnt)
> >  {
> >  	unsigned long flags;
> > @@ -153,4 +159,5 @@ static inline void res_counter_reset_fai
> >  	cnt->failcnt = 0;
> >  	spin_unlock_irqrestore(&cnt->lock, flags);
> >  }
> > +
> 
> I don't understand this extra newline here
> 
Sorry. I'll remove this ;)

Thanks,
-Kame


> >  #endif
> > Index: mm-2.6.26-rc5-mm3/kernel/res_counter.c
> > ===================================================================
> > --- mm-2.6.26-rc5-mm3.orig/kernel/res_counter.c
> > +++ mm-2.6.26-rc5-mm3/kernel/res_counter.c
> > @@ -143,3 +143,27 @@ out_free:
> >  out:
> >  	return ret;
> >  }
> > +
> > +
> > +/**
> > + * res_counter_set_limit - set limit of res_counter.
> > + * @cnt: the res_counter
> > + * @limit: the new limit
> > + *
> > + * Note that res_coutner_write() allows the same kind of operation.
> > + * But this returns -EBUSY when new limit < usage. If you want strict control
> > + * of limit, please use this.
> > + */
> > +int res_counter_set_limit(struct res_counter *cnt, unsigned long long limit)
> > +{
> > +	unsigned long flags;
> > +	int ret = -EBUSY;
> > +
> > +	spin_lock_irqsave(&cnt->lock, flags);
> > +	if (cnt->usage <= limit) {
> > +		cnt->limit = limit;
> > +		ret = 0;
> > +	}
> > +	spin_unlock_irqrestore(&cnt->lock, flags);
> > +	return ret;
> > +}
> 
> Reviewed-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> 
> -- 
> 	Warm Regards,
> 	Balbir Singh
> 	Linux Technology Center
> 	IBM, ISTL
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] memcg: reduce usage at change limit
  2008-06-17  3:46     ` Balbir Singh
@ 2008-06-17  4:06       ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 24+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-17  4:06 UTC (permalink / raw)
  To: balbir
  Cc: linux-mm, LKML, xemul, menage, lizf, yamamoto, nishimura, Andrew Morton

On Tue, 17 Jun 2008 09:16:31 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> KAMEZAWA Hiroyuki wrote:
> > Reduce the usage of res_counter at the change of limit.
> > 
> > Changelog v4 -> v5.
> >  - moved "feedback" alogrithm from res_counter to memcg.
> > 
> > Background:
> >  - Now, mem->usage is not reduced at limit change. So, the users will see
> >    usage > limit case in memcg every time. This patch fixes it.
> > 
> >  Before:
> >  - no usage change at setting limit.
> >  - setting limit always returns 0 even if usage can never be less than zero.
> >    (This can happen when memory is locked or there is no swap.)
> >  - This is BUG, I think.
> >  After:
> >  - usage will be less than new limit at limit change.
> >  - set limit returns -EBUSY when the usage cannot be reduced.
> > 
> > 
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > 
> > ---
> >  Documentation/controllers/memory.txt |    3 -
> >  mm/memcontrol.c                      |   68 ++++++++++++++++++++++++++++-------
> >  2 files changed, 56 insertions(+), 15 deletions(-)
> > 
> > Index: mm-2.6.26-rc5-mm3/mm/memcontrol.c
> > ===================================================================
> > --- mm-2.6.26-rc5-mm3.orig/mm/memcontrol.c
> > +++ mm-2.6.26-rc5-mm3/mm/memcontrol.c
> > @@ -852,18 +852,30 @@ out:
> >  	css_put(&mem->css);
> >  	return ret;
> >  }
> > +/*
> > + * try to set limit and reduce usage if necessary.
> > + * returns 0 at success.
> > + * returns -EBUSY if memory cannot be dropped.
> > + */
> > 
> > -static int mem_cgroup_write_strategy(char *buf, unsigned long long *tmp)
> > +static inline int mem_cgroup_resize_limit(struct cgroup *cont,
> > +					unsigned long long val)
> >  {
> > -	*tmp = memparse(buf, &buf);
> > -	if (*buf != '\0')
> > -		return -EINVAL;
> > +	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
> > +	int retry_count = 0;
> > +	int progress;
> > 
> > -	/*
> > -	 * Round up the value to the closest page size
> > -	 */
> > -	*tmp = ((*tmp + PAGE_SIZE - 1) >> PAGE_SHIFT) << PAGE_SHIFT;
> > -	return 0;
> > +retry:
> > +	if (!res_counter_set_limit(&memcg->res, val))
> > +		return 0;
> > +	if (retry_count == MEM_CGROUP_RECLAIM_RETRIES)
> > +		return -EBUSY;
> > +
> > +	cond_resched();
> 
> Do we really need this? We do have cond_resched in shrink_page_list(),
> shrink_active_list(), do we need it here as well?
> 
I'd like to add this when adding a busy loop. But ok, will remove.

> > +	progress = try_to_free_mem_cgroup_pages(memcg, GFP_KERNEL);
> > +	if (!progress)
> > +		retry_count++;
> > +	goto retry;
> 
> I don't like upward goto's. Can't we convert this to a nice do {} while or
> while() loop?
> 
Hmm, ok.

I'll repost later, today.

Thanks,
-Kame
> >  }
> > 
> >  static u64 mem_cgroup_read(struct cgroup *cont, struct cftype *cft)
> > @@ -874,11 +886,41 @@ static u64 mem_cgroup_read(struct cgroup
> > 
> >  static ssize_t mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
> >  				struct file *file, const char __user *userbuf,
> > -				size_t nbytes, loff_t *ppos)
> > +				size_t bbytes, loff_t *ppos)
> >  {
> > -	return res_counter_write(&mem_cgroup_from_cont(cont)->res,
> > -				cft->private, userbuf, nbytes, ppos,
> > -				mem_cgroup_write_strategy);
> > +	char *buf, *end;
> > +	unsigned long long val;
> > +	int ret;
> > +
> > +	buf = kmalloc(bbytes + 1, GFP_KERNEL);
> > +	if (!buf)
> > +		return -ENOMEM;
> > +	buf[bbytes] = '\0';
> > +	ret = -EFAULT;
> > +	if (copy_from_user(buf, userbuf, bbytes))
> > +		goto out;
> > +
> > +	ret = -EINVAL;
> > +	strstrip(buf);
> > +	val = memparse(buf, &end);
> > +	if (*end != '\0')
> > +		goto out;
> > +	/* Round to page size */
> > +	val = ((val + PAGE_SIZE - 1) >> PAGE_SHIFT) << PAGE_SHIFT;
> > +
> > +	switch(cft->private) {
> > +	case RES_LIMIT:
> > +		ret = mem_cgroup_resize_limit(cont, val);
> > +		break;
> > +	default:
> > +		ret = -EINVAL;
> > +		goto out;
> > +	}
> > +	if (!ret)
> > +		ret = bbytes;
> > +out:
> > +	kfree(buf);
> > +	return ret;
> >  }
> > 
> >  static int mem_cgroup_reset(struct cgroup *cont, unsigned int event)
> > Index: mm-2.6.26-rc5-mm3/Documentation/controllers/memory.txt
> > ===================================================================
> > --- mm-2.6.26-rc5-mm3.orig/Documentation/controllers/memory.txt
> > +++ mm-2.6.26-rc5-mm3/Documentation/controllers/memory.txt
> > @@ -242,8 +242,7 @@ rmdir() if there are no tasks.
> >  1. Add support for accounting huge pages (as a separate controller)
> >  2. Make per-cgroup scanner reclaim not-shared pages first
> >  3. Teach controller to account for shared-pages
> > -4. Start reclamation when the limit is lowered
> > -5. Start reclamation in the background when the limit is
> > +4. Start reclamation in the background when the limit is
> >     not yet hit but the usage is getting closer
> 
> Except for the minor nits
> 
> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> 
> -- 
> 	Warm Regards,
> 	Balbir Singh
> 	Linux Technology Center
> 	IBM, ISTL
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] memcg: reduce usage at change limit
@ 2008-06-17  4:06       ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 24+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-17  4:06 UTC (permalink / raw)
  To: balbir
  Cc: linux-mm, LKML, xemul, menage, lizf, yamamoto, nishimura, Andrew Morton

On Tue, 17 Jun 2008 09:16:31 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> KAMEZAWA Hiroyuki wrote:
> > Reduce the usage of res_counter at the change of limit.
> > 
> > Changelog v4 -> v5.
> >  - moved "feedback" alogrithm from res_counter to memcg.
> > 
> > Background:
> >  - Now, mem->usage is not reduced at limit change. So, the users will see
> >    usage > limit case in memcg every time. This patch fixes it.
> > 
> >  Before:
> >  - no usage change at setting limit.
> >  - setting limit always returns 0 even if usage can never be less than zero.
> >    (This can happen when memory is locked or there is no swap.)
> >  - This is BUG, I think.
> >  After:
> >  - usage will be less than new limit at limit change.
> >  - set limit returns -EBUSY when the usage cannot be reduced.
> > 
> > 
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > 
> > ---
> >  Documentation/controllers/memory.txt |    3 -
> >  mm/memcontrol.c                      |   68 ++++++++++++++++++++++++++++-------
> >  2 files changed, 56 insertions(+), 15 deletions(-)
> > 
> > Index: mm-2.6.26-rc5-mm3/mm/memcontrol.c
> > ===================================================================
> > --- mm-2.6.26-rc5-mm3.orig/mm/memcontrol.c
> > +++ mm-2.6.26-rc5-mm3/mm/memcontrol.c
> > @@ -852,18 +852,30 @@ out:
> >  	css_put(&mem->css);
> >  	return ret;
> >  }
> > +/*
> > + * try to set limit and reduce usage if necessary.
> > + * returns 0 at success.
> > + * returns -EBUSY if memory cannot be dropped.
> > + */
> > 
> > -static int mem_cgroup_write_strategy(char *buf, unsigned long long *tmp)
> > +static inline int mem_cgroup_resize_limit(struct cgroup *cont,
> > +					unsigned long long val)
> >  {
> > -	*tmp = memparse(buf, &buf);
> > -	if (*buf != '\0')
> > -		return -EINVAL;
> > +	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
> > +	int retry_count = 0;
> > +	int progress;
> > 
> > -	/*
> > -	 * Round up the value to the closest page size
> > -	 */
> > -	*tmp = ((*tmp + PAGE_SIZE - 1) >> PAGE_SHIFT) << PAGE_SHIFT;
> > -	return 0;
> > +retry:
> > +	if (!res_counter_set_limit(&memcg->res, val))
> > +		return 0;
> > +	if (retry_count == MEM_CGROUP_RECLAIM_RETRIES)
> > +		return -EBUSY;
> > +
> > +	cond_resched();
> 
> Do we really need this? We do have cond_resched in shrink_page_list(),
> shrink_active_list(), do we need it here as well?
> 
I'd like to add this when adding a busy loop. But ok, will remove.

> > +	progress = try_to_free_mem_cgroup_pages(memcg, GFP_KERNEL);
> > +	if (!progress)
> > +		retry_count++;
> > +	goto retry;
> 
> I don't like upward goto's. Can't we convert this to a nice do {} while or
> while() loop?
> 
Hmm, ok.

I'll repost later, today.

Thanks,
-Kame
> >  }
> > 
> >  static u64 mem_cgroup_read(struct cgroup *cont, struct cftype *cft)
> > @@ -874,11 +886,41 @@ static u64 mem_cgroup_read(struct cgroup
> > 
> >  static ssize_t mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
> >  				struct file *file, const char __user *userbuf,
> > -				size_t nbytes, loff_t *ppos)
> > +				size_t bbytes, loff_t *ppos)
> >  {
> > -	return res_counter_write(&mem_cgroup_from_cont(cont)->res,
> > -				cft->private, userbuf, nbytes, ppos,
> > -				mem_cgroup_write_strategy);
> > +	char *buf, *end;
> > +	unsigned long long val;
> > +	int ret;
> > +
> > +	buf = kmalloc(bbytes + 1, GFP_KERNEL);
> > +	if (!buf)
> > +		return -ENOMEM;
> > +	buf[bbytes] = '\0';
> > +	ret = -EFAULT;
> > +	if (copy_from_user(buf, userbuf, bbytes))
> > +		goto out;
> > +
> > +	ret = -EINVAL;
> > +	strstrip(buf);
> > +	val = memparse(buf, &end);
> > +	if (*end != '\0')
> > +		goto out;
> > +	/* Round to page size */
> > +	val = ((val + PAGE_SIZE - 1) >> PAGE_SHIFT) << PAGE_SHIFT;
> > +
> > +	switch(cft->private) {
> > +	case RES_LIMIT:
> > +		ret = mem_cgroup_resize_limit(cont, val);
> > +		break;
> > +	default:
> > +		ret = -EINVAL;
> > +		goto out;
> > +	}
> > +	if (!ret)
> > +		ret = bbytes;
> > +out:
> > +	kfree(buf);
> > +	return ret;
> >  }
> > 
> >  static int mem_cgroup_reset(struct cgroup *cont, unsigned int event)
> > Index: mm-2.6.26-rc5-mm3/Documentation/controllers/memory.txt
> > ===================================================================
> > --- mm-2.6.26-rc5-mm3.orig/Documentation/controllers/memory.txt
> > +++ mm-2.6.26-rc5-mm3/Documentation/controllers/memory.txt
> > @@ -242,8 +242,7 @@ rmdir() if there are no tasks.
> >  1. Add support for accounting huge pages (as a separate controller)
> >  2. Make per-cgroup scanner reclaim not-shared pages first
> >  3. Teach controller to account for shared-pages
> > -4. Start reclamation when the limit is lowered
> > -5. Start reclamation in the background when the limit is
> > +4. Start reclamation in the background when the limit is
> >     not yet hit but the usage is getting closer
> 
> Except for the minor nits
> 
> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> 
> -- 
> 	Warm Regards,
> 	Balbir Singh
> 	Linux Technology Center
> 	IBM, ISTL
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] memcg: reduce usage at change limit
  2008-06-17  4:06       ` KAMEZAWA Hiroyuki
@ 2008-06-17 10:00         ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 24+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-17 10:00 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: balbir, linux-mm, LKML, xemul, menage, lizf, yamamoto, nishimura,
	Andrew Morton

On Tue, 17 Jun 2008 13:06:56 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> On Tue, 17 Jun 2008 09:16:31 +0530
> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> 
> > KAMEZAWA Hiroyuki wrote:
> > > Reduce the usage of res_counter at the change of limit.
> > > 
> > > Changelog v4 -> v5.
> > >  - moved "feedback" alogrithm from res_counter to memcg.
> > > 
> > > Background:
> > >  - Now, mem->usage is not reduced at limit change. So, the users will see
> > >    usage > limit case in memcg every time. This patch fixes it.
> > > 
> > >  Before:
> > >  - no usage change at setting limit.
> > >  - setting limit always returns 0 even if usage can never be less than zero.
> > >    (This can happen when memory is locked or there is no swap.)
> > >  - This is BUG, I think.
> > >  After:
> > >  - usage will be less than new limit at limit change.
> > >  - set limit returns -EBUSY when the usage cannot be reduced.
> > > 
> > > 
> > > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > > 
> > > ---
> > >  Documentation/controllers/memory.txt |    3 -
> > >  mm/memcontrol.c                      |   68 ++++++++++++++++++++++++++++-------
> > >  2 files changed, 56 insertions(+), 15 deletions(-)
> > > 
> > > Index: mm-2.6.26-rc5-mm3/mm/memcontrol.c
> > > ===================================================================
> > > --- mm-2.6.26-rc5-mm3.orig/mm/memcontrol.c
> > > +++ mm-2.6.26-rc5-mm3/mm/memcontrol.c
> > > @@ -852,18 +852,30 @@ out:
> > >  	css_put(&mem->css);
> > >  	return ret;
> > >  }
> > > +/*
> > > + * try to set limit and reduce usage if necessary.
> > > + * returns 0 at success.
> > > + * returns -EBUSY if memory cannot be dropped.
> > > + */
> > > 
> > > -static int mem_cgroup_write_strategy(char *buf, unsigned long long *tmp)
> > > +static inline int mem_cgroup_resize_limit(struct cgroup *cont,
> > > +					unsigned long long val)
> > >  {
> > > -	*tmp = memparse(buf, &buf);
> > > -	if (*buf != '\0')
> > > -		return -EINVAL;
> > > +	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
> > > +	int retry_count = 0;
> > > +	int progress;
> > > 
> > > -	/*
> > > -	 * Round up the value to the closest page size
> > > -	 */
> > > -	*tmp = ((*tmp + PAGE_SIZE - 1) >> PAGE_SHIFT) << PAGE_SHIFT;
> > > -	return 0;
> > > +retry:
> > > +	if (!res_counter_set_limit(&memcg->res, val))
> > > +		return 0;
> > > +	if (retry_count == MEM_CGROUP_RECLAIM_RETRIES)
> > > +		return -EBUSY;
> > > +
> > > +	cond_resched();
> > 
> > Do we really need this? We do have cond_resched in shrink_page_list(),
> > shrink_active_list(), do we need it here as well?
> > 
> I'd like to add this when adding a busy loop. But ok, will remove.
> 
> > > +	progress = try_to_free_mem_cgroup_pages(memcg, GFP_KERNEL);
> > > +	if (!progress)
> > > +		retry_count++;
> > > +	goto retry;
> > 
> > I don't like upward goto's. Can't we convert this to a nice do {} while or
> > while() loop?
> > 
> Hmm, ok.
> 
> I'll repost later, today.
> 
I'll postpone this until -mm is settled ;)

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] memcg: reduce usage at change limit
@ 2008-06-17 10:00         ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 24+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-17 10:00 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: balbir, linux-mm, LKML, xemul, menage, lizf, yamamoto, nishimura,
	Andrew Morton

On Tue, 17 Jun 2008 13:06:56 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> On Tue, 17 Jun 2008 09:16:31 +0530
> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> 
> > KAMEZAWA Hiroyuki wrote:
> > > Reduce the usage of res_counter at the change of limit.
> > > 
> > > Changelog v4 -> v5.
> > >  - moved "feedback" alogrithm from res_counter to memcg.
> > > 
> > > Background:
> > >  - Now, mem->usage is not reduced at limit change. So, the users will see
> > >    usage > limit case in memcg every time. This patch fixes it.
> > > 
> > >  Before:
> > >  - no usage change at setting limit.
> > >  - setting limit always returns 0 even if usage can never be less than zero.
> > >    (This can happen when memory is locked or there is no swap.)
> > >  - This is BUG, I think.
> > >  After:
> > >  - usage will be less than new limit at limit change.
> > >  - set limit returns -EBUSY when the usage cannot be reduced.
> > > 
> > > 
> > > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > > 
> > > ---
> > >  Documentation/controllers/memory.txt |    3 -
> > >  mm/memcontrol.c                      |   68 ++++++++++++++++++++++++++++-------
> > >  2 files changed, 56 insertions(+), 15 deletions(-)
> > > 
> > > Index: mm-2.6.26-rc5-mm3/mm/memcontrol.c
> > > ===================================================================
> > > --- mm-2.6.26-rc5-mm3.orig/mm/memcontrol.c
> > > +++ mm-2.6.26-rc5-mm3/mm/memcontrol.c
> > > @@ -852,18 +852,30 @@ out:
> > >  	css_put(&mem->css);
> > >  	return ret;
> > >  }
> > > +/*
> > > + * try to set limit and reduce usage if necessary.
> > > + * returns 0 at success.
> > > + * returns -EBUSY if memory cannot be dropped.
> > > + */
> > > 
> > > -static int mem_cgroup_write_strategy(char *buf, unsigned long long *tmp)
> > > +static inline int mem_cgroup_resize_limit(struct cgroup *cont,
> > > +					unsigned long long val)
> > >  {
> > > -	*tmp = memparse(buf, &buf);
> > > -	if (*buf != '\0')
> > > -		return -EINVAL;
> > > +	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
> > > +	int retry_count = 0;
> > > +	int progress;
> > > 
> > > -	/*
> > > -	 * Round up the value to the closest page size
> > > -	 */
> > > -	*tmp = ((*tmp + PAGE_SIZE - 1) >> PAGE_SHIFT) << PAGE_SHIFT;
> > > -	return 0;
> > > +retry:
> > > +	if (!res_counter_set_limit(&memcg->res, val))
> > > +		return 0;
> > > +	if (retry_count == MEM_CGROUP_RECLAIM_RETRIES)
> > > +		return -EBUSY;
> > > +
> > > +	cond_resched();
> > 
> > Do we really need this? We do have cond_resched in shrink_page_list(),
> > shrink_active_list(), do we need it here as well?
> > 
> I'd like to add this when adding a busy loop. But ok, will remove.
> 
> > > +	progress = try_to_free_mem_cgroup_pages(memcg, GFP_KERNEL);
> > > +	if (!progress)
> > > +		retry_count++;
> > > +	goto retry;
> > 
> > I don't like upward goto's. Can't we convert this to a nice do {} while or
> > while() loop?
> > 
> Hmm, ok.
> 
> I'll repost later, today.
> 
I'll postpone this until -mm is settled ;)

Thanks,
-Kame

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] memcg: reduce usage at change limit
  2008-06-17 10:00         ` KAMEZAWA Hiroyuki
@ 2008-06-17 10:14           ` Balbir Singh
  -1 siblings, 0 replies; 24+ messages in thread
From: Balbir Singh @ 2008-06-17 10:14 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, LKML, xemul, menage, lizf, yamamoto, nishimura, Andrew Morton

KAMEZAWA Hiroyuki wrote:

>> I'll repost later, today.
>>
> I'll postpone this until -mm is settled ;)
> 

Sure, by -mm is settled you mean scalable page reclaim, fast GUP and lockless
read size for pagecache? Is there something else I am unaware of?

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] memcg: reduce usage at change limit
@ 2008-06-17 10:14           ` Balbir Singh
  0 siblings, 0 replies; 24+ messages in thread
From: Balbir Singh @ 2008-06-17 10:14 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, LKML, xemul, menage, lizf, yamamoto, nishimura, Andrew Morton

KAMEZAWA Hiroyuki wrote:

>> I'll repost later, today.
>>
> I'll postpone this until -mm is settled ;)
> 

Sure, by -mm is settled you mean scalable page reclaim, fast GUP and lockless
read size for pagecache? Is there something else I am unaware of?

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Re: [PATCH 2/2] memcg: reduce usage at change limit
  2008-06-17 10:00         ` KAMEZAWA Hiroyuki
@ 2008-06-17 12:03           ` kamezawa.hiroyu
  -1 siblings, 0 replies; 24+ messages in thread
From: kamezawa.hiroyu @ 2008-06-17 12:03 UTC (permalink / raw)
  To: balbir
  Cc: KAMEZAWA Hiroyuki, linux-mm, LKML, xemul, menage, lizf, yamamoto,
	nishimura, Andrew Morton

----- Original Message -----
>Date: 	Tue, 17 Jun 2008 15:44:53 +0530
>From: Balbir Singh <balbir@linux.vnet.ibm.com>
>KAMEZAWA Hiroyuki wrote:
>
>>> I'll repost later, today.
>>>
>> I'll postpone this until -mm is settled ;)
>> 
>
>Sure, by -mm is settled you mean scalable page reclaim, fast GUP and lockless
>read size for pagecache? Is there something else I am unaware of?
>
Ah, some panics are added ;)
It's my main concern.  And I have to study new VM LRU management
scheme and check whether there are something to be fixed(updated)
around memcg. 

Anyway, I'd like to push this patch before too late. I just stop
for a while (because -mm has trouble).

Thanks,
-Kame

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Re: [PATCH 2/2] memcg: reduce usage at change limit
@ 2008-06-17 12:03           ` kamezawa.hiroyu
  0 siblings, 0 replies; 24+ messages in thread
From: kamezawa.hiroyu @ 2008-06-17 12:03 UTC (permalink / raw)
  To: balbir
  Cc: KAMEZAWA Hiroyuki, linux-mm, LKML, xemul, menage, lizf, yamamoto,
	nishimura, Andrew Morton

----- Original Message -----
>Date: 	Tue, 17 Jun 2008 15:44:53 +0530
>From: Balbir Singh <balbir@linux.vnet.ibm.com>
>KAMEZAWA Hiroyuki wrote:
>
>>> I'll repost later, today.
>>>
>> I'll postpone this until -mm is settled ;)
>> 
>
>Sure, by -mm is settled you mean scalable page reclaim, fast GUP and lockless
>read size for pagecache? Is there something else I am unaware of?
>
Ah, some panics are added ;)
It's my main concern.  And I have to study new VM LRU management
scheme and check whether there are something to be fixed(updated)
around memcg. 

Anyway, I'd like to push this patch before too late. I just stop
for a while (because -mm has trouble).

Thanks,
-Kame

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] memcg: reduce usage at change limit
  2008-06-17  3:36   ` KAMEZAWA Hiroyuki
@ 2008-06-20  5:16     ` Paul Menage
  -1 siblings, 0 replies; 24+ messages in thread
From: Paul Menage @ 2008-06-20  5:16 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, LKML, balbir, xemul, lizf, yamamoto, nishimura, Andrew Morton

On Mon, Jun 16, 2008 at 8:36 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> Reduce the usage of res_counter at the change of limit.
>
> Changelog v4 -> v5.
>  - moved "feedback" alogrithm from res_counter to memcg.

FWIW, I really thought it was much better having it in the generic res_counter.

Paul

>
> Background:
>  - Now, mem->usage is not reduced at limit change. So, the users will see
>   usage > limit case in memcg every time. This patch fixes it.
>
>  Before:
>  - no usage change at setting limit.
>  - setting limit always returns 0 even if usage can never be less than zero.
>   (This can happen when memory is locked or there is no swap.)
>  - This is BUG, I think.
>  After:
>  - usage will be less than new limit at limit change.
>  - set limit returns -EBUSY when the usage cannot be reduced.
>
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
> ---
>  Documentation/controllers/memory.txt |    3 -
>  mm/memcontrol.c                      |   68 ++++++++++++++++++++++++++++-------
>  2 files changed, 56 insertions(+), 15 deletions(-)
>
> Index: mm-2.6.26-rc5-mm3/mm/memcontrol.c
> ===================================================================
> --- mm-2.6.26-rc5-mm3.orig/mm/memcontrol.c
> +++ mm-2.6.26-rc5-mm3/mm/memcontrol.c
> @@ -852,18 +852,30 @@ out:
>        css_put(&mem->css);
>        return ret;
>  }
> +/*
> + * try to set limit and reduce usage if necessary.
> + * returns 0 at success.
> + * returns -EBUSY if memory cannot be dropped.
> + */
>
> -static int mem_cgroup_write_strategy(char *buf, unsigned long long *tmp)
> +static inline int mem_cgroup_resize_limit(struct cgroup *cont,
> +                                       unsigned long long val)
>  {
> -       *tmp = memparse(buf, &buf);
> -       if (*buf != '\0')
> -               return -EINVAL;
> +       struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
> +       int retry_count = 0;
> +       int progress;
>
> -       /*
> -        * Round up the value to the closest page size
> -        */
> -       *tmp = ((*tmp + PAGE_SIZE - 1) >> PAGE_SHIFT) << PAGE_SHIFT;
> -       return 0;
> +retry:
> +       if (!res_counter_set_limit(&memcg->res, val))
> +               return 0;
> +       if (retry_count == MEM_CGROUP_RECLAIM_RETRIES)
> +               return -EBUSY;
> +
> +       cond_resched();
> +       progress = try_to_free_mem_cgroup_pages(memcg, GFP_KERNEL);
> +       if (!progress)
> +               retry_count++;
> +       goto retry;
>  }
>
>  static u64 mem_cgroup_read(struct cgroup *cont, struct cftype *cft)
> @@ -874,11 +886,41 @@ static u64 mem_cgroup_read(struct cgroup
>
>  static ssize_t mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
>                                struct file *file, const char __user *userbuf,
> -                               size_t nbytes, loff_t *ppos)
> +                               size_t bbytes, loff_t *ppos)
>  {
> -       return res_counter_write(&mem_cgroup_from_cont(cont)->res,
> -                               cft->private, userbuf, nbytes, ppos,
> -                               mem_cgroup_write_strategy);
> +       char *buf, *end;
> +       unsigned long long val;
> +       int ret;
> +
> +       buf = kmalloc(bbytes + 1, GFP_KERNEL);
> +       if (!buf)
> +               return -ENOMEM;
> +       buf[bbytes] = '\0';
> +       ret = -EFAULT;
> +       if (copy_from_user(buf, userbuf, bbytes))
> +               goto out;
> +
> +       ret = -EINVAL;
> +       strstrip(buf);
> +       val = memparse(buf, &end);
> +       if (*end != '\0')
> +               goto out;
> +       /* Round to page size */
> +       val = ((val + PAGE_SIZE - 1) >> PAGE_SHIFT) << PAGE_SHIFT;
> +
> +       switch(cft->private) {
> +       case RES_LIMIT:
> +               ret = mem_cgroup_resize_limit(cont, val);
> +               break;
> +       default:
> +               ret = -EINVAL;
> +               goto out;
> +       }
> +       if (!ret)
> +               ret = bbytes;
> +out:
> +       kfree(buf);
> +       return ret;
>  }
>
>  static int mem_cgroup_reset(struct cgroup *cont, unsigned int event)
> Index: mm-2.6.26-rc5-mm3/Documentation/controllers/memory.txt
> ===================================================================
> --- mm-2.6.26-rc5-mm3.orig/Documentation/controllers/memory.txt
> +++ mm-2.6.26-rc5-mm3/Documentation/controllers/memory.txt
> @@ -242,8 +242,7 @@ rmdir() if there are no tasks.
>  1. Add support for accounting huge pages (as a separate controller)
>  2. Make per-cgroup scanner reclaim not-shared pages first
>  3. Teach controller to account for shared-pages
> -4. Start reclamation when the limit is lowered
> -5. Start reclamation in the background when the limit is
> +4. Start reclamation in the background when the limit is
>    not yet hit but the usage is getting closer
>
>  Summary
>
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] memcg: reduce usage at change limit
@ 2008-06-20  5:16     ` Paul Menage
  0 siblings, 0 replies; 24+ messages in thread
From: Paul Menage @ 2008-06-20  5:16 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, LKML, balbir, xemul, lizf, yamamoto, nishimura, Andrew Morton

On Mon, Jun 16, 2008 at 8:36 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> Reduce the usage of res_counter at the change of limit.
>
> Changelog v4 -> v5.
>  - moved "feedback" alogrithm from res_counter to memcg.

FWIW, I really thought it was much better having it in the generic res_counter.

Paul

>
> Background:
>  - Now, mem->usage is not reduced at limit change. So, the users will see
>   usage > limit case in memcg every time. This patch fixes it.
>
>  Before:
>  - no usage change at setting limit.
>  - setting limit always returns 0 even if usage can never be less than zero.
>   (This can happen when memory is locked or there is no swap.)
>  - This is BUG, I think.
>  After:
>  - usage will be less than new limit at limit change.
>  - set limit returns -EBUSY when the usage cannot be reduced.
>
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
> ---
>  Documentation/controllers/memory.txt |    3 -
>  mm/memcontrol.c                      |   68 ++++++++++++++++++++++++++++-------
>  2 files changed, 56 insertions(+), 15 deletions(-)
>
> Index: mm-2.6.26-rc5-mm3/mm/memcontrol.c
> ===================================================================
> --- mm-2.6.26-rc5-mm3.orig/mm/memcontrol.c
> +++ mm-2.6.26-rc5-mm3/mm/memcontrol.c
> @@ -852,18 +852,30 @@ out:
>        css_put(&mem->css);
>        return ret;
>  }
> +/*
> + * try to set limit and reduce usage if necessary.
> + * returns 0 at success.
> + * returns -EBUSY if memory cannot be dropped.
> + */
>
> -static int mem_cgroup_write_strategy(char *buf, unsigned long long *tmp)
> +static inline int mem_cgroup_resize_limit(struct cgroup *cont,
> +                                       unsigned long long val)
>  {
> -       *tmp = memparse(buf, &buf);
> -       if (*buf != '\0')
> -               return -EINVAL;
> +       struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
> +       int retry_count = 0;
> +       int progress;
>
> -       /*
> -        * Round up the value to the closest page size
> -        */
> -       *tmp = ((*tmp + PAGE_SIZE - 1) >> PAGE_SHIFT) << PAGE_SHIFT;
> -       return 0;
> +retry:
> +       if (!res_counter_set_limit(&memcg->res, val))
> +               return 0;
> +       if (retry_count == MEM_CGROUP_RECLAIM_RETRIES)
> +               return -EBUSY;
> +
> +       cond_resched();
> +       progress = try_to_free_mem_cgroup_pages(memcg, GFP_KERNEL);
> +       if (!progress)
> +               retry_count++;
> +       goto retry;
>  }
>
>  static u64 mem_cgroup_read(struct cgroup *cont, struct cftype *cft)
> @@ -874,11 +886,41 @@ static u64 mem_cgroup_read(struct cgroup
>
>  static ssize_t mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
>                                struct file *file, const char __user *userbuf,
> -                               size_t nbytes, loff_t *ppos)
> +                               size_t bbytes, loff_t *ppos)
>  {
> -       return res_counter_write(&mem_cgroup_from_cont(cont)->res,
> -                               cft->private, userbuf, nbytes, ppos,
> -                               mem_cgroup_write_strategy);
> +       char *buf, *end;
> +       unsigned long long val;
> +       int ret;
> +
> +       buf = kmalloc(bbytes + 1, GFP_KERNEL);
> +       if (!buf)
> +               return -ENOMEM;
> +       buf[bbytes] = '\0';
> +       ret = -EFAULT;
> +       if (copy_from_user(buf, userbuf, bbytes))
> +               goto out;
> +
> +       ret = -EINVAL;
> +       strstrip(buf);
> +       val = memparse(buf, &end);
> +       if (*end != '\0')
> +               goto out;
> +       /* Round to page size */
> +       val = ((val + PAGE_SIZE - 1) >> PAGE_SHIFT) << PAGE_SHIFT;
> +
> +       switch(cft->private) {
> +       case RES_LIMIT:
> +               ret = mem_cgroup_resize_limit(cont, val);
> +               break;
> +       default:
> +               ret = -EINVAL;
> +               goto out;
> +       }
> +       if (!ret)
> +               ret = bbytes;
> +out:
> +       kfree(buf);
> +       return ret;
>  }
>
>  static int mem_cgroup_reset(struct cgroup *cont, unsigned int event)
> Index: mm-2.6.26-rc5-mm3/Documentation/controllers/memory.txt
> ===================================================================
> --- mm-2.6.26-rc5-mm3.orig/Documentation/controllers/memory.txt
> +++ mm-2.6.26-rc5-mm3/Documentation/controllers/memory.txt
> @@ -242,8 +242,7 @@ rmdir() if there are no tasks.
>  1. Add support for accounting huge pages (as a separate controller)
>  2. Make per-cgroup scanner reclaim not-shared pages first
>  3. Teach controller to account for shared-pages
> -4. Start reclamation when the limit is lowered
> -5. Start reclamation in the background when the limit is
> +4. Start reclamation in the background when the limit is
>    not yet hit but the usage is getting closer
>
>  Summary
>
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Re: [PATCH 2/2] memcg: reduce usage at change limit
  2008-06-17  3:36   ` KAMEZAWA Hiroyuki
@ 2008-06-20  6:44     ` kamezawa.hiroyu
  -1 siblings, 0 replies; 24+ messages in thread
From: kamezawa.hiroyu @ 2008-06-20  6:44 UTC (permalink / raw)
  To: Paul Menage
  Cc: KAMEZAWA Hiroyuki, linux-mm, LKML, balbir, xemul, lizf, yamamoto,
	nishimura, Andrew Morton

----- Original Message -----
>Date: 	Thu, 19 Jun 2008 22:16:07 -0700
>From: "Paul Menage" <menage@google.com>
>> Reduce the usage of res_counter at the change of limit.
>>
>> Changelog v4 -> v5.
>>  - moved "feedback" alogrithm from res_counter to memcg.
>
>FWIW, I really thought it was much better having it in the generic res_counte
r.
>
Hmm ;)

Balbir and Pavel pointed out that

the resouce which can shrink if necessary is
 - user's memory usage
 - kernel memory (slab) if it can. (not implemented)

And there are other users of res_counter which cannot shrink.
(I think -EBUSY should be returned)

Now, my idea is
- implement "feedback" in memcg because it is an only user
- fix res_counter to return -EBUSY

I think we can revisit later "implement generic feedback in res_counter".
And such kind of implementation change will not big.(I think)

Another point is I don't want to make res_counter big. To support
generic ops in res_counter (handle limit, hierarchy, high-low watermark...)
res_counter must be bigger that it is. And most of users of res_counder doesn'
t want such ops.

To be honest, both way is okay to me. But I'd like to start from not-invasive
one.

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Re: [PATCH 2/2] memcg: reduce usage at change limit
@ 2008-06-20  6:44     ` kamezawa.hiroyu
  0 siblings, 0 replies; 24+ messages in thread
From: kamezawa.hiroyu @ 2008-06-20  6:44 UTC (permalink / raw)
  To: Paul Menage
  Cc: KAMEZAWA Hiroyuki, linux-mm, LKML, balbir, xemul, lizf, yamamoto,
	nishimura, Andrew Morton

----- Original Message -----
>Date: 	Thu, 19 Jun 2008 22:16:07 -0700
>From: "Paul Menage" <menage@google.com>
>> Reduce the usage of res_counter at the change of limit.
>>
>> Changelog v4 -> v5.
>>  - moved "feedback" alogrithm from res_counter to memcg.
>
>FWIW, I really thought it was much better having it in the generic res_counte
r.
>
Hmm ;)

Balbir and Pavel pointed out that

the resouce which can shrink if necessary is
 - user's memory usage
 - kernel memory (slab) if it can. (not implemented)

And there are other users of res_counter which cannot shrink.
(I think -EBUSY should be returned)

Now, my idea is
- implement "feedback" in memcg because it is an only user
- fix res_counter to return -EBUSY

I think we can revisit later "implement generic feedback in res_counter".
And such kind of implementation change will not big.(I think)

Another point is I don't want to make res_counter big. To support
generic ops in res_counter (handle limit, hierarchy, high-low watermark...)
res_counter must be bigger that it is. And most of users of res_counder doesn'
t want such ops.

To be honest, both way is okay to me. But I'd like to start from not-invasive
one.

Thanks,
-Kame

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2008-06-20  6:45 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-06-17  3:31 [PATCH 0/2] memcg: Reduce usage at change limit KAMEZAWA Hiroyuki
2008-06-17  3:31 ` KAMEZAWA Hiroyuki
2008-06-17  3:33 ` [PATCH 1/2] memcg: res counter set limit KAMEZAWA Hiroyuki
2008-06-17  3:33   ` KAMEZAWA Hiroyuki
2008-06-17  3:40   ` Balbir Singh
2008-06-17  3:40     ` Balbir Singh
2008-06-17  4:05     ` KAMEZAWA Hiroyuki
2008-06-17  4:05       ` KAMEZAWA Hiroyuki
2008-06-17  3:36 ` [PATCH 2/2] memcg: reduce usage at change limit KAMEZAWA Hiroyuki
2008-06-17  3:36   ` KAMEZAWA Hiroyuki
2008-06-17  3:46   ` Balbir Singh
2008-06-17  3:46     ` Balbir Singh
2008-06-17  4:06     ` KAMEZAWA Hiroyuki
2008-06-17  4:06       ` KAMEZAWA Hiroyuki
2008-06-17 10:00       ` KAMEZAWA Hiroyuki
2008-06-17 10:00         ` KAMEZAWA Hiroyuki
2008-06-17 10:14         ` Balbir Singh
2008-06-17 10:14           ` Balbir Singh
2008-06-17 12:03         ` kamezawa.hiroyu
2008-06-17 12:03           ` kamezawa.hiroyu
2008-06-20  5:16   ` Paul Menage
2008-06-20  5:16     ` Paul Menage
2008-06-20  6:44   ` kamezawa.hiroyu
2008-06-20  6:44     ` kamezawa.hiroyu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.