From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"hannes@cmpxchg.org" <hannes@cmpxchg.org>,
	"nishimura@mxp.nes.nec.co.jp" <nishimura@mxp.nes.nec.co.jp>,
	"balbir@linux.vnet.ibm.com" <balbir@linux.vnet.ibm.com>
Subject: [BUGFIX][PATCH 2/4] memcg: fix charge path for THP and allow early retirement
Date: Fri, 28 Jan 2011 12:26:08 +0900
Message-ID: <20110128122608.cf9be26b.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <20110128122229.6a4c74a2.kamezawa.hiroyu@jp.fujitsu.com>

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

When THP is used, a hugepage-sized charge can happen, and it is not
handled correctly in __mem_cgroup_do_charge(). For example, THP can
fall back to small page allocation when a hugepage allocation looks
difficult or the system is busy, but memory cgroup doesn't understand
this and keeps retrying the hugepage charge. Worse, memory cgroup
believes 'memory reclaim succeeded' if limit - usage > PAGE_SIZE.

Because of this, khugepaged and other callers can go into an infinite
reclaim loop if tasks in the memcg are busy.

After this patch:
 - Hugepage allocation will fail if the first attempt at page reclaim fails.

Changelog:
 - Made the changes smaller; removed the renaming code.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 mm/memcontrol.c |   28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

Index: mmotm-0125/mm/memcontrol.c
===================================================================
--- mmotm-0125.orig/mm/memcontrol.c
+++ mmotm-0125/mm/memcontrol.c
@@ -1827,10 +1827,14 @@ enum {
 	CHARGE_OK,		/* success */
 	CHARGE_RETRY,		/* need to retry but retry is not bad */
 	CHARGE_NOMEM,		/* we can't do more. return -ENOMEM */
+	CHARGE_NEED_BREAK,	/* big size allocation failure */
 	CHARGE_WOULDBLOCK,	/* GFP_WAIT wasn't set and no enough res. */
 	CHARGE_OOM_DIE,		/* the current is killed because of OOM */
 };
 
+/*
+ * Now we have 3 charge sizes: PAGE_SIZE, HPAGE_SIZE and batched allocation.
+ */
 static int __mem_cgroup_do_charge(struct mem_cgroup *mem, gfp_t gfp_mask,
 				int csize, bool oom_check)
 {
@@ -1854,9 +1858,6 @@ static int __mem_cgroup_do_charge(struct
 	} else
 		mem_over_limit = mem_cgroup_from_res_counter(fail_res, res);
 
-	if (csize > PAGE_SIZE) /* change csize and retry */
-		return CHARGE_RETRY;
-
 	if (!(gfp_mask & __GFP_WAIT))
 		return CHARGE_WOULDBLOCK;
 
@@ -1880,6 +1881,13 @@ static int __mem_cgroup_do_charge(struct
 		return CHARGE_RETRY;
 
 	/*
+	 * if request size is larger than PAGE_SIZE, it's not OOM
+	 * and caller will do retry in smaller size.
+	 */
+	if (csize != PAGE_SIZE)
+		return CHARGE_NEED_BREAK;
+
+	/*
 	 * At task move, charge accounts can be doubly counted. So, it's
 	 * better to wait until the end of task_move if something is going on.
 	 */
@@ -1997,10 +2005,22 @@ again:
 		case CHARGE_OK:
 			break;
 		case CHARGE_RETRY: /* not in OOM situation but retry */
-			csize = page_size;
 			css_put(&mem->css);
 			mem = NULL;
 			goto again;
+		case CHARGE_NEED_BREAK: /* page_size > PAGE_SIZE */
+			css_put(&mem->css);
+			/*
+			 * We'll come here in 2 cases, batched-charge and
+			 * hugetlb alloc. batched-charge can do retry
+			 * with smaller page size. hugepage should return
+			 * NOMEM. This doesn't mean OOM.
+			 */
+			if (page_size > PAGE_SIZE)
+				goto nomem;
+			csize = page_size;
+			mem = NULL;
+			goto again;
 		case CHARGE_WOULDBLOCK: /* !__GFP_WAIT */
 			css_put(&mem->css);
 			goto nomem;
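
The retry policy in the hunks above can be modelled in user space. What follows is only a rough sketch, not the kernel code: struct toy_counter, toy_do_charge(), toy_try_charge(), CHARGE_BATCH, the 2 MiB HPAGE_SIZE, the retry bound of 5, and the "margin >= csize" test after reclaim are all assumptions made for illustration. Only the CHARGE_* names and the CHARGE_NEED_BREAK handling follow the patch.

```c
#include <assert.h>
#include <errno.h>

#define PAGE_SIZE    4096L
#define HPAGE_SIZE   (512 * PAGE_SIZE)	/* assumed 2 MiB huge page */
#define CHARGE_BATCH (32 * PAGE_SIZE)	/* assumed batched-charge size */

enum charge_result {
	CHARGE_OK,
	CHARGE_RETRY,
	CHARGE_NOMEM,
	CHARGE_NEED_BREAK,
};

/* Hypothetical stand-in for a memcg counter and its reclaim behaviour. */
struct toy_counter {
	long usage;
	long limit;
	long reclaim_per_try;	/* bytes one reclaim pass manages to free */
};

/* One charge attempt of csize bytes, with at most one reclaim pass. */
static enum charge_result toy_do_charge(struct toy_counter *c, long csize)
{
	long freed;

	if (c->usage + csize <= c->limit) {
		c->usage += csize;
		return CHARGE_OK;
	}

	/* Over the limit: simulate a single reclaim pass. */
	freed = c->reclaim_per_try < c->usage ? c->reclaim_per_try : c->usage;
	c->usage -= freed;

	/* Reclaim made enough room for this request: caller may retry. */
	if (c->limit - c->usage >= csize)
		return CHARGE_RETRY;

	/* A request larger than one page is not an OOM; let the caller decide. */
	if (csize != PAGE_SIZE)
		return CHARGE_NEED_BREAK;

	return CHARGE_NOMEM;
}

/* Caller loop: start with the larger of the batch size and page_size. */
static int toy_try_charge(struct toy_counter *c, long page_size)
{
	long csize = page_size > CHARGE_BATCH ? page_size : CHARGE_BATCH;
	int retries = 5;	/* arbitrary bound for this sketch */

	assert(page_size >= PAGE_SIZE);

	for (;;) {
		switch (toy_do_charge(c, csize)) {
		case CHARGE_OK:
			return 0;
		case CHARGE_RETRY:
			if (retries-- == 0)
				return -ENOMEM;
			break;
		case CHARGE_NEED_BREAK:
			/* Huge page request: fail early, the caller falls back. */
			if (page_size > PAGE_SIZE)
				return -ENOMEM;
			/* Batched charge: retry with the real page size. */
			csize = page_size;
			break;
		case CHARGE_NOMEM:
			return -ENOMEM;
		}
	}
}
```

In this model a huge page charge against a full counter that cannot be reclaimed returns -ENOMEM after a single reclaim pass, while a single-page request whose batched charge does not fit is retried once with csize = PAGE_SIZE.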



