From: Michal Hocko <mhocko@kernel.org>
To: Nils Holland <nholland@tisys.org>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Chris Mason <clm@fb.com>, David Sterba <dsterba@suse.cz>,
	linux-btrfs@vger.kernel.org, Michal Hocko <mhocko@suse.com>
Subject: [PATCH 1/2] mm: consolidate GFP_NOFAIL checks in the allocator slowpath
Date: Fri, 16 Dec 2016 16:58:07 +0100
Message-ID: <20161216155808.12809-2-mhocko@kernel.org>
In-Reply-To: <20161216155808.12809-1-mhocko@kernel.org>

From: Michal Hocko <mhocko@suse.com>

Tetsuo Handa has pointed out that 0a0337e0d1d1 ("mm, oom: rework oom
detection") subtly changed the semantics of costly high-order requests
with __GFP_NOFAIL and without __GFP_REPEAT: such requests can now fail.
My code inspection didn't reveal any such users in the tree, but this
could still lead to unexpected allocation failures and subsequent oopses.

The __GFP_NOFAIL handling in __alloc_pages_slowpath is currently hard to
follow. There are a few special cases, but there is no catch-all place
that guarantees we do not miss a case where a non-failing allocation
might fail. This patch reorganizes the code a bit and puts all those
special cases under the nopage label, which is the generic go-to-fail
path. Non-failing allocations are retried, while those that cannot
retry, such as non-sleeping allocations, go to the failure point
directly. This should make the code flow much easier to follow and less
error-prone for future changes.

While we are at it, move the stall check up so that it also catches
non-failing allocations which might loop.

Changes since v1
- do not skip direct reclaim for TIF_MEMDIE && GFP_NOFAIL as per Hillf
- do not skip __alloc_pages_may_oom for TIF_MEMDIE && GFP_NOFAIL as
  per Tetsuo

Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
---
 mm/page_alloc.c | 75 +++++++++++++++++++++++++++++++++------------------------
 1 file changed, 44 insertions(+), 31 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3f2c9e535f7f..095e2fa286de 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3640,35 +3640,21 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		goto got_pg;
 
 	/* Caller is not willing to reclaim, we can't balance anything */
-	if (!can_direct_reclaim) {
-		/*
-		 * All existing users of the __GFP_NOFAIL are blockable, so warn
-		 * of any new users that actually allow this type of allocation
-		 * to fail.
-		 */
-		WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL);
+	if (!can_direct_reclaim)
 		goto nopage;
-	}
 
-	/* Avoid recursion of direct reclaim */
-	if (current->flags & PF_MEMALLOC) {
-		/*
-		 * __GFP_NOFAIL request from this context is rather bizarre
-		 * because we cannot reclaim anything and only can loop waiting
-		 * for somebody to do a work for us.
-		 */
-		if (WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
-			cond_resched();
-			goto retry;
-		}
-		goto nopage;
+	/* Make sure we know about allocations which stall for too long */
+	if (time_after(jiffies, alloc_start + stall_timeout)) {
+		warn_alloc(gfp_mask,
+			"page allocation stalls for %ums, order:%u",
+			jiffies_to_msecs(jiffies-alloc_start), order);
+		stall_timeout += 10 * HZ;
 	}
 
-	/* Avoid allocations with no watermarks from looping endlessly */
-	if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
+	/* Avoid recursion of direct reclaim */
+	if (current->flags & PF_MEMALLOC)
 		goto nopage;
 
-
 	/* Try direct reclaim and then allocating */
 	page = __alloc_pages_direct_reclaim(gfp_mask, order, alloc_flags, ac,
 							&did_some_progress);
@@ -3692,14 +3678,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & __GFP_REPEAT))
 		goto nopage;
 
-	/* Make sure we know about allocations which stall for too long */
-	if (time_after(jiffies, alloc_start + stall_timeout)) {
-		warn_alloc(gfp_mask,
-			"page allocation stalls for %ums, order:%u",
-			jiffies_to_msecs(jiffies-alloc_start), order);
-		stall_timeout += 10 * HZ;
-	}
-
 	if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
 				 did_some_progress > 0, &no_progress_loops))
 		goto retry;
@@ -3721,6 +3699,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	if (page)
 		goto got_pg;
 
+	/* Avoid allocations with no watermarks from looping endlessly */
+	if (test_thread_flag(TIF_MEMDIE))
+		goto nopage;
+
 	/* Retry as long as the OOM killer is making progress */
 	if (did_some_progress) {
 		no_progress_loops = 0;
@@ -3728,6 +3710,37 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	}
 
 nopage:
+	/*
+	 * Make sure that a __GFP_NOFAIL request doesn't leak out and that
+	 * we always retry it
+	 */
+	if (gfp_mask & __GFP_NOFAIL) {
+		/*
+		 * All existing users of the __GFP_NOFAIL are blockable, so warn
+		 * of any new users that actually require GFP_NOWAIT
+		 */
+		if (WARN_ON_ONCE(!can_direct_reclaim))
+			goto fail;
+
+		/*
+		 * A __GFP_NOFAIL request from a PF_MEMALLOC context is rather
+		 * bizarre because we cannot reclaim anything and can only loop
+		 * waiting for somebody to do the work for us
+		 */
+		WARN_ON_ONCE(current->flags & PF_MEMALLOC);
+
+		/*
+		 * Non-failing costly orders are a hard requirement which we
+		 * are not well prepared for, so warn about these users so
+		 * that we can identify them and convert them to something
+		 * else.
+		 */
+		WARN_ON_ONCE(order > PAGE_ALLOC_COSTLY_ORDER);
+
+		cond_resched();
+		goto retry;
+	}
+fail:
 	warn_alloc(gfp_mask,
 			"page allocation failure: order:%u", order);
 got_pg:
-- 
2.10.2

