All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] get rid of __alloc_pages_high_priority
@ 2015-11-16 13:22 ` mhocko
  0 siblings, 0 replies; 28+ messages in thread
From: mhocko @ 2015-11-16 13:22 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Mel Gorman, David Rientjes, linux-mm, LKML

Hi,
this has been posted http://lkml.kernel.org/r/1447343618-19696-1-git-send-email-mhocko%40kernel.org
last week. David has requested to split the patch into two parts
one to removed and opencode __alloc_pages_high_priority without
any functional changes and the other one which changes the retry
behavior for __GFP_NOFAIL with ALLOC_NO_WATERMARKS allocation context.
This was reflected in this submission.

The end result is very same so I've kept Mel's Acked-by. Let me know if
you do not agree with this Mel and I will drop it.

 mm/page_alloc.c | 48 +++++++++++++++++-------------------------------
 1 file changed, 17 insertions(+), 31 deletions(-)


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 0/2] get rid of __alloc_pages_high_priority
@ 2015-11-16 13:22 ` mhocko
  0 siblings, 0 replies; 28+ messages in thread
From: mhocko @ 2015-11-16 13:22 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Mel Gorman, David Rientjes, linux-mm, LKML

Hi,
this has been posted http://lkml.kernel.org/r/1447343618-19696-1-git-send-email-mhocko%40kernel.org
last week. David has requested to split the patch into two parts
one to removed and opencode __alloc_pages_high_priority without
any functional changes and the other one which changes the retry
behavior for __GFP_NOFAIL with ALLOC_NO_WATERMARKS allocation context.
This was reflected in this submission.

The end result is very same so I've kept Mel's Acked-by. Let me know if
you do not agree with this Mel and I will drop it.

 mm/page_alloc.c | 48 +++++++++++++++++-------------------------------
 1 file changed, 17 insertions(+), 31 deletions(-)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 1/2] mm: get rid of __alloc_pages_high_priority
  2015-11-16 13:22 ` mhocko
@ 2015-11-16 13:22   ` mhocko
  -1 siblings, 0 replies; 28+ messages in thread
From: mhocko @ 2015-11-16 13:22 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Mel Gorman, David Rientjes, linux-mm, LKML, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

__alloc_pages_high_priority doesn't do anything special other than it
calls get_page_from_freelist and loops around GFP_NOFAIL allocation
until it succeeds. It would be better if the first part was done in
__alloc_pages_slowpath where we modify the zonelist because this would
be easier to read and understand. Opencoding the function into its only
caller allows to simplify it a bit as well.

This patch doesn't introduce any functional changes.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/page_alloc.c | 36 +++++++++---------------------------
 1 file changed, 9 insertions(+), 27 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8034909faad2..b153fa3d0b9b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2902,28 +2902,6 @@ __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
 	return page;
 }
 
-/*
- * This is called in the allocator slow-path if the allocation request is of
- * sufficient urgency to ignore watermarks and take other desperate measures
- */
-static inline struct page *
-__alloc_pages_high_priority(gfp_t gfp_mask, unsigned int order,
-				const struct alloc_context *ac)
-{
-	struct page *page;
-
-	do {
-		page = get_page_from_freelist(gfp_mask, order,
-						ALLOC_NO_WATERMARKS, ac);
-
-		if (!page && gfp_mask & __GFP_NOFAIL)
-			wait_iff_congested(ac->preferred_zone, BLK_RW_ASYNC,
-									HZ/50);
-	} while (!page && (gfp_mask & __GFP_NOFAIL));
-
-	return page;
-}
-
 static void wake_all_kswapds(unsigned int order, const struct alloc_context *ac)
 {
 	struct zoneref *z;
@@ -3068,12 +3046,16 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		 * allocations are system rather than user orientated
 		 */
 		ac->zonelist = node_zonelist(numa_node_id(), gfp_mask);
+		do {
+			page = get_page_from_freelist(gfp_mask, order,
+							ALLOC_NO_WATERMARKS, ac);
+			if (page)
+				goto got_pg;
 
-		page = __alloc_pages_high_priority(gfp_mask, order, ac);
-
-		if (page) {
-			goto got_pg;
-		}
+			if (gfp_mask & __GFP_NOFAIL)
+				wait_iff_congested(ac->preferred_zone,
+						   BLK_RW_ASYNC, HZ/50);
+		} while (gfp_mask & __GFP_NOFAIL);
 	}
 
 	/* Caller is not willing to reclaim, we can't balance anything */
-- 
2.6.2


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 1/2] mm: get rid of __alloc_pages_high_priority
@ 2015-11-16 13:22   ` mhocko
  0 siblings, 0 replies; 28+ messages in thread
From: mhocko @ 2015-11-16 13:22 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Mel Gorman, David Rientjes, linux-mm, LKML, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

__alloc_pages_high_priority doesn't do anything special other than it
calls get_page_from_freelist and loops around GFP_NOFAIL allocation
until it succeeds. It would be better if the first part was done in
__alloc_pages_slowpath where we modify the zonelist because this would
be easier to read and understand. Opencoding the function into its only
caller allows to simplify it a bit as well.

This patch doesn't introduce any functional changes.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/page_alloc.c | 36 +++++++++---------------------------
 1 file changed, 9 insertions(+), 27 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8034909faad2..b153fa3d0b9b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2902,28 +2902,6 @@ __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
 	return page;
 }
 
-/*
- * This is called in the allocator slow-path if the allocation request is of
- * sufficient urgency to ignore watermarks and take other desperate measures
- */
-static inline struct page *
-__alloc_pages_high_priority(gfp_t gfp_mask, unsigned int order,
-				const struct alloc_context *ac)
-{
-	struct page *page;
-
-	do {
-		page = get_page_from_freelist(gfp_mask, order,
-						ALLOC_NO_WATERMARKS, ac);
-
-		if (!page && gfp_mask & __GFP_NOFAIL)
-			wait_iff_congested(ac->preferred_zone, BLK_RW_ASYNC,
-									HZ/50);
-	} while (!page && (gfp_mask & __GFP_NOFAIL));
-
-	return page;
-}
-
 static void wake_all_kswapds(unsigned int order, const struct alloc_context *ac)
 {
 	struct zoneref *z;
@@ -3068,12 +3046,16 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		 * allocations are system rather than user orientated
 		 */
 		ac->zonelist = node_zonelist(numa_node_id(), gfp_mask);
+		do {
+			page = get_page_from_freelist(gfp_mask, order,
+							ALLOC_NO_WATERMARKS, ac);
+			if (page)
+				goto got_pg;
 
-		page = __alloc_pages_high_priority(gfp_mask, order, ac);
-
-		if (page) {
-			goto got_pg;
-		}
+			if (gfp_mask & __GFP_NOFAIL)
+				wait_iff_congested(ac->preferred_zone,
+						   BLK_RW_ASYNC, HZ/50);
+		} while (gfp_mask & __GFP_NOFAIL);
 	}
 
 	/* Caller is not willing to reclaim, we can't balance anything */
-- 
2.6.2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 2/2] mm: do not loop over ALLOC_NO_WATERMARKS without triggering reclaim
  2015-11-16 13:22 ` mhocko
@ 2015-11-16 13:22   ` mhocko
  -1 siblings, 0 replies; 28+ messages in thread
From: mhocko @ 2015-11-16 13:22 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Mel Gorman, David Rientjes, linux-mm, LKML, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

__alloc_pages_slowpath is looping over ALLOC_NO_WATERMARKS requests if
__GFP_NOFAIL is requested. This is fragile because we are basically
relying on somebody else to make the reclaim (be it the direct reclaim
or OOM killer) for us. The caller might be holding resources (e.g.
locks) which block other other reclaimers from making any progress for
example. Remove the retry loop and rely on __alloc_pages_slowpath to
invoke all allowed reclaim steps and retry logic.

We have to be careful about __GFP_NOFAIL allocations from the
PF_MEMALLOC context even though this is a very bad idea to begin with
because no progress can be gurateed at all.  We shouldn't break the
__GFP_NOFAIL semantic here though. It could be argued that this is
essentially GFP_NOWAIT context which we do not support but PF_MEMALLOC
is much harder to check for existing users because they might happen
deep down the code path performed much later after setting the flag
so we cannot really rule out there is no kernel path triggering this
combination.

Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/page_alloc.c | 32 ++++++++++++++++++--------------
 1 file changed, 18 insertions(+), 14 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b153fa3d0b9b..df7746280427 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3046,32 +3046,36 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		 * allocations are system rather than user orientated
 		 */
 		ac->zonelist = node_zonelist(numa_node_id(), gfp_mask);
-		do {
-			page = get_page_from_freelist(gfp_mask, order,
-							ALLOC_NO_WATERMARKS, ac);
-			if (page)
-				goto got_pg;
-
-			if (gfp_mask & __GFP_NOFAIL)
-				wait_iff_congested(ac->preferred_zone,
-						   BLK_RW_ASYNC, HZ/50);
-		} while (gfp_mask & __GFP_NOFAIL);
+		page = get_page_from_freelist(gfp_mask, order,
+						ALLOC_NO_WATERMARKS, ac);
+		if (page)
+			goto got_pg;
 	}
 
 	/* Caller is not willing to reclaim, we can't balance anything */
 	if (!can_direct_reclaim) {
 		/*
-		 * All existing users of the deprecated __GFP_NOFAIL are
-		 * blockable, so warn of any new users that actually allow this
-		 * type of allocation to fail.
+		 * All existing users of the __GFP_NOFAIL are blockable, so warn
+		 * of any new users that actually allow this type of allocation
+		 * to fail.
 		 */
 		WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL);
 		goto nopage;
 	}
 
 	/* Avoid recursion of direct reclaim */
-	if (current->flags & PF_MEMALLOC)
+	if (current->flags & PF_MEMALLOC) {
+		/*
+		 * __GFP_NOFAIL request from this context is rather bizarre
+		 * because we cannot reclaim anything and only can loop waiting
+		 * for somebody to do a work for us.
+		 */
+		if (WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
+			cond_resched();
+			goto retry;
+		}
 		goto nopage;
+	}
 
 	/* Avoid allocations with no watermarks from looping endlessly */
 	if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
-- 
2.6.2


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 2/2] mm: do not loop over ALLOC_NO_WATERMARKS without triggering reclaim
@ 2015-11-16 13:22   ` mhocko
  0 siblings, 0 replies; 28+ messages in thread
From: mhocko @ 2015-11-16 13:22 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Mel Gorman, David Rientjes, linux-mm, LKML, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

__alloc_pages_slowpath is looping over ALLOC_NO_WATERMARKS requests if
__GFP_NOFAIL is requested. This is fragile because we are basically
relying on somebody else to make the reclaim (be it the direct reclaim
or OOM killer) for us. The caller might be holding resources (e.g.
locks) which block other other reclaimers from making any progress for
example. Remove the retry loop and rely on __alloc_pages_slowpath to
invoke all allowed reclaim steps and retry logic.

We have to be careful about __GFP_NOFAIL allocations from the
PF_MEMALLOC context even though this is a very bad idea to begin with
because no progress can be gurateed at all.  We shouldn't break the
__GFP_NOFAIL semantic here though. It could be argued that this is
essentially GFP_NOWAIT context which we do not support but PF_MEMALLOC
is much harder to check for existing users because they might happen
deep down the code path performed much later after setting the flag
so we cannot really rule out there is no kernel path triggering this
combination.

Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/page_alloc.c | 32 ++++++++++++++++++--------------
 1 file changed, 18 insertions(+), 14 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b153fa3d0b9b..df7746280427 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3046,32 +3046,36 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		 * allocations are system rather than user orientated
 		 */
 		ac->zonelist = node_zonelist(numa_node_id(), gfp_mask);
-		do {
-			page = get_page_from_freelist(gfp_mask, order,
-							ALLOC_NO_WATERMARKS, ac);
-			if (page)
-				goto got_pg;
-
-			if (gfp_mask & __GFP_NOFAIL)
-				wait_iff_congested(ac->preferred_zone,
-						   BLK_RW_ASYNC, HZ/50);
-		} while (gfp_mask & __GFP_NOFAIL);
+		page = get_page_from_freelist(gfp_mask, order,
+						ALLOC_NO_WATERMARKS, ac);
+		if (page)
+			goto got_pg;
 	}
 
 	/* Caller is not willing to reclaim, we can't balance anything */
 	if (!can_direct_reclaim) {
 		/*
-		 * All existing users of the deprecated __GFP_NOFAIL are
-		 * blockable, so warn of any new users that actually allow this
-		 * type of allocation to fail.
+		 * All existing users of the __GFP_NOFAIL are blockable, so warn
+		 * of any new users that actually allow this type of allocation
+		 * to fail.
 		 */
 		WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL);
 		goto nopage;
 	}
 
 	/* Avoid recursion of direct reclaim */
-	if (current->flags & PF_MEMALLOC)
+	if (current->flags & PF_MEMALLOC) {
+		/*
+		 * __GFP_NOFAIL request from this context is rather bizarre
+		 * because we cannot reclaim anything and only can loop waiting
+		 * for somebody to do a work for us.
+		 */
+		if (WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
+			cond_resched();
+			goto retry;
+		}
 		goto nopage;
+	}
 
 	/* Avoid allocations with no watermarks from looping endlessly */
 	if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
-- 
2.6.2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] mm: get rid of __alloc_pages_high_priority
  2015-11-16 13:22   ` mhocko
@ 2015-11-16 18:43     ` Mel Gorman
  -1 siblings, 0 replies; 28+ messages in thread
From: Mel Gorman @ 2015-11-16 18:43 UTC (permalink / raw)
  To: mhocko; +Cc: Andrew Morton, David Rientjes, linux-mm, LKML, Michal Hocko

On Mon, Nov 16, 2015 at 02:22:18PM +0100, mhocko@kernel.org wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> __alloc_pages_high_priority doesn't do anything special other than it
> calls get_page_from_freelist and loops around GFP_NOFAIL allocation
> until it succeeds. It would be better if the first part was done in
> __alloc_pages_slowpath where we modify the zonelist because this would
> be easier to read and understand. Opencoding the function into its only
> caller allows to simplify it a bit as well.
> 
> This patch doesn't introduce any functional changes.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Acked-by: Mel Gorman <mgorman@suse.de>

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] mm: get rid of __alloc_pages_high_priority
@ 2015-11-16 18:43     ` Mel Gorman
  0 siblings, 0 replies; 28+ messages in thread
From: Mel Gorman @ 2015-11-16 18:43 UTC (permalink / raw)
  To: mhocko; +Cc: Andrew Morton, David Rientjes, linux-mm, LKML, Michal Hocko

On Mon, Nov 16, 2015 at 02:22:18PM +0100, mhocko@kernel.org wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> __alloc_pages_high_priority doesn't do anything special other than it
> calls get_page_from_freelist and loops around GFP_NOFAIL allocation
> until it succeeds. It would be better if the first part was done in
> __alloc_pages_slowpath where we modify the zonelist because this would
> be easier to read and understand. Opencoding the function into its only
> caller allows to simplify it a bit as well.
> 
> This patch doesn't introduce any functional changes.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Acked-by: Mel Gorman <mgorman@suse.de>

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] mm: get rid of __alloc_pages_high_priority
  2015-11-16 13:22   ` mhocko
@ 2015-11-16 21:14     ` David Rientjes
  -1 siblings, 0 replies; 28+ messages in thread
From: David Rientjes @ 2015-11-16 21:14 UTC (permalink / raw)
  To: mhocko; +Cc: Andrew Morton, Mel Gorman, linux-mm, LKML, Michal Hocko

On Mon, 16 Nov 2015, mhocko@kernel.org wrote:

> From: Michal Hocko <mhocko@suse.com>
> 
> __alloc_pages_high_priority doesn't do anything special other than it
> calls get_page_from_freelist and loops around GFP_NOFAIL allocation
> until it succeeds. It would be better if the first part was done in
> __alloc_pages_slowpath where we modify the zonelist because this would
> be easier to read and understand. Opencoding the function into its only
> caller allows to simplify it a bit as well.
> 
> This patch doesn't introduce any functional changes.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Acked-by: David Rientjes <rientjes@google.com>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] mm: get rid of __alloc_pages_high_priority
@ 2015-11-16 21:14     ` David Rientjes
  0 siblings, 0 replies; 28+ messages in thread
From: David Rientjes @ 2015-11-16 21:14 UTC (permalink / raw)
  To: mhocko; +Cc: Andrew Morton, Mel Gorman, linux-mm, LKML, Michal Hocko

On Mon, 16 Nov 2015, mhocko@kernel.org wrote:

> From: Michal Hocko <mhocko@suse.com>
> 
> __alloc_pages_high_priority doesn't do anything special other than it
> calls get_page_from_freelist and loops around GFP_NOFAIL allocation
> until it succeeds. It would be better if the first part was done in
> __alloc_pages_slowpath where we modify the zonelist because this would
> be easier to read and understand. Opencoding the function into its only
> caller allows to simplify it a bit as well.
> 
> This patch doesn't introduce any functional changes.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Acked-by: David Rientjes <rientjes@google.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] mm: do not loop over ALLOC_NO_WATERMARKS without triggering reclaim
  2015-11-16 13:22   ` mhocko
@ 2015-11-16 21:18     ` David Rientjes
  -1 siblings, 0 replies; 28+ messages in thread
From: David Rientjes @ 2015-11-16 21:18 UTC (permalink / raw)
  To: mhocko; +Cc: Andrew Morton, Mel Gorman, linux-mm, LKML, Michal Hocko

On Mon, 16 Nov 2015, mhocko@kernel.org wrote:

> From: Michal Hocko <mhocko@suse.com>
> 
> __alloc_pages_slowpath is looping over ALLOC_NO_WATERMARKS requests if
> __GFP_NOFAIL is requested. This is fragile because we are basically
> relying on somebody else to make the reclaim (be it the direct reclaim
> or OOM killer) for us. The caller might be holding resources (e.g.
> locks) which block other other reclaimers from making any progress for
> example. Remove the retry loop and rely on __alloc_pages_slowpath to
> invoke all allowed reclaim steps and retry logic.
> 
> We have to be careful about __GFP_NOFAIL allocations from the
> PF_MEMALLOC context even though this is a very bad idea to begin with
> because no progress can be gurateed at all.  We shouldn't break the
> __GFP_NOFAIL semantic here though. It could be argued that this is
> essentially GFP_NOWAIT context which we do not support but PF_MEMALLOC
> is much harder to check for existing users because they might happen
> deep down the code path performed much later after setting the flag
> so we cannot really rule out there is no kernel path triggering this
> combination.
> 
> Acked-by: Mel Gorman <mgorman@suse.de>
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Acked-by: David Rientjes <rientjes@google.com>

It'll be scary if anything actually relies on this, but I think it's more 
correct.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] mm: do not loop over ALLOC_NO_WATERMARKS without triggering reclaim
@ 2015-11-16 21:18     ` David Rientjes
  0 siblings, 0 replies; 28+ messages in thread
From: David Rientjes @ 2015-11-16 21:18 UTC (permalink / raw)
  To: mhocko; +Cc: Andrew Morton, Mel Gorman, linux-mm, LKML, Michal Hocko

On Mon, 16 Nov 2015, mhocko@kernel.org wrote:

> From: Michal Hocko <mhocko@suse.com>
> 
> __alloc_pages_slowpath is looping over ALLOC_NO_WATERMARKS requests if
> __GFP_NOFAIL is requested. This is fragile because we are basically
> relying on somebody else to make the reclaim (be it the direct reclaim
> or OOM killer) for us. The caller might be holding resources (e.g.
> locks) which block other other reclaimers from making any progress for
> example. Remove the retry loop and rely on __alloc_pages_slowpath to
> invoke all allowed reclaim steps and retry logic.
> 
> We have to be careful about __GFP_NOFAIL allocations from the
> PF_MEMALLOC context even though this is a very bad idea to begin with
> because no progress can be gurateed at all.  We shouldn't break the
> __GFP_NOFAIL semantic here though. It could be argued that this is
> essentially GFP_NOWAIT context which we do not support but PF_MEMALLOC
> is much harder to check for existing users because they might happen
> deep down the code path performed much later after setting the flag
> so we cannot really rule out there is no kernel path triggering this
> combination.
> 
> Acked-by: Mel Gorman <mgorman@suse.de>
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Acked-by: David Rientjes <rientjes@google.com>

It'll be scary if anything actually relies on this, but I think it's more 
correct.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] mm: do not loop over ALLOC_NO_WATERMARKS without triggering reclaim
  2015-11-16 13:22   ` mhocko
@ 2015-11-17 10:58     ` Tetsuo Handa
  -1 siblings, 0 replies; 28+ messages in thread
From: Tetsuo Handa @ 2015-11-17 10:58 UTC (permalink / raw)
  To: mhocko, Andrew Morton
  Cc: Mel Gorman, David Rientjes, linux-mm, LKML, Michal Hocko

Michal Hocko wrote:
> __alloc_pages_slowpath is looping over ALLOC_NO_WATERMARKS requests if
> __GFP_NOFAIL is requested. This is fragile because we are basically
> relying on somebody else to make the reclaim (be it the direct reclaim
> or OOM killer) for us. The caller might be holding resources (e.g.
> locks) which block other other reclaimers from making any progress for
> example. Remove the retry loop and rely on __alloc_pages_slowpath to
> invoke all allowed reclaim steps and retry logic.

This implies invoking OOM killer, doesn't it?

>   	/* Avoid recursion of direct reclaim */
> -	if (current->flags & PF_MEMALLOC)
> +	if (current->flags & PF_MEMALLOC) {
> +		/*
> +		 * __GFP_NOFAIL request from this context is rather bizarre
> +		 * because we cannot reclaim anything and only can loop waiting
> +		 * for somebody to do a work for us.
> +		 */
> +		if (WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
> +			cond_resched();
> +			goto retry;

I think that this "goto retry;" omits call to out_of_memory() which is allowed
for __GFP_NOFAIL allocations. Even if this is what you meant, current thread
can be a workqueue, which currently need a short sleep (as with
wait_iff_congested() changes), can't it?

> +		}
>   		goto nopage;
> +	}
>   
>   	/* Avoid allocations with no watermarks from looping endlessly */
>   	if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
> 

Well, is it cond_resched() which should include

  if (current->flags & PF_WQ_WORKER)
  	schedule_timeout(1);

than wait_iff_congested() because not all yield calls use wait_iff_congested()
and giving pending workqueue jobs a chance to be processed is anyway preferable?

  int __sched _cond_resched(void)
  {
  	if (should_resched(0)) {
  		if ((current->flags & PF_WQ_WORKER) && workqueue_has_pending_jobs())
  			schedule_timeout(1);
  		else
  			preempt_schedule_common();
  		return 1;
  	}
  	return 0;
  }


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] mm: do not loop over ALLOC_NO_WATERMARKS without triggering reclaim
@ 2015-11-17 10:58     ` Tetsuo Handa
  0 siblings, 0 replies; 28+ messages in thread
From: Tetsuo Handa @ 2015-11-17 10:58 UTC (permalink / raw)
  To: mhocko, Andrew Morton
  Cc: Mel Gorman, David Rientjes, linux-mm, LKML, Michal Hocko

Michal Hocko wrote:
> __alloc_pages_slowpath is looping over ALLOC_NO_WATERMARKS requests if
> __GFP_NOFAIL is requested. This is fragile because we are basically
> relying on somebody else to make the reclaim (be it the direct reclaim
> or OOM killer) for us. The caller might be holding resources (e.g.
> locks) which block other other reclaimers from making any progress for
> example. Remove the retry loop and rely on __alloc_pages_slowpath to
> invoke all allowed reclaim steps and retry logic.

This implies invoking OOM killer, doesn't it?

>   	/* Avoid recursion of direct reclaim */
> -	if (current->flags & PF_MEMALLOC)
> +	if (current->flags & PF_MEMALLOC) {
> +		/*
> +		 * __GFP_NOFAIL request from this context is rather bizarre
> +		 * because we cannot reclaim anything and only can loop waiting
> +		 * for somebody to do a work for us.
> +		 */
> +		if (WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
> +			cond_resched();
> +			goto retry;

I think that this "goto retry;" omits call to out_of_memory() which is allowed
for __GFP_NOFAIL allocations. Even if this is what you meant, current thread
can be a workqueue, which currently need a short sleep (as with
wait_iff_congested() changes), can't it?

> +		}
>   		goto nopage;
> +	}
>   
>   	/* Avoid allocations with no watermarks from looping endlessly */
>   	if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
> 

Well, is it cond_resched() which should include

  if (current->flags & PF_WQ_WORKER)
  	schedule_timeout(1);

than wait_iff_congested() because not all yield calls use wait_iff_congested()
and giving pending workqueue jobs a chance to be processed is anyway preferable?

  int __sched _cond_resched(void)
  {
  	if (should_resched(0)) {
  		if ((current->flags & PF_WQ_WORKER) && workqueue_has_pending_jobs())
  			schedule_timeout(1);
  		else
  			preempt_schedule_common();
  		return 1;
  	}
  	return 0;
  }

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] mm: do not loop over ALLOC_NO_WATERMARKS without triggering reclaim
  2015-11-17 10:58     ` Tetsuo Handa
@ 2015-11-18  9:11       ` Michal Hocko
  -1 siblings, 0 replies; 28+ messages in thread
From: Michal Hocko @ 2015-11-18  9:11 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: Andrew Morton, Mel Gorman, David Rientjes, linux-mm, LKML

On Tue 17-11-15 19:58:09, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > __alloc_pages_slowpath is looping over ALLOC_NO_WATERMARKS requests if
> > __GFP_NOFAIL is requested. This is fragile because we are basically
> > relying on somebody else to make the reclaim (be it the direct reclaim
> > or OOM killer) for us. The caller might be holding resources (e.g.
> > locks) which block other other reclaimers from making any progress for
> > example. Remove the retry loop and rely on __alloc_pages_slowpath to
> > invoke all allowed reclaim steps and retry logic.
> 
> This implies invoking OOM killer, doesn't it?

It does and the changelog is explicit about this.

> >   	/* Avoid recursion of direct reclaim */
> > -	if (current->flags & PF_MEMALLOC)
> > +	if (current->flags & PF_MEMALLOC) {
> > +		/*
> > +		 * __GFP_NOFAIL request from this context is rather bizarre
> > +		 * because we cannot reclaim anything and only can loop waiting
> > +		 * for somebody to do a work for us.
> > +		 */
> > +		if (WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
> > +			cond_resched();
> > +			goto retry;
> 
> I think that this "goto retry;" omits call to out_of_memory() which is allowed
> for __GFP_NOFAIL allocations. 

It wasn't called for PF_MEMALLOC requests though. Whether invoking OOM
killer is a good idea for this case is a harder question and out of
scope of this patch.

> Even if this is what you meant, current thread
> can be a workqueue, which currently need a short sleep (as with
> wait_iff_congested() changes), can't it?

As the changelog tries to clarify PF_MEMALLOC with __GFP_NOFAIL is
basically a bug. That is the reason I am adding WARN_ON there. I do not
think making this code more complex for abusers/buggy code is really
worthwhile. Besides that I fail to see why a work item would ever
want to set PF_MEMALLOC for legitimate reasons. I have done a quick git
grep over the tree and there doesn't seem to be any user.

> 
> > +		}
> >   		goto nopage;
> > +	}
> >   
> >   	/* Avoid allocations with no watermarks from looping endlessly */
> >   	if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
> > 
> 
> Well, is it cond_resched() which should include
> 
>   if (current->flags & PF_WQ_WORKER)
>   	schedule_timeout(1);

I believe you are getting off-topic here.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] mm: do not loop over ALLOC_NO_WATERMARKS without triggering reclaim
@ 2015-11-18  9:11       ` Michal Hocko
  0 siblings, 0 replies; 28+ messages in thread
From: Michal Hocko @ 2015-11-18  9:11 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: Andrew Morton, Mel Gorman, David Rientjes, linux-mm, LKML

On Tue 17-11-15 19:58:09, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > __alloc_pages_slowpath is looping over ALLOC_NO_WATERMARKS requests if
> > __GFP_NOFAIL is requested. This is fragile because we are basically
> > relying on somebody else to make the reclaim (be it the direct reclaim
> > or OOM killer) for us. The caller might be holding resources (e.g.
> > locks) which block other other reclaimers from making any progress for
> > example. Remove the retry loop and rely on __alloc_pages_slowpath to
> > invoke all allowed reclaim steps and retry logic.
> 
> This implies invoking OOM killer, doesn't it?

It does and the changelog is explicit about this.

> >   	/* Avoid recursion of direct reclaim */
> > -	if (current->flags & PF_MEMALLOC)
> > +	if (current->flags & PF_MEMALLOC) {
> > +		/*
> > +		 * __GFP_NOFAIL request from this context is rather bizarre
> > +		 * because we cannot reclaim anything and only can loop waiting
> > +		 * for somebody to do a work for us.
> > +		 */
> > +		if (WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
> > +			cond_resched();
> > +			goto retry;
> 
> I think that this "goto retry;" omits call to out_of_memory() which is allowed
> for __GFP_NOFAIL allocations. 

It wasn't called for PF_MEMALLOC requests though. Whether invoking OOM
killer is a good idea for this case is a harder question and out of
scope of this patch.

> Even if this is what you meant, current thread
> can be a workqueue, which currently need a short sleep (as with
> wait_iff_congested() changes), can't it?

As the changelog tries to clarify PF_MEMALLOC with __GFP_NOFAIL is
basically a bug. That is the reason I am adding WARN_ON there. I do not
think making this code more complex for abusers/buggy code is really
worthwhile. Besides that I fail to see why a work item would ever
want to set PF_MEMALLOC for legitimate reasons. I have done a quick git
grep over the tree and there doesn't seem to be any user.

> 
> > +		}
> >   		goto nopage;
> > +	}
> >   
> >   	/* Avoid allocations with no watermarks from looping endlessly */
> >   	if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
> > 
> 
> Well, is it cond_resched() which should include
> 
>   if (current->flags & PF_WQ_WORKER)
>   	schedule_timeout(1);

I believe you are getting off-topic here.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] mm: do not loop over ALLOC_NO_WATERMARKS without triggering reclaim
  2015-11-18  9:11       ` Michal Hocko
@ 2015-11-18  9:22         ` Michal Hocko
  -1 siblings, 0 replies; 28+ messages in thread
From: Michal Hocko @ 2015-11-18  9:22 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: Andrew Morton, Mel Gorman, David Rientjes, linux-mm, LKML

On Wed 18-11-15 10:11:01, Michal Hocko wrote:
> Besides that I fail to see why a work item would ever
> want to set PF_MEMALLOC for legitimate reasons. I have done a quick git
> grep over the tree and there doesn't seem to be any user.

OK, I have missed one case. xfs_btree_split_worker is really setting
PF_MEMALLOC from the worker context basically to inherit the flag from
kswapd. This is a legitimate use but it doesn't affect the allocation
path so it is not related to this discussion.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] mm: do not loop over ALLOC_NO_WATERMARKS without triggering reclaim
@ 2015-11-18  9:22         ` Michal Hocko
  0 siblings, 0 replies; 28+ messages in thread
From: Michal Hocko @ 2015-11-18  9:22 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: Andrew Morton, Mel Gorman, David Rientjes, linux-mm, LKML

On Wed 18-11-15 10:11:01, Michal Hocko wrote:
> Besides that I fail to see why a work item would ever
> want to set PF_MEMALLOC for legitimate reasons. I have done a quick git
> grep over the tree and there doesn't seem to be any user.

OK, I have missed one case. xfs_btree_split_worker is really setting
PF_MEMALLOC from the worker context basically to inherit the flag from
kswapd. This is a legitimate use but it doesn't affect the allocation
path so it is not related to this discussion.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] mm: get rid of __alloc_pages_high_priority
  2015-11-16 13:22   ` mhocko
@ 2015-11-18 14:48     ` Vlastimil Babka
  -1 siblings, 0 replies; 28+ messages in thread
From: Vlastimil Babka @ 2015-11-18 14:48 UTC (permalink / raw)
  To: mhocko, Andrew Morton
  Cc: Mel Gorman, David Rientjes, linux-mm, LKML, Michal Hocko

On 11/16/2015 02:22 PM, mhocko@kernel.org wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> __alloc_pages_high_priority doesn't do anything special other than it
> calls get_page_from_freelist and loops around GFP_NOFAIL allocation
> until it succeeds. It would be better if the first part was done in
> __alloc_pages_slowpath where we modify the zonelist because this would
> be easier to read and understand. Opencoding the function into its only
> caller allows to simplify it a bit as well.
> 
> This patch doesn't introduce any functional changes.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] mm: get rid of __alloc_pages_high_priority
@ 2015-11-18 14:48     ` Vlastimil Babka
  0 siblings, 0 replies; 28+ messages in thread
From: Vlastimil Babka @ 2015-11-18 14:48 UTC (permalink / raw)
  To: mhocko, Andrew Morton
  Cc: Mel Gorman, David Rientjes, linux-mm, LKML, Michal Hocko

On 11/16/2015 02:22 PM, mhocko@kernel.org wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> __alloc_pages_high_priority doesn't do anything special other than it
> calls get_page_from_freelist and loops around GFP_NOFAIL allocation
> until it succeeds. It would be better if the first part was done in
> __alloc_pages_slowpath where we modify the zonelist because this would
> be easier to read and understand. Opencoding the function into its only
> caller allows to simplify it a bit as well.
> 
> This patch doesn't introduce any functional changes.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] mm: do not loop over ALLOC_NO_WATERMARKS without triggering reclaim
  2015-11-16 13:22   ` mhocko
@ 2015-11-18 14:57     ` Vlastimil Babka
  -1 siblings, 0 replies; 28+ messages in thread
From: Vlastimil Babka @ 2015-11-18 14:57 UTC (permalink / raw)
  To: mhocko, Andrew Morton
  Cc: Mel Gorman, David Rientjes, linux-mm, LKML, Michal Hocko

On 11/16/2015 02:22 PM, mhocko@kernel.org wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> __alloc_pages_slowpath is looping over ALLOC_NO_WATERMARKS requests if
> __GFP_NOFAIL is requested. This is fragile because we are basically
> relying on somebody else to make the reclaim (be it the direct reclaim
> or OOM killer) for us. The caller might be holding resources (e.g.
> locks) which block other other reclaimers from making any progress for
> example. Remove the retry loop and rely on __alloc_pages_slowpath to
> invoke all allowed reclaim steps and retry logic.
> 
> We have to be careful about __GFP_NOFAIL allocations from the
> PF_MEMALLOC context even though this is a very bad idea to begin with
> because no progress can be gurateed at all.  We shouldn't break the
> __GFP_NOFAIL semantic here though. It could be argued that this is
> essentially GFP_NOWAIT context which we do not support but PF_MEMALLOC
> is much harder to check for existing users because they might happen
> deep down the code path performed much later after setting the flag
> so we cannot really rule out there is no kernel path triggering this
> combination.
> 
> Acked-by: Mel Gorman <mgorman@suse.de>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/page_alloc.c | 32 ++++++++++++++++++--------------
>  1 file changed, 18 insertions(+), 14 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index b153fa3d0b9b..df7746280427 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3046,32 +3046,36 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  		 * allocations are system rather than user orientated
>  		 */
>  		ac->zonelist = node_zonelist(numa_node_id(), gfp_mask);
> -		do {
> -			page = get_page_from_freelist(gfp_mask, order,
> -							ALLOC_NO_WATERMARKS, ac);
> -			if (page)
> -				goto got_pg;
> -
> -			if (gfp_mask & __GFP_NOFAIL)
> -				wait_iff_congested(ac->preferred_zone,
> -						   BLK_RW_ASYNC, HZ/50);

I've been thinking if the lack of unconditional wait_iff_congested() can affect
something negatively. I guess not?

> -		} while (gfp_mask & __GFP_NOFAIL);
> +		page = get_page_from_freelist(gfp_mask, order,
> +						ALLOC_NO_WATERMARKS, ac);
> +		if (page)
> +			goto got_pg;
>  	}
>  
>  	/* Caller is not willing to reclaim, we can't balance anything */
>  	if (!can_direct_reclaim) {
>  		/*
> -		 * All existing users of the deprecated __GFP_NOFAIL are
> -		 * blockable, so warn of any new users that actually allow this
> -		 * type of allocation to fail.
> +		 * All existing users of the __GFP_NOFAIL are blockable, so warn
> +		 * of any new users that actually allow this type of allocation
> +		 * to fail.
>  		 */
>  		WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL);
>  		goto nopage;
>  	}
>  
>  	/* Avoid recursion of direct reclaim */
> -	if (current->flags & PF_MEMALLOC)
> +	if (current->flags & PF_MEMALLOC) {
> +		/*
> +		 * __GFP_NOFAIL request from this context is rather bizarre
> +		 * because we cannot reclaim anything and only can loop waiting
> +		 * for somebody to do a work for us.
> +		 */
> +		if (WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
> +			cond_resched();
> +			goto retry;
> +		}
>  		goto nopage;
> +	}
>  
>  	/* Avoid allocations with no watermarks from looping endlessly */
>  	if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] mm: do not loop over ALLOC_NO_WATERMARKS without triggering reclaim
@ 2015-11-18 14:57     ` Vlastimil Babka
  0 siblings, 0 replies; 28+ messages in thread
From: Vlastimil Babka @ 2015-11-18 14:57 UTC (permalink / raw)
  To: mhocko, Andrew Morton
  Cc: Mel Gorman, David Rientjes, linux-mm, LKML, Michal Hocko

On 11/16/2015 02:22 PM, mhocko@kernel.org wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> __alloc_pages_slowpath is looping over ALLOC_NO_WATERMARKS requests if
> __GFP_NOFAIL is requested. This is fragile because we are basically
> relying on somebody else to make the reclaim (be it the direct reclaim
> or OOM killer) for us. The caller might be holding resources (e.g.
> locks) which block other other reclaimers from making any progress for
> example. Remove the retry loop and rely on __alloc_pages_slowpath to
> invoke all allowed reclaim steps and retry logic.
> 
> We have to be careful about __GFP_NOFAIL allocations from the
> PF_MEMALLOC context even though this is a very bad idea to begin with
> because no progress can be gurateed at all.  We shouldn't break the
> __GFP_NOFAIL semantic here though. It could be argued that this is
> essentially GFP_NOWAIT context which we do not support but PF_MEMALLOC
> is much harder to check for existing users because they might happen
> deep down the code path performed much later after setting the flag
> so we cannot really rule out there is no kernel path triggering this
> combination.
> 
> Acked-by: Mel Gorman <mgorman@suse.de>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/page_alloc.c | 32 ++++++++++++++++++--------------
>  1 file changed, 18 insertions(+), 14 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index b153fa3d0b9b..df7746280427 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3046,32 +3046,36 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  		 * allocations are system rather than user orientated
>  		 */
>  		ac->zonelist = node_zonelist(numa_node_id(), gfp_mask);
> -		do {
> -			page = get_page_from_freelist(gfp_mask, order,
> -							ALLOC_NO_WATERMARKS, ac);
> -			if (page)
> -				goto got_pg;
> -
> -			if (gfp_mask & __GFP_NOFAIL)
> -				wait_iff_congested(ac->preferred_zone,
> -						   BLK_RW_ASYNC, HZ/50);

I've been thinking if the lack of unconditional wait_iff_congested() can affect
something negatively. I guess not?

> -		} while (gfp_mask & __GFP_NOFAIL);
> +		page = get_page_from_freelist(gfp_mask, order,
> +						ALLOC_NO_WATERMARKS, ac);
> +		if (page)
> +			goto got_pg;
>  	}
>  
>  	/* Caller is not willing to reclaim, we can't balance anything */
>  	if (!can_direct_reclaim) {
>  		/*
> -		 * All existing users of the deprecated __GFP_NOFAIL are
> -		 * blockable, so warn of any new users that actually allow this
> -		 * type of allocation to fail.
> +		 * All existing users of the __GFP_NOFAIL are blockable, so warn
> +		 * of any new users that actually allow this type of allocation
> +		 * to fail.
>  		 */
>  		WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL);
>  		goto nopage;
>  	}
>  
>  	/* Avoid recursion of direct reclaim */
> -	if (current->flags & PF_MEMALLOC)
> +	if (current->flags & PF_MEMALLOC) {
> +		/*
> +		 * __GFP_NOFAIL request from this context is rather bizarre
> +		 * because we cannot reclaim anything and only can loop waiting
> +		 * for somebody to do a work for us.
> +		 */
> +		if (WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
> +			cond_resched();
> +			goto retry;
> +		}
>  		goto nopage;
> +	}
>  
>  	/* Avoid allocations with no watermarks from looping endlessly */
>  	if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] mm: do not loop over ALLOC_NO_WATERMARKS without triggering reclaim
  2015-11-18 14:57     ` Vlastimil Babka
@ 2015-11-18 15:11       ` Michal Hocko
  -1 siblings, 0 replies; 28+ messages in thread
From: Michal Hocko @ 2015-11-18 15:11 UTC (permalink / raw)
  To: Vlastimil Babka; +Cc: Andrew Morton, Mel Gorman, David Rientjes, linux-mm, LKML

On Wed 18-11-15 15:57:45, Vlastimil Babka wrote:
[...]
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -3046,32 +3046,36 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> >  		 * allocations are system rather than user orientated
> >  		 */
> >  		ac->zonelist = node_zonelist(numa_node_id(), gfp_mask);
> > -		do {
> > -			page = get_page_from_freelist(gfp_mask, order,
> > -							ALLOC_NO_WATERMARKS, ac);
> > -			if (page)
> > -				goto got_pg;
> > -
> > -			if (gfp_mask & __GFP_NOFAIL)
> > -				wait_iff_congested(ac->preferred_zone,
> > -						   BLK_RW_ASYNC, HZ/50);
> 
> I've been thinking if the lack of unconditional wait_iff_congested() can affect
> something negatively. I guess not?

Considering that the wait_iff_congested is removed only for PF_MEMALLOC
with __GFP_NOFAIL which should be non-existent in the kernel then I
think the risk is really low. Even if there was a caller _and_ there
was a congestion then the behavior wouldn't be much more worse than
what we have currently. The system is out of memory hoplessly if
ALLOC_NO_WATERMARKS allocation fails.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] mm: do not loop over ALLOC_NO_WATERMARKS without triggering reclaim
@ 2015-11-18 15:11       ` Michal Hocko
  0 siblings, 0 replies; 28+ messages in thread
From: Michal Hocko @ 2015-11-18 15:11 UTC (permalink / raw)
  To: Vlastimil Babka; +Cc: Andrew Morton, Mel Gorman, David Rientjes, linux-mm, LKML

On Wed 18-11-15 15:57:45, Vlastimil Babka wrote:
[...]
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -3046,32 +3046,36 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> >  		 * allocations are system rather than user orientated
> >  		 */
> >  		ac->zonelist = node_zonelist(numa_node_id(), gfp_mask);
> > -		do {
> > -			page = get_page_from_freelist(gfp_mask, order,
> > -							ALLOC_NO_WATERMARKS, ac);
> > -			if (page)
> > -				goto got_pg;
> > -
> > -			if (gfp_mask & __GFP_NOFAIL)
> > -				wait_iff_congested(ac->preferred_zone,
> > -						   BLK_RW_ASYNC, HZ/50);
> 
> I've been thinking if the lack of unconditional wait_iff_congested() can affect
> something negatively. I guess not?

Considering that the wait_iff_congested is removed only for PF_MEMALLOC
with __GFP_NOFAIL which should be non-existent in the kernel then I
think the risk is really low. Even if there was a caller _and_ there
was a congestion then the behavior wouldn't be much more worse than
what we have currently. The system is out of memory hoplessly if
ALLOC_NO_WATERMARKS allocation fails.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] mm: do not loop over ALLOC_NO_WATERMARKS without triggering reclaim
  2015-11-18 15:11       ` Michal Hocko
@ 2015-11-18 15:19         ` Vlastimil Babka
  -1 siblings, 0 replies; 28+ messages in thread
From: Vlastimil Babka @ 2015-11-18 15:19 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Andrew Morton, Mel Gorman, David Rientjes, linux-mm, LKML

On 11/18/2015 04:11 PM, Michal Hocko wrote:
> On Wed 18-11-15 15:57:45, Vlastimil Babka wrote:
> [...]
>> > --- a/mm/page_alloc.c
>> > +++ b/mm/page_alloc.c
>> > @@ -3046,32 +3046,36 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>> >  		 * allocations are system rather than user orientated
>> >  		 */
>> >  		ac->zonelist = node_zonelist(numa_node_id(), gfp_mask);
>> > -		do {
>> > -			page = get_page_from_freelist(gfp_mask, order,
>> > -							ALLOC_NO_WATERMARKS, ac);
>> > -			if (page)
>> > -				goto got_pg;
>> > -
>> > -			if (gfp_mask & __GFP_NOFAIL)
>> > -				wait_iff_congested(ac->preferred_zone,
>> > -						   BLK_RW_ASYNC, HZ/50);
>> 
>> I've been thinking if the lack of unconditional wait_iff_congested() can affect
>> something negatively. I guess not?
> 
> Considering that the wait_iff_congested is removed only for PF_MEMALLOC
> with __GFP_NOFAIL which should be non-existent in the kernel then I

Hm that one won't reach it indeed, but also not loop, so that wasn't my concern.
I was referring to:

        /* Keep reclaiming pages as long as there is reasonable progress */                            
        pages_reclaimed += did_some_progress;
        if ((did_some_progress && order <= PAGE_ALLOC_COSTLY_ORDER) ||
            ((gfp_mask & __GFP_REPEAT) && pages_reclaimed < (1 << order))) {                           
                /* Wait for some write requests to complete then retry */
                wait_iff_congested(ac->preferred_zone, BLK_RW_ASYNC, HZ/50);                           
                goto retry;                                                                            
        }

Here we might skip the wait_iff_congested and go straight for oom. But it's true
that ordinary allocations that fail to make progress will also not wait, so I
guess it's fine.

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> think the risk is really low. Even if there was a caller _and_ there
> was a congestion then the behavior wouldn't be much more worse than
> what we have currently. The system is out of memory hoplessly if
> ALLOC_NO_WATERMARKS allocation fails.
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] mm: do not loop over ALLOC_NO_WATERMARKS without triggering reclaim
@ 2015-11-18 15:19         ` Vlastimil Babka
  0 siblings, 0 replies; 28+ messages in thread
From: Vlastimil Babka @ 2015-11-18 15:19 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Andrew Morton, Mel Gorman, David Rientjes, linux-mm, LKML

On 11/18/2015 04:11 PM, Michal Hocko wrote:
> On Wed 18-11-15 15:57:45, Vlastimil Babka wrote:
> [...]
>> > --- a/mm/page_alloc.c
>> > +++ b/mm/page_alloc.c
>> > @@ -3046,32 +3046,36 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>> >  		 * allocations are system rather than user orientated
>> >  		 */
>> >  		ac->zonelist = node_zonelist(numa_node_id(), gfp_mask);
>> > -		do {
>> > -			page = get_page_from_freelist(gfp_mask, order,
>> > -							ALLOC_NO_WATERMARKS, ac);
>> > -			if (page)
>> > -				goto got_pg;
>> > -
>> > -			if (gfp_mask & __GFP_NOFAIL)
>> > -				wait_iff_congested(ac->preferred_zone,
>> > -						   BLK_RW_ASYNC, HZ/50);
>> 
>> I've been thinking if the lack of unconditional wait_iff_congested() can affect
>> something negatively. I guess not?
> 
> Considering that the wait_iff_congested is removed only for PF_MEMALLOC
> with __GFP_NOFAIL which should be non-existent in the kernel then I

Hm that one won't reach it indeed, but also not loop, so that wasn't my concern.
I was referring to:

        /* Keep reclaiming pages as long as there is reasonable progress */                            
        pages_reclaimed += did_some_progress;
        if ((did_some_progress && order <= PAGE_ALLOC_COSTLY_ORDER) ||
            ((gfp_mask & __GFP_REPEAT) && pages_reclaimed < (1 << order))) {                           
                /* Wait for some write requests to complete then retry */
                wait_iff_congested(ac->preferred_zone, BLK_RW_ASYNC, HZ/50);                           
                goto retry;                                                                            
        }

Here we might skip the wait_iff_congested and go straight for oom. But it's true
that ordinary allocations that fail to make progress will also not wait, so I
guess it's fine.

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> think the risk is really low. Even if there was a caller _and_ there
> was a congestion then the behavior wouldn't be much more worse than
> what we have currently. The system is out of memory hoplessly if
> ALLOC_NO_WATERMARKS allocation fails.
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] mm: do not loop over ALLOC_NO_WATERMARKS without triggering reclaim
  2015-11-16 13:22   ` mhocko
@ 2015-11-23  9:33     ` Michal Hocko
  -1 siblings, 0 replies; 28+ messages in thread
From: Michal Hocko @ 2015-11-23  9:33 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Mel Gorman, David Rientjes, linux-mm, LKML, Tetsuo Handa

It seems this patch hasn't reached the mmotm tree. Are there any
unresolved concerns left?

On Mon 16-11-15 14:22:19, mhocko@kernel.org wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> __alloc_pages_slowpath is looping over ALLOC_NO_WATERMARKS requests if
> __GFP_NOFAIL is requested. This is fragile because we are basically
> relying on somebody else to make the reclaim (be it the direct reclaim
> or OOM killer) for us. The caller might be holding resources (e.g.
> locks) which block other other reclaimers from making any progress for
> example. Remove the retry loop and rely on __alloc_pages_slowpath to
> invoke all allowed reclaim steps and retry logic.
> 
> We have to be careful about __GFP_NOFAIL allocations from the
> PF_MEMALLOC context even though this is a very bad idea to begin with
> because no progress can be gurateed at all.  We shouldn't break the
> __GFP_NOFAIL semantic here though. It could be argued that this is
> essentially GFP_NOWAIT context which we do not support but PF_MEMALLOC
> is much harder to check for existing users because they might happen
> deep down the code path performed much later after setting the flag
> so we cannot really rule out there is no kernel path triggering this
> combination.
> 
> Acked-by: Mel Gorman <mgorman@suse.de>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/page_alloc.c | 32 ++++++++++++++++++--------------
>  1 file changed, 18 insertions(+), 14 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index b153fa3d0b9b..df7746280427 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3046,32 +3046,36 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  		 * allocations are system rather than user orientated
>  		 */
>  		ac->zonelist = node_zonelist(numa_node_id(), gfp_mask);
> -		do {
> -			page = get_page_from_freelist(gfp_mask, order,
> -							ALLOC_NO_WATERMARKS, ac);
> -			if (page)
> -				goto got_pg;
> -
> -			if (gfp_mask & __GFP_NOFAIL)
> -				wait_iff_congested(ac->preferred_zone,
> -						   BLK_RW_ASYNC, HZ/50);
> -		} while (gfp_mask & __GFP_NOFAIL);
> +		page = get_page_from_freelist(gfp_mask, order,
> +						ALLOC_NO_WATERMARKS, ac);
> +		if (page)
> +			goto got_pg;
>  	}
>  
>  	/* Caller is not willing to reclaim, we can't balance anything */
>  	if (!can_direct_reclaim) {
>  		/*
> -		 * All existing users of the deprecated __GFP_NOFAIL are
> -		 * blockable, so warn of any new users that actually allow this
> -		 * type of allocation to fail.
> +		 * All existing users of the __GFP_NOFAIL are blockable, so warn
> +		 * of any new users that actually allow this type of allocation
> +		 * to fail.
>  		 */
>  		WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL);
>  		goto nopage;
>  	}
>  
>  	/* Avoid recursion of direct reclaim */
> -	if (current->flags & PF_MEMALLOC)
> +	if (current->flags & PF_MEMALLOC) {
> +		/*
> +		 * __GFP_NOFAIL request from this context is rather bizarre
> +		 * because we cannot reclaim anything and only can loop waiting
> +		 * for somebody to do a work for us.
> +		 */
> +		if (WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
> +			cond_resched();
> +			goto retry;
> +		}
>  		goto nopage;
> +	}
>  
>  	/* Avoid allocations with no watermarks from looping endlessly */
>  	if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
> -- 
> 2.6.2
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] mm: do not loop over ALLOC_NO_WATERMARKS without triggering reclaim
@ 2015-11-23  9:33     ` Michal Hocko
  0 siblings, 0 replies; 28+ messages in thread
From: Michal Hocko @ 2015-11-23  9:33 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Mel Gorman, David Rientjes, linux-mm, LKML, Tetsuo Handa

It seems this patch hasn't reached the mmotm tree. Are there any
unresolved concerns left?

On Mon 16-11-15 14:22:19, mhocko@kernel.org wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> __alloc_pages_slowpath is looping over ALLOC_NO_WATERMARKS requests if
> __GFP_NOFAIL is requested. This is fragile because we are basically
> relying on somebody else to make the reclaim (be it the direct reclaim
> or OOM killer) for us. The caller might be holding resources (e.g.
> locks) which block other other reclaimers from making any progress for
> example. Remove the retry loop and rely on __alloc_pages_slowpath to
> invoke all allowed reclaim steps and retry logic.
> 
> We have to be careful about __GFP_NOFAIL allocations from the
> PF_MEMALLOC context even though this is a very bad idea to begin with
> because no progress can be gurateed at all.  We shouldn't break the
> __GFP_NOFAIL semantic here though. It could be argued that this is
> essentially GFP_NOWAIT context which we do not support but PF_MEMALLOC
> is much harder to check for existing users because they might happen
> deep down the code path performed much later after setting the flag
> so we cannot really rule out there is no kernel path triggering this
> combination.
> 
> Acked-by: Mel Gorman <mgorman@suse.de>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/page_alloc.c | 32 ++++++++++++++++++--------------
>  1 file changed, 18 insertions(+), 14 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index b153fa3d0b9b..df7746280427 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3046,32 +3046,36 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  		 * allocations are system rather than user orientated
>  		 */
>  		ac->zonelist = node_zonelist(numa_node_id(), gfp_mask);
> -		do {
> -			page = get_page_from_freelist(gfp_mask, order,
> -							ALLOC_NO_WATERMARKS, ac);
> -			if (page)
> -				goto got_pg;
> -
> -			if (gfp_mask & __GFP_NOFAIL)
> -				wait_iff_congested(ac->preferred_zone,
> -						   BLK_RW_ASYNC, HZ/50);
> -		} while (gfp_mask & __GFP_NOFAIL);
> +		page = get_page_from_freelist(gfp_mask, order,
> +						ALLOC_NO_WATERMARKS, ac);
> +		if (page)
> +			goto got_pg;
>  	}
>  
>  	/* Caller is not willing to reclaim, we can't balance anything */
>  	if (!can_direct_reclaim) {
>  		/*
> -		 * All existing users of the deprecated __GFP_NOFAIL are
> -		 * blockable, so warn of any new users that actually allow this
> -		 * type of allocation to fail.
> +		 * All existing users of the __GFP_NOFAIL are blockable, so warn
> +		 * of any new users that actually allow this type of allocation
> +		 * to fail.
>  		 */
>  		WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL);
>  		goto nopage;
>  	}
>  
>  	/* Avoid recursion of direct reclaim */
> -	if (current->flags & PF_MEMALLOC)
> +	if (current->flags & PF_MEMALLOC) {
> +		/*
> +		 * __GFP_NOFAIL request from this context is rather bizarre
> +		 * because we cannot reclaim anything and only can loop waiting
> +		 * for somebody to do a work for us.
> +		 */
> +		if (WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
> +			cond_resched();
> +			goto retry;
> +		}
>  		goto nopage;
> +	}
>  
>  	/* Avoid allocations with no watermarks from looping endlessly */
>  	if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
> -- 
> 2.6.2
> 

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2015-11-23  9:33 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-16 13:22 [PATCH 0/2] get rid of __alloc_pages_high_priority mhocko
2015-11-16 13:22 ` mhocko
2015-11-16 13:22 ` [PATCH 1/2] mm: " mhocko
2015-11-16 13:22   ` mhocko
2015-11-16 18:43   ` Mel Gorman
2015-11-16 18:43     ` Mel Gorman
2015-11-16 21:14   ` David Rientjes
2015-11-16 21:14     ` David Rientjes
2015-11-18 14:48   ` Vlastimil Babka
2015-11-18 14:48     ` Vlastimil Babka
2015-11-16 13:22 ` [PATCH 2/2] mm: do not loop over ALLOC_NO_WATERMARKS without triggering reclaim mhocko
2015-11-16 13:22   ` mhocko
2015-11-16 21:18   ` David Rientjes
2015-11-16 21:18     ` David Rientjes
2015-11-17 10:58   ` Tetsuo Handa
2015-11-17 10:58     ` Tetsuo Handa
2015-11-18  9:11     ` Michal Hocko
2015-11-18  9:11       ` Michal Hocko
2015-11-18  9:22       ` Michal Hocko
2015-11-18  9:22         ` Michal Hocko
2015-11-18 14:57   ` Vlastimil Babka
2015-11-18 14:57     ` Vlastimil Babka
2015-11-18 15:11     ` Michal Hocko
2015-11-18 15:11       ` Michal Hocko
2015-11-18 15:19       ` Vlastimil Babka
2015-11-18 15:19         ` Vlastimil Babka
2015-11-23  9:33   ` Michal Hocko
2015-11-23  9:33     ` Michal Hocko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.