All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] mm/page_alloc: avoid high-order page allocation warn with __GFP_NOFAIL
@ 2023-03-05  5:30 Gao Xiang
  2023-03-06  7:51 ` Michal Hocko
  0 siblings, 1 reply; 9+ messages in thread
From: Gao Xiang @ 2023-03-05  5:30 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: linux-kernel, Gao Xiang, Mel Gorman, Michal Hocko, Vlastimil Babka

My knowledge of this is somewhat limited, however, since vmalloc already
supported __GFP_NOFAIL in commit 9376130c390a ("mm/vmalloc: add
support for __GFP_NOFAIL").  __GFP_NOFAIL could trigger the following
stack and allocate high-order pages when CONFIG_HAVE_ARCH_HUGE_VMALLOC
is enabled:

 __alloc_pages+0x1cb/0x5b0 mm/page_alloc.c:5549
 alloc_pages+0x1aa/0x270 mm/mempolicy.c:2286
 vm_area_alloc_pages mm/vmalloc.c:2989 [inline]
 __vmalloc_area_node mm/vmalloc.c:3057 [inline]
 __vmalloc_node_range+0x978/0x13c0 mm/vmalloc.c:3227
 kvmalloc_node+0x156/0x1a0 mm/util.c:606
 kvmalloc include/linux/slab.h:737 [inline]
 kvmalloc_array include/linux/slab.h:755 [inline]
 kvcalloc include/linux/slab.h:760 [inline]
 (codebase: Linux 6.2-rc2)

Don't warn such cases since high-order pages with __GFP_NOFAIL is
somewhat legel.

Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
 mm/page_alloc.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0745aedebb37..0618716c49df 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3822,12 +3822,6 @@ struct page *rmqueue(struct zone *preferred_zone,
 {
 	struct page *page;
 
-	/*
-	 * We most definitely don't want callers attempting to
-	 * allocate greater than order-1 page units with __GFP_NOFAIL.
-	 */
-	WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));
-
 	if (likely(pcp_allowed_order(order))) {
 		/*
 		 * MIGRATE_MOVABLE pcplist could have the pages on CMA area and
-- 
2.24.4


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm/page_alloc: avoid high-order page allocation warn with __GFP_NOFAIL
  2023-03-05  5:30 [PATCH] mm/page_alloc: avoid high-order page allocation warn with __GFP_NOFAIL Gao Xiang
@ 2023-03-06  7:51 ` Michal Hocko
  2023-03-06  8:03   ` Gao Xiang
  2023-03-06 12:14   ` Uladzislau Rezki
  0 siblings, 2 replies; 9+ messages in thread
From: Michal Hocko @ 2023-03-06  7:51 UTC (permalink / raw)
  To: Gao Xiang
  Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman,
	Vlastimil Babka, Baoquan He, Christoph Hellwig, Uladzislau Rezki

[Cc couple of more people recently involved with vmalloc code]

On Sun 05-03-23 13:30:35, Gao Xiang wrote:
> My knowledge of this is somewhat limited, however, since vmalloc already
> supported __GFP_NOFAIL in commit 9376130c390a ("mm/vmalloc: add
> support for __GFP_NOFAIL").  __GFP_NOFAIL could trigger the following
> stack and allocate high-order pages when CONFIG_HAVE_ARCH_HUGE_VMALLOC
> is enabled:
> 
>  __alloc_pages+0x1cb/0x5b0 mm/page_alloc.c:5549
>  alloc_pages+0x1aa/0x270 mm/mempolicy.c:2286
>  vm_area_alloc_pages mm/vmalloc.c:2989 [inline]
>
>  __vmalloc_area_node mm/vmalloc.c:3057 [inline]
>  __vmalloc_node_range+0x978/0x13c0 mm/vmalloc.c:3227
>  kvmalloc_node+0x156/0x1a0 mm/util.c:606
>  kvmalloc include/linux/slab.h:737 [inline]
>  kvmalloc_array include/linux/slab.h:755 [inline]
>  kvcalloc include/linux/slab.h:760 [inline]
>  (codebase: Linux 6.2-rc2)
> 
> Don't warn such cases since high-order pages with __GFP_NOFAIL is
> somewhat legel.

OK, this is definitely a bug and it seems my 9376130c390a was
incomplete because it hasn't covered the high order case. Not sure how
that happened but removing the warning is not the right thing to do
here. The higher order allocation is an optimization rather than a must.
So it is perfectly fine to fail that allocation and retry rather than
go into a very expensive and potentially impossible higher order
allocation that must not fail.

The proper fix should look like this unless I am missing something. I
would appreciate another pair of eyes on this because I am not fully
familiar with the high order optimization part much.

Thanks!
--- 
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index ef910bf349e1..a8aa2765618a 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2883,6 +2883,8 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
 		unsigned int order, unsigned int nr_pages, struct page **pages)
 {
 	unsigned int nr_allocated = 0;
+	gfp_t alloc_gfp = gfp;
+	bool nofail = false;
 	struct page *page;
 	int i;
 
@@ -2931,20 +2933,30 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
 			if (nr != nr_pages_request)
 				break;
 		}
+	} else {
+		alloc_gfp &= ~__GFP_NOFAIL;
+		nofail = true;
 	}
 
 	/* High-order pages or fallback path if "bulk" fails. */
-
 	while (nr_allocated < nr_pages) {
 		if (fatal_signal_pending(current))
 			break;
 
 		if (nid == NUMA_NO_NODE)
-			page = alloc_pages(gfp, order);
+			page = alloc_pages(alloc_gfp, order);
 		else
-			page = alloc_pages_node(nid, gfp, order);
-		if (unlikely(!page))
-			break;
+			page = alloc_pages_node(nid, alloc_gfp, order);
+		if (unlikely(!page)) {
+			if (!nofail)
+				break;
+
+			/* fall back to the zero order allocations */
+			alloc_gfp |= __GFP_NOFAIL;
+			order = 0;
+			continue;
+		}
+
 		/*
 		 * Higher order allocations must be able to be treated as
 		 * indepdenent small pages by callers (as they can with
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm/page_alloc: avoid high-order page allocation warn with __GFP_NOFAIL
  2023-03-06  7:51 ` Michal Hocko
@ 2023-03-06  8:03   ` Gao Xiang
  2023-03-06 12:14   ` Uladzislau Rezki
  1 sibling, 0 replies; 9+ messages in thread
From: Gao Xiang @ 2023-03-06  8:03 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman,
	Vlastimil Babka, Baoquan He, Christoph Hellwig, Uladzislau Rezki



On 2023/3/6 15:51, Michal Hocko wrote:
> [Cc couple of more people recently involved with vmalloc code]
> 
> On Sun 05-03-23 13:30:35, Gao Xiang wrote:
>> My knowledge of this is somewhat limited, however, since vmalloc already
>> supported __GFP_NOFAIL in commit 9376130c390a ("mm/vmalloc: add
>> support for __GFP_NOFAIL").  __GFP_NOFAIL could trigger the following
>> stack and allocate high-order pages when CONFIG_HAVE_ARCH_HUGE_VMALLOC
>> is enabled:
>>
>>   __alloc_pages+0x1cb/0x5b0 mm/page_alloc.c:5549
>>   alloc_pages+0x1aa/0x270 mm/mempolicy.c:2286
>>   vm_area_alloc_pages mm/vmalloc.c:2989 [inline]
>>
>>   __vmalloc_area_node mm/vmalloc.c:3057 [inline]
>>   __vmalloc_node_range+0x978/0x13c0 mm/vmalloc.c:3227
>>   kvmalloc_node+0x156/0x1a0 mm/util.c:606
>>   kvmalloc include/linux/slab.h:737 [inline]
>>   kvmalloc_array include/linux/slab.h:755 [inline]
>>   kvcalloc include/linux/slab.h:760 [inline]
>>   (codebase: Linux 6.2-rc2)
>>
>> Don't warn such cases since high-order pages with __GFP_NOFAIL is
>> somewhat legel.
> 
> OK, this is definitely a bug and it seems my 9376130c390a was
> incomplete because it hasn't covered the high order case. Not sure how
> that happened but removing the warning is not the right thing to do
> here. The higher order allocation is an optimization rather than a must.
> So it is perfectly fine to fail that allocation and retry rather than
> go into a very expensive and potentially impossible higher order
> allocation that must not fail.
> 
> The proper fix should look like this unless I am missing something. I
> would appreciate another pair of eyes on this because I am not fully
> familiar with the high order optimization part much.

I'm fine with the fix. Although I'm not familiar with such vmalloc
allocation, I thought about this possibility as well.

The original issue was:
https://lore.kernel.org/r/0000000000007796bd05f1852ec2@google.com

which I used kvcalloc with __GFP_NOFAIL but it warned, and I made
a fix (which now seems wrong) to use kcalloc() but it now warns
the same:
https://lore.kernel.org/r/00000000000072eb6505f376dd4b@google.com

And I then realized it's a bug in kvmalloc() with __GFP_NOFAIL...

Thanks,
Gao Xiang

> 
> Thanks!
> ---
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index ef910bf349e1..a8aa2765618a 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2883,6 +2883,8 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
>   		unsigned int order, unsigned int nr_pages, struct page **pages)
>   {
>   	unsigned int nr_allocated = 0;
> +	gfp_t alloc_gfp = gfp;
> +	bool nofail = false;
>   	struct page *page;
>   	int i;
>   
> @@ -2931,20 +2933,30 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
>   			if (nr != nr_pages_request)
>   				break;
>   		}
> +	} else {
> +		alloc_gfp &= ~__GFP_NOFAIL;
> +		nofail = true;
>   	}
>   
>   	/* High-order pages or fallback path if "bulk" fails. */
> -
>   	while (nr_allocated < nr_pages) {
>   		if (fatal_signal_pending(current))
>   			break;
>   
>   		if (nid == NUMA_NO_NODE)
> -			page = alloc_pages(gfp, order);
> +			page = alloc_pages(alloc_gfp, order);
>   		else
> -			page = alloc_pages_node(nid, gfp, order);
> -		if (unlikely(!page))
> -			break;
> +			page = alloc_pages_node(nid, alloc_gfp, order);
> +		if (unlikely(!page)) {
> +			if (!nofail)
> +				break;
> +
> +			/* fall back to the zero order allocations */
> +			alloc_gfp |= __GFP_NOFAIL;
> +			order = 0;
> +			continue;
> +		}
> +
>   		/*
>   		 * Higher order allocations must be able to be treated as
>   		 * indepdenent small pages by callers (as they can with

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm/page_alloc: avoid high-order page allocation warn with __GFP_NOFAIL
  2023-03-06  7:51 ` Michal Hocko
  2023-03-06  8:03   ` Gao Xiang
@ 2023-03-06 12:14   ` Uladzislau Rezki
  2023-03-06 14:03     ` [PATCH] mm, vmalloc: fix high order __GFP_NOFAIL allocations Michal Hocko
  1 sibling, 1 reply; 9+ messages in thread
From: Uladzislau Rezki @ 2023-03-06 12:14 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Gao Xiang, Andrew Morton, linux-mm, linux-kernel, Mel Gorman,
	Vlastimil Babka, Baoquan He, Christoph Hellwig, Uladzislau Rezki

On Mon, Mar 06, 2023 at 08:51:40AM +0100, Michal Hocko wrote:
> [Cc couple of more people recently involved with vmalloc code]
> 
> On Sun 05-03-23 13:30:35, Gao Xiang wrote:
> > My knowledge of this is somewhat limited, however, since vmalloc already
> > supported __GFP_NOFAIL in commit 9376130c390a ("mm/vmalloc: add
> > support for __GFP_NOFAIL").  __GFP_NOFAIL could trigger the following
> > stack and allocate high-order pages when CONFIG_HAVE_ARCH_HUGE_VMALLOC
> > is enabled:
> > 
> >  __alloc_pages+0x1cb/0x5b0 mm/page_alloc.c:5549
> >  alloc_pages+0x1aa/0x270 mm/mempolicy.c:2286
> >  vm_area_alloc_pages mm/vmalloc.c:2989 [inline]
> >
> >  __vmalloc_area_node mm/vmalloc.c:3057 [inline]
> >  __vmalloc_node_range+0x978/0x13c0 mm/vmalloc.c:3227
> >  kvmalloc_node+0x156/0x1a0 mm/util.c:606
> >  kvmalloc include/linux/slab.h:737 [inline]
> >  kvmalloc_array include/linux/slab.h:755 [inline]
> >  kvcalloc include/linux/slab.h:760 [inline]
> >  (codebase: Linux 6.2-rc2)
> > 
> > Don't warn such cases since high-order pages with __GFP_NOFAIL is
> > somewhat legel.
> 
> OK, this is definitely a bug and it seems my 9376130c390a was
> incomplete because it hasn't covered the high order case. Not sure how
> that happened but removing the warning is not the right thing to do
> here. The higher order allocation is an optimization rather than a must.
> So it is perfectly fine to fail that allocation and retry rather than
> go into a very expensive and potentially impossible higher order
> allocation that must not fail.
>
> 
> The proper fix should look like this unless I am missing something. I
> would appreciate another pair of eyes on this because I am not fully
> familiar with the high order optimization part much.
> 
> Thanks!
> --- 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index ef910bf349e1..a8aa2765618a 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2883,6 +2883,8 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
>  		unsigned int order, unsigned int nr_pages, struct page **pages)
>  {
>  	unsigned int nr_allocated = 0;
> +	gfp_t alloc_gfp = gfp;
> +	bool nofail = false;
>  	struct page *page;
>  	int i;
>  
> @@ -2931,20 +2933,30 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
>  			if (nr != nr_pages_request)
>  				break;
>  		}
> +	} else {
> +		alloc_gfp &= ~__GFP_NOFAIL;
> +		nofail = true;
>  	}
>  
>  	/* High-order pages or fallback path if "bulk" fails. */
> -
>  	while (nr_allocated < nr_pages) {
>  		if (fatal_signal_pending(current))
>  			break;
>  
>  		if (nid == NUMA_NO_NODE)
> -			page = alloc_pages(gfp, order);
> +			page = alloc_pages(alloc_gfp, order);
>  		else
> -			page = alloc_pages_node(nid, gfp, order);
> -		if (unlikely(!page))
> -			break;
> +			page = alloc_pages_node(nid, alloc_gfp, order);
> +		if (unlikely(!page)) {
> +			if (!nofail)
> +				break;
> +
> +			/* fall back to the zero order allocations */
> +			alloc_gfp |= __GFP_NOFAIL;
> +			order = 0;
> +			continue;
> +		}
> +
>  		/*
>  		 * Higher order allocations must be able to be treated as
>  		 * indepdenent small pages by callers (as they can with

Some questions:

1. Could you please add a comment why you want the bulk_gfp without the __GFP_NOFAIL(bulk path)?
2. Could you please add a comment why a high order pages do not want __GFP_NOFAIL? You have already explained.
3. Looking at the patch:

<snip>
+       } else {
+               alloc_gfp &= ~__GFP_NOFAIL;
+               nofail = true;
<snip>

if user does not want to go with __GFP_NOFAIL flag why you force it in
case a high order allocation fails and you switch to 0 order allocations? 
(for high order-pages scenario you always use __GFP_NOFAIL in the order-0 recovery path).

Thanks!

--
Uladzislau Rezki

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH] mm, vmalloc: fix high order __GFP_NOFAIL allocations
  2023-03-06 12:14   ` Uladzislau Rezki
@ 2023-03-06 14:03     ` Michal Hocko
  2023-03-06 16:37       ` Uladzislau Rezki
                         ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Michal Hocko @ 2023-03-06 14:03 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Gao Xiang, Andrew Morton, linux-mm, linux-kernel, Mel Gorman,
	Vlastimil Babka, Baoquan He, Christoph Hellwig

On Mon 06-03-23 13:14:43, Uladzislau Rezki wrote:
[...]
> Some questions:
> 
> 1. Could you please add a comment why you want the bulk_gfp without
> the __GFP_NOFAIL(bulk path)?

The bulk allocator is not documented to fully support __GFP_NOFAIL
semantic IIRC. While it uses alloc_pages as fallback I didn't want
to make any assumptions based on the current implementation. At least
that is my recollection. If we do want to support NOFAIL by the batch
allocator then we can drop the special casing here.

> 2. Could you please add a comment why a high order pages do not want
> __GFP_NOFAIL? You have already explained.

See below
> 3. Looking at the patch:
> 
> <snip>
> +       } else {
> +               alloc_gfp &= ~__GFP_NOFAIL;
> +               nofail = true;
> <snip>
> 
> if user does not want to go with __GFP_NOFAIL flag why you force it in
> case a high order allocation fails and you switch to 0 order allocations? 

Not intended. The above should have been else if (gfp & __GFP_NOFAIL).
Thanks for catching that!

This would be the full patch with the description:
--- 
From 3ccfaa15bf2587b8998c129533a0404fedf5a484 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Mon, 6 Mar 2023 09:15:17 +0100
Subject: [PATCH] mm, vmalloc: fix high order __GFP_NOFAIL allocations

Gao Xiang has reported that the page allocator complains about high
order __GFP_NOFAIL request coming from the vmalloc core:

 __alloc_pages+0x1cb/0x5b0 mm/page_alloc.c:5549
 alloc_pages+0x1aa/0x270 mm/mempolicy.c:2286
 vm_area_alloc_pages mm/vmalloc.c:2989 [inline]
 __vmalloc_area_node mm/vmalloc.c:3057 [inline]
 __vmalloc_node_range+0x978/0x13c0 mm/vmalloc.c:3227
 kvmalloc_node+0x156/0x1a0 mm/util.c:606
 kvmalloc include/linux/slab.h:737 [inline]
 kvmalloc_array include/linux/slab.h:755 [inline]
 kvcalloc include/linux/slab.h:760 [inline]

it seems that I have completely missed high order allocation backing
vmalloc areas case when implementing __GFP_NOFAIL support. This means
that [k]vmalloc at al. can allocate higher order allocations with
__GFP_NOFAIL which can trigger OOM killer for non-costly orders easily
or cause a lot of reclaim/compaction activity if those requests cannot
be satisfied.

Fix the issue by falling back to zero order allocations for __GFP_NOFAIL
requests if the high order request fails.

Fixes: 9376130c390a ("mm/vmalloc: add support for __GFP_NOFAIL")
Reported-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/vmalloc.c | 28 +++++++++++++++++++++++-----
 1 file changed, 23 insertions(+), 5 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index ef910bf349e1..bef6cf2b4d46 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2883,6 +2883,8 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
 		unsigned int order, unsigned int nr_pages, struct page **pages)
 {
 	unsigned int nr_allocated = 0;
+	gfp_t alloc_gfp = gfp;
+	bool nofail = false;
 	struct page *page;
 	int i;
 
@@ -2893,6 +2895,7 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
 	 * more permissive.
 	 */
 	if (!order) {
+		/* bulk allocator doesn't support nofail req. officially */
 		gfp_t bulk_gfp = gfp & ~__GFP_NOFAIL;
 
 		while (nr_allocated < nr_pages) {
@@ -2931,20 +2934,35 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
 			if (nr != nr_pages_request)
 				break;
 		}
+	} else if (gfp & __GFP_NOFAIL) {
+		/*
+		 * Higher order nofail allocations are really expensive and
+		 * potentially dangerous (pre-mature OOM, disruptive reclaim
+		 * and compaction etc.
+		 */
+		alloc_gfp &= ~__GFP_NOFAIL;
+		nofail = true;
 	}
 
 	/* High-order pages or fallback path if "bulk" fails. */
-
 	while (nr_allocated < nr_pages) {
 		if (fatal_signal_pending(current))
 			break;
 
 		if (nid == NUMA_NO_NODE)
-			page = alloc_pages(gfp, order);
+			page = alloc_pages(alloc_gfp, order);
 		else
-			page = alloc_pages_node(nid, gfp, order);
-		if (unlikely(!page))
-			break;
+			page = alloc_pages_node(nid, alloc_gfp, order);
+		if (unlikely(!page)) {
+			if (!nofail)
+				break;
+
+			/* fall back to the zero order allocations */
+			alloc_gfp |= __GFP_NOFAIL;
+			order = 0;
+			continue;
+		}
+
 		/*
 		 * Higher order allocations must be able to be treated as
 		 * indepdenent small pages by callers (as they can with
-- 
2.30.2

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm, vmalloc: fix high order __GFP_NOFAIL allocations
  2023-03-06 14:03     ` [PATCH] mm, vmalloc: fix high order __GFP_NOFAIL allocations Michal Hocko
@ 2023-03-06 16:37       ` Uladzislau Rezki
  2023-03-06 17:29       ` Vlastimil Babka
  2023-03-07  0:58       ` Baoquan He
  2 siblings, 0 replies; 9+ messages in thread
From: Uladzislau Rezki @ 2023-03-06 16:37 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Uladzislau Rezki, Gao Xiang, Andrew Morton, linux-mm,
	linux-kernel, Mel Gorman, Vlastimil Babka, Baoquan He,
	Christoph Hellwig

On Mon, Mar 06, 2023 at 03:03:10PM +0100, Michal Hocko wrote:
> On Mon 06-03-23 13:14:43, Uladzislau Rezki wrote:
> [...]
> > Some questions:
> > 
> > 1. Could you please add a comment why you want the bulk_gfp without
> > the __GFP_NOFAIL(bulk path)?
> 
> The bulk allocator is not documented to fully support __GFP_NOFAIL
> semantic IIRC. While it uses alloc_pages as fallback I didn't want
> to make any assumptions based on the current implementation. At least
> that is my recollection. If we do want to support NOFAIL by the batch
> allocator then we can drop the special casing here.
> 
> > 2. Could you please add a comment why a high order pages do not want
> > __GFP_NOFAIL? You have already explained.
> 
> See below
> > 3. Looking at the patch:
> > 
> > <snip>
> > +       } else {
> > +               alloc_gfp &= ~__GFP_NOFAIL;
> > +               nofail = true;
> > <snip>
> > 
> > if user does not want to go with __GFP_NOFAIL flag why you force it in
> > case a high order allocation fails and you switch to 0 order allocations? 
> 
> Not intended. The above should have been else if (gfp & __GFP_NOFAIL).
> Thanks for catching that!
> 
> This would be the full patch with the description:
> --- 
> From 3ccfaa15bf2587b8998c129533a0404fedf5a484 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Mon, 6 Mar 2023 09:15:17 +0100
> Subject: [PATCH] mm, vmalloc: fix high order __GFP_NOFAIL allocations
> 
> Gao Xiang has reported that the page allocator complains about high
> order __GFP_NOFAIL request coming from the vmalloc core:
> 
>  __alloc_pages+0x1cb/0x5b0 mm/page_alloc.c:5549
>  alloc_pages+0x1aa/0x270 mm/mempolicy.c:2286
>  vm_area_alloc_pages mm/vmalloc.c:2989 [inline]
>  __vmalloc_area_node mm/vmalloc.c:3057 [inline]
>  __vmalloc_node_range+0x978/0x13c0 mm/vmalloc.c:3227
>  kvmalloc_node+0x156/0x1a0 mm/util.c:606
>  kvmalloc include/linux/slab.h:737 [inline]
>  kvmalloc_array include/linux/slab.h:755 [inline]
>  kvcalloc include/linux/slab.h:760 [inline]
> 
> it seems that I have completely missed high order allocation backing
> vmalloc areas case when implementing __GFP_NOFAIL support. This means
> that [k]vmalloc at al. can allocate higher order allocations with
> __GFP_NOFAIL which can trigger OOM killer for non-costly orders easily
> or cause a lot of reclaim/compaction activity if those requests cannot
> be satisfied.
> 
> Fix the issue by falling back to zero order allocations for __GFP_NOFAIL
> requests if the high order request fails.
> 
> Fixes: 9376130c390a ("mm/vmalloc: add support for __GFP_NOFAIL")
> Reported-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/vmalloc.c | 28 +++++++++++++++++++++++-----
>  1 file changed, 23 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index ef910bf349e1..bef6cf2b4d46 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2883,6 +2883,8 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
>  		unsigned int order, unsigned int nr_pages, struct page **pages)
>  {
>  	unsigned int nr_allocated = 0;
> +	gfp_t alloc_gfp = gfp;
> +	bool nofail = false;
>  	struct page *page;
>  	int i;
>  
> @@ -2893,6 +2895,7 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
>  	 * more permissive.
>  	 */
>  	if (!order) {
> +		/* bulk allocator doesn't support nofail req. officially */
>  		gfp_t bulk_gfp = gfp & ~__GFP_NOFAIL;
>  
>  		while (nr_allocated < nr_pages) {
> @@ -2931,20 +2934,35 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
>  			if (nr != nr_pages_request)
>  				break;
>  		}
> +	} else if (gfp & __GFP_NOFAIL) {
> +		/*
> +		 * Higher order nofail allocations are really expensive and
> +		 * potentially dangerous (pre-mature OOM, disruptive reclaim
> +		 * and compaction etc.
> +		 */
> +		alloc_gfp &= ~__GFP_NOFAIL;
> +		nofail = true;
>  	}
>  
>  	/* High-order pages or fallback path if "bulk" fails. */
> -
>  	while (nr_allocated < nr_pages) {
>  		if (fatal_signal_pending(current))
>  			break;
>  
>  		if (nid == NUMA_NO_NODE)
> -			page = alloc_pages(gfp, order);
> +			page = alloc_pages(alloc_gfp, order);
>  		else
> -			page = alloc_pages_node(nid, gfp, order);
> -		if (unlikely(!page))
> -			break;
> +			page = alloc_pages_node(nid, alloc_gfp, order);
> +		if (unlikely(!page)) {
> +			if (!nofail)
> +				break;
> +
> +			/* fall back to the zero order allocations */
> +			alloc_gfp |= __GFP_NOFAIL;
> +			order = 0;
> +			continue;
> +		}
> +
>  		/*
>  		 * Higher order allocations must be able to be treated as
>  		 * indepdenent small pages by callers (as they can with
> -- 
> 2.30.2
> 
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

--
Uladzislau Rezki

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm, vmalloc: fix high order __GFP_NOFAIL allocations
  2023-03-06 14:03     ` [PATCH] mm, vmalloc: fix high order __GFP_NOFAIL allocations Michal Hocko
  2023-03-06 16:37       ` Uladzislau Rezki
@ 2023-03-06 17:29       ` Vlastimil Babka
  2023-03-06 17:38         ` Michal Hocko
  2023-03-07  0:58       ` Baoquan He
  2 siblings, 1 reply; 9+ messages in thread
From: Vlastimil Babka @ 2023-03-06 17:29 UTC (permalink / raw)
  To: Michal Hocko, Uladzislau Rezki
  Cc: Gao Xiang, Andrew Morton, linux-mm, linux-kernel, Mel Gorman,
	Baoquan He, Christoph Hellwig

On 3/6/23 15:03, Michal Hocko wrote:

> --- 
> From 3ccfaa15bf2587b8998c129533a0404fedf5a484 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Mon, 6 Mar 2023 09:15:17 +0100
> Subject: [PATCH] mm, vmalloc: fix high order __GFP_NOFAIL allocations
> 
> Gao Xiang has reported that the page allocator complains about high
> order __GFP_NOFAIL request coming from the vmalloc core:
> 
>  __alloc_pages+0x1cb/0x5b0 mm/page_alloc.c:5549
>  alloc_pages+0x1aa/0x270 mm/mempolicy.c:2286
>  vm_area_alloc_pages mm/vmalloc.c:2989 [inline]
>  __vmalloc_area_node mm/vmalloc.c:3057 [inline]
>  __vmalloc_node_range+0x978/0x13c0 mm/vmalloc.c:3227
>  kvmalloc_node+0x156/0x1a0 mm/util.c:606
>  kvmalloc include/linux/slab.h:737 [inline]
>  kvmalloc_array include/linux/slab.h:755 [inline]
>  kvcalloc include/linux/slab.h:760 [inline]
> 
> it seems that I have completely missed high order allocation backing
> vmalloc areas case when implementing __GFP_NOFAIL support. This means
> that [k]vmalloc at al. can allocate higher order allocations with
> __GFP_NOFAIL which can trigger OOM killer for non-costly orders easily
> or cause a lot of reclaim/compaction activity if those requests cannot
> be satisfied.
> 
> Fix the issue by falling back to zero order allocations for __GFP_NOFAIL
> requests if the high order request fails.
> 
> Fixes: 9376130c390a ("mm/vmalloc: add support for __GFP_NOFAIL")
> Reported-by: Gao Xiang <hsiangkao@linux.alibaba.com>
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/vmalloc.c | 28 +++++++++++++++++++++++-----
>  1 file changed, 23 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index ef910bf349e1..bef6cf2b4d46 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2883,6 +2883,8 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
>  		unsigned int order, unsigned int nr_pages, struct page **pages)
>  {
>  	unsigned int nr_allocated = 0;
> +	gfp_t alloc_gfp = gfp;
> +	bool nofail = false;
>  	struct page *page;
>  	int i;
>  
> @@ -2893,6 +2895,7 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
>  	 * more permissive.
>  	 */
>  	if (!order) {
> +		/* bulk allocator doesn't support nofail req. officially */
>  		gfp_t bulk_gfp = gfp & ~__GFP_NOFAIL;
>  
>  		while (nr_allocated < nr_pages) {
> @@ -2931,20 +2934,35 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
>  			if (nr != nr_pages_request)
>  				break;
>  		}
> +	} else if (gfp & __GFP_NOFAIL) {
> +		/*
> +		 * Higher order nofail allocations are really expensive and
> +		 * potentially dangerous (pre-mature OOM, disruptive reclaim
> +		 * and compaction etc.

				      ^ unclosed parenthesis

> +		 */
> +		alloc_gfp &= ~__GFP_NOFAIL;
> +		nofail = true;
>  	}
>  
>  	/* High-order pages or fallback path if "bulk" fails. */
> -
>  	while (nr_allocated < nr_pages) {
>  		if (fatal_signal_pending(current))
>  			break;
>  
>  		if (nid == NUMA_NO_NODE)
> -			page = alloc_pages(gfp, order);
> +			page = alloc_pages(alloc_gfp, order);
>  		else
> -			page = alloc_pages_node(nid, gfp, order);
> -		if (unlikely(!page))
> -			break;
> +			page = alloc_pages_node(nid, alloc_gfp, order);
> +		if (unlikely(!page)) {
> +			if (!nofail)
> +				break;
> +
> +			/* fall back to the zero order allocations */
> +			alloc_gfp |= __GFP_NOFAIL;
> +			order = 0;
> +			continue;
> +		}
> +
>  		/*
>  		 * Higher order allocations must be able to be treated as
>  		 * indepdenent small pages by callers (as they can with

		   ^ while at it the typo could also be fixed

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm, vmalloc: fix high order __GFP_NOFAIL allocations
  2023-03-06 17:29       ` Vlastimil Babka
@ 2023-03-06 17:38         ` Michal Hocko
  0 siblings, 0 replies; 9+ messages in thread
From: Michal Hocko @ 2023-03-06 17:38 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Uladzislau Rezki, Gao Xiang, Andrew Morton, linux-mm,
	linux-kernel, Mel Gorman, Baoquan He, Christoph Hellwig

Thanks. Here is an incremental diff
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index bef6cf2b4d46..b01295672a31 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2938,7 +2938,7 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
 		/*
 		 * Higher order nofail allocations are really expensive and
 		 * potentially dangerous (pre-mature OOM, disruptive reclaim
-		 * and compaction etc.
+		 * and compaction etc).
 		 */
 		alloc_gfp &= ~__GFP_NOFAIL;
 		nofail = true;
@@ -2965,7 +2965,7 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
 
 		/*
 		 * Higher order allocations must be able to be treated as
-		 * indepdenent small pages by callers (as they can with
+		 * independent small pages by callers (as they can with
 		 * small-page vmallocs). Some drivers do their own refcounting
 		 * on vmalloc_to_page() pages, some use page->mapping,
 		 * page->lru, etc.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm, vmalloc: fix high order __GFP_NOFAIL allocations
  2023-03-06 14:03     ` [PATCH] mm, vmalloc: fix high order __GFP_NOFAIL allocations Michal Hocko
  2023-03-06 16:37       ` Uladzislau Rezki
  2023-03-06 17:29       ` Vlastimil Babka
@ 2023-03-07  0:58       ` Baoquan He
  2 siblings, 0 replies; 9+ messages in thread
From: Baoquan He @ 2023-03-07  0:58 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Uladzislau Rezki, Gao Xiang, Andrew Morton, linux-mm,
	linux-kernel, Mel Gorman, Vlastimil Babka, Christoph Hellwig

n 03/06/23 at 03:03pm, Michal Hocko wrote:
 On Mon 06-03-23 13:14:43, Uladzislau Rezki wrote:
 [...]
 > Some questions:
 > 
 > 1. Could you please add a comment why you want the bulk_gfp without
 > the __GFP_NOFAIL(bulk path)?
 
 The bulk allocator is not documented to fully support __GFP_NOFAIL
 semantic IIRC. While it uses alloc_pages as fallback I didn't want
 to make any assumptions based on the current implementation. At least
 that is my recollection. If we do want to support NOFAIL by the batch
 allocator then we can drop the special casing here.
 
 > 2. Could you please add a comment why a high order pages do not want
 > __GFP_NOFAIL? You have already explained.
 
 See below
 > 3. Looking at the patch:
 > 
 > <snip>
 > +       } else {
 > +               alloc_gfp &= ~__GFP_NOFAIL;
 > +               nofail = true;
 > <snip>
 > 
 > if user does not want to go with __GFP_NOFAIL flag why you force it in
 > case a high order allocation fails and you switch to 0 order allocations? 
 
 Not intended. The above should have been else if (gfp & __GFP_NOFAIL).
 Thanks for catching that!
 
 This would be the full patch with the description:
 --- 
 From 3ccfaa15bf2587b8998c129533a0404fedf5a484 Mon Sep 17 00:00:00 2001
 From: Michal Hocko <mhocko@suse.com>
 Date: Mon, 6 Mar 2023 09:15:17 +0100
 Subject: [PATCH] mm, vmalloc: fix high order __GFP_NOFAIL allocations
 
 Gao Xiang has reported that the page allocator complains about high
 order __GFP_NOFAIL request coming from the vmalloc core:
 
  __alloc_pages+0x1cb/0x5b0 mm/page_alloc.c:5549
  alloc_pages+0x1aa/0x270 mm/mempolicy.c:2286
  vm_area_alloc_pages mm/vmalloc.c:2989 [inline]
  __vmalloc_area_node mm/vmalloc.c:3057 [inline]
  __vmalloc_node_range+0x978/0x13c0 mm/vmalloc.c:3227
  kvmalloc_node+0x156/0x1a0 mm/util.c:606
  kvmalloc include/linux/slab.h:737 [inline]
  kvmalloc_array include/linux/slab.h:755 [inline]
  kvcalloc include/linux/slab.h:760 [inline]
 
 it seems that I have completely missed high order allocation backing
 vmalloc areas case when implementing __GFP_NOFAIL support. This means
 that [k]vmalloc at al. can allocate higher order allocations with
 __GFP_NOFAIL which can trigger OOM killer for non-costly orders easily
 or cause a lot of reclaim/compaction activity if those requests cannot
 be satisfied.
 
 Fix the issue by falling back to zero order allocations for __GFP_NOFAIL
 requests if the high order request fails.
 
 Fixes: 9376130c390a ("mm/vmalloc: add support for __GFP_NOFAIL")
 Reported-by: Gao Xiang <hsiangkao@linux.alibaba.com>
 Signed-off-by: Michal Hocko <mhocko@suse.com>
 ---
  mm/vmalloc.c | 28 +++++++++++++++++++++++-----
  1 file changed, 23 insertions(+), 5 deletions(-)
 
 diff --git a/mm/vmalloc.c b/mm/vmalloc.c
 index ef910bf349e1..bef6cf2b4d46 100644
 --- a/mm/vmalloc.c
 +++ b/mm/vmalloc.c
 @@ -2883,6 +2883,8 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
  		unsigned int order, unsigned int nr_pages, struct page **pages)
  {
  	unsigned int nr_allocated = 0;
 +	gfp_t alloc_gfp = gfp;
 +	bool nofail = false;
  	struct page *page;
  	int i;
  
 @@ -2893,6 +2895,7 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
  	 * more permissive.
  	 */
  	if (!order) {
 +		/* bulk allocator doesn't support nofail req. officially */
  		gfp_t bulk_gfp = gfp & ~__GFP_NOFAIL;
  
  		while (nr_allocated < nr_pages) {
 @@ -2931,20 +2934,35 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
  			if (nr != nr_pages_request)
  				break;
  		}
 +	} else if (gfp & __GFP_NOFAIL) {
 +		/*
 +		 * Higher order nofail allocations are really expensive and
 +		 * potentially dangerous (pre-mature OOM, disruptive reclaim
 +		 * and compaction etc.
 +		 */
 +		alloc_gfp &= ~__GFP_NOFAIL;
 +		nofail = true;
  	}
  
  	/* High-order pages or fallback path if "bulk" fails. */
 -
  	while (nr_allocated < nr_pages) {
  		if (fatal_signal_pending(current))
  			break;
  
  		if (nid == NUMA_NO_NODE)
 -			page = alloc_pages(gfp, order);
 +			page = alloc_pages(alloc_gfp, order);
  		else
 -			page = alloc_pages_node(nid, gfp, order);
 -		if (unlikely(!page))
 -			break;
 +			page = alloc_pages_node(nid, alloc_gfp, order);
 +		if (unlikely(!page)) {
 +			if (!nofail)
 +				break;
 +
 +			/* fall back to the zero order allocations */
 +			alloc_gfp |= __GFP_NOFAIL;
 +			order = 0;
 +			continue;
 +		}
 +
  		/*
  		 * Higher order allocations must be able to be treated as
  		 * indepdenent small pages by callers (as they can with

Reivewed-by: Baoquan He <bhe@redhat.com>


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2023-03-07  0:59 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-05  5:30 [PATCH] mm/page_alloc: avoid high-order page allocation warn with __GFP_NOFAIL Gao Xiang
2023-03-06  7:51 ` Michal Hocko
2023-03-06  8:03   ` Gao Xiang
2023-03-06 12:14   ` Uladzislau Rezki
2023-03-06 14:03     ` [PATCH] mm, vmalloc: fix high order __GFP_NOFAIL allocations Michal Hocko
2023-03-06 16:37       ` Uladzislau Rezki
2023-03-06 17:29       ` Vlastimil Babka
2023-03-06 17:38         ` Michal Hocko
2023-03-07  0:58       ` Baoquan He

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.