* [PATCH 0/4 v2] 5.14-rc1 mm/page_alloc.c stray patches
@ 2021-07-13 15:20 Mel Gorman
  2021-07-13 15:20 ` [PATCH 1/4] mm/page_alloc: Avoid page allocator recursion with pagesets.lock held Mel Gorman
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Mel Gorman @ 2021-07-13 15:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Desmond Cheong Zhi Xi, Zhang Qiang, Yanfei Xu, Chuck Lever,
	Jesper Dangaard Brouer, Matteo Croce, Linux-MM, LKML, Mel Gorman

(This v2 is because I didn't refresh the patches from my git tree properly
before sending, sorry for the noise)

This series contains fixes that would likely have been included in the
5.14-rc1 merge window had they arrived on time.  Mail indicates that some
may already be picked up for mmotm but the tree is not up to date yet so
I'm including them just in case.

Three are fixes to the bulk memory allocator and one is fallout from a
warning cleanup that tripped BTF, which expected a symbol to be global.

 mm/page_alloc.c | 28 +++++++++++++++++++++-------
 1 file changed, 21 insertions(+), 7 deletions(-)

-- 
2.26.2



* [PATCH 1/4] mm/page_alloc: Avoid page allocator recursion with pagesets.lock held
  2021-07-13 15:20 [PATCH 0/4 v2] 5.14-rc1 mm/page_alloc.c stray patches Mel Gorman
@ 2021-07-13 15:20 ` Mel Gorman
  2021-07-13 15:20 ` [PATCH 2/4] mm/page_alloc: correct return value when failing at preparing Mel Gorman
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Mel Gorman @ 2021-07-13 15:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Desmond Cheong Zhi Xi, Zhang Qiang, Yanfei Xu, Chuck Lever,
	Jesper Dangaard Brouer, Matteo Croce, Linux-MM, LKML, Mel Gorman

Syzbot is reporting potential deadlocks due to pagesets.lock when
PAGE_OWNER is enabled. One example from Desmond Cheong Zhi Xi is
as follows

  __alloc_pages_bulk()
    local_lock_irqsave(&pagesets.lock, flags) <---- outer lock here
    prep_new_page():
      post_alloc_hook():
        set_page_owner():
          __set_page_owner():
            save_stack():
              stack_depot_save():
                alloc_pages():
                  alloc_page_interleave():
                    __alloc_pages():
                      get_page_from_freelist():
                        rmqueue():
                          rmqueue_pcplist():
                            local_lock_irqsave(&pagesets.lock, flags);
                            *** DEADLOCK ***

Zhang, Qiang also reported

  BUG: sleeping function called from invalid context at mm/page_alloc.c:5179
  in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
  .....
  __dump_stack lib/dump_stack.c:79 [inline]
  dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:96
  ___might_sleep.cold+0x1f1/0x237 kernel/sched/core.c:9153
  prepare_alloc_pages+0x3da/0x580 mm/page_alloc.c:5179
  __alloc_pages+0x12f/0x500 mm/page_alloc.c:5375
  alloc_page_interleave+0x1e/0x200 mm/mempolicy.c:2147
  alloc_pages+0x238/0x2a0 mm/mempolicy.c:2270
  stack_depot_save+0x39d/0x4e0 lib/stackdepot.c:303
  save_stack+0x15e/0x1e0 mm/page_owner.c:120
  __set_page_owner+0x50/0x290 mm/page_owner.c:181
  prep_new_page mm/page_alloc.c:2445 [inline]
  __alloc_pages_bulk+0x8b9/0x1870 mm/page_alloc.c:5313
  alloc_pages_bulk_array_node include/linux/gfp.h:557 [inline]
  vm_area_alloc_pages mm/vmalloc.c:2775 [inline]
  __vmalloc_area_node mm/vmalloc.c:2845 [inline]
  __vmalloc_node_range+0x39d/0x960 mm/vmalloc.c:2947
  __vmalloc_node mm/vmalloc.c:2996 [inline]
  vzalloc+0x67/0x80 mm/vmalloc.c:3066

There are a number of ways it could be fixed. The page owner code could
be audited to strip GFP flags that allow sleeping but it'll impair the
functionality of PAGE_OWNER if allocations fail. The bulk allocator
could add a special case to release/reacquire the lock for prep_new_page
and look up the PCP again once the lock is reacquired, at a performance cost.
The pages requiring prep could be tracked using the least significant
bit and looping through the array although it is more complicated for
the list interface. The options are relatively complex and the second
one still incurs a performance penalty when PAGE_OWNER is active so this
patch takes the simple approach -- disable bulk allocation if PAGE_OWNER is
active. The caller will be forced to allocate one page at a time incurring
a performance penalty but PAGE_OWNER is already a performance penalty.

Fixes: dbbee9d5cd83 ("mm/page_alloc: convert per-cpu list protection to local_lock")
Reported-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
Reported-by: "Zhang, Qiang" <Qiang.Zhang@windriver.com>
Reported-and-tested-by: syzbot+127fd7828d6eeb611703@syzkaller.appspotmail.com
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Rafael Aquini <aquini@redhat.com>
---
 mm/page_alloc.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3b97e17806be..6ef86f338151 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5239,6 +5239,18 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 	if (nr_pages - nr_populated == 1)
 		goto failed;
 
+#ifdef CONFIG_PAGE_OWNER
+	/*
+	 * PAGE_OWNER may recurse into the allocator to allocate space to
+	 * save the stack with pagesets.lock held. Releasing/reacquiring
+	 * removes much of the performance benefit of bulk allocation so
+	 * force the caller to allocate one page at a time as it'll have
+	 * similar performance to added complexity to the bulk allocator.
+	 */
+	if (static_branch_unlikely(&page_owner_inited))
+		goto failed;
+#endif
+
 	/* May set ALLOC_NOFRAGMENT, fragmentation will return 1 page. */
 	gfp &= gfp_allowed_mask;
 	alloc_gfp = gfp;
-- 
2.26.2
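
As an illustration of the fallback described in the patch above, here is a
minimal userspace sketch, not kernel code. Every name in it is hypothetical:
owner_tracking_enabled stands in for the page_owner_inited static key,
alloc_one() for the single-page allocator and alloc_bulk() for
__alloc_pages_bulk(). It only models the control flow: when tracking is
enabled, the batched path (which would hold the per-CPU lock) is never
entered, and each call hands back at most one new object.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the page_owner_inited static key. */
bool owner_tracking_enabled = false;

/* Counters that record which path ran; for illustration only. */
int bulk_path_calls = 0;
int single_path_calls = 0;

/* Hypothetical stand-in for the single-page allocator. */
void *alloc_one(void)
{
	single_path_calls++;
	return malloc(64);
}

/*
 * Model of a bulk allocator: fills the NULL slots of arr[0..nr) and
 * returns the total number of populated slots. arr must not be NULL.
 */
size_t alloc_bulk(size_t nr, void **arr)
{
	size_t populated = 0;

	/* Skip slots that the caller already filled. */
	while (populated < nr && arr[populated])
		populated++;
	if (populated == nr)
		return populated;

	/*
	 * With tracking enabled (or only one slot left), avoid the batched
	 * path and allocate a single object, so nothing that might recurse
	 * into the allocator runs while the batch lock would be held.
	 */
	if (owner_tracking_enabled || nr - populated == 1) {
		void *p = alloc_one();

		if (p)
			arr[populated++] = p;
		return populated;
	}

	/* Batched path: the real code holds the per-CPU lock across this. */
	bulk_path_calls++;
	while (populated < nr) {
		void *p = malloc(64);

		if (!p)
			break;
		arr[populated++] = p;
	}
	return populated;
}
```

A caller that needs nr objects keeps calling alloc_bulk() until the return
value reaches nr. With tracking enabled that takes one call per object,
which is the "one page at a time" penalty the changelog accepts.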



* [PATCH 2/4] mm/page_alloc: correct return value when failing at preparing
  2021-07-13 15:20 [PATCH 0/4 v2] 5.14-rc1 mm/page_alloc.c stray patches Mel Gorman
  2021-07-13 15:20 ` [PATCH 1/4] mm/page_alloc: Avoid page allocator recursion with pagesets.lock held Mel Gorman
@ 2021-07-13 15:20 ` Mel Gorman
  2021-07-13 15:20 ` [PATCH 3/4] mm/page_alloc: Further fix __alloc_pages_bulk() return value Mel Gorman
  2021-07-13 15:21 ` [PATCH 4/4] Revert "mm/page_alloc: make should_fail_alloc_page() static" Mel Gorman
  3 siblings, 0 replies; 13+ messages in thread
From: Mel Gorman @ 2021-07-13 15:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Desmond Cheong Zhi Xi, Zhang Qiang, Yanfei Xu, Chuck Lever,
	Jesper Dangaard Brouer, Matteo Croce, Linux-MM, LKML, Mel Gorman

From: Yanfei Xu <yanfei.xu@windriver.com>

If the array passed in is already partially populated, we should
return "nr_populated" even when failing at the argument preparation stage.

Signed-off-by: Yanfei Xu <yanfei.xu@windriver.com>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Link: https://lore.kernel.org/r/20210709102855.55058-1-yanfei.xu@windriver.com
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6ef86f338151..803414ce9264 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5255,7 +5255,7 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 	gfp &= gfp_allowed_mask;
 	alloc_gfp = gfp;
 	if (!prepare_alloc_pages(gfp, 0, preferred_nid, nodemask, &ac, &alloc_gfp, &alloc_flags))
-		return 0;
+		return nr_populated;
 	gfp = alloc_gfp;
 
 	/* Find an allowed local zone that meets the low watermark. */
-- 
2.26.2



* [PATCH 3/4] mm/page_alloc: Further fix __alloc_pages_bulk() return value
  2021-07-13 15:20 [PATCH 0/4 v2] 5.14-rc1 mm/page_alloc.c stray patches Mel Gorman
  2021-07-13 15:20 ` [PATCH 1/4] mm/page_alloc: Avoid page allocator recursion with pagesets.lock held Mel Gorman
  2021-07-13 15:20 ` [PATCH 2/4] mm/page_alloc: correct return value when failing at preparing Mel Gorman
@ 2021-07-13 15:20 ` Mel Gorman
  2021-07-13 15:34   ` Chuck Lever III
  2021-07-13 15:21 ` [PATCH 4/4] Revert "mm/page_alloc: make should_fail_alloc_page() static" Mel Gorman
  3 siblings, 1 reply; 13+ messages in thread
From: Mel Gorman @ 2021-07-13 15:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Desmond Cheong Zhi Xi, Zhang Qiang, Yanfei Xu, Chuck Lever,
	Jesper Dangaard Brouer, Matteo Croce, Linux-MM, LKML, Mel Gorman

From: Chuck Lever <chuck.lever@oracle.com>

The author of commit b3b64ebd3822 ("mm/page_alloc: do bulk array
bounds check after checking populated elements") was possibly
confused by the mixture of return values throughout the function.

The API contract is clear that the function "Returns the number of
pages on the list or array." It does not list zero as a unique
return value with a special meaning. Therefore zero is a plausible
return value only if @nr_pages is zero or less.

Clean up the return logic to make it clear that the returned value
is always the total number of pages in the array/list, not the
number of pages that were allocated during this call.

The only change in behavior with this patch is the value returned
if prepare_alloc_pages() fails. To match the API contract, the
number of pages currently in the array/list is returned in this
case.

The call site in __page_pool_alloc_pages_slow() also seems to be
confused on this matter. It should be attended to by someone who
is familiar with that code.

[mel@techsingularity.net: Return nr_populated if 0 pages are requested]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/page_alloc.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 803414ce9264..c66f1e6204c2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5221,9 +5221,6 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 	unsigned int alloc_flags = ALLOC_WMARK_LOW;
 	int nr_populated = 0, nr_account = 0;
 
-	if (unlikely(nr_pages <= 0))
-		return 0;
-
 	/*
 	 * Skip populated array elements to determine if any pages need
 	 * to be allocated before disabling IRQs.
@@ -5231,9 +5228,13 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 	while (page_array && nr_populated < nr_pages && page_array[nr_populated])
 		nr_populated++;
 
+	/* No pages requested? */
+	if (unlikely(nr_pages <= 0))
+		goto out;
+
 	/* Already populated array? */
 	if (unlikely(page_array && nr_pages - nr_populated == 0))
-		return nr_populated;
+		goto out;
 
 	/* Use the single page allocator for one page. */
 	if (nr_pages - nr_populated == 1)
@@ -5255,7 +5256,7 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 	gfp &= gfp_allowed_mask;
 	alloc_gfp = gfp;
 	if (!prepare_alloc_pages(gfp, 0, preferred_nid, nodemask, &ac, &alloc_gfp, &alloc_flags))
-		return nr_populated;
+		goto out;
 	gfp = alloc_gfp;
 
 	/* Find an allowed local zone that meets the low watermark. */
@@ -5323,6 +5324,7 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 	__count_zid_vm_events(PGALLOC, zone_idx(zone), nr_account);
 	zone_statistics(ac.preferred_zoneref->zone, zone, nr_account);
 
+out:
 	return nr_populated;
 
 failed_irq:
@@ -5338,7 +5340,7 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 		nr_populated++;
 	}
 
-	return nr_populated;
+	goto out;
 }
 EXPORT_SYMBOL_GPL(__alloc_pages_bulk);
 
-- 
2.26.2
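
To make the return-value contract from the changelog above concrete, here
is a small userspace model, not the kernel function. fill_bulk(), its
prepare_ok parameter (standing in for the result of prepare_alloc_pages())
and dummy_page are hypothetical names chosen for this sketch. It mirrors
the single "out" exit introduced by the patch: every early exit returns the
total number of populated slots rather than the number filled by this call.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical placeholder that stands in for an allocated page. */
static int dummy_page;

/*
 * Fill the NULL slots of arr[0..nr) and return the total number of
 * populated slots. arr must not be NULL in this model.
 */
int fill_bulk(int nr, void **arr, bool prepare_ok)
{
	int populated = 0;

	/* Skip slots that the caller already filled. */
	while (populated < nr && arr[populated])
		populated++;

	/* No pages requested? */
	if (nr <= 0)
		goto out;

	/* Already populated array? */
	if (nr - populated == 0)
		goto out;

	/* Preparation failed: report what is already there, not zero. */
	if (!prepare_ok)
		goto out;

	while (populated < nr)
		arr[populated++] = &dummy_page;

out:
	return populated;
}
```

With an array of four slots of which two are already filled, a call whose
preparation step fails returns 2 in this model. A caller that read the
result as "pages allocated by this call" would get that case wrong, which
is the confusion the changelog describes.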



* [PATCH 4/4] Revert "mm/page_alloc: make should_fail_alloc_page() static"
  2021-07-13 15:20 [PATCH 0/4 v2] 5.14-rc1 mm/page_alloc.c stray patches Mel Gorman
                   ` (2 preceding siblings ...)
  2021-07-13 15:20 ` [PATCH 3/4] mm/page_alloc: Further fix __alloc_pages_bulk() return value Mel Gorman
@ 2021-07-13 15:21 ` Mel Gorman
  2021-07-14  7:06   ` John Hubbard
  3 siblings, 1 reply; 13+ messages in thread
From: Mel Gorman @ 2021-07-13 15:21 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Desmond Cheong Zhi Xi, Zhang Qiang, Yanfei Xu, Chuck Lever,
	Jesper Dangaard Brouer, Matteo Croce, Linux-MM, LKML, Mel Gorman

From: Matteo Croce <mcroce@microsoft.com>

This reverts commit f7173090033c70886d925995e9dfdfb76dbb2441.

Fix an unresolved symbol error when CONFIG_DEBUG_INFO_BTF=y:

  LD      vmlinux
  BTFIDS  vmlinux
FAILED unresolved symbol should_fail_alloc_page
make: *** [Makefile:1199: vmlinux] Error 255
make: *** Deleting file 'vmlinux'

Fixes: f7173090033c ("mm/page_alloc: make should_fail_alloc_page() static")
Signed-off-by: Matteo Croce <mcroce@microsoft.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Link: https://lore.kernel.org/r/20210708191128.153796-1-mcroce@linux.microsoft.com
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c66f1e6204c2..3e97e68aef7a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3820,7 +3820,7 @@ static inline bool __should_fail_alloc_page(gfp_t gfp_mask, unsigned int order)
 
 #endif /* CONFIG_FAIL_PAGE_ALLOC */
 
-static noinline bool should_fail_alloc_page(gfp_t gfp_mask, unsigned int order)
+noinline bool should_fail_alloc_page(gfp_t gfp_mask, unsigned int order)
 {
 	return __should_fail_alloc_page(gfp_mask, order);
 }
-- 
2.26.2



* Re: [PATCH 3/4] mm/page_alloc: Further fix __alloc_pages_bulk() return value
  2021-07-13 15:20 ` [PATCH 3/4] mm/page_alloc: Further fix __alloc_pages_bulk() return value Mel Gorman
@ 2021-07-13 15:34   ` Chuck Lever III
  0 siblings, 0 replies; 13+ messages in thread
From: Chuck Lever III @ 2021-07-13 15:34 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Desmond Cheong Zhi Xi, Zhang Qiang, Yanfei Xu,
	Jesper Dangaard Brouer, Matteo Croce, Linux-MM, LKML



> On Jul 13, 2021, at 11:20 AM, Mel Gorman <mgorman@techsingularity.net> wrote:
> 
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> The author of commit b3b64ebd3822 ("mm/page_alloc: do bulk array
> bounds check after checking populated elements") was possibly
> confused by the mixture of return values throughout the function.
> 
> The API contract is clear that the function "Returns the number of
> pages on the list or array." It does not list zero as a unique
> return value with a special meaning. Therefore zero is a plausible
> return value only if @nr_pages is zero or less.
> 
> Clean up the return logic to make it clear that the returned value
> is always the total number of pages in the array/list, not the
> number of pages that were allocated during this call.
> 
> The only change in behavior with this patch is the value returned
> if prepare_alloc_pages() fails. To match the API contract, the
> number of pages currently in the array/list is returned in this
> case.
> 
> The call site in __page_pool_alloc_pages_slow() also seems to be
> confused on this matter. It should be attended to by someone who
> is familiar with that code.
> 
> [mel@techsingularity.net: Return nr_populated if 0 pages are requested]
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> ---
> mm/page_alloc.c | 14 ++++++++------
> 1 file changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 803414ce9264..c66f1e6204c2 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5221,9 +5221,6 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
> 	unsigned int alloc_flags = ALLOC_WMARK_LOW;
> 	int nr_populated = 0, nr_account = 0;
> 
> -	if (unlikely(nr_pages <= 0))
> -		return 0;
> -
> 	/*
> 	 * Skip populated array elements to determine if any pages need
> 	 * to be allocated before disabling IRQs.
> @@ -5231,9 +5228,13 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
> 	while (page_array && nr_populated < nr_pages && page_array[nr_populated])
> 		nr_populated++;
> 
> +	/* No pages requested? */
> +	if (unlikely(nr_pages <= 0))
> +		goto out;
> +
> 	/* Already populated array? */
> 	if (unlikely(page_array && nr_pages - nr_populated == 0))
> -		return nr_populated;
> +		goto out;
> 
> 	/* Use the single page allocator for one page. */
> 	if (nr_pages - nr_populated == 1)
> @@ -5255,7 +5256,7 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
> 	gfp &= gfp_allowed_mask;
> 	alloc_gfp = gfp;
> 	if (!prepare_alloc_pages(gfp, 0, preferred_nid, nodemask, &ac, &alloc_gfp, &alloc_flags))
> -		return nr_populated;
> +		goto out;

:thumbsup:  Thanks!


> 	gfp = alloc_gfp;
> 
> 	/* Find an allowed local zone that meets the low watermark. */
> @@ -5323,6 +5324,7 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
> 	__count_zid_vm_events(PGALLOC, zone_idx(zone), nr_account);
> 	zone_statistics(ac.preferred_zoneref->zone, zone, nr_account);
> 
> +out:
> 	return nr_populated;
> 
> failed_irq:
> @@ -5338,7 +5340,7 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
> 		nr_populated++;
> 	}
> 
> -	return nr_populated;
> +	goto out;
> }
> EXPORT_SYMBOL_GPL(__alloc_pages_bulk);
> 
> -- 
> 2.26.2
> 

--
Chuck Lever





* Re: [PATCH 4/4] Revert "mm/page_alloc: make should_fail_alloc_page() static"
  2021-07-13 15:21 ` [PATCH 4/4] Revert "mm/page_alloc: make should_fail_alloc_page() static" Mel Gorman
@ 2021-07-14  7:06   ` John Hubbard
  2021-07-15  8:35     ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 13+ messages in thread
From: John Hubbard @ 2021-07-14  7:06 UTC (permalink / raw)
  To: Mel Gorman, Andrew Morton
  Cc: Desmond Cheong Zhi Xi, Zhang Qiang, Yanfei Xu, Chuck Lever,
	Jesper Dangaard Brouer, Matteo Croce, Linux-MM, LKML

On 7/13/21 8:21 AM, Mel Gorman wrote:
> From: Matteo Croce <mcroce@microsoft.com>
> 
> This reverts commit f7173090033c70886d925995e9dfdfb76dbb2441.
> 
> Fix an unresolved symbol error when CONFIG_DEBUG_INFO_BTF=y:
> 
>    LD      vmlinux
>    BTFIDS  vmlinux
> FAILED unresolved symbol should_fail_alloc_page
> make: *** [Makefile:1199: vmlinux] Error 255
> make: *** Deleting file 'vmlinux'

Yes! I ran into this yesterday. Your patch fixes this build failure
for me, so feel free to add:

Tested-by: John Hubbard <jhubbard@nvidia.com>


However, I should add that I'm still seeing another build failure, after
fixing the above:

LD      vmlinux
BTFIDS  vmlinux
FAILED elf_update(WRITE): no error
make: *** [Makefile:1176: vmlinux] Error 255
make: *** Deleting file 'vmlinux'


...and un-setting CONFIG_DEBUG_INFO_BTF makes that disappear. Maybe someone
who understands the BTFIDS build step can shed some light on that; I'm
not there yet. :)


thanks,
-- 
John Hubbard
NVIDIA

> 
> Fixes: f7173090033c ("mm/page_alloc: make should_fail_alloc_page() static")
> Signed-off-by: Matteo Croce <mcroce@microsoft.com>
> Acked-by: Mel Gorman <mgorman@techsingularity.net>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> Link: https://lore.kernel.org/r/20210708191128.153796-1-mcroce@linux.microsoft.com
> ---
>   mm/page_alloc.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c66f1e6204c2..3e97e68aef7a 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3820,7 +3820,7 @@ static inline bool __should_fail_alloc_page(gfp_t gfp_mask, unsigned int order)
>   
>   #endif /* CONFIG_FAIL_PAGE_ALLOC */
>   
> -static noinline bool should_fail_alloc_page(gfp_t gfp_mask, unsigned int order)
> +noinline bool should_fail_alloc_page(gfp_t gfp_mask, unsigned int order)
>   {
>   	return __should_fail_alloc_page(gfp_mask, order);
>   }
> 



* Re: [PATCH 4/4] Revert "mm/page_alloc: make should_fail_alloc_page() static"
  2021-07-14  7:06   ` John Hubbard
@ 2021-07-15  8:35     ` Jesper Dangaard Brouer
  2021-07-16  0:04       ` John Hubbard
  0 siblings, 1 reply; 13+ messages in thread
From: Jesper Dangaard Brouer @ 2021-07-15  8:35 UTC (permalink / raw)
  To: John Hubbard, Mel Gorman, Andrew Morton, acme, Jiri Olsa
  Cc: brouer, Desmond Cheong Zhi Xi, Zhang Qiang, Yanfei Xu,
	Chuck Lever, Matteo Croce, Linux-MM, LKML, bpf

Cc. Jiri Olsa + Arnaldo

On 14/07/2021 09.06, John Hubbard wrote:
> On 7/13/21 8:21 AM, Mel Gorman wrote:
>> From: Matteo Croce <mcroce@microsoft.com>
>>
>> This reverts commit f7173090033c70886d925995e9dfdfb76dbb2441.
>>
>> Fix an unresolved symbol error when CONFIG_DEBUG_INFO_BTF=y:
>>
>>    LD      vmlinux
>>    BTFIDS  vmlinux
>> FAILED unresolved symbol should_fail_alloc_page
>> make: *** [Makefile:1199: vmlinux] Error 255
>> make: *** Deleting file 'vmlinux'
> 
> Yes! I ran into this yesterday. Your patch fixes this build failure
> for me, so feel free to add:
> 
> Tested-by: John Hubbard <jhubbard@nvidia.com>
> 
> 
> However, I should add that I'm still seeing another build failure, after
> fixing the above:
> 
> LD      vmlinux
> BTFIDS  vmlinux
> FAILED elf_update(WRITE): no error

This elf_update(WRITE) error is new to me.

> make: *** [Makefile:1176: vmlinux] Error 255
> make: *** Deleting file 'vmlinux'

It is annoying that vmlinux is deleted in this case, because I usually 
give Jiri the output from 'resolve_btfids -v' on vmlinux.

  $ ./tools/bpf/resolve_btfids/resolve_btfids -v vmlinux.failed

You can do:
$ git diff
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index 3b261b0f74f0..02dec10a7d75 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -302,7 +302,8 @@ cleanup()
         rm -f .tmp_symversions.lds
         rm -f .tmp_vmlinux*
         rm -f System.map
-       rm -f vmlinux
+       # rm -f vmlinux
+       mv vmlinux vmlinux.failed
         rm -f vmlinux.o
  }


> 
> 
> ...and un-setting CONFIG_DEBUG_INFO_BTF makes that disappear. Maybe someone
> who understands the BTFIDS build step can shed some light on that; I'm
> not there yet. :)

I'm just a user/consumer of the output from the BTFIDS build step. I think
Jiri Olsa owns the resolve_btfids tool, and ACME owns pahole.  I've hit a
number of issues in the past that Jiri and ACME helped resolve quickly.
The most efficient solution I've found was to upgrade pahole to a newer 
version.

What version of pahole does your build system have?

What is your GCC version?

--Jesper



* Re: [PATCH 4/4] Revert "mm/page_alloc: make should_fail_alloc_page() static"
  2021-07-15  8:35     ` Jesper Dangaard Brouer
@ 2021-07-16  0:04       ` John Hubbard
  2021-07-16  6:04         ` John Hubbard
  0 siblings, 1 reply; 13+ messages in thread
From: John Hubbard @ 2021-07-16  0:04 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Mel Gorman, Andrew Morton, acme, Jiri Olsa
  Cc: brouer, Desmond Cheong Zhi Xi, Zhang Qiang, Yanfei Xu,
	Chuck Lever, Matteo Croce, Linux-MM, LKML, bpf

...
>> LD      vmlinux
>> BTFIDS  vmlinux
>> FAILED elf_update(WRITE): no error
> 
> This elf_update(WRITE) error is new to me.
> 
>> make: *** [Makefile:1176: vmlinux] Error 255
>> make: *** Deleting file 'vmlinux'
> 
> It is annoying that vmlinux is deleted in this case, because I usually give Jiri the output from 
> 'resolve_btfids -v' on vmlinux.
> 
>   $ ./tools/bpf/resolve_btfids/resolve_btfids -v vmlinux.failed
> 
> You can do:
> $ git diff
> diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
> index 3b261b0f74f0..02dec10a7d75 100755
> --- a/scripts/link-vmlinux.sh
> +++ b/scripts/link-vmlinux.sh
> @@ -302,7 +302,8 @@ cleanup()
>          rm -f .tmp_symversions.lds
>          rm -f .tmp_vmlinux*
>          rm -f System.map
> -       rm -f vmlinux
> +       # rm -f vmlinux
> +       mv vmlinux vmlinux.failed
>          rm -f vmlinux.o
>   }
> 
> 
>>
>>
>> ...and un-setting CONFIG_DEBUG_INFO_BTF makes that disappear. Maybe someone
>> who understands the BTFIDS build step can shed some light on that; I'm
>> not there yet. :)
> 
> I'm just a user/consumer of the output from the BTFIDS build step. I think Jiri Olsa owns the
> resolve_btfids tool, and ACME owns pahole.  I've hit a number of issues in the past that Jiri and
> ACME helped resolve quickly.
> The most efficient solution I've found was to upgrade pahole to a newer version.
> 
> What version of pahole does your build system have?
> 
> What is your GCC version?
> 

Just a quick answer first on the versions: this is an up to date Arch Linux system:

gcc: 11.1.0
pahole: 1.21

I'll try to get the other step done later this evening.

thanks,
-- 
John Hubbard
NVIDIA


* Re: [PATCH 4/4] Revert "mm/page_alloc: make should_fail_alloc_page() static"
  2021-07-16  0:04       ` John Hubbard
@ 2021-07-16  6:04         ` John Hubbard
  0 siblings, 0 replies; 13+ messages in thread
From: John Hubbard @ 2021-07-16  6:04 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Mel Gorman, Andrew Morton, acme, Jiri Olsa
  Cc: brouer, Desmond Cheong Zhi Xi, Zhang Qiang, Yanfei Xu,
	Chuck Lever, Matteo Croce, Linux-MM, LKML, bpf

On 7/15/21 5:04 PM, John Hubbard wrote:
...
>>> ...and un-setting CONFIG_DEBUG_INFO_BTF makes that disappear. Maybe someone
>>> who understands the BTFIDS build step can shed some light on that; I'm
>>> not there yet. :)
>>
>> I'm just a user/consumer of the output from the BTFIDS build step. I think Jiri Olsa owns the
>> resolve_btfids tool, and ACME owns pahole.  I've hit a number of issues in the past that Jiri and
>> ACME helped resolve quickly.
>> The most efficient solution I've found was to upgrade pahole to a newer version.
>>
>> What version of pahole does your build system have?
>>
>> What is your GCC version?
>>
> 
> Just a quick answer first on the versions: this is an up to date Arch Linux system:
> 
> gcc: 11.1.0
> pahole: 1.21
> 
> I'll try to get the other step done later this evening.

...and...I've lost the repro completely. The only thing I changed was that I
attempted to update pahole. This caused Arch Linux to reinstall pahole, claiming
that 1.21 is already the current version.

It acts as if there was something wrong with the pahole installation. This
seems unlikely, given that the system is merely on a routine update schedule.
However, that's the data I have.

If it ever comes up again I'll be able to run resolve_btfids, using your
steps here, so thanks for posting those!


thanks,
-- 
John Hubbard
NVIDIA


* Re: [PATCH 4/4] Revert "mm/page_alloc: make should_fail_alloc_page() static"
  2021-07-15  6:34   ` Christoph Hellwig
@ 2021-07-15  7:36     ` Mel Gorman
  0 siblings, 0 replies; 13+ messages in thread
From: Mel Gorman @ 2021-07-15  7:36 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andrew Morton, Desmond Cheong Zhi Xi, Zhang Qiang, Yanfei Xu,
	Chuck Lever, Jesper Dangaard Brouer, Matteo Croce, Linux-MM,
	LKML

On Thu, Jul 15, 2021 at 07:34:53AM +0100, Christoph Hellwig wrote:
> On Tue, Jul 13, 2021 at 02:56:25PM +0100, Mel Gorman wrote:
> > From: Matteo Croce <mcroce@microsoft.com>
> > 
> > This reverts commit f7173090033c70886d925995e9dfdfb76dbb2441.
> > 
> > Fix an unresolved symbol error when CONFIG_DEBUG_INFO_BTF=y:
> 
> I still fundamentally disagree with this "fix".  Whatever code requires
> a function to be non-static without a prototype and reference is
> completely fucked up beyond rescue and needs to be disabled until
> it can be fixed instead of worked around like this.

I'm definitely not happy with the fix but the breakage was unintentional
and given that it was done for a W=1 warning, the patch was low priority
and I felt that users that do error injection to stress failure paths at
least had some value. If I was fixing something important, I would feel
differently and we've slammed patches before that fixed warnings while
introducing worse problems. I'm still hoping that BTF gets fixed because
it's the right thing to do.

-- 
Mel Gorman
SUSE Labs


* Re: [PATCH 4/4] Revert "mm/page_alloc: make should_fail_alloc_page() static"
  2021-07-13 13:56 ` [PATCH 4/4] Revert "mm/page_alloc: make should_fail_alloc_page() static" Mel Gorman
@ 2021-07-15  6:34   ` Christoph Hellwig
  2021-07-15  7:36     ` Mel Gorman
  0 siblings, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2021-07-15  6:34 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Desmond Cheong Zhi Xi, Zhang Qiang, Yanfei Xu,
	Chuck Lever, Jesper Dangaard Brouer, Matteo Croce, Linux-MM,
	LKML

On Tue, Jul 13, 2021 at 02:56:25PM +0100, Mel Gorman wrote:
> From: Matteo Croce <mcroce@microsoft.com>
> 
> This reverts commit f7173090033c70886d925995e9dfdfb76dbb2441.
> 
> Fix an unresolved symbol error when CONFIG_DEBUG_INFO_BTF=y:

I still fundamentally disagree with this "fix".  Whatever code requires
a function to be non-static without a prototype and reference is
completely fucked up beyond rescue and needs to be disabled until
it can be fixed instead of worked around like this.


* [PATCH 4/4] Revert "mm/page_alloc: make should_fail_alloc_page() static"
  2021-07-13 13:56 [PATCH 0/4] 5.14-rc1 mm/page_alloc.c stray patches Mel Gorman
@ 2021-07-13 13:56 ` Mel Gorman
  2021-07-15  6:34   ` Christoph Hellwig
  0 siblings, 1 reply; 13+ messages in thread
From: Mel Gorman @ 2021-07-13 13:56 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Desmond Cheong Zhi Xi, Zhang Qiang, Yanfei Xu, Chuck Lever,
	Jesper Dangaard Brouer, Matteo Croce, Linux-MM, LKML, Mel Gorman

From: Matteo Croce <mcroce@microsoft.com>

This reverts commit f7173090033c70886d925995e9dfdfb76dbb2441.

Fix an unresolved symbol error when CONFIG_DEBUG_INFO_BTF=y:

  LD      vmlinux
  BTFIDS  vmlinux
FAILED unresolved symbol should_fail_alloc_page
make: *** [Makefile:1199: vmlinux] Error 255
make: *** Deleting file 'vmlinux'

Fixes: f7173090033c ("mm/page_alloc: make should_fail_alloc_page() static")
Signed-off-by: Matteo Croce <mcroce@microsoft.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Link: https://lore.kernel.org/r/20210708191128.153796-1-mcroce@linux.microsoft.com
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e0eeb7391ec7..147bbd467214 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3820,7 +3820,7 @@ static inline bool __should_fail_alloc_page(gfp_t gfp_mask, unsigned int order)
 
 #endif /* CONFIG_FAIL_PAGE_ALLOC */
 
-static noinline bool should_fail_alloc_page(gfp_t gfp_mask, unsigned int order)
+noinline bool should_fail_alloc_page(gfp_t gfp_mask, unsigned int order)
 {
 	return __should_fail_alloc_page(gfp_mask, order);
 }
-- 
2.26.2

