stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Fix page corruption caused by racy check in __free_pages
@ 2023-02-09 17:48 David Chen
  2023-02-09 18:53 ` Matthew Wilcox
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: David Chen @ 2023-02-09 17:48 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andrew Morton, Matthew Wilcox (Oracle), linux-mm, stable

When we upgraded our kernel, we started seeing some page corruption like
the following consistently:

 BUG: Bad page state in process ganesha.nfsd  pfn:1304ca
 page:0000000022261c55 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0 pfn:0x1304ca
 flags: 0x17ffffc0000000()
 raw: 0017ffffc0000000 ffff8a513ffd4c98 ffffeee24b35ec08 0000000000000000
 raw: 0000000000000000 0000000000000001 00000000ffffff7f 0000000000000000
 page dumped because: nonzero mapcount
 CPU: 0 PID: 15567 Comm: ganesha.nfsd Kdump: loaded Tainted: P    B      O      5.10.158-1.nutanix.20221209.el7.x86_64 #1
 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
 Call Trace:
  dump_stack+0x74/0x96
  bad_page.cold+0x63/0x94
  check_new_page_bad+0x6d/0x80
  rmqueue+0x46e/0x970
  get_page_from_freelist+0xcb/0x3f0
  ? _cond_resched+0x19/0x40
  __alloc_pages_nodemask+0x164/0x300
  alloc_pages_current+0x87/0xf0
  skb_page_frag_refill+0x84/0x110
  ...

Sometimes, it would also show up as corruption in the free list pointer and
cause crashes.

After bisecting the issue, we found the issue started from e320d3012d25:

	if (put_page_testzero(page))
		free_the_page(page, order);
	else if (!PageHead(page))
		while (order-- > 0)
			free_the_page(page + (1 << order), order);

So the problem is the check PageHead is racy because at this point we
already dropped our reference to the page. So even if we came in with
compound page, the page can already be freed and PageHead can return
false and we will end up freeing all the tail pages causing double free.

Fixes: e320d3012d25 ("mm/page_alloc.c: fix freeing non-compound pages")
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: linux-mm@kvack.org
Cc: stable@vger.kernel.org
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
---
 mm/page_alloc.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0745aedebb37..3bb3484563ed 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5631,9 +5631,12 @@ EXPORT_SYMBOL(get_zeroed_page);
  */
 void __free_pages(struct page *page, unsigned int order)
 {
+	/* get PageHead before we drop reference */
+	int head = PageHead(page);
+
 	if (put_page_testzero(page))
 		free_the_page(page, order);
-	else if (!PageHead(page))
+	else if (!head)
 		while (order-- > 0)
 			free_the_page(page + (1 << order), order);
 }
-- 
2.22.3

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] Fix page corruption caused by racy check in __free_pages
  2023-02-09 17:48 [PATCH] Fix page corruption caused by racy check in __free_pages David Chen
@ 2023-02-09 18:53 ` Matthew Wilcox
  2023-02-10 15:47 ` Vlastimil Babka
  2023-02-11 14:04 ` Linux regression tracking #adding (Thorsten Leemhuis)
  2 siblings, 0 replies; 4+ messages in thread
From: Matthew Wilcox @ 2023-02-09 18:53 UTC (permalink / raw)
  To: David Chen; +Cc: linux-kernel, Andrew Morton, linux-mm, stable

On Thu, Feb 09, 2023 at 05:48:28PM +0000, David Chen wrote:
> So the problem is the check PageHead is racy because at this point we
> already dropped our reference to the page. So even if we came in with
> compound page, the page can already be freed and PageHead can return
> false and we will end up freeing all the tail pages causing double free.
> 
> Fixes: e320d3012d25 ("mm/page_alloc.c: fix freeing non-compound pages")
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: linux-mm@kvack.org
> Cc: stable@vger.kernel.org
> Signed-off-by: Chunwei Chen <david.chen@nutanix.com>

Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>

That's a pretty narrow window you managed to hit!  Just a few
instructions wide.  I guess that answers the question Andrew originally
had (Does this ever happen?)  I just changed it from a silent memory
leak into a double-free.

> ---
>  mm/page_alloc.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 0745aedebb37..3bb3484563ed 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5631,9 +5631,12 @@ EXPORT_SYMBOL(get_zeroed_page);
>   */
>  void __free_pages(struct page *page, unsigned int order)
>  {
> +	/* get PageHead before we drop reference */
> +	int head = PageHead(page);
> +
>  	if (put_page_testzero(page))
>  		free_the_page(page, order);
> -	else if (!PageHead(page))
> +	else if (!head)
>  		while (order-- > 0)
>  			free_the_page(page + (1 << order), order);
>  }
> -- 
> 2.22.3

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] Fix page corruption caused by racy check in __free_pages
  2023-02-09 17:48 [PATCH] Fix page corruption caused by racy check in __free_pages David Chen
  2023-02-09 18:53 ` Matthew Wilcox
@ 2023-02-10 15:47 ` Vlastimil Babka
  2023-02-11 14:04 ` Linux regression tracking #adding (Thorsten Leemhuis)
  2 siblings, 0 replies; 4+ messages in thread
From: Vlastimil Babka @ 2023-02-10 15:47 UTC (permalink / raw)
  To: David Chen, linux-kernel
  Cc: Andrew Morton, Matthew Wilcox (Oracle), linux-mm, stable

On 2/9/23 18:48, David Chen wrote:
> When we upgraded our kernel, we started seeing some page corruption like
> the following consistently:
> 
>  BUG: Bad page state in process ganesha.nfsd  pfn:1304ca
>  page:0000000022261c55 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0 pfn:0x1304ca
>  flags: 0x17ffffc0000000()
>  raw: 0017ffffc0000000 ffff8a513ffd4c98 ffffeee24b35ec08 0000000000000000
>  raw: 0000000000000000 0000000000000001 00000000ffffff7f 0000000000000000
>  page dumped because: nonzero mapcount
>  CPU: 0 PID: 15567 Comm: ganesha.nfsd Kdump: loaded Tainted: P    B      O      5.10.158-1.nutanix.20221209.el7.x86_64 #1
>  Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
>  Call Trace:
>   dump_stack+0x74/0x96
>   bad_page.cold+0x63/0x94
>   check_new_page_bad+0x6d/0x80
>   rmqueue+0x46e/0x970
>   get_page_from_freelist+0xcb/0x3f0
>   ? _cond_resched+0x19/0x40
>   __alloc_pages_nodemask+0x164/0x300
>   alloc_pages_current+0x87/0xf0
>   skb_page_frag_refill+0x84/0x110
>   ...
> 
> Sometimes, it would also show up as corruption in the free list pointer and
> cause crashes.
> 
> After bisecting the issue, we found the issue started from e320d3012d25:
> 
> 	if (put_page_testzero(page))
> 		free_the_page(page, order);
> 	else if (!PageHead(page))
> 		while (order-- > 0)
> 			free_the_page(page + (1 << order), order);
> 
> So the problem is the check PageHead is racy because at this point we
> already dropped our reference to the page. So even if we came in with
> compound page, the page can already be freed and PageHead can return
> false and we will end up freeing all the tail pages causing double free.
> 
> Fixes: e320d3012d25 ("mm/page_alloc.c: fix freeing non-compound pages")
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: linux-mm@kvack.org
> Cc: stable@vger.kernel.org
> Signed-off-by: Chunwei Chen <david.chen@nutanix.com>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

That's nasty enough to go into 6.2, IMHO.

> ---
>  mm/page_alloc.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 0745aedebb37..3bb3484563ed 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5631,9 +5631,12 @@ EXPORT_SYMBOL(get_zeroed_page);
>   */
>  void __free_pages(struct page *page, unsigned int order)
>  {
> +	/* get PageHead before we drop reference */
> +	int head = PageHead(page);
> +
>  	if (put_page_testzero(page))
>  		free_the_page(page, order);
> -	else if (!PageHead(page))
> +	else if (!head)
>  		while (order-- > 0)
>  			free_the_page(page + (1 << order), order);
>  }


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] Fix page corruption caused by racy check in __free_pages
  2023-02-09 17:48 [PATCH] Fix page corruption caused by racy check in __free_pages David Chen
  2023-02-09 18:53 ` Matthew Wilcox
  2023-02-10 15:47 ` Vlastimil Babka
@ 2023-02-11 14:04 ` Linux regression tracking #adding (Thorsten Leemhuis)
  2 siblings, 0 replies; 4+ messages in thread
From: Linux regression tracking #adding (Thorsten Leemhuis) @ 2023-02-11 14:04 UTC (permalink / raw)
  To: David Chen, linux-kernel
  Cc: Andrew Morton, Matthew Wilcox (Oracle),
	linux-mm, stable, Linux kernel regressions list

[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few templates
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]

On 09.02.23 18:48, David Chen wrote:
> When we upgraded our kernel, we started seeing some page corruption like
> the following consistently:
> 
>  BUG: Bad page state in process ganesha.nfsd  pfn:1304ca
>  page:0000000022261c55 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0 pfn:0x1304ca
>  flags: 0x17ffffc0000000()
>  raw: 0017ffffc0000000 ffff8a513ffd4c98 ffffeee24b35ec08 0000000000000000
>  raw: 0000000000000000 0000000000000001 00000000ffffff7f 0000000000000000
>  page dumped because: nonzero mapcount
>  CPU: 0 PID: 15567 Comm: ganesha.nfsd Kdump: loaded Tainted: P    B      O      5.10.158-1.nutanix.20221209.el7.x86_64 #1
>  Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
>  Call Trace:
>   dump_stack+0x74/0x96
>   bad_page.cold+0x63/0x94
>   check_new_page_bad+0x6d/0x80
>   rmqueue+0x46e/0x970
>   get_page_from_freelist+0xcb/0x3f0
>   ? _cond_resched+0x19/0x40
>   __alloc_pages_nodemask+0x164/0x300
>   alloc_pages_current+0x87/0xf0
>   skb_page_frag_refill+0x84/0x110
>   ...
>
> Sometimes, it would also show up as corruption in the free list pointer and
> cause crashes.
> 
> After bisecting the issue, we found the issue started from e320d3012d25:
> 
> 	if (put_page_testzero(page))
> 		free_the_page(page, order);
> 	else if (!PageHead(page))
> 		while (order-- > 0)
> 			free_the_page(page + (1 << order), order);
> 
> So the problem is the check PageHead is racy because at this point we
> already dropped our reference to the page. So even if we came in with
> compound page, the page can already be freed and PageHead can return
> false and we will end up freeing all the tail pages causing double free.
> 
> Fixes: e320d3012d25 ("mm/page_alloc.c: fix freeing non-compound pages")
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: linux-mm@kvack.org
> Cc: stable@vger.kernel.org
> Signed-off-by: Chunwei Chen <david.chen@nutanix.com>

Thanks for the report and the patch. To be sure the issue doesn't fall
through the cracks unnoticed, I'm adding it to regzbot, the Linux kernel
regression tracking bot:

#regzbot ^introduced e320d3012d25
#regzbot title mm: page corruption caused by racy check in __free_pages
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-02-11 14:04 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-09 17:48 [PATCH] Fix page corruption caused by racy check in __free_pages David Chen
2023-02-09 18:53 ` Matthew Wilcox
2023-02-10 15:47 ` Vlastimil Babka
2023-02-11 14:04 ` Linux regression tracking #adding (Thorsten Leemhuis)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).