linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Revert "mm/gup: check page posion status for coredump."
@ 2021-05-05 13:54 Michal Hocko
  2021-05-05 13:55 ` David Hildenbrand
  2021-05-06  5:47 ` Aili Yao
  0 siblings, 2 replies; 7+ messages in thread
From: Michal Hocko @ 2021-05-05 13:54 UTC (permalink / raw)
  To: Andrew Morton; +Cc: David Hildenbrand, Aili Yao, linux-mm, LKML, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

While reviewing http://lkml.kernel.org/r/20210429122519.15183-4-david@redhat.com
I have crossed d3378e86d182 ("mm/gup: check page posion status for
coredump.") and noticed that this patch is broken in two ways. First it
doesn't really prevent hwpoison pages from being dumped because hwpoison
pages can be marked asynchornously at any time after the check.
Secondly, and more importantly, the patch introduces a ref count leak
because get_dump_page takes a reference on the page which is not
releases.

It also seems that the patch was merged incorrectly because there were
follow up changes not included as well as discussions on how to address
the underlying problem http://lkml.kernel.org/r/57ac524c-b49a-99ec-c1e4-ef5027bfb61b@redhat.com

Therefore revert the original patch.

Fixes: d3378e86d182 ("mm/gup: check page posion status for coredump.")
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/gup.c      |  4 ----
 mm/internal.h | 20 --------------------
 2 files changed, 24 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 71e546e279fc..a33abe9048ed 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1592,10 +1592,6 @@ struct page *get_dump_page(unsigned long addr)
 				      FOLL_FORCE | FOLL_DUMP | FOLL_GET);
 	if (locked)
 		mmap_read_unlock(mm);
-
-	if (ret == 1 && is_page_poisoned(page))
-		return NULL;
-
 	return (ret == 1) ? page : NULL;
 }
 #endif /* CONFIG_ELF_CORE */
diff --git a/mm/internal.h b/mm/internal.h
index ef5f336f59bd..43c4a2f8d4cc 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -96,26 +96,6 @@ static inline void set_page_refcounted(struct page *page)
 	set_page_count(page, 1);
 }
 
-/*
- * When kernel touch the user page, the user page may be have been marked
- * poison but still mapped in user space, if without this page, the kernel
- * can guarantee the data integrity and operation success, the kernel is
- * better to check the posion status and avoid touching it, be good not to
- * panic, coredump for process fatal signal is a sample case matching this
- * scenario. Or if kernel can't guarantee the data integrity, it's better
- * not to call this function, let kernel touch the poison page and get to
- * panic.
- */
-static inline bool is_page_poisoned(struct page *page)
-{
-	if (PageHWPoison(page))
-		return true;
-	else if (PageHuge(page) && PageHWPoison(compound_head(page)))
-		return true;
-
-	return false;
-}
-
 extern unsigned long highest_memmap_pfn;
 
 /*
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] Revert "mm/gup: check page posion status for coredump."
  2021-05-05 13:54 [PATCH] Revert "mm/gup: check page posion status for coredump." Michal Hocko
@ 2021-05-05 13:55 ` David Hildenbrand
  2021-05-05 17:25   ` Andrew Morton
  2021-05-06  5:47 ` Aili Yao
  1 sibling, 1 reply; 7+ messages in thread
From: David Hildenbrand @ 2021-05-05 13:55 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton; +Cc: Aili Yao, linux-mm, LKML, Michal Hocko

On 05.05.21 15:54, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> While reviewing http://lkml.kernel.org/r/20210429122519.15183-4-david@redhat.com
> I have crossed d3378e86d182 ("mm/gup: check page posion status for
> coredump.") and noticed that this patch is broken in two ways. First it
> doesn't really prevent hwpoison pages from being dumped because hwpoison
> pages can be marked asynchornously at any time after the check.
> Secondly, and more importantly, the patch introduces a ref count leak
> because get_dump_page takes a reference on the page which is not
> releases.
> 
> It also seems that the patch was merged incorrectly because there were
> follow up changes not included as well as discussions on how to address
> the underlying problem http://lkml.kernel.org/r/57ac524c-b49a-99ec-c1e4-ef5027bfb61b@redhat.com
> 
> Therefore revert the original patch.
> 
> Fixes: d3378e86d182 ("mm/gup: check page posion status for coredump.")
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>   mm/gup.c      |  4 ----
>   mm/internal.h | 20 --------------------
>   2 files changed, 24 deletions(-)
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index 71e546e279fc..a33abe9048ed 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1592,10 +1592,6 @@ struct page *get_dump_page(unsigned long addr)
>   				      FOLL_FORCE | FOLL_DUMP | FOLL_GET);
>   	if (locked)
>   		mmap_read_unlock(mm);
> -
> -	if (ret == 1 && is_page_poisoned(page))
> -		return NULL;
> -
>   	return (ret == 1) ? page : NULL;
>   }
>   #endif /* CONFIG_ELF_CORE */
> diff --git a/mm/internal.h b/mm/internal.h
> index ef5f336f59bd..43c4a2f8d4cc 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -96,26 +96,6 @@ static inline void set_page_refcounted(struct page *page)
>   	set_page_count(page, 1);
>   }
>   
> -/*
> - * When kernel touch the user page, the user page may be have been marked
> - * poison but still mapped in user space, if without this page, the kernel
> - * can guarantee the data integrity and operation success, the kernel is
> - * better to check the posion status and avoid touching it, be good not to
> - * panic, coredump for process fatal signal is a sample case matching this
> - * scenario. Or if kernel can't guarantee the data integrity, it's better
> - * not to call this function, let kernel touch the poison page and get to
> - * panic.
> - */
> -static inline bool is_page_poisoned(struct page *page)
> -{
> -	if (PageHWPoison(page))
> -		return true;
> -	else if (PageHuge(page) && PageHWPoison(compound_head(page)))
> -		return true;
> -
> -	return false;
> -}
> -
>   extern unsigned long highest_memmap_pfn;
>   
>   /*
> 

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Revert "mm/gup: check page posion status for coredump."
  2021-05-05 13:55 ` David Hildenbrand
@ 2021-05-05 17:25   ` Andrew Morton
  2021-05-06  6:56     ` Michal Hocko
  0 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2021-05-05 17:25 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: Michal Hocko, Aili Yao, linux-mm, LKML, Michal Hocko

On Wed, 5 May 2021 15:55:42 +0200 David Hildenbrand <david@redhat.com> wrote:

> 
> Reviewed-by: David Hildenbrand <david@redhat.com>

cc:stable?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Revert "mm/gup: check page posion status for coredump."
  2021-05-05 13:54 [PATCH] Revert "mm/gup: check page posion status for coredump." Michal Hocko
  2021-05-05 13:55 ` David Hildenbrand
@ 2021-05-06  5:47 ` Aili Yao
  2021-05-06  7:02   ` Michal Hocko
  1 sibling, 1 reply; 7+ messages in thread
From: Aili Yao @ 2021-05-06  5:47 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, David Hildenbrand, linux-mm, LKML, Michal Hocko,
	yaoaili126

On Wed, 5 May 2021 15:54:07 +0200
Michal Hocko <mhocko@kernel.org> wrote:

> From: Michal Hocko <mhocko@suse.com>
> 
> While reviewing http://lkml.kernel.org/r/20210429122519.15183-4-david@redhat.com
> I have crossed d3378e86d182 ("mm/gup: check page posion status for
> coredump.") and noticed that this patch is broken in two ways. First it
> doesn't really prevent hwpoison pages from being dumped because hwpoison
> pages can be marked asynchornously at any time after the check.

I rethink this:
There are two cases for this coredump panic issue.
One is the scenario that the hwpoison flag is set correctly, and the previous patch
will make it recoverable and avoid panic.

Another is the hwpoison flag not valid in the check, maybe race condition. I don't think
this case is worth and reliazable to be covered. As the SRAR can happen freshly in the dump
process and thus can't be detected.

And the previous patch doesn't make the Another case worse and unacceptable. just as it can't be
covered.

So here is the patch:
For most case in this topic, the patch will work. For the case hwpoison flag not valid, it will
fallback to the original process before this patch --- just panic. 

And i don't think we need to consider the minor case as you have said the posion can happen any time.

Thanks!
Aili Yao





^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Revert "mm/gup: check page posion status for coredump."
  2021-05-05 17:25   ` Andrew Morton
@ 2021-05-06  6:56     ` Michal Hocko
  0 siblings, 0 replies; 7+ messages in thread
From: Michal Hocko @ 2021-05-06  6:56 UTC (permalink / raw)
  To: Andrew Morton; +Cc: David Hildenbrand, Aili Yao, linux-mm, LKML

On Wed 05-05-21 10:25:47, Andrew Morton wrote:
> On Wed, 5 May 2021 15:55:42 +0200 David Hildenbrand <david@redhat.com> wrote:
> 
> > 
> > Reviewed-by: David Hildenbrand <david@redhat.com>
> 
> cc:stable?

This has been merged into 5.12-rc7 and it didn't have any Fixes tag nor
it was CCed for stable. But stable tree is quite pro-active blindly
backporting anything resembling a fix might just sneak it in. So I would
go and mark is for stable just in case.

Thanks!

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Revert "mm/gup: check page posion status for coredump."
  2021-05-06  5:47 ` Aili Yao
@ 2021-05-06  7:02   ` Michal Hocko
  2021-05-06  7:20     ` Aili Yao
  0 siblings, 1 reply; 7+ messages in thread
From: Michal Hocko @ 2021-05-06  7:02 UTC (permalink / raw)
  To: Aili Yao; +Cc: Andrew Morton, David Hildenbrand, linux-mm, LKML, yaoaili126

On Thu 06-05-21 13:47:50, Aili Yao wrote:
> On Wed, 5 May 2021 15:54:07 +0200
> Michal Hocko <mhocko@kernel.org> wrote:
> 
> > From: Michal Hocko <mhocko@suse.com>
> > 
> > While reviewing http://lkml.kernel.org/r/20210429122519.15183-4-david@redhat.com
> > I have crossed d3378e86d182 ("mm/gup: check page posion status for
> > coredump.") and noticed that this patch is broken in two ways. First it
> > doesn't really prevent hwpoison pages from being dumped because hwpoison
> > pages can be marked asynchornously at any time after the check.
> 
> I rethink this:
> There are two cases for this coredump panic issue.
> One is the scenario that the hwpoison flag is set correctly, and the previous patch
> will make it recoverable and avoid panic.
> 
> Another is the hwpoison flag not valid in the check, maybe race condition. I don't think
> this case is worth and reliazable to be covered. As the SRAR can happen freshly in the dump
> process and thus can't be detected.
> 
> And the previous patch doesn't make the Another case worse and unacceptable. just as it can't be
> covered.
> 
> So here is the patch:
> For most case in this topic, the patch will work. For the case hwpoison flag not valid, it will
> fallback to the original process before this patch --- just panic. 

Please propose a new fix which a) doesn't leak a page reference b)
evaluates how realistic is the scenario c) explain why any other gup
user doesn't really need to care - or in other words is the gup layer
really suitable for this issue?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Revert "mm/gup: check page posion status for coredump."
  2021-05-06  7:02   ` Michal Hocko
@ 2021-05-06  7:20     ` Aili Yao
  0 siblings, 0 replies; 7+ messages in thread
From: Aili Yao @ 2021-05-06  7:20 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Andrew Morton, David Hildenbrand, linux-mm, LKML, yaoaili126

On Thu, 6 May 2021 09:02:50 +0200
Michal Hocko <mhocko@suse.com> wrote:

> On Thu 06-05-21 13:47:50, Aili Yao wrote:
> > On Wed, 5 May 2021 15:54:07 +0200
> > Michal Hocko <mhocko@kernel.org> wrote:
> >   
> > > From: Michal Hocko <mhocko@suse.com>
> > > 
> > > While reviewing http://lkml.kernel.org/r/20210429122519.15183-4-david@redhat.com
> > > I have crossed d3378e86d182 ("mm/gup: check page posion status for
> > > coredump.") and noticed that this patch is broken in two ways. First it
> > > doesn't really prevent hwpoison pages from being dumped because hwpoison
> > > pages can be marked asynchornously at any time after the check.  
> > 
> > I rethink this:
> > There are two cases for this coredump panic issue.
> > One is the scenario that the hwpoison flag is set correctly, and the previous patch
> > will make it recoverable and avoid panic.
> > 
> > Another is the hwpoison flag not valid in the check, maybe race condition. I don't think
> > this case is worth and reliazable to be covered. As the SRAR can happen freshly in the dump
> > process and thus can't be detected.
> > 
> > And the previous patch doesn't make the Another case worse and unacceptable. just as it can't be
> > covered.
> > 
> > So here is the patch:
> > For most case in this topic, the patch will work. For the case hwpoison flag not valid, it will
> > fallback to the original process before this patch --- just panic.   
> 
> Please propose a new fix which a) doesn't leak a page reference b)
> evaluates how realistic is the scenario 

Got this, Thanks, I will dig into it and try to fix the leak. And There will be more comments on the
scenario that the issue will be triggered.

> c) explain why any other gup
> user doesn't really need to care - or in other words is the gup layer
> really suitable for this issue?

For SIGBUS coredump case, we will call the gup module for dump pages. For normal hwposion case, the gup module
will check the pte entry for hwpoison case, ans this issue is for another case for hwpoison. Maybe it's easy to
fix this issue in gup module.

Thanks!
Aili Yao


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-05-06  7:21 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-05 13:54 [PATCH] Revert "mm/gup: check page posion status for coredump." Michal Hocko
2021-05-05 13:55 ` David Hildenbrand
2021-05-05 17:25   ` Andrew Morton
2021-05-06  6:56     ` Michal Hocko
2021-05-06  5:47 ` Aili Yao
2021-05-06  7:02   ` Michal Hocko
2021-05-06  7:20     ` Aili Yao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).