From: David Hildenbrand <david@redhat.com>
To: Aili Yao <yaoaili@kingsoft.com>,
	Matthew Wilcox <willy@infradead.org>,
	akpm@linux-foundation.org, naoya.horiguchi@nec.com
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	yangfeng1@kingsoft.com, sunhao2@kingsoft.com,
	Oscar Salvador <osalvador@suse.de>,
	Mike Kravetz <mike.kravetz@oracle.com>
Subject: Re: [PATCH v5] mm/gup: check page hwposion status for coredump.
Date: Fri, 26 Mar 2021 15:22:49 +0100
Message-ID: <f316ca3b-6f09-c51d-9661-66171f14ee33@redhat.com>
In-Reply-To: <afeac310-c6aa-f9d8-6c90-e7e7f21ddf9a@redhat.com>

On 26.03.21 15:09, David Hildenbrand wrote:
> On 22.03.21 12:33, Aili Yao wrote:
>> When we dump core for a user process because of a signal, the signal may
>> be a SIGBUS with code BUS_MCEERR_AR or BUS_MCEERR_AO, which means it
>> resulted from an ECC memory failure such as SRAR or SRAO. We expect the
>> memory recovery work to have finished correctly, so that get_dump_page()
>> will not return the error page, because memory_failure() has set the
>> process's pte invalid.
>>
>> But memory_failure() may fail, and the process's related pte may not be
>> correctly set invalid. With the current code, we then return the poisoned
>> page and dump it, which leads to a system panic because the access
>> happens in kernel code.
>>
>> So check the hwpoison status in get_dump_page(), and return NULL if it
>> is set.
>>
>> There may be other scenarios where it is also better to check the
>> hwpoison status and not panic, so make a wrapper for this check. Thanks
>> to David (<david@redhat.com>) for the suggestion.
>>
>> Link: https://lkml.kernel.org/r/20210319104437.6f30e80d@alex-virtual-machine
>> Signed-off-by: Aili Yao <yaoaili@kingsoft.com>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: Matthew Wilcox <willy@infradead.org>
>> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
>> Cc: Oscar Salvador <osalvador@suse.de>
>> Cc: Mike Kravetz <mike.kravetz@oracle.com>
>> Cc: Aili Yao <yaoaili@kingsoft.com>
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>> ---
>>    mm/gup.c      |  4 ++++
>>    mm/internal.h | 20 ++++++++++++++++++++
>>    2 files changed, 24 insertions(+)
>>
>> diff --git a/mm/gup.c b/mm/gup.c
>> index e4c224c..6f7e1aa 100644
>> --- a/mm/gup.c
>> +++ b/mm/gup.c
>> @@ -1536,6 +1536,10 @@ struct page *get_dump_page(unsigned long addr)
>>    				      FOLL_FORCE | FOLL_DUMP | FOLL_GET);
>>    	if (locked)
>>    		mmap_read_unlock(mm);
> 
> Thinking again, wouldn't we get -EFAULT from __get_user_pages_locked()
> when stumbling over a hwpoisoned page?
> 
> See __get_user_pages_locked()->__get_user_pages()->faultin_page():
> 
> handle_mm_fault()->vm_fault_to_errno(), which translates
> VM_FAULT_HWPOISON to -EFAULT, unless FOLL_HWPOISON is set (-> -EHWPOISON)
> 
> ?

Or does that not happen in the case you describe, "But memory_failure()
may fail, and the process's related pte may not be correctly set
invalid"? If so, why does that happen?

On a related note, shouldn't get_user_pages() never return a page that
has HWPoison set? E.g., for existing PTEs, also check whether the page is
hwpoisoned?
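
For illustration, the errno translation described in the quoted call chain
can be modeled in plain userspace C. This is only a sketch of the behavior
stated above (VM_FAULT_HWPOISON becomes -EFAULT unless FOLL_HWPOISON is set,
in which case it becomes -EHWPOISON). It is not the kernel's code: every
MODEL_* name and numeric value below is a placeholder made up for this
example, and the function name model_vm_fault_to_errno() is hypothetical.

```c
/*
 * Userspace model of the hwpoison branch of the fault-to-errno
 * translation discussed above. All MODEL_* names and values are
 * illustrative placeholders, not the kernel's definitions.
 */
#define MODEL_VM_FAULT_HWPOISON  0x4u    /* fault handler hit a poisoned page */
#define MODEL_FOLL_HWPOISON      0x100u  /* caller asked for -EHWPOISON */

#define MODEL_EFAULT     14
#define MODEL_EHWPOISON  133

/*
 * Return a negative errno for a hwpoison fault result, or 0 when the
 * fault result carries no hwpoison bit.
 */
static int model_vm_fault_to_errno(unsigned int vm_fault,
				   unsigned int foll_flags)
{
	if (vm_fault & MODEL_VM_FAULT_HWPOISON)
		return (foll_flags & MODEL_FOLL_HWPOISON) ?
			-MODEL_EHWPOISON : -MODEL_EFAULT;
	return 0;
}
```

Note that this translation can only apply when a fault is actually taken.
If the PTE is still present, as in the memory_failure() failure case
described in the patch, the page would presumably be found without going
through the fault path at all, which is what the question about existing
PTEs is getting at.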

@Naoya, Oscar

-- 
Thanks,

David / dhildenb

