All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oscar Salvador <osalvador@suse.de>
To: Aili Yao <yaoaili@kingsoft.com>
Cc: naoya.horiguchi@nec.com, akpm@linux-foundation.org,
	tony.luck@intel.com, bp@alien8.de, tglx@linutronix.de,
	mingo@redhat.com, hpa@zytor.com, x86@kernel.org,
	inux-edac@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, yangfeng1@kingsoft.com
Subject: Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned
Date: Wed, 24 Feb 2021 11:31:55 +0100	[thread overview]
Message-ID: <20210224103105.GA16368@linux> (raw)
In-Reply-To: <20210224151619.67c29731@alex-virtual-machine>

On Wed, Feb 24, 2021 at 03:16:19PM +0800, Aili Yao wrote:
> When the page is already poisoned, another memory_failure() call in the
> same page now return 0, meaning OK. For nested memory mce handling, this
> behavior may lead real serious problem, Example:

I have some questions:

> 1.When LCME is enabled, and there are two processes A && B running on
> different core X && Y separately, which will access one same page, then
> the page corrupted when process A access it, a MCE will be rasied to
> core X and the error process is just underway.

When !LMCE, that is not a problem because new MCE needs to wait for the ongoing MCE?

> 2.Then B access the page and trigger another MCE to core Y, it will also
> do error process, it will see TestSetPageHWPoison be true, and 0 is
> returned.

For non-nested calls, that is no problem because the page will be taken out
of business(unmapped from the processes), right? So no more MCE are possible.

> 
> 3.The kill_me_maybe will check the return:
> 
> 1244 static void kill_me_maybe(struct callback_head *cb)
> 1245 {
> 
> 1254         if (!memory_failure(p->mce_addr >> PAGE_SHIFT, flags) &&
> 1255             !(p->mce_kflags & MCE_IN_KERNEL_COPYIN)) {
> 1256                 set_mce_nospec(p->mce_addr >> PAGE_SHIFT,

So, IIUC, in case of a LMCE nested call, the second MCE will reach here.
set_mce_nospec() will either mark the underlying page as not mapped/cached.

Should not have memory_failure()->hwpoison_user_mappings() unmapped the page
from both process A and B? Or this is in case the ongoing MCE(process A) has
not still unmapped anything, so process B can still access this page.

So with your change, process B will be sent a SIGBUG, while process A is still
handling the MCE, right?

> p->mce_whole_page);
> 1257                 sync_core();
> 1258                 return;
> 1259         }
> 
> 1267 }
> 
> 4. The error process for B will end, and may nothing happened if
> kill-early is not set, We may let the wrong data go into effect.
> 
> For other cases which care the return value of memory_failure() should
> check why they want to process a memory error which have already been
> processed. This behavior seems reasonable.
> 
> In kill_me_maybe, log the fact about the memory may not recovered, and
> we will kill the related process.
> 
> Signed-off-by: Aili Yao <yaoaili@kingsoft.com>
> ---
>  arch/x86/kernel/cpu/mce/core.c | 2 ++
>  mm/memory-failure.c            | 4 ++--
>  2 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index e133ce1e562b..db4afc5bf15a 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -1259,6 +1259,8 @@ static void kill_me_maybe(struct callback_head *cb)
>  	}
>  
>  	if (p->mce_vaddr != (void __user *)-1l) {
> +		pr_err("Memory error may not recovered: %#lx: Sending SIGBUS to %s:%d due to hardware memory corruption\n",
> +			p->mce_addr >> PAGE_SHIFT, p->comm, p->pid);
>  		force_sig_mceerr(BUS_MCEERR_AR, p->mce_vaddr, PAGE_SHIFT);
>  	} else {
>  		pr_err("Memory error not recovered");
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index e9481632fcd1..06f006174b8c 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1224,7 +1224,7 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
>  	if (TestSetPageHWPoison(head)) {
>  		pr_err("Memory failure: %#lx: already hardware poisoned\n",
>  		       pfn);
> -		return 0;
> +		return -EBUSY;

As David said, madvise_inject_error() will start returning -EBUSY now in case
we madvise(MADV_HWPOISON) on an already hwpoisoned page.

AFAICS, memory_failure() can return 0, -Eerrors, and MF_XXX.
Would it make sense to unify that? That way we could declare error codes that
make somse sense (like MF_ALREADY_HWPOISONED).

-- 
Oscar Salvador
SUSE L3

  parent reply	other threads:[~2021-02-24 10:32 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-24  7:16 [PATCH] mm,hwpoison: return -EBUSY when page already poisoned Aili Yao
2021-02-24 10:10 ` David Hildenbrand
2021-02-24 10:31 ` Oscar Salvador [this message]
2021-02-25  3:43   ` Aili Yao
2021-02-25 11:28     ` HORIGUCHI NAOYA(堀口 直也)
2021-02-25 11:39       ` Oscar Salvador
2021-02-25 12:38         ` HORIGUCHI NAOYA(堀口 直也)
2021-02-25 18:15           ` Luck, Tony
2021-02-26  2:19             ` HORIGUCHI NAOYA(堀口 直也)
2021-02-26  2:59               ` Aili Yao
2021-03-03  3:39                 ` Luck, Tony
2021-03-03  3:57                   ` Aili Yao
2021-03-03  8:39                     ` Aili Yao
2021-03-03 15:41                       ` Luck, Tony
2021-03-04  2:16                         ` Aili Yao
2021-03-04  4:19                           ` Aili Yao
2021-03-04  6:45                             ` Aili Yao
2021-03-04 23:57                               ` Luck, Tony
2021-03-05  1:30                                 ` Aili Yao
2021-03-05  1:36                                   ` Aili Yao
2021-03-05 22:11                                     ` Luck, Tony
2021-03-08  6:45                                       ` HORIGUCHI NAOYA(堀口 直也)
2021-03-08 18:54                                         ` Luck, Tony
2021-03-08 22:38                                           ` HORIGUCHI NAOYA(堀口 直也)
2021-03-08 22:55                                             ` [PATCH] mm/memory-failure: Use a mutex to avoid memory_failure() races Luck, Tony
2021-03-08 23:42                                               ` HORIGUCHI NAOYA(堀口 直也)
2021-03-09  2:04                                               ` Aili Yao
2021-03-09  6:04                                                 ` HORIGUCHI NAOYA(堀口 直也)
2021-03-09  6:35                                                   ` [PATCH v2] mm,hwpoison: return -EBUSY when page already poisoned Aili Yao
2021-03-09  8:28                                                     ` HORIGUCHI NAOYA(堀口 直也)
2021-03-09 20:01                                                       ` Luck, Tony
2021-03-10  8:05                                                         ` HORIGUCHI NAOYA(堀口 直也)
2021-03-13  1:55                                                         ` Jue Wang
2021-03-13  1:55                                                           ` Jue Wang
2021-03-10  8:01                                                       ` Aili Yao
2021-03-31 11:25                                                     ` [PATCH v3] mm,hwpoison: return -EHWPOISON " Aili Yao
2021-04-01 15:33                                                       ` Luck, Tony
2021-04-02  1:18                                                         ` Aili Yao
2021-04-02 15:11                                                           ` Luck, Tony
2021-04-05 13:50                                                             ` HORIGUCHI NAOYA(堀口 直也)
2021-04-06  1:04                                                               ` Aili Yao
2021-03-09  6:38                                                   ` [PATCH] mm/memory-failure: Use a mutex to avoid memory_failure() races Aili Yao
2021-03-05 15:55                                   ` [PATCH] mm,hwpoison: return -EBUSY when page already poisoned Luck, Tony
2021-03-10  6:10                                     ` Aili Yao
2021-03-11  8:55                                       ` HORIGUCHI NAOYA(堀口 直也)
2021-03-11 11:23                                         ` Aili Yao
2021-03-11 17:05                                         ` Luck, Tony
2021-03-12  5:55                                           ` Aili Yao
2021-03-12 16:29                                             ` Luck, Tony
2021-03-12 23:48                                               ` Luck, Tony
2021-03-16  6:42                                                 ` HORIGUCHI NAOYA(堀口 直也)
2021-03-16  7:54                                                   ` Aili Yao
2021-03-17  0:29                                                 ` Luck, Tony
2021-03-17  9:07                                                   ` Aili Yao
2021-03-17  7:48                                         ` Aili Yao
2021-03-17  8:23                                           ` Aili Yao
2021-02-26  3:26               ` Tony Luck
2021-02-26  3:26                 ` Tony Luck
2021-02-26  2:52         ` Aili Yao
2021-02-26 17:58           ` Luck, Tony
2021-03-02  4:32             ` Aili Yao
2021-03-31 10:56         ` Aili Yao
2021-03-31 10:58           ` David Hildenbrand
2021-03-01 23:21 Jue Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210224103105.GA16368@linux \
    --to=osalvador@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=bp@alien8.de \
    --cc=hpa@zytor.com \
    --cc=inux-edac@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mingo@redhat.com \
    --cc=naoya.horiguchi@nec.com \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@intel.com \
    --cc=x86@kernel.org \
    --cc=yangfeng1@kingsoft.com \
    --cc=yaoaili@kingsoft.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.