All of lore.kernel.org
 help / color / mirror / Atom feed
From: Naoya Horiguchi <nao.horiguchi@gmail.com>
To: linux-mm@kvack.org, Tony Luck <tony.luck@intel.com>,
	Aili Yao <yaoaili@kingsoft.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Oscar Salvador <osalvador@suse.de>,
	David Hildenbrand <david@redhat.com>,
	Borislav Petkov <bp@alien8.de>, Andy Lutomirski <luto@kernel.org>,
	Naoya Horiguchi <naoya.horiguchi@nec.com>,
	linux-kernel@vger.kernel.org
Subject: [PATCH v1 2/3] mm,hwpoison: return -EHWPOISON when page already
Date: Tue, 13 Apr 2021 07:43:19 +0900	[thread overview]
Message-ID: <20210412224320.1747638-3-nao.horiguchi@gmail.com> (raw)
In-Reply-To: <20210412224320.1747638-1-nao.horiguchi@gmail.com>

From: Aili Yao <yaoaili@kingsoft.com>

When the page is already poisoned, another memory_failure() call in the
same page now returns 0, meaning OK. For nested memory mce handling, this
behavior may lead to one mce looping, Example:

1. When LCME is enabled, and there are two processes A && B running on
different core X && Y separately, which will access one same page, then
the page corrupted when process A access it, a MCE will be rasied to
core X and the error process is just underway.

2. Then B access the page and trigger another MCE to core Y, it will also
do error process, it will see TestSetPageHWPoison be true, and 0 is
returned.

3. The kill_me_maybe will check the return:

    1244 static void kill_me_maybe(struct callback_head *cb)
    1245 {
    ...
    1254         if (!memory_failure(p->mce_addr >> PAGE_SHIFT, flags) &&
    1255             !(p->mce_kflags & MCE_IN_KERNEL_COPYIN)) {
    1256                 set_mce_nospec(p->mce_addr >> PAGE_SHIFT, p->mce_whole_page);
    1257                 sync_core();
    1258                 return;
    1259         }
    ...
    1267 }

4. The error process for B will end, and may nothing happened if
kill-early is not set, The process B will re-excute instruction and get
into mce again and then loop happens. And also the set_mce_nospec()
here is not proper, may refer to commit fd0e786d9d09 ("x86/mm,
mm/hwpoison: Don't unconditionally unmap kernel 1:1 pages").

For other cases which care the return value of memory_failure() should
check why they want to process a memory error which have already been
processed. This behavior seems reasonable.

Signed-off-by: Aili Yao <yaoaili@kingsoft.com>
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
---
 mm/memory-failure.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git v5.12-rc5/mm/memory-failure.c v5.12-rc5_patched/mm/memory-failure.c
index c1509f4b565e..368ef77e01f9 100644
--- v5.12-rc5/mm/memory-failure.c
+++ v5.12-rc5_patched/mm/memory-failure.c
@@ -1228,7 +1228,7 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
 	if (TestSetPageHWPoison(head)) {
 		pr_err("Memory failure: %#lx: already hardware poisoned\n",
 		       pfn);
-		return 0;
+		return -EHWPOISON;
 	}
 
 	num_poisoned_pages_inc();
@@ -1438,7 +1438,7 @@ int memory_failure(unsigned long pfn, int flags)
 		pr_err("Memory failure: %#lx: already hardware poisoned\n",
 			pfn);
 		mutex_unlock(&mf_mutex);
-		return 0;
+		return -EHWPOISON;
 	}
 
 	orig_head = hpage = compound_head(p);
-- 
2.25.1


  parent reply	other threads:[~2021-04-12 22:43 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-12 22:43 [PATCH v1 0/3] mm,hwpoison: fix sending SIGBUS for Action Required MCE Naoya Horiguchi
2021-04-12 22:43 ` [PATCH v1 1/3] mm/memory-failure: Use a mutex to avoid memory_failure() races Naoya Horiguchi
2021-04-19 17:05   ` Borislav Petkov
2021-04-20  7:46     ` HORIGUCHI NAOYA(堀口 直也)
2021-04-20 10:16       ` Borislav Petkov
2021-04-21  0:57         ` HORIGUCHI NAOYA(堀口 直也)
2021-04-12 22:43 ` Naoya Horiguchi [this message]
2021-04-12 22:43 ` [PATCH v1 3/3] mm,hwpoison: add kill_accessing_process() to find error virtual address Naoya Horiguchi
2021-04-17  5:47 ` [PATCH v1 0/3] mm,hwpoison: fix sending SIGBUS for Action Required MCE Aili Yao
2021-04-19  1:09   ` HORIGUCHI NAOYA(堀口 直也)
2021-04-19  2:36     ` [PATCH v2 3/3] mm,hwpoison: add kill_accessing_process() to find error virtual address Naoya Horiguchi
2021-04-19  3:43       ` Aili Yao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210412224320.1747638-3-nao.horiguchi@gmail.com \
    --to=nao.horiguchi@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=bp@alien8.de \
    --cc=david@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=naoya.horiguchi@nec.com \
    --cc=osalvador@suse.de \
    --cc=tony.luck@intel.com \
    --cc=yaoaili@kingsoft.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.