All of lore.kernel.org
 help / color / mirror / Atom feed
From: Naoya Horiguchi <nao.horiguchi@gmail.com>
To: linux-mm@kvack.org, Tony Luck <tony.luck@intel.com>,
	Aili Yao <yaoaili@kingsoft.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Oscar Salvador <osalvador@suse.de>,
	David Hildenbrand <david@redhat.com>,
	Borislav Petkov <bp@alien8.de>, Andy Lutomirski <luto@kernel.org>,
	Naoya Horiguchi <naoya.horiguchi@nec.com>,
	Jue Wang <juew@google.com>,
	linux-kernel@vger.kernel.org
Subject: [PATCH v3 2/3] mm,hwpoison: return -EHWPOISON when page already
Date: Wed, 21 Apr 2021 09:57:27 +0900	[thread overview]
Message-ID: <20210421005728.1994268-3-nao.horiguchi@gmail.com> (raw)
In-Reply-To: <20210421005728.1994268-1-nao.horiguchi@gmail.com>

From: Aili Yao <yaoaili@kingsoft.com>

When the page is already poisoned, another memory_failure() call in the
same page now returns 0, meaning OK. For nested memory mce handling, this
behavior may lead to one mce looping, Example:

1. When LCME is enabled, and there are two processes A && B running on
different core X && Y separately, which will access one same page, then
the page corrupted when process A access it, a MCE will be rasied to
core X and the error process is just underway.

2. Then B access the page and trigger another MCE to core Y, it will also
do error process, it will see TestSetPageHWPoison be true, and 0 is
returned.

3. The kill_me_maybe will check the return:

    1244 static void kill_me_maybe(struct callback_head *cb)
    1245 {
    ...
    1254         if (!memory_failure(p->mce_addr >> PAGE_SHIFT, flags) &&
    1255             !(p->mce_kflags & MCE_IN_KERNEL_COPYIN)) {
    1256                 set_mce_nospec(p->mce_addr >> PAGE_SHIFT, p->mce_whole_page);
    1257                 sync_core();
    1258                 return;
    1259         }
    ...
    1267 }

4. The error process for B will end, and may nothing happened if
kill-early is not set, The process B will re-excute instruction and get
into mce again and then loop happens. And also the set_mce_nospec()
here is not proper, may refer to commit fd0e786d9d09 ("x86/mm,
mm/hwpoison: Don't unconditionally unmap kernel 1:1 pages").

For other cases which care the return value of memory_failure() should
check why they want to process a memory error which have already been
processed. This behavior seems reasonable.

Signed-off-by: Aili Yao <yaoaili@kingsoft.com>
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
---
 mm/memory-failure.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git v5.12-rc8/mm/memory-failure.c v5.12-rc8_patched/mm/memory-failure.c
index 4087308e4b32..39d0ff0339b9 100644
--- v5.12-rc8/mm/memory-failure.c
+++ v5.12-rc8_patched/mm/memory-failure.c
@@ -1228,7 +1228,7 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
 	if (TestSetPageHWPoison(head)) {
 		pr_err("Memory failure: %#lx: already hardware poisoned\n",
 		       pfn);
-		return 0;
+		return -EHWPOISON;
 	}
 
 	num_poisoned_pages_inc();
@@ -1437,6 +1437,7 @@ int memory_failure(unsigned long pfn, int flags)
 	if (TestSetPageHWPoison(p)) {
 		pr_err("Memory failure: %#lx: already hardware poisoned\n",
 			pfn);
+		res = -EHWPOISON;
 		goto unlock_mutex;
 	}
 
-- 
2.25.1


  parent reply	other threads:[~2021-04-21  0:57 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-21  0:57 [PATCH v3 0/3] mm,hwpoison: fix sending SIGBUS for Action Required MCE Naoya Horiguchi
2021-04-21  0:57 ` [PATCH v3 1/3] mm/memory-failure: Use a mutex to avoid memory_failure() races Naoya Horiguchi
2021-04-22 17:01   ` Borislav Petkov
2021-04-21  0:57 ` Naoya Horiguchi [this message]
2021-04-22 17:02   ` [PATCH v3 2/3] mm,hwpoison: return -EHWPOISON when page already Borislav Petkov
2021-04-23  2:17     ` HORIGUCHI NAOYA(堀口 直也)
2021-04-21  0:57 ` [PATCH v3 3/3] mm,hwpoison: add kill_accessing_process() to find error virtual address Naoya Horiguchi
2021-04-22 17:02   ` Borislav Petkov
2021-04-23  2:18     ` HORIGUCHI NAOYA(堀口 直也)
2021-04-23 11:57       ` Borislav Petkov
2021-04-26  8:23         ` HORIGUCHI NAOYA(堀口 直也)
2021-05-07  2:10 ` [PATCH v3 0/3] mm,hwpoison: fix sending SIGBUS for Action Required MCE Luck, Tony
2021-05-07  3:37   ` Luck, Tony
2021-05-07  5:24     ` HORIGUCHI NAOYA(堀口 直也)
2021-05-07 11:14       ` Aili Yao
2021-05-07 18:02         ` Luck, Tony
2021-05-08  2:38           ` Aili Yao
2021-05-10  3:28             ` Luck, Tony

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210421005728.1994268-3-nao.horiguchi@gmail.com \
    --to=nao.horiguchi@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=bp@alien8.de \
    --cc=david@redhat.com \
    --cc=juew@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=naoya.horiguchi@nec.com \
    --cc=osalvador@suse.de \
    --cc=tony.luck@intel.com \
    --cc=yaoaili@kingsoft.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.