From: Aili Yao <yaoaili@kingsoft.com>
To: "HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>
Cc: "Luck, Tony" <tony.luck@intel.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	Oscar Salvador <osalvador@suse.de>,
	"david@redhat.com" <david@redhat.com>,
	"bp@alien8.de" <bp@alien8.de>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"hpa@zytor.com" <hpa@zytor.com>,
	"x86@kernel.org" <x86@kernel.org>,
	"linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"yangfeng1@kingsoft.com" <yangfeng1@kingsoft.com>,
	<sunhao2@kingsoft.com>, <yaoaili@kingsoft.com>
Subject: Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned
Date: Wed, 17 Mar 2021 16:23:04 +0800
Message-ID: <20210317162304.58ff188c@alex-virtual-machine>
In-Reply-To: <20210317154812.4173f423@alex-virtual-machine>


> > 
> > Returning true means you stop walking when you find the first entry pointing
> > to a given pfn. But there could be multiple such entries, so if MCE SRAR is
> > triggered by memory access to the larger address in hwpoisoned entries, the
> > returned virtual address might be wrong.
> >   
> 
> I can't find a way to fix this. Maybe the virtual address is held in a
> related register, but this is really beyond my knowledge.
> 
> This is a v2 RFC patch that adds support for THP and 1G huge page errors.
> 

Sorry for the debug info and other unclean modifications.

Here is a clean version.

Thanks
Aili Yao

From 2289276ba943cdcddbf3b5b2cdbcaff78690e2e8 Mon Sep 17 00:00:00 2001
From: Aili Yao <yaoaili@kingsoft.com>
Date: Wed, 17 Mar 2021 16:12:41 +0800
Subject: [PATCH] fix invalid SIGBUS address for recovery fail

Walk the current process's page tables and compare each entry with the
poisoned pfn to get the user virtual address and the related page shift.

For THP, only anonymous THP pages can be split, so I think there is no
race between the walk and the search for the THP error page in that case.
For non-anonymous THP, the page flags and the pte will not be changed. So
when the code reaches this point, either we raced on a non-anonymous THP
page or recovery failed for an anonymous THP, and in both cases the page
status will not change. I am not so sure about this.

If the related virtual address cannot be found, sending a BUS_MCEERR_AR
signal with the invalid address NULL may be the better option, but I am
not sure.

This may also return the wrong virtual address if the process has multiple
entries for the same page. I have not found a way to get this right.

There may be other issues that I have not recognized.
---
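To illustrate the multiple-entries concern from the commit message, here is
a minimal userspace sketch. It is not kernel code and not part of the patch:
struct toy_mapping and collect_vaddrs_for_pfn() are hypothetical names, and
a flat array stands in for the page tables. It only shows that a walk which
stops at the first match cannot tell which of several mappings was actually
accessed, while a walk that keeps going can at least report all candidates.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one page table entry: vaddr -> pfn. */
struct toy_mapping {
	uint64_t vaddr;
	uint64_t pfn;
};

/*
 * Store up to max_out virtual addresses that map target_pfn in out[].
 * Returns the total number of matches, which may exceed max_out.
 * A first-match walk would correspond to returning after out[0].
 */
static size_t collect_vaddrs_for_pfn(const struct toy_mapping *maps, size_t n,
				     uint64_t target_pfn,
				     uint64_t *out, size_t max_out)
{
	size_t found = 0;

	for (size_t i = 0; i < n; i++) {
		if (maps[i].pfn != target_pfn)
			continue;
		if (found < max_out)
			out[found] = maps[i].vaddr;
		found++;
	}
	return found;
}
```

With mappings { 0x1000 -> 7, 0x2000 -> 9, 0x5000 -> 7 } and target pfn 7,
the function reports two candidates, 0x1000 and 0x5000. A first-match walk
would always report 0x1000, even if the faulting access went through 0x5000.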
 arch/x86/kernel/cpu/mce/core.c |  12 +---
 include/linux/mm.h             |   1 +
 mm/memory-failure.c            | 127 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 131 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index db4afc5..4cb873c 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1246,7 +1246,7 @@ static void kill_me_maybe(struct callback_head *cb)
 	struct task_struct *p = container_of(cb, struct task_struct, mce_kill_me);
 	int flags = MF_ACTION_REQUIRED;
 
-	pr_err("Uncorrected hardware memory error in user-access at %llx", p->mce_addr);
+	pr_err("Uncorrected hardware memory error in user-access at %llx\n", p->mce_addr);
 
 	if (!p->mce_ripv)
 		flags |= MF_MUST_KILL;
@@ -1258,14 +1258,8 @@ static void kill_me_maybe(struct callback_head *cb)
 		return;
 	}
 
-	if (p->mce_vaddr != (void __user *)-1l) {
-		pr_err("Memory error may not recovered: %#lx: Sending SIGBUS to %s:%d due to hardware memory corruption\n",
-			p->mce_addr >> PAGE_SHIFT, p->comm, p->pid);
-		force_sig_mceerr(BUS_MCEERR_AR, p->mce_vaddr, PAGE_SHIFT);
-	} else {
-		pr_err("Memory error not recovered");
-		kill_me_now(cb);
-	}
+	memory_failure_error(current, p->mce_addr >> PAGE_SHIFT);
+
 }
 
 static void queue_task_work(struct mce *m, int kill_current_task)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index ecdf8a8..cff2f02 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3046,6 +3046,7 @@ enum mf_flags {
 	MF_SOFT_OFFLINE = 1 << 3,
 };
 extern int memory_failure(unsigned long pfn, int flags);
+extern void memory_failure_error(struct task_struct *p, unsigned long pfn);
 extern void memory_failure_queue(unsigned long pfn, int flags);
 extern void memory_failure_queue_kick(int cpu);
 extern int unpoison_memory(unsigned long pfn);
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 06f0061..359b42f 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -56,6 +56,7 @@
 #include <linux/kfifo.h>
 #include <linux/ratelimit.h>
 #include <linux/page-isolation.h>
+#include <linux/pagewalk.h>
 #include "internal.h"
 #include "ras/ras_event.h"
 
@@ -1553,6 +1554,132 @@ int memory_failure(unsigned long pfn, int flags)
 }
 EXPORT_SYMBOL_GPL(memory_failure);
 
+static int pte_range(pte_t *ptep, unsigned long addr, unsigned long next, struct mm_walk *walk)
+{
+	u64 *buff = (u64 *)walk->private;
+	u64 pfn = buff[0];
+	pte_t pte = *ptep;
+
+	if (!pte_none(pte) && !pte_present(pte)) {
+		swp_entry_t swp_temp = pte_to_swp_entry(pte);
+
+		if (is_hwpoison_entry(swp_temp) && swp_offset(swp_temp) == pfn)
+			goto find;
+	} else if (pte_pfn(pte) == pfn) {
+		goto find;
+	}
+
+	return 0;
+
+find:
+	buff[0] = addr;
+	buff[1] = PAGE_SHIFT;
+	return true;
+}
+
+static int pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
+			     struct mm_walk *walk)
+{
+	u64 *buff = (u64 *)walk->private;
+	struct page *page = (struct page *)buff[0];
+	u64 pfn = buff[1];
+	pmd_t pmd = *pmdp;
+
+	if (likely(!pmd_trans_huge(pmd)))
+		return 0;
+
+	if (pmd_none(pmd) || !pmd_present(pmd))
+		return 0;
+
+	if (pmd_page(pmd) != page)
+		return 0;
+
+	for (; addr != end; page++, addr += PAGE_SIZE) {
+		if (page_to_pfn(page) == pfn) {
+			buff[0] = addr;
+			buff[1] = PAGE_SHIFT;
+			return true;
+		}
+	}
+
+	return 0;
+}
+
+static int hugetlb_range(pte_t *ptep, unsigned long hmask,
+				 unsigned long addr, unsigned long end,
+				 struct mm_walk *walk)
+{
+	u64 *buff = (u64 *)walk->private;
+	u64 pfn = buff[0];
+	pte_t pte =  huge_ptep_get(ptep);
+	struct page *page = pfn_to_page(pfn);
+
+	if (!huge_pte_none(pte) && !pte_present(pte)) {
+		swp_entry_t swp_temp = pte_to_swp_entry(pte);
+
+		if (is_hwpoison_entry(swp_temp) && swp_offset(swp_temp) == pfn)
+			goto find;
+	} else if (pte_pfn(pte) == pfn) {
+		goto find;
+	}
+
+	return 0;
+
+find:
+	buff[0] = addr;
+	buff[1] = (huge_page_size(page_hstate(page)) > PMD_SIZE) ? PUD_SHIFT : PMD_SHIFT;
+	return true;
+}
+
+void memory_failure_error(struct task_struct *p, unsigned long pfn)
+{
+	u64 buff[2] = {0};
+	struct page *page;
+	int ret = -1;
+	struct mm_walk_ops walk = {0};
+
+	if (p->mce_vaddr != (void __user *)-1l && p->mce_vaddr != (void __user *)0) {
+		force_sig_mceerr(BUS_MCEERR_AR, p->mce_vaddr, PAGE_SHIFT);
+		return;
+	}
+
+	if (!pfn_valid(pfn))
+		goto force_sigbus;
+	page = pfn_to_page(pfn);
+
+	if (is_zone_device_page(page))
+		goto force_sigbus;
+
+	page = compound_head(page);
+
+	if (PageHuge(page)) {
+		walk.hugetlb_entry = hugetlb_range;
+		buff[0] = page_to_pfn(page);
+	} else if (PageTransHuge(page)) {
+		walk.pmd_entry = pmd_range;
+		buff[0] = (u64)page;
+		buff[1] = pfn;
+	} else {
+		walk.pte_entry = pte_range;
+		buff[0] = pfn;
+	}
+
+	mmap_read_lock(p->mm);
+	ret = walk_page_range(p->mm, 0, TASK_SIZE_MAX, &walk, (void *)buff);
+	mmap_read_unlock(p->mm);
+
+	pr_err("Memory error may not be recovered: %#lx: Sending SIGBUS to %s:%d due to hardware memory corruption\n",
+	       pfn, p->comm, p->pid);
+
+	if (ret) {
+		force_sig_mceerr(BUS_MCEERR_AR, (void __user *)buff[0], buff[1]);
+	} else {
+force_sigbus:
+		force_sig_mceerr(BUS_MCEERR_AR, (void __user *)0, PAGE_SHIFT);
+	}
+}
+EXPORT_SYMBOL_GPL(memory_failure_error);
+
 #define MEMORY_FAILURE_FIFO_ORDER	4
 #define MEMORY_FAILURE_FIFO_SIZE	(1 << MEMORY_FAILURE_FIFO_ORDER)
 
-- 
1.8.3.1
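
For context on why the address and shift passed to force_sig_mceerr()
matter, here is a rough sketch of the receiving side in userspace. It
assumes Linux with glibc, where siginfo_t exposes si_addr and si_addr_lsb
for SIGBUS with si_code BUS_MCEERR_AR or BUS_MCEERR_AO. The handler and
variable names are made up for illustration, and the handler only records
what it was told.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Last machine-check SIGBUS seen; illustrative names, not an existing API. */
static volatile sig_atomic_t mce_seen;
static void *volatile mce_addr;
static volatile unsigned long mce_len;

/*
 * SA_SIGINFO handler: record the reported address and the size of the
 * affected range implied by si_addr_lsb (12 for a 4K page, 21 for a 2M
 * mapping and 30 for a 1G mapping on x86-64).
 * Note: a real handler for BUS_MCEERR_AR would normally not just return,
 * because the faulting access would be retried.
 */
static void mce_sigbus_handler(int sig, siginfo_t *si, void *ctx)
{
	(void)sig;
	(void)ctx;

	if (si->si_code != BUS_MCEERR_AR && si->si_code != BUS_MCEERR_AO)
		return;

	mce_addr = si->si_addr;
	mce_len = 1UL << si->si_addr_lsb;
	mce_seen = 1;
}

/* Install the handler; returns 0 on success and -1 on failure. */
static int install_mce_sigbus_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = mce_sigbus_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGBUS, &sa, NULL);
}
```

If the kernel reports address 0 with PAGE_SHIFT, as the fallback path in
this patch does, a handler like this sees mce_addr == NULL and cannot tell
which of its pages was lost, which is why finding the real address matters.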

