From: "HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>
To: "Luck, Tony" <tony.luck@intel.com>
Cc: Aili Yao <yaoaili@kingsoft.com>,
	Oscar Salvador <osalvador@suse.de>,
	"david@redhat.com" <david@redhat.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"bp@alien8.de" <bp@alien8.de>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"hpa@zytor.com" <hpa@zytor.com>,
	"x86@kernel.org" <x86@kernel.org>,
	"linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"yangfeng1@kingsoft.com" <yangfeng1@kingsoft.com>
Subject: Re: [PATCH v2] mm,hwpoison: return -EBUSY when page already poisoned
Date: Wed, 10 Mar 2021 08:05:16 +0000
Message-ID: <20210310080515.GA23187@hori.linux.bs1.fc.nec.co.jp>
In-Reply-To: <20210309200140.GA237657@agluck-desk2.amr.corp.intel.com>

On Tue, Mar 09, 2021 at 12:01:40PM -0800, Luck, Tony wrote:
> On Tue, Mar 09, 2021 at 08:28:24AM +0000, HORIGUCHI NAOYA(堀口 直也) wrote:
> > On Tue, Mar 09, 2021 at 02:35:34PM +0800, Aili Yao wrote:
> > > When the page is already poisoned, another memory_failure() call on the
> > > same page now returns 0, meaning OK. For nested memory MCE handling, this
> > > behavior may lead to MCE looping. Example:
> > > 
> > > 1. When LMCE is enabled, and two processes A and B running on
> > > different cores X and Y both access the same page, the page is found
> > > corrupted when process A accesses it; an MCE is raised on core X and
> > > error handling gets underway.
> > > 
> > > 2. Then B accesses the page and triggers another MCE on core Y. It also
> > > does error handling, sees TestSetPageHWPoison return true, and 0 is
> > > returned.
> > > 
> > > 3. kill_me_maybe() checks the return value:
> > > 
> > > static void kill_me_maybe(struct callback_head *cb)
> > > {
> > > 	...
> > > 	if (!memory_failure(p->mce_addr >> PAGE_SHIFT, flags) &&
> > > 	    !(p->mce_kflags & MCE_IN_KERNEL_COPYIN)) {
> > > 		set_mce_nospec(p->mce_addr >> PAGE_SHIFT, p->mce_whole_page);
> > > 		sync_core();
> > > 		return;
> > > 	}
> > > 	...
> > > }
> > > 
> > > 4. The error handling for B ends, and nothing may happen if
> > > kill-early is not set. Process B re-executes the instruction, gets
> > > into the MCE again, and the loop begins. Also, the set_mce_nospec()
> > > call here is not proper; see commit fd0e786d9d09 ("x86/mm,
> > > mm/hwpoison: Don't unconditionally unmap kernel 1:1 pages").
> > > 
> > > Other callers that care about the return value of memory_failure() should
> > > check why they want to process a memory error which has already been
> > > processed. This behavior seems reasonable.
> > 
> > Other reviewers shared ideas about the returned value, but actually
> > I'm not sure which one is best (EBUSY is not that bad).
> > What we need to fix the reported issue is to return a non-zero value
> > for the "already poisoned" case (the value itself is not so important).
> > 
> > Other callers of memory_failure() (mostly test programs) could see
> > the change of return value, but they can already see EBUSY now, and
> > anyway they should check dmesg for more detail about why it failed,
> > so the impact of the change is not so big.
> > 
> > > 
> > > Signed-off-by: Aili Yao <yaoaili@kingsoft.com>
> > 
> > Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> 
> I think that this and my "add a mutex" patch are both
> too simplistic for this complex problem :-(
> 
> When multiple CPUs race to call memory_failure() for the same
> page we need the following results:
> 
> 1) Poison page should be marked not-present in all tasks
> 	I think the mutex patch achieves this as long as
> 	memory_failure() doesn't hit an error[1].

My assumption is that reserved kernel pages are not supposed to be mapped into
any process, so once memory_failure() judges a page to be one, we never convert
any page table entry into a hwpoison entry. Is that correct?  So my question is
why a user-mapped page was judged to be a "reserved kernel page".  Does futex
allow such a situation?

I tried some test cases combining futex and hwpoison, but I couldn't
reproduce the "reserved kernel page" case.  If possible, could you give me
a little more detail about your test case?

> 
> 2) All tasks that were executing an instruction that was accessing
>    the poison location should see a SIGBUS with virtual address and
>    BUS_MCEERR_AR signature in siginfo.
> 	Neither solution achieves this. The -EBUSY return ensures
> 	that there is a SIGBUS for the tasks that get the -EBUSY
> 	return, but no siginfo details.

Yes, that's not perfect yet, but avoiding the MCE loop is progress.
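
To make the difference concrete, here is a small userspace model of the
decision in kill_me_maybe() quoted above. It is only a sketch, not kernel
code: every model_* name is made up for illustration, and it assumes the
first caller's recovery succeeds.

```c
#include <errno.h>
#include <stdbool.h>

/* Userspace model only: none of these are kernel functions. */
struct model_page {
	bool hwpoison;	/* stands in for the PageHWPoison flag */
};

/* Stand-in for TestSetPageHWPoison(): set the flag, return the old value. */
static bool model_test_set_hwpoison(struct model_page *page)
{
	bool was_set = page->hwpoison;

	page->hwpoison = true;
	return was_set;
}

/*
 * Stand-in for memory_failure(). already_poisoned_ret is what the
 * "already poisoned" path returns: 0 (current behaviour) or -EBUSY (patch).
 */
static int model_memory_failure(struct model_page *page,
				int already_poisoned_ret)
{
	if (model_test_set_hwpoison(page))
		return already_poisoned_ret;
	/* First caller: assume unmapping and recovery succeed. */
	return 0;
}

enum model_outcome {
	MODEL_RESUME,	/* return to user space and re-execute the access */
	MODEL_SIGBUS,	/* task gets a plain SIGBUS without siginfo details */
};

/* Stand-in for the return-value check in kill_me_maybe(). */
static enum model_outcome model_kill_me_maybe(struct model_page *page,
					      int already_poisoned_ret)
{
	if (!model_memory_failure(page, already_poisoned_ret))
		return MODEL_RESUME;
	return MODEL_SIGBUS;
}
```

Calling model_kill_me_maybe() twice on the same page with 0 as the
"already poisoned" value gives MODEL_RESUME both times, which is the looping
case for the race loser. With -EBUSY the second call gives MODEL_SIGBUS.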

> 	Just the mutex patch *might* have BUS_MCEERR_AO signature
> 	to the race losing tasks, but only if they have PF_MCE_EARLY
> 	set (so says the comment in kill_proc() ... but I don't
> 	see the code checking for that bit).

Commit 30c9cf49270 might explain this: since that commit, task_early_kill()
calls find_early_kill_thread() (which checks PF_MCE_EARLY) in this case.

> 
> #2 seems hard to achieve ... there are inherent races that mean the
> AO SIGBUS could have been queued to the task before it even hits
> the poison.

So I feel that we might want some change in memory_failure() to send
SIGBUS(BUS_MCEERR_AR) to "race losing tasks" within the new mutex.
I agree that how we find the error address is also a problem.
For now, I still have no better idea than a page table walk.
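
Just to illustrate what such a walk has to produce, here is a toy userspace
model. It is not a real page table walk: struct toy_mapping and
toy_find_vaddrs() are hypothetical, and a task's mappings are flattened into
an array of (virtual address, pfn) pairs.

```c
#include <stddef.h>

/* Toy model: one entry per mapped virtual page of a task. */
struct toy_mapping {
	unsigned long vaddr;	/* page-aligned virtual address */
	unsigned long pfn;	/* physical frame that it maps */
};

/*
 * Scan a task's mappings for poisoned_pfn and store up to max matching
 * virtual addresses in out. Returns the total number of matches, which
 * can exceed max when the page is mapped more often than out can hold.
 */
static size_t toy_find_vaddrs(const struct toy_mapping *maps, size_t nr_maps,
			      unsigned long poisoned_pfn,
			      unsigned long *out, size_t max)
{
	size_t found = 0;

	for (size_t i = 0; i < nr_maps; i++) {
		if (maps[i].pfn != poisoned_pfn)
			continue;
		if (found < max)
			out[found] = maps[i].vaddr;
		found++;
	}
	return found;
}
```

A return value greater than 1 corresponds to a task that has the same page
mapped multiple times, so the walk alone cannot tell which address the task
was actually accessing.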

> 
> Maybe should include a non-action:
> 
> 3) A task should only see one SIGBUS per poison?
> 	Not sure if this is achievable either ... what if the task
> 	has the same page mapped multiple times?

My thought is that hwpoison-aware applications could have a dedicated thread
for SIGBUS handling, so it's better for them to be prepared for multiple
signals for the same error event.
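
For example, an application could remember which pages it has already been
told about and ignore repeats. The following is only a sketch of that
bookkeeping: classify_sigbus(), enum mce_report and MAX_SEEN are names made
up here, and the page size is passed in by the caller. It would be called
with the siginfo_t that a SIGBUS handler installed with SA_SIGINFO receives.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <signal.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SEEN 64	/* arbitrary capacity chosen for this sketch */

enum mce_report {
	MCE_NOT_MEMORY_ERROR,	/* SIGBUS with some other si_code */
	MCE_DUPLICATE,		/* page was already reported */
	MCE_NEW_PAGE,		/* first report for this page */
};

static uintptr_t seen_pages[MAX_SEEN];
static size_t nr_seen;

/*
 * Classify one SIGBUS report. page_size must be a power of two.
 * Pages beyond MAX_SEEN are not remembered and are reported as new
 * every time.
 */
static enum mce_report classify_sigbus(const siginfo_t *info,
				       uintptr_t page_size)
{
	uintptr_t page;

	if (info->si_code != BUS_MCEERR_AR && info->si_code != BUS_MCEERR_AO)
		return MCE_NOT_MEMORY_ERROR;

	/* Round the reported address down to its page boundary. */
	page = (uintptr_t)info->si_addr & ~(page_size - 1);

	for (size_t i = 0; i < nr_seen; i++)
		if (seen_pages[i] == page)
			return MCE_DUPLICATE;

	if (nr_seen < MAX_SEEN)
		seen_pages[nr_seen++] = page;
	return MCE_NEW_PAGE;
}
```

With this, a BUS_MCEERR_AO report followed by a BUS_MCEERR_AR report for the
same page is classified as MCE_NEW_PAGE and then MCE_DUPLICATE, so the
application can run its recovery only once per page.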

Thanks,
Naoya Horiguchi
