Re: Is MADV_HWPOISON supposed to work only on faulted-in pages?

From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
To: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: Yisheng Xie <xieyisheng1@huawei.com>,
	Jan Stancek <jstancek@redhat.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"ltp@lists.linux.it" <ltp@lists.linux.it>
Subject: Re: Is MADV_HWPOISON supposed to work only on faulted-in pages?
Date: Mon, 27 Feb 2017 06:33:09 +0000
Message-ID: <20170227063308.GA14387@hori1.linux.bs1.fc.nec.co.jp>
In-Reply-To: <22763879-C335-41E6-8102-2022EED75DAE@cs.rutgers.edu>

On Sun, Feb 26, 2017 at 10:27:02PM -0600, Zi Yan wrote:
> On 26 Feb 2017, at 19:20, Naoya Horiguchi wrote:
> 
> > On Sat, Feb 25, 2017 at 10:28:15AM +0800, Yisheng Xie wrote:
> >> hi Naoya,
> >>
> >> On 2017/2/23 11:23, Naoya Horiguchi wrote:
> >>> On Mon, Feb 20, 2017 at 05:00:17AM +0000, Horiguchi Naoya(堀口 直也) wrote:
> >>>> On Tue, Feb 14, 2017 at 04:41:29PM +0100, Jan Stancek wrote:
> >>>>> Hi,
> >>>>>
> > >>>>> The code below (and LTP madvise07 [1]) doesn't produce SIGBUS
> > >>>>> unless I touch/prefault the page before calling madvise().
> >>>>>
> >>>>> Is this expected behavior?
> >>>>
> >>>> Thank you for reporting.
> >>>>
> > >>>> madvise(MADV_HWPOISON) triggers a page fault when called on an address
> > >>>> where no page is faulted in, so I think that SIGBUS should be raised
> > >>>> in such a case.
> >>>>
> > >>>> But it seems that the memory error handler considers such a page a
> > >>>> "reserved kernel page", and the recovery action fails (see below).
> >>>>
> >>>>   [  383.371372] Injecting memory failure for page 0x1f10 at 0x7efcdc569000
> >>>>   [  383.375678] Memory failure: 0x1f10: reserved kernel page still referenced by 1 users
> >>>>   [  383.377570] Memory failure: 0x1f10: recovery action for reserved kernel page: Failed
> >>>>
> > >>>> I'm not sure how/when this behavior was introduced, so I'll try to understand it.
> >>>
> > >>> I found that this is the zero page, which currently cannot be recovered
> > >>> from a memory error.
> >>>
> > >>>> IMO, the test code below looks valid, so there's no need to change it.
> >>>
> > >>> I think that what the testcase effectively does is test whether memory
> > >>> error handling on the zero page works or not.
> > >>> The testcase's failure seems acceptable, because that's simply not implemented yet.
> > >>> Recovering from an error on the zero page may be possible (because a memory
> > >>> error there causes no data loss), but I'm not sure the code would be simple
> > >>> enough, or that it's worth doing ...
> > >> I have a question about it: if a memory error happened on the zero page,
> > >> all data read from the zero page would be wrong, I mean non-zero, right?
> >
> > Hi Yisheng,
> >
> > Yes, the impact is serious (it could affect many processes), but its probability
> > is very low because there's only one page in a system that is used as the zero page.
> > There are many other pages that are not recoverable from memory errors, like
> > slab pages, so I'm not sure how to prioritize it (maybe it's neither a
> > top-priority thing nor low-hanging fruit).
> >
> > >> And can we just re-initialize it with zero data, maybe by memset?
> >
> > Maybe that's not enough. Under a real hwpoison, we should isolate the error
> > page to prevent access to the broken data.
> > But the zero page is statically defined as a global array variable, so
> > it's not trivial to replace it with a new zero page at runtime.
> >
> > Anyway, it's on my todo list, so hopefully it will be revisited in the future.
> >
> 
> Hi Naoya,
> 
> The test case tries to HWPOISON a range of virtual addresses that do not
> map to any physical pages.
> 

Hi Yan,

> I expected either that madvise would fail, because HWPOISON does not work on
> non-existent physical pages, or that madvise_hwpoison() would populate
> some physical pages for that virtual address range and poison them.

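The latter is the current behavior. It simply comes from get_user_pages_fast(),
which not only finds the page and takes a refcount, but also touches (faults in)
the page. Because that fault is a read, an untouched anonymous address gets the
zero page mapped, which is the page we fail to handle here.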

madvise(MADV_HWPOISON) is a test feature, and calling it on an address backed
by no page doesn't simulate anything real. IOW, the behavior is undefined.
So I don't have a strong opinion about how it should behave.
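A minimal reproducer sketch of the two paths discussed in this thread, assuming a
Linux kernel built with CONFIG_MEMORY_FAILURE, a caller with CAP_SYS_ADMIN, and
glibc's <sys/mman.h>. The helper names madv_hwpoison_errno() and
describe_madv_hwpoison_errno() are hypothetical, not part of LTP or any library.
Without a write prefault, get_user_pages_fast() inside madvise() faults the
address in for reading, which normally maps the shared zero page; with a write
prefault, a private page is allocated and poisoned instead. The errno
descriptions only summarize this thread and the v4.10 behavior Zi Yan reported;
other kernel versions may differ. It really injects a (simulated) memory error,
so run it only in a disposable VM.

```c
#define _GNU_SOURCE /* MADV_HWPOISON and MAP_ANONYMOUS in glibc's <sys/mman.h> */
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map one anonymous page, optionally write-fault it,
 * then inject a simulated memory error with MADV_HWPOISON.
 * Returns 0 on success, otherwise the errno from mmap() or madvise().
 * WARNING: this really poisons a page; run it only in a disposable VM.
 */
int madv_hwpoison_errno(int prefault_write)
{
    size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
    char *p = mmap(NULL, pagesz, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return errno;

    /*
     * A write allocates a private page. A mere read (or no access at all)
     * normally leaves the address backed by the shared zero page once
     * madvise() faults it in, which is the case discussed in this thread.
     */
    if (prefault_write)
        p[0] = 1;

    /* Save errno before munmap() can overwrite it. */
    int err = madvise(p, pagesz, MADV_HWPOISON) == 0 ? 0 : errno;
    munmap(p, pagesz);
    return err;
}

/* Hypothetical helper: rough meaning of the errno values discussed above. */
const char *describe_madv_hwpoison_errno(int err)
{
    switch (err) {
    case 0:
        return "page poisoned; a later access should raise SIGBUS";
    case EPERM:
        return "caller lacks CAP_SYS_ADMIN";
    case EINVAL:
        return "MADV_HWPOISON not supported (CONFIG_MEMORY_FAILURE off?)";
    case EBUSY:
        return "recovery failed, e.g. the zero page";
    default:
        return "other error";
    }
}
```

As root on a v4.10-era kernel, one would expect madv_hwpoison_errno(0) to hit the
EBUSY ("Device or resource busy") case and madv_hwpoison_errno(1) to return 0.
The sketch deliberately never touches the page after poisoning it; the SIGBUS
that LTP madvise07 checks for only appears on such a later access.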

> 
> When I tested it on kernel v4.10, the test application exited at
> madvise, because madvise returned -1 with the error message
> "Device or resource busy". I think this is proper behavior.

Yes, we probably see the same thing; you can find the "recovery action
for reserved kernel page: Failed" message in dmesg.

> 
> There might be some confusion in madvise's man page about MADV_HWPOISON.
> If you add some text saying that madvise fails if any page in the given
> address range is not mapped, that could eliminate the confusion.

Writing it down in the man page would make readers think this behavior is part
of the specification. That might not be good now, because the failure in error
handling of the zero page is not how it will behave once that is fixed.
I mean that if the zero page handles hwpoison properly in the future, madvise
will succeed without any confusion.
So I feel that we don't need to update the man page for this issue.

Thanks,
Naoya Horiguchi