All of lore.kernel.org
 help / color / mirror / Atom feed
From: Hugh Dickins <hughd@google.com>
To: Jue Wang <juew@google.com>
Cc: "HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>,
	"Hugh Dickins" <hughd@google.com>,
	"Naoya Horiguchi" <nao.horiguchi@gmail.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Michal Hocko" <mhocko@kernel.org>,
	"Oscar Salvador" <osalvador@suse.de>,
	"Tony Luck" <tony.luck@intel.com>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
	"Greg Thelen" <gthelen@google.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: [PATCH v1] mm, hwpoison: enable error handling on shmem thp
Date: Thu, 11 Mar 2021 14:00:40 -0800 (PST)	[thread overview]
Message-ID: <alpine.LSU.2.11.2103111312310.7859@eggly.anvils> (raw)
In-Reply-To: <CAPcxDJ6D5OS+XRYbqpr-7bhYCySX=jdPZdZvQL1ecSrQoDEieg@mail.gmail.com>

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2755 bytes --]

On Thu, 11 Mar 2021, Jue Wang wrote:
> On Thu, Mar 11, 2021 at 7:14 AM HORIGUCHI NAOYA(堀口 直也)
> <naoya.horiguchi@nec.com> wrote:
> > On Wed, Mar 10, 2021 at 11:22:18PM -0800, Hugh Dickins wrote:
> > >
> > > I'm not much into memory-failure myself, but Jue discovered that the
> > > SIGBUS never arrives: because split_huge_page() on a shmem or file
> > > THP unmaps all its pmds and ptes, and (unlike with anon) leaves them
> > > unmapped - in normal circumstances, to be faulted back on demand.
> > > So the page_mapped() check in hwpoison_user_mappings() fails,
> > > and the intended SIGBUS is not delivered.
> >
> > Thanks for the information.  The split behaves quite differently between
> > for anon thp and for shmem thp.  I saw some unexpected behavior in my
> > testing, maybe that's due to the difference.
> >
> > >
> > > (Or, is it acceptable that the SIGBUS is not delivered to those who
> > > have the huge page mapped: should it get delivered later, to anyone
> > > who faults back in the bad 4k?)
> >
> > Later access should report error in page fault, so the worst scenario
> > of consuming corrupted data does not happen, but precautionary signal
> > does not work so it's not acceptable.

On the other hand, if split_huge_page() does succeed, then there is an
argument that it would be better not to SIGBUS all mappers of parts of
the THP, but wait to select only those re-accessing the one bad 4k.

> In our experiment with SHMEM THPs, later accesses resulted in a zero
> page allocated instead of a SIGBUS with BUS_MCEERR_AR reported by the
> page fault handler. That part might be an opportunity to prevent some
> silent data corruption just in case.

Thanks for filling in more detail, Jue: I understand better now.

Maybe mm/shmem.c is wrong to be using generic_error_remove_page(),
the function which punches a hole on memory-failure.

That works well for filesystems backed by storage (at least when the
page had not been modified), because it does not (I think) actually
punch a hole in the stored object; and the next touch at that offset of
the file will allocate a new cache page to be filled from good storage.

But in the case of shmem (if we ignore the less likely swap cache case)
there is no storage to read back good data from, so the next touch just
fills a new cache page with zeroes (as you report above).

I don't know enough of the philosophy of memory-failure to say, but
I can see there's an argument for leaving the bad page in cache, to
give SIGBUS or EFAULT or EIO (whether by observation of PageHWPoison,
or by another MCE) to whoever accesses it - until the file or that
part of it is deleted (then that page never returned to use again).

Hugh

  reply	other threads:[~2021-03-11 22:01 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-09  6:21 [PATCH v1] mm, hwpoison: enable error handling on shmem thp Naoya Horiguchi
2021-02-09 19:46 ` Andrew Morton
2021-02-09 23:51   ` Naoya Horiguchi
2021-02-11  8:06 ` Oscar Salvador
2021-03-11  7:22 ` Hugh Dickins
2021-03-11  7:22   ` Hugh Dickins
2021-03-11 14:45   ` Matthew Wilcox
2021-03-11 15:14   ` HORIGUCHI NAOYA(堀口 直也)
2021-03-11 19:32     ` Jue Wang
2021-03-11 19:32       ` Jue Wang
2021-03-11 22:00       ` Hugh Dickins [this message]
2021-03-18 18:33         ` Matthew Wilcox

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LSU.2.11.2103111312310.7859@eggly.anvils \
    --to=hughd@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@linux.vnet.ibm.com \
    --cc=gthelen@google.com \
    --cc=juew@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=nao.horiguchi@gmail.com \
    --cc=naoya.horiguchi@nec.com \
    --cc=osalvador@suse.de \
    --cc=tony.luck@intel.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.