linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Matthew Wilcox <willy@infradead.org>
To: Hugh Dickins <hughd@google.com>
Cc: "Jue Wang" <juew@google.com>,
	"HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>,
	"Naoya Horiguchi" <nao.horiguchi@gmail.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Michal Hocko" <mhocko@kernel.org>,
	"Oscar Salvador" <osalvador@suse.de>,
	"Tony Luck" <tony.luck@intel.com>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
	"Greg Thelen" <gthelen@google.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	"Andi Kleen" <ak@linux.intel.com>
Subject: Re: [PATCH v1] mm, hwpoison: enable error handling on shmem thp
Date: Thu, 18 Mar 2021 18:33:50 +0000	[thread overview]
Message-ID: <20210318183350.GT3420@casper.infradead.org> (raw)
In-Reply-To: <alpine.LSU.2.11.2103111312310.7859@eggly.anvils>

On Thu, Mar 11, 2021 at 02:00:40PM -0800, Hugh Dickins wrote:
> On Thu, 11 Mar 2021, Jue Wang wrote:
> > In our experiment with SHMEM THPs, later accesses resulted in a zero
> > page allocated instead of a SIGBUS with BUS_MCEERR_AR reported by the
> > page fault handler. That part might be an opportunity to prevent some
> > silent data corruption just in case.
> 
> Thanks for filling in more detail, Jue: I understand better now.
> 
> Maybe mm/shmem.c is wrong to be using generic_error_remove_page(),
> the function which punches a hole on memory-failure.
> 
> That works well for filesystems backed by storage (at least when the
> page had not been modified), because it does not (I think) actually
> punch a hole in the stored object; and the next touch at that offset of
> the file will allocate a new cache page to be filled from good storage.
> 
> But in the case of shmem (if we ignore the less likely swap cache case)
> there is no storage to read back good data from, so the next touch just
> fills a new cache page with zeroes (as you report above).
> 
> I don't know enough of the philosophy of memory-failure to say, but
> I can see there's an argument for leaving the bad page in cache, to
> give SIGBUS or EFAULT or EIO (whether by observation of PageHWPoison,
> or by another MCE) to whoever accesses it - until the file or that
> part of it is deleted (then that page never returned to use again).

I think you have it right here.  If the page hasn't been modified since
being read in from swap, it's perfectly fine to discard it (because the
data can be read back from swap if accessed again).  But if it has been
modified, the user has lost data and must be made aware of this.

It seems to me that other filesystems also get this wrong.  If the page
is dirty (eg for ext2/ext4/xfs), we'll silently throw away the user's
data and reload it from storage.  That's spelled "silent data corruption".

So I think this is part of the solution here:

+++ b/mm/truncate.c
@@ -236,6 +236,8 @@ int generic_error_remove_page(struct address_space *mapping, struct page *page)
         */
        if (!S_ISREG(mapping->host->i_mode))
                return -EIO;
+       if (PageDirty(page))
+               return -EBUSY;
        return truncate_inode_page(mapping, page);
 }
 EXPORT_SYMBOL(generic_error_remove_page);

Things I don't know ...

 - Does shmem mark pages dirty in the right circumstances for this to work?
 - How does memory-failure cope with an -EBUSY return from
   ->error_remove_page() today?  It seems to treat all errors the same
   and return MF_FAILED, but I don't know whether that's going to do
   the right thing.
 - Do we correctly decline to evict a HWPoison page from the page cache
   until the page is truncated (hole punched, whatever).

      reply	other threads:[~2021-03-18 18:35 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-09  6:21 Naoya Horiguchi
2021-02-09 19:46 ` Andrew Morton
2021-02-09 23:51   ` Naoya Horiguchi
2021-02-11  8:06 ` Oscar Salvador
2021-03-11  7:22 ` Hugh Dickins
2021-03-11 14:45   ` Matthew Wilcox
     [not found]   ` <20210311151446.GA28650@hori.linux.bs1.fc.nec.co.jp>
2021-03-11 19:32     ` Jue Wang
2021-03-11 22:00       ` Hugh Dickins
2021-03-18 18:33         ` Matthew Wilcox [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210318183350.GT3420@casper.infradead.org \
    --to=willy@infradead.org \
    --cc=ak@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@linux.vnet.ibm.com \
    --cc=gthelen@google.com \
    --cc=hughd@google.com \
    --cc=juew@google.com \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=nao.horiguchi@gmail.com \
    --cc=naoya.horiguchi@nec.com \
    --cc=osalvador@suse.de \
    --cc=tony.luck@intel.com \
    --subject='Re: [PATCH v1] mm, hwpoison: enable error handling on shmem thp' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).