linux-edac.vger.kernel.org archive mirror
From: Dan Williams <dan.j.williams@intel.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>,
	"Luck, Tony" <tony.luck@intel.com>,
	Borislav Petkov <bp@alien8.de>,
	Naoya Horiguchi <naoya.horiguchi@nec.com>,
	linux-edac@vger.kernel.org, Linux MM <linux-mm@kvack.org>,
	linux-nvdimm <linux-nvdimm@lists.01.org>
Subject: Re: [RFC] Make the memory failure blast radius more precise
Date: Wed, 24 Jun 2020 18:18:37 -0700
Message-ID: <CAPcyv4i0Myp+wjwOk8Gofo-PUmxmoD7GyzwJ_kEzGdcCbe73qA@mail.gmail.com>
In-Reply-To: <20200625001740.GX21350@casper.infradead.org>

On Wed, Jun 24, 2020 at 5:17 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, Jun 24, 2020 at 04:21:24PM -0700, Dan Williams wrote:
> > On Wed, Jun 24, 2020 at 5:10 AM Matthew Wilcox <willy@infradead.org> wrote:
> > > On Tue, Jun 23, 2020 at 05:01:24PM -0700, Darrick J. Wong wrote:
> > > > Frankly, I've wondered why the filesystem shouldn't just be in charge of
> > > > all this--
> > > >
> > > > 1. kernel receives machine check
> > > > 2. kernel tattles to xfs
> > > > 3. xfs looks up which file(s) own the pmem range
> > > > 4. xfs zeroes the region, clears the poison, and sets AS_EIO on the
> > > >    files
> > >
> > > ... machine reboots, app restarts, gets no notification anything is wrong,
> > > treats zeroed region as good data, launches nuclear missiles.
> >
> > Isn't AS_EIO stored persistently in the file block allocation map?
>
> No.  AS_EIO is in mapping->flags.  Unless Darrick was using "sets AS_EIO"
> as shorthand for something else.
>
> > Even if it isn't today that is included in the proposal that the
> > filesystem maintains a list of poison that is coordinated with the
> > pmem driver.
>
> I'd like to see a concrete proposal here.

There are still details to work through with respect to reflink. The
latest discussion was the thread I linked about how to solve the
page->index collision [1] when reverse mapping pages to files.

[1]: https://lore.kernel.org/linux-ext4/20200311063942.GE10776@dread.disaster.area/
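
To make the collision concrete: a struct page carries a single
mapping/index pair, but with reflink one page frame can back several
files at different offsets. Below is a minimal userspace sketch of a
lookup that returns every owner of a page frame. All names in it
(toy_rmap_rec, toy_owner, toy_rmap_lookup) are made up for
illustration; this is not an existing kernel or XFS interface, and it
does not claim to be the approach discussed in [1].

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record: one extent of page frames owned by one file. */
struct toy_rmap_rec {
	unsigned long pfn;      /* first page frame number of the extent */
	unsigned long nr_pages; /* extent length in pages */
	unsigned long ino;      /* inode number of the owning file */
	unsigned long pgoff;    /* file page offset of the extent's first page */
};

/* One (file, offset) pair that maps a given page frame. */
struct toy_owner {
	unsigned long ino;
	unsigned long pgoff;
};

/*
 * Collect up to max_out owners of pfn and return how many were stored.
 * A struct page has room for only one mapping/index pair, so with
 * reflink the second owner found here has nowhere to live in the page.
 */
size_t toy_rmap_lookup(const struct toy_rmap_rec *recs, size_t nr_recs,
		       unsigned long pfn,
		       struct toy_owner *out, size_t max_out)
{
	size_t found = 0;

	assert(out != NULL || max_out == 0);

	for (size_t i = 0; i < nr_recs && found < max_out; i++) {
		const struct toy_rmap_rec *r = &recs[i];

		/* Skip extents that do not contain this page frame. */
		if (pfn < r->pfn || pfn - r->pfn >= r->nr_pages)
			continue;

		out[found].ino = r->ino;
		out[found].pgoff = r->pgoff + (pfn - r->pfn);
		found++;
	}
	return found;
}
```

With two records that share page frames, a lookup of a shared pfn
returns two owners with different file offsets, which is exactly the
case a single page->index cannot describe.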

>
> > > > Apps shouldn't have to do this punch-and-reallocate dance, seeing as
> > > > they don't currently do that for SCSI disks and the like.
> > >
> > > The SCSI disk retains the error until the sector is rewritten.
> > > I'm not entirely sure whether you're trying to draw an analogy with
> > > error-in-page-cache or error-on-storage-medium.
> > >
> > > error-on-medium needs to persist until the app takes an affirmative step
> > > to clear it.  I presume XFS does not write zeroes to sectors with
> > > errors on SCSI disks ...
> >
> > SCSI does not have an async mechanism to retrieve a list of poisoned
> > blocks from the hardware (that I know of), but pmem does. I really
> > think we should not glom pmem error handling semantics onto the same
> > infrastructure that handles volatile / replaceable pages. When
>
> Erm ... commit 6100e34b2526 has your name on it.

Yes, and we're having this conversation because it turns out
mm/memory-failure.c enabling for DAX is insufficient.

>
> > the filesystem is enabled to get involved it should impose a different
> > model than generic memory error handling especially because generic
> > memory-error handling has no chance to solve the reflink problem.
> >
> > If an application wants to survive poison consumption, signals seem
> > only sufficient for interrupting an application that needs to take
> > immediate action because one of its instructions was prevented from
> > making forward progress. The interface for enumerating the extent of
> > errors for DAX goes beyond what siginfo can reasonably convey; that
> > piece is where the filesystem can be called to discover which file
> > extents are impacted by poison.
> >
> > I like Darrick's idea that the kernel stabilizes the storage by
> > default, and that the repair mechanism is just a write(2). I assume
> > "stabilize" means make sure that the file offset is permanently
> > recorded as poisoned until the next write(2), but read(2) and mmap(2)
> > return errors so no more machine checks are triggered.
>
> That seems like something we'd want to work into the iomap infrastructure,
> perhaps.  Add an IOMAP_POISONED to indicate this range needs to be
> written before it can be read?

Yes, an explicit error state for an extent range is needed for the fs
to offload the raw hardware poison list into software tracking.
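
As a rough userspace model of those semantics (poison is recorded per
range, reads of it fail with EIO, and a write is the repair), something
like the sketch below. Everything in it is an assumption made for
illustration: the toy_* names, the 64-byte tracking granularity, and
the rule that only a write covering a whole poisoned block clears it.
None of this is an existing iomap or filesystem interface.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_BLOCK   64u   /* assumed poison tracking granularity */
#define TOY_NBLOCKS 8u
#define TOY_SIZE    (TOY_BLOCK * TOY_NBLOCKS)

/* Hypothetical file: data plus a software-tracked poison state per block. */
struct toy_file {
	unsigned char data[TOY_SIZE];
	bool poisoned[TOY_NBLOCKS];
};

static bool toy_range_valid(size_t off, size_t len)
{
	return off <= TOY_SIZE && len <= TOY_SIZE - off;
}

/* The fs is told a range went bad: record every block it touches. */
int toy_mark_poisoned(struct toy_file *f, size_t off, size_t len)
{
	if (!toy_range_valid(off, len))
		return -EINVAL;
	if (len == 0)
		return 0;
	for (size_t b = off / TOY_BLOCK; b <= (off + len - 1) / TOY_BLOCK; b++) {
		assert(b < TOY_NBLOCKS);
		f->poisoned[b] = true;
	}
	return 0;
}

/* Reads that touch a poisoned block fail instead of consuming poison. */
int toy_read(const struct toy_file *f, void *buf, size_t off, size_t len)
{
	if (!toy_range_valid(off, len))
		return -EINVAL;
	if (len == 0)
		return 0;
	for (size_t b = off / TOY_BLOCK; b <= (off + len - 1) / TOY_BLOCK; b++)
		if (f->poisoned[b])
			return -EIO;
	memcpy(buf, f->data + off, len);
	return 0;
}

/* A write repairs poison, but only for blocks it overwrites completely. */
int toy_write(struct toy_file *f, const void *buf, size_t off, size_t len)
{
	size_t first, last;

	if (!toy_range_valid(off, len))
		return -EINVAL;
	if (len == 0)
		return 0;
	first = off / TOY_BLOCK;
	last = (off + len - 1) / TOY_BLOCK;

	/* Refuse partial overwrites of poisoned blocks before touching data. */
	for (size_t b = first; b <= last; b++) {
		size_t bstart = b * TOY_BLOCK;

		if (f->poisoned[b] &&
		    !(off <= bstart && off + len >= bstart + TOY_BLOCK))
			return -EIO;
	}

	memcpy(f->data + off, buf, len);
	for (size_t b = first; b <= last; b++)
		f->poisoned[b] = false;
	return 0;
}
```

In this model, marking bytes 70-73 poisons block 1 only. Reads that
touch block 1 then return -EIO, a 4-byte write into it is refused, and
a full 64-byte write at offset 64 clears the state so reads succeed
again. A real implementation would also have to persist this state
across reboot, which the model does not attempt.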
