linux-edac.vger.kernel.org archive mirror
From: "Luck, Tony" <tony.luck@intel.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>,
	Naoya Horiguchi <naoya.horiguchi@nec.com>,
	linux-edac@vger.kernel.org, linux-mm@kvack.org,
	linux-nvdimm@lists.01.org,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	Jane Chu <jane.chu@oracle.com>
Subject: Re: [RFC] Make the memory failure blast radius more precise
Date: Tue, 23 Jun 2020 15:04:12 -0700
Message-ID: <20200623220412.GA21232@agluck-desk2.amr.corp.intel.com>
In-Reply-To: <20200623201745.GG21350@casper.infradead.org>

On Tue, Jun 23, 2020 at 09:17:45PM +0100, Matthew Wilcox wrote:
> 
> Hardware actually tells us the blast radius of the error, but we ignore
> it and take out the entire page.  We've had a customer request to know
> exactly how much of the page is damaged so they can avoid reconstructing
> an entire 2MB page if only a single cacheline is damaged.
> 
> This is only a strawman that I did in an hour or two; I'd appreciate
> architectural-level feedback.  Should I just convert memory_failure() to
> always take an address & granularity?  Should I create a struct to pass
> around (page, phys, granularity) instead of reconstructing the missing
> pieces in half a dozen functions?  Is this functionality welcome at all,
> or is the risk of upsetting applications which expect at least a page
> of granularity too high?

What is the interface to these applications that want finer granularity?

Current code does very poorly with hugetlbfs pages ... the user loses the
whole 2MB or 1GB page. That's just silly (though I've been told that it is
hard to fix, because allowing a hugetlbfs page to be broken up at an arbitrary
time as the result of a machine check means that the kernel needs locking
around a bunch of fast paths that currently assume that a huge page will
stay a huge page).

For sub-4K granularity, there are different problems. We can't leave the
original page with the poisoned cache line mapped to the user, as they may
just access the poisoned data and trigger another machine check. But if we
map in a different page with all the good bits copied, the user needs
to be told which parts of the page no longer hold their data.

-Tony
