nvdimm.lists.linux.dev archive mirror
From: Dan Williams <dan.j.williams@intel.com>
To: Borislav Petkov <bp@alien8.de>
Cc: Linux NVDIMM <nvdimm@lists.linux.dev>,
	Jane Chu <jane.chu@oracle.com>,
	 Luis Chamberlain <mcgrof@suse.com>,
	Tony Luck <tony.luck@intel.com>
Subject: Re: [RFT PATCH] x86/pat: Fix set_mce_nospec() for pmem
Date: Tue, 14 Sep 2021 11:08:00 -0700
Message-ID: <CAPcyv4hNzR8ExvYxguvyu6N6Md1x0QVSnDF_5G1WSruK=gvgEA@mail.gmail.com>
In-Reply-To: <YT8n+ae3lBQjqoDs@zn.tnic>

On Mon, Sep 13, 2021 at 3:29 AM Borislav Petkov <bp@alien8.de> wrote:
>
> On Tue, Jul 06, 2021 at 06:01:05PM -0700, Dan Williams wrote:
> > When poison is discovered and triggers memory_failure() the physical
> > page is unmapped from all process address space. However, it is not
> > unmapped from kernel address space. Unlike a typical memory page that
> > can be retired from use in the page allocator and marked 'not present',
> > pmem needs to remain accessible given it can not be physically remapped
> > or retired.
>
> I'm surely missing something obvious but why does it need to remain
> accessible? Spell it out please.

Sure, I should probably include the following note in all patches
touching the DAX memory_failure() path, because it is a frequently
asked question. The tl;dr is:

Typical memory_failure() does not assume the physical page can be
recovered and put back into circulation; PMEM memory_failure() allows
for recovery of the page.

The longer description is:
Typical memory_failure() for anonymous or page-cache pages has the
flexibility to invalidate bad pages and trigger any users to request a
new page from the page allocator to replace the quarantined one. DAX
removes that flexibility. The page is a handle for a fixed storage
location, i.e. there is no mechanism to remap a physical page to a
different logical address. Software expects to be able to repair an
error in PMEM by reading around the poisoned cache lines and writing
zeros, e.g. via fallocate(...FALLOC_FL_PUNCH_HOLE...), to overwrite
the poison. The page needs to remain accessible to enable recovery.
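
As a rough userspace illustration of the repair step described above,
here is a minimal sketch that punches out one block of a file and
checks that its neighbours survive. It runs against an ordinary temp
file, so no real poison is involved; punch_range() and demo_repair()
are hypothetical helper names, and the 4096-byte block size and /tmp
path are assumptions. Whether a punch actually clears poison on a
given DAX filesystem depends on the kernel and filesystem in use.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

enum { BLK = 4096 };	/* assumed block size for the demo */

/* Hypothetical helper: punch out [offset, offset + len), keep file size. */
static int punch_range(int fd, off_t offset, off_t len)
{
	if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
		      offset, len) < 0)
		return -errno;
	return 0;
}

/*
 * Fill three blocks, punch the middle one, then verify that the outer
 * blocks are intact and the punched block reads back as zeros.
 * Returns 0 on success, 1 if the filesystem does not support
 * punch-hole, -1 on any other failure.
 */
static int demo_repair(void)
{
	char path[] = "/tmp/punch-demo-XXXXXX";
	unsigned char buf[3 * BLK], zero[BLK] = { 0 };
	int fd = mkstemp(path), rc = -1;

	if (fd < 0)
		return -1;
	unlink(path);	/* file goes away once fd is closed */

	memset(buf, 0xab, sizeof(buf));
	if (pwrite(fd, buf, sizeof(buf), 0) != (ssize_t)sizeof(buf))
		goto out;

	rc = punch_range(fd, BLK, BLK);
	if (rc < 0) {
		rc = (rc == -EOPNOTSUPP) ? 1 : -1;
		goto out;
	}

	rc = -1;
	if (pread(fd, buf, sizeof(buf), 0) != (ssize_t)sizeof(buf))
		goto out;
	if (buf[0] == 0xab && buf[BLK - 1] == 0xab &&
	    memcmp(buf + BLK, zero, BLK) == 0 &&
	    buf[2 * BLK] == 0xab && buf[3 * BLK - 1] == 0xab)
		rc = 0;
out:
	close(fd);
	return rc;
}
```

A caller would treat a return of 1 as "try a different repair path" and
a negative return as a hard failure. FALLOC_FL_KEEP_SIZE is required
alongside FALLOC_FL_PUNCH_HOLE, which is why the file size is unchanged
after the punch.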

>
> > set_memory_uc() tries to maintain consistent nominal memtype
> > mappings for a given pfn, but memory_failure() is an exceptional
> > condition.
>
> That's not clear to me too. So looking at the failure:
>
> [10683.426147] x86/PAT: fsdax_poison_v1:5018 conflicting memory types 1850600000-1850601000  uncached-minus<->write-back
>
> set_memory_uc() marked it UC- but something? wants it to be WB. Why?

PMEM is mapped WB at the beginning of time for nominal operation.
track_pfn_remap() records that driver setting and forwards it to any
track_pfn_insert() of the same pfn, i.e. this is how DAX mappings
inherit the WB cache mode. memory_failure() wants to avoid speculative
consumption of poison, so it marks the page UC. set_memory_uc() checks
against the track_pfn_remap() setting, but we know this is an
exceptional condition and it is ok to force UC against the typical
memtype expectation.
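
To make the conflict concrete, here is a toy userspace model of the
bookkeeping described above. It is not kernel code: every toy_* name,
the fixed-size table, and the "force" flag are hypothetical, chosen
only to show why a checked request for UC- fails against a recorded WB
setting and what an unchecked override amounts to.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy cache modes; illustrative names, not the kernel's enums. */
enum toy_memtype { TOY_NONE, TOY_WB, TOY_UC_MINUS };

struct toy_entry {
	unsigned long pfn;
	enum toy_memtype type;
};

#define TOY_MAX 16
static struct toy_entry toy_table[TOY_MAX];
static size_t toy_count;

static struct toy_entry *toy_find(unsigned long pfn)
{
	for (size_t i = 0; i < toy_count; i++)
		if (toy_table[i].pfn == pfn)
			return &toy_table[i];
	return NULL;
}

/* Record the nominal type chosen at setup time for this pfn. */
static int toy_track(unsigned long pfn, enum toy_memtype type)
{
	struct toy_entry *e = toy_find(pfn);

	if (e)
		return e->type == type ? 0 : -EBUSY;
	if (toy_count == TOY_MAX)
		return -ENOMEM;
	toy_table[toy_count++] = (struct toy_entry){ .pfn = pfn, .type = type };
	return 0;
}

/* Later mappings of the same pfn inherit whatever was recorded. */
static enum toy_memtype toy_lookup(unsigned long pfn)
{
	struct toy_entry *e = toy_find(pfn);

	return e ? e->type : TOY_NONE;
}

/*
 * Request a type change. A checked request (force == false) is refused
 * when it disagrees with the recorded type, which models the
 * "conflicting memory types" failure. A forced request overrides the
 * record, which models treating memory_failure() as exceptional.
 */
static int toy_set_memtype(unsigned long pfn, enum toy_memtype want,
			   bool force)
{
	struct toy_entry *e = toy_find(pfn);

	if (!e)
		return toy_track(pfn, want);
	if (e->type != want && !force)
		return -EBUSY;
	e->type = want;
	return 0;
}
```

With pfn 0x1850600 (physical address 0x1850600000 from the log line)
tracked as TOY_WB, a checked toy_set_memtype(..., TOY_UC_MINUS, false)
returns -EBUSY and leaves the record at WB, while the forced variant
succeeds and switches it to UC-.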

> I guess I need some more info on the whole memory offlining for pmem and
> why that should be done differently than with normal memory.

Short answer, PMEM never goes "offline" because it was never "online"
in the first place. Where "online" in this context is specifically
referring to pfns that are under the watchful eye of the core-mm page
allocator.


Thread overview: 48+ messages
2021-07-07  1:01 [RFT PATCH] x86/pat: Fix set_mce_nospec() for pmem Dan Williams
2021-08-26 19:08 ` Dan Williams
2021-08-27  7:12   ` Jane Chu
2021-09-13 10:29 ` Borislav Petkov
2021-09-14 18:08   ` Dan Williams [this message]
2021-09-15 10:41     ` Borislav Petkov
2021-09-16 20:33       ` Dan Williams
2021-09-17 11:30         ` Borislav Petkov
2021-09-21  2:04           ` Dan Williams
2021-09-30 17:19             ` Borislav Petkov
2021-09-30 17:28               ` Luck, Tony
2021-09-30 19:30                 ` Borislav Petkov
2021-09-30 19:41                   ` Dan Williams
2021-09-30 19:44                   ` Luck, Tony
2021-09-30 20:01                     ` Borislav Petkov
2021-09-30 20:15                       ` Luck, Tony
2021-09-30 20:32                         ` Borislav Petkov
2021-09-30 20:39                           ` Dan Williams
2021-09-30 20:54                             ` Borislav Petkov
2021-09-30 21:05                               ` Dan Williams
2021-09-30 21:20                                 ` Borislav Petkov
2021-09-30 21:41                                   ` Dan Williams
2021-09-30 22:35                                     ` Borislav Petkov
2021-09-30 22:44                                       ` Dan Williams
2021-10-01 10:41                                         ` Borislav Petkov
2021-10-01  0:43                                       ` Jane Chu
2021-10-01  2:02                                         ` Dan Williams
2021-10-01 10:50                                           ` Borislav Petkov
2021-10-01 16:52                                             ` Dan Williams
2021-10-01 18:11                                               ` Borislav Petkov
2021-10-01 18:29                                                 ` Dan Williams
2021-10-02 10:17                                                   ` Borislav Petkov
2021-11-11  0:06                                                     ` Jane Chu
2021-11-12  0:30                                                       ` Jane Chu
2021-11-12  0:51                                                         ` Dan Williams
2021-11-12 17:57                                                           ` Jane Chu
2021-11-12 19:24                                                             ` Dan Williams
2021-11-12 22:35                                                               ` Jane Chu
2021-11-12 22:50                                                                 ` Jane Chu
2021-11-12 23:08                                                                 ` Dan Williams
2021-11-13  5:50                                                                   ` Jane Chu
2021-11-13 20:47                                                                     ` Dan Williams
2021-11-18 19:03                                                                       ` Jane Chu
2021-11-25  0:16                                                                         ` Dan Williams
2021-11-30 23:00                                                                           ` Jane Chu
2021-09-30 18:15         ` Jane Chu
2021-09-30 19:11           ` Dan Williams
2021-09-30 21:23             ` Jane Chu
