nvdimm.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
From: Dan Williams <dan.j.williams@intel.com>
To: Borislav Petkov <bp@alien8.de>
Cc: Linux NVDIMM <nvdimm@lists.linux.dev>,
	Jane Chu <jane.chu@oracle.com>,
	 Luis Chamberlain <mcgrof@suse.com>,
	Tony Luck <tony.luck@intel.com>
Subject: Re: [RFT PATCH] x86/pat: Fix set_mce_nospec() for pmem
Date: Mon, 20 Sep 2021 19:04:50 -0700	[thread overview]
Message-ID: <CAPcyv4jOk_Ej5op9ZZM+=OCitUsmp0RCZNH=PFqYTCoUeXXCCg@mail.gmail.com> (raw)
In-Reply-To: <YUR8RTx9blI2ezvQ@zn.tnic>

On Fri, Sep 17, 2021 at 4:30 AM Borislav Petkov <bp@alien8.de> wrote:
>
> On Thu, Sep 16, 2021 at 01:33:42PM -0700, Dan Williams wrote:
> > I am specifically talking about the memory_failure_dev_pagemap() path
> > taken from memory_failure().
> >
> > > But then I have no clue what the "DAX-memory_failure()" path is.
> >
> > Sorry, I should not have lazily used {PMEM,DAX} memory_failure() to
> > refer to the exact routine in question: memory_failure_dev_pagemap().
> > That path is taken when memory_failure() sees that a pfn was mapped by
> > memremap_pages(). It determines that the pfn does not represent a
> > dynamic page allocator page and may refer to a static page of storage
> > from a device like /dev/pmem, or /dev/dax.
>
> Aaaha, that's that innocent-looking pfn_to_online_page() call there in
> memory_failure() which would return NULL for pmem/dax pages.

Exactly.

>
> > The zeroing of the poison comes optionally later if userspace
> > explicitly asks for it to be cleared.
>
> I see.
>
> > Yes. The device driver avoids reconsumption of the same error by
> > recording the error in a "badblocks" structure (block/badblocks.c).
> > The driver consults its badblocks instance on every subsequent access
> > and preemptively returns EIO. The driver registers for machine-check
> > notifications and translates those events into badblocks entries. So,
> > repeat consumption is avoided unless/until the babblocks entry can be
> > cleared along with the poison itself.
>
> Sounds good.
>
> > I'll note that going forward the filesystem wants to be more involved
> > in tracking and managing these errors so the driver will grow the
> > ability to forward those notifications up the stack.
>
> ... which would save the ask-the-driver each time thing. Yap.
>
> > It does. In addition to machine-check synchronous notification of
> > poison, there are asynchronous scanning mechanisms that prepopulate
> > the badblocks entries to get ahead of consumption events.
>
> Probably something similar to patrol scrubbing on normal DRAM.

Yes, although I believe that DRAM patrol scrubbing is being done from
the host memory controller, these PMEM DIMMs have firmware and DMA
engines *in the DIMM* to do this scrub work.

> > ... Because the entire page is dead there is no need for UC because all
> > the cachelines are gone, nothing to read-around in this page. In short
> > DRAM and PMEM want to share the exact same policy here and use NP for
> > whole_page() and UC for not. Just the small matter of ignoring the
> > memtype by using _set_memory_uc().
>
> Hmm, so looking at this more:
>
>         else
>                 rc = set_memory_uc(decoy_addr, 1);
>
> ->  memtype_reserve(__pa(addr), __pa(addr) + numpages * PAGE_SIZE,
>
> ->  memtype_reserve(__pa(addr), __pa(addr) + 1,
>
> Looking at
>
>   17fae1294ad9 ("x86/{mce,mm}: Unmap the entire page if the whole page is affected and poisoned")
>
> it says
>
>     Fix is to check the scope of the poison by checking the MCi_MISC register.
>     If the entire page is affected, then unmap the page. If only part of the
>     page is affected, then mark the page as uncacheable.
>
> so I guess for normal DRAM we still want to map the page uncacheable so
> that other parts except that cacheline can still be read.

Perhaps, but I don't know how you do that if memory_failure() has
"offlined" the DRAM page, in the case of PMEM you can issue a
byte-aligned direct-I/O access to the exact storage locations around
the poisoned cachelines.

> And PMEM should be excluded from that treatise here because it needs to
> be WB, as you said earlier.
>
> Right?

PMEM can still go NP if the entire page is failed, so no need to
exclude PMEM from the treatise because the driver's badblocks
implementation will cover the NP page, and the driver can use
clear_mce_nospec() to recover the WB mapping / access after the poison
has been cleared.

  reply	other threads:[~2021-09-21  2:05 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-07  1:01 [RFT PATCH] x86/pat: Fix set_mce_nospec() for pmem Dan Williams
2021-08-26 19:08 ` Dan Williams
2021-08-27  7:12   ` Jane Chu
2021-09-13 10:29 ` Borislav Petkov
2021-09-14 18:08   ` Dan Williams
2021-09-15 10:41     ` Borislav Petkov
2021-09-16 20:33       ` Dan Williams
2021-09-17 11:30         ` Borislav Petkov
2021-09-21  2:04           ` Dan Williams [this message]
2021-09-30 17:19             ` Borislav Petkov
2021-09-30 17:28               ` Luck, Tony
2021-09-30 19:30                 ` Borislav Petkov
2021-09-30 19:41                   ` Dan Williams
2021-09-30 19:44                   ` Luck, Tony
2021-09-30 20:01                     ` Borislav Petkov
2021-09-30 20:15                       ` Luck, Tony
2021-09-30 20:32                         ` Borislav Petkov
2021-09-30 20:39                           ` Dan Williams
2021-09-30 20:54                             ` Borislav Petkov
2021-09-30 21:05                               ` Dan Williams
2021-09-30 21:20                                 ` Borislav Petkov
2021-09-30 21:41                                   ` Dan Williams
2021-09-30 22:35                                     ` Borislav Petkov
2021-09-30 22:44                                       ` Dan Williams
2021-10-01 10:41                                         ` Borislav Petkov
2021-10-01  0:43                                       ` Jane Chu
2021-10-01  2:02                                         ` Dan Williams
2021-10-01 10:50                                           ` Borislav Petkov
2021-10-01 16:52                                             ` Dan Williams
2021-10-01 18:11                                               ` Borislav Petkov
2021-10-01 18:29                                                 ` Dan Williams
2021-10-02 10:17                                                   ` Borislav Petkov
2021-11-11  0:06                                                     ` Jane Chu
2021-11-12  0:30                                                       ` Jane Chu
2021-11-12  0:51                                                         ` Dan Williams
2021-11-12 17:57                                                           ` Jane Chu
2021-11-12 19:24                                                             ` Dan Williams
2021-11-12 22:35                                                               ` Jane Chu
2021-11-12 22:50                                                                 ` Jane Chu
2021-11-12 23:08                                                                 ` Dan Williams
2021-11-13  5:50                                                                   ` Jane Chu
2021-11-13 20:47                                                                     ` Dan Williams
2021-11-18 19:03                                                                       ` Jane Chu
2021-11-25  0:16                                                                         ` Dan Williams
2021-11-30 23:00                                                                           ` Jane Chu
2021-09-30 18:15         ` Jane Chu
2021-09-30 19:11           ` Dan Williams
2021-09-30 21:23             ` Jane Chu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAPcyv4jOk_Ej5op9ZZM+=OCitUsmp0RCZNH=PFqYTCoUeXXCCg@mail.gmail.com' \
    --to=dan.j.williams@intel.com \
    --cc=bp@alien8.de \
    --cc=jane.chu@oracle.com \
    --cc=mcgrof@suse.com \
    --cc=nvdimm@lists.linux.dev \
    --cc=tony.luck@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).