From: Jane Chu <jane.chu@oracle.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Borislav Petkov <bp@alien8.de>,
	"Luck, Tony" <tony.luck@intel.com>,
	Linux NVDIMM <nvdimm@lists.linux.dev>,
	Luis Chamberlain <mcgrof@suse.com>
Subject: Re: [RFT PATCH] x86/pat: Fix set_mce_nospec() for pmem
Date: Thu, 18 Nov 2021 19:03:59 +0000
Message-ID: <b51fb3d6-6d39-c450-e0a1-94a1645a22ec@oracle.com>
In-Reply-To: <CAPcyv4jBHnYtqoxoJY1NGNE1DXOv3bAg0gBzjZ=eOvarVXDRbA@mail.gmail.com>

On 11/13/2021 12:47 PM, Dan Williams wrote:
<snip>
>>> It should know because the MCE that unmapped the page will have
>>> communicated a "whole_page()" MCE. When dax_recovery_read() goes to
>>> consult the badblocks list to try to read the remaining good data it
>>> will see that every single cacheline is covered by badblocks, so
>>> nothing to read, and no need to establish the UC mapping. So the
>>> "Tony fix" was incomplete in retrospect. It neglected to update the
>>> NVDIMM badblocks tracking for the whole page case.
>>
>> So the call in nfit_handle_mce():
>>     nvdimm_bus_add_badrange(acpi_desc->nvdimm_bus,
>>                   ALIGN(mce->addr, L1_CACHE_BYTES),
>>                   L1_CACHE_BYTES);
>> should be replaced by
>>     nvdimm_bus_add_badrange(acpi_desc->nvdimm_bus,
>>                   ALIGN(mce->addr, L1_CACHE_BYTES),
>>                   (1ULL << MCI_MISC_ADDR_LSB(mce->misc)));
>> right?
> 
> Yes.
> 
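For reference, if I'm reading arch/x86/kernel/cpu/mce/internal.h right,
whole_page() derives the poison granularity from that same
MCI_MISC_ADDR_LSB field (reproduced here from memory, so it may drift
from mainline):

    static inline bool whole_page(struct mce *m)
    {
            if (!(m->status & MCI_STATUS_MISCV))
                    return true;

            return MCI_MISC_ADDR_LSB(m->misc) >= PAGE_SHIFT;
    }

So recording (1ULL << MCI_MISC_ADDR_LSB(mce->misc)) in badrange keeps
the badblocks bookkeeping consistent with what the #MC reported.
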
>>
>> And when dax_recovery_read() calls
>>     badblocks_check(bb, sector, len / 512, &first_bad, &num_bad)
>> it should, in the 'NP' (whole-page) case, always discover that
>> 'first_bad' is the first sector in the poisoned page, hence there is
>> no need to switch to 'UC', right?
> 
> Yes.
> 
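In dax_recovery_read() that check might look something like this (a
minimal sketch, not actual code; 'bb', 'sector' and 'len' stand in for
whatever the pmem driver hands us):

    sector_t first_bad;
    int num_bad;

    if (badblocks_check(bb, sector, len / 512, &first_bad, &num_bad) &&
        first_bad == sector && num_bad >= len / 512) {
            /* the whole range is poisoned, nothing to salvage */
            return -EIO;
    }
    /* otherwise some sectors are clean, worth mapping and reading */
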
>>
>> In case 'first_bad' falls in the middle of the poisoned page, that
>> is, dax_recovery_read() could potentially read some clean sectors,
>> is there a problem with
>>     calling _set_memory_UC(pfn, 1),
>>     doing the mc_safe read,
>>     and then calling set_memory_NP(pfn, 1)?
>> Why do we need to call ioremap() or vmap()?
> 
> I'm worried about concurrent operations and enabling access to threads
> outside of the one currently in dax_recovery_read(). If a local vmap()
> / ioremap() is used it effectively makes the access thread local.
> There might still need to be an rwsem to allow dax_recovery_write() to
> fixup the pfn access and synchronize with dax_recovery_read()
> operations.
> 
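I see.  So the read side might look roughly like this (a hypothetical
sketch, assuming copy_mc_to_kernel() as the mc-safe primitive, with
'dst', 'offset', 'bytes' and 'pfn' as in the surrounding read path):

    struct page *page = pfn_to_page(pfn);
    void *kaddr;
    size_t rem;

    /* a UC alias private to this thread, the kernel map stays 'NP' */
    kaddr = vmap(&page, 1, VM_MAP, pgprot_noncached(PAGE_KERNEL));
    if (!kaddr)
            return -ENOMEM;

    /* copy_mc_to_kernel() returns the number of bytes NOT copied */
    rem = copy_mc_to_kernel(dst, kaddr + offset, bytes);
    vunmap(kaddr);
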

<snip>
>> I didn't even know that a guest could clear poison by trapping into
>> the hypervisor with the Clear Error DSM method;
> 
> The guest can call the Clear Error DSM if the virtual BIOS provides
> it. Whether that actually clears errors or not is up to the
> hypervisor.
> 
>> I thought the guest wasn't privileged to do that.
> 
> The guest does not have access to the bare metal DSM path, but the
> hypervisor can certainly offer translation service for that operation.
> 
>> Would you mind elaborating on the mechanism, maybe pointing out the
>> code, and perhaps sharing a test case if you have one?
> 
> I don't have a test case because until Tony's fix I did not realize
> that a virtual #MC would allow the guest to learn of poisoned
> locations without necessarily allowing the guest to trigger actual
> poison consumption.
> 
> In other words I was operating under the assumption that telling
> guests where poison is located is potentially handing the guest a way
> to DoS the VMM. However, Tony's fix shows that when the hypervisor
> unmaps the guest physical page it can prevent the guest from accessing
> it again. So it follows that it should be ok to inject virtual #MC to
> the guest, and unmap the guest physical range, but later allow that
> guest physical range to be repaired if the guest asks the hypervisor
> to repair the page.
> 
> Tony, does this match your understanding?
> 
>>
>> but I'm not sure what to do about
>>> guests that later want to use MOVDIR64B to clear errors.
>>>
>>
>> Yeah, perhaps there is no way to prevent a guest from accidentally
>> clearing errors via MOVDIR64B, given that some applications rely on
>> MOVDIR64B for fast data movement (straight to the media). I guess in
>> that case the consequence is a false alarm, nothing disastrous, right?
> 
> You'll just continue to get false positive failures because the error
> tracking will be out-of-sync with reality.
> 
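Right.  For what it's worth, the kernel's own wrapper (from
arch/x86/include/asm/special_insns.h, reproduced from memory so it may
not match mainline exactly) shows there is nothing to intercept, it is
just a 64-byte direct store the guest can issue at will:

    static inline void movdir64b(void __iomem *dst, const void *src)
    {
            const struct { char _[64]; } *__src = src;
            struct { char _[64]; } __iomem *__dst = dst;

            /* MOVDIR64B [rdx], rax: 64-byte atomic direct store */
            asm volatile(".byte 0x66, 0x0f, 0x38, 0xf8, 0x02"
                         : "+m" (*__dst)
                         : "m" (*__src), "d" (__src), "a" (__dst));
    }
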
>> How about allowing a potential bad-block bookkeeping gap, and then
>> closing the gap at certain checkpoints? I guess one of the
>> checkpoints might be when a page fault discovers a poisoned page?
> 
> Not sure how that would work... it's already the case that new error
> entries are appended to the list at #MC time, the problem is knowing
> when to clear those stale entries. I still think that needs to be at
> dax_recovery_write() time.
> 
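Agreed.  Then dax_recovery_write() might end with something like this
(a hypothetical sketch reusing the existing nvdimm_clear_poison() and
clear_mce_nospec() helpers; declarations and error handling omitted):

    /* after the recovery write has pushed fresh data to the media */
    cleared = nvdimm_clear_poison(dev, phys, len);
    if (cleared == len) {
            badblocks_clear(bb, sector, len / 512);
            clear_mce_nospec(pfn);  /* NP -> WB, page usable again */
    }
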

Thanks, Dan, for taking the time to elaborate in such detail!

After some digging, I have a feeling that we need to tackle
dax error handling in phases.

Phase-1: the simplest dax_recovery_write on page granularity, along
          with the fix to set the poisoned page to 'NP', and
          serialization of dax_recovery_write threads.
Phase-2: dax_recovery_read support, which shrinks the error-recovery
          granularity.  One complication: ioremap() returns an __iomem
          pointer that may only be dereferenced via helpers like
          readl(), and those have no mc_safe variant; I'm not sure
          whether there should be one.  The synchronization between
          dax_recovery_read and dax_recovery_write threads also needs
          sorting out (rough locking sketch below).
Phase-3: the hypervisor error-record keeping issue; assuming there
          really is an issue, I'll need to figure out how to set up a
          test case.
Phase-4: the how-to-mitigate-MOVDIR64B-false-alarms issue.
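For the synchronization pieces, I am picturing the rwsem you suggested,
something like (rough sketch only, fragments in the respective paths):

    static DECLARE_RWSEM(dax_recovery_rwsem);

    /* readers may salvage clean sectors concurrently */
    down_read(&dax_recovery_rwsem);
    rem = copy_mc_to_kernel(dst, src, bytes);
    up_read(&dax_recovery_rwsem);

    /* the writer repairs poison and fixes up pfn access exclusively */
    down_write(&dax_recovery_rwsem);
    /* clear poison, update badblocks, restore the default mapping */
    up_write(&dax_recovery_rwsem);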

Right now it seems to me that delivering the Phase-1 solution is the
urgent part, to give customers something they can rely on.

How does this sound to you?

thanks,
-jane
