From: Jane Chu <jane.chu@oracle.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Borislav Petkov <bp@alien8.de>,
"Luck, Tony" <tony.luck@intel.com>,
Linux NVDIMM <nvdimm@lists.linux.dev>,
Luis Chamberlain <mcgrof@suse.com>
Subject: Re: [RFT PATCH] x86/pat: Fix set_mce_nospec() for pmem
Date: Thu, 18 Nov 2021 19:03:59 +0000
Message-ID: <b51fb3d6-6d39-c450-e0a1-94a1645a22ec@oracle.com>
In-Reply-To: <CAPcyv4jBHnYtqoxoJY1NGNE1DXOv3bAg0gBzjZ=eOvarVXDRbA@mail.gmail.com>
On 11/13/2021 12:47 PM, Dan Williams wrote:
<snip>
>>> It should know because the MCE that unmapped the page will have
>>> communicated a "whole_page()" MCE. When dax_recovery_read() goes to
>>> consult the badblocks list to try to read the remaining good data it
>>> will see that every single cacheline is covered by badblocks, so
>>> nothing to read, and no need to establish the UC mapping. So the
>>> "Tony fix" was incomplete in retrospect. It neglected to update the
>>> NVDIMM badblocks tracking for the whole page case.
>>
>> So the call in nfit_handle_mce():
>> nvdimm_bus_add_badrange(acpi_desc->nvdimm_bus,
>> ALIGN(mce->addr, L1_CACHE_BYTES),
>> L1_CACHE_BYTES);
>> should be replaced by
>> nvdimm_bus_add_badrange(acpi_desc->nvdimm_bus,
>> ALIGN(mce->addr, L1_CACHE_BYTES),
>> (1 << MCI_MISC_ADDR_LSB(mce->misc)));
>> right?
>
> Yes.
>
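To make the granularity change above concrete, here is a minimal
userspace sketch (not the actual nfit code). The helpers
poison_granularity() and badrange_start() are hypothetical;
MCI_MISC_ADDR_LSB() mirrors the x86 definition (bits 5:0 of MCi_MISC),
and the misc_valid flag stands in for checking MCI_STATUS_MISCV. One
assumption worth flagging: the kernel's ALIGN() rounds up, so here the
start is rounded down (as ALIGN_DOWN() would) so that the recorded
range actually contains the faulting address.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the x86 definition: bits 5:0 of MCi_MISC give the least
 * significant valid bit of the reported address. */
#define MCI_MISC_ADDR_LSB(m) ((m) & 0x3f)
#define L1_CACHE_BYTES 64ULL

/* Hypothetical helper: size of the poisoned region in bytes.
 * Falls back to one cacheline when MCi_MISC is not valid
 * (misc_valid stands in for MCI_STATUS_MISCV). */
static uint64_t poison_granularity(uint64_t misc, int misc_valid)
{
	if (!misc_valid)
		return L1_CACHE_BYTES;
	return 1ULL << MCI_MISC_ADDR_LSB(misc);
}

/* Hypothetical helper: start of the bad range. len is a power of two,
 * and rounding DOWN keeps addr inside [start, start + len). */
static uint64_t badrange_start(uint64_t addr, uint64_t len)
{
	return addr & ~(len - 1);
}
```

For example, a whole-page report (LSB 12) at 0x12345 yields the range
[0x12000, 0x13000), while a cacheline report yields [0x12340, 0x12380).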
>>
>> And when dax_recovery_read() calls
>> badblocks_check(bb, sector, len / 512, &first_bad, &num_bad)
>> it should always, in case of 'NP', discover that 'first_bad'
>> is the first sector in the poisoned page, hence no need
>> to switch to 'UC', right?
>
> Yes.
>
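A simplified model of that badblocks_check() lookup, assuming a single
recorded bad range and 512-byte sectors. struct bad_range,
model_badblocks_check() and readable_bytes() are hypothetical names for
illustration; the real badblocks_check() works on struct badblocks and
also distinguishes acknowledged from unacknowledged entries.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512ULL

/* Hypothetical, simplified stand-in for the kernel's badblocks list:
 * a single recorded bad range, in 512-byte sectors. */
struct bad_range {
	uint64_t first;	/* first bad sector */
	uint64_t count;	/* number of bad sectors */
};

/* Returns 1 and fills *first_bad / *num_bad if [s, s + n) overlaps the
 * bad range, otherwise returns 0. Loosely modeled on badblocks_check(). */
static int model_badblocks_check(const struct bad_range *bb, uint64_t s,
				 uint64_t n, uint64_t *first_bad,
				 uint64_t *num_bad)
{
	uint64_t end = s + n, bad_end = bb->first + bb->count;

	if (bb->count == 0 || end <= bb->first || s >= bad_end)
		return 0;
	*first_bad = bb->first;
	*num_bad = bb->count;
	return 1;
}

/* Hypothetical: number of clean bytes at the start of the request that
 * can be read before reaching the first bad sector. */
static uint64_t readable_bytes(const struct bad_range *bb, uint64_t offset,
			       uint64_t len)
{
	uint64_t sector = offset / SECTOR_SIZE, first_bad, num_bad;

	if (!model_badblocks_check(bb, sector, len / SECTOR_SIZE,
				   &first_bad, &num_bad))
		return len;
	if (first_bad <= sector)
		return 0;	/* nothing to read: no UC mapping needed */
	return (first_bad - sector) * SECTOR_SIZE;
}
```

With the whole page recorded bad, readable_bytes() returns 0, so there
is nothing to read and no reason to set up a UC mapping; with only
sector 68 bad in a page starting at sector 64, the first 2048 bytes are
still readable.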
>>
>> In case the 'first_bad' is in the middle of the poisoned page,
>> that is, dax_recovery_read() could potentially read some clean
>> sectors, is there a problem with
>> calling _set_memory_UC(pfn, 1),
>> doing the mc_safe read,
>> and then calling set_memory_NP(pfn, 1)
>> ?
>> Why do we need to call ioremap() or vmap()?
>
> I'm worried about concurrent operations and enabling access to threads
> outside of the one currently in dax_recovery_read(). If a local vmap()
> / ioremap() is used it effectively makes the access thread local.
> There might still need to be an rwsem to allow dax_recovery_write() to
> fix up the pfn access and synchronize with dax_recovery_read()
> operations.
>
<snip>
>> I didn't even know that a guest could clear poison by trapping to the
>> hypervisor with the Clear Error DSM method,
>
> The guest can call the Clear Error DSM if the virtual BIOS provides
> it. Whether that actually clears errors or not is up to the
> hypervisor.
>
>> I thought the guest isn't privileged to do that.
>
> The guest does not have access to the bare metal DSM path, but the
> hypervisor can certainly offer translation service for that operation.
>
>> Would you mind elaborating on the mechanism, maybe pointing out
>> the code, and perhaps sharing a test case if you have one?
>
> I don't have a test case because until Tony's fix I did not realize
> that a virtual #MC would allow the guest to learn of poisoned
> locations without necessarily allowing the guest to trigger actual
> poison consumption.
>
> In other words I was operating under the assumption that telling
> guests where poison is located is potentially handing the guest a way
> to DoS the VMM. However, Tony's fix shows that when the hypervisor
> unmaps the guest physical page it can prevent the guest from accessing
> it again. So it follows that it should be ok to inject virtual #MC to
> the guest, and unmap the guest physical range, but later allow that
> guest physical range to be repaired if the guest asks the hypervisor
> to repair the page.
>
> Tony, does this match your understanding?
>
>>
>>> but I'm not sure what to do about
>>> guests that later want to use MOVDIR64B to clear errors.
>>>
>>
>> Yeah, perhaps there is no way to prevent a guest from accidentally
>> clearing errors via MOVDIR64B, given that some applications rely on
>> MOVDIR64B for fast data movement (straight to the media). I guess in
>> that case the consequence is a false alarm, nothing disastrous, right?
>
> You'll just continue to get false positive failures because the error
> tracking will be out-of-sync with reality.
>
>> How about tolerating a potential gap in the bad-block bookkeeping,
>> and closing the gap at certain checkpoints? I guess one of the
>> checkpoints might be when a page fault discovers a poisoned
>> page?
>
> Not sure how that would work... it's already the case that new error
> entries are appended to the list at #MC time, the problem is knowing
> when to clear those stale entries. I still think that needs to be at
> dax_recovery_write() time.
>
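A sketch of that bookkeeping lifecycle, assuming one bit per 512-byte
sector of a 4 KiB page. struct page_badblocks, mce_record(),
clear_poison() and recovery_write_clear() are hypothetical;
clear_poison() stands in for whatever actually repairs the media (for
example the Clear Error DSM) and simply succeeds here. The point is the
ordering: entries are added at #MC time and dropped only after a
recovery write has really cleared the error.

```c
#include <assert.h>
#include <stdint.h>

#define SECTORS_PER_PAGE 8	/* 4096-byte page / 512-byte sectors */

/* Hypothetical per-page bad-sector tracking, one bit per sector. */
struct page_badblocks {
	uint8_t bad;	/* bit i set => sector i is recorded as poisoned */
};

/* #MC time: append the reported sectors to the tracking. */
static void mce_record(struct page_badblocks *pb, unsigned int first,
		       unsigned int count)
{
	for (unsigned int i = first; i < first + count && i < SECTORS_PER_PAGE; i++)
		pb->bad |= (uint8_t)(1u << i);
}

/* Hypothetical stand-in for the driver's clear-poison operation;
 * assumed to succeed in this sketch. */
static int clear_poison(unsigned int first, unsigned int count)
{
	(void)first;
	(void)count;
	return 0;
}

/* dax_recovery_write() time: drop the now-stale entries only after the
 * media error has actually been cleared. */
static int recovery_write_clear(struct page_badblocks *pb, unsigned int first,
				unsigned int count)
{
	int rc = clear_poison(first, count);

	if (rc)
		return rc;	/* keep the entries: the media is still bad */
	for (unsigned int i = first; i < first + count && i < SECTORS_PER_PAGE; i++)
		pb->bad &= (uint8_t)~(1u << i);
	return 0;
}
```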
Thanks Dan for taking the time to elaborate in so much detail!
After some digging, I have a feeling that we need to tackle
dax error handling in phases.
Phase-1: the simplest dax_recovery_write() at page granularity, along
         with the fix to set the poisoned page to 'NP', and serialization
         of dax_recovery_write() threads.
Phase-2: provide dax_recovery_read() support and hence shrink the error
         recovery granularity. One open issue: ioremap() returns an
         __iomem pointer that may only be dereferenced through helpers
         like readl(), which have no mc_safe variant, and I'm not sure
         whether there should be one. Another is the synchronization
         between dax_recovery_read() and dax_recovery_write() threads.
Phase-3: the hypervisor error-record-keeping issue; assuming there is
         such an issue, I'll need to figure out how to set up a test case.
Phase-4: how to mitigate MOVDIR64B false alarms.
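A minimal sketch of what the Phase-1 constraints could look like,
assuming a single global mutex to serialize recovery writers and a rule
that requests must be page aligned and cover whole pages.
phase1_recovery_write(), recovery_lock and pages_repaired are
hypothetical; the actual poison clearing, data write, remapping from
'NP' back to write-back, and badblocks update are left as a placeholder
comment.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

#define PAGE_BYTES 4096ULL

/* Hypothetical global lock serializing Phase-1 recovery writers. */
static pthread_mutex_t recovery_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long pages_repaired;

/* Phase-1 sketch: recovery happens only at page granularity, so the
 * request must start on a page boundary and cover whole pages. */
static int phase1_recovery_write(uint64_t offset, uint64_t len)
{
	if (offset % PAGE_BYTES || len == 0 || len % PAGE_BYTES)
		return -EINVAL;

	pthread_mutex_lock(&recovery_lock);
	/* Placeholder for: clear poison, write the data, restore the
	 * mapping from 'NP' to write-back, drop badblocks entries. */
	pages_repaired += len / PAGE_BYTES;
	pthread_mutex_unlock(&recovery_lock);
	return 0;
}
```

A global lock is the simplest way to serialize writers; a per-page or
per-range lock would allow more concurrency at the cost of complexity.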
Right now, it seems to me that providing the Phase-1 solution is
urgent, to give customers something they can rely on.
How does this sound to you?
thanks,
-jane