All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dan Williams <dan.j.williams@intel.com>
To: "Luck, Tony" <tony.luck@intel.com>
Cc: Jan Kara <jack@suse.cz>, linux-nvdimm <linux-nvdimm@lists.01.org>,
	X86 ML <x86@kernel.org>, Linux MM <linux-mm@kvack.org>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	"H. Peter Anvin" <hpa@zytor.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Christoph Hellwig <hch@lst.de>,
	linux-edac@vger.kernel.org
Subject: Re: [PATCH v2 07/11] x86, memory_failure: Introduce {set, clear}_mce_nospec()
Date: Mon, 4 Jun 2018 11:35:06 -0700	[thread overview]
Message-ID: <CAPcyv4ggLr780-bQWVFqh-iC3z+coGwRnd-bM-nwWfXC+mVgfg@mail.gmail.com> (raw)
In-Reply-To: <20180604180845.GA17942@agluck-desk>

On Mon, Jun 4, 2018 at 11:08 AM, Luck, Tony <tony.luck@intel.com> wrote:
> On Mon, Jun 04, 2018 at 10:39:48AM -0700, Dan Williams wrote:
>> On Mon, Jun 4, 2018 at 10:08 AM, Luck, Tony <tony.luck@intel.com> wrote:
>> > On Sat, Jun 02, 2018 at 10:23:20PM -0700, Dan Williams wrote:
>> >> +static inline int set_mce_nospec(unsigned long pfn)
>> >> +{
>> >> +     int rc;
>> >> +
>> >> +     rc = set_memory_uc((unsigned long) __va(PFN_PHYS(pfn)), 1);
>> >
>> > You should really do the decoy_addr thing here that I had in mce_unmap_kpfn().
>> > Putting the virtual address of the page you mustn't accidentally prefetch
>> > from into a register is a pretty good way to make sure that the processor
>> > does do a prefetch.
>>
>> Maybe I'm misreading, but doesn't that make the page completely
>> inaccessible? We still want to read pmem through the driver and the
>> linear mapping with memcpy_mcsafe(). Alternatively I could just drop
>> this patch and setup a private / alias mapping for the pmem driver to
>> use. It seems aliased mappings would be the safer option, but I want
>> to make sure I've comprehended your suggestion correctly?
>
> I'm OK with the call to set_memory_uc() to make this uncacheable
> instead of set_memory_np() to make it inaccessible.
>
> The problem is how to achieve that.
>
> The result of __va(PFN_PHYS(pfn) is the virtual address where the poison
> page is currently mapped into the kernel. That value gets put into
> register %rdi to make the call to set_memory_uc() (which goes on to
> call a bunch of other functions passing the virtual address along
> the way).
>
> Now imagine an impatient super-speculative processor is waiting for
> some result to decide where to jump next, and picks a path that isn't
> going to be taken ... out in the weeds somewhere it runs into:
>
>         movzbl  (%rdi), %eax
>
> Oops ... now you just read from the address you were trying to
> avoid. So we log an error. Eventually the speculation gets sorted
> out and the processor knows not to signal a machine check. But the
> log is sitting in a machine check bank waiting to cause an overflow
> if we try to log a second error.
>
> The decoy_addr trick in mce_unmap_kpfn() throws in the high bit
> to the address passed.  The set_memory_np() code (and I assume the
> set_memory_uc()) code ignores it, but it means any stray speculative
> access won't point at the poison page.
>
> -Tony
>
> Note: this is *mostly* a problem if the poison is in the first
> cache line of the page.  But you could hit other lines if the
> instruction you speculatively ran into had the right offset. E.g.
> to hit the third line:
>
>         movzbl  128(%rdi), %eax

Ok, makes sense and I do see now that this decoy resolves to the same
physical address once PTE_PFN_MASK is applied when we start messing
with page tables.

However, set_memory_uc() is currently not prepared for this trick as
it specifies the unmasked physical address to reserve_memtype().

        ret = reserve_memtype(__pa(addr), __pa(addr) + numpages * PAGE_SIZE,
                              _PAGE_CACHE_MODE_UC_MINUS, NULL);

...compared to set_memory_np() which does not manipulate the memtype tracking.

I'll fix up reserve_memtype() and free_memtype() to be prepared for
decoy addresses.
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

WARNING: multiple messages have this Message-ID (diff)
From: Dan Williams <dan.j.williams@intel.com>
To: "Luck, Tony" <tony.luck@intel.com>
Cc: linux-nvdimm <linux-nvdimm@lists.01.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	Borislav Petkov <bp@alien8.de>,
	linux-edac@vger.kernel.org, X86 ML <x86@kernel.org>,
	Christoph Hellwig <hch@lst.de>, Linux MM <linux-mm@kvack.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Jan Kara <jack@suse.cz>
Subject: [v2,07/11] x86, memory_failure: Introduce {set, clear}_mce_nospec()
Date: Mon, 4 Jun 2018 11:35:06 -0700	[thread overview]
Message-ID: <CAPcyv4ggLr780-bQWVFqh-iC3z+coGwRnd-bM-nwWfXC+mVgfg@mail.gmail.com> (raw)

On Mon, Jun 4, 2018 at 11:08 AM, Luck, Tony <tony.luck@intel.com> wrote:
> On Mon, Jun 04, 2018 at 10:39:48AM -0700, Dan Williams wrote:
>> On Mon, Jun 4, 2018 at 10:08 AM, Luck, Tony <tony.luck@intel.com> wrote:
>> > On Sat, Jun 02, 2018 at 10:23:20PM -0700, Dan Williams wrote:
>> >> +static inline int set_mce_nospec(unsigned long pfn)
>> >> +{
>> >> +     int rc;
>> >> +
>> >> +     rc = set_memory_uc((unsigned long) __va(PFN_PHYS(pfn)), 1);
>> >
>> > You should really do the decoy_addr thing here that I had in mce_unmap_kpfn().
>> > Putting the virtual address of the page you mustn't accidentally prefetch
>> > from into a register is a pretty good way to make sure that the processor
>> > does do a prefetch.
>>
>> Maybe I'm misreading, but doesn't that make the page completely
>> inaccessible? We still want to read pmem through the driver and the
>> linear mapping with memcpy_mcsafe(). Alternatively I could just drop
>> this patch and setup a private / alias mapping for the pmem driver to
>> use. It seems aliased mappings would be the safer option, but I want
>> to make sure I've comprehended your suggestion correctly?
>
> I'm OK with the call to set_memory_uc() to make this uncacheable
> instead of set_memory_np() to make it inaccessible.
>
> The problem is how to achieve that.
>
> The result of __va(PFN_PHYS(pfn) is the virtual address where the poison
> page is currently mapped into the kernel. That value gets put into
> register %rdi to make the call to set_memory_uc() (which goes on to
> call a bunch of other functions passing the virtual address along
> the way).
>
> Now imagine an impatient super-speculative processor is waiting for
> some result to decide where to jump next, and picks a path that isn't
> going to be taken ... out in the weeds somewhere it runs into:
>
>         movzbl  (%rdi), %eax
>
> Oops ... now you just read from the address you were trying to
> avoid. So we log an error. Eventually the speculation gets sorted
> out and the processor knows not to signal a machine check. But the
> log is sitting in a machine check bank waiting to cause an overflow
> if we try to log a second error.
>
> The decoy_addr trick in mce_unmap_kpfn() throws in the high bit
> to the address passed.  The set_memory_np() code (and I assume the
> set_memory_uc()) code ignores it, but it means any stray speculative
> access won't point at the poison page.
>
> -Tony
>
> Note: this is *mostly* a problem if the poison is in the first
> cache line of the page.  But you could hit other lines if the
> instruction you speculatively ran into had the right offset. E.g.
> to hit the third line:
>
>         movzbl  128(%rdi), %eax

Ok, makes sense and I do see now that this decoy resolves to the same
physical address once PTE_PFN_MASK is applied when we start messing
with page tables.

However, set_memory_uc() is currently not prepared for this trick as
it specifies the unmasked physical address to reserve_memtype().

        ret = reserve_memtype(__pa(addr), __pa(addr) + numpages * PAGE_SIZE,
                              _PAGE_CACHE_MODE_UC_MINUS, NULL);

...compared to set_memory_np() which does not manipulate the memtype tracking.

I'll fix up reserve_memtype() and free_memtype() to be prepared for
decoy addresses.
---
To unsubscribe from this list: send the line "unsubscribe linux-edac" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

WARNING: multiple messages have this Message-ID (diff)
From: Dan Williams <dan.j.williams@intel.com>
To: "Luck, Tony" <tony.luck@intel.com>
Cc: linux-nvdimm <linux-nvdimm@lists.01.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	Borislav Petkov <bp@alien8.de>,
	linux-edac@vger.kernel.org, X86 ML <x86@kernel.org>,
	Christoph Hellwig <hch@lst.de>, Linux MM <linux-mm@kvack.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Jan Kara <jack@suse.cz>
Subject: Re: [PATCH v2 07/11] x86, memory_failure: Introduce {set, clear}_mce_nospec()
Date: Mon, 4 Jun 2018 11:35:06 -0700	[thread overview]
Message-ID: <CAPcyv4ggLr780-bQWVFqh-iC3z+coGwRnd-bM-nwWfXC+mVgfg@mail.gmail.com> (raw)
In-Reply-To: <20180604180845.GA17942@agluck-desk>

On Mon, Jun 4, 2018 at 11:08 AM, Luck, Tony <tony.luck@intel.com> wrote:
> On Mon, Jun 04, 2018 at 10:39:48AM -0700, Dan Williams wrote:
>> On Mon, Jun 4, 2018 at 10:08 AM, Luck, Tony <tony.luck@intel.com> wrote:
>> > On Sat, Jun 02, 2018 at 10:23:20PM -0700, Dan Williams wrote:
>> >> +static inline int set_mce_nospec(unsigned long pfn)
>> >> +{
>> >> +     int rc;
>> >> +
>> >> +     rc = set_memory_uc((unsigned long) __va(PFN_PHYS(pfn)), 1);
>> >
>> > You should really do the decoy_addr thing here that I had in mce_unmap_kpfn().
>> > Putting the virtual address of the page you mustn't accidentally prefetch
>> > from into a register is a pretty good way to make sure that the processor
>> > does do a prefetch.
>>
>> Maybe I'm misreading, but doesn't that make the page completely
>> inaccessible? We still want to read pmem through the driver and the
>> linear mapping with memcpy_mcsafe(). Alternatively I could just drop
>> this patch and setup a private / alias mapping for the pmem driver to
>> use. It seems aliased mappings would be the safer option, but I want
>> to make sure I've comprehended your suggestion correctly?
>
> I'm OK with the call to set_memory_uc() to make this uncacheable
> instead of set_memory_np() to make it inaccessible.
>
> The problem is how to achieve that.
>
> The result of __va(PFN_PHYS(pfn) is the virtual address where the poison
> page is currently mapped into the kernel. That value gets put into
> register %rdi to make the call to set_memory_uc() (which goes on to
> call a bunch of other functions passing the virtual address along
> the way).
>
> Now imagine an impatient super-speculative processor is waiting for
> some result to decide where to jump next, and picks a path that isn't
> going to be taken ... out in the weeds somewhere it runs into:
>
>         movzbl  (%rdi), %eax
>
> Oops ... now you just read from the address you were trying to
> avoid. So we log an error. Eventually the speculation gets sorted
> out and the processor knows not to signal a machine check. But the
> log is sitting in a machine check bank waiting to cause an overflow
> if we try to log a second error.
>
> The decoy_addr trick in mce_unmap_kpfn() throws in the high bit
> to the address passed.  The set_memory_np() code (and I assume the
> set_memory_uc()) code ignores it, but it means any stray speculative
> access won't point at the poison page.
>
> -Tony
>
> Note: this is *mostly* a problem if the poison is in the first
> cache line of the page.  But you could hit other lines if the
> instruction you speculatively ran into had the right offset. E.g.
> to hit the third line:
>
>         movzbl  128(%rdi), %eax

Ok, makes sense and I do see now that this decoy resolves to the same
physical address once PTE_PFN_MASK is applied when we start messing
with page tables.

However, set_memory_uc() is currently not prepared for this trick as
it specifies the unmasked physical address to reserve_memtype().

        ret = reserve_memtype(__pa(addr), __pa(addr) + numpages * PAGE_SIZE,
                              _PAGE_CACHE_MODE_UC_MINUS, NULL);

...compared to set_memory_np() which does not manipulate the memtype tracking.

I'll fix up reserve_memtype() and free_memtype() to be prepared for
decoy addresses.

  reply	other threads:[~2018-06-04 18:35 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-03  5:22 [PATCH v2 00/11] mm: Teach memory_failure() about ZONE_DEVICE pages Dan Williams
2018-06-03  5:22 ` Dan Williams
2018-06-03  5:22 ` Dan Williams
2018-06-03  5:22 ` [PATCH v2 01/11] device-dax: Convert to vmf_insert_mixed and vm_fault_t Dan Williams
2018-06-03  5:22   ` Dan Williams
2018-06-03  5:22 ` [PATCH v2 02/11] device-dax: Cleanup vm_fault de-reference chains Dan Williams
2018-06-03  5:22   ` Dan Williams
2018-06-03  5:22 ` [PATCH v2 03/11] device-dax: Enable page_mapping() Dan Williams
2018-06-03  5:22   ` Dan Williams
2018-06-03  5:23 ` [PATCH v2 04/11] device-dax: Set page->index Dan Williams
2018-06-03  5:23   ` Dan Williams
2018-06-03  5:23 ` [PATCH v2 05/11] filesystem-dax: " Dan Williams
2018-06-03  5:23   ` Dan Williams
2018-06-03  5:23   ` Dan Williams
2018-06-03  5:23 ` [PATCH v2 06/11] mm, madvise_inject_error: Let memory_failure() optionally take a page reference Dan Williams
2018-06-03  5:23   ` Dan Williams
2018-06-03  5:23 ` [PATCH v2 07/11] x86, memory_failure: Introduce {set, clear}_mce_nospec() Dan Williams
2018-06-03  5:23   ` Dan Williams
2018-06-03  5:23   ` [v2,07/11] " Dan Williams
2018-06-04 17:08   ` [PATCH v2 07/11] " Luck, Tony
2018-06-04 17:08     ` [v2,07/11] " Luck, Tony
2018-06-04 17:39     ` [PATCH v2 07/11] " Dan Williams
2018-06-04 17:39       ` Dan Williams
2018-06-04 17:39       ` [v2,07/11] " Dan Williams
2018-06-04 18:08       ` [PATCH v2 07/11] " Luck, Tony
2018-06-04 18:08         ` Luck, Tony
2018-06-04 18:08         ` [v2,07/11] " Luck, Tony
2018-06-04 18:35         ` Dan Williams [this message]
2018-06-04 18:35           ` [PATCH v2 07/11] " Dan Williams
2018-06-04 18:35           ` [v2,07/11] " Dan Williams
2018-06-03  5:23 ` [PATCH v2 08/11] mm, memory_failure: Pass page size to kill_proc() Dan Williams
2018-06-03  5:23   ` Dan Williams
2018-06-03  5:23 ` [PATCH v2 09/11] mm, memory_failure: Fix page->mapping assumptions relative to the page lock Dan Williams
2018-06-03  5:23   ` Dan Williams
2018-06-03  5:23 ` [PATCH v2 10/11] mm, memory_failure: Teach memory_failure() about dev_pagemap pages Dan Williams
2018-06-03  5:23   ` Dan Williams
2018-06-03  5:23   ` Dan Williams
2018-06-03  5:23 ` [PATCH v2 11/11] libnvdimm, pmem: Restore page attributes when clearing errors Dan Williams
2018-06-03  5:23   ` Dan Williams
2018-06-04 12:40 ` [PATCH v2 00/11] mm: Teach memory_failure() about ZONE_DEVICE pages Michal Hocko
2018-06-04 12:40   ` Michal Hocko
2018-06-04 14:31   ` Dan Williams
2018-06-04 14:31     ` Dan Williams
2018-06-05 14:11     ` Michal Hocko
2018-06-05 14:11       ` Michal Hocko
2018-06-05 14:33       ` Dan Williams
2018-06-05 14:33         ` Dan Williams
2018-06-06  7:39         ` Michal Hocko
2018-06-06  7:39           ` Michal Hocko
2018-06-06 13:44           ` Dan Williams
2018-06-06 13:44             ` Dan Williams
2018-06-07 14:37             ` Michal Hocko
2018-06-07 14:37               ` Michal Hocko
2018-06-07 16:52               ` Dan Williams
2018-06-07 16:52                 ` Dan Williams
2018-06-11  7:50                 ` Michal Hocko
2018-06-11  7:50                   ` Michal Hocko
2018-06-11 14:44                   ` Dan Williams
2018-06-11 14:44                     ` Dan Williams
2018-06-11 14:56                     ` Michal Hocko
2018-06-11 14:56                       ` Michal Hocko
2018-06-11 15:19                       ` Dan Williams
2018-06-11 15:19                         ` Dan Williams
2018-06-11 17:35                         ` Andi Kleen
2018-06-11 17:35                           ` Andi Kleen
2018-06-12  1:50                         ` Naoya Horiguchi
2018-06-12  1:58                           ` Dan Williams
2018-06-12  1:58                             ` Dan Williams
2018-06-12  4:04                           ` Jane Chu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAPcyv4ggLr780-bQWVFqh-iC3z+coGwRnd-bM-nwWfXC+mVgfg@mail.gmail.com \
    --to=dan.j.williams@intel.com \
    --cc=bp@alien8.de \
    --cc=hch@lst.de \
    --cc=hpa@zytor.com \
    --cc=jack@suse.cz \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=mingo@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@intel.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.