From: Dan Williams <dan.j.williams@intel.com>
To: Andy Lutomirski <luto@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>,
	linux-nvdimm <linux-nvdimm@lists.01.org>,
	Peter Zijlstra <peterz@infradead.org>, X86 ML <x86@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Thomas Gleixner <tglx@linutronix.de>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH 0/6] use memcpy_mcsafe() for copy_to_iter()
Date: Wed, 2 May 2018 10:47:12 -0700
Message-ID: <CAPcyv4jNkPjYLitvkw4mzXpm650rcYA7GJ6++psp_FZLRshE0g@mail.gmail.com>
In-Reply-To: <CALCETrXxnaj+YL_NDM1u0tM9v6p8ZtQw62n+y4Tv4ScB0DdZPw@mail.gmail.com>

On Wed, May 2, 2018 at 9:19 AM, Andy Lutomirski <luto@kernel.org> wrote:
> On Tue, May 1, 2018 at 8:34 PM Linus Torvalds
> <torvalds@linux-foundation.org>
> wrote:
>
>> On Tue, May 1, 2018 at 8:22 PM Dan Williams <dan.j.williams@intel.com>
>> wrote:
>
>> > All that to say that having a typical RAM page covering poisoned pmem
>> > would complicate the 'clear badblocks' implementation.
>
>> Ugh, ok.
>
>> I guess the good news is that your patches aren't so big, and don't really
>> affect anything else.
>
>
> I pondered this a bit.  Doing better might be a big pain in the arse.  The
> interesting case is where ordinary kernel code (memcpy, plain old memory
> operands, etc) access faulty pmem.  This means that there's no extable
> entry around.  If we actually try to recover, we have a few problems:
>
>   - We can't sanely skip the instruction without causing random errors.
>
>   - If the access was through the kernel direct map, then we could plausibly
> remap a different page in place of the faulty page.  The problem is that,
> if the page is *writable* and we share it between more than one faulty
> page, then we're enabling a giant information leak.  But we still need to
> figure out how we're supposed to invalidate the old mapping from a random,
> potentially atomic context.
>
>   - If the access is through kmap or similar, then we're talking about
> modifying a PTE out from under kernel code that really isn't expecting us
> to modify it.
>
>   - How are we supposed to signal the process or fail a syscall?  The fault
> could have come from interrupt context, softirq context, kernel thread
> context, etc, and figuring out who's to blame seems quite awkward and
> fragile.
>
> All that being said, I suspect that we still have issues even with accesses
> to user VAs that are protected by extable entries.  The whole #MC mechanism
> is a supremely shitty interface for recoverable errors (especially on
> Intel), and I'm a bit scared of what happens if the offending access is,
> say, inside a perf NMI.
>
> Dan, is there any chance you could put some pressure on the architecture
> folks to invent an entirely new, less shitty way to tell the OS about
> recoverable memory errors?  And to make it testable by normal people?
> Needing big metal EINJ hardware to test the house of cards that is #MC is
> just awful and means that there are few enough kernel developers that are
> actually able to test that I can probably count them on one hand.  And I'm
> not one of them...

I feel this testing pain too. The EINJ facility is not ubiquitous,
which is why I punted and wrote patch 6 to unit test this. You're
right that this does not scale to all the places where we'd like to
be able to safely handle memory errors in the kernel.
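
For anyone following along without EINJ hardware, the contract that
patch 2/6 ("return bytes remaining") gives memcpy_mcsafe() can be
modeled in plain userspace C. This is only a minimal sketch and not the
kernel implementation: copy_mcsafe_model(), NO_POISON and the
poison_off parameter are hypothetical names, and the machine check is
faked by having the caller say where the first bad source byte sits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical marker meaning "no poisoned byte in the source". */
#define NO_POISON ((size_t)-1)

/*
 * Userspace model of a copy that stops at a memory error.
 *
 * poison_off is an assumption made for illustration: the offset into
 * src of the first byte that would raise a machine check. A real
 * implementation learns this from the exception, not from a parameter.
 *
 * Returns the number of bytes NOT copied (0 on full success), which is
 * the "bytes remaining" convention.
 */
static size_t copy_mcsafe_model(void *dst, const void *src, size_t len,
                                size_t poison_off)
{
	size_t ok;

	assert(dst != NULL && src != NULL);

	/* Only the bytes in front of the poisoned one can be copied. */
	ok = (poison_off < len) ? poison_off : len;
	memcpy(dst, src, ok);

	return len - ok;
}
```

A caller can then turn a non-zero return value into a short read
instead of the whole operation failing, which is roughly what
copy_to_iter() users would want to see.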
