linux-kernel.vger.kernel.org archive mirror
From: Andy Lutomirski <luto@amacapital.net>
To: Borislav Petkov <bp@alien8.de>
Cc: linux-nvdimm <linux-nvdimm@ml01.01.org>, X86 ML <x86@kernel.org>,
	"elliott@hpe.com" <elliott@hpe.com>,
	"Luck, Tony" <tony.luck@intel.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Williams, Dan J" <dan.j.williams@intel.com>,
	Ingo Molnar <mingo@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCHV5 3/3] x86, ras: Add __mcsafe_copy() function to recover from machine checks
Date: Sat, 26 Dec 2015 06:54:57 -0800
Message-ID: <CALCETrUWmT7jwMvcS+NgaRKc7wpoZ5f_dGT8no7dOWFAGvKtmQ@mail.gmail.com>
In-Reply-To: <20151226103252.GA21988@pd.tnic>

On Dec 26, 2015 6:33 PM, "Borislav Petkov" <bp@alien8.de> wrote:
>
> On Fri, Dec 25, 2015 at 08:05:39PM +0000, Luck, Tony wrote:
> > mce_in_kernel_recov() should check whether we have a fix up entry for
> > the specific IP that hit the machine check before rating the severity
> > as kernel recoverable.
>
> Yeah, it is not precise right now. But this is easy - I'll change it to
> a simpler version of fixup_mcexception() to iterate over the exception
> table.
>
> > If we add more functions (for different cache behaviour, or to
> > optimize for specific processor model) we can make sure to put them
> > all together inside begin/end labels.
>
> Yeah, I think we can do even better than that as all the info is in the
> ELF file already. For example, ENDPROC(__mcsafe_copy) generates
>
> .type __mcsafe_copy, @function ; .size __mcsafe_copy, .-__mcsafe_copy
>
> and there's the size of the function, I guess we can macroize something
> like that or even parse the ELF file:
>
> $ readelf --syms vmlinux | grep mcsafe
>    706: ffffffff819df73e    14 OBJECT  LOCAL  DEFAULT   11 __kstrtab___mcsafe_copy
>    707: ffffffff819d0e18     8 OBJECT  LOCAL  DEFAULT    9 __kcrctab___mcsafe_copy
>  56107: ffffffff819b3bb0    16 OBJECT  GLOBAL DEFAULT    7 __ksymtab___mcsafe_copy
>  58581: ffffffff812e6d70   179 FUNC    GLOBAL DEFAULT    1 __mcsafe_copy
>  62233: 000000003313f9d4     0 NOTYPE  GLOBAL DEFAULT  ABS __crc___mcsafe_copy
>  68818: ffffffff812e6e23     0 NOTYPE  GLOBAL DEFAULT    1 __mcsafe_copy_end
>
> __mcsafe_copy is of size 179 bytes:
>
> 0xffffffff812e6d70 + 179 = 0xffffffff812e6e23 which is __mcsafe_copy_end
> so those labels should not really be necessary as they're global and
> polluting the binary unnecessarily.
>
> > We would run into trouble if we want to have some in-line macros for
> > use from arbitrary C-code like we have for the page fault case.
>
> Example?
>
> > I might make the arbitrary %rax value be #PF and #MC to reflect the
> > h/w fault that got us here rather than -EINVAL/-EFAULT. But that's
> > just bike shedding.
>
> Yeah, I picked those arbitrarily to show the intention.
>
> > But now we are back to having the fault handler poke %rax again, which
> > made Andy twitch before.
>
> Andy, why is that? It makes the exception handling much simpler this way...
>

I like the idea of moving more logic into C, but I don't like
splitting the logic across files and adding nasty special cases like
this.

But what if we generalized it?  An extable entry gives a fault IP and
a landing pad IP.  Surely we can squeeze a flag bit in there.  If you
set the bit, you get an extended extable entry.  Instead of storing a
landing pad, it stores a pointer to a handler descriptor:

struct extable_handler {
  bool (*handler)(struct pt_regs *, struct extable_handler *, ...);
};

handler returns true if it handled the error and false if it didn't.
The "..." encodes the fault number, error code, cr2, etc.  Maybe it
would be "unsigned long exception, const struct extable_info *info"
where extable_info contains a union?  I really wish C would grow up
and learn about union types.
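
To make the shape of that proposal concrete, here is a minimal userspace sketch, not kernel code. Everything in it is an assumption for illustration: struct fake_regs stands in for struct pt_regs, struct ex_entry and fixup_exception_sketch() are hypothetical names, the "..." is replaced by the (exception, info) pair floated above, and the flag lives in its own field rather than being squeezed into an existing word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct pt_regs. */
struct fake_regs {
	unsigned long ip;
	unsigned long ax;
};

/* Hypothetical per-exception details (error code, cr2, ...). */
struct extable_info {
	unsigned long error_code;
	unsigned long cr2;
};

struct extable_handler {
	bool (*handler)(struct fake_regs *regs,
			const struct extable_handler *self,
			unsigned long exception,
			const struct extable_info *info);
};

/* Hypothetical table entry: plain landing pad or handler descriptor. */
struct ex_entry {
	unsigned long fault_ip;
	bool extended;		/* the "flag bit", kept separate for clarity */
	union {
		unsigned long landing_pad;
		const struct extable_handler *desc;
	} u;
};

/* Returns true if the fault at regs->ip was handled. */
static bool fixup_exception_sketch(const struct ex_entry *tbl, size_t n,
				   struct fake_regs *regs,
				   unsigned long exception,
				   const struct extable_info *info)
{
	assert(tbl != NULL && regs != NULL);

	for (size_t i = 0; i < n; i++) {
		if (tbl[i].fault_ip != regs->ip)
			continue;
		if (!tbl[i].extended) {
			/* Classic entry: just jump to the landing pad. */
			regs->ip = tbl[i].u.landing_pad;
			return true;
		}
		/* Extended entry: let the local C handler decide. */
		return tbl[i].u.desc->handler(regs, tbl[i].u.desc,
					      exception, info);
	}
	return false;
}

/* Example descriptor that carries its own data after the base struct. */
struct resume_handler {
	struct extable_handler base;	/* must stay the first member */
	unsigned long resume_ip;
};

/* Example handler: report the exception number in ax, then resume. */
static bool resume_with_code(struct fake_regs *regs,
			     const struct extable_handler *self,
			     unsigned long exception,
			     const struct extable_info *info)
{
	const struct resume_handler *rh = (const struct resume_handler *)self;

	(void)info;
	regs->ax = exception;
	regs->ip = rh->resume_ip;
	return true;
}
```

Passing the descriptor back to the handler is what lets each call site hang private data off it, as struct resume_handler does here.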

Now the copy routine can do whatever it pleases, in C, locally.   For
example, if you set up a full stack frame (or even just a known SP
offset), you could unwind it in C and just return a value directly,
or, even better, you could manually tail-call a C fixup that goes one
byte at a time instead of writing that mess in asm.  Like this,
assuming I got it right:

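regs->sp = regs->bp;                    /* drop the copy routine's frame */
regs->bp = *(unsigned long *)regs->sp;  /* restore the caller's frame pointer */
regs->sp += sizeof(unsigned long);      /* sp now points at the return address */
regs->ip = (unsigned long)fix_it;       /* resume in the C fixup */
/* regs->di = something useful? */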
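
The register shuffle above amounts to a "leave" followed by a jump, so that fix_it returns straight to the copy routine's caller. A small userspace simulation can check the arithmetic. This is only a sketch: struct frame_regs, redirect_to_fixup() and the array used as a stack are hypothetical stand-ins, and it assumes the faulting routine set up a standard frame pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the few pt_regs fields the trick touches. */
struct frame_regs {
	uintptr_t ip;
	uintptr_t sp;
	uintptr_t bp;
	uintptr_t di;
};

/*
 * Discard the faulting function's frame and continue at fix_it, which
 * then returns to the faulting function's caller. Assumes bp points at
 * the saved frame pointer, with the return address in the next slot.
 */
static void redirect_to_fixup(struct frame_regs *regs, uintptr_t fix_it,
			      uintptr_t first_arg)
{
	assert(regs != NULL);

	regs->sp = regs->bp;				/* mov %rbp, %rsp */
	regs->bp = *(const uintptr_t *)regs->sp;	/* pop %rbp ...   */
	regs->sp += sizeof(uintptr_t);			/* ... done       */
	regs->ip = fix_it;				/* "tail call"    */
	regs->di = first_arg;				/* first argument */
}
```

After the call, sp points at the original return address, which is exactly the state a function sees on entry.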

Bonus points if you can figure out a clean way to register a handler
for an IP range without bloating struct extable_entry.
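
One possible shape for the range case, offered only as a sketch and not as an answer to the bonus question: keep a separate, sorted table of (start, size) ranges next to the per-IP table and binary-search it. struct ip_range and find_ip_range() are hypothetical names. The start and size could come from the symbol information, as in the quoted readelf output where __mcsafe_copy starts at 0xffffffff812e6d70 and is 179 bytes long.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical range entry: covers [start, start + size). */
struct ip_range {
	uint64_t start;
	uint64_t size;
	int handler_id;		/* placeholder for a handler descriptor */
};

/*
 * Binary search over a table sorted by start address, with
 * non-overlapping ranges. Returns the matching entry or NULL.
 */
static const struct ip_range *find_ip_range(const struct ip_range *tbl,
					    size_t n, uint64_t ip)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (ip < tbl[mid].start)
			hi = mid;
		else if (ip - tbl[mid].start >= tbl[mid].size)
			lo = mid + 1;
		else
			return &tbl[mid];
	}
	return NULL;
}
```

With the numbers above, the last covered address is start + 178, and start + 179 (the address of __mcsafe_copy_end) already falls outside the range.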

--Andy

P.S. this mechanism could potentially clean up some entry nastiness, too.

P.P.S.  Why the hell doesn't *user* code have a mechanism like this?
Windows does, and it's been around for longer than I've known how to
write C code...
