From: Andy Lutomirski <luto@amacapital.net>
To: Tony Luck <tony.luck@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>,
	Borislav Petkov <bp@alien8.de>, X86 ML <x86@kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Robert <elliott@hpe.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Ingo Molnar <mingo@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	linux-nvdimm <linux-nvdimm@ml01.01.org>
Subject: Re: [PATCH v8 1/3] x86: Expand exception table to allow new handling options
Date: Sat, 9 Jan 2016 14:32:43 -0800
Message-ID: <CALCETrUAO3gYiVpi5BO+o6=bika2D9JFZJ4xa9Ph8ArGMfftgA@mail.gmail.com>
In-Reply-To: <CA+8MBbJHXTv=-OP1+dwq5KCursi8jRnWR5Mg=MavD_sVSY05eA@mail.gmail.com>

On Jan 9, 2016 11:51 AM, "Tony Luck" <tony.luck@gmail.com> wrote:
>
> > Oh, I see.  Is it the case that the MC code can't cleanly handle the
> > case where the error was nominally recoverable but the kernel doesn't
> > know how to recover from it due to the lack of a handler that's okay
> > with it, because the handler's refusal to handle the fault wouldn't be
> > known until too late?
>
> The code is just too clunky right now.  We have a table driven
> severity calculator that we invoke on each machine check bank
> that has some valid data to report.  Part of that calculation is
> "what context am I in?". Which happens earlier in the sequence
> than "Is MCi_STATUS.MCACOD some known recoverable type".
> If I invoke the fixup code I'll change regs->ip right away ... even
> if I'm executing on some innocent bystander processor that wasn't
> the source of the machine check (the bystanders on the same
> socket can usually see something logged in one of the memory
> controller banks).

Makes sense, sort of.  But even if there is an MC fixup registered,
don't you still have to make sure to execute it on the actual victim
CPU?  After all, you don't want to fail an mcsafe copy just because a
different CPU coincidentally machine checked while the mcsafe copy has
the recoverable RIP value.

>
> There are definitely some cleanups that should be done
> in this code (e.g. figuring out context just once, not once
> per bank).  But I'm pretty sure I'll always want to know
> "am I executing an instruction with a #MC recoverable
> handler?" in a way that doesn't actually invoke the recovery.

What's wrong with:

Step 1: determine that the HW context is, in principle, recoverable.

Step 2: ask the handler to try to recover.

Step 3: if the handler doesn't recover, panic.
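
As a rough illustration only, the three steps could be sketched in plain C like this. Nothing here is real kernel code: `struct fake_regs`, `struct fake_fixup`, `try_fixup()`, `handle_mc()` and the addresses in the table are all hypothetical stand-ins, and `hw_recoverable` is assumed to already mean "this CPU took a recoverable error". The point is just that the handler is asked to recover and touches the saved ip only when it succeeds.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of the three-step flow; not a kernel type. */
enum mc_outcome { MC_RECOVERED, MC_PANIC };

/* Hypothetical stand-in for the saved register state. */
struct fake_regs {
	unsigned long ip;
};

/* Hypothetical fixup entry: faulting ip mapped to a landing ip. */
struct fake_fixup {
	unsigned long insn;
	unsigned long fixup;
};

/* Made-up table with a single recoverable instruction. */
static const struct fake_fixup fixups[] = {
	{ 0x1000, 0x2000 },
};

/* Step 2 helper: try to recover; regs is modified only on success. */
static bool try_fixup(struct fake_regs *regs)
{
	for (size_t i = 0; i < sizeof(fixups) / sizeof(fixups[0]); i++) {
		if (fixups[i].insn == regs->ip) {
			regs->ip = fixups[i].fixup;
			return true;
		}
	}
	return false;
}

static enum mc_outcome handle_mc(bool hw_recoverable, struct fake_regs *regs)
{
	/* Step 1: is the HW context, in principle, recoverable? */
	if (!hw_recoverable)
		return MC_PANIC;

	/* Step 2: ask the handler to try to recover. */
	if (try_fixup(regs))
		return MC_RECOVERED;

	/* Step 3: the handler did not recover, so panic. */
	return MC_PANIC;
}
```

With this shape, calling `handle_mc(true, &regs)` with `regs.ip == 0x1000` moves ip to the landing address, while any other ip leaves `regs` untouched and reports that a panic is needed.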

I'm not saying that restructuring the code like this should be a
prerequisite for merging this, but I'm wondering whether it would make
sense at some point in the future.

--Andy
