From: "Luck, Tony" <tony.luck@intel.com>
To: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>,
	linux-kernel@vger.kernel.org, linux-edac@vger.kernel.org,
	x86@kernel.org
Subject: Re: [PATCH 2/3] x86, ras: Extend machine check recovery code to annotated ring0 areas
Date: Thu, 12 Nov 2015 11:55:52 -0800
Message-ID: <20151112195552.GB31228@agluck-desk.sc.intel.com>
In-Reply-To: <56441357.70201@kernel.org>

On Wed, Nov 11, 2015 at 08:19:35PM -0800, Andy Lutomirski wrote:
> >@@ -1132,9 +1133,15 @@ void do_machine_check(struct pt_regs *regs, long error_code)
> >  		if (no_way_out)
> >  			mce_panic("Fatal machine check on current CPU", &m, msg);
> >  		if (worst == MCE_AR_SEVERITY) {
> >-			recover_paddr = m.addr;
> >-			if (!(m.mcgstatus & MCG_STATUS_RIPV))
> >-				flags |= MF_MUST_KILL;
> >+			if ((m.cs & 3) == 3) {
> >+				recover_paddr = m.addr;
> >+				if (!(m.mcgstatus & MCG_STATUS_RIPV))
> >+					flags |= MF_MUST_KILL;
> >+			} else if (fixup_mcexception(regs)) {
> >+				regs->ax = BIT(63) | m.addr;
> >+			} else
> >+				mce_panic("Failed kernel mode recovery",
> >+					  &m, NULL);
> 
> Maybe I'm misunderstanding this, but presumably you shouldn't call
> fixup_mcexception unless you've first verified RIPV (i.e. that the ip you're
> looking up in the table is valid).

Good point. We can only arrive here with an MCE_AR_SEVERITY from some
kernel code if the code in mce_severity.c assigned that severity.
But it doesn't currently look at RIPV ... I should make it do that.
Actually I'll check for both RIPV and EIPV: we don't need to look for
a fixup entry for all the innocent bystander cpus that got dragged
into the exception handler because the exception was broadcast to
everyone.
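
For illustration only, here is a user-space sketch of the check described
above. It is not the actual change to the severity code. The helper name
kernel_fixup_candidate() and its plain-integer arguments are hypothetical;
the bit positions (RIPV is bit 0, EIPV is bit 1 of MCG_STATUS) follow the
x86 machine-check architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MCG_STATUS bits as defined by the x86 machine-check architecture. */
#define MCG_STATUS_RIPV (1ULL << 0) /* restart IP valid */
#define MCG_STATUS_EIPV (1ULL << 1) /* error IP valid */

/*
 * Hypothetical helper: should this CPU look for a fixup table entry?
 *
 * Only a CPU that was executing kernel code (CPL != 3) and that reports
 * both RIPV and EIPV qualifies. Bystander CPUs pulled in by the broadcast
 * do not have EIPV set, so they skip the lookup.
 */
static bool kernel_fixup_candidate(uint64_t mcgstatus, uint16_t cs)
{
	const uint64_t need = MCG_STATUS_RIPV | MCG_STATUS_EIPV;

	if ((cs & 3) == 3)
		return false; /* user mode: not a kernel fixup case */

	return (mcgstatus & need) == need;
}
```

With this shape, a kernel-mode CPU reporting only RIPV (a bystander) is
rejected, and so is any user-mode context regardless of the status bits.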

> Also... I find the general flow of this code very hard to follow.  It's
> critical that an MCE hitting kernel mode not get as far as
> ist_begin_non_atomic.  It was already hard enough to tell that the code
> follows that rule, and now it's even harder.  Would it make sense to add
> clear assertions that m.cs == regs->cs and that user_mode(regs) when you get
> to the end?  Simplifying the control flow might also be nice.

Yes. This is a mess now. It works (because we only set recover_paddr
in the user mode case ... so we'll take the "goto done" for the kernel
case). But I agree that this is far from obvious.
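
A stand-alone model of that flow may make the invariant easier to see.
This is a sketch under stated assumptions, not kernel code: struct
tail_state, handle_ar() and enters_non_atomic() are hypothetical names,
the have_recover_paddr flag stands in for however the real function marks
"no address recorded", and the MF_MUST_KILL handling is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of what the MCE_AR_SEVERITY branch decided. */
struct tail_state {
	bool have_recover_paddr; /* set only on the user-mode path */
	uint64_t recover_paddr;
	uint64_t ax;             /* stand-in for regs->ax */
	bool panic;              /* stand-in for calling mce_panic() */
};

/* Models the three-way branch from the quoted hunk. */
static struct tail_state handle_ar(uint16_t cs, uint64_t addr, bool fixup_found)
{
	struct tail_state s = {0};

	if ((cs & 3) == 3) {
		s.have_recover_paddr = true;
		s.recover_paddr = addr;
	} else if (fixup_found) {
		s.ax = (1ULL << 63) | addr; /* BIT(63) | m.addr */
	} else {
		s.panic = true;
	}
	return s;
}

/*
 * Models the tail of the function: without a recorded address we take
 * the early exit ("goto done"). The assert spells out the rule that only
 * a user-mode context may reach the non-atomic section.
 */
static bool enters_non_atomic(const struct tail_state *s, uint16_t cs)
{
	if (!s->have_recover_paddr)
		return false;
	assert((cs & 3) == 3);
	return true;
}
```

In this model the kernel-mode cases never record an address, so they
cannot reach the non-atomic section; the assertion only documents what
the branch structure already guarantees.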

> >  		} else if (kill_it) {
> >  			force_sig(SIGBUS, current);
> >  		}
> >
> 
> I would argue that this should happen in the non-atomic section.  It's
> probably okay as long as we came from user mode, but it's more obviously
> safe in the non-atomic section.

Will look at relocating this too when I restructure the tail of the
function.

Thanks for the review.

-Tony