linux-kernel.vger.kernel.org archive mirror
From: Andy Lutomirski <luto@amacapital.net>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>,
	Borislav Petkov <bp@alien8.de>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	X86 ML <x86@kernel.org>
Subject: Re: STI architectural question (and lretq -- I'm not even kidding)
Date: Tue, 22 Jul 2014 18:33:02 -0700
Message-ID: <CALCETrUjKXfwbWkUytMXLSmKROLBVY7tRYWPHnEC01Fj=aiTkQ@mail.gmail.com>
In-Reply-To: <CA+55aFyefnFjmOs-QZhFn68-6+tWO30VqU6ipP6BzAra=xOaFg@mail.gmail.com>

On Tue, Jul 22, 2014 at 6:03 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Tue, Jul 22, 2014 at 5:10 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>
>> But here's the problem: what happens if an NMI or MCE happens between
>> the sti and the lretq?  I think an MCE just might be okay -- it's not
>> really recoverable anyway.  (Except for the absurd MCE broadcast crap,
>> which may cause this to be a problem.)  But what about an NMI between
>> sti and lretq?
>
> Sadly, it's not architected.
>
> The "mov ss" and "pop ss" do indeed suppress even NMI. And that *has*
> to be true, because in legacy real mode - where there is no protection
> domain change, and the "lss" instruction didn't originally exist - the
> "pop/mov ss" and "mov sp" instruction sequence had to be entirely
> atomic. And this is even very officially documented. From the Intel
> system manual:
>
>     "A POP SS instruction inhibits all interrupts, including the NMI
> interrupt, until after execution of the next instruction. This action
> allows sequential execution of POP SS and MOV ESP, EBP instructions
> without the danger of having an invalid stack during an interrupt.
> However, use of the LSS instruction is the preferred method of loading
> the SS and ESP registers"
>
> However, while "sti" has conceptually the same one-instruction
> interrupt window disable as mov/pop ss, it looks like Intel broke it
> for NMI. The documentation only talks about "external, maskable
> interrupts", and while I suspect *many* micro-architectures also end
> up disabling NMI for the next instruction, there are many reasons to
> think not all do.
>
> See for example
>
>     http://www.sandpile.org/x86/inter.htm
>
> and note #5 under external interrupt suppression.
>
> Now, sandpile is pretty old, but Christian Ludloff used to get things
> like that right.
>
> So I'm afraid that "sti; lret" is not guaranteed to be architecturally
> NMI-safe. But it *might* be safe on certain micro-architectures, and
> maybe somebody inside Intel or AMD can give us a hint about when it is
> safe and when it isn't.

:)  I'm apparently not the only one who finds playing with evil
architectural stuff to be unreasonably fun.

FWIW, both the VMX and SVM code in kvm seem to explicitly implement
NMI suppression in the STI window.  I can't figure out how #MC
broadcast delivery works.  Grr.

In any event, at least the fixup would be straightforward: just do
something like:

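/* If the NMI/#MC landed exactly on the lretq (i.e. inside the sti
 * window), rewind to the sti and clear IF so the window is redone. */
void fixup_lret_nmi(struct pt_regs *regs) {
  /* x86 pt_regs names the saved instruction pointer "ip". */
  if (regs->ip == (unsigned long)native_lret_to_userspace &&
      !user_mode_vm(regs)) {
    regs->ip = (unsigned long)native_sti_before_lret_to_userspace;
    regs->flags &= ~X86_EFLAGS_IF;
  }
}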

and call it from the NMI and MCE code.  This is probably preferable to
relying on special friendly microarchitectures.
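
To make the rewind concrete, here is a minimal user-space model of the
same idea. It is only a sketch, not kernel code: struct model_regs, the
from_user field (standing in for user_mode_vm()), and the two ADDR_*
constants are hypothetical stand-ins for the real pt_regs and for the
two labels in the exit path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Interrupt-enable flag, bit 9 of RFLAGS. */
#define X86_EFLAGS_IF UINT64_C(0x200)

/* Hypothetical addresses standing in for the two exit-path labels.
 * sti is a one-byte instruction, so the lretq directly follows it. */
#define ADDR_STI_BEFORE_LRET UINT64_C(0x1000)
#define ADDR_LRET            UINT64_C(0x1001)

/* Simplified stand-in for the saved register frame. */
struct model_regs {
    uint64_t ip;        /* saved instruction pointer */
    uint64_t flags;     /* saved RFLAGS */
    bool     from_user; /* models user_mode_vm(regs) */
};

/* Rewind an interrupted "sti; lretq" pair.
 * Returns true when the saved state was changed. */
static bool model_fixup_lret_nmi(struct model_regs *regs)
{
    assert(regs != NULL);

    /* Only act if we interrupted kernel code sitting exactly on the
     * lretq, meaning the sti has already executed. */
    if (regs->ip == ADDR_LRET && !regs->from_user) {
        regs->ip = ADDR_STI_BEFORE_LRET; /* re-execute the sti */
        regs->flags &= ~X86_EFLAGS_IF;   /* return with interrupts off */
        return true;
    }
    return false;
}
```

With the saved IF bit cleared, the return from the NMI handler cannot
let a maskable interrupt in ahead of the re-executed sti, so the
one-instruction window covers the lretq again.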

Of course, this does nothing at all to protect us from #MC after sti
on return from #MC to userspace, but I think we're screwed regardless
-- we could just as easily get a second #MC before the sti.  Machine
check broadcast was the worst idea ever.

Anyway, I updated the tag.

--Andy
