All of lore.kernel.org
 help / color / mirror / Atom feed
From: Joerg Roedel <jroedel@suse.de>
To: Andy Lutomirski <luto@kernel.org>
Cc: Joerg Roedel <joro@8bytes.org>, X86 ML <x86@kernel.org>,
	stable <stable@vger.kernel.org>, "H. Peter Anvin" <hpa@zytor.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Jiri Slaby <jslaby@suse.cz>,
	Dan Williams <dan.j.williams@intel.com>,
	Tom Lendacky <thomas.lendacky@amd.com>,
	Juergen Gross <jgross@suse.com>,
	Kees Cook <keescook@chromium.org>,
	David Rientjes <rientjes@google.com>,
	Cfir Cohen <cfir@google.com>, Erdem Aktas <erdemaktas@google.com>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	Mike Stunes <mstunes@vmware.com>,
	Sean Christopherson <seanjc@google.com>,
	Martin Radev <martin.b.radev@gmail.com>,
	Arvind Sankar <nivedita@alum.mit.edu>,
	LKML <linux-kernel@vger.kernel.org>,
	kvm list <kvm@vger.kernel.org>,
	Linux Virtualization <virtualization@lists.linux-foundation.org>
Subject: Re: [PATCH 2/3] x86/sev-es: Check if regs->sp is trusted before adjusting #VC IST stack
Date: Thu, 18 Feb 2021 20:21:17 +0100	[thread overview]
Message-ID: <20210218192117.GL12716@suse.de> (raw)
In-Reply-To: <CALCETrUohqQPVTBJZZKh-pj=4aZrwDAu5UFSetj3k5pGLDPbkA@mail.gmail.com>

On Thu, Feb 18, 2021 at 09:49:06AM -0800, Andy Lutomirski wrote:
> I don't understand what this means.  The whole entry mechanism on x86
> is structured so that we call a C function *and return from that C
> function without longjmp-like magic* with the sole exception of
> unwind_stack_do_exit().  This means that you can match up enters and
> exits, and that unwind_stack_do_exit() needs to unwind correctly.  In
> the former case, it's normal C and we can use normal local variables.
> In the latter case, we know exactly what state we're trying to restore
> and we can restore it directly without any linked lists or similar.

Okay, the unwinder will likely get confused by this logic.

> What do you have in mind that requires a linked list?

Cases when there are multiple IST vectors besides NMI that can hit while
the #VC handler is still on its own IST stack. #MCE comes to mind, but
that is broken anyway. At some point #VC itself will be one of them, but
when that happens the code will kill the machine.
This leaves #HV in the list, and I am not sure how that is going to be
handled yet. I think the goal is that the #HV handler is not allowed to
cause any #VC exception. In that case the linked-list logic will not be
needed.

> > I don't see how this would break, can you elaborate?
> >
> > What I think happens is:
> >
> > SYSCALL gap (RSP is from userspace and untrusted)
> >
> >         -> #VC - Handler on #VC IST stack detects that it interrupted
> >            the SYSCALL gap and switches to the task stack.
> >
> 
> Can you point me to exactly what code you're referring to?  I spent a
> while digging through the code and macro tangle and I can't find this.

See the entry code in arch/x86/entry/entry_64.S, macro idtentry_vc. It
creates the assembly code for the handler. At some point it calls
vc_switch_off_ist(), which is a C function in arch/x86/kernel/traps.c.
This function tries to find a new stack for the #VC handler.

The first thing it does is checking whether the exception came from the
SYSCALL gap and just uses the task stack in that case.

Then it will check for other kernel stacks which are safe to switch
to. If that fails it uses the fall-back stack (VC2), which will direct
the handler to a separate function which, for now, just calls panic().
Not safe are the entry or unknown stacks.

The function then copies pt_regs and returns the new stack pointer to
assembly code, which then writes it to %RSP.

> Unless AMD is more magic than I realize, the MOV SS bug^Wfeature means
> that #DB is *not* always called in safe places.

You are right, forgot about this. The MOV SS bug can very well
trigger a #VC(#DB) exception from the syscall gap.

> > And with SNP we need to be able to at least detect a malicious HV so we
> > can reliably kill the guest. Otherwise the HV could potentially take
> > control over the guest's execution flow and make it reveal its secrets.
> 
> True.  But is the rest of the machinery to be secure against EFLAGS.IF
> violations and such in place yet?

Not sure what you mean by EFLAGS.IF violations, probably enabling IRQs
while in the #VC handler? The #VC handler _must_ _not_ enable IRQs
anywhere in its call-path. If that ever happens it is a bug.

Regards,

	Joerg

WARNING: multiple messages have this Message-ID (diff)
From: Joerg Roedel <jroedel@suse.de>
To: Andy Lutomirski <luto@kernel.org>
Cc: kvm list <kvm@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Linux Virtualization <virtualization@lists.linux-foundation.org>,
	Arvind Sankar <nivedita@alum.mit.edu>,
	"H. Peter Anvin" <hpa@zytor.com>, Jiri Slaby <jslaby@suse.cz>,
	Joerg Roedel <joro@8bytes.org>, X86 ML <x86@kernel.org>,
	David Rientjes <rientjes@google.com>,
	Martin Radev <martin.b.radev@gmail.com>,
	Tom Lendacky <thomas.lendacky@amd.com>,
	Kees Cook <keescook@chromium.org>, Cfir Cohen <cfir@google.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Juergen Gross <jgross@suse.com>, Mike Stunes <mstunes@vmware.com>,
	Sean Christopherson <seanjc@google.com>,
	LKML <linux-kernel@vger.kernel.org>,
	stable <stable@vger.kernel.org>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	Erdem Aktas <erdemaktas@google.com>
Subject: Re: [PATCH 2/3] x86/sev-es: Check if regs->sp is trusted before adjusting #VC IST stack
Date: Thu, 18 Feb 2021 20:21:17 +0100	[thread overview]
Message-ID: <20210218192117.GL12716@suse.de> (raw)
In-Reply-To: <CALCETrUohqQPVTBJZZKh-pj=4aZrwDAu5UFSetj3k5pGLDPbkA@mail.gmail.com>

On Thu, Feb 18, 2021 at 09:49:06AM -0800, Andy Lutomirski wrote:
> I don't understand what this means.  The whole entry mechanism on x86
> is structured so that we call a C function *and return from that C
> function without longjmp-like magic* with the sole exception of
> unwind_stack_do_exit().  This means that you can match up enters and
> exits, and that unwind_stack_do_exit() needs to unwind correctly.  In
> the former case, it's normal C and we can use normal local variables.
> In the latter case, we know exactly what state we're trying to restore
> and we can restore it directly without any linked lists or similar.

Okay, the unwinder will likely get confused by this logic.

> What do you have in mind that requires a linked list?

Cases when there are multiple IST vectors besides NMI that can hit while
the #VC handler is still on its own IST stack. #MCE comes to mind, but
that is broken anyway. At some point #VC itself will be one of them, but
when that happens the code will kill the machine.
This leaves #HV in the list, and I am not sure how that is going to be
handled yet. I think the goal is that the #HV handler is not allowed to
cause any #VC exception. In that case the linked-list logic will not be
needed.

> > I don't see how this would break, can you elaborate?
> >
> > What I think happens is:
> >
> > SYSCALL gap (RSP is from userspace and untrusted)
> >
> >         -> #VC - Handler on #VC IST stack detects that it interrupted
> >            the SYSCALL gap and switches to the task stack.
> >
> 
> Can you point me to exactly what code you're referring to?  I spent a
> while digging through the code and macro tangle and I can't find this.

See the entry code in arch/x86/entry/entry_64.S, macro idtentry_vc. It
creates the assembly code for the handler. At some point it calls
vc_switch_off_ist(), which is a C function in arch/x86/kernel/traps.c.
This function tries to find a new stack for the #VC handler.

The first thing it does is checking whether the exception came from the
SYSCALL gap and just uses the task stack in that case.

Then it will check for other kernel stacks which are safe to switch
to. If that fails it uses the fall-back stack (VC2), which will direct
the handler to a separate function which, for now, just calls panic().
Not safe are the entry or unknown stacks.

The function then copies pt_regs and returns the new stack pointer to
assembly code, which then writes it to %RSP.

> Unless AMD is more magic than I realize, the MOV SS bug^Wfeature means
> that #DB is *not* always called in safe places.

You are right, forgot about this. The MOV SS bug can very well
trigger a #VC(#DB) exception from the syscall gap.

> > And with SNP we need to be able to at least detect a malicious HV so we
> > can reliably kill the guest. Otherwise the HV could potentially take
> > control over the guest's execution flow and make it reveal its secrets.
> 
> True.  But is the rest of the machinery to be secure against EFLAGS.IF
> violations and such in place yet?

Not sure what you mean by EFLAGS.IF violations, probably enabling IRQs
while in the #VC handler? The #VC handler _must_ _not_ enable IRQs
anywhere in its call-path. If that ever happens it is a bug.

Regards,

	Joerg
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

  reply	other threads:[~2021-02-18 19:52 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-17 12:01 [PATCH 0/3] x86/sev-es: Check for trusted regs->sp in __sev_es_ist_enter() Joerg Roedel
2021-02-17 12:01 ` Joerg Roedel
2021-02-17 12:01 ` [PATCH 1/3] x86/sev-es: Introduce from_syscall_gap() helper Joerg Roedel
2021-02-17 12:01   ` Joerg Roedel
2021-02-17 17:59   ` Borislav Petkov
2021-02-17 17:59     ` Borislav Petkov
2021-02-17 12:01 ` [PATCH 2/3] x86/sev-es: Check if regs->sp is trusted before adjusting #VC IST stack Joerg Roedel
2021-02-17 12:01   ` Joerg Roedel
2021-02-17 18:00   ` Borislav Petkov
2021-02-17 18:00     ` Borislav Petkov
2021-02-17 18:09   ` Andy Lutomirski
2021-02-17 18:09     ` Andy Lutomirski
2021-02-18 11:25     ` Joerg Roedel
2021-02-18 11:25       ` Joerg Roedel
2021-02-18 17:49       ` Andy Lutomirski
2021-02-18 17:49         ` Andy Lutomirski
2021-02-18 19:21         ` Joerg Roedel [this message]
2021-02-18 19:21           ` Joerg Roedel
2021-02-19  0:28           ` Andy Lutomirski
2021-02-19  0:28             ` Andy Lutomirski
2021-02-19 11:05             ` Joerg Roedel
2021-02-19 11:05               ` Joerg Roedel
2021-02-17 12:01 ` [PATCH 3/3] x86/sev-es: Improve comments in and around __sev_es_ist_enter/exit() Joerg Roedel
2021-02-17 12:01   ` Joerg Roedel
2021-02-17 18:00   ` Borislav Petkov
2021-02-17 18:00     ` Borislav Petkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210218192117.GL12716@suse.de \
    --to=jroedel@suse.de \
    --cc=cfir@google.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=erdemaktas@google.com \
    --cc=hpa@zytor.com \
    --cc=jgross@suse.com \
    --cc=joro@8bytes.org \
    --cc=jslaby@suse.cz \
    --cc=keescook@chromium.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=martin.b.radev@gmail.com \
    --cc=mhiramat@kernel.org \
    --cc=mstunes@vmware.com \
    --cc=nivedita@alum.mit.edu \
    --cc=peterz@infradead.org \
    --cc=rientjes@google.com \
    --cc=seanjc@google.com \
    --cc=stable@vger.kernel.org \
    --cc=thomas.lendacky@amd.com \
    --cc=virtualization@lists.linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.