kvm.vger.kernel.org archive mirror
* [PATCH] KVM: VMX: remove LFENCE in vmx_spec_ctrl_restore_host()
@ 2023-05-31 15:01 Jon Kohler
  2023-05-31 23:18 ` Josh Poimboeuf
  2023-06-01  0:29 ` Andrew Cooper
  0 siblings, 2 replies; 19+ messages in thread
From: Jon Kohler @ 2023-05-31 15:01 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Peter Zijlstra (Intel),
	Josh Poimboeuf, Pawan Gupta, Daniel Sneddon, kvm, linux-kernel
  Cc: Jon Kohler, Andrew Cooper

Remove barrier_nospec(), which translates to an LFENCE, in
vmx_spec_ctrl_restore_host(), as RSB-barriers (as defined by [1])
already exist prior to this point.

This LFENCE was added in commit fc02735b14ff ("KVM: VMX: Prevent guest
RSB poisoning attacks with eIBRS") in the 5.19 timeframe; however,
commit 2b1299322016 ("x86/speculation: Add RSB VM Exit protections") in
the 6.0 timeframe added an LFENCE for X86_FEATURE_RSB_VMEXIT_LITE
directly in vmx_vmexit, prior to the CALL to vmx_spec_ctrl_restore_host.

For posterity, vmx_spec_ctrl_restore_host() will also execute a WRMSR
to IA32_SPEC_CTRL for X86_FEATURE_KERNEL_IBRS or when the guest and
host MSR values do not match, which serves as an additional RSB-barrier.

[1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/post-barrier-return-stack-buffer-predictions.html

Fixes: fc02735b14ff ("KVM: VMX: Prevent guest RSB poisoning attacks with eIBRS")
Fixes: 2b1299322016 ("x86/speculation: Add RSB VM Exit protections")
Signed-off-by: Jon Kohler <jon@nutanix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Daniel Sneddon <daniel.sneddon@linux.intel.com>
CC: Josh Poimboeuf <jpoimboe@kernel.org>
CC: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
CC: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kvm/vmx/vmx.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 44fb619803b8..98ca8b79aab1 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7130,8 +7130,6 @@ void noinstr vmx_spec_ctrl_restore_host(struct vcpu_vmx *vmx,
 	if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS) ||
 	    vmx->spec_ctrl != hostval)
 		native_wrmsrl(MSR_IA32_SPEC_CTRL, hostval);
-
-	barrier_nospec();
 }
 
 static fastpath_t vmx_exit_handlers_fastpath(struct kvm_vcpu *vcpu)
-- 
2.30.1 (Apple Git-130)



* Re: [PATCH] KVM: VMX: remove LFENCE in vmx_spec_ctrl_restore_host()
  2023-05-31 15:01 [PATCH] KVM: VMX: remove LFENCE in vmx_spec_ctrl_restore_host() Jon Kohler
@ 2023-05-31 23:18 ` Josh Poimboeuf
  2023-05-31 23:58   ` Jon Kohler
  2023-06-01  0:29 ` Andrew Cooper
  1 sibling, 1 reply; 19+ messages in thread
From: Josh Poimboeuf @ 2023-05-31 23:18 UTC (permalink / raw)
  To: Jon Kohler
  Cc: Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Peter Zijlstra (Intel),
	Pawan Gupta, Daniel Sneddon, kvm, linux-kernel, Andrew Cooper

On Wed, May 31, 2023 at 11:01:12AM -0400, Jon Kohler wrote:
> Remove barrier_nospec(), which translates to an LFENCE, in
> vmx_spec_ctrl_restore_host(), as RSB-barriers (as defined by [1])
> already exist prior to this point.
> 
> This LFENCE was added in commit fc02735b14ff ("KVM: VMX: Prevent guest
> RSB poisoning attacks with eIBRS") in the 5.19 timeframe; however,
> commit 2b1299322016 ("x86/speculation: Add RSB VM Exit protections") in
> the 6.0 timeframe added an LFENCE for X86_FEATURE_RSB_VMEXIT_LITE
> directly in vmx_vmexit, prior to the CALL to vmx_spec_ctrl_restore_host.
> 
> For posterity, vmx_spec_ctrl_restore_host() will also execute a WRMSR
> to IA32_SPEC_CTRL for X86_FEATURE_KERNEL_IBRS or when the guest and
> host MSR values do not match, which serves as an additional RSB-barrier.
> 
> [1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/post-barrier-return-stack-buffer-predictions.html
> 
> Fixes: fc02735b14ff ("KVM: VMX: Prevent guest RSB poisoning attacks with eIBRS")
> Fixes: 2b1299322016 ("x86/speculation: Add RSB VM Exit protections")

Sorry, I knew I should have put a comment there.

The goal of this barrier_nospec() is to prevent speculative execution
from bypassing the SPEC_CTRL write (due to misprediction of the
conditional branch, Spectre v1 style).  Otherwise the next indirect
branch or unbalanced RET could be an attack target.

So any previous LFENCEs before that conditional branch won't help here.
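
To make that concrete, here is a minimal sketch of the pattern in
question (illustrative only, not the exact kernel code, though wrmsrl(),
barrier_nospec() and MSR_IA32_SPEC_CTRL are the real kernel names; the
function itself is made up):

#include <linux/types.h>	/* u64 */
#include <asm/msr.h>		/* wrmsrl(), MSR_IA32_SPEC_CTRL */
#include <asm/barrier.h>	/* barrier_nospec() */

/*
 * If the 'if' below mispredicts as not-taken, the CPU can speculatively
 * run past the WRMSR with the guest's SPEC_CTRL value still in force,
 * so the next indirect branch or unbalanced RET speculates without the
 * host's protections.  barrier_nospec(), which expands to LFENCE, keeps
 * younger instructions from executing until the branch resolves.
 */
static void restore_host_spec_ctrl(u64 guestval, u64 hostval)
{
	if (guestval != hostval)
		wrmsrl(MSR_IA32_SPEC_CTRL, hostval); /* may be skipped speculatively */

	barrier_nospec(); /* branch must resolve before younger insns execute */
}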

-- 
Josh


* Re: [PATCH] KVM: VMX: remove LFENCE in vmx_spec_ctrl_restore_host()
  2023-05-31 23:18 ` Josh Poimboeuf
@ 2023-05-31 23:58   ` Jon Kohler
  2023-06-01  0:42     ` Josh Poimboeuf
  0 siblings, 1 reply; 19+ messages in thread
From: Jon Kohler @ 2023-05-31 23:58 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, X86 ML, H. Peter Anvin,
	Peter Zijlstra (Intel),
	Pawan Gupta, Daniel Sneddon, kvm @ vger . kernel . org, LKML,
	Andrew Cooper



> On May 31, 2023, at 7:18 PM, Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> 
> On Wed, May 31, 2023 at 11:01:12AM -0400, Jon Kohler wrote:
>> Remove barrier_nospec(), which translates to an LFENCE, in
>> vmx_spec_ctrl_restore_host(), as RSB-barriers (as defined by [1])
>> already exist prior to this point.
>> 
>> This LFENCE was added in commit fc02735b14ff ("KVM: VMX: Prevent guest
>> RSB poisoning attacks with eIBRS") in the 5.19 timeframe; however,
>> commit 2b1299322016 ("x86/speculation: Add RSB VM Exit protections") in
>> the 6.0 timeframe added an LFENCE for X86_FEATURE_RSB_VMEXIT_LITE
>> directly in vmx_vmexit, prior to the CALL to vmx_spec_ctrl_restore_host.
>> 
>> For posterity, vmx_spec_ctrl_restore_host() will also execute a WRMSR
>> to IA32_SPEC_CTRL for X86_FEATURE_KERNEL_IBRS or when the guest and
>> host MSR values do not match, which serves as an additional RSB-barrier.
>> 
>> [1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/post-barrier-return-stack-buffer-predictions.html
>> 
>> Fixes: fc02735b14ff ("KVM: VMX: Prevent guest RSB poisoning attacks with eIBRS")
>> Fixes: 2b1299322016 ("x86/speculation: Add RSB VM Exit protections")
> 
> Sorry, I knew I should have put a comment there.
> 
> The goal of this barrier_nospec() is to prevent speculative execution
> from bypassing the SPEC_CTRL write (due to misprediction of the
> conditional branch, Spectre v1 style).  Otherwise the next indirect
> branch or unbalanced RET could be an attack target.
> 
> So any previous LFENCEs before that conditional branch won't help here.
> 
> -- 
> Josh

Ah interesting. OK, to be clear, that's a guest -> host attack, correct? And such
an attack would not at all be thwarted by the first CALL retire + LFENCE that
was added in commit 2b1299322016 ("x86/speculation: Add RSB VM Exit
protections")? Sorry to be long-winded, just wanting to triple check because
the aforementioned commit was added slightly after the original one, and I
want to make extra sure that they aren't solving the same thing.

If that is indeed the case, does that commit need to be revisited at all?

Or are we saying that this Intel vulnerability needs *two* LFENCEs to keep
the host secure?

If that's the case, that's fine, and I'm happy to transform this commit into
some comments in this area to capture the issue for future onlookers.



* Re: [PATCH] KVM: VMX: remove LFENCE in vmx_spec_ctrl_restore_host()
  2023-05-31 15:01 [PATCH] KVM: VMX: remove LFENCE in vmx_spec_ctrl_restore_host() Jon Kohler
  2023-05-31 23:18 ` Josh Poimboeuf
@ 2023-06-01  0:29 ` Andrew Cooper
  2023-06-01  0:53   ` Josh Poimboeuf
  1 sibling, 1 reply; 19+ messages in thread
From: Andrew Cooper @ 2023-06-01  0:29 UTC (permalink / raw)
  To: Jon Kohler, Sean Christopherson, Paolo Bonzini, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Peter Zijlstra (Intel),
	Josh Poimboeuf, Pawan Gupta, Daniel Sneddon, kvm, linux-kernel,
	Milburn, Alyssa

On 31/05/2023 4:01 pm, Jon Kohler wrote:
> Remove barrier_nospec(), which translates to an LFENCE, in
> vmx_spec_ctrl_restore_host(), as RSB-barriers (as defined by [1])
> already exist prior to this point.
>
> This LFENCE was added in commit fc02735b14ff ("KVM: VMX: Prevent guest
> RSB poisoning attacks with eIBRS") in the 5.19 timeframe; however,
> commit 2b1299322016 ("x86/speculation: Add RSB VM Exit protections") in
> the 6.0 timeframe added an LFENCE for X86_FEATURE_RSB_VMEXIT_LITE
> directly in vmx_vmexit, prior to the CALL to vmx_spec_ctrl_restore_host.
>
> For posterity, vmx_spec_ctrl_restore_host() will also execute a WRMSR
> to IA32_SPEC_CTRL for X86_FEATURE_KERNEL_IBRS or when the guest and
> host MSR values do not match, which serves as an additional RSB-barrier.
>
> [1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/post-barrier-return-stack-buffer-predictions.html

Yeah, unfortunately PBRSB is insidious.

From memory (please correct me if I'm wrong), the required safety
property is this:  After a VMExit (if IBRS was set prior to exit) or the
write to MSR_SPEC_CTRL setting IBRS (if IBRS was not set prior to exit),
one single CALL instruction must architecturally retire before any RET
instructions execute (speculatively).

There are several ways to arrange this, but they all basically boil down
to having some serialising instruction between the first CALL and first
RET on any reachable path from VMExit.

~Andrew


* Re: [PATCH] KVM: VMX: remove LFENCE in vmx_spec_ctrl_restore_host()
  2023-05-31 23:58   ` Jon Kohler
@ 2023-06-01  0:42     ` Josh Poimboeuf
  2023-06-01  0:50       ` Andrew Cooper
  0 siblings, 1 reply; 19+ messages in thread
From: Josh Poimboeuf @ 2023-06-01  0:42 UTC (permalink / raw)
  To: Jon Kohler
  Cc: Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, X86 ML, H. Peter Anvin,
	Peter Zijlstra (Intel),
	Pawan Gupta, Daniel Sneddon, kvm @ vger . kernel . org, LKML,
	Andrew Cooper

On Wed, May 31, 2023 at 11:58:12PM +0000, Jon Kohler wrote:
> > The goal of this barrier_nospec() is to prevent speculative execution
> > from bypassing the SPEC_CTRL write (due to misprediction of the
> > conditional branch, Spectre v1 style).  Otherwise the next indirect
> > branch or unbalanced RET could be an attack target.
> > 
> > So any previous LFENCEs before that conditional branch won't help here.
> 
> Ah interesting. OK, to be clear, that's a guest -> host attack, correct? And such
> an attack would not at all be thwarted by the first CALL retire + LFENCE that
> was added in commit 2b1299322016 ("x86/speculation: Add RSB VM Exit
> protections")?

Right.

> Sorry to be long winded, just wanting to triple check because
> the aforementioned commit was added slightly after the original one, and I 
> want to make extra sure that they aren’t solving the same thing.
> 
> If that is indeed the case, does that commit need to be revisited at all?
> 
> Or are we saying that this Intel vulnerability needs *two* LFENCEs to keep
> the host secure?

The first LFENCE (FILL_RETURN_BUFFER) forces the CALL to retire so the
RSB stuff is guaranteed to take effect before the next unbalanced RET
can be speculatively executed.

The second LFENCE (vmx_spec_ctrl_restore_host) forces the conditional
branch to retire so the SPEC_CTRL write (potential IBRS/eIBRS
enablement) is guaranteed to take effect before the next indirect branch
and/or unbalanced RET can be speculatively executed.

So each LFENCE has a distinct purpose.  That said, there are no indirect
branches or unbalanced RETs between them.  So it should be fine to
combine them into a single LFENCE after both.

You could for example just remove the first LFENCE.  But only for that
usage site, i.e. not for other users of FILL_RETURN_BUFFER.

Or, remove them both and add an LFENCE in vmx_vmexit() right after the
call to vmx_spec_ctrl_restore_host().  That might be clearer.  Then it
could have a comment describing its dual purposes.

-- 
Josh


* Re: [PATCH] KVM: VMX: remove LFENCE in vmx_spec_ctrl_restore_host()
  2023-06-01  0:42     ` Josh Poimboeuf
@ 2023-06-01  0:50       ` Andrew Cooper
  2023-06-01  0:56         ` Josh Poimboeuf
  2023-06-01  1:24         ` Pawan Gupta
  0 siblings, 2 replies; 19+ messages in thread
From: Andrew Cooper @ 2023-06-01  0:50 UTC (permalink / raw)
  To: Josh Poimboeuf, Jon Kohler
  Cc: Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, X86 ML, H. Peter Anvin,
	Peter Zijlstra (Intel),
	Pawan Gupta, Daniel Sneddon, kvm @ vger . kernel . org, LKML

On 01/06/2023 1:42 am, Josh Poimboeuf wrote:
> So each LFENCE has a distinct purpose.  That said, there are no indirect
> branches or unbalanced RETs between them.

How lucky are you feeling?

You're in C at this point, which means the compiler could have emitted a
call to mem{cpy,cmp}() in place of a simple assignment/comparison.

~Andrew


* Re: [PATCH] KVM: VMX: remove LFENCE in vmx_spec_ctrl_restore_host()
  2023-06-01  0:29 ` Andrew Cooper
@ 2023-06-01  0:53   ` Josh Poimboeuf
  0 siblings, 0 replies; 19+ messages in thread
From: Josh Poimboeuf @ 2023-06-01  0:53 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Jon Kohler, Sean Christopherson, Paolo Bonzini, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Peter Zijlstra (Intel),
	Pawan Gupta, Daniel Sneddon, kvm, linux-kernel, Milburn, Alyssa

On Thu, Jun 01, 2023 at 01:29:12AM +0100, Andrew Cooper wrote:
> On 31/05/2023 4:01 pm, Jon Kohler wrote:
> > Remove barrier_nospec(), which translates to an LFENCE, in
> > vmx_spec_ctrl_restore_host(), as RSB-barriers (as defined by [1])
> > already exist prior to this point.
> >
> > This LFENCE was added in commit fc02735b14ff ("KVM: VMX: Prevent guest
> > RSB poisoning attacks with eIBRS") in the 5.19 timeframe; however,
> > commit 2b1299322016 ("x86/speculation: Add RSB VM Exit protections") in
> > the 6.0 timeframe added an LFENCE for X86_FEATURE_RSB_VMEXIT_LITE
> > directly in vmx_vmexit, prior to the CALL to vmx_spec_ctrl_restore_host.
> >
> > For posterity, vmx_spec_ctrl_restore_host() will also execute a WRMSR
> > to IA32_SPEC_CTRL for X86_FEATURE_KERNEL_IBRS or when the guest and
> > host MSR values do not match, which serves as an additional RSB-barrier.
> >
> > [1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/post-barrier-return-stack-buffer-predictions.html
> 
> Yeah, unfortunately PBRSB is insidious.
> 
> From memory (please correct me if I'm wrong), the required safety
> property is this:  After a VMExit (if IBRS was set prior to exit) or the
> write to MSR_SPEC_CTRL setting IBRS (if IBRS was not set prior to exit),
> one single CALL instruction must architecturally retire before any RET
> instructions execute (speculatively).
> 
> There are several ways to arrange this, but they all basically boil down
> to having some serialising instruction between the first CALL and first
> RET on any reachable path from VMExit.

The document says the problem is *unbalanced* RET, i.e. RSB underflow.

So the mitigation needs a single RSB stuff (i.e., unbalanced CALL) and
then an LFENCE anytime before the next unbalanced RET.
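
As a rough x86-64 illustration of that shape (a sketch only, not the
kernel's FILL_RETURN_BUFFER macro; it also assumes no red zone, as with
kernel code built with -mno-red-zone):

static inline void rsb_stuff_one_and_fence(void)
{
	asm volatile(
		"call 1f\n\t"		/* unbalanced CALL: plants a benign RSB entry */
		"int3\n\t"		/* speculation trap if a RET mispredicts here */
		"1: add $8, %%rsp\n\t"	/* drop the return address the CALL pushed */
		"lfence\n\t"		/* force the CALL to retire before any later RET executes */
		::: "memory");
}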

-- 
Josh


* Re: [PATCH] KVM: VMX: remove LFENCE in vmx_spec_ctrl_restore_host()
  2023-06-01  0:50       ` Andrew Cooper
@ 2023-06-01  0:56         ` Josh Poimboeuf
  2023-06-01  1:24         ` Pawan Gupta
  1 sibling, 0 replies; 19+ messages in thread
From: Josh Poimboeuf @ 2023-06-01  0:56 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Jon Kohler, Sean Christopherson, Paolo Bonzini, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, X86 ML,
	H. Peter Anvin, Peter Zijlstra (Intel),
	Pawan Gupta, Daniel Sneddon, kvm @ vger . kernel . org, LKML

On Thu, Jun 01, 2023 at 01:50:48AM +0100, Andrew Cooper wrote:
> On 01/06/2023 1:42 am, Josh Poimboeuf wrote:
> > So each LFENCE has a distinct purpose.  That said, there are no indirect
> > branches or unbalanced RETs between them.
> 
> How lucky are you feeling?
> 
> You're in C at this point, which means the compiler could have emitted a
> call to mem{cpy,cmp}() in place of a simple assignment/comparison.

But it's only unbalanced RETs we're concerned about, per Intel.

-- 
Josh


* Re: [PATCH] KVM: VMX: remove LFENCE in vmx_spec_ctrl_restore_host()
  2023-06-01  0:50       ` Andrew Cooper
  2023-06-01  0:56         ` Josh Poimboeuf
@ 2023-06-01  1:24         ` Pawan Gupta
  2023-06-01  4:23           ` Josh Poimboeuf
  1 sibling, 1 reply; 19+ messages in thread
From: Pawan Gupta @ 2023-06-01  1:24 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Josh Poimboeuf, Jon Kohler, Sean Christopherson, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	X86 ML, H. Peter Anvin, Peter Zijlstra (Intel),
	Daniel Sneddon, kvm @ vger . kernel . org, LKML

On Thu, Jun 01, 2023 at 01:50:48AM +0100, Andrew Cooper wrote:
> On 01/06/2023 1:42 am, Josh Poimboeuf wrote:
> > So each LFENCE has a distinct purpose.  That said, there are no indirect
> > branches or unbalanced RETs between them.
> 
> How lucky are you feeling?
> 
> You're in C at this point, which means the compiler could have emitted a
> call to mem{cpy,cmp}() in place of a simple assignment/comparison.

Moving the second LFENCE to the else part of the WRMSR should be
possible, so that the serialization can be achieved either by the WRMSR
or by the LFENCE. This saves an LFENCE when the host and guest values
of MSR_SPEC_CTRL differ.

---
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index d2d6e1b6c788..d32e6d172dd6 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7157,8 +7157,8 @@ void noinstr vmx_spec_ctrl_restore_host(struct vcpu_vmx *vmx,
 	if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS) ||
 	    vmx->spec_ctrl != hostval)
 		native_wrmsrl(MSR_IA32_SPEC_CTRL, hostval);
-
-	barrier_nospec();
+	else
+		barrier_nospec();
 }
 
 static fastpath_t vmx_exit_handlers_fastpath(struct kvm_vcpu *vcpu)


* Re: [PATCH] KVM: VMX: remove LFENCE in vmx_spec_ctrl_restore_host()
  2023-06-01  1:24         ` Pawan Gupta
@ 2023-06-01  4:23           ` Josh Poimboeuf
  2023-06-05 14:29             ` Jon Kohler
  0 siblings, 1 reply; 19+ messages in thread
From: Josh Poimboeuf @ 2023-06-01  4:23 UTC (permalink / raw)
  To: Pawan Gupta
  Cc: Andrew Cooper, Jon Kohler, Sean Christopherson, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	X86 ML, H. Peter Anvin, Peter Zijlstra (Intel),
	Daniel Sneddon, kvm @ vger . kernel . org, LKML

On Wed, May 31, 2023 at 06:24:29PM -0700, Pawan Gupta wrote:
> On Thu, Jun 01, 2023 at 01:50:48AM +0100, Andrew Cooper wrote:
> > On 01/06/2023 1:42 am, Josh Poimboeuf wrote:
> > > So each LFENCE has a distinct purpose.  That said, there are no indirect
> > > branches or unbalanced RETs between them.
> > 
> > How lucky are you feeling?
> > 
> > You're in C at this point, which means the compiler could have emitted a
> > call to mem{cpy,cmp}() in place of a simple assignment/comparison.
> 
> Moving the second LFENCE to the else part of the WRMSR should be
> possible, so that the serialization can be achieved either by the WRMSR
> or by the LFENCE. This saves an LFENCE when the host and guest values
> of MSR_SPEC_CTRL differ.

Yes.  Though in practice it might not make much of a difference.  With
wrmsr+lfence, the lfence has nothing to do so it might be almost
instantaneous anyway.

-- 
Josh


* Re: [PATCH] KVM: VMX: remove LFENCE in vmx_spec_ctrl_restore_host()
  2023-06-01  4:23           ` Josh Poimboeuf
@ 2023-06-05 14:29             ` Jon Kohler
  2023-06-05 16:35               ` Josh Poimboeuf
  0 siblings, 1 reply; 19+ messages in thread
From: Jon Kohler @ 2023-06-05 14:29 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Pawan Gupta, Andrew Cooper, Sean Christopherson, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	X86 ML, H. Peter Anvin, Peter Zijlstra (Intel),
	Daniel Sneddon, kvm @ vger . kernel . org, LKML



> On Jun 1, 2023, at 12:23 AM, Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> 
> On Wed, May 31, 2023 at 06:24:29PM -0700, Pawan Gupta wrote:
>> On Thu, Jun 01, 2023 at 01:50:48AM +0100, Andrew Cooper wrote:
>>> On 01/06/2023 1:42 am, Josh Poimboeuf wrote:
>>>> So each LFENCE has a distinct purpose.  That said, there are no indirect
>>>> branches or unbalanced RETs between them.
>>> 
>>> How lucky are you feeling?
>>> 
>>> You're in C at this point, which means the compiler could have emitted a
>>> call to mem{cpy,cmp}() in place of a simple assignment/comparison.
>> 
>> Moving the second LFENCE to the else part of the WRMSR should be
>> possible, so that the serialization can be achieved either by the WRMSR
>> or by the LFENCE. This saves an LFENCE when the host and guest values
>> of MSR_SPEC_CTRL differ.
> 
> Yes.  Though in practice it might not make much of a difference.  With
> wrmsr+lfence, the lfence has nothing to do so it might be almost
> instantaneous anyway.
> 
> -- 
> Josh

Coming back to this, what if we hoisted call vmx_spec_ctrl_restore_host above
FILL_RETURN_BUFFER, and dropped this LFENCE as I did here?

That way, we wouldn’t have to mess with the internal LFENCE in nospec-branch.h,
and that would act as the “final line of defense” LFENCE.

Would that be acceptable? Or does FILL_RETURN_BUFFER *need* to occur
before any sort of calls no matter what?

Thanks,
Jon


* Re: [PATCH] KVM: VMX: remove LFENCE in vmx_spec_ctrl_restore_host()
  2023-06-05 14:29             ` Jon Kohler
@ 2023-06-05 16:35               ` Josh Poimboeuf
  2023-06-05 16:39                 ` Jon Kohler
  2023-06-06  0:20                 ` Andrew Cooper
  0 siblings, 2 replies; 19+ messages in thread
From: Josh Poimboeuf @ 2023-06-05 16:35 UTC (permalink / raw)
  To: Jon Kohler
  Cc: Pawan Gupta, Andrew Cooper, Sean Christopherson, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	X86 ML, H. Peter Anvin, Peter Zijlstra (Intel),
	Daniel Sneddon, kvm @ vger . kernel . org, LKML

On Mon, Jun 05, 2023 at 02:29:02PM +0000, Jon Kohler wrote:
> 
> 
> > On Jun 1, 2023, at 12:23 AM, Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> > 
> > On Wed, May 31, 2023 at 06:24:29PM -0700, Pawan Gupta wrote:
> >> On Thu, Jun 01, 2023 at 01:50:48AM +0100, Andrew Cooper wrote:
> >>> On 01/06/2023 1:42 am, Josh Poimboeuf wrote:
> >>>> So each LFENCE has a distinct purpose.  That said, there are no indirect
> >>>> branches or unbalanced RETs between them.
> >>> 
> >>> How lucky are you feeling?
> >>> 
> >>> You're in C at this point, which means the compiler could have emitted a
> >>> call to mem{cpy,cmp}() in place of a simple assignment/comparison.
> >> 
> >> Moving the second LFENCE to the else part of the WRMSR should be
> >> possible, so that the serialization can be achieved either by the WRMSR
> >> or by the LFENCE. This saves an LFENCE when the host and guest values
> >> of MSR_SPEC_CTRL differ.
> > 
> > Yes.  Though in practice it might not make much of a difference.  With
> > wrmsr+lfence, the lfence has nothing to do so it might be almost
> > instantaneous anyway.
> > 
> > -- 
> > Josh
> 
> Coming back to this, what if we hoisted call vmx_spec_ctrl_restore_host above
> FILL_RETURN_BUFFER, and dropped this LFENCE as I did here?
> 
> That way, we wouldn’t have to mess with the internal LFENCE in nospec-branch.h,
> and that would act as the “final line of defense” LFENCE.
> 
> Would that be acceptable? Or does FILL_RETURN_BUFFER *need* to occur
> before any sort of calls no matter what?

If we go by Intel's statement that only unbalanced RETs are a concern,
that *might* be ok as long as there's a nice comment above the
FILL_RETURN_BUFFER usage site describing the two purposes for the
LFENCE.

However, based on Andy's concerns, which I've discussed with him
privately (but I'm not qualified to agree or disagree with), we may want
to just convert vmx_spec_ctrl_restore_host() to asm.  Better safe than
sorry.  My original implementation of that function was actually asm.  I
can try to dig up that code.

-- 
Josh


* Re: [PATCH] KVM: VMX: remove LFENCE in vmx_spec_ctrl_restore_host()
  2023-06-05 16:35               ` Josh Poimboeuf
@ 2023-06-05 16:39                 ` Jon Kohler
  2023-06-05 17:31                   ` Pawan Gupta
  2023-06-06  0:20                 ` Andrew Cooper
  1 sibling, 1 reply; 19+ messages in thread
From: Jon Kohler @ 2023-06-05 16:39 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Pawan Gupta, Andrew Cooper, Sean Christopherson, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	X86 ML, H. Peter Anvin, Peter Zijlstra (Intel),
	Daniel Sneddon, kvm @ vger . kernel . org, LKML



> On Jun 5, 2023, at 12:35 PM, Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> 
> On Mon, Jun 05, 2023 at 02:29:02PM +0000, Jon Kohler wrote:
>> 
>> 
>>> On Jun 1, 2023, at 12:23 AM, Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>>> 
>>> On Wed, May 31, 2023 at 06:24:29PM -0700, Pawan Gupta wrote:
>>>> On Thu, Jun 01, 2023 at 01:50:48AM +0100, Andrew Cooper wrote:
>>>>> On 01/06/2023 1:42 am, Josh Poimboeuf wrote:
>>>>>> So each LFENCE has a distinct purpose.  That said, there are no indirect
>>>>>> branches or unbalanced RETs between them.
>>>>> 
>>>>> How lucky are you feeling?
>>>>> 
>>>>> You're in C at this point, which means the compiler could have emitted a
>>>>> call to mem{cpy,cmp}() in place of a simple assignment/comparison.
>>>> 
>>>> Moving the second LFENCE to the else part of the WRMSR should be
>>>> possible, so that the serialization can be achieved either by the WRMSR
>>>> or by the LFENCE. This saves an LFENCE when the host and guest values
>>>> of MSR_SPEC_CTRL differ.
>>> 
>>> Yes.  Though in practice it might not make much of a difference.  With
>>> wrmsr+lfence, the lfence has nothing to do so it might be almost
>>> instantaneous anyway.
>>> 
>>> -- 
>>> Josh
>> 
>> Coming back to this, what if we hoisted call vmx_spec_ctrl_restore_host above
>> FILL_RETURN_BUFFER, and dropped this LFENCE as I did here?
>> 
>> That way, we wouldn’t have to mess with the internal LFENCE in nospec-branch.h,
>> and that would act as the “final line of defense” LFENCE.
>> 
>> Would that be acceptable? Or does FILL_RETURN_BUFFER *need* to occur
>> before any sort of calls no matter what?
> 
> If we go by Intel's statement that only unbalanced RETs are a concern,
> that *might* be ok as long as there's a nice comment above the
> FILL_RETURN_BUFFER usage site describing the two purposes for the
> LFENCE.
> 
> However, based on Andy's concerns, which I've discussed with him
> privately (but I'm not qualified to agree or disagree with), we may want
> to just convert vmx_spec_ctrl_restore_host() to asm.  Better safe than
> sorry.  My original implementation of that function was actually asm.  I
> can try to dig up that code.
> 
> -- 
> Josh

RE ASM - that'd be nice; given that the pre-vmentry guest restore is now in ASM,
this would match nicely. Then all the code would be in one spot, in one file, in
one language.

If you could dig that up so we didn't have to start from scratch, that'd be great (imho).

Jon



* Re: [PATCH] KVM: VMX: remove LFENCE in vmx_spec_ctrl_restore_host()
  2023-06-05 16:39                 ` Jon Kohler
@ 2023-06-05 17:31                   ` Pawan Gupta
  2023-06-05 18:31                     ` Sean Christopherson
  0 siblings, 1 reply; 19+ messages in thread
From: Pawan Gupta @ 2023-06-05 17:31 UTC (permalink / raw)
  To: Jon Kohler
  Cc: Josh Poimboeuf, Andrew Cooper, Sean Christopherson,
	Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, X86 ML, H. Peter Anvin, Peter Zijlstra (Intel),
	Daniel Sneddon, kvm @ vger . kernel . org, LKML

On Mon, Jun 05, 2023 at 04:39:02PM +0000, Jon Kohler wrote:
> 
> 
> > On Jun 5, 2023, at 12:35 PM, Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> > 
> > On Mon, Jun 05, 2023 at 02:29:02PM +0000, Jon Kohler wrote:
> >> 
> >> 
> >>> On Jun 1, 2023, at 12:23 AM, Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> >>> 
> >>> On Wed, May 31, 2023 at 06:24:29PM -0700, Pawan Gupta wrote:
> >>>> On Thu, Jun 01, 2023 at 01:50:48AM +0100, Andrew Cooper wrote:
> >>>>> On 01/06/2023 1:42 am, Josh Poimboeuf wrote:
> >>>>>> So each LFENCE has a distinct purpose.  That said, there are no indirect
> >>>>>> branches or unbalanced RETs between them.
> >>>>> 
> >>>>> How lucky are you feeling?
> >>>>> 
> >>>>> You're in C at this point, which means the compiler could have emitted a
> >>>>> call to mem{cpy,cmp}() in place of a simple assignment/comparison.
> >>>> 
> >>>> Moving the second LFENCE to the else part of the WRMSR should be
> >>>> possible, so that the serialization can be achieved either by the WRMSR
> >>>> or by the LFENCE. This saves an LFENCE when the host and guest values
> >>>> of MSR_SPEC_CTRL differ.
> >>> 
> >>> Yes.  Though in practice it might not make much of a difference.  With
> >>> wrmsr+lfence, the lfence has nothing to do so it might be almost
> >>> instantaneous anyway.
> >>> 
> >>> -- 
> >>> Josh
> >> 
> >> Coming back to this, what if we hoisted call vmx_spec_ctrl_restore_host above
> >> FILL_RETURN_BUFFER, and dropped this LFENCE as I did here?
> >> 
> >> That way, we wouldn’t have to mess with the internal LFENCE in nospec-branch.h,
> >> and that would act as the “final line of defense” LFENCE.
> >> 
> >> Would that be acceptable? Or does FILL_RETURN_BUFFER *need* to occur
> >> before any sort of calls no matter what?
> > 
> > If we go by Intel's statement that only unbalanced RETs are a concern,
> > that *might* be ok as long as there's a nice comment above the
> > FILL_RETURN_BUFFER usage site describing the two purposes for the
> > LFENCE.

We would then need FILL_RETURN_BUFFER to unconditionally execute an
LFENCE to account for the WRMSR branch misprediction. Currently the
LFENCE is not executed for !X86_BUG_EIBRS_PBRSB.

> > However, based on Andy's concerns, which I've discussed with him
> > privately (but I'm not qualified to agree or disagree with), we may want
> > to just convert vmx_spec_ctrl_restore_host() to asm.  Better safe than
> > sorry.  My original implementation of that function was actually asm.  I
> > can try to dig up that code.

Note:

  VMexit
  CALL
    RET
  RET    <---- This is also a problem if the first call hasn't retired yet.
  LFENCE

Converting vmx_spec_ctrl_restore_host() to ASM should be able to take care
of this.
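
As a hedged C illustration of that window (helper() is hypothetical; in
practice it would be, e.g., a compiler-emitted mem* call):

/* Stand-in for the kernel's barrier_nospec(), which expands to LFENCE. */
static inline void barrier_nospec(void)
{
	asm volatile("lfence" ::: "memory");
}

/* Hypothetical out-of-line helper, i.e. a CALL ... RET pair. */
static __attribute__((noinline)) void helper(void) { }

static void restore_path(void)
{
	helper();		/* this RET can execute before the first
				 * post-VMexit CALL has retired... */
	barrier_nospec();	/* ...so this LFENCE comes too late */
}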


* Re: [PATCH] KVM: VMX: remove LFENCE in vmx_spec_ctrl_restore_host()
  2023-06-05 17:31                   ` Pawan Gupta
@ 2023-06-05 18:31                     ` Sean Christopherson
  2023-06-05 19:57                       ` Jon Kohler
  2023-06-05 20:01                       ` Josh Poimboeuf
  0 siblings, 2 replies; 19+ messages in thread
From: Sean Christopherson @ 2023-06-05 18:31 UTC (permalink / raw)
  To: Pawan Gupta
  Cc: Jon Kohler, Josh Poimboeuf, Andrew Cooper, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	X86 ML, H. Peter Anvin, Peter Zijlstra (Intel),
	Daniel Sneddon, kvm @ vger . kernel . org, LKML

On Mon, Jun 05, 2023, Pawan Gupta wrote:
> On Mon, Jun 05, 2023 at 04:39:02PM +0000, Jon Kohler wrote:
> > >>> Yes.  Though in practice it might not make much of a difference.  With
> > >>> wrmsr+lfence, the lfence has nothing to do so it might be almost
> > >>> instantaneous anyway.
> > >>> 
> > >>> -- 
> > >>> Josh
> > >> 
> > >> Coming back to this, what if we hoisted call vmx_spec_ctrl_restore_host above
> > >> FILL_RETURN_BUFFER, and dropped this LFENCE as I did here?
> > >> 
> > >> That way, we wouldn’t have to mess with the internal LFENCE in nospec-branch.h,
> > >> and that would act as the “final line of defense” LFENCE.
> > >> 
> > >> Would that be acceptable? Or does FILL_RETURN_BUFFER *need* to occur
> > >> before any sort of calls no matter what?
> > > 
> > > If we go by Intel's statement that only unbalanced RETs are a concern,
> > > that *might* be ok as long as there's a nice comment above the
> > > FILL_RETURN_BUFFER usage site describing the two purposes for the
> > > LFENCE.
> 
> We would then need FILL_RETURN_BUFFER to unconditionally execute an
> LFENCE to account for the WRMSR branch misprediction. Currently the
> LFENCE is not executed for !X86_BUG_EIBRS_PBRSB.
> 
> > > However, based on Andy's concerns, which I've discussed with him
> > > privately (but I'm not qualified to agree or disagree with), we may want
> > > to just convert vmx_spec_ctrl_restore_host() to asm.  Better safe than
> > > sorry.  My original implementation of that function was actually asm.  I
> > > can try to dig up that code.
> 
> Note:
> 
>   VMexit
>   CALL
>     RET
>   RET    <---- This is also a problem if the first call hasn't retired yet.
>   LFENCE
> 
> Converting vmx_spec_ctrl_restore_host() to ASM should be able to take care
> of this.

Is there an actual bug here, or are we just micro-optimizing something that may or
may not need additional optimization?  Unless there's a bug to be fixed, moving
code into ASM and increasing complexity doesn't seem worthwhile.


* Re: [PATCH] KVM: VMX: remove LFENCE in vmx_spec_ctrl_restore_host()
  2023-06-05 18:31                     ` Sean Christopherson
@ 2023-06-05 19:57                       ` Jon Kohler
  2023-06-05 20:01                       ` Josh Poimboeuf
  1 sibling, 0 replies; 19+ messages in thread
From: Jon Kohler @ 2023-06-05 19:57 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Pawan Gupta, Josh Poimboeuf, Andrew Cooper, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	X86 ML, H. Peter Anvin, Peter Zijlstra (Intel),
	Daniel Sneddon, kvm @ vger . kernel . org, LKML



> On Jun 5, 2023, at 2:31 PM, Sean Christopherson <seanjc@google.com> wrote:
> 
> On Mon, Jun 05, 2023, Pawan Gupta wrote:
>> On Mon, Jun 05, 2023 at 04:39:02PM +0000, Jon Kohler wrote:
>>>>>> Yes.  Though in practice it might not make much of a difference.  With
>>>>>> wrmsr+lfence, the lfence has nothing to do so it might be almost
>>>>>> instantaneous anyway.
>>>>>> 
>>>>>> -- 
>>>>>> Josh
>>>>> 
>>>>> Coming back to this, what if we hoisted call vmx_spec_ctrl_restore_host above
>>>>> FILL_RETURN_BUFFER, and dropped this LFENCE as I did here?
>>>>> 
>>>>> That way, we wouldn’t have to mess with the internal LFENCE in nospec-branch.h,
>>>>> and that would act as the “final line of defense” LFENCE.
>>>>> 
>>>>> Would that be acceptable? Or does FILL_RETURN_BUFFER *need* to occur
>>>>> before any sort of calls no matter what?
>>>> 
>>>> If we go by Intel's statement that only unbalanced RETs are a concern,
>>>> that *might* be ok as long as there's a nice comment above the
>>>> FILL_RETURN_BUFFER usage site describing the two purposes for the
>>>> LFENCE.
>> 
>> We would then need FILL_RETURN_BUFFER to unconditionally execute an
>> LFENCE to account for the WRMSR branch misprediction. Currently the
>> LFENCE is not executed for !X86_BUG_EIBRS_PBRSB.
>> 
>>>> However, based on Andy's concerns, which I've discussed with him
>>>> privately (but I'm not qualified to agree or disagree with), we may want
>>>> to just convert vmx_spec_ctrl_restore_host() to asm.  Better safe than
>>>> sorry.  My original implementation of that function was actually asm.  I
>>>> can try to dig up that code.
>> 
>> Note:
>> 
>>  VMexit
>>  CALL
>>    RET
>>  RET    <---- This is also a problem if the first call hasn't retired yet.
>>  LFENCE
>> 
>> Converting vmx_spec_ctrl_restore_host() to ASM should be able to take care
>> of this.
> 
> Is there an actual bug here, or are we just micro-optimizing something that may or
> may not need additional optimization?  Unless there's a bug to be fixed, moving
> code into ASM and increasing complexity doesn't seem worthwhile.

The (slight) bug here is that on systems where the host != guest SPEC_CTRL, they get
hit with LFENCE + WRMSR to SPEC_CTRL + LFENCE, when in reality they should
just get LFENCE + WRMSR to SPEC_CTRL and that's it.

That would be satisfied by Pawan's suggestion to move the LFENCE into the else
condition in the last branch of vmx_spec_ctrl_restore_host().

The optimization on top of that would be to see if we could whack those 2x LFENCEs
down to 1x LFENCE. It just feels wrong to have 2x LFENCEs in the critical path,
even if the second one ends up being fairly snappy because of the first one (and/or
the WRMSR).

So to recap, fixing "the bug" does not require moving to ASM. Optimizing the 2x LFENCE
into 1x LFENCE (probably) requires going to ASM, from the sounds of it.



* Re: [PATCH] KVM: VMX: remove LFENCE in vmx_spec_ctrl_restore_host()
  2023-06-05 18:31                     ` Sean Christopherson
  2023-06-05 19:57                       ` Jon Kohler
@ 2023-06-05 20:01                       ` Josh Poimboeuf
  1 sibling, 0 replies; 19+ messages in thread
From: Josh Poimboeuf @ 2023-06-05 20:01 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Pawan Gupta, Jon Kohler, Andrew Cooper, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	X86 ML, H. Peter Anvin, Peter Zijlstra (Intel),
	Daniel Sneddon, kvm @ vger . kernel . org, LKML

On Mon, Jun 05, 2023 at 11:31:34AM -0700, Sean Christopherson wrote:
> Is there an actual bug here, or are we just micro-optimizing something that may or
> may not need additional optimization?  Unless there's a bug to be fixed, moving
> code into ASM and increasing complexity doesn't seem worthwhile.

I can't really argue with that.  FWIW, here's the (completely untested)
patch.

---8<---

From: Josh Poimboeuf <jpoimboe@kernel.org>
Subject: [PATCH] KVM: VMX: Convert vmx_spec_ctrl_restore_host() to assembly

Convert vmx_spec_ctrl_restore_host() to assembly.  This allows the
removal of a redundant LFENCE.  It also "feels" safer as it doesn't
allow the compiler to insert any surprises.  And it's more symmetrical
with the pre-vmentry SPEC_CTRL handling, which is also done in assembly.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
---
 arch/x86/kvm/vmx/vmenter.S | 71 ++++++++++++++++++++++++++++++++------
 arch/x86/kvm/vmx/vmx.c     | 25 --------------
 arch/x86/kvm/vmx/vmx.h     |  1 -
 3 files changed, 61 insertions(+), 36 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
index 631fd7da2bc3..977f3487469c 100644
--- a/arch/x86/kvm/vmx/vmenter.S
+++ b/arch/x86/kvm/vmx/vmenter.S
@@ -108,7 +108,7 @@ SYM_FUNC_START(__vmx_vcpu_run)
 	lea (%_ASM_SP), %_ASM_ARG2
 	call vmx_update_host_rsp
 
-	ALTERNATIVE "jmp .Lspec_ctrl_done", "", X86_FEATURE_MSR_SPEC_CTRL
+	ALTERNATIVE "jmp .Lguest_spec_ctrl_done", "", X86_FEATURE_MSR_SPEC_CTRL
 
 	/*
 	 * SPEC_CTRL handling: if the guest's SPEC_CTRL value differs from the
@@ -122,13 +122,13 @@ SYM_FUNC_START(__vmx_vcpu_run)
 	movl VMX_spec_ctrl(%_ASM_DI), %edi
 	movl PER_CPU_VAR(x86_spec_ctrl_current), %esi
 	cmp %edi, %esi
-	je .Lspec_ctrl_done
+	je .Lguest_spec_ctrl_done
 	mov $MSR_IA32_SPEC_CTRL, %ecx
 	xor %edx, %edx
 	mov %edi, %eax
 	wrmsr
 
-.Lspec_ctrl_done:
+.Lguest_spec_ctrl_done:
 
 	/*
 	 * Since vmentry is serializing on affected CPUs, there's no need for
@@ -253,9 +253,65 @@ SYM_INNER_LABEL(vmx_vmexit, SYM_L_GLOBAL)
 #endif
 
 	/*
-	 * IMPORTANT: RSB filling and SPEC_CTRL handling must be done before
-	 * the first unbalanced RET after vmexit!
+	 * IMPORTANT: The below SPEC_CTRL handling and RSB filling must be done
+	 * before the first RET after vmexit!
+	 */
+
+	ALTERNATIVE "jmp .Lhost_spec_ctrl_done", "", X86_FEATURE_MSR_SPEC_CTRL
+
+	pop %_ASM_SI	/* @flags */
+	pop %_ASM_DI	/* @vmx */
+
+	/*
+	 * if (flags & VMX_RUN_SAVE_SPEC_CTRL)
+	 *	vmx->spec_ctrl = __rdmsr(MSR_IA32_SPEC_CTRL);
+	 */
+	test $VMX_RUN_SAVE_SPEC_CTRL, %_ASM_SI
+	jz .Lhost_spec_ctrl_check
+
+	mov $MSR_IA32_SPEC_CTRL, %ecx
+	rdmsr
+	mov %eax, VMX_spec_ctrl(%_ASM_DI)
+
+.Lhost_spec_ctrl_check:
+	/*
+	 * If the guest/host SPEC_CTRL values differ, restore the host value.
 	 *
+	 * For legacy IBRS, the IBRS bit always needs to be written after
+	 * transitioning from a less privileged predictor mode, regardless of
+	 * whether the guest/host values differ.
+	 *
+	 * if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS) ||
+	 *     vmx->spec_ctrl != this_cpu_read(x86_spec_ctrl_current))
+	 *	native_wrmsrl(MSR_IA32_SPEC_CTRL, hostval);
+	 */
+	ALTERNATIVE "", "jmp .Lhost_spec_ctrl_write", X86_FEATURE_KERNEL_IBRS
+	movl VMX_spec_ctrl(%_ASM_DI), %edi
+	movl PER_CPU_VAR(x86_spec_ctrl_current), %esi
+	cmp %edi, %esi
+	je .Lhost_spec_ctrl_done_lfence
+
+.Lhost_spec_ctrl_write:
+	mov $MSR_IA32_SPEC_CTRL, %ecx
+	xor %edx, %edx
+	mov %esi, %eax
+	wrmsr
+
+.Lhost_spec_ctrl_done_lfence:
+	/*
+	 * This ensures that speculative execution doesn't incorrectly bypass
+	 * the above SPEC_CTRL wrmsr by mispredicting the 'je'.
+	 *
+	 * It's only needed if the below FILL_RETURN_BUFFER doesn't do an
+	 * LFENCE.  Thus the X86_FEATURE_RSB_VMEXIT{_LITE} alternatives.
+	 */
+	ALTERNATIVE_2 "lfence", \
+		      "", X86_FEATURE_RSB_VMEXIT, \
+		      "", X86_FEATURE_RSB_VMEXIT_LITE
+
+.Lhost_spec_ctrl_done:
+
+	/*
 	 * For retpoline or IBRS, RSB filling is needed to prevent poisoned RSB
 	 * entries and (in some cases) RSB underflow.
 	 *
@@ -267,11 +323,6 @@ SYM_INNER_LABEL(vmx_vmexit, SYM_L_GLOBAL)
 	FILL_RETURN_BUFFER %_ASM_CX, RSB_CLEAR_LOOPS, X86_FEATURE_RSB_VMEXIT,\
 			   X86_FEATURE_RSB_VMEXIT_LITE
 
-	pop %_ASM_ARG2	/* @flags */
-	pop %_ASM_ARG1	/* @vmx */
-
-	call vmx_spec_ctrl_restore_host
-
 	/* Put return value in AX */
 	mov %_ASM_BX, %_ASM_AX
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 44fb619803b8..d353b0e23918 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7109,31 +7109,6 @@ void noinstr vmx_update_host_rsp(struct vcpu_vmx *vmx, unsigned long host_rsp)
 	}
 }
 
-void noinstr vmx_spec_ctrl_restore_host(struct vcpu_vmx *vmx,
-					unsigned int flags)
-{
-	u64 hostval = this_cpu_read(x86_spec_ctrl_current);
-
-	if (!cpu_feature_enabled(X86_FEATURE_MSR_SPEC_CTRL))
-		return;
-
-	if (flags & VMX_RUN_SAVE_SPEC_CTRL)
-		vmx->spec_ctrl = __rdmsr(MSR_IA32_SPEC_CTRL);
-
-	/*
-	 * If the guest/host SPEC_CTRL values differ, restore the host value.
-	 *
-	 * For legacy IBRS, the IBRS bit always needs to be written after
-	 * transitioning from a less privileged predictor mode, regardless of
-	 * whether the guest/host values differ.
-	 */
-	if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS) ||
-	    vmx->spec_ctrl != hostval)
-		native_wrmsrl(MSR_IA32_SPEC_CTRL, hostval);
-
-	barrier_nospec();
-}
-
 static fastpath_t vmx_exit_handlers_fastpath(struct kvm_vcpu *vcpu)
 {
 	switch (to_vmx(vcpu)->exit_reason.basic) {
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 9e66531861cf..f9fab33f4d76 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -420,7 +420,6 @@ void vmx_set_virtual_apic_mode(struct kvm_vcpu *vcpu);
 struct vmx_uret_msr *vmx_find_uret_msr(struct vcpu_vmx *vmx, u32 msr);
 void pt_update_intercept_for_msr(struct kvm_vcpu *vcpu);
 void vmx_update_host_rsp(struct vcpu_vmx *vmx, unsigned long host_rsp);
-void vmx_spec_ctrl_restore_host(struct vcpu_vmx *vmx, unsigned int flags);
 unsigned int __vmx_vcpu_run_flags(struct vcpu_vmx *vmx);
 bool __vmx_vcpu_run(struct vcpu_vmx *vmx, unsigned long *regs,
 		    unsigned int flags);
-- 
2.40.1



* Re: [PATCH] KVM: VMX: remove LFENCE in vmx_spec_ctrl_restore_host()
  2023-06-05 16:35               ` Josh Poimboeuf
  2023-06-05 16:39                 ` Jon Kohler
@ 2023-06-06  0:20                 ` Andrew Cooper
  2023-06-06  3:59                   ` Josh Poimboeuf
  1 sibling, 1 reply; 19+ messages in thread
From: Andrew Cooper @ 2023-06-06  0:20 UTC (permalink / raw)
  To: Josh Poimboeuf, Jon Kohler
  Cc: Pawan Gupta, Sean Christopherson, Paolo Bonzini, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, X86 ML,
	H. Peter Anvin, Peter Zijlstra (Intel),
	Daniel Sneddon, kvm @ vger . kernel . org, LKML

On 05/06/2023 5:35 pm, Josh Poimboeuf wrote:
> On Mon, Jun 05, 2023 at 02:29:02PM +0000, Jon Kohler wrote:
>> Coming back to this, what if we hoisted call vmx_spec_ctrl_restore_host above
>> FILL_RETURN_BUFFER, and dropped this LFENCE as I did here?
>>
>> That way, we wouldn’t have to mess with the internal LFENCE in nospec-branch.h,
>> and that would act as the “final line of defense” LFENCE.
>>
>> Would that be acceptable? Or does FILL_RETURN_BUFFER *need* to occur
>> before any sort of calls no matter what?
> If we go by Intel's statement that only unbalanced RETs are a concern,
> that *might* be ok as long as there's a nice comment above the
> FILL_RETURN_BUFFER usage site describing the two purposes for the
> LFENCE.
>
> However, based on Andy's concerns, which I've discussed with him
> privately (but I'm not qualified to agree or disagree with), we may want
> to just convert vmx_spec_ctrl_restore_host() to asm.  Better safe than
> sorry.  My original implementation of that function was actually asm.  I
> can try to dig up that code.

Lemme see if I can remember the whole safety position.  I've just gone
through a year's worth of notes and conversations, including the
following gems:

From Intel: "And then on top of that, the LFENCE in
vmx_spec_ctrl_restore_host would save you. Fingers crossed."
From me: "The how and why is important.  Not for right now, but for the
years to come when all of this logic inevitably gets
tweaked/refactored/moved".

Date-check says 11 months...


__vmx_vcpu_run() is a leaf function as far as the kernel stack is
concerned, so to a first approximation, it behaves as such:

    VMEXIT
    RET

The RET gets a prediction from the top of the RSB (a guest-controlled
value), but during execution the prediction is discovered to be wrong,
so a resteer occurs, causing it to restart from __vmx_vcpu_run()'s caller.

Skipping the middle-ground with a 32-entry RSB stuffing loop, we have
the following behaviour on eIBRS parts:

    VMEXIT (flush RSB if IBRS was 1 in guest)
    Restore Host MSR_SPEC_CTRL (flush RSB if IBRS was 0 in guest)
    RET

where the RET (in theory) encounters an RSB-empty condition and falls
back to an indirect prediction.

PBRSB is a missing corner case in the hardware RSB flush, which is only
corrected after one CALL instruction architecturally retires.

The problem with mitigating it, however, is that it is heavily entangled
with BCBS, so I refer you to
https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/analysis-speculative-execution-side-channels.html#inpage-nav-undefined-1-undefined
which describes one example of how RETs can go wrong.

The critical observation is that while for PBRSB it's the first
unbalanced RET which causes problems, as soon as *any* RET has executed
and got a bad resteer (even temporarily), you've lost all ability to
contain the damage.  So, to protect against PBRSB, one CALL instruction
must retire before *any* RET instruction executes.

Pawan's patch to turn the unconditional lfence into an else-lfence
should be safe seeing as Intel's guidance
https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/post-barrier-return-stack-buffer-predictions.html
explicitly says you can use the WRMSR to restore the host MSR_SPEC_CTRL
value as the speculation barrier.

But, the safety of vmx_spec_ctrl_restore_host() in the first place
depends on the early return never ever becoming a conditional, and the
compiler never emitting a call to memcpy()/memset()/whatever behind your
back - something which is not prohibited by noinstr.
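
A concrete (hypothetical) example of that kind of compiler behaviour:

/*
 * A plain struct assignment, large enough that GCC/clang will typically
 * lower it to a library call ("call memcpy"), silently adding a
 * CALL/RET pair to a function written with neither.  Names are made up
 * for illustration.
 */
struct big_state {
	unsigned long regs[64];
};

static void save_state(struct big_state *dst, const struct big_state *src)
{
	*dst = *src;	/* may compile to: call memcpy */
}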

I hope this clears things up.

~Andrew


* Re: [PATCH] KVM: VMX: remove LFENCE in vmx_spec_ctrl_restore_host()
  2023-06-06  0:20                 ` Andrew Cooper
@ 2023-06-06  3:59                   ` Josh Poimboeuf
  0 siblings, 0 replies; 19+ messages in thread
From: Josh Poimboeuf @ 2023-06-06  3:59 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Jon Kohler, Pawan Gupta, Sean Christopherson, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	X86 ML, H. Peter Anvin, Peter Zijlstra (Intel),
	Daniel Sneddon, kvm @ vger . kernel . org, LKML

On Tue, Jun 06, 2023 at 01:20:52AM +0100, Andrew Cooper wrote:

<clip very useful summary which belongs in git somewhere>

> But, the safety of vmx_spec_ctrl_restore_host() in the first place
> depends on the early return never ever becoming a conditional,

Good point.  And that would be easier to overlook in C.

> and the compiler never emitting a call to memcpy()/memset()/whatever
> behind your back - something which is not prohibited by noinstr.

Au contraire, objtool has checking for that:

	if (state->noinstr && state->instr <= 0 &&
	    !noinstr_call_dest(file, insn, insn_call_dest(insn))) {
		WARN_INSN(insn, "call to %s() leaves .noinstr.text section", call_dest_name(insn));
		return 1;
	}

Regardless, despite being the person who wrote this thing in C to begin
with, I believe asm really is a better fit due to the delicate and
precise nature of the mitigations.

-- 
Josh

