* [PATCH v4] x86/speculation, KVM: remove IBPB on vCPU load
@ 2022-05-12 18:45 Jon Kohler
  2022-05-12 19:27 ` Sean Christopherson
  0 siblings, 1 reply; 16+ messages in thread
From: Jon Kohler @ 2022-05-12 18:45 UTC (permalink / raw)
  To: Jonathan Corbet, Paolo Bonzini, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Kees Cook, Andrea Arcangeli, Josh Poimboeuf,
	Kim Phillips, Lukas Bulwahn, Jon Kohler, Peter Zijlstra,
	Ashok Raj, KarimAllah Ahmed, David Woodhouse, linux-doc,
	linux-kernel, kvm
  Cc: Waiman Long

Remove IBPB that is done on KVM vCPU load, as the guest-to-guest
attack surface is already covered by switch_mm_irqs_off() ->
cond_mitigation().
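
For reference, that mm-switch coverage boils down to roughly the
following (a simplified sketch of cond_mitigation() in
arch/x86/mm/tlb.c; the real code tracks per-task TIF spec bits and has
additional conditions):

    static void cond_mitigation(struct task_struct *next)
    {
            unsigned long prev_mm, next_mm;

            prev_mm = this_cpu_read(cpu_tlbstate.last_user_mm_spec);
            next_mm = mm_mangle_tif_spec_bits(next);

            /*
             * cond_ibpb: barrier only when switching to a different mm
             * and at least one of the two tasks opted in to IBPB, e.g.
             * via prctl() or seccomp.
             */
            if (static_branch_likely(&switch_mm_cond_ibpb) &&
                next_mm != prev_mm &&
                (next_mm | prev_mm) & LAST_USER_MM_IBPB)
                    indirect_branch_prediction_barrier();

            /* always_ibpb: barrier on every switch to a different mm. */
            if (static_branch_unlikely(&switch_mm_always_ibpb) &&
                (prev_mm & ~LAST_USER_MM_SPEC_MASK) != (unsigned long)next->mm)
                    indirect_branch_prediction_barrier();

            this_cpu_write(cpu_tlbstate.last_user_mm_spec, next_mm);
    }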

The original commit 15d45071523d ("KVM/x86: Add IBPB support") was simply
wrong in its guest-to-guest design intention. There are three scenarios
at play here:

1. If the vCPUs belong to the same VM, they are in the same security
domain and do not need an IBPB.
2. If the vCPUs belong to different VMs, and each VM is in its own
mm_struct, switch_mm_irqs_off() will handle IBPB as an mm switch is
guaranteed to occur prior to loading a vCPU belonging to a different VM.
3. If the vCPUs belong to different VMs, but multiple VMs share an
mm_struct, then the security benefits of an IBPB when switching vCPUs
are dubious, at best.

Issuing IBPB from KVM vCPU load would only cover #3, but there are no
known real world, tangible use cases for running multiple VMs belonging
to different security domains in a shared address space. Running multiple
VMs in a single address space is plausible and sane, _if_ they are all
in the same security domain or security is not a concern. If multiple VMs
are sharing an mm_struct, prediction attacks are the least of their
security worries.

Update the documentation to reflect what we're protecting against and
what we're not, and remove the now-unused "buddy" parameter from
vmx_vcpu_load_vmcs().

Fixes: 15d45071523d ("KVM/x86: Add IBPB support")
Signed-off-by: Jon Kohler <jon@nutanix.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Waiman Long <longman@redhat.com>
---
v1
 - https://lore.kernel.org/all/20220411164636.74866-1-jon@nutanix.com/
v1 -> v2:
 - https://lore.kernel.org/all/20220419020011.65995-1-jon@nutanix.com/
 - Addressed comments on approach from Sean.
v2 -> v3:
 - https://lore.kernel.org/all/20220422162103.32736-1-jon@nutanix.com/
 - Updated spec-ctrl.h comments and commit msg to incorporate
   additional feedback from Sean.
v3 -> v4:
 - Addressed comments from Boris and Sean.
 - Narrowed change to removing KVM's IBPB only + docs update.
 - Removed "buddy" on vmx_vcpu_load_vmcs() as it is now unused.

 Documentation/admin-guide/hw-vuln/spectre.rst | 33 +++++++++++++++++--
 arch/x86/kvm/svm/svm.c                        |  1 -
 arch/x86/kvm/vmx/nested.c                     |  6 ++--
 arch/x86/kvm/vmx/vmx.c                        | 13 ++------
 arch/x86/kvm/vmx/vmx.h                        |  3 +-
 5 files changed, 37 insertions(+), 19 deletions(-)

diff --git a/Documentation/admin-guide/hw-vuln/spectre.rst b/Documentation/admin-guide/hw-vuln/spectre.rst
index 9e9556826450..1705f7a0b15d 100644
--- a/Documentation/admin-guide/hw-vuln/spectre.rst
+++ b/Documentation/admin-guide/hw-vuln/spectre.rst
@@ -314,7 +314,23 @@ Spectre variant 2

    Linux kernel mitigates attacks to other guests running in the same
    CPU hardware thread by flushing the return stack buffer on VM exit,
-   and clearing the branch target buffer before switching to a new guest.
+   and clearing the branch target buffer before switching to a new guest
+   using IBPB.
+
+   When using IBPB to clear the branch target buffer, there are three
+   scenarios at play for VM switching:
+
+   - If switching between vCPUs belonging to the same guest VM on the same
+     CPU hardware thread, they are considered to be in the same security
+     domain and do not need an IBPB.
+   - If switching between vCPUs that belong to different VMs, and each VM
+     has its own memory address space, then an IBPB will be issued during
+     the context switch.
+   - If the Virtual Machine Monitor (VMM) configures multiple VMs to share a
+     single address space, they are all considered one security domain, and
+     switching between vCPUs that belong to different VMs in that address
+     space will not use IBPB. Sharing an address space between multiple VMs
+     is uncommon and should be discouraged for security-minded workloads.

    If SMT is used, Spectre variant 2 attacks from an untrusted guest
    in the sibling hyperthread can be mitigated by the administrator,
@@ -534,7 +550,9 @@ Spectre variant 2

    To mitigate guest-to-guest attacks in the same CPU hardware thread,
    the branch target buffer is sanitized by flushing before switching
-   to a new guest on a CPU.
+   to a new guest on a CPU. Note that this will not occur if guests are
+   configured to use a single shared memory address space, as that is
+   considered a single security domain.

    The above mitigations are turned on by default on vulnerable CPUs.

@@ -660,6 +678,17 @@ Mitigation selection guide
    against variant 2 attacks originating from programs running on
    sibling threads.

+   Note that in a virtualized environment using KVM, this flush will be
+   done when switching between VMs. Each VM is considered its own
+   security domain and IBPB will not flush when switching between
+   vCPUs from a single guest running on the same pCPU. Virtual machine
+   monitors (VMMs), such as qemu-kvm, traditionally configure each VM to
+   be a separate process with a separate memory address space; however,
+   it should be explicitly noted that IBPB will not be issued on vCPU
+   switches if a VMM configures multiple VMs in a shared address space.
+   While such a configuration is plausible, it is not practical from a
+   security perspective and should be avoided for security-minded workloads.
+
    Alternatively, STIBP can be used only when running programs
    whose indirect branch speculation is explicitly disabled,
    while IBPB is still used all the time when switching to a new
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 7e45d03cd018..051955145c29 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1302,7 +1302,6 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)

 	if (sd->current_vmcb != svm->vmcb) {
 		sd->current_vmcb = svm->vmcb;
-		indirect_branch_prediction_barrier();
 	}
 	if (kvm_vcpu_apicv_active(vcpu))
 		__avic_vcpu_load(vcpu, cpu);
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 856c87563883..e2271d935d5c 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -266,7 +266,7 @@ static void vmx_switch_vmcs(struct kvm_vcpu *vcpu, struct loaded_vmcs *vmcs)
 	cpu = get_cpu();
 	prev = vmx->loaded_vmcs;
 	vmx->loaded_vmcs = vmcs;
-	vmx_vcpu_load_vmcs(vcpu, cpu, prev);
+	vmx_vcpu_load_vmcs(vcpu, cpu);
 	vmx_sync_vmcs_host_state(vmx, prev);
 	put_cpu();

@@ -4097,12 +4097,12 @@ static void copy_vmcs02_to_vmcs12_rare(struct kvm_vcpu *vcpu,

 	cpu = get_cpu();
 	vmx->loaded_vmcs = &vmx->nested.vmcs02;
-	vmx_vcpu_load_vmcs(vcpu, cpu, &vmx->vmcs01);
+	vmx_vcpu_load_vmcs(vcpu, cpu);

 	sync_vmcs02_to_vmcs12_rare(vcpu, vmcs12);

 	vmx->loaded_vmcs = &vmx->vmcs01;
-	vmx_vcpu_load_vmcs(vcpu, cpu, &vmx->nested.vmcs02);
+	vmx_vcpu_load_vmcs(vcpu, cpu);
 	put_cpu();
 }

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 610355b9ccce..13f720686ad1 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1235,8 +1235,7 @@ static void vmx_write_guest_kernel_gs_base(struct vcpu_vmx *vmx, u64 data)
 }
 #endif

-void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,
-			struct loaded_vmcs *buddy)
+void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	bool already_loaded = vmx->loaded_vmcs->cpu == cpu;
@@ -1263,14 +1262,6 @@ void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,
 	if (prev != vmx->loaded_vmcs->vmcs) {
 		per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
 		vmcs_load(vmx->loaded_vmcs->vmcs);
-
-		/*
-		 * No indirect branch prediction barrier needed when switching
-		 * the active VMCS within a guest, e.g. on nested VM-Enter.
-		 * The L1 VMM can protect itself with retpolines, IBPB or IBRS.
-		 */
-		if (!buddy || WARN_ON_ONCE(buddy->vmcs != prev))
-			indirect_branch_prediction_barrier();
 	}

 	if (!already_loaded) {
@@ -1308,7 +1299,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);

-	vmx_vcpu_load_vmcs(vcpu, cpu, NULL);
+	vmx_vcpu_load_vmcs(vcpu, cpu);

 	vmx_vcpu_pi_load(vcpu, cpu);

diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index b98c7e96697a..d5f6d8d0b7ca 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -369,8 +369,7 @@ struct kvm_vmx {
 };

 bool nested_vmx_allowed(struct kvm_vcpu *vcpu);
-void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,
-			struct loaded_vmcs *buddy);
+void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu);
 int allocate_vpid(void);
 void free_vpid(int vpid);
 void vmx_set_constant_host_state(struct vcpu_vmx *vmx);
--
2.30.1 (Apple Git-130)



* Re: [PATCH v4] x86/speculation, KVM: remove IBPB on vCPU load
  2022-05-12 18:45 [PATCH v4] x86/speculation, KVM: remove IBPB on vCPU load Jon Kohler
@ 2022-05-12 19:27 ` Sean Christopherson
  2022-05-12 19:35   ` Sean Christopherson
  0 siblings, 1 reply; 16+ messages in thread
From: Sean Christopherson @ 2022-05-12 19:27 UTC (permalink / raw)
  To: Jon Kohler
  Cc: Jonathan Corbet, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Kees Cook,
	Andrea Arcangeli, Josh Poimboeuf, Kim Phillips, Lukas Bulwahn,
	Peter Zijlstra, Ashok Raj, KarimAllah Ahmed, David Woodhouse,
	linux-doc, linux-kernel, kvm, Waiman Long

On Thu, May 12, 2022, Jon Kohler wrote:
> Remove IBPB that is done on KVM vCPU load, as the guest-to-guest
> attack surface is already covered by switch_mm_irqs_off() ->
> cond_mitigation().
> 
> The original commit 15d45071523d ("KVM/x86: Add IBPB support") was simply
> wrong in its guest-to-guest design intention. There are three scenarios
> at play here:

Jim pointed offline that there's a case we didn't consider.  When switching between
vCPUs in the same VM, an IBPB may be warranted as the tasks in the VM may be in
different security domains.  E.g. the guest will not get a notification that vCPU0 is
being swapped out for vCPU1 on a single pCPU.

So, sadly, after all that, I think the IBPB needs to stay.  But the documentation
most definitely needs to be updated.

A per-VM capability to skip the IBPB may be warranted, e.g. for container-like
use cases where a single VM is running a single workload.
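
Such a per-VM knob could look roughly like the following sketch; the
capability name, number and the kvm->arch field are invented here purely
for illustration and are not an existing KVM API:

    /* In kvm_vm_ioctl_enable_cap(), a hypothetical per-VM opt-out: */
    case KVM_CAP_SKIP_VCPU_LOAD_IBPB:       /* invented name, illustrative only */
            kvm->arch.skip_vcpu_load_ibpb = !!cap->args[0];
            r = 0;
            break;

    /* ...consumed at vCPU load, e.g. in svm_vcpu_load(): */
    if (sd->current_vmcb != svm->vmcb) {
            sd->current_vmcb = svm->vmcb;
            if (!vcpu->kvm->arch.skip_vcpu_load_ibpb)
                    indirect_branch_prediction_barrier();
    }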


* Re: [PATCH v4] x86/speculation, KVM: remove IBPB on vCPU load
  2022-05-12 19:27 ` Sean Christopherson
@ 2022-05-12 19:35   ` Sean Christopherson
  2022-05-12 19:51     ` Jon Kohler
  0 siblings, 1 reply; 16+ messages in thread
From: Sean Christopherson @ 2022-05-12 19:35 UTC (permalink / raw)
  To: Jon Kohler
  Cc: Jonathan Corbet, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Kees Cook,
	Andrea Arcangeli, Josh Poimboeuf, Kim Phillips, Lukas Bulwahn,
	Peter Zijlstra, Ashok Raj, KarimAllah Ahmed, David Woodhouse,
	linux-doc, linux-kernel, kvm, Waiman Long

On Thu, May 12, 2022, Sean Christopherson wrote:
> On Thu, May 12, 2022, Jon Kohler wrote:
> > Remove IBPB that is done on KVM vCPU load, as the guest-to-guest
> > attack surface is already covered by switch_mm_irqs_off() ->
> > cond_mitigation().
> > 
> > The original commit 15d45071523d ("KVM/x86: Add IBPB support") was simply
> > wrong in its guest-to-guest design intention. There are three scenarios
> > at play here:
> 
> Jim pointed offline that there's a case we didn't consider.  When switching between
> vCPUs in the same VM, an IBPB may be warranted as the tasks in the VM may be in
> different security domains.  E.g. the guest will not get a notification that vCPU0 is
> being swapped out for vCPU1 on a single pCPU.
> 
> So, sadly, after all that, I think the IBPB needs to stay.  But the documentation
> most definitely needs to be updated.
> 
> A per-VM capability to skip the IBPB may be warranted, e.g. for container-like
> use cases where a single VM is running a single workload.

Ah, actually, the IBPB can be skipped if the vCPUs have different mm_structs,
because then the IBPB is fully redundant with respect to any IBPB performed by
switch_mm_irqs_off().  Hrm, though it might need a KVM or per-VM knob, e.g. just
because the VMM doesn't want IBPB doesn't mean the guest doesn't want IBPB.

That would also sidestep the largely theoretical question of whether vCPUs from
different VMs but the same address space are in the same security domain.  It doesn't
matter, because even if they are in the same domain, KVM still needs to do IBPB.
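
The mm-based skip described above might look roughly like this minimal
sketch; the per-CPU variable and helper are hypothetical, invented for
illustration, and are not existing KVM code:

    static DEFINE_PER_CPU(struct mm_struct *, last_vcpu_mm);

    static void vcpu_load_cond_ibpb(struct kvm_vcpu *vcpu, int cpu)
    {
            struct mm_struct *prev_mm = per_cpu(last_vcpu_mm, cpu);

            /*
             * If the previously loaded vCPU used a different mm, the mm
             * switch already goes through switch_mm_irqs_off() ->
             * cond_mitigation(), so an extra IBPB here is redundant.
             * Only barrier when the mm is shared (and, in a real
             * implementation, only when it is actually a different vCPU,
             * which the current code approximates via the VMCS/VMCB
             * pointer check).
             */
            if (prev_mm == vcpu->kvm->mm)
                    indirect_branch_prediction_barrier();

            per_cpu(last_vcpu_mm, cpu) = vcpu->kvm->mm;
    }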


* Re: [PATCH v4] x86/speculation, KVM: remove IBPB on vCPU load
  2022-05-12 19:35   ` Sean Christopherson
@ 2022-05-12 19:51     ` Jon Kohler
  2022-05-12 20:06       ` Jim Mattson
  2022-05-12 20:07       ` Sean Christopherson
  0 siblings, 2 replies; 16+ messages in thread
From: Jon Kohler @ 2022-05-12 19:51 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Jon Kohler, Jonathan Corbet, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, X86 ML,
	H. Peter Anvin, Kees Cook, Andrea Arcangeli, Josh Poimboeuf,
	Kim Phillips, Lukas Bulwahn, Peter Zijlstra, Ashok Raj,
	KarimAllah Ahmed, David Woodhouse, linux-doc, LKML,
	kvm @ vger . kernel . org, Waiman Long



> On May 12, 2022, at 3:35 PM, Sean Christopherson <seanjc@google.com> wrote:
> 
> On Thu, May 12, 2022, Sean Christopherson wrote:
>> On Thu, May 12, 2022, Jon Kohler wrote:
>>> Remove IBPB that is done on KVM vCPU load, as the guest-to-guest
>>> attack surface is already covered by switch_mm_irqs_off() ->
>>> cond_mitigation().
>>> 
>>> The original commit 15d45071523d ("KVM/x86: Add IBPB support") was simply
>>> wrong in its guest-to-guest design intention. There are three scenarios
>>> at play here:
>> 
>> Jim pointed offline that there's a case we didn't consider.  When switching between
>> vCPUs in the same VM, an IBPB may be warranted as the tasks in the VM may be in
>> different security domains.  E.g. the guest will not get a notification that vCPU0 is
>> being swapped out for vCPU1 on a single pCPU.
>> 
>> So, sadly, after all that, I think the IBPB needs to stay.  But the documentation
>> most definitely needs to be updated.
>> 
>> A per-VM capability to skip the IBPB may be warranted, e.g. for container-like
>> use cases where a single VM is running a single workload.
> 
> Ah, actually, the IBPB can be skipped if the vCPUs have different mm_structs,
> because then the IBPB is fully redundant with respect to any IBPB performed by
> switch_mm_irqs_off().  Hrm, though it might need a KVM or per-VM knob, e.g. just
> because the VMM doesn't want IBPB doesn't mean the guest doesn't want IBPB.
> 
> That would also sidestep the largely theoretical question of whether vCPUs from
> different VMs but the same address space are in the same security domain.  It doesn't
> matter, because even if they are in the same domain, KVM still needs to do IBPB.

So should we go back to the earlier approach where we have it be only 
IBPB on always_ibpb? Or what?

At minimum, we need to fix the unilateral-ness of all of this :) since we’re
IBPB’ing even when the user did not explicitly tell us to.

That said, since I just re-read the documentation today, it does specifically
suggest that if the guest wants to protect *itself* it should turn on IBPB or
STIBP (or other mitigations galore), so I think we end up having to think
about what our “contract” is with users who host their workloads on
KVM - are they expecting us to protect them in any/all cases?

Said another way, the internal guest areas of concern aren’t something
the kernel would always be able to A) identify far in advance and B)
always solve on the users behalf. There is an argument to be made
that the guest needs to deal with its own house, yea?



* Re: [PATCH v4] x86/speculation, KVM: remove IBPB on vCPU load
  2022-05-12 19:51     ` Jon Kohler
@ 2022-05-12 20:06       ` Jim Mattson
  2022-05-12 20:07       ` Sean Christopherson
  1 sibling, 0 replies; 16+ messages in thread
From: Jim Mattson @ 2022-05-12 20:06 UTC (permalink / raw)
  To: Jon Kohler
  Cc: Sean Christopherson, Jonathan Corbet, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Joerg Roedel, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, X86 ML,
	H. Peter Anvin, Kees Cook, Andrea Arcangeli, Josh Poimboeuf,
	Kim Phillips, Lukas Bulwahn, Peter Zijlstra, Ashok Raj,
	KarimAllah Ahmed, David Woodhouse, linux-doc, LKML,
	kvm @ vger . kernel . org, Waiman Long

On Thu, May 12, 2022 at 12:51 PM Jon Kohler <jon@nutanix.com> wrote:
>
>
>
> > On May 12, 2022, at 3:35 PM, Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Thu, May 12, 2022, Sean Christopherson wrote:
> >> On Thu, May 12, 2022, Jon Kohler wrote:
> >>> Remove IBPB that is done on KVM vCPU load, as the guest-to-guest
> >>> attack surface is already covered by switch_mm_irqs_off() ->
> >>> cond_mitigation().
> >>>
> >>> The original commit 15d45071523d ("KVM/x86: Add IBPB support") was simply
> >>> wrong in its guest-to-guest design intention. There are three scenarios
> >>> at play here:
> >>
> >> Jim pointed offline that there's a case we didn't consider.  When switching between
> >> vCPUs in the same VM, an IBPB may be warranted as the tasks in the VM may be in
> >> different security domains.  E.g. the guest will not get a notification that vCPU0 is
> >> being swapped out for vCPU1 on a single pCPU.
> >>
> >> So, sadly, after all that, I think the IBPB needs to stay.  But the documentation
> >> most definitely needs to be updated.
> >>
> >> A per-VM capability to skip the IBPB may be warranted, e.g. for container-like
> >> use cases where a single VM is running a single workload.
> >
> > Ah, actually, the IBPB can be skipped if the vCPUs have different mm_structs,
> > because then the IBPB is fully redundant with respect to any IBPB performed by
> > switch_mm_irqs_off().  Hrm, though it might need a KVM or per-VM knob, e.g. just
> > because the VMM doesn't want IBPB doesn't mean the guest doesn't want IBPB.
> >
> > That would also sidestep the largely theoretical question of whether vCPUs from
> > different VMs but the same address space are in the same security domain.  It doesn't
> > matter, because even if they are in the same domain, KVM still needs to do IBPB.
>
> So should we go back to the earlier approach where we have it be only
> IBPB on always_ibpb? Or what?
>
> At minimum, we need to fix the unilateral-ness of all of this :) since we’re
> IBPB’ing even when the user did not explicitly tell us to.
>
> That said, since I just re-read the documentation today, it does specifically
> suggest that if the guest wants to protect *itself* it should turn on IBPB or
> STIBP (or other mitigations galore), so I think we end up having to think
> about what our “contract” is with users who host their workloads on
> KVM - are they expecting us to protect them in any/all cases?
>
> Said another way, the internal guest areas of concern aren’t something
> the kernel would always be able to A) identify far in advance and B)
> always solve on the users behalf. There is an argument to be made
> that the guest needs to deal with its own house, yea?

To the extent that the guest has control over its own house, yes.

Say the guest obviates the need for internal IBPB by statically
partitioning virtual cores into different security domains. If the
hypervisor breaks core isolation on the physical platform, it is
responsible for providing the necessary mitigations.


* Re: [PATCH v4] x86/speculation, KVM: remove IBPB on vCPU load
  2022-05-12 19:51     ` Jon Kohler
  2022-05-12 20:06       ` Jim Mattson
@ 2022-05-12 20:07       ` Sean Christopherson
  2022-05-12 20:27         ` Jim Mattson
  2022-05-12 20:31         ` Jon Kohler
  1 sibling, 2 replies; 16+ messages in thread
From: Sean Christopherson @ 2022-05-12 20:07 UTC (permalink / raw)
  To: Jon Kohler
  Cc: Jonathan Corbet, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, X86 ML, H. Peter Anvin, Kees Cook,
	Andrea Arcangeli, Josh Poimboeuf, Kim Phillips, Lukas Bulwahn,
	Peter Zijlstra, Ashok Raj, KarimAllah Ahmed, David Woodhouse,
	linux-doc, LKML, kvm @ vger . kernel . org, Waiman Long

On Thu, May 12, 2022, Jon Kohler wrote:
> 
> 
> > On May 12, 2022, at 3:35 PM, Sean Christopherson <seanjc@google.com> wrote:
> > 
> > On Thu, May 12, 2022, Sean Christopherson wrote:
> >> On Thu, May 12, 2022, Jon Kohler wrote:
> >>> Remove IBPB that is done on KVM vCPU load, as the guest-to-guest
> >>> attack surface is already covered by switch_mm_irqs_off() ->
> >>> cond_mitigation().
> >>> 
> >>> The original commit 15d45071523d ("KVM/x86: Add IBPB support") was simply
> >>> wrong in its guest-to-guest design intention. There are three scenarios
> >>> at play here:
> >> 
> >> Jim pointed offline that there's a case we didn't consider.  When switching between
> >> vCPUs in the same VM, an IBPB may be warranted as the tasks in the VM may be in
> >> different security domains.  E.g. the guest will not get a notification that vCPU0 is
> >> being swapped out for vCPU1 on a single pCPU.
> >> 
> >> So, sadly, after all that, I think the IBPB needs to stay.  But the documentation
> >> most definitely needs to be updated.
> >> 
> >> A per-VM capability to skip the IBPB may be warranted, e.g. for container-like
> >> use cases where a single VM is running a single workload.
> > 
> > Ah, actually, the IBPB can be skipped if the vCPUs have different mm_structs,
> > because then the IBPB is fully redundant with respect to any IBPB performed by
> > switch_mm_irqs_off().  Hrm, though it might need a KVM or per-VM knob, e.g. just
> > because the VMM doesn't want IBPB doesn't mean the guest doesn't want IBPB.
> > 
> > That would also sidestep the largely theoretical question of whether vCPUs from
> > different VMs but the same address space are in the same security domain.  It doesn't
> > matter, because even if they are in the same domain, KVM still needs to do IBPB.
> 
> So should we go back to the earlier approach where we have it be only 
> IBPB on always_ibpb? Or what?
> 
> At minimum, we need to fix the unilateral-ness of all of this :) since we’re
> IBPB’ing even when the user did not explicitly tell us to.

I think we need separate controls for the guest.  E.g. if the userspace VMM is
sufficiently hardened then it can run without "do IBPB" flag, but that doesn't
mean that the entire guest it's running is sufficiently hardened.

> That said, since I just re-read the documentation today, it does specifically
> suggest that if the guest wants to protect *itself* it should turn on IBPB or
> STIBP (or other mitigations galore), so I think we end up having to think
> about what our “contract” is with users who host their workloads on
> KVM - are they expecting us to protect them in any/all cases?
> 
> Said another way, the internal guest areas of concern aren’t something
> the kernel would always be able to A) identify far in advance and B)
> always solve on the users behalf. There is an argument to be made
> that the guest needs to deal with its own house, yea?

The issue is that the guest won't get a notification if vCPU0 is replaced with
vCPU1 on the same physical CPU, thus the guest doesn't get an opportunity to emit
IBPB.  Since the host doesn't know whether or not the guest wants IBPB, unless the
owner of the host is also the owner of the guest workload, the safe approach is to
assume the guest is vulnerable.


* Re: [PATCH v4] x86/speculation, KVM: remove IBPB on vCPU load
  2022-05-12 20:07       ` Sean Christopherson
@ 2022-05-12 20:27         ` Jim Mattson
  2022-05-12 20:33           ` Jon Kohler
  2022-05-12 20:31         ` Jon Kohler
  1 sibling, 1 reply; 16+ messages in thread
From: Jim Mattson @ 2022-05-12 20:27 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Jon Kohler, Jonathan Corbet, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, X86 ML, H. Peter Anvin, Kees Cook,
	Andrea Arcangeli, Josh Poimboeuf, Kim Phillips, Lukas Bulwahn,
	Peter Zijlstra, Ashok Raj, KarimAllah Ahmed, David Woodhouse,
	linux-doc, LKML, kvm @ vger . kernel . org, Waiman Long

On Thu, May 12, 2022 at 1:07 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Thu, May 12, 2022, Jon Kohler wrote:
> >
> >
> > > On May 12, 2022, at 3:35 PM, Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > On Thu, May 12, 2022, Sean Christopherson wrote:
> > >> On Thu, May 12, 2022, Jon Kohler wrote:
> > >>> Remove IBPB that is done on KVM vCPU load, as the guest-to-guest
> > >>> attack surface is already covered by switch_mm_irqs_off() ->
> > >>> cond_mitigation().
> > >>>
> > >>> The original commit 15d45071523d ("KVM/x86: Add IBPB support") was simply
> > >>> wrong in its guest-to-guest design intention. There are three scenarios
> > >>> at play here:
> > >>
> > >> Jim pointed offline that there's a case we didn't consider.  When switching between
> > >> vCPUs in the same VM, an IBPB may be warranted as the tasks in the VM may be in
> > >> different security domains.  E.g. the guest will not get a notification that vCPU0 is
> > >> being swapped out for vCPU1 on a single pCPU.
> > >>
> > >> So, sadly, after all that, I think the IBPB needs to stay.  But the documentation
> > >> most definitely needs to be updated.
> > >>
> > >> A per-VM capability to skip the IBPB may be warranted, e.g. for container-like
> > >> use cases where a single VM is running a single workload.
> > >
> > > Ah, actually, the IBPB can be skipped if the vCPUs have different mm_structs,
> > > because then the IBPB is fully redundant with respect to any IBPB performed by
> > > switch_mm_irqs_off().  Hrm, though it might need a KVM or per-VM knob, e.g. just
> > > because the VMM doesn't want IBPB doesn't mean the guest doesn't want IBPB.
> > >
> > > That would also sidestep the largely theoretical question of whether vCPUs from
> > > different VMs but the same address space are in the same security domain.  It doesn't
> > > matter, because even if they are in the same domain, KVM still needs to do IBPB.
> >
> > So should we go back to the earlier approach where we have it be only
> > IBPB on always_ibpb? Or what?
> >
> > At minimum, we need to fix the unilateral-ness of all of this :) since we’re
> > IBPB’ing even when the user did not explicitly tell us to.
>
> I think we need separate controls for the guest.  E.g. if the userspace VMM is
> sufficiently hardened then it can run without "do IBPB" flag, but that doesn't
> mean that the entire guest it's running is sufficiently hardened.
>
> > That said, since I just re-read the documentation today, it does specifically
> > suggest that if the guest wants to protect *itself* it should turn on IBPB or
> > STIBP (or other mitigations galore), so I think we end up having to think
> > about what our “contract” is with users who host their workloads on
> > KVM - are they expecting us to protect them in any/all cases?
> >
> > Said another way, the internal guest areas of concern aren’t something
> > the kernel would always be able to A) identify far in advance and B)
> > always solve on the users behalf. There is an argument to be made
> > that the guest needs to deal with its own house, yea?
>
> The issue is that the guest won't get a notification if vCPU0 is replaced with
> vCPU1 on the same physical CPU, thus the guest doesn't get an opportunity to emit
> IBPB.  Since the host doesn't know whether or not the guest wants IBPB, unless the
> owner of the host is also the owner of the guest workload, the safe approach is to
> assume the guest is vulnerable.

Exactly. And if the guest has used taskset as its mitigation strategy,
how is the host to know?


* Re: [PATCH v4] x86/speculation, KVM: remove IBPB on vCPU load
  2022-05-12 20:07       ` Sean Christopherson
  2022-05-12 20:27         ` Jim Mattson
@ 2022-05-12 20:31         ` Jon Kohler
  1 sibling, 0 replies; 16+ messages in thread
From: Jon Kohler @ 2022-05-12 20:31 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Jon Kohler, Jonathan Corbet, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, X86 ML,
	H. Peter Anvin, Kees Cook, Andrea Arcangeli, Josh Poimboeuf,
	Kim Phillips, Lukas Bulwahn, Peter Zijlstra, Ashok Raj,
	KarimAllah Ahmed, David Woodhouse, linux-doc, LKML,
	kvm @ vger . kernel . org, Waiman Long



> On May 12, 2022, at 4:07 PM, Sean Christopherson <seanjc@google.com> wrote:
> 
> On Thu, May 12, 2022, Jon Kohler wrote:
>> 
>> 
>>> On May 12, 2022, at 3:35 PM, Sean Christopherson <seanjc@google.com> wrote:
>>> 
>>> On Thu, May 12, 2022, Sean Christopherson wrote:
>>>> On Thu, May 12, 2022, Jon Kohler wrote:
>>>>> Remove IBPB that is done on KVM vCPU load, as the guest-to-guest
>>>>> attack surface is already covered by switch_mm_irqs_off() ->
>>>>> cond_mitigation().
>>>>> 
>>>>> The original commit 15d45071523d ("KVM/x86: Add IBPB support") was simply
>>>>> wrong in its guest-to-guest design intention. There are three scenarios
>>>>> at play here:
>>>> 
>>>> Jim pointed offline that there's a case we didn't consider.  When switching between
>>>> vCPUs in the same VM, an IBPB may be warranted as the tasks in the VM may be in
>>>> different security domains.  E.g. the guest will not get a notification that vCPU0 is
>>>> being swapped out for vCPU1 on a single pCPU.
>>>> 
>>>> So, sadly, after all that, I think the IBPB needs to stay.  But the documentation
>>>> most definitely needs to be updated.
>>>> 
>>>> A per-VM capability to skip the IBPB may be warranted, e.g. for container-like
>>>> use cases where a single VM is running a single workload.
>>> 
>>> Ah, actually, the IBPB can be skipped if the vCPUs have different mm_structs,
>>> because then the IBPB is fully redundant with respect to any IBPB performed by
>>> switch_mm_irqs_off().  Hrm, though it might need a KVM or per-VM knob, e.g. just
>>> because the VMM doesn't want IBPB doesn't mean the guest doesn't want IBPB.
>>> 
>>> That would also sidestep the largely theoretical question of whether vCPUs from
>>> different VMs but the same address space are in the same security domain.  It doesn't
>>> matter, because even if they are in the same domain, KVM still needs to do IBPB.
>> 
>> So should we go back to the earlier approach where we have it be only 
>> IBPB on always_ibpb? Or what?
>> 
>> At minimum, we need to fix the unilateral-ness of all of this :) since we’re
>> IBPB’ing even when the user did not explicitly tell us to.
> 
> I think we need separate controls for the guest.  E.g. if the userspace VMM is
> sufficiently hardened then it can run without "do IBPB" flag, but that doesn't
> mean that the entire guest it's running is sufficiently hardened.

What if we keyed off the MSR bitmap, such that if a guest *ever* issued an IBPB, KVM
can do IBPB on switch? We already disable interception today, so we have the
data, just like we do for SPEC_CTRL.

    if (prev != vmx->loaded_vmcs->vmcs) {
        per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
        vmcs_load(vmx->loaded_vmcs->vmcs);

        /*
         * No indirect branch prediction barrier needed when switching
         * the active VMCS within a guest, e.g. on nested VM-Enter.
         * The L1 VMM can protect itself with retpolines, IBPB or IBRS.
         * We'll only issue this IBPB if the guest itself has ever issued
         * an IBPB, which would indicate they care about prediction barriers
         * on one or more task(s) within the guest. This guards against the
         * scenario where the guest has separate security domains on separate
         * vCPUs, and the kernel switches vCPU-x out for vCPU-y on the same
         * pCPU, before the guest has the chance to issue its own barrier.
         * In this scenario, the switch_mm() -> cond_mitigation would not
         * issue its own barrier, because the vCPUs are sharing a mm_struct.
         */
        if ((!buddy || WARN_ON_ONCE(buddy->vmcs != prev)) &&
            !msr_write_intercepted(vmx, MSR_IA32_PRED_CMD))
            indirect_branch_prediction_barrier();
    }

If the guest isn’t ever issuing IBPB, one could say that they do not care
about vCPU-to-vCPU attack surface.

Thoughts?

> 
>> That said, since I just re-read the documentation today, it does specifically
>> suggest that if the guest wants to protect *itself* it should turn on IBPB or
>> STIBP (or other mitigations galore), so I think we end up having to think
>> about what our “contract” is with users who host their workloads on
>> KVM - are they expecting us to protect them in any/all cases?
>> 
>> Said another way, the internal guest areas of concern aren’t something
>> the kernel would always be able to A) identify far in advance and B)
>> always solve on the users behalf. There is an argument to be made
>> that the guest needs to deal with its own house, yea?
> 
> The issue is that the guest won't get a notification if vCPU0 is replaced with
> vCPU1 on the same physical CPU, thus the guest doesn't get an opportunity to emit
> IBPB.  Since the host doesn't know whether or not the guest wants IBPB, unless the
> owner of the host is also the owner of the guest workload, the safe approach is to
> assume the guest is vulnerable.



* Re: [PATCH v4] x86/speculation, KVM: remove IBPB on vCPU load
  2022-05-12 20:27         ` Jim Mattson
@ 2022-05-12 20:33           ` Jon Kohler
  2022-05-12 23:57             ` Jim Mattson
  0 siblings, 1 reply; 16+ messages in thread
From: Jon Kohler @ 2022-05-12 20:33 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Sean Christopherson, Jon Kohler, Jonathan Corbet, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Joerg Roedel, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, X86 ML,
	H. Peter Anvin, Kees Cook, Andrea Arcangeli, Josh Poimboeuf,
	Kim Phillips, Lukas Bulwahn, Peter Zijlstra, Ashok Raj,
	KarimAllah Ahmed, David Woodhouse, linux-doc, LKML,
	kvm @ vger . kernel . org, Waiman Long



> On May 12, 2022, at 4:27 PM, Jim Mattson <jmattson@google.com> wrote:
> 
> On Thu, May 12, 2022 at 1:07 PM Sean Christopherson <seanjc@google.com> wrote:
>> 
>> On Thu, May 12, 2022, Jon Kohler wrote:
>>> 
>>> 
>>>> On May 12, 2022, at 3:35 PM, Sean Christopherson <seanjc@google.com> wrote:
>>>> 
>>>> On Thu, May 12, 2022, Sean Christopherson wrote:
>>>>> On Thu, May 12, 2022, Jon Kohler wrote:
>>>>>> Remove IBPB that is done on KVM vCPU load, as the guest-to-guest
>>>>>> attack surface is already covered by switch_mm_irqs_off() ->
>>>>>> cond_mitigation().
>>>>>> 
>>>>>> The original commit 15d45071523d ("KVM/x86: Add IBPB support") was simply
>>>>>> wrong in its guest-to-guest design intention. There are three scenarios
>>>>>> at play here:
>>>>> 
>>>>> Jim pointed offline that there's a case we didn't consider.  When switching between
>>>>> vCPUs in the same VM, an IBPB may be warranted as the tasks in the VM may be in
>>>>> different security domains.  E.g. the guest will not get a notification that vCPU0 is
>>>>> being swapped out for vCPU1 on a single pCPU.
>>>>> 
>>>>> So, sadly, after all that, I think the IBPB needs to stay.  But the documentation
>>>>> most definitely needs to be updated.
>>>>> 
>>>>> A per-VM capability to skip the IBPB may be warranted, e.g. for container-like
>>>>> use cases where a single VM is running a single workload.
>>>> 
>>>> Ah, actually, the IBPB can be skipped if the vCPUs have different mm_structs,
>>>> because then the IBPB is fully redundant with respect to any IBPB performed by
>>>> switch_mm_irqs_off().  Hrm, though it might need a KVM or per-VM knob, e.g. just
>>>> because the VMM doesn't want IBPB doesn't mean the guest doesn't want IBPB.
>>>> 
>>>> That would also sidestep the largely theoretical question of whether vCPUs from
>>>> different VMs but the same address space are in the same security domain.  It doesn't
>>>> matter, because even if they are in the same domain, KVM still needs to do IBPB.
>>> 
>>> So should we go back to the earlier approach where we have it be only
>>> IBPB on always_ibpb? Or what?
>>> 
>>> At minimum, we need to fix the unilateral-ness of all of this :) since we’re
>>> IBPB’ing even when the user did not explicitly tell us to.
>> 
>> I think we need separate controls for the guest.  E.g. if the userspace VMM is
>> sufficiently hardened then it can run without "do IBPB" flag, but that doesn't
>> mean that the entire guest it's running is sufficiently hardened.
>> 
>>> That said, since I just re-read the documentation today, it does specifically
>>> suggest that if the guest wants to protect *itself* it should turn on IBPB or
>>> STIBP (or other mitigations galore), so I think we end up having to think
>>> about what our “contract” is with users who host their workloads on
>>> KVM - are they expecting us to protect them in any/all cases?
>>> 
>>> Said another way, the internal guest areas of concern aren’t something
>>> the kernel would always be able to A) identify far in advance and B)
>>> always solve on the users behalf. There is an argument to be made
>>> that the guest needs to deal with its own house, yea?
>> 
>> The issue is that the guest won't get a notification if vCPU0 is replaced with
>> vCPU1 on the same physical CPU, thus the guest doesn't get an opportunity to emit
>> IBPB.  Since the host doesn't know whether or not the guest wants IBPB, unless the
>> owner of the host is also the owner of the guest workload, the safe approach is to
>> assume the guest is vulnerable.
> 
> Exactly. And if the guest has used taskset as its mitigation strategy,
> how is the host to know?

Yea thats fair enough. I posed a solution on Sean’s response just as this email
came in, would love to know your thoughts (keying off MSR bitmap).



* Re: [PATCH v4] x86/speculation, KVM: remove IBPB on vCPU load
  2022-05-12 20:33           ` Jon Kohler
@ 2022-05-12 23:57             ` Jim Mattson
  2022-05-13  0:50               ` Jon Kohler
  0 siblings, 1 reply; 16+ messages in thread
From: Jim Mattson @ 2022-05-12 23:57 UTC (permalink / raw)
  To: Jon Kohler
  Cc: Sean Christopherson, Jonathan Corbet, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Joerg Roedel, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, X86 ML,
	H. Peter Anvin, Kees Cook, Andrea Arcangeli, Josh Poimboeuf,
	Kim Phillips, Lukas Bulwahn, Peter Zijlstra, Ashok Raj,
	KarimAllah Ahmed, David Woodhouse, linux-doc, LKML,
	kvm @ vger . kernel . org, Waiman Long

On Thu, May 12, 2022 at 1:34 PM Jon Kohler <jon@nutanix.com> wrote:
>
>
>
> > On May 12, 2022, at 4:27 PM, Jim Mattson <jmattson@google.com> wrote:
> >
> > On Thu, May 12, 2022 at 1:07 PM Sean Christopherson <seanjc@google.com> wrote:
> >>
> >> On Thu, May 12, 2022, Jon Kohler wrote:
> >>>
> >>>
> >>>> On May 12, 2022, at 3:35 PM, Sean Christopherson <seanjc@google.com> wrote:
> >>>>
> >>>> On Thu, May 12, 2022, Sean Christopherson wrote:
> >>>>> On Thu, May 12, 2022, Jon Kohler wrote:
> >>>>>> Remove IBPB that is done on KVM vCPU load, as the guest-to-guest
> >>>>>> attack surface is already covered by switch_mm_irqs_off() ->
> >>>>>> cond_mitigation().
> >>>>>>
> >>>>>> The original commit 15d45071523d ("KVM/x86: Add IBPB support") was simply
> >>>>>> wrong in its guest-to-guest design intention. There are three scenarios
> >>>>>> at play here:
> >>>>>
> >>>>> Jim pointed offline that there's a case we didn't consider.  When switching between
> >>>>> vCPUs in the same VM, an IBPB may be warranted as the tasks in the VM may be in
> >>>>> different security domains.  E.g. the guest will not get a notification that vCPU0 is
> >>>>> being swapped out for vCPU1 on a single pCPU.
> >>>>>
> >>>>> So, sadly, after all that, I think the IBPB needs to stay.  But the documentation
> >>>>> most definitely needs to be updated.
> >>>>>
> >>>>> A per-VM capability to skip the IBPB may be warranted, e.g. for container-like
> >>>>> use cases where a single VM is running a single workload.
> >>>>
> >>>> Ah, actually, the IBPB can be skipped if the vCPUs have different mm_structs,
> >>>> because then the IBPB is fully redundant with respect to any IBPB performed by
> >>>> switch_mm_irqs_off().  Hrm, though it might need a KVM or per-VM knob, e.g. just
> >>>> because the VMM doesn't want IBPB doesn't mean the guest doesn't want IBPB.
> >>>>
> >>>> That would also sidestep the largely theoretical question of whether vCPUs from
> >>>> different VMs but the same address space are in the same security domain.  It doesn't
> >>>> matter, because even if they are in the same domain, KVM still needs to do IBPB.
> >>>
> >>> So should we go back to the earlier approach where we have it be only
> >>> IBPB on always_ibpb? Or what?
> >>>
> >>> At minimum, we need to fix the unilateral-ness of all of this :) since we’re
> >>> IBPB’ing even when the user did not explicitly tell us to.
> >>
> >> I think we need separate controls for the guest.  E.g. if the userspace VMM is
> >> sufficiently hardened then it can run without "do IBPB" flag, but that doesn't
> >> mean that the entire guest it's running is sufficiently hardened.
> >>
> >>> That said, since I just re-read the documentation today, it does specifically
> >>> suggest that if the guest wants to protect *itself* it should turn on IBPB or
> >>> STIBP (or other mitigations galore), so I think we end up having to think
> >>> about what our “contract” is with users who host their workloads on
> >>> KVM - are they expecting us to protect them in any/all cases?
> >>>
> >>> Said another way, the internal guest areas of concern aren’t something
> >>> the kernel would always be able to A) identify far in advance and B)
> >>> always solve on the users behalf. There is an argument to be made
> >>> that the guest needs to deal with its own house, yea?
> >>
> >> The issue is that the guest won't get a notification if vCPU0 is replaced with
> >> vCPU1 on the same physical CPU, thus the guest doesn't get an opportunity to emit
> >> IBPB.  Since the host doesn't know whether or not the guest wants IBPB, unless the
> >> owner of the host is also the owner of the guest workload, the safe approach is to
> >> assume the guest is vulnerable.
> >
> > Exactly. And if the guest has used taskset as its mitigation strategy,
> > how is the host to know?
>
> Yea thats fair enough. I posed a solution on Sean’s response just as this email
> came in, would love to know your thoughts (keying off MSR bitmap).
>

I don't believe this works. The IBPBs in cond_mitigation (static in
arch/x86/mm/tlb.c) won't be triggered if the guest has given its
sensitive tasks exclusive use of their cores. And, if performance is a
concern, that is exactly what someone would do.


* Re: [PATCH v4] x86/speculation, KVM: remove IBPB on vCPU load
  2022-05-12 23:57             ` Jim Mattson
@ 2022-05-13  0:50               ` Jon Kohler
  2022-05-13  3:06                 ` Jim Mattson
  0 siblings, 1 reply; 16+ messages in thread
From: Jon Kohler @ 2022-05-13  0:50 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Jon Kohler, Sean Christopherson, Jonathan Corbet, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Joerg Roedel, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, X86 ML,
	H. Peter Anvin, Kees Cook, Andrea Arcangeli, Josh Poimboeuf,
	Kim Phillips, Lukas Bulwahn, Peter Zijlstra, Ashok Raj,
	KarimAllah Ahmed, David Woodhouse, linux-doc, LKML,
	kvm @ vger . kernel . org, Waiman Long



> On May 12, 2022, at 7:57 PM, Jim Mattson <jmattson@google.com> wrote:
> 
> On Thu, May 12, 2022 at 1:34 PM Jon Kohler <jon@nutanix.com> wrote:
>> 
>> 
>> 
>>> On May 12, 2022, at 4:27 PM, Jim Mattson <jmattson@google.com> wrote:
>>> 
>>> On Thu, May 12, 2022 at 1:07 PM Sean Christopherson <seanjc@google.com> wrote:
>>>> 
>>>> On Thu, May 12, 2022, Jon Kohler wrote:
>>>>> 
>>>>> 
>>>>>> On May 12, 2022, at 3:35 PM, Sean Christopherson <seanjc@google.com> wrote:
>>>>>> 
>>>>>> On Thu, May 12, 2022, Sean Christopherson wrote:
>>>>>>> On Thu, May 12, 2022, Jon Kohler wrote:
>>>>>>>> Remove IBPB that is done on KVM vCPU load, as the guest-to-guest
>>>>>>>> attack surface is already covered by switch_mm_irqs_off() ->
>>>>>>>> cond_mitigation().
>>>>>>>> 
>>>>>>>> The original commit 15d45071523d ("KVM/x86: Add IBPB support") was simply
>>>>>>>> wrong in its guest-to-guest design intention. There are three scenarios
>>>>>>>> at play here:
>>>>>>> 
>>>>>>> Jim pointed offline that there's a case we didn't consider.  When switching between
>>>>>>> vCPUs in the same VM, an IBPB may be warranted as the tasks in the VM may be in
>>>>>>> different security domains.  E.g. the guest will not get a notification that vCPU0 is
>>>>>>> being swapped out for vCPU1 on a single pCPU.
>>>>>>> 
>>>>>>> So, sadly, after all that, I think the IBPB needs to stay.  But the documentation
>>>>>>> most definitely needs to be updated.
>>>>>>> 
>>>>>>> A per-VM capability to skip the IBPB may be warranted, e.g. for container-like
>>>>>>> use cases where a single VM is running a single workload.
>>>>>> 
>>>>>> Ah, actually, the IBPB can be skipped if the vCPUs have different mm_structs,
>>>>>> because then the IBPB is fully redundant with respect to any IBPB performed by
>>>>>> switch_mm_irqs_off().  Hrm, though it might need a KVM or per-VM knob, e.g. just
>>>>>> because the VMM doesn't want IBPB doesn't mean the guest doesn't want IBPB.
>>>>>> 
>>>>>> That would also sidestep the largely theoretical question of whether vCPUs from
>>>>>> different VMs but the same address space are in the same security domain.  It doesn't
>>>>>> matter, because even if they are in the same domain, KVM still needs to do IBPB.
>>>>> 
>>>>> So should we go back to the earlier approach where we have it be only
>>>>> IBPB on always_ibpb? Or what?
>>>>> 
>>>>> At minimum, we need to fix the unilateral-ness of all of this :) since we’re
>>>>> IBPB’ing even when the user did not explicitly tell us to.
>>>> 
>>>> I think we need separate controls for the guest.  E.g. if the userspace VMM is
>>>> sufficiently hardened then it can run without "do IBPB" flag, but that doesn't
>>>> mean that the entire guest it's running is sufficiently hardened.
>>>> 
>>>>> That said, since I just re-read the documentation today, it does specifically
>>>>> suggest that if the guest wants to protect *itself* it should turn on IBPB or
>>>>> STIBP (or other mitigations galore), so I think we end up having to think
>>>>> about what our “contract” is with users who host their workloads on
>>>>> KVM - are they expecting us to protect them in any/all cases?
>>>>> 
>>>>> Said another way, the internal guest areas of concern aren’t something
>>>>> the kernel would always be able to A) identify far in advance and B)
>>>>> always solve on the users behalf. There is an argument to be made
>>>>> that the guest needs to deal with its own house, yea?
>>>> 
>>>> The issue is that the guest won't get a notification if vCPU0 is replaced with
>>>> vCPU1 on the same physical CPU, thus the guest doesn't get an opportunity to emit
> >>>> IBPB.  Since the host doesn't know whether or not the guest wants IBPB, unless the
>>>> owner of the host is also the owner of the guest workload, the safe approach is to
>>>> assume the guest is vulnerable.
>>> 
>>> Exactly. And if the guest has used taskset as its mitigation strategy,
>>> how is the host to know?
>> 
>> Yea thats fair enough. I posed a solution on Sean’s response just as this email
>> came in, would love to know your thoughts (keying off MSR bitmap).
>> 
> 
> I don't believe this works. The IBPBs in cond_mitigation (static in
> arch/x86/mm/tlb.c) won't be triggered if the guest has given its
> sensitive tasks exclusive use of their cores. And, if performance is a
> concern, that is exactly what someone would do.

I’m talking about within the guest itself, not the host level cond_mitigation.

The proposed idea here is to look at the MSR bitmap, which is only
changed once the guest has issued an IBPB on that vCPU at least once
in its lifetime, on the assumption that a security-minded workload
would indeed configure IBPB.

Even with taskset, one would think that a security-minded user would
also set up IBPB to protect itself within the guest, which is exactly
what the Linux admin guide suggests that they do (in spectre.rst).

Taking a step back and going back to the ground floor:
What would be (or should be) the expectation from the guest in
this example for their own security configuration?

i.e. they are using taskset to assign security domain 0 to vcpu 0 and
security domain 1 to vcpu 1. Would we expect them to always set up
cond_ibpb (and prctl/seccomp) to protect against other casual user
space threads that just might so happen to get scheduled into vcpu0/1?
Or are we expecting them to configure always_ibpb?

In such a security minded scenario, what would be the expectation
of host configuration?

If nothing else, I’d want to make sure we get the docs correct :)

You mentioned if someone was concerned about performance, are you
saying they also critically care about performance, such that they are
willing to *not* use IBPB at all, and instead just use taskset and hope
nothing ever gets scheduled on there, and then hope that the hypervisor
does the job for them?

Would this be the expectation of just KVM? Or all hypervisors on the
market?

Again not trying to be a hard head, just trying to wrap my own head
around all of this. I appreciate the time from both you and Sean! 

Cheers,
Jon



* Re: [PATCH v4] x86/speculation, KVM: remove IBPB on vCPU load
  2022-05-13  0:50               ` Jon Kohler
@ 2022-05-13  3:06                 ` Jim Mattson
  2022-05-13  3:19                   ` Jon Kohler
  0 siblings, 1 reply; 16+ messages in thread
From: Jim Mattson @ 2022-05-13  3:06 UTC (permalink / raw)
  To: Jon Kohler
  Cc: Sean Christopherson, Jonathan Corbet, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Joerg Roedel, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, X86 ML,
	H. Peter Anvin, Kees Cook, Andrea Arcangeli, Josh Poimboeuf,
	Kim Phillips, Lukas Bulwahn, Peter Zijlstra, Ashok Raj,
	KarimAllah Ahmed, David Woodhouse, linux-doc, LKML,
	kvm @ vger . kernel . org, Waiman Long

On Thu, May 12, 2022 at 5:50 PM Jon Kohler <jon@nutanix.com> wrote:

> You mentioned if someone was concerned about performance, are you
> saying they also critically care about performance, such that they are
> willing to *not* use IBPB at all, and instead just use taskset and hope
> nothing ever gets scheduled on there, and then hope that the hypervisor
> does the job for them?

I am saying that IBPB is not the only viable mitigation for
cross-process indirect branch steering. Proper scheduling can also
solve the problem, without the overhead of IBPB. Say that you have two
security domains: trusted and untrusted. If you have a two-socket
system, and you always run trusted workloads on socket#0 and untrusted
workloads on socket#1, IBPB is completely superfluous. However, if the
hypervisor chooses to schedule a vCPU thread from virtual socket#0
after a vCPU thread from virtual socket#1 on the same logical
processor, then it *must* execute an IBPB between those two vCPU
threads. Otherwise, it has introduced a non-architectural
vulnerability that the guest can't possibly be aware of.

If you can't trust your OS to schedule tasks where you tell it to
schedule them, can you really trust it to provide you with any kind of
inter-process security?
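
In host terms, that obligation reduces to something like the following
pseudo-logic (illustrative only; last_vcpu is a hypothetical per-pCPU
variable, and today's KVM code approximates this by keying the barrier
off a VMCS/VMCB change):

    /*
     * If the host breaks the guest's core isolation by running a
     * different vCPU on this logical CPU, the host must emit the
     * barrier itself; the guest cannot observe the switch and issue
     * its own IBPB.
     */
    if (per_cpu(last_vcpu, cpu) && per_cpu(last_vcpu, cpu) != vcpu)
            indirect_branch_prediction_barrier();
    per_cpu(last_vcpu, cpu) = vcpu;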

> Would this be the expectation of just KVM? Or all hypervisors on the
> market?

Any hypervisor that doesn't do this is broken, but that won't keep it
off the market. :-)


* Re: [PATCH v4] x86/speculation, KVM: remove IBPB on vCPU load
  2022-05-13  3:06                 ` Jim Mattson
@ 2022-05-13  3:19                   ` Jon Kohler
  2022-05-13  3:50                     ` Jim Mattson
  0 siblings, 1 reply; 16+ messages in thread
From: Jon Kohler @ 2022-05-13  3:19 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Jon Kohler, Sean Christopherson, Jonathan Corbet, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Joerg Roedel, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, X86 ML,
	H. Peter Anvin, Kees Cook, Andrea Arcangeli, Josh Poimboeuf,
	Kim Phillips, Lukas Bulwahn, Peter Zijlstra, Ashok Raj,
	KarimAllah Ahmed, David Woodhouse, linux-doc, LKML,
	kvm @ vger . kernel . org, Waiman Long



> On May 12, 2022, at 11:06 PM, Jim Mattson <jmattson@google.com> wrote:
> 
> On Thu, May 12, 2022 at 5:50 PM Jon Kohler <jon@nutanix.com> wrote:
> 
>> You mentioned if someone was concerned about performance, are you
>> saying they also critically care about performance, such that they are
>> willing to *not* use IBPB at all, and instead just use taskset and hope
>> nothing ever gets scheduled on there, and then hope that the hypervisor
>> does the job for them?
> 
> I am saying that IBPB is not the only viable mitigation for
> cross-process indirect branch steering. Proper scheduling can also
> solve the problem, without the overhead of IBPB. Say that you have two
> security domains: trusted and untrusted. If you have a two-socket
> system, and you always run trusted workloads on socket#0 and untrusted
> workloads on socket#1, IBPB is completely superfluous. However, if the
> hypervisor chooses to schedule a vCPU thread from virtual socket#0
> after a vCPU thread from virtual socket#1 on the same logical
> processor, then it *must* execute an IBPB between those two vCPU
> threads. Otherwise, it has introduced a non-architectural
> vulnerability that the guest can't possibly be aware of.
> 
> If you can't trust your OS to schedule tasks where you tell it to
> schedule them, can you really trust it to provide you with any kind of
> inter-process security?

Fair enough, so going forward:
Should this be mandatory in all cases? How this whole effort came
about was that a user could configure their KVM host with conditional
IBPB, but this particular mitigation is now always on no matter what.

In our previous patch review threads, Sean and I mostly settled on making
this particular avenue active only when a user configures always_ibpb, such
that cases like the one you describe (and others like it that come up in
the future) can be covered easily, but for cond_ibpb, we can document
that it doesn’t cover this case. 

Would that be acceptable here?

> 
>> Would this be the expectation of just KVM? Or all hypervisors on the
>> market?
> 
> Any hypervisor that doesn't do this is broken, but that won't keep it
> off the market. :-)

Very true :)



* Re: [PATCH v4] x86/speculation, KVM: remove IBPB on vCPU load
  2022-05-13  3:19                   ` Jon Kohler
@ 2022-05-13  3:50                     ` Jim Mattson
  2022-05-13 15:21                       ` Jon Kohler
  0 siblings, 1 reply; 16+ messages in thread
From: Jim Mattson @ 2022-05-13  3:50 UTC (permalink / raw)
  To: Jon Kohler
  Cc: Sean Christopherson, Jonathan Corbet, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Joerg Roedel, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, X86 ML,
	H. Peter Anvin, Kees Cook, Andrea Arcangeli, Josh Poimboeuf,
	Kim Phillips, Lukas Bulwahn, Peter Zijlstra, Ashok Raj,
	KarimAllah Ahmed, David Woodhouse, linux-doc, LKML,
	kvm @ vger . kernel . org, Waiman Long

On Thu, May 12, 2022 at 8:19 PM Jon Kohler <jon@nutanix.com> wrote:
>
>
>
> > On May 12, 2022, at 11:06 PM, Jim Mattson <jmattson@google.com> wrote:
> >
> > On Thu, May 12, 2022 at 5:50 PM Jon Kohler <jon@nutanix.com> wrote:
> >
> >> You mentioned if someone was concerned about performance, are you
> >> saying they also critically care about performance, such that they are
> >> willing to *not* use IBPB at all, and instead just use taskset and hope
> >> nothing ever gets scheduled on there, and then hope that the hypervisor
> >> does the job for them?
> >
> > I am saying that IBPB is not the only viable mitigation for
> > cross-process indirect branch steering. Proper scheduling can also
> > solve the problem, without the overhead of IBPB. Say that you have two
> > security domains: trusted and untrusted. If you have a two-socket
> > system, and you always run trusted workloads on socket#0 and untrusted
> > workloads on socket#1, IBPB is completely superfluous. However, if the
> > hypervisor chooses to schedule a vCPU thread from virtual socket#0
> > after a vCPU thread from virtual socket#1 on the same logical
> > processor, then it *must* execute an IBPB between those two vCPU
> > threads. Otherwise, it has introduced a non-architectural
> > vulnerability that the guest can't possibly be aware of.
> >
> > If you can't trust your OS to schedule tasks where you tell it to
> > schedule them, can you really trust it to provide you with any kind of
> > inter-process security?
>
> Fair enough, so going forward:
> Should this be mandatory in all cases? How this whole effort came about
> was that a user could configure their KVM host with conditional IBPB,
> yet this particular mitigation is currently always on no matter what.
>
> In our previous patch review threads, Sean and I mostly settled on making
> this particular barrier active only when a user configures always_ibpb, so
> that cases like the one you describe (and others like it that come up in
> the future) are covered easily, while for cond_ibpb we can document that
> it doesn’t cover this case.
>
> Would that be acceptable here?

That would make me unhappy. We use cond_ibpb, and I don't want to
switch to always_ibpb, yet I do want this barrier.
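
To spell out why: with cond_ibpb, the mm-switch path only issues the
barrier for tasks that are marked for IBPB (via prctl, or seccomp
depending on the mode), and a VMM may well not be. Very roughly, the
decision in cond_mitigation() is shaped like this (heavily simplified
sketch, not the literal code; the parameters here stand in for the
per-cpu state the real code keeps):

static void cond_mitigation_sketch(bool mm_changed, bool prev_opted_in,
                                   bool next_opted_in)
{
        if (static_branch_likely(&switch_mm_cond_ibpb)) {
                /* cond_ibpb: barrier only if either task opted in */
                if (mm_changed && (prev_opted_in || next_opted_in))
                        indirect_branch_prediction_barrier();
        }

        if (static_branch_unlikely(&switch_mm_always_ibpb)) {
                /* always_ibpb: barrier on every mm switch */
                if (mm_changed)
                        indirect_branch_prediction_barrier();
        }
}

A VMM that never opts in gets no barrier at all from cond_ibpb, which is
exactly the gap the vCPU-load IBPB happens to cover.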

> >
> >> Would this be the expectation of just KVM? Or all hypervisors on the
> >> market?
> >
> > Any hypervisor that doesn't do this is broken, but that won't keep it
> > off the market. :-)
>
> Very true :)
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4] x86/speculation, KVM: remove IBPB on vCPU load
  2022-05-13  3:50                     ` Jim Mattson
@ 2022-05-13 15:21                       ` Jon Kohler
  2022-05-13 19:36                         ` Jim Mattson
  0 siblings, 1 reply; 16+ messages in thread
From: Jon Kohler @ 2022-05-13 15:21 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Jon Kohler, Sean Christopherson, Jonathan Corbet, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Joerg Roedel, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, X86 ML,
	H. Peter Anvin, Kees Cook, Andrea Arcangeli, Josh Poimboeuf,
	Kim Phillips, Lukas Bulwahn, Peter Zijlstra, Ashok Raj,
	KarimAllah Ahmed, David Woodhouse, linux-doc, LKML,
	kvm @ vger . kernel . org, Waiman Long



> On May 12, 2022, at 11:50 PM, Jim Mattson <jmattson@google.com> wrote:
> 
> On Thu, May 12, 2022 at 8:19 PM Jon Kohler <jon@nutanix.com> wrote:
>> 
>> 
>> 
>>> On May 12, 2022, at 11:06 PM, Jim Mattson <jmattson@google.com> wrote:
>>> 
>>> On Thu, May 12, 2022 at 5:50 PM Jon Kohler <jon@nutanix.com> wrote:
>>> 
>>>> You mentioned if someone was concerned about performance, are you
>>>> saying they also critically care about performance, such that they are
>>>> willing to *not* use IBPB at all, and instead just use taskset and hope
>>>> nothing ever gets scheduled on there, and then hope that the hypervisor
>>>> does the job for them?
>>> 
>>> I am saying that IBPB is not the only viable mitigation for
>>> cross-process indirect branch steering. Proper scheduling can also
>>> solve the problem, without the overhead of IBPB. Say that you have two
>>> security domains: trusted and untrusted. If you have a two-socket
>>> system, and you always run trusted workloads on socket#0 and untrusted
>>> workloads on socket#1, IBPB is completely superfluous. However, if the
>>> hypervisor chooses to schedule a vCPU thread from virtual socket#0
>>> after a vCPU thread from virtual socket#1 on the same logical
>>> processor, then it *must* execute an IBPB between those two vCPU
>>> threads. Otherwise, it has introduced a non-architectural
>>> vulnerability that the guest can't possibly be aware of.
>>> 
>>> If you can't trust your OS to schedule tasks where you tell it to
>>> schedule them, can you really trust it to provide you with any kind of
>>> inter-process security?
>> 
>> Fair enough, so going forward:
>> Should this be mandatory in all cases? How this whole effort came about
>> was that a user could configure their KVM host with conditional IBPB,
>> yet this particular mitigation is currently always on no matter what.
>> 
>> In our previous patch review threads, Sean and I mostly settled on making
>> this particular barrier active only when a user configures always_ibpb, so
>> that cases like the one you describe (and others like it that come up in
>> the future) are covered easily, while for cond_ibpb we can document that
>> it doesn’t cover this case.
>> 
>> Would that be acceptable here?
> 
> That would make me unhappy. We use cond_ibpb, and I don't want to
> switch to always_ibpb, yet I do want this barrier.

Ok, gotcha. That is a good point for cloud providers, since the
workloads there are especially opaque.

How about this: I could work up a v5 patch where this is, at minimum,
a system-level knob (similar to other mitigation knobs) and documented
in more detail. That way folks who want more control here can get it
without recompiling the kernel. Such a “knob” would be on by default,
so that there is no functional regression here.
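
Roughly, something like the sketch below is what I have in mind (the
parameter name is a placeholder, not final):

/* Sketch: module parameter gating the vCPU-load IBPB, default on. */
static bool vcpu_load_ibpb = true;
module_param(vcpu_load_ibpb, bool, 0444);
MODULE_PARM_DESC(vcpu_load_ibpb,
                 "Issue IBPB when loading a vCPU (default: on)");

/* at the existing barrier call site in the vCPU load path: */
if (vcpu_load_ibpb)
        indirect_branch_prediction_barrier();

With mode 0644 instead of 0444 it could even be toggled at runtime under
/sys/module/kvm*/parameters/, if that is considered safe enough.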

Would that be ok with you as a middle ground?

Thanks again, 
Jon

> 
>>> 
>>>> Would this be the expectation of just KVM? Or all hypervisors on the
>>>> market?
>>> 
>>> Any hypervisor that doesn't do this is broken, but that won't keep it
>>> off the market. :-)
>> 
>> Very true :)
>> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4] x86/speculation, KVM: remove IBPB on vCPU load
  2022-05-13 15:21                       ` Jon Kohler
@ 2022-05-13 19:36                         ` Jim Mattson
  0 siblings, 0 replies; 16+ messages in thread
From: Jim Mattson @ 2022-05-13 19:36 UTC (permalink / raw)
  To: Jon Kohler
  Cc: Sean Christopherson, Jonathan Corbet, Paolo Bonzini,
	Vitaly Kuznetsov, Wanpeng Li, Joerg Roedel, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, X86 ML,
	H. Peter Anvin, Kees Cook, Andrea Arcangeli, Josh Poimboeuf,
	Kim Phillips, Lukas Bulwahn, Peter Zijlstra, Ashok Raj,
	KarimAllah Ahmed, David Woodhouse, linux-doc, LKML,
	kvm @ vger . kernel . org, Waiman Long

On Fri, May 13, 2022 at 8:21 AM Jon Kohler <jon@nutanix.com> wrote:
>
>
>
> > On May 12, 2022, at 11:50 PM, Jim Mattson <jmattson@google.com> wrote:
> >
> > On Thu, May 12, 2022 at 8:19 PM Jon Kohler <jon@nutanix.com> wrote:
> >>
> >>
> >>
> >>> On May 12, 2022, at 11:06 PM, Jim Mattson <jmattson@google.com> wrote:
> >>>
> >>> On Thu, May 12, 2022 at 5:50 PM Jon Kohler <jon@nutanix.com> wrote:
> >>>
> >>>> You mentioned if someone was concerned about performance, are you
> >>>> saying they also critically care about performance, such that they are
> >>>> willing to *not* use IBPB at all, and instead just use taskset and hope
> >>>> nothing ever gets scheduled on there, and then hope that the hypervisor
> >>>> does the job for them?
> >>>
> >>> I am saying that IBPB is not the only viable mitigation for
> >>> cross-process indirect branch steering. Proper scheduling can also
> >>> solve the problem, without the overhead of IBPB. Say that you have two
> >>> security domains: trusted and untrusted. If you have a two-socket
> >>> system, and you always run trusted workloads on socket#0 and untrusted
> >>> workloads on socket#1, IBPB is completely superfluous. However, if the
> >>> hypervisor chooses to schedule a vCPU thread from virtual socket#0
> >>> after a vCPU thread from virtual socket#1 on the same logical
> >>> processor, then it *must* execute an IBPB between those two vCPU
> >>> threads. Otherwise, it has introduced a non-architectural
> >>> vulnerability that the guest can't possibly be aware of.
> >>>
> >>> If you can't trust your OS to schedule tasks where you tell it to
> >>> schedule them, can you really trust it to provide you with any kind of
> >>> inter-process security?
> >>
> >> Fair enough, so going forward:
> >> Should this be mandatory in all cases? How this whole effort came about
> >> was that a user could configure their KVM host with conditional IBPB,
> >> yet this particular mitigation is currently always on no matter what.
> >>
> >> In our previous patch review threads, Sean and I mostly settled on making
> >> this particular barrier active only when a user configures always_ibpb, so
> >> that cases like the one you describe (and others like it that come up in
> >> the future) are covered easily, while for cond_ibpb we can document that
> >> it doesn’t cover this case.
> >>
> >> Would that be acceptable here?
> >
> > That would make me unhappy. We use cond_ibpb, and I don't want to
> > switch to always_ibpb, yet I do want this barrier.
>
> Ok, gotcha. That is a good point for cloud providers, since the
> workloads there are especially opaque.
>
> How about this: I could work up a v5 patch where this is, at minimum,
> a system-level knob (similar to other mitigation knobs) and documented
> in more detail. That way folks who want more control here can get it
> without recompiling the kernel. Such a “knob” would be on by default,
> so that there is no functional regression here.
>
> Would that be ok with you as a middle ground?

That would be great. Module parameter or sysctl is fine with me.
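
For what it is worth, the sysctl variant could be about this small
(names invented for illustration, assuming <linux/sysctl.h>):

static int sysctl_kvm_vcpu_load_ibpb = 1;

static struct ctl_table kvm_ibpb_table[] = {
        {
                .procname     = "kvm_vcpu_load_ibpb",
                .data         = &sysctl_kvm_vcpu_load_ibpb,
                .maxlen       = sizeof(int),
                .mode         = 0644,
                .proc_handler = proc_dointvec_minmax,
                .extra1       = SYSCTL_ZERO,
                .extra2       = SYSCTL_ONE,
        },
        { }
};

/* registered once at init, e.g.: register_sysctl("kernel", kvm_ibpb_table); */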

Thanks!

> Thanks again,
> Jon
>
> >
> >>>
> >>>> Would this be the expectation of just KVM? Or all hypervisors on the
> >>>> market?
> >>>
> >>> Any hypervisor that doesn't do this is broken, but that won't keep it
> >>> off the market. :-)
> >>
> >> Very true :)
> >>
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2022-05-13 19:37 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-12 18:45 [PATCH v4] x86/speculation, KVM: remove IBPB on vCPU load Jon Kohler
2022-05-12 19:27 ` Sean Christopherson
2022-05-12 19:35   ` Sean Christopherson
2022-05-12 19:51     ` Jon Kohler
2022-05-12 20:06       ` Jim Mattson
2022-05-12 20:07       ` Sean Christopherson
2022-05-12 20:27         ` Jim Mattson
2022-05-12 20:33           ` Jon Kohler
2022-05-12 23:57             ` Jim Mattson
2022-05-13  0:50               ` Jon Kohler
2022-05-13  3:06                 ` Jim Mattson
2022-05-13  3:19                   ` Jon Kohler
2022-05-13  3:50                     ` Jim Mattson
2022-05-13 15:21                       ` Jon Kohler
2022-05-13 19:36                         ` Jim Mattson
2022-05-12 20:31         ` Jon Kohler
