* [PATCH 00/14] nSVM fixes and optional features
@ 2021-09-14 15:48 Maxim Levitsky
  2021-09-14 15:48 ` [PATCH 01/14] KVM: x86: nSVM: restore int_vector in svm_clear_vintr Maxim Levitsky
                   ` (14 more replies)
  0 siblings, 15 replies; 25+ messages in thread
From: Maxim Levitsky @ 2021-09-14 15:48 UTC (permalink / raw)
  To: kvm
  Cc: Vitaly Kuznetsov, Paolo Bonzini, Borislav Petkov, Bandan Das,
	open list, Joerg Roedel, Ingo Molnar, Wei Huang,
	Sean Christopherson, open list:KERNEL SELFTEST FRAMEWORK,
	Maxim Levitsky, H. Peter Anvin, Jim Mattson, Thomas Gleixner,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Shuah Khan, Wanpeng Li

These are a few patches I have been working on lately, all somewhat
related to the two CVEs that I found recently.

The first 7 patches fix various minor bugs related to these CVEs.

The rest of the patches implement various optional SVM features,
some of which the guest could enable anyway due to incorrect
checking of the virt_ext field.

The last patch is somewhat of an RFC; I would like to hear your
opinion on it.

I also implemented nested TSC scaling while at it.

As for other optional SVM features, here is my summary of a few
features I took a look at:

X86_FEATURE_DECODEASSISTS:
   This feature should make it easier for L1 to emulate an
   instruction on MMIO access, by not needing to read guest memory
   but rather using the instruction bytes that the CPU already
   fetched.

   The challenge in implementing this is that we sometimes inject
   #PF and #NPF synthetically, and in those cases we must be sure
   we report the correct instruction bytes (a sketch follows at the
   end of this summary).

   Also, this feature adds assists for MOV CR/DR, INTn, and INVLPG,
   which aren't that interesting but must be supported as well to
   expose this feature to the nested guest.

X86_FEATURE_VGIF
   Might allow L2 to run L3 a bit faster, but given the crazy
   complex logic we already have around int_ctl and vgif, it is
   probably not worth it.

X86_FEATURE_VMCBCLEAN
   Should just be enabled, because otherwise L1 doesn't even attempt
   to set the clean bits. But first we need to know whether we can
   take advantage of these bits.

X86_FEATURE_FLUSHBYASID
X86_FEATURE_AVIC
   These two features would be very good to enable, but doing so
   requires lots of work; it will be done eventually.

There are a few more nested SVM features that I haven't yet had a
chance to take a look at.
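
For reference, here is a minimal, hypothetical sketch of how L0 could
forward the CPU-fetched instruction bytes to L1 on a reflected #NPF if
we implemented DECODEASSISTS. The insn_len/insn_bytes fields follow
struct vmcb_control_area; the helper itself is made up and untested:

static void sketch_forward_insn_bytes(struct vcpu_svm *svm,
				      struct vmcb *vmcb12)
{
	struct vmcb_control_area *c = &svm->vmcb->control;

	if (c->insn_len) {
		/* The CPU really fetched these bytes; pass them on. */
		vmcb12->control.insn_len = c->insn_len;
		memcpy(vmcb12->control.insn_bytes, c->insn_bytes,
		       c->insn_len);
	} else {
		/*
		 * Synthetically injected #PF/#NPF: no valid bytes to
		 * report, which is exactly the hard part noted above.
		 */
		vmcb12->control.insn_len = 0;
	}
}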

Best regards,
	Maxim Levitsky

Maxim Levitsky (14):
  KVM: x86: nSVM: restore int_vector in svm_clear_vintr
  KVM: x86: selftests: test simultaneous uses of V_IRQ from L1 and L0
  KVM: x86: nSVM: test eax for 4K alignment for GP errata workaround
  KVM: x86: nSVM: don't copy pause related settings
  KVM: x86: nSVM: don't copy virt_ext from vmcb12
  KVM: x86: SVM: don't set VMLOAD/VMSAVE intercepts on vCPU reset
  KVM: x86: SVM: add warning for CVE-2021-3656
  KVM: x86: SVM: add module param to control LBR virtualization
  KVM: x86: nSVM: correctly virtualize LBR msrs when L2 is running
  KVM: x86: nSVM: implement nested LBR virtualization
  KVM: x86: nSVM: implement nested VMLOAD/VMSAVE
  KVM: x86: SVM: add module param to control TSC scaling
  KVM: x86: nSVM: implement nested TSC scaling
  KVM: x86: nSVM: support PAUSE filter threshold and count

 arch/x86/kvm/svm/nested.c                     | 105 +++++++--
 arch/x86/kvm/svm/svm.c                        | 218 +++++++++++++++---
 arch/x86/kvm/svm/svm.h                        |  20 +-
 arch/x86/kvm/vmx/vmx.c                        |   1 +
 arch/x86/kvm/x86.c                            |   1 +
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/x86_64/svm_int_ctl_test.c   | 128 ++++++++++
 8 files changed, 427 insertions(+), 48 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/x86_64/svm_int_ctl_test.c

-- 
2.26.3




* [PATCH 01/14] KVM: x86: nSVM: restore int_vector in svm_clear_vintr
  2021-09-14 15:48 [PATCH 00/14] nSVM fixes and optional features Maxim Levitsky
@ 2021-09-14 15:48 ` Maxim Levitsky
  2021-09-23 14:04   ` Paolo Bonzini
  2021-09-14 15:48 ` [PATCH 02/14] KVM: x86: selftests: test simultaneous uses of V_IRQ from L1 and L0 Maxim Levitsky
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 25+ messages in thread
From: Maxim Levitsky @ 2021-09-14 15:48 UTC (permalink / raw)
  To: kvm
  Cc: Vitaly Kuznetsov, Paolo Bonzini, Borislav Petkov, Bandan Das,
	open list, Joerg Roedel, Ingo Molnar, Wei Huang,
	Sean Christopherson, open list:KERNEL SELFTEST FRAMEWORK,
	Maxim Levitsky, H. Peter Anvin, Jim Mattson, Thomas Gleixner,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Shuah Khan, Wanpeng Li

In svm_clear_vintr we try to restore the virtual interrupt
injection that might be pending, but we fail to restore
the interrupt vector.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/svm.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 05e8d4d27969..b2e710a3fff6 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1566,6 +1566,8 @@ static void svm_clear_vintr(struct vcpu_svm *svm)
 
 		svm->vmcb->control.int_ctl |= svm->nested.ctl.int_ctl &
 			V_IRQ_INJECTION_BITS_MASK;
+
+		svm->vmcb->control.int_vector = svm->nested.ctl.int_vector;
 	}
 
 	vmcb_mark_dirty(svm->vmcb, VMCB_INTR);
-- 
2.26.3



* [PATCH 02/14] KVM: x86: selftests: test simultaneous uses of V_IRQ from L1 and L0
  2021-09-14 15:48 [PATCH 00/14] nSVM fixes and optional features Maxim Levitsky
  2021-09-14 15:48 ` [PATCH 01/14] KVM: x86: nSVM: restore int_vector in svm_clear_vintr Maxim Levitsky
@ 2021-09-14 15:48 ` Maxim Levitsky
  2021-09-23 14:05   ` Paolo Bonzini
  2021-09-14 15:48 ` [PATCH 03/14] KVM: x86: nSVM: test eax for 4K alignment for GP errata workaround Maxim Levitsky
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 25+ messages in thread
From: Maxim Levitsky @ 2021-09-14 15:48 UTC (permalink / raw)
  To: kvm
  Cc: Vitaly Kuznetsov, Paolo Bonzini, Borislav Petkov, Bandan Das,
	open list, Joerg Roedel, Ingo Molnar, Wei Huang,
	Sean Christopherson, open list:KERNEL SELFTEST FRAMEWORK,
	Maxim Levitsky, H. Peter Anvin, Jim Mattson, Thomas Gleixner,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Shuah Khan, Wanpeng Li

Test that if:

* L1 disables virtual interrupt masking and the INTR intercept.

* L1 sets up a virtual interrupt to be injected into L2 and enters L2
  with interrupts disabled, so the virtual interrupt is pending.

* Now an external interrupt arrives in L1 and, since L1 doesn't
  intercept it, it should be delivered to L2 when L2 enables
  interrupts.

  To do this, L0 (ab)uses V_IRQ to set up an interrupt window, and
  returns to L2.

* L2 enables interrupts.
  This should trigger the interrupt window, injection of the external
  interrupt, and delivery of the virtual interrupt that can now be
  done.

* Test that L2 now receives both interrupts.

This test demonstrates the issue that was fixed in the previous
patch.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/x86_64/svm_int_ctl_test.c   | 128 ++++++++++++++++++
 3 files changed, 130 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86_64/svm_int_ctl_test.c

diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
index 36896d251977..eb98958b15e4 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -23,6 +23,7 @@
 /x86_64/smm_test
 /x86_64/state_test
 /x86_64/svm_vmcall_test
+/x86_64/svm_int_ctl_test
 /x86_64/sync_regs_test
 /x86_64/tsc_msrs_test
 /x86_64/userspace_msr_exit_test
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index c103873531e0..3b8b143daecc 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -56,6 +56,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/smm_test
 TEST_GEN_PROGS_x86_64 += x86_64/state_test
 TEST_GEN_PROGS_x86_64 += x86_64/vmx_preemption_timer_test
 TEST_GEN_PROGS_x86_64 += x86_64/svm_vmcall_test
+TEST_GEN_PROGS_x86_64 += x86_64/svm_int_ctl_test
 TEST_GEN_PROGS_x86_64 += x86_64/sync_regs_test
 TEST_GEN_PROGS_x86_64 += x86_64/userspace_msr_exit_test
 TEST_GEN_PROGS_x86_64 += x86_64/vmx_apic_access_test
diff --git a/tools/testing/selftests/kvm/x86_64/svm_int_ctl_test.c b/tools/testing/selftests/kvm/x86_64/svm_int_ctl_test.c
new file mode 100644
index 000000000000..df04f56ce859
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/svm_int_ctl_test.c
@@ -0,0 +1,128 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * svm_int_ctl_test
+ *
+ * Copyright (C) 2021, Red Hat, Inc.
+ *
+ * Nested SVM testing: test simultaneous use of V_IRQ from L1 and L0.
+ */
+
+#include "test_util.h"
+#include "kvm_util.h"
+#include "processor.h"
+#include "svm_util.h"
+#include "apic.h"
+
+#define VCPU_ID		0
+
+static struct kvm_vm *vm;
+
+bool vintr_irq_called;
+bool intr_irq_called;
+
+#define VINTR_IRQ_NUMBER 0x20
+#define INTR_IRQ_NUMBER 0x30
+
+static void vintr_irq_handler(struct ex_regs *regs)
+{
+	vintr_irq_called = true;
+}
+
+static void intr_irq_handler(struct ex_regs *regs)
+{
+	x2apic_write_reg(APIC_EOI, 0x00);
+	intr_irq_called = true;
+}
+
+static void l2_guest_code(struct svm_test_data *svm)
+{
+	/* This code raises interrupt INTR_IRQ_NUMBER in L1's LAPIC,
+	 * and since L1 didn't enable virtual interrupt masking,
+	 * L2 should receive it and not L1.
+	 *
+	 * L2 also has virtual interrupt 'VINTR_IRQ_NUMBER' pending in V_IRQ
+	 * so it should also receive it after the following 'sti'.
+	 */
+	x2apic_write_reg(APIC_ICR,
+		APIC_DEST_SELF | APIC_INT_ASSERT | INTR_IRQ_NUMBER);
+
+	__asm__ __volatile__(
+		"sti\n"
+		"nop\n"
+	);
+
+	GUEST_ASSERT(vintr_irq_called);
+	GUEST_ASSERT(intr_irq_called);
+
+	__asm__ __volatile__(
+		"vmcall\n"
+	);
+}
+
+static void l1_guest_code(struct svm_test_data *svm)
+{
+	#define L2_GUEST_STACK_SIZE 64
+	unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
+	struct vmcb *vmcb = svm->vmcb;
+
+	x2apic_enable();
+
+	/* Prepare for L2 execution. */
+	generic_svm_setup(svm, l2_guest_code,
+			  &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+
+	/* No virtual interrupt masking */
+	vmcb->control.int_ctl &= ~V_INTR_MASKING_MASK;
+
+	/* No intercepts for real and virtual interrupts */
+	vmcb->control.intercept &= ~((1ULL << INTERCEPT_INTR) | (1ULL << INTERCEPT_VINTR));
+
+	/* Make a virtual interrupt VINTR_IRQ_NUMBER pending */
+	vmcb->control.int_ctl |= V_IRQ_MASK | (0x1 << V_INTR_PRIO_SHIFT);
+	vmcb->control.int_vector = VINTR_IRQ_NUMBER;
+
+	run_guest(vmcb, svm->vmcb_gpa);
+	GUEST_ASSERT(vmcb->control.exit_code == SVM_EXIT_VMMCALL);
+	GUEST_DONE();
+}
+
+int main(int argc, char *argv[])
+{
+	vm_vaddr_t svm_gva;
+
+	nested_svm_check_supported();
+
+	vm = vm_create_default(VCPU_ID, 0, (void *) l1_guest_code);
+
+	vm_init_descriptor_tables(vm);
+	vcpu_init_descriptor_tables(vm, VCPU_ID);
+
+	vm_install_exception_handler(vm, VINTR_IRQ_NUMBER, vintr_irq_handler);
+	vm_install_exception_handler(vm, INTR_IRQ_NUMBER, intr_irq_handler);
+
+	vcpu_alloc_svm(vm, &svm_gva);
+	vcpu_args_set(vm, VCPU_ID, 1, svm_gva);
+
+	struct kvm_run *run = vcpu_state(vm, VCPU_ID);
+	struct ucall uc;
+
+	vcpu_run(vm, VCPU_ID);
+	TEST_ASSERT(run->exit_reason == KVM_EXIT_IO,
+		    "Got exit_reason other than KVM_EXIT_IO: %u (%s)\n",
+		    run->exit_reason,
+		    exit_reason_str(run->exit_reason));
+
+	switch (get_ucall(vm, VCPU_ID, &uc)) {
+	case UCALL_ABORT:
+		TEST_FAIL("%s", (const char *)uc.args[0]);
+		break;
+		/* NOT REACHED */
+	case UCALL_DONE:
+		goto done;
+	default:
+		TEST_FAIL("Unknown ucall 0x%lx.", uc.cmd);
+	}
+done:
+	kvm_vm_free(vm);
+	return 0;
+}
-- 
2.26.3



* [PATCH 03/14] KVM: x86: nSVM: test eax for 4K alignment for GP errata workaround
  2021-09-14 15:48 [PATCH 00/14] nSVM fixes and optional features Maxim Levitsky
  2021-09-14 15:48 ` [PATCH 01/14] KVM: x86: nSVM: restore int_vector in svm_clear_vintr Maxim Levitsky
  2021-09-14 15:48 ` [PATCH 02/14] KVM: x86: selftests: test simultaneous uses of V_IRQ from L1 and L0 Maxim Levitsky
@ 2021-09-14 15:48 ` Maxim Levitsky
  2021-09-23 14:05   ` Paolo Bonzini
  2021-09-14 15:48 ` [PATCH 04/14] KVM: x86: nSVM: don't copy pause related settings Maxim Levitsky
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 25+ messages in thread
From: Maxim Levitsky @ 2021-09-14 15:48 UTC (permalink / raw)
  To: kvm
  Cc: Vitaly Kuznetsov, Paolo Bonzini, Borislav Petkov, Bandan Das,
	open list, Joerg Roedel, Ingo Molnar, Wei Huang,
	Sean Christopherson, open list:KERNEL SELFTEST FRAMEWORK,
	Maxim Levitsky, H. Peter Anvin, Jim Mattson, Thomas Gleixner,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Shuah Khan, Wanpeng Li

The SVM #GP errata workaround made the #GP handler always emulate
the SVM instructions.

However, these instructions #GP when the operand is not 4K aligned,
but the workaround code didn't check this, so we ended up
emulating these instructions anyway.

This is only an emulation-accuracy bug, as there is no harm in
KVM reading/writing unaligned vmcb images.

Fixes: 82a11e9c6fa2 ("KVM: SVM: Add emulation support for #GP triggered by SVM instructions")

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/svm.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index b2e710a3fff6..6645542df9bd 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2224,6 +2224,10 @@ static int gp_interception(struct kvm_vcpu *vcpu)
 	if (error_code)
 		goto reinject;
 
+	/* All SVM instructions expect page aligned RAX */
+	if (svm->vmcb->save.rax & ~PAGE_MASK)
+		goto reinject;
+
 	/* Decode the instruction for usage later */
 	if (x86_decode_emulated_instruction(vcpu, 0, NULL, 0) != EMULATION_OK)
 		goto reinject;
-- 
2.26.3



* [PATCH 04/14] KVM: x86: nSVM: don't copy pause related settings
  2021-09-14 15:48 [PATCH 00/14] nSVM fixes and optional features Maxim Levitsky
                   ` (2 preceding siblings ...)
  2021-09-14 15:48 ` [PATCH 03/14] KVM: x86: nSVM: test eax for 4K alignment for GP errata workaround Maxim Levitsky
@ 2021-09-14 15:48 ` Maxim Levitsky
  2021-09-14 15:48 ` [PATCH 05/14] KVM: x86: nSVM: don't copy virt_ext from vmcb12 Maxim Levitsky
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: Maxim Levitsky @ 2021-09-14 15:48 UTC (permalink / raw)
  To: kvm
  Cc: Vitaly Kuznetsov, Paolo Bonzini, Borislav Petkov, Bandan Das,
	open list, Joerg Roedel, Ingo Molnar, Wei Huang,
	Sean Christopherson, open list:KERNEL SELFTEST FRAMEWORK,
	Maxim Levitsky, H. Peter Anvin, Jim Mattson, Thomas Gleixner,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Shuah Khan, Wanpeng Li

According to the APM, the CPU never modifies these settings.
It loads them on VM entry and updates an internal copy instead.

Also, don't load them from vmcb12, as we don't expose these
features to the nested guest yet.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/nested.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 2545d0c61985..476e01f98035 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -551,9 +551,6 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm)
 	svm->vmcb->control.event_inj           = svm->nested.ctl.event_inj;
 	svm->vmcb->control.event_inj_err       = svm->nested.ctl.event_inj_err;
 
-	svm->vmcb->control.pause_filter_count  = svm->nested.ctl.pause_filter_count;
-	svm->vmcb->control.pause_filter_thresh = svm->nested.ctl.pause_filter_thresh;
-
 	nested_svm_transition_tlb_flush(vcpu);
 
 	/* Enter Guest-Mode */
@@ -808,11 +805,6 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 	vmcb12->control.event_inj         = svm->nested.ctl.event_inj;
 	vmcb12->control.event_inj_err     = svm->nested.ctl.event_inj_err;
 
-	vmcb12->control.pause_filter_count =
-		svm->vmcb->control.pause_filter_count;
-	vmcb12->control.pause_filter_thresh =
-		svm->vmcb->control.pause_filter_thresh;
-
 	nested_svm_copy_common_state(svm->nested.vmcb02.ptr, svm->vmcb01.ptr);
 
 	svm_switch_vmcb(svm, &svm->vmcb01);
-- 
2.26.3



* [PATCH 05/14] KVM: x86: nSVM: don't copy virt_ext from vmcb12
  2021-09-14 15:48 [PATCH 00/14] nSVM fixes and optional features Maxim Levitsky
                   ` (3 preceding siblings ...)
  2021-09-14 15:48 ` [PATCH 04/14] KVM: x86: nSVM: don't copy pause related settings Maxim Levitsky
@ 2021-09-14 15:48 ` Maxim Levitsky
  2021-09-23 14:06   ` Paolo Bonzini
  2021-09-14 15:48 ` [PATCH 06/14] KVM: x86: SVM: don't set VMLOAD/VMSAVE intercepts on vCPU reset Maxim Levitsky
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 25+ messages in thread
From: Maxim Levitsky @ 2021-09-14 15:48 UTC (permalink / raw)
  To: kvm
  Cc: Vitaly Kuznetsov, Paolo Bonzini, Borislav Petkov, Bandan Das,
	open list, Joerg Roedel, Ingo Molnar, Wei Huang,
	Sean Christopherson, open list:KERNEL SELFTEST FRAMEWORK,
	Maxim Levitsky, H. Peter Anvin, Jim Mattson, Thomas Gleixner,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Shuah Khan, Wanpeng Li

These fields correspond to features that we don't expose yet to L2.

While currently there are no CVE-worthy features in this field,
if AMD adds more features to it, that could allow guest
escapes similar to CVE-2021-3653 and CVE-2021-3656.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/nested.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 476e01f98035..4df59d9795b6 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -545,7 +545,6 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm)
 		(svm->nested.ctl.int_ctl & int_ctl_vmcb12_bits) |
 		(svm->vmcb01.ptr->control.int_ctl & int_ctl_vmcb01_bits);
 
-	svm->vmcb->control.virt_ext            = svm->nested.ctl.virt_ext;
 	svm->vmcb->control.int_vector          = svm->nested.ctl.int_vector;
 	svm->vmcb->control.int_state           = svm->nested.ctl.int_state;
 	svm->vmcb->control.event_inj           = svm->nested.ctl.event_inj;
-- 
2.26.3



* [PATCH 06/14] KVM: x86: SVM: don't set VMLOAD/VMSAVE intercepts on vCPU reset
  2021-09-14 15:48 [PATCH 00/14] nSVM fixes and optional features Maxim Levitsky
                   ` (4 preceding siblings ...)
  2021-09-14 15:48 ` [PATCH 05/14] KVM: x86: nSVM: don't copy virt_ext from vmcb12 Maxim Levitsky
@ 2021-09-14 15:48 ` Maxim Levitsky
  2021-09-23 16:40   ` Paolo Bonzini
  2021-09-14 15:48 ` [PATCH 07/14] KVM: x86: SVM: add warning for CVE-2021-3656 Maxim Levitsky
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 25+ messages in thread
From: Maxim Levitsky @ 2021-09-14 15:48 UTC (permalink / raw)
  To: kvm
  Cc: Vitaly Kuznetsov, Paolo Bonzini, Borislav Petkov, Bandan Das,
	open list, Joerg Roedel, Ingo Molnar, Wei Huang,
	Sean Christopherson, open list:KERNEL SELFTEST FRAMEWORK,
	Maxim Levitsky, H. Peter Anvin, Jim Mattson, Thomas Gleixner,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Shuah Khan, Wanpeng Li

Commit adc2a23734ac ("KVM: nSVM: improve SYSENTER emulation on AMD")
made init_vmcb set the VMLOAD/VMSAVE intercepts unconditionally,
and relied on svm_vcpu_after_set_cpuid to clear them when possible.

However, init_vmcb is also called when the vCPU is reset, and in that
case it is not followed by another call to svm_vcpu_after_set_cpuid,
because the CPUID is already set.

This mistake leaves the VMLOAD/VMSAVE intercepts set when they are
not needed, and harms performance of the nested guest.

Fixes: adc2a23734ac ("KVM: nSVM: improve SYSENTER emulation on AMD")

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/svm.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 6645542df9bd..861ac9f74331 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1199,8 +1199,6 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
 	svm_set_intercept(svm, INTERCEPT_SHUTDOWN);
 	svm_set_intercept(svm, INTERCEPT_VMRUN);
 	svm_set_intercept(svm, INTERCEPT_VMMCALL);
-	svm_set_intercept(svm, INTERCEPT_VMLOAD);
-	svm_set_intercept(svm, INTERCEPT_VMSAVE);
 	svm_set_intercept(svm, INTERCEPT_STGI);
 	svm_set_intercept(svm, INTERCEPT_CLGI);
 	svm_set_intercept(svm, INTERCEPT_SKINIT);
@@ -1377,6 +1375,10 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
 	svm->guest_state_loaded = false;
 
 	svm_switch_vmcb(svm, &svm->vmcb01);
+
+	svm_set_intercept(svm, INTERCEPT_VMLOAD);
+	svm_set_intercept(svm, INTERCEPT_VMSAVE);
+
 	init_vmcb(vcpu);
 
 	svm_vcpu_init_msrpm(vcpu, svm->msrpm);
-- 
2.26.3



* [PATCH 07/14] KVM: x86: SVM: add warning for CVE-2021-3656
  2021-09-14 15:48 [PATCH 00/14] nSVM fixes and optional features Maxim Levitsky
                   ` (5 preceding siblings ...)
  2021-09-14 15:48 ` [PATCH 06/14] KVM: x86: SVM: don't set VMLOAD/VMSAVE intercepts on vCPU reset Maxim Levitsky
@ 2021-09-14 15:48 ` Maxim Levitsky
  2021-09-23 16:44   ` Paolo Bonzini
  2021-10-11 17:30   ` Xiaoyao Li
  2021-09-14 15:48 ` [PATCH 08/14] KVM: x86: SVM: add module param to control LBR virtualization Maxim Levitsky
                   ` (7 subsequent siblings)
  14 siblings, 2 replies; 25+ messages in thread
From: Maxim Levitsky @ 2021-09-14 15:48 UTC (permalink / raw)
  To: kvm
  Cc: Vitaly Kuznetsov, Paolo Bonzini, Borislav Petkov, Bandan Das,
	open list, Joerg Roedel, Ingo Molnar, Wei Huang,
	Sean Christopherson, open list:KERNEL SELFTEST FRAMEWORK,
	Maxim Levitsky, H. Peter Anvin, Jim Mattson, Thomas Gleixner,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Shuah Khan, Wanpeng Li

Just in case, add a warning ensuring that on guest entry, either
both the VMLOAD and VMSAVE intercepts are enabled, or virtual
VMLOAD/VMSAVE is enabled.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/svm.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 861ac9f74331..deeebd05f682 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3784,6 +3784,12 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
 
 	WARN_ON_ONCE(kvm_apicv_activated(vcpu->kvm) != kvm_vcpu_apicv_active(vcpu));
 
+	/* Check that CVE-2021-3656 can't happen again */
+	if (!svm_is_intercept(svm, INTERCEPT_VMLOAD) ||
+	    !svm_is_intercept(svm, INTERCEPT_VMSAVE))
+		WARN_ON(!(svm->vmcb->control.virt_ext &
+			  VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK));
+
 	sync_lapic_to_cr8(vcpu);
 
 	if (unlikely(svm->asid != svm->vmcb->control.asid)) {
-- 
2.26.3



* [PATCH 08/14] KVM: x86: SVM: add module param to control LBR virtualization
  2021-09-14 15:48 [PATCH 00/14] nSVM fixes and optional features Maxim Levitsky
                   ` (6 preceding siblings ...)
  2021-09-14 15:48 ` [PATCH 07/14] KVM: x86: SVM: add warning for CVE-2021-3656 Maxim Levitsky
@ 2021-09-14 15:48 ` Maxim Levitsky
  2021-09-14 15:48 ` [PATCH 09/14] KVM: x86: nSVM: correctly virtualize LBR msrs when L2 is running Maxim Levitsky
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: Maxim Levitsky @ 2021-09-14 15:48 UTC (permalink / raw)
  To: kvm
  Cc: Vitaly Kuznetsov, Paolo Bonzini, Borislav Petkov, Bandan Das,
	open list, Joerg Roedel, Ingo Molnar, Wei Huang,
	Sean Christopherson, open list:KERNEL SELFTEST FRAMEWORK,
	Maxim Levitsky, H. Peter Anvin, Jim Mattson, Thomas Gleixner,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Shuah Khan, Wanpeng Li

This is useful for debugging (e.g. loading kvm-amd with lbrv=0) and
also makes it consistent with the rest of the optional SVM features.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/svm.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index deeebd05f682..b08c5d826208 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -186,6 +186,11 @@ module_param(vls, int, 0444);
 static int vgif = true;
 module_param(vgif, int, 0444);
 
+/* enable/disable LBR virtualization */
+static int lbrv = true;
+module_param(lbrv, int, 0444);
+
+
 /*
  * enable / disable AVIC.  Because the defaults differ for APICv
  * support between VMX and SVM we cannot use module_param_named.
@@ -1059,6 +1064,13 @@ static __init int svm_hardware_setup(void)
 			pr_info("Virtual GIF supported\n");
 	}
 
+	if (lbrv) {
+		if (!boot_cpu_has(X86_FEATURE_LBRV))
+			lbrv = false;
+		else
+			pr_info("LBR virtualization supported\n");
+	}
+
 	svm_set_cpu_caps();
 
 	/*
@@ -2920,7 +2932,7 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 		svm->tsc_aux = data;
 		break;
 	case MSR_IA32_DEBUGCTLMSR:
-		if (!boot_cpu_has(X86_FEATURE_LBRV)) {
+		if (!lbrv) {
 			vcpu_unimpl(vcpu, "%s: MSR_IA32_DEBUGCTL 0x%llx, nop\n",
 				    __func__, data);
 			break;
-- 
2.26.3



* [PATCH 09/14] KVM: x86: nSVM: correctly virtualize LBR msrs when L2 is running
  2021-09-14 15:48 [PATCH 00/14] nSVM fixes and optional features Maxim Levitsky
                   ` (7 preceding siblings ...)
  2021-09-14 15:48 ` [PATCH 08/14] KVM: x86: SVM: add module param to control LBR virtualization Maxim Levitsky
@ 2021-09-14 15:48 ` Maxim Levitsky
  2021-09-14 15:48 ` [PATCH 10/14] KVM: x86: nSVM: implement nested LBR virtualization Maxim Levitsky
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: Maxim Levitsky @ 2021-09-14 15:48 UTC (permalink / raw)
  To: kvm
  Cc: Vitaly Kuznetsov, Paolo Bonzini, Borislav Petkov, Bandan Das,
	open list, Joerg Roedel, Ingo Molnar, Wei Huang,
	Sean Christopherson, open list:KERNEL SELFTEST FRAMEWORK,
	Maxim Levitsky, H. Peter Anvin, Jim Mattson, Thomas Gleixner,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Shuah Khan, Wanpeng Li

When L2 is running without LBR virtualization, we should ensure
that L1's LBR msrs continue to update as usual.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/nested.c | 11 +++++
 arch/x86/kvm/svm/svm.c    | 98 +++++++++++++++++++++++++++++++--------
 arch/x86/kvm/svm/svm.h    |  2 +
 3 files changed, 92 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 4df59d9795b6..c6f09b591696 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -502,6 +502,9 @@ static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12
 		svm->vcpu.arch.dr6  = vmcb12->save.dr6 | DR6_ACTIVE_LOW;
 		vmcb_mark_dirty(svm->vmcb, VMCB_DR);
 	}
+
+	if (unlikely(svm->vmcb01.ptr->control.virt_ext & LBR_CTL_ENABLE_MASK))
+		svm_copy_lbrs(svm->vmcb01.ptr, svm->vmcb);
 }
 
 static void nested_vmcb02_prepare_control(struct vcpu_svm *svm)
@@ -550,6 +553,9 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm)
 	svm->vmcb->control.event_inj           = svm->nested.ctl.event_inj;
 	svm->vmcb->control.event_inj_err       = svm->nested.ctl.event_inj_err;
 
+	svm->vmcb->control.virt_ext            = svm->vmcb01.ptr->control.virt_ext &
+						 LBR_CTL_ENABLE_MASK;
+
 	nested_svm_transition_tlb_flush(vcpu);
 
 	/* Enter Guest-Mode */
@@ -808,6 +814,11 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 
 	svm_switch_vmcb(svm, &svm->vmcb01);
 
+	if (unlikely(svm->vmcb->control.virt_ext & LBR_CTL_ENABLE_MASK)) {
+		svm_copy_lbrs(svm->nested.vmcb02.ptr, svm->vmcb);
+		svm_update_lbrv(vcpu);
+	}
+
 	/*
 	 * On vmexit the  GIF is set to false and
 	 * no event can be injected in L1.
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index b08c5d826208..981cc9765b95 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -792,6 +792,17 @@ static void init_msrpm_offsets(void)
 	}
 }
 
+void svm_copy_lbrs(struct vmcb *from_vmcb, struct vmcb *to_vmcb)
+{
+	to_vmcb->save.dbgctl		= from_vmcb->save.dbgctl;
+	to_vmcb->save.br_from		= from_vmcb->save.br_from;
+	to_vmcb->save.br_to		= from_vmcb->save.br_to;
+	to_vmcb->save.last_excp_from	= from_vmcb->save.last_excp_from;
+	to_vmcb->save.last_excp_to	= from_vmcb->save.last_excp_to;
+
+	vmcb_mark_dirty(to_vmcb, VMCB_LBR);
+}
+
 static void svm_enable_lbrv(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -801,6 +812,10 @@ static void svm_enable_lbrv(struct kvm_vcpu *vcpu)
 	set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTBRANCHTOIP, 1, 1);
 	set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTINTFROMIP, 1, 1);
 	set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTINTTOIP, 1, 1);
+
+	/* Move the LBR msrs to the vmcb02 so that the guest can see them. */
+	if (is_guest_mode(vcpu))
+		svm_copy_lbrs(svm->vmcb01.ptr, svm->vmcb);
 }
 
 static void svm_disable_lbrv(struct kvm_vcpu *vcpu)
@@ -812,6 +827,63 @@ static void svm_disable_lbrv(struct kvm_vcpu *vcpu)
 	set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTBRANCHTOIP, 0, 0);
 	set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTINTFROMIP, 0, 0);
 	set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTINTTOIP, 0, 0);
+
+	/*
+	 * Move the LBR msrs back to the vmcb01 to avoid copying them
+	 * on nested guest entries.
+	 */
+	if (is_guest_mode(vcpu))
+		svm_copy_lbrs(svm->vmcb, svm->vmcb01.ptr);
+}
+
+static u64 svm_get_lbr_msr(struct vcpu_svm *svm, u32 index)
+{
+	/*
+	 * If the LBR virtualization is disabled, the LBR msrs are always
+	 * kept in the vmcb01 to avoid copying them on nested guest entries.
+	 *
+	 * If nested, and the LBR virtualization is enabled/disabled, the msrs
+	 * are moved between the vmcb01 and vmcb02 as needed.
+	 */
+	struct vmcb *vmcb =
+		(svm->vmcb->control.virt_ext & LBR_CTL_ENABLE_MASK) ?
+			svm->vmcb : svm->vmcb01.ptr;
+
+	switch (index) {
+	case MSR_IA32_DEBUGCTLMSR:
+		return vmcb->save.dbgctl;
+	case MSR_IA32_LASTBRANCHFROMIP:
+		return vmcb->save.br_from;
+	case MSR_IA32_LASTBRANCHTOIP:
+		return vmcb->save.br_to;
+	case MSR_IA32_LASTINTFROMIP:
+		return vmcb->save.last_excp_from;
+	case MSR_IA32_LASTINTTOIP:
+		return vmcb->save.last_excp_to;
+	default:
+		KVM_BUG(false, svm->vcpu.kvm,
+			"%s: Unknown MSR 0x%x", __func__, index);
+		return 0;
+	}
+}
+
+void svm_update_lbrv(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	bool enable_lbrv = svm_get_lbr_msr(svm, MSR_IA32_DEBUGCTLMSR) &
+					   DEBUGCTLMSR_LBR;
+
+	bool current_enable_lbrv = !!(svm->vmcb->control.virt_ext &
+				      LBR_CTL_ENABLE_MASK);
+
+	if (enable_lbrv == current_enable_lbrv)
+		return;
+
+	if (enable_lbrv)
+		svm_enable_lbrv(vcpu);
+	else
+		svm_disable_lbrv(vcpu);
 }
 
 void disable_nmi_singlestep(struct vcpu_svm *svm)
@@ -2704,25 +2776,12 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_TSC_AUX:
 		msr_info->data = svm->tsc_aux;
 		break;
-	/*
-	 * Nobody will change the following 5 values in the VMCB so we can
-	 * safely return them on rdmsr. They will always be 0 until LBRV is
-	 * implemented.
-	 */
 	case MSR_IA32_DEBUGCTLMSR:
-		msr_info->data = svm->vmcb->save.dbgctl;
-		break;
 	case MSR_IA32_LASTBRANCHFROMIP:
-		msr_info->data = svm->vmcb->save.br_from;
-		break;
 	case MSR_IA32_LASTBRANCHTOIP:
-		msr_info->data = svm->vmcb->save.br_to;
-		break;
 	case MSR_IA32_LASTINTFROMIP:
-		msr_info->data = svm->vmcb->save.last_excp_from;
-		break;
 	case MSR_IA32_LASTINTTOIP:
-		msr_info->data = svm->vmcb->save.last_excp_to;
+		msr_info->data = svm_get_lbr_msr(svm, msr_info->index);
 		break;
 	case MSR_VM_HSAVE_PA:
 		msr_info->data = svm->nested.hsave_msr;
@@ -2940,12 +2999,13 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 		if (data & DEBUGCTL_RESERVED_BITS)
 			return 1;
 
-		svm->vmcb->save.dbgctl = data;
-		vmcb_mark_dirty(svm->vmcb, VMCB_LBR);
-		if (data & (1ULL<<0))
-			svm_enable_lbrv(vcpu);
+		if (svm->vmcb->control.virt_ext & LBR_CTL_ENABLE_MASK)
+			svm->vmcb->save.dbgctl = data;
 		else
-			svm_disable_lbrv(vcpu);
+			svm->vmcb01.ptr->save.dbgctl = data;
+
+		svm_update_lbrv(vcpu);
+
 		break;
 	case MSR_VM_HSAVE_PA:
 		/*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 524d943f3efc..0c351e9c4d6d 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -417,6 +417,8 @@ u32 svm_msrpm_offset(u32 msr);
 u32 *svm_vcpu_alloc_msrpm(void);
 void svm_vcpu_init_msrpm(struct kvm_vcpu *vcpu, u32 *msrpm);
 void svm_vcpu_free_msrpm(u32 *msrpm);
+void svm_copy_lbrs(struct vmcb *from_vmcb, struct vmcb *to_vmcb);
+void svm_update_lbrv(struct kvm_vcpu *vcpu);
 
 int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer);
 void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
-- 
2.26.3



* [PATCH 10/14] KVM: x86: nSVM: implement nested LBR virtualization
  2021-09-14 15:48 [PATCH 00/14] nSVM fixes and optional features Maxim Levitsky
                   ` (8 preceding siblings ...)
  2021-09-14 15:48 ` [PATCH 09/14] KVM: x86: nSVM: correctly virtualize LBR msrs when L2 is running Maxim Levitsky
@ 2021-09-14 15:48 ` Maxim Levitsky
  2021-09-14 15:48 ` [PATCH 11/14] KVM: x86: nSVM: implement nested VMLOAD/VMSAVE Maxim Levitsky
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: Maxim Levitsky @ 2021-09-14 15:48 UTC (permalink / raw)
  To: kvm
  Cc: Vitaly Kuznetsov, Paolo Bonzini, Borislav Petkov, Bandan Das,
	open list, Joerg Roedel, Ingo Molnar, Wei Huang,
	Sean Christopherson, open list:KERNEL SELFTEST FRAMEWORK,
	Maxim Levitsky, H. Peter Anvin, Jim Mattson, Thomas Gleixner,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Shuah Khan, Wanpeng Li

This was tested with a kvm-unit-test that was developed
for this purpose.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/nested.c | 21 +++++++++++++++++++--
 arch/x86/kvm/svm/svm.c    |  9 +++++++++
 arch/x86/kvm/svm/svm.h    |  1 +
 3 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index c6f09b591696..aadbff9b6514 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -503,8 +503,19 @@ static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12
 		vmcb_mark_dirty(svm->vmcb, VMCB_DR);
 	}
 
-	if (unlikely(svm->vmcb01.ptr->control.virt_ext & LBR_CTL_ENABLE_MASK))
+	if (unlikely(svm->lbrv_enabled && (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK))) {
+		svm_copy_lbrs(vmcb12, svm->vmcb);
+		/*
+		 * TODO: to avoid TOC/TOU race,
+		 * make sure that we only pick LBR enable bit from
+		 * the guest.
+		 */
+		svm->vmcb->save.dbgctl &= LBR_CTL_ENABLE_MASK;
+		svm_update_lbrv(&svm->vcpu);
+
+	} else if (unlikely(svm->vmcb01.ptr->control.virt_ext & LBR_CTL_ENABLE_MASK)) {
 		svm_copy_lbrs(svm->vmcb01.ptr, svm->vmcb);
+	}
 }
 
 static void nested_vmcb02_prepare_control(struct vcpu_svm *svm)
@@ -555,6 +566,9 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm)
 
 	svm->vmcb->control.virt_ext            = svm->vmcb01.ptr->control.virt_ext &
 						 LBR_CTL_ENABLE_MASK;
+	if (svm->lbrv_enabled)
+		svm->vmcb->control.virt_ext  |=
+			(svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK);
 
 	nested_svm_transition_tlb_flush(vcpu);
 
@@ -814,7 +828,10 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 
 	svm_switch_vmcb(svm, &svm->vmcb01);
 
-	if (unlikely(svm->vmcb->control.virt_ext & LBR_CTL_ENABLE_MASK)) {
+	if (unlikely(svm->lbrv_enabled && (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK))) {
+		svm_copy_lbrs(svm->nested.vmcb02.ptr, vmcb12);
+		svm_update_lbrv(vcpu);
+	} else if (unlikely(svm->vmcb->control.virt_ext & LBR_CTL_ENABLE_MASK)) {
 		svm_copy_lbrs(svm->nested.vmcb02.ptr, svm->vmcb);
 		svm_update_lbrv(vcpu);
 	}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 981cc9765b95..66f99e8d804c 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -877,6 +877,10 @@ void svm_update_lbrv(struct kvm_vcpu *vcpu)
 	bool current_enable_lbrv = !!(svm->vmcb->control.virt_ext &
 				      LBR_CTL_ENABLE_MASK);
 
+	if (unlikely(is_guest_mode(vcpu) && svm->lbrv_enabled))
+		if (unlikely(svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK))
+			enable_lbrv = true;
+
 	if (enable_lbrv == current_enable_lbrv)
 		return;
 
@@ -1006,6 +1010,9 @@ static __init void svm_set_cpu_caps(void)
 		if (npt_enabled)
 			kvm_cpu_cap_set(X86_FEATURE_NPT);
 
+		if (lbrv)
+			kvm_cpu_cap_set(X86_FEATURE_LBRV);
+
 		/* Nested VM can receive #VMEXIT instead of triggering #GP */
 		kvm_cpu_cap_set(X86_FEATURE_SVME_ADDR_CHK);
 	}
@@ -4081,6 +4088,8 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	svm->nrips_enabled = kvm_cpu_cap_has(X86_FEATURE_NRIPS) &&
 			     guest_cpuid_has(vcpu, X86_FEATURE_NRIPS);
 
+	svm->lbrv_enabled = lbrv && guest_cpuid_has(vcpu, X86_FEATURE_LBRV);
+
 	svm_recalc_instruction_intercepts(vcpu, svm);
 
 	/* For sev guests, the memory encryption bit is not reserved in CR3.  */
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 0c351e9c4d6d..c9a81e18707d 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -161,6 +161,7 @@ struct vcpu_svm {
 
 	/* cached guest cpuid flags for faster access */
 	bool nrips_enabled	: 1;
+	bool lbrv_enabled       : 1;
 
 	u32 ldr_reg;
 	u32 dfr_reg;
-- 
2.26.3



* [PATCH 11/14] KVM: x86: nSVM: implement nested VMLOAD/VMSAVE
  2021-09-14 15:48 [PATCH 00/14] nSVM fixes and optional features Maxim Levitsky
                   ` (9 preceding siblings ...)
  2021-09-14 15:48 ` [PATCH 10/14] KVM: x86: nSVM: implement nested LBR virtualization Maxim Levitsky
@ 2021-09-14 15:48 ` Maxim Levitsky
  2021-09-14 15:48 ` [PATCH 12/14] KVM: x86: SVM: add module param to control TSC scaling Maxim Levitsky
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: Maxim Levitsky @ 2021-09-14 15:48 UTC (permalink / raw)
  To: kvm
  Cc: Vitaly Kuznetsov, Paolo Bonzini, Borislav Petkov, Bandan Das,
	open list, Joerg Roedel, Ingo Molnar, Wei Huang,
	Sean Christopherson, open list:KERNEL SELFTEST FRAMEWORK,
	Maxim Levitsky, H. Peter Anvin, Jim Mattson, Thomas Gleixner,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Shuah Khan, Wanpeng Li

This was tested by booting L1, L2, and L3 guests (all Linux) and
checking that no VMLOAD/VMSAVE vmexits happened.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/nested.c | 35 +++++++++++++++++++++++++++++------
 arch/x86/kvm/svm/svm.c    |  7 +++++++
 arch/x86/kvm/svm/svm.h    | 12 +++++++++---
 3 files changed, 45 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index aadbff9b6514..29b5d0f85960 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -119,6 +119,20 @@ static void nested_svm_uninit_mmu_context(struct kvm_vcpu *vcpu)
 	vcpu->arch.walk_mmu = &vcpu->arch.root_mmu;
 }
 
+static bool nested_vmcb_needs_vls_intercept(struct vcpu_svm *svm)
+{
+	if (!svm->v_vmload_vmsave_enabled)
+		return true;
+
+	if (!nested_npt_enabled(svm))
+		return true;
+
+	if (!(svm->nested.ctl.virt_ext & VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK))
+		return true;
+
+	return false;
+}
+
 void recalc_intercepts(struct vcpu_svm *svm)
 {
 	struct vmcb_control_area *c, *h, *g;
@@ -159,8 +173,17 @@ void recalc_intercepts(struct vcpu_svm *svm)
 	if (!intercept_smi)
 		vmcb_clr_intercept(c, INTERCEPT_SMI);
 
-	vmcb_set_intercept(c, INTERCEPT_VMLOAD);
-	vmcb_set_intercept(c, INTERCEPT_VMSAVE);
+	if (nested_vmcb_needs_vls_intercept(svm)) {
+		/*
+		 * If the virtual VMLOAD/VMSAVE is not enabled for the L2,
+		 * we must intercept these instructions to correctly
+		 * emulate them in case L1 doesn't intercept them.
+		 */
+		vmcb_set_intercept(c, INTERCEPT_VMLOAD);
+		vmcb_set_intercept(c, INTERCEPT_VMSAVE);
+	} else {
+		WARN_ON(!(c->virt_ext & VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK));
+	}
 }
 
 static void copy_vmcb_control_area(struct vmcb_control_area *dst,
@@ -387,10 +410,7 @@ static void nested_save_pending_event_to_vmcb12(struct vcpu_svm *svm,
 	vmcb12->control.exit_int_info = exit_int_info;
 }
 
-static inline bool nested_npt_enabled(struct vcpu_svm *svm)
-{
-	return svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
-}
+
 
 static void nested_svm_transition_tlb_flush(struct kvm_vcpu *vcpu)
 {
@@ -570,6 +590,9 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm)
 		svm->vmcb->control.virt_ext  |=
 			(svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK);
 
+	if (!nested_vmcb_needs_vls_intercept(svm))
+		svm->vmcb->control.virt_ext |= VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;
+
 	nested_svm_transition_tlb_flush(vcpu);
 
 	/* Enter Guest-Mode */
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 66f99e8d804c..6504e40e0985 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1013,6 +1013,9 @@ static __init void svm_set_cpu_caps(void)
 		if (lbrv)
 			kvm_cpu_cap_set(X86_FEATURE_LBRV);
 
+		if (vls)
+			kvm_cpu_cap_set(X86_FEATURE_V_VMSAVE_VMLOAD);
+
 		/* Nested VM can receive #VMEXIT instead of triggering #GP */
 		kvm_cpu_cap_set(X86_FEATURE_SVME_ADDR_CHK);
 	}
@@ -4090,6 +4093,8 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 
 	svm->lbrv_enabled = lbrv && guest_cpuid_has(vcpu, X86_FEATURE_LBRV);
 
+	svm->v_vmload_vmsave_enabled = vls && guest_cpuid_has(vcpu, X86_FEATURE_V_VMSAVE_VMLOAD);
+
 	svm_recalc_instruction_intercepts(vcpu, svm);
 
 	/* For sev guests, the memory encryption bit is not reserved in CR3.  */
@@ -4129,6 +4134,8 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 
 		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_EIP, 0, 0);
 		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_ESP, 0, 0);
+
+		svm->v_vmload_vmsave_enabled = false;
 	} else {
 		/*
 		 * If hardware supports Virtual VMLOAD VMSAVE then enable it
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index c9a81e18707d..029340a7fbcc 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -159,9 +159,10 @@ struct vcpu_svm {
 	unsigned int3_injected;
 	unsigned long int3_rip;
 
-	/* cached guest cpuid flags for faster access */
-	bool nrips_enabled	: 1;
-	bool lbrv_enabled       : 1;
+	/* optional nested SVM features that are enabled for this guest  */
+	bool nrips_enabled                : 1;
+	bool lbrv_enabled                 : 1;
+	bool v_vmload_vmsave_enabled      : 1;
 
 	u32 ldr_reg;
 	u32 dfr_reg;
@@ -409,6 +410,11 @@ static inline bool gif_set(struct vcpu_svm *svm)
 		return !!(svm->vcpu.arch.hflags & HF_GIF_MASK);
 }
 
+static inline bool nested_npt_enabled(struct vcpu_svm *svm)
+{
+	return svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
+}
+
 /* svm.c */
 #define MSR_INVALID				0xffffffffU
 
-- 
2.26.3



* [PATCH 12/14] KVM: x86: SVM: add module param to control TSC scaling
  2021-09-14 15:48 [PATCH 00/14] nSVM fixes and optional features Maxim Levitsky
                   ` (10 preceding siblings ...)
  2021-09-14 15:48 ` [PATCH 11/14] KVM: x86: nSVM: implement nested VMLOAD/VMSAVE Maxim Levitsky
@ 2021-09-14 15:48 ` Maxim Levitsky
  2021-09-14 15:48 ` [PATCH 13/14] KVM: x86: nSVM: implement nested " Maxim Levitsky
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: Maxim Levitsky @ 2021-09-14 15:48 UTC (permalink / raw)
  To: kvm
  Cc: Vitaly Kuznetsov, Paolo Bonzini, Borislav Petkov, Bandan Das,
	open list, Joerg Roedel, Ingo Molnar, Wei Huang,
	Sean Christopherson, open list:KERNEL SELFTEST FRAMEWORK,
	Maxim Levitsky, H. Peter Anvin, Jim Mattson, Thomas Gleixner,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Shuah Khan, Wanpeng Li

This makes it easy to simulate a CPU without this feature (e.g. by
loading the kvm-amd module with tsc_scaling=0).

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/svm.c | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 6504e40e0985..001a5af842ba 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -191,6 +191,9 @@ static int lbrv = true;
 module_param(lbrv, int, 0444);
 
 
+static int tsc_scaling = true;
+module_param(tsc_scaling, int, 0444);
+
 /*
  * enable / disable AVIC.  Because the defaults differ for APICv
  * support between VMX and SVM we cannot use module_param_named.
@@ -471,7 +474,7 @@ static int has_svm(void)
 static void svm_hardware_disable(void)
 {
 	/* Make sure we clean up behind us */
-	if (static_cpu_has(X86_FEATURE_TSCRATEMSR))
+	if (tsc_scaling)
 		wrmsrl(MSR_AMD64_TSC_RATIO, TSC_RATIO_DEFAULT);
 
 	cpu_svm_disable();
@@ -514,6 +517,10 @@ static int svm_hardware_enable(void)
 	wrmsrl(MSR_VM_HSAVE_PA, __sme_page_pa(sd->save_area));
 
 	if (static_cpu_has(X86_FEATURE_TSCRATEMSR)) {
+		/*
+		 * Set the default value, even if we don't use TSC scaling,
+		 * to avoid having a stale value in the MSR.
+		 */
 		wrmsrl(MSR_AMD64_TSC_RATIO, TSC_RATIO_DEFAULT);
 		__this_cpu_write(current_tsc_ratio, TSC_RATIO_DEFAULT);
 	}
@@ -1063,10 +1070,15 @@ static __init int svm_hardware_setup(void)
 	if (boot_cpu_has(X86_FEATURE_FXSR_OPT))
 		kvm_enable_efer_bits(EFER_FFXSR);
 
-	if (boot_cpu_has(X86_FEATURE_TSCRATEMSR)) {
-		kvm_has_tsc_control = true;
-		kvm_max_tsc_scaling_ratio = TSC_RATIO_MAX;
-		kvm_tsc_scaling_ratio_frac_bits = 32;
+	if (tsc_scaling) {
+		if (!boot_cpu_has(X86_FEATURE_TSCRATEMSR)) {
+			tsc_scaling = false;
+		} else {
+			pr_info("TSC scaling supported\n");
+			kvm_has_tsc_control = true;
+			kvm_max_tsc_scaling_ratio = TSC_RATIO_MAX;
+			kvm_tsc_scaling_ratio_frac_bits = 32;
+		}
 	}
 
 	tsc_aux_uret_slot = kvm_add_user_return_msr(MSR_TSC_AUX);
@@ -1543,7 +1555,7 @@ static void svm_prepare_guest_switch(struct kvm_vcpu *vcpu)
 		vmsave(__sme_page_pa(sd->save_area));
 	}
 
-	if (static_cpu_has(X86_FEATURE_TSCRATEMSR)) {
+	if (tsc_scaling) {
 		u64 tsc_ratio = vcpu->arch.tsc_scaling_ratio;
 		if (tsc_ratio != __this_cpu_read(current_tsc_ratio)) {
 			__this_cpu_write(current_tsc_ratio, tsc_ratio);
-- 
2.26.3



* [PATCH 13/14] KVM: x86: nSVM: implement nested TSC scaling
  2021-09-14 15:48 [PATCH 00/14] nSVM fixes and optional features Maxim Levitsky
                   ` (11 preceding siblings ...)
  2021-09-14 15:48 ` [PATCH 12/14] KVM: x86: SVM: add module param to control TSC scaling Maxim Levitsky
@ 2021-09-14 15:48 ` Maxim Levitsky
  2021-09-14 15:48 ` [PATCH 14/14] KVM: x86: nSVM: support PAUSE filter threshold and count Maxim Levitsky
  2021-09-23 16:54 ` [PATCH 00/14] nSVM fixes and optional features Paolo Bonzini
  14 siblings, 0 replies; 25+ messages in thread
From: Maxim Levitsky @ 2021-09-14 15:48 UTC (permalink / raw)
  To: kvm
  Cc: Vitaly Kuznetsov, Paolo Bonzini, Borislav Petkov, Bandan Das,
	open list, Joerg Roedel, Ingo Molnar, Wei Huang,
	Sean Christopherson, open list:KERNEL SELFTEST FRAMEWORK,
	Maxim Levitsky, H. Peter Anvin, Jim Mattson, Thomas Gleixner,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Shuah Khan, Wanpeng Li

This was tested by booting a nested guest with TSC=1GHz,
observing the clocks, and doing about 100 cycles of migration.

Note that a qemu patch is needed to support migration, because
a new MSR needs to be placed in the migration state.

That patch will be sent to the qemu mailing list soon.
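
For context, a rough sketch of the fixed-point math that the generic
helpers used by this patch (kvm_calc_nested_tsc_offset /
kvm_calc_nested_tsc_multiplier) are assumed to implement; on SVM
kvm_tsc_scaling_ratio_frac_bits is 32, so the two ratios compose as
32.32 fixed point (illustrative only, not part of this patch):

static u64 nested_tsc_multiplier_sketch(u64 l1_ratio, u64 l2_ratio)
{
	/*
	 * The effective L2 ratio is the product of the L0->L1 and
	 * L1->L2 ratios; mul_u64_u64_shr() avoids overflowing the
	 * 128-bit intermediate product.
	 */
	return mul_u64_u64_shr(l1_ratio, l2_ratio, 32);
}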

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/nested.c | 29 +++++++++++++++++++++++++++--
 arch/x86/kvm/svm/svm.c    | 30 ++++++++++++++++++++++++++++--
 arch/x86/kvm/svm/svm.h    |  5 +++++
 arch/x86/kvm/vmx/vmx.c    |  1 +
 arch/x86/kvm/x86.c        |  1 +
 5 files changed, 62 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 29b5d0f85960..4c26417f36b8 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -572,8 +572,17 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm)
 	if (nested_npt_enabled(svm))
 		nested_svm_init_mmu_context(vcpu);
 
-	svm->vmcb->control.tsc_offset = vcpu->arch.tsc_offset =
-		vcpu->arch.l1_tsc_offset + svm->nested.ctl.tsc_offset;
+	vcpu->arch.tsc_offset = kvm_calc_nested_tsc_offset(
+			vcpu->arch.l1_tsc_offset,
+			svm->nested.ctl.tsc_offset,
+			svm->tsc_ratio_msr);
+
+	svm->vmcb->control.tsc_offset = vcpu->arch.tsc_offset;
+
+	if (svm->tsc_ratio_msr != kvm_default_tsc_scaling_ratio) {
+		WARN_ON(!svm->tsc_scaling_enabled);
+		nested_svm_update_tsc_ratio_msr(vcpu);
+	}
 
 	svm->vmcb->control.int_ctl             =
 		(svm->nested.ctl.int_ctl & int_ctl_vmcb12_bits) |
@@ -872,6 +881,12 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 		vmcb_mark_dirty(svm->vmcb, VMCB_INTERCEPTS);
 	}
 
+	if (svm->tsc_ratio_msr != kvm_default_tsc_scaling_ratio) {
+		WARN_ON(!svm->tsc_scaling_enabled);
+		vcpu->arch.tsc_scaling_ratio = vcpu->arch.l1_tsc_scaling_ratio;
+		svm_write_tsc_multiplier(vcpu, vcpu->arch.tsc_scaling_ratio);
+	}
+
 	svm->nested.ctl.nested_cr3 = 0;
 
 	/*
@@ -1259,6 +1274,16 @@ int nested_svm_exit_special(struct vcpu_svm *svm)
 	return NESTED_EXIT_CONTINUE;
 }
 
+void nested_svm_update_tsc_ratio_msr(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	vcpu->arch.tsc_scaling_ratio =
+		kvm_calc_nested_tsc_multiplier(vcpu->arch.l1_tsc_scaling_ratio,
+					       svm->tsc_ratio_msr);
+	svm_write_tsc_multiplier(vcpu, vcpu->arch.tsc_scaling_ratio);
+}
+
 static int svm_get_nested_state(struct kvm_vcpu *vcpu,
 				struct kvm_nested_state __user *user_kvm_nested_state,
 				u32 user_data_size)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 001a5af842ba..0b797351cfb9 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1023,6 +1023,9 @@ static __init void svm_set_cpu_caps(void)
 		if (vls)
 			kvm_cpu_cap_set(X86_FEATURE_V_VMSAVE_VMLOAD);
 
+		if (tsc_scaling)
+			kvm_cpu_cap_set(X86_FEATURE_TSCRATEMSR);
+
 		/* Nested VM can receive #VMEXIT instead of triggering #GP */
 		kvm_cpu_cap_set(X86_FEATURE_SVME_ADDR_CHK);
 	}
@@ -1215,7 +1218,9 @@ static u64 svm_get_l2_tsc_offset(struct kvm_vcpu *vcpu)
 
 static u64 svm_get_l2_tsc_multiplier(struct kvm_vcpu *vcpu)
 {
-	return kvm_default_tsc_scaling_ratio;
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	return svm->tsc_ratio_msr;
 }
 
 static void svm_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
@@ -1227,7 +1232,7 @@ static void svm_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
 	vmcb_mark_dirty(svm->vmcb, VMCB_INTERCEPTS);
 }
 
-static void svm_write_tsc_multiplier(struct kvm_vcpu *vcpu, u64 multiplier)
+void svm_write_tsc_multiplier(struct kvm_vcpu *vcpu, u64 multiplier)
 {
 	wrmsrl(MSR_AMD64_TSC_RATIO, multiplier);
 }
@@ -1405,6 +1410,7 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
 
 	enable_gif(svm);
 
+	svm->tsc_ratio_msr = kvm_default_tsc_scaling_ratio;
 }
 
 static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
@@ -2765,6 +2771,11 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	struct vcpu_svm *svm = to_svm(vcpu);
 
 	switch (msr_info->index) {
+	case MSR_AMD64_TSC_RATIO:
+		if (!msr_info->host_initiated && !svm->tsc_scaling_enabled)
+			return 1;
+		msr_info->data = svm->tsc_ratio_msr;
+		break;
 	case MSR_STAR:
 		msr_info->data = svm->vmcb01.ptr->save.star;
 		break;
@@ -2901,6 +2912,19 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 	u32 ecx = msr->index;
 	u64 data = msr->data;
 	switch (ecx) {
+	case MSR_AMD64_TSC_RATIO:
+		if (!msr->host_initiated && !svm->tsc_scaling_enabled)
+			return 1;
+
+		if (data & TSC_RATIO_RSVD)
+			return 1;
+
+		svm->tsc_ratio_msr = data;
+
+		if (svm->tsc_scaling_enabled && is_guest_mode(vcpu))
+			nested_svm_update_tsc_ratio_msr(vcpu);
+
+		break;
 	case MSR_IA32_CR_PAT:
 		if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))
 			return 1;
@@ -4107,6 +4131,8 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 
 	svm->v_vmload_vmsave_enabled = vls && guest_cpuid_has(vcpu, X86_FEATURE_V_VMSAVE_VMLOAD);
 
+	svm->tsc_scaling_enabled = tsc_scaling && guest_cpuid_has(vcpu, X86_FEATURE_TSCRATEMSR);
+
 	svm_recalc_instruction_intercepts(vcpu, svm);
 
 	/* For sev guests, the memory encryption bit is not reserved in CR3.  */
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 029340a7fbcc..c8ea3b14da73 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -140,6 +140,8 @@ struct vcpu_svm {
 	u64 next_rip;
 
 	u64 spec_ctrl;
+
+	u64 tsc_ratio_msr;
 	/*
 	 * Contains guest-controlled bits of VIRT_SPEC_CTRL, which will be
 	 * translated into the appropriate L2_CFG bits on the host to
@@ -163,6 +165,7 @@ struct vcpu_svm {
 	bool nrips_enabled                : 1;
 	bool lbrv_enabled                 : 1;
 	bool v_vmload_vmsave_enabled      : 1;
+	bool tsc_scaling_enabled          : 1;
 
 	u32 ldr_reg;
 	u32 dfr_reg;
@@ -491,6 +494,8 @@ int nested_svm_check_permissions(struct kvm_vcpu *vcpu);
 int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr,
 			       bool has_error_code, u32 error_code);
 int nested_svm_exit_special(struct vcpu_svm *svm);
+void nested_svm_update_tsc_ratio_msr(struct kvm_vcpu *vcpu);
+void svm_write_tsc_multiplier(struct kvm_vcpu *vcpu, u64 multiplier);
 void nested_load_control_from_vmcb12(struct vcpu_svm *svm,
 				     struct vmcb_control_area *control);
 void nested_sync_control_from_vmcb02(struct vcpu_svm *svm);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index fada1055f325..e1d8f0df8172 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6404,6 +6404,7 @@ static bool vmx_has_emulated_msr(struct kvm *kvm, u32 index)
 	case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
 		return nested;
 	case MSR_AMD64_VIRT_SPEC_CTRL:
+	case MSR_AMD64_TSC_RATIO:
 		/* This is AMD only.  */
 		return false;
 	default:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 86539c1686fa..1b7881c7a516 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1374,6 +1374,7 @@ static const u32 emulated_msrs_all[] = {
 	MSR_PLATFORM_INFO,
 	MSR_MISC_FEATURES_ENABLES,
 	MSR_AMD64_VIRT_SPEC_CTRL,
+	MSR_AMD64_TSC_RATIO,
 	MSR_IA32_POWER_CTL,
 	MSR_IA32_UCODE_REV,
 
-- 
2.26.3



* [PATCH 14/14] KVM: x86: nSVM: support PAUSE filter threshold and count
  2021-09-14 15:48 [PATCH 00/14] nSVM fixes and optional features Maxim Levitsky
                   ` (12 preceding siblings ...)
  2021-09-14 15:48 ` [PATCH 13/14] KVM: x86: nSVM: implement nested " Maxim Levitsky
@ 2021-09-14 15:48 ` Maxim Levitsky
  2021-09-23 16:54 ` [PATCH 00/14] nSVM fixes and optional features Paolo Bonzini
  14 siblings, 0 replies; 25+ messages in thread
From: Maxim Levitsky @ 2021-09-14 15:48 UTC (permalink / raw)
  To: kvm
  Cc: Vitaly Kuznetsov, Paolo Bonzini, Borislav Petkov, Bandan Das,
	open list, Joerg Roedel, Ingo Molnar, Wei Huang,
	Sean Christopherson, open list:KERNEL SELFTEST FRAMEWORK,
	Maxim Levitsky, H. Peter Anvin, Jim Mattson, Thomas Gleixner,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Shuah Khan, Wanpeng Li

Allow L1 to use these settings if L0 disables PAUSE interception
(a.k.a. cpu_pm=on).
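
For reference, a simplified model of what these two knobs do in
hardware, per my reading of the APM (illustrative pseudo-code only,
not KVM code; the struct and helper are made up):

struct pause_filter_model {
	u64 last_pause_tsc;
	u16 remaining;
};

static bool pause_causes_vmexit(struct pause_filter_model *s, u64 now,
				u16 filter_count, u16 filter_thresh)
{
	/*
	 * If the previous PAUSE was more than filter_thresh cycles
	 * ago, the PAUSEs are not considered part of one spin loop,
	 * and the internal counter is re-armed.
	 */
	if (filter_thresh && now - s->last_pause_tsc > filter_thresh)
		s->remaining = filter_count;

	s->last_pause_tsc = now;

	/* Each PAUSE decrements the counter; zero triggers a #VMEXIT. */
	if (s->remaining == 0) {
		s->remaining = filter_count;
		return true;
	}
	s->remaining--;
	return false;
}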

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/nested.c |  6 ++++++
 arch/x86/kvm/svm/svm.c    | 18 ++++++++++++++++++
 arch/x86/kvm/svm/svm.h    |  2 ++
 3 files changed, 26 insertions(+)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 4c26417f36b8..b47c745ec659 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -602,6 +602,12 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm)
 	if (!nested_vmcb_needs_vls_intercept(svm))
 		svm->vmcb->control.virt_ext |= VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;
 
+	if (svm->pause_filter_enabled)
+		svm->vmcb->control.pause_filter_count = svm->nested.ctl.pause_filter_count;
+
+	if (svm->pause_threshold_enabled)
+		svm->vmcb->control.pause_filter_thresh = svm->nested.ctl.pause_filter_thresh;
+
 	nested_svm_transition_tlb_flush(vcpu);
 
 	/* Enter Guest-Mode */
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 0b797351cfb9..95125cd2e666 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1026,6 +1026,12 @@ static __init void svm_set_cpu_caps(void)
 		if (tsc_scaling)
 			kvm_cpu_cap_set(X86_FEATURE_TSCRATEMSR);
 
+		if (pause_filter_count)
+			kvm_cpu_cap_set(X86_FEATURE_PAUSEFILTER);
+
+		if (pause_filter_thresh)
+			kvm_cpu_cap_set(X86_FEATURE_PFTHRESHOLD);
+
 		/* Nested VM can receive #VMEXIT instead of triggering #GP */
 		kvm_cpu_cap_set(X86_FEATURE_SVME_ADDR_CHK);
 	}
@@ -4133,6 +4140,17 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 
 	svm->tsc_scaling_enabled = tsc_scaling && guest_cpuid_has(vcpu, X86_FEATURE_TSCRATEMSR);
 
+	if (kvm_pause_in_guest(vcpu->kvm)) {
+		svm->pause_filter_enabled = pause_filter_count > 0 &&
+					    guest_cpuid_has(vcpu, X86_FEATURE_PAUSEFILTER);
+
+		svm->pause_threshold_enabled = pause_filter_thresh > 0 &&
+					    guest_cpuid_has(vcpu, X86_FEATURE_PFTHRESHOLD);
+	} else {
+		svm->pause_filter_enabled = false;
+		svm->pause_threshold_enabled = false;
+	}
+
 	svm_recalc_instruction_intercepts(vcpu, svm);
 
 	/* For sev guests, the memory encryption bit is not reserved in CR3.  */
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index c8ea3b14da73..7e679dd7a6e4 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -166,6 +166,8 @@ struct vcpu_svm {
 	bool lbrv_enabled                 : 1;
 	bool v_vmload_vmsave_enabled      : 1;
 	bool tsc_scaling_enabled          : 1;
+	bool pause_filter_enabled         : 1;
+	bool pause_threshold_enabled      : 1;
 
 	u32 ldr_reg;
 	u32 dfr_reg;
-- 
2.26.3

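For context on "cpu_pm=on" above: kvm_pause_in_guest() returns true when
userspace disabled PAUSE exits for the VM via KVM_CAP_X86_DISABLE_EXITS,
which is what QEMU's -overcommit cpu-pm=on does. A minimal userspace sketch
of opting in, with error handling omitted and the helper name made up for
illustration:

	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	/* vm_fd is a VM file descriptor obtained from KVM_CREATE_VM. */
	static void disable_pause_exits(int vm_fd)
	{
		struct kvm_enable_cap cap = {
			.cap = KVM_CAP_X86_DISABLE_EXITS,
			/* Stop intercepting PAUSE in L0; with this patch, L1
			 * may then program the pause filter fields itself.
			 */
			.args = { KVM_X86_DISABLE_EXITS_PAUSE },
		};

		ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
	}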


* Re: [PATCH 01/14] KVM: x86: nSVM: restore int_vector in svm_clear_vintr
  2021-09-14 15:48 ` [PATCH 01/14] KVM: x86: nSVM: restore int_vector in svm_clear_vintr Maxim Levitsky
@ 2021-09-23 14:04   ` Paolo Bonzini
  0 siblings, 0 replies; 25+ messages in thread
From: Paolo Bonzini @ 2021-09-23 14:04 UTC (permalink / raw)
  To: Maxim Levitsky, kvm
  Cc: Vitaly Kuznetsov, Borislav Petkov, Bandan Das, open list,
	Joerg Roedel, Ingo Molnar, Wei Huang, Sean Christopherson,
	open list:KERNEL SELFTEST FRAMEWORK, H. Peter Anvin, Jim Mattson,
	Thomas Gleixner, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Shuah Khan, Wanpeng Li

On 14/09/21 17:48, Maxim Levitsky wrote:
> In svm_clear_vintr we try to restore the virtual interrupt
> injection that might be pending, but we fail to restore
> the interrupt vector.
> 
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>   arch/x86/kvm/svm/svm.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 05e8d4d27969..b2e710a3fff6 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1566,6 +1566,8 @@ static void svm_clear_vintr(struct vcpu_svm *svm)
>   
>   		svm->vmcb->control.int_ctl |= svm->nested.ctl.int_ctl &
>   			V_IRQ_INJECTION_BITS_MASK;
> +
> +		svm->vmcb->control.int_vector = svm->nested.ctl.int_vector;
>   	}
>   
>   	vmcb_mark_dirty(svm->vmcb, VMCB_INTR);
> 

Queued, thanks.

Paolo



* Re: [PATCH 02/14] KVM: x86: selftests: test simultaneous uses of V_IRQ from L1 and L0
  2021-09-14 15:48 ` [PATCH 02/14] KVM: x86: selftests: test simultaneous uses of V_IRQ from L1 and L0 Maxim Levitsky
@ 2021-09-23 14:05   ` Paolo Bonzini
  0 siblings, 0 replies; 25+ messages in thread
From: Paolo Bonzini @ 2021-09-23 14:05 UTC (permalink / raw)
  To: Maxim Levitsky, kvm
  Cc: Vitaly Kuznetsov, Borislav Petkov, Bandan Das, open list,
	Joerg Roedel, Ingo Molnar, Wei Huang, Sean Christopherson,
	open list:KERNEL SELFTEST FRAMEWORK, H. Peter Anvin, Jim Mattson,
	Thomas Gleixner, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Shuah Khan, Wanpeng Li

On 14/09/21 17:48, Maxim Levitsky wrote:
> Test that if:
> 
> * L1 disables virtual interrupt masking, and INTR intercept.
> 
> * L1 sets up a virtual interrupt to be injected into L2 and enters L2 with
>    interrupts disabled, thus the virtual interrupt is pending.
> 
> * Now an external interrupt arrives in L1 and since
>    L1 doesn't intercept it, it should be delivered to L2 when
>    it enables interrupts.
> 
>    To do this, L0 (ab)uses V_IRQ to set up an
>    interrupt window, and returns to L2.
> 
> * L2 enables interrupts.
>    This should trigger the interrupt window,
>    the injection of the external interrupt, and the delivery
>    of the virtual interrupt, which can now be done.
> 
> * Test that now L2 gets those interrupts.
> 
> This is the test that demonstrates the issue that was
> fixed in the previous patch.
> 
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>   tools/testing/selftests/kvm/.gitignore        |   1 +
>   tools/testing/selftests/kvm/Makefile          |   1 +
>   .../selftests/kvm/x86_64/svm_int_ctl_test.c   | 128 ++++++++++++++++++
>   3 files changed, 130 insertions(+)
>   create mode 100644 tools/testing/selftests/kvm/x86_64/svm_int_ctl_test.c
> 
> diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
> index 36896d251977..eb98958b15e4 100644
> --- a/tools/testing/selftests/kvm/.gitignore
> +++ b/tools/testing/selftests/kvm/.gitignore
> @@ -23,6 +23,7 @@
>   /x86_64/smm_test
>   /x86_64/state_test
>   /x86_64/svm_vmcall_test
> +/x86_64/svm_int_ctl_test
>   /x86_64/sync_regs_test
>   /x86_64/tsc_msrs_test
>   /x86_64/userspace_msr_exit_test
> diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
> index c103873531e0..3b8b143daecc 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -56,6 +56,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/smm_test
>   TEST_GEN_PROGS_x86_64 += x86_64/state_test
>   TEST_GEN_PROGS_x86_64 += x86_64/vmx_preemption_timer_test
>   TEST_GEN_PROGS_x86_64 += x86_64/svm_vmcall_test
> +TEST_GEN_PROGS_x86_64 += x86_64/svm_int_ctl_test
>   TEST_GEN_PROGS_x86_64 += x86_64/sync_regs_test
>   TEST_GEN_PROGS_x86_64 += x86_64/userspace_msr_exit_test
>   TEST_GEN_PROGS_x86_64 += x86_64/vmx_apic_access_test
> diff --git a/tools/testing/selftests/kvm/x86_64/svm_int_ctl_test.c b/tools/testing/selftests/kvm/x86_64/svm_int_ctl_test.c
> new file mode 100644
> index 000000000000..df04f56ce859
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/x86_64/svm_int_ctl_test.c
> @@ -0,0 +1,128 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * svm_int_ctl_test
> + *
> + * Copyright (C) 2021, Red Hat, Inc.
> + *
> + * Nested SVM testing: test simultaneous use of V_IRQ from L1 and L0.
> + */
> +
> +#include "test_util.h"
> +#include "kvm_util.h"
> +#include "processor.h"
> +#include "svm_util.h"
> +#include "apic.h"
> +
> +#define VCPU_ID		0
> +
> +static struct kvm_vm *vm;
> +
> +bool vintr_irq_called;
> +bool intr_irq_called;
> +
> +#define VINTR_IRQ_NUMBER 0x20
> +#define INTR_IRQ_NUMBER 0x30
> +
> +static void vintr_irq_handler(struct ex_regs *regs)
> +{
> +	vintr_irq_called = true;
> +}
> +
> +static void intr_irq_handler(struct ex_regs *regs)
> +{
> +	x2apic_write_reg(APIC_EOI, 0x00);
> +	intr_irq_called = true;
> +}
> +
> +static void l2_guest_code(struct svm_test_data *svm)
> +{
> +	/* This code raises interrupt INTR_IRQ_NUMBER in L1's LAPIC,
> +	 * and since L1 didn't enable virtual interrupt masking,
> +	 * L2 should receive it and not L1.
> +	 *
> +	 * L2 also has virtual interrupt 'VINTR_IRQ_NUMBER' pending in V_IRQ
> +	 * so it should also receive it after the following 'sti'.
> +	 */
> +	x2apic_write_reg(APIC_ICR,
> +		APIC_DEST_SELF | APIC_INT_ASSERT | INTR_IRQ_NUMBER);
> +
> +	__asm__ __volatile__(
> +		"sti\n"
> +		"nop\n"
> +	);
> +
> +	GUEST_ASSERT(vintr_irq_called);
> +	GUEST_ASSERT(intr_irq_called);
> +
> +	__asm__ __volatile__(
> +		"vmcall\n"
> +	);
> +}
> +
> +static void l1_guest_code(struct svm_test_data *svm)
> +{
> +	#define L2_GUEST_STACK_SIZE 64
> +	unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
> +	struct vmcb *vmcb = svm->vmcb;
> +
> +	x2apic_enable();
> +
> +	/* Prepare for L2 execution. */
> +	generic_svm_setup(svm, l2_guest_code,
> +			  &l2_guest_stack[L2_GUEST_STACK_SIZE]);
> +
> +	/* No virtual interrupt masking */
> +	vmcb->control.int_ctl &= ~V_INTR_MASKING_MASK;
> +
> +	/* No intercepts for real and virtual interrupts */
> +	vmcb->control.intercept &= ~((1ULL << INTERCEPT_INTR) | (1ULL << INTERCEPT_VINTR));
> +
> +	/* Make a virtual interrupt VINTR_IRQ_NUMBER pending */
> +	vmcb->control.int_ctl |= V_IRQ_MASK | (0x1 << V_INTR_PRIO_SHIFT);
> +	vmcb->control.int_vector = VINTR_IRQ_NUMBER;
> +
> +	run_guest(vmcb, svm->vmcb_gpa);
> +	GUEST_ASSERT(vmcb->control.exit_code == SVM_EXIT_VMMCALL);
> +	GUEST_DONE();
> +}
> +
> +int main(int argc, char *argv[])
> +{
> +	vm_vaddr_t svm_gva;
> +
> +	nested_svm_check_supported();
> +
> +	vm = vm_create_default(VCPU_ID, 0, (void *) l1_guest_code);
> +
> +	vm_init_descriptor_tables(vm);
> +	vcpu_init_descriptor_tables(vm, VCPU_ID);
> +
> +	vm_install_exception_handler(vm, VINTR_IRQ_NUMBER, vintr_irq_handler);
> +	vm_install_exception_handler(vm, INTR_IRQ_NUMBER, intr_irq_handler);
> +
> +	vcpu_alloc_svm(vm, &svm_gva);
> +	vcpu_args_set(vm, VCPU_ID, 1, svm_gva);
> +
> +	struct kvm_run *run = vcpu_state(vm, VCPU_ID);
> +	struct ucall uc;
> +
> +	vcpu_run(vm, VCPU_ID);
> +	TEST_ASSERT(run->exit_reason == KVM_EXIT_IO,
> +		    "Got exit_reason other than KVM_EXIT_IO: %u (%s)\n",
> +		    run->exit_reason,
> +		    exit_reason_str(run->exit_reason));
> +
> +	switch (get_ucall(vm, VCPU_ID, &uc)) {
> +	case UCALL_ABORT:
> +		TEST_FAIL("%s", (const char *)uc.args[0]);
> +		break;
> +		/* NOT REACHED */
> +	case UCALL_DONE:
> +		goto done;
> +	default:
> +		TEST_FAIL("Unknown ucall 0x%lx.", uc.cmd);
> +	}
> +done:
> +	kvm_vm_free(vm);
> +	return 0;
> +}
> 

Queued, thanks.

Paolo



* Re: [PATCH 03/14] KVM: x86: nSVM: test eax for 4K alignment for GP errata workaround
  2021-09-14 15:48 ` [PATCH 03/14] KVM: x86: nSVM: test eax for 4K alignment for GP errata workaround Maxim Levitsky
@ 2021-09-23 14:05   ` Paolo Bonzini
  0 siblings, 0 replies; 25+ messages in thread
From: Paolo Bonzini @ 2021-09-23 14:05 UTC (permalink / raw)
  To: Maxim Levitsky, kvm
  Cc: Vitaly Kuznetsov, Borislav Petkov, Bandan Das, open list,
	Joerg Roedel, Ingo Molnar, Wei Huang, Sean Christopherson,
	open list:KERNEL SELFTEST FRAMEWORK, H. Peter Anvin, Jim Mattson,
	Thomas Gleixner, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Shuah Khan, Wanpeng Li

On 14/09/21 17:48, Maxim Levitsky wrote:
> The GP SVM errata workaround made the #GP handler always emulate
> the SVM instructions.
> 
> However, these instructions raise #GP when the operand is not 4K aligned,
> but the workaround code didn't check this, so we ended up
> emulating these instructions anyway.
> 
> This is only an emulation accuracy bug, as there is no harm in
> KVM reading or writing unaligned vmcb images.
> 
> Fixes: 82a11e9c6fa2 ("KVM: SVM: Add emulation support for #GP triggered by SVM instructions")
> 
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>   arch/x86/kvm/svm/svm.c | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index b2e710a3fff6..6645542df9bd 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -2224,6 +2224,10 @@ static int gp_interception(struct kvm_vcpu *vcpu)
>   	if (error_code)
>   		goto reinject;
>   
> +	/* All SVM instructions expect page aligned RAX */
> +	if (svm->vmcb->save.rax & ~PAGE_MASK)
> +		goto reinject;
> +
>   	/* Decode the instruction for usage later */
>   	if (x86_decode_emulated_instruction(vcpu, 0, NULL, 0) != EMULATION_OK)
>   		goto reinject;
> 


Queued, thanks.

Paolo



* Re: [PATCH 05/14] KVM: x86: nSVM: don't copy virt_ext from vmcb12
  2021-09-14 15:48 ` [PATCH 05/14] KVM: x86: nSVM: don't copy virt_ext from vmcb12 Maxim Levitsky
@ 2021-09-23 14:06   ` Paolo Bonzini
  0 siblings, 0 replies; 25+ messages in thread
From: Paolo Bonzini @ 2021-09-23 14:06 UTC (permalink / raw)
  To: Maxim Levitsky, kvm
  Cc: Vitaly Kuznetsov, Borislav Petkov, Bandan Das, open list,
	Joerg Roedel, Ingo Molnar, Wei Huang, Sean Christopherson,
	open list:KERNEL SELFTEST FRAMEWORK, H. Peter Anvin, Jim Mattson,
	Thomas Gleixner, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Shuah Khan, Wanpeng Li

On 14/09/21 17:48, Maxim Levitsky wrote:
> This field corresponds to features that we don't yet expose to L2.
> 
> While currently there are no CVE-worthy features in this field,
> if AMD adds more features to it, that could allow guest
> escapes similar to CVE-2021-3653 and CVE-2021-3656.
> 
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>   arch/x86/kvm/svm/nested.c | 1 -
>   1 file changed, 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index 476e01f98035..4df59d9795b6 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -545,7 +545,6 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm)
>   		(svm->nested.ctl.int_ctl & int_ctl_vmcb12_bits) |
>   		(svm->vmcb01.ptr->control.int_ctl & int_ctl_vmcb01_bits);
>   
> -	svm->vmcb->control.virt_ext            = svm->nested.ctl.virt_ext;
>   	svm->vmcb->control.int_vector          = svm->nested.ctl.int_vector;
>   	svm->vmcb->control.int_state           = svm->nested.ctl.int_state;
>   	svm->vmcb->control.event_inj           = svm->nested.ctl.event_inj;
> 

Queued, thanks.

Paolo



* Re: [PATCH 06/14] KVM: x86: SVM: don't set VMLOAD/VMSAVE intercepts on vCPU reset
  2021-09-14 15:48 ` [PATCH 06/14] KVM: x86: SVM: don't set VMLOAD/VMSAVE intercepts on vCPU reset Maxim Levitsky
@ 2021-09-23 16:40   ` Paolo Bonzini
  0 siblings, 0 replies; 25+ messages in thread
From: Paolo Bonzini @ 2021-09-23 16:40 UTC (permalink / raw)
  To: Maxim Levitsky, kvm
  Cc: Vitaly Kuznetsov, Borislav Petkov, Bandan Das, open list,
	Joerg Roedel, Ingo Molnar, Wei Huang, Sean Christopherson,
	open list:KERNEL SELFTEST FRAMEWORK, H. Peter Anvin, Jim Mattson,
	Thomas Gleixner, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Shuah Khan, Wanpeng Li

On 14/09/21 17:48, Maxim Levitsky wrote:
> Commit adc2a23734ac ("KVM: nSVM: improve SYSENTER emulation on AMD")
> made init_vmcb set vmload/vmsave intercepts unconditionally,
> and relied on svm_vcpu_after_set_cpuid to clear them when possible.
> 
> However init_vmcb is also called when the vCPU is reset, and it is
> not followed by another call to svm_vcpu_after_set_cpuid because
> the CPUID is already set.
> 
> This mistake causes the VMSAVE/VMLOAD intercepts to be set when
> they are not needed, which harms performance of the nested guest.
> 
> Fixes: adc2a23734ac ("KVM: nSVM: improve SYSENTER emulation on AMD")
> 
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>   arch/x86/kvm/svm/svm.c | 6 ++++--
>   1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 6645542df9bd..861ac9f74331 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1199,8 +1199,6 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
>   	svm_set_intercept(svm, INTERCEPT_SHUTDOWN);
>   	svm_set_intercept(svm, INTERCEPT_VMRUN);
>   	svm_set_intercept(svm, INTERCEPT_VMMCALL);
> -	svm_set_intercept(svm, INTERCEPT_VMLOAD);
> -	svm_set_intercept(svm, INTERCEPT_VMSAVE);
>   	svm_set_intercept(svm, INTERCEPT_STGI);
>   	svm_set_intercept(svm, INTERCEPT_CLGI);
>   	svm_set_intercept(svm, INTERCEPT_SKINIT);
> @@ -1377,6 +1375,10 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
>   	svm->guest_state_loaded = false;
>   
>   	svm_switch_vmcb(svm, &svm->vmcb01);
> +
> +	svm_set_intercept(svm, INTERCEPT_VMLOAD);
> +	svm_set_intercept(svm, INTERCEPT_VMSAVE);
> +
>   	init_vmcb(vcpu);
>   
>   	svm_vcpu_init_msrpm(vcpu, svm->msrpm);
> 

This needs to be redone after the latest refactoring of svm_vcpu_reset.
I'll send a patch myself.

Paolo



* Re: [PATCH 07/14] KVM: x86: SVM: add warning for CVE-2021-3656
  2021-09-14 15:48 ` [PATCH 07/14] KVM: x86: SVM: add warning for CVE-2021-3656 Maxim Levitsky
@ 2021-09-23 16:44   ` Paolo Bonzini
  2021-10-12  0:21     ` Sean Christopherson
  2021-10-11 17:30   ` Xiaoyao Li
  1 sibling, 1 reply; 25+ messages in thread
From: Paolo Bonzini @ 2021-09-23 16:44 UTC (permalink / raw)
  To: Maxim Levitsky, kvm
  Cc: Vitaly Kuznetsov, Borislav Petkov, Bandan Das, open list,
	Joerg Roedel, Ingo Molnar, Wei Huang, Sean Christopherson,
	open list:KERNEL SELFTEST FRAMEWORK, H. Peter Anvin, Jim Mattson,
	Thomas Gleixner, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Shuah Khan, Wanpeng Li

On 14/09/21 17:48, Maxim Levitsky wrote:
> Just in case, add a warning ensuring that on guest entry,
> either both VMLOAD and VMSAVE intercepts are enabled or
> vVMLOAD/VMSAVE is enabled.
> 
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>   arch/x86/kvm/svm/svm.c | 6 ++++++
>   1 file changed, 6 insertions(+)
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 861ac9f74331..deeebd05f682 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3784,6 +3784,12 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
>   
>   	WARN_ON_ONCE(kvm_apicv_activated(vcpu->kvm) != kvm_vcpu_apicv_active(vcpu));
>   
> +	/* Check that CVE-2021-3656 can't happen again */
> +	if (!svm_is_intercept(svm, INTERCEPT_VMSAVE) ||
> +	    !svm_is_intercept(svm, INTERCEPT_VMSAVE))
> +		WARN_ON(!(svm->vmcb->control.virt_ext &
> +			  VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK));
> +
>   	sync_lapic_to_cr8(vcpu);
>   
>   	if (unlikely(svm->asid != svm->vmcb->control.asid)) {
> 

While it's nice to be "proactive", this does add some extra work. 
Maybe it should be under CONFIG_DEBUG_KERNEL.  It could be useful to 
make it into its own function so we can add similar intercept invariants 
in the same place.
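
As a sketch of that suggestion (the helper name and the exact guard are
illustrative only, not an actual implementation):

	/* One place to collect guest-entry intercept invariants. */
	static void svm_check_intercept_invariants(struct vcpu_svm *svm)
	{
		if (!IS_ENABLED(CONFIG_DEBUG_KERNEL))
			return;

		/* CVE-2021-3656: VMLOAD/VMSAVE must be intercepted or virtualized. */
		if (!svm_is_intercept(svm, INTERCEPT_VMLOAD) ||
		    !svm_is_intercept(svm, INTERCEPT_VMSAVE))
			WARN_ON_ONCE(!(svm->vmcb->control.virt_ext &
				       VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK));

		/* Further intercept invariants would be added here. */
	}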

Paolo



* Re: [PATCH 00/14] nSVM fixes and optional features
  2021-09-14 15:48 [PATCH 00/14] nSVM fixes and optional features Maxim Levitsky
                   ` (13 preceding siblings ...)
  2021-09-14 15:48 ` [PATCH 14/14] KVM: x86: nSVM: support PAUSE filter threshold and count Maxim Levitsky
@ 2021-09-23 16:54 ` Paolo Bonzini
  14 siblings, 0 replies; 25+ messages in thread
From: Paolo Bonzini @ 2021-09-23 16:54 UTC (permalink / raw)
  To: Maxim Levitsky, kvm
  Cc: Vitaly Kuznetsov, Borislav Petkov, Bandan Das, open list,
	Joerg Roedel, Ingo Molnar, Wei Huang, Sean Christopherson,
	open list:KERNEL SELFTEST FRAMEWORK, H. Peter Anvin, Jim Mattson,
	Thomas Gleixner, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Shuah Khan, Wanpeng Li

On 14/09/21 17:48, Maxim Levitsky wrote:
> Those are few patches I was working on lately, all somewhat related
> to the two CVEs that I found recently.
> 
> First 7 patches fix various minor bugs that relate to these CVEs.
> 
> The rest of the patches implement various optional SVM features,
> some of which the guest could enable anyway due to incorrect
> checking of virt_ext field.
> 
> Last patch is somewhat an RFC, I would like to hear your opinion
> on that.
> 
> I also implemented nested TSC scaling while at it.
> 
> As for other optional SVM features here is my summary of few features
> I took a look at:
> 
> X86_FEATURE_DECODEASSISTS:
>     this feature should make it easier
>     for the L1 to emulate an instruction on MMIO access, by not
>     needing to read the guest memory but rather using the instruction
>     bytes that the CPU already fetched.
> 
>     The challenge of implementing this is that we sometimes inject
>     #PF and #NPT syntenically and in those cases we must be sure
>     we set the correct instruction bytes.
> 
>     Also this feature adds assists for MOV CR/DR, INTn, and INVLPG,
>     which aren't that interesting but must be supported as well to
>     expose this feature to the nested guest.
> 
> X86_FEATURE_VGIF
>     Might allow the L2 to run the L3 a bit faster, but due to crazy complex
>     logic we already have around int_ctl and vgif probably not worth it.
> 
> X86_FEATURE_VMCBCLEAN
>     Should just be enabled, because otherwise L1 doesn't even attempt
>     to set the clean bits. But we need to know if we can take an
>     advantage of these bits first.
> 
> X86_FEATURE_FLUSHBYASID
> X86_FEATURE_AVIC
>     These two features would be very good to enable, but that
>     would require lots of work, and will be done eventually.
> 
> There are few more nested SVM features that I didn't yet had a
> chance to take a look at.
> 
> Best regards,
> 	Maxim Levitsky
> 
> Maxim Levitsky (14):
>    KVM: x86: nSVM: restore int_vector in svm_clear_vintr
>    KVM: x86: selftests: test simultaneous uses of V_IRQ from L1 and L0
>    KVM: x86: nSVM: test eax for 4K alignment for GP errata workaround
>    KVM: x86: nSVM: don't copy pause related settings
>    KVM: x86: nSVM: don't copy virt_ext from vmcb12
>    KVM: x86: SVM: don't set VMLOAD/VMSAVE intercepts on vCPU reset
>    KVM: x86: SVM: add warning for CVE-2021-3656
>    KVM: x86: SVM: add module param to control LBR virtualization
>    KVM: x86: nSVM: correctly virtualize LBR msrs when L2 is running
>    KVM: x86: nSVM: implement nested LBR virtualization
>    KVM: x86: nSVM: implement nested VMLOAD/VMSAVE
>    KVM: x86: SVM: add module param to control TSC scaling
>    KVM: x86: nSVM: implement nested TSC scaling
>    KVM: x86: nSVM: support PAUSE filter threshold and count
> 
>   arch/x86/kvm/svm/nested.c                     | 105 +++++++--
>   arch/x86/kvm/svm/svm.c                        | 218 +++++++++++++++---
>   arch/x86/kvm/svm/svm.h                        |  20 +-
>   arch/x86/kvm/vmx/vmx.c                        |   1 +
>   arch/x86/kvm/x86.c                            |   1 +
>   tools/testing/selftests/kvm/.gitignore        |   1 +
>   tools/testing/selftests/kvm/Makefile          |   1 +
>   .../selftests/kvm/x86_64/svm_int_ctl_test.c   | 128 ++++++++++
>   8 files changed, 427 insertions(+), 48 deletions(-)
>   create mode 100644 tools/testing/selftests/kvm/x86_64/svm_int_ctl_test.c
> 

Queued more patches, with 9-10-11-14 left now.

Paolo



* Re: [PATCH 07/14] KVM: x86: SVM: add warning for CVE-2021-3656
  2021-09-14 15:48 ` [PATCH 07/14] KVM: x86: SVM: add warning for CVE-2021-3656 Maxim Levitsky
  2021-09-23 16:44   ` Paolo Bonzini
@ 2021-10-11 17:30   ` Xiaoyao Li
  2021-10-12  7:51     ` Maxim Levitsky
  1 sibling, 1 reply; 25+ messages in thread
From: Xiaoyao Li @ 2021-10-11 17:30 UTC (permalink / raw)
  To: Maxim Levitsky, kvm
  Cc: Vitaly Kuznetsov, Paolo Bonzini, Borislav Petkov, Bandan Das,
	open list, Joerg Roedel, Ingo Molnar, Wei Huang,
	Sean Christopherson, open list:KERNEL SELFTEST FRAMEWORK,
	H. Peter Anvin, Jim Mattson, Thomas Gleixner,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Shuah Khan, Wanpeng Li

On 9/14/2021 11:48 PM, Maxim Levitsky wrote:
> Just in case, add a warning ensuring that on guest entry,
> either both VMLOAD and VMSAVE intercepts are enabled or
> vVMLOAD/VMSAVE is enabled.
> 
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>   arch/x86/kvm/svm/svm.c | 6 ++++++
>   1 file changed, 6 insertions(+)
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 861ac9f74331..deeebd05f682 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3784,6 +3784,12 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
>   
>   	WARN_ON_ONCE(kvm_apicv_activated(vcpu->kvm) != kvm_vcpu_apicv_active(vcpu));
>   
> +	/* Check that CVE-2021-3656 can't happen again */
> +	if (!svm_is_intercept(svm, INTERCEPT_VMSAVE) ||
> +	    !svm_is_intercept(svm, INTERCEPT_VMSAVE))

either one needs to be INTERCEPT_VMLOAD, right?
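
For reference, the intended condition is presumably:

	if (!svm_is_intercept(svm, INTERCEPT_VMLOAD) ||
	    !svm_is_intercept(svm, INTERCEPT_VMSAVE))

i.e. checking VMLOAD once and VMSAVE once, rather than VMSAVE twice.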

> +		WARN_ON(!(svm->vmcb->control.virt_ext &
> +			  VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK));
> +
>   	sync_lapic_to_cr8(vcpu);
>   
>   	if (unlikely(svm->asid != svm->vmcb->control.asid)) {
> 



* Re: [PATCH 07/14] KVM: x86: SVM: add warning for CVE-2021-3656
  2021-09-23 16:44   ` Paolo Bonzini
@ 2021-10-12  0:21     ` Sean Christopherson
  0 siblings, 0 replies; 25+ messages in thread
From: Sean Christopherson @ 2021-10-12  0:21 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Maxim Levitsky, kvm, Vitaly Kuznetsov, Borislav Petkov,
	Bandan Das, open list, Joerg Roedel, Ingo Molnar, Wei Huang,
	open list:KERNEL SELFTEST FRAMEWORK, H. Peter Anvin, Jim Mattson,
	Thomas Gleixner, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Shuah Khan, Wanpeng Li

On Thu, Sep 23, 2021, Paolo Bonzini wrote:
> On 14/09/21 17:48, Maxim Levitsky wrote:
> > Just in case, add a warning ensuring that on guest entry,
> > either both VMLOAD and VMSAVE intercepts are enabled or
> > vVMLOAD/VMSAVE is enabled.
> > 
> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> >   arch/x86/kvm/svm/svm.c | 6 ++++++
> >   1 file changed, 6 insertions(+)
> > 
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 861ac9f74331..deeebd05f682 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -3784,6 +3784,12 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
> >   	WARN_ON_ONCE(kvm_apicv_activated(vcpu->kvm) != kvm_vcpu_apicv_active(vcpu));
> > +	/* Check that CVE-2021-3656 can't happen again */
> > +	if (!svm_is_intercept(svm, INTERCEPT_VMSAVE) ||
> > +	    !svm_is_intercept(svm, INTERCEPT_VMSAVE))
> > +		WARN_ON(!(svm->vmcb->control.virt_ext &
> > +			  VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK));
> > +
> >   	sync_lapic_to_cr8(vcpu);
> >   	if (unlikely(svm->asid != svm->vmcb->control.asid)) {
> > 
> 
> While it's nice to be "proactive", this does add some extra work. Maybe it
> should be under CONFIG_DEBUG_KERNEL.  It could be useful to make it into its
> own function so we can add similar intercept invariants in the same place.

I don't know that DEBUG_KERNEL will guard much; DEBUG_KERNEL=y is very common,
e.g. it's on by default in the x86 defconfigs.  I too agree it's nice to be
proactive, but this isn't that different than, say, failing to intercept CR3
loads when shadow paging is enabled.

If we go down the path of effectively auditing KVM invariants, I'd rather we
commit fully and (a) add a dedicated Kconfig that is highly unlikely to be turned
on by accident and (b) audit a large number of invariants.
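
A sketch of what (a) could look like, using CONFIG_KVM_AUDIT_INVARIANTS as a
hypothetical Kconfig symbol (it does not exist today):

	#ifdef CONFIG_KVM_AUDIT_INVARIANTS
	#define KVM_AUDIT_WARN_ON(cond)	WARN_ON_ONCE(cond)
	#else
	/* Audits compile away entirely; the condition is never evaluated. */
	#define KVM_AUDIT_WARN_ON(cond)	do { } while (0)
	#endif

Checks like the VMLOAD/VMSAVE invariant above could then use
KVM_AUDIT_WARN_ON() throughout KVM and cost nothing in ordinary builds.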


* Re: [PATCH 07/14] KVM: x86: SVM: add warning for CVE-2021-3656
  2021-10-11 17:30   ` Xiaoyao Li
@ 2021-10-12  7:51     ` Maxim Levitsky
  0 siblings, 0 replies; 25+ messages in thread
From: Maxim Levitsky @ 2021-10-12  7:51 UTC (permalink / raw)
  To: Xiaoyao Li, kvm
  Cc: Vitaly Kuznetsov, Paolo Bonzini, Borislav Petkov, Bandan Das,
	open list, Joerg Roedel, Ingo Molnar, Wei Huang,
	Sean Christopherson, open list:KERNEL SELFTEST FRAMEWORK,
	H. Peter Anvin, Jim Mattson, Thomas Gleixner,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Shuah Khan, Wanpeng Li

On Tue, 2021-10-12 at 01:30 +0800, Xiaoyao Li wrote:
> On 9/14/2021 11:48 PM, Maxim Levitsky wrote:
> > Just in case, add a warning ensuring that on guest entry,
> > either both VMLOAD and VMSAVE intercepts are enabled or
> > vVMLOAD/VMSAVE is enabled.
> > 
> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> >   arch/x86/kvm/svm/svm.c | 6 ++++++
> >   1 file changed, 6 insertions(+)
> > 
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 861ac9f74331..deeebd05f682 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -3784,6 +3784,12 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
> >   
> >   	WARN_ON_ONCE(kvm_apicv_activated(vcpu->kvm) != kvm_vcpu_apicv_active(vcpu));
> >   
> > +	/* Check that CVE-2021-3656 can't happen again */
> > +	if (!svm_is_intercept(svm, INTERCEPT_VMSAVE) ||
> > +	    !svm_is_intercept(svm, INTERCEPT_VMSAVE))
> 
> either one needs to be INTERCEPT_VMLOAD, right?

Oops! Of course.

Best regards,
	Maxim Levitsky
> 
> > +		WARN_ON(!(svm->vmcb->control.virt_ext &
> > +			  VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK));
> > +
> >   	sync_lapic_to_cr8(vcpu);
> >   
> >   	if (unlikely(svm->asid != svm->vmcb->control.asid)) {
> > 




Thread overview: 25+ messages
2021-09-14 15:48 [PATCH 00/14] nSVM fixes and optional features Maxim Levitsky
2021-09-14 15:48 ` [PATCH 01/14] KVM: x86: nSVM: restore int_vector in svm_clear_vintr Maxim Levitsky
2021-09-23 14:04   ` Paolo Bonzini
2021-09-14 15:48 ` [PATCH 02/14] KVM: x86: selftests: test simultaneous uses of V_IRQ from L1 and L0 Maxim Levitsky
2021-09-23 14:05   ` Paolo Bonzini
2021-09-14 15:48 ` [PATCH 03/14] KVM: x86: nSVM: test eax for 4K alignment for GP errata workaround Maxim Levitsky
2021-09-23 14:05   ` Paolo Bonzini
2021-09-14 15:48 ` [PATCH 04/14] KVM: x86: nSVM: don't copy pause related settings Maxim Levitsky
2021-09-14 15:48 ` [PATCH 05/14] KVM: x86: nSVM: don't copy virt_ext from vmcb12 Maxim Levitsky
2021-09-23 14:06   ` Paolo Bonzini
2021-09-14 15:48 ` [PATCH 06/14] KVM: x86: SVM: don't set VMLOAD/VMSAVE intercepts on vCPU reset Maxim Levitsky
2021-09-23 16:40   ` Paolo Bonzini
2021-09-14 15:48 ` [PATCH 07/14] KVM: x86: SVM: add warning for CVE-2021-3656 Maxim Levitsky
2021-09-23 16:44   ` Paolo Bonzini
2021-10-12  0:21     ` Sean Christopherson
2021-10-11 17:30   ` Xiaoyao Li
2021-10-12  7:51     ` Maxim Levitsky
2021-09-14 15:48 ` [PATCH 08/14] KVM: x86: SVM: add module param to control LBR virtualization Maxim Levitsky
2021-09-14 15:48 ` [PATCH 09/14] KVM: x86: nSVM: correctly virtualize LBR msrs when L2 is running Maxim Levitsky
2021-09-14 15:48 ` [PATCH 10/14] KVM: x86: nSVM: implement nested LBR virtualization Maxim Levitsky
2021-09-14 15:48 ` [PATCH 11/14] KVM: x86: nSVM: implement nested VMLOAD/VMSAVE Maxim Levitsky
2021-09-14 15:48 ` [PATCH 12/14] KVM: x86: SVM: add module param to control TSC scaling Maxim Levitsky
2021-09-14 15:48 ` [PATCH 13/14] KVM: x86: nSVM: implement nested " Maxim Levitsky
2021-09-14 15:48 ` [PATCH 14/14] KVM: x86: nSVM: support PAUSE filter threshold and count Maxim Levitsky
2021-09-23 16:54 ` [PATCH 00/14] nSVM fixes and optional features Paolo Bonzini
