All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] KVM: x86: Allow opt out of guest hypercall patching
@ 2022-03-16  0:55 Oliver Upton
  2022-03-16  0:55 ` [PATCH 1/2] KVM: x86: Allow userspace to opt out of " Oliver Upton
  2022-03-16  0:55 ` [PATCH 2/2] selftests: KVM: Test KVM_X86_QUIRK_FIX_HYPERCALL_INSN Oliver Upton
  0 siblings, 2 replies; 15+ messages in thread
From: Oliver Upton @ 2022-03-16  0:55 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, David Dunn, Peter Shier, Oliver Upton

Another interesting behavior of KVM is that it rewrites guest hypercall
instructions when emulated on #UD. So, if a guest uses the wrong
instruction for the vendor its running on, KVM rewrites the guest
instruction to use the correct one (i.e. VMCALL on VMX, VMMCALL on SVM).

While it may not be the end of the world for a non-nested guest that
knows its running on KVM, this is dead wrong in the context of nested
virtualization.

The nested situation could probably be seen as a bug, but I decided to
leave it as is for now in the series since there hasn't been any
complaints about it so far.

This series adds a quirk which allows userspace to opt out of hypercall
rewrites. With the quirk disabled, misbehaved guests will see a #UD
instead.

Applies to kvm/queue at the following commit:

  2ca1ba339ed8 ("KVM: x86: Test case for TSC scaling and offset sync")

Note, the series depends on KVM_CAP_DISABLE_QUIRKS2 which was introduced
in the following commit on kvm/queue:

  3a825326df69 ("KVM: x86: Introduce KVM_CAP_DISABLE_QUIRKS2")

Tested with the included selftest on an Intel Skylake and AMD Rome
machine.

Oliver Upton (2):
  KVM: x86: Allow userspace to opt out of hypercall patching
  selftests: KVM: Test KVM_X86_QUIRK_FIX_HYPERCALL_INSN

 Documentation/virt/kvm/api.rst                |   9 +
 arch/x86/include/asm/kvm_host.h               |   3 +-
 arch/x86/include/uapi/asm/kvm.h               |  11 +-
 arch/x86/kvm/x86.c                            |  11 ++
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/x86_64/fix_hypercall_test.c | 170 ++++++++++++++++++
 7 files changed, 200 insertions(+), 6 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/x86_64/fix_hypercall_test.c

-- 
2.35.1.723.g4982287a31-goog


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/2] KVM: x86: Allow userspace to opt out of hypercall patching
  2022-03-16  0:55 [PATCH 0/2] KVM: x86: Allow opt out of guest hypercall patching Oliver Upton
@ 2022-03-16  0:55 ` Oliver Upton
  2022-03-24 17:44   ` Sean Christopherson
  2022-03-16  0:55 ` [PATCH 2/2] selftests: KVM: Test KVM_X86_QUIRK_FIX_HYPERCALL_INSN Oliver Upton
  1 sibling, 1 reply; 15+ messages in thread
From: Oliver Upton @ 2022-03-16  0:55 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, David Dunn, Peter Shier, Oliver Upton

KVM handles the VMCALL/VMMCALL instructions very strangely. Even though
both of these instructions really should #UD when executed on the wrong
vendor's hardware (i.e. VMCALL on SVM, VMMCALL on VMX), KVM replaces the
guest's instruction with the appropriate instruction for the vendor.
Nonetheless, older guest kernels without commit c1118b3602c2 ("x86: kvm:
use alternatives for VMCALL vs. VMMCALL if kernel text is read-only")
do not patch in the appropriate instruction using alternatives, likely
motivating KVM's intervention.

Add a quirk allowing userspace to opt out of hypercall patching. If the
quirk is disabled, KVM synthesizes a #UD in the guest.

Signed-off-by: Oliver Upton <oupton@google.com>
---
 Documentation/virt/kvm/api.rst  |  9 +++++++++
 arch/x86/include/asm/kvm_host.h |  3 ++-
 arch/x86/include/uapi/asm/kvm.h | 11 ++++++-----
 arch/x86/kvm/x86.c              | 11 +++++++++++
 4 files changed, 28 insertions(+), 6 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 507be67746b0..862314e156ae 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7233,6 +7233,15 @@ The valid bits in cap.args[0] are:
                                     Additionally, when this quirk is disabled,
                                     KVM clears CPUID.01H:ECX[bit 3] if
                                     IA32_MISC_ENABLE[bit 18] is cleared.
+
+ KVM_X86_QUIRK_FIX_HYPERCALL_INSN   By default, KVM rewrites guest
+                                    VMMCALL/VMCALL instructions to match the
+                                    vendor's hypercall instruction for the
+                                    system. When this quirk is disabled, KVM
+                                    will no longer rewrite invalid guest
+                                    hypercall instructions. Executing the
+                                    incorrect hypercall instruction will
+                                    generate a #UD within the guest.
 =================================== ============================================
 
 8. Other capabilities.
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9694dd5e6ccc..832e9af24a85 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1996,6 +1996,7 @@ int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
 	 KVM_X86_QUIRK_CD_NW_CLEARED |		\
 	 KVM_X86_QUIRK_LAPIC_MMIO_HOLE |	\
 	 KVM_X86_QUIRK_OUT_7E_INC_RIP |		\
-	 KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT)
+	 KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT |	\
+	 KVM_X86_QUIRK_FIX_HYPERCALL_INSN)
 
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index bf6e96011dfe..21614807a2cb 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -428,11 +428,12 @@ struct kvm_sync_regs {
 	struct kvm_vcpu_events events;
 };
 
-#define KVM_X86_QUIRK_LINT0_REENABLED	   (1 << 0)
-#define KVM_X86_QUIRK_CD_NW_CLEARED	   (1 << 1)
-#define KVM_X86_QUIRK_LAPIC_MMIO_HOLE	   (1 << 2)
-#define KVM_X86_QUIRK_OUT_7E_INC_RIP	   (1 << 3)
-#define KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT (1 << 4)
+#define KVM_X86_QUIRK_LINT0_REENABLED		(1 << 0)
+#define KVM_X86_QUIRK_CD_NW_CLEARED		(1 << 1)
+#define KVM_X86_QUIRK_LAPIC_MMIO_HOLE		(1 << 2)
+#define KVM_X86_QUIRK_OUT_7E_INC_RIP		(1 << 3)
+#define KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT	(1 << 4)
+#define KVM_X86_QUIRK_FIX_HYPERCALL_INSN	(1 << 5)
 
 #define KVM_STATE_NESTED_FORMAT_VMX	0
 #define KVM_STATE_NESTED_FORMAT_SVM	1
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d3a9ce07a565..685c4bc453b4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9291,6 +9291,17 @@ static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt)
 	char instruction[3];
 	unsigned long rip = kvm_rip_read(vcpu);
 
+	/*
+	 * If the quirk is disabled, synthesize a #UD and let the guest pick up
+	 * the pieces.
+	 */
+	if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_FIX_HYPERCALL_INSN)) {
+		ctxt->exception.error_code_valid = false;
+		ctxt->exception.vector = UD_VECTOR;
+		ctxt->have_exception = true;
+		return X86EMUL_PROPAGATE_FAULT;
+	}
+
 	static_call(kvm_x86_patch_hypercall)(vcpu, instruction);
 
 	return emulator_write_emulated(ctxt, rip, instruction, 3,
-- 
2.35.1.723.g4982287a31-goog


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/2] selftests: KVM: Test KVM_X86_QUIRK_FIX_HYPERCALL_INSN
  2022-03-16  0:55 [PATCH 0/2] KVM: x86: Allow opt out of guest hypercall patching Oliver Upton
  2022-03-16  0:55 ` [PATCH 1/2] KVM: x86: Allow userspace to opt out of " Oliver Upton
@ 2022-03-16  0:55 ` Oliver Upton
  2022-03-24 19:09   ` Oliver Upton
  1 sibling, 1 reply; 15+ messages in thread
From: Oliver Upton @ 2022-03-16  0:55 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, David Dunn, Peter Shier, Oliver Upton

Add a test that asserts KVM rewrites guest hypercall instructions to
match the running architecture (VMCALL on VMX, VMMCALL on SVM).
Additionally, test that with the quirk disabled, KVM no longer rewrites
guest instructions and instead injects a #UD.

Signed-off-by: Oliver Upton <oupton@google.com>
---
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/x86_64/fix_hypercall_test.c | 170 ++++++++++++++++++
 3 files changed, 172 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86_64/fix_hypercall_test.c

diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
index 9b67343dc4ab..1f1b6c978bf7 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -15,6 +15,7 @@
 /x86_64/debug_regs
 /x86_64/evmcs_test
 /x86_64/emulator_error_test
+/x86_64/fix_hypercall_test
 /x86_64/get_msr_index_features
 /x86_64/kvm_clock_test
 /x86_64/kvm_pv_test
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 6d69e196f1b7..c9cdbd248727 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -48,6 +48,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/cr4_cpuid_sync_test
 TEST_GEN_PROGS_x86_64 += x86_64/get_msr_index_features
 TEST_GEN_PROGS_x86_64 += x86_64/evmcs_test
 TEST_GEN_PROGS_x86_64 += x86_64/emulator_error_test
+TEST_GEN_PROGS_x86_64 += x86_64/fix_hypercall_test
 TEST_GEN_PROGS_x86_64 += x86_64/hyperv_clock
 TEST_GEN_PROGS_x86_64 += x86_64/hyperv_cpuid
 TEST_GEN_PROGS_x86_64 += x86_64/hyperv_features
diff --git a/tools/testing/selftests/kvm/x86_64/fix_hypercall_test.c b/tools/testing/selftests/kvm/x86_64/fix_hypercall_test.c
new file mode 100644
index 000000000000..1f5c32146f3d
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/fix_hypercall_test.c
@@ -0,0 +1,170 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020, Google LLC.
+ *
+ * Tests for KVM paravirtual feature disablement
+ */
+#include <asm/kvm_para.h>
+#include <linux/kvm_para.h>
+#include <linux/stringify.h>
+#include <stdint.h>
+
+#include "apic.h"
+#include "test_util.h"
+#include "kvm_util.h"
+#include "processor.h"
+
+#define VCPU_ID 0
+
+static bool ud_expected;
+
+static void guest_ud_handler(struct ex_regs *regs)
+{
+	GUEST_ASSERT(ud_expected);
+	GUEST_DONE();
+}
+
+extern unsigned char svm_hypercall_insn;
+static uint64_t svm_do_sched_yield(uint8_t apic_id)
+{
+	uint64_t ret;
+
+	asm volatile("mov %1, %%rax\n\t"
+		     "mov %2, %%rbx\n\t"
+		     "svm_hypercall_insn:\n\t"
+		     "vmmcall\n\t"
+		     "mov %%rax, %0\n\t"
+		     : "=r"(ret)
+		     : "r"((uint64_t)KVM_HC_SCHED_YIELD), "r"((uint64_t)apic_id)
+		     : "rax", "rbx", "memory");
+
+	return ret;
+}
+
+extern unsigned char vmx_hypercall_insn;
+static uint64_t vmx_do_sched_yield(uint8_t apic_id)
+{
+	uint64_t ret;
+
+	asm volatile("mov %1, %%rax\n\t"
+		     "mov %2, %%rbx\n\t"
+		     "vmx_hypercall_insn:\n\t"
+		     "vmcall\n\t"
+		     "mov %%rax, %0\n\t"
+		     : "=r"(ret)
+		     : "r"((uint64_t)KVM_HC_SCHED_YIELD), "r"((uint64_t)apic_id)
+		     : "rax", "rbx", "memory");
+
+	return ret;
+}
+
+static void assert_hypercall_insn(unsigned char *exp_insn, unsigned char *obs_insn)
+{
+	uint32_t exp = 0, obs = 0;
+
+	memcpy(&exp, exp_insn, sizeof(exp));
+	memcpy(&obs, obs_insn, sizeof(obs));
+
+	GUEST_ASSERT_EQ(exp, obs);
+}
+
+static void guest_main(void)
+{
+	unsigned char *native_hypercall_insn, *hypercall_insn;
+	uint8_t apic_id;
+
+	apic_id = GET_APIC_ID_FIELD(xapic_read_reg(APIC_ID));
+
+	if (is_intel_cpu()) {
+		native_hypercall_insn = &vmx_hypercall_insn;
+		hypercall_insn = &svm_hypercall_insn;
+		svm_do_sched_yield(apic_id);
+	} else if (is_amd_cpu()) {
+		native_hypercall_insn = &svm_hypercall_insn;
+		hypercall_insn = &vmx_hypercall_insn;
+		vmx_do_sched_yield(apic_id);
+	} else {
+		GUEST_ASSERT(0);
+		/* unreachable */
+		return;
+	}
+
+	GUEST_ASSERT(!ud_expected);
+	assert_hypercall_insn(native_hypercall_insn, hypercall_insn);
+	GUEST_DONE();
+}
+
+static void setup_ud_vector(struct kvm_vm *vm)
+{
+	vm_init_descriptor_tables(vm);
+	vcpu_init_descriptor_tables(vm, VCPU_ID);
+	vm_install_exception_handler(vm, UD_VECTOR, guest_ud_handler);
+}
+
+static void enter_guest(struct kvm_vm *vm)
+{
+	struct kvm_run *run;
+	struct ucall uc;
+
+	run = vcpu_state(vm, VCPU_ID);
+
+	vcpu_run(vm, VCPU_ID);
+	switch (get_ucall(vm, VCPU_ID, &uc)) {
+	case UCALL_SYNC:
+		pr_info("%s: %016lx\n", (const char *)uc.args[2], uc.args[3]);
+		break;
+	case UCALL_DONE:
+		return;
+	case UCALL_ABORT:
+		TEST_FAIL("%s at %s:%ld", (const char *)uc.args[0], __FILE__, uc.args[1]);
+	default:
+		TEST_FAIL("Unhandled ucall: %ld\nexit_reason: %u (%s)",
+			  uc.cmd, run->exit_reason, exit_reason_str(run->exit_reason));
+	}
+}
+
+static void test_fix_hypercall(void)
+{
+	struct kvm_vm *vm;
+
+	vm = vm_create_default(VCPU_ID, 0, guest_main);
+	setup_ud_vector(vm);
+
+	ud_expected = false;
+	sync_global_to_guest(vm, ud_expected);
+
+	virt_pg_map(vm, APIC_DEFAULT_GPA, APIC_DEFAULT_GPA);
+
+	enter_guest(vm);
+}
+
+static void test_fix_hypercall_disabled(void)
+{
+	struct kvm_enable_cap cap = {0};
+	struct kvm_vm *vm;
+
+	vm = vm_create_default(VCPU_ID, 0, guest_main);
+	setup_ud_vector(vm);
+
+	cap.cap = KVM_CAP_DISABLE_QUIRKS2;
+	cap.args[0] = KVM_X86_QUIRK_FIX_HYPERCALL_INSN;
+	vm_enable_cap(vm, &cap);
+
+	ud_expected = true;
+	sync_global_to_guest(vm, ud_expected);
+
+	virt_pg_map(vm, APIC_DEFAULT_GPA, APIC_DEFAULT_GPA);
+
+	enter_guest(vm);
+}
+
+int main(void)
+{
+	if (!(kvm_check_cap(KVM_CAP_DISABLE_QUIRKS2) & KVM_X86_QUIRK_FIX_HYPERCALL_INSN)) {
+		print_skip("KVM_X86_QUIRK_HYPERCALL_INSN not supported");
+		exit(KSFT_SKIP);
+	}
+
+	test_fix_hypercall();
+	test_fix_hypercall_disabled();
+}
-- 
2.35.1.723.g4982287a31-goog


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] KVM: x86: Allow userspace to opt out of hypercall patching
  2022-03-16  0:55 ` [PATCH 1/2] KVM: x86: Allow userspace to opt out of " Oliver Upton
@ 2022-03-24 17:44   ` Sean Christopherson
  2022-03-24 17:57     ` Paolo Bonzini
  0 siblings, 1 reply; 15+ messages in thread
From: Sean Christopherson @ 2022-03-24 17:44 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvm, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, David Dunn, Peter Shier

On Wed, Mar 16, 2022, Oliver Upton wrote:
> KVM handles the VMCALL/VMMCALL instructions very strangely. Even though
> both of these instructions really should #UD when executed on the wrong
> vendor's hardware (i.e. VMCALL on SVM, VMMCALL on VMX), KVM replaces the
> guest's instruction with the appropriate instruction for the vendor.
> Nonetheless, older guest kernels without commit c1118b3602c2 ("x86: kvm:
> use alternatives for VMCALL vs. VMMCALL if kernel text is read-only")
> do not patch in the appropriate instruction using alternatives, likely
> motivating KVM's intervention.
> 
> Add a quirk allowing userspace to opt out of hypercall patching.

A quirk may not be appropriate, per Paolo, the whole cross-vendor thing is
intentional.

https://lore.kernel.org/all/20211210222903.3417968-1-seanjc@google.com

> If the quirk is disabled, KVM synthesizes a #UD in the guest.

...

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index d3a9ce07a565..685c4bc453b4 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9291,6 +9291,17 @@ static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt)
>  	char instruction[3];
>  	unsigned long rip = kvm_rip_read(vcpu);
>  
> +	/*
> +	 * If the quirk is disabled, synthesize a #UD and let the guest pick up
> +	 * the pieces.
> +	 */
> +	if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_FIX_HYPERCALL_INSN)) {
> +		ctxt->exception.error_code_valid = false;
> +		ctxt->exception.vector = UD_VECTOR;
> +		ctxt->have_exception = true;
> +		return X86EMUL_PROPAGATE_FAULT;

This should return X86EMUL_UNHANDLEABLE instead of manually injecting a #UD.  That
will also end up generating a #UD in most cases, but will play nice with
KVM_CAP_EXIT_ON_EMULATION_FAILURE.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] KVM: x86: Allow userspace to opt out of hypercall patching
  2022-03-24 17:44   ` Sean Christopherson
@ 2022-03-24 17:57     ` Paolo Bonzini
  2022-03-24 19:05       ` Oliver Upton
  0 siblings, 1 reply; 15+ messages in thread
From: Paolo Bonzini @ 2022-03-24 17:57 UTC (permalink / raw)
  To: Sean Christopherson, Oliver Upton
  Cc: kvm, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	David Dunn, Peter Shier

On 3/24/22 18:44, Sean Christopherson wrote:
> On Wed, Mar 16, 2022, Oliver Upton wrote:
>> KVM handles the VMCALL/VMMCALL instructions very strangely. Even though
>> both of these instructions really should #UD when executed on the wrong
>> vendor's hardware (i.e. VMCALL on SVM, VMMCALL on VMX), KVM replaces the
>> guest's instruction with the appropriate instruction for the vendor.
>> Nonetheless, older guest kernels without commit c1118b3602c2 ("x86: kvm:
>> use alternatives for VMCALL vs. VMMCALL if kernel text is read-only")
>> do not patch in the appropriate instruction using alternatives, likely
>> motivating KVM's intervention.
>>
>> Add a quirk allowing userspace to opt out of hypercall patching.
> 
> A quirk may not be appropriate, per Paolo, the whole cross-vendor thing is
> intentional.
> 
> https://lore.kernel.org/all/20211210222903.3417968-1-seanjc@google.com

It's intentional, but the days of the patching part are over.

KVM should just call emulate_hypercall if called with the wrong opcode 
(which in turn can be quirked away).  That however would be more complex 
to implement because the hypercall path wants to e.g. inject a #UD with 
kvm_queue_exception().

All this makes me want to just apply Oliver's patch.

>> +	if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_FIX_HYPERCALL_INSN)) {
>> +		ctxt->exception.error_code_valid = false;
>> +		ctxt->exception.vector = UD_VECTOR;
>> +		ctxt->have_exception = true;
>> +		return X86EMUL_PROPAGATE_FAULT;
> 
> This should return X86EMUL_UNHANDLEABLE instead of manually injecting a #UD.  That
> will also end up generating a #UD in most cases, but will play nice with
> KVM_CAP_EXIT_ON_EMULATION_FAILURE.

Hmm, not sure about that.  This is not an emulation failure in the sense 
that we don't know what to do.  We know that for this x86 vendor the 
right thing to do is to growl at the guest.

KVM_CAP_EXIT_ON_EMULATION_FAILURE would not have a way to ask KVM to 
invoke the hypercall, anyway, so it's not really possible for userspace 
to do the emulation.

Paolo


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] KVM: x86: Allow userspace to opt out of hypercall patching
  2022-03-24 17:57     ` Paolo Bonzini
@ 2022-03-24 19:05       ` Oliver Upton
  2022-03-25 23:53         ` Sean Christopherson
  0 siblings, 1 reply; 15+ messages in thread
From: Oliver Upton @ 2022-03-24 19:05 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, kvm, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, David Dunn, Peter Shier

On Thu, Mar 24, 2022 at 06:57:18PM +0100, Paolo Bonzini wrote:
> On 3/24/22 18:44, Sean Christopherson wrote:
> > On Wed, Mar 16, 2022, Oliver Upton wrote:
> > > KVM handles the VMCALL/VMMCALL instructions very strangely. Even though
> > > both of these instructions really should #UD when executed on the wrong
> > > vendor's hardware (i.e. VMCALL on SVM, VMMCALL on VMX), KVM replaces the
> > > guest's instruction with the appropriate instruction for the vendor.
> > > Nonetheless, older guest kernels without commit c1118b3602c2 ("x86: kvm:
> > > use alternatives for VMCALL vs. VMMCALL if kernel text is read-only")
> > > do not patch in the appropriate instruction using alternatives, likely
> > > motivating KVM's intervention.
> > > 
> > > Add a quirk allowing userspace to opt out of hypercall patching.
> > 
> > A quirk may not be appropriate, per Paolo, the whole cross-vendor thing is
> > intentional.
> > 
> > https://lore.kernel.org/all/20211210222903.3417968-1-seanjc@google.com
> 
> It's intentional, but the days of the patching part are over.
> 
> KVM should just call emulate_hypercall if called with the wrong opcode
> (which in turn can be quirked away).  That however would be more complex to
> implement because the hypercall path wants to e.g. inject a #UD with
> kvm_queue_exception().
> 
> All this makes me want to just apply Oliver's patch.
> 
> > > +	if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_FIX_HYPERCALL_INSN)) {
> > > +		ctxt->exception.error_code_valid = false;
> > > +		ctxt->exception.vector = UD_VECTOR;
> > > +		ctxt->have_exception = true;
> > > +		return X86EMUL_PROPAGATE_FAULT;
> > 
> > This should return X86EMUL_UNHANDLEABLE instead of manually injecting a #UD.  That
> > will also end up generating a #UD in most cases, but will play nice with
> > KVM_CAP_EXIT_ON_EMULATION_FAILURE.

Sean and I were looking at this together right now, and it turns out
that returning X86EMUL_UNHANDLEABLE at this point will unconditionally
bounce out to userspace.

x86_decode_emulated_instruction() would need to be the spot we bail to
guard these exits with the CAP.

> Hmm, not sure about that.  This is not an emulation failure in the sense
> that we don't know what to do.  We know that for this x86 vendor the right
> thing to do is to growl at the guest.
> 
> KVM_CAP_EXIT_ON_EMULATION_FAILURE would not have a way to ask KVM to invoke
> the hypercall, anyway, so it's not really possible for userspace to do the
> emulation.

Userspace could theoretically patch the hypercall itself and retry execution.
But IMO, userspace should just leave the quirk enabled and accept the default
KVM behavior if it still wants hypercall patching.

--
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] selftests: KVM: Test KVM_X86_QUIRK_FIX_HYPERCALL_INSN
  2022-03-16  0:55 ` [PATCH 2/2] selftests: KVM: Test KVM_X86_QUIRK_FIX_HYPERCALL_INSN Oliver Upton
@ 2022-03-24 19:09   ` Oliver Upton
  0 siblings, 0 replies; 15+ messages in thread
From: Oliver Upton @ 2022-03-24 19:09 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, David Dunn, Peter Shier

I realized there was some leftover debugging residue in this test, and
some blatantly obvious copy/pasting. I'll address in v2, but as an FYI:

On Wed, Mar 16, 2022 at 12:55:38AM +0000, Oliver Upton wrote:
> Add a test that asserts KVM rewrites guest hypercall instructions to
> match the running architecture (VMCALL on VMX, VMMCALL on SVM).
> Additionally, test that with the quirk disabled, KVM no longer rewrites
> guest instructions and instead injects a #UD.
> 
> Signed-off-by: Oliver Upton <oupton@google.com>
> ---
>  tools/testing/selftests/kvm/.gitignore        |   1 +
>  tools/testing/selftests/kvm/Makefile          |   1 +
>  .../selftests/kvm/x86_64/fix_hypercall_test.c | 170 ++++++++++++++++++
>  3 files changed, 172 insertions(+)
>  create mode 100644 tools/testing/selftests/kvm/x86_64/fix_hypercall_test.c
> 
> diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
> index 9b67343dc4ab..1f1b6c978bf7 100644
> --- a/tools/testing/selftests/kvm/.gitignore
> +++ b/tools/testing/selftests/kvm/.gitignore
> @@ -15,6 +15,7 @@
>  /x86_64/debug_regs
>  /x86_64/evmcs_test
>  /x86_64/emulator_error_test
> +/x86_64/fix_hypercall_test
>  /x86_64/get_msr_index_features
>  /x86_64/kvm_clock_test
>  /x86_64/kvm_pv_test
> diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
> index 6d69e196f1b7..c9cdbd248727 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -48,6 +48,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/cr4_cpuid_sync_test
>  TEST_GEN_PROGS_x86_64 += x86_64/get_msr_index_features
>  TEST_GEN_PROGS_x86_64 += x86_64/evmcs_test
>  TEST_GEN_PROGS_x86_64 += x86_64/emulator_error_test
> +TEST_GEN_PROGS_x86_64 += x86_64/fix_hypercall_test
>  TEST_GEN_PROGS_x86_64 += x86_64/hyperv_clock
>  TEST_GEN_PROGS_x86_64 += x86_64/hyperv_cpuid
>  TEST_GEN_PROGS_x86_64 += x86_64/hyperv_features
> diff --git a/tools/testing/selftests/kvm/x86_64/fix_hypercall_test.c b/tools/testing/selftests/kvm/x86_64/fix_hypercall_test.c
> new file mode 100644
> index 000000000000..1f5c32146f3d
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/x86_64/fix_hypercall_test.c
> @@ -0,0 +1,170 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2020, Google LLC.
> + *
> + * Tests for KVM paravirtual feature disablement

Oops.

[...]

> +	case UCALL_SYNC:
> +		pr_info("%s: %016lx\n", (const char *)uc.args[2], uc.args[3]);
> +		break;

This was for debugging, there are no ucalls in the guest any more.

--
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] KVM: x86: Allow userspace to opt out of hypercall patching
  2022-03-24 19:05       ` Oliver Upton
@ 2022-03-25 23:53         ` Sean Christopherson
  2022-03-28 17:28           ` Oliver Upton
  0 siblings, 1 reply; 15+ messages in thread
From: Sean Christopherson @ 2022-03-25 23:53 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Paolo Bonzini, kvm, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, David Dunn, Peter Shier

On Thu, Mar 24, 2022, Oliver Upton wrote:
> On Thu, Mar 24, 2022 at 06:57:18PM +0100, Paolo Bonzini wrote:
> > On 3/24/22 18:44, Sean Christopherson wrote:
> > > On Wed, Mar 16, 2022, Oliver Upton wrote:
> > > > KVM handles the VMCALL/VMMCALL instructions very strangely. Even though
> > > > both of these instructions really should #UD when executed on the wrong
> > > > vendor's hardware (i.e. VMCALL on SVM, VMMCALL on VMX), KVM replaces the
> > > > guest's instruction with the appropriate instruction for the vendor.
> > > > Nonetheless, older guest kernels without commit c1118b3602c2 ("x86: kvm:
> > > > use alternatives for VMCALL vs. VMMCALL if kernel text is read-only")
> > > > do not patch in the appropriate instruction using alternatives, likely
> > > > motivating KVM's intervention.
> > > > 
> > > > Add a quirk allowing userspace to opt out of hypercall patching.
> > > 
> > > A quirk may not be appropriate, per Paolo, the whole cross-vendor thing is
> > > intentional.
> > > 
> > > https://lore.kernel.org/all/20211210222903.3417968-1-seanjc@google.com
> > 
> > It's intentional, but the days of the patching part are over.
> > 
> > KVM should just call emulate_hypercall if called with the wrong opcode
> > (which in turn can be quirked away).  That however would be more complex to
> > implement because the hypercall path wants to e.g. inject a #UD with
> > kvm_queue_exception().
> > 
> > All this makes me want to just apply Oliver's patch.
> > 
> > > > +	if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_FIX_HYPERCALL_INSN)) {
> > > > +		ctxt->exception.error_code_valid = false;
> > > > +		ctxt->exception.vector = UD_VECTOR;
> > > > +		ctxt->have_exception = true;
> > > > +		return X86EMUL_PROPAGATE_FAULT;
> > > 
> > > This should return X86EMUL_UNHANDLEABLE instead of manually injecting a #UD.  That
> > > will also end up generating a #UD in most cases, but will play nice with
> > > KVM_CAP_EXIT_ON_EMULATION_FAILURE.
> 
> Sean and I were looking at this together right now, and it turns out
> that returning X86EMUL_UNHANDLEABLE at this point will unconditionally
> bounce out to userspace.
> 
> x86_decode_emulated_instruction() would need to be the spot we bail to
> guard these exits with the CAP.

I've spent waaay too much time thinking about this...

TL;DR: I'm in favor of applying the patch as-is.

My objection to manually injecting the #UD is that there's no guarantee that KVM
is actually handling a #UD.  It's largely a moot point, as KVM barfs on VMCALL/VMMCALL
if it's _not_ from a #UD, e.g. KVM hangs the guest if it does a VMCALL when KVM is
emulating due to lack of unrestricted guest.  Forcing #UD is probably a net positive
in that case, as it will cause KVM to ultimately fail with "emulation error" and
bail to userspace.

It is possible to handle this in decode (idea below), but it will set us up for
pain later.  If KVM ever does gain support for truly emulating hypercall, then as
Paolo said, the question of whether to emulate the "wrong" hypercall is orthogonal
to whether or not to patch.  The correct way to check if the "wrong" hypercall
should be emulated would be to query the vCPU vendor via guest's CPUID, at which
point we _do_ want to get into em_hypercall() on UD to do the CPUID check even if
the quirk is disabled.  So I agree with Paolo, make it a quirk that guards the
patching, and document that because of KVM's deficiencies, that also happens to
mean that encountering the "wrong" hypercall is fatal to the guest.  Maybe fodder
for KVM's new vCPU errata? :-)  If we fix that erratum, then the quirk can be
modified to only skip the patching.

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index be83c9c8482d..29abeaf605e2 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -5248,9 +5248,15 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len, int

        ctxt->execute = opcode.u.execute;

-       if (unlikely(emulation_type & EMULTYPE_TRAP_UD) &&
-           likely(!(ctxt->d & EmulateOnUD)))
-               return EMULATION_FAILED;
+       if (unlikely(emulation_type & EMULTYPE_TRAP_UD)) {
+               if (likely(!(ctxt->d & EmulateOnUD)))
+                       return EMULATION_FAILED;
+
+               /* blah blah blah */
+               if (ctxt->execute == em_hypercall &&
+                   !ctxt->has_quirk_fix_hypercall)
+                       return EMULATION_FAILED;
+       }

        if (unlikely(ctxt->d &
            (NotImpl|Stack|Op3264|Sse|Mmx|Intercept|CheckPerm|NearBranch|
diff --git a/arch/x86/kvm/kvm_emulate.h b/arch/x86/kvm/kvm_emulate.h
index 1cbd46cf71f9..35d2f20d368c 100644
--- a/arch/x86/kvm/kvm_emulate.h
+++ b/arch/x86/kvm/kvm_emulate.h
@@ -305,6 +305,8 @@ struct x86_emulate_ctxt {
        void *vcpu;
        const struct x86_emulate_ops *ops;

+       bool has_quirk_fix_hypercall;
+
        /* Register state before/after emulation. */
        unsigned long eflags;
        unsigned long eip; /* eip before instruction emulation */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 685c4bc453b4..832f85e72978 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7915,6 +7915,8 @@ static struct x86_emulate_ctxt *alloc_emulate_ctxt(struct kvm_vcpu *vcpu)

        ctxt->vcpu = vcpu;
        ctxt->ops = &emulate_ops;
+       ctxt->has_quirk_fix_hypercall =
+               kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_FIX_HYPERCALL_INSN);
        vcpu->arch.emulate_ctxt = ctxt;

        return ctxt;
@@ -9291,16 +9293,8 @@ static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt)
        char instruction[3];
        unsigned long rip = kvm_rip_read(vcpu);

-       /*
-        * If the quirk is disabled, synthesize a #UD and let the guest pick up
-        * the pieces.
-        */
-       if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_FIX_HYPERCALL_INSN)) {
-               ctxt->exception.error_code_valid = false;
-               ctxt->exception.vector = UD_VECTOR;
-               ctxt->have_exception = true;
-               return X86EMUL_PROPAGATE_FAULT;
-       }
+       if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_FIX_HYPERCALL_INSN))
+               return X86EMUL_CONTINUE;

        static_call(kvm_x86_patch_hypercall)(vcpu, instruction);



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] KVM: x86: Allow userspace to opt out of hypercall patching
  2022-03-25 23:53         ` Sean Christopherson
@ 2022-03-28 17:28           ` Oliver Upton
  2022-03-28 18:28             ` Sean Christopherson
  0 siblings, 1 reply; 15+ messages in thread
From: Oliver Upton @ 2022-03-28 17:28 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, David Dunn, Peter Shier

On Fri, Mar 25, 2022 at 11:53:05PM +0000, Sean Christopherson wrote:
> On Thu, Mar 24, 2022, Oliver Upton wrote:
> > On Thu, Mar 24, 2022 at 06:57:18PM +0100, Paolo Bonzini wrote:
> > > On 3/24/22 18:44, Sean Christopherson wrote:
> > > > On Wed, Mar 16, 2022, Oliver Upton wrote:
> > > > > KVM handles the VMCALL/VMMCALL instructions very strangely. Even though
> > > > > both of these instructions really should #UD when executed on the wrong
> > > > > vendor's hardware (i.e. VMCALL on SVM, VMMCALL on VMX), KVM replaces the
> > > > > guest's instruction with the appropriate instruction for the vendor.
> > > > > Nonetheless, older guest kernels without commit c1118b3602c2 ("x86: kvm:
> > > > > use alternatives for VMCALL vs. VMMCALL if kernel text is read-only")
> > > > > do not patch in the appropriate instruction using alternatives, likely
> > > > > motivating KVM's intervention.
> > > > > 
> > > > > Add a quirk allowing userspace to opt out of hypercall patching.
> > > > 
> > > > A quirk may not be appropriate, per Paolo, the whole cross-vendor thing is
> > > > intentional.
> > > > 
> > > > https://lore.kernel.org/all/20211210222903.3417968-1-seanjc@google.com
> > > 
> > > It's intentional, but the days of the patching part are over.
> > > 
> > > KVM should just call emulate_hypercall if called with the wrong opcode
> > > (which in turn can be quirked away).  That however would be more complex to
> > > implement because the hypercall path wants to e.g. inject a #UD with
> > > kvm_queue_exception().
> > > 
> > > All this makes me want to just apply Oliver's patch.
> > > 
> > > > > +	if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_FIX_HYPERCALL_INSN)) {
> > > > > +		ctxt->exception.error_code_valid = false;
> > > > > +		ctxt->exception.vector = UD_VECTOR;
> > > > > +		ctxt->have_exception = true;
> > > > > +		return X86EMUL_PROPAGATE_FAULT;
> > > > 
> > > > This should return X86EMUL_UNHANDLEABLE instead of manually injecting a #UD.  That
> > > > will also end up generating a #UD in most cases, but will play nice with
> > > > KVM_CAP_EXIT_ON_EMULATION_FAILURE.
> > 
> > Sean and I were looking at this together right now, and it turns out
> > that returning X86EMUL_UNHANDLEABLE at this point will unconditionally
> > bounce out to userspace.
> > 
> > x86_decode_emulated_instruction() would need to be the spot we bail to
> > guard these exits with the CAP.
> 
> I've spent waaay too much time thinking about this...
> 
> TL;DR: I'm in favor of applying the patch as-is.
> 
> My objection to manually injecting the #UD is that there's no guarantee that KVM
> is actually handling a #UD.  It's largely a moot point, as KVM barfs on VMCALL/VMMCALL
> if it's _not_ from a #UD, e.g. KVM hangs the guest if it does a VMCALL when KVM is
> emulating due to lack of unrestricted guest.  Forcing #UD is probably a net positive
> in that case, as it will cause KVM to ultimately fail with "emulation error" and
> bail to userspace.
> 
> It is possible to handle this in decode (idea below), but it will set us up for
> pain later.  If KVM ever does gain support for truly emulating hypercall

There was another annoyance that motivated me to sidestep emulation
altogether.

'Correct' emulation (or whatever we decide to call what KVM does) of the hypercall
instruction would require that we actually inform the emulator about nested for
both vendor calls. And by that I mean both {svm,vmx}_check_intercept would need
to correctly handle both VMCALL/VMMCALL. The one nice thing about hypercall
patching is that we could keep L1 oblivious, as we would have already rewritten
the instruction before reflecting the exit to L1.

While I was looking at #UD under nested for this issue, I noticed:

Isn't there a subtle inversion on #UD intercepts for nVMX? L1 gets first dibs
on #UD, even though it is possible that L0 was emulating an instruction not
present in hardware (like RDPID). If L1 passed through RDPID the #UD
should not be reflected to L1. I believe this would require that we make
the emulator aware of nVMX which sounds like a science project on its
own.

Do we write this off as another erratum of KVM's (virtual) hardware on VMX? :)

--
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] KVM: x86: Allow userspace to opt out of hypercall patching
  2022-03-28 17:28           ` Oliver Upton
@ 2022-03-28 18:28             ` Sean Christopherson
  2022-08-24  9:34               ` Maxim Levitsky
  0 siblings, 1 reply; 15+ messages in thread
From: Sean Christopherson @ 2022-03-28 18:28 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Paolo Bonzini, kvm, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, David Dunn, Peter Shier

On Mon, Mar 28, 2022, Oliver Upton wrote:
> While I was looking at #UD under nested for this issue, I noticed:
> 
> Isn't there a subtle inversion on #UD intercepts for nVMX? L1 gets first dibs
> on #UD, even though it is possible that L0 was emulating an instruction not
> present in hardware (like RDPID). If L1 passed through RDPID the #UD
> should not be reflected to L1.

Yes, it's a known bug.

> I believe this would require that we make the emulator aware of nVMX which
> sounds like a science project on its own.

I don't think it would require any new awareness in the emulator proper, KVM
would "just" need to ensure it properly morphs the resulting reflected #UD to a
nested VM-Exit if the emulator doesn't "handle" the #UD.  In theory, that should
Just Work...

> Do we write this off as another erratum of KVM's (virtual) hardware on VMX? :)

I don't think we write it off entirely, but it's definitely on the backburner
because there are so precious few cases where KVM emulates on #UD.  And for good
reason, e.g. the RDPID case takes an instruction that exists purely to optimize
certain flows and turns them into dreadfully sloooow paths.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] KVM: x86: Allow userspace to opt out of hypercall patching
  2022-03-28 18:28             ` Sean Christopherson
@ 2022-08-24  9:34               ` Maxim Levitsky
  2022-08-24 14:43                 ` Sean Christopherson
  0 siblings, 1 reply; 15+ messages in thread
From: Maxim Levitsky @ 2022-08-24  9:34 UTC (permalink / raw)
  To: Sean Christopherson, Oliver Upton
  Cc: Paolo Bonzini, kvm, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, David Dunn, Peter Shier

On Mon, 2022-03-28 at 18:28 +0000, Sean Christopherson wrote:
> On Mon, Mar 28, 2022, Oliver Upton wrote:
> > While I was looking at #UD under nested for this issue, I noticed:
> > 
> > Isn't there a subtle inversion on #UD intercepts for nVMX? L1 gets first dibs
> > on #UD, even though it is possible that L0 was emulating an instruction not
> > present in hardware (like RDPID). If L1 passed through RDPID the #UD
> > should not be reflected to L1.
> 
> Yes, it's a known bug.
> 
> > I believe this would require that we make the emulator aware of nVMX which
> > sounds like a science project on its own.
> 
> I don't think it would require any new awareness in the emulator proper, KVM
> would "just" need to ensure it properly morphs the resulting reflected #UD to a
> nested VM-Exit if the emulator doesn't "handle" the #UD.  In theory, that should
> Just Work...
> 
> > Do we write this off as another erratum of KVM's (virtual) hardware on VMX? :)
> 
> I don't think we write it off entirely, but it's definitely on the backburner
> because there are so precious few cases where KVM emulates on #UD.  And for good
> reason, e.g. the RDPID case takes an instruction that exists purely to optimize
> certain flows and turns them into dreadfully sloooow paths.
> 

I noticed that 'fix_hypercall_test' selftest fails if run in a VM. The reason is
that L0 patches the hypercall before L1 sees it so it can't really do anything
about it.

Do you think we can always stop patching hypercalls for the nested guest regardless
of the quirk, or that too will be considered breaking backwards compatability?

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] KVM: x86: Allow userspace to opt out of hypercall patching
  2022-08-24  9:34               ` Maxim Levitsky
@ 2022-08-24 14:43                 ` Sean Christopherson
  2022-08-24 15:06                   ` Maxim Levitsky
  0 siblings, 1 reply; 15+ messages in thread
From: Sean Christopherson @ 2022-08-24 14:43 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Oliver Upton, Paolo Bonzini, kvm, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, David Dunn, Peter Shier

On Wed, Aug 24, 2022, Maxim Levitsky wrote:
> On Mon, 2022-03-28 at 18:28 +0000, Sean Christopherson wrote:
> > On Mon, Mar 28, 2022, Oliver Upton wrote:
> > > While I was looking at #UD under nested for this issue, I noticed:
> > > 
> > > Isn't there a subtle inversion on #UD intercepts for nVMX? L1 gets first dibs
> > > on #UD, even though it is possible that L0 was emulating an instruction not
> > > present in hardware (like RDPID). If L1 passed through RDPID the #UD
> > > should not be reflected to L1.
> > 
> > Yes, it's a known bug.
> > 
> > > I believe this would require that we make the emulator aware of nVMX which
> > > sounds like a science project on its own.
> > 
> > I don't think it would require any new awareness in the emulator proper, KVM
> > would "just" need to ensure it properly morphs the resulting reflected #UD to a
> > nested VM-Exit if the emulator doesn't "handle" the #UD.  In theory, that should
> > Just Work...
> > 
> > > Do we write this off as another erratum of KVM's (virtual) hardware on VMX? :)
> > 
> > I don't think we write it off entirely, but it's definitely on the backburner
> > because there are so precious few cases where KVM emulates on #UD.  And for good
> > reason, e.g. the RDPID case takes an instruction that exists purely to optimize
> > certain flows and turns them into dreadfully sloooow paths.
> > 
> 
> I noticed that 'fix_hypercall_test' selftest fails if run in a VM. The reason is
> that L0 patches the hypercall before L1 sees it so it can't really do anything
> about it.
> 
> Do you think we can always stop patching hypercalls for the nested guest regardless
> of the quirk, or that too will be considered breaking backwards compatability?

Heh, go run it on Intel, problem solved ;-)

As discussed last year[*], it's impossible to get this right in all cases, ignoring
the fact that patching in the first place is arguably wrong.  E.g. if KVM is running
on AMD hardware and L0 exposes an Intel vCPU to L1, then it sadly becomes KVM's
responsibility to patch L2 because from L1's perspective, a #UD on Intel's VMCALL
in L2 is spurious.

Regardless of what path we take, I do think we should align VMX and SVM on exception
intercept behavior.

[*] https://lore.kernel.org/all/YEZUhbBtNjWh0Zka@google.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] KVM: x86: Allow userspace to opt out of hypercall patching
  2022-08-24 14:43                 ` Sean Christopherson
@ 2022-08-24 15:06                   ` Maxim Levitsky
  2022-08-24 17:15                     ` Paolo Bonzini
  2022-08-24 18:40                     ` Sean Christopherson
  0 siblings, 2 replies; 15+ messages in thread
From: Maxim Levitsky @ 2022-08-24 15:06 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Oliver Upton, Paolo Bonzini, kvm, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, David Dunn, Peter Shier

On Wed, 2022-08-24 at 14:43 +0000, Sean Christopherson wrote:
> On Wed, Aug 24, 2022, Maxim Levitsky wrote:
> > On Mon, 2022-03-28 at 18:28 +0000, Sean Christopherson wrote:
> > > On Mon, Mar 28, 2022, Oliver Upton wrote:
> > > > While I was looking at #UD under nested for this issue, I noticed:
> > > > 
> > > > Isn't there a subtle inversion on #UD intercepts for nVMX? L1 gets first dibs
> > > > on #UD, even though it is possible that L0 was emulating an instruction not
> > > > present in hardware (like RDPID). If L1 passed through RDPID the #UD
> > > > should not be reflected to L1.
> > > 
> > > Yes, it's a known bug.
> > > 
> > > > I believe this would require that we make the emulator aware of nVMX which
> > > > sounds like a science project on its own.
> > > 
> > > I don't think it would require any new awareness in the emulator proper, KVM
> > > would "just" need to ensure it properly morphs the resulting reflected #UD to a
> > > nested VM-Exit if the emulator doesn't "handle" the #UD.  In theory, that should
> > > Just Work...
> > > 
> > > > Do we write this off as another erratum of KVM's (virtual) hardware on VMX? :)
> > > 
> > > I don't think we write it off entirely, but it's definitely on the backburner
> > > because there are so precious few cases where KVM emulates on #UD.  And for good
> > > reason, e.g. the RDPID case takes an instruction that exists purely to optimize
> > > certain flows and turns them into dreadfully sloooow paths.
> > > 
> > 
> > I noticed that 'fix_hypercall_test' selftest fails if run in a VM. The reason is
> > that L0 patches the hypercall before L1 sees it so it can't really do anything
> > about it.
> > 
> > Do you think we can always stop patching hypercalls for the nested guest regardless
> > of the quirk, or that too will be considered breaking backwards compatability?
> 
> Heh, go run it on Intel, problem solved ;-)
> 
> As discussed last year[*], it's impossible to get this right in all cases, ignoring
> the fact that patching in the first place is arguably wrong.  E.g. if KVM is running
> on AMD hardware and L0 exposes an Intel vCPU to L1, then it sadly becomes KVM's
> responsibility to patch L2 because from L1's perspective, a #UD on Intel's VMCALL
> in L2 is spurious.
> 
> Regardless of what path we take, I do think we should align VMX and SVM on exception
> intercept behavior.

Maybe then we should at least skip the unit test if running nested (should be easy to check the hypervisor
cpuid)?

Oh well, I do understand you that the whole 'patching' thing is one big mess :(

I wonder how hard it will be to ask Qemu to disable this quirk....

Best regards,
	Maxim Levitsky

> 
> [*] https://lore.kernel.org/all/YEZUhbBtNjWh0Zka@google.com
> 



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] KVM: x86: Allow userspace to opt out of hypercall patching
  2022-08-24 15:06                   ` Maxim Levitsky
@ 2022-08-24 17:15                     ` Paolo Bonzini
  2022-08-24 18:40                     ` Sean Christopherson
  1 sibling, 0 replies; 15+ messages in thread
From: Paolo Bonzini @ 2022-08-24 17:15 UTC (permalink / raw)
  To: Maxim Levitsky, Sean Christopherson
  Cc: Oliver Upton, kvm, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, David Dunn, Peter Shier

On 8/24/22 17:06, Maxim Levitsky wrote:
> I wonder how hard it will be to ask Qemu to disable this quirk....

I think we should just do it.  Send a patch. :)

Paolo


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] KVM: x86: Allow userspace to opt out of hypercall patching
  2022-08-24 15:06                   ` Maxim Levitsky
  2022-08-24 17:15                     ` Paolo Bonzini
@ 2022-08-24 18:40                     ` Sean Christopherson
  1 sibling, 0 replies; 15+ messages in thread
From: Sean Christopherson @ 2022-08-24 18:40 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Oliver Upton, Paolo Bonzini, kvm, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, David Dunn, Peter Shier

On Wed, Aug 24, 2022, Maxim Levitsky wrote:
> On Wed, 2022-08-24 at 14:43 +0000, Sean Christopherson wrote:
> > On Wed, Aug 24, 2022, Maxim Levitsky wrote:
> > > I noticed that 'fix_hypercall_test' selftest fails if run in a VM. The reason is
> > > that L0 patches the hypercall before L1 sees it so it can't really do anything
> > > about it.
> > > 
> > > Do you think we can always stop patching hypercalls for the nested guest regardless
> > > of the quirk, or that too will be considered breaking backwards compatability?
> > 
> > Heh, go run it on Intel, problem solved ;-)
> > 
> > As discussed last year[*], it's impossible to get this right in all cases, ignoring
> > the fact that patching in the first place is arguably wrong.  E.g. if KVM is running
> > on AMD hardware and L0 exposes an Intel vCPU to L1, then it sadly becomes KVM's
> > responsibility to patch L2 because from L1's perspective, a #UD on Intel's VMCALL
> > in L2 is spurious.
> > 
> > Regardless of what path we take, I do think we should align VMX and SVM on exception
> > intercept behavior.
> 
> Maybe then we should at least skip the unit test if running nested (should be
> easy to check the hypervisor cpuid)?

My preference is to keep the test as is.  It's not completely useless in a VM,
e.g. Intel works (currently), non-KVM VMs may or may not work, and VMMs that disable
the quirk in L0 will also work.

max_guest_memory_test is in a similar boat.  Running that in L1 is not recommended
as KVM's shadow paging just can't keep up.  But that doesn't mean that the test
should _never_ be run in L1.

If we have the test skip itself, then opting in requires a code change.  On the
other hand, skipping the test in whatever script is being used to run selftests
is easy enough, e.g. `grep hypervisor /proc/cpuinfo`.  IMO running test via `make`
is a complete mess and should be avoided anyways :-)

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2022-08-24 18:40 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-16  0:55 [PATCH 0/2] KVM: x86: Allow opt out of guest hypercall patching Oliver Upton
2022-03-16  0:55 ` [PATCH 1/2] KVM: x86: Allow userspace to opt out of " Oliver Upton
2022-03-24 17:44   ` Sean Christopherson
2022-03-24 17:57     ` Paolo Bonzini
2022-03-24 19:05       ` Oliver Upton
2022-03-25 23:53         ` Sean Christopherson
2022-03-28 17:28           ` Oliver Upton
2022-03-28 18:28             ` Sean Christopherson
2022-08-24  9:34               ` Maxim Levitsky
2022-08-24 14:43                 ` Sean Christopherson
2022-08-24 15:06                   ` Maxim Levitsky
2022-08-24 17:15                     ` Paolo Bonzini
2022-08-24 18:40                     ` Sean Christopherson
2022-03-16  0:55 ` [PATCH 2/2] selftests: KVM: Test KVM_X86_QUIRK_FIX_HYPERCALL_INSN Oliver Upton
2022-03-24 19:09   ` Oliver Upton

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.