KVM Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH 1/2] kvm: x86: Allow userspace to handle emulation errors
@ 2021-04-16 13:18 Aaron Lewis
  2021-04-16 13:18 ` [PATCH 2/2] selftests: kvm: Allows " Aaron Lewis
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Aaron Lewis @ 2021-04-16 13:18 UTC (permalink / raw)
  To: david.edmondson; +Cc: jmattson, seanjc, kvm, Aaron Lewis

Add a fallback mechanism to the in-kernel instruction emulator that
allows userspace the opportunity to process an instruction the emulator
was unable to.  When the in-kernel instruction emulator fails to process
an instruction it will either inject a #UD into the guest or exit to
userspace with exit reason KVM_INTERNAL_ERROR.  This is because it does
not know how to proceed in an appropriate manner.  This feature lets
userspace get involved to see if it can figure out a better path
forward.

Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Change-Id: If9876bc73d26f6c3ff9a8bce177c2fc6f160e629
---
 Documentation/virt/kvm/api.rst  | 16 ++++++++++++++++
 arch/x86/include/asm/kvm_host.h |  6 ++++++
 arch/x86/kvm/x86.c              | 33 +++++++++++++++++++++++++++++----
 include/uapi/linux/kvm.h        | 20 ++++++++++++++++++++
 tools/include/uapi/linux/kvm.h  | 20 ++++++++++++++++++++
 5 files changed, 91 insertions(+), 4 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 307f2fcf1b02..f8278e893fbe 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6233,6 +6233,22 @@ KVM_RUN_BUS_LOCK flag is used to distinguish between them.
 This capability can be used to check / enable 2nd DAWR feature provided
 by POWER10 processor.
 
+7.24 KVM_CAP_EXIT_ON_EMULATION_FAILURE
+--------------------------------------
+
+:Architectures: x86
+:Parameters: args[0] whether the feature should be enabled or not
+
+With this capability enabled, the in-kernel instruction emulator packs the exit
+struct of KVM_INTERNAL_ERROR with the instruction length and instruction bytes
+when an error occurs while emulating an instruction.  This allows userspace to
+then take a look at the instruction and see if it is able to handle it more
+gracefully than the in-kernel emulator.
+
+When this capability is enabled use the emulation_failure struct instead of the
+internal struct for the exit struct.  They have the same layout, but the
+emulation_failure struct matches the content better.
+
 8. Other capabilities.
 ======================
 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3768819693e5..07235d08e976 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1049,6 +1049,12 @@ struct kvm_arch {
 	bool exception_payload_enabled;
 
 	bool bus_lock_detection_enabled;
+	/*
+	 * If exit_on_emulation_error is set, and the in-kernel instruction
+	 * emulator fails to emulate an instruction, allow userspace
+	 * the opportunity to look at it.
+	 */
+	bool exit_on_emulation_error;
 
 	/* Deflect RDMSR and WRMSR to user space when they trigger a #GP */
 	u32 user_space_msr_mask;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index eca63625aee4..f9a207f815fb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3771,6 +3771,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_X86_USER_SPACE_MSR:
 	case KVM_CAP_X86_MSR_FILTER:
 	case KVM_CAP_ENFORCE_PV_FEATURE_CPUID:
+	case KVM_CAP_EXIT_ON_EMULATION_FAILURE:
 		r = 1;
 		break;
 #ifdef CONFIG_KVM_XEN
@@ -5357,6 +5358,10 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 			kvm->arch.bus_lock_detection_enabled = true;
 		r = 0;
 		break;
+	case KVM_CAP_EXIT_ON_EMULATION_FAILURE:
+		kvm->arch.exit_on_emulation_error = cap->args[0];
+		r = 0;
+		break;
 	default:
 		r = -EINVAL;
 		break;
@@ -7119,8 +7124,29 @@ void kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip)
 }
 EXPORT_SYMBOL_GPL(kvm_inject_realmode_interrupt);
 
+static void prepare_emulation_failure_exit(struct kvm_vcpu *vcpu)
+{
+	struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt;
+	u64 insn_size = ctxt->fetch.end - ctxt->fetch.data;
+	struct kvm *kvm = vcpu->kvm;
+
+	vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+	vcpu->run->emulation_failure.suberror = KVM_INTERNAL_ERROR_EMULATION;
+	vcpu->run->emulation_failure.ndata = 0;
+	if (kvm->arch.exit_on_emulation_error && insn_size > 0) {
+		vcpu->run->emulation_failure.ndata = 3;
+		vcpu->run->emulation_failure.flags =
+			KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES;
+		vcpu->run->emulation_failure.insn_size = insn_size;
+		memcpy(vcpu->run->emulation_failure.insn_bytes,
+		       ctxt->fetch.data, sizeof(ctxt->fetch.data));
+	}
+}
+
 static int handle_emulation_failure(struct kvm_vcpu *vcpu, int emulation_type)
 {
+	struct kvm *kvm = vcpu->kvm;
+
 	++vcpu->stat.insn_emulation_fail;
 	trace_kvm_emulate_insn_failed(vcpu);
 
@@ -7129,10 +7155,9 @@ static int handle_emulation_failure(struct kvm_vcpu *vcpu, int emulation_type)
 		return 1;
 	}
 
-	if (emulation_type & EMULTYPE_SKIP) {
-		vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
-		vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
-		vcpu->run->internal.ndata = 0;
+	if (kvm->arch.exit_on_emulation_error ||
+	    (emulation_type & EMULTYPE_SKIP)) {
+		prepare_emulation_failure_exit(vcpu);
 		return 0;
 	}
 
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index f6afee209620..7c77099235b2 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -279,6 +279,17 @@ struct kvm_xen_exit {
 /* Encounter unexpected vm-exit reason */
 #define KVM_INTERNAL_ERROR_UNEXPECTED_EXIT_REASON	4
 
+/*
+ * When using the suberror KVM_INTERNAL_ERROR_EMULATION, these flags are used
+ * to describe what is contained in the exit struct.  The flags are used to
+ * describe it's contents, and the contents should be in ascending numerical
+ * order of the flag values.  For example, if the flag
+ * KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES is set, the instruction
+ * length and instruction bytes would be expected to show up first because this
+ * flag has the lowest numerical value (1) of all the other flags.
+ */
+#define KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES (1ULL << 0)
+
 /* for KVM_RUN, returned by mmap(vcpu_fd, offset=0) */
 struct kvm_run {
 	/* in */
@@ -382,6 +393,14 @@ struct kvm_run {
 			__u32 ndata;
 			__u64 data[16];
 		} internal;
+		/* KVM_EXIT_INTERNAL_ERROR, too (not 2) */
+		struct {
+			__u32 suberror;
+			__u32 ndata;
+			__u64 flags;
+			__u8  insn_size;
+			__u8  insn_bytes[15];
+		} emulation_failure;
 		/* KVM_EXIT_OSI */
 		struct {
 			__u64 gprs[32];
@@ -1078,6 +1097,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_DIRTY_LOG_RING 192
 #define KVM_CAP_X86_BUS_LOCK_EXIT 193
 #define KVM_CAP_PPC_DAWR1 194
+#define KVM_CAP_EXIT_ON_EMULATION_FAILURE 195
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h
index f6afee209620..7c77099235b2 100644
--- a/tools/include/uapi/linux/kvm.h
+++ b/tools/include/uapi/linux/kvm.h
@@ -279,6 +279,17 @@ struct kvm_xen_exit {
 /* Encounter unexpected vm-exit reason */
 #define KVM_INTERNAL_ERROR_UNEXPECTED_EXIT_REASON	4
 
+/*
+ * When using the suberror KVM_INTERNAL_ERROR_EMULATION, these flags are used
+ * to describe what is contained in the exit struct.  The flags are used to
+ * describe it's contents, and the contents should be in ascending numerical
+ * order of the flag values.  For example, if the flag
+ * KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES is set, the instruction
+ * length and instruction bytes would be expected to show up first because this
+ * flag has the lowest numerical value (1) of all the other flags.
+ */
+#define KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES (1ULL << 0)
+
 /* for KVM_RUN, returned by mmap(vcpu_fd, offset=0) */
 struct kvm_run {
 	/* in */
@@ -382,6 +393,14 @@ struct kvm_run {
 			__u32 ndata;
 			__u64 data[16];
 		} internal;
+		/* KVM_EXIT_INTERNAL_ERROR, too (not 2) */
+		struct {
+			__u32 suberror;
+			__u32 ndata;
+			__u64 flags;
+			__u8  insn_size;
+			__u8  insn_bytes[15];
+		} emulation_failure;
 		/* KVM_EXIT_OSI */
 		struct {
 			__u64 gprs[32];
@@ -1078,6 +1097,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_DIRTY_LOG_RING 192
 #define KVM_CAP_X86_BUS_LOCK_EXIT 193
 #define KVM_CAP_PPC_DAWR1 194
+#define KVM_CAP_EXIT_ON_EMULATION_FAILURE 195
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.31.1.368.gbe11c130af-goog


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 2/2] selftests: kvm: Allows userspace to handle emulation errors.
  2021-04-16 13:18 [PATCH 1/2] kvm: x86: Allow userspace to handle emulation errors Aaron Lewis
@ 2021-04-16 13:18 ` Aaron Lewis
  2021-04-19 12:41 ` [PATCH 1/2] kvm: x86: Allow " David Edmondson
  2021-04-20 18:34 ` Sean Christopherson
  2 siblings, 0 replies; 16+ messages in thread
From: Aaron Lewis @ 2021-04-16 13:18 UTC (permalink / raw)
  To: david.edmondson; +Cc: jmattson, seanjc, kvm, Aaron Lewis

This test exercises the feature KVM_CAP_EXIT_ON_EMULATION_FAILURE.  When
enabled, errors in the in-kernel instruction emulator are forwarded to
userspace with the instruction bytes stored in the exit struct for
KVM_EXIT_INTERNAL_ERROR.  So, when the guest attempts to emulate an
'flds' instruction, which isn't able to be emulated in KVM, instead
of failing, KVM sends the instruction to userspace to handle.

For this test to work properly the module parameter
'allow_smaller_maxphyaddr' has to be set.

Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Change-Id: I859e93ac344eaf433e56ee7298b09917b8cb0c35
---
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/include/x86_64/processor.h  |   3 +
 .../selftests/kvm/lib/x86_64/processor.c      |  58 +++++
 .../kvm/x86_64/emulator_error_test.c          | 209 ++++++++++++++++++
 5 files changed, 272 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86_64/emulator_error_test.c

diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
index 7bd7e776c266..ec9e20a2f752 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -7,6 +7,7 @@
 /x86_64/cr4_cpuid_sync_test
 /x86_64/debug_regs
 /x86_64/evmcs_test
+/x86_64/emulator_error_test
 /x86_64/get_cpuid_test
 /x86_64/get_msr_index_features
 /x86_64/kvm_pv_test
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 67eebb53235f..5ff705d92d02 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -41,6 +41,7 @@ LIBKVM_s390x = lib/s390x/processor.c lib/s390x/ucall.c lib/s390x/diag318_test_ha
 TEST_GEN_PROGS_x86_64 = x86_64/cr4_cpuid_sync_test
 TEST_GEN_PROGS_x86_64 += x86_64/get_msr_index_features
 TEST_GEN_PROGS_x86_64 += x86_64/evmcs_test
+TEST_GEN_PROGS_x86_64 += x86_64/emulator_error_test
 TEST_GEN_PROGS_x86_64 += x86_64/get_cpuid_test
 TEST_GEN_PROGS_x86_64 += x86_64/hyperv_clock
 TEST_GEN_PROGS_x86_64 += x86_64/hyperv_cpuid
diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index 0b30b4e15c38..40f70f91e6a2 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -394,6 +394,9 @@ void vcpu_init_descriptor_tables(struct kvm_vm *vm, uint32_t vcpuid);
 void vm_handle_exception(struct kvm_vm *vm, int vector,
 			void (*handler)(struct ex_regs *));
 
+uint64_t vm_get_page_table_entry(struct kvm_vm *vm, uint64_t vaddr);
+void vm_set_page_table_entry(struct kvm_vm *vm, uint64_t vaddr, uint64_t mask);
+
 /*
  * set_cpuid() - overwrites a matching cpuid entry with the provided value.
  *		 matches based on ent->function && ent->index. returns true
diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
index a8906e60a108..1aa4fc5e84c6 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
@@ -292,6 +292,64 @@ void virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
 	pte[index[0]].present = 1;
 }
 
+static struct pageTableEntry *_vm_get_page_table_entry(struct kvm_vm *vm,
+						       uint64_t vaddr)
+{
+	uint16_t index[4];
+	struct pageMapL4Entry *pml4e;
+	struct pageDirectoryPointerEntry *pdpe;
+	struct pageDirectoryEntry *pde;
+	struct pageTableEntry *pte;
+
+	TEST_ASSERT(vm->mode == VM_MODE_PXXV48_4K, "Attempt to use "
+		"unknown or unsupported guest mode, mode: 0x%x", vm->mode);
+	TEST_ASSERT((vaddr % vm->page_size) == 0,
+		"Virtual address not on page boundary,\n"
+		"  vaddr: 0x%lx vm->page_size: 0x%x",
+		vaddr, vm->page_size);
+	TEST_ASSERT(sparsebit_is_set(vm->vpages_valid,
+		(vaddr >> vm->page_shift)),
+		"Invalid virtual address, vaddr: 0x%lx",
+		vaddr);
+
+	index[0] = (vaddr >> 12) & 0x1ffu;
+	index[1] = (vaddr >> 21) & 0x1ffu;
+	index[2] = (vaddr >> 30) & 0x1ffu;
+	index[3] = (vaddr >> 39) & 0x1ffu;
+
+	pml4e = addr_gpa2hva(vm, vm->pgd);
+	TEST_ASSERT(pml4e[index[3]].present,
+		"Expected pml4e to be present for gva: 0x%08lx", vaddr);
+
+	pdpe = addr_gpa2hva(vm, pml4e[index[3]].address * vm->page_size);
+	TEST_ASSERT(pdpe[index[2]].present,
+		"Expected pdpe to be present for gva: 0x%08lx", vaddr);
+
+	pde = addr_gpa2hva(vm, pdpe[index[2]].address * vm->page_size);
+	TEST_ASSERT(pde[index[1]].present,
+		"Expected pde to be present for gva: 0x%08lx", vaddr);
+
+	pte = addr_gpa2hva(vm, pde[index[1]].address * vm->page_size);
+	TEST_ASSERT(pte[index[0]].present,
+		"Expected pte to be present for gva: 0x%08lx", vaddr);
+
+	return &pte[index[0]];
+}
+
+uint64_t vm_get_page_table_entry(struct kvm_vm *vm, uint64_t vaddr)
+{
+	struct pageTableEntry *pte = _vm_get_page_table_entry(vm, vaddr);
+
+	return *(uint64_t *)pte;
+}
+
+void vm_set_page_table_entry(struct kvm_vm *vm, uint64_t vaddr, uint64_t mask)
+{
+	struct pageTableEntry *pte = _vm_get_page_table_entry(vm, vaddr);
+
+	*(uint64_t *)pte |= mask;
+}
+
 void virt_dump(FILE *stream, struct kvm_vm *vm, uint8_t indent)
 {
 	struct pageMapL4Entry *pml4e, *pml4e_start;
diff --git a/tools/testing/selftests/kvm/x86_64/emulator_error_test.c b/tools/testing/selftests/kvm/x86_64/emulator_error_test.c
new file mode 100644
index 000000000000..825da57ad791
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/emulator_error_test.c
@@ -0,0 +1,209 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2020, Google LLC.
+ *
+ * Tests for KVM_CAP_EXIT_ON_EMULATION_FAILURE capability.
+ */
+
+#define _GNU_SOURCE /* for program_invocation_short_name */
+
+#include "test_util.h"
+#include "kvm_util.h"
+#include "vmx.h"
+
+#define VCPU_ID	   1
+#define PAGE_SIZE  4096
+#define MAXPHYADDR 36
+
+#define MEM_REGION_GVA	0x0000123456789000
+#define MEM_REGION_GPA	0x0000000700000000
+#define MEM_REGION_SLOT	10
+#define MEM_REGION_SIZE PAGE_SIZE
+
+extern char fld_start, fld_end;
+
+static void guest_code(void)
+{
+	__asm__ __volatile__("fld_start: flds (%[addr]); fld_end:"
+			     :: [addr]"r"(MEM_REGION_GVA));
+
+	GUEST_DONE();
+}
+
+static void run_guest(struct kvm_vm *vm)
+{
+	int rc;
+
+	rc = _vcpu_run(vm, VCPU_ID);
+	TEST_ASSERT(rc == 0, "vcpu_run failed: %d\n", rc);
+}
+
+/*
+ * Accessors to get R/M, REG, and Mod bits described in the SDM vol 2,
+ * figure 2-2 "Table Interpretation of ModR/M Byte (C8H)".
+ */
+#define GET_RM(insn_byte) (insn_byte & 0x7)
+#define GET_REG(insn_byte) ((insn_byte & 0x38) >> 3)
+#define GET_MOD(insn_byte) ((insn_byte & 0xc) >> 6)
+
+bool is_flds(uint8_t *insn_bytes, uint8_t insn_size)
+{
+	return insn_size >= 2 &&
+	       insn_bytes[0] == 0xd9 &&
+	       GET_REG(insn_bytes[1]) == 0x0;
+}
+
+static void process_exit_on_emulation_error(struct kvm_vm *vm)
+{
+	struct kvm_run *run = vcpu_state(vm, VCPU_ID);
+	struct kvm_regs regs;
+	uint8_t *insn_bytes;
+	uint8_t insn_size;
+	uint64_t flags;
+
+	TEST_ASSERT(run->exit_reason == KVM_EXIT_INTERNAL_ERROR,
+		    "Unexpected exit reason: %u (%s)",
+		    run->exit_reason,
+		    exit_reason_str(run->exit_reason));
+
+	TEST_ASSERT(run->emulation_failure.suberror == KVM_INTERNAL_ERROR_EMULATION,
+		    "Unexpected suberror: %u",
+		    run->emulation_failure.suberror);
+
+	if (run->emulation_failure.ndata >= 1) {
+		flags = run->emulation_failure.flags;
+		if ((flags & KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES) &&
+		    run->emulation_failure.ndata >= 3) {
+			insn_size = run->emulation_failure.insn_size;
+			insn_bytes = run->emulation_failure.insn_bytes;
+
+			TEST_ASSERT(insn_size <= 15 && insn_size > 0,
+				    "Unexpected instruction size: %u",
+				    insn_size);
+
+			TEST_ASSERT(is_flds(insn_bytes, insn_size),
+				    "Unexpected instruction.  Expected 'flds' (0xd9 /0), encountered (0x%x /%u)",
+				    insn_bytes[0], (insn_size >= 2) ? GET_REG(insn_bytes[1]) : 0);
+
+			vcpu_regs_get(vm, VCPU_ID, &regs);
+			regs.rip += (uintptr_t)(&fld_end) - (uintptr_t)(&fld_start);
+			vcpu_regs_set(vm, VCPU_ID, &regs);
+		}
+	}
+}
+
+static void do_guest_assert(struct kvm_vm *vm, struct ucall *uc)
+{
+	TEST_FAIL("%s at %s:%ld", (const char *)uc->args[0], __FILE__,
+		  uc->args[1]);
+}
+
+static void check_for_guest_assert(struct kvm_vm *vm)
+{
+	struct kvm_run *run = vcpu_state(vm, VCPU_ID);
+	struct ucall uc;
+
+	if (run->exit_reason == KVM_EXIT_IO &&
+	    get_ucall(vm, VCPU_ID, &uc) == UCALL_ABORT) {
+		do_guest_assert(vm, &uc);
+	}
+}
+
+static void process_ucall_done(struct kvm_vm *vm)
+{
+	struct kvm_run *run = vcpu_state(vm, VCPU_ID);
+	struct ucall uc;
+
+	check_for_guest_assert(vm);
+
+	TEST_ASSERT(run->exit_reason == KVM_EXIT_IO,
+		    "Unexpected exit reason: %u (%s)",
+		    run->exit_reason,
+		    exit_reason_str(run->exit_reason));
+
+	TEST_ASSERT(get_ucall(vm, VCPU_ID, &uc) == UCALL_DONE,
+		    "Unexpected ucall command: %lu, expected UCALL_DONE (%d)",
+		    uc.cmd, UCALL_DONE);
+}
+
+static uint64_t process_ucall(struct kvm_vm *vm)
+{
+	struct kvm_run *run = vcpu_state(vm, VCPU_ID);
+	struct ucall uc;
+
+	TEST_ASSERT(run->exit_reason == KVM_EXIT_IO,
+		    "Unexpected exit reason: %u (%s)",
+		    run->exit_reason,
+		    exit_reason_str(run->exit_reason));
+
+	switch (get_ucall(vm, VCPU_ID, &uc)) {
+	case UCALL_SYNC:
+		break;
+	case UCALL_ABORT:
+		do_guest_assert(vm, &uc);
+		break;
+	case UCALL_DONE:
+		process_ucall_done(vm);
+		break;
+	default:
+		TEST_ASSERT(false, "Unexpected ucall");
+	}
+
+	return uc.cmd;
+}
+
+int main(int argc, char *argv[])
+{
+	struct kvm_enable_cap emul_failure_cap = {
+		.cap = KVM_CAP_EXIT_ON_EMULATION_FAILURE,
+		.args[0] = 1,
+	};
+	struct kvm_cpuid_entry2 *entry;
+	struct kvm_cpuid2 *cpuid;
+	struct kvm_vm *vm;
+	uint64_t *hva;
+	uint64_t gpa;
+	int rc;
+
+	/* Tell stdout not to buffer its content */
+	setbuf(stdout, NULL);
+
+	vm = vm_create_default(VCPU_ID, 0, guest_code);
+
+	cpuid = kvm_get_supported_cpuid();
+
+	entry = kvm_get_supported_cpuid_index(0x80000008, 0);
+	if (entry) {
+		entry->eax = (entry->eax & 0xffffff00) | MAXPHYADDR;
+		set_cpuid(cpuid, entry);
+	}
+
+	vcpu_set_cpuid(vm, VCPU_ID, cpuid);
+
+	entry = kvm_get_supported_cpuid_index(0x80000008, 0);
+
+	rc = kvm_check_cap(KVM_CAP_EXIT_ON_EMULATION_FAILURE);
+	TEST_ASSERT(rc, "KVM_CAP_EXIT_ON_EMULATION_FAILURE is unavailable");
+	vm_enable_cap(vm, &emul_failure_cap);
+
+	vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS,
+				    MEM_REGION_GPA, MEM_REGION_SLOT,
+				    MEM_REGION_SIZE / PAGE_SIZE, 0);
+	gpa = vm_phy_pages_alloc(vm, MEM_REGION_SIZE / PAGE_SIZE,
+				 MEM_REGION_GPA, MEM_REGION_SLOT);
+	TEST_ASSERT(gpa == MEM_REGION_GPA, "Failed vm_phy_pages_alloc\n");
+	virt_map(vm, MEM_REGION_GVA, MEM_REGION_GPA, 1, 0);
+	hva = addr_gpa2hva(vm, MEM_REGION_GPA);
+	memset(hva, 0, PAGE_SIZE);
+	vm_set_page_table_entry(vm, MEM_REGION_GVA, 1ull << 36);
+
+	run_guest(vm);
+	process_exit_on_emulation_error(vm);
+	run_guest(vm);
+
+	TEST_ASSERT(process_ucall(vm) == UCALL_DONE, "Expected UCALL_DONE");
+
+	kvm_vm_free(vm);
+
+	return 0;
+}
-- 
2.31.1.368.gbe11c130af-goog


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] kvm: x86: Allow userspace to handle emulation errors
  2021-04-16 13:18 [PATCH 1/2] kvm: x86: Allow userspace to handle emulation errors Aaron Lewis
  2021-04-16 13:18 ` [PATCH 2/2] selftests: kvm: Allows " Aaron Lewis
@ 2021-04-19 12:41 ` David Edmondson
  2021-04-19 16:47   ` Aaron Lewis
  2021-04-20 18:34 ` Sean Christopherson
  2 siblings, 1 reply; 16+ messages in thread
From: David Edmondson @ 2021-04-19 12:41 UTC (permalink / raw)
  To: Aaron Lewis; +Cc: jmattson, seanjc, kvm, Aaron Lewis

Thanks for sending the patches.

On Friday, 2021-04-16 at 06:18:19 -07, Aaron Lewis wrote:

> Add a fallback mechanism to the in-kernel instruction emulator that
> allows userspace the opportunity to process an instruction the emulator
> was unable to.  When the in-kernel instruction emulator fails to process
> an instruction it will either inject a #UD into the guest or exit to
> userspace with exit reason KVM_INTERNAL_ERROR.  This is because it does
> not know how to proceed in an appropriate manner.  This feature lets
> userspace get involved to see if it can figure out a better path
> forward.

Given that you are intending to try and handle the instruction in
user-space, it seems a little odd to overload the
KVM_EXIT_INTERNAL_ERROR/KVM_INTERNAL_ERROR_EMULATION exit reason/sub
error.

Why not add a new exit reason, particularly given that the caller has to
enable the capability to get the relevant data? (It would also remove
the need for the flag field and any mechanism for packing multiple bits
of detail into the structure.)

>
> Signed-off-by: Aaron Lewis <aaronlewis@google.com>
> Change-Id: If9876bc73d26f6c3ff9a8bce177c2fc6f160e629
> ---
>  Documentation/virt/kvm/api.rst  | 16 ++++++++++++++++
>  arch/x86/include/asm/kvm_host.h |  6 ++++++
>  arch/x86/kvm/x86.c              | 33 +++++++++++++++++++++++++++++----
>  include/uapi/linux/kvm.h        | 20 ++++++++++++++++++++
>  tools/include/uapi/linux/kvm.h  | 20 ++++++++++++++++++++
>  5 files changed, 91 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 307f2fcf1b02..f8278e893fbe 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6233,6 +6233,22 @@ KVM_RUN_BUS_LOCK flag is used to distinguish between them.
>  This capability can be used to check / enable 2nd DAWR feature provided
>  by POWER10 processor.
>  
> +7.24 KVM_CAP_EXIT_ON_EMULATION_FAILURE
> +--------------------------------------
> +
> +:Architectures: x86
> +:Parameters: args[0] whether the feature should be enabled or not
> +
> +With this capability enabled, the in-kernel instruction emulator packs the exit
> +struct of KVM_INTERNAL_ERROR with the instruction length and instruction bytes
> +when an error occurs while emulating an instruction.  This allows userspace to
> +then take a look at the instruction and see if it is able to handle it more
> +gracefully than the in-kernel emulator.
> +
> +When this capability is enabled use the emulation_failure struct instead of the
> +internal struct for the exit struct.  They have the same layout, but the
> +emulation_failure struct matches the content better.
> +
>  8. Other capabilities.
>  ======================
>  
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 3768819693e5..07235d08e976 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1049,6 +1049,12 @@ struct kvm_arch {
>  	bool exception_payload_enabled;
>  
>  	bool bus_lock_detection_enabled;
> +	/*
> +	 * If exit_on_emulation_error is set, and the in-kernel instruction
> +	 * emulator fails to emulate an instruction, allow userspace
> +	 * the opportunity to look at it.
> +	 */
> +	bool exit_on_emulation_error;
>  
>  	/* Deflect RDMSR and WRMSR to user space when they trigger a #GP */
>  	u32 user_space_msr_mask;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index eca63625aee4..f9a207f815fb 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3771,6 +3771,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  	case KVM_CAP_X86_USER_SPACE_MSR:
>  	case KVM_CAP_X86_MSR_FILTER:
>  	case KVM_CAP_ENFORCE_PV_FEATURE_CPUID:
> +	case KVM_CAP_EXIT_ON_EMULATION_FAILURE:
>  		r = 1;
>  		break;
>  #ifdef CONFIG_KVM_XEN
> @@ -5357,6 +5358,10 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>  			kvm->arch.bus_lock_detection_enabled = true;
>  		r = 0;
>  		break;
> +	case KVM_CAP_EXIT_ON_EMULATION_FAILURE:
> +		kvm->arch.exit_on_emulation_error = cap->args[0];
> +		r = 0;
> +		break;
>  	default:
>  		r = -EINVAL;
>  		break;
> @@ -7119,8 +7124,29 @@ void kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip)
>  }
>  EXPORT_SYMBOL_GPL(kvm_inject_realmode_interrupt);
>  
> +static void prepare_emulation_failure_exit(struct kvm_vcpu *vcpu)
> +{
> +	struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt;
> +	u64 insn_size = ctxt->fetch.end - ctxt->fetch.data;
> +	struct kvm *kvm = vcpu->kvm;
> +
> +	vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
> +	vcpu->run->emulation_failure.suberror = KVM_INTERNAL_ERROR_EMULATION;
> +	vcpu->run->emulation_failure.ndata = 0;
> +	if (kvm->arch.exit_on_emulation_error && insn_size > 0) {
> +		vcpu->run->emulation_failure.ndata = 3;
> +		vcpu->run->emulation_failure.flags =
> +			KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES;
> +		vcpu->run->emulation_failure.insn_size = insn_size;
> +		memcpy(vcpu->run->emulation_failure.insn_bytes,
> +		       ctxt->fetch.data, sizeof(ctxt->fetch.data));
> +	}
> +}
> +
>  static int handle_emulation_failure(struct kvm_vcpu *vcpu, int emulation_type)
>  {
> +	struct kvm *kvm = vcpu->kvm;
> +
>  	++vcpu->stat.insn_emulation_fail;
>  	trace_kvm_emulate_insn_failed(vcpu);
>  
> @@ -7129,10 +7155,9 @@ static int handle_emulation_failure(struct kvm_vcpu *vcpu, int emulation_type)
>  		return 1;
>  	}
>  
> -	if (emulation_type & EMULTYPE_SKIP) {
> -		vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
> -		vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
> -		vcpu->run->internal.ndata = 0;
> +	if (kvm->arch.exit_on_emulation_error ||
> +	    (emulation_type & EMULTYPE_SKIP)) {
> +		prepare_emulation_failure_exit(vcpu);
>  		return 0;
>  	}
>  
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index f6afee209620..7c77099235b2 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -279,6 +279,17 @@ struct kvm_xen_exit {
>  /* Encounter unexpected vm-exit reason */
>  #define KVM_INTERNAL_ERROR_UNEXPECTED_EXIT_REASON	4
>  
> +/*
> + * When using the suberror KVM_INTERNAL_ERROR_EMULATION, these flags are used
> + * to describe what is contained in the exit struct.  The flags are used to
> + * describe it's contents, and the contents should be in ascending numerical
> + * order of the flag values.  For example, if the flag
> + * KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES is set, the instruction
> + * length and instruction bytes would be expected to show up first because this
> + * flag has the lowest numerical value (1) of all the other flags.
> + */
> +#define KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES (1ULL << 0)
> +
>  /* for KVM_RUN, returned by mmap(vcpu_fd, offset=0) */
>  struct kvm_run {
>  	/* in */
> @@ -382,6 +393,14 @@ struct kvm_run {
>  			__u32 ndata;
>  			__u64 data[16];
>  		} internal;
> +		/* KVM_EXIT_INTERNAL_ERROR, too (not 2) */
> +		struct {
> +			__u32 suberror;
> +			__u32 ndata;
> +			__u64 flags;
> +			__u8  insn_size;
> +			__u8  insn_bytes[15];
> +		} emulation_failure;
>  		/* KVM_EXIT_OSI */
>  		struct {
>  			__u64 gprs[32];
> @@ -1078,6 +1097,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_DIRTY_LOG_RING 192
>  #define KVM_CAP_X86_BUS_LOCK_EXIT 193
>  #define KVM_CAP_PPC_DAWR1 194
> +#define KVM_CAP_EXIT_ON_EMULATION_FAILURE 195
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  
> diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h
> index f6afee209620..7c77099235b2 100644
> --- a/tools/include/uapi/linux/kvm.h
> +++ b/tools/include/uapi/linux/kvm.h
> @@ -279,6 +279,17 @@ struct kvm_xen_exit {
>  /* Encounter unexpected vm-exit reason */
>  #define KVM_INTERNAL_ERROR_UNEXPECTED_EXIT_REASON	4
>  
> +/*
> + * When using the suberror KVM_INTERNAL_ERROR_EMULATION, these flags are used
> + * to describe what is contained in the exit struct.  The flags are used to
> + * describe it's contents, and the contents should be in ascending numerical
> + * order of the flag values.  For example, if the flag
> + * KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES is set, the instruction
> + * length and instruction bytes would be expected to show up first because this
> + * flag has the lowest numerical value (1) of all the other flags.

When adding a new flag, do I steal bytes from insn_bytes[] for my
associated payload? If so, how many do I have to leave?

> + */
> +#define KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES (1ULL << 0)
> +
>  /* for KVM_RUN, returned by mmap(vcpu_fd, offset=0) */
>  struct kvm_run {
>  	/* in */
> @@ -382,6 +393,14 @@ struct kvm_run {
>  			__u32 ndata;
>  			__u64 data[16];
>  		} internal;
> +		/* KVM_EXIT_INTERNAL_ERROR, too (not 2) */
> +		struct {
> +			__u32 suberror;
> +			__u32 ndata;
> +			__u64 flags;
> +			__u8  insn_size;
> +			__u8  insn_bytes[15];
> +		} emulation_failure;
>  		/* KVM_EXIT_OSI */
>  		struct {
>  			__u64 gprs[32];
> @@ -1078,6 +1097,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_DIRTY_LOG_RING 192
>  #define KVM_CAP_X86_BUS_LOCK_EXIT 193
>  #define KVM_CAP_PPC_DAWR1 194
> +#define KVM_CAP_EXIT_ON_EMULATION_FAILURE 195
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  
> -- 
> 2.31.1.368.gbe11c130af-goog

dme.
-- 
Walking upside down in the sky, between the satellites passing by, I'm looking.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] kvm: x86: Allow userspace to handle emulation errors
  2021-04-19 12:41 ` [PATCH 1/2] kvm: x86: Allow " David Edmondson
@ 2021-04-19 16:47   ` Aaron Lewis
  2021-04-20  7:21     ` David Edmondson
  0 siblings, 1 reply; 16+ messages in thread
From: Aaron Lewis @ 2021-04-19 16:47 UTC (permalink / raw)
  To: David Edmondson; +Cc: Jim Mattson, Sean Christopherson, kvm list

> > Add a fallback mechanism to the in-kernel instruction emulator that
> > allows userspace the opportunity to process an instruction the emulator
> > was unable to.  When the in-kernel instruction emulator fails to process
> > an instruction it will either inject a #UD into the guest or exit to
> > userspace with exit reason KVM_INTERNAL_ERROR.  This is because it does
> > not know how to proceed in an appropriate manner.  This feature lets
> > userspace get involved to see if it can figure out a better path
> > forward.
>
> Given that you are intending to try and handle the instruction in
> user-space, it seems a little odd to overload the
> KVM_EXIT_INTERNAL_ERROR/KVM_INTERNAL_ERROR_EMULATION exit reason/sub
> error.
>
> Why not add a new exit reason, particularly given that the caller has to
> enable the capability to get the relevant data? (It would also remove
> the need for the flag field and any mechanism for packing multiple bits
> of detail into the structure.)

I considered that, but I opted for the extensibility of the exiting
KVM_EXIT_INTERNAL_ERROR instead.  To me it was six of one or half a
dozen of the other.  With either strategy I still wanted to provide
for future extensibility, and had a flags field in place.  That way we
can add to this in the future if we find something that is missing
(ie: potentially wanting a way to mark dirty pages, possibly passing a
fault address, etc...)

> > +/*
> > + * When using the suberror KVM_INTERNAL_ERROR_EMULATION, these flags are used
> > + * to describe what is contained in the exit struct.  The flags are used to
> > + * describe it's contents, and the contents should be in ascending numerical
> > + * order of the flag values.  For example, if the flag
> > + * KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES is set, the instruction
> > + * length and instruction bytes would be expected to show up first because this
> > + * flag has the lowest numerical value (1) of all the other flags.
> > + */
> > +#define KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES (1ULL << 0)
> > +
> >  /* for KVM_RUN, returned by mmap(vcpu_fd, offset=0) */
> >  struct kvm_run {
> >       /* in */
> > @@ -382,6 +393,14 @@ struct kvm_run {
> >                       __u32 ndata;
> >                       __u64 data[16];
> >               } internal;
> > +             /* KVM_EXIT_INTERNAL_ERROR, too (not 2) */
> > +             struct {
> > +                     __u32 suberror;
> > +                     __u32 ndata;
> > +                     __u64 flags;
> > +                     __u8  insn_size;
> > +                     __u8  insn_bytes[15];
> > +             } emulation_failure;
> > +/*
> > + * When using the suberror KVM_INTERNAL_ERROR_EMULATION, these flags are used
> > + * to describe what is contained in the exit struct.  The flags are used to
> > + * describe it's contents, and the contents should be in ascending numerical
> > + * order of the flag values.  For example, if the flag
> > + * KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES is set, the instruction
> > + * length and instruction bytes would be expected to show up first because this
> > + * flag has the lowest numerical value (1) of all the other flags.
>
> When adding a new flag, do I steal bytes from insn_bytes[] for my
> associated payload? If so, how many do I have to leave?
>

The emulation_failure struct mirrors the internal struct, so if you
are just adding to what I have, you can safely add up to 16 __u64's.
I'm currently using the size equivalent to 3 of them (flags,
insn_size, insn_bytes), so there should be plenty of space left for
you to add what you need to the end.  Just add the fields you need to
the end of emulation_failure struct, increase 'ndata' to the new
count, add a new flag to 'flags' so we know its contents.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] kvm: x86: Allow userspace to handle emulation errors
  2021-04-19 16:47   ` Aaron Lewis
@ 2021-04-20  7:21     ` David Edmondson
  2021-04-20 14:57       ` Aaron Lewis
  0 siblings, 1 reply; 16+ messages in thread
From: David Edmondson @ 2021-04-20  7:21 UTC (permalink / raw)
  To: Aaron Lewis; +Cc: Jim Mattson, Sean Christopherson, kvm list

On Monday, 2021-04-19 at 09:47:19 -07, Aaron Lewis wrote:

>> > Add a fallback mechanism to the in-kernel instruction emulator that
>> > allows userspace the opportunity to process an instruction the emulator
>> > was unable to.  When the in-kernel instruction emulator fails to process
>> > an instruction it will either inject a #UD into the guest or exit to
>> > userspace with exit reason KVM_INTERNAL_ERROR.  This is because it does
>> > not know how to proceed in an appropriate manner.  This feature lets
>> > userspace get involved to see if it can figure out a better path
>> > forward.
>>
>> Given that you are intending to try and handle the instruction in
>> user-space, it seems a little odd to overload the
>> KVM_EXIT_INTERNAL_ERROR/KVM_INTERNAL_ERROR_EMULATION exit reason/sub
>> error.
>>
>> Why not add a new exit reason, particularly given that the caller has to
>> enable the capability to get the relevant data? (It would also remove
>> the need for the flag field and any mechanism for packing multiple bits
>> of detail into the structure.)
>
> I considered that, but I opted for the extensibility of the exiting
> KVM_EXIT_INTERNAL_ERROR instead.  To me it was six of one or half a
> dozen of the other.  With either strategy I still wanted to provide
> for future extensibility, and had a flags field in place.  That way we
> can add to this in the future if we find something that is missing
> (ie: potentially wanting a way to mark dirty pages, possibly passing a
> fault address, etc...)

How many of the flag based optional fields do you anticipate needing for
any one particular exit scenario?

If it's one, then using the flags to disambiguate the emulation failure
cases after choosing to stuff all of the cases into
KVM_EXIT_INTERNAL_ERROR / KVM_INTERNAL_ERROR_EMULATION would be odd.

(I'm presuming that it's not one, but don't understand the use case.)

>> > +/*
>> > + * When using the suberror KVM_INTERNAL_ERROR_EMULATION, these flags are used
>> > + * to describe what is contained in the exit struct.  The flags are used to
>> > + * describe it's contents, and the contents should be in ascending numerical
>> > + * order of the flag values.  For example, if the flag
>> > + * KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES is set, the instruction
>> > + * length and instruction bytes would be expected to show up first because this
>> > + * flag has the lowest numerical value (1) of all the other flags.
>> > + */
>> > +#define KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES (1ULL << 0)
>> > +
>> >  /* for KVM_RUN, returned by mmap(vcpu_fd, offset=0) */
>> >  struct kvm_run {
>> >       /* in */
>> > @@ -382,6 +393,14 @@ struct kvm_run {
>> >                       __u32 ndata;
>> >                       __u64 data[16];
>> >               } internal;
>> > +             /* KVM_EXIT_INTERNAL_ERROR, too (not 2) */
>> > +             struct {
>> > +                     __u32 suberror;
>> > +                     __u32 ndata;
>> > +                     __u64 flags;
>> > +                     __u8  insn_size;
>> > +                     __u8  insn_bytes[15];
>> > +             } emulation_failure;
>> > +/*
>> > + * When using the suberror KVM_INTERNAL_ERROR_EMULATION, these flags are used
>> > + * to describe what is contained in the exit struct.  The flags are used to
>> > + * describe it's contents, and the contents should be in ascending numerical
>> > + * order of the flag values.  For example, if the flag
>> > + * KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES is set, the instruction
>> > + * length and instruction bytes would be expected to show up first because this
>> > + * flag has the lowest numerical value (1) of all the other flags.
>>
>> When adding a new flag, do I steal bytes from insn_bytes[] for my
>> associated payload? If so, how many do I have to leave?
>>
>
> The emulation_failure struct mirrors the internal struct, so if you
> are just adding to what I have, you can safely add up to 16 __u64's.
> I'm currently using the size equivalent to 3 of them (flags,
> insn_size, insn_bytes), so there should be plenty of space left for
> you to add what you need to the end.  Just add the fields you need to
> the end of emulation_failure struct, increase 'ndata' to the new
> count, add a new flag to 'flags' so we know its contents.

My apologies, I mis-read the u8 as u64, so figured that you'd eaten all
of the remaining space.

dme.
-- 
I walk like a building, I never get wet.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] kvm: x86: Allow userspace to handle emulation errors
  2021-04-20  7:21     ` David Edmondson
@ 2021-04-20 14:57       ` Aaron Lewis
  2021-04-20 16:53         ` David Edmondson
  0 siblings, 1 reply; 16+ messages in thread
From: Aaron Lewis @ 2021-04-20 14:57 UTC (permalink / raw)
  To: David Edmondson; +Cc: Jim Mattson, Sean Christopherson, kvm list

On Tue, Apr 20, 2021 at 12:21 AM David Edmondson <dme@dme.org> wrote:
>
> On Monday, 2021-04-19 at 09:47:19 -07, Aaron Lewis wrote:
>
> >> > Add a fallback mechanism to the in-kernel instruction emulator that
> >> > allows userspace the opportunity to process an instruction the emulator
> >> > was unable to.  When the in-kernel instruction emulator fails to process
> >> > an instruction it will either inject a #UD into the guest or exit to
> >> > userspace with exit reason KVM_INTERNAL_ERROR.  This is because it does
> >> > not know how to proceed in an appropriate manner.  This feature lets
> >> > userspace get involved to see if it can figure out a better path
> >> > forward.
> >>
> >> Given that you are intending to try and handle the instruction in
> >> user-space, it seems a little odd to overload the
> >> KVM_EXIT_INTERNAL_ERROR/KVM_INTERNAL_ERROR_EMULATION exit reason/sub
> >> error.
> >>
> >> Why not add a new exit reason, particularly given that the caller has to
> >> enable the capability to get the relevant data? (It would also remove
> >> the need for the flag field and any mechanism for packing multiple bits
> >> of detail into the structure.)
> >
> > I considered that, but I opted for the extensibility of the exiting
> > KVM_EXIT_INTERNAL_ERROR instead.  To me it was six of one or half a
> > dozen of the other.  With either strategy I still wanted to provide
> > for future extensibility, and had a flags field in place.  That way we
> > can add to this in the future if we find something that is missing
> > (ie: potentially wanting a way to mark dirty pages, possibly passing a
> > fault address, etc...)
>
> How many of the flag based optional fields do you anticipate needing for
> any one particular exit scenario?
>
> If it's one, then using the flags to disambiguate the emulation failure
> cases after choosing to stuff all of the cases into
> KVM_EXIT_INTERNAL_ERROR / KVM_INTERNAL_ERROR_EMULATION would be odd.
>
> (I'm presuming that it's not one, but don't understand the use case.)
>

The motivation was to allow for maximum flexibility in the future, and
not be tied down to something we potentially missed now.  I agree the
flags aren't needed if we are only adding to what's currently there,
but they are needed if we want to remove something or pack something
differently.  I didn't see how I could achieve that without adding a
flags field.  Seemed like low overhead to be more future proof.

> >> > +/*
> >> > + * When using the suberror KVM_INTERNAL_ERROR_EMULATION, these flags are used
> >> > + * to describe what is contained in the exit struct.  The flags are used to
> >> > + * describe it's contents, and the contents should be in ascending numerical
> >> > + * order of the flag values.  For example, if the flag
> >> > + * KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES is set, the instruction
> >> > + * length and instruction bytes would be expected to show up first because this
> >> > + * flag has the lowest numerical value (1) of all the other flags.
> >> > + */
> >> > +#define KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES (1ULL << 0)
> >> > +
> >> >  /* for KVM_RUN, returned by mmap(vcpu_fd, offset=0) */
> >> >  struct kvm_run {
> >> >       /* in */
> >> > @@ -382,6 +393,14 @@ struct kvm_run {
> >> >                       __u32 ndata;
> >> >                       __u64 data[16];
> >> >               } internal;
> >> > +             /* KVM_EXIT_INTERNAL_ERROR, too (not 2) */
> >> > +             struct {
> >> > +                     __u32 suberror;
> >> > +                     __u32 ndata;
> >> > +                     __u64 flags;
> >> > +                     __u8  insn_size;
> >> > +                     __u8  insn_bytes[15];
> >> > +             } emulation_failure;
> >> > +/*
> >> > + * When using the suberror KVM_INTERNAL_ERROR_EMULATION, these flags are used
> >> > + * to describe what is contained in the exit struct.  The flags are used to
> >> > + * describe it's contents, and the contents should be in ascending numerical
> >> > + * order of the flag values.  For example, if the flag
> >> > + * KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES is set, the instruction
> >> > + * length and instruction bytes would be expected to show up first because this
> >> > + * flag has the lowest numerical value (1) of all the other flags.
> >>
> >> When adding a new flag, do I steal bytes from insn_bytes[] for my
> >> associated payload? If so, how many do I have to leave?
> >>
> >
> > The emulation_failure struct mirrors the internal struct, so if you
> > are just adding to what I have, you can safely add up to 16 __u64's.
> > I'm currently using the size equivalent to 3 of them (flags,
> > insn_size, insn_bytes), so there should be plenty of space left for
> > you to add what you need to the end.  Just add the fields you need to
> > the end of emulation_failure struct, increase 'ndata' to the new
> > count, add a new flag to 'flags' so we know its contents.
>
> My apologies, I mis-read the u8 as u64, so figured that you'd eaten all
> of the remaining space.
>
> dme.
> --
> I walk like a building, I never get wet.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] kvm: x86: Allow userspace to handle emulation errors
  2021-04-20 14:57       ` Aaron Lewis
@ 2021-04-20 16:53         ` David Edmondson
  2021-04-20 18:21           ` Sean Christopherson
  0 siblings, 1 reply; 16+ messages in thread
From: David Edmondson @ 2021-04-20 16:53 UTC (permalink / raw)
  To: Aaron Lewis; +Cc: Jim Mattson, Sean Christopherson, kvm list

On Tuesday, 2021-04-20 at 07:57:27 -07, Aaron Lewis wrote:

>> >> Why not add a new exit reason, particularly given that the caller has to
>> >> enable the capability to get the relevant data? (It would also remove
>> >> the need for the flag field and any mechanism for packing multiple bits
>> >> of detail into the structure.)
>> >
>> > I considered that, but I opted for the extensibility of the exiting
>> > KVM_EXIT_INTERNAL_ERROR instead.  To me it was six of one or half a
>> > dozen of the other.  With either strategy I still wanted to provide
>> > for future extensibility, and had a flags field in place.  That way we
>> > can add to this in the future if we find something that is missing
>> > (ie: potentially wanting a way to mark dirty pages, possibly passing a
>> > fault address, etc...)
>>
>> How many of the flag based optional fields do you anticipate needing for
>> any one particular exit scenario?
>>
>> If it's one, then using the flags to disambiguate the emulation failure
>> cases after choosing to stuff all of the cases into
>> KVM_EXIT_INTERNAL_ERROR / KVM_INTERNAL_ERROR_EMULATION would be odd.
>>
>> (I'm presuming that it's not one, but don't understand the use case.)
>>
>
> The motivation was to allow for maximum flexibility in the future, and
> not be tied down to something we potentially missed now.  I agree the
> flags aren't needed if we are only adding to what's currently there,
> but they are needed if we want to remove something or pack something
> differently.  I didn't see how I could achieve that without adding a
> flags field.  Seemed like low overhead to be more future proof.

With what you have now, the ndata field seems unnecessary - I should be
able to determine the contents of the rest of the structure based on the
flags. That also suggests to me that using something other than
KVM_INTERNAL_ERROR_EMULATION would make sense.

This comment:

>> >> > + * When using the suberror KVM_INTERNAL_ERROR_EMULATION, these flags are used
>> >> > + * to describe what is contained in the exit struct.  The flags are used to
>> >> > + * describe it's contents, and the contents should be in ascending numerical
>> >> > + * order of the flag values.  For example, if the flag
>> >> > + * KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES is set, the instruction
>> >> > + * length and instruction bytes would be expected to show up first because this
>> >> > + * flag has the lowest numerical value (1) of all the other flags.

originally made me think that the flag-indicated elements were going to
be packed into the remaining space of the structure at a position
depending on which flags are set.

For example, if I add a new flag
KVM_INTERNAL_ERROR_EMULATION_FLAG_EXIT_CODE, value 2, and then want to
pass back an exit code but *not* instruction bytes, the comment appears
to suggest that the exit code will appear immediately after the flags.

This is contradicted by your other reply:

>> > Just add the fields you need to
>> > the end of emulation_failure struct, increase 'ndata' to the new
>> > count, add a new flag to 'flags' so we know its contents.

Given this, the ordering of flag values does not seem significant - the
structure elements corresponding to a flag value will always be present,
just not filled with relevant data.

dme.
-- 
When you were the brightest star, who were the shadows?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] kvm: x86: Allow userspace to handle emulation errors
  2021-04-20 16:53         ` David Edmondson
@ 2021-04-20 18:21           ` Sean Christopherson
  2021-04-21  8:00             ` David Edmondson
  0 siblings, 1 reply; 16+ messages in thread
From: Sean Christopherson @ 2021-04-20 18:21 UTC (permalink / raw)
  To: David Edmondson; +Cc: Aaron Lewis, Jim Mattson, kvm list

On Tue, Apr 20, 2021, David Edmondson wrote:
> On Tuesday, 2021-04-20 at 07:57:27 -07, Aaron Lewis wrote:
> 
> >> >> Why not add a new exit reason, particularly given that the caller has to
> >> >> enable the capability to get the relevant data? (It would also remove
> >> >> the need for the flag field and any mechanism for packing multiple bits
> >> >> of detail into the structure.)
> >> >
> >> > I considered that, but I opted for the extensibility of the exiting
> >> > KVM_EXIT_INTERNAL_ERROR instead.  To me it was six of one or half a
> >> > dozen of the other.  With either strategy I still wanted to provide
> >> > for future extensibility, and had a flags field in place.  That way we
> >> > can add to this in the future if we find something that is missing
> >> > (ie: potentially wanting a way to mark dirty pages, possibly passing a
> >> > fault address, etc...)
> >>
> >> How many of the flag based optional fields do you anticipate needing for
> >> any one particular exit scenario?
> >>
> >> If it's one, then using the flags to disambiguate the emulation failure
> >> cases after choosing to stuff all of the cases into
> >> KVM_EXIT_INTERNAL_ERROR / KVM_INTERNAL_ERROR_EMULATION would be odd.
> >>
> >> (I'm presuming that it's not one, but don't understand the use case.)
> >
> > The motivation was to allow for maximum flexibility in the future, and
> > not be tied down to something we potentially missed now.  I agree the
> > flags aren't needed if we are only adding to what's currently there,
> > but they are needed if we want to remove something or pack something
> > differently.  I didn't see how I could achieve that without adding a
> > flags field.  Seemed like low overhead to be more future proof.
> 
> With what you have now, the ndata field seems unnecessary - I should be
> able to determine the contents of the rest of the structure based on the
> flags.

Keeping ndata is necessary if we piggyback KVM_INTERNAL_ERROR_EMULATION,
otherwise we'll break for VMMs that are not aware of the new format.  E.g. if
ndata gets stuffed with a large number, KVM could cause a buffer overrun in an
old VMM.

> That also suggests to me that using something other than
> KVM_INTERNAL_ERROR_EMULATION would make sense.

Like Aaron, I'm on the fence as to whether or not a new exit reason is in order.
On one hand, it would be slightly cleaner.  On the other hand, the existing
"KVM_INTERNAL_ERROR_EMULATION" really is the best name.  It implies nothing
about the userspace VMM, only that KVM attempted to emulate an instruction and
failed.

The other motivation is that KVM can opportunistically start dumping extra info
for old VMMs, though this patch does not do that; feedback imminent. :-)

> This comment:
> 
> >> >> > + * When using the suberror KVM_INTERNAL_ERROR_EMULATION, these flags are used
> >> >> > + * to describe what is contained in the exit struct.  The flags are used to
> >> >> > + * describe it's contents, and the contents should be in ascending numerical
> >> >> > + * order of the flag values.  For example, if the flag
> >> >> > + * KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES is set, the instruction
> >> >> > + * length and instruction bytes would be expected to show up first because this
> >> >> > + * flag has the lowest numerical value (1) of all the other flags.
> 
> originally made me think that the flag-indicated elements were going to
> be packed into the remaining space of the structure at a position
> depending on which flags are set.
> 
> For example, if I add a new flag
> KVM_INTERNAL_ERROR_EMULATION_FLAG_EXIT_CODE, value 2, and then want to
> pass back an exit code but *not* instruction bytes, the comment appears
> to suggest that the exit code will appear immediately after the flags.
> 
> This is contradicted by your other reply:
> 
> >> > Just add the fields you need to
> >> > the end of emulation_failure struct, increase 'ndata' to the new
> >> > count, add a new flag to 'flags' so we know its contents.
> 
> Given this, the ordering of flag values does not seem significant - the
> structure elements corresponding to a flag value will always be present,
> just not filled with relevant data.

I think what Aaron is trying to say is that the order in the aliased data[] is
associated with the lowest _defined_ flag value, not the lowest _set_ flag.

That said, I would just omit the "ascending numerical" stuff entirely, e.g. I
think for the #defines, this will suffice:

/* Flags that describe what fields in emulation_failure hold valid data  */


As for not breaking userspace if/when additional fields are added, we can instead
document the new struct (and drop my snarky comment :-D), e.g.:

		/*
		 * KVM_INTERNAL_ERROR_EMULATION
		 *
		 * "struct emulation_failure" is an overlay of "struct internal"
		 * that is used for the KVM_INTERNAL_ERROR_EMULATION sub-type of
		 * KVM_EXIT_INTERNAL_ERROR.  Note, unlike other internal error
		 * sub-types, this struct is ABI!  It also needs to be backwards
		 * compabile with "struct internal".  Take special care that
		 * "ndata" is correct, that new fields are enumerated in "flags",
		 * and that each flag enumerates fields that are 64-bit aligned
		 * and sized (so that ndata+internal.data[] is valid/accurate).
		 */
		struct {
			__u32 suberror;
			__u32 ndata;
			__u64 flags;
			__u8  insn_size;
			__u8  insn_bytes[15];
		} emulation_failure;



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] kvm: x86: Allow userspace to handle emulation errors
  2021-04-16 13:18 [PATCH 1/2] kvm: x86: Allow userspace to handle emulation errors Aaron Lewis
  2021-04-16 13:18 ` [PATCH 2/2] selftests: kvm: Allows " Aaron Lewis
  2021-04-19 12:41 ` [PATCH 1/2] kvm: x86: Allow " David Edmondson
@ 2021-04-20 18:34 ` Sean Christopherson
  2021-04-21  8:39   ` David Edmondson
  2021-04-21 16:31   ` Jim Mattson
  2 siblings, 2 replies; 16+ messages in thread
From: Sean Christopherson @ 2021-04-20 18:34 UTC (permalink / raw)
  To: Aaron Lewis; +Cc: david.edmondson, jmattson, kvm

On Fri, Apr 16, 2021, Aaron Lewis wrote:
> +7.24 KVM_CAP_EXIT_ON_EMULATION_FAILURE
> +--------------------------------------
> +
> +:Architectures: x86
> +:Parameters: args[0] whether the feature should be enabled or not
> +
> +With this capability enabled, the in-kernel instruction emulator packs the exit
> +struct of KVM_INTERNAL_ERROR with the instruction length and instruction bytes
> +when an error occurs while emulating an instruction.  This allows userspace to
> +then take a look at the instruction and see if it is able to handle it more
> +gracefully than the in-kernel emulator.

As alluded to later in the thread, I don't think we should condition the extra
information on this capability.  By crafting the struct overlay to be backwards
compatibile, KVM can safely dump all the new information, even for old VMMs.
An old VMM may not programmatically use the data, but I suspect most VMMs at
least dump all info, e.g. QEMU does:

    if (kvm_check_extension(kvm_state, KVM_CAP_INTERNAL_ERROR_DATA)) {
        int i;

        for (i = 0; i < run->internal.ndata; ++i) {
            fprintf(stderr, "extra data[%d]: %"PRIx64"\n",
                    i, (uint64_t)run->internal.data[i]);
        }
    }

This would be a way to feed more info to the poor sod that has to debug
emulation failures :-)

> +
> +When this capability is enabled use the emulation_failure struct instead of the
> +internal struct for the exit struct.  They have the same layout, but the
> +emulation_failure struct matches the content better.

This documentation misses the arguably more important details of what exactly
"EXIT_ON_EMULATION_FAILURE" means.  E.g. it should call out the KVM still exits
on certain types (skip) even if this capability is not enabled, and that KVM
will _never_ exit if VMware #GP emulation fails.

> +
>  8. Other capabilities.
>  ======================
> @@ -7119,8 +7124,29 @@ void kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip)

...

>  }
>  EXPORT_SYMBOL_GPL(kvm_inject_realmode_interrupt);
>  
> +static void prepare_emulation_failure_exit(struct kvm_vcpu *vcpu)
> +{
> +	struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt;
> +	u64 insn_size = ctxt->fetch.end - ctxt->fetch.data;
> +	struct kvm *kvm = vcpu->kvm;
> +
> +	vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;

Grab vcpu->run in a local variable.

> +	vcpu->run->emulation_failure.suberror = KVM_INTERNAL_ERROR_EMULATION;
> +	vcpu->run->emulation_failure.ndata = 0;
> +	if (kvm->arch.exit_on_emulation_error && insn_size > 0) {

I definitely think this should not be conditioned on exit_on_emulation_error.

No need for "> 0", it's an unsigned value.

> +		vcpu->run->emulation_failure.ndata = 3;
> +		vcpu->run->emulation_failure.flags =

Flags needs to be zeroed when insn_size==0.  And use |= for new flags so that,
if we add new flags, the existing code doesn't need to be modified.

> +			KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES;
> +		vcpu->run->emulation_failure.insn_size = insn_size;
> +		memcpy(vcpu->run->emulation_failure.insn_bytes,
> +		       ctxt->fetch.data, sizeof(ctxt->fetch.data));

Doesn't truly matter, but I think it's less confusing to copy over insn_size
bytes.

> +	}


	...
	struct kvm_run *kvm_run = vcpu->run;

	kvm_run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
	kvm_run->emulation_failure.suberror = KVM_INTERNAL_ERROR_EMULATION;
	kvm_run->emulation_failure.ndata = 1;
	kvm_run->emulation_failure.flags = 0;

	if (insn_size) {
		kvm_run->emulation_failure.ndata = 3;
		kvm_run->emulation_failure.flags |=
			KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES;
		kvm_run->emulation_failure.insn_size = insn_size;
		memcpy(kvm_run->emulation_failure.insn_bytes, ctxt->fetch.data, insn_size);
	}

> +}

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] kvm: x86: Allow userspace to handle emulation errors
  2021-04-20 18:21           ` Sean Christopherson
@ 2021-04-21  8:00             ` David Edmondson
  0 siblings, 0 replies; 16+ messages in thread
From: David Edmondson @ 2021-04-21  8:00 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Aaron Lewis, Jim Mattson, kvm list

On Tuesday, 2021-04-20 at 18:21:42 GMT, Sean Christopherson wrote:

> On Tue, Apr 20, 2021, David Edmondson wrote:
>> 
>> With what you have now, the ndata field seems unnecessary - I should be
>> able to determine the contents of the rest of the structure based on the
>> flags.
>
> Keeping ndata is necessary if we piggyback KVM_INTERNAL_ERROR_EMULATION,
> otherwise we'll break for VMMs that are not aware of the new format.  E.g. if
> ndata gets stuffed with a large number, KVM could cause a buffer overrun in an
> old VMM.

Agreed.

>> That also suggests to me that using something other than
>> KVM_INTERNAL_ERROR_EMULATION would make sense.
>
> Like Aaron, I'm on the fence as to whether or not a new exit reason is in order.
> On one hand, it would be slightly cleaner.  On the other hand, the existing
> "KVM_INTERNAL_ERROR_EMULATION" really is the best name.  It implies nothing
> about the userspace VMM, only that KVM attempted to emulate an instruction and
> failed.
>
> The other motivation is that KVM can opportunistically start dumping extra info
> for old VMMs, though this patch does not do that; feedback imminent. :-)

It's nothing more than that the interface ends up feeling a little
strange. With several flags added and some of the earlier flags unused,
ndata ends up indicating the largest extent of the flag-indicated data,
but the earlier elements of the structure are unused. Hence the question
about how many flags we anticipate using simultaneously.

(I'm not really arguing that we should be packing the stuff in and
having to decode it, as that is also unpleasant.)

>> This comment:
>> 
>> >> >> > + * When using the suberror KVM_INTERNAL_ERROR_EMULATION, these flags are used
>> >> >> > + * to describe what is contained in the exit struct.  The flags are used to
>> >> >> > + * describe it's contents, and the contents should be in ascending numerical
>> >> >> > + * order of the flag values.  For example, if the flag
>> >> >> > + * KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES is set, the instruction
>> >> >> > + * length and instruction bytes would be expected to show up first because this
>> >> >> > + * flag has the lowest numerical value (1) of all the other flags.
>> 
>> originally made me think that the flag-indicated elements were going to
>> be packed into the remaining space of the structure at a position
>> depending on which flags are set.
>> 
>> For example, if I add a new flag
>> KVM_INTERNAL_ERROR_EMULATION_FLAG_EXIT_CODE, value 2, and then want to
>> pass back an exit code but *not* instruction bytes, the comment appears
>> to suggest that the exit code will appear immediately after the flags.
>> 
>> This is contradicted by your other reply:
>> 
>> >> > Just add the fields you need to
>> >> > the end of emulation_failure struct, increase 'ndata' to the new
>> >> > count, add a new flag to 'flags' so we know its contents.
>> 
>> Given this, the ordering of flag values does not seem significant - the
>> structure elements corresponding to a flag value will always be present,
>> just not filled with relevant data.
>
> I think what Aaron is trying to say is that the order in the aliased data[] is
> associated with the lowest _defined_ flag value, not the lowest _set_ flag.
>
> That said, I would just omit the "ascending numerical" stuff entirely, e.g. I
> think for the #defines, this will suffice:
>
> /* Flags that describe what fields in emulation_failure hold valid data  */

Agreed.

> As for not breaking userspace if/when additional fields are added, we can instead
> document the new struct (and drop my snarky comment :-D), e.g.:
>
> 		/*
> 		 * KVM_INTERNAL_ERROR_EMULATION
> 		 *
> 		 * "struct emulation_failure" is an overlay of "struct internal"
> 		 * that is used for the KVM_INTERNAL_ERROR_EMULATION sub-type of
> 		 * KVM_EXIT_INTERNAL_ERROR.  Note, unlike other internal error
> 		 * sub-types, this struct is ABI!  It also needs to be backwards
> 		 * compabile with "struct internal".  Take special care that
> 		 * "ndata" is correct, that new fields are enumerated in "flags",
> 		 * and that each flag enumerates fields that are 64-bit aligned
> 		 * and sized (so that ndata+internal.data[] is valid/accurate).
> 		 */
> 		struct {
> 			__u32 suberror;
> 			__u32 ndata;
> 			__u64 flags;
> 			__u8  insn_size;
> 			__u8  insn_bytes[15];
> 		} emulation_failure;

Looks good (even with the snark).

dme.
-- 
People in love get everything wrong.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] kvm: x86: Allow userspace to handle emulation errors
  2021-04-20 18:34 ` Sean Christopherson
@ 2021-04-21  8:39   ` David Edmondson
  2021-04-21 12:47     ` Aaron Lewis
  2021-04-21 16:26     ` Jim Mattson
  2021-04-21 16:31   ` Jim Mattson
  1 sibling, 2 replies; 16+ messages in thread
From: David Edmondson @ 2021-04-21  8:39 UTC (permalink / raw)
  To: Sean Christopherson, Aaron Lewis; +Cc: jmattson, kvm

On Tuesday, 2021-04-20 at 18:34:48 UTC, Sean Christopherson wrote:

> On Fri, Apr 16, 2021, Aaron Lewis wrote:
>> +			KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES;
>> +		vcpu->run->emulation_failure.insn_size = insn_size;
>> +		memcpy(vcpu->run->emulation_failure.insn_bytes,
>> +		       ctxt->fetch.data, sizeof(ctxt->fetch.data));
>
> Doesn't truly matter, but I think it's less confusing to copy over insn_size
> bytes.

And zero out the rest?

dme.
-- 
Woke up in my clothes again this morning, don't know exactly where I am.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] kvm: x86: Allow userspace to handle emulation errors
  2021-04-21  8:39   ` David Edmondson
@ 2021-04-21 12:47     ` Aaron Lewis
  2021-04-21 16:26     ` Jim Mattson
  1 sibling, 0 replies; 16+ messages in thread
From: Aaron Lewis @ 2021-04-21 12:47 UTC (permalink / raw)
  To: David Edmondson, Sean Christopherson; +Cc: Jim Mattson, kvm list

>> The other motivation is that KVM can opportunistically start dumping extra info
>> for old VMMs, though this patch does not do that; feedback imminent. :-)
>
> It's nothing more than that the interface ends up feeling a little
> strange. With several flags added and some of the earlier flags unused,
> ndata ends up indicating the largest extent of the flag-indicated data,
> but the earlier elements of the structure are unused. Hence the question
> about how many flags we anticipate using simultaneously.
>
> (I'm not really arguing that we should be packing the stuff in and
> having to decode it, as that is also unpleasant.)

It's hard to say how many flags will be used, but I suspect it will be
a small set (I mentioned previously a possibility of 2 more), so I
like Sean's suggestion to keep it simple and just use the flags to
indicate which fields are used, instead of trying to pack everything
in tightly.

On Wed, Apr 21, 2021 at 1:39 AM David Edmondson <dme@dme.org> wrote:
>
> On Tuesday, 2021-04-20 at 18:34:48 UTC, Sean Christopherson wrote:
>
> > On Fri, Apr 16, 2021, Aaron Lewis wrote:
> >> +                    KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES;
> >> +            vcpu->run->emulation_failure.insn_size = insn_size;
> >> +            memcpy(vcpu->run->emulation_failure.insn_bytes,
> >> +                   ctxt->fetch.data, sizeof(ctxt->fetch.data));
> >
> > Doesn't truly matter, but I think it's less confusing to copy over insn_size
> > bytes.
>
> And zero out the rest?
>

I left the memcpy alone as this seems like a more concervative / safer
approach.  I've had and have seen too many memcpy fails that my
preference is to be conservative here.  I could add a comment to
clarify any confusion that this may bring by not using insn_size if
that will help.


Cheers,
Aaron

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] kvm: x86: Allow userspace to handle emulation errors
  2021-04-21  8:39   ` David Edmondson
  2021-04-21 12:47     ` Aaron Lewis
@ 2021-04-21 16:26     ` Jim Mattson
  2021-04-21 17:01       ` David Edmondson
  1 sibling, 1 reply; 16+ messages in thread
From: Jim Mattson @ 2021-04-21 16:26 UTC (permalink / raw)
  To: David Edmondson; +Cc: Sean Christopherson, Aaron Lewis, kvm list

On Wed, Apr 21, 2021 at 1:39 AM David Edmondson <dme@dme.org> wrote:
>
> On Tuesday, 2021-04-20 at 18:34:48 UTC, Sean Christopherson wrote:
>
> > On Fri, Apr 16, 2021, Aaron Lewis wrote:
> >> +                    KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES;
> >> +            vcpu->run->emulation_failure.insn_size = insn_size;
> >> +            memcpy(vcpu->run->emulation_failure.insn_bytes,
> >> +                   ctxt->fetch.data, sizeof(ctxt->fetch.data));
> >
> > Doesn't truly matter, but I think it's less confusing to copy over insn_size
> > bytes.

> And zero out the rest?

Why zero? Since we're talking about an instruction stream, wouldn't
0x90 make more sense than zero?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] kvm: x86: Allow userspace to handle emulation errors
  2021-04-20 18:34 ` Sean Christopherson
  2021-04-21  8:39   ` David Edmondson
@ 2021-04-21 16:31   ` Jim Mattson
  1 sibling, 0 replies; 16+ messages in thread
From: Jim Mattson @ 2021-04-21 16:31 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Aaron Lewis, David Edmondson, kvm list

On Tue, Apr 20, 2021 at 11:34 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Fri, Apr 16, 2021, Aaron Lewis wrote:
...
> > +     vcpu->run->emulation_failure.suberror = KVM_INTERNAL_ERROR_EMULATION;
> > +     vcpu->run->emulation_failure.ndata = 0;
> > +     if (kvm->arch.exit_on_emulation_error && insn_size > 0) {
>
> I definitely think this should not be conditioned on exit_on_emulation_error.
>
> No need for "> 0", it's an unsigned value.

Unsigned doesn't imply non-zero. If insn_size is 0, then something is wrong.

...
> > +                     KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES;
> > +             vcpu->run->emulation_failure.insn_size = insn_size;
> > +             memcpy(vcpu->run->emulation_failure.insn_bytes,
> > +                    ctxt->fetch.data, sizeof(ctxt->fetch.data));
>
> Doesn't truly matter, but I think it's less confusing to copy over insn_size
> bytes.

Are you convinced that insn_size is always less than or equal to
sizeof(ctxt->fetch.data)? I'm not. It shouldn't be, of course, but
perhaps we should confirm that before copying insn_size bytes.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] kvm: x86: Allow userspace to handle emulation errors
  2021-04-21 16:26     ` Jim Mattson
@ 2021-04-21 17:01       ` David Edmondson
  2021-04-21 17:28         ` Jim Mattson
  0 siblings, 1 reply; 16+ messages in thread
From: David Edmondson @ 2021-04-21 17:01 UTC (permalink / raw)
  To: Jim Mattson; +Cc: Sean Christopherson, Aaron Lewis, kvm list

On Wednesday, 2021-04-21 at 09:26:34 -07, Jim Mattson wrote:

> On Wed, Apr 21, 2021 at 1:39 AM David Edmondson <dme@dme.org> wrote:
>>
>> On Tuesday, 2021-04-20 at 18:34:48 UTC, Sean Christopherson wrote:
>>
>> > On Fri, Apr 16, 2021, Aaron Lewis wrote:
>> >> +                    KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES;
>> >> +            vcpu->run->emulation_failure.insn_size = insn_size;
>> >> +            memcpy(vcpu->run->emulation_failure.insn_bytes,
>> >> +                   ctxt->fetch.data, sizeof(ctxt->fetch.data));
>> >
>> > Doesn't truly matter, but I think it's less confusing to copy over insn_size
>> > bytes.
>
>> And zero out the rest?
>
> Why zero? Since we're talking about an instruction stream, wouldn't
> 0x90 make more sense than zero?

I'm not sure if you are serious or not.

Zero-ing out the rest was intended to be to avoid leaking any previous
emulated instruction stream. If the user-level code wants to start
looking for instructions after insn_bytes[insn_size], they get what they
deserve.

dme.
-- 
We're deep in discussion, the party's on mute.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] kvm: x86: Allow userspace to handle emulation errors
  2021-04-21 17:01       ` David Edmondson
@ 2021-04-21 17:28         ` Jim Mattson
  0 siblings, 0 replies; 16+ messages in thread
From: Jim Mattson @ 2021-04-21 17:28 UTC (permalink / raw)
  To: David Edmondson; +Cc: Sean Christopherson, Aaron Lewis, kvm list

On Wed, Apr 21, 2021 at 10:01 AM David Edmondson <dme@dme.org> wrote:
>
> On Wednesday, 2021-04-21 at 09:26:34 -07, Jim Mattson wrote:
>
> > On Wed, Apr 21, 2021 at 1:39 AM David Edmondson <dme@dme.org> wrote:
> >>
> >> On Tuesday, 2021-04-20 at 18:34:48 UTC, Sean Christopherson wrote:
> >>
> >> > On Fri, Apr 16, 2021, Aaron Lewis wrote:
> >> >> +                    KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES;
> >> >> +            vcpu->run->emulation_failure.insn_size = insn_size;
> >> >> +            memcpy(vcpu->run->emulation_failure.insn_bytes,
> >> >> +                   ctxt->fetch.data, sizeof(ctxt->fetch.data));
> >> >
> >> > Doesn't truly matter, but I think it's less confusing to copy over insn_size
> >> > bytes.
> >
> >> And zero out the rest?
> >
> > Why zero? Since we're talking about an instruction stream, wouldn't
> > 0x90 make more sense than zero?
>
> I'm not sure if you are serious or not.
>
> Zero-ing out the rest was intended to be to avoid leaking any previous
> emulated instruction stream. If the user-level code wants to start
> looking for instructions after insn_bytes[insn_size], they get what they
> deserve.

If we limit the copy to insn_size bytes, the only leak is what was
already present in the vcpu->run structure before this, which may or
may not be a previous instruction stream, and which has been readable
by userspace all along.

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, back to index

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-16 13:18 [PATCH 1/2] kvm: x86: Allow userspace to handle emulation errors Aaron Lewis
2021-04-16 13:18 ` [PATCH 2/2] selftests: kvm: Allows " Aaron Lewis
2021-04-19 12:41 ` [PATCH 1/2] kvm: x86: Allow " David Edmondson
2021-04-19 16:47   ` Aaron Lewis
2021-04-20  7:21     ` David Edmondson
2021-04-20 14:57       ` Aaron Lewis
2021-04-20 16:53         ` David Edmondson
2021-04-20 18:21           ` Sean Christopherson
2021-04-21  8:00             ` David Edmondson
2021-04-20 18:34 ` Sean Christopherson
2021-04-21  8:39   ` David Edmondson
2021-04-21 12:47     ` Aaron Lewis
2021-04-21 16:26     ` Jim Mattson
2021-04-21 17:01       ` David Edmondson
2021-04-21 17:28         ` Jim Mattson
2021-04-21 16:31   ` Jim Mattson

KVM Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/kvm/0 kvm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 kvm kvm/ https://lore.kernel.org/kvm \
		kvm@vger.kernel.org
	public-inbox-index kvm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.kvm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git