All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/3] kvm: x86: Convey the exit reason, etc. to user-space on emulation failure
@ 2021-07-29 13:39 David Edmondson
  2021-07-29 13:39 ` [PATCH v3 1/3] KVM: x86: kvm_x86_ops.get_exit_info should include the exit reason David Edmondson
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: David Edmondson @ 2021-07-29 13:39 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Joerg Roedel, Ingo Molnar, Jim Mattson, kvm,
	Borislav Petkov, David Matlack, Paolo Bonzini, H. Peter Anvin,
	x86, Wanpeng Li, Vitaly Kuznetsov, Sean Christopherson,
	David Edmondson

To help when debugging failures in the field, if instruction emulation
fails, report the VM exit reason to userspace in order that it can be
recorded.

Sean: hopefully this is something like what you intended. If not,
please clarify and I will have another go. The lack of an ABI for the
debug data does feel messy.

The SGX changes here are compiled but untested.

v3:
- Convey any debug data un-flagged after the ABI specified data in
  struct emulation_failure (Sean)
- Obey the ABI protocol in sgx_handle_emulation_failure() (Sean)

v2:
- Improve patch comments (dmatlock)
- Intel should provide the full exit reason (dmatlock)
- Pass a boolean rather than flags (dmatlock)
- Use the helper in kvm_task_switch() and kvm_handle_memory_failure()
  (dmatlock)
- Describe the exit_reason field of the emulation_failure structure
  (dmatlock)

David Edmondson (3):
  KVM: x86: kvm_x86_ops.get_exit_info should include the exit reason
  KVM: x86: On emulation failure, convey the exit reason, etc. to
    userspace
  KVM: x86: SGX must obey the KVM_INTERNAL_ERROR_EMULATION protocol

 arch/x86/include/asm/kvm_host.h | 12 ++++++--
 arch/x86/kvm/svm/svm.c          |  8 +++--
 arch/x86/kvm/trace.h            | 11 +++----
 arch/x86/kvm/vmx/nested.c       |  2 +-
 arch/x86/kvm/vmx/sgx.c          |  8 ++---
 arch/x86/kvm/vmx/vmx.c          | 11 ++++---
 arch/x86/kvm/x86.c              | 53 ++++++++++++++++++++++++++-------
 include/uapi/linux/kvm.h        |  7 +++++
 8 files changed, 79 insertions(+), 33 deletions(-)

-- 
2.30.2


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v3 1/3] KVM: x86: kvm_x86_ops.get_exit_info should include the exit reason
  2021-07-29 13:39 [PATCH v3 0/3] kvm: x86: Convey the exit reason, etc. to user-space on emulation failure David Edmondson
@ 2021-07-29 13:39 ` David Edmondson
  2021-07-29 22:27   ` Sean Christopherson
  2021-07-29 13:39 ` [PATCH v3 2/3] KVM: x86: On emulation failure, convey the exit reason, etc. to userspace David Edmondson
  2021-07-29 13:39 ` [PATCH v3 3/3] KVM: x86: SGX must obey the KVM_INTERNAL_ERROR_EMULATION protocol David Edmondson
  2 siblings, 1 reply; 13+ messages in thread
From: David Edmondson @ 2021-07-29 13:39 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Joerg Roedel, Ingo Molnar, Jim Mattson, kvm,
	Borislav Petkov, David Matlack, Paolo Bonzini, H. Peter Anvin,
	x86, Wanpeng Li, Vitaly Kuznetsov, Sean Christopherson,
	David Edmondson

Extend the get_exit_info static call to provide the reason for the VM
exit. Modify relevant trace points to use this rather than extracting
the reason in the caller.

Signed-off-by: David Edmondson <david.edmondson@oracle.com>
---
 arch/x86/include/asm/kvm_host.h |  7 ++++---
 arch/x86/kvm/svm/svm.c          |  8 +++++---
 arch/x86/kvm/trace.h            | 11 ++++++-----
 arch/x86/kvm/vmx/nested.c       |  2 +-
 arch/x86/kvm/vmx/vmx.c          |  6 ++++--
 5 files changed, 20 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 974cbfb1eefe..dfb902930cdc 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1377,10 +1377,11 @@ struct kvm_x86_ops {
 	void (*write_tsc_multiplier)(struct kvm_vcpu *vcpu, u64 multiplier);
 
 	/*
-	 * Retrieve somewhat arbitrary exit information.  Intended to be used
-	 * only from within tracepoints to avoid VMREADs when tracing is off.
+	 * Retrieve somewhat arbitrary exit information.  Intended to
+	 * be used only from within tracepoints or error paths.
 	 */
-	void (*get_exit_info)(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2,
+	void (*get_exit_info)(struct kvm_vcpu *vcpu, u64 *reason,
+			      u64 *info1, u64 *info2,
 			      u32 *exit_int_info, u32 *exit_int_info_err_code);
 
 	int (*check_intercept)(struct kvm_vcpu *vcpu,
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 664d20f0689c..e5c4354dcc6f 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3301,11 +3301,13 @@ int svm_invoke_exit_handler(struct kvm_vcpu *vcpu, u64 exit_code)
 	return svm_exit_handlers[exit_code](vcpu);
 }
 
-static void svm_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2,
+static void svm_get_exit_info(struct kvm_vcpu *vcpu, u64 *reason,
+			      u64 *info1, u64 *info2,
 			      u32 *intr_info, u32 *error_code)
 {
 	struct vmcb_control_area *control = &to_svm(vcpu)->vmcb->control;
 
+	*reason = control->exit_code;
 	*info1 = control->exit_info_1;
 	*info2 = control->exit_info_2;
 	*intr_info = control->exit_int_info;
@@ -3322,7 +3324,7 @@ static int handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 	struct kvm_run *kvm_run = vcpu->run;
 	u32 exit_code = svm->vmcb->control.exit_code;
 
-	trace_kvm_exit(exit_code, vcpu, KVM_ISA_SVM);
+	trace_kvm_exit(vcpu, KVM_ISA_SVM);
 
 	/* SEV-ES guests must use the CR write traps to track CR registers. */
 	if (!sev_es_guest(vcpu->kvm)) {
@@ -3335,7 +3337,7 @@ static int handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 	if (is_guest_mode(vcpu)) {
 		int vmexit;
 
-		trace_kvm_nested_vmexit(exit_code, vcpu, KVM_ISA_SVM);
+		trace_kvm_nested_vmexit(vcpu, KVM_ISA_SVM);
 
 		vmexit = nested_svm_exit_special(svm);
 
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index b484141ea15b..2228565beda2 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -273,11 +273,11 @@ TRACE_EVENT(kvm_apic,
 
 #define TRACE_EVENT_KVM_EXIT(name)					     \
 TRACE_EVENT(name,							     \
-	TP_PROTO(unsigned int exit_reason, struct kvm_vcpu *vcpu, u32 isa),  \
-	TP_ARGS(exit_reason, vcpu, isa),				     \
+	TP_PROTO(struct kvm_vcpu *vcpu, u32 isa),			     \
+	TP_ARGS(vcpu, isa),						     \
 									     \
 	TP_STRUCT__entry(						     \
-		__field(	unsigned int,	exit_reason	)	     \
+		__field(	u64,		exit_reason	)	     \
 		__field(	unsigned long,	guest_rip	)	     \
 		__field(	u32,	        isa             )	     \
 		__field(	u64,	        info1           )	     \
@@ -288,11 +288,12 @@ TRACE_EVENT(name,							     \
 	),								     \
 									     \
 	TP_fast_assign(							     \
-		__entry->exit_reason	= exit_reason;			     \
 		__entry->guest_rip	= kvm_rip_read(vcpu);		     \
 		__entry->isa            = isa;				     \
 		__entry->vcpu_id        = vcpu->vcpu_id;		     \
-		static_call(kvm_x86_get_exit_info)(vcpu, &__entry->info1,    \
+		static_call(kvm_x86_get_exit_info)(vcpu,		     \
+					  &__entry->exit_reason,	     \
+					  &__entry->info1,		     \
 					  &__entry->info2,		     \
 					  &__entry->intr_info,		     \
 					  &__entry->error_code);	     \
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 1a52134b0c42..fbbc01e9570b 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -6025,7 +6025,7 @@ bool nested_vmx_reflect_vmexit(struct kvm_vcpu *vcpu)
 		goto reflect_vmexit;
 	}
 
-	trace_kvm_nested_vmexit(exit_reason.full, vcpu, KVM_ISA_VMX);
+	trace_kvm_nested_vmexit(vcpu, KVM_ISA_VMX);
 
 	/* If L0 (KVM) wants the exit, it trumps L1's desires. */
 	if (nested_vmx_l0_wants_exit(vcpu, exit_reason))
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 927a552393b9..fefdecb0ff3d 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5617,11 +5617,13 @@ static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 static const int kvm_vmx_max_exit_handlers =
 	ARRAY_SIZE(kvm_vmx_exit_handlers);
 
-static void vmx_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2,
+static void vmx_get_exit_info(struct kvm_vcpu *vcpu, u64 *reason,
+			      u64 *info1, u64 *info2,
 			      u32 *intr_info, u32 *error_code)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 
+	*reason = vmx->exit_reason.full;
 	*info1 = vmx_get_exit_qual(vcpu);
 	if (!(vmx->exit_reason.failed_vmentry)) {
 		*info2 = vmx->idt_vectoring_info;
@@ -6748,7 +6750,7 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	if (likely(!vmx->exit_reason.failed_vmentry))
 		vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD);
 
-	trace_kvm_exit(vmx->exit_reason.full, vcpu, KVM_ISA_VMX);
+	trace_kvm_exit(vcpu, KVM_ISA_VMX);
 
 	if (unlikely(vmx->exit_reason.failed_vmentry))
 		return EXIT_FASTPATH_NONE;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v3 2/3] KVM: x86: On emulation failure, convey the exit reason, etc. to userspace
  2021-07-29 13:39 [PATCH v3 0/3] kvm: x86: Convey the exit reason, etc. to user-space on emulation failure David Edmondson
  2021-07-29 13:39 ` [PATCH v3 1/3] KVM: x86: kvm_x86_ops.get_exit_info should include the exit reason David Edmondson
@ 2021-07-29 13:39 ` David Edmondson
  2021-07-30 22:14   ` Sean Christopherson
  2021-07-29 13:39 ` [PATCH v3 3/3] KVM: x86: SGX must obey the KVM_INTERNAL_ERROR_EMULATION protocol David Edmondson
  2 siblings, 1 reply; 13+ messages in thread
From: David Edmondson @ 2021-07-29 13:39 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Joerg Roedel, Ingo Molnar, Jim Mattson, kvm,
	Borislav Petkov, David Matlack, Paolo Bonzini, H. Peter Anvin,
	x86, Wanpeng Li, Vitaly Kuznetsov, Sean Christopherson,
	David Edmondson, Joao Martins

Should instruction emulation fail, include the VM exit reason, etc. in
the emulation_failure data passed to userspace, in order that the VMM
can report it as a debugging aid when describing the failure.

Suggested-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Edmondson <david.edmondson@oracle.com>
---
 arch/x86/include/asm/kvm_host.h |  5 ++++
 arch/x86/kvm/vmx/vmx.c          |  5 +---
 arch/x86/kvm/x86.c              | 53 ++++++++++++++++++++++++++-------
 include/uapi/linux/kvm.h        |  7 +++++
 4 files changed, 56 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index dfb902930cdc..17da43c1aa67 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1630,6 +1630,11 @@ extern u64 kvm_mce_cap_supported;
 int kvm_emulate_instruction(struct kvm_vcpu *vcpu, int emulation_type);
 int kvm_emulate_instruction_from_buffer(struct kvm_vcpu *vcpu,
 					void *insn, int insn_len);
+void kvm_prepare_emulation_failure_exit(struct kvm_vcpu *vcpu,
+					bool instruction_bytes,
+					void *data, unsigned int ndata);
+void kvm_prepare_emulation_failure_exit_with_reason(struct kvm_vcpu *vcpu,
+						    bool instruction_bytes);
 
 void kvm_enable_efer_bits(u64);
 bool kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index fefdecb0ff3d..a8d303c7c099 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5367,10 +5367,7 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
 
 		if (vmx->emulation_required && !vmx->rmode.vm86_active &&
 		    vcpu->arch.exception.pending) {
-			vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
-			vcpu->run->internal.suberror =
-						KVM_INTERNAL_ERROR_EMULATION;
-			vcpu->run->internal.ndata = 0;
+			kvm_prepare_emulation_failure_exit_with_reason(vcpu, false);
 			return 0;
 		}
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a4fd10604f72..a97bacd8922f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7456,7 +7456,9 @@ void kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip)
 }
 EXPORT_SYMBOL_GPL(kvm_inject_realmode_interrupt);
 
-static void prepare_emulation_failure_exit(struct kvm_vcpu *vcpu)
+void kvm_prepare_emulation_failure_exit(struct kvm_vcpu *vcpu,
+					bool instruction_bytes,
+					void *data, unsigned int ndata)
 {
 	struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt;
 	u32 insn_size = ctxt->fetch.end - ctxt->fetch.data;
@@ -7464,10 +7466,10 @@ static void prepare_emulation_failure_exit(struct kvm_vcpu *vcpu)
 
 	run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
 	run->emulation_failure.suberror = KVM_INTERNAL_ERROR_EMULATION;
-	run->emulation_failure.ndata = 0;
+	run->emulation_failure.ndata = 1; /* Always include the flags. */
 	run->emulation_failure.flags = 0;
 
-	if (insn_size) {
+	if (instruction_bytes && insn_size) {
 		run->emulation_failure.ndata = 3;
 		run->emulation_failure.flags |=
 			KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES;
@@ -7477,7 +7479,42 @@ static void prepare_emulation_failure_exit(struct kvm_vcpu *vcpu)
 		memcpy(run->emulation_failure.insn_bytes,
 		       ctxt->fetch.data, insn_size);
 	}
+
+	ndata = min((size_t)ndata, sizeof(run->internal.data) -
+		    run->emulation_failure.ndata * sizeof(u64));
+	if (ndata) {
+		unsigned int offset =
+			offsetof(struct kvm_run, emulation_failure.flags) +
+			run->emulation_failure.ndata * sizeof(u64);
+
+		memcpy((void *)run + offset, data, ndata);
+		run->emulation_failure.ndata += ndata / sizeof(u64);
+	}
+}
+EXPORT_SYMBOL_GPL(kvm_prepare_emulation_failure_exit);
+
+void kvm_prepare_emulation_failure_exit_with_reason(struct kvm_vcpu *vcpu,
+						    bool instruction_bytes)
+{
+	struct {
+		__u64 exit_reason;
+		__u64 exit_info1;
+		__u64 exit_info2;
+		__u32 intr_info;
+		__u32 error_code;
+	} exit_reason;
+
+	static_call(kvm_x86_get_exit_info)(vcpu,
+					   &exit_reason.exit_reason,
+					   &exit_reason.exit_info1,
+					   &exit_reason.exit_info2,
+					   &exit_reason.intr_info,
+					   &exit_reason.error_code);
+
+	kvm_prepare_emulation_failure_exit(vcpu, instruction_bytes,
+					   &exit_reason, sizeof(exit_reason));
 }
+EXPORT_SYMBOL_GPL(kvm_prepare_emulation_failure_exit_with_reason);
 
 static int handle_emulation_failure(struct kvm_vcpu *vcpu, int emulation_type)
 {
@@ -7493,16 +7530,14 @@ static int handle_emulation_failure(struct kvm_vcpu *vcpu, int emulation_type)
 
 	if (kvm->arch.exit_on_emulation_error ||
 	    (emulation_type & EMULTYPE_SKIP)) {
-		prepare_emulation_failure_exit(vcpu);
+		kvm_prepare_emulation_failure_exit_with_reason(vcpu, true);
 		return 0;
 	}
 
 	kvm_queue_exception(vcpu, UD_VECTOR);
 
 	if (!is_guest_mode(vcpu) && static_call(kvm_x86_get_cpl)(vcpu) == 0) {
-		vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
-		vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
-		vcpu->run->internal.ndata = 0;
+		kvm_prepare_emulation_failure_exit_with_reason(vcpu, false);
 		return 0;
 	}
 
@@ -12095,9 +12130,7 @@ int kvm_handle_memory_failure(struct kvm_vcpu *vcpu, int r,
 	 * doesn't seem to be a real use-case behind such requests, just return
 	 * KVM_EXIT_INTERNAL_ERROR for now.
 	 */
-	vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
-	vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
-	vcpu->run->internal.ndata = 0;
+	kvm_prepare_emulation_failure_exit_with_reason(vcpu, false);
 
 	return 0;
 }
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index d9e4aabcb31a..f1ef4117b824 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -397,6 +397,12 @@ struct kvm_run {
 		 * "ndata" is correct, that new fields are enumerated in "flags",
 		 * and that each flag enumerates fields that are 64-bit aligned
 		 * and sized (so that ndata+internal.data[] is valid/accurate).
+		 *
+		 * Space beyond the defined fields may be used to
+		 * store arbitrary debug information relating to the
+		 * emulation failure. It is accounted for in "ndata"
+		 * but otherwise unspecified and is not represented in
+		 * "flags".
 		 */
 		struct {
 			__u32 suberror;
@@ -404,6 +410,7 @@ struct kvm_run {
 			__u64 flags;
 			__u8  insn_size;
 			__u8  insn_bytes[15];
+			/* Arbitrary debug data may follow. */
 		} emulation_failure;
 		/* KVM_EXIT_OSI */
 		struct {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v3 3/3] KVM: x86: SGX must obey the KVM_INTERNAL_ERROR_EMULATION protocol
  2021-07-29 13:39 [PATCH v3 0/3] kvm: x86: Convey the exit reason, etc. to user-space on emulation failure David Edmondson
  2021-07-29 13:39 ` [PATCH v3 1/3] KVM: x86: kvm_x86_ops.get_exit_info should include the exit reason David Edmondson
  2021-07-29 13:39 ` [PATCH v3 2/3] KVM: x86: On emulation failure, convey the exit reason, etc. to userspace David Edmondson
@ 2021-07-29 13:39 ` David Edmondson
  2021-07-30 22:17   ` Sean Christopherson
  2 siblings, 1 reply; 13+ messages in thread
From: David Edmondson @ 2021-07-29 13:39 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Joerg Roedel, Ingo Molnar, Jim Mattson, kvm,
	Borislav Petkov, David Matlack, Paolo Bonzini, H. Peter Anvin,
	x86, Wanpeng Li, Vitaly Kuznetsov, Sean Christopherson,
	David Edmondson

When passing the failing address and size out to user space, SGX must
ensure not to trample on the earlier fields of the emulation_failure
sub-union of struct kvm_run.

Signed-off-by: David Edmondson <david.edmondson@oracle.com>
---
 arch/x86/kvm/vmx/sgx.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx/sgx.c b/arch/x86/kvm/vmx/sgx.c
index 6693ebdc0770..63fb93163383 100644
--- a/arch/x86/kvm/vmx/sgx.c
+++ b/arch/x86/kvm/vmx/sgx.c
@@ -53,11 +53,9 @@ static int sgx_get_encls_gva(struct kvm_vcpu *vcpu, unsigned long offset,
 static void sgx_handle_emulation_failure(struct kvm_vcpu *vcpu, u64 addr,
 					 unsigned int size)
 {
-	vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
-	vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
-	vcpu->run->internal.ndata = 2;
-	vcpu->run->internal.data[0] = addr;
-	vcpu->run->internal.data[1] = size;
+	uint64_t data[2] = { addr, size };
+
+	kvm_prepare_emulation_failure_exit(vcpu, false, data, sizeof(data));
 }
 
 static int sgx_read_hva(struct kvm_vcpu *vcpu, unsigned long hva, void *data,
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 1/3] KVM: x86: kvm_x86_ops.get_exit_info should include the exit reason
  2021-07-29 13:39 ` [PATCH v3 1/3] KVM: x86: kvm_x86_ops.get_exit_info should include the exit reason David Edmondson
@ 2021-07-29 22:27   ` Sean Christopherson
  2021-07-30  7:29     ` David Edmondson
  0 siblings, 1 reply; 13+ messages in thread
From: Sean Christopherson @ 2021-07-29 22:27 UTC (permalink / raw)
  To: David Edmondson
  Cc: linux-kernel, Thomas Gleixner, Joerg Roedel, Ingo Molnar,
	Jim Mattson, kvm, Borislav Petkov, David Matlack, Paolo Bonzini,
	H. Peter Anvin, x86, Wanpeng Li, Vitaly Kuznetsov

Shortlog is a bit odd, "should" is subjective and makes this sound like a bug fix.

  KVM: x86: Get exit_reason as part of kvm_x86_ops.get_exit_info

On Thu, Jul 29, 2021, David Edmondson wrote:
> Extend the get_exit_info static call to provide the reason for the VM
> exit. Modify relevant trace points to use this rather than extracting
> the reason in the caller.
> 
> Signed-off-by: David Edmondson <david.edmondson@oracle.com>
> ---
> -static void svm_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2,
> +static void svm_get_exit_info(struct kvm_vcpu *vcpu, u64 *reason,
> +			      u64 *info1, u64 *info2,
>  			      u32 *intr_info, u32 *error_code)
>  {
>  	struct vmcb_control_area *control = &to_svm(vcpu)->vmcb->control;
>  
> +	*reason = control->exit_code;
>  	*info1 = control->exit_info_1;
>  	*info2 = control->exit_info_2;
>  	*intr_info = control->exit_int_info;

...

> diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
> index b484141ea15b..2228565beda2 100644
> --- a/arch/x86/kvm/trace.h
> +++ b/arch/x86/kvm/trace.h
> @@ -273,11 +273,11 @@ TRACE_EVENT(kvm_apic,
>  
>  #define TRACE_EVENT_KVM_EXIT(name)					     \
>  TRACE_EVENT(name,							     \
> -	TP_PROTO(unsigned int exit_reason, struct kvm_vcpu *vcpu, u32 isa),  \
> -	TP_ARGS(exit_reason, vcpu, isa),				     \
> +	TP_PROTO(struct kvm_vcpu *vcpu, u32 isa),			     \
> +	TP_ARGS(vcpu, isa),						     \
>  									     \
>  	TP_STRUCT__entry(						     \
> -		__field(	unsigned int,	exit_reason	)	     \
> +		__field(	u64,		exit_reason	)	     \

Converting to a u64 is unnecessary and misleading.  vmcs.EXIT_REASON and
vmcb.EXIT_CODE are both u32s, a.k.a. unsigned ints.  There is vmcb.EXIT_CODE_HI,
but that's not being included, and AFAICT isn't even sanity checked by KVM.

>  		__field(	unsigned long,	guest_rip	)	     \
>  		__field(	u32,	        isa             )	     \
>  		__field(	u64,	        info1           )	     \

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 1/3] KVM: x86: kvm_x86_ops.get_exit_info should include the exit reason
  2021-07-29 22:27   ` Sean Christopherson
@ 2021-07-30  7:29     ` David Edmondson
  0 siblings, 0 replies; 13+ messages in thread
From: David Edmondson @ 2021-07-30  7:29 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: linux-kernel, Thomas Gleixner, Joerg Roedel, Ingo Molnar,
	Jim Mattson, kvm, Borislav Petkov, David Matlack, Paolo Bonzini,
	H. Peter Anvin, x86, Wanpeng Li, Vitaly Kuznetsov

On Thursday, 2021-07-29 at 22:27:28 GMT, Sean Christopherson wrote:

> Shortlog is a bit odd, "should" is subjective and makes this sound like a bug fix.
>
>   KVM: x86: Get exit_reason as part of kvm_x86_ops.get_exit_info

Okay.

> On Thu, Jul 29, 2021, David Edmondson wrote:
>> Extend the get_exit_info static call to provide the reason for the VM
>> exit. Modify relevant trace points to use this rather than extracting
>> the reason in the caller.
>> 
>> Signed-off-by: David Edmondson <david.edmondson@oracle.com>
>> ---
>> -static void svm_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2,
>> +static void svm_get_exit_info(struct kvm_vcpu *vcpu, u64 *reason,
>> +			      u64 *info1, u64 *info2,
>>  			      u32 *intr_info, u32 *error_code)
>>  {
>>  	struct vmcb_control_area *control = &to_svm(vcpu)->vmcb->control;
>>  
>> +	*reason = control->exit_code;
>>  	*info1 = control->exit_info_1;
>>  	*info2 = control->exit_info_2;
>>  	*intr_info = control->exit_int_info;
>
> ...
>
>> diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
>> index b484141ea15b..2228565beda2 100644
>> --- a/arch/x86/kvm/trace.h
>> +++ b/arch/x86/kvm/trace.h
>> @@ -273,11 +273,11 @@ TRACE_EVENT(kvm_apic,
>>  
>>  #define TRACE_EVENT_KVM_EXIT(name)					     \
>>  TRACE_EVENT(name,							     \
>> -	TP_PROTO(unsigned int exit_reason, struct kvm_vcpu *vcpu, u32 isa),  \
>> -	TP_ARGS(exit_reason, vcpu, isa),				     \
>> +	TP_PROTO(struct kvm_vcpu *vcpu, u32 isa),			     \
>> +	TP_ARGS(vcpu, isa),						     \
>>  									     \
>>  	TP_STRUCT__entry(						     \
>> -		__field(	unsigned int,	exit_reason	)	     \
>> +		__field(	u64,		exit_reason	)	     \
>
> Converting to a u64 is unnecessary and misleading.  vmcs.EXIT_REASON and
> vmcb.EXIT_CODE are both u32s, a.k.a. unsigned ints.  There is vmcb.EXIT_CODE_HI,
> but that's not being included, and AFAICT isn't even sanity checked by KVM.

Thanks for pointing this out, I can only blame brain fade.

>>  		__field(	unsigned long,	guest_rip	)	     \
>>  		__field(	u32,	        isa             )	     \
>>  		__field(	u64,	        info1           )	     \

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 2/3] KVM: x86: On emulation failure, convey the exit reason, etc. to userspace
  2021-07-29 13:39 ` [PATCH v3 2/3] KVM: x86: On emulation failure, convey the exit reason, etc. to userspace David Edmondson
@ 2021-07-30 22:14   ` Sean Christopherson
  2021-08-02  7:28     ` David Edmondson
  0 siblings, 1 reply; 13+ messages in thread
From: Sean Christopherson @ 2021-07-30 22:14 UTC (permalink / raw)
  To: David Edmondson
  Cc: linux-kernel, Thomas Gleixner, Joerg Roedel, Ingo Molnar,
	Jim Mattson, kvm, Borislav Petkov, David Matlack, Paolo Bonzini,
	H. Peter Anvin, x86, Wanpeng Li, Vitaly Kuznetsov, Joao Martins

On Thu, Jul 29, 2021, David Edmondson wrote:
> Should instruction emulation fail, include the VM exit reason, etc. in
> the emulation_failure data passed to userspace, in order that the VMM
> can report it as a debugging aid when describing the failure.
> 
> Suggested-by: Joao Martins <joao.m.martins@oracle.com>
> Signed-off-by: David Edmondson <david.edmondson@oracle.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  5 ++++
>  arch/x86/kvm/vmx/vmx.c          |  5 +---
>  arch/x86/kvm/x86.c              | 53 ++++++++++++++++++++++++++-------
>  include/uapi/linux/kvm.h        |  7 +++++
>  4 files changed, 56 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index dfb902930cdc..17da43c1aa67 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1630,6 +1630,11 @@ extern u64 kvm_mce_cap_supported;
>  int kvm_emulate_instruction(struct kvm_vcpu *vcpu, int emulation_type);
>  int kvm_emulate_instruction_from_buffer(struct kvm_vcpu *vcpu,
>  					void *insn, int insn_len);
> +void kvm_prepare_emulation_failure_exit(struct kvm_vcpu *vcpu,
> +					bool instruction_bytes,
> +					void *data, unsigned int ndata);
> +void kvm_prepare_emulation_failure_exit_with_reason(struct kvm_vcpu *vcpu,
> +						    bool instruction_bytes);
>  
>  void kvm_enable_efer_bits(u64);
>  bool kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer);
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index fefdecb0ff3d..a8d303c7c099 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -5367,10 +5367,7 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
>  
>  		if (vmx->emulation_required && !vmx->rmode.vm86_active &&
>  		    vcpu->arch.exception.pending) {
> -			vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
> -			vcpu->run->internal.suberror =
> -						KVM_INTERNAL_ERROR_EMULATION;
> -			vcpu->run->internal.ndata = 0;
> +			kvm_prepare_emulation_failure_exit_with_reason(vcpu, false);
>  			return 0;
>  		}
>  
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index a4fd10604f72..a97bacd8922f 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7456,7 +7456,9 @@ void kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip)
>  }
>  EXPORT_SYMBOL_GPL(kvm_inject_realmode_interrupt);
>  
> -static void prepare_emulation_failure_exit(struct kvm_vcpu *vcpu)
> +void kvm_prepare_emulation_failure_exit(struct kvm_vcpu *vcpu,
> +					bool instruction_bytes,
> +					void *data, unsigned int ndata)

'data' should be a 'u64 *', and 'ndata' should be a 'u8' that is actual ndata as
opposed to the size.  The obvious alternative would be to rename ndata to size,
but IMO that's unnecessarily complex, it's much easier to force the caller to work
with u64s.  That also reduces the probablity of KVM mangling data[] by dumping
unrelated fields into a single data[] entry, or by leaving stale chunks (if the
incoming data is not a multiple of 8 bytes).

>  {
>  	struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt;
>  	u32 insn_size = ctxt->fetch.end - ctxt->fetch.data;
> @@ -7464,10 +7466,10 @@ static void prepare_emulation_failure_exit(struct kvm_vcpu *vcpu)
>  
>  	run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
>  	run->emulation_failure.suberror = KVM_INTERNAL_ERROR_EMULATION;
> -	run->emulation_failure.ndata = 0;
> +	run->emulation_failure.ndata = 1; /* Always include the flags. */
>  	run->emulation_failure.flags = 0;
>  
> -	if (insn_size) {
> +	if (instruction_bytes && insn_size) {
>  		run->emulation_failure.ndata = 3;
>  		run->emulation_failure.flags |=
>  			KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES;
> @@ -7477,7 +7479,42 @@ static void prepare_emulation_failure_exit(struct kvm_vcpu *vcpu)
>  		memcpy(run->emulation_failure.insn_bytes,
>  		       ctxt->fetch.data, insn_size);
>  	}
> +
> +	ndata = min((size_t)ndata, sizeof(run->internal.data) -
> +		    run->emulation_failure.ndata * sizeof(u64));
> +	if (ndata) {
> +		unsigned int offset =
> +			offsetof(struct kvm_run, emulation_failure.flags) +
> +			run->emulation_failure.ndata * sizeof(u64);
> +
> +		memcpy((void *)run + offset, data, ndata);
> +		run->emulation_failure.ndata += ndata / sizeof(u64);
> +	}
> +}
> +EXPORT_SYMBOL_GPL(kvm_prepare_emulation_failure_exit);

NAK on exporting this particular helper, it consumes vcpu->arch.emulate_ctxt,
which ideally wouldn't even be visible to vendor code (stupid SVM erratum).  The
emulation context will rarely, if ever, be valid if this is called from vendor code.

The SGX call is effectively guarded by instruction_bytes=false, but that's a
messy approach as the handle_emulation_failure() patch is the _only_ case where
emulate_ctxt can be valid.

> +void kvm_prepare_emulation_failure_exit_with_reason(struct kvm_vcpu *vcpu,
> +						    bool instruction_bytes)
> +{
> +	struct {
> +		__u64 exit_reason;

As mentioned in the prior patch, exit_reason is probably best kept to a u32 at
this time.

> +		__u64 exit_info1;
> +		__u64 exit_info2;
> +		__u32 intr_info;
> +		__u32 error_code;
> +	} exit_reason;

Oooh, you're dumping all the fields in kvm_run.  That took me forever to realize
because the struct is named "exit_reason".  Unless there's a naming conflict,
'data' would be the simplest, and if that's already taken, maybe 'info'?

I'm also not sure an anonymous struct is going to be the easiest to maintain.
I do like that the fields all have names, but on the other hand the data should
be padded so that each field is in its own data[] entry when dumped to userspace.
IMO, the padding complexity isn't worth the naming niceness since this code
doesn't actually care about what each field contains.

> +
> +	static_call(kvm_x86_get_exit_info)(vcpu,
> +					   &exit_reason.exit_reason,
> +					   &exit_reason.exit_info1,
> +					   &exit_reason.exit_info2,
> +					   &exit_reason.intr_info,
> +					   &exit_reason.error_code);
> +
> +	kvm_prepare_emulation_failure_exit(vcpu, instruction_bytes,
> +					   &exit_reason, sizeof(exit_reason));

Retrieiving the exit reason and info should probably be in the inner helper, the
only case where the info will be stale is the VMX !unrestricted_guest
handle_invalid_guest_state() path, but even then I think the last VM-Exit info
would be interesting/relevant.  That should allow for a cleaner API too.

This is what I came up with after a fair bit of massaging.  The get_exit_info()
call is beyond gross, but I still think I like it more than a struct?  A
build-time assertion that the struct size is a multiple of 8 would allay my
concerns over leaking stack state to userspace, so I'm not totally opposed to it.

	struct {
		u32 exit_reason;
		u32 pad1;
		u64 info1;
		u64 info2;
		u32 intr_info;
		u32 pad2;
		u32 error_code;
		u32 pad3;
	} info;
	u64 ninfo = sizeof(info) / sizeof(u64);

	BUILD_BUG_ON(sizeof(info) % sizeof(u64));

	/*
	 * Zero the whole struct used to retrieve the exit info to ensure the
	 * padding does not leak stack data to userspace.
	 */
	memset(&info, 0, sizeof(info));

	static_call(kvm_x86_get_exit_info)(vcpu, &info.exit_reason,
					   &info.info1, &info.info2,
					   &info.intr_info, &info.error_code);



void __kvm_prepare_emulation_failure_exit(struct kvm_vcpu *vcpu, u64 *data,
					  u8 ndata);
void kvm_prepare_emulation_failure_exit(struct kvm_vcpu *vcpu);

static void prepare_emulation_failure_exit(struct kvm_vcpu *vcpu, u64 *data,
					   u8 ndata, u8 *insn_bytes, u8 insn_size)
{
	struct kvm_run *run = vcpu->run;
	u8 ndata_start;
	u64 info[5];

	/*
	 * Zero the whole array used to retrieve the exit info, casting to u32
	 * for select entries will leave some chunks uninitialized.
	 */
	memset(&info, 0, sizeof(info));

	static_call(kvm_x86_get_exit_info)(vcpu, (u32 *)&info[0], &info[1],
					   &info[2], (u32 *)&info[3],
					   (u32 *)&info[4]);

	run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
	run->emulation_failure.suberror = KVM_INTERNAL_ERROR_EMULATION;

	/*
	 * There's currently space for 13 entries, but 5 are used for the exit
	 * reason and info.  Restrict to 4 to reduce the maintenance burden
	 * when expanding kvm_run.emulation_failure in the future.
	 */
	if (WARN_ON_ONCE(ndata > 4))
		ndata = 4;

	if (insn_size) {
		ndata_start = 3;
		run->emulation_failure.flags =
			KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES;
		run->emulation_failure.insn_size = insn_size;
		memset(run->emulation_failure.insn_bytes, 0x90,
		       sizeof(run->emulation_failure.insn_bytes));
		memcpy(run->emulation_failure.insn_bytes, insn_bytes, insn_size);
	} else {
		/* Always include the flags as a 'data' entry. */
		ndata_start = 1;
		run->emulation_failure.flags = 0;
	}

	memcpy(&run->internal.data[ndata_start], info, ARRAY_SIZE(info));
	memcpy(&run->internal.data[ndata_start + ARRAY_SIZE(info)], data, ndata);
}

static void prepare_emulation_ctxt_failure_exit(struct kvm_vcpu *vcpu)
{
	struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt;

	prepare_emulation_failure_exit(vcpu, NULL, 0, ctxt->fetch.data,
				       ctxt->fetch.end - ctxt->fetch.data);
}

void __kvm_prepare_emulation_failure_exit(struct kvm_vcpu *vcpu, u64 *data,
					  u8 ndata)
{
	prepare_emulation_failure_exit(vcpu, data, ndata, NULL, 0);
}
EXPORT_SYMBOL_GPL(__kvm_prepare_emulation_failure_exit);

void kvm_prepare_emulation_failure_exit(struct kvm_vcpu *vcpu)
{
	__kvm_prepare_emulation_failure_exit(vcpu, NULL, 0);
}
EXPORT_SYMBOL_GPL(kvm_prepare_emulation_failure_exit);

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 3/3] KVM: x86: SGX must obey the KVM_INTERNAL_ERROR_EMULATION protocol
  2021-07-29 13:39 ` [PATCH v3 3/3] KVM: x86: SGX must obey the KVM_INTERNAL_ERROR_EMULATION protocol David Edmondson
@ 2021-07-30 22:17   ` Sean Christopherson
  2021-08-02  7:18     ` David Edmondson
  0 siblings, 1 reply; 13+ messages in thread
From: Sean Christopherson @ 2021-07-30 22:17 UTC (permalink / raw)
  To: David Edmondson
  Cc: linux-kernel, Thomas Gleixner, Joerg Roedel, Ingo Molnar,
	Jim Mattson, kvm, Borislav Petkov, David Matlack, Paolo Bonzini,
	H. Peter Anvin, x86, Wanpeng Li, Vitaly Kuznetsov

On Thu, Jul 29, 2021, David Edmondson wrote:
> When passing the failing address and size out to user space, SGX must
> ensure not to trample on the earlier fields of the emulation_failure
> sub-union of struct kvm_run.
> 
> Signed-off-by: David Edmondson <david.edmondson@oracle.com>
> ---
>  arch/x86/kvm/vmx/sgx.c | 8 +++-----
>  1 file changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/sgx.c b/arch/x86/kvm/vmx/sgx.c
> index 6693ebdc0770..63fb93163383 100644
> --- a/arch/x86/kvm/vmx/sgx.c
> +++ b/arch/x86/kvm/vmx/sgx.c
> @@ -53,11 +53,9 @@ static int sgx_get_encls_gva(struct kvm_vcpu *vcpu, unsigned long offset,
>  static void sgx_handle_emulation_failure(struct kvm_vcpu *vcpu, u64 addr,
>  					 unsigned int size)
>  {
> -	vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
> -	vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
> -	vcpu->run->internal.ndata = 2;
> -	vcpu->run->internal.data[0] = addr;
> -	vcpu->run->internal.data[1] = size;
> +	uint64_t data[2] = { addr, size };
> +
> +	kvm_prepare_emulation_failure_exit(vcpu, false, data, sizeof(data));

Assuming we go with my suggestion to have kvm_prepare_emulation_failure_exit()
capture the exit reason/info, it's probably worth converting all the
KVM_EXIT_INTERNAL_ERROR paths in sgx.c, even though the others don't clobber flags.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 3/3] KVM: x86: SGX must obey the KVM_INTERNAL_ERROR_EMULATION protocol
  2021-07-30 22:17   ` Sean Christopherson
@ 2021-08-02  7:18     ` David Edmondson
  0 siblings, 0 replies; 13+ messages in thread
From: David Edmondson @ 2021-08-02  7:18 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: linux-kernel, Thomas Gleixner, Joerg Roedel, Ingo Molnar,
	Jim Mattson, kvm, Borislav Petkov, David Matlack, Paolo Bonzini,
	H. Peter Anvin, x86, Wanpeng Li, Vitaly Kuznetsov

On Friday, 2021-07-30 at 22:17:02 GMT, Sean Christopherson wrote:

> On Thu, Jul 29, 2021, David Edmondson wrote:
>> When passing the failing address and size out to user space, SGX must
>> ensure not to trample on the earlier fields of the emulation_failure
>> sub-union of struct kvm_run.
>> 
>> Signed-off-by: David Edmondson <david.edmondson@oracle.com>
>> ---
>>  arch/x86/kvm/vmx/sgx.c | 8 +++-----
>>  1 file changed, 3 insertions(+), 5 deletions(-)
>> 
>> diff --git a/arch/x86/kvm/vmx/sgx.c b/arch/x86/kvm/vmx/sgx.c
>> index 6693ebdc0770..63fb93163383 100644
>> --- a/arch/x86/kvm/vmx/sgx.c
>> +++ b/arch/x86/kvm/vmx/sgx.c
>> @@ -53,11 +53,9 @@ static int sgx_get_encls_gva(struct kvm_vcpu *vcpu, unsigned long offset,
>>  static void sgx_handle_emulation_failure(struct kvm_vcpu *vcpu, u64 addr,
>>  					 unsigned int size)
>>  {
>> -	vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
>> -	vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
>> -	vcpu->run->internal.ndata = 2;
>> -	vcpu->run->internal.data[0] = addr;
>> -	vcpu->run->internal.data[1] = size;
>> +	uint64_t data[2] = { addr, size };
>> +
>> +	kvm_prepare_emulation_failure_exit(vcpu, false, data, sizeof(data));
>
> Assuming we go with my suggestion to have kvm_prepare_emulation_failure_exit()
> capture the exit reason/info, it's probably worth converting all the
> KVM_EXIT_INTERNAL_ERROR paths in sgx.c, even though the others don't clobber flags.

Okay.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 2/3] KVM: x86: On emulation failure, convey the exit reason, etc. to userspace
  2021-07-30 22:14   ` Sean Christopherson
@ 2021-08-02  7:28     ` David Edmondson
  2021-08-02 16:58       ` Sean Christopherson
  0 siblings, 1 reply; 13+ messages in thread
From: David Edmondson @ 2021-08-02  7:28 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: linux-kernel, Thomas Gleixner, Joerg Roedel, Ingo Molnar,
	Jim Mattson, kvm, Borislav Petkov, David Matlack, Paolo Bonzini,
	H. Peter Anvin, x86, Wanpeng Li, Vitaly Kuznetsov, Joao Martins

On Friday, 2021-07-30 at 22:14:48 GMT, Sean Christopherson wrote:

> On Thu, Jul 29, 2021, David Edmondson wrote:
>> Should instruction emulation fail, include the VM exit reason, etc. in
>> the emulation_failure data passed to userspace, in order that the VMM
>> can report it as a debugging aid when describing the failure.
>> 
>> Suggested-by: Joao Martins <joao.m.martins@oracle.com>
>> Signed-off-by: David Edmondson <david.edmondson@oracle.com>
>> ---
>>  arch/x86/include/asm/kvm_host.h |  5 ++++
>>  arch/x86/kvm/vmx/vmx.c          |  5 +---
>>  arch/x86/kvm/x86.c              | 53 ++++++++++++++++++++++++++-------
>>  include/uapi/linux/kvm.h        |  7 +++++
>>  4 files changed, 56 insertions(+), 14 deletions(-)
>> 
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index dfb902930cdc..17da43c1aa67 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -1630,6 +1630,11 @@ extern u64 kvm_mce_cap_supported;
>>  int kvm_emulate_instruction(struct kvm_vcpu *vcpu, int emulation_type);
>>  int kvm_emulate_instruction_from_buffer(struct kvm_vcpu *vcpu,
>>  					void *insn, int insn_len);
>> +void kvm_prepare_emulation_failure_exit(struct kvm_vcpu *vcpu,
>> +					bool instruction_bytes,
>> +					void *data, unsigned int ndata);
>> +void kvm_prepare_emulation_failure_exit_with_reason(struct kvm_vcpu *vcpu,
>> +						    bool instruction_bytes);
>>  
>>  void kvm_enable_efer_bits(u64);
>>  bool kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer);
>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> index fefdecb0ff3d..a8d303c7c099 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -5367,10 +5367,7 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
>>  
>>  		if (vmx->emulation_required && !vmx->rmode.vm86_active &&
>>  		    vcpu->arch.exception.pending) {
>> -			vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
>> -			vcpu->run->internal.suberror =
>> -						KVM_INTERNAL_ERROR_EMULATION;
>> -			vcpu->run->internal.ndata = 0;
>> +			kvm_prepare_emulation_failure_exit_with_reason(vcpu, false);
>>  			return 0;
>>  		}
>>  
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index a4fd10604f72..a97bacd8922f 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -7456,7 +7456,9 @@ void kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip)
>>  }
>>  EXPORT_SYMBOL_GPL(kvm_inject_realmode_interrupt);
>>  
>> -static void prepare_emulation_failure_exit(struct kvm_vcpu *vcpu)
>> +void kvm_prepare_emulation_failure_exit(struct kvm_vcpu *vcpu,
>> +					bool instruction_bytes,
>> +					void *data, unsigned int ndata)
>
> 'data' should be a 'u64 *', and 'ndata' should be a 'u8' that is actual ndata as
> opposed to the size.

Agreed that this will be better.

> The obvious alternative would be to rename ndata to size, but IMO
> that's unnecessarily complex, it's much easier to force the caller to
> work with u64s.  That also reduces the probablity of KVM mangling
> data[] by dumping unrelated fields into a single data[] entry, or by
> leaving stale chunks (if the incoming data is not a multiple of 8
> bytes).
>
>>  {
>>  	struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt;
>>  	u32 insn_size = ctxt->fetch.end - ctxt->fetch.data;
>> @@ -7464,10 +7466,10 @@ static void prepare_emulation_failure_exit(struct kvm_vcpu *vcpu)
>>  
>>  	run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
>>  	run->emulation_failure.suberror = KVM_INTERNAL_ERROR_EMULATION;
>> -	run->emulation_failure.ndata = 0;
>> +	run->emulation_failure.ndata = 1; /* Always include the flags. */
>>  	run->emulation_failure.flags = 0;
>>  
>> -	if (insn_size) {
>> +	if (instruction_bytes && insn_size) {
>>  		run->emulation_failure.ndata = 3;
>>  		run->emulation_failure.flags |=
>>  			KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES;
>> @@ -7477,7 +7479,42 @@ static void prepare_emulation_failure_exit(struct kvm_vcpu *vcpu)
>>  		memcpy(run->emulation_failure.insn_bytes,
>>  		       ctxt->fetch.data, insn_size);
>>  	}
>> +
>> +	ndata = min((size_t)ndata, sizeof(run->internal.data) -
>> +		    run->emulation_failure.ndata * sizeof(u64));
>> +	if (ndata) {
>> +		unsigned int offset =
>> +			offsetof(struct kvm_run, emulation_failure.flags) +
>> +			run->emulation_failure.ndata * sizeof(u64);
>> +
>> +		memcpy((void *)run + offset, data, ndata);
>> +		run->emulation_failure.ndata += ndata / sizeof(u64);
>> +	}
>> +}
>> +EXPORT_SYMBOL_GPL(kvm_prepare_emulation_failure_exit);
>
> NAK on exporting this particular helper, it consumes vcpu->arch.emulate_ctxt,
> which ideally wouldn't even be visible to vendor code (stupid SVM erratum).  The
> emulation context will rarely, if ever, be valid if this is called from vendor code.
>
> The SGX call is effectively guarded by instruction_bytes=false, but that's a
> messy approach as the handle_emulation_failure() patch is the _only_ case where
> emulate_ctxt can be valid.

Your changes to address this seem sensible.

>> +void kvm_prepare_emulation_failure_exit_with_reason(struct kvm_vcpu *vcpu,
>> +						    bool instruction_bytes)
>> +{
>> +	struct {
>> +		__u64 exit_reason;
>
> As mentioned in the prior patch, exit_reason is probably best kept to a u32 at
> this time.

Ack.

>> +		__u64 exit_info1;
>> +		__u64 exit_info2;
>> +		__u32 intr_info;
>> +		__u32 error_code;
>> +	} exit_reason;
>
> Oooh, you're dumping all the fields in kvm_run.  That took me forever to realize
> because the struct is named "exit_reason".  Unless there's a naming conflict,
> 'data' would be the simplest, and if that's already taken, maybe 'info'?
>
> I'm also not sure an anonymous struct is going to be the easiest to maintain.
> I do like that the fields all have names, but on the other hand the data should
> be padded so that each field is in its own data[] entry when dumped to userspace.
> IMO, the padding complexity isn't worth the naming niceness since this code
> doesn't actually care about what each field contains.

Given that this is avowedly not an ABI and that we are expecting any
(human) consumer to be intimate with the implementation to make sense of
it, is there really any requirement or need for padding?

In your example below (most of which I'm fine with), the padding has the
effect of wasting space that could be used for another u64 of debug
data.

>> +
>> +	static_call(kvm_x86_get_exit_info)(vcpu,
>> +					   &exit_reason.exit_reason,
>> +					   &exit_reason.exit_info1,
>> +					   &exit_reason.exit_info2,
>> +					   &exit_reason.intr_info,
>> +					   &exit_reason.error_code);
>> +
>> +	kvm_prepare_emulation_failure_exit(vcpu, instruction_bytes,
>> +					   &exit_reason, sizeof(exit_reason));
>
> Retrieiving the exit reason and info should probably be in the inner helper, the
> only case where the info will be stale is the VMX !unrestricted_guest
> handle_invalid_guest_state() path, but even then I think the last VM-Exit info
> would be interesting/relevant.  That should allow for a cleaner API too.

Makes sense, thanks.

> This is what I came up with after a fair bit of massaging.  The get_exit_info()
> call is beyond gross, but I still think I like it more than a struct?  A
> build-time assertion that the struct size is a multiple of 8 would allay my
> concerns over leaking stack state to userspace, so I'm not totally opposed to it.
>
> 	struct {
> 		u32 exit_reason;
> 		u32 pad1;
> 		u64 info1;
> 		u64 info2;
> 		u32 intr_info;
> 		u32 pad2;
> 		u32 error_code;
> 		u32 pad3;
> 	} info;
> 	u64 ninfo = sizeof(info) / sizeof(u64);
>
> 	BUILD_BUG_ON(sizeof(info) % sizeof(u64));
>
> 	/*
> 	 * Zero the whole struct used to retrieve the exit info to ensure the
> 	 * padding does not leak stack data to userspace.
> 	 */
> 	memset(&info, 0, sizeof(info));
>
> 	static_call(kvm_x86_get_exit_info)(vcpu, &info.exit_reason,
> 					   &info.info1, &info.info2,
> 					   &info.intr_info, &info.error_code);
>
>
>
> void __kvm_prepare_emulation_failure_exit(struct kvm_vcpu *vcpu, u64 *data,
> 					  u8 ndata);
> void kvm_prepare_emulation_failure_exit(struct kvm_vcpu *vcpu);
>
> static void prepare_emulation_failure_exit(struct kvm_vcpu *vcpu, u64 *data,
> 					   u8 ndata, u8 *insn_bytes, u8 insn_size)
> {
> 	struct kvm_run *run = vcpu->run;
> 	u8 ndata_start;
> 	u64 info[5];
>
> 	/*
> 	 * Zero the whole array used to retrieve the exit info, casting to u32
> 	 * for select entries will leave some chunks uninitialized.
> 	 */
> 	memset(&info, 0, sizeof(info));
>
> 	static_call(kvm_x86_get_exit_info)(vcpu, (u32 *)&info[0], &info[1],
> 					   &info[2], (u32 *)&info[3],
> 					   (u32 *)&info[4]);
>
> 	run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
> 	run->emulation_failure.suberror = KVM_INTERNAL_ERROR_EMULATION;
>
> 	/*
> 	 * There's currently space for 13 entries, but 5 are used for the exit
> 	 * reason and info.  Restrict to 4 to reduce the maintenance burden
> 	 * when expanding kvm_run.emulation_failure in the future.
> 	 */
> 	if (WARN_ON_ONCE(ndata > 4))
> 		ndata = 4;
>
> 	if (insn_size) {
> 		ndata_start = 3;
> 		run->emulation_failure.flags =
> 			KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES;
> 		run->emulation_failure.insn_size = insn_size;
> 		memset(run->emulation_failure.insn_bytes, 0x90,
> 		       sizeof(run->emulation_failure.insn_bytes));
> 		memcpy(run->emulation_failure.insn_bytes, insn_bytes, insn_size);
> 	} else {
> 		/* Always include the flags as a 'data' entry. */
> 		ndata_start = 1;
> 		run->emulation_failure.flags = 0;
> 	}

When we add another flag (presuming that we do, because if not there was
not much point in the flags) this will have to be restructured again. Is
there an objection to the original style? (prime ndata=1, flags=0, OR in
flags and adjust ndata as we go.)

> 	memcpy(&run->internal.data[ndata_start], info, ARRAY_SIZE(info));
> 	memcpy(&run->internal.data[ndata_start + ARRAY_SIZE(info)], data, ndata);
> }
>
> static void prepare_emulation_ctxt_failure_exit(struct kvm_vcpu *vcpu)
> {
> 	struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt;
>
> 	prepare_emulation_failure_exit(vcpu, NULL, 0, ctxt->fetch.data,
> 				       ctxt->fetch.end - ctxt->fetch.data);
> }
>
> void __kvm_prepare_emulation_failure_exit(struct kvm_vcpu *vcpu, u64 *data,
> 					  u8 ndata)
> {
> 	prepare_emulation_failure_exit(vcpu, data, ndata, NULL, 0);
> }
> EXPORT_SYMBOL_GPL(__kvm_prepare_emulation_failure_exit);
>
> void kvm_prepare_emulation_failure_exit(struct kvm_vcpu *vcpu)
> {
> 	__kvm_prepare_emulation_failure_exit(vcpu, NULL, 0);
> }
> EXPORT_SYMBOL_GPL(kvm_prepare_emulation_failure_exit);

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 2/3] KVM: x86: On emulation failure, convey the exit reason, etc. to userspace
  2021-08-02  7:28     ` David Edmondson
@ 2021-08-02 16:58       ` Sean Christopherson
  2021-08-02 17:23         ` David Edmondson
  0 siblings, 1 reply; 13+ messages in thread
From: Sean Christopherson @ 2021-08-02 16:58 UTC (permalink / raw)
  To: David Edmondson
  Cc: linux-kernel, Thomas Gleixner, Joerg Roedel, Ingo Molnar,
	Jim Mattson, kvm, Borislav Petkov, David Matlack, Paolo Bonzini,
	H. Peter Anvin, x86, Wanpeng Li, Vitaly Kuznetsov, Joao Martins

On Mon, Aug 02, 2021, David Edmondson wrote:
> On Friday, 2021-07-30 at 22:14:48 GMT, Sean Christopherson wrote:
> 
> > On Thu, Jul 29, 2021, David Edmondson wrote:
> >> +		__u64 exit_info1;
> >> +		__u64 exit_info2;
> >> +		__u32 intr_info;
> >> +		__u32 error_code;
> >> +	} exit_reason;
> >
> > Oooh, you're dumping all the fields in kvm_run.  That took me forever to realize
> > because the struct is named "exit_reason".  Unless there's a naming conflict,
> > 'data' would be the simplest, and if that's already taken, maybe 'info'?
> >
> > I'm also not sure an anonymous struct is going to be the easiest to maintain.
> > I do like that the fields all have names, but on the other hand the data should
> > be padded so that each field is in its own data[] entry when dumped to userspace.
> > IMO, the padding complexity isn't worth the naming niceness since this code
> > doesn't actually care about what each field contains.
> 
> Given that this is avowedly not an ABI and that we are expecting any
> (human) consumer to be intimate with the implementation to make sense of
> it, is there really any requirement or need for padding?

My thought with the padding was to force each field into its own data[] entry.
E.g. if userspace does something like

	for (i = 0; i < ndata; i++)
		printf("\tdata[%d] = 0x%llx\n", i, data[i]);

then padding will yield

	data[0] = flags
	data[1] = exit_reason
	data[2] = exit_info1
	data[3] = exit_info2
	data[4] = intr_info
	data[5] = error_code

versus

	data[0] = <flags>
	data[1] = (exit_info1 << 32) | exit_reason
	data[2] = (exit_info2 << 32) | (exit_info1 >> 32)
	data[3] = (intr_info << 32) | (exit_info2 >> 32)
	data[4] = error_code

Changing exit_reason to a u64 would clean up the worst of the mangling, but until
there's actually a 64-bit exit reason to dump, that's just a more subtle way to
pad the data.

> In your example below (most of which I'm fine with), the padding has the
> effect of wasting space that could be used for another u64 of debug
> data.

Yes, but because it's not ABI, we can change it in the future if we get to the
point where we want to dump more info and don't have space.  Until that time, I
think it makes sense to prioritize readability with an ignorant (of the format)
userspace over memory footprint.

> > 	/*
> > 	 * There's currently space for 13 entries, but 5 are used for the exit
> > 	 * reason and info.  Restrict to 4 to reduce the maintenance burden
> > 	 * when expanding kvm_run.emulation_failure in the future.
> > 	 */
> > 	if (WARN_ON_ONCE(ndata > 4))
> > 		ndata = 4;
> >
> > 	if (insn_size) {
> > 		ndata_start = 3;
> > 		run->emulation_failure.flags =
> > 			KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES;
> > 		run->emulation_failure.insn_size = insn_size;
> > 		memset(run->emulation_failure.insn_bytes, 0x90,
> > 		       sizeof(run->emulation_failure.insn_bytes));
> > 		memcpy(run->emulation_failure.insn_bytes, insn_bytes, insn_size);
> > 	} else {
> > 		/* Always include the flags as a 'data' entry. */
> > 		ndata_start = 1;
> > 		run->emulation_failure.flags = 0;
> > 	}
> 
> When we add another flag (presuming that we do, because if not there was
> not much point in the flags) this will have to be restructured again. Is
> there an objection to the original style? (prime ndata=1, flags=0, OR in
> flags and adjust ndata as we go.)

No objection, though if you OR in flags then you should truly _adjust_ ndata, not
set it, e.g.

        /* Always include the flags as a 'data' entry. */
        ndata_start = 1;
        run->emulation_failure.flags = 0;

        if (insn_size) {
                ndata_start += 2;  <----------------------- Adjust, not override
                run->emulation_failure.flags |=
                        KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES;
                run->emulation_failure.insn_size = insn_size;
                memset(run->emulation_failure.insn_bytes, 0x90,
                       sizeof(run->emulation_failure.insn_bytes));
                memcpy(run->emulation_failure.insn_bytes, insn_bytes, insn_size);
        }

> > 	memcpy(&run->internal.data[ndata_start], info, ARRAY_SIZE(info));
> > 	memcpy(&run->internal.data[ndata_start + ARRAY_SIZE(info)], data, ndata);
> > }

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 2/3] KVM: x86: On emulation failure, convey the exit reason, etc. to userspace
  2021-08-02 16:58       ` Sean Christopherson
@ 2021-08-02 17:23         ` David Edmondson
  2021-08-07  0:59           ` Sean Christopherson
  0 siblings, 1 reply; 13+ messages in thread
From: David Edmondson @ 2021-08-02 17:23 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: linux-kernel, Thomas Gleixner, Joerg Roedel, Ingo Molnar,
	Jim Mattson, kvm, Borislav Petkov, David Matlack, Paolo Bonzini,
	H. Peter Anvin, x86, Wanpeng Li, Vitaly Kuznetsov, Joao Martins

On Monday, 2021-08-02 at 16:58:03 GMT, Sean Christopherson wrote:

> On Mon, Aug 02, 2021, David Edmondson wrote:
>> On Friday, 2021-07-30 at 22:14:48 GMT, Sean Christopherson wrote:
>> 
>> > On Thu, Jul 29, 2021, David Edmondson wrote:
>> >> +		__u64 exit_info1;
>> >> +		__u64 exit_info2;
>> >> +		__u32 intr_info;
>> >> +		__u32 error_code;
>> >> +	} exit_reason;
>> >
>> > Oooh, you're dumping all the fields in kvm_run.  That took me forever to realize
>> > because the struct is named "exit_reason".  Unless there's a naming conflict,
>> > 'data' would be the simplest, and if that's already taken, maybe 'info'?
>> >
>> > I'm also not sure an anonymous struct is going to be the easiest to maintain.
>> > I do like that the fields all have names, but on the other hand the data should
>> > be padded so that each field is in its own data[] entry when dumped to userspace.
>> > IMO, the padding complexity isn't worth the naming niceness since this code
>> > doesn't actually care about what each field contains.
>> 
>> Given that this is avowedly not an ABI and that we are expecting any
>> (human) consumer to be intimate with the implementation to make sense of
>> it, is there really any requirement or need for padding?
>
> My thought with the padding was to force each field into its own data[] entry.
> E.g. if userspace does something like
>
> 	for (i = 0; i < ndata; i++)
> 		printf("\tdata[%d] = 0x%llx\n", i, data[i]);
>
> then padding will yield
>
> 	data[0] = flags
> 	data[1] = exit_reason
> 	data[2] = exit_info1
> 	data[3] = exit_info2
> 	data[4] = intr_info
> 	data[5] = error_code
>
> versus
>
> 	data[0] = <flags>
> 	data[1] = (exit_info1 << 32) | exit_reason
> 	data[2] = (exit_info2 << 32) | (exit_info1 >> 32)
> 	data[3] = (intr_info << 32) | (exit_info2 >> 32)
> 	data[4] = error_code
>
> Changing exit_reason to a u64 would clean up the worst of the mangling, but until
> there's actually a 64-bit exit reason to dump, that's just a more subtle way to
> pad the data.

Unnecessarily extending exit_reason to u64 would be bad, I agree.

>> In your example below (most of which I'm fine with), the padding has the
>> effect of wasting space that could be used for another u64 of debug
>> data.
>
> Yes, but because it's not ABI, we can change it in the future if we get to the
> point where we want to dump more info and don't have space.  Until that time, I
> think it makes sense to prioritize readability with an ignorant (of the format)
> userspace over memory footprint.

This seems reasonable.

>> > 	/*
>> > 	 * There's currently space for 13 entries, but 5 are used for the exit
>> > 	 * reason and info.  Restrict to 4 to reduce the maintenance burden
>> > 	 * when expanding kvm_run.emulation_failure in the future.
>> > 	 */
>> > 	if (WARN_ON_ONCE(ndata > 4))
>> > 		ndata = 4;
>> >
>> > 	if (insn_size) {
>> > 		ndata_start = 3;
>> > 		run->emulation_failure.flags =
>> > 			KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES;
>> > 		run->emulation_failure.insn_size = insn_size;
>> > 		memset(run->emulation_failure.insn_bytes, 0x90,
>> > 		       sizeof(run->emulation_failure.insn_bytes));
>> > 		memcpy(run->emulation_failure.insn_bytes, insn_bytes, insn_size);
>> > 	} else {
>> > 		/* Always include the flags as a 'data' entry. */
>> > 		ndata_start = 1;
>> > 		run->emulation_failure.flags = 0;
>> > 	}
>> 
>> When we add another flag (presuming that we do, because if not there was
>> not much point in the flags) this will have to be restructured again. Is
>> there an objection to the original style? (prime ndata=1, flags=0, OR in
>> flags and adjust ndata as we go.)
>
> No objection, though if you OR in flags then you should truly _adjust_ ndata, not
> set it, e.g.

My understanding of Aaron's intent is that this would not be the
case.

That is, if we add another flag with payload and set that flag, we would
still have space for the instruction stream in data[] even if
KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES is not set.

Given that, we must *set* ndata each time we add in a flag, with the
value being the extent of data[] used by the payload corresponding to
that flag, and the flags must be considered in ascending order (or we
remember a "max" along the way).

Dumping the arbitray debug data after the defined fields would require
adjusting ndata, of course.

If this is not the case, and the flag indicated payloads are packed at
the head of data[], then the current structure definition is misleading
and we should perhaps revise it.

>         /* Always include the flags as a 'data' entry. */
>         ndata_start = 1;
>         run->emulation_failure.flags = 0;
>
>         if (insn_size) {
>                 ndata_start += 2;  <----------------------- Adjust, not override
>                 run->emulation_failure.flags |=
>                         KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES;
>                 run->emulation_failure.insn_size = insn_size;
>                 memset(run->emulation_failure.insn_bytes, 0x90,
>                        sizeof(run->emulation_failure.insn_bytes));
>                 memcpy(run->emulation_failure.insn_bytes, insn_bytes, insn_size);
>         }
>
>> > 	memcpy(&run->internal.data[ndata_start], info, ARRAY_SIZE(info));
>> > 	memcpy(&run->internal.data[ndata_start + ARRAY_SIZE(info)], data, ndata);
>> > }

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 2/3] KVM: x86: On emulation failure, convey the exit reason, etc. to userspace
  2021-08-02 17:23         ` David Edmondson
@ 2021-08-07  0:59           ` Sean Christopherson
  0 siblings, 0 replies; 13+ messages in thread
From: Sean Christopherson @ 2021-08-07  0:59 UTC (permalink / raw)
  To: David Edmondson
  Cc: linux-kernel, Thomas Gleixner, Joerg Roedel, Ingo Molnar,
	Jim Mattson, kvm, Borislav Petkov, David Matlack, Paolo Bonzini,
	H. Peter Anvin, x86, Wanpeng Li, Vitaly Kuznetsov, Joao Martins

On Mon, Aug 02, 2021, David Edmondson wrote:
> On Monday, 2021-08-02 at 16:58:03 GMT, Sean Christopherson wrote:
> 
> > On Mon, Aug 02, 2021, David Edmondson wrote:
> >> On Friday, 2021-07-30 at 22:14:48 GMT, Sean Christopherson wrote:
> >> When we add another flag (presuming that we do, because if not there was
> >> not much point in the flags) this will have to be restructured again. Is
> >> there an objection to the original style? (prime ndata=1, flags=0, OR in
> >> flags and adjust ndata as we go.)
> >
> > No objection, though if you OR in flags then you should truly _adjust_ ndata, not
> > set it, e.g.
> 
> My understanding of Aaron's intent is that this would not be the case.
> 
> That is, if we add another flag with payload and set that flag, we would
> still have space for the instruction stream in data[] even if
> KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES is not set.

Hmm, I don't think we have to make that decision yet.  Userspace must check the
flag before consuming run->emulation_failure, so we haven't fully commited one
way or the other.

I believe the original thought was indeed to skip unused fields, but I don't think
that's actually the behavior we want for completely unrelated fields, i.e. flag
combinations that will _never_ be valid together.  The only reason to skip fields
would be to keep the offset of a particular field constant, so I think the rule
can be that if fields that can coexist but are controlled by different flags, they
must be in the same anonymous struct, but in general a union is ok.

It seems rather unlikely that we'll gain many more flags, but it would be really
unfortunate if we commit to skipping fields and then run out of space because of
that decision.

> Given that, we must *set* ndata each time we add in a flag

Technically we wouldn't have to (being more than a bit pedantic), we could keep
the same flow and just do the ndata_start bump outside of the OR path, e.g.

        /* Always include the flags as a 'data' entry. */
        ndata_start = 1;
        run->emulation_failure.flags = 0;

        /* Skip unused fields instead of overloading them when they're not used. */
        ndata_start += 2;
        if (insn_size) {
                run->emulation_failure.flags |=
                        KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES;
                run->emulation_failure.insn_size = insn_size;
                memset(run->emulation_failure.insn_bytes, 0x90,
                       sizeof(run->emulation_failure.insn_bytes));
                memcpy(run->emulation_failure.insn_bytes, insn_bytes, insn_size);
        }

so that it's easier to understand the magic numbers used to adjust ndata_start.

But a better solution would be to have no magic numbers at all and set ndata_start
via sizeof(run->emulation_failure).  E.g.

	/*
	 * There's currently space for 13 entries, but 5 are used for the exit
	 * reason and info.  Restrict to 4 to reduce the maintenance burden
	 * when expanding kvm_run.emulation_failure in the future.
	 */
	if (WARN_ON_ONCE(ndata > 4))
		ndata = 4;

	ndata_start = sizeof(run->emulation_failure);
	memcpy(&run->internal.data[], info, ARRAY_SIZE(info));
	memcpy(&run->internal.data[ndata_start + ARRAY_SIZE(info)], data, ndata);

	run->internal.ndata = ndata_start + ARRAY_SIZE(info) + ndata;

Though I'd prefer we not skip fields, at least not yet, e.g. to condition userspace
to do the right thing if we decide to not skip when adding a second flag (if that
even ever happens).

> with the value being the extent of data[] used by the payload corresponding
> to that flag, and the flags must be considered in ascending order (or we
> remember a "max" along the way).
> 
> Dumping the arbitray debug data after the defined fields would require
> adjusting ndata, of course.
> 
> If this is not the case, and the flag indicated payloads are packed at
> the head of data[], then the current structure definition is misleading
> and we should perhaps revise it.

Ah, because there's no wrapping union.  I wouldn't object to something like this
to hint to userpace that the struct layout may not be purely additive in the
future.

diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h
index d9e4aabcb31a..6c79c1ce3703 100644
--- a/tools/include/uapi/linux/kvm.h
+++ b/tools/include/uapi/linux/kvm.h
@@ -402,8 +402,12 @@ struct kvm_run {
                        __u32 suberror;
                        __u32 ndata;
                        __u64 flags;
-                       __u8  insn_size;
-                       __u8  insn_bytes[15];
+                       union {
+                               struct {
+                                       __u8  insn_size;
+                                       __u8  insn_bytes[15];
+                               };
+                       };
                } emulation_failure;
                /* KVM_EXIT_OSI */
                struct {

^ permalink raw reply related	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2021-08-07  0:59 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-29 13:39 [PATCH v3 0/3] kvm: x86: Convey the exit reason, etc. to user-space on emulation failure David Edmondson
2021-07-29 13:39 ` [PATCH v3 1/3] KVM: x86: kvm_x86_ops.get_exit_info should include the exit reason David Edmondson
2021-07-29 22:27   ` Sean Christopherson
2021-07-30  7:29     ` David Edmondson
2021-07-29 13:39 ` [PATCH v3 2/3] KVM: x86: On emulation failure, convey the exit reason, etc. to userspace David Edmondson
2021-07-30 22:14   ` Sean Christopherson
2021-08-02  7:28     ` David Edmondson
2021-08-02 16:58       ` Sean Christopherson
2021-08-02 17:23         ` David Edmondson
2021-08-07  0:59           ` Sean Christopherson
2021-07-29 13:39 ` [PATCH v3 3/3] KVM: x86: SGX must obey the KVM_INTERNAL_ERROR_EMULATION protocol David Edmondson
2021-07-30 22:17   ` Sean Christopherson
2021-08-02  7:18     ` David Edmondson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.