* [PATCH 0/6] Allow userspace to manage MSRs
@ 2020-08-04  4:20 Aaron Lewis
  2020-08-04  4:20 ` [PATCH 1/6] KVM: x86: Add ioctl for accepting a userspace provided MSR list Aaron Lewis
                   ` (5 more replies)
  0 siblings, 6 replies; 20+ messages in thread
From: Aaron Lewis @ 2020-08-04  4:20 UTC (permalink / raw)
  To: jmattson, graf; +Cc: pshier, oupton, kvm, Aaron Lewis

This series makes it possible for userspace to manage MSRs by having KVM
forward select MSRs to it when rdmsr or wrmsr is executed in the guest.
Userspace sets this up by calling the ioctl KVM_SET_EXIT_MSRS with a
list of MSRs it wants to manage.  When KVM encounters any of these MSRs,
the access is forwarded to userspace for processing.  Userspace can then
read from or write to the MSR, or throw a #GP if needed.
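
As a rough sketch of the intended flow (illustrative only: vm_fd,
vcpu_fd, run (the mmap()ed kvm_run), emulate_rdmsr() and emulate_wrmsr()
are inventions of this example, not part of the series):

	struct {
		struct kvm_msr_list list;
		__u32 indices[1];
	} msrs = {
		.list = { .nmsrs = 1 },
		.indices = { 0xda0 },	/* MSR_IA32_XSS */
	};

	/* Must be done before any vCPUs are created. */
	ioctl(vm_fd, KVM_SET_EXIT_MSRS, &msrs);

	for (;;) {
		ioctl(vcpu_fd, KVM_RUN, NULL);
		switch (run->exit_reason) {
		case KVM_EXIT_X86_RDMSR:
			/* Supply the value the guest's rdmsr will see. */
			run->msr.data = emulate_rdmsr(run->msr.index);
			break;
		case KVM_EXIT_X86_WRMSR:
			/* Veto the write by asking KVM to inject a #GP. */
			if (!emulate_wrmsr(run->msr.index, run->msr.data))
				run->msr.inject_gp = 1;
			break;
		/* ... other exit reasons ... */
		}
	}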

This series includes the kernel changes needed to implement this feature
and a test that exercises this behavior.  Also included is an
implementation of exception handling in selftests, which allows the test
to exercise throwing a #GP.

Aaron Lewis (6):
  KVM: x86: Add ioctl for accepting a userspace provided MSR list
  KVM: x86: Add support for exiting to userspace on rdmsr or wrmsr
  KVM: x86: Prepare MSR bitmaps for userspace tracked MSRs
  KVM: x86: Ensure the MSR bitmap never clears userspace tracked MSRs
  selftests: kvm: Fix the segment descriptor layout to match the actual
    layout
  selftests: kvm: Add test to exercise userspace MSR list

 Documentation/virt/kvm/api.rst                |  53 +++-
 arch/x86/include/asm/kvm_host.h               |   5 +
 arch/x86/kvm/svm/svm.c                        |  93 ++++--
 arch/x86/kvm/trace.h                          |  24 ++
 arch/x86/kvm/vmx/nested.c                     |   2 +-
 arch/x86/kvm/vmx/vmx.c                        |  94 +++---
 arch/x86/kvm/vmx/vmx.h                        |   2 +-
 arch/x86/kvm/x86.c                            | 140 +++++++++
 include/trace/events/kvm.h                    |   2 +-
 include/uapi/linux/kvm.h                      |  12 +
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |  20 +-
 .../selftests/kvm/include/x86_64/processor.h  |  29 +-
 tools/testing/selftests/kvm/lib/kvm_util.c    |  17 ++
 .../selftests/kvm/lib/kvm_util_internal.h     |   2 +
 .../selftests/kvm/lib/x86_64/handlers.S       |  83 ++++++
 .../selftests/kvm/lib/x86_64/processor.c      | 168 ++++++++++-
 .../testing/selftests/kvm/lib/x86_64/ucall.c  |   3 +
 .../selftests/kvm/x86_64/userspace_msr_exit.c | 271 ++++++++++++++++++
 19 files changed, 931 insertions(+), 90 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/lib/x86_64/handlers.S
 create mode 100644 tools/testing/selftests/kvm/x86_64/userspace_msr_exit.c

-- 
2.28.0.163.g6104cc2f0b6-goog



* [PATCH 1/6] KVM: x86: Add ioctl for accepting a userspace provided MSR list
  2020-08-04  4:20 [PATCH 0/6] Allow userspace to manage MSRs Aaron Lewis
@ 2020-08-04  4:20 ` Aaron Lewis
  2020-08-07 16:12   ` Alexander Graf
  2020-08-04  4:20 ` [PATCH 2/6] KVM: x86: Add support for exiting to userspace on rdmsr or wrmsr Aaron Lewis
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: Aaron Lewis @ 2020-08-04  4:20 UTC (permalink / raw)
  To: jmattson, graf; +Cc: pshier, oupton, kvm, Aaron Lewis

Add a KVM_SET_EXIT_MSRS ioctl to allow userspace to pass in a list of
MSRs that force an exit to userspace when rdmsr or wrmsr is used by the
guest.

KVM_SET_EXIT_MSRS must be called before any vCPUs are created to
protect the 'user_exit_msrs' list from being mutated while vCPUs are
running.

Add KVM_CAP_SET_MSR_EXITS to advertise that the feature exists.
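
For illustration, the expected call-ordering semantics (a hypothetical
snippet; 'vm_fd' and 'msrs' are assumed to be set up as in the
documentation below):

	ioctl(vm_fd, KVM_SET_EXIT_MSRS, &msrs);	/* 0: list accepted */
	ioctl(vm_fd, KVM_SET_EXIT_MSRS, &msrs);	/* -1, errno == EEXIST */
	/* Calling it only after KVM_CREATE_VCPU fails with EINVAL. */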

Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
---
 Documentation/virt/kvm/api.rst  | 24 +++++++++++++++++++
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/x86.c              | 41 +++++++++++++++++++++++++++++++++
 include/uapi/linux/kvm.h        |  2 ++
 4 files changed, 69 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 320788f81a05..7d8167c165aa 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -1006,6 +1006,30 @@ such as migration.
 :Parameters: struct kvm_vcpu_event (out)
 :Returns: 0 on success, -1 on error
 
+4.32 KVM_SET_EXIT_MSRS
+----------------------
+
+:Capability: KVM_CAP_SET_MSR_EXITS
+:Architectures: x86
+:Type: vm ioctl
+:Parameters: struct kvm_msr_list (in)
+:Returns: 0 on success, -1 on error
+
+Sets the userspace MSR list which is used to track which MSRs KVM should send
+to userspace to be serviced when the guest executes rdmsr or wrmsr.
+
+This ioctl must be called before any vCPUs are created, otherwise the list of
+MSRs will not be accepted and an EINVAL error will be returned.  If a list of
+MSRs has already been supplied, calling this ioctl again will return an
+EEXIST error.
+
+::
+
+  struct kvm_msr_list {
+	__u32 nmsrs;
+	__u32 indices[0];
+  };
+
 X86:
 ^^^^
 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index be5363b21540..510055471dd0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1004,6 +1004,8 @@ struct kvm_arch {
 
 	struct kvm_pmu_event_filter *pmu_event_filter;
 	struct task_struct *nx_lpage_recovery_thread;
+
+	struct kvm_msr_list *user_exit_msrs;
 };
 
 struct kvm_vm_stat {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 88c593f83b28..46a0fb9e0869 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3419,6 +3419,42 @@ static inline bool kvm_can_mwait_in_guest(void)
 		boot_cpu_has(X86_FEATURE_ARAT);
 }
 
+static int kvm_vm_ioctl_set_exit_msrs(struct kvm *kvm,
+				      struct kvm_msr_list __user *user_msr_list)
+{
+	struct kvm_msr_list *msr_list, hdr;
+	size_t indices_size;
+
+	if (kvm->arch.user_exit_msrs != NULL)
+		return -EEXIST;
+
+	if (kvm->created_vcpus)
+		return -EINVAL;
+
+	if (copy_from_user(&hdr, user_msr_list, sizeof(hdr)))
+		return -EFAULT;
+
+	if (hdr.nmsrs >= MAX_IO_MSRS)
+		return -E2BIG;
+
+	indices_size = sizeof(hdr.indices[0]) * hdr.nmsrs;
+	msr_list = kvzalloc(sizeof(struct kvm_msr_list) + indices_size,
+			    GFP_KERNEL_ACCOUNT);
+	if (!msr_list)
+		return -ENOMEM;
+	msr_list->nmsrs = hdr.nmsrs;
+
+	if (copy_from_user(msr_list->indices, user_msr_list->indices,
+			   indices_size)) {
+		kvfree(msr_list);
+		return -EFAULT;
+	}
+
+	kvm->arch.user_exit_msrs = msr_list;
+
+	return 0;
+}
+
 int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 {
 	int r = 0;
@@ -3476,6 +3512,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_MSR_PLATFORM_INFO:
 	case KVM_CAP_EXCEPTION_PAYLOAD:
 	case KVM_CAP_SET_GUEST_DEBUG:
+	case KVM_CAP_SET_MSR_EXITS:
 		r = 1;
 		break;
 	case KVM_CAP_SYNC_REGS:
@@ -5261,6 +5298,10 @@ long kvm_arch_vm_ioctl(struct file *filp,
 		r = 0;
 		break;
 	}
+	case KVM_SET_EXIT_MSRS: {
+		r = kvm_vm_ioctl_set_exit_msrs(kvm, argp);
+		break;
+	}
 	case KVM_MEMORY_ENCRYPT_OP: {
 		r = -ENOTTY;
 		if (kvm_x86_ops.mem_enc_op)
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 4fdf30316582..de4638c1bd15 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1031,6 +1031,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_PPC_SECURE_GUEST 181
 #define KVM_CAP_HALT_POLL 182
 #define KVM_CAP_ASYNC_PF_INT 183
+#define KVM_CAP_SET_MSR_EXITS 185
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1371,6 +1372,7 @@ struct kvm_s390_ucas_mapping {
 /* Available with KVM_CAP_PMU_EVENT_FILTER */
 #define KVM_SET_PMU_EVENT_FILTER  _IOW(KVMIO,  0xb2, struct kvm_pmu_event_filter)
 #define KVM_PPC_SVM_OFF		  _IO(KVMIO,  0xb3)
+#define KVM_SET_EXIT_MSRS	_IOW(KVMIO, 0xb4, struct kvm_msr_list)
 
 /* ioctl for vm fd */
 #define KVM_CREATE_DEVICE	  _IOWR(KVMIO,  0xe0, struct kvm_create_device)
-- 
2.28.0.163.g6104cc2f0b6-goog



* [PATCH 2/6] KVM: x86: Add support for exiting to userspace on rdmsr or wrmsr
  2020-08-04  4:20 [PATCH 0/6] Allow userspace to manage MSRs Aaron Lewis
  2020-08-04  4:20 ` [PATCH 1/6] KVM: x86: Add ioctl for accepting a userspace provided MSR list Aaron Lewis
@ 2020-08-04  4:20 ` Aaron Lewis
  2020-08-07 23:41   ` Alexander Graf
  2020-08-04  4:20 ` [PATCH 3/6] KVM: x86: Prepare MSR bitmaps for userspace tracked MSRs Aaron Lewis
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: Aaron Lewis @ 2020-08-04  4:20 UTC (permalink / raw)
  To: jmattson, graf; +Cc: pshier, oupton, kvm, Aaron Lewis

Add support for exiting to userspace on a rdmsr or wrmsr instruction if
the MSR being read from or written to is in the user_exit_msrs list.
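
As an illustration of how userspace completes such an exit (a sketch
mirroring the policy the selftest in patch 6 implements; 'run' is the
mmap()ed kvm_run structure):

	case KVM_EXIT_X86_WRMSR:
		/* Example policy: only 0 may be written to MSR_IA32_XSS. */
		if (run->msr.index == MSR_IA32_XSS && run->msr.data != 0)
			run->msr.inject_gp = 1;
		break;	/* next KVM_RUN injects the #GP or skips the wrmsr */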

Signed-off-by: Aaron Lewis <aaronlewis@google.com>
---
 Documentation/virt/kvm/api.rst | 29 +++++++++++-
 arch/x86/kvm/trace.h           | 24 ++++++++++
 arch/x86/kvm/x86.c             | 83 ++++++++++++++++++++++++++++++++++
 include/trace/events/kvm.h     |  2 +-
 include/uapi/linux/kvm.h       | 10 ++++
 5 files changed, 145 insertions(+), 3 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 7d8167c165aa..8b7078707e0a 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -4885,8 +4885,9 @@ to the byte array.
 
 .. note::
 
-      For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_PAPR and
-      KVM_EXIT_EPR the corresponding
+      For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_PAPR,
+      KVM_EXIT_EPR, KVM_EXIT_X86_RDMSR, and KVM_EXIT_X86_WRMSR the
+      corresponding
 
 operations are complete (and guest state is consistent) only after userspace
 has re-entered the kernel with KVM_RUN.  The kernel side will first finish
@@ -5179,6 +5180,30 @@ Note that KVM does not skip the faulting instruction as it does for
 KVM_EXIT_MMIO, but userspace has to emulate any change to the processing state
 if it decides to decode and emulate the instruction.
 
+::
+
+    /* KVM_EXIT_X86_RDMSR */
+    /* KVM_EXIT_X86_WRMSR */
+    struct {
+      __u8 inject_gp; /* out */
+      __u8 pad[3];
+      __u32 index;    /* i.e. ecx; out */
+      __u64 data;     /* in (wrmsr) / out (rdmsr) */
+    } msr;
+
+If the exit_reason is KVM_EXIT_X86_RDMSR then a rdmsr instruction in the guest
+needs to be processed by userspace.  If the exit_reason is KVM_EXIT_X86_WRMSR
+then a wrmsr instruction in the guest needs to be processed by userspace.
+
+Userspace can tell KVM to inject a #GP into the guest by setting the
+'inject_gp' flag to 1.  Leaving the flag set to 0 tells KVM not to
+inject a #GP into the guest.
+
+The MSR being processed is indicated by 'index'.  If a read is being
+processed, userspace is expected to fill out the 'data' field before
+re-entering the guest.  If a write is being processed, the 'data' field
+contains the value the guest attempted to write to the MSR.
+
 ::
 
 		/* Fix the size of the union. */
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index b66432b015d2..d03143ebd6f0 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -367,6 +367,30 @@ TRACE_EVENT(kvm_msr,
 #define trace_kvm_msr_read_ex(ecx)         trace_kvm_msr(0, ecx, 0, true)
 #define trace_kvm_msr_write_ex(ecx, data)  trace_kvm_msr(1, ecx, data, true)
 
+TRACE_EVENT(kvm_userspace_msr,
+	TP_PROTO(bool is_write, u8 inject_gp, u32 index, u64 data),
+	TP_ARGS(is_write, inject_gp, index, data),
+
+	TP_STRUCT__entry(
+		__field(bool,	is_write)
+		__field(u8,	inject_gp)
+		__field(u32,	index)
+		__field(u64,	data)
+	),
+
+	TP_fast_assign(
+		__entry->is_write	= is_write;
+		__entry->inject_gp	= inject_gp;
+		__entry->index		= index;
+		__entry->data		= data;
+	),
+
+	TP_printk("userspace %s %x = 0x%llx, %s",
+		  __entry->is_write ? "wrmsr" : "rdmsr",
+		  __entry->index, __entry->data,
+		  __entry->inject_gp ? "inject_gp" : "no_gp")
+);
+
 /*
  * Tracepoint for guest CR access.
  */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 46a0fb9e0869..47619b49818a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -108,6 +108,8 @@ static void __kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
 static void store_regs(struct kvm_vcpu *vcpu);
 static int sync_regs(struct kvm_vcpu *vcpu);
 
+bool kvm_msr_user_exit(struct kvm *kvm, u32 index);
+
 struct kvm_x86_ops kvm_x86_ops __read_mostly;
 EXPORT_SYMBOL_GPL(kvm_x86_ops);
 
@@ -1549,11 +1551,61 @@ int kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data)
 }
 EXPORT_SYMBOL_GPL(kvm_set_msr);
 
+/*
+ * Returns 1 so that __vcpu_run() will resume guest execution, or 0 if
+ * KVM should exit back to userspace instead.
+ */
+static int complete_userspace_msr(struct kvm_vcpu *vcpu, bool is_write)
+{
+	u32 ecx = vcpu->run->msr.index;
+	u64 data = vcpu->run->msr.data;
+
+	trace_kvm_userspace_msr(is_write,
+				vcpu->run->msr.inject_gp,
+				vcpu->run->msr.index,
+				vcpu->run->msr.data);
+
+	if (vcpu->run->msr.inject_gp) {
+		trace_kvm_msr(is_write, ecx, data, true);
+		kvm_inject_gp(vcpu, 0);
+		return 1;
+	}
+
+	trace_kvm_msr(is_write, ecx, data, false);
+	if (!is_write) {
+		kvm_rax_write(vcpu, data & -1u);
+		kvm_rdx_write(vcpu, (data >> 32) & -1u);
+	}
+
+	return kvm_skip_emulated_instruction(vcpu);
+}
+
+static int complete_userspace_rdmsr(struct kvm_vcpu *vcpu)
+{
+	return complete_userspace_msr(vcpu, false);
+}
+
+static int complete_userspace_wrmsr(struct kvm_vcpu *vcpu)
+{
+	return complete_userspace_msr(vcpu, true);
+}
+
 int kvm_emulate_rdmsr(struct kvm_vcpu *vcpu)
 {
 	u32 ecx = kvm_rcx_read(vcpu);
 	u64 data;
 
+	if (kvm_msr_user_exit(vcpu->kvm, ecx)) {
+		vcpu->run->exit_reason = KVM_EXIT_X86_RDMSR;
+		vcpu->run->msr.index = ecx;
+		vcpu->run->msr.data = 0;
+		vcpu->run->msr.inject_gp = 0;
+		memset(vcpu->run->msr.pad, 0, sizeof(vcpu->run->msr.pad));
+		vcpu->arch.complete_userspace_io =
+			complete_userspace_rdmsr;
+		return 0;
+	}
+
 	if (kvm_get_msr(vcpu, ecx, &data)) {
 		trace_kvm_msr_read_ex(ecx);
 		kvm_inject_gp(vcpu, 0);
@@ -1573,6 +1625,17 @@ int kvm_emulate_wrmsr(struct kvm_vcpu *vcpu)
 	u32 ecx = kvm_rcx_read(vcpu);
 	u64 data = kvm_read_edx_eax(vcpu);
 
+	if (kvm_msr_user_exit(vcpu->kvm, ecx)) {
+		vcpu->run->exit_reason = KVM_EXIT_X86_WRMSR;
+		vcpu->run->msr.index = ecx;
+		vcpu->run->msr.data = data;
+		vcpu->run->msr.inject_gp = 0;
+		memset(vcpu->run->msr.pad, 0, sizeof(vcpu->run->msr.pad));
+		vcpu->arch.complete_userspace_io =
+			complete_userspace_wrmsr;
+		return 0;
+	}
+
 	if (kvm_set_msr(vcpu, ecx, data)) {
 		trace_kvm_msr_write_ex(ecx, data);
 		kvm_inject_gp(vcpu, 0);
@@ -3455,6 +3518,25 @@ static int kvm_vm_ioctl_set_exit_msrs(struct kvm *kvm,
 	return 0;
 }
 
+bool kvm_msr_user_exit(struct kvm *kvm, u32 index)
+{
+	struct kvm_msr_list *exit_msrs;
+	int i;
+
+	exit_msrs = kvm->arch.user_exit_msrs;
+
+	if (!exit_msrs)
+		return false;
+
+	for (i = 0; i < exit_msrs->nmsrs; ++i) {
+		if (exit_msrs->indices[i] == index)
+			return true;
+	}
+
+	return false;
+}
+EXPORT_SYMBOL_GPL(kvm_msr_user_exit);
+
 int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 {
 	int r = 0;
@@ -10762,3 +10844,4 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_unaccelerated_access);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_incomplete_ipi);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_ga_log);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_apicv_update_request);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_userspace_msr);
diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h
index 2c735a3e6613..19f33a704174 100644
--- a/include/trace/events/kvm.h
+++ b/include/trace/events/kvm.h
@@ -17,7 +17,7 @@
 	ERSN(NMI), ERSN(INTERNAL_ERROR), ERSN(OSI), ERSN(PAPR_HCALL),	\
 	ERSN(S390_UCONTROL), ERSN(WATCHDOG), ERSN(S390_TSCH), ERSN(EPR),\
 	ERSN(SYSTEM_EVENT), ERSN(S390_STSI), ERSN(IOAPIC_EOI),          \
-	ERSN(HYPERV)
+	ERSN(HYPERV), ERSN(X86_RDMSR), ERSN(X86_WRMSR)
 
 TRACE_EVENT(kvm_userspace_exit,
 	    TP_PROTO(__u32 reason, int errno),
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index de4638c1bd15..2b7d21e6338c 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -248,6 +248,8 @@ struct kvm_hyperv_exit {
 #define KVM_EXIT_IOAPIC_EOI       26
 #define KVM_EXIT_HYPERV           27
 #define KVM_EXIT_ARM_NISV         28
+#define KVM_EXIT_X86_RDMSR        29
+#define KVM_EXIT_X86_WRMSR        30
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -412,6 +414,14 @@ struct kvm_run {
 			__u64 esr_iss;
 			__u64 fault_ipa;
 		} arm_nisv;
+		/* KVM_EXIT_X86_RDMSR */
+		/* KVM_EXIT_X86_WRMSR */
+		struct {
+			__u8 inject_gp;
+			__u8 pad[3];
+			__u32 index;
+			__u64 data;
+		} msr;
 		/* Fix the size of the union. */
 		char padding[256];
 	};
-- 
2.28.0.163.g6104cc2f0b6-goog



* [PATCH 3/6] KVM: x86: Prepare MSR bitmaps for userspace tracked MSRs
  2020-08-04  4:20 [PATCH 0/6] Allow userspace to manage MSRs Aaron Lewis
  2020-08-04  4:20 ` [PATCH 1/6] KVM: x86: Add ioctl for accepting a userspace provided MSR list Aaron Lewis
  2020-08-04  4:20 ` [PATCH 2/6] KVM: x86: Add support for exiting to userspace on rdmsr or wrmsr Aaron Lewis
@ 2020-08-04  4:20 ` Aaron Lewis
  2020-08-04  4:20 ` [PATCH 4/6] KVM: x86: Ensure the MSR bitmap never clears " Aaron Lewis
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 20+ messages in thread
From: Aaron Lewis @ 2020-08-04  4:20 UTC (permalink / raw)
  To: jmattson, graf; +Cc: pshier, oupton, kvm, Aaron Lewis

Prepare vmx and svm for a subsequent change that ensures the MSR bitmap
causes a VM-exit for all MSRs that userspace is tracking.
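
That is, call sites move from passing the raw bitmap to passing the
vCPU, e.g. (abridged from the hunks below):

	-	set_msr_interception(svm->msrpm, MSR_IA32_SPEC_CTRL, 1, 1);
	+	set_msr_interception(vcpu, MSR_IA32_SPEC_CTRL, 1, 1);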

Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
---
 arch/x86/kvm/svm/svm.c    | 48 +++++++++++-----------
 arch/x86/kvm/vmx/nested.c |  2 +-
 arch/x86/kvm/vmx/vmx.c    | 83 +++++++++++++++++++--------------------
 arch/x86/kvm/vmx/vmx.h    |  2 +-
 4 files changed, 67 insertions(+), 68 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index c0da4dd78ac5..eb673b59f7b7 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -564,7 +564,7 @@ static bool valid_msr_intercept(u32 index)
 	return false;
 }
 
-static bool msr_write_intercepted(struct kvm_vcpu *vcpu, unsigned msr)
+static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr)
 {
 	u8 bit_write;
 	unsigned long tmp;
@@ -583,9 +583,11 @@ static bool msr_write_intercepted(struct kvm_vcpu *vcpu, unsigned msr)
 	return !!test_bit(bit_write,  &tmp);
 }
 
-static void set_msr_interception(u32 *msrpm, unsigned msr,
-				 int read, int write)
+static void set_msr_interception(struct kvm_vcpu *vcpu, u32 msr, int read,
+				 int write)
 {
+	struct vcpu_svm *svm = to_svm(vcpu);
+	u32 *msrpm = svm->msrpm;
 	u8 bit_read, bit_write;
 	unsigned long tmp;
 	u32 offset;
@@ -609,7 +611,7 @@ static void set_msr_interception(u32 *msrpm, unsigned msr,
 	msrpm[offset] = tmp;
 }
 
-static void svm_vcpu_init_msrpm(u32 *msrpm)
+static void svm_vcpu_init_msrpm(struct kvm_vcpu *vcpu, u32 *msrpm)
 {
 	int i;
 
@@ -619,7 +621,7 @@ static void svm_vcpu_init_msrpm(u32 *msrpm)
 		if (!direct_access_msrs[i].always)
 			continue;
 
-		set_msr_interception(msrpm, direct_access_msrs[i].index, 1, 1);
+		set_msr_interception(vcpu, direct_access_msrs[i].index, 1, 1);
 	}
 }
 
@@ -666,26 +668,26 @@ static void init_msrpm_offsets(void)
 	}
 }
 
-static void svm_enable_lbrv(struct vcpu_svm *svm)
+static void svm_enable_lbrv(struct kvm_vcpu *vcpu)
 {
-	u32 *msrpm = svm->msrpm;
+	struct vcpu_svm *svm = to_svm(vcpu);
 
 	svm->vmcb->control.virt_ext |= LBR_CTL_ENABLE_MASK;
-	set_msr_interception(msrpm, MSR_IA32_LASTBRANCHFROMIP, 1, 1);
-	set_msr_interception(msrpm, MSR_IA32_LASTBRANCHTOIP, 1, 1);
-	set_msr_interception(msrpm, MSR_IA32_LASTINTFROMIP, 1, 1);
-	set_msr_interception(msrpm, MSR_IA32_LASTINTTOIP, 1, 1);
+	set_msr_interception(vcpu, MSR_IA32_LASTBRANCHFROMIP, 1, 1);
+	set_msr_interception(vcpu, MSR_IA32_LASTBRANCHTOIP, 1, 1);
+	set_msr_interception(vcpu, MSR_IA32_LASTINTFROMIP, 1, 1);
+	set_msr_interception(vcpu, MSR_IA32_LASTINTTOIP, 1, 1);
 }
 
-static void svm_disable_lbrv(struct vcpu_svm *svm)
+static void svm_disable_lbrv(struct kvm_vcpu *vcpu)
 {
-	u32 *msrpm = svm->msrpm;
+	struct vcpu_svm *svm = to_svm(vcpu);
 
 	svm->vmcb->control.virt_ext &= ~LBR_CTL_ENABLE_MASK;
-	set_msr_interception(msrpm, MSR_IA32_LASTBRANCHFROMIP, 0, 0);
-	set_msr_interception(msrpm, MSR_IA32_LASTBRANCHTOIP, 0, 0);
-	set_msr_interception(msrpm, MSR_IA32_LASTINTFROMIP, 0, 0);
-	set_msr_interception(msrpm, MSR_IA32_LASTINTTOIP, 0, 0);
+	set_msr_interception(vcpu, MSR_IA32_LASTBRANCHFROMIP, 0, 0);
+	set_msr_interception(vcpu, MSR_IA32_LASTBRANCHTOIP, 0, 0);
+	set_msr_interception(vcpu, MSR_IA32_LASTINTFROMIP, 0, 0);
+	set_msr_interception(vcpu, MSR_IA32_LASTINTTOIP, 0, 0);
 }
 
 void disable_nmi_singlestep(struct vcpu_svm *svm)
@@ -1196,10 +1198,10 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
 	clear_page(svm->nested.hsave);
 
 	svm->msrpm = page_address(msrpm_pages);
-	svm_vcpu_init_msrpm(svm->msrpm);
+	svm_vcpu_init_msrpm(vcpu, svm->msrpm);
 
 	svm->nested.msrpm = page_address(nested_msrpm_pages);
-	svm_vcpu_init_msrpm(svm->nested.msrpm);
+	svm_vcpu_init_msrpm(vcpu, svm->nested.msrpm);
 
 	svm->vmcb = page_address(page);
 	clear_page(svm->vmcb);
@@ -2540,7 +2542,7 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 		 * We update the L1 MSR bit as well since it will end up
 		 * touching the MSR anyway now.
 		 */
-		set_msr_interception(svm->msrpm, MSR_IA32_SPEC_CTRL, 1, 1);
+		set_msr_interception(vcpu, MSR_IA32_SPEC_CTRL, 1, 1);
 		break;
 	case MSR_IA32_PRED_CMD:
 		if (!msr->host_initiated &&
@@ -2555,7 +2557,7 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 			break;
 
 		wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
-		set_msr_interception(svm->msrpm, MSR_IA32_PRED_CMD, 0, 1);
+		set_msr_interception(vcpu, MSR_IA32_PRED_CMD, 0, 1);
 		break;
 	case MSR_AMD64_VIRT_SPEC_CTRL:
 		if (!msr->host_initiated &&
@@ -2619,9 +2621,9 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 		svm->vmcb->save.dbgctl = data;
 		mark_dirty(svm->vmcb, VMCB_LBR);
 		if (data & (1ULL<<0))
-			svm_enable_lbrv(svm);
+			svm_enable_lbrv(vcpu);
 		else
-			svm_disable_lbrv(svm);
+			svm_disable_lbrv(vcpu);
 		break;
 	case MSR_VM_HSAVE_PA:
 		svm->nested.hsave_msr = data;
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index d4a4cec034d0..9313814d9e91 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4704,7 +4704,7 @@ static int enter_vmx_operation(struct kvm_vcpu *vcpu)
 
 	if (vmx_pt_mode_is_host_guest()) {
 		vmx->pt_desc.guest.ctl = 0;
-		pt_update_intercept_for_msr(vmx);
+		pt_update_intercept_for_msr(vcpu);
 	}
 
 	return 0;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 13745f2a5ecd..1313e47a5a1e 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -342,7 +342,7 @@ module_param_cb(vmentry_l1d_flush, &vmentry_l1d_flush_ops, NULL, 0644);
 
 static bool guest_state_valid(struct kvm_vcpu *vcpu);
 static u32 vmx_segment_access_rights(struct kvm_segment *var);
-static __always_inline void vmx_disable_intercept_for_msr(unsigned long *msr_bitmap,
+static __always_inline void vmx_disable_intercept_for_msr(struct kvm_vcpu *vcpu,
 							  u32 msr, int type);
 
 void vmx_vmexit(void);
@@ -2081,7 +2081,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		 * in the merging. We update the vmcs01 here for L1 as well
 		 * since it will end up touching the MSR anyway now.
 		 */
-		vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap,
+		vmx_disable_intercept_for_msr(vcpu,
 					      MSR_IA32_SPEC_CTRL,
 					      MSR_TYPE_RW);
 		break;
@@ -2117,8 +2117,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		 * vmcs02.msr_bitmap here since it gets completely overwritten
 		 * in the merging.
 		 */
-		vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD,
-					      MSR_TYPE_W);
+		vmx_disable_intercept_for_msr(vcpu, MSR_IA32_PRED_CMD, MSR_TYPE_W);
 		break;
 	case MSR_IA32_CR_PAT:
 		if (!kvm_pat_valid(data))
@@ -2168,7 +2167,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			return 1;
 		vmcs_write64(GUEST_IA32_RTIT_CTL, data);
 		vmx->pt_desc.guest.ctl = data;
-		pt_update_intercept_for_msr(vmx);
+		pt_update_intercept_for_msr(vcpu);
 		break;
 	case MSR_IA32_RTIT_STATUS:
 		if (!pt_can_write_msr(vmx))
@@ -3691,9 +3690,11 @@ void free_vpid(int vpid)
 	spin_unlock(&vmx_vpid_lock);
 }
 
-static __always_inline void vmx_disable_intercept_for_msr(unsigned long *msr_bitmap,
+static __always_inline void vmx_disable_intercept_for_msr(struct kvm_vcpu *vcpu,
 							  u32 msr, int type)
 {
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	unsigned long *msr_bitmap = vmx->vmcs01.msr_bitmap;
 	int f = sizeof(unsigned long);
 
 	if (!cpu_has_vmx_msr_bitmap())
@@ -3729,9 +3730,11 @@ static __always_inline void vmx_disable_intercept_for_msr(unsigned long *msr_bit
 	}
 }
 
-static __always_inline void vmx_enable_intercept_for_msr(unsigned long *msr_bitmap,
+static __always_inline void vmx_enable_intercept_for_msr(struct kvm_vcpu *vcpu,
 							 u32 msr, int type)
 {
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	unsigned long *msr_bitmap = vmx->vmcs01.msr_bitmap;
 	int f = sizeof(unsigned long);
 
 	if (!cpu_has_vmx_msr_bitmap())
@@ -3767,13 +3770,13 @@ static __always_inline void vmx_enable_intercept_for_msr(unsigned long *msr_bitm
 	}
 }
 
-static __always_inline void vmx_set_intercept_for_msr(unsigned long *msr_bitmap,
-			     			      u32 msr, int type, bool value)
+static __always_inline void vmx_set_intercept_for_msr(struct kvm_vcpu *vcpu,
+						      u32 msr, int type, bool value)
 {
 	if (value)
-		vmx_enable_intercept_for_msr(msr_bitmap, msr, type);
+		vmx_enable_intercept_for_msr(vcpu, msr, type);
 	else
-		vmx_disable_intercept_for_msr(msr_bitmap, msr, type);
+		vmx_disable_intercept_for_msr(vcpu, msr, type);
 }
 
 static u8 vmx_msr_bitmap_mode(struct kvm_vcpu *vcpu)
@@ -3791,8 +3794,8 @@ static u8 vmx_msr_bitmap_mode(struct kvm_vcpu *vcpu)
 	return mode;
 }
 
-static void vmx_update_msr_bitmap_x2apic(unsigned long *msr_bitmap,
-					 u8 mode)
+static void vmx_update_msr_bitmap_x2apic(struct kvm_vcpu *vcpu,
+					 unsigned long *msr_bitmap, u8 mode)
 {
 	int msr;
 
@@ -3807,11 +3810,11 @@ static void vmx_update_msr_bitmap_x2apic(unsigned long *msr_bitmap,
 		 * TPR reads and writes can be virtualized even if virtual interrupt
 		 * delivery is not in use.
 		 */
-		vmx_disable_intercept_for_msr(msr_bitmap, X2APIC_MSR(APIC_TASKPRI), MSR_TYPE_RW);
+		vmx_disable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_TASKPRI), MSR_TYPE_RW);
 		if (mode & MSR_BITMAP_MODE_X2APIC_APICV) {
-			vmx_enable_intercept_for_msr(msr_bitmap, X2APIC_MSR(APIC_TMCCT), MSR_TYPE_R);
-			vmx_disable_intercept_for_msr(msr_bitmap, X2APIC_MSR(APIC_EOI), MSR_TYPE_W);
-			vmx_disable_intercept_for_msr(msr_bitmap, X2APIC_MSR(APIC_SELF_IPI), MSR_TYPE_W);
+			vmx_enable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_TMCCT), MSR_TYPE_R);
+			vmx_disable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_EOI), MSR_TYPE_W);
+			vmx_disable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_SELF_IPI), MSR_TYPE_W);
 		}
 	}
 }
@@ -3827,30 +3830,24 @@ void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu)
 		return;
 
 	if (changed & (MSR_BITMAP_MODE_X2APIC | MSR_BITMAP_MODE_X2APIC_APICV))
-		vmx_update_msr_bitmap_x2apic(msr_bitmap, mode);
+		vmx_update_msr_bitmap_x2apic(vcpu, msr_bitmap, mode);
 
 	vmx->msr_bitmap_mode = mode;
 }
 
-void pt_update_intercept_for_msr(struct vcpu_vmx *vmx)
+void pt_update_intercept_for_msr(struct kvm_vcpu *vcpu)
 {
-	unsigned long *msr_bitmap = vmx->vmcs01.msr_bitmap;
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	bool flag = !(vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN);
 	u32 i;
 
-	vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_RTIT_STATUS,
-							MSR_TYPE_RW, flag);
-	vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_RTIT_OUTPUT_BASE,
-							MSR_TYPE_RW, flag);
-	vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_RTIT_OUTPUT_MASK,
-							MSR_TYPE_RW, flag);
-	vmx_set_intercept_for_msr(msr_bitmap, MSR_IA32_RTIT_CR3_MATCH,
-							MSR_TYPE_RW, flag);
+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_RTIT_STATUS, MSR_TYPE_RW, flag);
+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_RTIT_OUTPUT_BASE, MSR_TYPE_RW, flag);
+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_RTIT_OUTPUT_MASK, MSR_TYPE_RW, flag);
+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_RTIT_CR3_MATCH, MSR_TYPE_RW, flag);
 	for (i = 0; i < vmx->pt_desc.addr_range; i++) {
-		vmx_set_intercept_for_msr(msr_bitmap,
-			MSR_IA32_RTIT_ADDR0_A + i * 2, MSR_TYPE_RW, flag);
-		vmx_set_intercept_for_msr(msr_bitmap,
-			MSR_IA32_RTIT_ADDR0_B + i * 2, MSR_TYPE_RW, flag);
+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_RTIT_ADDR0_A + i * 2, MSR_TYPE_RW, flag);
+		vmx_set_intercept_for_msr(vcpu, MSR_IA32_RTIT_ADDR0_B + i * 2, MSR_TYPE_RW, flag);
 	}
 }
 
@@ -6905,18 +6902,18 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu)
 		goto free_pml;
 
 	msr_bitmap = vmx->vmcs01.msr_bitmap;
-	vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_TSC, MSR_TYPE_R);
-	vmx_disable_intercept_for_msr(msr_bitmap, MSR_FS_BASE, MSR_TYPE_RW);
-	vmx_disable_intercept_for_msr(msr_bitmap, MSR_GS_BASE, MSR_TYPE_RW);
-	vmx_disable_intercept_for_msr(msr_bitmap, MSR_KERNEL_GS_BASE, MSR_TYPE_RW);
-	vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_CS, MSR_TYPE_RW);
-	vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW);
-	vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW);
+	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_TSC, MSR_TYPE_R);
+	vmx_disable_intercept_for_msr(vcpu, MSR_FS_BASE, MSR_TYPE_RW);
+	vmx_disable_intercept_for_msr(vcpu, MSR_GS_BASE, MSR_TYPE_RW);
+	vmx_disable_intercept_for_msr(vcpu, MSR_KERNEL_GS_BASE, MSR_TYPE_RW);
+	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_CS, MSR_TYPE_RW);
+	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW);
+	vmx_disable_intercept_for_msr(vcpu, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW);
 	if (kvm_cstate_in_guest(vcpu->kvm)) {
-		vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C1_RES, MSR_TYPE_R);
-		vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C3_RESIDENCY, MSR_TYPE_R);
-		vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C6_RESIDENCY, MSR_TYPE_R);
-		vmx_disable_intercept_for_msr(msr_bitmap, MSR_CORE_C7_RESIDENCY, MSR_TYPE_R);
+		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C1_RES, MSR_TYPE_R);
+		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C3_RESIDENCY, MSR_TYPE_R);
+		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C6_RESIDENCY, MSR_TYPE_R);
+		vmx_disable_intercept_for_msr(vcpu, MSR_CORE_C7_RESIDENCY, MSR_TYPE_R);
 	}
 	vmx->msr_bitmap_mode = 0;
 
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 639798e4a6ca..b3c74f0fe8a1 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -350,7 +350,7 @@ bool vmx_get_nmi_mask(struct kvm_vcpu *vcpu);
 void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked);
 void vmx_set_virtual_apic_mode(struct kvm_vcpu *vcpu);
 struct shared_msr_entry *find_msr_entry(struct vcpu_vmx *vmx, u32 msr);
-void pt_update_intercept_for_msr(struct vcpu_vmx *vmx);
+void pt_update_intercept_for_msr(struct kvm_vcpu *vcpu);
 void vmx_update_host_rsp(struct vcpu_vmx *vmx, unsigned long host_rsp);
 int vmx_find_msr_index(struct vmx_msrs *m, u32 msr);
 int vmx_handle_memory_failure(struct kvm_vcpu *vcpu, int r,
-- 
2.28.0.163.g6104cc2f0b6-goog



* [PATCH 4/6] KVM: x86: Ensure the MSR bitmap never clears userspace tracked MSRs
  2020-08-04  4:20 [PATCH 0/6] Allow userspace to manage MSRs Aaron Lewis
                   ` (2 preceding siblings ...)
  2020-08-04  4:20 ` [PATCH 3/6] KVM: x86: Prepare MSR bitmaps for userspace tracked MSRs Aaron Lewis
@ 2020-08-04  4:20 ` Aaron Lewis
  2020-08-07 23:03   ` Alexander Graf
  2020-08-04  4:20 ` [PATCH 5/6] selftests: kvm: Fix the segment descriptor layout to match the actual layout Aaron Lewis
  2020-08-04  4:20 ` [PATCH 6/6] selftests: kvm: Add test to exercise userspace MSR list Aaron Lewis
  5 siblings, 1 reply; 20+ messages in thread
From: Aaron Lewis @ 2020-08-04  4:20 UTC (permalink / raw)
  To: jmattson, graf; +Cc: pshier, oupton, kvm, Aaron Lewis

SDM volume 3: 24.6.9 "MSR-Bitmap Address" and APM volume 2: 15.11 "MSR
Intercepts" describe MSR permission bitmaps.  Permission bitmaps are
used to control whether an execution of rdmsr or wrmsr will cause a
VM-exit.  Userspace-tracked MSRs must cause a VM-exit, so the host is
able to forward the MSR access to userspace.  This change adds vmx/svm
support to ensure the permission bitmap is properly set to cause a
VM-exit to the host when rdmsr or wrmsr is used on one of the
userspace-tracked MSRs.  Also, to avoid setting the bitmaps repeatedly,
kvm_make_request() is used to coalesce these updates into a single call.
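
For reference, the bitmap arithmetic on the SVM side (taken from the
hunks below, not new code): each MSR occupies a read bit and a write bit
in the MSR permission map, 16 MSRs per u32, and a set bit means the
access is intercepted, i.e. causes a VM-exit:

	offset    = svm_msrpm_offset(msr);	/* which u32 of the msrpm */
	bit_read  = 2 * (msr & 0x0f);		/* 2 bits per MSR */
	bit_write = 2 * (msr & 0x0f) + 1;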

Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
---
 arch/x86/include/asm/kvm_host.h |  3 ++
 arch/x86/kvm/svm/svm.c          | 49 ++++++++++++++++++++++++++-------
 arch/x86/kvm/vmx/vmx.c          | 13 ++++++++-
 arch/x86/kvm/x86.c              | 16 +++++++++++
 4 files changed, 70 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 510055471dd0..07a85f5f0b8a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -87,6 +87,7 @@
 #define KVM_REQ_HV_TLB_FLUSH \
 	KVM_ARCH_REQ_FLAGS(27, KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_APF_READY		KVM_ARCH_REQ(28)
+#define KVM_REQ_USER_MSR_UPDATE KVM_ARCH_REQ(29)
 
 #define CR0_RESERVED_BITS                                               \
 	(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
@@ -1271,6 +1272,8 @@ struct kvm_x86_ops {
 	int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
 
 	void (*migrate_timers)(struct kvm_vcpu *vcpu);
+
+	void (*set_user_msr_intercept)(struct kvm_vcpu *vcpu, u32 msr);
 };
 
 struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index eb673b59f7b7..c560d283b2af 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -583,13 +583,27 @@ static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr)
 	return !!test_bit(bit_write,  &tmp);
 }
 
+static void __set_msr_interception(u32 *msrpm, u32 msr, int read, int write,
+				   u32 offset)
+{
+	u8 bit_read, bit_write;
+	unsigned long tmp;
+
+	bit_read  = 2 * (msr & 0x0f);
+	bit_write = 2 * (msr & 0x0f) + 1;
+	tmp       = msrpm[offset];
+
+	read  ? clear_bit(bit_read,  &tmp) : set_bit(bit_read,  &tmp);
+	write ? clear_bit(bit_write, &tmp) : set_bit(bit_write, &tmp);
+
+	msrpm[offset] = tmp;
+}
+
 static void set_msr_interception(struct kvm_vcpu *vcpu, u32 msr, int read,
 				 int write)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 	u32 *msrpm = svm->msrpm;
-	u8 bit_read, bit_write;
-	unsigned long tmp;
 	u32 offset;
 
 	/*
@@ -598,17 +612,30 @@ static void set_msr_interception(struct kvm_vcpu *vcpu, u32 msr, int read,
 	 */
 	WARN_ON(!valid_msr_intercept(msr));
 
-	offset    = svm_msrpm_offset(msr);
-	bit_read  = 2 * (msr & 0x0f);
-	bit_write = 2 * (msr & 0x0f) + 1;
-	tmp       = msrpm[offset];
-
+	offset = svm_msrpm_offset(msr);
 	BUG_ON(offset == MSR_INVALID);
 
-	read  ? clear_bit(bit_read,  &tmp) : set_bit(bit_read,  &tmp);
-	write ? clear_bit(bit_write, &tmp) : set_bit(bit_write, &tmp);
+	__set_msr_interception(msrpm, msr, read, write, offset);
 
-	msrpm[offset] = tmp;
+	if (read || write)
+		kvm_make_request(KVM_REQ_USER_MSR_UPDATE, vcpu);
+}
+
+static void set_user_msr_interception(struct kvm_vcpu *vcpu, u32 msr, int read,
+				      int write)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+	u32 *msrpm = svm->msrpm;
+	u32 offset;
+
+	offset = svm_msrpm_offset(msr);
+	if (offset != MSR_INVALID)
+		__set_msr_interception(msrpm, msr, read, write, offset);
+}
+
+void svm_set_user_msr_intercept(struct kvm_vcpu *vcpu, u32 msr)
+{
+	set_user_msr_interception(vcpu, msr, 0, 0);
 }
 
 static void svm_vcpu_init_msrpm(struct kvm_vcpu *vcpu, u32 *msrpm)
@@ -4088,6 +4115,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.need_emulation_on_page_fault = svm_need_emulation_on_page_fault,
 
 	.apic_init_signal_blocked = svm_apic_init_signal_blocked,
+
+	.set_user_msr_intercept = svm_set_user_msr_intercept,
 };
 
 static struct kvm_x86_init_ops svm_init_ops __initdata = {
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 1313e47a5a1e..3d3d9eaeca64 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -3728,6 +3728,10 @@ static __always_inline void vmx_disable_intercept_for_msr(struct kvm_vcpu *vcpu,
 			__clear_bit(msr, msr_bitmap + 0xc00 / f);
 
 	}
+
+	if (type & (MSR_TYPE_R | MSR_TYPE_W))
+		kvm_make_request(KVM_REQ_USER_MSR_UPDATE, vcpu);
 }
 
 static __always_inline void vmx_enable_intercept_for_msr(struct kvm_vcpu *vcpu,
@@ -3795,7 +3799,7 @@ static u8 vmx_msr_bitmap_mode(struct kvm_vcpu *vcpu)
 }
 
 static void vmx_update_msr_bitmap_x2apic(struct kvm_vcpu *vcpu,
-					 unsigned long *msr_bitmap, u8 mode)
+					unsigned long *msr_bitmap, u8 mode)
 {
 	int msr;
 
@@ -3819,6 +3823,11 @@ static void vmx_update_msr_bitmap_x2apic(struct kvm_vcpu *vcpu,
 	}
 }
 
+void vmx_set_user_msr_intercept(struct kvm_vcpu *vcpu, u32 msr)
+{
+	vmx_enable_intercept_for_msr(vcpu, msr, MSR_TYPE_RW);
+}
+
 void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -7965,6 +7974,8 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
 	.need_emulation_on_page_fault = vmx_need_emulation_on_page_fault,
 	.apic_init_signal_blocked = vmx_apic_init_signal_blocked,
 	.migrate_timers = vmx_migrate_timers,
+
+	.set_user_msr_intercept = vmx_set_user_msr_intercept,
 };
 
 static __init int hardware_setup(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 47619b49818a..45bf59f94d34 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3537,6 +3537,19 @@ bool kvm_msr_user_exit(struct kvm *kvm, u32 index)
 }
 EXPORT_SYMBOL_GPL(kvm_msr_user_exit);
 
+static void kvm_set_user_msr_intercepts(struct kvm_vcpu *vcpu)
+{
+	struct kvm_msr_list *msr_list = vcpu->kvm->arch.user_exit_msrs;
+	u32 i, msr;
+
+	if (msr_list) {
+		for (i = 0; i < msr_list->nmsrs; i++) {
+			msr = msr_list->indices[i];
+			kvm_x86_ops.set_user_msr_intercept(vcpu, msr);
+		}
+	}
+}
+
 int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 {
 	int r = 0;
@@ -8553,6 +8566,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 			kvm_vcpu_update_apicv(vcpu);
 		if (kvm_check_request(KVM_REQ_APF_READY, vcpu))
 			kvm_check_async_pf_completion(vcpu);
+
+		if (kvm_check_request(KVM_REQ_USER_MSR_UPDATE, vcpu))
+			kvm_set_user_msr_intercepts(vcpu);
 	}
 
 	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
-- 
2.28.0.163.g6104cc2f0b6-goog



* [PATCH 5/6] selftests: kvm: Fix the segment descriptor layout to match the actual layout
  2020-08-04  4:20 [PATCH 0/6] Allow userspace to manage MSRs Aaron Lewis
                   ` (3 preceding siblings ...)
  2020-08-04  4:20 ` [PATCH 4/6] KVM: x86: Ensure the MSR bitmap never clears " Aaron Lewis
@ 2020-08-04  4:20 ` Aaron Lewis
  2020-08-04  4:20 ` [PATCH 6/6] selftests: kvm: Add test to exercise userspace MSR list Aaron Lewis
  5 siblings, 0 replies; 20+ messages in thread
From: Aaron Lewis @ 2020-08-04  4:20 UTC (permalink / raw)
  To: jmattson, graf; +Cc: pshier, oupton, kvm, Aaron Lewis

Fix the layout of 'struct desc64' to match the layout described in the
SDM Vol 3, 3.4.5 "Segment Descriptors", Figure 3-8.  The test added
later in this series relies on this layout and crashes if it is not
correct.
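
For reference, the corrected bit positions in the descriptor,
paraphrasing Figure 3-8:

	bits 40-43: type	(segment type)
	bit  44:    s		(0 = system, 1 = code or data)
	bits 45-46: dpl		(descriptor privilege level)
	bit  47:    p		(segment present)

The old layout had 's' at bit 40 and 'type' at bits 41-44.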

Signed-off-by: Aaron Lewis <aaronlewis@google.com>
---
 tools/testing/selftests/kvm/include/x86_64/processor.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index 82b7fe16a824..0a65e7bb5249 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -59,7 +59,7 @@ struct gpr64_regs {
 struct desc64 {
 	uint16_t limit0;
 	uint16_t base0;
-	unsigned base1:8, s:1, type:4, dpl:2, p:1;
+	unsigned base1:8, type:4, s:1, dpl:2, p:1;
 	unsigned limit1:4, avl:1, l:1, db:1, g:1, base2:8;
 	uint32_t base3;
 	uint32_t zero1;
-- 
2.28.0.163.g6104cc2f0b6-goog



* [PATCH 6/6] selftests: kvm: Add test to exercise userspace MSR list
  2020-08-04  4:20 [PATCH 0/6] Allow userspace to manage MSRs Aaron Lewis
                   ` (4 preceding siblings ...)
  2020-08-04  4:20 ` [PATCH 5/6] selftests: kvm: Fix the segment descriptor layout to match the actual layout Aaron Lewis
@ 2020-08-04  4:20 ` Aaron Lewis
  2020-08-07 23:28   ` Alexander Graf
  2020-08-10  9:53   ` Andrew Jones
  5 siblings, 2 replies; 20+ messages in thread
From: Aaron Lewis @ 2020-08-04  4:20 UTC (permalink / raw)
  To: jmattson, graf; +Cc: pshier, oupton, kvm, Aaron Lewis

Add a selftest to test that when ioctl KVM_SET_EXIT_MSRS is called with
an MSR list the guest exits to the host and then to userspace when an
MSR in that list is read from or written to.

This test uses 3 MSRs to test these new features:
  1. MSR_IA32_XSS, an MSR the kernel knows about.
  2. MSR_IA32_FLUSH_CMD, an MSR the kernel does not know about.
  3. MSR_NON_EXISTENT, an MSR invented for this test.  The guest can
     read from and write to it, with userspace handling the accesses;
     KVM just acts as a pass-through.

Userspace is also able to inject a #GP.  This is demonstrated when
MSR_IA32_XSS and MSR_IA32_FLUSH_CMD are misused in the test: userspace
requests that a #GP be thrown in the guest.  To handle this, exception
handling support is added to selftests, and a #GP handler is registered
in the test, allowing the guest to receive the injected #GP and recover
gracefully.
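
A minimal sketch of how the test uses the new exception support (the
helpers are the ones added by this patch; advancing rip by 2 assumes the
faulting instruction is a plain rdmsr or wrmsr, as it is in this test):

	static void guest_gp_handler(struct ex_regs *regs)
	{
		guest_exception_count++;
		/* rdmsr (0f 32) and wrmsr (0f 30) are two bytes long. */
		regs->rip += 2;
	}

	vm_init_descriptor_tables(vm);
	vcpu_init_descriptor_tables(vm, VCPU_ID);
	vm_handle_exception(vm, 13 /* #GP */, guest_gp_handler);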

Signed-off-by: Aaron Lewis <aaronlewis@google.com>
---
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |  20 +-
 .../selftests/kvm/include/x86_64/processor.h  |  27 ++
 tools/testing/selftests/kvm/lib/kvm_util.c    |  17 ++
 .../selftests/kvm/lib/kvm_util_internal.h     |   2 +
 .../selftests/kvm/lib/x86_64/handlers.S       |  83 ++++++
 .../selftests/kvm/lib/x86_64/processor.c      | 168 ++++++++++-
 .../testing/selftests/kvm/lib/x86_64/ucall.c  |   3 +
 .../selftests/kvm/x86_64/userspace_msr_exit.c | 271 ++++++++++++++++++
 9 files changed, 582 insertions(+), 10 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/lib/x86_64/handlers.S
 create mode 100644 tools/testing/selftests/kvm/x86_64/userspace_msr_exit.c

diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
index 452787152748..33619f915857 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -14,6 +14,7 @@
 /x86_64/vmx_preemption_timer_test
 /x86_64/svm_vmcall_test
 /x86_64/sync_regs_test
+/x86_64/userspace_msr_exit
 /x86_64/vmx_close_while_nested_test
 /x86_64/vmx_dirty_log_test
 /x86_64/vmx_set_nested_state_test
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 4a166588d99f..66a6652ca305 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -34,7 +34,7 @@ ifeq ($(ARCH),s390)
 endif
 
 LIBKVM = lib/assert.c lib/elf.c lib/io.c lib/kvm_util.c lib/sparsebit.c lib/test_util.c
-LIBKVM_x86_64 = lib/x86_64/processor.c lib/x86_64/vmx.c lib/x86_64/svm.c lib/x86_64/ucall.c
+LIBKVM_x86_64 = lib/x86_64/processor.c lib/x86_64/vmx.c lib/x86_64/svm.c lib/x86_64/ucall.c lib/x86_64/handlers.S
 LIBKVM_aarch64 = lib/aarch64/processor.c lib/aarch64/ucall.c
 LIBKVM_s390x = lib/s390x/processor.c lib/s390x/ucall.c
 
@@ -49,6 +49,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/state_test
 TEST_GEN_PROGS_x86_64 += x86_64/vmx_preemption_timer_test
 TEST_GEN_PROGS_x86_64 += x86_64/svm_vmcall_test
 TEST_GEN_PROGS_x86_64 += x86_64/sync_regs_test
+TEST_GEN_PROGS_x86_64 += x86_64/userspace_msr_exit
 TEST_GEN_PROGS_x86_64 += x86_64/vmx_close_while_nested_test
 TEST_GEN_PROGS_x86_64 += x86_64/vmx_dirty_log_test
 TEST_GEN_PROGS_x86_64 += x86_64/vmx_set_nested_state_test
@@ -108,14 +109,21 @@ LDFLAGS += -pthread $(no-pie-option) $(pgste-option)
 include ../lib.mk
 
 STATIC_LIBS := $(OUTPUT)/libkvm.a
-LIBKVM_OBJ := $(patsubst %.c, $(OUTPUT)/%.o, $(LIBKVM))
-EXTRA_CLEAN += $(LIBKVM_OBJ) $(STATIC_LIBS) cscope.*
+LIBKVM_C := $(filter %.c,$(LIBKVM))
+LIBKVM_S := $(filter %.S,$(LIBKVM))
+LIBKVM_C_OBJ := $(patsubst %.c, $(OUTPUT)/%.o, $(LIBKVM_C))
+LIBKVM_S_OBJ := $(patsubst %.S, $(OUTPUT)/%.o, $(LIBKVM_S))
+EXTRA_CLEAN += $(LIBKVM_C_OBJ) $(LIBKVM_S_OBJ) $(STATIC_LIBS) cscope.*
+
+x := $(shell mkdir -p $(sort $(dir $(LIBKVM_C_OBJ) $(LIBKVM_S_OBJ))))
+$(LIBKVM_C_OBJ): $(OUTPUT)/%.o: %.c
+	$(CC) $(CFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c $< -o $@
 
-x := $(shell mkdir -p $(sort $(dir $(LIBKVM_OBJ))))
-$(LIBKVM_OBJ): $(OUTPUT)/%.o: %.c
+$(LIBKVM_S_OBJ): $(OUTPUT)/%.o: %.S
 	$(CC) $(CFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c $< -o $@
 
-$(OUTPUT)/libkvm.a: $(LIBKVM_OBJ)
+LIBKVM_OBJS = $(LIBKVM_C_OBJ) $(LIBKVM_S_OBJ)
+$(OUTPUT)/libkvm.a: $(LIBKVM_OBJS)
 	$(AR) crs $@ $^
 
 x := $(shell mkdir -p $(sort $(dir $(TEST_GEN_PROGS))))
diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index 0a65e7bb5249..a4de429eb408 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -36,6 +36,8 @@
 #define X86_CR4_SMAP		(1ul << 21)
 #define X86_CR4_PKE		(1ul << 22)
 
+#define UNEXPECTED_VECTOR_PORT 0xfff0u
+
 /* General Registers in 64-Bit Mode */
 struct gpr64_regs {
 	u64 rax;
@@ -239,6 +241,11 @@ static inline struct desc_ptr get_idt(void)
 	return idt;
 }
 
+static inline void outl(uint16_t port, uint32_t value)
+{
+	__asm__ __volatile__("outl %%eax, %%dx" : : "d"(port), "a"(value));
+}
+
 #define SET_XMM(__var, __xmm) \
 	asm volatile("movq %0, %%"#__xmm : : "r"(__var) : #__xmm)
 
@@ -334,10 +341,30 @@ int _vcpu_set_msr(struct kvm_vm *vm, uint32_t vcpuid, uint64_t msr_index,
 void vcpu_set_msr(struct kvm_vm *vm, uint32_t vcpuid, uint64_t msr_index,
 	  	  uint64_t msr_value);
 
+void kvm_set_exit_msrs(struct kvm_vm *vm, uint32_t nmsrs,
+	uint32_t msr_indices[]);
+
 uint32_t kvm_get_cpuid_max_basic(void);
 uint32_t kvm_get_cpuid_max_extended(void);
 void kvm_get_cpu_address_width(unsigned int *pa_bits, unsigned int *va_bits);
 
+struct ex_regs {
+	uint64_t rax, rcx, rdx, rbx;
+	uint64_t dummy, rbp, rsi, rdi;
+	uint64_t r8, r9, r10, r11;
+	uint64_t r12, r13, r14, r15;
+	uint64_t vector;
+	uint64_t error_code;
+	uint64_t rip;
+	uint64_t cs;
+	uint64_t rflags;
+};
+
+void vm_init_descriptor_tables(struct kvm_vm *vm);
+void vcpu_init_descriptor_tables(struct kvm_vm *vm, uint32_t vcpuid);
+void vm_handle_exception(struct kvm_vm *vm, int vector,
+			void (*handler)(struct ex_regs *));
+
 /*
  * Basic CPU control in CR0
  */
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 74776ee228f2..f8dde1cdbef0 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -1195,6 +1195,21 @@ int _vcpu_run(struct kvm_vm *vm, uint32_t vcpuid)
 	do {
 		rc = ioctl(vcpu->fd, KVM_RUN, NULL);
 	} while (rc == -1 && errno == EINTR);
+
+#ifdef __x86_64__
+	if (vcpu_state(vm, vcpuid)->exit_reason == KVM_EXIT_IO &&
+	    vcpu_state(vm, vcpuid)->io.port == UNEXPECTED_VECTOR_PORT &&
+	    vcpu_state(vm, vcpuid)->io.size == 4) {
+		/* Grab pointer to io data */
+		uint32_t *data = (void *)vcpu_state(vm, vcpuid)
+			+ vcpu_state(vm, vcpuid)->io.data_offset;
+
+		TEST_ASSERT(false,
+			    "Unexpected vectored event in guest (vector:0x%x)",
+			    *data);
+	}
+#endif
+
 	return rc;
 }
 
@@ -1590,6 +1605,8 @@ static struct exit_reason {
 	{KVM_EXIT_INTERNAL_ERROR, "INTERNAL_ERROR"},
 	{KVM_EXIT_OSI, "OSI"},
 	{KVM_EXIT_PAPR_HCALL, "PAPR_HCALL"},
+	{KVM_EXIT_X86_RDMSR, "RDMSR"},
+	{KVM_EXIT_X86_WRMSR, "WRMSR"},
 #ifdef KVM_EXIT_MEMORY_NOT_PRESENT
 	{KVM_EXIT_MEMORY_NOT_PRESENT, "MEMORY_NOT_PRESENT"},
 #endif
diff --git a/tools/testing/selftests/kvm/lib/kvm_util_internal.h b/tools/testing/selftests/kvm/lib/kvm_util_internal.h
index 2ef446520748..f07d383d03a1 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util_internal.h
+++ b/tools/testing/selftests/kvm/lib/kvm_util_internal.h
@@ -50,6 +50,8 @@ struct kvm_vm {
 	vm_paddr_t pgd;
 	vm_vaddr_t gdt;
 	vm_vaddr_t tss;
+	vm_vaddr_t idt;
+	vm_vaddr_t handlers;
 };
 
 struct vcpu *vcpu_find(struct kvm_vm *vm, uint32_t vcpuid);
diff --git a/tools/testing/selftests/kvm/lib/x86_64/handlers.S b/tools/testing/selftests/kvm/lib/x86_64/handlers.S
new file mode 100644
index 000000000000..783d2c0fc741
--- /dev/null
+++ b/tools/testing/selftests/kvm/lib/x86_64/handlers.S
@@ -0,0 +1,83 @@
+handle_exception:
+	push %r15
+	push %r14
+	push %r13
+	push %r12
+	push %r11
+	push %r10
+	push %r9
+	push %r8
+
+	push %rdi
+	push %rsi
+	push %rbp
+	sub $8, %rsp
+	push %rbx
+	push %rdx
+	push %rcx
+	push %rax
+	mov %rsp, %rdi
+
+	call route_exception
+
+	pop %rax
+	pop %rcx
+	pop %rdx
+	pop %rbx
+	add $8, %rsp
+	pop %rbp
+	pop %rsi
+	pop %rdi
+	pop %r8
+	pop %r9
+	pop %r10
+	pop %r11
+	pop %r12
+	pop %r13
+	pop %r14
+	pop %r15
+
+	/* Discard vector and error code. */
+	add $16, %rsp
+	iretq
+
+/*
+ * Build the handle_exception wrappers which push the vector/error code on the
+ * stack and an array of pointers to those wrappers.
+ */
+.pushsection .rodata
+.globl idt_handlers
+idt_handlers:
+.popsection
+
+.macro HANDLERS has_error from to
+	vector = \from
+	.rept \to - \from + 1
+	.align 8
+
+	/* Fetch current address and append it to idt_handlers. */
+	current_handler = .
+.pushsection .rodata
+.quad current_handler
+.popsection
+
+	.if ! \has_error
+	pushq $0
+	.endif
+	pushq $vector
+	jmp handle_exception
+	vector = vector + 1
+	.endr
+.endm
+
+.global idt_handler_code
+idt_handler_code:
+	HANDLERS has_error=0 from=0  to=7
+	HANDLERS has_error=1 from=8  to=8
+	HANDLERS has_error=0 from=9  to=9
+	HANDLERS has_error=1 from=10 to=14
+	HANDLERS has_error=0 from=15 to=16
+	HANDLERS has_error=1 from=17 to=17
+	HANDLERS has_error=0 from=18 to=255
+
+.section        .note.GNU-stack, "", %progbits
diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
index f6eb34eaa0d2..ff56753f205f 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
@@ -12,6 +12,13 @@
 #include "../kvm_util_internal.h"
 #include "processor.h"
 
+#ifndef NUM_INTERRUPTS
+#define NUM_INTERRUPTS 256
+#endif
+
+#define DEFAULT_CODE_SELECTOR 0x8
+#define DEFAULT_DATA_SELECTOR 0x10
+
 /* Minimum physical address used for virtual translation tables. */
 #define KVM_GUEST_PAGE_TABLE_MIN_PADDR 0x180000
 
@@ -392,11 +399,12 @@ static void kvm_seg_fill_gdt_64bit(struct kvm_vm *vm, struct kvm_segment *segp)
 	desc->limit0 = segp->limit & 0xFFFF;
 	desc->base0 = segp->base & 0xFFFF;
 	desc->base1 = segp->base >> 16;
-	desc->s = segp->s;
 	desc->type = segp->type;
+	desc->s = segp->s;
 	desc->dpl = segp->dpl;
 	desc->p = segp->present;
 	desc->limit1 = segp->limit >> 16;
+	desc->avl = segp->avl;
 	desc->l = segp->l;
 	desc->db = segp->db;
 	desc->g = segp->g;
@@ -556,9 +564,9 @@ static void vcpu_setup(struct kvm_vm *vm, int vcpuid, int pgd_memslot, int gdt_m
 		sregs.efer |= (EFER_LME | EFER_LMA | EFER_NX);
 
 		kvm_seg_set_unusable(&sregs.ldt);
-		kvm_seg_set_kernel_code_64bit(vm, 0x8, &sregs.cs);
-		kvm_seg_set_kernel_data_64bit(vm, 0x10, &sregs.ds);
-		kvm_seg_set_kernel_data_64bit(vm, 0x10, &sregs.es);
+		kvm_seg_set_kernel_code_64bit(vm, DEFAULT_CODE_SELECTOR, &sregs.cs);
+		kvm_seg_set_kernel_data_64bit(vm, DEFAULT_DATA_SELECTOR, &sregs.ds);
+		kvm_seg_set_kernel_data_64bit(vm, DEFAULT_DATA_SELECTOR, &sregs.es);
 		kvm_setup_tss_64bit(vm, &sregs.tr, 0x18, gdt_memslot, pgd_memslot);
 		break;
 
@@ -843,6 +851,71 @@ void vcpu_set_msr(struct kvm_vm *vm, uint32_t vcpuid, uint64_t msr_index,
 		"  rc: %i errno: %i", r, errno);
 }
 
+/*
+ * _KVM Set Exit MSR
+ *
+ * Input Args:
+ *   vm - Virtual Machine
+ *   nmsrs - Number of msrs in msr_indices
+ *   msr_indices[] - List of msrs.
+ *
+ * Output Args: None
+ *
+ * Return: The result of KVM_SET_EXIT_MSRS.
+ *
+ * Sets a list of MSRs that will force an exit to userspace when
+ * any of them are read from or written to by the guest.
+ */
+int _kvm_set_exit_msrs(struct kvm_vm *vm, uint32_t nmsrs,
+	uint32_t msr_indices[])
+{
+	const uint32_t max_nmsrs = 256;
+	struct kvm_msr_list *msr_list;
+	uint32_t i;
+	int r;
+
+	TEST_ASSERT(nmsrs <= max_nmsrs,
+		"'nmsrs' is too large.  Max is %u, currently %u.\n",
+		max_nmsrs, nmsrs);
+	uint32_t msr_list_byte_size = sizeof(struct kvm_msr_list) +
+				      (sizeof(msr_list->indices[0]) * nmsrs);
+	msr_list = alloca(msr_list_byte_size);
+	memset(msr_list, 0, msr_list_byte_size);
+
+	msr_list->nmsrs = nmsrs;
+	for (i = 0; i < nmsrs; i++)
+		msr_list->indices[i] = msr_indices[i];
+
+	r = ioctl(vm->fd, KVM_SET_EXIT_MSRS, msr_list);
+
+	return r;
+}
+
+/*
+ * KVM Set Exit MSR
+ *
+ * Input Args:
+ *   vm - Virtual Machine
+ *   nmsrs - Number of msrs in msr_indices
+ *   msr_indices[] - List of msrs.
+ *
+ * Output Args: None
+ *
+ * Return: None
+ *
+ * Sets a list of MSRs that will force an exit to userspace when
+ * any of them are read from or written to by the guest.
+ */
+void kvm_set_exit_msrs(struct kvm_vm *vm, uint32_t nmsrs,
+	uint32_t msr_indices[])
+{
+	int r;
+
+	r = _kvm_set_exit_msrs(vm, nmsrs, msr_indices);
+	TEST_ASSERT(r == 0, "KVM_SET_EXIT_MSRS IOCTL failed,\n"
+		"  rc: %i errno: %i", r, errno);
+}
+
 void vcpu_args_set(struct kvm_vm *vm, uint32_t vcpuid, unsigned int num, ...)
 {
 	va_list ap;
@@ -1118,3 +1191,90 @@ void kvm_get_cpu_address_width(unsigned int *pa_bits, unsigned int *va_bits)
 		*va_bits = (entry->eax >> 8) & 0xff;
 	}
 }
+
+struct idt_entry {
+	uint16_t offset0;
+	uint16_t selector;
+	uint16_t ist : 3;
+	uint16_t : 5;
+	uint16_t type : 4;
+	uint16_t : 1;
+	uint16_t dpl : 2;
+	uint16_t p : 1;
+	uint16_t offset1;
+	uint32_t offset2;
+	uint32_t reserved;
+};
+
+static void set_idt_entry(struct kvm_vm *vm, int vector, unsigned long addr,
+			  int dpl, unsigned short selector)
+{
+	struct idt_entry *base =
+		(struct idt_entry *)addr_gva2hva(vm, vm->idt);
+	struct idt_entry *e = &base[vector];
+
+	memset(e, 0, sizeof(*e));
+	e->offset0 = addr;
+	e->selector = selector;
+	e->ist = 0;
+	e->type = 14;
+	e->dpl = dpl;
+	e->p = 1;
+	e->offset1 = addr >> 16;
+	e->offset2 = addr >> 32;
+}
+
+void kvm_exit_unexpected_vector(uint32_t value)
+{
+	outl(UNEXPECTED_VECTOR_PORT, value);
+}
+
+void route_exception(struct ex_regs *regs)
+{
+	typedef void(*handler)(struct ex_regs *);
+	handler *handlers;
+
+	handlers = (handler *)rdmsr(MSR_GS_BASE);
+
+	if (handlers[regs->vector]) {
+		handlers[regs->vector](regs);
+		return;
+	}
+
+	kvm_exit_unexpected_vector(regs->vector);
+}
+
+void vm_init_descriptor_tables(struct kvm_vm *vm)
+{
+	extern void *idt_handlers;
+	int i;
+
+	vm->idt = vm_vaddr_alloc(vm, getpagesize(), 0x2000, 0, 0);
+	vm->handlers = vm_vaddr_alloc(vm, 256 * sizeof(void *), 0x2000, 0, 0);
+	/* Handlers have the same address in both address spaces. */
+	for (i = 0; i < NUM_INTERRUPTS; i++)
+		set_idt_entry(vm, i, (unsigned long)(&idt_handlers)[i], 0,
+			DEFAULT_CODE_SELECTOR);
+}
+
+void vcpu_init_descriptor_tables(struct kvm_vm *vm, uint32_t vcpuid)
+{
+	struct kvm_sregs sregs;
+
+	vcpu_sregs_get(vm, vcpuid, &sregs);
+	sregs.idt.base = vm->idt;
+	sregs.idt.limit = NUM_INTERRUPTS * sizeof(struct idt_entry) - 1;
+	sregs.gdt.base = vm->gdt;
+	sregs.gdt.limit = getpagesize() - 1;
+	/* Use GS Base to pass the pointer to the handlers to the guest. */
+	kvm_seg_set_kernel_data_64bit(NULL, DEFAULT_DATA_SELECTOR, &sregs.gs);
+	sregs.gs.base = (unsigned long) vm->handlers;
+	vcpu_sregs_set(vm, vcpuid, &sregs);
+}
+
+void vm_handle_exception(struct kvm_vm *vm, int vector,
+			 void (*handler)(struct ex_regs *))
+{
+	vm_vaddr_t *handlers = (vm_vaddr_t *)addr_gva2hva(vm, vm->handlers);
+
+	handlers[vector] = (vm_vaddr_t)handler;
+}
diff --git a/tools/testing/selftests/kvm/lib/x86_64/ucall.c b/tools/testing/selftests/kvm/lib/x86_64/ucall.c
index da4d89ad5419..a3489973e290 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/ucall.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/ucall.c
@@ -40,6 +40,9 @@ uint64_t get_ucall(struct kvm_vm *vm, uint32_t vcpu_id, struct ucall *uc)
 	struct kvm_run *run = vcpu_state(vm, vcpu_id);
 	struct ucall ucall = {};
 
+	if (uc)
+		memset(uc, 0, sizeof(*uc));
+
 	if (run->exit_reason == KVM_EXIT_IO && run->io.port == UCALL_PIO_PORT) {
 		struct kvm_regs regs;
 
diff --git a/tools/testing/selftests/kvm/x86_64/userspace_msr_exit.c b/tools/testing/selftests/kvm/x86_64/userspace_msr_exit.c
new file mode 100644
index 000000000000..6c7868dbce08
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/userspace_msr_exit.c
@@ -0,0 +1,271 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2020, Google LLC.
+ *
+ * Tests for exiting into userspace on registered MSRs
+ */
+
+#define _GNU_SOURCE /* for program_invocation_short_name */
+#include <sys/ioctl.h>
+
+#include "test_util.h"
+#include "kvm_util.h"
+#include "vmx.h"
+
+#define VCPU_ID	      1
+
+#define MSR_NON_EXISTENT 0x474f4f00
+
+uint32_t msrs[] = {
+	/* Test an MSR the kernel knows about. */
+	MSR_IA32_XSS,
+	/* Test an MSR the kernel doesn't know about. */
+	MSR_IA32_FLUSH_CMD,
+	/* Test a fabricated MSR that no one knows about. */
+	MSR_NON_EXISTENT,
+};
+uint32_t nmsrs = ARRAY_SIZE(msrs);
+
+uint64_t msr_non_existent_data;
+int guest_exception_count;
+
+/*
+ * Note: Force test_rdmsr() to not be inlined to prevent the labels,
+ * rdmsr_start and rdmsr_end, from being defined multiple times.
+ */
+static noinline uint64_t test_rdmsr(uint32_t msr)
+{
+	uint32_t a, d;
+
+	guest_exception_count = 0;
+
+	__asm__ __volatile__("rdmsr_start: rdmsr; rdmsr_end:" :
+			"=a"(a), "=d"(d) : "c"(msr) : "memory");
+
+	return a | ((uint64_t) d << 32);
+}
+
+/*
+ * Note: Force test_wrmsr() to not be inlined to prevent the labels,
+ * wrmsr_start and wrmsr_end, from being defined multiple times.
+ */
+static noinline void test_wrmsr(uint32_t msr, uint64_t value)
+{
+	uint32_t a = value;
+	uint32_t d = value >> 32;
+
+	guest_exception_count = 0;
+
+	__asm__ __volatile__("wrmsr_start: wrmsr; wrmsr_end:" ::
+			"a"(a), "d"(d), "c"(msr) : "memory");
+}
+
+extern char rdmsr_start, rdmsr_end;
+extern char wrmsr_start, wrmsr_end;
+
+static void guest_code(void)
+{
+	uint64_t data;
+
+	/*
+	 * Test userspace intercepting rdmsr / wrmsr for MSR_IA32_XSS.
+	 *
+	 * A GP is thrown if anything other than 0 is written to
+	 * MSR_IA32_XSS.
+	 */
+	data = test_rdmsr(MSR_IA32_XSS);
+	GUEST_ASSERT(data == 0);
+	GUEST_ASSERT(guest_exception_count == 0);
+
+	test_wrmsr(MSR_IA32_XSS, 0);
+	GUEST_ASSERT(guest_exception_count == 0);
+
+	test_wrmsr(MSR_IA32_XSS, 1);
+	GUEST_ASSERT(guest_exception_count == 1);
+
+	/*
+	 * Test userspace intercepting rdmsr / wrmsr for MSR_IA32_FLUSH_CMD.
+	 *
+	 * A GP is thrown if MSR_IA32_FLUSH_CMD is read
+	 * from or if a value other than 1 is written to it.
+	 */
+	test_rdmsr(MSR_IA32_FLUSH_CMD);
+	GUEST_ASSERT(guest_exception_count == 1);
+
+	test_wrmsr(MSR_IA32_FLUSH_CMD, 0);
+	GUEST_ASSERT(guest_exception_count == 1);
+
+	test_wrmsr(MSR_IA32_FLUSH_CMD, 1);
+	GUEST_ASSERT(guest_exception_count == 0);
+
+	/*
+	 * Test userspace intercepting rdmsr / wrmsr for MSR_NON_EXISTENT.
+	 *
+	 * Test that a fabricated MSR can pass through the kernel
+	 * and be handled in userspace.
+	 */
+	test_wrmsr(MSR_NON_EXISTENT, 2);
+	GUEST_ASSERT(guest_exception_count == 0);
+
+	data = test_rdmsr(MSR_NON_EXISTENT);
+	GUEST_ASSERT(data == 2);
+	GUEST_ASSERT(guest_exception_count == 0);
+
+	GUEST_DONE();
+}
+
+static void guest_gp_handler(struct ex_regs *regs)
+{
+	if (regs->rip == (uintptr_t)&rdmsr_start) {
+		regs->rip = (uintptr_t)&rdmsr_end;
+		regs->rax = 0;
+		regs->rdx = 0;
+	} else if (regs->rip == (uintptr_t)&wrmsr_start) {
+		regs->rip = (uintptr_t)&wrmsr_end;
+	} else {
+		GUEST_ASSERT(!"RIP is at an unknown location!");
+	}
+
+	++guest_exception_count;
+}
+
+static void run_guest(struct kvm_vm *vm)
+{
+	int rc;
+
+	rc = _vcpu_run(vm, VCPU_ID);
+	TEST_ASSERT(rc == 0, "vcpu_run failed: %d\n", rc);
+}
+
+static void check_for_guest_assert(struct kvm_vm *vm)
+{
+	struct kvm_run *run = vcpu_state(vm, VCPU_ID);
+	struct ucall uc;
+
+	if (run->exit_reason == KVM_EXIT_IO &&
+	    get_ucall(vm, VCPU_ID, &uc) == UCALL_ABORT) {
+		TEST_FAIL("%s at %s:%ld", (const char *)uc.args[0],
+			  __FILE__, uc.args[1]);
+	}
+}
+
+static void process_rdmsr(struct kvm_vm *vm, uint32_t msr_index)
+{
+	struct kvm_run *run = vcpu_state(vm, VCPU_ID);
+
+	check_for_guest_assert(vm);
+
+	TEST_ASSERT(run->exit_reason == KVM_EXIT_X86_RDMSR,
+		    "Unexpected exit reason: %u (%s),\n",
+		    run->exit_reason,
+		    exit_reason_str(run->exit_reason));
+	TEST_ASSERT(run->msr.index == msr_index,
+			"Unexpected msr (0x%04x), expected 0x%04x",
+			run->msr.index, msr_index);
+
+	switch (run->msr.index) {
+	case MSR_IA32_XSS:
+		run->msr.data = 0;
+		break;
+	case MSR_IA32_FLUSH_CMD:
+		run->msr.inject_gp = 1;
+		break;
+	case MSR_NON_EXISTENT:
+		run->msr.data = msr_non_existent_data;
+		break;
+	default:
+		TEST_ASSERT(false, "Unexpected MSR: 0x%04x", run->msr.index);
+	}
+}
+
+static void process_wrmsr(struct kvm_vm *vm, uint32_t msr_index)
+{
+	struct kvm_run *run = vcpu_state(vm, VCPU_ID);
+
+	check_for_guest_assert(vm);
+
+	TEST_ASSERT(run->exit_reason == KVM_EXIT_X86_WRMSR,
+		    "Unexpected exit reason: %u (%s),\n",
+		    run->exit_reason,
+		    exit_reason_str(run->exit_reason));
+	TEST_ASSERT(run->msr.index == msr_index,
+			"Unexpected msr (0x%04x), expected 0x%04x",
+			run->msr.index, msr_index);
+
+	switch (run->msr.index) {
+	case MSR_IA32_XSS:
+		if (run->msr.data != 0)
+			run->msr.inject_gp = 1;
+		break;
+	case MSR_IA32_FLUSH_CMD:
+		if (run->msr.data != 1)
+			run->msr.inject_gp = 1;
+		break;
+	case MSR_NON_EXISTENT:
+		msr_non_existent_data = run->msr.data;
+		break;
+	default:
+		TEST_ASSERT(false, "Unexpected MSR: 0x%04x", run->msr.index);
+	}
+}
+
+static void process_ucall_done(struct kvm_vm *vm)
+{
+	struct kvm_run *run = vcpu_state(vm, VCPU_ID);
+	struct ucall uc;
+
+	TEST_ASSERT(run->exit_reason == KVM_EXIT_IO,
+		    "Unexpected exit reason: %u (%s)",
+		    run->exit_reason,
+		    exit_reason_str(run->exit_reason));
+
+	TEST_ASSERT(get_ucall(vm, VCPU_ID, &uc) == UCALL_DONE,
+		    "Unexpected ucall command: %lu, expected UCALL_DONE (%d)",
+		    uc.cmd, UCALL_DONE);
+}
+
+int main(int argc, char *argv[])
+{
+	struct kvm_vm *vm;
+
+	vm = vm_create(VM_MODE_DEFAULT, DEFAULT_GUEST_PHY_PAGES, O_RDWR);
+	kvm_vm_elf_load(vm, program_invocation_name, 0, 0);
+	vm_create_irqchip(vm);
+
+	kvm_set_exit_msrs(vm, nmsrs, msrs);
+
+	vm_vcpu_add_default(vm, VCPU_ID, guest_code);
+
+	vm_init_descriptor_tables(vm);
+	vcpu_init_descriptor_tables(vm, VCPU_ID);
+
+	vm_handle_exception(vm, GP_VECTOR, guest_gp_handler);
+
+	/* Process guest code userspace exits */
+	run_guest(vm);
+	process_rdmsr(vm, MSR_IA32_XSS);
+	run_guest(vm);
+	process_wrmsr(vm, MSR_IA32_XSS);
+	run_guest(vm);
+	process_wrmsr(vm, MSR_IA32_XSS);
+
+	run_guest(vm);
+	process_rdmsr(vm, MSR_IA32_FLUSH_CMD);
+	run_guest(vm);
+	process_wrmsr(vm, MSR_IA32_FLUSH_CMD);
+	run_guest(vm);
+	process_wrmsr(vm, MSR_IA32_FLUSH_CMD);
+
+	run_guest(vm);
+	process_wrmsr(vm, MSR_NON_EXISTENT);
+	run_guest(vm);
+	process_rdmsr(vm, MSR_NON_EXISTENT);
+
+	run_guest(vm);
+	process_ucall_done(vm);
+
+	kvm_vm_free(vm);
+	return 0;
+}
-- 
2.28.0.163.g6104cc2f0b6-goog


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/6] KVM: x86: Add ioctl for accepting a userspace provided MSR list
  2020-08-04  4:20 ` [PATCH 1/6] KVM: x86: Add ioctl for accepting a userspace provided MSR list Aaron Lewis
@ 2020-08-07 16:12   ` Alexander Graf
  2020-08-07 21:16     ` Jim Mattson
  0 siblings, 1 reply; 20+ messages in thread
From: Alexander Graf @ 2020-08-07 16:12 UTC (permalink / raw)
  To: Aaron Lewis, jmattson; +Cc: pshier, oupton, kvm



On 04.08.20 06:20, Aaron Lewis wrote:
> Add KVM_SET_EXIT_MSRS ioctl to allow userspace to pass in a list of MSRs
> that force an exit to userspace when rdmsr or wrmsr are used by the
> guest.
> 
> KVM_SET_EXIT_MSRS will need to be called before any vCPUs are
> created to protect the 'user_exit_msrs' list from being mutated while
> vCPUs are running.
> 
> Add KVM_CAP_SET_MSR_EXITS to identify the feature exists.
> 
> Signed-off-by: Aaron Lewis <aaronlewis@google.com>
> Reviewed-by: Oliver Upton <oupton@google.com>
> ---
>   Documentation/virt/kvm/api.rst  | 24 +++++++++++++++++++
>   arch/x86/include/asm/kvm_host.h |  2 ++
>   arch/x86/kvm/x86.c              | 41 +++++++++++++++++++++++++++++++++
>   include/uapi/linux/kvm.h        |  2 ++
>   4 files changed, 69 insertions(+)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 320788f81a05..7d8167c165aa 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -1006,6 +1006,30 @@ such as migration.
>   :Parameters: struct kvm_vcpu_event (out)
>   :Returns: 0 on success, -1 on error
> 
> +4.32 KVM_SET_EXIT_MSRS
> +------------------
> +
> +:Capability: KVM_CAP_SET_MSR_EXITS
> +:Architectures: x86
> +:Type: vm ioctl
> +:Parameters: struct kvm_msr_list (in)
> +:Returns: 0 on success, -1 on error
> +
> +Sets the userspace MSR list which is used to track which MSRs KVM should send
> +to userspace to be serviced when the guest executes rdmsr or wrmsr.

Unfortunately this doesn't solve the whole "ignore_msrs" mess that we 
have today. How can I just say "tell user space about unhandled MSRs"? 
And if I add that on top of this mechanism, how do we not make the list 
of MSRs that are handled in-kernel an ABI?

> +
> +This ioctl needs to be called before vCPUs are setup otherwise the list of MSRs
> +will not be accepted and an EINVAL error will be returned.  Also, if a list of
> +MSRs has already been supplied, and this ioctl is called again an EEXIST error
> +will be returned.
> +
> +::
> +
> +  struct kvm_msr_list {
> +  __u32 nmsrs;
> +  __u32 indices[0];
> +};
> +
>   X86:
>   ^^^^
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index be5363b21540..510055471dd0 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1004,6 +1004,8 @@ struct kvm_arch {
> 
>          struct kvm_pmu_event_filter *pmu_event_filter;
>          struct task_struct *nx_lpage_recovery_thread;
> +
> +       struct kvm_msr_list *user_exit_msrs;
>   };
> 
>   struct kvm_vm_stat {
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 88c593f83b28..46a0fb9e0869 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3419,6 +3419,42 @@ static inline bool kvm_can_mwait_in_guest(void)
>                  boot_cpu_has(X86_FEATURE_ARAT);
>   }
> 
> +static int kvm_vm_ioctl_set_exit_msrs(struct kvm *kvm,
> +                                     struct kvm_msr_list __user *user_msr_list)
> +{
> +       struct kvm_msr_list *msr_list, hdr;
> +       size_t indices_size;
> +
> +       if (kvm->arch.user_exit_msrs != NULL)
> +               return -EEXIST;
> +
> +       if (kvm->created_vcpus)
> +               return -EINVAL;

What if we need to change the set of user space handled MSRs 
dynamically, for example because a feature has been enabled through a 
previous MSR?

> +
> +       if (copy_from_user(&hdr, user_msr_list, sizeof(hdr)))
> +               return -EFAULT;
> +
> +       if (hdr.nmsrs >= MAX_IO_MSRS)

This constant is not defined inside the scope of this patch. Is your 
patchset fully bisectable? Please make sure that every patch builds 
successfully on its own :).


Alex

> +               return -E2BIG;
> +
> +       indices_size = sizeof(hdr.indices[0]) * hdr.nmsrs;
> +       msr_list = kvzalloc(sizeof(struct kvm_msr_list) + indices_size,
> +                           GFP_KERNEL_ACCOUNT);
> +       if (!msr_list)
> +               return -ENOMEM;
> +       msr_list->nmsrs = hdr.nmsrs;
> +
> +       if (copy_from_user(msr_list->indices, user_msr_list->indices,
> +                          indices_size)) {
> +               kvfree(msr_list);
> +               return -EFAULT;
> +       }
> +
> +       kvm->arch.user_exit_msrs = msr_list;
> +
> +       return 0;
> +}
> +
>   int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>   {
>          int r = 0;
> @@ -3476,6 +3512,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>          case KVM_CAP_MSR_PLATFORM_INFO:
>          case KVM_CAP_EXCEPTION_PAYLOAD:
>          case KVM_CAP_SET_GUEST_DEBUG:
> +       case KVM_CAP_SET_MSR_EXITS:
>                  r = 1;
>                  break;
>          case KVM_CAP_SYNC_REGS:
> @@ -5261,6 +5298,10 @@ long kvm_arch_vm_ioctl(struct file *filp,
>                  r = 0;
>                  break;
>          }
> +       case KVM_SET_EXIT_MSRS: {
> +               r = kvm_vm_ioctl_set_exit_msrs(kvm, argp);
> +               break;
> +       }
>          case KVM_MEMORY_ENCRYPT_OP: {
>                  r = -ENOTTY;
>                  if (kvm_x86_ops.mem_enc_op)
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 4fdf30316582..de4638c1bd15 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1031,6 +1031,7 @@ struct kvm_ppc_resize_hpt {
>   #define KVM_CAP_PPC_SECURE_GUEST 181
>   #define KVM_CAP_HALT_POLL 182
>   #define KVM_CAP_ASYNC_PF_INT 183
> +#define KVM_CAP_SET_MSR_EXITS 185
> 
>   #ifdef KVM_CAP_IRQ_ROUTING
> 
> @@ -1371,6 +1372,7 @@ struct kvm_s390_ucas_mapping {
>   /* Available with KVM_CAP_PMU_EVENT_FILTER */
>   #define KVM_SET_PMU_EVENT_FILTER  _IOW(KVMIO,  0xb2, struct kvm_pmu_event_filter)
>   #define KVM_PPC_SVM_OFF                  _IO(KVMIO,  0xb3)
> +#define KVM_SET_EXIT_MSRS      _IOW(KVMIO, 0xb4, struct kvm_msr_list)
> 
>   /* ioctl for vm fd */
>   #define KVM_CREATE_DEVICE        _IOWR(KVMIO,  0xe0, struct kvm_create_device)
> --
> 2.28.0.163.g6104cc2f0b6-goog
> 



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/6] KVM: x86: Add ioctl for accepting a userspace provided MSR list
  2020-08-07 16:12   ` Alexander Graf
@ 2020-08-07 21:16     ` Jim Mattson
  2020-08-07 22:49       ` Alexander Graf
  0 siblings, 1 reply; 20+ messages in thread
From: Jim Mattson @ 2020-08-07 21:16 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Aaron Lewis, Peter Shier, Oliver Upton, kvm list

On Fri, Aug 7, 2020 at 9:12 AM Alexander Graf <graf@amazon.com> wrote:
>
>
>
> On 04.08.20 06:20, Aaron Lewis wrote:
> > Add KVM_SET_EXIT_MSRS ioctl to allow userspace to pass in a list of MSRs
> > that force an exit to userspace when rdmsr or wrmsr are used by the
> > guest.
> >
> > KVM_SET_EXIT_MSRS will need to be called before any vCPUs are
> > created to protect the 'user_exit_msrs' list from being mutated while
> > vCPUs are running.
> >
> > Add KVM_CAP_SET_MSR_EXITS to identify the feature exists.
> >
> > Signed-off-by: Aaron Lewis <aaronlewis@google.com>
> > Reviewed-by: Oliver Upton <oupton@google.com>
> > ---
> >   Documentation/virt/kvm/api.rst  | 24 +++++++++++++++++++
> >   arch/x86/include/asm/kvm_host.h |  2 ++
> >   arch/x86/kvm/x86.c              | 41 +++++++++++++++++++++++++++++++++
> >   include/uapi/linux/kvm.h        |  2 ++
> >   4 files changed, 69 insertions(+)
> >
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index 320788f81a05..7d8167c165aa 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -1006,6 +1006,30 @@ such as migration.
> >   :Parameters: struct kvm_vcpu_event (out)
> >   :Returns: 0 on success, -1 on error
> >
> > +4.32 KVM_SET_EXIT_MSRS
> > +------------------
> > +
> > +:Capability: KVM_CAP_SET_MSR_EXITS
> > +:Architectures: x86
> > +:Type: vm ioctl
> > +:Parameters: struct kvm_msr_list (in)
> > +:Returns: 0 on success, -1 on error
> > +
> > +Sets the userspace MSR list which is used to track which MSRs KVM should send
> > +to userspace to be serviced when the guest executes rdmsr or wrmsr.
>
> Unfortunately this doesn't solve the whole "ignore_msrs" mess that we
> have today. How can I just say "tell user space about unhandled MSRs"?
> And if I add that on top of this mechanism, how do we not make the list
> of MSRs that are handled in-kernel an ABI?

Jumping in for Aaron, who is out this afternoon...

This patch doesn't attempt to solve your problem, "tell user space
about unhandled MSRs." It attempts to solve our problem, "tell user
space about a specific set of MSRs, even if kvm learns to handle them
in the future." This is, in fact, what we really wanted to do when
Peter Hornyack implemented the "tell user space about unhandled MSRs"
changes in 2015. We just didn't realize it at the time.

Though your proposal does partially solve our problem, it's a lot
easier to specify a small set of MSRs by inclusion than it is by
exclusion. (Where your proposal falls short of our needs is when
userspace wants to handle an MSR that kvm would not typically
intercept at all.)
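
To make the intended usage concrete, here is a rough sketch of the
userspace side against this series (error handling omitted; vm_fd is
assumed to be an already-created VM fd, and the call has to happen
before any KVM_CREATE_VCPU):

	struct kvm_msr_list *list;
	uint32_t msrs[] = { MSR_IA32_XSS, MSR_IA32_FLUSH_CMD };

	list = calloc(1, sizeof(*list) + sizeof(msrs));
	list->nmsrs = 2;
	memcpy(list->indices, msrs, sizeof(msrs));
	ioctl(vm_fd, KVM_SET_EXIT_MSRS, list);
	free(list);

From then on, every guest rdmsr/wrmsr to those two MSRs exits to
userspace, regardless of whether kvm later learns to handle them
in-kernel.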

> > +
> > +This ioctl needs to be called before vCPUs are setup otherwise the list of MSRs
> > +will not be accepted and an EINVAL error will be returned.  Also, if a list of
> > +MSRs has already been supplied, and this ioctl is called again an EEXIST error
> > +will be returned.
> > +
> > +::
> > +
> > +  struct kvm_msr_list {
> > +  __u32 nmsrs;
> > +  __u32 indices[0];
> > +};
> > +
> >   X86:
> >   ^^^^
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index be5363b21540..510055471dd0 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1004,6 +1004,8 @@ struct kvm_arch {
> >
> >          struct kvm_pmu_event_filter *pmu_event_filter;
> >          struct task_struct *nx_lpage_recovery_thread;
> > +
> > +       struct kvm_msr_list *user_exit_msrs;
> >   };
> >
> >   struct kvm_vm_stat {
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 88c593f83b28..46a0fb9e0869 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -3419,6 +3419,42 @@ static inline bool kvm_can_mwait_in_guest(void)
> >                  boot_cpu_has(X86_FEATURE_ARAT);
> >   }
> >
> > +static int kvm_vm_ioctl_set_exit_msrs(struct kvm *kvm,
> > +                                     struct kvm_msr_list __user *user_msr_list)
> > +{
> > +       struct kvm_msr_list *msr_list, hdr;
> > +       size_t indices_size;
> > +
> > +       if (kvm->arch.user_exit_msrs != NULL)
> > +               return -EEXIST;
> > +
> > +       if (kvm->created_vcpus)
> > +               return -EINVAL;
>
> What if we need to change the set of user space handled MSRs
> dynamically, for example because a feature has been enabled through a
> previous MSR?

It's never been an issue for us, and this approach avoids the
messiness of having to back out the old changes to the MSR permissions
bitmaps, which is fraught. It can be done, but I would question the
ROI on the additional complexity. In any case, I think a VCPU ioctl
would be more appropriate than a VM ioctl if dynamic modifications
were allowed. For instance, in your example, the previous MSR write
that enabled a feature requiring a new user space MSR exit would
likely apply only to the VCPU on which that previous MSR write
happened.
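
(Purely as illustration -- nothing in this series proposes it -- a
dynamic variant would presumably take the same payload as a
hypothetical vCPU-scoped call, issued on the vCPU that triggered the
need:

	/* hypothetical: per-vCPU flavor of KVM_SET_EXIT_MSRS */
	ioctl(vcpu_fd, KVM_SET_EXIT_MSRS, list);

with the kernel re-deriving that vCPU's MSR permission bitmap on the
spot.)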

> > +
> > +       if (copy_from_user(&hdr, user_msr_list, sizeof(hdr)))
> > +               return -EFAULT;
> > +
> > +       if (hdr.nmsrs >= MAX_IO_MSRS)
>
> This constant is not defined inside the scope of this patch. Is your
> patchset fully bisectable? Please make sure that every patch builds
> successfully on its own :).

I can't vouch for this patchset being fully bisectable, but this
particular constant was moved to x86.c ~13 years ago, in commit
313a3dc75da20 ("KVM: Portability: split kvm_vcpu_ioctl"). It should
not cause any build issues. :)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/6] KVM: x86: Add ioctl for accepting a userspace provided MSR list
  2020-08-07 21:16     ` Jim Mattson
@ 2020-08-07 22:49       ` Alexander Graf
  2020-08-08  0:39         ` Jim Mattson
  0 siblings, 1 reply; 20+ messages in thread
From: Alexander Graf @ 2020-08-07 22:49 UTC (permalink / raw)
  To: Jim Mattson; +Cc: Aaron Lewis, Peter Shier, Oliver Upton, kvm list



On 07.08.20 23:16, Jim Mattson wrote:
> 
> On Fri, Aug 7, 2020 at 9:12 AM Alexander Graf <graf@amazon.com> wrote:
>>
>>
>>
>> On 04.08.20 06:20, Aaron Lewis wrote:
>>> Add KVM_SET_EXIT_MSRS ioctl to allow userspace to pass in a list of MSRs
>>> that force an exit to userspace when rdmsr or wrmsr are used by the
>>> guest.
>>>
>>> KVM_SET_EXIT_MSRS will need to be called before any vCPUs are
>>> created to protect the 'user_exit_msrs' list from being mutated while
>>> vCPUs are running.
>>>
>>> Add KVM_CAP_SET_MSR_EXITS to identify the feature exists.
>>>
>>> Signed-off-by: Aaron Lewis <aaronlewis@google.com>
>>> Reviewed-by: Oliver Upton <oupton@google.com>
>>> ---
>>>    Documentation/virt/kvm/api.rst  | 24 +++++++++++++++++++
>>>    arch/x86/include/asm/kvm_host.h |  2 ++
>>>    arch/x86/kvm/x86.c              | 41 +++++++++++++++++++++++++++++++++
>>>    include/uapi/linux/kvm.h        |  2 ++
>>>    4 files changed, 69 insertions(+)
>>>
>>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>>> index 320788f81a05..7d8167c165aa 100644
>>> --- a/Documentation/virt/kvm/api.rst
>>> +++ b/Documentation/virt/kvm/api.rst
>>> @@ -1006,6 +1006,30 @@ such as migration.
>>>    :Parameters: struct kvm_vcpu_event (out)
>>>    :Returns: 0 on success, -1 on error
>>>
>>> +4.32 KVM_SET_EXIT_MSRS
>>> +------------------
>>> +
>>> +:Capability: KVM_CAP_SET_MSR_EXITS
>>> +:Architectures: x86
>>> +:Type: vm ioctl
>>> +:Parameters: struct kvm_msr_list (in)
>>> +:Returns: 0 on success, -1 on error
>>> +
>>> +Sets the userspace MSR list which is used to track which MSRs KVM should send
>>> +to userspace to be serviced when the guest executes rdmsr or wrmsr.
>>
>> Unfortunately this doesn't solve the whole "ignore_msrs" mess that we
>> have today. How can I just say "tell user space about unhandled MSRs"?
>> And if I add that on top of this mechanism, how do we not make the list
>> of MSRs that are handled in-kernel an ABI?
> 
> Jumping in for Aaron, who is out this afternoon...

Awesome, thanks for the super quick reply!

> 
> This patch doesn't attempt to solve your problem, "tell user space
> about unhandled MSRs." It attempts to solve our problem, "tell user
> space about a specific set of MSRs, even if kvm learns to handle them
> in the future." This is, in fact, what we really wanted to do when
> Peter Hornyack implemented the "tell user space about unhandled MSRs"
> changes in 2015. We just didn't realize it at the time.

Ok, let's take a step back then. What exactly are you trying to solve 
and which MSRs do you care about?

My main problem with a deny list approach is that it's (practically)
impossible to map the allow list use case onto it. It is, however,
trivial to explicitly deny a few select MSRs with an allow list
approach. I don't want to introduce yet another ABI to KVM in 2 years
and then have both :).

> Though your proposal does partially solve our problem, it's a lot
> easier to specify a small set of MSRs by inclusion than it is by
> exclusion. (Where your proposal falls short of our needs is when
> userspace wants to handle an MSR that kvm would not typically
> intercept at all.)

The only MSRs that KVM does not intercept are the ones explicitly
specified as pass-through.

> 
>>> +
>>> +This ioctl needs to be called before vCPUs are setup otherwise the list of MSRs
>>> +will not be accepted and an EINVAL error will be returned.  Also, if a list of
>>> +MSRs has already been supplied, and this ioctl is called again an EEXIST error
>>> +will be returned.
>>> +
>>> +::
>>> +
>>> +  struct kvm_msr_list {
>>> +  __u32 nmsrs;
>>> +  __u32 indices[0];
>>> +};
>>> +
>>>    X86:
>>>    ^^^^
>>>
>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>> index be5363b21540..510055471dd0 100644
>>> --- a/arch/x86/include/asm/kvm_host.h
>>> +++ b/arch/x86/include/asm/kvm_host.h
>>> @@ -1004,6 +1004,8 @@ struct kvm_arch {
>>>
>>>           struct kvm_pmu_event_filter *pmu_event_filter;
>>>           struct task_struct *nx_lpage_recovery_thread;
>>> +
>>> +       struct kvm_msr_list *user_exit_msrs;
>>>    };
>>>
>>>    struct kvm_vm_stat {
>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>> index 88c593f83b28..46a0fb9e0869 100644
>>> --- a/arch/x86/kvm/x86.c
>>> +++ b/arch/x86/kvm/x86.c
>>> @@ -3419,6 +3419,42 @@ static inline bool kvm_can_mwait_in_guest(void)
>>>                   boot_cpu_has(X86_FEATURE_ARAT);
>>>    }
>>>
>>> +static int kvm_vm_ioctl_set_exit_msrs(struct kvm *kvm,
>>> +                                     struct kvm_msr_list __user *user_msr_list)
>>> +{
>>> +       struct kvm_msr_list *msr_list, hdr;
>>> +       size_t indices_size;
>>> +
>>> +       if (kvm->arch.user_exit_msrs != NULL)
>>> +               return -EEXIST;
>>> +
>>> +       if (kvm->created_vcpus)
>>> +               return -EINVAL;
>>
>> What if we need to change the set of user space handled MSRs
>> dynamically, for example because a feature has been enabled through a
>> previous MSR?
> 
> It's never been an issue for us, and this approach avoids the
> messiness of having to back out the old changes to the MSR permissions
> bitmaps, which is fraught. It can be done, but I would question the
> ROI on the additional complexity. In any case, I think a VCPU ioctl
> would be more appropriate than a VM ioctl if dynamic modifications
> were allowed. For instance, in your example, the previous MSR write
> that enabled a feature requiring a new user space MSR exit would
> likely apply only to the VCPU on which that previous MSR write
> happened.

My general rule of thumb with PPC has always been to make as much as
possible vCPU-specific. X86 KVM is a bit different there - and I guess
it does help with memory consumption :).

> 
>>> +
>>> +       if (copy_from_user(&hdr, user_msr_list, sizeof(hdr)))
>>> +               return -EFAULT;
>>> +
>>> +       if (hdr.nmsrs >= MAX_IO_MSRS)
>>
>> This constant is not defined inside the scope of this patch. Is your
>> patchset fully bisectable? Please make sure that every patch builds
>> successfully on its own :).
> 
> I can't vouch for this patchset being fully bisectable, but this
> particular constant was moved to x86.c ~13 years ago, in commit
> 313a3dc75da20 ("KVM: Portability: split kvm_vcpu_ioctl"). It should
> not cause any build issues. :)

Ugh, sorry, my bad :).


Alex



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/6] KVM: x86: Ensure the MSR bitmap never clears userspace tracked MSRs
  2020-08-04  4:20 ` [PATCH 4/6] KVM: x86: Ensure the MSR bitmap never clears " Aaron Lewis
@ 2020-08-07 23:03   ` Alexander Graf
  2020-08-18 21:34     ` Aaron Lewis
  0 siblings, 1 reply; 20+ messages in thread
From: Alexander Graf @ 2020-08-07 23:03 UTC (permalink / raw)
  To: Aaron Lewis, jmattson; +Cc: pshier, oupton, kvm



On 04.08.20 06:20, Aaron Lewis wrote:
> 
> SDM volume 3: 24.6.9 "MSR-Bitmap Address" and APM volume 2: 15.11 "MSR
> intercepts" describe MSR permission bitmaps.  Permission bitmaps are
> used to control whether an execution of rdmsr or wrmsr will cause a
> vm exit.  For userspace-tracked MSRs it is required that they cause a
> vm exit, so the host is able to forward the MSR to userspace.  This
> change adds vmx/svm support to ensure the permission bitmap is properly
> set to cause a vm exit to the host when rdmsr or wrmsr is used by one
> of the userspace-tracked MSRs.  Also, to avoid repeatedly setting them,
> kvm_make_request() is used to coalesce these into a single call.

I might have some fundamental misunderstanding here:

1) You force that the list of trapped MSRs is set before vCPU creation, 
so at the point of vCPU creation, you already know which MSRs should
trap to user space.

2) MSR intercept bitmaps are (AFAIK) all set to "trap by default". That 
means any MSR that we want the guest to get direct access to needs 
explicit opt-in through bitmap modifications.

That means if you simply check whether an MSR is supposed to trap to 
user space in the bitmap set path, you don't need any of the complicated 
logic below, no?
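
Concretely, I'd picture something like this rough sketch, reusing
kvm_msr_user_exit() from earlier in the series (the exact placement in
the SVM path is hypothetical):

static void set_msr_interception(struct kvm_vcpu *vcpu, u32 msr,
				 int read, int write)
{
	/*
	 * Never open a direct-access hole for an MSR that user space
	 * asked to handle; keep rdmsr/wrmsr intercepted unconditionally.
	 */
	if (kvm_msr_user_exit(vcpu->kvm, msr))
		read = write = 0;

	/* ... existing permission bitmap update, unchanged ... */
}

That way the intercept is never cleared in the first place, and the
whole request/coalescing machinery goes away.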


Alex

> 
> Signed-off-by: Aaron Lewis <aaronlewis@google.com>
> Reviewed-by: Oliver Upton <oupton@google.com>
> ---
>   arch/x86/include/asm/kvm_host.h |  3 ++
>   arch/x86/kvm/svm/svm.c          | 49 ++++++++++++++++++++++++++-------
>   arch/x86/kvm/vmx/vmx.c          | 13 ++++++++-
>   arch/x86/kvm/x86.c              | 16 +++++++++++
>   4 files changed, 70 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 510055471dd0..07a85f5f0b8a 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -87,6 +87,7 @@
>   #define KVM_REQ_HV_TLB_FLUSH \
>          KVM_ARCH_REQ_FLAGS(27, KVM_REQUEST_NO_WAKEUP)
>   #define KVM_REQ_APF_READY              KVM_ARCH_REQ(28)
> +#define KVM_REQ_USER_MSR_UPDATE KVM_ARCH_REQ(29)
> 
>   #define CR0_RESERVED_BITS                                               \
>          (~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
> @@ -1271,6 +1272,8 @@ struct kvm_x86_ops {
>          int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
> 
>          void (*migrate_timers)(struct kvm_vcpu *vcpu);
> +
> +       void (*set_user_msr_intercept)(struct kvm_vcpu *vcpu, u32 msr);
>   };
> 
>   struct kvm_x86_nested_ops {
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index eb673b59f7b7..c560d283b2af 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -583,13 +583,27 @@ static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr)
>          return !!test_bit(bit_write,  &tmp);
>   }
> 
> +static void __set_msr_interception(u32 *msrpm, u32 msr, int read, int write,
> +                                  u32 offset)
> +{
> +       u8 bit_read, bit_write;
> +       unsigned long tmp;
> +
> +       bit_read  = 2 * (msr & 0x0f);
> +       bit_write = 2 * (msr & 0x0f) + 1;
> +       tmp       = msrpm[offset];
> +
> +       read  ? clear_bit(bit_read,  &tmp) : set_bit(bit_read,  &tmp);
> +       write ? clear_bit(bit_write, &tmp) : set_bit(bit_write, &tmp);
> +
> +       msrpm[offset] = tmp;
> +}
> +
>   static void set_msr_interception(struct kvm_vcpu *vcpu, u32 msr, int read,
>                                   int write)
>   {
>          struct vcpu_svm *svm = to_svm(vcpu);
>          u32 *msrpm = svm->msrpm;
> -       u8 bit_read, bit_write;
> -       unsigned long tmp;
>          u32 offset;
> 
>          /*
> @@ -598,17 +612,30 @@ static void set_msr_interception(struct kvm_vcpu *vcpu, u32 msr, int read,
>           */
>          WARN_ON(!valid_msr_intercept(msr));
> 
> -       offset    = svm_msrpm_offset(msr);
> -       bit_read  = 2 * (msr & 0x0f);
> -       bit_write = 2 * (msr & 0x0f) + 1;
> -       tmp       = msrpm[offset];
> -
> +       offset = svm_msrpm_offset(msr);
>          BUG_ON(offset == MSR_INVALID);
> 
> -       read  ? clear_bit(bit_read,  &tmp) : set_bit(bit_read,  &tmp);
> -       write ? clear_bit(bit_write, &tmp) : set_bit(bit_write, &tmp);
> +       __set_msr_interception(msrpm, msr, read, write, offset);
> 
> -       msrpm[offset] = tmp;
> +       if (read || write)
> +               kvm_make_request(KVM_REQ_USER_MSR_UPDATE, vcpu);
> +}
> +
> +static void set_user_msr_interception(struct kvm_vcpu *vcpu, u32 msr, int read,
> +                                     int write)
> +{
> +       struct vcpu_svm *svm = to_svm(vcpu);
> +       u32 *msrpm = svm->msrpm;
> +       u32 offset;
> +
> +       offset = svm_msrpm_offset(msr);
> +       if (offset != MSR_INVALID)
> +               __set_msr_interception(msrpm, msr, read, write, offset);
> +}
> +
> +void svm_set_user_msr_intercept(struct kvm_vcpu *vcpu, u32 msr)
> +{
> +       set_user_msr_interception(vcpu, msr, 0, 0);
>   }
> 
>   static void svm_vcpu_init_msrpm(struct kvm_vcpu *vcpu, u32 *msrpm)
> @@ -4088,6 +4115,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
>          .need_emulation_on_page_fault = svm_need_emulation_on_page_fault,
> 
>          .apic_init_signal_blocked = svm_apic_init_signal_blocked,
> +
> +       .set_user_msr_intercept = svm_set_user_msr_intercept,
>   };
> 
>   static struct kvm_x86_init_ops svm_init_ops __initdata = {
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 1313e47a5a1e..3d3d9eaeca64 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -3728,6 +3728,10 @@ static __always_inline void vmx_disable_intercept_for_msr(struct kvm_vcpu *vcpu,
>                          __clear_bit(msr, msr_bitmap + 0xc00 / f);
> 
>          }
> +
> +       if (type & MSR_TYPE_R || type & MSR_TYPE_W) {
> +               kvm_make_request(KVM_REQ_USER_MSR_UPDATE, vcpu);
> +       }
>   }
> 
>   static __always_inline void vmx_enable_intercept_for_msr(struct kvm_vcpu *vcpu,
> @@ -3795,7 +3799,7 @@ static u8 vmx_msr_bitmap_mode(struct kvm_vcpu *vcpu)
>   }
> 
>   static void vmx_update_msr_bitmap_x2apic(struct kvm_vcpu *vcpu,
> -                                        unsigned long *msr_bitmap, u8 mode)
> +                                       unsigned long *msr_bitmap, u8 mode)
>   {
>          int msr;
> 
> @@ -3819,6 +3823,11 @@ static void vmx_update_msr_bitmap_x2apic(struct kvm_vcpu *vcpu,
>          }
>   }
> 
> +void vmx_set_user_msr_intercept(struct kvm_vcpu *vcpu, u32 msr)
> +{
> +       vmx_enable_intercept_for_msr(vcpu, msr, MSR_TYPE_RW);
> +}
> +
>   void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu)
>   {
>          struct vcpu_vmx *vmx = to_vmx(vcpu);
> @@ -7965,6 +7974,8 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
>          .need_emulation_on_page_fault = vmx_need_emulation_on_page_fault,
>          .apic_init_signal_blocked = vmx_apic_init_signal_blocked,
>          .migrate_timers = vmx_migrate_timers,
> +
> +       .set_user_msr_intercept = vmx_set_user_msr_intercept,
>   };
> 
>   static __init int hardware_setup(void)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 47619b49818a..45bf59f94d34 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3537,6 +3537,19 @@ bool kvm_msr_user_exit(struct kvm *kvm, u32 index)
>   }
>   EXPORT_SYMBOL_GPL(kvm_msr_user_exit);
> 
> +static void kvm_set_user_msr_intercepts(struct kvm_vcpu *vcpu)
> +{
> +       struct kvm_msr_list *msr_list = vcpu->kvm->arch.user_exit_msrs;
> +       u32 i, msr;
> +
> +       if (msr_list) {
> +               for (i = 0; i < msr_list->nmsrs; i++) {
> +                       msr = msr_list->indices[i];
> +                       kvm_x86_ops.set_user_msr_intercept(vcpu, msr);
> +               }
> +       }
> +}
> +
>   int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>   {
>          int r = 0;
> @@ -8553,6 +8566,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>                          kvm_vcpu_update_apicv(vcpu);
>                  if (kvm_check_request(KVM_REQ_APF_READY, vcpu))
>                          kvm_check_async_pf_completion(vcpu);
> +
> +               if (kvm_check_request(KVM_REQ_USER_MSR_UPDATE, vcpu))
> +                       kvm_set_user_msr_intercepts(vcpu);
>          }
> 
>          if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
> --
> 2.28.0.163.g6104cc2f0b6-goog
> 



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 6/6] selftests: kvm: Add test to exercise userspace MSR list
  2020-08-04  4:20 ` [PATCH 6/6] selftests: kvm: Add test to exercise userspace MSR list Aaron Lewis
@ 2020-08-07 23:28   ` Alexander Graf
  2020-08-10 17:24     ` Jim Mattson
  2020-08-18 21:26     ` Aaron Lewis
  2020-08-10  9:53   ` Andrew Jones
  1 sibling, 2 replies; 20+ messages in thread
From: Alexander Graf @ 2020-08-07 23:28 UTC (permalink / raw)
  To: Aaron Lewis, jmattson; +Cc: pshier, oupton, kvm



On 04.08.20 06:20, Aaron Lewis wrote:
> 
> Add a selftest to test that when ioctl KVM_SET_EXIT_MSRS is called with
> an MSR list the guest exits to the host and then to userspace when an
> MSR in that list is read from or written to.
> 
> This test uses 3 MSRs to test these new features:
>    1. MSR_IA32_XSS, an MSR the kernel knows about.
>    2. MSR_IA32_FLUSH_CMD, an MSR the kernel does not know about.
>    3. MSR_NON_EXISTENT, an MSR invented in this test for the purposes of
>       passing a fake MSR from the guest to userspace and having the guest
>       be able to read from and write to it, with userspace handling it.
>       KVM just acts as a pass through.
> 
> Userspace is also able to inject a #GP.  This is demonstrated when
> MSR_IA32_XSS and MSR_IA32_FLUSH_CMD are misused in the test.  When this
> happens a #GP is initiated in userspace to be thrown in the guest.  To
> be able to handle this, exception handling was added to selftests, and a
> #GP exception handler is registered in the test.  This allows the test
> to inject a #GP and be able to handle it gracefully.
> 
> Signed-off-by: Aaron Lewis <aaronlewis@google.com>

This is so much cooler than my excuse of a selftest :). I do think it
would make for easier review (and reuse) if you split the interrupt
handling logic from the actual user space MSR bits, though.

> ---
>   tools/testing/selftests/kvm/.gitignore        |   1 +
>   tools/testing/selftests/kvm/Makefile          |  20 +-
>   .../selftests/kvm/include/x86_64/processor.h  |  27 ++
>   tools/testing/selftests/kvm/lib/kvm_util.c    |  17 ++
>   .../selftests/kvm/lib/kvm_util_internal.h     |   2 +
>   .../selftests/kvm/lib/x86_64/handlers.S       |  83 ++++++
>   .../selftests/kvm/lib/x86_64/processor.c      | 168 ++++++++++-
>   .../testing/selftests/kvm/lib/x86_64/ucall.c  |   3 +
>   .../selftests/kvm/x86_64/userspace_msr_exit.c | 271 ++++++++++++++++++
>   9 files changed, 582 insertions(+), 10 deletions(-)
>   create mode 100644 tools/testing/selftests/kvm/lib/x86_64/handlers.S
>   create mode 100644 tools/testing/selftests/kvm/x86_64/userspace_msr_exit.c
> 
> diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
> index 452787152748..33619f915857 100644
> --- a/tools/testing/selftests/kvm/.gitignore
> +++ b/tools/testing/selftests/kvm/.gitignore
> @@ -14,6 +14,7 @@
>   /x86_64/vmx_preemption_timer_test
>   /x86_64/svm_vmcall_test
>   /x86_64/sync_regs_test
> +/x86_64/userspace_msr_exit
>   /x86_64/vmx_close_while_nested_test
>   /x86_64/vmx_dirty_log_test
>   /x86_64/vmx_set_nested_state_test
> diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
> index 4a166588d99f..66a6652ca305 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -34,7 +34,7 @@ ifeq ($(ARCH),s390)
>   endif
> 
>   LIBKVM = lib/assert.c lib/elf.c lib/io.c lib/kvm_util.c lib/sparsebit.c lib/test_util.c
> -LIBKVM_x86_64 = lib/x86_64/processor.c lib/x86_64/vmx.c lib/x86_64/svm.c lib/x86_64/ucall.c
> +LIBKVM_x86_64 = lib/x86_64/processor.c lib/x86_64/vmx.c lib/x86_64/svm.c lib/x86_64/ucall.c lib/x86_64/handlers.S
>   LIBKVM_aarch64 = lib/aarch64/processor.c lib/aarch64/ucall.c
>   LIBKVM_s390x = lib/s390x/processor.c lib/s390x/ucall.c
> 
> @@ -49,6 +49,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/state_test
>   TEST_GEN_PROGS_x86_64 += x86_64/vmx_preemption_timer_test
>   TEST_GEN_PROGS_x86_64 += x86_64/svm_vmcall_test
>   TEST_GEN_PROGS_x86_64 += x86_64/sync_regs_test
> +TEST_GEN_PROGS_x86_64 += x86_64/userspace_msr_exit
>   TEST_GEN_PROGS_x86_64 += x86_64/vmx_close_while_nested_test
>   TEST_GEN_PROGS_x86_64 += x86_64/vmx_dirty_log_test
>   TEST_GEN_PROGS_x86_64 += x86_64/vmx_set_nested_state_test
> @@ -108,14 +109,21 @@ LDFLAGS += -pthread $(no-pie-option) $(pgste-option)
>   include ../lib.mk
> 
>   STATIC_LIBS := $(OUTPUT)/libkvm.a
> -LIBKVM_OBJ := $(patsubst %.c, $(OUTPUT)/%.o, $(LIBKVM))
> -EXTRA_CLEAN += $(LIBKVM_OBJ) $(STATIC_LIBS) cscope.*
> +LIBKVM_C := $(filter %.c,$(LIBKVM))
> +LIBKVM_S := $(filter %.S,$(LIBKVM))
> +LIBKVM_C_OBJ := $(patsubst %.c, $(OUTPUT)/%.o, $(LIBKVM_C))
> +LIBKVM_S_OBJ := $(patsubst %.S, $(OUTPUT)/%.o, $(LIBKVM_S))
> +EXTRA_CLEAN += $(LIBKVM_C_OBJ) $(LIBKVM_S_OBJ) $(STATIC_LIBS) cscope.*
> +
> +x := $(shell mkdir -p $(sort $(dir $(LIBKVM_C_OBJ) $(LIBKVM_S_OBJ))))
> +$(LIBKVM_C_OBJ): $(OUTPUT)/%.o: %.c
> +       $(CC) $(CFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c $< -o $@
> 
> -x := $(shell mkdir -p $(sort $(dir $(LIBKVM_OBJ))))
> -$(LIBKVM_OBJ): $(OUTPUT)/%.o: %.c
> +$(LIBKVM_S_OBJ): $(OUTPUT)/%.o: %.S
>          $(CC) $(CFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c $< -o $@
> 
> -$(OUTPUT)/libkvm.a: $(LIBKVM_OBJ)
> +LIBKVM_OBJS = $(LIBKVM_C_OBJ) $(LIBKVM_S_OBJ)
> +$(OUTPUT)/libkvm.a: $(LIBKVM_OBJS)
>          $(AR) crs $@ $^
> 
>   x := $(shell mkdir -p $(sort $(dir $(TEST_GEN_PROGS))))
> diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
> index 0a65e7bb5249..a4de429eb408 100644
> --- a/tools/testing/selftests/kvm/include/x86_64/processor.h
> +++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
> @@ -36,6 +36,8 @@
>   #define X86_CR4_SMAP           (1ul << 21)
>   #define X86_CR4_PKE            (1ul << 22)
> 
> +#define UNEXPECTED_VECTOR_PORT 0xfff0u
> +
>   /* General Registers in 64-Bit Mode */
>   struct gpr64_regs {
>          u64 rax;
> @@ -239,6 +241,11 @@ static inline struct desc_ptr get_idt(void)
>          return idt;
>   }
> 
> +static inline void outl(uint16_t port, uint32_t value)
> +{
> +       __asm__ __volatile__("outl %%eax, %%dx" : : "d"(port), "a"(value));
> +}
> +
>   #define SET_XMM(__var, __xmm) \
>          asm volatile("movq %0, %%"#__xmm : : "r"(__var) : #__xmm)
> 
> @@ -334,10 +341,30 @@ int _vcpu_set_msr(struct kvm_vm *vm, uint32_t vcpuid, uint64_t msr_index,
>   void vcpu_set_msr(struct kvm_vm *vm, uint32_t vcpuid, uint64_t msr_index,
>                    uint64_t msr_value);
> 
> +void kvm_set_exit_msrs(struct kvm_vm *vm, uint32_t nmsrs,
> +       uint32_t msr_indices[]);
> +
>   uint32_t kvm_get_cpuid_max_basic(void);
>   uint32_t kvm_get_cpuid_max_extended(void);
>   void kvm_get_cpu_address_width(unsigned int *pa_bits, unsigned int *va_bits);
> 
> +struct ex_regs {
> +       uint64_t rax, rcx, rdx, rbx;
> +       uint64_t dummy, rbp, rsi, rdi;

Why the dummy?

> +       uint64_t r8, r9, r10, r11;
> +       uint64_t r12, r13, r14, r15;
> +       uint64_t vector;
> +       uint64_t error_code;
> +       uint64_t rip;
> +       uint64_t cs;
> +       uint64_t rflags;
> +};
> +
> +void vm_init_descriptor_tables(struct kvm_vm *vm);
> +void vcpu_init_descriptor_tables(struct kvm_vm *vm, uint32_t vcpuid);
> +void vm_handle_exception(struct kvm_vm *vm, int vector,
> +                       void (*handler)(struct ex_regs *));
> +
>   /*
>    * Basic CPU control in CR0
>    */
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
> index 74776ee228f2..f8dde1cdbef0 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -1195,6 +1195,21 @@ int _vcpu_run(struct kvm_vm *vm, uint32_t vcpuid)
>          do {
>                  rc = ioctl(vcpu->fd, KVM_RUN, NULL);
>          } while (rc == -1 && errno == EINTR);
> +
> +#ifdef __x86_64__
> +       if (vcpu_state(vm, vcpuid)->exit_reason == KVM_EXIT_IO

This does feel a bit out of place here, no? Is there any particular 
reason not to handle it similarly to the UCALL logic? In fact, would it
hurt to just declare it as a new ucall? You do have a working stack 
after all ...
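
As a rough sketch of what I mean, reusing the existing UCALL_ABORT
plumbing (naming purely illustrative):

void kvm_exit_unexpected_vector(uint32_t value)
{
	ucall(UCALL_ABORT, 2, "Unexpected vectored event in guest", value);
}

Then _vcpu_run() doesn't have to know about magic ports at all, and the
test's normal get_ucall() handling reports the failure.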

> +               && vcpu_state(vm, vcpuid)->io.port == UNEXPECTED_VECTOR_PORT
> +               && vcpu_state(vm, vcpuid)->io.size == 4) {
> +               /* Grab pointer to io data */
> +               uint32_t *data = (void *)vcpu_state(vm, vcpuid)
> +                       + vcpu_state(vm, vcpuid)->io.data_offset;
> +
> +               TEST_ASSERT(false,
> +                           "Unexpected vectored event in guest (vector:0x%x)",
> +                           *data);
> +       }
> +#endif
> +
>          return rc;
>   }
> 
> @@ -1590,6 +1605,8 @@ static struct exit_reason {
>          {KVM_EXIT_INTERNAL_ERROR, "INTERNAL_ERROR"},
>          {KVM_EXIT_OSI, "OSI"},
>          {KVM_EXIT_PAPR_HCALL, "PAPR_HCALL"},
> +       {KVM_EXIT_X86_RDMSR, "RDMSR"},
> +       {KVM_EXIT_X86_WRMSR, "WRMSR"},
>   #ifdef KVM_EXIT_MEMORY_NOT_PRESENT
>          {KVM_EXIT_MEMORY_NOT_PRESENT, "MEMORY_NOT_PRESENT"},
>   #endif
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util_internal.h b/tools/testing/selftests/kvm/lib/kvm_util_internal.h
> index 2ef446520748..f07d383d03a1 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util_internal.h
> +++ b/tools/testing/selftests/kvm/lib/kvm_util_internal.h
> @@ -50,6 +50,8 @@ struct kvm_vm {
>          vm_paddr_t pgd;
>          vm_vaddr_t gdt;
>          vm_vaddr_t tss;
> +       vm_vaddr_t idt;
> +       vm_vaddr_t handlers;
>   };
> 
>   struct vcpu *vcpu_find(struct kvm_vm *vm, uint32_t vcpuid);
> diff --git a/tools/testing/selftests/kvm/lib/x86_64/handlers.S b/tools/testing/selftests/kvm/lib/x86_64/handlers.S
> new file mode 100644
> index 000000000000..783d2c0fc741
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/lib/x86_64/handlers.S
> @@ -0,0 +1,83 @@
> +handle_exception:
> +       push %r15
> +       push %r14
> +       push %r13
> +       push %r12
> +       push %r11
> +       push %r10
> +       push %r9
> +       push %r8
> +
> +       push %rdi
> +       push %rsi
> +       push %rbp
> +       sub $8, %rsp
> +       push %rbx
> +       push %rdx
> +       push %rcx
> +       push %rax
> +       mov %rsp, %rdi
> +
> +       call route_exception
> +
> +       pop %rax
> +       pop %rcx
> +       pop %rdx
> +       pop %rbx
> +       add $8, %rsp
> +       pop %rbp
> +       pop %rsi
> +       pop %rdi
> +       pop %r8
> +       pop %r9
> +       pop %r10
> +       pop %r11
> +       pop %r12
> +       pop %r13
> +       pop %r14
> +       pop %r15
> +
> +       /* Discard vector and error code. */
> +       add $16, %rsp
> +       iretq
> +
> +/*
> + * Build the handle_exception wrappers which push the vector/error code on the
> + * stack and an array of pointers to those wrappers.
> + */
> +.pushsection .rodata
> +.globl idt_handlers
> +idt_handlers:
> +.popsection
> +
> +.macro HANDLERS has_error from to
> +       vector = \from
> +       .rept \to - \from + 1
> +       .align 8
> +
> +       /* Fetch current address and append it to idt_handlers. */
> +       current_handler = .
> +.pushsection .rodata
> +.quad current_handler
> +.popsection
> +
> +       .if ! \has_error
> +       pushq $0
> +       .endif
> +       pushq $vector
> +       jmp handle_exception
> +       vector = vector + 1
> +       .endr
> +.endm
> +
> +.global idt_handler_code
> +idt_handler_code:
> +       HANDLERS has_error=0 from=0  to=7
> +       HANDLERS has_error=1 from=8  to=8
> +       HANDLERS has_error=0 from=9  to=9
> +       HANDLERS has_error=1 from=10 to=14
> +       HANDLERS has_error=0 from=15 to=16
> +       HANDLERS has_error=1 from=17 to=17
> +       HANDLERS has_error=0 from=18 to=255
> +
> +.section        .note.GNU-stack, "", %progbits
> diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> index f6eb34eaa0d2..ff56753f205f 100644
> --- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
> +++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> @@ -12,6 +12,13 @@
>   #include "../kvm_util_internal.h"
>   #include "processor.h"
> 
> +#ifndef NUM_INTERRUPTS
> +#define NUM_INTERRUPTS 256
> +#endif
> +
> +#define DEFAULT_CODE_SELECTOR 0x8
> +#define DEFAULT_DATA_SELECTOR 0x10
> +
>   /* Minimum physical address used for virtual translation tables. */
>   #define KVM_GUEST_PAGE_TABLE_MIN_PADDR 0x180000
> 
> @@ -392,11 +399,12 @@ static void kvm_seg_fill_gdt_64bit(struct kvm_vm *vm, struct kvm_segment *segp)
>          desc->limit0 = segp->limit & 0xFFFF;
>          desc->base0 = segp->base & 0xFFFF;
>          desc->base1 = segp->base >> 16;
> -       desc->s = segp->s;
>          desc->type = segp->type;
> +       desc->s = segp->s;

This probably should go into the previous patch.

>          desc->dpl = segp->dpl;
>          desc->p = segp->present;
>          desc->limit1 = segp->limit >> 16;
> +       desc->avl = segp->avl;
>          desc->l = segp->l;
>          desc->db = segp->db;
>          desc->g = segp->g;
> @@ -556,9 +564,9 @@ static void vcpu_setup(struct kvm_vm *vm, int vcpuid, int pgd_memslot, int gdt_m
>                  sregs.efer |= (EFER_LME | EFER_LMA | EFER_NX);
> 
>                  kvm_seg_set_unusable(&sregs.ldt);
> -               kvm_seg_set_kernel_code_64bit(vm, 0x8, &sregs.cs);
> -               kvm_seg_set_kernel_data_64bit(vm, 0x10, &sregs.ds);
> -               kvm_seg_set_kernel_data_64bit(vm, 0x10, &sregs.es);
> +               kvm_seg_set_kernel_code_64bit(vm, DEFAULT_CODE_SELECTOR, &sregs.cs);
> +               kvm_seg_set_kernel_data_64bit(vm, DEFAULT_DATA_SELECTOR, &sregs.ds);
> +               kvm_seg_set_kernel_data_64bit(vm, DEFAULT_DATA_SELECTOR, &sregs.es);
>                  kvm_setup_tss_64bit(vm, &sregs.tr, 0x18, gdt_memslot, pgd_memslot);
>                  break;
> 
> @@ -843,6 +851,71 @@ void vcpu_set_msr(struct kvm_vm *vm, uint32_t vcpuid, uint64_t msr_index,
>                  "  rc: %i errno: %i", r, errno);
>   }
> 
> +/*
> + * _KVM Set Exit MSR
> + *
> + * Input Args:
> + *   vm - Virtual Machine
> + *   nmsrs - Number of msrs in msr_indices
> + *   msr_indices[] - List of msrs.
> + *
> + * Output Args: None
> + *
> + * Return: The result of KVM_SET_EXIT_MSRS.
> + *
> + * Sets a list of MSRs that will force an exit to userspace when
> + * any of them are read from or written to by the guest.
> + */
> +int _kvm_set_exit_msrs(struct kvm_vm *vm, uint32_t nmsrs,
> +       uint32_t msr_indices[])
> +{
> +       const uint32_t max_nmsrs = 256;
> +       struct kvm_msr_list *msr_list;
> +       uint32_t i;
> +       int r;
> +
> +       TEST_ASSERT(nmsrs <= max_nmsrs,
> +               "'nmsrs' is too large.  Max is %u, currently %u.\n",
> +               max_nmsrs, nmsrs);
> +       uint32_t msr_list_byte_size = sizeof(struct kvm_msr_list) +
> +                                                            (sizeof(msr_list->indices[0]) * nmsrs);
> +       msr_list = alloca(msr_list_byte_size);
> +       memset(msr_list, 0, msr_list_byte_size);
> +
> +       msr_list->nmsrs = nmsrs;
> +       for (i = 0; i < nmsrs; i++)
> +               msr_list->indices[i] = msr_indices[i];
> +
> +       r = ioctl(vm->fd, KVM_SET_EXIT_MSRS, msr_list);
> +
> +       return r;
> +}
> +
> +/*
> + * KVM Set Exit MSR
> + *
> + * Input Args:
> + *   vm - Virtual Machine
> + *   nmsrs - Number of msrs in msr_indices
> + *   msr_indices[] - List of msrs.
> + *
> + * Output Args: None
> + *
> + * Return: None
> + *
> + * Sets a list of MSRs that will force an exit to userspace when
> + * any of them are read from or written to by the guest.
> + */
> +void kvm_set_exit_msrs(struct kvm_vm *vm, uint32_t nmsrs,
> +       uint32_t msr_indices[])
> +{
> +       int r;
> +
> +       r = _kvm_set_exit_msrs(vm, nmsrs, msr_indices);
> +       TEST_ASSERT(r == 0, "KVM_SET_EXIT_MSRS IOCTL failed,\n"
> +               "  rc: %i errno: %i", r, errno);
> +}
> +
>   void vcpu_args_set(struct kvm_vm *vm, uint32_t vcpuid, unsigned int num, ...)
>   {
>          va_list ap;
> @@ -1118,3 +1191,90 @@ void kvm_get_cpu_address_width(unsigned int *pa_bits, unsigned int *va_bits)
>                  *va_bits = (entry->eax >> 8) & 0xff;
>          }
>   }
> +
> +struct idt_entry {
> +       uint16_t offset0;
> +       uint16_t selector;
> +       uint16_t ist : 3;
> +       uint16_t : 5;
> +       uint16_t type : 4;
> +       uint16_t : 1;
> +       uint16_t dpl : 2;
> +       uint16_t p : 1;
> +       uint16_t offset1;
> +       uint32_t offset2; uint32_t reserved;
> +};
> +
> +static void set_idt_entry(struct kvm_vm *vm, int vector, unsigned long addr,
> +                         int dpl, unsigned short selector)
> +{
> +       struct idt_entry *base =
> +               (struct idt_entry *)addr_gva2hva(vm, vm->idt);
> +       struct idt_entry *e = &base[vector];
> +
> +       memset(e, 0, sizeof(*e));
> +       e->offset0 = addr;
> +       e->selector = selector;
> +       e->ist = 0;
> +       e->type = 14;
> +       e->dpl = dpl;
> +       e->p = 1;
> +       e->offset1 = addr >> 16;
> +       e->offset2 = addr >> 32;
> +}
> +
> +void kvm_exit_unexpected_vector(uint32_t value)
> +{
> +       outl(UNEXPECTED_VECTOR_PORT, value);
> +}
> +
> +void route_exception(struct ex_regs *regs)
> +{
> +       typedef void(*handler)(struct ex_regs *);
> +       handler *handlers;
> +
> +       handlers = (handler *)rdmsr(MSR_GS_BASE);
> +
> +       if (handlers[regs->vector]) {
> +               handlers[regs->vector](regs);
> +               return;
> +       }
> +
> +       kvm_exit_unexpected_vector(regs->vector);
> +}
> +
> +void vm_init_descriptor_tables(struct kvm_vm *vm)
> +{
> +       extern void *idt_handlers;
> +       int i;
> +
> +       vm->idt = vm_vaddr_alloc(vm, getpagesize(), 0x2000, 0, 0);
> +       vm->handlers = vm_vaddr_alloc(vm, 256 * sizeof(void *), 0x2000, 0, 0);
> +       /* Handlers have the same address in both address spaces.*/
> +       for (i = 0; i < NUM_INTERRUPTS; i++)
> +               set_idt_entry(vm, i, (unsigned long)(&idt_handlers)[i], 0,
> +                       DEFAULT_CODE_SELECTOR);
> +}
> +
> +void vcpu_init_descriptor_tables(struct kvm_vm *vm, uint32_t vcpuid)
> +{
> +       struct kvm_sregs sregs;
> +
> +       vcpu_sregs_get(vm, vcpuid, &sregs);
> +       sregs.idt.base = vm->idt;
> +       sregs.idt.limit = NUM_INTERRUPTS * sizeof(struct idt_entry) - 1;
> +       sregs.gdt.base = vm->gdt;
> +       sregs.gdt.limit = getpagesize() - 1;
> +       /* Use GS Base to pass the pointer to the handlers to the guest.*/
> +       kvm_seg_set_kernel_data_64bit(NULL, DEFAULT_DATA_SELECTOR, &sregs.gs);
> +       sregs.gs.base = (unsigned long) vm->handlers;
> +       vcpu_sregs_set(vm, vcpuid, &sregs);
> +}
> +
> +void vm_handle_exception(struct kvm_vm *vm, int vector,
> +                        void (*handler)(struct ex_regs *))
> +{
> +       vm_vaddr_t *handlers = (vm_vaddr_t *)addr_gva2hva(vm, vm->handlers);
> +
> +       handlers[vector] = (vm_vaddr_t)handler;
> +}
> diff --git a/tools/testing/selftests/kvm/lib/x86_64/ucall.c b/tools/testing/selftests/kvm/lib/x86_64/ucall.c
> index da4d89ad5419..a3489973e290 100644
> --- a/tools/testing/selftests/kvm/lib/x86_64/ucall.c
> +++ b/tools/testing/selftests/kvm/lib/x86_64/ucall.c
> @@ -40,6 +40,9 @@ uint64_t get_ucall(struct kvm_vm *vm, uint32_t vcpu_id, struct ucall *uc)
>          struct kvm_run *run = vcpu_state(vm, vcpu_id);
>          struct ucall ucall = {};
> 
> +       if (uc)
> +               memset(uc, 0, sizeof(*uc));
> +
>          if (run->exit_reason == KVM_EXIT_IO && run->io.port == UCALL_PIO_PORT) {
>                  struct kvm_regs regs;
> 
> diff --git a/tools/testing/selftests/kvm/x86_64/userspace_msr_exit.c b/tools/testing/selftests/kvm/x86_64/userspace_msr_exit.c
> new file mode 100644
> index 000000000000..6c7868dbce08
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/x86_64/userspace_msr_exit.c
> @@ -0,0 +1,271 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2020, Google LLC.
> + *
> + * Tests for exiting into userspace on registered MSRs
> + */
> +
> +#define _GNU_SOURCE /* for program_invocation_short_name */
> +#include <sys/ioctl.h>
> +
> +#include "test_util.h"
> +#include "kvm_util.h"
> +#include "vmx.h"
> +
> +#define VCPU_ID              1
> +
> +#define MSR_NON_EXISTENT 0x474f4f00
> +
> +uint32_t msrs[] = {
> +       /* Test an MSR the kernel knows about. */
> +       MSR_IA32_XSS,
> +       /* Test an MSR the kernel doesn't know about. */
> +       MSR_IA32_FLUSH_CMD,
> +       /* Test a fabricated MSR that no one knows about. */
> +       MSR_NON_EXISTENT,
> +};
> +uint32_t nmsrs = ARRAY_SIZE(msrs);
> +
> +uint64_t msr_non_existent_data;
> +int guest_exception_count;
> +
> +/*
> + * Note: Force test_rdmsr() to not be inlined to prevent the labels,
> + * rdmsr_start and rdmsr_end, from being defined multiple times.
> + */
> +static noinline uint64_t test_rdmsr(uint32_t msr)
> +{
> +       uint32_t a, d;
> +
> +       guest_exception_count = 0;
> +
> +       __asm__ __volatile__("rdmsr_start: rdmsr; rdmsr_end:" :
> +                       "=a"(a), "=d"(d) : "c"(msr) : "memory");

I personally would find

asm volatile("rdmsr_start:");
r = rdmsr(msr);
asm volatile("rdmsr_end:");

more readable. Same for wrmsr.


The rest of the test looks very clean and nice to read. Kudos!

Alex



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/6] KVM: x86: Add support for exiting to userspace on rdmsr or wrmsr
  2020-08-04  4:20 ` [PATCH 2/6] KVM: x86: Add support for exiting to userspace on rdmsr or wrmsr Aaron Lewis
@ 2020-08-07 23:41   ` Alexander Graf
  0 siblings, 0 replies; 20+ messages in thread
From: Alexander Graf @ 2020-08-07 23:41 UTC (permalink / raw)
  To: Aaron Lewis, jmattson; +Cc: pshier, oupton, kvm



On 04.08.20 06:20, Aaron Lewis wrote:
> 
> 
> Add support for exiting to userspace on a rdmsr or wrmsr instruction if
> the MSR being read from or written to is in the user_exit_msrs list.
> 
> Signed-off-by: Aaron Lewis <aaronlewis@google.com>
> ---
>   Documentation/virt/kvm/api.rst | 29 +++++++++++-
>   arch/x86/kvm/trace.h           | 24 ++++++++++
>   arch/x86/kvm/x86.c             | 83 ++++++++++++++++++++++++++++++++++
>   include/trace/events/kvm.h     |  2 +-
>   include/uapi/linux/kvm.h       | 10 ++++
>   5 files changed, 145 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 7d8167c165aa..8b7078707e0a 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -4885,8 +4885,9 @@ to the byte array.
> 
>   .. note::
> 
> -      For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_PAPR and
> -      KVM_EXIT_EPR the corresponding
> +      For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_PAPR,
> +      KVM_EXIT_EPR, KVM_EXIT_X86_RDMSR, and KVM_EXIT_X86_WRMSR the
> +      corresponding
> 
>   operations are complete (and guest state is consistent) only after userspace
>   has re-entered the kernel with KVM_RUN.  The kernel side will first finish
> @@ -5179,6 +5180,30 @@ Note that KVM does not skip the faulting instruction as it does for
>   KVM_EXIT_MMIO, but userspace has to emulate any change to the processing state
>   if it decides to decode and emulate the instruction.
> 
> +::
> +
> +    /* KVM_EXIT_X86_RDMSR */
> +    /* KVM_EXIT_X86_WRMSR */
> +    struct {
> +      __u8 inject_gp; /* out */
> +      __u8 pad[3];
> +      __u32 index;    /* i.e. ecx; out */
> +      __u64 data;     /* in (wrmsr) / out (rdmsr) */
> +    } msr;

I like that struct layout! :)

> +
> +If the exit_reason is KVM_EXIT_X86_RDMSR then a rdmsr instruction in the guest
> +needs to be processed by userspace.  If the exit_reason is KVM_EXIT_X86_WRMSR
> +then a wrmsr instruction in the guest needs to be processed by userspace.
> +
> +Userspace can tell KVM to inject a #GP into the guest by setting the
> +'inject_gp' flag.  Setting the flag to 1 tells KVM to inject a #GP into the
> +guest.  Setting the flag to 0 tells KVM to not inject a #GP into the guest.
> +
> +The MSR being processed is indicated by 'index'.  If a read is being processed
> +the 'data' field is expected to be filled out by userspace (as an out
> +parameter). If a write is being processed the 'data' field will contain the
> +updated value of the MSR (as an in parameter).
> +
>   ::
> 
>                  /* Fix the size of the union. */
> diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
> index b66432b015d2..d03143ebd6f0 100644
> --- a/arch/x86/kvm/trace.h
> +++ b/arch/x86/kvm/trace.h
> @@ -367,6 +367,30 @@ TRACE_EVENT(kvm_msr,
>   #define trace_kvm_msr_read_ex(ecx)         trace_kvm_msr(0, ecx, 0, true)
>   #define trace_kvm_msr_write_ex(ecx, data)  trace_kvm_msr(1, ecx, data, true)
> 
> +TRACE_EVENT(kvm_userspace_msr,
> +       TP_PROTO(bool is_write, u8 inject_gp, u32 index, u64 data),
> +       TP_ARGS(is_write, inject_gp, index, data),
> +
> +       TP_STRUCT__entry(
> +               __field(bool,   is_write)
> +               __field(u8,     inject_gp)
> +               __field(u32,    index)
> +               __field(u64,    data)
> +       ),
> +
> +       TP_fast_assign(
> +               __entry->is_write       = is_write;
> +               __entry->inject_gp      = inject_gp;
> +               __entry->index          = index;
> +               __entry->data           = data;
> +       ),
> +
> +       TP_printk("userspace %s %x = 0x%llx, %s",
> +                 __entry->is_write ? "wrmsr" : "rdmsr",
> +                 __entry->index, __entry->data,
> +                 __entry->inject_gp ? "inject_gp" : "no_gp")
> +);
> +
>   /*
>    * Tracepoint for guest CR access.
>    */
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 46a0fb9e0869..47619b49818a 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -108,6 +108,8 @@ static void __kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
>   static void store_regs(struct kvm_vcpu *vcpu);
>   static int sync_regs(struct kvm_vcpu *vcpu);
> 
> +bool kvm_msr_user_exit(struct kvm *kvm, u32 index);
> +
>   struct kvm_x86_ops kvm_x86_ops __read_mostly;
>   EXPORT_SYMBOL_GPL(kvm_x86_ops);
> 
> @@ -1549,11 +1551,61 @@ int kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data)
>   }
>   EXPORT_SYMBOL_GPL(kvm_set_msr);
> 
> +/*
> + * On success, returns 1 so that __vcpu_run() will happen next. On
> + * error, returns 0.
> + */
> +static int complete_userspace_msr(struct kvm_vcpu *vcpu, bool is_write)
> +{
> +       u32 ecx = vcpu->run->msr.index;
> +       u64 data = vcpu->run->msr.data;
> +
> +       trace_kvm_userspace_msr(is_write,
> +                               vcpu->run->msr.inject_gp,
> +                               vcpu->run->msr.index,
> +                               vcpu->run->msr.data);
> +
> +       if (vcpu->run->msr.inject_gp) {
> +               trace_kvm_msr(is_write, ecx, data, true);

If you put the trace point one line up and make the last argument 
!!vcpu->run->msr.inject_gp, you can omit the second invocation below :). 
Bonus readability points for making inject_gp a local bool variable.
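
Something like this, as a rough untested sketch:

	bool inject_gp = vcpu->run->msr.inject_gp;

	trace_kvm_msr(is_write, ecx, data, inject_gp);
	if (inject_gp) {
		kvm_inject_gp(vcpu, 0);
		return 1;
	}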

> +               kvm_inject_gp(vcpu, 0);
> +               return 1;
> +       }
> +
> +       trace_kvm_msr(is_write, ecx, data, false);
> +       if (!is_write) {
> +               kvm_rax_write(vcpu, data & -1u);
> +               kvm_rdx_write(vcpu, (data >> 32) & -1u);
> +       }
> +
> +       return kvm_skip_emulated_instruction(vcpu);
> +}
> +
> +static int complete_userspace_rdmsr(struct kvm_vcpu *vcpu)
> +{
> +       return complete_userspace_msr(vcpu, false);
> +}
> +
> +static int complete_userspace_wrmsr(struct kvm_vcpu *vcpu)
> +{
> +       return complete_userspace_msr(vcpu, true);
> +}
> +
>   int kvm_emulate_rdmsr(struct kvm_vcpu *vcpu)
>   {
>          u32 ecx = kvm_rcx_read(vcpu);
>          u64 data;
> 
> +       if (kvm_msr_user_exit(vcpu->kvm, ecx)) {
> +               vcpu->run->exit_reason = KVM_EXIT_X86_RDMSR;
> +               vcpu->run->msr.index = ecx;
> +               vcpu->run->msr.data = 0;
> +               vcpu->run->msr.inject_gp = 0;
> +               memset(vcpu->run->msr.pad, 0, sizeof(vcpu->run->msr.pad));
> +               vcpu->arch.complete_userspace_io =
> +                       complete_userspace_rdmsr;
> +               return 0;
> +       }
> +
>          if (kvm_get_msr(vcpu, ecx, &data)) {
>                  trace_kvm_msr_read_ex(ecx);
>                  kvm_inject_gp(vcpu, 0);
> @@ -1573,6 +1625,17 @@ int kvm_emulate_wrmsr(struct kvm_vcpu *vcpu)
>          u32 ecx = kvm_rcx_read(vcpu);
>          u64 data = kvm_read_edx_eax(vcpu);
> 
> +       if (kvm_msr_user_exit(vcpu->kvm, ecx)) {
> +               vcpu->run->exit_reason = KVM_EXIT_X86_WRMSR;
> +               vcpu->run->msr.index = ecx;
> +               vcpu->run->msr.data = data;
> +               vcpu->run->msr.inject_gp = 0;
> +               memset(vcpu->run->msr.pad, 0, sizeof(vcpu->run->msr.pad));
> +               vcpu->arch.complete_userspace_io =
> +                       complete_userspace_wrmsr;
> +               return 0;
> +       }
> +
>          if (kvm_set_msr(vcpu, ecx, data)) {
>                  trace_kvm_msr_write_ex(ecx, data);
>                  kvm_inject_gp(vcpu, 0);
> @@ -3455,6 +3518,25 @@ static int kvm_vm_ioctl_set_exit_msrs(struct kvm *kvm,
>          return 0;
>   }
> 
> +bool kvm_msr_user_exit(struct kvm *kvm, u32 index)
> +{
> +       struct kvm_msr_list *exit_msrs;
> +       int i;
> +
> +       exit_msrs = kvm->arch.user_exit_msrs;
> +
> +       if (!exit_msrs)
> +               return false;
> +
> +       for (i = 0; i < exit_msrs->nmsrs; ++i) {
> +               if (exit_msrs->indices[i] == index)
> +                       return true;
> +       }
> +
> +       return false;
> +}
> +EXPORT_SYMBOL_GPL(kvm_msr_user_exit);
> +
>   int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>   {
>          int r = 0;
> @@ -10762,3 +10844,4 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_unaccelerated_access);
>   EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_incomplete_ipi);
>   EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_ga_log);
>   EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_apicv_update_request);
> +EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_userspace_msr);
> diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h
> index 2c735a3e6613..19f33a704174 100644
> --- a/include/trace/events/kvm.h
> +++ b/include/trace/events/kvm.h
> @@ -17,7 +17,7 @@
>          ERSN(NMI), ERSN(INTERNAL_ERROR), ERSN(OSI), ERSN(PAPR_HCALL),   \
>          ERSN(S390_UCONTROL), ERSN(WATCHDOG), ERSN(S390_TSCH), ERSN(EPR),\
>          ERSN(SYSTEM_EVENT), ERSN(S390_STSI), ERSN(IOAPIC_EOI),          \
> -       ERSN(HYPERV)
> +       ERSN(HYPERV), ERSN(X86_RDMSR), ERSN(X86_WRMSR)
> 
>   TRACE_EVENT(kvm_userspace_exit,
>              TP_PROTO(__u32 reason, int errno),
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index de4638c1bd15..2b7d21e6338c 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -248,6 +248,8 @@ struct kvm_hyperv_exit {
>   #define KVM_EXIT_IOAPIC_EOI       26
>   #define KVM_EXIT_HYPERV           27
>   #define KVM_EXIT_ARM_NISV         28
> +#define KVM_EXIT_X86_RDMSR        29
> +#define KVM_EXIT_X86_WRMSR        30
> 
>   /* For KVM_EXIT_INTERNAL_ERROR */
>   /* Emulate instruction failed. */
> @@ -412,6 +414,14 @@ struct kvm_run {
>                          __u64 esr_iss;
>                          __u64 fault_ipa;
>                  } arm_nisv;
> +               /* KVM_EXIT_X86_RDMSR */
> +               /* KVM_EXIT_X86_RDMSR */

WRMSR?

Alex

> +               struct {
> +                       __u8 inject_gp;
> +                       __u8 pad[3];
> +                       __u32 index;
> +                       __u64 data;
> +               } msr;
>                  /* Fix the size of the union. */
>                  char padding[256];
>          };
> --
> 2.28.0.163.g6104cc2f0b6-goog
> 



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/6] KVM: x86: Add ioctl for accepting a userspace provided MSR list
  2020-08-07 22:49       ` Alexander Graf
@ 2020-08-08  0:39         ` Jim Mattson
  0 siblings, 0 replies; 20+ messages in thread
From: Jim Mattson @ 2020-08-08  0:39 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Aaron Lewis, Peter Shier, Oliver Upton, kvm list

On Fri, Aug 7, 2020 at 3:49 PM Alexander Graf <graf@amazon.com> wrote:
>
> On 07.08.20 23:16, Jim Mattson wrote:
> > This patch doesn't attempt to solve your problem, "tell user space
> > about unhandled MSRs." It attempts to solve our problem, "tell user
> > space about a specific set of MSRs, even if kvm learns to handle them
> > in the future." This is, in fact, what we really wanted to do when
> > Peter Hornyack implemented the "tell user space about unhandled MSRs"
> > changes in 2015. We just didn't realize it at the time.
>
> Ok, let's take a step back then. What exactly are you trying to solve
> and which MSRs do you care about?

Our userspace handles a handful of MSRs, including some of the
synthetic PV time-related MSRs (both kvm and Hyper-V), which we want
to exit to userspace even though kvm thinks it knows what to do with
them. We also have a range of Google-specific MSRs that kvm will
probably never know about. And, finally, we find that as time goes on,
there are situations where we want to add support for a new MSR in
userspace because we can roll out a new userspace more quickly than a
new kernel. An example of this last category is
MSR_IA32_ARCH_CAPABILITIES. We don't necessarily want to cede control
of such MSRs to kvm as soon as it knows about them. For instance, we
may want to wait until the new kernel reaches its rollback horizon.
There may be a few more MSRs we handle in userspace, but I think that
covers the bulk of them.

I don't believe we have yet seen a case where we wanted to take
control of an MSR that kvm didn't intercept itself. That part of
Aaron's patch may be overkill, but it gives the API clean semantics.
It seems a bit awkward to have a userspace MSR list where accesses to
some of the MSRs exit to userspace and accesses to others do not. (I
believe that your implementation of the allow list suffers from this
awkwardness, though I haven't yet had time to review it in depth.)

> My main problem with a deny list approach is that it's (practically)
> impossible to map the allow list use case onto it. It is however trivial
> to only deny a few select MSRs explicitly with an allow list approach. I
> don't want to introduce yet another ABI to KVM in 2 years to then have
> both :).

I agree in part. A deny list does not support your use case well. But
does an allow list support it that much better? I suspect that the
allow list that specifies "every MSR that kvm knows about today" is
far from trivial to construct, especially if you consider different
microarchitectures and different VM configurations (e.g. vPMU vs. no
vPMU).

Can we back up and ask what it is that you want to accomplish? If you
simply want ignore_msrs to be per-VM, wouldn't it be easier to add an
ioctl to simply set a boolean that potentially overrides the module
parameter, and then modify the ignore_msrs code to consult the per-VM
boolean?
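
Roughly something like this (a hypothetical sketch; the ioctl name and
the per-VM fields are invented for illustration):

	case KVM_SET_IGNORE_MSRS:	/* hypothetical vm ioctl */
		kvm->arch.ignore_msrs_override = true;
		kvm->arch.ignore_msrs = !!arg;
		r = 0;
		break;

with the existing ignore_msrs checks consulting the per-VM flag before
falling back to the module parameter.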

Getting back to one of the issues I raised earlier: your "exit instead
of #GP" patch exits to userspace on both unknown MSRs and illegal
WRMSR values. If userspace wants to implement "ignore MSRs" for a
particular VM, it's absolutely trivial when the reason for the exit is
"unknown MSR." However, it becomes more complicated when the reason
for the exit is "#GP." Now, userspace has to know which MSRs are known
to kvm before it can decide whether or not the access should #GP.
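
With the kvm_run layout proposed in patch 2, the "unknown MSR" case is
roughly just (sketch):

	case KVM_EXIT_X86_RDMSR:
		run->msr.data = 0;		/* read as zero */
		run->msr.inject_gp = 0;		/* and don't fault the guest */
		break;
	case KVM_EXIT_X86_WRMSR:
		run->msr.inject_gp = 0;		/* silently drop the write */
		break;

whereas a "#GP" exit would additionally require userspace to carry a
table of kvm-known MSRs to decide whether inject_gp should be set.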

I hope this doesn't come across sounding antagonistic. Like you, I
want to avoid revisiting this design a few years down the road. We're
already in that boat with the 2015 "exit on unknown MSR" change!

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 6/6] selftests: kvm: Add test to exercise userspace MSR list
  2020-08-04  4:20 ` [PATCH 6/6] selftests: kvm: Add test to exercise userspace MSR list Aaron Lewis
  2020-08-07 23:28   ` Alexander Graf
@ 2020-08-10  9:53   ` Andrew Jones
  1 sibling, 0 replies; 20+ messages in thread
From: Andrew Jones @ 2020-08-10  9:53 UTC (permalink / raw)
  To: Aaron Lewis; +Cc: jmattson, graf, pshier, oupton, kvm

On Mon, Aug 03, 2020 at 09:20:43PM -0700, Aaron Lewis wrote:
> diff --git a/tools/testing/selftests/kvm/lib/x86_64/ucall.c b/tools/testing/selftests/kvm/lib/x86_64/ucall.c
> index da4d89ad5419..a3489973e290 100644
> --- a/tools/testing/selftests/kvm/lib/x86_64/ucall.c
> +++ b/tools/testing/selftests/kvm/lib/x86_64/ucall.c
> @@ -40,6 +40,9 @@ uint64_t get_ucall(struct kvm_vm *vm, uint32_t vcpu_id, struct ucall *uc)
>  	struct kvm_run *run = vcpu_state(vm, vcpu_id);
>  	struct ucall ucall = {};
>  
> +	if (uc)
> +		memset(uc, 0, sizeof(*uc));
> +
>  	if (run->exit_reason == KVM_EXIT_IO && run->io.port == UCALL_PIO_PORT) {
>  		struct kvm_regs regs;
>

This is a fix that should be in a separate patch, and please patch the
other architectures too.

Thanks,
drew  


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 6/6] selftests: kvm: Add test to exercise userspace MSR list
  2020-08-07 23:28   ` Alexander Graf
@ 2020-08-10 17:24     ` Jim Mattson
  2020-08-10 21:18       ` Aaron Lewis
  2020-08-18 21:26     ` Aaron Lewis
  1 sibling, 1 reply; 20+ messages in thread
From: Jim Mattson @ 2020-08-10 17:24 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Aaron Lewis, Peter Shier, Oliver Upton, kvm list

On Fri, Aug 7, 2020 at 4:28 PM Alexander Graf <graf@amazon.com> wrote:
>
>
>
> On 04.08.20 06:20, Aaron Lewis wrote:

> > +       __asm__ __volatile__("rdmsr_start: rdmsr; rdmsr_end:" :
> > +                       "=a"(a), "=d"(d) : "c"(msr) : "memory");
>
> I personally would find
>
> asm volatile("rdmsr_start:");
> r = rdmsr(msr);
> asm volatile("rdmsr_end:");
>
> more readable. Same for wrmsr.

I don't think the suggested approach is guaranteed to work, because "r
= rdmsr(msr)" isn't always going to be a single instruction. The
compiler may do some register shuffling before and/or after the rdmsr
instruction itself, and I believe Aaron wants the labels immediately
before and after the rdmsr instruction.
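
For example, the compiler would be within its rights to emit something
like this for the three-statement version (hypothetical codegen):

	rdmsr_start:
		mov    %edi, %ecx	/* compiler-scheduled shuffling */
		rdmsr
		shl    $32, %rdx
		or     %rdx, %rax
	rdmsr_end:

in which case neither label ends up adjacent to the rdmsr instruction.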

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 6/6] selftests: kvm: Add test to exercise userspace MSR list
  2020-08-10 17:24     ` Jim Mattson
@ 2020-08-10 21:18       ` Aaron Lewis
  0 siblings, 0 replies; 20+ messages in thread
From: Aaron Lewis @ 2020-08-10 21:18 UTC (permalink / raw)
  To: Jim Mattson; +Cc: Alexander Graf, Peter Shier, Oliver Upton, kvm list

I released v2 of this series to incorporate Alex's generic instruction
emulator changes.  I'll incorporate this feedback into v3.

On Mon, Aug 10, 2020 at 10:24 AM Jim Mattson <jmattson@google.com> wrote:
>
> On Fri, Aug 7, 2020 at 4:28 PM Alexander Graf <graf@amazon.com> wrote:
> >
> >
> >
> > On 04.08.20 06:20, Aaron Lewis wrote:
>
> > > +       __asm__ __volatile__("rdmsr_start: rdmsr; rdmsr_end:" :
> > > +                       "=a"(a), "=d"(d) : "c"(msr) : "memory");
> >
> > I personally would find
> >
> > asm volatile("rdmsr_start:");
> > r = rdmsr(msr);
> > asm volatile("rdmsr_end:");
> >
> > more readable. Same for wrmsr.
>
> I don't think the suggested approach is guaranteed to work, because "r
> = rdmsr(msr)" isn't always going to be a single instruction. The
> compiler may do some register shuffling before and/or after the rdmsr
> instruction itself, and I believe Aaron wants the labels immediately
> before and after the rdmsr instruction.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 6/6] selftests: kvm: Add test to exercise userspace MSR list
  2020-08-07 23:28   ` Alexander Graf
  2020-08-10 17:24     ` Jim Mattson
@ 2020-08-18 21:26     ` Aaron Lewis
  1 sibling, 0 replies; 20+ messages in thread
From: Aaron Lewis @ 2020-08-18 21:26 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Jim Mattson, kvm list

On Fri, Aug 7, 2020 at 4:28 PM Alexander Graf <graf@amazon.com> wrote:
>
>
>
> On 04.08.20 06:20, Aaron Lewis wrote:
> >
> > Add a selftest to test that when ioctl KVM_SET_EXIT_MSRS is called with
> > an MSR list the guest exits to the host and then to userspace when an
> > MSR in that list is read from or written to.
> >
> > This test uses 3 MSRs to test these new features:
> >    1. MSR_IA32_XSS, an MSR the kernel knows about.
> >    2. MSR_IA32_FLUSH_CMD, an MSR the kernel does not know about.
> >    3. MSR_NON_EXISTENT, an MSR invented in this test for the purposes of
> >       passing a fake MSR from the guest to userspace and having the guest
> >       be able to read from and write to it, with userspace handling it.
> >       KVM just acts as a pass through.
> >
> > Userspace is also able to inject a #GP.  This is demonstrated when
> > MSR_IA32_XSS and MSR_IA32_FLUSH_CMD are misused in the test.  When this
> > happens a #GP is initiated in userspace to be thrown in the guest.  To
> > be able to handle this, exception handling was added to selftests, and a
> > #GP exception handler is registered in the test.  This allows the test
> > to inject a #GP and be able to handle it gracefully.
> >
> > Signed-off-by: Aaron Lewis <aaronlewis@google.com>
>
> This is so much cooler than my excuse of a self test :). I do think it
> makes for easier review (and reuse) if you split the interrupt handling
> logic from the actual user space msr bits though.

Exception handling has been moved into its own commit.

>
> > ---
> >   tools/testing/selftests/kvm/.gitignore        |   1 +
> >   tools/testing/selftests/kvm/Makefile          |  20 +-
> >   .../selftests/kvm/include/x86_64/processor.h  |  27 ++
> >   tools/testing/selftests/kvm/lib/kvm_util.c    |  17 ++
> >   .../selftests/kvm/lib/kvm_util_internal.h     |   2 +
> >   .../selftests/kvm/lib/x86_64/handlers.S       |  83 ++++++
> >   .../selftests/kvm/lib/x86_64/processor.c      | 168 ++++++++++-
> >   .../testing/selftests/kvm/lib/x86_64/ucall.c  |   3 +
> >   .../selftests/kvm/x86_64/userspace_msr_exit.c | 271 ++++++++++++++++++
> >   9 files changed, 582 insertions(+), 10 deletions(-)
> >   create mode 100644 tools/testing/selftests/kvm/lib/x86_64/handlers.S
> >   create mode 100644 tools/testing/selftests/kvm/x86_64/userspace_msr_exit.c
> >
> > diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
> > index 452787152748..33619f915857 100644
> > --- a/tools/testing/selftests/kvm/.gitignore
> > +++ b/tools/testing/selftests/kvm/.gitignore
> > @@ -14,6 +14,7 @@
> >   /x86_64/vmx_preemption_timer_test
> >   /x86_64/svm_vmcall_test
> >   /x86_64/sync_regs_test
> > +/x86_64/userspace_msr_exit
> >   /x86_64/vmx_close_while_nested_test
> >   /x86_64/vmx_dirty_log_test
> >   /x86_64/vmx_set_nested_state_test
> > diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
> > index 4a166588d99f..66a6652ca305 100644
> > --- a/tools/testing/selftests/kvm/Makefile
> > +++ b/tools/testing/selftests/kvm/Makefile
> > @@ -34,7 +34,7 @@ ifeq ($(ARCH),s390)
> >   endif
> >
> >   LIBKVM = lib/assert.c lib/elf.c lib/io.c lib/kvm_util.c lib/sparsebit.c lib/test_util.c
> > -LIBKVM_x86_64 = lib/x86_64/processor.c lib/x86_64/vmx.c lib/x86_64/svm.c lib/x86_64/ucall.c
> > +LIBKVM_x86_64 = lib/x86_64/processor.c lib/x86_64/vmx.c lib/x86_64/svm.c lib/x86_64/ucall.c lib/x86_64/handlers.S
> >   LIBKVM_aarch64 = lib/aarch64/processor.c lib/aarch64/ucall.c
> >   LIBKVM_s390x = lib/s390x/processor.c lib/s390x/ucall.c
> >
> > @@ -49,6 +49,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/state_test
> >   TEST_GEN_PROGS_x86_64 += x86_64/vmx_preemption_timer_test
> >   TEST_GEN_PROGS_x86_64 += x86_64/svm_vmcall_test
> >   TEST_GEN_PROGS_x86_64 += x86_64/sync_regs_test
> > +TEST_GEN_PROGS_x86_64 += x86_64/userspace_msr_exit
> >   TEST_GEN_PROGS_x86_64 += x86_64/vmx_close_while_nested_test
> >   TEST_GEN_PROGS_x86_64 += x86_64/vmx_dirty_log_test
> >   TEST_GEN_PROGS_x86_64 += x86_64/vmx_set_nested_state_test
> > @@ -108,14 +109,21 @@ LDFLAGS += -pthread $(no-pie-option) $(pgste-option)
> >   include ../lib.mk
> >
> >   STATIC_LIBS := $(OUTPUT)/libkvm.a
> > -LIBKVM_OBJ := $(patsubst %.c, $(OUTPUT)/%.o, $(LIBKVM))
> > -EXTRA_CLEAN += $(LIBKVM_OBJ) $(STATIC_LIBS) cscope.*
> > +LIBKVM_C := $(filter %.c,$(LIBKVM))
> > +LIBKVM_S := $(filter %.S,$(LIBKVM))
> > +LIBKVM_C_OBJ := $(patsubst %.c, $(OUTPUT)/%.o, $(LIBKVM_C))
> > +LIBKVM_S_OBJ := $(patsubst %.S, $(OUTPUT)/%.o, $(LIBKVM_S))
> > +EXTRA_CLEAN += $(LIBKVM_C_OBJ) $(LIBKVM_S_OBJ) $(STATIC_LIBS) cscope.*
> > +
> > +x := $(shell mkdir -p $(sort $(dir $(LIBKVM_C_OBJ) $(LIBKVM_S_OBJ))))
> > +$(LIBKVM_C_OBJ): $(OUTPUT)/%.o: %.c
> > +       $(CC) $(CFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c $< -o $@
> >
> > -x := $(shell mkdir -p $(sort $(dir $(LIBKVM_OBJ))))
> > -$(LIBKVM_OBJ): $(OUTPUT)/%.o: %.c
> > +$(LIBKVM_S_OBJ): $(OUTPUT)/%.o: %.S
> >          $(CC) $(CFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c $< -o $@
> >
> > -$(OUTPUT)/libkvm.a: $(LIBKVM_OBJ)
> > +LIBKVM_OBJS = $(LIBKVM_C_OBJ) $(LIBKVM_S_OBJ)
> > +$(OUTPUT)/libkvm.a: $(LIBKVM_OBJS)
> >          $(AR) crs $@ $^
> >
> >   x := $(shell mkdir -p $(sort $(dir $(TEST_GEN_PROGS))))
> > diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
> > index 0a65e7bb5249..a4de429eb408 100644
> > --- a/tools/testing/selftests/kvm/include/x86_64/processor.h
> > +++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
> > @@ -36,6 +36,8 @@
> >   #define X86_CR4_SMAP           (1ul << 21)
> >   #define X86_CR4_PKE            (1ul << 22)
> >
> > +#define UNEXPECTED_VECTOR_PORT 0xfff0u
> > +
> >   /* General Registers in 64-Bit Mode */
> >   struct gpr64_regs {
> >          u64 rax;
> > @@ -239,6 +241,11 @@ static inline struct desc_ptr get_idt(void)
> >          return idt;
> >   }
> >
> > +static inline void outl(uint16_t port, uint32_t value)
> > +{
> > +       __asm__ __volatile__("outl %%eax, %%dx" : : "d"(port), "a"(value));
> > +}
> > +
> >   #define SET_XMM(__var, __xmm) \
> >          asm volatile("movq %0, %%"#__xmm : : "r"(__var) : #__xmm)
> >
> > @@ -334,10 +341,30 @@ int _vcpu_set_msr(struct kvm_vm *vm, uint32_t vcpuid, uint64_t msr_index,
> >   void vcpu_set_msr(struct kvm_vm *vm, uint32_t vcpuid, uint64_t msr_index,
> >                    uint64_t msr_value);
> >
> > +void kvm_set_exit_msrs(struct kvm_vm *vm, uint32_t nmsrs,
> > +       uint32_t msr_indices[]);
> > +
> >   uint32_t kvm_get_cpuid_max_basic(void);
> >   uint32_t kvm_get_cpuid_max_extended(void);
> >   void kvm_get_cpu_address_width(unsigned int *pa_bits, unsigned int *va_bits);
> >
> > +struct ex_regs {
> > +       uint64_t rax, rcx, rdx, rbx;
> > +       uint64_t dummy, rbp, rsi, rdi;
>
> Why the dummy?

It's there to match other regs layouts.  It isn't needed in this one
though, so I removed it.

>
> > +       uint64_t r8, r9, r10, r11;
> > +       uint64_t r12, r13, r14, r15;
> > +       uint64_t vector;
> > +       uint64_t error_code;
> > +       uint64_t rip;
> > +       uint64_t cs;
> > +       uint64_t rflags;
> > +};
> > +
> > +void vm_init_descriptor_tables(struct kvm_vm *vm);
> > +void vcpu_init_descriptor_tables(struct kvm_vm *vm, uint32_t vcpuid);
> > +void vm_handle_exception(struct kvm_vm *vm, int vector,
> > +                       void (*handler)(struct ex_regs *));
> > +
> >   /*
> >    * Basic CPU control in CR0
> >    */
> > diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
> > index 74776ee228f2..f8dde1cdbef0 100644
> > --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> > +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> > @@ -1195,6 +1195,21 @@ int _vcpu_run(struct kvm_vm *vm, uint32_t vcpuid)
> >          do {
> >                  rc = ioctl(vcpu->fd, KVM_RUN, NULL);
> >          } while (rc == -1 && errno == EINTR);
> > +
> > +#ifdef __x86_64__
> > +       if (vcpu_state(vm, vcpuid)->exit_reason == KVM_EXIT_IO
>
> This does feel a bit out of place here, no? Is there any particular
> reason not to handle it similar to the UCALL logic? In fact, would it
> hurt to just declare it as a new ucall? You do have a working stack
> after all ...
>

I left this as is for now.  I'm not sure it makes sense to move this
to the UCALL system because that system works on multiple platforms
and I only implemented exception handling on x86.

> > +               && vcpu_state(vm, vcpuid)->io.port == UNEXPECTED_VECTOR_PORT
> > +               && vcpu_state(vm, vcpuid)->io.size == 4) {
> > +               /* Grab pointer to io data */
> > +               uint32_t *data = (void *)vcpu_state(vm, vcpuid)
> > +                       + vcpu_state(vm, vcpuid)->io.data_offset;
> > +
> > +               TEST_ASSERT(false,
> > +                           "Unexpected vectored event in guest (vector:0x%x)",
> > +                           *data);
> > +       }
> > +#endif
> > +
> >          return rc;
> >   }
> >
> > @@ -1590,6 +1605,8 @@ static struct exit_reason {
> >          {KVM_EXIT_INTERNAL_ERROR, "INTERNAL_ERROR"},
> >          {KVM_EXIT_OSI, "OSI"},
> >          {KVM_EXIT_PAPR_HCALL, "PAPR_HCALL"},
> > +       {KVM_EXIT_X86_RDMSR, "RDMSR"},
> > +       {KVM_EXIT_X86_WRMSR, "WRMSR"},
> >   #ifdef KVM_EXIT_MEMORY_NOT_PRESENT
> >          {KVM_EXIT_MEMORY_NOT_PRESENT, "MEMORY_NOT_PRESENT"},
> >   #endif
> > diff --git a/tools/testing/selftests/kvm/lib/kvm_util_internal.h b/tools/testing/selftests/kvm/lib/kvm_util_internal.h
> > index 2ef446520748..f07d383d03a1 100644
> > --- a/tools/testing/selftests/kvm/lib/kvm_util_internal.h
> > +++ b/tools/testing/selftests/kvm/lib/kvm_util_internal.h
> > @@ -50,6 +50,8 @@ struct kvm_vm {
> >          vm_paddr_t pgd;
> >          vm_vaddr_t gdt;
> >          vm_vaddr_t tss;
> > +       vm_vaddr_t idt;
> > +       vm_vaddr_t handlers;
> >   };
> >
> >   struct vcpu *vcpu_find(struct kvm_vm *vm, uint32_t vcpuid);
> > diff --git a/tools/testing/selftests/kvm/lib/x86_64/handlers.S b/tools/testing/selftests/kvm/lib/x86_64/handlers.S
> > new file mode 100644
> > index 000000000000..783d2c0fc741
> > --- /dev/null
> > +++ b/tools/testing/selftests/kvm/lib/x86_64/handlers.S
> > @@ -0,0 +1,83 @@
> > +handle_exception:
> > +       push %r15
> > +       push %r14
> > +       push %r13
> > +       push %r12
> > +       push %r11
> > +       push %r10
> > +       push %r9
> > +       push %r8
> > +
> > +       push %rdi
> > +       push %rsi
> > +       push %rbp
> > +       sub $8, %rsp
> > +       push %rbx
> > +       push %rdx
> > +       push %rcx
> > +       push %rax
> > +       mov %rsp, %rdi
> > +
> > +       call route_exception
> > +
> > +       pop %rax
> > +       pop %rcx
> > +       pop %rdx
> > +       pop %rbx
> > +       add $8, %rsp
> > +       pop %rbp
> > +       pop %rsi
> > +       pop %rdi
> > +       pop %r8
> > +       pop %r9
> > +       pop %r10
> > +       pop %r11
> > +       pop %r12
> > +       pop %r13
> > +       pop %r14
> > +       pop %r15
> > +
> > +       /* Discard vector and error code. */
> > +       add $16, %rsp
> > +       iretq
> > +
> > +/*
> > + * Build the handle_exception wrappers which push the vector/error code on the
> > + * stack and an array of pointers to those wrappers.
> > + */
> > +.pushsection .rodata
> > +.globl idt_handlers
> > +idt_handlers:
> > +.popsection
> > +
> > +.macro HANDLERS has_error from to
> > +       vector = \from
> > +       .rept \to - \from + 1
> > +       .align 8
> > +
> > +       /* Fetch current address and append it to idt_handlers. */
> > +       current_handler = .
> > +.pushsection .rodata
> > +.quad current_handler
> > +.popsection
> > +
> > +       .if ! \has_error
> > +       pushq $0
> > +       .endif
> > +       pushq $vector
> > +       jmp handle_exception
> > +       vector = vector + 1
> > +       .endr
> > +.endm
> > +
> > +.global idt_handler_code
> > +idt_handler_code:
> > +       HANDLERS has_error=0 from=0  to=7
> > +       HANDLERS has_error=1 from=8  to=8
> > +       HANDLERS has_error=0 from=9  to=9
> > +       HANDLERS has_error=1 from=10 to=14
> > +       HANDLERS has_error=0 from=15 to=16
> > +       HANDLERS has_error=1 from=17 to=17
> > +       HANDLERS has_error=0 from=18 to=255
> > +
> > +.section        .note.GNU-stack, "", %progbits
> > diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> > index f6eb34eaa0d2..ff56753f205f 100644
> > --- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
> > +++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> > @@ -12,6 +12,13 @@
> >   #include "../kvm_util_internal.h"
> >   #include "processor.h"
> >
> > +#ifndef NUM_INTERRUPTS
> > +#define NUM_INTERRUPTS 256
> > +#endif
> > +
> > +#define DEFAULT_CODE_SELECTOR 0x8
> > +#define DEFAULT_DATA_SELECTOR 0x10
> > +
> >   /* Minimum physical address used for virtual translation tables. */
> >   #define KVM_GUEST_PAGE_TABLE_MIN_PADDR 0x180000
> >
> > @@ -392,11 +399,12 @@ static void kvm_seg_fill_gdt_64bit(struct kvm_vm *vm, struct kvm_segment *segp)
> >          desc->limit0 = segp->limit & 0xFFFF;
> >          desc->base0 = segp->base & 0xFFFF;
> >          desc->base1 = segp->base >> 16;
> > -       desc->s = segp->s;
> >          desc->type = segp->type;
> > +       desc->s = segp->s;
>
> This probably should go into the previous patch.

done

>
> >          desc->dpl = segp->dpl;
> >          desc->p = segp->present;
> >          desc->limit1 = segp->limit >> 16;
> > +       desc->avl = segp->avl;
> >          desc->l = segp->l;
> >          desc->db = segp->db;
> >          desc->g = segp->g;
> > @@ -556,9 +564,9 @@ static void vcpu_setup(struct kvm_vm *vm, int vcpuid, int pgd_memslot, int gdt_m
> >                  sregs.efer |= (EFER_LME | EFER_LMA | EFER_NX);
> >
> >                  kvm_seg_set_unusable(&sregs.ldt);
> > -               kvm_seg_set_kernel_code_64bit(vm, 0x8, &sregs.cs);
> > -               kvm_seg_set_kernel_data_64bit(vm, 0x10, &sregs.ds);
> > -               kvm_seg_set_kernel_data_64bit(vm, 0x10, &sregs.es);
> > +               kvm_seg_set_kernel_code_64bit(vm, DEFAULT_CODE_SELECTOR, &sregs.cs);
> > +               kvm_seg_set_kernel_data_64bit(vm, DEFAULT_DATA_SELECTOR, &sregs.ds);
> > +               kvm_seg_set_kernel_data_64bit(vm, DEFAULT_DATA_SELECTOR, &sregs.es);
> >                  kvm_setup_tss_64bit(vm, &sregs.tr, 0x18, gdt_memslot, pgd_memslot);
> >                  break;
> >
> > @@ -843,6 +851,71 @@ void vcpu_set_msr(struct kvm_vm *vm, uint32_t vcpuid, uint64_t msr_index,
> >                  "  rc: %i errno: %i", r, errno);
> >   }
> >
> > +/*
> > + * _KVM Set Exit MSR
> > + *
> > + * Input Args:
> > + *   vm - Virtual Machine
> > + *   nmsrs - Number of msrs in msr_indices
> > + *   msr_indices[] - List of msrs.
> > + *
> > + * Output Args: None
> > + *
> > + * Return: The result of KVM_SET_EXIT_MSRS.
> > + *
> > + * Sets a list of MSRs that will force an exit to userspace when
> > + * any of them are read from or written to by the guest.
> > + */
> > +int _kvm_set_exit_msrs(struct kvm_vm *vm, uint32_t nmsrs,
> > +       uint32_t msr_indices[])
> > +{
> > +       const uint32_t max_nmsrs = 256;
> > +       struct kvm_msr_list *msr_list;
> > +       uint32_t i;
> > +       int r;
> > +
> > +       TEST_ASSERT(nmsrs <= max_nmsrs,
> > +               "'nmsrs' is too large.  Max is %u, currently %u.\n",
> > +               max_nmsrs, nmsrs);
> > +       uint32_t msr_list_byte_size = sizeof(struct kvm_msr_list) +
> > +               (sizeof(msr_list->indices[0]) * nmsrs);
> > +       msr_list = alloca(msr_list_byte_size);
> > +       memset(msr_list, 0, msr_list_byte_size);
> > +
> > +       msr_list->nmsrs = nmsrs;
> > +       for (i = 0; i < nmsrs; i++)
> > +               msr_list->indices[i] = msr_indices[i];
> > +
> > +       r = ioctl(vm->fd, KVM_SET_EXIT_MSRS, msr_list);
> > +
> > +       return r;
> > +}
> > +
> > +/*
> > + * KVM Set Exit MSR
> > + *
> > + * Input Args:
> > + *   vm - Virtual Machine
> > + *   nmsrs - Number of msrs in msr_indices
> > + *   msr_indices[] - List of msrs.
> > + *
> > + * Output Args: None
> > + *
> > + * Return: None
> > + *
> > + * Sets a list of MSRs that will force an exit to userspace when
> > + * any of them are read from or written to by the guest.
> > + */
> > +void kvm_set_exit_msrs(struct kvm_vm *vm, uint32_t nmsrs,
> > +       uint32_t msr_indices[])
> > +{
> > +       int r;
> > +
> > +       r = _kvm_set_exit_msrs(vm, nmsrs, msr_indices);
> > +       TEST_ASSERT(r == 0, "KVM_SET_EXIT_MSRS IOCTL failed,\n"
> > +               "  rc: %i errno: %i", r, errno);
> > +}
> > +
> >   void vcpu_args_set(struct kvm_vm *vm, uint32_t vcpuid, unsigned int num, ...)
> >   {
> >          va_list ap;
> > @@ -1118,3 +1191,90 @@ void kvm_get_cpu_address_width(unsigned int *pa_bits, unsigned int *va_bits)
> >                  *va_bits = (entry->eax >> 8) & 0xff;
> >          }
> >   }
> > +
> > +struct idt_entry {
> > +       uint16_t offset0;
> > +       uint16_t selector;
> > +       uint16_t ist : 3;
> > +       uint16_t : 5;
> > +       uint16_t type : 4;
> > +       uint16_t : 1;
> > +       uint16_t dpl : 2;
> > +       uint16_t p : 1;
> > +       uint16_t offset1;
> > +       uint32_t offset2;
> > +       uint32_t reserved;
> > +};
> > +
> > +static void set_idt_entry(struct kvm_vm *vm, int vector, unsigned long addr,
> > +                         int dpl, unsigned short selector)
> > +{
> > +       struct idt_entry *base =
> > +               (struct idt_entry *)addr_gva2hva(vm, vm->idt);
> > +       struct idt_entry *e = &base[vector];
> > +
> > +       memset(e, 0, sizeof(*e));
> > +       e->offset0 = addr;
> > +       e->selector = selector;
> > +       e->ist = 0;
> > +       e->type = 14;
> > +       e->dpl = dpl;
> > +       e->p = 1;
> > +       e->offset1 = addr >> 16;
> > +       e->offset2 = addr >> 32;
> > +}
> > +
> > +void kvm_exit_unexpected_vector(uint32_t value)
> > +{
> > +       outl(UNEXPECTED_VECTOR_PORT, value);
> > +}
> > +
> > +void route_exception(struct ex_regs *regs)
> > +{
> > +       typedef void(*handler)(struct ex_regs *);
> > +       handler *handlers;
> > +
> > +       handlers = (handler *)rdmsr(MSR_GS_BASE);
> > +
> > +       if (handlers[regs->vector]) {
> > +               handlers[regs->vector](regs);
> > +               return;
> > +       }
> > +
> > +       kvm_exit_unexpected_vector(regs->vector);
> > +}
> > +
> > +void vm_init_descriptor_tables(struct kvm_vm *vm)
> > +{
> > +       extern void *idt_handlers;
> > +       int i;
> > +
> > +       vm->idt = vm_vaddr_alloc(vm, getpagesize(), 0x2000, 0, 0);
> > +       vm->handlers = vm_vaddr_alloc(vm, 256 * sizeof(void *), 0x2000, 0, 0);
> > +       /* Handlers have the same address in both address spaces. */
> > +       for (i = 0; i < NUM_INTERRUPTS; i++)
> > +               set_idt_entry(vm, i, (unsigned long)(&idt_handlers)[i], 0,
> > +                       DEFAULT_CODE_SELECTOR);
> > +}
> > +
> > +void vcpu_init_descriptor_tables(struct kvm_vm *vm, uint32_t vcpuid)
> > +{
> > +       struct kvm_sregs sregs;
> > +
> > +       vcpu_sregs_get(vm, vcpuid, &sregs);
> > +       sregs.idt.base = vm->idt;
> > +       sregs.idt.limit = NUM_INTERRUPTS * sizeof(struct idt_entry) - 1;
> > +       sregs.gdt.base = vm->gdt;
> > +       sregs.gdt.limit = getpagesize() - 1;
> > +       /* Use GS Base to pass the pointer to the handlers to the guest. */
> > +       kvm_seg_set_kernel_data_64bit(NULL, DEFAULT_DATA_SELECTOR, &sregs.gs);
> > +       sregs.gs.base = (unsigned long) vm->handlers;
> > +       vcpu_sregs_set(vm, vcpuid, &sregs);
> > +}
> > +
> > +void vm_handle_exception(struct kvm_vm *vm, int vector,
> > +                        void (*handler)(struct ex_regs *))
> > +{
> > +       vm_vaddr_t *handlers = (vm_vaddr_t *)addr_gva2hva(vm, vm->handlers);
> > +
> > +       handlers[vector] = (vm_vaddr_t)handler;
> > +}
> > diff --git a/tools/testing/selftests/kvm/lib/x86_64/ucall.c b/tools/testing/selftests/kvm/lib/x86_64/ucall.c
> > index da4d89ad5419..a3489973e290 100644
> > --- a/tools/testing/selftests/kvm/lib/x86_64/ucall.c
> > +++ b/tools/testing/selftests/kvm/lib/x86_64/ucall.c
> > @@ -40,6 +40,9 @@ uint64_t get_ucall(struct kvm_vm *vm, uint32_t vcpu_id, struct ucall *uc)
> >          struct kvm_run *run = vcpu_state(vm, vcpu_id);
> >          struct ucall ucall = {};
> >
> > +       if (uc)
> > +               memset(uc, 0, sizeof(*uc));
> > +
> >          if (run->exit_reason == KVM_EXIT_IO && run->io.port == UCALL_PIO_PORT) {
> >                  struct kvm_regs regs;
> >
> > diff --git a/tools/testing/selftests/kvm/x86_64/userspace_msr_exit.c b/tools/testing/selftests/kvm/x86_64/userspace_msr_exit.c
> > new file mode 100644
> > index 000000000000..6c7868dbce08
> > --- /dev/null
> > +++ b/tools/testing/selftests/kvm/x86_64/userspace_msr_exit.c
> > @@ -0,0 +1,271 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (C) 2020, Google LLC.
> > + *
> > + * Tests for exiting into userspace on registered MSRs
> > + */
> > +
> > +#define _GNU_SOURCE /* for program_invocation_short_name */
> > +#include <sys/ioctl.h>
> > +
> > +#include "test_util.h"
> > +#include "kvm_util.h"
> > +#include "vmx.h"
> > +
> > +#define VCPU_ID              1
> > +
> > +#define MSR_NON_EXISTENT 0x474f4f00
> > +
> > +uint32_t msrs[] = {
> > +       /* Test an MSR the kernel knows about. */
> > +       MSR_IA32_XSS,
> > +       /* Test an MSR the kernel doesn't know about. */
> > +       MSR_IA32_FLUSH_CMD,
> > +       /* Test a fabricated MSR that no one knows about. */
> > +       MSR_NON_EXISTENT,
> > +};
> > +uint32_t nmsrs = ARRAY_SIZE(msrs);
> > +
> > +uint64_t msr_non_existent_data;
> > +int guest_exception_count;
> > +
> > +/*
> > + * Note: Force test_rdmsr() to not be inlined to prevent the labels,
> > + * rdmsr_start and rdmsr_end, from being defined multiple times.
> > + */
> > +static noinline uint64_t test_rdmsr(uint32_t msr)
> > +{
> > +       uint32_t a, d;
> > +
> > +       guest_exception_count = 0;
> > +
> > +       __asm__ __volatile__("rdmsr_start: rdmsr; rdmsr_end:" :
> > +                       "=a"(a), "=d"(d) : "c"(msr) : "memory");
>
> I personally would find
>
> asm volatile("rdmsr_start:");
> r = rdmsr(msr);
> asm volatile("rdmsr_end:");
>
> more readable. Same for wrmsr.
>
>
> The rest of the test looks very clean and nice to read. Kudos!
>
> Alex
>
>
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/6] KVM: x86: Ensure the MSR bitmap never clears userspace tracked MSRs
  2020-08-07 23:03   ` Alexander Graf
@ 2020-08-18 21:34     ` Aaron Lewis
  2020-08-18 21:57       ` Jim Mattson
  0 siblings, 1 reply; 20+ messages in thread
From: Aaron Lewis @ 2020-08-18 21:34 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Jim Mattson, kvm list

On Fri, Aug 7, 2020 at 4:03 PM Alexander Graf <graf@amazon.com> wrote:
>
>
>
> On 04.08.20 06:20, Aaron Lewis wrote:
> >
> > SDM volume 3: 24.6.9 "MSR-Bitmap Address" and APM volume 2: 15.11 "MSR
> > intercepts" describe MSR permission bitmaps.  Permission bitmaps are
> > used to control whether an execution of rdmsr or wrmsr will cause a
> > vm exit.  For userspace tracked MSRs it is required that they cause a vm
> > exit, so the host is able to forward the MSR to userspace.  This change
> > adds vmx/svm support to ensure the permission bitmap is properly set to
> > cause a vm_exit to the host when rdmsr or wrmsr is executed on one of the
> > userspace tracked MSRs.  Also, to avoid repeatedly setting them,
> > kvm_make_request() is used to coalesce these into a single call.
>
> I might have some fundamental misunderstanding here:
>
> 1) You force that the list of trapped MSRs is set before vCPU creation,
> so at the point of vCPU creation, you know already which MSRs should
> trap to user space
>
> 2) MSR intercept bitmaps are (AFAIK) all set to "trap by default". That
> means any MSR that we want the guest to get direct access to needs
> explicit opt-in through bitmap modifications.
>
> That means if you simply check whether an MSR is supposed to trap to
> user space in the bitmap set path, you don't need any of the complicated
> logic below, no?
>
>
> Alex
>

Yes, I think that should work as well.  However, calling it after the
fact, as we do, has the nice advantage of letting us coalesce the
calls and limit the number of times we need to search the list.
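
For reference, I think your suggestion amounts to something like this
in the vmx path (rough sketch; svm would be analogous):

	static __always_inline void
	vmx_disable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type)
	{
		/* Never open the bitmap for an MSR tracked by userspace. */
		if (kvm_msr_user_exit(vcpu->kvm, msr))
			return;
		...
	}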

> >
> > Signed-off-by: Aaron Lewis <aaronlewis@google.com>
> > Reviewed-by: Oliver Upton <oupton@google.com>
> > ---
> >   arch/x86/include/asm/kvm_host.h |  3 ++
> >   arch/x86/kvm/svm/svm.c          | 49 ++++++++++++++++++++++++++-------
> >   arch/x86/kvm/vmx/vmx.c          | 13 ++++++++-
> >   arch/x86/kvm/x86.c              | 16 +++++++++++
> >   4 files changed, 70 insertions(+), 11 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 510055471dd0..07a85f5f0b8a 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -87,6 +87,7 @@
> >   #define KVM_REQ_HV_TLB_FLUSH \
> >          KVM_ARCH_REQ_FLAGS(27, KVM_REQUEST_NO_WAKEUP)
> >   #define KVM_REQ_APF_READY              KVM_ARCH_REQ(28)
> > +#define KVM_REQ_USER_MSR_UPDATE KVM_ARCH_REQ(29)
> >
> >   #define CR0_RESERVED_BITS                                               \
> >          (~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
> > @@ -1271,6 +1272,8 @@ struct kvm_x86_ops {
> >          int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
> >
> >          void (*migrate_timers)(struct kvm_vcpu *vcpu);
> > +
> > +       void (*set_user_msr_intercept)(struct kvm_vcpu *vcpu, u32 msr);
> >   };
> >
> >   struct kvm_x86_nested_ops {
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index eb673b59f7b7..c560d283b2af 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -583,13 +583,27 @@ static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr)
> >          return !!test_bit(bit_write,  &tmp);
> >   }
> >
> > +static void __set_msr_interception(u32 *msrpm, u32 msr, int read, int write,
> > +                                  u32 offset)
> > +{
> > +       u8 bit_read, bit_write;
> > +       unsigned long tmp;
> > +
> > +       bit_read  = 2 * (msr & 0x0f);
> > +       bit_write = 2 * (msr & 0x0f) + 1;
> > +       tmp       = msrpm[offset];
> > +
> > +       read  ? clear_bit(bit_read,  &tmp) : set_bit(bit_read,  &tmp);
> > +       write ? clear_bit(bit_write, &tmp) : set_bit(bit_write, &tmp);
> > +
> > +       msrpm[offset] = tmp;
> > +}
> > +
> >   static void set_msr_interception(struct kvm_vcpu *vcpu, u32 msr, int read,
> >                                   int write)
> >   {
> >          struct vcpu_svm *svm = to_svm(vcpu);
> >          u32 *msrpm = svm->msrpm;
> > -       u8 bit_read, bit_write;
> > -       unsigned long tmp;
> >          u32 offset;
> >
> >          /*
> > @@ -598,17 +612,30 @@ static void set_msr_interception(struct kvm_vcpu *vcpu, u32 msr, int read,
> >           */
> >          WARN_ON(!valid_msr_intercept(msr));
> >
> > -       offset    = svm_msrpm_offset(msr);
> > -       bit_read  = 2 * (msr & 0x0f);
> > -       bit_write = 2 * (msr & 0x0f) + 1;
> > -       tmp       = msrpm[offset];
> > -
> > +       offset = svm_msrpm_offset(msr);
> >          BUG_ON(offset == MSR_INVALID);
> >
> > -       read  ? clear_bit(bit_read,  &tmp) : set_bit(bit_read,  &tmp);
> > -       write ? clear_bit(bit_write, &tmp) : set_bit(bit_write, &tmp);
> > +       __set_msr_interception(msrpm, msr, read, write, offset);
> >
> > -       msrpm[offset] = tmp;
> > +       if (read || write)
> > +               kvm_make_request(KVM_REQ_USER_MSR_UPDATE, vcpu);
> > +}
> > +
> > +static void set_user_msr_interception(struct kvm_vcpu *vcpu, u32 msr, int read,
> > +                                     int write)
> > +{
> > +       struct vcpu_svm *svm = to_svm(vcpu);
> > +       u32 *msrpm = svm->msrpm;
> > +       u32 offset;
> > +
> > +       offset = svm_msrpm_offset(msr);
> > +       if (offset != MSR_INVALID)
> > +               __set_msr_interception(msrpm, msr, read, write, offset);
> > +}
> > +
> > +void svm_set_user_msr_intercept(struct kvm_vcpu *vcpu, u32 msr)
> > +{
> > +       set_user_msr_interception(vcpu, msr, 0, 0);
> >   }
> >
> >   static void svm_vcpu_init_msrpm(struct kvm_vcpu *vcpu, u32 *msrpm)
> > @@ -4088,6 +4115,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
> >          .need_emulation_on_page_fault = svm_need_emulation_on_page_fault,
> >
> >          .apic_init_signal_blocked = svm_apic_init_signal_blocked,
> > +
> > +       .set_user_msr_intercept = svm_set_user_msr_intercept,
> >   };
> >
> >   static struct kvm_x86_init_ops svm_init_ops __initdata = {
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 1313e47a5a1e..3d3d9eaeca64 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -3728,6 +3728,10 @@ static __always_inline void vmx_disable_intercept_for_msr(struct kvm_vcpu *vcpu,
> >                          __clear_bit(msr, msr_bitmap + 0xc00 / f);
> >
> >          }
> > +
> > +       if (type & MSR_TYPE_R || type & MSR_TYPE_W) {
> > +               kvm_make_request(KVM_REQ_USER_MSR_UPDATE, vcpu);
> > +       }
> >   }
> >
> >   static __always_inline void vmx_enable_intercept_for_msr(struct kvm_vcpu *vcpu,
> > @@ -3795,7 +3799,7 @@ static u8 vmx_msr_bitmap_mode(struct kvm_vcpu *vcpu)
> >   }
> >
> >   static void vmx_update_msr_bitmap_x2apic(struct kvm_vcpu *vcpu,
> > -                                        unsigned long *msr_bitmap, u8 mode)
> > +                                       unsigned long *msr_bitmap, u8 mode)
> >   {
> >          int msr;
> >
> > @@ -3819,6 +3823,11 @@ static void vmx_update_msr_bitmap_x2apic(struct kvm_vcpu *vcpu,
> >          }
> >   }
> >
> > +void vmx_set_user_msr_intercept(struct kvm_vcpu *vcpu, u32 msr)
> > +{
> > +       vmx_enable_intercept_for_msr(vcpu, msr, MSR_TYPE_RW);
> > +}
> > +
> >   void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu)
> >   {
> >          struct vcpu_vmx *vmx = to_vmx(vcpu);
> > @@ -7965,6 +7974,8 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
> >          .need_emulation_on_page_fault = vmx_need_emulation_on_page_fault,
> >          .apic_init_signal_blocked = vmx_apic_init_signal_blocked,
> >          .migrate_timers = vmx_migrate_timers,
> > +
> > +       .set_user_msr_intercept = vmx_set_user_msr_intercept,
> >   };
> >
> >   static __init int hardware_setup(void)
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 47619b49818a..45bf59f94d34 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -3537,6 +3537,19 @@ bool kvm_msr_user_exit(struct kvm *kvm, u32 index)
> >   }
> >   EXPORT_SYMBOL_GPL(kvm_msr_user_exit);
> >
> > +static void kvm_set_user_msr_intercepts(struct kvm_vcpu *vcpu)
> > +{
> > +       struct kvm_msr_list *msr_list = vcpu->kvm->arch.user_exit_msrs;
> > +       u32 i, msr;
> > +
> > +       if (msr_list) {
> > +               for (i = 0; i < msr_list->nmsrs; i++) {
> > +                       msr = msr_list->indices[i];
> > +                       kvm_x86_ops.set_user_msr_intercept(vcpu, msr);
> > +               }
> > +       }
> > +}
> > +
> >   int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> >   {
> >          int r = 0;
> > @@ -8553,6 +8566,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> >                          kvm_vcpu_update_apicv(vcpu);
> >                  if (kvm_check_request(KVM_REQ_APF_READY, vcpu))
> >                          kvm_check_async_pf_completion(vcpu);
> > +
> > +               if (kvm_check_request(KVM_REQ_USER_MSR_UPDATE, vcpu))
> > +                       kvm_set_user_msr_intercepts(vcpu);
> >          }
> >
> >          if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
> > --
> > 2.28.0.163.g6104cc2f0b6-goog
> >
>
>
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/6] KVM: x86: Ensure the MSR bitmap never clears userspace tracked MSRs
  2020-08-18 21:34     ` Aaron Lewis
@ 2020-08-18 21:57       ` Jim Mattson
  0 siblings, 0 replies; 20+ messages in thread
From: Jim Mattson @ 2020-08-18 21:57 UTC (permalink / raw)
  To: Aaron Lewis; +Cc: Alexander Graf, kvm list

On Tue, Aug 18, 2020 at 2:34 PM Aaron Lewis <aaronlewis@google.com> wrote:
>
> On Fri, Aug 7, 2020 at 4:03 PM Alexander Graf <graf@amazon.com> wrote:
> >
> >
> >
> > On 04.08.20 06:20, Aaron Lewis wrote:
> > >
> > > SDM volume 3: 24.6.9 "MSR-Bitmap Address" and APM volume 2: 15.11 "MSR
> > > intercepts" describe MSR permission bitmaps.  Permission bitmaps are
> > > used to control whether an execution of rdmsr or wrmsr will cause a
> > > vm exit.  For userspace tracked MSRs it is required that they cause a vm
> > > exit, so the host is able to forward the MSR to userspace.  This change
> > > adds vmx/svm support to ensure the permission bitmap is properly set to
> > > cause a vm_exit to the host when rdmsr or wrmsr is executed on one of the
> > > userspace tracked MSRs.  Also, to avoid repeatedly setting them,
> > > kvm_make_request() is used to coalesce these into a single call.
> >
> > I might have some fundamental misunderstanding here:
> >
> > 1) You force that the list of trapped MSRs is set before vCPU creation,
> > so at the point of vCPU creation, you know already which MSRs should
> > trap to user space
> >
> > 2) MSR intercept bitmaps are (AFAIK) all set to "trap by default". That
> > means any MSR that we want the guest to get direct access to needs
> > explicit opt-in through bitmap modifications.
> >
> > That means if you simply check whether an MSR is supposed to trap to
> > user space in the bitmap set path, you don't need any of the complicated
> > logic below, no?
> >
> >
> > Alex
> >
>
> Yes, I think that should work as well.  However, calling it after the
> fact, as we do, has the nice advantage of letting us coalesce the
> calls and limit the number of times we need to search the list.

Also, if we do decide to let the userspace MSR list change after vCPUs
are created (and it's a VM-wide ioctl), it will be nice to just send
the request to all vCPUs to update their MSR permission bitmaps before
the next VM-entry.
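
That path could presumably be as simple as (sketch):

	kvm_make_all_cpus_request(kvm, KVM_REQ_USER_MSR_UPDATE);

with the existing KVM_REQ_USER_MSR_UPDATE handling in vcpu_enter_guest()
refreshing each vCPU's bitmap on the next entry.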

> > >
> > > Signed-off-by: Aaron Lewis <aaronlewis@google.com>
> > > Reviewed-by: Oliver Upton <oupton@google.com>
> > > ---
> > >   arch/x86/include/asm/kvm_host.h |  3 ++
> > >   arch/x86/kvm/svm/svm.c          | 49 ++++++++++++++++++++++++++-------
> > >   arch/x86/kvm/vmx/vmx.c          | 13 ++++++++-
> > >   arch/x86/kvm/x86.c              | 16 +++++++++++
> > >   4 files changed, 70 insertions(+), 11 deletions(-)
> > >
> > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > index 510055471dd0..07a85f5f0b8a 100644
> > > --- a/arch/x86/include/asm/kvm_host.h
> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -87,6 +87,7 @@
> > >   #define KVM_REQ_HV_TLB_FLUSH \
> > >          KVM_ARCH_REQ_FLAGS(27, KVM_REQUEST_NO_WAKEUP)
> > >   #define KVM_REQ_APF_READY              KVM_ARCH_REQ(28)
> > > +#define KVM_REQ_USER_MSR_UPDATE KVM_ARCH_REQ(29)
> > >
> > >   #define CR0_RESERVED_BITS                                               \
> > >          (~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
> > > @@ -1271,6 +1272,8 @@ struct kvm_x86_ops {
> > >          int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
> > >
> > >          void (*migrate_timers)(struct kvm_vcpu *vcpu);
> > > +
> > > +       void (*set_user_msr_intercept)(struct kvm_vcpu *vcpu, u32 msr);
> > >   };
> > >
> > >   struct kvm_x86_nested_ops {
> > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > > index eb673b59f7b7..c560d283b2af 100644
> > > --- a/arch/x86/kvm/svm/svm.c
> > > +++ b/arch/x86/kvm/svm/svm.c
> > > @@ -583,13 +583,27 @@ static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr)
> > >          return !!test_bit(bit_write,  &tmp);
> > >   }
> > >
> > > +static void __set_msr_interception(u32 *msrpm, u32 msr, int read, int write,
> > > +                                  u32 offset)
> > > +{
> > > +       u8 bit_read, bit_write;
> > > +       unsigned long tmp;
> > > +
> > > +       bit_read  = 2 * (msr & 0x0f);
> > > +       bit_write = 2 * (msr & 0x0f) + 1;
> > > +       tmp       = msrpm[offset];
> > > +
> > > +       read  ? clear_bit(bit_read,  &tmp) : set_bit(bit_read,  &tmp);
> > > +       write ? clear_bit(bit_write, &tmp) : set_bit(bit_write, &tmp);
> > > +
> > > +       msrpm[offset] = tmp;
> > > +}
> > > +
> > >   static void set_msr_interception(struct kvm_vcpu *vcpu, u32 msr, int read,
> > >                                   int write)
> > >   {
> > >          struct vcpu_svm *svm = to_svm(vcpu);
> > >          u32 *msrpm = svm->msrpm;
> > > -       u8 bit_read, bit_write;
> > > -       unsigned long tmp;
> > >          u32 offset;
> > >
> > >          /*
> > > @@ -598,17 +612,30 @@ static void set_msr_interception(struct kvm_vcpu *vcpu, u32 msr, int read,
> > >           */
> > >          WARN_ON(!valid_msr_intercept(msr));
> > >
> > > -       offset    = svm_msrpm_offset(msr);
> > > -       bit_read  = 2 * (msr & 0x0f);
> > > -       bit_write = 2 * (msr & 0x0f) + 1;
> > > -       tmp       = msrpm[offset];
> > > -
> > > +       offset = svm_msrpm_offset(msr);
> > >          BUG_ON(offset == MSR_INVALID);
> > >
> > > -       read  ? clear_bit(bit_read,  &tmp) : set_bit(bit_read,  &tmp);
> > > -       write ? clear_bit(bit_write, &tmp) : set_bit(bit_write, &tmp);
> > > +       __set_msr_interception(msrpm, msr, read, write, offset);
> > >
> > > -       msrpm[offset] = tmp;
> > > +       if (read || write)
> > > +               kvm_make_request(KVM_REQ_USER_MSR_UPDATE, vcpu);
> > > +}
> > > +
> > > +static void set_user_msr_interception(struct kvm_vcpu *vcpu, u32 msr, int read,
> > > +                                     int write)
> > > +{
> > > +       struct vcpu_svm *svm = to_svm(vcpu);
> > > +       u32 *msrpm = svm->msrpm;
> > > +       u32 offset;
> > > +
> > > +       offset = svm_msrpm_offset(msr);
> > > +       if (offset != MSR_INVALID)
> > > +               __set_msr_interception(msrpm, msr, read, write, offset);
> > > +}
> > > +
> > > +void svm_set_user_msr_intercept(struct kvm_vcpu *vcpu, u32 msr)
> > > +{
> > > +       set_user_msr_interception(vcpu, msr, 0, 0);
> > >   }
> > >
> > >   static void svm_vcpu_init_msrpm(struct kvm_vcpu *vcpu, u32 *msrpm)
> > > @@ -4088,6 +4115,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
> > >          .need_emulation_on_page_fault = svm_need_emulation_on_page_fault,
> > >
> > >          .apic_init_signal_blocked = svm_apic_init_signal_blocked,
> > > +
> > > +       .set_user_msr_intercept = svm_set_user_msr_intercept,
> > >   };
> > >
> > >   static struct kvm_x86_init_ops svm_init_ops __initdata = {
> > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > > index 1313e47a5a1e..3d3d9eaeca64 100644
> > > --- a/arch/x86/kvm/vmx/vmx.c
> > > +++ b/arch/x86/kvm/vmx/vmx.c
> > > @@ -3728,6 +3728,10 @@ static __always_inline void vmx_disable_intercept_for_msr(struct kvm_vcpu *vcpu,
> > >                          __clear_bit(msr, msr_bitmap + 0xc00 / f);
> > >
> > >          }
> > > +
> > > +       if (type & MSR_TYPE_R || type & MSR_TYPE_W) {
> > > +               kvm_make_request(KVM_REQ_USER_MSR_UPDATE, vcpu);
> > > +       }
> > >   }
> > >
> > >   static __always_inline void vmx_enable_intercept_for_msr(struct kvm_vcpu *vcpu,
> > > @@ -3795,7 +3799,7 @@ static u8 vmx_msr_bitmap_mode(struct kvm_vcpu *vcpu)
> > >   }
> > >
> > >   static void vmx_update_msr_bitmap_x2apic(struct kvm_vcpu *vcpu,
> > > -                                        unsigned long *msr_bitmap, u8 mode)
> > > +                                       unsigned long *msr_bitmap, u8 mode)
> > >   {
> > >          int msr;
> > >
> > > @@ -3819,6 +3823,11 @@ static void vmx_update_msr_bitmap_x2apic(struct kvm_vcpu *vcpu,
> > >          }
> > >   }
> > >
> > > +void vmx_set_user_msr_intercept(struct kvm_vcpu *vcpu, u32 msr)
> > > +{
> > > +       vmx_enable_intercept_for_msr(vcpu, msr, MSR_TYPE_RW);
> > > +}
> > > +
> > >   void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu)
> > >   {
> > >          struct vcpu_vmx *vmx = to_vmx(vcpu);
> > > @@ -7965,6 +7974,8 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
> > >          .need_emulation_on_page_fault = vmx_need_emulation_on_page_fault,
> > >          .apic_init_signal_blocked = vmx_apic_init_signal_blocked,
> > >          .migrate_timers = vmx_migrate_timers,
> > > +
> > > +       .set_user_msr_intercept = vmx_set_user_msr_intercept,
> > >   };
> > >
> > >   static __init int hardware_setup(void)
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index 47619b49818a..45bf59f94d34 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -3537,6 +3537,19 @@ bool kvm_msr_user_exit(struct kvm *kvm, u32 index)
> > >   }
> > >   EXPORT_SYMBOL_GPL(kvm_msr_user_exit);
> > >
> > > +static void kvm_set_user_msr_intercepts(struct kvm_vcpu *vcpu)
> > > +{
> > > +       struct kvm_msr_list *msr_list = vcpu->kvm->arch.user_exit_msrs;
> > > +       u32 i, msr;
> > > +
> > > +       if (msr_list) {
> > > +               for (i = 0; i < msr_list->nmsrs; i++) {
> > > +                       msr = msr_list->indices[i];
> > > +                       kvm_x86_ops.set_user_msr_intercept(vcpu, msr);
> > > +               }
> > > +       }
> > > +}
> > > +
> > >   int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> > >   {
> > >          int r = 0;
> > > @@ -8553,6 +8566,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> > >                          kvm_vcpu_update_apicv(vcpu);
> > >                  if (kvm_check_request(KVM_REQ_APF_READY, vcpu))
> > >                          kvm_check_async_pf_completion(vcpu);
> > > +
> > > +               if (kvm_check_request(KVM_REQ_USER_MSR_UPDATE, vcpu))
> > > +                       kvm_set_user_msr_intercepts(vcpu);
> > >          }
> > >
> > >          if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
> > > --
> > > 2.28.0.163.g6104cc2f0b6-goog
> > >
> >

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2020-08-18 21:57 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-04  4:20 [PATCH 0/6] Allow userspace to manage MSRs Aaron Lewis
2020-08-04  4:20 ` [PATCH 1/6] KVM: x86: Add ioctl for accepting a userspace provided MSR list Aaron Lewis
2020-08-07 16:12   ` Alexander Graf
2020-08-07 21:16     ` Jim Mattson
2020-08-07 22:49       ` Alexander Graf
2020-08-08  0:39         ` Jim Mattson
2020-08-04  4:20 ` [PATCH 2/6] KVM: x86: Add support for exiting to userspace on rdmsr or wrmsr Aaron Lewis
2020-08-07 23:41   ` Alexander Graf
2020-08-04  4:20 ` [PATCH 3/6] KVM: x86: Prepare MSR bitmaps for userspace tracked MSRs Aaron Lewis
2020-08-04  4:20 ` [PATCH 4/6] KVM: x86: Ensure the MSR bitmap never clears " Aaron Lewis
2020-08-07 23:03   ` Alexander Graf
2020-08-18 21:34     ` Aaron Lewis
2020-08-18 21:57       ` Jim Mattson
2020-08-04  4:20 ` [PATCH 5/6] selftests: kvm: Fix the segment descriptor layout to match the actual layout Aaron Lewis
2020-08-04  4:20 ` [PATCH 6/6] selftests: kvm: Add test to exercise userspace MSR list Aaron Lewis
2020-08-07 23:28   ` Alexander Graf
2020-08-10 17:24     ` Jim Mattson
2020-08-10 21:18       ` Aaron Lewis
2020-08-18 21:26     ` Aaron Lewis
2020-08-10  9:53   ` Andrew Jones

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).