* [PATCH 1/4] kvm: x86: add KVM_{GET,SET}_TSC_OFFSET ioctls
From: Oliver Upton @ 2020-07-10 20:07 UTC
  To: kvm
  Cc: Paolo Bonzini, Peter Shier, Sean Christopherson, Peter Hornyack,
	Jim Mattson, Oliver Upton

From: Peter Hornyack <peterhornyack@google.com>

The KVM_SET_MSR vcpu ioctl has some temporal and value-based heuristics
for determining when userspace is attempting to synchronize TSCs.
Instead of guessing at userspace's intentions in the kernel, directly
expose control of the TSC offset field to userspace such that userspace
may deliberately synchronize the guest TSCs.

Note that TSC offset support is mandatory for KVM on both SVM and VMX.
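
For example, a VMM migrating a VM could save and restore the offset
directly (a rough sketch; vcpu_fd and the error handling are
illustrative, not part of this patch):

	__u64 offset;

	/* Source: read the vCPU's current TSC offset out of KVM. */
	if (ioctl(vcpu_fd, KVM_GET_TSC_OFFSET, &offset) < 0)
		err(1, "KVM_GET_TSC_OFFSET");

	/*
	 * Destination: restore the offset verbatim, bypassing the
	 * KVM_SET_MSR heuristics entirely.
	 */
	if (ioctl(vcpu_fd, KVM_SET_TSC_OFFSET, &offset) < 0)
		err(1, "KVM_SET_TSC_OFFSET");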

Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Peter Hornyack <peterhornyack@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
---
 Documentation/virt/kvm/api.rst | 27 +++++++++++++++++++++++++++
 arch/x86/kvm/x86.c             | 28 ++++++++++++++++++++++++++++
 include/uapi/linux/kvm.h       |  5 +++++
 3 files changed, 60 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 644e5326aa50..ef1fc109b901 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -4701,6 +4701,33 @@ KVM_PV_VM_VERIFY
   KVM is allowed to start protected VCPUs.
 
 
+4.126 KVM_GET_TSC_OFFSET
+------------------------
+
+:Capability: KVM_CAP_TSC_OFFSET
+:Architectures: x86
+:Type: vcpu ioctl
+:Parameters: __u64 (out)
+:Returns: 0 on success, < 0 on error
+
+This ioctl gets the TSC offset field. The offset is returned as an
+unsigned value, though it is interpreted as a signed value by hardware.
+
+
+4.127 KVM_SET_TSC_OFFSET
+------------------------
+
+:Capability: KVM_CAP_TSC_OFFSET
+:Architectures: x86
+:Type: vcpu ioctl
+:Parameters: __u64 (in)
+:Returns: 0 on success, < 0 on error
+
+This ioctl sets the TSC offset field. The offset is represented as an
+unsigned value, though it is interpreted as a signed value by hardware.
+The guest's TSC value will change based on the written offset.
+
+
 5. The kvm_run structure
 ========================
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e27d3db7e43f..563256ce5ade 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2026,6 +2026,11 @@ u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc)
 }
 EXPORT_SYMBOL_GPL(kvm_read_l1_tsc);
 
+static u64 kvm_vcpu_read_tsc_offset(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.l1_tsc_offset;
+}
+
 static void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
 {
 	vcpu->arch.l1_tsc_offset = offset;
@@ -3482,6 +3487,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_HYPERV_TIME:
 	case KVM_CAP_IOAPIC_POLARITY_IGNORED:
 	case KVM_CAP_TSC_DEADLINE_TIMER:
+	case KVM_CAP_TSC_OFFSET:
 	case KVM_CAP_DISABLE_QUIRKS:
 	case KVM_CAP_SET_BOOT_CPU_ID:
  	case KVM_CAP_SPLIT_IRQCHIP:
@@ -4734,6 +4740,28 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 		r = 0;
 		break;
 	}
+	case KVM_GET_TSC_OFFSET: {
+		u64 tsc_offset;
+
+		r = -EFAULT;
+		tsc_offset = kvm_vcpu_read_tsc_offset(vcpu);
+		if (copy_to_user(argp, &tsc_offset, sizeof(tsc_offset)))
+			goto out;
+		r = 0;
+		break;
+	}
+	case KVM_SET_TSC_OFFSET: {
+		u64 __user *tsc_offset_arg = argp;
+		u64 tsc_offset;
+
+		r = -EFAULT;
+		if (copy_from_user(&tsc_offset, tsc_offset_arg,
+				   sizeof(tsc_offset)))
+			goto out;
+		kvm_vcpu_write_tsc_offset(vcpu, tsc_offset);
+		r = 0;
+		break;
+	}
 	default:
 		r = -EINVAL;
 	}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index ff9b335620d0..41f387ffcd11 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1033,6 +1033,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_HALT_POLL 182
 #define KVM_CAP_ASYNC_PF_INT 183
 #define KVM_CAP_LAST_CPU 184
+#define KVM_CAP_TSC_OFFSET 185
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1501,6 +1502,10 @@ struct kvm_enc_region {
 #define KVM_S390_NORMAL_RESET	_IO(KVMIO,   0xc3)
 #define KVM_S390_CLEAR_RESET	_IO(KVMIO,   0xc4)
 
+/* Available with KVM_CAP_TSC_OFFSET */
+#define KVM_GET_TSC_OFFSET	_IOR(KVMIO, 0xc5, __u64)
+#define KVM_SET_TSC_OFFSET	_IOW(KVMIO, 0xc6, __u64)
+
 struct kvm_s390_pv_sec_parm {
 	__u64 origin;
 	__u64 length;
-- 
2.27.0.383.g050319c2ae-goog



* [PATCH 2/4] kvm: vmx: check tsc offsetting with nested_cpu_has()
From: Oliver Upton @ 2020-07-10 20:07 UTC
  To: kvm
  Cc: Paolo Bonzini, Peter Shier, Sean Christopherson, Peter Hornyack,
	Oliver Upton, Jim Mattson

Check whether L1 has enabled TSC offsetting for L2 with the
nested_cpu_has() helper instead of open-coding the test against
cpu_based_vm_exec_control. No functional change intended.

Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
---
 arch/x86/kvm/vmx/vmx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 2b41d987b101..fa201010dbe0 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1775,7 +1775,7 @@ static u64 vmx_write_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
 	 * to the newly set TSC to get L2's TSC.
 	 */
 	if (is_guest_mode(vcpu) &&
-	    (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETTING))
+	    nested_cpu_has(vmcs12, CPU_BASED_USE_TSC_OFFSETTING))
 		g_tsc_offset = vmcs12->tsc_offset;
 
 	trace_kvm_write_tsc_offset(vcpu->vcpu_id,
-- 
2.27.0.383.g050319c2ae-goog



* [PATCH 3/4] selftests: kvm: use a helper function for reading cpuid
From: Oliver Upton @ 2020-07-10 20:07 UTC
  To: kvm
  Cc: Paolo Bonzini, Peter Shier, Sean Christopherson, Peter Hornyack,
	Oliver Upton, Jim Mattson

Add a raw_cpuid() helper for reading arbitrary CPUID leaves and use it
to simplify cpu_has_svm(). Derived from kvm-unit-tests.

Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
---
 .../selftests/kvm/include/x86_64/processor.h      | 15 +++++++++++++++
 .../selftests/kvm/include/x86_64/svm_util.h       |  7 ++-----
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index 82b7fe16a824..316fbd4700ef 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -338,6 +338,21 @@ uint32_t kvm_get_cpuid_max_basic(void);
 uint32_t kvm_get_cpuid_max_extended(void);
 void kvm_get_cpu_address_width(unsigned int *pa_bits, unsigned int *va_bits);
 
+struct cpuid {
+	u32 a, b, c, d;
+};
+
+static inline struct cpuid raw_cpuid(u32 function, u32 index)
+{
+	struct cpuid r;
+
+	asm volatile("cpuid"
+		     : "=a"(r.a), "=b"(r.b), "=c"(r.c), "=d"(r.d)
+		     : "0"(function), "2"(index));
+
+	return r;
+}
+
 /*
  * Basic CPU control in CR0
  */
diff --git a/tools/testing/selftests/kvm/include/x86_64/svm_util.h b/tools/testing/selftests/kvm/include/x86_64/svm_util.h
index b7531c83b8ae..47a13aaee460 100644
--- a/tools/testing/selftests/kvm/include/x86_64/svm_util.h
+++ b/tools/testing/selftests/kvm/include/x86_64/svm_util.h
@@ -38,12 +38,9 @@ void nested_svm_check_supported(void);
 
 static inline bool cpu_has_svm(void)
 {
-	u32 eax = 0x80000001, ecx;
+	struct cpuid r = raw_cpuid(0x80000001, 0);
 
-	asm("cpuid" :
-	    "=a" (eax), "=c" (ecx) : "0" (eax) : "ebx", "edx");
-
-	return ecx & CPUID_SVM;
+	return r.c & CPUID_SVM;
 }
 
 #endif /* SELFTEST_KVM_SVM_UTILS_H */
-- 
2.27.0.383.g050319c2ae-goog



* [PATCH 4/4] selftests: kvm: introduce tsc_offset_test
From: Oliver Upton @ 2020-07-10 20:07 UTC
  To: kvm
  Cc: Paolo Bonzini, Peter Shier, Sean Christopherson, Peter Hornyack,
	Oliver Upton, Jim Mattson

Test the KVM_{GET,SET}_TSC_OFFSET ioctls. Ensure the following:
 - KVM_GET_TSC_OFFSET returns the value written by KVM_SET_TSC_OFFSET
 - The L1 TSC is appropriately adjusted based on the written value
 - Guest manipulation of the TSC results in a changed offset in
   KVM_GET_TSC_OFFSET
 - Modifying the TSC offset while in guest mode affects the L2 guest
 - A modification made in guest mode is also reflected in L1

Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
---
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/include/x86_64/svm_util.h   |   5 +
 .../selftests/kvm/include/x86_64/vmx.h        |   9 +
 tools/testing/selftests/kvm/lib/kvm_util.c    |   1 +
 tools/testing/selftests/kvm/lib/x86_64/vmx.c  |  11 +
 .../selftests/kvm/x86_64/tsc_offset_test.c    | 344 ++++++++++++++++++
 7 files changed, 372 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86_64/tsc_offset_test.c

diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
index 452787152748..f5608692b43a 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -11,6 +11,7 @@
 /x86_64/set_sregs_test
 /x86_64/smm_test
 /x86_64/state_test
+/x86_64/tsc_offset_test
 /x86_64/vmx_preemption_timer_test
 /x86_64/svm_vmcall_test
 /x86_64/sync_regs_test
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 4a166588d99f..03dfaa49c3c4 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -55,6 +55,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/vmx_set_nested_state_test
 TEST_GEN_PROGS_x86_64 += x86_64/vmx_tsc_adjust_test
 TEST_GEN_PROGS_x86_64 += x86_64/xss_msr_test
 TEST_GEN_PROGS_x86_64 += x86_64/debug_regs
+TEST_GEN_PROGS_x86_64 += x86_64/tsc_offset_test
 TEST_GEN_PROGS_x86_64 += clear_dirty_log_test
 TEST_GEN_PROGS_x86_64 += demand_paging_test
 TEST_GEN_PROGS_x86_64 += dirty_log_test
diff --git a/tools/testing/selftests/kvm/include/x86_64/svm_util.h b/tools/testing/selftests/kvm/include/x86_64/svm_util.h
index 47a13aaee460..ef2791be3f6b 100644
--- a/tools/testing/selftests/kvm/include/x86_64/svm_util.h
+++ b/tools/testing/selftests/kvm/include/x86_64/svm_util.h
@@ -43,4 +43,9 @@ static inline bool cpu_has_svm(void)
 	return r.c & CPUID_SVM;
 }
 
+static inline void vmmcall(void)
+{
+	__asm__ __volatile__("vmmcall");
+}
+
 #endif /* SELFTEST_KVM_SVM_UTILS_H */
diff --git a/tools/testing/selftests/kvm/include/x86_64/vmx.h b/tools/testing/selftests/kvm/include/x86_64/vmx.h
index 16fa21ebb99c..d70bf4f350f8 100644
--- a/tools/testing/selftests/kvm/include/x86_64/vmx.h
+++ b/tools/testing/selftests/kvm/include/x86_64/vmx.h
@@ -602,6 +602,8 @@ struct vmx_pages *vcpu_alloc_vmx(struct kvm_vm *vm, vm_vaddr_t *p_vmx_gva);
 bool prepare_for_vmx_operation(struct vmx_pages *vmx);
 void prepare_vmcs(struct vmx_pages *vmx, void *guest_rip, void *guest_rsp);
 bool load_vmcs(struct vmx_pages *vmx);
+void generic_vmx_setup(struct vmx_pages *vmx, void *guest_rip,
+		       void *guest_rsp);
 
 bool nested_vmx_supported(void);
 void nested_vmx_check_supported(void);
@@ -616,4 +618,11 @@ void nested_map_memslot(struct vmx_pages *vmx, struct kvm_vm *vm,
 void prepare_eptp(struct vmx_pages *vmx, struct kvm_vm *vm,
 		  uint32_t eptp_memslot);
 
+static inline bool cpu_has_vmx(void)
+{
+	struct cpuid r = raw_cpuid(1, 0);
+
+	return r.c & CPUID_VMX;
+}
+
 #endif /* SELFTEST_KVM_VMX_H */
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 74776ee228f2..1b9bd2ed58c8 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -23,6 +23,7 @@
 static void *align(void *x, size_t size)
 {
 	size_t mask = size - 1;
+
 	TEST_ASSERT(size != 0 && !(size & (size - 1)),
 		    "size not a power of 2: %lu", size);
 	return (void *) (((size_t) x + mask) & ~mask);
diff --git a/tools/testing/selftests/kvm/lib/x86_64/vmx.c b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
index f1e00d43eea2..a5fb3bca0219 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/vmx.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
@@ -542,3 +542,14 @@ void prepare_eptp(struct vmx_pages *vmx, struct kvm_vm *vm,
 	vmx->eptp_hva = addr_gva2hva(vm, (uintptr_t)vmx->eptp);
 	vmx->eptp_gpa = addr_gva2gpa(vm, (uintptr_t)vmx->eptp);
 }
+
+void generic_vmx_setup(struct vmx_pages *vmx_pages, void *guest_rip,
+		       void *guest_rsp)
+{
+	GUEST_ASSERT(vmx_pages && vmx_pages->vmcs_gpa);
+	GUEST_ASSERT(prepare_for_vmx_operation(vmx_pages));
+	GUEST_ASSERT(load_vmcs(vmx_pages));
+	GUEST_ASSERT(vmptrstz() == vmx_pages->vmcs_gpa);
+	prepare_vmcs(vmx_pages, guest_rip, guest_rsp);
+}
+
diff --git a/tools/testing/selftests/kvm/x86_64/tsc_offset_test.c b/tools/testing/selftests/kvm/x86_64/tsc_offset_test.c
new file mode 100644
index 000000000000..08d653e31947
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/tsc_offset_test.c
@@ -0,0 +1,344 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * TSC offset test
+ *
+ * Copyright (C) 2020, Google, LLC.
+ *
+ * Test to ensure that userspace control of the TSC offset field behaves as
+ * expected for both non-nested and nested guests.
+ */
+#include "kvm_util.h"
+#include "processor.h"
+#include "svm_util.h"
+#include "test_util.h"
+#include "vmx.h"
+
+#include "kselftest.h"
+
+#define L1_TSC_WRITE_VALUE 0
+#define L2_GUEST_STACK_SIZE 64
+#define L1_TSC_OFFSET (1ul << 48)
+#define L2_TSC_OFFSET (-L1_TSC_OFFSET)
+#define VCPU_ID 1
+
+#define swap(a, b) \
+	do { typeof(a) __tmp = (a); (a) = (b); (b) = __tmp; } while (0)
+
+bool vmx;
+
+int set_tsc_offset(struct kvm_vm *vm, u32 vcpuid, u64 val)
+{
+	return _vcpu_ioctl(vm, vcpuid, KVM_SET_TSC_OFFSET, &val);
+}
+
+int get_tsc_offset(struct kvm_vm *vm, u32 vcpuid, u64 *out)
+{
+	return _vcpu_ioctl(vm, vcpuid, KVM_GET_TSC_OFFSET, out);
+}
+
+/*
+ * Test that reading the TSC offset returns the previously written value.
+ */
+void set_get_tsc_offset_test(struct kvm_vm *vm, u32 vcpuid)
+{
+	u64 val;
+
+	TEST_ASSERT(!set_tsc_offset(vm, vcpuid, L1_TSC_OFFSET),
+		    "Failed to write TSC offset");
+	TEST_ASSERT(!get_tsc_offset(vm, vcpuid, &val),
+		    "Failed to read TSC offset");
+	TEST_ASSERT(val == L1_TSC_OFFSET,
+		    "Expected %lu from GET_TSC_OFFSET but got %lu",
+		    L1_TSC_OFFSET, val);
+}
+
+void check_value_bounds(const char *name, int stage, u64 low, u64 high, u64 val)
+{
+	TEST_ASSERT(low <= val && val <= high,
+		    "Stage %d: expected %s value in the range [%lu, %lu] but got %lu",
+		    stage, name, low, high, val);
+
+	/* only reached if passed */
+	pr_info("Stage %d: %s: %lu, expected range: [%lu, %lu]\n", stage, name,
+		val, low, high);
+}
+
+void check_value_bounds_signed(const char *name, int stage, s64 low, s64 high,
+			       s64 val)
+{
+	TEST_ASSERT(low <= val && val <= high,
+		    "Stage %d: expected %s value in the range [%ld, %ld] but got %ld",
+		    stage, name, low, high, val);
+
+	/* only reached if passed */
+	pr_info("Stage %d: %s: %ld, expected range: [%ld, %ld]\n", stage, name,
+		val, low, high);
+}
+
+void check_value_bounds_overflow(const char *name, int stage, s64 low, s64 high,
+				 s64 val)
+{
+	TEST_ASSERT(val <= low || val >= high,
+		    "Stage %d: expected %s value outside the range [%ld, %ld] but got %ld",
+		    stage, name, low, high, val);
+
+	pr_info("Stage %d: %s: %ld, expected range: [-MAX, %ld], [%ld, MAX]\n",
+		stage, name, val, low, high);
+}
+
+void generic_vmcall(void)
+{
+	if (vmx)
+		vmcall();
+	else
+		vmmcall();
+}
+
+void l2_main(void)
+{
+	/* Allow userspace to manipulate the TSC offset */
+	GUEST_SYNC(3);
+	GUEST_SYNC_ARGS(4, rdtsc(), 0, 0, 0);
+	generic_vmcall();
+}
+
+void l0_nested_setup(struct kvm_vm *vm, u32 vcpuid)
+{
+	vm_vaddr_t nested_pages = 0;
+
+	if (vmx)
+		vcpu_alloc_vmx(vm, &nested_pages);
+	else
+		vcpu_alloc_svm(vm, &nested_pages);
+
+	vcpu_args_set(vm, VCPU_ID, 1, nested_pages);
+}
+
+void l1_nested_setup(void *nested_pages, void *guest_stack)
+{
+	if (vmx)
+		generic_vmx_setup(nested_pages, l2_main, guest_stack);
+	else
+		generic_svm_setup(nested_pages, l2_main, guest_stack);
+}
+
+void l1_set_tsc_offset(void *nested_pages, u64 offset)
+{
+	if (vmx) {
+		GUEST_ASSERT(!vmwrite(CPU_BASED_VM_EXEC_CONTROL,
+				      vmreadz(CPU_BASED_VM_EXEC_CONTROL) |
+				      CPU_BASED_USE_TSC_OFFSETTING));
+
+		GUEST_ASSERT(!vmwrite(TSC_OFFSET, offset));
+	} else {
+		struct svm_test_data *svm = nested_pages;
+
+		svm->vmcb->control.tsc_offset = offset;
+		/* Mark the TSC offset field as dirty */
+		svm->vmcb->control.clean &= ~1u;
+	}
+
+}
+
+void l1_enter_guest(void *nested_pages)
+{
+	if (vmx) {
+		/* We only enter L2 once, hence VMLAUNCH */
+		GUEST_ASSERT(!vmlaunch());
+	} else {
+		struct svm_test_data *svm = nested_pages;
+
+		run_guest(svm->vmcb, svm->vmcb_gpa);
+	}
+}
+
+void l1_assert_exit_vmcall(void *nested_pages)
+{
+	if (vmx) {
+		GUEST_ASSERT(vmreadz(VM_EXIT_REASON) == EXIT_REASON_VMCALL);
+	} else {
+		struct svm_test_data *svm = nested_pages;
+
+		GUEST_ASSERT(svm->vmcb->control.exit_code == SVM_EXIT_VMMCALL);
+	}
+}
+
+void l1_main(void *nested_pages)
+{
+	unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
+	bool nested;
+
+	/*
+	 * Guest doesn't share memory with userspace, determine VMX presence
+	 * inside guest.
+	 */
+	vmx = cpu_has_vmx();
+	nested = vmx || cpu_has_svm();
+
+	if (nested) {
+		l1_nested_setup(nested_pages,
+				&l2_guest_stack[L2_GUEST_STACK_SIZE]);
+		l1_set_tsc_offset(nested_pages, L2_TSC_OFFSET);
+	}
+
+	GUEST_SYNC_ARGS(1, rdtsc(), 0, 0, 0);
+
+	wrmsr(MSR_IA32_TSC, L1_TSC_WRITE_VALUE);
+	GUEST_SYNC(2);
+
+	if (!nested)
+		GUEST_DONE();
+
+	l1_enter_guest(nested_pages);
+	l1_assert_exit_vmcall(nested_pages);
+
+	GUEST_SYNC_ARGS(5, rdtsc(), 0, 0, 0);
+	GUEST_DONE();
+}
+
+int main(void)
+{
+	u64 start, stop, exp_low, exp_high;
+	struct kvm_run *run;
+	struct kvm_vm *vm;
+	struct ucall uc;
+	bool nested;
+	int stage;
+
+	if (!kvm_check_cap(KVM_CAP_TSC_OFFSET)) {
+		pr_info("KVM_CAP_TSC_OFFSET not available, skipping\n");
+		return 0;
+	}
+
+	/*
+	 * Nested virtualization is not explicitly required for this test, but
+	 * gates the L2 tests.
+	 */
+	vmx = nested_vmx_supported();
+	nested = vmx || nested_svm_supported();
+
+	vm = vm_create_default(VCPU_ID, 0, (void *) l1_main);
+	vcpu_set_cpuid(vm, VCPU_ID, kvm_get_supported_cpuid());
+	run = vcpu_state(vm, VCPU_ID);
+
+	if (nested)
+		l0_nested_setup(vm, VCPU_ID);
+
+	set_get_tsc_offset_test(vm, VCPU_ID);
+
+	for (stage = 1;; stage++) {
+		start = rdtsc();
+		_vcpu_run(vm, VCPU_ID);
+		stop = rdtsc();
+
+		pr_info("Stage %d: start: %lu, stop: %lu\n", stage, start, stop);
+		TEST_ASSERT(run->exit_reason == KVM_EXIT_IO,
+			    "Stage %d: unexpected exit reason: %u (%s)\n",
+			    stage, run->exit_reason,
+			    exit_reason_str(run->exit_reason));
+
+		switch (get_ucall(vm, VCPU_ID, &uc)) {
+		case UCALL_ABORT:
+			TEST_FAIL("%s at %s:%ld", (const char *)uc.args[0],
+				  __FILE__, uc.args[1]);
+			/* NOT REACHED */
+		case UCALL_SYNC:
+			break;
+		case UCALL_DONE:
+			goto done;
+		default:
+			TEST_FAIL("Unknown ucall %lu", uc.cmd);
+		}
+
+		/*
+		 * Check that the guest's TSC value falls between expected
+		 * bounds, considering the written TSC offset.
+		 */
+		if (stage == 1) {
+			exp_low = start + L1_TSC_OFFSET;
+			exp_high = stop + L1_TSC_OFFSET;
+
+			check_value_bounds("L1 TSC", stage, exp_low, exp_high,
+					   uc.args[2]);
+
+			/*
+			 * KVM interprets writes to the TSC within a second of
+			 * elapsed time as an attempt to synchronize TSCs. In
+			 * order to get a TSC offset within expected bounds for
+			 * stage 2, we must sleep for a second to avoid such
+			 * handling of the TSC write.
+			 */
+			sleep(1);
+		/*
+		 * Check that guest writes to the TSC result in a TSC offset
+		 * value between the expected bounds, considering the original
+		 * TSC offset value.
+		 */
+		} else if (stage == 2) {
+			s64 tsc_offset, low, high;
+
+			low = L1_TSC_WRITE_VALUE - stop;
+			high = L1_TSC_WRITE_VALUE - start;
+
+			TEST_ASSERT(!get_tsc_offset(vm, VCPU_ID, (u64 *) &tsc_offset),
+				    "Failed to read TSC offset");
+
+			/*
+			 * It is possible (though highly unlikely) that the
+			 * host's TSC crosses 2^63 ticks while we are running
+			 * the guest. The lower bound on the TSC offset then
+			 * wraps around into the positive domain, leaving
+			 * low > high. When that happens, we must instead
+			 * assert that the observed value lies outside the
+			 * range (high, low).
+			 */
+			if (low > high) {
+				/*
+				 * Swap low and high such that the variable
+				 * names correctly imply their value.
+				 */
+				swap(low, high);
+				check_value_bounds_overflow("L1 TSC offset",
+							    stage, low, high,
+							    tsc_offset);
+			} else {
+				check_value_bounds_signed("L1 TSC offset",
+							  stage, low, high,
+							  tsc_offset);
+			}
+
+		/*
+		 * Write the TSC offset while in guest mode
+		 */
+		} else if (nested && stage == 3) {
+			TEST_ASSERT(!set_tsc_offset(vm, VCPU_ID, L1_TSC_OFFSET),
+				    "Failed to set TSC offset");
+
+		/*
+		 * Check that the write to TSC offset affects L2's perception of
+		 * the TSC
+		 */
+		} else if (nested && stage == 4) {
+			exp_low = start + L1_TSC_OFFSET + L2_TSC_OFFSET;
+			exp_high = stop + L1_TSC_OFFSET + L2_TSC_OFFSET;
+
+			check_value_bounds("L2 TSC", stage, exp_low, exp_high,
+					   uc.args[2]);
+
+		/*
+		 * Check that the modified TSC offset is also observed in L1
+		 */
+		} else if (nested && stage == 5) {
+			exp_low = start + L1_TSC_OFFSET;
+			exp_high = stop + L1_TSC_OFFSET;
+
+			check_value_bounds("L1 TSC", stage, exp_low, exp_high,
+					   uc.args[2]);
+		} else {
+			TEST_FAIL("Unexpected stage %d\n", stage);
+		}
+	}
+done:
+	kvm_vm_free(vm);
+	return 0;
+}
-- 
2.27.0.383.g050319c2ae-goog



* Re: [PATCH 1/4] kvm: x86: add KVM_{GET,SET}_TSC_OFFSET ioctls
From: Paolo Bonzini @ 2020-07-10 20:38 UTC
  To: Oliver Upton, kvm
  Cc: Peter Shier, Sean Christopherson, Peter Hornyack, Jim Mattson

On 10/07/20 22:07, Oliver Upton wrote:
> From: Peter Hornyack <peterhornyack@google.com>
> 
> The KVM_SET_MSR vcpu ioctl has some temporal and value-based heuristics
> for determining when userspace is attempting to synchronize TSCs.
> Instead of guessing at userspace's intentions in the kernel, directly
> expose control of the TSC offset field to userspace such that userspace
> may deliberately synchronize the guest TSCs.
> 
> Note that TSC offset support is mandatory for KVM on both SVM and VMX.
> 
> Reviewed-by: Jim Mattson <jmattson@google.com>
> Signed-off-by: Peter Hornyack <peterhornyack@google.com>
> Signed-off-by: Oliver Upton <oupton@google.com>
> ---
>  Documentation/virt/kvm/api.rst | 27 +++++++++++++++++++++++++++
>  arch/x86/kvm/x86.c             | 28 ++++++++++++++++++++++++++++
>  include/uapi/linux/kvm.h       |  5 +++++
>  3 files changed, 60 insertions(+)

Needless to say, a patch that comes with tests starts on the fast lane.
But I have a fundamental question that isn't answered by either the
test or the documentation: how should KVM_SET_TSC_OFFSET be used _in
practice_ by a VMM?

Thanks,

Paolo



* Re: [PATCH 1/4] kvm: x86: add KVM_{GET,SET}_TSC_OFFSET ioctls
From: Jim Mattson @ 2020-07-10 22:09 UTC
  To: Paolo Bonzini
  Cc: Oliver Upton, kvm list, Peter Shier, Sean Christopherson, Peter Hornyack

On Fri, Jul 10, 2020 at 1:38 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 10/07/20 22:07, Oliver Upton wrote:
> > From: Peter Hornyack <peterhornyack@google.com>
> >
> > The KVM_SET_MSR vcpu ioctl has some temporal and value-based heuristics
> > for determining when userspace is attempting to synchronize TSCs.
> > Instead of guessing at userspace's intentions in the kernel, directly
> > expose control of the TSC offset field to userspace such that userspace
> > may deliberately synchronize the guest TSCs.
> >
> > Note that TSC offset support is mandatory for KVM on both SVM and VMX.
> >
> > Reviewed-by: Jim Mattson <jmattson@google.com>
> > Signed-off-by: Peter Hornyack <peterhornyack@google.com>
> > Signed-off-by: Oliver Upton <oupton@google.com>
> > ---
> >  Documentation/virt/kvm/api.rst | 27 +++++++++++++++++++++++++++
> >  arch/x86/kvm/x86.c             | 28 ++++++++++++++++++++++++++++
> >  include/uapi/linux/kvm.h       |  5 +++++
> >  3 files changed, 60 insertions(+)
>
> Needless to say, a patch that comes with tests starts on the fast lane.
>  But I have a fundamental question that isn't answered by either the
> test or the documentation: how should KVM_SET_TSC_OFFSET be used _in
> practice_ by a VMM?

One could either omit IA32_TIME_STAMP_COUNTER from KVM_SET_MSRS, or
one could call KVM_SET_TSC_OFFSET after KVM_SET_MSRS. We do the
former.

This isn't the only undocumented dependency among the various
KVM_SET_* calls, but I agree that it would be helpful to document it.


* Re: [PATCH 1/4] kvm: x86: add KVM_{GET,SET}_TSC_OFFSET ioctls
From: Oliver Upton @ 2020-07-10 22:37 UTC
  To: Jim Mattson
  Cc: Paolo Bonzini, kvm list, Peter Shier, Sean Christopherson,
	Peter Hornyack

On Fri, Jul 10, 2020 at 3:09 PM Jim Mattson <jmattson@google.com> wrote:
>
> On Fri, Jul 10, 2020 at 1:38 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
> >
> > On 10/07/20 22:07, Oliver Upton wrote:
> > > From: Peter Hornyack <peterhornyack@google.com>
> > >
> > > The KVM_SET_MSR vcpu ioctl has some temporal and value-based heuristics
> > > for determining when userspace is attempting to synchronize TSCs.
> > > Instead of guessing at userspace's intentions in the kernel, directly
> > > expose control of the TSC offset field to userspace such that userspace
> > > may deliberately synchronize the guest TSCs.
> > >
> > > Note that TSC offset support is mandatory for KVM on both SVM and VMX.
> > >
> > > Reviewed-by: Jim Mattson <jmattson@google.com>
> > > Signed-off-by: Peter Hornyack <peterhornyack@google.com>
> > > Signed-off-by: Oliver Upton <oupton@google.com>
> > > ---
> > >  Documentation/virt/kvm/api.rst | 27 +++++++++++++++++++++++++++
> > >  arch/x86/kvm/x86.c             | 28 ++++++++++++++++++++++++++++
> > >  include/uapi/linux/kvm.h       |  5 +++++
> > >  3 files changed, 60 insertions(+)
> >
> > Needless to say, a patch that comes with tests starts on the fast lane.
> >  But I have a fundamental question that isn't answered by either the
> > test or the documentation: how should KVM_SET_TSC_OFFSET be used _in
> > practice_ by a VMM?
>
> One could either omit IA32_TIME_STAMP_COUNTER from KVM_SET_MSRS, or
> one could call KVM_SET_TSC_OFFSET after KVM_SET_MSRS. We do the
> former.
>
> This isn't the only undocumented dependency among the various
> KVM_SET_* calls, but I agree that it would be helpful to document it.

Sorry, I was AFK for a bit but it seems Jim beat me to the punch :-)

I'd be glad to add a bit of color on this to the documentation.


* Re: [PATCH 1/4] kvm: x86: add KVM_{GET,SET}_TSC_OFFSET ioctls
From: Paolo Bonzini @ 2020-07-10 22:40 UTC
  To: Jim Mattson
  Cc: Oliver Upton, kvm list, Peter Shier, Sean Christopherson, Peter Hornyack

On 11/07/20 00:09, Jim Mattson wrote:
>>  But I have a fundamental question that isn't answered by either the
>> test or the documentation: how should KVM_SET_TSC_OFFSET be used _in
>> practice_ by a VMM?
>
> One could either omit IA32_TIME_STAMP_COUNTER from KVM_SET_MSRS, or
> one could call KVM_SET_TSC_OFFSET after KVM_SET_MSRS. We do the
> former.

Other questions:

1) how do you handle non-synchronized TSC between source and destination?

2) how is KVM_REQ_MASTERCLOCK_UPDATE triggered, since that's the main
function of the various heuristics?

Paolo



* Re: [PATCH 1/4] kvm: x86: add KVM_{GET,SET}_TSC_OFFSET ioctls
From: Oliver Upton @ 2020-07-13 19:54 UTC
  To: Paolo Bonzini
  Cc: Jim Mattson, kvm list, Peter Shier, Sean Christopherson, Peter Hornyack

On Fri, Jul 10, 2020 at 3:40 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 11/07/20 00:09, Jim Mattson wrote:
> >>  But I have a fundamental question that isn't answered by either the
> >> test or the documentation: how should KVM_SET_TSC_OFFSET be used _in
> >> practice_ by a VMM?
> >
> > One could either omit IA32_TIME_STAMP_COUNTER from KVM_SET_MSRS, or
> > one could call KVM_SET_TSC_OFFSET after KVM_SET_MSRS. We do the
> > former.
>
> Other questions:
>
> 1) how do you handle non-synchronized TSC between source and destination?
>

On the source side we record a host TSC reading that establishes the
start of migration, along with each vCPU's TSC offset. When restoring
the VM we recalculate offsets that compensate for both the elapsed
time and the TSC difference between the source and target hosts.
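
Roughly (a sketch; the names and the nanosecond bookkeeping here are
illustrative):

	/* At save time, on the source host: */
	guest_tsc = src_host_tsc + src_offset;

	/* At restore time, credit the guest for the elapsed migration
	 * time and recompute against the target host's TSC: */
	dst_offset = guest_tsc + (elapsed_ns * tsc_khz) / 1000000
		     - dst_host_tsc;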

> 2) how is KVM_REQ_MASTERCLOCK_UPDATE triggered, since that's the main
> function of the various heuristics?

We initialize all TSCs to 0 at VM creation, then restore offsets (if
any). Should the offsets be out of phase, we write a garbage value to
the TSC MSR to break the vCPU matching, then set the offset. It's not
the most elegant solution; it'd be better if the {GET,SET}_TSC_OFFSET
ioctls handled this directly.
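
In pseudocode (again, all names illustrative):

	if (offsets_out_of_phase) {
		/* Garbage TSC write to break KVM's vCPU matching... */
		vcpu_set_msr(vcpu_fd, MSR_IA32_TSC, garbage);
		/* ...then install the real offset. */
		ioctl(vcpu_fd, KVM_SET_TSC_OFFSET, &offset);
	}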

>
> Paolo
>
