* [PATCH v3 0/5] KVM_{GET,SET}_TSC_OFFSET ioctls
@ 2020-07-22  3:26 Oliver Upton
  2020-07-22  3:26 ` [PATCH v3 1/5] kvm: x86: refactor masterclock sync heuristics out of kvm_write_tsc Oliver Upton
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Oliver Upton @ 2020-07-22  3:26 UTC
  To: kvm
  Cc: Paolo Bonzini, Peter Shier, Sean Christopherson, Peter Hornyack,
	Oliver Upton

To date, VMMs have typically restored the guest's TSCs by value using
the KVM_SET_MSRS ioctl for each vCPU. However, restoring the TSCs by
value introduces some challenges with synchronization as the TSCs
continue to tick throughout the restoration process. As such, KVM has
some heuristics around TSC writes to infer whether or not the guest or
host is attempting to synchronize the TSCs.

Instead of guessing at the intentions of a VMM, it'd be better to
provide an interface that allows for explicit synchronization of the
guest's TSCs. To that end, this series introduces the
KVM_{GET,SET}_TSC_OFFSET ioctls, yielding control of the TSC offset to
userspace.
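
As an illustration (a sketch, not part of the series itself), a VMM
could drive the new ioctls as follows. The vcpu_fd plumbing and error
handling are assumptions for the example, and KVM_CAP_TSC_OFFSET
should be checked before use:

  #include <linux/kvm.h>
  #include <sys/ioctl.h>

  /* Save a vCPU's current TSC offset (e.g. on the migration source). */
  static int save_tsc_offset(int vcpu_fd, __u64 *offset)
  {
          return ioctl(vcpu_fd, KVM_GET_TSC_OFFSET, offset);
  }

  /*
   * Restore the offset on the destination; writing the same offset to
   * every vCPU leaves the guest's TSCs explicitly synchronized.
   */
  static int restore_tsc_offset(int vcpu_fd, __u64 offset)
  {
          return ioctl(vcpu_fd, KVM_SET_TSC_OFFSET, &offset);
  }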

v2 => v3:
 - Mark kvm_write_tsc_offset() as static (whoops)

v1 => v2:
 - Added clarification to the documentation of KVM_SET_TSC_OFFSET to
   indicate that it can be used instead of an IA32_TSC MSR restore
   through KVM_SET_MSRS
 - Fixed KVM_SET_TSC_OFFSET to participate in the existing TSC
   synchronization heuristics, thereby enabling the KVM masterclock when
   all vCPUs are in phase.

Oliver Upton (4):
  kvm: x86: refactor masterclock sync heuristics out of kvm_write_tsc
  kvm: vmx: check tsc offsetting with nested_cpu_has()
  selftests: kvm: use a helper function for reading cpuid
  selftests: kvm: introduce tsc_offset_test

Peter Hornyack (1):
  kvm: x86: add KVM_{GET,SET}_TSC_OFFSET ioctls

 Documentation/virt/kvm/api.rst                |  31 ++
 arch/x86/include/asm/kvm_host.h               |   1 +
 arch/x86/kvm/vmx/vmx.c                        |   2 +-
 arch/x86/kvm/x86.c                            | 147 ++++---
 include/uapi/linux/kvm.h                      |   5 +
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../testing/selftests/kvm/include/test_util.h |   3 +
 .../selftests/kvm/include/x86_64/processor.h  |  15 +
 .../selftests/kvm/include/x86_64/svm_util.h   |  10 +-
 .../selftests/kvm/include/x86_64/vmx.h        |   9 +
 tools/testing/selftests/kvm/lib/kvm_util.c    |   1 +
 tools/testing/selftests/kvm/lib/x86_64/vmx.c  |  11 +
 .../selftests/kvm/x86_64/tsc_offset_test.c    | 362 ++++++++++++++++++
 14 files changed, 550 insertions(+), 49 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/x86_64/tsc_offset_test.c

-- 
2.28.0.rc0.142.g3c755180ce-goog



* [PATCH v3 1/5] kvm: x86: refactor masterclock sync heuristics out of kvm_write_tsc
  2020-07-22  3:26 [PATCH v3 0/5] KVM_{GET,SET}_TSC_OFFSET ioctls Oliver Upton
@ 2020-07-22  3:26 ` Oliver Upton
  2020-07-22  3:26 ` [PATCH v3 2/5] kvm: x86: add KVM_{GET,SET}_TSC_OFFSET ioctls Oliver Upton
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Oliver Upton @ 2020-07-22  3:26 UTC
  To: kvm
  Cc: Paolo Bonzini, Peter Shier, Sean Christopherson, Peter Hornyack,
	Oliver Upton, Jim Mattson

kvm_write_tsc() determines when the host or guest is attempting to
synchronize the TSCs and does some bookkeeping for tracking this. Retain
the existing sync checks (host is writing 0 or TSCs being written within
a second of one another) in kvm_write_tsc(), but split off the
bookkeeping and TSC offset write into a new function. This allows for a
new ioctl to be introduced, yielding control of the TSC offset field to
userspace in a manner that respects the existing masterclock heuristics.
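
For reference, the retained sync checks look roughly like the sketch
below; this is a paraphrase of the logic in kvm_write_tsc(), not an
exact quote of it:

  u64 tsc_exp = kvm->arch.last_tsc_write + nsec_to_cycles(vcpu, elapsed);
  u64 tsc_hz = vcpu->arch.virtual_tsc_khz * 1000LL;

  /*
   * A host-initiated write of 0, or any write landing within one
   * second of virtual time of the value expected from the last
   * write, is treated as a synchronization attempt.
   */
  synchronizing = (data == 0 && msr->host_initiated) ||
                  (data < tsc_exp + tsc_hz && data + tsc_hz > tsc_exp);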

Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
---
 arch/x86/kvm/x86.c | 98 +++++++++++++++++++++++++---------------------
 1 file changed, 54 insertions(+), 44 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e27d3db7e43f..a5c88f747da6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2045,13 +2045,64 @@ static inline bool kvm_check_tsc_unstable(void)
 	return check_tsc_unstable();
 }
 
+/*
+ * Sets the TSC offset for the vCPU and tracks the TSC matching generation.
+ * The caller must hold kvm->arch.tsc_write_lock.
+ */
+static void kvm_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset, u64 tsc,
+				 u64 ns, bool matched)
+{
+	struct kvm *kvm = vcpu->kvm;
+	bool already_matched;
+
+	already_matched = (vcpu->arch.this_tsc_generation == kvm->arch.cur_tsc_generation);
+	/*
+	 * We also track the most recent recorded KHZ, write and time to
+	 * allow the matching interval to be extended at each write.
+	 */
+	kvm->arch.last_tsc_nsec = ns;
+	kvm->arch.last_tsc_write = tsc;
+	kvm->arch.last_tsc_khz = vcpu->arch.virtual_tsc_khz;
+
+	vcpu->arch.last_guest_tsc = tsc;
+
+	/* Keep track of which generation this VCPU has synchronized to */
+	vcpu->arch.this_tsc_generation = kvm->arch.cur_tsc_generation;
+	vcpu->arch.this_tsc_nsec = kvm->arch.cur_tsc_nsec;
+	vcpu->arch.this_tsc_write = kvm->arch.cur_tsc_write;
+
+	kvm_vcpu_write_tsc_offset(vcpu, offset);
+
+	spin_lock(&kvm->arch.pvclock_gtod_sync_lock);
+	if (!matched) {
+		/*
+		 * We split periods of matched TSC writes into generations.
+		 * For each generation, we track the original measured
+		 * nanosecond time, offset, and write, so if TSCs are in
+		 * sync, we can match exact offset, and if not, we can match
+		 * exact software computation in compute_guest_tsc()
+		 *
+		 * These values are tracked in kvm->arch.cur_xxx variables.
+		 */
+		kvm->arch.nr_vcpus_matched_tsc = 0;
+		kvm->arch.cur_tsc_generation++;
+		kvm->arch.cur_tsc_nsec = ns;
+		kvm->arch.cur_tsc_write = tsc;
+		kvm->arch.cur_tsc_offset = offset;
+	} else if (!already_matched) {
+		kvm->arch.nr_vcpus_matched_tsc++;
+	}
+
+	kvm_track_tsc_matching(vcpu);
+	spin_unlock(&kvm->arch.pvclock_gtod_sync_lock);
+}
+
 void kvm_write_tsc(struct kvm_vcpu *vcpu, struct msr_data *msr)
 {
 	struct kvm *kvm = vcpu->kvm;
 	u64 offset, ns, elapsed;
 	unsigned long flags;
-	bool matched;
-	bool already_matched;
+	bool matched = false;
 	u64 data = msr->data;
 	bool synchronizing = false;
 
@@ -2098,54 +2149,13 @@ void kvm_write_tsc(struct kvm_vcpu *vcpu, struct msr_data *msr)
 			offset = kvm_compute_tsc_offset(vcpu, data);
 		}
 		matched = true;
-		already_matched = (vcpu->arch.this_tsc_generation == kvm->arch.cur_tsc_generation);
-	} else {
-		/*
-		 * We split periods of matched TSC writes into generations.
-		 * For each generation, we track the original measured
-		 * nanosecond time, offset, and write, so if TSCs are in
-		 * sync, we can match exact offset, and if not, we can match
-		 * exact software computation in compute_guest_tsc()
-		 *
-		 * These values are tracked in kvm->arch.cur_xxx variables.
-		 */
-		kvm->arch.cur_tsc_generation++;
-		kvm->arch.cur_tsc_nsec = ns;
-		kvm->arch.cur_tsc_write = data;
-		kvm->arch.cur_tsc_offset = offset;
-		matched = false;
 	}
 
-	/*
-	 * We also track th most recent recorded KHZ, write and time to
-	 * allow the matching interval to be extended at each write.
-	 */
-	kvm->arch.last_tsc_nsec = ns;
-	kvm->arch.last_tsc_write = data;
-	kvm->arch.last_tsc_khz = vcpu->arch.virtual_tsc_khz;
-
-	vcpu->arch.last_guest_tsc = data;
-
-	/* Keep track of which generation this VCPU has synchronized to */
-	vcpu->arch.this_tsc_generation = kvm->arch.cur_tsc_generation;
-	vcpu->arch.this_tsc_nsec = kvm->arch.cur_tsc_nsec;
-	vcpu->arch.this_tsc_write = kvm->arch.cur_tsc_write;
-
 	if (!msr->host_initiated && guest_cpuid_has(vcpu, X86_FEATURE_TSC_ADJUST))
 		update_ia32_tsc_adjust_msr(vcpu, offset);
 
-	kvm_vcpu_write_tsc_offset(vcpu, offset);
+	kvm_write_tsc_offset(vcpu, offset, data, ns, matched);
 	raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
-
-	spin_lock(&kvm->arch.pvclock_gtod_sync_lock);
-	if (!matched) {
-		kvm->arch.nr_vcpus_matched_tsc = 0;
-	} else if (!already_matched) {
-		kvm->arch.nr_vcpus_matched_tsc++;
-	}
-
-	kvm_track_tsc_matching(vcpu);
-	spin_unlock(&kvm->arch.pvclock_gtod_sync_lock);
 }
 
 EXPORT_SYMBOL_GPL(kvm_write_tsc);
-- 
2.28.0.rc0.142.g3c755180ce-goog



* [PATCH v3 2/5] kvm: x86: add KVM_{GET,SET}_TSC_OFFSET ioctls
  2020-07-22  3:26 [PATCH v3 0/5] KVM_{GET,SET}_TSC_OFFSET ioctls Oliver Upton
  2020-07-22  3:26 ` [PATCH v3 1/5] kvm: x86: refactor masterclock sync heuristics out of kvm_write_tsc Oliver Upton
@ 2020-07-22  3:26 ` Oliver Upton
  2020-07-22  3:26 ` [PATCH v3 3/5] kvm: vmx: check tsc offsetting with nested_cpu_has() Oliver Upton
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Oliver Upton @ 2020-07-22  3:26 UTC
  To: kvm
  Cc: Paolo Bonzini, Peter Shier, Sean Christopherson, Peter Hornyack,
	Jim Mattson, Oliver Upton

From: Peter Hornyack <peterhornyack@google.com>

The KVM_SET_MSR vcpu ioctl has some temporal and value-based heuristics
for determining when userspace is attempting to synchronize TSCs.
Instead of guessing at userspace's intentions in the kernel, directly
expose control of the TSC offset field to userspace such that userspace
may deliberately synchronize the guest TSCs.

Note that TSC offset support is mandatory for KVM on both SVM and VMX.
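
For illustration, assuming no TSC scaling (so the guest observes
guest_tsc = host_tsc + tsc_offset), a VMM that wants a vCPU to read a
particular TSC value "now" could compute the offset as in this sketch;
desired_guest_tsc and vcpu_fd are placeholders:

  #include <x86intrin.h>

  /* The offset is the desired guest TSC minus the host TSC right now. */
  __u64 offset = desired_guest_tsc - __rdtsc();
  ioctl(vcpu_fd, KVM_SET_TSC_OFFSET, &offset);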

Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Signed-off-by: Peter Hornyack <peterhornyack@google.com>
[oupton - expanded docs, fixed KVM master clock interaction]
Signed-off-by: Oliver Upton <oupton@google.com>
---
 Documentation/virt/kvm/api.rst  | 31 +++++++++++++++++++++
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/x86.c              | 49 +++++++++++++++++++++++++++++++++
 include/uapi/linux/kvm.h        |  5 ++++
 4 files changed, 86 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 644e5326aa50..7ecef103cc7f 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -4701,6 +4701,37 @@ KVM_PV_VM_VERIFY
   KVM is allowed to start protected VCPUs.
 
 
+4.126 KVM_GET_TSC_OFFSET
+------------------------
+
+:Capability: KVM_CAP_TSC_OFFSET
+:Architectures: x86
+:Type: vcpu ioctl
+:Parameters: __u64 (out)
+:Returns: 0 on success, < 0 on error
+
+This ioctl gets the TSC offset field. The offset is returned as an
+unsigned value, though it is interpreted as a signed value by hardware.
+
+
+4.127 KVM_SET_TSC_OFFSET
+------------------------
+
+:Capability: KVM_CAP_TSC_OFFSET
+:Architectures: x86
+:Type: vcpu ioctl
+:Parameters: __u64 (in)
+:Returns: 0 on success, < 0 on error
+
+This ioctl sets the TSC offset field. The offset is represented as an
+unsigned value, though it is interpreted as a signed value by hardware.
+The guest's TSC value will change based on the written offset. A
+userspace VMM may use this ioctl to restore the guest's TSC instead of
+KVM_SET_MSRS, thereby avoiding the in-kernel TSC synchronization
+detection. Further, a userspace VMM may deliberately synchronize
+TSCs by setting the same offset value for all vCPUs.
+
+
 5. The kvm_run structure
 ========================
 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5aaef036627f..0d0e3670fffe 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -920,6 +920,7 @@ struct kvm_arch {
 	u64 last_tsc_nsec;
 	u64 last_tsc_write;
 	u32 last_tsc_khz;
+	u64 last_tsc_offset;
 	u64 cur_tsc_nsec;
 	u64 cur_tsc_write;
 	u64 cur_tsc_offset;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a5c88f747da6..98dbe98874de 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2026,6 +2026,11 @@ u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc)
 }
 EXPORT_SYMBOL_GPL(kvm_read_l1_tsc);
 
+static u64 kvm_vcpu_read_tsc_offset(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.l1_tsc_offset;
+}
+
 static void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
 {
 	vcpu->arch.l1_tsc_offset = offset;
@@ -2063,6 +2068,7 @@ static void kvm_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset, u64 tsc,
 	kvm->arch.last_tsc_nsec = ns;
 	kvm->arch.last_tsc_write = tsc;
 	kvm->arch.last_tsc_khz = vcpu->arch.virtual_tsc_khz;
+	kvm->arch.last_tsc_offset = offset;
 
 	vcpu->arch.last_guest_tsc = tsc;
 
@@ -2160,6 +2166,26 @@ void kvm_write_tsc(struct kvm_vcpu *vcpu, struct msr_data *msr)
 
 EXPORT_SYMBOL_GPL(kvm_write_tsc);
 
+static void kvm_vcpu_ioctl_set_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
+{
+	struct kvm *kvm = vcpu->kvm;
+	unsigned long flags;
+	bool matched;
+	u64 tsc, ns;
+
+	raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags);
+
+	matched = (kvm->arch.last_tsc_offset == offset &&
+		   vcpu->arch.virtual_tsc_khz &&
+		   kvm->arch.last_tsc_khz == vcpu->arch.virtual_tsc_khz);
+
+	tsc = kvm_scale_tsc(vcpu, rdtsc()) + offset;
+	ns = get_kvmclock_base_ns();
+
+	kvm_write_tsc_offset(vcpu, offset, tsc, ns, matched);
+	raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
+}
+
 static inline void adjust_tsc_offset_guest(struct kvm_vcpu *vcpu,
 					   s64 adjustment)
 {
@@ -3492,6 +3518,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_HYPERV_TIME:
 	case KVM_CAP_IOAPIC_POLARITY_IGNORED:
 	case KVM_CAP_TSC_DEADLINE_TIMER:
+	case KVM_CAP_TSC_OFFSET:
 	case KVM_CAP_DISABLE_QUIRKS:
 	case KVM_CAP_SET_BOOT_CPU_ID:
  	case KVM_CAP_SPLIT_IRQCHIP:
@@ -4744,6 +4771,28 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 		r = 0;
 		break;
 	}
+	case KVM_GET_TSC_OFFSET: {
+		u64 tsc_offset;
+
+		r = -EFAULT;
+		tsc_offset = kvm_vcpu_read_tsc_offset(vcpu);
+		if (copy_to_user(argp, &tsc_offset, sizeof(tsc_offset)))
+			goto out;
+		r = 0;
+		break;
+	}
+	case KVM_SET_TSC_OFFSET: {
+		u64 __user *tsc_offset_arg = argp;
+		u64 tsc_offset;
+
+		r = -EFAULT;
+		if (copy_from_user(&tsc_offset, tsc_offset_arg,
+				   sizeof(tsc_offset)))
+			goto out;
+		kvm_vcpu_ioctl_set_tsc_offset(vcpu, tsc_offset);
+		r = 0;
+		break;
+	}
 	default:
 		r = -EINVAL;
 	}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index ff9b335620d0..41f387ffcd11 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1033,6 +1033,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_HALT_POLL 182
 #define KVM_CAP_ASYNC_PF_INT 183
 #define KVM_CAP_LAST_CPU 184
+#define KVM_CAP_TSC_OFFSET 185
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1501,6 +1502,10 @@ struct kvm_enc_region {
 #define KVM_S390_NORMAL_RESET	_IO(KVMIO,   0xc3)
 #define KVM_S390_CLEAR_RESET	_IO(KVMIO,   0xc4)
 
+/* Available with KVM_CAP_TSC_OFFSET */
+#define KVM_GET_TSC_OFFSET	_IOR(KVMIO, 0xc5, __u64)
+#define KVM_SET_TSC_OFFSET	_IOW(KVMIO, 0xc6, __u64)
+
 struct kvm_s390_pv_sec_parm {
 	__u64 origin;
 	__u64 length;
-- 
2.28.0.rc0.142.g3c755180ce-goog



* [PATCH v3 3/5] kvm: vmx: check tsc offsetting with nested_cpu_has()
  2020-07-22  3:26 [PATCH v3 0/5] KVM_{GET,SET}_TSC_OFFSET ioctls Oliver Upton
  2020-07-22  3:26 ` [PATCH v3 1/5] kvm: x86: refactor masterclock sync heuristics out of kvm_write_tsc Oliver Upton
  2020-07-22  3:26 ` [PATCH v3 2/5] kvm: x86: add KVM_{GET,SET}_TSC_OFFSET ioctls Oliver Upton
@ 2020-07-22  3:26 ` Oliver Upton
  2020-07-22  3:26 ` [PATCH v3 4/5] selftests: kvm: use a helper function for reading cpuid Oliver Upton
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Oliver Upton @ 2020-07-22  3:26 UTC
  To: kvm
  Cc: Paolo Bonzini, Peter Shier, Sean Christopherson, Peter Hornyack,
	Oliver Upton, Jim Mattson

No functional change intended.

Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
---
 arch/x86/kvm/vmx/vmx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 2b41d987b101..fa201010dbe0 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1775,7 +1775,7 @@ static u64 vmx_write_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
 	 * to the newly set TSC to get L2's TSC.
 	 */
 	if (is_guest_mode(vcpu) &&
-	    (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETTING))
+	    nested_cpu_has(vmcs12, CPU_BASED_USE_TSC_OFFSETTING))
 		g_tsc_offset = vmcs12->tsc_offset;
 
 	trace_kvm_write_tsc_offset(vcpu->vcpu_id,
-- 
2.28.0.rc0.142.g3c755180ce-goog



* [PATCH v3 4/5] selftests: kvm: use a helper function for reading cpuid
  2020-07-22  3:26 [PATCH v3 0/5] KVM_{GET,SET}_TSC_OFFSET ioctls Oliver Upton
                   ` (2 preceding siblings ...)
  2020-07-22  3:26 ` [PATCH v3 3/5] kvm: vmx: check tsc offsetting with nested_cpu_has() Oliver Upton
@ 2020-07-22  3:26 ` Oliver Upton
  2020-07-22  3:26 ` [PATCH v3 5/5] selftests: kvm: introduce tsc_offset_test Oliver Upton
  2020-07-28 16:33 ` [PATCH v3 0/5] KVM_{GET,SET}_TSC_OFFSET ioctls Oliver Upton
  5 siblings, 0 replies; 17+ messages in thread
From: Oliver Upton @ 2020-07-22  3:26 UTC
  To: kvm
  Cc: Paolo Bonzini, Peter Shier, Sean Christopherson, Peter Hornyack,
	Oliver Upton, Jim Mattson

Derived from kvm-unit-tests.

Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
---
 .../selftests/kvm/include/x86_64/processor.h      | 15 +++++++++++++++
 .../selftests/kvm/include/x86_64/svm_util.h       |  7 ++-----
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index 82b7fe16a824..316fbd4700ef 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -338,6 +338,21 @@ uint32_t kvm_get_cpuid_max_basic(void);
 uint32_t kvm_get_cpuid_max_extended(void);
 void kvm_get_cpu_address_width(unsigned int *pa_bits, unsigned int *va_bits);
 
+struct cpuid {
+	u32 a, b, c, d;
+};
+
+static inline struct cpuid raw_cpuid(u32 function, u32 index)
+{
+	struct cpuid r;
+
+	asm volatile("cpuid"
+		     : "=a"(r.a), "=b"(r.b), "=c"(r.c), "=d"(r.d)
+		     : "0"(function), "2"(index));
+
+	return r;
+}
+
 /*
  * Basic CPU control in CR0
  */
diff --git a/tools/testing/selftests/kvm/include/x86_64/svm_util.h b/tools/testing/selftests/kvm/include/x86_64/svm_util.h
index b7531c83b8ae..47a13aaee460 100644
--- a/tools/testing/selftests/kvm/include/x86_64/svm_util.h
+++ b/tools/testing/selftests/kvm/include/x86_64/svm_util.h
@@ -38,12 +38,9 @@ void nested_svm_check_supported(void);
 
 static inline bool cpu_has_svm(void)
 {
-	u32 eax = 0x80000001, ecx;
+	struct cpuid r = raw_cpuid(0x80000001, 0);
 
-	asm("cpuid" :
-	    "=a" (eax), "=c" (ecx) : "0" (eax) : "ebx", "edx");
-
-	return ecx & CPUID_SVM;
+	return r.c & CPUID_SVM;
 }
 
 #endif /* SELFTEST_KVM_SVM_UTILS_H */
-- 
2.28.0.rc0.142.g3c755180ce-goog



* [PATCH v3 5/5] selftests: kvm: introduce tsc_offset_test
  2020-07-22  3:26 [PATCH v3 0/5] KVM_{GET,SET}_TSC_OFFSET ioctls Oliver Upton
                   ` (3 preceding siblings ...)
  2020-07-22  3:26 ` [PATCH v3 4/5] selftests: kvm: use a helper function for reading cpuid Oliver Upton
@ 2020-07-22  3:26 ` Oliver Upton
  2020-07-28 16:33 ` [PATCH v3 0/5] KVM_{GET,SET}_TSC_OFFSET ioctls Oliver Upton
  5 siblings, 0 replies; 17+ messages in thread
From: Oliver Upton @ 2020-07-22  3:26 UTC
  To: kvm
  Cc: Paolo Bonzini, Peter Shier, Sean Christopherson, Peter Hornyack,
	Oliver Upton, Jim Mattson

Test the KVM_{GET,SET}_TSC_OFFSET ioctls. Ensure the following:
 - KVM_GET_TSC_OFFSET returns the value written by KVM_SET_TSC_OFFSET
 - The L1 TSC is appropriately adjusted based on the written value
 - Guest manipulation of the TSC results in a changed offset in
   KVM_GET_TSC_OFFSET
 - Modifying the TSC offset while in guest mode affects L2 guest
 - Modification in guest mode is also reflected in L1

Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
---
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../testing/selftests/kvm/include/test_util.h |   3 +
 .../selftests/kvm/include/x86_64/svm_util.h   |   5 +
 .../selftests/kvm/include/x86_64/vmx.h        |   9 +
 tools/testing/selftests/kvm/lib/kvm_util.c    |   1 +
 tools/testing/selftests/kvm/lib/x86_64/vmx.c  |  11 +
 .../selftests/kvm/x86_64/tsc_offset_test.c    | 362 ++++++++++++++++++
 8 files changed, 393 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86_64/tsc_offset_test.c

diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
index 452787152748..f5608692b43a 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -11,6 +11,7 @@
 /x86_64/set_sregs_test
 /x86_64/smm_test
 /x86_64/state_test
+/x86_64/tsc_offset_test
 /x86_64/vmx_preemption_timer_test
 /x86_64/svm_vmcall_test
 /x86_64/sync_regs_test
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 4a166588d99f..03dfaa49c3c4 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -55,6 +55,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/vmx_set_nested_state_test
 TEST_GEN_PROGS_x86_64 += x86_64/vmx_tsc_adjust_test
 TEST_GEN_PROGS_x86_64 += x86_64/xss_msr_test
 TEST_GEN_PROGS_x86_64 += x86_64/debug_regs
+TEST_GEN_PROGS_x86_64 += x86_64/tsc_offset_test
 TEST_GEN_PROGS_x86_64 += clear_dirty_log_test
 TEST_GEN_PROGS_x86_64 += demand_paging_test
 TEST_GEN_PROGS_x86_64 += dirty_log_test
diff --git a/tools/testing/selftests/kvm/include/test_util.h b/tools/testing/selftests/kvm/include/test_util.h
index 5eb01bf51b86..cb6a76e2b39a 100644
--- a/tools/testing/selftests/kvm/include/test_util.h
+++ b/tools/testing/selftests/kvm/include/test_util.h
@@ -65,4 +65,7 @@ struct timespec timespec_add_ns(struct timespec ts, int64_t ns);
 struct timespec timespec_add(struct timespec ts1, struct timespec ts2);
 struct timespec timespec_sub(struct timespec ts1, struct timespec ts2);
 
+#define swap(a, b) \
+	do { typeof(a) __tmp = (a); (a) = (b); (b) = __tmp; } while (0)
+
 #endif /* SELFTEST_KVM_TEST_UTIL_H */
diff --git a/tools/testing/selftests/kvm/include/x86_64/svm_util.h b/tools/testing/selftests/kvm/include/x86_64/svm_util.h
index 47a13aaee460..ef2791be3f6b 100644
--- a/tools/testing/selftests/kvm/include/x86_64/svm_util.h
+++ b/tools/testing/selftests/kvm/include/x86_64/svm_util.h
@@ -43,4 +43,9 @@ static inline bool cpu_has_svm(void)
 	return r.c & CPUID_SVM;
 }
 
+static inline void vmmcall(void)
+{
+	__asm__ __volatile__("vmmcall");
+}
+
 #endif /* SELFTEST_KVM_SVM_UTILS_H */
diff --git a/tools/testing/selftests/kvm/include/x86_64/vmx.h b/tools/testing/selftests/kvm/include/x86_64/vmx.h
index 16fa21ebb99c..d70bf4f350f8 100644
--- a/tools/testing/selftests/kvm/include/x86_64/vmx.h
+++ b/tools/testing/selftests/kvm/include/x86_64/vmx.h
@@ -602,6 +602,8 @@ struct vmx_pages *vcpu_alloc_vmx(struct kvm_vm *vm, vm_vaddr_t *p_vmx_gva);
 bool prepare_for_vmx_operation(struct vmx_pages *vmx);
 void prepare_vmcs(struct vmx_pages *vmx, void *guest_rip, void *guest_rsp);
 bool load_vmcs(struct vmx_pages *vmx);
+void generic_vmx_setup(struct vmx_pages *vmx, void *guest_rip,
+		       void *guest_rsp);
 
 bool nested_vmx_supported(void);
 void nested_vmx_check_supported(void);
@@ -616,4 +618,11 @@ void nested_map_memslot(struct vmx_pages *vmx, struct kvm_vm *vm,
 void prepare_eptp(struct vmx_pages *vmx, struct kvm_vm *vm,
 		  uint32_t eptp_memslot);
 
+static inline bool cpu_has_vmx(void)
+{
+	struct cpuid r = raw_cpuid(1, 0);
+
+	return r.c & CPUID_VMX;
+}
+
 #endif /* SELFTEST_KVM_VMX_H */
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 74776ee228f2..1b9bd2ed58c8 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -23,6 +23,7 @@
 static void *align(void *x, size_t size)
 {
 	size_t mask = size - 1;
+
 	TEST_ASSERT(size != 0 && !(size & (size - 1)),
 		    "size not a power of 2: %lu", size);
 	return (void *) (((size_t) x + mask) & ~mask);
diff --git a/tools/testing/selftests/kvm/lib/x86_64/vmx.c b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
index f1e00d43eea2..a5fb3bca0219 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/vmx.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
@@ -542,3 +542,14 @@ void prepare_eptp(struct vmx_pages *vmx, struct kvm_vm *vm,
 	vmx->eptp_hva = addr_gva2hva(vm, (uintptr_t)vmx->eptp);
 	vmx->eptp_gpa = addr_gva2gpa(vm, (uintptr_t)vmx->eptp);
 }
+
+void generic_vmx_setup(struct vmx_pages *vmx_pages, void *guest_rip,
+		       void *guest_rsp)
+{
+	GUEST_ASSERT(vmx_pages && vmx_pages->vmcs_gpa);
+	GUEST_ASSERT(prepare_for_vmx_operation(vmx_pages));
+	GUEST_ASSERT(load_vmcs(vmx_pages));
+	GUEST_ASSERT(vmptrstz() == vmx_pages->vmcs_gpa);
+	prepare_vmcs(vmx_pages, guest_rip, guest_rsp);
+}
+
diff --git a/tools/testing/selftests/kvm/x86_64/tsc_offset_test.c b/tools/testing/selftests/kvm/x86_64/tsc_offset_test.c
new file mode 100644
index 000000000000..7d39374e82c1
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/tsc_offset_test.c
@@ -0,0 +1,362 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * TSC offset test
+ *
+ * Copyright (C) 2020, Google, LLC.
+ *
+ * Test to ensure that userspace control of the TSC offset field behaves as
+ * expected for both non-nested and nested guests.
+ */
+#include "kvm_util.h"
+#include "processor.h"
+#include "svm_util.h"
+#include "test_util.h"
+#include "vmx.h"
+
+#include "kselftest.h"
+
+#define L1_TSC_WRITE_VALUE 0
+#define L2_GUEST_STACK_SIZE 64
+#define L1_TSC_OFFSET (1ul << 48)
+#define L2_TSC_OFFSET -L1_TSC_OFFSET
+#define VCPU_ID 1
+
+bool vmx;
+
+void set_tsc_offset(struct kvm_vm *vm, u32 vcpuid, u64 val)
+{
+	vcpu_ioctl(vm, vcpuid, KVM_SET_TSC_OFFSET, &val);
+}
+
+void get_tsc_offset(struct kvm_vm *vm, u32 vcpuid, u64 *out)
+{
+	vcpu_ioctl(vm, vcpuid, KVM_GET_TSC_OFFSET, out);
+}
+
+void get_clock(struct kvm_vm *vm, struct kvm_clock_data *out)
+{
+	vm_ioctl(vm, KVM_GET_CLOCK, out);
+}
+
+/*
+ * Test that reading the TSC offset returns the previously written value.
+ */
+void set_get_tsc_offset_test(struct kvm_vm *vm, u32 vcpuid)
+{
+	u64 val;
+
+	set_tsc_offset(vm, vcpuid, L1_TSC_OFFSET);
+	get_tsc_offset(vm, vcpuid, &val);
+	TEST_ASSERT(val == L1_TSC_OFFSET,
+		    "Expected %lu from GET_TSC_OFFSET but got %lu",
+		    L1_TSC_OFFSET, val);
+}
+
+void check_value_bounds(const char *name, int stage, u64 low, u64 high, u64 val)
+{
+	TEST_ASSERT(low <= val && val <= high,
+		    "Stage %d: expected %s value in the range [%lu, %lu] but got %lu",
+		    stage, name, low, high, val);
+
+	/* only reached if passed */
+	pr_info("Stage %d: %s: %lu, expected range: [%lu, %lu]\n", stage, name,
+		val, low, high);
+}
+
+void check_value_bounds_signed(const char *name, int stage, s64 low, s64 high,
+			       s64 val)
+{
+	TEST_ASSERT(low <= val && val <= high,
+		    "Stage %d: expected %s value in the range [%ld, %ld] but got %ld",
+		    stage, name, low, high, val);
+
+	/* only reached if passed */
+	pr_info("Stage %d: %s: %ld, expected range: [%ld, %ld]\n", stage, name,
+		val, low, high);
+}
+
+void check_value_bounds_overflow(const char *name, int stage, s64 low, s64 high,
+				 s64 val)
+{
+	TEST_ASSERT(val <= low || val >= high,
+		    "Stage %d: expected %s value outside the range [%ld, %ld] but got %ld",
+		    stage, name, low, high, val);
+
+	pr_info("Stage %d: %s: %ld, expected range: [-MAX, %ld], [%ld, MAX]\n",
+		stage, name, val, low, high);
+}
+
+void generic_vmcall(void)
+{
+	if (vmx)
+		vmcall();
+	else
+		vmmcall();
+}
+
+void l2_main(void)
+{
+	/* Allow userspace to manipulate the TSC offset */
+	GUEST_SYNC(3);
+	GUEST_SYNC_ARGS(4, rdtsc(), 0, 0, 0);
+	generic_vmcall();
+}
+
+void l0_nested_setup(struct kvm_vm *vm, u32 vcpuid)
+{
+	vm_vaddr_t nested_pages = 0;
+
+	if (vmx)
+		vcpu_alloc_vmx(vm, &nested_pages);
+	else
+		vcpu_alloc_svm(vm, &nested_pages);
+
+	vcpu_args_set(vm, VCPU_ID, 1, nested_pages);
+}
+
+void l1_nested_setup(void *nested_pages, void *guest_stack)
+{
+	if (vmx)
+		generic_vmx_setup(nested_pages, l2_main, guest_stack);
+	else
+		generic_svm_setup(nested_pages, l2_main, guest_stack);
+}
+
+void l1_set_tsc_offset(void *nested_pages, u64 offset)
+{
+	if (vmx) {
+		GUEST_ASSERT(!vmwrite(CPU_BASED_VM_EXEC_CONTROL,
+				      vmreadz(CPU_BASED_VM_EXEC_CONTROL) |
+				      CPU_BASED_USE_TSC_OFFSETTING));
+
+		GUEST_ASSERT(!vmwrite(TSC_OFFSET, offset));
+	} else {
+		struct svm_test_data *svm = nested_pages;
+
+		svm->vmcb->control.tsc_offset = offset;
+		/* Mark the TSC offset field as dirty */
+		svm->vmcb->control.clean &= ~1u;
+	}
+
+}
+
+void l1_enter_guest(void *nested_pages)
+{
+	if (vmx) {
+		/* We only enter L2 once, hence VMLAUNCH */
+		GUEST_ASSERT(!vmlaunch());
+	} else {
+		struct svm_test_data *svm = nested_pages;
+
+		run_guest(svm->vmcb, svm->vmcb_gpa);
+	}
+}
+
+void l1_assert_exit_vmcall(void *nested_pages)
+{
+	if (vmx) {
+		GUEST_ASSERT(vmreadz(VM_EXIT_REASON) == EXIT_REASON_VMCALL);
+	} else {
+		struct svm_test_data *svm = nested_pages;
+
+		GUEST_ASSERT(svm->vmcb->control.exit_code == SVM_EXIT_VMMCALL);
+	}
+}
+
+void l1_main(void *nested_pages)
+{
+	unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
+	bool nested;
+
+	/*
+	 * Guest doesn't share memory with userspace, determine VMX presence
+	 * inside guest.
+	 */
+	vmx = cpu_has_vmx();
+	nested = vmx || cpu_has_svm();
+
+	if (nested) {
+		l1_nested_setup(nested_pages,
+				&l2_guest_stack[L2_GUEST_STACK_SIZE]);
+		l1_set_tsc_offset(nested_pages, L2_TSC_OFFSET);
+	}
+
+	GUEST_SYNC_ARGS(1, rdtsc(), 0, 0, 0);
+
+	wrmsr(MSR_IA32_TSC, L1_TSC_WRITE_VALUE);
+	GUEST_SYNC(2);
+
+	if (!nested)
+		GUEST_DONE();
+
+	l1_enter_guest(nested_pages);
+	l1_assert_exit_vmcall(nested_pages);
+
+	GUEST_SYNC_ARGS(5, rdtsc(), 0, 0, 0);
+	GUEST_DONE();
+}
+
+int main(void)
+{
+	u64 start, stop, exp_low, exp_high;
+	struct kvm_clock_data clock_data;
+	struct kvm_run *run;
+	struct kvm_vm *vm;
+	struct ucall uc;
+	bool nested;
+	int stage;
+
+	if (!kvm_check_cap(KVM_CAP_TSC_OFFSET) ||
+	    !kvm_check_cap(KVM_CAP_ADJUST_CLOCK)) {
+		pr_info("will skip tsc offset tests\n");
+		return 0;
+	}
+
+	/*
+	 * Nested virtualization is not explicitly required for this test, but
+	 * gates the L2 tests.
+	 */
+	vmx = nested_vmx_supported();
+	nested = vmx || nested_svm_supported();
+
+	vm = vm_create_default(VCPU_ID, 0, (void *) l1_main);
+	vcpu_set_cpuid(vm, VCPU_ID, kvm_get_supported_cpuid());
+	run = vcpu_state(vm, VCPU_ID);
+
+	if (nested)
+		l0_nested_setup(vm, VCPU_ID);
+
+	set_get_tsc_offset_test(vm, VCPU_ID);
+
+	for (stage = 1;; stage++) {
+		start = rdtsc();
+		_vcpu_run(vm, VCPU_ID);
+		stop = rdtsc();
+
+		TEST_ASSERT(run->exit_reason == KVM_EXIT_IO,
+			    "Stage %d: unexpected exit reason: %u (%s)\n",
+			    stage, run->exit_reason,
+			    exit_reason_str(run->exit_reason));
+
+		switch (get_ucall(vm, VCPU_ID, &uc)) {
+		case UCALL_ABORT:
+			TEST_FAIL("%s at %s:%ld", (const char *)uc.args[0],
+				  __FILE__, uc.args[1]);
+			/* NOT REACHED */
+		case UCALL_SYNC:
+			break;
+		case UCALL_DONE:
+			goto stage6;
+		default:
+			TEST_FAIL("Unknown ucall %lu", uc.cmd);
+		}
+
+		/*
+		 * Check that the guest's TSC value falls between expected
+		 * bounds, considering the written TSC offset.
+		 */
+		if (stage == 1) {
+			exp_low = start + L1_TSC_OFFSET;
+			exp_high = stop + L1_TSC_OFFSET;
+
+			check_value_bounds("L1 TSC", stage, exp_low, exp_high,
+					   uc.args[2]);
+
+			/*
+			 * KVM interprets writes to the TSC within a second of
+			 * elapsed time as an attempt to synchronize TSCs. In
+			 * order to get a TSC offset within expected bounds for
+			 * stage 2, we must sleep for a second to avoid such
+			 * handling of the TSC write.
+			 */
+			sleep(1);
+		/*
+		 * Check that guest writes to the TSC result in a TSC offset
+		 * value between the expected bounds, considering the original
+		 * TSC offset value.
+		 */
+		} else if (stage == 2) {
+			s64 tsc_offset, low, high;
+
+			low = L1_TSC_WRITE_VALUE - stop;
+			high = L1_TSC_WRITE_VALUE - start;
+
+			get_tsc_offset(vm, VCPU_ID, (u64 *) &tsc_offset);
+
+			/*
+			 * It is possible (though highly unlikely) that the
+			 * host's TSC crosses 2^63 ticks while we are running
+			 * the guest. In this case, the lower bound on the TSC
+			 * offset will have wrapped around into the positive
+			 * domain. In this case, we must instead assert that the
+			 * observed value exists outside of the range (high,
+			 * low).
+			 */
+			if (low > high) {
+				/*
+				 * Swap low and high such that the variable
+				 * names correctly imply their value.
+				 */
+				swap(low, high);
+				check_value_bounds_overflow("L1 TSC offset",
+							    stage, low, high,
+							    tsc_offset);
+			} else {
+				check_value_bounds_signed("L1 TSC offset",
+							  stage, low, high,
+							  tsc_offset);
+			}
+
+		/*
+		 * Write the TSC offset while in guest mode
+		 */
+		} else if (nested && stage == 3) {
+			set_tsc_offset(vm, VCPU_ID, L1_TSC_OFFSET);
+
+		/*
+		 * Check that the write to TSC offset affects L2's perception of
+		 * the TSC
+		 */
+		} else if (nested && stage == 4) {
+			exp_low = start + L1_TSC_OFFSET + L2_TSC_OFFSET;
+			exp_high = stop + L1_TSC_OFFSET + L2_TSC_OFFSET;
+
+			check_value_bounds("L2 TSC", stage, exp_low, exp_high,
+					   uc.args[2]);
+
+		/*
+		 * Check that the modified TSC offset is also observed in L1
+		 */
+		} else if (nested && stage == 5) {
+			exp_low = start + L1_TSC_OFFSET;
+			exp_high = stop + L1_TSC_OFFSET;
+
+			check_value_bounds("L1 TSC", stage, exp_low, exp_high,
+					   uc.args[2]);
+		} else {
+			TEST_FAIL("Unexpected stage %d\n", stage);
+		}
+	}
+
+	/*
+	 * Check that KVM sets the KVM_CLOCK_TSC_STABLE flag when vCPUs have an
+	 * equivalent TSC offset.
+	 */
+stage6:
+	vm_vcpu_add_default(vm, VCPU_ID + 1, NULL);
+	vcpu_set_cpuid(vm, VCPU_ID + 1, kvm_get_supported_cpuid());
+
+	set_tsc_offset(vm, VCPU_ID, L1_TSC_OFFSET);
+	set_tsc_offset(vm, VCPU_ID + 1, L1_TSC_OFFSET);
+	get_clock(vm, &clock_data);
+
+	TEST_ASSERT(clock_data.flags & KVM_CLOCK_TSC_STABLE,
+		    "Stage 6: expected KVM_CLOCK_TSC_STABLE (%#x) flag to be set but got %#x",
+		    KVM_CLOCK_TSC_STABLE, clock_data.flags);
+
+	pr_info("Stage 6: clock_data.flags = %#x, expected KVM_CLOCK_TSC_STABLE (%#x) flag\n",
+		clock_data.flags, KVM_CLOCK_TSC_STABLE);
+
+	kvm_vm_free(vm);
+	return 0;
+}
-- 
2.28.0.rc0.142.g3c755180ce-goog



* Re: [PATCH v3 0/5] KVM_{GET,SET}_TSC_OFFSET ioctls
  2020-07-22  3:26 [PATCH v3 0/5] KVM_{GET,SET}_TSC_OFFSET ioctls Oliver Upton
                   ` (4 preceding siblings ...)
  2020-07-22  3:26 ` [PATCH v3 5/5] selftests: kvm: introduce tsc_offset_test Oliver Upton
@ 2020-07-28 16:33 ` Oliver Upton
  2020-08-05 16:06   ` Oliver Upton
  5 siblings, 1 reply; 17+ messages in thread
From: Oliver Upton @ 2020-07-28 16:33 UTC
  To: kvm list; +Cc: Paolo Bonzini, Peter Shier, Sean Christopherson, Peter Hornyack

On Tue, Jul 21, 2020 at 8:26 PM Oliver Upton <oupton@google.com> wrote:
>
> [...]

Ping :)


* Re: [PATCH v3 0/5] KVM_{GET,SET}_TSC_OFFSET ioctls
  2020-07-28 16:33 ` [PATCH v3 0/5] KVM_{GET,SET}_TSC_OFFSET ioctls Oliver Upton
@ 2020-08-05 16:06   ` Oliver Upton
  2020-08-05 18:46     ` Paolo Bonzini
  0 siblings, 1 reply; 17+ messages in thread
From: Oliver Upton @ 2020-08-05 16:06 UTC
  To: kvm list; +Cc: Paolo Bonzini, Peter Shier, Sean Christopherson, Peter Hornyack

On Tue, Jul 28, 2020 at 11:33 AM Oliver Upton <oupton@google.com> wrote:
>
> On Tue, Jul 21, 2020 at 8:26 PM Oliver Upton <oupton@google.com> wrote:
> >
> > [...]
>
> Ping :)

Ping


* Re: [PATCH v3 0/5] KVM_{GET,SET}_TSC_OFFSET ioctls
  2020-08-05 16:06   ` Oliver Upton
@ 2020-08-05 18:46     ` Paolo Bonzini
  2020-08-05 21:33       ` Oliver Upton
  0 siblings, 1 reply; 17+ messages in thread
From: Paolo Bonzini @ 2020-08-05 18:46 UTC
  To: Oliver Upton, kvm list; +Cc: Peter Shier, Sean Christopherson, Peter Hornyack

On 05/08/20 18:06, Oliver Upton wrote:
> On Tue, Jul 28, 2020 at 11:33 AM Oliver Upton <oupton@google.com> wrote:
>>
>> On Tue, Jul 21, 2020 at 8:26 PM Oliver Upton <oupton@google.com> wrote:
>>>
>>> [...]
>>
>> Ping :)
> 
> Ping

Hi Oliver,

I saw these on vacation and decided I would delay them to 5.10.  However
they are definitely on my list.

I have one possibly very stupid question just by looking at the cover
letter: now that you've "fixed KVM_SET_TSC_OFFSET to participate in the
existing TSC synchronization heuristics" what makes it still not
"guessing the intentions of a VMM"?  (No snark intended, just quoting
the parts that puzzled me a bit).

My immediate reaction was that we should just migrate the heuristics
state somehow, but perhaps I'm missing something obvious.

Paolo



* Re: [PATCH v3 0/5] KVM_{GET,SET}_TSC_OFFSET ioctls
  2020-08-05 18:46     ` Paolo Bonzini
@ 2020-08-05 21:33       ` Oliver Upton
  2020-08-05 22:01         ` Jim Mattson
  0 siblings, 1 reply; 17+ messages in thread
From: Oliver Upton @ 2020-08-05 21:33 UTC
  To: Paolo Bonzini; +Cc: kvm list, Peter Shier, Sean Christopherson, Peter Hornyack

On Wed, Aug 5, 2020 at 1:46 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 05/08/20 18:06, Oliver Upton wrote:
> > On Tue, Jul 28, 2020 at 11:33 AM Oliver Upton <oupton@google.com> wrote:
> >>
> >> On Tue, Jul 21, 2020 at 8:26 PM Oliver Upton <oupton@google.com> wrote:
> >>>
> >>> [...]
> >>
> >> Ping :)
> >
> > Ping
>
> Hi Oliver,
>
> I saw these on vacation and decided I would delay them to 5.10.  However
> they are definitely on my list.
>

Hope you enjoyed vacation!

> I have one possibly very stupid question just by looking at the cover
> letter: now that you've "fixed KVM_SET_TSC_OFFSET to participate in the
> existing TSC synchronization heuristics" what makes it still not
> "guessing the intentions of a VMM"?  (No snark intended, just quoting
> the parts that puzzled me a bit).

Great point.

I'd still posit that this series disambiguates userspace
control/synchronization of the TSCs. If a VMM wants the TSCs to be in
sync, it can write identical offsets to all vCPUs.

That said, participation in TSC synchronization is presently necessary
due to issues migrating a guest that was in the middle of a TSC sync.
By participating, we still accomplish synchronization on the other end
of migration through a well-timed mix of host and guest writes.

>
> My immediate reaction was that we should just migrate the heuristics
> state somehow

Yeah, I completely agree. I believe this series fixes the
userspace-facing issues and your suggestion would address the
guest-facing issues.

> but perhaps I'm missing something obvious.

Not necessarily obvious, but I can think of a rather contrived example
where the sync heuristics break down. If we're running nested and get
migrated in the middle of a VMM setting up TSCs, it's possible that
enough time will pass that we believe subsequent writes to not be of
the same TSC generation.

> Paolo
>


* Re: [PATCH v3 0/5] KVM_{GET,SET}_TSC_OFFSET ioctls
  2020-08-05 21:33       ` Oliver Upton
@ 2020-08-05 22:01         ` Jim Mattson
  2020-08-12 13:41           ` Paolo Bonzini
  0 siblings, 1 reply; 17+ messages in thread
From: Jim Mattson @ 2020-08-05 22:01 UTC
  To: Oliver Upton
  Cc: Paolo Bonzini, kvm list, Peter Shier, Sean Christopherson,
	Peter Hornyack

On Wed, Aug 5, 2020 at 2:33 PM Oliver Upton <oupton@google.com> wrote:
>
> On Wed, Aug 5, 2020 at 1:46 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
> >
> > On 05/08/20 18:06, Oliver Upton wrote:
> > > On Tue, Jul 28, 2020 at 11:33 AM Oliver Upton <oupton@google.com> wrote:
> > >>
> > >> On Tue, Jul 21, 2020 at 8:26 PM Oliver Upton <oupton@google.com> wrote:
> > >>>
> > >>> [...]
> > >>
> > >> Ping :)
> > >
> > > Ping
> >
> > Hi Oliver,
> >
> > I saw these on vacation and decided I would delay them to 5.10.  However
> > they are definitely on my list.
> >
>
> Hope you enjoyed vacation!
>
> > I have one possibly very stupid question just by looking at the cover
> > letter: now that you've "fixed KVM_SET_TSC_OFFSET to participate in the
> > existing TSC synchronization heuristics" what makes it still not
> > "guessing the intentions of a VMM"?  (No snark intended, just quoting
> > the parts that puzzled me a bit).
>
> Great point.
>
> I'd still posit that this series disambiguates userspace
> control/synchronization of the TSCs. If a VMM wants the TSCs to be in
> sync, it can write identical offsets to all vCPUs
>
> That said, participation in TSC synchronization is presently necessary
> due to issues migrating a guest that was in the middle of a TSC sync.
> In doing so, we still accomplish synchronization on the other end of
> migration with a well-timed mix of host and guest writes.
>
> >
> > My immediate reaction was that we should just migrate the heuristics
> > state somehow
>
> Yeah, I completely agree. I believe this series fixes the
> userspace-facing issues and your suggestion would address the
> guest-facing issues.
>
> > but perhaps I'm missing something obvious.
>
> Not necessarily obvious, but I can think of a rather contrived example
> where the sync heuristics break down. If we're running nested and get
> migrated in the middle of a VMM setting up TSCs, it's possible that
> enough time will pass that we believe subsequent writes to not be of
> the same TSC generation.

An example that has been biting us frequently in self-tests: migrate a
VM with less than a second accumulated in its TSC. At the destination,
the TSCs are zeroed.


* Re: [PATCH v3 0/5] KVM_{GET,SET}_TSC_OFFSET ioctls
  2020-08-05 22:01         ` Jim Mattson
@ 2020-08-12 13:41           ` Paolo Bonzini
  2020-08-14 16:55             ` Oliver Upton
  0 siblings, 1 reply; 17+ messages in thread
From: Paolo Bonzini @ 2020-08-12 13:41 UTC
  To: Jim Mattson, Oliver Upton
  Cc: kvm list, Peter Shier, Sean Christopherson, Peter Hornyack

On 06/08/20 00:01, Jim Mattson wrote:
>>> but perhaps I'm missing something obvious.
>> Not necessarily obvious, but I can think of a rather contrived example
>> where the sync heuristics break down. If we're running nested and get
>> migrated in the middle of a VMM setting up TSCs, it's possible that
>> enough time will pass that we believe subsequent writes to not be of
>> the same TSC generation.
>
> An example that has been biting us frequently in self-tests: migrate a
> VM with less than a second accumulated in its TSC. At the destination,
> the TSCs are zeroed.

Yeah, good point about selftests.  But this would be about the sync
heuristics messing up, and I don't understand how these ioctls improve
things.

>> My immediate reaction was that we should just migrate the heuristics
>> state somehow
> 
> Yeah, I completely agree. I believe this series fixes the
> userspace-facing issues and your suggestion would address the
> guest-facing issues.

I still don't understand how these ioctls are any better for userspace
than migrating MSR_IA32_TSC.  The host TSC is different between source
and destination, so the TSC offset will be different.

I am all for improving migration of TSC state, but I think we should do
it right, so we should migrate a host clock / TSC pair: then the
destination host can use TSC frequency and host clock to compute the new
TSC value.  In fact, such a pair is exactly the data that the syncing
heuristics track for each "generation" of syncing.

To migrate the synchronization state, instead, we only need to migrate
the "already_matched" (vcpu->arch.this_tsc_generation ==
kvm->arch.cur_tsc_generation) state.

Putting all of this together, it would be something like this:

- a VM-wide KVM_GET_CLOCK/KVM_SET_CLOCK needs to migrate
vcpu->arch.cur_tsc_{nsec,write} in addition to the current kvmclock
value (it's unrelated, but I don't think it's worth creating a new
ioctl).  A new flag is set if these fields are set in the struct.  If
the flag is set, KVM_SET_CLOCK copies the fields back, bumps the
generation and clears kvm->arch.nr_vcpus_matched_tsc.

- a VCPU-wide KVM_GET_TSC_INFO returns a host clock / guest TSC pair
plus the "already matched" state.  KVM_SET_TSC_INFO will only use the
host clock / TSC pair if "already matched" is false, to compute the
destination-side TSC offset but not otherwise doing anything with it; or
if "already matched" is true, it will ignore the pair, compute the TSC
offset from the data in kvm->arch, and update
kvm->arch.nr_vcpus_matched_tsc.
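
To make the shape concrete, the pair could travel in something like the
struct below (names, layout and flag are placeholders, not a settled
uapi):

  struct kvm_tsc_info {
          __u32 flags;
  #define KVM_TSC_INFO_ALREADY_MATCHED    (1 << 0)
          __u32 pad;
          __u64 host_clock_ns;    /* host clock when the pair was sampled */
          __u64 guest_tsc;        /* guest TSC value at that same instant */
  };

With "already matched" clear, KVM_SET_TSC_INFO would derive the
destination-side offset from its own host clock / TSC sample, roughly:

  elapsed_ns = host_clock_now_ns - info.host_clock_ns;
  offset = info.guest_tsc + elapsed_ns * virtual_tsc_khz / 1000000
           - host_tsc_now;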

Paolo


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 0/5] KVM_{GET,SET}_TSC_OFFSET ioctls
  2020-08-12 13:41           ` Paolo Bonzini
@ 2020-08-14 16:55             ` Oliver Upton
  2020-08-17 16:49               ` Paolo Bonzini
  0 siblings, 1 reply; 17+ messages in thread
From: Oliver Upton @ 2020-08-14 16:55 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Jim Mattson, kvm list, Peter Shier, Sean Christopherson, Peter Hornyack

On Wed, Aug 12, 2020 at 8:41 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 06/08/20 00:01, Jim Mattson wrote:
> >>> but perhaps I'm missing something obvious.
> >> Not necessarily obvious, but I can think of a rather contrived example
> >> where the sync heuristics break down. If we're running nested and get
> >> migrated in the middle of a VMM setting up TSCs, it's possible that
> >> enough time will pass that we believe subsequent writes to not be of
> >> the same TSC generation.
> >
> > An example that has been biting us frequently in self-tests: migrate a
> > VM with less than a second accumulated in its TSC. At the destination,
> > the TSCs are zeroed.
>
> Yeah, good point about selftests.  But this would be about the sync
> heuristics messing up, and I don't understand how these ioctls improve
> things.

The improvement would be that userspace has final say over the TSC.
The ioctl only participates in match tracking and isn't subject to
value overrides.

>
> >> My immediate reaction was that we should just migrate the heuristics
> >> state somehow
> >
> > Yeah, I completely agree. I believe this series fixes the
> > userspace-facing issues and your suggestion would address the
> > guest-facing issues.
>
> I still don't understand how these ioctls are any better for userspace
> than migrating MSR_IA32_TSC.  The host TSC is different between source
> and destination, so the TSC offset will be different.

Indeed. We do not migrate the TSC offsets themselves, but guest TSC
values constructed from them. A raw offset is relative to the source's
host TSC, so it would be complete nonsense on the other end of the
migration.

We allow our userspace to decide the host TSC / wall clock pair at
which the vCPUs were paused. From that host TSC value we reconstruct
the guest TSCs using the offsets and migrate that info. On the
destination we grab another host TSC / clock pair and recalculate
guest TSC offsets, which we then pass to KVM_SET_TSC_OFFSET. This is
desirable over a per-vCPU read of MSR_IA32_TSC because we've
effectively recorded all TSCs at the exact same moment in time.
Otherwise, we inadvertently add skew between guest TSCs by reading
each vCPU at different times. It seems that the sync heuristics
address this issue along with any guest TSC coordination.
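
In userspace terms the whole dance is roughly the sketch below, where
sample_clock_and_tsc() and the get/set wrappers are made-up helpers
around clock_gettime()/RDTSC and the KVM_{GET,SET}_TSC_OFFSET ioctls,
and a constant, identical guest TSC frequency (tsc_khz) is assumed on
both ends:

  /* Source, vCPUs paused: one (clock, host TSC) sample for all vCPUs. */
  sample_clock_and_tsc(&src_ns, &src_host_tsc);
  for (i = 0; i < nr_vcpus; i++)
          guest_tsc[i] = src_host_tsc + get_tsc_offset(vcpu_fd[i]);

  /* Destination: advance by the wall-clock downtime, then rebase. */
  sample_clock_and_tsc(&dst_ns, &dst_host_tsc);
  for (i = 0; i < nr_vcpus; i++) {
          uint64_t now = guest_tsc[i] +
                         (dst_ns - src_ns) * tsc_khz / 1000000;
          set_tsc_offset(vcpu_fd[i], now - dst_host_tsc);
  }

Every vCPU's TSC derives from the same two samples, so the scheme adds
no cross-vCPU skew.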

Not only that, depending on the heuristics to detect a sync from
userspace gets a bit tricky if we (KVM) are running nested. Couldn't
more than a second of time elapse between successive KVM_SET_MSRS when
running in L1 if L0 decides to pause our vCPUs (suspend/resume,
migration)? It seems to me that in this case we will fail to detect a
sync condition and configure the L2 TSCs to be out-of-phase.

Setting the guest TSCs by offset doesn't have these complications.
Even if L0 were to pause L1 for some inordinate amount of time, the
relation of L1 -> L2 TSC is never disturbed.
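
Conceptually, and ignoring TSC scaling, the hardware computes

  L1 TSC = host TSC + vmcs01.tsc_offset
  L2 TSC = host TSC + vmcs01.tsc_offset + vmcs12.tsc_offset

and since vmcs12.tsc_offset is a constant under L1's control, an L0
pause shifts L1 and L2 by the same amount, leaving their relation
intact.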

>
> I am all for improving migration of TSC state, but I think we should do
> it right, so we should migrate a host clock / TSC pair: then the
> destination host can use TSC frequency and host clock to compute the new
> TSC value.  In fact, such a pair is exactly the data that the syncing
> heuristics track for each "generation" of syncing.
>
> To migrate the synchronization state, instead, we only need to migrate
> the "already_matched" (vcpu->arch.this_tsc_generation ==
> kvm->arch.cur_tsc_generation) state.
>
> Putting all of this together, it would be something like this:
>
> - a VM-wide KVM_GET_CLOCK/KVM_SET_CLOCK needs to migrate
> vcpu->arch.cur_tsc_{nsec,write} in addition to the current kvmclock
> value (it's unrelated, but I don't think it's worth creating a new
> ioctl).  A new flag is set if these fields are set in the struct.  If
> the flag is set, KVM_SET_CLOCK copies the fields back, bumps the
> generation and clears kvm->arch.nr_vcpus_matched_tsc.
>
> - a VCPU-wide KVM_GET_TSC_INFO returns a host clock / guest TSC pair
> plus the "already matched" state.  KVM_SET_TSC_INFO will only use the
> host clock / TSC pair if "already matched" is false, to compute the
> destination-side TSC offset but not otherwise doing anything with it; or
> if "already matched" is true, it will ignore the pair, compute the TSC
> offset from the data in kvm->arch, and update
> kvm->arch.nr_vcpus_matched_tsc.
>

It seems to me that a per-vCPU ioctl (like you've suggested above) is
necessary to uphold the guest-facing side of our sync scheme,
regardless of what we do on the userspace-facing side.

> Paolo
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 0/5] KVM_{GET,SET}_TSC_OFFSET ioctls
  2020-08-14 16:55             ` Oliver Upton
@ 2020-08-17 16:49               ` Paolo Bonzini
  2020-08-17 19:40                 ` Oliver Upton
  0 siblings, 1 reply; 17+ messages in thread
From: Paolo Bonzini @ 2020-08-17 16:49 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Jim Mattson, kvm list, Peter Shier, Sean Christopherson, Peter Hornyack

On 14/08/20 18:55, Oliver Upton wrote:
> We allow our userspace to decide the host TSC / wall clock pair at
> which the vCPUs were paused. From that host TSC value we reconstruct
> the guest TSCs using the offsets and migrate that info. On the
> destination we grab another host TSC / clock pair and recalculate
> guest TSC offsets, which we then pass to KVM_SET_TSC_OFFSET. This is
> desirable over a per-vCPU read of MSR_IA32_TSC because we've
> effectively recorded all TSCs at the exact same moment in time.

Yes, this should work very well.  But in the end KVM ends up with the
vcpu->arch.cur_tsc_{nsec,write} of the source (only shifted in time),
while losing the notion that the pair is per-VM rather than per-VCPU for
the "already matched" vCPUs.  So that is why I'm thinking of retrieving
that pair from the kernel directly.

If you don't have time to work on it I can try to find some for 5.10,
but I'm not sure exactly when.

Paolo

> Otherwise, we inadvertently add skew between guest TSCs by reading
> each vCPU at different times. It seems that the sync heuristics
> address this issue along with any guest TSC coordination.
>
> Not only that, depending on the heuristics to detect a sync from
> userspace gets a bit tricky if we (KVM) are running nested. Couldn't
> more than a second of time elapse between successive KVM_SET_MSRS when
> running in L1 if L0 decides to pause our vCPUs (suspend/resume,
> migration)? It seems to me that in this case we will fail to detect a
> sync condition and configure the L2 TSCs to be out-of-phase.
> 
> Setting the guest TSCs by offset doesn't have these complications.
> Even if L0 were to pause L1 for some inordinate amount of time, the
> relation of L1 -> L2 TSC is never disturbed.
> 
>>
>> I am all for improving migration of TSC state, but I think we should do
>> it right, so we should migrate a host clock / TSC pair: then the
>> destination host can use TSC frequency and host clock to compute the new
>> TSC value.  In fact, such a pair is exactly the data that the syncing
>> heuristics track for each "generation" of syncing.
>>
>> To migrate the synchronization state, instead, we only need to migrate
>> the "already_matched" (vcpu->arch.this_tsc_generation ==
>> kvm->arch.cur_tsc_generation) state.
>>
>> Putting all of this together, it would be something like this:
>>
>> - a VM-wide KVM_GET_CLOCK/KVM_SET_CLOCK needs to migrate
>> vcpu->arch.cur_tsc_{nsec,write} in addition to the current kvmclock
>> value (it's unrelated, but I don't think it's worth creating a new
>> ioctl).  A new flag is set if these fields are set in the struct.  If
>> the flag is set, KVM_SET_CLOCK copies the fields back, bumps the
>> generation and clears kvm->arch.nr_vcpus_matched_tsc.
>>
>> - a VCPU-wide KVM_GET_TSC_INFO returns a host clock / guest TSC pair
>> plus the "already matched" state.  KVM_SET_TSC_INFO will use the host
>> clock / TSC pair only if "already matched" is false, and only to compute
>> the destination-side TSC offset; if "already matched" is true, it will
>> ignore the pair, compute the TSC offset from the data in kvm->arch, and
>> update
>> kvm->arch.nr_vcpus_matched_tsc.
>>
> 
> It seems to me that a per-vCPU ioctl (like you've suggested above) is
> necessary to uphold the guest-facing side of our sync scheme,
> regardless of what we do on the userspace-facing side.
> 
>> Paolo
>>
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 0/5] KVM_{GET,SET}_TSC_OFFSET ioctls
  2020-08-17 16:49               ` Paolo Bonzini
@ 2020-08-17 19:40                 ` Oliver Upton
  2020-09-24 13:43                   ` Paolo Bonzini
  0 siblings, 1 reply; 17+ messages in thread
From: Oliver Upton @ 2020-08-17 19:40 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Jim Mattson, kvm list, Peter Shier, Sean Christopherson, Peter Hornyack

On Mon, Aug 17, 2020 at 11:49 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 14/08/20 18:55, Oliver Upton wrote:
> > We allow our userspace to decide the host TSC / wall clock pair at
> > which the vCPUs were paused. From that host TSC value we reconstruct
> > the guest TSCs using the offsets and migrate that info. On the
> > destination we grab another host TSC / clock pair and recalculate
> > guest TSC offsets, which we then pass to KVM_SET_TSC_OFFSET. This is
> > desirable over a per-vCPU read of MSR_IA32_TSC because we've
> > effectively recorded all TSCs at the exact same moment in time.
>
> Yes, this should work very well.  But in the end KVM ends up with the
> vcpu->arch.cur_tsc_{nsec,write} of the source (only shifted in time),
> while losing the notion that the pair is per-VM rather than per-VCPU for
> the "already matched" vCPUs.  So that is why I'm thinking of retrieving
> that pair from the kernel directly.
>
> If you don't have time to work on it I can try to find some for 5.10,
> but I'm not sure exactly when.

Shouldn't be an issue, I'll futz around with some changes to the
series and send them out in the coming weeks.

>
> Paolo
>
> > Otherwise, we inadvertently add skew between guest TSCs by reading
> > each vCPU at different times. It seems that the sync heuristics
> > address this issue along with any guest TSC coordination.
> >
> > Not only that, depending on the heuristics to detect a sync from
> > userspace gets a bit tricky if we (KVM) are running nested. Couldn't
> > more than a second of time elapse between successive KVM_SET_MSRS when
> > running in L1 if L0 decides to pause our vCPUs (suspend/resume,
> > migration)? It seems to me that in this case we will fail to detect a
> > sync condition and configure the L2 TSCs to be out-of-phase.
> >
> > Setting the guest TSCs by offset doesn't have these complications.
> > Even if L0 were to pause L1 for some inordinate amount of time, the
> > relation of L1 -> L2 TSC is never disturbed.
> >
> >>
> >> I am all for improving migration of TSC state, but I think we should do
> >> it right, so we should migrate a host clock / TSC pair: then the
> >> destination host can use TSC frequency and host clock to compute the new
> >> TSC value.  In fact, such a pair is exactly the data that the syncing
> >> heuristics track for each "generation" of syncing.
> >>
> >> To migrate the synchronization state, instead, we only need to migrate
> >> the "already_matched" (vcpu->arch.this_tsc_generation ==
> >> kvm->arch.cur_tsc_generation) state.
> >>
> >> Putting all of this together, it would be something like this:
> >>
> >> - a VM-wide KVM_GET_CLOCK/KVM_SET_CLOCK needs to migrate
> >> vcpu->arch.cur_tsc_{nsec,write} in addition to the current kvmclock
> >> value (it's unrelated, but I don't think it's worth creating a new
> >> ioctl).  A new flag is set if these fields are set in the struct.  If
> >> the flag is set, KVM_SET_CLOCK copies the fields back, bumps the
> >> generation and clears kvm->arch.nr_vcpus_matched_tsc.
> >>
> >> - a VCPU-wide KVM_GET_TSC_INFO returns a host clock / guest TSC pair
> >> plus the "already matched" state.  KVM_SET_TSC_INFO will use the host
> >> clock / TSC pair only if "already matched" is false, and only to compute
> >> the destination-side TSC offset; if "already matched" is true, it will
> >> ignore the pair, compute the TSC offset from the data in kvm->arch, and
> >> update
> >> kvm->arch.nr_vcpus_matched_tsc.
> >>
> >
> > It seems to me that a per-vCPU ioctl (like you've suggested above) is
> > necessary to uphold the guest-facing side of our sync scheme,
> > regardless of what we do on the userspace-facing side.
> >
> >> Paolo
> >>
> >
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 0/5] KVM_{GET,SET}_TSC_OFFSET ioctls
  2020-08-17 19:40                 ` Oliver Upton
@ 2020-09-24 13:43                   ` Paolo Bonzini
  2020-09-24 15:14                     ` Maxim Levitsky
  0 siblings, 1 reply; 17+ messages in thread
From: Paolo Bonzini @ 2020-09-24 13:43 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Jim Mattson, kvm list, Peter Shier, Sean Christopherson, Peter Hornyack

On 17/08/20 21:40, Oliver Upton wrote:
>> If you don't have time to work on it I can try to find some for 5.10,
>> but I'm not sure exactly when.
>
> Shouldn't be an issue, I'll futz around with some changes to the
> series and send them out in the coming weeks.

Ok, after looking more at the code with Maxim I can confidently say that
it's a total mess.  And a lot of the synchronization code is dead
because 1) as far as we could see no guest synchronizes the TSC using
MSR_IA32_TSC; and 2) writing to MSR_IA32_TSC_ADJUST does not trigger the
synchronization code in kvm_write_tsc.

Your patch works not by some sort of miracle, but rather because it
bypasses the mess and that's the smart thing to do.

The plan is now as follows:

1) guest-initiated MSR_IA32_TSC writes never go through the sync
heuristics.  I'll shortly send a patch for this, and it will fix the
testcase issue.

2) add a new KVM_X86_DISABLE_QUIRKS value that will toggle between
"magic" and "vanilla" semantics for host-initiated TSC and TSC_ADJUST
writes.

3) if the quirk is present we still want existing userspace to work so:

- host-initiated MSR_IA32_TSC accesses operate on the L1 TSC, as in
Maxim's recent patch: reads always return the L1 TSC, and writes always
go through the sync heuristics.

- host-initiated MSR_IA32_TSC_ADJUST writes don't make the TSC jump;
they only write to vcpu->arch.ia32_tsc_adjust_msr (as in the current
kernel)

4) if the quirk is disabled however:

- the sync heuristics are never used except in kvm_arch_vcpu_postcreate

- host-initiated MSR_IA32_TSC and MSR_IA32_TSC_ADJUST accesses work as
they do in the guest: reads of MSR_IA32_TSC return the "right" TSC, and
writes of MSR_IA32_TSC_ADJUST make the TSC jump.

- for live migration, userspace is expected to use the new
KVM_GET/SET_TSC_PRECISE (or whatever the name will be) to get/set a
(nanosecond, TSC, TSC_ADJUST) tuple.  The sync heuristics will be
bypassed and it will just set the right value for the MSRs.  Setting
MSR_IA32_TSC_ADJUST is optional and controlled by a flag in the struct,
and the flag will be set by KVM_GET_TSC_PRECISE based on the guest CPUID.
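
For concreteness, the tuple could look something like the sketch below;
the struct name, field names and flag are tentative until the patches
materialize:

  struct kvm_tsc_state {
          __u32 flags;
  #define KVM_TSC_STATE_TSC_ADJUST_VALID  (1 << 0)
          __u32 pad;
          __u64 nsec;         /* host clock at which 'tsc' was sampled */
          __u64 tsc;          /* guest TSC value at 'nsec' */
          __u64 tsc_adjust;   /* MSR_IA32_TSC_ADJUST, if the flag is set */
  };

KVM_GET_TSC_PRECISE would set the flag based on the guest CPUID, and
KVM_SET_TSC_PRECISE would restore MSR_IA32_TSC_ADJUST only when the
flag is set.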

Paolo


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 0/5] KVM_{GET,SET}_TSC_OFFSET ioctls
  2020-09-24 13:43                   ` Paolo Bonzini
@ 2020-09-24 15:14                     ` Maxim Levitsky
  0 siblings, 0 replies; 17+ messages in thread
From: Maxim Levitsky @ 2020-09-24 15:14 UTC (permalink / raw)
  To: Paolo Bonzini, Oliver Upton
  Cc: Jim Mattson, kvm list, Peter Shier, Sean Christopherson, Peter Hornyack

On Thu, 2020-09-24 at 15:43 +0200, Paolo Bonzini wrote:
> On 17/08/20 21:40, Oliver Upton wrote:
> > > If you don't have time to work on it I can try to find some for 5.10,
> > > but I'm not sure exactly when.
> > 
> > Shouldn't be an issue, I'll futz around with some changes to the
> > series and send them out in the coming weeks.
> 
> Ok, after looking more at the code with Maxim I can confidently say that
> it's a total mess.  And a lot of the synchronization code is dead
> because 1) as far as we could see no guest synchronizes the TSC using
> MSR_IA32_TSC; and 2) writing to MSR_IA32_TSC_ADJUST does not trigger the
> synchronization code in kvm_write_tsc.
> 
> Your patch works not by some sort of miracle, but rather because it
> bypasses the mess and that's the smart thing to do.
> 
> The plan is now as follows:
> 
> 1) guest-initiated MSR_IA32_TSC writes never go through the sync
> heuristics.  I'll shortly send a patch for this, and it will fix the
> testcase issue.
> 
> 2) add a new KVM_X86_DISABLE_QUIRKS value that will toggle between
> "magic" and "vanilla" semantics for host-initiated TSC and TSC_ADJUST
> writes.
> 
> 3) if the quirk is present we still want existing userspace to work so:
> 
> - host-initiated MSR_IA32_TSC accesses operate on the L1 TSC, as in
> Maxim's recent patch: reads always return the L1 TSC, and writes always
> go through the sync heuristics.
> 
> - host-initiated MSR_IA32_TSC_ADJUST writes don't make the TSC jump;
> they only write to vcpu->arch.ia32_tsc_adjust_msr (as in the current
> kernel)
> 
> 4) if the quirk is disabled however:
> 
> - the sync heuristics are never used except in kvm_arch_vcpu_postcreate
> 
> - host-initiated MSR_IA32_TSC and MSR_IA32_TSC_ADJUST accesses work as
> they do in the guest: reads of MSR_IA32_TSC return the "right" TSC, and
> writes of MSR_IA32_TSC_ADJUST make the TSC jump.
> 
> - for live migration, userspace is expected to use the new
> KVM_GET/SET_TSC_PRECISE (or whatever the name will be) to get/set a
> (nanosecond, TSC, TSC_ADJUST) tuple.  The sync heuristics will be
> bypassed and it will just set the right value for the MSRs.  Setting
> MSR_IA32_TSC_ADJUST is optional and controlled by a flag in the struct,
> and the flag will be set by KVM_GET_TSC_PRECISE based on the guest CPUID.
> 
> Paolo
> 

I'll soon implement this.
Best regards,
	Maxim Levitsky



^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2020-09-24 15:15 UTC | newest]

Thread overview: 17+ messages
2020-07-22  3:26 [PATCH v3 0/5] KVM_{GET,SET}_TSC_OFFSET ioctls Oliver Upton
2020-07-22  3:26 ` [PATCH v3 1/5] kvm: x86: refactor masterclock sync heuristics out of kvm_write_tsc Oliver Upton
2020-07-22  3:26 ` [PATCH v3 2/5] kvm: x86: add KVM_{GET,SET}_TSC_OFFSET ioctls Oliver Upton
2020-07-22  3:26 ` [PATCH v3 3/5] kvm: vmx: check tsc offsetting with nested_cpu_has() Oliver Upton
2020-07-22  3:26 ` [PATCH v3 4/5] selftests: kvm: use a helper function for reading cpuid Oliver Upton
2020-07-22  3:26 ` [PATCH v3 5/5] selftests: kvm: introduce tsc_offset_test Oliver Upton
2020-07-28 16:33 ` [PATCH v3 0/5] KVM_{GET,SET}_TSC_OFFSET ioctls Oliver Upton
2020-08-05 16:06   ` Oliver Upton
2020-08-05 18:46     ` Paolo Bonzini
2020-08-05 21:33       ` Oliver Upton
2020-08-05 22:01         ` Jim Mattson
2020-08-12 13:41           ` Paolo Bonzini
2020-08-14 16:55             ` Oliver Upton
2020-08-17 16:49               ` Paolo Bonzini
2020-08-17 19:40                 ` Oliver Upton
2020-09-24 13:43                   ` Paolo Bonzini
2020-09-24 15:14                     ` Maxim Levitsky
