* [PATCH v7 0/6] KVM: x86: Add idempotent controls for migrating system counter state
@ 2021-08-16  0:11 Oliver Upton
  2021-08-16  0:11 ` [PATCH v7 1/6] KVM: x86: Fix potential race in KVM_GET_CLOCK Oliver Upton
                   ` (6 more replies)
  0 siblings, 7 replies; 20+ messages in thread
From: Oliver Upton @ 2021-08-16  0:11 UTC
  To: kvm, kvmarm
  Cc: Paolo Bonzini, Sean Christopherson, Marc Zyngier, Peter Shier,
	Jim Mattson, David Matlack, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata, James Morse, Alexandru Elisei,
	Suzuki K Poulose, linux-arm-kernel, Andrew Jones, Will Deacon,
	Catalin Marinas, Oliver Upton

KVM's current means of saving/restoring system counters is sensitive to
timing. On x86, we migrate the guest's system counter by value, through
the guest's IA32_TSC value. Restoring system counters by value is
brittle as the state is not idempotent: the host system counter keeps
ticking between the attempted save and restore. Furthermore, VMMs may
wish to transparently live migrate guest VMs, meaning that they include
the elapsed time due to live migration blackout in the guest system
counter view. The VMM thread could be preempted for any number of
reasons (scheduler, L0 hypervisor under nested virtualization) between
the time that it calculates the desired guest counter value and the
time that KVM actually sets this counter state.

Despite the value-based interface that we present to userspace, KVM
actually has idempotent guest controls by way of the TSC offset.
We can avoid all of the issues associated with a value-based interface
by abstracting these offset controls in a new device attribute. This
series introduces new vCPU device attributes to provide userspace access
to the vCPU's system counter offset.
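
As a rough illustration of the difference, the sketch below models the
guest counter as the host counter plus an offset. The helper names are
hypothetical and exist only for this example; they are not part of KVM.
It assumes the save and the restore happen against the same host counter.

```c
#include <stdint.h>

/* Guest view of the counter: host counter plus a per-vCPU offset. */
static uint64_t sketch_guest_tsc(uint64_t host_tsc, uint64_t offset)
{
	return host_tsc + offset;	/* unsigned arithmetic wraps modulo 2^64 */
}

/*
 * By-value restore: the resulting offset depends on the host counter
 * reading at the moment the write is finally applied.
 */
static uint64_t sketch_offset_from_value(uint64_t guest_value,
					 uint64_t host_tsc_now)
{
	return guest_value - host_tsc_now;
}
```

With an offset of 1000 and a guest value of 6000 saved at host count
5000, a by-value write applied 300 cycles late yields an offset of 700,
so the guest silently loses those 300 cycles. Restoring the offset of
1000 directly gives the same guest view no matter when it is applied.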

Patch 1 addresses a possible race in KVM_GET_CLOCK where
use_master_clock is read outside of the pvclock_gtod_sync_lock.

Patch 2 is a cleanup, moving the implementation of KVM_{GET,SET}_CLOCK
into helper methods.

Patch 3 adopts Paolo's suggestion, augmenting the KVM_{GET,SET}_CLOCK
ioctls to provide userspace with a (host_tsc, realtime) instant. This is
essential for a VMM to perform precise migration of the guest's system
counters.

Patches 4-5 are preparatory changes for exposing the TSC offset to
userspace. Patch 6 adds a vCPU attribute that gives userspace access
to the TSC offset.

This series was tested with the new KVM selftests for the KVM clock and
system counter offset controls on Haswell hardware. Note that these
tests are mailed as a separate series due to the dependencies in both
x86 and arm64.

Applies cleanly to kvm/queue.

Parent commit: a3e0b8bd99ab ("KVM: MMU: change tracepoints arguments to kvm_page_fault")

v6: https://lore.kernel.org/r/20210804085819.846610-1-oupton@google.com

v6 -> v7:
 - Separated x86, arm64, and selftests into different series
 - Rebased on top of kvm/queue

Oliver Upton (6):
  KVM: x86: Fix potential race in KVM_GET_CLOCK
  KVM: x86: Create helper methods for KVM_{GET,SET}_CLOCK ioctls
  KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK
  KVM: x86: Take the pvclock sync lock behind the tsc_write_lock
  KVM: x86: Refactor tsc synchronization code
  KVM: x86: Expose TSC offset controls to userspace

 Documentation/virt/kvm/api.rst          |  42 ++-
 Documentation/virt/kvm/devices/vcpu.rst |  57 ++++
 Documentation/virt/kvm/locking.rst      |  11 +
 arch/x86/include/asm/kvm_host.h         |   4 +
 arch/x86/include/uapi/asm/kvm.h         |   4 +
 arch/x86/kvm/x86.c                      | 362 +++++++++++++++++-------
 include/uapi/linux/kvm.h                |   7 +-
 7 files changed, 378 insertions(+), 109 deletions(-)

-- 
2.33.0.rc1.237.g0d66db33f3-goog



* [PATCH v7 1/6] KVM: x86: Fix potential race in KVM_GET_CLOCK
  2021-08-16  0:11 [PATCH v7 0/6] KVM: x86: Add idempotent controls for migrating system counter state Oliver Upton
@ 2021-08-16  0:11 ` Oliver Upton
  2021-08-19 18:24   ` Marcelo Tosatti
  2021-08-16  0:11 ` [PATCH v7 2/6] KVM: x86: Create helper methods for KVM_{GET,SET}_CLOCK ioctls Oliver Upton
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Oliver Upton @ 2021-08-16  0:11 UTC

Sean noticed that KVM_GET_CLOCK was checking kvm_arch.use_master_clock
outside of the pvclock sync lock. This is problematic, as the clock
value written to userspace may or may not actually correspond to a
stable TSC.

Fix the race by populating the entire kvm_clock_data structure behind
the pvclock_gtod_sync_lock.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
---
 arch/x86/kvm/x86.c | 39 ++++++++++++++++++++++++++++-----------
 1 file changed, 28 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fdc0c18339fb..2f3929bd5f58 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2787,19 +2787,20 @@ static void kvm_update_masterclock(struct kvm *kvm)
 	kvm_end_pvclock_update(kvm);
 }
 
-u64 get_kvmclock_ns(struct kvm *kvm)
+static void get_kvmclock(struct kvm *kvm, struct kvm_clock_data *data)
 {
 	struct kvm_arch *ka = &kvm->arch;
 	struct pvclock_vcpu_time_info hv_clock;
 	unsigned long flags;
-	u64 ret;
 
 	spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags);
 	if (!ka->use_master_clock) {
 		spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags);
-		return get_kvmclock_base_ns() + ka->kvmclock_offset;
+		data->clock = get_kvmclock_base_ns() + ka->kvmclock_offset;
+		return;
 	}
 
+	data->flags |= KVM_CLOCK_TSC_STABLE;
 	hv_clock.tsc_timestamp = ka->master_cycle_now;
 	hv_clock.system_time = ka->master_kernel_ns + ka->kvmclock_offset;
 	spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags);
@@ -2811,13 +2812,26 @@ u64 get_kvmclock_ns(struct kvm *kvm)
 		kvm_get_time_scale(NSEC_PER_SEC, __this_cpu_read(cpu_tsc_khz) * 1000LL,
 				   &hv_clock.tsc_shift,
 				   &hv_clock.tsc_to_system_mul);
-		ret = __pvclock_read_cycles(&hv_clock, rdtsc());
-	} else
-		ret = get_kvmclock_base_ns() + ka->kvmclock_offset;
+		data->clock = __pvclock_read_cycles(&hv_clock, rdtsc());
+	} else {
+		data->clock = get_kvmclock_base_ns() + ka->kvmclock_offset;
+	}
 
 	put_cpu();
+}
 
-	return ret;
+u64 get_kvmclock_ns(struct kvm *kvm)
+{
+	struct kvm_clock_data data;
+
+	/*
+	 * Zero flags as it's accessed RMW, leave everything else uninitialized
+	 * as clock is always written and no other fields are consumed.
+	 */
+	data.flags = 0;
+
+	get_kvmclock(kvm, &data);
+	return data.clock;
 }
 
 static void kvm_setup_pvclock_page(struct kvm_vcpu *v,
@@ -6098,11 +6112,14 @@ long kvm_arch_vm_ioctl(struct file *filp,
 	}
 	case KVM_GET_CLOCK: {
 		struct kvm_clock_data user_ns;
-		u64 now_ns;
 
-		now_ns = get_kvmclock_ns(kvm);
-		user_ns.clock = now_ns;
-		user_ns.flags = kvm->arch.use_master_clock ? KVM_CLOCK_TSC_STABLE : 0;
+		/*
+		 * Zero flags as it is accessed RMW, leave everything else
+		 * uninitialized as clock is always written and no other fields
+		 * are consumed.
+		 */
+		user_ns.flags = 0;
+		get_kvmclock(kvm, &user_ns);
 		memset(&user_ns.pad, 0, sizeof(user_ns.pad));
 
 		r = -EFAULT;
-- 
2.33.0.rc1.237.g0d66db33f3-goog



* [PATCH v7 2/6] KVM: x86: Create helper methods for KVM_{GET,SET}_CLOCK ioctls
  2021-08-16  0:11 [PATCH v7 0/6] KVM: x86: Add idempotent controls for migrating system counter state Oliver Upton
  2021-08-16  0:11 ` [PATCH v7 1/6] KVM: x86: Fix potential race in KVM_GET_CLOCK Oliver Upton
@ 2021-08-16  0:11 ` Oliver Upton
  2021-08-16  0:11 ` [PATCH v7 3/6] KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK Oliver Upton
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 20+ messages in thread
From: Oliver Upton @ 2021-08-16  0:11 UTC

Wrap the existing implementation of the KVM_{GET,SET}_CLOCK ioctls in
helper methods.

No functional change intended.

Signed-off-by: Oliver Upton <oupton@google.com>
---
 arch/x86/kvm/x86.c | 107 ++++++++++++++++++++++++---------------------
 1 file changed, 57 insertions(+), 50 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2f3929bd5f58..39eaa2fb2001 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5833,12 +5833,65 @@ int kvm_arch_pm_notifier(struct kvm *kvm, unsigned long state)
 }
 #endif /* CONFIG_HAVE_KVM_PM_NOTIFIER */
 
+static int kvm_vm_ioctl_get_clock(struct kvm *kvm, void __user *argp)
+{
+	struct kvm_clock_data data;
+
+	/*
+	 * Zero flags as it is accessed RMW, leave everything else
+	 * uninitialized as clock is always written and no other fields
+	 * are consumed.
+	 */
+	data.flags = 0;
+	get_kvmclock(kvm, &data);
+	memset(&data.pad, 0, sizeof(data.pad));
+
+	if (copy_to_user(argp, &data, sizeof(data)))
+		return -EFAULT;
+
+	return 0;
+}
+
+static int kvm_vm_ioctl_set_clock(struct kvm *kvm, void __user *argp)
+{
+	struct kvm_arch *ka = &kvm->arch;
+	struct kvm_clock_data data;
+	u64 now_ns;
+
+	if (copy_from_user(&data, argp, sizeof(data)))
+		return -EFAULT;
+
+	if (data.flags)
+		return -EINVAL;
+
+	kvm_hv_invalidate_tsc_page(kvm);
+	kvm_start_pvclock_update(kvm);
+	pvclock_update_vm_gtod_copy(kvm);
+
+	/*
+	 * This pairs with kvm_guest_time_update(): when masterclock is
+	 * in use, we use master_kernel_ns + kvmclock_offset to set
+	 * unsigned 'system_time' so if we use get_kvmclock_ns() (which
+	 * is slightly ahead) here we risk going negative on unsigned
+	 * 'system_time' when 'data.clock' is very small.
+	 */
+	if (kvm->arch.use_master_clock)
+		now_ns = ka->master_kernel_ns;
+	else
+		now_ns = get_kvmclock_base_ns();
+	ka->kvmclock_offset = data.clock - now_ns;
+	kvm_end_pvclock_update(kvm);
+
+	return 0;
+}
+
 long kvm_arch_vm_ioctl(struct file *filp,
 		       unsigned int ioctl, unsigned long arg)
 {
 	struct kvm *kvm = filp->private_data;
 	void __user *argp = (void __user *)arg;
 	int r = -ENOTTY;
+
 	/*
 	 * This union makes it completely explicit to gcc-3.x
 	 * that these two variables' stack usage should be
@@ -6076,58 +6129,12 @@ long kvm_arch_vm_ioctl(struct file *filp,
 		break;
 	}
 #endif
-	case KVM_SET_CLOCK: {
-		struct kvm_arch *ka = &kvm->arch;
-		struct kvm_clock_data user_ns;
-		u64 now_ns;
-
-		r = -EFAULT;
-		if (copy_from_user(&user_ns, argp, sizeof(user_ns)))
-			goto out;
-
-		r = -EINVAL;
-		if (user_ns.flags)
-			goto out;
-
-		r = 0;
-
-		kvm_hv_invalidate_tsc_page(kvm);
-		kvm_start_pvclock_update(kvm);
-		pvclock_update_vm_gtod_copy(kvm);
-
-		/*
-		 * This pairs with kvm_guest_time_update(): when masterclock is
-		 * in use, we use master_kernel_ns + kvmclock_offset to set
-		 * unsigned 'system_time' so if we use get_kvmclock_ns() (which
-		 * is slightly ahead) here we risk going negative on unsigned
-		 * 'system_time' when 'user_ns.clock' is very small.
-		 */
-		if (kvm->arch.use_master_clock)
-			now_ns = ka->master_kernel_ns;
-		else
-			now_ns = get_kvmclock_base_ns();
-		ka->kvmclock_offset = user_ns.clock - now_ns;
-		kvm_end_pvclock_update(kvm);
+	case KVM_SET_CLOCK:
+		r = kvm_vm_ioctl_set_clock(kvm, argp);
 		break;
-	}
-	case KVM_GET_CLOCK: {
-		struct kvm_clock_data user_ns;
-
-		/*
-		 * Zero flags as it is accessed RMW, leave everything else
-		 * uninitialized as clock is always written and no other fields
-		 * are consumed.
-		 */
-		user_ns.flags = 0;
-		get_kvmclock(kvm, &user_ns);
-		memset(&user_ns.pad, 0, sizeof(user_ns.pad));
-
-		r = -EFAULT;
-		if (copy_to_user(argp, &user_ns, sizeof(user_ns)))
-			goto out;
-		r = 0;
+	case KVM_GET_CLOCK:
+		r = kvm_vm_ioctl_get_clock(kvm, argp);
 		break;
-	}
 	case KVM_MEMORY_ENCRYPT_OP: {
 		r = -ENOTTY;
 		if (kvm_x86_ops.mem_enc_op)
-- 
2.33.0.rc1.237.g0d66db33f3-goog



* [PATCH v7 3/6] KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK
  2021-08-16  0:11 [PATCH v7 0/6] KVM: x86: Add idempotent controls for migrating system counter state Oliver Upton
  2021-08-16  0:11 ` [PATCH v7 1/6] KVM: x86: Fix potential race in KVM_GET_CLOCK Oliver Upton
  2021-08-16  0:11 ` [PATCH v7 2/6] KVM: x86: Create helper methods for KVM_{GET,SET}_CLOCK ioctls Oliver Upton
@ 2021-08-16  0:11 ` Oliver Upton
  2021-08-20 12:46   ` Marcelo Tosatti
  2021-08-16  0:11 ` [PATCH v7 4/6] KVM: x86: Take the pvclock sync lock behind the tsc_write_lock Oliver Upton
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Oliver Upton @ 2021-08-16  0:11 UTC

Handling the migration of TSCs correctly is difficult, in part because
Linux does not provide userspace with the ability to retrieve a (TSC,
realtime) clock pair for a single instant in time. In lieu of a more
convenient facility, KVM can report similar information in the kvm_clock
structure.

Provide userspace with a host TSC & realtime pair iff the realtime clock
is based on the TSC. If userspace provides KVM_SET_CLOCK with a valid
realtime value, advance the KVM clock by the amount of elapsed time. Do
not step the KVM clock backwards, though, as it is a monotonic
oscillator.
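
The advance-but-never-rewind rule can be sketched on its own as below.
This is a standalone illustration, not the kernel code: the function name
and the SKETCH_CLOCK_REALTIME constant are hypothetical stand-ins (the
constant mirrors the KVM_CLOCK_REALTIME bit added by this patch), and the
current realtime reading is passed in as a parameter.

```c
#include <stdint.h>

/* Stand-in for the KVM_CLOCK_REALTIME flag bit (assumption for this sketch). */
#define SKETCH_CLOCK_REALTIME	(1u << 2)

/*
 * Return the kvmclock value to install: the saved clock, advanced by the
 * realtime that elapsed since it was recorded. If the flag is clear, or if
 * realtime appears to have gone backwards, the saved clock is unchanged.
 */
static uint64_t sketch_advance_clock(uint64_t clock, uint32_t flags,
				     uint64_t saved_real_ns,
				     uint64_t now_real_ns)
{
	if ((flags & SKETCH_CLOCK_REALTIME) && now_real_ns > saved_real_ns)
		clock += now_real_ns - saved_real_ns;

	return clock;
}
```

Only a forward realtime delta is applied, so a destination host whose
realtime clock is behind the source cannot step the kvmclock backwards.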

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Oliver Upton <oupton@google.com>
---
 Documentation/virt/kvm/api.rst  | 42 ++++++++++++++++++++++++++-------
 arch/x86/include/asm/kvm_host.h |  3 +++
 arch/x86/kvm/x86.c              | 34 ++++++++++++++++++--------
 include/uapi/linux/kvm.h        |  7 +++++-
 4 files changed, 66 insertions(+), 20 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 86d7ad3a126c..b3d12bf9fbf5 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -993,20 +993,34 @@ such as migration.
 When KVM_CAP_ADJUST_CLOCK is passed to KVM_CHECK_EXTENSION, it returns the
 set of bits that KVM can return in struct kvm_clock_data's flag member.
 
-The only flag defined now is KVM_CLOCK_TSC_STABLE.  If set, the returned
-value is the exact kvmclock value seen by all VCPUs at the instant
-when KVM_GET_CLOCK was called.  If clear, the returned value is simply
-CLOCK_MONOTONIC plus a constant offset; the offset can be modified
-with KVM_SET_CLOCK.  KVM will try to make all VCPUs follow this clock,
-but the exact value read by each VCPU could differ, because the host
-TSC is not stable.
+FLAGS:
+
+KVM_CLOCK_TSC_STABLE.  If set, the returned value is the exact kvmclock
+value seen by all VCPUs at the instant when KVM_GET_CLOCK was called.
+If clear, the returned value is simply CLOCK_MONOTONIC plus a constant
+offset; the offset can be modified with KVM_SET_CLOCK.  KVM will try
+to make all VCPUs follow this clock, but the exact value read by each
+VCPU could differ, because the host TSC is not stable.
+
+KVM_CLOCK_REALTIME.  If set, the `realtime` field in the kvm_clock_data
+structure is populated with the value of the host's real time
+clocksource at the instant when KVM_GET_CLOCK was called. If clear,
+the `realtime` field does not contain a value.
+
+KVM_CLOCK_HOST_TSC.  If set, the `host_tsc` field in the kvm_clock_data
+structure is populated with the value of the host's timestamp counter (TSC)
+at the instant when KVM_GET_CLOCK was called. If clear, the `host_tsc` field
+does not contain a value.
 
 ::
 
   struct kvm_clock_data {
 	__u64 clock;  /* kvmclock current value */
 	__u32 flags;
-	__u32 pad[9];
+	__u32 pad0;
+	__u64 realtime;
+	__u64 host_tsc;
+	__u32 pad[4];
   };
 
 
@@ -1023,12 +1037,22 @@ Sets the current timestamp of kvmclock to the value specified in its parameter.
 In conjunction with KVM_GET_CLOCK, it is used to ensure monotonicity on scenarios
 such as migration.
 
+FLAGS:
+
+KVM_CLOCK_REALTIME.  If set, KVM will compare the value of the `realtime` field
+with the value of the host's real time clocksource at the instant when
+KVM_SET_CLOCK was called. The difference in elapsed time is added to the final
+kvmclock value that will be provided to guests.
+
 ::
 
   struct kvm_clock_data {
 	__u64 clock;  /* kvmclock current value */
 	__u32 flags;
-	__u32 pad[9];
+	__u32 pad0;
+	__u64 realtime;
+	__u64 host_tsc;
+	__u32 pad[4];
   };
 
 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 20daaf67a5bf..7fad2615f4a9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1916,4 +1916,7 @@ int kvm_cpu_dirty_log_size(void);
 
 int alloc_all_memslots_rmaps(struct kvm *kvm);
 
+#define KVM_CLOCK_VALID_FLAGS						\
+	(KVM_CLOCK_TSC_STABLE | KVM_CLOCK_REALTIME | KVM_CLOCK_HOST_TSC)
+
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 39eaa2fb2001..b1e9a4885be6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2809,10 +2809,20 @@ static void get_kvmclock(struct kvm *kvm, struct kvm_clock_data *data)
 	get_cpu();
 
 	if (__this_cpu_read(cpu_tsc_khz)) {
+#ifdef CONFIG_X86_64
+		struct timespec64 ts;
+
+		if (kvm_get_walltime_and_clockread(&ts, &data->host_tsc)) {
+			data->realtime = ts.tv_nsec + NSEC_PER_SEC * ts.tv_sec;
+			data->flags |= KVM_CLOCK_REALTIME | KVM_CLOCK_HOST_TSC;
+		} else
+#endif
+		data->host_tsc = rdtsc();
+
 		kvm_get_time_scale(NSEC_PER_SEC, __this_cpu_read(cpu_tsc_khz) * 1000LL,
 				   &hv_clock.tsc_shift,
 				   &hv_clock.tsc_to_system_mul);
-		data->clock = __pvclock_read_cycles(&hv_clock, rdtsc());
+		data->clock = __pvclock_read_cycles(&hv_clock, data->host_tsc);
 	} else {
 		data->clock = get_kvmclock_base_ns() + ka->kvmclock_offset;
 	}
@@ -4052,7 +4062,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		r = KVM_SYNC_X86_VALID_FIELDS;
 		break;
 	case KVM_CAP_ADJUST_CLOCK:
-		r = KVM_CLOCK_TSC_STABLE;
+		r = KVM_CLOCK_VALID_FLAGS;
 		break;
 	case KVM_CAP_X86_DISABLE_EXITS:
 		r |=  KVM_X86_DISABLE_EXITS_HLT | KVM_X86_DISABLE_EXITS_PAUSE |
@@ -5837,14 +5847,8 @@ static int kvm_vm_ioctl_get_clock(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_clock_data data;
 
-	/*
-	 * Zero flags as it is accessed RMW, leave everything else
-	 * uninitialized as clock is always written and no other fields
-	 * are consumed.
-	 */
-	data.flags = 0;
+	memset(&data, 0, sizeof(data));
 	get_kvmclock(kvm, &data);
-	memset(&data.pad, 0, sizeof(data.pad));
 
 	if (copy_to_user(argp, &data, sizeof(data)))
 		return -EFAULT;
@@ -5861,13 +5865,23 @@ static int kvm_vm_ioctl_set_clock(struct kvm *kvm, void __user *argp)
 	if (copy_from_user(&data, argp, sizeof(data)))
 		return -EFAULT;
 
-	if (data.flags)
+	if (data.flags & ~KVM_CLOCK_REALTIME)
 		return -EINVAL;
 
 	kvm_hv_invalidate_tsc_page(kvm);
 	kvm_start_pvclock_update(kvm);
 	pvclock_update_vm_gtod_copy(kvm);
 
+	if (data.flags & KVM_CLOCK_REALTIME) {
+		u64 now_real_ns = ktime_get_real_ns();
+
+		/*
+		 * Avoid stepping the kvmclock backwards.
+		 */
+		if (now_real_ns > data.realtime)
+			data.clock += now_real_ns - data.realtime;
+	}
+
 	/*
 	 * This pairs with kvm_guest_time_update(): when masterclock is
 	 * in use, we use master_kernel_ns + kvmclock_offset to set
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index a067410ebea5..d228bf394465 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1223,11 +1223,16 @@ struct kvm_irqfd {
 
 /* Do not use 1, KVM_CHECK_EXTENSION returned it before we had flags.  */
 #define KVM_CLOCK_TSC_STABLE		2
+#define KVM_CLOCK_REALTIME		(1 << 2)
+#define KVM_CLOCK_HOST_TSC		(1 << 3)
 
 struct kvm_clock_data {
 	__u64 clock;
 	__u32 flags;
-	__u32 pad[9];
+	__u32 pad0;
+	__u64 realtime;
+	__u64 host_tsc;
+	__u32 pad[4];
 };
 
 /* For KVM_CAP_SW_TLB */
-- 
2.33.0.rc1.237.g0d66db33f3-goog



* [PATCH v7 4/6] KVM: x86: Take the pvclock sync lock behind the tsc_write_lock
  2021-08-16  0:11 [PATCH v7 0/6] KVM: x86: Add idempotent controls for migrating system counter state Oliver Upton
                   ` (2 preceding siblings ...)
  2021-08-16  0:11 ` [PATCH v7 3/6] KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK Oliver Upton
@ 2021-08-16  0:11 ` Oliver Upton
  2021-09-02 19:22   ` Sean Christopherson
  2021-08-16  0:11 ` [PATCH v7 5/6] KVM: x86: Refactor tsc synchronization code Oliver Upton
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Oliver Upton @ 2021-08-16  0:11 UTC

A later change requires that the pvclock sync lock be taken while
holding the tsc_write_lock. Change the locking in kvm_synchronize_tsc()
to meet that requirement now, so that the locking change is isolated in
its own commit.

Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
---
 Documentation/virt/kvm/locking.rst | 11 +++++++++++
 arch/x86/kvm/x86.c                 |  2 +-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst
index 8138201efb09..0bf346adac2a 100644
--- a/Documentation/virt/kvm/locking.rst
+++ b/Documentation/virt/kvm/locking.rst
@@ -36,6 +36,9 @@ On x86:
   holding kvm->arch.mmu_lock (typically with ``read_lock``, otherwise
   there's no need to take kvm->arch.tdp_mmu_pages_lock at all).
 
+- kvm->arch.tsc_write_lock is taken outside
+  kvm->arch.pvclock_gtod_sync_lock
+
 Everything else is a leaf: no other lock is taken inside the critical
 sections.
 
@@ -222,6 +225,14 @@ time it will be set using the Dirty tracking mechanism described above.
 :Comment:	'raw' because hardware enabling/disabling must be atomic /wrt
 		migration.
 
+:Name:		kvm_arch::pvclock_gtod_sync_lock
+:Type:		raw_spinlock_t
+:Arch:		x86
+:Protects:	kvm_arch::{cur_tsc_generation,cur_tsc_nsec,cur_tsc_write,
+			cur_tsc_offset,nr_vcpus_matched_tsc}
+:Comment:	'raw' because updating the kvm master clock must not be
+		preempted.
+
 :Name:		kvm_arch::tsc_write_lock
 :Type:		raw_spinlock
 :Arch:		x86
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b1e9a4885be6..f1434cd388b9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2533,7 +2533,6 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data)
 	vcpu->arch.this_tsc_write = kvm->arch.cur_tsc_write;
 
 	kvm_vcpu_write_tsc_offset(vcpu, offset);
-	raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
 
 	spin_lock_irqsave(&kvm->arch.pvclock_gtod_sync_lock, flags);
 	if (!matched) {
@@ -2544,6 +2543,7 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data)
 
 	kvm_track_tsc_matching(vcpu);
 	spin_unlock_irqrestore(&kvm->arch.pvclock_gtod_sync_lock, flags);
+	raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
 }
 
 static inline void adjust_tsc_offset_guest(struct kvm_vcpu *vcpu,
-- 
2.33.0.rc1.237.g0d66db33f3-goog



* [PATCH v7 5/6] KVM: x86: Refactor tsc synchronization code
  2021-08-16  0:11 [PATCH v7 0/6] KVM: x86: Add idempotent controls for migrating system counter state Oliver Upton
                   ` (3 preceding siblings ...)
  2021-08-16  0:11 ` [PATCH v7 4/6] KVM: x86: Take the pvclock sync lock behind the tsc_write_lock Oliver Upton
@ 2021-08-16  0:11 ` Oliver Upton
  2021-09-02 19:21   ` Sean Christopherson
  2021-08-16  0:11 ` [PATCH v7 6/6] KVM: x86: Expose TSC offset controls to userspace Oliver Upton
  2021-09-02 19:23 ` [PATCH v7 0/6] KVM: x86: Add idempotent controls for migrating system counter state Sean Christopherson
  6 siblings, 1 reply; 20+ messages in thread
From: Oliver Upton @ 2021-08-16  0:11 UTC

Refactor kvm_synchronize_tsc to make a new function that allows callers
to specify TSC parameters (offset, value, nanoseconds, etc.) explicitly
for the sake of participating in TSC synchronization.

Signed-off-by: Oliver Upton <oupton@google.com>
---
 arch/x86/kvm/x86.c | 105 ++++++++++++++++++++++++++-------------------
 1 file changed, 61 insertions(+), 44 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f1434cd388b9..9d0445527dad 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2447,13 +2447,71 @@ static inline bool kvm_check_tsc_unstable(void)
 	return check_tsc_unstable();
 }
 
+/*
+ * Infers attempts to synchronize the guest's tsc from host writes. Sets the
+ * offset for the vcpu and tracks the TSC matching generation that the vcpu
+ * participates in.
+ */
+static void __kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 offset, u64 tsc,
+				  u64 ns, bool matched)
+{
+	struct kvm *kvm = vcpu->kvm;
+	bool already_matched;
+
+	lockdep_assert_held(&kvm->arch.tsc_write_lock);
+
+	already_matched =
+	       (vcpu->arch.this_tsc_generation == kvm->arch.cur_tsc_generation);
+
+	/*
+	 * We track the most recent recorded KHZ, write and time to
+	 * allow the matching interval to be extended at each write.
+	 */
+	kvm->arch.last_tsc_nsec = ns;
+	kvm->arch.last_tsc_write = tsc;
+	kvm->arch.last_tsc_khz = vcpu->arch.virtual_tsc_khz;
+
+	vcpu->arch.last_guest_tsc = tsc;
+
+	/* Keep track of which generation this VCPU has synchronized to */
+	vcpu->arch.this_tsc_generation = kvm->arch.cur_tsc_generation;
+	vcpu->arch.this_tsc_nsec = kvm->arch.cur_tsc_nsec;
+	vcpu->arch.this_tsc_write = kvm->arch.cur_tsc_write;
+
+	kvm_vcpu_write_tsc_offset(vcpu, offset);
+
+	if (!matched) {
+		/*
+		 * We split periods of matched TSC writes into generations.
+		 * For each generation, we track the original measured
+		 * nanosecond time, offset, and write, so if TSCs are in
+		 * sync, we can match exact offset, and if not, we can match
+		 * exact software computation in compute_guest_tsc()
+		 *
+		 * These values are tracked in kvm->arch.cur_xxx variables.
+		 */
+		kvm->arch.cur_tsc_generation++;
+		kvm->arch.cur_tsc_nsec = ns;
+		kvm->arch.cur_tsc_write = tsc;
+		kvm->arch.cur_tsc_offset = offset;
+
+		spin_lock(&kvm->arch.pvclock_gtod_sync_lock);
+		kvm->arch.nr_vcpus_matched_tsc = 0;
+	} else if (!already_matched) {
+		spin_lock(&kvm->arch.pvclock_gtod_sync_lock);
+		kvm->arch.nr_vcpus_matched_tsc++;
+	}
+
+	kvm_track_tsc_matching(vcpu);
+	spin_unlock(&kvm->arch.pvclock_gtod_sync_lock);
+}
+
 static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data)
 {
 	struct kvm *kvm = vcpu->kvm;
 	u64 offset, ns, elapsed;
 	unsigned long flags;
-	bool matched;
-	bool already_matched;
+	bool matched = false;
 	bool synchronizing = false;
 
 	raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags);
@@ -2499,50 +2557,9 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data)
 			offset = kvm_compute_l1_tsc_offset(vcpu, data);
 		}
 		matched = true;
-		already_matched = (vcpu->arch.this_tsc_generation == kvm->arch.cur_tsc_generation);
-	} else {
-		/*
-		 * We split periods of matched TSC writes into generations.
-		 * For each generation, we track the original measured
-		 * nanosecond time, offset, and write, so if TSCs are in
-		 * sync, we can match exact offset, and if not, we can match
-		 * exact software computation in compute_guest_tsc()
-		 *
-		 * These values are tracked in kvm->arch.cur_xxx variables.
-		 */
-		kvm->arch.cur_tsc_generation++;
-		kvm->arch.cur_tsc_nsec = ns;
-		kvm->arch.cur_tsc_write = data;
-		kvm->arch.cur_tsc_offset = offset;
-		matched = false;
 	}
 
-	/*
-	 * We also track th most recent recorded KHZ, write and time to
-	 * allow the matching interval to be extended at each write.
-	 */
-	kvm->arch.last_tsc_nsec = ns;
-	kvm->arch.last_tsc_write = data;
-	kvm->arch.last_tsc_khz = vcpu->arch.virtual_tsc_khz;
-
-	vcpu->arch.last_guest_tsc = data;
-
-	/* Keep track of which generation this VCPU has synchronized to */
-	vcpu->arch.this_tsc_generation = kvm->arch.cur_tsc_generation;
-	vcpu->arch.this_tsc_nsec = kvm->arch.cur_tsc_nsec;
-	vcpu->arch.this_tsc_write = kvm->arch.cur_tsc_write;
-
-	kvm_vcpu_write_tsc_offset(vcpu, offset);
-
-	spin_lock_irqsave(&kvm->arch.pvclock_gtod_sync_lock, flags);
-	if (!matched) {
-		kvm->arch.nr_vcpus_matched_tsc = 0;
-	} else if (!already_matched) {
-		kvm->arch.nr_vcpus_matched_tsc++;
-	}
-
-	kvm_track_tsc_matching(vcpu);
-	spin_unlock_irqrestore(&kvm->arch.pvclock_gtod_sync_lock, flags);
+	__kvm_synchronize_tsc(vcpu, offset, data, ns, matched);
 	raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
 }
 
-- 
2.33.0.rc1.237.g0d66db33f3-goog



* [PATCH v7 6/6] KVM: x86: Expose TSC offset controls to userspace
  2021-08-16  0:11 [PATCH v7 0/6] KVM: x86: Add idempotent controls for migrating system counter state Oliver Upton
                   ` (4 preceding siblings ...)
  2021-08-16  0:11 ` [PATCH v7 5/6] KVM: x86: Refactor tsc synchronization code Oliver Upton
@ 2021-08-16  0:11 ` Oliver Upton
  2021-08-23 20:56   ` Oliver Upton
  2021-09-02 19:23 ` [PATCH v7 0/6] KVM: x86: Add idempotent controls for migrating system counter state Sean Christopherson
  6 siblings, 1 reply; 20+ messages in thread
From: Oliver Upton @ 2021-08-16  0:11 UTC

To date, VMM-directed TSC synchronization and migration have been a bit
messy. KVM has some baked-in heuristics around TSC writes to infer if
the VMM is attempting to synchronize. This is problematic, as it depends
on host userspace writing to the guest's TSC within 1 second of the last
write.

A much cleaner approach to configuring the guest's views of the TSC is to
simply migrate the TSC offset for every vCPU. Offsets are idempotent,
and thus not subject to change depending on when the VMM actually
reads/writes values from/to KVM. The VMM can then read the TSC once with
KVM_GET_CLOCK to capture a (realtime, host_tsc) pair at the instant when
the guest is paused.
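
One way a destination VMM might turn the recorded values into a new
per-vCPU offset is sketched below. This is an assumption about the
arithmetic, not code from this series: the helper names are hypothetical,
t_0/k_0 are the host TSC and kvmclock nanoseconds recorded on the source,
t_1/k_1 are the same readings taken on the destination, and the guest TSC
frequency is taken to be in kHz, as KVM_GET_TSC_KHZ reports it.

```c
#include <stdint.h>

/*
 * Convert nanoseconds to TSC cycles at freq_khz (cycles = ns * kHz / 10^6).
 * The nanosecond count is split so the intermediate products stay small
 * for realistic inputs.
 */
static uint64_t sketch_ns_to_cycles(uint64_t ns, uint64_t freq_khz)
{
	return (ns / 1000000) * freq_khz +
	       (ns % 1000000) * freq_khz / 1000000;
}

/*
 * Offset to program on the destination so that the guest TSC continues
 * from its source value plus the cycles that elapsed during migration.
 */
static uint64_t sketch_dest_offset(uint64_t t_0, uint64_t off,
				   uint64_t k_0, uint64_t k_1,
				   uint64_t freq_khz, uint64_t t_1)
{
	uint64_t guest_tsc_then = t_0 + off;
	uint64_t elapsed_cycles = sketch_ns_to_cycles(k_1 - k_0, freq_khz);

	/* Unsigned subtraction wraps, which encodes a negative offset. */
	return guest_tsc_then + elapsed_cycles - t_1;
}
```

Because the result depends only on recorded values, the VMM can be
preempted between computing the offset and writing it without changing
what the guest observes.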

Cc: David Matlack <dmatlack@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
---
 Documentation/virt/kvm/devices/vcpu.rst |  57 +++++++++++++
 arch/x86/include/asm/kvm_host.h         |   1 +
 arch/x86/include/uapi/asm/kvm.h         |   4 +
 arch/x86/kvm/x86.c                      | 109 ++++++++++++++++++++++++
 4 files changed, 171 insertions(+)

diff --git a/Documentation/virt/kvm/devices/vcpu.rst b/Documentation/virt/kvm/devices/vcpu.rst
index 2acec3b9ef65..3b399d727c11 100644
--- a/Documentation/virt/kvm/devices/vcpu.rst
+++ b/Documentation/virt/kvm/devices/vcpu.rst
@@ -161,3 +161,60 @@ Specifies the base address of the stolen time structure for this VCPU. The
 base address must be 64 byte aligned and exist within a valid guest memory
 region. See Documentation/virt/kvm/arm/pvtime.rst for more information
 including the layout of the stolen time structure.
+
+4. GROUP: KVM_VCPU_TSC_CTRL
+===========================
+
+:Architectures: x86
+
+4.1 ATTRIBUTE: KVM_VCPU_TSC_OFFSET
+
+:Parameters: 64-bit unsigned TSC offset
+
+Returns:
+
+	 ======= ======================================
+	 -EFAULT Error reading/writing the provided
+		 parameter address.
+	 -ENXIO  Attribute not supported
+	 ======= ======================================
+
+Specifies the guest's TSC offset relative to the host's TSC. The guest's
+TSC is then derived by the following equation:
+
+  guest_tsc = host_tsc + KVM_VCPU_TSC_OFFSET
+
+This attribute is useful for the precise migration of a guest's TSC. The
+following describes a possible algorithm to use for the migration of a
+guest's TSC:
+
+From the source VMM process:
+
+1. Invoke the KVM_GET_CLOCK ioctl to record the host TSC (t_0),
+   kvmclock nanoseconds (k_0), and realtime nanoseconds (r_0).
+
+2. Read the KVM_VCPU_TSC_OFFSET attribute for every vCPU to record the
+   guest TSC offset (off_n).
+
+3. Invoke the KVM_GET_TSC_KHZ ioctl to record the frequency of the
+   guest's TSC (freq).
+
+From the destination VMM process:
+
+4. Invoke the KVM_SET_CLOCK ioctl, providing the kvmclock nanoseconds
+   (k_0) and realtime nanoseconds (r_0) in their respective fields.
+   Ensure that the KVM_CLOCK_REALTIME flag is set in the provided
+   structure. KVM will advance the VM's kvmclock to account for elapsed
+   time since recording the clock values.
+
+5. Invoke the KVM_GET_CLOCK ioctl to record the host TSC (t_1) and
+   kvmclock nanoseconds (k_1).
+
+6. Adjust the guest TSC offsets for every vCPU to account for (1) time
+   elapsed since recording state and (2) difference in TSCs between the
+   source and destination machine:
+
+   new_off_n = t_0 + off_n + (k_1 - k_0) * freq - t_1
+
+7. Write the KVM_VCPU_TSC_OFFSET attribute for every vCPU with the
+   respective value derived in the previous step.
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7fad2615f4a9..376b26a294c9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1071,6 +1071,7 @@ struct kvm_arch {
 	u64 last_tsc_nsec;
 	u64 last_tsc_write;
 	u32 last_tsc_khz;
+	u64 last_tsc_offset;
 	u64 cur_tsc_nsec;
 	u64 cur_tsc_write;
 	u64 cur_tsc_offset;
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index a6c327f8ad9e..0b22e1e84e78 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -503,4 +503,8 @@ struct kvm_pmu_event_filter {
 #define KVM_PMU_EVENT_ALLOW 0
 #define KVM_PMU_EVENT_DENY 1
 
+/* for KVM_{GET,SET,HAS}_DEVICE_ATTR */
+#define KVM_VCPU_TSC_CTRL 0 /* control group for the timestamp counter (TSC) */
+#define   KVM_VCPU_TSC_OFFSET 0 /* attribute for the TSC offset */
+
 #endif /* _ASM_X86_KVM_H */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9d0445527dad..0b1398d439c0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2470,6 +2470,7 @@ static void __kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 offset, u64 tsc,
 	kvm->arch.last_tsc_nsec = ns;
 	kvm->arch.last_tsc_write = tsc;
 	kvm->arch.last_tsc_khz = vcpu->arch.virtual_tsc_khz;
+	kvm->arch.last_tsc_offset = offset;
 
 	vcpu->arch.last_guest_tsc = tsc;
 
@@ -4923,6 +4924,109 @@ static int kvm_set_guest_paused(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+static int kvm_arch_tsc_has_attr(struct kvm_vcpu *vcpu,
+				 struct kvm_device_attr *attr)
+{
+	int r;
+
+	switch (attr->attr) {
+	case KVM_VCPU_TSC_OFFSET:
+		r = 0;
+		break;
+	default:
+		r = -ENXIO;
+	}
+
+	return r;
+}
+
+static int kvm_arch_tsc_get_attr(struct kvm_vcpu *vcpu,
+				 struct kvm_device_attr *attr)
+{
+	u64 __user *uaddr = (u64 __user *)attr->addr;
+	int r;
+
+	switch (attr->attr) {
+	case KVM_VCPU_TSC_OFFSET:
+		r = -EFAULT;
+		if (put_user(vcpu->arch.l1_tsc_offset, uaddr))
+			break;
+		r = 0;
+		break;
+	default:
+		r = -ENXIO;
+	}
+
+	return r;
+}
+
+static int kvm_arch_tsc_set_attr(struct kvm_vcpu *vcpu,
+				 struct kvm_device_attr *attr)
+{
+	u64 __user *uaddr = (u64 __user *)attr->addr;
+	struct kvm *kvm = vcpu->kvm;
+	int r;
+
+	switch (attr->attr) {
+	case KVM_VCPU_TSC_OFFSET: {
+		u64 offset, tsc, ns;
+		unsigned long flags;
+		bool matched;
+
+		r = -EFAULT;
+		if (get_user(offset, uaddr))
+			break;
+
+		raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags);
+
+		matched = (vcpu->arch.virtual_tsc_khz &&
+			   kvm->arch.last_tsc_khz == vcpu->arch.virtual_tsc_khz &&
+			   kvm->arch.last_tsc_offset == offset);
+
+		tsc = kvm_scale_tsc(vcpu, rdtsc(), vcpu->arch.l1_tsc_scaling_ratio) + offset;
+		ns = get_kvmclock_base_ns();
+
+		__kvm_synchronize_tsc(vcpu, offset, tsc, ns, matched);
+		raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
+
+		r = 0;
+		break;
+	}
+	default:
+		r = -ENXIO;
+	}
+
+	return r;
+}
+
+static int kvm_vcpu_ioctl_device_attr(struct kvm_vcpu *vcpu,
+				      unsigned int ioctl,
+				      void __user *argp)
+{
+	struct kvm_device_attr attr;
+	int r;
+
+	if (copy_from_user(&attr, argp, sizeof(attr)))
+		return -EFAULT;
+
+	if (attr.group != KVM_VCPU_TSC_CTRL)
+		return -ENXIO;
+
+	switch (ioctl) {
+	case KVM_HAS_DEVICE_ATTR:
+		r = kvm_arch_tsc_has_attr(vcpu, &attr);
+		break;
+	case KVM_GET_DEVICE_ATTR:
+		r = kvm_arch_tsc_get_attr(vcpu, &attr);
+		break;
+	case KVM_SET_DEVICE_ATTR:
+		r = kvm_arch_tsc_set_attr(vcpu, &attr);
+		break;
+	}
+
+	return r;
+}
+
 static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
 				     struct kvm_enable_cap *cap)
 {
@@ -5377,6 +5481,11 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 		r = __set_sregs2(vcpu, u.sregs2);
 		break;
 	}
+	case KVM_HAS_DEVICE_ATTR:
+	case KVM_GET_DEVICE_ATTR:
+	case KVM_SET_DEVICE_ATTR:
+		r = kvm_vcpu_ioctl_device_attr(vcpu, ioctl, argp);
+		break;
 	default:
 		r = -EINVAL;
 	}
-- 
2.33.0.rc1.237.g0d66db33f3-goog
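
For readers following the algorithm documented in the vcpu.rst hunk above,
here is a minimal sketch of the arithmetic in step 6. It is not part of the
patch. The struct and function names are hypothetical. It also assumes that,
because kvmclock values are in nanoseconds and KVM_GET_TSC_KHZ reports kHz,
the elapsed time has to be converted to ticks as ns * kHz / 10^6 before it
is added; that reading of the units is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* State recorded by the source VMM in steps 1-3 (struct name is hypothetical). */
struct tsc_migration_state {
	uint64_t t_0;      /* host TSC reported by KVM_GET_CLOCK */
	uint64_t k_0;      /* kvmclock nanoseconds reported by KVM_GET_CLOCK */
	uint64_t freq_khz; /* guest TSC frequency from KVM_GET_TSC_KHZ */
};

/* Convert nanoseconds to TSC ticks: ns * kHz / 10^6, split to limit overflow. */
uint64_t ns_to_tsc_ticks(uint64_t ns, uint64_t freq_khz)
{
	return (ns / 1000000) * freq_khz + (ns % 1000000) * freq_khz / 1000000;
}

/*
 * Step 6: new_off_n = t_0 + off_n + elapsed_ticks - t_1.
 * Unsigned arithmetic wraps modulo 2^64, so a "negative" offset comes out
 * as its two's complement representation.
 */
uint64_t tsc_new_offset(const struct tsc_migration_state *src, uint64_t off_n,
			uint64_t t_1, uint64_t k_1)
{
	assert(src->freq_khz > 0);
	assert(k_1 >= src->k_0); /* kvmclock is not stepped backwards */

	return src->t_0 + off_n +
	       ns_to_tsc_ticks(k_1 - src->k_0, src->freq_khz) - t_1;
}
```

The destination VMM would call tsc_new_offset() once per vCPU with that
vCPU's off_n, reusing the single (t_1, k_1) pair read in step 5.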


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v7 1/6] KVM: x86: Fix potential race in KVM_GET_CLOCK
  2021-08-16  0:11 ` [PATCH v7 1/6] KVM: x86: Fix potential race in KVM_GET_CLOCK Oliver Upton
@ 2021-08-19 18:24   ` Marcelo Tosatti
  2021-08-20 18:22     ` Oliver Upton
  0 siblings, 1 reply; 20+ messages in thread
From: Marcelo Tosatti @ 2021-08-19 18:24 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvm, kvmarm, Paolo Bonzini, Sean Christopherson, Marc Zyngier,
	Peter Shier, Jim Mattson, David Matlack, Ricardo Koller,
	Jing Zhang, Raghavendra Rao Anata, James Morse, Alexandru Elisei,
	Suzuki K Poulose, linux-arm-kernel, Andrew Jones, Will Deacon,
	Catalin Marinas

On Mon, Aug 16, 2021 at 12:11:25AM +0000, Oliver Upton wrote:
> Sean noticed that KVM_GET_CLOCK was checking kvm_arch.use_master_clock
> outside of the pvclock sync lock. This is problematic, as the clock
> value written to the user may or may not actually correspond to a stable
> TSC.
> 
> Fix the race by populating the entire kvm_clock_data structure behind
> the pvclock_gtod_sync_lock.

Oliver, 

Can you please describe the race in more detail?

Is it about the host TSC going unstable vs. a parallel KVM_GET_CLOCK?


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v7 3/6] KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK
  2021-08-16  0:11 ` [PATCH v7 3/6] KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK Oliver Upton
@ 2021-08-20 12:46   ` Marcelo Tosatti
  2021-09-24  8:30     ` Paolo Bonzini
  0 siblings, 1 reply; 20+ messages in thread
From: Marcelo Tosatti @ 2021-08-20 12:46 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvm, kvmarm, Paolo Bonzini, Sean Christopherson, Marc Zyngier,
	Peter Shier, Jim Mattson, David Matlack, Ricardo Koller,
	Jing Zhang, Raghavendra Rao Anata, James Morse, Alexandru Elisei,
	Suzuki K Poulose, linux-arm-kernel, Andrew Jones, Will Deacon,
	Catalin Marinas

On Mon, Aug 16, 2021 at 12:11:27AM +0000, Oliver Upton wrote:
> Handling the migration of TSCs correctly is difficult, in part because
> Linux does not provide userspace with the ability to retrieve a (TSC,
> realtime) clock pair for a single instant in time. In lieu of a more
> convenient facility, KVM can report similar information in the kvm_clock
> structure.
> 
> Provide userspace with a host TSC & realtime pair iff the realtime clock
> is based on the TSC. If userspace provides KVM_SET_CLOCK with a valid
> realtime value, advance the KVM clock by the amount of elapsed time. Do
> not step the KVM clock backwards, though, as it is a monotonic
> oscillator.
> 
> Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Oliver Upton <oupton@google.com>

This is a good idea. Userspace could check whether the source and
destination clocks are within a certain difference of each other, and
not use the feature otherwise.

Is there a qemu patch for it?


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v7 1/6] KVM: x86: Fix potential race in KVM_GET_CLOCK
  2021-08-19 18:24   ` Marcelo Tosatti
@ 2021-08-20 18:22     ` Oliver Upton
  0 siblings, 0 replies; 20+ messages in thread
From: Oliver Upton @ 2021-08-20 18:22 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: kvm, kvmarm, Paolo Bonzini, Sean Christopherson, Marc Zyngier,
	Peter Shier, Jim Mattson, David Matlack, Ricardo Koller,
	Jing Zhang, Raghavendra Rao Anata, James Morse, Alexandru Elisei,
	Suzuki K Poulose, linux-arm-kernel, Andrew Jones, Will Deacon,
	Catalin Marinas

On Thu, Aug 19, 2021 at 11:43 AM Marcelo Tosatti <mtosatti@redhat.com> wrote:
>
> On Mon, Aug 16, 2021 at 12:11:25AM +0000, Oliver Upton wrote:
> > Sean noticed that KVM_GET_CLOCK was checking kvm_arch.use_master_clock
> > outside of the pvclock sync lock. This is problematic, as the clock
> > value written to the user may or may not actually correspond to a stable
> > TSC.
> >
> > Fix the race by populating the entire kvm_clock_data structure behind
> > the pvclock_gtod_sync_lock.
>
> Oliver,
>
> Can you please describe the race in more detail?
>
> Is it about the host TSC going unstable vs. a parallel KVM_GET_CLOCK?
>

Yeah, pretty much any event that causes us to set use_master_clock =
false could interleave with the KVM_GET_CLOCK ioctl. A guest could
kick its TSCs out of sync, for example, to cause this too. AFAICT, KVM
serializes the write side (pvclock_update_vm_gtod_copy()) with
pvclock_gtod_sync_lock, as it should.
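
As a rough userspace illustration of the fix (not the kernel code), the
reader takes the same lock as the writer and copies both the flag and the
clock value in one critical section, so the pair it returns is always
consistent. The struct and function names here are hypothetical, and a
pthread mutex stands in for pvclock_gtod_sync_lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

struct clock_model {
	pthread_mutex_t lock;     /* stands in for pvclock_gtod_sync_lock */
	bool use_master_clock;
	uint64_t master_clock_ns;
};

struct clock_snapshot {
	bool stable;
	uint64_t ns;
};

/* Write side: flag and value change together under the lock. */
void clock_model_update(struct clock_model *c, bool stable, uint64_t ns)
{
	assert(c != NULL);
	pthread_mutex_lock(&c->lock);
	c->use_master_clock = stable;
	c->master_clock_ns = ns;
	pthread_mutex_unlock(&c->lock);
}

/* Read side: populate the whole snapshot under the same lock. */
struct clock_snapshot clock_model_read(struct clock_model *c)
{
	struct clock_snapshot snap;

	assert(c != NULL);
	pthread_mutex_lock(&c->lock);
	snap.stable = c->use_master_clock;
	snap.ns = c->master_clock_ns;
	pthread_mutex_unlock(&c->lock);

	return snap;
}
```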

--
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v7 6/6] KVM: x86: Expose TSC offset controls to userspace
  2021-08-16  0:11 ` [PATCH v7 6/6] KVM: x86: Expose TSC offset controls to userspace Oliver Upton
@ 2021-08-23 20:56   ` Oliver Upton
  2021-08-26 12:48     ` Marcelo Tosatti
  0 siblings, 1 reply; 20+ messages in thread
From: Oliver Upton @ 2021-08-23 20:56 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Paolo Bonzini, Sean Christopherson, Marc Zyngier, Peter Shier,
	Jim Mattson, David Matlack, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata, James Morse, Alexandru Elisei,
	Suzuki K Poulose, linux-arm-kernel, Andrew Jones, Will Deacon,
	Catalin Marinas

Paolo,

On Sun, Aug 15, 2021 at 5:11 PM Oliver Upton <oupton@google.com> wrote:
>
> To date, VMM-directed TSC synchronization and migration has been a bit
> messy. KVM has some baked-in heuristics around TSC writes to infer if
> the VMM is attempting to synchronize. This is problematic, as it depends
> on host userspace writing to the guest's TSC within 1 second of the last
> write.
>
> A much cleaner approach to configuring the guest's views of the TSC is to
> simply migrate the TSC offset for every vCPU. Offsets are idempotent,
> and thus not subject to change depending on when the VMM actually
> reads/writes values from/to KVM. The VMM can then read the TSC once with
> KVM_GET_CLOCK to capture a (realtime, host_tsc) pair at the instant when
> the guest is paused.
>
> Cc: David Matlack <dmatlack@google.com>
> Cc: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Oliver Upton <oupton@google.com>

Could you please squash the following into this patch? We need to
advertise KVM_CAP_VCPU_ATTRIBUTES to userspace. Otherwise, happy to
resend.

Thanks,
Oliver

 arch/x86/kvm/x86.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b946430faaae..b5be1ca07704 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4070,6 +4070,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
        case KVM_CAP_VM_COPY_ENC_CONTEXT_FROM:
        case KVM_CAP_SREGS2:
        case KVM_CAP_EXIT_ON_EMULATION_FAILURE:
+       case KVM_CAP_VCPU_ATTRIBUTES:
                r = 1;
                break;
        case KVM_CAP_EXIT_HYPERCALL:
-- 
2.33.0.rc2.250.ged5fa647cd-goog
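
For anyone wiring this up in a VMM, a minimal sketch of how userspace might
read and write the new attribute through the vCPU device attribute ioctls.
The helper names are hypothetical and not part of the series. The fallback
defines copy the values from the uapi hunk in this patch, for building
against headers that predate it. A VMM would presumably also check
KVM_CAP_VCPU_ATTRIBUTES before relying on this.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Values taken from this patch, for headers that do not define them yet. */
#ifndef KVM_VCPU_TSC_CTRL
#define KVM_VCPU_TSC_CTRL 0
#endif
#ifndef KVM_VCPU_TSC_OFFSET
#define KVM_VCPU_TSC_OFFSET 0
#endif

/* Hypothetical helper: read the vCPU's TSC offset. Returns 0 or -errno. */
int vcpu_tsc_offset_get(int vcpu_fd, uint64_t *offset)
{
	struct kvm_device_attr attr = {
		.group = KVM_VCPU_TSC_CTRL,
		.attr = KVM_VCPU_TSC_OFFSET,
		.addr = (uint64_t)(uintptr_t)offset,
	};

	assert(offset != NULL);

	/* Probe first so an unsupported kernel fails cleanly with -ENXIO. */
	if (ioctl(vcpu_fd, KVM_HAS_DEVICE_ATTR, &attr) < 0)
		return -errno;
	if (ioctl(vcpu_fd, KVM_GET_DEVICE_ATTR, &attr) < 0)
		return -errno;

	return 0;
}

/* Hypothetical helper: write the vCPU's TSC offset. Returns 0 or -errno. */
int vcpu_tsc_offset_set(int vcpu_fd, uint64_t offset)
{
	struct kvm_device_attr attr = {
		.group = KVM_VCPU_TSC_CTRL,
		.attr = KVM_VCPU_TSC_OFFSET,
		.addr = (uint64_t)(uintptr_t)&offset,
	};

	if (ioctl(vcpu_fd, KVM_SET_DEVICE_ATTR, &attr) < 0)
		return -errno;

	return 0;
}
```

Both helpers take the file descriptor returned by KVM_CREATE_VCPU.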

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v7 6/6] KVM: x86: Expose TSC offset controls to userspace
  2021-08-23 20:56   ` Oliver Upton
@ 2021-08-26 12:48     ` Marcelo Tosatti
  2021-08-26 20:27       ` Oliver Upton
  0 siblings, 1 reply; 20+ messages in thread
From: Marcelo Tosatti @ 2021-08-26 12:48 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvm, kvmarm, Paolo Bonzini, Sean Christopherson, Marc Zyngier,
	Peter Shier, Jim Mattson, David Matlack, Ricardo Koller,
	Jing Zhang, Raghavendra Rao Anata, James Morse, Alexandru Elisei,
	Suzuki K Poulose, linux-arm-kernel, Andrew Jones, Will Deacon,
	Catalin Marinas

On Mon, Aug 23, 2021 at 01:56:30PM -0700, Oliver Upton wrote:
> Paolo,
> 
> On Sun, Aug 15, 2021 at 5:11 PM Oliver Upton <oupton@google.com> wrote:
> >
> > To date, VMM-directed TSC synchronization and migration has been a bit
> > messy. KVM has some baked-in heuristics around TSC writes to infer if
> > the VMM is attempting to synchronize. This is problematic, as it depends
> > on host userspace writing to the guest's TSC within 1 second of the last
> > write.
> >
> > A much cleaner approach to configuring the guest's views of the TSC is to
> > simply migrate the TSC offset for every vCPU. Offsets are idempotent,
> > and thus not subject to change depending on when the VMM actually
> > reads/writes values from/to KVM. The VMM can then read the TSC once with
> > KVM_GET_CLOCK to capture a (realtime, host_tsc) pair at the instant when
> > the guest is paused.
> >
> > Cc: David Matlack <dmatlack@google.com>
> > Cc: Sean Christopherson <seanjc@google.com>
> > Signed-off-by: Oliver Upton <oupton@google.com>
> 
> Could you please squash the following into this patch? We need to
> advertise KVM_CAP_VCPU_ATTRIBUTES to userspace. Otherwise, happy to
> resend.
> 
> Thanks,
> Oliver

Oliver,

Is there QEMU support for this, or are you using your own
userspace with this?


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v7 6/6] KVM: x86: Expose TSC offset controls to userspace
  2021-08-26 12:48     ` Marcelo Tosatti
@ 2021-08-26 20:27       ` Oliver Upton
  0 siblings, 0 replies; 20+ messages in thread
From: Oliver Upton @ 2021-08-26 20:27 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: kvm, kvmarm, Paolo Bonzini, Sean Christopherson, Marc Zyngier,
	Peter Shier, Jim Mattson, David Matlack, Ricardo Koller,
	Jing Zhang, Raghavendra Rao Anata, James Morse, Alexandru Elisei,
	Suzuki K Poulose, linux-arm-kernel, Andrew Jones, Will Deacon,
	Catalin Marinas

Marcelo,

On Thu, Aug 26, 2021 at 09:48:36AM -0300, Marcelo Tosatti wrote:
> On Mon, Aug 23, 2021 at 01:56:30PM -0700, Oliver Upton wrote:
> > Paolo,
> > 
> > On Sun, Aug 15, 2021 at 5:11 PM Oliver Upton <oupton@google.com> wrote:
> > >
> > > To date, VMM-directed TSC synchronization and migration has been a bit
> > > messy. KVM has some baked-in heuristics around TSC writes to infer if
> > > the VMM is attempting to synchronize. This is problematic, as it depends
> > > on host userspace writing to the guest's TSC within 1 second of the last
> > > write.
> > >
> > > A much cleaner approach to configuring the guest's views of the TSC is to
> > > simply migrate the TSC offset for every vCPU. Offsets are idempotent,
> > > and thus not subject to change depending on when the VMM actually
> > > reads/writes values from/to KVM. The VMM can then read the TSC once with
> > > KVM_GET_CLOCK to capture a (realtime, host_tsc) pair at the instant when
> > > the guest is paused.
> > >
> > > Cc: David Matlack <dmatlack@google.com>
> > > Cc: Sean Christopherson <seanjc@google.com>
> > > Signed-off-by: Oliver Upton <oupton@google.com>
> > 
> > Could you please squash the following into this patch? We need to
> > advertise KVM_CAP_VCPU_ATTRIBUTES to userspace. Otherwise, happy to
> > resend.
> > 
> > Thanks,
> > Oliver
> 
> Oliver,
> 
> Is there QEMU support for this, or are you using your own
> userspace with this?

Apologies for not getting back to you on your first mail. Sadly, I am
using our own userspace for this. That being said, adding support to
QEMU shouldn't be too challenging. I can take a stab at it if it makes
the series more amenable to upstream, with the giant disclaimer that I
haven't done work in QEMU before. Otherwise, happy to review someone
else's implementation.

--
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v7 5/6] KVM: x86: Refactor tsc synchronization code
  2021-08-16  0:11 ` [PATCH v7 5/6] KVM: x86: Refactor tsc synchronization code Oliver Upton
@ 2021-09-02 19:21   ` Sean Christopherson
  2021-09-02 19:41     ` Oliver Upton
  2021-09-24  9:28     ` Paolo Bonzini
  0 siblings, 2 replies; 20+ messages in thread
From: Sean Christopherson @ 2021-09-02 19:21 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvm, kvmarm, Paolo Bonzini, Marc Zyngier, Peter Shier,
	Jim Mattson, David Matlack, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata, James Morse, Alexandru Elisei,
	Suzuki K Poulose, linux-arm-kernel, Andrew Jones, Will Deacon,
	Catalin Marinas

On Mon, Aug 16, 2021, Oliver Upton wrote:
> Refactor kvm_synchronize_tsc to make a new function that allows callers
> to specify TSC parameters (offset, value, nanoseconds, etc.) explicitly
> for the sake of participating in TSC synchronization.
> 
> Signed-off-by: Oliver Upton <oupton@google.com>
> ---
> +	struct kvm *kvm = vcpu->kvm;
> +	bool already_matched;
> +
> +	lockdep_assert_held(&kvm->arch.tsc_write_lock);
> +
> +	already_matched =
> +	       (vcpu->arch.this_tsc_generation == kvm->arch.cur_tsc_generation);
> +

...

> +	if (!matched) {
> +		/*
> +		 * We split periods of matched TSC writes into generations.
> +		 * For each generation, we track the original measured
> +		 * nanosecond time, offset, and write, so if TSCs are in
> +		 * sync, we can match exact offset, and if not, we can match
> +		 * exact software computation in compute_guest_tsc()
> +		 *
> +		 * These values are tracked in kvm->arch.cur_xxx variables.
> +		 */
> +		kvm->arch.cur_tsc_generation++;
> +		kvm->arch.cur_tsc_nsec = ns;
> +		kvm->arch.cur_tsc_write = tsc;
> +		kvm->arch.cur_tsc_offset = offset;
> +
> +		spin_lock(&kvm->arch.pvclock_gtod_sync_lock);
> +		kvm->arch.nr_vcpus_matched_tsc = 0;
> +	} else if (!already_matched) {
> +		spin_lock(&kvm->arch.pvclock_gtod_sync_lock);
> +		kvm->arch.nr_vcpus_matched_tsc++;
> +	}
> +
> +	kvm_track_tsc_matching(vcpu);
> +	spin_unlock(&kvm->arch.pvclock_gtod_sync_lock);

This unlock is imbalanced if matched and already_matched are both true.  It's not
immediately obvious that that _can't_ happen, and if it truly can't happen then
conditionally locking is pointless (because it's not actually conditional).

The previous code took the lock unconditionally, I don't see a strong argument
to change that, e.g. holding it for a few extra cycles while kvm->arch.cur_tsc_*
are updated is unlikely to be noticeable.

If you really want to delay taking the lock, you could do

	if (!matched) {
		kvm->arch.cur_tsc_generation++;
		kvm->arch.cur_tsc_nsec = ns;
		kvm->arch.cur_tsc_write = tsc;
		kvm->arch.cur_tsc_offset = offset;
	}

	spin_lock(&kvm->arch.pvclock_gtod_sync_lock);
	if (!matched)
		kvm->arch.nr_vcpus_matched_tsc = 0;
	else if (!already_matched)
		kvm->arch.nr_vcpus_matched_tsc++;
	spin_unlock(&kvm->arch.pvclock_gtod_sync_lock);

or if you want to get greedy

	if (!matched || !already_matched) {
		spin_lock(&kvm->arch.pvclock_gtod_sync_lock);
		if (!matched)
			kvm->arch.nr_vcpus_matched_tsc = 0;
		else
			kvm->arch.nr_vcpus_matched_tsc++;
		spin_unlock(&kvm->arch.pvclock_gtod_sync_lock);
	}

Though I'm not sure the minor complexity is worth avoiding spinlock contention.
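
To make the balanced variant concrete, a minimal userspace sketch of the
first alternative above. It is a model, not kernel code: a pthread mutex
stands in for pvclock_gtod_sync_lock, the struct and function names are
hypothetical, and the kvm_track_tsc_matching() call is left out. The lock
is taken and released on every path, including when matched and
already_matched are both true.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct tsc_sync_model {
	pthread_mutex_t gtod_lock;     /* stands in for pvclock_gtod_sync_lock */
	unsigned long cur_generation;  /* not guarded by gtod_lock */
	unsigned int nr_vcpus_matched; /* guarded by gtod_lock */
};

void tsc_sync_update(struct tsc_sync_model *m, bool matched,
		     bool already_matched)
{
	assert(m != NULL);

	/* Start a new generation outside the inner lock. */
	if (!matched)
		m->cur_generation++;

	/* Unconditional lock/unlock pair: no path can unlock without locking. */
	pthread_mutex_lock(&m->gtod_lock);
	if (!matched)
		m->nr_vcpus_matched = 0;
	else if (!already_matched)
		m->nr_vcpus_matched++;
	pthread_mutex_unlock(&m->gtod_lock);
}
```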

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v7 4/6] KVM: x86: Take the pvclock sync lock behind the tsc_write_lock
  2021-08-16  0:11 ` [PATCH v7 4/6] KVM: x86: Take the pvclock sync lock behind the tsc_write_lock Oliver Upton
@ 2021-09-02 19:22   ` Sean Christopherson
  0 siblings, 0 replies; 20+ messages in thread
From: Sean Christopherson @ 2021-09-02 19:22 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvm, kvmarm, Paolo Bonzini, Marc Zyngier, Peter Shier,
	Jim Mattson, David Matlack, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata, James Morse, Alexandru Elisei,
	Suzuki K Poulose, linux-arm-kernel, Andrew Jones, Will Deacon,
	Catalin Marinas

On Mon, Aug 16, 2021, Oliver Upton wrote:
> A later change requires that the pvclock sync lock be taken while
> holding the tsc_write_lock. Change the locking in kvm_synchronize_tsc()
> to align with the requirement to isolate the locking change to its own
> commit.
> 
> Cc: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Oliver Upton <oupton@google.com>
> ---
>  Documentation/virt/kvm/locking.rst | 11 +++++++++++
>  arch/x86/kvm/x86.c                 |  2 +-
>  2 files changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst
> index 8138201efb09..0bf346adac2a 100644
> --- a/Documentation/virt/kvm/locking.rst
> +++ b/Documentation/virt/kvm/locking.rst
> @@ -36,6 +36,9 @@ On x86:
>    holding kvm->arch.mmu_lock (typically with ``read_lock``, otherwise
>    there's no need to take kvm->arch.tdp_mmu_pages_lock at all).
>  
> +- kvm->arch.tsc_write_lock is taken outside
> +  kvm->arch.pvclock_gtod_sync_lock
> +
>  Everything else is a leaf: no other lock is taken inside the critical
>  sections.
>  
> @@ -222,6 +225,14 @@ time it will be set using the Dirty tracking mechanism described above.
>  :Comment:	'raw' because hardware enabling/disabling must be atomic /wrt
>  		migration.
>  
> +:Name:		kvm_arch::pvclock_gtod_sync_lock
> +:Type:		raw_spinlock_t
> +:Arch:		x86
> +:Protects:	kvm_arch::{cur_tsc_generation,cur_tsc_nsec,cur_tsc_write,
> +			cur_tsc_offset,nr_vcpus_matched_tsc}
> +:Comment:	'raw' because updating the kvm master clock must not be
> +		preempted.
> +
>  :Name:		kvm_arch::tsc_write_lock
>  :Type:		raw_spinlock
>  :Arch:		x86
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index b1e9a4885be6..f1434cd388b9 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2533,7 +2533,6 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data)
>  	vcpu->arch.this_tsc_write = kvm->arch.cur_tsc_write;
>  
>  	kvm_vcpu_write_tsc_offset(vcpu, offset);
> -	raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
>  
>  	spin_lock_irqsave(&kvm->arch.pvclock_gtod_sync_lock, flags);
>  	if (!matched) {
> @@ -2544,6 +2543,7 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data)
>  
>  	kvm_track_tsc_matching(vcpu);
>  	spin_unlock_irqrestore(&kvm->arch.pvclock_gtod_sync_lock, flags);

Drop the irqsave/irqrestore in this patch instead of doing so while refactoring
the code in the next patch.

> +	raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
>  }
>  
>  static inline void adjust_tsc_offset_guest(struct kvm_vcpu *vcpu,
> -- 
> 2.33.0.rc1.237.g0d66db33f3-goog
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v7 0/6] KVM: x86: Add idempotent controls for migrating system counter state
  2021-08-16  0:11 [PATCH v7 0/6] KVM: x86: Add idempotent controls for migrating system counter state Oliver Upton
                   ` (5 preceding siblings ...)
  2021-08-16  0:11 ` [PATCH v7 6/6] KVM: x86: Expose TSC offset controls to userspace Oliver Upton
@ 2021-09-02 19:23 ` Sean Christopherson
  2021-09-02 19:45   ` Oliver Upton
  6 siblings, 1 reply; 20+ messages in thread
From: Sean Christopherson @ 2021-09-02 19:23 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvm, kvmarm, Paolo Bonzini, Marc Zyngier, Peter Shier,
	Jim Mattson, David Matlack, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata, James Morse, Alexandru Elisei,
	Suzuki K Poulose, linux-arm-kernel, Andrew Jones, Will Deacon,
	Catalin Marinas

On Mon, Aug 16, 2021, Oliver Upton wrote:
> Applies cleanly to kvm/queue.
> 
> Parent commit: a3e0b8bd99ab ("KVM: MMU: change tracepoints arguments to kvm_page_fault")

This needs a rebase, patch 2 and presumably patch 3 conflict with commit
77fcbe823f00 ("KVM: x86: Prevent 'hv_clock->system_time' from going negative in
kvm_guest_time_update()").

> v6: https://lore.kernel.org/r/20210804085819.846610-1-oupton@google.com
> 
> v6 -> v7:
>  - Separated x86, arm64, and selftests into different series
>  - Rebased on top of kvm/queue
> 
> Oliver Upton (6):
>   KVM: x86: Fix potential race in KVM_GET_CLOCK
>   KVM: x86: Create helper methods for KVM_{GET,SET}_CLOCK ioctls
>   KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK
>   KVM: x86: Take the pvclock sync lock behind the tsc_write_lock
>   KVM: x86: Refactor tsc synchronization code
>   KVM: x86: Expose TSC offset controls to userspace
> 
>  Documentation/virt/kvm/api.rst          |  42 ++-
>  Documentation/virt/kvm/devices/vcpu.rst |  57 ++++
>  Documentation/virt/kvm/locking.rst      |  11 +
>  arch/x86/include/asm/kvm_host.h         |   4 +
>  arch/x86/include/uapi/asm/kvm.h         |   4 +
>  arch/x86/kvm/x86.c                      | 362 +++++++++++++++++-------
>  include/uapi/linux/kvm.h                |   7 +-
>  7 files changed, 378 insertions(+), 109 deletions(-)
> 
> -- 
> 2.33.0.rc1.237.g0d66db33f3-goog
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v7 5/6] KVM: x86: Refactor tsc synchronization code
  2021-09-02 19:21   ` Sean Christopherson
@ 2021-09-02 19:41     ` Oliver Upton
  2021-09-24  9:28     ` Paolo Bonzini
  1 sibling, 0 replies; 20+ messages in thread
From: Oliver Upton @ 2021-09-02 19:41 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, kvmarm, Paolo Bonzini, Marc Zyngier, Peter Shier,
	Jim Mattson, David Matlack, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata, James Morse, Alexandru Elisei,
	Suzuki K Poulose, linux-arm-kernel, Andrew Jones, Will Deacon,
	Catalin Marinas

On Thu, Sep 2, 2021 at 12:21 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, Aug 16, 2021, Oliver Upton wrote:
> > Refactor kvm_synchronize_tsc to make a new function that allows callers
> > to specify TSC parameters (offset, value, nanoseconds, etc.) explicitly
> > for the sake of participating in TSC synchronization.
> >
> > Signed-off-by: Oliver Upton <oupton@google.com>
> > ---
> > +     struct kvm *kvm = vcpu->kvm;
> > +     bool already_matched;
> > +
> > +     lockdep_assert_held(&kvm->arch.tsc_write_lock);
> > +
> > +     already_matched =
> > +            (vcpu->arch.this_tsc_generation == kvm->arch.cur_tsc_generation);
> > +
>
> ...
>
> > +     if (!matched) {
> > +             /*
> > +              * We split periods of matched TSC writes into generations.
> > +              * For each generation, we track the original measured
> > +              * nanosecond time, offset, and write, so if TSCs are in
> > +              * sync, we can match exact offset, and if not, we can match
> > +              * exact software computation in compute_guest_tsc()
> > +              *
> > +              * These values are tracked in kvm->arch.cur_xxx variables.
> > +              */
> > +             kvm->arch.cur_tsc_generation++;
> > +             kvm->arch.cur_tsc_nsec = ns;
> > +             kvm->arch.cur_tsc_write = tsc;
> > +             kvm->arch.cur_tsc_offset = offset;
> > +
> > +             spin_lock(&kvm->arch.pvclock_gtod_sync_lock);
> > +             kvm->arch.nr_vcpus_matched_tsc = 0;
> > +     } else if (!already_matched) {
> > +             spin_lock(&kvm->arch.pvclock_gtod_sync_lock);
> > +             kvm->arch.nr_vcpus_matched_tsc++;
> > +     }
> > +
> > +     kvm_track_tsc_matching(vcpu);
> > +     spin_unlock(&kvm->arch.pvclock_gtod_sync_lock);
>
> This unlock is imbalanced if matched and already_matched are both true.  It's not
> immediately obvious that that _can't_ happen, and if it truly can't happen then
> conditionally locking is pointless (because it's not actually conditional).
>
> The previous code took the lock unconditionally, I don't see a strong argument
> to change that, e.g. holding it for a few extra cycles while kvm->arch.cur_tsc_*
> are updated is unlikely to be noticeable.

We may have gone full circle here :-) You had said it was confusing to
hold the lock when updating kvm->arch.cur_tsc_* a while back. I do
still agree with that sentiment, but the conditional locking is odd.

> If you really want to delay taking the lock, you could do
>
>         if (!matched) {
>                 kvm->arch.cur_tsc_generation++;
>                 kvm->arch.cur_tsc_nsec = ns;
>                 kvm->arch.cur_tsc_write = tsc;
>                 kvm->arch.cur_tsc_offset = offset;
>         }
>
>         spin_lock(&kvm->arch.pvclock_gtod_sync_lock);
>         if (!matched)
>                 kvm->arch.nr_vcpus_matched_tsc = 0;
>         else if (!already_matched)
>                 kvm->arch.nr_vcpus_matched_tsc++;
>         spin_unlock(&kvm->arch.pvclock_gtod_sync_lock);

This seems the most readable, making it clear what is guarded and what
is not. I'll probably go this route.

--
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v7 0/6] KVM: x86: Add idempotent controls for migrating system counter state
  2021-09-02 19:23 ` [PATCH v7 0/6] KVM: x86: Add idempotent controls for migrating system counter state Sean Christopherson
@ 2021-09-02 19:45   ` Oliver Upton
  0 siblings, 0 replies; 20+ messages in thread
From: Oliver Upton @ 2021-09-02 19:45 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, kvmarm, Paolo Bonzini, Marc Zyngier, Peter Shier,
	Jim Mattson, David Matlack, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata, James Morse, Alexandru Elisei,
	Suzuki K Poulose, linux-arm-kernel, Andrew Jones, Will Deacon,
	Catalin Marinas

On Thu, Sep 2, 2021 at 12:23 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, Aug 16, 2021, Oliver Upton wrote:
> > Applies cleanly to kvm/queue.
> >
> > Parent commit: a3e0b8bd99ab ("KVM: MMU: change tracepoints arguments to kvm_page_fault")
>
> This needs a rebase, patch 2 and presumably patch 3 conflict with commit
> 77fcbe823f00 ("KVM: x86: Prevent 'hv_clock->system_time' from going negative in
> kvm_guest_time_update()").

Thanks for the heads up! I've been hands-off with this series for a
bit, as I saw Paolo was playing around with it to fold it with his
pvclock locking changes (branch kvm/paolo). I'll pick up your
suggestions and get another series out with Paolo's additions.

--
Thanks,
Oliver

> > v6: https://lore.kernel.org/r/20210804085819.846610-1-oupton@google.com
> >
> > v6 -> v7:
> >  - Separated x86, arm64, and selftests into different series
> >  - Rebased on top of kvm/queue
> >
> > Oliver Upton (6):
> >   KVM: x86: Fix potential race in KVM_GET_CLOCK
> >   KVM: x86: Create helper methods for KVM_{GET,SET}_CLOCK ioctls
> >   KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK
> >   KVM: x86: Take the pvclock sync lock behind the tsc_write_lock
> >   KVM: x86: Refactor tsc synchronization code
> >   KVM: x86: Expose TSC offset controls to userspace
> >
> >  Documentation/virt/kvm/api.rst          |  42 ++-
> >  Documentation/virt/kvm/devices/vcpu.rst |  57 ++++
> >  Documentation/virt/kvm/locking.rst      |  11 +
> >  arch/x86/include/asm/kvm_host.h         |   4 +
> >  arch/x86/include/uapi/asm/kvm.h         |   4 +
> >  arch/x86/kvm/x86.c                      | 362 +++++++++++++++++-------
> >  include/uapi/linux/kvm.h                |   7 +-
> >  7 files changed, 378 insertions(+), 109 deletions(-)
> >
> > --
> > 2.33.0.rc1.237.g0d66db33f3-goog
> >


* Re: [PATCH v7 3/6] KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK
  2021-08-20 12:46   ` Marcelo Tosatti
@ 2021-09-24  8:30     ` Paolo Bonzini
  0 siblings, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2021-09-24  8:30 UTC (permalink / raw)
  To: Marcelo Tosatti, Oliver Upton
  Cc: kvm, kvmarm, Sean Christopherson, Marc Zyngier, Peter Shier,
	Jim Mattson, David Matlack, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata, James Morse, Alexandru Elisei,
	Suzuki K Poulose, linux-arm-kernel, Andrew Jones, Will Deacon,
	Catalin Marinas

On 20/08/21 14:46, Marcelo Tosatti wrote:
> On Mon, Aug 16, 2021 at 12:11:27AM +0000, Oliver Upton wrote:
>> Handling the migration of TSCs correctly is difficult, in part because
>> Linux does not provide userspace with the ability to retrieve a (TSC,
>> realtime) clock pair for a single instant in time. In lieu of a more
>> convenient facility, KVM can report similar information in the kvm_clock
>> structure.
>>
>> Provide userspace with a host TSC & realtime pair iff the realtime clock
>> is based on the TSC. If userspace provides KVM_SET_CLOCK with a valid
>> realtime value, advance the KVM clock by the amount of elapsed time. Do
>> not step the KVM clock backwards, though, as it is a monotonic
>> oscillator.
>>
>> Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
>> Signed-off-by: Oliver Upton <oupton@google.com>
> 
> This is a good idea. Userspace could check if host and destination
> clocks are up to a certain difference and not use the feature if
> not appropriate.
> 
> Is there a qemu patch for it?

Not yet, but Maxim had a patch for a similar series (though with a 
different userspace API).

Paolo
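
[Editorial illustration, not part of the original message.] The commit
message quoted above says KVM_SET_CLOCK advances the KVM clock by the
elapsed realtime and never steps it backwards. A minimal sketch of that
arithmetic follows. The function names and the nanosecond units are
assumptions made for this example and are not taken from the patch.
The tolerance check is one possible reading of Marcelo's suggestion,
not an existing interface.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: how far to advance the saved KVM clock on restore.
 * Returns 0 when realtime appears to have gone backwards, so the clock
 * is never stepped backwards.
 */
static uint64_t kvmclock_advance_ns(uint64_t saved_realtime_ns,
				    uint64_t now_realtime_ns)
{
	if (now_realtime_ns <= saved_realtime_ns)
		return 0;
	return now_realtime_ns - saved_realtime_ns;
}

/* Hypothetical helper: the KVM clock value to restore on the destination. */
static uint64_t restored_kvmclock_ns(uint64_t saved_clock_ns,
				     uint64_t saved_realtime_ns,
				     uint64_t now_realtime_ns)
{
	return saved_clock_ns +
	       kvmclock_advance_ns(saved_realtime_ns, now_realtime_ns);
}

/*
 * Hypothetical userspace policy check: only use the realtime-based
 * advance if the two realtime readings differ by at most max_skew_ns.
 */
static bool realtime_within_tolerance(uint64_t src_ns, uint64_t dst_ns,
				      uint64_t max_skew_ns)
{
	uint64_t diff = src_ns > dst_ns ? src_ns - dst_ns : dst_ns - src_ns;

	return diff <= max_skew_ns;
}
```

With a saved clock of 1000 ns and realtime moving from 500 ns to 800 ns,
the restored clock is 1300 ns. If realtime instead appears to move from
800 ns to 500 ns, the clock stays at 1000 ns.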



* Re: [PATCH v7 5/6] KVM: x86: Refactor tsc synchronization code
  2021-09-02 19:21   ` Sean Christopherson
  2021-09-02 19:41     ` Oliver Upton
@ 2021-09-24  9:28     ` Paolo Bonzini
  1 sibling, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2021-09-24  9:28 UTC (permalink / raw)
  To: Sean Christopherson, Oliver Upton
  Cc: kvm, kvmarm, Marc Zyngier, Peter Shier, Jim Mattson,
	David Matlack, Ricardo Koller, Jing Zhang, Raghavendra Rao Anata,
	James Morse, Alexandru Elisei, Suzuki K Poulose,
	linux-arm-kernel, Andrew Jones, Will Deacon, Catalin Marinas

On 02/09/21 21:21, Sean Christopherson wrote:
> 
>> +	if (!matched) {
>> +		...
>> +		spin_lock(&kvm->arch.pvclock_gtod_sync_lock);
>> +		kvm->arch.nr_vcpus_matched_tsc = 0;
>> +	} else if (!already_matched) {
>> +		spin_lock(&kvm->arch.pvclock_gtod_sync_lock);
>> +		kvm->arch.nr_vcpus_matched_tsc++;
>> +	}
>> +
>> +	kvm_track_tsc_matching(vcpu);
>> +	spin_unlock(&kvm->arch.pvclock_gtod_sync_lock);
>
> This unlock is imbalanced if matched and already_matched are both true.  It's not
> immediately obvious that that _can't_ happen, and if it truly can't happen then
> conditionally locking is pointless (because it's not actually conditional).

This is IMO another reason to unify tsc_write_lock and 
pvclock_gtod_sync_lock.  The chances of contention are pretty slim.  As 
soon as I sort out the next -rc3 pull request I'll send out my version 
of Oliver's patches.

Thanks,

Paolo
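
[Editorial illustration, not part of the original message.] The imbalance
Sean points out goes away if the lock is taken unconditionally and only
the update is conditional. The sketch below models that shape in plain
C. The mock_* types and functions are hypothetical stand-ins written for
this example. They are not kernel APIs, and the mock lock only records
whether it is held.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a spinlock: it only tracks whether it is held. */
struct mock_lock {
	int held;
};

static void mock_lock_acquire(struct mock_lock *l)
{
	assert(l->held == 0);	/* catches a double acquire */
	l->held = 1;
}

static void mock_lock_release(struct mock_lock *l)
{
	assert(l->held == 1);	/* catches an unlock without a matching lock */
	l->held = 0;
}

/* Hypothetical subset of the per-VM state discussed in this thread. */
struct mock_kvm_arch {
	struct mock_lock pvclock_gtod_sync_lock;
	unsigned int nr_vcpus_matched_tsc;
};

/*
 * Acquire and release are paired on every path, including the
 * matched && already_matched case where nothing is updated.
 */
static void mock_update_matched_tsc(struct mock_kvm_arch *arch,
				    bool matched, bool already_matched)
{
	mock_lock_acquire(&arch->pvclock_gtod_sync_lock);
	if (!matched)
		arch->nr_vcpus_matched_tsc = 0;
	else if (!already_matched)
		arch->nr_vcpus_matched_tsc++;
	/* Assumption: the real code would call kvm_track_tsc_matching() here. */
	mock_lock_release(&arch->pvclock_gtod_sync_lock);
}
```

Unifying tsc_write_lock and pvclock_gtod_sync_lock, as suggested above,
would remove the nested lock entirely. This sketch only shows the
balanced form of the two-lock version.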



Thread overview: 20+ messages
2021-08-16  0:11 [PATCH v7 0/6] KVM: x86: Add idempotent controls for migrating system counter state Oliver Upton
2021-08-16  0:11 ` [PATCH v7 1/6] KVM: x86: Fix potential race in KVM_GET_CLOCK Oliver Upton
2021-08-19 18:24   ` Marcelo Tosatti
2021-08-20 18:22     ` Oliver Upton
2021-08-16  0:11 ` [PATCH v7 2/6] KVM: x86: Create helper methods for KVM_{GET,SET}_CLOCK ioctls Oliver Upton
2021-08-16  0:11 ` [PATCH v7 3/6] KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK Oliver Upton
2021-08-20 12:46   ` Marcelo Tosatti
2021-09-24  8:30     ` Paolo Bonzini
2021-08-16  0:11 ` [PATCH v7 4/6] KVM: x86: Take the pvclock sync lock behind the tsc_write_lock Oliver Upton
2021-09-02 19:22   ` Sean Christopherson
2021-08-16  0:11 ` [PATCH v7 5/6] KVM: x86: Refactor tsc synchronization code Oliver Upton
2021-09-02 19:21   ` Sean Christopherson
2021-09-02 19:41     ` Oliver Upton
2021-09-24  9:28     ` Paolo Bonzini
2021-08-16  0:11 ` [PATCH v7 6/6] KVM: x86: Expose TSC offset controls to userspace Oliver Upton
2021-08-23 20:56   ` Oliver Upton
2021-08-26 12:48     ` Marcelo Tosatti
2021-08-26 20:27       ` Oliver Upton
2021-09-02 19:23 ` [PATCH v7 0/6] KVM: x86: Add idempotent controls for migrating system counter state Sean Christopherson
2021-09-02 19:45   ` Oliver Upton
