linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] KVM: x86: Don't snapshot "max" TSC if host TSC is constant
@ 2022-02-25  1:39 Sean Christopherson
  2022-02-25 12:10 ` Paolo Bonzini
  2022-02-25 12:52 ` David Woodhouse
  0 siblings, 2 replies; 10+ messages in thread
From: Sean Christopherson @ 2022-02-25  1:39 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Suleiman Souhlal, Anton Romanov

Don't snapshot tsc_khz into max_tsc_khz during KVM initialization if the
host TSC is constant, in which case the actual TSC frequency will never
change and thus capturing the "max" TSC during initialization is
unnecessary, KVM can simply use tsc_khz during VM creation.

On CPUs with constant TSC, but not a hardware-specified TSC frequency,
snapshotting max_tsc_khz and using that to set a VM's default TSC
frequency can lead to KVM thinking it needs to manually scale the guest's
TSC if refining the TSC completes after KVM snapshots tsc_khz.  The
actual frequency never changes, only the kernel's calculation of what
that frequency is changes.  On systems without hardware TSC scaling, this
either puts KVM into "always catchup" mode (extremely inefficient), or
prevents creating VMs altogether.

Ideally, KVM would not be able to race with TSC refinement, or would have
a hook into tsc_refine_calibration_work() to get an alert when refinement
is complete.  Avoiding the race altogether isn't practical as refinement
takes a relative eternity; it's deliberately put on a work queue outside
of the normal boot sequence to avoid unnecessarily delaying boot.

Adding a hook is doable, but somewhat gross due to KVM's ability to be
built as a module.  And if the TSC is constant, which is likely the case
for every VMX/SVM-capable CPU produced in the last decade, the race can
be hit if and only if userspace is able to create a VM before TSC
refinement completes; refinement is slow, but not that slow.

For now, punt on a proper fix, as not taking a snapshot can help some
uses cases and not taking a snapshot is arguably correct irrespective of
the race with refinement.

Cc: Suleiman Souhlal <suleiman@google.com>
Cc: Anton Romanov <romanton@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6552360d8888..81d9d84dc59f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8727,13 +8727,15 @@ static int kvmclock_cpu_online(unsigned int cpu)
 
 static void kvm_timer_init(void)
 {
-	max_tsc_khz = tsc_khz;
-
 	if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
 #ifdef CONFIG_CPU_FREQ
 		struct cpufreq_policy *policy;
 		int cpu;
+#endif
 
+		max_tsc_khz = tsc_khz;
+
+#ifdef CONFIG_CPU_FREQ
 		cpu = get_cpu();
 		policy = cpufreq_cpu_get(cpu);
 		if (policy) {
@@ -11160,7 +11162,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 	vcpu->arch.msr_platform_info = MSR_PLATFORM_INFO_CPUID_FAULT;
 	kvm_vcpu_mtrr_init(vcpu);
 	vcpu_load(vcpu);
-	kvm_set_tsc_khz(vcpu, max_tsc_khz);
+	kvm_set_tsc_khz(vcpu, max_tsc_khz ? : tsc_khz);
 	kvm_vcpu_reset(vcpu, false);
 	kvm_init_mmu(vcpu);
 	vcpu_put(vcpu);

base-commit: f4bc051fc91ab9f1d5225d94e52d369ef58bec58
-- 
2.35.1.574.g5d30c73bfb-goog


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] KVM: x86: Don't snapshot "max" TSC if host TSC is constant
  2022-02-25  1:39 [PATCH] KVM: x86: Don't snapshot "max" TSC if host TSC is constant Sean Christopherson
@ 2022-02-25 12:10 ` Paolo Bonzini
  2022-02-25 12:20   ` David Woodhouse
  2022-04-12 17:38   ` Anton Romanov
  2022-02-25 12:52 ` David Woodhouse
  1 sibling, 2 replies; 10+ messages in thread
From: Paolo Bonzini @ 2022-02-25 12:10 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Suleiman Souhlal, Anton Romanov

On 2/25/22 02:39, Sean Christopherson wrote:
> Don't snapshot tsc_khz into max_tsc_khz during KVM initialization if the
> host TSC is constant, in which case the actual TSC frequency will never
> change and thus capturing the "max" TSC during initialization is
> unnecessary, KVM can simply use tsc_khz during VM creation.
> 
> On CPUs with constant TSC, but not a hardware-specified TSC frequency,
> snapshotting max_tsc_khz and using that to set a VM's default TSC
> frequency can lead to KVM thinking it needs to manually scale the guest's
> TSC if refining the TSC completes after KVM snapshots tsc_khz.  The
> actual frequency never changes, only the kernel's calculation of what
> that frequency is changes.  On systems without hardware TSC scaling, this
> either puts KVM into "always catchup" mode (extremely inefficient), or
> prevents creating VMs altogether.
> 
> Ideally, KVM would not be able to race with TSC refinement, or would have
> a hook into tsc_refine_calibration_work() to get an alert when refinement
> is complete.  Avoiding the race altogether isn't practical as refinement
> takes a relative eternity; it's deliberately put on a work queue outside
> of the normal boot sequence to avoid unnecessarily delaying boot.
> 
> Adding a hook is doable, but somewhat gross due to KVM's ability to be
> built as a module.  And if the TSC is constant, which is likely the case
> for every VMX/SVM-capable CPU produced in the last decade, the race can
> be hit if and only if userspace is able to create a VM before TSC
> refinement completes; refinement is slow, but not that slow.
> 
> For now, punt on a proper fix, as not taking a snapshot can help some
> uses cases and not taking a snapshot is arguably correct irrespective of
> the race with refinement.
> 
> Cc: Suleiman Souhlal <suleiman@google.com>
> Cc: Anton Romanov <romanton@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Queued, but I'd rather have a subject that calls out that max_tsc_khz 
needs a replacement at vCPU creation time.  In fact, the real change 
(and bug, and fix) is in kvm_arch_vcpu_create(), while the subject 
mentions only the change in kvm_timer_init().

What do you think of "KVM: x86: Use current rather than max TSC 
frequency if it is constant"?

Pao

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] KVM: x86: Don't snapshot "max" TSC if host TSC is constant
  2022-02-25 12:10 ` Paolo Bonzini
@ 2022-02-25 12:20   ` David Woodhouse
  2022-02-25 16:21     ` Sean Christopherson
  2022-04-12 17:38   ` Anton Romanov
  1 sibling, 1 reply; 10+ messages in thread
From: David Woodhouse @ 2022-02-25 12:20 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Suleiman Souhlal, Anton Romanov

[-- Attachment #1: Type: text/plain, Size: 1101 bytes --]

On Fri, 2022-02-25 at 13:10 +0100, Paolo Bonzini wrote:
> 
> Queued, but I'd rather have a subject that calls out that max_tsc_khz 
> needs a replacement at vCPU creation time.  In fact, the real change 
> (and bug, and fix) is in kvm_arch_vcpu_create(), while the subject 
> mentions only the change in kvm_timer_init().

In
https://lore.kernel.org/kvm/e7be32b06676c7ebf415d9deea5faf50aa8c0785.camel@infradead.org/T/
last night I was coming round to the idea that we might want a KVM-wide 
default frequency which is settable from userspace and is used instead
of max_tsc_khz anyway.

I also have questions about the use case for the above patch.... if
this is a clean boot and you're just starting to host guests, surely we
can wait for the time it takes for the TSC synchronization to complete?

And if this is a live update scenario, where we pause the guests, kexec
into a new kernel, then resume the "migrated" guests again... why in
$DEITY's name isn't the precise TSC frequency being handed over from
kernel#1 to kernel#2 over the kexec so that it's known from the start?

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] KVM: x86: Don't snapshot "max" TSC if host TSC is constant
  2022-02-25  1:39 [PATCH] KVM: x86: Don't snapshot "max" TSC if host TSC is constant Sean Christopherson
  2022-02-25 12:10 ` Paolo Bonzini
@ 2022-02-25 12:52 ` David Woodhouse
  2022-02-25 16:40   ` Sean Christopherson
  1 sibling, 1 reply; 10+ messages in thread
From: David Woodhouse @ 2022-02-25 12:52 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Suleiman Souhlal, Anton Romanov

[-- Attachment #1: Type: text/plain, Size: 3301 bytes --]

On Fri, 2022-02-25 at 01:39 +0000, Sean Christopherson wrote:
> @@ -11160,7 +11162,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
>         vcpu->arch.msr_platform_info = MSR_PLATFORM_INFO_CPUID_FAULT;
>         kvm_vcpu_mtrr_init(vcpu);
>         vcpu_load(vcpu);
> -       kvm_set_tsc_khz(vcpu, max_tsc_khz);
> +       kvm_set_tsc_khz(vcpu, max_tsc_khz ? : tsc_khz);
>         kvm_vcpu_reset(vcpu, false);
>         kvm_init_mmu(vcpu);
>         vcpu_put(vcpu);
> 

Hm, now if you hit that race you end up potentially giving *different*
frequencies to different vCPUs in a single guest, depending on when
they were created.

How about this... (and as noted, I think I want to add an explicit KVM
ioctl to set kvm->arch.default_tsc_khz for subsequently created vCPUs).

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a3385db39d3e..e4696a578f41 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1119,6 +1119,8 @@ struct kvm_arch {
 	u64 cur_tsc_generation;
 	int nr_vcpus_matched_tsc;
 
+	u32 default_tsc_khz;
+
 	seqcount_raw_spinlock_t pvclock_sc;
 	bool use_master_clock;
 	u64 master_kernel_ns;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 83accd3e7502..686891966c15 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2601,6 +2601,7 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data)
 			 * kvm_clock stable after CPU hotplug
 			 */
 			synchronizing = true;
+			data = kvm->arch.last_tsc_write;
 		} else {
 			u64 tsc_exp = kvm->arch.last_tsc_write +
 						nsec_to_cycles(vcpu, elapsed);
@@ -8728,22 +8729,22 @@ static int kvmclock_cpu_online(unsigned int cpu)
 
 static void kvm_timer_init(void)
 {
-	max_tsc_khz = tsc_khz;
-
 	if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
-#ifdef CONFIG_CPU_FREQ
-		struct cpufreq_policy *policy;
-		int cpu;
-
-		cpu = get_cpu();
-		policy = cpufreq_cpu_get(cpu);
-		if (policy) {
-			if (policy->cpuinfo.max_freq)
-				max_tsc_khz = policy->cpuinfo.max_freq;
-			cpufreq_cpu_put(policy);
+		max_tsc_khz = tsc_khz;
+
+		if (IS_ENABLED(CONFIG_CPU_FREQ)) {
+			struct cpufreq_policy *policy;
+			int cpu;
+
+			cpu = get_cpu();
+			policy = cpufreq_cpu_get(cpu);
+			if (policy) {
+				if (policy->cpuinfo.max_freq)
+					max_tsc_khz = policy->cpuinfo.max_freq;
+				cpufreq_cpu_put(policy);
+			}
+			put_cpu();
 		}
-		put_cpu();
-#endif
 		cpufreq_register_notifier(&kvmclock_cpufreq_notifier_block,
 					  CPUFREQ_TRANSITION_NOTIFIER);
 	}
@@ -11165,7 +11166,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 	kvm_xen_init_vcpu(vcpu);
 	kvm_vcpu_mtrr_init(vcpu);
 	vcpu_load(vcpu);
-	kvm_set_tsc_khz(vcpu, max_tsc_khz);
+	kvm_set_tsc_khz(vcpu, vcpu->kvm->arch.default_tsc_khz);
 	kvm_vcpu_reset(vcpu, false);
 	kvm_init_mmu(vcpu);
 	vcpu_put(vcpu);
@@ -11614,6 +11615,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	pvclock_update_vm_gtod_copy(kvm);
 	raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
 
+	kvm->arch.default_tsc_khz = max_tsc_khz ? : tsc_khz;
 	kvm->arch.guest_can_read_msr_platform_info = true;
 
 #if IS_ENABLED(CONFIG_HYPERV)

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] KVM: x86: Don't snapshot "max" TSC if host TSC is constant
  2022-02-25 12:20   ` David Woodhouse
@ 2022-02-25 16:21     ` Sean Christopherson
  2022-02-25 16:34       ` David Woodhouse
  0 siblings, 1 reply; 10+ messages in thread
From: Sean Christopherson @ 2022-02-25 16:21 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Suleiman Souhlal, Anton Romanov

On Fri, Feb 25, 2022, David Woodhouse wrote:
> On Fri, 2022-02-25 at 13:10 +0100, Paolo Bonzini wrote:
> > 
> > Queued, but I'd rather have a subject that calls out that max_tsc_khz 
> > needs a replacement at vCPU creation time.  In fact, the real change 
> > (and bug, and fix) is in kvm_arch_vcpu_create(), while the subject 
> > mentions only the change in kvm_timer_init().
> 
> In
> https://lore.kernel.org/kvm/e7be32b06676c7ebf415d9deea5faf50aa8c0785.camel@infradead.org/T/
> last night I was coming round to the idea that we might want a KVM-wide 
> default frequency which is settable from userspace and is used instead
> of max_tsc_khz anyway.
> 
> I also have questions about the use case for the above patch.... if
> this is a clean boot and you're just starting to host guests, surely we
> can wait for the time it takes for the TSC synchronization to complete?

KVM is built into the kernel in their case, the vmx_init() => kvm_init() gets
automatically called during boot.  The VMs aren't started until well after
synchronization has completed, but KVM has already snapshotted the "bad" value.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] KVM: x86: Don't snapshot "max" TSC if host TSC is constant
  2022-02-25 16:21     ` Sean Christopherson
@ 2022-02-25 16:34       ` David Woodhouse
  0 siblings, 0 replies; 10+ messages in thread
From: David Woodhouse @ 2022-02-25 16:34 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Suleiman Souhlal, Anton Romanov

[-- Attachment #1: Type: text/plain, Size: 821 bytes --]

On Fri, 2022-02-25 at 16:21 +0000, Sean Christopherson wrote:
> > I also have questions about the use case for the above patch.... if
> > this is a clean boot and you're just starting to host guests, surely we
> > can wait for the time it takes for the TSC synchronization to complete?
> 
> KVM is built into the kernel in their case, the vmx_init() => kvm_init() gets
> automatically called during boot.  The VMs aren't started until well after
> synchronization has completed, but KVM has already snapshotted the "bad" value.


Gotcha.

So even when we put my patch in front, to snapshot a value into
kvm->arch.default_tsc_khz, that's happening later at VM creation time
so should also be snapshotting the *good* value.

And at least if it snapshots the bad value, all the vCPUs will be
*consistent*.

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] KVM: x86: Don't snapshot "max" TSC if host TSC is constant
  2022-02-25 12:52 ` David Woodhouse
@ 2022-02-25 16:40   ` Sean Christopherson
  2022-02-25 16:51     ` David Woodhouse
  0 siblings, 1 reply; 10+ messages in thread
From: Sean Christopherson @ 2022-02-25 16:40 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Suleiman Souhlal, Anton Romanov

On Fri, Feb 25, 2022, David Woodhouse wrote:
> On Fri, 2022-02-25 at 01:39 +0000, Sean Christopherson wrote:
> > @@ -11160,7 +11162,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
> >         vcpu->arch.msr_platform_info = MSR_PLATFORM_INFO_CPUID_FAULT;
> >         kvm_vcpu_mtrr_init(vcpu);
> >         vcpu_load(vcpu);
> > -       kvm_set_tsc_khz(vcpu, max_tsc_khz);
> > +       kvm_set_tsc_khz(vcpu, max_tsc_khz ? : tsc_khz);
> >         kvm_vcpu_reset(vcpu, false);
> >         kvm_init_mmu(vcpu);
> >         vcpu_put(vcpu);
> > 
> 
> Hm, now if you hit that race you end up potentially giving *different*
> frequencies to different vCPUs in a single guest, depending on when
> they were created.

Yep.  Though the race is much harder to hit (userspace vs TSC refinement).  The
existing race being hit is essentially do_initcalls() vs. TSC refinement.  

> How about this... (and as noted, I think I want to add an explicit KVM
> ioctl to set kvm->arch.default_tsc_khz for subsequently created vCPUs).

This wouldn't necessarily help.  E.g. assuming userspace knows the actual TSC
frequency, creating a vCPU before refinement completes might put the vCPU in
"always catchup" purgatory.

To really fix the race, KVM needs a notification that refinement completed (or
failed).  KVM could simply refuse to create vCPUs until it got the notification.
In the non-constant case, KVM would also need to refresh max_tsc_khz.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] KVM: x86: Don't snapshot "max" TSC if host TSC is constant
  2022-02-25 16:40   ` Sean Christopherson
@ 2022-02-25 16:51     ` David Woodhouse
  0 siblings, 0 replies; 10+ messages in thread
From: David Woodhouse @ 2022-02-25 16:51 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Suleiman Souhlal, Anton Romanov

[-- Attachment #1: Type: text/plain, Size: 2095 bytes --]

On Fri, 2022-02-25 at 16:40 +0000, Sean Christopherson wrote:
> On Fri, Feb 25, 2022, David Woodhouse wrote:
> > On Fri, 2022-02-25 at 01:39 +0000, Sean Christopherson wrote:
> > > @@ -11160,7 +11162,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
> > >         vcpu->arch.msr_platform_info = MSR_PLATFORM_INFO_CPUID_FAULT;
> > >         kvm_vcpu_mtrr_init(vcpu);
> > >         vcpu_load(vcpu);
> > > -       kvm_set_tsc_khz(vcpu, max_tsc_khz);
> > > +       kvm_set_tsc_khz(vcpu, max_tsc_khz ? : tsc_khz);
> > >         kvm_vcpu_reset(vcpu, false);
> > >         kvm_init_mmu(vcpu);
> > >         vcpu_put(vcpu);
> > > 
> > 
> > Hm, now if you hit that race you end up potentially giving *different*
> > frequencies to different vCPUs in a single guest, depending on when
> > they were created.
> 
> Yep.  Though the race is much harder to hit (userspace vs TSC refinement).  The
> existing race being hit is essentially do_initcalls() vs. TSC refinement.  
> 
> > How about this... (and as noted, I think I want to add an explicit KVM
> > ioctl to set kvm->arch.default_tsc_khz for subsequently created vCPUs).
> 
> This wouldn't necessarily help.  E.g. assuming userspace knows the actual TSC
> frequency, creating a vCPU before refinement completes might put the vCPU in
> "always catchup" purgatory.

Right.  But at least they'd be *consistent*.

I was actually making that change anyway, for the benefit of VMs where
we are intentionally scaling to a known, different, TSC frequency —
which is currently completely hosed when all the vCPUs set it for
themselves because the TSC sync then fails.

> To really fix the race, KVM needs a notification that refinement completed (or
> failed).  KVM could simply refuse to create vCPUs until it got the notification.
> In the non-constant case, KVM would also need to refresh max_tsc_khz.

Hm, would the world be a better place if we knew that the delta between
the unrefined and refined TSC values was always within the tolerance of
tsc_tolerance_ppm for which we wouldn't bother scaling anyway?

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] KVM: x86: Don't snapshot "max" TSC if host TSC is constant
  2022-02-25 12:10 ` Paolo Bonzini
  2022-02-25 12:20   ` David Woodhouse
@ 2022-04-12 17:38   ` Anton Romanov
  2022-04-13  8:31     ` Paolo Bonzini
  1 sibling, 1 reply; 10+ messages in thread
From: Anton Romanov @ 2022-04-12 17:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Suleiman Souhlal

On Fri, Feb 25, 2022 at 4:10 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 2/25/22 02:39, Sean Christopherson wrote:
> > Don't snapshot tsc_khz into max_tsc_khz during KVM initialization if the
> > host TSC is constant, in which case the actual TSC frequency will never
> > change and thus capturing the "max" TSC during initialization is
> > unnecessary, KVM can simply use tsc_khz during VM creation.
> >
> > On CPUs with constant TSC, but not a hardware-specified TSC frequency,
> > snapshotting max_tsc_khz and using that to set a VM's default TSC
> > frequency can lead to KVM thinking it needs to manually scale the guest's
> > TSC if refining the TSC completes after KVM snapshots tsc_khz.  The
> > actual frequency never changes, only the kernel's calculation of what
> > that frequency is changes.  On systems without hardware TSC scaling, this
> > either puts KVM into "always catchup" mode (extremely inefficient), or
> > prevents creating VMs altogether.
> >
> > Ideally, KVM would not be able to race with TSC refinement, or would have
> > a hook into tsc_refine_calibration_work() to get an alert when refinement
> > is complete.  Avoiding the race altogether isn't practical as refinement
> > takes a relative eternity; it's deliberately put on a work queue outside
> > of the normal boot sequence to avoid unnecessarily delaying boot.
> >
> > Adding a hook is doable, but somewhat gross due to KVM's ability to be
> > built as a module.  And if the TSC is constant, which is likely the case
> > for every VMX/SVM-capable CPU produced in the last decade, the race can
> > be hit if and only if userspace is able to create a VM before TSC
> > refinement completes; refinement is slow, but not that slow.
> >
> > For now, punt on a proper fix, as not taking a snapshot can help some
> > uses cases and not taking a snapshot is arguably correct irrespective of
> > the race with refinement.
> >
> > Cc: Suleiman Souhlal <suleiman@google.com>
> > Cc: Anton Romanov <romanton@google.com>
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
>
> Queued, but I'd rather have a subject that calls out that max_tsc_khz
> needs a replacement at vCPU creation time.  In fact, the real change
> (and bug, and fix) is in kvm_arch_vcpu_create(), while the subject
> mentions only the change in kvm_timer_init().
>
> What do you think of "KVM: x86: Use current rather than max TSC
> frequency if it is constant"?
>
> Pao

Ping. This said "queued" but I don't think this ever landed.
What's the status of this?
Paolo, does this need more work?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] KVM: x86: Don't snapshot "max" TSC if host TSC is constant
  2022-04-12 17:38   ` Anton Romanov
@ 2022-04-13  8:31     ` Paolo Bonzini
  0 siblings, 0 replies; 10+ messages in thread
From: Paolo Bonzini @ 2022-04-13  8:31 UTC (permalink / raw)
  To: Anton Romanov
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Suleiman Souhlal

On 4/12/22 19:38, Anton Romanov wrote:
>> Queued, but I'd rather have a subject that calls out that max_tsc_khz
>> needs a replacement at vCPU creation time.  In fact, the real change
>> (and bug, and fix) is in kvm_arch_vcpu_create(), while the subject
>> mentions only the change in kvm_timer_init().
>>
>> What do you think of "KVM: x86: Use current rather than max TSC
>> frequency if it is constant"?
>
> Ping. This said "queued" but I don't think this ever landed.
> What's the status of this?
> Paolo, does this need more work?

The features in my second pull request were rejected by Linus so they 
will be in 5.19.  I'm going to open kvm/next today and the patches will 
be there.

Unfortunately he hasn't replied to my rebuttal at 
https://lore.kernel.org/kvm/30ffdecc-6ecd-5194-14ec-40e8b818889a@redhat.com/#t 
so I have no idea what his opinion is.  I'll try to get more stuff in 
early for the next releases.

Paolo


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-04-13  8:31 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-25  1:39 [PATCH] KVM: x86: Don't snapshot "max" TSC if host TSC is constant Sean Christopherson
2022-02-25 12:10 ` Paolo Bonzini
2022-02-25 12:20   ` David Woodhouse
2022-02-25 16:21     ` Sean Christopherson
2022-02-25 16:34       ` David Woodhouse
2022-04-12 17:38   ` Anton Romanov
2022-04-13  8:31     ` Paolo Bonzini
2022-02-25 12:52 ` David Woodhouse
2022-02-25 16:40   ` Sean Christopherson
2022-02-25 16:51     ` David Woodhouse

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).