* KVM: x86: snapshotted TSC frequency causing time drifts in vms
@ 2022-10-31 18:35 Jayaramappa, Srilakshmi
  2022-10-31 21:59 ` Sean Christopherson
  0 siblings, 1 reply; 6+ messages in thread
From: Jayaramappa, Srilakshmi @ 2022-10-31 18:35 UTC (permalink / raw)
  To: kvm, seanjc; +Cc: pbonzini, vkuznets, mlevitsk, suleiman, Hunt, Joshua

Hi,

We were recently notified of significant time drift on some of our virtual machines. Upon investigation it was found that the jumps in time were larger than ntp was able to gracefully correct. After further probing we discovered that the affected vms booted with tsc frequency equal to the early tsc frequency of the host and not the calibrated frequency.

There were two variables that cached tsc_khz - cpu_tsc_khz and max_tsc_khz.
Caching max_tsc_khz would cause further scaling of the user_tsc_khz when kvm is loaded before the host tsc calibration and the vcpu is created after it. But it appears that Sean's commit "KVM: x86: Don't snapshot "max" TSC if host TSC is constant" would fix that issue. [1]
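
To make the max_tsc_khz case concrete, here is a minimal sketch of a fixed-point frequency ratio of the kind TSC scaling relies on. It is not KVM's code: the function names, the 48 fractional bits, and both frequencies are assumptions chosen for illustration. It only shows that a default guest frequency taken from a stale snapshot produces a ratio other than 1.0 once the host value has been refined.

```python
FRAC_BITS = 48  # assumed number of fractional bits in the ratio


def scaling_ratio(guest_khz, host_khz):
    """Fixed-point guest/host frequency ratio; 1.0 is 1 << FRAC_BITS."""
    return (guest_khz << FRAC_BITS) // host_khz


def scale_tsc(host_cycles, ratio):
    """Apply the fixed-point ratio to a host cycle count."""
    return (host_cycles * ratio) >> FRAC_BITS


# Hypothetical frequencies: early estimate vs. refined (calibrated) value.
early_khz = 2_000_000
refined_khz = 2_000_100

unity = 1 << FRAC_BITS
fresh_ratio = scaling_ratio(refined_khz, refined_khz)  # default taken after refinement
stale_ratio = scaling_ratio(early_khz, refined_khz)    # default taken from the snapshot

# One real second of host cycles, seen through the stale ratio.
one_second_cycles = refined_khz * 1000
scaled_cycles = scale_tsc(one_second_cycles, stale_ratio)

print(fresh_ratio == unity)   # no scaling needed when the values agree
print(stale_ratio < unity)    # stale snapshot asks for a slightly slower guest TSC
print(one_second_cycles - scaled_cycles)
```

With the snapshot removed, the default guest frequency and the host frequency come from the same variable, so the ratio stays at unity and no unintended scaling is requested.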

The cached cpu_tsc_khz is used in
1. get_kvmclock_ns() which incorrectly sets the factors hv_clock.tsc_to_system_mul and hv_clock.shift that estimate passage of time.
2. kvm_guest_time_update()
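
The effect of deriving the two factors from a stale frequency can be sketched as follows. This is a simplified re-implementation of a multiplier/shift conversion, not the kernel's routine; the helper names and both frequencies are assumptions. The conversion assumed here is ns = ((cycles shifted by shift) * mul) >> 32.

```python
NSEC_PER_SEC = 1_000_000_000


def time_scale(tsc_khz):
    """Return (mul, shift) for converting TSC cycles to nanoseconds."""
    num, den = NSEC_PER_SEC, tsc_khz * 1000
    shift = 0
    while num >= den:       # ratio >= 1.0: cycles get shifted left first
        den <<= 1
        shift += 1
    while num * 2 < den:    # ratio < 0.5: cycles get shifted right first
        num <<= 1
        shift -= 1
    return (num << 32) // den, shift   # mul is a 0.32 fixed-point fraction


def cycles_to_ns(cycles, mul, shift):
    """Apply the shift, then the 32-bit fractional multiplier."""
    cycles = cycles << shift if shift >= 0 else cycles >> -shift
    return (cycles * mul) >> 32


# Hypothetical frequencies: early estimate vs. refined (calibrated) value.
early_khz = 2_000_000
refined_khz = 2_000_100

# The TSC really ticks at the refined rate, so this is one real second.
cycles_in_one_second = refined_khz * 1000

stale_ns = cycles_to_ns(cycles_in_one_second, *time_scale(early_khz))
fresh_ns = cycles_to_ns(cycles_in_one_second, *time_scale(refined_khz))

print(stale_ns - NSEC_PER_SEC)   # 50000: the guest gains 50 us per real second
print(fresh_ns - NSEC_PER_SEC)   # within rounding of zero
```

In this made-up case the two frequencies differ by 50 ppm, and factors built from the early value convert every real second into 50 microseconds too many.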

We came across Anton Romanov's patch "KVM: x86: Use current rather than snapshotted TSC frequency if it is constant" [2], which seems to address the cached cpu_tsc_khz case. The patch description says "the race can be hit if and only if userspace is able to create a VM before TSC refinement completes". We think that as long as the kvm module is loaded before the host tsc calibration happens, vms can be created at any time and will still have the problem (we confirmed this by shutting down an affected vm and relaunching it; it continued to experience time issues). The VMs need not be created before tsc refinement.

Even if the kvm module loads and a vcpu is created before the host tsc refinement, so that time estimation on the vm is incorrect until the refinement completes, the patches referenced here would subsequently provide the correct factors to determine time. Any error in time accrued in that small interval can be corrected by ntp if it is running on the guest. Without ntp, the error would probably be negligible and would not accumulate.
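
A rough model of that difference, under stated assumptions: the guest converts cycles to time using whatever frequency it was given, ntp is not running, and refinement completes one second after the vm starts. The frequencies, the one-second window, and the function names are all hypothetical.

```python
from fractions import Fraction

NSEC_PER_SEC = 1_000_000_000


def guest_ns(real_seconds, true_khz, assumed_khz):
    """Nanoseconds the guest counts when converting cycles with assumed_khz."""
    cycles = true_khz * 1000 * real_seconds
    return Fraction(cycles * 1_000_000, assumed_khz)


def error_ns(total_seconds, refine_after, early_khz, refined_khz, refresh):
    """Guest time minus real time after total_seconds.

    refresh=True models using the current frequency once refinement is done;
    refresh=False models a snapshot that is never updated.
    """
    stale_span = min(total_seconds, refine_after) if refresh else total_seconds
    counted = (guest_ns(stale_span, refined_khz, early_khz)
               + guest_ns(total_seconds - stale_span, refined_khz, refined_khz))
    return counted - total_seconds * NSEC_PER_SEC


early_khz = 2_000_000     # hypothetical early estimate
refined_khz = 2_000_100   # hypothetical refined value
one_day = 86_400

snapshot_error = error_ns(one_day, 1, early_khz, refined_khz, refresh=False)
current_error = error_ns(one_day, 1, early_khz, refined_khz, refresh=True)

print(snapshot_error)   # 4320000000 ns: 4.32 s per day, growing linearly
print(current_error)    # 50000 ns: fixed once refinement has completed
```

The error with the snapshot keeps growing for the life of the vm, while the error with the current frequency stops at whatever was accrued before refinement.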

There doesn't seem to be any response on the v6 of Anton's patch. I wanted to ask if there are further changes in progress or if it is all set to be merged?

I'd appreciate you taking the time with this query.

Thanks
-Sri


[1] commit id: 741e511b42086a100c05dbe8fd1baeec42e7c584
[2] https://lore.kernel.org/all/20220608183525.1143682-1-romanton@google.com/

* Re: KVM: x86: snapshotted TSC frequency causing time drifts in vms
  2022-10-31 18:35 KVM: x86: snapshotted TSC frequency causing time drifts in vms Jayaramappa, Srilakshmi
@ 2022-10-31 21:59 ` Sean Christopherson
  2022-10-31 23:00   ` Jayaramappa, Srilakshmi
  0 siblings, 1 reply; 6+ messages in thread
From: Sean Christopherson @ 2022-10-31 21:59 UTC (permalink / raw)
  To: Jayaramappa, Srilakshmi
  Cc: kvm, pbonzini, vkuznets, mlevitsk, suleiman, Hunt, Joshua

On Mon, Oct 31, 2022, Jayaramappa, Srilakshmi wrote:
> Hi,
> 
> We were recently notified of significant time drift on some of our virtual
> machines. Upon investigation it was found that the jumps in time were larger
> than ntp was able to gracefully correct. After further probing we discovered
> that the affected vms booted with tsc frequency equal to the early tsc
> frequency of the host and not the calibrated frequency.
> 
> There were two variables that cached tsc_khz - cpu_tsc_khz and max_tsc_khz.
> Caching max_tsc_khz would cause further scaling of the user_tsc_khz when the
> vcpu is created after the host tsc calibration and kvm is loaded before
> calibration. But it appears that Sean's commit "KVM: x86: Don't snapshot
> "max" TSC if host TSC is constant" would fix that issue. [1]
> 
> The cached cpu_tsc_khz is used in 1. get_kvmclock_ns() which incorrectly sets
> the factors hv_clock.tsc_to_system_mul and hv_clock.shift that estimate
> passage of time.  2. kvm_guest_time_update()
> 
> We came across Anton Romanov's patch "KVM: x86: Use current rather than
> snapshotted TSC frequency if it is constant" [2] that seems to address the
> cached cpu_tsc_khz case. The patch description says "the race can be hit if
> and only if userspace is able to create a VM before TSC refinement
> completes". We think as long as the kvm module is loaded before the host tsc
> calibration happens the vms can be created anytime and they will have the
> problem (confirmed this by shutting down an affected vm and relaunching it -
> it continued to experience time issues). VMs need not be created before tsc
> refinement.
> 
> Even if kvm module loads and vcpu is created before the host tsc refinement
> and have incorrect time estimation on the vm until the tsc refinement, the
> patches referenced here would subsequently provide the correct factors to
> determine time. And any error in time in that small interval can be corrected
> by ntp if it is running on the guest. If there was no ntp, the error would
> probably be negligible and would not accumulate.
> 
> There doesn't seem to be any response on the v6 of Anton's patch. I wanted to
> ask if there are further changes in progress or if it is all set to be merged?

Drat, it slipped through the cracks.

Paolo, can you pick up the below patch?  Obviously assuming you don't spy any
problems.

It has a superficial conflict with commit 938c8745bcf2 ("KVM: x86: Introduce
"struct kvm_caps" to track misc caps/settings"), but otherwise applies cleanly.

> [2] https://lore.kernel.org/all/20220608183525.1143682-1-romanton@google.com/

* Re: KVM: x86: snapshotted TSC frequency causing time drifts in vms
  2022-10-31 21:59 ` Sean Christopherson
@ 2022-10-31 23:00   ` Jayaramappa, Srilakshmi
  2022-12-14 21:24     ` Jayaramappa, Srilakshmi
  0 siblings, 1 reply; 6+ messages in thread
From: Jayaramappa, Srilakshmi @ 2022-10-31 23:00 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, pbonzini, vkuznets, mlevitsk, suleiman, Hunt, Joshua



Thanks, Sean! Appreciate it.

-Sri


* Re: KVM: x86: snapshotted TSC frequency causing time drifts in vms
  2022-10-31 23:00   ` Jayaramappa, Srilakshmi
@ 2022-12-14 21:24     ` Jayaramappa, Srilakshmi
  2022-12-14 23:58       ` Sean Christopherson
  0 siblings, 1 reply; 6+ messages in thread
From: Jayaramappa, Srilakshmi @ 2022-12-14 21:24 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, pbonzini, vkuznets, mlevitsk, suleiman, Hunt, Joshua

Hi Paolo,

Could I trouble you to take a look at this patch please? 

Thanks
-Sri

* Re: KVM: x86: snapshotted TSC frequency causing time drifts in vms
  2022-12-14 21:24     ` Jayaramappa, Srilakshmi
@ 2022-12-14 23:58       ` Sean Christopherson
  2022-12-15  0:46         ` Jayaramappa, Srilakshmi
  0 siblings, 1 reply; 6+ messages in thread
From: Sean Christopherson @ 2022-12-14 23:58 UTC (permalink / raw)
  To: Jayaramappa, Srilakshmi
  Cc: kvm, pbonzini, vkuznets, mlevitsk, suleiman, Hunt, Joshua

On Wed, Dec 14, 2022, Jayaramappa, Srilakshmi wrote:
> > There doesn't seem to be any response on the v6 of Anton's patch. I wanted to
> > ask if there are further changes in progress or if it is all set to be merged?
> 
> Drat, it slipped through the cracks.
> 
> Paolo, can you pick up the below patch?  Obviously assuming you don't spy any
> problems.
> 
> It has a superficial conflict with commit 938c8745bcf2 ("KVM: x86: Introduce

...

> Could I trouble you to take a look at this patch please? 

It's already in kvm/next

  3ebcbd2244f5 ("KVM: x86: Use current rather than snapshotted TSC frequency if it is constant")

but there was a hiccup with the KVM pull request for 6.2[*], which is why it hasn't
made its way to Linus yet.

[*] https://lore.kernel.org/all/6d96a62e-d5a1-e606-3bd2-c38f4a6c8545@redhat.com

* Re: KVM: x86: snapshotted TSC frequency causing time drifts in vms
  2022-12-14 23:58       ` Sean Christopherson
@ 2022-12-15  0:46         ` Jayaramappa, Srilakshmi
  0 siblings, 0 replies; 6+ messages in thread
From: Jayaramappa, Srilakshmi @ 2022-12-15  0:46 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, pbonzini, vkuznets, mlevitsk, suleiman, Hunt, Joshua

Oh that's great, thanks, Sean! Appreciate the help.

-Sri
