From: Joe Jin <joe.jin@oracle.com>
To: Dongli Zhang <dongli.zhang@oracle.com>,
	kvm@vger.kernel.org, x86@kernel.org
Cc: linux-kernel@vger.kernel.org, seanjc@google.com,
	pbonzini@redhat.com, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, dave.hansen@linux.intel.com
Subject: Re: [PATCH RFC 1/1] KVM: x86: add param to update master clock periodically
Date: Tue, 26 Sep 2023 17:29:44 -0700
Message-ID: <377d9706-cc10-dfb8-5326-96c83c47338d@oracle.com>
In-Reply-To: <20230926230649.67852-1-dongli.zhang@oracle.com>

On 9/26/23 4:06 PM, Dongli Zhang wrote:
> This is to minimize the kvmclock drift during CPU hotplug (or when the
> master clock and pvclock_vcpu_time_info are updated). The drift is
> because kvmclock and raw monotonic (tsc) use different
> equation/mult/shift to calculate how many nanoseconds (given the tsc
> as input) have passed.
>
> The calculation of the kvmclock is based on the pvclock_vcpu_time_info
> provided by the host side.
>
> struct pvclock_vcpu_time_info {
> 	u32   version;
> 	u32   pad0;
> 	u64   tsc_timestamp;     --> by host raw monotonic
> 	u64   system_time;       --> by host raw monotonic
> 	u32   tsc_to_system_mul; --> by host KVM
> 	s8    tsc_shift;         --> by host KVM
> 	u8    flags;
> 	u8    pad[2];
> } __attribute__((__packed__));
>
> To calculate the current guest kvmclock:
>
> 1. Obtain the tsc = rdtsc() of guest.
>
> 2. If tsc_shift < 0:
>     tmp = tsc >> -tsc_shift
>    if tsc_shift >= 0:
>     tmp = tsc << tsc_shift
>
> 3. The kvmclock value will be: (tmp * tsc_to_system_mul) >> 32
>
> Therefore, the current kvmclock will be either:
>
> (rdtsc() >> -tsc_shift) * tsc_to_system_mul >> 32
>
> ... or ...
>
> (rdtsc() << tsc_shift) * tsc_to_system_mul >> 32
>
> The 'tsc_to_system_mul' and 'tsc_shift' are calculated by the host KVM.
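
For reference, here is a minimal userspace sketch of that read path as I understand it from the pvclock ABI. It is not the kernel code itself, and the names pvti_sketch, mul_shr and kvmclock_read_sketch are made up for illustration. Two details are assumptions beyond the description above: the scaling is applied to the delta since 'tsc_timestamp' with 'system_time' added on top, and a negative 'tsc_shift' means a right shift by its magnitude.

```c
#include <assert.h>
#include <stdint.h>

/* Only the fields the conversion needs; this is NOT the real ABI layout. */
struct pvti_sketch {
	uint64_t tsc_timestamp;
	uint64_t system_time;
	uint32_t tsc_to_system_mul;
	int8_t   tsc_shift;
};

/* (a * mul) >> shift without 128-bit math; exact for shift <= 32. */
static uint64_t mul_shr(uint64_t a, uint32_t mul, unsigned int shift)
{
	uint64_t hi = a >> 32, lo = a & 0xffffffffu;

	assert(shift <= 32);
	return ((hi * mul) << (32 - shift)) + ((lo * mul) >> shift);
}

/* Nanoseconds reported for a given guest TSC reading. */
static uint64_t kvmclock_read_sketch(const struct pvti_sketch *ti, uint64_t tsc)
{
	uint64_t delta = tsc - ti->tsc_timestamp;

	/* Pre-shift the delta: negative tsc_shift shifts right. */
	if (ti->tsc_shift < 0)
		delta >>= -ti->tsc_shift;
	else
		delta <<= ti->tsc_shift;

	/* 32.32 fixed-point multiply, then add the base time. */
	return ti->system_time + mul_shr(delta, ti->tsc_to_system_mul, 32);
}
```

With tsc_to_system_mul = 0x80000000 and tsc_shift = 0 (half a nanosecond per cycle), a delta of 2000000000 cycles adds exactly one second to 'system_time'.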
>
> When the master clock is actively used, the 'tsc_timestamp' and
> 'system_time' are derived from the host raw monotonic time, which is
> calculated based on the 'mult' and 'shift' of clocksource_tsc:
>
> elapsed_time = (tsc * mult) >> shift
>
> Since kvmclock and raw monotonic (clocksource_tsc) use different
> equation/mult/shift to convert the tsc to nanoseconds, there may be a clock
> drift issue during CPU hotplug (when the master clock is updated).
>
> 1. The guest boots and all vcpus have the same 'pvclock_vcpu_time_info'
> (suppose the master clock is used).
>
> 2. Since the master clock is never updated, the periodic kvmclock_sync_work
> does not update the values in 'pvclock_vcpu_time_info'.
>
> 3. Suppose a very long period has passed (e.g., 30 days).
>
> 4. The user adds another vcpu. Both master clock and
> 'pvclock_vcpu_time_info' are updated, based on the raw monotonic.
>
> (Ideally, we expect the update to be based on 'tsc_to_system_mul' and
> 'tsc_shift' from kvmclock).
>
>
> Because kvmclock and raw monotonic (clocksource_tsc) use different
> equation/mult/shift to convert the tsc to nanoseconds, there will be drift
> between:
>
> (1) kvmclock based on current rdtsc and old 'pvclock_vcpu_time_info'
> (2) kvmclock based on current rdtsc and new 'pvclock_vcpu_time_info'
>
> According to the test, there is a drift of 4502145ns between (1) and (2)
> after about 40 hours. The longer the time, the larger the drift.
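
To make the mechanism concrete, here is a toy userspace comparison of the two conversions. Everything in it is an assumption for illustration: a 3 GHz TSC, the constants RAW_MULT/RAW_SHIFT/PV_MUL, and the helper names. It is not meant to reproduce the number measured above, only to show that two slightly different fixed-point approximations of the same frequency diverge linearly with elapsed cycles.

```c
#include <assert.h>
#include <stdint.h>

/* (a * mul) >> shift without 128-bit math; exact for shift <= 32. */
static uint64_t mul_shr(uint64_t a, uint32_t mul, unsigned int shift)
{
	uint64_t hi = a >> 32, lo = a & 0xffffffffu;

	assert(shift <= 32);
	return ((hi * mul) << (32 - shift)) + ((lo * mul) >> shift);
}

/* Hypothetical constants for a 3 GHz TSC; real hosts will differ. */
#define RAW_MULT  5592405u     /* about (1/3) * 2^24 */
#define RAW_SHIFT 24u
#define PV_MUL    2863311530u  /* about (2/3) * 2^32, paired with tsc_shift = -1 */

/* clocksource-style conversion: (tsc * mult) >> shift */
static uint64_t raw_ns(uint64_t cycles)
{
	return mul_shr(cycles, RAW_MULT, RAW_SHIFT);
}

/* pvclock-style conversion with tsc_shift = -1 */
static uint64_t pv_ns(uint64_t cycles)
{
	return mul_shr(cycles >> 1, PV_MUL, 32);
}

/* Disagreement between the two conversions after 'cycles' TSC ticks. */
static int64_t drift_ns(uint64_t cycles)
{
	return (int64_t)(pv_ns(cycles) - raw_ns(cycles));
}
```

With these made-up constants the two results already differ by tens of nanoseconds after one second of cycles, and by a few milliseconds after 40 hours, because the rounding error in each multiplier is a fixed relative error.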
>
> This adds a module param that lets the user configure how often to
> refresh the master clock, in order to reduce the kvmclock drift based on
> user requirement (e.g., every 5 minutes to every day). The more often the
> master clock is refreshed, the smaller the kvmclock drift during vcpu
> hotplug.
>
> Cc: Joe Jin <joe.jin@oracle.com>
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> ---
> Other options are to update the masterclock in:
> - kvmclock_sync_work, or
> - pvclock_gtod_notify()
>
>  arch/x86/include/asm/kvm_host.h |  1 +
>  arch/x86/kvm/x86.c              | 34 +++++++++++++++++++++++++++++++++
>  2 files changed, 35 insertions(+)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 17715cb8731d..57409dce5d73 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1331,6 +1331,7 @@ struct kvm_arch {
>  	u64 master_cycle_now;
>  	struct delayed_work kvmclock_update_work;
>  	struct delayed_work kvmclock_sync_work;
> +	struct delayed_work masterclock_sync_work;
>  
>  	struct kvm_xen_hvm_config xen_hvm_config;
>  
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 9f18b06bbda6..0b71dc3785eb 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -157,6 +157,9 @@ module_param(min_timer_period_us, uint, S_IRUGO | S_IWUSR);
>  static bool __read_mostly kvmclock_periodic_sync = true;
>  module_param(kvmclock_periodic_sync, bool, S_IRUGO);
>  
> +unsigned int __read_mostly masterclock_sync_period;
> +module_param(masterclock_sync_period, uint, 0444);

Can the mode be 0644, so that it can be changed at runtime?
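
One thing to watch if it does become writable: as posted, masterclock_sync_fn() returns without rescheduling when the period is 0, and kvm_arch_vcpu_postcreate() only queues the work when the period is already nonzero. So a plain 0 -> N write would not start the work for running VMs. Below is a toy userspace model of that behaviour, not kernel code. The toy_* names and the set hook are hypothetical; in the kernel this would presumably need a custom set callback for the parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the per-VM delayed work; not a kernel type. */
struct toy_vm {
	bool work_pending;       /* models a queued masterclock_sync_work */
	unsigned int refreshes;  /* how many times the sync body ran */
};

static unsigned int sync_period; /* models masterclock_sync_period (seconds) */

/* Mirrors the check in kvm_arch_vcpu_postcreate() for vcpu_idx == 0. */
static void toy_vm_start(struct toy_vm *vm)
{
	vm->refreshes = 0;
	vm->work_pending = (sync_period != 0);
}

/* Mirrors the shape of masterclock_sync_fn() in the patch. */
static void toy_sync_fn(struct toy_vm *vm)
{
	vm->work_pending = false;
	if (!sync_period)
		return;              /* the chain stops and is never re-armed */
	vm->refreshes++;
	vm->work_pending = true;     /* schedule_delayed_work() in the patch */
}

/* Hypothetical write hook for a 0644 parameter: re-arm on 0 -> nonzero. */
static void toy_set_period(struct toy_vm *vm, unsigned int new_period)
{
	bool was_off = (sync_period == 0);

	assert(vm != NULL);
	sync_period = new_period;
	if (was_off && new_period)
		vm->work_pending = true;
}
```

Writing sync_period directly while it is 0 leaves work_pending false, whereas going through toy_set_period() queues the work again. A real implementation would also have to walk every VM, which this model ignores.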

Thanks,
Joe
> +
>  /* tsc tolerance in parts per million - default to 1/2 of the NTP threshold */
>  static u32 __read_mostly tsc_tolerance_ppm = 250;
>  module_param(tsc_tolerance_ppm, uint, S_IRUGO | S_IWUSR);
> @@ -3298,6 +3301,31 @@ static void kvmclock_sync_fn(struct work_struct *work)
>  					KVMCLOCK_SYNC_PERIOD);
>  }
>  
> +static void masterclock_sync_fn(struct work_struct *work)
> +{
> +	unsigned long i;
> +	struct delayed_work *dwork = to_delayed_work(work);
> +	struct kvm_arch *ka = container_of(dwork, struct kvm_arch,
> +					   masterclock_sync_work);
> +	struct kvm *kvm = container_of(ka, struct kvm, arch);
> +	struct kvm_vcpu *vcpu;
> +
> +	if (!masterclock_sync_period)
> +		return;
> +
> +	kvm_for_each_vcpu(i, vcpu, kvm) {
> +		/*
> +		 * It is not required to kick the vcpu because it is not
> +		 * expected to update the master clock immediately.
> +		 */
> +		kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu);
> +	}
> +
> +	schedule_delayed_work(&ka->masterclock_sync_work,
> +			      masterclock_sync_period * HZ);
> +}
> +
> +
>  /* These helpers are safe iff @msr is known to be an MCx bank MSR. */
>  static bool is_mci_control_msr(u32 msr)
>  {
> @@ -11970,6 +11998,10 @@ void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
>  	if (kvmclock_periodic_sync && vcpu->vcpu_idx == 0)
>  		schedule_delayed_work(&kvm->arch.kvmclock_sync_work,
>  						KVMCLOCK_SYNC_PERIOD);
> +
> +	if (masterclock_sync_period && vcpu->vcpu_idx == 0)
> +		schedule_delayed_work(&kvm->arch.masterclock_sync_work,
> +				      masterclock_sync_period * HZ);
>  }
>  
>  void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
> @@ -12344,6 +12376,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>  
>  	INIT_DELAYED_WORK(&kvm->arch.kvmclock_update_work, kvmclock_update_fn);
>  	INIT_DELAYED_WORK(&kvm->arch.kvmclock_sync_work, kvmclock_sync_fn);
> +	INIT_DELAYED_WORK(&kvm->arch.masterclock_sync_work, masterclock_sync_fn);
>  
>  	kvm_apicv_init(kvm);
>  	kvm_hv_init_vm(kvm);
> @@ -12383,6 +12416,7 @@ static void kvm_unload_vcpu_mmus(struct kvm *kvm)
>  
>  void kvm_arch_sync_events(struct kvm *kvm)
>  {
> +	cancel_delayed_work_sync(&kvm->arch.masterclock_sync_work);
>  	cancel_delayed_work_sync(&kvm->arch.kvmclock_sync_work);
>  	cancel_delayed_work_sync(&kvm->arch.kvmclock_update_work);
>  	kvm_free_pit(kvm);

