All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: David Woodhouse <dwmw2@infradead.org>
Cc: Dongli Zhang <dongli.zhang@oracle.com>,
	Joe Jin <joe.jin@oracle.com>,
	x86@kernel.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, pbonzini@redhat.com,
	tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com
Subject: Re: [PATCH RFC 1/1] KVM: x86: add param to update master clock periodically
Date: Tue, 3 Oct 2023 17:04:40 -0700	[thread overview]
Message-ID: <ZRysGAgk6W1bpXdl@google.com> (raw)
In-Reply-To: <9975969725a64c2ba2b398244dba3437bff5154e.camel@infradead.org>

On Tue, Oct 03, 2023, David Woodhouse wrote:
> On Mon, 2023-10-02 at 17:53 -0700, Sean Christopherson wrote:
> > 
> > The two domains use the same "clock" (constant TSC), but different math to compute
> > nanoseconds from a given TSC value.  For decently large TSC values, this results
> > in CLOCK_MONOTONIC_RAW and kvmclock computing two different times in nanoseconds.
> 
> This is the bit I'm still confused about, and it seems to be the root
> of all the other problems.
> 
> Both CLOCK_MONOTONIC_RAW and kvmclock have *one* job: to convert a
> number of ticks of the TSC running at a constant known frequency, to a
> number of nanoseconds.
> 
> So how in the name of all that is holy do they manage to get
> *different* answers?
> 
> I get that the mult/shift thing carries some imprecision, but is that
> all it is? 

Yep, pretty sure that's it.  It's like the plot from Office Space / Superman III.
Those little rounding errors add up over time.

PV clock:

  nanoseconds = ((TSC >> shift) * mult) >> 32

or 

  nanoseconds = ((TSC << shift) * mult) >> 32

versus timekeeping (mostly)

  nanoseconds = (TSC * mult) >> shift

The more I look at the PV clock stuff, the more I agree with Peter: it's garbage.
Shifting before multiplying is guaranteed to introduce error.  Shifting right drops
data, and shifting left introduces zeros.

> Can't we ensure that the kvmclock uses the *same* algorithm,
> precisely, as CLOCK_MONOTONIC_RAW?

Yes?  At least for sane hardware, after much staring, I think it's possible.

It's tricky because the two algorithms are wierdly different, the PV clock algorithm
is ABI and thus immutable, and Thomas and the timekeeping folks would rightly laugh
at us for suggesting that we try to shove the pvclock algorithm into the kernel.

The hardcoded shift right 32 in PV clock is annoying, but not the end of the world.

Compile tested only, but I believe this math is correct.  And I'm guessing we'd
want some safeguards against overflow, e.g. due to a multiplier that is too big.

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6573c89c35a9..ae9275c3d580 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3212,9 +3212,19 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
                                            v->arch.l1_tsc_scaling_ratio);
 
        if (unlikely(vcpu->hw_tsc_khz != tgt_tsc_khz)) {
-               kvm_get_time_scale(NSEC_PER_SEC, tgt_tsc_khz * 1000LL,
-                                  &vcpu->hv_clock.tsc_shift,
-                                  &vcpu->hv_clock.tsc_to_system_mul);
+               u32 shift, mult;
+
+               clocks_calc_mult_shift(&mult, &shift, tgt_tsc_khz, NSEC_PER_MSEC, 600);
+
+               if (shift <= 32) {
+                       vcpu->hv_clock.tsc_shift = 0;
+                       vcpu->hv_clock.tsc_to_system_mul = mult * BIT(32 - shift);
+               } else {
+                       kvm_get_time_scale(NSEC_PER_SEC, tgt_tsc_khz * 1000LL,
+                                          &vcpu->hv_clock.tsc_shift,
+                                          &vcpu->hv_clock.tsc_to_system_mul);
+               }
+
                vcpu->hw_tsc_khz = tgt_tsc_khz;
                kvm_xen_update_tsc_info(v);
        }


  reply	other threads:[~2023-10-04  0:04 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-26 23:06 [PATCH RFC 1/1] KVM: x86: add param to update master clock periodically Dongli Zhang
2023-09-27  0:29 ` Joe Jin
2023-09-27  0:36   ` Dongli Zhang
2023-09-28 16:18     ` Sean Christopherson
2023-09-29 20:15       ` Dongli Zhang
2023-10-02  8:33         ` David Woodhouse
2023-10-02 16:37           ` Sean Christopherson
2023-10-02 17:17             ` Dongli Zhang
2023-10-02 18:18               ` Sean Christopherson
2023-10-02 21:06                 ` Peter Zijlstra
2023-10-02 21:16                   ` Peter Zijlstra
2023-10-02 18:16             ` David Woodhouse
2023-10-03  0:53               ` Sean Christopherson
2023-10-03  1:32                 ` Dongli Zhang
2023-10-03  1:49                   ` Sean Christopherson
2023-10-03  2:07                     ` Dongli Zhang
2023-10-03 21:00                       ` Sean Christopherson
2023-10-03  5:54                 ` David Woodhouse
2023-10-04  0:04                   ` Sean Christopherson [this message]
2023-10-04 10:01                     ` David Woodhouse
2023-10-04 18:06                       ` Sean Christopherson
2023-10-04 19:13                         ` Dongli Zhang
2023-10-11  0:20                           ` Sean Christopherson
2023-10-11  7:18                             ` David Woodhouse
2023-10-13 18:07                               ` Sean Christopherson
2023-10-13 18:21                                 ` David Woodhouse
2023-10-13 19:02                                   ` Sean Christopherson
2023-10-13 19:12                                     ` David Woodhouse
2023-10-13 20:03                                       ` Sean Christopherson
2023-10-13 20:12                                 ` Dongli Zhang
2023-10-13 23:26                                   ` Sean Christopherson
2023-10-14  9:49                                     ` David Woodhouse
2023-10-16 15:47                                       ` Dongli Zhang
2023-10-16 16:25                                         ` David Woodhouse
2023-10-16 17:04                                           ` Dongli Zhang
2023-10-16 18:49                                           ` Sean Christopherson
2023-10-16 22:04                                             ` Dongli Zhang
2023-10-16 22:48                                               ` Sean Christopherson
2023-10-17 16:18                                                 ` Dongli Zhang
2023-10-03  9:12                 ` David Woodhouse
2023-10-04  0:07                   ` Sean Christopherson
2023-10-04  8:06                     ` David Woodhouse
2023-10-03 14:29                 ` David Woodhouse
2023-10-04  0:10                   ` Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZRysGAgk6W1bpXdl@google.com \
    --to=seanjc@google.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=dongli.zhang@oracle.com \
    --cc=dwmw2@infradead.org \
    --cc=joe.jin@oracle.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.