kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: David Woodhouse <dwmw2@infradead.org>
To: "Chen, Zide" <zide.chen@intel.com>,
	Jack Allister <jalliste@amazon.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Sean Christopherson <seanjc@google.com>,
	 Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
	 Shuah Khan <shuah@kernel.org>
Cc: Paul Durrant <paul@xen.org>,
	kvm@vger.kernel.org, linux-doc@vger.kernel.org,
	 linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org
Subject: Re: [PATCH 2/2] KVM: selftests: Add KVM/PV clock selftest to prove timer drift correction
Date: Wed, 24 Apr 2024 13:58:24 +0100	[thread overview]
Message-ID: <c8dca08bf848e663f192de6705bf04aa3966e856.camel@infradead.org> (raw)
In-Reply-To: <71260288-3666-4419-8283-6565e91aaba4@intel.com>

[-- Attachment #1: Type: text/plain, Size: 3592 bytes --]

On Mon, 2024-04-22 at 15:02 -0700, Chen, Zide wrote:
> the selftest works for me, and I ran the test for 1000+ iterations,
> w/ or w/o TSC scaling, the TEST_ASSERT(delta_corrected <= ±1) never
> got hit. This is awesome!

I think that with further care we can get even better than that.

Let's look at where that ±1ns tolerance comes from.

Consider a 3GHz TSC. That gives us three ticks per nanosecond. Each TSC
value can be seen as (3n) (3n+1) or (3n+2) for a given nanosecond n.

If we take a new reference point at a (3n+2) TSC value and calculate
the KVM clock from that, we *know* we're going to round down and lose
two-thirds of a nanosecond.

So then we set the new KVM clock parameters to use that new reference
point, and that's why we have to allow a disruption of up to a single
nanosecond. In fact, I don't think it's ±1 ns, is it? It'll only ever
be in the same direction (rounding down)?

But if we're careful which *which* TSC value we use as the reference
point, we can reduce that error.

The TSC value we use should be *around* the current time, but what if
we were to evaluate maybe the previous 100 TSC values. Pass *each* of
them through the conversion to nanoseconds and use the one that comes
*closest* to a precise nanosecond (nnnnnnnn.000).

It's even fairly easy to calculate those, because of the way the KVM
clock ABI has us multiply and then shift right by 32 bits. We just need
to look at those low 32 bits (the fractional nanosecond) *before*
shifting them out of existence. Something like...

   uint64_t tsc_candidate, tsc_candidate_last, best_tsc;
   uint32_t frac_ns_min = 0xffffffff;
   uint64_t frac_ns;

   best_tsc = tsc_candidate = rdtsc();
   tsc_candidate_last = tsc_candidate - 100;

   while (tsc_candidate-- > tsc_candidate_last) {
      uint64_t guest_tsc = kvm_scale_tsc(tsc_candidate, ...);
      frac_ns = guest_tsc * hvclock->tsc_to_system_mul;
      /* Shift *after* multiplication, not before as pvclock_scale_cycles() does. */
      if (hvclock->tsc_shift < 0)
          frac_ns >>= -hvclock->tsc_shift;
      else
          frac_ns <<= hvclock->tsc_shift;

      if ( (uint32_t)frac_ns <= frac_ns_min ) {
          frac_ns_min = frac_ns;
          best_tsc = tsc_candidate;
      }
   }
   printk("Best TSC to use for reference point is %lld", best_tsc);

And then you calculate your CLOCK_MONOTONIC_RAW and guest KVM clock
from *that* host TSC value, and thus minimise the discrepancies due to
rounding down?

Aside from the fact that I literally just typed that into email and
it's barely even thought through let alone entirely untested... I'm
just not sure it's even worth the runtime cost, for that ±1 ns on a
rare case.

A slop of ±1ns is probably sufficient because over the past few years
we've already shifted the definition of the KVM clock to *not* be NTP-
corrected, and we leave guests to do fine-grained synchronization
through other means anyway.

But I see talk of people offering a PPS signal to *hundreds* of guests
on the same host simultaneously, just for them all to use it to
calibrate the same underlying oscillator. Which is a little bit insane.

We *should* have a way for the host to do that once and then expose the
precise time to its guests, in a much saner way than the KVM clock
does. I'll look at adding something along those lines to this series
too, which can be driven from the host's adjtimex() adjustments (which
KVM already consumes), and fed into each guest's timekeeping as a
PTP/PPS device or something.



[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]

  parent reply	other threads:[~2024-04-24 12:58 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-08 22:07 [PATCH 0/2] Add API to correct KVM/PV clock drift Jack Allister
2024-04-08 22:07 ` [PATCH 1/2] KVM: x86: Add KVM_[GS]ET_CLOCK_GUEST for KVM clock drift fixup Jack Allister
2024-04-09  0:34   ` Dongli Zhang
2024-04-09  3:50     ` David Woodhouse
2024-04-10 10:08     ` Allister, Jack
2024-04-08 22:07 ` [PATCH 2/2] KVM: selftests: Add KVM/PV clock selftest to prove timer drift correction Jack Allister
2024-04-09  0:43   ` Dongli Zhang
2024-04-09  4:23     ` David Woodhouse
2024-04-10 10:15     ` Allister, Jack
2024-04-11 13:28       ` David Woodhouse
2024-04-19 17:13   ` Chen, Zide
     [not found]     ` <17F1A2E9-6BAD-40E7-ACDD-B110CFC124B3@infradead.org>
2024-04-19 18:43       ` David Woodhouse
2024-04-19 23:54         ` Chen, Zide
2024-04-20 10:32           ` David Woodhouse
2024-04-20 16:03           ` David Woodhouse
2024-04-22 22:02             ` Chen, Zide
2024-04-23  7:49               ` David Woodhouse
2024-04-23 17:59                 ` Chen, Zide
2024-04-23 21:02                   ` David Woodhouse
2024-04-24 12:58               ` David Woodhouse [this message]
2024-04-19 19:34     ` David Woodhouse
2024-04-19 23:53       ` Chen, Zide
2024-04-10  9:52 ` [PATCH v2 0/2] Add API for accurate KVM/PV clock migration Jack Allister
2024-04-10  9:52   ` [PATCH v2 1/2] KVM: x86: Add KVM_[GS]ET_CLOCK_GUEST for accurate KVM " Jack Allister
2024-04-10 10:29     ` Paul Durrant
2024-04-10 12:09       ` David Woodhouse
2024-04-10 12:43         ` Paul Durrant
2024-04-17 19:50           ` David Woodhouse
2024-04-15  7:16     ` David Woodhouse
2024-04-10  9:52   ` [PATCH v2 2/2] KVM: selftests: Add KVM/PV clock selftest to prove timer correction Jack Allister
2024-04-10 10:36     ` Paul Durrant
2024-04-12  8:19     ` Dongli Zhang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c8dca08bf848e663f192de6705bf04aa3966e856.camel@infradead.org \
    --to=dwmw2@infradead.org \
    --cc=bp@alien8.de \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@linux.intel.com \
    --cc=hpa@zytor.com \
    --cc=jalliste@amazon.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=paul@xen.org \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    --cc=shuah@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    --cc=zide.chen@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).