All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jan Beulich <jbeulich@suse.com>
To: "xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>
Cc: "Andrew Cooper" <andrew.cooper3@citrix.com>,
	"Wei Liu" <wl@xen.org>, "Roger Pau Monné" <roger.pau@citrix.com>
Subject: [PATCH v2 1/4] x86/time: further improve TSC / CPU freq calibration accuracy
Date: Mon, 24 Jan 2022 09:25:15 +0100	[thread overview]
Message-ID: <2e97dd91-5e43-3312-2e47-534f425c28c4@suse.com> (raw)
In-Reply-To: <879e5b70-bffd-b240-b2c8-c755b09d41a9@suse.com>

Calibration logic assumes that the platform timer (HPET or ACPI PM
timer) and the TSC are read at about the same time. This assumption may
not hold when a long latency event (e.g. SMI or NMI) occurs between the
two reads. Reduce the risk of reading uncorrelated values by doing at
least four pairs of reads, using the tuple where the delta between the
enclosing TSC reads was smallest. From the fourth iteration onwards bail
if the new TSC delta isn't better (smaller) than the best earlier one.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
When running virtualized, scheduling in the host would also constitute
long latency events. I wonder whether, to compensate for that, we'd want
more than 3 "base" iterations, as I would expect scheduling events to
occur more frequently than e.g. SMI (and with a higher probability of
multiple ones occurring in close succession).
---
v2: Use helper functions to fold duplicate code.

--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -287,9 +287,47 @@ static char *freq_string(u64 freq)
     return s;
 }
 
-static uint64_t adjust_elapsed(uint64_t elapsed, uint32_t actual,
-                               uint32_t target)
+static uint32_t __init read_pt_and_tsc(uint64_t *tsc,
+                                       const struct platform_timesource *pts)
 {
+    uint64_t tsc_prev = *tsc = rdtsc_ordered(), tsc_min = ~0;
+    uint32_t best = best;
+    unsigned int i;
+
+    for ( i = 0; ; ++i )
+    {
+        uint32_t pt = pts->read_counter();
+        uint64_t tsc_cur = rdtsc_ordered();
+        uint64_t tsc_delta = tsc_cur - tsc_prev;
+
+        if ( tsc_delta < tsc_min )
+        {
+            tsc_min = tsc_delta;
+            *tsc = tsc_cur;
+            best = pt;
+        }
+        else if ( i > 2 )
+            break;
+
+        tsc_prev = tsc_cur;
+    }
+
+    return best;
+}
+
+static uint64_t __init calibrate_tsc(const struct platform_timesource *pts)
+{
+    uint64_t start, end, elapsed;
+    uint32_t count = read_pt_and_tsc(&start, pts);
+    uint32_t target = CALIBRATE_VALUE(pts->frequency), actual;
+    uint32_t mask = (uint32_t)~0 >> (32 - pts->counter_bits);
+
+    while ( ((pts->read_counter() - count) & mask) < target )
+        continue;
+
+    actual = read_pt_and_tsc(&end, pts) - count;
+    elapsed = end - start;
+
     if ( likely(actual > target) )
     {
         /*
@@ -395,8 +433,7 @@ static u64 read_hpet_count(void)
 
 static int64_t __init init_hpet(struct platform_timesource *pts)
 {
-    uint64_t hpet_rate, start;
-    uint32_t count, target, elapsed;
+    uint64_t hpet_rate;
     /*
      * Allow HPET to be setup, but report a frequency of 0 so it's not selected
      * as a timer source. This is required so it can be used in legacy
@@ -467,13 +504,7 @@ static int64_t __init init_hpet(struct p
 
     pts->frequency = hpet_rate;
 
-    count = hpet_read32(HPET_COUNTER);
-    start = rdtsc_ordered();
-    target = CALIBRATE_VALUE(hpet_rate);
-    while ( (elapsed = hpet_read32(HPET_COUNTER) - count) < target )
-        continue;
-
-    return adjust_elapsed(rdtsc_ordered() - start, elapsed, target);
+    return calibrate_tsc(pts);
 }
 
 static void resume_hpet(struct platform_timesource *pts)
@@ -508,22 +539,12 @@ static u64 read_pmtimer_count(void)
 
 static s64 __init init_pmtimer(struct platform_timesource *pts)
 {
-    uint64_t start;
-    uint32_t count, target, mask, elapsed;
-
     if ( !pmtmr_ioport || (pmtmr_width != 24 && pmtmr_width != 32) )
         return 0;
 
     pts->counter_bits = pmtmr_width;
-    mask = 0xffffffff >> (32 - pmtmr_width);
-
-    count = inl(pmtmr_ioport);
-    start = rdtsc_ordered();
-    target = CALIBRATE_VALUE(ACPI_PM_FREQUENCY);
-    while ( (elapsed = (inl(pmtmr_ioport) - count) & mask) < target )
-        continue;
 
-    return adjust_elapsed(rdtsc_ordered() - start, elapsed, target);
+    return calibrate_tsc(pts);
 }
 
 static struct platform_timesource __initdata plt_pmtimer =



  reply	other threads:[~2022-01-24  8:25 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-24  8:23 [PATCH v2 0/4] x86: further improve timer freq calibration accuracy Jan Beulich
2022-01-24  8:25 ` Jan Beulich [this message]
2022-02-08 13:49   ` [PATCH v2 1/4] x86/time: further improve TSC / CPU " Jan Beulich
2022-01-24  8:26 ` [PATCH v2 2/4] x86/APIC: calibrate against platform timer when possible Jan Beulich
2022-01-24  8:27 ` [PATCH v2 3/4] x86/APIC: skip unnecessary parts of __setup_APIC_LVTT() Jan Beulich
2022-01-24  8:27 ` [PATCH v2 4/4] x86/APIC: make connections between seemingly arbitrary numbers Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2e97dd91-5e43-3312-2e47-534f425c28c4@suse.com \
    --to=jbeulich@suse.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=roger.pau@citrix.com \
    --cc=wl@xen.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.