* [PATCH v3 0/4] x86: further improve timer freq calibration accuracy
@ 2022-02-14  9:22 Jan Beulich
  2022-02-14  9:24 ` [PATCH v3 1/4] x86/time: further improve TSC / CPU " Jan Beulich
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Jan Beulich @ 2022-02-14  9:22 UTC
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

... plus some tidying (or so I hope). Only the 1st patch changed compared
to v2.

1: time: further improve TSC / CPU freq calibration accuracy
2: APIC: calibrate against platform timer when possible
3: APIC: skip unnecessary parts of __setup_APIC_LVTT()
4: APIC: make connections between seemingly arbitrary numbers

Jan




* [PATCH v3 1/4] x86/time: further improve TSC / CPU freq calibration accuracy
  2022-02-14  9:22 [PATCH v3 0/4] x86: further improve timer freq calibration accuracy Jan Beulich
@ 2022-02-14  9:24 ` Jan Beulich
  2022-03-11 12:03   ` Roger Pau Monné
  2022-02-14  9:25 ` [PATCH v3 2/4] x86/APIC: calibrate against platform timer when possible Jan Beulich
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2022-02-14  9:24 UTC
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

Calibration logic assumes that the platform timer (HPET or ACPI PM
timer) and the TSC are read at about the same time. This assumption may
not hold when a long latency event (e.g. SMI or NMI) occurs between the
two reads. Reduce the risk of reading uncorrelated values by doing at
least four pairs of reads, using the tuple where the delta between the
enclosing TSC reads was smallest. From the fourth iteration onwards bail
if the new TSC delta isn't better (smaller) than the best earlier one.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
When running virtualized, scheduling in the host would also constitute
long latency events. I wonder whether, to compensate for that, we'd want
more than 3 "base" iterations, as I would expect scheduling events to
occur more frequently than e.g. SMI (and with a higher probability of
multiple ones occurring in close succession).
---
v3: Fix 24-bit PM timer wrapping between the two read_pt_and_tsc()
    invocations.
v2: Use helper functions to fold duplicate code.

--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -287,9 +287,47 @@ static char *freq_string(u64 freq)
     return s;
 }
 
-static uint64_t adjust_elapsed(uint64_t elapsed, uint32_t actual,
-                               uint32_t target)
+static uint32_t __init read_pt_and_tsc(uint64_t *tsc,
+                                       const struct platform_timesource *pts)
 {
+    uint64_t tsc_prev = *tsc = rdtsc_ordered(), tsc_min = ~0;
+    uint32_t best = best;
+    unsigned int i;
+
+    for ( i = 0; ; ++i )
+    {
+        uint32_t pt = pts->read_counter();
+        uint64_t tsc_cur = rdtsc_ordered();
+        uint64_t tsc_delta = tsc_cur - tsc_prev;
+
+        if ( tsc_delta < tsc_min )
+        {
+            tsc_min = tsc_delta;
+            *tsc = tsc_cur;
+            best = pt;
+        }
+        else if ( i > 2 )
+            break;
+
+        tsc_prev = tsc_cur;
+    }
+
+    return best;
+}
+
+static uint64_t __init calibrate_tsc(const struct platform_timesource *pts)
+{
+    uint64_t start, end, elapsed;
+    unsigned int count = read_pt_and_tsc(&start, pts);
+    unsigned int target = CALIBRATE_VALUE(pts->frequency), actual;
+    unsigned int mask = (uint32_t)~0 >> (32 - pts->counter_bits);
+
+    while ( ((pts->read_counter() - count) & mask) < target )
+        continue;
+
+    actual = (read_pt_and_tsc(&end, pts) - count) & mask;
+    elapsed = end - start;
+
     if ( likely(actual > target) )
     {
         /*
@@ -395,8 +433,7 @@ static u64 read_hpet_count(void)
 
 static int64_t __init init_hpet(struct platform_timesource *pts)
 {
-    uint64_t hpet_rate, start;
-    uint32_t count, target, elapsed;
+    uint64_t hpet_rate;
     /*
      * Allow HPET to be setup, but report a frequency of 0 so it's not selected
      * as a timer source. This is required so it can be used in legacy
@@ -467,13 +504,7 @@ static int64_t __init init_hpet(struct p
 
     pts->frequency = hpet_rate;
 
-    count = hpet_read32(HPET_COUNTER);
-    start = rdtsc_ordered();
-    target = CALIBRATE_VALUE(hpet_rate);
-    while ( (elapsed = hpet_read32(HPET_COUNTER) - count) < target )
-        continue;
-
-    return adjust_elapsed(rdtsc_ordered() - start, elapsed, target);
+    return calibrate_tsc(pts);
 }
 
 static void resume_hpet(struct platform_timesource *pts)
@@ -508,22 +539,12 @@ static u64 read_pmtimer_count(void)
 
 static s64 __init init_pmtimer(struct platform_timesource *pts)
 {
-    uint64_t start;
-    uint32_t count, target, mask, elapsed;
-
     if ( !pmtmr_ioport || (pmtmr_width != 24 && pmtmr_width != 32) )
         return 0;
 
     pts->counter_bits = pmtmr_width;
-    mask = 0xffffffff >> (32 - pmtmr_width);
-
-    count = inl(pmtmr_ioport);
-    start = rdtsc_ordered();
-    target = CALIBRATE_VALUE(ACPI_PM_FREQUENCY);
-    while ( (elapsed = (inl(pmtmr_ioport) - count) & mask) < target )
-        continue;
 
-    return adjust_elapsed(rdtsc_ordered() - start, elapsed, target);
+    return calibrate_tsc(pts);
 }
 
 static struct platform_timesource __initdata plt_pmtimer =
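
For illustration only, the selection loop of read_pt_and_tsc() can be exercised in user space. The sketch below keeps the loop shape of the patch but swaps the hardware reads for simulated clocks; struct sim, sim_rdtsc() and sim_read_pt() are hypothetical names that are not part of Xen, and the rate of one platform timer tick per 100 TSC cycles is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical simulated clocks (not Xen code): every TSC read advances
 * time by the next scripted latency; the platform timer is assumed to
 * tick once per 100 TSC cycles.
 */
struct sim {
    uint64_t now;             /* current simulated TSC value */
    const uint64_t *latency;  /* cycles consumed by each successive TSC read */
    size_t n, pos;            /* script length and read position */
};

static uint64_t sim_rdtsc(struct sim *s)
{
    s->now += (s->pos < s->n) ? s->latency[s->pos++] : 10;
    return s->now;
}

static uint32_t sim_read_pt(const struct sim *s)
{
    return (uint32_t)(s->now / 100);
}

/*
 * Same shape as read_pt_and_tsc(): keep the (pt, tsc) pair whose enclosing
 * TSC reads were closest together. From the fourth iteration onwards, stop
 * as soon as a pair fails to improve on the best earlier one.
 */
static uint32_t read_pt_and_tsc_sim(struct sim *s, uint64_t *tsc)
{
    uint64_t tsc_prev, tsc_min = ~(uint64_t)0;
    uint32_t best = 0;
    unsigned int i;

    assert(s != NULL && tsc != NULL);
    tsc_prev = *tsc = sim_rdtsc(s);

    for ( i = 0; ; ++i )
    {
        uint32_t pt = sim_read_pt(s);
        uint64_t tsc_cur = sim_rdtsc(s);
        uint64_t tsc_delta = tsc_cur - tsc_prev;

        if ( tsc_delta < tsc_min )
        {
            tsc_min = tsc_delta;
            *tsc = tsc_cur;
            best = pt;
        }
        else if ( i > 2 )
            break;

        tsc_prev = tsc_cur;
    }

    return best;
}
```

Scripting one long latency (standing in for an SMI) into the first pair of reads makes the helper discard that pair in favour of a later, tighter one, which is the behaviour the commit message describes.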




* [PATCH v3 2/4] x86/APIC: calibrate against platform timer when possible
  2022-02-14  9:22 [PATCH v3 0/4] x86: further improve timer freq calibration accuracy Jan Beulich
  2022-02-14  9:24 ` [PATCH v3 1/4] x86/time: further improve TSC / CPU " Jan Beulich
@ 2022-02-14  9:25 ` Jan Beulich
  2022-03-11 13:45   ` Roger Pau Monné
  2022-02-14  9:25 ` [PATCH v3 3/4] x86/APIC: skip unnecessary parts of __setup_APIC_LVTT() Jan Beulich
  2022-02-14  9:25 ` [PATCH v3 4/4] x86/APIC: make connections between seemingly arbitrary numbers Jan Beulich
  3 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2022-02-14  9:25 UTC
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

Use the original calibration against PIT only when the platform timer
is PIT. This implicitly excludes the "xen_guest" case from using the PIT
logic (init_pit() fails there, and as of 5e73b2594c54 ["x86/time: minor
adjustments to init_pit()"] using_pit also isn't being set too early
anymore), so the respective hack there can be dropped at the same time.
This also reduces calibration time from 100ms to 50ms, albeit this step
is being skipped as of 0731a56c7c72 ("x86/APIC: no need for timer
calibration when using TDT") anyway.

While re-indenting the PIT logic in calibrate_APIC_clock(), besides
adjusting style also switch around the 2nd TSC/TMCCT read pair, to match
the order of the 1st one, yielding more consistent deltas.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Open-coding apic_read() in read_tmcct() isn't overly nice, but I wanted
to avoid x2apic_enabled being evaluated twice in close succession. (The
barrier is there just in case only anyway: While this RDMSR isn't
serializing, I'm unaware of any statement whether it can also be
executed speculatively, like RDTSC can.) An option might be to move the
function to apic.c such that it would also be used by
calibrate_APIC_clock().

Unlike the CPU frequencies enumerated in CPUID leaf 0x16 (which aren't
precise), using CPUID[0x15].ECX - if populated - may be an option to
skip calibration altogether. Iirc the value there is precise, but using
the systems I have easy access to I cannot verify this: In the sample
of three I have, none have ECX populated.

I wonder whether the secondary CPU freq measurement (used for display
purposes only) wouldn't better be dropped at this occasion.
---
v2: New.

--- a/xen/arch/x86/apic.c
+++ b/xen/arch/x86/apic.c
@@ -1182,20 +1182,6 @@ static void __init check_deadline_errata
            "please update microcode to version %#x (or later)\n", rev);
 }
 
-static void __init wait_tick_pvh(void)
-{
-    u64 lapse_ns = 1000000000ULL / HZ;
-    s_time_t start, curr_time;
-
-    start = NOW();
-
-    /* Won't wrap around */
-    do {
-        cpu_relax();
-        curr_time = NOW();
-    } while ( curr_time - start < lapse_ns );
-}
-
 /*
  * In this function we calibrate APIC bus clocks to the external
  * timer. Unfortunately we cannot use jiffies and the timer irq
@@ -1211,9 +1197,6 @@ static void __init wait_tick_pvh(void)
 
 static void __init calibrate_APIC_clock(void)
 {
-    unsigned long long t1, t2;
-    unsigned long tt1, tt2;
-    unsigned int i;
     unsigned long bus_freq; /* KAF: pointer-size avoids compile warns. */
     unsigned int bus_cycle; /* length of one bus cycle in pico-seconds */
 #define LOOPS_FRAC 10U      /* measure for one tenth of a second */
@@ -1226,39 +1209,38 @@ static void __init calibrate_APIC_clock(
      */
     __setup_APIC_LVTT(0xffffffff);
 
-    if ( !xen_guest )
+    bus_freq = calibrate_apic_timer();
+    if ( !bus_freq )
+    {
+        unsigned int i, tt1, tt2;
+        unsigned long t1, t2;
+
+        ASSERT(!xen_guest);
+
         /*
-         * The timer chip counts down to zero. Let's wait
-         * for a wraparound to start exact measurement:
-         * (the current tick might have been already half done)
+         * The timer chip counts down to zero. Let's wait for a wraparound to
+         * start exact measurement (the current tick might have been already
+         * half done):
          */
         wait_8254_wraparound();
-    else
-        wait_tick_pvh();
 
-    /*
-     * We wrapped around just now. Let's start:
-     */
-    t1 = rdtsc_ordered();
-    tt1 = apic_read(APIC_TMCCT);
+        /* We wrapped around just now. Let's start: */
+        t1 = rdtsc_ordered();
+        tt1 = apic_read(APIC_TMCCT);
 
-    /*
-     * Let's wait HZ / LOOPS_FRAC ticks:
-     */
-    for (i = 0; i < HZ / LOOPS_FRAC; i++)
-        if ( !xen_guest )
+        /* Let's wait HZ / LOOPS_FRAC ticks: */
+        for ( i = 0; i < HZ / LOOPS_FRAC; ++i )
             wait_8254_wraparound();
-        else
-            wait_tick_pvh();
 
-    tt2 = apic_read(APIC_TMCCT);
-    t2 = rdtsc_ordered();
+        t2 = rdtsc_ordered();
+        tt2 = apic_read(APIC_TMCCT);
 
-    bus_freq = (tt1 - tt2) * APIC_DIVISOR * LOOPS_FRAC;
+        bus_freq = (tt1 - tt2) * APIC_DIVISOR * LOOPS_FRAC;
 
-    apic_printk(APIC_VERBOSE, "..... CPU clock speed is %lu.%04lu MHz.\n",
-                ((unsigned long)(t2 - t1) * LOOPS_FRAC) / 1000000,
-                (((unsigned long)(t2 - t1) * LOOPS_FRAC) / 100) % 10000);
+        apic_printk(APIC_VERBOSE, "..... CPU clock speed is %lu.%04lu MHz.\n",
+                    ((t2 - t1) * LOOPS_FRAC) / 1000000,
+                    (((t2 - t1) * LOOPS_FRAC) / 100) % 10000);
+    }
 
     apic_printk(APIC_VERBOSE, "..... host bus clock speed is %ld.%04ld MHz.\n",
                 bus_freq / 1000000, (bus_freq / 100) % 10000);
--- a/xen/arch/x86/include/asm/apic.h
+++ b/xen/arch/x86/include/asm/apic.h
@@ -192,6 +192,8 @@ extern void record_boot_APIC_mode(void);
 extern enum apic_mode current_local_apic_mode(void);
 extern void check_for_unexpected_msi(unsigned int vector);
 
+uint64_t calibrate_apic_timer(void);
+
 extern void check_nmi_watchdog(void);
 
 extern unsigned int nmi_watchdog;
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -26,6 +26,7 @@
 #include <xen/symbols.h>
 #include <xen/keyhandler.h>
 #include <xen/guest_access.h>
+#include <asm/apic.h>
 #include <asm/io.h>
 #include <asm/iocap.h>
 #include <asm/msr.h>
@@ -1004,6 +1005,78 @@ static u64 __init init_platform_timer(vo
     return rc;
 }
 
+static uint32_t __init read_tmcct(void)
+{
+    if ( x2apic_enabled )
+    {
+        alternative("lfence", "mfence", X86_FEATURE_MFENCE_RDTSC);
+        return apic_rdmsr(APIC_TMCCT);
+    }
+
+    return apic_mem_read(APIC_TMCCT);
+}
+
+static uint64_t __init read_pt_and_tmcct(uint32_t *tmcct)
+{
+    uint32_t tmcct_prev = *tmcct = read_tmcct(), tmcct_min = ~0;
+    uint64_t best = best;
+    unsigned int i;
+
+    for ( i = 0; ; ++i )
+    {
+        uint64_t pt = plt_src.read_counter();
+        uint32_t tmcct_cur = read_tmcct();
+        uint32_t tmcct_delta = tmcct_prev - tmcct_cur;
+
+        if ( tmcct_delta < tmcct_min )
+        {
+            tmcct_min = tmcct_delta;
+            *tmcct = tmcct_cur;
+            best = pt;
+        }
+        else if ( i > 2 )
+            break;
+
+        tmcct_prev = tmcct_cur;
+    }
+
+    return best;
+}
+
+uint64_t __init calibrate_apic_timer(void)
+{
+    uint32_t start, end;
+    uint64_t count = read_pt_and_tmcct(&start), elapsed;
+    uint64_t target = CALIBRATE_VALUE(plt_src.frequency), actual;
+    uint64_t mask = (uint64_t)~0 >> (64 - plt_src.counter_bits);
+
+    /*
+     * PIT cannot be used here as it requires the timer interrupt to maintain
+     * its 32-bit software counter, yet here we run with IRQs disabled.
+     */
+    if ( using_pit )
+        return 0;
+
+    while ( ((plt_src.read_counter() - count) & mask) < target )
+        continue;
+
+    actual = read_pt_and_tmcct(&end) - count;
+    elapsed = start - end;
+
+    if ( likely(actual > target) )
+    {
+        /* See the comment in calibrate_tsc(). */
+        while ( unlikely(actual > (uint32_t)actual) )
+        {
+            actual >>= 1;
+            target >>= 1;
+        }
+        elapsed = muldiv64(elapsed, target, actual);
+    }
+
+    return elapsed * CALIBRATE_FRAC;
+}
+
 u64 stime2tsc(s_time_t stime)
 {
     struct cpu_time *t;
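
As a rough sketch of the arithmetic at the end of calibrate_apic_timer(): when the platform timer overshot the target, the measured tick count is scaled back by target / actual before being multiplied up to a per-second rate. muldiv64_sketch() below is a hypothetical stand-in, not Xen's muldiv64(), and the fraction is passed in as a parameter because CALIBRATE_FRAC itself is not shown in this patch (the 50ms quoted in the description would suggest 20, but that is an inference).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a 64x32/32 multiply-divide: computes
 * a * b / c without overflowing the intermediate product.
 * Splitting a into quotient and remainder keeps (a % c) * b below 2^64.
 */
static uint64_t muldiv64_sketch(uint64_t a, uint32_t b, uint32_t c)
{
    assert(c != 0);
    return (a / c) * b + (a % c) * b / c;
}

/*
 * Scale 'elapsed' by target / actual when the reference timer ran past
 * the target. Both operands are halved until 'actual' fits in 32 bits,
 * which preserves their ratio (up to rounding).
 */
static uint64_t scale_elapsed(uint64_t elapsed, uint64_t target,
                              uint64_t actual)
{
    if ( actual > target )
    {
        while ( actual > (uint32_t)actual )
        {
            actual >>= 1;
            target >>= 1;
        }
        elapsed = muldiv64_sketch(elapsed, (uint32_t)target,
                                  (uint32_t)actual);
    }

    return elapsed;
}

/* Ticks measured over 1/frac of a second, converted to ticks per second. */
static uint64_t to_frequency(uint64_t scaled_elapsed, unsigned int frac)
{
    return scaled_elapsed * frac;
}
```

With these helpers, a measurement that ran for twice the intended window is halved before conversion, while one that hit the target exactly is passed through unchanged.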




* [PATCH v3 3/4] x86/APIC: skip unnecessary parts of __setup_APIC_LVTT()
  2022-02-14  9:22 [PATCH v3 0/4] x86: further improve timer freq calibration accuracy Jan Beulich
  2022-02-14  9:24 ` [PATCH v3 1/4] x86/time: further improve TSC / CPU " Jan Beulich
  2022-02-14  9:25 ` [PATCH v3 2/4] x86/APIC: calibrate against platform timer when possible Jan Beulich
@ 2022-02-14  9:25 ` Jan Beulich
  2022-03-11 14:05   ` Roger Pau Monné
  2022-02-14  9:25 ` [PATCH v3 4/4] x86/APIC: make connections between seemingly arbitrary numbers Jan Beulich
  3 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2022-02-14  9:25 UTC
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

In TDT mode there's no point writing TDCR or TMICT, while outside of
that mode there's no need for the MFENCE.

No change intended to overall functioning.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: New.

--- a/xen/arch/x86/apic.c
+++ b/xen/arch/x86/apic.c
@@ -1059,24 +1059,25 @@ static void __setup_APIC_LVTT(unsigned i
 {
     unsigned int lvtt_value, tmp_value;
 
-    /* NB. Xen uses local APIC timer in one-shot mode. */
-    lvtt_value = /*APIC_TIMER_MODE_PERIODIC |*/ LOCAL_TIMER_VECTOR;
-
     if ( tdt_enabled )
     {
-        lvtt_value &= (~APIC_TIMER_MODE_MASK);
-        lvtt_value |= APIC_TIMER_MODE_TSC_DEADLINE;
+        lvtt_value = APIC_TIMER_MODE_TSC_DEADLINE | LOCAL_TIMER_VECTOR;
+        apic_write(APIC_LVTT, lvtt_value);
+
+        /*
+         * See Intel SDM: TSC-Deadline Mode chapter. In xAPIC mode,
+         * writing to the APIC LVTT and TSC_DEADLINE MSR isn't serialized.
+         * According to Intel, MFENCE can do the serialization here.
+         */
+        asm volatile( "mfence" : : : "memory" );
+
+        return;
     }
 
+    /* NB. Xen uses local APIC timer in one-shot mode. */
+    lvtt_value = /*APIC_TIMER_MODE_PERIODIC |*/ LOCAL_TIMER_VECTOR;
     apic_write(APIC_LVTT, lvtt_value);
 
-    /*
-     * See Intel SDM: TSC-Deadline Mode chapter. In xAPIC mode,
-     * writing to the APIC LVTT and TSC_DEADLINE MSR isn't serialized.
-     * According to Intel, MFENCE can do the serialization here.
-     */
-    asm volatile( "mfence" : : : "memory" );
-
     tmp_value = apic_read(APIC_TDCR);
     apic_write(APIC_TDCR, tmp_value | APIC_TDR_DIV_1);
 




* [PATCH v3 4/4] x86/APIC: make connections between seemingly arbitrary numbers
  2022-02-14  9:22 [PATCH v3 0/4] x86: further improve timer freq calibration accuracy Jan Beulich
                   ` (2 preceding siblings ...)
  2022-02-14  9:25 ` [PATCH v3 3/4] x86/APIC: skip unnecessary parts of __setup_APIC_LVTT() Jan Beulich
@ 2022-02-14  9:25 ` Jan Beulich
  2022-03-11 14:24   ` Roger Pau Monné
  3 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2022-02-14  9:25 UTC
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

Making adjustments to arbitrarily chosen values shouldn't require
auditing the code for possible derived numbers - such a change should
be doable in a single place, having an effect on all code depending on
that choice.

For one make the TDCR write actually use APIC_DIVISOR. With the
necessary mask constant introduced, also use that in vLAPIC code. While
introducing the constant, drop APIC_TDR_DIV_TMBASE: The bit has been
undefined in halfway recent SDM and PM versions.

And then introduce a constant tying together the scale used when
converting nanoseconds to bus clocks.

No functional change intended.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
I thought we have a generic "glue" macro, but I couldn't find one. Hence
I'm (ab)using _AC().
---
v2: New.

--- a/xen/arch/x86/apic.c
+++ b/xen/arch/x86/apic.c
@@ -1078,8 +1078,8 @@ static void __setup_APIC_LVTT(unsigned i
     lvtt_value = /*APIC_TIMER_MODE_PERIODIC |*/ LOCAL_TIMER_VECTOR;
     apic_write(APIC_LVTT, lvtt_value);
 
-    tmp_value = apic_read(APIC_TDCR);
-    apic_write(APIC_TDCR, tmp_value | APIC_TDR_DIV_1);
+    tmp_value = apic_read(APIC_TDCR) & ~APIC_TDR_DIV_MASK;
+    apic_write(APIC_TDCR, tmp_value | _AC(APIC_TDR_DIV_, APIC_DIVISOR));
 
     apic_write(APIC_TMICT, clocks / APIC_DIVISOR);
 }
@@ -1196,6 +1196,8 @@ static void __init check_deadline_errata
  * APIC irq that way.
  */
 
+#define BUS_SCALE_SHIFT 18
+
 static void __init calibrate_APIC_clock(void)
 {
     unsigned long bus_freq; /* KAF: pointer-size avoids compile warns. */
@@ -1249,8 +1251,8 @@ static void __init calibrate_APIC_clock(
     /* set up multipliers for accurate timer code */
     bus_cycle  = 1000000000000UL / bus_freq; /* in pico seconds */
     bus_cycle += (1000000000000UL % bus_freq) * 2 > bus_freq;
-    bus_scale  = (1000*262144)/bus_cycle;
-    bus_scale += ((1000 * 262144) % bus_cycle) * 2 > bus_cycle;
+    bus_scale  = (1000 << BUS_SCALE_SHIFT) / bus_cycle;
+    bus_scale += ((1000 << BUS_SCALE_SHIFT) % bus_cycle) * 2 > bus_cycle;
 
     apic_printk(APIC_VERBOSE, "..... bus_scale = %#x\n", bus_scale);
     /* reset APIC to zero timeout value */
@@ -1337,7 +1339,8 @@ int reprogram_timer(s_time_t timeout)
     }
 
     if ( timeout && ((expire = timeout - NOW()) > 0) )
-        apic_tmict = min_t(u64, (bus_scale * expire) >> 18, UINT_MAX);
+        apic_tmict = min_t(uint64_t, (bus_scale * expire) >> BUS_SCALE_SHIFT,
+                           UINT32_MAX);
 
     apic_write(APIC_TMICT, (unsigned long)apic_tmict);
 
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -580,7 +580,7 @@ static uint32_t vlapic_get_tmcct(const s
 static void vlapic_set_tdcr(struct vlapic *vlapic, unsigned int val)
 {
     /* Only bits 0, 1 and 3 are settable; others are MBZ. */
-    val &= 0xb;
+    val &= APIC_TDR_DIV_MASK;
     vlapic_set_reg(vlapic, APIC_TDCR, val);
 
     /* Update the demangled hw.timer_divisor. */
@@ -887,7 +887,7 @@ void vlapic_reg_write(struct vcpu *v, un
     {
         uint32_t current_divisor = vlapic->hw.timer_divisor;
 
-        vlapic_set_tdcr(vlapic, val & 0xb);
+        vlapic_set_tdcr(vlapic, val);
 
         vlapic_update_timer(vlapic, vlapic_get_reg(vlapic, APIC_LVTT), false,
                             current_divisor);
@@ -1019,7 +1019,7 @@ int guest_wrmsr_x2apic(struct vcpu *v, u
         break;
 
     case APIC_TDCR:
-        if ( msr_content & ~APIC_TDR_DIV_1 )
+        if ( msr_content & ~APIC_TDR_DIV_MASK )
             return X86EMUL_EXCEPTION;
         break;
 
--- a/xen/arch/x86/include/asm/apicdef.h
+++ b/xen/arch/x86/include/asm/apicdef.h
@@ -106,7 +106,7 @@
 #define		APIC_TMICT	0x380
 #define		APIC_TMCCT	0x390
 #define		APIC_TDCR	0x3E0
-#define			APIC_TDR_DIV_TMBASE	(1<<2)
+#define			APIC_TDR_DIV_MASK	0xB
 #define			APIC_TDR_DIV_1		0xB
 #define			APIC_TDR_DIV_2		0x0
 #define			APIC_TDR_DIV_4		0x1
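
To make the role of BUS_SCALE_SHIFT concrete, here is a small user-space sketch of the fixed-point conversion from nanoseconds to bus clocks. It mirrors the rounding used in calibrate_APIC_clock() and the clamping in reprogram_timer(), but the function names are hypothetical and the 100 MHz figure used afterwards is just an assumed example frequency.

```c
#include <assert.h>
#include <stdint.h>

#define BUS_SCALE_SHIFT 18

/* Length of one bus cycle in picoseconds, rounded to nearest. */
static uint32_t bus_cycle_ps(uint64_t bus_freq)
{
    const uint64_t ps_per_sec = 1000000000000ULL;

    assert(bus_freq != 0);
    return (uint32_t)(ps_per_sec / bus_freq +
                      ((ps_per_sec % bus_freq) * 2 > bus_freq));
}

/*
 * Bus clocks per nanosecond as a fixed-point value with BUS_SCALE_SHIFT
 * fractional bits: (1000 ps/ns << shift) / (ps per cycle), rounded.
 */
static uint32_t bus_scale_from_cycle(uint32_t bus_cycle)
{
    const uint32_t num = 1000u << BUS_SCALE_SHIFT;

    assert(bus_cycle != 0);
    return num / bus_cycle + ((num % bus_cycle) * 2 > bus_cycle);
}

/* Convert a timeout in ns to bus clocks, saturating at 32 bits. */
static uint32_t ns_to_bus_clocks(uint32_t bus_scale, uint64_t ns)
{
    uint64_t clocks = ((uint64_t)bus_scale * ns) >> BUS_SCALE_SHIFT;

    return clocks > UINT32_MAX ? UINT32_MAX : (uint32_t)clocks;
}
```

For an assumed 100 MHz bus, the cycle length comes out as 10000 ps and the scale as 26214, so a 1 ms timeout maps to 99998 clocks rather than the ideal 100000; the shortfall is the truncation error of the 18-bit fraction. Because the shift appears in both the scale computation and the conversion, changing it in one place only is exactly the mistake the shared constant guards against.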




* Re: [PATCH v3 1/4] x86/time: further improve TSC / CPU freq calibration accuracy
  2022-02-14  9:24 ` [PATCH v3 1/4] x86/time: further improve TSC / CPU " Jan Beulich
@ 2022-03-11 12:03   ` Roger Pau Monné
  2022-03-11 12:30     ` Jan Beulich
  0 siblings, 1 reply; 18+ messages in thread
From: Roger Pau Monné @ 2022-03-11 12:03 UTC
  To: Jan Beulich; +Cc: xen-devel, Andrew Cooper, Wei Liu

On Mon, Feb 14, 2022 at 10:24:49AM +0100, Jan Beulich wrote:
> Calibration logic assumes that the platform timer (HPET or ACPI PM
> timer) and the TSC are read at about the same time. This assumption may
> not hold when a long latency event (e.g. SMI or NMI) occurs between the
> two reads. Reduce the risk of reading uncorrelated values by doing at
> least four pairs of reads, using the tuple where the delta between the
> enclosing TSC reads was smallest. From the fourth iteration onwards bail
> if the new TSC delta isn't better (smaller) than the best earlier one.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

> ---
> When running virtualized, scheduling in the host would also constitute
> long latency events. I wonder whether, to compensate for that, we'd want
> more than 3 "base" iterations, as I would expect scheduling events to
> occur more frequently than e.g. SMI (and with a higher probability of
> multiple ones occurring in close succession).

That's hard to tell, maybe we should make the base iteration count
settable from the command line?

> ---
> v3: Fix 24-bit PM timer wrapping between the two read_pt_and_tsc()
>     invocations.
> v2: Use helper functions to fold duplicate code.
> 
> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -287,9 +287,47 @@ static char *freq_string(u64 freq)
>      return s;
>  }
>  
> -static uint64_t adjust_elapsed(uint64_t elapsed, uint32_t actual,
> -                               uint32_t target)
> +static uint32_t __init read_pt_and_tsc(uint64_t *tsc,
> +                                       const struct platform_timesource *pts)
>  {
> +    uint64_t tsc_prev = *tsc = rdtsc_ordered(), tsc_min = ~0;
> +    uint32_t best = best;
> +    unsigned int i;
> +
> +    for ( i = 0; ; ++i )
> +    {
> +        uint32_t pt = pts->read_counter();
> +        uint64_t tsc_cur = rdtsc_ordered();
> +        uint64_t tsc_delta = tsc_cur - tsc_prev;
> +
> +        if ( tsc_delta < tsc_min )
> +        {
> +            tsc_min = tsc_delta;
> +            *tsc = tsc_cur;
> +            best = pt;
> +        }
> +        else if ( i > 2 )
> +            break;
> +
> +        tsc_prev = tsc_cur;
> +    }
> +
> +    return best;
> +}
> +
> +static uint64_t __init calibrate_tsc(const struct platform_timesource *pts)
> +{
> +    uint64_t start, end, elapsed;
> +    unsigned int count = read_pt_and_tsc(&start, pts);
> +    unsigned int target = CALIBRATE_VALUE(pts->frequency), actual;
> +    unsigned int mask = (uint32_t)~0 >> (32 - pts->counter_bits);

Just to be on the safe side you might want to add an assert that
counter_bits <= 32.

Thanks, Roger.
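
A minimal sketch of the point about counter_bits, assuming nothing beyond standard C: the mask computation shifts a 32-bit value by (32 - counter_bits), which is undefined once that amount reaches 32 or goes negative, and the masked subtraction is what makes the 24-bit PM timer wrap harmless. The helper names counter_mask() and counter_delta() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* All-ones mask covering the low 'bits' bits of a counter. */
static uint32_t counter_mask(unsigned int bits)
{
    /*
     * Shifting a 32-bit value by 32 or more is undefined behaviour,
     * so both bits == 0 and bits > 32 have to be excluded up front.
     */
    assert(bits >= 1 && bits <= 32);
    return (uint32_t)~0u >> (32 - bits);
}

/* Ticks elapsed from 'start' to 'now', tolerating a single wrap. */
static uint32_t counter_delta(uint32_t now, uint32_t start, uint32_t mask)
{
    return (now - start) & mask;
}
```

With a 24-bit counter that started at 0xfffff0 and now reads 0x000010, the masked difference is 0x20 ticks, whereas the unmasked 32-bit difference would be 0xff000020.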



* Re: [PATCH v3 1/4] x86/time: further improve TSC / CPU freq calibration accuracy
  2022-03-11 12:03   ` Roger Pau Monné
@ 2022-03-11 12:30     ` Jan Beulich
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Beulich @ 2022-03-11 12:30 UTC
  To: Roger Pau Monné; +Cc: xen-devel, Andrew Cooper, Wei Liu

On 11.03.2022 13:03, Roger Pau Monné wrote:
> On Mon, Feb 14, 2022 at 10:24:49AM +0100, Jan Beulich wrote:
>> Calibration logic assumes that the platform timer (HPET or ACPI PM
>> timer) and the TSC are read at about the same time. This assumption may
>> not hold when a long latency event (e.g. SMI or NMI) occurs between the
>> two reads. Reduce the risk of reading uncorrelated values by doing at
>> least four pairs of reads, using the tuple where the delta between the
>> enclosing TSC reads was smallest. From the fourth iteration onwards bail
>> if the new TSC delta isn't better (smaller) than the best earlier one.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks.

>> ---
>> When running virtualized, scheduling in the host would also constitute
>> long latency events. I wonder whether, to compensate for that, we'd want
>> more than 3 "base" iterations, as I would expect scheduling events to
>> occur more frequently than e.g. SMI (and with a higher probability of
>> multiple ones occurring in close succession).
> 
> That's hard to tell, maybe we should make the base iteration count
> settable from the command line?

As a last resort (if people observe problems) - maybe. It's not clear to me
though on what basis an admin would choose another value.

Jan




* Re: [PATCH v3 2/4] x86/APIC: calibrate against platform timer when possible
  2022-02-14  9:25 ` [PATCH v3 2/4] x86/APIC: calibrate against platform timer when possible Jan Beulich
@ 2022-03-11 13:45   ` Roger Pau Monné
  2022-03-14 16:19     ` Jan Beulich
  0 siblings, 1 reply; 18+ messages in thread
From: Roger Pau Monné @ 2022-03-11 13:45 UTC
  To: Jan Beulich; +Cc: xen-devel, Andrew Cooper, Wei Liu

On Mon, Feb 14, 2022 at 10:25:11AM +0100, Jan Beulich wrote:
> Use the original calibration against PIT only when the platform timer
> is PIT. This implicitly excludes the "xen_guest" case from using the PIT
> logic (init_pit() fails there, and as of 5e73b2594c54 ["x86/time: minor
> adjustments to init_pit()"] using_pit also isn't being set too early
> anymore), so the respective hack there can be dropped at the same time.
> This also reduces calibration time from 100ms to 50ms, albeit this step
> is being skipped as of 0731a56c7c72 ("x86/APIC: no need for timer
> calibration when using TDT") anyway.
> 
> While re-indenting the PIT logic in calibrate_APIC_clock(), besides
> adjusting style also switch around the 2nd TSC/TMCCT read pair, to match
> the order of the 1st one, yielding more consistent deltas.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> Open-coding apic_read() in read_tmcct() isn't overly nice, but I wanted
> to avoid x2apic_enabled being evaluated twice in close succession. (The
> barrier is there just in case only anyway: While this RDMSR isn't
> serializing, I'm unaware of any statement whether it can also be
> executed speculatively, like RDTSC can.) An option might be to move the
> function to apic.c such that it would also be used by
> calibrate_APIC_clock().

I think that would make sense. Or else it's kind of orthogonal that we
use a barrier in calibrate_apic_timer but not in calibrate_APIC_clock.
But maybe we can get rid of the open-coded PIT calibration in
calibrate_APIC_clock? (see below)

> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -26,6 +26,7 @@
>  #include <xen/symbols.h>
>  #include <xen/keyhandler.h>
>  #include <xen/guest_access.h>
> +#include <asm/apic.h>
>  #include <asm/io.h>
>  #include <asm/iocap.h>
>  #include <asm/msr.h>
> @@ -1004,6 +1005,78 @@ static u64 __init init_platform_timer(vo
>      return rc;
>  }
>  
> +static uint32_t __init read_tmcct(void)
> +{
> +    if ( x2apic_enabled )
> +    {
> +        alternative("lfence", "mfence", X86_FEATURE_MFENCE_RDTSC);
> +        return apic_rdmsr(APIC_TMCCT);
> +    }
> +
> +    return apic_mem_read(APIC_TMCCT);
> +}
> +
> +static uint64_t __init read_pt_and_tmcct(uint32_t *tmcct)
> +{
> +    uint32_t tmcct_prev = *tmcct = read_tmcct(), tmcct_min = ~0;
> +    uint64_t best = best;
> +    unsigned int i;
> +
> +    for ( i = 0; ; ++i )
> +    {
> +        uint64_t pt = plt_src.read_counter();
> +        uint32_t tmcct_cur = read_tmcct();
> +        uint32_t tmcct_delta = tmcct_prev - tmcct_cur;
> +
> +        if ( tmcct_delta < tmcct_min )
> +        {
> +            tmcct_min = tmcct_delta;
> +            *tmcct = tmcct_cur;
> +            best = pt;
> +        }
> +        else if ( i > 2 )
> +            break;
> +
> +        tmcct_prev = tmcct_cur;
> +    }
> +
> +    return best;
> +}
> +
> +uint64_t __init calibrate_apic_timer(void)
> +{
> +    uint32_t start, end;
> +    uint64_t count = read_pt_and_tmcct(&start), elapsed;
> +    uint64_t target = CALIBRATE_VALUE(plt_src.frequency), actual;
> +    uint64_t mask = (uint64_t)~0 >> (64 - plt_src.counter_bits);
> +
> +    /*
> +     * PIT cannot be used here as it requires the timer interrupt to maintain
> +     * its 32-bit software counter, yet here we run with IRQs disabled.
> +     */

The reasoning in calibrate_APIC_clock to have interrupts disabled
doesn't apply anymore I would think (interrupts are already enabled
when we get there), and hence it seems to me that calibrate_APIC_clock
could be called with interrupts enabled and we could remove the
open-coded usage of the PIT in calibrate_APIC_clock.

Thanks, Roger.



* Re: [PATCH v3 3/4] x86/APIC: skip unnecessary parts of __setup_APIC_LVTT()
  2022-02-14  9:25 ` [PATCH v3 3/4] x86/APIC: skip unnecessary parts of __setup_APIC_LVTT() Jan Beulich
@ 2022-03-11 14:05   ` Roger Pau Monné
  2022-03-14  8:25     ` Jan Beulich
  0 siblings, 1 reply; 18+ messages in thread
From: Roger Pau Monné @ 2022-03-11 14:05 UTC
  To: Jan Beulich; +Cc: xen-devel, Andrew Cooper, Wei Liu

On Mon, Feb 14, 2022 at 10:25:31AM +0100, Jan Beulich wrote:
> In TDT mode there's no point writing TDCR or TMICT, while outside of
> that mode there's no need for the MFENCE.
> 
> No change intended to overall functioning.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

I've got some comments below, not that the current proposal is bad,
but I think we could simplify a bit more.

> ---
> v2: New.
> 
> --- a/xen/arch/x86/apic.c
> +++ b/xen/arch/x86/apic.c
> @@ -1059,24 +1059,25 @@ static void __setup_APIC_LVTT(unsigned i
>  {
>      unsigned int lvtt_value, tmp_value;
>  
> -    /* NB. Xen uses local APIC timer in one-shot mode. */
> -    lvtt_value = /*APIC_TIMER_MODE_PERIODIC |*/ LOCAL_TIMER_VECTOR;
> -
>      if ( tdt_enabled )
>      {
> -        lvtt_value &= (~APIC_TIMER_MODE_MASK);
> -        lvtt_value |= APIC_TIMER_MODE_TSC_DEADLINE;
> +        lvtt_value = APIC_TIMER_MODE_TSC_DEADLINE | LOCAL_TIMER_VECTOR;
> +        apic_write(APIC_LVTT, lvtt_value);
> +
> +        /*
> +         * See Intel SDM: TSC-Deadline Mode chapter. In xAPIC mode,
> +         * writing to the APIC LVTT and TSC_DEADLINE MSR isn't serialized.
> +         * According to Intel, MFENCE can do the serialization here.
> +         */
> +        asm volatile( "mfence" : : : "memory" );
> +
> +        return;
>      }
>  
> +    /* NB. Xen uses local APIC timer in one-shot mode. */
> +    lvtt_value = /*APIC_TIMER_MODE_PERIODIC |*/ LOCAL_TIMER_VECTOR;

While here I wouldn't mind if you replaced the comment(s) here with
APIC_TIMER_MODE_ONESHOT. I think that's clearer.

I wouldn't mind if you did something like:

unsigned int lvtt_value = (tdt_enabled ? APIC_TIMER_MODE_TSC_DEADLINE
                                       : APIC_TIMER_MODE_ONESHOT) |
                          LOCAL_TIMER_VECTOR;

apic_write(APIC_LVTT, lvtt_value);

if ( tdt_enabled )
{
    MFENCE;
    return;
}

Thanks, Roger.



* Re: [PATCH v3 4/4] x86/APIC: make connections between seemingly arbitrary numbers
  2022-02-14  9:25 ` [PATCH v3 4/4] x86/APIC: make connections between seemingly arbitrary numbers Jan Beulich
@ 2022-03-11 14:24   ` Roger Pau Monné
  2022-03-14  8:19     ` Jan Beulich
  0 siblings, 1 reply; 18+ messages in thread
From: Roger Pau Monné @ 2022-03-11 14:24 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Andrew Cooper, Wei Liu

On Mon, Feb 14, 2022 at 10:25:57AM +0100, Jan Beulich wrote:
> Making adjustments to arbitrarily chosen values shouldn't require
> auditing the code for possible derived numbers - such a change should
> be doable in a single place, having an effect on all code depending on
> that choice.
> 
> For one make the TDCR write actually use APIC_DIVISOR. With the
> necessary mask constant introduced, also use that in vLAPIC code. While
> introducing the constant, drop APIC_TDR_DIV_TMBASE: The bit has been
> undefined in halfway recent SDM and PM versions.
> 
> And then introduce a constant tying together the scale used when
> converting nanoseconds to bus clocks.
> 
> No functional change intended.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

> ---
> I thought we have a generic "glue" macro, but I couldn't find one. Hence
> I'm (ab)using _AC().

I would be fine if you want to introduce something right in this
commit to cover those needs, using _AC is not overly nice (or
clear) IMO.

Thanks, Roger.



* Re: [PATCH v3 4/4] x86/APIC: make connections between seemingly arbitrary numbers
  2022-03-11 14:24   ` Roger Pau Monné
@ 2022-03-14  8:19     ` Jan Beulich
  2022-03-14  8:56       ` Roger Pau Monné
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2022-03-14  8:19 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Andrew Cooper, Wei Liu

On 11.03.2022 15:24, Roger Pau Monné wrote:
> On Mon, Feb 14, 2022 at 10:25:57AM +0100, Jan Beulich wrote:
>> Making adjustments to arbitrarily chosen values shouldn't require
>> auditing the code for possible derived numbers - such a change should
>> be doable in a single place, having an effect on all code depending on
>> that choice.
>>
>> For one make the TDCR write actually use APIC_DIVISOR. With the
>> necessary mask constant introduced, also use that in vLAPIC code. While
>> introducing the constant, drop APIC_TDR_DIV_TMBASE: The bit has been
>> undefined in halfway recent SDM and PM versions.
>>
>> And then introduce a constant tying together the scale used when
>> converting nanoseconds to bus clocks.
>>
>> No functional change intended.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks.

>> ---
>> I thought we have a generic "glue" macro, but I couldn't find one. Hence
>> I'm (ab)using _AC().
> 
> I would be fine if you want to introduce something right in this
> commit to cover those needs, using _AC is not overly nice (or
> clear) IMO.

Hmm, I was rather hoping that you (or someone else) would point me
at what I thought I'm overlooking. If anything I'd likely clone
Linux's __PASTE() (avoiding the leading underscores), but their
placement in linux/compiler_types.h seems pretty arbitrary and
hence not a good guideline for placement in our tree. To be honest
the only thing that would seem halfway consistent to me would be a
separate header, yet that seems somewhat overkill ... Or wait -
maybe xen/lib.h could be viewed as kind of suitable. Of course
there's then the immediate question of whether to make _AC() use
the new macro instead of open-coding it.

Jan
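
For illustration, a minimal sketch of the kind of two-level "glue" macro
discussed above, following the pattern of Linux's __PASTE(). The names
PASTE(), PASTE_(), AC() and tdcr_encoding() are hypothetical, and the
constant values are assumptions made for the example rather than what was
eventually committed to Xen.

```c
#include <stdint.h>

/*
 * Two-level token pasting: the outer macro lets its arguments be
 * macro-expanded first, the inner one glues the results together.
 * All names here are hypothetical.
 */
#define PASTE_(a, b) a ## b
#define PASTE(a, b)  PASTE_(a, b)

/* An _AC()-like helper built on PASTE() instead of open-coding "##". */
#define AC(val, suffix) (PASTE(val, suffix))

/* Assumed values, for illustration only. */
#define APIC_DIVISOR     16
#define APIC_TDR_DIV_1   0xB
#define APIC_TDR_DIV_16  0x3

/* Derive the TDCR encoding from the single APIC_DIVISOR definition. */
static inline uint32_t tdcr_encoding(void)
{
    /* Expands to APIC_TDR_DIV_16 because APIC_DIVISOR is expanded first. */
    return PASTE(APIC_TDR_DIV_, APIC_DIVISOR);
}
```

With such a helper, changing APIC_DIVISOR in one place would also select
the matching divide-configuration constant, which is the stated goal of the
patch.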




* Re: [PATCH v3 3/4] x86/APIC: skip unnecessary parts of __setup_APIC_LVTT()
  2022-03-11 14:05   ` Roger Pau Monné
@ 2022-03-14  8:25     ` Jan Beulich
  2022-03-14  8:58       ` Roger Pau Monné
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2022-03-14  8:25 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Andrew Cooper, Wei Liu

On 11.03.2022 15:05, Roger Pau Monné wrote:
> On Mon, Feb 14, 2022 at 10:25:31AM +0100, Jan Beulich wrote:
>> In TDT mode there's no point writing TDCR or TMICT, while outside of
>> that mode there's no need for the MFENCE.
>>
>> No change intended to overall functioning.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> I've got some comments below, now that the current proposal is bad,
> but I think we could simplify a bit more.

I'm struggling with the sentence; perhaps s/now/not/ was meant?

>> --- a/xen/arch/x86/apic.c
>> +++ b/xen/arch/x86/apic.c
>> @@ -1059,24 +1059,25 @@ static void __setup_APIC_LVTT(unsigned i
>>  {
>>      unsigned int lvtt_value, tmp_value;
>>  
>> -    /* NB. Xen uses local APIC timer in one-shot mode. */
>> -    lvtt_value = /*APIC_TIMER_MODE_PERIODIC |*/ LOCAL_TIMER_VECTOR;
>> -
>>      if ( tdt_enabled )
>>      {
>> -        lvtt_value &= (~APIC_TIMER_MODE_MASK);
>> -        lvtt_value |= APIC_TIMER_MODE_TSC_DEADLINE;
>> +        lvtt_value = APIC_TIMER_MODE_TSC_DEADLINE | LOCAL_TIMER_VECTOR;
>> +        apic_write(APIC_LVTT, lvtt_value);
>> +
>> +        /*
>> +         * See Intel SDM: TSC-Deadline Mode chapter. In xAPIC mode,
>> +         * writing to the APIC LVTT and TSC_DEADLINE MSR isn't serialized.
>> +         * According to Intel, MFENCE can do the serialization here.
>> +         */
>> +        asm volatile( "mfence" : : : "memory" );
>> +
>> +        return;
>>      }
>>  
>> +    /* NB. Xen uses local APIC timer in one-shot mode. */
>> +    lvtt_value = /*APIC_TIMER_MODE_PERIODIC |*/ LOCAL_TIMER_VECTOR;
> 
> While here I wouldn't mind if you replaced the comment(s) here with
> APIC_TIMER_MODE_ONESHOT. I think that's clearer.
> 
> I wouldn't mind if you did something like:
> 
> unsigned int lvtt_value = (tdt_enabled ? APIC_TIMER_MODE_TSC_DEADLINE
>                                        : APIC_TIMER_MODE_ONESHOT) |
>                           LOCAL_TIMER_VECTOR;

I'm happy to switch to using APIC_TIMER_MODE_ONESHOT, but ...

> apic_write(APIC_LVTT, lvtt_value);
> 
> if ( tdt_enabled )
> {
>     MFENCE;
>     return;
> }

... I'd prefer to stick to just a single tdt_enabled conditional.
But then I'm also unclear about your use of "comment(s)" - what is
the (optional?) plural referring to?

Jan




* Re: [PATCH v3 4/4] x86/APIC: make connections between seemingly arbitrary numbers
  2022-03-14  8:19     ` Jan Beulich
@ 2022-03-14  8:56       ` Roger Pau Monné
  0 siblings, 0 replies; 18+ messages in thread
From: Roger Pau Monné @ 2022-03-14  8:56 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Andrew Cooper, Wei Liu

On Mon, Mar 14, 2022 at 09:19:01AM +0100, Jan Beulich wrote:
> On 11.03.2022 15:24, Roger Pau Monné wrote:
> > On Mon, Feb 14, 2022 at 10:25:57AM +0100, Jan Beulich wrote:
> >> Making adjustments to arbitrarily chosen values shouldn't require
> >> auditing the code for possible derived numbers - such a change should
> >> be doable in a single place, having an effect on all code depending on
> >> that choice.
> >>
> >> For one make the TDCR write actually use APIC_DIVISOR. With the
> >> necessary mask constant introduced, also use that in vLAPIC code. While
> >> introducing the constant, drop APIC_TDR_DIV_TMBASE: The bit has been
> >> undefined in halfway recent SDM and PM versions.
> >>
> >> And then introduce a constant tying together the scale used when
> >> converting nanoseconds to bus clocks.
> >>
> >> No functional change intended.
> >>
> >> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> > 
> > Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
> 
> Thanks.
> 
> >> ---
> >> I thought we have a generic "glue" macro, but I couldn't find one. Hence
> >> I'm (ab)using _AC().
> > 
> > I would be fine if you want to introduce something right in this
> > commit to cover those needs, using _AC is not overly nice (or
> > clear) IMO.
> 
> Hmm, I was rather hoping that you (or someone else) would point me
> at what I thought I'm overlooking. If anything I'd likely clone
> Linux'es __PASTE() (avoiding the leading underscores), but their
> placement in linux/compiler_types.h seems pretty arbitrary and
> hence not a good guideline for placement in our tree. To be honest
> the only thing that would seem halfway consistent to me would be a
> separate header, yet that seems somewhat overkill ... Or wait -
> maybe xen/lib.h could be viewed as kind of suitable. Of course
> there's then the immediate question of whether to make _AC() use
> the new macro instead of open-coding it.

I think if possible _AC should be switched to use the new macro, yes.

Thanks, Roger.



* Re: [PATCH v3 3/4] x86/APIC: skip unnecessary parts of __setup_APIC_LVTT()
  2022-03-14  8:25     ` Jan Beulich
@ 2022-03-14  8:58       ` Roger Pau Monné
  0 siblings, 0 replies; 18+ messages in thread
From: Roger Pau Monné @ 2022-03-14  8:58 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Andrew Cooper, Wei Liu

On Mon, Mar 14, 2022 at 09:25:07AM +0100, Jan Beulich wrote:
> On 11.03.2022 15:05, Roger Pau Monné wrote:
> > On Mon, Feb 14, 2022 at 10:25:31AM +0100, Jan Beulich wrote:
> >> In TDT mode there's no point writing TDCR or TMICT, while outside of
> >> that mode there's no need for the MFENCE.
> >>
> >> No change intended to overall functioning.
> >>
> >> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> > 
> > I've got some comments below, now that the current proposal is bad,
> > but I think we could simplify a bit more.
> 
> I'm struggling with the sentence; perhaps s/now/not/ was meant?

Indeed, s/now/not/ is what I meant.

> >> --- a/xen/arch/x86/apic.c
> >> +++ b/xen/arch/x86/apic.c
> >> @@ -1059,24 +1059,25 @@ static void __setup_APIC_LVTT(unsigned i
> >>  {
> >>      unsigned int lvtt_value, tmp_value;
> >>  
> >> -    /* NB. Xen uses local APIC timer in one-shot mode. */
> >> -    lvtt_value = /*APIC_TIMER_MODE_PERIODIC |*/ LOCAL_TIMER_VECTOR;
> >> -
> >>      if ( tdt_enabled )
> >>      {
> >> -        lvtt_value &= (~APIC_TIMER_MODE_MASK);
> >> -        lvtt_value |= APIC_TIMER_MODE_TSC_DEADLINE;
> >> +        lvtt_value = APIC_TIMER_MODE_TSC_DEADLINE | LOCAL_TIMER_VECTOR;
> >> +        apic_write(APIC_LVTT, lvtt_value);
> >> +
> >> +        /*
> >> +         * See Intel SDM: TSC-Deadline Mode chapter. In xAPIC mode,
> >> +         * writing to the APIC LVTT and TSC_DEADLINE MSR isn't serialized.
> >> +         * According to Intel, MFENCE can do the serialization here.
> >> +         */
> >> +        asm volatile( "mfence" : : : "memory" );
> >> +
> >> +        return;
> >>      }
> >>  
> >> +    /* NB. Xen uses local APIC timer in one-shot mode. */
> >> +    lvtt_value = /*APIC_TIMER_MODE_PERIODIC |*/ LOCAL_TIMER_VECTOR;
> > 
> > While here I wouldn't mind if you replaced the comment(s) here with
> > APIC_TIMER_MODE_ONESHOT. I think that's clearer.
> > 
> > I wouldn't mind if you did something like:
> > 
> > unsigned int lvtt_value = (tdt_enabled ? APIC_TIMER_MODE_TSC_DEADLINE
> >                                        : APIC_TIMER_MODE_ONESHOT) |
> >                           LOCAL_TIMER_VECTOR;
> 
> I'm happy to switch to using APIC_TIMER_MODE_ONESHOT, but ...
> 
> > apic_write(APIC_LVTT, lvtt_value);
> > 
> > if ( tdt_enabled )
> > {
> >     MFENCE;
> >     return;
> > }
> 
> ... I'd prefer to stick to just a single tdt_enabled conditional.
> But then I'm also unclear about your use of "comment(s)" - what is
> the (optional?) plural referring to?

I considered the switch to use APIC_TIMER_MODE_ONESHOT one comment,
while the switch to set lvtt_value only once another one.

I'm fine if you want to leave the layout as-is while using
APIC_TIMER_MODE_ONESHOT.

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.
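
As a rough sketch of the layout agreed on here (a single tdt_enabled
conditional, with APIC_TIMER_MODE_ONESHOT spelled out), the following
models the function against a simulated register file. The register
offsets, the vector value, fake_apic[], fence() and setup_lvtt() are
stand-ins assumed for the example; the real function does a
read-modify-write of TDCR and uses MFENCE, both simplified here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins; values and names are assumptions for this sketch. */
#define APIC_LVTT                     0x320
#define APIC_TMICT                    0x380
#define APIC_TDCR                     0x3E0
#define APIC_TIMER_MODE_ONESHOT       (0x0u << 17)
#define APIC_TIMER_MODE_TSC_DEADLINE  (0x2u << 17)
#define LOCAL_TIMER_VECTOR            0xf9u
#define APIC_DIVISOR                  16u
#define TDCR_DIV_16                   0x3u

static uint32_t fake_apic[0x400 / 4];   /* simulated APIC register file */
static unsigned int fence_count;        /* counts simulated fences */

static void apic_write(unsigned int reg, uint32_t val)
{
    fake_apic[reg / 4] = val;
}

/* Stands in for the MFENCE needed between LVTT and TSC_DEADLINE writes. */
static void fence(void)
{
    ++fence_count;
}

static void setup_lvtt(bool tdt_enabled, uint32_t clocks)
{
    if ( tdt_enabled )
    {
        /* TDT mode: only LVTT matters; TDCR and TMICT are left alone. */
        apic_write(APIC_LVTT,
                   APIC_TIMER_MODE_TSC_DEADLINE | LOCAL_TIMER_VECTOR);
        fence();
        return;
    }

    /* One-shot mode: no fence needed, but divisor and count are set. */
    apic_write(APIC_LVTT, APIC_TIMER_MODE_ONESHOT | LOCAL_TIMER_VECTOR);
    apic_write(APIC_TDCR, TDCR_DIV_16);
    apic_write(APIC_TMICT, clocks / APIC_DIVISOR);
}
```

Calling setup_lvtt(true, ...) touches only the LVTT slot and bumps
fence_count, while setup_lvtt(false, ...) writes all three registers and
leaves fence_count unchanged.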



* Re: [PATCH v3 2/4] x86/APIC: calibrate against platform timer when possible
  2022-03-11 13:45   ` Roger Pau Monné
@ 2022-03-14 16:19     ` Jan Beulich
  2022-03-15  9:12       ` Roger Pau Monné
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2022-03-14 16:19 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Andrew Cooper, Wei Liu

On 11.03.2022 14:45, Roger Pau Monné wrote:
> On Mon, Feb 14, 2022 at 10:25:11AM +0100, Jan Beulich wrote:
>> Use the original calibration against PIT only when the platform timer
>> is PIT. This implicitly excludes the "xen_guest" case from using the PIT
>> logic (init_pit() fails there, and as of 5e73b2594c54 ["x86/time: minor
>> adjustments to init_pit()"] using_pit also isn't being set too early
>> anymore), so the respective hack there can be dropped at the same time.
>> This also reduces calibration time from 100ms to 50ms, albeit this step
>> is being skipped as of 0731a56c7c72 ("x86/APIC: no need for timer
>> calibration when using TDT") anyway.
>>
>> While re-indenting the PIT logic in calibrate_APIC_clock(), besides
>> adjusting style also switch around the 2nd TSC/TMCCT read pair, to match
>> the order of the 1st one, yielding more consistent deltas.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> Open-coding apic_read() in read_tmcct() isn't overly nice, but I wanted
>> to avoid x2apic_enabled being evaluated twice in close succession. (The
>> barrier is there just in case only anyway: While this RDMSR isn't
>> serializing, I'm unaware of any statement whether it can also be
>> executed speculatively, like RDTSC can.) An option might be to move the
>> function to apic.c such that it would also be used by
>> calibrate_APIC_clock().
> 
> I think that would make sense. Or else it's kind of orthogonal that we
> use a barrier in calibrate_apic_timer but not in calibrate_APIC_clock.

But there is a barrier there, via rdtsc_ordered(). Thinking about
this again, I'm not even sure I'd like to use the helper in
calibrate_APIC_clock(), as there's no need to have two barriers
there.

But I guess I'll move the function in any event, so it at least
feels less like a layering violation. But I still would want to
avoid calling apic_read(), i.e. the function would remain as is
(albeit perhaps renamed as becoming non-static).

> But maybe we can get rid of the open-coded PIT calibration in
> calibrate_APIC_clock? (see below)
> 
>> --- a/xen/arch/x86/time.c
>> +++ b/xen/arch/x86/time.c
>> @@ -26,6 +26,7 @@
>>  #include <xen/symbols.h>
>>  #include <xen/keyhandler.h>
>>  #include <xen/guest_access.h>
>> +#include <asm/apic.h>
>>  #include <asm/io.h>
>>  #include <asm/iocap.h>
>>  #include <asm/msr.h>
>> @@ -1004,6 +1005,78 @@ static u64 __init init_platform_timer(vo
>>      return rc;
>>  }
>>  
>> +static uint32_t __init read_tmcct(void)
>> +{
>> +    if ( x2apic_enabled )
>> +    {
>> +        alternative("lfence", "mfence", X86_FEATURE_MFENCE_RDTSC);
>> +        return apic_rdmsr(APIC_TMCCT);
>> +    }
>> +
>> +    return apic_mem_read(APIC_TMCCT);
>> +}
>> +
>> +static uint64_t __init read_pt_and_tmcct(uint32_t *tmcct)
>> +{
>> +    uint32_t tmcct_prev = *tmcct = read_tmcct(), tmcct_min = ~0;
>> +    uint64_t best = best;
>> +    unsigned int i;
>> +
>> +    for ( i = 0; ; ++i )
>> +    {
>> +        uint64_t pt = plt_src.read_counter();
>> +        uint32_t tmcct_cur = read_tmcct();
>> +        uint32_t tmcct_delta = tmcct_prev - tmcct_cur;
>> +
>> +        if ( tmcct_delta < tmcct_min )
>> +        {
>> +            tmcct_min = tmcct_delta;
>> +            *tmcct = tmcct_cur;
>> +            best = pt;
>> +        }
>> +        else if ( i > 2 )
>> +            break;
>> +
>> +        tmcct_prev = tmcct_cur;
>> +    }
>> +
>> +    return best;
>> +}
>> +
>> +uint64_t __init calibrate_apic_timer(void)
>> +{
>> +    uint32_t start, end;
>> +    uint64_t count = read_pt_and_tmcct(&start), elapsed;
>> +    uint64_t target = CALIBRATE_VALUE(plt_src.frequency), actual;
>> +    uint64_t mask = (uint64_t)~0 >> (64 - plt_src.counter_bits);
>> +
>> +    /*
>> +     * PIT cannot be used here as it requires the timer interrupt to maintain
>> +     * its 32-bit software counter, yet here we run with IRQs disabled.
>> +     */
> 
> The reasoning in calibrate_APIC_clock to have interrupts disabled
> doesn't apply anymore I would think (interrupts are already enabled
> when we get there),

setup_boot_APIC_clock() disables IRQs before calling
calibrate_APIC_clock(). Whether the reasoning still applies is hard
to tell - I at least cannot claim I fully understand the concern.

> and hence it seems to me that calibrate_APIC_clock
> could be called with interrupts enabled and we could remove the
> open-coded usage of the PIT in calibrate_APIC_clock.

I won't exclude this might be possible, but it would mean changing
a path which is hardly ever used nowadays. While on one hand this
means hardly anyone might notice, otoh it also means possible
breakage might not be noticed until far in the future. It anyway
feels too much for a single change to also alter calibration against
PIT right here.

One thing seems quite clear though: Doing any of this with interrupts
enabled increases the chances for the read pairs to not properly
correlate, due to an interrupt happening in the middle. This alone is
a reason for me to want to keep IRQs off here.

Jan
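
To make the read-pair selection concrete, here is a sketch of the loop
from read_pt_and_tmcct() run against scripted samples instead of hardware.
struct sim, sim_read_tmcct(), sim_read_pt() and read_pair() are
hypothetical names introduced only for this example; the selection logic
is meant to mirror the quoted patch, not to replace it.

```c
#include <stddef.h>
#include <stdint.h>

/* Scripted samples standing in for the hardware reads (assumption). */
struct sim {
    const uint32_t *tmcct;  /* successive reads of the APIC down-counter */
    const uint64_t *pt;     /* successive reads of the platform timer */
    size_t tmcct_idx, pt_idx;
};

static uint32_t sim_read_tmcct(struct sim *s)
{
    return s->tmcct[s->tmcct_idx++];
}

static uint64_t sim_read_pt(struct sim *s)
{
    return s->pt[s->pt_idx++];
}

/*
 * Read (platform timer, TMCCT) pairs and keep the pair whose enclosing
 * TMCCT reads were closest together. At least four pairs are read; from
 * the fourth on, stop as soon as a pair fails to improve on the best one.
 */
static uint64_t read_pair(struct sim *s, uint32_t *tmcct)
{
    uint32_t prev = *tmcct = sim_read_tmcct(s), min = UINT32_MAX;
    uint64_t best = 0;
    unsigned int i;

    for ( i = 0; ; ++i )
    {
        uint64_t pt = sim_read_pt(s);
        uint32_t cur = sim_read_tmcct(s);
        uint32_t delta = prev - cur;    /* TMCCT counts down */

        if ( delta < min )
        {
            min = delta;
            *tmcct = cur;
            best = pt;
        }
        else if ( i > 2 )
            break;

        prev = cur;
    }

    return best;
}
```

With TMCCT samples 1000, 990, 940, 935, 925 and platform timer samples
100, 200, 300, 400, the second pair has a large gap (as an SMI or an
interrupt in the middle would cause) and is skipped, and the third pair,
with the smallest gap, is the one returned.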




* Re: [PATCH v3 2/4] x86/APIC: calibrate against platform timer when possible
  2022-03-14 16:19     ` Jan Beulich
@ 2022-03-15  9:12       ` Roger Pau Monné
  2022-03-15 10:39         ` Jan Beulich
  0 siblings, 1 reply; 18+ messages in thread
From: Roger Pau Monné @ 2022-03-15  9:12 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Andrew Cooper, Wei Liu

On Mon, Mar 14, 2022 at 05:19:37PM +0100, Jan Beulich wrote:
> On 11.03.2022 14:45, Roger Pau Monné wrote:
> > On Mon, Feb 14, 2022 at 10:25:11AM +0100, Jan Beulich wrote:
> >> Use the original calibration against PIT only when the platform timer
> >> is PIT. This implicitly excludes the "xen_guest" case from using the PIT
> >> logic (init_pit() fails there, and as of 5e73b2594c54 ["x86/time: minor
> >> adjustments to init_pit()"] using_pit also isn't being set too early
> >> anymore), so the respective hack there can be dropped at the same time.
> >> This also reduces calibration time from 100ms to 50ms, albeit this step
> >> is being skipped as of 0731a56c7c72 ("x86/APIC: no need for timer
> >> calibration when using TDT") anyway.
> >>
> >> While re-indenting the PIT logic in calibrate_APIC_clock(), besides
> >> adjusting style also switch around the 2nd TSC/TMCCT read pair, to match
> >> the order of the 1st one, yielding more consistent deltas.
> >>
> >> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> >> ---
> >> Open-coding apic_read() in read_tmcct() isn't overly nice, but I wanted
> >> to avoid x2apic_enabled being evaluated twice in close succession. (The
> >> barrier is there just in case only anyway: While this RDMSR isn't
> >> serializing, I'm unaware of any statement whether it can also be
> >> executed speculatively, like RDTSC can.) An option might be to move the
> >> function to apic.c such that it would also be used by
> >> calibrate_APIC_clock().
> > 
> > I think that would make sense. Or else it's kind of orthogonal that we
> > use a barrier in calibrate_apic_timer but not in calibrate_APIC_clock.
> 
> But there is a barrier there, via rdtsc_ordered(). Thinking about
> this again, I'm not even sure I'd like to use the helper in
> calibrate_APIC_clock(), as there's no need to have two barriers
> there.
> 
> But I guess I'll move the function in any event, so it at least
> feels less like a layering violation. But I still would want to
> avoid calling apic_read(), i.e. the function would remain as is
> (albeit perhaps renamed as becoming non-static).
> 
> > But maybe we can get rid of the open-coded PIT calibration in
> > calibrate_APIC_clock? (see below)
> > 
> >> --- a/xen/arch/x86/time.c
> >> +++ b/xen/arch/x86/time.c
> >> @@ -26,6 +26,7 @@
> >>  #include <xen/symbols.h>
> >>  #include <xen/keyhandler.h>
> >>  #include <xen/guest_access.h>
> >> +#include <asm/apic.h>
> >>  #include <asm/io.h>
> >>  #include <asm/iocap.h>
> >>  #include <asm/msr.h>
> >> @@ -1004,6 +1005,78 @@ static u64 __init init_platform_timer(vo
> >>      return rc;
> >>  }
> >>  
> >> +static uint32_t __init read_tmcct(void)
> >> +{
> >> +    if ( x2apic_enabled )
> >> +    {
> >> +        alternative("lfence", "mfence", X86_FEATURE_MFENCE_RDTSC);
> >> +        return apic_rdmsr(APIC_TMCCT);
> >> +    }
> >> +
> >> +    return apic_mem_read(APIC_TMCCT);
> >> +}
> >> +
> >> +static uint64_t __init read_pt_and_tmcct(uint32_t *tmcct)
> >> +{
> >> +    uint32_t tmcct_prev = *tmcct = read_tmcct(), tmcct_min = ~0;
> >> +    uint64_t best = best;
> >> +    unsigned int i;
> >> +
> >> +    for ( i = 0; ; ++i )
> >> +    {
> >> +        uint64_t pt = plt_src.read_counter();
> >> +        uint32_t tmcct_cur = read_tmcct();
> >> +        uint32_t tmcct_delta = tmcct_prev - tmcct_cur;
> >> +
> >> +        if ( tmcct_delta < tmcct_min )
> >> +        {
> >> +            tmcct_min = tmcct_delta;
> >> +            *tmcct = tmcct_cur;
> >> +            best = pt;
> >> +        }
> >> +        else if ( i > 2 )
> >> +            break;
> >> +
> >> +        tmcct_prev = tmcct_cur;
> >> +    }
> >> +
> >> +    return best;
> >> +}
> >> +
> >> +uint64_t __init calibrate_apic_timer(void)
> >> +{
> >> +    uint32_t start, end;
> >> +    uint64_t count = read_pt_and_tmcct(&start), elapsed;
> >> +    uint64_t target = CALIBRATE_VALUE(plt_src.frequency), actual;
> >> +    uint64_t mask = (uint64_t)~0 >> (64 - plt_src.counter_bits);
> >> +
> >> +    /*
> >> +     * PIT cannot be used here as it requires the timer interrupt to maintain
> >> +     * its 32-bit software counter, yet here we run with IRQs disabled.
> >> +     */
> > 
> > The reasoning in calibrate_APIC_clock to have interrupts disabled
> > doesn't apply anymore I would think (interrupts are already enabled
> > when we get there),
> 
> setup_boot_APIC_clock() disables IRQs before calling
> calibrate_APIC_clock(). Whether the reasoning still applies is hard
> to tell - I at least cannot claim I fully understand the concern.

Me neither, I'm not sure what will explicitly need the first
interrupt, and why further interrupts won't be fine.

Also interrupts are already enabled before calling
calibrate_APIC_clock() (as it's setup_boot_APIC_clock() that disables
them), so this whole thing about getting the first interrupt seems
very bogus and plain wrong.

> > and hence it seems to me that calibrate_APIC_clock
> > could be called with interrupts enabled and we could remove the
> > open-coded usage of the PIT in calibrate_APIC_clock.
> 
> I won't exclude this might be possible, but it would mean changing
> a path which is hardly ever used nowadays. While on one hand this
> means hardly anyone might notice, otoh it also means possible
> breakage might not be noticed until far in the future. It anyway
> feels too much for a single change to also alter calibration against
> PIT right here.

You are already changing this path by using a clocksource different
than PIT to perform the calibration.

> One thing seems quite clear though: Doing any of this with interrupts
> enabled increases the chances for the read pairs to not properly
> correlate, due to an interrupt happening in the middle. This alone is
> a reason for me to want to keep IRQs off here.

Right, TSC calibration is also done with interrupts disabled, so it
does seem correct to do the same here for APIC.

Maybe it would be cleaner to hide the specific PIT logic in
calibrate_apic_timer() so that we could remove get_8254_timer_count()
and wait_8254_wraparound() from apic.c and apic.c doesn't have any PIT
specific code anymore?

I think using channel 2 like it's used for the TSC calibration won't
be possible at this point, since it will skew read_pit_count() users?
In any case if we disable interrupts those will already be skewed
because the timer won't be rearmed until interrupts are enabled.

Thanks, Roger.



* Re: [PATCH v3 2/4] x86/APIC: calibrate against platform timer when possible
  2022-03-15  9:12       ` Roger Pau Monné
@ 2022-03-15 10:39         ` Jan Beulich
  2022-03-15 14:57           ` Roger Pau Monné
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2022-03-15 10:39 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Andrew Cooper, Wei Liu

On 15.03.2022 10:12, Roger Pau Monné wrote:
> On Mon, Mar 14, 2022 at 05:19:37PM +0100, Jan Beulich wrote:
>> One thing seems quite clear though: Doing any of this with interrupts
>> enabled increases the chances for the read pairs to not properly
>> correlate, due to an interrupt happening in the middle. This alone is
>> a reason for me to want to keep IRQs off here.
> 
> Right, TSC calibration is also done with interrupts disabled, so it
> does seem correct to do the same here for APIC.
> 
> Maybe it would be cleaner to hide the specific PIT logic in
> calibrate_apic_timer() so that we could remove get_8254_timer_count()
> and wait_8254_wraparound() from apic.c and apic.c doesn't have any PIT
> specific code anymore?

Yes, that's certainly a further cleanup step to take (saying this
without actually having tried, so there may be obstacles).

Jan

> I think using channel 2 like it's used for the TSC calibration won't
> be possible at this point, since it will skew read_pit_count() users?
> In any case if we disable interrupts those will already be skewed
> because the timer won't be rearmed until interrupts are enabled.
> 
> Thanks, Roger.
> 




* Re: [PATCH v3 2/4] x86/APIC: calibrate against platform timer when possible
  2022-03-15 10:39         ` Jan Beulich
@ 2022-03-15 14:57           ` Roger Pau Monné
  0 siblings, 0 replies; 18+ messages in thread
From: Roger Pau Monné @ 2022-03-15 14:57 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Andrew Cooper, Wei Liu

On Tue, Mar 15, 2022 at 11:39:29AM +0100, Jan Beulich wrote:
> On 15.03.2022 10:12, Roger Pau Monné wrote:
> > On Mon, Mar 14, 2022 at 05:19:37PM +0100, Jan Beulich wrote:
> >> One thing seems quite clear though: Doing any of this with interrupts
> >> enabled increases the chances for the read pairs to not properly
> >> correlate, due to an interrupt happening in the middle. This alone is
> >> a reason for me to want to keep IRQs off here.
> > 
> > Right, TSC calibration is also done with interrupts disabled, so it
> > does seem correct to do the same here for APIC.
> > 
> > Maybe it would be cleaner to hide the specific PIT logic in
> > calibrate_apic_timer() so that we could remove get_8254_timer_count()
> > and wait_8254_wraparound() from apic.c and apic.c doesn't have any PIT
> > specific code anymore?
> 
> Yes, that's certainly a further cleanup step to take (saying this
> without actually having tried, so there may be obstacles).

OK, I think you are planning to post a new version of this to avoid
open-coding apic_read() in read_tmcct()?

TBH the PIT calibration done in calibrate_APIC_clock seems fairly
bogus, as it's possible the counter wraps around more than once
between calls when running virtualized. Maybe reprogramming channel 2
would be better, as then at least wrap around would be detected
(albeit it's unclear how much delta we would have between the counter
reaching 0 and Xen realizing).

Thanks, Roger.
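
The wrap-around concern can be illustrated with a small model, assuming a
free-running 16-bit down-counter (the real PIT setup uses a programmed
reload value, so this is a simplification). pit_elapsed() and pit_after()
are hypothetical helpers for the example only.

```c
#include <stdint.h>

/* Elapsed ticks between two reads of a 16-bit down-counter, modulo 2^16. */
static uint16_t pit_elapsed(uint16_t start, uint16_t now)
{
    return (uint16_t)(start - now);
}

/* Model of what such a counter would read after 'ticks' ticks (assumption). */
static uint16_t pit_after(uint16_t start, uint64_t ticks)
{
    return (uint16_t)(start - ticks);
}
```

A single wrap is handled by the modular subtraction, but a delay of 136
ticks and one of 136 + 65536 ticks produce the same reading, so a second
wrap between two reads cannot be detected from the counter value alone.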



Thread overview: 18+ messages
2022-02-14  9:22 [PATCH v3 0/4] x86: further improve timer freq calibration accuracy Jan Beulich
2022-02-14  9:24 ` [PATCH v3 1/4] x86/time: further improve TSC / CPU " Jan Beulich
2022-03-11 12:03   ` Roger Pau Monné
2022-03-11 12:30     ` Jan Beulich
2022-02-14  9:25 ` [PATCH v3 2/4] x86/APIC: calibrate against platform timer when possible Jan Beulich
2022-03-11 13:45   ` Roger Pau Monné
2022-03-14 16:19     ` Jan Beulich
2022-03-15  9:12       ` Roger Pau Monné
2022-03-15 10:39         ` Jan Beulich
2022-03-15 14:57           ` Roger Pau Monné
2022-02-14  9:25 ` [PATCH v3 3/4] x86/APIC: skip unnecessary parts of __setup_APIC_LVTT() Jan Beulich
2022-03-11 14:05   ` Roger Pau Monné
2022-03-14  8:25     ` Jan Beulich
2022-03-14  8:58       ` Roger Pau Monné
2022-02-14  9:25 ` [PATCH v3 4/4] x86/APIC: make connections between seemingly arbitrary numbers Jan Beulich
2022-03-11 14:24   ` Roger Pau Monné
2022-03-14  8:19     ` Jan Beulich
2022-03-14  8:56       ` Roger Pau Monné
