xen-devel.lists.xenproject.org archive mirror
* [PATCH v3 0/4] x86/time: calibration rendezvous adjustments
@ 2021-02-09 12:53 Jan Beulich
  2021-02-09 12:54 ` [PATCH v3 1/4] x86/time: change initiation of the calibration timer Jan Beulich
                   ` (5 more replies)
  0 siblings, 6 replies; 9+ messages in thread
From: Jan Beulich @ 2021-02-09 12:53 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Wei Liu, Roger Pau Monné,
	Claudemir Todo Bom, Ian Jackson

The middle two patches are meant to address a regression reported on
the list under "Problems with APIC on versions 4.9 and later (4.8
works)". In the course of analyzing output from a debugging patch I
ran into another anomaly again, which I thought I should finally try
to address. Hence patch 1. Patch 4 is new in v3 and RFC for now.

While looking closely at the corresponding debugging patch's output I
noticed a suspicious drift between local and master stime: measured not
very precisely, local was behind master by about 200ms in about half an
hour. Interestingly, the recording of ->master_stime (and hence the not
exactly cheap invocation of read_platform_stime()) looks to be pretty
pointless when CONSTANT_TSC - I haven't been able to spot an actual
consumer. IOW the drift may not be a problem, and we might be able to
eliminate the platform timer reads. (When !CONSTANT_TSC, such drift
would get corrected anyway, by local_time_calibration().)

1: change initiation of the calibration timer
2: adjust time recording in time_calibration_tsc_rendezvous()
3: don't move TSC backwards in time_calibration_tsc_rendezvous()
4: re-arrange struct calibration_rendezvous

Jan



* [PATCH v3 1/4] x86/time: change initiation of the calibration timer
  2021-02-09 12:53 [PATCH v3 0/4] x86/time: calibration rendezvous adjustments Jan Beulich
@ 2021-02-09 12:54 ` Jan Beulich
  2021-02-09 12:55 ` [PATCH v3 2/4] x86/time: adjust time recording in time_calibration_tsc_rendezvous() Jan Beulich
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Jan Beulich @ 2021-02-09 12:54 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Wei Liu, Roger Pau Monné,
	Claudemir Todo Bom, Ian Jackson

Setting the timer a second (EPOCH) into the future at a random point
during boot (prior to bringing up APs and prior to launching Dom0) does
not yield predictable results: The timer may expire while we're still
bringing up APs (too early) or when Dom0 already boots (too late).
Instead invoke the timer handler function explicitly at a predictable
point in time, once we've established the rendezvous function to use
(and hence also once all APs are online). This will, through the raising
and handling of TIMER_SOFTIRQ, then also have the effect of arming the
timer.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>

--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -854,9 +854,7 @@ static void resume_platform_timer(void)
 
 static void __init reset_platform_timer(void)
 {
-    /* Deactivate any timers running */
     kill_timer(&plt_overflow_timer);
-    kill_timer(&calibration_timer);
 
     /* Reset counters and stamps */
     spin_lock_irq(&platform_timer_lock);
@@ -1956,19 +1954,13 @@ static void __init reset_percpu_time(voi
     t->stamp.master_stime = t->stamp.local_stime;
 }
 
-static void __init try_platform_timer_tail(bool late)
+static void __init try_platform_timer_tail(void)
 {
     init_timer(&plt_overflow_timer, plt_overflow, NULL, 0);
     plt_overflow(NULL);
 
     platform_timer_stamp = plt_stamp64;
     stime_platform_stamp = NOW();
-
-    if ( !late )
-        init_percpu_time();
-
-    init_timer(&calibration_timer, time_calibration, NULL, 0);
-    set_timer(&calibration_timer, NOW() + EPOCH);
 }
 
 /* Late init function, after all cpus have booted */
@@ -2009,10 +2001,13 @@ static int __init verify_tsc_reliability
             time_calibration_rendezvous_fn = time_calibration_nop_rendezvous;
 
             /* Finish platform timer switch. */
-            try_platform_timer_tail(true);
+            try_platform_timer_tail();
 
             printk("Switched to Platform timer %s TSC\n",
                    freq_string(plt_src.frequency));
+
+            time_calibration(NULL);
+
             return 0;
         }
     }
@@ -2033,6 +2028,8 @@ static int __init verify_tsc_reliability
          !boot_cpu_has(X86_FEATURE_TSC_RELIABLE) )
         time_calibration_rendezvous_fn = time_calibration_tsc_rendezvous;
 
+    time_calibration(NULL);
+
     return 0;
 }
 __initcall(verify_tsc_reliability);
@@ -2048,7 +2045,11 @@ int __init init_xen_time(void)
     do_settime(get_wallclock_time(), 0, NOW());
 
     /* Finish platform timer initialization. */
-    try_platform_timer_tail(false);
+    try_platform_timer_tail();
+
+    init_percpu_time();
+
+    init_timer(&calibration_timer, time_calibration, NULL, 0);
 
     /*
      * Setup space to track per-socket TSC_ADJUST values. Don't fiddle with




* [PATCH v3 2/4] x86/time: adjust time recording in time_calibration_tsc_rendezvous()
  2021-02-09 12:53 [PATCH v3 0/4] x86/time: calibration rendezvous adjustments Jan Beulich
  2021-02-09 12:54 ` [PATCH v3 1/4] x86/time: change initiation of the calibration timer Jan Beulich
@ 2021-02-09 12:55 ` Jan Beulich
  2021-02-09 12:55 ` [PATCH v3 3/4] x86/time: don't move TSC backwards " Jan Beulich
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Jan Beulich @ 2021-02-09 12:55 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Wei Liu, Roger Pau Monné,
	Claudemir Todo Bom, Ian Jackson

The (stime,tsc) tuple is the basis for extrapolation by get_s_time().
Therefore the two are best taken as close to one another as possible.
This means two things: First, reading platform time on the first
iteration is too early. The closest we can get is on the last
iteration, immediately before telling other CPUs to write their TSCs
(and then also writing CPU0's). While at first glance it may seem not
overly relevant when exactly platform time is read (assuming that only
stime is ever relevant anywhere, and hence the association with the
precise TSC values is of lower interest), both CPU frequency changes
and the effects of SMT make it unpredictable (between individual
rendezvous instances) how long the loop iterations will take. This in
turn leads to a higher error than necessary in how close to linear
stime movement we can get.

Second, re-reading the TSC for local recording also increases the
overall error when we already know a more precise value: the one just
written.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
---
v2: New.

--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -1662,11 +1662,12 @@ struct calibration_rendezvous {
 };
 
 static void
-time_calibration_rendezvous_tail(const struct calibration_rendezvous *r)
+time_calibration_rendezvous_tail(const struct calibration_rendezvous *r,
+                                 uint64_t tsc)
 {
     struct cpu_time_stamp *c = &this_cpu(cpu_calibration);
 
-    c->local_tsc    = rdtsc_ordered();
+    c->local_tsc    = tsc;
     c->local_stime  = get_s_time_fixed(c->local_tsc);
     c->master_stime = r->master_stime;
 
@@ -1691,11 +1692,11 @@ static void time_calibration_tsc_rendezv
             while ( atomic_read(&r->semaphore) != (total_cpus - 1) )
                 cpu_relax();
 
-            if ( r->master_stime == 0 )
-            {
-                r->master_stime = read_platform_stime(NULL);
+            if ( r->master_tsc_stamp == 0 )
                 r->master_tsc_stamp = rdtsc_ordered();
-            }
+            else if ( i == 0 )
+                r->master_stime = read_platform_stime(NULL);
+
             atomic_inc(&r->semaphore);
 
             if ( i == 0 )
@@ -1720,7 +1721,7 @@ static void time_calibration_tsc_rendezv
         }
     }
 
-    time_calibration_rendezvous_tail(r);
+    time_calibration_rendezvous_tail(r, r->master_tsc_stamp);
 }
 
 /* Ordinary rendezvous function which does not modify TSC values. */
@@ -1745,7 +1746,7 @@ static void time_calibration_std_rendezv
         smp_rmb(); /* receive signal /then/ read r->master_stime */
     }
 
-    time_calibration_rendezvous_tail(r);
+    time_calibration_rendezvous_tail(r, rdtsc_ordered());
 }
 
 /*




* [PATCH v3 3/4] x86/time: don't move TSC backwards in time_calibration_tsc_rendezvous()
  2021-02-09 12:53 [PATCH v3 0/4] x86/time: calibration rendezvous adjustments Jan Beulich
  2021-02-09 12:54 ` [PATCH v3 1/4] x86/time: change initiation of the calibration timer Jan Beulich
  2021-02-09 12:55 ` [PATCH v3 2/4] x86/time: adjust time recording in time_calibration_tsc_rendezvous() Jan Beulich
@ 2021-02-09 12:55 ` Jan Beulich
  2021-02-09 12:57 ` [PATCH RFC v3 4/4] x86/time: re-arrange struct calibration_rendezvous Jan Beulich
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Jan Beulich @ 2021-02-09 12:55 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Wei Liu, Roger Pau Monné,
	Claudemir Todo Bom, Ian Jackson

While moving the TSC backwards by small amounts may be okay, the
unconditional use of CPU0's value here has been found to be a problem
when the boot time TSC of the BSP was behind that of all APs by more
than a second. In particular because get_s_time_fixed() produces insane
output when the calculated delta is negative, we can't allow this to
happen.

On the first iteration have all other CPUs sort out the highest TSC
value any one of them has read. On the second iteration, if that
maximum is higher than CPU0's, update its recorded value from that
taken in the first iteration. Use the resulting value on the last
iteration to write everyone's TSCs.

To account for the possible discontinuity, have
time_calibration_rendezvous_tail() record the newly written value, but
extrapolate local stime using the value read.

Reported-by: Claudemir Todo Bom <claudemir@todobom.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
---
v3: Simplify cmpxchg() loop.
v2: Don't update r->master_stime by calculation. Re-base over new
    earlier patch. Make time_calibration_rendezvous_tail() take two TSC
    values.
---
Since CPU0 reads its TSC last on the first iteration, if TSCs were
perfectly sync-ed there shouldn't ever be a need to update. However,
even on the TSC-reliable system I first tested this on (using
"tsc=skewed" to get this rendezvous function into use in the first
place) updates by up to several thousand clocks did happen. I wonder
whether this points at some problem with the approach that I'm not (yet)
seeing.

Considering the sufficiently modern CPU it's using, I suspect the
reporter's system wouldn't even need to turn off TSC_RELIABLE, if only
there wasn't the boot time skew. Hence another approach might be to fix
this boot time skew. Of course to recognize whether the TSCs then still
aren't in sync we'd need to run tsc_check_reliability() sufficiently
long after that adjustment. Which is besides the need to have this
"fixing" be precise enough for the TSCs to not look skewed anymore
afterwards.

As per the comment ahead of it, the original purpose of the function was
to deal with TSCs halted in deep C states. While this probably explains
why only forward moves were ever expected, I don't see how this could
have been reliable in case CPU0 was deep-sleeping for a sufficiently
long time. My only guess here is a hidden assumption of CPU0 never being
idle for long enough. Furthermore that comment looks to be contradicting
the actual use of the function: It gets installed when !RELIABLE_TSC,
while the comment would suggest !NONSTOP_TSC. I suppose the comment is
simply misleading, because RELIABLE_TSC implies NONSTOP_TSC according to
all the places where either of the two feature bits gets played with.
Plus in the !NONSTOP_TSC case we write the TSC explicitly anyway when
coming back out of a (deep; see below) C-state.

As an implication from the above mwait_idle_cpu_init() then looks to
pointlessly clear "reliable" when "nonstop" is clear.

It further looks odd that mwait_idle() (unlike acpi_processor_idle())
calls cstate_restore_tsc() independent of what C-state was active.

--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -1658,17 +1658,17 @@ struct calibration_rendezvous {
     cpumask_t cpu_calibration_map;
     atomic_t semaphore;
     s_time_t master_stime;
-    u64 master_tsc_stamp;
+    uint64_t master_tsc_stamp, max_tsc_stamp;
 };
 
 static void
 time_calibration_rendezvous_tail(const struct calibration_rendezvous *r,
-                                 uint64_t tsc)
+                                 uint64_t old_tsc, uint64_t new_tsc)
 {
     struct cpu_time_stamp *c = &this_cpu(cpu_calibration);
 
-    c->local_tsc    = tsc;
-    c->local_stime  = get_s_time_fixed(c->local_tsc);
+    c->local_tsc    = new_tsc;
+    c->local_stime  = get_s_time_fixed(old_tsc ?: new_tsc);
     c->master_stime = r->master_stime;
 
     raise_softirq(TIME_CALIBRATE_SOFTIRQ);
@@ -1683,6 +1683,7 @@ static void time_calibration_tsc_rendezv
     int i;
     struct calibration_rendezvous *r = _r;
     unsigned int total_cpus = cpumask_weight(&r->cpu_calibration_map);
+    uint64_t tsc = 0;
 
     /* Loop to get rid of cache effects on TSC skew. */
     for ( i = 4; i >= 0; i-- )
@@ -1692,8 +1693,15 @@ static void time_calibration_tsc_rendezv
             while ( atomic_read(&r->semaphore) != (total_cpus - 1) )
                 cpu_relax();
 
-            if ( r->master_tsc_stamp == 0 )
-                r->master_tsc_stamp = rdtsc_ordered();
+            if ( tsc == 0 )
+                r->master_tsc_stamp = tsc = rdtsc_ordered();
+            else if ( r->master_tsc_stamp < r->max_tsc_stamp )
+                /*
+                 * We want to avoid moving the TSC backwards for any CPU.
+                 * Use the largest value observed anywhere on the first
+                 * iteration.
+                 */
+                r->master_tsc_stamp = r->max_tsc_stamp;
             else if ( i == 0 )
                 r->master_stime = read_platform_stime(NULL);
 
@@ -1712,6 +1720,15 @@ static void time_calibration_tsc_rendezv
             while ( atomic_read(&r->semaphore) < total_cpus )
                 cpu_relax();
 
+            if ( tsc == 0 )
+            {
+                uint64_t cur = ACCESS_ONCE(r->max_tsc_stamp);
+
+                tsc = rdtsc_ordered();
+                while ( tsc > cur )
+                    cur = cmpxchg(&r->max_tsc_stamp, cur, tsc);
+            }
+
             if ( i == 0 )
                 write_tsc(r->master_tsc_stamp);
 
@@ -1719,9 +1736,12 @@ static void time_calibration_tsc_rendezv
             while ( atomic_read(&r->semaphore) > total_cpus )
                 cpu_relax();
         }
+
+        /* Just in case a read above ended up reading zero. */
+        tsc += !tsc;
     }
 
-    time_calibration_rendezvous_tail(r, r->master_tsc_stamp);
+    time_calibration_rendezvous_tail(r, tsc, r->master_tsc_stamp);
 }
 
 /* Ordinary rendezvous function which does not modify TSC values. */
@@ -1746,7 +1766,7 @@ static void time_calibration_std_rendezv
         smp_rmb(); /* receive signal /then/ read r->master_stime */
     }
 
-    time_calibration_rendezvous_tail(r, rdtsc_ordered());
+    time_calibration_rendezvous_tail(r, 0, rdtsc_ordered());
 }
 
 /*




* [PATCH RFC v3 4/4] x86/time: re-arrange struct calibration_rendezvous
  2021-02-09 12:53 [PATCH v3 0/4] x86/time: calibration rendezvous adjustments Jan Beulich
                   ` (2 preceding siblings ...)
  2021-02-09 12:55 ` [PATCH v3 3/4] x86/time: don't move TSC backwards " Jan Beulich
@ 2021-02-09 12:57 ` Jan Beulich
  2021-02-24  9:29   ` Jan Beulich
  2021-02-09 13:02 ` [PATCH v3 0/4] x86/time: calibration rendezvous adjustments Jan Beulich
  2021-02-19 16:06 ` Ian Jackson
  5 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2021-02-09 12:57 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

To reduce latency on time_calibration_tsc_rendezvous()'s last loop
iteration, separate fields written on the last iteration enough from the
crucial field read by all CPUs on the last iteration such that they end
up in distinct cache lines. Prefetch this field on an earlier iteration.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v3: New.
---
While readability would likely suffer, we may want to consider avoiding
the prefetch on at least the first two iterations (where the field still
gets / may get written to by CPU0). Could e.g. be

    switch ( i )
    {
    case 0:
        write_tsc(r->master_tsc_stamp);
        break;
    case 1:
        prefetch(&r->master_tsc_stamp);
        break;
    }

Of course it would also be nice to avoid the pretty likely branch
misprediction on the last iteration. But with the static prediction
hints having been rather short-lived in the architecture, I don't see
any good means to do so.

--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -1655,10 +1655,20 @@ static void tsc_check_reliability(void)
  * All CPUS snapshot their local TSC and extrapolation of system time.
  */
 struct calibration_rendezvous {
-    cpumask_t cpu_calibration_map;
     atomic_t semaphore;
     s_time_t master_stime;
-    uint64_t master_tsc_stamp, max_tsc_stamp;
+    cpumask_t cpu_calibration_map;
+    /*
+     * All CPUs want to read master_tsc_stamp on the last iteration.  If
+     * cpu_calibration_map isn't large enough to push the field into a cache
+     * line different from the one used by semaphore (written by all CPUs on
+     * every iteration) and master_stime (written by CPU0 on the last
+     * iteration), align the field suitably.
+     */
+    uint64_t __aligned(BITS_TO_LONGS(NR_CPUS) * sizeof(long) +
+                       sizeof(atomic_t) + sizeof(s_time_t) < SMP_CACHE_BYTES
+                       ? SMP_CACHE_BYTES : 1) master_tsc_stamp;
+    uint64_t max_tsc_stamp;
 };
 
 static void
@@ -1709,6 +1719,8 @@ static void time_calibration_tsc_rendezv
 
             if ( i == 0 )
                 write_tsc(r->master_tsc_stamp);
+            else
+                prefetch(&r->master_tsc_stamp);
 
             while ( atomic_read(&r->semaphore) != (2*total_cpus - 1) )
                 cpu_relax();
@@ -1731,6 +1743,8 @@ static void time_calibration_tsc_rendezv
 
             if ( i == 0 )
                 write_tsc(r->master_tsc_stamp);
+            else
+                prefetch(&r->master_tsc_stamp);
 
             atomic_inc(&r->semaphore);
             while ( atomic_read(&r->semaphore) > total_cpus )




* Re: [PATCH v3 0/4] x86/time: calibration rendezvous adjustments
  2021-02-09 12:53 [PATCH v3 0/4] x86/time: calibration rendezvous adjustments Jan Beulich
                   ` (3 preceding siblings ...)
  2021-02-09 12:57 ` [PATCH RFC v3 4/4] x86/time: re-arrange struct calibration_rendezvous Jan Beulich
@ 2021-02-09 13:02 ` Jan Beulich
  2021-02-17  8:27   ` Ping: " Jan Beulich
  2021-02-19 16:06 ` Ian Jackson
  5 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2021-02-09 13:02 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné, xen-devel

On 09.02.2021 13:53, Jan Beulich wrote:
> The middle two patches are meant to address a regression reported on
> the list under "Problems with APIC on versions 4.9 and later (4.8
> works)". In the course of analyzing output from a debugging patch I
> ran into another anomaly again, which I thought I should finally try
> to address. Hence patch 1. Patch 4 is new in v3 and RFC for now.

Of course this is the kind of change I'd prefer doing early in a
release cycle. I don't think there are severe risks from patch 1, but
I'm not going to claim patches 2 and 3 are risk free. They fix booting
Xen on a system left in rather awkward state by the firmware. And
they shouldn't affect well behaved modern systems at all (due to
those using a different rendezvous function). While we've been having
this issue for years, I also consider this set a backporting
candidate. Hence I can see reasons both for and against inclusion in 4.15.

Jan

> 1: change initiation of the calibration timer
> 2: adjust time recording time_calibration_tsc_rendezvous()
> 3: don't move TSC backwards in time_calibration_tsc_rendezvous()
> 4: re-arrange struct calibration_rendezvous
> 
> Jan
> 




* Ping: [PATCH v3 0/4] x86/time: calibration rendezvous adjustments
  2021-02-09 13:02 ` [PATCH v3 0/4] x86/time: calibration rendezvous adjustments Jan Beulich
@ 2021-02-17  8:27   ` Jan Beulich
  0 siblings, 0 replies; 9+ messages in thread
From: Jan Beulich @ 2021-02-17  8:27 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné, xen-devel

Ian?

On 09.02.2021 14:02, Jan Beulich wrote:
> On 09.02.2021 13:53, Jan Beulich wrote:
>> The middle two patches are meant to address a regression reported on
>> the list under "Problems with APIC on versions 4.9 and later (4.8
>> works)". In the course of analyzing output from a debugging patch I
>> ran into another anomaly again, which I thought I should finally try
>> to address. Hence patch 1. Patch 4 is new in v3 and RFC for now.
> 
> Of course this is the kind of change I'd prefer doing early in a
> release cycle. I don't think there are severe risks from patch 1, but
> I'm not going to claim patches 2 and 3 are risk free. They fix booting
> Xen on a system left in rather awkward state by the firmware. And
> they shouldn't affect well behaved modern systems at all (due to
> those using a different rendezvous function). While we've been having
> this issue for years, I also consider this set a backporting
> candidate. Hence I can see reasons pro and con inclusion in 4.15.
> 
> Jan
> 
>> 1: change initiation of the calibration timer
>> 2: adjust time recording time_calibration_tsc_rendezvous()
>> 3: don't move TSC backwards in time_calibration_tsc_rendezvous()
>> 4: re-arrange struct calibration_rendezvous
>>
>> Jan
>>
> 




* Re: [PATCH v3 0/4] x86/time: calibration rendezvous adjustments
  2021-02-09 12:53 [PATCH v3 0/4] x86/time: calibration rendezvous adjustments Jan Beulich
                   ` (4 preceding siblings ...)
  2021-02-09 13:02 ` [PATCH v3 0/4] x86/time: calibration rendezvous adjustments Jan Beulich
@ 2021-02-19 16:06 ` Ian Jackson
  5 siblings, 0 replies; 9+ messages in thread
From: Ian Jackson @ 2021-02-19 16:06 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Andrew Cooper, Wei Liu, Roger Pau Monné,
	Claudemir Todo Bom, Ian Jackson

Jan Beulich writes ("[PATCH v3 0/4] x86/time: calibration rendezvous adjustments"):
> The middle two patches are meant to address a regression reported on
> the list under "Problems with APIC on versions 4.9 and later (4.8
> works)". In the course of analyzing output from a debugging patch I
> ran into another anomaly again, which I thought I should finally try
> to address. Hence patch 1. Patch 4 is new in v3 and RFC for now.

For patches 1-3:

Release-Acked-by: Ian Jackson <iwj@xenproject.org>

Ian.



* Re: [PATCH RFC v3 4/4] x86/time: re-arrange struct calibration_rendezvous
  2021-02-09 12:57 ` [PATCH RFC v3 4/4] x86/time: re-arrange struct calibration_rendezvous Jan Beulich
@ 2021-02-24  9:29   ` Jan Beulich
  0 siblings, 0 replies; 9+ messages in thread
From: Jan Beulich @ 2021-02-24  9:29 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

On 09.02.2021 13:57, Jan Beulich wrote:
> To reduce latency on time_calibration_tsc_rendezvous()'s last loop
> iteration, separate fields written on the last iteration enough from the
> crucial field read by all CPUs on the last iteration such that they end
> up in distinct cache lines. Prefetch this field on an earlier iteration.

I've now measured the effects of this, at least to some degree. On my
single-socket 18-core Skylake system this reduces the average loop
exit time on CPU0 (from the TSC write on the last iteration to until
after the main loop) from around 32k cycles to around 28k (albeit the
values measured on separate runs vary quite significantly).

About the same effect (maybe a little less, but within error bounds)
can be had without any re-arrangement of the struct's layout, by
simply reading r->master_tsc_stamp into a local variable at the end
of each loop iteration. I'll make v4 use this less convoluted model
instead.

Jan

> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -1655,10 +1655,20 @@ static void tsc_check_reliability(void)
>   * All CPUS snapshot their local TSC and extrapolation of system time.
>   */
>  struct calibration_rendezvous {
> -    cpumask_t cpu_calibration_map;
>      atomic_t semaphore;
>      s_time_t master_stime;
> -    uint64_t master_tsc_stamp, max_tsc_stamp;
> +    cpumask_t cpu_calibration_map;
> +    /*
> +     * All CPUs want to read master_tsc_stamp on the last iteration.  If
> +     * cpu_calibration_map isn't large enough to push the field into a cache
> +     * line different from the one used by semaphore (written by all CPUs on
> +     * every iteration) and master_stime (written by CPU0 on the last
> +     * iteration), align the field suitably.
> +     */
> +    uint64_t __aligned(BITS_TO_LONGS(NR_CPUS) * sizeof(long) +
> +                       sizeof(atomic_t) + sizeof(s_time_t) < SMP_CACHE_BYTES
> +                       ? SMP_CACHE_BYTES : 1) master_tsc_stamp;
> +    uint64_t max_tsc_stamp;
>  };
>  
>  static void
> @@ -1709,6 +1719,8 @@ static void time_calibration_tsc_rendezv
>  
>              if ( i == 0 )
>                  write_tsc(r->master_tsc_stamp);
> +            else
> +                prefetch(&r->master_tsc_stamp);
>  
>              while ( atomic_read(&r->semaphore) != (2*total_cpus - 1) )
>                  cpu_relax();
> @@ -1731,6 +1743,8 @@ static void time_calibration_tsc_rendezv
>  
>              if ( i == 0 )
>                  write_tsc(r->master_tsc_stamp);
> +            else
> +                prefetch(&r->master_tsc_stamp);
>  
>              atomic_inc(&r->semaphore);
>              while ( atomic_read(&r->semaphore) > total_cpus )
> 
> 




