xen-devel.lists.xenproject.org archive mirror
* [PATCH v4 0/3] x86/time: calibration rendezvous adjustments
@ 2021-04-01  9:53 Jan Beulich
  2021-04-01  9:54 ` [PATCH v4 1/3] x86/time: latch to-be-written TSC value early in rendezvous loop Jan Beulich
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Jan Beulich @ 2021-04-01  9:53 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

The first patch was, under a different title and with a different
approach, already part of the prior series of the same subject.
The other two patches are new, resulting from me spotting further
room for improvement (or so I hope).

1: latch to-be-written TSC value early in rendezvous loop
2: yield to hyperthreads after updating TSC during rendezvous
3: avoid reading the platform timer in rendezvous functions

Jan



* [PATCH v4 1/3] x86/time: latch to-be-written TSC value early in rendezvous loop
  2021-04-01  9:53 [PATCH v4 0/3] x86/time: calibration rendezvous adjustments Jan Beulich
@ 2021-04-01  9:54 ` Jan Beulich
  2021-04-20 15:44   ` Roger Pau Monné
  2021-04-01  9:54 ` [PATCH v4 2/3] x86/time: yield to hyperthreads after updating TSC during rendezvous Jan Beulich
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2021-04-01  9:54 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

To reduce latency on time_calibration_tsc_rendezvous()'s last loop
iteration, read the value to be written on the last iteration at the end
of the loop body (i.e. in particular at the end of the second to last
iteration).

On my single-socket 18-core Skylake system this reduces the average loop
exit time on CPU0 (from the TSC write on the last iteration until after
the main loop) from around 32k cycles to around 29k (albeit the values
measured on separate runs vary quite significantly).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v4: Different approach.
v3: New.
---
Of course it would also be nice to avoid the pretty likely branch
misprediction on the last iteration. But with the static prediction
hints having been rather short-lived in the architecture, I don't see
any good means to do so.

--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -1683,7 +1683,7 @@ static void time_calibration_tsc_rendezv
     int i;
     struct calibration_rendezvous *r = _r;
     unsigned int total_cpus = cpumask_weight(&r->cpu_calibration_map);
-    uint64_t tsc = 0;
+    uint64_t tsc = 0, master_tsc = 0;
 
     /* Loop to get rid of cache effects on TSC skew. */
     for ( i = 4; i >= 0; i-- )
@@ -1708,7 +1708,7 @@ static void time_calibration_tsc_rendezv
             atomic_inc(&r->semaphore);
 
             if ( i == 0 )
-                write_tsc(r->master_tsc_stamp);
+                write_tsc(master_tsc);
 
             while ( atomic_read(&r->semaphore) != (2*total_cpus - 1) )
                 cpu_relax();
@@ -1730,7 +1730,7 @@ static void time_calibration_tsc_rendezv
             }
 
             if ( i == 0 )
-                write_tsc(r->master_tsc_stamp);
+                write_tsc(master_tsc);
 
             atomic_inc(&r->semaphore);
             while ( atomic_read(&r->semaphore) > total_cpus )
@@ -1739,9 +1739,17 @@ static void time_calibration_tsc_rendezv
 
         /* Just in case a read above ended up reading zero. */
         tsc += !tsc;
+
+        /*
+         * To reduce latency of the TSC write on the last iteration,
+         * fetch the value to be written into a local variable. To avoid
+         * introducing yet another conditional branch (which the CPU may
+         * have difficulty predicting well) do this on all iterations.
+         */
+        master_tsc = r->master_tsc_stamp;
     }
 
-    time_calibration_rendezvous_tail(r, tsc, r->master_tsc_stamp);
+    time_calibration_rendezvous_tail(r, tsc, master_tsc);
 }
 
 /* Ordinary rendezvous function which does not modify TSC values. */




* [PATCH v4 2/3] x86/time: yield to hyperthreads after updating TSC during rendezvous
  2021-04-01  9:53 [PATCH v4 0/3] x86/time: calibration rendezvous adjustments Jan Beulich
  2021-04-01  9:54 ` [PATCH v4 1/3] x86/time: latch to-be-written TSC value early in rendezvous loop Jan Beulich
@ 2021-04-01  9:54 ` Jan Beulich
  2021-04-20 15:59   ` Roger Pau Monné
  2021-04-01  9:55 ` [PATCH v4 3/3] x86/time: avoid reading the platform timer in rendezvous functions Jan Beulich
  2021-04-15  9:54 ` Ping: [PATCH v4 0/3] x86/time: calibration rendezvous adjustments Jan Beulich
  3 siblings, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2021-04-01  9:54 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

Since we'd like the updates to be done as synchronously as possible,
make an attempt at yielding immediately after the TSC write.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v4: New.

--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -1708,7 +1708,14 @@ static void time_calibration_tsc_rendezv
             atomic_inc(&r->semaphore);
 
             if ( i == 0 )
+            {
                 write_tsc(master_tsc);
+                /*
+                 * Try to give our hyperthread(s), if any, a chance to do
+                 * the same as instantly as possible.
+                 */
+                cpu_relax();
+            }
 
             while ( atomic_read(&r->semaphore) != (2*total_cpus - 1) )
                 cpu_relax();
@@ -1730,7 +1737,14 @@ static void time_calibration_tsc_rendezv
             }
 
             if ( i == 0 )
+            {
                 write_tsc(master_tsc);
+                /*
+                 * Try to give our hyperthread(s), if any, a chance to do
+                 * the same as instantly as possible.
+                 */
+                cpu_relax();
+            }
 
             atomic_inc(&r->semaphore);
             while ( atomic_read(&r->semaphore) > total_cpus )




* [PATCH v4 3/3] x86/time: avoid reading the platform timer in rendezvous functions
  2021-04-01  9:53 [PATCH v4 0/3] x86/time: calibration rendezvous adjustments Jan Beulich
  2021-04-01  9:54 ` [PATCH v4 1/3] x86/time: latch to-be-written TSC value early in rendezvous loop Jan Beulich
  2021-04-01  9:54 ` [PATCH v4 2/3] x86/time: yield to hyperthreads after updating TSC during rendezvous Jan Beulich
@ 2021-04-01  9:55 ` Jan Beulich
  2021-04-20 16:12   ` Roger Pau Monné
  2021-04-29 12:53   ` Roger Pau Monné
  2021-04-15  9:54 ` Ping: [PATCH v4 0/3] x86/time: calibration rendezvous adjustments Jan Beulich
  3 siblings, 2 replies; 14+ messages in thread
From: Jan Beulich @ 2021-04-01  9:55 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

Reading the platform timer isn't cheap, so we'd better avoid it when the
resulting value is of no interest to anyone.

The consumer of master_stime, obtained by
time_calibration_{std,tsc}_rendezvous() and propagated through
this_cpu(cpu_calibration), is local_time_calibration(). With
CONSTANT_TSC the latter function uses an early exit path, which doesn't
explicitly use the field. While this_cpu(cpu_calibration) (including the
master_stime field) gets propagated to this_cpu(cpu_time).stamp on that
path, both structures' fields get consumed only by the !CONSTANT_TSC
logic of the function.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v4: New.
---
I realize there's some risk associated with potential new uses of the
field down the road. What would people think about compiling time.c a
2nd time into a dummy object file, with a conditional enabled to force
assuming CONSTANT_TSC, and with that conditional used to suppress
presence of the field as well as all audited uses of it (i.e. in
particular that large part of local_time_calibration())? Unexpected new
users of the field would then cause build time errors.

--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -52,6 +52,7 @@ unsigned long pit0_ticks;
 struct cpu_time_stamp {
     u64 local_tsc;
     s_time_t local_stime;
+    /* Next field unconditionally valid only when !CONSTANT_TSC. */
     s_time_t master_stime;
 };
 
@@ -1702,7 +1703,7 @@ static void time_calibration_tsc_rendezv
                  * iteration.
                  */
                 r->master_tsc_stamp = r->max_tsc_stamp;
-            else if ( i == 0 )
+            else if ( !boot_cpu_has(X86_FEATURE_CONSTANT_TSC) && i == 0 )
                 r->master_stime = read_platform_stime(NULL);
 
             atomic_inc(&r->semaphore);
@@ -1776,8 +1777,11 @@ static void time_calibration_std_rendezv
     {
         while ( atomic_read(&r->semaphore) != (total_cpus - 1) )
             cpu_relax();
-        r->master_stime = read_platform_stime(NULL);
-        smp_wmb(); /* write r->master_stime /then/ signal */
+        if ( !boot_cpu_has(X86_FEATURE_CONSTANT_TSC) )
+        {
+            r->master_stime = read_platform_stime(NULL);
+            smp_wmb(); /* write r->master_stime /then/ signal */
+        }
         atomic_inc(&r->semaphore);
     }
     else




* Ping: [PATCH v4 0/3] x86/time: calibration rendezvous adjustments
  2021-04-01  9:53 [PATCH v4 0/3] x86/time: calibration rendezvous adjustments Jan Beulich
                   ` (2 preceding siblings ...)
  2021-04-01  9:55 ` [PATCH v4 3/3] x86/time: avoid reading the platform timer in rendezvous functions Jan Beulich
@ 2021-04-15  9:54 ` Jan Beulich
  3 siblings, 0 replies; 14+ messages in thread
From: Jan Beulich @ 2021-04-15  9:54 UTC (permalink / raw)
  To: Andrew Cooper, Roger Pau Monné, Wei Liu; +Cc: xen-devel

On 01.04.2021 11:53, Jan Beulich wrote:
> The first patch was, under a different title and with a different
> approach, already part of the prior series of the same subject.
> The other two patches are new, resulting from me spotting further
> room for improvement (or so I hope).
> 
> 1: latch to-be-written TSC value early in rendezvous loop
> 2: yield to hyperthreads after updating TSC during rendezvous
> 3: avoid reading the platform timer in rendezvous functions

Ping?

Jan



* Re: [PATCH v4 1/3] x86/time: latch to-be-written TSC value early in rendezvous loop
  2021-04-01  9:54 ` [PATCH v4 1/3] x86/time: latch to-be-written TSC value early in rendezvous loop Jan Beulich
@ 2021-04-20 15:44   ` Roger Pau Monné
  0 siblings, 0 replies; 14+ messages in thread
From: Roger Pau Monné @ 2021-04-20 15:44 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Andrew Cooper, Wei Liu

On Thu, Apr 01, 2021 at 11:54:05AM +0200, Jan Beulich wrote:
> To reduce latency on time_calibration_tsc_rendezvous()'s last loop
> iteration, read the value to be written on the last iteration at the end
> of the loop body (i.e. in particular at the end of the second to last
> iteration).
> 
> On my single-socket 18-core Skylake system this reduces the average loop
> exit time on CPU0 (from the TSC write on the last iteration to until
> after the main loop) from around 32k cycles to around 29k (albeit the
> values measured on separate runs vary quite significantly).
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.



* Re: [PATCH v4 2/3] x86/time: yield to hyperthreads after updating TSC during rendezvous
  2021-04-01  9:54 ` [PATCH v4 2/3] x86/time: yield to hyperthreads after updating TSC during rendezvous Jan Beulich
@ 2021-04-20 15:59   ` Roger Pau Monné
  2021-04-21  9:57     ` Jan Beulich
  0 siblings, 1 reply; 14+ messages in thread
From: Roger Pau Monné @ 2021-04-20 15:59 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Andrew Cooper, Wei Liu

On Thu, Apr 01, 2021 at 11:54:27AM +0200, Jan Beulich wrote:
> Since we'd like the updates to be done as synchronously as possible,
> make an attempt at yielding immediately after the TSC write.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Did you observe any difference with the pause inserted?

I wonder whether that's enough to give the hyperthread a chance to
also perform the TSC write. In any case there's certainly no harm
in it.

Thanks, Roger.



* Re: [PATCH v4 3/3] x86/time: avoid reading the platform timer in rendezvous functions
  2021-04-01  9:55 ` [PATCH v4 3/3] x86/time: avoid reading the platform timer in rendezvous functions Jan Beulich
@ 2021-04-20 16:12   ` Roger Pau Monné
  2021-04-21 10:06     ` Jan Beulich
  2021-04-29 12:53   ` Roger Pau Monné
  1 sibling, 1 reply; 14+ messages in thread
From: Roger Pau Monné @ 2021-04-20 16:12 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Andrew Cooper, Wei Liu

On Thu, Apr 01, 2021 at 11:55:10AM +0200, Jan Beulich wrote:
> Reading the platform timer isn't cheap, so we'd better avoid it when the
> resulting value is of no interest to anyone.
> 
> The consumer of master_stime, obtained by
> time_calibration_{std,tsc}_rendezvous() and propagated through
> this_cpu(cpu_calibration), is local_time_calibration(). With
> CONSTANT_TSC the latter function uses an early exit path, which doesn't
> explicitly use the field. While this_cpu(cpu_calibration) (including the
> master_stime field) gets propagated to this_cpu(cpu_time).stamp on that
> path, both structures' fields get consumed only by the !CONSTANT_TSC
> logic of the function.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> v4: New.
> ---
> I realize there's some risk associated with potential new uses of the
> field down the road. What would people think about compiling time.c a
> 2nd time into a dummy object file, with a conditional enabled to force
> assuming CONSTANT_TSC, and with that conditional used to suppress
> presence of the field as well as all audited used of it (i.e. in
> particular that large part of local_time_calibration())? Unexpected new
> users of the field would then cause build time errors.

Wouldn't that add quite a lot of churn to the file itself in the form
of pre-processor conditionals?

Could we instead set master_stime to an invalid value that would make
the consumers explode somehow?

I know there might be new consumers, but those should be able to
figure out whether the value is sane by looking at the existing ones.

Also, since this is only done on the BSP on the last iteration I
wonder if it really makes such a difference performance-wise to
warrant all this trouble.

Thanks, Roger.



* Re: [PATCH v4 2/3] x86/time: yield to hyperthreads after updating TSC during rendezvous
  2021-04-20 15:59   ` Roger Pau Monné
@ 2021-04-21  9:57     ` Jan Beulich
  0 siblings, 0 replies; 14+ messages in thread
From: Jan Beulich @ 2021-04-21  9:57 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Andrew Cooper, Wei Liu

On 20.04.2021 17:59, Roger Pau Monné wrote:
> On Thu, Apr 01, 2021 at 11:54:27AM +0200, Jan Beulich wrote:
>> Since we'd like the updates to be done as synchronously as possible,
>> make an attempt at yielding immediately after the TSC write.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks.

> Did you observe any difference with the pause inserted?

I wouldn't even know how to measure it precisely enough. So - no,
I haven't. In fact ...

> I wonder whether that's enough to give a chance the hyperthread to
> also perform the TSC write. In any case there's no harm from it
> certainly.

... I have an inquiry pending with Intel as to better (more
reliable) ways to yield, for quite a bit longer than the 8259 one,
yet with the same lack of any outcome so far. Until then it really
is going to be no more than "make an attempt", as said in the
commit message.

Jan



* Re: [PATCH v4 3/3] x86/time: avoid reading the platform timer in rendezvous functions
  2021-04-20 16:12   ` Roger Pau Monné
@ 2021-04-21 10:06     ` Jan Beulich
  2021-04-29  9:32       ` Jan Beulich
  2021-04-29 12:48       ` Roger Pau Monné
  0 siblings, 2 replies; 14+ messages in thread
From: Jan Beulich @ 2021-04-21 10:06 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Andrew Cooper, Wei Liu

On 20.04.2021 18:12, Roger Pau Monné wrote:
> On Thu, Apr 01, 2021 at 11:55:10AM +0200, Jan Beulich wrote:
>> Reading the platform timer isn't cheap, so we'd better avoid it when the
>> resulting value is of no interest to anyone.
>>
>> The consumer of master_stime, obtained by
>> time_calibration_{std,tsc}_rendezvous() and propagated through
>> this_cpu(cpu_calibration), is local_time_calibration(). With
>> CONSTANT_TSC the latter function uses an early exit path, which doesn't
>> explicitly use the field. While this_cpu(cpu_calibration) (including the
>> master_stime field) gets propagated to this_cpu(cpu_time).stamp on that
>> path, both structures' fields get consumed only by the !CONSTANT_TSC
>> logic of the function.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> v4: New.
>> ---
>> I realize there's some risk associated with potential new uses of the
>> field down the road. What would people think about compiling time.c a
>> 2nd time into a dummy object file, with a conditional enabled to force
>> assuming CONSTANT_TSC, and with that conditional used to suppress
>> presence of the field as well as all audited used of it (i.e. in
>> particular that large part of local_time_calibration())? Unexpected new
>> users of the field would then cause build time errors.
> 
> Wouldn't that add quite a lot of churn to the file itself in the form
> of pre-processor conditionals?

Possibly - I didn't try yet, simply because of fearing this might
not be liked even without presenting it in patch form.

> Could we instead set master_stime to an invalid value that would make
> the consumers explode somehow?

No idea whether there is any such "reliable" value.

> I know there might be new consumers, but those should be able to
> figure whether the value is sane by looking at the existing ones.

This could be the hope, yes. But the effort of auditing the code to
confirm the potential of optimizing this (after vaguely getting the
impression there might be room) was non-negligible (in fact I did
three runs just to be really certain). This in particular means
that I'm in no way certain that looking at existing consumers would
point out the possible pitfall.

> Also, since this is only done on the BSP on the last iteration I
> wonder if it really makes such a difference performance-wise to
> warrant all this trouble.

By "all this trouble", do you mean the outlined further steps or
the patch itself? In the latter case, while it's only the BSP to
read the value, all other CPUs are waiting for the BSP to get its
part done. So the extra time it takes to read the platform clock
affects the overall duration of the rendezvous, and hence the time
not "usefully" spent by _all_ of the CPUs.

Jan



* Re: [PATCH v4 3/3] x86/time: avoid reading the platform timer in rendezvous functions
  2021-04-21 10:06     ` Jan Beulich
@ 2021-04-29  9:32       ` Jan Beulich
  2021-04-29 12:48       ` Roger Pau Monné
  1 sibling, 0 replies; 14+ messages in thread
From: Jan Beulich @ 2021-04-29  9:32 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Andrew Cooper, Wei Liu

On 21.04.2021 12:06, Jan Beulich wrote:
> On 20.04.2021 18:12, Roger Pau Monné wrote:
>> On Thu, Apr 01, 2021 at 11:55:10AM +0200, Jan Beulich wrote:
>>> Reading the platform timer isn't cheap, so we'd better avoid it when the
>>> resulting value is of no interest to anyone.
>>>
>>> The consumer of master_stime, obtained by
>>> time_calibration_{std,tsc}_rendezvous() and propagated through
>>> this_cpu(cpu_calibration), is local_time_calibration(). With
>>> CONSTANT_TSC the latter function uses an early exit path, which doesn't
>>> explicitly use the field. While this_cpu(cpu_calibration) (including the
>>> master_stime field) gets propagated to this_cpu(cpu_time).stamp on that
>>> path, both structures' fields get consumed only by the !CONSTANT_TSC
>>> logic of the function.
>>>
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>> ---
>>> v4: New.
>>> ---
>>> I realize there's some risk associated with potential new uses of the
>>> field down the road. What would people think about compiling time.c a
>>> 2nd time into a dummy object file, with a conditional enabled to force
>>> assuming CONSTANT_TSC, and with that conditional used to suppress
>>> presence of the field as well as all audited used of it (i.e. in
>>> particular that large part of local_time_calibration())? Unexpected new
>>> users of the field would then cause build time errors.
>>
>> Wouldn't that add quite a lot of churn to the file itself in the form
>> of pre-processor conditionals?
> 
> Possibly - I didn't try yet, simply because of fearing this might
> not be liked even without presenting it in patch form.
> 
>> Could we instead set master_stime to an invalid value that would make
>> the consumers explode somehow?
> 
> No idea whether there is any such "reliable" value.
> 
>> I know there might be new consumers, but those should be able to
>> figure whether the value is sane by looking at the existing ones.
> 
> This could be the hope, yes. But the effort of auditing the code to
> confirm the potential of optimizing this (after vaguely getting the
> impression there might be room) was non-negligible (in fact I did
> three runs just to be really certain). This in particular means
> that I'm in no way certain that looking at existing consumers would
> point out the possible pitfall.
> 
>> Also, since this is only done on the BSP on the last iteration I
>> wonder if it really makes such a difference performance-wise to
>> warrant all this trouble.
> 
> By "all this trouble", do you mean the outlined further steps or
> the patch itself? In the latter case, while it's only the BSP to
> read the value, all other CPUs are waiting for the BSP to get its
> part done. So the extra time it takes to read the platform clock
> affects the overall duration of the rendezvous, and hence the time
> not "usefully" spent by _all_ of the CPUs.

Ping? Your answer here has a significant effect on the disposition
of this change.

Jan



* Re: [PATCH v4 3/3] x86/time: avoid reading the platform timer in rendezvous functions
  2021-04-21 10:06     ` Jan Beulich
  2021-04-29  9:32       ` Jan Beulich
@ 2021-04-29 12:48       ` Roger Pau Monné
  1 sibling, 0 replies; 14+ messages in thread
From: Roger Pau Monné @ 2021-04-29 12:48 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Andrew Cooper, Wei Liu

On Wed, Apr 21, 2021 at 12:06:34PM +0200, Jan Beulich wrote:
> On 20.04.2021 18:12, Roger Pau Monné wrote:
> > On Thu, Apr 01, 2021 at 11:55:10AM +0200, Jan Beulich wrote:
> >> Reading the platform timer isn't cheap, so we'd better avoid it when the
> >> resulting value is of no interest to anyone.
> >>
> >> The consumer of master_stime, obtained by
> >> time_calibration_{std,tsc}_rendezvous() and propagated through
> >> this_cpu(cpu_calibration), is local_time_calibration(). With
> >> CONSTANT_TSC the latter function uses an early exit path, which doesn't
> >> explicitly use the field. While this_cpu(cpu_calibration) (including the
> >> master_stime field) gets propagated to this_cpu(cpu_time).stamp on that
> >> path, both structures' fields get consumed only by the !CONSTANT_TSC
> >> logic of the function.
> >>
> >> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> >> ---
> >> v4: New.
> >> ---
> >> I realize there's some risk associated with potential new uses of the
> >> field down the road. What would people think about compiling time.c a
> >> 2nd time into a dummy object file, with a conditional enabled to force
> >> assuming CONSTANT_TSC, and with that conditional used to suppress
> >> presence of the field as well as all audited used of it (i.e. in
> >> particular that large part of local_time_calibration())? Unexpected new
> >> users of the field would then cause build time errors.
> > 
> > Wouldn't that add quite a lot of churn to the file itself in the form
> > of pre-processor conditionals?
> 
> Possibly - I didn't try yet, simply because of fearing this might
> not be liked even without presenting it in patch form.
> 
> > Could we instead set master_stime to an invalid value that would make
> > the consumers explode somehow?
> 
> No idea whether there is any such "reliable" value.
> 
> > I know there might be new consumers, but those should be able to
> > figure whether the value is sane by looking at the existing ones.
> 
> This could be the hope, yes. But the effort of auditing the code to
> confirm the potential of optimizing this (after vaguely getting the
> impression there might be room) was non-negligible (in fact I did
> three runs just to be really certain). This in particular means
> that I'm in no way certain that looking at existing consumers would
> point out the possible pitfall.
> 
> > Also, since this is only done on the BSP on the last iteration I
> > wonder if it really makes such a difference performance-wise to
> > warrant all this trouble.
> 
> By "all this trouble", do you mean the outlined further steps or
> the patch itself?

Yes, either the further steps or the fact that we would have to be
careful not to introduce new users of master_stime that expect it to
be set when CONSTANT_TSC is true.

> In the latter case, while it's only the BSP to
> read the value, all other CPUs are waiting for the BSP to get its
> part done. So the extra time it takes to read the platform clock
> affects the overall duration of the rendezvous, and hence the time
> not "usefully" spent by _all_ of the CPUs.

Right, but that's only during the time rendezvous, which doesn't
happen that often. And I guess that just the rendezvous of all CPUs is
the biggest hit in terms of performance.

While I don't think I would have done the work myself, I guess there's
no reason to block it.

In any case I would prefer if such performance related changes come
with some proof that they do indeed make a difference, or else we
might just be making the code more complicated for no concrete
performance benefit.

Thanks, Roger.



* Re: [PATCH v4 3/3] x86/time: avoid reading the platform timer in rendezvous functions
  2021-04-01  9:55 ` [PATCH v4 3/3] x86/time: avoid reading the platform timer in rendezvous functions Jan Beulich
  2021-04-20 16:12   ` Roger Pau Monné
@ 2021-04-29 12:53   ` Roger Pau Monné
  2021-04-29 13:51     ` Jan Beulich
  1 sibling, 1 reply; 14+ messages in thread
From: Roger Pau Monné @ 2021-04-29 12:53 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Andrew Cooper, Wei Liu

On Thu, Apr 01, 2021 at 11:55:10AM +0200, Jan Beulich wrote:
> Reading the platform timer isn't cheap, so we'd better avoid it when the
> resulting value is of no interest to anyone.
> 
> The consumer of master_stime, obtained by
> time_calibration_{std,tsc}_rendezvous() and propagated through
> this_cpu(cpu_calibration), is local_time_calibration(). With
> CONSTANT_TSC the latter function uses an early exit path, which doesn't
> explicitly use the field. While this_cpu(cpu_calibration) (including the
> master_stime field) gets propagated to this_cpu(cpu_time).stamp on that
> path, both structures' fields get consumed only by the !CONSTANT_TSC
> logic of the function.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Albeit as said on my other email I would prefer performance related
changes like this one to be accompanied with some proof that they
actually make a difference, or else we risk making the code more
complicated for no concrete benefit.

> ---
> v4: New.
> ---
> I realize there's some risk associated with potential new uses of the
> field down the road. What would people think about compiling time.c a
> 2nd time into a dummy object file, with a conditional enabled to force
> assuming CONSTANT_TSC, and with that conditional used to suppress
> presence of the field as well as all audited used of it (i.e. in
> particular that large part of local_time_calibration())? Unexpected new
> users of the field would then cause build time errors.
> 
> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -52,6 +52,7 @@ unsigned long pit0_ticks;
>  struct cpu_time_stamp {
>      u64 local_tsc;
>      s_time_t local_stime;
> +    /* Next field unconditionally valid only when !CONSTANT_TSC. */

Could you also mention this is only true for the cpu_time_stamp that's
used in cpu_calibration?

For ap_bringup_ref master_stime is valid regardless of CONSTANT_TSC.

Thanks, Roger.



* Re: [PATCH v4 3/3] x86/time: avoid reading the platform timer in rendezvous functions
  2021-04-29 12:53   ` Roger Pau Monné
@ 2021-04-29 13:51     ` Jan Beulich
  0 siblings, 0 replies; 14+ messages in thread
From: Jan Beulich @ 2021-04-29 13:51 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Andrew Cooper, Wei Liu

On 29.04.2021 14:53, Roger Pau Monné wrote:
> On Thu, Apr 01, 2021 at 11:55:10AM +0200, Jan Beulich wrote:
>> Reading the platform timer isn't cheap, so we'd better avoid it when the
>> resulting value is of no interest to anyone.
>>
>> The consumer of master_stime, obtained by
>> time_calibration_{std,tsc}_rendezvous() and propagated through
>> this_cpu(cpu_calibration), is local_time_calibration(). With
>> CONSTANT_TSC the latter function uses an early exit path, which doesn't
>> explicitly use the field. While this_cpu(cpu_calibration) (including the
>> master_stime field) gets propagated to this_cpu(cpu_time).stamp on that
>> path, both structures' fields get consumed only by the !CONSTANT_TSC
>> logic of the function.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks.

> Albeit as said on my other email I would prefer performance related
> changes like this one to be accompanied with some proof that they
> actually make a difference, or else we risk making the code more
> complicated for no concrete benefit.

I'm not sure that's always sensible or useful. Removing an operation
that may take hundreds of clocks is surely not going to make things
worse performance-wise. Whether it's measurable in any way with
real world workloads is hard to predict. Micro-measurement, as
expected, shows an improvement.

>> --- a/xen/arch/x86/time.c
>> +++ b/xen/arch/x86/time.c
>> @@ -52,6 +52,7 @@ unsigned long pit0_ticks;
>>  struct cpu_time_stamp {
>>      u64 local_tsc;
>>      s_time_t local_stime;
>> +    /* Next field unconditionally valid only when !CONSTANT_TSC. */
> 
> Could you also mention this is only true for the cpu_time_stamp that's
> used in cpu_calibration?
> 
> For ap_bringup_ref master_stime is valid regardless of CONSTANT_TSC.

Well, that's precisely why I put "unconditionally" there. I'm not
convinced it's helpful to point out ap_bringup_ref in particular,
as the comment would then likely not get updated when yet another
instance appears which sets the field in all cases. If you have a
suggestion on how to word this better without mentioning particular
instances of the struct, I'll be happy to consider taking that.

Jan


