All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v12] tolerate jitter in cpu_khz calculation to avoid TSC emulation
@ 2019-03-08 19:20 Olaf Hering
  2019-03-08 19:29 ` Olaf Hering
  2019-03-11 10:16 ` Jan Beulich
  0 siblings, 2 replies; 13+ messages in thread
From: Olaf Hering @ 2019-03-08 19:20 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Olaf Hering, Wei Liu, Jan Beulich, Roger Pau Monné

Improve decision when vTSC emulation will be activated for a domU with
tsc_mode=default. The current approach is to compare the cpu_khz value
from two physical hosts. Since this value is not accurate, it can not be
used verbatim to decide if vTSC emulation needs to be enabled. Without
this change each TSC access from domU will be emulated after migration,
which causes a significant perfomance drop for workloads that make use
of rdtsc.

If a domU uses TSC as clocksoure it also must run NTP in some way to
avoid the potential drift what will most likely happen, independent of
any migration. The calculation of the drift is based on the time
returned by remote servers versus how fast the local clock advances. NTP
on Linux can handle a drift up to 500PPM. This means the local
clocksource can run up to 500us slower or faster. This calculation is
based on the TSC frequency of the host where the domU was started.

If the domU is migrated to another host of the same class, both hosts
may have a slightly different TSC frequency. The difference is small
enough and most likely within the drift range that NTP can handle.
The upper drift limit of 500PPM is 1MHz on a 2.0GHz host.

Once a domU is migrated to a host of a different class, like from
"2.3GHz" to "2.4GHz", the TSC frequency change is significant. The domU
kernel may not recalibrate itself. As a result, the drift will be larger
and will be way outside of the 500 PPM range. In addition, the kernel
may notice the change of speed in which the TSC advances and could
change the clocksource. This will impact the workload within the domU.
All this depends of course on the type of OS that is running in the
domU. This patch can do nothing for this case.

The formula to set the tolerance for this host calculates the ticks
within a timespan of 500 PPM, which is 500us. From this number the
assumed jitter in the TSC frequency measurement must be substracted
because Xen itself can not know if the estimated value in cpu_khz is at
the edge or in the middle of the range of possible freqencies. Data
collected during the incident which triggered this change showed a
jitter of up to 200 KHz across systems of the same class. The resulting
tolerance is larger than needed, and it is expected to still cover the
possible drift that NTP can handle.

To reiterate the second paragraph: if a domU uses TSC as primary clock
source, it is expected that it runs NTP to cover for the resulting
drift. Therefore this change does no need a knob to turn it on or off.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
--

v12:
 - rebase to 4deeaf2a3e
 - remove casts and trailing dot in early_time_init
 - add comments to explain how vtsc_tolerance_khz is calculated
 - adjust cast in ABS()
 - adjust comment in tsc_set_info
v11:
 - trim patch and use calculated tolerance value, no admin interaction
   required
v10:
 - rebase to ae01a8e315
 - remove changes for libxl and save/restore protocol, the feature has
   to be per host instead of per guest
 - add newline to tsc_set_info (Andrew)
 - add pointer to xen-tscmode(7) in xl.cfg(5)/vtsc_tolerance_khz (Andrew)
 - mention potential clock drift in the domU (Andrew)
 - reword the newly added paragraph in xen-tscmode(7) (Andrew),
   and also mention that it is about the measured/estimated TSC value
   rather than the real value. The latter is simply unknown.
 - use uint32 for internal representation of xen_domctl_tsc_info.vtsc_tolerance_khz
   and remove padding field
 - add math for real TSC frequency to xen-tscmode
v9:
 - extend commit msg, mention potential issues with xc_sr_rec_tsc_info._res1
v8:
 - adjust also python stream checker for added tolerance member
v7:
 - use uint16 in libxl_types.idl to match type used elsewhere in the patch
v6:
 - mention default value in xl.cfg
 - tsc_set_info: remove usage of __func__, use %d for domid
 - tsc_set_info: use ABS to calculate khz_diff
v5:
 - reduce functionality to allow setting of the tolerance value
   only at initial domU startup
v4:
 - add missing copyback in XEN_DOMCTL_set_vtsc_tolerance_khz
v3:
 - rename vtsc_khz_tolerance to vtsc_tolerance_khz
 - separate domctls to adjust values
 - more docs
 - update libxl.h
 - update python tests
 - flask check bound to tsc permissions
 - not runtime tested due to dlsym() build errors in staging
---
 xen/arch/x86/time.c | 49 ++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 46 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 9a6ea8ffcb..bef5724c05 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -43,6 +43,9 @@ static char __initdata opt_clocksource[10];
 string_param("clocksource", opt_clocksource);
 
 unsigned long __read_mostly cpu_khz;  /* CPU clock frequency in kHz. */
+#define VTSC_NTP_PPM_TOLERANCE 500UL  /* Amount of drift NTP will handle */
+#define VTSC_JITTER_RANGE_KHZ 200UL   /* Assumed jitter in cpu_khz */
+static unsigned int __read_mostly vtsc_tolerance_khz;
 DEFINE_SPINLOCK(rtc_lock);
 unsigned long pit0_ticks;
 
@@ -1885,6 +1888,27 @@ void __init early_time_init(void)
     printk("Detected %lu.%03lu MHz processor.\n", 
            cpu_khz / 1000, cpu_khz % 1000);
 
+    /*
+     * How many kHz (in other words: drift) is ntpd in domU expected to handle?
+     *    freq    tolerated freq difference
+     *  ------- = -------------------------
+     *  Million         Million + PPM      
+     */
+    tmp = 1000 * 1000;
+    tmp += VTSC_NTP_PPM_TOLERANCE;
+    tmp *= cpu_khz;
+    tmp /= 1000 * 1000;
+
+    tmp -= cpu_khz;
+
+    /*
+     * Reduce the theoretical upper limit by the assumed measuring inaccuracy.
+     */
+    if ( tmp >= VTSC_JITTER_RANGE_KHZ )
+        tmp -= VTSC_JITTER_RANGE_KHZ;
+    vtsc_tolerance_khz = tmp;
+    printk("Tolerating vtsc jitter for domUs: %u kHz\n", vtsc_tolerance_khz);
+
     setup_irq(0, 0, &irq0);
 }
 
@@ -2193,6 +2217,8 @@ int tsc_set_info(struct domain *d,
 
     switch ( tsc_mode )
     {
+        bool disable_vtsc;
+
     case TSC_MODE_DEFAULT:
     case TSC_MODE_ALWAYS_EMULATE:
         d->arch.vtsc_offset = get_s_time() - elapsed_nsec;
@@ -2201,13 +2227,30 @@ int tsc_set_info(struct domain *d,
 
         /*
          * In default mode use native TSC if the host has safe TSC and
-         * host and guest frequencies are the same (either "naturally" or
-         * - for HVM/PVH - via TSC scaling).
+         * host and guest frequencies are (almost) the same (either "naturally"
+         * or - for HVM/PVH - via TSC scaling).
          * When a guest is created, gtsc_khz is passed in as zero, making
          * d->arch.tsc_khz == cpu_khz. Thus no need to check incarnation.
          */
+        disable_vtsc = d->arch.tsc_khz == cpu_khz;
+
+        if ( tsc_mode == TSC_MODE_DEFAULT && gtsc_khz && vtsc_tolerance_khz )
+        {
+            long khz_diff;
+
+            khz_diff = ABS(((long)cpu_khz - gtsc_khz));
+            disable_vtsc = khz_diff <= vtsc_tolerance_khz;
+
+            printk(XENLOG_G_INFO "d%d: host has %lu kHz,"
+                   " domU expects %u kHz,"
+                   " difference of %ld is %s tolerance of %u\n",
+                   d->domain_id, cpu_khz, gtsc_khz, khz_diff,
+                   disable_vtsc ? "within" : "outside",
+                   vtsc_tolerance_khz);
+        }
+
         if ( tsc_mode == TSC_MODE_DEFAULT && host_tsc_is_safe() &&
-             (d->arch.tsc_khz == cpu_khz ||
+             (disable_vtsc ||
               (is_hvm_domain(d) &&
                hvm_get_tsc_scaling_ratio(d->arch.tsc_khz))) )
         {

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v12] tolerate jitter in cpu_khz calculation to avoid TSC emulation
  2019-03-08 19:20 [PATCH v12] tolerate jitter in cpu_khz calculation to avoid TSC emulation Olaf Hering
@ 2019-03-08 19:29 ` Olaf Hering
  2019-03-11 10:02   ` Jan Beulich
  2019-03-11 10:16 ` Jan Beulich
  1 sibling, 1 reply; 13+ messages in thread
From: Olaf Hering @ 2019-03-08 19:29 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monné


[-- Attachment #1.1: Type: text/plain, Size: 515 bytes --]

Am Fri,  8 Mar 2019 20:20:59 +0100
schrieb Olaf Hering <olaf@aepfle.de>:

> To reiterate the second paragraph: if a domU uses TSC as primary clock
> source, it is expected that it runs NTP to cover for the resulting
> drift. Therefore this change does no need a knob to turn it on or off.

One interesting aspect is: the xenlinux based kernels report clocksource=tsc,
but I think there was no explicit requirement to run NTP in domU.
There was even that 'independent_wallclock' knob. Why is that?

Olaf

[-- Attachment #1.2: Digitale Signatur von OpenPGP --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v12] tolerate jitter in cpu_khz calculation to avoid TSC emulation
  2019-03-08 19:29 ` Olaf Hering
@ 2019-03-11 10:02   ` Jan Beulich
  2019-03-11 10:26     ` Olaf Hering
  0 siblings, 1 reply; 13+ messages in thread
From: Jan Beulich @ 2019-03-11 10:02 UTC (permalink / raw)
  To: Olaf Hering; +Cc: Andrew Cooper, Wei Liu, xen-devel, Roger Pau Monne

>>> On 08.03.19 at 20:29, <olaf@aepfle.de> wrote:
> Am Fri,  8 Mar 2019 20:20:59 +0100
> schrieb Olaf Hering <olaf@aepfle.de>:
> 
>> To reiterate the second paragraph: if a domU uses TSC as primary clock
>> source, it is expected that it runs NTP to cover for the resulting
>> drift. Therefore this change does no need a knob to turn it on or off.
> 
> One interesting aspect is: the xenlinux based kernels report clocksource=tsc,

I don't think they do - iirc they are hardcoded to clocksource=xen.

> but I think there was no explicit requirement to run NTP in domU.

Correct, unless "independent wall clock" mode was used.

> There was even that 'independent_wallclock' knob. Why is that?

Why is what? Are you questioning the presence of the setting in
the XenoLinux kernels, or its absence in the pv-ops ones?

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v12] tolerate jitter in cpu_khz calculation to avoid TSC emulation
  2019-03-08 19:20 [PATCH v12] tolerate jitter in cpu_khz calculation to avoid TSC emulation Olaf Hering
  2019-03-08 19:29 ` Olaf Hering
@ 2019-03-11 10:16 ` Jan Beulich
  2019-03-11 10:49   ` Olaf Hering
  2019-03-12 10:03   ` Olaf Hering
  1 sibling, 2 replies; 13+ messages in thread
From: Jan Beulich @ 2019-03-11 10:16 UTC (permalink / raw)
  To: Olaf Hering; +Cc: Andrew Cooper, Wei Liu, xen-devel, Roger Pau Monne

>>> On 08.03.19 at 20:20, <olaf@aepfle.de> wrote:
> To reiterate the second paragraph: if a domU uses TSC as primary clock
> source, it is expected that it runs NTP to cover for the resulting
> drift. Therefore this change does no need a knob to turn it on or off.

Did you omit a 't' or a 'w' above? Judging from the patch I think you
mean "not", but I don't see how this follows, especially with your
subsequent reply validly stating that such a requirement did not
exist with the XenoLinux kernels. And please don't forget that
Linux is not the only possible guest OS. What is or is not a
requirement inside the guest depends not only on Xen's behavior,
but also on the OS'es. Hence uniformly (and even by default)
changing the behavior for everyone is imo not acceptable.

> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -43,6 +43,9 @@ static char __initdata opt_clocksource[10];
>  string_param("clocksource", opt_clocksource);
>  
>  unsigned long __read_mostly cpu_khz;  /* CPU clock frequency in kHz. */
> +#define VTSC_NTP_PPM_TOLERANCE 500UL  /* Amount of drift NTP will handle */

As per above, "will" is too strong here: I think this wants to be "can", and
you want to add "if enabled in the guest".

> +#define VTSC_JITTER_RANGE_KHZ 200UL   /* Assumed jitter in cpu_khz */

I'm struggling to understand the comment: Surely not every single
CPU surfaces a jitter of precisely 200kHz?

> @@ -1885,6 +1888,27 @@ void __init early_time_init(void)
>      printk("Detected %lu.%03lu MHz processor.\n", 
>             cpu_khz / 1000, cpu_khz % 1000);
>  
> +    /*
> +     * How many kHz (in other words: drift) is ntpd in domU expected to handle?

Same here - you can't expect anything. Even the name "ntpd" is already
going too far without something like "for example" attached to it. There
shouldn't be strict dependencies on guests only possibly be Linux based.

> +     *    freq    tolerated freq difference
> +     *  ------- = -------------------------
> +     *  Million         Million + PPM      
> +     */
> +    tmp = 1000 * 1000;
> +    tmp += VTSC_NTP_PPM_TOLERANCE;
> +    tmp *= cpu_khz;
> +    tmp /= 1000 * 1000;
> +
> +    tmp -= cpu_khz;
> +
> +    /*
> +     * Reduce the theoretical upper limit by the assumed measuring inaccuracy.
> +     */
> +    if ( tmp >= VTSC_JITTER_RANGE_KHZ )
> +        tmp -= VTSC_JITTER_RANGE_KHZ;

This could suggest the constant is meant to be an upper bound, but
(as said in review of prior versions) subtracting introduce a non-
linearity, _and_ it doesn't guarantee the result to be within the
assumed bounds.

> +    vtsc_tolerance_khz = tmp;
> +    printk("Tolerating vtsc jitter for domUs: %u kHz\n", vtsc_tolerance_khz);

Could you remind me again how Dom0 remains unaffected here?
And if that's indeed the case, why would that be?

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v12] tolerate jitter in cpu_khz calculation to avoid TSC emulation
  2019-03-11 10:02   ` Jan Beulich
@ 2019-03-11 10:26     ` Olaf Hering
  2019-03-11 10:42       ` Jan Beulich
  0 siblings, 1 reply; 13+ messages in thread
From: Olaf Hering @ 2019-03-11 10:26 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, xen-devel, Roger Pau Monne


[-- Attachment #1.1: Type: text/plain, Size: 1303 bytes --]

Am Mon, 11 Mar 2019 04:02:14 -0600
schrieb "Jan Beulich" <JBeulich@suse.com>:

> > One interesting aspect is: the xenlinux based kernels report clocksource=tsc,  
> I don't think they do - iirc they are hardcoded to clocksource=xen.

For HVM they do:

 # dmesg | grep -Ei '(clock|hz)'
[    0.000000] hpet clockevent registered
[    0.000000] tsc: Detected 2000.225 MHz processor
[    0.142074] smpboot: CPU0: Intel(R) Xeon(R) CPU           E5504  @ 2.00GHz (fam: 06, model: 1a, stepping: 05)
[    0.322108] hpet0: 3 comparators, 64-bit 62.500000 MHz counter
[    0.324095] Switched to clocksource hpet
[    2.436321] tsc: Refined TSC clocksource calibration: 2000.084 MHz
[    3.437123] Switched to clocksource tsc

But PV shows just that:

 # dmesg | grep -Ei '(clock|hz)'
[    0.000000] Xen reported: 2000.084 MHz processor.
[    0.316671] Switched to clocksource xen

> > There was even that 'independent_wallclock' knob. Why is that?  
> Why is what? Are you questioning the presence of the setting in
> the XenoLinux kernels, or its absence in the pv-ops ones?

I'm just curious why HVM uses tsc, but nothing states that using ntpd would be a requirement.

However, a brief search suggests that reported "clock issues" are indeed solved by running ntp in HVM.

Olaf

[-- Attachment #1.2: Digitale Signatur von OpenPGP --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v12] tolerate jitter in cpu_khz calculation to avoid TSC emulation
  2019-03-11 10:26     ` Olaf Hering
@ 2019-03-11 10:42       ` Jan Beulich
  0 siblings, 0 replies; 13+ messages in thread
From: Jan Beulich @ 2019-03-11 10:42 UTC (permalink / raw)
  To: Olaf Hering; +Cc: Andrew Cooper, Wei Liu, xen-devel, Roger Pau Monne

>>> On 11.03.19 at 11:26, <olaf@aepfle.de> wrote:
> Am Mon, 11 Mar 2019 04:02:14 -0600
> schrieb "Jan Beulich" <JBeulich@suse.com>:
> 
>> > One interesting aspect is: the xenlinux based kernels report 
> clocksource=tsc,  
>> I don't think they do - iirc they are hardcoded to clocksource=xen.
> 
> For HVM they do:

There's no such thing as a XenoLinux guest running in HVM mode. The
only HVM-related derivate from XenoLinux are the PV drivers used by
the corresponding version HVM guests. Time management inside such
HVM guests is entirely unaffected by this iirc, i.e. matches that of
plain HVM (without any PV extensions).

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v12] tolerate jitter in cpu_khz calculation to avoid TSC emulation
  2019-03-11 10:16 ` Jan Beulich
@ 2019-03-11 10:49   ` Olaf Hering
  2019-03-11 11:19     ` Jan Beulich
  2019-03-12 10:03   ` Olaf Hering
  1 sibling, 1 reply; 13+ messages in thread
From: Olaf Hering @ 2019-03-11 10:49 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, xen-devel, Roger Pau Monne


[-- Attachment #1.1: Type: text/plain, Size: 3192 bytes --]

Am Mon, 11 Mar 2019 04:16:07 -0600
schrieb "Jan Beulich" <JBeulich@suse.com>:

> >>> On 08.03.19 at 20:20, <olaf@aepfle.de> wrote:  
> > To reiterate the second paragraph: if a domU uses TSC as primary clock
> > source, it is expected that it runs NTP to cover for the resulting
> > drift. Therefore this change does no need a knob to turn it on or off.  
> Did you omit a 't' or a 'w' above? Judging from the patch I think you
> mean "not", but I don't see how this follows, especially with your
> subsequent reply validly stating that such a requirement did not
> exist with the XenoLinux kernels.

This does indeed mean 'does not need a knob'.

I do not see how the HVM domU clock can be without drift if it does not
sync with an external source. It seems xenlinux based PV drivers lack
a clocksource, so they would better run ntp. pvops provides a clocksource=xen,
but apparently with a low resolution.


> > +#define VTSC_NTP_PPM_TOLERANCE 500UL  /* Amount of drift NTP will handle */  
> As per above, "will" is too strong here: I think this wants to be "can", and
> you want to add "if enabled in the guest".

I will reword this comment.

> > +#define VTSC_JITTER_RANGE_KHZ 200UL   /* Assumed jitter in cpu_khz */  
> I'm struggling to understand the comment: Surely not every single
> CPU surfaces a jitter of precisely 200kHz?

This is the observed range of frequencies on a large number of hosts.
The frequencies are expected to be exactly like "2.4GHz", but due to
inaccurate measurement the value of "cpu_khz" is higher or lower.
The value of "200" is simple the range of observed frequencies.

I will find a better name for this variable.


> > +     *    freq    tolerated freq difference
> > +     *  ------- = -------------------------
> > +     *  Million         Million + PPM      
> > +     */
> > +    tmp = 1000 * 1000;
> > +    tmp += VTSC_NTP_PPM_TOLERANCE;
> > +    tmp *= cpu_khz;
> > +    tmp /= 1000 * 1000;
> > +
> > +    tmp -= cpu_khz;
> > +
> > +    /*
> > +     * Reduce the theoretical upper limit by the assumed measuring inaccuracy.
> > +     */
> > +    if ( tmp >= VTSC_JITTER_RANGE_KHZ )
> > +        tmp -= VTSC_JITTER_RANGE_KHZ;  
> This could suggest the constant is meant to be an upper bound, but
> (as said in review of prior versions) subtracting introduce a non-
> linearity, _and_ it doesn't guarantee the result to be within the
> assumed bounds.

The upper bound is the PPM value. ntpd in Linux can handle up to 500.
What is unclear about this formula?
First PPM is converted into "khz", on a 2GHz system tmp will become 1000.
Then the inaccuracy has to be handled, Xen can not know if cpu_khz is correct.
As said above, the observed range was 200, so this will be subtracted.

> > +    vtsc_tolerance_khz = tmp;
> > +    printk("Tolerating vtsc jitter for domUs: %u kHz\n", vtsc_tolerance_khz);  
> Could you remind me again how Dom0 remains unaffected here?
> And if that's indeed the case, why would that be?

A dom0 does not move from one host to another.
And it better synchronizes with an external clock if domUs are supposed to be migrated in the future.

Olaf

[-- Attachment #1.2: Digitale Signatur von OpenPGP --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v12] tolerate jitter in cpu_khz calculation to avoid TSC emulation
  2019-03-11 10:49   ` Olaf Hering
@ 2019-03-11 11:19     ` Jan Beulich
  2019-03-11 11:57       ` Olaf Hering
  2019-03-11 12:09       ` Olaf Hering
  0 siblings, 2 replies; 13+ messages in thread
From: Jan Beulich @ 2019-03-11 11:19 UTC (permalink / raw)
  To: Olaf Hering; +Cc: Andrew Cooper, Wei Liu, xen-devel, Roger Pau Monne

>>> On 11.03.19 at 11:49, <olaf@aepfle.de> wrote:
> Am Mon, 11 Mar 2019 04:16:07 -0600
> schrieb "Jan Beulich" <JBeulich@suse.com>:
> 
>> >>> On 08.03.19 at 20:20, <olaf@aepfle.de> wrote:  
>> > To reiterate the second paragraph: if a domU uses TSC as primary clock
>> > source, it is expected that it runs NTP to cover for the resulting
>> > drift. Therefore this change does no need a knob to turn it on or off.  
>> Did you omit a 't' or a 'w' above? Judging from the patch I think you
>> mean "not", but I don't see how this follows, especially with your
>> subsequent reply validly stating that such a requirement did not
>> exist with the XenoLinux kernels.
> 
> This does indeed mean 'does not need a knob'.
> 
> I do not see how the HVM domU clock can be without drift if it does not
> sync with an external source. It seems xenlinux based PV drivers lack
> a clocksource, so they would better run ntp.

Yes indeed. But it still cannot be a requirement. There may be people
wanting to fully isolate their systems.

Plus - is your change somehow limited to HVM guests? I can't seem to
spot why that would be. And if it isn't, then leaving PV guests out of
the discussion is simply wrong.

Also I'm having trouble seeing how the connection to "drift" has come
up all of the sudden. Your change is to deal with singular events
(migration) aiui.

>> > +     *    freq    tolerated freq difference
>> > +     *  ------- = -------------------------
>> > +     *  Million         Million + PPM      
>> > +     */
>> > +    tmp = 1000 * 1000;
>> > +    tmp += VTSC_NTP_PPM_TOLERANCE;
>> > +    tmp *= cpu_khz;
>> > +    tmp /= 1000 * 1000;
>> > +
>> > +    tmp -= cpu_khz;
>> > +
>> > +    /*
>> > +     * Reduce the theoretical upper limit by the assumed measuring inaccuracy.
>> > +     */
>> > +    if ( tmp >= VTSC_JITTER_RANGE_KHZ )
>> > +        tmp -= VTSC_JITTER_RANGE_KHZ;  
>> This could suggest the constant is meant to be an upper bound, but
>> (as said in review of prior versions) subtracting introduce a non-
>> linearity, _and_ it doesn't guarantee the result to be within the
>> assumed bounds.
> 
> The upper bound is the PPM value. ntpd in Linux can handle up to 500.
> What is unclear about this formula?
> First PPM is converted into "khz", on a 2GHz system tmp will become 1000.

I didn't question this part (except again for its Linux connection, which
you point out in your reply).

> Then the inaccuracy has to be handled, Xen can not know if cpu_khz is correct.
> As said above, the observed range was 200, so this will be subtracted.

This is what I consider wrong, because it results in

   tmp (initial)	|   tmp (result)
   198		|   198
   199		|   199
   200		|   0
   201		|   1
   ...
   398		|   198
   399		|   199
   400		|   0
   401		|   1

I'd expect you to _cap_ at 200 instead. But of course the seemingly
random 200 will itself need a much better reasoning, or at least a
clear indication of the data base (number of different systems) that
it was derived from. "Large number of hosts", after all, may mean 12
to you and tens of thousands to me.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v12] tolerate jitter in cpu_khz calculation to avoid TSC emulation
  2019-03-11 11:19     ` Jan Beulich
@ 2019-03-11 11:57       ` Olaf Hering
  2019-03-11 14:07         ` Jan Beulich
  2019-03-11 12:09       ` Olaf Hering
  1 sibling, 1 reply; 13+ messages in thread
From: Olaf Hering @ 2019-03-11 11:57 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, xen-devel, Roger Pau Monne


[-- Attachment #1.1: Type: text/plain, Size: 2431 bytes --]

Am Mon, 11 Mar 2019 05:19:34 -0600
schrieb "Jan Beulich" <JBeulich@suse.com>:

> >>> On 11.03.19 at 11:49, <olaf@aepfle.de> wrote:  
> > I do not see how the HVM domU clock can be without drift if it does not
> > sync with an external source. It seems xenlinux based PV drivers lack
> > a clocksource, so they would better run ntp.  
> Yes indeed. But it still cannot be a requirement. There may be people
> wanting to fully isolate their systems.

They are free to isolate them. Still, there has to be a source to keep
time in sync. One of the involved dom0s can be declared as reference.
I'm not seeing how this change does affect the time within such domUs
in a significant way. Time will move forward at slightly different speed
when running on both hosts.

> Plus - is your change somehow limited to HVM guests? I can't seem to
> spot why that would be. And if it isn't, then leaving PV guests out of
> the discussion is simply wrong.

tsc_set_info exits early for non-HVM, so I think PV just does not have vtsc?

> Also I'm having trouble seeing how the connection to "drift" has come
> up all of the sudden. Your change is to deal with singular events
> (migration) aiui.

"drift" as in "TSC runs at different speed than reference clock"?
It is the migration/restore that enables emulation of TSC access.

> > Then the inaccuracy has to be handled, Xen can not know if cpu_khz is correct.
> > As said above, the observed range was 200, so this will be subtracted.  
> 
> This is what I consider wrong, because it results in
> 
>    tmp (initial)	|   tmp (result)
>    198		|   198
>    199		|   199
>    200		|   0
>    201		|   1
>    ...
>    398		|   198
>    399		|   199
>    400		|   0
>    401		|   1

Why does it become 0 here?

 
> I'd expect you to _cap_ at 200 instead. But of course the seemingly
> random 200 will itself need a much better reasoning, or at least a
> clear indication of the data base (number of different systems) that
> it was derived from. "Large number of hosts", after all, may mean 12
> to you and tens of thousands to me.

The formula reduces the calculated number by a constant. tmp would reach
zero on a 400MHz host. Assuming that still exists, the worst case is emulation
of TSC access. If the check would be (tmp > 200), tmp would become 1 khz
of difference. I think either '>' or '>=' would be fine on such system.


Olaf

[-- Attachment #1.2: Digitale Signatur von OpenPGP --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v12] tolerate jitter in cpu_khz calculation to avoid TSC emulation
  2019-03-11 11:19     ` Jan Beulich
  2019-03-11 11:57       ` Olaf Hering
@ 2019-03-11 12:09       ` Olaf Hering
  2019-03-11 14:11         ` Jan Beulich
  1 sibling, 1 reply; 13+ messages in thread
From: Olaf Hering @ 2019-03-11 12:09 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, xen-devel, Roger Pau Monne


[-- Attachment #1.1: Type: text/plain, Size: 586 bytes --]

Am Mon, 11 Mar 2019 05:19:34 -0600
schrieb "Jan Beulich" <JBeulich@suse.com>:

> But of course the seemingly
> random 200 will itself need a much better reasoning, or at least a
> clear indication of the data base (number of different systems) that
> it was derived from. "Large number of hosts", after all, may mean 12
> to you and tens of thousands to me.

How many hosts would be relevant?
My own test hosts calibrate itself in a range of maybe 30 khz.
I gave up to track the exact numbers after every reboot.

If it helps, I can gather 'cpu_khz' once again.


Olaf

[-- Attachment #1.2: Digitale Signatur von OpenPGP --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v12] tolerate jitter in cpu_khz calculation to avoid TSC emulation
  2019-03-11 11:57       ` Olaf Hering
@ 2019-03-11 14:07         ` Jan Beulich
  0 siblings, 0 replies; 13+ messages in thread
From: Jan Beulich @ 2019-03-11 14:07 UTC (permalink / raw)
  To: Olaf Hering; +Cc: Andrew Cooper, Wei Liu, xen-devel, Roger Pau Monne

>>> On 11.03.19 at 12:57, <olaf@aepfle.de> wrote:
> Am Mon, 11 Mar 2019 05:19:34 -0600
> schrieb "Jan Beulich" <JBeulich@suse.com>:
> 
>> Plus - is your change somehow limited to HVM guests? I can't seem to
>> spot why that would be. And if it isn't, then leaving PV guests out of
>> the discussion is simply wrong.
> 
> tsc_set_info exits early for non-HVM, so I think PV just does not have vtsc?

It does exit early, but only for PV hwdom afaics.

>> Also I'm having trouble seeing how the connection to "drift" has come
>> up all of the sudden. Your change is to deal with singular events
>> (migration) aiui.
> 
> "drift" as in "TSC runs at different speed than reference clock"?
> It is the migration/restore that enables emulation of TSC access.

Well, "drift" to me means something that changes slowly. The
change during migration is a single discontinuity though, even when
capped at 200MHz.

>> > Then the inaccuracy has to be handled, Xen can not know if cpu_khz is correct.
>> > As said above, the observed range was 200, so this will be subtracted.  
>> 
>> This is what I consider wrong, because it results in
>> 
>>    tmp (initial)	|   tmp (result)
>>    198		|   198
>>    199		|   199
>>    200		|   0
>>    201		|   1
>>    ...
>>    398		|   198
>>    399		|   199
>>    400		|   0
>>    401		|   1
> 
> Why does it become 0 here?

Oops, this second instance should have had 200 and 201 respectively.

>> I'd expect you to _cap_ at 200 instead. But of course the seemingly
>> random 200 will itself need a much better reasoning, or at least a
>> clear indication of the data base (number of different systems) that
>> it was derived from. "Large number of hosts", after all, may mean 12
>> to you and tens of thousands to me.
> 
> The formula reduces the calculated number by a constant. tmp would reach
> zero on a 400MHz host. Assuming that still exists, the worst case is emulation
> of TSC access. If the check would be (tmp > 200), tmp would become 1 khz
> of difference. I think either '>' or '>=' would be fine on such system.

The question is not whether to use > or >= ; the question is how
the step from 199 to 200 (or from 200 to 201) can sensibly be the
way you've coded it. But perhaps I'm simply missing something here.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v12] tolerate jitter in cpu_khz calculation to avoid TSC emulation
  2019-03-11 12:09       ` Olaf Hering
@ 2019-03-11 14:11         ` Jan Beulich
  0 siblings, 0 replies; 13+ messages in thread
From: Jan Beulich @ 2019-03-11 14:11 UTC (permalink / raw)
  To: Olaf Hering; +Cc: Andrew Cooper, Wei Liu, xen-devel, Roger Pau Monne

>>> On 11.03.19 at 13:09, <olaf@aepfle.de> wrote:
> Am Mon, 11 Mar 2019 05:19:34 -0600
> schrieb "Jan Beulich" <JBeulich@suse.com>:
> 
>> But of course the seemingly
>> random 200 will itself need a much better reasoning, or at least a
>> clear indication of the data base (number of different systems) that
>> it was derived from. "Large number of hosts", after all, may mean 12
>> to you and tens of thousands to me.
> 
> How many hosts would be relevant?

I don't know. More than dozens but less than tens of thousands,
I'd say.

> My own test hosts calibrate itself in a range of maybe 30 khz.
> I gave up to track the exact numbers after every reboot.

The variance of the calibration result says nothing about the
absolute error, nor about any possible drift. It's only an indication
of the quality of the calibration logic _plus_ the stability of the
base frequency.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v12] tolerate jitter in cpu_khz calculation to avoid TSC emulation
  2019-03-11 10:16 ` Jan Beulich
  2019-03-11 10:49   ` Olaf Hering
@ 2019-03-12 10:03   ` Olaf Hering
  1 sibling, 0 replies; 13+ messages in thread
From: Olaf Hering @ 2019-03-12 10:03 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, xen-devel, Roger Pau Monne


[-- Attachment #1.1: Type: text/plain, Size: 1928 bytes --]

Am Mon, 11 Mar 2019 04:16:07 -0600
schrieb "Jan Beulich" <JBeulich@suse.com>:

> >>> On 08.03.19 at 20:20, <olaf@aepfle.de> wrote:  
> > To reiterate the second paragraph: if a domU uses TSC as primary clock
> > source, it is expected that it runs NTP to cover for the resulting
> > drift. Therefore this change does no need a knob to turn it on or off.  
> 
> Did you omit a 't' or a 'w' above? Judging from the patch I think you
> mean "not", but I don't see how this follows, especially with your
> subsequent reply validly stating that such a requirement did not
> exist with the XenoLinux kernels. And please don't forget that
> Linux is not the only possible guest OS. What is or is not a
> requirement inside the guest depends not only on Xen's behavior,
> but also on the OS'es. Hence uniformly (and even by default)
> changing the behavior for everyone is imo not acceptable.

If I interpret the various docs, which are available online, correctly
there is always a recommendation to sync with an external clocksource,
IF the system clock has to be somewhat accurate.
For example the XenServer docs mentions that for Windows, PV and HVM.
If that external source is a pvclock driver, then it just means the
syncing has to be done by dom0. I have not verified it, but I think
NTP in dom0 will sync Xen, which in turn allows pvclock to sync domU.

The change exchanges one sort of inaccuracy with another. In other words,
it was broken before and will remain broken after migration/restore.

> > +#define VTSC_JITTER_RANGE_KHZ 200UL   /* Assumed jitter in cpu_khz */  
> I'm struggling to understand the comment: Surely not every single
> CPU surfaces a jitter of precisely 200kHz?

I think that variable describes something like 'range' or 'dispersion'
of measurement results, of something that can not be measured accurately.
So VTSC_MEASUREMENT_INACCURACY_RANGE_KHZ may work.

Olaf

[-- Attachment #1.2: Digitale Signatur von OpenPGP --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2019-03-12 10:04 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-08 19:20 [PATCH v12] tolerate jitter in cpu_khz calculation to avoid TSC emulation Olaf Hering
2019-03-08 19:29 ` Olaf Hering
2019-03-11 10:02   ` Jan Beulich
2019-03-11 10:26     ` Olaf Hering
2019-03-11 10:42       ` Jan Beulich
2019-03-11 10:16 ` Jan Beulich
2019-03-11 10:49   ` Olaf Hering
2019-03-11 11:19     ` Jan Beulich
2019-03-11 11:57       ` Olaf Hering
2019-03-11 14:07         ` Jan Beulich
2019-03-11 12:09       ` Olaf Hering
2019-03-11 14:11         ` Jan Beulich
2019-03-12 10:03   ` Olaf Hering

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.