[Xen-devel] [PATCH 0/1] Avoiding RDTSC emulation due to host clock drift

All of lore.kernel.org
 help / color / mirror / Atom feed

* [Xen-devel] [PATCH 0/1] Avoiding RDTSC emulation due to host clock drift
@ 2019-09-02 18:27 Edwin Török
  2019-09-02 18:27 ` [Xen-devel] [PATCH 1/1] x86/arch: VM resume: avoid " Edwin Török
  0 siblings, 1 reply; 4+ messages in thread
From: Edwin Török @ 2019-09-02 18:27 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Wei Liu, Edwin Török, Jan Beulich,
	Roger Pau Monné

I noticed that RDTSC emulation got turned on for a VM after a
suspend/host-reboot/resume cycle.
Xen currently expects an exact match between host CPU and saved guest CPU
frequency in KHz, otherwise it turns on RDTSC emulation if the CPU doesn't
support TSC scaling.

An exact match would require ~0.4 ppm accuracy, and even on physical hardware
the platform timer used for calibration is not that accurate.  The best
accuracy I could find that datasheets/specifications require is 100 ppm, so let
Xen accept a 100 ppm difference in clock frequency as "the same" and do not
turn on RDTSC emulation due to that.

So far I have manually tested this on Intel(R) Xeon(R) CPU E5-2697 v3 and a
Debian 9 guest, more tests pending.

See the commit for more details.

Edwin Török (1):
  x86/arch: VM resume: avoid RDTSC emulation due to host clock drift

 xen/arch/x86/time.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

-- 
2.20.1

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Xen-devel] [PATCH 1/1] x86/arch: VM resume: avoid RDTSC emulation due to host clock drift
  2019-09-02 18:27 [Xen-devel] [PATCH 0/1] Avoiding RDTSC emulation due to host clock drift Edwin Török
@ 2019-09-02 18:27 ` Edwin Török
  2019-09-03  7:54   ` Jan Beulich
  0 siblings, 1 reply; 4+ messages in thread
From: Edwin Török @ 2019-09-02 18:27 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Wei Liu, Edwin Török, Jan Beulich,
	Roger Pau Monné

On a Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz the host frequency drifts:
```
(XEN) [    6.607693] Detected 2600.004 MHz processor.
(XEN) [ 2674.213081] dom1(hvm): mode=0,ofs=0xfffee6f70b7faa48,khz=2600018,inc=3
(XEN) [ 2674.213087] dom2(hvm): mode=0,ofs=0xfffee6fd499835c0,khz=2600018,inc=2
```

The 2 domains were suspended prior to rebooting the host and applying a
xen/microcode patch. After the reboot the frequency of the host was deemed to
be slightly different, and therefore switching on RDTSC emulation for the Linux
HVM guest, even though the difference was only 5 ppm. This CPU doesn't support
TSC scaling.

Therefore we should either measure the standard deviation of our calibration
and have a range of acceptable frequencies as "same", or have a static
tolerance value.

The platform timer's clock frequency accuracy is:
* IA-PC HPET Specification 1.0a sections 2.2 and 2.4.1: 500 ppm or better
* ACPI PM timer, and PIT timer do not have defined accuracies
* Intel 300 Series datasheet section 25.6: 24 MHz crystal 100 ppm or better
* NTP FAQ section 3.3 Clock Quality: 11 ppm drift due to temperature
* section 2.2.2 claims that PIT/ACPI PM timer share the same crystal as HPET and
thus 500 ppm as an upper bound, "the real drift is usually smaller than 30ppm"

For simplicity and determinism opted for a static tolerance value of 100 ppm
here, such that the any error would be well within the error you would get with
HPET/Linux's calibration. NTP can cope with a drift < 500 ppm.
Most importantly this should stop Xen from claiming that the clock frequency on
the same host is different across reboots. Specifications do not currently
mandate an accuracy higher than 100 ppm, therefore OSes should already be able
to cope with such drift on real hardware. Any improvements in accuracy from
future specifications/motherboards wouldn't be applicable, because they would
also come with newer CPUs that support TSC scaling.

If the CPU does support TSC scaling Xen will of course still attempt to match
the exact frequency value it thinks the guest had when it was suspended.
See below for `if ( hvm_tsc_scaling_supported && !d->arch.vtsc )` (not visible
in patch context).

llabs() doesn't appear to be available when building xen, hence the 2 comparisons.

After this patch when suspending a VM, and rebooting the host I get this output:
```
(XEN) [    6.614703] Detected 2600.010 MHz processor.
(XEN) [  138.924342] TSC marked as reliable, warp = 0 (count=2)
(XEN) [  138.924346] dom1(hvm): mode=0,ofs=0xfffed01901016d18,khz=2600012,inc=2
```

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
---
 xen/arch/x86/time.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 9a6ea8ffcb..a0b99f5fff 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -2171,6 +2171,12 @@ void tsc_get_info(struct domain *d, uint32_t *tsc_mode,
         *elapsed_nsec = 0;
 }

+static inline int frequency_same_with_tolerance(int64_t khz1, int64_t khz2)
+{
+    int64_t ppm = (khz2 - khz1) * 1000000 / khz2;
+    return -100 < ppm && ppm < 100;
+}
+
 /*
  * This may be called as many as three times for a domain, once when the
  * hypervisor creates the domain, once when the toolstack creates the
@@ -2207,7 +2213,7 @@ int tsc_set_info(struct domain *d,
          * d->arch.tsc_khz == cpu_khz. Thus no need to check incarnation.
          */
         if ( tsc_mode == TSC_MODE_DEFAULT && host_tsc_is_safe() &&
-             (d->arch.tsc_khz == cpu_khz ||
+             (frequency_same_with_tolerance(d->arch.tsc_khz, cpu_khz) ||
               (is_hvm_domain(d) &&
                hvm_get_tsc_scaling_ratio(d->arch.tsc_khz))) )
         {
-- 
2.20.1

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [Xen-devel] [PATCH 1/1] x86/arch: VM resume: avoid RDTSC emulation due to host clock drift
  2019-09-02 18:27 ` [Xen-devel] [PATCH 1/1] x86/arch: VM resume: avoid " Edwin Török
@ 2019-09-03  7:54   ` Jan Beulich
  2019-09-03  9:26     ` Edwin Török
  0 siblings, 1 reply; 4+ messages in thread
From: Jan Beulich @ 2019-09-03  7:54 UTC (permalink / raw)
  To: Edwin Török
  Cc: xen-devel, Roger Pau Monné, Wei Liu, Andrew Cooper

On 02.09.2019 20:27, Edwin Török  wrote:
> On a Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz the host frequency drifts:
> ```
> (XEN) [    6.607693] Detected 2600.004 MHz processor.
> (XEN) [ 2674.213081] dom1(hvm): mode=0,ofs=0xfffee6f70b7faa48,khz=2600018,inc=3
> (XEN) [ 2674.213087] dom2(hvm): mode=0,ofs=0xfffee6fd499835c0,khz=2600018,inc=2
> ```
> 
> The 2 domains were suspended prior to rebooting the host and applying a
> xen/microcode patch. After the reboot the frequency of the host was deemed to
> be slightly different, and therefore switching on RDTSC emulation for the Linux
> HVM guest, even though the difference was only 5 ppm. This CPU doesn't support
> TSC scaling.
> 
> Therefore we should either measure the standard deviation of our calibration
> and have a range of acceptable frequencies as "same", or have a static
> tolerance value.
> 
> The platform timer's clock frequency accuracy is:
> * IA-PC HPET Specification 1.0a sections 2.2 and 2.4.1: 500 ppm or better
> * ACPI PM timer, and PIT timer do not have defined accuracies
> * Intel 300 Series datasheet section 25.6: 24 MHz crystal 100 ppm or better
> * NTP FAQ section 3.3 Clock Quality: 11 ppm drift due to temperature
> * section 2.2.2 claims that PIT/ACPI PM timer share the same crystal as HPET and
> thus 500 ppm as an upper bound, "the real drift is usually smaller than 30ppm"
> 
> For simplicity and determinism opted for a static tolerance value of 100 ppm
> here, such that the any error would be well within the error you would get with
> HPET/Linux's calibration. NTP can cope with a drift < 500 ppm.
> Most importantly this should stop Xen from claiming that the clock frequency on
> the same host is different across reboots. Specifications do not currently
> mandate an accuracy higher than 100 ppm, therefore OSes should already be able
> to cope with such drift on real hardware. Any improvements in accuracy from
> future specifications/motherboards wouldn't be applicable, because they would
> also come with newer CPUs that support TSC scaling.
> 
> If the CPU does support TSC scaling Xen will of course still attempt to match
> the exact frequency value it thinks the guest had when it was suspended.
> See below for `if ( hvm_tsc_scaling_supported && !d->arch.vtsc )` (not visible
> in patch context).
> 
> llabs() doesn't appear to be available when building xen, hence the 2 comparisons.
> 
> After this patch when suspending a VM, and rebooting the host I get this output:
> ```
> (XEN) [    6.614703] Detected 2600.010 MHz processor.
> (XEN) [  138.924342] TSC marked as reliable, warp = 0 (count=2)
> (XEN) [  138.924346] dom1(hvm): mode=0,ofs=0xfffed01901016d18,khz=2600012,inc=2
> ```
> 
> Signed-off-by: Edwin Török <edvin.torok@citrix.com>

First of all - are you aware that there had been multiple iterations
of a patch (by Olaf Hering) making this a command line and/or guest
config controllable setting? If so, it would have been nice if at
least the cover letter identified the correlation. If not, please
take a look. After all it hasn't gone in so far because of objections
by Andrew.

Using a hardcoded tolerance value in any event raises the question
of how you know whether a particular guest OS can actually cope.

> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -2171,6 +2171,12 @@ void tsc_get_info(struct domain *d, uint32_t *tsc_mode,
>          *elapsed_nsec = 0;
>  }
>  
> +static inline int frequency_same_with_tolerance(int64_t khz1, int64_t khz2)

The return type wants to be bool. And there wants to be an explaining
comment ahead of the function, (re-)stating some of what you have in
the description.

> +{
> +    int64_t ppm = (khz2 - khz1) * 1000000 / khz2;
> +    return -100 < ppm && ppm < 100;

While we have no llabs(), you should imo use either ABS() or
__builtin_labs() / __builtin_llabs().

Furthermore, could we limit this behavior to the case of there not
being TSC scaling available (due to running on old hardware, or due
to it being a PV guest)?

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Xen-devel] [PATCH 1/1] x86/arch: VM resume: avoid RDTSC emulation due to host clock drift
  2019-09-03  7:54   ` Jan Beulich
@ 2019-09-03  9:26     ` Edwin Török
  0 siblings, 0 replies; 4+ messages in thread
From: Edwin Török @ 2019-09-03  9:26 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Roger Pau Monné, Wei Liu, Andrew Cooper

On 03/09/2019 08:54, Jan Beulich wrote:
> On 02.09.2019 20:27, Edwin Török  wrote:
>> On a Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz the host frequency drifts:
>> ```
>> (XEN) [    6.607693] Detected 2600.004 MHz processor.
>> (XEN) [ 2674.213081] dom1(hvm): mode=0,ofs=0xfffee6f70b7faa48,khz=2600018,inc=3
>> (XEN) [ 2674.213087] dom2(hvm): mode=0,ofs=0xfffee6fd499835c0,khz=2600018,inc=2
>> ```
>>
>> The 2 domains were suspended prior to rebooting the host and applying a
>> xen/microcode patch. After the reboot the frequency of the host was deemed to
>> be slightly different, and therefore switching on RDTSC emulation for the Linux
>> HVM guest, even though the difference was only 5 ppm. This CPU doesn't support
>> TSC scaling.
>>
>> Therefore we should either measure the standard deviation of our calibration
>> and have a range of acceptable frequencies as "same", or have a static
>> tolerance value.
>>
>> The platform timer's clock frequency accuracy is:
>> * IA-PC HPET Specification 1.0a sections 2.2 and 2.4.1: 500 ppm or better
>> * ACPI PM timer, and PIT timer do not have defined accuracies
>> * Intel 300 Series datasheet section 25.6: 24 MHz crystal 100 ppm or better
>> * NTP FAQ section 3.3 Clock Quality: 11 ppm drift due to temperature
>> * section 2.2.2 claims that PIT/ACPI PM timer share the same crystal as HPET and
>> thus 500 ppm as an upper bound, "the real drift is usually smaller than 30ppm"
>>
>> For simplicity and determinism opted for a static tolerance value of 100 ppm
>> here, such that the any error would be well within the error you would get with
>> HPET/Linux's calibration. NTP can cope with a drift < 500 ppm.
>> Most importantly this should stop Xen from claiming that the clock frequency on
>> the same host is different across reboots. Specifications do not currently
>> mandate an accuracy higher than 100 ppm, therefore OSes should already be able
>> to cope with such drift on real hardware. Any improvements in accuracy from
>> future specifications/motherboards wouldn't be applicable, because they would
>> also come with newer CPUs that support TSC scaling.
>>
>> If the CPU does support TSC scaling Xen will of course still attempt to match
>> the exact frequency value it thinks the guest had when it was suspended.
>> See below for `if ( hvm_tsc_scaling_supported && !d->arch.vtsc )` (not visible
>> in patch context).
>>
>> llabs() doesn't appear to be available when building xen, hence the 2 comparisons.
>>
>> After this patch when suspending a VM, and rebooting the host I get this output:
>> ```
>> (XEN) [    6.614703] Detected 2600.010 MHz processor.
>> (XEN) [  138.924342] TSC marked as reliable, warp = 0 (count=2)
>> (XEN) [  138.924346] dom1(hvm): mode=0,ofs=0xfffed01901016d18,khz=2600012,inc=2
>> ```
>>
>> Signed-off-by: Edwin Török <edvin.torok@citrix.com>
> 
> First of all - are you aware that there had been multiple iterations
> of a patch (by Olaf Hering) making this a command line and/or guest
> config controllable setting?

No, I'll take a look at them and the associated discussion.
Found a '[PATCH v10] new config option vtsc_tolerance_khz to avoid TSC emulation' in the archives from 9 months ago.

> If so, it would have been nice if at
> least the cover letter identified the correlation. If not, please
> take a look. After all it hasn't gone in so far because of objections
> by Andrew.
> 
> Using a hardcoded tolerance value in any event raises the question
> of how you know whether a particular guest OS can actually cope.
> 
>> --- a/xen/arch/x86/time.c
>> +++ b/xen/arch/x86/time.c
>> @@ -2171,6 +2171,12 @@ void tsc_get_info(struct domain *d, uint32_t *tsc_mode,
>>          *elapsed_nsec = 0;
>>  }
>>  
>> +static inline int frequency_same_with_tolerance(int64_t khz1, int64_t khz2)
> 
> The return type wants to be bool. And there wants to be an explaining
> comment ahead of the function, (re-)stating some of what you have in
> the description.
> 
>> +{
>> +    int64_t ppm = (khz2 - khz1) * 1000000 / khz2;
>> +    return -100 < ppm && ppm < 100;
> 
> While we have no llabs(), you should imo use either ABS() or
> __builtin_labs() / __builtin_llabs().
> 
> Furthermore, could we limit this behavior to the case of there not
> being TSC scaling available (due to running on old hardware, or due
> to it being a PV guest)?

Yes, that'd make sense.


Best regards,
--Edwin

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2019-09-03  9:26 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-02 18:27 [Xen-devel] [PATCH 0/1] Avoiding RDTSC emulation due to host clock drift Edwin Török
2019-09-02 18:27 ` [Xen-devel] [PATCH 1/1] x86/arch: VM resume: avoid " Edwin Török
2019-09-03  7:54   ` Jan Beulich
2019-09-03  9:26     ` Edwin Török

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.