* [PATCH 00/13] Add VMX TSC scaling support
@ 2015-09-28  7:13 Haozhong Zhang
  2015-09-28  7:13 ` [PATCH 01/13] x86/time.c: Use system time to calculate elapsed_nsec in tsc_get_info() Haozhong Zhang
                   ` (14 more replies)
  0 siblings, 15 replies; 117+ messages in thread
From: Haozhong Zhang @ 2015-09-28  7:13 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Kevin Tian, Wei Liu, Ian Campbell,
	Stefano Stabellini, Jun Nakajima, Andrew Cooper, Ian Jackson,
	Aravind Gopalakrishnan, Jan Beulich, Keir Fraser,
	Boris Ostrovsky, Suravee Suthikulpanit

This patchset adds support for the VMX TSC scaling feature, which is
available on Intel Skylake CPUs. The specification of VMX TSC scaling
can be found at
http://www.intel.com/content/www/us/en/processors/timestamp-counter-scaling-virtualization-white-paper.html

VMX TSC scaling allows the guest TSC, as read by guest rdtsc(p)
instructions, to increase at a rate that is chosen by the hypervisor
and can differ from the host TSC rate. Basically, VMX TSC scaling adds
a 64-bit field called the TSC multiplier to the VMCS so that, if VMX
TSC scaling is enabled, the TSC read by guest rdtsc(p) instructions is
calculated by the following formula:

  guest EDX:EAX = ((Host TSC * TSC multiplier) >> 48) + VMX TSC Offset

where Host TSC = Host MSR_IA32_TSC + Host MSR_IA32_TSC_ADJUST.
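
As a rough illustration of the formula above, the following sketch
computes the guest-visible TSC in software. It assumes the multiplier
has 48 fractional bits (as the shift in the formula implies) and that
the product is taken at 128-bit width, using the GCC/Clang
`unsigned __int128` extension. The name vmx_guest_tsc() is hypothetical
and is not part of this patchset.

```c
#include <stdint.h>

/* Number of fractional bits in the VMX TSC multiplier (from the formula). */
#define VMX_TSC_MULT_FRAC_BITS 48

/*
 * Hypothetical helper: compute the value a guest rdtsc would observe.
 * The product is formed at 128-bit width so that no high bits are lost
 * before the right shift; the final addition wraps modulo 2^64.
 */
static uint64_t vmx_guest_tsc(uint64_t host_tsc, uint64_t multiplier,
                              uint64_t tsc_offset)
{
    unsigned __int128 prod = (unsigned __int128)host_tsc * multiplier;

    return (uint64_t)(prod >> VMX_TSC_MULT_FRAC_BITS) + tsc_offset;
}
```

With a multiplier of 1 << 48 the guest sees the host TSC plus the
offset; with 1 << 47 the guest TSC advances at half the host rate.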

This patchset is composed of the following four parts.
  1. PATCH 01 - 02 fix bugs in tsc_get_info() which could result in
     errors when VMX TSC scaling is used.
     
  2. PATCH 03 - 09 add/move the common parts of VMX TSC scaling and
     SVM TSC ratio to hvm.c and x86/time.c.
     
  3. PATCH 10 - 12 implement the VMX-specific code for supporting VMX
     TSC scaling.
     
  4. PATCH 13 adapts libxl for VMX TSC scaling (as well as SVM TSC
     ratio).

Haozhong Zhang (13):
  x86/time.c: Use system time to calculate elapsed_nsec in
    tsc_get_info()
  x86/time.c: Get the correct guest TSC rate in tsc_get_info()
  x86/hvm: Collect information of TSC scaling ratio
  x86/hvm: Setup TSC scaling ratio
  x86/hvm: Replace architecture TSC scaling by a common function
  x86/hvm: Scale host TSC when setting/getting guest TSC
  x86/hvm: Move saving/loading vcpu's TSC to common code
  x86/hvm: Detect TSC scaling through hvm_funcs in tsc_set_info()
  x86/time.c: Scale host TSC in pvclock properly
  vmx: Detect and initialize VMX RDTSC(P) scaling
  vmx: Use scaled host TSC to calculate TSC offset
  vmx: Add a call-back to apply TSC scaling ratio to hardware
  tools/libxl: Add 'vtsc_khz' option to set guest TSC rate

 tools/libxl/libxl_types.idl        |   1 +
 tools/libxl/libxl_x86.c            |   4 +-
 tools/libxl/xl_cmdimpl.c           |  22 ++++++++
 xen/arch/x86/hvm/hvm.c             | 110 +++++++++++++++++++++++++++++++++----
 xen/arch/x86/hvm/svm/svm.c         |  25 ++++++---
 xen/arch/x86/hvm/vmx/vmcs.c        |  11 +++-
 xen/arch/x86/hvm/vmx/vmx.c         |  39 +++++++++++--
 xen/arch/x86/time.c                |  33 ++++++++---
 xen/include/asm-x86/domain.h       |   2 +
 xen/include/asm-x86/hvm/hvm.h      |  19 +++++++
 xen/include/asm-x86/hvm/svm/svm.h  |   4 +-
 xen/include/asm-x86/hvm/vmx/vmcs.h |   7 +++
 12 files changed, 240 insertions(+), 37 deletions(-)

--
2.4.8


* [PATCH 01/13] x86/time.c: Use system time to calculate elapsed_nsec in tsc_get_info()
  2015-09-28  7:13 [PATCH 00/13] Add VMX TSC scaling support Haozhong Zhang
@ 2015-09-28  7:13 ` Haozhong Zhang
  2015-10-09  6:51   ` Jan Beulich
  2015-09-28  7:13 ` [PATCH 02/13] x86/time.c: Get the correct guest TSC rate " Haozhong Zhang
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 117+ messages in thread
From: Haozhong Zhang @ 2015-09-28  7:13 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Kevin Tian, Wei Liu, Ian Campbell,
	Stefano Stabellini, Jun Nakajima, Andrew Cooper, Ian Jackson,
	Aravind Gopalakrishnan, Jan Beulich, Keir Fraser,
	Boris Ostrovsky, Suravee Suthikulpanit

When the TSC mode of a domain is TSC_MODE_DEFAULT and no TSC emulation
is used, the existing tsc_get_info() calculates elapsed_nsec by scaling
the host TSC with a ratio between guest TSC rate and
nanoseconds. However, the result will be incorrect if the guest TSC rate
differs from the host TSC rate. This patch fixes this problem by using
the system time as elapsed_nsec.
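
A small numeric sketch of why the old calculation goes wrong, assuming
a hypothetical 3 GHz host and a guest configured for 2 GHz. The helper
ticks_to_ns() is illustrative only; Xen's scale_delta() uses a
mul_frac/shift pair rather than a division.

```c
#include <stdint.h>

/*
 * Convert a tick count to nanoseconds for a counter running at rate_khz.
 * ticks / (rate_khz * 1000) seconds == ticks * 10^6 / rate_khz nanoseconds.
 * The intermediate product can overflow for very large tick counts; this
 * is acceptable for a small worked example only.
 */
static uint64_t ticks_to_ns(uint64_t ticks, uint64_t rate_khz)
{
    return ticks * 1000000ULL / rate_khz;
}
```

After one second the host TSC has advanced by 3,000,000,000 ticks.
Converting with the host rate (3,000,000 kHz) gives 1,000,000,000 ns,
while converting with the guest rate (2,000,000 kHz) gives
1,500,000,000 ns, a 50% overestimate of the elapsed time.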

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
 xen/arch/x86/time.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index bbb7e6c..a345efb 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -1868,8 +1868,7 @@ void tsc_get_info(struct domain *d, uint32_t *tsc_mode,
             *gtsc_khz = d->arch.tsc_khz;
             break;
         }
-        tsc = rdtsc();
-        *elapsed_nsec = scale_delta(tsc, &d->arch.vtsc_to_ns);
+        *elapsed_nsec = get_s_time();
         *gtsc_khz = cpu_khz;
         break;
     case TSC_MODE_PVRDTSCP:
-- 
2.4.8


* [PATCH 02/13] x86/time.c: Get the correct guest TSC rate in tsc_get_info()
  2015-09-28  7:13 [PATCH 00/13] Add VMX TSC scaling support Haozhong Zhang
  2015-09-28  7:13 ` [PATCH 01/13] x86/time.c: Use system time to calculate elapsed_nsec in tsc_get_info() Haozhong Zhang
@ 2015-09-28  7:13 ` Haozhong Zhang
  2015-09-28  7:13 ` [PATCH 03/13] x86/hvm: Collect information of TSC scaling ratio Haozhong Zhang
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 117+ messages in thread
From: Haozhong Zhang @ 2015-09-28  7:13 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Kevin Tian, Wei Liu, Ian Campbell,
	Stefano Stabellini, Jun Nakajima, Andrew Cooper, Ian Jackson,
	Aravind Gopalakrishnan, Jan Beulich, Keir Fraser,
	Boris Ostrovsky, Suravee Suthikulpanit

When the TSC mode of a domain is TSC_MODE_DEFAULT and no TSC emulation
is used, the existing tsc_get_info() returns the host TSC rate (cpu_khz)
as the guest TSC rate. However, tsc_set_info() may set the guest TSC
rate of a domain in TSC_MODE_DEFAULT to a value different from the host
TSC rate. To stay consistent with tsc_set_info(), this patch makes
tsc_get_info() use the value set by tsc_set_info() as the guest TSC
rate.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
 xen/arch/x86/time.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index a345efb..92dd8a1 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -1869,7 +1869,7 @@ void tsc_get_info(struct domain *d, uint32_t *tsc_mode,
             break;
         }
         *elapsed_nsec = get_s_time();
-        *gtsc_khz = cpu_khz;
+        *gtsc_khz = d->arch.tsc_khz;
         break;
     case TSC_MODE_PVRDTSCP:
         if ( d->arch.vtsc )
-- 
2.4.8


* [PATCH 03/13] x86/hvm: Collect information of TSC scaling ratio
  2015-09-28  7:13 [PATCH 00/13] Add VMX TSC scaling support Haozhong Zhang
  2015-09-28  7:13 ` [PATCH 01/13] x86/time.c: Use system time to calculate elapsed_nsec in tsc_get_info() Haozhong Zhang
  2015-09-28  7:13 ` [PATCH 02/13] x86/time.c: Get the correct guest TSC rate " Haozhong Zhang
@ 2015-09-28  7:13 ` Haozhong Zhang
  2015-10-22 12:53   ` Jan Beulich
  2015-09-28  7:13 ` [PATCH 04/13] x86/hvm: Setup " Haozhong Zhang
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 117+ messages in thread
From: Haozhong Zhang @ 2015-09-28  7:13 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Kevin Tian, Wei Liu, Ian Campbell,
	Stefano Stabellini, Jun Nakajima, Andrew Cooper, Ian Jackson,
	Aravind Gopalakrishnan, Jan Beulich, Keir Fraser,
	Boris Ostrovsky, Suravee Suthikulpanit

Both VMX TSC scaling and SVM TSC ratio use a 64-bit TSC scaling ratio,
but the number of fractional bits of the ratio differs between VMX and
SVM. This patch makes the architecture code collect the number of
fractional bits and other related information into fields of struct
hvm_function_table so that they can be used in the common code.
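
To make the difference in fractional bits concrete, here is a minimal
sketch of how one ratio format description can cover both vendors. The
struct and function names are hypothetical. The 32-bit SVM value comes
from this patch, and the 48-bit VMX value is taken from the shift in
the cover letter's formula.

```c
#include <stdint.h>

/* Hypothetical description of a vendor's fixed-point ratio format. */
struct tsc_ratio_format {
    uint8_t  frac_bits;   /* number of fractional bits in the ratio */
    uint64_t rsvd_mask;   /* bits that must be zero in a valid ratio */
};

/* Ratio value meaning "no scaling" (1.0) in the given format. */
static uint64_t tsc_ratio_one(const struct tsc_ratio_format *fmt)
{
    return 1ULL << fmt->frac_bits;
}
```

For SVM (32 fractional bits) this yields 0x0000000100000000, which
matches DEFAULT_TSC_RATIO in svm.h.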

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
 xen/arch/x86/hvm/svm/svm.c        |  9 +++++++++
 xen/arch/x86/hvm/vmx/vmx.c        |  2 ++
 xen/include/asm-x86/hvm/hvm.h     | 13 +++++++++++++
 xen/include/asm-x86/hvm/svm/svm.h |  1 +
 4 files changed, 25 insertions(+)

diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 8de41fa..94b9618 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1428,6 +1428,9 @@ const struct hvm_function_table * __init start_svm(void)
     if ( !cpu_has_svm_nrips )
         clear_bit(SVM_FEATURE_DECODEASSISTS, &svm_feature_flags);
 
+    if ( cpu_has_tsc_ratio )
+        svm_function_table.tsc_scaling_supported = 1;
+
 #define P(p,s) if ( p ) { printk(" - %s\n", s); printed = 1; }
     P(cpu_has_svm_npt, "Nested Page Tables (NPT)");
     P(cpu_has_svm_lbrv, "Last Branch Record (LBR) Virtualisation");
@@ -2283,6 +2286,12 @@ static struct hvm_function_table __initdata svm_function_table = {
     .nhvm_vmcx_hap_enabled = nsvm_vmcb_hap_enabled,
     .nhvm_intr_blocked = nsvm_intr_blocked,
     .nhvm_hap_walk_L1_p2m = nsvm_hap_walk_L1_p2m,
+
+    .tsc_scaling_supported       = 0,
+    .default_tsc_scaling_ratio   = DEFAULT_TSC_RATIO,
+    .max_tsc_scaling_ratio       = MAX_TSC_RATIO,
+    .tsc_scaling_ratio_frac_bits = 32,
+    .tsc_scaling_ratio_rsvd      = TSC_RATIO_RSVD_BITS,
 };
 
 void svm_vmexit_handler(struct cpu_user_regs *regs)
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index bbec0e8..4edb099 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1968,6 +1968,8 @@ static struct hvm_function_table __initdata vmx_function_table = {
     .altp2m_vcpu_update_vmfunc_ve = vmx_vcpu_update_vmfunc_ve,
     .altp2m_vcpu_emulate_ve = vmx_vcpu_emulate_ve,
     .altp2m_vcpu_emulate_vmfunc = vmx_vcpu_emulate_vmfunc,
+    /* support for VMX RDTSC(P) scaling */
+    .tsc_scaling_supported       = 0,
 };
 
 const struct hvm_function_table * __init start_vmx(void)
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 0693706..7dddfa0 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -99,6 +99,19 @@ struct hvm_function_table {
     /* Indicate HAP capabilities. */
     int hap_capabilities;
 
+    /*
+     * Parameters of hardware-assisted TSC scaling.
+     */
+    /* is TSC scaling supported? */
+    bool_t   tsc_scaling_supported;
+    /* number of bits of the fractional part of TSC scaling ratio */
+    uint8_t  tsc_scaling_ratio_frac_bits;
+    /* mask of reserved bits of TSC scaling ratio */
+    uint64_t tsc_scaling_ratio_rsvd;
+    /* default TSC scaling ratio (no scaling) */
+    uint64_t default_tsc_scaling_ratio;
+    /* maximum-allowed TSC scaling ratio */
+    uint64_t max_tsc_scaling_ratio;
 
     /*
      * Initialise/destroy HVM domain/vcpu resources
diff --git a/xen/include/asm-x86/hvm/svm/svm.h b/xen/include/asm-x86/hvm/svm/svm.h
index d60ec23..a4832d9 100644
--- a/xen/include/asm-x86/hvm/svm/svm.h
+++ b/xen/include/asm-x86/hvm/svm/svm.h
@@ -96,6 +96,7 @@ extern u32 svm_feature_flags;
 
 /* TSC rate */
 #define DEFAULT_TSC_RATIO       0x0000000100000000ULL
+#define MAX_TSC_RATIO           0x000000ffffffffffULL
 #define TSC_RATIO_RSVD_BITS     0xffffff0000000000ULL
 #define TSC_RATIO(g_khz, h_khz) ( (((u64)(g_khz)<<32)/(u64)(h_khz)) & \
                                   ~TSC_RATIO_RSVD_BITS )
-- 
2.4.8


* [PATCH 04/13] x86/hvm: Setup TSC scaling ratio
  2015-09-28  7:13 [PATCH 00/13] Add VMX TSC scaling support Haozhong Zhang
                   ` (2 preceding siblings ...)
  2015-09-28  7:13 ` [PATCH 03/13] x86/hvm: Collect information of TSC scaling ratio Haozhong Zhang
@ 2015-09-28  7:13 ` Haozhong Zhang
  2015-10-22 13:13   ` Jan Beulich
  2015-09-28  7:13 ` [PATCH 05/13] x86/hvm: Replace architecture TSC scaling by a common function Haozhong Zhang
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 117+ messages in thread
From: Haozhong Zhang @ 2015-09-28  7:13 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Kevin Tian, Wei Liu, Ian Campbell,
	Stefano Stabellini, Jun Nakajima, Andrew Cooper, Ian Jackson,
	Aravind Gopalakrishnan, Jan Beulich, Keir Fraser,
	Boris Ostrovsky, Suravee Suthikulpanit

This patch adds a field tsc_scaling_ratio in struct arch_vcpu to
represent the TSC scaling ratio, and sets it up when tsc_set_info() is
called for a vcpu or a vcpu is restored or reset.
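
The ratio calculation can be sketched as a stand-alone function. This
is only an illustration under stated assumptions: compute_tsc_ratio()
and its parameters are hypothetical names, a return value of 0 is used
here to signal an unusable ratio, and guest_khz is assumed to fit in
32 bits so that the initial shift cannot overflow.

```c
#include <stdint.h>

/*
 * Hypothetical stand-alone version of the ratio setup:
 *   ratio = guest_khz / host_khz
 * expressed as a fixed-point number with frac_bits fractional bits.
 * Returns 0 if the result cannot be used.
 */
static uint64_t compute_tsc_ratio(uint64_t guest_khz, uint64_t host_khz,
                                  unsigned int frac_bits,
                                  uint64_t max_ratio, uint64_t rsvd_mask)
{
    unsigned int shift = frac_bits <= 32 ? frac_bits : 32;
    uint64_t ratio;

    if ( host_khz == 0 || frac_bits >= 64 )
        return 0;

    /* Shift by at most 32 bits before dividing to avoid overflow. */
    ratio = (guest_khz << shift) / host_khz;
    /* Apply whatever part of the shift is still outstanding. */
    ratio <<= frac_bits - shift;

    if ( ratio == 0 || ratio > max_ratio || (ratio & rsvd_mask) )
        return 0;

    return ratio;
}
```

Note that when frac_bits exceeds 32, the low (frac_bits - 32) bits of
the result are always zero, so the ratio carries at most 32 bits of
fractional precision.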

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
 xen/arch/x86/hvm/hvm.c        | 34 ++++++++++++++++++++++++++++++++++
 xen/arch/x86/hvm/svm/svm.c    |  2 ++
 xen/arch/x86/time.c           | 10 +++++++++-
 xen/include/asm-x86/domain.h  |  2 ++
 xen/include/asm-x86/hvm/hvm.h |  2 ++
 5 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 6afc344..63ce4de 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -297,6 +297,34 @@ int hvm_set_guest_pat(struct vcpu *v, u64 guest_pat)
     return 1;
 }
 
+void hvm_setup_tsc_scaling(struct vcpu *v)
+{
+    u64 ratio, khz;
+    s8 shift;
+
+    if ( !hvm_funcs.tsc_scaling_supported )
+        return;
+
+    khz = v->domain->arch.tsc_khz;
+    shift = (hvm_funcs.tsc_scaling_ratio_frac_bits <= 32) ?
+        hvm_funcs.tsc_scaling_ratio_frac_bits : 32;
+    ratio = khz << shift;
+    do_div(ratio, cpu_khz);
+    ratio <<= hvm_funcs.tsc_scaling_ratio_frac_bits - shift;
+
+    if ( ratio == 0 ||
+         ratio > hvm_funcs.max_tsc_scaling_ratio ||
+         ratio & hvm_funcs.tsc_scaling_ratio_rsvd )
+    {
+        printk(XENLOG_WARNING
+               "Invalid TSC scaling ratio - virtual tsc khz=%lu\n",
+               khz);
+        return;
+    }
+
+    v->arch.tsc_scaling_ratio = ratio;
+}
+
 void hvm_set_guest_tsc_fixed(struct vcpu *v, u64 guest_tsc, u64 at_tsc)
 {
     uint64_t tsc;
@@ -2023,6 +2051,9 @@ static int hvm_load_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
     if ( hvm_funcs.load_cpu_ctxt(v, &ctxt) < 0 )
         return -EINVAL;
 
+    if ( !v->domain->arch.vtsc && hvm_funcs.tsc_scaling_supported )
+        hvm_setup_tsc_scaling(v);
+
     v->arch.hvm_vcpu.msr_tsc_aux = ctxt.msr_tsc_aux;
 
     seg.limit = ctxt.idtr_limit;
@@ -5458,6 +5489,9 @@ void hvm_vcpu_reset_state(struct vcpu *v, uint16_t cs, uint16_t ip)
     hvm_set_segment_register(v, x86_seg_gdtr, &reg);
     hvm_set_segment_register(v, x86_seg_idtr, &reg);
 
+    if ( !v->domain->arch.vtsc && hvm_funcs.tsc_scaling_supported )
+        hvm_setup_tsc_scaling(v);
+
     /* Sync AP's TSC with BSP's. */
     v->arch.hvm_vcpu.cache_tsc_offset =
         v->domain->vcpu[0]->arch.hvm_vcpu.cache_tsc_offset;
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 94b9618..a7465c6 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1170,6 +1170,8 @@ static int svm_vcpu_initialise(struct vcpu *v)
 
     svm_guest_osvw_init(v);
 
+    v->arch.tsc_scaling_ratio = DEFAULT_TSC_RATIO;
+
     return 0;
 }
 
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 92dd8a1..64f4e31 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -1956,6 +1956,8 @@ void tsc_set_info(struct domain *d,
         {
     case TSC_MODE_NEVER_EMULATE:
             d->arch.vtsc = 0;
+            if ( tsc_mode == TSC_MODE_NEVER_EMULATE )
+                d->arch.tsc_khz = cpu_khz;
             break;
         }
         d->arch.vtsc = 1;
@@ -1981,8 +1983,14 @@ void tsc_set_info(struct domain *d,
     if ( is_hvm_domain(d) )
     {
         hvm_set_rdtsc_exiting(d, d->arch.vtsc);
-        if ( d->vcpu && d->vcpu[0] && incarnation == 0 )
+        if ( d->vcpu && d->vcpu[0] )
         {
+            if ( !d->arch.vtsc && hvm_funcs.tsc_scaling_supported )
+                hvm_setup_tsc_scaling(d->vcpu[0]);
+
+            if ( incarnation )
+                return;
+
             /*
              * set_tsc_offset() is called from hvm_vcpu_initialise() before
              * tsc_set_info(). New vtsc mode may require recomputing TSC
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index f0aeade..fffd519 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -533,6 +533,8 @@ struct arch_vcpu
     XEN_GUEST_HANDLE(vcpu_time_info_t) time_info_guest;
 
     struct arch_vm_event *vm_event;
+
+    uint64_t tsc_scaling_ratio;
 };
 
 smap_check_policy_t smap_policy_change(struct vcpu *v,
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 7dddfa0..55e7f64 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -261,6 +261,8 @@ void hvm_set_guest_tsc_fixed(struct vcpu *v, u64 guest_tsc, u64 at_tsc);
 u64 hvm_get_guest_tsc_fixed(struct vcpu *v, u64 at_tsc);
 #define hvm_get_guest_tsc(v) hvm_get_guest_tsc_fixed(v, 0)
 
+void hvm_setup_tsc_scaling(struct vcpu *v);
+
 int hvm_set_mode(struct vcpu *v, int mode);
 void hvm_init_guest_time(struct domain *d);
 void hvm_set_guest_time(struct vcpu *v, u64 guest_time);
-- 
2.4.8


* [PATCH 05/13] x86/hvm: Replace architecture TSC scaling by a common function
  2015-09-28  7:13 [PATCH 00/13] Add VMX TSC scaling support Haozhong Zhang
                   ` (3 preceding siblings ...)
  2015-09-28  7:13 ` [PATCH 04/13] x86/hvm: Setup " Haozhong Zhang
@ 2015-09-28  7:13 ` Haozhong Zhang
  2015-10-22 13:52   ` Jan Beulich
  2015-09-28  7:13 ` [PATCH 06/13] x86/hvm: Scale host TSC when setting/getting guest TSC Haozhong Zhang
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 117+ messages in thread
From: Haozhong Zhang @ 2015-09-28  7:13 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Kevin Tian, Wei Liu, Ian Campbell,
	Stefano Stabellini, Jun Nakajima, Andrew Cooper, Ian Jackson,
	Aravind Gopalakrishnan, Jan Beulich, Keir Fraser,
	Boris Ostrovsky, Suravee Suthikulpanit

This patch implements a common function hvm_scale_tsc() to calculate the
scaling of TSC by using TSC scaling information which is collected by
architecture code.
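
For comparison, the same fixed-point multiplication can be written with
a 128-bit intermediate product. This is not the implementation in this
patch, which splits the operands to stay within 64-bit arithmetic. It
is a reference sketch that assumes the GCC/Clang `unsigned __int128`
extension, and scale_tsc_ref() is a hypothetical name.

```c
#include <stdint.h>

/*
 * Reference fixed-point multiply: returns the low 64 bits of
 * (tsc * ratio) >> frac_bits, where ratio has frac_bits fractional bits.
 * frac_bits is assumed to be smaller than 64.
 */
static uint64_t scale_tsc_ref(uint64_t tsc, uint64_t ratio,
                              unsigned int frac_bits)
{
    unsigned __int128 prod = (unsigned __int128)tsc * ratio;

    return (uint64_t)(prod >> frac_bits);
}
```

A function like this can serve as an oracle when testing a split
64-bit implementation against a range of tsc and ratio values.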

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
 xen/arch/x86/hvm/hvm.c            | 53 +++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/hvm/svm/svm.c        |  4 +--
 xen/include/asm-x86/hvm/hvm.h     |  1 +
 xen/include/asm-x86/hvm/svm/svm.h |  3 ---
 4 files changed, 56 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 63ce4de..2d55a36 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -297,6 +297,59 @@ int hvm_set_guest_pat(struct vcpu *v, u64 guest_pat)
     return 1;
 }
 
+/*
+ * Multiply tsc by a fixed point number represented by ratio.
+ *
+ * The most significant 64-N bits (mult) of ratio represent the
+ * integral part of the fixed point number; the remaining N bits
+ * (frac) represent the fractional part, ie. ratio represents a fixed
+ * point number (mult + frac * 2^(-N)).
+ *
+ * N equals hvm_funcs.tsc_scaling_ratio_frac_bits.
+ */
+static u64 __scale_tsc(u64 tsc, u64 ratio)
+{
+    u64 mult, frac, mask, _tsc;
+    int width, nr;
+
+    BUG_ON(hvm_funcs.tsc_scaling_ratio_frac_bits >= 64);
+
+    mult  = ratio >> hvm_funcs.tsc_scaling_ratio_frac_bits;
+    mask  = (1ULL << hvm_funcs.tsc_scaling_ratio_frac_bits) - 1;
+    frac  = ratio & mask;
+
+    width = 64 - hvm_funcs.tsc_scaling_ratio_frac_bits;
+    mask  = (1ULL << width) - 1;
+    nr    = hvm_funcs.tsc_scaling_ratio_frac_bits;
+
+    _tsc  = tsc;
+    _tsc *= mult;
+    _tsc += (tsc >> hvm_funcs.tsc_scaling_ratio_frac_bits) * frac;
+
+    while ( nr >= width )
+    {
+        _tsc += (((tsc >> (nr - width)) & mask) * frac) >> (64 - nr);
+        nr   -= width;
+    }
+
+    if ( nr > 0 )
+        _tsc += ((tsc & ((1ULL << nr) - 1)) * frac) >>
+            hvm_funcs.tsc_scaling_ratio_frac_bits;
+
+    return _tsc;
+}
+
+u64 hvm_scale_tsc(struct vcpu *v, u64 tsc)
+{
+    u64 _tsc = tsc;
+    u64 ratio = v->arch.tsc_scaling_ratio;
+
+    if ( ratio != hvm_funcs.default_tsc_scaling_ratio )
+        _tsc = __scale_tsc(tsc, ratio);
+
+    return _tsc;
+}
+
 void hvm_setup_tsc_scaling(struct vcpu *v)
 {
     u64 ratio, khz;
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index a7465c6..0984021 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -804,7 +804,7 @@ static void svm_set_tsc_offset(struct vcpu *v, u64 offset, u64 at_tsc)
             host_tsc = at_tsc;
         else
             host_tsc = rdtsc();
-        offset = svm_get_tsc_offset(host_tsc, guest_tsc, vcpu_tsc_ratio(v));
+        offset = guest_tsc - hvm_scale_tsc(v, host_tsc);
     }
 
     if ( !nestedhvm_enabled(d) ) {
@@ -965,7 +965,7 @@ static inline void svm_tsc_ratio_save(struct vcpu *v)
 static inline void svm_tsc_ratio_load(struct vcpu *v)
 {
     if ( cpu_has_tsc_ratio && !v->domain->arch.vtsc ) 
-        wrmsrl(MSR_AMD64_TSC_RATIO, vcpu_tsc_ratio(v));
+        wrmsrl(MSR_AMD64_TSC_RATIO, v->arch.tsc_scaling_ratio);
 }
 
 static void svm_ctxt_switch_from(struct vcpu *v)
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 55e7f64..f63fe93 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -261,6 +261,7 @@ void hvm_set_guest_tsc_fixed(struct vcpu *v, u64 guest_tsc, u64 at_tsc);
 u64 hvm_get_guest_tsc_fixed(struct vcpu *v, u64 at_tsc);
 #define hvm_get_guest_tsc(v) hvm_get_guest_tsc_fixed(v, 0)
 
+u64 hvm_scale_tsc(struct vcpu *v, u64 tsc);
 void hvm_setup_tsc_scaling(struct vcpu *v);
 
 int hvm_set_mode(struct vcpu *v, int mode);
diff --git a/xen/include/asm-x86/hvm/svm/svm.h b/xen/include/asm-x86/hvm/svm/svm.h
index a4832d9..42aa6e8 100644
--- a/xen/include/asm-x86/hvm/svm/svm.h
+++ b/xen/include/asm-x86/hvm/svm/svm.h
@@ -98,9 +98,6 @@ extern u32 svm_feature_flags;
 #define DEFAULT_TSC_RATIO       0x0000000100000000ULL
 #define MAX_TSC_RATIO           0x000000ffffffffffULL
 #define TSC_RATIO_RSVD_BITS     0xffffff0000000000ULL
-#define TSC_RATIO(g_khz, h_khz) ( (((u64)(g_khz)<<32)/(u64)(h_khz)) & \
-                                  ~TSC_RATIO_RSVD_BITS )
-#define vcpu_tsc_ratio(v)       TSC_RATIO((v)->domain->arch.tsc_khz, cpu_khz)
 
 extern void svm_host_osvw_reset(void);
 extern void svm_host_osvw_init(void);
-- 
2.4.8


* [PATCH 06/13] x86/hvm: Scale host TSC when setting/getting guest TSC
  2015-09-28  7:13 [PATCH 00/13] Add VMX TSC scaling support Haozhong Zhang
                   ` (4 preceding siblings ...)
  2015-09-28  7:13 ` [PATCH 05/13] x86/hvm: Replace architecture TSC scaling by a common function Haozhong Zhang
@ 2015-09-28  7:13 ` Haozhong Zhang
  2015-10-22 14:17   ` Jan Beulich
  2015-09-28  7:13 ` [PATCH 07/13] x86/hvm: Move saving/loading vcpu's TSC to common code Haozhong Zhang
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 117+ messages in thread
From: Haozhong Zhang @ 2015-09-28  7:13 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Kevin Tian, Wei Liu, Ian Campbell,
	Stefano Stabellini, Jun Nakajima, Andrew Cooper, Ian Jackson,
	Aravind Gopalakrishnan, Jan Beulich, Keir Fraser,
	Boris Ostrovsky, Suravee Suthikulpanit

The existing hvm_set_guest_tsc_fixed() and hvm_get_guest_tsc_fixed()
calculate the guest TSC by adding the TSC offset to the host TSC. When
the TSC scaling is enabled, the host TSC should be scaled first. This
patch adds the scaling logic to those two functions.
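
The relationship between the scaled host TSC, the offset, and the
guest TSC can be sketched as follows. The struct and function names
are hypothetical, and the scaling step uses a 128-bit product
(GCC/Clang `unsigned __int128`) rather than the hvm_scale_tsc() from
this series.

```c
#include <stdint.h>

/* Hypothetical per-vCPU TSC state. */
struct vtsc_state {
    uint64_t     ratio;      /* fixed-point scaling ratio */
    unsigned int frac_bits;  /* fractional bits in ratio (< 64) */
    uint64_t     offset;     /* guest = scale(host) + offset, mod 2^64 */
};

/* Scale a host TSC value by the vCPU's ratio. */
static uint64_t scale(const struct vtsc_state *s, uint64_t host_tsc)
{
    return (uint64_t)(((unsigned __int128)host_tsc * s->ratio)
                      >> s->frac_bits);
}

/* Choose the offset so the guest reads guest_tsc at host time host_tsc. */
static void set_guest_tsc(struct vtsc_state *s, uint64_t guest_tsc,
                          uint64_t host_tsc)
{
    s->offset = guest_tsc - scale(s, host_tsc);
}

/* Guest TSC as seen at host time host_tsc. */
static uint64_t get_guest_tsc(const struct vtsc_state *s, uint64_t host_tsc)
{
    return scale(s, host_tsc) + s->offset;
}
```

Because both directions scale the host TSC before applying the offset,
a value written with set_guest_tsc() is read back unchanged at the same
host TSC, and later reads advance at the guest rate.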

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
 xen/arch/x86/hvm/hvm.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 2d55a36..568c9ef 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -388,13 +388,12 @@ void hvm_set_guest_tsc_fixed(struct vcpu *v, u64 guest_tsc, u64 at_tsc)
         tsc = hvm_get_guest_time_fixed(v, at_tsc);
         tsc = gtime_to_gtsc(v->domain, tsc);
     }
-    else if ( at_tsc )
-    {
-        tsc = at_tsc;
-    }
     else
     {
-        tsc = rdtsc();
+        tsc = at_tsc ? at_tsc : rdtsc();
+
+        if ( hvm_funcs.tsc_scaling_supported )
+            tsc = hvm_scale_tsc(v, tsc);
     }
 
     delta_tsc = guest_tsc - tsc;
@@ -422,13 +421,12 @@ u64 hvm_get_guest_tsc_fixed(struct vcpu *v, uint64_t at_tsc)
         tsc = hvm_get_guest_time_fixed(v, at_tsc);
         tsc = gtime_to_gtsc(v->domain, tsc);
     }
-    else if ( at_tsc )
-    {
-        tsc = at_tsc;
-    }
     else
     {
-        tsc = rdtsc();
+        tsc = at_tsc ? at_tsc : rdtsc();
+
+        if ( hvm_funcs.tsc_scaling_supported )
+            tsc = hvm_scale_tsc(v, tsc);
     }
 
     return tsc + v->arch.hvm_vcpu.cache_tsc_offset;
-- 
2.4.8


* [PATCH 07/13] x86/hvm: Move saving/loading vcpu's TSC to common code
  2015-09-28  7:13 [PATCH 00/13] Add VMX TSC scaling support Haozhong Zhang
                   ` (5 preceding siblings ...)
  2015-09-28  7:13 ` [PATCH 06/13] x86/hvm: Scale host TSC when setting/getting guest TSC Haozhong Zhang
@ 2015-09-28  7:13 ` Haozhong Zhang
  2015-10-22 14:54   ` Jan Beulich
  2015-09-28  7:13 ` [PATCH 08/13] x86/hvm: Detect TSC scaling through hvm_funcs in tsc_set_info() Haozhong Zhang
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 117+ messages in thread
From: Haozhong Zhang @ 2015-09-28  7:13 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Kevin Tian, Wei Liu, Ian Campbell,
	Stefano Stabellini, Jun Nakajima, Andrew Cooper, Ian Jackson,
	Aravind Gopalakrishnan, Jan Beulich, Keir Fraser,
	Boris Ostrovsky, Suravee Suthikulpanit

Both VMX and SVM save/load the vcpu's TSC when saving/loading the
vcpu's context, so this patch moves saving/loading the vcpu's TSC to
the common functions hvm_save_cpu_ctxt()/hvm_load_cpu_ctxt().

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
 xen/arch/x86/hvm/hvm.c     | 4 ++++
 xen/arch/x86/hvm/svm/svm.c | 5 -----
 xen/arch/x86/hvm/vmx/vmx.c | 5 -----
 3 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 568c9ef..3522d20 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1813,6 +1813,8 @@ static int hvm_save_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
         /* Architecture-specific vmcs/vmcb bits */
         hvm_funcs.save_cpu_ctxt(v, &ctxt);
 
+        ctxt.tsc = hvm_get_guest_tsc_fixed(v, d->arch.hvm_domain.sync_tsc);
+
         ctxt.msr_tsc_aux = hvm_msr_tsc_aux(v);
 
         hvm_get_segment_register(v, x86_seg_idtr, &seg);
@@ -2105,6 +2107,8 @@ static int hvm_load_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
     if ( !v->domain->arch.vtsc && hvm_funcs.tsc_scaling_supported )
         hvm_setup_tsc_scaling(v);
 
+    hvm_set_guest_tsc_fixed(v, ctxt.tsc, d->arch.hvm_domain.sync_tsc);
+
     v->arch.hvm_vcpu.msr_tsc_aux = ctxt.msr_tsc_aux;
 
     seg.limit = ctxt.idtr_limit;
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 0984021..73bc863 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -357,9 +357,6 @@ static void svm_save_cpu_state(struct vcpu *v, struct hvm_hw_cpu *data)
     data->msr_syscall_mask = vmcb->sfmask;
     data->msr_efer         = v->arch.hvm_vcpu.guest_efer;
     data->msr_flags        = -1ULL;
-
-    data->tsc = hvm_get_guest_tsc_fixed(v,
-                                        v->domain->arch.hvm_domain.sync_tsc);
 }
 
 
@@ -374,8 +371,6 @@ static void svm_load_cpu_state(struct vcpu *v, struct hvm_hw_cpu *data)
     vmcb->sfmask     = data->msr_syscall_mask;
     v->arch.hvm_vcpu.guest_efer = data->msr_efer;
     svm_update_guest_efer(v);
-
-    hvm_set_guest_tsc_fixed(v, data->tsc, v->domain->arch.hvm_domain.sync_tsc);
 }
 
 static void svm_save_vmcb_ctxt(struct vcpu *v, struct hvm_hw_cpu *ctxt)
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 4edb099..624db1c 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -581,9 +581,6 @@ static void vmx_save_cpu_state(struct vcpu *v, struct hvm_hw_cpu *data)
     data->msr_lstar        = guest_state->msrs[VMX_INDEX_MSR_LSTAR];
     data->msr_star         = guest_state->msrs[VMX_INDEX_MSR_STAR];
     data->msr_syscall_mask = guest_state->msrs[VMX_INDEX_MSR_SYSCALL_MASK];
-
-    data->tsc = hvm_get_guest_tsc_fixed(v,
-                                        v->domain->arch.hvm_domain.sync_tsc);
 }
 
 static void vmx_load_cpu_state(struct vcpu *v, struct hvm_hw_cpu *data)
@@ -598,8 +595,6 @@ static void vmx_load_cpu_state(struct vcpu *v, struct hvm_hw_cpu *data)
 
     v->arch.hvm_vmx.cstar     = data->msr_cstar;
     v->arch.hvm_vmx.shadow_gs = data->shadow_gs;
-
-    hvm_set_guest_tsc_fixed(v, data->tsc, v->domain->arch.hvm_domain.sync_tsc);
 }
 
 
-- 
2.4.8


* [PATCH 08/13] x86/hvm: Detect TSC scaling through hvm_funcs in tsc_set_info()
  2015-09-28  7:13 [PATCH 00/13] Add VMX TSC scaling support Haozhong Zhang
                   ` (6 preceding siblings ...)
  2015-09-28  7:13 ` [PATCH 07/13] x86/hvm: Move saving/loading vcpu's TSC to common code Haozhong Zhang
@ 2015-09-28  7:13 ` Haozhong Zhang
  2015-10-22 15:01   ` Jan Beulich
  2015-09-28  7:13 ` [PATCH 09/13] x86/time.c: Scale host TSC in pvclock properly Haozhong Zhang
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 117+ messages in thread
From: Haozhong Zhang @ 2015-09-28  7:13 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Kevin Tian, Wei Liu, Ian Campbell,
	Stefano Stabellini, Jun Nakajima, Andrew Cooper, Ian Jackson,
	Aravind Gopalakrishnan, Jan Beulich, Keir Fraser,
	Boris Ostrovsky, Suravee Suthikulpanit

This patch uses hvm_funcs.tsc_scaling_supported instead of the
architecture code to detect the TSC scaling support.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
 xen/arch/x86/time.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 64f4e31..4b5402c 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -37,7 +37,6 @@
 #include <asm/hpet.h>
 #include <io_ports.h>
 #include <asm/setup.h> /* for early_time_init */
-#include <asm/hvm/svm/svm.h> /* for cpu_has_tsc_ratio */
 #include <public/arch-x86/cpuid.h>
 
 /* opt_clocksource: Force clocksource to one of: pit, hpet, acpi. */
@@ -1951,7 +1950,7 @@ void tsc_set_info(struct domain *d,
          */
         if ( tsc_mode == TSC_MODE_DEFAULT && host_tsc_is_safe() &&
              (has_hvm_container_domain(d) ?
-              d->arch.tsc_khz == cpu_khz || cpu_has_tsc_ratio :
+              d->arch.tsc_khz == cpu_khz || hvm_funcs.tsc_scaling_supported :
               incarnation == 0) )
         {
     case TSC_MODE_NEVER_EMULATE:
-- 
2.4.8


* [PATCH 09/13] x86/time.c: Scale host TSC in pvclock properly
  2015-09-28  7:13 [PATCH 00/13] Add VMX TSC scaling support Haozhong Zhang
                   ` (7 preceding siblings ...)
  2015-09-28  7:13 ` [PATCH 08/13] x86/hvm: Detect TSC scaling through hvm_funcs in tsc_set_info() Haozhong Zhang
@ 2015-09-28  7:13 ` Haozhong Zhang
  2015-09-28 16:36   ` Boris Ostrovsky
  2015-10-22 15:50   ` Boris Ostrovsky
  2015-09-28  7:13 ` [PATCH 10/13] vmx: Detect and initialize VMX RDTSC(P) scaling Haozhong Zhang
                   ` (5 subsequent siblings)
  14 siblings, 2 replies; 117+ messages in thread
From: Haozhong Zhang @ 2015-09-28  7:13 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Kevin Tian, Wei Liu, Ian Campbell,
	Stefano Stabellini, Jun Nakajima, Andrew Cooper, Ian Jackson,
	Aravind Gopalakrishnan, Jan Beulich, Keir Fraser,
	Boris Ostrovsky, Suravee Suthikulpanit

This patch makes the pvclock return the scaled host TSC and
corresponding scaling parameters to HVM domains if guest TSC is not
emulated and TSC scaling is enabled.
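
For illustration only (not part of the patch): a minimal sketch of how a
guest might consume the three fields filled in here. The struct and
function names are hypothetical and only mirror the fields the patch
touches; the arithmetic follows the usual pvclock convention of a signed
pre-shift followed by a 32.32 fixed-point multiply. It assumes a
compiler that provides unsigned __int128 (GCC or Clang).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the fields this patch fills in. */
struct pvclock_sketch {
    uint64_t tsc_timestamp;     /* TSC value at the snapshot (guest view) */
    uint64_t system_time;       /* system time in ns at the snapshot */
    uint32_t tsc_to_system_mul; /* 32.32 fixed-point multiplier */
    int8_t   tsc_shift;         /* signed pre-shift applied to the delta */
};

/* Convert a TSC reading taken after the snapshot into nanoseconds. */
static uint64_t pvclock_sketch_ns(const struct pvclock_sketch *p, uint64_t tsc)
{
    uint64_t delta = tsc - p->tsc_timestamp;

    /* Shifting a 64-bit value by 64 or more is undefined in C. */
    assert(p->tsc_shift > -64 && p->tsc_shift < 64);

    if (p->tsc_shift >= 0)
        delta <<= p->tsc_shift;
    else
        delta >>= -p->tsc_shift;

    /* Widen to 128 bits so the multiply cannot overflow. */
    return p->system_time +
           (uint64_t)(((unsigned __int128)delta * p->tsc_to_system_mul) >> 32);
}
```

Because the guest reads its own (scaled) TSC, the timestamp and the
mul/shift pair must both be expressed in guest TSC units, which is why
the patch switches all three fields together.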

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
 xen/arch/x86/time.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 4b5402c..54eab6e 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -832,10 +832,19 @@ static void __update_vcpu_system_time(struct vcpu *v, int force)
     }
     else
     {
-        _u.tsc_timestamp     = t->local_tsc_stamp;
+        if ( is_hvm_domain(d) && hvm_funcs.tsc_scaling_supported )
+        {
+            _u.tsc_timestamp     = hvm_scale_tsc(v, t->local_tsc_stamp);
+            _u.tsc_to_system_mul = d->arch.vtsc_to_ns.mul_frac;
+            _u.tsc_shift         = d->arch.vtsc_to_ns.shift;
+        }
+        else
+        {
+            _u.tsc_timestamp     = t->local_tsc_stamp;
+            _u.tsc_to_system_mul = t->tsc_scale.mul_frac;
+            _u.tsc_shift         = (s8)t->tsc_scale.shift;
+        }
         _u.system_time       = t->stime_local_stamp;
-        _u.tsc_to_system_mul = t->tsc_scale.mul_frac;
-        _u.tsc_shift         = (s8)t->tsc_scale.shift;
     }
     if ( is_hvm_domain(d) )
         _u.tsc_timestamp += v->arch.hvm_vcpu.cache_tsc_offset;
-- 
2.4.8


* [PATCH 10/13] vmx: Detect and initialize VMX RDTSC(P) scaling
  2015-09-28  7:13 [PATCH 00/13] Add VMX TSC scaling support Haozhong Zhang
                   ` (8 preceding siblings ...)
  2015-09-28  7:13 ` [PATCH 09/13] x86/time.c: Scale host TSC in pvclock properly Haozhong Zhang
@ 2015-09-28  7:13 ` Haozhong Zhang
  2015-10-27 13:19   ` Jan Beulich
  2015-09-28  7:13 ` [PATCH 11/13] vmx: Use scaled host TSC to calculate TSC offset Haozhong Zhang
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 117+ messages in thread
From: Haozhong Zhang @ 2015-09-28  7:13 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Kevin Tian, Wei Liu, Ian Campbell,
	Stefano Stabellini, Jun Nakajima, Andrew Cooper, Ian Jackson,
	Aravind Gopalakrishnan, Jan Beulich, Keir Fraser,
	Boris Ostrovsky, Suravee Suthikulpanit

This patch adds the detection and initialization code for VMX TSC
scaling.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
 xen/arch/x86/hvm/vmx/vmcs.c        | 11 +++++++++--
 xen/arch/x86/hvm/vmx/vmx.c         |  9 +++++++++
 xen/include/asm-x86/hvm/vmx/vmcs.h |  7 +++++++
 3 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 08f2078..7172b27 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -144,6 +144,7 @@ static void __init vmx_display_features(void)
     P(cpu_has_vmx_vmfunc, "VM Functions");
     P(cpu_has_vmx_virt_exceptions, "Virtualisation Exceptions");
     P(cpu_has_vmx_pml, "Page Modification Logging");
+    P(cpu_has_vmx_tsc_scaling, "RDTSC(P) Scaling");
 #undef P
 
     if ( !printed )
@@ -236,7 +237,8 @@ static int vmx_init_vmcs_config(void)
                SECONDARY_EXEC_PAUSE_LOOP_EXITING |
                SECONDARY_EXEC_ENABLE_INVPCID |
                SECONDARY_EXEC_ENABLE_VM_FUNCTIONS |
-               SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS);
+               SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS |
+               SECONDARY_EXEC_TSC_SCALING);
         rdmsrl(MSR_IA32_VMX_MISC, _vmx_misc_cap);
         if ( _vmx_misc_cap & VMX_MISC_VMWRITE_ALL )
             opt |= SECONDARY_EXEC_ENABLE_VMCS_SHADOWING;
@@ -965,7 +967,7 @@ static int construct_vmcs(struct vcpu *v)
     __vmwrite(PIN_BASED_VM_EXEC_CONTROL, vmx_pin_based_exec_control);
 
     v->arch.hvm_vmx.exec_control = vmx_cpu_based_exec_control;
-    if ( d->arch.vtsc )
+    if ( d->arch.vtsc && !cpu_has_vmx_tsc_scaling )
         v->arch.hvm_vmx.exec_control |= CPU_BASED_RDTSC_EXITING;
 
     v->arch.hvm_vmx.secondary_exec_control = vmx_secondary_exec_control;
@@ -1239,6 +1241,9 @@ static int construct_vmcs(struct vcpu *v)
         __vmwrite(GUEST_PAT, guest_pat);
     }
 
+    if ( cpu_has_vmx_tsc_scaling )
+        __vmwrite(TSC_MULTIPLIER, VMX_TSC_MULTIPLIER_DEFAULT);
+
     vmx_vmcs_exit(v);
 
     /* PVH: paging mode is updated by arch_set_info_guest(). */
@@ -1805,6 +1810,8 @@ void vmcs_dump_vcpu(struct vcpu *v)
     printk("IDTVectoring: info=%08x errcode=%08x\n",
            vmr32(IDT_VECTORING_INFO), vmr32(IDT_VECTORING_ERROR_CODE));
     printk("TSC Offset = 0x%016lx\n", vmr(TSC_OFFSET));
+    if ( v->arch.hvm_vmx.secondary_exec_control & SECONDARY_EXEC_TSC_SCALING )
+        printk("TSC Multiplier = 0x%016lx\n", vmr(TSC_MULTIPLIER));
     if ( (v->arch.hvm_vmx.exec_control & CPU_BASED_TPR_SHADOW) ||
          (vmx_pin_based_exec_control & PIN_BASED_POSTED_INTERRUPT) )
         printk("TPR Threshold = 0x%02x  PostedIntrVec = 0x%02x\n",
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 624db1c..454440e 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -151,6 +151,8 @@ static int vmx_vcpu_initialise(struct vcpu *v)
     if ( v->vcpu_id == 0 )
         v->arch.user_regs.eax = 1;
 
+    v->arch.tsc_scaling_ratio = VMX_TSC_MULTIPLIER_DEFAULT;
+
     return 0;
 }
 
@@ -1965,6 +1967,10 @@ static struct hvm_function_table __initdata vmx_function_table = {
     .altp2m_vcpu_emulate_vmfunc = vmx_vcpu_emulate_vmfunc,
     /* support for VMX RDTSC(P) scaling */
     .tsc_scaling_supported       = 0,
+    .default_tsc_scaling_ratio   = VMX_TSC_MULTIPLIER_DEFAULT,
+    .max_tsc_scaling_ratio       = VMX_TSC_MULTIPLIER_MAX,
+    .tsc_scaling_ratio_frac_bits = 48,
+    .tsc_scaling_ratio_rsvd      = 0x0ULL,
 };
 
 const struct hvm_function_table * __init start_vmx(void)
@@ -2017,6 +2023,9 @@ const struct hvm_function_table * __init start_vmx(void)
          && cpu_has_vmx_secondary_exec_control )
         vmx_function_table.pvh_supported = 1;
 
+    if ( cpu_has_vmx_tsc_scaling )
+        vmx_function_table.tsc_scaling_supported = 1;
+
     setup_vmcs_dump();
 
     return &vmx_function_table;
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index f1126d4..d478584 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -225,6 +225,7 @@ extern u32 vmx_vmentry_control;
 #define SECONDARY_EXEC_ENABLE_VMCS_SHADOWING    0x00004000
 #define SECONDARY_EXEC_ENABLE_PML               0x00020000
 #define SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS   0x00040000
+#define SECONDARY_EXEC_TSC_SCALING              0x02000000
 extern u32 vmx_secondary_exec_control;
 
 #define VMX_EPT_EXEC_ONLY_SUPPORTED             0x00000001
@@ -248,6 +249,9 @@ extern u32 vmx_secondary_exec_control;
 
 #define VMX_MISC_CR3_TARGET             0x1ff0000
 
+#define VMX_TSC_MULTIPLIER_DEFAULT 0x0001000000000000ULL
+#define VMX_TSC_MULTIPLIER_MAX     0xffffffffffffffffULL
+
 #define cpu_has_wbinvd_exiting \
     (vmx_secondary_exec_control & SECONDARY_EXEC_WBINVD_EXITING)
 #define cpu_has_vmx_virtualize_apic_accesses \
@@ -291,6 +295,8 @@ extern u32 vmx_secondary_exec_control;
     (vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS)
 #define cpu_has_vmx_pml \
     (vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_PML)
+#define cpu_has_vmx_tsc_scaling \
+    (vmx_secondary_exec_control & SECONDARY_EXEC_TSC_SCALING)
 
 #define VMCS_RID_TYPE_MASK              0x80000000
 
@@ -365,6 +371,7 @@ enum vmcs_field {
     VMREAD_BITMAP                   = 0x00002026,
     VMWRITE_BITMAP                  = 0x00002028,
     VIRT_EXCEPTION_INFO             = 0x0000202a,
+    TSC_MULTIPLIER                  = 0x00002032,
     GUEST_PHYSICAL_ADDRESS          = 0x00002400,
     VMCS_LINK_POINTER               = 0x00002800,
     GUEST_IA32_DEBUGCTL             = 0x00002802,
-- 
2.4.8


* [PATCH 11/13] vmx: Use scaled host TSC to calculate TSC offset
  2015-09-28  7:13 [PATCH 00/13] Add VMX TSC scaling support Haozhong Zhang
                   ` (9 preceding siblings ...)
  2015-09-28  7:13 ` [PATCH 10/13] vmx: Detect and initialize VMX RDTSC(P) scaling Haozhong Zhang
@ 2015-09-28  7:13 ` Haozhong Zhang
  2015-10-22 15:55   ` Boris Ostrovsky
  2015-10-27 13:29   ` Jan Beulich
  2015-09-28  7:13 ` [PATCH 12/13] vmx: Add a call-back to apply TSC scaling ratio to hardware Haozhong Zhang
                   ` (3 subsequent siblings)
  14 siblings, 2 replies; 117+ messages in thread
From: Haozhong Zhang @ 2015-09-28  7:13 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Kevin Tian, Wei Liu, Ian Campbell,
	Stefano Stabellini, Jun Nakajima, Andrew Cooper, Ian Jackson,
	Aravind Gopalakrishnan, Jan Beulich, Keir Fraser,
	Boris Ostrovsky, Suravee Suthikulpanit

If VMX TSC scaling is enabled and no TSC emulation is used,
vmx_set_tsc_offset() will calculate the TSC offset by subtracting the
scaled host TSC from the current guest TSC.
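
For illustration only (not part of the patch): a minimal sketch of the
offset computation described above, assuming the 48 fractional bits
used by the VMX TSC multiplier. The sketch_* names are hypothetical
stand-ins, not the hvm_scale_tsc() implementation from this series, and
unsigned __int128 is a GCC/Clang extension.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_FRAC_BITS 48 /* fractional bits of the VMX TSC multiplier */

/* (host_tsc * multiplier) >> 48, computed in 128 bits to avoid overflow. */
static uint64_t sketch_scale_tsc(uint64_t host_tsc, uint64_t multiplier)
{
    /* A zero multiplier is treated as a caller error in this sketch. */
    assert(multiplier != 0);
    return (uint64_t)(((unsigned __int128)host_tsc * multiplier)
                      >> SKETCH_FRAC_BITS);
}

/*
 * Offset such that scaled host TSC + offset == guest TSC.
 * The subtraction wraps modulo 2^64, matching 64-bit TSC arithmetic.
 */
static uint64_t sketch_tsc_offset(uint64_t guest_tsc, uint64_t host_tsc,
                                  uint64_t multiplier)
{
    return guest_tsc - sketch_scale_tsc(host_tsc, multiplier);
}
```

With a multiplier of 1ULL << 48 (ratio 1.0) this reduces to the
unscaled case, guest_tsc - host_tsc.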

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
 xen/arch/x86/hvm/vmx/vmx.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 454440e..163974d 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1102,11 +1102,26 @@ static void vmx_handle_cd(struct vcpu *v, unsigned long value)
 
 static void vmx_set_tsc_offset(struct vcpu *v, u64 offset, u64 at_tsc)
 {
+    uint64_t host_tsc, guest_tsc;
+    struct domain *d = v->domain;
+
+    guest_tsc = hvm_get_guest_tsc_fixed(v, at_tsc);
+
+    if ( cpu_has_vmx_tsc_scaling && !d->arch.vtsc )
+    {
+        host_tsc = at_tsc ? at_tsc : rdtsc();
+        offset = guest_tsc - hvm_scale_tsc(v, host_tsc);
+    }
+
     vmx_vmcs_enter(v);
 
+    if ( !nestedhvm_enabled(d) )
+        goto out;
+
     if ( nestedhvm_vcpu_in_guestmode(v) )
         offset += nvmx_get_tsc_offset(v);
 
+out:
     __vmwrite(TSC_OFFSET, offset);
     vmx_vmcs_exit(v);
 }
-- 
2.4.8


* [PATCH 12/13] vmx: Add a call-back to apply TSC scaling ratio to hardware
  2015-09-28  7:13 [PATCH 00/13] Add VMX TSC scaling support Haozhong Zhang
                   ` (10 preceding siblings ...)
  2015-09-28  7:13 ` [PATCH 11/13] vmx: Use scaled host TSC to calculate TSC offset Haozhong Zhang
@ 2015-09-28  7:13 ` Haozhong Zhang
  2015-09-28 16:02   ` Boris Ostrovsky
  2015-10-27 13:33   ` Jan Beulich
  2015-09-28  7:13 ` [PATCH 13/13] tools/libxl: Add 'vtsc_khz' option to set guest TSC rate Haozhong Zhang
                   ` (2 subsequent siblings)
  14 siblings, 2 replies; 117+ messages in thread
From: Haozhong Zhang @ 2015-09-28  7:13 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Kevin Tian, Wei Liu, Ian Campbell,
	Stefano Stabellini, Jun Nakajima, Andrew Cooper, Ian Jackson,
	Aravind Gopalakrishnan, Jan Beulich, Keir Fraser,
	Boris Ostrovsky, Suravee Suthikulpanit

This patch adds a new call-back setup_tsc_scaling in struct
hvm_function_table to apply the TSC scaling ratio to hardware. For VMX,
it writes the TSC scaling ratio to VMCS field TSC_MULTIPLIER.
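
For illustration only (not part of the patch): a minimal sketch of how
a ratio suitable for TSC_MULTIPLIER could be derived from the guest and
host TSC rates, assuming a fixed-point format with 48 fractional bits.
sketch_tsc_ratio() is a hypothetical helper; the series computes the
ratio in hvm_setup_tsc_scaling(), whose body is not shown in this
message. unsigned __int128 is a GCC/Clang extension.

```c
#include <assert.h>
#include <stdint.h>

/*
 * ratio = guest_khz / host_khz as a fixed-point number with 48
 * fractional bits. Returns 0 if the result does not fit in 64 bits,
 * which this sketch uses to signal "cannot be represented".
 */
static uint64_t sketch_tsc_ratio(uint32_t guest_khz, uint32_t host_khz)
{
    unsigned __int128 ratio;

    assert(host_khz != 0);
    ratio = ((unsigned __int128)guest_khz << 48) / host_khz;

    return ratio > UINT64_MAX ? 0 : (uint64_t)ratio;
}
```

Equal rates yield 0x0001000000000000, the same value as
VMX_TSC_MULTIPLIER_DEFAULT in patch 10.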

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
 xen/arch/x86/hvm/hvm.c        | 1 +
 xen/arch/x86/hvm/svm/svm.c    | 5 +++++
 xen/arch/x86/hvm/vmx/vmx.c    | 8 ++++++++
 xen/include/asm-x86/hvm/hvm.h | 3 +++
 4 files changed, 17 insertions(+)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 3522d20..2d8a148 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -376,6 +376,7 @@ void hvm_setup_tsc_scaling(struct vcpu *v)
     }
 
     v->arch.tsc_scaling_ratio = ratio;
+    hvm_funcs.setup_tsc_scaling(v);
 }
 
 void hvm_set_guest_tsc_fixed(struct vcpu *v, u64 guest_tsc, u64 at_tsc)
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 73bc863..d890c1f 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -2236,6 +2236,10 @@ static void svm_invlpg_intercept(unsigned long vaddr)
     svm_asid_g_invlpg(curr, vaddr);
 }
 
+static void svm_setup_tsc_scaling(struct vcpu *v)
+{
+}
+
 static struct hvm_function_table __initdata svm_function_table = {
     .name                 = "SVM",
     .cpu_up_prepare       = svm_cpu_up_prepare,
@@ -2289,6 +2293,7 @@ static struct hvm_function_table __initdata svm_function_table = {
     .max_tsc_scaling_ratio       = MAX_TSC_RATIO,
     .tsc_scaling_ratio_frac_bits = 32,
     .tsc_scaling_ratio_rsvd      = TSC_RATIO_RSVD_BITS,
+    .setup_tsc_scaling           = svm_setup_tsc_scaling,
 };
 
 void svm_vmexit_handler(struct cpu_user_regs *regs)
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 163974d..c4a7b81 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1100,6 +1100,13 @@ static void vmx_handle_cd(struct vcpu *v, unsigned long value)
     }
 }
 
+static void vmx_setup_tsc_scaling(struct vcpu *v)
+{
+    vmx_vmcs_enter(v);
+    __vmwrite(TSC_MULTIPLIER, v->arch.tsc_scaling_ratio);
+    vmx_vmcs_exit(v);
+}
+
 static void vmx_set_tsc_offset(struct vcpu *v, u64 offset, u64 at_tsc)
 {
     uint64_t host_tsc, guest_tsc;
@@ -1986,6 +1993,7 @@ static struct hvm_function_table __initdata vmx_function_table = {
     .max_tsc_scaling_ratio       = VMX_TSC_MULTIPLIER_MAX,
     .tsc_scaling_ratio_frac_bits = 48,
     .tsc_scaling_ratio_rsvd      = 0x0ULL,
+    .setup_tsc_scaling           = vmx_setup_tsc_scaling,
 };
 
 const struct hvm_function_table * __init start_vmx(void)
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index f63fe93..9f8b6d5 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -226,6 +226,9 @@ struct hvm_function_table {
     void (*altp2m_vcpu_update_vmfunc_ve)(struct vcpu *v);
     bool_t (*altp2m_vcpu_emulate_ve)(struct vcpu *v);
     int (*altp2m_vcpu_emulate_vmfunc)(struct cpu_user_regs *regs);
+
+    /* setup TSC scaling */
+    void (*setup_tsc_scaling)(struct vcpu *v);
 };
 
 extern struct hvm_function_table hvm_funcs;
-- 
2.4.8


* [PATCH 13/13] tools/libxl: Add 'vtsc_khz' option to set guest TSC rate
  2015-09-28  7:13 [PATCH 00/13] Add VMX TSC scaling support Haozhong Zhang
                   ` (11 preceding siblings ...)
  2015-09-28  7:13 ` [PATCH 12/13] vmx: Add a call-back to apply TSC scaling ratio to hardware Haozhong Zhang
@ 2015-09-28  7:13 ` Haozhong Zhang
  2015-09-28 11:47   ` Julien Grall
                     ` (2 more replies)
  2015-09-28 10:51 ` [PATCH 00/13] Add VMX TSC scaling support Andrew Cooper
  2015-11-22 17:54 ` Haozhong Zhang
  14 siblings, 3 replies; 117+ messages in thread
From: Haozhong Zhang @ 2015-09-28  7:13 UTC (permalink / raw)
  To: xen-devel
  Cc: Haozhong Zhang, Kevin Tian, Wei Liu, Ian Campbell,
	Stefano Stabellini, Jun Nakajima, Andrew Cooper, Ian Jackson,
	Aravind Gopalakrishnan, Jan Beulich, Keir Fraser,
	Boris Ostrovsky, Suravee Suthikulpanit

This patch adds an option 'vtsc_khz' to allow users to set vcpu's TSC
rate in KHz. In the case that tsc_mode = 'default', the default value of
'vtsc_khz' option is the host TSC rate which is used when 'vtsc_khz'
option is set to 0 or does not appear in the configuration. In all other
cases of tsc_mode, 'vtsc_khz' option is just ignored.

Another purpose of adding this option is to keep the vcpu's TSC rate
across guest reboot. In existing code, a new domain is created from the
configuration of the previous domain which was just rebooted. The vcpu's
TSC rate is not stored in the configuration, so the host TSC rate is
used as the vcpu's TSC rate. This works fine unless the previous domain
was migrated from another host machine with a different host TSC rate
than the current one.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
 tools/libxl/libxl_types.idl |  1 +
 tools/libxl/libxl_x86.c     |  4 +++-
 tools/libxl/xl_cmdimpl.c    | 22 ++++++++++++++++++++++
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 9f6ec00..91cb0be 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -413,6 +413,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
     ("vcpu_soft_affinity", Array(libxl_bitmap, "num_vcpu_soft_affinity")),
     ("numa_placement",  libxl_defbool),
     ("tsc_mode",        libxl_tsc_mode),
+    ("vtsc_khz",        uint32),
     ("max_memkb",       MemKB),
     ("target_memkb",    MemKB),
     ("video_memkb",     MemKB),
diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index 896f34c..7baaee4 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -276,6 +276,7 @@ int libxl__arch_domain_create(libxl__gc *gc, libxl_domain_config *d_config,
 {
     int ret = 0;
     int tsc_mode;
+    uint32_t vtsc_khz;
     uint32_t rtc_timeoffset;
     libxl_ctx *ctx = libxl__gc_owner(gc);
 
@@ -300,7 +301,8 @@ int libxl__arch_domain_create(libxl__gc *gc, libxl_domain_config *d_config,
     default:
         abort();
     }
-    xc_domain_set_tsc_info(ctx->xch, domid, tsc_mode, 0, 0, 0);
+    vtsc_khz = d_config->b_info.vtsc_khz;
+    xc_domain_set_tsc_info(ctx->xch, domid, tsc_mode, 0, vtsc_khz, 0);
     if (libxl_defbool_val(d_config->b_info.disable_migrate))
         xc_domain_disable_migrate(ctx->xch, domid);
     rtc_timeoffset = d_config->b_info.rtc_timeoffset;
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 2706759..5fabda7 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -1462,6 +1462,28 @@ static void parse_config_data(const char *config_source,
         }
     }
 
+    /* "vtsc_khz" option works only if "tsc_mode" option is
+     * "default". In this case, if "vtsc_khz" option is set to 0, we
+     * will reset it to the host TSC rate. In all other cases, we just
+     * ignore any given value and always set it to 0.
+     */
+    if (!xlu_cfg_get_long(config, "vtsc_khz", &l, 0))
+        b_info->vtsc_khz = l;
+    if (b_info->tsc_mode == LIBXL_TSC_MODE_DEFAULT) {
+        if (b_info->vtsc_khz == 0) {
+            libxl_physinfo physinfo;
+            if (!libxl_get_physinfo(ctx, &physinfo))
+                b_info->vtsc_khz = physinfo.cpu_khz;
+            else
+                fprintf(stderr, "WARNING: cannot get host TSC rate.\n");
+        }
+    } else {
+        fprintf(stderr, "WARNING: ignoring \"vtsc_khz\" option. "
+                "\"vtsc_khz\" option works only if "
+                "\"tsc_mode\" option is \"default\".\n");
+        b_info->vtsc_khz = 0;
+    }
+
     if (!xlu_cfg_get_long(config, "rtc_timeoffset", &l, 0))
         b_info->rtc_timeoffset = l;
 
-- 
2.4.8


* Re: [PATCH 00/13] Add VMX TSC scaling support
  2015-09-28  7:13 [PATCH 00/13] Add VMX TSC scaling support Haozhong Zhang
                   ` (12 preceding siblings ...)
  2015-09-28  7:13 ` [PATCH 13/13] tools/libxl: Add 'vtsc_khz' option to set guest TSC rate Haozhong Zhang
@ 2015-09-28 10:51 ` Andrew Cooper
  2015-09-28 13:48   ` Boris Ostrovsky
  2015-11-22 17:54 ` Haozhong Zhang
  14 siblings, 1 reply; 117+ messages in thread
From: Andrew Cooper @ 2015-09-28 10:51 UTC (permalink / raw)
  To: Haozhong Zhang, xen-devel
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Ian Jackson, Aravind Gopalakrishnan, Jan Beulich,
	Keir Fraser, Boris Ostrovsky, Suravee Suthikulpanit

On 28/09/15 08:13, Haozhong Zhang wrote:
> This patchset adds support for VMX TSC scaling feature which is
> available on Intel Skylake CPU. The specification of VMX TSC scaling
> can be found at
> http://www.intel.com/content/www/us/en/processors/timestamp-counter-scaling-virtualization-white-paper.html
>
> VMX TSC scaling allows guest TSC which is read by guest rdtsc(p)
> instructions increases in a rate that is customized by the hypervisor
> and can be different than the host TSC rate. Basically, VMX TSC
> scaling adds a 64-bit field called TSC multiplier in VMCS so that, if
> VMX TSC scaling is enabled, TSC read by guest rdtsc(p) instructions
> will be calculated by the following formula:
>
>   guest EDX:EAX = (Host TSC * TSC multiplier) >> 48 + VMX TSC Offset
>
> where, Host TSC = Host MSR_IA32_TSC + Host MSR_IA32_TSC_ADJUST.
>
> This patchset is composed of following four parts.
>   1. PATCH 01 - 02 fix bugs in tsc_get_info() which could result in
>      errors when VMX TSC scaling is used.
>      
>   2. PATCH 03 - 09 add/move the common parts of VMX TSC scaling and
>      SVM TSC ratio to hvm.c and x86/time.c.
>      
>   3. PATCH 10 - 12 implement the VMX-specific code for supporting VMX
>      TSC scaling.
>      
>   4. PATCH 13 adapts libxl for VMX TSC scaling (as well as SVM TSC
>      ratio).

Thank you for this series.  I have had a brief look over it and it
appears to be in good shape, but I have not done a thorough review yet.

Konrad/Boris: As Oracle are the main users of the more interesting guest
timing modes, do you have tests to verify correct functioning?

~Andrew


* Re: [PATCH 13/13] tools/libxl: Add 'vtsc_khz' option to set guest TSC rate
  2015-09-28  7:13 ` [PATCH 13/13] tools/libxl: Add 'vtsc_khz' option to set guest TSC rate Haozhong Zhang
@ 2015-09-28 11:47   ` Julien Grall
  2015-09-28 12:11     ` Haozhong Zhang
  2015-09-28 14:19   ` Wei Liu
  2015-09-29 10:04   ` Ian Campbell
  2 siblings, 1 reply; 117+ messages in thread
From: Julien Grall @ 2015-09-28 11:47 UTC (permalink / raw)
  To: Haozhong Zhang, xen-devel
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Ian Jackson, Jan Beulich, Aravind Gopalakrishnan,
	Jun Nakajima, Keir Fraser, Boris Ostrovsky,
	Suravee Suthikulpanit

Hi,

On 28/09/15 08:13, Haozhong Zhang wrote:
> This patch adds an option 'vtsc_khz' to allow users to set vcpu's TSC
> rate in KHz. In the case that tsc_mode = 'default', the default value of
> 'vtsc_khz' option is the host TSC rate which is used when 'vtsc_khz'
> option is set to 0 or does not appear in the configuration. In all other
> cases of tsc_mode, 'vtsc_khz' option is just ignored.
> 
> Another purpose of adding this option is to keep the vcpu's TSC rate
> across guest reboot. In existing code, a new domain is created from the
> configuration of the previous domain which was just rebooted. The vcpu's
> TSC rate is not stored in the configuration, so the host TSC rate is
> used as the vcpu's TSC rate. This works fine unless the previous domain
> was migrated from another host machine with a different host TSC rate
> than the current one.
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> ---
>  tools/libxl/libxl_types.idl |  1 +
>  tools/libxl/libxl_x86.c     |  4 +++-
>  tools/libxl/xl_cmdimpl.c    | 22 ++++++++++++++++++++++
>  3 files changed, 26 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index 9f6ec00..91cb0be 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -413,6 +413,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
>      ("vcpu_soft_affinity", Array(libxl_bitmap, "num_vcpu_soft_affinity")),
>      ("numa_placement",  libxl_defbool),
>      ("tsc_mode",        libxl_tsc_mode),
> +    ("vtsc_khz",        uint32),

This is x86 specific, can we begin to move anything arch-specific under
arch_foo? See arch_arm for instance.

Also, you would need to add a new define LIBXL_HAVE_foo to allow
developers writing apps on top of libxl to support multiple versions of
Xen. See libxl.h

>      ("max_memkb",       MemKB),
>      ("target_memkb",    MemKB),
>      ("video_memkb",     MemKB),
> diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
> index 896f34c..7baaee4 100644
> --- a/tools/libxl/libxl_x86.c
> +++ b/tools/libxl/libxl_x86.c
> @@ -276,6 +276,7 @@ int libxl__arch_domain_create(libxl__gc *gc, libxl_domain_config *d_config,
>  {
>      int ret = 0;
>      int tsc_mode;
> +    uint32_t vtsc_khz;
>      uint32_t rtc_timeoffset;
>      libxl_ctx *ctx = libxl__gc_owner(gc);
>  
> @@ -300,7 +301,8 @@ int libxl__arch_domain_create(libxl__gc *gc, libxl_domain_config *d_config,
>      default:
>          abort();
>      }
> -    xc_domain_set_tsc_info(ctx->xch, domid, tsc_mode, 0, 0, 0);
> +    vtsc_khz = d_config->b_info.vtsc_khz;
> +    xc_domain_set_tsc_info(ctx->xch, domid, tsc_mode, 0, vtsc_khz, 0);
>      if (libxl_defbool_val(d_config->b_info.disable_migrate))
>          xc_domain_disable_migrate(ctx->xch, domid);
>      rtc_timeoffset = d_config->b_info.rtc_timeoffset;
> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> index 2706759..5fabda7 100644
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -1462,6 +1462,28 @@ static void parse_config_data(const char *config_source,
>          }
>      }
>  
> +    /* "vtsc_khz" option works only if "tsc_mode" option is
> +     * "default". In this case, if "vtsc_khz" option is set to 0, we
> +     * will reset it to the host TSC rate. In all other cases, we just
> +     * ignore any given value and always set it to 0.
> +     */
> +    if (!xlu_cfg_get_long(config, "vtsc_khz", &l, 0))

This option has to be documented in docs/man/xl.cfg.pod.5.

> +        b_info->vtsc_khz = l;
> +    if (b_info->tsc_mode == LIBXL_TSC_MODE_DEFAULT) {
> +        if (b_info->vtsc_khz == 0) {
> +            libxl_physinfo physinfo;
> +            if (!libxl_get_physinfo(ctx, &physinfo))
> +                b_info->vtsc_khz = physinfo.cpu_khz;
> +            else
> +                fprintf(stderr, "WARNING: cannot get host TSC rate.\n");
> +        }
> +    } else {
> +        fprintf(stderr, "WARNING: ignoring \"vtsc_khz\" option. "
> +                "\"vtsc_khz\" option works only if "
> +                "\"tsc_mode\" option is \"default\".\n");
> +        b_info->vtsc_khz = 0;
> +    }
> +
>      if (!xlu_cfg_get_long(config, "rtc_timeoffset", &l, 0))
>          b_info->rtc_timeoffset = l;
>  
> 

Regards,


-- 
Julien Grall


* Re: [PATCH 13/13] tools/libxl: Add 'vtsc_khz' option to set guest TSC rate
  2015-09-28 11:47   ` Julien Grall
@ 2015-09-28 12:11     ` Haozhong Zhang
  0 siblings, 0 replies; 117+ messages in thread
From: Haozhong Zhang @ 2015-09-28 12:11 UTC (permalink / raw)
  To: Julien Grall
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Ian Jackson, xen-devel, Jan Beulich,
	Aravind Gopalakrishnan, Jun Nakajima, Keir Fraser,
	Boris Ostrovsky, Suravee Suthikulpanit

Hi Julien,

Thank you for the comments!

On Mon, Sep 28, 2015 at 12:47:24PM +0100, Julien Grall wrote:
> Hi,
> 
> On 28/09/15 08:13, Haozhong Zhang wrote:
> > This patch adds an option 'vtsc_khz' to allow users to set vcpu's TSC
> > rate in KHz. In the case that tsc_mode = 'default', the default value of
> > 'vtsc_khz' option is the host TSC rate which is used when 'vtsc_khz'
> > option is set to 0 or does not appear in the configuration. In all other
> > cases of tsc_mode, 'vtsc_khz' option is just ignored.
> > 
> > Another purpose of adding this option is to keep the vcpu's TSC rate
> > across guest reboot. In existing code, a new domain is created from the
> > configuration of the previous domain which was just rebooted. The vcpu's
> > TSC rate is not stored in the configuration, so the host TSC rate is
> > used as the vcpu's TSC rate. This works fine unless the previous domain
> > was migrated from another host machine with a different host TSC rate
> > than the current one.
> > 
> > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> > ---
> >  tools/libxl/libxl_types.idl |  1 +
> >  tools/libxl/libxl_x86.c     |  4 +++-
> >  tools/libxl/xl_cmdimpl.c    | 22 ++++++++++++++++++++++
> >  3 files changed, 26 insertions(+), 1 deletion(-)
> > 
> > diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> > index 9f6ec00..91cb0be 100644
> > --- a/tools/libxl/libxl_types.idl
> > +++ b/tools/libxl/libxl_types.idl
> > @@ -413,6 +413,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
> >      ("vcpu_soft_affinity", Array(libxl_bitmap, "num_vcpu_soft_affinity")),
> >      ("numa_placement",  libxl_defbool),
> >      ("tsc_mode",        libxl_tsc_mode),
> > +    ("vtsc_khz",        uint32),
> 
> This is x86 specific, can we begin to move anything arch-specific under
> arch_foo? See arch_arm for instance.
>

I'll have a look at arch_arm.

> Also, you would need to add a new define LIBXL_HAVE_foo to allow
> developers writing apps on top of libxl to support multiple versions of
> Xen. See libxl.h
>

I'll add this.

> >      ("max_memkb",       MemKB),
> >      ("target_memkb",    MemKB),
> >      ("video_memkb",     MemKB),
> > diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
> > index 896f34c..7baaee4 100644
> > --- a/tools/libxl/libxl_x86.c
> > +++ b/tools/libxl/libxl_x86.c
> > @@ -276,6 +276,7 @@ int libxl__arch_domain_create(libxl__gc *gc, libxl_domain_config *d_config,
> >  {
> >      int ret = 0;
> >      int tsc_mode;
> > +    uint32_t vtsc_khz;
> >      uint32_t rtc_timeoffset;
> >      libxl_ctx *ctx = libxl__gc_owner(gc);
> >  
> > @@ -300,7 +301,8 @@ int libxl__arch_domain_create(libxl__gc *gc, libxl_domain_config *d_config,
> >      default:
> >          abort();
> >      }
> > -    xc_domain_set_tsc_info(ctx->xch, domid, tsc_mode, 0, 0, 0);
> > +    vtsc_khz = d_config->b_info.vtsc_khz;
> > +    xc_domain_set_tsc_info(ctx->xch, domid, tsc_mode, 0, vtsc_khz, 0);
> >      if (libxl_defbool_val(d_config->b_info.disable_migrate))
> >          xc_domain_disable_migrate(ctx->xch, domid);
> >      rtc_timeoffset = d_config->b_info.rtc_timeoffset;
> > diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> > index 2706759..5fabda7 100644
> > --- a/tools/libxl/xl_cmdimpl.c
> > +++ b/tools/libxl/xl_cmdimpl.c
> > @@ -1462,6 +1462,28 @@ static void parse_config_data(const char *config_source,
> >          }
> >      }
> >  
> > +    /* "vtsc_khz" option works only if "tsc_mode" option is
> > +     * "default". In this case, if "vtsc_khz" option is set to 0, we
> > +     * will reset it to the host TSC rate. In all other cases, we just
> > +     * ignore any given value and always set it to 0.
> > +     */
> > +    if (!xlu_cfg_get_long(config, "vtsc_khz", &l, 0))
> 
> This option has to be documented in docs/man/xl.cfg.pod.5.
>

I'll add the documentation.

> > +        b_info->vtsc_khz = l;
> > +    if (b_info->tsc_mode == LIBXL_TSC_MODE_DEFAULT) {
> > +        if (b_info->vtsc_khz == 0) {
> > +            libxl_physinfo physinfo;
> > +            if (!libxl_get_physinfo(ctx, &physinfo))
> > +                b_info->vtsc_khz = physinfo.cpu_khz;
> > +            else
> > +                fprintf(stderr, "WARNING: cannot get host TSC rate.\n");
> > +        }
> > +    } else {
> > +        fprintf(stderr, "WARNING: ignoring \"vtsc_khz\" option. "
> > +                "\"vtsc_khz\" option works only if "
> > +                "\"tsc_mode\" option is \"default\".\n");
> > +        b_info->vtsc_khz = 0;
> > +    }
> > +
> >      if (!xlu_cfg_get_long(config, "rtc_timeoffset", &l, 0))
> >          b_info->rtc_timeoffset = l;
> >  
> > 
> 
> Regards,
> 
> 
> -- 
> Julien Grall

- Haozhong Zhang


* Re: [PATCH 00/13] Add VMX TSC scaling support
  2015-09-28 10:51 ` [PATCH 00/13] Add VMX TSC scaling support Andrew Cooper
@ 2015-09-28 13:48   ` Boris Ostrovsky
  0 siblings, 0 replies; 117+ messages in thread
From: Boris Ostrovsky @ 2015-09-28 13:48 UTC (permalink / raw)
  To: Andrew Cooper, Haozhong Zhang, xen-devel
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Ian Jackson, Aravind Gopalakrishnan, Jan Beulich,
	Keir Fraser, Suravee Suthikulpanit

On 09/28/2015 06:51 AM, Andrew Cooper wrote:
> On 28/09/15 08:13, Haozhong Zhang wrote:
>> This patchset adds support for VMX TSC scaling feature which is
>> available on Intel Skylake CPU. The specification of VMX TSC scaling
>> can be found at
>> http://www.intel.com/content/www/us/en/processors/timestamp-counter-scaling-virtualization-white-paper.html
>>
>> VMX TSC scaling allows guest TSC which is read by guest rdtsc(p)
>> instructions increases in a rate that is customized by the hypervisor
>> and can be different than the host TSC rate. Basically, VMX TSC
>> scaling adds a 64-bit field called TSC multiplier in VMCS so that, if
>> VMX TSC scaling is enabled, TSC read by guest rdtsc(p) instructions
>> will be calculated by the following formula:
>>
>>    guest EDX:EAX = (Host TSC * TSC multiplier) >> 48 + VMX TSC Offset
>>
>> where, Host TSC = Host MSR_IA32_TSC + Host MSR_IA32_TSC_ADJUST.
>>
>> This patchset is composed of following four parts.
>>    1. PATCH 01 - 02 fix bugs in tsc_get_info() which could result in
>>       errors when VMX TSC scaling is used.
>>       
>>    2. PATCH 03 - 09 add/move the common parts of VMX TSC scaling and
>>       SVM TSC ratio to hvm.c and x86/time.c.
>>       
>>    3. PATCH 10 - 12 implement the VMX-specific code for supporting VMX
>>       TSC scaling.
>>       
>>    4. PATCH 13 adapts libxl for VMX TSC scaling (as well as SVM TSC
>>       ratio).
> Thank you for this series.  I have had a brief look over it and it
> appears to be in good shape, but I have not done a thorough review yet.
>
> Konrad/Boris: As Oracle are the main users of the more interesting guest
> timing modes, do you have tests to verify correct functioning?

Not really. The most interesting thing to test here, I think, would be
migration between differently-clocked systems, and we only do save/restore.
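
For anyone putting together such a test, the scaling formula quoted from
the cover letter can be modelled in plain C. This is only a sketch under
stated assumptions, not code from the series: the helper names
(scale_tsc, guest_tsc, tsc_ratio) are made up for illustration, and the
128-bit intermediate relies on the GCC/Clang unsigned __int128
extension. Note that the formula as written in the cover letter would
parse differently in C, since ">>" binds less tightly than "+"; the
sketch assumes the intended meaning is ((Host TSC * multiplier) >> 48)
plus the offset.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Fixed-point scaling: (host_tsc * multiplier) >> frac_bits.
 * frac_bits is 48 for VMX and 32 for SVM according to the
 * tsc_scaling_ratio_frac_bits values in patch 12. The product can
 * exceed 64 bits, so a 128-bit intermediate is used (compiler
 * extension, an assumption of this sketch).
 */
static uint64_t scale_tsc(uint64_t host_tsc, uint64_t multiplier,
                          unsigned int frac_bits)
{
    assert(frac_bits < 64);
    return (uint64_t)(((unsigned __int128)host_tsc * multiplier) >> frac_bits);
}

/* Guest-visible TSC: scaled host TSC plus the TSC offset. */
static uint64_t guest_tsc(uint64_t host_tsc, uint64_t multiplier,
                          unsigned int frac_bits, uint64_t offset)
{
    return scale_tsc(host_tsc, multiplier, frac_bits) + offset;
}

/*
 * Hypothetical helper: the ratio guest_khz / host_khz expressed as a
 * fixed-point number with frac_bits fractional bits.
 */
static uint64_t tsc_ratio(uint32_t guest_khz, uint32_t host_khz,
                          unsigned int frac_bits)
{
    assert(host_khz != 0 && frac_bits < 64);
    return (uint64_t)(((unsigned __int128)guest_khz << frac_bits) / host_khz);
}
```

A migration test between differently-clocked hosts would then amount to
checking that the guest keeps seeing its original rate, e.g. that
scale_tsc(delta, tsc_ratio(guest_khz, new_host_khz, 48), 48) tracks
delta * guest_khz / new_host_khz on the destination.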

-boris

* Re: [PATCH 13/13] tools/libxl: Add 'vtsc_khz' option to set guest TSC rate
  2015-09-28  7:13 ` [PATCH 13/13] tools/libxl: Add 'vtsc_khz' option to set guest TSC rate Haozhong Zhang
  2015-09-28 11:47   ` Julien Grall
@ 2015-09-28 14:19   ` Wei Liu
  2015-09-29  0:40     ` Haozhong Zhang
  2015-09-29 10:04   ` Ian Campbell
  2 siblings, 1 reply; 117+ messages in thread
From: Wei Liu @ 2015-09-28 14:19 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Ian Jackson, xen-devel,
	Aravind Gopalakrishnan, Jan Beulich, Keir Fraser,
	Boris Ostrovsky, Suravee Suthikulpanit

On Mon, Sep 28, 2015 at 03:13:58PM +0800, Haozhong Zhang wrote:
> This patch adds an option 'vtsc_khz' to allow users to set vcpu's TSC
> rate in KHz. In the case that tsc_mode = 'default', the default value of
> 'vtsc_khz' option is the host TSC rate which is used when 'vtsc_khz'
> option is set to 0 or does not appear in the configuration. In all other
> cases of tsc_mode, 'vtsc_khz' option is just ignored.
> 
> Another purpose of adding this option is to keep vcpu's TSC rate across
> guest reboot. In existing code, a new domain is created from the
> configuration of the previous domain which was just rebooted. vcpu's TSC
> rate is not stored in the configuration and the host TSC rate is then
> used as vcpu's TSC rate. This works fine unless the previous domain was
> migrated from another host machine with a different host TSC rate than
> the current one.
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> ---
>  tools/libxl/libxl_types.idl |  1 +
>  tools/libxl/libxl_x86.c     |  4 +++-
>  tools/libxl/xl_cmdimpl.c    | 22 ++++++++++++++++++++++
>  3 files changed, 26 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index 9f6ec00..91cb0be 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -413,6 +413,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
>      ("vcpu_soft_affinity", Array(libxl_bitmap, "num_vcpu_soft_affinity")),
>      ("numa_placement",  libxl_defbool),
>      ("tsc_mode",        libxl_tsc_mode),
> +    ("vtsc_khz",        uint32),

This field should be in an arch-specific substructure, i.e. "hvm".

>      ("max_memkb",       MemKB),
>      ("target_memkb",    MemKB),
>      ("video_memkb",     MemKB),
> diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
> index 896f34c..7baaee4 100644
> --- a/tools/libxl/libxl_x86.c
> +++ b/tools/libxl/libxl_x86.c
> @@ -276,6 +276,7 @@ int libxl__arch_domain_create(libxl__gc *gc, libxl_domain_config *d_config,
>  {
>      int ret = 0;
>      int tsc_mode;
> +    uint32_t vtsc_khz;
>      uint32_t rtc_timeoffset;
>      libxl_ctx *ctx = libxl__gc_owner(gc);
>  
> @@ -300,7 +301,8 @@ int libxl__arch_domain_create(libxl__gc *gc, libxl_domain_config *d_config,
>      default:
>          abort();
>      }
> -    xc_domain_set_tsc_info(ctx->xch, domid, tsc_mode, 0, 0, 0);
> +    vtsc_khz = d_config->b_info.vtsc_khz;
> +    xc_domain_set_tsc_info(ctx->xch, domid, tsc_mode, 0, vtsc_khz, 0);
>      if (libxl_defbool_val(d_config->b_info.disable_migrate))
>          xc_domain_disable_migrate(ctx->xch, domid);
>      rtc_timeoffset = d_config->b_info.rtc_timeoffset;
> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> index 2706759..5fabda7 100644
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -1462,6 +1462,28 @@ static void parse_config_data(const char *config_source,
>          }
>      }
>  
> +    /* "vtsc_khz" option works only if "tsc_mode" option is
> +     * "default". In this case, if "vtsc_khz" option is set to 0, we
> +     * will reset it to the host TSC rate. In all other cases, we just
> +     * ignore any given value and always set it to 0.
> +     */
> +    if (!xlu_cfg_get_long(config, "vtsc_khz", &l, 0))
> +        b_info->vtsc_khz = l;
> +    if (b_info->tsc_mode == LIBXL_TSC_MODE_DEFAULT) {
> +        if (b_info->vtsc_khz == 0) {
> +            libxl_physinfo physinfo;
> +            if (!libxl_get_physinfo(ctx, &physinfo))
> +                b_info->vtsc_khz = physinfo.cpu_khz;
> +            else
> +                fprintf(stderr, "WARNING: cannot get host TSC rate.\n");
> +        }

And this hunk (the decision-making bit) should be in libxl, not xl.

Consider that there are other toolstacks that use libxl, say libvirt.

Please ask if you have doubts.

Wei.

> +    } else {
> +        fprintf(stderr, "WARNING: ignoring \"vtsc_khz\" option. "
> +                "\"vtsc_khz\" option works only if "
> +                "\"tsc_mode\" option is \"default\".\n");
> +        b_info->vtsc_khz = 0;
> +    }
> +

>      if (!xlu_cfg_get_long(config, "rtc_timeoffset", &l, 0))
>          b_info->rtc_timeoffset = l;
>  
> -- 
> 2.4.8

* Re: [PATCH 12/13] vmx: Add a call-back to apply TSC scaling ratio to hardware
  2015-09-28  7:13 ` [PATCH 12/13] vmx: Add a call-back to apply TSC scaling ratio to hardware Haozhong Zhang
@ 2015-09-28 16:02   ` Boris Ostrovsky
  2015-09-29  1:07     ` Haozhong Zhang
  2015-10-27 13:33   ` Jan Beulich
  1 sibling, 1 reply; 117+ messages in thread
From: Boris Ostrovsky @ 2015-09-28 16:02 UTC (permalink / raw)
  To: Haozhong Zhang, xen-devel
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Ian Jackson, Aravind Gopalakrishnan,
	Jan Beulich, Keir Fraser, Suravee Suthikulpanit

On 09/28/2015 03:13 AM, Haozhong Zhang wrote:
> This patch adds a new call-back setup_tsc_scaling in struct
> hvm_function_table to apply the TSC scaling ratio to hardware. For VMX,
> it writes the TSC scaling ratio to VMCS field TSC_MULTIPLIER.
>
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> ---
>   xen/arch/x86/hvm/hvm.c        | 1 +
>   xen/arch/x86/hvm/svm/svm.c    | 5 +++++
>   xen/arch/x86/hvm/vmx/vmx.c    | 8 ++++++++
>   xen/include/asm-x86/hvm/hvm.h | 3 +++
>   4 files changed, 17 insertions(+)
>
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 3522d20..2d8a148 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -376,6 +376,7 @@ void hvm_setup_tsc_scaling(struct vcpu *v)
>       }
>   
>       v->arch.tsc_scaling_ratio = ratio;
> +    hvm_funcs.setup_tsc_scaling(v);
>   }
>   
>   void hvm_set_guest_tsc_fixed(struct vcpu *v, u64 guest_tsc, u64 at_tsc)
> diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
> index 73bc863..d890c1f 100644
> --- a/xen/arch/x86/hvm/svm/svm.c
> +++ b/xen/arch/x86/hvm/svm/svm.c
> @@ -2236,6 +2236,10 @@ static void svm_invlpg_intercept(unsigned long vaddr)
>       svm_asid_g_invlpg(curr, vaddr);
>   }
>   
> +static void svm_setup_tsc_scaling(struct vcpu *v)
> +{
> +}
> +

Should this be wrmsrl(MSR_AMD64_TSC_RATIO, v->arch.tsc_scaling_ratio) ?

-boris

>   static struct hvm_function_table __initdata svm_function_table = {
>       .name                 = "SVM",
>       .cpu_up_prepare       = svm_cpu_up_prepare,
> @@ -2289,6 +2293,7 @@ static struct hvm_function_table __initdata svm_function_table = {
>       .max_tsc_scaling_ratio       = MAX_TSC_RATIO,
>       .tsc_scaling_ratio_frac_bits = 32,
>       .tsc_scaling_ratio_rsvd      = TSC_RATIO_RSVD_BITS,
> +    .setup_tsc_scaling           = svm_setup_tsc_scaling,
>   };
>   
>   void svm_vmexit_handler(struct cpu_user_regs *regs)
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 163974d..c4a7b81 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -1100,6 +1100,13 @@ static void vmx_handle_cd(struct vcpu *v, unsigned long value)
>       }
>   }
>   
> +static void vmx_setup_tsc_scaling(struct vcpu *v)
> +{
> +    vmx_vmcs_enter(v);
> +    __vmwrite(TSC_MULTIPLIER, v->arch.tsc_scaling_ratio);
> +    vmx_vmcs_exit(v);
> +}
> +
>   static void vmx_set_tsc_offset(struct vcpu *v, u64 offset, u64 at_tsc)
>   {
>       uint64_t host_tsc, guest_tsc;
> @@ -1986,6 +1993,7 @@ static struct hvm_function_table __initdata vmx_function_table = {
>       .max_tsc_scaling_ratio       = VMX_TSC_MULTIPLIER_MAX,
>       .tsc_scaling_ratio_frac_bits = 48,
>       .tsc_scaling_ratio_rsvd      = 0x0ULL,
> +    .setup_tsc_scaling           = vmx_setup_tsc_scaling,
>   };
>   
>   const struct hvm_function_table * __init start_vmx(void)
> diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
> index f63fe93..9f8b6d5 100644
> --- a/xen/include/asm-x86/hvm/hvm.h
> +++ b/xen/include/asm-x86/hvm/hvm.h
> @@ -226,6 +226,9 @@ struct hvm_function_table {
>       void (*altp2m_vcpu_update_vmfunc_ve)(struct vcpu *v);
>       bool_t (*altp2m_vcpu_emulate_ve)(struct vcpu *v);
>       int (*altp2m_vcpu_emulate_vmfunc)(struct cpu_user_regs *regs);
> +
> +    /* setup TSC scaling */
> +    void (*setup_tsc_scaling)(struct vcpu *v);
>   };
>   
>   extern struct hvm_function_table hvm_funcs;

* Re: [PATCH 09/13] x86/time.c: Scale host TSC in pvclock properly
  2015-09-28  7:13 ` [PATCH 09/13] x86/time.c: Scale host TSC in pvclock properly Haozhong Zhang
@ 2015-09-28 16:36   ` Boris Ostrovsky
  2015-09-29  0:19     ` Haozhong Zhang
  2015-10-22 15:50   ` Boris Ostrovsky
  1 sibling, 1 reply; 117+ messages in thread
From: Boris Ostrovsky @ 2015-09-28 16:36 UTC (permalink / raw)
  To: Haozhong Zhang, xen-devel
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Ian Jackson, Aravind Gopalakrishnan,
	Jan Beulich, Keir Fraser, Suravee Suthikulpanit

On 09/28/2015 03:13 AM, Haozhong Zhang wrote:
> This patch makes the pvclock return the scaled host TSC and
> corresponding scaling parameters to HVM domains if guest TSC is not
> emulated and TSC scaling is enabled.
>
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> ---
>   xen/arch/x86/time.c | 15 ++++++++++++---
>   1 file changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
> index 4b5402c..54eab6e 100644
> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -832,10 +832,19 @@ static void __update_vcpu_system_time(struct vcpu *v, int force)
>       }
>       else
>       {
> -        _u.tsc_timestamp     = t->local_tsc_stamp;
> +        if ( is_hvm_domain(d) && hvm_funcs.tsc_scaling_supported )

This should probably be has_hvm_container_domain(d).

And I think you also may need to adjust this in patch 4 (tsc_set_info()).

-boris


> +        {
> +            _u.tsc_timestamp     = hvm_scale_tsc(v, t->local_tsc_stamp);
> +            _u.tsc_to_system_mul = d->arch.vtsc_to_ns.mul_frac;
> +            _u.tsc_shift         = d->arch.vtsc_to_ns.shift;
> +        }
> +        else
> +        {
> +            _u.tsc_timestamp     = t->local_tsc_stamp;
> +            _u.tsc_to_system_mul = t->tsc_scale.mul_frac;
> +            _u.tsc_shift         = (s8)t->tsc_scale.shift;
> +        }
>           _u.system_time       = t->stime_local_stamp;
> -        _u.tsc_to_system_mul = t->tsc_scale.mul_frac;
> -        _u.tsc_shift         = (s8)t->tsc_scale.shift;
>       }
>       if ( is_hvm_domain(d) )
>           _u.tsc_timestamp += v->arch.hvm_vcpu.cache_tsc_offset;

* Re: [PATCH 09/13] x86/time.c: Scale host TSC in pvclock properly
  2015-09-28 16:36   ` Boris Ostrovsky
@ 2015-09-29  0:19     ` Haozhong Zhang
  0 siblings, 0 replies; 117+ messages in thread
From: Haozhong Zhang @ 2015-09-29  0:19 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Ian Jackson, xen-devel,
	Aravind Gopalakrishnan, Jan Beulich, Keir Fraser,
	Suravee Suthikulpanit

On Mon, Sep 28, 2015 at 12:36:51PM -0400, Boris Ostrovsky wrote:
> On 09/28/2015 03:13 AM, Haozhong Zhang wrote:
> >This patch makes the pvclock return the scaled host TSC and
> >corresponding scaling parameters to HVM domains if guest TSC is not
> >emulated and TSC scaling is enabled.
> >
> >Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> >---
> >  xen/arch/x86/time.c | 15 ++++++++++++---
> >  1 file changed, 12 insertions(+), 3 deletions(-)
> >
> >diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
> >index 4b5402c..54eab6e 100644
> >--- a/xen/arch/x86/time.c
> >+++ b/xen/arch/x86/time.c
> >@@ -832,10 +832,19 @@ static void __update_vcpu_system_time(struct vcpu *v, int force)
> >      }
> >      else
> >      {
> >-        _u.tsc_timestamp     = t->local_tsc_stamp;
> >+        if ( is_hvm_domain(d) && hvm_funcs.tsc_scaling_supported )
> 
> This should probably be has_hvm_container_domain(d).
> 
> And I think you also may need to adjust this in patch 4 (tsc_set_info()).
>

Yes, I'll change them to has_hvm_container_domain() in the next version.
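
To spell out why the scaled values matter here, below is a rough model
of how a guest converts a TSC reading into system time from the pvclock
fields touched by this patch. It is a sketch only: struct pvclock_sample
and pvclock_ns() are hypothetical names that mirror the tsc_timestamp,
system_time, tsc_to_system_mul and tsc_shift fields, and the 128-bit
multiply assumes the GCC/Clang unsigned __int128 extension.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-vCPU time info fields. */
struct pvclock_sample {
    uint64_t tsc_timestamp;      /* TSC value at the time of the update */
    uint64_t system_time;        /* system time (ns) at that same moment */
    uint32_t tsc_to_system_mul;  /* 32.32 fixed-point TSC-to-ns multiplier */
    int8_t   tsc_shift;          /* pre-shift applied to the TSC delta */
};

/*
 * ns = system_time
 *      + (((tsc - tsc_timestamp) shifted by tsc_shift) * mul) >> 32
 * A negative tsc_shift means a right shift. The caller is assumed to
 * pass a TSC reading that is not older than tsc_timestamp.
 */
static uint64_t pvclock_ns(const struct pvclock_sample *s, uint64_t tsc)
{
    uint64_t delta = tsc - s->tsc_timestamp;

    assert(s->tsc_shift > -64 && s->tsc_shift < 64);
    if (s->tsc_shift >= 0)
        delta <<= s->tsc_shift;
    else
        delta >>= -s->tsc_shift;

    return s->system_time +
           (uint64_t)(((unsigned __int128)delta * s->tsc_to_system_mul) >> 32);
}
```

Since the guest subtracts tsc_timestamp from its own (scaled) rdtsc
value, the timestamp and the multiplier/shift pair both have to be
expressed in guest TSC units, which is what the hvm_scale_tsc() and
vtsc_to_ns assignments in the hunk are for.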

> -boris
> 
> 
> >+        {
> >+            _u.tsc_timestamp     = hvm_scale_tsc(v, t->local_tsc_stamp);
> >+            _u.tsc_to_system_mul = d->arch.vtsc_to_ns.mul_frac;
> >+            _u.tsc_shift         = d->arch.vtsc_to_ns.shift;
> >+        }
> >+        else
> >+        {
> >+            _u.tsc_timestamp     = t->local_tsc_stamp;
> >+            _u.tsc_to_system_mul = t->tsc_scale.mul_frac;
> >+            _u.tsc_shift         = (s8)t->tsc_scale.shift;
> >+        }
> >          _u.system_time       = t->stime_local_stamp;
> >-        _u.tsc_to_system_mul = t->tsc_scale.mul_frac;
> >-        _u.tsc_shift         = (s8)t->tsc_scale.shift;
> >      }
> >      if ( is_hvm_domain(d) )
> >          _u.tsc_timestamp += v->arch.hvm_vcpu.cache_tsc_offset;
> 
- Haozhong Zhang

* Re: [PATCH 13/13] tools/libxl: Add 'vtsc_khz' option to set guest TSC rate
  2015-09-28 14:19   ` Wei Liu
@ 2015-09-29  0:40     ` Haozhong Zhang
  2015-09-29  9:20       ` Wei Liu
  2015-09-29 10:07       ` Ian Campbell
  0 siblings, 2 replies; 117+ messages in thread
From: Haozhong Zhang @ 2015-09-29  0:40 UTC (permalink / raw)
  To: Wei Liu
  Cc: Kevin Tian, Keir Fraser, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Ian Jackson, xen-devel,
	Aravind Gopalakrishnan, Jan Beulich, Boris Ostrovsky,
	Suravee Suthikulpanit

On Mon, Sep 28, 2015 at 03:19:25PM +0100, Wei Liu wrote:
> On Mon, Sep 28, 2015 at 03:13:58PM +0800, Haozhong Zhang wrote:
> > This patch adds an option 'vtsc_khz' to allow users to set vcpu's TSC
> > rate in KHz. In the case that tsc_mode = 'default', the default value of
> > 'vtsc_khz' option is the host TSC rate which is used when 'vtsc_khz'
> > option is set to 0 or does not appear in the configuration. In all other
> > cases of tsc_mode, 'vtsc_khz' option is just ignored.
> > 
> > Another purpose of adding this option is to keep vcpu's TSC rate across
> > guest reboot. In existing code, a new domain is created from the
> > configuration of the previous domain which was just rebooted. vcpu's TSC
> > rate is not stored in the configuration and the host TSC rate is then
> > used as vcpu's TSC rate. This works fine unless the previous domain was
> > migrated from another host machine with a different host TSC rate than
> > the current one.
> > 
> > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> > ---
> >  tools/libxl/libxl_types.idl |  1 +
> >  tools/libxl/libxl_x86.c     |  4 +++-
> >  tools/libxl/xl_cmdimpl.c    | 22 ++++++++++++++++++++++
> >  3 files changed, 26 insertions(+), 1 deletion(-)
> > 
> > diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> > index 9f6ec00..91cb0be 100644
> > --- a/tools/libxl/libxl_types.idl
> > +++ b/tools/libxl/libxl_types.idl
> > @@ -413,6 +413,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
> >      ("vcpu_soft_affinity", Array(libxl_bitmap, "num_vcpu_soft_affinity")),
> >      ("numa_placement",  libxl_defbool),
> >      ("tsc_mode",        libxl_tsc_mode),
> > +    ("vtsc_khz",        uint32),
> 
> > This field should be in an arch-specific substructure, i.e. "hvm".
>

Julien also pointed this out and suggested moving it to an
arch-specific substructure. I'm going to add a new substructure
"arch_x86" and move "vtsc_khz" there. Does that work for you?

> >      ("max_memkb",       MemKB),
> >      ("target_memkb",    MemKB),
> >      ("video_memkb",     MemKB),
> > diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
> > index 896f34c..7baaee4 100644
> > --- a/tools/libxl/libxl_x86.c
> > +++ b/tools/libxl/libxl_x86.c
> > @@ -276,6 +276,7 @@ int libxl__arch_domain_create(libxl__gc *gc, libxl_domain_config *d_config,
> >  {
> >      int ret = 0;
> >      int tsc_mode;
> > +    uint32_t vtsc_khz;
> >      uint32_t rtc_timeoffset;
> >      libxl_ctx *ctx = libxl__gc_owner(gc);
> >  
> > @@ -300,7 +301,8 @@ int libxl__arch_domain_create(libxl__gc *gc, libxl_domain_config *d_config,
> >      default:
> >          abort();
> >      }
> > -    xc_domain_set_tsc_info(ctx->xch, domid, tsc_mode, 0, 0, 0);
> > +    vtsc_khz = d_config->b_info.vtsc_khz;
> > +    xc_domain_set_tsc_info(ctx->xch, domid, tsc_mode, 0, vtsc_khz, 0);
> >      if (libxl_defbool_val(d_config->b_info.disable_migrate))
> >          xc_domain_disable_migrate(ctx->xch, domid);
> >      rtc_timeoffset = d_config->b_info.rtc_timeoffset;
> > diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> > index 2706759..5fabda7 100644
> > --- a/tools/libxl/xl_cmdimpl.c
> > +++ b/tools/libxl/xl_cmdimpl.c
> > @@ -1462,6 +1462,28 @@ static void parse_config_data(const char *config_source,
> >          }
> >      }
> >  
> > +    /* "vtsc_khz" option works only if "tsc_mode" option is
> > +     * "default". In this case, if "vtsc_khz" option is set to 0, we
> > +     * will reset it to the host TSC rate. In all other cases, we just
> > +     * ignore any given value and always set it to 0.
> > +     */
> > +    if (!xlu_cfg_get_long(config, "vtsc_khz", &l, 0))
> > +        b_info->vtsc_khz = l;
> > +    if (b_info->tsc_mode == LIBXL_TSC_MODE_DEFAULT) {
> > +        if (b_info->vtsc_khz == 0) {
> > +            libxl_physinfo physinfo;
> > +            if (!libxl_get_physinfo(ctx, &physinfo))
> > +                b_info->vtsc_khz = physinfo.cpu_khz;
> > +            else
> > +                fprintf(stderr, "WARNING: cannot get host TSC rate.\n");
> > +        }
> 
> > And this hunk (the decision-making bit) should be in libxl, not xl.
> > 
> > Consider that there are other toolstacks that use libxl, say libvirt.
>

Good to know this.

I'm going to move it to libxl__arch_domain_create() where
b_info->vtsc_khz is used.
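
As a rough idea of what the moved logic might look like once it no
longer depends on xl's config parser, here is a standalone sketch. All
names in it (enum tsc_mode_sketch, resolve_vtsc_khz) are hypothetical
and not part of libxl; host_khz stands for whatever the physinfo query
returns, with 0 meaning the query failed. Unlike the posted hunk, which
appears to warn whenever tsc_mode is not "default", the sketch only
reports an ignored value when a non-zero rate was actually requested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the libxl TSC mode enumeration. */
enum tsc_mode_sketch { TSC_MODE_DEFAULT, TSC_MODE_OTHER };

/*
 * Decide the guest TSC rate to hand to the hypervisor.
 *  - tsc_mode "default": use the requested rate, or fall back to the
 *    host rate when the request is 0 (option absent or set to 0).
 *  - any other mode: the rate is ignored and 0 is passed down;
 *    *ignored tells the caller whether a warning is worth printing.
 */
static uint32_t resolve_vtsc_khz(enum tsc_mode_sketch mode,
                                 uint32_t requested_khz,
                                 uint32_t host_khz,
                                 bool *ignored)
{
    assert(ignored != NULL);
    *ignored = false;

    if (mode != TSC_MODE_DEFAULT) {
        *ignored = (requested_khz != 0);
        return 0;
    }

    return requested_khz ? requested_khz : host_khz;
}
```

Keeping the decision in one helper like this would let xl, libvirt and
any other libxl user get the same behaviour, with each toolstack only
responsible for passing the raw option value through.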

Haozhong

> Please ask if you have doubts.
>
> Wei.
> 
> > +    } else {
> > +        fprintf(stderr, "WARNING: ignoring \"vtsc_khz\" option. "
> > +                "\"vtsc_khz\" option works only if "
> > +                "\"tsc_mode\" option is \"default\".\n");
> > +        b_info->vtsc_khz = 0;
> > +    }
> > +
> 
> >      if (!xlu_cfg_get_long(config, "rtc_timeoffset", &l, 0))
> >          b_info->rtc_timeoffset = l;
> >  
> > -- 
> > 2.4.8

* Re: [PATCH 12/13] vmx: Add a call-back to apply TSC scaling ratio to hardware
  2015-09-28 16:02   ` Boris Ostrovsky
@ 2015-09-29  1:07     ` Haozhong Zhang
  2015-09-29  9:33       ` Andrew Cooper
  0 siblings, 1 reply; 117+ messages in thread
From: Haozhong Zhang @ 2015-09-29  1:07 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Ian Jackson, xen-devel,
	Aravind Gopalakrishnan, Jan Beulich, Keir Fraser,
	Suravee Suthikulpanit

On Mon, Sep 28, 2015 at 12:02:08PM -0400, Boris Ostrovsky wrote:
> On 09/28/2015 03:13 AM, Haozhong Zhang wrote:
> >This patch adds a new call-back setup_tsc_scaling in struct
> >hvm_function_table to apply the TSC scaling ratio to hardware. For VMX,
> >it writes the TSC scaling ratio to VMCS field TSC_MULTIPLIER.
> >
> >Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> >---
> >  xen/arch/x86/hvm/hvm.c        | 1 +
> >  xen/arch/x86/hvm/svm/svm.c    | 5 +++++
> >  xen/arch/x86/hvm/vmx/vmx.c    | 8 ++++++++
> >  xen/include/asm-x86/hvm/hvm.h | 3 +++
> >  4 files changed, 17 insertions(+)
> >
> >diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> >index 3522d20..2d8a148 100644
> >--- a/xen/arch/x86/hvm/hvm.c
> >+++ b/xen/arch/x86/hvm/hvm.c
> >@@ -376,6 +376,7 @@ void hvm_setup_tsc_scaling(struct vcpu *v)
> >      }
> >      v->arch.tsc_scaling_ratio = ratio;
> >+    hvm_funcs.setup_tsc_scaling(v);
> >  }
> >  void hvm_set_guest_tsc_fixed(struct vcpu *v, u64 guest_tsc, u64 at_tsc)
> >diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
> >index 73bc863..d890c1f 100644
> >--- a/xen/arch/x86/hvm/svm/svm.c
> >+++ b/xen/arch/x86/hvm/svm/svm.c
> >@@ -2236,6 +2236,10 @@ static void svm_invlpg_intercept(unsigned long vaddr)
> >      svm_asid_g_invlpg(curr, vaddr);
> >  }
> >+static void svm_setup_tsc_scaling(struct vcpu *v)
> >+{
> >+}
> >+
> 
> Should this be wrmsrl(MSR_AMD64_TSC_RATIO, v->arch.tsc_scaling_ratio) ?
> 
> -boris
>

MSR_AMD64_TSC_RATIO is set in svm_ctxt_switch_to() before entering the guest.

For VMX, the ratio is written to the VMCS field TSC_MULTIPLIER, and it's
not necessary to write it every time we enter the guest. Therefore, I
introduce the call-back setup_tsc_scaling() to do this. For SVM, as the
ratio is set every time we enter the guest, I leave the SVM version of
setup_tsc_scaling() empty.
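
The difference can be illustrated with a toy model. This is not Xen
code: struct toy_vcpu, struct toy_core and both functions are
hypothetical, and plain struct fields stand in for the VMCS field and
the MSR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-vCPU state; vmcs_tsc_multiplier models a per-VMCS field. */
struct toy_vcpu {
    uint64_t tsc_scaling_ratio;
    uint64_t vmcs_tsc_multiplier;
};

/* Per-physical-core state; tsc_ratio_msr models a per-core MSR. */
struct toy_core {
    uint64_t tsc_ratio_msr;
};

/*
 * VMX-like: the ratio lives in per-vCPU control state, so writing it
 * once when the ratio changes is enough.
 */
static void toy_vmx_setup_tsc_scaling(struct toy_vcpu *v)
{
    assert(v != NULL);
    v->vmcs_tsc_multiplier = v->tsc_scaling_ratio;
}

/*
 * SVM-like: the ratio lives in a register shared by every vCPU that
 * runs on the core, so it is reloaded whenever a vCPU is switched in.
 */
static void toy_svm_ctxt_switch_to(struct toy_core *c,
                                   const struct toy_vcpu *v)
{
    assert(c != NULL && v != NULL);
    c->tsc_ratio_msr = v->tsc_scaling_ratio;
}
```

In this model a setup call-back has nothing left to do on the SVM side
as long as the context-switch path always reloads the register.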

- Haozhong

> >  static struct hvm_function_table __initdata svm_function_table = {
> >      .name                 = "SVM",
> >      .cpu_up_prepare       = svm_cpu_up_prepare,
> >@@ -2289,6 +2293,7 @@ static struct hvm_function_table __initdata svm_function_table = {
> >      .max_tsc_scaling_ratio       = MAX_TSC_RATIO,
> >      .tsc_scaling_ratio_frac_bits = 32,
> >      .tsc_scaling_ratio_rsvd      = TSC_RATIO_RSVD_BITS,
> >+    .setup_tsc_scaling           = svm_setup_tsc_scaling,
> >  };
> >  void svm_vmexit_handler(struct cpu_user_regs *regs)
> >diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> >index 163974d..c4a7b81 100644
> >--- a/xen/arch/x86/hvm/vmx/vmx.c
> >+++ b/xen/arch/x86/hvm/vmx/vmx.c
> >@@ -1100,6 +1100,13 @@ static void vmx_handle_cd(struct vcpu *v, unsigned long value)
> >      }
> >  }
> >+static void vmx_setup_tsc_scaling(struct vcpu *v)
> >+{
> >+    vmx_vmcs_enter(v);
> >+    __vmwrite(TSC_MULTIPLIER, v->arch.tsc_scaling_ratio);
> >+    vmx_vmcs_exit(v);
> >+}
> >+
> >  static void vmx_set_tsc_offset(struct vcpu *v, u64 offset, u64 at_tsc)
> >  {
> >      uint64_t host_tsc, guest_tsc;
> >@@ -1986,6 +1993,7 @@ static struct hvm_function_table __initdata vmx_function_table = {
> >      .max_tsc_scaling_ratio       = VMX_TSC_MULTIPLIER_MAX,
> >      .tsc_scaling_ratio_frac_bits = 48,
> >      .tsc_scaling_ratio_rsvd      = 0x0ULL,
> >+    .setup_tsc_scaling           = vmx_setup_tsc_scaling,
> >  };
> >  const struct hvm_function_table * __init start_vmx(void)
> >diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
> >index f63fe93..9f8b6d5 100644
> >--- a/xen/include/asm-x86/hvm/hvm.h
> >+++ b/xen/include/asm-x86/hvm/hvm.h
> >@@ -226,6 +226,9 @@ struct hvm_function_table {
> >      void (*altp2m_vcpu_update_vmfunc_ve)(struct vcpu *v);
> >      bool_t (*altp2m_vcpu_emulate_ve)(struct vcpu *v);
> >      int (*altp2m_vcpu_emulate_vmfunc)(struct cpu_user_regs *regs);
> >+
> >+    /* setup TSC scaling */
> >+    void (*setup_tsc_scaling)(struct vcpu *v);
> >  };
> >  extern struct hvm_function_table hvm_funcs;
> 

* Re: [PATCH 13/13] tools/libxl: Add 'vtsc_khz' option to set guest TSC rate
  2015-09-29  0:40     ` Haozhong Zhang
@ 2015-09-29  9:20       ` Wei Liu
  2015-09-29  9:50         ` Haozhong Zhang
  2015-09-29 10:07       ` Ian Campbell
  1 sibling, 1 reply; 117+ messages in thread
From: Wei Liu @ 2015-09-29  9:20 UTC (permalink / raw)
  To: Wei Liu, xen-devel, Ian Jackson, Stefano Stabellini,
	Ian Campbell, Keir Fraser, Jan Beulich, Andrew Cooper,
	Boris Ostrovsky, Suravee Suthikulpanit, Aravind Gopalakrishnan,
	Jun Nakajima, Kevin Tian

On Tue, Sep 29, 2015 at 08:40:23AM +0800, Haozhong Zhang wrote:
> On Mon, Sep 28, 2015 at 03:19:25PM +0100, Wei Liu wrote:
> > On Mon, Sep 28, 2015 at 03:13:58PM +0800, Haozhong Zhang wrote:
> > > This patch adds an option 'vtsc_khz' to allow users to set vcpu's TSC
> > > rate in KHz. In the case that tsc_mode = 'default', the default value of
> > > 'vtsc_khz' option is the host TSC rate which is used when 'vtsc_khz'
> > > option is set to 0 or does not appear in the configuration. In all other
> > > cases of tsc_mode, 'vtsc_khz' option is just ignored.
> > > 
> > > Another purpose of adding this option is to keep vcpu's TSC rate across
> > > guest reboot. In existing code, a new domain is created from the
> > > configuration of the previous domain which was just rebooted. vcpu's TSC
> > > rate is not stored in the configuration and the host TSC rate is then
> > > used as vcpu's TSC rate. This works fine unless the previous domain was
> > > migrated from another host machine with a different host TSC rate than
> > > the current one.
> > > 
> > > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> > > ---
> > >  tools/libxl/libxl_types.idl |  1 +
> > >  tools/libxl/libxl_x86.c     |  4 +++-
> > >  tools/libxl/xl_cmdimpl.c    | 22 ++++++++++++++++++++++
> > >  3 files changed, 26 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> > > index 9f6ec00..91cb0be 100644
> > > --- a/tools/libxl/libxl_types.idl
> > > +++ b/tools/libxl/libxl_types.idl
> > > @@ -413,6 +413,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
> > >      ("vcpu_soft_affinity", Array(libxl_bitmap, "num_vcpu_soft_affinity")),
> > >      ("numa_placement",  libxl_defbool),
> > >      ("tsc_mode",        libxl_tsc_mode),
> > > +    ("vtsc_khz",        uint32),
> > 
> > > This field should be in an arch-specific substructure, i.e. "hvm".
> >
> 
> > Julien also pointed this out and suggested moving it to an
> > arch-specific substructure. I'm going to add a new substructure
> > "arch_x86" and move "vtsc_khz" there. Does that work for you?
> 

My initial thought was that this was a feature of HVM. I don't recollect
why I got that idea. I could be wrong.  Does it work with (or is it
intended to work with) PV too?  If yes, adding it to arch_x86 would be
appropriate. If not, hvm specific is good enough.  Please correct my
misunderstanding, I'm definitely no expert on x86.

> > >      ("max_memkb",       MemKB),
> > >      ("target_memkb",    MemKB),
> > >      ("video_memkb",     MemKB),
> > > diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
> > > index 896f34c..7baaee4 100644
> > > --- a/tools/libxl/libxl_x86.c
> > > +++ b/tools/libxl/libxl_x86.c
> > > @@ -276,6 +276,7 @@ int libxl__arch_domain_create(libxl__gc *gc, libxl_domain_config *d_config,
> > >  {
> > >      int ret = 0;
> > >      int tsc_mode;
> > > +    uint32_t vtsc_khz;
> > >      uint32_t rtc_timeoffset;
> > >      libxl_ctx *ctx = libxl__gc_owner(gc);
> > >  
> > > @@ -300,7 +301,8 @@ int libxl__arch_domain_create(libxl__gc *gc, libxl_domain_config *d_config,
> > >      default:
> > >          abort();
> > >      }
> > > -    xc_domain_set_tsc_info(ctx->xch, domid, tsc_mode, 0, 0, 0);
> > > +    vtsc_khz = d_config->b_info.vtsc_khz;
> > > +    xc_domain_set_tsc_info(ctx->xch, domid, tsc_mode, 0, vtsc_khz, 0);
> > >      if (libxl_defbool_val(d_config->b_info.disable_migrate))
> > >          xc_domain_disable_migrate(ctx->xch, domid);
> > >      rtc_timeoffset = d_config->b_info.rtc_timeoffset;
> > > diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> > > index 2706759..5fabda7 100644
> > > --- a/tools/libxl/xl_cmdimpl.c
> > > +++ b/tools/libxl/xl_cmdimpl.c
> > > @@ -1462,6 +1462,28 @@ static void parse_config_data(const char *config_source,
> > >          }
> > >      }
> > >  
> > > +    /* "vtsc_khz" option works only if "tsc_mode" option is
> > > +     * "default". In this case, if "vtsc_khz" option is set to 0, we
> > > +     * will reset it to the host TSC rate. In all other cases, we just
> > > +     * ignore any given value and always set it to 0.
> > > +     */
> > > +    if (!xlu_cfg_get_long(config, "vtsc_khz", &l, 0))
> > > +        b_info->vtsc_khz = l;
> > > +    if (b_info->tsc_mode == LIBXL_TSC_MODE_DEFAULT) {
> > > +        if (b_info->vtsc_khz == 0) {
> > > +            libxl_physinfo physinfo;
> > > +            if (!libxl_get_physinfo(ctx, &physinfo))
> > > +                b_info->vtsc_khz = physinfo.cpu_khz;
> > > +            else
> > > +                fprintf(stderr, "WARNING: cannot get host TSC rate.\n");
> > > +        }
> > 
> > > And this hunk (the decision-making bit) should be in libxl, not xl.
> > > 
> > > Consider that there are other toolstacks that use libxl, say libvirt.
> >
> 
> Good to know this.
> 
> I'm going to move it to libxl__arch_domain_create() where
> b_info->vtsc_khz is used.
> 

Right, that seems appropriate.

Wei.

* Re: [PATCH 12/13] vmx: Add a call-back to apply TSC scaling ratio to hardware
  2015-09-29  1:07     ` Haozhong Zhang
@ 2015-09-29  9:33       ` Andrew Cooper
  2015-09-29 10:02         ` Haozhong Zhang
  0 siblings, 1 reply; 117+ messages in thread
From: Andrew Cooper @ 2015-09-29  9:33 UTC (permalink / raw)
  To: Boris Ostrovsky, xen-devel, Ian Jackson, Stefano Stabellini,
	Ian Campbell, Wei Liu, Keir Fraser, Jan Beulich,
	Suravee Suthikulpanit, Aravind Gopalakrishnan, Jun Nakajima,
	Kevin Tian

On 29/09/15 02:07, Haozhong Zhang wrote:
> On Mon, Sep 28, 2015 at 12:02:08PM -0400, Boris Ostrovsky wrote:
>> On 09/28/2015 03:13 AM, Haozhong Zhang wrote:
>>> This patch adds a new call-back setup_tsc_scaling in struct
>>> hvm_function_table to apply the TSC scaling ratio to hardware. For VMX,
>>> it writes the TSC scaling ratio to VMCS field TSC_MULTIPLIER.
>>>
>>> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
>>> ---
>>>  xen/arch/x86/hvm/hvm.c        | 1 +
>>>  xen/arch/x86/hvm/svm/svm.c    | 5 +++++
>>>  xen/arch/x86/hvm/vmx/vmx.c    | 8 ++++++++
>>>  xen/include/asm-x86/hvm/hvm.h | 3 +++
>>>  4 files changed, 17 insertions(+)
>>>
>>> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
>>> index 3522d20..2d8a148 100644
>>> --- a/xen/arch/x86/hvm/hvm.c
>>> +++ b/xen/arch/x86/hvm/hvm.c
>>> @@ -376,6 +376,7 @@ void hvm_setup_tsc_scaling(struct vcpu *v)
>>>      }
>>>      v->arch.tsc_scaling_ratio = ratio;
>>> +    hvm_funcs.setup_tsc_scaling(v);
>>>  }
>>>  void hvm_set_guest_tsc_fixed(struct vcpu *v, u64 guest_tsc, u64 at_tsc)
>>> diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
>>> index 73bc863..d890c1f 100644
>>> --- a/xen/arch/x86/hvm/svm/svm.c
>>> +++ b/xen/arch/x86/hvm/svm/svm.c
>>> @@ -2236,6 +2236,10 @@ static void svm_invlpg_intercept(unsigned long vaddr)
>>>      svm_asid_g_invlpg(curr, vaddr);
>>>  }
>>> +static void svm_setup_tsc_scaling(struct vcpu *v)
>>> +{
>>> +}
>>> +
>> Should this be wrmsrl(MSR_AMD64_TSC_RATIO, v->arch.tsc_scaling_ratio) ?
>>
>> -boris
>>
> MSR_AMD64_TSC_RATIO is set in svm_ctxt_switch_to() before entering guest.
>
> For VMX, the ratio is set to a VMCS field TSC_MULTIPLIER and it's not
> necessary to set it every time entering guest. Therefore, I introduce
> the call-back setup_tsc_scaling() to do this. For SVM, as the ratio is
> set every time entering guest, I leave the SVM version of setup_tsc_scaling()
> empty.

VT-x has a per-VMCS scale, while SVM has a per-core MSR to adjust the
scale.  These do require different modification algorithms.

However, if there is any chance that any part of the system can update
the ratio while an SVM VCPU is in context (which appears to be the
case), then MSR_AMD64_TSC_RATIO needs updating synchronously, or it will
be deferred until the next full context switch which could be an
arbitrary time into the future.  This appears to be a latent bug in the
SVM side.
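
As a rough illustration of the concern, here is a minimal C sketch of what a
synchronous update could look like. It is not code from the patch: the
struct vcpu layout, current_vcpu, and the mock_* MSR helpers are hypothetical
stand-ins, and only the names svm_ctxt_switch_to() and
svm_setup_tsc_scaling() come from this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for Xen's struct vcpu (assumption). */
struct vcpu {
    uint64_t tsc_scaling_ratio;
};

/* Mock per-core state: the vcpu in context and the TSC ratio MSR value. */
static struct vcpu *current_vcpu;
static uint64_t mock_tsc_ratio_msr;

/* Hypothetical replacement for wrmsrl(MSR_AMD64_TSC_RATIO, val). */
static void mock_wrmsr_tsc_ratio(uint64_t val)
{
    mock_tsc_ratio_msr = val;
}

/* The MSR is reloaded on every context switch to the vcpu. */
static void svm_ctxt_switch_to(struct vcpu *v)
{
    assert(v != NULL);
    current_vcpu = v;
    mock_wrmsr_tsc_ratio(v->tsc_scaling_ratio);
}

/*
 * Synchronous variant: if the ratio changes while the vcpu is in context,
 * write the MSR now instead of waiting for the next context switch.
 */
static void svm_setup_tsc_scaling(struct vcpu *v)
{
    assert(v != NULL);
    if ( v == current_vcpu )
        mock_wrmsr_tsc_ratio(v->tsc_scaling_ratio);
}
```

In this model, a ratio change for a vcpu that is not in context leaves the
MSR alone and is picked up by the next svm_ctxt_switch_to().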

~Andrew

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 13/13] tools/libxl: Add 'vtsc_khz' option to set guest TSC rate
  2015-09-29  9:20       ` Wei Liu
@ 2015-09-29  9:50         ` Haozhong Zhang
  2015-09-29 10:24           ` Julien Grall
  0 siblings, 1 reply; 117+ messages in thread
From: Haozhong Zhang @ 2015-09-29  9:50 UTC (permalink / raw)
  To: Wei Liu, Julien Grall
  Cc: Kevin Tian, Keir Fraser, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Ian Jackson, xen-devel,
	Aravind Gopalakrishnan, Jan Beulich, Boris Ostrovsky,
	Suravee Suthikulpanit

On Tue, Sep 29, 2015 at 10:20:21AM +0100, Wei Liu wrote:
> On Tue, Sep 29, 2015 at 08:40:23AM +0800, Haozhong Zhang wrote:
> > On Mon, Sep 28, 2015 at 03:19:25PM +0100, Wei Liu wrote:
> > > On Mon, Sep 28, 2015 at 03:13:58PM +0800, Haozhong Zhang wrote:
> > > > This patch adds an option 'vtsc_khz' to allow users to set vcpu's TSC
> > > > rate in KHz. In the case that tsc_mode = 'default', the default value of
> > > > 'vtsc_khz' option is the host TSC rate which is used when 'vtsc_khz'
> > > > option is set to 0 or does not appear in the configuration. In all other
> > > > cases of tsc_mode, 'vtsc_khz' option is just ignored.
> > > > 
> > > > Another purpose of adding this option is to keep vcpu's TSC rate across
> > > > guest reboot. In existing code, a new domain is created from the
> > > > configuration of the previous domain which was just rebooted. vcpu's TSC
> > > > rate is not stored in the configuration and the host TSC rate is the
> > > > used as vcpu's TSC rate. This works fine unless the previous domain was
> > > > migrated from another host machine with a different host TSC rate than
> > > > the current one.
> > > > 
> > > > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> > > > ---
> > > >  tools/libxl/libxl_types.idl |  1 +
> > > >  tools/libxl/libxl_x86.c     |  4 +++-
> > > >  tools/libxl/xl_cmdimpl.c    | 22 ++++++++++++++++++++++
> > > >  3 files changed, 26 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> > > > index 9f6ec00..91cb0be 100644
> > > > --- a/tools/libxl/libxl_types.idl
> > > > +++ b/tools/libxl/libxl_types.idl
> > > > @@ -413,6 +413,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
> > > >      ("vcpu_soft_affinity", Array(libxl_bitmap, "num_vcpu_soft_affinity")),
> > > >      ("numa_placement",  libxl_defbool),
> > > >      ("tsc_mode",        libxl_tsc_mode),
> > > > +    ("vtsc_khz",        uint32),
> > > 
> > > This field should be in arch-specific substructure, i.e. "hvm".
> > >
> > 
> > Julien also pointed this out and suggested moving it to an
> > arch-specific substructure. I'm going to add a new substructure
> > "arch_x86" and move "vtsc_khz" there. Is this good for you?
> > 
> 
> My initial thought was that this was a feature of HVM. I don't recollect
> why I got that idea. I could be wrong.  Does it work with (or is it intended to
> work with) PV too?  If yes, adding it to arch_x86 would be appropriate.
> If not, hvm specific is good enough.  Please correct my
> misunderstanding, I'm definitely no expert on x86.  
>

No, it only works with HVM. So it should be in 'hvm'? Julien, what is
your opinion?

- Haozhong

> > > >      ("max_memkb",       MemKB),
> > > >      ("target_memkb",    MemKB),
> > > >      ("video_memkb",     MemKB),
> > > > diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
> > > > index 896f34c..7baaee4 100644
> > > > --- a/tools/libxl/libxl_x86.c
> > > > +++ b/tools/libxl/libxl_x86.c
> > > > @@ -276,6 +276,7 @@ int libxl__arch_domain_create(libxl__gc *gc, libxl_domain_config *d_config,
> > > >  {
> > > >      int ret = 0;
> > > >      int tsc_mode;
> > > > +    uint32_t vtsc_khz;
> > > >      uint32_t rtc_timeoffset;
> > > >      libxl_ctx *ctx = libxl__gc_owner(gc);
> > > >  
> > > > @@ -300,7 +301,8 @@ int libxl__arch_domain_create(libxl__gc *gc, libxl_domain_config *d_config,
> > > >      default:
> > > >          abort();
> > > >      }
> > > > -    xc_domain_set_tsc_info(ctx->xch, domid, tsc_mode, 0, 0, 0);
> > > > +    vtsc_khz = d_config->b_info.vtsc_khz;
> > > > +    xc_domain_set_tsc_info(ctx->xch, domid, tsc_mode, 0, vtsc_khz, 0);
> > > >      if (libxl_defbool_val(d_config->b_info.disable_migrate))
> > > >          xc_domain_disable_migrate(ctx->xch, domid);
> > > >      rtc_timeoffset = d_config->b_info.rtc_timeoffset;
> > > > diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> > > > index 2706759..5fabda7 100644
> > > > --- a/tools/libxl/xl_cmdimpl.c
> > > > +++ b/tools/libxl/xl_cmdimpl.c
> > > > @@ -1462,6 +1462,28 @@ static void parse_config_data(const char *config_source,
> > > >          }
> > > >      }
> > > >  
> > > > +    /* "vtsc_khz" option works only if "tsc_mode" option is
> > > > +     * "default". In this case, if "vtsc_khz" option is set to 0, we
> > > > +     * will reset it to the host TSC rate. In all other cases, we just
> > > > +     * ignore any given value and always set it to 0.
> > > > +     */
> > > > +    if (!xlu_cfg_get_long(config, "vtsc_khz", &l, 0))
> > > > +        b_info->vtsc_khz = l;
> > > > +    if (b_info->tsc_mode == LIBXL_TSC_MODE_DEFAULT) {
> > > > +        if (b_info->vtsc_khz == 0) {
> > > > +            libxl_physinfo physinfo;
> > > > +            if (!libxl_get_physinfo(ctx, &physinfo))
> > > > +                b_info->vtsc_khz = physinfo.cpu_khz;
> > > > +            else
> > > > +                fprintf(stderr, "WARNING: cannot get host TSC rate.\n");
> > > > +        }
> > > 
> > > And this hunk (the decision making bit) should be in libxl, not xl.
> > > 
> > > > Consider there are other toolstacks that use libxl, say libvirt.
> > >
> > 
> > Good to know this.
> > 
> > I'm going to move it to libxl__arch_domain_create() where
> > b_info->vtsc_khz is used.
> > 
> 
> Right, that seems appropriate.
> 
> Wei.
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 12/13] vmx: Add a call-back to apply TSC scaling ratio to hardware
  2015-09-29  9:33       ` Andrew Cooper
@ 2015-09-29 10:02         ` Haozhong Zhang
  2015-09-29 10:25           ` Andrew Cooper
  0 siblings, 1 reply; 117+ messages in thread
From: Haozhong Zhang @ 2015-09-29 10:02 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Ian Jackson, xen-devel, Aravind Gopalakrishnan,
	Jan Beulich, Keir Fraser, Boris Ostrovsky, Suravee Suthikulpanit

On Tue, Sep 29, 2015 at 10:33:14AM +0100, Andrew Cooper wrote:
> On 29/09/15 02:07, Haozhong Zhang wrote:
> > On Mon, Sep 28, 2015 at 12:02:08PM -0400, Boris Ostrovsky wrote:
> >> On 09/28/2015 03:13 AM, Haozhong Zhang wrote:
> >>> This patch adds a new call-back setup_tsc_scaling in struct
> >>> hvm_function_table to apply the TSC scaling ratio to hardware. For VMX,
> >>> it writes the TSC scaling ratio to VMCS field TSC_MULTIPLIER.
> >>>
> >>> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> >>> ---
> >>>  xen/arch/x86/hvm/hvm.c        | 1 +
> >>>  xen/arch/x86/hvm/svm/svm.c    | 5 +++++
> >>>  xen/arch/x86/hvm/vmx/vmx.c    | 8 ++++++++
> >>>  xen/include/asm-x86/hvm/hvm.h | 3 +++
> >>>  4 files changed, 17 insertions(+)
> >>>
> >>> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> >>> index 3522d20..2d8a148 100644
> >>> --- a/xen/arch/x86/hvm/hvm.c
> >>> +++ b/xen/arch/x86/hvm/hvm.c
> >>> @@ -376,6 +376,7 @@ void hvm_setup_tsc_scaling(struct vcpu *v)
> >>>      }
> >>>      v->arch.tsc_scaling_ratio = ratio;
> >>> +    hvm_funcs.setup_tsc_scaling(v);
> >>>  }
> >>>  void hvm_set_guest_tsc_fixed(struct vcpu *v, u64 guest_tsc, u64 at_tsc)
> >>> diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
> >>> index 73bc863..d890c1f 100644
> >>> --- a/xen/arch/x86/hvm/svm/svm.c
> >>> +++ b/xen/arch/x86/hvm/svm/svm.c
> >>> @@ -2236,6 +2236,10 @@ static void svm_invlpg_intercept(unsigned long vaddr)
> >>>      svm_asid_g_invlpg(curr, vaddr);
> >>>  }
> >>> +static void svm_setup_tsc_scaling(struct vcpu *v)
> >>> +{
> >>> +}
> >>> +
> >> Should this be wrmsrl(MSR_AMD64_TSC_RATIO, v->arch.tsc_scaling_ratio) ?
> >>
> >> -boris
> >>
> > MSR_AMD64_TSC_RATIO is set in svm_ctxt_switch_to() before entering guest.
> >
> > For VMX, the ratio is set to a VMCS field TSC_MULTIPLIER and it's not
> > necessary to set it every time entering guest. Therefore, I introduce
> > the call-back setup_tsc_scaling() to do this. For SVM, as the ratio is
> > set every time entering guest, I leave the SVM version of setup_tsc_scaling()
> > empty.
> 
> VT-x has a per-VMCS scale, while SVM has a per-core MSR to adjust the
> scale.  These do require different modification algorithms.
>

Yes, this is what I mean.

> However, if there is any chance that any part of the system can update
> the ratio while an SVM VCPU is in context (which appears to be the
> case), then MSR_AMD64_TSC_RATIO needs updating synchronously, or it will
> be deferred until the next full context switch which could be an
> arbitrary time into the future.  This appears to be a latent bug in the
> SVM side.
>

In my patch, tsc ratio is set only when
 1. a domain is created (by arch_domain_create()),
 2. a vcpu's state is reset (by hvm_vcpu_reset_state()),
 3. a vcpu's context is restored (by hvm_load_cpu_ctxt()), or
 4. through the hypercall XEN_DOMCTL_settscinfo.

(Correct me if I'm wrong below)

For the first 3 cases, vcpu is definitely not in context, so it's safe
to set tsc ratio without any latent bug. For the last case,
arch_do_domctl() pauses the domain before updating tsc ratio, so it's
also safe.

> ~Andrew
> 

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 13/13] tools/libxl: Add 'vtsc_khz' option to set guest TSC rate
  2015-09-28  7:13 ` [PATCH 13/13] tools/libxl: Add 'vtsc_khz' option to set guest TSC rate Haozhong Zhang
  2015-09-28 11:47   ` Julien Grall
  2015-09-28 14:19   ` Wei Liu
@ 2015-09-29 10:04   ` Ian Campbell
  2015-09-29 10:13     ` Haozhong Zhang
  2 siblings, 1 reply; 117+ messages in thread
From: Ian Campbell @ 2015-09-29 10:04 UTC (permalink / raw)
  To: Haozhong Zhang, xen-devel
  Cc: Kevin Tian, Wei Liu, Suravee Suthikulpanit, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Ian Jackson, Aravind Gopalakrishnan,
	Jan Beulich, Keir Fraser, Boris Ostrovsky

On Mon, 2015-09-28 at 15:13 +0800, Haozhong Zhang wrote:
> This patch adds an option 'vtsc_khz' to allow users to set vcpu's TSC
> rate in KHz. In the case that tsc_mode = 'default', the default value of
> 'vtsc_khz' option is the host TSC rate which is used when 'vtsc_khz'
> option is set to 0 or does not appear in the configuration. In all other
> cases of tsc_mode, 'vtsc_khz' option is just ignored.
> 
> Another purpose of adding this option is to keep vcpu's TSC rate across
> guest reboot. In existing code, a new domain is created from the
> configuration of the previous domain which was just rebooted. vcpu's TSC
> rate is not stored in the configuration and the host TSC rate is then
> used as vcpu's TSC rate. This works fine unless the previous domain was
> migrated from another host machine with a different host TSC rate than
> the current one.

I understand why this is necessary over a migration, but why is it
important to be able to retain the TSC rate across a reboot? What is the
usecase there?

> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> ---
>  tools/libxl/libxl_types.idl |  1 +
>  tools/libxl/libxl_x86.c     |  4 +++-
>  tools/libxl/xl_cmdimpl.c    | 22 ++++++++++++++++++++++

The documentation should be patched at the same time. At least the xl.cfg
manpage, but I think there is also a specific document about time and the
TSC which should also be updated.

Ian.

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 13/13] tools/libxl: Add 'vtsc_khz' option to set guest TSC rate
  2015-09-29  0:40     ` Haozhong Zhang
  2015-09-29  9:20       ` Wei Liu
@ 2015-09-29 10:07       ` Ian Campbell
  2015-09-29 10:33         ` Wei Liu
  2015-09-29 12:57         ` Haozhong Zhang
  1 sibling, 2 replies; 117+ messages in thread
From: Ian Campbell @ 2015-09-29 10:07 UTC (permalink / raw)
  To: Haozhong Zhang, Wei Liu
  Cc: Kevin Tian, Keir Fraser, Suravee Suthikulpanit,
	Stefano Stabellini, Jun Nakajima, Andrew Cooper, Ian Jackson,
	xen-devel, Aravind Gopalakrishnan, Jan Beulich, Boris Ostrovsky

On Tue, 2015-09-29 at 08:40 +0800, Haozhong Zhang wrote:
> > > @@ -1462,6 +1462,28 @@ static void parse_config_data(const char *config_source,
> > >          }
> > >      }
> > >  
> > > +    /* "vtsc_khz" option works only if "tsc_mode" option is
> > > +     * "default". In this case, if "vtsc_khz" option is set to 0, we
> > > +     * will reset it to the host TSC rate. In all other cases, we just
> > > +     * ignore any given value and always set it to 0.
> > > +     */
> > > +    if (!xlu_cfg_get_long(config, "vtsc_khz", &l, 0))
> > > +        b_info->vtsc_khz = l;
> > > +    if (b_info->tsc_mode == LIBXL_TSC_MODE_DEFAULT) {
> > > +        if (b_info->vtsc_khz == 0) {
> > > +            libxl_physinfo physinfo;
> > > +            if (!libxl_get_physinfo(ctx, &physinfo))
> > > +                b_info->vtsc_khz = physinfo.cpu_khz;
> > > +            else
> > > +                fprintf(stderr, "WARNING: cannot get host TSC rate.\n");
> > > +        }
> > 
> > And this hunk (the decision making bit) should be in libxl, not xl.
> > 
> > > Consider there are other toolstacks that use libxl, say libvirt.
> > 
> 
> Good to know this.
> 
> I'm going to move it to libxl__arch_domain_create() where
> b_info->vtsc_khz is used.

libxl__domain_build_info_setdefault would be more usual, I think.

Rather than calling get_physinfo to give vtsc_khz a specific value instead
of zero, can we not leave it as zero, skip the xc_domain_set_tsc_info()
call in that case, and let the hypervisor default to using the host rate?

Then the check in libxl just becomes "is vtsc_khz non-zero and is tsc_mode
not DEFAULT".
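
A minimal C sketch of that check, under stated assumptions: the enum
tsc_mode and enum vtsc_action types and the classify_vtsc_khz() helper are
hypothetical and are not part of libxl. Whether a mismatch is reported as an
error or the value is silently ignored, and what the caller still has to pass
to the hypervisor for tsc_mode, is left open here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of libxl's TSC modes; not the real libxl enum. */
enum tsc_mode {
    TSC_MODE_DEFAULT,
    TSC_MODE_ALWAYS_EMULATE,
    TSC_MODE_NATIVE,
    TSC_MODE_NATIVE_PARAVIRT,
};

/* Hypothetical outcome of checking the "vtsc_khz" setting. */
enum vtsc_action {
    VTSC_USE_HOST_RATE,   /* vtsc_khz == 0: leave the rate to the hypervisor */
    VTSC_SET_RATE,        /* pass vtsc_khz on to the hypervisor */
    VTSC_MISMATCH,        /* non-zero rate with a non-default tsc_mode */
};

static enum vtsc_action classify_vtsc_khz(enum tsc_mode mode, uint32_t vtsc_khz)
{
    /* Guard against values outside the hypothetical enum. */
    assert((unsigned int)mode <= TSC_MODE_NATIVE_PARAVIRT);

    /* Zero means "not set": no need to look up the host rate. */
    if (vtsc_khz == 0)
        return VTSC_USE_HOST_RATE;

    /* "is vtsc_khz non-zero and is tsc_mode not DEFAULT" */
    if (mode != TSC_MODE_DEFAULT)
        return VTSC_MISMATCH;

    return VTSC_SET_RATE;
}
```

With this shape, the toolstack never needs the host TSC rate itself; it only
forwards a rate when the user supplied one.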

Don't forget to switch from fprintf to the proper log macros.

Ian.

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 13/13] tools/libxl: Add 'vtsc_khz' option to set guest TSC rate
  2015-09-29 10:04   ` Ian Campbell
@ 2015-09-29 10:13     ` Haozhong Zhang
  2015-09-29 10:24       ` Andrew Cooper
  0 siblings, 1 reply; 117+ messages in thread
From: Haozhong Zhang @ 2015-09-29 10:13 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Kevin Tian, Wei Liu, Suravee Suthikulpanit, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Ian Jackson, xen-devel,
	Aravind Gopalakrishnan, Jan Beulich, Keir Fraser,
	Boris Ostrovsky

On Tue, Sep 29, 2015 at 11:04:14AM +0100, Ian Campbell wrote:
> On Mon, 2015-09-28 at 15:13 +0800, Haozhong Zhang wrote:
> > This patch adds an option 'vtsc_khz' to allow users to set vcpu's TSC
> > rate in KHz. In the case that tsc_mode = 'default', the default value of
> > 'vtsc_khz' option is the host TSC rate which is used when 'vtsc_khz'
> > option is set to 0 or does not appear in the configuration. In all other
> > cases of tsc_mode, 'vtsc_khz' option is just ignored.
> > 
> > Another purpose of adding this option is to keep vcpu's TSC rate across
> > guest reboot. In existing code, a new domain is created from the
> > configuration of the previous domain which was just rebooted. vcpu's TSC
> > rate is not stored in the configuration and the host TSC rate is then
> > used as vcpu's TSC rate. This works fine unless the previous domain was
> > migrated from another host machine with a different host TSC rate than
> > the current one.
> 
> I understand why this is necessary over a migration, but why is it
> important to be able to retain the TSC rate across a reboot? What is the
> usecase there?
>

No usecase so far. Is 'making a virtual machine more like a physical
machine' reasonable here? (I suppose a physical machine retains TSC
rate across reboot)

> > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> > ---
> >  tools/libxl/libxl_types.idl |  1 +
> >  tools/libxl/libxl_x86.c     |  4 +++-
> >  tools/libxl/xl_cmdimpl.c    | 22 ++++++++++++++++++++++
> 
> The documentation should be patched at the same time. At least the xl.cfg
> manpage, but I think there is also a specific document about time and the
                                     ~~~~~~~~~~~~~~~~~~~
I think it's docs/misc/tscmode.txt? Will update it as well.

- Haozhong

> TSC which should also be updated.
>
> Ian.

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 13/13] tools/libxl: Add 'vtsc_khz' option to set guest TSC rate
  2015-09-29  9:50         ` Haozhong Zhang
@ 2015-09-29 10:24           ` Julien Grall
  0 siblings, 0 replies; 117+ messages in thread
From: Julien Grall @ 2015-09-29 10:24 UTC (permalink / raw)
  To: Wei Liu, xen-devel, Ian Jackson, Stefano Stabellini,
	Ian Campbell, Keir Fraser, Jan Beulich, Andrew Cooper,
	Boris Ostrovsky, Suravee Suthikulpanit, Aravind Gopalakrishnan,
	Jun Nakajima, Kevin Tian

On 29/09/15 10:50, Haozhong Zhang wrote:
> On Tue, Sep 29, 2015 at 10:20:21AM +0100, Wei Liu wrote:
>> On Tue, Sep 29, 2015 at 08:40:23AM +0800, Haozhong Zhang wrote:
>>> On Mon, Sep 28, 2015 at 03:19:25PM +0100, Wei Liu wrote:
>>>> On Mon, Sep 28, 2015 at 03:13:58PM +0800, Haozhong Zhang wrote:
>>>>> This patch adds an option 'vtsc_khz' to allow users to set vcpu's TSC
>>>>> rate in KHz. In the case that tsc_mode = 'default', the default value of
>>>>> 'vtsc_khz' option is the host TSC rate which is used when 'vtsc_khz'
>>>>> option is set to 0 or does not appear in the configuration. In all other
>>>>> cases of tsc_mode, 'vtsc_khz' option is just ignored.
>>>>>
>>>>> Another purpose of adding this option is to keep vcpu's TSC rate across
>>>>> guest reboot. In existing code, a new domain is created from the
>>>>> configuration of the previous domain which was just rebooted. vcpu's TSC
>>>>> rate is not stored in the configuration and the host TSC rate is then
>>>>> used as vcpu's TSC rate. This works fine unless the previous domain was
>>>>> migrated from another host machine with a different host TSC rate than
>>>>> the current one.
>>>>>
>>>>> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
>>>>> ---
>>>>>  tools/libxl/libxl_types.idl |  1 +
>>>>>  tools/libxl/libxl_x86.c     |  4 +++-
>>>>>  tools/libxl/xl_cmdimpl.c    | 22 ++++++++++++++++++++++
>>>>>  3 files changed, 26 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
>>>>> index 9f6ec00..91cb0be 100644
>>>>> --- a/tools/libxl/libxl_types.idl
>>>>> +++ b/tools/libxl/libxl_types.idl
>>>>> @@ -413,6 +413,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
>>>>>      ("vcpu_soft_affinity", Array(libxl_bitmap, "num_vcpu_soft_affinity")),
>>>>>      ("numa_placement",  libxl_defbool),
>>>>>      ("tsc_mode",        libxl_tsc_mode),
>>>>> +    ("vtsc_khz",        uint32),
>>>>
>>>> This field should be in arch-specific substructure, i.e. "hvm".
>>>>
>>>
>>> Julien also pointed this out and suggested moving it to an
>>> arch-specific substructure. I'm going to add a new substructure
>>> "arch_x86" and move "vtsc_khz" there. Is this good for you?
>>>
>>
>> My initial thought was that this was a feature of HVM. I don't recollect
>> why I got that idea. I could be wrong.  Does it work with (or is it intended to
>> work with) PV too?  If yes, adding it to arch_x86 would be appropriate.
>> If not, hvm specific is good enough.  Please correct my
>> misunderstanding, I'm definitely no expert on x86.  
>>
> 
> No, it only works with HVM. So it should be in 'hvm'? Julien, what is
> your opinion?

I'm fine with having vtsc_khz directly in hvm.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 13/13] tools/libxl: Add 'vtsc_khz' option to set guest TSC rate
  2015-09-29 10:13     ` Haozhong Zhang
@ 2015-09-29 10:24       ` Andrew Cooper
  2015-09-29 10:28         ` Ian Campbell
  0 siblings, 1 reply; 117+ messages in thread
From: Andrew Cooper @ 2015-09-29 10:24 UTC (permalink / raw)
  To: Ian Campbell, xen-devel, Ian Jackson, Stefano Stabellini,
	Wei Liu, Keir Fraser, Jan Beulich, Boris Ostrovsky,
	Suravee Suthikulpanit, Aravind Gopalakrishnan, Jun Nakajima,
	Kevin Tian

On 29/09/15 11:13, Haozhong Zhang wrote:
> On Tue, Sep 29, 2015 at 11:04:14AM +0100, Ian Campbell wrote:
>> On Mon, 2015-09-28 at 15:13 +0800, Haozhong Zhang wrote:
>>> This patch adds an option 'vtsc_khz' to allow users to set vcpu's TSC
>>> rate in KHz. In the case that tsc_mode = 'default', the default value of
>>> 'vtsc_khz' option is the host TSC rate which is used when 'vtsc_khz'
>>> option is set to 0 or does not appear in the configuration. In all other
>>> cases of tsc_mode, 'vtsc_khz' option is just ignored.
>>>
>>> Another purpose of adding this option is to keep vcpu's TSC rate across
>>> guest reboot. In existing code, a new domain is created from the
>>> configuration of the previous domain which was just rebooted. vcpu's TSC
>>> rate is not stored in the configuration and the host TSC rate is then
>>> used as vcpu's TSC rate. This works fine unless the previous domain was
>>> migrated from another host machine with a different host TSC rate than
>>> the current one.
>> I understand why this is necessary over a migration, but why is it
>> important to be able to retain the TSC rate across a reboot? What is the
>> usecase there?
>>
> No usecase so far. Is 'making a virtual machine more like a physical
> machine' reasonable here? (I suppose a physical machine retains TSC
> rate across reboot)

There are situations such as altering firmware settings which can cause
the TSC rate to differ across reboot.  I don't see any reason to try and
maintain it across VM reboots.

~Andrew

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 12/13] vmx: Add a call-back to apply TSC scaling ratio to hardware
  2015-09-29 10:02         ` Haozhong Zhang
@ 2015-09-29 10:25           ` Andrew Cooper
  2015-09-29 13:59             ` Haozhong Zhang
  0 siblings, 1 reply; 117+ messages in thread
From: Andrew Cooper @ 2015-09-29 10:25 UTC (permalink / raw)
  To: Boris Ostrovsky, xen-devel, Ian Jackson, Stefano Stabellini,
	Ian Campbell, Wei Liu, Keir Fraser, Jan Beulich,
	Suravee Suthikulpanit, Aravind Gopalakrishnan, Jun Nakajima,
	Kevin Tian

On 29/09/15 11:02, Haozhong Zhang wrote:
> On Tue, Sep 29, 2015 at 10:33:14AM +0100, Andrew Cooper wrote:
>> On 29/09/15 02:07, Haozhong Zhang wrote:
>>> On Mon, Sep 28, 2015 at 12:02:08PM -0400, Boris Ostrovsky wrote:
>>>> On 09/28/2015 03:13 AM, Haozhong Zhang wrote:
>>>>> This patch adds a new call-back setup_tsc_scaling in struct
>>>>> hvm_function_table to apply the TSC scaling ratio to hardware. For VMX,
>>>>> it writes the TSC scaling ratio to VMCS field TSC_MULTIPLIER.
>>>>>
>>>>> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
>>>>> ---
>>>>>  xen/arch/x86/hvm/hvm.c        | 1 +
>>>>>  xen/arch/x86/hvm/svm/svm.c    | 5 +++++
>>>>>  xen/arch/x86/hvm/vmx/vmx.c    | 8 ++++++++
>>>>>  xen/include/asm-x86/hvm/hvm.h | 3 +++
>>>>>  4 files changed, 17 insertions(+)
>>>>>
>>>>> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
>>>>> index 3522d20..2d8a148 100644
>>>>> --- a/xen/arch/x86/hvm/hvm.c
>>>>> +++ b/xen/arch/x86/hvm/hvm.c
>>>>> @@ -376,6 +376,7 @@ void hvm_setup_tsc_scaling(struct vcpu *v)
>>>>>      }
>>>>>      v->arch.tsc_scaling_ratio = ratio;
>>>>> +    hvm_funcs.setup_tsc_scaling(v);
>>>>>  }
>>>>>  void hvm_set_guest_tsc_fixed(struct vcpu *v, u64 guest_tsc, u64 at_tsc)
>>>>> diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
>>>>> index 73bc863..d890c1f 100644
>>>>> --- a/xen/arch/x86/hvm/svm/svm.c
>>>>> +++ b/xen/arch/x86/hvm/svm/svm.c
>>>>> @@ -2236,6 +2236,10 @@ static void svm_invlpg_intercept(unsigned long vaddr)
>>>>>      svm_asid_g_invlpg(curr, vaddr);
>>>>>  }
>>>>> +static void svm_setup_tsc_scaling(struct vcpu *v)
>>>>> +{
>>>>> +}
>>>>> +
>>>> Should this be wrmsrl(MSR_AMD64_TSC_RATIO, v->arch.tsc_scaling_ratio) ?
>>>>
>>>> -boris
>>>>
>>> MSR_AMD64_TSC_RATIO is set in svm_ctxt_switch_to() before entering guest.
>>>
>>> For VMX, the ratio is set to a VMCS field TSC_MULTIPLIER and it's not
>>> necessary to set it every time entering guest. Therefore, I introduce
>>> the call-back setup_tsc_scaling() to do this. For SVM, as the ratio is
>>> set every time entering guest, I leave the SVM version of setup_tsc_scaling()
>>> empty.
>> VT-x has a per-VMCS scale, while SVM has a per-core MSR to adjust the
>> scale.  These do require different modification algorithms.
>>
> Yes, this is what I mean.
>
>> However, if there is any chance that any part of the system can update
>> the ratio while an SVM VCPU is in context (which appears to be the
>> case), then MSR_AMD64_TSC_RATIO needs updating synchronously, or it will
>> be deferred until the next full context switch which could be an
>> arbitrary time into the future.  This appears to be a latent bug in the
>> SVM side.
>>
> In my patch, tsc ratio is set only when
>  1. a domain is created (by arch_domain_create()),
>  2. a vcpu's state is reset (by hvm_vcpu_reset_state()),
>  3. a vcpu's context is restored (by hvm_load_cpu_ctxt()), or
>  4. through the hypercall XEN_DOMCTL_settscinfo.
>
> (Correct me if I'm wrong below)
>
> For the first 3 cases, vcpu is definitely not in context, so it's safe
> to set tsc ratio without any latent bug. For the last case,
> arch_do_domctl() pauses the domain before updating tsc ratio, so it's
> also safe.

That logic appears to be correct, which would suggest that there isn't
actually a latent bug.

In such a case, we would typically make the hvm_funcs pointer optional,
and omit an empty stub on the SVM side.
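
A minimal sketch of that pattern in C, with simplified stand-in types rather
than Xen's real definitions: the hw_tsc_multiplier field, the vmx_funcs and
svm_funcs tables, and the extra funcs/ratio parameters are assumptions made
so the example is self-contained. Only the hook name setup_tsc_scaling and
the function name hvm_setup_tsc_scaling() come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for Xen's struct vcpu (assumption). */
struct vcpu {
    uint64_t tsc_scaling_ratio;
    uint64_t hw_tsc_multiplier;   /* hypothetical model of the VMCS field */
};

struct hvm_function_table {
    /* Optional hook: NULL when the vendor code needs no explicit setup. */
    void (*setup_tsc_scaling)(struct vcpu *v);
};

/* In Xen this would be a VMCS write of TSC_MULTIPLIER. */
static void vmx_setup_tsc_scaling(struct vcpu *v)
{
    v->hw_tsc_multiplier = v->tsc_scaling_ratio;
}

static const struct hvm_function_table vmx_funcs = {
    .setup_tsc_scaling = vmx_setup_tsc_scaling,
};

/* SVM leaves the hook unset: the MSR is loaded on context switch. */
static const struct hvm_function_table svm_funcs = {
    .setup_tsc_scaling = NULL,
};

static void hvm_setup_tsc_scaling(const struct hvm_function_table *funcs,
                                  struct vcpu *v, uint64_t ratio)
{
    assert(funcs != NULL && v != NULL);

    v->tsc_scaling_ratio = ratio;

    /* Call the vendor hook only if one was registered. */
    if ( funcs->setup_tsc_scaling )
        funcs->setup_tsc_scaling(v);
}
```

The NULL check in the common code replaces the empty svm_setup_tsc_scaling()
stub, so the SVM side needs no change for this hook.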

~Andrew

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 13/13] tools/libxl: Add 'vtsc_khz' option to set guest TSC rate
  2015-09-29 10:24       ` Andrew Cooper
@ 2015-09-29 10:28         ` Ian Campbell
  2015-09-29 10:31           ` Andrew Cooper
  2015-09-29 13:53           ` Haozhong Zhang
  0 siblings, 2 replies; 117+ messages in thread
From: Ian Campbell @ 2015-09-29 10:28 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel, Ian Jackson, Stefano Stabellini,
	Wei Liu, Keir Fraser, Jan Beulich, Boris Ostrovsky,
	Suravee Suthikulpanit, Aravind Gopalakrishnan, Jun Nakajima,
	Kevin Tian

On Tue, 2015-09-29 at 11:24 +0100, Andrew Cooper wrote:
> On 29/09/15 11:13, Haozhong Zhang wrote:
> > On Tue, Sep 29, 2015 at 11:04:14AM +0100, Ian Campbell wrote:
> > > On Mon, 2015-09-28 at 15:13 +0800, Haozhong Zhang wrote:
> > > > This patch adds an option 'vtsc_khz' to allow users to set vcpu's
> > > > TSC
> > > > rate in KHz. In the case that tsc_mode = 'default', the default
> > > > value of
> > > > 'vtsc_khz' option is the host TSC rate which is used when
> > > > 'vtsc_khz'
> > > > option is set to 0 or does not appear in the configuration. In all
> > > > other
> > > > cases of tsc_mode, 'vtsc_khz' option is just ignored.
> > > > 
> > > > Another purpose of adding this option is to keep vcpu's TSC rate
> > > > across
> > > > guest reboot. In existing code, a new domain is created from the
> > > > configuration of the previous domain which was just rebooted.
> > > > vcpu's TSC
> > > > rate is not stored in the configuration and the host TSC rate is
> > > > > then
> > > > used as vcpu's TSC rate. This works fine unless the previous domain
> > > > was
> > > > migrated from another host machine with a different host TSC rate
> > > > than
> > > > the current one.
> > > I understand why this is necessary over a migration, but why is it
> > > important to be able to retain the TSC rate across a reboot? What is
> > > the
> > > usecase there?
> > > 
> > No usecase so far. Is 'making a virtual machine more like a physical
> > machine' reasonable here? (I suppose a physical machine retains TSC
> > rate across reboot)
> 
> There are situations such as altering firmware settings which can cause
> the TSC rate to differ across reboot.  I don't see any reason to try and
> maintain it across VM reboots.

Right. If it happens to come for free as a side effect of making it work
for migration then fine.

But it seems to me that the TSC rate could/should be in the hypervisor's save
blob and require no interaction with the toolstack once it is latched when
the domain is built.

Ian.

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 13/13] tools/libxl: Add 'vtsc_khz' option to set guest TSC rate
  2015-09-29 10:28         ` Ian Campbell
@ 2015-09-29 10:31           ` Andrew Cooper
  2015-09-29 13:53           ` Haozhong Zhang
  1 sibling, 0 replies; 117+ messages in thread
From: Andrew Cooper @ 2015-09-29 10:31 UTC (permalink / raw)
  To: Ian Campbell, xen-devel, Ian Jackson, Stefano Stabellini,
	Wei Liu, Keir Fraser, Jan Beulich, Boris Ostrovsky,
	Suravee Suthikulpanit, Aravind Gopalakrishnan, Jun Nakajima,
	Kevin Tian

On 29/09/15 11:28, Ian Campbell wrote:
> On Tue, 2015-09-29 at 11:24 +0100, Andrew Cooper wrote:
>> On 29/09/15 11:13, Haozhong Zhang wrote:
>>> On Tue, Sep 29, 2015 at 11:04:14AM +0100, Ian Campbell wrote:
>>>> On Mon, 2015-09-28 at 15:13 +0800, Haozhong Zhang wrote:
>>>>> This patch adds an option 'vtsc_khz' to allow users to set vcpu's
>>>>> TSC
>>>>> rate in KHz. In the case that tsc_mode = 'default', the default
>>>>> value of
>>>>> 'vtsc_khz' option is the host TSC rate which is used when
>>>>> 'vtsc_khz'
>>>>> option is set to 0 or does not appear in the configuration. In all
>>>>> other
>>>>> cases of tsc_mode, 'vtsc_khz' option is just ignored.
>>>>>
>>>>> Another purpose of adding this option is to keep vcpu's TSC rate
>>>>> across
>>>>> guest reboot. In existing code, a new domain is created from the
>>>>> configuration of the previous domain which was just rebooted.
>>>>> vcpu's TSC
>>>>> rate is not stored in the configuration and the host TSC rate is
>>>>> the
>>>>> used as vcpu's TSC rate. This works fine unless the previous domain
>>>>> was
>>>>> migrated from another host machine with a different host TSC rate
>>>>> than
>>>>> the current one.
>>>> I understand why this is necessary over a migration, but why is it
>>>> important to be able to retain the TSC rate across a reboot? What is
>>>> the
>>>> usecase there?
>>>>
>>> No usecase so far. Is 'making a virtual machine more like a physical
>>> machine' reasonable here? (I suppose a physical machine retains TSC
>>> rate across reboot)
>> There are situations such as altering firmware settings which can cause
>> the TSC rate to differ across reboot.  I don't see any reason to try and
>> maintain it across VM reboots.
> Right. If it happens to come for free as a side effect of making it work
> for migration then fine.
>
> But it seems to me that tsc rate could/should be in the hypervisors save
> blob and require no interaction with the toolstack once it is latched when
> the domain is built.

There are a lot of blobs which fall into this category.  Others are
cpuid policy and guest MSRs.  I have a long-term plan to fix them, which
is progressing very slowly.

~Andrew

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 13/13] tools/libxl: Add 'vtsc_khz' option to set guest TSC rate
  2015-09-29 10:07       ` Ian Campbell
@ 2015-09-29 10:33         ` Wei Liu
  2015-09-29 12:57         ` Haozhong Zhang
  1 sibling, 0 replies; 117+ messages in thread
From: Wei Liu @ 2015-09-29 10:33 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Haozhong Zhang, Kevin Tian, Keir Fraser, Suravee Suthikulpanit,
	Stefano Stabellini, Jun Nakajima, Andrew Cooper, Ian Jackson,
	xen-devel, Aravind Gopalakrishnan, Jan Beulich, Wei Liu,
	Boris Ostrovsky

On Tue, Sep 29, 2015 at 11:07:17AM +0100, Ian Campbell wrote:
> On Tue, 2015-09-29 at 08:40 +0800, Haozhong Zhang wrote:
> > > > @@ -1462,6 +1462,28 @@ static void parse_config_data(const char *config_source,
> > > >          }
> > > >      }
> > > >  
> > > > +    /* "vtsc_khz" option works only if "tsc_mode" option is
> > > > +     * "default". In this case, if "vtsc_khz" option is set to 0, we
> > > > +     * will reset it to the host TSC rate. In all other cases, we just
> > > > +     * ignore any given value and always set it to 0.
> > > > +     */
> > > > +    if (!xlu_cfg_get_long(config, "vtsc_khz", &l, 0))
> > > > +        b_info->vtsc_khz = l;
> > > > +    if (b_info->tsc_mode == LIBXL_TSC_MODE_DEFAULT) {
> > > > +        if (b_info->vtsc_khz == 0) {
> > > > +            libxl_physinfo physinfo;
> > > > +            if (!libxl_get_physinfo(ctx, &physinfo))
> > > > +                b_info->vtsc_khz = physinfo.cpu_khz;
> > > > +            else
> > > > +                fprintf(stderr, "WARNING: cannot get host TSC rate.\n");
> > > > +        }
> > > 
> > > And this hunk (the decision making bit) should be in libxl, not xl.
> > > 
> > > Consider there are other toolstack that uses libxl, say libvirt.
> > > 
> > 
> > Good to know this.
> > 
> > I'm going to move it to libxl__arch_domain_create() where
> > b_info->vtsc_khz is used.
> 
> libxl__domain_build_info_setdefault would be more usual, I think.
> 

To avoid conflicting suggestions: I think Ian is right.
I retract my previous comment.

> Rather than calling get_physinfo in order to give vtsc_khz a specific value
> instead of zero can we not leave it as zero and just not call
>  xc_domain_set_tsc_info() in that case and let the hypervisor default to
> using the host rate?
> 
> Then the check in libxl just becomes "is vtsc_khz non-zero and is tsc_mode
> not DEFAULT".
> 
> Don't forget to switch from fprintf to the proper log macros.
> 
> Ian.

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 13/13] tools/libxl: Add 'vtsc_khz' option to set guest TSC rate
  2015-09-29 10:07       ` Ian Campbell
  2015-09-29 10:33         ` Wei Liu
@ 2015-09-29 12:57         ` Haozhong Zhang
  1 sibling, 0 replies; 117+ messages in thread
From: Haozhong Zhang @ 2015-09-29 12:57 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Kevin Tian, Keir Fraser, Suravee Suthikulpanit,
	Stefano Stabellini, Jun Nakajima, Andrew Cooper, Ian Jackson,
	xen-devel, Aravind Gopalakrishnan, Jan Beulich, Wei Liu,
	Boris Ostrovsky

On Tue, Sep 29, 2015 at 11:07:17AM +0100, Ian Campbell wrote:
> On Tue, 2015-09-29 at 08:40 +0800, Haozhong Zhang wrote:
> > > > @@ -1462,6 +1462,28 @@ static void parse_config_data(const char *config_source,
> > > >          }
> > > >      }
> > > >  
> > > > +    /* "vtsc_khz" option works only if "tsc_mode" option is
> > > > +     * "default". In this case, if "vtsc_khz" option is set to 0, we
> > > > +     * will reset it to the host TSC rate. In all other cases, we just
> > > > +     * ignore any given value and always set it to 0.
> > > > +     */
> > > > +    if (!xlu_cfg_get_long(config, "vtsc_khz", &l, 0))
> > > > +        b_info->vtsc_khz = l;
> > > > +    if (b_info->tsc_mode == LIBXL_TSC_MODE_DEFAULT) {
> > > > +        if (b_info->vtsc_khz == 0) {
> > > > +            libxl_physinfo physinfo;
> > > > +            if (!libxl_get_physinfo(ctx, &physinfo))
> > > > +                b_info->vtsc_khz = physinfo.cpu_khz;
> > > > +            else
> > > > +                fprintf(stderr, "WARNING: cannot get host TSC rate.\n");
> > > > +        }
> > > 
> > > And this hunk (the decision making bit) should be in libxl, not xl.
> > > 
> > > Consider there are other toolstack that uses libxl, say libvirt.
> > > 
> > 
> > Good to know this.
> > 
> > I'm going to move it to libxl__arch_domain_create() where
> > b_info->vtsc_khz is used.
> 
> libxl__domain_build_info_setdefault would be more usual, I think.
>

Yes, this is a better place.

> Rather than calling get_physinfo in order to give vtsc_khz a specific value
> instead of zero can we not leave it as zero and just not call
>  xc_domain_set_tsc_info() in that case and let the hypervisor default to
> using the host rate?
>

Alternatively, I can leave vtsc_khz as zero if it's not set in xl.cfg
and, if xc_domain_set_tsc_info() passes a zero vtsc_khz to the hypervisor,
the latter will just use the host TSC rate (which was the original logic),

> Then the check in libxl just becomes "is vtsc_khz non-zero and is tsc_mode
> not DEFAULT".
>

... and use this check in libxl__domain_build_info_setdefault().
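
A minimal sketch of what that check might look like, assuming a
non-zero vtsc_khz under a non-default tsc_mode is simply reset to 0
(as the comment in the original hunk describes). The types and the
function name vtsc_khz_setdefault() are hypothetical stand-ins, not
the real libxl definitions:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the libxl types; not the real definitions. */
enum tsc_mode { TSC_MODE_DEFAULT, TSC_MODE_ALWAYS_EMULATE, TSC_MODE_NATIVE };

struct build_info {
    enum tsc_mode tsc_mode;
    uint32_t vtsc_khz;          /* 0: unset, hypervisor uses the host rate */
};

/*
 * Drop a vtsc_khz that cannot take effect. Returns true if a user-supplied
 * value was discarded, so the caller can log a warning.
 */
bool vtsc_khz_setdefault(struct build_info *b)
{
    if (b->vtsc_khz != 0 && b->tsc_mode != TSC_MODE_DEFAULT) {
        b->vtsc_khz = 0;
        return true;
    }
    return false;
}
```

A zero vtsc_khz is left untouched in every mode, so no physinfo lookup
is needed on the toolstack side.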

> Don't forget to switch from fprintf to the proper log macros.
>

Of course. Thanks for the reminder!

- Haozhong

> Ian.

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 13/13] tools/libxl: Add 'vtsc_khz' option to set guest TSC rate
  2015-09-29 10:28         ` Ian Campbell
  2015-09-29 10:31           ` Andrew Cooper
@ 2015-09-29 13:53           ` Haozhong Zhang
  2015-09-29 13:56             ` Andrew Cooper
  1 sibling, 1 reply; 117+ messages in thread
From: Haozhong Zhang @ 2015-09-29 13:53 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Kevin Tian, Keir Fraser, Suravee Suthikulpanit,
	Stefano Stabellini, Jun Nakajima, Andrew Cooper, Ian Jackson,
	xen-devel, Aravind Gopalakrishnan, Jan Beulich, Wei Liu,
	Boris Ostrovsky

On Tue, Sep 29, 2015 at 11:28:38AM +0100, Ian Campbell wrote:
> On Tue, 2015-09-29 at 11:24 +0100, Andrew Cooper wrote:
> > On 29/09/15 11:13, Haozhong Zhang wrote:
> > > On Tue, Sep 29, 2015 at 11:04:14AM +0100, Ian Campbell wrote:
> > > > On Mon, 2015-09-28 at 15:13 +0800, Haozhong Zhang wrote:
> > > > > This patch adds an option 'vtsc_khz' to allow users to set vcpu's
> > > > > TSC
> > > > > rate in KHz. In the case that tsc_mode = 'default', the default
> > > > > value of
> > > > > 'vtsc_khz' option is the host TSC rate which is used when
> > > > > 'vtsc_khz'
> > > > > option is set to 0 or does not appear in the configuration. In all
> > > > > other
> > > > > cases of tsc_mode, 'vtsc_khz' option is just ignored.
> > > > > 
> > > > > Another purpose of adding this option is to keep vcpu's TSC rate
> > > > > across
> > > > > guest reboot. In existing code, a new domain is created from the
> > > > > configuration of the previous domain which was just rebooted.
> > > > > vcpu's TSC
> > > > > rate is not stored in the configuration and the host TSC rate is
> > > > > the
> > > > > used as vcpu's TSC rate. This works fine unless the previous domain
> > > > > was
> > > > > migrated from another host machine with a different host TSC rate
> > > > > than
> > > > > the current one.
> > > > I understand why this is necessary over a migration, but why is it
> > > > important to be able to retain the TSC rate across a reboot? What is
> > > > the
> > > > usecase there?
> > > > 
> > > No usecase so far. Is 'making a virtual machine more like a physical
> > > machine' reasonable here? (I suppose a physical machine retains TSC
> > > rate across reboot)
> > 
> > There are situations such as altering firmware settings which can cause
> > the TSC rate to differ across reboot.  I don't see any reason to try and
> > maintain it across VM reboots.
> 
> Right. If it happens to come for free as a side effect of making it work
> for migration then fine.
> 
> But it seems to me that tsc rate could/should be in the hypervisors save
> blob and require no interaction with the toolstack once it is latched when
> the domain is built.
> 
> Ian.
>

It seems I don't need vtsc_khz at all if retaining the TSC rate across
reboot is unnecessary (or even wrong). Migration of the TSC rate is
already handled by write_tsc_info() and handle_tsc_info() without this
patch. I'll check whether "xl save/restore" also goes through this path.
If so, I think vtsc_khz can be removed.

- Haozhong

> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 13/13] tools/libxl: Add 'vtsc_khz' option to set guest TSC rate
  2015-09-29 13:53           ` Haozhong Zhang
@ 2015-09-29 13:56             ` Andrew Cooper
  2015-09-29 14:01               ` Haozhong Zhang
  0 siblings, 1 reply; 117+ messages in thread
From: Andrew Cooper @ 2015-09-29 13:56 UTC (permalink / raw)
  To: Ian Campbell, xen-devel, Ian Jackson, Stefano Stabellini,
	Wei Liu, Keir Fraser, Jan Beulich, Boris Ostrovsky,
	Suravee Suthikulpanit, Aravind Gopalakrishnan, Jun Nakajima,
	Kevin Tian

On 29/09/15 14:53, Haozhong Zhang wrote:
> On Tue, Sep 29, 2015 at 11:28:38AM +0100, Ian Campbell wrote:
>> On Tue, 2015-09-29 at 11:24 +0100, Andrew Cooper wrote:
>>> On 29/09/15 11:13, Haozhong Zhang wrote:
>>>> On Tue, Sep 29, 2015 at 11:04:14AM +0100, Ian Campbell wrote:
>>>>> On Mon, 2015-09-28 at 15:13 +0800, Haozhong Zhang wrote:
>>>>>> This patch adds an option 'vtsc_khz' to allow users to set vcpu's
>>>>>> TSC
>>>>>> rate in KHz. In the case that tsc_mode = 'default', the default
>>>>>> value of
>>>>>> 'vtsc_khz' option is the host TSC rate which is used when
>>>>>> 'vtsc_khz'
>>>>>> option is set to 0 or does not appear in the configuration. In all
>>>>>> other
>>>>>> cases of tsc_mode, 'vtsc_khz' option is just ignored.
>>>>>>
>>>>>> Another purpose of adding this option is to keep vcpu's TSC rate
>>>>>> across
>>>>>> guest reboot. In existing code, a new domain is created from the
>>>>>> configuration of the previous domain which was just rebooted.
>>>>>> vcpu's TSC
>>>>>> rate is not stored in the configuration and the host TSC rate is
>>>>>> the
>>>>>> used as vcpu's TSC rate. This works fine unless the previous domain
>>>>>> was
>>>>>> migrated from another host machine with a different host TSC rate
>>>>>> than
>>>>>> the current one.
>>>>> I understand why this is necessary over a migration, but why is it
>>>>> important to be able to retain the TSC rate across a reboot? What is
>>>>> the
>>>>> usecase there?
>>>>>
>>>> No usecase so far. Is 'making a virtual machine more like a physical
>>>> machine' reasonable here? (I suppose a physical machine retains TSC
>>>> rate across reboot)
>>> There are situations such as altering firmware settings which can cause
>>> the TSC rate to differ across reboot.  I don't see any reason to try and
>>> maintain it across VM reboots.
>> Right. If it happens to come for free as a side effect of making it work
>> for migration then fine.
>>
>> But it seems to me that tsc rate could/should be in the hypervisors save
>> blob and require no interaction with the toolstack once it is latched when
>> the domain is built.
>>
>> Ian.
>>
> Seemingly I don't need vtsc_khz at all if retaining tsc rate across
> reboot is unnecessary (or even wrong). The migration of tsc rate is
> already done through write_tsc_info() and handle_tsc_info() w/o this
> patch. I'll check if "xl save/restore" also go through this path. If
> so, I think vtsc_khz can be removed.

I can confirm from my rewrite of migration that tsc info is explicitly
saved and restored in both PV and HVM migration.

~Andrew

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 12/13] vmx: Add a call-back to apply TSC scaling ratio to hardware
  2015-09-29 10:25           ` Andrew Cooper
@ 2015-09-29 13:59             ` Haozhong Zhang
  0 siblings, 0 replies; 117+ messages in thread
From: Haozhong Zhang @ 2015-09-29 13:59 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Ian Jackson, xen-devel, Aravind Gopalakrishnan,
	Jan Beulich, Keir Fraser, Boris Ostrovsky, Suravee Suthikulpanit

On Tue, Sep 29, 2015 at 11:25:56AM +0100, Andrew Cooper wrote:
> On 29/09/15 11:02, Haozhong Zhang wrote:
> > On Tue, Sep 29, 2015 at 10:33:14AM +0100, Andrew Cooper wrote:
> >> On 29/09/15 02:07, Haozhong Zhang wrote:
> >>> On Mon, Sep 28, 2015 at 12:02:08PM -0400, Boris Ostrovsky wrote:
> >>>> On 09/28/2015 03:13 AM, Haozhong Zhang wrote:
> >>>>> This patch adds a new call-back setup_tsc_scaling in struct
> >>>>> hvm_function_table to apply the TSC scaling ratio to hardware. For VMX,
> >>>>> it writes the TSC scaling ratio to VMCS field TSC_MULTIPLIER.
> >>>>>
> >>>>> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> >>>>> ---
> >>>>>  xen/arch/x86/hvm/hvm.c        | 1 +
> >>>>>  xen/arch/x86/hvm/svm/svm.c    | 5 +++++
> >>>>>  xen/arch/x86/hvm/vmx/vmx.c    | 8 ++++++++
> >>>>>  xen/include/asm-x86/hvm/hvm.h | 3 +++
> >>>>>  4 files changed, 17 insertions(+)
> >>>>>
> >>>>> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> >>>>> index 3522d20..2d8a148 100644
> >>>>> --- a/xen/arch/x86/hvm/hvm.c
> >>>>> +++ b/xen/arch/x86/hvm/hvm.c
> >>>>> @@ -376,6 +376,7 @@ void hvm_setup_tsc_scaling(struct vcpu *v)
> >>>>>      }
> >>>>>      v->arch.tsc_scaling_ratio = ratio;
> >>>>> +    hvm_funcs.setup_tsc_scaling(v);
> >>>>>  }
> >>>>>  void hvm_set_guest_tsc_fixed(struct vcpu *v, u64 guest_tsc, u64 at_tsc)
> >>>>> diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
> >>>>> index 73bc863..d890c1f 100644
> >>>>> --- a/xen/arch/x86/hvm/svm/svm.c
> >>>>> +++ b/xen/arch/x86/hvm/svm/svm.c
> >>>>> @@ -2236,6 +2236,10 @@ static void svm_invlpg_intercept(unsigned long vaddr)
> >>>>>      svm_asid_g_invlpg(curr, vaddr);
> >>>>>  }
> >>>>> +static void svm_setup_tsc_scaling(struct vcpu *v)
> >>>>> +{
> >>>>> +}
> >>>>> +
> >>>> Should this be wrmsrl(MSR_AMD64_TSC_RATIO, v->arch.tsc_scaling_ratio) ?
> >>>>
> >>>> -boris
> >>>>
> >>> MSR_AMD64_TSC_RATIO is set in svm_ctxt_switch_to() before entering guest.
> >>>
> >>> For VMX, the ratio is set to a VMCS field TSC_MULTIPLIER and it's not
> >>> necessary to set it every time entering guest. Therefore, I introduce
> >>> the call-back setup_tsc_scaling() to do this. For SVM, as the ratio is
> >>> set every time entering guest, I leave the SVM version of setup_tsc_scaling()
> >>> empty.
> >> VT-x has a per-VMCS scale, while SVM has a per-core MSR to adjust the
> >> scale.  These do require different modification algorithms.
> >>
> > Yes, this is what I mean.
> >
> >> However, if there is any chance that any part of the system can update
> >> the ratio while an SVM VCPU is in context (which appears to be the
> >> case), then MSR_AMD64_TSC_RATIO needs updating synchronously, or it will
> >> be deferred until the next full context switch which could be an
> >> arbitrary time into the future.  This appears to be a latent bug in the
> >> SVM side.
> >>
> > In my patch, tsc ratio is set only when
> >  1. a domain is created (by arch_domain_create()),
> >  2. a vcpu's state is reset (by hvm_vcpu_reset_state()),
> >  3. a vcpu's context is restored (by hvm_load_cpu_ctxt()), or
> >  4. through the hypercall XEN_DOMCTL_settscinfo.
> >
> > (Correct me if I'm wrong below)
> >
> > For the first 3 cases, vcpu is definitely not in context, so it's safe
> > to set tsc ratio without any latent bug. For the last case,
> > arch_do_domctl() pauses the domain before updating tsc ratio, so it's
> > also safe.
> 
> That logic appears to be correct, which would suggest that there isn't
> actually a latent bug.
> 
> In such a case, we would typically make the hvm_funcs pointer optional,
> and omit an empty stub on the SVM side.
>

Yes, I'll add the following check in hvm_setup_tsc_scaling():

  if ( !hvm_funcs.setup_tsc_scaling )
      return;
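
For illustration, a self-contained sketch of the optional-hook pattern,
assuming the ratio is still recorded in the vcpu when no hook is
installed. The struct layouts, the hw_multiplier field (standing in for
the VMCS TSC_MULTIPLIER write) and the extra parameters are hypothetical
simplifications, not the actual Xen definitions:

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for Xen's struct vcpu. */
struct vcpu {
    uint64_t tsc_scaling_ratio;
    uint64_t hw_multiplier;     /* models the VMCS TSC_MULTIPLIER field */
};

struct hvm_function_table {
    /* Optional: NULL when the vendor code needs no per-vcpu setup. */
    void (*setup_tsc_scaling)(struct vcpu *v);
};

/* VMX-style hook: push the ratio into the (modelled) hardware field. */
void vmx_setup_tsc_scaling(struct vcpu *v)
{
    v->hw_multiplier = v->tsc_scaling_ratio;
}

void hvm_setup_tsc_scaling(const struct hvm_function_table *funcs,
                           struct vcpu *v, uint64_t ratio)
{
    v->tsc_scaling_ratio = ratio;

    /* SVM would leave the pointer NULL and reload its MSR on context switch. */
    if ( !funcs->setup_tsc_scaling )
        return;

    funcs->setup_tsc_scaling(v);
}
```

With this shape the empty svm_setup_tsc_scaling() stub can be dropped
entirely.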

- Haozhong

> ~Andrew

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 13/13] tools/libxl: Add 'vtsc_khz' option to set guest TSC rate
  2015-09-29 13:56             ` Andrew Cooper
@ 2015-09-29 14:01               ` Haozhong Zhang
  2015-09-29 14:37                 ` Ian Campbell
  0 siblings, 1 reply; 117+ messages in thread
From: Haozhong Zhang @ 2015-09-29 14:01 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Ian Jackson, xen-devel, Aravind Gopalakrishnan,
	Jan Beulich, Keir Fraser, Boris Ostrovsky, Suravee Suthikulpanit

On Tue, Sep 29, 2015 at 02:56:12PM +0100, Andrew Cooper wrote:
> On 29/09/15 14:53, Haozhong Zhang wrote:
> > On Tue, Sep 29, 2015 at 11:28:38AM +0100, Ian Campbell wrote:
> >> On Tue, 2015-09-29 at 11:24 +0100, Andrew Cooper wrote:
> >>> On 29/09/15 11:13, Haozhong Zhang wrote:
> >>>> On Tue, Sep 29, 2015 at 11:04:14AM +0100, Ian Campbell wrote:
> >>>>> On Mon, 2015-09-28 at 15:13 +0800, Haozhong Zhang wrote:
> >>>>>> This patch adds an option 'vtsc_khz' to allow users to set vcpu's
> >>>>>> TSC
> >>>>>> rate in KHz. In the case that tsc_mode = 'default', the default
> >>>>>> value of
> >>>>>> 'vtsc_khz' option is the host TSC rate which is used when
> >>>>>> 'vtsc_khz'
> >>>>>> option is set to 0 or does not appear in the configuration. In all
> >>>>>> other
> >>>>>> cases of tsc_mode, 'vtsc_khz' option is just ignored.
> >>>>>>
> >>>>>> Another purpose of adding this option is to keep vcpu's TSC rate
> >>>>>> across
> >>>>>> guest reboot. In existing code, a new domain is created from the
> >>>>>> configuration of the previous domain which was just rebooted.
> >>>>>> vcpu's TSC
> >>>>>> rate is not stored in the configuration and the host TSC rate is
> >>>>>> the
> >>>>>> used as vcpu's TSC rate. This works fine unless the previous domain
> >>>>>> was
> >>>>>> migrated from another host machine with a different host TSC rate
> >>>>>> than
> >>>>>> the current one.
> >>>>> I understand why this is necessary over a migration, but why is it
> >>>>> important to be able to retain the TSC rate across a reboot? What is
> >>>>> the
> >>>>> usecase there?
> >>>>>
> >>>> No usecase so far. Is 'making a virtual machine more like a physical
> >>>> machine' reasonable here? (I suppose a physical machine retains TSC
> >>>> rate across reboot)
> >>> There are situations such as altering firmware settings which can cause
> >>> the TSC rate to differ across reboot.  I don't see any reason to try and
> >>> maintain it across VM reboots.
> >> Right. If it happens to come for free as a side effect of making it work
> >> for migration then fine.
> >>
> >> But it seems to me that tsc rate could/should be in the hypervisors save
> >> blob and require no interaction with the toolstack once it is latched when
> >> the domain is built.
> >>
> >> Ian.
> >>
> > Seemingly I don't need vtsc_khz at all if retaining tsc rate across
> > reboot is unnecessary (or even wrong). The migration of tsc rate is
> > already done through write_tsc_info() and handle_tsc_info() w/o this
> > patch. I'll check if "xl save/restore" also go through this path. If
> > so, I think vtsc_khz can be removed.
> 
> I can confirm from my rewrite of migration that tsc info is explicitly
> saved and restored in both PV and HVM migration.
>

Great! Thanks!

- Haozhong

> ~Andrew

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 13/13] tools/libxl: Add 'vtsc_khz' option to set guest TSC rate
  2015-09-29 14:01               ` Haozhong Zhang
@ 2015-09-29 14:37                 ` Ian Campbell
  2015-09-29 15:16                   ` Haozhong Zhang
  0 siblings, 1 reply; 117+ messages in thread
From: Ian Campbell @ 2015-09-29 14:37 UTC (permalink / raw)
  To: Haozhong Zhang, Andrew Cooper
  Cc: Kevin Tian, Wei Liu, Suravee Suthikulpanit, Stefano Stabellini,
	Jun Nakajima, Ian Jackson, xen-devel, Aravind Gopalakrishnan,
	Jan Beulich, Keir Fraser, Boris Ostrovsky

On Tue, 2015-09-29 at 22:01 +0800, Haozhong Zhang wrote:
> On Tue, Sep 29, 2015 at 02:56:12PM +0100, Andrew Cooper wrote:
> > On 29/09/15 14:53, Haozhong Zhang wrote:
> > > On Tue, Sep 29, 2015 at 11:28:38AM +0100, Ian Campbell wrote:
> > > > On Tue, 2015-09-29 at 11:24 +0100, Andrew Cooper wrote:
> > > > > On 29/09/15 11:13, Haozhong Zhang wrote:
> > > > > > On Tue, Sep 29, 2015 at 11:04:14AM +0100, Ian Campbell wrote:
> > > > > > > On Mon, 2015-09-28 at 15:13 +0800, Haozhong Zhang wrote:
> > > > > > > > This patch adds an option 'vtsc_khz' to allow users to set
> > > > > > > > vcpu's
> > > > > > > > TSC
> > > > > > > > rate in KHz. In the case that tsc_mode = 'default', the
> > > > > > > > default
> > > > > > > > value of
> > > > > > > > 'vtsc_khz' option is the host TSC rate which is used when
> > > > > > > > 'vtsc_khz'
> > > > > > > > option is set to 0 or does not appear in the configuration.
> > > > > > > > In all
> > > > > > > > other
> > > > > > > > cases of tsc_mode, 'vtsc_khz' option is just ignored.
> > > > > > > > 
> > > > > > > > Another purpose of adding this option is to keep vcpu's TSC
> > > > > > > > rate
> > > > > > > > across
> > > > > > > > guest reboot. In existing code, a new domain is created
> > > > > > > > from the
> > > > > > > > configuration of the previous domain which was just
> > > > > > > > rebooted.
> > > > > > > > vcpu's TSC
> > > > > > > > rate is not stored in the configuration and the host TSC
> > > > > > > > rate is
> > > > > > > > the
> > > > > > > > used as vcpu's TSC rate. This works fine unless the
> > > > > > > > previous domain
> > > > > > > > was
> > > > > > > > migrated from another host machine with a different host
> > > > > > > > TSC rate
> > > > > > > > than
> > > > > > > > the current one.
> > > > > > > I understand why this is necessary over a migration, but why
> > > > > > > is it
> > > > > > > important to be able to retain the TSC rate across a reboot?
> > > > > > > What is
> > > > > > > the
> > > > > > > usecase there?
> > > > > > > 
> > > > > > No usecase so far. Is 'making a virtual machine more like a
> > > > > > physical
> > > > > > machine' reasonable here? (I suppose a physical machine retains
> > > > > > TSC
> > > > > > rate across reboot)
> > > > > There are situations such as altering firmware settings which can
> > > > > cause
> > > > > the TSC rate to differ across reboot.  I don't see any reason to
> > > > > try and
> > > > > maintain it across VM reboots.
> > > > Right. If it happens to come for free as a side effect of making it
> > > > work
> > > > for migration then fine.
> > > > 
> > > > But it seems to me that tsc rate could/should be in the hypervisors
> > > > save
> > > > blob and require no interaction with the toolstack once it is
> > > > latched when
> > > > the domain is built.
> > > > 
> > > > Ian.
> > > > 
> > > Seemingly I don't need vtsc_khz at all if retaining tsc rate across
> > > reboot is unnecessary (or even wrong). The migration of tsc rate is
> > > already done through write_tsc_info() and handle_tsc_info() w/o this
> > > patch. I'll check if "xl save/restore" also go through this path. If
> > > so, I think vtsc_khz can be removed.
> > 
> > I can confirm from my rewrite of migration that tsc info is explicitly
> > saved and restored in both PV and HVM migration.
> > 
> 
> Great! Thanks!

In which case, unless someone has a concrete use case for manual
configuration of the TSC rate, I'll expect a v2 with no tools-side
changes in it.

Ian.

> 
> - Haozhong
> 
> > ~Andrew

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 13/13] tools/libxl: Add 'vtsc_khz' option to set guest TSC rate
  2015-09-29 14:37                 ` Ian Campbell
@ 2015-09-29 15:16                   ` Haozhong Zhang
  0 siblings, 0 replies; 117+ messages in thread
From: Haozhong Zhang @ 2015-09-29 15:16 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Kevin Tian, Keir Fraser, Suravee Suthikulpanit,
	Stefano Stabellini, Jun Nakajima, Andrew Cooper, Ian Jackson,
	xen-devel, Aravind Gopalakrishnan, Jan Beulich, Wei Liu,
	Boris Ostrovsky

On Tue, Sep 29, 2015 at 03:37:06PM +0100, Ian Campbell wrote:
> On Tue, 2015-09-29 at 22:01 +0800, Haozhong Zhang wrote:
> > On Tue, Sep 29, 2015 at 02:56:12PM +0100, Andrew Cooper wrote:
> > > On 29/09/15 14:53, Haozhong Zhang wrote:
> > > > On Tue, Sep 29, 2015 at 11:28:38AM +0100, Ian Campbell wrote:
> > > > > On Tue, 2015-09-29 at 11:24 +0100, Andrew Cooper wrote:
> > > > > > On 29/09/15 11:13, Haozhong Zhang wrote:
> > > > > > > On Tue, Sep 29, 2015 at 11:04:14AM +0100, Ian Campbell wrote:
> > > > > > > > On Mon, 2015-09-28 at 15:13 +0800, Haozhong Zhang wrote:
> > > > > > > > > This patch adds an option 'vtsc_khz' to allow users to set
> > > > > > > > > vcpu's
> > > > > > > > > TSC
> > > > > > > > > rate in KHz. In the case that tsc_mode = 'default', the
> > > > > > > > > default
> > > > > > > > > value of
> > > > > > > > > 'vtsc_khz' option is the host TSC rate which is used when
> > > > > > > > > 'vtsc_khz'
> > > > > > > > > option is set to 0 or does not appear in the configuration.
> > > > > > > > > In all
> > > > > > > > > other
> > > > > > > > > cases of tsc_mode, 'vtsc_khz' option is just ignored.
> > > > > > > > > 
> > > > > > > > > Another purpose of adding this option is to keep vcpu's TSC
> > > > > > > > > rate
> > > > > > > > > across
> > > > > > > > > guest reboot. In existing code, a new domain is created
> > > > > > > > > from the
> > > > > > > > > configuration of the previous domain which was just
> > > > > > > > > rebooted.
> > > > > > > > > vcpu's TSC
> > > > > > > > > rate is not stored in the configuration and the host TSC
> > > > > > > > > rate is
> > > > > > > > > the
> > > > > > > > > used as vcpu's TSC rate. This works fine unless the
> > > > > > > > > previous domain
> > > > > > > > > was
> > > > > > > > > migrated from another host machine with a different host
> > > > > > > > > TSC rate
> > > > > > > > > than
> > > > > > > > > the current one.
> > > > > > > > I understand why this is necessary over a migration, but why
> > > > > > > > is it
> > > > > > > > important to be able to retain the TSC rate across a reboot?
> > > > > > > > What is
> > > > > > > > the
> > > > > > > > usecase there?
> > > > > > > > 
> > > > > > > No usecase so far. Is 'making a virtual machine more like a
> > > > > > > physical
> > > > > > > machine' reasonable here? (I suppose a physical machine retains
> > > > > > > TSC
> > > > > > > rate across reboot)
> > > > > > There are situations such as altering firmware settings which can
> > > > > > cause
> > > > > > the TSC rate to differ across reboot.  I don't see any reason to
> > > > > > try and
> > > > > > maintain it across VM reboots.
> > > > > Right. If it happens to come for free as a side effect of making it
> > > > > work
> > > > > for migration then fine.
> > > > > 
> > > > > But it seems to me that tsc rate could/should be in the hypervisors
> > > > > save
> > > > > blob and require no interaction with the toolstack once it is
> > > > > latched when
> > > > > the domain is built.
> > > > > 
> > > > > Ian.
> > > > > 
> > > > Seemingly I don't need vtsc_khz at all if retaining tsc rate across
> > > > reboot is unnecessary (or even wrong). The migration of tsc rate is
> > > > already done through write_tsc_info() and handle_tsc_info() w/o this
> > > > patch. I'll check if "xl save/restore" also go through this path. If
> > > > so, I think vtsc_khz can be removed.
> > > 
> > > I can confirm from my rewrite of migration that tsc info is explicitly
> > > saved and restored in both PV and HVM migration.
> > > 
> > 
> > Great! Thanks!
> 
> In which case unless someone has a concrete use case for manual
> configuration of the tsc rate I guess I'll expect a v2 with no tools side
> in it.
>

This last patch should not be needed in v2.

> Ian.
> 
> > 
> > - Haozhong
> > 
> > > ~Andrew

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 01/13] x86/time.c: Use system time to calculate elapsed_nsec in tsc_get_info()
  2015-09-28  7:13 ` [PATCH 01/13] x86/time.c: Use system time to calculate elapsed_nsec in tsc_get_info() Haozhong Zhang
@ 2015-10-09  6:51   ` Jan Beulich
  2015-10-09 13:41     ` Boris Ostrovsky
                       ` (2 more replies)
  0 siblings, 3 replies; 117+ messages in thread
From: Jan Beulich @ 2015-10-09  6:51 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Kevin Tian, Keir Fraser, Suravee Suthikulpanit, Andrew Cooper,
	xen-devel, Aravind Gopalakrishnan, Jun Nakajima, Boris Ostrovsky

>>> On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
> When the TSC mode of a domain is TSC_MODE_DEFAULT and no TSC emulation
> is used, the existing tsc_get_info() calculates elapsed_nsec by scaling
> the host TSC with a ratio between guest TSC rate and
> nanoseconds. However, the result will be incorrect if the guest TSC rate
> differs from the host TSC rate. This patch fixes this problem by using
> the system time as elapsed_nsec.

For both this and patch 2, while at first glance (and taking into
account just the visible patch context) what you say seems to
make sense, the explanation is far from sufficient, especially when
looking at the function as a whole. For one, effects on existing
cases need to be explicitly described, in particular why SVM's TSC
ratio code works without that change (or whether it has been
broken all along, in which case these would become backporting
candidates; input from SVM maintainers would be appreciated
too). That may in particular mean being more specific about
what is actually wrong with scaling the host TSC here (i.e. in
which way both results differ), when supposedly that matches
what the hardware does when TSC ratio is supported.
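
To make the "in which way both results differ" question concrete, here is
a minimal sketch. It is not the Xen code: the helper names are made up for
illustration, and it only assumes what the patch description states, namely
that raw host TSC cycles are converted to nanoseconds using the guest rate.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a Xen function): convert a cycle count to
 * nanoseconds for a counter ticking at `khz` kHz. Dividing first keeps
 * the intermediate values inside 64 bits for realistic inputs.
 */
static uint64_t cycles_to_ns(uint64_t cycles, uint32_t khz)
{
    assert(khz != 0);
    return (cycles / khz) * 1000000ULL + (cycles % khz) * 1000000ULL / khz;
}

/* Raw host cycles converted with the host rate: the real elapsed time. */
static uint64_t elapsed_ns_host_rate(uint64_t host_cycles, uint32_t host_khz)
{
    return cycles_to_ns(host_cycles, host_khz);
}

/* Raw host cycles converted with the guest rate: the case in question. */
static uint64_t elapsed_ns_guest_rate(uint64_t host_cycles, uint32_t guest_khz)
{
    return cycles_to_ns(host_cycles, guest_khz);
}
```

With a 2 GHz host and a 1 GHz guest, 2,000,000,000 host cycles come out as
1,000,000,000 ns with the host rate but 2,000,000,000 ns with the guest
rate, i.e. off by the factor host rate / guest rate. The two agree only
when both rates are equal.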

Then a reason needs to be given why the similar logic in the
PVRDTSCP case does not also get adjusted.

Plus, looking at the respective code in tsc_set_info(), I'm
getting the impression that what you're trying to do is not in line
with what is intended so far: Especially the comment there
suggests that the intention is for the guest TSC to be made to
match the host one. Considering migration this indeed looks
suspicious, but then that would need changing too.

Jan

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 01/13] x86/time.c: Use system time to calculate elapsed_nsec in tsc_get_info()
  2015-10-09  6:51   ` Jan Beulich
@ 2015-10-09 13:41     ` Boris Ostrovsky
  2015-10-09 14:00       ` Haozhong Zhang
  2015-10-09 14:39       ` Jan Beulich
  2015-10-09 14:35     ` Haozhong Zhang
  2015-10-14  2:45     ` Haozhong Zhang
  2 siblings, 2 replies; 117+ messages in thread
From: Boris Ostrovsky @ 2015-10-09 13:41 UTC (permalink / raw)
  To: Jan Beulich, Haozhong Zhang
  Cc: Kevin Tian, Keir Fraser, Jun Nakajima, Andrew Cooper, xen-devel,
	Aravind Gopalakrishnan, Suravee Suthikulpanit

On 10/09/2015 02:51 AM, Jan Beulich wrote:
>>>> On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
>> When the TSC mode of a domain is TSC_MODE_DEFAULT and no TSC emulation
>> is used, the existing tsc_get_info() calculates elapsed_nsec by scaling
>> the host TSC with a ratio between guest TSC rate and
>> nanoseconds. However, the result will be incorrect if the guest TSC rate
>> differs from the host TSC rate. This patch fixes this problem by using
>> the system time as elapsed_nsec.
> For both this and patch 2, while at a first glance (and taking into
> account just the visible patch context) what you say seems to
> make sense, the explanation is far from sufficient namely when
> looking at the function as a whole. For one, effects on existing
> cases need to be explicitly described, in particular why SVM's TSC
> ratio code works without that change (or whether it has been
> broken all along, in which case these would become backporting
> candidates; input from SVM maintainers would be appreciated
> too). That may in particular mean being more specific about
> what is actually wrong with scaling the host TSC here (i.e. in
> which way both results differ), when supposedly that matches
> what the hardware does when TSC ratio is supported.

If elapsed_nsec is the time that the guest has been running then how can
get_s_time(), which is system time, be the right answer here? But what
confuses me even more is that the existing code is not doing that either.

Shouldn't elapsed_nsec be offset by d->arch.vtsc_offset on the get side? 
I.e.

*elapsed_nsec = get_s_time() - d->arch.vtsc_offset?

-boris

>
> Then a reason needs to be given why the similar logic in the
> PVRDTSCP case does not also get adjusted.
>
> Plus, looking at the respective code in tsc_set_info(), I'm
> getting the impression that what you're trying to do is not in line
> with what is intended so far: Especially the comment there
> suggests that the intention is for the guest TSC to be made
> match the host one. Considering migration this indeed looks
> suspicious, but then that would need changing too.
>
> Jan
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 01/13] x86/time.c: Use system time to calculate elapsed_nsec in tsc_get_info()
  2015-10-09 13:41     ` Boris Ostrovsky
@ 2015-10-09 14:00       ` Haozhong Zhang
  2015-10-09 15:11         ` Jan Beulich
  2015-10-09 14:39       ` Jan Beulich
  1 sibling, 1 reply; 117+ messages in thread
From: Haozhong Zhang @ 2015-10-09 14:00 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Kevin Tian, Keir Fraser, Jan Beulich, Jun Nakajima,
	Andrew Cooper, xen-devel, Aravind Gopalakrishnan,
	Suravee Suthikulpanit

On Fri, Oct 09, 2015 at 09:41:36AM -0400, Boris Ostrovsky wrote:
> On 10/09/2015 02:51 AM, Jan Beulich wrote:
> >>>>On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
> >>When the TSC mode of a domain is TSC_MODE_DEFAULT and no TSC emulation
> >>is used, the existing tsc_get_info() calculates elapsed_nsec by scaling
> >>the host TSC with a ratio between guest TSC rate and
> >>nanoseconds. However, the result will be incorrect if the guest TSC rate
> >>differs from the host TSC rate. This patch fixes this problem by using
> >>the system time as elapsed_nsec.
> >For both this and patch 2, while at a first glance (and taking into
> >account just the visible patch context) what you say seems to
> >make sense, the explanation is far from sufficient namely when
> >looking at the function as a whole. For one, effects on existing
> >cases need to be explicitly described, in particular why SVM's TSC
> >ratio code works without that change (or whether it has been
> >broken all along, in which case these would become backporting
> >candidates; input from SVM maintainers would be appreciated
> >too). That may in particular mean being more specific about
> >what is actually wrong with scaling the host TSC here (i.e. in
> >which way both results differ), when supposedly that matches
> >what the hardware does when TSC ratio is supported.
> 
> If elapsed_nsec is the time that guest has been running then how can
> get_s_time(), which is system time, be the right answer here? But what
> confuses me even more is that existing code is not doing that neither.
> 
> Shouldn't elapsed_nsec be offset by d->arch.vtsc_offset on the get side?
> I.e.
> 
> *elapsed_nsec = get_s_time() - d->arch.vtsc_offset?
>

Yes, I should subtract d->arch.vtsc_offset here.

> -boris
> 
> >
> >Then a reason needs to be given why the similar logic in the
> >PVRDTSCP case does not also get adjusted.
> >
> >Plus, looking at the respective code in tsc_set_info(), I'm
> >getting the impression that what you're trying to do is not in line
> >with what is intended so far: Especially the comment there
> >suggests that the intention is for the guest TSC to be made
> >match the host one. Considering migration this indeed looks
> >suspicious, but then that would need changing too.
> >
> >Jan
> >
> >
> >_______________________________________________
> >Xen-devel mailing list
> >Xen-devel@lists.xen.org
> >http://lists.xen.org/xen-devel
> 

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 01/13] x86/time.c: Use system time to calculate elapsed_nsec in tsc_get_info()
  2015-10-09  6:51   ` Jan Beulich
  2015-10-09 13:41     ` Boris Ostrovsky
@ 2015-10-09 14:35     ` Haozhong Zhang
  2015-10-09 14:43       ` Jan Beulich
  2015-10-14  2:45     ` Haozhong Zhang
  2 siblings, 1 reply; 117+ messages in thread
From: Haozhong Zhang @ 2015-10-09 14:35 UTC (permalink / raw)
  To: Jan Beulich, Boris Ostrovsky
  Cc: Kevin Tian, Keir Fraser, Suravee Suthikulpanit, Andrew Cooper,
	xen-devel, Aravind Gopalakrishnan, Jun Nakajima

On Fri, Oct 09, 2015 at 12:51:32AM -0600, Jan Beulich wrote:
> >>> On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
> > When the TSC mode of a domain is TSC_MODE_DEFAULT and no TSC emulation
> > is used, the existing tsc_get_info() calculates elapsed_nsec by scaling
> > the host TSC with a ratio between guest TSC rate and
> > nanoseconds. However, the result will be incorrect if the guest TSC rate
> > differs from the host TSC rate. This patch fixes this problem by using
> > the system time as elapsed_nsec.
> 
> For both this and patch 2, while at a first glance (and taking into
> account just the visible patch context) what you say seems to
> make sense, the explanation is far from sufficient namely when
> looking at the function as a whole. For one, effects on existing
> cases need to be explicitly described, in particular why SVM's TSC
> ratio code works without that change (or whether it has been
> broken all along, in which case these would become backporting
> candidates; input from SVM maintainers would be appreciated
> too). That may in particular mean being more specific about
> what is actually wrong with scaling the host TSC here (i.e. in
> which way both results differ), when supposedly that matches
> what the hardware does when TSC ratio is supported.
> 
> Then a reason needs to be given why the similar logic in the
> PVRDTSCP case does not also get adjusted.
> 
> Plus, looking at the respective code in tsc_set_info(), I'm
> getting the impression that what you're trying to do is not in line
> with what is intended so far: Especially the comment there
> suggests that the intention is for the guest TSC to be made
> match the host one. Considering migration this indeed looks
> suspicious, but then that would need changing too.
>

Do you mean the following comment?
/*
 * In default mode use native TSC if the host has safe TSC and:
 *  HVM/PVH: host and guest frequencies are the same (either
 *           "naturally" or via TSC scaling)
 *  PV: guest has not migrated yet (and thus arch.tsc_khz == cpu_khz)
 */

To my understanding,

1. "naturally" corresponds to the case where a domain is
   newly created (rather than being migrated from another machine) so that
   its TSC frequency (d->arch.tsc_khz) is identical to the host TSC
   frequency (cpu_khz).

2. "via TSC scaling" means the case where the domain is migrated from
   another machine with a different host TSC rate so that d->arch.tsc_khz
   != cpu_khz. In this case the guest still reads the (host) TSC
   natively, but SVM TSC ratio makes sure that TSC value is a scaled
   host TSC. This point can be confirmed by svm_tsc_ratio_load() which
   sets MSR_AMD64_TSC_RATIO to d->arch.tsc_khz/cpu_khz.

If my understanding, especially the second point, is correct, this
patch set intends to do the same thing with VMX TSC scaling.
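
As a rough illustration of the second point, the ratio and its effect can
be sketched as below. This is only a sketch under assumptions: the function
names are hypothetical, and the 32 fractional bits follow my understanding
of the 8.32 fixed-point layout of the AMD TSC ratio MSR (the VMX formula in
the cover letter shifts by 48 instead).

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 32 fractional bits, as in an 8.32 fixed-point ratio. */
#define SKETCH_RATIO_FRAC_BITS 32

/* Hypothetical helper: ratio = guest_khz / host_khz in fixed point. */
static uint64_t sketch_tsc_ratio(uint32_t guest_khz, uint32_t host_khz)
{
    assert(host_khz != 0);
    return ((uint64_t)guest_khz << SKETCH_RATIO_FRAC_BITS) / host_khz;
}

/*
 * Hypothetical helper: (host_tsc * ratio) >> 32 without a 128-bit type.
 * The ratio is split into integer and fractional parts and host_tsc into
 * high and low 32-bit halves, so every partial product fits in 64 bits
 * as long as the final result does.
 */
static uint64_t sketch_scale_tsc(uint64_t host_tsc, uint64_t ratio)
{
    uint64_t int_part  = ratio >> SKETCH_RATIO_FRAC_BITS;
    uint64_t frac_part = ratio & 0xffffffffULL;
    uint64_t tsc_hi    = host_tsc >> 32;
    uint64_t tsc_lo    = host_tsc & 0xffffffffULL;

    return host_tsc * int_part
           + tsc_hi * frac_part
           + ((tsc_lo * frac_part) >> 32);
}
```

For a guest at 1,000,000 kHz on a 2,000,000 kHz host the ratio is
0x80000000 (one half), so 2,000,000,000 host cycles are seen by the guest
as 1,000,000,000. With equal rates the ratio is exactly 1.0 and the value
passes through unchanged.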

Boris, I notice this comment was added by your commit 82713ec8. Is my
understanding correct?

Thanks,
Haozhong

> Jan
> 

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 01/13] x86/time.c: Use system time to calculate elapsed_nsec in tsc_get_info()
  2015-10-09 13:41     ` Boris Ostrovsky
  2015-10-09 14:00       ` Haozhong Zhang
@ 2015-10-09 14:39       ` Jan Beulich
  2015-10-09 15:37         ` Boris Ostrovsky
  1 sibling, 1 reply; 117+ messages in thread
From: Jan Beulich @ 2015-10-09 14:39 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Haozhong Zhang, Kevin Tian, Keir Fraser, Suravee Suthikulpanit,
	Andrew Cooper, xen-devel, Aravind Gopalakrishnan, Jun Nakajima

>>> On 09.10.15 at 15:41, <boris.ostrovsky@oracle.com> wrote:
> On 10/09/2015 02:51 AM, Jan Beulich wrote:
>>>>> On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
>>> When the TSC mode of a domain is TSC_MODE_DEFAULT and no TSC emulation
>>> is used, the existing tsc_get_info() calculates elapsed_nsec by scaling
>>> the host TSC with a ratio between guest TSC rate and
>>> nanoseconds. However, the result will be incorrect if the guest TSC rate
>>> differs from the host TSC rate. This patch fixes this problem by using
>>> the system time as elapsed_nsec.
>> For both this and patch 2, while at a first glance (and taking into
>> account just the visible patch context) what you say seems to
>> make sense, the explanation is far from sufficient namely when
>> looking at the function as a whole. For one, effects on existing
>> cases need to be explicitly described, in particular why SVM's TSC
>> ratio code works without that change (or whether it has been
>> broken all along, in which case these would become backporting
>> candidates; input from SVM maintainers would be appreciated
>> too). That may in particular mean being more specific about
>> what is actually wrong with scaling the host TSC here (i.e. in
>> which way both results differ), when supposedly that matches
>> what the hardware does when TSC ratio is supported.
> 
> If elapsed_nsec is the time that guest has been running then how can 
> get_s_time(), which is system time, be the right answer here? But what 
> confuses me even more is that existing code is not doing that neither.
> 
> Shouldn't elapsed_nsec be offset by d->arch.vtsc_offset on the get side? 
> I.e.
> 
> *elapsed_nsec = get_s_time() - d->arch.vtsc_offset?

Doesn't whether or not to adjust by the offset depend on d->arch.vtsc?

Jan

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 01/13] x86/time.c: Use system time to calculate elapsed_nsec in tsc_get_info()
  2015-10-09 14:35     ` Haozhong Zhang
@ 2015-10-09 14:43       ` Jan Beulich
  2015-10-09 15:56         ` Boris Ostrovsky
  0 siblings, 1 reply; 117+ messages in thread
From: Jan Beulich @ 2015-10-09 14:43 UTC (permalink / raw)
  To: Haozhong Zhang, Boris Ostrovsky
  Cc: Kevin Tian, Keir Fraser, Suravee Suthikulpanit, Andrew Cooper,
	xen-devel, Aravind Gopalakrishnan, Jun Nakajima

>>> On 09.10.15 at 16:35, <haozhong.zhang@intel.com> wrote:
> On Fri, Oct 09, 2015 at 12:51:32AM -0600, Jan Beulich wrote:
>> >>> On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
>> > When the TSC mode of a domain is TSC_MODE_DEFAULT and no TSC emulation
>> > is used, the existing tsc_get_info() calculates elapsed_nsec by scaling
>> > the host TSC with a ratio between guest TSC rate and
>> > nanoseconds. However, the result will be incorrect if the guest TSC rate
>> > differs from the host TSC rate. This patch fixes this problem by using
>> > the system time as elapsed_nsec.
>> 
>> For both this and patch 2, while at a first glance (and taking into
>> account just the visible patch context) what you say seems to
>> make sense, the explanation is far from sufficient namely when
>> looking at the function as a whole. For one, effects on existing
>> cases need to be explicitly described, in particular why SVM's TSC
>> ratio code works without that change (or whether it has been
>> broken all along, in which case these would become backporting
>> candidates; input from SVM maintainers would be appreciated
>> too). That may in particular mean being more specific about
>> what is actually wrong with scaling the host TSC here (i.e. in
>> which way both results differ), when supposedly that matches
>> what the hardware does when TSC ratio is supported.
>> 
>> Then a reason needs to be given why the similar logic in the
>> PVRDTSCP case does not also get adjusted.
>> 
>> Plus, looking at the respective code in tsc_set_info(), I'm
>> getting the impression that what you're trying to do is not in line
>> with what is intended so far: Especially the comment there
>> suggests that the intention is for the guest TSC to be made
>> match the host one. Considering migration this indeed looks
>> suspicious, but then that would need changing too.
>>
> 
> Do you mean the following comment?
> /*
>  * In default mode use native TSC if the host has safe TSC and:
>  *  HVM/PVH: host and guest frequencies are the same (either
>  *           "naturally" or via TSC scaling)
>  *  PV: guest has not migrated yet (and thus arch.tsc_khz == cpu_khz)
>  */
> 						     
> To my understanding,
> 
> 1. "naturally" responds to the case that a domain is
>    newly created (rather than being migrated from other machine) so that
>    its TSC frequency (d->arch.tsc_khz) is identical to the host TSC
>    frequency (cpu_khz).
> 
> 2. "via TSC scaling" means the case that the domain is migrated from
>    another machine of different host TSC rate so that d->arch.tsc_khz
>    != cpu_khz. In this case the guest still reads the (host) TSC
>    natively, but SVM TSC ratio makes sure that TSC value is a scaled
>    host TSC. This point can be confirmed by svm_tsc_ratio_load() which
>    sets MSR_AMD64_TSC_RATIO to d->arch.tsc_khz/cpu_khz.

I.e. they are _not_ the same (unless the quotient happens to be 1,
in which case scaling wouldn't be necessary in the first place). I.e.
imo the comment would need to be

/*
 * In default mode use native TSC if the host has safe TSC and:
 *  HVM/PVH: host and guest frequencies are the same or TSC
 *           scaling is in use
 *  PV: guest has not migrated yet (and thus arch.tsc_khz == cpu_khz)
 */
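
Spelled out as a predicate, the reworded comment would read roughly as
follows. This is a sketch only: the struct and function names are
hypothetical and merely restate the conditions in the comment, they are
not taken from tsc_set_info().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inputs standing in for the state the comment refers to. */
struct sketch_tsc_policy {
    bool host_tsc_is_safe;  /* "the host has safe TSC" */
    bool is_hvm_or_pvh;     /* HVM/PVH guest, otherwise PV */
    bool hw_tsc_scaling;    /* hardware TSC scaling is in use */
    uint32_t guest_khz;     /* stands in for d->arch.tsc_khz */
    uint32_t host_khz;      /* stands in for cpu_khz */
};

/* Returns true if, in default mode, the guest may use the native TSC. */
static bool sketch_use_native_tsc(const struct sketch_tsc_policy *p)
{
    assert(p != NULL);

    if (!p->host_tsc_is_safe)
        return false;

    if (p->is_hvm_or_pvh)
        /* Same frequency, or the hardware scales the TSC for us. */
        return p->guest_khz == p->host_khz || p->hw_tsc_scaling;

    /* PV: not migrated yet, and thus guest_khz == host_khz. */
    return p->guest_khz == p->host_khz;
}
```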

Jan

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 01/13] x86/time.c: Use system time to calculate elapsed_nsec in tsc_get_info()
  2015-10-09 14:00       ` Haozhong Zhang
@ 2015-10-09 15:11         ` Jan Beulich
  2015-10-09 16:09           ` Boris Ostrovsky
  0 siblings, 1 reply; 117+ messages in thread
From: Jan Beulich @ 2015-10-09 15:11 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Kevin Tian, Keir Fraser, Suravee Suthikulpanit, Andrew Cooper,
	xen-devel, Aravind Gopalakrishnan, Jun Nakajima, Boris Ostrovsky

>>> On 09.10.15 at 16:00, <haozhong.zhang@intel.com> wrote:
> On Fri, Oct 09, 2015 at 09:41:36AM -0400, Boris Ostrovsky wrote:
>> On 10/09/2015 02:51 AM, Jan Beulich wrote:
>> >>>>On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
>> >>When the TSC mode of a domain is TSC_MODE_DEFAULT and no TSC emulation
>> >>is used, the existing tsc_get_info() calculates elapsed_nsec by scaling
>> >>the host TSC with a ratio between guest TSC rate and
>> >>nanoseconds. However, the result will be incorrect if the guest TSC rate
>> >>differs from the host TSC rate. This patch fixes this problem by using
>> >>the system time as elapsed_nsec.
>> >For both this and patch 2, while at a first glance (and taking into
>> >account just the visible patch context) what you say seems to
>> >make sense, the explanation is far from sufficient namely when
>> >looking at the function as a whole. For one, effects on existing
>> >cases need to be explicitly described, in particular why SVM's TSC
>> >ratio code works without that change (or whether it has been
>> >broken all along, in which case these would become backporting
>> >candidates; input from SVM maintainers would be appreciated
>> >too). That may in particular mean being more specific about
>> >what is actually wrong with scaling the host TSC here (i.e. in
>> >which way both results differ), when supposedly that matches
>> >what the hardware does when TSC ratio is supported.
>> 
>> If elapsed_nsec is the time that guest has been running then how can
>> get_s_time(), which is system time, be the right answer here? But what
>> confuses me even more is that existing code is not doing that neither.
>> 
>> Shouldn't elapsed_nsec be offset by d->arch.vtsc_offset on the get side?
>> I.e.
>> 
>> *elapsed_nsec = get_s_time() - d->arch.vtsc_offset?
>>
> 
> Yes, I should minus d->arch.vtsc_offset here.

In which case - afaict - the code becomes identical to that of the
TSC_MODE_ALWAYS_EMULATE case as well as the
TSC_MODE_DEFAULT w/ d->arch.vtsc true. Which seems quite
unlikely to be correct.
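
To illustrate the concern, here is a simplified model of how elapsed_nsec
would be computed per mode if the subtraction were applied as suggested.
It is not the tsc_get_info() source: the enum, struct and function names
are hypothetical, PVRDTSCP is left out, and the per-mode behaviour is only
what has been described in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the TSC modes discussed here. */
enum sketch_tsc_mode {
    SKETCH_MODE_DEFAULT,
    SKETCH_MODE_ALWAYS_EMULATE,
    SKETCH_MODE_NEVER_EMULATE
};

struct sketch_dom_tsc {
    enum sketch_tsc_mode mode;
    bool vtsc;                /* stands in for d->arch.vtsc */
    uint64_t vtsc_offset_ns;  /* stands in for d->arch.vtsc_offset */
};

/* now_ns plays the role of get_s_time(). */
static uint64_t sketch_elapsed_nsec(const struct sketch_dom_tsc *d,
                                    uint64_t now_ns)
{
    assert(d != NULL);

    switch (d->mode) {
    case SKETCH_MODE_NEVER_EMULATE:
        return 0;
    case SKETCH_MODE_ALWAYS_EMULATE:
    case SKETCH_MODE_DEFAULT:
        /* With the suggested change d->vtsc no longer matters here. */
        return now_ns - d->vtsc_offset_ns;
    }
    return 0;
}
```

In this model the default mode yields the same value whether or not vtsc
is set, and the same value as the always-emulate mode, which is the
collapse being pointed out.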

Jan

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 01/13] x86/time.c: Use system time to calculate elapsed_nsec in tsc_get_info()
  2015-10-09 14:39       ` Jan Beulich
@ 2015-10-09 15:37         ` Boris Ostrovsky
  2015-10-09 16:39           ` Haozhong Zhang
  0 siblings, 1 reply; 117+ messages in thread
From: Boris Ostrovsky @ 2015-10-09 15:37 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Haozhong Zhang, Kevin Tian, Keir Fraser, Suravee Suthikulpanit,
	Andrew Cooper, xen-devel, Aravind Gopalakrishnan, Jun Nakajima

On 10/09/2015 10:39 AM, Jan Beulich wrote:
>>>> On 09.10.15 at 15:41, <boris.ostrovsky@oracle.com> wrote:
>> On 10/09/2015 02:51 AM, Jan Beulich wrote:
>>>>>> On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
>>>> When the TSC mode of a domain is TSC_MODE_DEFAULT and no TSC emulation
>>>> is used, the existing tsc_get_info() calculates elapsed_nsec by scaling
>>>> the host TSC with a ratio between guest TSC rate and
>>>> nanoseconds. However, the result will be incorrect if the guest TSC rate
>>>> differs from the host TSC rate. This patch fixes this problem by using
>>>> the system time as elapsed_nsec.
>>> For both this and patch 2, while at a first glance (and taking into
>>> account just the visible patch context) what you say seems to
>>> make sense, the explanation is far from sufficient namely when
>>> looking at the function as a whole. For one, effects on existing
>>> cases need to be explicitly described, in particular why SVM's TSC
>>> ratio code works without that change (or whether it has been
>>> broken all along, in which case these would become backporting
>>> candidates; input from SVM maintainers would be appreciated
>>> too). That may in particular mean being more specific about
>>> what is actually wrong with scaling the host TSC here (i.e. in
>>> which way both results differ), when supposedly that matches
>>> what the hardware does when TSC ratio is supported.
>> If elapsed_nsec is the time that guest has been running then how can
>> get_s_time(), which is system time, be the right answer here? But what
>> confuses me even more is that existing code is not doing that neither.
>>
>> Shouldn't elapsed_nsec be offset by d->arch.vtsc_offset on the get side?
>> I.e.
>>
>> *elapsed_nsec = get_s_time() - d->arch.vtsc_offset?
> Doesn't whether or not to adjust be the offset depend on d-arch.vtsc?

We only use elapsed_nsec when vtsc is set, I think. In the native case
(vtsc=0) elapsed_nsec and d->arch.vtsc_offset are ignored.

-boris

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 01/13] x86/time.c: Use system time to calculate elapsed_nsec in tsc_get_info()
  2015-10-09 14:43       ` Jan Beulich
@ 2015-10-09 15:56         ` Boris Ostrovsky
  0 siblings, 0 replies; 117+ messages in thread
From: Boris Ostrovsky @ 2015-10-09 15:56 UTC (permalink / raw)
  To: Jan Beulich, Haozhong Zhang
  Cc: Kevin Tian, Keir Fraser, Suravee Suthikulpanit, Andrew Cooper,
	xen-devel, Aravind Gopalakrishnan, Jun Nakajima

On 10/09/2015 10:43 AM, Jan Beulich wrote:
>>>> On 09.10.15 at 16:35, <haozhong.zhang@intel.com> wrote:
>> On Fri, Oct 09, 2015 at 12:51:32AM -0600, Jan Beulich wrote:
>>>>>> On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
>>>> When the TSC mode of a domain is TSC_MODE_DEFAULT and no TSC emulation
>>>> is used, the existing tsc_get_info() calculates elapsed_nsec by scaling
>>>> the host TSC with a ratio between guest TSC rate and
>>>> nanoseconds. However, the result will be incorrect if the guest TSC rate
>>>> differs from the host TSC rate. This patch fixes this problem by using
>>>> the system time as elapsed_nsec.
>>> For both this and patch 2, while at a first glance (and taking into
>>> account just the visible patch context) what you say seems to
>>> make sense, the explanation is far from sufficient namely when
>>> looking at the function as a whole. For one, effects on existing
>>> cases need to be explicitly described, in particular why SVM's TSC
>>> ratio code works without that change (or whether it has been
>>> broken all along, in which case these would become backporting
>>> candidates; input from SVM maintainers would be appreciated
>>> too). That may in particular mean being more specific about
>>> what is actually wrong with scaling the host TSC here (i.e. in
>>> which way both results differ), when supposedly that matches
>>> what the hardware does when TSC ratio is supported.
>>>
>>> Then a reason needs to be given why the similar logic in the
>>> PVRDTSCP case does not also get adjusted.
>>>
>>> Plus, looking at the respective code in tsc_set_info(), I'm
>>> getting the impression that what you're trying to do is not in line
>>> with what is intended so far: Especially the comment there
>>> suggests that the intention is for the guest TSC to be made
>>> match the host one. Considering migration this indeed looks
>>> suspicious, but then that would need changing too.
>>>
>> Do you mean the following comment?
>> /*
>>   * In default mode use native TSC if the host has safe TSC and:
>>   *  HVM/PVH: host and guest frequencies are the same (either
>>   *           "naturally" or via TSC scaling)
>>   *  PV: guest has not migrated yet (and thus arch.tsc_khz == cpu_khz)
>>   */
>> 						
>> To my understanding,
>>
>> 1. "naturally" responds to the case that a domain is
>>     newly created (rather than being migrated from other machine) so that
>>     its TSC frequency (d->arch.tsc_khz) is identical to the host TSC
>>     frequency (cpu_khz).
>>
>> 2. "via TSC scaling" means the case that the domain is migrated from
>>     another machine of different host TSC rate so that d->arch.tsc_khz
>>     != cpu_khz. In this case the guest still reads the (host) TSC
>>     natively, but SVM TSC ratio makes sure that TSC value is a scaled
>>     host TSC. This point can be confirmed by svm_tsc_ratio_load() which
>>     sets MSR_AMD64_TSC_RATIO to d->arch.tsc_khz/cpu_khz.
> I.e. they are _not_ the same (unless the quotient happens to be 1,
> in which case scaling wouldn't be necessary in the first place). I.e.
> imo the comment would need to be
>
> /*
>   * In default mode use native TSC if the host has safe TSC and:
>   *  HVM/PVH: host and guest frequencies are the same or TSC
>   *           scaling is in use

Yes, that's what I meant to say. I was referring to "virtual" frequency.

-boris

>   *  PV: guest has not migrated yet (and thus arch.tsc_khz == cpu_khz)
>   */
>
> Jan
>

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 01/13] x86/time.c: Use system time to calculate elapsed_nsec in tsc_get_info()
  2015-10-09 15:11         ` Jan Beulich
@ 2015-10-09 16:09           ` Boris Ostrovsky
  2015-10-09 16:19             ` Jan Beulich
  0 siblings, 1 reply; 117+ messages in thread
From: Boris Ostrovsky @ 2015-10-09 16:09 UTC (permalink / raw)
  To: Jan Beulich, Haozhong Zhang
  Cc: Kevin Tian, Keir Fraser, Suravee Suthikulpanit, Andrew Cooper,
	xen-devel, Aravind Gopalakrishnan, Jun Nakajima

On 10/09/2015 11:11 AM, Jan Beulich wrote:
>>>> On 09.10.15 at 16:00, <haozhong.zhang@intel.com> wrote:
>> On Fri, Oct 09, 2015 at 09:41:36AM -0400, Boris Ostrovsky wrote:
>>> On 10/09/2015 02:51 AM, Jan Beulich wrote:
>>>>>>> On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
>>>>> When the TSC mode of a domain is TSC_MODE_DEFAULT and no TSC emulation
>>>>> is used, the existing tsc_get_info() calculates elapsed_nsec by scaling
>>>>> the host TSC with a ratio between guest TSC rate and
>>>>> nanoseconds. However, the result will be incorrect if the guest TSC rate
>>>>> differs from the host TSC rate. This patch fixes this problem by using
>>>>> the system time as elapsed_nsec.
>>>> For both this and patch 2, while at a first glance (and taking into
>>>> account just the visible patch context) what you say seems to
>>>> make sense, the explanation is far from sufficient namely when
>>>> looking at the function as a whole. For one, effects on existing
>>>> cases need to be explicitly described, in particular why SVM's TSC
>>>> ratio code works without that change (or whether it has been
>>>> broken all along, in which case these would become backporting
>>>> candidates; input from SVM maintainers would be appreciated
>>>> too). That may in particular mean being more specific about
>>>> what is actually wrong with scaling the host TSC here (i.e. in
>>>> which way both results differ), when supposedly that matches
>>>> what the hardware does when TSC ratio is supported.
>>> If elapsed_nsec is the time that guest has been running then how can
>>> get_s_time(), which is system time, be the right answer here? But what
>>> confuses me even more is that existing code is not doing that neither.
>>>
>>> Shouldn't elapsed_nsec be offset by d->arch.vtsc_offset on the get side?
>>> I.e.
>>>
>>> *elapsed_nsec = get_s_time() - d->arch.vtsc_offset?
>>>
>> Yes, I should minus d->arch.vtsc_offset here.
> In which case - afaict - the code becomes identical to that of the
> TSC_MODE_ALWAYS_EMULATE case as well as the
> TSC_MODE_DEFAULT w/ d->arch.vtsc true. Which seems quite
> unlikely to be correct.

*elapsed_nsec = *gtsc_khz = 0; ? Because we are effectively in 
TSC_MODE_NEVER.

That can't be right...


-boris

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 01/13] x86/time.c: Use system time to calculate elapsed_nsec in tsc_get_info()
  2015-10-09 16:09           ` Boris Ostrovsky
@ 2015-10-09 16:19             ` Jan Beulich
  2015-10-09 16:31               ` Boris Ostrovsky
  0 siblings, 1 reply; 117+ messages in thread
From: Jan Beulich @ 2015-10-09 16:19 UTC (permalink / raw)
  To: Haozhong Zhang, Boris Ostrovsky
  Cc: Kevin Tian, Keir Fraser, Suravee Suthikulpanit, Andrew Cooper,
	xen-devel, Aravind Gopalakrishnan, Jun Nakajima

>>> On 09.10.15 at 18:09, <boris.ostrovsky@oracle.com> wrote:
> On 10/09/2015 11:11 AM, Jan Beulich wrote:
>>>>> On 09.10.15 at 16:00, <haozhong.zhang@intel.com> wrote:
>>> On Fri, Oct 09, 2015 at 09:41:36AM -0400, Boris Ostrovsky wrote:
>>>> On 10/09/2015 02:51 AM, Jan Beulich wrote:
>>>>>>>> On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
>>>>>> When the TSC mode of a domain is TSC_MODE_DEFAULT and no TSC emulation
>>>>>> is used, the existing tsc_get_info() calculates elapsed_nsec by scaling
>>>>>> the host TSC with a ratio between guest TSC rate and
>>>>>> nanoseconds. However, the result will be incorrect if the guest TSC rate
>>>>>> differs from the host TSC rate. This patch fixes this problem by using
>>>>>> the system time as elapsed_nsec.
>>>>> For both this and patch 2, while at a first glance (and taking into
>>>>> account just the visible patch context) what you say seems to
>>>>> make sense, the explanation is far from sufficient namely when
>>>>> looking at the function as a whole. For one, effects on existing
>>>>> cases need to be explicitly described, in particular why SVM's TSC
>>>>> ratio code works without that change (or whether it has been
>>>>> broken all along, in which case these would become backporting
>>>>> candidates; input from SVM maintainers would be appreciated
>>>>> too). That may in particular mean being more specific about
>>>>> what is actually wrong with scaling the host TSC here (i.e. in
>>>>> which way both results differ), when supposedly that matches
>>>>> what the hardware does when TSC ratio is supported.
>>>> If elapsed_nsec is the time that guest has been running then how can
>>>> get_s_time(), which is system time, be the right answer here? But what
>>>> confuses me even more is that existing code is not doing that neither.
>>>>
>>>> Shouldn't elapsed_nsec be offset by d->arch.vtsc_offset on the get side?
>>>> I.e.
>>>>
>>>> *elapsed_nsec = get_s_time() - d->arch.vtsc_offset?
>>>>
>>> Yes, I should minus d->arch.vtsc_offset here.
>> In which case - afaict - the code becomes identical to that of the
>> TSC_MODE_ALWAYS_EMULATE case as well as the
>> TSC_MODE_DEFAULT w/ d->arch.vtsc true. Which seems quite
>> unlikely to be correct.
> 
> *elapsed_nsec = *gtsc_khz = 0; ? Because we are effectively in 
> TSC_MODE_NEVER.

How so? Talk here has been about TSC_MODE_DEFAULT...

> That can't be right...

Why not? tsc_set_info() doesn't care about any of its other input
values when that mode is in effect.

Jan

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 01/13] x86/time.c: Use system time to calculate elapsed_nsec in tsc_get_info()
  2015-10-09 16:19             ` Jan Beulich
@ 2015-10-09 16:31               ` Boris Ostrovsky
  2015-10-09 16:51                 ` Haozhong Zhang
  0 siblings, 1 reply; 117+ messages in thread
From: Boris Ostrovsky @ 2015-10-09 16:31 UTC (permalink / raw)
  To: Jan Beulich, Haozhong Zhang
  Cc: Kevin Tian, Keir Fraser, Suravee Suthikulpanit, Andrew Cooper,
	xen-devel, Aravind Gopalakrishnan, Jun Nakajima

On 10/09/2015 12:19 PM, Jan Beulich wrote:
>>>> On 09.10.15 at 18:09, <boris.ostrovsky@oracle.com> wrote:
>> On 10/09/2015 11:11 AM, Jan Beulich wrote:
>>>>>> On 09.10.15 at 16:00, <haozhong.zhang@intel.com> wrote:
>>>> On Fri, Oct 09, 2015 at 09:41:36AM -0400, Boris Ostrovsky wrote:
>>>>> On 10/09/2015 02:51 AM, Jan Beulich wrote:
>>>>>>>>> On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
>>>>>>> When the TSC mode of a domain is TSC_MODE_DEFAULT and no TSC emulation
>>>>>>> is used, the existing tsc_get_info() calculates elapsed_nsec by scaling
>>>>>>> the host TSC with a ratio between guest TSC rate and
>>>>>>> nanoseconds. However, the result will be incorrect if the guest TSC rate
>>>>>>> differs from the host TSC rate. This patch fixes this problem by using
>>>>>>> the system time as elapsed_nsec.
>>>>>> For both this and patch 2, while at a first glance (and taking into
>>>>>> account just the visible patch context) what you say seems to
>>>>>> make sense, the explanation is far from sufficient namely when
>>>>>> looking at the function as a whole. For one, effects on existing
>>>>>> cases need to be explicitly described, in particular why SVM's TSC
>>>>>> ratio code works without that change (or whether it has been
>>>>>> broken all along, in which case these would become backporting
>>>>>> candidates; input from SVM maintainers would be appreciated
>>>>>> too). That may in particular mean being more specific about
>>>>>> what is actually wrong with scaling the host TSC here (i.e. in
>>>>>> which way both results differ), when supposedly that matches
>>>>>> what the hardware does when TSC ratio is supported.
>>>>> If elapsed_nsec is the time that guest has been running then how can
>>>>> get_s_time(), which is system time, be the right answer here? But what
>>>>> confuses me even more is that existing code is not doing that neither.
>>>>>
>>>>> Shouldn't elapsed_nsec be offset by d->arch.vtsc_offset on the get side?
>>>>> I.e.
>>>>>
>>>>> *elapsed_nsec = get_s_time() - d->arch.vtsc_offset?
>>>>>
>>>> Yes, I should minus d->arch.vtsc_offset here.
>>> In which case - afaict - the code becomes identical to that of the
>>> TSC_MODE_ALWAYS_EMULATE case as well as the
>>> TSC_MODE_DEFAULT w/ d->arch.vtsc true. Which seems quite
>>> unlikely to be correct.
>> *elapsed_nsec = *gtsc_khz = 0; ? Because we are effectively in
>> TSC_MODE_NEVER.
> How that? Talk here has been about TSC_MODE_DEFAULT...

AFAIUI, TSC_MODE_DEFAULT is shorthand for saying "I will let the
hypervisor pick whether the guest will be in TSC_MODE_ALWAYS_EMULATE or
TSC_MODE_NEVER". d->arch.vtsc is what ends up being the internal
implementation of the user-provided mode (for the most part; I think
hvm_cpuid() is the only true exception, and perhaps it needs to be
looked at).

So if we have d->arch.vtsc=0 (which is the case we are talking about
here) then we are really in NEVER mode.


-boris

>
>> That can't be right...
> Why not? tsc_set_info() doesn't care about any of its other input
> values when that mode is in effect.
>
> Jan
>

* Re: [PATCH 01/13] x86/time.c: Use system time to calculate elapsed_nsec in tsc_get_info()
  2015-10-09 15:37         ` Boris Ostrovsky
@ 2015-10-09 16:39           ` Haozhong Zhang
  2015-10-09 16:44             ` Boris Ostrovsky
  0 siblings, 1 reply; 117+ messages in thread
From: Haozhong Zhang @ 2015-10-09 16:39 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Kevin Tian, Keir Fraser, Suravee Suthikulpanit, Andrew Cooper,
	xen-devel, Jan Beulich, Aravind Gopalakrishnan, Jun Nakajima

On Fri, Oct 09, 2015 at 11:37:06AM -0400, Boris Ostrovsky wrote:
> On 10/09/2015 10:39 AM, Jan Beulich wrote:
> >>>>On 09.10.15 at 15:41, <boris.ostrovsky@oracle.com> wrote:
> >>On 10/09/2015 02:51 AM, Jan Beulich wrote:
> >>>>>>On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
> >>>>When the TSC mode of a domain is TSC_MODE_DEFAULT and no TSC emulation
> >>>>is used, the existing tsc_get_info() calculates elapsed_nsec by scaling
> >>>>the host TSC with a ratio between guest TSC rate and
> >>>>nanoseconds. However, the result will be incorrect if the guest TSC rate
> >>>>differs from the host TSC rate. This patch fixes this problem by using
> >>>>the system time as elapsed_nsec.
> >>>For both this and patch 2, while at a first glance (and taking into
> >>>account just the visible patch context) what you say seems to
> >>>make sense, the explanation is far from sufficient namely when
> >>>looking at the function as a whole. For one, effects on existing
> >>>cases need to be explicitly described, in particular why SVM's TSC
> >>>ratio code works without that change (or whether it has been
> >>>broken all along, in which case these would become backporting
> >>>candidates; input from SVM maintainers would be appreciated
> >>>too). That may in particular mean being more specific about
> >>>what is actually wrong with scaling the host TSC here (i.e. in
> >>>which way both results differ), when supposedly that matches
> >>>what the hardware does when TSC ratio is supported.
> >>If elapsed_nsec is the time that guest has been running then how can
> >>get_s_time(), which is system time, be the right answer here? But what
> >>confuses me even more is that existing code is not doing that neither.
> >>
> >>Shouldn't elapsed_nsec be offset by d->arch.vtsc_offset on the get side?
> >>I.e.
> >>
> >>*elapsed_nsec = get_s_time() - d->arch.vtsc_offset?
> >Doesn't whether or not to adjust be the offset depend on d-arch.vtsc?
> 
> We only use elapsed_nsec when vtsc is set, I think. In native case (vtsc=0)
> elapsed_nsec and d->arch.vtsc_offset are ignored.
>

But it is used in tsc_set_info() when an HVM domain in TSC_MODE_DEFAULT
is migrated to a machine where the following condition in
tsc_set_info() is false:

if ( tsc_mode == TSC_MODE_DEFAULT && host_tsc_is_safe() &&
     (has_hvm_container_domain(d) ?
      d->arch.tsc_khz == cpu_khz || cpu_has_tsc_ratio :
      incarnation == 0) )
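
For readers following along, the condition above can be restated as a
standalone predicate. This is only a sketch: struct tsc_inputs and
can_use_native_tsc() are hypothetical names, the fields merely stand in
for the hypervisor state named in the comments, and the reading that a
true result means "no TSC emulation" (d->arch.vtsc == 0) is my
understanding of this thread rather than a quote of tsc_set_info().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the state the condition consults. */
struct tsc_inputs {
    bool     mode_is_default; /* tsc_mode == TSC_MODE_DEFAULT */
    bool     host_tsc_safe;   /* host_tsc_is_safe() */
    bool     hvm_container;   /* has_hvm_container_domain(d) */
    bool     has_tsc_ratio;   /* cpu_has_tsc_ratio */
    uint32_t guest_khz;       /* d->arch.tsc_khz */
    uint32_t host_khz;        /* cpu_khz */
    uint32_t incarnation;     /* assumed to be 0 until the domain has migrated */
};

/* True when the quoted condition holds. */
static bool can_use_native_tsc(const struct tsc_inputs *in)
{
    if ( !in->mode_is_default || !in->host_tsc_safe )
        return false;

    /* "a ? b == c || d : e" parses as "a ? ((b == c) || d) : e". */
    if ( in->hvm_container )
        return in->guest_khz == in->host_khz || in->has_tsc_ratio;

    return in->incarnation == 0;
}
```

An HVM guest whose rate differs from the host's fails the test on a
host without TSC ratio, which is the case where elapsed_nsec matters.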

- Haozhong

> -boris
> 
> 

* Re: [PATCH 01/13] x86/time.c: Use system time to calculate elapsed_nsec in tsc_get_info()
  2015-10-09 16:39           ` Haozhong Zhang
@ 2015-10-09 16:44             ` Boris Ostrovsky
  0 siblings, 0 replies; 117+ messages in thread
From: Boris Ostrovsky @ 2015-10-09 16:44 UTC (permalink / raw)
  To: Jan Beulich, Aravind Gopalakrishnan, Suravee Suthikulpanit,
	Andrew Cooper, Jun Nakajima, Kevin Tian, xen-devel, Keir Fraser

On 10/09/2015 12:39 PM, Haozhong Zhang wrote:
> On Fri, Oct 09, 2015 at 11:37:06AM -0400, Boris Ostrovsky wrote:
>> On 10/09/2015 10:39 AM, Jan Beulich wrote:
>>>>>> On 09.10.15 at 15:41, <boris.ostrovsky@oracle.com> wrote:
>>>> On 10/09/2015 02:51 AM, Jan Beulich wrote:
>>>>>>>> On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
>>>>>> When the TSC mode of a domain is TSC_MODE_DEFAULT and no TSC emulation
>>>>>> is used, the existing tsc_get_info() calculates elapsed_nsec by scaling
>>>>>> the host TSC with a ratio between guest TSC rate and
>>>>>> nanoseconds. However, the result will be incorrect if the guest TSC rate
>>>>>> differs from the host TSC rate. This patch fixes this problem by using
>>>>>> the system time as elapsed_nsec.
>>>>> For both this and patch 2, while at a first glance (and taking into
>>>>> account just the visible patch context) what you say seems to
>>>>> make sense, the explanation is far from sufficient namely when
>>>>> looking at the function as a whole. For one, effects on existing
>>>>> cases need to be explicitly described, in particular why SVM's TSC
>>>>> ratio code works without that change (or whether it has been
>>>>> broken all along, in which case these would become backporting
>>>>> candidates; input from SVM maintainers would be appreciated
>>>>> too). That may in particular mean being more specific about
>>>>> what is actually wrong with scaling the host TSC here (i.e. in
>>>>> which way both results differ), when supposedly that matches
>>>>> what the hardware does when TSC ratio is supported.
>>>> If elapsed_nsec is the time that guest has been running then how can
>>>> get_s_time(), which is system time, be the right answer here? But what
>>>> confuses me even more is that existing code is not doing that neither.
>>>>
>>>> Shouldn't elapsed_nsec be offset by d->arch.vtsc_offset on the get side?
>>>> I.e.
>>>>
>>>> *elapsed_nsec = get_s_time() - d->arch.vtsc_offset?
>>> Doesn't whether or not to adjust be the offset depend on d-arch.vtsc?
>> We only use elapsed_nsec when vtsc is set, I think. In native case (vtsc=0)
>> elapsed_nsec and d->arch.vtsc_offset are ignored.
>>
> But it is used in tsc_set_info() if a HVM domain in TSC_MODE_DEFAULT
> is migrated to a machine and the following if condition in
> tsc_set_info() is false.
>
> if ( tsc_mode == TSC_MODE_DEFAULT && host_tsc_is_safe() &&
>       (has_hvm_container_domain(d) ?
>        d->arch.tsc_khz == cpu_khz || cpu_has_tsc_ratio :
>        incarnation == 0) )


Ah, yes, then we do need to save it.

-boris

* Re: [PATCH 01/13] x86/time.c: Use system time to calculate elapsed_nsec in tsc_get_info()
  2015-10-09 16:31               ` Boris Ostrovsky
@ 2015-10-09 16:51                 ` Haozhong Zhang
  2015-10-09 18:59                   ` Boris Ostrovsky
  0 siblings, 1 reply; 117+ messages in thread
From: Haozhong Zhang @ 2015-10-09 16:51 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Kevin Tian, Keir Fraser, Suravee Suthikulpanit, Andrew Cooper,
	xen-devel, Jan Beulich, Aravind Gopalakrishnan, Jun Nakajima

On Fri, Oct 09, 2015 at 12:31:53PM -0400, Boris Ostrovsky wrote:
> On 10/09/2015 12:19 PM, Jan Beulich wrote:
> >>>>On 09.10.15 at 18:09, <boris.ostrovsky@oracle.com> wrote:
> >>On 10/09/2015 11:11 AM, Jan Beulich wrote:
> >>>>>>On 09.10.15 at 16:00, <haozhong.zhang@intel.com> wrote:
> >>>>On Fri, Oct 09, 2015 at 09:41:36AM -0400, Boris Ostrovsky wrote:
> >>>>>On 10/09/2015 02:51 AM, Jan Beulich wrote:
> >>>>>>>>>On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
> >>>>>>>When the TSC mode of a domain is TSC_MODE_DEFAULT and no TSC emulation
> >>>>>>>is used, the existing tsc_get_info() calculates elapsed_nsec by scaling
> >>>>>>>the host TSC with a ratio between guest TSC rate and
> >>>>>>>nanoseconds. However, the result will be incorrect if the guest TSC rate
> >>>>>>>differs from the host TSC rate. This patch fixes this problem by using
> >>>>>>>the system time as elapsed_nsec.
> >>>>>>For both this and patch 2, while at a first glance (and taking into
> >>>>>>account just the visible patch context) what you say seems to
> >>>>>>make sense, the explanation is far from sufficient namely when
> >>>>>>looking at the function as a whole. For one, effects on existing
> >>>>>>cases need to be explicitly described, in particular why SVM's TSC
> >>>>>>ratio code works without that change (or whether it has been
> >>>>>>broken all along, in which case these would become backporting
> >>>>>>candidates; input from SVM maintainers would be appreciated
> >>>>>>too). That may in particular mean being more specific about
> >>>>>>what is actually wrong with scaling the host TSC here (i.e. in
> >>>>>>which way both results differ), when supposedly that matches
> >>>>>>what the hardware does when TSC ratio is supported.
> >>>>>If elapsed_nsec is the time that guest has been running then how can
> >>>>>get_s_time(), which is system time, be the right answer here? But what
> >>>>>confuses me even more is that existing code is not doing that neither.
> >>>>>
> >>>>>Shouldn't elapsed_nsec be offset by d->arch.vtsc_offset on the get side?
> >>>>>I.e.
> >>>>>
> >>>>>*elapsed_nsec = get_s_time() - d->arch.vtsc_offset?
> >>>>>
> >>>>Yes, I should minus d->arch.vtsc_offset here.
> >>>In which case - afaict - the code becomes identical to that of the
> >>>TSC_MODE_ALWAYS_EMULATE case as well as the
> >>>TSC_MODE_DEFAULT w/ d->arch.vtsc true. Which seems quite
> >>>unlikely to be correct.
> >>*elapsed_nsec = *gtsc_khz = 0; ? Because we are effectively in
> >>TSC_MODE_NEVER.
> >How that? Talk here has been about TSC_MODE_DEFAULT...
> 
> AFAIUI, TSC_MODE_DEFAULT is a shorthand for saying "I will let the
> hypervisor pick whether the guest will be in TSC_MODE_ALWAYS_EMULATE or
> TSC_MODE_NEVER". d->arch.vtsc is what ends up being internal implementation
> of user-provided mode (for the most parts; I think hvm_cpuid() being the
> only true exception --- and perhaps it needs to be looked at).
> 
> So if we have d->arch.vtsc=0 (which is the case we are talking about here)
> then we are really in NEVER mode
>

I don't quite understand this. Is tsc_set_info() the only place that
sets d->arch.tsc_mode? Though it may decide d->arch.vtsc should be 1, it
still sets d->arch.tsc_mode to the user-provided TSC mode for a
non-PVH domain. So tsc_get_info() should never take the
TSC_MODE_NEVER_EMULATE branch unless d->arch.tsc_mode is actually
TSC_MODE_NEVER_EMULATE.

- Haozhong

> 
> -boris
> 
> >
> >>That can't be right...
> >Why not? tsc_set_info() doesn't care about any of its other input
> >values when that mode is in effect.
> >
> >Jan
> >
> 

* Re: [PATCH 01/13] x86/time.c: Use system time to calculate elapsed_nsec in tsc_get_info()
  2015-10-09 16:51                 ` Haozhong Zhang
@ 2015-10-09 18:59                   ` Boris Ostrovsky
  0 siblings, 0 replies; 117+ messages in thread
From: Boris Ostrovsky @ 2015-10-09 18:59 UTC (permalink / raw)
  To: xen-devel

On 10/09/2015 12:51 PM, Haozhong Zhang wrote:
> On Fri, Oct 09, 2015 at 12:31:53PM -0400, Boris Ostrovsky wrote:
>> On 10/09/2015 12:19 PM, Jan Beulich wrote:
>>>>>> On 09.10.15 at 18:09, <boris.ostrovsky@oracle.com> wrote:
>>>> On 10/09/2015 11:11 AM, Jan Beulich wrote:
>>>>>>>> On 09.10.15 at 16:00, <haozhong.zhang@intel.com> wrote:
>>>>>> On Fri, Oct 09, 2015 at 09:41:36AM -0400, Boris Ostrovsky wrote:
>>>>>>> On 10/09/2015 02:51 AM, Jan Beulich wrote:
>>>>>>>>>>> On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
>>>>>>>>> When the TSC mode of a domain is TSC_MODE_DEFAULT and no TSC emulation
>>>>>>>>> is used, the existing tsc_get_info() calculates elapsed_nsec by scaling
>>>>>>>>> the host TSC with a ratio between guest TSC rate and
>>>>>>>>> nanoseconds. However, the result will be incorrect if the guest TSC rate
>>>>>>>>> differs from the host TSC rate. This patch fixes this problem by using
>>>>>>>>> the system time as elapsed_nsec.
>>>>>>>> For both this and patch 2, while at a first glance (and taking into
>>>>>>>> account just the visible patch context) what you say seems to
>>>>>>>> make sense, the explanation is far from sufficient namely when
>>>>>>>> looking at the function as a whole. For one, effects on existing
>>>>>>>> cases need to be explicitly described, in particular why SVM's TSC
>>>>>>>> ratio code works without that change (or whether it has been
>>>>>>>> broken all along, in which case these would become backporting
>>>>>>>> candidates; input from SVM maintainers would be appreciated
>>>>>>>> too). That may in particular mean being more specific about
>>>>>>>> what is actually wrong with scaling the host TSC here (i.e. in
>>>>>>>> which way both results differ), when supposedly that matches
>>>>>>>> what the hardware does when TSC ratio is supported.
>>>>>>> If elapsed_nsec is the time that guest has been running then how can
>>>>>>> get_s_time(), which is system time, be the right answer here? But what
>>>>>>> confuses me even more is that existing code is not doing that neither.
>>>>>>>
>>>>>>> Shouldn't elapsed_nsec be offset by d->arch.vtsc_offset on the get side?
>>>>>>> I.e.
>>>>>>>
>>>>>>> *elapsed_nsec = get_s_time() - d->arch.vtsc_offset?
>>>>>>>
>>>>>> Yes, I should minus d->arch.vtsc_offset here.
>>>>> In which case - afaict - the code becomes identical to that of the
>>>>> TSC_MODE_ALWAYS_EMULATE case as well as the
>>>>> TSC_MODE_DEFAULT w/ d->arch.vtsc true. Which seems quite
>>>>> unlikely to be correct.
>>>> *elapsed_nsec = *gtsc_khz = 0; ? Because we are effectively in
>>>> TSC_MODE_NEVER.
>>> How that? Talk here has been about TSC_MODE_DEFAULT...
>> AFAIUI, TSC_MODE_DEFAULT is a shorthand for saying "I will let the
>> hypervisor pick whether the guest will be in TSC_MODE_ALWAYS_EMULATE or
>> TSC_MODE_NEVER". d->arch.vtsc is what ends up being internal implementation
>> of user-provided mode (for the most parts; I think hvm_cpuid() being the
>> only true exception --- and perhaps it needs to be looked at).
>>
>> So if we have d->arch.vtsc=0 (which is the case we are talking about here)
>> then we are really in NEVER mode
>>
> Not quite understand this. Is tsc_set_info() the only place to set
> d->arch.tsc_mode ?

Yes.

> Though it may decide d->arch.vtsc should be 1, it
> still sets d->arch.tsc_mode to the user provided TSC mode for a
> non-pvh domain. And then in tsc_get_info(), it should never fall into
> TSC_MODE_NEVER_EMULATE branch if d->arch.tsc_mode is not.

I was trying to say that TSC behavior in the current incarnation is
equivalent to _NEVER if d->arch.vtsc is 0. But when we call
tsc_get_info() we cannot handle it as the _NEVER case (because, as you
pointed out, d->arch.vtsc may change after migration). And we don't.

-boris


>
> - Haozhong
>
>> -boris
>>
>>>> That can't be right...
>>> Why not? tsc_set_info() doesn't care about any of its other input
>>> values when that mode is in effect.
>>>
>>> Jan
>>>

* Re: [PATCH 01/13] x86/time.c: Use system time to calculate elapsed_nsec in tsc_get_info()
  2015-10-09  6:51   ` Jan Beulich
  2015-10-09 13:41     ` Boris Ostrovsky
  2015-10-09 14:35     ` Haozhong Zhang
@ 2015-10-14  2:45     ` Haozhong Zhang
  2015-10-14  9:40       ` Jan Beulich
  2 siblings, 1 reply; 117+ messages in thread
From: Haozhong Zhang @ 2015-10-14  2:45 UTC (permalink / raw)
  To: Jan Beulich, Boris Ostrovsky
  Cc: Kevin Tian, Keir Fraser, Suravee Suthikulpanit, Andrew Cooper,
	xen-devel, Aravind Gopalakrishnan, Jun Nakajima

On Fri, Oct 09, 2015 at 12:51:32AM -0600, Jan Beulich wrote:
> >>> On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
> > When the TSC mode of a domain is TSC_MODE_DEFAULT and no TSC emulation
> > is used, the existing tsc_get_info() calculates elapsed_nsec by scaling
> > the host TSC with a ratio between guest TSC rate and
> > nanoseconds. However, the result will be incorrect if the guest TSC rate
> > differs from the host TSC rate. This patch fixes this problem by using
> > the system time as elapsed_nsec.
> 
> For both this and patch 2, while at a first glance (and taking into
> account just the visible patch context) what you say seems to
> make sense, the explanation is far from sufficient namely when
> looking at the function as a whole. For one, effects on existing
> cases need to be explicitly described, in particular why SVM's TSC
> ratio code works without that change (or whether it has been
> broken all along, in which case these would become backporting
> candidates; input from SVM maintainers would be appreciated
> too). That may in particular mean being more specific about
> what is actually wrong with scaling the host TSC here (i.e. in
> which way both results differ), when supposedly that matches
> what the hardware does when TSC ratio is supported.
>

I just found that patch 1 is in fact not necessary for supporting VMX
TSC scaling/SVM TSC ratio, because

 1. VMX TSC scaling and SVM TSC ratio are only used for HVM domains.

 2. The value of elapsed_nsec, which is modified by patch 1, is used
    by tsc_set_info() to compute d->arch.vtsc_offset for domains using
    TSC_MODE_[DEFAULT|ALWAYS_EMULATE].

 3. d->arch.vtsc_offset is then used in three places:
    - gtime_to_gtsc() and gtsc_to_gtime()
      In these two functions, d->arch.vtsc_offset does not take effect
      for HVM domains.
    - cpuid_time_leaf()
      It's only used for domains using TSC_MODE_PVRDTSCP.

Therefore, I think patch 1 can be removed.


However, patch 2 is still necessary. The existing tsc_get_info() uses
the host TSC frequency as the guest TSC frequency for a domain in
TSC_MODE_DEFAULT, which could cause errors in the following example:
 - A domain d using TSC_MODE_DEFAULT is created on host A, then
   migrated to host B, and finally migrated to host C.
 - The host TSC frequencies of the three hosts are f_a, f_b and f_c
   respectively, with f_a != f_b and f_b != f_c.
 - Both host B and host C support TSC scaling (either VMX TSC scaling
   or SVM TSC ratio).

In the above example w/o patch 2,
 1. Initially, d->arch.tsc_khz == f_a.

 2. In the first migration, tsc_get_info() on host A passes f_a as the
    guest TSC frequency to tsc_set_info() on host B, so that after the
    migration d->arch.tsc_khz == f_a still holds. As TSC scaling
    takes effect, guest programs still observe the TSC at frequency f_a.
    So far so good.

 3. However, in the second migration, f_b (!= f_a) is passed as the
    guest TSC frequency to tsc_set_info() on host C, so that after the
    migration d->arch.tsc_khz is no longer f_a. As TSC scaling
    takes effect on host C as well, the TSC frequency observed by
    guest programs changes, which may break some TSC-sensitive programs.

    At least in my test of VMX TSC scaling, the guest Linux kernel
    complained that the tsc clocksource is unstable. SVM TSC ratio
    should have the same problem.

W/ patch 2, tsc_get_info() in the above case always gets the guest TSC
frequency from d->arch.tsc_khz. Then in the first migration above, it
behaves the same as before, while in the second migration it can
maintain the guest TSC frequency correctly.
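
The A -> B -> C scenario can be modelled in a few lines. This is a
rough sketch and not hypervisor code: reported_khz() and
guest_khz_after_chain() are hypothetical names, frequencies are plain
kHz values, and every receiving host is assumed to support TSC scaling
and therefore to keep whatever guest rate it is handed.

```c
#include <stdint.h>

/*
 * Guest TSC rate reported by the sending host.
 * use_domain_khz == 0 models the existing code (host rate);
 * use_domain_khz != 0 models patch 2 (the domain's own rate).
 */
static uint32_t reported_khz(uint32_t domain_khz, uint32_t host_khz,
                             int use_domain_khz)
{
    return use_domain_khz ? domain_khz : host_khz;
}

/*
 * Guest rate after the domain, created on host_khz[0], has been
 * migrated through host_khz[1] .. host_khz[n - 1] in order.
 */
static uint32_t guest_khz_after_chain(const uint32_t *host_khz, int n,
                                      int use_domain_khz)
{
    uint32_t domain_khz = host_khz[0];

    for ( int i = 0; i + 1 < n; i++ )
        /* Host i sends; host i + 1 adopts the reported rate. */
        domain_khz = reported_khz(domain_khz, host_khz[i], use_domain_khz);

    return domain_khz;
}
```

With three hosts the old behaviour leaves the guest at f_b after the
second migration, while the patched behaviour keeps it at f_a.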

> Then a reason needs to be given why the similar logic in the
> PVRDTSCP case does not also get adjusted.
>

That's just because I didn't consider the PVRDTSCP case. I will add it
in the next version.

- Haozhong

> Plus, looking at the respective code in tsc_set_info(), I'm
> getting the impression that what you're trying to do is not in line
> with what is intended so far: Especially the comment there
> suggests that the intention is for the guest TSC to be made
> match the host one. Considering migration this indeed looks
> suspicious, but then that would need changing too.
> 
> Jan
> 

* Re: [PATCH 01/13] x86/time.c: Use system time to calculate elapsed_nsec in tsc_get_info()
  2015-10-14  2:45     ` Haozhong Zhang
@ 2015-10-14  9:40       ` Jan Beulich
  2015-10-14 10:00         ` Haozhong Zhang
  0 siblings, 1 reply; 117+ messages in thread
From: Jan Beulich @ 2015-10-14  9:40 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Kevin Tian, Keir Fraser, Suravee Suthikulpanit, Andrew Cooper,
	xen-devel, Aravind Gopalakrishnan, Jun Nakajima, Boris Ostrovsky

>>> On 14.10.15 at 04:45, <haozhong.zhang@intel.com> wrote:
> However, patch 2 is still necessary. The existing tsc_get_info() uses
> the host TSC frequency as the guest TSC frequency for a domain in
> TSC_MODE_DEFAULT, which could cause errors in the following example:
>  - A domain d using TSC_MODE_DEFAULT is created on host A, then
>    migrated to host B, and finally migrated to host C.
>  - The host TSC frequencies of three hosts are f_a, f_b and f_c
>    respectively and f_a != f_b and f_b != f_c.
>  - Both host B and host C support TSC scaling (either VMX TSC scaling
>    or SVM TSC ratio).
> 
> In above example w/o patch 2,
>  1. Initially, d->arch.tsc_khz == f_a.
>  
>  2. In the first migration, tsc_get_info() on host A passes f_a as the
>     guest TSC frequency to tsc_set_info() on host B, so that after the
>     migration it's still that d->arch.tsc_khz == f_a. As TSC scaling
>     takes effect, guest programs can still observe TSC in frequency f_a.
>     So far so good.
>     
>  3. However, in the second migration, f_b (!= f_a) is passed as the
>     guest TSC frequency to tsc_set_info() on host C so that after the
>     migration d->arch.tsc_khz is not f_a any more.

Hmm, yes, looks like you're right. But I don't think the current use of
cpu_khz should be blindly replaced - instead this should be done only
when cpu_has_tsc_ratio (mirroring what tsc_set_info() does). Unless
of course it can be proven that d->arch.tsc_khz == cpu_khz in all
relevant cases without use of TSC ratio (i.e. the use of cpu_khz here
just served as kind of a shorthand).

Jan

* Re: [PATCH 01/13] x86/time.c: Use system time to calculate elapsed_nsec in tsc_get_info()
  2015-10-14  9:40       ` Jan Beulich
@ 2015-10-14 10:00         ` Haozhong Zhang
  2015-10-14 10:20           ` Jan Beulich
  0 siblings, 1 reply; 117+ messages in thread
From: Haozhong Zhang @ 2015-10-14 10:00 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Keir Fraser, Suravee Suthikulpanit, Andrew Cooper,
	xen-devel, Aravind Gopalakrishnan, Jun Nakajima, Boris Ostrovsky

On Wed, Oct 14, 2015 at 03:40:01AM -0600, Jan Beulich wrote:
> >>> On 14.10.15 at 04:45, <haozhong.zhang@intel.com> wrote:
> > However, patch 2 is still necessary. The existing tsc_get_info() uses
> > the host TSC frequency as the guest TSC frequency for a domain in
> > TSC_MODE_DEFAULT, which could cause errors in the following example:
> >  - A domain d using TSC_MODE_DEFAULT is created on host A, then
> >    migrated to host B, and finally migrated to host C.
> >  - The host TSC frequencies of three hosts are f_a, f_b and f_c
> >    respectively and f_a != f_b and f_b != f_c.
> >  - Both host B and host C support TSC scaling (either VMX TSC scaling
> >    or SVM TSC ratio).
> > 
> > In above example w/o patch 2,
> >  1. Initially, d->arch.tsc_khz == f_a.
> >  
> >  2. In the first migration, tsc_get_info() on host A passes f_a as the
> >     guest TSC frequency to tsc_set_info() on host B, so that after the
> >     migration it's still that d->arch.tsc_khz == f_a. As TSC scaling
> >     takes effect, guest programs can still observe TSC in frequency f_a.
> >     So far so good.
> >     
> >  3. However, in the second migration, f_b (!= f_a) is passed as the
> >     guest TSC frequency to tsc_set_info() on host C so that after the
> >     migration d->arch.tsc_khz is not f_a any more.
> 
> Hmm, yes, looks like you're right. But I don't think the current use of
> cpu_khz should be blindly replaced - instead this should be done only
> when cpu_has_tsc_ratio (mirroring what tsc_set_info() does). Unless
> of course it can be proven that d->arch.tsc_khz == cpu_khz in all
> relevant cases without use of TSC ratio (i.e. the use of cpu_khz here
> just served as kind of a shorthand).
>
> Jan
> 

Sounds reasonable. A better patch 2 may look like

- *gtsc_khz = cpu_khz;
+ if ( is_hvm_domain(d) && cpu_has_tsc_ratio )
+     *gtsc_khz = d->arch.tsc_khz;
+ else
+     *gtsc_khz = cpu_khz;

which gets d->arch.tsc_khz only if TSC ratio is used, otherwise it
follows the existing behavior.

- Haozhong

* Re: [PATCH 01/13] x86/time.c: Use system time to calculate elapsed_nsec in tsc_get_info()
  2015-10-14 10:00         ` Haozhong Zhang
@ 2015-10-14 10:20           ` Jan Beulich
  2015-10-14 10:24             ` Haozhong Zhang
  0 siblings, 1 reply; 117+ messages in thread
From: Jan Beulich @ 2015-10-14 10:20 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Kevin Tian, Keir Fraser, Suravee Suthikulpanit, Andrew Cooper,
	xen-devel, Aravind Gopalakrishnan, Jun Nakajima, Boris Ostrovsky

>>> On 14.10.15 at 12:00, <haozhong.zhang@intel.com> wrote:
> Sounds reasonable. A better patch 2 may look like
> 
> - *gtsc_khz = cpu_khz;
> + if ( is_hvm_domain(d) && cpu_has_tsc_ratio )
> +     *gtsc_khz = d->arch.tsc_khz;
> + else
> +     *gtsc_khz = cpu_khz;
> 
> which gets d->arch.tsc_khz only if TSC ratio is used, otherwise it
> follows the existing behavior.

Except that it needs to use has_hvm_container_domain() (again to
match tsc_set_info()).
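
Put together, the selection would then look roughly as follows. This is
a sketch only: pick_gtsc_khz() is a hypothetical pure function, and its
boolean parameters stand in for has_hvm_container_domain(d) and
cpu_has_tsc_ratio.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rate to report as *gtsc_khz for a domain in TSC_MODE_DEFAULT. */
static uint32_t pick_gtsc_khz(bool hvm_container, bool has_tsc_ratio,
                              uint32_t domain_khz, uint32_t host_khz)
{
    /* Only a scaled HVM-container guest can run at a rate of its own. */
    if ( hvm_container && has_tsc_ratio )
        return domain_khz;

    /* Everything else keeps the existing behaviour. */
    return host_khz;
}
```

Of the four combinations only one returns the domain's own rate.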

Jan

* Re: [PATCH 01/13] x86/time.c: Use system time to calculate elapsed_nsec in tsc_get_info()
  2015-10-14 10:20           ` Jan Beulich
@ 2015-10-14 10:24             ` Haozhong Zhang
  0 siblings, 0 replies; 117+ messages in thread
From: Haozhong Zhang @ 2015-10-14 10:24 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Keir Fraser, Suravee Suthikulpanit, Andrew Cooper,
	xen-devel, Aravind Gopalakrishnan, Jun Nakajima, Boris Ostrovsky

On Wed, Oct 14, 2015 at 04:20:57AM -0600, Jan Beulich wrote:
> >>> On 14.10.15 at 12:00, <haozhong.zhang@intel.com> wrote:
> > Sounds reasonable. A better patch 2 may look like
> > 
> > - *gtsc_khz = cpu_khz;
> > + if ( is_hvm_domain(d) && cpu_has_tsc_ratio )
> > +     *gtsc_khz = d->arch.tsc_khz;
> > + else
> > +     *gtsc_khz = cpu_khz;
> > 
> > which gets d->arch.tsc_khz only if TSC ratio is used, otherwise it
> > follows the existing behavior.
> 
> Except that it needs to use has_hvm_container_domain() (again to
> match tsc_set_info()).
>

Yes.

> Jan
> 

* Re: [PATCH 03/13] x86/hvm: Collect information of TSC scaling ratio
  2015-09-28  7:13 ` [PATCH 03/13] x86/hvm: Collect information of TSC scaling ratio Haozhong Zhang
@ 2015-10-22 12:53   ` Jan Beulich
  2015-10-22 14:40     ` Haozhong Zhang
  0 siblings, 1 reply; 117+ messages in thread
From: Jan Beulich @ 2015-10-22 12:53 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Ian Jackson, xen-devel,
	Aravind Gopalakrishnan, Suravee Suthikulpanit, Keir Fraser,
	Boris Ostrovsky

>>> On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
> Both VMX TSC scaling and SVM TSC ratio use the 64-bit TSC scaling ratio,
> but the number of fractional bits of the ratio is different between VMX
> and SVM. This patch makes the architecture code to collect the number of
> fractional bits and other related information into fields of struct
> hvm_function_table so that they can be used in the common code.
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> ---
>  xen/arch/x86/hvm/svm/svm.c        |  9 +++++++++
>  xen/arch/x86/hvm/vmx/vmx.c        |  2 ++
>  xen/include/asm-x86/hvm/hvm.h     | 13 +++++++++++++
>  xen/include/asm-x86/hvm/svm/svm.h |  1 +
>  4 files changed, 25 insertions(+)
> 
> diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
> index 8de41fa..94b9618 100644
> --- a/xen/arch/x86/hvm/svm/svm.c
> +++ b/xen/arch/x86/hvm/svm/svm.c
> @@ -1428,6 +1428,9 @@ const struct hvm_function_table * __init start_svm(void)
>      if ( !cpu_has_svm_nrips )
>          clear_bit(SVM_FEATURE_DECODEASSISTS, &svm_feature_flags);
>  
> +    if ( cpu_has_tsc_ratio )
> +        svm_function_table.tsc_scaling_supported = 1;
> +
>  #define P(p,s) if ( p ) { printk(" - %s\n", s); printed = 1; }
>      P(cpu_has_svm_npt, "Nested Page Tables (NPT)");
>      P(cpu_has_svm_lbrv, "Last Branch Record (LBR) Virtualisation");
> @@ -2283,6 +2286,12 @@ static struct hvm_function_table __initdata svm_function_table = {
>      .nhvm_vmcx_hap_enabled = nsvm_vmcb_hap_enabled,
>      .nhvm_intr_blocked = nsvm_intr_blocked,
>      .nhvm_hap_walk_L1_p2m = nsvm_hap_walk_L1_p2m,
> +
> +    .tsc_scaling_supported       = 0,

This is not needed.
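
(In C, members of an object with static storage duration that are not
named in a designated initializer are zero-initialized, so the explicit
"= 0" adds nothing. A minimal sketch; struct fn_table and example_table
are hypothetical and much smaller than struct hvm_function_table.)

```c
#include <stdbool.h>

/* Hypothetical, cut-down function table. */
struct fn_table {
    const char *name;
    bool tsc_scaling_supported;
};

static struct fn_table example_table = {
    .name = "example",
    /* .tsc_scaling_supported is not named and therefore starts as 0. */
};

/* Accessor so the table can be inspected from other code. */
static const struct fn_table *get_example_table(void)
{
    return &example_table;
}
```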

> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -1968,6 +1968,8 @@ static struct hvm_function_table __initdata vmx_function_table = {
>      .altp2m_vcpu_update_vmfunc_ve = vmx_vcpu_update_vmfunc_ve,
>      .altp2m_vcpu_emulate_ve = vmx_vcpu_emulate_ve,
>      .altp2m_vcpu_emulate_vmfunc = vmx_vcpu_emulate_vmfunc,
> +    /* support for VMX RDTSC(P) scaling */
> +    .tsc_scaling_supported       = 0,

Same here.

> --- a/xen/include/asm-x86/hvm/svm/svm.h
> +++ b/xen/include/asm-x86/hvm/svm/svm.h
> @@ -96,6 +96,7 @@ extern u32 svm_feature_flags;
>  
>  /* TSC rate */
>  #define DEFAULT_TSC_RATIO       0x0000000100000000ULL
> +#define MAX_TSC_RATIO           0x000000ffffffffffULL
>  #define TSC_RATIO_RSVD_BITS     0xffffff0000000000ULL

How about 

#define MAX_TSC_RATIO           (~TSC_RATIO_RSVD_BITS)

? (But of course it's not really clear in which way this is to
be used as "maximum" without seeing the code using it. I.e.
it's not clear whether you don't really just mean to specify
all the valid bits in the MSR.)
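
As a rough illustration of why one of the two constants may be enough: because the reserved bits quoted above are the contiguous top 24 bits, comparing against the maximum and masking against the reserved bits reject the same values. The helper names below are hypothetical and not part of the patch; only the two constants are taken from the quoted hunk.

```c
#include <stdint.h>

/* Constants as quoted from the svm.h hunk, with MAX derived as suggested. */
#define TSC_RATIO_RSVD_BITS 0xffffff0000000000ULL
#define MAX_TSC_RATIO       (~TSC_RATIO_RSVD_BITS)

/* Hypothetical helper: validity expressed via the maximum only. */
static int tsc_ratio_valid_max(uint64_t ratio)
{
    return ratio != 0 && ratio <= MAX_TSC_RATIO;
}

/* Hypothetical helper: validity expressed via the reserved-bit mask only. */
static int tsc_ratio_valid_rsvd(uint64_t ratio)
{
    return ratio != 0 && !(ratio & TSC_RATIO_RSVD_BITS);
}
```

Both helpers agree for every 64-bit input as long as the reserved bits form one contiguous block at the top of the register, which is an assumption about the layout rather than something the thread states in general.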

Jan

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 04/13] x86/hvm: Setup TSC scaling ratio
  2015-09-28  7:13 ` [PATCH 04/13] x86/hvm: Setup " Haozhong Zhang
@ 2015-10-22 13:13   ` Jan Beulich
  2015-10-22 15:55     ` Haozhong Zhang
  2015-10-23  7:44     ` Haozhong Zhang
  0 siblings, 2 replies; 117+ messages in thread
From: Jan Beulich @ 2015-10-22 13:13 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Ian Jackson, xen-devel,
	Aravind Gopalakrishnan, Suravee Suthikulpanit, Keir Fraser,
	Boris Ostrovsky

>>> On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
> This patch adds a field tsc_scaling_ratio in struct arch_vcpu to

Why not in struct hvm_vcpu? Are you intending any use for PV guests?

> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -297,6 +297,34 @@ int hvm_set_guest_pat(struct vcpu *v, u64 guest_pat)
>      return 1;
>  }
>  
> +void hvm_setup_tsc_scaling(struct vcpu *v)
> +{
> +    u64 ratio, khz;
> +	s8 shift;

Hard tab.

> +    if ( !hvm_funcs.tsc_scaling_supported )
> +        return;
> +
> +    khz = v->domain->arch.tsc_khz;

I don't see the need for this variable in the first place. But if you
absolutely want to keep it, I don't see why it needs to be u64
when the field you load from is uint32_t.

> +    shift = (hvm_funcs.tsc_scaling_ratio_frac_bits <= 32) ?
> +        hvm_funcs.tsc_scaling_ratio_frac_bits : 32;

min()

> +    ratio = khz << shift;
> +    do_div(ratio, cpu_khz);
> +    ratio <<= hvm_funcs.tsc_scaling_ratio_frac_bits - shift;
> +
> +    if ( ratio == 0 ||
> +         ratio > hvm_funcs.max_tsc_scaling_ratio ||
> +         ratio & hvm_funcs.tsc_scaling_ratio_rsvd )

Parentheses around the operands of the & please.
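
For readers following along, the ratio computation quoted above can be sketched stand-alone as below. This is only an illustration under assumptions: tsc_ratio() is a hypothetical name, plain division replaces do_div(), and the fractional widths used in the usage note (32 for SVM, read off DEFAULT_TSC_RATIO; 48 for VMX, read off the ">> 48" in the cover letter) are inferred from this thread rather than from the vendor manuals.

```c
#include <stdint.h>

/*
 * Hypothetical stand-alone version of the quoted computation:
 * ratio = guest_khz / host_khz as a fixed point number with
 * frac_bits fractional bits. Returns 0 if host_khz is 0.
 */
static uint64_t tsc_ratio(uint32_t guest_khz, uint32_t host_khz,
                          unsigned int frac_bits)
{
    /* Shift by at most 32 first so a 32-bit kHz value cannot overflow. */
    unsigned int shift = frac_bits < 32 ? frac_bits : 32;
    uint64_t ratio;

    if ( host_khz == 0 )
        return 0;

    ratio = ((uint64_t)guest_khz << shift) / host_khz;

    /* Apply the remaining fractional bits after the division. */
    return ratio << (frac_bits - shift);
}
```

With equal guest and host frequencies and 32 fractional bits this yields 0x0000000100000000, matching DEFAULT_TSC_RATIO; a 3 GHz guest on a 2 GHz host with 48 fractional bits yields 3ULL << 47, i.e. 1.5.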

> +    {
> +        printk(XENLOG_WARNING
> +               "Invalid TSC scaling ratio - virtual tsc khz=%lu\n",
> +               khz);

Who can issue a call to this function under which conditions? I.e. is
a non-ratelimited printk() okay here? Plus, without identifying the
subject vcpu I don't think the message is of much use beyond your
initial debugging purposes.

> @@ -2023,6 +2051,9 @@ static int hvm_load_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
>      if ( hvm_funcs.load_cpu_ctxt(v, &ctxt) < 0 )
>          return -EINVAL;
>  
> +    if ( !v->domain->arch.vtsc && hvm_funcs.tsc_scaling_supported )
> +        hvm_setup_tsc_scaling(v);

What's the rationale for putting it in this function? And what's the
reason for the dependency on !vtsc (please also see the comment
ahead of tsc_set_info())?

> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -1956,6 +1956,8 @@ void tsc_set_info(struct domain *d,
>          {
>      case TSC_MODE_NEVER_EMULATE:
>              d->arch.vtsc = 0;
> +            if ( tsc_mode == TSC_MODE_NEVER_EMULATE )
> +                d->arch.tsc_khz = cpu_khz;
>              break;
>          }

Depending on the changes to the first two patches: If this change
would remain like this, please move out the TSC_MODE_NEVER_EMULATE
case to be a standalone one again, since the way you do it here
looks pretty confusing/odd.

> @@ -1981,8 +1983,14 @@ void tsc_set_info(struct domain *d,
>      if ( is_hvm_domain(d) )
>      {
>          hvm_set_rdtsc_exiting(d, d->arch.vtsc);
> -        if ( d->vcpu && d->vcpu[0] && incarnation == 0 )
> +        if ( d->vcpu && d->vcpu[0] )
>          {
> +            if ( !d->arch.vtsc && hvm_funcs.tsc_scaling_supported )
> +                hvm_setup_tsc_scaling(d->vcpu[0]);

And what about the other vCPU-s? If you mean this to be along
the lines of the code that follows here, you should put this after
the comment explaining that.

Jan

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 05/13] x86/hvm: Replace architecture TSC scaling by a common function
  2015-09-28  7:13 ` [PATCH 05/13] x86/hvm: Replace architecture TSC scaling by a common function Haozhong Zhang
@ 2015-10-22 13:52   ` Jan Beulich
  2015-10-23  0:49     ` Haozhong Zhang
  0 siblings, 1 reply; 117+ messages in thread
From: Jan Beulich @ 2015-10-22 13:52 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Ian Jackson, xen-devel,
	Aravind Gopalakrishnan, Suravee Suthikulpanit, Keir Fraser,
	Boris Ostrovsky

>>> On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -297,6 +297,59 @@ int hvm_set_guest_pat(struct vcpu *v, u64 guest_pat)
>      return 1;
>  }
>  
> +/*
> + * Multiply tsc by a fixed point number represented by ratio.
> + *
> + * The most significant 64-N bits (mult) of ratio represent the
> + * integral part of the fixed point number; the remaining N bits
> + * (frac) represent the fractional part, ie. ratio represents a fixed
> + * point number (mult + frac * 2^(-N)).
> + *
> + * N equals to hvm_funcs.tsc_scaling_ratio_frac_bits.
> + */
> +static u64 __scale_tsc(u64 tsc, u64 ratio)

No double underscores please without good reason.

> +{
> +    u64 mult, frac, mask, _tsc;

_tsc is not a valid name for a local variable.

> +    int width, nr;

Both unsigned afaict.

> +    BUG_ON(hvm_funcs.tsc_scaling_ratio_frac_bits >= 64);
> +
> +    mult  = ratio >> hvm_funcs.tsc_scaling_ratio_frac_bits;
> +    mask  = (1ULL << hvm_funcs.tsc_scaling_ratio_frac_bits) - 1;
> +    frac  = ratio & mask;
> +
> +    width = 64 - hvm_funcs.tsc_scaling_ratio_frac_bits;
> +    mask  = (1ULL << width) - 1;
> +    nr    = hvm_funcs.tsc_scaling_ratio_frac_bits;
> +
> +    _tsc  = tsc;
> +    _tsc *= mult;
> +    _tsc += (tsc >> hvm_funcs.tsc_scaling_ratio_frac_bits) * frac;
> +
> +    while ( nr >= width )
> +    {
> +        _tsc += (((tsc >> (nr - width)) & mask) * frac) >> (64 - nr);
> +        nr   -= width;
> +    }

Please add a comment explaining what this loop is intended to do.
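
As a point of comparison only, the result such a fixed point multiplication is presumably meant to produce can be written with a 128-bit intermediate. This is not the algorithm in the patch, which stays within 64-bit arithmetic; scale_tsc_ref() is a hypothetical name, and unsigned __int128 is a GCC/Clang extension available on 64-bit targets.

```c
#include <stdint.h>

/*
 * Reference semantics (an assumption about the intent, not the patch's
 * code): multiply tsc by a fixed point ratio with frac_bits fractional
 * bits and drop the fractional part of the product.
 */
static uint64_t scale_tsc_ref(uint64_t tsc, uint64_t ratio,
                              unsigned int frac_bits)
{
    unsigned __int128 prod = (unsigned __int128)tsc * ratio;

    return (uint64_t)(prod >> frac_bits);
}
```

A ratio of 1.0 leaves the TSC unchanged and a ratio of 1.5 (3ULL << 31 with 32 fractional bits) turns 1000 into 1500; a reference like this could also serve to cross-check a 64-bit-only implementation.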

> +u64 hvm_scale_tsc(struct vcpu *v, u64 tsc)

const struct vcpu *v

> +{
> +    u64 _tsc = tsc;

Here I don't even see the need for this misnamed variable.

Jan

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 06/13] x86/hvm: Scale host TSC when setting/getting guest TSC
  2015-09-28  7:13 ` [PATCH 06/13] x86/hvm: Scale host TSC when setting/getting guest TSC Haozhong Zhang
@ 2015-10-22 14:17   ` Jan Beulich
  2015-10-22 15:44     ` Boris Ostrovsky
                       ` (3 more replies)
  0 siblings, 4 replies; 117+ messages in thread
From: Jan Beulich @ 2015-10-22 14:17 UTC (permalink / raw)
  To: Aravind Gopalakrishnan, Suravee Suthikulpanit, Haozhong Zhang,
	Boris Ostrovsky
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Ian Jackson, xen-devel, Jun Nakajima, Keir Fraser

>>> On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
> The existing hvm_set_guest_tsc_fixed() and hvm_get_guest_tsc_fixed()
> calculate the guest TSC by adding the TSC offset to the host TSC. When
> the TSC scaling is enabled, the host TSC should be scaled first. This
> patch adds the scaling logic to those two functions.

Just as mentioned for the first two patches - I'd first of all like to
understand why the lack of scaling wasn't an issue for SVM so far.
What you say reads plausible, but assuming that the SVM TSC scaling
code was tested, I'm hesitant to apply changes to it without
understanding the details (or at least without SVM maintainers'
consent).

> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -388,13 +388,12 @@ void hvm_set_guest_tsc_fixed(struct vcpu *v, u64 guest_tsc, u64 at_tsc)
>          tsc = hvm_get_guest_time_fixed(v, at_tsc);
>          tsc = gtime_to_gtsc(v->domain, tsc);
>      }
> -    else if ( at_tsc )
> -    {
> -        tsc = at_tsc;
> -    }
>      else
>      {
> -        tsc = rdtsc();
> +        tsc = at_tsc ? at_tsc : rdtsc();

In cases like this please prefer the gcc extension allowing the middle
operand of the ?: to be omitted.
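
For reference, the extension in question is GNU C's "a ?: b", which behaves like "a ? a : b" but evaluates a only once. A minimal sketch, with fake_rdtsc() as a made-up stand-in for rdtsc() and pick_tsc() as a hypothetical wrapper:

```c
#include <stdint.h>

/* Made-up stand-in for rdtsc(), so the sketch builds anywhere. */
static uint64_t fake_rdtsc(void)
{
    return 12345;
}

/* Use the caller's timestamp if it is non-zero, else read the counter. */
static uint64_t pick_tsc(uint64_t at_tsc)
{
    /* GNU C extension: the middle operand of ?: is omitted. */
    return at_tsc ?: fake_rdtsc();
}
```

Being a GNU extension, this is accepted by GCC and Clang but is not ISO C, so it only fits code bases that already rely on those compilers.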

Jan

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 03/13] x86/hvm: Collect information of TSC scaling ratio
  2015-10-22 12:53   ` Jan Beulich
@ 2015-10-22 14:40     ` Haozhong Zhang
  2015-10-22 14:51       ` Jan Beulich
  0 siblings, 1 reply; 117+ messages in thread
From: Haozhong Zhang @ 2015-10-22 14:40 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Ian Jackson, xen-devel,
	Aravind Gopalakrishnan, Suravee Suthikulpanit, Keir Fraser,
	Boris Ostrovsky

On Thu, Oct 22, 2015 at 06:53:27AM -0600, Jan Beulich wrote:
> >>> On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
> > Both VMX TSC scaling and SVM TSC ratio use the 64-bit TSC scaling ratio,
> > but the number of fractional bits of the ratio is different between VMX
> > and SVM. This patch makes the architecture code collect the number of
> > fractional bits and other related information into fields of struct
> > hvm_function_table so that they can be used in the common code.
> > 
> > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> > ---
> >  xen/arch/x86/hvm/svm/svm.c        |  9 +++++++++
> >  xen/arch/x86/hvm/vmx/vmx.c        |  2 ++
> >  xen/include/asm-x86/hvm/hvm.h     | 13 +++++++++++++
> >  xen/include/asm-x86/hvm/svm/svm.h |  1 +
> >  4 files changed, 25 insertions(+)
> > 
> > diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
> > index 8de41fa..94b9618 100644
> > --- a/xen/arch/x86/hvm/svm/svm.c
> > +++ b/xen/arch/x86/hvm/svm/svm.c
> > @@ -1428,6 +1428,9 @@ const struct hvm_function_table * __init start_svm(void)
> >      if ( !cpu_has_svm_nrips )
> >          clear_bit(SVM_FEATURE_DECODEASSISTS, &svm_feature_flags);
> >  
> > +    if ( cpu_has_tsc_ratio )
> > +        svm_function_table.tsc_scaling_supported = 1;
> > +
> >  #define P(p,s) if ( p ) { printk(" - %s\n", s); printed = 1; }
> >      P(cpu_has_svm_npt, "Nested Page Tables (NPT)");
> >      P(cpu_has_svm_lbrv, "Last Branch Record (LBR) Virtualisation");
> > @@ -2283,6 +2286,12 @@ static struct hvm_function_table __initdata svm_function_table = {
> >      .nhvm_vmcx_hap_enabled = nsvm_vmcb_hap_enabled,
> >      .nhvm_intr_blocked = nsvm_intr_blocked,
> >      .nhvm_hap_walk_L1_p2m = nsvm_hap_walk_L1_p2m,
> > +
> > +    .tsc_scaling_supported       = 0,
> 
> This is not needed.
> 
> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -1968,6 +1968,8 @@ static struct hvm_function_table __initdata vmx_function_table = {
> >      .altp2m_vcpu_update_vmfunc_ve = vmx_vcpu_update_vmfunc_ve,
> >      .altp2m_vcpu_emulate_ve = vmx_vcpu_emulate_ve,
> >      .altp2m_vcpu_emulate_vmfunc = vmx_vcpu_emulate_vmfunc,
> > +    /* support for VMX RDTSC(P) scaling */
> > +    .tsc_scaling_supported       = 0,
> 
> Same here.
>

I'll remove them in the next version.

> > --- a/xen/include/asm-x86/hvm/svm/svm.h
> > +++ b/xen/include/asm-x86/hvm/svm/svm.h
> > @@ -96,6 +96,7 @@ extern u32 svm_feature_flags;
> >  
> >  /* TSC rate */
> >  #define DEFAULT_TSC_RATIO       0x0000000100000000ULL
> > +#define MAX_TSC_RATIO           0x000000ffffffffffULL
> >  #define TSC_RATIO_RSVD_BITS     0xffffff0000000000ULL
> 
> How about 
> 
> #define MAX_TSC_RATIO           (~TSC_RATIO_RSVD_BITS)
>

Yes.

> ? (But of course it's not really clear in which way this is to
> be used as "maximum" without seeing the code using it. I.e.
> it's not clear whether you don't really just mean to specify
> all the valid bits in the MSR.)
>

The reserved bits are used to calculate the maximum TSC ratio, which is
used in hvm_setup_tsc_scaling() in patch 4 to check whether a TSC
scaling ratio is legal.

> Jan
> 

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 03/13] x86/hvm: Collect information of TSC scaling ratio
  2015-10-22 14:40     ` Haozhong Zhang
@ 2015-10-22 14:51       ` Jan Beulich
  2015-10-22 15:57         ` Haozhong Zhang
  0 siblings, 1 reply; 117+ messages in thread
From: Jan Beulich @ 2015-10-22 14:51 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Ian Jackson, xen-devel,
	Aravind Gopalakrishnan, Suravee Suthikulpanit, Keir Fraser,
	Boris Ostrovsky

>>> On 22.10.15 at 16:40, <haozhong.zhang@intel.com> wrote:
> On Thu, Oct 22, 2015 at 06:53:27AM -0600, Jan Beulich wrote:
>> >>> On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
>> > --- a/xen/include/asm-x86/hvm/svm/svm.h
>> > +++ b/xen/include/asm-x86/hvm/svm/svm.h
>> > @@ -96,6 +96,7 @@ extern u32 svm_feature_flags;
>> >  
>> >  /* TSC rate */
>> >  #define DEFAULT_TSC_RATIO       0x0000000100000000ULL
>> > +#define MAX_TSC_RATIO           0x000000ffffffffffULL
>> >  #define TSC_RATIO_RSVD_BITS     0xffffff0000000000ULL
>> 
>> How about 
>> 
>> #define MAX_TSC_RATIO           (~TSC_RATIO_RSVD_BITS)
> 
> Yes.
> 
>> ? (But of course it's not really clear in which way this is to
>> be used as "maximum" without seeing the code using it. I.e.
>> it's not clear whether you don't really just mean to specify
>> all the valid bits in the MSR.)
> 
> The reserved bits are used to calculate the maximum TSC ratio, which is
> used in hvm_setup_tsc_scaling() in patch 4 to check whether a TSC
> scaling ratio is legal.

The main motivation behind the comment was to understand
whether one of the two constants wouldn't suffice.

Jan

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 07/13] x86/hvm: Move saving/loading vcpu's TSC to common code
  2015-09-28  7:13 ` [PATCH 07/13] x86/hvm: Move saving/loading vcpu's TSC to common code Haozhong Zhang
@ 2015-10-22 14:54   ` Jan Beulich
  0 siblings, 0 replies; 117+ messages in thread
From: Jan Beulich @ 2015-10-22 14:54 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Ian Jackson, xen-devel,
	Aravind Gopalakrishnan, Suravee Suthikulpanit, Keir Fraser,
	Boris Ostrovsky

>>> On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
> Both VMX and SVM saves/loads vcpu's TSC when saving/loading vcpu's
> context, so this patch moves saving/loading vcpu's TSC to the common
> function hvm_save_cpu_ctxt()/hvm_load_cpu_ctxt().
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>

Acked-by: Jan Beulich <jbeulich@suse.com>

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 08/13] x86/hvm: Detect TSC scaling through hvm_funcs in tsc_set_info()
  2015-09-28  7:13 ` [PATCH 08/13] x86/hvm: Detect TSC scaling through hvm_funcs in tsc_set_info() Haozhong Zhang
@ 2015-10-22 15:01   ` Jan Beulich
  0 siblings, 0 replies; 117+ messages in thread
From: Jan Beulich @ 2015-10-22 15:01 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Ian Jackson, xen-devel,
	Aravind Gopalakrishnan, Suravee Suthikulpanit, Keir Fraser,
	Boris Ostrovsky

>>> On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
> This patch uses hvm_funcs.tsc_scaling_supported instead of the
> architecture code to detect the TSC scaling support.
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>

Acked-by: Jan Beulich <jbeulich@suse.com>

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 06/13] x86/hvm: Scale host TSC when setting/getting guest TSC
  2015-10-22 14:17   ` Jan Beulich
@ 2015-10-22 15:44     ` Boris Ostrovsky
  2015-10-22 16:23       ` Haozhong Zhang
  2015-10-27 20:16       ` Aravind Gopalakrishnan
  2015-10-22 16:03     ` Haozhong Zhang
                       ` (2 subsequent siblings)
  3 siblings, 2 replies; 117+ messages in thread
From: Boris Ostrovsky @ 2015-10-22 15:44 UTC (permalink / raw)
  To: Jan Beulich, Aravind Gopalakrishnan, Suravee Suthikulpanit,
	Haozhong Zhang
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Ian Jackson, xen-devel, Jun Nakajima, Keir Fraser

On 10/22/2015 10:17 AM, Jan Beulich wrote:
>>>> On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
>> The existing hvm_set_guest_tsc_fixed() and hvm_get_guest_tsc_fixed()
>> calculate the guest TSC by adding the TSC offset to the host TSC. When
>> the TSC scaling is enabled, the host TSC should be scaled first. This
>> patch adds the scaling logic to those two functions.
> Just as mentioned for the first two patches - I'd first of all like to
> understand why the lack of scaling wasn't an issue for SVM so far.
> What you say reads plausible, but assuming that the SVM TSC scaling
> code was tested, I'm hesitant to apply changes to it without
> understanding the details (or at least without SVM maintainers'
> consent).

I don't see that this series will create any regressions in SVM. Most
of the changes move SVM-specific code into HVM, and I didn't see any
obvious problems there. I do have a concern about patch 5 since I am
not sure I fully understand whether the new algorithm (in __scale_tsc())
is equivalent to the current SVM code. I think you also had questions
about that.

Having said this, the fact that this patch (and patch 9) fix bugs leads 
me to believe this feature may not have been thoroughly tested.

I don't have a pair of appropriate AMD systems to test this series with 
migration (which is where this can be verified). Aravind, can you find 
something and see how this works?

-boris


>
>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -388,13 +388,12 @@ void hvm_set_guest_tsc_fixed(struct vcpu *v, u64 guest_tsc, u64 at_tsc)
>>           tsc = hvm_get_guest_time_fixed(v, at_tsc);
>>           tsc = gtime_to_gtsc(v->domain, tsc);
>>       }
>> -    else if ( at_tsc )
>> -    {
>> -        tsc = at_tsc;
>> -    }
>>       else
>>       {
>> -        tsc = rdtsc();
>> +        tsc = at_tsc ? at_tsc : rdtsc();
> In cases like this please prefer the gcc extension allowing the middle
> operand of the ?: to be omitted.
>
> Jan
>

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 09/13] x86/time.c: Scale host TSC in pvclock properly
  2015-09-28  7:13 ` [PATCH 09/13] x86/time.c: Scale host TSC in pvclock properly Haozhong Zhang
  2015-09-28 16:36   ` Boris Ostrovsky
@ 2015-10-22 15:50   ` Boris Ostrovsky
  2015-10-22 16:44     ` Haozhong Zhang
  1 sibling, 1 reply; 117+ messages in thread
From: Boris Ostrovsky @ 2015-10-22 15:50 UTC (permalink / raw)
  To: Haozhong Zhang, xen-devel
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Ian Jackson, Aravind Gopalakrishnan,
	Jan Beulich, Keir Fraser, Suravee Suthikulpanit

On 09/28/2015 03:13 AM, Haozhong Zhang wrote:
> This patch makes the pvclock return the scaled host TSC and
> corresponding scaling parameters to HVM domains if guest TSC is not
> emulated and TSC scaling is enabled.
>
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> ---
>   xen/arch/x86/time.c | 15 ++++++++++++---
>   1 file changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
> index 4b5402c..54eab6e 100644
> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -832,10 +832,19 @@ static void __update_vcpu_system_time(struct vcpu *v, int force)
>       }
>       else
>       {
> -        _u.tsc_timestamp     = t->local_tsc_stamp;
> +        if ( is_hvm_domain(d) && hvm_funcs.tsc_scaling_supported )
> +        {
> +            _u.tsc_timestamp     = hvm_scale_tsc(v, t->local_tsc_stamp);
> +            _u.tsc_to_system_mul = d->arch.vtsc_to_ns.mul_frac;
> +            _u.tsc_shift         = d->arch.vtsc_to_ns.shift;
> +        }
> +        else
> +        {
> +            _u.tsc_timestamp     = t->local_tsc_stamp;
> +            _u.tsc_to_system_mul = t->tsc_scale.mul_frac;
> +            _u.tsc_shift         = (s8)t->tsc_scale.shift;
> +        }
>           _u.system_time       = t->stime_local_stamp;
> -        _u.tsc_to_system_mul = t->tsc_scale.mul_frac;
> -        _u.tsc_shift         = (s8)t->tsc_scale.shift;
>       }
>       if ( is_hvm_domain(d) )
>           _u.tsc_timestamp += v->arch.hvm_vcpu.cache_tsc_offset;

So this is not directly related to this series but when we calculate 
tsc_timestamp --- shouldn't we subtract TSC offset? Otherwise we are 
reporting (possibly scaled) host's TSC and this is supposed to be 
guest's counter.
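
To make the question concrete, here is a sketch of how a guest would typically turn a TSC reading into system time from the fields being filled in above. It follows the usual pvclock arithmetic as I understand it; the struct is a trimmed, hypothetical stand-in for the real shared structure (only the field names come from the quoted hunk), and unsigned __int128 is a GCC/Clang extension.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down view of the time info the guest reads. */
struct pvclock_sample {
    uint64_t tsc_timestamp;     /* TSC value at the time of the snapshot */
    uint64_t system_time;       /* system time (ns) at the snapshot */
    uint32_t tsc_to_system_mul; /* 32.32 fixed point ns-per-tick factor */
    int8_t   tsc_shift;         /* pre-shift applied to the TSC delta */
};

/* Convert a TSC value read by the guest into nanoseconds of system time. */
static uint64_t pvclock_ns(const struct pvclock_sample *s, uint64_t guest_tsc)
{
    uint64_t delta = guest_tsc - s->tsc_timestamp;

    if ( s->tsc_shift >= 0 )
        delta <<= s->tsc_shift;
    else
        delta >>= -s->tsc_shift;

    return s->system_time +
           (uint64_t)(((unsigned __int128)delta * s->tsc_to_system_mul) >> 32);
}
```

Since delta is formed from a value the guest reads itself, tsc_timestamp only makes sense if it is expressed in the same TSC domain (scaled and offset) that the guest observes.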


-boris

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 11/13] vmx: Use scaled host TSC to calculate TSC offset
  2015-09-28  7:13 ` [PATCH 11/13] vmx: Use scaled host TSC to calculate TSC offset Haozhong Zhang
@ 2015-10-22 15:55   ` Boris Ostrovsky
  2015-10-22 17:12     ` Haozhong Zhang
  2015-10-27 13:29   ` Jan Beulich
  1 sibling, 1 reply; 117+ messages in thread
From: Boris Ostrovsky @ 2015-10-22 15:55 UTC (permalink / raw)
  To: Haozhong Zhang, xen-devel
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Ian Jackson, Aravind Gopalakrishnan,
	Jan Beulich, Keir Fraser, Suravee Suthikulpanit

On 09/28/2015 03:13 AM, Haozhong Zhang wrote:
> If VMX TSC scaling is enabled and no TSC emulation is used,
> vmx_set_tsc_offset() will calculate the TSC offset by subtracting the
> scaled host TSC from the current guest TSC.
>
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> ---
>   xen/arch/x86/hvm/vmx/vmx.c | 15 +++++++++++++++
>   1 file changed, 15 insertions(+)
>
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 454440e..163974d 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -1102,11 +1102,26 @@ static void vmx_handle_cd(struct vcpu *v, unsigned long value)
>   
>   static void vmx_set_tsc_offset(struct vcpu *v, u64 offset, u64 at_tsc)
>   {
> +    uint64_t host_tsc, guest_tsc;
> +    struct domain *d = v->domain;
> +
> +    guest_tsc = hvm_get_guest_tsc_fixed(v, at_tsc);
> +
> +    if ( cpu_has_vmx_tsc_scaling && !d->arch.vtsc )
> +    {
> +        host_tsc = at_tsc ? at_tsc : rdtsc();
> +        offset = guest_tsc - hvm_scale_tsc(v, host_tsc);
> +    }
> +
>       vmx_vmcs_enter(v);
>   
> +    if ( !nestedhvm_enabled(d) )
> +        goto out;
> +
>       if ( nestedhvm_vcpu_in_guestmode(v) )
>           offset += nvmx_get_tsc_offset(v);
>   
> +out:
>       __vmwrite(TSC_OFFSET, offset);
>       vmx_vmcs_exit(v);
>   }


This (and corresponding SVM code) looks somewhat suspect to me: if the 
processor supports scaling we are ignoring caller-provided offset.

Besides, at least when called from hvm_set_guest_tsc_fixed() --- we've 
already taken scaling into account, that's what patch 6 does, doesn't it?
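
The relationship under discussion can be sketched from the formula in the cover letter, guest TSC = ((host TSC * multiplier) >> 48) + offset. The function names below are hypothetical and the 128-bit intermediate (a GCC/Clang extension) is used only to keep the sketch short; it is not how the patch computes the product.

```c
#include <stdint.h>

/* (host_tsc * mult) >> 48, per the formula in the cover letter. */
static uint64_t scale48(uint64_t host_tsc, uint64_t mult)
{
    return (uint64_t)(((unsigned __int128)host_tsc * mult) >> 48);
}

/* Offset that makes the guest observe guest_tsc at host time host_tsc. */
static uint64_t tsc_offset_for(uint64_t guest_tsc, uint64_t host_tsc,
                               uint64_t mult)
{
    /* Unsigned wrap-around is intended: the offset may be "negative". */
    return guest_tsc - scale48(host_tsc, mult);
}

/* What a guest rdtsc would return under the formula above. */
static uint64_t guest_tsc_read(uint64_t host_tsc, uint64_t mult,
                               uint64_t offset)
{
    return scale48(host_tsc, mult) + offset;
}
```

If the caller has already derived its offset from a scaled host TSC, recomputing it here is redundant at best, which is the concern raised above.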


-boris

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 04/13] x86/hvm: Setup TSC scaling ratio
  2015-10-22 13:13   ` Jan Beulich
@ 2015-10-22 15:55     ` Haozhong Zhang
  2015-10-22 16:05       ` Jan Beulich
  2015-10-23  7:44     ` Haozhong Zhang
  1 sibling, 1 reply; 117+ messages in thread
From: Haozhong Zhang @ 2015-10-22 15:55 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Ian Jackson, xen-devel,
	Aravind Gopalakrishnan, Suravee Suthikulpanit, Keir Fraser,
	Boris Ostrovsky

On Thu, Oct 22, 2015 at 07:13:07AM -0600, Jan Beulich wrote:
> >>> On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
> > This patch adds a field tsc_scaling_ratio in struct arch_vcpu to
> 
> Why not in struct hvm_vcpu? Are you intending any use for PV guests?
>

No, I'll move tsc_scaling_ratio to struct hvm_vcpu.

> > --- a/xen/arch/x86/hvm/hvm.c
> > +++ b/xen/arch/x86/hvm/hvm.c
> > @@ -297,6 +297,34 @@ int hvm_set_guest_pat(struct vcpu *v, u64 guest_pat)
> >      return 1;
> >  }
> >  
> > +void hvm_setup_tsc_scaling(struct vcpu *v)
> > +{
> > +    u64 ratio, khz;
> > +	s8 shift;
> 
> Hard tab.

will change to spaces

> 
> > +    if ( !hvm_funcs.tsc_scaling_supported )
> > +        return;
> > +
> > +    khz = v->domain->arch.tsc_khz;
> 
> I don't see the need for this variable in the first place. But if you
> absolutely want to keep it, I don't see why it needs to be u64
> when the field you load from is uint32_t.
>

will remove

> > +    shift = (hvm_funcs.tsc_scaling_ratio_frac_bits <= 32) ?
> > +        hvm_funcs.tsc_scaling_ratio_frac_bits : 32;
> 
> min()
>

yes

> > +    ratio = khz << shift;
> > +    do_div(ratio, cpu_khz);
> > +    ratio <<= hvm_funcs.tsc_scaling_ratio_frac_bits - shift;
> > +
> > +    if ( ratio == 0 ||
> > +         ratio > hvm_funcs.max_tsc_scaling_ratio ||
> > +         ratio & hvm_funcs.tsc_scaling_ratio_rsvd )
> 
> Parentheses around the operands of the & please.
>

will add

> > +    {
> > +        printk(XENLOG_WARNING
> > +               "Invalid TSC scaling ratio - virtual tsc khz=%lu\n",
> > +               khz);
> 
> Who can issue a call to this function under which conditions? I.e. is
> a non-ratelimited printk() okay here? Plus, without identifying the
> subject vcpu I don't think the message is of much use beyond your
> initial debugging purposes.
>

hvm_load_cpu_ctxt(), hvm_vcpu_reset_state() and tsc_set_info() call
this function. Am I correct that those functions are not called at a
high rate? But I agree that the warning is useless w/o identifying the
subject vcpu and will add it.


> > @@ -2023,6 +2051,9 @@ static int hvm_load_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
> >      if ( hvm_funcs.load_cpu_ctxt(v, &ctxt) < 0 )
> >          return -EINVAL;
> >  
> > +    if ( !v->domain->arch.vtsc && hvm_funcs.tsc_scaling_supported )
> > +        hvm_setup_tsc_scaling(v);
> 
> What's the rationale for putting it in this function?

I cannot clearly recall why it's needed here. I remember it's used in
the case where a domain is restored from a file by 'xl restore'. I'll
reply to this issue later.

> And what's the
> reason for the dependency on !vtsc (please also see the comment
> ahead of tsc_set_info())?
>

If v->domain->arch.vtsc == 1, guest rdtsc/rdtscp is trapped (set up in
tsc_set_info()) and emulated by the hypervisor, so the hardware TSC
scaling is not used in this case.

> > --- a/xen/arch/x86/time.c
> > +++ b/xen/arch/x86/time.c
> > @@ -1956,6 +1956,8 @@ void tsc_set_info(struct domain *d,
> >          {
> >      case TSC_MODE_NEVER_EMULATE:
> >              d->arch.vtsc = 0;
> > +            if ( tsc_mode == TSC_MODE_NEVER_EMULATE )
> > +                d->arch.tsc_khz = cpu_khz;
> >              break;
> >          }
> 
> Depending on the changes to the first two patches: If this change
> would remain like this, please move out the TSC_MODE_NEVER_EMULATE
> case to be a standalone one again, since the way you do it here
> looks pretty confusing/odd.
>

Thanks for pointing it out! This two-line change is incorrect and
should be removed.

> > @@ -1981,8 +1983,14 @@ void tsc_set_info(struct domain *d,
> >      if ( is_hvm_domain(d) )
> >      {
> >          hvm_set_rdtsc_exiting(d, d->arch.vtsc);
> > -        if ( d->vcpu && d->vcpu[0] && incarnation == 0 )
> > +        if ( d->vcpu && d->vcpu[0] )
> >          {
> > +            if ( !d->arch.vtsc && hvm_funcs.tsc_scaling_supported )
> > +                hvm_setup_tsc_scaling(d->vcpu[0]);
> 
> And what about the other vCPU-s? If you mean this to be along
> the lines of the code that follows here, you should put this after
> the comment explaining that.
>

TSC scaling for the other vcpus is set up in hvm_vcpu_reset_state(). But
I'm not sure it can be moved together with the code that follows,
because of the subsequent
     if ( incarnation )
         return;

incarnation != 0 after migration, yet the setup of TSC scaling is
still necessary then.

> Jan
> 

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 03/13] x86/hvm: Collect information of TSC scaling ratio
  2015-10-22 14:51       ` Jan Beulich
@ 2015-10-22 15:57         ` Haozhong Zhang
  0 siblings, 0 replies; 117+ messages in thread
From: Haozhong Zhang @ 2015-10-22 15:57 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Ian Jackson, xen-devel,
	Aravind Gopalakrishnan, Suravee Suthikulpanit, Keir Fraser,
	Boris Ostrovsky

On Thu, Oct 22, 2015 at 08:51:13AM -0600, Jan Beulich wrote:
> >>> On 22.10.15 at 16:40, <haozhong.zhang@intel.com> wrote:
> > On Thu, Oct 22, 2015 at 06:53:27AM -0600, Jan Beulich wrote:
> >> >>> On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
> >> > --- a/xen/include/asm-x86/hvm/svm/svm.h
> >> > +++ b/xen/include/asm-x86/hvm/svm/svm.h
> >> > @@ -96,6 +96,7 @@ extern u32 svm_feature_flags;
> >> >  
> >> >  /* TSC rate */
> >> >  #define DEFAULT_TSC_RATIO       0x0000000100000000ULL
> >> > +#define MAX_TSC_RATIO           0x000000ffffffffffULL
> >> >  #define TSC_RATIO_RSVD_BITS     0xffffff0000000000ULL
> >> 
> >> How about 
> >> 
> >> #define MAX_TSC_RATIO           (~TSC_RATIO_RSVD_BITS)
> > 
> > Yes.
> > 
> >> ? (But of course it's not really clear in which way this is to
> >> be used as "maximum" without seeing the code using it. I.e.
> >> it's not clear whether you don't really just mean to specify
> >> all the valid bits in the MSR.)
> > 
> > > The reserved bits are used to calculate the maximum TSC ratio, which is
> > > used in hvm_setup_tsc_scaling() in patch 4 to check whether a TSC
> > > scaling ratio is legal.
> 
> The main motivation behind the comment was to understand
> whether one of the two constants wouldn't suffice.
>

Yes, either one by itself suffices. I'll only use the max ratio in the
next version (especially since VMX TSC scaling does not reserve any bits).

> Jan
> 

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 06/13] x86/hvm: Scale host TSC when setting/getting guest TSC
  2015-10-22 14:17   ` Jan Beulich
  2015-10-22 15:44     ` Boris Ostrovsky
@ 2015-10-22 16:03     ` Haozhong Zhang
  2015-10-27  1:54     ` Haozhong Zhang
  2015-10-27  8:44     ` Haozhong Zhang
  3 siblings, 0 replies; 117+ messages in thread
From: Haozhong Zhang @ 2015-10-22 16:03 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Ian Jackson, xen-devel,
	Aravind Gopalakrishnan, Suravee Suthikulpanit, Keir Fraser,
	Boris Ostrovsky

On Thu, Oct 22, 2015 at 08:17:29AM -0600, Jan Beulich wrote:
> >>> On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
> > The existing hvm_set_guest_tsc_fixed() and hvm_get_guest_tsc_fixed()
> > calculate the guest TSC by adding the TSC offset to the host TSC. When
> > the TSC scaling is enabled, the host TSC should be scaled first. This
> > patch adds the scaling logic to those two functions.
> 
> Just as mentioned for the first two patches - I'd first of all like to
> understand why the lack of scaling wasn't an issue for SVM so far.
> What you say reads plausible, but assuming that the SVM TSC scaling
> code was tested, I'm hesitant to apply changes to it without
> understanding the details (or at least without SVM maintainers'
> consent).
> 
> > --- a/xen/arch/x86/hvm/hvm.c
> > +++ b/xen/arch/x86/hvm/hvm.c
> > @@ -388,13 +388,12 @@ void hvm_set_guest_tsc_fixed(struct vcpu *v, u64 guest_tsc, u64 at_tsc)
> >          tsc = hvm_get_guest_time_fixed(v, at_tsc);
> >          tsc = gtime_to_gtsc(v->domain, tsc);
> >      }
> > -    else if ( at_tsc )
> > -    {
> > -        tsc = at_tsc;
> > -    }
> >      else
> >      {
> > -        tsc = rdtsc();
> > +        tsc = at_tsc ? at_tsc : rdtsc();
> 
> In cases like this please prefer the gcc extension allowing the middle
> operand of the ?: to be omitted.
>

Will do, I'll change it accordingly.
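
For reference, a small sketch of the GNU C extension mentioned above:
"a ?: b" behaves like "a ? a : b" except that a is evaluated only
once. fake_rdtsc() and pick_tsc() are hypothetical stand-ins invented
for this example; the real code would call rdtsc().

```c
#include <stdint.h>

/* Hypothetical stand-in for rdtsc(); returns a fixed value for the demo. */
static uint64_t fake_rdtsc(void)
{
    return 1000;
}

static uint64_t pick_tsc(uint64_t at_tsc)
{
    /*
     * GNU C conditional with omitted middle operand: yields at_tsc when
     * it is non-zero, otherwise falls back to reading the counter.
     */
    return at_tsc ?: fake_rdtsc();
}
```

This needs gcc or clang; it is not ISO C.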

> Jan
> 

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 04/13] x86/hvm: Setup TSC scaling ratio
  2015-10-22 15:55     ` Haozhong Zhang
@ 2015-10-22 16:05       ` Jan Beulich
  2015-10-22 16:39         ` Haozhong Zhang
  0 siblings, 1 reply; 117+ messages in thread
From: Jan Beulich @ 2015-10-22 16:05 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Ian Jackson, xen-devel,
	Aravind Gopalakrishnan, Suravee Suthikulpanit, Keir Fraser,
	Boris Ostrovsky

>>> On 22.10.15 at 17:55, <haozhong.zhang@intel.com> wrote:
> On Thu, Oct 22, 2015 at 07:13:07AM -0600, Jan Beulich wrote:
>> >>> On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
>> > +    {
>> > +        printk(XENLOG_WARNING
>> > +               "Invalid TSC scaling ratio - virtual tsc khz=%lu\n",
>> > +               khz);
>> 
>> Who can issue a call to this function under which conditions? I.e. is
>> a non-ratelimited printk() okay here? Plus, without identifying the
>> subject vcpu I don't think the message is of much use beyond your
>> initial debugging purposes.
> 
> hvm_load_cpu_ctxt(), hvm_vcpu_reset_state() and tsc_set_info() call
> this function. Am I correct that those functions are not called at a
> high rate? But I agree that the warning is useless w/o identifying the
> subject vcpu and will add it.

The question isn't whether they're normally called at a high rate,
but whether a malicious entity could make it so.

>> And what's the
>> reason for the dependency on !vtsc (please also see the comment
>> ahead of tsc_set_info())?
> 
> If v->domain->arch.vtsc == 1, guest rdtsc/rdtscp is trapped (setup in
> tsc_set_info()) and emulated by hypervisor and the hardware TSC
> scaling is not used in this case.

But there wouldn't be any harm, I suppose?

>> > @@ -1981,8 +1983,14 @@ void tsc_set_info(struct domain *d,
>> >      if ( is_hvm_domain(d) )
>> >      {
>> >          hvm_set_rdtsc_exiting(d, d->arch.vtsc);
>> > -        if ( d->vcpu && d->vcpu[0] && incarnation == 0 )
>> > +        if ( d->vcpu && d->vcpu[0] )
>> >          {
>> > +            if ( !d->arch.vtsc && hvm_funcs.tsc_scaling_supported )
>> > +                hvm_setup_tsc_scaling(d->vcpu[0]);
>> 
>> And what about the other vCPU-s? If you mean this to be along
>> the lines of the code that follows here, you should put this after
>> the comment explaining that.
>>
> 
> TSC scaling for the other vcpus is set up in hvm_vcpu_reset_state(). But
> I'm not sure it can be moved together with the following code because of
> the subsequent
>      if ( incarnation )
>          return;
> 
> incarnation != 0 after migration, yet the setup of TSC scaling is
> still necessary then.

It seems to me that you could move _all_ your additions to the if()
body past the comment (perhaps adjusting that one as needed).

Jan

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 06/13] x86/hvm: Scale host TSC when setting/getting guest TSC
  2015-10-22 15:44     ` Boris Ostrovsky
@ 2015-10-22 16:23       ` Haozhong Zhang
  2015-10-27 20:16       ` Aravind Gopalakrishnan
  1 sibling, 0 replies; 117+ messages in thread
From: Haozhong Zhang @ 2015-10-22 16:23 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Kevin Tian, Wei Liu, Jan Beulich, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Ian Jackson, xen-devel,
	Aravind Gopalakrishnan, Suravee Suthikulpanit, Keir Fraser,
	Ian Campbell

On Thu, Oct 22, 2015 at 11:44:52AM -0400, Boris Ostrovsky wrote:
> On 10/22/2015 10:17 AM, Jan Beulich wrote:
> >>>>On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
> >>The existing hvm_set_guest_tsc_fixed() and hvm_get_guest_tsc_fixed()
> >>calculate the guest TSC by adding the TSC offset to the host TSC. When
> >>the TSC scaling is enabled, the host TSC should be scaled first. This
> >>patch adds the scaling logic to those two functions.
> >Just like mentioned for the first two patches - I'd first of all like to
> >understand why the lack of scaling wasn't an issue for SVM so
> >far. What you say reads plausible, but assuming that SVM TSC scaling
> >code was tested, I'm hesitant to apply changes to it without
> >understanding the details (or at least without SVM maintainers'
> >consent).
> 
> I don't see that this series will create any regressions in SVM. Most of
> the changes move SVM-specific code into HVM, and I didn't see any obvious
> problems there. I do have a concern about patch 5 since I am not sure I fully
> understand whether the new algorithm (in __scale_tsc()) is equivalent to
> the current SVM code. I think you also had questions about that.
> 
> Having said this, the fact that this patch (and patch 9) fix bugs leads me
> to believe this feature may not have been thoroughly tested.
> 
> I don't have a pair of appropriate AMD systems to test this series with
> migration (which is where this can be verified). Aravind, can you find
> something and see how this works?
> 
> -boris
>

Thanks, Boris, and thanks to anyone who can help test this series on AMD
systems!

According to both specifications, VMX TSC scaling and SVM TSC ratio are
intended to address the same problem and take quite similar approaches.
Thus, when I originally started preparing this patchset, before I lifted
out the common code, I simply copied the corresponding SVM TSC ratio code
to VMX, made only the necessary adaptations, and assumed all other
hardware-independent code was correct.

Haozhong

> 
> >
> >>--- a/xen/arch/x86/hvm/hvm.c
> >>+++ b/xen/arch/x86/hvm/hvm.c
> >>@@ -388,13 +388,12 @@ void hvm_set_guest_tsc_fixed(struct vcpu *v, u64 guest_tsc, u64 at_tsc)
> >>          tsc = hvm_get_guest_time_fixed(v, at_tsc);
> >>          tsc = gtime_to_gtsc(v->domain, tsc);
> >>      }
> >>-    else if ( at_tsc )
> >>-    {
> >>-        tsc = at_tsc;
> >>-    }
> >>      else
> >>      {
> >>-        tsc = rdtsc();
> >>+        tsc = at_tsc ? at_tsc : rdtsc();
> >In cases like this please prefer the gcc extension allowing the middle
> >operand of the ?: to be omitted.
> >
> >Jan
> >
> 

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 04/13] x86/hvm: Setup TSC scaling ratio
  2015-10-22 16:05       ` Jan Beulich
@ 2015-10-22 16:39         ` Haozhong Zhang
  0 siblings, 0 replies; 117+ messages in thread
From: Haozhong Zhang @ 2015-10-22 16:39 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Ian Jackson, xen-devel,
	Aravind Gopalakrishnan, Suravee Suthikulpanit, Keir Fraser,
	Boris Ostrovsky

On Thu, Oct 22, 2015 at 10:05:39AM -0600, Jan Beulich wrote:
> >>> On 22.10.15 at 17:55, <haozhong.zhang@intel.com> wrote:
> > On Thu, Oct 22, 2015 at 07:13:07AM -0600, Jan Beulich wrote:
> >> >>> On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
> >> > +    {
> >> > +        printk(XENLOG_WARNING
> >> > +               "Invalid TSC scaling ratio - virtual tsc khz=%lu\n",
> >> > +               khz);
> >> 
> >> Who can issue a call to this function under which conditions? I.e. is
> >> a non-ratelimited printk() okay here? Plus, without identifying the
> >> subject vcpu I don't think the message is of much use beyond your
> >> initial debugging purposes.
> > 
> > hvm_load_cpu_ctxt(), hvm_vcpu_reset_state() and tsc_set_info() call
> > this function. Am I correct that those functions are not called at a
> > high rate? But I agree that the warning is useless w/o identifying the
> > subject vcpu and will add it.
> 
> The question isn't whether they're normally called at a high rate,
> but whether a malicious entity could make it so.
>

tsc_set_info() can be called through XEN_DOMCTL_settscinfo by the
toolstack, so a malicious entity in the toolstack (if any) could cause
a problem.

So I will remove the message here.

> >> And what's the
> >> reason for the dependency on !vtsc (please also see the comment
> >> ahead of tsc_set_info())?
> > 
> > If v->domain->arch.vtsc == 1, guest rdtsc/rdtscp is trapped (setup in
> > tsc_set_info()) and emulated by hypervisor and the hardware TSC
> > scaling is not used in this case.
> 
> But there wouldn't be any harm, I suppose?
>

Right, it should be harmless.

> >> > @@ -1981,8 +1983,14 @@ void tsc_set_info(struct domain *d,
> >> >      if ( is_hvm_domain(d) )
> >> >      {
> >> >          hvm_set_rdtsc_exiting(d, d->arch.vtsc);
> >> > -        if ( d->vcpu && d->vcpu[0] && incarnation == 0 )
> >> > +        if ( d->vcpu && d->vcpu[0] )
> >> >          {
> >> > +            if ( !d->arch.vtsc && hvm_funcs.tsc_scaling_supported )
> >> > +                hvm_setup_tsc_scaling(d->vcpu[0]);
> >> 
> >> And what about the other vCPU-s? If you mean this to be along
> >> the lines of the code that follows here, you should put this after
> >> the comment explaining that.
> >>
> > 
> > TSC scaling for the other vcpus is set up in hvm_vcpu_reset_state(). But
> > I'm not sure it can be moved together with the following code because of
> > the subsequent
> >      if ( incarnation )
> >          return;
> > 
> > incarnation != 0 after migration, yet the setup of TSC scaling is
> > still necessary then.
> 
> It seems to me that you could move _all_ your additions to the if()
> body past the comment (perhaps adjusting that one as needed).
>

Good suggestion! I'll move and adapt the comment to the beginning of the if-body.

> Jan
> 

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 09/13] x86/time.c: Scale host TSC in pvclock properly
  2015-10-22 15:50   ` Boris Ostrovsky
@ 2015-10-22 16:44     ` Haozhong Zhang
  2015-10-22 19:15       ` Boris Ostrovsky
  0 siblings, 1 reply; 117+ messages in thread
From: Haozhong Zhang @ 2015-10-22 16:44 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Ian Jackson, xen-devel,
	Aravind Gopalakrishnan, Jan Beulich, Keir Fraser,
	Suravee Suthikulpanit

On Thu, Oct 22, 2015 at 11:50:18AM -0400, Boris Ostrovsky wrote:
> On 09/28/2015 03:13 AM, Haozhong Zhang wrote:
> >This patch makes the pvclock return the scaled host TSC and
> >corresponding scaling parameters to HVM domains if guest TSC is not
> >emulated and TSC scaling is enabled.
> >
> >Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> >---
> >  xen/arch/x86/time.c | 15 ++++++++++++---
> >  1 file changed, 12 insertions(+), 3 deletions(-)
> >
> >diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
> >index 4b5402c..54eab6e 100644
> >--- a/xen/arch/x86/time.c
> >+++ b/xen/arch/x86/time.c
> >@@ -832,10 +832,19 @@ static void __update_vcpu_system_time(struct vcpu *v, int force)
> >      }
> >      else
> >      {
> >-        _u.tsc_timestamp     = t->local_tsc_stamp;
> >+        if ( is_hvm_domain(d) && hvm_funcs.tsc_scaling_supported )
> >+        {
> >+            _u.tsc_timestamp     = hvm_scale_tsc(v, t->local_tsc_stamp);
> >+            _u.tsc_to_system_mul = d->arch.vtsc_to_ns.mul_frac;
> >+            _u.tsc_shift         = d->arch.vtsc_to_ns.shift;
> >+        }
> >+        else
> >+        {
> >+            _u.tsc_timestamp     = t->local_tsc_stamp;
> >+            _u.tsc_to_system_mul = t->tsc_scale.mul_frac;
> >+            _u.tsc_shift         = (s8)t->tsc_scale.shift;
> >+        }
> >          _u.system_time       = t->stime_local_stamp;
> >-        _u.tsc_to_system_mul = t->tsc_scale.mul_frac;
> >-        _u.tsc_shift         = (s8)t->tsc_scale.shift;
> >      }
> >      if ( is_hvm_domain(d) )
> >          _u.tsc_timestamp += v->arch.hvm_vcpu.cache_tsc_offset;
             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	     the offset is subtract here
> 
> So this is not directly related to this series but when we calculate
> tsc_timestamp --- shouldn't we subtract TSC offset? Otherwise we are
> reporting (possibly scaled) host's TSC and this is supposed to be guest's
> counter.
> 
> 
> -boris
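
As background for the exchange above, here is a minimal sketch of how
a guest turns a TSC read into system time from the published fields,
following the usual pvclock conversion. The struct and function names
(time_info, guest_system_time) are made up for this example; only the
field names mirror the _u fields in the hunk. It shows why
tsc_timestamp has to be expressed in guest TSC units (scaled, with the
offset applied): the guest subtracts it from its own rdtsc value.

```c
#include <stdint.h>

/* Hypothetical subset of the per-vcpu time record read by the guest. */
struct time_info {
    uint64_t tsc_timestamp;      /* guest TSC at the last update */
    uint64_t system_time;        /* ns at the last update */
    uint32_t tsc_to_system_mul;  /* 32-bit binary fraction */
    int8_t   tsc_shift;          /* pre-shift applied to the TSC delta */
};

static uint64_t guest_system_time(const struct time_info *u,
                                  uint64_t guest_tsc)
{
    uint64_t delta = guest_tsc - u->tsc_timestamp;

    /* Positive shift scales the delta up, negative shift scales it down. */
    if ( u->tsc_shift >= 0 )
        delta <<= u->tsc_shift;
    else
        delta >>= -u->tsc_shift;

    /* Multiply by the 0.32 fixed-point factor using a 128-bit product. */
    return u->system_time +
           (uint64_t)(((unsigned __int128)delta * u->tsc_to_system_mul) >> 32);
}
```

The unsigned __int128 type is a gcc/clang extension on 64-bit targets.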

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 11/13] vmx: Use scaled host TSC to calculate TSC offset
  2015-10-22 15:55   ` Boris Ostrovsky
@ 2015-10-22 17:12     ` Haozhong Zhang
  2015-10-22 19:19       ` Boris Ostrovsky
  0 siblings, 1 reply; 117+ messages in thread
From: Haozhong Zhang @ 2015-10-22 17:12 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Ian Jackson, xen-devel, Jan Beulich,
	Aravind Gopalakrishnan, Jun Nakajima, Keir Fraser,
	Suravee Suthikulpanit

On Thu, Oct 22, 2015 at 11:55:00AM -0400, Boris Ostrovsky wrote:
> On 09/28/2015 03:13 AM, Haozhong Zhang wrote:
> >If VMX TSC scaling is enabled and no TSC emulation is used,
> >vmx_set_tsc_offset() will calculate the TSC offset by subtracting the
> >scaled host TSC from the current guest TSC.
> >
> >Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> >---
> >  xen/arch/x86/hvm/vmx/vmx.c | 15 +++++++++++++++
> >  1 file changed, 15 insertions(+)
> >
> >diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> >index 454440e..163974d 100644
> >--- a/xen/arch/x86/hvm/vmx/vmx.c
> >+++ b/xen/arch/x86/hvm/vmx/vmx.c
> >@@ -1102,11 +1102,26 @@ static void vmx_handle_cd(struct vcpu *v, unsigned long value)
> >  static void vmx_set_tsc_offset(struct vcpu *v, u64 offset, u64 at_tsc)
> >  {
> >+    uint64_t host_tsc, guest_tsc;
> >+    struct domain *d = v->domain;
> >+
> >+    guest_tsc = hvm_get_guest_tsc_fixed(v, at_tsc);
> >+
> >+    if ( cpu_has_vmx_tsc_scaling && !d->arch.vtsc )
> >+    {
> >+        host_tsc = at_tsc ? at_tsc : rdtsc();
> >+        offset = guest_tsc - hvm_scale_tsc(v, host_tsc);
> >+    }
> >+
> >      vmx_vmcs_enter(v);
> >+    if ( !nestedhvm_enabled(d) )
> >+        goto out;
> >+
> >      if ( nestedhvm_vcpu_in_guestmode(v) )
> >          offset += nvmx_get_tsc_offset(v);
> >+out:
> >      __vmwrite(TSC_OFFSET, offset);
> >      vmx_vmcs_exit(v);
> >  }
> 
> 
> This (and corresponding SVM code) looks somewhat suspect to me: if the
> processor supports scaling we are ignoring caller-provided offset.
>

Yes, if TSC scaling is available, [svm|vmx]_set_tsc_offset() does not
trust the offset passed by callers and calculates a scaled version
itself instead.

The original svm_set_tsc_offset() did so and I kept its semantics
here.

Maybe moving the scaling logic out of [svm|vmx]_set_tsc_offset() could
make the semantics clearer.

> Besides, at least when called from hvm_set_guest_tsc_fixed() --- we've
> already taken scaling into account, that's what patch 6 does, doesn't it?
>

Yes.

> 
> -boris
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 09/13] x86/time.c: Scale host TSC in pvclock properly
  2015-10-22 16:44     ` Haozhong Zhang
@ 2015-10-22 19:15       ` Boris Ostrovsky
  0 siblings, 0 replies; 117+ messages in thread
From: Boris Ostrovsky @ 2015-10-22 19:15 UTC (permalink / raw)
  To: xen-devel, Ian Jackson, Stefano Stabellini, Ian Campbell,
	Wei Liu, Keir Fraser, Jan Beulich, Andrew Cooper,
	Suravee Suthikulpanit, Aravind Gopalakrishnan, Jun Nakajima,
	Kevin Tian

On 10/22/2015 12:44 PM, Haozhong Zhang wrote:
> On Thu, Oct 22, 2015 at 11:50:18AM -0400, Boris Ostrovsky wrote:
>> On 09/28/2015 03:13 AM, Haozhong Zhang wrote:
>>> This patch makes the pvclock return the scaled host TSC and
>>> corresponding scaling parameters to HVM domains if guest TSC is not
>>> emulated and TSC scaling is enabled.
>>>
>>> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
>>> ---
>>>   xen/arch/x86/time.c | 15 ++++++++++++---
>>>   1 file changed, 12 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
>>> index 4b5402c..54eab6e 100644
>>> --- a/xen/arch/x86/time.c
>>> +++ b/xen/arch/x86/time.c
>>> @@ -832,10 +832,19 @@ static void __update_vcpu_system_time(struct vcpu *v, int force)
>>>       }
>>>       else
>>>       {
>>> -        _u.tsc_timestamp     = t->local_tsc_stamp;
>>> +        if ( is_hvm_domain(d) && hvm_funcs.tsc_scaling_supported )
>>> +        {
>>> +            _u.tsc_timestamp     = hvm_scale_tsc(v, t->local_tsc_stamp);
>>> +            _u.tsc_to_system_mul = d->arch.vtsc_to_ns.mul_frac;
>>> +            _u.tsc_shift         = d->arch.vtsc_to_ns.shift;
>>> +        }
>>> +        else
>>> +        {
>>> +            _u.tsc_timestamp     = t->local_tsc_stamp;
>>> +            _u.tsc_to_system_mul = t->tsc_scale.mul_frac;
>>> +            _u.tsc_shift         = (s8)t->tsc_scale.shift;
>>> +        }
>>>           _u.system_time       = t->stime_local_stamp;
>>> -        _u.tsc_to_system_mul = t->tsc_scale.mul_frac;
>>> -        _u.tsc_shift         = (s8)t->tsc_scale.shift;
>>>       }
>>>       if ( is_hvm_domain(d) )
>>>           _u.tsc_timestamp += v->arch.hvm_vcpu.cache_tsc_offset;
>               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 	     the offset is subtract here

Ah, I missed this. Thanks.

-boris


>> So this is not directly related to this series but when we calculate
>> tsc_timestamp --- shouldn't we subtract TSC offset? Otherwise we are
>> reporting (possibly scaled) host's TSC and this is supposed to be guest's
>> counter.
>>
>>
>> -boris

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 11/13] vmx: Use scaled host TSC to calculate TSC offset
  2015-10-22 17:12     ` Haozhong Zhang
@ 2015-10-22 19:19       ` Boris Ostrovsky
  2015-10-23  0:52         ` Haozhong Zhang
  0 siblings, 1 reply; 117+ messages in thread
From: Boris Ostrovsky @ 2015-10-22 19:19 UTC (permalink / raw)
  To: xen-devel, Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Ian Jackson, Aravind Gopalakrishnan,
	Jan Beulich, Keir Fraser, Suravee Suthikulpanit

On 10/22/2015 01:12 PM, Haozhong Zhang wrote:
> On Thu, Oct 22, 2015 at 11:55:00AM -0400, Boris Ostrovsky wrote:
>> On 09/28/2015 03:13 AM, Haozhong Zhang wrote:
>>> If VMX TSC scaling is enabled and no TSC emulation is used,
>>> vmx_set_tsc_offset() will calculate the TSC offset by subtracting the
>>> scaled host TSC from the current guest TSC.
>>>
>>> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
>>> ---
>>>   xen/arch/x86/hvm/vmx/vmx.c | 15 +++++++++++++++
>>>   1 file changed, 15 insertions(+)
>>>
>>> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
>>> index 454440e..163974d 100644
>>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>>> @@ -1102,11 +1102,26 @@ static void vmx_handle_cd(struct vcpu *v, unsigned long value)
>>>   static void vmx_set_tsc_offset(struct vcpu *v, u64 offset, u64 at_tsc)
>>>   {
>>> +    uint64_t host_tsc, guest_tsc;
>>> +    struct domain *d = v->domain;
>>> +
>>> +    guest_tsc = hvm_get_guest_tsc_fixed(v, at_tsc);
>>> +
>>> +    if ( cpu_has_vmx_tsc_scaling && !d->arch.vtsc )
>>> +    {
>>> +        host_tsc = at_tsc ? at_tsc : rdtsc();
>>> +        offset = guest_tsc - hvm_scale_tsc(v, host_tsc);
>>> +    }
>>> +
>>>       vmx_vmcs_enter(v);
>>> +    if ( !nestedhvm_enabled(d) )
>>> +        goto out;
>>> +
>>>       if ( nestedhvm_vcpu_in_guestmode(v) )
>>>           offset += nvmx_get_tsc_offset(v);
>>> +out:
>>>       __vmwrite(TSC_OFFSET, offset);
>>>       vmx_vmcs_exit(v);
>>>   }
>>
>> This (and corresponding SVM code) looks somewhat suspect to me: if the
>> processor supports scaling we are ignoring caller-provided offset.
>>
> Yes, if TSC scaling is available, [svm|vmx]_set_tsc_offset() does not
> trust the offset passed by callers and calculates a scaled version
> itself instead.
>
> The original svm_set_tsc_offset() did so and I kept its semantics
> here.
>
> Maybe moving the scaling logic out of [svm|vmx]_set_tsc_offset() could
> make the semantics clearer.


Yes: since scaling is now architectural, offsets are (or at least can
be) calculated correctly at HVM level and so arch-specific handlers
should not have to do this. In the case of hvm_set_guest_tsc_fixed() we
are definitely doing it twice.

-boris

>
>> Besides, at least when called from hvm_set_guest_tsc_fixed() --- we've
>> already taken scaling into account, that's what patch 6 does, doesn't it?
>>
> Yes.
>
>> -boris

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 05/13] x86/hvm: Replace architecture TSC scaling by a common function
  2015-10-22 13:52   ` Jan Beulich
@ 2015-10-23  0:49     ` Haozhong Zhang
  0 siblings, 0 replies; 117+ messages in thread
From: Haozhong Zhang @ 2015-10-23  0:49 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Ian Jackson, xen-devel,
	Aravind Gopalakrishnan, Suravee Suthikulpanit, Keir Fraser,
	Boris Ostrovsky

On Thu, Oct 22, 2015 at 07:52:08AM -0600, Jan Beulich wrote:
> >>> On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
> > --- a/xen/arch/x86/hvm/hvm.c
> > +++ b/xen/arch/x86/hvm/hvm.c
> > @@ -297,6 +297,59 @@ int hvm_set_guest_pat(struct vcpu *v, u64 guest_pat)
> >      return 1;
> >  }
> >  
> > +/*
> > + * Multiply tsc by a fixed point number represented by ratio.
> > + *
> > + * The most significant 64-N bits (mult) of ratio represent the
> > + * integral part of the fixed point number; the remaining N bits
> > + * (frac) represent the fractional part, ie. ratio represents a fixed
> > + * point number (mult + frac * 2^(-N)).
> > + *
> > + * N equals to hvm_funcs.tsc_scaling_ratio_frac_bits.
> > + */
> > +static u64 __scale_tsc(u64 tsc, u64 ratio)
> 
> No double underscores please without good reason.
>

Will rename it to scale_tsc().

> > +{
> > +    u64 mult, frac, mask, _tsc;
> 
> _tsc is not a valid name for a local variable.
> 
> > +    int width, nr;
> 
> Both unsigned afaict.
> 
> > +    BUG_ON(hvm_funcs.tsc_scaling_ratio_frac_bits >= 64);
> > +
> > +    mult  = ratio >> hvm_funcs.tsc_scaling_ratio_frac_bits;
> > +    mask  = (1ULL << hvm_funcs.tsc_scaling_ratio_frac_bits) - 1;
> > +    frac  = ratio & mask;
> > +
> > +    width = 64 - hvm_funcs.tsc_scaling_ratio_frac_bits;
> > +    mask  = (1ULL << width) - 1;
> > +    nr    = hvm_funcs.tsc_scaling_ratio_frac_bits;
> > +
> > +    _tsc  = tsc;
> > +    _tsc *= mult;
> > +    _tsc += (tsc >> hvm_funcs.tsc_scaling_ratio_frac_bits) * frac;
> > +
> > +    while ( nr >= width )
> > +    {
> > +        _tsc += (((tsc >> (nr - width)) & mask) * frac) >> (64 - nr);
> > +        nr   -= width;
> > +    }
> 
> Please add a comment explaining what this loop is intended to do.
>

I'm rewriting this function without this trick.
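
One possible shape for such a rewrite, shown only as a sketch and not
necessarily what the next version will contain: do the multiplication
in 128 bits and shift the product right by the number of fractional
bits. The three-argument signature and the use of unsigned __int128 (a
gcc/clang extension on 64-bit targets) are assumptions of this example;
assert() stands in for the BUG_ON() in the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Multiply tsc by a fixed-point ratio that has frac_bits fractional
 * bits, keeping the low 64 bits of the integer part of the product.
 */
static uint64_t scale_tsc(uint64_t tsc, uint64_t ratio,
                          unsigned int frac_bits)
{
    assert(frac_bits < 64);

    /* The full 128-bit product avoids any piecewise accumulation loop. */
    return (uint64_t)(((unsigned __int128)tsc * ratio) >> frac_bits);
}
```

For example, scaling 1000 by 1.5 with 48 fractional bits
(ratio = 3ULL << 47) gives 1500, and the result is truncated toward
zero when the product has a fractional part.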

> > +u64 hvm_scale_tsc(struct vcpu *v, u64 tsc)
> 
> const struct vcpu *v
> 
> > +{
> > +    u64 _tsc = tsc;
> 
> Here I don't even see the need for this misnamed variable.
>

Will remove this variable.

> Jan
> 

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 11/13] vmx: Use scaled host TSC to calculate TSC offset
  2015-10-22 19:19       ` Boris Ostrovsky
@ 2015-10-23  0:52         ` Haozhong Zhang
  0 siblings, 0 replies; 117+ messages in thread
From: Haozhong Zhang @ 2015-10-23  0:52 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Ian Jackson, xen-devel, Jan Beulich,
	Aravind Gopalakrishnan, Jun Nakajima, Keir Fraser,
	Suravee Suthikulpanit

On Thu, Oct 22, 2015 at 03:19:19PM -0400, Boris Ostrovsky wrote:
> On 10/22/2015 01:12 PM, Haozhong Zhang wrote:
> >On Thu, Oct 22, 2015 at 11:55:00AM -0400, Boris Ostrovsky wrote:
> >>On 09/28/2015 03:13 AM, Haozhong Zhang wrote:
> >>>If VMX TSC scaling is enabled and no TSC emulation is used,
> >>>vmx_set_tsc_offset() will calculate the TSC offset by subtracting the
> >>>scaled host TSC from the current guest TSC.
> >>>
> >>>Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> >>>---
> >>>  xen/arch/x86/hvm/vmx/vmx.c | 15 +++++++++++++++
> >>>  1 file changed, 15 insertions(+)
> >>>
> >>>diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> >>>index 454440e..163974d 100644
> >>>--- a/xen/arch/x86/hvm/vmx/vmx.c
> >>>+++ b/xen/arch/x86/hvm/vmx/vmx.c
> >>>@@ -1102,11 +1102,26 @@ static void vmx_handle_cd(struct vcpu *v, unsigned long value)
> >>>  static void vmx_set_tsc_offset(struct vcpu *v, u64 offset, u64 at_tsc)
> >>>  {
> >>>+    uint64_t host_tsc, guest_tsc;
> >>>+    struct domain *d = v->domain;
> >>>+
> >>>+    guest_tsc = hvm_get_guest_tsc_fixed(v, at_tsc);
> >>>+
> >>>+    if ( cpu_has_vmx_tsc_scaling && !d->arch.vtsc )
> >>>+    {
> >>>+        host_tsc = at_tsc ? at_tsc : rdtsc();
> >>>+        offset = guest_tsc - hvm_scale_tsc(v, host_tsc);
> >>>+    }
> >>>+
> >>>      vmx_vmcs_enter(v);
> >>>+    if ( !nestedhvm_enabled(d) )
> >>>+        goto out;
> >>>+
> >>>      if ( nestedhvm_vcpu_in_guestmode(v) )
> >>>          offset += nvmx_get_tsc_offset(v);
> >>>+out:
> >>>      __vmwrite(TSC_OFFSET, offset);
> >>>      vmx_vmcs_exit(v);
> >>>  }
> >>
> >>This (and corresponding SVM code) looks somewhat suspect to me: if the
> >>processor supports scaling we are ignoring caller-provided offset.
> >>
> >Yes, if TSC scaling is available, [svm|vmx]_set_tsc_offset() does not
> >trust the offset passed by callers and calculates a scaled version
> >itself instead.
> >
> >The original svm_set_tsc_offset() did so and I kept its semantics
> >here.
> >
> >Maybe moving the scaling logic out of [svm|vmx]_set_tsc_offset() could
> >make the semantics clearer.
> 
> 
> Yes: since scaling is now architectural, offsets are (or at least can be)
> calculated correctly at HVM level and so arch-specific handlers should not
> have to do this. In the case of hvm_set_guest_tsc_fixed() we are definitely
> doing it twice.
> 
> -boris
>

I'll move the scaling logic out of [svm|vmx]_set_tsc_offset() in the
next version, so that the offset is computed once at the HVM level.
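
To make the intended relationship explicit, here is a rough sketch of
the arithmetic only (not of the actual Xen call chain): the guest sees
scale(host_tsc) + offset, so the offset for a desired guest TSC is
guest_tsc - scale(host_tsc), computed modulo 2^64. All function names
below are hypothetical, and the 128-bit multiply is just one way to
implement the scaling.

```c
#include <stdint.h>

/* Fixed-point multiply; frac_bits must be below 64. */
static uint64_t scale_tsc(uint64_t tsc, uint64_t ratio,
                          unsigned int frac_bits)
{
    return (uint64_t)(((unsigned __int128)tsc * ratio) >> frac_bits);
}

/* Offset that makes the guest observe guest_tsc when the host reads host_tsc. */
static uint64_t tsc_offset_for(uint64_t guest_tsc, uint64_t host_tsc,
                               uint64_t ratio, unsigned int frac_bits)
{
    /* Unsigned subtraction wraps, which is what the hardware add undoes. */
    return guest_tsc - scale_tsc(host_tsc, ratio, frac_bits);
}

/* What the guest would read at a later host TSC value. */
static uint64_t guest_tsc_at(uint64_t host_tsc, uint64_t ratio,
                             unsigned int frac_bits, uint64_t offset)
{
    return scale_tsc(host_tsc, ratio, frac_bits) + offset;
}
```

If the arch handler scaled again on top of an offset computed this way,
the host TSC would effectively be scaled twice, which is the double
scaling noted above for hvm_set_guest_tsc_fixed().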

> >
> >>Besides, at least when called from hvm_set_guest_tsc_fixed() --- we've
> >>already taken scaling into account, that's what patch 6 does, doesn't it?
> >>
> >Yes.
> >
> >>-boris

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 04/13] x86/hvm: Setup TSC scaling ratio
  2015-10-22 13:13   ` Jan Beulich
  2015-10-22 15:55     ` Haozhong Zhang
@ 2015-10-23  7:44     ` Haozhong Zhang
  2015-10-23  7:59       ` Jan Beulich
  1 sibling, 1 reply; 117+ messages in thread
From: Haozhong Zhang @ 2015-10-23  7:44 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Ian Jackson, xen-devel,
	Aravind Gopalakrishnan, Suravee Suthikulpanit, Keir Fraser,
	Boris Ostrovsky

On Thu, Oct 22, 2015 at 07:13:07AM -0600, Jan Beulich wrote:
> >>> On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
> > This patch adds a field tsc_scaling_ratio in struct arch_vcpu to
> 
> Why not in struct hvm_vcpu? Are you intending any use for PV guests?
> 
> > --- a/xen/arch/x86/hvm/hvm.c
> > +++ b/xen/arch/x86/hvm/hvm.c
> > @@ -297,6 +297,34 @@ int hvm_set_guest_pat(struct vcpu *v, u64 guest_pat)
> >      return 1;
> >  }
> >  
> > +void hvm_setup_tsc_scaling(struct vcpu *v)
> > +{
> > +    u64 ratio, khz;
> > +	s8 shift;
> 
> Hard tab.
> 
> > +    if ( !hvm_funcs.tsc_scaling_supported )
> > +        return;
> > +
> > +    khz = v->domain->arch.tsc_khz;
> 
> I don't see the need for this variable in the first place. But if you
> absolutely want to keep it, I don't see why it needs to be u64
> when the field you load from is uint32_t.
> 
> > +    shift = (hvm_funcs.tsc_scaling_ratio_frac_bits <= 32) ?
> > +        hvm_funcs.tsc_scaling_ratio_frac_bits : 32;
> 
> min()
> 
> > +    ratio = khz << shift;
> > +    do_div(ratio, cpu_khz);
> > +    ratio <<= hvm_funcs.tsc_scaling_ratio_frac_bits - shift;
> > +
> > +    if ( ratio == 0 ||
> > +         ratio > hvm_funcs.max_tsc_scaling_ratio ||
> > +         ratio & hvm_funcs.tsc_scaling_ratio_rsvd )
> 
> Parentheses around the operands of the & please.
> 
> > +    {
> > +        printk(XENLOG_WARNING
> > +               "Invalid TSC scaling ratio - virtual tsc khz=%lu\n",
> > +               khz);
> 
> Who can issue a call to this function under which conditions? I.e. is
> a non-ratelimited printk() okay here? Plus, without identifying the
> subject vcpu I don't think the message is of much use beyond your
> initial debugging purposes.
> 
> > @@ -2023,6 +2051,9 @@ static int hvm_load_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
> >      if ( hvm_funcs.load_cpu_ctxt(v, &ctxt) < 0 )
> >          return -EINVAL;
> >  
> > +    if ( !v->domain->arch.vtsc && hvm_funcs.tsc_scaling_supported )
> > +        hvm_setup_tsc_scaling(v);
> 
> What's the rationale for putting it in this function?

hvm_load_cpu_ctxt() is called during migration to restore the vcpu's
state, including TSC-related things, so hvm_setup_tsc_scaling() is
called here.

hvm_vcpu_reset_state() is not called during migration, so we cannot
rely on the call to hvm_setup_tsc_scaling() there.

Haozhong

> And what's the
> reason for the dependency on !vtsc (please also see the comment
> ahead of tsc_set_info())?
> 
> > --- a/xen/arch/x86/time.c
> > +++ b/xen/arch/x86/time.c
> > @@ -1956,6 +1956,8 @@ void tsc_set_info(struct domain *d,
> >          {
> >      case TSC_MODE_NEVER_EMULATE:
> >              d->arch.vtsc = 0;
> > +            if ( tsc_mode == TSC_MODE_NEVER_EMULATE )
> > +                d->arch.tsc_khz = cpu_khz;
> >              break;
> >          }
> 
> Depending on the changes to the first two patches: If this change
> would remain like this, please move out the TSC_MODE_NEVER_EMULATE
> case to be a standalone one again, since the way you do it here
> looks pretty confusing/odd.
> 
> > @@ -1981,8 +1983,14 @@ void tsc_set_info(struct domain *d,
> >      if ( is_hvm_domain(d) )
> >      {
> >          hvm_set_rdtsc_exiting(d, d->arch.vtsc);
> > -        if ( d->vcpu && d->vcpu[0] && incarnation == 0 )
> > +        if ( d->vcpu && d->vcpu[0] )
> >          {
> > +            if ( !d->arch.vtsc && hvm_funcs.tsc_scaling_supported )
> > +                hvm_setup_tsc_scaling(d->vcpu[0]);
> 
> And what about the other vCPU-s? If you mean this to be along
> the lines of the code that follows here, you should put this after
> the comment explaining that.
> 
> Jan
> 

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 04/13] x86/hvm: Setup TSC scaling ratio
  2015-10-23  7:44     ` Haozhong Zhang
@ 2015-10-23  7:59       ` Jan Beulich
  2015-10-23  8:18         ` Haozhong Zhang
  0 siblings, 1 reply; 117+ messages in thread
From: Jan Beulich @ 2015-10-23  7:59 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Ian Jackson, xen-devel,
	Aravind Gopalakrishnan, Suravee Suthikulpanit, Keir Fraser,
	Boris Ostrovsky

>>> On 23.10.15 at 09:44, <haozhong.zhang@intel.com> wrote:
> On Thu, Oct 22, 2015 at 07:13:07AM -0600, Jan Beulich wrote:
>> >>> On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:

Please remember to trim your replies.

>> > @@ -2023,6 +2051,9 @@ static int hvm_load_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
>> >      if ( hvm_funcs.load_cpu_ctxt(v, &ctxt) < 0 )
>> >          return -EINVAL;
>> >  
>> > +    if ( !v->domain->arch.vtsc && hvm_funcs.tsc_scaling_supported )
>> > +        hvm_setup_tsc_scaling(v);
>> 
>> What's the rationale for putting it in this function?
> 
> hvm_load_cpu_ctxt() is called in the migration to restore vcpu's state
> including TSC related things, so hvm_setup_tsc_scaling() is called
> here.
> 
> hvm_vcpu_reset_state() is not called in the migration, so we cannot
> rely on the call to hvm_setup_tsc_scaling() there.

All that is understood, but doesn't explain why the scaling setup gets
done here instead of somewhere after _all_ state got loaded.

Jan

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 04/13] x86/hvm: Setup TSC scaling ratio
  2015-10-23  7:59       ` Jan Beulich
@ 2015-10-23  8:18         ` Haozhong Zhang
  2015-10-23  8:31           ` Jan Beulich
  0 siblings, 1 reply; 117+ messages in thread
From: Haozhong Zhang @ 2015-10-23  8:18 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Ian Jackson, xen-devel,
	Aravind Gopalakrishnan, Suravee Suthikulpanit, Keir Fraser,
	Boris Ostrovsky

On Fri, Oct 23, 2015 at 01:59:46AM -0600, Jan Beulich wrote:
> >>> On 23.10.15 at 09:44, <haozhong.zhang@intel.com> wrote:
> > On Thu, Oct 22, 2015 at 07:13:07AM -0600, Jan Beulich wrote:
> >> >>> On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
> 
> Please remember to trim your replies.
> 
> >> > @@ -2023,6 +2051,9 @@ static int hvm_load_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
> >> >      if ( hvm_funcs.load_cpu_ctxt(v, &ctxt) < 0 )
> >> >          return -EINVAL;
> >> >  
> >> > +    if ( !v->domain->arch.vtsc && hvm_funcs.tsc_scaling_supported )
> >> > +        hvm_setup_tsc_scaling(v);
> >> 
> >> What's the rationale for putting it in this function?
> > 
> > hvm_load_cpu_ctxt() is called in the migration to restore vcpu's state
> > including TSC related things, so hvm_setup_tsc_scaling() is called
> > here.
> > 
> > hvm_vcpu_reset_state() is not called in the migration, so we cannot
> > rely on the call to hvm_setup_tsc_scaling() there.
> 
> All that is understood, but doesn't explain why the scaling setup gets
> done here instead of somewhere after _all_ state got loaded.
>
> Jan
> 

Because the vCPU is woken up at the end of hvm_vcpu_reset_state(), the
setup of TSC scaling should be done before that.

Haozhong

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 04/13] x86/hvm: Setup TSC scaling ratio
  2015-10-23  8:18         ` Haozhong Zhang
@ 2015-10-23  8:31           ` Jan Beulich
  2015-10-23  8:40             ` Haozhong Zhang
  0 siblings, 1 reply; 117+ messages in thread
From: Jan Beulich @ 2015-10-23  8:31 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Ian Jackson, xen-devel,
	Aravind Gopalakrishnan, Suravee Suthikulpanit, Keir Fraser,
	Boris Ostrovsky

>>> On 23.10.15 at 10:18, <haozhong.zhang@intel.com> wrote:
> On Fri, Oct 23, 2015 at 01:59:46AM -0600, Jan Beulich wrote:
>> >>> On 23.10.15 at 09:44, <haozhong.zhang@intel.com> wrote:
>> > On Thu, Oct 22, 2015 at 07:13:07AM -0600, Jan Beulich wrote:
>> >> >>> On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
>> 
>> Please remember to trim your replies.
>> 
>> >> > @@ -2023,6 +2051,9 @@ static int hvm_load_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
>> >> >      if ( hvm_funcs.load_cpu_ctxt(v, &ctxt) < 0 )
>> >> >          return -EINVAL;
>> >> >  
>> >> > +    if ( !v->domain->arch.vtsc && hvm_funcs.tsc_scaling_supported )
>> >> > +        hvm_setup_tsc_scaling(v);
>> >> 
>> >> What's the rationale for putting it in this function?
>> > 
>> > hvm_load_cpu_ctxt() is called in the migration to restore vcpu's state
>> > including TSC related things, so hvm_setup_tsc_scaling() is called
>> > here.
>> > 
>> > hvm_vcpu_reset_state() is not called in the migration, so we cannot
>> > rely on the call to hvm_setup_tsc_scaling() there.
>> 
>> All that is understood, but doesn't explain why the scaling setup gets
>> done here instead of somewhere after _all_ state got loaded.
> 
> Because the vCPU is woken up at the end of hvm_vcpu_reset_state(), the
> setup of TSC scaling should be done before that.

But we're talking about hvm_load_cpu_ctxt() here.

Jan

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 04/13] x86/hvm: Setup TSC scaling ratio
  2015-10-23  8:31           ` Jan Beulich
@ 2015-10-23  8:40             ` Haozhong Zhang
  2015-10-23  9:18               ` Jan Beulich
  0 siblings, 1 reply; 117+ messages in thread
From: Haozhong Zhang @ 2015-10-23  8:40 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Ian Jackson, xen-devel,
	Aravind Gopalakrishnan, Suravee Suthikulpanit, Keir Fraser,
	Boris Ostrovsky

On Fri, Oct 23, 2015 at 02:31:11AM -0600, Jan Beulich wrote:
> >>> On 23.10.15 at 10:18, <haozhong.zhang@intel.com> wrote:
> > On Fri, Oct 23, 2015 at 01:59:46AM -0600, Jan Beulich wrote:
> >> >>> On 23.10.15 at 09:44, <haozhong.zhang@intel.com> wrote:
> >> > On Thu, Oct 22, 2015 at 07:13:07AM -0600, Jan Beulich wrote:
> >> >> >>> On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
> >> 
> >> Please remember to trim your replies.
> >> 
> >> >> > @@ -2023,6 +2051,9 @@ static int hvm_load_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
> >> >> >      if ( hvm_funcs.load_cpu_ctxt(v, &ctxt) < 0 )
> >> >> >          return -EINVAL;
> >> >> >  
> >> >> > +    if ( !v->domain->arch.vtsc && hvm_funcs.tsc_scaling_supported )
> >> >> > +        hvm_setup_tsc_scaling(v);
> >> >> 
> >> >> What's the rationale for putting it in this function?
> >> > 
> >> > hvm_load_cpu_ctxt() is called in the migration to restore vcpu's state
> >> > including TSC related things, so hvm_setup_tsc_scaling() is called
> >> > here.
> >> > 
> >> > hvm_vcpu_reset_state() is not called in the migration, so we cannot
> >> > rely on the call to hvm_setup_tsc_scaling() there.
> >> 
> >> All that is understood, but doesn't explain why the scaling setup gets
> >> done here instead of somewhere after _all_ state got loaded.
> > 
> > Because the vCPU is woken up at the end of hvm_vcpu_reset_state(), the
> > setup of TSC scaling should be done before that.
> 
> But we're talking about hvm_load_cpu_ctxt() here.
>

Sorry for the typo. s/hvm_vcpu_reset_state/hvm_load_cpu_ctxt

> Jan
> 

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 04/13] x86/hvm: Setup TSC scaling ratio
  2015-10-23  8:40             ` Haozhong Zhang
@ 2015-10-23  9:18               ` Jan Beulich
  0 siblings, 0 replies; 117+ messages in thread
From: Jan Beulich @ 2015-10-23  9:18 UTC (permalink / raw)
  To: Andrew Cooper, Haozhong Zhang
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Ian Jackson, xen-devel, Aravind Gopalakrishnan,
	Suravee Suthikulpanit, Keir Fraser, Boris Ostrovsky

>>> On 23.10.15 at 10:40, <haozhong.zhang@intel.com> wrote:
> On Fri, Oct 23, 2015 at 02:31:11AM -0600, Jan Beulich wrote:
>> >>> On 23.10.15 at 10:18, <haozhong.zhang@intel.com> wrote:
>> > On Fri, Oct 23, 2015 at 01:59:46AM -0600, Jan Beulich wrote:
>> >> >>> On 23.10.15 at 09:44, <haozhong.zhang@intel.com> wrote:
>> >> > On Thu, Oct 22, 2015 at 07:13:07AM -0600, Jan Beulich wrote:
>> >> >> >>> On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
>> >> 
>> >> Please remember to trim your replies.
>> >> 
>> >> >> > @@ -2023,6 +2051,9 @@ static int hvm_load_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
>> >> >> >      if ( hvm_funcs.load_cpu_ctxt(v, &ctxt) < 0 )
>> >> >> >          return -EINVAL;
>> >> >> >  
>> >> >> > +    if ( !v->domain->arch.vtsc && hvm_funcs.tsc_scaling_supported )
>> >> >> > +        hvm_setup_tsc_scaling(v);
>> >> >> 
>> >> >> What's the rationale for putting it in this function?
>> >> > 
>> >> > hvm_load_cpu_ctxt() is called in the migration to restore vcpu's state
>> >> > including TSC related things, so hvm_setup_tsc_scaling() is called
>> >> > here.
>> >> > 
>> >> > hvm_vcpu_reset_state() is not called in the migration, so we cannot
>> >> > rely on the call to hvm_setup_tsc_scaling() there.
>> >> 
>> >> All that is understood, but doesn't explain why the scaling setup gets
>> >> done here instead of somewhere after _all_ state got loaded.
>> > 
>> > Because the vCPU is woken up at the end of hvm_vcpu_reset_state(), the
>> > setup of TSC scaling should be done before that.
>> 
>> But we're talking about hvm_load_cpu_ctxt() here.
> 
> Sorry for the typo. s/hvm_vcpu_reset_state/hvm_load_cpu_ctxt

Hmm, interesting. I don't really understand why we do so, and I
don't see how this can be correct unless we rely on either something
else to keep the vCPU from running or this always being the last
restored item. Plus the commit that introduced this (89fdf2860a) only
talks about waking APs, yet I don't see any distinction between BP
and APs here.

Andrew - you probably know the restore logic best: Any thoughts?

Jan

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 06/13] x86/hvm: Scale host TSC when setting/getting guest TSC
  2015-10-22 14:17   ` Jan Beulich
  2015-10-22 15:44     ` Boris Ostrovsky
  2015-10-22 16:03     ` Haozhong Zhang
@ 2015-10-27  1:54     ` Haozhong Zhang
  2015-10-27  8:15       ` Jan Beulich
  2015-10-27  8:44     ` Haozhong Zhang
  3 siblings, 1 reply; 117+ messages in thread
From: Haozhong Zhang @ 2015-10-27  1:54 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Ian Jackson, xen-devel,
	Aravind Gopalakrishnan, Suravee Suthikulpanit, Keir Fraser,
	Boris Ostrovsky

On Thu, Oct 22, 2015 at 08:17:29AM -0600, Jan Beulich wrote:
> >>> On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
> > The existing hvm_set_guest_tsc_fixed() and hvm_get_guest_tsc_fixed()
> > calculate the guest TSC by adding the TSC offset to the host TSC. When
> > the TSC scaling is enabled, the host TSC should be scaled first. This
> > patch adds the scaling logic to those two functions.
> 
> Just like mentioned for the first two patches - I'd first of all like to
> understand why the lack of this scaling wasn't an issue for SVM so
> far. What you write reads plausible, but assuming that SVM TSC scaling
> code was tested, I'm hesitant to apply changes to it without
> understanding the details (or at least without SVM maintainers'
> consent).
>

The current SVM TSC ratio code does not seem correct w/o patch 6 (as
well as patch 2, but I only analyze patch 6 here). Following is the
explanation.

When SVM TSC ratio is used and the ratio is not 1,

1. The original hvm_get_guest_tsc_fixed(v, at_tsc) returns
     (at_tsc ? : rdtsc()) + v->arch.hvm_vcpu.cache_tsc_offset

   It's called in the following control flows:

   * hvm_msr_write_intercept(MSR_IA32_TSC_DEADLINE, msr_content, ...)
       vlapic_tdt_msr_set(..., msr_content)
         guest_tsc = hvm_get_guest_tsc(...)
           hvm_get_guest_tsc_fixed(..., 0)

   * hvm_save_cpu_ctxt()
       svm_save_vmcb_ctxt(v, ...)
         svm_save_cpu_state(v, data)
           data->tsc = hvm_get_guest_tsc_fixed(v, v->domain->arch.hvm_domain.sync_tsc)

   * svm_set_tsc_offset(v, offset, at_tsc)
       guest_tsc = hvm_get_guest_tsc_fixed(v, at_tsc)

   In all of the above control flows, hvm_get_guest_tsc_fixed() is
   expected to return the guest TSC. If its 2nd argument at_tsc is
   not zero, at_tsc is always a host TSC value. Thus,
   hvm_get_guest_tsc_fixed() should scale at_tsc or rdtsc() in order
   to get the correct guest TSC, but it doesn't.

2. In the original hvm_set_guest_tsc_fixed(v, guest_tsc, at_tsc),
     v->arch.hvm_vcpu.cache_tsc_offset = guest_tsc - (at_tsc ? : rdtsc())

   It's called in the following control flows:

   * hvm_set_guest_tsc(v, t)
       hvm_set_guest_tsc_fixed(v, t, 0)

   * hvm_load_cpu_ctxt()
       svm_load_vmcb_ctxt(v, ctxt)
         svm_load_cpu_state(v, ctxt)
           hvm_set_guest_tsc_fixed(v, ctxt->tsc, v->domain->arch.hvm_domain.sync_tsc)

   In all of the above control flows, if the 3rd argument at_tsc of
   hvm_set_guest_tsc_fixed() is not zero, it's always a host TSC
   value. In order to get the correct TSC offset,
   hvm_set_guest_tsc_fixed() should scale at_tsc or rdtsc(), but it
   doesn't.

So this patch 6 is necessary to fix hvm_[g|s]et_guest_tsc_fixed() when
TSC scaling is in use.
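
The two cases above can be illustrated with a small standalone sketch.
This is not the Xen code: scale_tsc(), get_guest_tsc(),
calc_tsc_offset() and the ratio format (fixed point with 32 fractional
bits) are hypothetical names and choices made only for illustration.

```c
#include <stdint.h>

/* Hypothetical fixed-point ratio: 32 fractional bits, so 1.0 == 1ULL << 32. */
#define RATIO_FRAC_BITS 32

/*
 * Scale a host TSC value by the ratio. The 128-bit intermediate
 * (a GCC/Clang extension) keeps the 64x64-bit product from overflowing.
 */
static uint64_t scale_tsc(uint64_t host_tsc, uint64_t ratio)
{
    return (uint64_t)(((unsigned __int128)host_tsc * ratio) >> RATIO_FRAC_BITS);
}

/* Case 1: guest TSC = scaled host TSC + offset (not host TSC + offset). */
static uint64_t get_guest_tsc(uint64_t host_tsc, uint64_t ratio,
                              uint64_t offset)
{
    return scale_tsc(host_tsc, ratio) + offset;
}

/* Case 2: offset = guest TSC - scaled host TSC (not guest TSC - host TSC). */
static uint64_t calc_tsc_offset(uint64_t guest_tsc, uint64_t host_tsc,
                                uint64_t ratio)
{
    return guest_tsc - scale_tsc(host_tsc, ratio);
}
```

With a ratio of 1.5 (3ULL << 31), a host TSC of 1000 scales to 1500, so
setting the guest TSC to 5000 at that moment needs an offset of 3500,
and reading back with the same ratio and offset returns 5000 again.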

Haozhong

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 06/13] x86/hvm: Scale host TSC when setting/getting guest TSC
  2015-10-27  1:54     ` Haozhong Zhang
@ 2015-10-27  8:15       ` Jan Beulich
  2015-10-27  8:25         ` Haozhong Zhang
  0 siblings, 1 reply; 117+ messages in thread
From: Jan Beulich @ 2015-10-27  8:15 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Ian Jackson, xen-devel,
	Aravind Gopalakrishnan, Suravee Suthikulpanit, Keir Fraser,
	Boris Ostrovsky

>>> On 27.10.15 at 02:54, <haozhong.zhang@intel.com> wrote:
> On Thu, Oct 22, 2015 at 08:17:29AM -0600, Jan Beulich wrote:
>> >>> On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
>> > The existing hvm_set_guest_tsc_fixed() and hvm_get_guest_tsc_fixed()
>> > calculate the guest TSC by adding the TSC offset to the host TSC. When
>> > the TSC scaling is enabled, the host TSC should be scaled first. This
>> > patch adds the scaling logic to those two functions.
>> 
>> Just like mentioned for the first two patches - I'd first of all like to
>> understand why the lack of this scaling wasn't an issue for SVM so
>> far. What you write reads plausible, but assuming that SVM TSC scaling
>> code was tested, I'm hesitant to apply changes to it without
>> understanding the details (or at least without SVM maintainers'
>> consent).
>>
> 
> The current SVM TSC ratio code does not seem correct w/o patch 6 (as
> well as patch 2, but I only analyze patch 6 here). Following is the
> explanation.

Right - as said before, all you write reads plausible, but will need
confirming by an SVM maintainer. And then I'd like to ask you to
re-order your patch series to fix bugs first (whether that's along
with generalizing or ahead of it I'd leave to you, as long as the
result meets the main goal I'm having here: backportability).

Jan

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 06/13] x86/hvm: Scale host TSC when setting/getting guest TSC
  2015-10-27  8:15       ` Jan Beulich
@ 2015-10-27  8:25         ` Haozhong Zhang
  0 siblings, 0 replies; 117+ messages in thread
From: Haozhong Zhang @ 2015-10-27  8:25 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Ian Jackson, xen-devel,
	Aravind Gopalakrishnan, Suravee Suthikulpanit, Keir Fraser,
	Boris Ostrovsky

On Tue, Oct 27, 2015 at 02:15:53AM -0600, Jan Beulich wrote:
> >>> On 27.10.15 at 02:54, <haozhong.zhang@intel.com> wrote:
> > On Thu, Oct 22, 2015 at 08:17:29AM -0600, Jan Beulich wrote:
> >> >>> On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
> >> > The existing hvm_set_guest_tsc_fixed() and hvm_get_guest_tsc_fixed()
> >> > calculate the guest TSC by adding the TSC offset to the host TSC. When
> >> > the TSC scaling is enabled, the host TSC should be scaled first. This
> >> > patch adds the scaling logic to those two functions.
> >> 
> >> Just like mentioned for the first two patches - I'd first of all like to
> >> understand why the lack of this scaling wasn't an issue for SVM so
> >> far. What you write reads plausible, but assuming that SVM TSC scaling
> >> code was tested, I'm hesitant to apply changes to it without
> >> understanding the details (or at least without SVM maintainers'
> >> consent).
> >>
> > 
> > The current SVM TSC ratio code does not seem correct w/o patch 6 (as
> > well as patch 2, but I only analyze patch 6 here). Following is the
> > explanation.
> 
> Right - as said before, all you write reads plausible, but will need
> confirming by an SVM maintainer. And then I'd like to ask you to
> re-order your patch series to fix bugs first (whether that's along
> with generalizing or ahead of it I'd leave to you, as long as the
> result meets the main goal I'm having here: backportability).
> 
> Jan
>

I'll wait for the SVM maintainers' comments. If this patch 6 is valid,
I'll move all bug fixes to the beginning of the series in the next version.

Thanks,
Haozhong

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 06/13] x86/hvm: Scale host TSC when setting/getting guest TSC
  2015-10-22 14:17   ` Jan Beulich
                       ` (2 preceding siblings ...)
  2015-10-27  1:54     ` Haozhong Zhang
@ 2015-10-27  8:44     ` Haozhong Zhang
  2015-10-27 13:10       ` Boris Ostrovsky
  3 siblings, 1 reply; 117+ messages in thread
From: Haozhong Zhang @ 2015-10-27  8:44 UTC (permalink / raw)
  To: Boris Ostrovsky, Suravee Suthikulpanit, Aravind Gopalakrishnan
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Ian Jackson, xen-devel, Jan Beulich,
	Keir Fraser

On Thu, Oct 22, 2015 at 08:17:29AM -0600, Jan Beulich wrote:
> >>> On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
> > The existing hvm_set_guest_tsc_fixed() and hvm_get_guest_tsc_fixed()
> > calculate the guest TSC by adding the TSC offset to the host TSC. When
> > the TSC scaling is enabled, the host TSC should be scaled first. This
> > patch adds the scaling logic to those two functions.
> 
> Just like mentioned for the first two patches - I'd first of all like to
> understand why the lack of this scaling wasn't an issue for SVM so
> far. What you write reads plausible, but assuming that SVM TSC scaling
> code was tested, I'm hesitant to apply changes to it without
> understanding the details (or at least without SVM maintainers'
> consent).
>

Hi SVM maintainers,

Could you help to review this patch 6 as well as patch 2? They intend
to fix bugs in SVM TSC ratio code (or code that affects SVM TSC ratio
code).

The detailed explanations of patch 2 and patch 6 can be found at
http://lists.xenproject.org/archives/html/xen-devel/2015-10/msg01490.html
and
http://lists.xenproject.org/archives/html/xen-devel/2015-10/msg02843.html
respectively.

Thanks,
Haozhong

> > --- a/xen/arch/x86/hvm/hvm.c
> > +++ b/xen/arch/x86/hvm/hvm.c
> > @@ -388,13 +388,12 @@ void hvm_set_guest_tsc_fixed(struct vcpu *v, u64 guest_tsc, u64 at_tsc)
> >          tsc = hvm_get_guest_time_fixed(v, at_tsc);
> >          tsc = gtime_to_gtsc(v->domain, tsc);
> >      }
> > -    else if ( at_tsc )
> > -    {
> > -        tsc = at_tsc;
> > -    }
> >      else
> >      {
> > -        tsc = rdtsc();
> > +        tsc = at_tsc ? at_tsc : rdtsc();
> 
> In cases like this please prefer the gcc extension allowing the middle
> operand of the ?: to be omitted.
> 
> Jan
> 
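
For reference, the GNU C extension mentioned in the review comment
above can be sketched as follows. fake_rdtsc() is a hypothetical
stand-in for rdtsc() so that the example is deterministic; the two
helper names are likewise made up for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for rdtsc(); returns a fixed value. */
static uint64_t fake_rdtsc(void)
{
    return 12345;
}

/* Standard C: the first operand is written out twice. */
static uint64_t pick_tsc_standard(uint64_t at_tsc)
{
    return at_tsc ? at_tsc : fake_rdtsc();
}

/*
 * GNU extension: "a ?: b" yields a if a is non-zero, otherwise b,
 * and a is evaluated only once.
 */
static uint64_t pick_tsc_gnu(uint64_t at_tsc)
{
    return at_tsc ?: fake_rdtsc();
}
```

Both helpers return the same value for every input; the second form is
the one asked for in the review. It is accepted by GCC and Clang but is
not ISO C.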

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 06/13] x86/hvm: Scale host TSC when setting/getting guest TSC
  2015-10-27  8:44     ` Haozhong Zhang
@ 2015-10-27 13:10       ` Boris Ostrovsky
  2015-10-27 13:55         ` Boris Ostrovsky
  2015-10-27 16:13         ` haozhong.zhang
  0 siblings, 2 replies; 117+ messages in thread
From: Boris Ostrovsky @ 2015-10-27 13:10 UTC (permalink / raw)
  To: Suravee Suthikulpanit, Aravind Gopalakrishnan, Jan Beulich,
	Andrew Cooper, Ian Campbell, Wei Liu, Ian Jackson,
	Stefano Stabellini, Jun Nakajima, Kevin Tian, xen-devel,
	Keir Fraser

On 10/27/2015 04:44 AM, Haozhong Zhang wrote:
> On Thu, Oct 22, 2015 at 08:17:29AM -0600, Jan Beulich wrote:
>>>>> On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
>>> The existing hvm_set_guest_tsc_fixed() and hvm_get_guest_tsc_fixed()
>>> calculate the guest TSC by adding the TSC offset to the host TSC. When
>>> the TSC scaling is enabled, the host TSC should be scaled first. This
>>> patch adds the scaling logic to those two functions.
>> Just like mentioned for the first two patches - I'd first of all like to
>> understand why the lack of this scaling wasn't an issue for SVM so
>> far. What you write reads plausible, but assuming that SVM TSC scaling
>> code was tested, I'm hesitant to apply changes to it without
>> understanding the details (or at least without SVM maintainers'
>> consent).
>>
> Hi SVM maintainers,
>
> Could you help to review this patch 6 as well as patch 2? They intend
> to fix bugs in SVM TSC ratio code (or code that affects SVM TSC ratio
> code).
>
> The detailed explanations of patch 2 and patch 6 can be found at
> http://lists.xenproject.org/archives/html/xen-devel/2015-10/msg01490.html
> and
> http://lists.xenproject.org/archives/html/xen-devel/2015-10/msg02843.html
> respectively.

I agree with patch 2 (so you can add my Reviewed-by), but I am not so
sure about patch 6 (and 11, together with existing SVM handlers).


I don't have Intel's latest manual handy but for SVM the guest TSC value
is calculated as scaled host TSC plus *unscaled* VMCB's TSC offset. Is
Intel's implementation similar?

Both svm_set_tsc_offset and vmx_set_tsc_offset (as proposed in patch 11)
write VMCB/VMCS with guest (i.e. scaled) offset, and that doesn't seem
right.

If I am right then I think (1) SVM is broken now and (2) patches 6 and 
11 don't fix this brokenness and instead propagate it to VMX.

(I should have thought about this when I last replied to you asking to 
move scaling out of vmx_set_tsc_offset(). But I re-read this code now 
and it doesn't make sense to me anymore).

-boris


>
> Thanks,
> Haozhong
>
>>> --- a/xen/arch/x86/hvm/hvm.c
>>> +++ b/xen/arch/x86/hvm/hvm.c
>>> @@ -388,13 +388,12 @@ void hvm_set_guest_tsc_fixed(struct vcpu *v, u64 guest_tsc, u64 at_tsc)
>>>           tsc = hvm_get_guest_time_fixed(v, at_tsc);
>>>           tsc = gtime_to_gtsc(v->domain, tsc);
>>>       }
>>> -    else if ( at_tsc )
>>> -    {
>>> -        tsc = at_tsc;
>>> -    }
>>>       else
>>>       {
>>> -        tsc = rdtsc();
>>> +        tsc = at_tsc ? at_tsc : rdtsc();
>> In cases like this please prefer the gcc extension allowing the middle
>> operand of the ?: to be omitted.
>>
>> Jan
>>

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 10/13] vmx: Detect and initialize VMX RDTSC(P) scaling
  2015-09-28  7:13 ` [PATCH 10/13] vmx: Detect and initialize VMX RDTSC(P) scaling Haozhong Zhang
@ 2015-10-27 13:19   ` Jan Beulich
  2015-10-27 16:17     ` Haozhong Zhang
  0 siblings, 1 reply; 117+ messages in thread
From: Jan Beulich @ 2015-10-27 13:19 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Ian Jackson, xen-devel,
	Aravind Gopalakrishnan, Suravee Suthikulpanit, Keir Fraser,
	Boris Ostrovsky

>>> On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
> @@ -1805,6 +1810,8 @@ void vmcs_dump_vcpu(struct vcpu *v)
>      printk("IDTVectoring: info=%08x errcode=%08x\n",
>             vmr32(IDT_VECTORING_INFO), vmr32(IDT_VECTORING_ERROR_CODE));
>      printk("TSC Offset = 0x%016lx\n", vmr(TSC_OFFSET));
> +    if ( v->arch.hvm_vmx.secondary_exec_control & SECONDARY_EXEC_TSC_SCALING )
> +        printk("TSC Multiplier = 0x%016lx\n", vmr(TSC_MULTIPLIER));

Please make this a single output line together with "TSC Offset = ..."
(vmr() can safely be used on non-existent VMCS fields).
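
A minimal sketch of the single-line layout requested above, written
against plain snprintf() rather than Xen's printk() and vmr().
format_tsc_line() and the exact spacing between the two fields are
assumptions made for illustration only.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format the TSC offset and the TSC multiplier on one output line.
 * Returns whatever snprintf() returns.
 */
static int format_tsc_line(char *buf, size_t len,
                           uint64_t offset, uint64_t multiplier)
{
    return snprintf(buf, len,
                    "TSC Offset = 0x%016" PRIx64
                    "  TSC Multiplier = 0x%016" PRIx64 "\n",
                    offset, multiplier);
}
```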

> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -151,6 +151,8 @@ static int vmx_vcpu_initialise(struct vcpu *v)
>      if ( v->vcpu_id == 0 )
>          v->arch.user_regs.eax = 1;
>  
> +    v->arch.tsc_scaling_ratio = VMX_TSC_MULTIPLIER_DEFAULT;
> +
>      return 0;
>  }

If you did this earlier in the function, then construct_vmcs() could
(more naturally) use the value from the field instead of the constant.

Jan

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 11/13] vmx: Use scaled host TSC to calculate TSC offset
  2015-09-28  7:13 ` [PATCH 11/13] vmx: Use scaled host TSC to calculate TSC offset Haozhong Zhang
  2015-10-22 15:55   ` Boris Ostrovsky
@ 2015-10-27 13:29   ` Jan Beulich
  2015-10-27 16:21     ` Haozhong Zhang
  1 sibling, 1 reply; 117+ messages in thread
From: Jan Beulich @ 2015-10-27 13:29 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Ian Jackson, xen-devel,
	Aravind Gopalakrishnan, Suravee Suthikulpanit, Keir Fraser,
	Boris Ostrovsky

>>> On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -1102,11 +1102,26 @@ static void vmx_handle_cd(struct vcpu *v, unsigned long value)
>  
>  static void vmx_set_tsc_offset(struct vcpu *v, u64 offset, u64 at_tsc)
>  {
> +    uint64_t host_tsc, guest_tsc;
> +    struct domain *d = v->domain;
> +
> +    guest_tsc = hvm_get_guest_tsc_fixed(v, at_tsc);
> +
> +    if ( cpu_has_vmx_tsc_scaling && !d->arch.vtsc )
> +    {
> +        host_tsc = at_tsc ? at_tsc : rdtsc();
> +        offset = guest_tsc - hvm_scale_tsc(v, host_tsc);
> +    }

Considering up to here this is basically the same as SVM's, this
should imo be factored out into a new hvm_set_tsc_offset(),
with the caller of hvm_funcs.set_tsc_offset() - lacking a proper
wrapper anyway - being converted to call that function.

>      vmx_vmcs_enter(v);
>  
> +    if ( !nestedhvm_enabled(d) )
> +        goto out;
> +
>      if ( nestedhvm_vcpu_in_guestmode(v) )
>          offset += nvmx_get_tsc_offset(v);
>  
> +out:

Instead of using goto and a malformed (coding style wise) label,
please simply extend the if() accordingly.
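
A standalone sketch of the suggested control flow, with the goto and
label replaced by one combined condition. struct vcpu_model and its
fields are hypothetical stand-ins for nestedhvm_enabled(),
nestedhvm_vcpu_in_guestmode() and nvmx_get_tsc_offset(); this is not
the Xen code itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the state the real helpers would query. */
struct vcpu_model {
    bool nested_enabled;    /* models nestedhvm_enabled(d) */
    bool in_guestmode;      /* models nestedhvm_vcpu_in_guestmode(v) */
    uint64_t nested_offset; /* models nvmx_get_tsc_offset(v) */
};

static uint64_t effective_offset(const struct vcpu_model *v, uint64_t offset)
{
    /* One extended if() instead of "if ( !enabled ) goto out;" plus a label. */
    if ( v->nested_enabled && v->in_guestmode )
        offset += v->nested_offset;

    return offset;
}
```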

Jan

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 12/13] vmx: Add a call-back to apply TSC scaling ratio to hardware
  2015-09-28  7:13 ` [PATCH 12/13] vmx: Add a call-back to apply TSC scaling ratio to hardware Haozhong Zhang
  2015-09-28 16:02   ` Boris Ostrovsky
@ 2015-10-27 13:33   ` Jan Beulich
  2015-10-28  2:41     ` Haozhong Zhang
  1 sibling, 1 reply; 117+ messages in thread
From: Jan Beulich @ 2015-10-27 13:33 UTC (permalink / raw)
  To: Haozhong Zhang
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Ian Jackson, xen-devel,
	Aravind Gopalakrishnan, Suravee Suthikulpanit, Keir Fraser,
	Boris Ostrovsky

>>> On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
> This patch adds a new call-back setup_tsc_scaling in struct
> hvm_function_table to apply the TSC scaling ratio to hardware. For VMX,
> it writes the TSC scaling ratio to VMCS field TSC_MULTIPLIER.
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>

Looking at this patch (the last non-tools one), I wonder how, if this is
needed, things can be correct at any point of the series prior to
this patch. IOW - is the series correctly ordered?

Jan

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 06/13] x86/hvm: Scale host TSC when setting/getting guest TSC
  2015-10-27 13:10       ` Boris Ostrovsky
@ 2015-10-27 13:55         ` Boris Ostrovsky
  2015-10-27 16:13           ` haozhong.zhang
  2015-10-27 16:13         ` haozhong.zhang
  1 sibling, 1 reply; 117+ messages in thread
From: Boris Ostrovsky @ 2015-10-27 13:55 UTC (permalink / raw)
  To: Suravee Suthikulpanit, Aravind Gopalakrishnan, Jan Beulich,
	Andrew Cooper, Ian Campbell, Wei Liu, Ian Jackson,
	Stefano Stabellini, Jun Nakajima, Kevin Tian, xen-devel,
	Keir Fraser, haozhong.zhang

On 10/27/2015 09:10 AM, Boris Ostrovsky wrote:
> On 10/27/2015 04:44 AM, Haozhong Zhang wrote:
>> On Thu, Oct 22, 2015 at 08:17:29AM -0600, Jan Beulich wrote:
>>>>>> On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
>>>> The existing hvm_set_guest_tsc_fixed() and hvm_get_guest_tsc_fixed()
>>>> calculate the guest TSC by adding the TSC offset to the host TSC. When
>>>> the TSC scaling is enabled, the host TSC should be scaled first. This
>>>> patch adds the scaling logic to those two functions.
>>> Just like mentioned for the first two patches - I'd first of all like to
>>> understand why the lack of this scaling wasn't an issue for SVM so
>>> far. What you write reads plausible, but assuming that SVM TSC scaling
>>> code was tested, I'm hesitant to apply changes to it without
>>> understanding the details (or at least without SVM maintainers'
>>> consent).
>>>
>> Hi SVM maintainers,
>>
>> Could you help to review this patch 6 as well as patch 2? They intend
>> to fix bugs in SVM TSC ratio code (or code that affects SVM TSC ratio
>> code).
>>
>> The detailed explanations of patch 2 and patch 6 can be found at
>> http://lists.xenproject.org/archives/html/xen-devel/2015-10/msg01490.html
>> and
>> http://lists.xenproject.org/archives/html/xen-devel/2015-10/msg02843.html
>> respectively.
>
> I agree with patch 2 (so you can add my Reviewed-by), but I am not so
> sure about patch 6 (and 11, together with existing SVM handlers).
>
>
> I don't have Intel's latest manual handy but for SVM the guest TSC
> value is calculated as scaled host TSC plus *unscaled* VMCB's TSC
> offset. Is Intel's implementation similar?
>
> Both svm_set_tsc_offset and vmx_set_tsc_offset (as proposed in patch
> 11) write VMCB/VMCS with guest (i.e. scaled) offset, and that doesn't
> seem right.

(I seem to have lost Haozhong on my previous email)

No, it is right (or, rather, since TSC offset is a constant it can be 
used as scaled or unscaled, depending on how you want to implement it).

So if you adjust patch 11 (and corresponding SVM code) to take scaling 
out from there this patch would be correct.

Of course I still can't test this since the two machines that I have 
available are running at fairly close frequencies.


-boris



>
> If I am right then I think (1) SVM is broken now and (2) patches 6 and 
> 11 don't fix this brokenness and instead propagate it to VMX.
>
> (I should have thought about this when I last replied to you asking to 
> move scaling out of vmx_set_tsc_offset(). But I re-read this code now 
> and it doesn't make sense to me anymore).
>
> -boris
>
>
>>
>> Thanks,
>> Haozhong
>>
>>>> --- a/xen/arch/x86/hvm/hvm.c
>>>> +++ b/xen/arch/x86/hvm/hvm.c
>>>> @@ -388,13 +388,12 @@ void hvm_set_guest_tsc_fixed(struct vcpu *v, u64 guest_tsc, u64 at_tsc)
>>>>           tsc = hvm_get_guest_time_fixed(v, at_tsc);
>>>>           tsc = gtime_to_gtsc(v->domain, tsc);
>>>>       }
>>>> -    else if ( at_tsc )
>>>> -    {
>>>> -        tsc = at_tsc;
>>>> -    }
>>>>       else
>>>>       {
>>>> -        tsc = rdtsc();
>>>> +        tsc = at_tsc ? at_tsc : rdtsc();
>>> In cases like this please prefer the gcc extension allowing the middle
>>> operand of the ?: to be omitted.
>>>
>>> Jan
>>>
>
>

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 06/13] x86/hvm: Scale host TSC when setting/getting guest TSC
  2015-10-27 13:10       ` Boris Ostrovsky
  2015-10-27 13:55         ` Boris Ostrovsky
@ 2015-10-27 16:13         ` haozhong.zhang
  1 sibling, 0 replies; 117+ messages in thread
From: haozhong.zhang @ 2015-10-27 16:13 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Kevin Tian, Wei Liu, Suravee Suthikulpanit, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Ian Jackson, xen-devel,
	Aravind Gopalakrishnan, Jan Beulich, Keir Fraser, Ian Campbell

On Tue, Oct 27, 2015 at 09:10:19AM -0400, Boris Ostrovsky wrote:
> On 10/27/2015 04:44 AM, Haozhong Zhang wrote:
> >On Thu, Oct 22, 2015 at 08:17:29AM -0600, Jan Beulich wrote:
> >>>>>On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
> >>>The existing hvm_set_guest_tsc_fixed() and hvm_get_guest_tsc_fixed()
> >>>calculate the guest TSC by adding the TSC offset to the host TSC. When
> >>>the TSC scaling is enabled, the host TSC should be scaled first. This
> >>>patch adds the scaling logic to those two functions.
> >>Just like mentioned for the first two patches - I'd first of all like to
> >>understand why the lack of this scaling wasn't an issue for SVM so
> >>far. What you write reads plausible, but assuming that SVM TSC scaling
> >>code was tested, I'm hesitant to apply changes to it without
> >>understanding the details (or at least without SVM maintainers'
> >>consent).
> >>
> >Hi SVM maintainers,
> >
> >Could you help to review this patch 6 as well as patch 2? They intend
> >to fix bugs in SVM TSC ratio code (or code that affects SVM TSC ratio
> >code).
> >
> >The detailed explanations of patch 2 and patch 6 can be found at
> >http://lists.xenproject.org/archives/html/xen-devel/2015-10/msg01490.html
> >and
> >http://lists.xenproject.org/archives/html/xen-devel/2015-10/msg02843.html
> >respectively.
> 
> I agree with patch 2 (so you can add my Reviewed-by).
>
Thank you!

> 
>  but I am not so sure about patch 6 (and 11, together with existing SVM
> handlers).
> 
> 
> I don't have latest Intel's manual handy but for SVM the guest TSC value is
> calculated as scaled host TSC plus *unscaled* VMCB's TSC offset. Is Intel's
> implementation similar?
>
Yes, scaled host TSC + unscaled TSC offset.

> Both svm_set_tsc_offset and vmx_set_tsc_offset (as proposed in patch 11)
> write VMCB/VMCS with guest (i.s. scaled) offset, and that doesn't seem
> right.
>
I think it's right. It does not scale the offset. It just doesn't
trust the offset from the argument, and calculates the offset by
subtracting the scaled host TSC from the current guest TSC.
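
To make the arithmetic concrete, here is a minimal stand-alone sketch.
It is not the Xen code: scale_tsc(), tsc_offset_for() and
guest_tsc_read() are hypothetical names, and the ratio is assumed to be
a fixed-point value with 48 fractional bits (the VMX format).

```c
#include <stdint.h>

/* Hypothetical helper: scale a host TSC by a fixed-point ratio with
 * 48 fractional bits. unsigned __int128 is a gcc/clang extension that
 * keeps the full 128-bit product before shifting. */
static uint64_t scale_tsc(uint64_t host_tsc, uint64_t ratio)
{
    return (uint64_t)(((unsigned __int128)host_tsc * ratio) >> 48);
}

/* Offset derived from the current guest TSC and the scaled host TSC.
 * The offset itself is never multiplied by the ratio. */
static uint64_t tsc_offset_for(uint64_t guest_tsc, uint64_t host_tsc,
                               uint64_t ratio)
{
    return guest_tsc - scale_tsc(host_tsc, ratio);
}

/* What the guest then reads: scaled host TSC plus unscaled offset. */
static uint64_t guest_tsc_read(uint64_t host_tsc, uint64_t ratio,
                               uint64_t offset)
{
    return scale_tsc(host_tsc, ratio) + offset;
}
```

With a ratio of 1.5 (3ULL << 47), a host TSC of 1000 and a guest TSC of
5000, the derived offset is 3500, and reading back at the same host TSC
yields 5000 again.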


> If I am right then I think (1) SVM is broken now and (2) patches 6 and 11
> don't fix this brokenness and instead propagate it to VMX.
> 
> (I should have thought about this when I last replied to you asking to move
> scaling out of vmx_set_tsc_offset(). But I re-read this code now and it
> doesn't make sense to me anymore).
So moving scaling out of [vmx|svm]_set_tsc_offset() is valid.

> 
> -boris
> 
> 
> >
> >Thanks,
> >Haozhong
> >
> >>>--- a/xen/arch/x86/hvm/hvm.c
> >>>+++ b/xen/arch/x86/hvm/hvm.c
> >>>@@ -388,13 +388,12 @@ void hvm_set_guest_tsc_fixed(struct vcpu *v, u64 guest_tsc, u64 at_tsc)
> >>>          tsc = hvm_get_guest_time_fixed(v, at_tsc);
> >>>          tsc = gtime_to_gtsc(v->domain, tsc);
> >>>      }
> >>>-    else if ( at_tsc )
> >>>-    {
> >>>-        tsc = at_tsc;
> >>>-    }
> >>>      else
> >>>      {
> >>>-        tsc = rdtsc();
> >>>+        tsc = at_tsc ? at_tsc : rdtsc();
> >>In cases like this please prefer the gcc extension allowing the middle
> >>operand of the ?: to be omitted.
> >>
> >>Jan
> >>
> 
> 

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 06/13] x86/hvm: Scale host TSC when setting/getting guest TSC
  2015-10-27 13:55         ` Boris Ostrovsky
@ 2015-10-27 16:13           ` haozhong.zhang
  0 siblings, 0 replies; 117+ messages in thread
From: haozhong.zhang @ 2015-10-27 16:13 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Kevin Tian, Wei Liu, Suravee Suthikulpanit, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Ian Jackson, xen-devel,
	Aravind Gopalakrishnan, Jan Beulich, Keir Fraser, Ian Campbell

On Tue, Oct 27, 2015 at 09:55:26AM -0400, Boris Ostrovsky wrote:
> On 10/27/2015 09:10 AM, Boris Ostrovsky wrote:
> >On 10/27/2015 04:44 AM, Haozhong Zhang wrote:
> >>On Thu, Oct 22, 2015 at 08:17:29AM -0600, Jan Beulich wrote:
> >>>>>>On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
> >>>>The existing hvm_set_guest_tsc_fixed() and hvm_get_guest_tsc_fixed()
> >>>>calculate the guest TSC by adding the TSC offset to the host TSC. When
> >>>>the TSC scaling is enabled, the host TSC should be scaled first. This
> >>>>patch adds the scaling logic to those two functions.
> >>>Just like mentioned for the first twp patches - I'd first of all like
> >>>to
> >>>understand why the lack of scaling this wasn't an issue for SVM so
> >>>far. What you reads plausible, but assuming that SVM TSC scaling
> >>>code was tested, I'm hesitant to apply changes to it without
> >>>understanding the details (or at least without SVM maintainers'
> >>>consent).
> >>>
> >>Hi SVM maintainers,
> >>
> >>Could you help to review this patch 6 as well as patch 2? They intend
> >>to fix bugs in SVM TSC ratio code (or code that affects SVM TSC ratio
> >>code).
> >>
> >>The detailed explanations of patch 2 and patch 6 can be found at
> >>http://lists.xenproject.org/archives/html/xen-devel/2015-10/msg01490.html
> >>
> >>and
> >>http://lists.xenproject.org/archives/html/xen-devel/2015-10/msg02843.html
> >>
> >>respectively.
> >
> >I agree with patch 2 (so you can add my Reviewed-by).
> >
> >
> > but I am not so sure about patch 6 (and 11, together with existing SVM
> >handlers).
> >
> >
> >I don't have latest Intel's manual handy but for SVM the guest TSC value
> >is calculated as scaled host TSC plus *unscaled* VMCB's TSC offset. Is
> >Intel's implementation similar?
> >
> >Both svm_set_tsc_offset and vmx_set_tsc_offset (as proposed in patch 11)
> >write VMCB/VMCS with guest (i.s. scaled) offset, and that doesn't seem
> >right.
> 
> (I seem to have lost Haozhong on my previous email)
> 
> No, it is right (or, rather, since TSC offset is a constant it can be used
> as scaled or unscaled, depending on how you want to implement it).
>
Agree, it's right.

> So if you adjust patch 11 (and corresponding SVM code) to take scaling out
> from there this patch would be correct.
>
Yes, I'm going to do so in the next version.

> Of course I still can't test this since the two machines that I have
> available are running at fairly close frequencies.
>
Sorry for not mentioning that this patchset can be tested on a single
machine as well. In patch 13, I introduce an option 'vtsc_khz' to the xl
configuration. For an HVM domain, if 'vtsc_khz=xxx' is present, xl will
set the vcpu's TSC frequency to xxx kHz.

Thus, we can
1) create an HVM domain with 'vtsc_khz=xxx' where xxx kHz
   is different from the host TSC frequency;
2) use 'xl save domid savedvm' to save the domain to a file;
3) use 'xl restore savedvm' to restore the domain.

Because 'xl save/restore' asks the hypervisor to do the same work as
'xl migrate', we can use the above approach to test on a single
machine.
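
For a rough idea of what a given 'vtsc_khz' means for the hardware, the
sketch below computes a multiplier with 48 fractional bits from a guest
and a host frequency. tsc_multiplier() and TSC_FRAC_BITS are
hypothetical names for illustration, not the code in this patchset; the
48-bit fraction matches the VMX format, where a ratio of 1.0 is 1 << 48.

```c
#include <stdint.h>

#define TSC_FRAC_BITS 48  /* fractional bits of the VMX TSC multiplier */

/* Hypothetical helper: guest_khz / host_khz as a fixed-point value with
 * 48 fractional bits. Returns 0 if host_khz is 0 or if the ratio does
 * not fit in 64 bits. */
static uint64_t tsc_multiplier(uint32_t guest_khz, uint32_t host_khz)
{
    unsigned __int128 m;

    if ( host_khz == 0 )
        return 0;

    /* Shift first, then divide, so the fractional bits are preserved. */
    m = ((unsigned __int128)guest_khz << TSC_FRAC_BITS) / host_khz;

    return (m >> 64) ? 0 : (uint64_t)m;
}
```

For example, a guest at half the host frequency gives a multiplier of
0x0000800000000000, and equal frequencies give 0x0001000000000000.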

Haozhong

> 
> -boris
> 
> 
> 
> >
> >If I am right then I think (1) SVM is broken now and (2) patches 6 and 11
> >don't fix this brokenness and instead propagate it to VMX.
> >
> >(I should have thought about this when I last replied to you asking to
> >move scaling out of vmx_set_tsc_offset(). But I re-read this code now and
> >it doesn't make sense to me anymore).
> >
> >-boris
> >
> >
> >>
> >>Thanks,
> >>Haozhong
> >>
> >>>>--- a/xen/arch/x86/hvm/hvm.c
> >>>>+++ b/xen/arch/x86/hvm/hvm.c
> >>>>@@ -388,13 +388,12 @@ void hvm_set_guest_tsc_fixed(struct vcpu *v,
> >>>>u64 guest_tsc, u64 at_tsc)
> >>>>          tsc = hvm_get_guest_time_fixed(v, at_tsc);
> >>>>          tsc = gtime_to_gtsc(v->domain, tsc);
> >>>>      }
> >>>>-    else if ( at_tsc )
> >>>>-    {
> >>>>-        tsc = at_tsc;
> >>>>-    }
> >>>>      else
> >>>>      {
> >>>>-        tsc = rdtsc();
> >>>>+        tsc = at_tsc ? at_tsc : rdtsc();
> >>>In cases like this please prefer the gcc extension allowing the middle
> >>>operand of the ?: to be omitted.
> >>>
> >>>Jan
> >>>
> >
> >
> 

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 10/13] vmx: Detect and initialize VMX RDTSC(P) scaling
  2015-10-27 13:19   ` Jan Beulich
@ 2015-10-27 16:17     ` Haozhong Zhang
  0 siblings, 0 replies; 117+ messages in thread
From: Haozhong Zhang @ 2015-10-27 16:17 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Ian Jackson, xen-devel,
	Aravind Gopalakrishnan, Suravee Suthikulpanit, Keir Fraser,
	Boris Ostrovsky

On Tue, Oct 27, 2015 at 07:19:04AM -0600, Jan Beulich wrote:
> >>> On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
> > @@ -1805,6 +1810,8 @@ void vmcs_dump_vcpu(struct vcpu *v)
> >      printk("IDTVectoring: info=%08x errcode=%08x\n",
> >             vmr32(IDT_VECTORING_INFO), vmr32(IDT_VECTORING_ERROR_CODE));
> >      printk("TSC Offset = 0x%016lx\n", vmr(TSC_OFFSET));
> > +    if ( v->arch.hvm_vmx.secondary_exec_control & SECONDARY_EXEC_TSC_SCALING )
> > +        printk("TSC Multiplier = 0x%016lx\n", vmr(TSC_MULTIPLIER));
> 
> Please make this a single output line together with "TSC Offset = ..."
> (vmr() can safely be used on non-existent VMCS fields).
>
OK.

> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -151,6 +151,8 @@ static int vmx_vcpu_initialise(struct vcpu *v)
> >      if ( v->vcpu_id == 0 )
> >          v->arch.user_regs.eax = 1;
> >  
> > +    v->arch.tsc_scaling_ratio = VMX_TSC_MULTIPLIER_DEFAULT;
> > +
> >      return 0;
> >  }
> 
> If you did this earlier in the function, then construct_vmcs() could
> (more naturally) use the value from the field instead of the constant.
>
Yes, I'll move it before calling vmx_create_vmcs().

Thanks,
Haozhong

> Jan
> 

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 11/13] vmx: Use scaled host TSC to calculate TSC offset
  2015-10-27 13:29   ` Jan Beulich
@ 2015-10-27 16:21     ` Haozhong Zhang
  0 siblings, 0 replies; 117+ messages in thread
From: Haozhong Zhang @ 2015-10-27 16:21 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Ian Jackson, xen-devel,
	Aravind Gopalakrishnan, Suravee Suthikulpanit, Keir Fraser,
	Boris Ostrovsky

On Tue, Oct 27, 2015 at 07:29:44AM -0600, Jan Beulich wrote:
> >>> On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -1102,11 +1102,26 @@ static void vmx_handle_cd(struct vcpu *v, unsigned long value)
> >  
> >  static void vmx_set_tsc_offset(struct vcpu *v, u64 offset, u64 at_tsc)
> >  {
> > +    uint64_t host_tsc, guest_tsc;
> > +    struct domain *d = v->domain;
> > +
> > +    guest_tsc = hvm_get_guest_tsc_fixed(v, at_tsc);
> > +
> > +    if ( cpu_has_vmx_tsc_scaling && !d->arch.vtsc )
> > +    {
> > +        host_tsc = at_tsc ? at_tsc : rdtsc();
> > +        offset = guest_tsc - hvm_scale_tsc(v, host_tsc);
> > +    }
> 
> Considering up to here this is basically the same as SVM's, this
> should imo be factored out into a new hvm_set_tsc_offset(),
> with the caller of hvm_funcs.set_tsc_offset() - lacking a proper
> wrapper anyway - being converted to call that function.
>
I'll add a new hvm_set_tsc_offset() and move the scaling logic from
set_tsc_offset callbacks to it.

> >      vmx_vmcs_enter(v);
> >  
> > +    if ( !nestedhvm_enabled(d) )
> > +        goto out;
> > +
> >      if ( nestedhvm_vcpu_in_guestmode(v) )
> >          offset += nvmx_get_tsc_offset(v);
> >  
> > +out:
> 
> Instead of using goto and a malformed (coding style wise) label,
> please simply extend the if() accordingly.
>
I'll refactor this piece of code to eliminate goto.
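
Roughly, the two checks collapse into one condition. The stand-alone
model below only illustrates the control flow; effective_tsc_offset()
is a hypothetical name, the booleans stand in for nestedhvm_enabled(d)
and nestedhvm_vcpu_in_guestmode(v), and nested_offset stands in for
nvmx_get_tsc_offset(v).

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of the offset adjustment without goto: the nested offset is
 * added only when nested HVM is enabled and the vcpu is in guest mode. */
static uint64_t effective_tsc_offset(uint64_t offset, bool nested_enabled,
                                     bool in_guestmode,
                                     uint64_t nested_offset)
{
    if ( nested_enabled && in_guestmode )
        offset += nested_offset;

    return offset;
}
```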

Haozhong

> Jan
> 

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 06/13] x86/hvm: Scale host TSC when setting/getting guest TSC
  2015-10-22 15:44     ` Boris Ostrovsky
  2015-10-22 16:23       ` Haozhong Zhang
@ 2015-10-27 20:16       ` Aravind Gopalakrishnan
  2015-10-28  1:51         ` Haozhong Zhang
  2015-11-09  7:43         ` Haozhong Zhang
  1 sibling, 2 replies; 117+ messages in thread
From: Aravind Gopalakrishnan @ 2015-10-27 20:16 UTC (permalink / raw)
  To: Boris Ostrovsky, Jan Beulich, Suravee Suthikulpanit, Haozhong Zhang
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Andrew Cooper, Ian Jackson, xen-devel, Jun Nakajima, Keir Fraser

On 10/22/2015 10:44 AM, Boris Ostrovsky wrote:
> On 10/22/2015 10:17 AM, Jan Beulich wrote:
>>>>> On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
>>> The existing hvm_set_guest_tsc_fixed() and hvm_get_guest_tsc_fixed()
>>> calculate the guest TSC by adding the TSC offset to the host TSC. When
>>> the TSC scaling is enabled, the host TSC should be scaled first. This
>>> patch adds the scaling logic to those two functions.
>> Just like mentioned for the first twp patches - I'd first of all like to
>> understand why the lack of scaling this wasn't an issue for SVM so
>> far. What you reads plausible, but assuming that SVM TSC scaling
>> code was tested, I'm hesitant to apply changes to it without
>> understanding the details (or at least without SVM maintainers'
>> consent).
>
> I don't see that this series will create any regressions in SVM . Most 
> of the changes move SVM-specific code into HVM I didn't see any 
> obvious problems there. I do have concern about patch 5 since I am 
> sure I fully understand whether the new algorithm (in __scale_tsc()) 
> is equivalent to current SVM code. I think you also had questions 
> about that.
>
> Having said this, the fact that this patch (and patch 9) fix bugs 
> leads me to believe this feature may not have been thoroughly tested.
>
> I don't have a pair of appropriate AMD systems to test this series 
> with migration (which is where this can be verified). Aravind, can you 
> find something and see how this works?
>

Haozhong, Boris-

I am planning to use a Fam10h system (older processor) and Fam15h Model 
60h (newer processor) for the test case.

Shall try to run the test on a single system as Haozhong mentioned on a 
different reply.
I ran into a problem with xl right now which I am trying to solve.

So, shall keep you posted on how testing goes.

Btw, I had issues with applying the patches to my local xen.git branch.
Patches 9 and 10 did not apply cleanly. Here is the log from git apply-

Patch 9:
Checking patch xen/arch/x86/time.c...
error: while searching for:
     }
     else
     {
         _u.tsc_timestamp     = t->local_tsc_stamp;
         _u.system_time       = t->stime_local_stamp;
         _u.tsc_to_system_mul = t->tsc_scale.mul_frac;
         _u.tsc_shift         = (s8)t->tsc_scale.shift;
     }
     if ( is_hvm_domain(d) )
         _u.tsc_timestamp += v->arch.hvm_vcpu.cache_tsc_offset;

error: patch failed: xen/arch/x86/time.c:832

I think the complaint is about "_u.tsc_timestamp     = t->local_tsc_stamp;".
I checked current master 
(http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/time.c;h=5d7452a2bf8b8fb830c14f8897cfca65cb1ad39e;hb=refs/heads/master)
and the line there is "tsc_stamp = t->local_tsc_stamp" inside the else 
block and outside it, we have "_u.tsc_timestamp = tsc_stamp"

The rejected hunk for Patch 10:
+#define VMX_TSC_MULTIPLIER_DEFAULT 0x0001000000000000ULL
+#define VMX_TSC_MULTIPLIER_MAX     0xffffffffffffffffULL
+
  #define cpu_has_wbinvd_exiting \

This seems to be because we have the #defines ordered like so on current 
master-
#define VMX_MISC_CR3_TARGET                     0x01ff0000
#define VMX_MISC_VMWRITE_ALL                    0x20000000

#define cpu_has_wbinvd_exiting \
     (vmx_secondary_exec_control & SECONDARY_EXEC_WBINVD_EXITING)

but the *_VMWRITE_ALL define is missing on your diff for Patch 10..

Maybe I am missing something?

Thanks,
-Aravind.

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 06/13] x86/hvm: Scale host TSC when setting/getting guest TSC
  2015-10-27 20:16       ` Aravind Gopalakrishnan
@ 2015-10-28  1:51         ` Haozhong Zhang
  2015-11-09  7:43         ` Haozhong Zhang
  1 sibling, 0 replies; 117+ messages in thread
From: Haozhong Zhang @ 2015-10-28  1:51 UTC (permalink / raw)
  To: Aravind Gopalakrishnan
  Cc: Kevin Tian, Wei Liu, Jan Beulich, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Ian Jackson, xen-devel,
	Suravee Suthikulpanit, Keir Fraser, Boris Ostrovsky,
	Ian Campbell

[-- Attachment #1: Type: text/plain, Size: 3919 bytes --]

On Tue, Oct 27, 2015 at 03:16:15PM -0500, Aravind Gopalakrishnan wrote:
> On 10/22/2015 10:44 AM, Boris Ostrovsky wrote:
> >On 10/22/2015 10:17 AM, Jan Beulich wrote:
> >>>>>On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
> >>>The existing hvm_set_guest_tsc_fixed() and hvm_get_guest_tsc_fixed()
> >>>calculate the guest TSC by adding the TSC offset to the host TSC. When
> >>>the TSC scaling is enabled, the host TSC should be scaled first. This
> >>>patch adds the scaling logic to those two functions.
> >>Just like mentioned for the first twp patches - I'd first of all like to
> >>understand why the lack of scaling this wasn't an issue for SVM so
> >>far. What you reads plausible, but assuming that SVM TSC scaling
> >>code was tested, I'm hesitant to apply changes to it without
> >>understanding the details (or at least without SVM maintainers'
> >>consent).
> >
> >I don't see that this series will create any regressions in SVM . Most of
> >the changes move SVM-specific code into HVM I didn't see any obvious
> >problems there. I do have concern about patch 5 since I am sure I fully
> >understand whether the new algorithm (in __scale_tsc()) is equivalent to
> >current SVM code. I think you also had questions about that.
> >
> >Having said this, the fact that this patch (and patch 9) fix bugs leads me
> >to believe this feature may not have been thoroughly tested.
> >
> >I don't have a pair of appropriate AMD systems to test this series with
> >migration (which is where this can be verified). Aravind, can you find
> >something and see how this works?
> >
> 
> Haozhong, Boris-
> 
> I am planning to use a Fam10h system (older processor) and Fam15h Model 60h
> (newer processor) for the test case.
> 
> Shall try to run the test on a single system as Haozhong mentioned on a
> different reply.
> I ran into a problem with xl right now which I am trying to solve.
> 
> So, shall keep you posted on how testing goes.
> 
> Btw, I had issues with applying the patches to my local xen.git branch.
> Patches 9 and 10 did not apply cleanly. Here is the log from git apply-
> 
> Patch 9:
> Checking patch xen/arch/x86/time.c...
> error: while searching for:
>     }
>     else
>     {
>         _u.tsc_timestamp     = t->local_tsc_stamp;
>         _u.system_time       = t->stime_local_stamp;
>         _u.tsc_to_system_mul = t->tsc_scale.mul_frac;
>         _u.tsc_shift         = (s8)t->tsc_scale.shift;
>     }
>     if ( is_hvm_domain(d) )
>         _u.tsc_timestamp += v->arch.hvm_vcpu.cache_tsc_offset;
> 
> error: patch failed: xen/arch/x86/time.c:832
> 
> I think the complaint is about "_u.tsc_timestamp     = t->local_tsc_stamp;".
> I checked current master (http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/time.c;h=5d7452a2bf8b8fb830c14f8897cfca65cb1ad39e;hb=refs/heads/master)
> and the line there is "tsc_stamp = t->local_tsc_stamp" inside the else block
> and outside it, we have "_u.tsc_timestamp = tsc_stamp"
> 
> The rejected hunk for Patch 10:
> +#define VMX_TSC_MULTIPLIER_DEFAULT 0x0001000000000000ULL
> +#define VMX_TSC_MULTIPLIER_MAX     0xffffffffffffffffULL
> +
>  #define cpu_has_wbinvd_exiting \
> 
> This seems to be because we have the #defines ordered like so on current
> master-
> #define VMX_MISC_CR3_TARGET                     0x01ff0000
> #define VMX_MISC_VMWRITE_ALL                    0x20000000
> 
> #define cpu_has_wbinvd_exiting \
>     (vmx_secondary_exec_control & SECONDARY_EXEC_WBINVD_EXITING)
> 
> but the *_VMWRITE_ALL define is missing on your diff for Patch 10..
> 
> Maybe I am missing something?
> 
> Thanks,
> -Aravind.

Hi Aravind,

This patchset was sent out quite a while ago and is based on
commit 9cc1346. Master has changed since then, so patch 9 and
patch 10 no longer apply cleanly. You can either try the old commit
9cc1346, or my rebased patch 9 and patch 10 in the attachment (on
commit e08f383).

Thanks,
Haozhong

[-- Attachment #2: 0009-x86-time.c-Scale-host-TSC-in-pvclock-properly-rebased.patch --]
[-- Type: text/plain, Size: 1509 bytes --]

>From 191effb2beb4d309c70d647c4b0347e15fe6a1d1 Mon Sep 17 00:00:00 2001
From: Haozhong Zhang <haozhong.zhang@intel.com>
Date: Mon, 24 Aug 2015 14:13:35 +0800
Subject: [PATCH 09/13] x86/time.c: Scale host TSC in pvclock properly

This patch makes the pvclock return the scaled host TSC and
corresponding scaling parameters to HVM domains if guest TSC is not
emulated and TSC scaling is enabled.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
 xen/arch/x86/time.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 2487b3a..a3e8fe7 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -821,10 +821,18 @@ static void __update_vcpu_system_time(struct vcpu *v, int force)
     }
     else
     {
-        tsc_stamp = t->local_tsc_stamp;
-
-        _u.tsc_to_system_mul = t->tsc_scale.mul_frac;
-        _u.tsc_shift         = (s8)t->tsc_scale.shift;
+        if ( is_hvm_domain(d) && hvm_funcs.tsc_scaling_supported )
+        {
+            tsc_stamp            = hvm_scale_tsc(v, t->local_tsc_stamp);
+            _u.tsc_to_system_mul = d->arch.vtsc_to_ns.mul_frac;
+            _u.tsc_shift         = d->arch.vtsc_to_ns.shift;
+        }
+        else
+        {
+            tsc_stamp            = t->local_tsc_stamp;
+            _u.tsc_to_system_mul = t->tsc_scale.mul_frac;
+            _u.tsc_shift         = (s8)t->tsc_scale.shift;
+        }
     }
 
     _u.tsc_timestamp = tsc_stamp;
-- 
2.4.8


[-- Attachment #3: 0010-vmx-Detect-and-initialize-VMX-RDTSC-P-scaling-rebased.patch --]
[-- Type: text/plain, Size: 5937 bytes --]

>From 05dfac3a82bbd227da619e14c27ab4f8df438882 Mon Sep 17 00:00:00 2001
From: Haozhong Zhang <haozhong.zhang@intel.com>
Date: Wed, 19 Aug 2015 13:27:11 +0800
Subject: [PATCH 10/13] vmx: Detect and initialize VMX RDTSC(P) scaling

This patch adds the detection and initialization code for VMX TSC
scaling.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
---
 xen/arch/x86/hvm/vmx/vmcs.c        | 11 +++++++++--
 xen/arch/x86/hvm/vmx/vmx.c         |  9 +++++++++
 xen/include/asm-x86/hvm/vmx/vmcs.h |  7 +++++++
 3 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 4ea1ad1..716703d 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -148,6 +148,7 @@ static void __init vmx_display_features(void)
     P(cpu_has_vmx_vmfunc, "VM Functions");
     P(cpu_has_vmx_virt_exceptions, "Virtualisation Exceptions");
     P(cpu_has_vmx_pml, "Page Modification Logging");
+    P(cpu_has_vmx_tsc_scaling, "RDTSC(P) Scaling");
 #undef P
 
     if ( !printed )
@@ -240,7 +241,8 @@ static int vmx_init_vmcs_config(void)
                SECONDARY_EXEC_PAUSE_LOOP_EXITING |
                SECONDARY_EXEC_ENABLE_INVPCID |
                SECONDARY_EXEC_ENABLE_VM_FUNCTIONS |
-               SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS);
+               SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS |
+               SECONDARY_EXEC_TSC_SCALING);
         rdmsrl(MSR_IA32_VMX_MISC, _vmx_misc_cap);
         if ( _vmx_misc_cap & VMX_MISC_VMWRITE_ALL )
             opt |= SECONDARY_EXEC_ENABLE_VMCS_SHADOWING;
@@ -976,7 +978,7 @@ static int construct_vmcs(struct vcpu *v)
     __vmwrite(PIN_BASED_VM_EXEC_CONTROL, vmx_pin_based_exec_control);
 
     v->arch.hvm_vmx.exec_control = vmx_cpu_based_exec_control;
-    if ( d->arch.vtsc )
+    if ( d->arch.vtsc && !cpu_has_vmx_tsc_scaling )
         v->arch.hvm_vmx.exec_control |= CPU_BASED_RDTSC_EXITING;
 
     v->arch.hvm_vmx.secondary_exec_control = vmx_secondary_exec_control;
@@ -1250,6 +1252,9 @@ static int construct_vmcs(struct vcpu *v)
         __vmwrite(GUEST_PAT, guest_pat);
     }
 
+    if ( cpu_has_vmx_tsc_scaling )
+        __vmwrite(TSC_MULTIPLIER, VMX_TSC_MULTIPLIER_DEFAULT);
+
     vmx_vmcs_exit(v);
 
     /* PVH: paging mode is updated by arch_set_info_guest(). */
@@ -1840,6 +1845,8 @@ void vmcs_dump_vcpu(struct vcpu *v)
     printk("IDTVectoring: info=%08x errcode=%08x\n",
            vmr32(IDT_VECTORING_INFO), vmr32(IDT_VECTORING_ERROR_CODE));
     printk("TSC Offset = 0x%016lx\n", vmr(TSC_OFFSET));
+    if ( v->arch.hvm_vmx.secondary_exec_control & SECONDARY_EXEC_TSC_SCALING )
+        printk("TSC Multiplier = 0x%016lx\n", vmr(TSC_MULTIPLIER));
     if ( (v->arch.hvm_vmx.exec_control & CPU_BASED_TPR_SHADOW) ||
          (vmx_pin_based_exec_control & PIN_BASED_POSTED_INTERRUPT) )
         printk("TPR Threshold = 0x%02x  PostedIntrVec = 0x%02x\n",
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 624db1c..454440e 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -151,6 +151,8 @@ static int vmx_vcpu_initialise(struct vcpu *v)
     if ( v->vcpu_id == 0 )
         v->arch.user_regs.eax = 1;
 
+    v->arch.tsc_scaling_ratio = VMX_TSC_MULTIPLIER_DEFAULT;
+
     return 0;
 }
 
@@ -1965,6 +1967,10 @@ static struct hvm_function_table __initdata vmx_function_table = {
     .altp2m_vcpu_emulate_vmfunc = vmx_vcpu_emulate_vmfunc,
     /* support for VMX RDTSC(P) scaling */
     .tsc_scaling_supported       = 0,
+    .default_tsc_scaling_ratio   = VMX_TSC_MULTIPLIER_DEFAULT,
+    .max_tsc_scaling_ratio       = VMX_TSC_MULTIPLIER_MAX,
+    .tsc_scaling_ratio_frac_bits = 48,
+    .tsc_scaling_ratio_rsvd      = 0x0ULL,
 };
 
 const struct hvm_function_table * __init start_vmx(void)
@@ -2017,6 +2023,9 @@ const struct hvm_function_table * __init start_vmx(void)
          && cpu_has_vmx_secondary_exec_control )
         vmx_function_table.pvh_supported = 1;
 
+    if ( cpu_has_vmx_tsc_scaling )
+        vmx_function_table.tsc_scaling_supported = 1;
+
     setup_vmcs_dump();
 
     return &vmx_function_table;
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 865d9fc..6a7a6b5 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -225,6 +225,7 @@ extern u32 vmx_vmentry_control;
 #define SECONDARY_EXEC_ENABLE_VMCS_SHADOWING    0x00004000
 #define SECONDARY_EXEC_ENABLE_PML               0x00020000
 #define SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS   0x00040000
+#define SECONDARY_EXEC_TSC_SCALING              0x02000000
 extern u32 vmx_secondary_exec_control;
 
 #define VMX_EPT_EXEC_ONLY_SUPPORTED                         0x00000001
@@ -247,6 +248,9 @@ extern u64 vmx_ept_vpid_cap;
 #define VMX_MISC_CR3_TARGET                     0x01ff0000
 #define VMX_MISC_VMWRITE_ALL                    0x20000000
 
+#define VMX_TSC_MULTIPLIER_DEFAULT 0x0001000000000000ULL
+#define VMX_TSC_MULTIPLIER_MAX     0xffffffffffffffffULL
+
 #define cpu_has_wbinvd_exiting \
     (vmx_secondary_exec_control & SECONDARY_EXEC_WBINVD_EXITING)
 #define cpu_has_vmx_virtualize_apic_accesses \
@@ -290,6 +294,8 @@ extern u64 vmx_ept_vpid_cap;
     (vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS)
 #define cpu_has_vmx_pml \
     (vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_PML)
+#define cpu_has_vmx_tsc_scaling \
+    (vmx_secondary_exec_control & SECONDARY_EXEC_TSC_SCALING)
 
 #define VMCS_RID_TYPE_MASK              0x80000000
 
@@ -364,6 +370,7 @@ enum vmcs_field {
     VMREAD_BITMAP                   = 0x00002026,
     VMWRITE_BITMAP                  = 0x00002028,
     VIRT_EXCEPTION_INFO             = 0x0000202a,
+    TSC_MULTIPLIER                  = 0x00002032,
     GUEST_PHYSICAL_ADDRESS          = 0x00002400,
     VMCS_LINK_POINTER               = 0x00002800,
     GUEST_IA32_DEBUGCTL             = 0x00002802,
-- 
2.4.8



^ permalink raw reply related	[flat|nested] 117+ messages in thread

* Re: [PATCH 12/13] vmx: Add a call-back to apply TSC scaling ratio to hardware
  2015-10-27 13:33   ` Jan Beulich
@ 2015-10-28  2:41     ` Haozhong Zhang
  0 siblings, 0 replies; 117+ messages in thread
From: Haozhong Zhang @ 2015-10-28  2:41 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Ian Jackson, xen-devel,
	Aravind Gopalakrishnan, Suravee Suthikulpanit, Keir Fraser,
	Boris Ostrovsky

On Tue, Oct 27, 2015 at 07:33:46AM -0600, Jan Beulich wrote:
> >>> On 28.09.15 at 09:13, <haozhong.zhang@intel.com> wrote:
> > This patch adds a new call-back setup_tsc_scaling in struct
> > hvm_function_table to apply the TSC scaling ratio to hardware. For VMX,
> > it writes the TSC scaling ratio to VMCS field TSC_MULTIPLIER.
> > 
> > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> 
> Looking at this (last non-tools one) patch, I wonder how if this is
> needed things can be correct at the point of the series prior to
> this patch. IOW - is the series correctly ordered?
> 
> Jan
> 

If I keep the scaling logic out of vmx_set_tsc_offset(), then patch 11
will not be necessary. Then I can move patch 12 before patch 10. The
reordered patch 12 only adds a callback, which is never called until
patch 10 is applied.

Haozhong

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 06/13] x86/hvm: Scale host TSC when setting/getting guest TSC
  2015-10-27 20:16       ` Aravind Gopalakrishnan
  2015-10-28  1:51         ` Haozhong Zhang
@ 2015-11-09  7:43         ` Haozhong Zhang
  2015-11-12 13:50           ` George Dunlap
  1 sibling, 1 reply; 117+ messages in thread
From: Haozhong Zhang @ 2015-11-09  7:43 UTC (permalink / raw)
  To: Aravind Gopalakrishnan
  Cc: Kevin Tian, Wei Liu, Jan Beulich, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Ian Jackson, xen-devel,
	Suravee Suthikulpanit, Keir Fraser, Boris Ostrovsky,
	Ian Campbell

On 10/27/15 15:16, Aravind Gopalakrishnan wrote:
[...]
> 
> Haozhong, Boris-
> 
> I am planning to use a Fam10h system (older processor) and Fam15h Model 60h
> (newer processor) for the test case.
> 
> Shall try to run the test on a single system as Haozhong mentioned on a
> different reply.
> I ran into a problem with xl right now which I am trying to solve.
> 
> So, shall keep you posted on how testing goes.
>

Hi Aravind,

How is your test going?

Thanks,
Haozhong

[...]

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 06/13] x86/hvm: Scale host TSC when setting/getting guest TSC
  2015-11-09  7:43         ` Haozhong Zhang
@ 2015-11-12 13:50           ` George Dunlap
  0 siblings, 0 replies; 117+ messages in thread
From: George Dunlap @ 2015-11-12 13:50 UTC (permalink / raw)
  To: Aravind Gopalakrishnan, Boris Ostrovsky, Jan Beulich,
	Suravee Suthikulpanit, Andrew Cooper, Ian Campbell, Wei Liu,
	Ian Jackson, Stefano Stabellini, Jun Nakajima, Kevin Tian,
	xen-devel, Keir Fraser

On Mon, Nov 9, 2015 at 7:43 AM, Haozhong Zhang <haozhong.zhang@intel.com> wrote:
> On 10/27/15 15:16, Aravind Gopalakrishnan wrote:
> [...]
>>
>> Haozhong, Boris-
>>
>> I am planning to use a Fam10h system (older processor) and Fam15h Model 60h
>> (newer processor) for the test case.
>>
>> Shall try to run the test on a single system as Haozhong mentioned on a
>> different reply.
>> I ran into a problem with xl right now which I am trying to solve.
>>
>> So, shall keep you posted on how testing goes.
>>
>
> Hi Aravind,
>
> How is your test going?

To be honest, I'm inclined to say that we should check this in and ask
people in the community with AMD boxes to give it a spin.  Haozhong
has obviously tried very carefully to avoid breaking it on AMD boxes;
it's not reasonable to block this feature based on testing for
hardware that he doesn't have, particularly if the maintainer of that
hardware isn't very responsive.

 -George

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH 00/13] Add VMX TSC scaling support
  2015-09-28  7:13 [PATCH 00/13] Add VMX TSC scaling support Haozhong Zhang
                   ` (13 preceding siblings ...)
  2015-09-28 10:51 ` [PATCH 00/13] Add VMX TSC scaling support Andrew Cooper
@ 2015-11-22 17:54 ` Haozhong Zhang
  2015-11-23 15:37   ` Boris Ostrovsky
  14 siblings, 1 reply; 117+ messages in thread
From: Haozhong Zhang @ 2015-11-22 17:54 UTC (permalink / raw)
  To: Jan Beulich, Boris Ostrovsky, Aravind Gopalakrishnan
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Ian Jackson, xen-devel,
	George Dunlap, Suravee Suthikulpanit, Keir Fraser

[-- Attachment #1: Type: text/plain, Size: 5222 bytes --]

Hi Jan, Boris and Aravind,

(Sorry for sending such a long email and thanks for your patience)

Because this patchset also touches the existing SVM TSC ratio code, I
tested it on an AMD machine with an AMD A10-7700K CPU (3.4 GHz) that
supports SVM TSC ratio. There are two goals of the test:
 (1) Check whether this patchset works well for SVM TSC ratio.
 (2) Check whether the existing SVM TSC ratio code works correctly.

* TL;DR
  The detailed testing process is boring and long, so I put the
  conclusions first.

  According to the following test,
  (1) this patchset works well for SVM TSC ratio, and
  (2) the existing SVM TSC ratio code does not work correctly.


* Preliminary bug fix

  Before testing (especially for goal (2)), I had to fix another bug
  found in the current svm_get_tsc_offset() (commit e08f383):

  static uint64_t svm_get_tsc_offset(uint64_t host_tsc, uint64_t guest_tsc,
    uint64_t ratio)
  {
      uint64_t offset;

      if (ratio == DEFAULT_TSC_RATIO)
          return guest_tsc - host_tsc;

      /* calculate hi,lo parts in 64bits to prevent overflow */
      offset = (((host_tsc >> 32U) * (ratio >> 32U)) << 32U) +
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            (host_tsc & 0xffffffffULL) * (ratio & 0xffffffffULL);
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            ^^ wrong

      return guest_tsc - offset;
  }

  Looking at AMD's spec for the TSC ratio MSR and at where this function
  is called, it is expected to calculate
      guest_tsc - ((host_tsc * ratio) >> 32)
  but the underlined code above is definitely not "(host_tsc * ratio) >> 32",
  and the function will return a much larger result than
  expected if (guest TSC rate / host TSC rate) > 1. In practice, this
  could result in the guest TSC jumping several years ahead after
  migration (which I came across and was confused by in this test).

  This bug can be fixed either later by patch 5, which introduces a
  common function hvm_scale_tsc() to scale TSC, or by replacing the
  underlined code above with a simplified and inlined version of
  hvm_scale_tsc() as below:
      uint64_t mult, frac;
      mult    = ratio >> 32;
      frac    = ratio & ((1ULL << 32) - 1);
      offset  = host_tsc * mult;
      offset += (host_tsc >> 32) * frac;
      offset += ((host_tsc & ((1ULL << 32) - 1)) * frac) >> 32;
  For testing goal (2), I apply the latter fix.
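
  As a rough illustration of the difference, the sketch below puts the
  original expression next to the inlined replacement. The function names
  old_scale() and fixed_scale() are made up for this example and are not
  Xen functions; the ratio is assumed to be a fixed-point value with 32
  fractional bits, as the ">> 32" above implies.

```c
#include <stdint.h>

/* Hypothetical name: the underlined expression from svm_get_tsc_offset(). */
static uint64_t old_scale(uint64_t host_tsc, uint64_t ratio)
{
    return (((host_tsc >> 32U) * (ratio >> 32U)) << 32U) +
           (host_tsc & 0xffffffffULL) * (ratio & 0xffffffffULL);
}

/* Hypothetical name: the inlined replacement quoted above. */
static uint64_t fixed_scale(uint64_t host_tsc, uint64_t ratio)
{
    uint64_t mult = ratio >> 32;                 /* integer part of ratio */
    uint64_t frac = ratio & ((1ULL << 32) - 1);  /* fractional part of ratio */
    uint64_t offset;

    offset  = host_tsc * mult;
    offset += (host_tsc >> 32) * frac;
    offset += ((host_tsc & ((1ULL << 32) - 1)) * frac) >> 32;
    return offset;
}
```

  With host_tsc = 8 and ratio = 0x180000000 (1.5), fixed_scale() gives 12,
  while old_scale() gives 8 * 0x80000000 = 17179869184.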


* Test for goal (1)

  * Environment
    (1) Xen (commit e08f383)
    (2) Host Linux kernel 3.19.0
    (3) Guest Linux kernel 3.19.0 & 4.2.0

  * Process
    (1) Apply the whole patchset on commit e08f383.

    (2) Launch an HVM domain from the configuration xl-high.cfg (in
        attachment).

        Expected: The guest Linux should boot normally in the domain.

    (3) Execute the command "dmesg | grep -i tsc" in the guest Linux
        to check the TSC rate detected by the guest Linux.

        Expected: Suppose the detected TSC rate is 'gtsc_khz' in KHz,
                  then it should be as close to the value of the 'vtsc_khz'
                  option in xl-high.cfg as possible.

    (4) Execute the program "./test_tsc <nr_secs> gtsc_khz" to check
        whether the guest TSC rate is synchronized with the wall clock.
        The code of test_tsc is also in the attachment. It records the
        beginning and ending TSC values (tsc0 and tsc1) for a period
        of nr_secs and outputs the result of
            (tsc1 - tsc0) / (gtsc_khz * 1000).

        Expected: The output should be as close to nr_secs as possible.

     The following steps test migration.

     (5) Save the current domain by "xl save hvm-test saved_domain".

     (6) Restore the domain.

     (7) Take above step (4) again to check whether the guest TSC rate
         is still synchronized with the wall clock.

         Expected: the same as step (4)

     (8) Switch to the configuration xl-low.cfg and take above
         steps (2) ~ (7) again.

  * Results (OK: All as expected)
    First round w/ xl-high.cfg (vtsc_khz = 4000000):
    (3) gtsc_khz = 4000000 KHz
    (4) ./test_tsc 10 4000000   outputs: Passed 9.99895 s
        ./test_tsc 3600 4000000 outputs: Passed 3599.99754 s
    (7) ./test_tsc 10 4000000   outputs: Passed 9.99885 s
        ./test_tsc 3600 4000000 outputs: Passed 3599.98987 s

    Second round w/ xl-low.cfg (vtsc_khz = 2000000):
    (3) gtsc_khz = 2000000 KHz
    (4) ./test_tsc 10 4000000   outputs: Passed 9.99886 s
        ./test_tsc 3600 4000000 outputs: Passed 3599.99810 s
    (7) ./test_tsc 10 4000000   outputs: Passed 9.99885 s
        ./test_tsc 3600 4000000 outputs: Passed 3599.99853 s

   I also switched the clocksource of guest Linux to 'hpet' and got
   very similar results to above.


* Test for goal (2)

  * Environment
    The same as above

  * Process
    (1) ~ (5): the same as above.
    (6) Reboot to Xen hypervisor and toolstack w/o this patchset but
        w/ the bug fix at the beginning and restore the domain.
    (7) the same as above.

  * Results (Failed)
    (7) ./test_tsc 10 4000000 outputs: Passed 63.319284 s


* Conclusion

  This patchset works well for SVM TSC ratio and fixes existing bugs
  in SVM TSC ratio code.


Thanks for your patience in reading such a long email,
Haozhong


[-- Attachment #2: test_tsc.c --]
[-- Type: text/plain, Size: 1058 bytes --]

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdint.h>
#include <inttypes.h>

static uint64_t rdtsc(void)
{
        uint32_t lo, hi;
        asm volatile("lfence; rdtsc" : "=a" (lo), "=d" (hi));
        return ((uint64_t) hi << 32) | lo;
}

int main(int argc, char **argv)
{
        int s, khz;
        uint64_t tsc1, tsc2, delta;
        double tsc_s, error;

        if (argc < 3) {
                printf("Usage: %s <sleep seconds> <cpu khz>\n", argv[0]);
                exit(1);
        }

        s = atoi(argv[1]);

        khz = atoi(argv[2]);
        if (khz == 0) {
                printf("cpu khz must be larger than 0\n");
                exit(1);
        }

        tsc1 = rdtsc();
        sleep(s);
        tsc2 = rdtsc();

        delta = tsc2 - tsc1;
        /* Convert to double first so that delta / khz is not truncated. */
        tsc_s = (double) delta / khz / 1000.0;
        error = (tsc_s - s) * 100.0 / s;

        printf("tsc1  = %" PRIu64 ", "
               "tsc2  = %" PRIu64 ", "
               "delta = %lf s, error = %lf\n",
               tsc1, tsc2, tsc_s, error);

        exit(0);
}
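
The conversion from a TSC delta to seconds, and the relative error against
the requested sleep time, can be sketched as two standalone helpers that work
entirely in floating point. The helper names below are hypothetical and only
illustrate the arithmetic; they are not part of the attached program.

```c
#include <stdint.h>

/* Hypothetical helper: convert a TSC delta to seconds for a TSC rate in kHz. */
static double tsc_delta_to_seconds(uint64_t delta, int khz)
{
    /* khz * 1000 is the number of TSC ticks per second. */
    return (double)delta / ((double)khz * 1000.0);
}

/* Hypothetical helper: relative error in percent against the expected time. */
static double tsc_error_percent(double measured_s, int expected_s)
{
    return (measured_s - expected_s) * 100.0 / expected_s;
}
```

For example, a delta of 40000000000 ticks at 4000000 kHz corresponds to 10
seconds, and a measurement of 9.99895 s against an expected 10 s is an error
of about -0.0105 percent.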

[-- Attachment #3: xl-high.cfg --]
[-- Type: text/plain, Size: 303 bytes --]

builder = "hvm"
name    = "hvm-test"

vcpus   = 4
memory  = 512
disk    = [ 'guest.qcow,qcow2,hda,rw' ]

device_model_override = '/usr/local/lib/xen/bin/qemu-system-i386'
device_model_version  = 'qemu-xen'

sdl     = 0
vnc     = 1
hap     = 1
hpet    = 1
acpi    = 1
serial  = 'pty'

vtsc_khz = 4000000

[-- Attachment #4: xl-low.cfg --]
[-- Type: text/plain, Size: 303 bytes --]

builder = "hvm"
name    = "hvm-test"

vcpus   = 4
memory  = 512
disk    = [ 'guest.qcow,qcow2,hda,rw' ]

device_model_override = '/usr/local/lib/xen/bin/qemu-system-i386'
device_model_version  = 'qemu-xen'

sdl     = 0
vnc     = 1
hap     = 1
hpet    = 1
acpi    = 1
serial  = 'pty'

vtsc_khz = 2000000



* Re: [PATCH 00/13] Add VMX TSC scaling support
  2015-11-22 17:54 ` Haozhong Zhang
@ 2015-11-23 15:37   ` Boris Ostrovsky
  2015-11-24 13:05     ` Haozhong Zhang
  0 siblings, 1 reply; 117+ messages in thread
From: Boris Ostrovsky @ 2015-11-23 15:37 UTC (permalink / raw)
  To: Jan Beulich, Aravind Gopalakrishnan, Ian Jackson,
	Stefano Stabellini, Ian Campbell, Wei Liu, Keir Fraser,
	Andrew Cooper, Suravee Suthikulpanit, Jun Nakajima, Kevin Tian,
	George Dunlap, xen-devel

On 11/22/2015 12:54 PM, Haozhong Zhang wrote:
> Hi Jan, Boris and Aravind,
>
> (Sorry for sending such a long email and thanks for your patience)

First, thank you very much for doing this.

>
> Because this patchset also touches the existing SVM TSC ratio code, I
> tested it on an AMD machine with an AMD A10-7700K CPU (3.4 GHz) that
> supports SVM TSC ratio. There are two goals of the test:
>   (1) Check whether this patchset works well for SVM TSC ratio.
>   (2) Check whether the existing SVM TSC ratio code works correctly.
>
> * TL;DR
>    The detailed testing process is boring and long, so I put the
>    conclusions first.
>
>    According to the following test,
>    (1) this patchset works well for SVM TSC ratio, and
>    (2) the existing SVM TSC ratio code does not work correctly.
>
>
> * Preliminary bug fix
>
>    Before testing (specially for goal (2)), I have to fix another bug
>    found in the current svm_get_tsc_offset() (commit e08f383):
>
>    static uint64_t svm_get_tsc_offset(uint64_t host_tsc, uint64_t guest_tsc,
>      uint64_t ratio)
>    {
>        uint64_t offset;
>
>        if (ratio == DEFAULT_TSC_RATIO)
>            return guest_tsc - host_tsc;
>
>        /* calculate hi,lo parts in 64bits to prevent overflow */
>        offset = (((host_tsc >> 32U) * (ratio >> 32U)) << 32U) +
>        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>              (host_tsc & 0xffffffffULL) * (ratio & 0xffffffffULL);
>              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>              ^^ wrong
>
>        return guest_tsc - offset;
>    }
>
>    Looking at the AMD's spec about TSC ratio MSR and where this function is
>    called, it's expected to calculate
>        guest_tsc - (host_tsc * ratio) >> 32
>    but above underlined code is definitely not "(host_tsc * ratio) >> 32",
>    and above function will return a much larger result than
>    expected if (guest TSC rate / host TSC rate) > 1. In practice, it
>    could result the guest TSC jumping to several years later after
>    migration (which I came across and was confuse by in this test).

Yes, this is obviously wrong.

>
>    This bug can be fixed either later by patch 5 which introduces a
>    common function hvm_scale_tsc() to scale TSC, or by replacing above
>    underlined code with a simplified and inlined version of
>    hvm_scale_tsc() as below:
>        uint64_t mult, frac;
>        mult    = ratio >> 32;
>        frac    = ratio & ((1ULL << 32) - 1);
>        offset  = host_tsc * mult;
>        offset += (host_tsc >> 32) * frac;
>        offset += ((host_tsc & ((1ULL << 32) - 1)) * frac) >> 32;

I am not sure I understand the last line (or maybe 2 lines)

If by 'offset' here you are trying to calculate the scaled version of 
host TSC then I think it would be

(host_tsc * (ratio >> 32)) + ( (host_tsc * (ratio & 0xffffffff)) >> 32 )

(sanity check: assuming host_tsc is 8 and the ratio is 1.5 (i.e. 
0x180000000) we get 12)
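
The sanity check above can be reproduced with a few lines of C. This is only
a sketch of the formula as written in this message; make_ratio() and
scale_simple() are hypothetical names, and the ratio is assumed to be a
fixed-point number with 32 fractional bits. The intermediate product
host_tsc * (ratio & 0xffffffff) is not guarded against 64-bit overflow here.

```c
#include <stdint.h>

/*
 * Hypothetical helper: encode guest_khz / host_khz as a fixed-point ratio
 * with 32 fractional bits. Assumes guest_khz < 2^32 and host_khz != 0.
 */
static uint64_t make_ratio(uint64_t guest_khz, uint64_t host_khz)
{
    return (guest_khz << 32) / host_khz;
}

/* Hypothetical name: the scaling formula exactly as written above. */
static uint64_t scale_simple(uint64_t host_tsc, uint64_t ratio)
{
    return (host_tsc * (ratio >> 32)) +
           ((host_tsc * (ratio & 0xffffffff)) >> 32);
}
```

Here make_ratio(3, 2) yields 0x180000000, and scale_simple(8, 0x180000000)
yields 12, matching the sanity check.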


-boris



* Re: [PATCH 00/13] Add VMX TSC scaling support
  2015-11-23 15:37   ` Boris Ostrovsky
@ 2015-11-24 13:05     ` Haozhong Zhang
  2015-11-24 14:19       ` Boris Ostrovsky
  2015-11-24 14:25       ` Haozhong Zhang
  0 siblings, 2 replies; 117+ messages in thread
From: Haozhong Zhang @ 2015-11-24 13:05 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
	Jun Nakajima, Andrew Cooper, Ian Jackson, xen-devel,
	George Dunlap, Aravind Gopalakrishnan, Jan Beulich, Keir Fraser,
	Suravee Suthikulpanit

On 11/23/15 10:37, Boris Ostrovsky wrote:
> On 11/22/2015 12:54 PM, Haozhong Zhang wrote:
> >Hi Jan, Boris and Aravind,
> >
> >(Sorry for sending such a long email and thanks for your patience)
> 
> First, thank you very much for doing this.
> 
> >
> >Because this patchset also touches the existing SVM TSC ratio code, I
> >tested it on an AMD machine with an AMD A10-7700K CPU (3.4 GHz) that
> >supports SVM TSC ratio. There are two goals of the test:
> >  (1) Check whether this patchset works well for SVM TSC ratio.
> >  (2) Check whether the existing SVM TSC ratio code works correctly.
> >
> >* TL;DR
> >   The detailed testing process is boring and long, so I put the
> >   conclusions first.
> >
> >   According to the following test,
> >   (1) this patchset works well for SVM TSC ratio, and
> >   (2) the existing SVM TSC ratio code does not work correctly.
> >
> >
> >* Preliminary bug fix
> >
> >   Before testing (specially for goal (2)), I have to fix another bug
> >   found in the current svm_get_tsc_offset() (commit e08f383):
> >
> >   static uint64_t svm_get_tsc_offset(uint64_t host_tsc, uint64_t guest_tsc,
> >     uint64_t ratio)
> >   {
> >       uint64_t offset;
> >
> >       if (ratio == DEFAULT_TSC_RATIO)
> >           return guest_tsc - host_tsc;
> >
> >       /* calculate hi,lo parts in 64bits to prevent overflow */
> >       offset = (((host_tsc >> 32U) * (ratio >> 32U)) << 32U) +
> >       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >             (host_tsc & 0xffffffffULL) * (ratio & 0xffffffffULL);
> >             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >             ^^ wrong
> >
> >       return guest_tsc - offset;
> >   }
> >
> >   Looking at the AMD's spec about TSC ratio MSR and where this function is
> >   called, it's expected to calculate
> >       guest_tsc - (host_tsc * ratio) >> 32
> >   but above underlined code is definitely not "(host_tsc * ratio) >> 32",
> >   and above function will return a much larger result than
> >   expected if (guest TSC rate / host TSC rate) > 1. In practice, it
> >   could result the guest TSC jumping to several years later after
> >   migration (which I came across and was confuse by in this test).
> 
> Yes, this is obviously wrong.
> 
> >
> >   This bug can be fixed either later by patch 5 which introduces a
> >   common function hvm_scale_tsc() to scale TSC, or by replacing above
> >   underlined code with a simplified and inlined version of
> >   hvm_scale_tsc() as below:
> >       uint64_t mult, frac;
> >       mult    = ratio >> 32;
> >       frac    = ratio & ((1ULL << 32) - 1);
> >       offset  = host_tsc * mult;                               
> >       offset += (host_tsc >> 32) * frac;                       
> >       offset += ((host_tsc & ((1ULL << 32) - 1)) * frac) >> 32; 
> 
> I am not sure I understand the last line (or maybe 2 lines)
>

It is just simple math, done carefully to avoid 64-bit integer overflow:

suppose the most significant 32 bits of host_tsc and ratio are tsc_h
and mult, and the least significant 32 bits of them are tsc_l and
frac, then
    host_tsc * ratio * 2^-32
    = host_tsc * (mult * 2^32 + frac) * 2^-32
    = host_tsc * mult + (tsc_h * 2^32 + tsc_l) * frac * 2^-32
    = host_tsc * mult + tsc_h * frac + ((tsc_l * frac) >> 32)
      
All multiplications in the last line are between 32-bit integers, so none
of them could overflow 64-bit integers.

Consider a simple example where host_tsc = 1ULL << 33 and ratio = 0xffffffff.
Overflow happens in the multiplication in the second term of your formula below,
and all overflowed bits are lost in the next right shift.
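
The example can be checked with a short sketch that places the two forms side
by side. The names scale_simple() and scale_split() are hypothetical and are
used only for this illustration; the ratio is assumed to carry 32 fractional
bits.

```c
#include <stdint.h>

/* Hypothetical name: the single-multiplication form quoted below. */
static uint64_t scale_simple(uint64_t host_tsc, uint64_t ratio)
{
    return (host_tsc * (ratio >> 32)) +
           ((host_tsc * (ratio & 0xffffffffULL)) >> 32);
}

/* Hypothetical name: the split form from the derivation above. */
static uint64_t scale_split(uint64_t host_tsc, uint64_t ratio)
{
    uint64_t mult  = ratio >> 32;               /* upper 32 bits of ratio */
    uint64_t frac  = ratio & 0xffffffffULL;     /* lower 32 bits of ratio */
    uint64_t tsc_h = host_tsc >> 32;            /* upper 32 bits of host_tsc */
    uint64_t tsc_l = host_tsc & 0xffffffffULL;  /* lower 32 bits of host_tsc */

    return host_tsc * mult + tsc_h * frac + ((tsc_l * frac) >> 32);
}
```

For host_tsc = 1ULL << 33 and ratio = 0xffffffff, scale_split() returns
8589934590 (2^33 - 2), whereas scale_simple() returns 4294967294 because the
product wraps modulo 2^64 before the shift.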

Haozhong

> If by 'offset' here you are trying to calculate the scaled version of host
> TSC then I think it would be
> 
> (host_tsc * (ratio >> 32)) + ( (host_tsc * (ratio & 0xffffffff)) >> 32 )
> 
> (sanity check: assuming host_tsc is 8 and the ratio is 1.5 (i.e.
> 0x180000000) we get 12)
> 
> 
> -boris


* Re: [PATCH 00/13] Add VMX TSC scaling support
  2015-11-24 13:05     ` Haozhong Zhang
@ 2015-11-24 14:19       ` Boris Ostrovsky
  2015-11-24 14:25       ` Haozhong Zhang
  1 sibling, 0 replies; 117+ messages in thread
From: Boris Ostrovsky @ 2015-11-24 14:19 UTC (permalink / raw)
  To: Jan Beulich, Aravind Gopalakrishnan, Ian Jackson,
	Stefano Stabellini, Ian Campbell, Wei Liu, Keir Fraser,
	Andrew Cooper, Suravee Suthikulpanit, Jun Nakajima, Kevin Tian,
	George Dunlap, xen-devel



On 11/24/2015 08:05 AM, Haozhong Zhang wrote:
>
>>>    This bug can be fixed either later by patch 5 which introduces a
>>>    common function hvm_scale_tsc() to scale TSC, or by replacing above
>>>    underlined code with a simplified and inlined version of
>>>    hvm_scale_tsc() as below:
>>>        uint64_t mult, frac;
>>>        mult    = ratio >> 32;
>>>        frac    = ratio & ((1ULL << 32) - 1);
>>>        offset  = host_tsc * mult;
>>>        offset += (host_tsc >> 32) * frac;
>>>        offset += ((host_tsc & ((1ULL << 32) - 1)) * frac) >> 32;
>> I am not sure I understand the last line (or maybe 2 lines)
>>
> Just simple math with carefulness to avoid 64-bit integer overflow:
>
> suppose the most significant 32 bits of host_tsc and ratio are tsc_h
> and mult, and the least significant 32 bits of them are tsc_l and
> frac, then
>      host_tsc * ratio * 2^-32
>      = host_tsc * (mult * 2^32 + frac) * 2^-32
>      = host_tsc * mult + (tsc_h * 2^32 + tsc_l) * frac * 2^-32
>      = host_tsc * mult + tsc_h * frac + ((tsc_l * frac) >> 32)

Ok, now I see. Please include this in patch comments.

-boris


* Re: [PATCH 00/13] Add VMX TSC scaling support
  2015-11-24 13:05     ` Haozhong Zhang
  2015-11-24 14:19       ` Boris Ostrovsky
@ 2015-11-24 14:25       ` Haozhong Zhang
  1 sibling, 0 replies; 117+ messages in thread
From: Haozhong Zhang @ 2015-11-24 14:25 UTC (permalink / raw)
  To: Boris Ostrovsky, Jan Beulich, Aravind Gopalakrishnan,
	Ian Jackson, Stefano Stabellini, Ian Campbell, Wei Liu,
	Keir Fraser, Andrew Cooper, Suravee Suthikulpanit, Jun Nakajima,
	Kevin Tian, George Dunlap, xen-devel

On 11/24/15 21:05, Haozhong Zhang wrote:
[...]
> > >
> > >   This bug can be fixed either later by patch 5 which introduces a
> > >   common function hvm_scale_tsc() to scale TSC, or by replacing above
> > >   underlined code with a simplified and inlined version of
> > >   hvm_scale_tsc() as below:
> > >       uint64_t mult, frac;
> > >       mult    = ratio >> 32;
> > >       frac    = ratio & ((1ULL << 32) - 1);
> > >       offset  = host_tsc * mult;                               
> > >       offset += (host_tsc >> 32) * frac;                       
> > >       offset += ((host_tsc & ((1ULL << 32) - 1)) * frac) >> 32; 
> > 
> > I am not sure I understand the last line (or maybe 2 lines)
> >
> 
> Just simple math with carefulness to avoid 64-bit integer overflow:
> 
> suppose the most significant 32 bits of host_tsc and ratio are tsc_h
> and mult, and the least significant 32 bits of them are tsc_l and
> frac, then
>     host_tsc * ratio * 2^-32
>     = host_tsc * (mult * 2^32 + frac) * 2^-32
>     = host_tsc * mult + (tsc_h * 2^32 + tsc_l) * frac * 2^-32
>     = host_tsc * mult + tsc_h * frac + ((tsc_l * frac) >> 32)
>       
> All multiplications in the last line are between 32-bit integers, so none
> of them could overflow 64-bit integers.

Sorry, it should be "All but the first multiplication host_tsc * mult
...". In practice, it should be very rare for the first term to
overflow (considering mult is usually less than 10).
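
If the rare overflow of the first term needs to be detected rather than
ignored, one possible approach is a checked variant of the split form. The
sketch below is an assumption made for illustration, not code from the
patchset; scale_tsc_checked() is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: compute (host_tsc * ratio) >> 32 using the split
 * form. Returns false, leaving *out untouched, if the result does not fit
 * in 64 bits.
 */
static bool scale_tsc_checked(uint64_t host_tsc, uint64_t ratio, uint64_t *out)
{
    uint64_t mult  = ratio >> 32;
    uint64_t frac  = ratio & 0xffffffffULL;
    uint64_t tsc_h = host_tsc >> 32;
    uint64_t tsc_l = host_tsc & 0xffffffffULL;
    uint64_t a, b, c, sum;

    /* host_tsc * mult is the only product that can exceed 64 bits. */
    if (mult != 0 && host_tsc > UINT64_MAX / mult)
        return false;

    a = host_tsc * mult;
    b = tsc_h * frac;          /* product of two 32-bit values */
    c = (tsc_l * frac) >> 32;  /* product of two 32-bit values, then shift */

    sum = a + b;
    if (sum < a)               /* the addition wrapped around */
        return false;
    if (sum + c < sum)         /* the addition wrapped around */
        return false;

    *out = sum + c;
    return true;
}
```

With a ratio whose integer part is 2 and host_tsc = 1ULL << 63, the helper
reports an overflow; for the small values used earlier in this thread it
returns the same results as the unchecked form.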

Haozhong

> 
> Consider a simple example that host_tsc = 1ULL << 33 and ratio = 0xffffffff.
> Overflow happens in the multiplication of the second term of your formula below,
> and all overflowed bits are lost in the next right shift.
> 
> Haozhong
> 
> > If by 'offset' here you are trying to calculate the scaled version of host
> > TSC then I think it would be
> > 
> > (host_tsc * (ratio >> 32)) + ( (host_tsc * (ratio & 0xffffffff)) >> 32 )
> > 
> > (sanity check: assuming host_tsc is 8 and the ratio is 1.5 (i.e.
> > 0x180000000) we get 12)
> > 
> > 
> > -boris


Thread overview: 117+ messages
2015-09-28  7:13 [PATCH 00/13] Add VMX TSC scaling support Haozhong Zhang
2015-09-28  7:13 ` [PATCH 01/13] x86/time.c: Use system time to calculate elapsed_nsec in tsc_get_info() Haozhong Zhang
2015-10-09  6:51   ` Jan Beulich
2015-10-09 13:41     ` Boris Ostrovsky
2015-10-09 14:00       ` Haozhong Zhang
2015-10-09 15:11         ` Jan Beulich
2015-10-09 16:09           ` Boris Ostrovsky
2015-10-09 16:19             ` Jan Beulich
2015-10-09 16:31               ` Boris Ostrovsky
2015-10-09 16:51                 ` Haozhong Zhang
2015-10-09 18:59                   ` Boris Ostrovsky
2015-10-09 14:39       ` Jan Beulich
2015-10-09 15:37         ` Boris Ostrovsky
2015-10-09 16:39           ` Haozhong Zhang
2015-10-09 16:44             ` Boris Ostrovsky
2015-10-09 14:35     ` Haozhong Zhang
2015-10-09 14:43       ` Jan Beulich
2015-10-09 15:56         ` Boris Ostrovsky
2015-10-14  2:45     ` Haozhong Zhang
2015-10-14  9:40       ` Jan Beulich
2015-10-14 10:00         ` Haozhong Zhang
2015-10-14 10:20           ` Jan Beulich
2015-10-14 10:24             ` Haozhong Zhang
2015-09-28  7:13 ` [PATCH 02/13] x86/time.c: Get the correct guest TSC rate " Haozhong Zhang
2015-09-28  7:13 ` [PATCH 03/13] x86/hvm: Collect information of TSC scaling ratio Haozhong Zhang
2015-10-22 12:53   ` Jan Beulich
2015-10-22 14:40     ` Haozhong Zhang
2015-10-22 14:51       ` Jan Beulich
2015-10-22 15:57         ` Haozhong Zhang
2015-09-28  7:13 ` [PATCH 04/13] x86/hvm: Setup " Haozhong Zhang
2015-10-22 13:13   ` Jan Beulich
2015-10-22 15:55     ` Haozhong Zhang
2015-10-22 16:05       ` Jan Beulich
2015-10-22 16:39         ` Haozhong Zhang
2015-10-23  7:44     ` Haozhong Zhang
2015-10-23  7:59       ` Jan Beulich
2015-10-23  8:18         ` Haozhong Zhang
2015-10-23  8:31           ` Jan Beulich
2015-10-23  8:40             ` Haozhong Zhang
2015-10-23  9:18               ` Jan Beulich
2015-09-28  7:13 ` [PATCH 05/13] x86/hvm: Replace architecture TSC scaling by a common function Haozhong Zhang
2015-10-22 13:52   ` Jan Beulich
2015-10-23  0:49     ` Haozhong Zhang
2015-09-28  7:13 ` [PATCH 06/13] x86/hvm: Scale host TSC when setting/getting guest TSC Haozhong Zhang
2015-10-22 14:17   ` Jan Beulich
2015-10-22 15:44     ` Boris Ostrovsky
2015-10-22 16:23       ` Haozhong Zhang
2015-10-27 20:16       ` Aravind Gopalakrishnan
2015-10-28  1:51         ` Haozhong Zhang
2015-11-09  7:43         ` Haozhong Zhang
2015-11-12 13:50           ` George Dunlap
2015-10-22 16:03     ` Haozhong Zhang
2015-10-27  1:54     ` Haozhong Zhang
2015-10-27  8:15       ` Jan Beulich
2015-10-27  8:25         ` Haozhong Zhang
2015-10-27  8:44     ` Haozhong Zhang
2015-10-27 13:10       ` Boris Ostrovsky
2015-10-27 13:55         ` Boris Ostrovsky
2015-10-27 16:13           ` haozhong.zhang
2015-10-27 16:13         ` haozhong.zhang
2015-09-28  7:13 ` [PATCH 07/13] x86/hvm: Move saving/loading vcpu's TSC to common code Haozhong Zhang
2015-10-22 14:54   ` Jan Beulich
2015-09-28  7:13 ` [PATCH 08/13] x86/hvm: Detect TSC scaling through hvm_funcs in tsc_set_info() Haozhong Zhang
2015-10-22 15:01   ` Jan Beulich
2015-09-28  7:13 ` [PATCH 09/13] x86/time.c: Scale host TSC in pvclock properly Haozhong Zhang
2015-09-28 16:36   ` Boris Ostrovsky
2015-09-29  0:19     ` Haozhong Zhang
2015-10-22 15:50   ` Boris Ostrovsky
2015-10-22 16:44     ` Haozhong Zhang
2015-10-22 19:15       ` Boris Ostrovsky
2015-09-28  7:13 ` [PATCH 10/13] vmx: Detect and initialize VMX RDTSC(P) scaling Haozhong Zhang
2015-10-27 13:19   ` Jan Beulich
2015-10-27 16:17     ` Haozhong Zhang
2015-09-28  7:13 ` [PATCH 11/13] vmx: Use scaled host TSC to calculate TSC offset Haozhong Zhang
2015-10-22 15:55   ` Boris Ostrovsky
2015-10-22 17:12     ` Haozhong Zhang
2015-10-22 19:19       ` Boris Ostrovsky
2015-10-23  0:52         ` Haozhong Zhang
2015-10-27 13:29   ` Jan Beulich
2015-10-27 16:21     ` Haozhong Zhang
2015-09-28  7:13 ` [PATCH 12/13] vmx: Add a call-back to apply TSC scaling ratio to hardware Haozhong Zhang
2015-09-28 16:02   ` Boris Ostrovsky
2015-09-29  1:07     ` Haozhong Zhang
2015-09-29  9:33       ` Andrew Cooper
2015-09-29 10:02         ` Haozhong Zhang
2015-09-29 10:25           ` Andrew Cooper
2015-09-29 13:59             ` Haozhong Zhang
2015-10-27 13:33   ` Jan Beulich
2015-10-28  2:41     ` Haozhong Zhang
2015-09-28  7:13 ` [PATCH 13/13] tools/libxl: Add 'vtsc_khz' option to set guest TSC rate Haozhong Zhang
2015-09-28 11:47   ` Julien Grall
2015-09-28 12:11     ` Haozhong Zhang
2015-09-28 14:19   ` Wei Liu
2015-09-29  0:40     ` Haozhong Zhang
2015-09-29  9:20       ` Wei Liu
2015-09-29  9:50         ` Haozhong Zhang
2015-09-29 10:24           ` Julien Grall
2015-09-29 10:07       ` Ian Campbell
2015-09-29 10:33         ` Wei Liu
2015-09-29 12:57         ` Haozhong Zhang
2015-09-29 10:04   ` Ian Campbell
2015-09-29 10:13     ` Haozhong Zhang
2015-09-29 10:24       ` Andrew Cooper
2015-09-29 10:28         ` Ian Campbell
2015-09-29 10:31           ` Andrew Cooper
2015-09-29 13:53           ` Haozhong Zhang
2015-09-29 13:56             ` Andrew Cooper
2015-09-29 14:01               ` Haozhong Zhang
2015-09-29 14:37                 ` Ian Campbell
2015-09-29 15:16                   ` Haozhong Zhang
2015-09-28 10:51 ` [PATCH 00/13] Add VMX TSC scaling support Andrew Cooper
2015-09-28 13:48   ` Boris Ostrovsky
2015-11-22 17:54 ` Haozhong Zhang
2015-11-23 15:37   ` Boris Ostrovsky
2015-11-24 13:05     ` Haozhong Zhang
2015-11-24 14:19       ` Boris Ostrovsky
2015-11-24 14:25       ` Haozhong Zhang
