* [PATCH v5 0/3] Time-related fixes for migration
@ 2014-04-16 22:59 Boris Ostrovsky
  2014-04-16 22:59 ` [PATCH v5 1/3] x86: Use native RDTSC(P) execution when guest and host frequencies are the same Boris Ostrovsky
From: Boris Ostrovsky @ 2014-04-16 22:59 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, suravee.suthikulpanit, eddie.dong, jbeulich,
	jun.nakajima, boris.ostrovsky


Version 5:
 * Fixed checks in tsc_set_info to make sure that PV works.

   The fix also covers the PVH case, although after a quick look at PVH
   support wrt time/TSC it's pretty clear that more work needs to be done.
   For example, PVH doesn't appear to touch TSC offsets in the VMCB; it uses
   the host's TSC value as is. It also doesn't initialise guest time (which is
   possibly why TSC emulation doesn't work). All these issues will have to
   be addressed separately.

 * Syntax cleanup

Version 4:
 While testing on AMD processors I realized that TSC scaling has a number of
 problems, some of whose fixes need the same interfaces that I added for the
 migration bugs. I therefore decided to fold the TSC scaling fixes into this series.

 Three patches:
 1 Revert the change introduced by 4aab59a3 where we stop intercepting RDTSC(P)
   if TSC scaling is supported. This was done in the wrong place, resulting in the
   guest running without intercepts but with vtsc on. This patch also allows us to
   continue running in native TSC mode after migration if the frequency stays the
   same (which is also what docs/misc/tscmode.txt implies).
 2 The TSC scaling enablement logic was inverted: we should be using it when
   running in native TSC mode, which is not what happens now. We also need to
   do some work to synchronize TSCs during initial boot when TSC scaling is on
   (a rough sketch of the offset arithmetic involved follows this list).
 3 The remainder of the original patch to cover TSC synchronization after
   migration.

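 For reference, here is a rough sketch of the offset arithmetic involved
 (my own illustration, not the hypervisor code; the names are made up and
 AMD's 8.32 fixed-point TSC_RATIO format is assumed):

#include <stdint.h>

/* With TSC scaling the guest observes (host_tsc * ratio) + offset, where
 * ratio is 8.32 fixed point (1.0 == 1ull << 32). */
static uint64_t scale_tsc(uint64_t host_tsc, uint64_t ratio_8_32)
{
    /* 64x64->128 multiply, then drop the 32 fractional bits. */
    return (uint64_t)(((unsigned __int128)host_tsc * ratio_8_32) >> 32);
}

/* Offset that makes the scaled host TSC read back as guest_tsc. Computing it
 * from one shared host_tsc sample (e.g. sync_tsc) keeps all VCPUs in step. */
static uint64_t tsc_offset_for(uint64_t host_tsc, uint64_t guest_tsc,
                               uint64_t ratio_8_32)
{
    return guest_tsc - scale_tsc(host_tsc, ratio_8_32);
}
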
 During the v3 review Jan requested that I drop the at_tsc argument to
 hvm_set_guest_tsc_fixed() since it should always be possible to safely examine
 chkpt_tsc (now called sync_tsc). This is no longer the case because we now also
 use this variable during boot, so I left the interface as it was in v3 (and
 dropped arch_hvm_save_done(), where sync_tsc was cleared, since it doesn't add
 anything).

Version 3:
* Only the second patch is submitted (the first one has been applied)
* More thorough AMD support (work around rdtscll() in svm_set_tsc_offset())

Version 2:
* Avoid setting parameters from config file twice
* Use out-of-line definition of get_s_time()
* Update rdtscll macro definition to avoid namespace clashes
* Syntax cleanup

Two patches to address issues we discovered during migration testing.

* The first patch loads HVM parameters from the configuration file during restore.
To fix the actual problem that we saw, only timer_mode needed to be restored, but
it seems to me that the other parameters are needed as well since at least some of
them are used "at runtime".

The bug can be demonstrated with a Solaris guest but I haven't been able to
trigger it with Linux, possibly because Solaris's gethrtime() routine is a
fast trap to the kernel's hrtimer, which does some tricks to account for missed
ticks during interrupts.

* The second patch keeps TSCs synchronized across VCPUs after save/restore.
Currently TSC values diverge after migration because during both save and restore
we calculate them separately for each VCPU and base each calculation on a
newly read host TSC.

The problem can easily be demonstrated with this program in a 2-VCPU guest
(I am assuming an invariant TSC here, so, for example, tsc_mode="always_emulate" (*)):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <x86intrin.h>

/* Stand-in for the kernel's __native_read_tsc(): read the TSC directly. */
static inline unsigned long long __native_read_tsc(void)
{
  return __rdtsc();
}

int
main(int argc, char* argv[])
{
  unsigned long long h = 0LL;
  int proc = 0;
  cpu_set_t set;

  /* Bounce between VCPU 0 and VCPU 1; report any backwards step in the TSC. */
  for(;;) {
    unsigned long long n = __native_read_tsc();
    if(h && n < h)
        printf("prev 0x%llx cur 0x%llx\n", h, n);
    CPU_ZERO(&set);
    proc = (proc + 1) & 1;
    CPU_SET(proc, &set);
    if (sched_setaffinity(0, sizeof(cpu_set_t), &set)) {
        perror("sched_setaffinity");
        exit(1);
    }

    h = n;
  }
}

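(To try the program above - assuming a hypothetical file name tsc-check.c - build
it inside the 2-VCPU guest with something like "gcc -O2 -o tsc-check tsc-check.c",
leave it running across a save/restore, and any "prev ... cur ..." line indicates
the TSC stepping backwards.)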

(*) This brings up another observation: in the default tsc_mode we start off
with vtsc=0 and thus clear the TSC_Invariant bit in the guest's CPUID. After
migration vtsc is 1 and the TSC_Invariant bit is set, so the guest may observe
different CPUID values. Technically this reflects the fact that the TSC became
"safe", but I think it may potentially be problematic for some guests. A minimal
way for a guest to check the bit is sketched below.

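For reference, a guest-side check of that bit could look like this sketch
(my own illustration, not part of the series, using GCC's cpuid.h; the flag
is bit 8 of EDX in CPUID leaf 0x80000007):

#include <stdio.h>
#include <cpuid.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* Leaf 0x80000007 (Advanced Power Management); EDX bit 8 is the
     * invariant-TSC flag that toggling vtsc changes for the guest. */
    if ( !__get_cpuid(0x80000007, &eax, &ebx, &ecx, &edx) )
        return 1;

    printf("Invariant TSC: %s\n", (edx & (1u << 8)) ? "yes" : "no");
    return 0;
}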

Boris Ostrovsky (3):
  x86: Use native RDTSC(P) execution when guest and host frequencies
    are the same
  x86/svm: Enable TSC scaling
  x86/HVM: Use fixed TSC value when saving or restoring domain

 xen/arch/x86/hvm/hvm.c           |   23 +++++++++++++-----
 xen/arch/x86/hvm/save.c          |    6 +++++
 xen/arch/x86/hvm/svm/svm.c       |   23 +++++++++++-------
 xen/arch/x86/hvm/vmx/vmx.c       |    7 +++--
 xen/arch/x86/hvm/vmx/vvmx.c      |    4 +-
 xen/arch/x86/hvm/vpt.c           |   16 ++++++++-----
 xen/arch/x86/time.c              |   46 ++++++++++++++++++++++++++++++++-----
 xen/include/asm-x86/hvm/domain.h |    6 +++++
 xen/include/asm-x86/hvm/hvm.h    |   11 +++++---
 xen/include/asm-x86/msr.h        |    6 ++--
 xen/include/xen/time.h           |    1 +
 11 files changed, 108 insertions(+), 41 deletions(-)


* [PATCH v5 1/3] x86: Use native RDTSC(P) execution when guest and host frequencies are the same
  2014-04-16 22:59 [PATCH v5 0/3] Time-related fixes for migration Boris Ostrovsky
@ 2014-04-16 22:59 ` Boris Ostrovsky
  2014-04-16 22:59 ` [PATCH v5 2/3] x86/svm: Enable TSC scaling Boris Ostrovsky
From: Boris Ostrovsky @ 2014-04-16 22:59 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, suravee.suthikulpanit, eddie.dong, jbeulich,
	jun.nakajima, boris.ostrovsky

We should be able to continue using native RDTSC(P) execution on
HVM/PVH guests after migration if host and guest frequencies are
equal (this includes the case when the frequencies are made equal
by the TSC scaling feature).

This also allows us to revert the main part of commit 4aab59a3 (svm: Do not
intercept RDTSC(P) when TSC scaling is supported by hardware), which
was wrong: while RDTSC intercepts were disabled, the domain's vtsc could
still be set, leading to an inconsistent view of the guest's TSC.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 xen/arch/x86/hvm/svm/svm.c |    2 +-
 xen/arch/x86/time.c        |   15 +++++++++++----
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 4fd5376..813e775 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -728,7 +728,7 @@ static void svm_set_rdtsc_exiting(struct vcpu *v, bool_t enable)
     general1_intercepts &= ~GENERAL1_INTERCEPT_RDTSC;
     general2_intercepts &= ~GENERAL2_INTERCEPT_RDTSCP;
 
-    if ( enable && !cpu_has_tsc_ratio )
+    if ( enable )
     {
         general1_intercepts |= GENERAL1_INTERCEPT_RDTSC;
         general2_intercepts |= GENERAL2_INTERCEPT_RDTSCP;
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 07bceda..555f7c8 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -37,6 +37,7 @@
 #include <asm/hpet.h>
 #include <io_ports.h>
 #include <asm/setup.h> /* for early_time_init */
+#include <asm/hvm/svm/svm.h> /* for cpu_has_tsc_ratio */
 #include <public/arch-x86/cpuid.h>
 
 /* opt_clocksource: Force clocksource to one of: pit, hpet, acpi. */
@@ -1889,10 +1890,16 @@ void tsc_set_info(struct domain *d,
         d->arch.vtsc_offset = get_s_time() - elapsed_nsec;
         d->arch.tsc_khz = gtsc_khz ? gtsc_khz : cpu_khz;
         set_time_scale(&d->arch.vtsc_to_ns, d->arch.tsc_khz * 1000 );
-        /* use native TSC if initial host has safe TSC, has not migrated
-         * yet and tsc_khz == cpu_khz */
-        if ( host_tsc_is_safe() && incarnation == 0 &&
-                d->arch.tsc_khz == cpu_khz )
+        /*
+         * Use native TSC if the host has safe TSC and:
+         *  HVM/PVH: host and guest frequencies are the same (either
+         *           "naturally" or via TSC scaling)
+         *  PV: guest has not migrated yet (and thus arch.tsc_khz == cpu_khz)
+         */
+        if ( host_tsc_is_safe() &&
+             ((has_hvm_container_domain(d) &&
+               (d->arch.tsc_khz == cpu_khz || cpu_has_tsc_ratio)) ||
+              incarnation == 0) )
             d->arch.vtsc = 0;
         else 
             d->arch.ns_to_vtsc = scale_reciprocal(d->arch.vtsc_to_ns);
-- 
1.7.1


* [PATCH v5 2/3] x86/svm: Enable TSC scaling
  2014-04-16 22:59 [PATCH v5 0/3] Time-related fixes for migration Boris Ostrovsky
  2014-04-16 22:59 ` [PATCH v5 1/3] x86: Use native RDTSC(P) execution when guest and host frequencies are the same Boris Ostrovsky
@ 2014-04-16 22:59 ` Boris Ostrovsky
  2014-04-16 22:59 ` [PATCH v5 3/3] x86/HVM: Use fixed TSC value when saving or restoring domain Boris Ostrovsky
  2014-04-17  7:46 ` [PATCH v5 0/3] Time-related fixes for migration Jan Beulich
From: Boris Ostrovsky @ 2014-04-16 22:59 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, suravee.suthikulpanit, eddie.dong, jbeulich,
	jun.nakajima, boris.ostrovsky

The TSC ratio enabling logic is inverted: we want to use it when we
are running in native TSC mode, i.e. when d->arch.vtsc is zero.

Also, since svm_set_tsc_offset()'s calculations now depend on vtsc's
value, we need to call hvm_funcs.set_tsc_offset() after vtsc changes
in tsc_set_info().

In addition, with the TSC ratio enabled, svm_set_tsc_offset() will
need to do an rdtsc. With that we may end up with the TSCs on the guest's
processors out of sync. d->arch.hvm_domain.sync_tsc, which is set
by the boot processor, can now be used by APs as the reference TSC
value instead of the host's current TSC.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/hvm/hvm.c           |   15 ++++++++++-----
 xen/arch/x86/hvm/svm/svm.c       |   16 ++++++++++------
 xen/arch/x86/hvm/vmx/vmx.c       |    2 +-
 xen/arch/x86/hvm/vmx/vvmx.c      |    4 ++--
 xen/arch/x86/hvm/vpt.c           |   16 ++++++++++------
 xen/arch/x86/time.c              |   29 +++++++++++++++++++++++++++--
 xen/include/asm-x86/hvm/domain.h |    6 ++++++
 xen/include/asm-x86/hvm/hvm.h    |    8 +++++---
 xen/include/asm-x86/msr.h        |    6 +++---
 xen/include/xen/time.h           |    1 +
 10 files changed, 75 insertions(+), 28 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index ae7a645..15f4b25 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -278,27 +278,31 @@ void hvm_set_guest_tsc(struct vcpu *v, u64 guest_tsc)
                           - v->arch.hvm_vcpu.cache_tsc_offset;
     v->arch.hvm_vcpu.cache_tsc_offset = delta_tsc;
 
-    hvm_funcs.set_tsc_offset(v, v->arch.hvm_vcpu.cache_tsc_offset);
+    hvm_funcs.set_tsc_offset(v, v->arch.hvm_vcpu.cache_tsc_offset, 0);
 }
 
 void hvm_set_guest_tsc_adjust(struct vcpu *v, u64 tsc_adjust)
 {
     v->arch.hvm_vcpu.cache_tsc_offset += tsc_adjust
                             - v->arch.hvm_vcpu.msr_tsc_adjust;
-    hvm_funcs.set_tsc_offset(v, v->arch.hvm_vcpu.cache_tsc_offset);
+    hvm_funcs.set_tsc_offset(v, v->arch.hvm_vcpu.cache_tsc_offset, 0);
     v->arch.hvm_vcpu.msr_tsc_adjust = tsc_adjust;
 }
 
-u64 hvm_get_guest_tsc(struct vcpu *v)
+u64 hvm_get_guest_tsc_fixed(struct vcpu *v, uint64_t at_tsc)
 {
     uint64_t tsc;
 
     if ( v->domain->arch.vtsc )
     {
-        tsc = hvm_get_guest_time(v);
+        tsc = hvm_get_guest_time_fixed(v, at_tsc);
         tsc = gtime_to_gtsc(v->domain, tsc);
         v->domain->arch.vtsc_kerncount++;
     }
+    else if ( at_tsc )
+    {
+        tsc = at_tsc;
+    }
     else
     {
         rdtscll(tsc);
@@ -3840,7 +3844,8 @@ void hvm_vcpu_reset_state(struct vcpu *v, uint16_t cs, uint16_t ip)
     /* Sync AP's TSC with BSP's. */
     v->arch.hvm_vcpu.cache_tsc_offset =
         v->domain->vcpu[0]->arch.hvm_vcpu.cache_tsc_offset;
-    hvm_funcs.set_tsc_offset(v, v->arch.hvm_vcpu.cache_tsc_offset);
+    hvm_funcs.set_tsc_offset(v, v->arch.hvm_vcpu.cache_tsc_offset,
+                             d->arch.hvm_domain.sync_tsc);
 
     v->arch.hvm_vcpu.msr_tsc_adjust = 0;
 
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 813e775..4cc271e 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -680,7 +680,7 @@ static uint64_t svm_get_tsc_offset(uint64_t host_tsc, uint64_t guest_tsc,
     return guest_tsc - offset;
 }
 
-static void svm_set_tsc_offset(struct vcpu *v, u64 offset)
+static void svm_set_tsc_offset(struct vcpu *v, u64 offset, u64 at_tsc)
 {
     struct vmcb_struct *vmcb = v->arch.hvm_svm.vmcb;
     struct vmcb_struct *n1vmcb, *n2vmcb;
@@ -688,11 +688,15 @@ static void svm_set_tsc_offset(struct vcpu *v, u64 offset)
     struct domain *d = v->domain;
     uint64_t host_tsc, guest_tsc;
 
-    guest_tsc = hvm_get_guest_tsc(v);
+    guest_tsc = hvm_get_guest_tsc_fixed(v, at_tsc);
 
     /* Re-adjust the offset value when TSC_RATIO is available */
-    if ( cpu_has_tsc_ratio && d->arch.vtsc ) {
-        rdtscll(host_tsc);
+    if ( cpu_has_tsc_ratio && !d->arch.vtsc )
+    {
+        if ( at_tsc )
+            host_tsc = at_tsc;
+        else
+            rdtscll(host_tsc);
         offset = svm_get_tsc_offset(host_tsc, guest_tsc, vcpu_tsc_ratio(v));
     }
 
@@ -847,13 +851,13 @@ static int svm_update_lwp_cfg(struct vcpu *v, uint64_t msr_content)
 static inline void svm_tsc_ratio_save(struct vcpu *v)
 {
     /* Other vcpus might not have vtsc enabled. So disable TSC_RATIO here. */
-    if ( cpu_has_tsc_ratio && v->domain->arch.vtsc )
+    if ( cpu_has_tsc_ratio && !v->domain->arch.vtsc )
         wrmsrl(MSR_AMD64_TSC_RATIO, DEFAULT_TSC_RATIO);
 }
 
 static inline void svm_tsc_ratio_load(struct vcpu *v)
 {
-    if ( cpu_has_tsc_ratio && v->domain->arch.vtsc ) 
+    if ( cpu_has_tsc_ratio && !v->domain->arch.vtsc ) 
         wrmsrl(MSR_AMD64_TSC_RATIO, vcpu_tsc_ratio(v));
 }
 
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 180cf6c..1c9fa48 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1052,7 +1052,7 @@ static void vmx_handle_cd(struct vcpu *v, unsigned long value)
     }
 }
 
-static void vmx_set_tsc_offset(struct vcpu *v, u64 offset)
+static void vmx_set_tsc_offset(struct vcpu *v, u64 offset, u64 at_tsc)
 {
     vmx_vmcs_enter(v);
 
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 40167d6..e263376 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1058,7 +1058,7 @@ static void load_shadow_guest_state(struct vcpu *v)
     if ( control & VM_ENTRY_LOAD_PERF_GLOBAL_CTRL )
         hvm_msr_write_intercept(MSR_CORE_PERF_GLOBAL_CTRL, __get_vvmcs(vvmcs, GUEST_PERF_GLOBAL_CTRL));
 
-    hvm_funcs.set_tsc_offset(v, v->arch.hvm_vcpu.cache_tsc_offset);
+    hvm_funcs.set_tsc_offset(v, v->arch.hvm_vcpu.cache_tsc_offset, 0);
 
     vvmcs_to_shadow_bulk(v, ARRAY_SIZE(vmentry_fields), vmentry_fields);
 
@@ -1259,7 +1259,7 @@ static void load_vvmcs_host_state(struct vcpu *v)
     if ( control & VM_EXIT_LOAD_PERF_GLOBAL_CTRL )
         hvm_msr_write_intercept(MSR_CORE_PERF_GLOBAL_CTRL, __get_vvmcs(vvmcs, HOST_PERF_GLOBAL_CTRL));
 
-    hvm_funcs.set_tsc_offset(v, v->arch.hvm_vcpu.cache_tsc_offset);
+    hvm_funcs.set_tsc_offset(v, v->arch.hvm_vcpu.cache_tsc_offset, 0);
 
     __set_vvmcs(vvmcs, VM_ENTRY_INTR_INFO, 0);
 }
diff --git a/xen/arch/x86/hvm/vpt.c b/xen/arch/x86/hvm/vpt.c
index f7af688..38541cf 100644
--- a/xen/arch/x86/hvm/vpt.c
+++ b/xen/arch/x86/hvm/vpt.c
@@ -36,7 +36,7 @@ void hvm_init_guest_time(struct domain *d)
     pl->last_guest_time = 0;
 }
 
-u64 hvm_get_guest_time(struct vcpu *v)
+u64 hvm_get_guest_time_fixed(struct vcpu *v, u64 at_tsc)
 {
     struct pl_time *pl = &v->domain->arch.hvm_domain.pl_time;
     u64 now;
@@ -45,11 +45,15 @@ u64 hvm_get_guest_time(struct vcpu *v)
     ASSERT(is_hvm_vcpu(v));
 
     spin_lock(&pl->pl_time_lock);
-    now = get_s_time() + pl->stime_offset;
-    if ( (int64_t)(now - pl->last_guest_time) > 0 )
-        pl->last_guest_time = now;
-    else
-        now = ++pl->last_guest_time;
+    now = get_s_time_fixed(at_tsc) + pl->stime_offset;
+
+    if ( !at_tsc )
+    {
+        if ( (int64_t)(now - pl->last_guest_time) > 0 )
+            pl->last_guest_time = now;
+        else
+            now = ++pl->last_guest_time;
+    }
     spin_unlock(&pl->pl_time_lock);
 
     return now + v->arch.hvm_vcpu.stime_offset;
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 555f7c8..93bb5b6 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -716,19 +716,27 @@ static unsigned long get_cmos_time(void)
  * System Time
  ***************************************************************************/
 
-s_time_t get_s_time(void)
+s_time_t get_s_time_fixed(u64 at_tsc)
 {
     struct cpu_time *t = &this_cpu(cpu_time);
     u64 tsc, delta;
     s_time_t now;
 
-    rdtscll(tsc);
+    if ( at_tsc )
+        tsc = at_tsc;
+    else
+        rdtscll(tsc);
     delta = tsc - t->local_tsc_stamp;
     now = t->stime_local_stamp + scale_delta(delta, &t->tsc_scale);
 
     return now;
 }
 
+s_time_t get_s_time()
+{
+    return get_s_time_fixed(0);
+}
+
 uint64_t tsc_ticks2ns(uint64_t ticks)
 {
     struct cpu_time *t = &this_cpu(cpu_time);
@@ -1924,7 +1932,24 @@ void tsc_set_info(struct domain *d,
     }
     d->arch.incarnation = incarnation + 1;
     if ( is_hvm_domain(d) )
+    {
         hvm_set_rdtsc_exiting(d, d->arch.vtsc);
+        if ( d->vcpu && d->vcpu[0] && incarnation == 0 )
+        {
+            /*
+             * set_tsc_offset() is called from hvm_vcpu_initialise() before
+             * tsc_set_info(). New vtsc mode may require recomputing TSC
+             * offset.
+             * We only need to do this for BSP during initial boot. APs will
+             * call set_tsc_offset() later from hvm_vcpu_reset_state() and they
+             * will sync their TSC to BSP's sync_tsc.
+             */
+            rdtscll(d->arch.hvm_domain.sync_tsc);
+            hvm_funcs.set_tsc_offset(d->vcpu[0],
+                                     d->vcpu[0]->arch.hvm_vcpu.cache_tsc_offset,
+                                     d->arch.hvm_domain.sync_tsc);
+        }
+    }
 }
 
 /* vtsc may incur measurable performance degradation, diagnose with this */
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index b1e3187..460dd94 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -90,6 +90,12 @@ struct hvm_domain {
     bool_t                 qemu_mapcache_invalidate;
     bool_t                 is_s3_suspended;
 
+    /*
+     * TSC value that VCPUs use to calculate their tsc_offset value.
+     * Used during initialization and save/restore.
+     */
+    uint64_t sync_tsc;
+
     union {
         struct vmx_domain vmx;
         struct svm_domain svm;
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index dcc3483..1fb3b99 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -137,7 +137,7 @@ struct hvm_function_table {
     int  (*get_guest_pat)(struct vcpu *v, u64 *);
     int  (*set_guest_pat)(struct vcpu *v, u64);
 
-    void (*set_tsc_offset)(struct vcpu *v, u64 offset);
+    void (*set_tsc_offset)(struct vcpu *v, u64 offset, u64 at_tsc);
 
     void (*inject_trap)(struct hvm_trap *trap);
 
@@ -233,11 +233,13 @@ void hvm_get_guest_pat(struct vcpu *v, u64 *guest_pat);
 int hvm_set_guest_pat(struct vcpu *v, u64 guest_pat);
 
 void hvm_set_guest_tsc(struct vcpu *v, u64 guest_tsc);
-u64 hvm_get_guest_tsc(struct vcpu *v);
+u64 hvm_get_guest_tsc_fixed(struct vcpu *v, u64 at_tsc);
+#define hvm_get_guest_tsc(v) hvm_get_guest_tsc_fixed(v, 0)
 
 void hvm_init_guest_time(struct domain *d);
 void hvm_set_guest_time(struct vcpu *v, u64 guest_time);
-u64 hvm_get_guest_time(struct vcpu *v);
+u64 hvm_get_guest_time_fixed(struct vcpu *v, u64 at_tsc);
+#define hvm_get_guest_time(v) hvm_get_guest_time_fixed(v, 0)
 
 int vmsi_deliver(
     struct domain *d, int vector,
diff --git a/xen/include/asm-x86/msr.h b/xen/include/asm-x86/msr.h
index 61f579a..52cae4b 100644
--- a/xen/include/asm-x86/msr.h
+++ b/xen/include/asm-x86/msr.h
@@ -78,9 +78,9 @@ static inline int wrmsr_safe(unsigned int msr, uint64_t val)
      __asm__ __volatile__("rdtsc" : "=a" (low) : : "edx")
 
 #define rdtscll(val) do { \
-     unsigned int a,d; \
-     asm volatile("rdtsc" : "=a" (a), "=d" (d)); \
-     (val) = ((unsigned long)a) | (((unsigned long)d)<<32); \
+     unsigned int _eax, _edx; \
+     asm volatile("rdtsc" : "=a" (_eax), "=d" (_edx)); \
+     (val) = ((unsigned long)_eax) | (((unsigned long)_edx)<<32); \
 } while(0)
 
 #define __write_tsc(val) wrmsrl(MSR_IA32_TSC, val)
diff --git a/xen/include/xen/time.h b/xen/include/xen/time.h
index 2703454..709501f 100644
--- a/xen/include/xen/time.h
+++ b/xen/include/xen/time.h
@@ -32,6 +32,7 @@ struct vcpu;
 typedef s64 s_time_t;
 #define PRI_stime PRId64
 
+s_time_t get_s_time_fixed(u64 at_tick);
 s_time_t get_s_time(void);
 unsigned long get_localtime(struct domain *d);
 uint64_t get_localtime_us(struct domain *d);
-- 
1.7.1


* [PATCH v5 3/3] x86/HVM: Use fixed TSC value when saving or restoring domain
  2014-04-16 22:59 [PATCH v5 0/3] Time-related fixes for migration Boris Ostrovsky
  2014-04-16 22:59 ` [PATCH v5 1/3] x86: Use native RDTSC(P) execution when guest and host frequencies are the same Boris Ostrovsky
  2014-04-16 22:59 ` [PATCH v5 2/3] x86/svm: Enable TSC scaling Boris Ostrovsky
@ 2014-04-16 22:59 ` Boris Ostrovsky
  2014-04-17  7:46 ` [PATCH v5 0/3] Time-related fixes for migration Jan Beulich
From: Boris Ostrovsky @ 2014-04-16 22:59 UTC (permalink / raw)
  To: xen-devel
  Cc: kevin.tian, suravee.suthikulpanit, eddie.dong, jbeulich,
	jun.nakajima, boris.ostrovsky

When a domain is saved, each VCPU's TSC value needs to be preserved. To get it we
use hvm_get_guest_tsc(). This routine (either itself or via get_s_time(), which
it may call) calculates the VCPU's TSC based on the host's current TSC value (by
doing an rdtscll()). Since this is performed for each VCPU separately, we end up
with unsynchronized TSCs.

Similarly, during a restore each VCPU is assigned its TSC based on the host's
current tick, causing the virtual TSCs to diverge further.

With this, we can easily get into a situation where a guest may see time going
backwards.

Instead of reading a new TSC value for each VCPU when saving/restoring it, we
should use the same value across all VCPUs.

Reported-by: Philippe Coquard <philippe.coquard@mpsa.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 xen/arch/x86/hvm/hvm.c        |   10 +++++++---
 xen/arch/x86/hvm/save.c       |    6 ++++++
 xen/arch/x86/hvm/svm/svm.c    |    5 +++--
 xen/arch/x86/hvm/vmx/vmx.c    |    5 +++--
 xen/include/asm-x86/hvm/hvm.h |    3 ++-
 5 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 15f4b25..44fbb69 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -258,16 +258,20 @@ int hvm_set_guest_pat(struct vcpu *v, u64 guest_pat)
     return 1;
 }
 
-void hvm_set_guest_tsc(struct vcpu *v, u64 guest_tsc)
+void hvm_set_guest_tsc_fixed(struct vcpu *v, u64 guest_tsc, u64 at_tsc)
 {
     uint64_t tsc;
     uint64_t delta_tsc;
 
     if ( v->domain->arch.vtsc )
     {
-        tsc = hvm_get_guest_time(v);
+        tsc = hvm_get_guest_time_fixed(v, at_tsc);
         tsc = gtime_to_gtsc(v->domain, tsc);
     }
+    else if ( at_tsc )
+    {
+        tsc = at_tsc;
+    }
     else
     {
         rdtscll(tsc);
@@ -278,7 +282,7 @@ void hvm_set_guest_tsc(struct vcpu *v, u64 guest_tsc)
                           - v->arch.hvm_vcpu.cache_tsc_offset;
     v->arch.hvm_vcpu.cache_tsc_offset = delta_tsc;
 
-    hvm_funcs.set_tsc_offset(v, v->arch.hvm_vcpu.cache_tsc_offset, 0);
+    hvm_funcs.set_tsc_offset(v, v->arch.hvm_vcpu.cache_tsc_offset, at_tsc);
 }
 
 void hvm_set_guest_tsc_adjust(struct vcpu *v, u64 tsc_adjust)
diff --git a/xen/arch/x86/hvm/save.c b/xen/arch/x86/hvm/save.c
index 066fdb2..6af19be 100644
--- a/xen/arch/x86/hvm/save.c
+++ b/xen/arch/x86/hvm/save.c
@@ -34,6 +34,9 @@ void arch_hvm_save(struct domain *d, struct hvm_save_header *hdr)
 
     /* Save guest's preferred TSC. */
     hdr->gtsc_khz = d->arch.tsc_khz;
+
+    /* Time when saving started */
+    rdtscll(d->arch.hvm_domain.sync_tsc);
 }
 
 int arch_hvm_load(struct domain *d, struct hvm_save_header *hdr)
@@ -67,6 +70,9 @@ int arch_hvm_load(struct domain *d, struct hvm_save_header *hdr)
     if ( d->arch.vtsc )
         hvm_set_rdtsc_exiting(d, 1);
 
+    /* Time when restore started  */
+    rdtscll(d->arch.hvm_domain.sync_tsc);
+
     /* VGA state is not saved/restored, so we nobble the cache. */
     d->arch.hvm_domain.stdvga.cache = 0;
 
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 4cc271e..3fe4b9c 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -318,7 +318,8 @@ static void svm_save_cpu_state(struct vcpu *v, struct hvm_hw_cpu *data)
     data->msr_efer         = v->arch.hvm_vcpu.guest_efer;
     data->msr_flags        = -1ULL;
 
-    data->tsc = hvm_get_guest_tsc(v);
+    data->tsc = hvm_get_guest_tsc_fixed(v,
+                                        v->domain->arch.hvm_domain.sync_tsc);
 }
 
 
@@ -334,7 +335,7 @@ static void svm_load_cpu_state(struct vcpu *v, struct hvm_hw_cpu *data)
     v->arch.hvm_vcpu.guest_efer = data->msr_efer;
     svm_update_guest_efer(v);
 
-    hvm_set_guest_tsc(v, data->tsc);
+    hvm_set_guest_tsc_fixed(v, data->tsc, v->domain->arch.hvm_domain.sync_tsc);
 }
 
 static void svm_save_vmcb_ctxt(struct vcpu *v, struct hvm_hw_cpu *ctxt)
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 1c9fa48..104e6e4 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -540,7 +540,8 @@ static void vmx_save_cpu_state(struct vcpu *v, struct hvm_hw_cpu *data)
     data->msr_star         = guest_state->msrs[VMX_INDEX_MSR_STAR];
     data->msr_syscall_mask = guest_state->msrs[VMX_INDEX_MSR_SYSCALL_MASK];
 
-    data->tsc = hvm_get_guest_tsc(v);
+    data->tsc = hvm_get_guest_tsc_fixed(v,
+                                        v->domain->arch.hvm_domain.sync_tsc);
 }
 
 static void vmx_load_cpu_state(struct vcpu *v, struct hvm_hw_cpu *data)
@@ -556,7 +557,7 @@ static void vmx_load_cpu_state(struct vcpu *v, struct hvm_hw_cpu *data)
     v->arch.hvm_vmx.cstar     = data->msr_cstar;
     v->arch.hvm_vmx.shadow_gs = data->shadow_gs;
 
-    hvm_set_guest_tsc(v, data->tsc);
+    hvm_set_guest_tsc_fixed(v, data->tsc, v->domain->arch.hvm_domain.sync_tsc);
 }
 
 
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 1fb3b99..31043b2 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -232,7 +232,8 @@ bool_t hvm_send_assist_req(struct vcpu *v);
 void hvm_get_guest_pat(struct vcpu *v, u64 *guest_pat);
 int hvm_set_guest_pat(struct vcpu *v, u64 guest_pat);
 
-void hvm_set_guest_tsc(struct vcpu *v, u64 guest_tsc);
+void hvm_set_guest_tsc_fixed(struct vcpu *v, u64 guest_tsc, u64 at_tsc);
+#define hvm_set_guest_tsc(v, t) hvm_set_guest_tsc_fixed(v, t, 0)
 u64 hvm_get_guest_tsc_fixed(struct vcpu *v, u64 at_tsc);
 #define hvm_get_guest_tsc(v) hvm_get_guest_tsc_fixed(v, 0)
 
-- 
1.7.1


* Re: [PATCH v5 0/3] Time-related fixes for migration
  2014-04-16 22:59 [PATCH v5 0/3] Time-related fixes for migration Boris Ostrovsky
  2014-04-16 22:59 ` [PATCH v5 3/3] x86/HVM: Use fixed TSC value when saving or restoring domain Boris Ostrovsky
@ 2014-04-17  7:46 ` Jan Beulich
  2014-04-17 13:30   ` Boris Ostrovsky
From: Jan Beulich @ 2014-04-17  7:46 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, eddie.dong, jun.nakajima, suravee.suthikulpanit, xen-devel

>>> On 17.04.14 at 00:59, <boris.ostrovsky@oracle.com> wrote:
> Version 5:
>  * Fixed checks in tsc_set_info to make sure that PV works.
> 
>    The fix also covers the PVH case, although after a quick look at PVH
>    support wrt time/TSC it's pretty clear that more work needs to be done.
>    For example, PVH doesn't appear to touch TSC offsets in the VMCB; it uses

PVH and VMCB (i.e. SVM) don't work together anyway.

Jan

>    the host's TSC value as is. It also doesn't initialise guest time (which is
>    possibly why TSC emulation doesn't work). All these issues will have to
>    be addressed separately.
> 
>  * Syntax cleanup


* Re: [PATCH v5 0/3] Time-related fixes for migration
  2014-04-17  7:46 ` [PATCH v5 0/3] Time-related fixes for migration Jan Beulich
@ 2014-04-17 13:30   ` Boris Ostrovsky
From: Boris Ostrovsky @ 2014-04-17 13:30 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, eddie.dong, jun.nakajima, suravee.suthikulpanit, xen-devel

On 04/17/2014 03:46 AM, Jan Beulich wrote:
>>>> On 17.04.14 at 00:59, <boris.ostrovsky@oracle.com> wrote:
>> Version 5:
>>   * Fixed checks in tsc_set_info to make sure that PV works.
>>
>>     The fix also covers the PVH case, although after a quick look at PVH
>>     support wrt time/TSC it's pretty clear that more work needs to be done.
>>     For example, PVH doesn't appear to touch TSC offsets in the VMCB; it uses
> PVH and VMCB (i.e. SVM) don't work together anyway.
>

I meant VMCS.

-boris

