xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] x86/time: PVCLOCK_TSC_STABLE_BIT support
@ 2016-03-17 16:12 Joao Martins
  2016-03-17 16:12 ` [PATCH 1/5] public/xen.h: add flags field to vcpu_time_info Joao Martins
                   ` (4 more replies)
  0 siblings, 5 replies; 30+ messages in thread
From: Joao Martins @ 2016-03-17 16:12 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Joao Martins, Keir Fraser, Jan Beulich

Hey,

This series is a repost with the comments that I got so far and
hopefully could be considered for Xen 4.7.

PVCLOCK_TSC_STABLE_BIT is the flag telling the guest that the
vcpu_time_info (pvti) are monotonic as seen by any CPU, a feature
which is currently not supported. As it is (i.e. bindly setting the
flag), we can observe that this property isn't there: a process using
vdso clock_gettime/gettimeofday will observe a significant amount of
warps (i.e. time going backwards) and it's due to 1) time calibration
skew in xen rendezvous algorithm 2) clocksource not in sync with TSC.
These warps are seen more frequently on PV guests (potentially because
vcpu time infos are only updated when guest is in kernel mode, and
perhaps lack of tsc_offset?), and in rare ocasions on HVM guests. It
is worth noting that with guests VCPUs pinned, only PV guests see
these warps. But on HVM guests specifically: such warps only occur
when one of guest VCPUs is pinned to CPU0. This series aims to propose
a solution to that and it's divided as following:

(R)	* Patch 1: Adds the missing flag field to vcpu_time_info.
(U)	* Patch 2: Adds a new clocksource based on TSC
(U)	* Patch 3, 4: Adjustments for patch 5
	* Patch 5: Implements the PVCLOCK_TSC_STABLE_BIT

	[R := Reviewed-by ;; U := Updated]

PVCLOCK_TSC_STABLE_BIT is set only when using clocksource=tsc,
and so it remains optional unless specified by the admin.

The test was running time-warp-test, that constantly calls
clock_gettime/gettimeofday on every CPU. It measures a delta with the
previous returned value and mark a warp if it's negative. I measured
it during periods of 1h and 6h and check how many warps and their
values (alongside the amount of time skew). Measurements/Changelog are
included in individual patches.

Note that most of the testing has been done with Linux 4.4 in which
these warps/skew could be easily observed as vdso would use each vCPU
pvti. Though Linux >= 4.5 changes this behaviour and use only the
vCPU0 pvti though still requiring PVCLOCK_TSC_STABLE_BIT flag
support.

Any comments appreciated,

Thanks!
Joao

Joao Martins (5):
  public/xen.h: add flags field to vcpu_time_info
  x86/time: implement tsc as clocksource
  x86/time: streamline platform time init on plt_init()
  x86/time: refactor read_platform_stime()
  x86/time: implement PVCLOCK_TSC_STABLE_BIT

 xen/arch/x86/time.c      | 132 ++++++++++++++++++++++++++++++++++++++++-------
 xen/include/public/xen.h |   6 ++-
 2 files changed, 119 insertions(+), 19 deletions(-)

-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 1/5] public/xen.h: add flags field to vcpu_time_info
  2016-03-17 16:12 [PATCH 0/5] x86/time: PVCLOCK_TSC_STABLE_BIT support Joao Martins
@ 2016-03-17 16:12 ` Joao Martins
  2016-03-18 20:12   ` Andrew Cooper
  2016-03-21 15:10   ` Jan Beulich
  2016-03-17 16:12 ` [PATCH 2/5] x86/time: implement tsc as clocksource Joao Martins
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 30+ messages in thread
From: Joao Martins @ 2016-03-17 16:12 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser, Joao Martins, Ian Jackson, Jan Beulich, Tim Deegan

This field has two possible flags (as of latest pvclock ABI
shared with KVM).

flags: bits in this field indicate extended capabilities
coordinated between the guest and the hypervisor.  Specifically
on KVM, availability of specific flags has to be checked in
0x40000001 cpuid leaf. On Xen, we don't have that but we can
still check some of the flags after registering the time info
page since a force_update_vcpu_system_time is performed.

Current flags are:

 flag bit   | cpuid bit    | meaning
-------------------------------------------------------------
            |              | time measures taken across
     0      |      24      | multiple cpus are guaranteed to
            |              | be monotonic
-------------------------------------------------------------
            |              | guest vcpu has been paused by
     1      |     N/A      | the host
            |              |
-------------------------------------------------------------

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Tim Deegan <tim@xen.org>
---
 xen/include/public/xen.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index 64ba7ab..08373f6 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -614,10 +614,14 @@ struct vcpu_time_info {
      */
     uint32_t tsc_to_system_mul;
     int8_t   tsc_shift;
-    int8_t   pad1[3];
+    int8_t   flags;
+    int8_t   pad1[2];
 }; /* 32 bytes */
 typedef struct vcpu_time_info vcpu_time_info_t;
 
+#define PVCLOCK_TSC_STABLE_BIT	(1 << 0)
+#define PVCLOCK_GUEST_STOPPED	(1 << 1)
+
 struct vcpu_info {
     /*
      * 'evtchn_upcall_pending' is written non-zero by Xen to indicate
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 2/5] x86/time: implement tsc as clocksource
  2016-03-17 16:12 [PATCH 0/5] x86/time: PVCLOCK_TSC_STABLE_BIT support Joao Martins
  2016-03-17 16:12 ` [PATCH 1/5] public/xen.h: add flags field to vcpu_time_info Joao Martins
@ 2016-03-17 16:12 ` Joao Martins
  2016-03-18 20:21   ` Andrew Cooper
  2016-03-17 16:12 ` [PATCH 3/5] x86/time: streamline platform time init on plt_init() Joao Martins
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 30+ messages in thread
From: Joao Martins @ 2016-03-17 16:12 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Joao Martins, Keir Fraser, Jan Beulich

Introduce support for using TSC as platform time which is the highest
resolution time and most performant to get (~20 nsecs).  Though there
are also several problems associated with its usage, and there isn't a
complete (and architecturally defined) guarantee that all machines
will provide reliable and monotonic TSC across all CPUs, on different
sockets and different P/C states.  I believe Intel to be the only that
can guarantee that. For this reason it's set with less priority when
compared to HPET unless adminstrator changes "clocksource" boot option
to "tsc". Upon initializing it, we also check for time warps and
appropriate CPU features i.e.  TSC_RELIABLE, CONSTANT_TSC and
NONSTOP_TSC. And in case none of these conditions are met, we fail in
order to fallback to the next available clocksource.

It is also worth noting that with clocksource=tsc there isn't any
need to synchronise with another clocksource, and I could verify that
great portion the time skew was eliminated and seeing much less time
warps happening. With HPET I observed ~500 warps in the period
of 1h of around 27 us, and with TSC down to 50 warps in the same
period having each warp < 100 ns. The warps still exist though but
are only related to cross CPU calibration (being how much it takes to
rendezvous with master), in which a later patch in this series aims to
solve.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

Changes since RFC:
 - Spelling fixes in the commit message.
 - Remove unused clocksource_is_tsc variable and introduce it instead
 on the patch that uses it.
 - Move plt_tsc from second to last in the available clocksources.
---
 xen/arch/x86/time.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 60 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 687e39b..1311c58 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -432,6 +432,63 @@ uint64_t ns_to_acpi_pm_tick(uint64_t ns)
 }
 
 /************************************************************
+ * PLATFORM TIMER 4: TSC
+ */
+static u64 tsc_freq;
+static unsigned long tsc_max_warp;
+static void tsc_check_reliability(void);
+
+static int __init init_tsctimer(struct platform_timesource *pts)
+{
+    bool_t tsc_reliable = 0;
+
+    tsc_check_reliability();
+
+    if ( tsc_max_warp > 0 )
+    {
+        tsc_reliable = 0;
+        printk("TSC: didn't passed warp test\n");
+    }
+    else if ( boot_cpu_has(X86_FEATURE_TSC_RELIABLE) ||
+              ( boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
+                boot_cpu_has(X86_FEATURE_NONSTOP_TSC) ) )
+    {
+        tsc_reliable = 1;
+    }
+    else if ( boot_cpu_has(X86_FEATURE_CONSTANT_TSC) )
+    {
+        tsc_reliable = (max_cstate <= 2);
+
+        if (tsc_reliable)
+            printk("TSC: no deep Cstates, deemed reliable\n");
+        else
+            printk("TSC: deep Cstates possible, so not reliable\n");
+    }
+
+    pts->frequency = tsc_freq;
+    return tsc_reliable;
+}
+
+static u64 read_tsc(void)
+{
+    return rdtsc();
+}
+
+static void resume_tsctimer(struct platform_timesource *pts)
+{
+}
+
+static struct platform_timesource __initdata plt_tsc =
+{
+    .id = "tsc",
+    .name = "TSC",
+    .read_counter = read_tsc,
+    .counter_bits = 64,
+    .init = init_tsctimer,
+    .resume = resume_tsctimer,
+};
+
+/************************************************************
  * GENERIC PLATFORM TIMER INFRASTRUCTURE
  */
 
@@ -536,7 +593,7 @@ static void resume_platform_timer(void)
 static void __init init_platform_timer(void)
 {
     static struct platform_timesource * __initdata plt_timers[] = {
-        &plt_hpet, &plt_pmtimer, &plt_pit
+        &plt_hpet, &plt_pmtimer, &plt_pit, &plt_tsc
     };
 
     struct platform_timesource *pts = NULL;
@@ -1181,7 +1238,7 @@ static void check_tsc_warp(unsigned long tsc_khz, unsigned long *max_warp)
     }
 }
 
-static unsigned long tsc_max_warp, tsc_check_count;
+static unsigned long tsc_check_count;
 static cpumask_t tsc_check_cpumask;
 
 static void tsc_check_slave(void *unused)
@@ -1465,6 +1522,7 @@ void __init early_time_init(void)
     struct cpu_time *t = &this_cpu(cpu_time);
     u64 tmp = init_pit_and_calibrate_tsc();
 
+    tsc_freq = tmp;
     set_time_scale(&t->tsc_scale, tmp);
     t->local_tsc_stamp = boot_tsc_stamp;
 
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 3/5] x86/time: streamline platform time init on plt_init()
  2016-03-17 16:12 [PATCH 0/5] x86/time: PVCLOCK_TSC_STABLE_BIT support Joao Martins
  2016-03-17 16:12 ` [PATCH 1/5] public/xen.h: add flags field to vcpu_time_info Joao Martins
  2016-03-17 16:12 ` [PATCH 2/5] x86/time: implement tsc as clocksource Joao Martins
@ 2016-03-17 16:12 ` Joao Martins
  2016-03-18 20:32   ` Andrew Cooper
  2016-03-17 16:12 ` [PATCH 4/5] x86/time: refactor read_platform_stime() Joao Martins
  2016-03-17 16:12 ` [PATCH 5/5] x86/time: implement PVCLOCK_TSC_STABLE_BIT Joao Martins
  4 siblings, 1 reply; 30+ messages in thread
From: Joao Martins @ 2016-03-17 16:12 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Joao Martins, Keir Fraser, Jan Beulich

And use to initialize platform time solely for clocksource=tsc,
as opposed to initializing platform overflow timer, which would
only fire in ~180 years (on 2.2 Ghz Broadwell processor).

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

Changes since RFC:
 - Move clocksource_is_tsc variable to this patch.
 - s/1000000000/SECONDS(1)
---
 xen/arch/x86/time.c | 44 +++++++++++++++++++++++++++++++++++---------
 1 file changed, 35 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 1311c58..5af8902 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -434,6 +434,7 @@ uint64_t ns_to_acpi_pm_tick(uint64_t ns)
 /************************************************************
  * PLATFORM TIMER 4: TSC
  */
+static bool_t clocksource_is_tsc = 0;
 static u64 tsc_freq;
 static unsigned long tsc_max_warp;
 static void tsc_check_reliability(void);
@@ -466,7 +467,7 @@ static int __init init_tsctimer(struct platform_timesource *pts)
     }
 
     pts->frequency = tsc_freq;
-    return tsc_reliable;
+    return ( clocksource_is_tsc = tsc_reliable );
 }
 
 static u64 read_tsc(void)
@@ -516,17 +517,31 @@ static s_time_t __read_platform_stime(u64 platform_time)
     return (stime_platform_stamp + scale_delta(diff, &plt_scale));
 }
 
+static void __plt_init(void)
+{
+    u64 count;
+
+    ASSERT(spin_is_locked(&platform_timer_lock));
+    count = plt_src.read_counter();
+    plt_stamp64 += (count - plt_stamp) & plt_mask;
+    plt_stamp = count;
+}
+
+static void plt_init(void)
+{
+    spin_lock_irq(&platform_timer_lock);
+    __plt_init();
+    spin_unlock_irq(&platform_timer_lock);
+}
+
 static void plt_overflow(void *unused)
 {
     int i;
-    u64 count;
     s_time_t now, plt_now, plt_wrap;
 
     spin_lock_irq(&platform_timer_lock);
 
-    count = plt_src.read_counter();
-    plt_stamp64 += (count - plt_stamp) & plt_mask;
-    plt_stamp = count;
+    __plt_init();
 
     now = NOW();
     plt_wrap = __read_platform_stime(plt_stamp64);
@@ -633,11 +648,22 @@ static void __init init_platform_timer(void)
 
     set_time_scale(&plt_scale, pts->frequency);
 
-    plt_overflow_period = scale_delta(
-        1ull << (pts->counter_bits-1), &plt_scale);
-    init_timer(&plt_overflow_timer, plt_overflow, NULL, 0);
     plt_src = *pts;
-    plt_overflow(NULL);
+
+    if ( clocksource_is_tsc )
+    {
+        plt_init();
+    }
+    else
+    {
+        plt_overflow_period = scale_delta(
+            1ull << (pts->counter_bits-1), &plt_scale);
+        init_timer(&plt_overflow_timer, plt_overflow, NULL, 0);
+        plt_overflow(NULL);
+
+        printk("Platform timer overflow period is %lu secs\n",
+               plt_overflow_period/SECONDS(1));
+    }
 
     platform_timer_stamp = plt_stamp64;
     stime_platform_stamp = NOW();
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 4/5] x86/time: refactor read_platform_stime()
  2016-03-17 16:12 [PATCH 0/5] x86/time: PVCLOCK_TSC_STABLE_BIT support Joao Martins
                   ` (2 preceding siblings ...)
  2016-03-17 16:12 ` [PATCH 3/5] x86/time: streamline platform time init on plt_init() Joao Martins
@ 2016-03-17 16:12 ` Joao Martins
  2016-03-18 20:34   ` Andrew Cooper
  2016-03-17 16:12 ` [PATCH 5/5] x86/time: implement PVCLOCK_TSC_STABLE_BIT Joao Martins
  4 siblings, 1 reply; 30+ messages in thread
From: Joao Martins @ 2016-03-17 16:12 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Joao Martins, Keir Fraser, Jan Beulich

To fetch the last read from the clocksource which was used to
calculate system_time. In the case of clocksource=tsc we will
use it to set tsc_timestamp.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/time.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 5af8902..89c35d0 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -508,6 +508,7 @@ static s_time_t stime_platform_stamp; /* System time at below platform time */
 static u64 platform_timer_stamp;      /* Platform time at above system time */
 static u64 plt_stamp64;          /* 64-bit platform counter stamp           */
 static u64 plt_stamp;            /* hardware-width platform counter stamp   */
+static u64 plt_stamp_counter;    /* last read since read_counter */
 static struct timer plt_overflow_timer;
 
 static s_time_t __read_platform_stime(u64 platform_time)
@@ -566,7 +567,7 @@ static void plt_overflow(void *unused)
     set_timer(&plt_overflow_timer, NOW() + plt_overflow_period);
 }
 
-static s_time_t read_platform_stime(void)
+static s_time_t read_platform_stime(u64 *stamp)
 {
     u64 count;
     s_time_t stime;
@@ -574,8 +575,11 @@ static s_time_t read_platform_stime(void)
     ASSERT(!local_irq_is_enabled());
 
     spin_lock(&platform_timer_lock);
-    count = plt_stamp64 + ((plt_src.read_counter() - plt_stamp) & plt_mask);
+    plt_stamp_counter = plt_src.read_counter();
+    count = plt_stamp64 + ((plt_stamp_counter - plt_stamp) & plt_mask);
     stime = __read_platform_stime(count);
+    if (stamp)
+        *stamp = plt_stamp_counter;
     spin_unlock(&platform_timer_lock);
 
     return stime;
@@ -693,7 +697,7 @@ void cstate_restore_tsc(void)
     if ( boot_cpu_has(X86_FEATURE_NONSTOP_TSC) )
         return;
 
-    write_tsc(stime2tsc(read_platform_stime()));
+    write_tsc(stime2tsc(read_platform_stime(NULL)));
 }
 
 /***************************************************************************
@@ -1012,7 +1016,7 @@ int cpu_frequency_change(u64 freq)
 
     local_irq_disable();
     /* Platform time /first/, as we may be delayed by platform_timer_lock. */
-    t->stime_master_stamp = read_platform_stime();
+    t->stime_master_stamp = read_platform_stime(NULL);
     /* TSC-extrapolated time may be bogus after frequency change. */
     /*t->stime_local_stamp = get_s_time();*/
     t->stime_local_stamp = t->stime_master_stamp;
@@ -1330,7 +1334,7 @@ static void time_calibration_tsc_rendezvous(void *_r)
 
             if ( r->master_stime == 0 )
             {
-                r->master_stime = read_platform_stime();
+                r->master_stime = read_platform_stime(NULL);
                 r->master_tsc_stamp = rdtsc();
             }
             atomic_inc(&r->semaphore);
@@ -1422,7 +1426,7 @@ void init_percpu_time(void)
 
     local_irq_save(flags);
     t->local_tsc_stamp = rdtsc();
-    now = read_platform_stime();
+    now = read_platform_stime(NULL);
     local_irq_restore(flags);
 
     t->stime_master_stamp = now;
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 5/5] x86/time: implement PVCLOCK_TSC_STABLE_BIT
  2016-03-17 16:12 [PATCH 0/5] x86/time: PVCLOCK_TSC_STABLE_BIT support Joao Martins
                   ` (3 preceding siblings ...)
  2016-03-17 16:12 ` [PATCH 4/5] x86/time: refactor read_platform_stime() Joao Martins
@ 2016-03-17 16:12 ` Joao Martins
  2016-03-18 20:58   ` Andrew Cooper
  4 siblings, 1 reply; 30+ messages in thread
From: Joao Martins @ 2016-03-17 16:12 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Joao Martins, Keir Fraser, Jan Beulich

When using TSC as clocksource we will solely rely on TSC for updating
vcpu time infos (pvti). Right now, each vCPU takes the tsc_timestamp at
different instants meaning every EPOCH + delta. This delta is variable
depending on the time the CPU calibrates with CPU 0 (master), and will
likely be different and variable across vCPUS. This means that each VCPU
pvti won't account to its calibration error which could lead to time
going backwards, and allowing a situation where time read on VCPU B
immediately after A being smaller. While this doesn't happen a lot, I
was able to observe (for clocksource=tsc) around 50 times in an hour
having warps of < 100 ns.

This patch proposes relying on host TSC synchronization and passthrough
of the master tsc to the guest, when running on a TSC-safe platform.  On
the rendezvous function we will retrieve the platform time in ns and the
last count read by the clocksource that was used to compute system time.
master will write both master_tsc_stamp and master_stime, and the other
vCPUS (slave) will use it to update their correspondent time infos.
This way we can guarantee that on a platform with a constant and
reliable TSC, that the time read on vcpu B right after A is bigger
independently of the VCPU calibration error. Since pvclock time infos
are monotonic as seen by any vCPU set PVCLOCK_TSC_STABLE_BIT, which then
enables usage of VDSO on Linux.  IIUC, this is similar to how it's
implemented on KVM.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/time.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 89c35d0..a17529c 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -917,6 +917,8 @@ static void __update_vcpu_system_time(struct vcpu *v, int force)
 
     _u.tsc_timestamp = tsc_stamp;
     _u.system_time   = t->stime_local_stamp;
+    if ( clocksource_is_tsc )
+        _u.flags    |= PVCLOCK_TSC_STABLE_BIT;
 
     if ( is_hvm_domain(d) )
         _u.tsc_timestamp += v->arch.hvm_vcpu.cache_tsc_offset;
@@ -1377,9 +1379,12 @@ static void time_calibration_std_rendezvous(void *_r)
 
     if ( smp_processor_id() == 0 )
     {
+        u64 last_counter;
         while ( atomic_read(&r->semaphore) != (total_cpus - 1) )
             cpu_relax();
-        r->master_stime = read_platform_stime();
+        r->master_stime = read_platform_stime(&last_counter);
+        if ( clocksource_is_tsc )
+            r->master_tsc_stamp = last_counter;
         mb(); /* write r->master_stime /then/ signal */
         atomic_inc(&r->semaphore);
     }
@@ -1391,7 +1396,10 @@ static void time_calibration_std_rendezvous(void *_r)
         mb(); /* receive signal /then/ read r->master_stime */
     }
 
-    c->local_tsc_stamp = rdtsc();
+    if ( clocksource_is_tsc )
+        c->local_tsc_stamp = r->master_tsc_stamp;
+    else
+        c->local_tsc_stamp = rdtsc();
     c->stime_local_stamp = get_s_time();
     c->stime_master_stamp = r->master_stime;
 
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/5] public/xen.h: add flags field to vcpu_time_info
  2016-03-17 16:12 ` [PATCH 1/5] public/xen.h: add flags field to vcpu_time_info Joao Martins
@ 2016-03-18 20:12   ` Andrew Cooper
  2016-03-21 11:42     ` Joao Martins
  2016-03-21 15:10   ` Jan Beulich
  1 sibling, 1 reply; 30+ messages in thread
From: Andrew Cooper @ 2016-03-18 20:12 UTC (permalink / raw)
  To: Joao Martins, xen-devel; +Cc: Ian Jackson, Keir Fraser, Jan Beulich, Tim Deegan

On 17/03/16 16:12, Joao Martins wrote:
> This field has two possible flags (as of latest pvclock ABI
> shared with KVM).
>
> flags: bits in this field indicate extended capabilities
> coordinated between the guest and the hypervisor.  Specifically
> on KVM, availability of specific flags has to be checked in
> 0x40000001 cpuid leaf. On Xen, we don't have that but we can
> still check some of the flags after registering the time info
> page since a force_update_vcpu_system_time is performed.
>
> Current flags are:
>
>  flag bit   | cpuid bit    | meaning
> -------------------------------------------------------------
>             |              | time measures taken across
>      0      |      24      | multiple cpus are guaranteed to
>             |              | be monotonic
> -------------------------------------------------------------
>             |              | guest vcpu has been paused by
>      1      |     N/A      | the host
>             |              |
> -------------------------------------------------------------
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

Has the Linux maintainers file been patched to include xen-devel, to
avoid them altering our ABI unnoticed in the future?

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/5] x86/time: implement tsc as clocksource
  2016-03-17 16:12 ` [PATCH 2/5] x86/time: implement tsc as clocksource Joao Martins
@ 2016-03-18 20:21   ` Andrew Cooper
  2016-03-21 11:43     ` Joao Martins
  2016-03-22 12:41     ` Joao Martins
  0 siblings, 2 replies; 30+ messages in thread
From: Andrew Cooper @ 2016-03-18 20:21 UTC (permalink / raw)
  To: Joao Martins, xen-devel; +Cc: Keir Fraser, Jan Beulich

On 17/03/16 16:12, Joao Martins wrote:
> Introduce support for using TSC as platform time which is the highest
> resolution time and most performant to get (~20 nsecs).  Though there
> are also several problems associated with its usage, and there isn't a
> complete (and architecturally defined) guarantee that all machines
> will provide reliable and monotonic TSC across all CPUs, on different
> sockets and different P/C states.  I believe Intel to be the only that
> can guarantee that. For this reason it's set with less priority when
> compared to HPET unless adminstrator changes "clocksource" boot option
> to "tsc". Upon initializing it, we also check for time warps and
> appropriate CPU features i.e.  TSC_RELIABLE, CONSTANT_TSC and
> NONSTOP_TSC. And in case none of these conditions are met, we fail in
> order to fallback to the next available clocksource.
>
> It is also worth noting that with clocksource=tsc there isn't any
> need to synchronise with another clocksource, and I could verify that
> great portion the time skew was eliminated and seeing much less time
> warps happening. With HPET I observed ~500 warps in the period
> of 1h of around 27 us, and with TSC down to 50 warps in the same
> period having each warp < 100 ns. The warps still exist though but
> are only related to cross CPU calibration (being how much it takes to
> rendezvous with master), in which a later patch in this series aims to
> solve.
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
> Cc: Keir Fraser <keir@xen.org>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>

Some style corrections, but no functional problems.

Reviewed-by Andrew Cooper <andrew.cooper3@citrix.com>

>
> Changes since RFC:
>  - Spelling fixes in the commit message.
>  - Remove unused clocksource_is_tsc variable and introduce it instead
>  on the patch that uses it.
>  - Move plt_tsc from second to last in the available clocksources.
> ---
>  xen/arch/x86/time.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 60 insertions(+), 2 deletions(-)
>
> diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
> index 687e39b..1311c58 100644
> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -432,6 +432,63 @@ uint64_t ns_to_acpi_pm_tick(uint64_t ns)
>  }
>  
>  /************************************************************
> + * PLATFORM TIMER 4: TSC
> + */
> +static u64 tsc_freq;
> +static unsigned long tsc_max_warp;
> +static void tsc_check_reliability(void);
> +
> +static int __init init_tsctimer(struct platform_timesource *pts)
> +{
> +    bool_t tsc_reliable = 0;
> +
> +    tsc_check_reliability();
> +
> +    if ( tsc_max_warp > 0 )
> +    {
> +        tsc_reliable = 0;
> +        printk("TSC: didn't passed warp test\n");

printk(XENLOG_INFO "TSC ...

> +    }
> +    else if ( boot_cpu_has(X86_FEATURE_TSC_RELIABLE) ||
> +              ( boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
> +                boot_cpu_has(X86_FEATURE_NONSTOP_TSC) ) )

You don't need extra spaces on inner brackets, so

boot_cpu_has(X86_FEATURE_TSC_RELIABLE) ||
(boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
 boot_cpu_has(X86_FEATURE_NONSTOP_TSC)) )

is fine.

> +    {
> +        tsc_reliable = 1;
> +    }
> +    else if ( boot_cpu_has(X86_FEATURE_CONSTANT_TSC) )
> +    {
> +        tsc_reliable = (max_cstate <= 2);
> +
> +        if (tsc_reliable)

Please add some extra ones here though.

> +            printk("TSC: no deep Cstates, deemed reliable\n");
> +        else
> +            printk("TSC: deep Cstates possible, so not reliable\n");

XENLOG_INFO as well please.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/5] x86/time: streamline platform time init on plt_init()
  2016-03-17 16:12 ` [PATCH 3/5] x86/time: streamline platform time init on plt_init() Joao Martins
@ 2016-03-18 20:32   ` Andrew Cooper
  2016-03-21 11:45     ` Joao Martins
  0 siblings, 1 reply; 30+ messages in thread
From: Andrew Cooper @ 2016-03-18 20:32 UTC (permalink / raw)
  To: Joao Martins, xen-devel; +Cc: Keir Fraser, Jan Beulich

On 17/03/16 16:12, Joao Martins wrote:
> And use to initialize platform time solely for clocksource=tsc,
> as opposed to initializing platform overflow timer, which would
> only fire in ~180 years (on 2.2 Ghz Broadwell processor).
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>

Again, just style corrections.

Reviewed-by Andrew Cooper <andrew.cooper3@citrix.com>

> diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
> index 1311c58..5af8902 100644
> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -434,6 +434,7 @@ uint64_t ns_to_acpi_pm_tick(uint64_t ns)
>  /************************************************************
>   * PLATFORM TIMER 4: TSC
>   */
> +static bool_t clocksource_is_tsc = 0;

No need to explicitly initialise to 0.

>  static u64 tsc_freq;
>  static unsigned long tsc_max_warp;
>  static void tsc_check_reliability(void);
> @@ -466,7 +467,7 @@ static int __init init_tsctimer(struct platform_timesource *pts)
>      }
>  
>      pts->frequency = tsc_freq;
> -    return tsc_reliable;
> +    return ( clocksource_is_tsc = tsc_reliable );

While this does work, please avoid mixing an assignment and a return.

Something like

pts->frequency = tsc_freq;
clocksource_is_tsc = tsc_reliable;

return tsc_reliable;

is fine and easier logic to read.

> @@ -633,11 +648,22 @@ static void __init init_platform_timer(void)
>  
>      set_time_scale(&plt_scale, pts->frequency);
>  
> -    plt_overflow_period = scale_delta(
> -        1ull << (pts->counter_bits-1), &plt_scale);
> -    init_timer(&plt_overflow_timer, plt_overflow, NULL, 0);
>      plt_src = *pts;
> -    plt_overflow(NULL);
> +
> +    if ( clocksource_is_tsc )
> +    {
> +        plt_init();
> +    }
> +    else
> +    {
> +        plt_overflow_period = scale_delta(
> +            1ull << (pts->counter_bits-1), &plt_scale);

As you are moving this codeblock, please fix the style to
(pts->counter_bits - 1)

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/5] x86/time: refactor read_platform_stime()
  2016-03-17 16:12 ` [PATCH 4/5] x86/time: refactor read_platform_stime() Joao Martins
@ 2016-03-18 20:34   ` Andrew Cooper
  2016-03-21 11:45     ` Joao Martins
  0 siblings, 1 reply; 30+ messages in thread
From: Andrew Cooper @ 2016-03-18 20:34 UTC (permalink / raw)
  To: Joao Martins, xen-devel; +Cc: Keir Fraser, Jan Beulich

On 17/03/16 16:12, Joao Martins wrote:
> To fetch the last read from the clocksource which was used to
> calculate system_time. In the case of clocksource=tsc we will
> use it to set tsc_timestamp.
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>

Again, just minor style issues.

Reviewed-by Andrew Cooper <andrew.cooper3@citrix.com>

> ---
> Cc: Keir Fraser <keir@xen.org>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
>  xen/arch/x86/time.c | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
> index 5af8902..89c35d0 100644
> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -508,6 +508,7 @@ static s_time_t stime_platform_stamp; /* System time at below platform time */
>  static u64 platform_timer_stamp;      /* Platform time at above system time */
>  static u64 plt_stamp64;          /* 64-bit platform counter stamp           */
>  static u64 plt_stamp;            /* hardware-width platform counter stamp   */
> +static u64 plt_stamp_counter;    /* last read since read_counter */

This can have its scope reduced to within read_platform_stime()

>  static struct timer plt_overflow_timer;
>  
>  static s_time_t __read_platform_stime(u64 platform_time)
> @@ -566,7 +567,7 @@ static void plt_overflow(void *unused)
>      set_timer(&plt_overflow_timer, NOW() + plt_overflow_period);
>  }
>  
> -static s_time_t read_platform_stime(void)
> +static s_time_t read_platform_stime(u64 *stamp)
>  {
>      u64 count;
>      s_time_t stime;
> @@ -574,8 +575,11 @@ static s_time_t read_platform_stime(void)
>      ASSERT(!local_irq_is_enabled());
>  
>      spin_lock(&platform_timer_lock);
> -    count = plt_stamp64 + ((plt_src.read_counter() - plt_stamp) & plt_mask);
> +    plt_stamp_counter = plt_src.read_counter();
> +    count = plt_stamp64 + ((plt_stamp_counter - plt_stamp) & plt_mask);
>      stime = __read_platform_stime(count);
> +    if (stamp)

Spaces.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/5] x86/time: implement PVCLOCK_TSC_STABLE_BIT
  2016-03-17 16:12 ` [PATCH 5/5] x86/time: implement PVCLOCK_TSC_STABLE_BIT Joao Martins
@ 2016-03-18 20:58   ` Andrew Cooper
  2016-03-21 11:50     ` Joao Martins
  0 siblings, 1 reply; 30+ messages in thread
From: Andrew Cooper @ 2016-03-18 20:58 UTC (permalink / raw)
  To: Joao Martins, xen-devel; +Cc: Keir Fraser, Jan Beulich

On 17/03/16 16:12, Joao Martins wrote:
> When using TSC as clocksource we will solely rely on TSC for updating
> vcpu time infos (pvti). Right now, each vCPU takes the tsc_timestamp at
> different instants meaning every EPOCH + delta. This delta is variable
> depending on the time the CPU calibrates with CPU 0 (master), and will
> likely be different and variable across vCPUS. This means that each VCPU
> pvti won't account to its calibration error which could lead to time
> going backwards, and allowing a situation where time read on VCPU B
> immediately after A being smaller. While this doesn't happen a lot, I
> was able to observe (for clocksource=tsc) around 50 times in an hour
> having warps of < 100 ns.
>
> This patch proposes relying on host TSC synchronization and passthrough
> of the master tsc to the guest, when running on a TSC-safe platform.  On
> the rendezvous function we will retrieve the platform time in ns and the
> last count read by the clocksource that was used to compute system time.
> master will write both master_tsc_stamp and master_stime, and the other
> vCPUS (slave) will use it to update their correspondent time infos.
> This way we can guarantee that on a platform with a constant and
> reliable TSC, that the time read on vcpu B right after A is bigger
> independently of the VCPU calibration error. Since pvclock time infos
> are monotonic as seen by any vCPU set PVCLOCK_TSC_STABLE_BIT, which then
> enables usage of VDSO on Linux.  IIUC, this is similar to how it's
> implemented on KVM.
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
> Cc: Keir Fraser <keir@xen.org>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
>  xen/arch/x86/time.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
> index 89c35d0..a17529c 100644
> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -917,6 +917,8 @@ static void __update_vcpu_system_time(struct vcpu *v, int force)
>  
>      _u.tsc_timestamp = tsc_stamp;
>      _u.system_time   = t->stime_local_stamp;
> +    if ( clocksource_is_tsc )
> +        _u.flags    |= PVCLOCK_TSC_STABLE_BIT;
>  
>      if ( is_hvm_domain(d) )
>          _u.tsc_timestamp += v->arch.hvm_vcpu.cache_tsc_offset;
> @@ -1377,9 +1379,12 @@ static void time_calibration_std_rendezvous(void *_r)
>  
>      if ( smp_processor_id() == 0 )
>      {
> +        u64 last_counter;

Blank line here please.

>          while ( atomic_read(&r->semaphore) != (total_cpus - 1) )
>              cpu_relax();
> -        r->master_stime = read_platform_stime();
> +        r->master_stime = read_platform_stime(&last_counter);
> +        if ( clocksource_is_tsc )
> +            r->master_tsc_stamp = last_counter;
>          mb(); /* write r->master_stime /then/ signal */
>          atomic_inc(&r->semaphore);
>      }
> @@ -1391,7 +1396,10 @@ static void time_calibration_std_rendezvous(void *_r)
>          mb(); /* receive signal /then/ read r->master_stime */
>      }
>  
> -    c->local_tsc_stamp = rdtsc();
> +    if ( clocksource_is_tsc )
> +        c->local_tsc_stamp = r->master_tsc_stamp;
> +    else
> +        c->local_tsc_stamp = rdtsc();
>      c->stime_local_stamp = get_s_time();
>      c->stime_master_stamp = r->master_stime;
>  

The point of the rendezvous is to run rdtsc() at a the time on each cpu
at the same time.  With this logic, it seems that you don't need the
rendezvous at all.

Avoiding the time_calibration_std_rendezvous() entirely in this
situation would be the better, surely?

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/5] public/xen.h: add flags field to vcpu_time_info
  2016-03-18 20:12   ` Andrew Cooper
@ 2016-03-21 11:42     ` Joao Martins
  2016-03-21 11:43       ` Andrew Cooper
  0 siblings, 1 reply; 30+ messages in thread
From: Joao Martins @ 2016-03-21 11:42 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Ian Jackson, Tim Deegan, Keir Fraser, Jan Beulich, xen-devel



On 03/18/2016 08:12 PM, Andrew Cooper wrote:
> On 17/03/16 16:12, Joao Martins wrote:
>> This field has two possible flags (as of latest pvclock ABI
>> shared with KVM).
>>
>> flags: bits in this field indicate extended capabilities
>> coordinated between the guest and the hypervisor.  Specifically
>> on KVM, availability of specific flags has to be checked in
>> 0x40000001 cpuid leaf. On Xen, we don't have that but we can
>> still check some of the flags after registering the time info
>> page since a force_update_vcpu_system_time is performed.
>>
>> Current flags are:
>>
>>  flag bit   | cpuid bit    | meaning
>> -------------------------------------------------------------
>>             |              | time measures taken across
>>      0      |      24      | multiple cpus are guaranteed to
>>             |              | be monotonic
>> -------------------------------------------------------------
>>             |              | guest vcpu has been paused by
>>      1      |     N/A      | the host
>>             |              |
>> -------------------------------------------------------------
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> 
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
> 
Thanks!

> Has the Linux maintainers file been patched to include xen-devel, to
> avoid them altering our ABI unnoticed in the future?
> 
Not yet, but I had planned to do so when sending the v2 of the linux side. But
perhaps you want it beforehand?

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/5] x86/time: implement tsc as clocksource
  2016-03-18 20:21   ` Andrew Cooper
@ 2016-03-21 11:43     ` Joao Martins
  2016-03-22 12:41     ` Joao Martins
  1 sibling, 0 replies; 30+ messages in thread
From: Joao Martins @ 2016-03-21 11:43 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Keir Fraser, Jan Beulich, xen-devel



On 03/18/2016 08:21 PM, Andrew Cooper wrote:
> On 17/03/16 16:12, Joao Martins wrote:
>> Introduce support for using TSC as platform time which is the highest
>> resolution time and most performant to get (~20 nsecs).  Though there
>> are also several problems associated with its usage, and there isn't a
>> complete (and architecturally defined) guarantee that all machines
>> will provide reliable and monotonic TSC across all CPUs, on different
>> sockets and different P/C states.  I believe Intel to be the only that
>> can guarantee that. For this reason it's set with less priority when
>> compared to HPET unless adminstrator changes "clocksource" boot option
>> to "tsc". Upon initializing it, we also check for time warps and
>> appropriate CPU features i.e.  TSC_RELIABLE, CONSTANT_TSC and
>> NONSTOP_TSC. And in case none of these conditions are met, we fail in
>> order to fallback to the next available clocksource.
>>
>> It is also worth noting that with clocksource=tsc there isn't any
>> need to synchronise with another clocksource, and I could verify that
>> great portion the time skew was eliminated and seeing much less time
>> warps happening. With HPET I observed ~500 warps in the period
>> of 1h of around 27 us, and with TSC down to 50 warps in the same
>> period having each warp < 100 ns. The warps still exist though but
>> are only related to cross CPU calibration (being how much it takes to
>> rendezvous with master), in which a later patch in this series aims to
>> solve.
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>> Cc: Keir Fraser <keir@xen.org>
>> Cc: Jan Beulich <jbeulich@suse.com>
>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> 
> Some style corrections, but no functional problems.
> 
> Reviewed-by Andrew Cooper <andrew.cooper3@citrix.com>
> 
I've fixed all your comments for v2, Thanks!

>>
>> Changes since RFC:
>>  - Spelling fixes in the commit message.
>>  - Remove unused clocksource_is_tsc variable and introduce it instead
>>  on the patch that uses it.
>>  - Move plt_tsc from second to last in the available clocksources.
>> ---
>>  xen/arch/x86/time.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++--
>>  1 file changed, 60 insertions(+), 2 deletions(-)
>>
>> diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
>> index 687e39b..1311c58 100644
>> --- a/xen/arch/x86/time.c
>> +++ b/xen/arch/x86/time.c
>> @@ -432,6 +432,63 @@ uint64_t ns_to_acpi_pm_tick(uint64_t ns)
>>  }
>>  
>>  /************************************************************
>> + * PLATFORM TIMER 4: TSC
>> + */
>> +static u64 tsc_freq;
>> +static unsigned long tsc_max_warp;
>> +static void tsc_check_reliability(void);
>> +
>> +static int __init init_tsctimer(struct platform_timesource *pts)
>> +{
>> +    bool_t tsc_reliable = 0;
>> +
>> +    tsc_check_reliability();
>> +
>> +    if ( tsc_max_warp > 0 )
>> +    {
>> +        tsc_reliable = 0;
>> +        printk("TSC: didn't passed warp test\n");
> 
> printk(XENLOG_INFO "TSC ...
> 
>> +    }
>> +    else if ( boot_cpu_has(X86_FEATURE_TSC_RELIABLE) ||
>> +              ( boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
>> +                boot_cpu_has(X86_FEATURE_NONSTOP_TSC) ) )
> 
> You don't need extra spaces on inner brackets, so
> 
> boot_cpu_has(X86_FEATURE_TSC_RELIABLE) ||
> (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
>  boot_cpu_has(X86_FEATURE_NONSTOP_TSC)) )
> 
> is fine.
> 
>> +    {
>> +        tsc_reliable = 1;
>> +    }
>> +    else if ( boot_cpu_has(X86_FEATURE_CONSTANT_TSC) )
>> +    {
>> +        tsc_reliable = (max_cstate <= 2);
>> +
>> +        if (tsc_reliable)
> 
> Please add some extra ones here though.
> 
>> +            printk("TSC: no deep Cstates, deemed reliable\n");
>> +        else
>> +            printk("TSC: deep Cstates possible, so not reliable\n");
> 
> XENLOG_INFO as well please.
> 
> ~Andrew
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/5] public/xen.h: add flags field to vcpu_time_info
  2016-03-21 11:42     ` Joao Martins
@ 2016-03-21 11:43       ` Andrew Cooper
  2016-03-21 11:51         ` Joao Martins
  0 siblings, 1 reply; 30+ messages in thread
From: Andrew Cooper @ 2016-03-21 11:43 UTC (permalink / raw)
  To: Joao Martins; +Cc: Ian Jackson, Tim Deegan, Keir Fraser, Jan Beulich, xen-devel

On 21/03/16 11:42, Joao Martins wrote:
>
> On 03/18/2016 08:12 PM, Andrew Cooper wrote:
>> On 17/03/16 16:12, Joao Martins wrote:
>>> This field has two possible flags (as of latest pvclock ABI
>>> shared with KVM).
>>>
>>> flags: bits in this field indicate extended capabilities
>>> coordinated between the guest and the hypervisor.  Specifically
>>> on KVM, availability of specific flags has to be checked in
>>> 0x40000001 cpuid leaf. On Xen, we don't have that but we can
>>> still check some of the flags after registering the time info
>>> page since a force_update_vcpu_system_time is performed.
>>>
>>> Current flags are:
>>>
>>>  flag bit   | cpuid bit    | meaning
>>> -------------------------------------------------------------
>>>             |              | time measures taken across
>>>      0      |      24      | multiple cpus are guaranteed to
>>>             |              | be monotonic
>>> -------------------------------------------------------------
>>>             |              | guest vcpu has been paused by
>>>      1      |     N/A      | the host
>>>             |              |
>>> -------------------------------------------------------------
>>>
>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>
> Thanks!
>
>> Has the Linux maintainers file been patched to include xen-devel, to
>> avoid them altering our ABI unnoticed in the future?
>>
> Not yet, but I had planned to do so when sending the v2 of the linux side. But
> perhaps you want it beforehand?

Just so long as it doesn't get lost.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/5] x86/time: streamline platform time init on plt_init()
  2016-03-18 20:32   ` Andrew Cooper
@ 2016-03-21 11:45     ` Joao Martins
  0 siblings, 0 replies; 30+ messages in thread
From: Joao Martins @ 2016-03-21 11:45 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Keir Fraser, Jan Beulich, xen-devel



On 03/18/2016 08:32 PM, Andrew Cooper wrote:
> On 17/03/16 16:12, Joao Martins wrote:
>> And use to initialize platform time solely for clocksource=tsc,
>> as opposed to initializing platform overflow timer, which would
>> only fire in ~180 years (on 2.2 Ghz Broadwell processor).
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> 
> Again, just style corrections.
> 
> Reviewed-by Andrew Cooper <andrew.cooper3@citrix.com>
> 
I've fixed all these comments in v2, too. Thanks!

>> diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
>> index 1311c58..5af8902 100644
>> --- a/xen/arch/x86/time.c
>> +++ b/xen/arch/x86/time.c
>> @@ -434,6 +434,7 @@ uint64_t ns_to_acpi_pm_tick(uint64_t ns)
>>  /************************************************************
>>   * PLATFORM TIMER 4: TSC
>>   */
>> +static bool_t clocksource_is_tsc = 0;
> 
> No need to explicitly initialise to 0.
> 
>>  static u64 tsc_freq;
>>  static unsigned long tsc_max_warp;
>>  static void tsc_check_reliability(void);
>> @@ -466,7 +467,7 @@ static int __init init_tsctimer(struct platform_timesource *pts)
>>      }
>>  
>>      pts->frequency = tsc_freq;
>> -    return tsc_reliable;
>> +    return ( clocksource_is_tsc = tsc_reliable );
> 
> While this does work, please avoid mixing an assignment and a return.
> 
> Something like
> 
> pts->frequency = tsc_freq;
> clocksource_is_tsc = tsc_reliable;
> 
> return tsc_reliable;
> 
> is fine and easier logic to read.
> 
>> @@ -633,11 +648,22 @@ static void __init init_platform_timer(void)
>>  
>>      set_time_scale(&plt_scale, pts->frequency);
>>  
>> -    plt_overflow_period = scale_delta(
>> -        1ull << (pts->counter_bits-1), &plt_scale);
>> -    init_timer(&plt_overflow_timer, plt_overflow, NULL, 0);
>>      plt_src = *pts;
>> -    plt_overflow(NULL);
>> +
>> +    if ( clocksource_is_tsc )
>> +    {
>> +        plt_init();
>> +    }
>> +    else
>> +    {
>> +        plt_overflow_period = scale_delta(
>> +            1ull << (pts->counter_bits-1), &plt_scale);
> 
> As you are moving this codeblock, please fix the style to
> (pts->counter_bits - 1)
> 
> ~Andrew
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/5] x86/time: refactor read_platform_stime()
  2016-03-18 20:34   ` Andrew Cooper
@ 2016-03-21 11:45     ` Joao Martins
  2016-03-21 13:08       ` Andrew Cooper
  0 siblings, 1 reply; 30+ messages in thread
From: Joao Martins @ 2016-03-21 11:45 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Keir Fraser, Jan Beulich, xen-devel



On 03/18/2016 08:34 PM, Andrew Cooper wrote:
> On 17/03/16 16:12, Joao Martins wrote:
>> To fetch the last read from the clocksource which was used to
>> calculate system_time. In the case of clocksource=tsc we will
>> use it to set tsc_timestamp.
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> 
> Again, just minor style issues.
> 
> Reviewed-by Andrew Cooper <andrew.cooper3@citrix.com>
> 
All fixed, though I found one change missing in this series, specifically on
time_calibration_std_rendezvous. Otherwise this commit would break compilation.
See chunk below for the change I am adding:

@@ -1377,7 +1380,7 @@ static void time_calibration_std_rendezvous(void *_r)
     {
         while ( atomic_read(&r->semaphore) != (total_cpus - 1) )
             cpu_relax();
-        r->master_stime = read_platform_stime();
+        r->master_stime = read_platform_stime(NULL);
         mb(); /* write r->master_stime /then/ signal */
         atomic_inc(&r->semaphore);
     }

Having this fixed, could I still keep your Reviewed-by?

>> ---
>> Cc: Keir Fraser <keir@xen.org>
>> Cc: Jan Beulich <jbeulich@suse.com>
>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>>  xen/arch/x86/time.c | 16 ++++++++++------
>>  1 file changed, 10 insertions(+), 6 deletions(-)
>>
>> diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
>> index 5af8902..89c35d0 100644
>> --- a/xen/arch/x86/time.c
>> +++ b/xen/arch/x86/time.c
>> @@ -508,6 +508,7 @@ static s_time_t stime_platform_stamp; /* System time at below platform time */
>>  static u64 platform_timer_stamp;      /* Platform time at above system time */
>>  static u64 plt_stamp64;          /* 64-bit platform counter stamp           */
>>  static u64 plt_stamp;            /* hardware-width platform counter stamp   */
>> +static u64 plt_stamp_counter;    /* last read since read_counter */
> 
> This can have its scope reduced to within read_platform_stime()
> 
>>  static struct timer plt_overflow_timer;
>>  
>>  static s_time_t __read_platform_stime(u64 platform_time)
>> @@ -566,7 +567,7 @@ static void plt_overflow(void *unused)
>>      set_timer(&plt_overflow_timer, NOW() + plt_overflow_period);
>>  }
>>  
>> -static s_time_t read_platform_stime(void)
>> +static s_time_t read_platform_stime(u64 *stamp)
>>  {
>>      u64 count;
>>      s_time_t stime;
>> @@ -574,8 +575,11 @@ static s_time_t read_platform_stime(void)
>>      ASSERT(!local_irq_is_enabled());
>>  
>>      spin_lock(&platform_timer_lock);
>> -    count = plt_stamp64 + ((plt_src.read_counter() - plt_stamp) & plt_mask);
>> +    plt_stamp_counter = plt_src.read_counter();
>> +    count = plt_stamp64 + ((plt_stamp_counter - plt_stamp) & plt_mask);
>>      stime = __read_platform_stime(count);
>> +    if (stamp)
> 
> Spaces.
> 
> ~Andrew
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/5] x86/time: implement PVCLOCK_TSC_STABLE_BIT
  2016-03-18 20:58   ` Andrew Cooper
@ 2016-03-21 11:50     ` Joao Martins
  0 siblings, 0 replies; 30+ messages in thread
From: Joao Martins @ 2016-03-21 11:50 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Keir Fraser, Jan Beulich, xen-devel



On 03/18/2016 08:58 PM, Andrew Cooper wrote:
> On 17/03/16 16:12, Joao Martins wrote:
>> When using TSC as clocksource we will solely rely on TSC for updating
>> vcpu time infos (pvti). Right now, each vCPU takes the tsc_timestamp at
>> different instants meaning every EPOCH + delta. This delta is variable
>> depending on the time the CPU calibrates with CPU 0 (master), and will
>> likely be different and variable across vCPUS. This means that each VCPU
>> pvti won't account to its calibration error which could lead to time
>> going backwards, and allowing a situation where time read on VCPU B
>> immediately after A being smaller. While this doesn't happen a lot, I
>> was able to observe (for clocksource=tsc) around 50 times in an hour
>> having warps of < 100 ns.
>>
>> This patch proposes relying on host TSC synchronization and passthrough
>> of the master tsc to the guest, when running on a TSC-safe platform.  On
>> the rendezvous function we will retrieve the platform time in ns and the
>> last count read by the clocksource that was used to compute system time.
>> master will write both master_tsc_stamp and master_stime, and the other
>> vCPUS (slave) will use it to update their correspondent time infos.
>> This way we can guarantee that on a platform with a constant and
>> reliable TSC, that the time read on vcpu B right after A is bigger
>> independently of the VCPU calibration error. Since pvclock time infos
>> are monotonic as seen by any vCPU set PVCLOCK_TSC_STABLE_BIT, which then
>> enables usage of VDSO on Linux.  IIUC, this is similar to how it's
>> implemented on KVM.
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>> Cc: Keir Fraser <keir@xen.org>
>> Cc: Jan Beulich <jbeulich@suse.com>
>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>>  xen/arch/x86/time.c | 12 ++++++++++--
>>  1 file changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
>> index 89c35d0..a17529c 100644
>> --- a/xen/arch/x86/time.c
>> +++ b/xen/arch/x86/time.c
>> @@ -917,6 +917,8 @@ static void __update_vcpu_system_time(struct vcpu *v, int force)
>>  
>>      _u.tsc_timestamp = tsc_stamp;
>>      _u.system_time   = t->stime_local_stamp;
>> +    if ( clocksource_is_tsc )
>> +        _u.flags    |= PVCLOCK_TSC_STABLE_BIT;
>>  
>>      if ( is_hvm_domain(d) )
>>          _u.tsc_timestamp += v->arch.hvm_vcpu.cache_tsc_offset;
>> @@ -1377,9 +1379,12 @@ static void time_calibration_std_rendezvous(void *_r)
>>  
>>      if ( smp_processor_id() == 0 )
>>      {
>> +        u64 last_counter;
> 
> Blank line here please.
> 
>>          while ( atomic_read(&r->semaphore) != (total_cpus - 1) )
>>              cpu_relax();
>> -        r->master_stime = read_platform_stime();
>> +        r->master_stime = read_platform_stime(&last_counter);
>> +        if ( clocksource_is_tsc )
>> +            r->master_tsc_stamp = last_counter;
>>          mb(); /* write r->master_stime /then/ signal */
>>          atomic_inc(&r->semaphore);
>>      }
>> @@ -1391,7 +1396,10 @@ static void time_calibration_std_rendezvous(void *_r)
>>          mb(); /* receive signal /then/ read r->master_stime */
>>      }
>>  
>> -    c->local_tsc_stamp = rdtsc();
>> +    if ( clocksource_is_tsc )
>> +        c->local_tsc_stamp = r->master_tsc_stamp;
>> +    else
>> +        c->local_tsc_stamp = rdtsc();
>>      c->stime_local_stamp = get_s_time();
>>      c->stime_master_stamp = r->master_stime;
>>  
> 
> The point of the rendezvous is to run rdtsc() at a the time on each cpu
> at the same time.  With this logic, it seems that you don't need the
> rendezvous at all.
> 
> Avoiding the time_calibration_std_rendezvous() entirely in this
> situation would be the better, surely?
Indeed, and would look cleaner too. I've changed the approach in this patch for
v2 following your guideline, along with some retesting.

> 
> ~Andrew
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/5] public/xen.h: add flags field to vcpu_time_info
  2016-03-21 11:43       ` Andrew Cooper
@ 2016-03-21 11:51         ` Joao Martins
  0 siblings, 0 replies; 30+ messages in thread
From: Joao Martins @ 2016-03-21 11:51 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Ian Jackson, Tim Deegan, Keir Fraser, Jan Beulich, xen-devel



On 03/21/2016 11:43 AM, Andrew Cooper wrote:
> On 21/03/16 11:42, Joao Martins wrote:
>>
>> On 03/18/2016 08:12 PM, Andrew Cooper wrote:
>>> On 17/03/16 16:12, Joao Martins wrote:
>>>> This field has two possible flags (as of latest pvclock ABI
>>>> shared with KVM).
>>>>
>>>> flags: bits in this field indicate extended capabilities
>>>> coordinated between the guest and the hypervisor.  Specifically
>>>> on KVM, availability of specific flags has to be checked in
>>>> 0x40000001 cpuid leaf. On Xen, we don't have that but we can
>>>> still check some of the flags after registering the time info
>>>> page since a force_update_vcpu_system_time is performed.
>>>>
>>>> Current flags are:
>>>>
>>>>  flag bit   | cpuid bit    | meaning
>>>> -------------------------------------------------------------
>>>>             |              | time measures taken across
>>>>      0      |      24      | multiple cpus are guaranteed to
>>>>             |              | be monotonic
>>>> -------------------------------------------------------------
>>>>             |              | guest vcpu has been paused by
>>>>      1      |     N/A      | the host
>>>>             |              |
>>>> -------------------------------------------------------------
>>>>
>>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>>> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>>> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>>
>> Thanks!
>>
>>> Has the Linux maintainers file been patched to include xen-devel, to
>>> avoid them altering our ABI unnoticed in the future?
>>>
>> Not yet, but I had planned to do so when sending the v2 of the linux side. But
>> perhaps you want it beforehand?
> 
> Just so long as it doesn't get lost.
It definitely won't get lost.

Joao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/5] x86/time: refactor read_platform_stime()
  2016-03-21 11:45     ` Joao Martins
@ 2016-03-21 13:08       ` Andrew Cooper
  0 siblings, 0 replies; 30+ messages in thread
From: Andrew Cooper @ 2016-03-21 13:08 UTC (permalink / raw)
  To: Joao Martins; +Cc: Keir Fraser, Jan Beulich, xen-devel

On 21/03/16 11:45, Joao Martins wrote:
>
> All fixed, though I found one change missing in this series, specifically on
> time_calibration_std_rendezvous. Otherwise this commit would break compilation.
> See chunk below for the change I am adding:
>
> @@ -1377,7 +1380,7 @@ static void time_calibration_std_rendezvous(void *_r)
>      {
>          while ( atomic_read(&r->semaphore) != (total_cpus - 1) )
>              cpu_relax();
> -        r->master_stime = read_platform_stime();
> +        r->master_stime = read_platform_stime(NULL);
>          mb(); /* write r->master_stime /then/ signal */
>          atomic_inc(&r->semaphore);
>      }
>
> Having this fixed, could I still keep your Reviewed-by?

Yes - that's fine.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/5] public/xen.h: add flags field to vcpu_time_info
  2016-03-17 16:12 ` [PATCH 1/5] public/xen.h: add flags field to vcpu_time_info Joao Martins
  2016-03-18 20:12   ` Andrew Cooper
@ 2016-03-21 15:10   ` Jan Beulich
  2016-03-21 15:27     ` Andrew Cooper
  1 sibling, 1 reply; 30+ messages in thread
From: Jan Beulich @ 2016-03-21 15:10 UTC (permalink / raw)
  To: Joao Martins; +Cc: Keir Fraser, Tim Deegan, Ian Jackson, xen-devel

>>> On 17.03.16 at 17:12, <joao.m.martins@oracle.com> wrote:
> --- a/xen/include/public/xen.h
> +++ b/xen/include/public/xen.h
> @@ -614,10 +614,14 @@ struct vcpu_time_info {
>       */
>      uint32_t tsc_to_system_mul;
>      int8_t   tsc_shift;
> -    int8_t   pad1[3];
> +    int8_t   flags;

For use as flags I'm sure this would better be uint8_t.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/5] public/xen.h: add flags field to vcpu_time_info
  2016-03-21 15:10   ` Jan Beulich
@ 2016-03-21 15:27     ` Andrew Cooper
  2016-03-21 15:40       ` Joao Martins
  0 siblings, 1 reply; 30+ messages in thread
From: Andrew Cooper @ 2016-03-21 15:27 UTC (permalink / raw)
  To: Jan Beulich, Joao Martins; +Cc: Tim Deegan, Keir Fraser, Ian Jackson, xen-devel

On 21/03/16 15:10, Jan Beulich wrote:
>>>> On 17.03.16 at 17:12, <joao.m.martins@oracle.com> wrote:
>> --- a/xen/include/public/xen.h
>> +++ b/xen/include/public/xen.h
>> @@ -614,10 +614,14 @@ struct vcpu_time_info {
>>       */
>>      uint32_t tsc_to_system_mul;
>>      int8_t   tsc_shift;
>> -    int8_t   pad1[3];
>> +    int8_t   flags;
> For use as flags I'm sure this would better be uint8_t.

Sadly not possible.  Linux have already made the above adjustment
without (CC'ing xen-devel), so that ABI is set.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/5] public/xen.h: add flags field to vcpu_time_info
  2016-03-21 15:27     ` Andrew Cooper
@ 2016-03-21 15:40       ` Joao Martins
  0 siblings, 0 replies; 30+ messages in thread
From: Joao Martins @ 2016-03-21 15:40 UTC (permalink / raw)
  To: Andrew Cooper, Jan Beulich
  Cc: Tim Deegan, Keir Fraser, Ian Jackson, xen-devel



On 03/21/2016 03:27 PM, Andrew Cooper wrote:
> On 21/03/16 15:10, Jan Beulich wrote:
>>>>> On 17.03.16 at 17:12, <joao.m.martins@oracle.com> wrote:
>>> --- a/xen/include/public/xen.h
>>> +++ b/xen/include/public/xen.h
>>> @@ -614,10 +614,14 @@ struct vcpu_time_info {
>>>       */
>>>      uint32_t tsc_to_system_mul;
>>>      int8_t   tsc_shift;
>>> -    int8_t   pad1[3];
>>> +    int8_t   flags;
>> For use as flags I'm sure this would better be uint8_t.
> 
> Sadly not possible.  Linux have already made the above adjustment
> without (CC'ing xen-devel), so that ABI is set.
> 
I was double checking again and it's my mistake. Both flags and pad fields are
uint8_t. I will fix both in v2, to make sure it's all kept the same on both
header files. My apologies.

Joao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/5] x86/time: implement tsc as clocksource
  2016-03-18 20:21   ` Andrew Cooper
  2016-03-21 11:43     ` Joao Martins
@ 2016-03-22 12:41     ` Joao Martins
  2016-03-22 12:46       ` Jan Beulich
  1 sibling, 1 reply; 30+ messages in thread
From: Joao Martins @ 2016-03-22 12:41 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Keir Fraser, Jan Beulich, xen-devel



On 03/18/2016 08:21 PM, Andrew Cooper wrote:
> On 17/03/16 16:12, Joao Martins wrote:
>> Introduce support for using TSC as platform time which is the highest
>> resolution time and most performant to get (~20 nsecs).  Though there
>> are also several problems associated with its usage, and there isn't a
>> complete (and architecturally defined) guarantee that all machines
>> will provide reliable and monotonic TSC across all CPUs, on different
>> sockets and different P/C states.  I believe Intel to be the only that
>> can guarantee that. For this reason it's set with less priority when
>> compared to HPET unless adminstrator changes "clocksource" boot option
>> to "tsc". Upon initializing it, we also check for time warps and
>> appropriate CPU features i.e.  TSC_RELIABLE, CONSTANT_TSC and
>> NONSTOP_TSC. And in case none of these conditions are met, we fail in
>> order to fallback to the next available clocksource.
>>
>> It is also worth noting that with clocksource=tsc there isn't any
>> need to synchronise with another clocksource, and I could verify that
>> great portion the time skew was eliminated and seeing much less time
>> warps happening. With HPET I observed ~500 warps in the period
>> of 1h of around 27 us, and with TSC down to 50 warps in the same
>> period having each warp < 100 ns. The warps still exist though but
>> are only related to cross CPU calibration (being how much it takes to
>> rendezvous with master), in which a later patch in this series aims to
>> solve.
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>> Cc: Keir Fraser <keir@xen.org>
>> Cc: Jan Beulich <jbeulich@suse.com>
>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> 
> Some style corrections, but no functional problems.
> 
> Reviewed-by Andrew Cooper <andrew.cooper3@citrix.com>
> 
I found out one issue in the tsc clocksource initialization path with respect to
the reliability check. This check is running when initializing platform timer,
but not all CPUS are up at that point (init_xen_time()) which means that the
check will always succeed. So for clocksource=tsc I need to defer initialization
to a later point (on verify_tsc_reliability()) and if successful switch to that
clocksource. Unless you disagree, v2 would have this and just requires one
additional preparatory change prior to this patch.

Joao

>>
>> Changes since RFC:
>>  - Spelling fixes in the commit message.
>>  - Remove unused clocksource_is_tsc variable and introduce it instead
>>  on the patch that uses it.
>>  - Move plt_tsc from second to last in the available clocksources.
>> ---
>>  xen/arch/x86/time.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++--
>>  1 file changed, 60 insertions(+), 2 deletions(-)
>>
>> diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
>> index 687e39b..1311c58 100644
>> --- a/xen/arch/x86/time.c
>> +++ b/xen/arch/x86/time.c
>> @@ -432,6 +432,63 @@ uint64_t ns_to_acpi_pm_tick(uint64_t ns)
>>  }
>>  
>>  /************************************************************
>> + * PLATFORM TIMER 4: TSC
>> + */
>> +static u64 tsc_freq;
>> +static unsigned long tsc_max_warp;
>> +static void tsc_check_reliability(void);
>> +
>> +static int __init init_tsctimer(struct platform_timesource *pts)
>> +{
>> +    bool_t tsc_reliable = 0;
>> +
>> +    tsc_check_reliability();
>> +
>> +    if ( tsc_max_warp > 0 )
>> +    {
>> +        tsc_reliable = 0;
>> +        printk("TSC: didn't passed warp test\n");
> 
> printk(XENLOG_INFO "TSC ...
> 
>> +    }
>> +    else if ( boot_cpu_has(X86_FEATURE_TSC_RELIABLE) ||
>> +              ( boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
>> +                boot_cpu_has(X86_FEATURE_NONSTOP_TSC) ) )
> 
> You don't need extra spaces on inner brackets, so
> 
> boot_cpu_has(X86_FEATURE_TSC_RELIABLE) ||
> (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
>  boot_cpu_has(X86_FEATURE_NONSTOP_TSC)) )
> 
> is fine.
> 
>> +    {
>> +        tsc_reliable = 1;
>> +    }
>> +    else if ( boot_cpu_has(X86_FEATURE_CONSTANT_TSC) )
>> +    {
>> +        tsc_reliable = (max_cstate <= 2);
>> +
>> +        if (tsc_reliable)
> 
> Please add some extra ones here though.
> 
>> +            printk("TSC: no deep Cstates, deemed reliable\n");
>> +        else
>> +            printk("TSC: deep Cstates possible, so not reliable\n");
> 
> XENLOG_INFO as well please.
> 
> ~Andrew
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/5] x86/time: implement tsc as clocksource
  2016-03-22 12:41     ` Joao Martins
@ 2016-03-22 12:46       ` Jan Beulich
  2016-03-22 15:51         ` Joao Martins
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Beulich @ 2016-03-22 12:46 UTC (permalink / raw)
  To: Joao Martins; +Cc: Andrew Cooper, Keir Fraser, xen-devel

>>> On 22.03.16 at 13:41, <joao.m.martins@oracle.com> wrote:

> 
> On 03/18/2016 08:21 PM, Andrew Cooper wrote:
>> On 17/03/16 16:12, Joao Martins wrote:
>>> Introduce support for using TSC as platform time which is the highest
>>> resolution time and most performant to get (~20 nsecs).  Though there
>>> are also several problems associated with its usage, and there isn't a
>>> complete (and architecturally defined) guarantee that all machines
>>> will provide reliable and monotonic TSC across all CPUs, on different
>>> sockets and different P/C states.  I believe Intel to be the only that
>>> can guarantee that. For this reason it's set with less priority when
>>> compared to HPET unless adminstrator changes "clocksource" boot option
>>> to "tsc". Upon initializing it, we also check for time warps and
>>> appropriate CPU features i.e.  TSC_RELIABLE, CONSTANT_TSC and
>>> NONSTOP_TSC. And in case none of these conditions are met, we fail in
>>> order to fallback to the next available clocksource.
>>>
>>> It is also worth noting that with clocksource=tsc there isn't any
>>> need to synchronise with another clocksource, and I could verify that
>>> great portion the time skew was eliminated and seeing much less time
>>> warps happening. With HPET I observed ~500 warps in the period
>>> of 1h of around 27 us, and with TSC down to 50 warps in the same
>>> period having each warp < 100 ns. The warps still exist though but
>>> are only related to cross CPU calibration (being how much it takes to
>>> rendezvous with master), in which a later patch in this series aims to
>>> solve.
>>>
>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>> ---
>>> Cc: Keir Fraser <keir@xen.org>
>>> Cc: Jan Beulich <jbeulich@suse.com>
>>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>> 
>> Some style corrections, but no functional problems.
>> 
>> Reviewed-by Andrew Cooper <andrew.cooper3@citrix.com>
>> 
> I found out one issue in the tsc clocksource initialization path with respect to
> the reliability check. This check is running when initializing platform timer,
> but not all CPUS are up at that point (init_xen_time()) which means that the
> check will always succeed. So for clocksource=tsc I need to defer initialization
> to a later point (on verify_tsc_reliability()) and if successful switch to that
> clocksource. Unless you disagree, v2 would have this and just requires one
> additional preparatory change prior to this patch.

Hmm, that's suspicious when thinking about CPU hotplug: What
do you intend to do when a CPU comes online later, failing the
check? So far we don't do runtime switching of the clock source,
and I'm not sure that's a good idea to do when there are running
guests.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/5] x86/time: implement tsc as clocksource
  2016-03-22 12:46       ` Jan Beulich
@ 2016-03-22 15:51         ` Joao Martins
  2016-03-22 16:02           ` Jan Beulich
  0 siblings, 1 reply; 30+ messages in thread
From: Joao Martins @ 2016-03-22 15:51 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Keir Fraser, xen-devel



On 03/22/2016 12:46 PM, Jan Beulich wrote:
>>>> On 22.03.16 at 13:41, <joao.m.martins@oracle.com> wrote:
> 
>>
>> On 03/18/2016 08:21 PM, Andrew Cooper wrote:
>>> On 17/03/16 16:12, Joao Martins wrote:
>>>> Introduce support for using TSC as platform time which is the highest
>>>> resolution time and most performant to get (~20 nsecs).  Though there
>>>> are also several problems associated with its usage, and there isn't a
>>>> complete (and architecturally defined) guarantee that all machines
>>>> will provide reliable and monotonic TSC across all CPUs, on different
>>>> sockets and different P/C states.  I believe Intel to be the only that
>>>> can guarantee that. For this reason it's set with less priority when
>>>> compared to HPET unless adminstrator changes "clocksource" boot option
>>>> to "tsc". Upon initializing it, we also check for time warps and
>>>> appropriate CPU features i.e.  TSC_RELIABLE, CONSTANT_TSC and
>>>> NONSTOP_TSC. And in case none of these conditions are met, we fail in
>>>> order to fallback to the next available clocksource.
>>>>
>>>> It is also worth noting that with clocksource=tsc there isn't any
>>>> need to synchronise with another clocksource, and I could verify that
>>>> great portion the time skew was eliminated and seeing much less time
>>>> warps happening. With HPET I observed ~500 warps in the period
>>>> of 1h of around 27 us, and with TSC down to 50 warps in the same
>>>> period having each warp < 100 ns. The warps still exist though but
>>>> are only related to cross CPU calibration (being how much it takes to
>>>> rendezvous with master), in which a later patch in this series aims to
>>>> solve.
>>>>
>>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>>> ---
>>>> Cc: Keir Fraser <keir@xen.org>
>>>> Cc: Jan Beulich <jbeulich@suse.com>
>>>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>>>
>>> Some style corrections, but no functional problems.
>>>
>>> Reviewed-by Andrew Cooper <andrew.cooper3@citrix.com>
>>>
>> I found out one issue in the tsc clocksource initialization path with respect to
>> the reliability check. This check is running when initializing platform timer,
>> but not all CPUS are up at that point (init_xen_time()) which means that the
>> check will always succeed. So for clocksource=tsc I need to defer initialization
>> to a later point (on verify_tsc_reliability()) and if successful switch to that
>> clocksource. Unless you disagree, v2 would have this and just requires one
>> additional preparatory change prior to this patch.
> 
> Hmm, that's suspicious when thinking about CPU hotplug: What
> do you intend to do when a CPU comes online later, failing the
> check?
Good point, but I am not sure whether that would happen. The initcall
verify_tsc_reliability seems to be called only for the boot processor. Or are
you saying that it's case that initcalls are called too when hotplugging cpus
later on? If that's the case then my suggestion wouldn't work as you point out -
or rather without having runtime switching support as you point out below.

> So far we don't do runtime switching of the clock source,
> and I'm not sure that's a good idea to do when there are running
> guests.
Totally agree, but I would be proposing to be at initialization phase where
there wouldn't be guests running. We would start with HPET, then only switch to
TSC if that check has passed (and would happen once).

Simpler alternative would be to call init_xen_time after all CPUs are brought up
(and would also keep this patch as is), but I am not sure about the
repercussions of that.

Joao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/5] x86/time: implement tsc as clocksource
  2016-03-22 15:51         ` Joao Martins
@ 2016-03-22 16:02           ` Jan Beulich
  2016-03-22 20:40             ` Joao Martins
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Beulich @ 2016-03-22 16:02 UTC (permalink / raw)
  To: Joao Martins; +Cc: Andrew Cooper, Keir Fraser, xen-devel

>>> On 22.03.16 at 16:51, <joao.m.martins@oracle.com> wrote:

> 
> On 03/22/2016 12:46 PM, Jan Beulich wrote:
>>>>> On 22.03.16 at 13:41, <joao.m.martins@oracle.com> wrote:
>> 
>>>
>>> On 03/18/2016 08:21 PM, Andrew Cooper wrote:
>>>> On 17/03/16 16:12, Joao Martins wrote:
>>>>> Introduce support for using TSC as platform time which is the highest
>>>>> resolution time and most performant to get (~20 nsecs).  Though there
>>>>> are also several problems associated with its usage, and there isn't a
>>>>> complete (and architecturally defined) guarantee that all machines
>>>>> will provide reliable and monotonic TSC across all CPUs, on different
>>>>> sockets and different P/C states.  I believe Intel to be the only that
>>>>> can guarantee that. For this reason it's set with less priority when
>>>>> compared to HPET unless adminstrator changes "clocksource" boot option
>>>>> to "tsc". Upon initializing it, we also check for time warps and
>>>>> appropriate CPU features i.e.  TSC_RELIABLE, CONSTANT_TSC and
>>>>> NONSTOP_TSC. And in case none of these conditions are met, we fail in
>>>>> order to fallback to the next available clocksource.
>>>>>
>>>>> It is also worth noting that with clocksource=tsc there isn't any
>>>>> need to synchronise with another clocksource, and I could verify that
>>>>> great portion the time skew was eliminated and seeing much less time
>>>>> warps happening. With HPET I observed ~500 warps in the period
>>>>> of 1h of around 27 us, and with TSC down to 50 warps in the same
>>>>> period having each warp < 100 ns. The warps still exist though but
>>>>> are only related to cross CPU calibration (being how much it takes to
>>>>> rendezvous with master), in which a later patch in this series aims to
>>>>> solve.
>>>>>
>>>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>>>> ---
>>>>> Cc: Keir Fraser <keir@xen.org>
>>>>> Cc: Jan Beulich <jbeulich@suse.com>
>>>>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>>>>
>>>> Some style corrections, but no functional problems.
>>>>
>>>> Reviewed-by Andrew Cooper <andrew.cooper3@citrix.com>
>>>>
>>> I found out one issue in the tsc clocksource initialization path with respect to
>>> the reliability check. This check is running when initializing platform timer,
>>> but not all CPUS are up at that point (init_xen_time()) which means that the
>>> check will always succeed. So for clocksource=tsc I need to defer initialization
>>> to a later point (on verify_tsc_reliability()) and if successful switch to that
>>> clocksource. Unless you disagree, v2 would have this and just requires one
>>> additional preparatory change prior to this patch.
>> 
>> Hmm, that's suspicious when thinking about CPU hotplug: What
>> do you intend to do when a CPU comes online later, failing the
>> check?
> Good point, but I am not sure whether that would happen. The initcall
> verify_tsc_reliability seems to be called only for the boot processor. Or are
> you saying that it's case that initcalls are called too when hotplugging cpus
> later on? If that's the case then my suggestion wouldn't work as you point out -
> or rather without having runtime switching support as you point out below.

Looks like I didn't express myself clearly enough: "failing the check"
wasn't meant to imply the failure would actually occur, but rather
that failure would occur if the check was run on that CPU. IOW the
case of a CPU getting hotplugged which doesn't satisfy the needs
for selecting TSC as the clock source.

>> So far we don't do runtime switching of the clock source,
>> and I'm not sure that's a good idea to do when there are running
>> guests.
> Totally agree, but I would be proposing to be at initialization phase where
> there wouldn't be guests running. We would start with HPET, then only switch 
> to
> TSC if that check has passed (and would happen once).
> 
> Simpler alternative would be to call init_xen_time after all CPUs are 
> brought up
> (and would also keep this patch as is), but I am not sure about the
> repercussions of that.

I don't see how either would help with the hotplug case.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/5] x86/time: implement tsc as clocksource
  2016-03-22 16:02           ` Jan Beulich
@ 2016-03-22 20:40             ` Joao Martins
  2016-03-23  7:28               ` Jan Beulich
  0 siblings, 1 reply; 30+ messages in thread
From: Joao Martins @ 2016-03-22 20:40 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Keir Fraser, xen-devel



On 03/22/2016 04:02 PM, Jan Beulich wrote:
>>>> On 22.03.16 at 16:51, <joao.m.martins@oracle.com> wrote:
> 
>>
>> On 03/22/2016 12:46 PM, Jan Beulich wrote:
>>>>>> On 22.03.16 at 13:41, <joao.m.martins@oracle.com> wrote:
>>>
>>>>
>>>> On 03/18/2016 08:21 PM, Andrew Cooper wrote:
>>>>> On 17/03/16 16:12, Joao Martins wrote:
>>>>>> Introduce support for using TSC as platform time which is the highest
>>>>>> resolution time and most performant to get (~20 nsecs).  Though there
>>>>>> are also several problems associated with its usage, and there isn't a
>>>>>> complete (and architecturally defined) guarantee that all machines
>>>>>> will provide reliable and monotonic TSC across all CPUs, on different
>>>>>> sockets and different P/C states.  I believe Intel to be the only that
>>>>>> can guarantee that. For this reason it's set with less priority when
>>>>>> compared to HPET unless adminstrator changes "clocksource" boot option
>>>>>> to "tsc". Upon initializing it, we also check for time warps and
>>>>>> appropriate CPU features i.e.  TSC_RELIABLE, CONSTANT_TSC and
>>>>>> NONSTOP_TSC. And in case none of these conditions are met, we fail in
>>>>>> order to fallback to the next available clocksource.
>>>>>>
>>>>>> It is also worth noting that with clocksource=tsc there isn't any
>>>>>> need to synchronise with another clocksource, and I could verify that
>>>>>> great portion the time skew was eliminated and seeing much less time
>>>>>> warps happening. With HPET I observed ~500 warps in the period
>>>>>> of 1h of around 27 us, and with TSC down to 50 warps in the same
>>>>>> period having each warp < 100 ns. The warps still exist though but
>>>>>> are only related to cross CPU calibration (being how much it takes to
>>>>>> rendezvous with master), in which a later patch in this series aims to
>>>>>> solve.
>>>>>>
>>>>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>>>>> ---
>>>>>> Cc: Keir Fraser <keir@xen.org>
>>>>>> Cc: Jan Beulich <jbeulich@suse.com>
>>>>>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>>>>>
>>>>> Some style corrections, but no functional problems.
>>>>>
>>>>> Reviewed-by Andrew Cooper <andrew.cooper3@citrix.com>
>>>>>
>>>> I found out one issue in the tsc clocksource initialization path with respect to
>>>> the reliability check. This check is running when initializing platform timer,
>>>> but not all CPUS are up at that point (init_xen_time()) which means that the
>>>> check will always succeed. So for clocksource=tsc I need to defer initialization
>>>> to a later point (on verify_tsc_reliability()) and if successful switch to that
>>>> clocksource. Unless you disagree, v2 would have this and just requires one
>>>> additional preparatory change prior to this patch.
>>>
>>> Hmm, that's suspicious when thinking about CPU hotplug: What
>>> do you intend to do when a CPU comes online later, failing the
>>> check?
>> Good point, but I am not sure whether that would happen. The initcall
>> verify_tsc_reliability seems to be called only for the boot processor. Or are
>> you saying that it's case that initcalls are called too when hotplugging cpus
>> later on? If that's the case then my suggestion wouldn't work as you point out -
>> or rather without having runtime switching support as you point out below.
> 
> Looks like I didn't express myself clearly enough: "failing the check"
> wasn't meant to imply the failure would actually occur, but rather
> that failure would occur if the check was run on that CPU. IOW the
> case of a CPU getting hotplugged which doesn't satisfy the needs
> for selecting TSC as the clock source.
Ah, I see. I believe this wouldn't be an issue with the current rendezvous
mechanism (std_rendezvous), as the delta is computed between local_tsc_stamp
taken in that (hotplugged) CPU in the calibration and the rdtsc() in the guest
same CPU, even though having CPU0 TSC (master) as system_time.

However it can be a problem with PVCLOCK_TSC_STABLE_BIT as the hotplugged CPU
could have its TSC drifted, and since setting this flag relies on
synchronization of TSCs we would need to clear the flag enterily.

>>> So far we don't do runtime switching of the clock source,
>>> and I'm not sure that's a good idea to do when there are running
>>> guests.
>> Totally agree, but I would be proposing to be at initialization phase where
>> there wouldn't be guests running. We would start with HPET, then only switch 
>> to
>> TSC if that check has passed (and would happen once).
>>
>> Simpler alternative would be to call init_xen_time after all CPUs are 
>> brought up
>> (and would also keep this patch as is), but I am not sure about the
>> repercussions of that.
> 
> I don't see how either would help with the hotplug case.
This was in response to what I thought you meant with your earlier question
(which I misunderstood). But my question is still valid I believe. The reason
for choosing between my suggested approaches is that tsc_check_reliability()
requires all CPUs up so that the check is correctly performed.

Joao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/5] x86/time: implement tsc as clocksource
  2016-03-22 20:40             ` Joao Martins
@ 2016-03-23  7:28               ` Jan Beulich
  2016-03-23 12:05                 ` Joao Martins
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Beulich @ 2016-03-23  7:28 UTC (permalink / raw)
  To: Joao Martins; +Cc: Andrew Cooper, Keir Fraser, xen-devel

>>> On 22.03.16 at 21:40, <joao.m.martins@oracle.com> wrote:
> On 03/22/2016 04:02 PM, Jan Beulich wrote:
>>>>> On 22.03.16 at 16:51, <joao.m.martins@oracle.com> wrote:
>>> On 03/22/2016 12:46 PM, Jan Beulich wrote:
>>>>>>> On 22.03.16 at 13:41, <joao.m.martins@oracle.com> wrote:
>>>>>>
>>>>> I found out one issue in the tsc clocksource initialization path with respect to
>>>>> the reliability check. This check is running when initializing platform timer,
>>>>> but not all CPUS are up at that point (init_xen_time()) which means that the
>>>>> check will always succeed. So for clocksource=tsc I need to defer initialization
>>>>> to a later point (on verify_tsc_reliability()) and if successful switch to that
>>>>> clocksource. Unless you disagree, v2 would have this and just requires one
>>>>> additional preparatory change prior to this patch.
>>>>
>>>> Hmm, that's suspicious when thinking about CPU hotplug: What
>>>> do you intend to do when a CPU comes online later, failing the
>>>> check?
>>> Good point, but I am not sure whether that would happen. The initcall
>>> verify_tsc_reliability seems to be called only for the boot processor. Or are
>>> you saying that it's case that initcalls are called too when hotplugging cpus
>>> later on? If that's the case then my suggestion wouldn't work as you point out -
>>> or rather without having runtime switching support as you point out below.
>> 
>> Looks like I didn't express myself clearly enough: "failing the check"
>> wasn't meant to imply the failure would actually occur, but rather
>> that failure would occur if the check was run on that CPU. IOW the
>> case of a CPU getting hotplugged which doesn't satisfy the needs
>> for selecting TSC as the clock source.
> Ah, I see. I believe this wouldn't be an issue with the current rendezvous
> mechanism (std_rendezvous), as the delta is computed between local_tsc_stamp
> taken in that (hotplugged) CPU in the calibration and the rdtsc() in the 
> guest
> same CPU, even though having CPU0 TSC (master) as system_time.
> 
> However it can be a problem with PVCLOCK_TSC_STABLE_BIT as the hotplugged CPU
> could have its TSC drifted, and since setting this flag relies on
> synchronization of TSCs we would need to clear the flag enterily.

Except that we can't, after guests already got started, validly clear
that flag afaics. The only option I see here would be to never set
this flag if CPU hotplug is possible - by looking at the hot pluggable
CPU count and, if non-zero, perhaps allowing a command line
override to indicate no hotplug is intended (it may well be that such
is already implicitly available).

>>>> So far we don't do runtime switching of the clock source,
>>>> and I'm not sure that's a good idea to do when there are running
>>>> guests.
>>> Totally agree, but I would be proposing to be at initialization phase where
>>> there wouldn't be guests running. We would start with HPET, then only switch 
>>> to
>>> TSC if that check has passed (and would happen once).
>>>
>>> Simpler alternative would be to call init_xen_time after all CPUs are 
>>> brought up
>>> (and would also keep this patch as is), but I am not sure about the
>>> repercussions of that.
>> 
>> I don't see how either would help with the hotplug case.
> This was in response to what I thought you meant with your earlier question
> (which I misunderstood). But my question is still valid I believe. The 
> reason
> for choosing between my suggested approaches is that tsc_check_reliability()
> requires all CPUs up so that the check is correctly performed.

Sure - it seems quite obvious that all boot time available CPUs
should be checked.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/5] x86/time: implement tsc as clocksource
  2016-03-23  7:28               ` Jan Beulich
@ 2016-03-23 12:05                 ` Joao Martins
  2016-03-23 14:05                   ` Jan Beulich
  0 siblings, 1 reply; 30+ messages in thread
From: Joao Martins @ 2016-03-23 12:05 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Keir Fraser, xen-devel



On 03/23/2016 07:28 AM, Jan Beulich wrote:
>>>> On 22.03.16 at 21:40, <joao.m.martins@oracle.com> wrote:
>> On 03/22/2016 04:02 PM, Jan Beulich wrote:
>>>>>> On 22.03.16 at 16:51, <joao.m.martins@oracle.com> wrote:
>>>> On 03/22/2016 12:46 PM, Jan Beulich wrote:
>>>>>>>> On 22.03.16 at 13:41, <joao.m.martins@oracle.com> wrote:
>>>>>>>
>>>>>> I found out one issue in the tsc clocksource initialization path with respect to
>>>>>> the reliability check. This check is running when initializing platform timer,
>>>>>> but not all CPUS are up at that point (init_xen_time()) which means that the
>>>>>> check will always succeed. So for clocksource=tsc I need to defer initialization
>>>>>> to a later point (on verify_tsc_reliability()) and if successful switch to that
>>>>>> clocksource. Unless you disagree, v2 would have this and just requires one
>>>>>> additional preparatory change prior to this patch.
>>>>>
>>>>> Hmm, that's suspicious when thinking about CPU hotplug: What
>>>>> do you intend to do when a CPU comes online later, failing the
>>>>> check?
>>>> Good point, but I am not sure whether that would happen. The initcall
>>>> verify_tsc_reliability seems to be called only for the boot processor. Or are
>>>> you saying that it's case that initcalls are called too when hotplugging cpus
>>>> later on? If that's the case then my suggestion wouldn't work as you point out -
>>>> or rather without having runtime switching support as you point out below.
>>>
>>> Looks like I didn't express myself clearly enough: "failing the check"
>>> wasn't meant to imply the failure would actually occur, but rather
>>> that failure would occur if the check was run on that CPU. IOW the
>>> case of a CPU getting hotplugged which doesn't satisfy the needs
>>> for selecting TSC as the clock source.
>> Ah, I see. I believe this wouldn't be an issue with the current rendezvous
>> mechanism (std_rendezvous), as the delta is computed between local_tsc_stamp
>> taken in that (hotplugged) CPU in the calibration and the rdtsc() in the 
>> guest
>> same CPU, even though having CPU0 TSC (master) as system_time.
>>
>> However it can be a problem with PVCLOCK_TSC_STABLE_BIT as the hotplugged CPU
>> could have its TSC drifted, and since setting this flag relies on
>> synchronization of TSCs we would need to clear the flag enterily.
> 
> Except that we can't, after guests already got started, validly clear
> that flag afaics.
Correct.

> The only option I see here would be to never set
> this flag if CPU hotplug is possible - by looking at the hot pluggable
> CPU count and, if non-zero, perhaps allowing a command line
> override to indicate no hotplug is intended (it may well be that such
> is already implicitly available).
OK, will add this then to allow the flag only if the conditions above are met.
Thanks for the pointer!

>>>>> So far we don't do runtime switching of the clock source,
>>>>> and I'm not sure that's a good idea to do when there are running
>>>>> guests.
>>>> Totally agree, but I would be proposing to be at initialization phase where
>>>> there wouldn't be guests running. We would start with HPET, then only switch 
>>>> to
>>>> TSC if that check has passed (and would happen once).
>>>>
>>>> Simpler alternative would be to call init_xen_time after all CPUs are 
>>>> brought up
>>>> (and would also keep this patch as is), but I am not sure about the
>>>> repercussions of that.
>>>
>>> I don't see how either would help with the hotplug case.
>> This was in response to what I thought you meant with your earlier question
>> (which I misunderstood). But my question is still valid I believe. The 
>> reason
>> for choosing between my suggested approaches is that tsc_check_reliability()
>> requires all CPUs up so that the check is correctly performed.
> 
> Sure - it seems quite obvious that all boot time available CPUs
> should be checked.
Cool, so I will go with moving init_xen_time right after all CPUs are up but
before initcalls are invoked.

Joao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/5] x86/time: implement tsc as clocksource
  2016-03-23 12:05                 ` Joao Martins
@ 2016-03-23 14:05                   ` Jan Beulich
  0 siblings, 0 replies; 30+ messages in thread
From: Jan Beulich @ 2016-03-23 14:05 UTC (permalink / raw)
  To: Joao Martins; +Cc: Andrew Cooper, Keir Fraser, xen-devel

>>> On 23.03.16 at 13:05, <joao.m.martins@oracle.com> wrote:
> On 03/23/2016 07:28 AM, Jan Beulich wrote:
>> Sure - it seems quite obvious that all boot time available CPUs
>> should be checked.
> Cool, so I will go with moving init_xen_time right after all CPUs are up but
> before initcalls are invoked.

I think your other alternative seemed more reasonable; I wouldn't
be very happy to see init_xen_time() moved around.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2016-03-23 14:05 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-17 16:12 [PATCH 0/5] x86/time: PVCLOCK_TSC_STABLE_BIT support Joao Martins
2016-03-17 16:12 ` [PATCH 1/5] public/xen.h: add flags field to vcpu_time_info Joao Martins
2016-03-18 20:12   ` Andrew Cooper
2016-03-21 11:42     ` Joao Martins
2016-03-21 11:43       ` Andrew Cooper
2016-03-21 11:51         ` Joao Martins
2016-03-21 15:10   ` Jan Beulich
2016-03-21 15:27     ` Andrew Cooper
2016-03-21 15:40       ` Joao Martins
2016-03-17 16:12 ` [PATCH 2/5] x86/time: implement tsc as clocksource Joao Martins
2016-03-18 20:21   ` Andrew Cooper
2016-03-21 11:43     ` Joao Martins
2016-03-22 12:41     ` Joao Martins
2016-03-22 12:46       ` Jan Beulich
2016-03-22 15:51         ` Joao Martins
2016-03-22 16:02           ` Jan Beulich
2016-03-22 20:40             ` Joao Martins
2016-03-23  7:28               ` Jan Beulich
2016-03-23 12:05                 ` Joao Martins
2016-03-23 14:05                   ` Jan Beulich
2016-03-17 16:12 ` [PATCH 3/5] x86/time: streamline platform time init on plt_init() Joao Martins
2016-03-18 20:32   ` Andrew Cooper
2016-03-21 11:45     ` Joao Martins
2016-03-17 16:12 ` [PATCH 4/5] x86/time: refactor read_platform_stime() Joao Martins
2016-03-18 20:34   ` Andrew Cooper
2016-03-21 11:45     ` Joao Martins
2016-03-21 13:08       ` Andrew Cooper
2016-03-17 16:12 ` [PATCH 5/5] x86/time: implement PVCLOCK_TSC_STABLE_BIT Joao Martins
2016-03-18 20:58   ` Andrew Cooper
2016-03-21 11:50     ` Joao Martins

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).