xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/6] x86/time: PVCLOCK_TSC_STABLE_BIT support
@ 2016-03-29 13:44 Joao Martins
  2016-03-29 13:44 ` [PATCH v2 1/6] public/xen.h: add flags field to vcpu_time_info Joao Martins
                   ` (5 more replies)
  0 siblings, 6 replies; 43+ messages in thread
From: Joao Martins @ 2016-03-29 13:44 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Joao Martins, Keir Fraser, Jan Beulich

Hey,

This series is the v2 of pvclock TSC series (v1 presented here [0]).

PVCLOCK_TSC_STABLE_BIT is the flag telling the guest that the
vcpu_time_info (pvti) are monotonic as seen by any CPU, a feature
which is currently not supported. As it is (i.e. bindly setting the
flag), we can observe that this property isn't there: a process using
vdso clock_gettime/gettimeofday will observe a significant amount of
warps (i.e. time going backwards) and it's due to 1) time calibration
skew in xen rendezvous algorithm 2) clocksource not in sync with TSC.
These warps are seen more frequently on PV guests (potentially because
vcpu time infos are only updated when guest is in kernel mode, and
perhaps lack of tsc_offset?), and in rare ocasions on HVM guests. It
is worth noting that with guests VCPUs pinned, only PV guests see
these warps. But on HVM guests specifically: such warps only occur
when one of guest VCPUs is pinned to CPU0. This series aims to propose
a solution to that and it's divided as following:

 U 	* Patch 1: Adds the missing flag field to vcpu_time_info.
 N 	* Patch 2: Small refactor around init_platform_time to reuse
                   initialization code when switching to TSC.
 U 	* Patch 3: Adds a new clocksource based on TSC
 R,R 	* Patch 4, 5: Adjustments for patch 5
 U	* Patch 5: Implements the PVCLOCK_TSC_STABLE_BIT

	[ R := Reviewed-by ;; U := Updated ;; N := New ]

PVCLOCK_TSC_STABLE_BIT is set only when using clocksource=tsc and CPU
hotplug is not meant to be perfomed, and remains optional unless specified
by the admin. nocpuhotplug is the option telling that no cpu hotplug is
to be perfomed which overrides system default behaviour when initializing
TSC i.e. testing max_vcpus and number of present cpus.

The test was running time-warp-test, that constantly calls
clock_gettime/gettimeofday on every CPU. It measures a delta with the
previous returned value and mark a warp if it's negative. I measured
it during periods of 1h and 6h and check how many warps and their
values (alongside the amount of time skew). Measurements/Changelog are
included in individual patches.

Note that most of the testing has been done with Linux 4.4 in which
these warps/skew could be easily observed as vdso would use each vCPU
pvti. Though Linux >= 4.5 changes this behaviour and use only the
vCPU0 pvti though still requiring PVCLOCK_TSC_STABLE_BIT flag
support.

Thanks!
Joao

[0] http://bugs.xenproject.org/xen/mid/%3C1458231136-13457-1-git-send-email-joao.m.martins@oracle.com%3E

Joao Martins (6):
  public/xen.h: add flags field to vcpu_time_info
  x86/time: refactor init_platform_time()
  x86/time: implement tsc as clocksource
  x86/time: streamline platform time init on plt_init()
  x86/time: refactor read_platform_stime()
  x86/time: implement PVCLOCK_TSC_STABLE_BIT

 xen/arch/x86/time.c      | 236 +++++++++++++++++++++++++++++++++++++++++------
 xen/include/public/xen.h |   6 +-
 2 files changed, 212 insertions(+), 30 deletions(-)

-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH v2 1/6] public/xen.h: add flags field to vcpu_time_info
  2016-03-29 13:44 [PATCH v2 0/6] x86/time: PVCLOCK_TSC_STABLE_BIT support Joao Martins
@ 2016-03-29 13:44 ` Joao Martins
  2016-03-30 15:49   ` Ian Jackson
                     ` (2 more replies)
  2016-03-29 13:44 ` [PATCH v2 2/6] x86/time: refactor init_platform_time() Joao Martins
                   ` (4 subsequent siblings)
  5 siblings, 3 replies; 43+ messages in thread
From: Joao Martins @ 2016-03-29 13:44 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser, Joao Martins, Ian Jackson, Jan Beulich, Tim Deegan

This field has two possible flags (as of latest pvclock ABI
shared with KVM).

flags: bits in this field indicate extended capabilities
coordinated between the guest and the hypervisor.  Specifically
on KVM, availability of specific flags has to be checked in
0x40000001 cpuid leaf. On Xen, we don't have that but we can
still check some of the flags after registering the time info
page since a force_update_vcpu_system_time is performed.

Current flags are:

 flag bit   | cpuid bit    | meaning
-------------------------------------------------------------
            |              | time measures taken across
     0      |      24      | multiple cpus are guaranteed to
            |              | be monotonic
-------------------------------------------------------------
            |              | guest vcpu has been paused by
     1      |     N/A      | the host
            |              |
-------------------------------------------------------------

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Tim Deegan <tim@xen.org>

Changes since v1:
 - flags and pad are both uint8_t.
 - fix indentation with the other fields in the struct.
---
 xen/include/public/xen.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index 64ba7ab..b263fe3 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -614,10 +614,14 @@ struct vcpu_time_info {
      */
     uint32_t tsc_to_system_mul;
     int8_t   tsc_shift;
-    int8_t   pad1[3];
+    uint8_t  flags;
+    uint8_t  pad1[2];
 }; /* 32 bytes */
 typedef struct vcpu_time_info vcpu_time_info_t;
 
+#define PVCLOCK_TSC_STABLE_BIT	(1 << 0)
+#define PVCLOCK_GUEST_STOPPED	(1 << 1)
+
 struct vcpu_info {
     /*
      * 'evtchn_upcall_pending' is written non-zero by Xen to indicate
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH v2 2/6] x86/time: refactor init_platform_time()
  2016-03-29 13:44 [PATCH v2 0/6] x86/time: PVCLOCK_TSC_STABLE_BIT support Joao Martins
  2016-03-29 13:44 ` [PATCH v2 1/6] public/xen.h: add flags field to vcpu_time_info Joao Martins
@ 2016-03-29 13:44 ` Joao Martins
  2016-04-01 16:10   ` Konrad Rzeszutek Wilk
  2016-04-05 10:09   ` Jan Beulich
  2016-03-29 13:44 ` [PATCH v2 3/6] x86/time: implement tsc as clocksource Joao Martins
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 43+ messages in thread
From: Joao Martins @ 2016-03-29 13:44 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Joao Martins, Keir Fraser, Jan Beulich

And accomodate platform time source initialization in
try_platform_time(). This is a preparatory patch for deferring
TSC clocksource initialization to the stage where all CPUS are
up (verify_tsc_reliability init call).

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

New in v2.
---
 xen/arch/x86/time.c | 41 ++++++++++++++++++++++++++---------------
 1 file changed, 26 insertions(+), 15 deletions(-)

diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 687e39b..ed4ed24 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -533,6 +533,30 @@ static void resume_platform_timer(void)
     plt_stamp = plt_src.read_counter();
 }
 
+static int __init try_platform_timer(struct platform_timesource *pts)
+{
+    int rc = -1;
+
+    rc = pts->init(pts);
+    if ( rc <= 0 )
+        return rc;
+
+    plt_mask = (u64)~0ull >> (64 - pts->counter_bits);
+
+    set_time_scale(&plt_scale, pts->frequency);
+
+    plt_overflow_period = scale_delta(
+        1ull << (pts->counter_bits - 1), &plt_scale);
+    init_timer(&plt_overflow_timer, plt_overflow, NULL, 0);
+    plt_src = *pts;
+    plt_overflow(NULL);
+
+    platform_timer_stamp = plt_stamp64;
+    stime_platform_stamp = NOW();
+
+    return rc;
+}
+
 static void __init init_platform_timer(void)
 {
     static struct platform_timesource * __initdata plt_timers[] = {
@@ -549,7 +573,7 @@ static void __init init_platform_timer(void)
             pts = plt_timers[i];
             if ( !strcmp(opt_clocksource, pts->id) )
             {
-                rc = pts->init(pts);
+                rc = try_platform_timer(pts);
                 break;
             }
         }
@@ -565,26 +589,13 @@ static void __init init_platform_timer(void)
         for ( i = 0; i < ARRAY_SIZE(plt_timers); i++ )
         {
             pts = plt_timers[i];
-            if ( (rc = pts->init(pts)) > 0 )
+            if ( (rc = try_platform_timer(pts)) > 0 )
                 break;
         }
     }
 
     BUG_ON(rc <= 0);
 
-    plt_mask = (u64)~0ull >> (64 - pts->counter_bits);
-
-    set_time_scale(&plt_scale, pts->frequency);
-
-    plt_overflow_period = scale_delta(
-        1ull << (pts->counter_bits-1), &plt_scale);
-    init_timer(&plt_overflow_timer, plt_overflow, NULL, 0);
-    plt_src = *pts;
-    plt_overflow(NULL);
-
-    platform_timer_stamp = plt_stamp64;
-    stime_platform_stamp = NOW();
-
     printk("Platform timer is %s %s\n",
            freq_string(pts->frequency), pts->name);
 }
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH v2 3/6] x86/time: implement tsc as clocksource
  2016-03-29 13:44 [PATCH v2 0/6] x86/time: PVCLOCK_TSC_STABLE_BIT support Joao Martins
  2016-03-29 13:44 ` [PATCH v2 1/6] public/xen.h: add flags field to vcpu_time_info Joao Martins
  2016-03-29 13:44 ` [PATCH v2 2/6] x86/time: refactor init_platform_time() Joao Martins
@ 2016-03-29 13:44 ` Joao Martins
  2016-03-29 17:39   ` Konrad Rzeszutek Wilk
                     ` (2 more replies)
  2016-03-29 13:44 ` [PATCH v2 4/6] x86/time: streamline platform time init on plt_init() Joao Martins
                   ` (2 subsequent siblings)
  5 siblings, 3 replies; 43+ messages in thread
From: Joao Martins @ 2016-03-29 13:44 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Joao Martins, Keir Fraser, Jan Beulich

Introduce support for using TSC as platform time which is the highest
resolution time and most performant to get (~20 nsecs).  Though there
are also several problems associated with its usage, and there isn't a
complete (and architecturally defined) guarantee that all machines
will provide reliable and monotonic TSC across all CPUs, on different
sockets and different P/C states.  I believe Intel to be the only that
can guarantee that. For this reason it's only set when adminstrator
changes "clocksource" boot option to "tsc". Initializing TSC
clocksource requires all CPUs to have the tsc reliability checks
performed. init_xen_time is called before all CPUs are up, and so we
start with HPET at boot time, and switch later to TSC. The switch then
happens on verify_tsc_reliability initcall that is invoked when all
CPUs are up. When attempting to initializing TSC we also check for
time warps and appropriate CPU features i.e.  TSC_RELIABLE,
CONSTANT_TSC and NONSTOP_TSC. And in case none of these conditions are
met, we keep the clocksource that was previously initialized on
init_xen_time.

It is also worth noting that with clocksource=tsc there isn't any
need to synchronise with another clocksource, and I could verify that
great portion the time skew was eliminated and seeing much less time
warps happening. With HPET I used to observe ~500 warps in the period
of 1h of around 27 us, and with TSC down to 50 warps in the same
period having each warp < 100 ns. The warps still exist though but
are only related to cross CPU calibration (being how much it takes to
rendezvous with master), in which a later patch in this series aims to
solve.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

Changes since v1:
 - s/printk/printk(XENLOG_INFO
 - Remove extra space on inner brackets
 - Add missing space around brackets
 - Defer clocksource TSC initialization when all CPUs are up.

Changes since RFC:
 - Spelling fixes in the commit message.
 - Remove unused clocksource_is_tsc variable and introduce it instead
 on the patch that uses it.
 - Move plt_tsc from second to last in the available clocksources.
---
 xen/arch/x86/time.c | 97 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 95 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index ed4ed24..2602dda 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -432,6 +432,63 @@ uint64_t ns_to_acpi_pm_tick(uint64_t ns)
 }
 
 /************************************************************
+ * PLATFORM TIMER 4: TSC
+ */
+static u64 tsc_freq;
+static unsigned long tsc_max_warp;
+static void tsc_check_reliability(void);
+
+static int __init init_tsctimer(struct platform_timesource *pts)
+{
+    bool_t tsc_reliable = 0;
+
+    tsc_check_reliability();
+
+    if ( tsc_max_warp > 0 )
+    {
+        tsc_reliable = 0;
+        printk(XENLOG_INFO "TSC: didn't passed warp test\n");
+    }
+    else if ( boot_cpu_has(X86_FEATURE_TSC_RELIABLE) ||
+              (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
+               boot_cpu_has(X86_FEATURE_NONSTOP_TSC)) )
+    {
+        tsc_reliable = 1;
+    }
+    else if ( boot_cpu_has(X86_FEATURE_CONSTANT_TSC) )
+    {
+        tsc_reliable = (max_cstate <= 2);
+
+        if ( tsc_reliable )
+            printk(XENLOG_INFO "TSC: no deep Cstates, deemed reliable\n");
+        else
+            printk(XENLOG_INFO "TSC: deep Cstates possible, so not reliable\n");
+    }
+
+    pts->frequency = tsc_freq;
+    return tsc_reliable;
+}
+
+static u64 read_tsc(void)
+{
+    return rdtsc();
+}
+
+static void resume_tsctimer(struct platform_timesource *pts)
+{
+}
+
+static struct platform_timesource __initdata plt_tsc =
+{
+    .id = "tsc",
+    .name = "TSC",
+    .read_counter = read_tsc,
+    .counter_bits = 64,
+    .init = init_tsctimer,
+    .resume = resume_tsctimer,
+};
+
+/************************************************************
  * GENERIC PLATFORM TIMER INFRASTRUCTURE
  */
 
@@ -533,6 +590,21 @@ static void resume_platform_timer(void)
     plt_stamp = plt_src.read_counter();
 }
 
+static void __init reset_platform_timer(void)
+{
+    /* Deactivate any timers running */
+    kill_timer(&plt_overflow_timer);
+    kill_timer(&calibration_timer);
+
+    /* Reset counters and stamps */
+    spin_lock_irq(&platform_timer_lock);
+    plt_stamp = 0;
+    plt_stamp64 = 0;
+    platform_timer_stamp = 0;
+    stime_platform_stamp = 0;
+    spin_unlock_irq(&platform_timer_lock);
+}
+
 static int __init try_platform_timer(struct platform_timesource *pts)
 {
     int rc = -1;
@@ -541,6 +613,10 @@ static int __init try_platform_timer(struct platform_timesource *pts)
     if ( rc <= 0 )
         return rc;
 
+    /* We have a platform timesource already so reset it */
+    if ( plt_src.counter_bits != 0 )
+        reset_platform_timer();
+
     plt_mask = (u64)~0ull >> (64 - pts->counter_bits);
 
     set_time_scale(&plt_scale, pts->frequency);
@@ -566,7 +642,9 @@ static void __init init_platform_timer(void)
     struct platform_timesource *pts = NULL;
     int i, rc = -1;
 
-    if ( opt_clocksource[0] != '\0' )
+    /* clocksource=tsc is initialized later when all CPUS are up */
+    if ( (opt_clocksource[0] != '\0') &&
+         (strcmp(opt_clocksource, "tsc") != 0) )
     {
         for ( i = 0; i < ARRAY_SIZE(plt_timers); i++ )
         {
@@ -1192,7 +1270,7 @@ static void check_tsc_warp(unsigned long tsc_khz, unsigned long *max_warp)
     }
 }
 
-static unsigned long tsc_max_warp, tsc_check_count;
+static unsigned long tsc_check_count;
 static cpumask_t tsc_check_cpumask;
 
 static void tsc_check_slave(void *unused)
@@ -1437,6 +1515,20 @@ static int __init verify_tsc_reliability(void)
         }
     }
 
+    if ( !strcmp(opt_clocksource, "tsc") )
+    {
+        if ( try_platform_timer(&plt_tsc) > 0 )
+        {
+            printk(XENLOG_INFO "Switched to Platform timer %s TSC\n",
+                   freq_string(plt_src.frequency));
+
+            init_percpu_time();
+
+            init_timer(&calibration_timer, time_calibration, NULL, 0);
+            set_timer(&calibration_timer, NOW() + EPOCH);
+        }
+    }
+
     return 0;
 }
 __initcall(verify_tsc_reliability);
@@ -1476,6 +1568,7 @@ void __init early_time_init(void)
     struct cpu_time *t = &this_cpu(cpu_time);
     u64 tmp = init_pit_and_calibrate_tsc();
 
+    tsc_freq = tmp;
     set_time_scale(&t->tsc_scale, tmp);
     t->local_tsc_stamp = boot_tsc_stamp;
 
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH v2 4/6] x86/time: streamline platform time init on plt_init()
  2016-03-29 13:44 [PATCH v2 0/6] x86/time: PVCLOCK_TSC_STABLE_BIT support Joao Martins
                   ` (2 preceding siblings ...)
  2016-03-29 13:44 ` [PATCH v2 3/6] x86/time: implement tsc as clocksource Joao Martins
@ 2016-03-29 13:44 ` Joao Martins
  2016-04-05 11:46   ` Jan Beulich
  2016-03-29 13:44 ` [PATCH v2 5/6] x86/time: refactor read_platform_stime() Joao Martins
  2016-03-29 13:44 ` [PATCH v2 6/6] x86/time: implement PVCLOCK_TSC_STABLE_BIT Joao Martins
  5 siblings, 1 reply; 43+ messages in thread
From: Joao Martins @ 2016-03-29 13:44 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Joao Martins, Keir Fraser, Jan Beulich

And use to initialize platform time solely for clocksource=tsc,
as opposed to initializing platform overflow timer, which would
only fire in ~180 years (on 2.2 Ghz Broadwell processor).

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

Changes since v1:
 - Remove initialisation to 0.
 - Remove assignment and return mix.
 - Fix the style on plt_overflow_period calculation.

Changes since RFC:
 - Move clocksource_is_tsc variable to this patch.
 - s/1000000000/SECONDS(1)
---
 xen/arch/x86/time.c | 44 ++++++++++++++++++++++++++++++++++++--------
 1 file changed, 36 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 2602dda..9cadfcb 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -434,6 +434,7 @@ uint64_t ns_to_acpi_pm_tick(uint64_t ns)
 /************************************************************
  * PLATFORM TIMER 4: TSC
  */
+static bool_t clocksource_is_tsc;
 static u64 tsc_freq;
 static unsigned long tsc_max_warp;
 static void tsc_check_reliability(void);
@@ -466,6 +467,8 @@ static int __init init_tsctimer(struct platform_timesource *pts)
     }
 
     pts->frequency = tsc_freq;
+    clocksource_is_tsc = tsc_reliable;
+
     return tsc_reliable;
 }
 
@@ -516,17 +519,31 @@ static s_time_t __read_platform_stime(u64 platform_time)
     return (stime_platform_stamp + scale_delta(diff, &plt_scale));
 }
 
+static void __plt_init(void)
+{
+    u64 count;
+
+    ASSERT(spin_is_locked(&platform_timer_lock));
+    count = plt_src.read_counter();
+    plt_stamp64 += (count - plt_stamp) & plt_mask;
+    plt_stamp = count;
+}
+
+static void plt_init(void)
+{
+    spin_lock_irq(&platform_timer_lock);
+    __plt_init();
+    spin_unlock_irq(&platform_timer_lock);
+}
+
 static void plt_overflow(void *unused)
 {
     int i;
-    u64 count;
     s_time_t now, plt_now, plt_wrap;
 
     spin_lock_irq(&platform_timer_lock);
 
-    count = plt_src.read_counter();
-    plt_stamp64 += (count - plt_stamp) & plt_mask;
-    plt_stamp = count;
+    __plt_init();
 
     now = NOW();
     plt_wrap = __read_platform_stime(plt_stamp64);
@@ -621,11 +638,22 @@ static int __init try_platform_timer(struct platform_timesource *pts)
 
     set_time_scale(&plt_scale, pts->frequency);
 
-    plt_overflow_period = scale_delta(
-        1ull << (pts->counter_bits - 1), &plt_scale);
-    init_timer(&plt_overflow_timer, plt_overflow, NULL, 0);
     plt_src = *pts;
-    plt_overflow(NULL);
+
+    if ( clocksource_is_tsc )
+    {
+        plt_init();
+    }
+    else
+    {
+        plt_overflow_period = scale_delta(
+            1ull << (pts->counter_bits - 1), &plt_scale);
+        init_timer(&plt_overflow_timer, plt_overflow, NULL, 0);
+        plt_overflow(NULL);
+
+        printk("Platform timer overflow period is %lu secs\n",
+               plt_overflow_period/SECONDS(1));
+    }
 
     platform_timer_stamp = plt_stamp64;
     stime_platform_stamp = NOW();
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH v2 5/6] x86/time: refactor read_platform_stime()
  2016-03-29 13:44 [PATCH v2 0/6] x86/time: PVCLOCK_TSC_STABLE_BIT support Joao Martins
                   ` (3 preceding siblings ...)
  2016-03-29 13:44 ` [PATCH v2 4/6] x86/time: streamline platform time init on plt_init() Joao Martins
@ 2016-03-29 13:44 ` Joao Martins
  2016-04-01 18:32   ` Konrad Rzeszutek Wilk
  2016-04-05 11:52   ` Jan Beulich
  2016-03-29 13:44 ` [PATCH v2 6/6] x86/time: implement PVCLOCK_TSC_STABLE_BIT Joao Martins
  5 siblings, 2 replies; 43+ messages in thread
From: Joao Martins @ 2016-03-29 13:44 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Joao Martins, Keir Fraser, Jan Beulich

To fetch the last read from the clocksource which was used to
calculate system_time. In the case of clocksource=tsc we will
use it to set tsc_timestamp.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

Changes since v1:
 - Add missing spaces between brackets
 - Move plt_stamp_counter to read_platform_stime()
 - Add signature change in time_calibration_std_rendezvous()
---
 xen/arch/x86/time.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 9cadfcb..123aa42 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -568,16 +568,19 @@ static void plt_overflow(void *unused)
     set_timer(&plt_overflow_timer, NOW() + plt_overflow_period);
 }
 
-static s_time_t read_platform_stime(void)
+static s_time_t read_platform_stime(u64 *stamp)
 {
-    u64 count;
+    u64 plt_stamp_counter, count;
     s_time_t stime;
 
     ASSERT(!local_irq_is_enabled());
 
     spin_lock(&platform_timer_lock);
-    count = plt_stamp64 + ((plt_src.read_counter() - plt_stamp) & plt_mask);
+    plt_stamp_counter = plt_src.read_counter();
+    count = plt_stamp64 + ((plt_stamp_counter - plt_stamp) & plt_mask);
     stime = __read_platform_stime(count);
+    if ( stamp )
+        *stamp = plt_stamp_counter;
     spin_unlock(&platform_timer_lock);
 
     return stime;
@@ -727,7 +730,7 @@ void cstate_restore_tsc(void)
     if ( boot_cpu_has(X86_FEATURE_NONSTOP_TSC) )
         return;
 
-    write_tsc(stime2tsc(read_platform_stime()));
+    write_tsc(stime2tsc(read_platform_stime(NULL)));
 }
 
 /***************************************************************************
@@ -1046,7 +1049,7 @@ int cpu_frequency_change(u64 freq)
 
     local_irq_disable();
     /* Platform time /first/, as we may be delayed by platform_timer_lock. */
-    t->stime_master_stamp = read_platform_stime();
+    t->stime_master_stamp = read_platform_stime(NULL);
     /* TSC-extrapolated time may be bogus after frequency change. */
     /*t->stime_local_stamp = get_s_time();*/
     t->stime_local_stamp = t->stime_master_stamp;
@@ -1364,7 +1367,7 @@ static void time_calibration_tsc_rendezvous(void *_r)
 
             if ( r->master_stime == 0 )
             {
-                r->master_stime = read_platform_stime();
+                r->master_stime = read_platform_stime(NULL);
                 r->master_tsc_stamp = rdtsc();
             }
             atomic_inc(&r->semaphore);
@@ -1409,7 +1412,7 @@ static void time_calibration_std_rendezvous(void *_r)
     {
         while ( atomic_read(&r->semaphore) != (total_cpus - 1) )
             cpu_relax();
-        r->master_stime = read_platform_stime();
+        r->master_stime = read_platform_stime(NULL);
         mb(); /* write r->master_stime /then/ signal */
         atomic_inc(&r->semaphore);
     }
@@ -1456,7 +1459,7 @@ void init_percpu_time(void)
 
     local_irq_save(flags);
     t->local_tsc_stamp = rdtsc();
-    now = read_platform_stime();
+    now = read_platform_stime(NULL);
     local_irq_restore(flags);
 
     t->stime_master_stamp = now;
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH v2 6/6] x86/time: implement PVCLOCK_TSC_STABLE_BIT
  2016-03-29 13:44 [PATCH v2 0/6] x86/time: PVCLOCK_TSC_STABLE_BIT support Joao Martins
                   ` (4 preceding siblings ...)
  2016-03-29 13:44 ` [PATCH v2 5/6] x86/time: refactor read_platform_stime() Joao Martins
@ 2016-03-29 13:44 ` Joao Martins
  2016-04-05 12:22   ` Jan Beulich
  5 siblings, 1 reply; 43+ messages in thread
From: Joao Martins @ 2016-03-29 13:44 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Joao Martins, Keir Fraser, Jan Beulich

When using TSC as clocksource we will solely rely on TSC for updating
vcpu time infos (pvti). Right now, each vCPU takes the tsc_timestamp
at different instants meaning every EPOCH + delta.  This delta is
variable depending on the time the CPU calibrates with CPU 0 (master),
and will likely be different and variable across vCPUS. This means
that each VCPU pvti won't account to its calibration error which could
lead to time going backwards, and allowing a situation where time read
on VCPU B immediately after A being smaller. While this doesn't happen
a lot, I was able to observe (for clocksource=tsc) around 50 times in
an hour having warps of < 100 ns.

This patch proposes relying on host TSC synchronization and
passthrough to the guest, when running on a TSC-safe platform. On
time_calibration we retrieve the platform time in ns and the counter
read by the clocksource that was used to compute system time. We
introduce a new rendezous function which doesn't require
synchronization between master and slave CPUS and just reads
calibration_rendezvous struct and writes it down the stime and stamp
to the cpu_calibration struct to be used later on. We can guarantee that
on a platform with a constant and reliable TSC, that the time read on
vcpu B right after A is bigger independently of the CPU calibration
error. Since pvclock time infos are monotonic as seen by any vCPU set
PVCLOCK_TSC_STABLE_BIT, which then enables usage of VDSO on Linux.
IIUC, this is similar to how it's implemented on KVM.

Note that PVCLOCK_TSC_STABLE_BIT is set only when CPU hotplug isn't
meant to be performed on the host, which will either be when max vcpus
and num_present_cpu are the same or if "nocpuhotplug" command line
parameter is used. This is because a newly hotplugged CPU may not
satisfy the condition of having all TSCs synchronized.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

Perhaps "cpuhotplugsafe" would be a better name, since potentially
hardware could guarantee TSCs are synchronized on hotplug?

Changes since v1:
 - Change approach to follow Andrew's guideline to
 skip std_rendezvous. And doing so by introducing a nop_rendezvous
 - Change commit message reflecting the change above.
 - Use TSC_STABLE_BIT only if cpu hotplug isn't possible.
 - Add command line option to override it if no cpu hotplug is
 intended.
---
 xen/arch/x86/time.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 123aa42..1dcd4af 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -43,6 +43,10 @@
 static char __initdata opt_clocksource[10];
 string_param("clocksource", opt_clocksource);
 
+/* opt_nocpuhotplug: Set if CPU hotplug isn't meant to be used */
+static bool_t __initdata opt_nocpuhotplug;
+boolean_param("nocpuhotplug", opt_nocpuhotplug);
+
 unsigned long __read_mostly cpu_khz;  /* CPU clock frequency in kHz. */
 DEFINE_SPINLOCK(rtc_lock);
 unsigned long pit0_ticks;
@@ -435,6 +439,7 @@ uint64_t ns_to_acpi_pm_tick(uint64_t ns)
  * PLATFORM TIMER 4: TSC
  */
 static bool_t clocksource_is_tsc;
+static bool_t use_tsc_stable_bit;
 static u64 tsc_freq;
 static unsigned long tsc_max_warp;
 static void tsc_check_reliability(void);
@@ -468,6 +473,11 @@ static int __init init_tsctimer(struct platform_timesource *pts)
 
     pts->frequency = tsc_freq;
     clocksource_is_tsc = tsc_reliable;
+    use_tsc_stable_bit = clocksource_is_tsc &&
+        ((nr_cpu_ids == num_present_cpus()) || opt_nocpuhotplug);
+
+    if ( clocksource_is_tsc && !use_tsc_stable_bit )
+        printk(XENLOG_INFO "TSC: CPU Hotplug intended, not setting stable bit\n");
 
     return tsc_reliable;
 }
@@ -950,6 +960,8 @@ static void __update_vcpu_system_time(struct vcpu *v, int force)
 
     _u.tsc_timestamp = tsc_stamp;
     _u.system_time   = t->stime_local_stamp;
+    if ( use_tsc_stable_bit )
+        _u.flags    |= PVCLOCK_TSC_STABLE_BIT;
 
     if ( is_hvm_domain(d) )
         _u.tsc_timestamp += v->arch.hvm_vcpu.cache_tsc_offset;
@@ -1431,6 +1443,22 @@ static void time_calibration_std_rendezvous(void *_r)
     raise_softirq(TIME_CALIBRATE_SOFTIRQ);
 }
 
+/*
+ * Rendezvous function used when clocksource is TSC and
+ * no CPU hotplug will be performed.
+ */
+static void time_calibration_nop_rendezvous(void *_r)
+{
+    struct cpu_calibration *c = &this_cpu(cpu_calibration);
+    struct calibration_rendezvous *r = _r;
+
+    c->local_tsc_stamp = r->master_tsc_stamp;
+    c->stime_local_stamp = get_s_time();
+    c->stime_master_stamp = r->master_stime;
+
+    raise_softirq(TIME_CALIBRATE_SOFTIRQ);
+}
+
 static void (*time_calibration_rendezvous_fn)(void *) =
     time_calibration_std_rendezvous;
 
@@ -1440,6 +1468,13 @@ static void time_calibration(void *unused)
         .semaphore = ATOMIC_INIT(0)
     };
 
+    if ( use_tsc_stable_bit )
+    {
+        local_irq_disable();
+        r.master_stime = read_platform_stime(&r.master_tsc_stamp);
+        local_irq_enable();
+    }
+
     cpumask_copy(&r.cpu_calibration_map, &cpu_online_map);
 
     /* @wait=1 because we must wait for all cpus before freeing @r. */
@@ -1555,6 +1590,14 @@ static int __init verify_tsc_reliability(void)
 
             init_percpu_time();
 
+            /*
+             * We won't do CPU Hotplug and TSC clocksource is being used which
+	     * means we have a reliable TSC, plus we don't sync with any other
+	     * clocksource so no need for rendezvous.
+             */
+            if ( use_tsc_stable_bit )
+                time_calibration_rendezvous_fn = time_calibration_nop_rendezvous;
+
             init_timer(&calibration_timer, time_calibration, NULL, 0);
             set_timer(&calibration_timer, NOW() + EPOCH);
         }
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 3/6] x86/time: implement tsc as clocksource
  2016-03-29 13:44 ` [PATCH v2 3/6] x86/time: implement tsc as clocksource Joao Martins
@ 2016-03-29 17:39   ` Konrad Rzeszutek Wilk
  2016-03-29 17:52     ` Joao Martins
  2016-04-01 16:43   ` Konrad Rzeszutek Wilk
  2016-04-05 10:43   ` Jan Beulich
  2 siblings, 1 reply; 43+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-03-29 17:39 UTC (permalink / raw)
  To: Joao Martins; +Cc: Andrew Cooper, Keir Fraser, Jan Beulich, xen-devel

>  static void tsc_check_slave(void *unused)
> @@ -1437,6 +1515,20 @@ static int __init verify_tsc_reliability(void)
>          }
>      }
>  
> +    if ( !strcmp(opt_clocksource, "tsc") )

Pls also update xen-command-line.markdown 

> +    {
> +        if ( try_platform_timer(&plt_tsc) > 0 )
> +        {
> +            printk(XENLOG_INFO "Switched to Platform timer %s TSC\n",
> +                   freq_string(plt_src.frequency));
> +
> +            init_percpu_time();
> +
> +            init_timer(&calibration_timer, time_calibration, NULL, 0);
> +            set_timer(&calibration_timer, NOW() + EPOCH);
> +        }
> +    }
> +
>      return 0;
>  }
>  __initcall(verify_tsc_reliability);
> @@ -1476,6 +1568,7 @@ void __init early_time_init(void)
>      struct cpu_time *t = &this_cpu(cpu_time);
>      u64 tmp = init_pit_and_calibrate_tsc();
>  
> +    tsc_freq = tmp;
>      set_time_scale(&t->tsc_scale, tmp);
>      t->local_tsc_stamp = boot_tsc_stamp;
>  
> -- 
> 2.1.4
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 3/6] x86/time: implement tsc as clocksource
  2016-03-29 17:39   ` Konrad Rzeszutek Wilk
@ 2016-03-29 17:52     ` Joao Martins
  0 siblings, 0 replies; 43+ messages in thread
From: Joao Martins @ 2016-03-29 17:52 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Andrew Cooper, Keir Fraser, Jan Beulich, xen-devel



On 03/29/2016 06:39 PM, Konrad Rzeszutek Wilk wrote:
>>  static void tsc_check_slave(void *unused)
>> @@ -1437,6 +1515,20 @@ static int __init verify_tsc_reliability(void)
>>          }
>>      }
>>  
>> +    if ( !strcmp(opt_clocksource, "tsc") )
> 
> Pls also update xen-command-line.markdown 
> 
OK. I fixed it for the new clocksource, and also added documentation on the new
boot param that is introduced in the last patch.

>> +    {
>> +        if ( try_platform_timer(&plt_tsc) > 0 )
>> +        {
>> +            printk(XENLOG_INFO "Switched to Platform timer %s TSC\n",
>> +                   freq_string(plt_src.frequency));
>> +
>> +            init_percpu_time();
>> +
>> +            init_timer(&calibration_timer, time_calibration, NULL, 0);
>> +            set_timer(&calibration_timer, NOW() + EPOCH);
>> +        }
>> +    }
>> +
>>      return 0;
>>  }
>>  __initcall(verify_tsc_reliability);
>> @@ -1476,6 +1568,7 @@ void __init early_time_init(void)
>>      struct cpu_time *t = &this_cpu(cpu_time);
>>      u64 tmp = init_pit_and_calibrate_tsc();
>>  
>> +    tsc_freq = tmp;
>>      set_time_scale(&t->tsc_scale, tmp);
>>      t->local_tsc_stamp = boot_tsc_stamp;
>>  
>> -- 
>> 2.1.4
>>
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> http://lists.xen.org/xen-devel

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 1/6] public/xen.h: add flags field to vcpu_time_info
  2016-03-29 13:44 ` [PATCH v2 1/6] public/xen.h: add flags field to vcpu_time_info Joao Martins
@ 2016-03-30 15:49   ` Ian Jackson
  2016-03-30 16:33     ` Joao Martins
  2016-03-31  7:09     ` Jan Beulich
  2016-03-31  7:13   ` Jan Beulich
  2016-04-05 10:16   ` Jan Beulich
  2 siblings, 2 replies; 43+ messages in thread
From: Ian Jackson @ 2016-03-30 15:49 UTC (permalink / raw)
  To: Joao Martins, Andrew Cooper
  Cc: Tim Deegan, Keir Fraser, Jan Beulich, xen-devel

Joao Martins writes ("[PATCH v2 1/6] public/xen.h: add flags field to vcpu_time_info"):
> This field has two possible flags (as of latest pvclock ABI
> shared with KVM).
> 
> flags: bits in this field indicate extended capabilities
> coordinated between the guest and the hypervisor.  Specifically
> on KVM, availability of specific flags has to be checked in
> 0x40000001 cpuid leaf. On Xen, we don't have that but we can
> still check some of the flags after registering the time info
> page since a force_update_vcpu_system_time is performed.

Earlier in the thread there was a Reviewed-by from Andrew Cooper.
Shouldn't that have been preserved here ?

Also, Andrew made some points about the linux.git MAINTAINERS file.

Ideally I would like to commit this patch ASAP.

Thanks,
Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 1/6] public/xen.h: add flags field to vcpu_time_info
  2016-03-30 15:49   ` Ian Jackson
@ 2016-03-30 16:33     ` Joao Martins
  2016-03-31  7:09     ` Jan Beulich
  1 sibling, 0 replies; 43+ messages in thread
From: Joao Martins @ 2016-03-30 16:33 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Andrew Cooper, Tim Deegan, Keir Fraser, Jan Beulich, xen-devel

On 03/30/2016 04:49 PM, Ian Jackson wrote:
> Joao Martins writes ("[PATCH v2 1/6] public/xen.h: add flags field to vcpu_time_info"):
>> This field has two possible flags (as of latest pvclock ABI
>> shared with KVM).
>>
>> flags: bits in this field indicate extended capabilities
>> coordinated between the guest and the hypervisor.  Specifically
>> on KVM, availability of specific flags has to be checked in
>> 0x40000001 cpuid leaf. On Xen, we don't have that but we can
>> still check some of the flags after registering the time info
>> page since a force_update_vcpu_system_time is performed.
> 
> Earlier in the thread there was a Reviewed-by from Andrew Cooper.

> Shouldn't that have been preserved here ?
I dropped both of the Reviewed-by because I updated the patch with changes that
weren't noted in earlier reviews - I had in mind that is the protocol.

> Also, Andrew made some points about the linux.git MAINTAINERS file.
Yes, the MAINTAINERS file will be updated with the linux v2 series and the point
is to have xen-devel be notified whenever changes are made to the pvclock ABI
(that is shared with KVM).

Joao

> Ideally I would like to commit this patch ASAP.
> 
> Thanks,
> Ian.
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 1/6] public/xen.h: add flags field to vcpu_time_info
  2016-03-30 15:49   ` Ian Jackson
  2016-03-30 16:33     ` Joao Martins
@ 2016-03-31  7:09     ` Jan Beulich
  1 sibling, 0 replies; 43+ messages in thread
From: Jan Beulich @ 2016-03-31  7:09 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Andrew Cooper, Joao Martins, Tim Deegan, Keir Fraser, xen-devel

>>> On 30.03.16 at 17:49, <Ian.Jackson@eu.citrix.com> wrote:
> Joao Martins writes ("[PATCH v2 1/6] public/xen.h: add flags field to 
> vcpu_time_info"):
>> This field has two possible flags (as of latest pvclock ABI
>> shared with KVM).
>> 
>> flags: bits in this field indicate extended capabilities
>> coordinated between the guest and the hypervisor.  Specifically
>> on KVM, availability of specific flags has to be checked in
>> 0x40000001 cpuid leaf. On Xen, we don't have that but we can
>> still check some of the flags after registering the time info
>> page since a force_update_vcpu_system_time is performed.
> 
> Earlier in the thread there was a Reviewed-by from Andrew Cooper.
> Shouldn't that have been preserved here ?
> 
> Also, Andrew made some points about the linux.git MAINTAINERS file.
> 
> Ideally I would like to commit this patch ASAP.

Is there a particular reason you see a need for this to go in
ahead of the rest of the series? This change alone can certainly
have my ack, but I lack the time right now to look at the other
patches.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 1/6] public/xen.h: add flags field to vcpu_time_info
  2016-03-29 13:44 ` [PATCH v2 1/6] public/xen.h: add flags field to vcpu_time_info Joao Martins
  2016-03-30 15:49   ` Ian Jackson
@ 2016-03-31  7:13   ` Jan Beulich
  2016-03-31 11:04     ` Joao Martins
  2016-04-05 10:16   ` Jan Beulich
  2 siblings, 1 reply; 43+ messages in thread
From: Jan Beulich @ 2016-03-31  7:13 UTC (permalink / raw)
  To: Joao Martins; +Cc: Keir Fraser, Tim Deegan, Ian Jackson, xen-devel

>>> On 29.03.16 at 15:44, <joao.m.martins@oracle.com> wrote:
> This field has two possible flags (as of latest pvclock ABI
> shared with KVM).
> 
> flags: bits in this field indicate extended capabilities
> coordinated between the guest and the hypervisor.  Specifically
> on KVM, availability of specific flags has to be checked in
> 0x40000001 cpuid leaf. On Xen, we don't have that but we can
> still check some of the flags after registering the time info
> page since a force_update_vcpu_system_time is performed.
> 
> Current flags are:
> 
>  flag bit   | cpuid bit    | meaning
> -------------------------------------------------------------
>             |              | time measures taken across
>      0      |      24      | multiple cpus are guaranteed to
>             |              | be monotonic
> -------------------------------------------------------------
>             |              | guest vcpu has been paused by
>      1      |     N/A      | the host
>             |              |
> -------------------------------------------------------------
> 
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>

With one further adjustment (which could be done while committing)
Acked-by: Jan Beulich <jbeulich@suse.com>

> --- a/xen/include/public/xen.h
> +++ b/xen/include/public/xen.h
> @@ -614,10 +614,14 @@ struct vcpu_time_info {
>       */
>      uint32_t tsc_to_system_mul;
>      int8_t   tsc_shift;
> -    int8_t   pad1[3];
> +    uint8_t  flags;
> +    uint8_t  pad1[2];
>  }; /* 32 bytes */
>  typedef struct vcpu_time_info vcpu_time_info_t;
>  
> +#define PVCLOCK_TSC_STABLE_BIT	(1 << 0)
> +#define PVCLOCK_GUEST_STOPPED	(1 << 1)

No new identifiers not properly prefixed by XEN_ (or, elsewhere,
xen_) in the canonical public headers please (whether
downstream consumers like Linux elect to strip such prefixes is
an independent aspect).

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 1/6] public/xen.h: add flags field to vcpu_time_info
  2016-03-31  7:13   ` Jan Beulich
@ 2016-03-31 11:04     ` Joao Martins
  0 siblings, 0 replies; 43+ messages in thread
From: Joao Martins @ 2016-03-31 11:04 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Keir Fraser, Tim Deegan, Ian Jackson, xen-devel



On 03/31/2016 08:13 AM, Jan Beulich wrote:
>>>> On 29.03.16 at 15:44, <joao.m.martins@oracle.com> wrote:
>> This field has two possible flags (as of latest pvclock ABI
>> shared with KVM).
>>
>> flags: bits in this field indicate extended capabilities
>> coordinated between the guest and the hypervisor.  Specifically
>> on KVM, availability of specific flags has to be checked in
>> 0x40000001 cpuid leaf. On Xen, we don't have that but we can
>> still check some of the flags after registering the time info
>> page since a force_update_vcpu_system_time is performed.
>>
>> Current flags are:
>>
>>  flag bit   | cpuid bit    | meaning
>> -------------------------------------------------------------
>>             |              | time measures taken across
>>      0      |      24      | multiple cpus are guaranteed to
>>             |              | be monotonic
>> -------------------------------------------------------------
>>             |              | guest vcpu has been paused by
>>      1      |     N/A      | the host
>>             |              |
>> -------------------------------------------------------------
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> 
> With one further adjustment (which could be done while committing)
> Acked-by: Jan Beulich <jbeulich@suse.com>
> 
Thanks!

>> --- a/xen/include/public/xen.h
>> +++ b/xen/include/public/xen.h
>> @@ -614,10 +614,14 @@ struct vcpu_time_info {
>>       */
>>      uint32_t tsc_to_system_mul;
>>      int8_t   tsc_shift;
>> -    int8_t   pad1[3];
>> +    uint8_t  flags;
>> +    uint8_t  pad1[2];
>>  }; /* 32 bytes */
>>  typedef struct vcpu_time_info vcpu_time_info_t;
>>  
>> +#define PVCLOCK_TSC_STABLE_BIT	(1 << 0)
>> +#define PVCLOCK_GUEST_STOPPED	(1 << 1)
I noticed from seeing different indentation on Jan's reply that there should be
spaces and no tabs here between the macro name and value. Fixed that too.

> No new identifiers not properly prefixed by XEN_ (or, elsewhere,
> xen_) in the canonical public headers please (whether
> downstream consumers like Linux elect to strip such prefixes is
> an independent aspect).
> 
OK, fixed and note taken.

If desired, I can submit v3 of this one in separate if Ian J. would like it
committed beforehand. Though what's added here, only get used later on this series.

João

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 2/6] x86/time: refactor init_platform_time()
  2016-03-29 13:44 ` [PATCH v2 2/6] x86/time: refactor init_platform_time() Joao Martins
@ 2016-04-01 16:10   ` Konrad Rzeszutek Wilk
  2016-04-01 18:26     ` Joao Martins
  2016-04-05 10:09   ` Jan Beulich
  1 sibling, 1 reply; 43+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-01 16:10 UTC (permalink / raw)
  To: Joao Martins; +Cc: Andrew Cooper, Keir Fraser, Jan Beulich, xen-devel

On Tue, Mar 29, 2016 at 02:44:07PM +0100, Joao Martins wrote:
> And accomodate platform time source initialization in
> try_platform_time(). This is a preparatory patch for deferring
> TSC clocksource initialization to the stage where all CPUS are
> up (verify_tsc_reliability init call).
> 
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 3/6] x86/time: implement tsc as clocksource
  2016-03-29 13:44 ` [PATCH v2 3/6] x86/time: implement tsc as clocksource Joao Martins
  2016-03-29 17:39   ` Konrad Rzeszutek Wilk
@ 2016-04-01 16:43   ` Konrad Rzeszutek Wilk
  2016-04-01 18:38     ` Joao Martins
  2016-04-05 10:43   ` Jan Beulich
  2 siblings, 1 reply; 43+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-01 16:43 UTC (permalink / raw)
  To: Joao Martins; +Cc: Andrew Cooper, Keir Fraser, Jan Beulich, xen-devel

On Tue, Mar 29, 2016 at 02:44:08PM +0100, Joao Martins wrote:
> Introduce support for using TSC as platform time which is the highest
> resolution time and most performant to get (~20 nsecs).  Though there
> are also several problems associated with its usage, and there isn't a
> complete (and architecturally defined) guarantee that all machines
> will provide reliable and monotonic TSC across all CPUs, on different
> sockets and different P/C states.  I believe Intel to be the only that
> can guarantee that. For this reason it's only set when adminstrator
> changes "clocksource" boot option to "tsc". Initializing TSC
> clocksource requires all CPUs to have the tsc reliability checks
> performed. init_xen_time is called before all CPUs are up, and so we
> start with HPET at boot time, and switch later to TSC. The switch then
> happens on verify_tsc_reliability initcall that is invoked when all
> CPUs are up. When attempting to initializing TSC we also check for
> time warps and appropriate CPU features i.e.  TSC_RELIABLE,
> CONSTANT_TSC and NONSTOP_TSC. And in case none of these conditions are
> met, we keep the clocksource that was previously initialized on
> init_xen_time.
> 
> It is also worth noting that with clocksource=tsc there isn't any
> need to synchronise with another clocksource, and I could verify that
> great portion the time skew was eliminated and seeing much less time
> warps happening. With HPET I used to observe ~500 warps in the period
> of 1h of around 27 us, and with TSC down to 50 warps in the same
> period having each warp < 100 ns. The warps still exist though but
> are only related to cross CPU calibration (being how much it takes to
> rendezvous with master), in which a later patch in this series aims to
> solve.
> 
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
> Cc: Keir Fraser <keir@xen.org>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> 
> Changes since v1:
>  - s/printk/printk(XENLOG_INFO
>  - Remove extra space on inner brackets
>  - Add missing space around brackets
>  - Defer clocksource TSC initialization when all CPUs are up.
> 
> Changes since RFC:
>  - Spelling fixes in the commit message.
>  - Remove unused clocksource_is_tsc variable and introduce it instead
>  on the patch that uses it.
>  - Move plt_tsc from second to last in the available clocksources.
> ---
>  xen/arch/x86/time.c | 97 +++++++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 95 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
> index ed4ed24..2602dda 100644
> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -432,6 +432,63 @@ uint64_t ns_to_acpi_pm_tick(uint64_t ns)
>  }
>  
>  /************************************************************
> + * PLATFORM TIMER 4: TSC
> + */
> +static u64 tsc_freq;
> +static unsigned long tsc_max_warp;
> +static void tsc_check_reliability(void);
> +
> +static int __init init_tsctimer(struct platform_timesource *pts)
> +{
> +    bool_t tsc_reliable = 0;

No need to set it to zero.
> +
> +    tsc_check_reliability();

This has been already called by verify_tsc_reliability which calls this
function. Should we set tsc_max_warp to zero before calling it?

> +
> +    if ( tsc_max_warp > 0 )
> +    {
> +        tsc_reliable = 0;

Ditto. It is by default zero.

> +        printk(XENLOG_INFO "TSC: didn't passed warp test\n");

So the earlier test by verify_tsc_reliability did already this check and
printed this out - and also cleared the X86_FEATURE_TSC_RELIABLE.

So can this check above be removed then?

Or are you thinking to ditch what verify_tsc_reliability does?

> +    }
> +    else if ( boot_cpu_has(X86_FEATURE_TSC_RELIABLE) ||
> +              (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
> +               boot_cpu_has(X86_FEATURE_NONSTOP_TSC)) )
> +    {
> +        tsc_reliable = 1;
> +    }
> +    else if ( boot_cpu_has(X86_FEATURE_CONSTANT_TSC) )
> +    {
> +        tsc_reliable = (max_cstate <= 2);
> +
> +        if ( tsc_reliable )
> +            printk(XENLOG_INFO "TSC: no deep Cstates, deemed reliable\n");
> +        else
> +            printk(XENLOG_INFO "TSC: deep Cstates possible, so not reliable\n");
> +    }
> +
> +    pts->frequency = tsc_freq;
> +    return tsc_reliable;
> +}
> +
> +static u64 read_tsc(void)
> +{
> +    return rdtsc();
> +}
> +
> +static void resume_tsctimer(struct platform_timesource *pts)
> +{
> +}
> +
> +static struct platform_timesource __initdata plt_tsc =
> +{
> +    .id = "tsc",
> +    .name = "TSC",
> +    .read_counter = read_tsc,
> +    .counter_bits = 64,
> +    .init = init_tsctimer,

Could you add a note in here:

/* Not called by init_platform_timer as it is not on the plt_timers array. */

> +    .resume = resume_tsctimer,
> +};
> +
> +/************************************************************
>   * GENERIC PLATFORM TIMER INFRASTRUCTURE
>   */
>  
> @@ -533,6 +590,21 @@ static void resume_platform_timer(void)
>      plt_stamp = plt_src.read_counter();
>  }
>  
> +static void __init reset_platform_timer(void)
> +{
> +    /* Deactivate any timers running */
> +    kill_timer(&plt_overflow_timer);
> +    kill_timer(&calibration_timer);
> +
> +    /* Reset counters and stamps */
> +    spin_lock_irq(&platform_timer_lock);
> +    plt_stamp = 0;
> +    plt_stamp64 = 0;
> +    platform_timer_stamp = 0;
> +    stime_platform_stamp = 0;
> +    spin_unlock_irq(&platform_timer_lock);
> +}
> +
>  static int __init try_platform_timer(struct platform_timesource *pts)
>  {
>      int rc = -1;
> @@ -541,6 +613,10 @@ static int __init try_platform_timer(struct platform_timesource *pts)
>      if ( rc <= 0 )
>          return rc;
>  
> +    /* We have a platform timesource already so reset it */
> +    if ( plt_src.counter_bits != 0 )
> +        reset_platform_timer();
> +
>      plt_mask = (u64)~0ull >> (64 - pts->counter_bits);
>  
>      set_time_scale(&plt_scale, pts->frequency);
> @@ -566,7 +642,9 @@ static void __init init_platform_timer(void)
>      struct platform_timesource *pts = NULL;
>      int i, rc = -1;
>  
> -    if ( opt_clocksource[0] != '\0' )
> +    /* clocksource=tsc is initialized later when all CPUS are up */

Perhaps:
/* clocksource=tsc is initialized via __initcalls (when CPUs are up). */ ?

> +    if ( (opt_clocksource[0] != '\0') &&
> +         (strcmp(opt_clocksource, "tsc") != 0) )
>      {
>          for ( i = 0; i < ARRAY_SIZE(plt_timers); i++ )
>          {
> @@ -1192,7 +1270,7 @@ static void check_tsc_warp(unsigned long tsc_khz, unsigned long *max_warp)
>      }
>  }
>  
> -static unsigned long tsc_max_warp, tsc_check_count;
> +static unsigned long tsc_check_count;
>  static cpumask_t tsc_check_cpumask;
>  
>  static void tsc_check_slave(void *unused)
> @@ -1437,6 +1515,20 @@ static int __init verify_tsc_reliability(void)
>          }
>      }
>  
> +    if ( !strcmp(opt_clocksource, "tsc") )
> +    {
> +        if ( try_platform_timer(&plt_tsc) > 0 )
> +        {
> +            printk(XENLOG_INFO "Switched to Platform timer %s TSC\n",
> +                   freq_string(plt_src.frequency));
> +
> +            init_percpu_time();
> +
> +            init_timer(&calibration_timer, time_calibration, NULL, 0);
> +            set_timer(&calibration_timer, NOW() + EPOCH);
> +        }
> +    }
> +
>      return 0;
>  }
>  __initcall(verify_tsc_reliability);
> @@ -1476,6 +1568,7 @@ void __init early_time_init(void)
>      struct cpu_time *t = &this_cpu(cpu_time);
>      u64 tmp = init_pit_and_calibrate_tsc();
>  
> +    tsc_freq = tmp;
>      set_time_scale(&t->tsc_scale, tmp);
>      t->local_tsc_stamp = boot_tsc_stamp;
>  
> -- 
> 2.1.4
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 2/6] x86/time: refactor init_platform_time()
  2016-04-01 16:10   ` Konrad Rzeszutek Wilk
@ 2016-04-01 18:26     ` Joao Martins
  0 siblings, 0 replies; 43+ messages in thread
From: Joao Martins @ 2016-04-01 18:26 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Andrew Cooper, Keir Fraser, Jan Beulich, xen-devel



On 04/01/2016 05:10 PM, Konrad Rzeszutek Wilk wrote:
> On Tue, Mar 29, 2016 at 02:44:07PM +0100, Joao Martins wrote:
>> And accomodate platform time source initialization in
>> try_platform_time(). This is a preparatory patch for deferring
>> TSC clocksource initialization to the stage where all CPUS are
>> up (verify_tsc_reliability init call).
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> 
> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> 
Thanks!

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 5/6] x86/time: refactor read_platform_stime()
  2016-03-29 13:44 ` [PATCH v2 5/6] x86/time: refactor read_platform_stime() Joao Martins
@ 2016-04-01 18:32   ` Konrad Rzeszutek Wilk
  2016-04-05 11:52   ` Jan Beulich
  1 sibling, 0 replies; 43+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-01 18:32 UTC (permalink / raw)
  To: Joao Martins; +Cc: Andrew Cooper, Keir Fraser, Jan Beulich, xen-devel

On Tue, Mar 29, 2016 at 02:44:10PM +0100, Joao Martins wrote:
> To fetch the last read from the clocksource which was used to
> calculate system_time. In the case of clocksource=tsc we will
> use it to set tsc_timestamp.
> 
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 3/6] x86/time: implement tsc as clocksource
  2016-04-01 16:43   ` Konrad Rzeszutek Wilk
@ 2016-04-01 18:38     ` Joao Martins
  2016-04-01 18:45       ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 43+ messages in thread
From: Joao Martins @ 2016-04-01 18:38 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Andrew Cooper, Keir Fraser, Jan Beulich, xen-devel

[snip]

>> diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
>> index ed4ed24..2602dda 100644
>> --- a/xen/arch/x86/time.c
>> +++ b/xen/arch/x86/time.c
>> @@ -432,6 +432,63 @@ uint64_t ns_to_acpi_pm_tick(uint64_t ns)
>>  }
>>  
>>  /************************************************************
>> + * PLATFORM TIMER 4: TSC
>> + */
>> +static u64 tsc_freq;
>> +static unsigned long tsc_max_warp;
>> +static void tsc_check_reliability(void);
>> +
>> +static int __init init_tsctimer(struct platform_timesource *pts)
>> +{
>> +    bool_t tsc_reliable = 0;
> 
> No need to set it to zero.
OK.

>> +
>> +    tsc_check_reliability();
> 
> This has been already called by verify_tsc_reliability which calls this
> function. Should we set tsc_max_warp to zero before calling it?
Ah, correct. But may be it's not necessary to run the tsc_check_reliability here
at all as what I am doing is ineficient. See my other comment below.

> 
>> +
>> +    if ( tsc_max_warp > 0 )
>> +    {
>> +        tsc_reliable = 0;
> 
> Ditto. It is by default zero.
OK.

> 
>> +        printk(XENLOG_INFO "TSC: didn't passed warp test\n");
> 
> So the earlier test by verify_tsc_reliability did already this check and
> printed this out - and also cleared the X86_FEATURE_TSC_RELIABLE.
> 
> So can this check above be removed then?
> 
> Or are you thinking to ditch what verify_tsc_reliability does?
> 
I had the tsc_check_reliability here because TSC could still be deemed reliable
for max_cstate <= 2 or with CONSTANT_TSC + NONSTOP_TSC. The way I have it, the
most likely scenario (i.e. having TSC_RELIABLE) would run twice. Perhaps a
better way of doing this would be to run the warp test solely on
verify_tsc_reliability() in all possible conditions to be deemed reliable? And
then I could even remove almost the whole init_tsctimer if it was to be called
when no warps are observed.

>> +    }
>> +    else if ( boot_cpu_has(X86_FEATURE_TSC_RELIABLE) ||
>> +              (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
>> +               boot_cpu_has(X86_FEATURE_NONSTOP_TSC)) )
>> +    {
>> +        tsc_reliable = 1;
>> +    }
>> +    else if ( boot_cpu_has(X86_FEATURE_CONSTANT_TSC) )
>> +    {
>> +        tsc_reliable = (max_cstate <= 2);
>> +
>> +        if ( tsc_reliable )
>> +            printk(XENLOG_INFO "TSC: no deep Cstates, deemed reliable\n");
>> +        else
>> +            printk(XENLOG_INFO "TSC: deep Cstates possible, so not reliable\n");
>> +    }
>> +
>> +    pts->frequency = tsc_freq;
>> +    return tsc_reliable;
>> +}
>> +
>> +static u64 read_tsc(void)
>> +{
>> +    return rdtsc();
>> +}
>> +
>> +static void resume_tsctimer(struct platform_timesource *pts)
>> +{
>> +}
>> +
>> +static struct platform_timesource __initdata plt_tsc =
>> +{
>> +    .id = "tsc",
>> +    .name = "TSC",
>> +    .read_counter = read_tsc,
>> +    .counter_bits = 64,
>> +    .init = init_tsctimer,
> 
> Could you add a note in here:
> 
> /* Not called by init_platform_timer as it is not on the plt_timers array. */
> 
OK, I fixed that.

>> +    .resume = resume_tsctimer,
>> +};
>> +
>> +/************************************************************
>>   * GENERIC PLATFORM TIMER INFRASTRUCTURE
>>   */
>>  
>> @@ -533,6 +590,21 @@ static void resume_platform_timer(void)
>>      plt_stamp = plt_src.read_counter();
>>  }
>>  
>> +static void __init reset_platform_timer(void)
>> +{
>> +    /* Deactivate any timers running */
>> +    kill_timer(&plt_overflow_timer);
>> +    kill_timer(&calibration_timer);
>> +
>> +    /* Reset counters and stamps */
>> +    spin_lock_irq(&platform_timer_lock);
>> +    plt_stamp = 0;
>> +    plt_stamp64 = 0;
>> +    platform_timer_stamp = 0;
>> +    stime_platform_stamp = 0;
>> +    spin_unlock_irq(&platform_timer_lock);
>> +}
>> +
>>  static int __init try_platform_timer(struct platform_timesource *pts)
>>  {
>>      int rc = -1;
>> @@ -541,6 +613,10 @@ static int __init try_platform_timer(struct platform_timesource *pts)
>>      if ( rc <= 0 )
>>          return rc;
>>  
>> +    /* We have a platform timesource already so reset it */
>> +    if ( plt_src.counter_bits != 0 )
>> +        reset_platform_timer();
>> +
>>      plt_mask = (u64)~0ull >> (64 - pts->counter_bits);
>>  
>>      set_time_scale(&plt_scale, pts->frequency);
>> @@ -566,7 +642,9 @@ static void __init init_platform_timer(void)
>>      struct platform_timesource *pts = NULL;
>>      int i, rc = -1;
>>  
>> -    if ( opt_clocksource[0] != '\0' )
>> +    /* clocksource=tsc is initialized later when all CPUS are up */
> 
> Perhaps:
> /* clocksource=tsc is initialized via __initcalls (when CPUs are up). */ ?
> 
It's clearer indeed. Fixed that too.

>> +    if ( (opt_clocksource[0] != '\0') &&
>> +         (strcmp(opt_clocksource, "tsc") != 0) )
>>      {
>>          for ( i = 0; i < ARRAY_SIZE(plt_timers); i++ )
>>          {
>> @@ -1192,7 +1270,7 @@ static void check_tsc_warp(unsigned long tsc_khz, unsigned long *max_warp)
>>      }
>>  }
>>  
>> -static unsigned long tsc_max_warp, tsc_check_count;
>> +static unsigned long tsc_check_count;
>>  static cpumask_t tsc_check_cpumask;
>>  
>>  static void tsc_check_slave(void *unused)
>> @@ -1437,6 +1515,20 @@ static int __init verify_tsc_reliability(void)
>>          }
>>      }
>>  
>> +    if ( !strcmp(opt_clocksource, "tsc") )
>> +    {
>> +        if ( try_platform_timer(&plt_tsc) > 0 )
>> +        {
>> +            printk(XENLOG_INFO "Switched to Platform timer %s TSC\n",
>> +                   freq_string(plt_src.frequency));
>> +
>> +            init_percpu_time();
>> +
>> +            init_timer(&calibration_timer, time_calibration, NULL, 0);
>> +            set_timer(&calibration_timer, NOW() + EPOCH);
>> +        }
>> +    }
>> +
>>      return 0;
>>  }
>>  __initcall(verify_tsc_reliability);
>> @@ -1476,6 +1568,7 @@ void __init early_time_init(void)
>>      struct cpu_time *t = &this_cpu(cpu_time);
>>      u64 tmp = init_pit_and_calibrate_tsc();
>>  
>> +    tsc_freq = tmp;
>>      set_time_scale(&t->tsc_scale, tmp);
>>      t->local_tsc_stamp = boot_tsc_stamp;
>>  
>> -- 
>> 2.1.4

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 3/6] x86/time: implement tsc as clocksource
  2016-04-01 18:38     ` Joao Martins
@ 2016-04-01 18:45       ` Konrad Rzeszutek Wilk
  2016-04-03 18:47         ` Joao Martins
  0 siblings, 1 reply; 43+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-04-01 18:45 UTC (permalink / raw)
  To: Joao Martins; +Cc: Andrew Cooper, Keir Fraser, Jan Beulich, xen-devel

On Fri, Apr 01, 2016 at 07:38:18PM +0100, Joao Martins wrote:
> [snip]
> 
> >> diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
> >> index ed4ed24..2602dda 100644
> >> --- a/xen/arch/x86/time.c
> >> +++ b/xen/arch/x86/time.c
> >> @@ -432,6 +432,63 @@ uint64_t ns_to_acpi_pm_tick(uint64_t ns)
> >>  }
> >>  
> >>  /************************************************************
> >> + * PLATFORM TIMER 4: TSC
> >> + */
> >> +static u64 tsc_freq;
> >> +static unsigned long tsc_max_warp;
> >> +static void tsc_check_reliability(void);
> >> +
> >> +static int __init init_tsctimer(struct platform_timesource *pts)
> >> +{
> >> +    bool_t tsc_reliable = 0;
> > 
> > No need to set it to zero.
> OK.
> 
> >> +
> >> +    tsc_check_reliability();
> > 
> > This has been already called by verify_tsc_reliability which calls this
> > function. Should we set tsc_max_warp to zero before calling it?
> Ah, correct. But may be it's not necessary to run the tsc_check_reliability here
> at all as what I am doing is ineficient. See my other comment below.
> 
> > 
> >> +
> >> +    if ( tsc_max_warp > 0 )
> >> +    {
> >> +        tsc_reliable = 0;
> > 
> > Ditto. It is by default zero.
> OK.
> 
> > 
> >> +        printk(XENLOG_INFO "TSC: didn't passed warp test\n");
> > 
> > So the earlier test by verify_tsc_reliability did already this check and
> > printed this out - and also cleared the X86_FEATURE_TSC_RELIABLE.
> > 
> > So can this check above be removed then?
> > 
> > Or are you thinking to ditch what verify_tsc_reliability does?
> > 
> I had the tsc_check_reliability here because TSC could still be deemed reliable
> for max_cstate <= 2 or with CONSTANT_TSC + NONSTOP_TSC. The way I have it, the
> most likely scenario (i.e. having TSC_RELIABLE) would run twice. Perhaps a
> better way of doing this would be to run the warp test solely on
> verify_tsc_reliability() in all possible conditions to be deemed reliable? And
> then I could even remove almost the whole init_tsctimer if it was to be called
> when no warps are observed.

So..
> 
> >> +    }
> >> +    else if ( boot_cpu_has(X86_FEATURE_TSC_RELIABLE) ||
> >> +              (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
> >> +               boot_cpu_has(X86_FEATURE_NONSTOP_TSC)) )
> >> +    {
> >> +        tsc_reliable = 1;
> >> +    }
> >> +    else if ( boot_cpu_has(X86_FEATURE_CONSTANT_TSC) )
> >> +    {
> >> +        tsc_reliable = (max_cstate <= 2);
> >> +
> >> +        if ( tsc_reliable )
> >> +            printk(XENLOG_INFO "TSC: no deep Cstates, deemed reliable\n");
> >> +        else
> >> +            printk(XENLOG_INFO "TSC: deep Cstates possible, so not reliable\n");

.. is that always true? As in if this is indeed the case should we clear
X86_FEATURE_CONSTANT_TSC bit? And make check be part of tsc_check_reliability?

Then init_tsctimer() would just need to check for the boot_cpu_has bits being
set.

As in:

 if ( boot_cpu_has(X86_FEATURE_TSC_RELIABLE) ||
      (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
       boot_cpu_has(X86_FEATURE_NONSTOP_TSC)) )
  {
	pts->frequency = tsc_freq;
	return 1;
   }

   return 0;

And tsc_check_reliability would be in charge of clearing the CPU bits if something
is off.

But maybe that is not good? As in, could we mess up and clear those bits
even if they are suppose to be set?

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 3/6] x86/time: implement tsc as clocksource
  2016-04-01 18:45       ` Konrad Rzeszutek Wilk
@ 2016-04-03 18:47         ` Joao Martins
  0 siblings, 0 replies; 43+ messages in thread
From: Joao Martins @ 2016-04-03 18:47 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Andrew Cooper, Keir Fraser, Jan Beulich, xen-devel



On 04/01/2016 07:45 PM, Konrad Rzeszutek Wilk wrote:
> On Fri, Apr 01, 2016 at 07:38:18PM +0100, Joao Martins wrote:
>> [snip]
>>
>>>> diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
>>>> index ed4ed24..2602dda 100644
>>>> --- a/xen/arch/x86/time.c
>>>> +++ b/xen/arch/x86/time.c
>>>> @@ -432,6 +432,63 @@ uint64_t ns_to_acpi_pm_tick(uint64_t ns)
>>>>  }
>>>>  
>>>>  /************************************************************
>>>> + * PLATFORM TIMER 4: TSC
>>>> + */
>>>> +static u64 tsc_freq;
>>>> +static unsigned long tsc_max_warp;
>>>> +static void tsc_check_reliability(void);
>>>> +
>>>> +static int __init init_tsctimer(struct platform_timesource *pts)
>>>> +{
>>>> +    bool_t tsc_reliable = 0;
>>>
>>> No need to set it to zero.
>> OK.
>>
>>>> +
>>>> +    tsc_check_reliability();
>>>
>>> This has been already called by verify_tsc_reliability which calls this
>>> function. Should we set tsc_max_warp to zero before calling it?
>> Ah, correct. But may be it's not necessary to run the tsc_check_reliability here
>> at all as what I am doing is ineficient. See my other comment below.
>>
>>>
>>>> +
>>>> +    if ( tsc_max_warp > 0 )
>>>> +    {
>>>> +        tsc_reliable = 0;
>>>
>>> Ditto. It is by default zero.
>> OK.
>>
>>>
>>>> +        printk(XENLOG_INFO "TSC: didn't passed warp test\n");
>>>
>>> So the earlier test by verify_tsc_reliability did already this check and
>>> printed this out - and also cleared the X86_FEATURE_TSC_RELIABLE.
>>>
>>> So can this check above be removed then?
>>>
>>> Or are you thinking to ditch what verify_tsc_reliability does?
>>>
>> I had the tsc_check_reliability here because TSC could still be deemed reliable
>> for max_cstate <= 2 or with CONSTANT_TSC + NONSTOP_TSC. The way I have it, the
>> most likely scenario (i.e. having TSC_RELIABLE) would run twice. Perhaps a
>> better way of doing this would be to run the warp test solely on
>> verify_tsc_reliability() in all possible conditions to be deemed reliable? And
>> then I could even remove almost the whole init_tsctimer if it was to be called
>> when no warps are observed.
> 
> So..
>>
>>>> +    }
>>>> +    else if ( boot_cpu_has(X86_FEATURE_TSC_RELIABLE) ||
>>>> +              (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
>>>> +               boot_cpu_has(X86_FEATURE_NONSTOP_TSC)) )
>>>> +    {
>>>> +        tsc_reliable = 1;
>>>> +    }
>>>> +    else if ( boot_cpu_has(X86_FEATURE_CONSTANT_TSC) )
>>>> +    {
>>>> +        tsc_reliable = (max_cstate <= 2);
>>>> +
>>>> +        if ( tsc_reliable )
>>>> +            printk(XENLOG_INFO "TSC: no deep Cstates, deemed reliable\n");
>>>> +        else
>>>> +            printk(XENLOG_INFO "TSC: deep Cstates possible, so not reliable\n");
> 
> .. is that always true?
Might not be on older platforms - but I could be stricter here and require
max_cstates 0 to allow TSC clocksource to be used. CONSTANT_TSC is set based on
CPU model (Manual section 17.14, 2nd paragrah), and (on Intel) AFAICT constant
rate can be interpreted to being across P/T states, thus leaving us with the C
states. It is only architecturally guaranteed that TSC doesn't stop on deep
C-states (Section 36.8.3) if invariant TSC is set (CPUID.80000007:EDX[8]). If
this bit is set, on Intel it guarantees to have synchronized and nonstop TSCs
across all states (as also stated in the Section 17.14.1). NONSTOP_TSC means
that the TSC does not stop in deep C states (C3 or higher), so decreasing the
maximum C states to 2 (or disabling entirely) would provide the equivalent to a
nonstop TSC. Thus deeming it reliable _if_ the warp test passes.

> As in if this is indeed the case should we clear
> X86_FEATURE_CONSTANT_TSC bit? And make check be part of tsc_check_reliability?
I am not sure about clearing CONSTANT_TSC as this is based on CPU model. But I
believe the main issue would be to know whether the TSC stops or on deep C states.

> Then init_tsctimer() would just need to check for the boot_cpu_has bits being
> set.
> 
> As in:
> 
>  if ( boot_cpu_has(X86_FEATURE_TSC_RELIABLE) ||
>       (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
>        boot_cpu_has(X86_FEATURE_NONSTOP_TSC)) )
>   {
> 	pts->frequency = tsc_freq;
> 	return 1;
>    }
>
Yeah something along those lines. My idea previously was to not even check these
bits, and assume init_tsctimer is called knowing that we have a reliable TSC
source as we check all of it beforehand? Something like this:

 static int __init verify_tsc_reliability(void)
 {
    if ( boot_cpu_has(X86_FEATURE_TSC_RELIABLE) ||
         (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
          (boot_cpu_has(X86_FEATURE_NONSTOP_TSC) || max_cstate <= 2)) )
     {
         if (tsc_warp_warp)
         {
             printk("%s: TSC warp detected, disabling TSC_RELIABLE\n",
                    __func__);
             if ( boot_cpu_has(X86_FEATURE_TSC_RELIABLE) )
                  setup_clear_cpu_cap(X86_FEATURE_TSC_RELIABLE);
          }
          else if ( !strcmp(opt_clocksource, "tsc") )
          {
          ...


And then init_tsctimer would just be:

static int __init init_tsctimer(struct platform_timesource *pts)
{
    pts->frequency = tsc_freq;
    return 1;
}

?

>    return 0;
>
> And tsc_check_reliability would be in charge of clearing the CPU bits if something
> is off.
Perhaps you mean verify_tsc_reliability()? That's already
clearing TSC_RELIABLE in the event of warps as you previously mentioned.
tsc_check_reliability is the actual warp test - might not be the best place to
clear it.

> But maybe that is not good? As in, could we mess up and clear those bits
> even if they are suppose to be set?
Hm, I am not sure how we could mess up if we clear them, but these bits look
correctly set to me.

Joao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 2/6] x86/time: refactor init_platform_time()
  2016-03-29 13:44 ` [PATCH v2 2/6] x86/time: refactor init_platform_time() Joao Martins
  2016-04-01 16:10   ` Konrad Rzeszutek Wilk
@ 2016-04-05 10:09   ` Jan Beulich
  2016-04-05 10:55     ` Joao Martins
  1 sibling, 1 reply; 43+ messages in thread
From: Jan Beulich @ 2016-04-05 10:09 UTC (permalink / raw)
  To: Joao Martins; +Cc: Andrew Cooper, Keir Fraser, xen-devel

>>> On 29.03.16 at 15:44, <joao.m.martins@oracle.com> wrote:
> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -533,6 +533,30 @@ static void resume_platform_timer(void)
>      plt_stamp = plt_src.read_counter();
>  }
>  
> +static int __init try_platform_timer(struct platform_timesource *pts)
> +{
> +    int rc = -1;

Pointless initializer. In fact ...

> +    rc = pts->init(pts);

... this could be the initializer.

> +    if ( rc <= 0 )
> +        return rc;
> +
> +    plt_mask = (u64)~0ull >> (64 - pts->counter_bits);
> +
> +    set_time_scale(&plt_scale, pts->frequency);
> +
> +    plt_overflow_period = scale_delta(
> +        1ull << (pts->counter_bits - 1), &plt_scale);
> +    init_timer(&plt_overflow_timer, plt_overflow, NULL, 0);
> +    plt_src = *pts;
> +    plt_overflow(NULL);
> +
> +    platform_timer_stamp = plt_stamp64;
> +    stime_platform_stamp = NOW();
> +
> +    return rc;
> +}

Moving here all this setting up of static/global data makes me
wonder how you mean to consistently re-use this function for
your new purpose.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 1/6] public/xen.h: add flags field to vcpu_time_info
  2016-03-29 13:44 ` [PATCH v2 1/6] public/xen.h: add flags field to vcpu_time_info Joao Martins
  2016-03-30 15:49   ` Ian Jackson
  2016-03-31  7:13   ` Jan Beulich
@ 2016-04-05 10:16   ` Jan Beulich
  2016-04-05 10:59     ` Joao Martins
  2 siblings, 1 reply; 43+ messages in thread
From: Jan Beulich @ 2016-04-05 10:16 UTC (permalink / raw)
  To: Joao Martins; +Cc: Keir Fraser, Tim Deegan, Ian Jackson, xen-devel

>>> On 29.03.16 at 15:44, <joao.m.martins@oracle.com> wrote:
> --- a/xen/include/public/xen.h
> +++ b/xen/include/public/xen.h
> @@ -614,10 +614,14 @@ struct vcpu_time_info {
>       */
>      uint32_t tsc_to_system_mul;
>      int8_t   tsc_shift;
> -    int8_t   pad1[3];
> +    uint8_t  flags;
> +    uint8_t  pad1[2];
>  }; /* 32 bytes */

I've just noticed there's another issue here: The new structure
layout should be surfaced only conditionally (evaluating
__XEN_INTERFACE_VERSION__). This of course could again be
addressed while committing, is - as Ian had asked for - this is to
go in ahead of the rest of the series.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 3/6] x86/time: implement tsc as clocksource
  2016-03-29 13:44 ` [PATCH v2 3/6] x86/time: implement tsc as clocksource Joao Martins
  2016-03-29 17:39   ` Konrad Rzeszutek Wilk
  2016-04-01 16:43   ` Konrad Rzeszutek Wilk
@ 2016-04-05 10:43   ` Jan Beulich
  2016-04-05 14:56     ` Joao Martins
  2 siblings, 1 reply; 43+ messages in thread
From: Jan Beulich @ 2016-04-05 10:43 UTC (permalink / raw)
  To: Joao Martins; +Cc: Andrew Cooper, Keir Fraser, xen-devel

>>> On 29.03.16 at 15:44, <joao.m.martins@oracle.com> wrote:
> Introduce support for using TSC as platform time which is the highest
> resolution time and most performant to get (~20 nsecs).  Though there
> are also several problems associated with its usage, and there isn't a
> complete (and architecturally defined) guarantee that all machines
> will provide reliable and monotonic TSC across all CPUs, on different
> sockets and different P/C states.  I believe Intel to be the only that
> can guarantee that. For this reason it's only set when adminstrator
> changes "clocksource" boot option to "tsc". Initializing TSC
> clocksource requires all CPUs to have the tsc reliability checks
> performed. init_xen_time is called before all CPUs are up, and so we
> start with HPET at boot time, and switch later to TSC.

Please don't make possibly wrong statements: There's no guarantee
we'd start with HPET - might as well be PMTMR or PIT.

> The switch then
> happens on verify_tsc_reliability initcall that is invoked when all
> CPUs are up. When attempting to initializing TSC we also check for
> time warps and appropriate CPU features i.e.  TSC_RELIABLE,
> CONSTANT_TSC and NONSTOP_TSC. And in case none of these conditions are
> met, we keep the clocksource that was previously initialized on
> init_xen_time.

DYM "And in case any of these conditions is not met, ..."? Or,
looking at the code, something more complex?

> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -432,6 +432,63 @@ uint64_t ns_to_acpi_pm_tick(uint64_t ns)
>  }
>  
>  /************************************************************
> + * PLATFORM TIMER 4: TSC
> + */
> +static u64 tsc_freq;
> +static unsigned long tsc_max_warp;
> +static void tsc_check_reliability(void);
> +
> +static int __init init_tsctimer(struct platform_timesource *pts)
> +{
> +    bool_t tsc_reliable = 0;
> +
> +    tsc_check_reliability();
> +
> +    if ( tsc_max_warp > 0 )
> +    {
> +        tsc_reliable = 0;

Redundant assignment.

> +static void resume_tsctimer(struct platform_timesource *pts)
> +{
> +}

No need for an empty function - other clock sources don't have
such empty stubs either (i.e. the caller checks for NULL).

> @@ -541,6 +613,10 @@ static int __init try_platform_timer(struct platform_timesource *pts)
>      if ( rc <= 0 )
>          return rc;
>  
> +    /* We have a platform timesource already so reset it */
> +    if ( plt_src.counter_bits != 0 )
> +        reset_platform_timer();

What if any of the time functions get used between here and the
setting of the new clock source? For example, what will happen to
log messages when "console_timestamps=..." is in effect?

> @@ -566,7 +642,9 @@ static void __init init_platform_timer(void)
>      struct platform_timesource *pts = NULL;
>      int i, rc = -1;
>  
> -    if ( opt_clocksource[0] != '\0' )
> +    /* clocksource=tsc is initialized later when all CPUS are up */
> +    if ( (opt_clocksource[0] != '\0') &&
> +         (strcmp(opt_clocksource, "tsc") != 0) )

In line with the inverse check done below (using ! instead of == 0)
please omit the != 0 here.

> @@ -1437,6 +1515,20 @@ static int __init verify_tsc_reliability(void)
>          }
>      }
>  
> +    if ( !strcmp(opt_clocksource, "tsc") )
> +    {
> +        if ( try_platform_timer(&plt_tsc) > 0 )
> +        {
> +            printk(XENLOG_INFO "Switched to Platform timer %s TSC\n",
> +                   freq_string(plt_src.frequency));
> +
> +            init_percpu_time();

So you re-do this for the BP, but not for any of the APs?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 2/6] x86/time: refactor init_platform_time()
  2016-04-05 10:09   ` Jan Beulich
@ 2016-04-05 10:55     ` Joao Martins
  2016-04-05 11:16       ` Jan Beulich
  0 siblings, 1 reply; 43+ messages in thread
From: Joao Martins @ 2016-04-05 10:55 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Keir Fraser, xen-devel

On 04/05/2016 11:09 AM, Jan Beulich wrote:
>>>> On 29.03.16 at 15:44, <joao.m.martins@oracle.com> wrote:
>> --- a/xen/arch/x86/time.c
>> +++ b/xen/arch/x86/time.c
>> @@ -533,6 +533,30 @@ static void resume_platform_timer(void)
>>      plt_stamp = plt_src.read_counter();
>>  }
>>  
>> +static int __init try_platform_timer(struct platform_timesource *pts)
>> +{
>> +    int rc = -1;
> 
> Pointless initializer. In fact ...
> 
>> +    rc = pts->init(pts);
> 
> ... this could be the initializer.
> 
Ah yes. Will fix it.

>> +    if ( rc <= 0 )
>> +        return rc;
>> +
>> +    plt_mask = (u64)~0ull >> (64 - pts->counter_bits);
>> +
>> +    set_time_scale(&plt_scale, pts->frequency);
>> +
>> +    plt_overflow_period = scale_delta(
>> +        1ull << (pts->counter_bits - 1), &plt_scale);
>> +    init_timer(&plt_overflow_timer, plt_overflow, NULL, 0);
>> +    plt_src = *pts;
>> +    plt_overflow(NULL);
>> +
>> +    platform_timer_stamp = plt_stamp64;
>> +    stime_platform_stamp = NOW();
>> +
>> +    return rc;
>> +}
> 
> Moving here all this setting up of static/global data makes me
> wonder how you mean to consistently re-use this function for
> your new purpose.
My purpose is to reuse this initialization part for the case of switching from
from one clocksource to TSC at a later point (which is done in a different place
i.e. verify_tsc_reliability). Though this static/global data part in particular
is done if the clocksource initialization succeeds so I merged that in a single
helper hence the name "try_platform_timer". It also looks cleaner, and we would
leave opt_clocksource checking with plt_timers in init_platform_time.

Joao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 1/6] public/xen.h: add flags field to vcpu_time_info
  2016-04-05 10:16   ` Jan Beulich
@ 2016-04-05 10:59     ` Joao Martins
  0 siblings, 0 replies; 43+ messages in thread
From: Joao Martins @ 2016-04-05 10:59 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Keir Fraser, Tim Deegan, Ian Jackson, xen-devel



On 04/05/2016 11:16 AM, Jan Beulich wrote:
>>>> On 29.03.16 at 15:44, <joao.m.martins@oracle.com> wrote:
>> --- a/xen/include/public/xen.h
>> +++ b/xen/include/public/xen.h
>> @@ -614,10 +614,14 @@ struct vcpu_time_info {
>>       */
>>      uint32_t tsc_to_system_mul;
>>      int8_t   tsc_shift;
>> -    int8_t   pad1[3];
>> +    uint8_t  flags;
>> +    uint8_t  pad1[2];
>>  }; /* 32 bytes */
> 
> I've just noticed there's another issue here: The new structure
> layout should be surfaced only conditionally (evaluating
> __XEN_INTERFACE_VERSION__). This of course could again be
> addressed while committing, is - as Ian had asked for - this is to
> go in ahead of the rest of the series.
> 
I will fix this too on my end if this isn't going ahead of the rest of the series.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 2/6] x86/time: refactor init_platform_time()
  2016-04-05 10:55     ` Joao Martins
@ 2016-04-05 11:16       ` Jan Beulich
  0 siblings, 0 replies; 43+ messages in thread
From: Jan Beulich @ 2016-04-05 11:16 UTC (permalink / raw)
  To: Joao Martins; +Cc: Andrew Cooper, Keir Fraser, xen-devel

>>> On 05.04.16 at 12:55, <joao.m.martins@oracle.com> wrote:
> On 04/05/2016 11:09 AM, Jan Beulich wrote:
>>>>> On 29.03.16 at 15:44, <joao.m.martins@oracle.com> wrote:
>>> +    if ( rc <= 0 )
>>> +        return rc;
>>> +
>>> +    plt_mask = (u64)~0ull >> (64 - pts->counter_bits);
>>> +
>>> +    set_time_scale(&plt_scale, pts->frequency);
>>> +
>>> +    plt_overflow_period = scale_delta(
>>> +        1ull << (pts->counter_bits - 1), &plt_scale);
>>> +    init_timer(&plt_overflow_timer, plt_overflow, NULL, 0);
>>> +    plt_src = *pts;
>>> +    plt_overflow(NULL);
>>> +
>>> +    platform_timer_stamp = plt_stamp64;
>>> +    stime_platform_stamp = NOW();
>>> +
>>> +    return rc;
>>> +}
>> 
>> Moving here all this setting up of static/global data makes me
>> wonder how you mean to consistently re-use this function for
>> your new purpose.
> My purpose is to reuse this initialization part for the case of switching 
> from
> from one clocksource to TSC at a later point (which is done in a different 
> place
> i.e. verify_tsc_reliability). Though this static/global data part in 
> particular
> is done if the clocksource initialization succeeds so I merged that in a 
> single
> helper hence the name "try_platform_timer". It also looks cleaner, and we 
> would
> leave opt_clocksource checking with plt_timers in init_platform_time.

Well, I understand all this, but still wonder - is this second use then
correct? See also the comments on the next patch.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 4/6] x86/time: streamline platform time init on plt_init()
  2016-03-29 13:44 ` [PATCH v2 4/6] x86/time: streamline platform time init on plt_init() Joao Martins
@ 2016-04-05 11:46   ` Jan Beulich
  2016-04-05 15:12     ` Joao Martins
  0 siblings, 1 reply; 43+ messages in thread
From: Jan Beulich @ 2016-04-05 11:46 UTC (permalink / raw)
  To: Joao Martins; +Cc: Andrew Cooper, Keir Fraser, xen-devel

>>> On 29.03.16 at 15:44, <joao.m.martins@oracle.com> wrote:
> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -434,6 +434,7 @@ uint64_t ns_to_acpi_pm_tick(uint64_t ns)
>  /************************************************************
>   * PLATFORM TIMER 4: TSC
>   */
> +static bool_t clocksource_is_tsc;

__read_mostly, but see below.

> @@ -516,17 +519,31 @@ static s_time_t __read_platform_stime(u64 platform_time)
>      return (stime_platform_stamp + scale_delta(diff, &plt_scale));
>  }
>  
> +static void __plt_init(void)
> +{
> +    u64 count;
> +
> +    ASSERT(spin_is_locked(&platform_timer_lock));
> +    count = plt_src.read_counter();
> +    plt_stamp64 += (count - plt_stamp) & plt_mask;
> +    plt_stamp = count;
> +}

Note that this has nothing to do with "init" - it updates the two time
stamps, as is being made clear by ...

>  static void plt_overflow(void *unused)
>  {
>      int i;
> -    u64 count;
>      s_time_t now, plt_now, plt_wrap;
>  
>      spin_lock_irq(&platform_timer_lock);
>  
> -    count = plt_src.read_counter();
> -    plt_stamp64 += (count - plt_stamp) & plt_mask;
> -    plt_stamp = count;
> +    __plt_init();

... this use.

> @@ -621,11 +638,22 @@ static int __init try_platform_timer(struct platform_timesource *pts)
>  
>      set_time_scale(&plt_scale, pts->frequency);
>  
> -    plt_overflow_period = scale_delta(
> -        1ull << (pts->counter_bits - 1), &plt_scale);
> -    init_timer(&plt_overflow_timer, plt_overflow, NULL, 0);
>      plt_src = *pts;
> -    plt_overflow(NULL);
> +
> +    if ( clocksource_is_tsc )

Why not simply "if ( pts == plt_tsc )", eliminating the need for the
variable?

> +    {
> +        plt_init();
> +    }
> +    else
> +    {
> +        plt_overflow_period = scale_delta(
> +            1ull << (pts->counter_bits - 1), &plt_scale);
> +        init_timer(&plt_overflow_timer, plt_overflow, NULL, 0);
> +        plt_overflow(NULL);
> +
> +        printk("Platform timer overflow period is %lu secs\n",
> +               plt_overflow_period/SECONDS(1));

If we want this logged at all, then please at most as XENLOG_INFO.
Plus - is seconds granularity fine grained enough for all sources, i.e.
wouldn't there for typical HPET just be a single digit, not a lot of
precision that is? And finally: Blanks around / please.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 5/6] x86/time: refactor read_platform_stime()
  2016-03-29 13:44 ` [PATCH v2 5/6] x86/time: refactor read_platform_stime() Joao Martins
  2016-04-01 18:32   ` Konrad Rzeszutek Wilk
@ 2016-04-05 11:52   ` Jan Beulich
  2016-04-05 15:22     ` Joao Martins
  1 sibling, 1 reply; 43+ messages in thread
From: Jan Beulich @ 2016-04-05 11:52 UTC (permalink / raw)
  To: Joao Martins; +Cc: Andrew Cooper, Keir Fraser, xen-devel

>>> On 29.03.16 at 15:44, <joao.m.martins@oracle.com> wrote:
> -static s_time_t read_platform_stime(void)
> +static s_time_t read_platform_stime(u64 *stamp)
>  {
> -    u64 count;
> +    u64 plt_stamp_counter, count;

"stamp" and "counter" seem kind of redundant.

>      s_time_t stime;
>  
>      ASSERT(!local_irq_is_enabled());
>  
>      spin_lock(&platform_timer_lock);
> -    count = plt_stamp64 + ((plt_src.read_counter() - plt_stamp) & plt_mask);
> +    plt_stamp_counter = plt_src.read_counter();
> +    count = plt_stamp64 + ((plt_stamp_counter - plt_stamp) & plt_mask);
>      stime = __read_platform_stime(count);
> +    if ( stamp )
> +        *stamp = plt_stamp_counter;
>      spin_unlock(&platform_timer_lock);

What reason is there to do that conditional write inside the locked
region?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 6/6] x86/time: implement PVCLOCK_TSC_STABLE_BIT
  2016-03-29 13:44 ` [PATCH v2 6/6] x86/time: implement PVCLOCK_TSC_STABLE_BIT Joao Martins
@ 2016-04-05 12:22   ` Jan Beulich
  2016-04-05 21:34     ` Joao Martins
  0 siblings, 1 reply; 43+ messages in thread
From: Jan Beulich @ 2016-04-05 12:22 UTC (permalink / raw)
  To: Joao Martins; +Cc: Andrew Cooper, Keir Fraser, xen-devel

>>> On 29.03.16 at 15:44, <joao.m.martins@oracle.com> wrote:
> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -43,6 +43,10 @@
>  static char __initdata opt_clocksource[10];
>  string_param("clocksource", opt_clocksource);
>  
> +/* opt_nocpuhotplug: Set if CPU hotplug isn't meant to be used */
> +static bool_t __initdata opt_nocpuhotplug;
> +boolean_param("nocpuhotplug", opt_nocpuhotplug);

If we were to have such a new option, it would need to be
accompanied by an entry in the command line option doc. But
I'm opposed to this: For one, the variable being static here
means there is nothing that actually suppresses CPU hotplug
to happen. And then I think this can, for all practical purposes,
be had by suitably using existing command line options, namely
"max_cpus=", such that set_nr_cpu_ids() won't allow for any
further CPUs to get added. Albeit I admit that if someone was
to bring down some CPU and then hotplug another one, we
might still be in trouble. So maybe the better approach would
be to fail onlining of CPUs that don't meet the criteria when
"clocksource=tsc"?

> @@ -435,6 +439,7 @@ uint64_t ns_to_acpi_pm_tick(uint64_t ns)
>   * PLATFORM TIMER 4: TSC
>   */
>  static bool_t clocksource_is_tsc;
> +static bool_t use_tsc_stable_bit;

__read_mostly?

> @@ -468,6 +473,11 @@ static int __init init_tsctimer(struct platform_timesource *pts)
>  
>      pts->frequency = tsc_freq;
>      clocksource_is_tsc = tsc_reliable;
> +    use_tsc_stable_bit = clocksource_is_tsc &&
> +        ((nr_cpu_ids == num_present_cpus()) || opt_nocpuhotplug);

See my remark above regarding the reliability of this.

> @@ -1431,6 +1443,22 @@ static void time_calibration_std_rendezvous(void *_r)
>      raise_softirq(TIME_CALIBRATE_SOFTIRQ);
>  }
>  
> +/*
> + * Rendezvous function used when clocksource is TSC and
> + * no CPU hotplug will be performed.
> + */
> +static void time_calibration_nop_rendezvous(void *_r)

Even if done so in existing code - no new local variable or function
parameters starting with an underscore please.

> +{
> +    struct cpu_calibration *c = &this_cpu(cpu_calibration);
> +    struct calibration_rendezvous *r = _r;

const

> +    c->local_tsc_stamp = r->master_tsc_stamp;
> +    c->stime_local_stamp = get_s_time();
> +    c->stime_master_stamp = r->master_stime;
> +
> +    raise_softirq(TIME_CALIBRATE_SOFTIRQ);
> +}

Perhaps this whole function should move up ahead of the other
two, so that they both can use this one instead of now duplicating
the same code a 3rd time? Or maybe a new helper function would
be better, to also account for the difference in what
c->local_tsc_stamp gets set from (which could then become a
parameter of that new function).

> @@ -1440,6 +1468,13 @@ static void time_calibration(void *unused)
>          .semaphore = ATOMIC_INIT(0)
>      };
>  
> +    if ( use_tsc_stable_bit )
> +    {
> +        local_irq_disable();
> +        r.master_stime = read_platform_stime(&r.master_tsc_stamp);
> +        local_irq_enable();
> +    }

So this can't be in time_calibration_nop_rendezvous() because
you want to avoid the actual rendezvousing. But isn't the then
possibly much larger gap between read_platform_stime() (which
parallels the rdtsc()-s in the other two cases) and get_s_time()
invocation going to become a problem?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 3/6] x86/time: implement tsc as clocksource
  2016-04-05 10:43   ` Jan Beulich
@ 2016-04-05 14:56     ` Joao Martins
  2016-04-05 15:12       ` Jan Beulich
  0 siblings, 1 reply; 43+ messages in thread
From: Joao Martins @ 2016-04-05 14:56 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Keir Fraser, xen-devel

On 04/05/2016 11:43 AM, Jan Beulich wrote:
>>>> On 29.03.16 at 15:44, <joao.m.martins@oracle.com> wrote:
>> Introduce support for using TSC as platform time which is the highest
>> resolution time and most performant to get (~20 nsecs).  Though there
>> are also several problems associated with its usage, and there isn't a
>> complete (and architecturally defined) guarantee that all machines
>> will provide reliable and monotonic TSC across all CPUs, on different
>> sockets and different P/C states.  I believe Intel to be the only that
>> can guarantee that. For this reason it's only set when adminstrator
>> changes "clocksource" boot option to "tsc". Initializing TSC
>> clocksource requires all CPUs to have the tsc reliability checks
>> performed. init_xen_time is called before all CPUs are up, and so we
>> start with HPET at boot time, and switch later to TSC.
> 
> Please don't make possibly wrong statements: There's no guarantee
> we'd start with HPET - might as well be PMTMR or PIT.
My apologies - while the commit message doesn't express this, I meant it to be
"for example we would start with HPET (...)". I gave that example as it's the
one preferable in plt_timers.

> 
>> The switch then
>> happens on verify_tsc_reliability initcall that is invoked when all
>> CPUs are up. When attempting to initializing TSC we also check for
>> time warps and appropriate CPU features i.e.  TSC_RELIABLE,
>> CONSTANT_TSC and NONSTOP_TSC. And in case none of these conditions are
>> met, we keep the clocksource that was previously initialized on
>> init_xen_time.
> 
> DYM "And in case any of these conditions is not met, ..."?
Yes. My use of "none" may be wrong here so just to be sure what I mean is: (...)
If all of the conditions in the previous sentence are not met (...)

> Or,
> looking at the code, something more complex?
> 
>> --- a/xen/arch/x86/time.c
>> +++ b/xen/arch/x86/time.c
>> @@ -432,6 +432,63 @@ uint64_t ns_to_acpi_pm_tick(uint64_t ns)
>>  }
>>  
>>  /************************************************************
>> + * PLATFORM TIMER 4: TSC
>> + */
>> +static u64 tsc_freq;
>> +static unsigned long tsc_max_warp;
>> +static void tsc_check_reliability(void);
>> +
>> +static int __init init_tsctimer(struct platform_timesource *pts)
>> +{
>> +    bool_t tsc_reliable = 0;
>> +
>> +    tsc_check_reliability();
>> +
>> +    if ( tsc_max_warp > 0 )
>> +    {
>> +        tsc_reliable = 0;
> 
> Redundant assignment.
> 
Yes, Konrad had point that out too. Fixed that.

>> +static void resume_tsctimer(struct platform_timesource *pts)
>> +{
>> +}
> 
> No need for an empty function - other clock sources don't have
> such empty stubs either (i.e. the caller checks for NULL).
> 
OK.

>> @@ -541,6 +613,10 @@ static int __init try_platform_timer(struct platform_timesource *pts)
>>      if ( rc <= 0 )
>>          return rc;
>>  
>> +    /* We have a platform timesource already so reset it */
>> +    if ( plt_src.counter_bits != 0 )
>> +        reset_platform_timer();
> 
> What if any of the time functions get used between here and the
> setting of the new clock source? For example, what will happen to
> log messages when "console_timestamps=..." is in effect?
time functions will use NOW() (i.e. get_s_time) which uses cpu_time structs
being updated on local_time_calibration() or cpu frequency changes.
reset_platform_timer() will zero out some of the variables used by the
plt_overflow and read_platform_stime(). The overflow and calibration isn't an
issue as the timers are previously killed so any further calls will use what's
on cpu_time while plt_src is being changed.

The case I could see this being different is if cpu_frequency_change is called
between reset_platform_time() and the next update of stime_platform_stamp. In
this situation the calculation would end up a bit different meaning subsequent
calls of read_platform_stime() would return the equivalent to
scale_delta(plt_src->read_counter(), plt_scale), as opposed to computing with a
previous taken stime_platform_stamp and incrementing a difference with a counter
read. But can this situation actually happen?

> 
>> @@ -566,7 +642,9 @@ static void __init init_platform_timer(void)
>>      struct platform_timesource *pts = NULL;
>>      int i, rc = -1;
>>  
>> -    if ( opt_clocksource[0] != '\0' )
>> +    /* clocksource=tsc is initialized later when all CPUS are up */
>> +    if ( (opt_clocksource[0] != '\0') &&
>> +         (strcmp(opt_clocksource, "tsc") != 0) )
> 
> In line with the inverse check done below (using ! instead of == 0)
> please omit the != 0 here.
> 
OK.

>> @@ -1437,6 +1515,20 @@ static int __init verify_tsc_reliability(void)
>>          }
>>      }
>>  
>> +    if ( !strcmp(opt_clocksource, "tsc") )
>> +    {
>> +        if ( try_platform_timer(&plt_tsc) > 0 )
>> +        {
>> +            printk(XENLOG_INFO "Switched to Platform timer %s TSC\n",
>> +                   freq_string(plt_src.frequency));
>> +
>> +            init_percpu_time();
> 
> So you re-do this for the BP, but not for any of the APs?
I had overlooked the invocation made in start_secondary(). I also need to redo
this on the APs.

> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 3/6] x86/time: implement tsc as clocksource
  2016-04-05 14:56     ` Joao Martins
@ 2016-04-05 15:12       ` Jan Beulich
  2016-04-05 17:07         ` Joao Martins
  0 siblings, 1 reply; 43+ messages in thread
From: Jan Beulich @ 2016-04-05 15:12 UTC (permalink / raw)
  To: Joao Martins; +Cc: Andrew Cooper, Keir Fraser, xen-devel

>>> On 05.04.16 at 16:56, <joao.m.martins@oracle.com> wrote:
> On 04/05/2016 11:43 AM, Jan Beulich wrote:
>>>>> On 29.03.16 at 15:44, <joao.m.martins@oracle.com> wrote:
>>> @@ -541,6 +613,10 @@ static int __init try_platform_timer(struct platform_timesource *pts)
>>>      if ( rc <= 0 )
>>>          return rc;
>>>  
>>> +    /* We have a platform timesource already so reset it */
>>> +    if ( plt_src.counter_bits != 0 )
>>> +        reset_platform_timer();
>> 
>> What if any of the time functions get used between here and the
>> setting of the new clock source? For example, what will happen to
>> log messages when "console_timestamps=..." is in effect?
> time functions will use NOW() (i.e. get_s_time) which uses cpu_time structs
> being updated on local_time_calibration() or cpu frequency changes.
> reset_platform_timer() will zero out some of the variables used by the
> plt_overflow and read_platform_stime(). The overflow and calibration isn't an
> issue as the timers are previously killed so any further calls will use what's
> on cpu_time while plt_src is being changed.
> 
> The case I could see this being different is if cpu_frequency_change is called
> between reset_platform_time() and the next update of stime_platform_stamp. In
> this situation the calculation would end up a bit different meaning subsequent
> calls of read_platform_stime() would return the equivalent to
> scale_delta(plt_src->read_counter(), plt_scale), as opposed to computing with a
> previous taken stime_platform_stamp and incrementing a difference with a counter
> read. But can this situation actually happen?

No matter which CPU freq model is being used (right now, may
change when the Intel P-state driver finally arrives), Dom0 is
required to be executing already, so no issue here.

Since you didn't go into any detail on the specific example of log
time stamps - am I to imply that they're completely unaffected by
this (i.e. there's also no risk of them going backwards for a brief
period of time)?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 4/6] x86/time: streamline platform time init on plt_init()
  2016-04-05 11:46   ` Jan Beulich
@ 2016-04-05 15:12     ` Joao Martins
  2016-04-05 15:22       ` Jan Beulich
  0 siblings, 1 reply; 43+ messages in thread
From: Joao Martins @ 2016-04-05 15:12 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Keir Fraser, xen-devel



On 04/05/2016 12:46 PM, Jan Beulich wrote:
>>>> On 29.03.16 at 15:44, <joao.m.martins@oracle.com> wrote:
>> --- a/xen/arch/x86/time.c
>> +++ b/xen/arch/x86/time.c
>> @@ -434,6 +434,7 @@ uint64_t ns_to_acpi_pm_tick(uint64_t ns)
>>  /************************************************************
>>   * PLATFORM TIMER 4: TSC
>>   */
>> +static bool_t clocksource_is_tsc;
> 
> __read_mostly, but see below.
> 
>> @@ -516,17 +519,31 @@ static s_time_t __read_platform_stime(u64 platform_time)
>>      return (stime_platform_stamp + scale_delta(diff, &plt_scale));
>>  }
>>  
>> +static void __plt_init(void)
>> +{
>> +    u64 count;
>> +
>> +    ASSERT(spin_is_locked(&platform_timer_lock));
>> +    count = plt_src.read_counter();
>> +    plt_stamp64 += (count - plt_stamp) & plt_mask;
>> +    plt_stamp = count;
>> +}
> 
> Note that this has nothing to do with "init" - it updates the two time
> stamps, as is being made clear by ...
> 
>>  static void plt_overflow(void *unused)
>>  {
>>      int i;
>> -    u64 count;
>>      s_time_t now, plt_now, plt_wrap;
>>  
>>      spin_lock_irq(&platform_timer_lock);
>>  
>> -    count = plt_src.read_counter();
>> -    plt_stamp64 += (count - plt_stamp) & plt_mask;
>> -    plt_stamp = count;
>> +    __plt_init();
> 
> ... this use.
> 
Would you prefer changing the name to e.g "set_plt_stamp" ?

>> @@ -621,11 +638,22 @@ static int __init try_platform_timer(struct platform_timesource *pts)
>>  
>>      set_time_scale(&plt_scale, pts->frequency);
>>  
>> -    plt_overflow_period = scale_delta(
>> -        1ull << (pts->counter_bits - 1), &plt_scale);
>> -    init_timer(&plt_overflow_timer, plt_overflow, NULL, 0);
>>      plt_src = *pts;
>> -    plt_overflow(NULL);
>> +
>> +    if ( clocksource_is_tsc )
> 
> Why not simply "if ( pts == plt_tsc )", eliminating the need for the
> variable?
Yeah, good point. I will fix that.

> 
>> +    {
>> +        plt_init();
>> +    }
>> +    else
>> +    {
>> +        plt_overflow_period = scale_delta(
>> +            1ull << (pts->counter_bits - 1), &plt_scale);
>> +        init_timer(&plt_overflow_timer, plt_overflow, NULL, 0);
>> +        plt_overflow(NULL);
>> +
>> +        printk("Platform timer overflow period is %lu secs\n",
>> +               plt_overflow_period/SECONDS(1));
> 
> If we want this logged at all, then please at most as XENLOG_INFO.
OK.

> Plus - is seconds granularity fine grained enough for all sources, i.e.
> wouldn't there for typical HPET just be a single digit, not a lot of
> precision that is?
Could be, my HPET was around 2 minutes overflow period, but PIT was a single
digit as you mention. I will change that to MILLISECS(1000) for higher precision
- or I can remove it entirely if you prefer not logging this info.

> And finally: Blanks around / please.
OK.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 4/6] x86/time: streamline platform time init on plt_init()
  2016-04-05 15:12     ` Joao Martins
@ 2016-04-05 15:22       ` Jan Beulich
  2016-04-05 17:17         ` Joao Martins
  0 siblings, 1 reply; 43+ messages in thread
From: Jan Beulich @ 2016-04-05 15:22 UTC (permalink / raw)
  To: Joao Martins; +Cc: Andrew Cooper, Keir Fraser, xen-devel

>>> On 05.04.16 at 17:12, <joao.m.martins@oracle.com> wrote:
> On 04/05/2016 12:46 PM, Jan Beulich wrote:
>>>>> On 29.03.16 at 15:44, <joao.m.martins@oracle.com> wrote:
>>> @@ -516,17 +519,31 @@ static s_time_t __read_platform_stime(u64 platform_time)
>>>      return (stime_platform_stamp + scale_delta(diff, &plt_scale));
>>>  }
>>>  
>>> +static void __plt_init(void)
>>> +{
>>> +    u64 count;
>>> +
>>> +    ASSERT(spin_is_locked(&platform_timer_lock));
>>> +    count = plt_src.read_counter();
>>> +    plt_stamp64 += (count - plt_stamp) & plt_mask;
>>> +    plt_stamp = count;
>>> +}
>> 
>> Note that this has nothing to do with "init" - it updates the two time
>> stamps, as is being made clear by ...
>> 
>>>  static void plt_overflow(void *unused)
>>>  {
>>>      int i;
>>> -    u64 count;
>>>      s_time_t now, plt_now, plt_wrap;
>>>  
>>>      spin_lock_irq(&platform_timer_lock);
>>>  
>>> -    count = plt_src.read_counter();
>>> -    plt_stamp64 += (count - plt_stamp) & plt_mask;
>>> -    plt_stamp = count;
>>> +    __plt_init();
>> 
>> ... this use.
>> 
> Would you prefer changing the name to e.g "set_plt_stamp" ?

Or simply plt_update()?

>>> +    {
>>> +        plt_init();
>>> +    }
>>> +    else
>>> +    {
>>> +        plt_overflow_period = scale_delta(
>>> +            1ull << (pts->counter_bits - 1), &plt_scale);
>>> +        init_timer(&plt_overflow_timer, plt_overflow, NULL, 0);
>>> +        plt_overflow(NULL);
>>> +
>>> +        printk("Platform timer overflow period is %lu secs\n",
>>> +               plt_overflow_period/SECONDS(1));
>> 
>> If we want this logged at all, then please at most as XENLOG_INFO.
> OK.
> 
>> Plus - is seconds granularity fine grained enough for all sources, i.e.
>> wouldn't there for typical HPET just be a single digit, not a lot of
>> precision that is?
> Could be, my HPET was around 2 minutes overflow period, but PIT was a single
> digit as you mention. I will change that to MILLISECS(1000) for higher 
> precision

How is MILLISECS(1000) different from SECONDS(1)?

> - or I can remove it entirely if you prefer not logging this info.

Well, there had been times where this information would have been
quite useful in diagnosing problems. That's been a while back, but
knowing we had such issues I can't just say "drop the message",
even if I hope we won't have any similar problems anymore.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 5/6] x86/time: refactor read_platform_stime()
  2016-04-05 11:52   ` Jan Beulich
@ 2016-04-05 15:22     ` Joao Martins
  2016-04-05 15:26       ` Jan Beulich
  0 siblings, 1 reply; 43+ messages in thread
From: Joao Martins @ 2016-04-05 15:22 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Keir Fraser, xen-devel



On 04/05/2016 12:52 PM, Jan Beulich wrote:
>>>> On 29.03.16 at 15:44, <joao.m.martins@oracle.com> wrote:
>> -static s_time_t read_platform_stime(void)
>> +static s_time_t read_platform_stime(u64 *stamp)
>>  {
>> -    u64 count;
>> +    u64 plt_stamp_counter, count;
> 
> "stamp" and "counter" seem kind of redundant.
> 
A bit, perhaps you prefer the latter? There was a variable named "count", so I
named "stamp" for clearer distinction between the variables and the output arg.

>>      s_time_t stime;
>>  
>>      ASSERT(!local_irq_is_enabled());
>>  
>>      spin_lock(&platform_timer_lock);
>> -    count = plt_stamp64 + ((plt_src.read_counter() - plt_stamp) & plt_mask);
>> +    plt_stamp_counter = plt_src.read_counter();
>> +    count = plt_stamp64 + ((plt_stamp_counter - plt_stamp) & plt_mask);
>>      stime = __read_platform_stime(count);
>> +    if ( stamp )
>> +        *stamp = plt_stamp_counter;
>>      spin_unlock(&platform_timer_lock);
> 
> What reason is there to do that conditional write inside the locked
> region?
> 
None, I should move this conditional write out of this region.

Joao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 5/6] x86/time: refactor read_platform_stime()
  2016-04-05 15:22     ` Joao Martins
@ 2016-04-05 15:26       ` Jan Beulich
  2016-04-05 17:08         ` Joao Martins
  0 siblings, 1 reply; 43+ messages in thread
From: Jan Beulich @ 2016-04-05 15:26 UTC (permalink / raw)
  To: Joao Martins; +Cc: Andrew Cooper, Keir Fraser, xen-devel

>>> On 05.04.16 at 17:22, <joao.m.martins@oracle.com> wrote:
> On 04/05/2016 12:52 PM, Jan Beulich wrote:
>>>>> On 29.03.16 at 15:44, <joao.m.martins@oracle.com> wrote:
>>> -static s_time_t read_platform_stime(void)
>>> +static s_time_t read_platform_stime(u64 *stamp)
>>>  {
>>> -    u64 count;
>>> +    u64 plt_stamp_counter, count;
>> 
>> "stamp" and "counter" seem kind of redundant.
>> 
> A bit, perhaps you prefer the latter? There was a variable named "count", so I
> named "stamp" for clearer distinction between the variables and the output arg.

Between plt_stamp and plt_counter I'd indeed have a slight preference
towards the latter.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 3/6] x86/time: implement tsc as clocksource
  2016-04-05 15:12       ` Jan Beulich
@ 2016-04-05 17:07         ` Joao Martins
  0 siblings, 0 replies; 43+ messages in thread
From: Joao Martins @ 2016-04-05 17:07 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Keir Fraser, xen-devel

On 04/05/2016 04:12 PM, Jan Beulich wrote:
>>>> On 05.04.16 at 16:56, <joao.m.martins@oracle.com> wrote:
>> On 04/05/2016 11:43 AM, Jan Beulich wrote:
>>>>>> On 29.03.16 at 15:44, <joao.m.martins@oracle.com> wrote:
>>>> @@ -541,6 +613,10 @@ static int __init try_platform_timer(struct platform_timesource *pts)
>>>>      if ( rc <= 0 )
>>>>          return rc;
>>>>  
>>>> +    /* We have a platform timesource already so reset it */
>>>> +    if ( plt_src.counter_bits != 0 )
>>>> +        reset_platform_timer();
>>>
>>> What if any of the time functions get used between here and the
>>> setting of the new clock source? For example, what will happen to
>>> log messages when "console_timestamps=..." is in effect?
>> time functions will use NOW() (i.e. get_s_time) which uses cpu_time structs
>> being updated on local_time_calibration() or cpu frequency changes.
>> reset_platform_timer() will zero out some of the variables used by the
>> plt_overflow and read_platform_stime(). The overflow and calibration isn't an
>> issue as the timers are previously killed so any further calls will use what's
>> on cpu_time while plt_src is being changed.
>>
>> The case I could see this being different is if cpu_frequency_change is called
>> between reset_platform_time() and the next update of stime_platform_stamp. In
>> this situation the calculation would end up a bit different meaning subsequent
>> calls of read_platform_stime() would return the equivalent to
>> scale_delta(plt_src->read_counter(), plt_scale), as opposed to computing with a
>> previous taken stime_platform_stamp and incrementing a difference with a counter
>> read. But can this situation actually happen?
> 
> No matter which CPU freq model is being used (right now, may
> change when the Intel P-state driver finally arrives), Dom0 is
> required to be executing already, so no issue here.
Cool.

> Since you didn't go into any detail on the specific example of log
> time stamps - am I to imply that they're completely unaffected by
> this (i.e. there's also no risk of them going backwards for a brief
> period of time)?
log timestamps uses wallclock_time() function which uses NOW() and follows the
first paragraph I explained. But I need to correct that statement as get_s_time
uses a previously seeded value from stime_local_stamp (which is set through
read_platform_time()). Time going backwards could happen only if TSC is behind
the currently-in-use clocksource. During that brief period it uses the values
taken from cpu_time but after I set the new clocksource it would go backwards as
I zeroed the previous stime/stamp snapshots in reset_platform_time. In this case
with I propose I don't have any other timestamps (platform_timer_stamp =
(stime_platform_stamp = 0)) to calculate the difference with, so it would go
backwards if the TSC falls behind. I could prevent this situation from happening
by having an offset which would be used until the new clocksource catches up to
the old one? I am also searching in the manual, if that can situation can happen.

Joao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 5/6] x86/time: refactor read_platform_stime()
  2016-04-05 15:26       ` Jan Beulich
@ 2016-04-05 17:08         ` Joao Martins
  0 siblings, 0 replies; 43+ messages in thread
From: Joao Martins @ 2016-04-05 17:08 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Keir Fraser, xen-devel



On 04/05/2016 04:26 PM, Jan Beulich wrote:
>>>> On 05.04.16 at 17:22, <joao.m.martins@oracle.com> wrote:
>> On 04/05/2016 12:52 PM, Jan Beulich wrote:
>>>>>> On 29.03.16 at 15:44, <joao.m.martins@oracle.com> wrote:
>>>> -static s_time_t read_platform_stime(void)
>>>> +static s_time_t read_platform_stime(u64 *stamp)
>>>>  {
>>>> -    u64 count;
>>>> +    u64 plt_stamp_counter, count;
>>>
>>> "stamp" and "counter" seem kind of redundant.
>>>
>> A bit, perhaps you prefer the latter? There was a variable named "count", so I
>> named "stamp" for clearer distinction between the variables and the output arg.
> 
> Between plt_stamp and plt_counter I'd indeed have a slight preference
> towards the latter.
OK, I will fix that.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 4/6] x86/time: streamline platform time init on plt_init()
  2016-04-05 15:22       ` Jan Beulich
@ 2016-04-05 17:17         ` Joao Martins
  0 siblings, 0 replies; 43+ messages in thread
From: Joao Martins @ 2016-04-05 17:17 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Keir Fraser, xen-devel

On 04/05/2016 04:22 PM, Jan Beulich wrote:
>>>> On 05.04.16 at 17:12, <joao.m.martins@oracle.com> wrote:
>> On 04/05/2016 12:46 PM, Jan Beulich wrote:
>>>>>> On 29.03.16 at 15:44, <joao.m.martins@oracle.com> wrote:
>>>> @@ -516,17 +519,31 @@ static s_time_t __read_platform_stime(u64 platform_time)
>>>>      return (stime_platform_stamp + scale_delta(diff, &plt_scale));
>>>>  }
>>>>  
>>>> +static void __plt_init(void)
>>>> +{
>>>> +    u64 count;
>>>> +
>>>> +    ASSERT(spin_is_locked(&platform_timer_lock));
>>>> +    count = plt_src.read_counter();
>>>> +    plt_stamp64 += (count - plt_stamp) & plt_mask;
>>>> +    plt_stamp = count;
>>>> +}
>>>
>>> Note that this has nothing to do with "init" - it updates the two time
>>> stamps, as is being made clear by ...
>>>
>>>>  static void plt_overflow(void *unused)
>>>>  {
>>>>      int i;
>>>> -    u64 count;
>>>>      s_time_t now, plt_now, plt_wrap;
>>>>  
>>>>      spin_lock_irq(&platform_timer_lock);
>>>>  
>>>> -    count = plt_src.read_counter();
>>>> -    plt_stamp64 += (count - plt_stamp) & plt_mask;
>>>> -    plt_stamp = count;
>>>> +    __plt_init();
>>>
>>> ... this use.
>>>
>> Would you prefer changing the name to e.g "set_plt_stamp" ?
> 
> Or simply plt_update()?
Sounds better indeed.

> 
>>>> +    {
>>>> +        plt_init();
>>>> +    }
>>>> +    else
>>>> +    {
>>>> +        plt_overflow_period = scale_delta(
>>>> +            1ull << (pts->counter_bits - 1), &plt_scale);
>>>> +        init_timer(&plt_overflow_timer, plt_overflow, NULL, 0);
>>>> +        plt_overflow(NULL);
>>>> +
>>>> +        printk("Platform timer overflow period is %lu secs\n",
>>>> +               plt_overflow_period/SECONDS(1));
>>>
>>> If we want this logged at all, then please at most as XENLOG_INFO.
>> OK.
>>
>>> Plus - is seconds granularity fine grained enough for all sources, i.e.
>>> wouldn't there for typical HPET just be a single digit, not a lot of
>>> precision that is?
>> Could be, my HPET was around 2 minutes overflow period, but PIT was a single
>> digit as you mention. I will change that to MILLISECS(1000) for higher 
>> precision
> 
> How is MILLISECS(1000) different from SECONDS(1)?
Sorry, It's not - I meant MILLISECS(1).

> 
>> - or I can remove it entirely if you prefer not logging this info.
> 
> Well, there had been times where this information would have been
> quite useful in diagnosing problems. That's been a while back, but
> knowing we had such issues I can't just say "drop the message",
> even if I hope we won't have any similar problems anymore.
I will keep it then - until further notice.

Joao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 6/6] x86/time: implement PVCLOCK_TSC_STABLE_BIT
  2016-04-05 12:22   ` Jan Beulich
@ 2016-04-05 21:34     ` Joao Martins
  2016-04-07 15:58       ` Jan Beulich
  0 siblings, 1 reply; 43+ messages in thread
From: Joao Martins @ 2016-04-05 21:34 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Keir Fraser, xen-devel

On 04/05/2016 01:22 PM, Jan Beulich wrote:
>>>> On 29.03.16 at 15:44, <joao.m.martins@oracle.com> wrote:
>> --- a/xen/arch/x86/time.c
>> +++ b/xen/arch/x86/time.c
>> @@ -43,6 +43,10 @@
>>  static char __initdata opt_clocksource[10];
>>  string_param("clocksource", opt_clocksource);
>>  
>> +/* opt_nocpuhotplug: Set if CPU hotplug isn't meant to be used */
>> +static bool_t __initdata opt_nocpuhotplug;
>> +boolean_param("nocpuhotplug", opt_nocpuhotplug);
> 
> If we were to have such a new option, it would need to be
> accompanied by an entry in the command line option doc.
Yes, Konrad pointed that out too - and I had it already documented
already for the next version. But given your argument below might
not even be needed to add this option.

> But
> I'm opposed to this: For one, the variable being static here
> means there is nothing that actually suppresses CPU hotplug
> to happen.
> And then I think this can, for all practical purposes,
> be had by suitably using existing command line options, namely
> "max_cpus=", such that set_nr_cpu_ids() won't allow for any
> further CPUs to get added. Albeit I admit that if someone was
> to bring down some CPU and then hotplug another one, we
> might still be in trouble. So maybe the better approach would
> be to fail onlining of CPUs that don't meet the criteria when
> "clocksource=tsc"?
True - max_cpus would produce the same effect. But I should point out
that even when clocksource=tsc the rendezvous would be std_rendezvous. So the
reference TSC is CPU 0 and tsc_timestamps are of the individual
CPUs. So perhaps the criteria would be for clocksource=tsc and use_tsc_stable_bit.


> 
>> @@ -435,6 +439,7 @@ uint64_t ns_to_acpi_pm_tick(uint64_t ns)
>>   * PLATFORM TIMER 4: TSC
>>   */
>>  static bool_t clocksource_is_tsc;
>> +static bool_t use_tsc_stable_bit;
> 
> __read_mostly?
Yeah - I will add it there.

> 
>> @@ -468,6 +473,11 @@ static int __init init_tsctimer(struct platform_timesource *pts)
>>  
>>      pts->frequency = tsc_freq;
>>      clocksource_is_tsc = tsc_reliable;
>> +    use_tsc_stable_bit = clocksource_is_tsc &&
>> +        ((nr_cpu_ids == num_present_cpus()) || opt_nocpuhotplug);
> 
> See my remark above regarding the reliability of this.
> 
>> @@ -1431,6 +1443,22 @@ static void time_calibration_std_rendezvous(void *_r)
>>      raise_softirq(TIME_CALIBRATE_SOFTIRQ);
>>  }
>>  
>> +/*
>> + * Rendezvous function used when clocksource is TSC and
>> + * no CPU hotplug will be performed.
>> + */
>> +static void time_calibration_nop_rendezvous(void *_r)
> 
> Even if done so in existing code - no new local variable or function
> parameters starting with an underscore please.
OK.

> 
>> +{
>> +    struct cpu_calibration *c = &this_cpu(cpu_calibration);
>> +    struct calibration_rendezvous *r = _r;
> 
> const
> 
>> +    c->local_tsc_stamp = r->master_tsc_stamp;
>> +    c->stime_local_stamp = get_s_time();
>> +    c->stime_master_stamp = r->master_stime;
>> +
>> +    raise_softirq(TIME_CALIBRATE_SOFTIRQ);
>> +}
> 
> Perhaps this whole function should move up ahead of the other
> two, so that they both can use this one instead of now duplicating
> the same code a 3rd time? Or maybe a new helper function would
> be better, to also account for the difference in what
> c->local_tsc_stamp gets set from (which could then become a
> parameter of that new function).
The refactoring you suggest sounds a good idea indeed as that
code is shared across all rendezvous - I will do so following
this guideline you advised.

> 
>> @@ -1440,6 +1468,13 @@ static void time_calibration(void *unused)
>>          .semaphore = ATOMIC_INIT(0)
>>      };
>>  
>> +    if ( use_tsc_stable_bit )
>> +    {
>> +        local_irq_disable();
>> +        r.master_stime = read_platform_stime(&r.master_tsc_stamp);
>> +        local_irq_enable();
>> +    }
> 
> So this can't be in time_calibration_nop_rendezvous() because
> you want to avoid the actual rendezvousing. But isn't the then
> possibly much larger gap between read_platform_stime() (which
> parallels the rdtsc()-s in the other two cases) and get_s_time()
> invocation going to become a problem?
Perhaps I am not not seeing the potential problem of this. The main
difference I see between both would be the base system time: read_platform_stime
uses stime_platform_stamp as base, and computes a difference from the
read_counter (i.e. rdtsc() ) with previously saved platform-wide stamp
(platform_timer_stamp). get_s_time uses the stime_local_stamp (updated from
stime_master_stamp on local_time_calibration) as base plus delta from rdtsc()
with local_tsc_stamp. And since this is now all TSC, and TSC monotonically
increase and is synchronized across CPUs, both calls would end up returning the
same or a always up-to-date value, whether cpu_time have a larger gap or not
from stime_platform_stamp. Unless the concern you are raising comes from the
fact CPU 0 calibrates much sooner than the last calibrated CPU, as opposed to
roughly at the same time with std_rendezvous?

Joao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 6/6] x86/time: implement PVCLOCK_TSC_STABLE_BIT
  2016-04-05 21:34     ` Joao Martins
@ 2016-04-07 15:58       ` Jan Beulich
  2016-04-07 21:17         ` Joao Martins
  0 siblings, 1 reply; 43+ messages in thread
From: Jan Beulich @ 2016-04-07 15:58 UTC (permalink / raw)
  To: Joao Martins; +Cc: Andrew Cooper, Keir Fraser, xen-devel

>>> On 05.04.16 at 23:34, <joao.m.martins@oracle.com> wrote:
> On 04/05/2016 01:22 PM, Jan Beulich wrote:
>>>>> On 29.03.16 at 15:44, <joao.m.martins@oracle.com> wrote:
>> But
>> I'm opposed to this: For one, the variable being static here
>> means there is nothing that actually suppresses CPU hotplug
>> to happen.
>> And then I think this can, for all practical purposes,
>> be had by suitably using existing command line options, namely
>> "max_cpus=", such that set_nr_cpu_ids() won't allow for any
>> further CPUs to get added. Albeit I admit that if someone was
>> to bring down some CPU and then hotplug another one, we
>> might still be in trouble. So maybe the better approach would
>> be to fail onlining of CPUs that don't meet the criteria when
>> "clocksource=tsc"?
> True - max_cpus would produce the same effect. But I should point out
> that even when clocksource=tsc the rendezvous would be std_rendezvous. So 
> the
> reference TSC is CPU 0 and tsc_timestamps are of the individual
> CPUs. So perhaps the criteria would be for clocksource=tsc and 
> use_tsc_stable_bit.

Oh, of course I didn't mean this to be the precise condition, just
an outline. Considering use_tsc_stable_bit certainly makes sense.

>>> @@ -1440,6 +1468,13 @@ static void time_calibration(void *unused)
>>>          .semaphore = ATOMIC_INIT(0)
>>>      };
>>>  
>>> +    if ( use_tsc_stable_bit )
>>> +    {
>>> +        local_irq_disable();
>>> +        r.master_stime = read_platform_stime(&r.master_tsc_stamp);
>>> +        local_irq_enable();
>>> +    }
>> 
>> So this can't be in time_calibration_nop_rendezvous() because
>> you want to avoid the actual rendezvousing. But isn't the then
>> possibly much larger gap between read_platform_stime() (which
>> parallels the rdtsc()-s in the other two cases) and get_s_time()
>> invocation going to become a problem?
> Perhaps I am not not seeing the potential problem of this.

I'm not sure there's a problem, I'm just asking because I've noticed
this behavioral difference.

> The main
> difference I see between both would be the base system time: 
> read_platform_stime
> uses stime_platform_stamp as base, and computes a difference from the
> read_counter (i.e. rdtsc() ) with previously saved platform-wide stamp
> (platform_timer_stamp). get_s_time uses the stime_local_stamp (updated from
> stime_master_stamp on local_time_calibration) as base plus delta from 
> rdtsc()
> with local_tsc_stamp. And since this is now all TSC, and TSC monotonically
> increase and is synchronized across CPUs, both calls would end up returning 
> the
> same or a always up-to-date value, whether cpu_time have a larger gap or not
> from stime_platform_stamp. Unless the concern you are raising comes from the
> fact CPU 0 calibrates much sooner than the last calibrated CPU, as opposed 
> to
> roughly at the same time with std_rendezvous?

In a way, yes. I'm concerned by the two time stamps no longer
being obtained at (almost) the same time. If that's not having
any bad consequences, the better.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 6/6] x86/time: implement PVCLOCK_TSC_STABLE_BIT
  2016-04-07 15:58       ` Jan Beulich
@ 2016-04-07 21:17         ` Joao Martins
  2016-04-07 21:32           ` Jan Beulich
  0 siblings, 1 reply; 43+ messages in thread
From: Joao Martins @ 2016-04-07 21:17 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Keir Fraser, xen-devel

>> The main
>> difference I see between both would be the base system time: 
>> read_platform_stime
>> uses stime_platform_stamp as base, and computes a difference from the
>> read_counter (i.e. rdtsc() ) with previously saved platform-wide stamp
>> (platform_timer_stamp). get_s_time uses the stime_local_stamp (updated from
>> stime_master_stamp on local_time_calibration) as base plus delta from 
>> rdtsc()
>> with local_tsc_stamp. And since this is now all TSC, and TSC monotonically
>> increase and is synchronized across CPUs, both calls would end up returning 
>> the
>> same or a always up-to-date value, whether cpu_time have a larger gap or not
>> from stime_platform_stamp. Unless the concern you are raising comes from the
>> fact CPU 0 calibrates much sooner than the last calibrated CPU, as opposed 
>> to
>> roughly at the same time with std_rendezvous?
> 
> In a way, yes. I'm concerned by the two time stamps no longer
> being obtained at (almost) the same time. If that's not having
> any bad consequences, the better.

I don't think there would be bad consequences as both timestamps correspond to
the same time reference - thus returning always the latest system time
irrespective of the gap between both stamps.

If you prefer I can go back with my initial approach (v1, with std_rendezvous)
to have both timestamps closely updated. And later (post-release?) revisit the
introduction of nop_rendezvous. Perhaps this way is more reasonable?

Joao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 6/6] x86/time: implement PVCLOCK_TSC_STABLE_BIT
  2016-04-07 21:17         ` Joao Martins
@ 2016-04-07 21:32           ` Jan Beulich
  0 siblings, 0 replies; 43+ messages in thread
From: Jan Beulich @ 2016-04-07 21:32 UTC (permalink / raw)
  To: Joao Martins; +Cc: Andrew Cooper, Keir Fraser, xen-devel

>>> On 07.04.16 at 23:17, <joao.m.martins@oracle.com> wrote:
>> > The main
>>> difference I see between both would be the base system time: 
>>> read_platform_stime
>>> uses stime_platform_stamp as base, and computes a difference from the
>>> read_counter (i.e. rdtsc() ) with previously saved platform-wide stamp
>>> (platform_timer_stamp). get_s_time uses the stime_local_stamp (updated from
>>> stime_master_stamp on local_time_calibration) as base plus delta from 
>>> rdtsc()
>>> with local_tsc_stamp. And since this is now all TSC, and TSC monotonically
>>> increase and is synchronized across CPUs, both calls would end up returning 
>>> the
>>> same or a always up-to-date value, whether cpu_time have a larger gap or not
>>> from stime_platform_stamp. Unless the concern you are raising comes from the
>>> fact CPU 0 calibrates much sooner than the last calibrated CPU, as opposed 
>>> to
>>> roughly at the same time with std_rendezvous?
>> 
>> In a way, yes. I'm concerned by the two time stamps no longer
>> being obtained at (almost) the same time. If that's not having
>> any bad consequences, the better.
> 
> I don't think there would be bad consequences as both timestamps correspond 
> to the same time reference - thus returning always the latest system time
> irrespective of the gap between both stamps.
> 
> If you prefer I can go back with my initial approach (v1, with std_rendezvous)
> to have both timestamps closely updated. And later (post-release?) revisit 
> the introduction of nop_rendezvous. Perhaps this way is more reasonable?

Since the new mode need to be actively asked for, I don't think
that's necessary.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2016-04-07 21:32 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-29 13:44 [PATCH v2 0/6] x86/time: PVCLOCK_TSC_STABLE_BIT support Joao Martins
2016-03-29 13:44 ` [PATCH v2 1/6] public/xen.h: add flags field to vcpu_time_info Joao Martins
2016-03-30 15:49   ` Ian Jackson
2016-03-30 16:33     ` Joao Martins
2016-03-31  7:09     ` Jan Beulich
2016-03-31  7:13   ` Jan Beulich
2016-03-31 11:04     ` Joao Martins
2016-04-05 10:16   ` Jan Beulich
2016-04-05 10:59     ` Joao Martins
2016-03-29 13:44 ` [PATCH v2 2/6] x86/time: refactor init_platform_time() Joao Martins
2016-04-01 16:10   ` Konrad Rzeszutek Wilk
2016-04-01 18:26     ` Joao Martins
2016-04-05 10:09   ` Jan Beulich
2016-04-05 10:55     ` Joao Martins
2016-04-05 11:16       ` Jan Beulich
2016-03-29 13:44 ` [PATCH v2 3/6] x86/time: implement tsc as clocksource Joao Martins
2016-03-29 17:39   ` Konrad Rzeszutek Wilk
2016-03-29 17:52     ` Joao Martins
2016-04-01 16:43   ` Konrad Rzeszutek Wilk
2016-04-01 18:38     ` Joao Martins
2016-04-01 18:45       ` Konrad Rzeszutek Wilk
2016-04-03 18:47         ` Joao Martins
2016-04-05 10:43   ` Jan Beulich
2016-04-05 14:56     ` Joao Martins
2016-04-05 15:12       ` Jan Beulich
2016-04-05 17:07         ` Joao Martins
2016-03-29 13:44 ` [PATCH v2 4/6] x86/time: streamline platform time init on plt_init() Joao Martins
2016-04-05 11:46   ` Jan Beulich
2016-04-05 15:12     ` Joao Martins
2016-04-05 15:22       ` Jan Beulich
2016-04-05 17:17         ` Joao Martins
2016-03-29 13:44 ` [PATCH v2 5/6] x86/time: refactor read_platform_stime() Joao Martins
2016-04-01 18:32   ` Konrad Rzeszutek Wilk
2016-04-05 11:52   ` Jan Beulich
2016-04-05 15:22     ` Joao Martins
2016-04-05 15:26       ` Jan Beulich
2016-04-05 17:08         ` Joao Martins
2016-03-29 13:44 ` [PATCH v2 6/6] x86/time: implement PVCLOCK_TSC_STABLE_BIT Joao Martins
2016-04-05 12:22   ` Jan Beulich
2016-04-05 21:34     ` Joao Martins
2016-04-07 15:58       ` Jan Beulich
2016-04-07 21:17         ` Joao Martins
2016-04-07 21:32           ` Jan Beulich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).