All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/5] x86/time: PVCLOCK_TSC_STABLE_BIT support
@ 2016-09-14 17:37 Joao Martins
  2016-09-14 17:37 ` [PATCH v4 1/5] x86/time: refactor init_platform_time() Joao Martins
                   ` (4 more replies)
  0 siblings, 5 replies; 25+ messages in thread
From: Joao Martins @ 2016-09-14 17:37 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Joao Martins, Jan Beulich

Hey,

This is v4 on the pvclock TSC series addressing comments from previous
version, see individual patches for complete changelog.

This series is divided as follows:

 R      * Patch 1: Small refactor around init_platform_time to reuse
                   initialization code when switching to TSC.
 U      * Patch 2: Adds a new clocksource based on TSC
 U      * Patch 3: Prerequisite for patch 4
 U      * Patch 4: Implements the PVCLOCK_TSC_STABLE_BIT
 N      * Patch 5: Extend "tsc" param to relax monotonicity restriction
                   across sockets.

 [ R := Reviewed-by ;; U := Updated ;; N := New ]

The benefit of this series is two-fold:
 
 1. Provide the guarantee of monotonic results on xen own system time as seen
 by any cpu when using TSC as clocksource.

 2. Provide this same guarantee to guests and thus set the
 TSC_STABLE_BIT (both FreeBSD and Linux support it) which then allows guests to
 skip expensive monotonicity check between PV CPU time infos. Plus, on Linux
 specifically this also means that it could support vDSO which greatly increases
 performance (x10) for gettimeofday and clock_gettime since it would no
 longer need to do the system call to get a reliable snapshot of system time.
 For a reference on my laptop the speed of gettimeofday under xen pvclock is 
 ~2 Mops/sec (Million ops per sec) whereas with vDSO it's on the range
 of ~22 Mops/sec on <= 4.4 kernels and ~37 Mops on >= 4.5.
 
 Doing a long running time warp test for 2 weeks on a dual-socket Haswell
 machine and I haven't yet seen time going backwards (plus tests on older
 multi socket machines). Furthermore double checked migration to/from hosts
 with/without the bit while guest was running the time warp test and no issues
 occurred too.

Thanks!

Joao Martins (5):
  x86/time: refactor init_platform_time()
  x86/time: implement tsc as clocksource
  x86/time: refactor read_platform_stime()
  x86/time: implement PVCLOCK_TSC_STABLE_BIT
  x86/time: extend "tsc" param with "stable:socket"

 docs/misc/xen-command-line.markdown |  10 +-
 xen/arch/x86/platform_hypercall.c   |   3 +-
 xen/arch/x86/time.c                 | 226 +++++++++++++++++++++++++++++++-----
 xen/include/asm-x86/time.h          |   1 +
 4 files changed, 206 insertions(+), 34 deletions(-)

-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v4 1/5] x86/time: refactor init_platform_time()
  2016-09-14 17:37 [PATCH v4 0/5] x86/time: PVCLOCK_TSC_STABLE_BIT support Joao Martins
@ 2016-09-14 17:37 ` Joao Martins
  2016-09-14 17:37 ` [PATCH v4 2/5] x86/time: implement tsc as clocksource Joao Martins
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 25+ messages in thread
From: Joao Martins @ 2016-09-14 17:37 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Joao Martins, Jan Beulich

And accomodate platform time source initialization in
try_platform_time(). This is a preparatory patch for deferring
TSC clocksource initialization to the stage where all CPUS are
up (verify_tsc_reliability init call).

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/time.c | 30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 73e0f98..0c1ad45 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -576,6 +576,24 @@ static void resume_platform_timer(void)
     plt_stamp = plt_src.read_counter();
 }
 
+static s64 __init try_platform_timer(struct platform_timesource *pts)
+{
+    s64 rc = pts->init(pts);
+
+    if ( rc <= 0 )
+        return rc;
+
+    plt_mask = (u64)~0ull >> (64 - pts->counter_bits);
+
+    set_time_scale(&plt_scale, pts->frequency);
+
+    plt_overflow_period = scale_delta(
+        1ull << (pts->counter_bits - 1), &plt_scale);
+    plt_src = *pts;
+
+    return rc;
+}
+
 static u64 __init init_platform_timer(void)
 {
     static struct platform_timesource * __initdata plt_timers[] = {
@@ -593,7 +611,7 @@ static u64 __init init_platform_timer(void)
             pts = plt_timers[i];
             if ( !strcmp(opt_clocksource, pts->id) )
             {
-                rc = pts->init(pts);
+                rc = try_platform_timer(pts);
                 break;
             }
         }
@@ -609,21 +627,13 @@ static u64 __init init_platform_timer(void)
         for ( i = 0; i < ARRAY_SIZE(plt_timers); i++ )
         {
             pts = plt_timers[i];
-            if ( (rc = pts->init(pts)) > 0 )
+            if ( (rc = try_platform_timer(pts)) > 0 )
                 break;
         }
     }
 
     BUG_ON(rc <= 0);
 
-    plt_mask = (u64)~0ull >> (64 - pts->counter_bits);
-
-    set_time_scale(&plt_scale, pts->frequency);
-
-    plt_overflow_period = scale_delta(
-        1ull << (pts->counter_bits-1), &plt_scale);
-    plt_src = *pts;
-
     printk("Platform timer is %s %s\n",
            freq_string(pts->frequency), pts->name);
 
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v4 2/5] x86/time: implement tsc as clocksource
  2016-09-14 17:37 [PATCH v4 0/5] x86/time: PVCLOCK_TSC_STABLE_BIT support Joao Martins
  2016-09-14 17:37 ` [PATCH v4 1/5] x86/time: refactor init_platform_time() Joao Martins
@ 2016-09-14 17:37 ` Joao Martins
  2016-09-19 10:13   ` Jan Beulich
  2016-09-14 17:37 ` [PATCH v4 3/5] x86/time: refactor read_platform_stime() Joao Martins
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 25+ messages in thread
From: Joao Martins @ 2016-09-14 17:37 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Joao Martins, Jan Beulich

Recent x86/time changes improved a lot of the monotonicity in xen
timekeeping, making it much harder to observe time going backwards.
Although platform timer can't be expected to be perfectly in sync with
TSC and so get_s_time won't be guaranteed to always return
monotonically increasing values across cpus.  This is the case in some
of the boxes I am testing with, observing sometimes ~100 warps (of
very few nanoseconds each) after a few hours.

This patch introduces support for using TSC as platform time source
which is the highest resolution time and most performant to get.
Though there are also several problems associated with its usage, and
there isn't a complete (and architecturally defined) guarantee that
all machines will provide reliable and monotonic TSC in all cases (I
believe Intel to be the only that can guarantee that?) For this reason
it's set with less priority when compared to HPET unless adminstrator
changes "clocksource" boot option to "tsc". Initializing TSC
clocksource requires all CPUs up to have the tsc reliability checks
performed. init_xen_time is called before all CPUs are up, so for
example we would start with HPET (or ACPI, PIT) at boot time, and
switch later to TSC. The switch then happens on verify_tsc_reliability
initcall that is invoked when all CPUs are up. When attempting to
initialize TSC we also check for time warps and if it has invariant
TSC. Note that while we deem reliable a CONSTANT_TSC with no deep
C-states, it might not always be the case, so we're conservative and
allow TSC to be used as platform timer only with invariant TSC.
Additionally we check if CPU Hotplug isn't meant to be performed on
the host which will either be when max vcpus and num_present_cpu are
the same. This is because a newly hotplugged CPU may not satisfy the
condition of having all TSCs synchronized - so when having tsc
clocksource being used we allow offlining CPUs but not onlining any
ones back. Finally we prevent TSC from being used as clocksource on
multiple sockets because it isn't guaranteed to be invariant. Further
relaxing of this last requirement is added in a separate patch, such
that we allow vendors with such guarantee to use TSC as clocksource.
In case any of these conditions is not met, we keep the clocksource
that was previously initialized on init_xen_time.

Since b64438c7c ("x86/time: use correct (local) time stamp in
constant-TSC calibration fast path") updates to cpu time use local
stamps, which means platform timer is only used to seed the initial
cpu time.  With clocksource=tsc there is no need to be in sync with
another clocksource, so we reseed the local/master stamps to be values
of TSC and update the platform time stamps accordingly. Time
calibration is set to 1sec after we switch to TSC, thus these stamps
are reseeded to also ensure monotonic returning values right after the
point we switch to TSC. This is also to avoid the possibility of
having inconsistent readings in this short period (i.e. until
calibration fires).

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

Changes since v3:
 - Really fix "HPET switching to TSC" comment. Despite mentioned in the
 in previous version, the change wasn't there.
 - Remove parenthesis around the function call in init_platform_timer
 - Merge if on verify_tsc_reliability with opt_clocksource check
 - Removed comment above ".init = init_tsctimer"
 - Fixup docs updated into this patch.
 - Move host_tsc_is_clocksource() and CPU hotplug possibility check to this
 patch.
 - s/host_tsc_is_clocksource/clocksource_is_tsc
 - Use bool instead of bool_t
 - Add a comment above init_tsctimer() declaration mentioning the
   reliable TSC checks on verify_tsc_reliability(), under which the
   function is invoked.
 - Prevent clocksource=tsc on platforms with multiple sockets. Further
 relaxing of this requirement is added in a separate patch, as
 extension of "tsc" boot parameter.
 - Removed control group to update cpu_time and do instead with
   on_selected_cpus to avoid any potential races.
 - Accomodate common path between init_xen_time and TSC switch into
 try_platform_timer_tail, such that finishing platform timer
 initialization is done in the same place (including platform timer
 overflow which was previously was removed in previous versions).
 - Changed TSC counter_bits 63 to avoid mishandling of TSC counter
 wrap-around in platform timer overflow timer.
 - Moved paragraph CPU Hotplug from last patch and add comment on
   commit message about multiple sockets TSC sync.
 - s/init_tsctimer/init_tsc/g to be consistent with other TSC platform
 timer functions.

Changes since v2:
 - Suggest "HPET switching to TSC" only as an example as otherwise it
 would be misleading on platforms not having one.
 - Change init_tsctimer to skip all the tests and assume it's called
 only on reliable TSC conditions and no warps observed. Tidy
 initialization on verify_tsc_reliability as suggested by Konrad.
 - CONSTANT_TSC and max_cstate <= 2 case removed and only allow tsc
   clocksource in invariant TSC boxes.
 - Prefer omit !=0 on init_platform_timer for tsc case.
 - Change comment on init_platform_timer.
 - Add comment on plt_tsc declaration.
 - Reinit CPU time for all online cpus instead of just CPU 0.
 - Use rdtsc_ordered() as opposed to rdtsc()
 - Remove tsc_freq variable and set plt_tsc clocksource frequency
 with the refined tsc calibration.
 - Rework a bit the commit message.

Changes since v1:
 - s/printk/printk(XENLOG_INFO
 - Remove extra space on inner brackets
 - Add missing space around brackets
 - Defer TSC initialization when all CPUs are up.

Changes since RFC:
 - Spelling fixes in the commit message.
 - Remove unused clocksource_is_tsc variable and introduce it instead
 on the patch that uses it.
 - Move plt_tsc from second to last in the available clocksources.
---
 docs/misc/xen-command-line.markdown |   6 +-
 xen/arch/x86/platform_hypercall.c   |   3 +-
 xen/arch/x86/time.c                 | 127 +++++++++++++++++++++++++++++++++---
 xen/include/asm-x86/time.h          |   1 +
 4 files changed, 125 insertions(+), 12 deletions(-)

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 3a250cb..f92fb3f 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -264,9 +264,13 @@ minimum of 32M, subject to a suitably aligned and sized contiguous
 region of memory being available.
 
 ### clocksource
-> `= pit | hpet | acpi`
+> `= pit | hpet | acpi | tsc`
 
 If set, override Xen's default choice for the platform timer.
+Having TSC as platform timer requires being explicitly set. This is because
+TSC can only be safely used if CPU hotplug isn't performed on the system. In
+some platforms, "maxcpus" parameter may require further adjustment to the
+number of online cpus.
 
 ### cmci-threshold
 > `= <integer>`
diff --git a/xen/arch/x86/platform_hypercall.c b/xen/arch/x86/platform_hypercall.c
index 780f22d..0879e19 100644
--- a/xen/arch/x86/platform_hypercall.c
+++ b/xen/arch/x86/platform_hypercall.c
@@ -631,7 +631,8 @@ ret_t do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op)
         if ( ret )
             break;
 
-        if ( cpu >= nr_cpu_ids || !cpu_present(cpu) )
+        if ( cpu >= nr_cpu_ids || !cpu_present(cpu) ||
+             clocksource_is_tsc() )
         {
             ret = -EINVAL;
             break;
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 0c1ad45..e5001d5 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -475,6 +475,50 @@ uint64_t ns_to_acpi_pm_tick(uint64_t ns)
 }
 
 /************************************************************
+ * PLATFORM TIMER 4: TSC
+ */
+
+/*
+ * Called in verify_tsc_reliability() under reliable TSC conditions
+ * thus reusing all the checks already performed there.
+ */
+static s64 __init init_tsc(struct platform_timesource *pts)
+{
+    u64 ret = pts->frequency;
+
+    if ( nr_cpu_ids != num_present_cpus() )
+    {
+        printk(XENLOG_INFO "TSC: CPU Hotplug intended\n");
+        ret = 0;
+    }
+
+    if ( nr_sockets > 1 )
+    {
+        printk(XENLOG_INFO "TSC: Not invariant across sockets\n");
+        ret = 0;
+    }
+
+    if ( !ret )
+        printk(XENLOG_INFO "TSC: Not setting it as clocksource\n");
+
+    return ret;
+}
+
+static u64 read_tsc(void)
+{
+    return rdtsc_ordered();
+}
+
+static struct platform_timesource __initdata plt_tsc =
+{
+    .id = "tsc",
+    .name = "TSC",
+    .read_counter = read_tsc,
+    .counter_bits = 63,
+    .init = init_tsc,
+};
+
+/************************************************************
  * GENERIC PLATFORM TIMER INFRASTRUCTURE
  */
 
@@ -576,6 +620,21 @@ static void resume_platform_timer(void)
     plt_stamp = plt_src.read_counter();
 }
 
+static void __init reset_platform_timer(void)
+{
+    /* Deactivate any timers running */
+    kill_timer(&plt_overflow_timer);
+    kill_timer(&calibration_timer);
+
+    /* Reset counters and stamps */
+    spin_lock_irq(&platform_timer_lock);
+    plt_stamp = 0;
+    plt_stamp64 = 0;
+    platform_timer_stamp = 0;
+    stime_platform_stamp = 0;
+    spin_unlock_irq(&platform_timer_lock);
+}
+
 static s64 __init try_platform_timer(struct platform_timesource *pts)
 {
     s64 rc = pts->init(pts);
@@ -583,6 +642,10 @@ static s64 __init try_platform_timer(struct platform_timesource *pts)
     if ( rc <= 0 )
         return rc;
 
+    /* We have a platform timesource already so reset it */
+    if ( plt_src.counter_bits != 0 )
+        reset_platform_timer();
+
     plt_mask = (u64)~0ull >> (64 - pts->counter_bits);
 
     set_time_scale(&plt_scale, pts->frequency);
@@ -604,7 +667,9 @@ static u64 __init init_platform_timer(void)
     unsigned int i;
     s64 rc = -1;
 
-    if ( opt_clocksource[0] != '\0' )
+    /* clocksource=tsc is initialized via __initcalls (when CPUs are up). */
+    if ( (opt_clocksource[0] != '\0') &&
+         strcmp(opt_clocksource, "tsc") )
     {
         for ( i = 0; i < ARRAY_SIZE(plt_timers); i++ )
         {
@@ -1463,6 +1528,31 @@ static void __init tsc_check_writability(void)
     disable_tsc_sync = 1;
 }
 
+static void __init reset_percpu_time(void *unused)
+{
+    struct cpu_time *t = &this_cpu(cpu_time);
+
+    t->stamp.local_tsc = boot_tsc_stamp;
+    t->stamp.local_stime = 0;
+    t->stamp.local_stime = get_s_time_fixed(boot_tsc_stamp);
+    t->stamp.master_stime = t->stamp.local_stime;
+}
+
+static void __init try_platform_timer_tail(void)
+{
+    init_timer(&plt_overflow_timer, plt_overflow, NULL, 0);
+    plt_overflow(NULL);
+
+    platform_timer_stamp = plt_stamp64;
+    stime_platform_stamp = NOW();
+
+    if ( !clocksource_is_tsc() )
+        init_percpu_time();
+
+    init_timer(&calibration_timer, time_calibration, NULL, 0);
+    set_timer(&calibration_timer, NOW() + EPOCH);
+}
+
 /* Late init function, after all cpus have booted */
 static int __init verify_tsc_reliability(void)
 {
@@ -1480,6 +1570,25 @@ static int __init verify_tsc_reliability(void)
             printk("TSC warp detected, disabling TSC_RELIABLE\n");
             setup_clear_cpu_cap(X86_FEATURE_TSC_RELIABLE);
         }
+        else if ( !strcmp(opt_clocksource, "tsc") &&
+                  (try_platform_timer(&plt_tsc) > 0) )
+        {
+            /*
+             * Platform timer has changed and CPU time will only be updated
+             * after we set again the calibration timer, which means we need to
+             * seed again each local CPU time. At this stage TSC is known to be
+             * reliable i.e. monotonically increasing across all CPUs so this
+             * lets us remove the skew between platform timer and TSC, since
+             * these are now effectively the same.
+             */
+            on_selected_cpus(&cpu_online_map, reset_percpu_time, NULL, 1);
+
+            /* Finish platform timer switch. */
+            try_platform_timer_tail();
+
+            printk(XENLOG_INFO "Switched to Platform timer %s TSC\n",
+                   freq_string(plt_src.frequency));
+        }
     }
 
     return 0;
@@ -1505,15 +1614,7 @@ int __init init_xen_time(void)
     do_settime(get_cmos_time(), 0, NOW());
 
     /* Finish platform timer initialization. */
-    init_timer(&plt_overflow_timer, plt_overflow, NULL, 0);
-    plt_overflow(NULL);
-    platform_timer_stamp = plt_stamp64;
-    stime_platform_stamp = NOW();
-
-    init_percpu_time();
-
-    init_timer(&calibration_timer, time_calibration, NULL, 0);
-    set_timer(&calibration_timer, NOW() + EPOCH);
+    try_platform_timer_tail();
 
     return 0;
 }
@@ -1527,6 +1628,7 @@ void __init early_time_init(void)
 
     preinit_pit();
     tmp = init_platform_timer();
+    plt_tsc.frequency = tmp;
 
     set_time_scale(&t->tsc_scale, tmp);
     t->stamp.local_tsc = boot_tsc_stamp;
@@ -1775,6 +1877,11 @@ void pv_soft_rdtsc(struct vcpu *v, struct cpu_user_regs *regs, int rdtscp)
              (d->arch.tsc_mode == TSC_MODE_PVRDTSCP) ? d->arch.incarnation : 0;
 }
 
+bool clocksource_is_tsc(void)
+{
+    return plt_src.read_counter == read_tsc;
+}
+
 int host_tsc_is_safe(void)
 {
     return boot_cpu_has(X86_FEATURE_TSC_RELIABLE);
diff --git a/xen/include/asm-x86/time.h b/xen/include/asm-x86/time.h
index 971883a..6d704b4 100644
--- a/xen/include/asm-x86/time.h
+++ b/xen/include/asm-x86/time.h
@@ -69,6 +69,7 @@ void tsc_get_info(struct domain *d, uint32_t *tsc_mode, uint64_t *elapsed_nsec,
 
 void force_update_vcpu_system_time(struct vcpu *v);
 
+bool clocksource_is_tsc(void);
 int host_tsc_is_safe(void);
 void cpuid_time_leaf(uint32_t sub_idx, uint32_t *eax, uint32_t *ebx,
                      uint32_t *ecx, uint32_t *edx);
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v4 3/5] x86/time: refactor read_platform_stime()
  2016-09-14 17:37 [PATCH v4 0/5] x86/time: PVCLOCK_TSC_STABLE_BIT support Joao Martins
  2016-09-14 17:37 ` [PATCH v4 1/5] x86/time: refactor init_platform_time() Joao Martins
  2016-09-14 17:37 ` [PATCH v4 2/5] x86/time: implement tsc as clocksource Joao Martins
@ 2016-09-14 17:37 ` Joao Martins
  2016-09-19 10:15   ` Jan Beulich
  2016-09-14 17:37 ` [PATCH v4 4/5] x86/time: implement PVCLOCK_TSC_STABLE_BIT Joao Martins
  2016-09-14 17:37 ` [PATCH v4 5/5] x86/time: extend "tsc" param with "stable:socket" Joao Martins
  4 siblings, 1 reply; 25+ messages in thread
From: Joao Martins @ 2016-09-14 17:37 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Joao Martins, Jan Beulich

To allow the caller to fetch the last read from the clocksource which
was used to calculate system_time. This is a prerequisite for a
subsequent patch that will use this last read.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

Changes since v3:
 - Add mention of this being a prerequisite to a later patch.
---
 xen/arch/x86/time.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index e5001d5..af9e31f 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -581,18 +581,22 @@ static void plt_overflow(void *unused)
     set_timer(&plt_overflow_timer, NOW() + plt_overflow_period);
 }
 
-static s_time_t read_platform_stime(void)
+static s_time_t read_platform_stime(u64 *stamp)
 {
-    u64 count;
+    u64 plt_counter, count;
     s_time_t stime;
 
     ASSERT(!local_irq_is_enabled());
 
     spin_lock(&platform_timer_lock);
-    count = plt_stamp64 + ((plt_src.read_counter() - plt_stamp) & plt_mask);
+    plt_counter = plt_src.read_counter();
+    count = plt_stamp64 + ((plt_counter - plt_stamp) & plt_mask);
     stime = __read_platform_stime(count);
     spin_unlock(&platform_timer_lock);
 
+    if ( stamp )
+        *stamp = plt_counter;
+
     return stime;
 }
 
@@ -726,7 +730,7 @@ void cstate_restore_tsc(void)
     if ( boot_cpu_has(X86_FEATURE_NONSTOP_TSC) )
         return;
 
-    write_tsc(stime2tsc(read_platform_stime()));
+    write_tsc(stime2tsc(read_platform_stime(NULL)));
 }
 
 /***************************************************************************
@@ -1045,7 +1049,7 @@ int cpu_frequency_change(u64 freq)
 
     local_irq_disable();
     /* Platform time /first/, as we may be delayed by platform_timer_lock. */
-    t->stamp.master_stime = read_platform_stime();
+    t->stamp.master_stime = read_platform_stime(NULL);
     curr_tsc = rdtsc_ordered();
     /* TSC-extrapolated time may be bogus after frequency change. */
     /*t->stamp.local_stime = get_s_time_fixed(curr_tsc);*/
@@ -1350,7 +1354,7 @@ static void time_calibration_tsc_rendezvous(void *_r)
 
             if ( r->master_stime == 0 )
             {
-                r->master_stime = read_platform_stime();
+                r->master_stime = read_platform_stime(NULL);
                 r->master_tsc_stamp = rdtsc_ordered();
             }
             atomic_inc(&r->semaphore);
@@ -1390,7 +1394,7 @@ static void time_calibration_std_rendezvous(void *_r)
     {
         while ( atomic_read(&r->semaphore) != (total_cpus - 1) )
             cpu_relax();
-        r->master_stime = read_platform_stime();
+        r->master_stime = read_platform_stime(NULL);
         smp_wmb(); /* write r->master_stime /then/ signal */
         atomic_inc(&r->semaphore);
     }
@@ -1429,7 +1433,7 @@ void time_latch_stamps(void)
     unsigned long flags;
 
     local_irq_save(flags);
-    ap_bringup_ref.master_stime = read_platform_stime();
+    ap_bringup_ref.master_stime = read_platform_stime(NULL);
     ap_bringup_ref.local_tsc = rdtsc_ordered();
     local_irq_restore(flags);
 
@@ -1447,7 +1451,7 @@ void init_percpu_time(void)
     t->tsc_scale = per_cpu(cpu_time, 0).tsc_scale;
 
     local_irq_save(flags);
-    now = read_platform_stime();
+    now = read_platform_stime(NULL);
     tsc = rdtsc_ordered();
     local_irq_restore(flags);
 
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v4 4/5] x86/time: implement PVCLOCK_TSC_STABLE_BIT
  2016-09-14 17:37 [PATCH v4 0/5] x86/time: PVCLOCK_TSC_STABLE_BIT support Joao Martins
                   ` (2 preceding siblings ...)
  2016-09-14 17:37 ` [PATCH v4 3/5] x86/time: refactor read_platform_stime() Joao Martins
@ 2016-09-14 17:37 ` Joao Martins
  2016-09-19 10:22   ` Jan Beulich
  2016-09-14 17:37 ` [PATCH v4 5/5] x86/time: extend "tsc" param with "stable:socket" Joao Martins
  4 siblings, 1 reply; 25+ messages in thread
From: Joao Martins @ 2016-09-14 17:37 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Joao Martins, Jan Beulich

This patch proposes relying on host TSC synchronization and
passthrough to the guest, when running on a TSC-safe platform. On
time_calibration we retrieve the platform time in ns and the counter
read by the clocksource that was used to compute system time. We
introduce a new rendezous function which doesn't require
synchronization between master and slave CPUS and just reads
calibration_rendezvous struct and writes it down the stime and stamp
to the cpu_calibration struct to be used later on. We can guarantee
that on a platform with a constant and reliable TSC, that the time
read on vcpu B right after A is bigger independently of the VCPU
calibration error. Since pvclock time infos are monotonic as seen by
any vCPU set PVCLOCK_TSC_STABLE_BIT, which then enables usage of VDSO
on Linux.  IIUC, this is similar to how it's implemented on KVM. Add
also a comment regarding this bit changing and that guests are
expected to check this bit on every read.

Should note that I've yet to see time going backwards in a long running
test for 2 weeks (in a dual socket machine), plus few other tests I did
on older platforms, including migration.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

Changes since v3:
 - Do not adjust time_calibration_rendezvous_tail for nop_rendezvous and
 instead set cpu_time_stamp directly on the rendezvous function.
 - Move CPU Hotplug checks into patch 2
 - Add a commit and code comment regarding guests cope with this bit
   changing on hosts.
 - s/host_tsc_is_clocksource/clocksource_is_tsc

Changes since v2:
 - Add XEN_ prefix to pvclock flags.
 - Adapter time_calibration_rendezvous_tail to have the case of setting master
 tsc/stime and use it for the nop_rendezvous.
 - Removed hotplug CPU option that was added in v1
 - Prevent online of CPUs when clocksource is tsc.
 - Remove use_tsc_stable_bit, since clocksource is only used to seed
 values. So instead we test if hotplug is possible, and prevent clocksource=tsc
 to be used.
 - Remove 1st paragrah of commit message since the behaviour described
   no longer applies since b64438c.

Changes since v1:
 - Change approach to skip std_rendezvous by introducing a
   nop_rendezvous
 - Change commit message reflecting the change above.
 - Use TSC_STABLE_BIT only if cpu hotplug isn't possible.
 - Add command line option to override it if no cpu hotplug is
 intended.
---
 xen/arch/x86/time.c | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index af9e31f..0c1badc 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -951,6 +951,14 @@ static void __update_vcpu_system_time(struct vcpu *v, int force)
     _u.tsc_timestamp = tsc_stamp;
     _u.system_time   = t->stamp.local_stime;
 
+    /*
+     * It's expected that domains cope with this bit changing on every
+     * pvclock read to check whether they can resort solely on this tuple
+     * or if it further requires monotonicity checks with other vcpus.
+     */
+    if ( clocksource_is_tsc() )
+        _u.flags    |= XEN_PVCLOCK_TSC_STABLE_BIT;
+
     if ( is_hvm_domain(d) )
         _u.tsc_timestamp += v->arch.hvm_vcpu.cache_tsc_offset;
 
@@ -1409,6 +1417,22 @@ static void time_calibration_std_rendezvous(void *_r)
     time_calibration_rendezvous_tail(r);
 }
 
+/*
+ * Rendezvous function used when clocksource is TSC and
+ * no CPU hotplug will be performed.
+ */
+static void time_calibration_nop_rendezvous(void *rv)
+{
+    const struct calibration_rendezvous *r = rv;
+    struct cpu_time_stamp *c = &this_cpu(cpu_calibration);
+
+    c->local_tsc    = r->master_tsc_stamp;
+    c->local_stime  = r->master_stime;
+    c->master_stime = r->master_stime;
+
+    raise_softirq(TIME_CALIBRATE_SOFTIRQ);
+}
+
 static void (*time_calibration_rendezvous_fn)(void *) =
     time_calibration_std_rendezvous;
 
@@ -1418,6 +1442,13 @@ static void time_calibration(void *unused)
         .semaphore = ATOMIC_INIT(0)
     };
 
+    if ( clocksource_is_tsc() )
+    {
+        local_irq_disable();
+        r.master_stime = read_platform_stime(&r.master_tsc_stamp);
+        local_irq_enable();
+    }
+
     cpumask_copy(&r.cpu_calibration_map, &cpu_online_map);
 
     /* @wait=1 because we must wait for all cpus before freeing @r. */
@@ -1587,6 +1618,13 @@ static int __init verify_tsc_reliability(void)
              */
             on_selected_cpus(&cpu_online_map, reset_percpu_time, NULL, 1);
 
+            /*
+             * We won't do CPU Hotplug and TSC clocksource is being used which
+             * means we have a reliable TSC, plus we don't sync with any other
+             * clocksource so no need for rendezvous.
+             */
+            time_calibration_rendezvous_fn = time_calibration_nop_rendezvous;
+
             /* Finish platform timer switch. */
             try_platform_timer_tail();
 
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v4 5/5] x86/time: extend "tsc" param with "stable:socket"
  2016-09-14 17:37 [PATCH v4 0/5] x86/time: PVCLOCK_TSC_STABLE_BIT support Joao Martins
                   ` (3 preceding siblings ...)
  2016-09-14 17:37 ` [PATCH v4 4/5] x86/time: implement PVCLOCK_TSC_STABLE_BIT Joao Martins
@ 2016-09-14 17:37 ` Joao Martins
  2016-09-19 10:29   ` Jan Beulich
  4 siblings, 1 reply; 25+ messages in thread
From: Joao Martins @ 2016-09-14 17:37 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Joao Martins, Jan Beulich

Extend the "tsc" boot parameter is to further relax TSC restrictions and
allow it to be used on machines that guarantee reliable TSC across
sockets. This is up to board manufacturers and there's no way for the OS
to probe this property, therefore user needs to explicitly set this option.

Also make one style adjustment that is to remove the unnecessary
parenthesis around clearing TSC_RELIABLE.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

Alternatively to having tsc_flags, I could instead introduce a new
X86_FEATURE in the Xen defined mapping. This would mean
introducing a "set_caps" similar to "cleared_caps" in order to set the
feature bit after x86_capability get's zeroed out in common
identify_cpu(). It was unclear to me what would maintainers prefer so
I went for the simplest for starters. If you prefer the other way
I can redo it.

NB: Didn't add "stable:node" (and consequently tsc=stable as it would
refer to both) because I don't have a host with multiple nodes that
I can test with.
---
 docs/misc/xen-command-line.markdown |  6 ++++--
 xen/arch/x86/time.c                 | 11 ++++++++---
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index f92fb3f..7161788 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -270,7 +270,9 @@ If set, override Xen's default choice for the platform timer.
 Having TSC as platform timer requires being explicitly set. This is because
 TSC can only be safely used if CPU hotplug isn't performed on the system. In
 some platforms, "maxcpus" parameter may require further adjustment to the
-number of online cpus.
+number of online cpus. When running under platforms that can guarantee a
+monotonic TSC across sockets you require adjusting "tsc" command line parameter
+parameter to "stable:sockets".
 
 ### cmci-threshold
 > `= <integer>`
@@ -1508,7 +1510,7 @@ pages) must also be specified via the tbuf\_size parameter.
 > `= <integer>`
 
 ### tsc
-> `= unstable | skewed`
+> `= unstable | skewed | stable:socket`
 
 ### ucode
 > `= [<integer> | scan]`
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 0c1badc..c1255db 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -477,6 +477,10 @@ uint64_t ns_to_acpi_pm_tick(uint64_t ns)
 /************************************************************
  * PLATFORM TIMER 4: TSC
  */
+static unsigned int __read_mostly tsc_flags;
+
+/* TSC is reliable across sockets */
+#define TSC_RELIABLE_SOCKET (1 << 0)
 
 /*
  * Called in verify_tsc_reliability() under reliable TSC conditions
@@ -492,7 +496,7 @@ static s64 __init init_tsc(struct platform_timesource *pts)
         ret = 0;
     }
 
-    if ( nr_sockets > 1 )
+    if ( nr_sockets > 1 && !(tsc_flags & TSC_RELIABLE_SOCKET) )
     {
         printk(XENLOG_INFO "TSC: Not invariant across sockets\n");
         ret = 0;
@@ -1851,6 +1855,7 @@ int hwdom_pit_access(struct ioreq *ioreq)
 /*
  * tsc=unstable: Override all tests; assume TSC is unreliable.
  * tsc=skewed: Assume TSCs are individually reliable, but skewed across CPUs.
+ * tsc=stable:socket: Assume TSCs are reliable across sockets.
  */
 static void __init tsc_parse(const char *s)
 {
@@ -1861,9 +1866,9 @@ static void __init tsc_parse(const char *s)
         setup_clear_cpu_cap(X86_FEATURE_TSC_RELIABLE);
     }
     else if ( !strcmp(s, "skewed") )
-    {
         setup_clear_cpu_cap(X86_FEATURE_TSC_RELIABLE);
-    }
+    else if ( !strcmp(s, "stable:socket") )
+        tsc_flags |= TSC_RELIABLE_SOCKET;
 }
 custom_param("tsc", tsc_parse);
 
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 2/5] x86/time: implement tsc as clocksource
  2016-09-14 17:37 ` [PATCH v4 2/5] x86/time: implement tsc as clocksource Joao Martins
@ 2016-09-19 10:13   ` Jan Beulich
  2016-09-19 16:11     ` Joao Martins
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2016-09-19 10:13 UTC (permalink / raw)
  To: Joao Martins; +Cc: Andrew Cooper, xen-devel

>>> On 14.09.16 at 19:37, <joao.m.martins@oracle.com> wrote:
> This patch introduces support for using TSC as platform time source
> which is the highest resolution time and most performant to get.
> Though there are also several problems associated with its usage, and
> there isn't a complete (and architecturally defined) guarantee that
> all machines will provide reliable and monotonic TSC in all cases (I
> believe Intel to be the only that can guarantee that?) For this reason
> it's set with less priority when compared to HPET unless adminstrator
> changes "clocksource" boot option to "tsc".

In the following sentence you removed the exclusive mentioning
of HPET, but above you don't. Furthermore I don't think this
sentence is in line with what the patch does: There's no priority
given to it, and it won't be used at all when not requested on the
command line.

> Initializing TSC
> clocksource requires all CPUs up to have the tsc reliability checks
> performed. init_xen_time is called before all CPUs are up, so for
> example we would start with HPET (or ACPI, PIT) at boot time, and
> switch later to TSC. The switch then happens on verify_tsc_reliability
> initcall that is invoked when all CPUs are up. When attempting to
> initialize TSC we also check for time warps and if it has invariant
> TSC. Note that while we deem reliable a CONSTANT_TSC with no deep
> C-states, it might not always be the case, so we're conservative and
> allow TSC to be used as platform timer only with invariant TSC.
> Additionally we check if CPU Hotplug isn't meant to be performed on
> the host which will either be when max vcpus and num_present_cpu are
> the same. This is because a newly hotplugged CPU may not satisfy the
> condition of having all TSCs synchronized - so when having tsc
> clocksource being used we allow offlining CPUs but not onlining any
> ones back. Finally we prevent TSC from being used as clocksource on
> multiple sockets because it isn't guaranteed to be invariant. Further
> relaxing of this last requirement is added in a separate patch, such
> that we allow vendors with such guarantee to use TSC as clocksource.
> In case any of these conditions is not met, we keep the clocksource
> that was previously initialized on init_xen_time.
> 
> Since b64438c7c ("x86/time: use correct (local) time stamp in
> constant-TSC calibration fast path") updates to cpu time use local
> stamps, which means platform timer is only used to seed the initial
> cpu time.  With clocksource=tsc there is no need to be in sync with
> another clocksource, so we reseed the local/master stamps to be values
> of TSC and update the platform time stamps accordingly. Time
> calibration is set to 1sec after we switch to TSC, thus these stamps
> are reseeded to also ensure monotonic returning values right after the
> point we switch to TSC. This is also to avoid the possibility of
> having inconsistent readings in this short period (i.e. until
> calibration fires).

And within this one second, which may cover some of Dom0's
booting up, it is okay to have inconsistencies?

> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -475,6 +475,50 @@ uint64_t ns_to_acpi_pm_tick(uint64_t ns)
>  }
>  
>  /************************************************************
> + * PLATFORM TIMER 4: TSC
> + */
> +
> +/*
> + * Called in verify_tsc_reliability() under reliable TSC conditions
> + * thus reusing all the checks already performed there.
> + */
> +static s64 __init init_tsc(struct platform_timesource *pts)
> +{
> +    u64 ret = pts->frequency;
> +
> +    if ( nr_cpu_ids != num_present_cpus() )
> +    {
> +        printk(XENLOG_INFO "TSC: CPU Hotplug intended\n");
> +        ret = 0;
> +    }
> +
> +    if ( nr_sockets > 1 )
> +    {
> +        printk(XENLOG_INFO "TSC: Not invariant across sockets\n");
> +        ret = 0;
> +    }
> +
> +    if ( !ret )
> +        printk(XENLOG_INFO "TSC: Not setting it as clocksource\n");

I think this last message is redundant with the former two. But since
I also think that info level is too low for those earlier ones, perhaps
keeping the latter (at info or even debug level) would be okay, once
the other got bumped to warning level.

> +static struct platform_timesource __initdata plt_tsc =
> +{
> +    .id = "tsc",
> +    .name = "TSC",
> +    .read_counter = read_tsc,
> +    .counter_bits = 63,

Please add a brief comment explaining why this is not 64.

> @@ -604,7 +667,9 @@ static u64 __init init_platform_timer(void)
>      unsigned int i;
>      s64 rc = -1;
>  
> -    if ( opt_clocksource[0] != '\0' )
> +    /* clocksource=tsc is initialized via __initcalls (when CPUs are up). */
> +    if ( (opt_clocksource[0] != '\0') &&
> +         strcmp(opt_clocksource, "tsc") )

No real need to split this if() across two lines.

> +static void __init try_platform_timer_tail(void)
> +{
> +    init_timer(&plt_overflow_timer, plt_overflow, NULL, 0);
> +    plt_overflow(NULL);
> +
> +    platform_timer_stamp = plt_stamp64;
> +    stime_platform_stamp = NOW();
> +
> +    if ( !clocksource_is_tsc() )
> +        init_percpu_time();

This isn't really dependent on whether TSC is used as clocksource,
but solely on the point in time at which the call gets made, is it? If
so, I think an explicit boolean function parameter (named e.g. "late")
would be better than abusing the predicate here.

> @@ -1480,6 +1570,25 @@ static int __init verify_tsc_reliability(void)
>              printk("TSC warp detected, disabling TSC_RELIABLE\n");
>              setup_clear_cpu_cap(X86_FEATURE_TSC_RELIABLE);
>          }
> +        else if ( !strcmp(opt_clocksource, "tsc") &&
> +                  (try_platform_timer(&plt_tsc) > 0) )
> +        {
> +            /*
> +             * Platform timer has changed and CPU time will only be updated
> +             * after we set again the calibration timer, which means we need to
> +             * seed again each local CPU time. At this stage TSC is known to be
> +             * reliable i.e. monotonically increasing across all CPUs so this
> +             * lets us remove the skew between platform timer and TSC, since
> +             * these are now effectively the same.
> +             */
> +            on_selected_cpus(&cpu_online_map, reset_percpu_time, NULL, 1);
> +
> +            /* Finish platform timer switch. */
> +            try_platform_timer_tail();
> +
> +            printk(XENLOG_INFO "Switched to Platform timer %s TSC\n",
> +                   freq_string(plt_src.frequency));

This message should have the same log level as the one at the end
of init_platform_timer().

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 3/5] x86/time: refactor read_platform_stime()
  2016-09-14 17:37 ` [PATCH v4 3/5] x86/time: refactor read_platform_stime() Joao Martins
@ 2016-09-19 10:15   ` Jan Beulich
  2016-09-19 16:11     ` Joao Martins
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2016-09-19 10:15 UTC (permalink / raw)
  To: Joao Martins; +Cc: Andrew Cooper, xen-devel

>>> On 14.09.16 at 19:37, <joao.m.martins@oracle.com> wrote:
> To allow the caller to fetch the last read from the clocksource which
> was used to calculate system_time. This is a prerequisite for a
> subsequent patch that will use this last read.
> 
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>

Acked-by: Jan Beulich <jbeulich@suse.com>
with one further minor request:

> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -581,18 +581,22 @@ static void plt_overflow(void *unused)
>      set_timer(&plt_overflow_timer, NOW() + plt_overflow_period);
>  }
>  
> -static s_time_t read_platform_stime(void)
> +static s_time_t read_platform_stime(u64 *stamp)
>  {
> -    u64 count;
> +    u64 plt_counter, count;
>      s_time_t stime;
>  
>      ASSERT(!local_irq_is_enabled());
>  
>      spin_lock(&platform_timer_lock);
> -    count = plt_stamp64 + ((plt_src.read_counter() - plt_stamp) & plt_mask);
> +    plt_counter = plt_src.read_counter();
> +    count = plt_stamp64 + ((plt_counter - plt_stamp) & plt_mask);
>      stime = __read_platform_stime(count);
>      spin_unlock(&platform_timer_lock);
>  
> +    if ( stamp )
> +        *stamp = plt_counter;

Considering that all current callers pass in NULL and you mean to
add only one (iirc) which doesn't, please add unlikely() here.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 4/5] x86/time: implement PVCLOCK_TSC_STABLE_BIT
  2016-09-14 17:37 ` [PATCH v4 4/5] x86/time: implement PVCLOCK_TSC_STABLE_BIT Joao Martins
@ 2016-09-19 10:22   ` Jan Beulich
  2016-09-19 16:11     ` Joao Martins
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2016-09-19 10:22 UTC (permalink / raw)
  To: Joao Martins; +Cc: Andrew Cooper, xen-devel

>>> On 14.09.16 at 19:37, <joao.m.martins@oracle.com> wrote:
> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -951,6 +951,14 @@ static void __update_vcpu_system_time(struct vcpu *v, int force)
>      _u.tsc_timestamp = tsc_stamp;
>      _u.system_time   = t->stamp.local_stime;
>  
> +    /*
> +     * It's expected that domains cope with this bit changing on every
> +     * pvclock read to check whether they can resort solely on this tuple
> +     * or if it further requires monotonicity checks with other vcpus.
> +     */
> +    if ( clocksource_is_tsc() )
> +        _u.flags    |= XEN_PVCLOCK_TSC_STABLE_BIT;

Stray blanks ahead of the |=.

With that taken care of
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 5/5] x86/time: extend "tsc" param with "stable:socket"
  2016-09-14 17:37 ` [PATCH v4 5/5] x86/time: extend "tsc" param with "stable:socket" Joao Martins
@ 2016-09-19 10:29   ` Jan Beulich
  2016-09-19 16:11     ` Joao Martins
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2016-09-19 10:29 UTC (permalink / raw)
  To: Joao Martins; +Cc: Andrew Cooper, xen-devel

>>> On 14.09.16 at 19:37, <joao.m.martins@oracle.com> wrote:
> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -270,7 +270,9 @@ If set, override Xen's default choice for the platform timer.
>  Having TSC as platform timer requires being explicitly set. This is because
>  TSC can only be safely used if CPU hotplug isn't performed on the system. In
>  some platforms, "maxcpus" parameter may require further adjustment to the
> -number of online cpus.
> +number of online cpus. When running under platforms that can guarantee a

... running on platforms ...

> +monotonic TSC across sockets you require adjusting "tsc" command line parameter

... you may want to adjust the ...

> +parameter to "stable:sockets".

Redundant "parameter" (I guess the one on the earlier line was meant
to get moved due to line length). Also you say "sockets" here but ...

> @@ -1508,7 +1510,7 @@ pages) must also be specified via the tbuf\_size parameter.
>  > `= <integer>`
>  
>  ### tsc
> -> `= unstable | skewed`
> +> `= unstable | skewed | stable:socket`

"socket" here - the two really should match.

> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -477,6 +477,10 @@ uint64_t ns_to_acpi_pm_tick(uint64_t ns)
>  /************************************************************
>   * PLATFORM TIMER 4: TSC
>   */
> +static unsigned int __read_mostly tsc_flags;

__initdata instead of __read_mostly

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 2/5] x86/time: implement tsc as clocksource
  2016-09-19 10:13   ` Jan Beulich
@ 2016-09-19 16:11     ` Joao Martins
  2016-09-19 16:25       ` Jan Beulich
  0 siblings, 1 reply; 25+ messages in thread
From: Joao Martins @ 2016-09-19 16:11 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel

On 09/19/2016 11:13 AM, Jan Beulich wrote:
>>>> On 14.09.16 at 19:37, <joao.m.martins@oracle.com> wrote:
>> This patch introduces support for using TSC as platform time source
>> which is the highest resolution time and most performant to get.
>> Though there are also several problems associated with its usage, and
>> there isn't a complete (and architecturally defined) guarantee that
>> all machines will provide reliable and monotonic TSC in all cases (I
>> believe Intel to be the only that can guarantee that?) For this reason
>> it's set with less priority when compared to HPET unless adminstrator
>> changes "clocksource" boot option to "tsc".
> 
> In the following sentence you removed the exclusive mentioning
> of HPET, but above you don't. Furthermore I don't think this
> sentence is in line with what the patch does: There's no priority
> given to it, and it won't be used at all when not requested on the
> command line.
You're right, let me change this sentence to be:

For this reason it's not used unless administrator changes "clocksource" boot
option to "tsc".

> 
>> Initializing TSC
>> clocksource requires all CPUs up to have the tsc reliability checks
>> performed. init_xen_time is called before all CPUs are up, so for
>> example we would start with HPET (or ACPI, PIT) at boot time, and
>> switch later to TSC. The switch then happens on verify_tsc_reliability
>> initcall that is invoked when all CPUs are up. When attempting to
>> initialize TSC we also check for time warps and if it has invariant
>> TSC. Note that while we deem reliable a CONSTANT_TSC with no deep
>> C-states, it might not always be the case, so we're conservative and
>> allow TSC to be used as platform timer only with invariant TSC.
>> Additionally we check if CPU Hotplug isn't meant to be performed on
>> the host which will either be when max vcpus and num_present_cpu are
>> the same. This is because a newly hotplugged CPU may not satisfy the
>> condition of having all TSCs synchronized - so when having tsc
>> clocksource being used we allow offlining CPUs but not onlining any
>> ones back. Finally we prevent TSC from being used as clocksource on
>> multiple sockets because it isn't guaranteed to be invariant. Further
>> relaxing of this last requirement is added in a separate patch, such
>> that we allow vendors with such guarantee to use TSC as clocksource.
>> In case any of these conditions is not met, we keep the clocksource
>> that was previously initialized on init_xen_time.
>>
>> Since b64438c7c ("x86/time: use correct (local) time stamp in
>> constant-TSC calibration fast path") updates to cpu time use local
>> stamps, which means platform timer is only used to seed the initial
>> cpu time.  With clocksource=tsc there is no need to be in sync with
>> another clocksource, so we reseed the local/master stamps to be values
>> of TSC and update the platform time stamps accordingly. Time
>> calibration is set to 1sec after we switch to TSC, thus these stamps
>> are reseeded to also ensure monotonic returning values right after the
>> point we switch to TSC. This is also to avoid the possibility of
>> having inconsistent readings in this short period (i.e. until
>> calibration fires).
> 
> And within this one second, which may cover some of Dom0's
> booting up, it is okay to have inconsistencies?
It's not okay which is why I am removing this possibility when switching to TSC.
The inconsistencies in those readings (if I wasn't adjusting) would be because
we would be using (in that 1-sec) those cpu time tuples calculated by the
previous calibration or platform time initialization (while still was HPET,
ACPI, etc as clocksource). Would you prefer me removing the "avoid" and instead
change it to "remove the possibility" in this last sentence?

> 
>> --- a/xen/arch/x86/time.c
>> +++ b/xen/arch/x86/time.c
>> @@ -475,6 +475,50 @@ uint64_t ns_to_acpi_pm_tick(uint64_t ns)
>>  }
>>  
>>  /************************************************************
>> + * PLATFORM TIMER 4: TSC
>> + */
>> +
>> +/*
>> + * Called in verify_tsc_reliability() under reliable TSC conditions
>> + * thus reusing all the checks already performed there.
>> + */
>> +static s64 __init init_tsc(struct platform_timesource *pts)
>> +{
>> +    u64 ret = pts->frequency;
>> +
>> +    if ( nr_cpu_ids != num_present_cpus() )
>> +    {
>> +        printk(XENLOG_INFO "TSC: CPU Hotplug intended\n");
>> +        ret = 0;
>> +    }
>> +
>> +    if ( nr_sockets > 1 )
>> +    {
>> +        printk(XENLOG_INFO "TSC: Not invariant across sockets\n");
>> +        ret = 0;
>> +    }
>> +
>> +    if ( !ret )
>> +        printk(XENLOG_INFO "TSC: Not setting it as clocksource\n");
> 
> I think this last message is redundant with the former two. But since
> I also think that info level is too low for those earlier ones, perhaps
> keeping the latter (at info or even debug level) would be okay, once
> the other got bumped to warning level.
Makes sense and one can infer that message from the lack of "Switched to ..."
Let me change the first two into warning level and the last one as debug level
as you're suggesting.

> 
>> +static struct platform_timesource __initdata plt_tsc =
>> +{
>> +    .id = "tsc",
>> +    .name = "TSC",
>> +    .read_counter = read_tsc,
>> +    .counter_bits = 63,
> 
> Please add a brief comment explaining why this is not 64.
OK.

> 
>> @@ -604,7 +667,9 @@ static u64 __init init_platform_timer(void)
>>      unsigned int i;
>>      s64 rc = -1;
>>  
>> -    if ( opt_clocksource[0] != '\0' )
>> +    /* clocksource=tsc is initialized via __initcalls (when CPUs are up). */
>> +    if ( (opt_clocksource[0] != '\0') &&
>> +         strcmp(opt_clocksource, "tsc") )
> 
> No real need to split this if() across two lines.
OK.

> 
>> +static void __init try_platform_timer_tail(void)
>> +{
>> +    init_timer(&plt_overflow_timer, plt_overflow, NULL, 0);
>> +    plt_overflow(NULL);
>> +
>> +    platform_timer_stamp = plt_stamp64;
>> +    stime_platform_stamp = NOW();
>> +
>> +    if ( !clocksource_is_tsc() )
>> +        init_percpu_time();
> 
> This isn't really dependent on whether TSC is used as clocksource,
> but solely on the point in time at which the call gets made, is it? If
> so, I think an explicit boolean function parameter (named e.g. "late")
> would be better than abusing the predicate here.

Correct, I will introduce this boolean parameter. Not that is critical but
probably add an likely(...) there too, since the late case only happens for
clocksource=tsc ?

> 
>> @@ -1480,6 +1570,25 @@ static int __init verify_tsc_reliability(void)
>>              printk("TSC warp detected, disabling TSC_RELIABLE\n");
>>              setup_clear_cpu_cap(X86_FEATURE_TSC_RELIABLE);
>>          }
>> +        else if ( !strcmp(opt_clocksource, "tsc") &&
>> +                  (try_platform_timer(&plt_tsc) > 0) )
>> +        {
>> +            /*
>> +             * Platform timer has changed and CPU time will only be updated
>> +             * after we set again the calibration timer, which means we need to
>> +             * seed again each local CPU time. At this stage TSC is known to be
>> +             * reliable i.e. monotonically increasing across all CPUs so this
>> +             * lets us remove the skew between platform timer and TSC, since
>> +             * these are now effectively the same.
>> +             */
>> +            on_selected_cpus(&cpu_online_map, reset_percpu_time, NULL, 1);
>> +
>> +            /* Finish platform timer switch. */
>> +            try_platform_timer_tail();
>> +
>> +            printk(XENLOG_INFO "Switched to Platform timer %s TSC\n",
>> +                   freq_string(plt_src.frequency));
> 
> This message should have the same log level as the one at the end
> of init_platform_timer().
Agreed, but at the end of init_platform_timer there is a plain printk with an
omitted log level. Or do you mean to remove XENLOG_INFO from this printk above
or, instead add XENLOG_INFO to one printk at the end of init_platform_timer() ?

Joao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 3/5] x86/time: refactor read_platform_stime()
  2016-09-19 10:15   ` Jan Beulich
@ 2016-09-19 16:11     ` Joao Martins
  0 siblings, 0 replies; 25+ messages in thread
From: Joao Martins @ 2016-09-19 16:11 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel



On 09/19/2016 11:15 AM, Jan Beulich wrote:
>>>> On 14.09.16 at 19:37, <joao.m.martins@oracle.com> wrote:
>> To allow the caller to fetch the last read from the clocksource which
>> was used to calculate system_time. This is a prerequisite for a
>> subsequent patch that will use this last read.
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> 
> Acked-by: Jan Beulich <jbeulich@suse.com>
> with one further minor request:
> 
>> --- a/xen/arch/x86/time.c
>> +++ b/xen/arch/x86/time.c
>> @@ -581,18 +581,22 @@ static void plt_overflow(void *unused)
>>      set_timer(&plt_overflow_timer, NOW() + plt_overflow_period);
>>  }
>>  
>> -static s_time_t read_platform_stime(void)
>> +static s_time_t read_platform_stime(u64 *stamp)
>>  {
>> -    u64 count;
>> +    u64 plt_counter, count;
>>      s_time_t stime;
>>  
>>      ASSERT(!local_irq_is_enabled());
>>  
>>      spin_lock(&platform_timer_lock);
>> -    count = plt_stamp64 + ((plt_src.read_counter() - plt_stamp) & plt_mask);
>> +    plt_counter = plt_src.read_counter();
>> +    count = plt_stamp64 + ((plt_counter - plt_stamp) & plt_mask);
>>      stime = __read_platform_stime(count);
>>      spin_unlock(&platform_timer_lock);
>>  
>> +    if ( stamp )
>> +        *stamp = plt_counter;
> 
> Considering that all current callers pass in NULL and you mean to
> add only one (iirc) which doesn't, please add unlikely() here.
OK, I will add it, Thank you!

Joao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 4/5] x86/time: implement PVCLOCK_TSC_STABLE_BIT
  2016-09-19 10:22   ` Jan Beulich
@ 2016-09-19 16:11     ` Joao Martins
  0 siblings, 0 replies; 25+ messages in thread
From: Joao Martins @ 2016-09-19 16:11 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel



On 09/19/2016 11:22 AM, Jan Beulich wrote:
>>>> On 14.09.16 at 19:37, <joao.m.martins@oracle.com> wrote:
>> --- a/xen/arch/x86/time.c
>> +++ b/xen/arch/x86/time.c
>> @@ -951,6 +951,14 @@ static void __update_vcpu_system_time(struct vcpu *v, int force)
>>      _u.tsc_timestamp = tsc_stamp;
>>      _u.system_time   = t->stamp.local_stime;
>>  
>> +    /*
>> +     * It's expected that domains cope with this bit changing on every
>> +     * pvclock read to check whether they can resort solely on this tuple
>> +     * or if it further requires monotonicity checks with other vcpus.
>> +     */
>> +    if ( clocksource_is_tsc() )
>> +        _u.flags    |= XEN_PVCLOCK_TSC_STABLE_BIT;
> 
> Stray blanks ahead of the |=.
> 
Will fix it.

> With that taken care of
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Thanks!

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 5/5] x86/time: extend "tsc" param with "stable:socket"
  2016-09-19 10:29   ` Jan Beulich
@ 2016-09-19 16:11     ` Joao Martins
  0 siblings, 0 replies; 25+ messages in thread
From: Joao Martins @ 2016-09-19 16:11 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel



On 09/19/2016 11:29 AM, Jan Beulich wrote:
>>>> On 14.09.16 at 19:37, <joao.m.martins@oracle.com> wrote:
>> --- a/docs/misc/xen-command-line.markdown
>> +++ b/docs/misc/xen-command-line.markdown
>> @@ -270,7 +270,9 @@ If set, override Xen's default choice for the platform timer.
>>  Having TSC as platform timer requires being explicitly set. This is because
>>  TSC can only be safely used if CPU hotplug isn't performed on the system. In
>>  some platforms, "maxcpus" parameter may require further adjustment to the
>> -number of online cpus.
>> +number of online cpus. When running under platforms that can guarantee a
> 
> ... running on platforms ...
> 
>> +monotonic TSC across sockets you require adjusting "tsc" command line parameter
> 
> ... you may want to adjust the ...
> 
>> +parameter to "stable:sockets".
> 
> Redundant "parameter" (I guess the one on the earlier line was meant
> to get moved due to line length). Also you say "sockets" here but ...
> 
>> @@ -1508,7 +1510,7 @@ pages) must also be specified via the tbuf\_size parameter.
>>  > `= <integer>`
>>  
>>  ### tsc
>> -> `= unstable | skewed`
>> +> `= unstable | skewed | stable:socket`
> 
> "socket" here - the two really should match.
Correct, I will fix this parameter name mismatch, and the spelling mistakes you
pointed out above.

>> --- a/xen/arch/x86/time.c
>> +++ b/xen/arch/x86/time.c
>> @@ -477,6 +477,10 @@ uint64_t ns_to_acpi_pm_tick(uint64_t ns)
>>  /************************************************************
>>   * PLATFORM TIMER 4: TSC
>>   */
>> +static unsigned int __read_mostly tsc_flags;
> 
> __initdata instead of __read_mostly
OK.

Joao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 2/5] x86/time: implement tsc as clocksource
  2016-09-19 16:11     ` Joao Martins
@ 2016-09-19 16:25       ` Jan Beulich
  2016-09-19 17:54         ` Joao Martins
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2016-09-19 16:25 UTC (permalink / raw)
  To: Joao Martins; +Cc: Andrew Cooper, xen-devel

>>> On 19.09.16 at 18:11, <joao.m.martins@oracle.com> wrote:
> On 09/19/2016 11:13 AM, Jan Beulich wrote:
>>>>> On 14.09.16 at 19:37, <joao.m.martins@oracle.com> wrote:
>>> Since b64438c7c ("x86/time: use correct (local) time stamp in
>>> constant-TSC calibration fast path") updates to cpu time use local
>>> stamps, which means platform timer is only used to seed the initial
>>> cpu time.  With clocksource=tsc there is no need to be in sync with
>>> another clocksource, so we reseed the local/master stamps to be values
>>> of TSC and update the platform time stamps accordingly. Time
>>> calibration is set to 1sec after we switch to TSC, thus these stamps
>>> are reseeded to also ensure monotonic returning values right after the
>>> point we switch to TSC. This is also to avoid the possibility of
>>> having inconsistent readings in this short period (i.e. until
>>> calibration fires).
>> 
>> And within this one second, which may cover some of Dom0's
>> booting up, it is okay to have inconsistencies?
> It's not okay which is why I am removing this possibility when switching to TSC.
> The inconsistencies in those readings (if I wasn't adjusting) would be because
> we would be using (in that 1-sec) those cpu time tuples calculated by the
> previous calibration or platform time initialization (while still was HPET,
> ACPI, etc as clocksource). Would you prefer me removing the "avoid" and instead
> change it to "remove the possibility" in this last sentence?

Let's not do the 2nd step before the 1st, which is the question of
what happens prior to and what actually changes at this first
calibration (after 1 sec).

>>> +static void __init try_platform_timer_tail(void)
>>> +{
>>> +    init_timer(&plt_overflow_timer, plt_overflow, NULL, 0);
>>> +    plt_overflow(NULL);
>>> +
>>> +    platform_timer_stamp = plt_stamp64;
>>> +    stime_platform_stamp = NOW();
>>> +
>>> +    if ( !clocksource_is_tsc() )
>>> +        init_percpu_time();
>> 
>> This isn't really dependent on whether TSC is used as clocksource,
>> but solely on the point in time at which the call gets made, is it? If
>> so, I think an explicit boolean function parameter (named e.g. "late")
>> would be better than abusing the predicate here.
> 
> Correct, I will introduce this boolean parameter. Not that is critical but
> probably add an likely(...) there too, since the late case only happens for
> clocksource=tsc ?

Well, in __init code I prefer to avoid likely()/unlikely(), unless it's
in e.g. a performance critical loop.

>>> @@ -1480,6 +1570,25 @@ static int __init verify_tsc_reliability(void)
>>>              printk("TSC warp detected, disabling TSC_RELIABLE\n");
>>>              setup_clear_cpu_cap(X86_FEATURE_TSC_RELIABLE);
>>>          }
>>> +        else if ( !strcmp(opt_clocksource, "tsc") &&
>>> +                  (try_platform_timer(&plt_tsc) > 0) )
>>> +        {
>>> +            /*
>>> +             * Platform timer has changed and CPU time will only be updated
>>> +             * after we set again the calibration timer, which means we need to
>>> +             * seed again each local CPU time. At this stage TSC is known to be
>>> +             * reliable i.e. monotonically increasing across all CPUs so this
>>> +             * lets us remove the skew between platform timer and TSC, since
>>> +             * these are now effectively the same.
>>> +             */
>>> +            on_selected_cpus(&cpu_online_map, reset_percpu_time, NULL, 1);
>>> +
>>> +            /* Finish platform timer switch. */
>>> +            try_platform_timer_tail();
>>> +
>>> +            printk(XENLOG_INFO "Switched to Platform timer %s TSC\n",
>>> +                   freq_string(plt_src.frequency));
>> 
>> This message should have the same log level as the one at the end
>> of init_platform_timer().
> Agreed, but at the end of init_platform_timer there is a plain printk with an
> omitted log level. Or do you mean to remove XENLOG_INFO from this printk above
> or, instead add XENLOG_INFO to one printk at the end of 
> init_platform_timer() ?

Well, info would again be too low a level for my taste. Hence either
remove the level here (slightly preferred from my pov), or make both
warning.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 2/5] x86/time: implement tsc as clocksource
  2016-09-19 16:25       ` Jan Beulich
@ 2016-09-19 17:54         ` Joao Martins
  2016-09-20  7:13           ` Jan Beulich
  0 siblings, 1 reply; 25+ messages in thread
From: Joao Martins @ 2016-09-19 17:54 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel

On 09/19/2016 05:25 PM, Jan Beulich wrote:
>>>> On 19.09.16 at 18:11, <joao.m.martins@oracle.com> wrote:
>> On 09/19/2016 11:13 AM, Jan Beulich wrote:
>>>>>> On 14.09.16 at 19:37, <joao.m.martins@oracle.com> wrote:
>>>> Since b64438c7c ("x86/time: use correct (local) time stamp in
>>>> constant-TSC calibration fast path") updates to cpu time use local
>>>> stamps, which means platform timer is only used to seed the initial
>>>> cpu time.  With clocksource=tsc there is no need to be in sync with
>>>> another clocksource, so we reseed the local/master stamps to be values
>>>> of TSC and update the platform time stamps accordingly. Time
>>>> calibration is set to 1sec after we switch to TSC, thus these stamps
>>>> are reseeded to also ensure monotonic returning values right after the
>>>> point we switch to TSC. This is also to avoid the possibility of
>>>> having inconsistent readings in this short period (i.e. until
>>>> calibration fires).
>>>
>>> And within this one second, which may cover some of Dom0's
>>> booting up, it is okay to have inconsistencies?
>> It's not okay which is why I am removing this possibility when switching to TSC.
>> The inconsistencies in those readings (if I wasn't adjusting) would be because
>> we would be using (in that 1-sec) those cpu time tuples calculated by the
>> previous calibration or platform time initialization (while still was HPET,
>> ACPI, etc as clocksource). Would you prefer me removing the "avoid" and instead
>> change it to "remove the possibility" in this last sentence?
> 
> Let's not do the 2nd step before the 1st, which is the question of
> what happens prior to and what actually changes at this first
> calibration (after 1 sec).
The first calibration won't change much - this 1-sec was meant when having
nop_rendezvous which is the first time platform timer would be used to set local
cpu_time (will adjust the mention above as it's misleading for the reader as it
doesn't refer to this patch). Though reseeding the cpu times to boot_tsc_stamp
prior to calibration (*without* the time latch values from previous platform
timer) we can ensure NOW/get_s_time or calls to update_vcpu_system_time() will
see monotonically increasing values. Otherwise keeping the previous ones,
calibration would just add up local TSC delta and any existing divergence
wouldn't be solved. On a later patch when we set the stable bit (with nop
rendezvous), if these cpu times weren't adjusted guests/xen would still see
small divergence between CPUs until the first calibration was fired (after we
switched to TSC). And then the values wouldn't be consistent with the guarantees
expected from this bit. But even not considering this bit (which is not the
subject of this patch), same guarantee is expected from get_s_time() calls
within Xen.


>>>> @@ -1480,6 +1570,25 @@ static int __init verify_tsc_reliability(void)
>>>>              printk("TSC warp detected, disabling TSC_RELIABLE\n");
>>>>              setup_clear_cpu_cap(X86_FEATURE_TSC_RELIABLE);
>>>>          }
>>>> +        else if ( !strcmp(opt_clocksource, "tsc") &&
>>>> +                  (try_platform_timer(&plt_tsc) > 0) )
>>>> +        {
>>>> +            /*
>>>> +             * Platform timer has changed and CPU time will only be updated
>>>> +             * after we set again the calibration timer, which means we need to
>>>> +             * seed again each local CPU time. At this stage TSC is known to be
>>>> +             * reliable i.e. monotonically increasing across all CPUs so this
>>>> +             * lets us remove the skew between platform timer and TSC, since
>>>> +             * these are now effectively the same.
>>>> +             */
>>>> +            on_selected_cpus(&cpu_online_map, reset_percpu_time, NULL, 1);
>>>> +
>>>> +            /* Finish platform timer switch. */
>>>> +            try_platform_timer_tail();
>>>> +
>>>> +            printk(XENLOG_INFO "Switched to Platform timer %s TSC\n",
>>>> +                   freq_string(plt_src.frequency));
>>>
>>> This message should have the same log level as the one at the end
>>> of init_platform_timer().
>> Agreed, but at the end of init_platform_timer there is a plain printk with an
>> omitted log level. Or do you mean to remove XENLOG_INFO from this printk above
>> or, instead add XENLOG_INFO to one printk at the end of 
>> init_platform_timer() ?
> 
> Well, info would again be too low a level for my taste. Hence either
> remove the level here (slightly preferred from my pov), or make both
> warning.
As your preference goes towards without the log level, I will re-introduce back
without it. Although I would find clearer to use printk with a log level as it
was advised in earlier reviews.

NB: My suggestion of info as level is because my usual line of thought is to see
warning as something potentially erroneous that user should be warned about, and
error as being an actual error.

Joao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 2/5] x86/time: implement tsc as clocksource
  2016-09-19 17:54         ` Joao Martins
@ 2016-09-20  7:13           ` Jan Beulich
  2016-09-20 10:15             ` Joao Martins
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2016-09-20  7:13 UTC (permalink / raw)
  To: Joao Martins; +Cc: Andrew Cooper, xen-devel

>>> On 19.09.16 at 19:54, <joao.m.martins@oracle.com> wrote:
> On 09/19/2016 05:25 PM, Jan Beulich wrote:
>>>>> On 19.09.16 at 18:11, <joao.m.martins@oracle.com> wrote:
>>> On 09/19/2016 11:13 AM, Jan Beulich wrote:
>>>>>>> On 14.09.16 at 19:37, <joao.m.martins@oracle.com> wrote:
>>>>> Since b64438c7c ("x86/time: use correct (local) time stamp in
>>>>> constant-TSC calibration fast path") updates to cpu time use local
>>>>> stamps, which means platform timer is only used to seed the initial
>>>>> cpu time.  With clocksource=tsc there is no need to be in sync with
>>>>> another clocksource, so we reseed the local/master stamps to be values
>>>>> of TSC and update the platform time stamps accordingly. Time
>>>>> calibration is set to 1sec after we switch to TSC, thus these stamps
>>>>> are reseeded to also ensure monotonic returning values right after the
>>>>> point we switch to TSC. This is also to avoid the possibility of
>>>>> having inconsistent readings in this short period (i.e. until
>>>>> calibration fires).
>>>>
>>>> And within this one second, which may cover some of Dom0's
>>>> booting up, it is okay to have inconsistencies?
>>> It's not okay which is why I am removing this possibility when switching to TSC.
>>> The inconsistencies in those readings (if I wasn't adjusting) would be because
>>> we would be using (in that 1-sec) those cpu time tuples calculated by the
>>> previous calibration or platform time initialization (while still was HPET,
>>> ACPI, etc as clocksource). Would you prefer me removing the "avoid" and instead
>>> change it to "remove the possibility" in this last sentence?
>> 
>> Let's not do the 2nd step before the 1st, which is the question of
>> what happens prior to and what actually changes at this first
>> calibration (after 1 sec).
> The first calibration won't change much - this 1-sec was meant when having
> nop_rendezvous which is the first time platform timer would be used to set local
> cpu_time (will adjust the mention above as it's misleading for the reader as it
> doesn't refer to this patch).

So what makes it that it actually _is_ nop_rendezvous after that one
second? (And yes, part of this may indeed be just bad placement of
the description, as iirc nop_rendezvous gets introduced only in a later
patch.)

> Though reseeding the cpu times to boot_tsc_stamp
> prior to calibration (*without* the time latch values from previous platform
> timer) we can ensure NOW/get_s_time or calls to update_vcpu_system_time() will
> see monotonically increasing values. Otherwise keeping the previous ones,
> calibration would just add up local TSC delta and any existing divergence
> wouldn't be solved. On a later patch when we set the stable bit (with nop
> rendezvous), if these cpu times weren't adjusted guests/xen would still see
> small divergence between CPUs until the first calibration was fired (after we
> switched to TSC).

Right. But my concern is regarding the time window _between_
switching to TSC and running calibration the first time afterwards.
Part of this indeed gets taken care of by the re-seeding (which,
other than one could imply from what you say above, doesn't get
done [immediately] prior to calibration, but upon switching to TSC).
But is re-seeding without switching to nop_rendezvous really
sufficient (or in other words, can the patches really be broken up
like this)?

>>>>> @@ -1480,6 +1570,25 @@ static int __init verify_tsc_reliability(void)
>>>>>              printk("TSC warp detected, disabling TSC_RELIABLE\n");
>>>>>              setup_clear_cpu_cap(X86_FEATURE_TSC_RELIABLE);
>>>>>          }
>>>>> +        else if ( !strcmp(opt_clocksource, "tsc") &&
>>>>> +                  (try_platform_timer(&plt_tsc) > 0) )
>>>>> +        {
>>>>> +            /*
>>>>> +             * Platform timer has changed and CPU time will only be updated
>>>>> +             * after we set again the calibration timer, which means we need to
>>>>> +             * seed again each local CPU time. At this stage TSC is known to be
>>>>> +             * reliable i.e. monotonically increasing across all CPUs so this
>>>>> +             * lets us remove the skew between platform timer and TSC, since
>>>>> +             * these are now effectively the same.
>>>>> +             */
>>>>> +            on_selected_cpus(&cpu_online_map, reset_percpu_time, NULL, 1);
>>>>> +
>>>>> +            /* Finish platform timer switch. */
>>>>> +            try_platform_timer_tail();
>>>>> +
>>>>> +            printk(XENLOG_INFO "Switched to Platform timer %s TSC\n",
>>>>> +                   freq_string(plt_src.frequency));
>>>>
>>>> This message should have the same log level as the one at the end
>>>> of init_platform_timer().
>>> Agreed, but at the end of init_platform_timer there is a plain printk with an
>>> omitted log level. Or do you mean to remove XENLOG_INFO from this printk above
>>> or, instead add XENLOG_INFO to one printk at the end of 
>>> init_platform_timer() ?
>> 
>> Well, info would again be too low a level for my taste. Hence either
>> remove the level here (slightly preferred from my pov), or make both
>> warning.
> As your preference goes towards without the log level, I will re-introduce back
> without it. Although I would find clearer to use printk with a log level as it
> was advised in earlier reviews.
> 
> NB: My suggestion of info as level is because my usual line of thought is to see
> warning as something potentially erroneous that user should be warned about, and
> error as being an actual error.

I understand and mostly agree. There are a few cases though (and
we're dealing with one here imo) where the absence of a log level is
better: We want these messages present in the log by default (which
they wouldn't be if they were info), but they're also not really
warnings. Perhaps simply for documentation purposes we could add
XENLOG_DEFAULT (expanding to an empty string), but that's not
something to be done in this patch.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 2/5] x86/time: implement tsc as clocksource
  2016-09-20  7:13           ` Jan Beulich
@ 2016-09-20 10:15             ` Joao Martins
  2016-09-20 10:23               ` Joao Martins
  2016-09-20 13:55               ` Jan Beulich
  0 siblings, 2 replies; 25+ messages in thread
From: Joao Martins @ 2016-09-20 10:15 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel

On 09/20/2016 08:13 AM, Jan Beulich wrote:
>>>> On 19.09.16 at 19:54, <joao.m.martins@oracle.com> wrote:
>> On 09/19/2016 05:25 PM, Jan Beulich wrote:
>>>>>> On 19.09.16 at 18:11, <joao.m.martins@oracle.com> wrote:
>>>> On 09/19/2016 11:13 AM, Jan Beulich wrote:
>>>>>>>> On 14.09.16 at 19:37, <joao.m.martins@oracle.com> wrote:
>>>>>> Since b64438c7c ("x86/time: use correct (local) time stamp in
>>>>>> constant-TSC calibration fast path") updates to cpu time use local
>>>>>> stamps, which means platform timer is only used to seed the initial
>>>>>> cpu time.  With clocksource=tsc there is no need to be in sync with
>>>>>> another clocksource, so we reseed the local/master stamps to be values
>>>>>> of TSC and update the platform time stamps accordingly. Time
>>>>>> calibration is set to 1sec after we switch to TSC, thus these stamps
>>>>>> are reseeded to also ensure monotonic returning values right after the
>>>>>> point we switch to TSC. This is also to avoid the possibility of
>>>>>> having inconsistent readings in this short period (i.e. until
>>>>>> calibration fires).
>>>>>
>>>>> And within this one second, which may cover some of Dom0's
>>>>> booting up, it is okay to have inconsistencies?
>>>> It's not okay which is why I am removing this possibility when switching to TSC.
>>>> The inconsistencies in those readings (if I wasn't adjusting) would be because
>>>> we would be using (in that 1-sec) those cpu time tuples calculated by the
>>>> previous calibration or platform time initialization (while still was HPET,
>>>> ACPI, etc as clocksource). Would you prefer me removing the "avoid" and instead
>>>> change it to "remove the possibility" in this last sentence?
>>>
>>> Let's not do the 2nd step before the 1st, which is the question of
>>> what happens prior to and what actually changes at this first
>>> calibration (after 1 sec).
>> The first calibration won't change much - this 1-sec was meant when having
>> nop_rendezvous which is the first time platform timer would be used to set local
>> cpu_time (will adjust the mention above as it's misleading for the reader as it
>> doesn't refer to this patch).
> 
> So what makes it that it actually _is_ nop_rendezvous after that one
> second? (And yes, part of this may indeed be just bad placement of
> the description, as iirc nop_rendezvous gets introduced only in a later
> patch.)
Because with nop_rendezvous we will be using the platform timer to get a
monotonic time tuple and *set* cpu_time as opposed to just adding up plain TSC
delta as it is the case prior to b64438c7c. Thus the reseeding of the cpu times
solves both ends of the problem, with nop_rendezvous until it is first
calibration fixes it, and without nop_rendezvous to remove the latch adjustment
from initial platform timer.

>> Though reseeding the cpu times to boot_tsc_stamp
>> prior to calibration (*without* the time latch values from previous platform
>> timer) we can ensure NOW/get_s_time or calls to update_vcpu_system_time() will
>> see monotonically increasing values. Otherwise keeping the previous ones,
>> calibration would just add up local TSC delta and any existing divergence
>> wouldn't be solved. On a later patch when we set the stable bit (with nop
>> rendezvous), if these cpu times weren't adjusted guests/xen would still see
>> small divergence between CPUs until the first calibration was fired (after we
>> switched to TSC).
> 
> Right. But my concern is regarding the time window _between_
> switching to TSC and running calibration the first time afterwards.
> Part of this indeed gets taken care of by the re-seeding (which,
> other than one could imply from what you say above, doesn't get
> done [immediately] prior to calibration, but upon switching to TSC).
Exactly.

> But is re-seeding without switching to nop_rendezvous really
> sufficient (or in other words, can the patches really be broken up
> like this)?
I think it is sufficient, unless I am missing out something. In my opinion what
requires it to be correct is that the tuple that is getting (re-)written to cpu
time has a matching stamp + stime and corresponds to the same time reference as
the next written after. Unless your doubts might originate from how
std_rendezvous currently updates the time infos but even there with your series
we fortunately add up to cpu_time with a monotonic delta (with a matching stamp
and stime tuple) and thus having the monotonic property even without
nop_rendezvous. I recall doing testing/validation of this patch alone and didn't
observe any divergences, but I must say that the most significant portion of
testing was with the whole series.

>>>>>> @@ -1480,6 +1570,25 @@ static int __init verify_tsc_reliability(void)
>>>>>>              printk("TSC warp detected, disabling TSC_RELIABLE\n");
>>>>>>              setup_clear_cpu_cap(X86_FEATURE_TSC_RELIABLE);
>>>>>>          }
>>>>>> +        else if ( !strcmp(opt_clocksource, "tsc") &&
>>>>>> +                  (try_platform_timer(&plt_tsc) > 0) )
>>>>>> +        {
>>>>>> +            /*
>>>>>> +             * Platform timer has changed and CPU time will only be updated
>>>>>> +             * after we set again the calibration timer, which means we need to
>>>>>> +             * seed again each local CPU time. At this stage TSC is known to be
>>>>>> +             * reliable i.e. monotonically increasing across all CPUs so this
>>>>>> +             * lets us remove the skew between platform timer and TSC, since
>>>>>> +             * these are now effectively the same.
>>>>>> +             */
>>>>>> +            on_selected_cpus(&cpu_online_map, reset_percpu_time, NULL, 1);
>>>>>> +
>>>>>> +            /* Finish platform timer switch. */
>>>>>> +            try_platform_timer_tail();
>>>>>> +
>>>>>> +            printk(XENLOG_INFO "Switched to Platform timer %s TSC\n",
>>>>>> +                   freq_string(plt_src.frequency));
>>>>>
>>>>> This message should have the same log level as the one at the end
>>>>> of init_platform_timer().
>>>> Agreed, but at the end of init_platform_timer there is a plain printk with an
>>>> omitted log level. Or do you mean to remove XENLOG_INFO from this printk above
>>>> or, instead add XENLOG_INFO to one printk at the end of 
>>>> init_platform_timer() ?
>>>
>>> Well, info would again be too low a level for my taste. Hence either
>>> remove the level here (slightly preferred from my pov), or make both
>>> warning.
>> As your preference goes towards without the log level, I will re-introduce back
>> without it. Although I would find clearer to use printk with a log level as it
>> was advised in earlier reviews.
>>
>> NB: My suggestion of info as level is because my usual line of thought is to see
>> warning as something potentially erroneous that user should be warned about, and
>> error as being an actual error.
> 
> I understand and mostly agree. There are a few cases though (and
> we're dealing with one here imo) where the absence of a log level is
> better: We want these messages present in the log by default (which
> they wouldn't be if they were info), but they're also not really
> warnings. Perhaps simply for documentation purposes we could add
> XENLOG_DEFAULT (expanding to an empty string), but that's not
> something to be done in this patch.
I see, thanks for the clarification.

Joao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 2/5] x86/time: implement tsc as clocksource
  2016-09-20 10:15             ` Joao Martins
@ 2016-09-20 10:23               ` Joao Martins
  2016-09-20 13:55               ` Jan Beulich
  1 sibling, 0 replies; 25+ messages in thread
From: Joao Martins @ 2016-09-20 10:23 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel



On 09/20/2016 11:15 AM, Joao Martins wrote:
> On 09/20/2016 08:13 AM, Jan Beulich wrote:
>>>>> On 19.09.16 at 19:54, <joao.m.martins@oracle.com> wrote:
>>> On 09/19/2016 05:25 PM, Jan Beulich wrote:
>>>>>>> On 19.09.16 at 18:11, <joao.m.martins@oracle.com> wrote:
>>>>> On 09/19/2016 11:13 AM, Jan Beulich wrote:
>>>>>>>>> On 14.09.16 at 19:37, <joao.m.martins@oracle.com> wrote:
>>>>>>> Since b64438c7c ("x86/time: use correct (local) time stamp in
>>>>>>> constant-TSC calibration fast path") updates to cpu time use local
>>>>>>> stamps, which means platform timer is only used to seed the initial
>>>>>>> cpu time.  With clocksource=tsc there is no need to be in sync with
>>>>>>> another clocksource, so we reseed the local/master stamps to be values
>>>>>>> of TSC and update the platform time stamps accordingly. Time
>>>>>>> calibration is set to 1sec after we switch to TSC, thus these stamps
>>>>>>> are reseeded to also ensure monotonic returning values right after the
>>>>>>> point we switch to TSC. This is also to avoid the possibility of
>>>>>>> having inconsistent readings in this short period (i.e. until
>>>>>>> calibration fires).
>>>>>>
>>>>>> And within this one second, which may cover some of Dom0's
>>>>>> booting up, it is okay to have inconsistencies?
>>>>> It's not okay which is why I am removing this possibility when switching to TSC.
>>>>> The inconsistencies in those readings (if I wasn't adjusting) would be because
>>>>> we would be using (in that 1-sec) those cpu time tuples calculated by the
>>>>> previous calibration or platform time initialization (while still was HPET,
>>>>> ACPI, etc as clocksource). Would you prefer me removing the "avoid" and instead
>>>>> change it to "remove the possibility" in this last sentence?
>>>>
>>>> Let's not do the 2nd step before the 1st, which is the question of
>>>> what happens prior to and what actually changes at this first
>>>> calibration (after 1 sec).
>>> The first calibration won't change much - this 1-sec was meant when having
>>> nop_rendezvous which is the first time platform timer would be used to set local
>>> cpu_time (will adjust the mention above as it's misleading for the reader as it
>>> doesn't refer to this patch).
>>
>> So what makes it that it actually _is_ nop_rendezvous after that one
>> second? (And yes, part of this may indeed be just bad placement of
>> the description, as iirc nop_rendezvous gets introduced only in a later
>> patch.)
> Because with nop_rendezvous we will be using the platform timer to get a
> monotonic time tuple and *set* cpu_time as opposed to just adding up plain TSC
> delta as it is the case prior to b64438c7c. Thus the reseeding of the cpu times
> solves both ends of the problem, with nop_rendezvous until it is first
> calibration fixes it, and without nop_rendezvous to remove the latch adjustment
> from initial platform timer.
The part "until it is the first calibration fixes it" is very
confusing/redundant I am sorry. I meant it: "with nop_rendezvous which otherwise
would be the first calibration fixing it". The previous part was hinting like
there's a problem, when it is fixed but the reseeding.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 2/5] x86/time: implement tsc as clocksource
  2016-09-20 10:15             ` Joao Martins
  2016-09-20 10:23               ` Joao Martins
@ 2016-09-20 13:55               ` Jan Beulich
  2016-09-20 16:17                 ` Joao Martins
  1 sibling, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2016-09-20 13:55 UTC (permalink / raw)
  To: Joao Martins; +Cc: Andrew Cooper, xen-devel

>>> On 20.09.16 at 12:15, <joao.m.martins@oracle.com> wrote:
> On 09/20/2016 08:13 AM, Jan Beulich wrote:
>>>>> On 19.09.16 at 19:54, <joao.m.martins@oracle.com> wrote:
>>> On 09/19/2016 05:25 PM, Jan Beulich wrote:
>>>>>>> On 19.09.16 at 18:11, <joao.m.martins@oracle.com> wrote:
>>>>> On 09/19/2016 11:13 AM, Jan Beulich wrote:
>>>>>>>>> On 14.09.16 at 19:37, <joao.m.martins@oracle.com> wrote:
>>>>>>> Since b64438c7c ("x86/time: use correct (local) time stamp in
>>>>>>> constant-TSC calibration fast path") updates to cpu time use local
>>>>>>> stamps, which means platform timer is only used to seed the initial
>>>>>>> cpu time.  With clocksource=tsc there is no need to be in sync with
>>>>>>> another clocksource, so we reseed the local/master stamps to be values
>>>>>>> of TSC and update the platform time stamps accordingly. Time
>>>>>>> calibration is set to 1sec after we switch to TSC, thus these stamps
>>>>>>> are reseeded to also ensure monotonic returning values right after the
>>>>>>> point we switch to TSC. This is also to avoid the possibility of
>>>>>>> having inconsistent readings in this short period (i.e. until
>>>>>>> calibration fires).
>>>>>>
>>>>>> And within this one second, which may cover some of Dom0's
>>>>>> booting up, it is okay to have inconsistencies?
>>>>> It's not okay which is why I am removing this possibility when switching to TSC.
>>>>> The inconsistencies in those readings (if I wasn't adjusting) would be because
>>>>> we would be using (in that 1-sec) those cpu time tuples calculated by the
>>>>> previous calibration or platform time initialization (while still was HPET,
>>>>> ACPI, etc as clocksource). Would you prefer me removing the "avoid" and instead
>>>>> change it to "remove the possibility" in this last sentence?
>>>>
>>>> Let's not do the 2nd step before the 1st, which is the question of
>>>> what happens prior to and what actually changes at this first
>>>> calibration (after 1 sec).
>>> The first calibration won't change much - this 1-sec was meant when having
>>> nop_rendezvous which is the first time platform timer would be used to set local
>>> cpu_time (will adjust the mention above as it's misleading for the reader as it
>>> doesn't refer to this patch).
>> 
>> So what makes it that it actually _is_ nop_rendezvous after that one
>> second? (And yes, part of this may indeed be just bad placement of
>> the description, as iirc nop_rendezvous gets introduced only in a later
>> patch.)
> Because with nop_rendezvous we will be using the platform timer to get a
> monotonic time tuple and *set* cpu_time as opposed to just adding up plain TSC
> delta as it is the case prior to b64438c7c. Thus the reseeding of the cpu times
> solves both ends of the problem, with nop_rendezvous until it is first
> calibration fixes it, and without nop_rendezvous to remove the latch adjustment
> from initial platform timer.

So am I getting you right (together with the second part of your reply
further down) that you escape answering the question raised by saying
that it doesn't really matter which rendezvous function gets used, when
TSC is the clock source? I.e. the introduction of nop_rendezvous is
really just to avoid unnecessary overhead? In which case it should
probably be a separate patch, saying so in its description.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 2/5] x86/time: implement tsc as clocksource
  2016-09-20 13:55               ` Jan Beulich
@ 2016-09-20 16:17                 ` Joao Martins
  2016-09-21  9:14                   ` Joao Martins
  2016-09-21  9:20                   ` Joao Martins
  0 siblings, 2 replies; 25+ messages in thread
From: Joao Martins @ 2016-09-20 16:17 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel

On 09/20/2016 02:55 PM, Jan Beulich wrote:
>>>> On 20.09.16 at 12:15, <joao.m.martins@oracle.com> wrote:
>> On 09/20/2016 08:13 AM, Jan Beulich wrote:
>>>>>> On 19.09.16 at 19:54, <joao.m.martins@oracle.com> wrote:
>>>> On 09/19/2016 05:25 PM, Jan Beulich wrote:
>>>>>>>> On 19.09.16 at 18:11, <joao.m.martins@oracle.com> wrote:
>>>>>> On 09/19/2016 11:13 AM, Jan Beulich wrote:
>>>>>>>>>> On 14.09.16 at 19:37, <joao.m.martins@oracle.com> wrote:
>>>>>>>> Since b64438c7c ("x86/time: use correct (local) time stamp in
>>>>>>>> constant-TSC calibration fast path") updates to cpu time use local
>>>>>>>> stamps, which means platform timer is only used to seed the initial
>>>>>>>> cpu time.  With clocksource=tsc there is no need to be in sync with
>>>>>>>> another clocksource, so we reseed the local/master stamps to be values
>>>>>>>> of TSC and update the platform time stamps accordingly. Time
>>>>>>>> calibration is set to 1sec after we switch to TSC, thus these stamps
>>>>>>>> are reseeded to also ensure monotonic returning values right after the
>>>>>>>> point we switch to TSC. This is also to avoid the possibility of
>>>>>>>> having inconsistent readings in this short period (i.e. until
>>>>>>>> calibration fires).
>>>>>>>
>>>>>>> And within this one second, which may cover some of Dom0's
>>>>>>> booting up, it is okay to have inconsistencies?
>>>>>> It's not okay which is why I am removing this possibility when switching to TSC.
>>>>>> The inconsistencies in those readings (if I wasn't adjusting) would be because
>>>>>> we would be using (in that 1-sec) those cpu time tuples calculated by the
>>>>>> previous calibration or platform time initialization (while still was HPET,
>>>>>> ACPI, etc as clocksource). Would you prefer me removing the "avoid" and instead
>>>>>> change it to "remove the possibility" in this last sentence?
>>>>>
>>>>> Let's not do the 2nd step before the 1st, which is the question of
>>>>> what happens prior to and what actually changes at this first
>>>>> calibration (after 1 sec).
>>>> The first calibration won't change much - this 1-sec was meant when having
>>>> nop_rendezvous which is the first time platform timer would be used to set local
>>>> cpu_time (will adjust the mention above as it's misleading for the reader as it
>>>> doesn't refer to this patch).
>>>
>>> So what makes it that it actually _is_ nop_rendezvous after that one
>>> second? (And yes, part of this may indeed be just bad placement of
>>> the description, as iirc nop_rendezvous gets introduced only in a later
>>> patch.)
>> Because with nop_rendezvous we will be using the platform timer to get a
>> monotonic time tuple and *set* cpu_time as opposed to just adding up plain TSC
>> delta as it is the case prior to b64438c7c. Thus the reseeding of the cpu times
>> solves both ends of the problem, with nop_rendezvous until it is first
>> calibration fixes it, and without nop_rendezvous to remove the latch adjustment
>> from initial platform timer.
> 
> So am I getting you right (together with the second part of your reply
> further down) that you escape answering the question raised by saying
> that it doesn't really matter which rendezvous function gets used, when
> TSC is the clock source?
Correct and in my defense I wasn't escaping the question, as despite
unfortunate mis-mention in the patch (or bad English) I think the above
explains it. During that time window, we now just need to ensure that we will
get monotonic results solely based on the individual CPU time (i.e. calls to
get_s_time or anything that uses cpu_time). Unless the calibration function is
doing something wrong/fishy, I don't see a reason for this to go wrong.

> I.e. the introduction of nop_rendezvous is
> really just to avoid unnecessary overhead?
Yes, but note that it's only the case since recent commit b64438c7c where
cpu_time stime is now incremented with TSC based deltas with a matching TSC
stamp. Before it wasn't the case. The main difference with nop_rendezvous (other
than the significant overhead) versus std_rendezvous is that we use a single
global tuple propagated to all cpus, whereas with std_rendezvous each tuple is
different and will vary according to when it rendezvous with cpu 0.

> In which case it should
> probably be a separate patch, saying so in its description.
OK, will move that out of Patch 4 into its own while keeping the same logic.

Joao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 2/5] x86/time: implement tsc as clocksource
  2016-09-20 16:17                 ` Joao Martins
@ 2016-09-21  9:14                   ` Joao Martins
  2016-09-21  9:20                   ` Joao Martins
  1 sibling, 0 replies; 25+ messages in thread
From: Joao Martins @ 2016-09-21  9:14 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel



On 09/20/2016 05:17 PM, Joao Martins wrote:
> On 09/20/2016 02:55 PM, Jan Beulich wrote:
>>>>> On 20.09.16 at 12:15, <joao.m.martins@oracle.com> wrote:
>>> On 09/20/2016 08:13 AM, Jan Beulich wrote:
>>>>>>> On 19.09.16 at 19:54, <joao.m.martins@oracle.com> wrote:
>>>>> On 09/19/2016 05:25 PM, Jan Beulich wrote:
>>>>>>>>> On 19.09.16 at 18:11, <joao.m.martins@oracle.com> wrote:
>>>>>>> On 09/19/2016 11:13 AM, Jan Beulich wrote:
>>>>>>>>>>> On 14.09.16 at 19:37, <joao.m.martins@oracle.com> wrote:
>>>>>>>>> Since b64438c7c ("x86/time: use correct (local) time stamp in
>>>>>>>>> constant-TSC calibration fast path") updates to cpu time use local
>>>>>>>>> stamps, which means platform timer is only used to seed the initial
>>>>>>>>> cpu time.  With clocksource=tsc there is no need to be in sync with
>>>>>>>>> another clocksource, so we reseed the local/master stamps to be values
>>>>>>>>> of TSC and update the platform time stamps accordingly. Time
>>>>>>>>> calibration is set to 1sec after we switch to TSC, thus these stamps
>>>>>>>>> are reseeded to also ensure monotonic returning values right after the
>>>>>>>>> point we switch to TSC. This is also to avoid the possibility of
>>>>>>>>> having inconsistent readings in this short period (i.e. until
>>>>>>>>> calibration fires).
>>>>>>>>
>>>>>>>> And within this one second, which may cover some of Dom0's
>>>>>>>> booting up, it is okay to have inconsistencies?
>>>>>>> It's not okay which is why I am removing this possibility when switching to TSC.
>>>>>>> The inconsistencies in those readings (if I wasn't adjusting) would be because
>>>>>>> we would be using (in that 1-sec) those cpu time tuples calculated by the
>>>>>>> previous calibration or platform time initialization (while still was HPET,
>>>>>>> ACPI, etc as clocksource). Would you prefer me removing the "avoid" and instead
>>>>>>> change it to "remove the possibility" in this last sentence?
>>>>>>
>>>>>> Let's not do the 2nd step before the 1st, which is the question of
>>>>>> what happens prior to and what actually changes at this first
>>>>>> calibration (after 1 sec).
>>>>> The first calibration won't change much - this 1-sec was meant when having
>>>>> nop_rendezvous which is the first time platform timer would be used to set local
>>>>> cpu_time (will adjust the mention above as it's misleading for the reader as it
>>>>> doesn't refer to this patch).
>>>>
>>>> So what makes it that it actually _is_ nop_rendezvous after that one
>>>> second? (And yes, part of this may indeed be just bad placement of
>>>> the description, as iirc nop_rendezvous gets introduced only in a later
>>>> patch.)
>>> Because with nop_rendezvous we will be using the platform timer to get a
>>> monotonic time tuple and *set* cpu_time as opposed to just adding up plain TSC
>>> delta as it is the case prior to b64438c7c. Thus the reseeding of the cpu times
>>> solves both ends of the problem, with nop_rendezvous until it is first
>>> calibration fixes it, and without nop_rendezvous to remove the latch adjustment
>>> from initial platform timer.
>>
>> So am I getting you right (together with the second part of your reply
>> further down) that you escape answering the question raised by saying
>> that it doesn't really matter which rendezvous function gets used, when
>> TSC is the clock source?
> Correct and in my defense I wasn't escaping the question, as despite
> unfortunate mis-mention in the patch (or bad English) I think the above
> explains it. During that time window, we now just need to ensure that we will
> get monotonic results solely based on the individual CPU time (i.e. calls to
> get_s_time or anything that uses cpu_time). Unless the calibration function is
> doing something wrong/fishy, I don't see a reason for this to go wrong.
> 
>> I.e. the introduction of nop_rendezvous is
>> really just to avoid unnecessary overhead?
> Yes, but note that it's only the case since recent commit b64438c7c where
> cpu_time stime is now incremented with TSC based deltas with a matching TSC
> stamp. Before it wasn't the case. The main difference with nop_rendezvous (other
> than the significant overhead) versus std_rendezvous is that we use a single
> global tuple propagated to all cpus, whereas with std_rendezvous each tuple is
> different and will vary according to when it rendezvous with cpu 0.
I have to take back my comment: having redouble-checked on a test run overnight
without rendezvous and stable bit, and I saw time going backwards a few times
(~100ns) but only after a few hours (initially there were none probably why I
was led into error). This is in contrast to nop_rendezvous where I see none in
weeks.

Joao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 2/5] x86/time: implement tsc as clocksource
  2016-09-20 16:17                 ` Joao Martins
  2016-09-21  9:14                   ` Joao Martins
@ 2016-09-21  9:20                   ` Joao Martins
  2016-09-21  9:45                     ` Jan Beulich
  1 sibling, 1 reply; 25+ messages in thread
From: Joao Martins @ 2016-09-21  9:20 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel



On 09/20/2016 05:17 PM, Joao Martins wrote:
> On 09/20/2016 02:55 PM, Jan Beulich wrote:
>>>>> On 20.09.16 at 12:15, <joao.m.martins@oracle.com> wrote:
>>> On 09/20/2016 08:13 AM, Jan Beulich wrote:
>>>>>>> On 19.09.16 at 19:54, <joao.m.martins@oracle.com> wrote:
>>>>> On 09/19/2016 05:25 PM, Jan Beulich wrote:
>>>>>>>>> On 19.09.16 at 18:11, <joao.m.martins@oracle.com> wrote:
>>>>>>> On 09/19/2016 11:13 AM, Jan Beulich wrote:
>>>>>>>>>>> On 14.09.16 at 19:37, <joao.m.martins@oracle.com> wrote:
>>>>>>>>> Since b64438c7c ("x86/time: use correct (local) time stamp in
>>>>>>>>> constant-TSC calibration fast path") updates to cpu time use local
>>>>>>>>> stamps, which means platform timer is only used to seed the initial
>>>>>>>>> cpu time.  With clocksource=tsc there is no need to be in sync with
>>>>>>>>> another clocksource, so we reseed the local/master stamps to be values
>>>>>>>>> of TSC and update the platform time stamps accordingly. Time
>>>>>>>>> calibration is set to 1sec after we switch to TSC, thus these stamps
>>>>>>>>> are reseeded to also ensure monotonic returning values right after the
>>>>>>>>> point we switch to TSC. This is also to avoid the possibility of
>>>>>>>>> having inconsistent readings in this short period (i.e. until
>>>>>>>>> calibration fires).
>>>>>>>>
>>>>>>>> And within this one second, which may cover some of Dom0's
>>>>>>>> booting up, it is okay to have inconsistencies?
>>>>>>> It's not okay which is why I am removing this possibility when switching to TSC.
>>>>>>> The inconsistencies in those readings (if I wasn't adjusting) would be because
>>>>>>> we would be using (in that 1-sec) those cpu time tuples calculated by the
>>>>>>> previous calibration or platform time initialization (while still was HPET,
>>>>>>> ACPI, etc as clocksource). Would you prefer me removing the "avoid" and instead
>>>>>>> change it to "remove the possibility" in this last sentence?
>>>>>>
>>>>>> Let's not do the 2nd step before the 1st, which is the question of
>>>>>> what happens prior to and what actually changes at this first
>>>>>> calibration (after 1 sec).
>>>>> The first calibration won't change much - this 1-sec was meant when having
>>>>> nop_rendezvous which is the first time platform timer would be used to set local
>>>>> cpu_time (will adjust the mention above as it's misleading for the reader as it
>>>>> doesn't refer to this patch).
>>>>
>>>> So what makes it that it actually _is_ nop_rendezvous after that one
>>>> second? (And yes, part of this may indeed be just bad placement of
>>>> the description, as iirc nop_rendezvous gets introduced only in a later
>>>> patch.)
>>> Because with nop_rendezvous we will be using the platform timer to get a
>>> monotonic time tuple and *set* cpu_time as opposed to just adding up plain TSC
>>> delta as it is the case prior to b64438c7c. Thus the reseeding of the cpu times
>>> solves both ends of the problem, with nop_rendezvous until it is first
>>> calibration fixes it, and without nop_rendezvous to remove the latch adjustment
>>> from initial platform timer.
>>
>> So am I getting you right (together with the second part of your reply
>> further down) that you escape answering the question raised by saying
>> that it doesn't really matter which rendezvous function gets used, when
>> TSC is the clock source?
> Correct and in my defense I wasn't escaping the question, as despite
> unfortunate mis-mention in the patch (or bad English) I think the above
> explains it. During that time window, we now just need to ensure that we will
> get monotonic results solely based on the individual CPU time (i.e. calls to
> get_s_time or anything that uses cpu_time). Unless the calibration function is
> doing something wrong/fishy, I don't see a reason for this to go wrong.
> 
>> I.e. the introduction of nop_rendezvous is
>> really just to avoid unnecessary overhead?
> Yes, but note that it's only the case since recent commit b64438c7c where
> cpu_time stime is now incremented with TSC based deltas with a matching TSC
> stamp. Before it wasn't the case. The main difference with nop_rendezvous (other
> than the significant overhead) versus std_rendezvous is that we use a single
> global tuple propagated to all cpus, whereas with std_rendezvous each tuple is
> different and will vary according to when it rendezvous with cpu 0.
> 
>> In which case it should
>> probably be a separate patch, saying so in its description.
> OK, will move that out of Patch 4 into its own while keeping the same logic.
I have to take back my comment: having redouble-checked on a test run overnight
with std_rendezvous and stable bit, and I saw time going backwards a few times
(~100ns) but only after a few hours (initially there were none - probably why I
was led into error). This is in contrast to nop_rendezvous where I see none in
weeks.

Joao

P.S. If you received similar earlier response but my mailer was misbehaving -
please ignore and sorry for the noise.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 2/5] x86/time: implement tsc as clocksource
  2016-09-21  9:20                   ` Joao Martins
@ 2016-09-21  9:45                     ` Jan Beulich
  2016-09-21 13:30                       ` Joao Martins
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2016-09-21  9:45 UTC (permalink / raw)
  To: Joao Martins; +Cc: Andrew Cooper, xen-devel

>>> On 21.09.16 at 11:20, <joao.m.martins@oracle.com> wrote:
> On 09/20/2016 05:17 PM, Joao Martins wrote:
>> On 09/20/2016 02:55 PM, Jan Beulich wrote:
>>> I.e. the introduction of nop_rendezvous is
>>> really just to avoid unnecessary overhead?
>> Yes, but note that it's only the case since recent commit b64438c7c where
>> cpu_time stime is now incremented with TSC based deltas with a matching TSC
>> stamp. Before it wasn't the case. The main difference with nop_rendezvous (other
>> than the significant overhead) versus std_rendezvous is that we use a single
>> global tuple propagated to all cpus, whereas with std_rendezvous each tuple is
>> different and will vary according to when it rendezvous with cpu 0.
>> 
>>> In which case it should
>>> probably be a separate patch, saying so in its description.
>> OK, will move that out of Patch 4 into its own while keeping the same logic.
> I have to take back my comment: having redouble-checked on a test run overnight
> with std_rendezvous and stable bit, and I saw time going backwards a few times
> (~100ns) but only after a few hours (initially there were none - probably why I
> was led into error). This is in contrast to nop_rendezvous where I see none in
> weeks.

Hmm, that would then seem to call for the introduction of
nop_rendezvous to be pulled ahead in the series (presumably into
the very patch we're discussing here).

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 2/5] x86/time: implement tsc as clocksource
  2016-09-21  9:45                     ` Jan Beulich
@ 2016-09-21 13:30                       ` Joao Martins
  0 siblings, 0 replies; 25+ messages in thread
From: Joao Martins @ 2016-09-21 13:30 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel



On 09/21/2016 10:45 AM, Jan Beulich wrote:
>>>> On 21.09.16 at 11:20, <joao.m.martins@oracle.com> wrote:
>> On 09/20/2016 05:17 PM, Joao Martins wrote:
>>> On 09/20/2016 02:55 PM, Jan Beulich wrote:
>>>> I.e. the introduction of nop_rendezvous is
>>>> really just to avoid unnecessary overhead?
>>> Yes, but note that it's only the case since recent commit b64438c7c where
>>> cpu_time stime is now incremented with TSC based deltas with a matching TSC
>>> stamp. Before it wasn't the case. The main difference with nop_rendezvous (other
>>> than the significant overhead) versus std_rendezvous is that we use a single
>>> global tuple propagated to all cpus, whereas with std_rendezvous each tuple is
>>> different and will vary according to when it rendezvous with cpu 0.
>>>
>>>> In which case it should
>>>> probably be a separate patch, saying so in its description.
>>> OK, will move that out of Patch 4 into its own while keeping the same logic.
>> I have to take back my comment: having redouble-checked on a test run overnight
>> with std_rendezvous and stable bit, and I saw time going backwards a few times
>> (~100ns) but only after a few hours (initially there were none - probably why I
>> was led into error). This is in contrast to nop_rendezvous where I see none in
>> weeks.
> 
> Hmm, that would then seem to call for the introduction of
> nop_rendezvous to be pulled ahead in the series (presumably into
> the very patch we're discussing here).
Seems like it. I will move it into this patch, in which case patch 3 needs to be
moved before this one (since it's a prerequisite patch).

Joao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2016-09-21 13:29 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-14 17:37 [PATCH v4 0/5] x86/time: PVCLOCK_TSC_STABLE_BIT support Joao Martins
2016-09-14 17:37 ` [PATCH v4 1/5] x86/time: refactor init_platform_time() Joao Martins
2016-09-14 17:37 ` [PATCH v4 2/5] x86/time: implement tsc as clocksource Joao Martins
2016-09-19 10:13   ` Jan Beulich
2016-09-19 16:11     ` Joao Martins
2016-09-19 16:25       ` Jan Beulich
2016-09-19 17:54         ` Joao Martins
2016-09-20  7:13           ` Jan Beulich
2016-09-20 10:15             ` Joao Martins
2016-09-20 10:23               ` Joao Martins
2016-09-20 13:55               ` Jan Beulich
2016-09-20 16:17                 ` Joao Martins
2016-09-21  9:14                   ` Joao Martins
2016-09-21  9:20                   ` Joao Martins
2016-09-21  9:45                     ` Jan Beulich
2016-09-21 13:30                       ` Joao Martins
2016-09-14 17:37 ` [PATCH v4 3/5] x86/time: refactor read_platform_stime() Joao Martins
2016-09-19 10:15   ` Jan Beulich
2016-09-19 16:11     ` Joao Martins
2016-09-14 17:37 ` [PATCH v4 4/5] x86/time: implement PVCLOCK_TSC_STABLE_BIT Joao Martins
2016-09-19 10:22   ` Jan Beulich
2016-09-19 16:11     ` Joao Martins
2016-09-14 17:37 ` [PATCH v4 5/5] x86/time: extend "tsc" param with "stable:socket" Joao Martins
2016-09-19 10:29   ` Jan Beulich
2016-09-19 16:11     ` Joao Martins

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.