xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: Joao Martins <joao.m.martins@oracle.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: xen-devel <xen-devel@lists.xenproject.org>,
	Dario Faggioli <dario.faggioli@citrix.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>
Subject: Re: [PATCH 1/8] x86/time: improve cross-CPU clock monotonicity (and more)
Date: Wed, 15 Jun 2016 23:51:58 +0100	[thread overview]
Message-ID: <5761DC0E.5010201@oracle.com> (raw)
In-Reply-To: <5761496802000078000F5395@prv-mh.provo.novell.com>

On 06/15/2016 11:26 AM, Jan Beulich wrote:
> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -1328,21 +1328,51 @@ static void time_calibration(void *unuse
>                       &r, 1);
>  }
>  
> +static struct {
> +    s_time_t local_stime, master_stime;
> +} ap_bringup_ref;
> +
> +void time_latch_stamps(void) {
                                ^ style: shouldn't this bracket be on a new line?

> +    unsigned long flags;
> +    u64 tsc;
> +
> +    local_irq_save(flags);
> +    ap_bringup_ref.master_stime = read_platform_stime();
> +    tsc = rdtsc();
> +    local_irq_restore(flags);
> +
> +    ap_bringup_ref.local_stime = get_s_time_fixed(tsc);
> +}
> +
>  void init_percpu_time(void)
>  {
>      struct cpu_time *t = &this_cpu(cpu_time);
>      unsigned long flags;
> +    u64 tsc;
>      s_time_t now;
>  
>      /* Initial estimate for TSC rate. */
>      t->tsc_scale = per_cpu(cpu_time, 0).tsc_scale;
>  
>      local_irq_save(flags);
> -    t->local_tsc_stamp = rdtsc();
>      now = read_platform_stime();
Do we need to take another read from the platform timer here? We could
just reuse the one on ap_bringup_ref.master_stime. See also below.

> +    tsc = rdtsc();
>      local_irq_restore(flags);
>  
>      t->stime_master_stamp = now;
> +    /*
> +     * To avoid a discontinuity (TSC and platform clock can't be expected
> +     * to be in perfect sync), initialization here needs to match up with
> +     * local_time_calibration()'s decision whether to use its fast path.
> +     */
> +    if ( boot_cpu_has(X86_FEATURE_CONSTANT_TSC) )
> +    {
> +        if ( system_state < SYS_STATE_smp_boot )
> +            now = get_s_time_fixed(tsc);
> +        else
> +            now += ap_bringup_ref.local_stime - ap_bringup_ref.master_stime;
> +    }
> +    t->local_tsc_stamp    = tsc;

As far as my current experimentation (both v1 of this patch and the whole series on
v2), the source of the remaining warps could be fixed with this seeding. Though I
think this seeding might not yet be correct. Essentially the boot CPU you do
get_s_time_fixed(tsc), which basically gives you a value strictly calculated with TSC
since boot_tsc_stamp. For the other CPUs you append the time skew between TSC and
platform timer calculated before AP bringup, with a value just read on the platform
timer. Which I assume you take this as an approximation skew between TSC and platform
timer.

So, before this patch cpu_time is filled like this:

CPU 0: M0 , T0
CPU 1: M1 , T1
CPU 2: M2 , T2

With this patch it goes like:

CPU 0: L0 , T0
CPU 1: L1 = (M1 + (L - M)), T1
CPU 2: L2 = (M2 + (L - M)), T2

Given that,

1) values separated by commas are respectively local stime, tsc that are
written in cpu_time
2) M2 > M1 > M0 as values read from platform timer.
3) L and M solely as values calculated on CPU doing AP bringup.

After this seeding, local_time_calibration will increment the deltas between
calibrations every second, based entirely on a monotonically increasing TSC. Altough
I see two issues here: 1) appending to a new read from platform time which might
already be offsetted by the one taken from the boot CPU and 2) the skew you calculate
don't account for the current tsc. Which leads to local stime and tsc still not being
a valid pair for the secondary CPUs. So it would be possible that get_s_time(S0) on
CPU0 immediately followed by a get_s_time(S1) on CPU1 [S1 > S0] ends up returning a
value backwards IOW L0 + scale(S0 - T0) is bigger than L1 + scale(S1 - T1). That is
independently of the deltas being added on every calibration.

So how about we do the seeding in another way?

1) Relying on individual CPU TSC like you do on CPU 0:

     t->stamp.master_stime = now;
+    t->stamp.local_tsc = 0;
+    t->stamp.local_stime = 0;
+
     if ( boot_cpu_has(X86_FEATURE_CONSTANT_TSC) )
     {
         now = get_s_time_fixed(tsc);
     }

(...)

Or 2) taking into account the skew between platform timer and tsc we take on
init_percpu_time. Diff below based on this series:

@@ -1394,11 +1508,14 @@ void init_percpu_time(void)
     t->tsc_scale = per_cpu(cpu_time, 0).tsc_scale;

     local_irq_save(flags);
-    now = read_platform_stime();
+    now = ap_bringup_ref.master_stime;
     tsc = rdtsc_ordered();
     local_irq_restore(flags);

     t->stamp.master_stime = now;
+    t->stamp.local_tsc = boot_tsc_stamp;
+    t->stamp.local_stime = 0;
+

     /*
      * To avoid a discontinuity (TSC and platform clock can't be expected
      * to be in perfect sync), initialization here needs to match up with
@@ -1406,10 +1523,12 @@ void init_percpu_time(void)
      */
     if ( boot_cpu_has(X86_FEATURE_CONSTANT_TSC) )
     {
+        u64 stime = get_s_time_fixed(tsc);
+
         if ( system_state < SYS_STATE_smp_boot )
-            now = get_s_time_fixed(tsc);
+            now = stime;
         else
-            now += ap_bringup_ref.local_stime - ap_bringup_ref.master_stime;
+            now += stime - ap_bringup_ref.master_stime;
     }

Second option follows a similar thinking of this patch, but on both ways the
local_stime will match the tsc on init_percpu_time, thus being a matched pair. I have
a test similar to check_tsc_warp but with get_s_time() and I no longer observe time
going backwards. But without the above I still observe it even on short periods.
Thoughts? (Sorry for the long answer)

Joao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

  parent reply	other threads:[~2016-06-15 22:51 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-15  9:50 [PATCH 0/8] x86/time: improve cross-CPU clock monotonicity (and more) Jan Beulich
2016-06-15 10:26 ` [PATCH 1/8] " Jan Beulich
2016-06-15 10:32   ` Jan Beulich
2016-06-15 22:51   ` Joao Martins [this message]
2016-06-16  8:27     ` Jan Beulich
2016-06-16 20:27       ` Joao Martins
2016-06-17  7:32         ` Jan Beulich
2016-06-21 12:05           ` Joao Martins
2016-06-21 12:28             ` Jan Beulich
2016-06-21 13:57               ` Joao Martins
2016-08-02 19:30   ` Andrew Cooper
2016-06-15 10:26 ` [PATCH 2/8] x86: also generate assembler usable equates for synthesized features Jan Beulich
2016-06-20 12:50   ` Andrew Cooper
2016-06-15 10:27 ` [PATCH 3/8] x86/time: introduce and use rdtsc_ordered() Jan Beulich
2016-06-20 12:59   ` Andrew Cooper
2016-06-20 13:06     ` Jan Beulich
2016-06-20 13:07       ` Andrew Cooper
2016-07-11 11:39     ` Dario Faggioli
2016-06-15 10:28 ` [PATCH 4/8] x86/time: calibrate TSC against platform timer Jan Beulich
2016-06-20 14:20   ` Andrew Cooper
2016-06-20 15:19     ` Jan Beulich
2016-08-02 19:25       ` Andrew Cooper
2016-08-03  9:32         ` Jan Beulich
2016-06-15 10:28 ` [PATCH 5/8] x86/time: correctly honor late clearing of TSC related feature flags Jan Beulich
2016-06-20 14:32   ` Andrew Cooper
2016-06-20 15:20     ` Jan Beulich
2016-07-04 15:39       ` Andrew Cooper
2016-07-04 15:53         ` Jan Beulich
2016-08-02 19:08           ` Andrew Cooper
2016-08-03  9:43             ` Jan Beulich
2016-08-31 13:42               ` Andrew Cooper
2016-08-31 14:03                 ` Jan Beulich
2016-06-15 10:29 ` [PATCH 6/8] x86/time: support 32-bit wide ACPI PM timer Jan Beulich
2016-07-04 15:40   ` Andrew Cooper
2016-06-15 10:30 ` [PATCH 7/8] x86/time: fold recurring code Jan Beulich
2016-07-04 15:43   ` Andrew Cooper
2016-06-15 10:30 ` [PATCH 8/8] x86/time: group time stamps into a structure Jan Beulich
2016-07-04 15:57   ` Andrew Cooper
2016-07-01  7:44 ` Ping: [PATCH 0/8] x86/time: improve cross-CPU clock monotonicity (and more) Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5761DC0E.5010201@oracle.com \
    --to=joao.m.martins@oracle.com \
    --cc=JBeulich@suse.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=dario.faggioli@citrix.com \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).