From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Jan Beulich" Subject: [PATCH 1/8] x86/time: improve cross-CPU clock monotonicity (and more) Date: Wed, 15 Jun 2016 04:26:16 -0600 Message-ID: <5761496802000078000F5395@prv-mh.provo.novell.com> References: <576140F302000078000F52FE@prv-mh.provo.novell.com> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="=__Part2B1D1058.1__=" Return-path: Received: from mail6.bemta6.messagelabs.com ([85.158.143.247]) by lists.xenproject.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bD81d-0007DI-Q6 for xen-devel@lists.xenproject.org; Wed, 15 Jun 2016 10:26:21 +0000 In-Reply-To: <576140F302000078000F52FE@prv-mh.provo.novell.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xen.org Sender: "Xen-devel" To: xen-devel Cc: Andrew Cooper , Dario Faggioli , Joao Martins List-Id: xen-devel@lists.xenproject.org This is a MIME message. If you are reading this text, you may want to consider changing to a mail reader or gateway that understands how to properly handle MIME multipart messages. --=__Part2B1D1058.1__= Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: quoted-printable Content-Disposition: inline Using the bare return value from read_platform_stime() is not suitable when local_time_calibration() is going to use its fast path: Divergence of several dozen microseconds between NOW() return values on different CPUs results when platform and local time don't stay in close sync. Latch local and platform time on the CPU initiating AP bringup, such that the AP can use these values to seed its stime_local_stamp with as little of an error as possible. The boot CPU, otoh, can simply calculate the correct initial value (other CPUs could do so too with even greater accuracy than the approach being introduced, but that can work only if all CPUs' TSCs start ticking at the same time, which generally can't be assumed to be the case on multi-socket systems). This slightly defers init_percpu_time() (moved ahead by commit dd2658f966 ["x86/time: initialise time earlier during start_secondary()"]) in order to reduce as much as possible the gap between populating the stamps and consuming them. Signed-off-by: Jan Beulich --- a/xen/arch/x86/smpboot.c +++ b/xen/arch/x86/smpboot.c @@ -328,12 +328,12 @@ void start_secondary(void *unused) =20 percpu_traps_init(); =20 - init_percpu_time(); - cpu_init(); =20 smp_callin(); =20 + init_percpu_time(); + setup_secondary_APIC_clock(); =20 /* @@ -996,6 +996,8 @@ int __cpu_up(unsigned int cpu) if ( (ret =3D do_boot_cpu(apicid, cpu)) !=3D 0 ) return ret; =20 + time_latch_stamps(); + set_cpu_state(CPU_STATE_ONLINE); while ( !cpu_online(cpu) ) { --- a/xen/arch/x86/time.c +++ b/xen/arch/x86/time.c @@ -1328,21 +1328,51 @@ static void time_calibration(void *unuse &r, 1); } =20 +static struct { + s_time_t local_stime, master_stime; +} ap_bringup_ref; + +void time_latch_stamps(void) { + unsigned long flags; + u64 tsc; + + local_irq_save(flags); + ap_bringup_ref.master_stime =3D read_platform_stime(); + tsc =3D rdtsc(); + local_irq_restore(flags); + + ap_bringup_ref.local_stime =3D get_s_time_fixed(tsc); +} + void init_percpu_time(void) { struct cpu_time *t =3D &this_cpu(cpu_time); unsigned long flags; + u64 tsc; s_time_t now; =20 /* Initial estimate for TSC rate. */ t->tsc_scale =3D per_cpu(cpu_time, 0).tsc_scale; =20 local_irq_save(flags); - t->local_tsc_stamp =3D rdtsc(); now =3D read_platform_stime(); + tsc =3D rdtsc(); local_irq_restore(flags); =20 t->stime_master_stamp =3D now; + /* + * To avoid a discontinuity (TSC and platform clock can't be expected + * to be in perfect sync), initialization here needs to match up with + * local_time_calibration()'s decision whether to use its fast path. + */ + if ( boot_cpu_has(X86_FEATURE_CONSTANT_TSC) ) + { + if ( system_state < SYS_STATE_smp_boot ) + now =3D get_s_time_fixed(tsc); + else + now +=3D ap_bringup_ref.local_stime - ap_bringup_ref.master_st= ime; + } + t->local_tsc_stamp =3D tsc; t->stime_local_stamp =3D now; } =20 --- a/xen/include/asm-x86/time.h +++ b/xen/include/asm-x86/time.h @@ -40,6 +40,7 @@ int time_suspend(void); int time_resume(void); =20 void init_percpu_time(void); +void time_latch_stamps(void); =20 struct ioreq; int hwdom_pit_access(struct ioreq *ioreq); --=__Part2B1D1058.1__= Content-Type: text/plain; name="x86-time-init-local-stime.patch" Content-Transfer-Encoding: quoted-printable Content-Disposition: attachment; filename="x86-time-init-local-stime.patch" x86/time: adjust local system time initialization=0A=0AUsing the bare = return value from read_platform_stime() is not suitable=0Awhen local_time_c= alibration() is going to use its fast path: Divergence=0Aof several dozen = microseconds between NOW() return values on different=0ACPUs results when = platform and local time don't stay in close sync.=0A=0ALatch local and = platform time on the CPU initiating AP bringup, such=0Athat the AP can use = these values to seed its stime_local_stamp with as=0Alittle of an error as = possible. The boot CPU, otoh, can simply=0Acalculate the correct initial = value (other CPUs could do so too with=0Aeven greater accuracy than the = approach being introduced, but that can=0Awork only if all CPUs' TSCs = start ticking at the same time, which=0Agenerally can't be assumed to be = the case on multi-socket systems).=0A=0AThis slightly defers init_percpu_ti= me() (moved ahead by commit=0Add2658f966 ["x86/time: initialise time = earlier during=0Astart_secondary()"]) in order to reduce as much as = possible the gap=0Abetween populating the stamps and consuming them.=0A=0AS= igned-off-by: Jan Beulich =0A=0A--- a/xen/arch/x86/smpbo= ot.c=0A+++ b/xen/arch/x86/smpboot.c=0A@@ -328,12 +328,12 @@ void start_seco= ndary(void *unused)=0A =0A percpu_traps_init();=0A =0A- init_percpu_= time();=0A-=0A cpu_init();=0A =0A smp_callin();=0A =0A+ = init_percpu_time();=0A+=0A setup_secondary_APIC_clock();=0A =0A = /*=0A@@ -996,6 +996,8 @@ int __cpu_up(unsigned int cpu)=0A if ( (ret = =3D do_boot_cpu(apicid, cpu)) !=3D 0 )=0A return ret;=0A =0A+ = time_latch_stamps();=0A+=0A set_cpu_state(CPU_STATE_ONLINE);=0A = while ( !cpu_online(cpu) )=0A {=0A--- a/xen/arch/x86/time.c=0A+++ = b/xen/arch/x86/time.c=0A@@ -1328,21 +1328,51 @@ static void time_calibratio= n(void *unuse=0A &r, 1);=0A }=0A =0A+static struct = {=0A+ s_time_t local_stime, master_stime;=0A+} ap_bringup_ref;=0A+=0A+vo= id time_latch_stamps(void) {=0A+ unsigned long flags;=0A+ u64 = tsc;=0A+=0A+ local_irq_save(flags);=0A+ ap_bringup_ref.master_stime = =3D read_platform_stime();=0A+ tsc =3D rdtsc();=0A+ local_irq_restore= (flags);=0A+=0A+ ap_bringup_ref.local_stime =3D get_s_time_fixed(tsc);= =0A+}=0A+=0A void init_percpu_time(void)=0A {=0A struct cpu_time *t = =3D &this_cpu(cpu_time);=0A unsigned long flags;=0A+ u64 tsc;=0A = s_time_t now;=0A =0A /* Initial estimate for TSC rate. */=0A = t->tsc_scale =3D per_cpu(cpu_time, 0).tsc_scale;=0A =0A local_irq_save(= flags);=0A- t->local_tsc_stamp =3D rdtsc();=0A now =3D read_platform= _stime();=0A+ tsc =3D rdtsc();=0A local_irq_restore(flags);=0A =0A = t->stime_master_stamp =3D now;=0A+ /*=0A+ * To avoid a discontinu= ity (TSC and platform clock can't be expected=0A+ * to be in perfect = sync), initialization here needs to match up with=0A+ * local_time_cali= bration()'s decision whether to use its fast path.=0A+ */=0A+ if ( = boot_cpu_has(X86_FEATURE_CONSTANT_TSC) )=0A+ {=0A+ if ( = system_state < SYS_STATE_smp_boot )=0A+ now =3D get_s_time_fixed= (tsc);=0A+ else=0A+ now +=3D ap_bringup_ref.local_stime = - ap_bringup_ref.master_stime;=0A+ }=0A+ t->local_tsc_stamp =3D = tsc;=0A t->stime_local_stamp =3D now;=0A }=0A =0A--- a/xen/include/asm= -x86/time.h=0A+++ b/xen/include/asm-x86/time.h=0A@@ -40,6 +40,7 @@ int = time_suspend(void);=0A int time_resume(void);=0A =0A void init_percpu_time(= void);=0A+void time_latch_stamps(void);=0A =0A struct ioreq;=0A int = hwdom_pit_access(struct ioreq *ioreq);=0A --=__Part2B1D1058.1__= Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KWGVuLWRldmVs IG1haWxpbmcgbGlzdApYZW4tZGV2ZWxAbGlzdHMueGVuLm9yZwpodHRwOi8vbGlzdHMueGVuLm9y Zy94ZW4tZGV2ZWwK --=__Part2B1D1058.1__=--