Hey,

This series is a repost, incorporating the comments I got so far, and hopefully can be considered for Xen 4.7.

PVCLOCK_TSC_STABLE_BIT is the flag telling the guest that the vcpu_time_info (pvti) is monotonic as seen by any CPU, a feature which is currently not supported. As it is (i.e. blindly setting the flag), we can observe that this property doesn't hold: a process using vdso clock_gettime/gettimeofday will observe a significant number of warps (i.e. time going backwards), due to 1) time calibration skew in the Xen rendezvous algorithm and 2) the clocksource not being in sync with the TSC. These warps are seen more frequently on PV guests (potentially because vcpu time infos are only updated when the guest is in kernel mode, and perhaps because of the lack of a tsc_offset?), and on rare occasions on HVM guests. It is worth noting that with guest vCPUs pinned, only PV guests see these warps; on HVM guests specifically, such warps only occur when one of the guest vCPUs is pinned to CPU0.

This series proposes a solution to that and is divided as follows:

(R) * Patch 1: Adds the missing flags field to vcpu_time_info.
(U) * Patch 2: Adds a new clocksource based on TSC
(U) * Patch 3, 4: Adjustments for patch 5
    * Patch 5: Implements the PVCLOCK_TSC_STABLE_BIT

[R := Reviewed-by ;; U := Updated]

PVCLOCK_TSC_STABLE_BIT is set only when using clocksource=tsc, and so it remains optional unless specified by the admin.

The test was time-warp-test, which constantly calls clock_gettime/gettimeofday on every CPU, measures the delta against the previously returned value, and marks a warp if the delta is negative. I measured during periods of 1h and 6h and checked how many warps occurred and how large they were (alongside the amount of time skew). Measurements/changelog are included in the individual patches. Note that most of the testing has been done with Linux 4.4, in which these warps/skew could be easily observed, as the vdso would use each vCPU's pvti.
Though Linux >= 4.5 changes this behaviour and uses only the vCPU0 pvti, it still requires PVCLOCK_TSC_STABLE_BIT flag support.

Any comments appreciated,

Thanks!
Joao

Joao Martins (5):
  public/xen.h: add flags field to vcpu_time_info
  x86/time: implement tsc as clocksource
  x86/time: streamline platform time init on plt_init()
  x86/time: refactor read_platform_stime()
  x86/time: implement PVCLOCK_TSC_STABLE_BIT

 xen/arch/x86/time.c      | 132 ++++++++++++++++++++++++++++++++++++++++-------
 xen/include/public/xen.h |   6 ++-
 2 files changed, 119 insertions(+), 19 deletions(-)

--
2.1.4

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
This field has two possible flags (as of the latest pvclock ABI shared with KVM).

flags: bits in this field indicate extended capabilities coordinated between the guest and the hypervisor. Specifically on KVM, availability of specific flags has to be checked in the 0x40000001 cpuid leaf. On Xen, we don't have that, but we can still check some of the flags after registering the time info page, since a force_update_vcpu_system_time is performed.

Current flags are:

 flag bit | cpuid bit | meaning
 -------------------------------------------------------------
          |           | time measures taken across
     0    |    24     | multiple cpus are guaranteed to
          |           | be monotonic
 -------------------------------------------------------------
          |           | guest vcpu has been paused by
     1    |    N/A    | the host
          |           |
 -------------------------------------------------------------

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Tim Deegan <tim@xen.org>
---
 xen/include/public/xen.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index 64ba7ab..08373f6 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -614,10 +614,14 @@ struct vcpu_time_info {
      */
     uint32_t tsc_to_system_mul;
     int8_t   tsc_shift;
-    int8_t   pad1[3];
+    int8_t   flags;
+    int8_t   pad1[2];
 }; /* 32 bytes */
 typedef struct vcpu_time_info vcpu_time_info_t;
 
+#define PVCLOCK_TSC_STABLE_BIT (1 << 0)
+#define PVCLOCK_GUEST_STOPPED  (1 << 1)
+
 struct vcpu_info {
     /*
      * 'evtchn_upcall_pending' is written non-zero by Xen to indicate

--
2.1.4
Introduce support for using the TSC as platform time, which is the highest resolution time source and the most performant to read (~20 ns). There are, however, several problems associated with its usage, and there isn't a complete (and architecturally defined) guarantee that all machines will provide a reliable and monotonic TSC across all CPUs, on different sockets and in different P/C states. I believe Intel to be the only vendor that can guarantee that. For this reason it's given lower priority than the HPET, unless the administrator changes the "clocksource" boot option to "tsc". Upon initializing it, we also check for time warps and the appropriate CPU features, i.e. TSC_RELIABLE, CONSTANT_TSC and NONSTOP_TSC. In case none of these conditions are met, we fail in order to fall back to the next available clocksource.

It is also worth noting that with clocksource=tsc there isn't any need to synchronise with another clocksource, and I could verify that a great portion of the time skew was eliminated, with far fewer time warps happening. With the HPET I observed ~500 warps of around 27 us in a period of 1h, and with the TSC down to 50 warps in the same period, each warp < 100 ns. The warps still exist, but are only related to cross-CPU calibration (i.e. how long it takes to rendezvous with the master), which a later patch in this series aims to solve.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

Changes since RFC:
 - Spelling fixes in the commit message.
 - Remove unused clocksource_is_tsc variable and introduce it instead
   on the patch that uses it.
 - Move plt_tsc from second to last in the available clocksources.
---
 xen/arch/x86/time.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 60 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 687e39b..1311c58 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -432,6 +432,63 @@ uint64_t ns_to_acpi_pm_tick(uint64_t ns)
 }
 
 /************************************************************
+ * PLATFORM TIMER 4: TSC
+ */
+static u64 tsc_freq;
+static unsigned long tsc_max_warp;
+static void tsc_check_reliability(void);
+
+static int __init init_tsctimer(struct platform_timesource *pts)
+{
+    bool_t tsc_reliable = 0;
+
+    tsc_check_reliability();
+
+    if ( tsc_max_warp > 0 )
+    {
+        tsc_reliable = 0;
+        printk("TSC: didn't passed warp test\n");
+    }
+    else if ( boot_cpu_has(X86_FEATURE_TSC_RELIABLE) ||
+              ( boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
+                boot_cpu_has(X86_FEATURE_NONSTOP_TSC) ) )
+    {
+        tsc_reliable = 1;
+    }
+    else if ( boot_cpu_has(X86_FEATURE_CONSTANT_TSC) )
+    {
+        tsc_reliable = (max_cstate <= 2);
+
+        if (tsc_reliable)
+            printk("TSC: no deep Cstates, deemed reliable\n");
+        else
+            printk("TSC: deep Cstates possible, so not reliable\n");
+    }
+
+    pts->frequency = tsc_freq;
+    return tsc_reliable;
+}
+
+static u64 read_tsc(void)
+{
+    return rdtsc();
+}
+
+static void resume_tsctimer(struct platform_timesource *pts)
+{
+}
+
+static struct platform_timesource __initdata plt_tsc =
+{
+    .id = "tsc",
+    .name = "TSC",
+    .read_counter = read_tsc,
+    .counter_bits = 64,
+    .init = init_tsctimer,
+    .resume = resume_tsctimer,
+};
+
+/************************************************************
  * GENERIC PLATFORM TIMER INFRASTRUCTURE
  */
 
@@ -536,7 +593,7 @@ static void resume_platform_timer(void)
 static void __init init_platform_timer(void)
 {
     static struct platform_timesource * __initdata plt_timers[] = {
-        &plt_hpet, &plt_pmtimer, &plt_pit
+        &plt_hpet, &plt_pmtimer, &plt_pit, &plt_tsc
     };
 
     struct platform_timesource *pts = NULL;
@@ -1181,7 +1238,7 @@ static void check_tsc_warp(unsigned long tsc_khz, unsigned long *max_warp)
     }
 }
 
-static unsigned long tsc_max_warp, tsc_check_count;
+static unsigned long tsc_check_count;
 static cpumask_t tsc_check_cpumask;
 
 static void tsc_check_slave(void *unused)
@@ -1465,6 +1522,7 @@ void __init early_time_init(void)
     struct cpu_time *t = &this_cpu(cpu_time);
     u64 tmp = init_pit_and_calibrate_tsc();
 
+    tsc_freq = tmp;
     set_time_scale(&t->tsc_scale, tmp);
 
     t->local_tsc_stamp = boot_tsc_stamp;

--
2.1.4
And use it to initialize platform time solely for clocksource=tsc, as opposed to initializing the platform overflow timer, which would only fire in ~180 years (on a 2.2 GHz Broadwell processor).

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

Changes since RFC:
 - Move clocksource_is_tsc variable to this patch.
 - s/1000000000/SECONDS(1)
---
 xen/arch/x86/time.c | 44 +++++++++++++++++++++++++++++++++++---------
 1 file changed, 35 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 1311c58..5af8902 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -434,6 +434,7 @@ uint64_t ns_to_acpi_pm_tick(uint64_t ns)
 /************************************************************
  * PLATFORM TIMER 4: TSC
  */
+static bool_t clocksource_is_tsc = 0;
 static u64 tsc_freq;
 static unsigned long tsc_max_warp;
 static void tsc_check_reliability(void);
@@ -466,7 +467,7 @@ static int __init init_tsctimer(struct platform_timesource *pts)
     }
 
     pts->frequency = tsc_freq;
-    return tsc_reliable;
+    return ( clocksource_is_tsc = tsc_reliable );
 }
 
 static u64 read_tsc(void)
@@ -516,17 +517,31 @@ static s_time_t __read_platform_stime(u64 platform_time)
     return (stime_platform_stamp + scale_delta(diff, &plt_scale));
 }
 
+static void __plt_init(void)
+{
+    u64 count;
+
+    ASSERT(spin_is_locked(&platform_timer_lock));
+    count = plt_src.read_counter();
+    plt_stamp64 += (count - plt_stamp) & plt_mask;
+    plt_stamp = count;
+}
+
+static void plt_init(void)
+{
+    spin_lock_irq(&platform_timer_lock);
+    __plt_init();
+    spin_unlock_irq(&platform_timer_lock);
+}
+
 static void plt_overflow(void *unused)
 {
     int i;
-    u64 count;
     s_time_t now, plt_now, plt_wrap;
 
     spin_lock_irq(&platform_timer_lock);
 
-    count = plt_src.read_counter();
-    plt_stamp64 += (count - plt_stamp) & plt_mask;
-    plt_stamp = count;
+    __plt_init();
     now = NOW();
     plt_wrap = __read_platform_stime(plt_stamp64);
@@ -633,11 +648,22 @@ static void __init init_platform_timer(void)
 
     set_time_scale(&plt_scale, pts->frequency);
 
-    plt_overflow_period = scale_delta(
-        1ull << (pts->counter_bits-1), &plt_scale);
-    init_timer(&plt_overflow_timer, plt_overflow, NULL, 0);
     plt_src = *pts;
-    plt_overflow(NULL);
+
+    if ( clocksource_is_tsc )
+    {
+        plt_init();
+    }
+    else
+    {
+        plt_overflow_period = scale_delta(
+            1ull << (pts->counter_bits-1), &plt_scale);
+        init_timer(&plt_overflow_timer, plt_overflow, NULL, 0);
+        plt_overflow(NULL);
+
+        printk("Platform timer overflow period is %lu secs\n",
+               plt_overflow_period/SECONDS(1));
+    }
 
     platform_timer_stamp = plt_stamp64;
     stime_platform_stamp = NOW();

--
2.1.4
To fetch the last value read from the clocksource, which was used to calculate system_time. In the case of clocksource=tsc we will use it to set tsc_timestamp.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/time.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 5af8902..89c35d0 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -508,6 +508,7 @@ static s_time_t stime_platform_stamp; /* System time at below platform time */
 static u64 platform_timer_stamp;   /* Platform time at above system time */
 static u64 plt_stamp64;            /* 64-bit platform counter stamp */
 static u64 plt_stamp;              /* hardware-width platform counter stamp */
+static u64 plt_stamp_counter;      /* last read since read_counter */
 static struct timer plt_overflow_timer;
 
 static s_time_t __read_platform_stime(u64 platform_time)
@@ -566,7 +567,7 @@ static void plt_overflow(void *unused)
     set_timer(&plt_overflow_timer, NOW() + plt_overflow_period);
 }
 
-static s_time_t read_platform_stime(void)
+static s_time_t read_platform_stime(u64 *stamp)
 {
     u64 count;
     s_time_t stime;
@@ -574,8 +575,11 @@ static s_time_t read_platform_stime(void)
     ASSERT(!local_irq_is_enabled());
 
     spin_lock(&platform_timer_lock);
-    count = plt_stamp64 + ((plt_src.read_counter() - plt_stamp) & plt_mask);
+    plt_stamp_counter = plt_src.read_counter();
+    count = plt_stamp64 + ((plt_stamp_counter - plt_stamp) & plt_mask);
     stime = __read_platform_stime(count);
+    if (stamp)
+        *stamp = plt_stamp_counter;
     spin_unlock(&platform_timer_lock);
 
     return stime;
@@ -693,7 +697,7 @@ void cstate_restore_tsc(void)
     if ( boot_cpu_has(X86_FEATURE_NONSTOP_TSC) )
         return;
 
-    write_tsc(stime2tsc(read_platform_stime()));
+    write_tsc(stime2tsc(read_platform_stime(NULL)));
 }
 
@@ -1012,7 +1016,7 @@ int cpu_frequency_change(u64 freq)
     local_irq_disable();
     /* Platform time /first/, as we may be delayed by platform_timer_lock. */
-    t->stime_master_stamp = read_platform_stime();
+    t->stime_master_stamp = read_platform_stime(NULL);
     /* TSC-extrapolated time may be bogus after frequency change. */
     /*t->stime_local_stamp = get_s_time();*/
     t->stime_local_stamp = t->stime_master_stamp;
@@ -1330,7 +1334,7 @@ static void time_calibration_tsc_rendezvous(void *_r)
             if ( r->master_stime == 0 )
             {
-                r->master_stime = read_platform_stime();
+                r->master_stime = read_platform_stime(NULL);
                 r->master_tsc_stamp = rdtsc();
             }
             atomic_inc(&r->semaphore);
@@ -1422,7 +1426,7 @@ void init_percpu_time(void)
     local_irq_save(flags);
     t->local_tsc_stamp = rdtsc();
-    now = read_platform_stime();
+    now = read_platform_stime(NULL);
     local_irq_restore(flags);
 
     t->stime_master_stamp = now;

--
2.1.4
When using TSC as the clocksource we will rely solely on the TSC for updating vcpu time infos (pvti). Right now, each vCPU takes its tsc_timestamp at a different instant, i.e. at EPOCH + delta. This delta varies depending on when the CPU calibrates with CPU 0 (the master), and will likely be different and variable across vCPUs. This means that each vCPU's pvti won't account for its calibration error, which could lead to time going backwards, allowing a situation where the time read on vCPU B immediately after A is smaller. While this doesn't happen a lot, I was able to observe it (for clocksource=tsc) around 50 times in an hour, with warps of < 100 ns.

This patch proposes relying on host TSC synchronization and passthrough of the master TSC to the guest, when running on a TSC-safe platform. In the rendezvous function we retrieve the platform time in ns and the last count read by the clocksource that was used to compute system time. The master will write both master_tsc_stamp and master_stime, and the other vCPUs (slaves) will use them to update their corresponding time infos. This way we can guarantee that on a platform with a constant and reliable TSC, the time read on vCPU B right after A is bigger, independently of the vCPU calibration error. Since the pvclock time infos are then monotonic as seen by any vCPU, we set PVCLOCK_TSC_STABLE_BIT, which in turn enables usage of the VDSO on Linux. IIUC, this is similar to how it's implemented on KVM.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
Cc: Keir Fraser <keir@xen.org>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/time.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 89c35d0..a17529c 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -917,6 +917,8 @@ static void __update_vcpu_system_time(struct vcpu *v, int force)
 
     _u.tsc_timestamp = tsc_stamp;
     _u.system_time = t->stime_local_stamp;
+    if ( clocksource_is_tsc )
+        _u.flags |= PVCLOCK_TSC_STABLE_BIT;
 
     if ( is_hvm_domain(d) )
         _u.tsc_timestamp += v->arch.hvm_vcpu.cache_tsc_offset;
@@ -1377,9 +1379,12 @@ static void time_calibration_std_rendezvous(void *_r)
 
     if ( smp_processor_id() == 0 )
     {
+        u64 last_counter;
         while ( atomic_read(&r->semaphore) != (total_cpus - 1) )
             cpu_relax();
-        r->master_stime = read_platform_stime();
+        r->master_stime = read_platform_stime(&last_counter);
+        if ( clocksource_is_tsc )
+            r->master_tsc_stamp = last_counter;
         mb(); /* write r->master_stime /then/ signal */
         atomic_inc(&r->semaphore);
     }
@@ -1391,7 +1396,10 @@ static void time_calibration_std_rendezvous(void *_r)
         mb(); /* receive signal /then/ read r->master_stime */
     }
 
-    c->local_tsc_stamp = rdtsc();
+    if ( clocksource_is_tsc )
+        c->local_tsc_stamp = r->master_tsc_stamp;
+    else
+        c->local_tsc_stamp = rdtsc();
     c->stime_local_stamp = get_s_time();
     c->stime_master_stamp = r->master_stime;

--
2.1.4
On 17/03/16 16:12, Joao Martins wrote:
> This field has two possible flags (as of latest pvclock ABI
> shared with KVM).
>
> flags: bits in this field indicate extended capabilities
> coordinated between the guest and the hypervisor. Specifically
> on KVM, availability of specific flags has to be checked in
> 0x40000001 cpuid leaf. On Xen, we don't have that but we can
> still check some of the flags after registering the time info
> page since a force_update_vcpu_system_time is performed.
>
> Current flags are:
>
>  flag bit | cpuid bit | meaning
>  -------------------------------------------------------------
>           |           | time measures taken across
>      0    |    24     | multiple cpus are guaranteed to
>           |           | be monotonic
>  -------------------------------------------------------------
>           |           | guest vcpu has been paused by
>      1    |    N/A    | the host
>           |           |
>  -------------------------------------------------------------
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

Has the Linux maintainers file been patched to include xen-devel, to avoid them altering our ABI unnoticed in the future?
On 17/03/16 16:12, Joao Martins wrote:
> Introduce support for using TSC as platform time which is the highest
> resolution time and most performant to get (~20 nsecs). [...]
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
> Cc: Keir Fraser <keir@xen.org>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>

Some style corrections, but no functional problems.

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

> Changes since RFC:
> - Spelling fixes in the commit message.
> - Remove unused clocksource_is_tsc variable and introduce it instead
>   on the patch that uses it.
> - Move plt_tsc from second to last in the available clocksources.
> ---
>  xen/arch/x86/time.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 60 insertions(+), 2 deletions(-)
>
> diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
> index 687e39b..1311c58 100644
> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -432,6 +432,63 @@ uint64_t ns_to_acpi_pm_tick(uint64_t ns)
>  }
>
>  /************************************************************
> + * PLATFORM TIMER 4: TSC
> + */
> +static u64 tsc_freq;
> +static unsigned long tsc_max_warp;
> +static void tsc_check_reliability(void);
> +
> +static int __init init_tsctimer(struct platform_timesource *pts)
> +{
> +    bool_t tsc_reliable = 0;
> +
> +    tsc_check_reliability();
> +
> +    if ( tsc_max_warp > 0 )
> +    {
> +        tsc_reliable = 0;
> +        printk("TSC: didn't passed warp test\n");

printk(XENLOG_INFO "TSC ...

> +    }
> +    else if ( boot_cpu_has(X86_FEATURE_TSC_RELIABLE) ||
> +              ( boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
> +                boot_cpu_has(X86_FEATURE_NONSTOP_TSC) ) )

You don't need extra spaces on inner brackets, so

    boot_cpu_has(X86_FEATURE_TSC_RELIABLE) ||
    (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
     boot_cpu_has(X86_FEATURE_NONSTOP_TSC)) )

is fine.

> +    {
> +        tsc_reliable = 1;
> +    }
> +    else if ( boot_cpu_has(X86_FEATURE_CONSTANT_TSC) )
> +    {
> +        tsc_reliable = (max_cstate <= 2);
> +
> +        if (tsc_reliable)

Please add some extra ones here though.

> +            printk("TSC: no deep Cstates, deemed reliable\n");
> +        else
> +            printk("TSC: deep Cstates possible, so not reliable\n");

XENLOG_INFO as well please.

~Andrew
On 17/03/16 16:12, Joao Martins wrote:
> And use to initialize platform time solely for clocksource=tsc,
> as opposed to initializing platform overflow timer, which would
> only fire in ~180 years (on 2.2 Ghz Broadwell processor).
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>

Again, just style corrections.

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

> diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
> index 1311c58..5af8902 100644
> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -434,6 +434,7 @@ uint64_t ns_to_acpi_pm_tick(uint64_t ns)
>  /************************************************************
>   * PLATFORM TIMER 4: TSC
>   */
> +static bool_t clocksource_is_tsc = 0;

No need to explicitly initialise to 0.

>  static u64 tsc_freq;
>  static unsigned long tsc_max_warp;
>  static void tsc_check_reliability(void);
> @@ -466,7 +467,7 @@ static int __init init_tsctimer(struct platform_timesource *pts)
>      }
>
>      pts->frequency = tsc_freq;
> -    return tsc_reliable;
> +    return ( clocksource_is_tsc = tsc_reliable );

While this does work, please avoid mixing an assignment and a return. Something like

    pts->frequency = tsc_freq;
    clocksource_is_tsc = tsc_reliable;

    return tsc_reliable;

is fine and easier logic to read.

> @@ -633,11 +648,22 @@ static void __init init_platform_timer(void)
>
>      set_time_scale(&plt_scale, pts->frequency);
>
> -    plt_overflow_period = scale_delta(
> -        1ull << (pts->counter_bits-1), &plt_scale);
> -    init_timer(&plt_overflow_timer, plt_overflow, NULL, 0);
>      plt_src = *pts;
> -    plt_overflow(NULL);
> +
> +    if ( clocksource_is_tsc )
> +    {
> +        plt_init();
> +    }
> +    else
> +    {
> +        plt_overflow_period = scale_delta(
> +            1ull << (pts->counter_bits-1), &plt_scale);

As you are moving this codeblock, please fix the style to

    (pts->counter_bits - 1)

~Andrew
On 17/03/16 16:12, Joao Martins wrote:
> To fetch the last read from the clocksource which was used to
> calculate system_time. In the case of clocksource=tsc we will
> use it to set tsc_timestamp.
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>

Again, just minor style issues.

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

> ---
> Cc: Keir Fraser <keir@xen.org>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
>  xen/arch/x86/time.c | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
> index 5af8902..89c35d0 100644
> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -508,6 +508,7 @@ static s_time_t stime_platform_stamp; /* System time at below platform time */
>  static u64 platform_timer_stamp;   /* Platform time at above system time */
>  static u64 plt_stamp64;            /* 64-bit platform counter stamp */
>  static u64 plt_stamp;              /* hardware-width platform counter stamp */
> +static u64 plt_stamp_counter;      /* last read since read_counter */

This can have its scope reduced to within read_platform_stime()

>  static struct timer plt_overflow_timer;
>
>  static s_time_t __read_platform_stime(u64 platform_time)
> @@ -566,7 +567,7 @@ static void plt_overflow(void *unused)
>      set_timer(&plt_overflow_timer, NOW() + plt_overflow_period);
>  }
>
> -static s_time_t read_platform_stime(void)
> +static s_time_t read_platform_stime(u64 *stamp)
>  {
>      u64 count;
>      s_time_t stime;
> @@ -574,8 +575,11 @@ static s_time_t read_platform_stime(void)
>      ASSERT(!local_irq_is_enabled());
>
>      spin_lock(&platform_timer_lock);
> -    count = plt_stamp64 + ((plt_src.read_counter() - plt_stamp) & plt_mask);
> +    plt_stamp_counter = plt_src.read_counter();
> +    count = plt_stamp64 + ((plt_stamp_counter - plt_stamp) & plt_mask);
>      stime = __read_platform_stime(count);
> +    if (stamp)

Spaces.
~Andrew
On 17/03/16 16:12, Joao Martins wrote:
> When using TSC as clocksource we will solely rely on TSC for updating
> vcpu time infos (pvti). [...]
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
> Cc: Keir Fraser <keir@xen.org>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
>  xen/arch/x86/time.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
> index 89c35d0..a17529c 100644
> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -917,6 +917,8 @@ static void __update_vcpu_system_time(struct vcpu *v, int force)
>
>      _u.tsc_timestamp = tsc_stamp;
>      _u.system_time = t->stime_local_stamp;
> +    if ( clocksource_is_tsc )
> +        _u.flags |= PVCLOCK_TSC_STABLE_BIT;
>
>      if ( is_hvm_domain(d) )
>          _u.tsc_timestamp += v->arch.hvm_vcpu.cache_tsc_offset;
> @@ -1377,9 +1379,12 @@ static void time_calibration_std_rendezvous(void *_r)
>
>      if ( smp_processor_id() == 0 )
>      {
> +        u64 last_counter;

Blank line here please.

>          while ( atomic_read(&r->semaphore) != (total_cpus - 1) )
>              cpu_relax();
> -        r->master_stime = read_platform_stime();
> +        r->master_stime = read_platform_stime(&last_counter);
> +        if ( clocksource_is_tsc )
> +            r->master_tsc_stamp = last_counter;
>          mb(); /* write r->master_stime /then/ signal */
>          atomic_inc(&r->semaphore);
>      }
> @@ -1391,7 +1396,10 @@ static void time_calibration_std_rendezvous(void *_r)
>          mb(); /* receive signal /then/ read r->master_stime */
>      }
>
> -    c->local_tsc_stamp = rdtsc();
> +    if ( clocksource_is_tsc )
> +        c->local_tsc_stamp = r->master_tsc_stamp;
> +    else
> +        c->local_tsc_stamp = rdtsc();
>      c->stime_local_stamp = get_s_time();
>      c->stime_master_stamp = r->master_stime;
>

The point of the rendezvous is to run rdtsc() at the same time on each cpu. With this logic, it seems that you don't need the rendezvous at all. Surely avoiding time_calibration_std_rendezvous() entirely in this situation would be better?
~Andrew
On 03/18/2016 08:12 PM, Andrew Cooper wrote:
> On 17/03/16 16:12, Joao Martins wrote:
>> This field has two possible flags (as of latest pvclock ABI
>> shared with KVM).
>> [...]
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
>

Thanks!

> Has the Linux maintainers file been patched to include xen-devel, to
> avoid them altering our ABI unnoticed in the future?
>

Not yet, but I had planned to do so when sending the v2 of the Linux side. But perhaps you want it beforehand?
On 03/18/2016 08:21 PM, Andrew Cooper wrote:
> On 17/03/16 16:12, Joao Martins wrote:
>> Introduce support for using TSC as platform time which is the highest
>> resolution time and most performant to get (~20 nsecs). [...]
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>> Cc: Keir Fraser <keir@xen.org>
>> Cc: Jan Beulich <jbeulich@suse.com>
>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>
> Some style corrections, but no functional problems.
>
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
>

I've fixed all your comments for v2, Thanks!

> [...]
On 21/03/16 11:42, Joao Martins wrote: > > On 03/18/2016 08:12 PM, Andrew Cooper wrote: >> On 17/03/16 16:12, Joao Martins wrote: >>> This field has two possible flags (as of latest pvclock ABI >>> shared with KVM). >>> >>> flags: bits in this field indicate extended capabilities >>> coordinated between the guest and the hypervisor. Specifically >>> on KVM, availability of specific flags has to be checked in >>> 0x40000001 cpuid leaf. On Xen, we don't have that but we can >>> still check some of the flags after registering the time info >>> page since a force_update_vcpu_system_time is performed. >>> >>> Current flags are: >>> >>> flag bit | cpuid bit | meaning >>> ------------------------------------------------------------- >>> | | time measures taken across >>> 0 | 24 | multiple cpus are guaranteed to >>> | | be monotonic >>> ------------------------------------------------------------- >>> | | guest vcpu has been paused by >>> 1 | N/A | the host >>> | | >>> ------------------------------------------------------------- >>> >>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> >>> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> >> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> >> > Thanks! > >> Has the Linux maintainers file been patched to include xen-devel, to >> avoid them altering our ABI unnoticed in the future? >> > Not yet, but I had planned to do so when sending the v2 of the linux side. But > perhaps you want it beforehand? Just so long as it doesn't get lost. ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
On 03/18/2016 08:32 PM, Andrew Cooper wrote: > On 17/03/16 16:12, Joao Martins wrote: >> And use to initialize platform time solely for clocksource=tsc, >> as opposed to initializing platform overflow timer, which would >> only fire in ~180 years (on 2.2 Ghz Broadwell processor). >> >> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> > > Again, just style corrections. > > Reviewed-by Andrew Cooper <andrew.cooper3@citrix.com> > I've fixed all these comments in v2, too. Thanks! >> diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c >> index 1311c58..5af8902 100644 >> --- a/xen/arch/x86/time.c >> +++ b/xen/arch/x86/time.c >> @@ -434,6 +434,7 @@ uint64_t ns_to_acpi_pm_tick(uint64_t ns) >> /************************************************************ >> * PLATFORM TIMER 4: TSC >> */ >> +static bool_t clocksource_is_tsc = 0; > > No need to explicitly initialise to 0. > >> static u64 tsc_freq; >> static unsigned long tsc_max_warp; >> static void tsc_check_reliability(void); >> @@ -466,7 +467,7 @@ static int __init init_tsctimer(struct platform_timesource *pts) >> } >> >> pts->frequency = tsc_freq; >> - return tsc_reliable; >> + return ( clocksource_is_tsc = tsc_reliable ); > > While this does work, please avoid mixing an assignment and a return. > > Something like > > pts->frequency = tsc_freq; > clocksource_is_tsc = tsc_reliable; > > return tsc_reliable; > > is fine and easier logic to read. 
> >> @@ -633,11 +648,22 @@ static void __init init_platform_timer(void) >> >> set_time_scale(&plt_scale, pts->frequency); >> >> - plt_overflow_period = scale_delta( >> - 1ull << (pts->counter_bits-1), &plt_scale); >> - init_timer(&plt_overflow_timer, plt_overflow, NULL, 0); >> plt_src = *pts; >> - plt_overflow(NULL); >> + >> + if ( clocksource_is_tsc ) >> + { >> + plt_init(); >> + } >> + else >> + { >> + plt_overflow_period = scale_delta( >> + 1ull << (pts->counter_bits-1), &plt_scale); > > As you are moving this codeblock, please fix the style to > (pts->counter_bits - 1) > > ~Andrew > _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
On 03/18/2016 08:34 PM, Andrew Cooper wrote: > On 17/03/16 16:12, Joao Martins wrote: >> To fetch the last read from the clocksource which was used to >> calculate system_time. In the case of clocksource=tsc we will >> use it to set tsc_timestamp. >> >> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> > > Again, just minor style issues. > > Reviewed-by Andrew Cooper <andrew.cooper3@citrix.com> > All fixed, though I found one change missing in this series, specifically on time_calibration_std_rendezvous. Otherwise this commit would break compilation. See chunk below for the change I am adding: @@ -1377,7 +1380,7 @@ static void time_calibration_std_rendezvous(void *_r) { while ( atomic_read(&r->semaphore) != (total_cpus - 1) ) cpu_relax(); - r->master_stime = read_platform_stime(); + r->master_stime = read_platform_stime(NULL); mb(); /* write r->master_stime /then/ signal */ atomic_inc(&r->semaphore); } Having this fixed, could I still keep your Reviewed-by? >> --- >> Cc: Keir Fraser <keir@xen.org> >> Cc: Jan Beulich <jbeulich@suse.com> >> Cc: Andrew Cooper <andrew.cooper3@citrix.com> >> --- >> xen/arch/x86/time.c | 16 ++++++++++------ >> 1 file changed, 10 insertions(+), 6 deletions(-) >> >> diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c >> index 5af8902..89c35d0 100644 >> --- a/xen/arch/x86/time.c >> +++ b/xen/arch/x86/time.c >> @@ -508,6 +508,7 @@ static s_time_t stime_platform_stamp; /* System time at below platform time */ >> static u64 platform_timer_stamp; /* Platform time at above system time */ >> static u64 plt_stamp64; /* 64-bit platform counter stamp */ >> static u64 plt_stamp; /* hardware-width platform counter stamp */ >> +static u64 plt_stamp_counter; /* last read since read_counter */ > > This can have its scope reduced to within read_platform_stime() > >> static struct timer plt_overflow_timer; >> >> static s_time_t __read_platform_stime(u64 platform_time) >> @@ -566,7 +567,7 @@ static void plt_overflow(void *unused) >> 
set_timer(&plt_overflow_timer, NOW() + plt_overflow_period); >> } >> >> -static s_time_t read_platform_stime(void) >> +static s_time_t read_platform_stime(u64 *stamp) >> { >> u64 count; >> s_time_t stime; >> @@ -574,8 +575,11 @@ static s_time_t read_platform_stime(void) >> ASSERT(!local_irq_is_enabled()); >> >> spin_lock(&platform_timer_lock); >> - count = plt_stamp64 + ((plt_src.read_counter() - plt_stamp) & plt_mask); >> + plt_stamp_counter = plt_src.read_counter(); >> + count = plt_stamp64 + ((plt_stamp_counter - plt_stamp) & plt_mask); >> stime = __read_platform_stime(count); >> + if (stamp) > > Spaces. > > ~Andrew > _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
On 03/18/2016 08:58 PM, Andrew Cooper wrote: > On 17/03/16 16:12, Joao Martins wrote: >> When using TSC as clocksource we will solely rely on TSC for updating >> vcpu time infos (pvti). Right now, each vCPU takes the tsc_timestamp at >> different instants meaning every EPOCH + delta. This delta is variable >> depending on the time the CPU calibrates with CPU 0 (master), and will >> likely be different and variable across vCPUS. This means that each VCPU >> pvti won't account to its calibration error which could lead to time >> going backwards, and allowing a situation where time read on VCPU B >> immediately after A being smaller. While this doesn't happen a lot, I >> was able to observe (for clocksource=tsc) around 50 times in an hour >> having warps of < 100 ns. >> >> This patch proposes relying on host TSC synchronization and passthrough >> of the master tsc to the guest, when running on a TSC-safe platform. On >> the rendezvous function we will retrieve the platform time in ns and the >> last count read by the clocksource that was used to compute system time. >> master will write both master_tsc_stamp and master_stime, and the other >> vCPUS (slave) will use it to update their correspondent time infos. >> This way we can guarantee that on a platform with a constant and >> reliable TSC, that the time read on vcpu B right after A is bigger >> independently of the VCPU calibration error. Since pvclock time infos >> are monotonic as seen by any vCPU set PVCLOCK_TSC_STABLE_BIT, which then >> enables usage of VDSO on Linux. IIUC, this is similar to how it's >> implemented on KVM. 
>> >> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> >> --- >> Cc: Keir Fraser <keir@xen.org> >> Cc: Jan Beulich <jbeulich@suse.com> >> Cc: Andrew Cooper <andrew.cooper3@citrix.com> >> --- >> xen/arch/x86/time.c | 12 ++++++++++-- >> 1 file changed, 10 insertions(+), 2 deletions(-) >> >> diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c >> index 89c35d0..a17529c 100644 >> --- a/xen/arch/x86/time.c >> +++ b/xen/arch/x86/time.c >> @@ -917,6 +917,8 @@ static void __update_vcpu_system_time(struct vcpu *v, int force) >> >> _u.tsc_timestamp = tsc_stamp; >> _u.system_time = t->stime_local_stamp; >> + if ( clocksource_is_tsc ) >> + _u.flags |= PVCLOCK_TSC_STABLE_BIT; >> >> if ( is_hvm_domain(d) ) >> _u.tsc_timestamp += v->arch.hvm_vcpu.cache_tsc_offset; >> @@ -1377,9 +1379,12 @@ static void time_calibration_std_rendezvous(void *_r) >> >> if ( smp_processor_id() == 0 ) >> { >> + u64 last_counter; > > Blank line here please. > >> while ( atomic_read(&r->semaphore) != (total_cpus - 1) ) >> cpu_relax(); >> - r->master_stime = read_platform_stime(); >> + r->master_stime = read_platform_stime(&last_counter); >> + if ( clocksource_is_tsc ) >> + r->master_tsc_stamp = last_counter; >> mb(); /* write r->master_stime /then/ signal */ >> atomic_inc(&r->semaphore); >> } >> @@ -1391,7 +1396,10 @@ static void time_calibration_std_rendezvous(void *_r) >> mb(); /* receive signal /then/ read r->master_stime */ >> } >> >> - c->local_tsc_stamp = rdtsc(); >> + if ( clocksource_is_tsc ) >> + c->local_tsc_stamp = r->master_tsc_stamp; >> + else >> + c->local_tsc_stamp = rdtsc(); >> c->stime_local_stamp = get_s_time(); >> c->stime_master_stamp = r->master_stime; >> > > The point of the rendezvous is to run rdtsc() on each cpu at the same > time. With this logic, it seems that you don't need the > rendezvous at all. > > Avoiding the time_calibration_std_rendezvous() entirely in this > situation would be the better, surely? Indeed, and would look cleaner too. 
I've changed the approach in this patch for v2 following your suggestion, along with some retesting. > > ~Andrew >
On 03/21/2016 11:43 AM, Andrew Cooper wrote: > On 21/03/16 11:42, Joao Martins wrote: >> >> On 03/18/2016 08:12 PM, Andrew Cooper wrote: >>> On 17/03/16 16:12, Joao Martins wrote: >>>> This field has two possible flags (as of latest pvclock ABI >>>> shared with KVM). >>>> >>>> flags: bits in this field indicate extended capabilities >>>> coordinated between the guest and the hypervisor. Specifically >>>> on KVM, availability of specific flags has to be checked in >>>> 0x40000001 cpuid leaf. On Xen, we don't have that but we can >>>> still check some of the flags after registering the time info >>>> page since a force_update_vcpu_system_time is performed. >>>> >>>> Current flags are: >>>> >>>> flag bit | cpuid bit | meaning >>>> ------------------------------------------------------------- >>>> | | time measures taken across >>>> 0 | 24 | multiple cpus are guaranteed to >>>> | | be monotonic >>>> ------------------------------------------------------------- >>>> | | guest vcpu has been paused by >>>> 1 | N/A | the host >>>> | | >>>> ------------------------------------------------------------- >>>> >>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> >>>> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> >>> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> >>> >> Thanks! >> >>> Has the Linux maintainers file been patched to include xen-devel, to >>> avoid them altering our ABI unnoticed in the future? >>> >> Not yet, but I had planned to do so when sending the v2 of the linux side. But >> perhaps you want it beforehand? > > Just so long as it doesn't get lost. It definitely won't get lost. Joao _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
On 21/03/16 11:45, Joao Martins wrote: > > All fixed, though I found one change missing in this series, specifically on > time_calibration_std_rendezvous. Otherwise this commit would break compilation. > See chunk below for the change I am adding: > > @@ -1377,7 +1380,7 @@ static void time_calibration_std_rendezvous(void *_r) > { > while ( atomic_read(&r->semaphore) != (total_cpus - 1) ) > cpu_relax(); > - r->master_stime = read_platform_stime(); > + r->master_stime = read_platform_stime(NULL); > mb(); /* write r->master_stime /then/ signal */ > atomic_inc(&r->semaphore); > } > > Having this fixed, could I still keep your Reviewed-by? Yes - that's fine. ~Andrew
>>> On 17.03.16 at 17:12, <joao.m.martins@oracle.com> wrote: > --- a/xen/include/public/xen.h > +++ b/xen/include/public/xen.h > @@ -614,10 +614,14 @@ struct vcpu_time_info { > */ > uint32_t tsc_to_system_mul; > int8_t tsc_shift; > - int8_t pad1[3]; > + int8_t flags; For use as flags I'm sure this would better be uint8_t. Jan
On 21/03/16 15:10, Jan Beulich wrote: >>>> On 17.03.16 at 17:12, <joao.m.martins@oracle.com> wrote: >> --- a/xen/include/public/xen.h >> +++ b/xen/include/public/xen.h >> @@ -614,10 +614,14 @@ struct vcpu_time_info { >> */ >> uint32_t tsc_to_system_mul; >> int8_t tsc_shift; >> - int8_t pad1[3]; >> + int8_t flags; > For use as flags I'm sure this would better be uint8_t. Sadly not possible. Linux have already made the above adjustment without CC'ing xen-devel, so that ABI is set. ~Andrew
On 03/21/2016 03:27 PM, Andrew Cooper wrote: > On 21/03/16 15:10, Jan Beulich wrote: >>>>> On 17.03.16 at 17:12, <joao.m.martins@oracle.com> wrote: >>> --- a/xen/include/public/xen.h >>> +++ b/xen/include/public/xen.h >>> @@ -614,10 +614,14 @@ struct vcpu_time_info { >>> */ >>> uint32_t tsc_to_system_mul; >>> int8_t tsc_shift; >>> - int8_t pad1[3]; >>> + int8_t flags; >> For use as flags I'm sure this would better be uint8_t. > > Sadly not possible. Linux have already made the above adjustment > without (CC'ing xen-devel), so that ABI is set. > I was double checking again and it's my mistake. Both flags and pad fields are uint8_t. I will fix both in v2, to make sure it's all kept the same on both header files. My apologies. Joao
On 03/18/2016 08:21 PM, Andrew Cooper wrote: > On 17/03/16 16:12, Joao Martins wrote: >> Introduce support for using TSC as platform time which is the highest >> resolution time and most performant to get (~20 nsecs). Though there >> are also several problems associated with its usage, and there isn't a >> complete (and architecturally defined) guarantee that all machines >> will provide reliable and monotonic TSC across all CPUs, on different >> sockets and different P/C states. I believe Intel to be the only that >> can guarantee that. For this reason it's set with less priority when >> compared to HPET unless adminstrator changes "clocksource" boot option >> to "tsc". Upon initializing it, we also check for time warps and >> appropriate CPU features i.e. TSC_RELIABLE, CONSTANT_TSC and >> NONSTOP_TSC. And in case none of these conditions are met, we fail in >> order to fallback to the next available clocksource. >> >> It is also worth noting that with clocksource=tsc there isn't any >> need to synchronise with another clocksource, and I could verify that >> great portion the time skew was eliminated and seeing much less time >> warps happening. With HPET I observed ~500 warps in the period >> of 1h of around 27 us, and with TSC down to 50 warps in the same >> period having each warp < 100 ns. The warps still exist though but >> are only related to cross CPU calibration (being how much it takes to >> rendezvous with master), in which a later patch in this series aims to >> solve. >> >> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> >> --- >> Cc: Keir Fraser <keir@xen.org> >> Cc: Jan Beulich <jbeulich@suse.com> >> Cc: Andrew Cooper <andrew.cooper3@citrix.com> > > Some style corrections, but no functional problems. > > Reviewed-by Andrew Cooper <andrew.cooper3@citrix.com> > I found out one issue in the tsc clocksource initialization path with respect to the reliability check. 
This check runs when the platform timer is initialized, but not all CPUs are up at that point (init_xen_time()), which means the check will always succeed. So for clocksource=tsc I need to defer initialization to a later point (in verify_tsc_reliability()) and, if successful, switch to that clocksource. Unless you disagree, v2 would have this and just requires one additional preparatory change prior to this patch. Joao >> >> Changes since RFC: >> - Spelling fixes in the commit message. >> - Remove unused clocksource_is_tsc variable and introduce it instead >> on the patch that uses it. >> - Move plt_tsc from second to last in the available clocksources. >> --- >> xen/arch/x86/time.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++-- >> 1 file changed, 60 insertions(+), 2 deletions(-) >> >> diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c >> index 687e39b..1311c58 100644 >> --- a/xen/arch/x86/time.c >> +++ b/xen/arch/x86/time.c >> @@ -432,6 +432,63 @@ uint64_t ns_to_acpi_pm_tick(uint64_t ns) >> } >> >> /************************************************************ >> + * PLATFORM TIMER 4: TSC >> + */ >> +static u64 tsc_freq; >> +static unsigned long tsc_max_warp; >> +static void tsc_check_reliability(void); >> + >> +static int __init init_tsctimer(struct platform_timesource *pts) >> +{ >> + bool_t tsc_reliable = 0; >> + >> + tsc_check_reliability(); >> + >> + if ( tsc_max_warp > 0 ) >> + { >> + tsc_reliable = 0; >> + printk("TSC: didn't passed warp test\n"); > > printk(XENLOG_INFO "TSC ... > >> + } >> + else if ( boot_cpu_has(X86_FEATURE_TSC_RELIABLE) || >> + ( boot_cpu_has(X86_FEATURE_CONSTANT_TSC) && >> + boot_cpu_has(X86_FEATURE_NONSTOP_TSC) ) ) > > You don't need extra spaces on inner brackets, so > > boot_cpu_has(X86_FEATURE_TSC_RELIABLE) || > (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) && > boot_cpu_has(X86_FEATURE_NONSTOP_TSC)) ) > > is fine. 
> >> + { >> + tsc_reliable = 1; >> + } >> + else if ( boot_cpu_has(X86_FEATURE_CONSTANT_TSC) ) >> + { >> + tsc_reliable = (max_cstate <= 2); >> + >> + if (tsc_reliable) > > Please add some extra ones here though. > >> + printk("TSC: no deep Cstates, deemed reliable\n"); >> + else >> + printk("TSC: deep Cstates possible, so not reliable\n"); > > XENLOG_INFO as well please. > > ~Andrew > _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
>>> On 22.03.16 at 13:41, <joao.m.martins@oracle.com> wrote: > > On 03/18/2016 08:21 PM, Andrew Cooper wrote: >> On 17/03/16 16:12, Joao Martins wrote: >>> Introduce support for using TSC as platform time which is the highest >>> resolution time and most performant to get (~20 nsecs). Though there >>> are also several problems associated with its usage, and there isn't a >>> complete (and architecturally defined) guarantee that all machines >>> will provide reliable and monotonic TSC across all CPUs, on different >>> sockets and different P/C states. I believe Intel to be the only that >>> can guarantee that. For this reason it's set with less priority when >>> compared to HPET unless adminstrator changes "clocksource" boot option >>> to "tsc". Upon initializing it, we also check for time warps and >>> appropriate CPU features i.e. TSC_RELIABLE, CONSTANT_TSC and >>> NONSTOP_TSC. And in case none of these conditions are met, we fail in >>> order to fallback to the next available clocksource. >>> >>> It is also worth noting that with clocksource=tsc there isn't any >>> need to synchronise with another clocksource, and I could verify that >>> great portion the time skew was eliminated and seeing much less time >>> warps happening. With HPET I observed ~500 warps in the period >>> of 1h of around 27 us, and with TSC down to 50 warps in the same >>> period having each warp < 100 ns. The warps still exist though but >>> are only related to cross CPU calibration (being how much it takes to >>> rendezvous with master), in which a later patch in this series aims to >>> solve. >>> >>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> >>> --- >>> Cc: Keir Fraser <keir@xen.org> >>> Cc: Jan Beulich <jbeulich@suse.com> >>> Cc: Andrew Cooper <andrew.cooper3@citrix.com> >> >> Some style corrections, but no functional problems. 
>> >> Reviewed-by Andrew Cooper <andrew.cooper3@citrix.com> >> > I found out one issue in the tsc clocksource initialization path with respect to > the reliability check. This check is running when initializing platform timer, > but not all CPUS are up at that point (init_xen_time()) which means that the > check will always succeed. So for clocksource=tsc I need to defer initialization > to a later point (on verify_tsc_reliability()) and if successful switch to that > clocksource. Unless you disagree, v2 would have this and just requires one > additional preparatory change prior to this patch. Hmm, that's suspicious when thinking about CPU hotplug: What do you intend to do when a CPU comes online later, failing the check? So far we don't do runtime switching of the clock source, and I'm not sure that's a good idea to do when there are running guests. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
On 03/22/2016 12:46 PM, Jan Beulich wrote: >>>> On 22.03.16 at 13:41, <joao.m.martins@oracle.com> wrote: > >> >> On 03/18/2016 08:21 PM, Andrew Cooper wrote: >>> On 17/03/16 16:12, Joao Martins wrote: >>>> Introduce support for using TSC as platform time which is the highest >>>> resolution time and most performant to get (~20 nsecs). Though there >>>> are also several problems associated with its usage, and there isn't a >>>> complete (and architecturally defined) guarantee that all machines >>>> will provide reliable and monotonic TSC across all CPUs, on different >>>> sockets and different P/C states. I believe Intel to be the only that >>>> can guarantee that. For this reason it's set with less priority when >>>> compared to HPET unless adminstrator changes "clocksource" boot option >>>> to "tsc". Upon initializing it, we also check for time warps and >>>> appropriate CPU features i.e. TSC_RELIABLE, CONSTANT_TSC and >>>> NONSTOP_TSC. And in case none of these conditions are met, we fail in >>>> order to fallback to the next available clocksource. >>>> >>>> It is also worth noting that with clocksource=tsc there isn't any >>>> need to synchronise with another clocksource, and I could verify that >>>> great portion the time skew was eliminated and seeing much less time >>>> warps happening. With HPET I observed ~500 warps in the period >>>> of 1h of around 27 us, and with TSC down to 50 warps in the same >>>> period having each warp < 100 ns. The warps still exist though but >>>> are only related to cross CPU calibration (being how much it takes to >>>> rendezvous with master), in which a later patch in this series aims to >>>> solve. >>>> >>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> >>>> --- >>>> Cc: Keir Fraser <keir@xen.org> >>>> Cc: Jan Beulich <jbeulich@suse.com> >>>> Cc: Andrew Cooper <andrew.cooper3@citrix.com> >>> >>> Some style corrections, but no functional problems. 
>>> >>> Reviewed-by Andrew Cooper <andrew.cooper3@citrix.com> >>> >> I found out one issue in the tsc clocksource initialization path with respect to >> the reliability check. This check is running when initializing platform timer, >> but not all CPUS are up at that point (init_xen_time()) which means that the >> check will always succeed. So for clocksource=tsc I need to defer initialization >> to a later point (on verify_tsc_reliability()) and if successful switch to that >> clocksource. Unless you disagree, v2 would have this and just requires one >> additional preparatory change prior to this patch. > > Hmm, that's suspicious when thinking about CPU hotplug: What > do you intend to do when a CPU comes online later, failing the > check? Good point, but I am not sure whether that would happen. The initcall verify_tsc_reliability seems to be called only for the boot processor. Or are you saying that it's the case that initcalls are also run when hotplugging CPUs later on? If that's the case then my suggestion wouldn't work, as you point out - or rather, not without runtime switching support, as you point out below. > So far we don't do runtime switching of the clock source, > and I'm not sure that's a good idea to do when there are running > guests. Totally agree, but I am proposing to do this during the initialization phase, where there wouldn't be guests running. We would start with HPET, then only switch to TSC if that check has passed (and this would happen once). A simpler alternative would be to call init_xen_time after all CPUs are brought up (and would also keep this patch as is), but I am not sure about the repercussions of that. Joao
>>> On 22.03.16 at 16:51, <joao.m.martins@oracle.com> wrote: > > On 03/22/2016 12:46 PM, Jan Beulich wrote: >>>>> On 22.03.16 at 13:41, <joao.m.martins@oracle.com> wrote: >> >>> >>> On 03/18/2016 08:21 PM, Andrew Cooper wrote: >>>> On 17/03/16 16:12, Joao Martins wrote: >>>>> Introduce support for using TSC as platform time which is the highest >>>>> resolution time and most performant to get (~20 nsecs). Though there >>>>> are also several problems associated with its usage, and there isn't a >>>>> complete (and architecturally defined) guarantee that all machines >>>>> will provide reliable and monotonic TSC across all CPUs, on different >>>>> sockets and different P/C states. I believe Intel to be the only that >>>>> can guarantee that. For this reason it's set with less priority when >>>>> compared to HPET unless adminstrator changes "clocksource" boot option >>>>> to "tsc". Upon initializing it, we also check for time warps and >>>>> appropriate CPU features i.e. TSC_RELIABLE, CONSTANT_TSC and >>>>> NONSTOP_TSC. And in case none of these conditions are met, we fail in >>>>> order to fallback to the next available clocksource. >>>>> >>>>> It is also worth noting that with clocksource=tsc there isn't any >>>>> need to synchronise with another clocksource, and I could verify that >>>>> great portion the time skew was eliminated and seeing much less time >>>>> warps happening. With HPET I observed ~500 warps in the period >>>>> of 1h of around 27 us, and with TSC down to 50 warps in the same >>>>> period having each warp < 100 ns. The warps still exist though but >>>>> are only related to cross CPU calibration (being how much it takes to >>>>> rendezvous with master), in which a later patch in this series aims to >>>>> solve. 
>>>>> >>>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> >>>>> --- >>>>> Cc: Keir Fraser <keir@xen.org> >>>>> Cc: Jan Beulich <jbeulich@suse.com> >>>>> Cc: Andrew Cooper <andrew.cooper3@citrix.com> >>>> >>>> Some style corrections, but no functional problems. >>>> >>>> Reviewed-by Andrew Cooper <andrew.cooper3@citrix.com> >>>> >>> I found out one issue in the tsc clocksource initialization path with respect to >>> the reliability check. This check is running when initializing platform timer, >>> but not all CPUS are up at that point (init_xen_time()) which means that the >>> check will always succeed. So for clocksource=tsc I need to defer initialization >>> to a later point (on verify_tsc_reliability()) and if successful switch to that >>> clocksource. Unless you disagree, v2 would have this and just requires one >>> additional preparatory change prior to this patch. >> >> Hmm, that's suspicious when thinking about CPU hotplug: What >> do you intend to do when a CPU comes online later, failing the >> check? > Good point, but I am not sure whether that would happen. The initcall > verify_tsc_reliability seems to be called only for the boot processor. Or are > you saying that it's case that initcalls are called too when hotplugging cpus > later on? If that's the case then my suggestion wouldn't work as you point out - > or rather without having runtime switching support as you point out below. Looks like I didn't express myself clearly enough: "failing the check" wasn't meant to imply the failure would actually occur, but rather that failure would occur if the check was run on that CPU. IOW the case of a CPU getting hotplugged which doesn't satisfy the needs for selecting TSC as the clock source. >> So far we don't do runtime switching of the clock source, >> and I'm not sure that's a good idea to do when there are running >> guests. > Totally agree, but I would be proposing to be at initialization phase where > there wouldn't be guests running. 
We would start with HPET, then only switch > to > TSC if that check has passed (and would happen once). > > Simpler alternative would be to call init_xen_time after all CPUs are > brought up > (and would also keep this patch as is), but I am not sure about the > repercussions of that. I don't see how either would help with the hotplug case. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
On 03/22/2016 04:02 PM, Jan Beulich wrote: >>>> On 22.03.16 at 16:51, <joao.m.martins@oracle.com> wrote: > >> >> On 03/22/2016 12:46 PM, Jan Beulich wrote: >>>>>> On 22.03.16 at 13:41, <joao.m.martins@oracle.com> wrote: >>> >>>> >>>> On 03/18/2016 08:21 PM, Andrew Cooper wrote: >>>>> On 17/03/16 16:12, Joao Martins wrote: >>>>>> Introduce support for using TSC as platform time which is the highest >>>>>> resolution time and most performant to get (~20 nsecs). Though there >>>>>> are also several problems associated with its usage, and there isn't a >>>>>> complete (and architecturally defined) guarantee that all machines >>>>>> will provide reliable and monotonic TSC across all CPUs, on different >>>>>> sockets and different P/C states. I believe Intel to be the only that >>>>>> can guarantee that. For this reason it's set with less priority when >>>>>> compared to HPET unless adminstrator changes "clocksource" boot option >>>>>> to "tsc". Upon initializing it, we also check for time warps and >>>>>> appropriate CPU features i.e. TSC_RELIABLE, CONSTANT_TSC and >>>>>> NONSTOP_TSC. And in case none of these conditions are met, we fail in >>>>>> order to fallback to the next available clocksource. >>>>>> >>>>>> It is also worth noting that with clocksource=tsc there isn't any >>>>>> need to synchronise with another clocksource, and I could verify that >>>>>> great portion the time skew was eliminated and seeing much less time >>>>>> warps happening. With HPET I observed ~500 warps in the period >>>>>> of 1h of around 27 us, and with TSC down to 50 warps in the same >>>>>> period having each warp < 100 ns. The warps still exist though but >>>>>> are only related to cross CPU calibration (being how much it takes to >>>>>> rendezvous with master), in which a later patch in this series aims to >>>>>> solve. 
>>>>>>
>>>>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>>>>> ---
>>>>>> Cc: Keir Fraser <keir@xen.org>
>>>>>> Cc: Jan Beulich <jbeulich@suse.com>
>>>>>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>>>>>
>>>>> Some style corrections, but no functional problems.
>>>>>
>>>>> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>>>>
>>>> I found one issue in the tsc clocksource initialization path, with
>>>> respect to the reliability check. This check runs when the platform
>>>> timer is initialized, but not all CPUs are up at that point
>>>> (init_xen_time()), which means the check will always succeed. So for
>>>> clocksource=tsc I need to defer initialization to a later point (in
>>>> verify_tsc_reliability()) and, if successful, switch to that
>>>> clocksource. Unless you disagree, v2 would have this and just
>>>> requires one additional preparatory change prior to this patch.
>>>
>>> Hmm, that's suspicious when thinking about CPU hotplug: What
>>> do you intend to do when a CPU comes online later, failing the
>>> check?
>> Good point, but I am not sure whether that would happen. The initcall
>> verify_tsc_reliability seems to be called only for the boot processor.
>> Or are you saying that it's the case that initcalls are also invoked
>> when hotplugging CPUs later on? If so, then my suggestion wouldn't
>> work - or rather, not without runtime switching support, as you point
>> out below.
>
> Looks like I didn't express myself clearly enough: "failing the check"
> wasn't meant to imply the failure would actually occur, but rather
> that failure would occur if the check was run on that CPU. IOW the
> case of a CPU getting hotplugged which doesn't satisfy the needs
> for selecting TSC as the clock source.
Ah, I see.
I believe this wouldn't be an issue with the current rendezvous mechanism
(std_rendezvous), as the delta is computed between the local_tsc_stamp
taken on that (hotplugged) CPU during calibration and the rdtsc() done by
the guest on the same CPU, even though CPU0's TSC (the master) serves as
system_time.

However, it can be a problem with PVCLOCK_TSC_STABLE_BIT, as the
hotplugged CPU could have its TSC drifted, and since setting this flag
relies on synchronization of TSCs we would need to clear the flag
entirely.

>>> So far we don't do runtime switching of the clock source,
>>> and I'm not sure that's a good idea to do when there are running
>>> guests.
>> Totally agree, but I would be proposing this for the initialization
>> phase, where there wouldn't be guests running. We would start with
>> HPET, then only switch to TSC if that check has passed (and this would
>> happen once).
>>
>> A simpler alternative would be to call init_xen_time after all CPUs
>> are brought up (and would also keep this patch as is), but I am not
>> sure about the repercussions of that.
>
> I don't see how either would help with the hotplug case.
This was in response to what I thought you meant with your earlier
question (which I misunderstood). But I believe my question is still
valid. The reason for choosing between my suggested approaches is that
tsc_check_reliability() requires all CPUs to be up so that the check is
correctly performed.

Joao
>>> On 22.03.16 at 21:40, <joao.m.martins@oracle.com> wrote:
> On 03/22/2016 04:02 PM, Jan Beulich wrote:
>>>>> On 22.03.16 at 16:51, <joao.m.martins@oracle.com> wrote:
>>> On 03/22/2016 12:46 PM, Jan Beulich wrote:
>>>>>>> On 22.03.16 at 13:41, <joao.m.martins@oracle.com> wrote:
>>>>>>
>>>>> I found one issue in the tsc clocksource initialization path, with
>>>>> respect to the reliability check. This check runs when the platform
>>>>> timer is initialized, but not all CPUs are up at that point
>>>>> (init_xen_time()), which means the check will always succeed. So for
>>>>> clocksource=tsc I need to defer initialization to a later point (in
>>>>> verify_tsc_reliability()) and, if successful, switch to that
>>>>> clocksource. Unless you disagree, v2 would have this and just
>>>>> requires one additional preparatory change prior to this patch.
>>>>
>>>> Hmm, that's suspicious when thinking about CPU hotplug: What
>>>> do you intend to do when a CPU comes online later, failing the
>>>> check?
>>> Good point, but I am not sure whether that would happen. The initcall
>>> verify_tsc_reliability seems to be called only for the boot processor.
>>> Or are you saying that it's the case that initcalls are also invoked
>>> when hotplugging CPUs later on? If so, then my suggestion wouldn't
>>> work - or rather, not without runtime switching support, as you point
>>> out below.
>>
>> Looks like I didn't express myself clearly enough: "failing the check"
>> wasn't meant to imply the failure would actually occur, but rather
>> that failure would occur if the check was run on that CPU. IOW the
>> case of a CPU getting hotplugged which doesn't satisfy the needs
>> for selecting TSC as the clock source.
> Ah, I see.
> I believe this wouldn't be an issue with the current rendezvous
> mechanism (std_rendezvous), as the delta is computed between the
> local_tsc_stamp taken on that (hotplugged) CPU during calibration and
> the rdtsc() done by the guest on the same CPU, even though CPU0's TSC
> (the master) serves as system_time.
>
> However, it can be a problem with PVCLOCK_TSC_STABLE_BIT, as the
> hotplugged CPU could have its TSC drifted, and since setting this flag
> relies on synchronization of TSCs we would need to clear the flag
> entirely.

Except that we can't, after guests already got started, validly clear
that flag afaics. The only option I see here would be to never set
this flag if CPU hotplug is possible - by looking at the hot-pluggable
CPU count and, if non-zero, perhaps allowing a command line override to
indicate no hotplug is intended (it may well be that such is already
implicitly available).

>>>> So far we don't do runtime switching of the clock source,
>>>> and I'm not sure that's a good idea to do when there are running
>>>> guests.
>>> Totally agree, but I would be proposing this for the initialization
>>> phase, where there wouldn't be guests running. We would start with
>>> HPET, then only switch to TSC if that check has passed (and this
>>> would happen once).
>>>
>>> A simpler alternative would be to call init_xen_time after all CPUs
>>> are brought up (and would also keep this patch as is), but I am not
>>> sure about the repercussions of that.
>>
>> I don't see how either would help with the hotplug case.
> This was in response to what I thought you meant with your earlier
> question (which I misunderstood). But I believe my question is still
> valid. The reason for choosing between my suggested approaches is that
> tsc_check_reliability() requires all CPUs to be up so that the check is
> correctly performed.

Sure - it seems quite obvious that all boot time available CPUs
should be checked.
Jan
On 03/23/2016 07:28 AM, Jan Beulich wrote:
>>>> On 22.03.16 at 21:40, <joao.m.martins@oracle.com> wrote:
>> On 03/22/2016 04:02 PM, Jan Beulich wrote:
>>>>>> On 22.03.16 at 16:51, <joao.m.martins@oracle.com> wrote:
>>>> On 03/22/2016 12:46 PM, Jan Beulich wrote:
>>>>>>>> On 22.03.16 at 13:41, <joao.m.martins@oracle.com> wrote:
>>>>>>>
>>>>>> I found one issue in the tsc clocksource initialization path, with
>>>>>> respect to the reliability check. This check runs when the platform
>>>>>> timer is initialized, but not all CPUs are up at that point
>>>>>> (init_xen_time()), which means the check will always succeed. So
>>>>>> for clocksource=tsc I need to defer initialization to a later point
>>>>>> (in verify_tsc_reliability()) and, if successful, switch to that
>>>>>> clocksource. Unless you disagree, v2 would have this and just
>>>>>> requires one additional preparatory change prior to this patch.
>>>>>
>>>>> Hmm, that's suspicious when thinking about CPU hotplug: What
>>>>> do you intend to do when a CPU comes online later, failing the
>>>>> check?
>>>> Good point, but I am not sure whether that would happen. The initcall
>>>> verify_tsc_reliability seems to be called only for the boot
>>>> processor. Or are you saying that it's the case that initcalls are
>>>> also invoked when hotplugging CPUs later on? If so, then my
>>>> suggestion wouldn't work - or rather, not without runtime switching
>>>> support, as you point out below.
>>>
>>> Looks like I didn't express myself clearly enough: "failing the check"
>>> wasn't meant to imply the failure would actually occur, but rather
>>> that failure would occur if the check was run on that CPU. IOW the
>>> case of a CPU getting hotplugged which doesn't satisfy the needs
>>> for selecting TSC as the clock source.
>> Ah, I see.
>> I believe this wouldn't be an issue with the current rendezvous
>> mechanism (std_rendezvous), as the delta is computed between the
>> local_tsc_stamp taken on that (hotplugged) CPU during calibration and
>> the rdtsc() done by the guest on the same CPU, even though CPU0's TSC
>> (the master) serves as system_time.
>>
>> However, it can be a problem with PVCLOCK_TSC_STABLE_BIT, as the
>> hotplugged CPU could have its TSC drifted, and since setting this flag
>> relies on synchronization of TSCs we would need to clear the flag
>> entirely.
>
> Except that we can't, after guests already got started, validly clear
> that flag afaics.
Correct.

> The only option I see here would be to never set
> this flag if CPU hotplug is possible - by looking at the hot-pluggable
> CPU count and, if non-zero, perhaps allowing a command line override
> to indicate no hotplug is intended (it may well be that such is
> already implicitly available).
OK, I will then add this, to allow the flag only if the conditions above
are met. Thanks for the pointer!

>>>>> So far we don't do runtime switching of the clock source,
>>>>> and I'm not sure that's a good idea to do when there are running
>>>>> guests.
>>>> Totally agree, but I would be proposing this for the initialization
>>>> phase, where there wouldn't be guests running. We would start with
>>>> HPET, then only switch to TSC if that check has passed (and this
>>>> would happen once).
>>>>
>>>> A simpler alternative would be to call init_xen_time after all CPUs
>>>> are brought up (and would also keep this patch as is), but I am not
>>>> sure about the repercussions of that.
>>>
>>> I don't see how either would help with the hotplug case.
>> This was in response to what I thought you meant with your earlier
>> question (which I misunderstood). But I believe my question is still
>> valid. The reason for choosing between my suggested approaches is that
>> tsc_check_reliability() requires all CPUs to be up so that the check
>> is correctly performed.
>
> Sure - it seems quite obvious that all boot time available CPUs
> should be checked.
Cool, so I will go with moving init_xen_time to right after all CPUs are
up but before initcalls are invoked.

Joao
>>> On 23.03.16 at 13:05, <joao.m.martins@oracle.com> wrote:
> On 03/23/2016 07:28 AM, Jan Beulich wrote:
>> Sure - it seems quite obvious that all boot time available CPUs
>> should be checked.
> Cool, so I will go with moving init_xen_time to right after all CPUs
> are up but before initcalls are invoked.

I think your other alternative seemed more reasonable; I wouldn't be
very happy to see init_xen_time() moved around.

Jan