Re: [PATCH v2 6/6] x86/time: implement PVCLOCK_TSC_STABLE_BIT

From: Joao Martins <joao.m.martins@oracle.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>,
	Keir Fraser <keir@xen.org>,
	xen-devel@lists.xen.org
Subject: Re: [PATCH v2 6/6] x86/time: implement PVCLOCK_TSC_STABLE_BIT
Date: Tue, 5 Apr 2016 22:34:11 +0100	[thread overview]
Message-ID: <57042F53.5090206@oracle.com> (raw)
In-Reply-To: <5703CA1802000078000E32A1@prv-mh.provo.novell.com>

On 04/05/2016 01:22 PM, Jan Beulich wrote:
>>>> On 29.03.16 at 15:44, <joao.m.martins@oracle.com> wrote:
>> --- a/xen/arch/x86/time.c
>> +++ b/xen/arch/x86/time.c
>> @@ -43,6 +43,10 @@
>>  static char __initdata opt_clocksource[10];
>>  string_param("clocksource", opt_clocksource);
>>  
>> +/* opt_nocpuhotplug: Set if CPU hotplug isn't meant to be used */
>> +static bool_t __initdata opt_nocpuhotplug;
>> +boolean_param("nocpuhotplug", opt_nocpuhotplug);
> 
> If we were to have such a new option, it would need to be
> accompanied by an entry in the command line option doc.
Yes, Konrad pointed that out too - and I had it already documented
already for the next version. But given your argument below might
not even be needed to add this option.

> But
> I'm opposed to this: For one, the variable being static here
> means there is nothing that actually suppresses CPU hotplug
> to happen.
> And then I think this can, for all practical purposes,
> be had by suitably using existing command line options, namely
> "max_cpus=", such that set_nr_cpu_ids() won't allow for any
> further CPUs to get added. Albeit I admit that if someone was
> to bring down some CPU and then hotplug another one, we
> might still be in trouble. So maybe the better approach would
> be to fail onlining of CPUs that don't meet the criteria when
> "clocksource=tsc"?
True - max_cpus would produce the same effect. But I should point out
that even when clocksource=tsc the rendezvous would be std_rendezvous. So the
reference TSC is CPU 0 and tsc_timestamps are of the individual
CPUs. So perhaps the criteria would be for clocksource=tsc and use_tsc_stable_bit.


> 
>> @@ -435,6 +439,7 @@ uint64_t ns_to_acpi_pm_tick(uint64_t ns)
>>   * PLATFORM TIMER 4: TSC
>>   */
>>  static bool_t clocksource_is_tsc;
>> +static bool_t use_tsc_stable_bit;
> 
> __read_mostly?
Yeah - I will add it there.

> 
>> @@ -468,6 +473,11 @@ static int __init init_tsctimer(struct platform_timesource *pts)
>>  
>>      pts->frequency = tsc_freq;
>>      clocksource_is_tsc = tsc_reliable;
>> +    use_tsc_stable_bit = clocksource_is_tsc &&
>> +        ((nr_cpu_ids == num_present_cpus()) || opt_nocpuhotplug);
> 
> See my remark above regarding the reliability of this.
> 
>> @@ -1431,6 +1443,22 @@ static void time_calibration_std_rendezvous(void *_r)
>>      raise_softirq(TIME_CALIBRATE_SOFTIRQ);
>>  }
>>  
>> +/*
>> + * Rendezvous function used when clocksource is TSC and
>> + * no CPU hotplug will be performed.
>> + */
>> +static void time_calibration_nop_rendezvous(void *_r)
> 
> Even if done so in existing code - no new local variable or function
> parameters starting with an underscore please.
OK.

> 
>> +{
>> +    struct cpu_calibration *c = &this_cpu(cpu_calibration);
>> +    struct calibration_rendezvous *r = _r;
> 
> const
> 
>> +    c->local_tsc_stamp = r->master_tsc_stamp;
>> +    c->stime_local_stamp = get_s_time();
>> +    c->stime_master_stamp = r->master_stime;
>> +
>> +    raise_softirq(TIME_CALIBRATE_SOFTIRQ);
>> +}
> 
> Perhaps this whole function should move up ahead of the other
> two, so that they both can use this one instead of now duplicating
> the same code a 3rd time? Or maybe a new helper function would
> be better, to also account for the difference in what
> c->local_tsc_stamp gets set from (which could then become a
> parameter of that new function).
The refactoring you suggest sounds a good idea indeed as that
code is shared across all rendezvous - I will do so following
this guideline you advised.

> 
>> @@ -1440,6 +1468,13 @@ static void time_calibration(void *unused)
>>          .semaphore = ATOMIC_INIT(0)
>>      };
>>  
>> +    if ( use_tsc_stable_bit )
>> +    {
>> +        local_irq_disable();
>> +        r.master_stime = read_platform_stime(&r.master_tsc_stamp);
>> +        local_irq_enable();
>> +    }
> 
> So this can't be in time_calibration_nop_rendezvous() because
> you want to avoid the actual rendezvousing. But isn't the then
> possibly much larger gap between read_platform_stime() (which
> parallels the rdtsc()-s in the other two cases) and get_s_time()
> invocation going to become a problem?
Perhaps I am not not seeing the potential problem of this. The main
difference I see between both would be the base system time: read_platform_stime
uses stime_platform_stamp as base, and computes a difference from the
read_counter (i.e. rdtsc() ) with previously saved platform-wide stamp
(platform_timer_stamp). get_s_time uses the stime_local_stamp (updated from
stime_master_stamp on local_time_calibration) as base plus delta from rdtsc()
with local_tsc_stamp. And since this is now all TSC, and TSC monotonically
increase and is synchronized across CPUs, both calls would end up returning the
same or a always up-to-date value, whether cpu_time have a larger gap or not
from stime_platform_stamp. Unless the concern you are raising comes from the
fact CPU 0 calibrates much sooner than the last calibrated CPU, as opposed to
roughly at the same time with std_rendezvous?

Joao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel