Re: [PATCH v2 3/6] x86/time: implement tsc as clocksource

From: Joao Martins <joao.m.martins@oracle.com>
To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>,
	Keir Fraser <keir@xen.org>, Jan Beulich <jbeulich@suse.com>,
	xen-devel@lists.xen.org
Subject: Re: [PATCH v2 3/6] x86/time: implement tsc as clocksource
Date: Sun, 3 Apr 2016 19:47:04 +0100	[thread overview]
Message-ID: <57016528.3080909@oracle.com> (raw)
In-Reply-To: <20160401184511.GB24202@char.us.oracle.com>

On 04/01/2016 07:45 PM, Konrad Rzeszutek Wilk wrote:
> On Fri, Apr 01, 2016 at 07:38:18PM +0100, Joao Martins wrote:
>> [snip]
>>
>>>> diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
>>>> index ed4ed24..2602dda 100644
>>>> --- a/xen/arch/x86/time.c
>>>> +++ b/xen/arch/x86/time.c
>>>> @@ -432,6 +432,63 @@ uint64_t ns_to_acpi_pm_tick(uint64_t ns)
>>>>  }
>>>>  
>>>>  /************************************************************
>>>> + * PLATFORM TIMER 4: TSC
>>>> + */
>>>> +static u64 tsc_freq;
>>>> +static unsigned long tsc_max_warp;
>>>> +static void tsc_check_reliability(void);
>>>> +
>>>> +static int __init init_tsctimer(struct platform_timesource *pts)
>>>> +{
>>>> +    bool_t tsc_reliable = 0;
>>>
>>> No need to set it to zero.
>> OK.
>>
>>>> +
>>>> +    tsc_check_reliability();
>>>
>>> This has been already called by verify_tsc_reliability which calls this
>>> function. Should we set tsc_max_warp to zero before calling it?
>> Ah, correct. But may be it's not necessary to run the tsc_check_reliability here
>> at all as what I am doing is ineficient. See my other comment below.
>>
>>>
>>>> +
>>>> +    if ( tsc_max_warp > 0 )
>>>> +    {
>>>> +        tsc_reliable = 0;
>>>
>>> Ditto. It is by default zero.
>> OK.
>>
>>>
>>>> +        printk(XENLOG_INFO "TSC: didn't passed warp test\n");
>>>
>>> So the earlier test by verify_tsc_reliability did already this check and
>>> printed this out - and also cleared the X86_FEATURE_TSC_RELIABLE.
>>>
>>> So can this check above be removed then?
>>>
>>> Or are you thinking to ditch what verify_tsc_reliability does?
>>>
>> I had the tsc_check_reliability here because TSC could still be deemed reliable
>> for max_cstate <= 2 or with CONSTANT_TSC + NONSTOP_TSC. The way I have it, the
>> most likely scenario (i.e. having TSC_RELIABLE) would run twice. Perhaps a
>> better way of doing this would be to run the warp test solely on
>> verify_tsc_reliability() in all possible conditions to be deemed reliable? And
>> then I could even remove almost the whole init_tsctimer if it was to be called
>> when no warps are observed.
> 
> So..
>>
>>>> +    }
>>>> +    else if ( boot_cpu_has(X86_FEATURE_TSC_RELIABLE) ||
>>>> +              (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
>>>> +               boot_cpu_has(X86_FEATURE_NONSTOP_TSC)) )
>>>> +    {
>>>> +        tsc_reliable = 1;
>>>> +    }
>>>> +    else if ( boot_cpu_has(X86_FEATURE_CONSTANT_TSC) )
>>>> +    {
>>>> +        tsc_reliable = (max_cstate <= 2);
>>>> +
>>>> +        if ( tsc_reliable )
>>>> +            printk(XENLOG_INFO "TSC: no deep Cstates, deemed reliable\n");
>>>> +        else
>>>> +            printk(XENLOG_INFO "TSC: deep Cstates possible, so not reliable\n");
> 
> .. is that always true?
Might not be on older platforms - but I could be stricter here and require
max_cstates 0 to allow TSC clocksource to be used. CONSTANT_TSC is set based on
CPU model (Manual section 17.14, 2nd paragrah), and (on Intel) AFAICT constant
rate can be interpreted to being across P/T states, thus leaving us with the C
states. It is only architecturally guaranteed that TSC doesn't stop on deep
C-states (Section 36.8.3) if invariant TSC is set (CPUID.80000007:EDX[8]). If
this bit is set, on Intel it guarantees to have synchronized and nonstop TSCs
across all states (as also stated in the Section 17.14.1). NONSTOP_TSC means
that the TSC does not stop in deep C states (C3 or higher), so decreasing the
maximum C states to 2 (or disabling entirely) would provide the equivalent to a
nonstop TSC. Thus deeming it reliable _if_ the warp test passes.

> As in if this is indeed the case should we clear
> X86_FEATURE_CONSTANT_TSC bit? And make check be part of tsc_check_reliability?
I am not sure about clearing CONSTANT_TSC as this is based on CPU model. But I
believe the main issue would be to know whether the TSC stops or on deep C states.

> Then init_tsctimer() would just need to check for the boot_cpu_has bits being
> set.
> 
> As in:
> 
>  if ( boot_cpu_has(X86_FEATURE_TSC_RELIABLE) ||
>       (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
>        boot_cpu_has(X86_FEATURE_NONSTOP_TSC)) )
>   {
> 	pts->frequency = tsc_freq;
> 	return 1;
>    }
>
Yeah something along those lines. My idea previously was to not even check these
bits, and assume init_tsctimer is called knowing that we have a reliable TSC
source as we check all of it beforehand? Something like this:

 static int __init verify_tsc_reliability(void)
 {
    if ( boot_cpu_has(X86_FEATURE_TSC_RELIABLE) ||
         (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
          (boot_cpu_has(X86_FEATURE_NONSTOP_TSC) || max_cstate <= 2)) )
     {
         if (tsc_warp_warp)
         {
             printk("%s: TSC warp detected, disabling TSC_RELIABLE\n",
                    __func__);
             if ( boot_cpu_has(X86_FEATURE_TSC_RELIABLE) )
                  setup_clear_cpu_cap(X86_FEATURE_TSC_RELIABLE);
          }
          else if ( !strcmp(opt_clocksource, "tsc") )
          {
          ...

And then init_tsctimer would just be:

static int __init init_tsctimer(struct platform_timesource *pts)
{
    pts->frequency = tsc_freq;
    return 1;
}

?

>    return 0;
>
> And tsc_check_reliability would be in charge of clearing the CPU bits if something
> is off.
Perhaps you mean verify_tsc_reliability()? That's already
clearing TSC_RELIABLE in the event of warps as you previously mentioned.
tsc_check_reliability is the actual warp test - might not be the best place to
clear it.

> But maybe that is not good? As in, could we mess up and clear those bits
> even if they are suppose to be set?
Hm, I am not sure how we could mess up if we clear them, but these bits look
correctly set to me.

Joao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel