xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: Joao Martins <joao.m.martins@oracle.com>
To: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>,
	Tim Deegan <tim@xen.org>,
	George Dunlap <george.dunlap@citrix.com>,
	Jan Beulich <JBeulich@suse.com>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>
Subject: Re: BUG: NOW() seems to (sometimes) go backwards!
Date: Wed, 8 Jun 2016 21:36:50 +0100	[thread overview]
Message-ID: <575881E2.1080605@oracle.com> (raw)
In-Reply-To: <1465393417.15816.206.camel@citrix.com>

On 06/08/2016 02:43 PM, Dario Faggioli wrote:
> On Wed, 2016-06-08 at 04:42 -0600, Jan Beulich wrote:
>>>>> On 07.06.16 at 17:54, <dario.faggioli@citrix.com> wrote:
>>> So, it looks to me that the TSC is actually ok, and it could be the
>>> local_tsc_stamp and scale_delta() magic done in get_s_time_fixed()
>>> (which, AFAICUI, is there to convert cycles to time) that does not
>>> guarantee the result to be monotonic, even if the input is...
>>> Thoughts?
>> Indeed this smells like an issue in the scaling. However, the scaling
>> values vary only when !CONSTANT_TSC. Can you check that this
>> flag is actually set on that system? 
>>
> Checked. I do have it. I instrumented the code to print stuff if it
> finds it, and it prints.
> 
> Also:
> root@Zhaman:~# xl debug-keys s
> (XEN) [  406.719464] TSC marked as reliable, warp = 0 (count=3)
> (XEN) [  406.719467] dom1(hvm): mode=0,ofs=0xffffd9279716c276,khz=2394069,inc=4,vtsc count: 195367 kernel, 0 user
> 
>> (I hope you're not running a
>> strange Dom0 setup with FREQCTL_dom0_kernel in effect.) 
>>
> I've no idea what this is. I've been running 4.1.0, built myself, and
> stock Debian unstable 4.5.0, and I'm seeing the issue in both cases.
> 
> Looking FREQCTL_dom0_kernel up, I guess you mean what happens when one
> passes cpufreq="dom0-kernel" to xen on boot command line. In which
> case, no, I'm not doing that.
> 
>> And
>> at the same time that it's time_calibration_tsc_rendezvous() that
>> is in use?
>>
> The code you're referring to should be this:
> 
>     /* If we have constant-rate TSCs then scale factor can be shared. */
>     if ( boot_cpu_has(X86_FEATURE_CONSTANT_TSC) )
>     {
>         /* If TSCs are not marked as 'reliable', re-sync during rendezvous. */
>         if ( !boot_cpu_has(X86_FEATURE_TSC_RELIABLE) )
>             time_calibration_rendezvous_fn = time_calibration_tsc_rendezvous;
>     }
> 
> And I have both X86_FEATURE_CONSTANT_TSC and X86_FEATURE_TSC_RELIABLE.
> 
> I've again instrumented the code in order to check whether it is
> time_calibration_tsc_rendezvous() or time_calibration_std_rendezvous()
> that is being used, and it's the latter:
> 
> (XEN) [    1.795916] TSC reliable. Yay!! Using ffff82d080198362 for rendevousez
> 
> [dario@Solace xen.git] $ objdump -D xen/xen-syms |grep ffff82d080198362
> ffff82d080198362 <time_calibration_std_rendezvous>:
> 
> which looks correct to me.
Probably one other codepath that you would be interested is on
local_time_calibration() commenting the chunk on this conditional:

if ( boot_cpu_has(X86_FEATURE_CONSTANT_TSC )
{
...
}

And with that you rarely see time going backwards since the scaling would be
adjusted to each cpu calibration error - but it would be a (very very close)
approximation on software not exactly guaranteed as hardware TSC. But trying this out
would be just for experimentation - doesn't look to be a correct way of addressing this.

> 
>> Yet when the scaling values get set only once at boot, monotonic
>> (cross-CPU) TSC means monotonic (cross-CPU) returns from NOW().
>>
> Yep. And at this point, this is what needs to be verified, I guess...
I think get_s_time_fixed doesn't guarantee monotonicity across CPUs being it
different socket or (SMT) siblings. local_tsc_stamp is seeded differently on each CPU
i.e. rdtsc() right after doing the platform time read on the calibration rendezvous.
Plus stime_local_stamp is seeded with values taken from platform timer (HPET, ACPI,
PIT) on local_time_calibration which means that get_s_time isn't solely based on TSC
and that there will always be a gap between stime_local_stamp and local_tsc_stamp.
TSC is indeed monotonic on a TSC invariant box, but the delta that is computed
differs from cpu to cpu according to when time calibration happens on each CPU - thus
not guaranteeing the desired monotonicity property. Having stime_local_stamp be based
on the same timestamp that of the local_tsc_stamp plus having a single
local_tsc_stamp as reference would address this behaviour - see also below.

> 
>> In any event - could you try to exclude C- and P-state effects? Of
>> course that makes sense only if you can reasonably repro the
>> problem situation (and hence can tell from its absence over a certain
>> period of time that whatever change was done did have some
>> positive effect).
>>
> It's actually quite hard *NOT* to reproduce the problem... it happens
> all the time, and if the load is high enough, I see the "Time went
> backwards?" printk several times per second!
> 
> So, trying to do what you suggest in an online fashion, i.e., issueing:
> 
>  # xenpm set-max-cstate 0
>  # xenpm set-scaling-governor all performance
> 
> did not change the situation (I still see the printks).
> 
> I've then tried passing both cpufreq="none" and max_cstate=0 to xen at
> boot, but they made no difference at all either.
> 
> Most of the time, we're speaking of very small skews, e.g.:
> 
> (XEN) [   59.999959] WARNING: __update_runq_load: Time went backwards? now 59999946079 llu 59999946085
> (XEN) [  117.595508] WARNING: __update_runq_load: Time went backwards? now 117595495295 llu 117595495310
> 
> i.e., 6 nanoseconds or 15 nanoseconds!
> 
> Then there are instances where the difference is bigger (microseconds
> time scale, like in the first email of the thread).
It does sound similar to the cross cpu calibration error I was observing but on a
place other than guests pv time infos updates. My pvclock TSC stable bit series could
address your issue - but only for clocksource=tsc added there, though it would be
quite limitative for the other cases i.e. clocksource=hpet|acpi|etc.

One patch in the RFC of that series, could fix your issue but it was more of a hacky
patch for discussion [0]. It converts a value returned from the platform timer (which
monotonically increases) and scales it to the equivalent in TSC, assuming the TSC is
always ahead of the other clocksources. But probably the correct fix would be to do
as linux does with pvclock? Linux keeps an accumulator representing the latest system
time across CPUs and atomically update it if now > last. Thus always returning the
most up-to-date value. But then NOW() would take a hit in performance.

[0] http://lists.xenproject.org/archives/html/xen-devel/2015-12/msg02832.html

Cheers,
Joao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

  parent reply	other threads:[~2016-06-08 20:36 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-07 15:54 BUG: NOW() seems to (sometimes) go backwards! Dario Faggioli
2016-06-08 10:42 ` Jan Beulich
2016-06-08 13:43   ` Dario Faggioli
2016-06-08 14:01     ` Jan Beulich
2016-06-08 20:36     ` Joao Martins [this message]
2016-06-09  8:05       ` Jan Beulich
2016-06-09 10:24         ` Joao Martins
2016-06-09 10:43           ` Joao Martins
2016-06-09 11:25           ` Jan Beulich
2016-06-09 11:31             ` Jan Beulich
2016-06-09 10:19 ` George Dunlap

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=575881E2.1080605@oracle.com \
    --to=joao.m.martins@oracle.com \
    --cc=JBeulich@suse.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=dario.faggioli@citrix.com \
    --cc=george.dunlap@citrix.com \
    --cc=tim@xen.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).