From: John Stultz <john.stultz@linaro.org>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephane Eranian <eranian@google.com>,
	Pawel Moll <pawel.moll@arm.com>,
	Peter Zijlstra <peterz@infradead.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@kernel.org>, Paul Mackerras <paulus@samba.org>,
	Anton Blanchard <anton@samba.org>,
	Will Deacon <Will.Deacon@arm.com>,
	"ak@linux.intel.com" <ak@linux.intel.com>,
	Pekka Enberg <penberg@gmail.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Robert Richter <robert.richter@amd.com>
Subject: Re: [RFC] perf: need to expose sched_clock to correlate user samples with kernel samples
Date: Tue, 19 Feb 2013 14:20:30 -0800	[thread overview]
Message-ID: <5123FAAE.3030706@linaro.org> (raw)
In-Reply-To: <alpine.LFD.2.02.1302192152240.22263@ionos>

On 02/19/2013 01:50 PM, Thomas Gleixner wrote:
> On Tue, 19 Feb 2013, John Stultz wrote:
>> On 02/19/2013 12:15 PM, Thomas Gleixner wrote:
>>> Depending on the length of the delay which kept VCPU0 away from
>>> executing and depending on the direction of the ntp update of the
>>> timekeeping variables __vdso_clock_gettime()#2 can observe time going
>>> backwards.
>>>
>>> You can reproduce that by pinning VCPU0 to physical core 0 and VCPU1
>>> to physical core 1. Now remove all load from physical core 1 except
>>> VCPU1 and put massive load on physical core 0 and make sure that the
>>> NTP adjustment lowers the mult factor.
>>>
>>> Fun, isn't it ?
>> Yea, this has always worried me. I had a patch for this way way back, blocking
>> vdso readers for the entire timekeeping update.
>> But it was ugly, hurt performance, and no one seemed to be hitting the
>> window you hit above.  Nonetheless, you're probably right; we should find
>> a way to do it right. I'll try to revive those patches.
> Let me summarize the IRC discussion we just had about that:
>
> 1) We really want to reduce the seq write hold time of the timekeeper
>     to the bare minimum.
>
>     That's doable and I have working patches for this by splitting the
>     timekeeper seqlock into a spin_lock and a seqcount and doing the
>     update calculations on a shadow timekeeper structure. The seq write
>     hold time then gets reduced to switching a pointer and updating the
>     gtod data.
>
>     So the sequence would look like:
>
>     raw_spin_lock(&timekeeper_lock);
>     copy_shadow_data(current_timekeeper, shadow_timekeeper);
>     do_timekeeping_and_ntp_update(shadow_timekeeper);
>     write_seqcount_begin(&timekeeper_seq);
>     switch_pointers(current_timekeeper, shadow_timekeeper);
>     update_vsyscall();
>     write_seqcount_end(&timekeeper_seq);
>     raw_spin_unlock(&timekeeper_lock);
>
>     It's really worth the trouble. On one of my optimized RT systems I
>     get the maximum latency of the non timekeeping cores (timekeeping
>     duty is pinned to core 0) down from 8us to 4 us. That's a whopping
>     factor of 2.
>
> 2) Doing #1 will make the described time-going-backwards scenario
>     observable in the kernel as well.
>
>     The reason why we did not get complaints about that scenario at all
>     (yet) is that the window and the probability to hit it are small
>     enough. Nevertheless it's a real issue for virtualized systems.
>
>     Now you came up with the great idea, that the timekeeping core is
>     able to calculate what the approximate safe value is for the
>     clocksource readout to be in a state where wreckage relative to the
>     last update of the clocksource is not observable, no matter how
>     long the scheduled out delay is and in which direction the NTP
>     update is going.

So the other bit of caution here is that my idea of "valid cycle 
ranges" has the potential for deadlock.

While it should be fine for use with the vdso, we have to be careful if 
we use this in-kernel: if we're in the update path, the valid-interval 
check could cause the ktime_get() in hrtimer_interrupt() to spin 
forever. So we need to be sure we don't use this method anywhere in the 
code paths that trigger the update_wall_time() code.

Some additional thinking may be necessary here, though it may be as 
simple as making sure we don't loop on the CPU that does the timekeeping 
update.

thanks
-john

