From: Thomas Gleixner <tglx@linutronix.de>
To: LKML <linux-kernel@vger.kernel.org>
Cc: John Stultz <john.stultz@linaro.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Subject: [patch V2 63/64] timekeeping: Provide fast and NMI safe access to CLOCK_MONOTONIC
Date: Wed, 16 Jul 2014 21:05:23 -0000
Message-ID: <20140716205057.569512979@linutronix.de>
In-Reply-To: 20140716205018.175419210@linutronix.de

[-- Attachment #1: timekeeping-nmi-safe-access-to-mono-raw.patch --]
[-- Type: text/plain, Size: 6837 bytes --]

Tracers want a correlated time between the kernel instrumentation and
user space. We really do not want to export sched_clock() to user
space, so we need to provide something sensible for this.

Using separate data structures with a non-blocking sequence count
based update mechanism allows us to do that. The data structure
required for the readout has a sequence counter and two copies of the
timekeeping data.

On the update side:

  smp_wmb();
  tkf->seq++;
  smp_wmb();
  update(tkf->base[0], tk);
  smp_wmb();
  tkf->seq++;
  smp_wmb();
  update(tkf->base[1], tk);

On the reader side:

  do {
     seq = tkf->seq;
     smp_rmb();
     idx = seq & 0x01;
     now = now(tkf->base[idx]);
     smp_rmb();
  } while (seq != tkf->seq);

So if an NMI hits the update of base[0] it will use base[1] which is
still consistent, but this timestamp is not guaranteed to be monotonic
across an update.
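
The update and reader sequences above can be modelled in user space.
The following is only a single-threaded sketch, not the kernel code:
struct read_base, struct fast_tk, fast_read() and read_during_update()
are hypothetical names, the slope is an integral "nanoseconds per
cycle" value for simplicity, and sequentially consistent C11 atomics
stand in for the smp_wmb()/smp_rmb() pairs. The "NMI" is simulated by
calling the reader while base[0] is half written.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for struct tk_read_base. */
struct read_base {
	uint64_t base_mono;	/* nanoseconds at cycle_last */
	uint64_t cycle_last;	/* counter value at the last update */
	uint64_t slope;		/* nanoseconds per cycle (assumed integral) */
};

struct fast_tk {
	atomic_uint seq;		/* lowest bit selects base[0] or base[1] */
	struct read_base base[2];
};

/* Reader: use the copy selected by the low bit of seq, retry if seq moved. */
static uint64_t fast_read(struct fast_tk *tkf, uint64_t cycles)
{
	const struct read_base *b;
	unsigned int seq;
	uint64_t now;

	do {
		seq = atomic_load(&tkf->seq);
		b = &tkf->base[seq & 0x01];
		now = b->base_mono + (cycles - b->cycle_last) * b->slope;
	} while (seq != atomic_load(&tkf->seq));

	return now;
}

/*
 * Run one update, but take a reading while base[0] is only half
 * written. Returns what that reader (think: NMI) observed.
 */
static uint64_t read_during_update(struct fast_tk *tkf,
				   const struct read_base *src,
				   uint64_t cycles)
{
	uint64_t seen;

	/* An update starts with an even count, i.e. readers on base[0] */
	assert((atomic_load(&tkf->seq) & 0x01) == 0);

	atomic_fetch_add(&tkf->seq, 1);		/* odd: readers use base[1] */
	tkf->base[0].base_mono = src->base_mono; /* base[0] inconsistent */

	seen = fast_read(tkf, cycles);		/* the simulated NMI */

	tkf->base[0] = *src;			/* finish base[0] */
	atomic_fetch_add(&tkf->seq, 1);		/* even: readers use base[0] */
	tkf->base[1] = *src;			/* refresh base[1] */

	return seen;
}
```

With both copies starting at { 1000, 100, 2 } and an update to
{ 1100, 150, 1 }, a reading taken during the update at cycle 160 is
computed from the untouched base[1] with the old slope, while a reading
at the same cycle after the update uses the new base[0].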

The timestamp is calculated by:

	now = base_mono + clock_delta * slope

So if the update lowers the slope, readers who are forced to the
not yet updated second array are still using the old steeper slope.

 tmono
 ^
 |    o  n
 |   o n
 |  u
 | o
 |o
 |12345678---> reader order

 o = old slope
 u = update
 n = new slope

So reader 6 will observe time going backwards versus reader 5.

While other CPUs are likely to be able to observe that, the only way
for a CPU local observation is when an NMI hits in the middle of
the update. Timestamps taken from that NMI context might be ahead
of the following timestamps. Callers need to be aware of that and
deal with it.
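
To put numbers on the diagram, here is a small sketch of the formula
with made-up values. struct base, stamp() and update_base() are
hypothetical helpers, and the slope is again an integral "nanoseconds
per cycle" value, which is a simplification of the real scaled
arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reduced time base: now = base_mono + clock_delta * slope */
struct base {
	uint64_t base_mono;	/* nanoseconds at cycle_last */
	uint64_t cycle_last;	/* counter value at the last update */
	uint64_t slope;		/* nanoseconds per cycle (assumed integral) */
};

static uint64_t stamp(const struct base *b, uint64_t cycles)
{
	return b->base_mono + (cycles - b->cycle_last) * b->slope;
}

/* New base which continues the old one at @cycles with a new slope. */
static struct base update_base(const struct base *old, uint64_t cycles,
			       uint64_t new_slope)
{
	struct base n = { stamp(old, cycles), cycles, new_slope };

	/* The update itself is continuous: both bases agree at @cycles */
	assert(stamp(&n, cycles) == stamp(old, cycles));
	return n;
}
```

Take an old base { 1000, 0, 4 } and an update at cycle 10 which lowers
the slope to 3, giving { 1040, 10, 3 }. A reader at cycle 14 which is
still forced to the old copy computes 1000 + 14 * 4 = 1056. A later
reader at cycle 15 on the new copy computes 1040 + 5 * 3 = 1055, so it
sees time going backwards by one nanosecond.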

V2: Got rid of clock monotonic raw and reorganized the data
    structures. Folded in the barrier fix from Mathieu.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
 include/linux/timekeeping.h |    2 
 kernel/time/timekeeping.c   |  124 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 126 insertions(+)

Index: tip/include/linux/timekeeping.h
===================================================================
--- tip.orig/include/linux/timekeeping.h
+++ tip/include/linux/timekeeping.h
@@ -164,6 +164,8 @@ static inline u64 ktime_get_raw_ns(void)
 	return ktime_to_ns(ktime_get_raw());
 }
 
+extern u64 ktime_get_mono_fast_ns(void);
+
 /*
  * Timespec interfaces utilizing the ktime based ones
  */
Index: tip/kernel/time/timekeeping.c
===================================================================
--- tip.orig/kernel/time/timekeeping.c
+++ tip/kernel/time/timekeeping.c
@@ -44,6 +44,22 @@ static struct {
 static DEFINE_RAW_SPINLOCK(timekeeper_lock);
 static struct timekeeper shadow_timekeeper;
 
+/**
+ * struct tk_fast - NMI safe timekeeper
+ * @seq:	Sequence counter for protecting updates. The lowest bit
+ *		is the index for the tk_read_base array
+ * @base:	tk_read_base array. Access is indexed by the lowest bit of
+ *		@seq.
+ *
+ * See @update_fast_timekeeper() below.
+ */
+struct tk_fast {
+	seqcount_t		seq;
+	struct tk_read_base	base[2];
+};
+
+static struct tk_fast tk_fast_mono ____cacheline_aligned;
+
 /* flag for if timekeeping is suspended */
 int __read_mostly timekeeping_suspended;
 
@@ -210,6 +226,112 @@ static inline s64 timekeeping_get_ns_raw
 	return nsec + arch_gettimeoffset();
 }
 
+/**
+ * update_fast_timekeeper - Update the fast and NMI safe monotonic timekeeper.
+ * @tk:		The timekeeper from which we take the update
+ * @tkf:	The fast timekeeper to update
+ * @tbase:	The time base for the fast timekeeper (mono/raw)
+ *
+ * We want to use this from any context including NMI and tracing /
+ * instrumenting the timekeeping code itself.
+ *
+ * So we handle this differently than the other timekeeping accessor
+ * functions which retry when the sequence count has changed. The
+ * update side does:
+ *
+ * smp_wmb();	<- Ensure that the last base[1] update is visible
+ * tkf->seq++;
+ * smp_wmb();	<- Ensure that the seqcount update is visible
+ * update(tkf->base[0], tk);
+ * smp_wmb();	<- Ensure that the base[0] update is visible
+ * tkf->seq++;
+ * smp_wmb();	<- Ensure that the seqcount update is visible
+ * update(tkf->base[1], tk);
+ *
+ * The reader side does:
+ *
+ * do {
+ *	seq = tkf->seq;
+ *	smp_rmb();
+ *	idx = seq & 0x01;
+ *	now = now(tkf->base[idx]);
+ *	smp_rmb();
+ * } while (seq != tkf->seq);
+ *
+ * As long as we update base[0] readers are forced off to
+ * base[1]. Once base[0] is updated readers are redirected to base[0]
+ * and the base[1] update takes place.
+ *
+ * So if an NMI hits the update of base[0] then it will use base[1]
+ * which is still consistent. In the worst case this can result in a
+ * slightly wrong timestamp (a few nanoseconds). See
+ * @ktime_get_mono_fast_ns.
+ */
+static void update_fast_timekeeper(struct timekeeper *tk)
+{
+	struct tk_read_base *base = tk_fast_mono.base;
+
+	/* Force readers off to base[1] */
+	raw_write_seqcount_latch(&tk_fast_mono.seq);
+
+	/* Update base[0] */
+	memcpy(base, &tk->tkr, sizeof(*base));
+
+	/* Force readers back to base[0] */
+	raw_write_seqcount_latch(&tk_fast_mono.seq);
+
+	/* Update base[1] */
+	memcpy(base + 1, base, sizeof(*base));
+}
+
+/**
+ * ktime_get_mono_fast_ns - Fast NMI safe access to clock monotonic
+ *
+ * This timestamp is not guaranteed to be monotonic across an update.
+ * The timestamp is calculated by:
+ *
+ *	now = base_mono + clock_delta * slope
+ *
+ * So if the update lowers the slope, readers who are forced to the
+ * not yet updated second array are still using the old steeper slope.
+ *
+ * tmono
+ * ^
+ * |    o  n
+ * |   o n
+ * |  u
+ * | o
+ * |o
+ * |12345678---> reader order
+ *
+ * o = old slope
+ * u = update
+ * n = new slope
+ *
+ * So reader 6 will observe time going backwards versus reader 5.
+ *
+ * While other CPUs are likely to be able to observe that, the only way
+ * for a CPU local observation is when an NMI hits in the middle of
+ * the update. Timestamps taken from that NMI context might be ahead
+ * of the following timestamps. Callers need to be aware of that and
+ * deal with it.
+ */
+u64 notrace ktime_get_mono_fast_ns(void)
+{
+	struct tk_read_base *tkr;
+	unsigned int seq;
+	u64 now;
+
+	do {
+		seq = raw_read_seqcount(&tk_fast_mono.seq);
+		tkr = tk_fast_mono.base + (seq & 0x01);
+		now = ktime_to_ns(tkr->base_mono) + timekeeping_get_ns(tkr);
+
+	} while (read_seqcount_retry(&tk_fast_mono.seq, seq));
+	return now;
+}
+EXPORT_SYMBOL_GPL(ktime_get_mono_fast_ns);
+
 #ifdef CONFIG_GENERIC_TIME_VSYSCALL_OLD
 
 static inline void update_vsyscall(struct timekeeper *tk)
@@ -325,6 +447,8 @@ static void timekeeping_update(struct ti
 	if (action & TK_MIRROR)
 		memcpy(&shadow_timekeeper, &tk_core.timekeeper,
 		       sizeof(tk_core.timekeeper));
+
+	update_fast_timekeeper(tk);
 }
 
 /**



