All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC patch 0/8] timekeeping: Implement shadow timekeeper to shorten in kernel reader side blocking
@ 2013-02-21 22:51 Thomas Gleixner
  2013-02-21 22:51 ` [RFC patch 2/8] timekeeping: Make jiffies_lock internal Thomas Gleixner
                   ` (8 more replies)
  0 siblings, 9 replies; 12+ messages in thread
From: Thomas Gleixner @ 2013-02-21 22:51 UTC (permalink / raw)
  To: LKML
  Cc: John Stultz, Ingo Molnar, Peter Zijlstra, Eric Dumazet,
	Frederic Weisbecker

The vsyscall based timekeeping interfaces for userspace provide the
shortest possible reader side blocking (update of the vsyscall gtod
data structure), but the kernel side interfaces to timekeeping are
blocked over the full code sequence of calculating update_wall_time()
magic which can be rather "long" due to ntp, corner cases, etc...

Eric did some work a few years ago to distangle the seqcount write
hold from the spinlock which is serializing the potential updaters of
the kernel internal timekeeper data. I couldn't be bothered to reread
the old mail thread and figure out why this got turned down, but I
remember that there were objections due to the potential inconsistency
between calculation, update and observation.

In hindsight that's nonsense, because even back at that time we did
the vsyscall update at the very least moment and unsychronized to the
in kernel data update.

While we never got any complaints about that there is a real issue
versus virtualization:

  VCPU0                                         VCPU1

  update_wall_time()
    write_seqlock_irqsave(&tk->lock, flags);
    ....

Host schedules out VCPU0

Arbitrary delay

Host schedules in VCPU0
                                                __vdso_clock_gettime()#1
    update_vsyscall();
                                                __vdso_clock_gettime()#2

Depending on the length of the delay which kept VCPU0 away from
executing and depending on the direction of the ntp update of the
timekeeping variables __vdso_clock_gettime()#2 can observe time going
backwards.

You can reproduce that by pinning VCPU0 to physical core 0 and VCPU1
to physical core 1. Now remove all load from physical core 1 except
VCPU1 and put massive load on physical core 0 and make sure that the
NTP adjustment lowers the mult factor. It's extremly hard to
reproduce, but it's possible.

So this patch series is going to expose the same issue to the kernel
side timekeeping. I'm not too worried about that, because 

 - it's extremly hard to trigger
 
 - we are aware of the issue vs. vsyscalls already

 - making the kernel behave the same way as vsyscall does not make
   things worse

 - John Stultz has already an idea how to fix it.
   See  https://lkml.org/lkml/2013/2/19/569

Though that's not the scope of this patch series, but I want to make
sure that it's documented.

Now the obvious question whether this is worth the trouble can be
answered easily. Preempt-RT users and HPC folks have complained about
the long write hold time of the timekeeping seqcount since years and a
quick test on a preempt-RT enabled kernel shows, that this series
lowers the maximum latency on the non-timekeeping cores from 8 to 4
microseconds. That's a whopping factor of 2. Defintely worth the
trouble!

Thanks,

	tglx
---
 include/linux/jiffies.h             |    1 
 include/linux/timekeeper_internal.h |    4 
 kernel/time/tick-internal.h         |    2 
 kernel/time/timekeeping.c           |  176 +++++++++++++++++++++---------------
 4 files changed, 107 insertions(+), 76 deletions(-)




^ permalink raw reply	[flat|nested] 12+ messages in thread

* [RFC patch 2/8] timekeeping: Make jiffies_lock internal
  2013-02-21 22:51 [RFC patch 0/8] timekeeping: Implement shadow timekeeper to shorten in kernel reader side blocking Thomas Gleixner
@ 2013-02-21 22:51 ` Thomas Gleixner
  2013-02-21 22:51 ` [RFC patch 1/8] timekeeping: Calc stuff once Thomas Gleixner
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Thomas Gleixner @ 2013-02-21 22:51 UTC (permalink / raw)
  To: LKML
  Cc: John Stultz, Ingo Molnar, Peter Zijlstra, Eric Dumazet,
	Frederic Weisbecker

[-- Attachment #1: timekeeping-make-jiffies-lock-internal.patch --]
[-- Type: text/plain, Size: 1445 bytes --]

Nothing outside of the timekeeping core needs that lock.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/jiffies.h     |    1 -
 kernel/time/tick-internal.h |    2 ++
 kernel/time/timekeeping.c   |    1 +
 3 files changed, 3 insertions(+), 1 deletion(-)

Index: linux-2.6/include/linux/jiffies.h
===================================================================
--- linux-2.6.orig/include/linux/jiffies.h
+++ linux-2.6/include/linux/jiffies.h
@@ -75,7 +75,6 @@ extern int register_refined_jiffies(long
  */
 extern u64 __jiffy_data jiffies_64;
 extern unsigned long volatile __jiffy_data jiffies;
-extern seqlock_t jiffies_lock;
 
 #if (BITS_PER_LONG < 64)
 u64 get_jiffies_64(void);
Index: linux-2.6/kernel/time/tick-internal.h
===================================================================
--- linux-2.6.orig/kernel/time/tick-internal.h
+++ linux-2.6/kernel/time/tick-internal.h
@@ -4,6 +4,8 @@
 #include <linux/hrtimer.h>
 #include <linux/tick.h>
 
+extern seqlock_t jiffies_lock;
+
 #ifdef CONFIG_GENERIC_CLOCKEVENTS_BUILD
 
 #define TICK_DO_TIMER_NONE	-1
Index: linux-2.6/kernel/time/timekeeping.c
===================================================================
--- linux-2.6.orig/kernel/time/timekeeping.c
+++ linux-2.6/kernel/time/timekeeping.c
@@ -23,6 +23,7 @@
 #include <linux/stop_machine.h>
 #include <linux/pvclock_gtod.h>
 
+#include "tick-internal.h"
 
 static struct timekeeper timekeeper;
 



^ permalink raw reply	[flat|nested] 12+ messages in thread

* [RFC patch 1/8] timekeeping: Calc stuff once
  2013-02-21 22:51 [RFC patch 0/8] timekeeping: Implement shadow timekeeper to shorten in kernel reader side blocking Thomas Gleixner
  2013-02-21 22:51 ` [RFC patch 2/8] timekeeping: Make jiffies_lock internal Thomas Gleixner
@ 2013-02-21 22:51 ` Thomas Gleixner
  2013-02-21 22:51 ` [RFC patch 3/8] timekeeping: Move lock out of timekeeper struct Thomas Gleixner
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Thomas Gleixner @ 2013-02-21 22:51 UTC (permalink / raw)
  To: LKML
  Cc: John Stultz, Ingo Molnar, Peter Zijlstra, Eric Dumazet,
	Frederic Weisbecker

[-- Attachment #1: timekeeping-do-not-calc-crap-over-and-over.patch --]
[-- Type: text/plain, Size: 1123 bytes --]

Calculate the cycle interval shifted value once. No functional change,
just makes the code more readable.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/time/timekeeping.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Index: linux-2.6/kernel/time/timekeeping.c
===================================================================
--- linux-2.6.orig/kernel/time/timekeeping.c
+++ linux-2.6/kernel/time/timekeeping.c
@@ -1102,15 +1102,16 @@ static inline void accumulate_nsecs_to_s
 static cycle_t logarithmic_accumulation(struct timekeeper *tk, cycle_t offset,
 						u32 shift)
 {
+	cycle_t interval = tk->cycle_interval << shift;
 	u64 raw_nsecs;
 
 	/* If the offset is smaller then a shifted interval, do nothing */
-	if (offset < tk->cycle_interval<<shift)
+	if (offset < interval)
 		return offset;
 
 	/* Accumulate one shifted interval */
-	offset -= tk->cycle_interval << shift;
-	tk->clock->cycle_last += tk->cycle_interval << shift;
+	offset -= interval;
+	tk->clock->cycle_last += interval;
 
 	tk->xtime_nsec += tk->xtime_interval << shift;
 	accumulate_nsecs_to_secs(tk);



^ permalink raw reply	[flat|nested] 12+ messages in thread

* [RFC patch 3/8] timekeeping: Move lock out of timekeeper struct
  2013-02-21 22:51 [RFC patch 0/8] timekeeping: Implement shadow timekeeper to shorten in kernel reader side blocking Thomas Gleixner
  2013-02-21 22:51 ` [RFC patch 2/8] timekeeping: Make jiffies_lock internal Thomas Gleixner
  2013-02-21 22:51 ` [RFC patch 1/8] timekeeping: Calc stuff once Thomas Gleixner
@ 2013-02-21 22:51 ` Thomas Gleixner
  2013-02-21 22:51 ` [RFC patch 4/8] timekeeping: Split timekeeper_lock into lock and seqcount Thomas Gleixner
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Thomas Gleixner @ 2013-02-21 22:51 UTC (permalink / raw)
  To: LKML
  Cc: John Stultz, Ingo Molnar, Peter Zijlstra, Eric Dumazet,
	Frederic Weisbecker

[-- Attachment #1: timekeeping-move-lock-out-of-timekeeper.patch --]
[-- Type: text/plain, Size: 11796 bytes --]

Make the lock a separate entity. Preparatory patch for shadow
timekeeper structure.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/timekeeper_internal.h |    2 
 kernel/time/timekeeping.c           |   96 +++++++++++++++++-------------------
 2 files changed, 47 insertions(+), 51 deletions(-)

Index: linux-2.6/include/linux/timekeeper_internal.h
===================================================================
--- linux-2.6.orig/include/linux/timekeeper_internal.h
+++ linux-2.6/include/linux/timekeeper_internal.h
@@ -62,8 +62,6 @@ struct timekeeper {
 	ktime_t			offs_boot;
 	/* The raw monotonic time for the CLOCK_MONOTONIC_RAW posix clock. */
 	struct timespec		raw_time;
-	/* Seqlock for all timekeeper values */
-	seqlock_t		lock;
 };
 
 static inline struct timespec tk_xtime(struct timekeeper *tk)
Index: linux-2.6/kernel/time/timekeeping.c
===================================================================
--- linux-2.6.orig/kernel/time/timekeeping.c
+++ linux-2.6/kernel/time/timekeeping.c
@@ -26,6 +26,7 @@
 #include "tick-internal.h"
 
 static struct timekeeper timekeeper;
+static DEFINE_SEQLOCK(timekeeper_lock);
 
 /* flag for if timekeeping is suspended */
 int __read_mostly timekeeping_suspended;
@@ -197,11 +198,11 @@ int pvclock_gtod_register_notifier(struc
 	unsigned long flags;
 	int ret;
 
-	write_seqlock_irqsave(&tk->lock, flags);
+	write_seqlock_irqsave(&timekeeper_lock, flags);
 	ret = raw_notifier_chain_register(&pvclock_gtod_chain, nb);
 	/* update timekeeping data */
 	update_pvclock_gtod(tk);
-	write_sequnlock_irqrestore(&tk->lock, flags);
+	write_sequnlock_irqrestore(&timekeeper_lock, flags);
 
 	return ret;
 }
@@ -215,13 +216,12 @@ EXPORT_SYMBOL_GPL(pvclock_gtod_register_
  */
 int pvclock_gtod_unregister_notifier(struct notifier_block *nb)
 {
-	struct timekeeper *tk = &timekeeper;
 	unsigned long flags;
 	int ret;
 
-	write_seqlock_irqsave(&tk->lock, flags);
+	write_seqlock_irqsave(&timekeeper_lock, flags);
 	ret = raw_notifier_chain_unregister(&pvclock_gtod_chain, nb);
-	write_sequnlock_irqrestore(&tk->lock, flags);
+	write_sequnlock_irqrestore(&timekeeper_lock, flags);
 
 	return ret;
 }
@@ -281,12 +281,12 @@ int __getnstimeofday(struct timespec *ts
 	s64 nsecs = 0;
 
 	do {
-		seq = read_seqbegin(&tk->lock);
+		seq = read_seqbegin(&timekeeper_lock);
 
 		ts->tv_sec = tk->xtime_sec;
 		nsecs = timekeeping_get_ns(tk);
 
-	} while (read_seqretry(&tk->lock, seq));
+	} while (read_seqretry(&timekeeper_lock, seq));
 
 	ts->tv_nsec = 0;
 	timespec_add_ns(ts, nsecs);
@@ -322,11 +322,11 @@ ktime_t ktime_get(void)
 	WARN_ON(timekeeping_suspended);
 
 	do {
-		seq = read_seqbegin(&tk->lock);
+		seq = read_seqbegin(&timekeeper_lock);
 		secs = tk->xtime_sec + tk->wall_to_monotonic.tv_sec;
 		nsecs = timekeeping_get_ns(tk) + tk->wall_to_monotonic.tv_nsec;
 
-	} while (read_seqretry(&tk->lock, seq));
+	} while (read_seqretry(&timekeeper_lock, seq));
 	/*
 	 * Use ktime_set/ktime_add_ns to create a proper ktime on
 	 * 32-bit architectures without CONFIG_KTIME_SCALAR.
@@ -353,12 +353,12 @@ void ktime_get_ts(struct timespec *ts)
 	WARN_ON(timekeeping_suspended);
 
 	do {
-		seq = read_seqbegin(&tk->lock);
+		seq = read_seqbegin(&timekeeper_lock);
 		ts->tv_sec = tk->xtime_sec;
 		nsec = timekeeping_get_ns(tk);
 		tomono = tk->wall_to_monotonic;
 
-	} while (read_seqretry(&tk->lock, seq));
+	} while (read_seqretry(&timekeeper_lock, seq));
 
 	ts->tv_sec += tomono.tv_sec;
 	ts->tv_nsec = 0;
@@ -386,7 +386,7 @@ void getnstime_raw_and_real(struct times
 	WARN_ON_ONCE(timekeeping_suspended);
 
 	do {
-		seq = read_seqbegin(&tk->lock);
+		seq = read_seqbegin(&timekeeper_lock);
 
 		*ts_raw = tk->raw_time;
 		ts_real->tv_sec = tk->xtime_sec;
@@ -395,7 +395,7 @@ void getnstime_raw_and_real(struct times
 		nsecs_raw = timekeeping_get_ns_raw(tk);
 		nsecs_real = timekeeping_get_ns(tk);
 
-	} while (read_seqretry(&tk->lock, seq));
+	} while (read_seqretry(&timekeeper_lock, seq));
 
 	timespec_add_ns(ts_raw, nsecs_raw);
 	timespec_add_ns(ts_real, nsecs_real);
@@ -435,7 +435,7 @@ int do_settimeofday(const struct timespe
 	if (!timespec_valid_strict(tv))
 		return -EINVAL;
 
-	write_seqlock_irqsave(&tk->lock, flags);
+	write_seqlock_irqsave(&timekeeper_lock, flags);
 
 	timekeeping_forward_now(tk);
 
@@ -449,7 +449,7 @@ int do_settimeofday(const struct timespe
 
 	timekeeping_update(tk, true);
 
-	write_sequnlock_irqrestore(&tk->lock, flags);
+	write_sequnlock_irqrestore(&timekeeper_lock, flags);
 
 	/* signal hrtimers about time change */
 	clock_was_set();
@@ -474,7 +474,7 @@ int timekeeping_inject_offset(struct tim
 	if ((unsigned long)ts->tv_nsec >= NSEC_PER_SEC)
 		return -EINVAL;
 
-	write_seqlock_irqsave(&tk->lock, flags);
+	write_seqlock_irqsave(&timekeeper_lock, flags);
 
 	timekeeping_forward_now(tk);
 
@@ -491,7 +491,7 @@ int timekeeping_inject_offset(struct tim
 error: /* even if we error out, we forwarded the time, so call update */
 	timekeeping_update(tk, true);
 
-	write_sequnlock_irqrestore(&tk->lock, flags);
+	write_sequnlock_irqrestore(&timekeeper_lock, flags);
 
 	/* signal hrtimers about time change */
 	clock_was_set();
@@ -513,7 +513,7 @@ static int change_clocksource(void *data
 
 	new = (struct clocksource *) data;
 
-	write_seqlock_irqsave(&tk->lock, flags);
+	write_seqlock_irqsave(&timekeeper_lock, flags);
 
 	timekeeping_forward_now(tk);
 	if (!new->enable || new->enable(new) == 0) {
@@ -524,7 +524,7 @@ static int change_clocksource(void *data
 	}
 	timekeeping_update(tk, true);
 
-	write_sequnlock_irqrestore(&tk->lock, flags);
+	write_sequnlock_irqrestore(&timekeeper_lock, flags);
 
 	return 0;
 }
@@ -574,11 +574,11 @@ void getrawmonotonic(struct timespec *ts
 	s64 nsecs;
 
 	do {
-		seq = read_seqbegin(&tk->lock);
+		seq = read_seqbegin(&timekeeper_lock);
 		nsecs = timekeeping_get_ns_raw(tk);
 		*ts = tk->raw_time;
 
-	} while (read_seqretry(&tk->lock, seq));
+	} while (read_seqretry(&timekeeper_lock, seq));
 
 	timespec_add_ns(ts, nsecs);
 }
@@ -594,11 +594,11 @@ int timekeeping_valid_for_hres(void)
 	int ret;
 
 	do {
-		seq = read_seqbegin(&tk->lock);
+		seq = read_seqbegin(&timekeeper_lock);
 
 		ret = tk->clock->flags & CLOCK_SOURCE_VALID_FOR_HRES;
 
-	} while (read_seqretry(&tk->lock, seq));
+	} while (read_seqretry(&timekeeper_lock, seq));
 
 	return ret;
 }
@@ -613,11 +613,11 @@ u64 timekeeping_max_deferment(void)
 	u64 ret;
 
 	do {
-		seq = read_seqbegin(&tk->lock);
+		seq = read_seqbegin(&timekeeper_lock);
 
 		ret = tk->clock->max_idle_ns;
 
-	} while (read_seqretry(&tk->lock, seq));
+	} while (read_seqretry(&timekeeper_lock, seq));
 
 	return ret;
 }
@@ -680,11 +680,9 @@ void __init timekeeping_init(void)
 		boot.tv_nsec = 0;
 	}
 
-	seqlock_init(&tk->lock);
-
 	ntp_init();
 
-	write_seqlock_irqsave(&tk->lock, flags);
+	write_seqlock_irqsave(&timekeeper_lock, flags);
 	clock = clocksource_default_clock();
 	if (clock->enable)
 		clock->enable(clock);
@@ -703,7 +701,7 @@ void __init timekeeping_init(void)
 	tmp.tv_nsec = 0;
 	tk_set_sleep_time(tk, tmp);
 
-	write_sequnlock_irqrestore(&tk->lock, flags);
+	write_sequnlock_irqrestore(&timekeeper_lock, flags);
 }
 
 /* time in seconds when suspend began */
@@ -751,7 +749,7 @@ void timekeeping_inject_sleeptime(struct
 	if (has_persistent_clock())
 		return;
 
-	write_seqlock_irqsave(&tk->lock, flags);
+	write_seqlock_irqsave(&timekeeper_lock, flags);
 
 	timekeeping_forward_now(tk);
 
@@ -759,7 +757,7 @@ void timekeeping_inject_sleeptime(struct
 
 	timekeeping_update(tk, true);
 
-	write_sequnlock_irqrestore(&tk->lock, flags);
+	write_sequnlock_irqrestore(&timekeeper_lock, flags);
 
 	/* signal hrtimers about time change */
 	clock_was_set();
@@ -783,7 +781,7 @@ static void timekeeping_resume(void)
 	clockevents_resume();
 	clocksource_resume();
 
-	write_seqlock_irqsave(&tk->lock, flags);
+	write_seqlock_irqsave(&timekeeper_lock, flags);
 
 	if (timespec_compare(&ts, &timekeeping_suspend_time) > 0) {
 		ts = timespec_sub(ts, timekeeping_suspend_time);
@@ -794,7 +792,7 @@ static void timekeeping_resume(void)
 	tk->ntp_error = 0;
 	timekeeping_suspended = 0;
 	timekeeping_update(tk, false);
-	write_sequnlock_irqrestore(&tk->lock, flags);
+	write_sequnlock_irqrestore(&timekeeper_lock, flags);
 
 	touch_softlockup_watchdog();
 
@@ -813,7 +811,7 @@ static int timekeeping_suspend(void)
 
 	read_persistent_clock(&timekeeping_suspend_time);
 
-	write_seqlock_irqsave(&tk->lock, flags);
+	write_seqlock_irqsave(&timekeeper_lock, flags);
 	timekeeping_forward_now(tk);
 	timekeeping_suspended = 1;
 
@@ -836,7 +834,7 @@ static int timekeeping_suspend(void)
 		timekeeping_suspend_time =
 			timespec_add(timekeeping_suspend_time, delta_delta);
 	}
-	write_sequnlock_irqrestore(&tk->lock, flags);
+	write_sequnlock_irqrestore(&timekeeper_lock, flags);
 
 	clockevents_notify(CLOCK_EVT_NOTIFY_SUSPEND, NULL);
 	clocksource_suspend();
@@ -1174,7 +1172,7 @@ static void update_wall_time(void)
 	int shift = 0, maxshift;
 	unsigned long flags;
 
-	write_seqlock_irqsave(&tk->lock, flags);
+	write_seqlock_irqsave(&timekeeper_lock, flags);
 
 	/* Make sure we're fully resumed: */
 	if (unlikely(timekeeping_suspended))
@@ -1229,7 +1227,7 @@ static void update_wall_time(void)
 	timekeeping_update(tk, false);
 
 out:
-	write_sequnlock_irqrestore(&tk->lock, flags);
+	write_sequnlock_irqrestore(&timekeeper_lock, flags);
 
 }
 
@@ -1277,13 +1275,13 @@ void get_monotonic_boottime(struct times
 	WARN_ON(timekeeping_suspended);
 
 	do {
-		seq = read_seqbegin(&tk->lock);
+		seq = read_seqbegin(&timekeeper_lock);
 		ts->tv_sec = tk->xtime_sec;
 		nsec = timekeeping_get_ns(tk);
 		tomono = tk->wall_to_monotonic;
 		sleep = tk->total_sleep_time;
 
-	} while (read_seqretry(&tk->lock, seq));
+	} while (read_seqretry(&timekeeper_lock, seq));
 
 	ts->tv_sec += tomono.tv_sec + sleep.tv_sec;
 	ts->tv_nsec = 0;
@@ -1342,10 +1340,10 @@ struct timespec current_kernel_time(void
 	unsigned long seq;
 
 	do {
-		seq = read_seqbegin(&tk->lock);
+		seq = read_seqbegin(&timekeeper_lock);
 
 		now = tk_xtime(tk);
-	} while (read_seqretry(&tk->lock, seq));
+	} while (read_seqretry(&timekeeper_lock, seq));
 
 	return now;
 }
@@ -1358,11 +1356,11 @@ struct timespec get_monotonic_coarse(voi
 	unsigned long seq;
 
 	do {
-		seq = read_seqbegin(&tk->lock);
+		seq = read_seqbegin(&timekeeper_lock);
 
 		now = tk_xtime(tk);
 		mono = tk->wall_to_monotonic;
-	} while (read_seqretry(&tk->lock, seq));
+	} while (read_seqretry(&timekeeper_lock, seq));
 
 	set_normalized_timespec(&now, now.tv_sec + mono.tv_sec,
 				now.tv_nsec + mono.tv_nsec);
@@ -1393,11 +1391,11 @@ void get_xtime_and_monotonic_and_sleep_o
 	unsigned long seq;
 
 	do {
-		seq = read_seqbegin(&tk->lock);
+		seq = read_seqbegin(&timekeeper_lock);
 		*xtim = tk_xtime(tk);
 		*wtom = tk->wall_to_monotonic;
 		*sleep = tk->total_sleep_time;
-	} while (read_seqretry(&tk->lock, seq));
+	} while (read_seqretry(&timekeeper_lock, seq));
 }
 
 #ifdef CONFIG_HIGH_RES_TIMERS
@@ -1417,14 +1415,14 @@ ktime_t ktime_get_update_offsets(ktime_t
 	u64 secs, nsecs;
 
 	do {
-		seq = read_seqbegin(&tk->lock);
+		seq = read_seqbegin(&timekeeper_lock);
 
 		secs = tk->xtime_sec;
 		nsecs = timekeeping_get_ns(tk);
 
 		*offs_real = tk->offs_real;
 		*offs_boot = tk->offs_boot;
-	} while (read_seqretry(&tk->lock, seq));
+	} while (read_seqretry(&timekeeper_lock, seq));
 
 	now = ktime_add_ns(ktime_set(secs, 0), nsecs);
 	now = ktime_sub(now, *offs_real);
@@ -1442,9 +1440,9 @@ ktime_t ktime_get_monotonic_offset(void)
 	struct timespec wtom;
 
 	do {
-		seq = read_seqbegin(&tk->lock);
+		seq = read_seqbegin(&timekeeper_lock);
 		wtom = tk->wall_to_monotonic;
-	} while (read_seqretry(&tk->lock, seq));
+	} while (read_seqretry(&timekeeper_lock, seq));
 
 	return timespec_to_ktime(wtom);
 }



^ permalink raw reply	[flat|nested] 12+ messages in thread

* [RFC patch 4/8] timekeeping: Split timekeeper_lock into lock and seqcount
  2013-02-21 22:51 [RFC patch 0/8] timekeeping: Implement shadow timekeeper to shorten in kernel reader side blocking Thomas Gleixner
                   ` (2 preceding siblings ...)
  2013-02-21 22:51 ` [RFC patch 3/8] timekeeping: Move lock out of timekeeper struct Thomas Gleixner
@ 2013-02-21 22:51 ` Thomas Gleixner
  2013-02-21 22:51 ` [RFC patch 5/8] timekeeping: Store cycle_last value in timekeeper struct as well Thomas Gleixner
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Thomas Gleixner @ 2013-02-21 22:51 UTC (permalink / raw)
  To: LKML
  Cc: John Stultz, Ingo Molnar, Peter Zijlstra, Eric Dumazet,
	Frederic Weisbecker

[-- Attachment #1: timekeeping-split-timekeeper-lock.patch --]
[-- Type: text/plain, Size: 13062 bytes --]

We want to shorten the seqcount write hold time. So split the seqlock
into a lock and a seqcount.

Open code the seqwrite_lock in the places which matter and drop the
sequence counter update where it's pointless.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/time/timekeeping.c |  118 +++++++++++++++++++++++++---------------------
 1 file changed, 65 insertions(+), 53 deletions(-)

Index: linux-2.6/kernel/time/timekeeping.c
===================================================================
--- linux-2.6.orig/kernel/time/timekeeping.c
+++ linux-2.6/kernel/time/timekeeping.c
@@ -26,7 +26,8 @@
 #include "tick-internal.h"
 
 static struct timekeeper timekeeper;
-static DEFINE_SEQLOCK(timekeeper_lock);
+static DEFINE_RAW_SPINLOCK(timekeeper_lock);
+static seqcount_t timekeeper_seq;
 
 /* flag for if timekeeping is suspended */
 int __read_mostly timekeeping_suspended;
@@ -189,8 +190,6 @@ static void update_pvclock_gtod(struct t
 
 /**
  * pvclock_gtod_register_notifier - register a pvclock timedata update listener
- *
- * Must hold write on timekeeper.lock
  */
 int pvclock_gtod_register_notifier(struct notifier_block *nb)
 {
@@ -198,11 +197,10 @@ int pvclock_gtod_register_notifier(struc
 	unsigned long flags;
 	int ret;
 
-	write_seqlock_irqsave(&timekeeper_lock, flags);
+	raw_spin_lock_irqsave(&timekeeper_lock, flags);
 	ret = raw_notifier_chain_register(&pvclock_gtod_chain, nb);
-	/* update timekeeping data */
 	update_pvclock_gtod(tk);
-	write_sequnlock_irqrestore(&timekeeper_lock, flags);
+	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 
 	return ret;
 }
@@ -211,23 +209,21 @@ EXPORT_SYMBOL_GPL(pvclock_gtod_register_
 /**
  * pvclock_gtod_unregister_notifier - unregister a pvclock
  * timedata update listener
- *
- * Must hold write on timekeeper.lock
  */
 int pvclock_gtod_unregister_notifier(struct notifier_block *nb)
 {
 	unsigned long flags;
 	int ret;
 
-	write_seqlock_irqsave(&timekeeper_lock, flags);
+	raw_spin_lock_irqsave(&timekeeper_lock, flags);
 	ret = raw_notifier_chain_unregister(&pvclock_gtod_chain, nb);
-	write_sequnlock_irqrestore(&timekeeper_lock, flags);
+	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 
 	return ret;
 }
 EXPORT_SYMBOL_GPL(pvclock_gtod_unregister_notifier);
 
-/* must hold write on timekeeper.lock */
+/* must hold timekeeper_lock */
 static void timekeeping_update(struct timekeeper *tk, bool clearntp)
 {
 	if (clearntp) {
@@ -281,12 +277,12 @@ int __getnstimeofday(struct timespec *ts
 	s64 nsecs = 0;
 
 	do {
-		seq = read_seqbegin(&timekeeper_lock);
+		seq = read_seqcount_begin(&timekeeper_seq);
 
 		ts->tv_sec = tk->xtime_sec;
 		nsecs = timekeeping_get_ns(tk);
 
-	} while (read_seqretry(&timekeeper_lock, seq));
+	} while (read_seqcount_retry(&timekeeper_seq, seq));
 
 	ts->tv_nsec = 0;
 	timespec_add_ns(ts, nsecs);
@@ -322,11 +318,11 @@ ktime_t ktime_get(void)
 	WARN_ON(timekeeping_suspended);
 
 	do {
-		seq = read_seqbegin(&timekeeper_lock);
+		seq = read_seqcount_begin(&timekeeper_seq);
 		secs = tk->xtime_sec + tk->wall_to_monotonic.tv_sec;
 		nsecs = timekeeping_get_ns(tk) + tk->wall_to_monotonic.tv_nsec;
 
-	} while (read_seqretry(&timekeeper_lock, seq));
+	} while (read_seqcount_retry(&timekeeper_seq, seq));
 	/*
 	 * Use ktime_set/ktime_add_ns to create a proper ktime on
 	 * 32-bit architectures without CONFIG_KTIME_SCALAR.
@@ -353,12 +349,12 @@ void ktime_get_ts(struct timespec *ts)
 	WARN_ON(timekeeping_suspended);
 
 	do {
-		seq = read_seqbegin(&timekeeper_lock);
+		seq = read_seqcount_begin(&timekeeper_seq);
 		ts->tv_sec = tk->xtime_sec;
 		nsec = timekeeping_get_ns(tk);
 		tomono = tk->wall_to_monotonic;
 
-	} while (read_seqretry(&timekeeper_lock, seq));
+	} while (read_seqcount_retry(&timekeeper_seq, seq));
 
 	ts->tv_sec += tomono.tv_sec;
 	ts->tv_nsec = 0;
@@ -386,7 +382,7 @@ void getnstime_raw_and_real(struct times
 	WARN_ON_ONCE(timekeeping_suspended);
 
 	do {
-		seq = read_seqbegin(&timekeeper_lock);
+		seq = read_seqcount_begin(&timekeeper_seq);
 
 		*ts_raw = tk->raw_time;
 		ts_real->tv_sec = tk->xtime_sec;
@@ -395,7 +391,7 @@ void getnstime_raw_and_real(struct times
 		nsecs_raw = timekeeping_get_ns_raw(tk);
 		nsecs_real = timekeeping_get_ns(tk);
 
-	} while (read_seqretry(&timekeeper_lock, seq));
+	} while (read_seqcount_retry(&timekeeper_seq, seq));
 
 	timespec_add_ns(ts_raw, nsecs_raw);
 	timespec_add_ns(ts_real, nsecs_real);
@@ -435,7 +431,8 @@ int do_settimeofday(const struct timespe
 	if (!timespec_valid_strict(tv))
 		return -EINVAL;
 
-	write_seqlock_irqsave(&timekeeper_lock, flags);
+	raw_spin_lock_irqsave(&timekeeper_lock, flags);
+	write_seqcount_begin(&timekeeper_seq);
 
 	timekeeping_forward_now(tk);
 
@@ -449,7 +446,8 @@ int do_settimeofday(const struct timespe
 
 	timekeeping_update(tk, true);
 
-	write_sequnlock_irqrestore(&timekeeper_lock, flags);
+	write_seqcount_end(&timekeeper_seq);
+	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 
 	/* signal hrtimers about time change */
 	clock_was_set();
@@ -474,7 +472,8 @@ int timekeeping_inject_offset(struct tim
 	if ((unsigned long)ts->tv_nsec >= NSEC_PER_SEC)
 		return -EINVAL;
 
-	write_seqlock_irqsave(&timekeeper_lock, flags);
+	raw_spin_lock_irqsave(&timekeeper_lock, flags);
+	write_seqcount_begin(&timekeeper_seq);
 
 	timekeeping_forward_now(tk);
 
@@ -491,7 +490,8 @@ int timekeeping_inject_offset(struct tim
 error: /* even if we error out, we forwarded the time, so call update */
 	timekeeping_update(tk, true);
 
-	write_sequnlock_irqrestore(&timekeeper_lock, flags);
+	write_seqcount_end(&timekeeper_seq);
+	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 
 	/* signal hrtimers about time change */
 	clock_was_set();
@@ -513,7 +513,8 @@ static int change_clocksource(void *data
 
 	new = (struct clocksource *) data;
 
-	write_seqlock_irqsave(&timekeeper_lock, flags);
+	raw_spin_lock_irqsave(&timekeeper_lock, flags);
+	write_seqcount_begin(&timekeeper_seq);
 
 	timekeeping_forward_now(tk);
 	if (!new->enable || new->enable(new) == 0) {
@@ -524,7 +525,8 @@ static int change_clocksource(void *data
 	}
 	timekeeping_update(tk, true);
 
-	write_sequnlock_irqrestore(&timekeeper_lock, flags);
+	write_seqcount_end(&timekeeper_seq);
+	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 
 	return 0;
 }
@@ -574,11 +576,11 @@ void getrawmonotonic(struct timespec *ts
 	s64 nsecs;
 
 	do {
-		seq = read_seqbegin(&timekeeper_lock);
+		seq = read_seqcount_begin(&timekeeper_seq);
 		nsecs = timekeeping_get_ns_raw(tk);
 		*ts = tk->raw_time;
 
-	} while (read_seqretry(&timekeeper_lock, seq));
+	} while (read_seqcount_retry(&timekeeper_seq, seq));
 
 	timespec_add_ns(ts, nsecs);
 }
@@ -594,11 +596,11 @@ int timekeeping_valid_for_hres(void)
 	int ret;
 
 	do {
-		seq = read_seqbegin(&timekeeper_lock);
+		seq = read_seqcount_begin(&timekeeper_seq);
 
 		ret = tk->clock->flags & CLOCK_SOURCE_VALID_FOR_HRES;
 
-	} while (read_seqretry(&timekeeper_lock, seq));
+	} while (read_seqcount_retry(&timekeeper_seq, seq));
 
 	return ret;
 }
@@ -613,11 +615,11 @@ u64 timekeeping_max_deferment(void)
 	u64 ret;
 
 	do {
-		seq = read_seqbegin(&timekeeper_lock);
+		seq = read_seqcount_begin(&timekeeper_seq);
 
 		ret = tk->clock->max_idle_ns;
 
-	} while (read_seqretry(&timekeeper_lock, seq));
+	} while (read_seqcount_retry(&timekeeper_seq, seq));
 
 	return ret;
 }
@@ -682,7 +684,8 @@ void __init timekeeping_init(void)
 
 	ntp_init();
 
-	write_seqlock_irqsave(&timekeeper_lock, flags);
+	raw_spin_lock_irqsave(&timekeeper_lock, flags);
+	write_seqcount_begin(&timekeeper_seq);
 	clock = clocksource_default_clock();
 	if (clock->enable)
 		clock->enable(clock);
@@ -701,7 +704,8 @@ void __init timekeeping_init(void)
 	tmp.tv_nsec = 0;
 	tk_set_sleep_time(tk, tmp);
 
-	write_sequnlock_irqrestore(&timekeeper_lock, flags);
+	write_seqcount_end(&timekeeper_seq);
+	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 }
 
 /* time in seconds when suspend began */
@@ -749,7 +753,8 @@ void timekeeping_inject_sleeptime(struct
 	if (has_persistent_clock())
 		return;
 
-	write_seqlock_irqsave(&timekeeper_lock, flags);
+	raw_spin_lock_irqsave(&timekeeper_lock, flags);
+	write_seqcount_begin(&timekeeper_seq);
 
 	timekeeping_forward_now(tk);
 
@@ -757,7 +762,8 @@ void timekeeping_inject_sleeptime(struct
 
 	timekeeping_update(tk, true);
 
-	write_sequnlock_irqrestore(&timekeeper_lock, flags);
+	write_seqcount_end(&timekeeper_seq);
+	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 
 	/* signal hrtimers about time change */
 	clock_was_set();
@@ -781,7 +787,8 @@ static void timekeeping_resume(void)
 	clockevents_resume();
 	clocksource_resume();
 
-	write_seqlock_irqsave(&timekeeper_lock, flags);
+	raw_spin_lock_irqsave(&timekeeper_lock, flags);
+	write_seqcount_begin(&timekeeper_seq);
 
 	if (timespec_compare(&ts, &timekeeping_suspend_time) > 0) {
 		ts = timespec_sub(ts, timekeeping_suspend_time);
@@ -792,7 +799,8 @@ static void timekeeping_resume(void)
 	tk->ntp_error = 0;
 	timekeeping_suspended = 0;
 	timekeeping_update(tk, false);
-	write_sequnlock_irqrestore(&timekeeper_lock, flags);
+	write_seqcount_end(&timekeeper_seq);
+	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 
 	touch_softlockup_watchdog();
 
@@ -811,7 +819,8 @@ static int timekeeping_suspend(void)
 
 	read_persistent_clock(&timekeeping_suspend_time);
 
-	write_seqlock_irqsave(&timekeeper_lock, flags);
+	raw_spin_lock_irqsave(&timekeeper_lock, flags);
+	write_seqcount_begin(&timekeeper_seq);
 	timekeeping_forward_now(tk);
 	timekeeping_suspended = 1;
 
@@ -834,7 +843,8 @@ static int timekeeping_suspend(void)
 		timekeeping_suspend_time =
 			timespec_add(timekeeping_suspend_time, delta_delta);
 	}
-	write_sequnlock_irqrestore(&timekeeper_lock, flags);
+	write_seqcount_end(&timekeeper_seq);
+	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 
 	clockevents_notify(CLOCK_EVT_NOTIFY_SUSPEND, NULL);
 	clocksource_suspend();
@@ -1172,7 +1182,8 @@ static void update_wall_time(void)
 	int shift = 0, maxshift;
 	unsigned long flags;
 
-	write_seqlock_irqsave(&timekeeper_lock, flags);
+	raw_spin_lock_irqsave(&timekeeper_lock, flags);
+	write_seqcount_begin(&timekeeper_seq);
 
 	/* Make sure we're fully resumed: */
 	if (unlikely(timekeeping_suspended))
@@ -1227,7 +1238,8 @@ static void update_wall_time(void)
 	timekeeping_update(tk, false);
 
 out:
-	write_sequnlock_irqrestore(&timekeeper_lock, flags);
+	write_seqcount_end(&timekeeper_seq);
+	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 
 }
 
@@ -1275,13 +1287,13 @@ void get_monotonic_boottime(struct times
 	WARN_ON(timekeeping_suspended);
 
 	do {
-		seq = read_seqbegin(&timekeeper_lock);
+		seq = read_seqcount_begin(&timekeeper_seq);
 		ts->tv_sec = tk->xtime_sec;
 		nsec = timekeeping_get_ns(tk);
 		tomono = tk->wall_to_monotonic;
 		sleep = tk->total_sleep_time;
 
-	} while (read_seqretry(&timekeeper_lock, seq));
+	} while (read_seqcount_retry(&timekeeper_seq, seq));
 
 	ts->tv_sec += tomono.tv_sec + sleep.tv_sec;
 	ts->tv_nsec = 0;
@@ -1340,10 +1352,10 @@ struct timespec current_kernel_time(void
 	unsigned long seq;
 
 	do {
-		seq = read_seqbegin(&timekeeper_lock);
+		seq = read_seqcount_begin(&timekeeper_seq);
 
 		now = tk_xtime(tk);
-	} while (read_seqretry(&timekeeper_lock, seq));
+	} while (read_seqcount_retry(&timekeeper_seq, seq));
 
 	return now;
 }
@@ -1356,11 +1368,11 @@ struct timespec get_monotonic_coarse(voi
 	unsigned long seq;
 
 	do {
-		seq = read_seqbegin(&timekeeper_lock);
+		seq = read_seqcount_begin(&timekeeper_seq);
 
 		now = tk_xtime(tk);
 		mono = tk->wall_to_monotonic;
-	} while (read_seqretry(&timekeeper_lock, seq));
+	} while (read_seqcount_retry(&timekeeper_seq, seq));
 
 	set_normalized_timespec(&now, now.tv_sec + mono.tv_sec,
 				now.tv_nsec + mono.tv_nsec);
@@ -1391,11 +1403,11 @@ void get_xtime_and_monotonic_and_sleep_o
 	unsigned long seq;
 
 	do {
-		seq = read_seqbegin(&timekeeper_lock);
+		seq = read_seqcount_begin(&timekeeper_seq);
 		*xtim = tk_xtime(tk);
 		*wtom = tk->wall_to_monotonic;
 		*sleep = tk->total_sleep_time;
-	} while (read_seqretry(&timekeeper_lock, seq));
+	} while (read_seqcount_retry(&timekeeper_seq, seq));
 }
 
 #ifdef CONFIG_HIGH_RES_TIMERS
@@ -1415,14 +1427,14 @@ ktime_t ktime_get_update_offsets(ktime_t
 	u64 secs, nsecs;
 
 	do {
-		seq = read_seqbegin(&timekeeper_lock);
+		seq = read_seqcount_begin(&timekeeper_seq);
 
 		secs = tk->xtime_sec;
 		nsecs = timekeeping_get_ns(tk);
 
 		*offs_real = tk->offs_real;
 		*offs_boot = tk->offs_boot;
-	} while (read_seqretry(&timekeeper_lock, seq));
+	} while (read_seqcount_retry(&timekeeper_seq, seq));
 
 	now = ktime_add_ns(ktime_set(secs, 0), nsecs);
 	now = ktime_sub(now, *offs_real);
@@ -1440,9 +1452,9 @@ ktime_t ktime_get_monotonic_offset(void)
 	struct timespec wtom;
 
 	do {
-		seq = read_seqbegin(&timekeeper_lock);
+		seq = read_seqcount_begin(&timekeeper_seq);
 		wtom = tk->wall_to_monotonic;
-	} while (read_seqretry(&timekeeper_lock, seq));
+	} while (read_seqcount_retry(&timekeeper_seq, seq));
 
 	return timespec_to_ktime(wtom);
 }



^ permalink raw reply	[flat|nested] 12+ messages in thread

* [RFC patch 5/8] timekeeping: Store cycle_last value in timekeeper struct as well
  2013-02-21 22:51 [RFC patch 0/8] timekeeping: Implement shadow timekeeper to shorten in kernel reader side blocking Thomas Gleixner
                   ` (3 preceding siblings ...)
  2013-02-21 22:51 ` [RFC patch 4/8] timekeeping: Split timekeeper_lock into lock and seqcount Thomas Gleixner
@ 2013-02-21 22:51 ` Thomas Gleixner
  2013-02-21 22:51 ` [RFC patch 6/8] timekeeping: Delay update of clock->cycle_last Thomas Gleixner
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Thomas Gleixner @ 2013-02-21 22:51 UTC (permalink / raw)
  To: LKML
  Cc: John Stultz, Ingo Molnar, Peter Zijlstra, Eric Dumazet,
	Frederic Weisbecker

[-- Attachment #1: timekeeping-store-cycle-last-in-timekeeper.patch --]
[-- Type: text/plain, Size: 1864 bytes --]

For implementing a shadow timekeeper and a split calculation/update
region we need to store the cycle_last value in the timekeeper and
update the value in the clocksource struct only in the update region.

Add the extra storage to the timekeeper.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/timekeeper_internal.h |    2 ++
 kernel/time/timekeeping.c           |    4 ++--
 2 files changed, 4 insertions(+), 2 deletions(-)

Index: linux-2.6/include/linux/timekeeper_internal.h
===================================================================
--- linux-2.6.orig/include/linux/timekeeper_internal.h
+++ linux-2.6/include/linux/timekeeper_internal.h
@@ -20,6 +20,8 @@ struct timekeeper {
 	u32			shift;
 	/* Number of clock cycles in one NTP interval. */
 	cycle_t			cycle_interval;
+	/* Last cycle value (also stored in clock->cycle_last) */
+	cycle_t			cycle_last;
 	/* Number of clock shifted nano seconds in one NTP interval. */
 	u64			xtime_interval;
 	/* shifted nano seconds left over when rounding cycle_interval */
Index: linux-2.6/kernel/time/timekeeping.c
===================================================================
--- linux-2.6.orig/kernel/time/timekeeping.c
+++ linux-2.6/kernel/time/timekeeping.c
@@ -99,7 +99,7 @@ static void tk_setup_internals(struct ti
 
 	old_clock = tk->clock;
 	tk->clock = clock;
-	clock->cycle_last = clock->read(clock);
+	tk->cycle_last = clock->cycle_last = clock->read(clock);
 
 	/* Do the ns -> cycle conversion first, using original mult */
 	tmp = NTP_INTERVAL_LENGTH;
@@ -250,7 +250,7 @@ static void timekeeping_forward_now(stru
 	clock = tk->clock;
 	cycle_now = clock->read(clock);
 	cycle_delta = (cycle_now - clock->cycle_last) & clock->mask;
-	clock->cycle_last = cycle_now;
+	tk->cycle_last = clock->cycle_last = cycle_now;
 
 	tk->xtime_nsec += cycle_delta * tk->mult;
 



^ permalink raw reply	[flat|nested] 12+ messages in thread

* [RFC patch 6/8] timekeeping: Delay update of clock->cycle_last
  2013-02-21 22:51 [RFC patch 0/8] timekeeping: Implement shadow timekeeper to shorten in kernel reader side blocking Thomas Gleixner
                   ` (4 preceding siblings ...)
  2013-02-21 22:51 ` [RFC patch 5/8] timekeeping: Store cycle_last value in timekeeper struct as well Thomas Gleixner
@ 2013-02-21 22:51 ` Thomas Gleixner
  2013-02-21 22:51 ` [RFC patch 8/8] timekeeping: Shorten seq_count region Thomas Gleixner
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Thomas Gleixner @ 2013-02-21 22:51 UTC (permalink / raw)
  To: LKML
  Cc: John Stultz, Ingo Molnar, Peter Zijlstra, Eric Dumazet,
	Frederic Weisbecker

[-- Attachment #1: timekeeping-delay-clock-cycle-last-update.patch --]
[-- Type: text/plain, Size: 1017 bytes --]

For calculating the new timekeeper values store the new cycle_last
value in the timekeeper and update the clock->cycle_last just when we
actually update the new values.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/time/timekeeping.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Index: linux-2.6/kernel/time/timekeeping.c
===================================================================
--- linux-2.6.orig/kernel/time/timekeeping.c
+++ linux-2.6/kernel/time/timekeeping.c
@@ -1120,7 +1120,7 @@ static cycle_t logarithmic_accumulation(
 
 	/* Accumulate one shifted interval */
 	offset -= interval;
-	tk->clock->cycle_last += interval;
+	tk->cycle_last += interval;
 
 	tk->xtime_nsec += tk->xtime_interval << shift;
 	accumulate_nsecs_to_secs(tk);
@@ -1235,6 +1235,8 @@ static void update_wall_time(void)
 	 */
 	accumulate_nsecs_to_secs(tk);
 
+	/* Update clock->cycle_last with the new value */
+	clock->cycle_last = tk->cycle_last;
 	timekeeping_update(tk, false);
 
 out:



^ permalink raw reply	[flat|nested] 12+ messages in thread

* [RFC patch 7/8] timekeeping: Implement a shadow timekeeper
  2013-02-21 22:51 [RFC patch 0/8] timekeeping: Implement shadow timekeeper to shorten in kernel reader side blocking Thomas Gleixner
                   ` (6 preceding siblings ...)
  2013-02-21 22:51 ` [RFC patch 8/8] timekeeping: Shorten seq_count region Thomas Gleixner
@ 2013-02-21 22:51 ` Thomas Gleixner
  2013-02-22 23:53   ` John Stultz
  2013-02-21 23:06 ` [RFC patch 0/8] timekeeping: Implement shadow timekeeper to shorten in kernel reader side blocking Eric Dumazet
  8 siblings, 1 reply; 12+ messages in thread
From: Thomas Gleixner @ 2013-02-21 22:51 UTC (permalink / raw)
  To: LKML
  Cc: John Stultz, Ingo Molnar, Peter Zijlstra, Eric Dumazet,
	Frederic Weisbecker

[-- Attachment #1: timekeeping-implement-shadow-timekeeper.patch --]
[-- Type: text/plain, Size: 4888 bytes --]

Use the shadow timekeeper to do the update_wall_time() adjustments and
then copy it over to the real timekeeper.

Keep the shadow timekeeper in sync when updating stuff outside of
update_wall_time().

This allows us to limit the timekeeper_seq hold time to the update of
the real timekeeper and the vsyscall data in the next patch.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/time/timekeeping.c |   41 +++++++++++++++++++++++++++++------------
 1 file changed, 29 insertions(+), 12 deletions(-)

Index: linux-3.6/kernel/time/timekeeping.c
===================================================================
--- linux-3.6.orig/kernel/time/timekeeping.c
+++ linux-3.6/kernel/time/timekeeping.c
@@ -28,6 +28,7 @@
 static struct timekeeper timekeeper;
 static DEFINE_RAW_SPINLOCK(timekeeper_lock);
 static seqcount_t timekeeper_seq;
+static struct timekeeper shadow_timekeeper;
 
 /* flag for if timekeeping is suspended */
 int __read_mostly timekeeping_suspended;
@@ -224,7 +225,7 @@ int pvclock_gtod_unregister_notifier(str
 EXPORT_SYMBOL_GPL(pvclock_gtod_unregister_notifier);
 
 /* must hold timekeeper_lock */
-static void timekeeping_update(struct timekeeper *tk, bool clearntp)
+static void timekeeping_update(struct timekeeper *tk, bool clearntp, bool mirror)
 {
 	if (clearntp) {
 		tk->ntp_error = 0;
@@ -232,6 +233,9 @@ static void timekeeping_update(struct ti
 	}
 	update_vsyscall(tk);
 	update_pvclock_gtod(tk);
+
+	if (mirror)
+		memcpy(&shadow_timekeeper, &timekeeper, sizeof(timekeeper));
 }
 
 /**
@@ -444,7 +448,7 @@ int do_settimeofday(const struct timespe
 
 	tk_set_xtime(tk, tv);
 
-	timekeeping_update(tk, true);
+	timekeeping_update(tk, true, true);
 
 	write_seqcount_end(&timekeeper_seq);
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
@@ -488,7 +492,7 @@ int timekeeping_inject_offset(struct tim
 	tk_set_wall_to_mono(tk, timespec_sub(tk->wall_to_monotonic, *ts));
 
 error: /* even if we error out, we forwarded the time, so call update */
-	timekeeping_update(tk, true);
+	timekeeping_update(tk, true, true);
 
 	write_seqcount_end(&timekeeper_seq);
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
@@ -523,7 +527,7 @@ static int change_clocksource(void *data
 		if (old->disable)
 			old->disable(old);
 	}
-	timekeeping_update(tk, true);
+	timekeeping_update(tk, true, true);
 
 	write_seqcount_end(&timekeeper_seq);
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
@@ -704,6 +708,8 @@ void __init timekeeping_init(void)
 	tmp.tv_nsec = 0;
 	tk_set_sleep_time(tk, tmp);
 
+	memcpy(&shadow_timekeeper, &timekeeper, sizeof(timekeeper));
+
 	write_seqcount_end(&timekeeper_seq);
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 }
@@ -760,7 +766,7 @@ void timekeeping_inject_sleeptime(struct
 
 	__timekeeping_inject_sleeptime(tk, delta);
 
-	timekeeping_update(tk, true);
+	timekeeping_update(tk, true, true);
 
 	write_seqcount_end(&timekeeper_seq);
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
@@ -798,7 +804,7 @@ static void timekeeping_resume(void)
 	tk->clock->cycle_last = tk->clock->read(tk->clock);
 	tk->ntp_error = 0;
 	timekeeping_suspended = 0;
-	timekeeping_update(tk, false);
+	timekeeping_update(tk, false, true);
 	write_seqcount_end(&timekeeper_seq);
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 
@@ -1177,7 +1183,8 @@ static inline void old_vsyscall_fixup(st
 static void update_wall_time(void)
 {
 	struct clocksource *clock;
-	struct timekeeper *tk = &timekeeper;
+	struct timekeeper *real_tk = &timekeeper;
+	struct timekeeper *tk = &shadow_timekeeper;
 	cycle_t offset;
 	int shift = 0, maxshift;
 	unsigned long flags;
@@ -1189,16 +1196,16 @@ static void update_wall_time(void)
 	if (unlikely(timekeeping_suspended))
 		goto out;
 
-	clock = tk->clock;
+	clock = real_tk->clock;
 
 #ifdef CONFIG_ARCH_USES_GETTIMEOFFSET
-	offset = tk->cycle_interval;
+	offset = real_tk->cycle_interval;
 #else
 	offset = (clock->read(clock) - clock->cycle_last) & clock->mask;
 #endif
 
 	/* Check if there's really nothing to do */
-	if (offset < tk->cycle_interval)
+	if (offset < real_tk->cycle_interval)
 		goto out;
 
 	/*
@@ -1237,12 +1244,22 @@ static void update_wall_time(void)
 
 	/* Update clock->cycle_last with the new value */
 	clock->cycle_last = tk->cycle_last;
-	timekeeping_update(tk, false);
+	/*
+	 * Update the real timekeeper.
+	 *
+	 * We could avoid this memcpy by switching pointers, but that
+	 * requires changes to all other timekeeper usage sites as
+	 * well, i.e. move the timekeeper pointer getter into the
+	 * spinlocked/seqcount protected sections. And we trade this
+	 * memcpy under the timekeeper_seq against one before we start
+	 * updating.
+	 */
+	memcpy(real_tk, tk, sizeof(*tk));
+	timekeeping_update(real_tk, false, false);
 
 out:
 	write_seqcount_end(&timekeeper_seq);
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
-
 }
 
 /**



^ permalink raw reply	[flat|nested] 12+ messages in thread

* [RFC patch 8/8] timekeeping: Shorten seq_count region
  2013-02-21 22:51 [RFC patch 0/8] timekeeping: Implement shadow timekeeper to shorten in kernel reader side blocking Thomas Gleixner
                   ` (5 preceding siblings ...)
  2013-02-21 22:51 ` [RFC patch 6/8] timekeeping: Delay update of clock->cycle_last Thomas Gleixner
@ 2013-02-21 22:51 ` Thomas Gleixner
  2013-02-21 22:51 ` [RFC patch 7/8] timekeeping: Implement a shadow timekeeper Thomas Gleixner
  2013-02-21 23:06 ` [RFC patch 0/8] timekeeping: Implement shadow timekeeper to shorten in kernel reader side blocking Eric Dumazet
  8 siblings, 0 replies; 12+ messages in thread
From: Thomas Gleixner @ 2013-02-21 22:51 UTC (permalink / raw)
  To: LKML
  Cc: John Stultz, Ingo Molnar, Peter Zijlstra, Eric Dumazet,
	Frederic Weisbecker

[-- Attachment #1: timekeeping-shorten-seq-count-region.patch --]
[-- Type: text/plain, Size: 1328 bytes --]

Shorten the seqcount write hold region to the actual update of the
timekeeper and the related data (e.g vsyscall).

On a contemporary x86 system this reduces the maximum latencies on
Preempt-RT from 8us to 4us on the non-timekeeping cores.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/time/timekeeping.c |    5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

Index: linux-3.6/kernel/time/timekeeping.c
===================================================================
--- linux-3.6.orig/kernel/time/timekeeping.c
+++ linux-3.6/kernel/time/timekeeping.c
@@ -1190,7 +1190,6 @@ static void update_wall_time(void)
 	unsigned long flags;
 
 	raw_spin_lock_irqsave(&timekeeper_lock, flags);
-	write_seqcount_begin(&timekeeper_seq);
 
 	/* Make sure we're fully resumed: */
 	if (unlikely(timekeeping_suspended))
@@ -1242,6 +1241,7 @@ static void update_wall_time(void)
 	 */
 	accumulate_nsecs_to_secs(tk);
 
+	write_seqcount_begin(&timekeeper_seq);
 	/* Update clock->cycle_last with the new value */
 	clock->cycle_last = tk->cycle_last;
 	/*
@@ -1256,9 +1256,8 @@ static void update_wall_time(void)
 	 */
 	memcpy(real_tk, tk, sizeof(*tk));
 	timekeeping_update(real_tk, false, false);
-
-out:
 	write_seqcount_end(&timekeeper_seq);
+out:
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 }
 



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC patch 0/8] timekeeping: Implement shadow timekeeper to shorten in kernel reader side blocking
  2013-02-21 22:51 [RFC patch 0/8] timekeeping: Implement shadow timekeeper to shorten in kernel reader side blocking Thomas Gleixner
                   ` (7 preceding siblings ...)
  2013-02-21 22:51 ` [RFC patch 7/8] timekeeping: Implement a shadow timekeeper Thomas Gleixner
@ 2013-02-21 23:06 ` Eric Dumazet
  8 siblings, 0 replies; 12+ messages in thread
From: Eric Dumazet @ 2013-02-21 23:06 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, John Stultz, Ingo Molnar, Peter Zijlstra, Eric Dumazet,
	Frederic Weisbecker

On Thu, 2013-02-21 at 22:51 +0000, Thomas Gleixner wrote:

> Now the obvious question whether this is worth the trouble can be
> answered easily. Preempt-RT users and HPC folks have complained about
> the long write hold time of the timekeeping seqcount since years and a
> quick test on a preempt-RT enabled kernel shows, that this series
> lowers the maximum latency on the non-timekeeping cores from 8 to 4
> microseconds. That's a whopping factor of 2. Defintely worth the
> trouble!

Thanks for doing this !



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC patch 7/8] timekeeping: Implement a shadow timekeeper
  2013-02-21 22:51 ` [RFC patch 7/8] timekeeping: Implement a shadow timekeeper Thomas Gleixner
@ 2013-02-22 23:53   ` John Stultz
  2013-02-26 12:17     ` Thomas Gleixner
  0 siblings, 1 reply; 12+ messages in thread
From: John Stultz @ 2013-02-22 23:53 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Ingo Molnar, Peter Zijlstra, Eric Dumazet, Frederic Weisbecker

On 02/21/2013 02:51 PM, Thomas Gleixner wrote:
> Use the shadow timekeeper to do the update_wall_time() adjustments and
> then copy it over to the real timekeeper.
>
> Keep the shadow timekeeper in sync when updating stuff outside of
> update_wall_time().
>
> This allows us to limit the timekeeper_seq hold time to the update of
> the real timekeeper and the vsyscall data in the next patch.
>
> Signed-off-by: Thomas Gleixner<tglx@linutronix.de>
> ---

So up to here it all looks ok to me (and not so different from my 
earlier attempts at the same).

The only gotcha here that I realized with my earlier patches, is that in 
order to do the shadow copy update properly, we are also going to need 
to merge the NTP state data into the timekeeper. Otherwise, we could run 
into odd cases where as we update the shadow copy, we change the NTP 
state which then would affect the non-shadow timekeeping state that is 
about to be updated. One example: A the leap second lands, and the tai 
offset gets bumped in the ntp state, while we do a similar counter 
adjustment to the shadow-copy. Then before the real/active timekeeper is 
updated, someone gets the tai offset and applies it to that pre-update 
timekeeper state, and gets an invalid tai time.

The down side is that the NTP state data is fairly large, and so adding 
it to the timekeeper will cause the memcopys to be a bit more painful.

I'm looking at the NTP code now to try to see if we can bound where the 
NTP state is accessed, so we can maybe thin out what ntp state is linked 
to timekeeper updates, and only move that data over to the timekeeper.

thanks
-john






^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC patch 7/8] timekeeping: Implement a shadow timekeeper
  2013-02-22 23:53   ` John Stultz
@ 2013-02-26 12:17     ` Thomas Gleixner
  0 siblings, 0 replies; 12+ messages in thread
From: Thomas Gleixner @ 2013-02-26 12:17 UTC (permalink / raw)
  To: John Stultz
  Cc: LKML, Ingo Molnar, Peter Zijlstra, Eric Dumazet, Frederic Weisbecker

On Fri, 22 Feb 2013, John Stultz wrote:

> On 02/21/2013 02:51 PM, Thomas Gleixner wrote:
> > Use the shadow timekeeper to do the update_wall_time() adjustments and
> > then copy it over to the real timekeeper.
> > 
> > Keep the shadow timekeeper in sync when updating stuff outside of
> > update_wall_time().
> > 
> > This allows us to limit the timekeeper_seq hold time to the update of
> > the real timekeeper and the vsyscall data in the next patch.
> > 
> > Signed-off-by: Thomas Gleixner<tglx@linutronix.de>
> > ---
> 
> So up to here it all looks ok to me (and not so different from my earlier
> attempts at the same).
> 
> The only gotcha here that I realized with my earlier patches, is that in order
> to do the shadow copy update properly, we are also going to need to merge the
> NTP state data into the timekeeper. Otherwise, we could run into odd cases
> where as we update the shadow copy, we change the NTP state which then would
> affect the non-shadow timekeeping state that is about to be updated. One
> example: A the leap second lands, and the tai offset gets bumped in the ntp
> state, while we do a similar counter adjustment to the shadow-copy. Then
> before the real/active timekeeper is updated, someone gets the tai offset and
> applies it to that pre-update timekeeper state, and gets an invalid tai time.
> 
> The down side is that the NTP state data is fairly large, and so adding it to
> the timekeeper will cause the memcopys to be a bit more painful.
> 
> I'm looking at the NTP code now to try to see if we can bound where the NTP
> state is accessed, so we can maybe thin out what ntp state is linked to
> timekeeper updates, and only move that data over to the timekeeper.

Hmm. Can we block the NTP data readout while we are doing the update ?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2013-02-26 12:17 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-02-21 22:51 [RFC patch 0/8] timekeeping: Implement shadow timekeeper to shorten in kernel reader side blocking Thomas Gleixner
2013-02-21 22:51 ` [RFC patch 2/8] timekeeping: Make jiffies_lock internal Thomas Gleixner
2013-02-21 22:51 ` [RFC patch 1/8] timekeeping: Calc stuff once Thomas Gleixner
2013-02-21 22:51 ` [RFC patch 3/8] timekeeping: Move lock out of timekeeper struct Thomas Gleixner
2013-02-21 22:51 ` [RFC patch 4/8] timekeeping: Split timekeeper_lock into lock and seqcount Thomas Gleixner
2013-02-21 22:51 ` [RFC patch 5/8] timekeeping: Store cycle_last value in timekeeper struct as well Thomas Gleixner
2013-02-21 22:51 ` [RFC patch 6/8] timekeeping: Delay update of clock->cycle_last Thomas Gleixner
2013-02-21 22:51 ` [RFC patch 8/8] timekeeping: Shorten seq_count region Thomas Gleixner
2013-02-21 22:51 ` [RFC patch 7/8] timekeeping: Implement a shadow timekeeper Thomas Gleixner
2013-02-22 23:53   ` John Stultz
2013-02-26 12:17     ` Thomas Gleixner
2013-02-21 23:06 ` [RFC patch 0/8] timekeeping: Implement shadow timekeeper to shorten in kernel reader side blocking Eric Dumazet

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.