All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/11] 2.6.35-stable: Fix for leapsecond deadlock & hrtimer/futex issue
@ 2012-07-17 18:05 John Stultz
  2012-07-17 18:05 ` [PATCH 01/11] 2.6.35.x: ntp: Fix leap-second hrtimer livelock John Stultz
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: John Stultz @ 2012-07-17 18:05 UTC (permalink / raw)
  To: stable; +Cc: John Stultz, Prarit Bhargava, Thomas Gleixner, Linux Kernel

Here is backport of the leapsecond fixes to 2.6.35-stable. These are less
straight forward, and should get closer review.

This patch set addresses two issues:

1) Deadlock leapsecond issue that a few reports described.

I spent some time over the weekend trying to find a way to reproduce
the hard-hang issue some folks were reporting after the leapsecond.
Initially I didn't think the 6b43ae8a619d17 leap-second hrimter livelock
patch needed to be backported since, I assumed it required the ntp_lock
split for it to be triggered, but looking again I found that the same
issue could occur prior to splitting out the ntp_lock. So I've backported
that fix (and its follow-on fixups) as well as created a test case
to reproduce the hard-hang deadlock.


2) Early hrtimer/futex expiration issue that was more widely observed

This is the load-spike issue that a number of folks saw that did not
hard hang most boxes (although some reports did show nmi-watchdogs
triggering due to sudden spinning in tight loops).

I've booted and tested this entire patchset on two boxes and run through a
number of leapsecond related stress tests. However, additional testing and
review would be appreciated. Especially as the backports get further away
from upstream.

The original commits backported in this set are:

Deadlock issue fixes:
---------------------
6b43ae8a619d17c4935c3320d2ef9e92bdeed05d    ntp: Fix leap-second hrtimer livelock
dd48d708ff3e917f6d6b6c2b696c3f18c019feed    ntp: Correct TAI offset during leap second
fad0c66c4bb836d57a5f125ecd38bed653ca863a    timekeeping: Fix CLOCK_MONOTONIC inconsistency during leapsecond

Helper change: (allows the following fixes to backport more easily):
--------------------------------------------------------------------
cc06268c6a87db156af2daed6e96a936b955cc82    time: Move common updates to a function

Hrtimer early-expiration issue fixes:
-------------------------------
f55a6faa384304c89cfef162768e88374d3312cb    hrtimer: Provide clock_was_set_delayed()
4873fa070ae84a4115f0b3c9dfabc224f1bc7c51    timekeeping: Fix leapsecond triggered load spike issue
5b9fe759a678e05be4937ddf03d50e950207c1c0    timekeeping: Maintain ktime_t based offsets for hrtimers
196951e91262fccda81147d2bcf7fdab08668b40    hrtimers: Move lock held region in hrtimer_interrupt()
f6c06abfb3972ad4914cef57d8348fcb2932bc3b    timekeeping: Provide hrtimer update function
5baefd6d84163443215f4a99f6a20f054ef11236    hrtimer: Update hrtimer base offsets each hrtimer_interrupt
3e997130bd2e8c6f5aaa49d6e3161d4d29b43ab0    timekeeping: Add missing update call in timekeeping_resume()


I've already done backports to all the stable kernels to 2.6.32, and
will send out the rest soon.

Please let me know if you have any comments or feedback. 

thanks
-john


Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>

John Stultz (5):
  2.6.35.x: ntp: Fix leap-second hrtimer livelock
  2.6.35.x: timekeeping: Fix CLOCK_MONOTONIC inconsistency during
    leapsecond
  2.6.35.x: hrtimer: Provide clock_was_set_delayed()
  2.6.35.x: timekeeping: Fix leapsecond triggered load spike issue
  2.6.35.x: hrtimer: Update hrtimer base offsets each hrtimer_interrupt

Richard Cochran (1):
  2.6.35.x: ntp: Correct TAI offset during leap second

Thomas Gleixner (5):
  2.6.35.x: time: Move common updates to a function
  2.6.35.x: timekeeping: Maintain ktime_t based offsets for hrtimers
  2.6.35.x: hrtimers: Move lock held region in hrtimer_interrupt()
  2.6.35.x: timekeeping: Provide hrtimer update function
  2.6.35.x: timekeeping: Add missing update call in
    timekeeping_resume()

 include/linux/hrtimer.h   |    9 +++-
 include/linux/timex.h     |    2 +-
 kernel/hrtimer.c          |   52 ++++++++++++-------
 kernel/time/ntp.c         |  124 +++++++++++++++------------------------------
 kernel/time/timekeeping.c |   97 +++++++++++++++++++++++++++++------
 5 files changed, 167 insertions(+), 117 deletions(-)

-- 
1.7.9.5


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 01/11] 2.6.35.x: ntp: Fix leap-second hrtimer livelock
  2012-07-17 18:05 [PATCH 00/11] 2.6.35-stable: Fix for leapsecond deadlock & hrtimer/futex issue John Stultz
@ 2012-07-17 18:05 ` John Stultz
  2012-07-17 18:05 ` [PATCH 02/11] 2.6.35.x: ntp: Correct TAI offset during leap second John Stultz
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: John Stultz @ 2012-07-17 18:05 UTC (permalink / raw)
  To: stable
  Cc: John Stultz, Sasha Levin, Thomas Gleixner, Prarit Bhargava, Linux Kernel

From: John Stultz <john.stultz@linaro.org>

This is a backport of 6b43ae8a619d17c4935c3320d2ef9e92bdeed05d

This should have been backported when it was commited, but I
mistook the problem as requiring the ntp_lock changes
that landed in 3.4 in order for it to occur.

Unfortunately the same issue can happen (with only one cpu)
as follows:
do_adjtimex()
 write_seqlock_irq(&xtime_lock);
  process_adjtimex_modes()
   process_adj_status()
    ntp_start_leap_timer()
     hrtimer_start()
      hrtimer_reprogram()
       tick_program_event()
        clockevents_program_event()
         ktime_get()
          seq = req_seqbegin(xtime_lock); [DEADLOCK]

This deadlock will no always occur, as it requires the
leap_timer to force a hrtimer_reprogram which only happens
if its set and there's no sooner timer to expire.

NOTE: This patch, being faithful to the original commit,
introduces a bug (we don't update wall_to_monotonic),
which will be resovled by backporting a following fix.

Original commit message below:

Since commit 7dffa3c673fbcf835cd7be80bb4aec8ad3f51168 the ntp
subsystem has used an hrtimer for triggering the leapsecond
adjustment. However, this can cause a potential livelock.

Thomas diagnosed this as the following pattern:
CPU 0                                                    CPU 1
do_adjtimex()
  spin_lock_irq(&ntp_lock);
    process_adjtimex_modes();				 timer_interrupt()
      process_adj_status();                                do_timer()
        ntp_start_leap_timer();                             write_lock(&xtime_lock);
          hrtimer_start();                                  update_wall_time();
             hrtimer_reprogram();                            ntp_tick_length()
               tick_program_event()                            spin_lock(&ntp_lock);
                 clockevents_program_event()
		   ktime_get()
                     seq = req_seqbegin(xtime_lock);

This patch tries to avoid the problem by reverting back to not using
an hrtimer to inject leapseconds, and instead we handle the leapsecond
processing in the second_overflow() function.

The downside to this change is that on systems that support highres
timers, the leap second processing will occur on a HZ tick boundary,
(ie: ~1-10ms, depending on HZ)  after the leap second instead of
possibly sooner (~34us in my tests w/ x86_64 lapic).

This patch applies on top of tip/timers/core.

CC: Sasha Levin <levinsasha928@gmail.com>
CC: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Diagnoised-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sasha Levin <levinsasha928@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 include/linux/timex.h     |    2 +-
 kernel/time/ntp.c         |  122 +++++++++++++++------------------------------
 kernel/time/timekeeping.c |   16 +++---
 3 files changed, 48 insertions(+), 92 deletions(-)

diff --git a/include/linux/timex.h b/include/linux/timex.h
index 32d852f..5eadb59 100644
--- a/include/linux/timex.h
+++ b/include/linux/timex.h
@@ -265,7 +265,7 @@ static inline int ntp_synced(void)
 /* Returns how long ticks are at present, in ns / 2^NTP_SCALE_SHIFT. */
 extern u64 tick_length;
 
-extern void second_overflow(void);
+extern int second_overflow(unsigned long secs);
 extern void update_ntp_one_tick(void);
 extern int do_adjtimex(struct timex *);
 
diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index c631168..67ef677 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -28,8 +28,6 @@ unsigned long			tick_nsec;
 u64				tick_length;
 static u64			tick_length_base;
 
-static struct hrtimer		leap_timer;
-
 #define MAX_TICKADJ		500LL		/* usecs */
 #define MAX_TICKADJ_SCALED \
 	(((MAX_TICKADJ * NSEC_PER_USEC) << NTP_SCALE_SHIFT) / NTP_INTERVAL_FREQ)
@@ -180,60 +178,60 @@ void ntp_clear(void)
 }
 
 /*
- * Leap second processing. If in leap-insert state at the end of the
- * day, the system clock is set back one second; if in leap-delete
- * state, the system clock is set ahead one second.
+ * this routine handles the overflow of the microsecond field
+ *
+ * The tricky bits of code to handle the accurate clock support
+ * were provided by Dave Mills (Mills@UDEL.EDU) of NTP fame.
+ * They were originally developed for SUN and DEC kernels.
+ * All the kudos should go to Dave for this stuff.
+ *
+ * Also handles leap second processing, and returns leap offset
  */
-static enum hrtimer_restart ntp_leap_second(struct hrtimer *timer)
+int second_overflow(unsigned long secs)
 {
-	enum hrtimer_restart res = HRTIMER_NORESTART;
-
-	write_seqlock(&xtime_lock);
+	int leap = 0;
+	s64 delta;
 
+	/*
+	 * Leap second processing. If in leap-insert state at the end of the
+	 * day, the system clock is set back one second; if in leap-delete
+	 * state, the system clock is set ahead one second.
+	 */
 	switch (time_state) {
 	case TIME_OK:
+		if (time_status & STA_INS)
+			time_state = TIME_INS;
+		else if (time_status & STA_DEL)
+			time_state = TIME_DEL;
 		break;
 	case TIME_INS:
-		timekeeping_leap_insert(-1);
-		time_state = TIME_OOP;
-		printk(KERN_NOTICE
-			"Clock: inserting leap second 23:59:60 UTC\n");
-		hrtimer_add_expires_ns(&leap_timer, NSEC_PER_SEC);
-		res = HRTIMER_RESTART;
+		if (secs % 86400 == 0) {
+			leap = -1;
+			time_state = TIME_OOP;
+			printk(KERN_NOTICE
+				"Clock: inserting leap second 23:59:60 UTC\n");
+		}
 		break;
 	case TIME_DEL:
-		timekeeping_leap_insert(1);
-		time_tai--;
-		time_state = TIME_WAIT;
-		printk(KERN_NOTICE
-			"Clock: deleting leap second 23:59:59 UTC\n");
+		if ((secs + 1) % 86400 == 0) {
+			leap = 1;
+			time_tai--;
+			time_state = TIME_WAIT;
+			printk(KERN_NOTICE
+				"Clock: deleting leap second 23:59:59 UTC\n");
+		}
 		break;
 	case TIME_OOP:
 		time_tai++;
 		time_state = TIME_WAIT;
-		/* fall through */
+		break;
+
 	case TIME_WAIT:
 		if (!(time_status & (STA_INS | STA_DEL)))
 			time_state = TIME_OK;
 		break;
 	}
 
-	write_sequnlock(&xtime_lock);
-
-	return res;
-}
-
-/*
- * this routine handles the overflow of the microsecond field
- *
- * The tricky bits of code to handle the accurate clock support
- * were provided by Dave Mills (Mills@UDEL.EDU) of NTP fame.
- * They were originally developed for SUN and DEC kernels.
- * All the kudos should go to Dave for this stuff.
- */
-void second_overflow(void)
-{
-	s64 delta;
 
 	/* Bump the maxerror field */
 	time_maxerror += MAXFREQ / NSEC_PER_USEC;
@@ -253,23 +251,25 @@ void second_overflow(void)
 	tick_length	+= delta;
 
 	if (!time_adjust)
-		return;
+		goto out;
 
 	if (time_adjust > MAX_TICKADJ) {
 		time_adjust -= MAX_TICKADJ;
 		tick_length += MAX_TICKADJ_SCALED;
-		return;
+		goto out;
 	}
 
 	if (time_adjust < -MAX_TICKADJ) {
 		time_adjust += MAX_TICKADJ;
 		tick_length -= MAX_TICKADJ_SCALED;
-		return;
+		goto out;
 	}
 
 	tick_length += (s64)(time_adjust * NSEC_PER_USEC / NTP_INTERVAL_FREQ)
 							 << NTP_SCALE_SHIFT;
 	time_adjust = 0;
+out:
+	return leap;
 }
 
 #ifdef CONFIG_GENERIC_CMOS_UPDATE
@@ -331,27 +331,6 @@ static void notify_cmos_timer(void)
 static inline void notify_cmos_timer(void) { }
 #endif
 
-/*
- * Start the leap seconds timer:
- */
-static inline void ntp_start_leap_timer(struct timespec *ts)
-{
-	long now = ts->tv_sec;
-
-	if (time_status & STA_INS) {
-		time_state = TIME_INS;
-		now += 86400 - now % 86400;
-		hrtimer_start(&leap_timer, ktime_set(now, 0), HRTIMER_MODE_ABS);
-
-		return;
-	}
-
-	if (time_status & STA_DEL) {
-		time_state = TIME_DEL;
-		now += 86400 - (now + 1) % 86400;
-		hrtimer_start(&leap_timer, ktime_set(now, 0), HRTIMER_MODE_ABS);
-	}
-}
 
 /*
  * Propagate a new txc->status value into the NTP state:
@@ -374,22 +353,6 @@ static inline void process_adj_status(struct timex *txc, struct timespec *ts)
 	time_status &= STA_RONLY;
 	time_status |= txc->status & ~STA_RONLY;
 
-	switch (time_state) {
-	case TIME_OK:
-		ntp_start_leap_timer(ts);
-		break;
-	case TIME_INS:
-	case TIME_DEL:
-		time_state = TIME_OK;
-		ntp_start_leap_timer(ts);
-	case TIME_WAIT:
-		if (!(time_status & (STA_INS | STA_DEL)))
-			time_state = TIME_OK;
-		break;
-	case TIME_OOP:
-		hrtimer_restart(&leap_timer);
-		break;
-	}
 }
 /*
  * Called with the xtime lock held, so we can access and modify
@@ -469,9 +432,6 @@ int do_adjtimex(struct timex *txc)
 		    (txc->tick <  900000/USER_HZ ||
 		     txc->tick > 1100000/USER_HZ))
 			return -EINVAL;
-
-		if (txc->modes & ADJ_STATUS && time_state != TIME_OK)
-			hrtimer_cancel(&leap_timer);
 	}
 
 	getnstimeofday(&ts);
@@ -549,6 +509,4 @@ __setup("ntp_tick_adj=", ntp_tick_adj_setup);
 void __init ntp_init(void)
 {
 	ntp_clear();
-	hrtimer_init(&leap_timer, CLOCK_REALTIME, HRTIMER_MODE_ABS);
-	leap_timer.function = ntp_leap_second;
 }
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index f3767c4..8c4b32e 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -169,13 +169,6 @@ struct timespec raw_time;
 /* flag for if timekeeping is suspended */
 int __read_mostly timekeeping_suspended;
 
-/* must hold xtime_lock */
-void timekeeping_leap_insert(int leapsecond)
-{
-	xtime.tv_sec += leapsecond;
-	wall_to_monotonic.tv_sec -= leapsecond;
-	update_vsyscall(&xtime, timekeeper.clock, timekeeper.mult);
-}
 
 #ifdef CONFIG_GENERIC_TIME
 
@@ -752,9 +745,11 @@ static cycle_t logarithmic_accumulation(cycle_t offset, int shift)
 
 	timekeeper.xtime_nsec += timekeeper.xtime_interval << shift;
 	while (timekeeper.xtime_nsec >= nsecps) {
+		int leap;
 		timekeeper.xtime_nsec -= nsecps;
 		xtime.tv_sec++;
-		second_overflow();
+		leap = second_overflow(xtime.tv_sec);
+		xtime.tv_sec += leap;
 	}
 
 	/* Accumulate raw time */
@@ -859,9 +854,12 @@ void update_wall_time(void)
 	 * xtime.tv_nsec isn't larger then NSEC_PER_SEC
 	 */
 	if (unlikely(xtime.tv_nsec >= NSEC_PER_SEC)) {
+		int leap;
 		xtime.tv_nsec -= NSEC_PER_SEC;
 		xtime.tv_sec++;
-		second_overflow();
+		leap = second_overflow(xtime.tv_sec);
+		xtime.tv_sec += leap;
+
 	}
 
 	/* check to see if there is a new clocksource to use */
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 02/11] 2.6.35.x: ntp: Correct TAI offset during leap second
  2012-07-17 18:05 [PATCH 00/11] 2.6.35-stable: Fix for leapsecond deadlock & hrtimer/futex issue John Stultz
  2012-07-17 18:05 ` [PATCH 01/11] 2.6.35.x: ntp: Fix leap-second hrtimer livelock John Stultz
@ 2012-07-17 18:05 ` John Stultz
  2012-07-17 18:05 ` [PATCH 03/11] 2.6.35.x: timekeeping: Fix CLOCK_MONOTONIC inconsistency during leapsecond John Stultz
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: John Stultz @ 2012-07-17 18:05 UTC (permalink / raw)
  To: stable
  Cc: Richard Cochran, Prarit Bhargava, Thomas Gleixner, Linux Kernel,
	John Stultz

From: Richard Cochran <richardcochran@gmail.com>

This is a backport of dd48d708ff3e917f6d6b6c2b696c3f18c019feed

When repeating a UTC time value during a leap second (when the UTC
time should be 23:59:60), the TAI timescale should not stop. The kernel
NTP code increments the TAI offset one second too late. This patch fixes
the issue by incrementing the offset during the leap second itself.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 kernel/time/ntp.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 67ef677..272e169 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -208,6 +208,7 @@ int second_overflow(unsigned long secs)
 		if (secs % 86400 == 0) {
 			leap = -1;
 			time_state = TIME_OOP;
+			time_tai++;
 			printk(KERN_NOTICE
 				"Clock: inserting leap second 23:59:60 UTC\n");
 		}
@@ -222,7 +223,6 @@ int second_overflow(unsigned long secs)
 		}
 		break;
 	case TIME_OOP:
-		time_tai++;
 		time_state = TIME_WAIT;
 		break;
 
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 03/11] 2.6.35.x: timekeeping: Fix CLOCK_MONOTONIC inconsistency during leapsecond
  2012-07-17 18:05 [PATCH 00/11] 2.6.35-stable: Fix for leapsecond deadlock & hrtimer/futex issue John Stultz
  2012-07-17 18:05 ` [PATCH 01/11] 2.6.35.x: ntp: Fix leap-second hrtimer livelock John Stultz
  2012-07-17 18:05 ` [PATCH 02/11] 2.6.35.x: ntp: Correct TAI offset during leap second John Stultz
@ 2012-07-17 18:05 ` John Stultz
  2012-07-17 18:05 ` [PATCH 04/11] 2.6.35.x: time: Move common updates to a function John Stultz
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: John Stultz @ 2012-07-17 18:05 UTC (permalink / raw)
  To: stable
  Cc: John Stultz, stable, Thomas Gleixner, Prarit Bhargava,
	Linux Kernel, John Stultz

From: John Stultz <john.stultz@linaro.org>

This is a backport of fad0c66c4bb836d57a5f125ecd38bed653ca863a
which resolves a bug the previous commit.

Commit 6b43ae8a61 (ntp: Fix leap-second hrtimer livelock) broke the
leapsecond update of CLOCK_MONOTONIC. The missing leapsecond update to
wall_to_monotonic causes discontinuities in CLOCK_MONOTONIC.

Adjust wall_to_monotonic when NTP inserted a leapsecond.

Reported-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Tested-by: Richard Cochran <richardcochran@gmail.com>
Cc: stable@kernel.org
Link: http://lkml.kernel.org/r/1338400497-12420-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
---
 kernel/time/timekeeping.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 8c4b32e..5960728 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -750,6 +750,7 @@ static cycle_t logarithmic_accumulation(cycle_t offset, int shift)
 		xtime.tv_sec++;
 		leap = second_overflow(xtime.tv_sec);
 		xtime.tv_sec += leap;
+		wall_to_monotonic.tv_sec -= leap;
 	}
 
 	/* Accumulate raw time */
@@ -859,7 +860,7 @@ void update_wall_time(void)
 		xtime.tv_sec++;
 		leap = second_overflow(xtime.tv_sec);
 		xtime.tv_sec += leap;
-
+		wall_to_monotonic.tv_sec -= leap;
 	}
 
 	/* check to see if there is a new clocksource to use */
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 04/11] 2.6.35.x: time: Move common updates to a function
  2012-07-17 18:05 [PATCH 00/11] 2.6.35-stable: Fix for leapsecond deadlock & hrtimer/futex issue John Stultz
                   ` (2 preceding siblings ...)
  2012-07-17 18:05 ` [PATCH 03/11] 2.6.35.x: timekeeping: Fix CLOCK_MONOTONIC inconsistency during leapsecond John Stultz
@ 2012-07-17 18:05 ` John Stultz
  2012-07-17 18:05 ` [PATCH 05/11] 2.6.35.x: hrtimer: Provide clock_was_set_delayed() John Stultz
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: John Stultz @ 2012-07-17 18:05 UTC (permalink / raw)
  To: stable
  Cc: Thomas Gleixner, Eric Dumazet, Richard Cochran, Prarit Bhargava,
	Linux Kernel, John Stultz

From: Thomas Gleixner <tglx@linutronix.de>

This is a backport of cc06268c6a87db156af2daed6e96a936b955cc82

While not a bugfix itself, it allows following fixes to backport
in a more straightforward manner.

CC: Thomas Gleixner <tglx@linutronix.de>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 kernel/time/timekeeping.c |   20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 5960728..5fbcb9a 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -166,6 +166,18 @@ static struct timespec total_sleep_time;
  */
 struct timespec raw_time;
 
+/* must hold write on xtime_lock */
+static void timekeeping_update(bool clearntp)
+{
+	if (clearntp) {
+		timekeeper.ntp_error = 0;
+		ntp_clear();
+	}
+	update_vsyscall(&xtime, timekeeper.clock, timekeeper.mult);
+}
+
+
+
 /* flag for if timekeeping is suspended */
 int __read_mostly timekeeping_suspended;
 
@@ -322,10 +334,7 @@ int do_settimeofday(struct timespec *tv)
 
 	xtime = *tv;
 
-	timekeeper.ntp_error = 0;
-	ntp_clear();
-
-	update_vsyscall(&xtime, timekeeper.clock, timekeeper.mult);
+	timekeeping_update(true);
 
 	write_sequnlock_irqrestore(&xtime_lock, flags);
 
@@ -863,8 +872,7 @@ void update_wall_time(void)
 		wall_to_monotonic.tv_sec -= leap;
 	}
 
-	/* check to see if there is a new clocksource to use */
-	update_vsyscall(&xtime, timekeeper.clock, timekeeper.mult);
+	timekeeping_update(false);
 }
 
 /**
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 05/11] 2.6.35.x: hrtimer: Provide clock_was_set_delayed()
  2012-07-17 18:05 [PATCH 00/11] 2.6.35-stable: Fix for leapsecond deadlock & hrtimer/futex issue John Stultz
                   ` (3 preceding siblings ...)
  2012-07-17 18:05 ` [PATCH 04/11] 2.6.35.x: time: Move common updates to a function John Stultz
@ 2012-07-17 18:05 ` John Stultz
  2012-07-17 18:05 ` [PATCH 06/11] 2.6.35.x: timekeeping: Fix leapsecond triggered load spike issue John Stultz
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: John Stultz @ 2012-07-17 18:05 UTC (permalink / raw)
  To: stable; +Cc: John Stultz, Thomas Gleixner, Prarit Bhargava, Linux Kernel

This is a backport of f55a6faa384304c89cfef162768e88374d3312cb

clock_was_set() cannot be called from hard interrupt context because
it calls on_each_cpu().

For fixing the widely reported leap seconds issue it is necessary to
call it from hard interrupt context, i.e. the timer tick code, which
does the timekeeping updates.

Provide a new function which denotes it in the hrtimer cpu base
structure of the cpu on which it is called and raise the hrtimer
softirq. We then execute the clock_was_set() notificiation from
softirq context in run_hrtimer_softirq(). The hrtimer softirq is
rarely used, so polling the flag there is not a performance issue.

[ tglx: Made it depend on CONFIG_HIGH_RES_TIMERS. We really should get
  rid of all this ifdeffery ASAP ]

Signed-off-by: John Stultz <johnstul@us.ibm.com>
Reported-by: Jan Engelhardt <jengelh@inai.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Prarit Bhargava <prarit@redhat.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1341960205-56738-2-git-send-email-johnstul@us.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
---
 include/linux/hrtimer.h |    7 +++++++
 kernel/hrtimer.c        |   20 ++++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index fd0c1b8..39d5222 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -159,6 +159,7 @@ struct hrtimer_clock_base {
  *			and timers
  * @clock_base:		array of clock bases for this cpu
  * @curr_timer:		the timer which is executing a callback right now
+ * @clock_was_set:	Indicates that clock was set from irq context.
  * @expires_next:	absolute time of the next event which was scheduled
  *			via clock_set_next_event()
  * @hres_active:	State of high resolution mode
@@ -171,6 +172,7 @@ struct hrtimer_clock_base {
 struct hrtimer_cpu_base {
 	raw_spinlock_t			lock;
 	struct hrtimer_clock_base	clock_base[HRTIMER_MAX_CLOCK_BASES];
+	unsigned int			clock_was_set;
 #ifdef CONFIG_HIGH_RES_TIMERS
 	ktime_t				expires_next;
 	int				hres_active;
@@ -280,6 +282,8 @@ extern void hrtimer_peek_ahead_timers(void);
 # define MONOTONIC_RES_NSEC	HIGH_RES_NSEC
 # define KTIME_MONOTONIC_RES	KTIME_HIGH_RES
 
+extern void clock_was_set_delayed(void);
+
 #else
 
 # define MONOTONIC_RES_NSEC	LOW_RES_NSEC
@@ -308,6 +312,9 @@ static inline int hrtimer_is_hres_active(struct hrtimer *timer)
 {
 	return 0;
 }
+
+static inline void clock_was_set_delayed(void) { }
+
 #endif
 
 extern ktime_t ktime_get(void);
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index 21e0c5e..c1f7d12 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -738,6 +738,19 @@ static int hrtimer_switch_to_hres(void)
 	return 1;
 }
 
+/*
+ * Called from timekeeping code to reprogramm the hrtimer interrupt
+ * device. If called from the timer interrupt context we defer it to
+ * softirq context.
+ */
+void clock_was_set_delayed(void)
+{
+	struct hrtimer_cpu_base *cpu_base = &__get_cpu_var(hrtimer_bases);
+
+	cpu_base->clock_was_set = 1;
+	__raise_softirq_irqoff(HRTIMER_SOFTIRQ);
+}
+
 #else
 
 static inline int hrtimer_hres_active(void) { return 0; }
@@ -1409,6 +1422,13 @@ void hrtimer_peek_ahead_timers(void)
 
 static void run_hrtimer_softirq(struct softirq_action *h)
 {
+	struct hrtimer_cpu_base *cpu_base = &__get_cpu_var(hrtimer_bases);
+
+	if (cpu_base->clock_was_set) {
+		cpu_base->clock_was_set = 0;
+		clock_was_set();
+	}
+
 	hrtimer_peek_ahead_timers();
 }
 
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 06/11] 2.6.35.x: timekeeping: Fix leapsecond triggered load spike issue
  2012-07-17 18:05 [PATCH 00/11] 2.6.35-stable: Fix for leapsecond deadlock & hrtimer/futex issue John Stultz
                   ` (4 preceding siblings ...)
  2012-07-17 18:05 ` [PATCH 05/11] 2.6.35.x: hrtimer: Provide clock_was_set_delayed() John Stultz
@ 2012-07-17 18:05 ` John Stultz
  2012-07-17 18:05 ` [PATCH 07/11] 2.6.35.x: timekeeping: Maintain ktime_t based offsets for hrtimers John Stultz
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: John Stultz @ 2012-07-17 18:05 UTC (permalink / raw)
  To: stable; +Cc: John Stultz, Thomas Gleixner, Prarit Bhargava, Linux Kernel

This is a backport of 4873fa070ae84a4115f0b3c9dfabc224f1bc7c51

The timekeeping code misses an update of the hrtimer subsystem after a
leap second happened. Due to that timers based on CLOCK_REALTIME are
either expiring a second early or late depending on whether a leap
second has been inserted or deleted until an operation is initiated
which causes that update. Unless the update happens by some other
means this discrepancy between the timekeeping and the hrtimer data
stays forever and timers are expired either early or late.

The reported immediate workaround - $ data -s "`date`" - is causing a
call to clock_was_set() which updates the hrtimer data structures.
See: http://www.sheeri.com/content/mysql-and-leap-second-high-cpu-and-fix

Add the missing clock_was_set() call to update_wall_time() in case of
a leap second event. The actual update is deferred to softirq context
as the necessary smp function call cannot be invoked from hard
interrupt context.

Signed-off-by: John Stultz <johnstul@us.ibm.com>
Reported-by: Jan Engelhardt <jengelh@inai.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Prarit Bhargava <prarit@redhat.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1341960205-56738-3-git-send-email-johnstul@us.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
---
 kernel/time/timekeeping.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 5fbcb9a..eaf1e09 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -760,6 +760,8 @@ static cycle_t logarithmic_accumulation(cycle_t offset, int shift)
 		leap = second_overflow(xtime.tv_sec);
 		xtime.tv_sec += leap;
 		wall_to_monotonic.tv_sec -= leap;
+		if (leap)
+			clock_was_set_delayed();
 	}
 
 	/* Accumulate raw time */
@@ -870,6 +872,8 @@ void update_wall_time(void)
 		leap = second_overflow(xtime.tv_sec);
 		xtime.tv_sec += leap;
 		wall_to_monotonic.tv_sec -= leap;
+		if (leap)
+			clock_was_set_delayed();
 	}
 
 	timekeeping_update(false);
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 07/11] 2.6.35.x: timekeeping: Maintain ktime_t based offsets for hrtimers
  2012-07-17 18:05 [PATCH 00/11] 2.6.35-stable: Fix for leapsecond deadlock & hrtimer/futex issue John Stultz
                   ` (5 preceding siblings ...)
  2012-07-17 18:05 ` [PATCH 06/11] 2.6.35.x: timekeeping: Fix leapsecond triggered load spike issue John Stultz
@ 2012-07-17 18:05 ` John Stultz
  2012-07-17 18:05 ` [PATCH 08/11] 2.6.35.x: hrtimers: Move lock held region in hrtimer_interrupt() John Stultz
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: John Stultz @ 2012-07-17 18:05 UTC (permalink / raw)
  To: stable; +Cc: Thomas Gleixner, John Stultz, Prarit Bhargava, Linux Kernel

From: Thomas Gleixner <tglx@linutronix.de>

This is a backport of 5b9fe759a678e05be4937ddf03d50e950207c1c0

We need to update the hrtimer clock offsets from the hrtimer interrupt
context. To avoid conversions from timespec to ktime_t maintain a
ktime_t based representation of those offsets in the timekeeper. This
puts the conversion overhead into the code which updates the
underlying offsets and provides fast accessible values in the hrtimer
interrupt.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Prarit Bhargava <prarit@redhat.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1341960205-56738-4-git-send-email-johnstul@us.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
---
 kernel/time/timekeeping.c |   25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index eaf1e09..1f8140e 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -161,18 +161,34 @@ struct timespec xtime __attribute__ ((aligned (16)));
 struct timespec wall_to_monotonic __attribute__ ((aligned (16)));
 static struct timespec total_sleep_time;
 
+/* Offset clock monotonic -> clock realtime */
+static ktime_t offs_real;
+
+/* Offset clock monotonic -> clock boottime */
+static ktime_t offs_boot;
+
 /*
  * The raw monotonic time for the CLOCK_MONOTONIC_RAW posix clock.
  */
 struct timespec raw_time;
 
 /* must hold write on xtime_lock */
+static void update_rt_offset(void)
+{
+	struct timespec tmp, *wtm = &wall_to_monotonic;
+
+	set_normalized_timespec(&tmp, -wtm->tv_sec, -wtm->tv_nsec);
+	offs_real = timespec_to_ktime(tmp);
+}
+
+/* must hold write on xtime_lock */
 static void timekeeping_update(bool clearntp)
 {
 	if (clearntp) {
 		timekeeper.ntp_error = 0;
 		ntp_clear();
 	}
+	update_rt_offset();
 	update_vsyscall(&xtime, timekeeper.clock, timekeeper.mult);
 }
 
@@ -556,6 +572,7 @@ void __init timekeeping_init(void)
 	}
 	set_normalized_timespec(&wall_to_monotonic,
 				-boot.tv_sec, -boot.tv_nsec);
+	update_rt_offset();
 	total_sleep_time.tv_sec = 0;
 	total_sleep_time.tv_nsec = 0;
 	write_sequnlock_irqrestore(&xtime_lock, flags);
@@ -564,6 +581,12 @@ void __init timekeeping_init(void)
 /* time in seconds when suspend began */
 static struct timespec timekeeping_suspend_time;
 
+static void update_sleep_time(struct timespec t)
+{
+	total_sleep_time = t;
+	offs_boot = timespec_to_ktime(t);
+}
+
 /**
  * timekeeping_resume - Resumes the generic timekeeping subsystem.
  * @dev:	unused
@@ -587,7 +610,7 @@ static int timekeeping_resume(struct sys_device *dev)
 		ts = timespec_sub(ts, timekeeping_suspend_time);
 		xtime = timespec_add_safe(xtime, ts);
 		wall_to_monotonic = timespec_sub(wall_to_monotonic, ts);
-		total_sleep_time = timespec_add_safe(total_sleep_time, ts);
+		update_sleep_time(timespec_add_safe(total_sleep_time, ts));
 	}
 	/* re-base the last cycle value */
 	timekeeper.clock->cycle_last = timekeeper.clock->read(timekeeper.clock);
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 08/11] 2.6.35.x: hrtimers: Move lock held region in hrtimer_interrupt()
  2012-07-17 18:05 [PATCH 00/11] 2.6.35-stable: Fix for leapsecond deadlock & hrtimer/futex issue John Stultz
                   ` (6 preceding siblings ...)
  2012-07-17 18:05 ` [PATCH 07/11] 2.6.35.x: timekeeping: Maintain ktime_t based offsets for hrtimers John Stultz
@ 2012-07-17 18:05 ` John Stultz
  2012-07-17 18:05 ` [PATCH 09/11] 2.6.35.x: timekeeping: Provide hrtimer update function John Stultz
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: John Stultz @ 2012-07-17 18:05 UTC (permalink / raw)
  To: stable; +Cc: Thomas Gleixner, John Stultz, Prarit Bhargava, Linux Kernel

From: Thomas Gleixner <tglx@linutronix.de>

This is a backport of 196951e91262fccda81147d2bcf7fdab08668b40

We need to update the base offsets from this code and we need to do
that under base->lock. Move the lock held region around the
ktime_get() calls. The ktime_get() calls are going to be replaced with
a function which gets the time and the offsets atomically.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Prarit Bhargava <prarit@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Link: http://lkml.kernel.org/r/1341960205-56738-6-git-send-email-johnstul@us.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
---
 kernel/hrtimer.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index c1f7d12..2f05eb1 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1279,11 +1279,10 @@ void hrtimer_interrupt(struct clock_event_device *dev)
 	cpu_base->nr_events++;
 	dev->next_event.tv64 = KTIME_MAX;
 
+	raw_spin_lock(&cpu_base->lock);
 	entry_time = now = ktime_get();
 retry:
 	expires_next.tv64 = KTIME_MAX;
-
-	raw_spin_lock(&cpu_base->lock);
 	/*
 	 * We set expires_next to KTIME_MAX here with cpu_base->lock
 	 * held to prevent that a timer is enqueued in our queue via
@@ -1358,6 +1357,7 @@ retry:
 	 * interrupt routine. We give it 3 attempts to avoid
 	 * overreacting on some spurious event.
 	 */
+	raw_spin_lock(&cpu_base->lock);
 	now = ktime_get();
 	cpu_base->nr_retries++;
 	if (++retries < 3)
@@ -1370,6 +1370,7 @@ retry:
 	 */
 	cpu_base->nr_hangs++;
 	cpu_base->hang_detected = 1;
+	raw_spin_unlock(&cpu_base->lock);
 	delta = ktime_sub(now, entry_time);
 	if (delta.tv64 > cpu_base->max_hang_time.tv64)
 		cpu_base->max_hang_time = delta;
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 09/11] 2.6.35.x: timekeeping: Provide hrtimer update function
  2012-07-17 18:05 [PATCH 00/11] 2.6.35-stable: Fix for leapsecond deadlock & hrtimer/futex issue John Stultz
                   ` (7 preceding siblings ...)
  2012-07-17 18:05 ` [PATCH 08/11] 2.6.35.x: hrtimers: Move lock held region in hrtimer_interrupt() John Stultz
@ 2012-07-17 18:05 ` John Stultz
  2012-07-17 18:05 ` [PATCH 10/11] 2.6.35.x: hrtimer: Update hrtimer base offsets each hrtimer_interrupt John Stultz
  2012-07-17 18:05 ` [PATCH 11/11] 2.6.35.x: timekeeping: Add missing update call in timekeeping_resume() John Stultz
  10 siblings, 0 replies; 12+ messages in thread
From: John Stultz @ 2012-07-17 18:05 UTC (permalink / raw)
  To: stable; +Cc: Thomas Gleixner, John Stultz, Prarit Bhargava, Linux Kernel

From: Thomas Gleixner <tglx@linutronix.de>

This is a backport of f6c06abfb3972ad4914cef57d8348fcb2932bc3b

To finally fix the infamous leap second issue and other race windows
caused by functions which change the offsets between the various time
bases (CLOCK_MONOTONIC, CLOCK_REALTIME and CLOCK_BOOTTIME) we need a
function which atomically gets the current monotonic time and updates
the offsets of CLOCK_REALTIME and CLOCK_BOOTTIME with minimalistic
overhead. The previous patch which provides ktime_t offsets allows us
to make this function almost as cheap as ktime_get() which is going to
be replaced in hrtimer_interrupt().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Prarit Bhargava <prarit@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Link: http://lkml.kernel.org/r/1341960205-56738-7-git-send-email-johnstul@us.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
---
 include/linux/hrtimer.h   |    2 +-
 kernel/time/timekeeping.c |   32 ++++++++++++++++++++++++++++++++
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 39d5222..568d88f 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -319,7 +319,7 @@ static inline void clock_was_set_delayed(void) { }
 
 extern ktime_t ktime_get(void);
 extern ktime_t ktime_get_real(void);
-
+extern ktime_t ktime_get_update_offsets(ktime_t *offs_real);
 
 DECLARE_PER_CPU(struct tick_device, tick_cpu_device);
 
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 1f8140e..2461fe4 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -976,3 +976,35 @@ struct timespec get_monotonic_coarse(void)
 				now.tv_nsec + mono.tv_nsec);
 	return now;
 }
+
+#ifdef CONFIG_HIGH_RES_TIMERS
+/**
+ * ktime_get_update_offsets - hrtimer helper
+ * @real:	pointer to storage for monotonic -> realtime offset
+ *
+ * Returns current monotonic time and updates the offsets
+ * Called from hrtimer_interupt() or retrigger_next_event()
+ */
+ktime_t ktime_get_update_offsets(ktime_t *real)
+{
+	ktime_t now;
+	unsigned int seq;
+	u64 secs, nsecs;
+
+	do {
+		seq = read_seqbegin(&xtime_lock);
+
+		secs = xtime.tv_sec;
+		nsecs = xtime.tv_nsec;
+		nsecs += timekeeping_get_ns();
+		/* If arch requires, add in gettimeoffset() */
+		nsecs += arch_gettimeoffset();
+
+		*real = offs_real;
+	} while (read_seqretry(&xtime_lock, seq));
+
+	now = ktime_add_ns(ktime_set(secs, 0), nsecs);
+	now = ktime_sub(now, *real);
+	return now;
+}
+#endif
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 10/11] 2.6.35.x: hrtimer: Update hrtimer base offsets each hrtimer_interrupt
  2012-07-17 18:05 [PATCH 00/11] 2.6.35-stable: Fix for leapsecond deadlock & hrtimer/futex issue John Stultz
                   ` (8 preceding siblings ...)
  2012-07-17 18:05 ` [PATCH 09/11] 2.6.35.x: timekeeping: Provide hrtimer update function John Stultz
@ 2012-07-17 18:05 ` John Stultz
  2012-07-17 18:05 ` [PATCH 11/11] 2.6.35.x: timekeeping: Add missing update call in timekeeping_resume() John Stultz
  10 siblings, 0 replies; 12+ messages in thread
From: John Stultz @ 2012-07-17 18:05 UTC (permalink / raw)
  To: stable; +Cc: John Stultz, Thomas Gleixner, Prarit Bhargava, Linux Kernel

This is a backport of 5baefd6d84163443215f4a99f6a20f054ef11236

The update of the hrtimer base offsets on all cpus cannot be made
atomically from the timekeeper.lock held and interrupt disabled region
as smp function calls are not allowed there.

clock_was_set(), which enforces the update on all cpus, is called
either from preemptible process context in case of do_settimeofday()
or from the softirq context when the offset modification happened in
the timer interrupt itself due to a leap second.

In both cases there is a race window for an hrtimer interrupt between
dropping timekeeper lock, enabling interrupts and clock_was_set()
issuing the updates. Any interrupt which arrives in that window will
see the new time but operate on stale offsets.

So we need to make sure that an hrtimer interrupt always sees a
consistent state of time and offsets.

ktime_get_update_offsets() allows us to get the current monotonic time
and update the per cpu hrtimer base offsets from hrtimer_interrupt()
to capture a consistent state of monotonic time and the offsets. The
function replaces the existing ktime_get() calls in hrtimer_interrupt().

The overhead of the new function vs. ktime_get() is minimal as it just
adds two store operations.

This ensures that any changes to realtime or boottime offsets are
noticed and stored into the per-cpu hrtimer base structures, prior to
any hrtimer expiration and guarantees that timers are not expired early.

Signed-off-by: John Stultz <johnstul@us.ibm.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Prarit Bhargava <prarit@redhat.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1341960205-56738-8-git-send-email-johnstul@us.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
---
 kernel/hrtimer.c |   27 ++++++++++++---------------
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index 2f05eb1..85b5cfc 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -603,6 +603,12 @@ static int hrtimer_reprogram(struct hrtimer *timer,
 	return res;
 }
 
+static inline ktime_t hrtimer_update_base(struct hrtimer_cpu_base *base)
+{
+	ktime_t *offs_real = &base->clock_base[CLOCK_REALTIME].offset;
+
+	return ktime_get_update_offsets(offs_real);
+}
 
 /*
  * Retrigger next event is called after clock was set
@@ -612,26 +618,15 @@ static int hrtimer_reprogram(struct hrtimer *timer,
 static void retrigger_next_event(void *arg)
 {
 	struct hrtimer_cpu_base *base;
-	struct timespec realtime_offset;
-	unsigned long seq;
 
 	if (!hrtimer_hres_active())
 		return;
 
-	do {
-		seq = read_seqbegin(&xtime_lock);
-		set_normalized_timespec(&realtime_offset,
-					-wall_to_monotonic.tv_sec,
-					-wall_to_monotonic.tv_nsec);
-	} while (read_seqretry(&xtime_lock, seq));
-
 	base = &__get_cpu_var(hrtimer_bases);
 
 	/* Adjust CLOCK_REALTIME offset */
 	raw_spin_lock(&base->lock);
-	base->clock_base[CLOCK_REALTIME].offset =
-		timespec_to_ktime(realtime_offset);
-
+	hrtimer_update_base(base);
 	hrtimer_force_reprogram(base, 0);
 	raw_spin_unlock(&base->lock);
 }
@@ -731,7 +726,6 @@ static int hrtimer_switch_to_hres(void)
 	base->clock_base[CLOCK_MONOTONIC].resolution = KTIME_HIGH_RES;
 
 	tick_setup_sched_timer();
-
 	/* "Retrigger" the interrupt to get things going */
 	retrigger_next_event(NULL);
 	local_irq_restore(flags);
@@ -1280,7 +1274,7 @@ void hrtimer_interrupt(struct clock_event_device *dev)
 	dev->next_event.tv64 = KTIME_MAX;
 
 	raw_spin_lock(&cpu_base->lock);
-	entry_time = now = ktime_get();
+	entry_time = now = hrtimer_update_base(cpu_base);
 retry:
 	expires_next.tv64 = KTIME_MAX;
 	/*
@@ -1356,9 +1350,12 @@ retry:
 	 * We need to prevent that we loop forever in the hrtimer
 	 * interrupt routine. We give it 3 attempts to avoid
 	 * overreacting on some spurious event.
+	 *
+	 * Acquire base lock for updating the offsets and retrieving
+	 * the current time.
 	 */
 	raw_spin_lock(&cpu_base->lock);
-	now = ktime_get();
+	now = hrtimer_update_base(cpu_base);
 	cpu_base->nr_retries++;
 	if (++retries < 3)
 		goto retry;
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 11/11] 2.6.35.x: timekeeping: Add missing update call in timekeeping_resume()
  2012-07-17 18:05 [PATCH 00/11] 2.6.35-stable: Fix for leapsecond deadlock & hrtimer/futex issue John Stultz
                   ` (9 preceding siblings ...)
  2012-07-17 18:05 ` [PATCH 10/11] 2.6.35.x: hrtimer: Update hrtimer base offsets each hrtimer_interrupt John Stultz
@ 2012-07-17 18:05 ` John Stultz
  10 siblings, 0 replies; 12+ messages in thread
From: John Stultz @ 2012-07-17 18:05 UTC (permalink / raw)
  To: stable
  Cc: Thomas Gleixner, LKML, Linux PM list, John Stultz, Ingo Molnar,
	Peter Zijlstra, Prarit Bhargava, Linus Torvalds

From: Thomas Gleixner <tglx@linutronix.de>

This is a backport of 3e997130bd2e8c6f5aaa49d6e3161d4d29b43ab0

The leap second rework unearthed another issue of inconsistent data.

On timekeeping_resume() the timekeeper data is updated, but nothing
calls timekeeping_update(), so now the update code in the timer
interrupt sees stale values.

This has been the case before those changes, but then the timer
interrupt was using stale data as well so this went unnoticed for quite
some time.

Add the missing update call, so all the data is consistent everywhere.

Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Reported-and-tested-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Reported-and-tested-by: Martin Steigerwald <Martin@lichtvoll.de>
Cc: LKML <linux-kernel@vger.kernel.org>
Cc: Linux PM list <linux-pm@vger.kernel.org>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
---
 kernel/time/timekeeping.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 2461fe4..782f13d 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -616,6 +616,7 @@ static int timekeeping_resume(struct sys_device *dev)
 	timekeeper.clock->cycle_last = timekeeper.clock->read(timekeeper.clock);
 	timekeeper.ntp_error = 0;
 	timekeeping_suspended = 0;
+	timekeeping_update(false);
 	write_sequnlock_irqrestore(&xtime_lock, flags);
 
 	touch_softlockup_watchdog();
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2012-07-17 18:12 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-17 18:05 [PATCH 00/11] 2.6.35-stable: Fix for leapsecond deadlock & hrtimer/futex issue John Stultz
2012-07-17 18:05 ` [PATCH 01/11] 2.6.35.x: ntp: Fix leap-second hrtimer livelock John Stultz
2012-07-17 18:05 ` [PATCH 02/11] 2.6.35.x: ntp: Correct TAI offset during leap second John Stultz
2012-07-17 18:05 ` [PATCH 03/11] 2.6.35.x: timekeeping: Fix CLOCK_MONOTONIC inconsistency during leapsecond John Stultz
2012-07-17 18:05 ` [PATCH 04/11] 2.6.35.x: time: Move common updates to a function John Stultz
2012-07-17 18:05 ` [PATCH 05/11] 2.6.35.x: hrtimer: Provide clock_was_set_delayed() John Stultz
2012-07-17 18:05 ` [PATCH 06/11] 2.6.35.x: timekeeping: Fix leapsecond triggered load spike issue John Stultz
2012-07-17 18:05 ` [PATCH 07/11] 2.6.35.x: timekeeping: Maintain ktime_t based offsets for hrtimers John Stultz
2012-07-17 18:05 ` [PATCH 08/11] 2.6.35.x: hrtimers: Move lock held region in hrtimer_interrupt() John Stultz
2012-07-17 18:05 ` [PATCH 09/11] 2.6.35.x: timekeeping: Provide hrtimer update function John Stultz
2012-07-17 18:05 ` [PATCH 10/11] 2.6.35.x: hrtimer: Update hrtimer base offsets each hrtimer_interrupt John Stultz
2012-07-17 18:05 ` [PATCH 11/11] 2.6.35.x: timekeeping: Add missing update call in timekeeping_resume() John Stultz

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.