[RFC][PATCH 0/4] Fixes for leapsecond expiring early ABS_TIME CLOCK

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [RFC][PATCH 0/4] Fixes for leapsecond expiring early ABS_TIME CLOCK_REALTIME timers
@ 2015-05-29 20:24 John Stultz
  2015-05-29 20:24 ` [RFC][PATCH 1/4] selftests: timers: Add leap-second timer edge testing to leap-a-day.c John Stultz
                   ` (6 more replies)
  0 siblings, 7 replies; 40+ messages in thread
From: John Stultz @ 2015-05-29 20:24 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Prarit Bhargava, Daniel Bristot de Oliveira,
	Richard Cochran, Jan Kara, Jiri Bohac, Thomas Gleixner,
	Ingo Molnar, Shuah Khan

As Prarit reported here:
https://lkml.org/lkml/2015/5/27/458

Since the leapsecond is applied at timer tick time, and not
the actual second edge, ABS_TIME CLOCK_REALTIME timers set for
right after the leapsecond could fire a second early, since
some timers may be expired before we trigger the timekeeping
timer, which then applies the leapsecond.

Thus this patch series tries to address this isssue, including
extending the leap-a-day test to catch this problem, as well
as other relevant fixups I found while working on the code.

This series has only had limited testing, so I wanted to send
it out for initial review and comment. Folks can grab this tree
via git for testing here:
https://git.linaro.org/people/john.stultz/linux.git dev/early-leap-timer

Thougths and feedback would be appreciated!
thanks
-john

Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jiri Bohac <jbohac@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Shuah Khan <shuahkh@osg.samsung.com>

John Stultz (4):
  selftests: timers: Add leap-second timer edge testing to leap-a-day.c
  timer_list: Add the base offset so remaining nsecs are accurate for
    non monotonic timers
  ntp: Use printk_deferred in leapsecond path
  time: Do leapsecond adjustment in gettime fastpaths

 include/linux/time64.h                      |  1 +
 include/linux/timekeeper_internal.h         |  7 +++
 kernel/time/ntp.c                           | 77 ++++++++++++++++++++---
 kernel/time/ntp_internal.h                  |  1 +
 kernel/time/timekeeping.c                   | 97 +++++++++++++++++++++++++----
 kernel/time/timer_list.c                    |  2 +-
 tools/testing/selftests/timers/leap-a-day.c | 76 ++++++++++++++++++++--
 7 files changed, 234 insertions(+), 27 deletions(-)

-- 
1.9.1


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [RFC][PATCH 1/4] selftests: timers: Add leap-second timer edge testing to leap-a-day.c
  2015-05-29 20:24 [RFC][PATCH 0/4] Fixes for leapsecond expiring early ABS_TIME CLOCK_REALTIME timers John Stultz
@ 2015-05-29 20:24 ` John Stultz
  2015-05-29 20:24 ` [RFC][PATCH 2/4] timer_list: Add the base offset so remaining nsecs are accurate for non monotonic timers John Stultz
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 40+ messages in thread
From: John Stultz @ 2015-05-29 20:24 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Prarit Bhargava, Daniel Bristot de Oliveira,
	Richard Cochran, Jan Kara, Jiri Bohac, Thomas Gleixner,
	Ingo Molnar, Shuah Khan

Prarit reported an issue w/ timers around the leapsecond, where
a timer set for Midnight UTC (00:00:00) might fire a second early
right before the leapsecond (23:59:60 - though it appears as a
repeated 23:59:59) is applied.

So I've updated the leap-a-day.c test to integrate a similar
test, where we set a timer and check if it triggers at the
right time, and if the ntp state transition is managed
properly.

Currently this test will fail. But following patches correct
the issue.

Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jiri Bohac <jbohac@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Reported-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Reported-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 tools/testing/selftests/timers/leap-a-day.c | 76 +++++++++++++++++++++++++++--
 1 file changed, 72 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/timers/leap-a-day.c b/tools/testing/selftests/timers/leap-a-day.c
index b8272e6..331c4f7 100644
--- a/tools/testing/selftests/timers/leap-a-day.c
+++ b/tools/testing/selftests/timers/leap-a-day.c
@@ -44,6 +44,7 @@
 #include <time.h>
 #include <sys/time.h>
 #include <sys/timex.h>
+#include <sys/errno.h>
 #include <string.h>
 #include <signal.h>
 #include <unistd.h>
@@ -63,6 +64,9 @@ static inline int ksft_exit_fail(void)
 #define NSEC_PER_SEC 1000000000ULL
 #define CLOCK_TAI 11
 
+time_t next_leap;
+int error_found;
+
 /* returns 1 if a <= b, 0 otherwise */
 static inline int in_order(struct timespec a, struct timespec b)
 {
@@ -134,6 +138,34 @@ void handler(int unused)
 	exit(0);
 }
 
+void sigalarm(int signo)
+{
+	struct timex tx;
+	char buf[26];
+	int ret;
+
+	tx.modes = 0;
+	ret = adjtimex(&tx);
+
+	ctime_r(&tx.time.tv_sec, buf);
+	buf[strlen(buf)-1] = 0; /*remove trailing\n */
+	printf("%s + %6ld us (%i)\t%s - TIMER FIRED\n",
+					buf,
+					tx.time.tv_usec,
+					tx.tai,
+					time_state_str(ret));
+
+	if (tx.time.tv_sec < next_leap) {
+		printf("Error: Early timer expiration!\n");
+		error_found = 1;
+	}
+	if (ret != TIME_WAIT) {
+		printf("Error: Incorrect NTP state?\n");
+		error_found = 1;
+	}
+}
+
+
 /* Test for known hrtimer failure */
 void test_hrtimer_failure(void)
 {
@@ -144,12 +176,19 @@ void test_hrtimer_failure(void)
 	clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME, &target, NULL);
 	clock_gettime(CLOCK_REALTIME, &now);
 
-	if (!in_order(target, now))
+	if (!in_order(target, now)) {
 		printf("ERROR: hrtimer early expiration failure observed.\n");
+		error_found = 1;
+	}
 }
 
 int main(int argc, char **argv)
 {
+	timer_t tm1;
+	struct itimerspec its1;
+	struct sigevent se;
+	struct sigaction act;
+	int signum = SIGRTMAX;
 	int settime = 0;
 	int tai_time = 0;
 	int insert = 1;
@@ -191,6 +230,12 @@ int main(int argc, char **argv)
 	signal(SIGINT, handler);
 	signal(SIGKILL, handler);
 
+	/* Set up timer signal handler: */
+	sigfillset(&act.sa_mask);
+	act.sa_flags = 0;
+	act.sa_handler = sigalarm;
+	sigaction(signum, &act, NULL);
+
 	if (iterations < 0)
 		printf("This runs continuously. Press ctrl-c to stop\n");
 	else
@@ -201,7 +246,7 @@ int main(int argc, char **argv)
 		int ret;
 		struct timespec ts;
 		struct timex tx;
-		time_t now, next_leap;
+		time_t now;
 
 		/* Get the current time */
 		clock_gettime(CLOCK_REALTIME, &ts);
@@ -251,10 +296,27 @@ int main(int argc, char **argv)
 
 		printf("Scheduling leap second for %s", ctime(&next_leap));
 
+		/* Set up timer */
+		printf("Setting timer for %s", ctime(&next_leap));
+		memset(&se, 0, sizeof(se));
+		se.sigev_notify = SIGEV_SIGNAL;
+		se.sigev_signo = signum;
+		se.sigev_value.sival_int = 0;
+		if (timer_create(CLOCK_REALTIME, &se, &tm1) == -1) {
+			printf("Error: timer_create failed\n");
+			return ksft_exit_fail();
+		}
+		its1.it_value.tv_sec = next_leap;
+		its1.it_value.tv_nsec = 0;
+		its1.it_interval.tv_sec = 0;
+		its1.it_interval.tv_nsec = 0;
+		timer_settime(tm1, TIMER_ABSTIME, &its1, NULL);
+
 		/* Wake up 3 seconds before leap */
 		ts.tv_sec = next_leap - 3;
 		ts.tv_nsec = 0;
 
+
 		while (clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME, &ts, NULL))
 			printf("Something woke us up, returning to sleep\n");
 
@@ -276,6 +338,7 @@ int main(int argc, char **argv)
 		while (now < next_leap + 2) {
 			char buf[26];
 			struct timespec tai;
+			int ret;
 
 			tx.modes = 0;
 			ret = adjtimex(&tx);
@@ -308,8 +371,13 @@ int main(int argc, char **argv)
 		/* Note if kernel has known hrtimer failure */
 		test_hrtimer_failure();
 
-		printf("Leap complete\n\n");
-
+		printf("Leap complete\n");
+		if (error_found) {
+			printf("Errors observed\n");
+			clear_time_state();
+			return ksft_exit_fail();
+		}
+		printf("\n");
 		if ((iterations != -1) && !(--iterations))
 			break;
 	}
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC][PATCH 2/4] timer_list: Add the base offset so remaining nsecs are accurate for non monotonic timers
  2015-05-29 20:24 [RFC][PATCH 0/4] Fixes for leapsecond expiring early ABS_TIME CLOCK_REALTIME timers John Stultz
  2015-05-29 20:24 ` [RFC][PATCH 1/4] selftests: timers: Add leap-second timer edge testing to leap-a-day.c John Stultz
@ 2015-05-29 20:24 ` John Stultz
  2015-05-29 20:24 ` [RFC][PATCH 3/4] ntp: Use printk_deferred in leapsecond path John Stultz
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 40+ messages in thread
From: John Stultz @ 2015-05-29 20:24 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Prarit Bhargava, Daniel Bristot de Oliveira,
	Richard Cochran, Jan Kara, Jiri Bohac, Thomas Gleixner,
	Ingo Molnar, Shuah Khan

I noticed for non-monotonic timers in timer_list, some of the
output looked a little confusing.

For example:
 #1: <0000000000000000>, posix_timer_fn, S:01, hrtimer_start_range_ns, leap-a-day/2360
 # expires at 1434412800000000000-1434412800000000000 nsecs [in 1434410725062375469 to 1434410725062375469 nsecs]

You'll note the relative time till the expiration "[in xxx to
yyy nsecs]" is incorrect. This is because its printing the delta
between CLOCK_MONOTONIC time to the CLOCK_REALTIME expiration.

This patch fixes this issue by adding the clock offset to the
"now" time which we use to calculate the delta.

Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jiri Bohac <jbohac@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 kernel/time/timer_list.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/time/timer_list.c b/kernel/time/timer_list.c
index e878c2e..05a560f 100644
--- a/kernel/time/timer_list.c
+++ b/kernel/time/timer_list.c
@@ -132,7 +132,7 @@ print_base(struct seq_file *m, struct hrtimer_clock_base *base, u64 now)
 		   (unsigned long long) ktime_to_ns(base->offset));
 #endif
 	SEQ_printf(m,   "active timers:\n");
-	print_active_timers(m, base, now);
+	print_active_timers(m, base, now + ktime_to_ns(base->offset));
 }
 
 static void print_cpu(struct seq_file *m, int cpu, u64 now)
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC][PATCH 3/4] ntp: Use printk_deferred in leapsecond path
  2015-05-29 20:24 [RFC][PATCH 0/4] Fixes for leapsecond expiring early ABS_TIME CLOCK_REALTIME timers John Stultz
  2015-05-29 20:24 ` [RFC][PATCH 1/4] selftests: timers: Add leap-second timer edge testing to leap-a-day.c John Stultz
  2015-05-29 20:24 ` [RFC][PATCH 2/4] timer_list: Add the base offset so remaining nsecs are accurate for non monotonic timers John Stultz
@ 2015-05-29 20:24 ` John Stultz
  2015-06-02 10:31   ` Jiri Bohac
  2015-05-29 20:24 ` [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths John Stultz
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 40+ messages in thread
From: John Stultz @ 2015-05-29 20:24 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Prarit Bhargava, Daniel Bristot de Oliveira,
	Richard Cochran, Jan Kara, Jiri Bohac, Thomas Gleixner,
	Ingo Molnar, Shuah Khan

Looking over the leapsecond code, I noticed the printk messages
reporting the leapsecond insertion in the second_overflow path
were not using the printk_deferred method. This was surprising
since the printk_deferred method was added in part to avoid
printk-ing while holding the timekeeping locks.

See 6d9bcb621b0b (timekeeping: use printk_deferred when holding
timekeeping seqlock) for further rational.

I can only guess that this omission was a git add -p oversight.

Folks particularly worried about leapsecond crashes should
probably pay attention to this patch. Pending review, Its likely
a -stable candidate.

Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jiri Bohac <jbohac@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 kernel/time/ntp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 7a68100..472591e 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -393,7 +393,7 @@ int second_overflow(unsigned long secs)
 		else if (secs % 86400 == 0) {
 			leap = -1;
 			time_state = TIME_OOP;
-			printk(KERN_NOTICE
+			printk_deferred(KERN_NOTICE
 				"Clock: inserting leap second 23:59:60 UTC\n");
 		}
 		break;
@@ -403,7 +403,7 @@ int second_overflow(unsigned long secs)
 		else if ((secs + 1) % 86400 == 0) {
 			leap = 1;
 			time_state = TIME_WAIT;
-			printk(KERN_NOTICE
+			printk_deferred(KERN_NOTICE
 				"Clock: deleting leap second 23:59:59 UTC\n");
 		}
 		break;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths
  2015-05-29 20:24 [RFC][PATCH 0/4] Fixes for leapsecond expiring early ABS_TIME CLOCK_REALTIME timers John Stultz
                   ` (2 preceding siblings ...)
  2015-05-29 20:24 ` [RFC][PATCH 3/4] ntp: Use printk_deferred in leapsecond path John Stultz
@ 2015-05-29 20:24 ` John Stultz
  2015-05-31 16:05   ` Richard Cochran
  2015-06-02  9:01   ` Ingo Molnar
  2015-05-31 13:55 ` [RFC][PATCH 0/4] Fixes for leapsecond expiring early ABS_TIME CLOCK_REALTIME timers Prarit Bhargava
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 40+ messages in thread
From: John Stultz @ 2015-05-29 20:24 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Prarit Bhargava, Daniel Bristot de Oliveira,
	Richard Cochran, Jan Kara, Jiri Bohac, Thomas Gleixner,
	Ingo Molnar, Shuah Khan

Currently, leapsecond adjustments are done at tick time.

As a result, the leapsecond was applied at the first timer
tick *after* the leapsecond (~1-10ms late depending on HZ),
rather then exactly on the second edge.

This was in part historical from back when we were always
tick based, but correcting this since has been avoided since
it adds extra conditional checks in the gettime fastpath,
which has performance overhead.

However, it was recently pointed out that ABS_TIME
CLOCK_REALTIME timers set for right after the leapsecond
could fire a second early, since some timers may be expired
before we trigger the timekeeping timer, which then applies
the leapsecond.

This isn't quite as bad as it sounds, since behaviorally
it is similar to what is possible w/ ntpd made leapsecond
adjustments done w/o using the kernel discipline. Where
due to latencies, timers may fire just prior to the
settimeofday call. (Also, one should note that all
applications using CLOCK_REALTIME timers should always be
careful, since they are prone to quirks from settimeofday()
disturbances.)

However, the purpose of having the kernel do the leap adjustment
is to avoid such latencies, so I think this is worth fixing.

So in order to properly keep those timers from firing a second
early, this patch modifies the gettime accessors to do the
extra checks to apply the leapsecond adjustment on the second
edge. This prevents the timer core from expiring timers too
early.

This patch does not handle VDSO time implementations, so
userspace using vdso gettime will still see the leapsecond
applied at the first timer tick after the leapsecond.
This is a bit of a tradeoff, since the performance impact
would be greatest to VDSO implementations, and since vdso
interfaces don't provide the TIME_OOP flag, one can't
distinquish the leapsecond from a time discontinuity (such
as settimeofday), so correcting the VDSO may not be as
important there.

Apologies to Richard Cochran, who pushed for such a change
years ago, which I resisted due to the concerns about the
performance overhead.

While I suspect this isn't extremely critical, folks who
care about strict leap-second correctness will likely
want to watch this, and it will likely be a -stable candidate.

Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jiri Bohac <jbohac@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Originally-suggested-by: Richard Cochran <richardcochran@gmail.com>
Reported-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Reported-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 include/linux/time64.h              |  1 +
 include/linux/timekeeper_internal.h |  7 +++
 kernel/time/ntp.c                   | 73 +++++++++++++++++++++++++---
 kernel/time/ntp_internal.h          |  1 +
 kernel/time/timekeeping.c           | 97 ++++++++++++++++++++++++++++++++-----
 5 files changed, 159 insertions(+), 20 deletions(-)

diff --git a/include/linux/time64.h b/include/linux/time64.h
index a383147..ff46e87 100644
--- a/include/linux/time64.h
+++ b/include/linux/time64.h
@@ -28,6 +28,7 @@ struct timespec64 {
 #define FSEC_PER_SEC	1000000000000000LL
 
 /* Located here for timespec[64]_valid_strict */
+#define TIME64_MAX			((s64)~((u64)1 << 63))
 #define KTIME_MAX			((s64)~((u64)1 << 63))
 #define KTIME_SEC_MAX			(KTIME_MAX / NSEC_PER_SEC)
 
diff --git a/include/linux/timekeeper_internal.h b/include/linux/timekeeper_internal.h
index fb86963..78980ea 100644
--- a/include/linux/timekeeper_internal.h
+++ b/include/linux/timekeeper_internal.h
@@ -60,6 +60,9 @@ struct tk_read_base {
  *			shifted nano seconds.
  * @ntp_error_shift:	Shift conversion between clock shifted nano seconds and
  *			ntp shifted nano seconds.
+ * @next_leap_sec:	Second value of the next leap sec (or TIME64_MAX)
+ * @next_leap_ktime:	ktime_t value of the next leap sec (or KTIME_MAX)
+ * @leap_direction:	Direction of pending leap adjustment
  *
  * Note: For timespec(64) based interfaces wall_to_monotonic is what
  * we need to add to xtime (or xtime corrected for sub jiffie times)
@@ -104,6 +107,10 @@ struct timekeeper {
 	s64			ntp_error;
 	u32			ntp_error_shift;
 	u32			ntp_err_mult;
+	/* Leapsecond status */
+	time64_t		next_leap_sec;
+	ktime_t			next_leap_ktime;
+	int			leap_direction;
 };
 
 #ifdef CONFIG_GENERIC_TIME_VSYSCALL
diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 472591e..6e15fbb 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -35,6 +35,7 @@ unsigned long			tick_nsec;
 static u64			tick_length;
 static u64			tick_length_base;
 
+#define SECS_PER_DAY		86400
 #define MAX_TICKADJ		500LL		/* usecs */
 #define MAX_TICKADJ_SCALED \
 	(((MAX_TICKADJ * NSEC_PER_USEC) << NTP_SCALE_SHIFT) / NTP_INTERVAL_FREQ)
@@ -76,6 +77,9 @@ static long			time_adjust;
 /* constant (boot-param configurable) NTP tick adjustment (upscaled)	*/
 static s64			ntp_tick_adj;
 
+/* second value of the next pending leapsecond, or TIME64_MAX if no leap */
+static time64_t			ntp_next_leap_sec = TIME64_MAX;
+
 #ifdef CONFIG_NTP_PPS
 
 /*
@@ -349,6 +353,7 @@ void ntp_clear(void)
 	tick_length	= tick_length_base;
 	time_offset	= 0;
 
+	ntp_next_leap_sec = TIME64_MAX;
 	/* Clear PPS state variables */
 	pps_clear();
 }
@@ -359,6 +364,33 @@ u64 ntp_tick_length(void)
 	return tick_length;
 }
 
+/**
+ * get_leap_state - Returns the NTP leap state
+ * @next_leap_sec:	Next leapsecond in time64_t
+ * @next_leap_ktime:	Next leapsecond in ktime_t
+ *
+ * Provides NTP leapsecond state. Returns direction
+ * of the leapsecond adjustment as an integer.
+ */
+int get_leap_state(time64_t *next_leap_sec, ktime_t *next_leap_ktime)
+{
+	int dir;
+
+	if ((time_state == TIME_INS) && (time_status & STA_INS)) {
+		dir = -1;
+		*next_leap_sec = ntp_next_leap_sec;
+		*next_leap_ktime = ktime_set(ntp_next_leap_sec, 0);
+	} else if ((time_state == TIME_DEL) && (time_status & STA_DEL)) {
+		dir = 1;
+		*next_leap_sec = ntp_next_leap_sec;
+		*next_leap_ktime = ktime_set(ntp_next_leap_sec, 0);
+	} else {
+		dir = 0;
+		*next_leap_sec = TIME64_MAX;
+		next_leap_ktime->tv64 = KTIME_MAX;
+	}
+	return dir;
+}
 
 /*
  * this routine handles the overflow of the microsecond field
@@ -382,15 +414,21 @@ int second_overflow(unsigned long secs)
 	 */
 	switch (time_state) {
 	case TIME_OK:
-		if (time_status & STA_INS)
+		if (time_status & STA_INS) {
 			time_state = TIME_INS;
-		else if (time_status & STA_DEL)
+			ntp_next_leap_sec = secs + SECS_PER_DAY -
+						(secs % SECS_PER_DAY);
+		} else if (time_status & STA_DEL) {
 			time_state = TIME_DEL;
+			ntp_next_leap_sec = secs + SECS_PER_DAY -
+						 ((secs+1) % SECS_PER_DAY);
+		}
 		break;
 	case TIME_INS:
-		if (!(time_status & STA_INS))
+		if (!(time_status & STA_INS)) {
+			ntp_next_leap_sec = TIME64_MAX;
 			time_state = TIME_OK;
-		else if (secs % 86400 == 0) {
+		} else if (secs % SECS_PER_DAY == 0) {
 			leap = -1;
 			time_state = TIME_OOP;
 			printk_deferred(KERN_NOTICE
@@ -398,19 +436,21 @@ int second_overflow(unsigned long secs)
 		}
 		break;
 	case TIME_DEL:
-		if (!(time_status & STA_DEL))
+		if (!(time_status & STA_DEL)) {
+			ntp_next_leap_sec = TIME64_MAX;
 			time_state = TIME_OK;
-		else if ((secs + 1) % 86400 == 0) {
+		} else if ((secs + 1) % SECS_PER_DAY == 0) {
 			leap = 1;
+			ntp_next_leap_sec = TIME64_MAX;
 			time_state = TIME_WAIT;
 			printk_deferred(KERN_NOTICE
 				"Clock: deleting leap second 23:59:59 UTC\n");
 		}
 		break;
 	case TIME_OOP:
+		ntp_next_leap_sec = TIME64_MAX;
 		time_state = TIME_WAIT;
 		break;
-
 	case TIME_WAIT:
 		if (!(time_status & (STA_INS | STA_DEL)))
 			time_state = TIME_OK;
@@ -547,6 +587,7 @@ static inline void process_adj_status(struct timex *txc, struct timespec64 *ts)
 	if ((time_status & STA_PLL) && !(txc->status & STA_PLL)) {
 		time_state = TIME_OK;
 		time_status = STA_UNSYNC;
+		ntp_next_leap_sec = TIME64_MAX;
 		/* restart PPS frequency calibration */
 		pps_reset_freq_interval();
 	}
@@ -711,6 +752,24 @@ int __do_adjtimex(struct timex *txc, struct timespec64 *ts, s32 *time_tai)
 	if (!(time_status & STA_NANO))
 		txc->time.tv_usec /= NSEC_PER_USEC;
 
+	/* Handle leapsec adjustments */
+	if (unlikely(ts->tv_sec >= ntp_next_leap_sec)) {
+		if ((time_state == TIME_INS) && (time_status & STA_INS)) {
+			result = TIME_OOP;
+			txc->tai++;
+			txc->time.tv_sec--;
+		}
+		if ((time_state == TIME_DEL) && (time_status & STA_DEL)) {
+			result = TIME_WAIT;
+			txc->tai--;
+			txc->time.tv_sec++;
+		}
+		if ((time_state == TIME_OOP) &&
+					(ts->tv_sec == ntp_next_leap_sec)) {
+			result = TIME_WAIT;
+		}
+	}
+
 	return result;
 }
 
diff --git a/kernel/time/ntp_internal.h b/kernel/time/ntp_internal.h
index bbd102a..cd831b6 100644
--- a/kernel/time/ntp_internal.h
+++ b/kernel/time/ntp_internal.h
@@ -5,6 +5,7 @@ extern void ntp_init(void);
 extern void ntp_clear(void);
 /* Returns how long ticks are at present, in ns / 2^NTP_SCALE_SHIFT. */
 extern u64 ntp_tick_length(void);
+extern int get_leap_state(time64_t *next_leap_sec, ktime_t *next_leap_ktime);
 extern int second_overflow(unsigned long secs);
 extern int ntp_validate_timex(struct timex *);
 extern int __do_adjtimex(struct timex *, struct timespec64 *, s32 *);
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 946acb7..9313190 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -591,6 +591,10 @@ static void timekeeping_update(struct timekeeper *tk, unsigned int action)
 		ntp_clear();
 	}
 
+	/* Capture leapsecond state */
+	tk->leap_direction = get_leap_state(&tk->next_leap_sec,
+						&tk->next_leap_ktime);
+
 	tk_update_ktime_data(tk);
 
 	update_vsyscall(tk);
@@ -634,6 +638,24 @@ static void timekeeping_forward_now(struct timekeeper *tk)
 }
 
 /**
+ * __getnstimeofday64_preleap - Returns the time of day in a timespec64,
+ * @tk:		pointer to the timekeeper structure to use
+ * @ts:		pointer to the timespec to be set
+ *
+ * Internal function. Does not take lock. Updates the time of day in the
+ * timespec, WITHOUT the leapsecond edge adjustment.
+ */
+static void __getnstimeofday64_preleap(struct timekeeper *tk, struct timespec64 *ts)
+{
+	s64 nsecs;
+
+	ts->tv_sec = tk->xtime_sec;
+	ts->tv_nsec = 0;
+	nsecs = timekeeping_get_ns(&tk->tkr_mono);
+	timespec64_add_ns(ts, nsecs);
+}
+
+/**
  * __getnstimeofday64 - Returns the time of day in a timespec64.
  * @ts:		pointer to the timespec to be set
  *
@@ -643,20 +665,22 @@ static void timekeeping_forward_now(struct timekeeper *tk)
 int __getnstimeofday64(struct timespec64 *ts)
 {
 	struct timekeeper *tk = &tk_core.timekeeper;
+	time64_t next_leap;
+	int dir;
 	unsigned long seq;
-	s64 nsecs = 0;
 
 	do {
 		seq = read_seqcount_begin(&tk_core.seq);
 
-		ts->tv_sec = tk->xtime_sec;
-		nsecs = timekeeping_get_ns(&tk->tkr_mono);
+		__getnstimeofday64_preleap(tk, ts);
+		next_leap = tk->next_leap_sec;
+		dir = tk->leap_direction;
 
 	} while (read_seqcount_retry(&tk_core.seq, seq));
 
-	ts->tv_nsec = 0;
-	timespec64_add_ns(ts, nsecs);
-
+	/* Apply leapsecond adjustment */
+	if (unlikely(ts->tv_sec >= next_leap))
+		ts->tv_sec += dir;
 	/*
 	 * Do not bail out early, in case there were callers still using
 	 * the value, even in the face of the WARN_ON.
@@ -710,6 +734,8 @@ ktime_t ktime_get_with_offset(enum tk_offsets offs)
 	struct timekeeper *tk = &tk_core.timekeeper;
 	unsigned int seq;
 	ktime_t base, *offset = offsets[offs];
+	ktime_t next_leap;
+	int dir;
 	s64 nsecs;
 
 	WARN_ON(timekeeping_suspended);
@@ -718,11 +744,17 @@ ktime_t ktime_get_with_offset(enum tk_offsets offs)
 		seq = read_seqcount_begin(&tk_core.seq);
 		base = ktime_add(tk->tkr_mono.base, *offset);
 		nsecs = timekeeping_get_ns(&tk->tkr_mono);
+		next_leap = tk->next_leap_ktime;
+		dir = tk->leap_direction;
 
 	} while (read_seqcount_retry(&tk_core.seq, seq));
 
-	return ktime_add_ns(base, nsecs);
-
+	base = ktime_add_ns(base, nsecs);
+	/* apply leapsecond adjustment */
+	if (offs == TK_OFFS_REAL)
+		if (unlikely(base.tv64 >= next_leap.tv64))
+			base = ktime_add(base, ktime_set(dir, 0));
+	return base;
 }
 EXPORT_SYMBOL_GPL(ktime_get_with_offset);
 
@@ -733,15 +765,23 @@ EXPORT_SYMBOL_GPL(ktime_get_with_offset);
  */
 ktime_t ktime_mono_to_any(ktime_t tmono, enum tk_offsets offs)
 {
+	struct timekeeper *tk = &tk_core.timekeeper;
 	ktime_t *offset = offsets[offs];
 	unsigned long seq;
-	ktime_t tconv;
+	ktime_t tconv, next_leap;
+	int dir;
 
 	do {
 		seq = read_seqcount_begin(&tk_core.seq);
 		tconv = ktime_add(tmono, *offset);
+		next_leap = tk->next_leap_ktime;
+		dir = tk->leap_direction;
 	} while (read_seqcount_retry(&tk_core.seq, seq));
 
+	/* apply leapsecond adjustment */
+	if (offs == TK_OFFS_REAL)
+		if (unlikely(tconv.tv64 >= next_leap.tv64))
+			tconv = ktime_add(tconv, ktime_set(dir, 0));
 	return tconv;
 }
 EXPORT_SYMBOL_GPL(ktime_mono_to_any);
@@ -862,6 +902,8 @@ void getnstime_raw_and_real(struct timespec *ts_raw, struct timespec *ts_real)
 	struct timekeeper *tk = &tk_core.timekeeper;
 	unsigned long seq;
 	s64 nsecs_raw, nsecs_real;
+	time64_t next_leap;
+	int dir;
 
 	WARN_ON_ONCE(timekeeping_suspended);
 
@@ -875,10 +917,17 @@ void getnstime_raw_and_real(struct timespec *ts_raw, struct timespec *ts_real)
 		nsecs_raw  = timekeeping_get_ns(&tk->tkr_raw);
 		nsecs_real = timekeeping_get_ns(&tk->tkr_mono);
 
+		next_leap = tk->next_leap_sec;
+		dir = tk->leap_direction;
+
 	} while (read_seqcount_retry(&tk_core.seq, seq));
 
 	timespec_add_ns(ts_raw, nsecs_raw);
 	timespec_add_ns(ts_real, nsecs_real);
+
+	/* apply leapsecond adjustment */
+	if (unlikely(ts_real->tv_sec >= next_leap))
+		ts_real->tv_sec += dir;
 }
 EXPORT_SYMBOL(getnstime_raw_and_real);
 
@@ -1252,6 +1301,10 @@ void __init timekeeping_init(void)
 	set_normalized_timespec64(&tmp, -boot.tv_sec, -boot.tv_nsec);
 	tk_set_wall_to_mono(tk, tmp);
 
+	/* Capture leapsecond state */
+	tk->leap_direction = get_leap_state(&tk->next_leap_sec,
+						&tk->next_leap_ktime);
+
 	timekeeping_update(tk, TK_MIRROR);
 
 	write_seqcount_end(&tk_core.seq);
@@ -1422,6 +1475,10 @@ void timekeeping_resume(void)
 	tk->tkr_mono.cycle_last = cycle_now;
 	tk->tkr_raw.cycle_last  = cycle_now;
 
+	/* Capture leapsecond state */
+	tk->leap_direction = get_leap_state(&tk->next_leap_sec,
+						&tk->next_leap_ktime);
+
 	tk->ntp_error = 0;
 	timekeeping_suspended = 0;
 	timekeeping_update(tk, TK_MIRROR | TK_CLOCK_WAS_SET);
@@ -1825,6 +1882,10 @@ void update_wall_time(void)
 	 */
 	clock_set |= accumulate_nsecs_to_secs(tk);
 
+	/* Capture leapsecond state */
+	tk->leap_direction = get_leap_state(&tk->next_leap_sec,
+						&tk->next_leap_ktime);
+
 	write_seqcount_begin(&tk_core.seq);
 	/*
 	 * Update the real timekeeper.
@@ -1970,7 +2031,8 @@ ktime_t ktime_get_update_offsets_now(ktime_t *offs_real, ktime_t *offs_boot,
 {
 	struct timekeeper *tk = &tk_core.timekeeper;
 	unsigned int seq;
-	ktime_t base;
+	ktime_t base, next_leap;
+	int dir;
 	u64 nsecs;
 
 	do {
@@ -1982,9 +2044,18 @@ ktime_t ktime_get_update_offsets_now(ktime_t *offs_real, ktime_t *offs_boot,
 		*offs_real = tk->offs_real;
 		*offs_boot = tk->offs_boot;
 		*offs_tai = tk->offs_tai;
+
+		next_leap = tk->next_leap_ktime;
+		dir = tk->leap_direction;
 	} while (read_seqcount_retry(&tk_core.seq, seq));
 
-	return ktime_add_ns(base, nsecs);
+	base = ktime_add_ns(base, nsecs);
+
+	/* apply leapsecond adjustment */
+	if (unlikely(ktime_add(base, *offs_real).tv64 >= next_leap.tv64))
+		*offs_real = ktime_add(*offs_real, ktime_set(dir, 0));
+
+	return base;
 }
 #endif
 
@@ -2015,11 +2086,11 @@ int do_adjtimex(struct timex *txc)
 			return ret;
 	}
 
-	getnstimeofday64(&ts);
-
 	raw_spin_lock_irqsave(&timekeeper_lock, flags);
 	write_seqcount_begin(&tk_core.seq);
 
+	__getnstimeofday64_preleap(tk, &ts);
+
 	orig_tai = tai = tk->tai_offset;
 	ret = __do_adjtimex(txc, &ts, &tai);
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [RFC][PATCH 0/4] Fixes for leapsecond expiring early ABS_TIME CLOCK_REALTIME timers
  2015-05-29 20:24 [RFC][PATCH 0/4] Fixes for leapsecond expiring early ABS_TIME CLOCK_REALTIME timers John Stultz
                   ` (3 preceding siblings ...)
  2015-05-29 20:24 ` [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths John Stultz
@ 2015-05-31 13:55 ` Prarit Bhargava
  2015-06-01 11:57 ` Prarit Bhargava
  2015-06-01 20:18 ` Daniel Bristot de Oliveira
  6 siblings, 0 replies; 40+ messages in thread
From: Prarit Bhargava @ 2015-05-31 13:55 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, Daniel Bristot de Oliveira, Richard Cochran, Jan Kara,
	Jiri Bohac, Thomas Gleixner, Ingo Molnar, Shuah Khan



On 05/29/2015 04:24 PM, John Stultz wrote:
> As Prarit reported here:
> https://lkml.org/lkml/2015/5/27/458
> 
> Since the leapsecond is applied at timer tick time, and not
> the actual second edge, ABS_TIME CLOCK_REALTIME timers set for
> right after the leapsecond could fire a second early, since
> some timers may be expired before we trigger the timekeeping
> timer, which then applies the leapsecond.
> 
> Thus this patch series tries to address this isssue, including
> extending the leap-a-day test to catch this problem, as well
> as other relevant fixups I found while working on the code.
> 
> This series has only had limited testing, so I wanted to send
> it out for initial review and comment. Folks can grab this tree
> via git for testing here:
> https://git.linaro.org/people/john.stultz/linux.git dev/early-leap-timer
> 
> Thougths and feedback would be appreciated!


Testing now ...

P.

> thanks
> -john
> 
> Cc: Prarit Bhargava <prarit@redhat.com>
> Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
> Cc: Richard Cochran <richardcochran@gmail.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Jiri Bohac <jbohac@suse.cz>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Shuah Khan <shuahkh@osg.samsung.com>
> 
> John Stultz (4):
>   selftests: timers: Add leap-second timer edge testing to leap-a-day.c
>   timer_list: Add the base offset so remaining nsecs are accurate for
>     non monotonic timers
>   ntp: Use printk_deferred in leapsecond path
>   time: Do leapsecond adjustment in gettime fastpaths
> 
>  include/linux/time64.h                      |  1 +
>  include/linux/timekeeper_internal.h         |  7 +++
>  kernel/time/ntp.c                           | 77 ++++++++++++++++++++---
>  kernel/time/ntp_internal.h                  |  1 +
>  kernel/time/timekeeping.c                   | 97 +++++++++++++++++++++++++----
>  kernel/time/timer_list.c                    |  2 +-
>  tools/testing/selftests/timers/leap-a-day.c | 76 ++++++++++++++++++++--
>  7 files changed, 234 insertions(+), 27 deletions(-)
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths
  2015-05-29 20:24 ` [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths John Stultz
@ 2015-05-31 16:05   ` Richard Cochran
  2015-06-02  9:01   ` Ingo Molnar
  1 sibling, 0 replies; 40+ messages in thread
From: Richard Cochran @ 2015-05-31 16:05 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, Prarit Bhargava, Daniel Bristot de Oliveira, Jan Kara,
	Jiri Bohac, Thomas Gleixner, Ingo Molnar, Shuah Khan

On Fri, May 29, 2015 at 01:24:28PM -0700, John Stultz wrote:
> Apologies to Richard Cochran, who pushed for such a change
> years ago, which I resisted due to the concerns about the
> performance overhead.

For the record, I got the idea from Michel Hack of IBM.

> While I suspect this isn't extremely critical, folks who
> care about strict leap-second correctness will likely
> want to watch this, and it will likely be a -stable candidate.

I think this is a step in the right direction.  If the 'next_leap_sec'
is made available to the vdso, then the 1-10 ms time error could also
be prevented there.

I have some comments, but, as is, feel free to add my ack.

Acked-by: Richard Cochran <richardcochran@gmail.com>

> diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
> index 472591e..6e15fbb 100644
> --- a/kernel/time/ntp.c
> +++ b/kernel/time/ntp.c

> @@ -359,6 +364,33 @@ u64 ntp_tick_length(void)
>  	return tick_length;
>  }
>  
> +/**
> + * get_leap_state - Returns the NTP leap state
> + * @next_leap_sec:	Next leapsecond in time64_t
> + * @next_leap_ktime:	Next leapsecond in ktime_t
> + *
> + * Provides NTP leapsecond state. Returns direction
> + * of the leapsecond adjustment as an integer.
> + */
> +int get_leap_state(time64_t *next_leap_sec, ktime_t *next_leap_ktime)
> +{
> +	int dir;
> +
> +	if ((time_state == TIME_INS) && (time_status & STA_INS)) {

This can be reduced to just one test on (time_state == TIME_INS).
If user spaces clears STA_INS, then you can immediately cancel the
leap second.

> +		dir = -1;
> +		*next_leap_sec = ntp_next_leap_sec;
> +		*next_leap_ktime = ktime_set(ntp_next_leap_sec, 0);
> +	} else if ((time_state == TIME_DEL) && (time_status & STA_DEL)) {
> +		dir = 1;
> +		*next_leap_sec = ntp_next_leap_sec;
> +		*next_leap_ktime = ktime_set(ntp_next_leap_sec, 0);
> +	} else {
> +		dir = 0;
> +		*next_leap_sec = TIME64_MAX;
> +		next_leap_ktime->tv64 = KTIME_MAX;
> +	}
> +	return dir;
> +}
>  
>  /*
>   * this routine handles the overflow of the microsecond field

> @@ -382,15 +414,21 @@ int second_overflow(unsigned long secs)
>  	 */
>  	switch (time_state) {
>  	case TIME_OK:
> -		if (time_status & STA_INS)
> +		if (time_status & STA_INS) {

The user sets STA_INS via adjtimex, but we don't change to TIME_INS
until the next tick.  Why not change immediately?  Then this funtion
would only need to check for TIME_INS && (secs % 86400 == 0)  and the
very unlikey TIME_DEL.

>  			time_state = TIME_INS;
> -		else if (time_status & STA_DEL)
> +			ntp_next_leap_sec = secs + SECS_PER_DAY -
> +						(secs % SECS_PER_DAY);
> +		} else if (time_status & STA_DEL) {
>  			time_state = TIME_DEL;
> +			ntp_next_leap_sec = secs + SECS_PER_DAY -
> +						 ((secs+1) % SECS_PER_DAY);
> +		}
>  		break;
>  	case TIME_INS:
> -		if (!(time_status & STA_INS))
> +		if (!(time_status & STA_INS)) {
> +			ntp_next_leap_sec = TIME64_MAX;
>  			time_state = TIME_OK;
> -		else if (secs % 86400 == 0) {
> +		} else if (secs % SECS_PER_DAY == 0) {
>  			leap = -1;
>  			time_state = TIME_OOP;
>  			printk_deferred(KERN_NOTICE

> @@ -711,6 +752,24 @@ int __do_adjtimex(struct timex *txc, struct timespec64 *ts, s32 *time_tai)
>  	if (!(time_status & STA_NANO))
>  		txc->time.tv_usec /= NSEC_PER_USEC;
>  
> +	/* Handle leapsec adjustments */

This block and its commnet rather confused me.  What this code
actually does is fix up the time value returned to the caller of
adjtimex, but only in the 1-10 millisecond window before the leap
second tick.

> +	if (unlikely(ts->tv_sec >= ntp_next_leap_sec)) {
> +		if ((time_state == TIME_INS) && (time_status & STA_INS)) {
> +			result = TIME_OOP;
> +			txc->tai++;
> +			txc->time.tv_sec--;
> +		}
> +		if ((time_state == TIME_DEL) && (time_status & STA_DEL)) {
> +			result = TIME_WAIT;
> +			txc->tai--;
> +			txc->time.tv_sec++;
> +		}
> +		if ((time_state == TIME_OOP) &&
> +					(ts->tv_sec == ntp_next_leap_sec)) {
> +			result = TIME_WAIT;
> +		}
> +	}
> +
>  	return result;
>  }

Thanks,
Richard

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC][PATCH 0/4] Fixes for leapsecond expiring early ABS_TIME CLOCK_REALTIME timers
  2015-05-29 20:24 [RFC][PATCH 0/4] Fixes for leapsecond expiring early ABS_TIME CLOCK_REALTIME timers John Stultz
                   ` (4 preceding siblings ...)
  2015-05-31 13:55 ` [RFC][PATCH 0/4] Fixes for leapsecond expiring early ABS_TIME CLOCK_REALTIME timers Prarit Bhargava
@ 2015-06-01 11:57 ` Prarit Bhargava
  2015-06-01 17:02   ` John Stultz
  2015-06-01 20:18 ` Daniel Bristot de Oliveira
  6 siblings, 1 reply; 40+ messages in thread
From: Prarit Bhargava @ 2015-06-01 11:57 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, Daniel Bristot de Oliveira, Richard Cochran, Jan Kara,
	Jiri Bohac, Thomas Gleixner, Ingo Molnar, Shuah Khan



On 05/29/2015 04:24 PM, John Stultz wrote:
> As Prarit reported here:
> https://lkml.org/lkml/2015/5/27/458
> 
> Since the leapsecond is applied at timer tick time, and not
> the actual second edge, ABS_TIME CLOCK_REALTIME timers set for
> right after the leapsecond could fire a second early, since
> some timers may be expired before we trigger the timekeeping
> timer, which then applies the leapsecond.
> 
> Thus this patch series tries to address this isssue, including
> extending the leap-a-day test to catch this problem, as well
> as other relevant fixups I found while working on the code.
> 
> This series has only had limited testing, so I wanted to send
> it out for initial review and comment. Folks can grab this tree
> via git for testing here:
> https://git.linaro.org/people/john.stultz/linux.git dev/early-leap-timer
> 

John, native testing went well across many 32-bit and 64-bit AMD and Intel
boxes.  However, virtual (specifically KVM) guests failed with some sort of
corruption:

[ 1546.038479] Clock: inserting leap second 23:59:60 UTC
[ 1559.774700] Clock: inserting leap second 23:59:60 UTC
[ 1573.502989] Clock: inserting leap second 23:59:60 UTC
[ 1587.235917] Clock: inserting leap second 23:59:60 UTC
[    7.     X] Clock: inserting leap second 23:59:60 UTC
[    7.     X] Clock: inserting leap second 23:59:60 UTC
[    7.     X] Clock: inserting leap second 23:59:60 UTC
[    7.     X] Clock: inserting leap second 23:59:60 UTC

I'm going to see if I can figure out a way to kdump a guest with this problem.
The issue is that it seems like all input to the console is ignored when this
happens.

P.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC][PATCH 0/4] Fixes for leapsecond expiring early ABS_TIME CLOCK_REALTIME timers
  2015-06-01 11:57 ` Prarit Bhargava
@ 2015-06-01 17:02   ` John Stultz
  2015-06-01 17:43     ` Prarit Bhargava
  0 siblings, 1 reply; 40+ messages in thread
From: John Stultz @ 2015-06-01 17:02 UTC (permalink / raw)
  To: Prarit Bhargava
  Cc: lkml, Daniel Bristot de Oliveira, Richard Cochran, Jan Kara,
	Jiri Bohac, Thomas Gleixner, Ingo Molnar, Shuah Khan

On Mon, Jun 1, 2015 at 4:57 AM, Prarit Bhargava <prarit@redhat.com> wrote:
>
>
> On 05/29/2015 04:24 PM, John Stultz wrote:
>> As Prarit reported here:
>> https://lkml.org/lkml/2015/5/27/458
>>
>> Since the leapsecond is applied at timer tick time, and not
>> the actual second edge, ABS_TIME CLOCK_REALTIME timers set for
>> right after the leapsecond could fire a second early, since
>> some timers may be expired before we trigger the timekeeping
>> timer, which then applies the leapsecond.
>>
>> Thus this patch series tries to address this isssue, including
>> extending the leap-a-day test to catch this problem, as well
>> as other relevant fixups I found while working on the code.
>>
>> This series has only had limited testing, so I wanted to send
>> it out for initial review and comment. Folks can grab this tree
>> via git for testing here:
>> https://git.linaro.org/people/john.stultz/linux.git dev/early-leap-timer
>>
>
> John, native testing went well across many 32-bit and 64-bit AMD and Intel
> boxes.  However, virtual (specifically KVM) guests failed with some sort of
> corruption:
>
> [ 1546.038479] Clock: inserting leap second 23:59:60 UTC
> [ 1559.774700] Clock: inserting leap second 23:59:60 UTC
> [ 1573.502989] Clock: inserting leap second 23:59:60 UTC
> [ 1587.235917] Clock: inserting leap second 23:59:60 UTC
> [    7.     X] Clock: inserting leap second 23:59:60 UTC
> [    7.     X] Clock: inserting leap second 23:59:60 UTC
> [    7.     X] Clock: inserting leap second 23:59:60 UTC
> [    7.     X] Clock: inserting leap second 23:59:60 UTC
>
> I'm going to see if I can figure out a way to kdump a guest with this problem.
> The issue is that it seems like all input to the console is ignored when this
> happens.

Hrm. Can you send me the qemu version and command line you're using?

thanks
-john

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC][PATCH 0/4] Fixes for leapsecond expiring early ABS_TIME CLOCK_REALTIME timers
  2015-06-01 17:02   ` John Stultz
@ 2015-06-01 17:43     ` Prarit Bhargava
  0 siblings, 0 replies; 40+ messages in thread
From: Prarit Bhargava @ 2015-06-01 17:43 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, Daniel Bristot de Oliveira, Richard Cochran, Jan Kara,
	Jiri Bohac, Thomas Gleixner, Ingo Molnar, Shuah Khan



On 06/01/2015 01:02 PM, John Stultz wrote:
> On Mon, Jun 1, 2015 at 4:57 AM, Prarit Bhargava <prarit@redhat.com> wrote:

>>
>> John, native testing went well across many 32-bit and 64-bit AMD and Intel
>> boxes.  However, virtual (specifically KVM) guests failed with some sort of
>> corruption:
>>
>> [ 1546.038479] Clock: inserting leap second 23:59:60 UTC
>> [ 1559.774700] Clock: inserting leap second 23:59:60 UTC
>> [ 1573.502989] Clock: inserting leap second 23:59:60 UTC
>> [ 1587.235917] Clock: inserting leap second 23:59:60 UTC
>> [    7.     X] Clock: inserting leap second 23:59:60 UTC
>> [    7.     X] Clock: inserting leap second 23:59:60 UTC
>> [    7.     X] Clock: inserting leap second 23:59:60 UTC
>> [    7.     X] Clock: inserting leap second 23:59:60 UTC
>>
>> I'm going to see if I can figure out a way to kdump a guest with this problem.
>> The issue is that it seems like all input to the console is ignored when this
>> happens.
> 
> Hrm. Can you send me the qemu version and command line you're using?

The qemu version is 1.5.3.

The command line is:

 /usr/libexec/qemu-kvm -name prarit_fedora_21 -S -machine
pc-i440fx-rhel7.0.0,accel=kvm,usb=off -m 4096 -realtime mlock=off -smp
4,sockets=4,cores=1,threads=1 -uuid a387ab20-1b19-4519-95dd-15805d40dfa0
-no-user-config -nodefaults -chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/prarit_fedora_21.monitor,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown
-boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device
virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive
file=/var/lib/libvirt/images/prarit_fedora_21.img,if=none,id=drive-virtio-disk0,format=raw,cache=none
-device
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
-drive if=none,id=drive-ide0-1-0,readonly=on,format=raw -device
ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev
tap,fd=23,id=hostnet0,vhost=on,vhostfd=25 -device
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:4f:4f:db,bus=pci.0,addr=0x3
-chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0
-chardev spicevmc,id=charchannel0,name=vdagent -device
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0
-device usb-tablet,id=input0 -spice
port=5902,addr=127.0.0.1,disable-ticketing,seamless-migration=on -vga qxl
-global qxl-vga.ram_size=67108864 -global qxl-vga.vram_size=67108864 -global
qxl-vga.vgamem_mb=16 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device
hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on

P.

> 
> thanks
> -john
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC][PATCH 0/4] Fixes for leapsecond expiring early ABS_TIME CLOCK_REALTIME timers
  2015-05-29 20:24 [RFC][PATCH 0/4] Fixes for leapsecond expiring early ABS_TIME CLOCK_REALTIME timers John Stultz
                   ` (5 preceding siblings ...)
  2015-06-01 11:57 ` Prarit Bhargava
@ 2015-06-01 20:18 ` Daniel Bristot de Oliveira
  2015-06-01 20:32   ` John Stultz
  2015-06-01 21:42   ` Prarit Bhargava
  6 siblings, 2 replies; 40+ messages in thread
From: Daniel Bristot de Oliveira @ 2015-06-01 20:18 UTC (permalink / raw)
  To: John Stultz, lkml
  Cc: Prarit Bhargava, Richard Cochran, Jan Kara, Jiri Bohac,
	Thomas Gleixner, Ingo Molnar, Shuah Khan



Il 29/05/2015 17:24, John Stultz ha scritto:
> Thus this patch series tries to address this isssue, including
> extending the leap-a-day test to catch this problem, as well
> as other relevant fixups I found while working on the code.
> 
> This series has only had limited testing, so I wanted to send
> it out for initial review and comment. Folks can grab this tree
> via git for testing here:
> https://git.linaro.org/people/john.stultz/linux.git dev/early-leap-timer

John,

I saw two failures while testing the patched kernel
using the patched leap-a-day.

On both cases the leap second was not inserted, and
then the timer fired at the second edge (as expected?).

Here is the output of both leap-a-day and dmesg for
both errors:

===== Error 1 =====
[root@vostro timers]# ./leap-a-day -s
[...many lines...]
Setting time to Wed Sep  9 20:59:50 2015
Scheduling leap second for Wed Sep  9 21:00:00 2015
Setting timer for Wed Sep  9 21:00:00 2015
Wed Sep  9 20:59:57 2015 +    147 us (0)	TIME_INS
Wed Sep  9 20:59:57 2015 + 500352 us (0)	TIME_INS
Wed Sep  9 20:59:58 2015 +    558 us (0)	TIME_INS
Wed Sep  9 20:59:58 2015 + 500767 us (0)	TIME_INS
Wed Sep  9 20:59:59 2015 +    975 us (0)	TIME_INS
Wed Sep  9 20:59:59 2015 + 501181 us (0)	TIME_INS
Wed Sep  9 20:59:59 2015 +   1442 us (1)	TIME_OOP
Wed Sep  9 20:59:59 2015 + 501670 us (1)	TIME_OOP
Wed Sep  9 21:00:00 2015 +    110 us (1)	TIME_WAIT - TIMER FIRED
Wed Sep  9 21:00:00 2015 +    179 us (1)	TIME_WAIT
Wed Sep  9 21:00:00 2015 + 500340 us (1)	TIME_WAIT
Wed Sep  9 21:00:01 2015 +    548 us (1)	TIME_WAIT
Wed Sep  9 21:00:01 2015 + 500768 us (1)	TIME_WAIT
Wed Sep  9 21:00:02 2015 +    976 us (1)	TIME_WAIT
Leap complete

Setting time to Thu Sep 10 20:59:50 2015
Scheduling leap second for Thu Sep 10 21:00:00 2015
Setting timer for Thu Sep 10 21:00:00 2015
Thu Sep 10 20:59:57 2015 +    209 us (1)	TIME_DEL
Thu Sep 10 20:59:57 2015 + 500410 us (1)	TIME_DEL
Thu Sep 10 20:59:58 2015 +    646 us (1)	TIME_DEL
Thu Sep 10 20:59:58 2015 + 500881 us (1)	TIME_DEL
Thu Sep 10 21:00:00 2015 +   1182 us (0)	TIME_WAIT - TIMER FIRED
Thu Sep 10 21:00:00 2015 +   1283 us (0)	TIME_WAIT
Thu Sep 10 21:00:00 2015 + 501475 us (0)	TIME_WAIT
Thu Sep 10 21:00:01 2015 +   1654 us (0)	TIME_WAIT
Thu Sep 10 21:00:01 2015 + 501856 us (0)	TIME_WAIT
Thu Sep 10 21:00:02 2015 +   2032 us (0)	TIME_WAIT
Leap complete

Setting time to Fri Sep 11 20:59:50 2015
Scheduling leap second for Fri Sep 11 21:00:00 2015
Setting timer for Fri Sep 11 21:00:00 2015
Fri Sep 11 20:59:57 2015 +    139 us (0)	TIME_INS
Fri Sep 11 20:59:57 2015 + 500347 us (0)	TIME_INS
Fri Sep 11 20:59:58 2015 +    527 us (0)	TIME_INS
Fri Sep 11 20:59:58 2015 + 500751 us (0)	TIME_INS
Fri Sep 11 20:59:59 2015 +    987 us (0)	TIME_INS
Fri Sep 11 20:59:59 2015 + 501203 us (0)	TIME_INS
Fri Sep 11 21:00:00 2015 +    100 us (0)	TIME_INS - TIMER FIRED
Error: Incorrect NTP state?
Fri Sep 11 21:00:00 2015 +    173 us (0)	TIME_INS
Fri Sep 11 21:00:00 2015 + 500353 us (0)	TIME_OK
Fri Sep 11 21:00:01 2015 +    583 us (0)	TIME_OK
Fri Sep 11 21:00:01 2015 + 500799 us (0)	TIME_OK
Fri Sep 11 21:00:02 2015 +   1021 us (0)	TIME_OK
Leap complete
Errors observed

[root@vostro timers]# dmesg  | tail
[ 1451.279313] Clock: deleting leap second 23:59:59 UTC
[ 1464.296724] Clock: inserting leap second 23:59:60 UTC
[ 1477.312373] Clock: deleting leap second 23:59:59 UTC
[ 1490.329740] Clock: inserting leap second 23:59:60 UTC
[ 1503.345402] Clock: deleting leap second 23:59:59 UTC
[ 1516.362841] Clock: inserting leap second 23:59:60 UTC
[ 1529.378466] Clock: deleting leap second 23:59:59 UTC
[ 1542.395826] Clock: inserting leap second 23:59:60 UTC
[ 1555.411551] Clock: deleting leap second 23:59:59 UTC
[ 1697.998312] Adjusting tsc more than 11% (5677670 vs 7466007) <-- ???
---> did not insert the leap second <---
## did not insert the leap second

===== Error 2 =====
[root@vostro timers]# ./leap-a-day -s
[...many lines...]
Setting time to Wed Aug 19 20:59:50 2015
Scheduling leap second for Wed Aug 19 21:00:00 2015
Setting timer for Wed Aug 19 21:00:00 2015
Something cleared STA_INS/STA_DEL, setting it again.
Wed Aug 19 20:59:57 2015 +    181 us (0)	TIME_OK
Wed Aug 19 20:59:57 2015 + 500357 us (0)	TIME_INS
Wed Aug 19 20:59:58 2015 +    559 us (0)	TIME_INS
Wed Aug 19 20:59:58 2015 + 500761 us (0)	TIME_INS
Wed Aug 19 20:59:59 2015 +    967 us (0)	TIME_INS
Wed Aug 19 20:59:59 2015 + 501177 us (0)	TIME_INS
Wed Aug 19 20:59:59 2015 +   1318 us (1)	TIME_OOP
Wed Aug 19 20:59:59 2015 + 501508 us (1)	TIME_OOP
Wed Aug 19 21:00:00 2015 +     99 us (1)	TIME_WAIT - TIMER FIRED
Wed Aug 19 21:00:00 2015 +    161 us (1)	TIME_WAIT
Wed Aug 19 21:00:00 2015 + 500320 us (1)	TIME_WAIT
Wed Aug 19 21:00:01 2015 +    523 us (1)	TIME_WAIT
Wed Aug 19 21:00:01 2015 + 500735 us (1)	TIME_WAIT
Wed Aug 19 21:00:02 2015 +    937 us (1)	TIME_WAIT
Leap complete

Setting time to Thu Aug 20 20:59:50 2015
Scheduling leap second for Thu Aug 20 21:00:00 2015
Setting timer for Thu Aug 20 21:00:00 2015
Thu Aug 20 20:59:57 2015 +    144 us (1)	TIME_DEL
Thu Aug 20 20:59:57 2015 + 500351 us (1)	TIME_DEL
Thu Aug 20 20:59:58 2015 +    552 us (1)	TIME_DEL
Thu Aug 20 20:59:58 2015 + 500752 us (1)	TIME_DEL
Thu Aug 20 21:00:00 2015 +   1021 us (0)	TIME_WAIT - TIMER FIRED
Thu Aug 20 21:00:00 2015 +   1096 us (0)	TIME_WAIT
Thu Aug 20 21:00:00 2015 + 501267 us (0)	TIME_WAIT
Thu Aug 20 21:00:01 2015 +   1470 us (0)	TIME_WAIT
Thu Aug 20 21:00:01 2015 + 501677 us (0)	TIME_WAIT
Thu Aug 20 21:00:02 2015 +   1886 us (0)	TIME_WAIT
Leap complete

Setting time to Fri Aug 21 20:59:50 2015
Scheduling leap second for Fri Aug 21 21:00:00 2015
Setting timer for Fri Aug 21 21:00:00 2015
Fri Aug 21 20:59:57 2015 +    153 us (0)	TIME_INS
Fri Aug 21 20:59:57 2015 + 500354 us (0)	TIME_INS
Fri Aug 21 20:59:58 2015 +    562 us (0)	TIME_INS
Fri Aug 21 20:59:58 2015 + 500815 us (0)	TIME_INS
Fri Aug 21 20:59:59 2015 +   1000 us (0)	TIME_INS
Fri Aug 21 20:59:59 2015 + 501207 us (0)	TIME_INS
Fri Aug 21 21:00:00 2015 +    109 us (0)	TIME_INS - TIMER FIRED
Error: Incorrect NTP state?
Fri Aug 21 21:00:00 2015 +    173 us (0)	TIME_INS
Fri Aug 21 21:00:00 2015 + 500331 us (0)	TIME_OK
Fri Aug 21 21:00:01 2015 +    540 us (0)	TIME_OK
Fri Aug 21 21:00:01 2015 + 500791 us (0)	TIME_OK
Fri Aug 21 21:00:02 2015 +   1046 us (0)	TIME_OK
Leap complete
Errors observed

[root@vostro timers]# dmesg  | tail
[ 1283.857902] Clock: inserting leap second 23:59:60 UTC
[ 1296.873600] Clock: deleting leap second 23:59:59 UTC
[ 1309.891049] Clock: inserting leap second 23:59:60 UTC
[ 1322.906709] Clock: deleting leap second 23:59:59 UTC
[ 1335.924154] Clock: inserting leap second 23:59:60 UTC
[ 1348.939824] Clock: deleting leap second 23:59:59 UTC
[ 1361.957267] Clock: inserting leap second 23:59:60 UTC
[ 1374.972919] Clock: deleting leap second 23:59:59 UTC
[ 1387.989238] Clock: inserting leap second 23:59:60 UTC
[ 1401.005877] Clock: deleting leap second 23:59:59 UTC
---> did not insert the leap second <---

Both cases occurred in the same physical machine.
AFAICS, it does not occur in the unpatched
kernel/leap-a-day in the same machine.

-- Daniel

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC][PATCH 0/4] Fixes for leapsecond expiring early ABS_TIME CLOCK_REALTIME timers
  2015-06-01 20:18 ` Daniel Bristot de Oliveira
@ 2015-06-01 20:32   ` John Stultz
  2015-06-01 21:42   ` Prarit Bhargava
  1 sibling, 0 replies; 40+ messages in thread
From: John Stultz @ 2015-06-01 20:32 UTC (permalink / raw)
  To: Daniel Bristot de Oliveira
  Cc: lkml, Prarit Bhargava, Richard Cochran, Jan Kara, Jiri Bohac,
	Thomas Gleixner, Ingo Molnar, Shuah Khan

On Mon, Jun 1, 2015 at 1:18 PM, Daniel Bristot de Oliveira
<bristot@redhat.com> wrote:
>
>
> Il 29/05/2015 17:24, John Stultz ha scritto:
>> Thus this patch series tries to address this isssue, including
>> extending the leap-a-day test to catch this problem, as well
>> as other relevant fixups I found while working on the code.
>>
>> This series has only had limited testing, so I wanted to send
>> it out for initial review and comment. Folks can grab this tree
>> via git for testing here:
>> https://git.linaro.org/people/john.stultz/linux.git dev/early-leap-timer
>
> John,
>
> I saw two failures while testing the patched kernel
> using the patched leap-a-day.
>
> On both cases the leap second was not inserted, and
> then the timer fired at the second edge (as expected?).
>
> Here is the output of both leap-a-day and dmesg for
> both errors:
>
> ===== Error 1 =====
> [root@vostro timers]# ./leap-a-day -s
> [...many lines...]
> Setting time to Wed Sep  9 20:59:50 2015
> Scheduling leap second for Wed Sep  9 21:00:00 2015
> Setting timer for Wed Sep  9 21:00:00 2015
> Wed Sep  9 20:59:57 2015 +    147 us (0)        TIME_INS
> Wed Sep  9 20:59:57 2015 + 500352 us (0)        TIME_INS
> Wed Sep  9 20:59:58 2015 +    558 us (0)        TIME_INS
> Wed Sep  9 20:59:58 2015 + 500767 us (0)        TIME_INS
> Wed Sep  9 20:59:59 2015 +    975 us (0)        TIME_INS
> Wed Sep  9 20:59:59 2015 + 501181 us (0)        TIME_INS
> Wed Sep  9 20:59:59 2015 +   1442 us (1)        TIME_OOP
> Wed Sep  9 20:59:59 2015 + 501670 us (1)        TIME_OOP
> Wed Sep  9 21:00:00 2015 +    110 us (1)        TIME_WAIT - TIMER FIRED
> Wed Sep  9 21:00:00 2015 +    179 us (1)        TIME_WAIT
> Wed Sep  9 21:00:00 2015 + 500340 us (1)        TIME_WAIT
> Wed Sep  9 21:00:01 2015 +    548 us (1)        TIME_WAIT
> Wed Sep  9 21:00:01 2015 + 500768 us (1)        TIME_WAIT
> Wed Sep  9 21:00:02 2015 +    976 us (1)        TIME_WAIT
> Leap complete
>
> Setting time to Thu Sep 10 20:59:50 2015
> Scheduling leap second for Thu Sep 10 21:00:00 2015
> Setting timer for Thu Sep 10 21:00:00 2015
> Thu Sep 10 20:59:57 2015 +    209 us (1)        TIME_DEL
> Thu Sep 10 20:59:57 2015 + 500410 us (1)        TIME_DEL
> Thu Sep 10 20:59:58 2015 +    646 us (1)        TIME_DEL
> Thu Sep 10 20:59:58 2015 + 500881 us (1)        TIME_DEL
> Thu Sep 10 21:00:00 2015 +   1182 us (0)        TIME_WAIT - TIMER FIRED
> Thu Sep 10 21:00:00 2015 +   1283 us (0)        TIME_WAIT
> Thu Sep 10 21:00:00 2015 + 501475 us (0)        TIME_WAIT
> Thu Sep 10 21:00:01 2015 +   1654 us (0)        TIME_WAIT
> Thu Sep 10 21:00:01 2015 + 501856 us (0)        TIME_WAIT
> Thu Sep 10 21:00:02 2015 +   2032 us (0)        TIME_WAIT
> Leap complete
>
> Setting time to Fri Sep 11 20:59:50 2015
> Scheduling leap second for Fri Sep 11 21:00:00 2015
> Setting timer for Fri Sep 11 21:00:00 2015
> Fri Sep 11 20:59:57 2015 +    139 us (0)        TIME_INS
> Fri Sep 11 20:59:57 2015 + 500347 us (0)        TIME_INS
> Fri Sep 11 20:59:58 2015 +    527 us (0)        TIME_INS
> Fri Sep 11 20:59:58 2015 + 500751 us (0)        TIME_INS
> Fri Sep 11 20:59:59 2015 +    987 us (0)        TIME_INS
> Fri Sep 11 20:59:59 2015 + 501203 us (0)        TIME_INS
> Fri Sep 11 21:00:00 2015 +    100 us (0)        TIME_INS - TIMER FIRED
> Error: Incorrect NTP state?
> Fri Sep 11 21:00:00 2015 +    173 us (0)        TIME_INS
> Fri Sep 11 21:00:00 2015 + 500353 us (0)        TIME_OK
> Fri Sep 11 21:00:01 2015 +    583 us (0)        TIME_OK
> Fri Sep 11 21:00:01 2015 + 500799 us (0)        TIME_OK
> Fri Sep 11 21:00:02 2015 +   1021 us (0)        TIME_OK
> Leap complete
> Errors observed
>
> [root@vostro timers]# dmesg  | tail
> [ 1451.279313] Clock: deleting leap second 23:59:59 UTC
> [ 1464.296724] Clock: inserting leap second 23:59:60 UTC
> [ 1477.312373] Clock: deleting leap second 23:59:59 UTC
> [ 1490.329740] Clock: inserting leap second 23:59:60 UTC
> [ 1503.345402] Clock: deleting leap second 23:59:59 UTC
> [ 1516.362841] Clock: inserting leap second 23:59:60 UTC
> [ 1529.378466] Clock: deleting leap second 23:59:59 UTC
> [ 1542.395826] Clock: inserting leap second 23:59:60 UTC
> [ 1555.411551] Clock: deleting leap second 23:59:59 UTC
> [ 1697.998312] Adjusting tsc more than 11% (5677670 vs 7466007) <-- ???
> ---> did not insert the leap second <---
> ## did not insert the leap second
>
> ===== Error 2 =====
> [root@vostro timers]# ./leap-a-day -s
> [...many lines...]
> Setting time to Wed Aug 19 20:59:50 2015
> Scheduling leap second for Wed Aug 19 21:00:00 2015
> Setting timer for Wed Aug 19 21:00:00 2015
> Something cleared STA_INS/STA_DEL, setting it again.
> Wed Aug 19 20:59:57 2015 +    181 us (0)        TIME_OK
> Wed Aug 19 20:59:57 2015 + 500357 us (0)        TIME_INS
> Wed Aug 19 20:59:58 2015 +    559 us (0)        TIME_INS
> Wed Aug 19 20:59:58 2015 + 500761 us (0)        TIME_INS
> Wed Aug 19 20:59:59 2015 +    967 us (0)        TIME_INS
> Wed Aug 19 20:59:59 2015 + 501177 us (0)        TIME_INS
> Wed Aug 19 20:59:59 2015 +   1318 us (1)        TIME_OOP
> Wed Aug 19 20:59:59 2015 + 501508 us (1)        TIME_OOP
> Wed Aug 19 21:00:00 2015 +     99 us (1)        TIME_WAIT - TIMER FIRED
> Wed Aug 19 21:00:00 2015 +    161 us (1)        TIME_WAIT
> Wed Aug 19 21:00:00 2015 + 500320 us (1)        TIME_WAIT
> Wed Aug 19 21:00:01 2015 +    523 us (1)        TIME_WAIT
> Wed Aug 19 21:00:01 2015 + 500735 us (1)        TIME_WAIT
> Wed Aug 19 21:00:02 2015 +    937 us (1)        TIME_WAIT
> Leap complete
>
> Setting time to Thu Aug 20 20:59:50 2015
> Scheduling leap second for Thu Aug 20 21:00:00 2015
> Setting timer for Thu Aug 20 21:00:00 2015
> Thu Aug 20 20:59:57 2015 +    144 us (1)        TIME_DEL
> Thu Aug 20 20:59:57 2015 + 500351 us (1)        TIME_DEL
> Thu Aug 20 20:59:58 2015 +    552 us (1)        TIME_DEL
> Thu Aug 20 20:59:58 2015 + 500752 us (1)        TIME_DEL
> Thu Aug 20 21:00:00 2015 +   1021 us (0)        TIME_WAIT - TIMER FIRED
> Thu Aug 20 21:00:00 2015 +   1096 us (0)        TIME_WAIT
> Thu Aug 20 21:00:00 2015 + 501267 us (0)        TIME_WAIT
> Thu Aug 20 21:00:01 2015 +   1470 us (0)        TIME_WAIT
> Thu Aug 20 21:00:01 2015 + 501677 us (0)        TIME_WAIT
> Thu Aug 20 21:00:02 2015 +   1886 us (0)        TIME_WAIT
> Leap complete
>
> Setting time to Fri Aug 21 20:59:50 2015
> Scheduling leap second for Fri Aug 21 21:00:00 2015
> Setting timer for Fri Aug 21 21:00:00 2015
> Fri Aug 21 20:59:57 2015 +    153 us (0)        TIME_INS
> Fri Aug 21 20:59:57 2015 + 500354 us (0)        TIME_INS
> Fri Aug 21 20:59:58 2015 +    562 us (0)        TIME_INS
> Fri Aug 21 20:59:58 2015 + 500815 us (0)        TIME_INS
> Fri Aug 21 20:59:59 2015 +   1000 us (0)        TIME_INS
> Fri Aug 21 20:59:59 2015 + 501207 us (0)        TIME_INS
> Fri Aug 21 21:00:00 2015 +    109 us (0)        TIME_INS - TIMER FIRED
> Error: Incorrect NTP state?
> Fri Aug 21 21:00:00 2015 +    173 us (0)        TIME_INS
> Fri Aug 21 21:00:00 2015 + 500331 us (0)        TIME_OK
> Fri Aug 21 21:00:01 2015 +    540 us (0)        TIME_OK
> Fri Aug 21 21:00:01 2015 + 500791 us (0)        TIME_OK
> Fri Aug 21 21:00:02 2015 +   1046 us (0)        TIME_OK
> Leap complete
> Errors observed
>
> [root@vostro timers]# dmesg  | tail
> [ 1283.857902] Clock: inserting leap second 23:59:60 UTC
> [ 1296.873600] Clock: deleting leap second 23:59:59 UTC
> [ 1309.891049] Clock: inserting leap second 23:59:60 UTC
> [ 1322.906709] Clock: deleting leap second 23:59:59 UTC
> [ 1335.924154] Clock: inserting leap second 23:59:60 UTC
> [ 1348.939824] Clock: deleting leap second 23:59:59 UTC
> [ 1361.957267] Clock: inserting leap second 23:59:60 UTC
> [ 1374.972919] Clock: deleting leap second 23:59:59 UTC
> [ 1387.989238] Clock: inserting leap second 23:59:60 UTC
> [ 1401.005877] Clock: deleting leap second 23:59:59 UTC
> ---> did not insert the leap second <---
>
> Both cases occurred in the same physical machine.
> AFAICS, it does not occur in the unpatched
> kernel/leap-a-day in the same machine.

Hrm. So did you happen to have ntpd running in the background during this test?

I know Prarit saw something very similar when he left ntpd running
during the test. I'd suspect it was due to ntpd changing the ntpd
state and possibly trying to stear the clock, however I've not seen
such behavior yet leaving ntpd running, and from your log it seems
like it would have to hit at a very specific window to trigger what
you're seeing.

thanks
-john

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC][PATCH 0/4] Fixes for leapsecond expiring early ABS_TIME CLOCK_REALTIME timers
  2015-06-01 20:18 ` Daniel Bristot de Oliveira
  2015-06-01 20:32   ` John Stultz
@ 2015-06-01 21:42   ` Prarit Bhargava
  2015-06-01 22:29     ` Daniel Bristot de Oliveira
  1 sibling, 1 reply; 40+ messages in thread
From: Prarit Bhargava @ 2015-06-01 21:42 UTC (permalink / raw)
  To: Daniel Bristot de Oliveira
  Cc: John Stultz, lkml, Richard Cochran, Jan Kara, Jiri Bohac,
	Thomas Gleixner, Ingo Molnar, Shuah Khan

On 06/01/2015 04:18 PM, Daniel Bristot de Oliveira wrote:
> 
> 
> Il 29/05/2015 17:24, John Stultz ha scritto:
>> Thus this patch series tries to address this isssue, including
>> extending the leap-a-day test to catch this problem, as well
>> as other relevant fixups I found while working on the code.
>>
>> This series has only had limited testing, so I wanted to send
>> it out for initial review and comment. Folks can grab this tree
>> via git for testing here:
>> https://git.linaro.org/people/john.stultz/linux.git dev/early-leap-timer
> 
> John,
> 
> I saw two failures while testing the patched kernel
> using the patched leap-a-day.
> 
> On both cases the leap second was not inserted, and
> then the timer fired at the second edge (as expected?).


Daniel, did you disable chronyd/ntpd?  I've seen both failures if I leave
chronyd running.

P.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC][PATCH 0/4] Fixes for leapsecond expiring early ABS_TIME CLOCK_REALTIME timers
  2015-06-01 21:42   ` Prarit Bhargava
@ 2015-06-01 22:29     ` Daniel Bristot de Oliveira
  2015-06-02  6:19       ` John Stultz
  0 siblings, 1 reply; 40+ messages in thread
From: Daniel Bristot de Oliveira @ 2015-06-01 22:29 UTC (permalink / raw)
  To: Prarit Bhargava, John Stultz
  Cc: lkml, Richard Cochran, Jan Kara, Jiri Bohac, Thomas Gleixner,
	Ingo Molnar, Shuah Khan



Il 01/06/2015 18:42, Prarit Bhargava ha scritto:
> Daniel, did you disable chronyd/ntpd?  I've seen both failures if I leave
> chronyd running.
> 
> P.

Prarit, John, that is it, chronyd was running.

- Daniel

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC][PATCH 0/4] Fixes for leapsecond expiring early ABS_TIME CLOCK_REALTIME timers
  2015-06-01 22:29     ` Daniel Bristot de Oliveira
@ 2015-06-02  6:19       ` John Stultz
  0 siblings, 0 replies; 40+ messages in thread
From: John Stultz @ 2015-06-02  6:19 UTC (permalink / raw)
  To: Daniel Bristot de Oliveira
  Cc: Prarit Bhargava, lkml, Richard Cochran, Jan Kara, Jiri Bohac,
	Thomas Gleixner, Ingo Molnar, Shuah Khan

On Mon, Jun 1, 2015 at 3:29 PM, Daniel Bristot de Oliveira
<bristot@redhat.com> wrote:
>
>
> Il 01/06/2015 18:42, Prarit Bhargava ha scritto:
>> Daniel, did you disable chronyd/ntpd?  I've seen both failures if I leave
>> chronyd running.
>>
>> P.
>
> Prarit, John, that is it, chronyd was running.

I reproduced this with chronyd running in the background, and while
its unlikely to be an edge case anyone would hit in normal usage, I'm
looking at how to ensure the adjtimex state is handled better.

It basically comes down to chrony modifying the timex.time_status
removing the STA_INS flag, which doesn't get processed to change the
time_state from TIME_INS -> TIME_OK until the next second_overflow().
Since STA_INS flag is not present, the leap-second is not applied at
the second edge, and we see the strange TIME_INS state run through
what should be the leapsecond.

Mostly this is an issue with the test not printing the time_status
details, so the output looks especially confusing.  Richard's point
about modifying the time_state immediately when the STA_INS flag is
dropped by adjtimex, rather then waiting for the next second_overflow
call would likely make this case more sensible as well.

But again, this is a problematic edge case of two adjtimex controllers
running at the same time, which isn't really valid to begin with, so
I'm less worried about trying to get this issue fixed and into
-stable, and am fine with improving it via a test output change or via
a separate patch to change the time_state state machine.

thanks
-john

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths
  2015-05-29 20:24 ` [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths John Stultz
  2015-05-31 16:05   ` Richard Cochran
@ 2015-06-02  9:01   ` Ingo Molnar
  2015-06-02  9:21     ` Peter Zijlstra
  2015-06-02 15:52     ` John Stultz
  1 sibling, 2 replies; 40+ messages in thread
From: Ingo Molnar @ 2015-06-02  9:01 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, Prarit Bhargava, Daniel Bristot de Oliveira,
	Richard Cochran, Jan Kara, Jiri Bohac, Thomas Gleixner,
	Ingo Molnar, Shuah Khan, Peter Zijlstra


* John Stultz <john.stultz@linaro.org> wrote:

> Currently, leapsecond adjustments are done at tick time.
> 
> As a result, the leapsecond was applied at the first timer
> tick *after* the leapsecond (~1-10ms late depending on HZ),
> rather then exactly on the second edge.
> 
> This was in part historical from back when we were always
> tick based, but correcting this since has been avoided since
> it adds extra conditional checks in the gettime fastpath,
> which has performance overhead.
> 
> However, it was recently pointed out that ABS_TIME
> CLOCK_REALTIME timers set for right after the leapsecond
> could fire a second early, since some timers may be expired
> before we trigger the timekeeping timer, which then applies
> the leapsecond.
> 
> This isn't quite as bad as it sounds, since behaviorally
> it is similar to what is possible w/ ntpd made leapsecond
> adjustments done w/o using the kernel discipline. Where
> due to latencies, timers may fire just prior to the
> settimeofday call. (Also, one should note that all
> applications using CLOCK_REALTIME timers should always be
> careful, since they are prone to quirks from settimeofday()
> disturbances.)
> 
> However, the purpose of having the kernel do the leap adjustment
> is to avoid such latencies, so I think this is worth fixing.
> 
> So in order to properly keep those timers from firing a second
> early, this patch modifies the gettime accessors to do the
> extra checks to apply the leapsecond adjustment on the second
> edge. This prevents the timer core from expiring timers too
> early.
> 
> This patch does not handle VDSO time implementations, so
> userspace using vdso gettime will still see the leapsecond
> applied at the first timer tick after the leapsecond.
> This is a bit of a tradeoff, since the performance impact
> would be greatest to VDSO implementations, and since vdso
> interfaces don't provide the TIME_OOP flag, one can't
> distinquish the leapsecond from a time discontinuity (such
> as settimeofday), so correcting the VDSO may not be as
> important there.
> 
> Apologies to Richard Cochran, who pushed for such a change
> years ago, which I resisted due to the concerns about the
> performance overhead.
> 
> While I suspect this isn't extremely critical, folks who
> care about strict leap-second correctness will likely
> want to watch this, and it will likely be a -stable candidate.
> 
> Cc: Prarit Bhargava <prarit@redhat.com>
> Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
> Cc: Richard Cochran <richardcochran@gmail.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Jiri Bohac <jbohac@suse.cz>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Shuah Khan <shuahkh@osg.samsung.com>
> Originally-suggested-by: Richard Cochran <richardcochran@gmail.com>
> Reported-by: Daniel Bristot de Oliveira <bristot@redhat.com>
> Reported-by: Prarit Bhargava <prarit@redhat.com>
> Signed-off-by: John Stultz <john.stultz@linaro.org>
> ---
>  include/linux/time64.h              |  1 +
>  include/linux/timekeeper_internal.h |  7 +++
>  kernel/time/ntp.c                   | 73 +++++++++++++++++++++++++---
>  kernel/time/ntp_internal.h          |  1 +
>  kernel/time/timekeeping.c           | 97 ++++++++++++++++++++++++++++++++-----
>  5 files changed, 159 insertions(+), 20 deletions(-)

So I don't like the complexity of this at all: why do we add over 100 lines of 
code for something that occurs (literally) once in a blue moon?

... and for that reason I'm not surprised at all that it broke in non-obvious 
ways.

Instead of having these super rare special events, how about implementing leap 
second smearing instead? That's far less radical and a lot easier to test as well, 
as it's a continuous mechanism. It will also confuse user-space a lot less, 
because there are no sudden time jumps.

Secondly, why is there a directional flag? I thought leap seconds can only be 
inserted.

So all in one, the leap second code is fragile and complex - lets re-think the 
whole topic instead of complicating it even more ...

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths
  2015-06-02  9:01   ` Ingo Molnar
@ 2015-06-02  9:21     ` Peter Zijlstra
  2015-06-02 14:09       ` John Stultz
  2015-06-02 15:52     ` John Stultz
  1 sibling, 1 reply; 40+ messages in thread
From: Peter Zijlstra @ 2015-06-02  9:21 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: John Stultz, lkml, Prarit Bhargava, Daniel Bristot de Oliveira,
	Richard Cochran, Jan Kara, Jiri Bohac, Thomas Gleixner,
	Ingo Molnar, Shuah Khan

On Tue, Jun 02, 2015 at 11:01:54AM +0200, Ingo Molnar wrote:
> > Currently, leapsecond adjustments are done at tick time.
> > 
> > As a result, the leapsecond was applied at the first timer
> > tick *after* the leapsecond (~1-10ms late depending on HZ),
> > rather then exactly on the second edge.

So why not program a hrtimer to expire at the exact right time? Then
even the VDSO gets its time 'right' and you avoid touching all the
fast paths.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC][PATCH 3/4] ntp: Use printk_deferred in leapsecond path
  2015-05-29 20:24 ` [RFC][PATCH 3/4] ntp: Use printk_deferred in leapsecond path John Stultz
@ 2015-06-02 10:31   ` Jiri Bohac
  2015-06-02 10:43     ` Jiri Kosina
  2015-06-02 16:04     ` John Stultz
  0 siblings, 2 replies; 40+ messages in thread
From: Jiri Bohac @ 2015-06-02 10:31 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, Prarit Bhargava, Daniel Bristot de Oliveira,
	Richard Cochran, Jan Kara, Thomas Gleixner, Ingo Molnar,
	Shuah Khan

Hi,

On Fri, May 29, 2015 at 01:24:27PM -0700, John Stultz wrote:
> Looking over the leapsecond code, I noticed the printk messages
> reporting the leapsecond insertion in the second_overflow path
> were not using the printk_deferred method. This was surprising
> since the printk_deferred method was added in part to avoid
> printk-ing while holding the timekeeping locks.
> 
> See 6d9bcb621b0b (timekeeping: use printk_deferred when holding
> timekeeping seqlock) for further rational.
> 
> I can only guess that this omission was a git add -p oversight.

second_overflow() is called from accumulate_nsecs_to_secs().

accumulate_nsecs_to_secs() is called from update_wall_time()
- once directly 
- once via logarithmic_accumulation()
Both calls are before write_seqcount_begin(&tk_core.seq).

So it looks safe to use printk there.

Regards,

-- 
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, SUSE CZ


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC][PATCH 3/4] ntp: Use printk_deferred in leapsecond path
  2015-06-02 10:31   ` Jiri Bohac
@ 2015-06-02 10:43     ` Jiri Kosina
  2015-06-02 16:14       ` John Stultz
  2015-06-02 16:04     ` John Stultz
  1 sibling, 1 reply; 40+ messages in thread
From: Jiri Kosina @ 2015-06-02 10:43 UTC (permalink / raw)
  To: Jiri Bohac
  Cc: John Stultz, lkml, Prarit Bhargava, Daniel Bristot de Oliveira,
	Richard Cochran, Jan Kara, Thomas Gleixner, Ingo Molnar,
	Shuah Khan

On Tue, 2 Jun 2015, Jiri Bohac wrote:

> > Looking over the leapsecond code, I noticed the printk messages
> > reporting the leapsecond insertion in the second_overflow path
> > were not using the printk_deferred method. This was surprising
> > since the printk_deferred method was added in part to avoid
> > printk-ing while holding the timekeeping locks.
> > 
> > See 6d9bcb621b0b (timekeeping: use printk_deferred when holding
> > timekeeping seqlock) for further rational.
> > 
> > I can only guess that this omission was a git add -p oversight.
> 
> second_overflow() is called from accumulate_nsecs_to_secs().
> 
> accumulate_nsecs_to_secs() is called from update_wall_time()
> - once directly 
> - once via logarithmic_accumulation()
> Both calls are before write_seqcount_begin(&tk_core.seq).
> 
> So it looks safe to use printk there.

Couldn't we stuff a couple of

	!lockdep_is_held()

assertions into printk() so that we don't have to keep rediscovering this 
sort of problems over and over again?

Thanks,

-- 
Jiri Kosina
SUSE Labs


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths
  2015-06-02  9:21     ` Peter Zijlstra
@ 2015-06-02 14:09       ` John Stultz
  0 siblings, 0 replies; 40+ messages in thread
From: John Stultz @ 2015-06-02 14:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, lkml, Prarit Bhargava, Daniel Bristot de Oliveira,
	Richard Cochran, Jan Kara, Jiri Bohac, Thomas Gleixner,
	Ingo Molnar, Shuah Khan

On Tue, Jun 2, 2015 at 2:21 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Jun 02, 2015 at 11:01:54AM +0200, Ingo Molnar wrote:
>> > Currently, leapsecond adjustments are done at tick time.
>> >
>> > As a result, the leapsecond was applied at the first timer
>> > tick *after* the leapsecond (~1-10ms late depending on HZ),
>> > rather then exactly on the second edge.
>
> So why not program a hrtimer to expire at the exact right time? Then
> even the VDSO gets its time 'right' and you avoid touching all the
> fast paths.

Well, the hrtimer won't always expire at the exact right time (due to
irq latency, possible clockevent drift/error, etc) , so that would
close the window a bit, but wouldn't eliminate the issue.

thanks
-john

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths
  2015-06-02  9:01   ` Ingo Molnar
  2015-06-02  9:21     ` Peter Zijlstra
@ 2015-06-02 15:52     ` John Stultz
  2015-06-03  9:04       ` Ingo Molnar
  1 sibling, 1 reply; 40+ messages in thread
From: John Stultz @ 2015-06-02 15:52 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: lkml, Prarit Bhargava, Daniel Bristot de Oliveira,
	Richard Cochran, Jan Kara, Jiri Bohac, Thomas Gleixner,
	Ingo Molnar, Shuah Khan, Peter Zijlstra

On Tue, Jun 2, 2015 at 2:01 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * John Stultz <john.stultz@linaro.org> wrote:
>
>> Currently, leapsecond adjustments are done at tick time.
>>
>> As a result, the leapsecond was applied at the first timer
>> tick *after* the leapsecond (~1-10ms late depending on HZ),
>> rather then exactly on the second edge.
>>
>> This was in part historical from back when we were always
>> tick based, but correcting this since has been avoided since
>> it adds extra conditional checks in the gettime fastpath,
>> which has performance overhead.
>>
>> However, it was recently pointed out that ABS_TIME
>> CLOCK_REALTIME timers set for right after the leapsecond
>> could fire a second early, since some timers may be expired
>> before we trigger the timekeeping timer, which then applies
>> the leapsecond.
>>
>> This isn't quite as bad as it sounds, since behaviorally
>> it is similar to what is possible w/ ntpd made leapsecond
>> adjustments done w/o using the kernel discipline. Where
>> due to latencies, timers may fire just prior to the
>> settimeofday call. (Also, one should note that all
>> applications using CLOCK_REALTIME timers should always be
>> careful, since they are prone to quirks from settimeofday()
>> disturbances.)
>>
>> However, the purpose of having the kernel do the leap adjustment
>> is to avoid such latencies, so I think this is worth fixing.
>>
>> So in order to properly keep those timers from firing a second
>> early, this patch modifies the gettime accessors to do the
>> extra checks to apply the leapsecond adjustment on the second
>> edge. This prevents the timer core from expiring timers too
>> early.
>>
>> This patch does not handle VDSO time implementations, so
>> userspace using vdso gettime will still see the leapsecond
>> applied at the first timer tick after the leapsecond.
>> This is a bit of a tradeoff, since the performance impact
>> would be greatest to VDSO implementations, and since vdso
>> interfaces don't provide the TIME_OOP flag, one can't
>> distinquish the leapsecond from a time discontinuity (such
>> as settimeofday), so correcting the VDSO may not be as
>> important there.
>>
>> Apologies to Richard Cochran, who pushed for such a change
>> years ago, which I resisted due to the concerns about the
>> performance overhead.
>>
>> While I suspect this isn't extremely critical, folks who
>> care about strict leap-second correctness will likely
>> want to watch this, and it will likely be a -stable candidate.
>>
>> Cc: Prarit Bhargava <prarit@redhat.com>
>> Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
>> Cc: Richard Cochran <richardcochran@gmail.com>
>> Cc: Jan Kara <jack@suse.cz>
>> Cc: Jiri Bohac <jbohac@suse.cz>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: Shuah Khan <shuahkh@osg.samsung.com>
>> Originally-suggested-by: Richard Cochran <richardcochran@gmail.com>
>> Reported-by: Daniel Bristot de Oliveira <bristot@redhat.com>
>> Reported-by: Prarit Bhargava <prarit@redhat.com>
>> Signed-off-by: John Stultz <john.stultz@linaro.org>
>> ---
>>  include/linux/time64.h              |  1 +
>>  include/linux/timekeeper_internal.h |  7 +++
>>  kernel/time/ntp.c                   | 73 +++++++++++++++++++++++++---
>>  kernel/time/ntp_internal.h          |  1 +
>>  kernel/time/timekeeping.c           | 97 ++++++++++++++++++++++++++++++++-----
>>  5 files changed, 159 insertions(+), 20 deletions(-)
>
> So I don't like the complexity of this at all: why do we add over 100 lines of
> code for something that occurs (literally) once in a blue moon?

So yea. I very much felt the same way before the early timer
expiration issue came up.

> ... and for that reason I'm not surprised at all that it broke in non-obvious
> ways.
>
> Instead of having these super rare special events, how about implementing leap
> second smearing instead? That's far less radical and a lot easier to test as well,
> as it's a continuous mechanism. It will also confuse user-space a lot less,
> because there are no sudden time jumps.

So yea. Leap smearing/slewing is an attractive solution. The first
issue is that there's no standard yet for the range of time that the
slew occurs (or even if the slew is linear or a curve). The second is
I don't think we can actually get away from supporting UTC w/ leap, as
applications may depend on precision. Also things like NTP sync w/
mixed systems would be problematic, as NTPd and others need to become
savvy of which mode they are working with.

The  leap smearing method of only doing it in private networks and
controlling it by the NTP server is becoming more widespread, but it
has its own problems, since it doesn't handle CLOCK_TAI properly, and
since CLOCK_REALTIME isn't yet frequency steerable separately from the
other clockids, this method ends up slowing down CLOCK_TAI and
CLOCK_MONOTONIC as well.

I'd like to try to get something working in the kernel so we could
support CLOCK_UTC and CLOCK_UTCSLS (smeared-leap-second) clockids,
then allow applications that care to migrate explicitly to the one
they care about. Possibly allowing CLOCK_REALTIME to be compile-time
directed to CLOCK_UTCSLS so that most applications that don't care can
just ignore it.  But finding time to do this has been hard (if anyone
is interested in working on it, I'd be excited to hear!).

But if you think this patch is complicated, creating a new separately
steered clockid is not going to be trvial (as there will be lots of
ugly edge cases, like what if a leap second is cancelled mid-way
through the slewing adjustment, etc).

> Secondly, why is there a directional flag? I thought leap seconds can only be
> inserted.

A leap delete isn't likely to occur, but its supported by the adjtimex
interface. And given the irregularity of the earths rotation, I'm not
sure I'd rule it out completely.

> So all in one, the leap second code is fragile and complex - lets re-think the
> whole topic instead of complicating it even more ...

So the core complexity with this patch is that we're basically having
to do state-machine transitions in a read-only path (since the reads
may happen before the update path runs). Since there's a number of
read-paths, there's some duplication, and in some cases variance if
the read path exports more state (ie: adjtimex).

I do agree that the complexity of the time subsystem is getting hard
to manage. I'm at the point where I think we need to avoid keeping
duplicated timespec and ktime_t data (we can leave the ktime->timespec
caching to the VDSOs). That will help cut down the read paths a bit,
but will also simplify updates since we'll have less data to keep in
sync.  How we manage the ntp state also needs a rework, since the
locking rules are getting too complex (bit me in an earlier version of
this patch), and we're in effect duplicating some of that state in the
timekeeper with this patch to handle the reads safely.

But even assuming all those changes were already made, I think we'd
still need something close to this patch.

thanks
-john

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC][PATCH 3/4] ntp: Use printk_deferred in leapsecond path
  2015-06-02 10:31   ` Jiri Bohac
  2015-06-02 10:43     ` Jiri Kosina
@ 2015-06-02 16:04     ` John Stultz
  1 sibling, 0 replies; 40+ messages in thread
From: John Stultz @ 2015-06-02 16:04 UTC (permalink / raw)
  To: Jiri Bohac
  Cc: lkml, Prarit Bhargava, Daniel Bristot de Oliveira,
	Richard Cochran, Jan Kara, Thomas Gleixner, Ingo Molnar,
	Shuah Khan

On Tue, Jun 2, 2015 at 3:31 AM, Jiri Bohac <jbohac@suse.cz> wrote:
> Hi,
>
> On Fri, May 29, 2015 at 01:24:27PM -0700, John Stultz wrote:
>> Looking over the leapsecond code, I noticed the printk messages
>> reporting the leapsecond insertion in the second_overflow path
>> were not using the printk_deferred method. This was surprising
>> since the printk_deferred method was added in part to avoid
>> printk-ing while holding the timekeeping locks.
>>
>> See 6d9bcb621b0b (timekeeping: use printk_deferred when holding
>> timekeeping seqlock) for further rational.
>>
>> I can only guess that this omission was a git add -p oversight.
>
> second_overflow() is called from accumulate_nsecs_to_secs().
>
> accumulate_nsecs_to_secs() is called from update_wall_time()
> - once directly
> - once via logarithmic_accumulation()
> Both calls are before write_seqcount_begin(&tk_core.seq).
>
> So it looks safe to use printk there.

Good point. The update is being done to the shadow-timekeeper, so we
won't block readers.  This can probably be dropped then. Although I'm
almost consider keeping it for consistency so I don't forget this
again and worry about it in the future.

thanks
-john

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC][PATCH 3/4] ntp: Use printk_deferred in leapsecond path
  2015-06-02 10:43     ` Jiri Kosina
@ 2015-06-02 16:14       ` John Stultz
  0 siblings, 0 replies; 40+ messages in thread
From: John Stultz @ 2015-06-02 16:14 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Jiri Bohac, lkml, Prarit Bhargava, Daniel Bristot de Oliveira,
	Richard Cochran, Jan Kara, Thomas Gleixner, Ingo Molnar,
	Shuah Khan

On Tue, Jun 2, 2015 at 3:43 AM, Jiri Kosina <jkosina@suse.cz> wrote:
> On Tue, 2 Jun 2015, Jiri Bohac wrote:
>
>> > Looking over the leapsecond code, I noticed the printk messages
>> > reporting the leapsecond insertion in the second_overflow path
>> > were not using the printk_deferred method. This was surprising
>> > since the printk_deferred method was added in part to avoid
>> > printk-ing while holding the timekeeping locks.
>> >
>> > See 6d9bcb621b0b (timekeeping: use printk_deferred when holding
>> > timekeeping seqlock) for further rational.
>> >
>> > I can only guess that this omission was a git add -p oversight.
>>
>> second_overflow() is called from accumulate_nsecs_to_secs().
>>
>> accumulate_nsecs_to_secs() is called from update_wall_time()
>> - once directly
>> - once via logarithmic_accumulation()
>> Both calls are before write_seqcount_begin(&tk_core.seq).
>>
>> So it looks safe to use printk there.
>
> Couldn't we stuff a couple of
>
>         !lockdep_is_held()
>
> assertions into printk() so that we don't have to keep rediscovering this
> sort of problems over and over again?

Yea.  I was thinking if we could add something very early in printk
before we disable lockdep where we lockdep_aquire/release a few of the
locks we know printk might take, it would help close the gap on these
sorts of call paths that surprise us.

Lockdep is *such* a great tool, because it provides some confidence
that changes don't cause locking regressions, so to have printk poke a
hole in that confidence is frustrating.

thanks
-john

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths
  2015-06-02 15:52     ` John Stultz
@ 2015-06-03  9:04       ` Ingo Molnar
  2015-06-03 17:44         ` John Stultz
  0 siblings, 1 reply; 40+ messages in thread
From: Ingo Molnar @ 2015-06-03  9:04 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, Prarit Bhargava, Daniel Bristot de Oliveira,
	Richard Cochran, Jan Kara, Jiri Bohac, Thomas Gleixner,
	Ingo Molnar, Shuah Khan, Peter Zijlstra

* John Stultz <john.stultz@linaro.org> wrote:

> > Instead of having these super rare special events, how about implementing leap 
> > second smearing instead? That's far less radical and a lot easier to test as 
> > well, as it's a continuous mechanism. It will also confuse user-space a lot 
> > less, because there are no sudden time jumps.
> 
> So yea. Leap smearing/slewing is an attractive solution. The first issue is that 
> there's no standard yet for the range of time that the slew occurs (or even if 
> the slew is linear or a curve). The second is I don't think we can actually get 
> away from supporting UTC w/ leap, as applications may depend on precision. Also 
> things like NTP sync w/ mixed systems would be problematic, as NTPd and others 
> need to become savvy of which mode they are working with.

Supporting it minimally is fine - supporting it with clearly unmaintainable 
complexity is not.

So as long as we offer good smearing of the leap second (with a configurable 
parameter for how long the period should be), people in need of better leap
second handling can take that.

> The leap smearing method of only doing it in private networks and controlling it 
> by the NTP server is becoming more widespread, but it has its own problems, 
> since it doesn't handle CLOCK_TAI properly, and since CLOCK_REALTIME isn't yet 
> frequency steerable separately from the other clockids, this method ends up 
> slowing down CLOCK_TAI and CLOCK_MONOTONIC as well.

All real time clock derived clocks should smear in sync as well.

> I'd like to try to get something working in the kernel so we could support 
> CLOCK_UTC and CLOCK_UTCSLS (smeared-leap-second) clockids, then allow 
> applications that care to migrate explicitly to the one they care about. 
> Possibly allowing CLOCK_REALTIME to be compile-time directed to CLOCK_UTCSLS so 
> that most applications that don't care can just ignore it.  But finding time to 
> do this has been hard (if anyone is interested in working on it, I'd be excited 
> to hear!).

There should definitely be a Kconfig option to just map all relevant clocks to 
smeared seconds. Hopefully this ends up being the standard in a few years and we 
can pin down the exact parameters as well.

Having separate clockids for mixed uses would be fine as well. Maybe.

> But if you think this patch is complicated, creating a new separately steered 
> clockid is not going to be trvial (as there will be lots of ugly edge cases, 
> like what if a leap second is cancelled mid-way through the slewing adjustment, 
> etc).

Well, I think the main advantage of leap second smearing is that it's not a 
binary, but a continuous interface, and so it's way easier to test than 'sudden' 
leap second insertions.

In fact we could essentially implement leap second smearing via the usual adjtimex 
mechanisms: as far as the time code is concerned it does not matter why a gradual 
adjustment occurs, only the rate of change and the method of convergence is an 
open parameter.

In fact I'd suggest we implement even original leap seconds by doing a high-rate 
'smearing' in the final X minutes leading up to the leap second, where 'X' could 
be 1 by default. This way we could eliminate leap seconds as a separate logical 
entity mostly.

This should be far more gentle to applications as well than sudden jumps, and 
timers will just work fine as well.

> > Secondly, why is there a directional flag? I thought leap seconds can only be 
> > inserted.
> 
> A leap delete isn't likely to occur, but its supported by the adjtimex 
> interface. And given the irregularity of the earths rotation, I'm not sure I'd 
> rule it out completely.

Well, the long term trend is clear and unambiguous: the rotation of Earth is 
slowing down (the main component of which is losing angular momentum to the Moon), 
hence the days are getting longer and we have to insert a leap second every second 
year or so.

The short term trends (discounting massive asteorid strikes, at which point leap 
seconds will be the least of our problems) are somewhat chaotic:

 - glaciation (which shifts water mass assymetrically)

 - global warming (one component of which is thermal expansion, which expands 
   oceans assymetrically and shifts water mass - the other component is changing
   climatology: different oceanic currents, etc. - which all shift mass around)

 - tectonics (slow rearrangement of mass plus earthquakes).

 - even slower scale rearrangement of mass (mantle plumes, etc.)

but the long term trend still dominates. Look at this graph of measurements of the 
Earth's rotation:

  http://en.wikipedia.org/wiki/File:Deviation_of_day_length_from_SI_day.svg

See how the mean (the green line) was always above zero in the measured past. The 
monotonically increasing nature comes from that.

and given how many problems we had with leap second insertion, on millions of 
installed systems, guess the likelihood of there being a leap second deleted? How 
many OSs that can do leap second insertion are unable to do leap second deletion?

Also note that leap second deletion means a jump in time backward. Daylight saving 
time is already causing problems with that.

> > So all in one, the leap second code is fragile and complex - lets re-think the 
> > whole topic instead of complicating it even more ...
> 
> So the core complexity with this patch is that we're basically having to do 
> state-machine transitions in a read-only path (since the reads may happen before 
> the update path runs). Since there's a number of read-paths, there's some 
> duplication, and in some cases variance if the read path exports more state (ie: 
> adjtimex).

My fundamental observation is: the cost/benefit ratio is insanely high.

Interrupts are fundamentally jittery, there's no guarantee of their accuracy - you 
yourself said that as a reply to PeterZ's suggestion to drive leap seconds via 
hrtimers - and the motivation was to make interrupts arrive more accurately around 
leap seconds.

So why make the code more fragile, more complex, just to solve a scenario that 
cannot really be done perfectly?

Especially as second smearing appears to be the way superior future method of 
handling leap seconds.

> I do agree that the complexity of the time subsystem is getting hard to manage. 

That's rather an understatement.

> I'm at the point where I think we need to avoid keeping duplicated timespec and 
> ktime_t data (we can leave the ktime->timespec caching to the VDSOs). That will 
> help cut down the read paths a bit, but will also simplify updates since we'll 
> have less data to keep in sync.  How we manage the ntp state also needs a 
> rework, since the locking rules are getting too complex (bit me in an earlier 
> version of this patch), and we're in effect duplicating some of that state in 
> the timekeeper with this patch to handle the reads safely.

Agreed.

> But even assuming all those changes were already made, I think we'd still need 
> something close to this patch.

I disagree rather strongly.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths
  2015-06-03  9:04       ` Ingo Molnar
@ 2015-06-03 17:44         ` John Stultz
  2015-06-04  6:48           ` Ingo Molnar
  0 siblings, 1 reply; 40+ messages in thread
From: John Stultz @ 2015-06-03 17:44 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: lkml, Prarit Bhargava, Daniel Bristot de Oliveira,
	Richard Cochran, Jan Kara, Jiri Bohac, Thomas Gleixner,
	Ingo Molnar, Shuah Khan, Peter Zijlstra

On Wed, Jun 3, 2015 at 2:04 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * John Stultz <john.stultz@linaro.org> wrote:
>
>> > Instead of having these super rare special events, how about implementing leap
>> > second smearing instead? That's far less radical and a lot easier to test as
>> > well, as it's a continuous mechanism. It will also confuse user-space a lot
>> > less, because there are no sudden time jumps.
>>
>> So yea. Leap smearing/slewing is an attractive solution. The first issue is that
>> there's no standard yet for the range of time that the slew occurs (or even if
>> the slew is linear or a curve). The second is I don't think we can actually get
>> away from supporting UTC w/ leap, as applications may depend on precision. Also
>> things like NTP sync w/ mixed systems would be problematic, as NTPd and others
>> need to become savvy of which mode they are working with.
>
> Supporting it minimally is fine - supporting it with clearly unmaintainable
> complexity is not.
>
> So as long as we offer good smearing of the leap second (with a configurable
> parameter for how long the period should be), people in need of better leap
> second handling can take that.
>
>> The leap smearing method of only doing it in private networks and controlling it
>> by the NTP server is becoming more widespread, but it has its own problems,
>> since it doesn't handle CLOCK_TAI properly, and since CLOCK_REALTIME isn't yet
>> frequency steerable separately from the other clockids, this method ends up
>> slowing down CLOCK_TAI and CLOCK_MONOTONIC as well.
>
> All real time clock derived clocks should smear in sync as well.

Eeerrr.. So CLOCK_TAI is UTC without leapseconds, to smear TAI would
be wrong. Similarly, CLOCK_MONOTONIC/BOOTTIME probably shouldn't be
smeared either (but those are defined less strictly).

>> I'd like to try to get something working in the kernel so we could support
>> CLOCK_UTC and CLOCK_UTCSLS (smeared-leap-second) clockids, then allow
>> applications that care to migrate explicitly to the one they care about.
>> Possibly allowing CLOCK_REALTIME to be compile-time directed to CLOCK_UTCSLS so
>> that most applications that don't care can just ignore it.  But finding time to
>> do this has been hard (if anyone is interested in working on it, I'd be excited
>> to hear!).
>
> There should definitely be a Kconfig option to just map all relevant clocks to
> smeared seconds. Hopefully this ends up being the standard in a few years and we
> can pin down the exact parameters as well.
>
> Having separate clockids for mixed uses would be fine as well. Maybe.
>
>> But if you think this patch is complicated, creating a new separately steered
>> clockid is not going to be trvial (as there will be lots of ugly edge cases,
>> like what if a leap second is cancelled mid-way through the slewing adjustment,
>> etc).
>
> Well, I think the main advantage of leap second smearing is that it's not a
> binary, but a continuous interface, and so it's way easier to test than 'sudden'
> leap second insertions.
>
> In fact we could essentially implement leap second smearing via the usual adjtimex
> mechanisms: as far as the time code is concerned it does not matter why a gradual
> adjustment occurs, only the rate of change and the method of convergence is an
> open parameter.
>
> In fact I'd suggest we implement even original leap seconds by doing a high-rate
> 'smearing' in the final X minutes leading up to the leap second, where 'X' could
> be 1 by default. This way we could eliminate leap seconds as a separate logical
> entity mostly.
>
> This should be far more gentle to applications as well than sudden jumps, and
> timers will just work fine as well.

Well, again the problem with high-rate smearing as you describe is
that it would affect CLOCK_MONOTONIC as well, which could cause
periodic timers used for sampling, etc (imagine recording audio, etc)
to slow as well, possibly causing application problems. This is why
the smeared leap-seconds are usually done across a day at a slow rate.

To allow for CLOCK_REALTIME to be frequency adjusted separately from
CLOCK_MONOTONIC/CLOCK_TAI, which would would have the least unwanted
side-effects, we're probably going to have to manage it separately
(like we do w/ MONOTONIC_RAW time). But again, this creates a lot more
complexity.

>> > Secondly, why is there a directional flag? I thought leap seconds can only be
>> > inserted.
>>
>> A leap delete isn't likely to occur, but its supported by the adjtimex
>> interface. And given the irregularity of the earths rotation, I'm not sure I'd
>> rule it out completely.
>
> Well, the long term trend is clear and unambiguous: the rotation of Earth is
> slowing down (the main component of which is losing angular momentum to the Moon),
> hence the days are getting longer and we have to insert a leap second every second
> year or so.
>
> The short term trends (discounting massive asteorid strikes, at which point leap
> seconds will be the least of our problems) are somewhat chaotic:
>
>  - glaciation (which shifts water mass assymetrically)
>
>  - global warming (one component of which is thermal expansion, which expands
>    oceans assymetrically and shifts water mass - the other component is changing
>    climatology: different oceanic currents, etc. - which all shift mass around)
>
>  - tectonics (slow rearrangement of mass plus earthquakes).
>
>  - even slower scale rearrangement of mass (mantle plumes, etc.)
>
> but the long term trend still dominates. Look at this graph of measurements of the
> Earth's rotation:
>
>   http://en.wikipedia.org/wiki/File:Deviation_of_day_length_from_SI_day.svg
>
> See how the mean (the green line) was always above zero in the measured past. The
> monotonically increasing nature comes from that.
>
> and given how many problems we had with leap second insertion, on millions of
> installed systems, guess the likelihood of there being a leap second deleted? How
> many OSs that can do leap second insertion are unable to do leap second deletion?
>
> Also note that leap second deletion means a jump in time backward. Daylight saving
> time is already causing problems with that.

Err.. Other way around. Leap-second deletion is a jump in time forward
(jumping from 23:59:58 to 00:00:00, skipping 23:59:59). Which is
simpler to deal with. And luckily (at least for us) daylight savings
is done in userspace (as UTC, including leapseconds, ideally would be
from the kernel providing TAI time).

But yes, I agree that the leap deletion logic is likely to never run
outside of testing.

>> > So all in one, the leap second code is fragile and complex - lets re-think the
>> > whole topic instead of complicating it even more ...
>>
>> So the core complexity with this patch is that we're basically having to do
>> state-machine transitions in a read-only path (since the reads may happen before
>> the update path runs). Since there's a number of read-paths, there's some
>> duplication, and in some cases variance if the read path exports more state (ie:
>> adjtimex).
>
> My fundamental observation is: the cost/benefit ratio is insanely high.

I agree. In a perfect world, the kernel would export TAI not UTC,
leaving the translation to UTC to userspace (take heed developers of
new IoT OSes!). But the trouble is that historical posix/linux
provides UTC (without a leapsecond representation, which is why we
have to repeat a second).

And as more folks (userspace developers, not really kernel developers)
are caring about strict UTC correctness around the leapsecond, its
hard to rationalize avoiding the complexity (since they don't really
care, they just don't want to deal with anything unexpected in their
application).

> Interrupts are fundamentally jittery, there's no guarantee of their accuracy - you
> yourself said that as a reply to PeterZ's suggestion to drive leap seconds via
> hrtimers - and the motivation was to make interrupts arrive more accurately around
> leap seconds.
>
> So why make the code more fragile, more complex, just to solve a scenario that
> cannot really be done perfectly?

So here I worry I didn't communicate clearly enough what the patch does. :(

Its not about making interrupts more accurate around the leapsecond,
its about applying the leapsecond transition in the read-path
precisely at the leapsecond edge (rather then a short while later when
the timer fires and we update the timekeeping structures).

But more importantly, this change to the read path prevents timers
that may be expired before update_wall_time timer runs (most likely on
other cpus) from being expired early. Since the time read that is used
by the hrtimer expiration logic is adjusted properly right on that
edge.

> Especially as second smearing appears to be the way superior future method of
> handling leap seconds.
>

So here the problem is it depends on the user. For probably most
users, who really don't care, the leap-smear is ideal behavior for
CLOCK_REALTIME (I think leap-smears causing any change to other
clockids would be surprising). However, there are some users who
expect posix UTC leapsecond behavior. Either because they're
positioning telescopes doing things that do depend on strict solar
time, or because they are required (in some cases by law) to use UTC.

I don't think we can just abandon/break those users, for
leap-smearing. So I don't know if we can get away from that
complexity.
But maybe I'm not thinking "boldly" here.

>> I do agree that the complexity of the time subsystem is getting hard to manage.
>
> That's rather an understatement.
>
>> I'm at the point where I think we need to avoid keeping duplicated timespec and
>> ktime_t data (we can leave the ktime->timespec caching to the VDSOs). That will
>> help cut down the read paths a bit, but will also simplify updates since we'll
>> have less data to keep in sync.  How we manage the ntp state also needs a
>> rework, since the locking rules are getting too complex (bit me in an earlier
>> version of this patch), and we're in effect duplicating some of that state in
>> the timekeeper with this patch to handle the reads safely.
>
> Agreed.
>
>> But even assuming all those changes were already made, I think we'd still need
>> something close to this patch.
>
> I disagree rather strongly.

I do really appreciate the review and thoughts here, and respect and
share your concern about complexity, but I'm not yet seeing a viable
path forward with your proposals above. So additional ideas or
clarifications would be welcome.

So, I think with this push back, we're unlikely to have a solution
that will be deploy-able by the leap second at the end of the month
(though the issue was reported late enough that getting something
merged/backported/deployed in mass wasn't super realistic).  So we'll
get to hear how much folks actually care about this issue.

Since the leap is a discontinuity, and there is no way to set a
ABS_TIME CLOCK_REALTIME timer for the 23:59:60 leap second,  having a
few very early timers targeted for the next second expire early on
that repeated second is probably not a major issue in practice.

thanks
-john

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths
  2015-06-03 17:44         ` John Stultz
@ 2015-06-04  6:48           ` Ingo Molnar
  2015-06-05  0:08             ` John Stultz
  0 siblings, 1 reply; 40+ messages in thread
From: Ingo Molnar @ 2015-06-04  6:48 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, Prarit Bhargava, Daniel Bristot de Oliveira,
	Richard Cochran, Jan Kara, Jiri Bohac, Thomas Gleixner,
	Ingo Molnar, Shuah Khan, Peter Zijlstra


* John Stultz <john.stultz@linaro.org> wrote:

> > [...]
> >
> > but the long term trend still dominates. Look at this graph of measurements of the
> > Earth's rotation:
> >
> >   http://en.wikipedia.org/wiki/File:Deviation_of_day_length_from_SI_day.svg
> >
> > See how the mean (the green line) was always above zero in the measured past. The
> > monotonically increasing nature comes from that.
> >
> > and given how many problems we had with leap second insertion, on millions of
> > installed systems, guess the likelihood of there being a leap second deleted? How
> > many OSs that can do leap second insertion are unable to do leap second deletion?
> >
> > Also note that leap second deletion means a jump in time backward. Daylight saving
> > time is already causing problems with that.
> 
> Err.. Other way around. Leap-second deletion is a jump in time forward (jumping 
> from 23:59:58 to 00:00:00, skipping 23:59:59). Which is simpler to deal with. 
> [...]

Indeed!

Still my point remains: many OSs that can handle leap second insertion don't 
handle leap second deletion, so what are the chances that even a temporary blip in 
earth's rotation (which will be offset by the long term trend within a year or so 
after the event) will cause a conservative international standards body to declare 
an unprecedented leap second deletion? Fairly low I'd say.

> [...] And luckily (at least for us) daylight savings is done in userspace (as 
> UTC, including leapseconds, ideally would be from the kernel providing TAI 
> time).
> 
> But yes, I agree that the leap deletion logic is likely to never run outside of 
> testing.

Which is a red flag.

> > [...]
> >
> > So why make the code more fragile, more complex, just to solve a scenario that 
> > cannot really be done perfectly?
> 
> So here I worry I didn't communicate clearly enough what the patch does. :(
> 
> Its not about making interrupts more accurate around the leapsecond, its about 
> applying the leapsecond transition in the read-path precisely at the leapsecond 
> edge (rather then a short while later when the timer fires and we update the 
> timekeeping structures).
> 
> But more importantly, this change to the read path prevents timers that may be 
> expired before update_wall_time timer runs (most likely on other cpus) from 
> being expired early. Since the time read that is used by the hrtimer expiration 
> logic is adjusted properly right on that edge.

But with leap second smearing we'd have the same benefits and some more.

> > Especially as second smearing appears to be the way superior future method of 
> > handling leap seconds.
> 
> So here the problem is it depends on the user. For probably most users, who 
> really don't care, the leap-smear is ideal behavior for CLOCK_REALTIME (I think 
> leap-smears causing any change to other clockids would be surprising). However, 
> there are some users who expect posix UTC leapsecond behavior. Either because 
> they're positioning telescopes doing things that do depend on strict solar time, 
> or because they are required (in some cases by law) to use UTC.

That usecase is perfectly OK: they can use the old leap second logic.

What I argue is that we should not add significant complexity to logic that is 
fragile already as-is, and which runs at most only once per year.

> I don't think we can just abandon/break those users, for leap-smearing. So I 
> don't know if we can get away from that complexity.

That's a false dichotomy - I'm not suggesting to 'abandon' them.

I'm suggesting one of these options:

  - Keep the current leap second code as-is, because it's proven. Concentrate on 
    leap second smearing instead that is superior and which might emerge as a 
    future standard.

  - or make leap second insertion much more obvious and easier to verify (i.e. not
    a +100 LOC delta!)

  - or make leap second handling a part of some other existing facility that is
    used much more frequently: for example have you considered making it a
    special, kernel-internal case of settimeofday()? If settimeofday was
    implemented in a fashion to set the time at a given 'time edge', then
    thousands of systems would test that facility day in and out. Leap second
    insertion would simply be a special settimeofday() call that sets the time
    back by one second at midnight June 30 or so. Normal settimeofday() would be a
    call that sets the time back (or forward) at the next seconds edge right now -
    but it would fundamentally use the same facility.

    and yes, this variant would necessarily complicate settimeofday(), but that's
    OK, as it's used so frequently so we have adequate testing of it.

    A third facility would effectiely be merged with this as well: when NTP
    adjusts large offsets (when running with -g, etc.) it will use settimeofday(),
    right?

I don't think we've exhausted all of these options.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths
  2015-06-04  6:48           ` Ingo Molnar
@ 2015-06-05  0:08             ` John Stultz
  2015-06-05  7:29               ` Peter Zijlstra
  0 siblings, 1 reply; 40+ messages in thread
From: John Stultz @ 2015-06-05  0:08 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: lkml, Prarit Bhargava, Daniel Bristot de Oliveira,
	Richard Cochran, Jan Kara, Jiri Bohac, Thomas Gleixner,
	Ingo Molnar, Shuah Khan, Peter Zijlstra

On Wed, Jun 3, 2015 at 11:48 PM, Ingo Molnar <mingo@kernel.org> wrote:
> * John Stultz <john.stultz@linaro.org> wrote:
>>
>> But more importantly, this change to the read path prevents timers that may be
>> expired before update_wall_time timer runs (most likely on other cpus) from
>> being expired early. Since the time read that is used by the hrtimer expiration
>> logic is adjusted properly right on that edge.
>
> But with leap second smearing we'd have the same benefits and some more.

I'm not sure how this follows. Leap smearing is a different behavior
and I'd like to support it (as a separate clockid) as well, but adding
that support doesn't change that the existing behavior applying the
leapsecond to UTC/CLOCK_REALTIME via a timer results in behavior that
isn't strictly correct.

>> > Especially as second smearing appears to be the way superior future method of
>> > handling leap seconds.
>>
>> So here the problem is it depends on the user. For probably most users, who
>> really don't care, the leap-smear is ideal behavior for CLOCK_REALTIME (I think
>> leap-smears causing any change to other clockids would be surprising). However,
>> there are some users who expect posix UTC leapsecond behavior. Either because
>> they're positioning telescopes doing things that do depend on strict solar time,
>> or because they are required (in some cases by law) to use UTC.
>
> That usecase is perfectly OK: they can use the old leap second logic.

Which again, the current leap second logic has slight behavioral
issues the patch I'm proposing is trying to fix.

> What I argue is that we should not add significant complexity to logic that is
> fragile already as-is, and which runs at most only once per year.
>
>> I don't think we can just abandon/break those users, for leap-smearing. So I
>> don't know if we can get away from that complexity.
>
> That's a false dichotomy - I'm not suggesting to 'abandon' them.
>
> I'm suggesting one of these options:
>
>   - Keep the current leap second code as-is, because it's proven. Concentrate on
>     leap second smearing instead that is superior and which might emerge as a
>     future standard.

I understand that you're saying "focus on the future", which is good
advice.  But if the objection is complexity, adding leap-smearing
isn't going to reduce complexity.

I'd also argue the current leap second code isn't quite "proven". Or
at least it didn't prove itself well last time around. Those major
issues have been addressed, but the strict correctness issue where the
leap second is done at timer time, instead of the second edge is not
currently addressed.

>   - or make leap second insertion much more obvious and easier to verify (i.e. not
>     a +100 LOC delta!)

I can work on simplifying and compressing some of the state to try to
reduce the line count and make it easier to verify. But again,
applying the leap-second at the exact second edge means we have to
mimic the state transition in the various read paths, which is going
to still be a non-trivial patch.

>   - or make leap second handling a part of some other existing facility that is
>     used much more frequently: for example have you considered making it a
>     special, kernel-internal case of settimeofday()? If settimeofday was
>     implemented in a fashion to set the time at a given 'time edge', then
>     thousands of systems would test that facility day in and out. Leap second
>     insertion would simply be a special settimeofday() call that sets the time
>     back by one second at midnight June 30 or so. Normal settimeofday() would be a
>     call that sets the time back (or forward) at the next seconds edge right now -
>     but it would fundamentally use the same facility.
>
>     and yes, this variant would necessarily complicate settimeofday(), but that's
>     OK, as it's used so frequently so we have adequate testing of it.

I will have to think more about this, but initially it doesn't seem to
make much sense to me (again, I still worry that the specifics of the
issue aren't clear). settimeofday() calls happen immediately, and
under the timekeeping write lock. The issues here with leap seconds is
that we are supposed to make a second repeat exactly on the second
boundary. Since nothing asynchronous can be reliably done at a
specific time, the only way to have correct behavior is to handle the
leap adjustment in the read path, basically checking if we have
crossed that edge, and if so, setting the time we return back by one
second. For the adjtime case its more complicated because we also have
to return other state data (leap-in-progress TIME_OOP flag, or after
the leapsecond is over, the TIME_WAIT flag) which we have to adjust on
that edge as well.

The larger point of finding a way to ensure the code paths are tested
well in an everyday fashion is a good one. I'm just not sure how to
make that happen.

(One could argue having ntp apply the leapsecond from userspace via
settimeofday() rather then having the kernel do it would be more along
this line of though, but again the problem there is userspace can't
ensure something happens at a specific time, so it has the same issues
of being "late" in applying the leap adjustment that the kernel
currently has).

>     A third facility would effectiely be merged with this as well: when NTP
>     adjusts large offsets (when running with -g, etc.) it will use settimeofday(),
>     right?

So..  it depends, a method has been added to adjtimex to allow clock
syncing applications to inject time offsets, rather then trying to set
the time absolutely to correct for error (since the calculation of
setting the time absolutely is inherently racy). But again, those
mechanisms happen immediately and don't have the same behavioral
expectation that leap seconds have.

> I don't think we've exhausted all of these options.

So again, lots of good things to think about, but I'm still not seeing
too much in the way of actionable suggestions yet.

The leap-smear work is interesting, but also I see it as independent
to the issue brought up by Prarit.

In the mean-time, to try to address the complexity concern of this
issue, I am looking at trying to simplify some of the NTP state
handling, per Richard's comment, as well as moving that global state
into the timekeeper so we don't have to copy it over so often. This
will hopefully simplify my proposed change a bit (although it will
make it hell for folks trying to backport a fix).

Further changes to avoid duplicate state in the timekeeping structure
(as well as duplicate accessors) will further reduce the footprint of
the change (though I could see Thomas or you objecting to those on
performance grounds).

I can see about also removing support for leap-second removals
(STA_DEL) in the core code, which might reduce the proposed patch by a
few lines.

I'm hoping after that it will appear simple enough. :)

Thanks again for the review and thoughts!
-john

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths
  2015-06-05  0:08             ` John Stultz
@ 2015-06-05  7:29               ` Peter Zijlstra
  2015-06-05  9:04                 ` Richard Cochran
                                   ` (2 more replies)
  0 siblings, 3 replies; 40+ messages in thread
From: Peter Zijlstra @ 2015-06-05  7:29 UTC (permalink / raw)
  To: John Stultz
  Cc: Ingo Molnar, lkml, Prarit Bhargava, Daniel Bristot de Oliveira,
	Richard Cochran, Jan Kara, Jiri Bohac, Thomas Gleixner,
	Ingo Molnar, Shuah Khan

On Thu, Jun 04, 2015 at 05:08:11PM -0700, John Stultz wrote:
> I'm not sure how this follows. Leap smearing is a different behavior
> and I'd like to support it (as a separate clockid) as well, but adding
> that support doesn't change that the existing behavior applying the
> leapsecond to UTC/CLOCK_REALTIME via a timer results in behavior that
> isn't strictly correct.

So the big problem of course is that your patches do not handle the VDSO
time read, and that is the biggest source of userspace time.

So userspace will still see incorrect time, only in-kernel users (timers
being your prime example) get the leap second at the 'right' place.

Also note that your argument that timers will now get the correct time
is subject to the very same timer interrupt jitter as driving the leap
second stuff from an hrtimer itself -- they're all timers.

That leaves the question; for who is this exact second edge important?

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths
  2015-06-05  7:29               ` Peter Zijlstra
@ 2015-06-05  9:04                 ` Richard Cochran
  2015-06-05  9:10                   ` Peter Zijlstra
  2015-06-05 11:37                   ` Prarit Bhargava
  2015-06-05 14:22                 ` Richard Cochran
  2015-06-05 17:24                 ` John Stultz
  2 siblings, 2 replies; 40+ messages in thread
From: Richard Cochran @ 2015-06-05  9:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: John Stultz, Ingo Molnar, lkml, Prarit Bhargava,
	Daniel Bristot de Oliveira, Jan Kara, Jiri Bohac,
	Thomas Gleixner, Ingo Molnar, Shuah Khan

On Fri, Jun 05, 2015 at 09:29:13AM +0200, Peter Zijlstra wrote:
> That leaves the question; for who is this exact second edge important?

Distributed applications using the UTC time scale.

Many control applications are done with a 1 millisecond period.
Having the time wrong by a second for 10 or 100 loops is bad news.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths
  2015-06-05  9:04                 ` Richard Cochran
@ 2015-06-05  9:10                   ` Peter Zijlstra
  2015-06-05 14:12                     ` Richard Cochran
  2015-06-06  9:44                     ` Thomas Gleixner
  2015-06-05 11:37                   ` Prarit Bhargava
  1 sibling, 2 replies; 40+ messages in thread
From: Peter Zijlstra @ 2015-06-05  9:10 UTC (permalink / raw)
  To: Richard Cochran
  Cc: John Stultz, Ingo Molnar, lkml, Prarit Bhargava,
	Daniel Bristot de Oliveira, Jan Kara, Jiri Bohac,
	Thomas Gleixner, Ingo Molnar, Shuah Khan

On Fri, 2015-06-05 at 11:04 +0200, Richard Cochran wrote:
> On Fri, Jun 05, 2015 at 09:29:13AM +0200, Peter Zijlstra wrote:
> > That leaves the question; for who is this exact second edge important?
> 
> Distributed applications using the UTC time scale.
> 
> Many control applications are done with a 1 millisecond period.
> Having the time wrong by a second for 10 or 100 loops is bad news.

Firstly I would strongly suggest such applications not use UTC because
of this, I think TAI was invented for just this reason.

Secondly, how would John's patches help with this? Usespace loops
reading time would be using the VDSO and would still not get the right
time, and timers would be subject to the same IRQ latency that a hrtimer
based leap second insert would, and would still very much not be in-sync
across the cluster.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths
  2015-06-05  9:04                 ` Richard Cochran
  2015-06-05  9:10                   ` Peter Zijlstra
@ 2015-06-05 11:37                   ` Prarit Bhargava
  2015-06-05 12:07                     ` Thomas Gleixner
  1 sibling, 1 reply; 40+ messages in thread
From: Prarit Bhargava @ 2015-06-05 11:37 UTC (permalink / raw)
  To: Richard Cochran
  Cc: Peter Zijlstra, John Stultz, Ingo Molnar, lkml,
	Daniel Bristot de Oliveira, Jan Kara, Jiri Bohac,
	Thomas Gleixner, Ingo Molnar, Shuah Khan



On 06/05/2015 05:04 AM, Richard Cochran wrote:
> On Fri, Jun 05, 2015 at 09:29:13AM +0200, Peter Zijlstra wrote:
>> That leaves the question; for who is this exact second edge important?
> 
> Distributed applications using the UTC time scale.
> 
> Many control applications are done with a 1 millisecond period.
> Having the time wrong by a second for 10 or 100 loops is bad news.
> 

Anything starting @ UTC midnight (I think that would be Beijing, China?) ...
being one second off is not good.

P.

> Thanks,
> Richard
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths
  2015-06-05 11:37                   ` Prarit Bhargava
@ 2015-06-05 12:07                     ` Thomas Gleixner
  0 siblings, 0 replies; 40+ messages in thread
From: Thomas Gleixner @ 2015-06-05 12:07 UTC (permalink / raw)
  To: Prarit Bhargava
  Cc: Richard Cochran, Peter Zijlstra, John Stultz, Ingo Molnar, lkml,
	Daniel Bristot de Oliveira, Jan Kara, Jiri Bohac, Ingo Molnar,
	Shuah Khan

On Fri, 5 Jun 2015, Prarit Bhargava wrote:

> 
> 
> On 06/05/2015 05:04 AM, Richard Cochran wrote:
> > On Fri, Jun 05, 2015 at 09:29:13AM +0200, Peter Zijlstra wrote:
> >> That leaves the question; for who is this exact second edge important?
> > 
> > Distributed applications using the UTC time scale.
> > 
> > Many control applications are done with a 1 millisecond period.
> > Having the time wrong by a second for 10 or 100 loops is bad news.

Control applications with a 1ms period running on CLOCK_REALTIME are
broken by definition.

> > 
> 
> Anything starting @ UTC midnight (I think that would be Beijing, China?) ...
> being one second off is not good.
> 
> P.
> 
> > Thanks,
> > Richard
> > 
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths
  2015-06-05  9:10                   ` Peter Zijlstra
@ 2015-06-05 14:12                     ` Richard Cochran
  2015-06-05 17:28                       ` John Stultz
  2015-06-06  9:44                     ` Thomas Gleixner
  1 sibling, 1 reply; 40+ messages in thread
From: Richard Cochran @ 2015-06-05 14:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: John Stultz, Ingo Molnar, lkml, Prarit Bhargava,
	Daniel Bristot de Oliveira, Jan Kara, Jiri Bohac,
	Thomas Gleixner, Ingo Molnar, Shuah Khan

On Fri, Jun 05, 2015 at 11:10:08AM +0200, Peter Zijlstra wrote:
> Firstly I would strongly suggest such applications not use UTC because
> of this, I think TAI was invented for just this reason.

So I wonder whether the bug in the original post affects TAI timers as
well...

> Secondly, how would John's patches help with this? Usespace loops
> reading time would be using the VDSO and would still not get the right
> time, and timers would be subject to the same IRQ latency that a hrtimer
> based leap second insert would, and would still very much not be in-sync
> across the cluster.

But we have a tick based insertion.  (IIRC, it used to be hrtimer
based, but that was buggy, too).

Thanks,
Richard

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths
  2015-06-05  7:29               ` Peter Zijlstra
  2015-06-05  9:04                 ` Richard Cochran
@ 2015-06-05 14:22                 ` Richard Cochran
  2015-06-05 17:24                 ` John Stultz
  2 siblings, 0 replies; 40+ messages in thread
From: Richard Cochran @ 2015-06-05 14:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: John Stultz, Ingo Molnar, lkml, Prarit Bhargava,
	Daniel Bristot de Oliveira, Jan Kara, Jiri Bohac,
	Thomas Gleixner, Ingo Molnar, Shuah Khan

On Fri, Jun 05, 2015 at 09:29:13AM +0200, Peter Zijlstra wrote:
> That leaves the question; for who is this exact second edge important?

Another one: data loggers for scientific applications using UTC time
stamps.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths
  2015-06-05  7:29               ` Peter Zijlstra
  2015-06-05  9:04                 ` Richard Cochran
  2015-06-05 14:22                 ` Richard Cochran
@ 2015-06-05 17:24                 ` John Stultz
  2 siblings, 0 replies; 40+ messages in thread
From: John Stultz @ 2015-06-05 17:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, lkml, Prarit Bhargava, Daniel Bristot de Oliveira,
	Richard Cochran, Jan Kara, Jiri Bohac, Thomas Gleixner,
	Ingo Molnar, Shuah Khan

On Fri, Jun 5, 2015 at 12:29 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, Jun 04, 2015 at 05:08:11PM -0700, John Stultz wrote:
>> I'm not sure how this follows. Leap smearing is a different behavior
>> and I'd like to support it (as a separate clockid) as well, but adding
>> that support doesn't change that the existing behavior applying the
>> leapsecond to UTC/CLOCK_REALTIME via a timer results in behavior that
>> isn't strictly correct.
>
> So the big problem of course is that your patches do not handle the VDSO
> time read, and that is the biggest source of userspace time.
>
> So userspace will still see incorrect time, only in-kernel users (timers
> being your prime example) get the leap second at the 'right' place.

So yes. And as I mentioned in the patch, this is somewhat of a
tradeoff, since adding extra conditionals in the highly optimized vdso
gettime is probably more controversial. Further, since only adjtimex()
provides both the time and the leap-state, its the only interface
which can differentiate between 23:59:59 and 23:59:60, I'm not as sure
leap-insertion on exactly the leap edge is as critical for users not
using adjtimex.

Fixing the issue in the vdso would likely be a following discussion
(which I think is harder to weigh). But this patch ensures the
kernel's internal state is correct, so we don't fire timers early.

> Also note that your argument that timers will now get the correct time
> is subject to the very same timer interrupt jitter as driving the leap
> second stuff from an hrtimer itself -- they're all timers.

So. The key here is that timers set for *after* the leapsecond, will
not fire before.  That's the problem. Timers aren't supposed to fire
early, and that's the invariant we're currently not following. So if
you set a timer for midnight after the leapsecond, it may expire ~a
second early.

Now, since there's no real representation of the leap-second, you
can't set timers for that second, at least until the leap-second has
been applied.

For timers that are set just prior to the leap, its possible they
could think they were expired early, since they may be delayed and not
run until after the leapsecond is applied. So simple gettime calls
might see the time as "before" the expiration time. However, if they
use adjtimex they can see they're in the 23:59:59 w/ TIME_OOP, and
things were correct.

> That leaves the question; for who is this exact second edge important?

I don't have specific cases in mind.

Again, I argued against this sort of change when Richard suggested it
earlier. The rational that its a discontinuity, and it shouldn't
matter if that discontinuity is slightly late, since most folks aren't
using adjtimex() and carefully noting the time state flags to see if a
leap is in progress.

However, when Prarit brought up the early timer expiration issue, it
is harder for me to rationalize ignoring. Timers should not fire
early.

thanks
-john

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths
  2015-06-05 14:12                     ` Richard Cochran
@ 2015-06-05 17:28                       ` John Stultz
  0 siblings, 0 replies; 40+ messages in thread
From: John Stultz @ 2015-06-05 17:28 UTC (permalink / raw)
  To: Richard Cochran
  Cc: Peter Zijlstra, Ingo Molnar, lkml, Prarit Bhargava,
	Daniel Bristot de Oliveira, Jan Kara, Jiri Bohac,
	Thomas Gleixner, Ingo Molnar, Shuah Khan

On Fri, Jun 5, 2015 at 7:12 AM, Richard Cochran
<richardcochran@gmail.com> wrote:
> On Fri, Jun 05, 2015 at 11:10:08AM +0200, Peter Zijlstra wrote:
>> Firstly I would strongly suggest such applications not use UTC because
>> of this, I think TAI was invented for just this reason.
>
> So I wonder whether the bug in the original post affects TAI timers as
> well...

No, I don't believe so. The TAI timeline doesn't have a
discontinuity(ie: each second is properly represented over a
leapsecond), so it wouldn't have the same issue with the specifics of
when that discontinuity is applied.   Our TAI adjustment is done
atomically with the leap adjustment, so late or not it shouldn't have
problematic behavior.

thanks
-john

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths
  2015-06-05  9:10                   ` Peter Zijlstra
  2015-06-05 14:12                     ` Richard Cochran
@ 2015-06-06  9:44                     ` Thomas Gleixner
  2015-06-08 17:55                       ` John Stultz
  1 sibling, 1 reply; 40+ messages in thread
From: Thomas Gleixner @ 2015-06-06  9:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Richard Cochran, John Stultz, Ingo Molnar, lkml, Prarit Bhargava,
	Daniel Bristot de Oliveira, Jan Kara, Jiri Bohac, Ingo Molnar,
	Shuah Khan

On Fri, 5 Jun 2015, Peter Zijlstra wrote:

> On Fri, 2015-06-05 at 11:04 +0200, Richard Cochran wrote:
> > On Fri, Jun 05, 2015 at 09:29:13AM +0200, Peter Zijlstra wrote:
> > > That leaves the question; for who is this exact second edge important?
> > 
> > Distributed applications using the UTC time scale.
> > 
> > Many control applications are done with a 1 millisecond period.
> > Having the time wrong by a second for 10 or 100 loops is bad news.
> 
> Firstly I would strongly suggest such applications not use UTC because
> of this, I think TAI was invented for just this reason.
> 
> Secondly, how would John's patches help with this? Usespace loops
> reading time would be using the VDSO and would still not get the right
> time, and timers would be subject to the same IRQ latency that a hrtimer
> based leap second insert would, and would still very much not be in-sync
> across the cluster.

So the only thing which is fixed are in kernel users and therefor
hrtimers. 

That means the whole leap mess added into the gettime fast path is
just constant overhead for that corner case.

We can be smarter than that and just block hrtimer delivery for clock
realtime timers across the leap edge. There should be ways to do that
non intrusive if we think hard enough about it.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths
  2015-06-06  9:44                     ` Thomas Gleixner
@ 2015-06-08 17:55                       ` John Stultz
  2015-06-08 19:05                         ` Thomas Gleixner
  0 siblings, 1 reply; 40+ messages in thread
From: John Stultz @ 2015-06-08 17:55 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, Richard Cochran, Ingo Molnar, lkml,
	Prarit Bhargava, Daniel Bristot de Oliveira, Jan Kara,
	Jiri Bohac, Ingo Molnar, Shuah Khan

On Sat, Jun 6, 2015 at 2:44 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Fri, 5 Jun 2015, Peter Zijlstra wrote:
>
>> On Fri, 2015-06-05 at 11:04 +0200, Richard Cochran wrote:
>> > On Fri, Jun 05, 2015 at 09:29:13AM +0200, Peter Zijlstra wrote:
>> > > That leaves the question; for who is this exact second edge important?
>> >
>> > Distributed applications using the UTC time scale.
>> >
>> > Many control applications are done with a 1 millisecond period.
>> > Having the time wrong by a second for 10 or 100 loops is bad news.
>>
>> Firstly I would strongly suggest such applications not use UTC because
>> of this, I think TAI was invented for just this reason.
>>
>> Secondly, how would John's patches help with this? Usespace loops
>> reading time would be using the VDSO and would still not get the right
>> time, and timers would be subject to the same IRQ latency that a hrtimer
>> based leap second insert would, and would still very much not be in-sync
>> across the cluster.
>
> So the only thing which is fixed are in kernel users and therefor
> hrtimers.

Well, for vdso systems, hrtimers and adjtimex (which is the only
interface that provides enough information to understand where in a
leapsecond you actually are).

And again, vdsos are fixable, but I hesitated due to my concerns about
the extra performance overhead, the smaller benefit it provides
relative to not having timers expiring early.

> That means the whole leap mess added into the gettime fast path is
> just constant overhead for that corner case.
>
> We can be smarter than that and just block hrtimer delivery for clock
> realtime timers across the leap edge. There should be ways to do that
> non intrusive if we think hard enough about it.

This approach seems like it would add more work to the timer-add
function (to check leapstate and adjust the expiration), which might
be a heavier use case (we adjust for each timer) then the similar
logic done in the update_base_offsets_now() at hrtimer_interrupt time
(adjust for each hrtimer_interrupt).

Now, It could be possible to do a lighter weight version of my patch,
which just does the adjustment only for the hrtimer_interrupt code
(leaving the rest of the read paths alone).  If that is something
you'd prefer.  I'll have to think a bit to ensure the internal
inconsistency isn't otherwise problematic.

Though I suspect fixing adjtimex is worth it as well, since its really
the only interface that can provide a sane view of the leapsecond, and
isn't normally considered a hot path.

thanks
-john

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths
  2015-06-08 17:55                       ` John Stultz
@ 2015-06-08 19:05                         ` Thomas Gleixner
  2015-06-08 20:02                           ` Thomas Gleixner
  0 siblings, 1 reply; 40+ messages in thread
From: Thomas Gleixner @ 2015-06-08 19:05 UTC (permalink / raw)
  To: John Stultz
  Cc: Peter Zijlstra, Richard Cochran, Ingo Molnar, lkml,
	Prarit Bhargava, Daniel Bristot de Oliveira, Jan Kara,
	Jiri Bohac, Ingo Molnar, Shuah Khan

On Mon, 8 Jun 2015, John Stultz wrote:
> On Sat, Jun 6, 2015 at 2:44 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > On Fri, 5 Jun 2015, Peter Zijlstra wrote:
> >
> >> On Fri, 2015-06-05 at 11:04 +0200, Richard Cochran wrote:
> >> > On Fri, Jun 05, 2015 at 09:29:13AM +0200, Peter Zijlstra wrote:
> >> > > That leaves the question; for who is this exact second edge important?
> >> >
> >> > Distributed applications using the UTC time scale.
> >> >
> >> > Many control applications are done with a 1 millisecond period.
> >> > Having the time wrong by a second for 10 or 100 loops is bad news.
> >>
> >> Firstly I would strongly suggest such applications not use UTC because
> >> of this, I think TAI was invented for just this reason.
> >>
> >> Secondly, how would John's patches help with this? Usespace loops
> >> reading time would be using the VDSO and would still not get the right
> >> time, and timers would be subject to the same IRQ latency that a hrtimer
> >> based leap second insert would, and would still very much not be in-sync
> >> across the cluster.
> >
> > So the only thing which is fixed are in kernel users and therefor
> > hrtimers.
> 
> Well, for vdso systems, hrtimers and adjtimex (which is the only
> interface that provides enough information to understand where in a
> leapsecond you actually are).
> 
> And again, vdsos are fixable, but I hesitated due to my concerns about
> the extra performance overhead, the smaller benefit it provides
> relative to not having timers expiring early.

Right, and I'm concerned about the extra overhead of your patch. Just
look at the cache layout.

Current:

struct timekeeper {
       struct tk_read_base        tkr_mono;             /*     0    56 */
       struct tk_read_base        tkr_raw;              /*    56    56 */
       /* --- cacheline 1 boundary (64 bytes) was 48 bytes ago --- */
       u64                        xtime_sec;            /*   112     8 */
       long unsigned int          ktime_sec;            /*   120     8 */
       /* --- cacheline 2 boundary (128 bytes) --- */
       struct timespec            wall_to_monotonic;    /*   128    16 */
       ktime_t                    offs_real;            /*   144     8 */
       ktime_t                    offs_boot;            /*   152     8 */
       ktime_t                    offs_tai;             /*   160     8 */
       s32                        tai_offset;           /*   168     4 */
       unsigned int               clock_was_set_seq;    /*   172     4 */
       struct timespec            raw_time;             /*   176    16 */
       /* --- cacheline 3 boundary (192 bytes) --- */
       cycle_t                    cycle_interval;       /*   192     8 */
       u64                        xtime_interval;       /*   200     8 */
       s64                        xtime_remainder;      /*   208     8 */
       u32                        raw_interval;         /*   216     4 */

       /* XXX 4 bytes hole, try to pack */

       u64                        ntp_tick;             /*   224     8 */
       s64                        ntp_error;            /*   232     8 */
       u32                        ntp_error_shift;      /*   240     4 */
       u32                        ntp_err_mult;         /*   244     4 */
};

Four cachelines where everything which is considered hotpath is in the
first two cache lines.

With your change this becomes:

struct timekeeper {
       struct tk_read_base        tkr_mono;             /*     0    56 */
       struct tk_read_base        tkr_raw;              /*    56    56 */
       /* --- cacheline 1 boundary (64 bytes) was 48 bytes ago --- */
       u64                        xtime_sec;            /*   112     8 */
       long unsigned int          ktime_sec;            /*   120     8 */
       /* --- cacheline 2 boundary (128 bytes) --- */
       struct timespec            wall_to_monotonic;    /*   128    16 */
       ktime_t                    offs_real;            /*   144     8 */
       ktime_t                    offs_boot;            /*   152     8 */
       ktime_t                    offs_tai;             /*   160     8 */
       s32                        tai_offset;           /*   168     4 */
       unsigned int               clock_was_set_seq;    /*   172     4 */
       struct timespec            raw_time;             /*   176    16 */
       /* --- cacheline 3 boundary (192 bytes) --- */
       cycle_t                    cycle_interval;       /*   192     8 */
       u64                        xtime_interval;       /*   200     8 */
       s64                        xtime_remainder;      /*   208     8 */
       u32                        raw_interval;         /*   216     4 */

       /* XXX 4 bytes hole, try to pack */

       u64                        ntp_tick;             /*   224     8 */
       s64                        ntp_error;            /*   232     8 */
       u32                        ntp_error_shift;      /*   240     4 */
       u32                        ntp_err_mult;         /*   244     4 */
       time64_t                   next_leap_sec;        /*   248     8 */
       /* --- cacheline 4 boundary (256 bytes) --- */
       ktime_t                    next_leap_ktime;      /*   256     8 */
       int                        leap_direction;       /*   264     4 */
};
  
That's 5 cache lines, but that's not the worst thing.

For every readout of CLOCK_REALTIME, which will affect in kernel
callers AND all systems which lack a VDSO, we have to load TWO more
cache lines.

We could mitigate that partially by moving the leap second stuff into
the 3rd cacheline, but you also add a conditional into every readout.

Did you try to measure the impact of your changes with sys_gettime()
and VDSO disabled?

> > That means the whole leap mess added into the gettime fast path is
> > just constant overhead for that corner case.
> >
> > We can be smarter than that and just block hrtimer delivery for clock
> > realtime timers across the leap edge. There should be ways to do that
> > non intrusive if we think hard enough about it.
> 
> This approach seems like it would add more work to the timer-add
> function (to check leapstate and adjust the expiration), which might
> be a heavier use case (we adjust for each timer) then the similar
> logic done in the update_base_offsets_now() at hrtimer_interrupt time
> (adjust for each hrtimer_interrupt).

No, certainly not in the timer add function. How should that work at
all? If we arm the timer to expire 24 hours from now, how should we
know about the leap state at expiry time?

> Now, It could be possible to do a lighter weight version of my patch,
> which just does the adjustment only for the hrtimer_interrupt code
> (leaving the rest of the read paths alone).

Yes, that should work. As long as I can keep the cached values in the
hrtimer cpu bases and the whole thing keeps the clock_was_set_seq
logic intact.

If we do not do the conditional version, then on every hrtimer
interrupt we write THREE cachelines for nothing.

And if we cannot cache the offsets, then we end up calling into the
timekeeping code for every timer which is not CLOCK_MONOTONIC based
and retrieve the offset. That hurts especially on 32bit machines,
because we need to protect the readout with the timekeeper sequence
counter.

> If that is something you'd prefer.  I'll have to think a bit to
> ensure the internal inconsistency isn't otherwise problematic.
>
> Though I suspect fixing adjtimex is worth it as well, since its really
> the only interface that can provide a sane view of the leapsecond, and
> isn't normally considered a hot path.

I'm not worried about adjtimex at all. You can do whatever you want
there, but please leave the fast pathes alone.

Thanks,

	tglx


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths
  2015-06-08 19:05                         ` Thomas Gleixner
@ 2015-06-08 20:02                           ` Thomas Gleixner
  0 siblings, 0 replies; 40+ messages in thread
From: Thomas Gleixner @ 2015-06-08 20:02 UTC (permalink / raw)
  To: John Stultz
  Cc: Peter Zijlstra, Richard Cochran, Ingo Molnar, lkml,
	Prarit Bhargava, Daniel Bristot de Oliveira, Jan Kara,
	Jiri Bohac, Ingo Molnar, Shuah Khan

On Mon, 8 Jun 2015, Thomas Gleixner wrote:
> On Mon, 8 Jun 2015, John Stultz wrote:
> > Now, It could be possible to do a lighter weight version of my patch,
> > which just does the adjustment only for the hrtimer_interrupt code
> > (leaving the rest of the read paths alone).
> 
> Yes, that should work. As long as I can keep the cached values in the
> hrtimer cpu bases and the whole thing keeps the clock_was_set_seq
> logic intact.
> 
> If we do not do the conditional version, then on every hrtimer
> interrupt we write THREE cachelines for nothing.
> 
> And if we cannot cache the offsets, then we end up calling into the
> timekeeping code for every timer which is not CLOCK_MONOTONIC based
> and retrieve the offset. That hurts especially on 32bit machines,
> because we need to protect the readout with the timekeeper sequence
> counter.

Below is a patch which just has the required data in the right place
and the changes to ktime_get_update_offsets_now().

It's simpler than your version as:

    - there is no requirement to do the add in the first place as we
      know the monotonic time at which the leap second happens.

    - we can precalculate the leap adjusted offset and just replace
      the non adjusted offset for the window.

When the whole thing is over you just need to increment the
clock_was_set_seq counter so the next timer interrupt will cache the
normal values again.

Thanks,

	tglx

diff --git a/include/linux/timekeeper_internal.h b/include/linux/timekeeper_internal.h
index e1f5a1136554..ecd193c23676 100644
--- a/include/linux/timekeeper_internal.h
+++ b/include/linux/timekeeper_internal.h
@@ -90,6 +90,9 @@ struct timekeeper {
 	ktime_t			offs_tai;
 	s32			tai_offset;
 	unsigned int		clock_was_set_seq;
+	ktime_t			next_leap_ktime;
+	ktime_t			offs_real_leap_adjusted;
+
 	struct timespec64	raw_time;
 
 	/* The following members are for timekeeping internal use */
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 90ed5db67c1d..0de85bf9e331 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1952,15 +1952,21 @@ ktime_t ktime_get_update_offsets_now(unsigned int *cwsseq, ktime_t *offs_real,
 
 		base = tk->tkr_mono.base;
 		nsecs = timekeeping_get_ns(&tk->tkr_mono);
+		base = ktime_add_ns(base, nsecs);
+
 		if (*cwsseq != tk->clock_was_set_seq) {
 			*cwsseq = tk->clock_was_set_seq;
 			*offs_real = tk->offs_real;
 			*offs_boot = tk->offs_boot;
 			*offs_tai = tk->offs_tai;
 		}
+
+		if (base.tv64 >= tk->next_leap_ktime.tv64)
+			*offs_real = tk->offs_real_leap_adjusted;
+
 	} while (read_seqcount_retry(&tk_core.seq, seq));
 
-	return ktime_add_ns(base, nsecs);
+	return base;
 }
 
 /**

^ permalink raw reply related	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2015-06-08 20:02 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-29 20:24 [RFC][PATCH 0/4] Fixes for leapsecond expiring early ABS_TIME CLOCK_REALTIME timers John Stultz
2015-05-29 20:24 ` [RFC][PATCH 1/4] selftests: timers: Add leap-second timer edge testing to leap-a-day.c John Stultz
2015-05-29 20:24 ` [RFC][PATCH 2/4] timer_list: Add the base offset so remaining nsecs are accurate for non monotonic timers John Stultz
2015-05-29 20:24 ` [RFC][PATCH 3/4] ntp: Use printk_deferred in leapsecond path John Stultz
2015-06-02 10:31   ` Jiri Bohac
2015-06-02 10:43     ` Jiri Kosina
2015-06-02 16:14       ` John Stultz
2015-06-02 16:04     ` John Stultz
2015-05-29 20:24 ` [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths John Stultz
2015-05-31 16:05   ` Richard Cochran
2015-06-02  9:01   ` Ingo Molnar
2015-06-02  9:21     ` Peter Zijlstra
2015-06-02 14:09       ` John Stultz
2015-06-02 15:52     ` John Stultz
2015-06-03  9:04       ` Ingo Molnar
2015-06-03 17:44         ` John Stultz
2015-06-04  6:48           ` Ingo Molnar
2015-06-05  0:08             ` John Stultz
2015-06-05  7:29               ` Peter Zijlstra
2015-06-05  9:04                 ` Richard Cochran
2015-06-05  9:10                   ` Peter Zijlstra
2015-06-05 14:12                     ` Richard Cochran
2015-06-05 17:28                       ` John Stultz
2015-06-06  9:44                     ` Thomas Gleixner
2015-06-08 17:55                       ` John Stultz
2015-06-08 19:05                         ` Thomas Gleixner
2015-06-08 20:02                           ` Thomas Gleixner
2015-06-05 11:37                   ` Prarit Bhargava
2015-06-05 12:07                     ` Thomas Gleixner
2015-06-05 14:22                 ` Richard Cochran
2015-06-05 17:24                 ` John Stultz
2015-05-31 13:55 ` [RFC][PATCH 0/4] Fixes for leapsecond expiring early ABS_TIME CLOCK_REALTIME timers Prarit Bhargava
2015-06-01 11:57 ` Prarit Bhargava
2015-06-01 17:02   ` John Stultz
2015-06-01 17:43     ` Prarit Bhargava
2015-06-01 20:18 ` Daniel Bristot de Oliveira
2015-06-01 20:32   ` John Stultz
2015-06-01 21:42   ` Prarit Bhargava
2015-06-01 22:29     ` Daniel Bristot de Oliveira
2015-06-02  6:19       ` John Stultz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).