All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFC V1 0/5] Rationalize time keeping
@ 2012-04-27  8:12 Richard Cochran
  2012-04-27  8:12 ` [PATCH RFC V1 1/5] Add functions to convert continuous timescales to UTC Richard Cochran
                   ` (6 more replies)
  0 siblings, 7 replies; 28+ messages in thread
From: Richard Cochran @ 2012-04-27  8:12 UTC (permalink / raw)
  To: linux-kernel; +Cc: John Stultz, Thomas Gleixner

Just in time for this year's leap second, this patch series presents a
solution for the UTC leap second mess.

Of course, the POSIX UTC system is broken by design, and the Linux
kernel cannot fix that. However, what we can do is correctly execute
leap seconds and always report the time variables (UTC time, TAI
offset, and leap second status) with consistency.

The basic idea is to keep the internal time using a continuous
timescale and to convert to UTC by testing the time value against the
current threshold and adding the appropriate offset. Since the UTC
time and the leap second status is provided on demand, this eliminates
the need to set a timer or to constantly monitor for leap seconds, as
was done up until now.

Patches 2 and 3 are just trivial stuff I saw along the way.

* Benefits
  - Fixes the buggy, inconsistent time reporting surrounding a leap
    second event.
  - Opens the possibility of offering a rational time source to user
    space. [ Trivial to offer clock_gettime(CLOCK_TAI) for example. ]

* Performance Impacts
** con
   - Small extra cost when reading the time (one integer addition plus
     one integer test).
** pro
   - Removes repetitive, periodic division (secs % 86400 == 0) the whole
     day long preceding a leap second.
   - Cost of maintaining leap second status goes to the user of the
     NTP adjtimex() interface, if any.

* Todo
  - The function __current_kernel_time accesses the time variables
    without taking the lock. I can't figure that out.


Richard Cochran (5):
  Add functions to convert continuous timescales to UTC.
  ntp: Fix a stale comment and a few stray newlines.
  timekeeping: Fix a few minor newline issues.
  timekeeping: Offer an interface to manipulate leap seconds.
  timekeeping: Use a continuous timescale to tell time.

 include/linux/timex.h      |    2 +-
 kernel/time/Kconfig        |   12 ++
 kernel/time/leap-seconds.h |   23 ++++
 kernel/time/ntp.c          |   87 ++++------------
 kernel/time/timekeeping.c  |  252 +++++++++++++++++++++++++++++++++++++++++---
 kernel/time/utc-tai.h      |   99 +++++++++++++++++
 6 files changed, 391 insertions(+), 84 deletions(-)
 create mode 100644 kernel/time/leap-seconds.h
 create mode 100644 kernel/time/utc-tai.h

-- 
1.7.2.5


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH RFC V1 1/5] Add functions to convert continuous timescales to UTC.
  2012-04-27  8:12 [PATCH RFC V1 0/5] Rationalize time keeping Richard Cochran
@ 2012-04-27  8:12 ` Richard Cochran
  2012-04-27  8:12 ` [PATCH RFC V1 2/5] ntp: Fix a stale comment and a few stray newlines Richard Cochran
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 28+ messages in thread
From: Richard Cochran @ 2012-04-27  8:12 UTC (permalink / raw)
  To: linux-kernel; +Cc: John Stultz, Thomas Gleixner

This patch adds a pair of inline helper functions to convert from
one timescale to the other, around a given leap second threshold.
Also, it adds an academic compile time option to support deleting
leap seconds.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
---
 kernel/time/Kconfig   |   12 ++++++
 kernel/time/utc-tai.h |   99 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 111 insertions(+), 0 deletions(-)
 create mode 100644 kernel/time/utc-tai.h

diff --git a/kernel/time/Kconfig b/kernel/time/Kconfig
index a20dc8a..1569aef 100644
--- a/kernel/time/Kconfig
+++ b/kernel/time/Kconfig
@@ -26,6 +26,18 @@ config HIGH_RES_TIMERS
 	  hardware is not capable then this option only increases
 	  the size of the kernel image.
 
+config DELETE_LEAP_SECONDS
+	bool "Support deletion of leap seconds"
+	default y
+	help
+	  This option enables support for leap second deletion via the
+	  STA_DEL control bit in the NTP adjtimex system call.
+
+	  Historically, leap seconds have only ever been inserted,
+	  never deleted. Saying no here results in a slightly smaller
+	  kernel image. If you believe that Earth's rotational speed
+	  might one day increase, then say yes here.
+
 config GENERIC_CLOCKEVENTS_BUILD
 	bool
 	default y
diff --git a/kernel/time/utc-tai.h b/kernel/time/utc-tai.h
new file mode 100644
index 0000000..ffc3ae9
--- /dev/null
+++ b/kernel/time/utc-tai.h
@@ -0,0 +1,99 @@
+/*
+ * linux/kernel/time/utc-tai.h
+ *
+ * Converts between TAI and UTC by applying the offset correction.
+ *
+ * Copyright (C) 2012 Richard Cochran <richardcochran@gmail.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ *
+ */
+#ifndef __LINUX_KERNEL_TIME_UTC_TAI_H
+#define __LINUX_KERNEL_TIME_UTC_TAI_H
+
+#include <linux/types.h>
+
+static inline time_t __tai_to_utc(time_t tai, time_t next, int offset)
+{
+	time_t utc = tai - offset;
+
+	if (unlikely(tai >= next))
+		utc--;
+
+	return utc;
+}
+
+static inline time_t __utc_to_tai(time_t utc, time_t next, int offset)
+{
+	time_t leapsec = next - offset;
+	time_t result = utc + offset;
+
+	if (unlikely(utc >= leapsec))
+		result++;
+
+	return result;
+}
+
+#ifdef CONFIG_DELETE_LEAP_SECONDS
+
+static inline time_t tai_to_utc(time_t tai, time_t next, int offset, int insert)
+{
+	if (likely(insert))
+		return __tai_to_utc(tai, next, offset);
+
+	if (unlikely(tai > next))
+		return tai - offset + 1;
+
+	return tai - offset;
+}
+
+static inline time_t utc_to_tai(time_t utc, time_t next, int offset, int insert)
+{
+	time_t leapsec, result;
+
+	if (likely(insert))
+		return __utc_to_tai(utc, next, offset);
+
+	leapsec = next - offset + 1;
+
+	if (unlikely(utc > leapsec))
+		result = utc + offset - 1;
+	else if (utc < leapsec)
+		result = utc + offset;
+	else {
+		/*
+		 * Ouch, this time never existed.
+		 */
+		result = next;
+	}
+
+	return result;
+}
+
+#else /*CONFIG_DELETE_LEAP_SECONDS*/
+
+static inline time_t tai_to_utc(time_t tai, time_t next, int offset, int insert)
+{
+	return __tai_to_utc(tai, next, offset);
+}
+
+static inline time_t utc_to_tai(time_t utc, time_t next, int offset, int insert)
+{
+	return __utc_to_tai(utc, next, offset);
+}
+
+#endif /*!CONFIG_DELETE_LEAP_SECONDS*/
+
+#endif
-- 
1.7.2.5


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH RFC V1 2/5] ntp: Fix a stale comment and a few stray newlines.
  2012-04-27  8:12 [PATCH RFC V1 0/5] Rationalize time keeping Richard Cochran
  2012-04-27  8:12 ` [PATCH RFC V1 1/5] Add functions to convert continuous timescales to UTC Richard Cochran
@ 2012-04-27  8:12 ` Richard Cochran
  2012-04-27 22:25   ` John Stultz
  2012-04-27  8:12 ` [PATCH RFC V1 3/5] timekeeping: Fix a few minor newline issues Richard Cochran
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 28+ messages in thread
From: Richard Cochran @ 2012-04-27  8:12 UTC (permalink / raw)
  To: linux-kernel; +Cc: John Stultz, Thomas Gleixner

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
---
 kernel/time/ntp.c |    6 ++----
 1 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index f03fd83..d0a2183 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -473,8 +473,6 @@ int second_overflow(unsigned long secs)
 							 << NTP_SCALE_SHIFT;
 	time_adjust = 0;
 
-
-
 out:
 	spin_unlock_irqrestore(&ntp_lock, flags);
 
@@ -559,10 +557,10 @@ static inline void process_adj_status(struct timex *txc, struct timespec *ts)
 	/* only set allowed bits */
 	time_status &= STA_RONLY;
 	time_status |= txc->status & ~STA_RONLY;
-
 }
+
 /*
- * Called with the xtime lock held, so we can access and modify
+ * Called with ntp_lock held, so we can access and modify
  * all the global NTP state:
  */
 static inline void process_adjtimex_modes(struct timex *txc, struct timespec *ts)
-- 
1.7.2.5


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH RFC V1 3/5] timekeeping: Fix a few minor newline issues.
  2012-04-27  8:12 [PATCH RFC V1 0/5] Rationalize time keeping Richard Cochran
  2012-04-27  8:12 ` [PATCH RFC V1 1/5] Add functions to convert continuous timescales to UTC Richard Cochran
  2012-04-27  8:12 ` [PATCH RFC V1 2/5] ntp: Fix a stale comment and a few stray newlines Richard Cochran
@ 2012-04-27  8:12 ` Richard Cochran
  2012-04-27 22:25   ` John Stultz
  2012-04-27  8:12 ` [PATCH RFC V1 4/5] timekeeping: Offer an interface to manipulate leap seconds Richard Cochran
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 28+ messages in thread
From: Richard Cochran @ 2012-04-27  8:12 UTC (permalink / raw)
  To: linux-kernel; +Cc: John Stultz, Thomas Gleixner

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
---
 kernel/time/timekeeping.c |    4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index d66b213..6e46cac 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -240,7 +240,6 @@ void getnstimeofday(struct timespec *ts)
 
 	timespec_add_ns(ts, nsecs);
 }
-
 EXPORT_SYMBOL(getnstimeofday);
 
 ktime_t ktime_get(void)
@@ -357,8 +356,8 @@ void do_gettimeofday(struct timeval *tv)
 	tv->tv_sec = now.tv_sec;
 	tv->tv_usec = now.tv_nsec/1000;
 }
-
 EXPORT_SYMBOL(do_gettimeofday);
+
 /**
  * do_settimeofday - Sets the time of day
  * @tv:		pointer to the timespec variable containing the new time
@@ -392,7 +391,6 @@ int do_settimeofday(const struct timespec *tv)
 
 	return 0;
 }
-
 EXPORT_SYMBOL(do_settimeofday);
 
 
-- 
1.7.2.5


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH RFC V1 4/5] timekeeping: Offer an interface to manipulate leap seconds.
  2012-04-27  8:12 [PATCH RFC V1 0/5] Rationalize time keeping Richard Cochran
                   ` (2 preceding siblings ...)
  2012-04-27  8:12 ` [PATCH RFC V1 3/5] timekeeping: Fix a few minor newline issues Richard Cochran
@ 2012-04-27  8:12 ` Richard Cochran
  2012-04-27 23:08   ` John Stultz
  2012-04-27  8:12 ` [PATCH RFC V1 5/5] timekeeping: Use a continuous timescale to tell time Richard Cochran
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 28+ messages in thread
From: Richard Cochran @ 2012-04-27  8:12 UTC (permalink / raw)
  To: linux-kernel; +Cc: John Stultz, Thomas Gleixner

This patch adds a new internal interface to be used by the NTP code in
order to set the next leap second event. Also, it adds a kernel command
line option that can be used to dial the TAI - UTC offset at boot.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
---
 kernel/time/leap-seconds.h |   23 ++++++
 kernel/time/timekeeping.c  |  175 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 198 insertions(+), 0 deletions(-)
 create mode 100644 kernel/time/leap-seconds.h

diff --git a/kernel/time/leap-seconds.h b/kernel/time/leap-seconds.h
new file mode 100644
index 0000000..d13923e8
--- /dev/null
+++ b/kernel/time/leap-seconds.h
@@ -0,0 +1,23 @@
+/*
+ * linux/kernel/time/leap-seconds.h
+ *
+ * Functional interface to the timekeeper code,
+ * for use by the NTP code.
+ *
+ */
+#ifndef __LINUX_KERNEL_TIME_LEAP_SECONDS_H
+#define __LINUX_KERNEL_TIME_LEAP_SECONDS_H
+
+#include <linux/time.h>
+
+int timekeeper_gettod_status(struct timespec *ts, int *offset);
+
+void timekeeper_delete_leap_second(void);
+
+void timekeeper_finish_leap_second(void);
+
+void timekeeper_insert_leap_second(void);
+
+void timekeeper_tai_offset(int offset);
+
+#endif
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 6e46cac..7941258 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -21,6 +21,9 @@
 #include <linux/tick.h>
 #include <linux/stop_machine.h>
 
+#include "leap-seconds.h"
+#include "utc-tai.h"
+
 /* Structure holding internal timekeeping values. */
 struct timekeeper {
 	/* Current clocksource used for timekeeping. */
@@ -50,6 +53,16 @@ struct timekeeper {
 
 	/* The current time */
 	struct timespec xtime;
+	/* The Kernel Time Scale (KTS) value of the next leap second. */
+	time_t next_leapsecond;
+	/* The current difference KTS - UTC. */
+	int kts_utc_offset;
+	/* The current difference TAI - KTS. */
+	int tai_kts_offset;
+#ifdef CONFIG_DELETE_LEAP_SECONDS
+	/* Remembers whether to insert or to delete. */
+	int insert_leapsecond;
+#endif
 	/*
 	 * wall_to_monotonic is what we need to add to xtime (or xtime corrected
 	 * for sub jiffie times) to get to monotonic time.  Monotonic is pegged
@@ -87,6 +100,30 @@ __cacheline_aligned_in_smp DEFINE_SEQLOCK(xtime_lock);
 int __read_mostly timekeeping_suspended;
 
 
+static int __init tai_offset_setup(char *str)
+{
+	get_option(&str, &timekeeper.kts_utc_offset);
+	return 1;
+}
+__setup("tai_offset=", tai_offset_setup);
+
+#ifdef CONFIG_DELETE_LEAP_SECONDS
+#define tk_insert timekeeper.insert_leapsecond
+#else
+#define tk_insert 1
+#endif
+
+/**
+ * timekeeper_utc_sec - returns current time in the UTC timescale
+ *
+ * Callers must use timekeeper.lock for reading.
+ */
+static inline time_t timekeeper_utc_sec(void)
+{
+	return tai_to_utc(timekeeper.xtime.tv_sec,
+			  timekeeper.next_leapsecond,
+			  timekeeper.kts_utc_offset, tk_insert);
+}
 
 /**
  * timekeeper_setup_internals - Set up internals to use clocksource clock.
@@ -455,6 +492,144 @@ static int change_clocksource(void *data)
 }
 
 /**
+ * timekeeper_gettod_status - Get the current time, TAI offset,
+ *                            and leap second status.
+ */
+int timekeeper_gettod_status(struct timespec *ts, int *offset)
+{
+	int code = TIME_OK, insert, ku_off, tk_off;
+	unsigned long seq;
+	time_t diff, next;
+	s64 nsecs;
+
+	WARN_ON(timekeeping_suspended);
+
+	do {
+		seq = read_seqbegin(&timekeeper.lock);
+		next   = timekeeper.next_leapsecond;
+		ku_off = timekeeper.kts_utc_offset;
+		tk_off = timekeeper.tai_kts_offset;
+		*ts    = timekeeper.xtime;
+		nsecs  = timekeeping_get_ns();
+		nsecs += arch_gettimeoffset();
+		insert = tk_insert;
+
+	} while (read_seqretry(&timekeeper.lock, seq));
+
+	timespec_add_ns(ts, nsecs);
+
+	diff = next - ts->tv_sec;
+	ts->tv_sec = tai_to_utc(ts->tv_sec, next, ku_off, insert);
+
+	if (!diff) {
+		code = TIME_OOP;
+		ku_off += insert ? 1 : 0;
+	} else if (diff < 0) {
+		code = TIME_WAIT;
+		ku_off += insert ? 1 : -1;
+	} else if (diff < 86400) {
+		code = insert ? TIME_INS : TIME_DEL;
+	}
+
+	*offset = ku_off + tk_off;
+	return code;
+}
+
+/**
+ * utc_next_midnight - Return the UTC time of the next zero hour.
+ *
+ * Callers must use timekeeper.lock.
+ */
+static time_t utc_next_midnight(void)
+{
+	time_t days, now, zero;
+
+	now = timekeeper_utc_sec();
+	days = now / 86400;
+	zero = (1 + days) * 86400;
+
+	return zero;
+}
+
+/**
+ * timekeeper_delete_leap_second - Delete a leap second today.
+ */
+void timekeeper_delete_leap_second(void)
+{
+#ifdef CONFIG_DELETE_LEAP_SECONDS
+	time_t leap;
+	unsigned long flags;
+
+	write_seqlock_irqsave(&timekeeper.lock, flags);
+
+	leap = utc_next_midnight() - 1;
+	timekeeper.next_leapsecond = leap + timekeeper.kts_utc_offset;
+	timekeeper.insert_leapsecond = 0;
+
+	write_sequnlock_irqrestore(&timekeeper.lock, flags);
+#endif
+}
+
+/**
+ * timekeeper_finish_leap_second - Advance the leap second threshold.
+ */
+void timekeeper_finish_leap_second(void)
+{
+	unsigned long flags;
+	write_seqlock_irqsave(&timekeeper.lock, flags);
+
+	if (timekeeper.xtime.tv_sec <= timekeeper.next_leapsecond)
+		goto out;
+
+	timekeeper.next_leapsecond = LONG_MAX;
+
+	if (tk_insert)
+		timekeeper.kts_utc_offset++;
+	else
+		timekeeper.kts_utc_offset--;
+out:
+	write_sequnlock_irqrestore(&timekeeper.lock, flags);
+}
+
+/**
+ * timekeeper_insert_leap_second - Add a leap second today.
+ */
+void timekeeper_insert_leap_second(void)
+{
+	time_t leap;
+	unsigned long flags;
+
+	write_seqlock_irqsave(&timekeeper.lock, flags);
+
+	leap = utc_next_midnight();
+	timekeeper.next_leapsecond = leap + timekeeper.kts_utc_offset;
+#ifdef CONFIG_DELETE_LEAP_SECONDS
+	timekeeper.insert_leapsecond = 1;
+#endif
+	write_sequnlock_irqrestore(&timekeeper.lock, flags);
+}
+
+/**
+ * timekeeper_tai_offset - Set the TAI - UTC offset.
+ *
+ * If the current offset is wrong, then we fix it by resetting the TAI
+ * clock and keeping the UTC time continuous. The rationale is that if
+ * a machine boots and gets an approximately correct UTC time from a
+ * RTC and later discovers the current TAI offset from the network,
+ * then the UTC users will not experience a jump in time.
+ */
+void timekeeper_tai_offset(int offset)
+{
+	unsigned long flags;
+
+	write_seqlock_irqsave(&timekeeper.lock, flags);
+
+	timekeeper.tai_kts_offset = offset - timekeeper.kts_utc_offset;
+
+	write_sequnlock_irqrestore(&timekeeper.lock, flags);
+}
+
+/**
  * timekeeping_notify - Install a new clock source
  * @clock:		pointer to the clock source
  *
-- 
1.7.2.5


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH RFC V1 5/5] timekeeping: Use a continuous timescale to tell time.
  2012-04-27  8:12 [PATCH RFC V1 0/5] Rationalize time keeping Richard Cochran
                   ` (3 preceding siblings ...)
  2012-04-27  8:12 ` [PATCH RFC V1 4/5] timekeeping: Offer an interface to manipulate leap seconds Richard Cochran
@ 2012-04-27  8:12 ` Richard Cochran
  2012-05-28 16:49   ` Richard Cochran
  2012-04-27 22:49 ` [PATCH RFC V1 0/5] Rationalize time keeping John Stultz
  2012-05-03 19:57 ` John Stultz
  6 siblings, 1 reply; 28+ messages in thread
From: Richard Cochran @ 2012-04-27  8:12 UTC (permalink / raw)
  To: linux-kernel; +Cc: John Stultz, Thomas Gleixner

This patch converts the core time keeping code to use a continuous
timescale called the Kernel Time Scale (KTS). KTS is based on the
TAI timescale but can be offset from it by an integral number of seconds.
Every function that returns UTC time now coverts the seconds by adding
the current KTS - UTC offset.

As a result of this change, the NTP leap second code is considerably
simplified and hopefully more robust.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
---
 include/linux/timex.h     |    2 +-
 kernel/time/ntp.c         |   81 ++++++++++----------------------------------
 kernel/time/timekeeping.c |   73 ++++++++++++++++++++++++++++++++--------
 3 files changed, 79 insertions(+), 77 deletions(-)

diff --git a/include/linux/timex.h b/include/linux/timex.h
index 99bc88b..9461e6f 100644
--- a/include/linux/timex.h
+++ b/include/linux/timex.h
@@ -252,7 +252,7 @@ extern void ntp_clear(void);
 /* Returns how long ticks are at present, in ns / 2^NTP_SCALE_SHIFT. */
 extern u64 ntp_tick_length(void);
 
-extern int second_overflow(unsigned long secs);
+void second_overflow(void);
 extern int do_adjtimex(struct timex *);
 extern void hardpps(const struct timespec *, const struct timespec *);
 
diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index d0a2183..91de2f8 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -16,6 +16,7 @@
 #include <linux/mm.h>
 #include <linux/module.h>
 
+#include "leap-seconds.h"
 #include "tick-internal.h"
 
 /*
@@ -24,6 +25,7 @@
 
 DEFINE_SPINLOCK(ntp_lock);
 
+#define STA_LEAP		(STA_INS | STA_DEL)
 
 /* USER_HZ period (usecs): */
 unsigned long			tick_usec = TICK_USEC;
@@ -42,19 +44,9 @@ static u64			tick_length_base;
  * phase-lock loop variables
  */
 
-/*
- * clock synchronization status
- *
- * (TIME_ERROR prevents overwriting the CMOS clock)
- */
-static int			time_state = TIME_OK;
-
 /* clock status bits:							*/
 static int			time_status = STA_UNSYNC;
 
-/* TAI offset (secs):							*/
-static long			time_tai;
-
 /* time adjustment (nsecs):						*/
 static s64			time_offset;
 
@@ -386,57 +378,14 @@ u64 ntp_tick_length(void)
  * They were originally developed for SUN and DEC kernels.
  * All the kudos should go to Dave for this stuff.
  *
- * Also handles leap second processing, and returns leap offset
  */
-int second_overflow(unsigned long secs)
+void second_overflow(void)
 {
 	s64 delta;
-	int leap = 0;
 	unsigned long flags;
 
 	spin_lock_irqsave(&ntp_lock, flags);
 
-	/*
-	 * Leap second processing. If in leap-insert state at the end of the
-	 * day, the system clock is set back one second; if in leap-delete
-	 * state, the system clock is set ahead one second.
-	 */
-	switch (time_state) {
-	case TIME_OK:
-		if (time_status & STA_INS)
-			time_state = TIME_INS;
-		else if (time_status & STA_DEL)
-			time_state = TIME_DEL;
-		break;
-	case TIME_INS:
-		if (secs % 86400 == 0) {
-			leap = -1;
-			time_state = TIME_OOP;
-			printk(KERN_NOTICE
-				"Clock: inserting leap second 23:59:60 UTC\n");
-		}
-		break;
-	case TIME_DEL:
-		if ((secs + 1) % 86400 == 0) {
-			leap = 1;
-			time_tai--;
-			time_state = TIME_WAIT;
-			printk(KERN_NOTICE
-				"Clock: deleting leap second 23:59:59 UTC\n");
-		}
-		break;
-	case TIME_OOP:
-		time_tai++;
-		time_state = TIME_WAIT;
-		break;
-
-	case TIME_WAIT:
-		if (!(time_status & (STA_INS | STA_DEL)))
-			time_state = TIME_OK;
-		break;
-	}
-
-
 	/* Bump the maxerror field */
 	time_maxerror += MAXFREQ / NSEC_PER_USEC;
 	if (time_maxerror > NTP_PHASE_LIMIT) {
@@ -475,8 +424,6 @@ int second_overflow(unsigned long secs)
 
 out:
 	spin_unlock_irqrestore(&ntp_lock, flags);
-
-	return leap;
 }
 
 #ifdef CONFIG_GENERIC_CMOS_UPDATE
@@ -541,7 +488,6 @@ static inline void notify_cmos_timer(void) { }
 static inline void process_adj_status(struct timex *txc, struct timespec *ts)
 {
 	if ((time_status & STA_PLL) && !(txc->status & STA_PLL)) {
-		time_state = TIME_OK;
 		time_status = STA_UNSYNC;
 		/* restart PPS frequency calibration */
 		pps_reset_freq_interval();
@@ -554,6 +500,18 @@ static inline void process_adj_status(struct timex *txc, struct timespec *ts)
 	if (!(time_status & STA_PLL) && (txc->status & STA_PLL))
 		time_reftime = get_seconds();
 
+	/*
+	 * Check for new leap second commands.
+	 */
+	if (!(time_status & STA_INS) && (txc->status & STA_INS))
+		timekeeper_insert_leap_second();
+
+	else if (!(time_status & STA_DEL) && (txc->status & STA_DEL))
+		timekeeper_delete_leap_second();
+
+	else if ((time_status & STA_LEAP) && !(txc->status & STA_LEAP))
+		timekeeper_finish_leap_second();
+
 	/* only set allowed bits */
 	time_status &= STA_RONLY;
 	time_status |= txc->status & ~STA_RONLY;
@@ -597,7 +555,7 @@ static inline void process_adjtimex_modes(struct timex *txc, struct timespec *ts
 	}
 
 	if (txc->modes & ADJ_TAI && txc->constant > 0)
-		time_tai = txc->constant;
+		timekeeper_tai_offset(txc->constant);
 
 	if (txc->modes & ADJ_OFFSET)
 		ntp_update_offset(txc->offset);
@@ -616,7 +574,7 @@ static inline void process_adjtimex_modes(struct timex *txc, struct timespec *ts
 int do_adjtimex(struct timex *txc)
 {
 	struct timespec ts;
-	int result;
+	int result, tai_offset;
 
 	/* Validate the data before disabling interrupts */
 	if (txc->modes & ADJ_ADJTIME) {
@@ -654,7 +612,7 @@ int do_adjtimex(struct timex *txc)
 			return result;
 	}
 
-	getnstimeofday(&ts);
+	result = timekeeper_gettod_status(&ts, &tai_offset);
 
 	spin_lock_irq(&ntp_lock);
 
@@ -679,7 +637,6 @@ int do_adjtimex(struct timex *txc)
 			txc->offset /= NSEC_PER_USEC;
 	}
 
-	result = time_state;	/* mostly `TIME_OK' */
 	/* check for errors */
 	if (is_error_status(time_status))
 		result = TIME_ERROR;
@@ -693,7 +650,7 @@ int do_adjtimex(struct timex *txc)
 	txc->precision	   = 1;
 	txc->tolerance	   = MAXFREQ_SCALED / PPM_SCALE;
 	txc->tick	   = tick_usec;
-	txc->tai	   = time_tai;
+	txc->tai	   = tai_offset;
 
 	/* fill PPS status fields */
 	pps_fill_timex(txc);
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 7941258..6cedf46 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -212,11 +212,14 @@ static inline s64 timekeeping_get_ns_raw(void)
 /* must hold write on timekeeper.lock */
 static void timekeeping_update(bool clearntp)
 {
+	struct timespec ts;
 	if (clearntp) {
 		timekeeper.ntp_error = 0;
 		ntp_clear();
 	}
-	update_vsyscall(&timekeeper.xtime, &timekeeper.wall_to_monotonic,
+	ts.tv_sec = timekeeper_utc_sec();
+	ts.tv_nsec = timekeeper.xtime.tv_nsec;
+	update_vsyscall(&ts, &timekeeper.wall_to_monotonic,
 			 timekeeper.clock, timekeeper.mult);
 }
 
@@ -267,7 +270,8 @@ void getnstimeofday(struct timespec *ts)
 	do {
 		seq = read_seqbegin(&timekeeper.lock);
 
-		*ts = timekeeper.xtime;
+		ts->tv_sec = timekeeper_utc_sec();
+		ts->tv_nsec = timekeeper.xtime.tv_nsec;
 		nsecs = timekeeping_get_ns();
 
 		/* If arch requires, add in gettimeoffset() */
@@ -360,7 +364,8 @@ void getnstime_raw_and_real(struct timespec *ts_raw, struct timespec *ts_real)
 		seq = read_seqbegin(&timekeeper.lock);
 
 		*ts_raw = timekeeper.raw_time;
-		*ts_real = timekeeper.xtime;
+		ts_real->tv_sec = timekeeper_utc_sec();
+		ts_real->tv_nsec = timekeeper.xtime.tv_nsec;
 
 		nsecs_raw = timekeeping_get_ns_raw();
 		nsecs_real = timekeeping_get_ns();
@@ -404,6 +409,7 @@ EXPORT_SYMBOL(do_gettimeofday);
 int do_settimeofday(const struct timespec *tv)
 {
 	struct timespec ts_delta;
+	time_t kts_seconds;
 	unsigned long flags;
 
 	if ((unsigned long)tv->tv_nsec >= NSEC_PER_SEC)
@@ -413,12 +419,16 @@ int do_settimeofday(const struct timespec *tv)
 
 	timekeeping_forward_now();
 
-	ts_delta.tv_sec = tv->tv_sec - timekeeper.xtime.tv_sec;
+	kts_seconds = utc_to_tai(tv->tv_sec, timekeeper.next_leapsecond,
+				 timekeeper.kts_utc_offset, tk_insert);
+
+	ts_delta.tv_sec = kts_seconds - timekeeper.xtime.tv_sec;
 	ts_delta.tv_nsec = tv->tv_nsec - timekeeper.xtime.tv_nsec;
 	timekeeper.wall_to_monotonic =
 			timespec_sub(timekeeper.wall_to_monotonic, ts_delta);
 
-	timekeeper.xtime = *tv;
+	timekeeper.xtime.tv_sec = kts_seconds;
+	timekeeper.xtime.tv_nsec = tv->tv_nsec;
 	timekeeping_update(true);
 
 	write_sequnlock_irqrestore(&timekeeper.lock, flags);
@@ -769,6 +779,21 @@ void __init timekeeping_init(void)
 		clock->enable(clock);
 	timekeeper_setup_internals(clock);
 
+	/*
+	 * If the TAI offset is not set or unreasonable, start with
+	 * the offset as of 1 Jan 2009.
+	 */
+	if (timekeeper.kts_utc_offset < 10)
+		timekeeper.kts_utc_offset = 34;
+
+	timekeeper.next_leapsecond = LONG_MAX;
+
+#ifdef CONFIG_DELETE_LEAP_SECONDS
+	timekeeper.insert_leapsecond = 1;
+#endif
+	now.tv_sec = utc_to_tai(now.tv_sec, timekeeper.next_leapsecond,
+				timekeeper.kts_utc_offset, tk_insert);
+
 	timekeeper.xtime.tv_sec = now.tv_sec;
 	timekeeper.xtime.tv_nsec = now.tv_nsec;
 	timekeeper.raw_time.tv_sec = 0;
@@ -1132,11 +1157,9 @@ static cycle_t logarithmic_accumulation(cycle_t offset, int shift)
 
 	timekeeper.xtime_nsec += timekeeper.xtime_interval << shift;
 	while (timekeeper.xtime_nsec >= nsecps) {
-		int leap;
 		timekeeper.xtime_nsec -= nsecps;
 		timekeeper.xtime.tv_sec++;
-		leap = second_overflow(timekeeper.xtime.tv_sec);
-		timekeeper.xtime.tv_sec += leap;
+		second_overflow();
 	}
 
 	/* Accumulate raw time */
@@ -1247,11 +1270,9 @@ static void update_wall_time(void)
 	 * xtime.tv_nsec isn't larger than NSEC_PER_SEC
 	 */
 	if (unlikely(timekeeper.xtime.tv_nsec >= NSEC_PER_SEC)) {
-		int leap;
 		timekeeper.xtime.tv_nsec -= NSEC_PER_SEC;
 		timekeeper.xtime.tv_sec++;
-		leap = second_overflow(timekeeper.xtime.tv_sec);
-		timekeeper.xtime.tv_sec += leap;
+		second_overflow();
 	}
 
 	timekeeping_update(false);
@@ -1346,13 +1367,35 @@ EXPORT_SYMBOL_GPL(monotonic_to_bootbased);
 
 unsigned long get_seconds(void)
 {
-	return timekeeper.xtime.tv_sec;
+	unsigned long seconds, seq;
+
+	do {
+		seq = read_seqbegin(&timekeeper.lock);
+		seconds = timekeeper_utc_sec();
+	} while (read_seqretry(&timekeeper.lock, seq));
+
+	return seconds;
 }
 EXPORT_SYMBOL(get_seconds);
 
 struct timespec __current_kernel_time(void)
 {
-	return timekeeper.xtime;
+/*
+ * TODO - What do the two callers really expect?
+ *        Why don't they take the lock?
+ *
+ * linux/arch/x86/kernel/vsyscall_64.c: update_vsyscall[104]
+ *
+ *	vsyscall_gtod_data.wall_time_coarse = __current_kernel_time();
+ *
+ * linux/kernel/debug/kdb/kdb_main.c:   kdb_summary[2570]
+ *
+ *	now = __current_kernel_time();
+ */
+	struct timespec ts;
+	ts.tv_sec = timekeeper_utc_sec();
+	ts.tv_nsec = timekeeper.xtime.tv_nsec;
+	return ts;
 }
 
 struct timespec current_kernel_time(void)
@@ -1363,7 +1406,9 @@ struct timespec current_kernel_time(void)
 	do {
 		seq = read_seqbegin(&timekeeper.lock);
 
-		now = timekeeper.xtime;
+		now.tv_sec = timekeeper_utc_sec();
+		now.tv_nsec = timekeeper.xtime.tv_nsec;
+
 	} while (read_seqretry(&timekeeper.lock, seq));
 
 	return now;
-- 
1.7.2.5


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC V1 2/5] ntp: Fix a stale comment and a few stray newlines.
  2012-04-27  8:12 ` [PATCH RFC V1 2/5] ntp: Fix a stale comment and a few stray newlines Richard Cochran
@ 2012-04-27 22:25   ` John Stultz
  0 siblings, 0 replies; 28+ messages in thread
From: John Stultz @ 2012-04-27 22:25 UTC (permalink / raw)
  To: Richard Cochran; +Cc: linux-kernel, Thomas Gleixner

On 04/27/2012 01:12 AM, Richard Cochran wrote:
> Signed-off-by: Richard Cochran<richardcochran@gmail.com>
Queued.

thanks
-john

> ---
>   kernel/time/ntp.c |    6 ++----
>   1 files changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
> index f03fd83..d0a2183 100644
> --- a/kernel/time/ntp.c
> +++ b/kernel/time/ntp.c
> @@ -473,8 +473,6 @@ int second_overflow(unsigned long secs)
>   							<<  NTP_SCALE_SHIFT;
>   	time_adjust = 0;
>
> -
> -
>   out:
>   	spin_unlock_irqrestore(&ntp_lock, flags);
>
> @@ -559,10 +557,10 @@ static inline void process_adj_status(struct timex *txc, struct timespec *ts)
>   	/* only set allowed bits */
>   	time_status&= STA_RONLY;
>   	time_status |= txc->status&  ~STA_RONLY;
> -
>   }
> +
>   /*
> - * Called with the xtime lock held, so we can access and modify
> + * Called with ntp_lock held, so we can access and modify
>    * all the global NTP state:
>    */
>   static inline void process_adjtimex_modes(struct timex *txc, struct timespec *ts)


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC V1 3/5] timekeeping: Fix a few minor newline issues.
  2012-04-27  8:12 ` [PATCH RFC V1 3/5] timekeeping: Fix a few minor newline issues Richard Cochran
@ 2012-04-27 22:25   ` John Stultz
  0 siblings, 0 replies; 28+ messages in thread
From: John Stultz @ 2012-04-27 22:25 UTC (permalink / raw)
  To: Richard Cochran; +Cc: linux-kernel, Thomas Gleixner

On 04/27/2012 01:12 AM, Richard Cochran wrote:
> Signed-off-by: Richard Cochran<richardcochran@gmail.com>

Queued.
thanks
-john


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC V1 0/5] Rationalize time keeping
  2012-04-27  8:12 [PATCH RFC V1 0/5] Rationalize time keeping Richard Cochran
                   ` (4 preceding siblings ...)
  2012-04-27  8:12 ` [PATCH RFC V1 5/5] timekeeping: Use a continuous timescale to tell time Richard Cochran
@ 2012-04-27 22:49 ` John Stultz
  2012-04-28  8:04   ` Richard Cochran
  2012-05-03 18:21   ` Richard Cochran
  2012-05-03 19:57 ` John Stultz
  6 siblings, 2 replies; 28+ messages in thread
From: John Stultz @ 2012-04-27 22:49 UTC (permalink / raw)
  To: Richard Cochran; +Cc: linux-kernel, Thomas Gleixner

On 04/27/2012 01:12 AM, Richard Cochran wrote:
> Just in time for this year's leap second, this patch series presents a
> solution for the UTC leap second mess.
>
> Of course, the POSIX UTC system is broken by design, and the Linux
> kernel cannot fix that. However, what we can do is correctly execute
> leap seconds and always report the time variables (UTC time, TAI
> offset, and leap second status) with consistency.
>
> The basic idea is to keep the internal time using a continuous
> timescale and to convert to UTC by testing the time value against the
> current threshold and adding the appropriate offset. Since the UTC
> time and the leap second status is provided on demand, this eliminates
> the need to set a timer or to constantly monitor for leap seconds, as
> was done up until now.
>
> Patches 2 and 3 are just trivial stuff I saw along the way.
The trivial cleanups I went ahead and took, but I think the rest still 
needs some work.

> * Benefits
>    - Fixes the buggy, inconsistent time reporting surrounding a leap
>      second event.
Just to clarify this, so we've got the right scope on the problem, 
you're trying to address the fact that the leap second is not actually 
applied until the tick after the leap second, correct?

Where basically you can see small offsets like:

23:59:59.999999999
00:00:00.000500000
00:00:00.000800000
[tick]
23:59:59.000900000 (+TIME_OOP)
...
23:59:59.999999999 (+TIME_OOP)
00:00:00.000800000 (+TIME_OOP)
[tick]
00:00:00.000900000
00:00:00.006000000

And you're proposing we fix this by changing the leap-second processing 
from only being done at tick-time  (which isn't exactly on the second 
boundary)to being calculated for each getnstimeofday, correct?

>    - Opens the possibility of offering a rational time source to user
>      space. [ Trivial to offer clock_gettime(CLOCK_TAI) for example. ]

CLOCK_TAI is something I'd like to have.  My only concern is how we 
manage it along with possible smeared-leap-seconds ala:
http://googleblog.blogspot.com/2011/09/time-technology-and-leaping-seconds.html

( I shudder at the idea of managing two separate frequency corrections 
for different time domains).

> * Performance Impacts
> ** con
>     - Small extra cost when reading the time (one integer addition plus
>       one integer test).
This may not be so small when it comes to folks who are very concerned 
about the clock_gettime hotpath.
Further, the correction will be needed to be made in the vsyscall paths, 
which isn't done with your current patchset (causing userland to see 
different time values then what kernel space calculates).

One possible thing to consider? Since the TIME_OOP flag is only visible 
via the adjtimex() interface, maybe it alone should have the extra 
overhead of the conditional? I'm not excited about the gettimeofday 
field returned by adjtimex not matching what gettimeofday actually 
provides for that single-tick interval, but maybe its a reasonable 
middle ground?

> ** pro
>     - Removes repetitive, periodic division (secs % 86400 == 0) the whole
>       day long preceding a leap second.
>     - Cost of maintaining leap second status goes to the user of the
>       NTP adjtimex() interface, if any.
Not sure I follow this last point. How are we pushing this maintenance 
to adjtimex() users?


> * Todo
>    - The function __current_kernel_time accesses the time variables
>      without taking the lock. I can't figure that out.
>
There's a few cases where we want the current second value when we 
already hold the xtime_lock, or we might possibly hold the xtime_lock. 
Its an special internal interface for special users (update_vsyscall, 
for example).

thanks
-john


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC V1 4/5] timekeeping: Offer an interface to manipulate leap seconds.
  2012-04-27  8:12 ` [PATCH RFC V1 4/5] timekeeping: Offer an interface to manipulate leap seconds Richard Cochran
@ 2012-04-27 23:08   ` John Stultz
  2012-04-28  8:47     ` Richard Cochran
  2012-05-05 10:17     ` Richard Cochran
  0 siblings, 2 replies; 28+ messages in thread
From: John Stultz @ 2012-04-27 23:08 UTC (permalink / raw)
  To: Richard Cochran; +Cc: linux-kernel, Thomas Gleixner

On 04/27/2012 01:12 AM, Richard Cochran wrote:
> This patch adds a new internal interface to be used by the NTP code in
> order to set the next leap second event. Also, it adds a kernel command
> line option that can be used to dial the TAI - UTC offset at boot.
>
> Signed-off-by: Richard Cochran<richardcochran@gmail.com>
> ---
>   kernel/time/leap-seconds.h |   23 ++++++
>   kernel/time/timekeeping.c  |  175 ++++++++++++++++++++++++++++++++++++++++++++
>   2 files changed, 198 insertions(+), 0 deletions(-)
>   create mode 100644 kernel/time/leap-seconds.h
[snip]
>   /* Structure holding internal timekeeping values. */
>   struct timekeeper {
>   	/* Current clocksource used for timekeeping. */
> @@ -50,6 +53,16 @@ struct timekeeper {
>
>   	/* The current time */
>   	struct timespec xtime;
> +	/* The Kernel Time Scale (KTS) value of the next leap second. */
> +	time_t next_leapsecond;

I'm not a big fan of this new Kernel Time Scale. I don't think we really 
need a new time domain, and it only muddles things.
Why not have next_leapsecond be specified in CLOCK_MONOTONIC time?

> +	/* The current difference KTS - UTC. */
> +	int kts_utc_offset;
> +	/* The current difference TAI - KTS. */
> +	int tai_kts_offset;
Again, get rid of KTS and we can simplify these as:
CLOCK_TAI = CLOCK_MONOTONIC + mono_to_tai;
UTC/CLOCK_REALTIME = CLOCK_TAI + utc_offset;

This basically requires changing the timekeeping core from keeping track 
of CLOCK_REALTIME as its time base, and instead having it use 
CLOCK_MONOTONIC. This actually is something I proposed a long time ago, 
but was deferred because folks were concerned that the extra addition 
would slow gettimeofday() down.  However, that was back when everything 
internally used jiffies. These days ktime_get() is probably called 
internally more frequently, and so optimizing for CLOCK_MONOTONIC makes 
sense.



> +#ifdef CONFIG_DELETE_LEAP_SECONDS
> +	/* Remembers whether to insert or to delete. */
> +	int insert_leapsecond;
> +#endif

I'm not a big fan of this additional config option.  The maintenance 
burden for the extra condition is probably not worth the code size 
trade-off.  Or it needs way more justification.

>   	/*
>   	 * wall_to_monotonic is what we need to add to xtime (or xtime corrected
>   	 * for sub jiffie times) to get to monotonic time.  Monotonic is pegged
> @@ -87,6 +100,30 @@ __cacheline_aligned_in_smp DEFINE_SEQLOCK(xtime_lock);
>   int __read_mostly timekeeping_suspended;
>
>
> +static int __init tai_offset_setup(char *str)
> +{
> +	get_option(&str,&timekeeper.kts_utc_offset);
> +	return 1;
> +}
> +__setup("tai_offset=", tai_offset_setup);
> +
Oooof.. Yuck.  I really don't like the boot time tai_offset argument. 
What sysadmin wants to change their grub settings after a leapsecond so 
that the next reboot its set properly?  I'd suggest tai_offset be set to 
zero until userland updates it at boot time.  CLOCK_TAI is not 
CLOCK_MONOTONIC, and it can jump around if the user calls 
settimeofday(), so I don't see a major reason for it to be initialized 
via a boot argument.  Its true that right now there's no userland 
infrastructure to set it (other then ntp, but I've never verified it 
actually gets set), but there also aren't any users consuming it either.

thanks
-john


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC V1 0/5] Rationalize time keeping
  2012-04-27 22:49 ` [PATCH RFC V1 0/5] Rationalize time keeping John Stultz
@ 2012-04-28  8:04   ` Richard Cochran
  2012-04-30 20:56     ` John Stultz
  2012-05-03 18:21   ` Richard Cochran
  1 sibling, 1 reply; 28+ messages in thread
From: Richard Cochran @ 2012-04-28  8:04 UTC (permalink / raw)
  To: John Stultz; +Cc: linux-kernel, Thomas Gleixner

On Fri, Apr 27, 2012 at 03:49:51PM -0700, John Stultz wrote:
> On 04/27/2012 01:12 AM, Richard Cochran wrote:
> >* Benefits
> >   - Fixes the buggy, inconsistent time reporting surrounding a leap
> >     second event.
> Just to clarify this, so we've got the right scope on the problem,
> you're trying to address the fact that the leap second is not
> actually applied until the tick after the leap second, correct?

That is one problem, yes.
 
> Where basically you can see small offsets like:

I can synchronize over the network to under 100 nanoseconds, so to me,
one second is a large offset.

> And you're proposing we fix this by changing the leap-second
> processing from only being done at tick-time  (which isn't exactly
> on the second boundary)to being calculated for each getnstimeofday,
> correct?

Yes. We prodive UTC time on demand whenever an application (or kernel
thread) asks for it.

> >   - Opens the possibility of offering a rational time source to user
> >     space. [ Trivial to offer clock_gettime(CLOCK_TAI) for example. ]
> 
> CLOCK_TAI is something I'd like to have.

Me, too.

> My only concern is how we
> manage it along with possible smeared-leap-seconds ala:
> http://googleblog.blogspot.com/2011/09/time-technology-and-leaping-seconds.html
> 
> ( I shudder at the idea of managing two separate frequency
> corrections for different time domains).

Are you planning to implement that? This approach is by no means
universally accepted.

In my view, what Google is doing is hack (albeit a sensible for
business applications). For test and measurement or scientific
applications, it does not make sense to introduce artifical frequency
errors in this way.

Another variant of this idea: http://www.cl.cam.ac.uk/~mgk25/time/utc-sls/

Here is a nice quote from that page:

   All other objections to UTC-SLS that I heard were not directed
   against its specific design choices, but against the (very well
   established) practice of using UTC at all in the applications that
   this proposal targets:

   * Some people argue that operating system interfaces, such as the
     POSIX "seconds since the epoch" scale used in time_t APIs, should
     be changed from being an encoding of UTC to being an encoding of
     the leap-second free TAI timescale.

   * Some people want to go even further and abandon UTC and leap
     seconds entirely, detach all civilian time zones from the
     rotation of Earth, and redefine them purely based on atomic time.

   While these people are usually happy to agree that UTC-SLS is a
   sensible engineering solution as long as UTC remains the main time
   basis of distributed computing, they argue that this is just a
   workaround that will be obsolete once their grand vision of giving
   up UTC entirely has become true, and that it is therefore just an
   unwelcome distraction from their ultimate goal.

Until the whole world agrees to this "work around" I think we should
stick to current standards. If and when this practice becomes
standardized (I'm not holding my breath), then we could simply drop
the internal difference between the kernel time scale and UTC, and
steer out the leap second synchronously with the rest of the world.

> >* Performance Impacts
> >** con
> >    - Small extra cost when reading the time (one integer addition plus
> >      one integer test).
> This may not be so small when it comes to folks who are very
> concerned about the clock_gettime hotpath.

If you would support the option to only insert leap seconds, then the
cost is one integer addition and one integer test.

Also, once we have a rational time interface (like CLOCK_TAI), then
time sensitive will want to use that instead anyhow.

> Further, the correction will be needed to be made in the vsyscall
> paths, which isn't done with your current patchset (causing userland
> to see different time values then what kernel space calculates).

Do you mean __current_kernel_time? What did I miss?

> One possible thing to consider? Since the TIME_OOP flag is only
> visible via the adjtimex() interface, maybe it alone should have the
> extra overhead of the conditional?

This would mean that you would have to do the conditional somehow
backwards in order to provide TAI time values. To me, the logical way
is to keep a continuous time scale, and then compute UTC from it.

> I'm not excited about the
> gettimeofday field returned by adjtimex not matching what
> gettimeofday actually provides for that single-tick interval, but
> maybe its a reasonable middle ground?

Not sure what you mean, but to me it is not acceptable to deliver
inconsistent time values to userspace!

> >** pro
> >    - Removes repetitive, periodic division (secs % 86400 == 0) the whole
> >      day long preceding a leap second.
> >    - Cost of maintaining leap second status goes to the user of the
> >      NTP adjtimex() interface, if any.
> Not sure I follow this last point. How are we pushing this
> maintenance to adjtimex() users?

Only adjtimex calls timekeeper_gettod_status, where the leap second is
calculated, outside of timekeeper.lock, on the NTP user space's kernel
time.

In current Linux, the modulus is done in update_wall_time and
logarithmic_accumulation, on kernel time.

> >* Todo
> >   - The function __current_kernel_time accesses the time variables
> >     without taking the lock. I can't figure that out.
> >
> There's a few cases where we want the current second value when we
> already hold the xtime_lock, or we might possibly hold the
> xtime_lock. Its an special internal interface for special users
> (update_vsyscall, for example).

What about kdb_summary?

Thanks,
Richard

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC V1 4/5] timekeeping: Offer an interface to manipulate leap seconds.
  2012-04-27 23:08   ` John Stultz
@ 2012-04-28  8:47     ` Richard Cochran
  2012-05-05 10:17     ` Richard Cochran
  1 sibling, 0 replies; 28+ messages in thread
From: Richard Cochran @ 2012-04-28  8:47 UTC (permalink / raw)
  To: John Stultz; +Cc: linux-kernel, Thomas Gleixner

On Fri, Apr 27, 2012 at 04:08:03PM -0700, John Stultz wrote:
> On 04/27/2012 01:12 AM, Richard Cochran wrote:
> >+	/* The Kernel Time Scale (KTS) value of the next leap second. */
> >+	time_t next_leapsecond;
> 
> I'm not a big fan of this new Kernel Time Scale. I don't think we
> really need a new time domain, and it only muddles things.
> Why not have next_leapsecond be specified in CLOCK_MONOTONIC time?

I agree that the monotonic could be used as the basis, but I didn't
want to muck around with it just yet. This is only a RFC.
 
> >+	/* The current difference KTS - UTC. */
> >+	int kts_utc_offset;
> >+	/* The current difference TAI - KTS. */
> >+	int tai_kts_offset;
> Again, get rid of KTS and we can simplify these as:
> CLOCK_TAI = CLOCK_MONOTONIC + mono_to_tai;
> UTC/CLOCK_REALTIME = CLOCK_TAI + utc_offset;
> 
> This basically requires changing the timekeeping core from keeping
> track of CLOCK_REALTIME as its time base, and instead having it use
> CLOCK_MONOTONIC. This actually is something I proposed a long time
> ago, but was deferred because folks were concerned that the extra
> addition would slow gettimeofday() down.  However, that was back
> when everything internally used jiffies. These days ktime_get() is
> probably called internally more frequently, and so optimizing for
> CLOCK_MONOTONIC makes sense.

Yes, that is really the basic idea.

> >+#ifdef CONFIG_DELETE_LEAP_SECONDS
> >+	/* Remembers whether to insert or to delete. */
> >+	int insert_leapsecond;
> >+#endif
> 
> I'm not a big fan of this additional config option.  The maintenance
> burden for the extra condition is probably not worth the code size
> trade-off.  Or it needs way more justification.

Well, if people are wanting a fast gettimeofday, then they will want
this, too.

Furthermore, the extra code for deletion is dead code. It has never
been put to the test. It never, ever, gets called, and never will.
(Have you ever tested the current code?  Does it do the deletion
correctly?) Earth is decelerating. It is a scientific fact, almost as
certain as global warming.

[ Can't resist this quote from: "UTC is doomed" ]

  http://www.ucolick.org/~sla/leapsecs/HTMLutcdoomed.html

  If global warming causes the ice caps to melt then the deceleration
  will be somewhat larger.  If melting ice caps shut down the Gulf
  Stream and cause a new ice age then the deceleration could be
  somewhat smaller.  If a supervolcano erupts or asteroid strikes, all
  bets are off.

At first I had planned to not implement deletion at all, but I added
the option just to appease anyone who might protest, "but it is
standardized! NTP can do this!" So, if no option, then I would rather
just not offer deletion at all.

> >+static int __init tai_offset_setup(char *str)
> >+{
> >+	get_option(&str,&timekeeper.kts_utc_offset);
> >+	return 1;
> >+}
> >+__setup("tai_offset=", tai_offset_setup);
> >+
> Oooof.. Yuck.  I really don't like the boot time tai_offset
> argument. What sysadmin wants to change their grub settings after a
> leapsecond so that the next reboot its set properly?

One who cares about telling correct time?

Grub is not the only bootloader, but even so, if I admin a 24x7
critical system, and I have to do a planned reboot, then I surely will
want to update the grub.conf before rebooting. Or it would be enough
to check it once every six months according to a routine maintainance
schedule.

> I'd suggest tai_offset be set to zero until userland updates it at
> boot time.  CLOCK_TAI is not CLOCK_MONOTONIC, and it can jump around
> if the user calls settimeofday(), so I don't see a major reason for
> it to be initialized via a boot argument.

The reason is that it gives careful admins the possibility to have
time based services immediately working after boot. It should not be a
"must" to wait for the network. Starting with zero mandates that the
network is always present, but I think it smarter to allow for an
outage.

> Its true that right now there's no userland infrastructure to set it
> (other then ntp, but I've never verified it actually gets set), but
> there also aren't any users consuming it either.

Later on, there will be users.

I think it quite reasonable to expect that (once we offer a rational
time base) time sensitive systems will store the offset automatically.
There are lots of ways to do this, like bootloader config files, NVRAM
in an RTC, flash parameter sections, and so on.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC V1 0/5] Rationalize time keeping
  2012-04-28  8:04   ` Richard Cochran
@ 2012-04-30 20:56     ` John Stultz
  2012-05-01  7:17       ` Richard Cochran
  2012-05-03  7:02       ` Richard Cochran
  0 siblings, 2 replies; 28+ messages in thread
From: John Stultz @ 2012-04-30 20:56 UTC (permalink / raw)
  To: Richard Cochran; +Cc: linux-kernel, Thomas Gleixner

On 04/28/2012 01:04 AM, Richard Cochran wrote:
> On Fri, Apr 27, 2012 at 03:49:51PM -0700, John Stultz wrote:
>> On 04/27/2012 01:12 AM, Richard Cochran wrote:
>>> * Benefits
>>>    - Fixes the buggy, inconsistent time reporting surrounding a leap
>>>      second event.
>> Just to clarify this, so we've got the right scope on the problem,
>> you're trying to address the fact that the leap second is not
>> actually applied until the tick after the leap second, correct?
> That is one problem, yes.
>
>> Where basically you can see small offsets like:
> I can synchronize over the network to under 100 nanoseconds, so to me,
> one second is a large offset.

Well, the leap-offset is a second, but when it arrives is only 
tick-accurate. :)


>> My only concern is how we
>> manage it along with possible smeared-leap-seconds ala:
>> http://googleblog.blogspot.com/2011/09/time-technology-and-leaping-seconds.html
>>
>> ( I shudder at the idea of managing two separate frequency
>> corrections for different time domains).
> Are you planning to implement that? This approach is by no means
> universally accepted.

No no.  I have no plans there.

> In my view, what Google is doing is hack (albeit a sensible for
> business applications). For test and measurement or scientific
> applications, it does not make sense to introduce artifical frequency
> errors in this way.

True, although even if it is a hack, google *is* using it.  My concern 
is that if CLOCK_REALTIME  is smeared to avoid a leap second jump, in 
that environment we cannot also accurate provide a correct CLOCK_TAI.  
So far that's not been a problem, because CLOCK_TAI isn't a clockid we 
yet support.  But the expectations bar always rises, so I suspect once 
we have a CLOCK_TAI, someone will want us to handle smeared-leap seconds 
without affecting CLOCK_TAI's correctness.


> Another variant of this idea: http://www.cl.cam.ac.uk/~mgk25/time/utc-sls/
>
> Here is a nice quote from that page:
>
>     All other objections to UTC-SLS that I heard were not directed
>     against its specific design choices, but against the (very well
>     established) practice of using UTC at all in the applications that
>     this proposal targets:
>
>     * Some people argue that operating system interfaces, such as the
>       POSIX "seconds since the epoch" scale used in time_t APIs, should
>       be changed from being an encoding of UTC to being an encoding of
>       the leap-second free TAI timescale.
>
>     * Some people want to go even further and abandon UTC and leap
>       seconds entirely, detach all civilian time zones from the
>       rotation of Earth, and redefine them purely based on atomic time.
>
>     While these people are usually happy to agree that UTC-SLS is a
>     sensible engineering solution as long as UTC remains the main time
>     basis of distributed computing, they argue that this is just a
>     workaround that will be obsolete once their grand vision of giving
>     up UTC entirely has become true, and that it is therefore just an
>     unwelcome distraction from their ultimate goal.
I think this last point is very telling. Neither of the above options 
are really viable in my mind, as I don't see any real consensus to 
giving up UTC.  What is in-practice is actually way more important then 
where folks wish things would go.

> Until the whole world agrees to this "work around" I think we should
> stick to current standards. If and when this practice becomes
> standardized (I'm not holding my breath), then we could simply drop
> the internal difference between the kernel time scale and UTC, and
> steer out the leap second synchronously with the rest of the world.
Well, I think that Google shows some folks are starting to use 
workarounds like smeared-leap-seconds/UTC-SLS. So its something we 
should watch carefully and expect more folks to follow.  Its true that 
you don't want to mix UTC-SLS and standard UTC time domains, but its 
likely this will be a site-specific configuration.

So its a concern when a correct CLOCK_TAI would be incompatible on 
systems using these hacks/workarounds.


>>> * Performance Impacts
>>> ** con
>>>     - Small extra cost when reading the time (one integer addition plus
>>>       one integer test).
>> This may not be so small when it comes to folks who are very
>> concerned about the clock_gettime hotpath.
> If you would support the option to only insert leap seconds, then the
> cost is one integer addition and one integer test.

*Any* extra work is a big deal to folks who are sensitive to 
clock_gettime performance.
That said, I don't see why its more complicated to also handle leap removal?

> Also, once we have a rational time interface (like CLOCK_TAI), then
> time sensitive will want to use that instead anyhow.
Well, performance sensitive and correctness sensitive are two different 
things. :) I think CLOCK_TAI is much cleaner for things, but at the same 
time, the world "thinks" in UTC, and converting between them isn't 
always trivial (very similar to the timezone presentation layer, which 
isn't fun). So I'd temper any hopes of mass conversion. :)

>> Further, the correction will be needed to be made in the vsyscall
>> paths, which isn't done with your current patchset (causing userland
>> to see different time values then what kernel space calculates).
> Do you mean __current_kernel_time? What did I miss?
No. So, on architectures that support vsyscalls/vdso (x86_64, powerpc, 
ia64, and maybe a few others) getnstimeofday() is really only an 
internal interface for in-kernel access. Userland uses the vsyscall/vdso 
interface to be able to read the time completely from userland context 
(with no syscall overhead). Since this is done in different ways for 
each architecture, you need to export the proper information out via 
update_vsyscall() and also update the arch-specific vsyscall 
gettimeofday paths (which is non-trivial, as some arches are implemented 
in asm, etc - my sympathies here, its a pain).


>> One possible thing to consider? Since the TIME_OOP flag is only
>> visible via the adjtimex() interface, maybe it alone should have the
>> extra overhead of the conditional?
> This would mean that you would have to do the conditional somehow
> backwards in order to provide TAI time values. To me, the logical way
> is to keep a continuous time scale, and then compute UTC from it.

? Not sure I'm following you here.

What I'm recommending, is even if you rework the kernel so that it 
constructs time as follows:

CLOCK_TAI = CLOCK_MONOTONIC + monotonic_to_tai
CLOCK_REALTIME = CLOCK_TAI + tai_to_utc

The  adjustment made to tai_to_utc by the leap second would still be 
changed at tick time, but the logic to avoid the sub-tick inconsistency 
at the second edge would be only made to the adjtimex() interface. Thus 
for folks who really care about leap seconds, who already need to use 
adjtimex in order to detect the TIME_OOP flag would get the very correct 
time value, but the performance sensitive users of clock_gettime 
wouldn't be affected.

>> I'm not excited about the
>> gettimeofday field returned by adjtimex not matching what
>> gettimeofday actually provides for that single-tick interval, but
>> maybe its a reasonable middle ground?
> Not sure what you mean, but to me it is not acceptable to deliver
> inconsistent time values to userspace!
For users of clock_gettime/gettimeofday, a leapsecond is an 
inconsistency. Neither interfaces provide a way to detect that the 
TIME_OOP flag is set and its not 23:59:59 again, but 23:59:60 (which 
can't be represented by a time_t).  Thus even if the behavior was 
perfect, and the leapsecond landed at exactly the second edge, it is 
still a time hiccup to most applications anyway.

Thus, most of userland doesn't really care if the hiccup happens up to a 
tick after the second's edge. They don't expect it anyway.  So they 
really don't want a constant performance drop in order for the hiccup to 
be more "correct" when it happens.  :)

That's why I'm suggesting that you consider starting by modifying the 
adjtimex() interface. Any application that actually cares about 
leapseconds should be using adjtimex() since its the only interface that 
allows you to realize that's whats happening. Its not a performance 
optimized path, and so its a fine candidate for being slow-but-correct.

My only concern there is that it would cause problems when mixing 
adjtimex() calls with clock_gettime() calls, because you could have a 
tick-length of time when they report different time values. But this may 
be acceptable.


>>> ** pro
>>>     - Removes repetitive, periodic division (secs % 86400 == 0) the whole
>>>       day long preceding a leap second.
>>>     - Cost of maintaining leap second status goes to the user of the
>>>       NTP adjtimex() interface, if any.
>> Not sure I follow this last point. How are we pushing this
>> maintenance to adjtimex() users?
> Only adjtimex calls timekeeper_gettod_status, where the leap second is
> calculated, outside of timekeeper.lock, on the NTP user space's kernel
> time.
So its not really a cost-of-maintaining, but a cost-of-calculation.  We 
only calculate the next leap second when its provided via adjtimex 
rather then doing the check periodically in the kernel.

> In current Linux, the modulus is done in update_wall_time and
> logarithmic_accumulation, on kernel time.
>
>>> * Todo
>>>    - The function __current_kernel_time accesses the time variables
>>>      without taking the lock. I can't figure that out.
>>>
>> There's a few cases where we want the current second value when we
>> already hold the xtime_lock, or we might possibly hold the
>> xtime_lock. Its an special internal interface for special users
>> (update_vsyscall, for example).
> What about kdb_summary?
I don't know the kdb patch especially well, but I suspect kdb_summary 
might be triggered at unexpected times if you're trying to debug a 
remote kernel. Thus we want to be able to get the time_t value (which 
can be read safely without a lock on most systems)  without trying to 
grab a lock that might be held. This avoids deadlock  should kdb be 
blocking the lock-holder from running.

thanks
-john


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC V1 0/5] Rationalize time keeping
  2012-04-30 20:56     ` John Stultz
@ 2012-05-01  7:17       ` Richard Cochran
  2012-05-01  8:01         ` John Stultz
  2012-05-03  7:02       ` Richard Cochran
  1 sibling, 1 reply; 28+ messages in thread
From: Richard Cochran @ 2012-05-01  7:17 UTC (permalink / raw)
  To: John Stultz; +Cc: linux-kernel, Thomas Gleixner

On Mon, Apr 30, 2012 at 01:56:16PM -0700, John Stultz wrote:
> On 04/28/2012 01:04 AM, Richard Cochran wrote:
> >I can synchronize over the network to under 100 nanoseconds, so to me,
> >one second is a large offset.
> 
> Well, the leap-offset is a second, but when it arrives is only
> tick-accurate. :)

It would be fine to change the leap second status on the tick, but
then you must also change the time then, and only then, as well. I
know Linux moved away from this long ago, and the new way is better,
but still what the kernel does today is just plain wrong.

But there is a fix. I just offered it.

> True, although even if it is a hack, google *is* using it.  My
> concern is that if CLOCK_REALTIME  is smeared to avoid a leap second
> jump, in that environment we cannot also accurate provide a correct
> CLOCK_TAI.  So far that's not been a problem, because CLOCK_TAI
> isn't a clockid we yet support.  But the expectations bar always
> rises, so I suspect once we have a CLOCK_TAI, someone will want us
> to handle smeared-leap seconds without affecting CLOCK_TAI's
> correctness.

It is either/or, but not both simultaneously.

My proposal does not prevent the smear method in any way. People who
want the smear just never schedule a leap second. People who want the
frequency constant just use the TAI clock interface for the important
work.

We really don't have to support both ways at once.

> >    While these people are usually happy to agree that UTC-SLS is a
> >    sensible engineering solution as long as UTC remains the main time
> >    basis of distributed computing, they argue that this is just a
> >    workaround that will be obsolete once their grand vision of giving
> >    up UTC entirely has become true, and that it is therefore just an
> >    unwelcome distraction from their ultimate goal.
> I think this last point is very telling. Neither of the above
> options are really viable in my mind, as I don't see any real
> consensus to giving up UTC.  What is in-practice is actually way
> more important then where folks wish things would go.

We don't need to give up UTC. We can offer correct UTC and a new,
rational TAI clock and get the leap seconds right, all at the same
time, wow.

> Well, I think that Google shows some folks are starting to use
> workarounds like smeared-leap-seconds/UTC-SLS. So its something we
> should watch carefully and expect more folks to follow.

... but don't hold your breath ...

> Its true
> that you don't want to mix UTC-SLS and standard UTC time domains,
> but its likely this will be a site-specific configuration.
> 
> So its a concern when a correct CLOCK_TAI would be incompatible on
> systems using these hacks/workarounds.

I don't see any problem here.

[ If you do the smear, you can simply just ignore the TAI clock at let
  it be wrong (just like how we handle leap seconds today BTW) or jump
  it at the new epoch. ]

> *Any* extra work is a big deal to folks who are sensitive to
> clock_gettime performance.
> That said, I don't see why its more complicated to also handle leap removal?

It makes your kernel image larger with no added benefit.

> Well, performance sensitive and correctness sensitive are two
> different things. :) I think CLOCK_TAI is much cleaner for things,
> but at the same time, the world "thinks" in UTC, and converting
> between them isn't always trivial (very similar to the timezone
> presentation layer, which isn't fun). So I'd temper any hopes of
> mass conversion. :)

I know, it is a highly politial issue. Converting to UTC is best
handled via timezones. Leap seconds are really just the same as
daylight savings times. Ideally, the kernel would provide only
continuous time, and libc would do the rest.

But I don't expect to change the world, only to fix the kernel.

> Since this is done in
> different ways for each architecture, you need to export the proper
> information out via update_vsyscall() and also update the
> arch-specific vsyscall gettimeofday paths (which is non-trivial, as
> some arches are implemented in asm, etc - my sympathies here, its a
> pain).

Okay, I'll get more familar with that.

> For users of clock_gettime/gettimeofday, a leapsecond is an
> inconsistency. Neither interfaces provide a way to detect that the
> TIME_OOP flag is set and its not 23:59:59 again, but 23:59:60 (which
> can't be represented by a time_t).  Thus even if the behavior was
> perfect, and the leapsecond landed at exactly the second edge, it is
> still a time hiccup to most applications anyway.
> 
> Thus, most of userland doesn't really care if the hiccup happens up
> to a tick after the second's edge. They don't expect it anyway.  So
> they really don't want a constant performance drop in order for the
> hiccup to be more "correct" when it happens.  :)

I don't buy that argument. Repeating a time_t value leads to ambiguous
UTC times, put it is posixly correct. The values are usable together
with difftime(3). Having the time_t go forward and then back again is
certainly worse.

If we leave everything as is, then the user is left with two choices
for data collection applications.

1. Turn off your data system on the night of a leap second.

2. Record data even during a leap second, but post process the files
   to fix up all the uglies.

Either way, the kernel has failed us.

> That's why I'm suggesting that you consider starting by modifying
> the adjtimex() interface. Any application that actually cares about
> leapseconds should be using adjtimex() since its the only interface
> that allows you to realize that's whats happening. Its not a
> performance optimized path, and so its a fine candidate for being
> slow-but-correct.
> 
> My only concern there is that it would cause problems when mixing
> adjtimex() calls with clock_gettime() calls, because you could have
> a tick-length of time when they report different time values. But
> this may be acceptable.

(Introduce yet another kernel bug? No, thanks ;)

Thanks,
Richard

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC V1 0/5] Rationalize time keeping
  2012-05-01  7:17       ` Richard Cochran
@ 2012-05-01  8:01         ` John Stultz
  2012-05-01 18:43           ` Richard Cochran
  0 siblings, 1 reply; 28+ messages in thread
From: John Stultz @ 2012-05-01  8:01 UTC (permalink / raw)
  To: Richard Cochran; +Cc: linux-kernel, Thomas Gleixner

On 05/01/2012 12:17 AM, Richard Cochran wrote:
> On Mon, Apr 30, 2012 at 01:56:16PM -0700, John Stultz wrote:
>> On 04/28/2012 01:04 AM, Richard Cochran wrote:
>>> I can synchronize over the network to under 100 nanoseconds, so to me,
>>> one second is a large offset.
>> Well, the leap-offset is a second, but when it arrives is only
>> tick-accurate. :)
> It would be fine to change the leap second status on the tick, but
> then you must also change the time then, and only then, as well. I
> know Linux moved away from this long ago, and the new way is better,
> but still what the kernel does today is just plain wrong.
>
> But there is a fix. I just offered it.
Maybe could you clarify a bit more here about how the kernel is plain 
wrong?  Try a clear problem description including examples (before and 
after you changes)? I'm worried we're talking about different problems.


>> True, although even if it is a hack, google *is* using it.  My
>> concern is that if CLOCK_REALTIME  is smeared to avoid a leap second
>> jump, in that environment we cannot also accurate provide a correct
>> CLOCK_TAI.  So far that's not been a problem, because CLOCK_TAI
>> isn't a clockid we yet support.  But the expectations bar always
>> rises, so I suspect once we have a CLOCK_TAI, someone will want us
>> to handle smeared-leap seconds without affecting CLOCK_TAI's
>> correctness.
> It is either/or, but not both simultaneously.
>
> My proposal does not prevent the smear method in any way. People who
> want the smear just never schedule a leap second. People who want the
> frequency constant just use the TAI clock interface for the important
> work.
>
> We really don't have to support both ways at once.
If both are adopted (separately) by enough folks, we will have to 
support both ways at once. That's why I'm trying to suggest we think a 
bit about how that might be possible.


>
>> *Any* extra work is a big deal to folks who are sensitive to
>> clock_gettime performance.
>> That said, I don't see why its more complicated to also handle leap removal?
> It makes your kernel image larger with no added benefit.
Again,  this should be justified with numbers (try size vmlinux or size 
ntp.o to generate these). More config options makes code harder to 
maintain & test, so I'm pushing back a bit here.  I also suspect keeping 
both can be done with very little extra code.


>> For users of clock_gettime/gettimeofday, a leapsecond is an
>> inconsistency. Neither interfaces provide a way to detect that the
>> TIME_OOP flag is set and its not 23:59:59 again, but 23:59:60 (which
>> can't be represented by a time_t).  Thus even if the behavior was
>> perfect, and the leapsecond landed at exactly the second edge, it is
>> still a time hiccup to most applications anyway.
>>
>> Thus, most of userland doesn't really care if the hiccup happens up
>> to a tick after the second's edge. They don't expect it anyway.  So
>> they really don't want a constant performance drop in order for the
>> hiccup to be more "correct" when it happens.  :)
> I don't buy that argument. Repeating a time_t value leads to ambiguous
> UTC times, put it is posixly correct. The values are usable together
> with difftime(3). Having the time_t go forward and then back again is
> certainly worse.
>
> If we leave everything as is, then the user is left with two choices
> for data collection applications.
>
> 1. Turn off your data system on the night of a leap second.
>
> 2. Record data even during a leap second, but post process the files
>     to fix up all the uglies.
>
> Either way, the kernel has failed us.
3. Use adjtimex() and interpret the timespec and time_state return 
together to interpret 23:59:60 properly?

4. Use adjtimex(), and use the timespec  + the time_tai offset value to 
calculate TAI time?

I dunno. Again, I suspect we're thinking about different issues that 
sound very similar. :)


>> That's why I'm suggesting that you consider starting by modifying
>> the adjtimex() interface. Any application that actually cares about
>> leapseconds should be using adjtimex() since its the only interface
>> that allows you to realize that's whats happening. Its not a
>> performance optimized path, and so its a fine candidate for being
>> slow-but-correct.
>>
>> My only concern there is that it would cause problems when mixing
>> adjtimex() calls with clock_gettime() calls, because you could have
>> a tick-length of time when they report different time values. But
>> this may be acceptable.
> (Introduce yet another kernel bug? No, thanks ;)
>
You did just suggest we allow for CLOCK_TAI to be broken for folks want 
to use smeared-leap-seconds.  That sounds like an eventual bug too.  :) 
Regardless, the point of my suggestion is that you're likely going to be 
adding logic to a very hot path, and the resulting benefit has not been 
clearly stated. Expect push back there (not just from me), so have a 
strong argument and show overhead impact with numbers to help your 
case.  Alternatively look for other solutions that don't affect the 
hot-path (like what I suggested w/ adjtimex() above - although that may 
have downsides too).

Do forgive me for prodding you here.  Assuming I understand your goals 
(adding CLOCK_TAI, reworking timekeeping core to keep construct time in 
a more sane way, and improved leapsecond logic), I very much want to see 
them come to be.  I appreciate your focus on solving some of the complex 
and unloved issues here.

I look forward to your next revision!

thanks
-john






^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC V1 0/5] Rationalize time keeping
  2012-05-01  8:01         ` John Stultz
@ 2012-05-01 18:43           ` Richard Cochran
  0 siblings, 0 replies; 28+ messages in thread
From: Richard Cochran @ 2012-05-01 18:43 UTC (permalink / raw)
  To: John Stultz; +Cc: linux-kernel, Thomas Gleixner

On Tue, May 01, 2012 at 01:01:38AM -0700, John Stultz wrote:
> On 05/01/2012 12:17 AM, Richard Cochran wrote:
> >On Mon, Apr 30, 2012 at 01:56:16PM -0700, John Stultz wrote:
> >>Well, the leap-offset is a second, but when it arrives is only
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >>tick-accurate. :)
    ^^^^^^^^^^^^^^

(You said it yourself ...

> >It would be fine to change the leap second status on the tick, but
> >then you must also change the time then, and only then, as well. I
> >know Linux moved away from this long ago, and the new way is better,
> >but still what the kernel does today is just plain wrong.
> >
> >But there is a fix. I just offered it.
> Maybe could you clarify a bit more here about how the kernel is
> plain wrong?  Try a clear problem description including examples
> (before and after you changes)? I'm worried we're talking about
> different problems.

... how the kernel is plain wrong ;)

Here is an example, a preview of what will happen (again) this
summer. This trace is the output of a program that sets the time to
five seconds before the end of the day of a leap second, set the INS
flag, and reads the time in a tight loop, all using adjtimex(). Kernel
is 3.3.0+ and platform is Intel Atom.

Time status
| ID Insert/Delete
| |  TAI offset
| |  |  UTC time_t value
| |  |  |
v v  v  v
1 I- 34 1341100799.000000276
1 I- 34 1341100799.000016480
1 I- 34 1341100799.000021718
        ...
1 I- 34 1341100799.999991361
1 I- 34 1341100799.999995622
1 I- 34 1341100799.999999952
1 I- 34 1341100800.000004212 * What is this?  Seconds and status wrong...
1 I- 34 1341100800.000009590
1 I- 34 1341100800.000014130
1 I- 34 1341100800.000018530
        ...
1 I- 34 1341100800.000087605
1 I- 34 1341100800.000091935
1 I- 34 1341100800.000096265
1 I- 34 1341100800.000100595 * ... still wrong ...
1 I- 34 1341100800.000105065
1 I- 34 1341100800.000109535
1 I- 34 1341100800.000113866
        ...
1 I- 34 1341100800.000227011
1 I- 34 1341100800.000231341
1 I- 34 1341100800.000235671
3 I- 34 1341100799.000276949 * Saved by the tick. (but TAI offset wrong)
3 I- 34 1341100799.000289451
3 I- 34 1341100799.000295807
3 I- 34 1341100799.000303280

Although this example is with adjtimex(), clock_gettime() will show
the same time_t defect. 

> >If we leave everything as is, then the user is left with two choices
> >for data collection applications.
> >
> >1. Turn off your data system on the night of a leap second.
> >
> >2. Record data even during a leap second, but post process the files
> >    to fix up all the uglies.
> >
> >Either way, the kernel has failed us.
> 3. Use adjtimex() and interpret the timespec and time_state return
> together to interpret 23:59:60 properly?
> 
> 4. Use adjtimex(), and use the timespec  + the time_tai offset value
> to calculate TAI time?
> 
> I dunno. Again, I suspect we're thinking about different issues that
> sound very similar. :)

The two options that you suggested won't work without additional
fixing of the uglies (see above).

> You did just suggest we allow for CLOCK_TAI to be broken for folks
> want to use smeared-leap-seconds.  That sounds like an eventual bug
> too.  :) Regardless, the point of my suggestion is that you're
> likely going to be adding logic to a very hot path, and the
> resulting benefit has not been clearly stated.

The benefit is to get correct UTC time_t values from the kernel at all
times, via all interfaces.

I can't understand the argument that a hot path is exempt from having
to be correct. Any programs receiving (really quick) time stamps that
are nonetheless occasionally off by one second will have to program
the checks and corrections themselves. That can hardly be more
efficient than doing the check in the kernel.

Or if user space is just ignoring the issue, then it is going to bite
them one day (unless they just turn off their systems for the leap
second). IMHO, the fact that such bugs can only happen once every six
months (at most) makes them far worse.

> Expect push back
> there (not just from me), so have a strong argument and show
> overhead impact with numbers to help your case.  Alternatively look
> for other solutions that don't affect the hot-path (like what I
> suggested w/ adjtimex() above - although that may have downsides
> too).

Okay, I will gather some numbers on the performace impact.
 
> Do forgive me for prodding you here.  Assuming I understand your
> goals (adding CLOCK_TAI, reworking timekeeping core to keep
> construct time in a more sane way, and improved leapsecond logic), I
> very much want to see them come to be.  I appreciate your focus on
> solving some of the complex and unloved issues here.
> 
> I look forward to your next revision!

Thank you for your encouragement,

Richard

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC V1 0/5] Rationalize time keeping
  2012-04-30 20:56     ` John Stultz
  2012-05-01  7:17       ` Richard Cochran
@ 2012-05-03  7:02       ` Richard Cochran
  2012-05-03 15:48         ` John Stultz
  1 sibling, 1 reply; 28+ messages in thread
From: Richard Cochran @ 2012-05-03  7:02 UTC (permalink / raw)
  To: John Stultz; +Cc: linux-kernel, Thomas Gleixner

On Mon, Apr 30, 2012 at 01:56:16PM -0700, John Stultz wrote:
> No. So, on architectures that support vsyscalls/vdso (x86_64,
> powerpc, ia64, and maybe a few others) getnstimeofday() is really
> only an internal interface for in-kernel access. Userland uses the
> vsyscall/vdso interface to be able to read the time completely from
> userland context (with no syscall overhead). Since this is done in
> different ways for each architecture, you need to export the proper
> information out via update_vsyscall() and also update the
> arch-specific vsyscall gettimeofday paths (which is non-trivial, as
> some arches are implemented in asm, etc - my sympathies here, its a
> pain).

Okay, so now I understand the vDSO page thingy. Help me please to
understand exactly which architectures would need changes for my
proposal.

The only archs exporting time variables/functions through vDSO are
those which define CONFIG_GENERIC_TIME_VSYSCALL=y, namely:

- ia64
- powerpc 64 and 32 bit
- s390    64 and 32 bit
- x86     64 bit only **

  ** But 32 guest running in a 64 host also has time in the vDSO?!?

Did I get that right?

Thanks,
Richard



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC V1 0/5] Rationalize time keeping
  2012-05-03  7:02       ` Richard Cochran
@ 2012-05-03 15:48         ` John Stultz
  0 siblings, 0 replies; 28+ messages in thread
From: John Stultz @ 2012-05-03 15:48 UTC (permalink / raw)
  To: Richard Cochran; +Cc: linux-kernel, Thomas Gleixner

On 05/03/2012 12:02 AM, Richard Cochran wrote:
> On Mon, Apr 30, 2012 at 01:56:16PM -0700, John Stultz wrote:
>> No. So, on architectures that support vsyscalls/vdso (x86_64,
>> powerpc, ia64, and maybe a few others) getnstimeofday() is really
>> only an internal interface for in-kernel access. Userland uses the
>> vsyscall/vdso interface to be able to read the time completely from
>> userland context (with no syscall overhead). Since this is done in
>> different ways for each architecture, you need to export the proper
>> information out via update_vsyscall() and also update the
>> arch-specific vsyscall gettimeofday paths (which is non-trivial, as
>> some arches are implemented in asm, etc - my sympathies here, its a
>> pain).
> Okay, so now I understand the vDSO page thingy. Help me please to
> understand exactly which architectures would need changes for my
> proposal.
>
> The only archs exporting time variables/functions through vDSO are
> those which define CONFIG_GENERIC_TIME_VSYSCALL=y, namely:
>
> - ia64
> - powerpc 64 and 32 bit
> - s390    64 and 32 bit
> - x86     64 bit only **
>
>    ** But 32 guest running in a 64 host also has time in the vDSO?!?
   Yes, but at least on x86 its just one implementation that needs to be 
modified.

>
> Did I get that right?
That looks right to me.

thanks
-john


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC V1 0/5] Rationalize time keeping
  2012-04-27 22:49 ` [PATCH RFC V1 0/5] Rationalize time keeping John Stultz
  2012-04-28  8:04   ` Richard Cochran
@ 2012-05-03 18:21   ` Richard Cochran
  2012-05-03 18:44     ` John Stultz
  1 sibling, 1 reply; 28+ messages in thread
From: Richard Cochran @ 2012-05-03 18:21 UTC (permalink / raw)
  To: John Stultz; +Cc: linux-kernel, Thomas Gleixner

On Fri, Apr 27, 2012 at 03:49:51PM -0700, John Stultz wrote:
> On 04/27/2012 01:12 AM, Richard Cochran wrote:
> >* Performance Impacts
> >** con
> >    - Small extra cost when reading the time (one integer addition plus
> >      one integer test).
> This may not be so small when it comes to folks who are very
> concerned about the clock_gettime hotpath.
> Further, the correction will be needed to be made in the vsyscall
> paths, which isn't done with your current patchset (causing userland
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> to see different time values then what kernel space calculates).

John, now that you clarified the vDSO thing, I am very confused about
this statement of yours. It appears that the vDSO data are updated
when timekeeping_update() in timekeeper.c calls update_vsyscall().

I think the hunk from patch #5, below, does in fact adjust the time
value correctly before it gets handed off to the arch-specific
update_vsyscall() to be copied into the vDSO page. So I'll make the
claim that:

1. We don't have to touch the vsyscall paths for this.
2. This change does not affect vDSO performance at all.

Would you mind taking another look at the patch?

Thanks,
Richard

---
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 7941258..6cedf46 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -212,11 +212,14 @@ static inline s64 timekeeping_get_ns_raw(void)
 /* must hold write on timekeeper.lock */
 static void timekeeping_update(bool clearntp)
 {
+	struct timespec ts;
 	if (clearntp) {
 		timekeeper.ntp_error = 0;
 		ntp_clear();
 	}
-	update_vsyscall(&timekeeper.xtime, &timekeeper.wall_to_monotonic,
+	ts.tv_sec = timekeeper_utc_sec();
+	ts.tv_nsec = timekeeper.xtime.tv_nsec;
+	update_vsyscall(&ts, &timekeeper.wall_to_monotonic,
 			 timekeeper.clock, timekeeper.mult);
 }

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC V1 0/5] Rationalize time keeping
  2012-05-03 18:21   ` Richard Cochran
@ 2012-05-03 18:44     ` John Stultz
  2012-05-03 19:28       ` Richard Cochran
  0 siblings, 1 reply; 28+ messages in thread
From: John Stultz @ 2012-05-03 18:44 UTC (permalink / raw)
  To: Richard Cochran; +Cc: linux-kernel, Thomas Gleixner

On 05/03/2012 11:21 AM, Richard Cochran wrote:
> On Fri, Apr 27, 2012 at 03:49:51PM -0700, John Stultz wrote:
>> On 04/27/2012 01:12 AM, Richard Cochran wrote:
>>> * Performance Impacts
>>> ** con
>>>     - Small extra cost when reading the time (one integer addition plus
>>>       one integer test).
>> This may not be so small when it comes to folks who are very
>> concerned about the clock_gettime hotpath.
>> Further, the correction will be needed to be made in the vsyscall
>> paths, which isn't done with your current patchset (causing userland
>           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> to see different time values then what kernel space calculates).
> John, now that you clarified the vDSO thing, I am very confused about
> this statement of yours. It appears that the vDSO data are updated
> when timekeeping_update() in timekeeper.c calls update_vsyscall().
>
> I think the hunk from patch #5, below, does in fact adjust the time
> value correctly before it gets handed off to the arch-specific
> update_vsyscall() to be copied into the vDSO page. So I'll make the
> claim that:
>
> 1. We don't have to touch the vsyscall paths for this.
> 2. This change does not affect vDSO performance at all.
>
But the changes you make to getnstimeofday() still needs to happen in 
the vDSO code. The vDSO code basically implements getnstimeofday() in 
userland.

If you're code is trying to make it so that the leap-second is properly 
handled at the second boundary instead of the tick boundary, there must 
me some change needed to the vDSO, since the vDSO code is updated only 
each tick. Otherwise how can you enforce the leap after the second 
boundary but before the tick?

thanks
-john


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC V1 0/5] Rationalize time keeping
  2012-05-03 18:44     ` John Stultz
@ 2012-05-03 19:28       ` Richard Cochran
  2012-05-03 19:42         ` John Stultz
  0 siblings, 1 reply; 28+ messages in thread
From: Richard Cochran @ 2012-05-03 19:28 UTC (permalink / raw)
  To: John Stultz; +Cc: linux-kernel, Thomas Gleixner

On Thu, May 03, 2012 at 11:44:45AM -0700, John Stultz wrote:
> But the changes you make to getnstimeofday() still needs to happen
> in the vDSO code. The vDSO code basically implements
> getnstimeofday() in userland.
> 
> If you're code is trying to make it so that the leap-second is
> properly handled at the second boundary instead of the tick
> boundary, there must me some change needed to the vDSO, since the
> vDSO code is updated only each tick. Otherwise how can you enforce
> the leap after the second boundary but before the tick?

Yeah, so the vDSO does the sub-tick interpolation, and this can easily
miss an inserted leap second for a while (just like the current code).

So, this patch series as it stands improves the users of the
traditional syscalls without hurting the superduper vDSO performance
at all. The vDSO leap second time errors are not fixed, but they are
also no worse than today, either.

I am try to say that, even if there is resistance to adding code in
the vDSO path for reasons of performance, that doesn't necessarily
mean that we cannot fix the leap second for the tradition syscall
case.

Richard

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC V1 0/5] Rationalize time keeping
  2012-05-03 19:28       ` Richard Cochran
@ 2012-05-03 19:42         ` John Stultz
  0 siblings, 0 replies; 28+ messages in thread
From: John Stultz @ 2012-05-03 19:42 UTC (permalink / raw)
  To: Richard Cochran; +Cc: linux-kernel, Thomas Gleixner

On 05/03/2012 12:28 PM, Richard Cochran wrote:
> On Thu, May 03, 2012 at 11:44:45AM -0700, John Stultz wrote:
>> But the changes you make to getnstimeofday() still needs to happen
>> in the vDSO code. The vDSO code basically implements
>> getnstimeofday() in userland.
>>
>> If you're code is trying to make it so that the leap-second is
>> properly handled at the second boundary instead of the tick
>> boundary, there must me some change needed to the vDSO, since the
>> vDSO code is updated only each tick. Otherwise how can you enforce
>> the leap after the second boundary but before the tick?
> Yeah, so the vDSO does the sub-tick interpolation, and this can easily
> miss an inserted leap second for a while (just like the current code).
>
> So, this patch series as it stands improves the users of the
> traditional syscalls without hurting the superduper vDSO performance
> at all. The vDSO leap second time errors are not fixed, but they are
> also no worse than today, either.
>
> I am try to say that, even if there is resistance to adding code in
> the vDSO path for reasons of performance, that doesn't necessarily
> mean that we cannot fix the leap second for the tradition syscall
> case.
But this also has the same drawback of only fixing the adjtimex() path, 
in that applications that mix calls to gettimeofday or adjtimex will see 
different behavior in that tick interval.  I'd also try to avoid any 
disparity between the syscall and vdso syscall implementations, as 
they're supposed to be identical.

thanks
-john


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC V1 0/5] Rationalize time keeping
  2012-04-27  8:12 [PATCH RFC V1 0/5] Rationalize time keeping Richard Cochran
                   ` (5 preceding siblings ...)
  2012-04-27 22:49 ` [PATCH RFC V1 0/5] Rationalize time keeping John Stultz
@ 2012-05-03 19:57 ` John Stultz
  2012-05-05  7:34   ` Richard Cochran
  6 siblings, 1 reply; 28+ messages in thread
From: John Stultz @ 2012-05-03 19:57 UTC (permalink / raw)
  To: Richard Cochran; +Cc: linux-kernel, Thomas Gleixner

On 04/27/2012 01:12 AM, Richard Cochran wrote:
> Just in time for this year's leap second, this patch series presents a
> solution for the UTC leap second mess.
>
> Of course, the POSIX UTC system is broken by design, and the Linux
> kernel cannot fix that. However, what we can do is correctly execute
> leap seconds and always report the time variables (UTC time, TAI
> offset, and leap second status) with consistency.
>
> The basic idea is to keep the internal time using a continuous
> timescale and to convert to UTC by testing the time value against the
> current threshold and adding the appropriate offset. Since the UTC
> time and the leap second status is provided on demand, this eliminates
> the need to set a timer or to constantly monitor for leap seconds, as
> was done up until now.

So as I mentioned in my earlier review of this patch set, I'm not as 
excited about the meta-layer you added in utc-tai.h.

So I figured I'd give it a short go and see if a good chunk of what your 
proposing could be done in a simpler way.

Please check out:
http://git.linaro.org/gitweb?p=people/jstultz/linux.git;a=shortlog;h=refs/heads/dev/clktai

The untested patch set there basically pushes TAI time management into 
the timekeeping core, and then exports a CLOCK_TAI clockid.

This patch set *does not* address the tick-granularity delayed 
leap-second processing issue that your patch also addresses. But I feel 
like the basic handling of tai is a little simpler.

Take a look at it and let me know what you think.
thanks
-john


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC V1 0/5] Rationalize time keeping
  2012-05-03 19:57 ` John Stultz
@ 2012-05-05  7:34   ` Richard Cochran
  2012-05-05 19:25     ` John Stultz
  0 siblings, 1 reply; 28+ messages in thread
From: Richard Cochran @ 2012-05-05  7:34 UTC (permalink / raw)
  To: John Stultz; +Cc: linux-kernel, Thomas Gleixner

On Thu, May 03, 2012 at 12:57:40PM -0700, John Stultz wrote:
> On 04/27/2012 01:12 AM, Richard Cochran wrote:
> >
> >The basic idea is to keep the internal time using a continuous
> >timescale and to convert to UTC by testing the time value against the
> >current threshold and adding the appropriate offset. Since the UTC
> >time and the leap second status is provided on demand, this eliminates
> >the need to set a timer or to constantly monitor for leap seconds, as
> >was done up until now.
> 
> So as I mentioned in my earlier review of this patch set, I'm not as
> excited about the meta-layer you added in utc-tai.h.

It really isn't a layer. It is just a pair of static inline helper
functions. The code could just as well be internal to timekeeper.c,
but I put it in its own file so that I could include it into a unit
test program.

> http://git.linaro.org/gitweb?p=people/jstultz/linux.git;a=shortlog;h=refs/heads/dev/clktai
> 
> The untested patch set there basically pushes TAI time management
> into the timekeeping core, and then exports a CLOCK_TAI clockid.
> 
> This patch set *does not* address the tick-granularity delayed
> leap-second processing issue that your patch also addresses. But I
> feel like the basic handling of tai is a little simpler.

So, is this a "won't fix" issue, in your view?

(If so, I'll drop it.)
 
> Take a look at it and let me know what you think.

It looks okay to me, as far as it goes.

At least you offer a clock that doesn't jump around. With your patches
(and the TAI offset fix before), user space programs sensitive to the
leap second uglies would have a reasonable option. Data collection
programs can read and store TAI time stamps, and these can always be
converted into UTC using a table method.

But do you think it will be possible to implement the following in the
same way?

- clock_settime(CLOCK_TAI)
- clock_adjtime(CLOCK_TAI)
- timer_create(CLOCK_TAI)

Thanks,
Richard

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC V1 4/5] timekeeping: Offer an interface to manipulate leap seconds.
  2012-04-27 23:08   ` John Stultz
  2012-04-28  8:47     ` Richard Cochran
@ 2012-05-05 10:17     ` Richard Cochran
  2012-05-07 17:36       ` John Stultz
  1 sibling, 1 reply; 28+ messages in thread
From: Richard Cochran @ 2012-05-05 10:17 UTC (permalink / raw)
  To: John Stultz; +Cc: linux-kernel, Thomas Gleixner

On Fri, Apr 27, 2012 at 04:08:03PM -0700, John Stultz wrote:
> On 04/27/2012 01:12 AM, Richard Cochran wrote:
> 
> >+#ifdef CONFIG_DELETE_LEAP_SECONDS
> >+	/* Remembers whether to insert or to delete. */
> >+	int insert_leapsecond;
> >+#endif
> 
> I'm not a big fan of this additional config option.  The maintenance
> burden for the extra condition is probably not worth the code size
> trade-off.  Or it needs way more justification.

Out of curiosity, I looked at ntp-4.2.6p5 to see if they really
support deleting leap seconds or not. Even though the code appears to
try and support them, I spotted a few bugs. There is a hard coded
assumption that the TAI offset is increasing.

This is just the reason why I suggest not supporting deletions (or
only conditionally for nit pickers). You can code it up, but it will
be in vain, since the code will never be tested or used in practice.
Code that is never executed is a true mainenance burden by definition.

Richard

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC V1 0/5] Rationalize time keeping
  2012-05-05  7:34   ` Richard Cochran
@ 2012-05-05 19:25     ` John Stultz
  0 siblings, 0 replies; 28+ messages in thread
From: John Stultz @ 2012-05-05 19:25 UTC (permalink / raw)
  To: Richard Cochran; +Cc: linux-kernel, Thomas Gleixner

On 05/05/2012 12:34 AM, Richard Cochran wrote:
> On Thu, May 03, 2012 at 12:57:40PM -0700, John Stultz wrote:
>> On 04/27/2012 01:12 AM, Richard Cochran wrote:
>>> The basic idea is to keep the internal time using a continuous
>>> timescale and to convert to UTC by testing the time value against the
>>> current threshold and adding the appropriate offset. Since the UTC
>>> time and the leap second status is provided on demand, this eliminates
>>> the need to set a timer or to constantly monitor for leap seconds, as
>>> was done up until now.
>> So as I mentioned in my earlier review of this patch set, I'm not as
>> excited about the meta-layer you added in utc-tai.h.
> It really isn't a layer. It is just a pair of static inline helper
> functions. The code could just as well be internal to timekeeper.c,
> but I put it in its own file so that I could include it into a unit
> test program.
Sorry, I was imprecise there.  The utc-tai helper functions aren't so 
much of the issue (although isolating them to the timkeeping code would 
be better in my mind), but the new KTS (kernel time scale) stuff, which 
was unintuitive to me.  (I mis-read the utc-tai.h helpers as converting 
from kts -> utc or tai).


>> http://git.linaro.org/gitweb?p=people/jstultz/linux.git;a=shortlog;h=refs/heads/dev/clktai
>>
>> The untested patch set there basically pushes TAI time management
>> into the timekeeping core, and then exports a CLOCK_TAI clockid.
>>
>> This patch set *does not* address the tick-granularity delayed
>> leap-second processing issue that your patch also addresses. But I
>> feel like the basic handling of tai is a little simpler.
> So, is this a "won't fix" issue, in your view?
>
> (If so, I'll drop it.)
Not at all!  I just want to split out the two problems:
1) Providing a TAI clock
2) Fixing the tick-latency issue with leap-second processing.

>> Take a look at it and let me know what you think.
> It looks okay to me, as far as it goes.
>
> At least you offer a clock that doesn't jump around. With your patches
> (and the TAI offset fix before), user space programs sensitive to the
> leap second uglies would have a reasonable option. Data collection
> programs can read and store TAI time stamps, and these can always be
> converted into UTC using a table method.
>
> But do you think it will be possible to implement the following in the
> same way?
>
> - clock_settime(CLOCK_TAI)
> - clock_adjtime(CLOCK_TAI)
So I'm not sure if clock_settime(CLOCK_TAI) and clock_adjtime(CLOCK_TAI) 
makes sense or not, as it would then have side effects on 
CLOCK_REALTIME. For now I think keeping the adjtimex interface to the 
tai offset and and freq corrections, along with 
clock_settimeofday(CLOCK_REALTIME) is the best approach, but we can 
revisit it.
> - timer_create(CLOCK_TAI)
This should be easily doable though.

thanks
-john


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC V1 4/5] timekeeping: Offer an interface to manipulate leap seconds.
  2012-05-05 10:17     ` Richard Cochran
@ 2012-05-07 17:36       ` John Stultz
  0 siblings, 0 replies; 28+ messages in thread
From: John Stultz @ 2012-05-07 17:36 UTC (permalink / raw)
  To: Richard Cochran; +Cc: linux-kernel, Thomas Gleixner

On 05/05/2012 03:17 AM, Richard Cochran wrote:
> On Fri, Apr 27, 2012 at 04:08:03PM -0700, John Stultz wrote:
>> On 04/27/2012 01:12 AM, Richard Cochran wrote:
>>
>>> +#ifdef CONFIG_DELETE_LEAP_SECONDS
>>> +	/* Remembers whether to insert or to delete. */
>>> +	int insert_leapsecond;
>>> +#endif
>> I'm not a big fan of this additional config option.  The maintenance
>> burden for the extra condition is probably not worth the code size
>> trade-off.  Or it needs way more justification.
> Out of curiosity, I looked at ntp-4.2.6p5 to see if they really
> support deleting leap seconds or not. Even though the code appears to
> try and support them, I spotted a few bugs. There is a hard coded
> assumption that the TAI offset is increasing.
>
> This is just the reason why I suggest not supporting deletions (or
> only conditionally for nit pickers). You can code it up, but it will
> be in vain, since the code will never be tested or used in practice.
> Code that is never executed is a true mainenance burden by definition.
>
Well, testing it from a kernel perspective isn't a problem as its easy 
to write up a userland app that exercises the code path. But I agree its 
unlikely to be used in practice.

And you're argument of the added maintenance burden is reasonable. And 
while I don't find it terrible to keep, I'd *much* rather just remove it 
then add a config option and more #ifdefery.  Even so, such a removal 
needs to be an independent patch that can be discussed and argued on its 
own without mixing in other features.

thanks
-john


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC V1 5/5] timekeeping: Use a continuous timescale to tell time.
  2012-04-27  8:12 ` [PATCH RFC V1 5/5] timekeeping: Use a continuous timescale to tell time Richard Cochran
@ 2012-05-28 16:49   ` Richard Cochran
  0 siblings, 0 replies; 28+ messages in thread
From: Richard Cochran @ 2012-05-28 16:49 UTC (permalink / raw)
  To: linux-kernel; +Cc: John Stultz, Thomas Gleixner

On Fri, Apr 27, 2012 at 10:12:44AM +0200, Richard Cochran wrote:
> This patch converts the core time keeping code to use a continuous
> timescale called the Kernel Time Scale (KTS). KTS is based on the
> TAI timescale but can be offset from it by an integral number of seconds.
> Every function that returns UTC time now coverts the seconds by adding
> the current KTS - UTC offset.
> 
> As a result of this change, the NTP leap second code is considerably
> simplified and hopefully more robust.
> 
> Signed-off-by: Richard Cochran <richardcochran@gmail.com>
> ---
>  include/linux/timex.h     |    2 +-
>  kernel/time/ntp.c         |   81 ++++++++++----------------------------------
>  kernel/time/timekeeping.c |   73 ++++++++++++++++++++++++++++++++--------
>  3 files changed, 79 insertions(+), 77 deletions(-)
> 
> diff --git a/include/linux/timex.h b/include/linux/timex.h
> index 99bc88b..9461e6f 100644
> --- a/include/linux/timex.h
> +++ b/include/linux/timex.h
> @@ -252,7 +252,7 @@ extern void ntp_clear(void);
>  /* Returns how long ticks are at present, in ns / 2^NTP_SCALE_SHIFT. */
>  extern u64 ntp_tick_length(void);
>  
> -extern int second_overflow(unsigned long secs);
> +void second_overflow(void);
>  extern int do_adjtimex(struct timex *);
>  extern void hardpps(const struct timespec *, const struct timespec *);
>  
> diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
> index d0a2183..91de2f8 100644
> --- a/kernel/time/ntp.c
> +++ b/kernel/time/ntp.c
> @@ -16,6 +16,7 @@
>  #include <linux/mm.h>
>  #include <linux/module.h>
>  
> +#include "leap-seconds.h"
>  #include "tick-internal.h"
>  
>  /*
> @@ -24,6 +25,7 @@
>  
>  DEFINE_SPINLOCK(ntp_lock);
>  
> +#define STA_LEAP		(STA_INS | STA_DEL)
>  
>  /* USER_HZ period (usecs): */
>  unsigned long			tick_usec = TICK_USEC;
> @@ -42,19 +44,9 @@ static u64			tick_length_base;
>   * phase-lock loop variables
>   */
>  
> -/*
> - * clock synchronization status
> - *
> - * (TIME_ERROR prevents overwriting the CMOS clock)
> - */
> -static int			time_state = TIME_OK;
> -
>  /* clock status bits:							*/
>  static int			time_status = STA_UNSYNC;
>  
> -/* TAI offset (secs):							*/
> -static long			time_tai;
> -
>  /* time adjustment (nsecs):						*/
>  static s64			time_offset;
>  
> @@ -386,57 +378,14 @@ u64 ntp_tick_length(void)
>   * They were originally developed for SUN and DEC kernels.
>   * All the kudos should go to Dave for this stuff.
>   *
> - * Also handles leap second processing, and returns leap offset
>   */
> -int second_overflow(unsigned long secs)
> +void second_overflow(void)
>  {
>  	s64 delta;
> -	int leap = 0;
>  	unsigned long flags;
>  
>  	spin_lock_irqsave(&ntp_lock, flags);
>  
> -	/*
> -	 * Leap second processing. If in leap-insert state at the end of the
> -	 * day, the system clock is set back one second; if in leap-delete
> -	 * state, the system clock is set ahead one second.
> -	 */
> -	switch (time_state) {
> -	case TIME_OK:
> -		if (time_status & STA_INS)
> -			time_state = TIME_INS;
> -		else if (time_status & STA_DEL)
> -			time_state = TIME_DEL;
> -		break;
> -	case TIME_INS:
> -		if (secs % 86400 == 0) {
> -			leap = -1;
> -			time_state = TIME_OOP;
> -			printk(KERN_NOTICE
> -				"Clock: inserting leap second 23:59:60 UTC\n");
> -		}
> -		break;
> -	case TIME_DEL:
> -		if ((secs + 1) % 86400 == 0) {
> -			leap = 1;
> -			time_tai--;
> -			time_state = TIME_WAIT;
> -			printk(KERN_NOTICE
> -				"Clock: deleting leap second 23:59:59 UTC\n");
> -		}
> -		break;
> -	case TIME_OOP:
> -		time_tai++;
> -		time_state = TIME_WAIT;
> -		break;
> -
> -	case TIME_WAIT:
> -		if (!(time_status & (STA_INS | STA_DEL)))
> -			time_state = TIME_OK;
> -		break;
> -	}
> -
> -
>  	/* Bump the maxerror field */
>  	time_maxerror += MAXFREQ / NSEC_PER_USEC;
>  	if (time_maxerror > NTP_PHASE_LIMIT) {
> @@ -475,8 +424,6 @@ int second_overflow(unsigned long secs)
>  
>  out:
>  	spin_unlock_irqrestore(&ntp_lock, flags);
> -
> -	return leap;
>  }
>  
>  #ifdef CONFIG_GENERIC_CMOS_UPDATE
> @@ -541,7 +488,6 @@ static inline void notify_cmos_timer(void) { }
>  static inline void process_adj_status(struct timex *txc, struct timespec *ts)
>  {
>  	if ((time_status & STA_PLL) && !(txc->status & STA_PLL)) {
> -		time_state = TIME_OK;
>  		time_status = STA_UNSYNC;
>  		/* restart PPS frequency calibration */
>  		pps_reset_freq_interval();
> @@ -554,6 +500,18 @@ static inline void process_adj_status(struct timex *txc, struct timespec *ts)
>  	if (!(time_status & STA_PLL) && (txc->status & STA_PLL))
>  		time_reftime = get_seconds();
>  
> +	/*
> +	 * Check for new leap second commands.
> +	 */
> +	if (!(time_status & STA_INS) && (txc->status & STA_INS))
> +		timekeeper_insert_leap_second();
> +
> +	else if (!(time_status & STA_DEL) && (txc->status & STA_DEL))
> +		timekeeper_delete_leap_second();
> +
> +	else if ((time_status & STA_LEAP) && !(txc->status & STA_LEAP))
> +		timekeeper_finish_leap_second();
> +
>  	/* only set allowed bits */
>  	time_status &= STA_RONLY;
>  	time_status |= txc->status & ~STA_RONLY;

Just for the record, this hunk and the following violate the (unspoken
but mentioned in the discussion on V2 series) locking rules for
ntp_lock and timekeerer.lock.

> @@ -597,7 +555,7 @@ static inline void process_adjtimex_modes(struct timex *txc, struct timespec *ts
>  	}
>  
>  	if (txc->modes & ADJ_TAI && txc->constant > 0)
> -		time_tai = txc->constant;
> +		timekeeper_tai_offset(txc->constant);
>  
>  	if (txc->modes & ADJ_OFFSET)
>  		ntp_update_offset(txc->offset);

Thanks,
Richard

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2012-05-28 16:49 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-27  8:12 [PATCH RFC V1 0/5] Rationalize time keeping Richard Cochran
2012-04-27  8:12 ` [PATCH RFC V1 1/5] Add functions to convert continuous timescales to UTC Richard Cochran
2012-04-27  8:12 ` [PATCH RFC V1 2/5] ntp: Fix a stale comment and a few stray newlines Richard Cochran
2012-04-27 22:25   ` John Stultz
2012-04-27  8:12 ` [PATCH RFC V1 3/5] timekeeping: Fix a few minor newline issues Richard Cochran
2012-04-27 22:25   ` John Stultz
2012-04-27  8:12 ` [PATCH RFC V1 4/5] timekeeping: Offer an interface to manipulate leap seconds Richard Cochran
2012-04-27 23:08   ` John Stultz
2012-04-28  8:47     ` Richard Cochran
2012-05-05 10:17     ` Richard Cochran
2012-05-07 17:36       ` John Stultz
2012-04-27  8:12 ` [PATCH RFC V1 5/5] timekeeping: Use a continuous timescale to tell time Richard Cochran
2012-05-28 16:49   ` Richard Cochran
2012-04-27 22:49 ` [PATCH RFC V1 0/5] Rationalize time keeping John Stultz
2012-04-28  8:04   ` Richard Cochran
2012-04-30 20:56     ` John Stultz
2012-05-01  7:17       ` Richard Cochran
2012-05-01  8:01         ` John Stultz
2012-05-01 18:43           ` Richard Cochran
2012-05-03  7:02       ` Richard Cochran
2012-05-03 15:48         ` John Stultz
2012-05-03 18:21   ` Richard Cochran
2012-05-03 18:44     ` John Stultz
2012-05-03 19:28       ` Richard Cochran
2012-05-03 19:42         ` John Stultz
2012-05-03 19:57 ` John Stultz
2012-05-05  7:34   ` Richard Cochran
2012-05-05 19:25     ` John Stultz

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.