netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] Patchset enabling hardware based cross-timestamps for next gen Intel platforms
@ 2015-07-28  0:46 Christopher Hall
  2015-07-28  0:46 ` [PATCH 1/5] Add functions producing system time given a backing counter value Christopher Hall
                   ` (4 more replies)
  0 siblings, 5 replies; 24+ messages in thread
From: Christopher Hall @ 2015-07-28  0:46 UTC (permalink / raw)
  To: john.stultz, tglx, richardcochran, mingo, jeffrey.t.kirsher,
	john.ronciak, hpa, x86
  Cc: linux-kernel, netdev, Christopher Hall

Next generation Intel platforms will have an Always Running
Timer (ART) that always runs when the system is powered and
is available to both the CPU and various on-board devices.
Initially, those devices include audio and network.  The
ART will give these devices the capability of precisely
cross timestamping their local device clock with the system
clock.

A system clock value like TSC or ART is not useful
unless translated to system time. The first three patches
enable this by changing the timekeeping code to return a
system time given a system clock value and translating ART
to TSC.

The last two patches modify the PTP driver to call a
cross timestamp function in the driver when available and
perform the cross timestamp in the e1000e driver.

Given the precise relationship between the network device
clock and system time enables better synchronization of
events on multiple network connected devices.

Changelog:

* The PTP portion of the patch set was posted 7/8/2015 (v3)
  and rejected because of there wasn't a driver that
  implemented the new API.  Now, the driver patch is added
  and the PTP patch operation is modified to revert to
  previous behavior when cross timestamp can't be
  completed.  This is indicated by the driver returning a
  non-zero value.

Christopher Hall (5):
  Add functions producing system time given a backing counter value
  Added functions mapping TSC value to system time
  Add calls to translate Always Running Timer (ART) to system time
  Add support for driver cross-timestamp to PTP_SYS_OFFSET ioctl
  Enables cross timestamping in the e1000e driver

 Documentation/ptp/testptp.c                 |   6 +-
 arch/x86/Kconfig                            |  12 ++
 arch/x86/include/asm/art.h                  |  42 ++++++
 arch/x86/include/asm/tsc.h                  |   5 +
 arch/x86/kernel/Makefile                    |   1 +
 arch/x86/kernel/art.c                       | 134 ++++++++++++++++++
 arch/x86/kernel/tsc.c                       |  22 +++
 drivers/net/ethernet/intel/e1000e/defines.h |   7 +
 drivers/net/ethernet/intel/e1000e/ptp.c     |  77 ++++++++++
 drivers/net/ethernet/intel/e1000e/regs.h    |   4 +
 drivers/ptp/ptp_chardev.c                   |  29 ++--
 include/linux/ptp_clock_kernel.h            |   7 +
 include/linux/timekeeping.h                 |   8 ++
 include/uapi/linux/ptp_clock.h              |   4 +-
 kernel/time/timekeeping.c                   | 211 +++++++++++++++++++++++++---
 15 files changed, 538 insertions(+), 31 deletions(-)
 create mode 100644 arch/x86/include/asm/art.h
 create mode 100644 arch/x86/kernel/art.c

-- 
1.9.1

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/5] Add functions producing system time given a backing counter value
  2015-07-28  0:46 [PATCH 0/5] Patchset enabling hardware based cross-timestamps for next gen Intel platforms Christopher Hall
@ 2015-07-28  0:46 ` Christopher Hall
  2015-07-28  3:44   ` John Stultz
  2015-07-28  0:46 ` [PATCH 2/5] Added functions mapping TSC value to system time Christopher Hall
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 24+ messages in thread
From: Christopher Hall @ 2015-07-28  0:46 UTC (permalink / raw)
  To: john.stultz, tglx, richardcochran, mingo, jeffrey.t.kirsher,
	john.ronciak, hpa, x86
  Cc: linux-kernel, netdev, Christopher Hall

* counter_to_rawmono64
* counter_to_mono64
* counter_to_realtime64

Enables drivers to translate a captured system clock counter to system
time. This is useful for network and audio devices that capture timestamps
in terms of both the system clock and device clock.

Signed-off-by: Christopher Hall <christopher.s.hall@intel.com>
---
 include/linux/timekeeping.h |   8 ++
 kernel/time/timekeeping.c   | 211 +++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 199 insertions(+), 20 deletions(-)

diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index 6e191e4..04e6455 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -1,6 +1,8 @@
 #ifndef _LINUX_TIMEKEEPING_H
 #define _LINUX_TIMEKEEPING_H
 
+struct clocksource; /* Defined in clocksource.h */
+
 /* Included from linux/ktime.h */
 
 void timekeeping_init(void);
@@ -27,6 +29,12 @@ struct timespec __current_kernel_time(void);
  */
 struct timespec64 get_monotonic_coarse64(void);
 extern void getrawmonotonic64(struct timespec64 *ts);
+extern int counter_to_rawmono64
+(struct timespec64 *rawmono, cycle_t counterval, struct clocksource *cs);
+extern int counter_to_mono64
+(struct timespec64 *mono, cycle_t counterval, struct clocksource *cs);
+extern int counter_to_realtime64
+(struct timespec64 *realtime, cycle_t counterval, struct clocksource *cs);
 extern void ktime_get_ts64(struct timespec64 *ts);
 extern time64_t ktime_get_seconds(void);
 extern time64_t ktime_get_real_seconds(void);
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index bca3667..1ca4fe0 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -116,6 +116,8 @@ static inline void tk_update_sleep_time(struct timekeeper *tk, ktime_t delta)
 	tk->offs_boot = ktime_add(tk->offs_boot, delta);
 }
 
+#define ROLLOVER_THRESHOLD (2ULL << 39)
+
 #ifdef CONFIG_DEBUG_TIMEKEEPING
 #define WARNING_FREQ (HZ*300) /* 5 minute rate-limiting */
 
@@ -158,10 +160,11 @@ static void timekeeping_check_update(struct timekeeper *tk, cycle_t offset)
 	}
 }
 
-static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
+static inline int timekeeping_get_delta
+(cycle_t *delta, struct tk_read_base *tkr, cycle_t *counterval)
 {
 	struct timekeeper *tk = &tk_core.timekeeper;
-	cycle_t now, last, mask, max, delta;
+	cycle_t now = *counterval, last, mask, max, delta;
 	unsigned int seq;
 
 	/*
@@ -173,46 +176,61 @@ static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
 	 */
 	do {
 		seq = read_seqcount_begin(&tk_core.seq);
-		now = tkr->read(tkr->clock);
+		if (counterval == NULL) {
+			now = tkr->read(tkr->clock);
+		} else {
+			if (now < tkr->cycle_last &&
+			   tkr->cycle_last - now < ROLLOVER_THRESHOLD)
+				return -EAGAIN;
+		}
 		last = tkr->cycle_last;
 		mask = tkr->mask;
 		max = tkr->clock->max_cycles;
 	} while (read_seqcount_retry(&tk_core.seq, seq));
 
-	delta = clocksource_delta(now, last, mask);
+	*delta = clocksource_delta(now, last, mask);
 
 	/*
 	 * Try to catch underflows by checking if we are seeing small
 	 * mask-relative negative values.
 	 */
-	if (unlikely((~delta & mask) < (mask >> 3))) {
+	if (unlikely((~*delta & mask) < (mask >> 3))) {
 		tk->underflow_seen = 1;
-		delta = 0;
+		*delta = 0;
 	}
 
 	/* Cap delta value to the max_cycles values to avoid mult overflows */
-	if (unlikely(delta > max)) {
-		tk->overflow_seen = 1;
-		delta = tkr->clock->max_cycles;
+	if (unlikely(*delta > max)) {
+		tk->underflow_seen = 1;
+		*delta = tkr->clock->max_cycles;
 	}
 
-	return delta;
+	return 0;
 }
 #else
-static inline void timekeeping_check_update(struct timekeeper *tk, cycle_t offset)
+static inline void timekeeping_check_update
+(struct timekeeper *tk, cycle_t offset)
 {
 }
-static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
+static inline cycle_t timekeeping_get_delta
+(cycle_t *delta, struct tk_read_base *tkr, cycle_t *counterval)
 {
-	cycle_t cycle_now, delta;
+	cycle_t cycle_now;
 
 	/* read clocksource */
-	cycle_now = tkr->read(tkr->clock);
+	if (counterval == NULL) {
+		cycle_now = tkr->read(tkr->clock);
+	} else {
+		cycle_now = *counterval;
+		if (cycle_now < tkr->cycle_last &&
+		   tkr->cycle_last - cycle_now < ROLLOVER_THRESHOLD)
+			return -EAGAIN;
+	}
 
 	/* calculate the delta since the last update_wall_time */
-	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);
+	*delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);
 
-	return delta;
+	return 0;
 }
 #endif
 
@@ -298,13 +316,11 @@ u32 (*arch_gettimeoffset)(void) = default_arch_gettimeoffset;
 static inline u32 arch_gettimeoffset(void) { return 0; }
 #endif
 
-static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
+static inline s64 timekeeping_delta_to_ns
+(struct tk_read_base *tkr, cycle_t delta)
 {
-	cycle_t delta;
 	s64 nsec;
 
-	delta = timekeeping_get_delta(tkr);
-
 	nsec = delta * tkr->mult + tkr->xtime_nsec;
 	nsec >>= tkr->shift;
 
@@ -312,6 +328,28 @@ static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
 	return nsec + arch_gettimeoffset();
 }
 
+static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
+{
+	cycle_t delta;
+
+	timekeeping_get_delta(&delta, tkr, NULL);
+	return timekeeping_delta_to_ns(tkr, delta);
+}
+
+static inline int timekeeping_counter_to_ns
+(s64 *nsec, struct tk_read_base *tkr, cycle_t counterval)
+{
+	cycle_t delta;
+	int err;
+
+	err = timekeeping_get_delta(&delta, tkr, &counterval);
+	if (err != 0)
+		return err;
+
+	*nsec = timekeeping_delta_to_ns(tkr, delta);
+	return 0;
+}
+
 /**
  * update_fast_timekeeper - Update the fast and NMI safe monotonic timekeeper.
  * @tkr: Timekeeping readout base from which we take the update
@@ -1117,6 +1155,139 @@ void getrawmonotonic64(struct timespec64 *ts)
 EXPORT_SYMBOL(getrawmonotonic64);
 
 
+ /**
+ * counterval_to_rawmono64 - Returns the raw monotonic time given a counter
+ * value
+ * @counterval:          counter value (input)
+ * @rawmono:             raw monotonic clock (output)
+ * @cs:                  clocksource from which clockvalue is derived
+ *
+ * Returns:
+ *   0      - Success
+ *  -EAGAIN - Clock value is in the past, try again
+ *  -ENXIO  - Clocksource 'cs' doesn't match the current clocksource
+ *
+ */
+int counter_to_rawmono64(struct timespec64 *rawmono, cycle_t counterval,
+			 struct clocksource *cs)
+{
+	struct timekeeper *tk = &tk_core.timekeeper;
+	struct clocksource *curr_clock;
+	struct timespec64 ts64;
+	unsigned long seq;
+	s64 nsecs;
+	int err;
+
+	do {
+		seq = read_seqcount_begin(&tk_core.seq);
+		err = timekeeping_counter_to_ns
+			(&nsecs, &tk->tkr_raw, counterval);
+		if (err != 0)
+			return err;
+		ts64 = tk->raw_time;
+		curr_clock = tk->tkr_raw.clock;
+	} while (read_seqcount_retry(&tk_core.seq, seq));
+
+	if (curr_clock != cs)
+		return -ENXIO;
+
+	timespec64_add_ns(&ts64, nsecs);
+	*rawmono = ts64;
+
+	return 0;
+}
+EXPORT_SYMBOL(counter_to_rawmono64);
+
+ /**
+ * counterval_to_realtime64 - Returns the real time given a counter
+ * value
+ * @counterval:          counter value (input)
+ * @realtime:             realtime clock (output)
+ * @cs:                  clocksource from which clockvalue is derived
+ *
+ * Returns:
+ *   0      - Success
+ *  -EAGAIN - Clock value is in the past, try again
+ *  -ENXIO  - Clocksource 'cs' doesn't match the current clocksource
+ *
+ */
+int counter_to_realtime64(struct timespec64 *realtime, cycle_t counterval,
+			  struct clocksource *cs)
+{
+	struct timekeeper *tk = &tk_core.timekeeper;
+	struct clocksource *curr_clock;
+	struct timespec64 ts64;
+	unsigned long seq;
+	s64 nsecs;
+	int err;
+
+	do {
+		seq = read_seqcount_begin(&tk_core.seq);
+		err = timekeeping_counter_to_ns
+			(&nsecs, &tk->tkr_mono, counterval);
+		if (err != 0)
+			return err;
+		ts64.tv_sec = tk->xtime_sec;
+		curr_clock = tk->tkr_mono.clock;
+	} while (read_seqcount_retry(&tk_core.seq, seq));
+
+	if (curr_clock != cs)
+		return -ENXIO;
+
+	ts64.tv_nsec = 0;
+	timespec64_add_ns(&ts64, nsecs);
+	*realtime = ts64;
+
+	return 0;
+}
+EXPORT_SYMBOL(counter_to_realtime64);
+
+ /**
+ * counterval_to_mono64 - Returns the real time given a counter
+ * value
+ * @counterval:          counter value (input)
+ * @realtime:             realtime clock (output)
+ * @cs:                  clocksource from which clockvalue is derived
+ *
+ * Returns:
+ *   0      - Success
+ *  -EAGAIN - Clock value is in the past, try again
+ *  -ENXIO  - Clocksource 'cs' doesn't match the current clocksource
+ *
+ */
+int counter_to_mono64(struct timespec64 *mono, cycle_t counterval,
+		      struct clocksource *cs)
+{
+	struct timekeeper *tk = &tk_core.timekeeper;
+	struct clocksource *curr_clock;
+	struct timespec64 tomono;
+	struct timespec64 ts64;
+	unsigned long seq;
+	s64 nsecs;
+	int err;
+
+	do {
+		seq = read_seqcount_begin(&tk_core.seq);
+		err = timekeeping_counter_to_ns
+			(&nsecs, &tk->tkr_mono, counterval);
+		if (err != 0)
+			return err;
+		ts64.tv_sec = tk->xtime_sec;
+		tomono = tk->wall_to_monotonic;
+		curr_clock = tk->tkr_mono.clock;
+	} while (read_seqcount_retry(&tk_core.seq, seq));
+
+	if (curr_clock != cs)
+		return -ENXIO;
+
+	ts64.tv_nsec = 0;
+	timespec64_add_ns(&ts64, nsecs);
+	*mono = timespec64_add(ts64, tomono);
+
+	return 0;
+}
+EXPORT_SYMBOL(counter_to_mono64);
+
 /**
  * timekeeping_valid_for_hres - Check if timekeeping is suitable for hres
  */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 2/5] Added functions mapping TSC value to system time
  2015-07-28  0:46 [PATCH 0/5] Patchset enabling hardware based cross-timestamps for next gen Intel platforms Christopher Hall
  2015-07-28  0:46 ` [PATCH 1/5] Add functions producing system time given a backing counter value Christopher Hall
@ 2015-07-28  0:46 ` Christopher Hall
  2015-07-28  0:46 ` [PATCH 3/5] Add calls to translate Always Running Timer (ART) " Christopher Hall
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 24+ messages in thread
From: Christopher Hall @ 2015-07-28  0:46 UTC (permalink / raw)
  To: john.stultz, tglx, richardcochran, mingo, jeffrey.t.kirsher,
	john.ronciak, hpa, x86
  Cc: linux-kernel, netdev, Christopher Hall

* tsc_to_rawmono64
* tsc_to_realtime64
* tsc_to_mono64

These function build on the counter_to_* time translation function
specifically for TSC based system clock

Signed-off-by: Christopher Hall <christopher.s.hall@intel.com>
---
 arch/x86/include/asm/tsc.h |  5 +++++
 arch/x86/kernel/tsc.c      | 18 ++++++++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h
index 94605c0..3fdf8b1 100644
--- a/arch/x86/include/asm/tsc.h
+++ b/arch/x86/include/asm/tsc.h
@@ -5,6 +5,7 @@
 #define _ASM_X86_TSC_H
 
 #include <asm/processor.h>
+#include <linux/time64.h>
 
 #define NS_SCALE	10 /* 2^10, carefully chosen */
 #define US_SCALE	32 /* 2^32, arbitralrily chosen */
@@ -52,6 +53,10 @@ extern int check_tsc_unstable(void);
 extern int check_tsc_disabled(void);
 extern unsigned long native_calibrate_tsc(void);
 
+extern int tsc_to_rawmono64(struct timespec64 *rawmono, cycle_t counterval);
+extern int tsc_to_realtime64(struct timespec64 *realtime, cycle_t counterval);
+extern int tsc_to_mono64(struct timespec64 *mono, cycle_t counterval);
+
 extern int tsc_clocksource_reliable;
 
 /*
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 7437b41..a192271 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -979,6 +979,24 @@ static cycle_t read_tsc(struct clocksource *cs)
 	return (cycle_t)get_cycles();
 }
 
+int tsc_to_rawmono64(struct timespec64 *rawmono, cycle_t counterval)
+{
+	return counter_to_rawmono64(rawmono, counterval, &clocksource_tsc);
+}
+EXPORT_SYMBOL(tsc_to_rawmono64);
+
+int tsc_to_realtime64(struct timespec64 *realtime, cycle_t counterval)
+{
+	return counter_to_realtime64(realtime, counterval, &clocksource_tsc);
+}
+EXPORT_SYMBOL(tsc_to_realtime64);
+
+int tsc_to_mono64(struct timespec64 *mono, cycle_t counterval)
+{
+	return counter_to_mono64(mono, counterval, &clocksource_tsc);
+}
+EXPORT_SYMBOL(tsc_to_mono64);
+
 /*
  * .mask MUST be CLOCKSOURCE_MASK(64). See comment above read_tsc()
  */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 3/5] Add calls to translate Always Running Timer (ART) to system time
  2015-07-28  0:46 [PATCH 0/5] Patchset enabling hardware based cross-timestamps for next gen Intel platforms Christopher Hall
  2015-07-28  0:46 ` [PATCH 1/5] Add functions producing system time given a backing counter value Christopher Hall
  2015-07-28  0:46 ` [PATCH 2/5] Added functions mapping TSC value to system time Christopher Hall
@ 2015-07-28  0:46 ` Christopher Hall
  2015-07-28  1:32   ` Andy Lutomirski
                     ` (2 more replies)
  2015-07-28  0:46 ` [PATCH 4/5] Add support for driver cross-timestamp to PTP_SYS_OFFSET ioctl Christopher Hall
  2015-07-28  0:46 ` [PATCH 5/5] Enables cross timestamping in the e1000e driver Christopher Hall
  4 siblings, 3 replies; 24+ messages in thread
From: Christopher Hall @ 2015-07-28  0:46 UTC (permalink / raw)
  To: john.stultz, tglx, richardcochran, mingo, jeffrey.t.kirsher,
	john.ronciak, hpa, x86
  Cc: linux-kernel, netdev, Christopher Hall

* art_to_mono64
* art_to_rawmono64
* art_to_realtime64

Intel audio and PCH ethernet devices use the Always Running Timer (ART) to
relate their device clock to system time

Signed-off-by: Christopher Hall <christopher.s.hall@intel.com>
---
 arch/x86/Kconfig           |  12 ++++
 arch/x86/include/asm/art.h |  42 ++++++++++++++
 arch/x86/kernel/Makefile   |   1 +
 arch/x86/kernel/art.c      | 134 +++++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/tsc.c      |   4 ++
 5 files changed, 193 insertions(+)
 create mode 100644 arch/x86/include/asm/art.h
 create mode 100644 arch/x86/kernel/art.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b3a1a5d..1ef9985 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1175,6 +1175,18 @@ config X86_CPUID
 	  with major 203 and minors 0 to 31 for /dev/cpu/0/cpuid to
 	  /dev/cpu/31/cpuid.
 
+config X86_ART
+	bool "Always Running Timer"
+	default y
+	depends on X86_TSC
+	---help---
+	  This option provides functionality to drivers and devices that use
+	  the always-running-timer (ART) to correlate their device clock
+	  counter with the system clock counter. The TSC is *exactly* related
+	  to the ART by a ratio m/n specified by CPUID leaf 0x15
+	  (n=EAX,m=EBX). If ART is unused or unavailable there isn't any
+	  performance impact. It's safe to say Y.
+
 choice
 	prompt "High Memory Support"
 	default HIGHMEM4G
diff --git a/arch/x86/include/asm/art.h b/arch/x86/include/asm/art.h
new file mode 100644
index 0000000..da58ce4
--- /dev/null
+++ b/arch/x86/include/asm/art.h
@@ -0,0 +1,42 @@
+/*
+ * x86 ART related functions
+ */
+#ifndef _ASM_X86_ART_H
+#define _ASM_X86_ART_H
+
+#ifndef CONFIG_X86_ART
+
+static inline int setup_art(void)
+{
+	return 0;
+}
+
+static inline bool has_art(void)
+{
+	return false;
+}
+
+static inline int art_to_rawmono64(struct timespec64 *rawmono, cycle_t art)
+{
+	return -ENXIO;
+}
+static inline int art_to_realtime64(struct timespec64 *realtime, cycle_t art)
+{
+	return -ENXIO;
+}
+static inline int art_to_mono64(struct timespec64 *mono, cycle_t art)
+{
+	return -ENXIO;
+}
+
+#else
+
+extern int setup_art(void);
+extern bool has_art(void);
+extern int art_to_rawmono64(struct timespec64 *rawmono, cycle_t art);
+extern int art_to_realtime64(struct timespec64 *realtime, cycle_t art);
+extern int art_to_mono64(struct timespec64 *mono, cycle_t art);
+
+#endif
+
+#endif/*_ASM_X86_ART_H*/
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 0f15af4..0908311 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -109,6 +109,7 @@ obj-$(CONFIG_PERF_EVENTS)		+= perf_regs.o
 obj-$(CONFIG_TRACING)			+= tracepoint.o
 obj-$(CONFIG_IOSF_MBI)			+= iosf_mbi.o
 obj-$(CONFIG_PMC_ATOM)			+= pmc_atom.o
+obj-$(CONFIG_X86_ART)			+= art.o
 
 ###
 # 64 bit specific files
diff --git a/arch/x86/kernel/art.c b/arch/x86/kernel/art.c
new file mode 100644
index 0000000..1906cf0
--- /dev/null
+++ b/arch/x86/kernel/art.c
@@ -0,0 +1,134 @@
+#include <asm/tsc.h>
+#include <asm/cpufeature.h>
+#include <asm/processor.h>
+#include <linux/spinlock.h>
+#include <linux/seqlock.h>
+
+#define CPUID_ART_LEAF 0x15
+
+static bool art_present;
+
+static struct art_state {
+	seqcount_t seq;
+	u32 art_ratio_numerator;
+	u32 art_ratio_denominator;
+
+	cycle_t prev_art;
+	cycle_t prev_tsc_corr_art; /*This is the TSC value corresponding to
+				     prev_art */
+	u32 tsc_remainder;
+} art_state ____cacheline_aligned;
+
+static DEFINE_RAW_SPINLOCK(art_lock);
+
+#define MIN_DENOMINATOR 2
+int setup_art(void)
+{
+	if (boot_cpu_data.cpuid_level < CPUID_ART_LEAF)
+		return 0;
+	art_state.art_ratio_denominator = cpuid_eax(CPUID_ART_LEAF);
+	if (art_state.art_ratio_denominator < MIN_DENOMINATOR)
+		return 0;
+	art_state.art_ratio_numerator = cpuid_ebx(CPUID_ART_LEAF);
+
+	art_present = true;
+	return 0;
+}
+
+static bool has_art(void)
+{
+	return art_present;
+}
+EXPORT_SYMBOL(has_art);
+
+#define ROLLOVER_THRESHOLD (2ULL << 23)
+
+static u32 art_scale(struct art_state *art_state, cycle_t *art)
+{
+	u32 rem;
+
+	*art *= art_state->art_ratio_numerator;
+
+	switch (art_state->art_ratio_denominator) {
+	default:
+		rem = do_div(*art, art_state->art_ratio_denominator);
+	case 2:
+		rem = *art & 0x1;
+		*art >>= 1;
+	}
+	return rem + art_state->tsc_remainder;
+}
+
+static cycle_t art_to_tsc(cycle_t art)
+{
+	unsigned seq;
+	cycle_t tsc_next;
+	u32 rem_next;
+	bool backward = false;
+	unsigned long flags;
+
+	do {
+		seq = read_seqcount_begin(&art_state.seq);
+
+		if (art < art_state.prev_art &&
+		    art_state.prev_art - art < ROLLOVER_THRESHOLD) {
+			tsc_next = (art_state.prev_art-art);
+			art_scale(&art_state, &tsc_next);
+			tsc_next = art_state.prev_tsc_corr_art - tsc_next;
+			backward = true;
+		} else {
+			tsc_next = art - art_state.prev_art;
+			rem_next = art_scale(&art_state, &tsc_next);
+			tsc_next += art_state.prev_tsc_corr_art;
+
+			tsc_next += rem_next /
+				art_state.art_ratio_denominator;
+			rem_next %= art_state.art_ratio_denominator;
+		}
+	} while (read_seqcount_retry(&art_state.seq, seq));
+
+	/* There's no need to update after every read, if an update is
+	   already in progress by someone else just exit */
+	if (!backward && raw_spin_trylock_irqsave(&art_lock, flags)) {
+		write_seqcount_begin(&art_state.seq);
+		art_state.prev_art = art;
+		art_state.prev_tsc_corr_art = tsc_next;
+		art_state.tsc_remainder = rem_next;
+		write_seqcount_end(&art_state.seq);
+		raw_spin_unlock_irqrestore(&art_lock, flags);
+	}
+
+	return tsc_next;
+}
+
+static bool checked_art_to_tsc(cycle_t *tsc)
+{
+	if (!has_art())
+		return false;
+	*tsc = art_to_tsc(*tsc);
+	return true;
+}
+
+static int art_to_rawmono64(struct timespec64 *rawmono, cycle_t art)
+{
+	if (!checked_art_to_tsc(&art))
+		return -ENXIO;
+	return tsc_to_rawmono64(rawmono, art);
+}
+EXPORT_SYMBOL(art_to_rawmono64);
+
+static int art_to_realtime64(struct timespec64 *realtime, cycle_t art)
+{
+	if (!checked_art_to_tsc(&art))
+		return -ENXIO;
+	return tsc_to_realtime64(realtime, art);
+}
+EXPORT_SYMBOL(art_to_realtime64);
+
+static int art_to_mono64(struct timespec64 *mono, cycle_t art)
+{
+	if (!checked_art_to_tsc(&art))
+		return -ENXIO;
+	return tsc_to_mono64(mono, art);
+}
+EXPORT_SYMBOL(art_to_mono64);
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index a192271..828c4b3 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -22,6 +22,8 @@
 #include <asm/nmi.h>
 #include <asm/x86_init.h>
 
+#include <asm/art.h>
+
 unsigned int __read_mostly cpu_khz;	/* TSC clocks / usec, not used here */
 EXPORT_SYMBOL(cpu_khz);
 
@@ -1177,6 +1179,8 @@ static int __init init_tsc_clocksource(void)
 		return 0;
 	}
 
+	setup_art();
+
 	schedule_delayed_work(&tsc_irqwork, 0);
 	return 0;
 }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 4/5] Add support for driver cross-timestamp to PTP_SYS_OFFSET ioctl
  2015-07-28  0:46 [PATCH 0/5] Patchset enabling hardware based cross-timestamps for next gen Intel platforms Christopher Hall
                   ` (2 preceding siblings ...)
  2015-07-28  0:46 ` [PATCH 3/5] Add calls to translate Always Running Timer (ART) " Christopher Hall
@ 2015-07-28  0:46 ` Christopher Hall
  2015-07-28  0:46 ` [PATCH 5/5] Enables cross timestamping in the e1000e driver Christopher Hall
  4 siblings, 0 replies; 24+ messages in thread
From: Christopher Hall @ 2015-07-28  0:46 UTC (permalink / raw)
  To: john.stultz, tglx, richardcochran, mingo, jeffrey.t.kirsher,
	john.ronciak, hpa, x86
  Cc: linux-kernel, netdev, Christopher Hall

This patch allows system and device time ("cross-timestamp") to be
performed by the driver. Currently, the cross-timestamping is performed
in the PTP_SYS_OFFSET ioctl.  The PTP clock driver reads gettimeofday()
and the gettime64() callback provided by the driver. The cross-timestamp
is best effort where the latency between the capture of system time
(getnstimeofday()) and the device time (driver callback) may be
significant.

This patch adds an additional callback getsynctime64(). Which will be
called when the driver is able to perform a more accurate, implementation
specific cross-timestamping.  For example, future network devices that
implement PCIE PTM will be able to precisely correlate the device clock
with the system clock with virtually zero latency between captures.
This added callback can be used by the driver to expose this functionality.

The callback, getsynctime64(), will only be called when defined and
n_samples == 1 because the driver returns only 1 cross-timestamp where
multiple samples cannot be chained together.

This patch also adds to the capabilities ioctl (PTP_CLOCK_GETCAPS),
allowing applications to query whether or not drivers implement the
getsynctime callback, providing more precise cross timestamping.

Commit Details:

Added additional callback to ptp_clock_info:

* getsynctime64()

This takes 2 arguments referring to system and device time

With this callback drivers may provide both system time and device time
to ensure precise correlation

Modified PTP_SYS_OFFSET ioctl in PTP clock driver to use the above
callback if it's available

Added capability (PTP_CLOCK_GETCAPS) for checking whether driver supports
cross timestamping

Added check for cross timestamping flag to testptp.c

Signed-off-by: Christopher Hall <christopher.s.hall@intel.com>
---
 Documentation/ptp/testptp.c      |  6 ++++--
 drivers/ptp/ptp_chardev.c        | 29 +++++++++++++++++++++--------
 include/linux/ptp_clock_kernel.h |  7 +++++++
 include/uapi/linux/ptp_clock.h   |  4 +++-
 4 files changed, 35 insertions(+), 11 deletions(-)

diff --git a/Documentation/ptp/testptp.c b/Documentation/ptp/testptp.c
index 2bc8abc..8004efd 100644
--- a/Documentation/ptp/testptp.c
+++ b/Documentation/ptp/testptp.c
@@ -276,13 +276,15 @@ int main(int argc, char *argv[])
 			       "  %d external time stamp channels\n"
 			       "  %d programmable periodic signals\n"
 			       "  %d pulse per second\n"
-			       "  %d programmable pins\n",
+			       "  %d programmable pins\n"
+			       "  %d cross timestamping\n",
 			       caps.max_adj,
 			       caps.n_alarm,
 			       caps.n_ext_ts,
 			       caps.n_per_out,
 			       caps.pps,
-			       caps.n_pins);
+			       caps.n_pins,
+			       caps.cross_timestamping);
 		}
 	}
 
diff --git a/drivers/ptp/ptp_chardev.c b/drivers/ptp/ptp_chardev.c
index da7bae9..392ccfa 100644
--- a/drivers/ptp/ptp_chardev.c
+++ b/drivers/ptp/ptp_chardev.c
@@ -124,7 +124,7 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
 	struct ptp_clock *ptp = container_of(pc, struct ptp_clock, clock);
 	struct ptp_clock_info *ops = ptp->info;
 	struct ptp_clock_time *pct;
-	struct timespec64 ts;
+	struct timespec64 ts, systs;
 	int enable, err = 0;
 	unsigned int i, pin_index;
 
@@ -138,6 +138,7 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
 		caps.n_per_out = ptp->info->n_per_out;
 		caps.pps = ptp->info->pps;
 		caps.n_pins = ptp->info->n_pins;
+		caps.cross_timestamping = ptp->info->getsynctime64 != NULL;
 		if (copy_to_user((void __user *)arg, &caps, sizeof(caps)))
 			err = -EFAULT;
 		break;
@@ -196,19 +197,31 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
 			break;
 		}
 		pct = &sysoff->ts[0];
-		for (i = 0; i < sysoff->n_samples; i++) {
-			getnstimeofday64(&ts);
+		if (ptp->info->getsynctime64 && sysoff->n_samples == 1 &&
+		    ptp->info->getsynctime64(ptp->info, &ts, &systs) == 0) {
+			pct->sec = systs.tv_sec;
+			pct->nsec = systs.tv_nsec;
+			pct++;
 			pct->sec = ts.tv_sec;
 			pct->nsec = ts.tv_nsec;
 			pct++;
-			ptp->info->gettime64(ptp->info, &ts);
+			pct->sec = systs.tv_sec;
+			pct->nsec = systs.tv_nsec;
+		} else {
+			for (i = 0; i < sysoff->n_samples; i++) {
+				getnstimeofday64(&ts);
+				pct->sec = ts.tv_sec;
+				pct->nsec = ts.tv_nsec;
+				pct++;
+				ptp->info->gettime64(ptp->info, &ts);
+				pct->sec = ts.tv_sec;
+				pct->nsec = ts.tv_nsec;
+				pct++;
+			}
+			getnstimeofday64(&ts);
 			pct->sec = ts.tv_sec;
 			pct->nsec = ts.tv_nsec;
-			pct++;
 		}
-		getnstimeofday64(&ts);
-		pct->sec = ts.tv_sec;
-		pct->nsec = ts.tv_nsec;
 		if (copy_to_user((void __user *)arg, sysoff, sizeof(*sysoff)))
 			err = -EFAULT;
 		break;
diff --git a/include/linux/ptp_clock_kernel.h b/include/linux/ptp_clock_kernel.h
index b8b7306..8c43345 100644
--- a/include/linux/ptp_clock_kernel.h
+++ b/include/linux/ptp_clock_kernel.h
@@ -67,6 +67,11 @@ struct ptp_clock_request {
  * @gettime64:  Reads the current time from the hardware clock.
  *              parameter ts: Holds the result.
  *
+ * @getsynctime64:  Reads the current time from the hardware clock and system
+ *                  clock simultaneously.
+ *                  parameter dev: Holds the device time
+ *                  parameter sys: Holds the system time
+ *
  * @settime64:  Set the current time on the hardware clock.
  *              parameter ts: Time value to set.
  *
@@ -105,6 +110,8 @@ struct ptp_clock_info {
 	int (*adjfreq)(struct ptp_clock_info *ptp, s32 delta);
 	int (*adjtime)(struct ptp_clock_info *ptp, s64 delta);
 	int (*gettime64)(struct ptp_clock_info *ptp, struct timespec64 *ts);
+	int (*getsynctime64)(struct ptp_clock_info *ptp, struct timespec64 *dev,
+			struct timespec64 *sys);
 	int (*settime64)(struct ptp_clock_info *p, const struct timespec64 *ts);
 	int (*enable)(struct ptp_clock_info *ptp,
 		      struct ptp_clock_request *request, int on);
diff --git a/include/uapi/linux/ptp_clock.h b/include/uapi/linux/ptp_clock.h
index f0b7bfe..ffb2635 100644
--- a/include/uapi/linux/ptp_clock.h
+++ b/include/uapi/linux/ptp_clock.h
@@ -51,7 +51,9 @@ struct ptp_clock_caps {
 	int n_per_out; /* Number of programmable periodic signals. */
 	int pps;       /* Whether the clock supports a PPS callback. */
 	int n_pins;    /* Number of input/output pins. */
-	int rsv[14];   /* Reserved for future use. */
+	/* Whether the clock supports precise system-device cross timestamps */
+	int cross_timestamping;
+	int rsv[13];   /* Reserved for future use. */
 };
 
 struct ptp_extts_request {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 5/5] Enables cross timestamping in the e1000e driver
  2015-07-28  0:46 [PATCH 0/5] Patchset enabling hardware based cross-timestamps for next gen Intel platforms Christopher Hall
                   ` (3 preceding siblings ...)
  2015-07-28  0:46 ` [PATCH 4/5] Add support for driver cross-timestamp to PTP_SYS_OFFSET ioctl Christopher Hall
@ 2015-07-28  0:46 ` Christopher Hall
  4 siblings, 0 replies; 24+ messages in thread
From: Christopher Hall @ 2015-07-28  0:46 UTC (permalink / raw)
  To: john.stultz, tglx, richardcochran, mingo, jeffrey.t.kirsher,
	john.ronciak, hpa, x86
  Cc: linux-kernel, netdev, Christopher Hall

Adds getsynctime64() callback which captures "raw" cross timestamp
between ART and Ethernet device clock
Translates captured ART to REALTIME clock

Signed-off-by: Christopher Hall <christopher.s.hall@intel.com>
---
 drivers/net/ethernet/intel/e1000e/defines.h |  7 +++
 drivers/net/ethernet/intel/e1000e/ptp.c     | 83 +++++++++++++++++++++++++++++
 drivers/net/ethernet/intel/e1000e/regs.h    |  4 ++
 3 files changed, 94 insertions(+)

diff --git a/drivers/net/ethernet/intel/e1000e/defines.h b/drivers/net/ethernet/intel/e1000e/defines.h
index 133d407..9f16269 100644
--- a/drivers/net/ethernet/intel/e1000e/defines.h
+++ b/drivers/net/ethernet/intel/e1000e/defines.h
@@ -527,6 +527,13 @@
 #define E1000_RXCW_C          0x20000000        /* Receive config */
 #define E1000_RXCW_SYNCH      0x40000000        /* Receive config synch */
 
+/* HH Time Sync */
+#define E1000_TSYNCTXCTL_MAX_ALLOWED_DLY_MASK   0x0000F000      /* max delay */
+#define E1000_TSYNCTXCTL_SYNC_COMP              0x40000000      /* sync complete
+ */
+#define E1000_TSYNCTXCTL_START_SYNC             0x80000000      /* initiate sync
+ */
+
 #define E1000_TSYNCTXCTL_VALID		0x00000001 /* Tx timestamp valid */
 #define E1000_TSYNCTXCTL_ENABLED	0x00000010 /* enable Tx timestamping */
 
diff --git a/drivers/net/ethernet/intel/e1000e/ptp.c b/drivers/net/ethernet/intel/e1000e/ptp.c
index 25a0ad5..bd8b888 100644
--- a/drivers/net/ethernet/intel/e1000e/ptp.c
+++ b/drivers/net/ethernet/intel/e1000e/ptp.c
@@ -25,6 +25,7 @@
  */
 
 #include "e1000.h"
+#include <asm/art.h>
 
 /**
  * e1000e_phc_adjfreq - adjust the frequency of the hardware clock
@@ -98,6 +99,87 @@ static int e1000e_phc_adjtime(struct ptp_clock_info *ptp, s64 delta)
 	return 0;
 }
 
+#define HW_WAIT_COUNT (2)
+#define HW_RETRY_COUNT (2)
+
+static int e1000e_phc_read_timestamp_pair(struct e1000_adapter *adapter,
+					  u64 *systim, u64 *plttim)
+{
+	struct e1000_hw *hw = &adapter->hw;
+	int i, j;
+	u32 tsync_ctrl;
+	int ret;
+
+	if (hw->mac.type < e1000_pch_spt)
+		return -ENXIO;
+
+	for (j = 0; j < HW_RETRY_COUNT; ++j) {
+		tsync_ctrl = er32(TSYNCTXCTL);
+		tsync_ctrl |= E1000_TSYNCTXCTL_START_SYNC |
+			E1000_TSYNCTXCTL_MAX_ALLOWED_DLY_MASK;
+		ew32(TSYNCTXCTL, tsync_ctrl);
+		ret = 0;
+		for (i = 0; i < HW_WAIT_COUNT; ++i) {
+			udelay(2);
+			tsync_ctrl = er32(TSYNCTXCTL);
+			if (tsync_ctrl & E1000_TSYNCTXCTL_SYNC_COMP)
+				break;
+		}
+
+		if (i == HW_WAIT_COUNT) {
+			ret = -ETIMEDOUT;
+		} else if (ret == 0) {
+			*plttim = er32(PLTSTMPH);
+			*plttim <<= 32;
+			*plttim |= er32(PLTSTMPL);
+			*systim = er32(SYSSTMPH);
+			*systim <<= 32;
+			*systim |= er32(SYSSTMPL);
+			break;
+		}
+	}
+
+	return ret;
+}
+
+#define SYNCTIME_RETRY_COUNT (2)
+
+static int e1000e_phc_getsynctime(struct ptp_clock_info *ptp,
+				  struct timespec64 *devts,
+				  struct timespec64 *systs)
+{
+	struct e1000_adapter *adapter = container_of(ptp, struct e1000_adapter,
+						     ptp_clock_info);
+	unsigned long flags;
+	u32 remainder;
+	u64 plttim, systim;
+
+	int i, ret;
+
+	if (!has_art())
+		return -ENXIO;
+
+	for (i = 0; i < SYNCTIME_RETRY_COUNT; ++i) {
+		ret = e1000e_phc_read_timestamp_pair
+			(adapter, &systim, &plttim);
+		if (ret != 0)
+			continue;
+
+		ret = art_to_realtime64(systs, plttim);
+		if (ret == 0) {
+			spin_lock_irqsave(&adapter->systim_lock, flags);
+			systim = timecounter_cyc2time(&adapter->tc, systim);
+			spin_unlock_irqrestore(&adapter->systim_lock, flags);
+			devts->tv_sec =
+				div_u64_rem(systim, NSEC_PER_SEC, &remainder);
+			devts->tv_nsec = remainder;
+			break;
+		}
+	}
+
+	return ret;
+}
+
 /**
  * e1000e_phc_gettime - Reads the current time from the hardware clock
  * @ptp: ptp clock structure
@@ -190,6 +272,7 @@ static const struct ptp_clock_info e1000e_ptp_clock_info = {
 	.adjfreq	= e1000e_phc_adjfreq,
 	.adjtime	= e1000e_phc_adjtime,
 	.gettime64	= e1000e_phc_gettime,
+	.getsynctime64	= e1000e_phc_getsynctime,
 	.settime64	= e1000e_phc_settime,
 	.enable		= e1000e_phc_enable,
 };
diff --git a/drivers/net/ethernet/intel/e1000e/regs.h b/drivers/net/ethernet/intel/e1000e/regs.h
index b24e5fe..4dd5b54 100644
--- a/drivers/net/ethernet/intel/e1000e/regs.h
+++ b/drivers/net/ethernet/intel/e1000e/regs.h
@@ -246,6 +246,10 @@
 #define E1000_SYSTIML	0x0B600	/* System time register Low - RO */
 #define E1000_SYSTIMH	0x0B604	/* System time register High - RO */
 #define E1000_TIMINCA	0x0B608	/* Increment attributes register - RW */
+#define E1000_SYSSTMPL  0x0B648 /* HH Timesync system stamp low register */
+#define E1000_SYSSTMPH  0x0B64C /* HH Timesync system stamp hi register */
+#define E1000_PLTSTMPL  0x0B640 /* HH Timesync platform stamp low register */
+#define E1000_PLTSTMPH  0x0B644 /* HH Timesync platform stamp hi register */
 #define E1000_RXMTRL	0x0B634	/* Time sync Rx EtherType and Msg Type - RW */
 #define E1000_RXUDP	0x0B638	/* Time Sync Rx UDP Port - RW */
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/5] Add calls to translate Always Running Timer (ART) to system time
  2015-07-28  0:46 ` [PATCH 3/5] Add calls to translate Always Running Timer (ART) " Christopher Hall
@ 2015-07-28  1:32   ` Andy Lutomirski
  2015-07-29  1:18     ` Hall, Christopher S
  2015-07-28  4:11   ` John Stultz
  2015-07-29  8:25   ` Paul Bolle
  2 siblings, 1 reply; 24+ messages in thread
From: Andy Lutomirski @ 2015-07-28  1:32 UTC (permalink / raw)
  To: Christopher Hall, john.stultz, tglx, richardcochran, mingo,
	jeffrey.t.kirsher, john.ronciak, hpa, x86
  Cc: linux-kernel, netdev, Borislav Petkov

On 07/27/2015 05:46 PM, Christopher Hall wrote:
> * art_to_mono64
> * art_to_rawmono64
> * art_to_realtime64
>
> Intel audio and PCH ethernet devices use the Always Running Timer (ART) to
> relate their device clock to system time
>
> Signed-off-by: Christopher Hall <christopher.s.hall@intel.com>
> ---
>   arch/x86/Kconfig           |  12 ++++
>   arch/x86/include/asm/art.h |  42 ++++++++++++++
>   arch/x86/kernel/Makefile   |   1 +
>   arch/x86/kernel/art.c      | 134 +++++++++++++++++++++++++++++++++++++++++++++
>   arch/x86/kernel/tsc.c      |   4 ++
>   5 files changed, 193 insertions(+)
>   create mode 100644 arch/x86/include/asm/art.h
>   create mode 100644 arch/x86/kernel/art.c
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index b3a1a5d..1ef9985 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1175,6 +1175,18 @@ config X86_CPUID
>   	  with major 203 and minors 0 to 31 for /dev/cpu/0/cpuid to
>   	  /dev/cpu/31/cpuid.
>
> +config X86_ART
> +	bool "Always Running Timer"
> +	default y
> +	depends on X86_TSC
> +	---help---
> +	  This option provides functionality to drivers and devices that use
> +	  the always-running-timer (ART) to correlate their device clock
> +	  counter with the system clock counter. The TSC is *exactly* related
> +	  to the ART by a ratio m/n specified by CPUID leaf 0x15
> +	  (n=EAX,m=EBX). If ART is unused or unavailable there isn't any
> +	  performance impact. It's safe to say Y.
> +

Is there a good reason to make this optional?

Also, is there *still* no way to ask the thing for its nominal 
frequnency?  Or can we expect CPUID leaf 16H to work on CPUs that 
support this and can we expect it to actually work?  The SDM says "The 
returned information should not be used for any other purpose as the 
returned information does not accurately correlate to information / 
counters returned by other processor interfaces."

Also, does this thing let us learn the real time base?  SDM 17.14.4 
suggests that the ART value isn't affected by "privileged software" (aka 
buggy/malicious firmware).  Or, alternatively, how do we learn the 
offset K between ART and scaled TSC?

>   choice
>   	prompt "High Memory Support"
>   	default HIGHMEM4G
> diff --git a/arch/x86/include/asm/art.h b/arch/x86/include/asm/art.h
> new file mode 100644
> index 0000000..da58ce4
> --- /dev/null
> +++ b/arch/x86/include/asm/art.h
> @@ -0,0 +1,42 @@
> +/*
> + * x86 ART related functions
> + */
> +#ifndef _ASM_X86_ART_H
> +#define _ASM_X86_ART_H
> +
> +#ifndef CONFIG_X86_ART
> +
> +static inline int setup_art(void)
> +{
> +	return 0;
> +}
> +
> +static inline bool has_art(void)
> +{
> +	return false;
> +}
> +
> +static inline int art_to_rawmono64(struct timespec64 *rawmono, cycle_t art)
> +{
> +	return -ENXIO;
> +}
> +static inline int art_to_realtime64(struct timespec64 *realtime, cycle_t art)
> +{
> +	return -ENXIO;
> +}
> +static inline int art_to_mono64(struct timespec64 *mono, cycle_t art)
> +{
> +	return -ENXIO;
> +}
> +
> +#else
> +
> +extern int setup_art(void);
> +extern bool has_art(void);
> +extern int art_to_rawmono64(struct timespec64 *rawmono, cycle_t art);
> +extern int art_to_realtime64(struct timespec64 *realtime, cycle_t art);
> +extern int art_to_mono64(struct timespec64 *mono, cycle_t art);
> +
> +#endif
> +
> +#endif/*_ASM_X86_ART_H*/
> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
> index 0f15af4..0908311 100644
> --- a/arch/x86/kernel/Makefile
> +++ b/arch/x86/kernel/Makefile
> @@ -109,6 +109,7 @@ obj-$(CONFIG_PERF_EVENTS)		+= perf_regs.o
>   obj-$(CONFIG_TRACING)			+= tracepoint.o
>   obj-$(CONFIG_IOSF_MBI)			+= iosf_mbi.o
>   obj-$(CONFIG_PMC_ATOM)			+= pmc_atom.o
> +obj-$(CONFIG_X86_ART)			+= art.o
>
>   ###
>   # 64 bit specific files
> diff --git a/arch/x86/kernel/art.c b/arch/x86/kernel/art.c
> new file mode 100644
> index 0000000..1906cf0
> --- /dev/null
> +++ b/arch/x86/kernel/art.c
> @@ -0,0 +1,134 @@
> +#include <asm/tsc.h>
> +#include <asm/cpufeature.h>
> +#include <asm/processor.h>
> +#include <linux/spinlock.h>
> +#include <linux/seqlock.h>
> +
> +#define CPUID_ART_LEAF 0x15
> +
> +static bool art_present;
> +
> +static struct art_state {
> +	seqcount_t seq;
> +	u32 art_ratio_numerator;
> +	u32 art_ratio_denominator;
> +

This needs much better comments, IMO, including some discussion of how 
the locking works.

> +	cycle_t prev_art;
> +	cycle_t prev_tsc_corr_art; /*This is the TSC value corresponding to
> +				     prev_art */
> +	u32 tsc_remainder;
> +} art_state ____cacheline_aligned;
> +
> +static DEFINE_RAW_SPINLOCK(art_lock);
> +
> +#define MIN_DENOMINATOR 2
> +int setup_art(void)
> +{
> +	if (boot_cpu_data.cpuid_level < CPUID_ART_LEAF)
> +		return 0;
> +	art_state.art_ratio_denominator = cpuid_eax(CPUID_ART_LEAF);
> +	if (art_state.art_ratio_denominator < MIN_DENOMINATOR)
> +		return 0;
> +	art_state.art_ratio_numerator = cpuid_ebx(CPUID_ART_LEAF);
> +
> +	art_present = true;
> +	return 0;
> +}
> +
> +static bool has_art(void)
> +{
> +	return art_present;
> +}
> +EXPORT_SYMBOL(has_art);

IMO this should be a pseudo-feature X86_FEATURE_ART.

> +
> +#define ROLLOVER_THRESHOLD (2ULL << 23)
> +
> +static u32 art_scale(struct art_state *art_state, cycle_t *art)
> +{
> +	u32 rem;
> +
> +	*art *= art_state->art_ratio_numerator;

This seems very likely to overflow.  I think you should be using the 
equivalent of mul_u128_u64_shr (which probably doesn't exist, but 
mul_u64_u32_shr does) or just open-code it using __uint128_t if this 
thing is x86_64-only.

> +static cycle_t art_to_tsc(cycle_t art)
> +{

This function needs some kind of description of what it does.

Also, I don't think it's okay to have a pure-sounding thing like 
art_to_xyz actually be stateful and rely on a current value being passed.

Also, how does this code account for the unspecified ART-to-TSC offset? 
  I don't see it in the code on a quick reading.

--Andy

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/5] Add functions producing system time given a backing counter value
  2015-07-28  0:46 ` [PATCH 1/5] Add functions producing system time given a backing counter value Christopher Hall
@ 2015-07-28  3:44   ` John Stultz
  2015-07-28  4:05     ` John Stultz
  2015-07-29  1:41     ` Hall, Christopher S
  0 siblings, 2 replies; 24+ messages in thread
From: John Stultz @ 2015-07-28  3:44 UTC (permalink / raw)
  To: Christopher Hall
  Cc: Thomas Gleixner, Richard Cochran, Ingo Molnar, Jeff Kirsher,
	john.ronciak, H. Peter Anvin, x86, lkml, netdev

On Mon, Jul 27, 2015 at 5:46 PM, Christopher Hall
<christopher.s.hall@intel.com> wrote:
> * counter_to_rawmono64
> * counter_to_mono64
> * counter_to_realtime64
>
> Enables drivers to translate a captured system clock counter to system
> time. This is useful for network and audio devices that capture timestamps
> in terms of both the system clock and device clock.

Huh.  So for counter_to_realtime64 & mono64, this seems to ignore the
fact that the multiplier is constantly adjusted and corrected. So that
calling the function twice with the same counter value may result in
different returned values.

I've not yet groked the whole patchset, but it seems like there needs
to be some mechanism that ensures the counter value is captured and
used in the same (or at least close) interval that the timekeeper data
is valid for.

thanks
-john

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/5] Add functions producing system time given a backing counter value
  2015-07-28  3:44   ` John Stultz
@ 2015-07-28  4:05     ` John Stultz
  2015-07-29 10:19       ` Thomas Gleixner
  2015-07-29  1:41     ` Hall, Christopher S
  1 sibling, 1 reply; 24+ messages in thread
From: John Stultz @ 2015-07-28  4:05 UTC (permalink / raw)
  To: Christopher Hall
  Cc: Thomas Gleixner, Richard Cochran, Ingo Molnar, Jeff Kirsher,
	john.ronciak, H. Peter Anvin, x86, lkml, netdev

On Mon, Jul 27, 2015 at 8:44 PM, John Stultz <john.stultz@linaro.org> wrote:
> On Mon, Jul 27, 2015 at 5:46 PM, Christopher Hall
> <christopher.s.hall@intel.com> wrote:
>> * counter_to_rawmono64
>> * counter_to_mono64
>> * counter_to_realtime64
>>
>> Enables drivers to translate a captured system clock counter to system
>> time. This is useful for network and audio devices that capture timestamps
>> in terms of both the system clock and device clock.
>
> Huh.  So for counter_to_realtime64 & mono64, this seems to ignore the
> fact that the multiplier is constantly adjusted and corrected. So that
> calling the function twice with the same counter value may result in
> different returned values.
>
> I've not yet groked the whole patchset, but it seems like there needs
> to be some mechanism that ensures the counter value is captured and
> used in the same (or at least close) interval that the timekeeper data
> is valid for.


So reading through. It looks like you only use art_to_realtime(), right?

So again, since CLOCK_MONOTONIC and CLOCK_REALTIME are constaly being
frequency adjusted, it might be best to construct this delta in the
following way.


Add counter_to_rawmono64(), which should be able to safely calculate
the corresponding CLOCK_MONOTONIC_RAW time from any given cycle value.

Use getnstime_raw_and_real() to get a immediate snapshot of current
MONOTONIC_RAW  and REALTIME clocks.

Then calculate the delta between the snapshotted counter raw time, and
the current raw time.  Then apply that offset to the current realtime.

The larger the raw-time delta, the larger the possible realtime error.
But I think that will be as good as it gets.

This should simplify the interfaces you're adding to the timekeeping core.

thanks
-john

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/5] Add calls to translate Always Running Timer (ART) to system time
  2015-07-28  0:46 ` [PATCH 3/5] Add calls to translate Always Running Timer (ART) " Christopher Hall
  2015-07-28  1:32   ` Andy Lutomirski
@ 2015-07-28  4:11   ` John Stultz
  2015-07-29  2:05     ` Hall, Christopher S
  2015-07-29  8:25   ` Paul Bolle
  2 siblings, 1 reply; 24+ messages in thread
From: John Stultz @ 2015-07-28  4:11 UTC (permalink / raw)
  To: Christopher Hall
  Cc: Thomas Gleixner, Richard Cochran, Ingo Molnar, Jeff Kirsher,
	john.ronciak, H. Peter Anvin, x86, lkml, netdev

On Mon, Jul 27, 2015 at 5:46 PM, Christopher Hall
<christopher.s.hall@intel.com> wrote:
> +static bool checked_art_to_tsc(cycle_t *tsc)
> +{
> +       if (!has_art())
> +               return false;
> +       *tsc = art_to_tsc(*tsc);
> +       return true;
> +}
> +
> +static int art_to_rawmono64(struct timespec64 *rawmono, cycle_t art)
> +{
> +       if (!checked_art_to_tsc(&art))
> +               return -ENXIO;
> +       return tsc_to_rawmono64(rawmono, art);
> +}
> +EXPORT_SYMBOL(art_to_rawmono64);

This all seems to assume the TSC is the current clocksource, which it
may not be if the user has overridden it.

If instead there were a counter_to_rawmono64() which took the counter
value and maybe the name of the clocksource (if the strncmp is
affordable for your use), it might be easier for the core to provide
an error if the current timekeeping clocksource isn't the one the
counter value is based on. This would also allow the tsc_to_*()
midlayers to be dropped (since they don't seem to do much).

thanks
-john

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PATCH 3/5] Add calls to translate Always Running Timer (ART) to system time
  2015-07-28  1:32   ` Andy Lutomirski
@ 2015-07-29  1:18     ` Hall, Christopher S
  2015-07-29  1:47       ` Andy Lutomirski
  0 siblings, 1 reply; 24+ messages in thread
From: Hall, Christopher S @ 2015-07-29  1:18 UTC (permalink / raw)
  To: 'Andy Lutomirski',
	john.stultz, tglx, richardcochran, mingo, Kirsher, Jeffrey T,
	Ronciak, John, hpa, x86
  Cc: linux-kernel, netdev, Borislav Petkov



> -----Original Message-----
> From: Andy Lutomirski [mailto:luto@kernel.org]
> Sent: Monday, July 27, 2015 6:32 PM
> To: Hall, Christopher S; john.stultz@linaro.org; tglx@linutronix.de;
> richardcochran@gmail.com; mingo@redhat.com; Kirsher, Jeffrey T; Ronciak,
> John; hpa@zytor.com; x86@kernel.org
> Cc: linux-kernel@vger.kernel.org; netdev@vger.kernel.org; Borislav
> Petkov
> Subject: Re: [PATCH 3/5] Add calls to translate Always Running Timer
> (ART) to system time
> 
> On 07/27/2015 05:46 PM, Christopher Hall wrote:
> > * art_to_mono64
> > * art_to_rawmono64
> > * art_to_realtime64
> >
> > Intel audio and PCH ethernet devices use the Always Running Timer
> (ART) to
> > relate their device clock to system time
> >
> > Signed-off-by: Christopher Hall <christopher.s.hall@intel.com>
> > ---
> >   arch/x86/Kconfig           |  12 ++++
> >   arch/x86/include/asm/art.h |  42 ++++++++++++++
> >   arch/x86/kernel/Makefile   |   1 +
> >   arch/x86/kernel/art.c      | 134
> +++++++++++++++++++++++++++++++++++++++++++++
> >   arch/x86/kernel/tsc.c      |   4 ++
> >   5 files changed, 193 insertions(+)
> >   create mode 100644 arch/x86/include/asm/art.h
> >   create mode 100644 arch/x86/kernel/art.c
> >
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index b3a1a5d..1ef9985 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -1175,6 +1175,18 @@ config X86_CPUID
> >   	  with major 203 and minors 0 to 31 for /dev/cpu/0/cpuid to
> >   	  /dev/cpu/31/cpuid.
> >
> > +config X86_ART
> > +	bool "Always Running Timer"
> > +	default y
> > +	depends on X86_TSC
> > +	---help---
> > +	  This option provides functionality to drivers and devices that
> use
> > +	  the always-running-timer (ART) to correlate their device clock
> > +	  counter with the system clock counter. The TSC is *exactly*
> related
> > +	  to the ART by a ratio m/n specified by CPUID leaf 0x15
> > +	  (n=EAX,m=EBX). If ART is unused or unavailable there isn't any
> > +	  performance impact. It's safe to say Y.
> > +
> 
> Is there a good reason to make this optional?

If there aren't any objections, it sound OK to me.  So no, I don't know
of any good reasons.

> 
> Also, is there *still* no way to ask the thing for its nominal
> frequnency?  Or can we expect CPUID leaf 16H to work on CPUs that
> support this and can we expect it to actually work?  

There isn't any way to query nominal frequency.  CPUID leaf 0x15 only
exposes the relationship between ART and TSC.  CPUID leaf 0x16 stays
the more or less the same and isn't related to ART.

The SDM says "The
> returned information should not be used for any other purpose as the
> returned information does not accurately correlate to information /
> counters returned by other processor interfaces."
> 
> Also, does this thing let us learn the real time base?  SDM 17.14.4
> suggests that the ART value isn't affected by "privileged software" (aka
> buggy/malicious firmware).  Or, alternatively, how do we learn the
> offset K between ART and scaled TSC?

ART isn't affected by software.  The determination of K used to convert ART to
TSC is in a footnote (2) in that section of the SDM.  I'm not going to risk
repeating it here and possibly altering its meaning.

> 
> >   choice
> >   	prompt "High Memory Support"
> >   	default HIGHMEM4G
> > diff --git a/arch/x86/include/asm/art.h b/arch/x86/include/asm/art.h
> > new file mode 100644
> > index 0000000..da58ce4
> > --- /dev/null
> > +++ b/arch/x86/include/asm/art.h
> > @@ -0,0 +1,42 @@
> > +/*
> > + * x86 ART related functions
> > + */
> > +#ifndef _ASM_X86_ART_H
> > +#define _ASM_X86_ART_H
> > +
> > +#ifndef CONFIG_X86_ART
> > +
> > +static inline int setup_art(void)
> > +{
> > +	return 0;
> > +}
> > +
> > +static inline bool has_art(void)
> > +{
> > +	return false;
> > +}
> > +
> > +static inline int art_to_rawmono64(struct timespec64 *rawmono,
> cycle_t art)
> > +{
> > +	return -ENXIO;
> > +}
> > +static inline int art_to_realtime64(struct timespec64 *realtime,
> cycle_t art)
> > +{
> > +	return -ENXIO;
> > +}
> > +static inline int art_to_mono64(struct timespec64 *mono, cycle_t art)
> > +{
> > +	return -ENXIO;
> > +}
> > +
> > +#else
> > +
> > +extern int setup_art(void);
> > +extern bool has_art(void);
> > +extern int art_to_rawmono64(struct timespec64 *rawmono, cycle_t art);
> > +extern int art_to_realtime64(struct timespec64 *realtime, cycle_t
> art);
> > +extern int art_to_mono64(struct timespec64 *mono, cycle_t art);
> > +
> > +#endif
> > +
> > +#endif/*_ASM_X86_ART_H*/
> > diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
> > index 0f15af4..0908311 100644
> > --- a/arch/x86/kernel/Makefile
> > +++ b/arch/x86/kernel/Makefile
> > @@ -109,6 +109,7 @@ obj-$(CONFIG_PERF_EVENTS)		+=
> perf_regs.o
> >   obj-$(CONFIG_TRACING)			+= tracepoint.o
> >   obj-$(CONFIG_IOSF_MBI)			+= iosf_mbi.o
> >   obj-$(CONFIG_PMC_ATOM)			+= pmc_atom.o
> > +obj-$(CONFIG_X86_ART)			+= art.o
> >
> >   ###
> >   # 64 bit specific files
> > diff --git a/arch/x86/kernel/art.c b/arch/x86/kernel/art.c
> > new file mode 100644
> > index 0000000..1906cf0
> > --- /dev/null
> > +++ b/arch/x86/kernel/art.c
> > @@ -0,0 +1,134 @@
> > +#include <asm/tsc.h>
> > +#include <asm/cpufeature.h>
> > +#include <asm/processor.h>
> > +#include <linux/spinlock.h>
> > +#include <linux/seqlock.h>
> > +
> > +#define CPUID_ART_LEAF 0x15
> > +
> > +static bool art_present;
> > +
> > +static struct art_state {
> > +	seqcount_t seq;
> > +	u32 art_ratio_numerator;
> > +	u32 art_ratio_denominator;
> > +
> 
> This needs much better comments, IMO, including some discussion of how
> the locking works.

I'll work on the comments for version 2.

> 
> > +	cycle_t prev_art;
> > +	cycle_t prev_tsc_corr_art; /*This is the TSC value corresponding
> to
> > +				     prev_art */
> > +	u32 tsc_remainder;
> > +} art_state ____cacheline_aligned;
> > +
> > +static DEFINE_RAW_SPINLOCK(art_lock);
> > +
> > +#define MIN_DENOMINATOR 2
> > +int setup_art(void)
> > +{
> > +	if (boot_cpu_data.cpuid_level < CPUID_ART_LEAF)
> > +		return 0;
> > +	art_state.art_ratio_denominator = cpuid_eax(CPUID_ART_LEAF);
> > +	if (art_state.art_ratio_denominator < MIN_DENOMINATOR)
> > +		return 0;
> > +	art_state.art_ratio_numerator = cpuid_ebx(CPUID_ART_LEAF);
> > +
> > +	art_present = true;
> > +	return 0;
> > +}
> > +
> > +static bool has_art(void)
> > +{
> > +	return art_present;
> > +}
> > +EXPORT_SYMBOL(has_art);
> 
> IMO this should be a pseudo-feature X86_FEATURE_ART.

Good idea.

> 
> > +
> > +#define ROLLOVER_THRESHOLD (2ULL << 23)
> > +
> > +static u32 art_scale(struct art_state *art_state, cycle_t *art)
> > +{
> > +	u32 rem;
> > +
> > +	*art *= art_state->art_ratio_numerator;
> 
> This seems very likely to overflow.

I don't think it would.  The art (u64) value is a delta:

tsc_next = art - art_state.prev_art;
rem_next = art_scale(&art_state, &tsc_next);

The time to overflow would depend upon the TSC frequency and m, n
values, but it should handle 10+ years of delta.

I think you should be using the
> equivalent of mul_u128_u64_shr (which probably doesn't exist, but
> mul_u64_u32_shr does) or just open-code it using __uint128_t if this
> thing is x86_64-only.
> 
> > +static cycle_t art_to_tsc(cycle_t art)
> > +{
> 
> This function needs some kind of description of what it does.

More descriptive comments for v2.

> 
> Also, I don't think it's okay to have a pure-sounding thing like
> art_to_xyz actually be stateful and rely on a current value being
> passed.

I guess I see your point, but I'm not sure how to change it.  Would
a more descriptive verb (e.g. "convert") address this?

> 
> Also, how does this code account for the unspecified ART-to-TSC offset?
>   I don't see it in the code on a quick reading.

It's not there.  The assumption is that the TSC adjust registers is 0.

Thanks for your comments.

Chris

> 
> --Andy

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PATCH 1/5] Add functions producing system time given a backing counter value
  2015-07-28  3:44   ` John Stultz
  2015-07-28  4:05     ` John Stultz
@ 2015-07-29  1:41     ` Hall, Christopher S
  2015-07-29 14:05       ` Thomas Gleixner
  1 sibling, 1 reply; 24+ messages in thread
From: Hall, Christopher S @ 2015-07-29  1:41 UTC (permalink / raw)
  To: 'John Stultz'
  Cc: Thomas Gleixner, Richard Cochran, Ingo Molnar, Kirsher,
	Jeffrey T, Ronciak, John, H. Peter Anvin, x86, lkml, netdev


> -----Original Message-----
> From: John Stultz [mailto:john.stultz@linaro.org]
> Sent: Monday, July 27, 2015 8:44 PM
> To: Hall, Christopher S
> Cc: Thomas Gleixner; Richard Cochran; Ingo Molnar; Kirsher, Jeffrey T;
> Ronciak, John; H. Peter Anvin; x86@kernel.org; lkml;
> netdev@vger.kernel.org
> Subject: Re: [PATCH 1/5] Add functions producing system time given a
> backing counter value
> 
> On Mon, Jul 27, 2015 at 5:46 PM, Christopher Hall
> <christopher.s.hall@intel.com> wrote:
> > * counter_to_rawmono64
> > * counter_to_mono64
> > * counter_to_realtime64
> >
> > Enables drivers to translate a captured system clock counter to system
> > time. This is useful for network and audio devices that capture
> timestamps
> > in terms of both the system clock and device clock.
> 
> Huh.  So for counter_to_realtime64 & mono64, this seems to ignore the
> fact that the multiplier is constantly adjusted and corrected. So that
> calling the function twice with the same counter value may result in
> different returned values.
> 
> I've not yet groked the whole patchset, but it seems like there needs
> to be some mechanism that ensures the counter value is captured and
> used in the same (or at least close) interval that the timekeeper data
> is valid for.

The ART (and derived TSC) values are always in the past.  There's no
chance that we could exceed the interval.  I don't think any similar
usage would be a problem either.

Are you suggesting that, for completeness, this be enforced by the
conversion function?

I do a check here to make sure that the current counter value isn't before
the beginning of the current interval:

timekeeping_get_delta()
...
               if (cycle_now < tkr->cycle_last &&
                   tkr->cycle_last - cycle_now < ROLLOVER_THRESHOLD)
                        return -EAGAIN;

If tkr->cycle_last - cycle_now is large, the assumption is that
rollover occurred.  Otherwise, the caller should re-read the counter
so that it falls within the current interval.  In my "normal use"
testing, re-read never occurred.

Thanks for your input.

Chris

> 
> thanks
> -john

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/5] Add calls to translate Always Running Timer (ART) to system time
  2015-07-29  1:18     ` Hall, Christopher S
@ 2015-07-29  1:47       ` Andy Lutomirski
  0 siblings, 0 replies; 24+ messages in thread
From: Andy Lutomirski @ 2015-07-29  1:47 UTC (permalink / raw)
  To: Hall, Christopher S
  Cc: Andy Lutomirski, john.stultz, tglx, richardcochran, mingo,
	Kirsher, Jeffrey T, Ronciak, John, hpa, x86, linux-kernel,
	netdev, Borislav Petkov

On Tue, Jul 28, 2015 at 6:18 PM, Hall, Christopher S
<christopher.s.hall@intel.com> wrote:
>
>
>> -----Original Message-----
>> From: Andy Lutomirski [mailto:luto@kernel.org]
>> Sent: Monday, July 27, 2015 6:32 PM
>> To: Hall, Christopher S; john.stultz@linaro.org; tglx@linutronix.de;
>> richardcochran@gmail.com; mingo@redhat.com; Kirsher, Jeffrey T; Ronciak,
>> John; hpa@zytor.com; x86@kernel.org
>> Cc: linux-kernel@vger.kernel.org; netdev@vger.kernel.org; Borislav
>> Petkov
>> Subject: Re: [PATCH 3/5] Add calls to translate Always Running Timer
>> (ART) to system time
>>
>> On 07/27/2015 05:46 PM, Christopher Hall wrote:
>> > * art_to_mono64
>> > * art_to_rawmono64
>> > * art_to_realtime64
>> >
>> > Intel audio and PCH ethernet devices use the Always Running Timer
>> (ART) to
>> > relate their device clock to system time
>> >
>> > Signed-off-by: Christopher Hall <christopher.s.hall@intel.com>
>> > ---
>> >   arch/x86/Kconfig           |  12 ++++
>> >   arch/x86/include/asm/art.h |  42 ++++++++++++++
>> >   arch/x86/kernel/Makefile   |   1 +
>> >   arch/x86/kernel/art.c      | 134
>> +++++++++++++++++++++++++++++++++++++++++++++
>> >   arch/x86/kernel/tsc.c      |   4 ++
>> >   5 files changed, 193 insertions(+)
>> >   create mode 100644 arch/x86/include/asm/art.h
>> >   create mode 100644 arch/x86/kernel/art.c
>> >
>> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>> > index b3a1a5d..1ef9985 100644
>> > --- a/arch/x86/Kconfig
>> > +++ b/arch/x86/Kconfig
>> > @@ -1175,6 +1175,18 @@ config X86_CPUID
>> >       with major 203 and minors 0 to 31 for /dev/cpu/0/cpuid to
>> >       /dev/cpu/31/cpuid.
>> >
>> > +config X86_ART
>> > +   bool "Always Running Timer"
>> > +   default y
>> > +   depends on X86_TSC
>> > +   ---help---
>> > +     This option provides functionality to drivers and devices that
>> use
>> > +     the always-running-timer (ART) to correlate their device clock
>> > +     counter with the system clock counter. The TSC is *exactly*
>> related
>> > +     to the ART by a ratio m/n specified by CPUID leaf 0x15
>> > +     (n=EAX,m=EBX). If ART is unused or unavailable there isn't any
>> > +     performance impact. It's safe to say Y.
>> > +
>>
>> Is there a good reason to make this optional?
>
> If there aren't any objections, it sound OK to me.  So no, I don't know
> of any good reasons.
>
>>
>> Also, is there *still* no way to ask the thing for its nominal
>> frequnency?  Or can we expect CPUID leaf 16H to work on CPUs that
>> support this and can we expect it to actually work?
>
> There isn't any way to query nominal frequency.  CPUID leaf 0x15 only
> exposes the relationship between ART and TSC.  CPUID leaf 0x16 stays
> the more or less the same and isn't related to ART.
>
> The SDM says "The
>> returned information should not be used for any other purpose as the
>> returned information does not accurately correlate to information /
>> counters returned by other processor interfaces."
>>
>> Also, does this thing let us learn the real time base?  SDM 17.14.4
>> suggests that the ART value isn't affected by "privileged software" (aka
>> buggy/malicious firmware).  Or, alternatively, how do we learn the
>> offset K between ART and scaled TSC?
>
> ART isn't affected by software.  The determination of K used to convert ART to
> TSC is in a footnote (2) in that section of the SDM.  I'm not going to risk
> repeating it here and possibly altering its meaning.
>
>>
>> >   choice
>> >     prompt "High Memory Support"
>> >     default HIGHMEM4G
>> > diff --git a/arch/x86/include/asm/art.h b/arch/x86/include/asm/art.h
>> > new file mode 100644
>> > index 0000000..da58ce4
>> > --- /dev/null
>> > +++ b/arch/x86/include/asm/art.h
>> > @@ -0,0 +1,42 @@
>> > +/*
>> > + * x86 ART related functions
>> > + */
>> > +#ifndef _ASM_X86_ART_H
>> > +#define _ASM_X86_ART_H
>> > +
>> > +#ifndef CONFIG_X86_ART
>> > +
>> > +static inline int setup_art(void)
>> > +{
>> > +   return 0;
>> > +}
>> > +
>> > +static inline bool has_art(void)
>> > +{
>> > +   return false;
>> > +}
>> > +
>> > +static inline int art_to_rawmono64(struct timespec64 *rawmono,
>> cycle_t art)
>> > +{
>> > +   return -ENXIO;
>> > +}
>> > +static inline int art_to_realtime64(struct timespec64 *realtime,
>> cycle_t art)
>> > +{
>> > +   return -ENXIO;
>> > +}
>> > +static inline int art_to_mono64(struct timespec64 *mono, cycle_t art)
>> > +{
>> > +   return -ENXIO;
>> > +}
>> > +
>> > +#else
>> > +
>> > +extern int setup_art(void);
>> > +extern bool has_art(void);
>> > +extern int art_to_rawmono64(struct timespec64 *rawmono, cycle_t art);
>> > +extern int art_to_realtime64(struct timespec64 *realtime, cycle_t
>> art);
>> > +extern int art_to_mono64(struct timespec64 *mono, cycle_t art);
>> > +
>> > +#endif
>> > +
>> > +#endif/*_ASM_X86_ART_H*/
>> > diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
>> > index 0f15af4..0908311 100644
>> > --- a/arch/x86/kernel/Makefile
>> > +++ b/arch/x86/kernel/Makefile
>> > @@ -109,6 +109,7 @@ obj-$(CONFIG_PERF_EVENTS)               +=
>> perf_regs.o
>> >   obj-$(CONFIG_TRACING)                     += tracepoint.o
>> >   obj-$(CONFIG_IOSF_MBI)                    += iosf_mbi.o
>> >   obj-$(CONFIG_PMC_ATOM)                    += pmc_atom.o
>> > +obj-$(CONFIG_X86_ART)                      += art.o
>> >
>> >   ###
>> >   # 64 bit specific files
>> > diff --git a/arch/x86/kernel/art.c b/arch/x86/kernel/art.c
>> > new file mode 100644
>> > index 0000000..1906cf0
>> > --- /dev/null
>> > +++ b/arch/x86/kernel/art.c
>> > @@ -0,0 +1,134 @@
>> > +#include <asm/tsc.h>
>> > +#include <asm/cpufeature.h>
>> > +#include <asm/processor.h>
>> > +#include <linux/spinlock.h>
>> > +#include <linux/seqlock.h>
>> > +
>> > +#define CPUID_ART_LEAF 0x15
>> > +
>> > +static bool art_present;
>> > +
>> > +static struct art_state {
>> > +   seqcount_t seq;
>> > +   u32 art_ratio_numerator;
>> > +   u32 art_ratio_denominator;
>> > +
>>
>> This needs much better comments, IMO, including some discussion of how
>> the locking works.
>
> I'll work on the comments for version 2.
>
>>
>> > +   cycle_t prev_art;
>> > +   cycle_t prev_tsc_corr_art; /*This is the TSC value corresponding
>> to
>> > +                                prev_art */
>> > +   u32 tsc_remainder;
>> > +} art_state ____cacheline_aligned;
>> > +
>> > +static DEFINE_RAW_SPINLOCK(art_lock);
>> > +
>> > +#define MIN_DENOMINATOR 2
>> > +int setup_art(void)
>> > +{
>> > +   if (boot_cpu_data.cpuid_level < CPUID_ART_LEAF)
>> > +           return 0;
>> > +   art_state.art_ratio_denominator = cpuid_eax(CPUID_ART_LEAF);
>> > +   if (art_state.art_ratio_denominator < MIN_DENOMINATOR)
>> > +           return 0;
>> > +   art_state.art_ratio_numerator = cpuid_ebx(CPUID_ART_LEAF);
>> > +
>> > +   art_present = true;
>> > +   return 0;
>> > +}
>> > +
>> > +static bool has_art(void)
>> > +{
>> > +   return art_present;
>> > +}
>> > +EXPORT_SYMBOL(has_art);
>>
>> IMO this should be a pseudo-feature X86_FEATURE_ART.
>
> Good idea.
>
>>
>> > +
>> > +#define ROLLOVER_THRESHOLD (2ULL << 23)
>> > +
>> > +static u32 art_scale(struct art_state *art_state, cycle_t *art)
>> > +{
>> > +   u32 rem;
>> > +
>> > +   *art *= art_state->art_ratio_numerator;
>>
>> This seems very likely to overflow.
>
> I don't think it would.  The art (u64) value is a delta:
>
> tsc_next = art - art_state.prev_art;
> rem_next = art_scale(&art_state, &tsc_next);
>
> The time to overflow would depend upon the TSC frequency and m, n
> values, but it should handle 10+ years of delta.

Since I don't work at Intel and I don't have access to prerelease
stuff, I'm just guessing.  But suppose that the numerator and
denominator are both around 2^31 and are similar to each other so the
ART counts at around TSC rate, i.e. ~2 GHz.  Then we overflow in 8
seconds, I think.

>
> I think you should be using the
>> equivalent of mul_u128_u64_shr (which probably doesn't exist, but
>> mul_u64_u32_shr does) or just open-code it using __uint128_t if this
>> thing is x86_64-only.
>>
>> > +static cycle_t art_to_tsc(cycle_t art)
>> > +{
>>
>> This function needs some kind of description of what it does.
>
> More descriptive comments for v2.
>
>>
>> Also, I don't think it's okay to have a pure-sounding thing like
>> art_to_xyz actually be stateful and rely on a current value being
>> passed.
>
> I guess I see your point, but I'm not sure how to change it.  Would
> a more descriptive verb (e.g. "convert") address this?

I can't really comment on what to call it since I still don't really
know what it does.  Maybe update_art_state?  Why is it stateful,
anyway?

>
>>
>> Also, how does this code account for the unspecified ART-to-TSC offset?
>>   I don't see it in the code on a quick reading.
>
> It's not there.  The assumption is that the TSC adjust registers is 0.

And the VMCS adjustment is zero, too?  That's not true in practice, I
think, so a passed-through PTM NIC won't work right.

FWIW, I have a shiny BIOS that intentionally changes my TSC offset on
boot.  I'm typing this email on that piece of crap right now.  Unless
the SDM says that this kind of tomfoolery is impossible, we should
assume that it's not only possible but that BIOS vendors will do it
out of sheer spite.

--Andy

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PATCH 3/5] Add calls to translate Always Running Timer (ART) to system time
  2015-07-28  4:11   ` John Stultz
@ 2015-07-29  2:05     ` Hall, Christopher S
  0 siblings, 0 replies; 24+ messages in thread
From: Hall, Christopher S @ 2015-07-29  2:05 UTC (permalink / raw)
  To: 'John Stultz'
  Cc: Thomas Gleixner, Richard Cochran, Ingo Molnar, Kirsher,
	Jeffrey T, Ronciak, John, H. Peter Anvin, x86, lkml, netdev



> -----Original Message-----
> From: John Stultz [mailto:john.stultz@linaro.org]
> Sent: Monday, July 27, 2015 9:11 PM
> To: Hall, Christopher S
> Cc: Thomas Gleixner; Richard Cochran; Ingo Molnar; Kirsher, Jeffrey T;
> Ronciak, John; H. Peter Anvin; x86@kernel.org; lkml;
> netdev@vger.kernel.org
> Subject: Re: [PATCH 3/5] Add calls to translate Always Running Timer
> (ART) to system time
> 
> On Mon, Jul 27, 2015 at 5:46 PM, Christopher Hall
> <christopher.s.hall@intel.com> wrote:
> > +static bool checked_art_to_tsc(cycle_t *tsc)
> > +{
> > +       if (!has_art())
> > +               return false;
> > +       *tsc = art_to_tsc(*tsc);
> > +       return true;
> > +}
> > +
> > +static int art_to_rawmono64(struct timespec64 *rawmono, cycle_t art)
> > +{
> > +       if (!checked_art_to_tsc(&art))
> > +               return -ENXIO;
> > +       return tsc_to_rawmono64(rawmono, art);
> > +}
> > +EXPORT_SYMBOL(art_to_rawmono64);
> 
> This all seems to assume the TSC is the current clocksource, which it
> may not be if the user has overridden it.

I don't make that assumption.  The counter_to_* functions take a
pointer to a clocksource struct.  They return -ENXIO if that clocksource
doesn’t match the current clocksource.

The tsc_to_* functions pass the tsc clocksource pointer to the counter_to_*
functions.  These tsc conversion functions are called by the art_to_*
functions.

> 
> If instead there were a counter_to_rawmono64() which took the counter
> value and maybe the name of the clocksource (if the strncmp is
> affordable for your use), it might be easier for the core to provide
> an error if the current timekeeping clocksource isn't the one the
> counter value is based on. This would also allow the tsc_to_*()
> midlayers to be dropped (since they don't seem to do much).
> 
> thanks
> -john

Again, thanks for your input.

Chris

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/5] Add calls to translate Always Running Timer (ART) to system time
  2015-07-28  0:46 ` [PATCH 3/5] Add calls to translate Always Running Timer (ART) " Christopher Hall
  2015-07-28  1:32   ` Andy Lutomirski
  2015-07-28  4:11   ` John Stultz
@ 2015-07-29  8:25   ` Paul Bolle
  2 siblings, 0 replies; 24+ messages in thread
From: Paul Bolle @ 2015-07-29  8:25 UTC (permalink / raw)
  To: Christopher Hall
  Cc: john.stultz, tglx, richardcochran, mingo, jeffrey.t.kirsher,
	john.ronciak, hpa, x86, linux-kernel, netdev

On ma, 2015-07-27 at 17:46 -0700, Christopher Hall wrote:
> --- /dev/null
> +++ b/arch/x86/include/asm/art.h

> +#ifndef CONFIG_X86_ART
> +
> +static inline int setup_art(void)
> +{
> +	return 0;
> +}
> +
> +static inline bool has_art(void)
> +{
> +	return false;
> +}
> +
> +static inline int art_to_rawmono64(struct timespec64 *rawmono, cycle_t art)
> +{
> +	return -ENXIO;
> +}
> +static inline int art_to_realtime64(struct timespec64 *realtime, cycle_t art)
> +{
> +	return -ENXIO;
> +}
> +static inline int art_to_mono64(struct timespec64 *mono, cycle_t art)
> +{
> +	return -ENXIO;
> +}
> +
> +#else
> +
> +extern int setup_art(void);
> +extern bool has_art(void);
> +extern int art_to_rawmono64(struct timespec64 *rawmono, cycle_t art);
> +extern int art_to_realtime64(struct timespec64 *realtime, cycle_t art);
> +extern int art_to_mono64(struct timespec64 *mono, cycle_t art);
> +
> +#endif

> --- /dev/null
> +++ b/arch/x86/kernel/art.c

> +static bool has_art(void)
> +{
> +	return art_present;
> +}
> +EXPORT_SYMBOL(has_art);

This exports a static function. Does that work?

> +static int art_to_rawmono64(struct timespec64 *rawmono, cycle_t art)
> +{
> +	if (!checked_art_to_tsc(&art))
> +		return -ENXIO;
> +	return tsc_to_rawmono64(rawmono, art);
> +}
> +EXPORT_SYMBOL(art_to_rawmono64);
> +
> +static int art_to_realtime64(struct timespec64 *realtime, cycle_t art)
> +{
> +	if (!checked_art_to_tsc(&art))
> +		return -ENXIO;
> +	return tsc_to_realtime64(realtime, art);
> +}
> +EXPORT_SYMBOL(art_to_realtime64);
> +
> +static int art_to_mono64(struct timespec64 *mono, cycle_t art)
> +{
> +	if (!checked_art_to_tsc(&art))
> +		return -ENXIO;
> +	return tsc_to_mono64(mono, art);
> +}
> +EXPORT_SYMBOL(art_to_mono64);

Ditto (three times).

By the way, this series doesn't add users for art_to_mono64() and
art_to_mono64(), right?

Thanks,


Paul Bolle

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/5] Add functions producing system time given a backing counter value
  2015-07-28  4:05     ` John Stultz
@ 2015-07-29 10:19       ` Thomas Gleixner
  2015-07-29 10:23         ` Thomas Gleixner
  2015-07-29 10:49         ` Peter Zijlstra
  0 siblings, 2 replies; 24+ messages in thread
From: Thomas Gleixner @ 2015-07-29 10:19 UTC (permalink / raw)
  To: John Stultz
  Cc: Christopher Hall, Richard Cochran, Ingo Molnar, Jeff Kirsher,
	john.ronciak, H. Peter Anvin, x86, lkml, netdev, Peter Zijlstra

On Mon, 27 Jul 2015, John Stultz wrote:
> On Mon, Jul 27, 2015 at 8:44 PM, John Stultz <john.stultz@linaro.org> wrote:
> > On Mon, Jul 27, 2015 at 5:46 PM, Christopher Hall
> > <christopher.s.hall@intel.com> wrote:
> >> * counter_to_rawmono64
> >> * counter_to_mono64
> >> * counter_to_realtime64
> >>
> >> Enables drivers to translate a captured system clock counter to system
> >> time. This is useful for network and audio devices that capture timestamps
> >> in terms of both the system clock and device clock.
> >
> > Huh.  So for counter_to_realtime64 & mono64, this seems to ignore the
> > fact that the multiplier is constantly adjusted and corrected. So that
> > calling the function twice with the same counter value may result in
> > different returned values.
> >
> > I've not yet groked the whole patchset, but it seems like there needs
> > to be some mechanism that ensures the counter value is captured and
> > used in the same (or at least close) interval that the timekeeper data
> > is valid for.
> 
> 
> So reading through. It looks like you only use art_to_realtime(), right?
> 
> So again, since CLOCK_MONOTONIC and CLOCK_REALTIME are constaly being
> frequency adjusted, it might be best to construct this delta in the
> following way.
> 
> 
> Add counter_to_rawmono64(), which should be able to safely calculate
> the corresponding CLOCK_MONOTONIC_RAW time from any given cycle value.
> 
> Use getnstime_raw_and_real() to get a immediate snapshot of current
> MONOTONIC_RAW  and REALTIME clocks.
> 
> Then calculate the delta between the snapshotted counter raw time, and
> the current raw time.  Then apply that offset to the current realtime.
> 
> The larger the raw-time delta, the larger the possible realtime error.
> But I think that will be as good as it gets.

I think that's still not the right approach. The whole purpose of this
is to get a precise snapshot of 

   - PTP time from the ETH device
and
   - current system time

Right now this does

      ktime_get_real();
      read_ptp_time();

which is obviously not precise.

The new hardware allows you to latch PTP time and ART time atomically
in the ETH device and read them out.

ART is the base clock of the TSC where

    TSC = K + (ART * n) / d;

So for this to work proper, we need a function which converts ART to
TSC. This is obviously x86/TSC specific code.

Now on the PTP side we need a callback provided by the device driver
to get the snapshot of the PTP and the ART.

So the proposed implementation merily calls that callback from the PTP
ioctl and then tries to do a magic CLOCK_REALTIME conversion of the
ART value. But that's just wrong as it does not guarantee a proper
correlation to the core timekeeping.

So what we really need is a function in the timekeeper core which gets
the PTP/ART timestamps from the device under the timekeeper sequence
counter and converts to clock realtime and raw monotonic.

That function is then called from the PTP ioctl.

Anything else is just 'lets hope it works and is precise enough'
voodoo.

Something like the below untested patch should be all we need for PTP
to be as precise as possible.

I don't know whether we need functionality to convert arbitrary
timestamps at all, but if we really need them then they are going to
be pretty simple and explicitely not precise for anything else than
clock monotonic raw. But that's a different story.

Lets concentrate on PTP first and talk about the other stuff once we
settled the use case which actually has a precision requirement.

Thanks,

	tglx

----------------------------------------->
Subject: ptp: Get sync timestamps
From: Thomas Gleixner <tglx@linutronix.de>
Date: Wed, 29 Jul 2015 10:52:06 +0200

The ART stuff wants to be splitted out.

Not-Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/tsc.c       |   27 ++++++++++++++++++
 include/linux/clocksource.h |   30 ++++++++++++++++++++
 include/linux/timekeeping.h |    4 ++
 kernel/time/timekeeping.c   |   63 ++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 124 insertions(+)

Index: linux/arch/x86/kernel/tsc.c
===================================================================
--- linux.orig/arch/x86/kernel/tsc.c
+++ linux/arch/x86/kernel/tsc.c
@@ -1059,6 +1059,23 @@ int unsynchronized_tsc(void)
 	return 0;
 }
 
+static u32 tsc_numerator;
+static u32 tsc_denominator;
+/*
+ * CHECKME: Do we need the adjust value? It should be 0, but if we run
+ * in a VM this might be a different story.
+ */
+static u64 tsc_adjust;
+
+static u64 art_to_tsc(u64 cycles)
+{
+	/* FIXME: This needs 128bit math to work proper */
+	return tsc_adjust + (cycles * tsc_numerator) / tsc_denominator;
+}
+
+struct correlated_cs art_timestamper = {
+	.convert	= art_to_tsc,
+};
 
 static void tsc_refine_calibration_work(struct work_struct *work);
 static DECLARE_DELAYED_WORK(tsc_irqwork, tsc_refine_calibration_work);
@@ -1129,6 +1146,16 @@ static void tsc_refine_calibration_work(
 		(unsigned long)tsc_khz / 1000,
 		(unsigned long)tsc_khz % 1000);
 
+	/*
+	 * TODO:
+	 *
+	 * If the system has ART, initialize the art_to_tsc conversion
+	 * and set: art_timestamp.related_cs = &tsc_clocksource.
+	 *
+	 * Before that point a call to get_correlated_timestamp will
+	 * fail the clocksource match check and return -ENODEV
+	 */
+
 out:
 	clocksource_register_khz(&clocksource_tsc, tsc_khz);
 }
Index: linux/include/linux/clocksource.h
===================================================================
--- linux.orig/include/linux/clocksource.h
+++ linux/include/linux/clocksource.h
@@ -258,4 +258,34 @@ void acpi_generic_timer_init(void);
 static inline void acpi_generic_timer_init(void) { }
 #endif
 
+/**
+ * struct correlated_cs - Descriptor for a clocksource correlated to another clocksource
+ * @related_cs:		Pointer to the related timekeeping clocksource
+ * @convert:		Conversion function to convert a timestamp from
+ *			the correlated clocksource to cycles of the related
+ *			timekeeping clocksource
+ */
+struct correlated_cs {
+	struct clocksource	*related_cs;
+	u64			(*convert)(u64 cycles);
+};
+
+struct correlated_ts;
+
+/**
+ * struct correlated_ts - Descriptor for taking a correlated time stamp
+ * @get_ts:		Function to read out a synced system and device
+ *			timestamp
+ * @system_ts:		The raw system clock timestamp
+ * @device_ts:		The raw device timestamp
+ * @system_real:	@system_ts converted to CLOCK_REALTIME
+ * @system_raw:		@system_ts converted to CLOCK_MONOTONIC_RAW
+ */
+struct correlated_ts {
+	int			(*get_ts)(struct correlated_ts *ts);
+	u64			system_ts;
+	u64			device_ts;
+	u64			system_real;
+	u64			system_raw;
+};
 #endif /* _LINUX_CLOCKSOURCE_H */
Index: linux/include/linux/timekeeping.h
===================================================================
--- linux.orig/include/linux/timekeeping.h
+++ linux/include/linux/timekeeping.h
@@ -258,6 +258,10 @@ extern void timekeeping_inject_sleeptime
  */
 extern void getnstime_raw_and_real(struct timespec *ts_raw,
 				   struct timespec *ts_real);
+struct correlated_ts;
+struct correlated_cs;
+extern int get_correlated_timestamp(struct correlated_ts *crt,
+				    struct correlated_cs *crs);
 
 /*
  * Persistent clock related interfaces
Index: linux/kernel/time/timekeeping.c
===================================================================
--- linux.orig/kernel/time/timekeeping.c
+++ linux/kernel/time/timekeeping.c
@@ -312,6 +312,19 @@ static inline s64 timekeeping_get_ns(str
 	return nsec + arch_gettimeoffset();
 }
 
+static inline s64 timekeeping_convert_to_ns(struct tk_read_base *tkr,
+					    cycle_t cycles)
+{
+	cycle_t delta;
+	s64 nsec;
+
+	/* calculate the delta since the last update_wall_time */
+	delta = clocksource_delta(cycles, tkr->cycle_last, tkr->mask);
+
+	nsec = delta * tkr->mult + tkr->xtime_nsec;
+	return nsec >> tkr->shift;
+}
+
 /**
  * update_fast_timekeeper - Update the fast and NMI safe monotonic timekeeper.
  * @tkr: Timekeeping readout base from which we take the update
@@ -885,6 +898,56 @@ EXPORT_SYMBOL(getnstime_raw_and_real);
 #endif /* CONFIG_NTP_PPS */
 
 /**
+ * get_correlated_timestamp - Get a correlated timestamp
+ *
+ * Reads a timestamp from a device and correlates it to system time
+ */
+int get_correlated_timestamp(struct correlated_ts *crt,
+			     struct correlated_cs *crs)
+{
+	struct timekeeper *tk = &tk_core.timekeeper;
+	unsigned long seq;
+	cycles_t cycles;
+	ktime_t base;
+	s64 nsecs;
+	int ret;
+
+	do {
+		seq = read_seqcount_begin(&tk_core.seq);
+		/*
+		 * Verify that the correlated clocksoure is related to
+		 * the currently installed timekeeper clocksoure
+		 */
+		if (tk->tkr_mono.clock != crs->related_cs)
+			return -ENODEV;
+
+		/*
+		 * Try to get a timestamp from the device.
+		 */
+		ret = crt->get_ts(crt);
+		if (ret)
+			return ret;
+
+		/*
+		 * Convert the timestamp to timekeeper clock cycles
+		 */
+		cycles = crs->convert(crs, crt->system_ts);
+
+		/* Convert to clock realtime */
+		base = ktime_add(tk->tkr_mono.base, tk_core.timekeeper.offs_real);
+		nsecs = timekeeping_convert_to_ns(&tk->tkr_mono, cycles);
+		crt->system_real = ktime_add_ns(base, nsecs);
+
+		/* Convert to clock raw monotonic */
+		base = tk->tkr_raw.base;
+		nsecs = timekeeping_convert_to_ns(&tk->tkr_raw, cycles);
+		crt->system_raw = ktime_add_ns(base, nsecs);
+
+	} while (read_seqcount_retry(&tk_core.seq, seq));
+	return 0;
+}
+
+/**
  * do_gettimeofday - Returns the time of day in a timeval
  * @tv:		pointer to the timeval to be set
  *

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/5] Add functions producing system time given a backing counter value
  2015-07-29 10:19       ` Thomas Gleixner
@ 2015-07-29 10:23         ` Thomas Gleixner
  2015-07-29 10:51           ` Peter Zijlstra
  2015-07-29 12:30           ` Richard Cochran
  2015-07-29 10:49         ` Peter Zijlstra
  1 sibling, 2 replies; 24+ messages in thread
From: Thomas Gleixner @ 2015-07-29 10:23 UTC (permalink / raw)
  To: John Stultz
  Cc: Christopher Hall, Richard Cochran, Ingo Molnar, Jeff Kirsher,
	john.ronciak, H. Peter Anvin, x86, lkml, netdev, Peter Zijlstra

On Wed, 29 Jul 2015, Thomas Gleixner wrote:
> On Mon, 27 Jul 2015, John Stultz wrote:
> > On Mon, Jul 27, 2015 at 8:44 PM, John Stultz <john.stultz@linaro.org> wrote:
> > > On Mon, Jul 27, 2015 at 5:46 PM, Christopher Hall
> > > <christopher.s.hall@intel.com> wrote:
> > >> * counter_to_rawmono64
> > >> * counter_to_mono64
> > >> * counter_to_realtime64
> > >>
> > >> Enables drivers to translate a captured system clock counter to system
> > >> time. This is useful for network and audio devices that capture timestamps
> > >> in terms of both the system clock and device clock.
> > >
> > > Huh.  So for counter_to_realtime64 & mono64, this seems to ignore the
> > > fact that the multiplier is constantly adjusted and corrected. So that
> > > calling the function twice with the same counter value may result in
> > > different returned values.
> > >
> > > I've not yet groked the whole patchset, but it seems like there needs
> > > to be some mechanism that ensures the counter value is captured and
> > > used in the same (or at least close) interval that the timekeeper data
> > > is valid for.
> > 
> > 
> > So reading through. It looks like you only use art_to_realtime(), right?
> > 
> > So again, since CLOCK_MONOTONIC and CLOCK_REALTIME are constaly being
> > frequency adjusted, it might be best to construct this delta in the
> > following way.
> > 
> > 
> > Add counter_to_rawmono64(), which should be able to safely calculate
> > the corresponding CLOCK_MONOTONIC_RAW time from any given cycle value.
> > 
> > Use getnstime_raw_and_real() to get a immediate snapshot of current
> > MONOTONIC_RAW  and REALTIME clocks.
> > 
> > Then calculate the delta between the snapshotted counter raw time, and
> > the current raw time.  Then apply that offset to the current realtime.
> > 
> > The larger the raw-time delta, the larger the possible realtime error.
> > But I think that will be as good as it gets.
> 
> I think that's still not the right approach. The whole purpose of this
> is to get a precise snapshot of 
> 
>    - PTP time from the ETH device
> and
>    - current system time
> 
> Right now this does
> 
>       ktime_get_real();
>       read_ptp_time();
> 
> which is obviously not precise.
> 
> The new hardware allows you to latch PTP time and ART time atomically
> in the ETH device and read them out.
> 
> ART is the base clock of the TSC where
> 
>     TSC = K + (ART * n) / d;
> 
> So for this to work proper, we need a function which converts ART to
> TSC. This is obviously x86/TSC specific code.
> 
> Now on the PTP side we need a callback provided by the device driver
> to get the snapshot of the PTP and the ART.
> 
> So the proposed implementation merily calls that callback from the PTP
> ioctl and then tries to do a magic CLOCK_REALTIME conversion of the
> ART value. But that's just wrong as it does not guarantee a proper
> correlation to the core timekeeping.
> 
> So what we really need is a function in the timekeeper core which gets
> the PTP/ART timestamps from the device under the timekeeper sequence
> counter and converts to clock realtime and raw monotonic.
> 
> That function is then called from the PTP ioctl.
> 
> Anything else is just 'lets hope it works and is precise enough'
> voodoo.
> 
> Something like the below untested patch should be all we need for PTP
> to be as precise as possible.
> 
> I don't know whether we need functionality to convert arbitrary
> timestamps at all, but if we really need them then they are going to
> be pretty simple and explicitely not precise for anything else than
> clock monotonic raw. But that's a different story.
> 
> Lets concentrate on PTP first and talk about the other stuff once we
> settled the use case which actually has a precision requirement.

Gah crap. Picked a stale version of the patch.
 
Thanks,

	tglx

----------------------------------------->
Subject: ptp: Get sync timestamps
From: Thomas Gleixner <tglx@linutronix.de>
Date: Wed, 29 Jul 2015 10:52:06 +0200

The ART stuff wants to be splitted out.

Not-Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/tsc.c       |   31 +++++++++++++++++++++
 include/linux/clocksource.h |   30 ++++++++++++++++++++
 include/linux/timekeeping.h |    4 ++
 kernel/time/timekeeping.c   |   63 ++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 128 insertions(+)

Index: linux/arch/x86/kernel/tsc.c
===================================================================
--- linux.orig/arch/x86/kernel/tsc.c
+++ linux/arch/x86/kernel/tsc.c
@@ -1059,6 +1059,27 @@ int unsynchronized_tsc(void)
 	return 0;
 }
 
+static u32 tsc_numerator;
+static u32 tsc_denominator;
+/*
+ * CHECKME: Do we need the adjust value? It should be 0, but if we run
+ * in a VM this might be a different story.
+ */
+static u64 tsc_adjust;
+
+static u64 art_to_tsc(u64 cycles)
+{
+	u64 tmp, res = tsc_adjust;
+
+	res += (cycles / tsc_denominator) * tsc_numerator;
+	tmp = (cycles % tsc_denominator) * tsc_numerator;
+	res += tmp / tsc_denominator;
+	return res;
+}
+
+struct correlated_cs art_timestamper = {
+	.convert	= art_to_tsc,
+};
 
 static void tsc_refine_calibration_work(struct work_struct *work);
 static DECLARE_DELAYED_WORK(tsc_irqwork, tsc_refine_calibration_work);
@@ -1129,6 +1150,16 @@ static void tsc_refine_calibration_work(
 		(unsigned long)tsc_khz / 1000,
 		(unsigned long)tsc_khz % 1000);
 
+	/*
+	 * TODO:
+	 *
+	 * If the system has ART, initialize the art_to_tsc conversion
+	 * and set: art_timestamp.related_cs = &tsc_clocksource.
+	 *
+	 * Before that point a call to get_correlated_timestamp will
+	 * fail the clocksource match check.
+	 */
+
 out:
 	clocksource_register_khz(&clocksource_tsc, tsc_khz);
 }
Index: linux/include/linux/clocksource.h
===================================================================
--- linux.orig/include/linux/clocksource.h
+++ linux/include/linux/clocksource.h
@@ -258,4 +258,34 @@ void acpi_generic_timer_init(void);
 static inline void acpi_generic_timer_init(void) { }
 #endif
 
+/**
+ * struct correlated_cs - Descriptor for a clocksource correlated to another clocksource
+ * @related_cs:		Pointer to the related timekeeping clocksource
+ * @convert:		Conversion function to convert a timestamp from
+ *			the correlated clocksource to cycles of the related
+ *			timekeeping clocksource
+ */
+struct correlated_cs {
+	struct clocksource	*related_cs;
+	u64			(*convert)(u64 cycles);
+};
+
+struct correlated_ts;
+
+/**
+ * struct correlated_ts - Descriptor for taking a correlated time stamp
+ * @get_ts:		Function to read out a synced system and device
+ *			timestamp
+ * @system_ts:		The raw system clock timestamp
+ * @device_ts:		The raw device timestamp
+ * @system_real:	@system_ts converted to CLOCK_REALTIME
+ * @system_raw:		@system_ts converted to CLOCK_MONOTONIC_RAW
+ */
+struct correlated_ts {
+	int			(*get_ts)(struct correlated_ts *ts);
+	u64			system_ts;
+	u64			device_ts;
+	u64			system_real;
+	u64			system_raw;
+};
 #endif /* _LINUX_CLOCKSOURCE_H */
Index: linux/include/linux/timekeeping.h
===================================================================
--- linux.orig/include/linux/timekeeping.h
+++ linux/include/linux/timekeeping.h
@@ -258,6 +258,10 @@ extern void timekeeping_inject_sleeptime
  */
 extern void getnstime_raw_and_real(struct timespec *ts_raw,
 				   struct timespec *ts_real);
+struct correlated_ts;
+struct correlated_cs;
+extern int get_correlated_timestamp(struct correlated_ts *crt,
+				    struct correlated_cs *crs);
 
 /*
  * Persistent clock related interfaces
Index: linux/kernel/time/timekeeping.c
===================================================================
--- linux.orig/kernel/time/timekeeping.c
+++ linux/kernel/time/timekeeping.c
@@ -312,6 +312,19 @@ static inline s64 timekeeping_get_ns(str
 	return nsec + arch_gettimeoffset();
 }
 
+static inline s64 timekeeping_convert_to_ns(struct tk_read_base *tkr,
+					    cycle_t cycles)
+{
+	cycle_t delta;
+	s64 nsec;
+
+	/* calculate the delta since the last update_wall_time */
+	delta = clocksource_delta(cycles, tkr->cycle_last, tkr->mask);
+
+	nsec = delta * tkr->mult + tkr->xtime_nsec;
+	return nsec >> tkr->shift;
+}
+
 /**
  * update_fast_timekeeper - Update the fast and NMI safe monotonic timekeeper.
  * @tkr: Timekeeping readout base from which we take the update
@@ -885,6 +898,56 @@ EXPORT_SYMBOL(getnstime_raw_and_real);
 #endif /* CONFIG_NTP_PPS */
 
 /**
+ * get_correlated_timestamp - Get a correlated timestamp
+ *
+ * Reads a timestamp from a device and correlates it to system time
+ */
+int get_correlated_timestamp(struct correlated_ts *crt,
+			     struct correlated_cs *crs)
+{
+	struct timekeeper *tk = &tk_core.timekeeper;
+	unsigned long seq;
+	cycles_t cycles;
+	ktime_t base;
+	s64 nsecs;
+	int ret;
+
+	do {
+		seq = read_seqcount_begin(&tk_core.seq);
+		/*
+		 * Verify that the correlated clocksoure is related to
+		 * the currently installed timekeeper clocksoure
+		 */
+		if (tk->tkr_mono.clock != crs->related_cs)
+			return -ENODEV;
+
+		/*
+		 * Try to get a timestamp from the device.
+		 */
+		ret = crt->get_ts(crt);
+		if (ret)
+			return ret;
+
+		/*
+		 * Convert the timestamp to timekeeper clock cycles
+		 */
+		cycles = crs->convert(crs, crt->system_ts);
+
+		/* Convert to clock realtime */
+		base = ktime_add(tk->tkr_mono.base, tk_core.timekeeper.offs_real);
+		nsecs = timekeeping_convert_to_ns(&tk->tkr_mono, cycles);
+		crt->system_real = ktime_add_ns(base, nsecs);
+
+		/* Convert to clock raw monotonic */
+		base = tk->tkr_raw.base;
+		nsecs = timekeeping_convert_to_ns(&tk->tkr_raw, cycles);
+		crt->system_raw = ktime_add_ns(base, nsecs);
+
+	} while (read_seqcount_retry(&tk_core.seq, seq));
+	return 0;
+}
+
+/**
  * do_gettimeofday - Returns the time of day in a timeval
  * @tv:		pointer to the timeval to be set
  *

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/5] Add functions producing system time given a backing counter value
  2015-07-29 10:19       ` Thomas Gleixner
  2015-07-29 10:23         ` Thomas Gleixner
@ 2015-07-29 10:49         ` Peter Zijlstra
  2015-07-29 11:48           ` Richard Cochran
  2015-07-29 12:52           ` Thomas Gleixner
  1 sibling, 2 replies; 24+ messages in thread
From: Peter Zijlstra @ 2015-07-29 10:49 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: John Stultz, Christopher Hall, Richard Cochran, Ingo Molnar,
	Jeff Kirsher, john.ronciak, H. Peter Anvin, x86, lkml, netdev

On Wed, Jul 29, 2015 at 12:19:06PM +0200, Thomas Gleixner wrote:
> I don't know whether we need functionality to convert arbitrary
> timestamps at all, but if we really need them then they are going to
> be pretty simple and explicitely not precise for anything else than
> clock monotonic raw. But that's a different story.

I think we need that too, and agreed, given NTP anything other than
MONO_RAW is going to be fuzzy at best.

> Index: linux/arch/x86/kernel/tsc.c
> ===================================================================
> --- linux.orig/arch/x86/kernel/tsc.c
> +++ linux/arch/x86/kernel/tsc.c
> @@ -1059,6 +1059,23 @@ int unsynchronized_tsc(void)
>  	return 0;
>  }
>  
> +static u32 tsc_numerator;
> +static u32 tsc_denominator;
> +/*
> + * CHECKME: Do we need the adjust value? It should be 0, but if we run
> + * in a VM this might be a different story.
> + */
> +static u64 tsc_adjust;
> +
> +static u64 art_to_tsc(u64 cycles)
> +{
> +	/* FIXME: This needs 128bit math to work proper */
> +	return tsc_adjust + (cycles * tsc_numerator) / tsc_denominator;

Yeah, its really unfortunate its given as a divisor and not a shift.
That said I think, at least on the initial hardware, its 2, so:

	return mul_u64_u32_shr(cycles, tsc_numerator, 1);

Should do, given that TSC_ADJUST had better be 0.

> +}



> + * get_correlated_timestamp - Get a correlated timestamp
> + *
> + * Reads a timestamp from a device and correlates it to system time
> + */
> +int get_correlated_timestamp(struct correlated_ts *crt,
> +			     struct correlated_cs *crs)
> +{
> +	struct timekeeper *tk = &tk_core.timekeeper;
> +	unsigned long seq;
> +	cycles_t cycles;
> +	ktime_t base;
> +	s64 nsecs;
> +	int ret;
> +
> +	do {
> +		seq = read_seqcount_begin(&tk_core.seq);
> +		/*
> +		 * Verify that the correlated clocksoure is related to
> +		 * the currently installed timekeeper clocksoure
> +		 */
> +		if (tk->tkr_mono.clock != crs->related_cs)
> +			return -ENODEV;
> +
> +		/*
> +		 * Try to get a timestamp from the device.
> +		 */
> +		ret = crt->get_ts(crt);
> +		if (ret)
> +			return ret;
> +
> +		/*
> +		 * Convert the timestamp to timekeeper clock cycles
> +		 */
> +		cycles = crs->convert(crs, crt->system_ts);
> +
> +		/* Convert to clock realtime */
> +		base = ktime_add(tk->tkr_mono.base, tk_core.timekeeper.offs_real);
> +		nsecs = timekeeping_convert_to_ns(&tk->tkr_mono, cycles);
> +		crt->system_real = ktime_add_ns(base, nsecs);
> +
> +		/* Convert to clock raw monotonic */
> +		base = tk->tkr_raw.base;
> +		nsecs = timekeeping_convert_to_ns(&tk->tkr_raw, cycles);
> +		crt->system_raw = ktime_add_ns(base, nsecs);
> +
> +	} while (read_seqcount_retry(&tk_core.seq, seq));
> +	return 0;
> +}

This is still fuzzy, right? The hardware ART timestamp could be from
_before_ the NTP adjust; or the NTP adjust could happen while we do this
conversion and we'll take the retry loop.

In both cases, the resulting value is computed using a different slope
than was in effect at the time of the stamp. With the end result being
slightly off from what it should be.

With the PTP case the ART timestamp is very recent, so the fuzz is
smallest, but its still there.

Any other ART users (buffered ETH frames) the delay will only get
bigger.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/5] Add functions producing system time given a backing counter value
  2015-07-29 10:23         ` Thomas Gleixner
@ 2015-07-29 10:51           ` Peter Zijlstra
  2015-07-29 12:30           ` Richard Cochran
  1 sibling, 0 replies; 24+ messages in thread
From: Peter Zijlstra @ 2015-07-29 10:51 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: John Stultz, Christopher Hall, Richard Cochran, Ingo Molnar,
	Jeff Kirsher, john.ronciak, H. Peter Anvin, x86, lkml, netdev

On Wed, Jul 29, 2015 at 12:23:27PM +0200, Thomas Gleixner wrote:
> +/*
> + * CHECKME: Do we need the adjust value? It should be 0, but if we run
> + * in a VM this might be a different story.
> + */
> +static u64 tsc_adjust;

Urgh, VMs have !0 values here?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/5] Add functions producing system time given a backing counter value
  2015-07-29 10:49         ` Peter Zijlstra
@ 2015-07-29 11:48           ` Richard Cochran
  2015-07-29 12:35             ` Peter Zijlstra
  2015-07-29 12:52           ` Thomas Gleixner
  1 sibling, 1 reply; 24+ messages in thread
From: Richard Cochran @ 2015-07-29 11:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, John Stultz, Christopher Hall, Ingo Molnar,
	Jeff Kirsher, john.ronciak, H. Peter Anvin, x86, lkml, netdev

On Wed, Jul 29, 2015 at 12:49:44PM +0200, Peter Zijlstra wrote:
> This is still fuzzy, right? The hardware ART timestamp could be from
> _before_ the NTP adjust; or the NTP adjust could happen while we do this
> conversion and we'll take the retry loop.

In the original series, yes.
 
> In both cases, the resulting value is computed using a different slope
> than was in effect at the time of the stamp. With the end result being
> slightly off from what it should be.

In Thomas' patch the get_ts() is meant to read fresh pairs of time
stamps from within the loop.


Thanks,
Richard

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/5] Add functions producing system time given a backing counter value
  2015-07-29 10:23         ` Thomas Gleixner
  2015-07-29 10:51           ` Peter Zijlstra
@ 2015-07-29 12:30           ` Richard Cochran
  1 sibling, 0 replies; 24+ messages in thread
From: Richard Cochran @ 2015-07-29 12:30 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: John Stultz, Christopher Hall, Ingo Molnar, Jeff Kirsher,
	john.ronciak, H. Peter Anvin, x86, lkml, netdev, Peter Zijlstra

On Wed, Jul 29, 2015 at 12:23:27PM +0200, Thomas Gleixner wrote:
> > Something like the below untested patch should be all we need for PTP
> > to be as precise as possible.

Yes, that is as good as it can be.  The code protects against
concurrent NTP adjustments, and the PTP driver will have to block
changes to its clock during the ioctl.

> > I don't know whether we need functionality to convert arbitrary
> > timestamps at all, but if we really need them then they are going to
> > be pretty simple and explicitely not precise for anything else than
> > clock monotonic raw. But that's a different story.
> > 
> > Lets concentrate on PTP first and talk about the other stuff once we
> > settled the use case which actually has a precision requirement.

The PTP ioctl only needs the REALTIME value, and so the MONO-RAW bit
could be dropped for now.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/5] Add functions producing system time given a backing counter value
  2015-07-29 11:48           ` Richard Cochran
@ 2015-07-29 12:35             ` Peter Zijlstra
  0 siblings, 0 replies; 24+ messages in thread
From: Peter Zijlstra @ 2015-07-29 12:35 UTC (permalink / raw)
  To: Richard Cochran
  Cc: Thomas Gleixner, John Stultz, Christopher Hall, Ingo Molnar,
	Jeff Kirsher, john.ronciak, H. Peter Anvin, x86, lkml, netdev

On Wed, Jul 29, 2015 at 01:48:41PM +0200, Richard Cochran wrote:
> On Wed, Jul 29, 2015 at 12:49:44PM +0200, Peter Zijlstra wrote:
> > This is still fuzzy, right? The hardware ART timestamp could be from
> > _before_ the NTP adjust; or the NTP adjust could happen while we do this
> > conversion and we'll take the retry loop.
> 
> In the original series, yes.
>  
> > In both cases, the resulting value is computed using a different slope
> > than was in effect at the time of the stamp. With the end result being
> > slightly off from what it should be.
> 
> In Thomas' patch the get_ts() is meant to read fresh pairs of time
> stamps from within the loop.

Ah, indeed, I missed that. Yes if that's possible we get it right.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/5] Add functions producing system time given a backing counter value
  2015-07-29 10:49         ` Peter Zijlstra
  2015-07-29 11:48           ` Richard Cochran
@ 2015-07-29 12:52           ` Thomas Gleixner
  1 sibling, 0 replies; 24+ messages in thread
From: Thomas Gleixner @ 2015-07-29 12:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: John Stultz, Christopher Hall, Richard Cochran, Ingo Molnar,
	Jeff Kirsher, john.ronciak, H. Peter Anvin, x86, lkml, netdev

On Wed, 29 Jul 2015, Peter Zijlstra wrote:
> On Wed, Jul 29, 2015 at 12:19:06PM +0200, Thomas Gleixner wrote:
> > I don't know whether we need functionality to convert arbitrary
> > timestamps at all, but if we really need them then they are going to
> > be pretty simple and explicitely not precise for anything else than
> > clock monotonic raw. But that's a different story.
> 
> I think we need that too, and agreed, given NTP anything other than
> MONO_RAW is going to be fuzzy at best.

Yes, but that's a trivial case.
 
> > +static u64 art_to_tsc(u64 cycles)
> > +{
> > +	/* FIXME: This needs 128bit math to work proper */
> > +	return tsc_adjust + (cycles * tsc_numerator) / tsc_denominator;
> 
> Yeah, its really unfortunate its given as a divisor and not a shift.
> That said I think, at least on the initial hardware, its 2, so:
> 
> 	return mul_u64_u32_shr(cycles, tsc_numerator, 1);
> 
> Should do, given that TSC_ADJUST had better be 0.

I don't trust BIOS folks :)

+       u64 tmp, res = tsc_adjust;
+
+       res += (cycles / tsc_denominator) * tsc_numerator;
+       tmp = (cycles % tsc_denominator) * tsc_numerator;
+       res += tmp / tsc_denominator;
+       return res;

That's what I have in my final patch.
 
> > +	do {
> > +		seq = read_seqcount_begin(&tk_core.seq);
> > +		/*
> > +		 * Verify that the correlated clocksoure is related to
> > +		 * the currently installed timekeeper clocksoure
> > +		 */
> > +		if (tk->tkr_mono.clock != crs->related_cs)
> > +			return -ENODEV;
> > +
> > +		/*
> > +		 * Try to get a timestamp from the device.
> > +		 */
> > +		ret = crt->get_ts(crt);
> > +		if (ret)
> > +			return ret;
> > +
> > +		/*
> > +		 * Convert the timestamp to timekeeper clock cycles
> > +		 */
> > +		cycles = crs->convert(crs, crt->system_ts);
> > +
> > +		/* Convert to clock realtime */
> > +		base = ktime_add(tk->tkr_mono.base, tk_core.timekeeper.offs_real);
> > +		nsecs = timekeeping_convert_to_ns(&tk->tkr_mono, cycles);
> > +		crt->system_real = ktime_add_ns(base, nsecs);
> > +
> > +		/* Convert to clock raw monotonic */
> > +		base = tk->tkr_raw.base;
> > +		nsecs = timekeeping_convert_to_ns(&tk->tkr_raw, cycles);
> > +		crt->system_raw = ktime_add_ns(base, nsecs);
> > +
> > +	} while (read_seqcount_retry(&tk_core.seq, seq));
> > +	return 0;
> > +}
> 
> This is still fuzzy, right? The hardware ART timestamp could be from
> _before_ the NTP adjust; or the NTP adjust could happen while we do this
> conversion and we'll take the retry loop.

I read the timestamp pair in the loop, so its always consistent.
 
> Any other ART users (buffered ETH frames) the delay will only get
> bigger.

That's a different story and we really can only convert this to
monotonic raw.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PATCH 1/5] Add functions producing system time given a backing counter value
  2015-07-29  1:41     ` Hall, Christopher S
@ 2015-07-29 14:05       ` Thomas Gleixner
  0 siblings, 0 replies; 24+ messages in thread
From: Thomas Gleixner @ 2015-07-29 14:05 UTC (permalink / raw)
  To: Hall, Christopher S
  Cc: 'John Stultz',
	Richard Cochran, Ingo Molnar, Kirsher, Jeffrey T, Ronciak, John,
	H. Peter Anvin, x86, lkml, netdev

On Wed, 29 Jul 2015, Hall, Christopher S wrote:
> 
> > -----Original Message-----
> > From: John Stultz [mailto:john.stultz@linaro.org]
> > Sent: Monday, July 27, 2015 8:44 PM
> > To: Hall, Christopher S
> > Cc: Thomas Gleixner; Richard Cochran; Ingo Molnar; Kirsher, Jeffrey T;
> > Ronciak, John; H. Peter Anvin; x86@kernel.org; lkml;
> > netdev@vger.kernel.org
> > Subject: Re: [PATCH 1/5] Add functions producing system time given a
> > backing counter value
> > 
> > On Mon, Jul 27, 2015 at 5:46 PM, Christopher Hall
> > <christopher.s.hall@intel.com> wrote:
> > > * counter_to_rawmono64
> > > * counter_to_mono64
> > > * counter_to_realtime64
> > >
> > > Enables drivers to translate a captured system clock counter to system
> > > time. This is useful for network and audio devices that capture
> > timestamps
> > > in terms of both the system clock and device clock.
> > 
> > Huh.  So for counter_to_realtime64 & mono64, this seems to ignore the
> > fact that the multiplier is constantly adjusted and corrected. So that
> > calling the function twice with the same counter value may result in
> > different returned values.
> > 
> > I've not yet groked the whole patchset, but it seems like there needs
> > to be some mechanism that ensures the counter value is captured and
> > used in the same (or at least close) interval that the timekeeper data
> > is valid for.
> 
> The ART (and derived TSC) values are always in the past.  There's no
> chance that we could exceed the interval.  I don't think any similar
> usage would be a problem either.
> 
> Are you suggesting that, for completeness, this be enforced by the
> conversion function?
> 
> I do a check here to make sure that the current counter value isn't before
> the beginning of the current interval:
> 
> timekeeping_get_delta()
> ...
>                if (cycle_now < tkr->cycle_last &&
>                    tkr->cycle_last - cycle_now < ROLLOVER_THRESHOLD)
>                         return -EAGAIN;
> 
> If tkr->cycle_last - cycle_now is large, the assumption is that
> rollover occurred.  Otherwise, the caller should re-read the counter
> so that it falls within the current interval.  In my "normal use"
> testing, re-read never occurred.

Sure that never happens, because your rollover value is 2 << 39 for
whatever reasons.

So on a 1GHz machine that is (2 << 39) / 1e9 ~= 1099.51 seconds.

Oh well.

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2015-07-29 14:05 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-28  0:46 [PATCH 0/5] Patchset enabling hardware based cross-timestamps for next gen Intel platforms Christopher Hall
2015-07-28  0:46 ` [PATCH 1/5] Add functions producing system time given a backing counter value Christopher Hall
2015-07-28  3:44   ` John Stultz
2015-07-28  4:05     ` John Stultz
2015-07-29 10:19       ` Thomas Gleixner
2015-07-29 10:23         ` Thomas Gleixner
2015-07-29 10:51           ` Peter Zijlstra
2015-07-29 12:30           ` Richard Cochran
2015-07-29 10:49         ` Peter Zijlstra
2015-07-29 11:48           ` Richard Cochran
2015-07-29 12:35             ` Peter Zijlstra
2015-07-29 12:52           ` Thomas Gleixner
2015-07-29  1:41     ` Hall, Christopher S
2015-07-29 14:05       ` Thomas Gleixner
2015-07-28  0:46 ` [PATCH 2/5] Added functions mapping TSC value to system time Christopher Hall
2015-07-28  0:46 ` [PATCH 3/5] Add calls to translate Always Running Timer (ART) " Christopher Hall
2015-07-28  1:32   ` Andy Lutomirski
2015-07-29  1:18     ` Hall, Christopher S
2015-07-29  1:47       ` Andy Lutomirski
2015-07-28  4:11   ` John Stultz
2015-07-29  2:05     ` Hall, Christopher S
2015-07-29  8:25   ` Paul Bolle
2015-07-28  0:46 ` [PATCH 4/5] Add support for driver cross-timestamp to PTP_SYS_OFFSET ioctl Christopher Hall
2015-07-28  0:46 ` [PATCH 5/5] Enables cross timestamping in the e1000e driver Christopher Hall

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).