linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH/RFC 10/10] example of simple continuous gettimeofday
@ 2005-12-21 23:30 Roman Zippel
  2005-12-22  2:43 ` john stultz
  2005-12-22 10:29 ` Thomas Gleixner
  0 siblings, 2 replies; 7+ messages in thread
From: Roman Zippel @ 2005-12-21 23:30 UTC (permalink / raw)
  To: johnstul, linux-kernel


This patch demonstrates how a guaranteed continuous timeofday can be
implemented on top of the previous patches. (It's full of debug prints
and rather hacked into the current infrastructure, so please ignore
these parts :) ).

The previous patches convert the ntp code to precalculate as much as
possible, so that at every tick the reference time has only to be
updated by tick_nsec_curr/time_adj_curr. What this patch adds over
update_wall_time_one_tick() is that it modifies the frequency of the
local time to match the reference time instead of simply jumping to it.
It does this by maintaining an error value and modifying the multiplier
to reduce the error. 

The basic idea behind this is that ntp defines the frequency of the
reference time, e.g. with a clock frequency of f=500MHz the clock should
advance exactly 1 second after f cycles, but a quartz is usually not
that precise and varies depending on the temperature, so ntp defines an
adjustment, which should be applied every f cycles. tick_nsec and
time_adj now specify this frequency by how much the time is advanced
every tick. The resolution of the frequency is currently 2^-12nsec,
which allows for a very stable clock.

OTOH when reading the clock source it depends on the used multiplier by
how much the time is advanced every f cycles (this is also the time
visible to the user), e.g. in this case we would initialize the
multiplier to 8388608 (the shift value used in the patch is 22, so
m=10^9*2^22/f) so that after f cycles the time is f*m>>22=10^9nsec. To
keep the time now incrementing monotonically we can change the
multiplier to change the speed of the clock. It depends now on the used
shift value and the update frequency, how close we can keep the user
time to the reference time, e.g. with HZ=250 the possible frequency
adjustment steps can be f/2^22/10^9/HZ or about 0.477nsec, as the
resolution of the clock itself is 1/f (or 2nsec), the resulting
resolution is still well within the practical limits of the clock. This
also means a good shift value for a clock is around log2(f^2/10^9/HZ), a
much larger shift value doesn't improve much over the clock resolution
and a lower shift value makes the clock resolution worse.

In any case the clock resolution will be coarser than the resolution of
the ntp reference time, so with every update step at a timer tick we
accumulate an error of (ntp_update-xtime_update) and after some time the
error is large enough that it can be corrected by adjusting the
multiplier. The tricky part here is to prevent the error value from
oscillating too much, what can happen if the update comes too late and
we compensated too much, this is not much of a problem for small error
values, but larger error values are incrementally corrected instead and
not all at once. Another important detail is that the error value is
calculated after possible adjustments from second_overflow(), this way
we basically look ahead to the next timer tick and compensate for the
expected error and don't wait until we exceeded the error. This means a
well synchronized clock with small update delays almost always stays
within the error limits.


John, I hope this helps you to understand what I have in mind, please ask
if something is unclear, I'm already working on it for a while, so I
have no idea how digestable this is all at once. :)
I certainly have now a much better picture of what's possible and how to
further optimize this for different clocks. Here are a few general
comments about your code:

- with the previous patches the ntp code doesn't need much more changes
  anymore to implement a continuous time source, all you need is access
  to tick_nsec_curr, time_adj_curr and second_overflow() and you have to
  keep the generic code from calling update_wall_time().
  The ntp code certainly can use some more cosmetic changes, but I'd
  prefer to leave it for later.

- your code doesn't maintain the long term error, which will cause a
  random drift. This basically means you need a large shift value to
  increase the resolution and keep this error small. The code below can
  maintain a stable clock even at low resolutions.

- I still don't like the idea of a generic gettimeofday() as it prevents
  more optimized versions, e.g. on the one end with a 1MHz clock you only
  have usec resolution anyway and this allows to keep almost everything
  within 32bits. On the other end 64bit archs can avoid the "if (nsec >
  NSEC_PER_SEC)" by doing something like ppc64 does, but requires a
  different scaling of the values (to sec instead of nsec).

- the clock switch infrastructure can be merged with the clock set
  mechanism. When setting a clock some internal variables have to be
  updated as well, which can be reused for the clock switch basically
  by setting the clock immediately before the switch, so that both
  clocks run synchronously for a few cycles.


Anyway, most important for me right now is to make sure, the stuff below
is understood and I didn't make a stupid mistake somewhere. After this 
we'll see how to take it from there.


---

 arch/i386/kernel/time.c                  |   13 ++-
 arch/i386/kernel/timers/common.c         |   28 ++----
 arch/i386/kernel/timers/timer_tsc.c      |  132 +++++++++++++++++++++++++++++++
 include/asm-i386/mach-default/do_timer.h |    5 -
 include/asm-i386/timer.h                 |    2 
 kernel/time.c                            |    3 
 kernel/timer.c                           |   23 ++++-
 7 files changed, 181 insertions(+), 25 deletions(-)

Index: linux-2.6-mm/arch/i386/kernel/time.c
===================================================================
--- linux-2.6-mm.orig/arch/i386/kernel/time.c	2005-12-21 12:09:42.000000000 +0100
+++ linux-2.6-mm/arch/i386/kernel/time.c	2005-12-21 12:12:30.000000000 +0100
@@ -128,6 +128,10 @@ void do_gettimeofday(struct timeval *tv)
 	unsigned long usec, sec;
 	unsigned long max_ntp_tick;
 
+	if (cur_timer->gettimeofday) {
+		cur_timer->gettimeofday(tv);
+		return;
+	}
 	do {
 		unsigned long lost;
 
@@ -175,14 +179,17 @@ int do_settimeofday(struct timespec *tv)
 		return -EINVAL;
 
 	write_seqlock_irq(&xtime_lock);
+	printk("stod: %lu:%09lu\n", tv->tv_sec, tv->tv_nsec);
 	/*
 	 * This is revolting. We need to set "xtime" correctly. However, the
 	 * value in this location is the value at the most recent update of
 	 * wall time.  Discover what correction gettimeofday() would have
 	 * made, and then undo it!
 	 */
-	nsec -= cur_timer->get_offset() * NSEC_PER_USEC;
-	nsec -= (jiffies - wall_jiffies) * TICK_NSEC;
+	if (!cur_timer->time_was_set) {
+		nsec -= cur_timer->get_offset() * NSEC_PER_USEC;
+		nsec -= (jiffies - wall_jiffies) * TICK_NSEC;
+	}
 
 	wtm_sec  = wall_to_monotonic.tv_sec + (xtime.tv_sec - sec);
 	wtm_nsec = wall_to_monotonic.tv_nsec + (xtime.tv_nsec - nsec);
@@ -191,6 +198,8 @@ int do_settimeofday(struct timespec *tv)
 	set_normalized_timespec(&wall_to_monotonic, wtm_sec, wtm_nsec);
 
 	ntp_clear();
+	if (cur_timer->time_was_set)
+		cur_timer->time_was_set();
 	write_sequnlock_irq(&xtime_lock);
 	clock_was_set();
 	return 0;
Index: linux-2.6-mm/arch/i386/kernel/timers/common.c
===================================================================
--- linux-2.6-mm.orig/arch/i386/kernel/timers/common.c	2005-12-21 12:04:39.000000000 +0100
+++ linux-2.6-mm/arch/i386/kernel/timers/common.c	2005-12-21 12:12:30.000000000 +0100
@@ -23,46 +23,36 @@
  * device.
  */
 
-#define CALIBRATE_TIME	(5 * 1000020/HZ)
+#define CALIBRATE_TIME	((((5LL * LATCH * 1000000) << 20) / CLOCK_TICK_RATE) << 12)
 
 unsigned long calibrate_tsc(void)
 {
 	mach_prepare_counter();
 
 	{
-		unsigned long startlow, starthigh;
-		unsigned long endlow, endhigh;
-		unsigned long count;
+		unsigned long long start, end;
+		unsigned long count, dummy;
 
-		rdtsc(startlow,starthigh);
+		rdtscll(start);
 		mach_countup(&count);
-		rdtsc(endlow,endhigh);
+		rdtscll(end);
 
 
 		/* Error: ECTCNEVERSET */
 		if (count <= 1)
 			goto bad_ctc;
 
-		/* 64-bit subtract - gcc just messes up with long longs */
-		__asm__("subl %2,%0\n\t"
-			"sbbl %3,%1"
-			:"=a" (endlow), "=d" (endhigh)
-			:"g" (startlow), "g" (starthigh),
-			 "0" (endlow), "1" (endhigh));
+		end -= start;
 
 		/* Error: ECPUTOOFAST */
-		if (endhigh)
+		if (end > 0xffffffff)
 			goto bad_ctc;
 
 		/* Error: ECPUTOOSLOW */
-		if (endlow <= CALIBRATE_TIME)
+		if (end <= CALIBRATE_TIME >> 32)
 			goto bad_ctc;
 
-		__asm__("divl %2"
-			:"=a" (endlow), "=d" (endhigh)
-			:"r" (endlow), "0" (0), "1" (CALIBRATE_TIME));
-
-		return endlow;
+		return div_long_long_rem(CALIBRATE_TIME, end, &dummy);
 	}
 
 	/*
Index: linux-2.6-mm/arch/i386/kernel/timers/timer_tsc.c
===================================================================
--- linux-2.6-mm.orig/arch/i386/kernel/timers/timer_tsc.c	2005-12-21 12:09:42.000000000 +0100
+++ linux-2.6-mm/arch/i386/kernel/timers/timer_tsc.c	2005-12-21 12:12:30.000000000 +0100
@@ -344,6 +344,7 @@ int recalibrate_cpu_khz(void)
 }
 EXPORT_SYMBOL(recalibrate_cpu_khz);
 
+#if 0
 static void mark_offset_tsc(void)
 {
 	unsigned long lost,delay;
@@ -455,6 +456,124 @@ static void mark_offset_tsc(void)
 	if (lost && abs(delay - delay_at_last_interrupt) > (900000/HZ))
 		jiffies_64++;
 }
+#endif
+
+extern unsigned long tick_nsec_curr, time_adj_curr;
+extern void second_overflow(void);
+
+u32 cycles_mult, update_cycles;
+u64 cycles_offset, xtime_nsec, xtime_update;
+s64 xtime_error;
+int error_adj;
+
+static void mark_offset_tsc(void)
+{
+	u64 cycles, update;
+	s64 off;
+	int adj;
+
+	//printk("[");
+	rdtscll(cycles);
+
+	while ((off = cycles - cycles_offset) >= update_cycles) {
+		//printk("*");
+		cycles_offset += update_cycles;
+		xtime_nsec += xtime_update;
+		if (xtime_nsec >= (u64)NSEC_PER_SEC << 22) {
+			xtime_nsec -= (u64)NSEC_PER_SEC << 22;
+			xtime.tv_sec++;
+			second_overflow();
+			if (!(xtime.tv_sec & 7))
+				printk("ft: %lld,%u,%llu,%u\n", xtime_error, cycles_mult, xtime_nsec, error_adj);
+		}
+		xtime.tv_nsec = xtime_nsec >> 22;
+		xtime_error += (u64)tick_nsec_curr << 22;
+		xtime_error += (u64)time_adj_curr << (22 - SHIFT_SCALE);
+		xtime_error -= xtime_update;
+	}
+
+	if (xtime_error > update_cycles / 2) {
+		//long dummy, tmp = div_ll_X_l_rem(xtime_error + update_cycles / 2, update_cycles, &dummy);
+		//printk("+");
+		adj = error_adj;
+		update = (u64)update_cycles << adj;
+		if (xtime_error > update) {
+			if (xtime_error > update << 1) {
+				update <<= 1;
+				error_adj = ++adj;
+				//printk("e+^:%ld\n", tmp);
+			}
+		} else if (adj) {
+			//printk("e+v:%ld\n", tmp);
+			error_adj--;
+			adj = 0;
+			update = update_cycles;
+		}
+		off <<= adj;
+		cycles_mult += 1 << adj;
+		xtime_update += update;
+		xtime_error -= update - off;
+		xtime_nsec -= off;
+	} else if (-xtime_error > update_cycles / 2) {
+		//long dummy, tmp = -div_ll_X_l_rem(-xtime_error + update_cycles / 2, update_cycles, &dummy);
+		//printk("-");
+		adj = error_adj;
+		update = (u64)update_cycles << adj;
+		if (-xtime_error > update) {
+			if (-xtime_error > update << 1) {
+				update <<= 1;
+				error_adj = ++adj;
+				//printk("e-^:%ld\n", tmp);
+			}
+		} else if (adj) {
+			//printk("e-v:%ld\n", tmp);
+			error_adj--;
+			adj = 0;
+			update = update_cycles;
+		}
+		off <<= adj;
+		cycles_mult -= 1 << adj;
+		xtime_update -= update;
+		xtime_error += update - off;
+		xtime_nsec += off;
+	}
+	//printk("]");
+}
+
+static int tod_test = 10;
+
+void tsc_gettimeofday(struct timeval *tv)
+{
+	u32 nsec, cycles;
+	int seq;
+
+	do {
+		seq = read_seqbegin(&xtime_lock);
+		tv->tv_sec = xtime.tv_sec;
+		rdtscl(cycles);
+		nsec = (xtime_nsec + (cycles - (u32)cycles_offset) * (u64)cycles_mult) >> 22;
+		if (nsec > NSEC_PER_SEC) {
+			nsec -= NSEC_PER_SEC;
+			tv->tv_sec++;
+		}
+		tv->tv_usec = nsec / 1000;
+	} while (unlikely(read_seqretry(&xtime_lock, seq)));
+	if (tod_test) {
+		tod_test--;
+		printk("gtod: %lu,%06lu\n", tv->tv_sec, tv->tv_usec);
+	}
+}
+
+void tsc_time_was_set(void)
+{
+	printk("tsc_time_was_set: %lu,%09lu\n", xtime.tv_sec, xtime.tv_nsec);
+	tod_test = 10;
+	rdtscll(cycles_offset);
+	xtime_nsec = (u64)xtime.tv_nsec << 22;
+	xtime_error = (u64)tick_nsec_curr << 22;
+	xtime_error += (u64)time_adj_curr << (22 - SHIFT_SCALE);
+	xtime_error -= xtime_update;
+}
 
 static int __init init_tsc(char* override)
 {
@@ -523,6 +642,17 @@ static int __init init_tsc(char* overrid
 		}
 
 		if (tsc_quotient) {
+			long dummy;
+
+			cycles_mult = ((u64)tsc_quotient * 1000) >> 10;
+			update_cycles = div_ll_X_l_rem((1000000000LL << 22) / HZ, cycles_mult, &dummy);
+			asm ("mull %1"
+				:"=A" (xtime_update)
+				:"rm" (cycles_mult), "a" (update_cycles));
+			ntp_clear();
+			tsc_time_was_set();
+			printk("tsc: %u,%u,%llu,%llu\n", cycles_mult, update_cycles, xtime_update, xtime_error);
+
 			fast_gettimeoffset_quotient = tsc_quotient;
 			use_tsc = 1;
 			/*
@@ -587,6 +717,8 @@ __setup("notsc", tsc_setup);
 static struct timer_opts timer_tsc = {
 	.name = "tsc",
 	.mark_offset = mark_offset_tsc, 
+	.gettimeofday = tsc_gettimeofday,
+	.time_was_set = tsc_time_was_set,
 	.get_offset = get_offset_tsc,
 	.monotonic_clock = monotonic_clock_tsc,
 	.delay = delay_tsc,
Index: linux-2.6-mm/include/asm-i386/mach-default/do_timer.h
===================================================================
--- linux-2.6-mm.orig/include/asm-i386/mach-default/do_timer.h	2005-12-21 12:04:39.000000000 +0100
+++ linux-2.6-mm/include/asm-i386/mach-default/do_timer.h	2005-12-21 12:12:30.000000000 +0100
@@ -16,7 +16,10 @@
 
 static inline void do_timer_interrupt_hook(struct pt_regs *regs)
 {
-	do_timer(regs);
+	if (cur_timer->gettimeofday)
+		do_timer2(regs);
+	else
+		do_timer(regs);
 #ifndef CONFIG_SMP
 	update_process_times(user_mode(regs));
 #endif
Index: linux-2.6-mm/include/asm-i386/timer.h
===================================================================
--- linux-2.6-mm.orig/include/asm-i386/timer.h	2005-12-21 12:04:39.000000000 +0100
+++ linux-2.6-mm/include/asm-i386/timer.h	2005-12-21 12:12:30.000000000 +0100
@@ -20,6 +20,8 @@
 struct timer_opts {
 	char* name;
 	void (*mark_offset)(void);
+	void (*gettimeofday)(struct timeval *);
+	void (*time_was_set)(void);
 	unsigned long (*get_offset)(void);
 	unsigned long long (*monotonic_clock)(void);
 	void (*delay)(unsigned long);
Index: linux-2.6-mm/kernel/time.c
===================================================================
--- linux-2.6-mm.orig/kernel/time.c	2005-12-21 12:12:26.000000000 +0100
+++ linux-2.6-mm/kernel/time.c	2005-12-21 12:12:30.000000000 +0100
@@ -320,6 +320,7 @@ int do_adjtimex(struct timex *txc)
 		    freq_adj = (s64)time_offset * mtemp;
 		    freq_adj = shift_right(freq_adj, (time_constant + SHIFT_PLL +
 						      2) * 2 - SHIFT_NSEC);
+		    printk("p: %ld,%lx,%llx", mtemp, time_offset, freq_adj);
 		    if (mtemp >= MINSEC && (time_status & STA_FLL || mtemp > MAXSEC)) {
 			if (time_offset < 0) {
 			    temp64 = (s64)-time_offset << SHIFT_NSEC;
@@ -330,8 +331,10 @@ int do_adjtimex(struct timex *txc)
 			    do_div(temp64, mtemp << SHIFT_FLL);
 			    freq_adj += temp64;
 			}
+			printk(",%llx", temp64);
 		    }
 		    freq_adj += time_freq;
+		    printk(",%lx->%llx\n", time_freq, freq_adj);
 		    freq_adj = min(freq_adj, (s64)MAXFREQ);
 		    time_freq = max(freq_adj, (s64)-MAXFREQ);
 		    time_offset = (time_offset / HZ) << SHIFT_HZ;
Index: linux-2.6-mm/kernel/timer.c
===================================================================
--- linux-2.6-mm.orig/kernel/timer.c	2005-12-21 12:12:26.000000000 +0100
+++ linux-2.6-mm/kernel/timer.c	2005-12-21 12:12:30.000000000 +0100
@@ -552,7 +552,7 @@ found:
  */
 unsigned long tick_usec = TICK_USEC; 		/* USER_HZ period (usec) */
 unsigned long tick_nsec;			/* ACTHZ period (nsec) */
-static unsigned long tick_nsec_curr;
+unsigned long tick_nsec_curr;
 
 /* 
  * The current time 
@@ -583,7 +583,7 @@ long time_maxerror = NTP_PHASE_LIMIT;	/*
 long time_esterror = NTP_PHASE_LIMIT;	/* estimated error (us)		*/
 static long time_phase;			/* phase offset (scaled us)	*/
 long time_freq;				/* frequency offset (scaled ppm)*/
-static long time_adj, time_adj_curr;	/* tick adjust (scaled 1 / HZ)	*/
+long time_adj, time_adj_curr;		/* tick adjust (scaled 1 / HZ)	*/
 long time_reftime;			/* time at last adjustment (s)	*/
 long time_adjust;
 
@@ -616,6 +616,7 @@ void ntp_update_frequency(void)
 	 * for the next HZ ticks with the remainder in freq (scaled by
 	 * (SHIFT_USEC - 3)).
 	 */
+	printk("f1: %ld,%lx,%ld\n", tick_usec, time_freq, time_offset);
 	time = tick_usec * 1000 * USER_HZ;
 	time += time_freq >> SHIFT_NSEC;
 
@@ -629,6 +630,7 @@ void ntp_update_frequency(void)
 	time_adj = time / HZ;
 
 	tick_nsec -= NSEC_PER_SEC / HZ - TICK_NSEC;
+	printk("f2: %lu,%lx\n", tick_nsec, time_adj);
 }
 
 /*
@@ -640,7 +642,7 @@ void ntp_update_frequency(void)
  * All the kudos should go to Dave for this stuff.
  *
  */
-static void second_overflow(void)
+void second_overflow(void)
 {
 	long ltemp;
 
@@ -779,6 +781,8 @@ static void update_wall_time(unsigned lo
 		ticks--;
 		update_wall_time_one_tick();
 		if (xtime.tv_nsec >= 1000000000) {
+			if (!(xtime.tv_sec & 127))
+				printk("f: %ld,%lx,%ld\n", tick_nsec_curr, time_adj_curr, time_offset);
 			xtime.tv_nsec -= 1000000000;
 			xtime.tv_sec++;
 			second_overflow();
@@ -907,6 +911,19 @@ void do_timer(struct pt_regs *regs)
 	softlockup_tick(regs);
 }
 
+void do_timer2(struct pt_regs *regs)
+{
+	unsigned long ticks;
+
+	jiffies_64++;
+	ticks = jiffies - wall_jiffies;
+	if (ticks) {
+		wall_jiffies += ticks;
+		calc_load(ticks);
+	}
+	softlockup_tick(regs);
+}
+
 #ifdef __ARCH_WANT_SYS_ALARM
 
 /*

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH/RFC 10/10] example of simple continuous gettimeofday
  2005-12-21 23:30 [PATCH/RFC 10/10] example of simple continuous gettimeofday Roman Zippel
@ 2005-12-22  2:43 ` john stultz
  2005-12-22  9:01   ` Ingo Molnar
  2005-12-25 20:54   ` Roman Zippel
  2005-12-22 10:29 ` Thomas Gleixner
  1 sibling, 2 replies; 7+ messages in thread
From: john stultz @ 2005-12-22  2:43 UTC (permalink / raw)
  To: Roman Zippel; +Cc: linux-kernel

Hey Roman,
	First, thanks for bringing code along. This will hopefully let us
narrow down the specific differences between the two designs without
much hand-waving. Please do be patient with me as sometimes you can be
difficult to understand, and really sometimes I can be just dumb. For
most of this I'll try to re-state my understanding of your point and go
from there, that way you'll have a better idea where the confusion is
arising from.

Unfortunately I'm out of town after tomorrow, so this discussion will
probably not get very far until after the new year.

On Thu, 2005-12-22 at 00:30 +0100, Roman Zippel wrote:
> The previous patches convert the ntp code to precalculate as much as
> possible, so that at every tick the reference time has only to be
> updated by tick_nsec_curr/time_adj_curr. What this patch adds over
> update_wall_time_one_tick() is that it modifies the frequency of the
> local time to match the reference time instead of simply jumping to it.
> It does this by maintaining an error value and modifying the multiplier
> to reduce the error. 

My read: Uses NTP to modify the clock frequency value rather then just
adding or subtracting fixed units from system time.

Yep, basically this is what my code does as well. So far I think we're
in agreement at a high level.

> The basic idea behind this is that ntp defines the frequency of the
> reference time, e.g. with a clock frequency of f=500MHz the clock should
> advance exactly 1 second after f cycles, but a quartz is usually not
> that precise and varies depending on the temperature, so ntp defines an
> adjustment, which should be applied every f cycles. tick_nsec and
> time_adj now specify this frequency by how much the time is advanced
> every tick. The resolution of the frequency is currently 2^-12nsec,
> which allows for a very stable clock.

My read: This is how the current tick based timekeeping functions.

Continued agreement.


> OTOH when reading the clock source it depends on the used multiplier by
> how much the time is advanced every f cycles (this is also the time
> visible to the user), e.g. in this case we would initialize the
> multiplier to 8388608 (the shift value used in the patch is 22, so
> m=10^9*2^22/f) so that after f cycles the time is f*m>>22=10^9nsec. To
> keep the time now incrementing monotonically we can change the
> multiplier to change the speed of the clock. It depends now on the used
> shift value and the update frequency, how close we can keep the user
> time to the reference time, e.g. with HZ=250 the possible frequency
> adjustment steps can be f/2^22/10^9/HZ or about 0.477nsec, as the
> resolution of the clock itself is 1/f (or 2nsec), the resulting
> resolution is still well within the practical limits of the clock. This
> also means a good shift value for a clock is around log2(f^2/10^9/HZ), a
> much larger shift value doesn't improve much over the clock resolution
> and a lower shift value makes the clock resolution worse.


My Read: Adjust the frequency multiplier (using mult/shift units) to
apply the frequency adjustment. Picking a good shift unit is desirable.

My clocksource code uses exactly the same method. See comments in the
clocksource drivers and cyc2ns() for details.

> In any case the clock resolution will be coarser than the resolution of
> the ntp reference time, so with every update step at a timer tick we
> accumulate an error of (ntp_update-xtime_update) and after some time the
> error is large enough that it can be corrected by adjusting the
> multiplier. The tricky part here is to prevent the error value from
> oscillating too much, what can happen if the update comes too late and
> we compensated too much, this is not much of a problem for small error
> values, but larger error values are incrementally corrected instead and
> not all at once. Another important detail is that the error value is
> calculated after possible adjustments from second_overflow(), this way
> we basically look ahead to the next timer tick and compensate for the
> expected error and don't wait until we exceeded the error. This means a
> well synchronized clock with small update delays almost always stays
> within the error limits.

The specifics here are a little fuzzy from the first read. It seems the
point is that since frequency adjustments are made at fixed points, the
longer the interval is between the point, the greater possible error.
Thus if ticks are lost, or for whatever reason the intervals between
frequency changes is large, we may over-compensate from what the NTP
reference clock tells us.

I think I agree here, basically. One of the differences between the two
designs is that I've drawn stronger lines between the timekeeping code
and the NTP/reference clock code.

Your design seems to suggest keeping an NTP calculated reference time
that the timekeeping code uses to correct itself from (again, let me
know if I'm mis-understanding you).  In my view this is a little
redundant, because fundamentally this is what the ntp daemon and
adjtimex is already doing. The daemon uses the server's reference time
and figures out how far off it is from the OS's system time. It does
some filtering and feeds that offset into adjtimex where it is dampened
and used to modify a frequency adjustment.

So instead of using those offset/frequency adjustment values directly
(which my design tries to do), your design converts it to a fixed
adjustment to xtime which then becomes yet another reference clock to
the timekeeping code, which calculates yet another offset value and
converts that into a frequency adjustment.

So I think the real difference here is that I calculate the frequency
adjustment directly from the NTP freq/offset/tick values inside the NTP
code and allow the NTP daemon (along w/ the tick based dampening)
provide the feedback as to how accurate we are.

Otherwise (outside of which function does what) I think we're very much
in sync. Below I hope to show you how similar the designs are.

> John, I hope this helps you to understand what I have in mind, please ask
> if something is unclear, I'm already working on it for a while, so I
> have no idea how digestable this is all at once. :)

I think I'm following it (although I'll let you judge how well).

> I certainly have now a much better picture of what's possible and how to
> further optimize this for different clocks. Here are a few general
> comments about your code:
> 
> - with the previous patches the ntp code doesn't need much more changes
>   anymore to implement a continuous time source, all you need is access
>   to tick_nsec_curr, time_adj_curr and second_overflow() and you have to
>   keep the generic code from calling update_wall_time().

I still prefer the stronger isolation in my design (NTP code does not
directly change any timekeeping values, but provides NTP adjustment
values via function interfaces), but yes, at first glance I don't
believe your NTP patches break how my code functions.

>   The ntp code certainly can use some more cosmetic changes, but I'd
>   prefer to leave it for later.

Indeed. That's why I dropped the majority of my NTP cleanups. If we can
trickle the smaller changes in slowly, that's fine. I just wanted to
drop to the bare minimum.

> - your code doesn't maintain the long term error, which will cause a
>   random drift. This basically means you need a large shift value to
>   increase the resolution and keep this error small. The code below can
>   maintain a stable clock even at low resolutions.

I disagree here. I keep a 64 bit remainder value in
timeofday_periodic_hook() that accumulates the sub-nanosecond error when
converting from cycles to nanoseconds. If this is not what you mean,
could you please clarify?


> - I still don't like the idea of a generic gettimeofday() as it prevents
>   more optimized versions, e.g. on the one end with a 1MHz clock you only
>   have usec resolution anyway and this allows to keep almost everything
>   within 32bits. On the other end 64bit archs can avoid the "if (nsec >
>   NSEC_PER_SEC)" by doing something like ppc64 does, but requires a
>   different scaling of the values (to sec instead of nsec).

Fair enough. I agree arches should be able to have their own arch
specific implementations. If you really have to micro-optimize
everything, just don't enable CONFIG_GENERIC_TIME and have your own
timekeeping subsystem! 

But at the same time, I don't like the idea of needlessly having 26
different versions of gettimeofday that do exactly the same thing modulo
a few bugs. :)


> - the clock switch infrastructure can be merged with the clock set
>   mechanism. When setting a clock some internal variables have to be
>   updated as well, which can be reused for the clock switch basically
>   by setting the clock immediately before the switch, so that both
>   clocks run synchronously for a few cycles.

Well, that's a little hand-wavy, but yes, being able to switch
clocksources requires more code and I don't think that's a point of
contention here. :)

> Anyway, most important for me right now is to make sure, the stuff below
> is understood and I didn't make a stupid mistake somewhere. After this 
> we'll see how to take it from there.

Let me know how I do. :)

[snip - basic setup bits]
> Index: linux-2.6-mm/arch/i386/kernel/timers/timer_tsc.c
> ===================================================================
> --- linux-2.6-mm.orig/arch/i386/kernel/timers/timer_tsc.c	2005-12-21 12:09:42.000000000 +0100
> +++ linux-2.6-mm/arch/i386/kernel/timers/timer_tsc.c	2005-12-21 12:12:30.000000000 +0100
> @@ -455,6 +456,124 @@ static void mark_offset_tsc(void)
>  	if (lost && abs(delay - delay_at_last_interrupt) > (900000/HZ))
>  		jiffies_64++;
>  }
> +#endif
> +
> +extern unsigned long tick_nsec_curr, time_adj_curr;
> +extern void second_overflow(void);
> +
> +u32 cycles_mult, update_cycles;
> +u64 cycles_offset, xtime_nsec, xtime_update;
> +s64 xtime_error;
> +int error_adj;
> +
> +static void mark_offset_tsc(void)
> +{
> +	u64 cycles, update;
> +	s64 off;
> +	int adj;
> +
> +	//printk("[");
> +	rdtscll(cycles);
> +
> +	while ((off = cycles - cycles_offset) >= update_cycles) {
> +		//printk("*");
> +		cycles_offset += update_cycles;
> +		xtime_nsec += xtime_update;
> +		if (xtime_nsec >= (u64)NSEC_PER_SEC << 22) {
> +			xtime_nsec -= (u64)NSEC_PER_SEC << 22;
> +			xtime.tv_sec++;
> +			second_overflow();
> +			if (!(xtime.tv_sec & 7))
> +				printk("ft: %lld,%u,%llu,%u\n", xtime_error, cycles_mult, xtime_nsec, error_adj);
> +		}
> +		xtime.tv_nsec = xtime_nsec >> 22;
> +		xtime_error += (u64)tick_nsec_curr << 22;
> +		xtime_error += (u64)time_adj_curr << (22 - SHIFT_SCALE);
> +		xtime_error -= xtime_update;
> +	}


Sooo.. let's rework this a bit (also dropping the printks):
	u64 cycle_delta, nsec_delta

	rdtscll(cycles);
	while ((off = cycles - cycles_offset) >= update_cycles) {
		cycles_delta += update_cycles;
		nsec_delta += xtime_update;

		xtime_error += (u64)tick_nsec_curr << 22;
		xtime_error += (u64)time_adj_curr << (22 - SHIFT_SCALE);
		xtime_error -= xtime_update;
	}
	xtime_nsec += nsec_delta;
	cycle_offset += cycle_delta;

	while (xtime_nsec >= (u64)NSEC_PER_SEC << 22) {
		xtime_nsec -= (u64)NSEC_PER_SEC << 22;
		xtime.tv_sec++;
		second_overflow();
	}
	xtime.tv_nsec = xtime_nsec >> 22;


Now lets look at my code in timeofday_periodic_hook():

	/* read time source & calc time since last call: */
	cycle_now = read_clocksource(clock);

	cycle_delta = (cycle_now - cycle_last) & clock->mask;

	delta_nsec = cyc2ns_fixed_rem(ts_interval, &cycle_delta, &remainder);
	cycle_last = (cycle_now - cycle_delta)&clock->mask;

	/* update system_time:  */
	__increment_system_time(delta_nsec);


You might take a look at cyc2ns_fixed_rem() to see exactly how similar
these are, but so far we're talking *really* close.


> +	if (xtime_error > update_cycles / 2) {
> +		//long dummy, tmp = div_ll_X_l_rem(xtime_error + update_cycles / 2, update_cycles, &dummy);
> +		//printk("+");
> +		adj = error_adj;
> +		update = (u64)update_cycles << adj;
> +		if (xtime_error > update) {
> +			if (xtime_error > update << 1) {
> +				update <<= 1;
> +				error_adj = ++adj;
> +				//printk("e+^:%ld\n", tmp);
> +			}
> +		} else if (adj) {
> +			//printk("e+v:%ld\n", tmp);
> +			error_adj--;
> +			adj = 0;
> +			update = update_cycles;
> +		}
> +		off <<= adj;
> +		cycles_mult += 1 << adj;
> +		xtime_update += update;
> +		xtime_error -= update - off;
> +		xtime_nsec -= off;
> +	} else if (-xtime_error > update_cycles / 2) {
> +		//long dummy, tmp = -div_ll_X_l_rem(-xtime_error + update_cycles / 2, update_cycles, &dummy);
> +		//printk("-");
> +		adj = error_adj;
> +		update = (u64)update_cycles << adj;
> +		if (-xtime_error > update) {
> +			if (-xtime_error > update << 1) {
> +				update <<= 1;
> +				error_adj = ++adj;
> +				//printk("e-^:%ld\n", tmp);
> +			}
> +		} else if (adj) {
> +			//printk("e-v:%ld\n", tmp);
> +			error_adj--;
> +			adj = 0;
> +			update = update_cycles;
> +		}
> +		off <<= adj;
> +		cycles_mult -= 1 << adj;
> +		xtime_update -= update;
> +		xtime_error += update - off;
> +		xtime_nsec += off;
> +	}
> +	//printk("]");
> +}

Ok, so here's where we start to see a difference. First it looks like
this should be in a function. I broke what would be considered  similar
functionality into the following two functions:

	ntp_advance(delta_nsec);
and 
	ppm = ntp_get_ppm_adjustment();

Where ntp_advance() tells the NTP code to increment the NTP state values
for the interval that just passed. This is very similar to the
second_overflow() bits done in the first chunk of your code, only it is
not fixed to only being run at the wall-time second boundery. Instead an
arbitrary but constant second interval length is used (from boot time).

Then ntp_get_ppm_adjustment() provides the ppm adjustment value from the
NTP subsystem to be used in adjusting the frequency multiplier value.
Just as your code adjusts the cycles_mult value. 

(One subtle difference is that I keep the multiplier adjustment
separately from the multiplier value, just so its easier to reset back
to the clocksource drivers original value when needed. This isn't
required but I feel makes the code more understandable for the cost of
an add.)

The remaining functional difference here is the fact I mentioned above,
where I'm using the NTP state values in a way that I feel is more direct
in manipulating the frequency multiplier and where you're generating the
multiplier adjustment via the difference in the current time from the
reference time. But really, hide this difference away in a similar
function to ntp_advance() and its all pretty moot.

Does this sound about right? Are we really in this much agreement or has
my looking forward to a full week off tinted my glasses rosy? :)

thanks
-john



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH/RFC 10/10] example of simple continuous gettimeofday
  2005-12-22  2:43 ` john stultz
@ 2005-12-22  9:01   ` Ingo Molnar
  2005-12-25 20:54   ` Roman Zippel
  1 sibling, 0 replies; 7+ messages in thread
From: Ingo Molnar @ 2005-12-22  9:01 UTC (permalink / raw)
  To: john stultz; +Cc: Roman Zippel, linux-kernel, Thomas Gleixner


* john stultz <johnstul@us.ibm.com> wrote:

> > - I still don't like the idea of a generic gettimeofday() as it prevents
> >   more optimized versions, e.g. on the one end with a 1MHz clock you only
> >   have usec resolution anyway and this allows to keep almost everything
> >   within 32bits. On the other end 64bit archs can avoid the "if (nsec >
> >   NSEC_PER_SEC)" by doing something like ppc64 does, but requires a
> >   different scaling of the values (to sec instead of nsec).
> 
> Fair enough. I agree arches should be able to have their own arch 
> specific implementations. If you really have to micro-optimize 
> everything, just don't enable CONFIG_GENERIC_TIME and have your own 
> timekeeping subsystem!
> 
> But at the same time, I don't like the idea of needlessly having 26 
> different versions of gettimeofday that do exactly the same thing 
> modulo a few bugs. :)

I like the first 9 patches, but regarding 10/10 i very much agree with 
John: it moves us to per-arch gettimeofday again, which is a big step 
backwards and reverts some of the biggest advantage of John's 
clocksource patchset!

Also, lets face it: with the union ktime_t type most of the '64-bit is 
slow on 32-bit' issues are much less of a problem. If some 32-bit arch 
wants to pull off its own timekeeping system, it can do so - but 
otherwise we want to move towards generic, unified (as far as it makes 
sense) and generally 64-bit-optimized subsystems. In 1995 i'd have 
agreed with Roman, but not in 2005.

	Ingo

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH/RFC 10/10] example of simple continuous gettimeofday
  2005-12-21 23:30 [PATCH/RFC 10/10] example of simple continuous gettimeofday Roman Zippel
  2005-12-22  2:43 ` john stultz
@ 2005-12-22 10:29 ` Thomas Gleixner
  2005-12-25 16:50   ` Roman Zippel
  1 sibling, 1 reply; 7+ messages in thread
From: Thomas Gleixner @ 2005-12-22 10:29 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Ingo Molnar, johnstul, linux-kernel

On Thu, 2005-12-22 at 00:30 +0100, Roman Zippel wrote:

> - I still don't like the idea of a generic gettimeofday() as it prevents
>   more optimized versions, e.g. on the one end with a 1MHz clock you only
>   have usec resolution anyway and this allows to keep almost everything
>   within 32bits. On the other end 64bit archs can avoid the "if (nsec >
>   NSEC_PER_SEC)" by doing something like ppc64 does, but requires a
>   different scaling of the values (to sec instead of nsec).

I strongly disagree. Moving this back into the architechture code for
some micro optimizations is the wrong way to go and a big step back from
Johns patches.

As John correctly analyzed we have dozens of gettimeofday/settimeofday
implementations which are roughly the same and I dont see a benefit to
keep it that way.

The timekeeping is a core functionality which should and can be
completely abstracted. This way we can ensure that the timekeeping is
consistent across architectures. So improvements and extensions to the
core code are immidiately available for all architectures rather than
updating 26 architecture implementations every time.

This allows to handle generic problems like suspend/resume, dynamic
ticks and others in a central place rather than scattered all over the
architecture code.

Same applies for clock source management and switching clock sources. 

> - the clock switch infrastructure can be merged with the clock set
>   mechanism. When setting a clock some internal variables have to be
>   updated as well, which can be reused for the clock switch basically
>   by setting the clock immediately before the switch, so that both
>   clocks run synchronously for a few cycles.

This is a question of implementation details rather than a design
question. 

Can we please concentrate on a clear and generic design model before
discussing optimizations of 20 lines of code? 

	tglx



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH/RFC 10/10] example of simple continuous gettimeofday
  2005-12-22 10:29 ` Thomas Gleixner
@ 2005-12-25 16:50   ` Roman Zippel
  0 siblings, 0 replies; 7+ messages in thread
From: Roman Zippel @ 2005-12-25 16:50 UTC (permalink / raw)
  To: tglx; +Cc: Ingo Molnar, johnstul, linux-kernel

Hi,

Thomas, Ingo, I would really appreciate if you didn't pick up on some small 
detail in my mail and draw final conclusions from it for the whole 
discussion, before you even have seen the final result. I really thought I 
made it clear, that this patch is more a proof-of-concept patch, which make 
your comments rather pointless, especially since you leave the central parts 
of the mail uncommented, you didn't even have a single question about it. 
This either tells me you understood what I wrote, but it's beneath you to 
comment on it or you didn't understand it and you just want me to shut up. 
Either way I have a hard time not to take it personal.

On Thursday 22 December 2005 11:29, Thomas Gleixner wrote:

> Can we please concentrate on a clear and generic design model before
> discussing optimizations of 20 lines of code?

Fine with and that's exactly what the rest of the mail was about.

On Thursday 22 December 2005 10:01, Ingo Molnar wrote:

> Also, lets face it: with the union ktime_t type most of the '64-bit is
> slow on 32-bit' issues are much less of a problem. If some 32-bit arch
> wants to pull off its own timekeeping system, it can do so - but
> otherwise we want to move towards generic, unified (as far as it makes
> sense) and generally 64-bit-optimized subsystems. In 1995 i'd have
> agreed with Roman, but not in 2005.

ktime_t doesn't magically solve all problems, it just hides them better.
The main point of my mail is to provide a theoretical base of a good 
gettimeofday implementation, which would allow a generic _and_ fast 
implementation. Why do you insist on 64bit, if I can actually prove it's only 
a waste of time?

bye, Roman


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH/RFC 10/10] example of simple continuous gettimeofday
  2005-12-22  2:43 ` john stultz
  2005-12-22  9:01   ` Ingo Molnar
@ 2005-12-25 20:54   ` Roman Zippel
  2006-01-20  4:01     ` john stultz
  1 sibling, 1 reply; 7+ messages in thread
From: Roman Zippel @ 2005-12-25 20:54 UTC (permalink / raw)
  To: john stultz; +Cc: linux-kernel

Hi,

On Thursday 22 December 2005 03:43, john stultz wrote:

> > The basic idea behind this is that ntp defines the frequency of the
> > reference time, e.g. with a clock frequency of f=500MHz the clock should
> > advance exactly 1 second after f cycles, but a quartz is usually not
> > that precise and varies depending on the temperature, so ntp defines an
> > adjustment, which should be applied every f cycles. tick_nsec and
> > time_adj now specify this frequency by how much the time is advanced
> > every tick. The resolution of the frequency is currently 2^-12nsec,
> > which allows for a very stable clock.
>
> My read: This is how the current tick based timekeeping functions.
>
> Continued agreement.

Actually this is how every timekeeping works (only tick_nsec/time_adj are the 
implementation specific parts), NTP defines the reference frequency of the 
clock, i.e. by how much time the clock should be advanced every f cycles (or 
n ticks).

> > OTOH when reading the clock source it depends on the used multiplier by
> > how much the time is advanced every f cycles (this is also the time
> > visible to the user), e.g. in this case we would initialize the
> > multiplier to 8388608 (the shift value used in the patch is 22, so
> > m=10^9*2^22/f) so that after f cycles the time is f*m>>22=10^9nsec. To
> > keep the time now incrementing monotonically we can change the
> > multiplier to change the speed of the clock. It depends now on the used
> > shift value and the update frequency, how close we can keep the user
> > time to the reference time, e.g. with HZ=250 the possible frequency
> > adjustment steps can be f/2^22/10^9/HZ or about 0.477nsec, as the
> > resolution of the clock itself is 1/f (or 2nsec), the resulting
> > resolution is still well within the practical limits of the clock. This
> > also means a good shift value for a clock is around log2(f^2/10^9/HZ), a
> > much larger shift value doesn't improve much over the clock resolution
> > and a lower shift value makes the clock resolution worse.
>
> My Read: Adjust the frequency multiplier (using mult/shift units) to
> apply the frequency adjustment. Picking a good shift unit is desirable.
>
> My clocksource code uses exactly the same method. See comments in the
> clocksource drivers and cyc2ns() for details.

Yes, but I think it misses a few details of how to choose a good shift value 
for a specific target precision. That's what I try to provide above.

> > In any case the clock resolution will be coarser than the resolution of
> > the ntp reference time, so with every update step at a timer tick we
> > accumulate an error of (ntp_update-xtime_update) and after some time the
> > error is large enough that it can be corrected by adjusting the
> > multiplier. The tricky part here is to prevent the error value from
> > oscillating too much, what can happen if the update comes too late and
> > we compensated too much, this is not much of a problem for small error
> > values, but larger error values are incrementally corrected instead and
> > not all at once. Another important detail is that the error value is
> > calculated after possible adjustments from second_overflow(), this way
> > we basically look ahead to the next timer tick and compensate for the
> > expected error and don't wait until we exceeded the error. This means a
> > well synchronized clock with small update delays almost always stays
> > within the error limits.
>
> The specifics here are a little fuzzy from the first read. It seems the
> point is that since frequency adjustments are made at fixed points, the
> longer the interval is between the point, the greater possible error.
> Thus if ticks are lost, or for whatever reason the intervals between
> frequency changes is large, we may over-compensate from what the NTP
> reference clock tells us.

It has nothing to do with lost ticks, the important part are the different 
resolutions. The NTP reference time provides long term stability, so you can 
keep it stable within a 1nsec over a day.
The adjustment done with the multiplier are much coarser, here you quickly can 
accumulate an error of a few usec or even msec within a hour. The error is 
f/2^shift/2 nsec per second, so it all depends on the used shift value and 
the faster the clock the larger the error.

> Your design seems to suggest keeping an NTP calculated reference time
> that the timekeeping code uses to correct itself from (again, let me
> know if I'm mis-understanding you).  In my view this is a little
> redundant, because fundamentally this is what the ntp daemon and
> adjtimex is already doing. The daemon uses the server's reference time
> and figures out how far off it is from the OS's system time. It does
> some filtering and feeds that offset into adjtimex where it is dampened
> and used to modify a frequency adjustment.

Indeed, the main part here is really the long term stability, my code corrects 
small errors immediately, where you leave this to the next ntp 
synchronisation step.

> So instead of using those offset/frequency adjustment values directly
> (which my design tries to do), your design converts it to a fixed
> adjustment to xtime which then becomes yet another reference clock to
> the timekeeping code, which calculates yet another offset value and
> converts that into a frequency adjustment.

Our code is not that different, so you basically do all this work as well. :)
total_sppm is basically just (tick_nsec + time_adj - TICK_NSEC) or the new 
multiplier could be calculated as (tick_nsec + time_adj) / update_cycles. The 
major difference that I keep the error (basically (tick_nsec+time_adj) % 
update_cycles) from the previous update and use it to correct the drift (see 
below).

Overall due to the precalculations from the other patches my code is also a 
bit simpler, e.g. you have multiple checks against NSEC_PER_SEC to some work 
once per second (interval_sum and second_check), which I keep together.
Another difference is that I keep everything scaled to the same base, you have 
to convert between different bases (mostly total_sppm). Keeping everything 
scaled equally has the big advantage that it's easy to change the scale (e.g. 
to implement the ppc64 version of gettimeofday()). Only some conversion 
factors and limits in adjtimex() change, all the rest of code stays the same, 
so you gain a lot of flexibility and still keep everything generic.

> So I think the real difference here is that I calculate the frequency
> adjustment directly from the NTP freq/offset/tick values inside the NTP
> code and allow the NTP daemon (along w/ the tick based dampening)
> provide the feedback as to how accurate we are.

This way you push more work to the daemon and create a larger jitter. This 
doesn't matter much for normal machines synchronised over the internet, but 
if you want to synchronise a few machines in a local network and want to 
reach usec resolution, you quickly hit the limits.

> > - with the previous patches the ntp code doesn't need much more changes
> >   anymore to implement a continuous time source, all you need is access
> >   to tick_nsec_curr, time_adj_curr and second_overflow() and you have to
> >   keep the generic code from calling update_wall_time().
>
> I still prefer the stronger isolation in my design (NTP code does not
> directly change any timekeeping values, but provides NTP adjustment
> values via function interfaces), but yes, at first glance I don't
> believe your NTP patches break how my code functions.

It's mainly still only example code, so a better separation is of course still 
possible.

> > - your code doesn't maintain the long term error, which will cause a
> >   random drift. This basically means you need a large shift value to
> >   increase the resolution and keep this error small. The code below can
> >   maintain a stable clock even at low resolutions.
>
> I disagree here. I keep a 64 bit remainder value in
> timeofday_periodic_hook() that accumulates the sub-nanosecond error when
> converting from cycles to nanoseconds. If this is not what you mean,
> could you please clarify?

I keep that remainder as well, but it's part of xtime_nsec, I even included 
this remainder in the gettimeofday() calculation, so time e.g. doesn't jump 
from 0.1 nsec to 0.9 nsec after an update.
The error I mean comes from ppm_to_mult_adj(), here you lose resolution. 
cs->mult depends on the chosen the shift value and the smaller the shift 
value/multiplier the more resolution you lose and the larger the error and 
the drift are.

> > - I still don't like the idea of a generic gettimeofday() as it prevents
> >   more optimized versions, e.g. on the one end with a 1MHz clock you only
> >   have usec resolution anyway and this allows to keep almost everything
> >   within 32bits. On the other end 64bit archs can avoid the "if (nsec >
> >   NSEC_PER_SEC)" by doing something like ppc64 does, but requires a
> >   different scaling of the values (to sec instead of nsec).
>
> Fair enough. I agree arches should be able to have their own arch
> specific implementations. If you really have to micro-optimize
> everything, just don't enable CONFIG_GENERIC_TIME and have your own
> timekeeping subsystem!

I don't think that's necessary, large parts are still generic and I don't 
think a config option is necessary at all (i.e. I didn't need one in my 
example).

> But at the same time, I don't like the idea of needlessly having 26
> different versions of gettimeofday that do exactly the same thing modulo
> a few bugs. :)

We can still provide a generic version, all the clock driver has to do is to 
add a line ".gettimeofday = generic_gettimeofday,". Even different versions 
can come from the same source template, the clock only defines some 
parameters and then includes the template, which generates the correct and 
optimised code.
There are a number of possibilities, which basically would provide the clock 
driver only a basic framework it can choose from, but overall there would 
only be one source version.

> > - the clock switch infrastructure can be merged with the clock set
> >   mechanism. When setting a clock some internal variables have to be
> >   updated as well, which can be reused for the clock switch basically
> >   by setting the clock immediately before the switch, so that both
> >   clocks run synchronously for a few cycles.
>
> Well, that's a little hand-wavy, but yes, being able to switch
> clocksources requires more code and I don't think that's a point of
> contention here. :)

Actually it's not so much hand-wavy, the time_was_set() in my example provides 
already almost everything to switch from one clock source to another.

> Sooo.. let's rework this a bit (also dropping the printks):
> 	u64 cycle_delta, nsec_delta
>
> 	rdtscll(cycles);
> 	while ((off = cycles - cycles_offset) >= update_cycles) {
> 		cycles_delta += update_cycles;
> 		nsec_delta += xtime_update;
>
> 		xtime_error += (u64)tick_nsec_curr << 22;
> 		xtime_error += (u64)time_adj_curr << (22 - SHIFT_SCALE);
> 		xtime_error -= xtime_update;
> 	}
> 	xtime_nsec += nsec_delta;
> 	cycle_offset += cycle_delta;
>
> 	while (xtime_nsec >= (u64)NSEC_PER_SEC << 22) {
> 		xtime_nsec -= (u64)NSEC_PER_SEC << 22;
> 		xtime.tv_sec++;
> 		second_overflow();
> 	}
> 	xtime.tv_nsec = xtime_nsec >> 22;
>
>
> Now lets look at my code in timeofday_periodic_hook():
>
> 	/* read time source & calc time since last call: */
> 	cycle_now = read_clocksource(clock);
>
> 	cycle_delta = (cycle_now - cycle_last) & clock->mask;
>
> 	delta_nsec = cyc2ns_fixed_rem(ts_interval, &cycle_delta, &remainder);
> 	cycle_last = (cycle_now - cycle_delta)&clock->mask;
>
> 	/* update system_time:  */
> 	__increment_system_time(delta_nsec);
>
>
> You might take a look at cyc2ns_fixed_rem() to see exactly how similar
> these are, but so far we're talking *really* close.

Your cyc2ns_fixed_rem() is still a bit more complex than my simple "nsec_delta 
+= xtime_update;", i.e. no special remainder handling. The xtime_error is not 
the same as your remainder (see below).

> > +	if (xtime_error > update_cycles / 2) {
> > [..]
>
> Ok, so here's where we start to see a difference. First it looks like
> this should be in a function. I broke what would be considered  similar
> functionality into the following two functions:
>
> 	ntp_advance(delta_nsec);
> and
> 	ppm = ntp_get_ppm_adjustment();
>
> Where ntp_advance() tells the NTP code to increment the NTP state values
> for the interval that just passed. This is very similar to the
> second_overflow() bits done in the first chunk of your code, only it is
> not fixed to only being run at the wall-time second boundery. Instead an
> arbitrary but constant second interval length is used (from boot time).

I think here is where you misunderstand my code. All the ntp_advance() 
business is already done in the previous loop and at a wall-time second 
boundary.
This part is equivalent to your ntp_get_ppm_adjustment() and could also be 
written as:

	nsec = error;
	nsec += (u64)tick_nsec_curr << 22;
	nsec += (u64)time_adj_curr << (22 - SHIFT_SCALE);

	mult = nsec / update_cycles;
	error = nsec % update_cycles;

You basically drop the error now, which is needed to keep the clock stable.
(Although to accurately maintain the error this also had to be done in the 
loop, i.e. xtime_error above).
My code now avoids the rather expensive recalculations and only has work to do 
if the error is larger than update_cycles/2 and the common case is that the 
multiplier only differs by one from the previous value.
So our code does indeed pretty much the same work, but I update all the values 
incrementally and avoid any possibly expensive calculations on any arch.

> The remaining functional difference here is the fact I mentioned above,
> where I'm using the NTP state values in a way that I feel is more direct
> in manipulating the frequency multiplier and where you're generating the
> multiplier adjustment via the difference in the current time from the
> reference time. But really, hide this difference away in a similar
> function to ntp_advance() and its all pretty moot.

The code could be moved of course into separate functions, although I'm not 
sure about the name ntp_advance(). tick_nsec/time_adj and second_overflow() 
are the only interfaces to the ntp code left and the rest are really clock 
specific parameters.
This code can now be basically turned into some template code, which can be 
optimised for every clock driver, so we can completely avoid the 
CONFIG_GENERIC_TIME, arch_getoffset() and is_continuous stuff.

> Does this sound about right? Are we really in this much agreement or has
> my looking forward to a full week off tinted my glasses rosy? :)

It seems a few details are missing, but overall it looks good. :)

bye, Roman

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH/RFC 10/10] example of simple continuous gettimeofday
  2005-12-25 20:54   ` Roman Zippel
@ 2006-01-20  4:01     ` john stultz
  0 siblings, 0 replies; 7+ messages in thread
From: john stultz @ 2006-01-20  4:01 UTC (permalink / raw)
  To: Roman Zippel; +Cc: linux-kernel

Hey Roman,
	Finally getting around continuing this discussion. Sorry for the long
wait.

On Sun, 2005-12-25 at 21:54 +0100, Roman Zippel wrote:
> On Thursday 22 December 2005 03:43, john stultz wrote:
> > Your design seems to suggest keeping an NTP calculated reference time
> > that the timekeeping code uses to correct itself from (again, let me
> > know if I'm mis-understanding you).  In my view this is a little
> > redundant, because fundamentally this is what the ntp daemon and
> > adjtimex is already doing. The daemon uses the server's reference time
> > and figures out how far off it is from the OS's system time. It does
> > some filtering and feeds that offset into adjtimex where it is dampened
> > and used to modify a frequency adjustment.
> 
> Indeed, the main part here is really the long term stability, my code corrects 
> small errors immediately, where you leave this to the next ntp 
> synchronisation step.

After re-reading your code a bit I think I do finally understand this
bit. Just to be sure (and for anyone else who cares to follow along),
I'll restate it.

So the aproximation of my current code is as follows:

/* Ignore sub-nanosecond remainder accounting.
 * Also ignore dealing with switching clocksources 
 * or clocksources that change freq.
 */

/* global state values */
u64 system_time

/* clocksource specific values */
u64 last
u32 mult, shift
s32 adj

u64 interval_cycles
u64 interval_nsecs


u64 cyc2ns(u64 cycles):
	return (cycles*(mult+adj))>>shift

u64 monotonic_clock():
	u64 now = read_cycles()
	u64 delta = now - last
	return system_time + cyc2ns(delta)


u64 cyc2ns_fixed(u64* cycles):
	u64 delta_nsec = 0;
	while (*cycles > interval_cycles):
		*cycles -= interval_cycles
		delta_nsec += interval_nsecs
	return delta_nsec

s32 ppm2adj(s32 ppm):
	int ret_adj
	u64 mult_adj = abs(ppm);

	/* convert from shifted ppm to clocksource mult units */
	mult_adj = signed_shift_right((mult_adj * mult),SHIFT_USEC)
	do_div(mult_adj, 1000000)
	ret_adj = (int)mult_adj
	return ret_adj;

periodic_hook():
	static int last_ppm;
	/* calculate interval, and accumulate time */
	u64 now = read_cycles()
	u64 delta_cycles = now - last
	u64 delta_nsec = cyc2ns_fixed(&delta_cycles)
	last = now - delta_cycles

	system_time += delta_nsec

	ntp_advance(delta_nsec)
	ppm = get_ntp_ppm()
	if (last_ppm != ppm):
		last_ppm = ppm
		/* accumulate left-over delta_cycles */
		delta_nsec = cyc2ns(delta_cycles)
		system_time += delta_nsec
		ntp_advance(delta_nsec)

		/* calculate new adj value & interval*/
		adj = ppm2adj(ppm)
		interval_nsecs = cyc2ns(interval_cycles)


The problem being: I take a high-resolution value from NTP (shifted ppm
value) and convert it to a fairly course multiplier adjustment. Thus any
small adjustment (smaller then a multiplier unit) from NTP is ignored
until the error grows large enough for the NTP daemon to notice and
correct back. 

Your suggested solution accumulates the adjustment error and
compensates, allowing for cumulative adjustments of fractional
multiplier units (for example,  three intervals w/ adj=0, then for one
interval adj=1 allowing for a multiplier adjustments of 1/4th).


So, we're in sync with the above....


My issue is, that your code to implement this, while computationally
very efficient, is very hard to read and understand. So I've tried to
re-implement it in logical chunks that are a bit easier to digest. So
using most of the above code:


u64 cyc2ns_fixed(u64* cycles, u64 ideal_interval, u64* errror):
	u64 delta_nsec = 0;
	while (*cycles > interval_cycles):
		*cycles -= interval_cycles
		delta_nsec += interval_nsecs
		*error += ideal_interval - interval_nsecs 
	return delta_nsec

int calculate_adj_factor(u64 error, u32 length):
	static int saved_error_adj
	int current_adj = saved_error_adj

	/* this is basically binary approximation */
	u64 adjusted_interval = (u64)length << current_adj
	if (ntp_error > (adjusted_interval * 2)):
		/* large error, so increment the adjustment factor */
		saved_error_adj++
		current_adj++
	else if (ntp_error > adjusted_interval):
		/* just right, don't touch it */
	else if (current_adj):
		/* small error, so drop the adjustment factor */
		saved_error_adj--
		current_adj = 0
	return current_adj


periodic_hook():
	static u64 adj_error;

	/* calculate interval, and accumulate time */
	u64 now = read_cycles()
	u64 delta_cycles = now - last
	u64 ntp_interval = ntp_get_interval()
	u64 delta_nsec = cyc2ns_fixed(&delta_cycles, ntp_interval,
					&adj_error)
	last = now - delta_cycles

	system_time += delta_nsec

	ntp_advance(delta_nsec)

	/* check if NTP adjutment error is too large */
	if (abs(adj_error) > interval_cycles/2):
		u32 adj_factor = calculate_adj_factor(abs(adj_error),
							interval_cycles)

		/* Before changing adj, accumulate the left over
		 * delta_cycles:
		 *
		 * XXX - You do this differently, avoiding the mult
		 * but to limit the discussion, here's the inefficient
		 * method:
		 */
		delta_nsec = cyc2ns(delta_cycles)
		system_time += delta_nsec
		ntp_advance(delta_nsec)

		/* change adj */
		if(adj_error > 0):
			adj += 1<<adj_factor
		else:
			adj -= 1<<adj_factor

		/* we can avoid the mult here using
		 * +/-(interval_cycles<<adj_factor), but for
		 * the sake of understandability:
		 */
		interval_nsecs = cyc2ns(interval_cycles)
		


Does this basically jive w/ what you are suggesting with regard to the
NTP error accumulation and adjustment?

thanks
-john


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2006-01-20  4:01 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-12-21 23:30 [PATCH/RFC 10/10] example of simple continuous gettimeofday Roman Zippel
2005-12-22  2:43 ` john stultz
2005-12-22  9:01   ` Ingo Molnar
2005-12-25 20:54   ` Roman Zippel
2006-01-20  4:01     ` john stultz
2005-12-22 10:29 ` Thomas Gleixner
2005-12-25 16:50   ` Roman Zippel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).