* [patch 0/9] x86_64: reliable TSC-based gettimeofday
@ 2007-02-01  9:59 jbohac
  2007-02-01  9:59 ` [patch 1/9] Fix HPET init race jbohac
                   ` (12 more replies)
  0 siblings, 13 replies; 68+ messages in thread
From: jbohac @ 2007-02-01  9:59 UTC (permalink / raw)
  To: Andi Kleen
  Cc: linux-kernel, Jiri Bohac, Vojtech Pavlik, ssouhlal, arjan, tglx,
	johnstul, zippel, andrea

TSC-based x86_64 timekeeping implementation
===========================================
by Vojtech Pavlik and Jiri Bohac

This implementation approximates the current time by reading the CPU's TSC,
even on SMP machines with unsynchronised TSCs. This gives us a very fast
gettimeofday() vsyscall on all SMP machines that either support the RDTSCP
instruction (AMD) or have synchronised TSCs (Intel).

Inter-CPU monotonicity cannot, however, be guaranteed in a vsyscall, so the
vsyscall is not used by default. Still, the syscall version of gettimeofday is
a lot faster when it uses the TSC approximation instead of other hardware timers.

At boot, either the PM timer or the HPET (preferred) is chosen as the "Master
Timer" (MT), from which all time is calculated. As reading either of these is
slow, we want to approximate the MT using the TSC.

Each CPU updates its idea of the real time in update_timer_caches() called from
the LAPIC ISR. This function reads the real value of the MT and updates the
per-CPU timekeeping variables accordingly. Each CPU maintains its own
"tsc_slope" (a ratio of the MT and TSC frequencies) and a couple of offsets,
allowing us to guess (using guess_mt()) the value of the MT at any time on any
CPU.  All this per-cpu data is kept in the vxtime structure.
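
In other words, the per-CPU approximation boils down to (simplified from
__guess_mt() in patch 6):

	/* extrapolate the MT from the TSC delta since the last LAPIC tick */
	mt = vxtime.cpu[cpu].mt_base +
	     (((tsc - vxtime.cpu[cpu].tsc_last) * vxtime.cpu[cpu].tsc_slope)
	      >> TSC_SLOPE_SCALE);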

Both the syscall and vsyscall versions of gettimeofday use the approximated
value of the MT to calculate the time elapsed since the last timer interrupt.
For this purpose, vxtime.mt_wall holds the value of the MT at the last timer
interrupt.
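
Schematically, this is what do_gettimeoffset() in patch 9 reduces to:

	/* nanoseconds elapsed since the last timer interrupt */
	offset_ns = ((s64)(guess_mt(tsc, cpu) - vxtime.mt_wall)
		     * (s64)vxtime.mt_q) >> 32;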

During a CPU frequency change, we cannot trust the TSCs. Therefore, when
we get the pre-change notification, we switch to using the hardware
Master Timer instead of the approximation by setting a flag in
vxtime.tsc_invalid. After the post-change notification we keep using the
hardware MT for a while, until the approximation becomes accurate again.
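
Condensed from time_cpufreq_notifier_on_cpu() in patch 8 (the slope-average
and offset updates are left out), the notifier essentially does:

	case CPUFREQ_PRECHANGE:
		/* remember the last TSC-based guess, stop trusting the TSC */
		vxtime.cpu[cpu].last_mt_guess =
			__guess_mt(get_cycles_sync(), cpu);
		vxtime.cpu[cpu].tsc_invalid = VXTIME_TSC_CPUFREQ;
		break;

	case CPUFREQ_POSTCHANGE:
		/* rescale the slope for the new frequency; keep reading the
		   hardware MT until time_update_mt_guess() sees the guess
		   converge and clears tsc_invalid */
		vxtime.cpu[cpu].tsc_slope = ((vxtime.cpu[cpu].tsc_slope >> 4)
					     * freq->old / freq->new) << 4;
		vxtime.cpu[cpu].tsc_invalid = VXTIME_TSC_INVALID;
		break;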

When strict inter-CPU monotonicity is not needed, the vsyscall version of
gettimeofday may be forced using the "nomonotonic" command line parameter.
gettimeofday()'s monotonicity is guaranteed on a single CPU even with the very
fast vsyscall version.  Across CPUs, the vsyscall version of gettimeofday is
not guaranteed to be monotonic, but it should be pretty close. Currently, we
get errors of tens/hundreds of microseconds.

We rely on neither the LAPIC timer nor the main timer interrupts firing at
regular intervals (although a small modification would improve the MT
approximation in that case), so we're basically ready for a tickless kernel.

A patch series follows. Comments welcome.

--
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, SUSE CZ


* [patch 1/9] Fix HPET init race
  2007-02-01  9:59 [patch 0/9] x86_64: reliable TSC-based gettimeofday jbohac
@ 2007-02-01  9:59 ` jbohac
  2007-02-02  2:34   ` Andrew Morton
  2007-02-01  9:59 ` [patch 2/9] Remove the support for the VXTIME_PMTMR timer mode jbohac
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 68+ messages in thread
From: jbohac @ 2007-02-01  9:59 UTC (permalink / raw)
  To: Andi Kleen
  Cc: linux-kernel, Jiri Bohac, Vojtech Pavlik, ssouhlal, arjan, tglx,
	johnstul, zippel, andrea

[-- Attachment #1: fix_hpet_init_race --]
[-- Type: text/plain, Size: 870 bytes --]

Fix a race in the initialization of HPET, which might result in a 
5 minute lockup on boot.

Signed-off-by: Jiri Bohac <jbohac@suse.cz>

Index: linux-2.6.20-rc5/arch/x86_64/kernel/apic.c
===================================================================
--- linux-2.6.20-rc5.orig/arch/x86_64/kernel/apic.c
+++ linux-2.6.20-rc5/arch/x86_64/kernel/apic.c
@@ -764,10 +767,12 @@ static void setup_APIC_timer(unsigned in
 
 	/* wait for irq slice */
  	if (vxtime.hpet_address && hpet_use_timer) {
- 		int trigger = hpet_readl(HPET_T0_CMP);
- 		while (hpet_readl(HPET_COUNTER) >= trigger)
- 			/* do nothing */ ;
- 		while (hpet_readl(HPET_COUNTER) <  trigger)
+		int trigger;
+		do
+			trigger = hpet_readl(HPET_T0_CMP);
+		while (hpet_readl(HPET_COUNTER) >= trigger);
+
+		while (hpet_readl(HPET_COUNTER) <  trigger)
  			/* do nothing */ ;
  	} else {
 		int c1, c2;

--


* [patch 2/9] Remove the support for the VXTIME_PMTMR timer mode
  2007-02-01  9:59 [patch 0/9] x86_64: reliable TSC-based gettimeofday jbohac
  2007-02-01  9:59 ` [patch 1/9] Fix HPET init race jbohac
@ 2007-02-01  9:59 ` jbohac
  2007-02-01 11:13   ` Andi Kleen
  2007-02-01  9:59 ` [patch 3/9] Remove the support for the VXTIME_HPET " jbohac
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 68+ messages in thread
From: jbohac @ 2007-02-01  9:59 UTC (permalink / raw)
  To: Andi Kleen
  Cc: linux-kernel, Jiri Bohac, Vojtech Pavlik, ssouhlal, arjan, tglx,
	johnstul, zippel, andrea

[-- Attachment #1: remove_vxtime_pmtmr --]
[-- Type: text/plain, Size: 5482 bytes --]

VXTIME_PMTMR will be replaced by a more generic "Master Timer"

Signed-off-by: Jiri Bohac <jbohac@suse.cz>

Index: linux-2.6.20-rc5/arch/x86_64/kernel/apic.c
===================================================================
--- linux-2.6.20-rc5.orig/arch/x86_64/kernel/apic.c
+++ linux-2.6.20-rc5/arch/x86_64/kernel/apic.c
@@ -784,16 +784,7 @@ static void setup_APIC_timer(unsigned in
 		} while (c2 - c1 < 300);
 	}
 	__setup_APIC_LVTT(clocks);
-	/* Turn off PIT interrupt if we use APIC timer as main timer.
-	   Only works with the PM timer right now
-	   TBD fix it for HPET too. */
-	if (vxtime.mode == VXTIME_PMTMR &&
-		smp_processor_id() == boot_cpu_id &&
-		apic_runs_main_timer == 1 &&
-		!cpu_isset(boot_cpu_id, timer_interrupt_broadcast_ipi_mask)) {
-		stop_timer_interrupt();
-		apic_runs_main_timer++;
-	}
+
 	local_irq_restore(flags);
 }
 
Index: linux-2.6.20-rc5/arch/x86_64/kernel/pmtimer.c
===================================================================
--- linux-2.6.20-rc5.orig/arch/x86_64/kernel/pmtimer.c
+++ linux-2.6.20-rc5/arch/x86_64/kernel/pmtimer.c
@@ -29,10 +29,6 @@
  * in arch/i386/kernel/acpi/boot.c */
 u32 pmtmr_ioport __read_mostly;
 
-/* value of the Power timer at last timer interrupt */
-static u32 offset_delay;
-static u32 last_pmtmr_tick;
-
 #define ACPI_PM_MASK 0xFFFFFF /* limit it to 24 bits */
 
 static inline u32 cyc2us(u32 cycles)
@@ -48,38 +44,6 @@ static inline u32 cyc2us(u32 cycles)
 	return (cycles >> 10);
 }
 
-int pmtimer_mark_offset(void)
-{
-	static int first_run = 1;
-	unsigned long tsc;
-	u32 lost;
-
-	u32 tick = inl(pmtmr_ioport);
-	u32 delta;
-
-	delta = cyc2us((tick - last_pmtmr_tick) & ACPI_PM_MASK);
-
-	last_pmtmr_tick = tick;
-	monotonic_base += delta * NSEC_PER_USEC;
-
-	delta += offset_delay;
-
-	lost = delta / (USEC_PER_SEC / HZ);
-	offset_delay = delta % (USEC_PER_SEC / HZ);
-
-	rdtscll(tsc);
-	vxtime.last_tsc = tsc - offset_delay * (u64)cpu_khz / 1000;
-
-	/* don't calculate delay for first run,
-	   or if we've got less then a tick */
-	if (first_run || (lost < 1)) {
-		first_run = 0;
-		offset_delay = 0;
-	}
-
-	return lost - 1;
-}
-
 static unsigned pmtimer_wait_tick(void)
 {
 	u32 a, b;
@@ -100,28 +64,3 @@ void pmtimer_wait(unsigned us)
 		cpu_relax();
 	} while (cyc2us(b - a) < us);
 }
-
-void pmtimer_resume(void)
-{
-	last_pmtmr_tick = inl(pmtmr_ioport);
-}
-
-unsigned int do_gettimeoffset_pm(void)
-{
-	u32 now, offset, delta = 0;
-
-	offset = last_pmtmr_tick;
-	now = inl(pmtmr_ioport);
-	delta = (now - offset) & ACPI_PM_MASK;
-
-	return offset_delay + cyc2us(delta);
-}
-
-
-static int __init nopmtimer_setup(char *s)
-{
-	pmtmr_ioport = 0;
-	return 1;
-}
-
-__setup("nopmtimer", nopmtimer_setup);
Index: linux-2.6.20-rc5/arch/x86_64/kernel/time.c
===================================================================
--- linux-2.6.20-rc5.orig/arch/x86_64/kernel/time.c
+++ linux-2.6.20-rc5/arch/x86_64/kernel/time.c
@@ -364,13 +364,6 @@ void main_timer_handler(void)
 		 */
 		offset = hpet_readl(HPET_T0_CMP) - hpet_tick;
 		delay = hpet_readl(HPET_COUNTER) - offset;
-	} else if (!pmtmr_ioport) {
-		spin_lock(&i8253_lock);
-		outb_p(0x00, 0x43);
-		delay = inb_p(0x40);
-		delay |= inb(0x40) << 8;
-		spin_unlock(&i8253_lock);
-		delay = LATCH - 1 - delay;
 	}
 
 	tsc = get_cycles_sync();
@@ -384,10 +377,6 @@ void main_timer_handler(void)
 			(offset - vxtime.last) * NSEC_PER_TICK / hpet_tick;
 
 		vxtime.last = offset;
-#ifdef CONFIG_X86_PM_TIMER
-	} else if (vxtime.mode == VXTIME_PMTMR) {
-		lost = pmtimer_mark_offset();
-#endif
 	} else {
 		offset = (((tsc - vxtime.last_tsc) *
 			   vxtime.tsc_quot) >> US_SCALE) - USEC_PER_TICK;
@@ -914,13 +903,6 @@ void __init time_init(void)
 	  	tick_nsec = TICK_NSEC_HPET;
 		cpu_khz = hpet_calibrate_tsc();
 		timename = "HPET";
-#ifdef CONFIG_X86_PM_TIMER
-	} else if (pmtmr_ioport && !vxtime.hpet_address) {
-		vxtime_hz = PM_TIMER_FREQUENCY;
-		timename = "PM";
-		pit_init();
-		cpu_khz = pit_calibrate_tsc();
-#endif
 	} else {
 		pit_init();
 		cpu_khz = pit_calibrate_tsc();
@@ -987,16 +969,6 @@ void time_init_gtod(void)
 			vxtime.last = hpet_readl(HPET_COUNTER);
 		vxtime.mode = VXTIME_HPET;
 		do_gettimeoffset = do_gettimeoffset_hpet;
-#ifdef CONFIG_X86_PM_TIMER
-	/* Using PM for gettimeofday is quite slow, but we have no other
-	   choice because the TSC is too unreliable on some systems. */
-	} else if (pmtmr_ioport && !vxtime.hpet_address && notsc) {
-		timetype = "PM";
-		do_gettimeoffset = do_gettimeoffset_pm;
-		vxtime.mode = VXTIME_PMTMR;
-		sysctl_vsyscall = 0;
-		printk(KERN_INFO "Disabling vsyscall due to use of PM timer\n");
-#endif
 	} else {
 		timetype = hpet_use_timer ? "HPET/TSC" : "PIT/TSC";
 		vxtime.mode = VXTIME_TSC;
@@ -1064,10 +1036,6 @@ static int timer_resume(struct sys_devic
 			vxtime.last = hpet_readl(HPET_T0_CMP) - hpet_tick;
 		else
 			vxtime.last = hpet_readl(HPET_COUNTER);
-#ifdef CONFIG_X86_PM_TIMER
-	} else if (vxtime.mode == VXTIME_PMTMR) {
-		pmtimer_resume();
-#endif
 	} else
 		vxtime.last_tsc = get_cycles_sync();
 	write_sequnlock_irqrestore(&xtime_lock,flags);
Index: linux-2.6.20-rc5/include/asm-x86_64/vsyscall.h
===================================================================
--- linux-2.6.20-rc5.orig/include/asm-x86_64/vsyscall.h
+++ linux-2.6.20-rc5/include/asm-x86_64/vsyscall.h
@@ -26,7 +26,6 @@ enum vsyscall_num {
 
 #define VXTIME_TSC	1
 #define VXTIME_HPET	2
-#define VXTIME_PMTMR	3
 
 #define VGETCPU_RDTSCP	1
 #define VGETCPU_LSL	2

--


* [patch 3/9] Remove the support for the VXTIME_HPET timer mode
  2007-02-01  9:59 [patch 0/9] x86_64: reliable TSC-based gettimeofday jbohac
  2007-02-01  9:59 ` [patch 1/9] Fix HPET init race jbohac
  2007-02-01  9:59 ` [patch 2/9] Remove the support for the VXTIME_PMTMR timer mode jbohac
@ 2007-02-01  9:59 ` jbohac
  2007-02-01  9:59 ` [patch 4/9] Remove the TSC synchronization on SMP machines jbohac
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 68+ messages in thread
From: jbohac @ 2007-02-01  9:59 UTC (permalink / raw)
  To: Andi Kleen
  Cc: linux-kernel, Jiri Bohac, Vojtech Pavlik, ssouhlal, arjan, tglx,
	johnstul, zippel, andrea

[-- Attachment #1: remove_vxtime_hpet --]
[-- Type: text/plain, Size: 5879 bytes --]

VXTIME_HPET will be replaced by a more generic "Master Timer"

Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Index: linux-2.6.20-rc5/arch/x86_64/kernel/time.c
===================================================================
--- linux-2.6.20-rc5.orig/arch/x86_64/kernel/time.c
+++ linux-2.6.20-rc5/arch/x86_64/kernel/time.c
@@ -101,13 +101,6 @@ static inline unsigned int do_gettimeoff
 	return x;
 }
 
-static inline unsigned int do_gettimeoffset_hpet(void)
-{
-	/* cap counter read to one tick to avoid inconsistencies */
-	unsigned long counter = hpet_readl(HPET_COUNTER) - vxtime.last;
-	return (min(counter,hpet_tick) * vxtime.quot) >> US_SCALE;
-}
-
 unsigned int (*do_gettimeoffset)(void) = do_gettimeoffset_tsc;
 
 /*
@@ -278,17 +271,6 @@ unsigned long long monotonic_clock(void)
  	u32 last_offset, this_offset, offset;
 	unsigned long long base;
 
-	if (vxtime.mode == VXTIME_HPET) {
-		do {
-			seq = read_seqbegin(&xtime_lock);
-
-			last_offset = vxtime.last;
-			base = monotonic_base;
-			this_offset = hpet_readl(HPET_COUNTER);
-		} while (read_seqretry(&xtime_lock, seq));
-		offset = (this_offset - last_offset);
-		offset *= NSEC_PER_TICK / hpet_tick;
-	} else {
 		do {
 			seq = read_seqbegin(&xtime_lock);
 
@@ -297,7 +279,6 @@ unsigned long long monotonic_clock(void)
 		} while (read_seqretry(&xtime_lock, seq));
 		this_offset = get_cycles_sync();
 		offset = cycles_2_ns(this_offset - last_offset);
-	}
 	return base + offset;
 }
 EXPORT_SYMBOL(monotonic_clock);
@@ -316,16 +297,6 @@ static noinline void handle_lost_ticks(i
 		       KERN_WARNING "Your time source seems to be instable or "
 		   		"some driver is hogging interupts\n");
 		print_symbol("rip %s\n", get_irq_regs()->rip);
-		if (vxtime.mode == VXTIME_TSC && vxtime.hpet_address) {
-			printk(KERN_WARNING "Falling back to HPET\n");
-			if (hpet_use_timer)
-				vxtime.last = hpet_readl(HPET_T0_CMP) - 
-							hpet_tick;
-			else
-				vxtime.last = hpet_readl(HPET_COUNTER);
-			vxtime.mode = VXTIME_HPET;
-			do_gettimeoffset = do_gettimeoffset_hpet;
-		}
 		/* else should fall back to PIT, but code missing. */
 		warned = 1;
 	} else
@@ -368,16 +339,6 @@ void main_timer_handler(void)
 
 	tsc = get_cycles_sync();
 
-	if (vxtime.mode == VXTIME_HPET) {
-		if (offset - vxtime.last > hpet_tick) {
-			lost = (offset - vxtime.last) / hpet_tick - 1;
-		}
-
-		monotonic_base += 
-			(offset - vxtime.last) * NSEC_PER_TICK / hpet_tick;
-
-		vxtime.last = offset;
-	} else {
 		offset = (((tsc - vxtime.last_tsc) *
 			   vxtime.tsc_quot) >> US_SCALE) - USEC_PER_TICK;
 
@@ -387,7 +348,6 @@ void main_timer_handler(void)
 		if (offset > USEC_PER_TICK) {
 			lost = offset / USEC_PER_TICK;
 			offset %= USEC_PER_TICK;
-		}
 
 		monotonic_base += cycles_2_ns(tsc - vxtime.last_tsc);
 
@@ -465,20 +425,6 @@ unsigned long long sched_clock(void)
 {
 	unsigned long a = 0;
 
-#if 0
-	/* Don't do a HPET read here. Using TSC always is much faster
-	   and HPET may not be mapped yet when the scheduler first runs.
-           Disadvantage is a small drift between CPUs in some configurations,
-	   but that should be tolerable. */
-	if (__vxtime.mode == VXTIME_HPET)
-		return (hpet_readl(HPET_COUNTER) * vxtime.quot) >> US_SCALE;
-#endif
-
-	/* Could do CPU core sync here. Opteron can execute rdtsc speculatively,
-	   which means it is not completely exact and may not be monotonous between
-	   CPUs. But the errors should be too small to matter for scheduling
-	   purposes. */
-
 	rdtscll(a);
 	return cycles_2_ns(a);
 }
@@ -961,18 +907,8 @@ void time_init_gtod(void)
 	else
 		vgetcpu_mode = VGETCPU_LSL;
 
-	if (vxtime.hpet_address && notsc) {
-		timetype = hpet_use_timer ? "HPET" : "PIT/HPET";
-		if (hpet_use_timer)
-			vxtime.last = hpet_readl(HPET_T0_CMP) - hpet_tick;
-		else
-			vxtime.last = hpet_readl(HPET_COUNTER);
-		vxtime.mode = VXTIME_HPET;
-		do_gettimeoffset = do_gettimeoffset_hpet;
-	} else {
 		timetype = hpet_use_timer ? "HPET/TSC" : "PIT/TSC";
 		vxtime.mode = VXTIME_TSC;
-	}
 
 	printk(KERN_INFO "time.c: Using %ld.%06ld MHz WALL %s GTOD %s timer.\n",
 	       vxtime_hz / 1000000, vxtime_hz % 1000000, timename, timetype);
@@ -1031,12 +967,6 @@ static int timer_resume(struct sys_devic
 	write_seqlock_irqsave(&xtime_lock,flags);
 	xtime.tv_sec = sec;
 	xtime.tv_nsec = 0;
-	if (vxtime.mode == VXTIME_HPET) {
-		if (hpet_use_timer)
-			vxtime.last = hpet_readl(HPET_T0_CMP) - hpet_tick;
-		else
-			vxtime.last = hpet_readl(HPET_COUNTER);
-	} else
 		vxtime.last_tsc = get_cycles_sync();
 	write_sequnlock_irqrestore(&xtime_lock,flags);
 	jiffies += sleep_length;
Index: linux-2.6.20-rc5/arch/x86_64/kernel/vsyscall.c
===================================================================
--- linux-2.6.20-rc5.orig/arch/x86_64/kernel/vsyscall.c
+++ linux-2.6.20-rc5/arch/x86_64/kernel/vsyscall.c
@@ -72,18 +72,9 @@ static __always_inline void do_vgettimeo
 		sec = __xtime.tv_sec;
 		usec = __xtime.tv_nsec / 1000;
 
-		if (__vxtime.mode != VXTIME_HPET) {
-			t = get_cycles_sync();
-			if (t < __vxtime.last_tsc)
-				t = __vxtime.last_tsc;
-			usec += ((t - __vxtime.last_tsc) *
-				 __vxtime.tsc_quot) >> 32;
-			/* See comment in x86_64 do_gettimeofday. */
-		} else {
 			usec += ((readl((void __iomem *)
 				   fix_to_virt(VSYSCALL_HPET) + 0xf0) -
 				  __vxtime.last) * __vxtime.quot) >> 32;
-		}
 	} while (read_seqretry(&__xtime_lock, sequence));
 
 	tv->tv_sec = sec + usec / 1000000;
Index: linux-2.6.20-rc5/include/asm-x86_64/vsyscall.h
===================================================================
--- linux-2.6.20-rc5.orig/include/asm-x86_64/vsyscall.h
+++ linux-2.6.20-rc5/include/asm-x86_64/vsyscall.h
@@ -25,7 +25,6 @@ enum vsyscall_num {
 #define __section_xtime_lock __attribute__ ((unused, __section__ (".xtime_lock"), aligned(16)))
 
 #define VXTIME_TSC	1
-#define VXTIME_HPET	2
 
 #define VGETCPU_RDTSCP	1
 #define VGETCPU_LSL	2

--


* [patch 4/9] Remove the TSC synchronization on SMP machines
  2007-02-01  9:59 [patch 0/9] x86_64: reliable TSC-based gettimeofday jbohac
                   ` (2 preceding siblings ...)
  2007-02-01  9:59 ` [patch 3/9] Remove the support for the VXTIME_HPET " jbohac
@ 2007-02-01  9:59 ` jbohac
  2007-02-01 11:14   ` Andi Kleen
  2007-02-03  1:16   ` H. Peter Anvin
  2007-02-01  9:59 ` [patch 5/9] Add all the necessary structures to the vsyscall page jbohac
                   ` (8 subsequent siblings)
  12 siblings, 2 replies; 68+ messages in thread
From: jbohac @ 2007-02-01  9:59 UTC (permalink / raw)
  To: Andi Kleen
  Cc: linux-kernel, Jiri Bohac, Vojtech Pavlik, ssouhlal, arjan, tglx,
	johnstul, zippel, andrea

[-- Attachment #1: remove_tsc_synchronization --]
[-- Type: text/plain, Size: 7305 bytes --]

The TSC is either synchronized by design or too unreliable to be used for
anything, let alone timekeeping.

Signed-off-by: Jiri Bohac <jbohac@suse.cz>

Index: linux-2.6.20-rc5/arch/x86_64/kernel/smpboot.c
===================================================================
--- linux-2.6.20-rc5.orig/arch/x86_64/kernel/smpboot.c
+++ linux-2.6.20-rc5/arch/x86_64/kernel/smpboot.c
@@ -148,217 +148,6 @@ static void __cpuinit smp_store_cpu_info
 	print_cpu_info(c);
 }
 
-/*
- * New Funky TSC sync algorithm borrowed from IA64.
- * Main advantage is that it doesn't reset the TSCs fully and
- * in general looks more robust and it works better than my earlier
- * attempts. I believe it was written by David Mosberger. Some minor
- * adjustments for x86-64 by me -AK
- *
- * Original comment reproduced below.
- *
- * Synchronize TSC of the current (slave) CPU with the TSC of the
- * MASTER CPU (normally the time-keeper CPU).  We use a closed loop to
- * eliminate the possibility of unaccounted-for errors (such as
- * getting a machine check in the middle of a calibration step).  The
- * basic idea is for the slave to ask the master what itc value it has
- * and to read its own itc before and after the master responds.  Each
- * iteration gives us three timestamps:
- *
- *	slave		master
- *
- *	t0 ---\
- *             ---\
- *		   --->
- *			tm
- *		   /---
- *	       /---
- *	t1 <---
- *
- *
- * The goal is to adjust the slave's TSC such that tm falls exactly
- * half-way between t0 and t1.  If we achieve this, the clocks are
- * synchronized provided the interconnect between the slave and the
- * master is symmetric.  Even if the interconnect were asymmetric, we
- * would still know that the synchronization error is smaller than the
- * roundtrip latency (t0 - t1).
- *
- * When the interconnect is quiet and symmetric, this lets us
- * synchronize the TSC to within one or two cycles.  However, we can
- * only *guarantee* that the synchronization is accurate to within a
- * round-trip time, which is typically in the range of several hundred
- * cycles (e.g., ~500 cycles).  In practice, this means that the TSCs
- * are usually almost perfectly synchronized, but we shouldn't assume
- * that the accuracy is much better than half a micro second or so.
- *
- * [there are other errors like the latency of RDTSC and of the
- * WRMSR. These can also account to hundreds of cycles. So it's
- * probably worse. It claims 153 cycles error on a dual Opteron,
- * but I suspect the numbers are actually somewhat worse -AK]
- */
-
-#define MASTER	0
-#define SLAVE	(SMP_CACHE_BYTES/8)
-
-/* Intentionally don't use cpu_relax() while TSC synchronization
-   because we don't want to go into funky power save modi or cause
-   hypervisors to schedule us away.  Going to sleep would likely affect
-   latency and low latency is the primary objective here. -AK */
-#define no_cpu_relax() barrier()
-
-static __cpuinitdata DEFINE_SPINLOCK(tsc_sync_lock);
-static volatile __cpuinitdata unsigned long go[SLAVE + 1];
-static int notscsync __cpuinitdata;
-
-#undef DEBUG_TSC_SYNC
-
-#define NUM_ROUNDS	64	/* magic value */
-#define NUM_ITERS	5	/* likewise */
-
-/* Callback on boot CPU */
-static __cpuinit void sync_master(void *arg)
-{
-	unsigned long flags, i;
-
-	go[MASTER] = 0;
-
-	local_irq_save(flags);
-	{
-		for (i = 0; i < NUM_ROUNDS*NUM_ITERS; ++i) {
-			while (!go[MASTER])
-				no_cpu_relax();
-			go[MASTER] = 0;
-			rdtscll(go[SLAVE]);
-		}
-	}
-	local_irq_restore(flags);
-}
-
-/*
- * Return the number of cycles by which our tsc differs from the tsc
- * on the master (time-keeper) CPU.  A positive number indicates our
- * tsc is ahead of the master, negative that it is behind.
- */
-static inline long
-get_delta(long *rt, long *master)
-{
-	unsigned long best_t0 = 0, best_t1 = ~0UL, best_tm = 0;
-	unsigned long tcenter, t0, t1, tm;
-	int i;
-
-	for (i = 0; i < NUM_ITERS; ++i) {
-		rdtscll(t0);
-		go[MASTER] = 1;
-		while (!(tm = go[SLAVE]))
-			no_cpu_relax();
-		go[SLAVE] = 0;
-		rdtscll(t1);
-
-		if (t1 - t0 < best_t1 - best_t0)
-			best_t0 = t0, best_t1 = t1, best_tm = tm;
-	}
-
-	*rt = best_t1 - best_t0;
-	*master = best_tm - best_t0;
-
-	/* average best_t0 and best_t1 without overflow: */
-	tcenter = (best_t0/2 + best_t1/2);
-	if (best_t0 % 2 + best_t1 % 2 == 2)
-		++tcenter;
-	return tcenter - best_tm;
-}
-
-static __cpuinit void sync_tsc(unsigned int master)
-{
-	int i, done = 0;
-	long delta, adj, adjust_latency = 0;
-	unsigned long flags, rt, master_time_stamp, bound;
-#ifdef DEBUG_TSC_SYNC
-	static struct syncdebug {
-		long rt;	/* roundtrip time */
-		long master;	/* master's timestamp */
-		long diff;	/* difference between midpoint and master's timestamp */
-		long lat;	/* estimate of tsc adjustment latency */
-	} t[NUM_ROUNDS] __cpuinitdata;
-#endif
-
-	printk(KERN_INFO "CPU %d: Syncing TSC to CPU %u.\n",
-		smp_processor_id(), master);
-
-	go[MASTER] = 1;
-
-	/* It is dangerous to broadcast IPI as cpus are coming up,
-	 * as they may not be ready to accept them.  So since
-	 * we only need to send the ipi to the boot cpu direct
-	 * the message, and avoid the race.
-	 */
-	smp_call_function_single(master, sync_master, NULL, 1, 0);
-
-	while (go[MASTER])	/* wait for master to be ready */
-		no_cpu_relax();
-
-	spin_lock_irqsave(&tsc_sync_lock, flags);
-	{
-		for (i = 0; i < NUM_ROUNDS; ++i) {
-			delta = get_delta(&rt, &master_time_stamp);
-			if (delta == 0) {
-				done = 1;	/* let's lock on to this... */
-				bound = rt;
-			}
-
-			if (!done) {
-				unsigned long t;
-				if (i > 0) {
-					adjust_latency += -delta;
-					adj = -delta + adjust_latency/4;
-				} else
-					adj = -delta;
-
-				rdtscll(t);
-				wrmsrl(MSR_IA32_TSC, t + adj);
-			}
-#ifdef DEBUG_TSC_SYNC
-			t[i].rt = rt;
-			t[i].master = master_time_stamp;
-			t[i].diff = delta;
-			t[i].lat = adjust_latency/4;
-#endif
-		}
-	}
-	spin_unlock_irqrestore(&tsc_sync_lock, flags);
-
-#ifdef DEBUG_TSC_SYNC
-	for (i = 0; i < NUM_ROUNDS; ++i)
-		printk("rt=%5ld master=%5ld diff=%5ld adjlat=%5ld\n",
-		       t[i].rt, t[i].master, t[i].diff, t[i].lat);
-#endif
-
-	printk(KERN_INFO
-	       "CPU %d: synchronized TSC with CPU %u (last diff %ld cycles, "
-	       "maxerr %lu cycles)\n",
-	       smp_processor_id(), master, delta, rt);
-}
-
-static void __cpuinit tsc_sync_wait(void)
-{
-	/*
-	 * When the CPU has synchronized TSCs assume the BIOS
-  	 * or the hardware already synced.  Otherwise we could
-	 * mess up a possible perfect synchronization with a
-	 * not-quite-perfect algorithm.
-	 */
-	if (notscsync || !cpu_has_tsc || !unsynchronized_tsc())
-		return;
-	sync_tsc(0);
-}
-
-static __init int notscsync_setup(char *s)
-{
-	notscsync = 1;
-	return 1;
-}
-__setup("notscsync", notscsync_setup);
-
 static atomic_t init_deasserted __cpuinitdata;
 
 /*
@@ -565,14 +354,6 @@ void __cpuinit start_secondary(void)
 	 */
 	set_cpu_sibling_map(smp_processor_id());
 
-	/* 
-  	 * Wait for TSC sync to not schedule things before.
-	 * We still process interrupts, which could see an inconsistent
-	 * time in that window unfortunately. 
-	 * Do this here because TSC sync has global unprotected state.
- 	 */
-	tsc_sync_wait();
-
 	/*
 	 * We need to hold call_lock, so there is no inconsistency
 	 * between the time smp_call_function() determines number of

--


* [patch 5/9] Add all the necessary structures to the vsyscall page
  2007-02-01  9:59 [patch 0/9] x86_64: reliable TSC-based gettimeofday jbohac
                   ` (3 preceding siblings ...)
  2007-02-01  9:59 ` [patch 4/9] Remove the TSC synchronization on SMP machines jbohac
@ 2007-02-01  9:59 ` jbohac
  2007-02-01 11:17   ` Andi Kleen
  2007-02-01  9:59 ` [patch 6/9] Add the "Master Timer" jbohac
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 68+ messages in thread
From: jbohac @ 2007-02-01  9:59 UTC (permalink / raw)
  To: Andi Kleen
  Cc: linux-kernel, Jiri Bohac, Vojtech Pavlik, ssouhlal, arjan, tglx,
	johnstul, zippel, andrea

[-- Attachment #1: prepare_vsyscall --]
[-- Type: text/plain, Size: 5096 bytes --]

The TSC-based Master Timer approximation code will need a couple of
per-CPU offsets and coefficients to approximate the value of a hardware
"Master Timer" from the value of the TSC on whichever CPU it happens to be
running.

We want to be able to do these approximations in a vsyscall, so we need
all this data in vsyscall-mapped pages.

Signed-off-by: Jiri Bohac <jbohac@suse.cz>


Index: linux-2.6.20-rc5/arch/x86_64/kernel/vsyscall.c
===================================================================
--- linux-2.6.20-rc5.orig/arch/x86_64/kernel/vsyscall.c
+++ linux-2.6.20-rc5/arch/x86_64/kernel/vsyscall.c
@@ -278,9 +278,11 @@ static void __init map_vsyscall(void)
 {
 	extern char __vsyscall_0;
 	unsigned long physaddr_page0 = __pa_symbol(&__vsyscall_0);
+	int i;
 
 	/* Note that VSYSCALL_MAPPED_PAGES must agree with the code below. */
-	__set_fixmap(VSYSCALL_FIRST_PAGE, physaddr_page0, PAGE_KERNEL_VSYSCALL);
+	for (i = 0; i < VSYSCALL_MAPPED_PAGES; ++i)
+		__set_fixmap(VSYSCALL_FIRST_PAGE - i, physaddr_page0 + (i << 12), PAGE_KERNEL_VSYSCALL);
 }
 
 static int __init vsyscall_init(void)
Index: linux-2.6.20-rc5/include/asm-x86_64/vsyscall.h
===================================================================
--- linux-2.6.20-rc5.orig/include/asm-x86_64/vsyscall.h
+++ linux-2.6.20-rc5/include/asm-x86_64/vsyscall.h
@@ -10,7 +10,6 @@ enum vsyscall_num {
 #define VSYSCALL_START (-10UL << 20)
 #define VSYSCALL_SIZE 1024
 #define VSYSCALL_END (-2UL << 20)
-#define VSYSCALL_MAPPED_PAGES 1
 #define VSYSCALL_ADDR(vsyscall_nr) (VSYSCALL_START+VSYSCALL_SIZE*(vsyscall_nr))
 
 #ifdef __KERNEL__
@@ -24,19 +23,40 @@ enum vsyscall_num {
 #define __section_xtime __attribute__ ((unused, __section__ (".xtime"), aligned(16)))
 #define __section_xtime_lock __attribute__ ((unused, __section__ (".xtime_lock"), aligned(16)))
 
-#define VXTIME_TSC	1
+#define VXTIME_TSC	1	/* estimate MT based on rdtsc (fast; UP or synced SMP only) */
+#define VXTIME_TSCS	2	/* estimate MT based on rdtsc in a syscall with locking */
+#define VXTIME_TSCM	3	/* estimate MT based on rdtsc in a syscall, ensure monotonicity */
+#define VXTIME_TSCP	4	/* estimate MT with the help of rdtscp (fast) */
+#define VXTIME_MT	5	/* read the MT, don't estimate (slowest) */
+
+#define VXTIME_TSC_INVALID	0x1
+#define VXTIME_TSC_CPUFREQ	0x2
 
 #define VGETCPU_RDTSCP	1
 #define VGETCPU_LSL	2
 
 struct vxtime_data {
+	union {
+		struct {
+			u64 tsc_slope;		/* TSC to MT coefficient */
+			u64 tsc_slope_avg;	/* average tsc_slope */
+			u64 mt_base;		/* approximated MT at the last LAPIC tick */
+			u64 mt_last;		/* MT at the last LAPIC tick */
+			u64 tsc_last;		/* TSC at the last LAPIC tick */
+			u64 last_mt_guess;	/* ensures monotonicity in temporary MT mode */
+			char tsc_invalid;	/* don't trust the TSC now (frequency changing) */
+		};
+		char pad[64];	/* cacheline alignment */
+	} cpu[NR_CPUS];
 	long hpet_address;	/* HPET base address */
-	int last;
-	unsigned long last_tsc;
-	long quot;
-	long tsc_quot;
+	u64 mt_q;		/* master timer to nsec quotient */
+	u64 mt_wall;		/* MT ticks already covered by the jiffies */
+	s64 ns_drift;		/* MT - xtime drift in the last tick in ns */
 	int mode;
 };
+#define VSYSCALL_MAPPED_PAGES (1 + (sizeof(struct vxtime_data) + 4095) / 4096)
+
+#define TSC_SLOPE_SCALE	32
 
 #define hpet_readl(a)           readl((const void __iomem *)fix_to_virt(FIX_HPET_BASE) + a)
 #define hpet_writel(d,a)        writel(d, (void __iomem *)fix_to_virt(FIX_HPET_BASE) + a)
Index: linux-2.6.20-rc5/arch/x86_64/kernel/vmlinux.lds.S
===================================================================
--- linux-2.6.20-rc5.orig/arch/x86_64/kernel/vmlinux.lds.S
+++ linux-2.6.20-rc5/arch/x86_64/kernel/vmlinux.lds.S
@@ -87,13 +87,13 @@ SECTIONS
   .vsyscall_0 :	 AT(VSYSCALL_PHYS_ADDR) { *(.vsyscall_0) } :user
   __vsyscall_0 = VSYSCALL_VIRT_ADDR;
 
+  .vsyscall_1 ADDR(.vsyscall_0) + 1024: AT(VLOAD(.vsyscall_1)) { *(.vsyscall_1) }
+  .vsyscall_2 ADDR(.vsyscall_0) + 2048: AT(VLOAD(.vsyscall_2)) { *(.vsyscall_2) }
+  .vsyscall_3 ADDR(.vsyscall_0) + 3072: AT(VLOAD(.vsyscall_3)) { *(.vsyscall_3) }
   . = ALIGN(CONFIG_X86_L1_CACHE_BYTES);
   .xtime_lock : AT(VLOAD(.xtime_lock)) { *(.xtime_lock) }
   xtime_lock = VVIRT(.xtime_lock);
 
-  .vxtime : AT(VLOAD(.vxtime)) { *(.vxtime) }
-  vxtime = VVIRT(.vxtime);
-
   .vgetcpu_mode : AT(VLOAD(.vgetcpu_mode)) { *(.vgetcpu_mode) }
   vgetcpu_mode = VVIRT(.vgetcpu_mode);
 
@@ -110,11 +110,15 @@ SECTIONS
   .jiffies : AT(VLOAD(.jiffies)) { *(.jiffies) }
   jiffies = VVIRT(.jiffies);
 
-  .vsyscall_1 ADDR(.vsyscall_0) + 1024: AT(VLOAD(.vsyscall_1)) { *(.vsyscall_1) }
-  .vsyscall_2 ADDR(.vsyscall_0) + 2048: AT(VLOAD(.vsyscall_2)) { *(.vsyscall_2) }
-  .vsyscall_3 ADDR(.vsyscall_0) + 3072: AT(VLOAD(.vsyscall_3)) { *(.vsyscall_3) }
+  . = ALIGN(CONFIG_X86_L1_CACHE_BYTES);
+
+  . = ALIGN(4096);
+
+  .vxtime : AT(VLOAD(.vxtime)) { *(.vxtime) }
+  vxtime = VVIRT(.vxtime);
+
+  . = (VSYSCALL_VIRT_ADDR + 4096 + (CONFIG_NR_CPUS * 64 + 40 + 4095)) & ~(4095);
 
-  . = VSYSCALL_VIRT_ADDR + 4096;
 
 #undef VSYSCALL_ADDR
 #undef VSYSCALL_PHYS_ADDR

--


* [patch 6/9] Add the "Master Timer"
  2007-02-01  9:59 [patch 0/9] x86_64: reliable TSC-based gettimeofday jbohac
                   ` (4 preceding siblings ...)
  2007-02-01  9:59 ` [patch 5/9] Add all the necessary structures to the vsyscall page jbohac
@ 2007-02-01  9:59 ` jbohac
  2007-02-01 11:22   ` Andi Kleen
  2007-02-01  9:59 ` [patch 7/9] Adapt the time initialization code jbohac
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 68+ messages in thread
From: jbohac @ 2007-02-01  9:59 UTC (permalink / raw)
  To: Andi Kleen
  Cc: linux-kernel, Jiri Bohac, Vojtech Pavlik, ssouhlal, arjan, tglx,
	johnstul, zippel, andrea

[-- Attachment #1: add_master_timer --]
[-- Type: text/plain, Size: 5514 bytes --]

The Master Timer (MT) is a reliable, monotonic, constantly growing 64-bit
timer. At present, either the PM timer or the HPET can be used as the Master
Timer.

Neither of them is 64-bit (the HPET might be, but not always), so we access
them through the get_master_timer64() and update_master_timer64() functions
that take care of the wraparounds. update_master_timer64() needs to be
called once in a while, at least once every period of the corresponding
hardware timer (a couple of minutes for the HPET, roughly 3-4 seconds for the
PM timer). This will be done from the main timer handler.

While the hardware MT is reliable and monotonic, it is slow to read. We
want to approximate it using the TSC. guess_mt() does just that, using a
lot of per-CPU calibration data.
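
To illustrate the wraparound handling: if __mt_last == 0xffffff00 and
read_master_timer() now returns 0x00000040 (i.e. the 32-bit value has just
wrapped), the unsigned subtraction

	delta = now - __mt_last;	/* = 0x140 despite the wraparound */

still yields the correct number of ticks, which is then added to the 64-bit
__mt. This only works if no more than one wraparound happens between two
updates, hence the requirement above.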

Signed-off-by: Jiri Bohac <jbohac@suse.cz>

Index: linux-2.6.20-rc5/arch/x86_64/kernel/time.c
===================================================================
--- linux-2.6.20-rc5.orig/arch/x86_64/kernel/time.c
+++ linux-2.6.20-rc5/arch/x86_64/kernel/time.c
@@ -54,9 +54,14 @@ static char *timename = NULL;
 DEFINE_SPINLOCK(rtc_lock);
 EXPORT_SYMBOL(rtc_lock);
 DEFINE_SPINLOCK(i8253_lock);
+DEFINE_SEQLOCK(mt_lock);
+
+DEFINE_SPINLOCK(monotonic_mt_lock);
+static u64 last_monotonic_mt;
 
 int nohpet __initdata = 0;
 static int notsc __initdata = 0;
+static int nomonotonic __initdata = 0;
 
 #define USEC_PER_TICK (USEC_PER_SEC / HZ)
 #define NSEC_PER_TICK (NSEC_PER_SEC / HZ)
@@ -65,14 +70,18 @@ static int notsc __initdata = 0;
 #define NS_SCALE	10 /* 2^10, carefully chosen */
 #define US_SCALE	32 /* 2^32, arbitralrily chosen */
 
-unsigned int cpu_khz;					/* TSC clocks / usec, not used here */
+unsigned int cpu_khz;		/* TSC clocks / usec, not used here */
+static s64 mt_per_tick;		/* master timer ticks per jiffie */
+static u64 __mt;		/* master timer */
+static u32 __mt_last;		/* value last read from read_master_timer() when updating timer caches */
+
+u32 (*read_master_timer)(void);
+
 EXPORT_SYMBOL(cpu_khz);
 static unsigned long hpet_period;			/* fsecs / HPET clock */
 unsigned long hpet_tick;				/* HPET clocks / interrupt */
 int hpet_use_timer;				/* Use counter of hpet for time keeping, otherwise PIT */
-unsigned long vxtime_hz = PIT_TICK_RATE;
 int report_lost_ticks;				/* command line option */
-unsigned long long monotonic_base;
 
 struct vxtime_data __vxtime __section_vxtime;	/* for vsyscalls */
 
@@ -80,6 +89,137 @@ volatile unsigned long __jiffies __secti
 struct timespec __xtime __section_xtime;
 struct timezone __sys_tz __section_sys_tz;
 
+#define TSC_SLOPE_DECAY	16
+
+
+/*
+ * set the 64-bit master timer to a given value
+ */
+static inline void set_master_timer64(u64 t)
+{
+	unsigned long flags;
+
+	write_seqlock_irqsave(&mt_lock, flags);
+
+	__mt_last = read_master_timer();
+	__mt = t;
+
+	write_sequnlock_irqrestore(&mt_lock, flags);
+}
+
+/*
+ * add/subtract a number of ticks from the 64-bit master timer
+ */
+static inline void add_master_timer64(s64 t)
+{
+	unsigned long flags;
+	write_seqlock_irqsave(&mt_lock, flags);
+	__mt += t;
+	write_sequnlock_irqrestore(&mt_lock, flags);
+}
+
+/*
+ * get the 64-bit non-overflowing master timer based on current
+ * master timer reading
+ */
+static u64 get_master_timer64(void)
+{
+	u64 ret;
+	u32 delta, now;
+	unsigned long seq;
+	do {
+		seq = read_seqbegin(&mt_lock);
+
+		now = read_master_timer();
+		delta = now - __mt_last;
+		ret = __mt + delta;
+
+
+	} while (read_seqretry(&mt_lock, seq));
+
+	return ret;
+}
+
+/*
+ * get and update the 64-bit non-overflowing master timer based on current
+ * master timer reading
+ *
+ * This needs to be called often enough to prevent the MT from overflowing.
+ * Doing this from the main timer handler is enough. Other places can call
+ * get_master_timer64() instead, avoiding unnecessary contention.
+ */
+static u64 update_master_timer64(void)
+{
+	u32 delta, now;
+	unsigned long flags;
+	write_seqlock_irqsave(&mt_lock, flags);
+
+	now = read_master_timer();
+	delta = now - __mt_last;
+	__mt_last = now;
+	__mt += delta;
+
+	write_sequnlock_irqrestore(&mt_lock, flags);
+
+	return __mt;
+}
+
+/*
+ * estimates the current value of the master timer, based on the TSC
+ */
+static inline u64 __guess_mt(u64 tsc, int cpu)
+{
+	return (((tsc - vxtime.cpu[cpu].tsc_last) * vxtime.cpu[cpu].tsc_slope)
+			>> TSC_SLOPE_SCALE) + vxtime.cpu[cpu].mt_base;
+}
+
+/*
+ * estimates the current value of the master timer, based on the TSC
+ * and corrects the estimate to make it monotonic even across CPUs if needed.
+ */
+
+static inline u64 guess_mt(u64 tsc, int cpu)
+{
+	u64 mt;
+
+	if (unlikely(vxtime.mode == VXTIME_MT || vxtime.cpu[cpu].tsc_invalid))
+		mt = max(get_master_timer64(), vxtime.cpu[cpu].last_mt_guess);
+	else
+		mt = __guess_mt(tsc, cpu);
+
+	if (mt < last_monotonic_mt)
+		mt = last_monotonic_mt;
+
+	return mt;
+}
+
+static inline void update_monotonic_mt(u64 mt)
+{
+	unsigned long flags;
+
+	if (vxtime.mode != VXTIME_TSCM)
+		return;
+
+	spin_lock_irqsave(&monotonic_mt_lock, flags);
+
+	if (mt > last_monotonic_mt)
+		last_monotonic_mt = mt;
+
+	spin_unlock_irqrestore(&monotonic_mt_lock, flags);
+}
+
+static u32 read_master_timer_hpet(void)
+{
+	return hpet_readl(HPET_COUNTER);
+}
+
+static u32 read_master_timer_pm(void)
+{
+	/* the shift ensures u32 wraparound at the time	of
+	   the 24-bit counter wraparound */
+	return inl(pmtmr_ioport) << 8;
+}
+
 /*
  * do_gettimeoffset() returns microseconds since last timer interrupt was
  * triggered by hardware. A memory read of HPET is slower than a register read

--


* [patch 7/9] Adapt the time initialization code
  2007-02-01  9:59 [patch 0/9] x86_64: reliable TSC-based gettimeofday jbohac
                   ` (5 preceding siblings ...)
  2007-02-01  9:59 ` [patch 6/9] Add the "Master Timer" jbohac
@ 2007-02-01  9:59 ` jbohac
  2007-02-01 11:26   ` Andi Kleen
  2007-02-01 10:00 ` [patch 8/9] Add time_update_mt_guess() jbohac
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 68+ messages in thread
From: jbohac @ 2007-02-01  9:59 UTC (permalink / raw)
  To: Andi Kleen
  Cc: linux-kernel, Jiri Bohac, Vojtech Pavlik, ssouhlal, arjan, tglx,
	johnstul, zippel, andrea

[-- Attachment #1: time_init --]
[-- Type: text/plain, Size: 8230 bytes --]

We need to:
 - initialize the RDTSCP instruction if available
 - decide which hardware timer to use as MT (PM or HPET)
 - decide what level of TSC->MT approximation to use
   - none (as slow as the hardware MT can get) -- "notsc" option
   - strictly monotonic (can't be done in a vsyscall) -- default
   - unguaranteed monotonicity (very fast, vsyscall) -- "nomonotonic" option
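
In terms of vxtime.mode, this maps roughly to (see time_init_gtod() in the
diff below):

	notsc                          -> VXTIME_MT   (read the MT directly, slowest)
	RDTSCP, "nomonotonic"          -> VXTIME_TSCP (estimate using rdtscp, fast)
	RDTSCP, default                -> VXTIME_TSCM (rdtsc in a syscall, monotonic)
	no RDTSCP, unsynced TSCs,
	           "nomonotonic"       -> VXTIME_TSCS (rdtsc in a syscall, with locking)
	no RDTSCP, unsynced TSCs       -> VXTIME_TSCM (rdtsc in a syscall, monotonic)
	no RDTSCP, synced TSCs or UP   -> VXTIME_TSC  (plain rdtsc, fast)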

Signed-off-by: Jiri Bohac <jbohac@suse.cz>

Index: linux-2.6.20-rc5/arch/x86_64/kernel/smpboot.c
===================================================================
--- linux-2.6.20-rc5.orig/arch/x86_64/kernel/smpboot.c
+++ linux-2.6.20-rc5/arch/x86_64/kernel/smpboot.c
@@ -318,6 +318,8 @@ static inline void set_cpu_sibling_map(i
 	}
 }
 
+extern void time_initialize_cpu(void);
+
 /*
  * Setup code on secondary processor (after comming out of the trampoline)
  */
@@ -345,6 +347,7 @@ void __cpuinit start_secondary(void)
 		enable_NMI_through_LVT0(NULL);
 		enable_8259A_irq(0);
 	}
+	time_initialize_cpu();
 
 	enable_APIC_timer();
 
@@ -963,6 +966,8 @@ int __cpuinit __cpu_up(unsigned int cpu)
 	return err;
 }
 
+extern void time_init_gtod(void);
+
 /*
  * Finish the SMP boot.
  */
Index: linux-2.6.20-rc5/arch/x86_64/kernel/time.c
===================================================================
--- linux-2.6.20-rc5.orig/arch/x86_64/kernel/time.c
+++ linux-2.6.20-rc5/arch/x86_64/kernel/time.c
@@ -968,6 +968,16 @@ static struct irqaction irq0 = {
 	timer_interrupt, IRQF_DISABLED, CPU_MASK_NONE, "timer", NULL, NULL
 };
 
+static void time_init_rdtsc(void)
+{
+	if (cpu_has(&boot_cpu_data, X86_FEATURE_RDTSCP)) {
+		int p;
+		p = smp_processor_id();
+		printk(KERN_INFO "Initializing RDTSCP feature on cpu %d.\n", p);
+		write_rdtscp_aux(p);
+	}
+}
+
 void __init time_init(void)
 {
 	if (nohpet)
@@ -979,27 +989,40 @@ void __init time_init(void)
 	set_normalized_timespec(&wall_to_monotonic,
 	                        -xtime.tv_sec, -xtime.tv_nsec);
 
-	if (!hpet_init())
-                vxtime_hz = (FSEC_PER_SEC + hpet_period / 2) / hpet_period;
-	else
-		vxtime.hpet_address = 0;
+	/* decide what to use as Master Timer: HPET or PM? */
+	if (!hpet_init()) {
+		vxtime.mt_q = (hpet_period << 32) / FSEC_PER_NSEC;
+		mt_per_tick = hpet_use_timer ?
+			hpet_tick : LATCH * 12;
+		read_master_timer = read_master_timer_hpet;
+		timename = "HPET";
+	} else
+	if (pmtmr_ioport) {
+		vxtime.mt_q = (NSEC_PER_SEC << 32) / 916363636UL;
+		mt_per_tick = (LATCH * 3) << 8;
+		read_master_timer = read_master_timer_pm;
+		timename = "PM timer";
+	} else
+		panic("No suitable Master Timer found.\n");
 
-	if (hpet_use_timer) {
-		/* set tick_nsec to use the proper rate for HPET */
-	  	tick_nsec = TICK_NSEC_HPET;
+	/* use PIT as main timer interrupt source if we can't use HPET */
+	if (hpet_use_timer)
 		cpu_khz = hpet_calibrate_tsc();
-		timename = "HPET";
-	} else {
+	else {
 		pit_init();
 		cpu_khz = pit_calibrate_tsc();
-		timename = "PIT";
 	}
 
-	vxtime.mode = VXTIME_TSC;
-	vxtime.quot = (USEC_PER_SEC << US_SCALE) / vxtime_hz;
-	vxtime.tsc_quot = (USEC_PER_MSEC << US_SCALE) / cpu_khz;
-	vxtime.last_tsc = get_cycles_sync();
-	set_cyc2ns_scale(cpu_khz);
+	/* initialize the master timer */
+	set_master_timer64(0);
+
+	/* initialize the timekeeping variables of the boot CPU */
+	vxtime.cpu[0].tsc_last = get_cycles_sync();
+	vxtime.cpu[0].mt_last = vxtime.cpu[0].mt_base = 0;
+	vxtime.cpu[0].tsc_slope = vxtime.cpu[0].tsc_slope_avg = (((USEC_PER_SEC << 32) / vxtime.mt_q) << TSC_SLOPE_SCALE) / cpu_khz;
+	time_init_rdtsc();
+
+	vxtime.mode = VXTIME_TSCS; /* not sure yet if CPUs have synced TSCs */
 	setup_irq(0, &irq0);
 
 #ifndef CONFIG_SMP
@@ -1037,28 +1060,71 @@ __cpuinit int unsynchronized_tsc(void)
  */
 void time_init_gtod(void)
 {
-	char *timetype;
-
-	if (unsynchronized_tsc())
-		notsc = 1;
+	char *tsc_method;
 
  	if (cpu_has(&boot_cpu_data, X86_FEATURE_RDTSCP))
 		vgetcpu_mode = VGETCPU_RDTSCP;
 	else
 		vgetcpu_mode = VGETCPU_LSL;
 
-		timetype = hpet_use_timer ? "HPET/TSC" : "PIT/TSC";
+	if (notsc) {
+		tsc_method = "nothing";
+		vxtime.mode = VXTIME_MT;
+	}
+	else if (cpu_has(&boot_cpu_data, X86_FEATURE_RDTSCP)) {
+		if (nomonotonic) {
+			tsc_method = "RDTSCP";
+			vxtime.mode = VXTIME_TSCP;
+		}
+		else {
+			tsc_method = "RDTSCM (syscall)";
+			vxtime.mode = VXTIME_TSCM;
+		}
+	}
+	else {
+		tsc_method = "RDTSC";
 		vxtime.mode = VXTIME_TSC;
+#ifdef CONFIG_SMP
+		if (unsynchronized_tsc()) {
+			if (nomonotonic) {
+				tsc_method = "RDTSC (syscall)";
+				vxtime.mode = VXTIME_TSCS;
+			}
+			else {
+				tsc_method = "RDTSCM (syscall)";
+				vxtime.mode = VXTIME_TSCM;
+			}
+		}
+#endif
+ 	}
+
 
-	printk(KERN_INFO "time.c: Using %ld.%06ld MHz WALL %s GTOD %s timer.\n",
-	       vxtime_hz / 1000000, vxtime_hz % 1000000, timename, timetype);
+	printk(KERN_INFO "time.c: Using %s as master timer, %s for time caching; vsyscall %s.\n",
+	       timename, tsc_method, sysctl_vsyscall ? "enabled" : "disabled");
+	printk(KERN_INFO "time.c: Using %s as interrupt source.\n",
+	       hpet_use_timer ? "HPET" : "PIT");
 	printk(KERN_INFO "time.c: Detected %d.%03d MHz processor.\n",
 		cpu_khz / 1000, cpu_khz % 1000);
-	vxtime.quot = (USEC_PER_SEC << US_SCALE) / vxtime_hz;
-	vxtime.tsc_quot = (USEC_PER_MSEC << US_SCALE) / cpu_khz;
-	vxtime.last_tsc = get_cycles_sync();
+}
 
-	set_cyc2ns_scale(cpu_khz);
+/*
+ * initialize the per_cpu timekeeping variables
+ * for non-boot CPUs
+ */
+void time_initialize_cpu(void *info)
+{
+	unsigned long flags;
+	int cpu = smp_processor_id();
+	u64 mt_now;
+	write_seqlock_irqsave(&xtime_lock, flags);
+	mt_now = get_master_timer64();
+	vxtime.cpu[cpu].tsc_last = get_cycles_sync();
+	vxtime.cpu[cpu].mt_last = mt_now;
+	vxtime.cpu[cpu].mt_base = mt_now;
+	vxtime.cpu[cpu].tsc_slope = vxtime.cpu[0].tsc_slope;
+	vxtime.cpu[cpu].tsc_slope_avg = vxtime.cpu[0].tsc_slope_avg;
+	write_sequnlock_irqrestore(&xtime_lock, flags);
+	time_init_rdtsc();
 }
 
 __setup("report_lost_ticks", time_setup);
@@ -1098,19 +1164,38 @@ static int timer_resume(struct sys_devic
 		sleep_length = 0;
 		ctime = sleep_start;
 	}
-	if (vxtime.hpet_address)
+
+	write_seqlock_irqsave(&xtime_lock,flags);
+
+	/* reenable the Master Timer */
+	if (read_master_timer == read_master_timer_hpet || hpet_use_timer)
 		hpet_reenable();
+	/* reenable PIT if used as main timer interrupt source */
 	else
 		i8254_timer_resume();
 
 	sec = ctime + clock_cmos_diff;
-	write_seqlock_irqsave(&xtime_lock,flags);
 	xtime.tv_sec = sec;
 	xtime.tv_nsec = 0;
-		vxtime.last_tsc = get_cycles_sync();
-	write_sequnlock_irqrestore(&xtime_lock,flags);
+
+	/* re-initialize the master timer */
+	add_master_timer64(sleep_length * mt_per_tick);
+	vxtime.mt_wall += sleep_length * mt_per_tick;
+
 	jiffies += sleep_length;
-	monotonic_base += sleep_length * (NSEC_PER_SEC/HZ);
+
+	/* re-initialize the timekeeping variables of the boot CPU */
+	vxtime.cpu[0].tsc_last = get_cycles_sync();
+	vxtime.cpu[0].mt_last = vxtime.cpu[0].mt_base = get_master_timer64();
+	/* FIXME: what speed does the cpu really start at; I doubt cpu_khz is right at this point ??!!!
+	   And what speed do the non-boot cpus start at? Their timekeeping variables will probably be set wrong
+	   by copying from CPU 0 in time_initialize_cpu(); Not a great deal, as they will be synced somehow,
+	   but not exactly nice -JB */
+	vxtime.cpu[0].tsc_slope = vxtime.cpu[0].tsc_slope_avg =
+		(((USEC_PER_SEC << 32) / vxtime.mt_q) << TSC_SLOPE_SCALE) / cpu_khz;
+
+	write_sequnlock_irqrestore(&xtime_lock, flags);
+	
 	touch_softlockup_watchdog();
 	return 0;
 }
@@ -1402,3 +1487,11 @@ int __init notsc_setup(char *s)
 }
 
 __setup("notsc", notsc_setup);
+
+int __init nomonotonic_setup(char *s)
+{
+	nomonotonic = 1;
+	return 1;
+}
+
+__setup("nomonotonic", nomonotonic_setup);
Index: linux-2.6.20-rc5/include/linux/time.h
===================================================================
--- linux-2.6.20-rc5.orig/include/linux/time.h
+++ linux-2.6.20-rc5/include/linux/time.h
@@ -31,6 +31,7 @@ struct timezone {
 #define MSEC_PER_SEC	1000L
 #define USEC_PER_MSEC	1000L
 #define NSEC_PER_USEC	1000L
+#define FSEC_PER_NSEC 	1000000L
 #define NSEC_PER_MSEC	1000000L
 #define USEC_PER_SEC	1000000L
 #define NSEC_PER_SEC	1000000000L

--


* [patch 8/9] Add time_update_mt_guess()
  2007-02-01  9:59 [patch 0/9] x86_64: reliable TSC-based gettimeofday jbohac
                   ` (6 preceding siblings ...)
  2007-02-01  9:59 ` [patch 7/9] Adapt the time initialization code jbohac
@ 2007-02-01 10:00 ` jbohac
  2007-02-01 11:28   ` Andi Kleen
  2007-02-01 10:00 ` [patch 9/9] Make use of the Master Timer jbohac
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 68+ messages in thread
From: jbohac @ 2007-02-01 10:00 UTC (permalink / raw)
  To: Andi Kleen
  Cc: linux-kernel, Jiri Bohac, Vojtech Pavlik, ssouhlal, arjan, tglx,
	johnstul, zippel, andrea

[-- Attachment #1: update_mt_guess --]
[-- Type: text/plain, Size: 9019 bytes --]

time_update_mt_guess() is the core of the TSC->MT approximation magic.

Called periodically from the LAPIC timer interrupt handler, it fine-tunes 
all the per-CPU offsets and ratios needed by guess_mt() to approximate the
MT using any processor's TSC.

We also need to update these from the cpufreq notifiers. Because a frequency
change makes the approximation unreliable (we don't know _exactly_ when it
happens), the approximation is disabled for a while after the change and is
not re-enabled until it stabilises again.
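
The fine-tuning itself boils down to (condensed from time_update_mt_guess()
below; the resync and error paths are left out):

	/* how far off was the extrapolation over the last LAPIC tick? */
	guess_mt_err = mt - __guess_mt(tsc, cpu);

	/* long-term average of the slope, to attenuate oscillation */
	tsc_slope_avg = ((TSC_SLOPE_DECAY - 1) * tsc_slope_avg +
			 (delta_mt << TSC_SLOPE_SCALE) / delta_t)
			/ TSC_SLOPE_DECAY;

	/* bias the slope so that the error is paid back over the next tick */
	tsc_slope = tsc_slope_avg +
		    (guess_mt_err << TSC_SLOPE_SCALE) / tsc_per_tick;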

Signed-off-by: Jiri Bohac <jbohac@suse.cz>


Index: linux-2.6.20-rc5/arch/x86_64/kernel/apic.c
===================================================================
--- linux-2.6.20-rc5.orig/arch/x86_64/kernel/apic.c
+++ linux-2.6.20-rc5/arch/x86_64/kernel/apic.c
@@ -63,6 +63,9 @@ int using_apic_timer __read_mostly = 0;
 
 static void apic_pm_activate(void);
 
+extern void time_update_mt_guess(void);
+
+
 void enable_NMI_through_LVT0 (void * dummy)
 {
 	unsigned int v;
@@ -986,6 +989,8 @@ void smp_local_timer_interrupt(void)
 	 * Currently this isn't too much of an issue (performance wise),
 	 * we can take more than 100K local irqs per second on a 100 MHz P5.
 	 */
+
+	 time_update_mt_guess();
 }
 
 /*
Index: linux-2.6.20-rc5/arch/x86_64/kernel/time.c
===================================================================
--- linux-2.6.20-rc5.orig/arch/x86_64/kernel/time.c
+++ linux-2.6.20-rc5/arch/x86_64/kernel/time.c
@@ -221,6 +221,126 @@ static u32 read_master_timer_pm(void)
 }
 
 /*
+ * This function, called from the LAPIC interrupt,
+ * periodically updates all the per-CPU values needed by
+ * guess_mt()
+ */
+void time_update_mt_guess(void)
+{
+	u64 t, delta_t, delta_mt, mt;
+	s64 guess_mt_err, guess_mt_err_nsec, tsc_per_tick, tsc_slope_corr,
+	    current_slope, old_mt_err;
+	int cpu = smp_processor_id(), resync;
+	unsigned long flags;
+
+	if (vxtime.mode == VXTIME_TSC && cpu != 0)
+		return;
+
+	local_irq_save(flags);
+
+	/* if a frequency change is in progress, don't recalculate anything
+	   as this would destroy the fine-tuned slope. We don't rely on the TSC
+	   during this time, so we don't care about the accuracy at all */
+	if (vxtime.cpu[cpu].tsc_invalid == VXTIME_TSC_CPUFREQ) {
+		local_irq_restore(flags);
+		return;
+	}
+
+	mt = get_master_timer64();
+	t = get_cycles_sync();
+
+	write_seqlock(&xtime_lock);
+
+	/* get the error of the estimated MT value */
+	delta_t = t - vxtime.cpu[cpu].tsc_last;
+	delta_mt = mt - vxtime.cpu[cpu].mt_last;
+	tsc_per_tick = ((mt_per_tick << 32) / delta_mt * delta_t) >> 32;
+
+	vxtime.cpu[cpu].mt_base = __guess_mt(t, cpu);
+
+	guess_mt_err = mt - vxtime.cpu[cpu].mt_base;
+	guess_mt_err_nsec = (guess_mt_err * (s64)vxtime.mt_q) >> 32;
+	old_mt_err =  ((s64)(vxtime.cpu[cpu].tsc_slope_avg - vxtime.cpu[cpu].tsc_slope)
+			* tsc_per_tick) >> TSC_SLOPE_SCALE;
+	current_slope = (delta_mt << TSC_SLOPE_SCALE) / delta_t;
+
+	/* calculate a long time average to attenuate oscilation */
+	vxtime.cpu[cpu].tsc_slope_avg = ((TSC_SLOPE_DECAY - 1) * vxtime.cpu[cpu].tsc_slope_avg +
+			current_slope) / TSC_SLOPE_DECAY;
+
+	tsc_slope_corr = ((s64)(guess_mt_err << TSC_SLOPE_SCALE)) / tsc_per_tick;
+	vxtime.cpu[cpu].tsc_slope = vxtime.cpu[cpu].tsc_slope_avg + tsc_slope_corr;
+
+	if ((s64)vxtime.cpu[cpu].tsc_slope < 0) {
+		vxtime.cpu[cpu].tsc_slope = 0;
+		vxtime.cpu[cpu].tsc_slope_avg = current_slope;
+	}
+
+	if (abs(guess_mt_err) > (mt_per_tick >> 2))
+		printk(KERN_DEBUG "Master Timer guess on cpu %d off by %lld.%.6ld seconds\n",
+			cpu, guess_mt_err_nsec / NSEC_PER_SEC,
+			(abs(guess_mt_err_nsec) % NSEC_PER_SEC) / 1000);
+
+	resync = 0;
+	/* if the guess is off by more than a second, something has gone very
+	   wrong; we'll break monotonicity and re-sync the guess with the MT */
+	if (abs(guess_mt_err_nsec) > NSEC_PER_SEC) {
+		resync = 1;
+		if (vxtime.mode != VXTIME_MT && guess_mt_err < 0)
+			printk(KERN_ERR "time not monotonic on cpu %d\n", cpu);
+	}
+	/* else if the guess is off by more than a jiffie, only synchronize the
+	   guess with the MT if the guess is behind (won't break monotonicity);
+	   if the guess is ahead, stop the timer by setting slope to zero */
+	else if (abs(guess_mt_err) > mt_per_tick) {
+		if (guess_mt_err > 0)
+			resync = 1;
+		else {
+			vxtime.cpu[cpu].tsc_slope = 0;
+			vxtime.cpu[cpu].tsc_slope_avg = current_slope;
+		}
+	}
+	/* good enough to switch back from temporary MT mode? */
+	else if (vxtime.cpu[cpu].tsc_invalid &&
+		    abs(guess_mt_err) < mt_per_tick / USEC_PER_TICK &&
+		    abs(old_mt_err) < mt_per_tick / USEC_PER_TICK &&
+		    mt > vxtime.cpu[cpu].last_mt_guess) {
+			vxtime.cpu[cpu].tsc_invalid = 0;
+			vxtime.cpu[cpu].mt_base = mt;
+			vxtime.cpu[cpu].tsc_slope = vxtime.cpu[cpu].tsc_slope_avg;
+	}
+
+	/* hard re-sync of the guess to the current value of the MT */
+	if (resync) {
+		vxtime.cpu[cpu].mt_base = mt;
+		vxtime.cpu[cpu].tsc_slope = vxtime.cpu[cpu].tsc_slope_avg = current_slope;
+
+		printk(KERN_INFO "Master Timer re-syncing on cpu %d (mt=%lld, slope=%lld)\n",
+			cpu, mt, vxtime.cpu[cpu].tsc_slope);
+	}
+
+	if (vxtime.cpu[cpu].tsc_slope == 0)
+		printk(KERN_INFO "timer on cpu %d frozen, waiting for time to catch up\n", cpu);
+
+	vxtime.cpu[cpu].tsc_last = t;
+	vxtime.cpu[cpu].mt_last = mt;
+
+	write_sequnlock(&xtime_lock);
+	local_irq_restore(flags);
+}
+
+inline u64 mt_to_nsec(u64 mt)
+{
+	u64 ret;
+	ret  = ((mt & 0xffffff) * vxtime.mt_q) >> 32;
+	mt >>= 24;
+	ret += ((mt & 0xffffff) * vxtime.mt_q) >> 8;
+	mt >>= 24;
+	ret += ( mt             * vxtime.mt_q) << 16;
+	return ret;
+}
+
+/*
  * do_gettimeoffset() returns microseconds since last timer interrupt was
  * triggered by hardware. A memory read of HPET is slower than a register read
  * of TSC, but much more reliable. It's also synchronized to the timer
@@ -666,50 +786,83 @@ static void cpufreq_delayed_get(void)
 }
 
 static unsigned int  ref_freq = 0;
-static unsigned long loops_per_jiffy_ref = 0;
 
 static unsigned long cpu_khz_ref = 0;
 
+struct cpufreq_notifier_data {
+	struct cpufreq_freqs *freq;
+	unsigned long val;
+};
+
+/* called on the CPU that changed frequency */
+static void time_cpufreq_notifier_on_cpu(void *data)
+{
+	unsigned long flags;
+	int cpu;
+	struct cpufreq_notifier_data *cnd = data;
+
+	write_seqlock_irqsave(&xtime_lock, flags);
+
+	cpu = smp_processor_id();
+	switch (cnd->val) {
+
+		case CPUFREQ_PRECHANGE:
+		case CPUFREQ_SUSPENDCHANGE:
+			if (!vxtime.cpu[cpu].tsc_invalid)
+				vxtime.cpu[cpu].last_mt_guess = __guess_mt(get_cycles_sync(), cpu);
+			vxtime.cpu[cpu].tsc_invalid = VXTIME_TSC_CPUFREQ;
+			break;
+
+		case CPUFREQ_POSTCHANGE:
+		case CPUFREQ_RESUMECHANGE:
+			vxtime.cpu[cpu].tsc_slope = ((vxtime.cpu[cpu].tsc_slope >> 4) * cnd->freq->old / cnd->freq->new) << 4;
+			vxtime.cpu[cpu].tsc_slope_avg = ((vxtime.cpu[cpu].tsc_slope_avg >> 4) * cnd->freq->old / cnd->freq->new) << 4;
+
+			vxtime.cpu[cpu].mt_base = vxtime.cpu[cpu].mt_last = get_master_timer64();
+			vxtime.cpu[cpu].tsc_last = get_cycles_sync();
+
+			vxtime.cpu[cpu].tsc_invalid = VXTIME_TSC_INVALID;
+			break;
+	}
+
+	write_sequnlock_irqrestore(&xtime_lock, flags);
+}
+
 static int time_cpufreq_notifier(struct notifier_block *nb, unsigned long val,
 				 void *data)
 {
-        struct cpufreq_freqs *freq = data;
-	unsigned long *lpj, dummy;
+	struct cpufreq_notifier_data cnd = {
+		.freq = data,
+		.val = val,
+	};
 
-	if (cpu_has(&cpu_data[freq->cpu], X86_FEATURE_CONSTANT_TSC))
+	if (cpu_has(&cpu_data[cnd.freq->cpu], X86_FEATURE_CONSTANT_TSC))
 		return 0;
 
-	lpj = &dummy;
-	if (!(freq->flags & CPUFREQ_CONST_LOOPS))
-#ifdef CONFIG_SMP
-		lpj = &cpu_data[freq->cpu].loops_per_jiffy;
-#else
-		lpj = &boot_cpu_data.loops_per_jiffy;
-#endif
-
 	if (!ref_freq) {
-		ref_freq = freq->old;
-		loops_per_jiffy_ref = *lpj;
+		ref_freq = cnd.freq->old;
 		cpu_khz_ref = cpu_khz;
 	}
-        if ((val == CPUFREQ_PRECHANGE  && freq->old < freq->new) ||
-            (val == CPUFREQ_POSTCHANGE && freq->old > freq->new) ||
+
+	if ((val == CPUFREQ_PRECHANGE  && cnd.freq->old < cnd.freq->new) ||
+	    (val == CPUFREQ_POSTCHANGE && cnd.freq->old > cnd.freq->new) ||
 	    (val == CPUFREQ_RESUMECHANGE)) {
-                *lpj =
-		cpufreq_scale(loops_per_jiffy_ref, ref_freq, freq->new);
 
-		cpu_khz = cpufreq_scale(cpu_khz_ref, ref_freq, freq->new);
-		if (!(freq->flags & CPUFREQ_CONST_LOOPS))
-			vxtime.tsc_quot = (USEC_PER_MSEC << US_SCALE) / cpu_khz;
+		cpu_khz = cpufreq_scale(cpu_khz_ref, ref_freq, cnd.freq->new);
+
 	}
-	
-	set_cyc2ns_scale(cpu_khz_ref);
+
+	preempt_disable();
+	if (smp_processor_id() == cnd.freq->cpu)
+		time_cpufreq_notifier_on_cpu(&cnd);
+	else smp_call_function_single(cnd.freq->cpu, time_cpufreq_notifier_on_cpu, &cnd, 0, 1);
+	preempt_enable();
 
 	return 0;
 }
- 
+
 static struct notifier_block time_cpufreq_notifier_block = {
-         .notifier_call  = time_cpufreq_notifier
+	 .notifier_call  = time_cpufreq_notifier
 };
 
 static int __init cpufreq_tsc(void)

--


* [patch 9/9] Make use of the Master Timer
  2007-02-01  9:59 [patch 0/9] x86_64: reliable TSC-based gettimeofday jbohac
                   ` (7 preceding siblings ...)
  2007-02-01 10:00 ` [patch 8/9] Add time_update_mt_guess() jbohac
@ 2007-02-01 10:00 ` jbohac
  2007-02-01 11:36   ` Andi Kleen
  2007-02-01 11:20 ` [patch 0/9] x86_64: reliable TSC-based gettimeofday Andi Kleen
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 68+ messages in thread
From: jbohac @ 2007-02-01 10:00 UTC (permalink / raw)
  To: Andi Kleen
  Cc: linux-kernel, Jiri Bohac, Vojtech Pavlik, ssouhlal, arjan, tglx,
	johnstul, zippel, andrea

[-- Attachment #1: use_master_timer --]
[-- Type: text/plain, Size: 12584 bytes --]

Make use of the whole Master Timer infrastructure in gettimeofday, 
monotonic_clock, etc.

Also make the vsyscall version of gettimeofday use the guess_mt() if
possible.

Signed-off-by: Jiri Bohac <jbohac@suse.cz>

Index: linux-2.6.20-rc5/arch/x86_64/kernel/time.c
===================================================================
--- linux-2.6.20-rc5.orig/arch/x86_64/kernel/time.c
+++ linux-2.6.20-rc5/arch/x86_64/kernel/time.c
@@ -341,27 +341,48 @@ inline u64 mt_to_nsec(u64 mt)
 }
 
 /*
- * do_gettimeoffset() returns microseconds since last timer interrupt was
+ * do_gettimeoffset() returns nanoseconds since last timer interrupt was
  * triggered by hardware. A memory read of HPET is slower than a register read
  * of TSC, but much more reliable. It's also synchronized to the timer
  * interrupt. Note that do_gettimeoffset() may return more than hpet_tick, if a
  * timer interrupt has happened already, but vxtime.trigger wasn't updated yet.
  * This is not a problem, because jiffies hasn't updated either. They are bound
  * together by xtime_lock.
+ *
+ * If used_mt is not null, it will be filled with the master timer value
+ * used for the calculation
  */
 
-static inline unsigned int do_gettimeoffset_tsc(void)
+static inline s64 do_gettimeoffset(u64 *used_mt)
 {
-	unsigned long t;
-	unsigned long x;
-	t = get_cycles_sync();
-	if (t < vxtime.last_tsc) 
-		t = vxtime.last_tsc; /* hack */
-	x = ((t - vxtime.last_tsc) * vxtime.tsc_quot) >> US_SCALE;
-	return x;
-}
+	int cpu = 0;
+	u64 tsc = 0, mt;
+	switch (vxtime.mode) {
+
+		case VXTIME_TSC:
+			rdtscll(tsc);
+                        break;
+
+                case VXTIME_TSCP:
+                        rdtscpll(tsc, cpu);
+			cpu &= 0xfff;
+			break;
 
-unsigned int (*do_gettimeoffset)(void) = do_gettimeoffset_tsc;
+		case VXTIME_TSCS:
+		case VXTIME_TSCM:
+			preempt_disable();
+			cpu = smp_processor_id();
+			rdtscll(tsc);
+			preempt_enable();
+			break;
+	}
+
+	mt = guess_mt(tsc, cpu);
+	if (used_mt)
+		*used_mt = mt;
+
+	return (((s64)(mt - vxtime.mt_wall)) * (s64)vxtime.mt_q) >> 32;
+}
 
 /*
  * This version of gettimeofday() has microsecond resolution and better than
@@ -372,28 +393,32 @@ unsigned int (*do_gettimeoffset)(void) =
 void do_gettimeofday(struct timeval *tv)
 {
 	unsigned long seq;
- 	unsigned int sec, usec;
+	unsigned int sec;
+	int nsec;
+	u64 mt;
 
 	do {
 		seq = read_seqbegin(&xtime_lock);
 
 		sec = xtime.tv_sec;
-		usec = xtime.tv_nsec / NSEC_PER_USEC;
+		nsec = xtime.tv_nsec;
 
-		/* i386 does some correction here to keep the clock 
-		   monotonous even when ntpd is fixing drift.
-		   But they didn't work for me, there is a non monotonic
-		   clock anyways with ntp.
-		   I dropped all corrections now until a real solution can
-		   be found. Note when you fix it here you need to do the same
-		   in arch/x86_64/kernel/vsyscall.c and export all needed
-		   variables in vmlinux.lds. -AK */ 
-		usec += do_gettimeoffset();
+		nsec += max(do_gettimeoffset(&mt), vxtime.ns_drift);
 
 	} while (read_seqretry(&xtime_lock, seq));
 
-	tv->tv_sec = sec + usec / USEC_PER_SEC;
-	tv->tv_usec = usec % USEC_PER_SEC;
+	/* this must be done outside the seqlock loop. Until the loop has finished,
+	the mt may be completely wrong, calculated from inconsistent data */
+	update_monotonic_mt(mt);
+
+	sec += nsec / NSEC_PER_SEC;
+	nsec %= NSEC_PER_SEC;
+	if (nsec < 0) {
+		--sec;
+		nsec += NSEC_PER_SEC;
+	}
+	tv->tv_sec = sec;
+	tv->tv_usec = nsec / NSEC_PER_USEC;
 }
 
 EXPORT_SYMBOL(do_gettimeofday);
@@ -408,13 +433,13 @@ int do_settimeofday(struct timespec *tv)
 {
 	time_t wtm_sec, sec = tv->tv_sec;
 	long wtm_nsec, nsec = tv->tv_nsec;
+	unsigned long flags;
 
 	if ((unsigned long)tv->tv_nsec >= NSEC_PER_SEC)
 		return -EINVAL;
+	write_seqlock_irqsave(&xtime_lock, flags);
 
-	write_seqlock_irq(&xtime_lock);
-
-	nsec -= do_gettimeoffset() * NSEC_PER_USEC;
+	nsec -= do_gettimeoffset(NULL);
 
 	wtm_sec  = wall_to_monotonic.tv_sec + (xtime.tv_sec - sec);
 	wtm_nsec = wall_to_monotonic.tv_nsec + (xtime.tv_nsec - nsec);
@@ -424,7 +449,7 @@ int do_settimeofday(struct timespec *tv)
 
 	ntp_clear();
 
-	write_sequnlock_irq(&xtime_lock);
+	write_sequnlock_irqrestore(&xtime_lock, flags);
 	clock_was_set();
 	return 0;
 }
@@ -519,27 +544,32 @@ static void set_rtc_mmss(unsigned long n
 	spin_unlock(&rtc_lock);
 }
 
-
 /* monotonic_clock(): returns # of nanoseconds passed since time_init()
  *		Note: This function is required to return accurate
  *		time even in the absence of multiple timer ticks.
  */
-static inline unsigned long long cycles_2_ns(unsigned long long cyc);
 unsigned long long monotonic_clock(void)
 {
-	unsigned long seq;
- 	u32 last_offset, this_offset, offset;
-	unsigned long long base;
+	int cpu;
+	unsigned long flags;
+	u64 t;
 
-		do {
-			seq = read_seqbegin(&xtime_lock);
+	/* any code that modifies the per-CPU variables used in guess_mt
+	   will always run on this CPU, so we don't need to lock the xtime_lock
+	   here. If we did, it would create a deadlock on debug printks (and
+	   possibly elsewhere) called from other critical sections protected by
+	   the lock */
 
-			last_offset = vxtime.last_tsc;
-			base = monotonic_base;
-		} while (read_seqretry(&xtime_lock, seq));
-		this_offset = get_cycles_sync();
-		offset = cycles_2_ns(this_offset - last_offset);
-	return base + offset;
+	local_irq_save(flags);
+
+	cpu = smp_processor_id();
+	rdtscll(t);
+	t = guess_mt(t, cpu);
+	update_monotonic_mt(t);
+
+	local_irq_restore(flags);
+
+	return mt_to_nsec(t);
 }
 EXPORT_SYMBOL(monotonic_clock);
 
@@ -573,62 +603,54 @@ static noinline void handle_lost_ticks(i
 void main_timer_handler(void)
 {
 	static unsigned long rtc_update = 0;
-	unsigned long tsc;
-	int delay = 0, offset = 0, lost = 0;
-
-/*
- * Here we are in the timer irq handler. We have irqs locally disabled (so we
- * don't need spin_lock_irqsave()) but we don't know if the timer_bh is running
- * on the other CPU, so we need a lock. We also need to lock the vsyscall
- * variables, because both do_timer() and us change them -arca+vojtech
- */
-
-	write_seqlock(&xtime_lock);
+	unsigned long flags;
+	u64 mt;
+	int ticks, i;
+	u64 xtime_nsecs, mt_ticks;
 
-	if (vxtime.hpet_address)
-		offset = hpet_readl(HPET_COUNTER);
+	write_seqlock_irqsave(&xtime_lock, flags);
 
-	if (hpet_use_timer) {
-		/* if we're using the hpet timer functionality,
-		 * we can more accurately know the counter value
-		 * when the timer interrupt occured.
-		 */
-		offset = hpet_readl(HPET_T0_CMP) - hpet_tick;
-		delay = hpet_readl(HPET_COUNTER) - offset;
+	mt = update_master_timer64();
+	ticks = (mt - vxtime.mt_wall + mt_per_tick / 2) / mt_per_tick;
+	mt_ticks = ticks * mt_per_tick;
+
+	if (ticks > 1) {
+		handle_lost_ticks(ticks - 1);
+		jiffies += ticks - 1;
 	}
 
-	tsc = get_cycles_sync();
-
-		offset = (((tsc - vxtime.last_tsc) *
-			   vxtime.tsc_quot) >> US_SCALE) - USEC_PER_TICK;
 
-		if (offset < 0)
-			offset = 0;
+/*
+ * Do the timer stuff.
+ * NTP will cause the actual increment of xtime to be slightly different from
+ * NSEC_PER_TICK, so we set xtime.ns_drift to the difference. This will be used
+ * by do_gettimeofday() to make sure the time stays monotonic.
+ */
 
-		if (offset > USEC_PER_TICK) {
-			lost = offset / USEC_PER_TICK;
-			offset %= USEC_PER_TICK;
+	xtime_nsecs = xtime.tv_sec * NSEC_PER_SEC + xtime.tv_nsec;
+	for (i = 0; i < ticks; ++i)
+		do_timer(1);
+	xtime_nsecs = xtime.tv_sec * NSEC_PER_SEC + xtime.tv_nsec - xtime_nsecs;
 
-		monotonic_base += cycles_2_ns(tsc - vxtime.last_tsc);
+	vxtime.ns_drift = (mt_ticks * mtq >> 32) - xtime_nsecs;
+	vxtime.mt_wall += mt_ticks;
 
-		vxtime.last_tsc = tsc - vxtime.quot * delay / vxtime.tsc_quot;
+/*
+ * If we have an externally synchronized Linux clock, then update CMOS clock
+ * accordingly every ~11 minutes. set_rtc_mmss() will be called in the jiffy
+ * closest to exactly 500 ms before the next second. If the update fails, we
+ * don't care, as it'll be updated on the next turn, and the problem (time way
+ * off) isn't likely to go away much sooner anyway.
+ */
 
-		if ((((tsc - vxtime.last_tsc) *
-		      vxtime.tsc_quot) >> US_SCALE) < offset)
-			vxtime.last_tsc = tsc -
-				(((long) offset << US_SCALE) / vxtime.tsc_quot) - 1;
+	if (ntp_synced() && xtime.tv_sec > rtc_update &&
+		abs(xtime.tv_nsec - 500000000) <= tick_nsec / 2) {
+		set_rtc_mmss(xtime.tv_sec);
+		rtc_update = xtime.tv_sec + 660;
 	}
 
-	if (lost > 0)
-		handle_lost_ticks(lost);
-	else
-		lost = 0;
-
-/*
- * Do the timer stuff.
- */
+	write_sequnlock_irqrestore(&xtime_lock, flags);
 
-	do_timer(lost + 1);
 #ifndef CONFIG_SMP
 	update_process_times(user_mode(get_irq_regs()));
 #endif
@@ -642,21 +664,6 @@ void main_timer_handler(void)
 	if (!using_apic_timer)
 		smp_local_timer_interrupt();
 
-/*
- * If we have an externally synchronized Linux clock, then update CMOS clock
- * accordingly every ~11 minutes. set_rtc_mmss() will be called in the jiffy
- * closest to exactly 500 ms before the next second. If the update fails, we
- * don't care, as it'll be updated on the next turn, and the problem (time way
- * off) isn't likely to go away much sooner anyway.
- */
-
-	if (ntp_synced() && xtime.tv_sec > rtc_update &&
-		abs(xtime.tv_nsec - 500000000) <= tick_nsec / 2) {
-		set_rtc_mmss(xtime.tv_sec);
-		rtc_update = xtime.tv_sec + 660;
-	}
- 
-	write_sequnlock(&xtime_lock);
 }
 
 static irqreturn_t timer_interrupt(int irq, void *dev_id)
@@ -669,24 +676,9 @@ static irqreturn_t timer_interrupt(int i
 	return IRQ_HANDLED;
 }
 
-static unsigned int cyc2ns_scale __read_mostly;
-
-static inline void set_cyc2ns_scale(unsigned long cpu_khz)
-{
-	cyc2ns_scale = (NSEC_PER_MSEC << NS_SCALE) / cpu_khz;
-}
-
-static inline unsigned long long cycles_2_ns(unsigned long long cyc)
-{
-	return (cyc * cyc2ns_scale) >> NS_SCALE;
-}
-
 unsigned long long sched_clock(void)
 {
-	unsigned long a = 0;
-
-	rdtscll(a);
-	return cycles_2_ns(a);
+	return monotonic_clock();
 }
 
 static unsigned long get_cmos_time(void)
Index: linux-2.6.20-rc5/arch/x86_64/kernel/vsyscall.c
===================================================================
--- linux-2.6.20-rc5.orig/arch/x86_64/kernel/vsyscall.c
+++ linux-2.6.20-rc5/arch/x86_64/kernel/vsyscall.c
@@ -61,24 +61,35 @@ static __always_inline void timeval_norm
 	}
 }
 
-static __always_inline void do_vgettimeofday(struct timeval * tv)
+static __always_inline u64 __guess_mt(u64 tsc, int cpu)
 {
-	long sequence, t;
-	unsigned long sec, usec;
+	return (((tsc - __vxtime.cpu[cpu].tsc_last) * __vxtime.cpu[cpu].tsc_slope)
+			>> TSC_SLOPE_SCALE) + __vxtime.cpu[cpu].mt_base;
+}
+
+#define USEC_PER_TICK (USEC_PER_SEC / HZ)
+static __always_inline s64 __do_gettimeoffset(u64 tsc, int cpu)
+{
+	return (((s64)(__guess_mt(tsc, cpu) - __vxtime.mt_wall)) * (s64)__vxtime.mt_q) >> 32;
+}
+
+static __always_inline void do_vgettimeofday(struct timeval * tv, u64 tsc, int cpu)
+{
+	unsigned int sec;
+	s64 nsec;
 
-	do {
-		sequence = read_seqbegin(&__xtime_lock);
-		
-		sec = __xtime.tv_sec;
-		usec = __xtime.tv_nsec / 1000;
-
-			usec += ((readl((void __iomem *)
-				   fix_to_virt(VSYSCALL_HPET) + 0xf0) -
-				  __vxtime.last) * __vxtime.quot) >> 32;
-	} while (read_seqretry(&__xtime_lock, sequence));
+	sec = __xtime.tv_sec;
+	nsec = __xtime.tv_nsec;
+	nsec +=	max(__do_gettimeoffset(tsc, cpu), __vxtime.drift);
 
-	tv->tv_sec = sec + usec / 1000000;
-	tv->tv_usec = usec % 1000000;
+	sec += nsec / NSEC_PER_SEC;
+	nsec %= NSEC_PER_SEC;
+	if (nsec < 0) {
+		--sec;
+		nsec += NSEC_PER_SEC;
+	}
+	tv->tv_sec = sec;
+	tv->tv_usec = nsec / NSEC_PER_USEC;
 }
 
 /* RED-PEN may want to readd seq locking, but then the variable should be write-once. */
@@ -107,10 +118,39 @@ static __always_inline long time_syscall
 
 int __vsyscall(0) vgettimeofday(struct timeval * tv, struct timezone * tz)
 {
-	if (!__sysctl_vsyscall)
+	int cpu = 0;
+	u64 tsc;
+	unsigned long seq;
+	int do_syscall = !__sysctl_vsyscall;
+
+	if (tv && !do_syscall)
+		switch (__vxtime.mode) {
+			case VXTIME_TSC:
+			case VXTIME_TSCP:
+				do {
+					seq = read_seqbegin(&__xtime_lock);
+
+					if (__vxtime.mode == VXTIME_TSC)
+						rdtscll(tsc);
+					else {
+						rdtscpll(tsc, cpu);
+						cpu &= 0xfff;
+					}
+
+					if (unlikely(__vxtime.cpu[cpu].tsc_invalid))
+						do_syscall = 1;
+					else
+						do_vgettimeofday(tv, tsc, cpu);
+
+				} while (read_seqretry(&__xtime_lock, seq));
+				break;
+			default:
+				do_syscall = 1;
+		}
+
+	if (do_syscall)
 		return gettimeofday(tv,tz);
-	if (tv)
-		do_vgettimeofday(tv);
+
 	if (tz)
 		do_get_tz(tz);
 	return 0;

--

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 2/9] Remove the support for the VXTIME_PMTMR timer mode
  2007-02-01  9:59 ` [patch 2/9] Remove the support for the VXTIME_PMTMR timer mode jbohac
@ 2007-02-01 11:13   ` Andi Kleen
  2007-02-01 13:13     ` Jiri Bohac
  0 siblings, 1 reply; 68+ messages in thread
From: Andi Kleen @ 2007-02-01 11:13 UTC (permalink / raw)
  To: jbohac
  Cc: linux-kernel, Vojtech Pavlik, ssouhlal, arjan, tglx, johnstul,
	zippel, andrea

On Thursday 01 February 2007 10:59, jbohac@suse.cz wrote:
> VXTIME_PMTMR will be replaced by a more generic "Master Timer"

This means we have no fallback if something goes wrong with the Master timer? 

A little risky.

-Andi


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 4/9] Remove the TSC synchronization on SMP machines
  2007-02-01  9:59 ` [patch 4/9] Remove the TSC synchronization on SMP machines jbohac
@ 2007-02-01 11:14   ` Andi Kleen
  2007-02-01 13:17     ` Jiri Bohac
  2007-02-01 21:05     ` mbligh
  2007-02-03  1:16   ` H. Peter Anvin
  1 sibling, 2 replies; 68+ messages in thread
From: Andi Kleen @ 2007-02-01 11:14 UTC (permalink / raw)
  To: jbohac
  Cc: linux-kernel, Vojtech Pavlik, ssouhlal, arjan, tglx, johnstul,
	zippel, andrea

On Thursday 01 February 2007 10:59, jbohac@suse.cz wrote:
> TSC is either synchronized by design or not reliable
> to be used for anything, let alone timekeeping.

In my tree this is already done better by a patch from Ingo.
Check if they look synchronized and don't use TSC if they are not.

-Andi

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 5/9] Add all the necessary structures to the vsyscall page
  2007-02-01  9:59 ` [patch 5/9] Add all the necessary structures to the vsyscall page jbohac
@ 2007-02-01 11:17   ` Andi Kleen
  0 siblings, 0 replies; 68+ messages in thread
From: Andi Kleen @ 2007-02-01 11:17 UTC (permalink / raw)
  To: jbohac
  Cc: linux-kernel, Vojtech Pavlik, ssouhlal, arjan, tglx, johnstul,
	zippel, andrea

On Thursday 01 February 2007 10:59, jbohac@suse.cz wrote:
>  struct vxtime_data {
> +	union {
> +		struct {
> +			u64 tsc_slope;		/* TSC to MT coefficient */
> +			u64 tsc_slope_avg;	/* average tsc_slope */
> +			u64 mt_base;		/* approximated MT at the last LAPIC tick */
> +			u64 mt_last;		/* MT at the last LAPIC tick */
> +			u64 tsc_last;		/* TSC at the last LAPIC tick */
> +			u64 last_mt_guess;	/* ensures monotonicity in temporary MT mode */
> +			char tsc_invalid;	/* don't trust the TSC now (frequency changing) */
> +		};
> +		char pad[64];	/* cacheline alignment */

Use some variant of __cacheline_aligned_in_smp

There are far better ways than hardcoding this.

> +	} cpu[NR_CPUS];

This can become very large with the default NR_CPUS==128. I would prefer
a way that wastes less space on smaller machines by sizing the array to
num_possible_cpus() instead.
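
For illustration only -- this is not code from the patchset, the names are
made up, and it glosses over how a dynamically sized array would be mapped
into the vsyscall page -- the suggestion could take roughly this shape:

#include <linux/bootmem.h>
#include <linux/cache.h>
#include <linux/cpumask.h>
#include <linux/init.h>
#include <linux/types.h>

/* hypothetical sketch: per-CPU timekeeping data, one cache line each,
 * allocated for the possible CPUs instead of a fixed NR_CPUS array */
struct vxtime_percpu {
	u64 tsc_slope;
	u64 tsc_slope_avg;
	u64 mt_base;
	u64 mt_last;
	u64 tsc_last;
	u64 last_mt_guess;
	char tsc_invalid;
} ____cacheline_aligned_in_smp;

static struct vxtime_percpu *vxtime_cpu;

static void __init vxtime_alloc_percpu(void)
{
	vxtime_cpu = alloc_bootmem(num_possible_cpus() * sizeof(*vxtime_cpu));
}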

>  	long hpet_address;	/* HPET base address */
> -	int last;
> -	unsigned long last_tsc;
> -	long quot;
> -	long tsc_quot;
> +	u64 mt_q;		/* master timer to nsec quotient */
> +	u64 mt_wall;		/* MT ticks already covered by the jiffies */
> +	s64 ns_drift;		/* MT - xtime drift in the last tick in ns */

Might make sense to duplicate those into all the per-CPU data, then they
only need to access a single cache line.

-Andi

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 0/9] x86_64: reliable TSC-based gettimeofday
  2007-02-01  9:59 [patch 0/9] x86_64: reliable TSC-based gettimeofday jbohac
                   ` (8 preceding siblings ...)
  2007-02-01 10:00 ` [patch 9/9] Make use of the Master Timer jbohac
@ 2007-02-01 11:20 ` Andi Kleen
  2007-02-01 11:53   ` Andrea Arcangeli
                     ` (2 more replies)
  2007-02-01 11:34 ` Ingo Molnar
                   ` (2 subsequent siblings)
  12 siblings, 3 replies; 68+ messages in thread
From: Andi Kleen @ 2007-02-01 11:20 UTC (permalink / raw)
  To: jbohac
  Cc: linux-kernel, Vojtech Pavlik, ssouhlal, arjan, tglx, johnstul,
	zippel, andrea

On Thursday 01 February 2007 10:59, jbohac@suse.cz wrote:

> 
> Inter-CPU monotonicity can not, however, be guaranteed in a vsyscall, so
> vsyscall is not used by default.

Only for unsynchronized machines I hope 

> Still, the syscall version of gettimeofday is 
> a lot faster using the TSC approximation instead of other hardware timers.

Yes that makes sense.

The big strategic problem is how to marry your patchkit to John Stultz's
clocksources work which is also competing for merge. Any thoughts on that? 

>When strict inter-CPU monotonicity is not needed, the vsyscall version of
>gettimeofday may be forced using the "nomonotonic" command line parameter.
>gettimeofday()'s monotonicity is guaranteed on a single CPU even with the very
>fast vsyscall version.  Across CPUs, the vsyscall version of gettimeofday is
>not guaranteed to be monotonic, but it should be pretty close. Currently, we
>get errors of tens/hundreds of microseconds.

I think a better way to do this would be to define a new CLOCK_THREAD_MONOTONOUS
(or better name) timer for clock_gettime(). 

[and my currently stalled vdso patches that implement clock_gettime
as a vsyscall]

Then also an application could easily use it with LD_PRELOAD

-Andi

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 6/9] Add the "Master Timer"
  2007-02-01  9:59 ` [patch 6/9] Add the "Master Timer" jbohac
@ 2007-02-01 11:22   ` Andi Kleen
  2007-02-01 13:29     ` Jiri Bohac
  0 siblings, 1 reply; 68+ messages in thread
From: Andi Kleen @ 2007-02-01 11:22 UTC (permalink / raw)
  To: jbohac
  Cc: linux-kernel, Vojtech Pavlik, ssouhlal, arjan, tglx, johnstul,
	zippel, andrea


>  
> -unsigned int cpu_khz;					/* TSC clocks / usec, not used here */
> +unsigned int cpu_khz;		/* TSC clocks / usec, not used here */
> +static s64 mt_per_tick;		/* master timer ticks per jiffie */
> +static u64 __mt;		/* master timer */
> +static u32 __mt_last;		/* value last read from read_master_timer() when updating timer caches */

Why the underscores? 

-Andi

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 7/9] Adapt the time initialization code
  2007-02-01  9:59 ` [patch 7/9] Adapt the time initialization code jbohac
@ 2007-02-01 11:26   ` Andi Kleen
  2007-02-01 13:41     ` Jiri Bohac
  0 siblings, 1 reply; 68+ messages in thread
From: Andi Kleen @ 2007-02-01 11:26 UTC (permalink / raw)
  To: jbohac
  Cc: linux-kernel, Vojtech Pavlik, ssouhlal, arjan, tglx, johnstul,
	zippel, andrea


> +extern void time_initialize_cpu(void);

Never put externs into .c files. Multiple occurrences.


> +void time_initialize_cpu(void *info)
> +{
> +	unsigned long flags;
> +	int cpu = smp_processor_id();

Are you sure this can never preempt? 

> +	/* FIXME: what speed does the cpu really start at; I doubt cpu_khz is right at this point ??!!!

It should be. It comes from measurements. Unless the CPU changes frequency
behind the kernel's back, but there is nothing that can be done then.

-Andi

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 8/9] Add time_update_mt_guess()
  2007-02-01 10:00 ` [patch 8/9] Add time_update_mt_guess() jbohac
@ 2007-02-01 11:28   ` Andi Kleen
  2007-02-01 13:54     ` Jiri Bohac
  0 siblings, 1 reply; 68+ messages in thread
From: Andi Kleen @ 2007-02-01 11:28 UTC (permalink / raw)
  To: jbohac
  Cc: linux-kernel, Vojtech Pavlik, ssouhlal, arjan, tglx, johnstul,
	zippel, andrea

On Thursday 01 February 2007 11:00, jbohac@suse.cz wrote:

> Index: linux-2.6.20-rc5/arch/x86_64/kernel/apic.c
> ===================================================================
> --- linux-2.6.20-rc5.orig/arch/x86_64/kernel/apic.c
> +++ linux-2.6.20-rc5/arch/x86_64/kernel/apic.c
> @@ -63,6 +63,9 @@ int using_apic_timer __read_mostly = 0;
>  
>  static void apic_pm_activate(void);
>  
> +extern void time_update_mt_guess(void);

No externs in .c files


> +inline u64 mt_to_nsec(u64 mt)
> +{
> +	u64 ret;
> +	ret  = ((mt & 0xffffff) * vxtime.mt_q) >> 32;
> +	mt >>= 24;
> +	ret += ((mt & 0xffffff) * vxtime.mt_q) >> 8;
> +	mt >>= 24;
> +	ret += ( mt             * vxtime.mt_q) << 16;
> +	return ret;

Why so complicated? Isn't a single multiply good enough?

-Andi

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 0/9] x86_64: reliable TSC-based gettimeofday
  2007-02-01  9:59 [patch 0/9] x86_64: reliable TSC-based gettimeofday jbohac
                   ` (9 preceding siblings ...)
  2007-02-01 11:20 ` [patch 0/9] x86_64: reliable TSC-based gettimeofday Andi Kleen
@ 2007-02-01 11:34 ` Ingo Molnar
  2007-02-01 11:46 ` [-mm patch] x86_64 GTOD: offer scalable vgettimeofday Ingo Molnar
  2007-02-02  4:22 ` [patch 0/9] x86_64: reliable TSC-based gettimeofday Andrew Morton
  12 siblings, 0 replies; 68+ messages in thread
From: Ingo Molnar @ 2007-02-01 11:34 UTC (permalink / raw)
  To: jbohac; +Cc: Andi Kleen, linux-kernel, Vojtech Pavlik, arjan, tglx, johnstul


* jbohac@suse.cz <jbohac@suse.cz> wrote:

> This implementation allows the current time to be approximated by 
> reading the CPU's TSC even on SMP machines with unsynchronised TSCs. 
> This allows us to have a very fast gettimeofday() vsyscall on all SMP 
> machines supporting the RDTSCP instruction (AMD) or having 
> synchronised TSCs (Intel).
>
> Inter-CPU monotonicity can not, however, be guaranteed in a vsyscall, 
> so vsyscall is not used by default. Still, the syscall version of 
> gettimeofday is a lot faster using the TSC approximation instead of 
> other hardware timers.

ok, this looks mostly good to me - but this definitely should be based 
/ontop/ of the x86_64 GTOD code. I.e. ontop of these patches in -mm:

 generic-vsyscall-gtod-support-for-generic_time.patch
 generic-vsyscall-gtod-support-for-generic_time-tidy.patch
 time-x86_64-hpet_address-cleanup.patch
 revert-x86_64-mm-ignore-long-smi-interrupts-in-clock-calibration.patch
 time-x86_64-split-x86_64-kernel-timec-up.patch
 time-x86_64-split-x86_64-kernel-timec-up-tidy.patch
 time-x86_64-split-x86_64-kernel-timec-up-fix.patch
 reapply-x86_64-mm-ignore-long-smi-interrupts-in-clock-calibration.patch
 time-x86_64-convert-x86_64-to-use-generic_time.patch
 time-x86_64-convert-x86_64-to-use-generic_time-fix.patch
 time-x86_64-convert-x86_64-to-use-generic_time-tidy.patch
 time-x86_64-hpet-fixup-clocksource-changes.patch
 time-x86_64-tsc-fixup-clocksource-changes.patch
 time-x86_64-re-enable-vsyscall-support-for-x86_64.patch
 time-x86_64-re-enable-vsyscall-support-for-x86_64-tidy.patch

also, note that there is a new TSC synchronization check code in -mm as 
well:

 x86-rewrite-smp-tsc-sync-code.patch

this should be ontop of that too. (and ontop of the high-res timers 
queue)

	Ingo

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 9/9] Make use of the Master Timer
  2007-02-01 10:00 ` [patch 9/9] Make use of the Master Timer jbohac
@ 2007-02-01 11:36   ` Andi Kleen
  2007-02-01 14:29     ` Jiri Bohac
  0 siblings, 1 reply; 68+ messages in thread
From: Andi Kleen @ 2007-02-01 11:36 UTC (permalink / raw)
  To: jbohac
  Cc: linux-kernel, Vojtech Pavlik, ssouhlal, arjan, tglx, johnstul,
	zippel, andrea

On Thursday 01 February 2007 11:00, jbohac@suse.cz wrote:

> +		case VXTIME_TSC:
> +			rdtscll(tsc);

Where is the CPU synchronization? 

> +	cpu = smp_processor_id();
> +	rdtscll(t);

Also no synchronization. It's slower, but needed.

>  unsigned long long sched_clock(void)
>  {
> -	unsigned long a = 0;
> -
> -	rdtscll(a);
> -	return cycles_2_ns(a);
> +	return monotonic_clock();
>  }

This is overkill because sched_clock() doesn't need a globally monotonic
clock, per CPU monotonic is enough. The old version was fine.


> +static __always_inline void do_vgettimeofday(struct timeval * tv, u64 tsc, int cpu)
> +{
> +	unsigned int sec;
> +	s64 nsec;
>  
> -	do {
> -		sequence = read_seqbegin(&__xtime_lock);
> -		
> -		sec = __xtime.tv_sec;
> -		usec = __xtime.tv_nsec / 1000;
> -
> -			usec += ((readl((void __iomem *)
> -				   fix_to_virt(VSYSCALL_HPET) + 0xf0) -
> -				  __vxtime.last) * __vxtime.quot) >> 32;
> -	} while (read_seqretry(&__xtime_lock, sequence));
> +	sec = __xtime.tv_sec;
> +	nsec = __xtime.tv_nsec;
> +	nsec +=	max(__do_gettimeoffset(tsc, cpu), __vxtime.drift);
>  
> -	tv->tv_sec = sec + usec / 1000000;
> -	tv->tv_usec = usec % 1000000;
> +	sec += nsec / NSEC_PER_SEC;
> +	nsec %= NSEC_PER_SEC;

Using while() here is probably faster (done in vdso patchkit where
gtod got mysteriously faster). Modulo and divisions are slow, even 
for constants when they are large.

You might want to use the algorithm from 
ftp://one.firstfloor.org/pub/ak/x86_64/quilt/patches/vdso
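
As a minimal sketch of the suggested normalization (illustrative only, not
the actual vdso code):

#include <linux/time.h>
#include <linux/types.h>

/* replace the divide/modulo with compare-and-subtract loops; in
 * gettimeofday the offset is at most a few ticks, so these loops
 * almost never run more than once or twice */
static inline void normalize_time(unsigned int *sec, s64 *nsec)
{
	while (*nsec >= NSEC_PER_SEC) {
		*nsec -= NSEC_PER_SEC;
		(*sec)++;
	}
	while (*nsec < 0) {
		*nsec += NSEC_PER_SEC;
		(*sec)--;
	}
}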

> +	if (nsec < 0) {
> +		--sec;
> +		nsec += NSEC_PER_SEC;
> +	}
> +	tv->tv_sec = sec;
> +	tv->tv_usec = nsec / NSEC_PER_USEC;

Similar. 

>  }
>  
>  /* RED-PEN may want to readd seq locking, but then the variable should be write-once. */
> @@ -107,10 +118,39 @@ static __always_inline long time_syscall
>  
>  int __vsyscall(0) vgettimeofday(struct timeval * tv, struct timezone * tz)
>  {
> -	if (!__sysctl_vsyscall)
> +	int cpu = 0;
> +	u64 tsc;
> +	unsigned long seq;
> +	int do_syscall = !__sysctl_vsyscall;
> +
> +	if (tv && !do_syscall)
> +		switch (__vxtime.mode) {
> +			case VXTIME_TSC:
> +			case VXTIME_TSCP:
> +				do {
> +					seq = read_seqbegin(&__xtime_lock);
> +
> +					if (__vxtime.mode == VXTIME_TSC)
> +						rdtscll(tsc);
> +					else {
> +						rdtscpll(tsc, cpu);
> +						cpu &= 0xfff;
> +					}
> +
> +					if (unlikely(__vxtime.cpu[cpu].tsc_invalid))
> +						do_syscall = 1;
> +					else
> +						do_vgettimeofday(tv, tsc, cpu);
> +
> +				} while (read_seqretry(&__xtime_lock, seq));
> +				break;
> +			default:
> +				do_syscall = 1;

Why do you not set __sysctl_vsyscall correctly for the mode at initialization?


-Andi

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [-mm patch] x86_64 GTOD: offer scalable vgettimeofday
  2007-02-01  9:59 [patch 0/9] x86_64: reliable TSC-based gettimeofday jbohac
                   ` (10 preceding siblings ...)
  2007-02-01 11:34 ` Ingo Molnar
@ 2007-02-01 11:46 ` Ingo Molnar
  2007-02-01 12:01   ` Andi Kleen
  2007-02-01 12:17   ` [-mm patch] x86_64 GTOD: offer scalable vgettimeofday II Andi Kleen
  2007-02-02  4:22 ` [patch 0/9] x86_64: reliable TSC-based gettimeofday Andrew Morton
  12 siblings, 2 replies; 68+ messages in thread
From: Ingo Molnar @ 2007-02-01 11:46 UTC (permalink / raw)
  To: jbohac
  Cc: Andi Kleen, linux-kernel, Vojtech Pavlik, arjan, tglx, johnstul,
	Andrew Morton


* jbohac@suse.cz <jbohac@suse.cz> wrote:

> Inter-CPU monotonicity can not, however, be guaranteed in a vsyscall, 
> so vsyscall is not used by default. [...]

note that this is not actually the case. My patch below, ontop of -mm, 
implements a fully monotonic gettimeofday as an optional vsyscall 
feature.

The 'price' paid for it is lower resolution - but it's still good for 
those benchmarking TPC-C runs - and /a lot/ simpler. It's also quite a 
bit faster than any TSC based vgettimeofday, because it doesn't have to 
do an RDTSC (or RDTSCP) instruction nor any approximation of the time.

	Ingo

---------------------------->
Subject: [patch] x86_64 GTOD: offer scalable vgettimeofday
From: Ingo Molnar <mingo@elte.hu>

offer scalable vgettimeofday independently of whether the TSC is 
synchronous or not. Off by default. Results in low resolution 
gettimeofday().

this patch also fixes an SMP bug in sys_vtime(): we should read 
__vsyscall_gtod_data.wall_time_tv.tv_sec only once.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86_64/kernel/vsyscall.c |   30 +++++++++++++++++++++++++++---
 1 file changed, 27 insertions(+), 3 deletions(-)

Index: linux/arch/x86_64/kernel/vsyscall.c
===================================================================
--- linux.orig/arch/x86_64/kernel/vsyscall.c
+++ linux/arch/x86_64/kernel/vsyscall.c
@@ -107,6 +107,22 @@ static __always_inline void do_vgettimeo
 	cycle_t now, base, mask, cycle_delta;
 	unsigned long seq, mult, shift, nsec_delta;
 	cycle_t (*vread)(void);
+
+	if (likely(__vsyscall_gtod_data.sysctl_enabled == 2)) {
+		struct timeval tmp;
+
+		do {
+			barrier();
+			*tv = __vsyscall_gtod_data.wall_time_tv;
+			barrier();
+			tmp = __vsyscall_gtod_data.wall_time_tv;
+
+		} while (tmp.tv_usec != tv->tv_usec ||
+					tmp.tv_sec != tv->tv_sec);
+
+		return;
+	}
+
 	do {
 		seq = read_seqbegin(&__vsyscall_gtod_data.lock);
 
@@ -151,11 +167,19 @@ int __vsyscall(0) vgettimeofday(struct t
  * unlikely */
 time_t __vsyscall(1) vtime(time_t *t)
 {
+	time_t secs;
+
 	if (!__vsyscall_gtod_data.sysctl_enabled)
 		return time_syscall(t);
-	else if (t)
-		*t = __vsyscall_gtod_data.wall_time_tv.tv_sec;
-	return __vsyscall_gtod_data.wall_time_tv.tv_sec;
+
+	/*
+	 * Make sure that what we return is the same number we
+	 * write:
+	 */
+	secs = __vsyscall_gtod_data.wall_time_tv.tv_sec;
+	if (t)
+		*t = secs;
+	return secs;
 }
 
 /* Fast way to get current CPU and node.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 0/9] x86_64: reliable TSC-based gettimeofday
  2007-02-01 11:20 ` [patch 0/9] x86_64: reliable TSC-based gettimeofday Andi Kleen
@ 2007-02-01 11:53   ` Andrea Arcangeli
  2007-02-01 12:02     ` Andi Kleen
  2007-02-01 12:17   ` Ingo Molnar
  2007-02-01 14:52   ` Jiri Bohac
  2 siblings, 1 reply; 68+ messages in thread
From: Andrea Arcangeli @ 2007-02-01 11:53 UTC (permalink / raw)
  To: Andi Kleen
  Cc: jbohac, linux-kernel, Vojtech Pavlik, ssouhlal, arjan, tglx,
	johnstul, zippel

On Thu, Feb 01, 2007 at 12:20:59PM +0100, Andi Kleen wrote:
> I think a better way to do this would be to define a new CLOCK_THREAD_MONOTONOUS
> (or better name) timer for clock_gettime(). 
> 
> [and my currently stalled vdso patches that implement clock_gettime
> as a vsyscall]
> 
> Then also an application could easily use it with LD_PRELOAD

I think a prctl to enable the non-monotone mode is better than any
LD_PRELOAD trick.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [-mm patch] x86_64 GTOD: offer scalable vgettimeofday
  2007-02-01 11:46 ` [-mm patch] x86_64 GTOD: offer scalable vgettimeofday Ingo Molnar
@ 2007-02-01 12:01   ` Andi Kleen
  2007-02-01 12:14     ` Ingo Molnar
  2007-02-01 12:17   ` [-mm patch] x86_64 GTOD: offer scalable vgettimeofday II Andi Kleen
  1 sibling, 1 reply; 68+ messages in thread
From: Andi Kleen @ 2007-02-01 12:01 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: jbohac, linux-kernel, Vojtech Pavlik, arjan, tglx, johnstul,
	Andrew Morton

On Thursday 01 February 2007 12:46, Ingo Molnar wrote:
> 
> * jbohac@suse.cz <jbohac@suse.cz> wrote:
> 
> > Inter-CPU monotonicity can not, however, be guaranteed in a vsyscall, 
> > so vsyscall is not used by default. [...]
> 
> note that this is not actually the case. My patch below, ontop of -mm, 
> implements a fully monotonic gettimeofday as an optional vsyscall 
> feature.
> 
> The 'price' paid for it is lower resolution - but it's still good for 
> those benchmarking TPC-C runs - and /alot/ simpler. It's also quite a 
> bit faster than any TSC based vgettimeofday, because it doesnt have to 
> do an RDTSC (or RDTSCP) instruction nor any approximation of the time.

I believe that should be also a separate clock_gettime() CLOCK_ 

Global settings for these things are bad. Even if you run TPC-C you don't
want your other programs that rely on monotonic time to break.

-Andi


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 0/9] x86_64: reliable TSC-based gettimeofday
  2007-02-01 11:53   ` Andrea Arcangeli
@ 2007-02-01 12:02     ` Andi Kleen
  2007-02-01 12:54       ` Andrea Arcangeli
  0 siblings, 1 reply; 68+ messages in thread
From: Andi Kleen @ 2007-02-01 12:02 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: jbohac, linux-kernel, Vojtech Pavlik, ssouhlal, arjan, tglx,
	johnstul, zippel

On Thursday 01 February 2007 12:53, Andrea Arcangeli wrote:
> On Thu, Feb 01, 2007 at 12:20:59PM +0100, Andi Kleen wrote:
> > I think a better way to do this would be to define a new CLOCK_THREAD_MONOTONOUS
> > (or better name) timer for clock_gettime(). 
> > 
> > [and my currently stalled vdso patches that implement clock_gettime
> > as a vsyscall]
> > 
> > Then also an application could easily use it with LD_PRELOAD
> 
> I think a prctl to enable the non monothone mode is better than any
> LD_PRELOAD trick.

I don't think so because having per process state in a vsyscall
is quite costly. You would need to allocate at least one more
page to each process, which I think would be excessive.

-Andi

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [-mm patch] x86_64 GTOD: offer scalable vgettimeofday
  2007-02-01 12:01   ` Andi Kleen
@ 2007-02-01 12:14     ` Ingo Molnar
  0 siblings, 0 replies; 68+ messages in thread
From: Ingo Molnar @ 2007-02-01 12:14 UTC (permalink / raw)
  To: Andi Kleen
  Cc: jbohac, linux-kernel, Vojtech Pavlik, arjan, tglx, johnstul,
	Andrew Morton


* Andi Kleen <ak@suse.de> wrote:

> > The 'price' paid for it is lower resolution - but it's still good 
> > for those benchmarking TPC-C runs - and /alot/ simpler. It's also 
> > quite a bit faster than any TSC based vgettimeofday, because it 
> > doesnt have to do an RDTSC (or RDTSCP) instruction nor any 
> > approximation of the time.
> 
> I believe that should be also a separate clock_gettime() CLOCK_
> 
> Global settings for these things are bad. Even if you run TPC-C you 
> don't want your other programs that rely on monotonic time to break.

yeah. But maybe there should still be an 'easy' option for people 
to consciously degrade the resolution of gettimeofday(), in exchange for 
more performance. There are systems where gettimeofday already has such 
resolution, so apps certainly shouldn't break from this. But i agree with 
you: that's why i made this default-off, and the CLOCK_ option could be 
a way for apps to reliably get this behavior, independently of the 
global setting (hence driving the migration of affected apps to this new 
CLOCK_ thing). Hm?

	Ingo

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 0/9] x86_64: reliable TSC-based gettimeofday
  2007-02-01 11:20 ` [patch 0/9] x86_64: reliable TSC-based gettimeofday Andi Kleen
  2007-02-01 11:53   ` Andrea Arcangeli
@ 2007-02-01 12:17   ` Ingo Molnar
  2007-02-01 14:52   ` Jiri Bohac
  2 siblings, 0 replies; 68+ messages in thread
From: Ingo Molnar @ 2007-02-01 12:17 UTC (permalink / raw)
  To: Andi Kleen; +Cc: jbohac, linux-kernel, Vojtech Pavlik, arjan, tglx, johnstul


* Andi Kleen <ak@suse.de> wrote:

> The big strategic problem is how to marry your patchkit to John 
> Stultz's clocksources work which is also competing for merge. Any 
> thoughts on that?

the only sane thing would be to do it ontop of -mm: the stuff in -mm, 
barring some catastrophy, is hopefully destined for v2.6.21. We could do 
my quick optional hack for those who want fast gettimeofday now ahead of 
that queue - but this approximation thing should be definitely ontop.

	Ingo

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [-mm patch] x86_64 GTOD: offer scalable vgettimeofday II
  2007-02-01 11:46 ` [-mm patch] x86_64 GTOD: offer scalable vgettimeofday Ingo Molnar
  2007-02-01 12:01   ` Andi Kleen
@ 2007-02-01 12:17   ` Andi Kleen
  2007-02-01 12:24     ` Ingo Molnar
  1 sibling, 1 reply; 68+ messages in thread
From: Andi Kleen @ 2007-02-01 12:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: jbohac, linux-kernel, Vojtech Pavlik, arjan, tglx, johnstul,
	Andrew Morton

On Thursday 01 February 2007 12:46, Ingo Molnar wrote:
> 
> * jbohac@suse.cz <jbohac@suse.cz> wrote:
> 
> > Inter-CPU monotonicity can not, however, be guaranteed in a vsyscall, 
> > so vsyscall is not used by default. [...]
> 
> note that this is not actually the case. My patch below, ontop of -mm, 
> implements a fully monotonic gettimeofday as an optional vsyscall 
> feature.
> 
> The 'price' paid for it is lower resolution - but it's still good for 
> those benchmarking TPC-C runs - and /alot/ simpler. 

BTW another comment: I was told that at least one of the big databases
wants ms resolution here. So making your scheme work
would require a regular HZ=1024 interrupt. But that would also make
everything slower again due to CPU overhead, as was learned in the 2.4->2.6 HZ
transition.

So it might not actually be worth it.

-Andi


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [-mm patch] x86_64 GTOD: offer scalable vgettimeofday II
  2007-02-01 12:17   ` [-mm patch] x86_64 GTOD: offer scalable vgettimeofday II Andi Kleen
@ 2007-02-01 12:24     ` Ingo Molnar
  2007-02-01 12:45       ` Andi Kleen
  0 siblings, 1 reply; 68+ messages in thread
From: Ingo Molnar @ 2007-02-01 12:24 UTC (permalink / raw)
  To: Andi Kleen
  Cc: jbohac, linux-kernel, Vojtech Pavlik, arjan, tglx, johnstul,
	Andrew Morton


* Andi Kleen <ak@suse.de> wrote:

> > The 'price' paid for it is lower resolution - but it's still good 
> > for those benchmarking TPC-C runs - and /alot/ simpler.
> 
> BTW another comment: I was told that at least one of the big databases 
> wants ms resolution here. So to make your scheme work would require a 
> HZ=1024 regular interrupt. [...]

if resolution is an issue then i can improve this thing to be based off 
a separate /optional/ hrtimer, thus if it's enabled it could provide a 1000 
Hz (and not 1024 Hz) update for the variable. The update resolution 
could be tuned trivially via a sysctl, so everyone could tune the 
resolution of this to the value desired, and could do so at runtime.

[ It could also be driven by the database right now: from a thread open 
  /dev/rtc, set it to 1024 HZ, and do a gettimeofday() call in every 
  tick - that will auto-update the timestamp. ]

> [...] But that would also make everything slower again due to CPU 
> overhead as it was learned in the 2.4->2.6 HZ transition.

note that this cost was measured on UP and on older hardware, and the 
cost of having a global 1000 Hz update gets linearly cheaper with the 
increase of CPUs on SMP: because only one such update has to be running. 
The systems those database vendors are interested in typically have a 
fair number of CPUs.

	Ingo

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [-mm patch] x86_64 GTOD: offer scalable vgettimeofday II
  2007-02-01 12:24     ` Ingo Molnar
@ 2007-02-01 12:45       ` Andi Kleen
  0 siblings, 0 replies; 68+ messages in thread
From: Andi Kleen @ 2007-02-01 12:45 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: jbohac, linux-kernel, Vojtech Pavlik, arjan, tglx, johnstul,
	Andrew Morton

On Thursday 01 February 2007 13:24, Ingo Molnar wrote:

> if resolution is an issue then i can improve this thing to be based off 
> a separate /optional/ hrtimer, thus if it's enabled it could enable 1000 
> Hz (and not 1024 Hz) update for the variable. The update resolution 
> could be tuned via a sysctl trivially, so everyone could tune the 
> resolution of this to the value desired, and could do so runtime.

It would be better to let the application set it without root rights 
(afaik W. allows this).  Auto tuning beats explicit configuration anytime.

Not sure it's really worth it though.

My thinking was to gather more requirements of what users actually
want first before adding all these new modi.

> [ It could also be driven by the database right now: from a thread open 
>   /dev/rtc, set it to 1024 HZ, and do a gettimeofday() call in every 
>   tick - that will auto-update the timestamp. ]

zmailer used to do that (or probably still does) but I always hated
the scheme for some reason :)


> > [...] But that would also make everything slower again due to CPU 
> > overhead as it was learned in the 2.4->2.6 HZ transition.
> 
> note that this cost was measured on UP and on older hardware, and the 
> cost of having a global 1000 Hz update gets linearly cheaper with the 
> increase of CPUs on SMP: because only one such update has to be running. 
> The systems those database vendors are interested in typically have a 
> fair number of CPUs.

Good point. Even on desktop with Multi Core or SMT it should be cheaper now.

-Andi

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 0/9] x86_64: reliable TSC-based gettimeofday
  2007-02-01 12:02     ` Andi Kleen
@ 2007-02-01 12:54       ` Andrea Arcangeli
  0 siblings, 0 replies; 68+ messages in thread
From: Andrea Arcangeli @ 2007-02-01 12:54 UTC (permalink / raw)
  To: Andi Kleen
  Cc: jbohac, linux-kernel, Vojtech Pavlik, ssouhlal, arjan, tglx,
	johnstul, zippel

On Thu, Feb 01, 2007 at 01:02:41PM +0100, Andi Kleen wrote:
> I don't think so because having per process state in a vsyscall
> is quite costly. You would need to allocate at least one more
> page to each process, which I think would be excessive.

You would need one page per cpu and to check a change in a TIF_
bitflag during switch_to (zero cost) and overwrite the vsyscall bit in
the slow path.

If we had a picotimeofday that would be guaranteed monotone... if he
can measure errors with shared memory in smp, it means the measurement
error (LAPIC and tsc frequency estimation) is longer than the time it
takes to bounce a spinlock and reach a second rdtscp. I hoped this
wouldn't happen. Could you send me the app used to reproduce the
non-monotonicity over shared memory with rdtscp? I finally have a (EE)
stepping F to attempt testing it. thanks!
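
Not the program being asked for, but a minimal sketch of that kind of
shared-memory check (two threads pinned to different CPUs publish timestamps
through a mutex, so any backwards step it reports is larger than the lock
hand-off itself; assumes at least two CPUs):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/time.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int64_t last_us;		/* last timestamp published by any CPU */

static void *worker(void *arg)
{
	long cpu = (long)arg;
	cpu_set_t set;
	struct timeval tv;
	int64_t now;
	int i;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

	for (i = 0; i < 10000000; i++) {
		pthread_mutex_lock(&lock);
		gettimeofday(&tv, NULL);
		now = (int64_t)tv.tv_sec * 1000000 + tv.tv_usec;
		if (now < last_us)
			printf("cpu %ld: time went back %lld us\n",
			       cpu, (long long)(last_us - now));
		last_us = now;
		pthread_mutex_unlock(&lock);
	}
	return NULL;
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, worker, (void *)1L);
	worker((void *)0L);
	pthread_join(t, NULL);
	return 0;
}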

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 2/9] Remove the support for the VXTIME_PMTMR timer mode
  2007-02-01 13:13     ` Jiri Bohac
@ 2007-02-01 13:13       ` Andi Kleen
  2007-02-01 13:59         ` Jiri Bohac
  0 siblings, 1 reply; 68+ messages in thread
From: Andi Kleen @ 2007-02-01 13:13 UTC (permalink / raw)
  To: Jiri Bohac
  Cc: linux-kernel, Vojtech Pavlik, ssouhlal, arjan, tglx, johnstul,
	zippel, andrea

On Thursday 01 February 2007 14:13, Jiri Bohac wrote:
> On Thu, Feb 01, 2007 at 12:13:31PM +0100, Andi Kleen wrote:
> > On Thursday 01 February 2007 10:59, jbohac@suse.cz wrote:
> > > VXTIME_PMTMR will be replaced by a more generic "Master Timer"
> > 
> > This means we have no fallback if something goes wrong with the Master timer? 
> > 
> > A little risky.
> 
> No, either HPET or PM Timer will become the Master Timer (elected
> on boot). Master timer is just an abstraction of these, so the
> rest of the timekeeping code needn't care which hardware timer is
> being used. That's why the VXTIME_PMTMR mode is not needed.

But there is no option for the user to force so, is there?

-Andi


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 2/9] Remove the support for the VXTIME_PMTMR timer mode
  2007-02-01 11:13   ` Andi Kleen
@ 2007-02-01 13:13     ` Jiri Bohac
  2007-02-01 13:13       ` Andi Kleen
  0 siblings, 1 reply; 68+ messages in thread
From: Jiri Bohac @ 2007-02-01 13:13 UTC (permalink / raw)
  To: Andi Kleen
  Cc: jbohac, linux-kernel, Vojtech Pavlik, ssouhlal, arjan, tglx,
	johnstul, zippel, andrea

On Thu, Feb 01, 2007 at 12:13:31PM +0100, Andi Kleen wrote:
> On Thursday 01 February 2007 10:59, jbohac@suse.cz wrote:
> > VXTIME_PMTMR will be replaced by a more generic "Master Timer"
> 
> This means we have no fallback if something goes wrong with the Master timer? 
> 
> A little risky.

No, either HPET or PM Timer will become the Master Timer (elected
on boot). Master timer is just an abstraction of these, so the
rest of the timekeeping code needn't care which hardware timer is
being used. That's why the VXTIME_PMTMR mode is not needed.
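
Roughly -- an illustrative sketch with assumed helper names, not the patch
code -- the election just picks one low-level read routine at boot and
routes everything else through it:

#include <linux/init.h>
#include <linux/types.h>
#include <asm/hpet.h>		/* hpet_readl(), HPET_COUNTER */
#include <asm/io.h>		/* inl() */

extern u32 pmtmr_ioport;	/* ACPI PM timer I/O port, set up elsewhere */

static u32 (*read_master_timer)(void);	/* the elected hardware timer */

static u32 read_mt_hpet(void)
{
	return hpet_readl(HPET_COUNTER);
}

static u32 read_mt_pmtmr(void)
{
	return inl(pmtmr_ioport) & 0xffffff;	/* the ACPI PM timer is 24 bits wide */
}

static void __init master_timer_elect(unsigned long hpet_address)
{
	if (hpet_address)			/* HPET preferred; 0 here also covers "nohpet" */
		read_master_timer = read_mt_hpet;
	else
		read_master_timer = read_mt_pmtmr;
}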

-- 
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, SUSE CZ


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 4/9] Remove the TSC synchronization on SMP machines
  2007-02-01 11:14   ` Andi Kleen
@ 2007-02-01 13:17     ` Jiri Bohac
  2007-02-01 15:16       ` Vojtech Pavlik
  2007-02-02  7:13       ` Andi Kleen
  2007-02-01 21:05     ` mbligh
  1 sibling, 2 replies; 68+ messages in thread
From: Jiri Bohac @ 2007-02-01 13:17 UTC (permalink / raw)
  To: Andi Kleen
  Cc: jbohac, linux-kernel, Vojtech Pavlik, ssouhlal, arjan, tglx,
	johnstul, zippel, andrea

On Thu, Feb 01, 2007 at 12:14:23PM +0100, Andi Kleen wrote:
> On Thursday 01 February 2007 10:59, jbohac@suse.cz wrote:
> > TSC is either synchronized by design or not reliable
> > to be used for anything, let alone timekeeping.
> 
> In my tree this is already done better by a patch from Ingo.
> Check if they look synchronized and don't use TSC if they are not.

The whole purpose of this patchset is to make use of TSC even if
it's not synchronized.

Synchronizing it will not make anything better in any way -- the
implementation just does not care whether TSCs are synchronized.
That's why I think the synchronization code is not needed.

-- 
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, SUSE CZ


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 6/9] Add the "Master Timer"
  2007-02-01 11:22   ` Andi Kleen
@ 2007-02-01 13:29     ` Jiri Bohac
  0 siblings, 0 replies; 68+ messages in thread
From: Jiri Bohac @ 2007-02-01 13:29 UTC (permalink / raw)
  To: Andi Kleen
  Cc: jbohac, linux-kernel, Vojtech Pavlik, ssouhlal, arjan, tglx,
	johnstul, zippel, andrea

On Thu, Feb 01, 2007 at 12:22:55PM +0100, Andi Kleen wrote:
> >  
> > -unsigned int cpu_khz;					/* TSC clocks / usec, not used here */
> > +unsigned int cpu_khz;		/* TSC clocks / usec, not used here */
> > +static s64 mt_per_tick;		/* master timer ticks per jiffie */
> > +static u64 __mt;		/* master timer */
> > +static u32 __mt_last;		/* value last read from read_master_timer() when updating timer caches */
> 
> Why the underscores? 

To make it clear that the variables should not be used directly.
They should only be accessed through the get_master_timer(),
set_master_timer64(), etc. functions.

Something wrong with that? I have no problem deleting the
underscores :-)

-- 
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, SUSE CZ


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 7/9] Adapt the time initialization code
  2007-02-01 11:26   ` Andi Kleen
@ 2007-02-01 13:41     ` Jiri Bohac
  0 siblings, 0 replies; 68+ messages in thread
From: Jiri Bohac @ 2007-02-01 13:41 UTC (permalink / raw)
  To: Andi Kleen
  Cc: jbohac, linux-kernel, Vojtech Pavlik, ssouhlal, arjan, tglx,
	johnstul, zippel, andrea

On Thu, Feb 01, 2007 at 12:26:33PM +0100, Andi Kleen wrote:
> 
> > +extern void time_initialize_cpu(void);
> 
> Never put externs into .c files. Multiple occurrences.

Ok, will fix it, sorry.

> > +void time_initialize_cpu(void *info)
> > +{
> > +	unsigned long flags;
> > +	int cpu = smp_processor_id();
> 
> Are you sure this can never preempt? 

Yes, preemption is explicitly disabled in start_secondary(), which
calls this function.

> > +	/* FIXME: what speed does the cpu really start at; I doubt cpu_khz is right at this point ??!!!
> 
> It should be. It comes from measurements. Unless the CPU changes frequency
> behind the kernel's back, but there is nothing that can be done then.

Well, I'm not sure. I think the global variable cpu_khz is wrong
in the first place. This should be per-cpu, because each CPU can
have a different frequency, right?

And cpu_khz is adjusted in time_cpufreq_notifier() whenever any
CPU's frequency changes.  To me it seems that it's a leftover from
the times when all CPUs in a system ran at the same speed. I
think it should be killed. I just did not want to make too many
unrelated changes in one patchset.

In this case it doesn't matter much that it probably is
wrong (as the comment explains)...


-- 
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, SUSE CZ


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 8/9] Add time_update_mt_guess()
  2007-02-01 11:28   ` Andi Kleen
@ 2007-02-01 13:54     ` Jiri Bohac
  0 siblings, 0 replies; 68+ messages in thread
From: Jiri Bohac @ 2007-02-01 13:54 UTC (permalink / raw)
  To: Andi Kleen
  Cc: jbohac, linux-kernel, Vojtech Pavlik, ssouhlal, arjan, tglx,
	johnstul, zippel, andrea

On Thu, Feb 01, 2007 at 12:28:50PM +0100, Andi Kleen wrote:
> On Thursday 01 February 2007 11:00, jbohac@suse.cz wrote:
> > +inline u64 mt_to_nsec(u64 mt)
> > +{
> > +	u64 ret;
> > +	ret  = ((mt & 0xffffff) * vxtime.mt_q) >> 32;
> > +	mt >>= 24;
> > +	ret += ((mt & 0xffffff) * vxtime.mt_q) >> 8;
> > +	mt >>= 24;
> > +	ret += ( mt             * vxtime.mt_q) << 16;
> > +	return ret;
> 
> Why so complicated? Isn't a single multiply good enough?

This does a multiplication and a downshift at once. The problem
is that if we first do the multiplication, the result won't fit
in 64 bits.

If we first do the downshift, we lose precision.

This does both operations at once, avoiding both the overflow and
underflow.
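
For illustration (not part of the patch), the same chunked multiply can be
checked standalone against a 128-bit reference; because the two low partial
products are floored separately, it can undercount the exact
(mt * q) >> 32 by at most 1:

/* standalone illustration, assumes q fits in 32 bits like vxtime.mt_q;
 * uses gcc's unsigned __int128 only for the reference value */
#include <assert.h>
#include <stdint.h>

static uint64_t mul_shift32(uint64_t mt, uint64_t q)
{
	uint64_t ret;

	ret  = ((mt & 0xffffff) * q) >> 32;	/* low 24 bits:  (lo * q) >> 32 */
	mt >>= 24;
	ret += ((mt & 0xffffff) * q) >> 8;	/* mid 24 bits:  ((mid << 24) * q) >> 32 == (mid * q) >> 8 */
	mt >>= 24;
	ret += (mt * q) << 16;			/* top 16 bits:  ((hi << 48) * q) >> 32 == (hi * q) << 16 */
	return ret;
}

int main(void)
{
	uint64_t mt = 0x0123456789abcdefULL, q = 0xfedcba98ULL;
	uint64_t ref = (uint64_t)(((unsigned __int128)mt * q) >> 32);
	uint64_t got = mul_shift32(mt, q);

	assert(ref - got <= 1);		/* exact up to rounding of the low chunks */
	return 0;
}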

-- 
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, SUSE CZ


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 2/9] Remove the support for the VXTIME_PMTMR timer mode
  2007-02-01 13:13       ` Andi Kleen
@ 2007-02-01 13:59         ` Jiri Bohac
  2007-02-01 14:18           ` Andi Kleen
  0 siblings, 1 reply; 68+ messages in thread
From: Jiri Bohac @ 2007-02-01 13:59 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Jiri Bohac, linux-kernel, Vojtech Pavlik, ssouhlal, arjan, tglx,
	johnstul, zippel, andrea

On Thu, Feb 01, 2007 at 02:13:00PM +0100, Andi Kleen wrote:
> On Thursday 01 February 2007 14:13, Jiri Bohac wrote:
> > On Thu, Feb 01, 2007 at 12:13:31PM +0100, Andi Kleen wrote:
> > > On Thursday 01 February 2007 10:59, jbohac@suse.cz wrote:
> > > > VXTIME_PMTMR will be replaced by a more generic "Master Timer"
> > > 
> > > This means we have no fallback if something goes wrong with the Master timer? 
> > > 
> > > A little risky.
> > 
> > No, either HPET or PM Timer will become the Master Timer (elected
> > on boot). Master timer is just an abstraction of these, so the
> > rest of the timekeeping code needn't care which hardware timer is
> > being used. That's why the VXTIME_PMTMR mode is not needed.
> 
> But there is no option for the user to force so, is there?

HPET is the default. If it's not available, or if the "nohpet"
command line option is given, the PM Timer will be used as the Master Timer.

If this is not enough, it can be easily fixed in time_init().

-- 
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, SUSE CZ


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 2/9] Remove the support for the VXTIME_PMTMR timer mode
  2007-02-01 13:59         ` Jiri Bohac
@ 2007-02-01 14:18           ` Andi Kleen
  0 siblings, 0 replies; 68+ messages in thread
From: Andi Kleen @ 2007-02-01 14:18 UTC (permalink / raw)
  To: Jiri Bohac
  Cc: linux-kernel, Vojtech Pavlik, ssouhlal, arjan, tglx, johnstul,
	zippel, andrea

On Thursday 01 February 2007 14:59, Jiri Bohac wrote:
> On Thu, Feb 01, 2007 at 02:13:00PM +0100, Andi Kleen wrote:
> > On Thursday 01 February 2007 14:13, Jiri Bohac wrote:
> > > On Thu, Feb 01, 2007 at 12:13:31PM +0100, Andi Kleen wrote:
> > > > On Thursday 01 February 2007 10:59, jbohac@suse.cz wrote:
> > > > > VXTIME_PMTMR will be replaced by a more generic "Master Timer"
> > > > 
> > > > This means we have no fallback if something goes wrong with the Master timer? 
> > > > 
> > > > A little risky.
> > > 
> > > No, either HPET or PM Timer will become the Master Timer (elected
> > > on boot). Master timer is just an abstraction of these, so the
> > > rest of the timekeeping code needn't care which hardware timer is
> > > being used. That's why the VXTIME_PMTMR mode is not needed.
> > 
> > But there is no option for the user to force so, is there?
> 
> HPET is the default. If it's not available or with the "nohpet"
> commandline option, PM Timer will be used as the Master Timer. 

This assumes all your algorithms are always correct.
 
> If this is not enough, it can be easily fixed in time_init().

I think we want at least one option that forces HPET/PMtimer as primary
time source.

-Andi

 

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 9/9] Make use of the Master Timer
  2007-02-01 11:36   ` Andi Kleen
@ 2007-02-01 14:29     ` Jiri Bohac
  2007-02-01 15:23       ` Vojtech Pavlik
  2007-02-02  7:04       ` Andi Kleen
  0 siblings, 2 replies; 68+ messages in thread
From: Jiri Bohac @ 2007-02-01 14:29 UTC (permalink / raw)
  To: Andi Kleen
  Cc: jbohac, linux-kernel, Vojtech Pavlik, ssouhlal, arjan, tglx,
	johnstul, zippel, andrea

On Thu, Feb 01, 2007 at 12:36:05PM +0100, Andi Kleen wrote:
> On Thursday 01 February 2007 11:00, jbohac@suse.cz wrote:
> 
> > +		case VXTIME_TSC:
> > +			rdtscll(tsc);
> 
> Where is the CPU synchronization? 
> 
> > +	cpu = smp_processor_id();
> > +	rdtscll(t);
> 
> Also no synchronization. It's slower, but needed.

Hmm, I wasn't sure. Why is it needed? How outdated can the
result of RDTSC / RDTSCP be?

If I do:
	rdtscll(a)
	...
	rdtscll(b)
is it guaranteed that (b > a) ?
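
For context, an illustrative sketch of what the missing synchronization
refers to: RDTSC is not a serializing instruction, so without a barrier the
CPU may move the read relative to the surrounding code; get_cycles_sync()
pays for the ordering with CPUID or RDTSCP. A fenced read -- assuming
LFENCE orders RDTSC on the CPU in question, which holds on Intel but
historically not on AMD -- might look like:

static inline unsigned long long rdtsc_fenced(void)
{
	unsigned int lo, hi;

	/* assumption: lfence keeps rdtsc from being executed early */
	asm volatile("lfence; rdtsc" : "=a" (lo), "=d" (hi) : : "memory");
	return ((unsigned long long)hi << 32) | lo;
}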

>
> >  unsigned long long sched_clock(void)
> >  {
> > -	unsigned long a = 0;
> > -
> > -	rdtscll(a);
> > -	return cycles_2_ns(a);
> > +	return monotonic_clock();
> >  }
> 
> This is overkill because sched_clock() doesn't need a globally monotonic
> clock, per CPU monotonic is enough. The old version was fine.

OK, thanks for spotting this. I'll change it to use __guess_mt().
(It's more or less equivalent to cycles_2_ns(), so there's no need to
maintain yet another tsc->ns ratio just for cycles_2_ns().)


> > +static __always_inline void do_vgettimeofday(struct timeval * tv, u64 tsc, int cpu)
> > +{
> > +	unsigned int sec;
> > +	s64 nsec;
> >  
> > -	do {
> > -		sequence = read_seqbegin(&__xtime_lock);
> > -		
> > -		sec = __xtime.tv_sec;
> > -		usec = __xtime.tv_nsec / 1000;
> > -
> > -			usec += ((readl((void __iomem *)
> > -				   fix_to_virt(VSYSCALL_HPET) + 0xf0) -
> > -				  __vxtime.last) * __vxtime.quot) >> 32;
> > -	} while (read_seqretry(&__xtime_lock, sequence));
> > +	sec = __xtime.tv_sec;
> > +	nsec = __xtime.tv_nsec;
> > +	nsec +=	max(__do_gettimeoffset(tsc, cpu), __vxtime.drift);
> >  
> > -	tv->tv_sec = sec + usec / 1000000;
> > -	tv->tv_usec = usec % 1000000;
> > +	sec += nsec / NSEC_PER_SEC;
> > +	nsec %= NSEC_PER_SEC;
> 
> Using while() here is probably faster (done in vdso patchkit where
> gtod got mysteriously faster). Modulo and divisions are slow, even 
> for constants when they are large.

OK, will do that

> 
> >  }
> >  
> >  /* RED-PEN may want to readd seq locking, but then the variable should be write-once. */
> > @@ -107,10 +118,39 @@ static __always_inline long time_syscall
> >  
> >  int __vsyscall(0) vgettimeofday(struct timeval * tv, struct timezone * tz)
> >  {
> > -	if (!__sysctl_vsyscall)
> > +	int cpu = 0;
> > +	u64 tsc;
> > +	unsigned long seq;
> > +	int do_syscall = !__sysctl_vsyscall;
> > +
> > +	if (tv && !do_syscall)
> > +		switch (__vxtime.mode) {
> > +			case VXTIME_TSC:
> > +			case VXTIME_TSCP:
> > +				do {
> > +					seq = read_seqbegin(&__xtime_lock);
> > +
> > +					if (__vxtime.mode == VXTIME_TSC)
> > +						rdtscll(tsc);
> > +					else {
> > +						rdtscpll(tsc, cpu);
> > +						cpu &= 0xfff;
> > +					}
> > +
> > +					if (unlikely(__vxtime.cpu[cpu].tsc_invalid))
> > +						do_syscall = 1;
> > +					else
> > +						do_vgettimeofday(tv, tsc, cpu);
> > +
> > +				} while (read_seqretry(&__xtime_lock, seq));
> > +				break;
> > +			default:
> > +				do_syscall = 1;
> 
> Why do you not set __sysctl_vsyscall correctly for the mode at initialization?

Because of the __vxtime.cpu[cpu].tsc_invalid flag. We may be
using the vsyscall, but when we get the cpufreq PRE-change notification,
we know the TSC cannot be trusted from that point on, until the
frequency stabilises. We set the flag, and until the TSC becomes
reliable again, the vsyscall falls back to reading the HW Master Timer.

So this is something that changes at runtime and cannot be set
permanently at initialization...
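
For illustration, this is roughly how the cpufreq hook works -- a minimal
sketch only, with hypothetical names except for vxtime.cpu[].tsc_invalid,
and without the real code's locking or the "keep it set for a while after
POSTCHANGE" logic:

	#include <linux/cpufreq.h>
	#include <linux/notifier.h>

	/* Sketch: mark the TSC untrusted around a frequency transition so
	 * gettimeofday falls back to reading the HW Master Timer. */
	static int vxtime_cpufreq_notifier(struct notifier_block *nb,
					   unsigned long event, void *data)
	{
		struct cpufreq_freqs *freq = data;

		if (event == CPUFREQ_PRECHANGE)
			vxtime.cpu[freq->cpu].tsc_invalid = 1;
		else if (event == CPUFREQ_POSTCHANGE)
			vxtime.cpu[freq->cpu].tsc_invalid = 0;
		return NOTIFY_OK;
	}

	static struct notifier_block vxtime_cpufreq_nb = {
		.notifier_call = vxtime_cpufreq_notifier,
	};

	/* registered once at boot with:
	 * cpufreq_register_notifier(&vxtime_cpufreq_nb,
	 *			     CPUFREQ_TRANSITION_NOTIFIER); */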


-- 
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, SUSE CZ


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 0/9] x86_64: reliable TSC-based gettimeofday
  2007-02-01 11:20 ` [patch 0/9] x86_64: reliable TSC-based gettimeofday Andi Kleen
  2007-02-01 11:53   ` Andrea Arcangeli
  2007-02-01 12:17   ` Ingo Molnar
@ 2007-02-01 14:52   ` Jiri Bohac
  2007-02-01 16:56     ` john stultz
  2 siblings, 1 reply; 68+ messages in thread
From: Jiri Bohac @ 2007-02-01 14:52 UTC (permalink / raw)
  To: Andi Kleen
  Cc: jbohac, linux-kernel, Vojtech Pavlik, ssouhlal, arjan, tglx,
	johnstul, zippel, andrea

On Thu, Feb 01, 2007 at 12:20:59PM +0100, Andi Kleen wrote:
> On Thursday 01 February 2007 10:59, jbohac@suse.cz wrote:
> 
> > 
> > Inter-CPU monotonicity can not, however, be guaranteed in a vsyscall, so
> > vsyscall is not used by default.
> 
> Only for unsynchronized machines I hope 

yes, sorry, only on unsynchronized machines

> The big strategic problem is how to marry your patchkit to John Stultz's
> clocksources work which is also competing for merge. Any thoughts on that? 

I'll look into that next week. Sorry, I wanted to do that a long time
ago, but I spent weeks (over a month) fighting a nasty livelock
in the code. (Moral: think twice before using a spinlock inside
                      a {do .. while (read_seqretry(..))} loop)
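
In the abstract, the pattern to avoid looks like this (a sketch, not code
from this patchset; xtime_lock and some_lock stand in for any seqlock and
spinlock that a writer may take together):

	do {
		seq = read_seqbegin(&xtime_lock);
		spin_lock(&some_lock);		/* don't do this in here */
		/* ... read shared state ... */
		spin_unlock(&some_lock);
	} while (read_seqretry(&xtime_lock, seq));

If the seqlock writer can contend for (or interrupt the holder of)
some_lock, the reader above may never observe a stable sequence.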

> >When strict inter-CPU monotonicity is not needed, the vsyscall version of
> >gettimeofday may be forced using the "nomonotonic" command line parameter.
> >gettimeofday()'s monotonicity is guaranteed on a single CPU even with the very
> >fast vsyscall version.  Across CPUs, the vsyscall version of gettimeofday is
> >not guaranteed to be monotonic, but it should be pretty close. Currently, we
> >get errors of tens/hundreds of microseconds.
> 
> I think a better way to do this would be to define a new CLOCK_THREAD_MONOTONOUS
> (or better name) timer for clock_gettime(). 

I absolutely agree. Will do that. This should give userspace a
decently accurate and very fast time source.

-- 
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, SUSE CZ


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 4/9] Remove the TSC synchronization on SMP machines
  2007-02-01 13:17     ` Jiri Bohac
@ 2007-02-01 15:16       ` Vojtech Pavlik
  2007-02-02  7:14         ` Andi Kleen
  2007-02-02  7:13       ` Andi Kleen
  1 sibling, 1 reply; 68+ messages in thread
From: Vojtech Pavlik @ 2007-02-01 15:16 UTC (permalink / raw)
  To: Jiri Bohac
  Cc: Andi Kleen, linux-kernel, ssouhlal, arjan, tglx, johnstul,
	zippel, andrea

On Thu, Feb 01, 2007 at 02:17:15PM +0100, Jiri Bohac wrote:
> On Thu, Feb 01, 2007 at 12:14:23PM +0100, Andi Kleen wrote:
> > On Thursday 01 February 2007 10:59, jbohac@suse.cz wrote:
> > > TSC is either synchronized by design or not reliable
> > > to be used for anything, let alone timekeeping.
> > 
> > In my tree this is already done better by a patch from Ingo.
> > Check if they look synchronized and don't use TSC if they are not.
> 
> The whole purpose of this patchset is to make use of TSC even if
> it's not synchronized.
> 
> Synchronizing it will not make anything better in any way -- the
> implementation just does not care whether TSCs are synchronized.
> That's why I think the synchronization code is not needed.
 
It might even make sense to desynchronize the TSCs on such (AMD)
machines on purpose, so that applications that rely on the TSC break
immediately and not after some time, when the error becomes too large.

-- 
Vojtech Pavlik
Director SuSE Labs

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 9/9] Make use of the Master Timer
  2007-02-01 14:29     ` Jiri Bohac
@ 2007-02-01 15:23       ` Vojtech Pavlik
  2007-02-02  7:05         ` Andi Kleen
  2007-02-02  7:04       ` Andi Kleen
  1 sibling, 1 reply; 68+ messages in thread
From: Vojtech Pavlik @ 2007-02-01 15:23 UTC (permalink / raw)
  To: Jiri Bohac
  Cc: Andi Kleen, linux-kernel, ssouhlal, arjan, tglx, johnstul,
	zippel, andrea

On Thu, Feb 01, 2007 at 03:29:31PM +0100, Jiri Bohac wrote:
> On Thu, Feb 01, 2007 at 12:36:05PM +0100, Andi Kleen wrote:
> > On Thursday 01 February 2007 11:00, jbohac@suse.cz wrote:
> > 
> > > +		case VXTIME_TSC:
> > > +			rdtscll(tsc);
> > 
> > Where is the CPU synchronization? 
> > 
> > > +	cpu = smp_processor_id();
> > > +	rdtscll(t);
> > 
> > Also no synchronization. It's slower, but needed.
> 
> Hmm, I wasn't sure. Why is it needed? How outdated can the
> result of RDTSC / RDTSCP be?
> 
> If I do:
> 	rdtscll(a)
> 	...
> 	rdtscll(b)
> is it guaranteed that (b > a) ?

On a single CPU this is always guaranteed. Even on AMD.

> > >  unsigned long long sched_clock(void)
> > >  {
> > > -	unsigned long a = 0;
> > > -
> > > -	rdtscll(a);
> > > -	return cycles_2_ns(a);
> > > +	return monotonic_clock();
> > >  }
> > 
> > This is overkill because sched_clock() doesn't need a globally monotonic
> > clock, per CPU monotonic is enough. The old version was fine.
> 
> OK, thanks for spotting this. I'll change it to use __guess_mt().
> (more or less equal to cycles_2_ns(), no need to maintain yet another
> tsc->ns ratio just for cycles_2_ns().

Will this also work correctly during CPU frequency changes?

> > > -	tv->tv_sec = sec + usec / 1000000;
> > > -	tv->tv_usec = usec % 1000000;
> > > +	sec += nsec / NSEC_PER_SEC;
> > > +	nsec %= NSEC_PER_SEC;
> > 
> > Using while() here is probably faster (done in vdso patchkit where
> > gtod got mysteriously faster). Modulo and divisions are slow, even 
> > for constants when they are large.
> 
> OK, will do that

I'd suggest benchmarking the difference.

-- 
Vojtech Pavlik
Director SuSE Labs

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 0/9] x86_64: reliable TSC-based gettimeofday
  2007-02-01 14:52   ` Jiri Bohac
@ 2007-02-01 16:56     ` john stultz
  2007-02-01 19:41       ` Vojtech Pavlik
  0 siblings, 1 reply; 68+ messages in thread
From: john stultz @ 2007-02-01 16:56 UTC (permalink / raw)
  To: Jiri Bohac
  Cc: Andi Kleen, linux-kernel, Vojtech Pavlik, ssouhlal, arjan, tglx,
	zippel, andrea

On Thu, 2007-02-01 at 15:52 +0100, Jiri Bohac wrote:
> On Thu, Feb 01, 2007 at 12:20:59PM +0100, Andi Kleen wrote:
>>
> > The big strategic problem is how to marry your patchkit to John Stultz's
> > clocksources work which is also competing for merge. Any thoughts on that? 
> 
> I'll look into that next week. Sorry, I wanted to do that a long time
> ago, but I spent weeks (over a month) fighting a nasty livelock
> in the code. (Morale: think twice before using a spinlock inside
>                       a {do .. while (read_seqretry(..))} loop)

The first step here shouldn't be too difficult. Just create a _read
function that uses your code to return monotonic TSC cycles (instead of
nanoseconds w/ gettimeofday).  Then just create a clocksource structure
for it.
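
As a rough sketch against the clocksource API (all names, the rating and
the shift value here are made up; read_master_timer() stands in for a
guess_mt()-style read returning raw cycles):

	#include <linux/clocksource.h>

	static cycle_t read_master_timer(void)
	{
		/* would return monotonic Master Timer cycles,
		 * i.e. roughly what guess_mt() computes */
		return (cycle_t)guess_mt();
	}

	static struct clocksource clocksource_mt = {
		.name	= "master-timer",
		.rating	= 300,
		.read	= read_master_timer,
		.mask	= CLOCKSOURCE_MASK(64),
		.shift	= 22,
		/* .mult filled in at init from the MT frequency:
		 * clocksource_hz2mult(mt_hz, 22) */
	};

	/* clocksource_register(&clocksource_mt); */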

The harder part will be the vsyscall, as you will need extra per cpu
data in the vsyscall read. I had some test code for this situation
awhile back, so if you get the first part functioning correctly (just a
clocksource w/o a vread pointer), I'll gladly help you get the vsyscall
bits working.

thanks
-john



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 0/9] x86_64: reliable TSC-based gettimeofday
  2007-02-01 16:56     ` john stultz
@ 2007-02-01 19:41       ` Vojtech Pavlik
  0 siblings, 0 replies; 68+ messages in thread
From: Vojtech Pavlik @ 2007-02-01 19:41 UTC (permalink / raw)
  To: john stultz
  Cc: Jiri Bohac, Andi Kleen, linux-kernel, ssouhlal, arjan, tglx,
	zippel, andrea

On Thu, Feb 01, 2007 at 08:56:48AM -0800, john stultz wrote:
> On Thu, 2007-02-01 at 15:52 +0100, Jiri Bohac wrote:
> > On Thu, Feb 01, 2007 at 12:20:59PM +0100, Andi Kleen wrote:
> >>
> > > The big strategic problem is how to marry your patchkit to John Stultz's
> > > clocksources work which is also competing for merge. Any thoughts on that? 
> > 
> > I'll look into that next week. Sorry, I wanted to do that a long time
> > ago, but I spent weeks (over a month) fighting a nasty livelock
> > in the code. (Morale: think twice before using a spinlock inside
> >                       a {do .. while (read_seqretry(..))} loop)
> 
> The first step here shouldn't be too difficult. Just create a _read
> function that uses your code to return monotonic TSC cycles (instead of
> nanoseconds w/ gettimeofday).  Then just create a clocksource structure
> for it.

guess_mt() is more or less the function you're looking for. (With the
exception of the cpufreq and mode switching logic.)

> The harder part will be the vsyscall, as you will need extra per cpu
> data in the vsyscall read. I had some test code for this situation
> awhile back, so if you get the first part functioning correctly (just a
> clocksource w/o a vread pointer), I'll gladly help you get the vsyscall
> bits working.
> 
> thanks
> -john
> 
> 

-- 
Vojtech Pavlik
Director SuSE Labs

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 4/9] Remove the TSC synchronization on SMP machines
  2007-02-01 11:14   ` Andi Kleen
  2007-02-01 13:17     ` Jiri Bohac
@ 2007-02-01 21:05     ` mbligh
  1 sibling, 0 replies; 68+ messages in thread
From: mbligh @ 2007-02-01 21:05 UTC (permalink / raw)
  To: Andi Kleen
  Cc: jbohac, linux-kernel, Vojtech Pavlik, ssouhlal, arjan, tglx,
	johnstul, zippel, andrea

Andi Kleen wrote:
> On Thursday 01 February 2007 10:59, jbohac@suse.cz wrote:
>> TSC is either synchronized by design or not reliable
>> to be used for anything, let alone timekeeping.
> 
> In my tree this is already done better by a patch from Ingo.
> Check if they look synchronized and don't use TSC if they are not.

Is it going to notice dynamically when one CPU ramps down/up
at runtime and loses sync?

M.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 1/9] Fix HPET init race
  2007-02-01  9:59 ` [patch 1/9] Fix HPET init race jbohac
@ 2007-02-02  2:34   ` Andrew Morton
  2007-02-06 16:44     ` Jiri Bohac
  0 siblings, 1 reply; 68+ messages in thread
From: Andrew Morton @ 2007-02-02  2:34 UTC (permalink / raw)
  To: jbohac
  Cc: Andi Kleen, linux-kernel, Vojtech Pavlik, ssouhlal, arjan, tglx,
	johnstul, zippel, andrea

On Thu, 01 Feb 2007 10:59:53 +0100 jbohac@suse.cz wrote:

> Fix a race in the initialization of HPET, which might result in a 
> 5 minute lockup on boot.
> 

What race?  Please always describe bugs when fixing them.

> 
> Index: linux-2.6.20-rc5/arch/x86_64/kernel/apic.c
> ===================================================================
> --- linux-2.6.20-rc5.orig/arch/x86_64/kernel/apic.c
> +++ linux-2.6.20-rc5/arch/x86_64/kernel/apic.c
> @@ -764,10 +767,12 @@ static void setup_APIC_timer(unsigned in
>  
>  	/* wait for irq slice */
>   	if (vxtime.hpet_address && hpet_use_timer) {
> - 		int trigger = hpet_readl(HPET_T0_CMP);
> - 		while (hpet_readl(HPET_COUNTER) >= trigger)
> - 			/* do nothing */ ;
> - 		while (hpet_readl(HPET_COUNTER) <  trigger)
> +		int trigger;
> +		do
> +			trigger = hpet_readl(HPET_T0_CMP);
> +		while (hpet_readl(HPET_COUNTER) >= trigger);
> +

Is this signedness-safe and wraparound-safe?  It might be better to make
`trigger' unsigned and do

	while (hpet_readl(HPET_COUNTER) - trigger >= 0)

> +		while (hpet_readl(HPET_COUNTER) <  trigger)
>   			/* do nothing */ ;

ditto.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 0/9] x86_64: reliable TSC-based gettimeofday
  2007-02-01  9:59 [patch 0/9] x86_64: reliable TSC-based gettimeofday jbohac
                   ` (11 preceding siblings ...)
  2007-02-01 11:46 ` [-mm patch] x86_64 GTOD: offer scalable vgettimeofday Ingo Molnar
@ 2007-02-02  4:22 ` Andrew Morton
  2007-02-02  7:07   ` Andi Kleen
  12 siblings, 1 reply; 68+ messages in thread
From: Andrew Morton @ 2007-02-02  4:22 UTC (permalink / raw)
  To: jbohac
  Cc: Andi Kleen, linux-kernel, Vojtech Pavlik, ssouhlal, arjan, tglx,
	johnstul, zippel, andrea

On Thu, 01 Feb 2007 10:59:52 +0100 jbohac@suse.cz wrote:

> TSC-based x86_64 timekeeping implementation

I worry about the relationship between this work and all the time-management
changes in -mm.  If Andi were to merge all this stuff under that, then I
expect various catastrophes would ensue.

Have you checked to determine the severity of the overlaps?

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 9/9] Make use of the Master Timer
  2007-02-01 14:29     ` Jiri Bohac
  2007-02-01 15:23       ` Vojtech Pavlik
@ 2007-02-02  7:04       ` Andi Kleen
  1 sibling, 0 replies; 68+ messages in thread
From: Andi Kleen @ 2007-02-02  7:04 UTC (permalink / raw)
  To: Jiri Bohac
  Cc: linux-kernel, Vojtech Pavlik, ssouhlal, arjan, tglx, johnstul,
	zippel, andrea, Recent, Addresses

On Thursday 01 February 2007 15:29, Jiri Bohac wrote:

> If I do:
> 	rdtscll(a)
> 	...
> 	rdtscll(b)
> is it guaranteed that (b > a) ?

It's not guaranteed architecturally -- unless you have a barrier.

On P4 the microarchitecture guarantees it, but there the barrier in
get_cycles_sync is patched away. On other x86-64 CPUs it is generally needed.

The effect can also be seen between CPUs.
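
Roughly, a synchronized read boils down to the sketch below; the real
get_cycles_sync uses alternatives so the barrier can be patched out where
the microarchitecture already guarantees ordering:

	/* CPUID is a serializing instruction, so the RDTSC below cannot
	 * be executed ahead of earlier instructions. */
	static inline unsigned long long rdtsc_serialized(void)
	{
		unsigned int eax = 0, ebx, ecx = 0, edx;
		unsigned int lo, hi;

		asm volatile("cpuid"
			     : "=a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx)
			     : "0" (eax), "2" (ecx) : "memory");
		asm volatile("rdtsc" : "=a" (lo), "=d" (hi));
		return ((unsigned long long)hi << 32) | lo;
	}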

> Because of the __vxtime.cpu[cpu].tsc_invalid flag. We may be

You can still precompute it for the HPET etc. case.
Those paths are already slow, but saving a condition there might still
be worth it.

-Andi

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 9/9] Make use of the Master Timer
  2007-02-01 15:23       ` Vojtech Pavlik
@ 2007-02-02  7:05         ` Andi Kleen
  0 siblings, 0 replies; 68+ messages in thread
From: Andi Kleen @ 2007-02-02  7:05 UTC (permalink / raw)
  To: Vojtech Pavlik
  Cc: Jiri Bohac, linux-kernel, ssouhlal, arjan, tglx, johnstul,
	zippel, andrea


> > Hmm, I wasn't sure. Why is it needed? How outdated can the
> > result of RDTSC / RDTSCP be?
> > 
> > If I do:
> > 	rdtscll(a)
> > 	...
> > 	rdtscll(b)
> > is it guaranteed that (b > a) ?
> 
> On a single CPU this is always guaranteed. Even on AMD.

It's not guaranteed on Intel at least (but it happens to work on P4).

-Andi

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 0/9] x86_64: reliable TSC-based gettimeofday
  2007-02-02  4:22 ` [patch 0/9] x86_64: reliable TSC-based gettimeofday Andrew Morton
@ 2007-02-02  7:07   ` Andi Kleen
  0 siblings, 0 replies; 68+ messages in thread
From: Andi Kleen @ 2007-02-02  7:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: jbohac, linux-kernel, Vojtech Pavlik, ssouhlal, arjan, tglx,
	johnstul, zippel, andrea

On Friday 02 February 2007 05:22, Andrew Morton wrote:
> On Thu, 01 Feb 2007 10:59:52 +0100 jbohac@suse.cz wrote:
> 
> > TSC-based x86_64 timekeeping implementation
> 
> I worry about the relationship between this work and all the time-management
> changes in -mm.  If Andi to were to merge all this stuff under that then I
> expect various catastrophes would ensue.
> 
> Have you checked to determine the severity of the overlaps?

The overlap is quite total. They both overhaul the time code completely. 

I suspect the way to go is to reimplement Jiri's patch on top of clocksources
(hopefully now with working algorithms; that shouldn't be too hard). But 
I'm still waiting for Jiri's assessment of how feasible this is.

-Andi

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 4/9] Remove the TSC synchronization on SMP machines
  2007-02-01 13:17     ` Jiri Bohac
  2007-02-01 15:16       ` Vojtech Pavlik
@ 2007-02-02  7:13       ` Andi Kleen
  1 sibling, 0 replies; 68+ messages in thread
From: Andi Kleen @ 2007-02-02  7:13 UTC (permalink / raw)
  To: Jiri Bohac
  Cc: linux-kernel, Vojtech Pavlik, ssouhlal, arjan, tglx, johnstul,
	zippel, andrea

On Thursday 01 February 2007 14:17, Jiri Bohac wrote:
> On Thu, Feb 01, 2007 at 12:14:23PM +0100, Andi Kleen wrote:
> > On Thursday 01 February 2007 10:59, jbohac@suse.cz wrote:
> > > TSC is either synchronized by design or not reliable
> > > to be used for anything, let alone timekeeping.
> > 
> > In my tree this is already done better by a patch from Ingo.
> > Check if they look synchronized and don't use TSC if they are not.
> 
> The whole purpose of this patchset is to make use of TSC even if
> it's not synchronized.

It's still useful as a double check for platforms (like Intel single node) 
which are supposed to be synchronized.
 
> Synchronizing it will not make anything better in any way -- the
> implementation just does not care whether TSCs are synchronized.
> That's why I think the synchronization code is not needed.

It doesn't actively synchronize it, just checks if they look synchronized.

-Andi 

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 4/9] Remove the TSC synchronization on SMP machines
  2007-02-01 15:16       ` Vojtech Pavlik
@ 2007-02-02  7:14         ` Andi Kleen
  2007-02-13  0:34           ` Christoph Lameter
  0 siblings, 1 reply; 68+ messages in thread
From: Andi Kleen @ 2007-02-02  7:14 UTC (permalink / raw)
  To: Vojtech Pavlik
  Cc: Jiri Bohac, linux-kernel, ssouhlal, arjan, tglx, johnstul,
	zippel, andrea

On Thursday 01 February 2007 16:16, Vojtech Pavlik wrote:

> It might even make sense to desycnhronize the TSCs on such (AMD)
> machines on purpose, so that applications that rely on TSC break
> immediately and not after some time when the error becomes too large.

They won't, because they're normally single-threaded (and most people
still use single-core systems anyway).

I've threatened to just disable RDTSC for ring 3 before, but it'll likely
never happen because too many programs use it.

-Andi

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 4/9] Remove the TSC synchronization on SMP machines
  2007-02-01  9:59 ` [patch 4/9] Remove the TSC synchronization on SMP machines jbohac
  2007-02-01 11:14   ` Andi Kleen
@ 2007-02-03  1:16   ` H. Peter Anvin
  1 sibling, 0 replies; 68+ messages in thread
From: H. Peter Anvin @ 2007-02-03  1:16 UTC (permalink / raw)
  To: jbohac
  Cc: Andi Kleen, linux-kernel, Vojtech Pavlik, ssouhlal, arjan, tglx,
	johnstul, zippel, andrea

jbohac@suse.cz wrote:
> TSC is either synchronized by design or not reliable
> to be used for anything, let alone timekeeping.

This refers to eliminating the offset between multiple synchronized TSCs.

	-hpa

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 1/9] Fix HPET init race
  2007-02-02  2:34   ` Andrew Morton
@ 2007-02-06 16:44     ` Jiri Bohac
  2007-02-07  0:12       ` Andrew Morton
  0 siblings, 1 reply; 68+ messages in thread
From: Jiri Bohac @ 2007-02-06 16:44 UTC (permalink / raw)
  To: Andrew Morton
  Cc: jbohac, Andi Kleen, linux-kernel, Vojtech Pavlik, ssouhlal,
	arjan, tglx, johnstul, zippel, andrea

On Thu, Feb 01, 2007 at 06:34:50PM -0800, Andrew Morton wrote:
> On Thu, 01 Feb 2007 10:59:53 +0100 jbohac@suse.cz wrote:
> 
> > Fix a race in the initialization of HPET, which might result in a 
> > 5 minute lockup on boot.
> > 
> 
> What race?  Please always describe bugs when fixing them.

If the value of the HPET_T0_CMP register is reached and exceeded
by the value of the HPET_COUNTER register after HPET_T0_CMP is
read into trigger, but before the first iteration of the while loop,
the loop will iterate "endlessly" until the HPET counter eventually
wraps around (which can take as much as 5 minutes).

> > 
> > Index: linux-2.6.20-rc5/arch/x86_64/kernel/apic.c
> > ===================================================================
> > --- linux-2.6.20-rc5.orig/arch/x86_64/kernel/apic.c
> > +++ linux-2.6.20-rc5/arch/x86_64/kernel/apic.c
> > @@ -764,10 +767,12 @@ static void setup_APIC_timer(unsigned in
> >  
> >  	/* wait for irq slice */
> >   	if (vxtime.hpet_address && hpet_use_timer) {
> > - 		int trigger = hpet_readl(HPET_T0_CMP);
> > - 		while (hpet_readl(HPET_COUNTER) >= trigger)
> > - 			/* do nothing */ ;
> > - 		while (hpet_readl(HPET_COUNTER) <  trigger)
> > +		int trigger;
> > +		do
> > +			trigger = hpet_readl(HPET_T0_CMP);
> > +		while (hpet_readl(HPET_COUNTER) >= trigger);
> > +
> 
> Is this signedness-safe and wraparound-safe?  It might be better to make
> `trigger' unsigned and do
> 
> 	while (hpet_readl(HPET_COUNTER) - trigger >= 0)


Yes, making trigger unsigned is a good idea (although having it
signed would probably never cause any problem, because this is
called during boot and it takes ~2 minutes for the HPET to overflow
an s32).

But no, looping while the unsigned result is >= 0 does not seem
that good an idea to me ;-)
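
(For the record, the wrap-safe spelling Andrew is presumably hinting at
compares a signed difference, in the spirit of time_after():

	u32 trigger = hpet_readl(HPET_T0_CMP);
	while ((s32)(hpet_readl(HPET_COUNTER) - trigger) >= 0)
		/* do nothing */ ;

which only works as long as the two values stay within half the counter
range of each other.)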

An updated patch follows. It is still not wraparound-safe (a
lockup would still happen if it were called ~5 minutes after boot, but
this should never happen -- it's called early during boot).

--- linux-2.6.20-rc5.orig/arch/x86_64/kernel/apic.c	2007-02-06 16:56:00.000000000 +0100
+++ linux-2.6.20-rc5/arch/x86_64/kernel/apic.c	2007-02-06 17:26:42.000000000 +0100
@@ -764,10 +764,12 @@ static void setup_APIC_timer(unsigned in
 
 	/* wait for irq slice */
  	if (vxtime.hpet_address && hpet_use_timer) {
- 		int trigger = hpet_readl(HPET_T0_CMP);
- 		while (hpet_readl(HPET_COUNTER) >= trigger)
- 			/* do nothing */ ;
- 		while (hpet_readl(HPET_COUNTER) <  trigger)
+		u32 trigger;
+		do
+			trigger = hpet_readl(HPET_T0_CMP);
+		while (hpet_readl(HPET_COUNTER) >= trigger);
+
+		while (hpet_readl(HPET_COUNTER) <  trigger)
  			/* do nothing */ ;
  	} else {
 		int c1, c2;


-- 
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, SUSE CZ


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 1/9] Fix HPET init race
  2007-02-06 16:44     ` Jiri Bohac
@ 2007-02-07  0:12       ` Andrew Morton
  2007-02-10 12:31         ` Andi Kleen
  0 siblings, 1 reply; 68+ messages in thread
From: Andrew Morton @ 2007-02-07  0:12 UTC (permalink / raw)
  To: Jiri Bohac
  Cc: Andi Kleen, linux-kernel, Vojtech Pavlik, ssouhlal, arjan, tglx,
	johnstul, zippel, andrea

On Tue, 6 Feb 2007 17:44:59 +0100
Jiri Bohac <jbohac@suse.cz> wrote:

> On Thu, Feb 01, 2007 at 06:34:50PM -0800, Andrew Morton wrote:
> > On Thu, 01 Feb 2007 10:59:53 +0100 jbohac@suse.cz wrote:
> > 
> > > Fix a race in the initialization of HPET, which might result in a 
> > > 5 minute lockup on boot.
> > > 
> > 
> > What race?  Please always describe bugs when fixing them.
> 
> If the value of the HPET_T0_CMP register is reached and exceeded
> by the value of the HPET_COUNTER register after HPET_T0_CMP is
> read into trigger, but before the first iteration of the while,
> the while loop will iterate "endlessly" until the HPET overlaps
> eventually (in as much as 5 minutes).
> 
> > > 
> > > Index: linux-2.6.20-rc5/arch/x86_64/kernel/apic.c
> > > ===================================================================
> > > --- linux-2.6.20-rc5.orig/arch/x86_64/kernel/apic.c
> > > +++ linux-2.6.20-rc5/arch/x86_64/kernel/apic.c
> > > @@ -764,10 +767,12 @@ static void setup_APIC_timer(unsigned in
> > >  
> > >  	/* wait for irq slice */
> > >   	if (vxtime.hpet_address && hpet_use_timer) {
> > > - 		int trigger = hpet_readl(HPET_T0_CMP);
> > > - 		while (hpet_readl(HPET_COUNTER) >= trigger)
> > > - 			/* do nothing */ ;
> > > - 		while (hpet_readl(HPET_COUNTER) <  trigger)
> > > +		int trigger;
> > > +		do
> > > +			trigger = hpet_readl(HPET_T0_CMP);
> > > +		while (hpet_readl(HPET_COUNTER) >= trigger);
> > > +
> > 
> > Is this signedness-safe and wraparound-safe?  It might be better to make
> > `trigger' unsigned and do
> > 
> > 	while (hpet_readl(HPET_COUNTER) - trigger >= 0)
> 
> 
> Yes, making trigger unsigned is a good idea (although having it
> signed would probably never cause any problem, because this is
> called during boot and it takes ~2 minutes for HPET to overflow
> the s32)
> 
> But no, looping while the unsigned result is >= 0 does not seem
> that good an idea to me ;-)
> 
> An updated patch follows. It is still not wraparound safe (a
> lockup would still happen if it's called ~5 minutes after boot, but
> this should never happen -- it's called early during boot)
> 
> --- linux-2.6.20-rc5.orig/arch/x86_64/kernel/apic.c	2007-02-06 16:56:00.000000000 +0100
> +++ linux-2.6.20-rc5/arch/x86_64/kernel/apic.c	2007-02-06 17:26:42.000000000 +0100
> @@ -764,10 +764,12 @@ static void setup_APIC_timer(unsigned in
>  
>  	/* wait for irq slice */
>   	if (vxtime.hpet_address && hpet_use_timer) {
> - 		int trigger = hpet_readl(HPET_T0_CMP);
> - 		while (hpet_readl(HPET_COUNTER) >= trigger)
> - 			/* do nothing */ ;
> - 		while (hpet_readl(HPET_COUNTER) <  trigger)
> +		u32 trigger;
> +		do
> +			trigger = hpet_readl(HPET_T0_CMP);
> +		while (hpet_readl(HPET_COUNTER) >= trigger);
> +
> +		while (hpet_readl(HPET_COUNTER) <  trigger)
>   			/* do nothing */ ;
>   	} else {
>  		int c1, c2;
> 

Well it still seems a bit dodgy to me - I'd have thought it'd be possible
(and nicer) to come up with a version which is safe to call at any time.

Are you sure this won't cause a kexec'ed kernel to lock up for five
minutes, for example?

Anyway, I'll let Andi worry about this one.  Please send him a signed-off
and fully changelogged patch and still cc myself, thanks.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 1/9] Fix HPET init race
  2007-02-07  0:12       ` Andrew Morton
@ 2007-02-10 12:31         ` Andi Kleen
  2007-07-26 20:58           ` Robin Holt
  0 siblings, 1 reply; 68+ messages in thread
From: Andi Kleen @ 2007-02-10 12:31 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jiri Bohac, linux-kernel, Vojtech Pavlik, ssouhlal, arjan, tglx,
	johnstul, zippel, andrea


> 
> Well it still seems a bit dodgy to me - I'd have thought it'd be possible
> (and nicer) to come up with a version which is safe to call at any time.
> 
> Are you sure this won't cause a kexec'ed kernel to lock up for five
> minutes, for example?
> 
> Anyway, I'll let Andi worry about this one.  Please send him a signed-off
> and fully changelogged patch and still cc myself, thanks.

What is the status on this one? Jiri, do you have an updated patch?
-Andi

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 4/9] Remove the TSC synchronization on SMP machines
  2007-02-02  7:14         ` Andi Kleen
@ 2007-02-13  0:34           ` Christoph Lameter
  2007-02-13  6:40             ` Arjan van de Ven
  0 siblings, 1 reply; 68+ messages in thread
From: Christoph Lameter @ 2007-02-13  0:34 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Vojtech Pavlik, Jiri Bohac, linux-kernel, ssouhlal, arjan, tglx,
	johnstul, zippel, andrea

On Fri, 2 Feb 2007, Andi Kleen wrote:

> I've threatened to just disable RDTSC for ring 3 before, but it'll likely
> never happen because too many programs use it.

Those programs are aware that they are fiddling around with low level 
material but with this patchset we are going to have a non 
monotonic time subsystem? Programs expect gettimeofday() etc. to be
monotonic. It's going to be a big surprise if that is not working anymore.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 4/9] Remove the TSC synchronization on SMP machines
  2007-02-13  0:34           ` Christoph Lameter
@ 2007-02-13  6:40             ` Arjan van de Ven
  2007-02-13  8:28               ` Andi Kleen
  2007-02-13 17:09               ` Christoph Lameter
  0 siblings, 2 replies; 68+ messages in thread
From: Arjan van de Ven @ 2007-02-13  6:40 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andi Kleen, Vojtech Pavlik, Jiri Bohac, linux-kernel, ssouhlal,
	tglx, johnstul, zippel, andrea

On Mon, 2007-02-12 at 16:34 -0800, Christoph Lameter wrote:
> On Fri, 2 Feb 2007, Andi Kleen wrote:
> 
> > I've threatened to just disable RDTSC for ring 3 before, but it'll likely
> > never happen because too many programs use it.
> 
> Those programs are aware that they are fiddling around with low level 
> material but with this patchset we are going to have a non 
> monotonic time subsystem?

No, quite the opposite: gettimeofday() currently is NOT monotonic,
unfortunately. With this patch series it actually has a better chance of
becoming that...

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 4/9] Remove the TSC synchronization on SMP machines
  2007-02-13  6:40             ` Arjan van de Ven
@ 2007-02-13  8:28               ` Andi Kleen
  2007-02-13  8:41                 ` Arjan van de Ven
  2007-02-13 17:09               ` Christoph Lameter
  1 sibling, 1 reply; 68+ messages in thread
From: Andi Kleen @ 2007-02-13  8:28 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Christoph Lameter, Vojtech Pavlik, Jiri Bohac, linux-kernel,
	ssouhlal, tglx, johnstul, zippel, andrea

On Tuesday 13 February 2007 07:40, Arjan van de Ven wrote:
> On Mon, 2007-02-12 at 16:34 -0800, Christoph Lameter wrote:
> > On Fri, 2 Feb 2007, Andi Kleen wrote:
> > 
> > > I've threatened to just disable RDTSC for ring 3 before, but it'll likely
> > > never happen because too many programs use it.
> > 
> > Those programs are aware that they are fiddling around with low level 
> > material but with this patchset we are going to have a non 
> > monotonic time subsystem?
> 
> no quite the opposite. gettimeofday() currently is NOT monotonic
> unfortunately. 

Any time it is non-monotonic, that's a bug. We've had bugs
like this before, but recently we're doing reasonably well. Of course
there can always be improvements, but in general I don't agree
with your statement, sorry. You can usually rely on it being monotonic,
short of the known limitations (e.g. don't run ntpd).

Usually these weren't really classical bugs, but more "hardware does
unexpected things under us". x86 hardware is a moving target, unfortunately.

> With this patchseries it actually has a better chance of
> becoming that...

The ntpd problem is fundamental; nothing will change that. However,
it doesn't seem to be a big one in practice. 

-Andi
 

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 4/9] Remove the TSC synchronization on SMP machines
  2007-02-13  8:28               ` Andi Kleen
@ 2007-02-13  8:41                 ` Arjan van de Ven
  0 siblings, 0 replies; 68+ messages in thread
From: Arjan van de Ven @ 2007-02-13  8:41 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Christoph Lameter, Vojtech Pavlik, Jiri Bohac, linux-kernel,
	ssouhlal, tglx, johnstul, zippel, andrea

On Tue, 2007-02-13 at 09:28 +0100, Andi Kleen wrote:
> On Tuesday 13 February 2007 07:40, Arjan van de Ven wrote:
> > On Mon, 2007-02-12 at 16:34 -0800, Christoph Lameter wrote:
> > > On Fri, 2 Feb 2007, Andi Kleen wrote:
> > > 
> > > > I've threatened to just disable RDTSC for ring 3 before, but it'll likely
> > > > never happen because too many programs use it.
> > > 
> > > Those programs are aware that they are fiddling around with low level 
> > > material but with this patchset we are going to have a non 
> > > monotonic time subsystem?
> > 
> > no quite the opposite. gettimeofday() currently is NOT monotonic
> > unfortunately. 
> 
> Anytime it is non monotonic that's a bug. We've had bugs
> like this before, but recently we're doing reasonably well. Of course
> there can be always improvements, but in general I don't agree
> with your statement, sorry. You can usually rely on it being monotonic,
> short of the known limitations (e.g. don't run ntpd) 

oh I agree it should be monotonic, but I remember an argument I had with
you several weeks ago where you were basically saying the opposite ;)

I'm happy to see gtod become more monotonic/reliable any way we can


-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 4/9] Remove the TSC synchronization on SMP machines
  2007-02-13  6:40             ` Arjan van de Ven
  2007-02-13  8:28               ` Andi Kleen
@ 2007-02-13 17:09               ` Christoph Lameter
  2007-02-13 17:20                 ` Andi Kleen
  1 sibling, 1 reply; 68+ messages in thread
From: Christoph Lameter @ 2007-02-13 17:09 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Andi Kleen, Vojtech Pavlik, Jiri Bohac, linux-kernel, ssouhlal,
	tglx, johnstul, zippel, andrea

On Tue, 13 Feb 2007, Arjan van de Ven wrote:

> no quite the opposite. gettimeofday() currently is NOT monotonic
> unfortunately. With this patchseries it actually has a better chance of
> becoming that...

It is monotonic on IA64 at least, and we have found that subtle application 
bugs occur if it is not. IA64 (and other arches using time interpolation) 
can ensure the monotonicity of time sources. Are you sure about this? I 
wonder why the new time-of-day subsystem does not have that?

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 4/9] Remove the TSC synchronization on SMP machines
  2007-02-13 17:09               ` Christoph Lameter
@ 2007-02-13 17:20                 ` Andi Kleen
  2007-02-13 22:18                   ` Vojtech Pavlik
  2007-02-14  0:18                   ` Paul Mackerras
  0 siblings, 2 replies; 68+ messages in thread
From: Andi Kleen @ 2007-02-13 17:20 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Arjan van de Ven, Vojtech Pavlik, Jiri Bohac, linux-kernel,
	ssouhlal, tglx, johnstul, zippel, andrea

On Tuesday 13 February 2007 18:09, Christoph Lameter wrote:
> On Tue, 13 Feb 2007, Arjan van de Ven wrote:
> 
> > no quite the opposite. gettimeofday() currently is NOT monotonic
> > unfortunately. With this patchseries it actually has a better chance of
> > becoming that...
> 
> It is monotonic on IA64 at least and we have found that subtle application 
> bugs occur if it is not. IA64 (and other arches using time interpolation) 
> can insure the monotoneity of time sources. Are you sure about this? I 
> wonder why the new time of day subsystem does not have that?

Just to avoid spreading misinformation: modulo some new broken hardware
(which we always try to work around when found) i386/x86-64 gettimeofday
is monotonic.  AFAIK on the currently known hardware it should be generally
ok.

However ntpd can always screw you up, but that's inherent in the design.

Safer in general is to use clock_gettime(CLOCK_MONOTONIC, ...) which guarantees
no interference from ntpd

-Andi

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 4/9] Remove the TSC synchronization on SMP machines
  2007-02-13 17:20                 ` Andi Kleen
@ 2007-02-13 22:18                   ` Vojtech Pavlik
  2007-02-13 22:38                     ` Andrea Arcangeli
  2007-02-13 23:55                     ` Christoph Lameter
  2007-02-14  0:18                   ` Paul Mackerras
  1 sibling, 2 replies; 68+ messages in thread
From: Vojtech Pavlik @ 2007-02-13 22:18 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Christoph Lameter, Arjan van de Ven, Jiri Bohac, linux-kernel,
	ssouhlal, tglx, johnstul, zippel, andrea

On Tue, Feb 13, 2007 at 06:20:14PM +0100, Andi Kleen wrote:
> On Tuesday 13 February 2007 18:09, Christoph Lameter wrote:
> > On Tue, 13 Feb 2007, Arjan van de Ven wrote:
> > 
> > > no quite the opposite. gettimeofday() currently is NOT monotonic
> > > unfortunately. With this patchseries it actually has a better chance of
> > > becoming that...
> > 
> > It is monotonic on IA64 at least and we have found that subtle application 
> > bugs occur if it is not. IA64 (and other arches using time interpolation) 
> > can insure the monotoneity of time sources. Are you sure about this? I 
> > wonder why the new time of day subsystem does not have that?
> 
> Just to avoid spreading misinformation: modulo some new broken hardware
> (which we always try to work around when found) i386/x86-64 gettimeofday
> is monotonic.  AFAIK on the currently known hardware it should be generally
> ok.
> 
> However ntpd can always screw you up, but that's inherent in the design.

It's not inherent to ntpd's design, but to the current implementation
of the NTP PLL in the kernel (which may have been fixed since I last
looked).

The interaction with ntpd can be fixed, and I've done it once in the
past, although the fix wasn't all that nice.

> Safer in general is to use clock_gettime(CLOCK_MONOTONIC, ...) which
> guarantees no interference from ntpd

-- 
Vojtech Pavlik
Director SuSE Labs

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 4/9] Remove the TSC synchronization on SMP machines
  2007-02-13 22:18                   ` Vojtech Pavlik
@ 2007-02-13 22:38                     ` Andrea Arcangeli
  2007-02-14  6:59                       ` Vojtech Pavlik
  2007-02-13 23:55                     ` Christoph Lameter
  1 sibling, 1 reply; 68+ messages in thread
From: Andrea Arcangeli @ 2007-02-13 22:38 UTC (permalink / raw)
  To: Vojtech Pavlik
  Cc: Andi Kleen, Christoph Lameter, Arjan van de Ven, Jiri Bohac,
	linux-kernel, ssouhlal, tglx, johnstul, zippel

Hi,

On Tue, Feb 13, 2007 at 11:18:48PM +0100, Vojtech Pavlik wrote:
> It's not inherent to ntpd's design, but the current (which may have been
> fixed since I looked last) implementation of the NTP PLL in the kernel.
> 
> The interaction with ntpd can be fixed and I've done it in the past
> once, although the fix wasn't all that nice.

Yep, it can slowly move towards the correct time, but ntpdate (or more
generally settimeofday) remains a fundamental issue (and I prefer time
skews to be fixed ASAP, not slowly).

If the admin is good, he knows that if he ever runs the DB when the
clock isn't perfectly synchronized with the atomic clock, he risks
screwing up his whole dataset, as the apps won't even handle time going
backwards after a reboot.

I think there should be a limit to how much an app can expect from
gtod before generating failures. Certainly it's always better to write
apps that are robust against a non-monotonic gtod, because eventually
it _can_ happen (either that, or remove the stod syscall ;).

As for ntpdate at boot and ntpd at runtime, not running them isn't
really an option on a server IMHO; think of the liability if the system
time drifts out of sync by a minute and you need to know exactly when
something bad happened.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 4/9] Remove the TSC synchronization on SMP machines
  2007-02-13 22:18                   ` Vojtech Pavlik
  2007-02-13 22:38                     ` Andrea Arcangeli
@ 2007-02-13 23:55                     ` Christoph Lameter
  1 sibling, 0 replies; 68+ messages in thread
From: Christoph Lameter @ 2007-02-13 23:55 UTC (permalink / raw)
  To: Vojtech Pavlik
  Cc: Andi Kleen, Arjan van de Ven, Jiri Bohac, linux-kernel, ssouhlal,
	tglx, johnstul, zippel, andrea

On Tue, 13 Feb 2007, Vojtech Pavlik wrote:

> The interaction with ntpd can be fixed and I've done it in the past
> once, although the fix wasn't all that nice.

It can be, and was, fixed by gradually moving time instead of jumping to 
the new time. E.g. the time interpolator on IA64 gradually adapts the 
intervals so that synchronization is obtained.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 4/9] Remove the TSC synchronization on SMP machines
  2007-02-13 17:20                 ` Andi Kleen
  2007-02-13 22:18                   ` Vojtech Pavlik
@ 2007-02-14  0:18                   ` Paul Mackerras
  2007-02-14  0:25                     ` john stultz
  1 sibling, 1 reply; 68+ messages in thread
From: Paul Mackerras @ 2007-02-14  0:18 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Christoph Lameter, Arjan van de Ven, Vojtech Pavlik, Jiri Bohac,
	linux-kernel, ssouhlal, tglx, johnstul, zippel, andrea

Andi Kleen writes:

> Just to avoid spreading misinformation: modulo some new broken hardware
> (which we always try to work around when found) i386/x86-64 gettimeofday
> is monotonic.  AFAIK on the currently known hardware it should be generally
> ok.
> 
> However ntpd can always screw you up, but that's inherent in the design.

On powerpc we manage to keep gettimeofday monotonic even when ntpd is
adjusting the clock.  We have 3 parameters used to convert a value
from the timebase register to the time of day, and these parameters
are adjusted if necessary at the beginning of each tick, based on the
value returned by current_tick_length().  The point is that
current_tick_length() tells you at the *beginning* of each tick how
much time will be added on to xtime at the *end* of that tick, and
that makes it possible to aim the interpolation to hit the same value
as xtime at the end of each tick.
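
In pseudo-C the aiming works out to something like the sketch below --
not the actual powerpc code, and all names here are hypothetical:

	struct tb_to_ns {
		unsigned long long tb_base;	/* timebase at tick start */
		unsigned long long ns_base;	/* xtime (ns) at tick start */
		unsigned long long scale;	/* ns per tb tick, << 32 */
	};

	/* At the start of each tick, aim the interpolation so that it
	 * reaches ns_base + tick_len_ns exactly when the timebase
	 * reaches tb_base + tb_per_tick (tick_len_ns is what
	 * current_tick_length() promises will be added to xtime). */
	static void retarget(struct tb_to_ns *c, unsigned long long tb_now,
			     unsigned long long tb_per_tick,
			     unsigned long long xtime_ns,
			     unsigned long long tick_len_ns)
	{
		c->tb_base = tb_now;
		c->ns_base = xtime_ns;
		c->scale   = (tick_len_ns << 32) / tb_per_tick;
	}

	static unsigned long long tb_to_ns(const struct tb_to_ns *c,
					   unsigned long long tb)
	{
		return c->ns_base + (((tb - c->tb_base) * c->scale) >> 32);
	}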

Clearly if you make a discrete jump backwards with settimeofday or
adjtime, it's impossible to keep gettimeofday monotonic, but apart
from that it's monotonic on powerpc.

At least, that's the way it's supposed to work.  I hope the recent
timekeeping changes haven't broken it. :)

Paul.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 4/9] Remove the TSC synchronization on SMP machines
  2007-02-14  0:18                   ` Paul Mackerras
@ 2007-02-14  0:25                     ` john stultz
  0 siblings, 0 replies; 68+ messages in thread
From: john stultz @ 2007-02-14  0:25 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Andi Kleen, Christoph Lameter, Arjan van de Ven, Vojtech Pavlik,
	Jiri Bohac, linux-kernel, ssouhlal, tglx, zippel, andrea

On Wed, 2007-02-14 at 11:18 +1100, Paul Mackerras wrote:
> Andi Kleen writes:
> 
> > Just to avoid spreading misinformation: modulo some new broken hardware
> > (which we always try to work around when found) i386/x86-64 gettimeofday
> > is monotonic.  AFAIK on the currently known hardware it should be generally
> > ok.
> > 
> > However ntpd can always screw you up, but that's inherent in the design.
> 
> On powerpc we manage to keep gettimeofday monotonic even when ntpd is
> adjusting the clock.  We have 3 parameters used to convert a value
> from the timebase register to the time of day, and these parameters
> are adjusted if necessary at the beginning of each tick, based on the
> value returned by current_tick_length().  The point is that
> current_tick_length() tells you at the *beginning* of each tick how
> much time will be added on to xtime at the *end* of that tick, and
> that makes it possible to aim the interpolation to hit the same value
> as xtime at the end of each tick.
> 
> Clearly if you make a discrete jump backwards with settimeofday or
> adjtime, it's impossible to keep gettimeofday monotonic, but apart
> from that it's monotonic on powerpc.
> 
> At least, that's the way it's supposed to work.  I hope the recent
> timekeeping changes haven't broken it. :)

No. Just to even further clarify (and since everyone is speaking up),
the generic timekeeping does a similar scaling adjustment of the
clocksource frequency for NTP adjustments made via sys_adjtimex().

I believe Andi was just referring to ntpd calling settimeofday(), which
will cause clock_gettime(CLOCK_REALTIME,...)/gettimeofday() to possibly
jump backwards. This behavior of NTP is of course configurable (see the
-x option, or the "tinker step 0" option combined w/ "disable kernel" in
ntp.conf)
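
(For reference, the slew-only configuration being referred to is roughly
this in ntp.conf, or simply running ntpd with -x:)

	# never step the clock, only slew it, and keep the kernel
	# PLL discipline out of the picture
	tinker step 0
	disable kernel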

-john


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 4/9] Remove the TSC synchronization on SMP machines
  2007-02-13 22:38                     ` Andrea Arcangeli
@ 2007-02-14  6:59                       ` Vojtech Pavlik
  0 siblings, 0 replies; 68+ messages in thread
From: Vojtech Pavlik @ 2007-02-14  6:59 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andi Kleen, Christoph Lameter, Arjan van de Ven, Jiri Bohac,
	linux-kernel, ssouhlal, tglx, johnstul, zippel

On Tue, Feb 13, 2007 at 11:38:33PM +0100, Andrea Arcangeli wrote:
> Hi,
> 
> On Tue, Feb 13, 2007 at 11:18:48PM +0100, Vojtech Pavlik wrote:
> > It's not inherent to ntpd's design, but the current (which may have been
> > fixed since I looked last) implementation of the NTP PLL in the kernel.
> > 
> > The interaction with ntpd can be fixed and I've done it in the past
> > once, although the fix wasn't all that nice.
> 
> Yep, it can slowly move towards the correct time, but ntpdate (or more
> generally settimeofday) remains a fundamental issue (and I prefer time
> skews to be fixed ASAP, not slowly).

Skipping forward is trivial. For going backward, you can stop time (or
make it go forward very slowly). Still the output will be strictly
monotonic (but not more than that).

For small changes you simply change your estimate of the base clock
frequency to be different from what the specs say. Tuning that in a PLL
will get you to sync with true atomic GMT.
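
A minimal sketch of the "never go backwards" part (hypothetical helper,
not kernel code; a real implementation would slew the frequency estimate
rather than freeze the clock):

	static unsigned long long last_reported_ns;

	static unsigned long long report_time_ns(unsigned long long raw_ns)
	{
		if (raw_ns < last_reported_ns)
			raw_ns = last_reported_ns;	/* hold time still */
		last_reported_ns = raw_ns;
		return raw_ns;
	}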

-- 
Vojtech Pavlik
Director SuSE Labs

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [patch 1/9] Fix HPET init race
  2007-02-10 12:31         ` Andi Kleen
@ 2007-07-26 20:58           ` Robin Holt
  0 siblings, 0 replies; 68+ messages in thread
From: Robin Holt @ 2007-07-26 20:58 UTC (permalink / raw)
  To: Andi Kleen, Jiri Bohac
  Cc: Andrew Morton, linux-kernel, Vojtech Pavlik, ssouhlal, arjan,
	tglx, johnstul, zippel, andrea

I have had four separate system lockups attributable to this exact problem
in two days of testing.  Instead of trying to handle all the weird edge
cases and wraparound, how about changing it to look for exactly what we
appear to want.

The following patch removes a couple of races in setup_APIC_timer.  One occurs
when the HPET advances the COUNTER past the T0_CMP value between the time
T0_CMP was originally read and when COUNTER is read.  This results in
a delay waiting for the counter to wrap.  The other results from the counter
wrapping.

This change takes a snapshot of T0_CMP at the beginning of the loop and
simply loops until T0_CMP has changed (a tick has happened).

Signed-off-by: Robin Holt <holt@sgi.com>


Index: apic_fixes/arch/x86_64/kernel/apic.c
===================================================================
--- apic_fixes.orig/arch/x86_64/kernel/apic.c	2007-07-26 15:45:17.000000000 -0500
+++ apic_fixes/arch/x86_64/kernel/apic.c	2007-07-26 15:46:15.000000000 -0500
@@ -791,10 +791,8 @@ static void setup_APIC_timer(unsigned in
 
 	/* wait for irq slice */
 	if (hpet_address && hpet_use_timer) {
-		int trigger = hpet_readl(HPET_T0_CMP);
-		while (hpet_readl(HPET_COUNTER) >= trigger)
-			/* do nothing */ ;
-		while (hpet_readl(HPET_COUNTER) <  trigger)
+		u32 trigger = hpet_readl(HPET_T0_CMP);
+		while (hpet_readl(HPET_T0_CMP) == trigger)
 			/* do nothing */ ;
 	} else {
 		int c1, c2;

^ permalink raw reply	[flat|nested] 68+ messages in thread

end of thread, other threads:[~2007-07-26 20:59 UTC | newest]

Thread overview: 68+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-02-01  9:59 [patch 0/9] x86_64: reliable TSC-based gettimeofday jbohac
2007-02-01  9:59 ` [patch 1/9] Fix HPET init race jbohac
2007-02-02  2:34   ` Andrew Morton
2007-02-06 16:44     ` Jiri Bohac
2007-02-07  0:12       ` Andrew Morton
2007-02-10 12:31         ` Andi Kleen
2007-07-26 20:58           ` Robin Holt
2007-02-01  9:59 ` [patch 2/9] Remove the support for the VXTIME_PMTMR timer mode jbohac
2007-02-01 11:13   ` Andi Kleen
2007-02-01 13:13     ` Jiri Bohac
2007-02-01 13:13       ` Andi Kleen
2007-02-01 13:59         ` Jiri Bohac
2007-02-01 14:18           ` Andi Kleen
2007-02-01  9:59 ` [patch 3/9] Remove the support for the VXTIME_HPET " jbohac
2007-02-01  9:59 ` [patch 4/9] Remove the TSC synchronization on SMP machines jbohac
2007-02-01 11:14   ` Andi Kleen
2007-02-01 13:17     ` Jiri Bohac
2007-02-01 15:16       ` Vojtech Pavlik
2007-02-02  7:14         ` Andi Kleen
2007-02-13  0:34           ` Christoph Lameter
2007-02-13  6:40             ` Arjan van de Ven
2007-02-13  8:28               ` Andi Kleen
2007-02-13  8:41                 ` Arjan van de Ven
2007-02-13 17:09               ` Christoph Lameter
2007-02-13 17:20                 ` Andi Kleen
2007-02-13 22:18                   ` Vojtech Pavlik
2007-02-13 22:38                     ` Andrea Arcangeli
2007-02-14  6:59                       ` Vojtech Pavlik
2007-02-13 23:55                     ` Christoph Lameter
2007-02-14  0:18                   ` Paul Mackerras
2007-02-14  0:25                     ` john stultz
2007-02-02  7:13       ` Andi Kleen
2007-02-01 21:05     ` mbligh
2007-02-03  1:16   ` H. Peter Anvin
2007-02-01  9:59 ` [patch 5/9] Add all the necessary structures to the vsyscall page jbohac
2007-02-01 11:17   ` Andi Kleen
2007-02-01  9:59 ` [patch 6/9] Add the "Master Timer" jbohac
2007-02-01 11:22   ` Andi Kleen
2007-02-01 13:29     ` Jiri Bohac
2007-02-01  9:59 ` [patch 7/9] Adapt the time initialization code jbohac
2007-02-01 11:26   ` Andi Kleen
2007-02-01 13:41     ` Jiri Bohac
2007-02-01 10:00 ` [patch 8/9] Add time_update_mt_guess() jbohac
2007-02-01 11:28   ` Andi Kleen
2007-02-01 13:54     ` Jiri Bohac
2007-02-01 10:00 ` [patch 9/9] Make use of the Master Timer jbohac
2007-02-01 11:36   ` Andi Kleen
2007-02-01 14:29     ` Jiri Bohac
2007-02-01 15:23       ` Vojtech Pavlik
2007-02-02  7:05         ` Andi Kleen
2007-02-02  7:04       ` Andi Kleen
2007-02-01 11:20 ` [patch 0/9] x86_64: reliable TSC-based gettimeofday Andi Kleen
2007-02-01 11:53   ` Andrea Arcangeli
2007-02-01 12:02     ` Andi Kleen
2007-02-01 12:54       ` Andrea Arcangeli
2007-02-01 12:17   ` Ingo Molnar
2007-02-01 14:52   ` Jiri Bohac
2007-02-01 16:56     ` john stultz
2007-02-01 19:41       ` Vojtech Pavlik
2007-02-01 11:34 ` Ingo Molnar
2007-02-01 11:46 ` [-mm patch] x86_64 GTOD: offer scalable vgettimeofday Ingo Molnar
2007-02-01 12:01   ` Andi Kleen
2007-02-01 12:14     ` Ingo Molnar
2007-02-01 12:17   ` [-mm patch] x86_64 GTOD: offer scalable vgettimeofday II Andi Kleen
2007-02-01 12:24     ` Ingo Molnar
2007-02-01 12:45       ` Andi Kleen
2007-02-02  4:22 ` [patch 0/9] x86_64: reliable TSC-based gettimeofday Andrew Morton
2007-02-02  7:07   ` Andi Kleen
