linux-kernel.vger.kernel.org archive mirror
* [patch 00/21] hrtimer - High-resolution timer subsystem
@ 2005-12-06  0:01 tglx
  2005-12-06  0:01 ` [patch 01/21] Move div_long_long_rem out of jiffies.h tglx
                   ` (21 more replies)
  0 siblings, 22 replies; 74+ messages in thread
From: tglx @ 2005-12-06  0:01 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm, rostedt, johnstul, zippel, mingo

This is a major rework of the former ktimer subsystem. It replaces the 
ktimer patch series and drops the ktimeout series completely.

A broken out series is available from
http://www.tglx.de/projects/ktimers/patches-2.6.15-rc5-hrtimer.tar.bz2

1. Naming

After the extensive discussions on LKML, Andrew Morton suggested 
"hrtimer" and we picked it up. While the hrtimer subsystem does not 
offer high-resolution clock sources just yet, the subsystem can be 
easily extended with high-resolution clock capabilities. The rework of 
the ktimer-hrt patches is the next step.

2. More simplifications

We worked through the subsystem and its users and further reduced the 
implementation to the basic required infrastructure and generally 
streamlined it. (We did this with easy extensibility for the high 
resolution clock support still in mind, so we kept some small extras 
around.)

The new .text overhead (on x86) we believe speaks for itself:

    text    data     bss     dec     hex filename
 2468380  547212  155164 3170756  3061c4 vmlinux-2.6.15-rc2
 2469996  548016  155164 3173176  306b38 vmlinux-ktimer-rc5-mm1
 2468164  547508  155100 3170772  3061d4 vmlinux-hrtimer

While it was +1616 bytes before, it's -216 bytes now. This also gives a 
new justification for hrtimers: it reduces .text overhead ;-) [ There's 
still some .data overhead, but it's acceptable at 0.1%.]

On 64-bit platforms such as x64 there are even more .text savings:

    text    data     bss     dec     hex filename
 3853431  914316  403880 5171627  4ee9ab vmlinux-x64-2.6.15-rc5
 3852407  914548  403752 5170707  4ee613 vmlinux-x64-hrtimer

(due to the compactness of 64-bit ktime_t ops)

Other 32-bit platforms (arm, ppc) have a much smaller .text 
hrtimers footprint now too.

3. Fixes

The last splitup of ktimers resulted in a bug in the overrun accounting.  
This bug is now fixed and the code verified for correctness.

4. Rounding

We looked at the runtime behaviour of vanilla, ktimers and ptimers to 
figure out the consequences for applications in a more detailed way.

The rounding of time values and intervals leads to rather unpredictable
results which deviate significantly from the current mainline
implementation and introduce unpredictable behaviour vs. the timeline.

After reading the Posix specification again, we came to the conclusion 
that it is possible to do no rounding at all for the ktime_t values, and 
to still ensure that the timer is not delivered early.

".. and that timers must wait for the next clock tick after the 
theoretical expiration time, to ensure that a timer never returns too 
soon. Note also that the granularity of the clock may be significantly 
coarser than the resolution of the data format used to set and get time 
and interval values. Also note that some implementations may choose to 
adjust time and/or interval values to exactly match the ticks of the 
underlying clock."

Which allows the already discussed part of the spec to be interpreted 
differently:

"Time values that are between two consecutive non-negative integer 
multiples of the resolution of the specified timer shall be rounded up 
to the larger multiple of the resolution. Quantization error shall not 
cause the timer to expire earlier than the rounded time value."

In other words, the time value, i.e. the expiry time itself, must be
rounded to the next clock tick to ensure that a timer never expires
early.
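
The "never early" rule can be illustrated with a small user-space C
sketch. It is not taken from the hrtimer code: the helper name
first_tick_at_or_after() and the nanosecond units are assumptions made
for illustration only. The stored expiry value stays unrounded; only
the delivery point is moved to a tick boundary.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch series): return the first
 * clock tick at or after expiry_ns, for a clock that ticks every res_ns
 * nanoseconds starting at 0.
 */
static uint64_t first_tick_at_or_after(uint64_t expiry_ns, uint64_t res_ns)
{
	uint64_t rem;

	assert(res_ns > 0);

	rem = expiry_ns % res_ns;
	if (rem == 0)
		return expiry_ns;		/* already on a tick boundary */

	return expiry_ns + (res_ns - rem);	/* move to the next tick */
}
```

With a 1 ms tick, an expiry of 2.5 ms would be delivered at the 3 ms
tick, while an expiry of exactly 3 ms stays at 3 ms. The result is never
smaller than the requested expiry.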

	Thomas, Ingo

--



* [patch 01/21] Move div_long_long_rem out of jiffies.h
  2005-12-06  0:01 [patch 00/21] hrtimer - High-resolution timer subsystem tglx
@ 2005-12-06  0:01 ` tglx
  2005-12-06  0:01 ` [patch 02/21] Remove duplicate div_long_long_rem implementation tglx
                   ` (20 subsequent siblings)
  21 siblings, 0 replies; 74+ messages in thread
From: tglx @ 2005-12-06  0:01 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm, rostedt, johnstul, zippel, mingo

[-- Attachment #1: move-div-long-long-rem-out-of-jiffiesh.patch --]
[-- Type: text/plain, Size: 2794 bytes --]


- move div_long_long_rem() from jiffies.h into a new calc64.h include file,
  as it is a general math function useful for other things than the jiffy
  code. Convert it to an inline function
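
  The semantics described in the patch below can be tried out in user
  space with a rough sketch like the following. It replaces the kernel's
  do_div() with plain C division, so it only mirrors the behaviour and
  is not the kernel implementation; the sketch_ names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * User-space stand-in for do_div_llr(): returns the quotient and stores
 * the remainder. The caller's dividend is passed by value and therefore
 * left intact, unlike with the do_div() macro.
 */
static unsigned long sketch_div_llr(long long dividend, long divisor,
				    long *remainder)
{
	uint64_t result = (uint64_t)dividend;

	assert(dividend >= 0 && divisor > 0);

	*remainder = (long)(result % (uint64_t)divisor);
	return (unsigned long)(result / (uint64_t)divisor);
}

/* Sign-aware variant: divide the magnitude, then restore the sign. */
static long sketch_div_llr_signed(long long dividend, long divisor,
				  long *remainder)
{
	long res;

	/* -LLONG_MIN would overflow, so this sketch rejects it. */
	assert(divisor > 0 && dividend > LLONG_MIN);

	if (dividend < 0) {
		res = -(long)sketch_div_llr(-dividend, divisor, remainder);
		*remainder = -(*remainder);
	} else {
		res = (long)sketch_div_llr(dividend, divisor, remainder);
	}

	return res;
}
```

  For example, dividing 2500000000 ns by 1000000000 yields quotient 2
  and remainder 500000000; for -2500000000 both come back negated.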

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

 include/linux/calc64.h  |   50 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/jiffies.h |   11 ----------
 2 files changed, 51 insertions(+), 10 deletions(-)

Index: linux-2.6.15-rc5/include/linux/calc64.h
===================================================================
--- /dev/null
+++ linux-2.6.15-rc5/include/linux/calc64.h
@@ -0,0 +1,49 @@
+#ifndef _LINUX_CALC64_H
+#define _LINUX_CALC64_H
+
+#include <linux/types.h>
+#include <asm/div64.h>
+
+/*
+ * This is a generic macro which is used when the architecture
+ * specific div64.h does not provide an optimized one.
+ *
+ * The 64bit dividend is divided by the divisor (data type long), the
+ * result is returned and the remainder stored in the variable
+ * referenced by remainder (data type long *). In contrast to the
+ * do_div macro the dividend is kept intact.
+ */
+#ifndef div_long_long_rem
+#define div_long_long_rem(dividend, divisor, remainder)	\
+	do_div_llr((dividend), divisor, remainder)
+
+static inline unsigned long do_div_llr(const long long dividend,
+				       const long divisor, long *remainder)
+{
+	u64 result = dividend;
+
+	*(remainder) = do_div(result, divisor);
+	return (unsigned long) result;
+}
+#endif
+
+/*
+ * Sign aware variation of the above. On some architectures a
+ * negative dividend leads to a divide overflow exception, which
+ * is avoided by the sign check.
+ */
+static inline long div_long_long_rem_signed(const long long dividend,
+					    const long divisor, long *remainder)
+{
+	long res;
+
+	if (unlikely(dividend < 0)) {
+		res = -div_long_long_rem(-dividend, divisor, remainder);
+		*remainder = -(*remainder);
+	} else
+		res = div_long_long_rem(dividend, divisor, remainder);
+
+	return res;
+}
+
+#endif
Index: linux-2.6.15-rc5/include/linux/jiffies.h
===================================================================
--- linux-2.6.15-rc5.orig/include/linux/jiffies.h
+++ linux-2.6.15-rc5/include/linux/jiffies.h
@@ -1,21 +1,12 @@
 #ifndef _LINUX_JIFFIES_H
 #define _LINUX_JIFFIES_H
 
+#include <linux/calc64.h>
 #include <linux/kernel.h>
 #include <linux/types.h>
 #include <linux/time.h>
 #include <linux/timex.h>
 #include <asm/param.h>			/* for HZ */
-#include <asm/div64.h>
-
-#ifndef div_long_long_rem
-#define div_long_long_rem(dividend,divisor,remainder) \
-({							\
-	u64 result = dividend;				\
-	*remainder = do_div(result,divisor);		\
-	result;						\
-})
-#endif
 
 /*
  * The following defines establish the engineering parameters of the PLL

--



* [patch 02/21] Remove duplicate div_long_long_rem implementation
  2005-12-06  0:01 [patch 00/21] hrtimer - High-resolution timer subsystem tglx
  2005-12-06  0:01 ` [patch 01/21] Move div_long_long_rem out of jiffies.h tglx
@ 2005-12-06  0:01 ` tglx
  2005-12-06  0:01 ` [patch 03/21] Deinline mktime and set_normalized_timespec tglx
                   ` (19 subsequent siblings)
  21 siblings, 0 replies; 74+ messages in thread
From: tglx @ 2005-12-06  0:01 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm, rostedt, johnstul, zippel, mingo

[-- Attachment #1: remove-div-long-long-rem-duplicate.patch --]
[-- Type: text/plain, Size: 1079 bytes --]


- make posix-timers.c use the generic calc64.h facility

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

 kernel/posix-timers.c |   10 +---------
 1 files changed, 1 insertion(+), 9 deletions(-)

Index: linux-2.6.15-rc5/kernel/posix-timers.c
===================================================================
--- linux-2.6.15-rc5.orig/kernel/posix-timers.c
+++ linux-2.6.15-rc5/kernel/posix-timers.c
@@ -35,6 +35,7 @@
 #include <linux/interrupt.h>
 #include <linux/slab.h>
 #include <linux/time.h>
+#include <linux/calc64.h>
 
 #include <asm/uaccess.h>
 #include <asm/semaphore.h>
@@ -48,15 +49,6 @@
 #include <linux/workqueue.h>
 #include <linux/module.h>
 
-#ifndef div_long_long_rem
-#include <asm/div64.h>
-
-#define div_long_long_rem(dividend,divisor,remainder) ({ \
-		       u64 result = dividend;		\
-		       *remainder = do_div(result,divisor); \
-		       result; })
-
-#endif
 #define CLOCK_REALTIME_RES TICK_NSEC  /* In nano seconds. */
 
 static inline u64  mpy_l_X_l_ll(unsigned long mpy1,unsigned long mpy2)

--



* [patch 03/21] Deinline mktime and set_normalized_timespec
  2005-12-06  0:01 [patch 00/21] hrtimer - High-resolution timer subsystem tglx
  2005-12-06  0:01 ` [patch 01/21] Move div_long_long_rem out of jiffies.h tglx
  2005-12-06  0:01 ` [patch 02/21] Remove duplicate div_long_long_rem implementation tglx
@ 2005-12-06  0:01 ` tglx
  2005-12-06  0:01 ` [patch 04/21] Clean up mktime and make arguments const tglx
                   ` (18 subsequent siblings)
  21 siblings, 0 replies; 74+ messages in thread
From: tglx @ 2005-12-06  0:01 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm, rostedt, johnstul, zippel, mingo

[-- Attachment #1: deinline-mktime-set-normalized-timespec.patch --]
[-- Type: text/plain, Size: 5237 bytes --]


- mktime() and set_normalized_timespec() are large inline functions used
  in many places: deinline them.
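
  For readers who want to experiment with the two functions outside the
  kernel, here is a user-space sketch of the same arithmetic. The
  sketch_ names, the struct and the use of 64-bit types are assumptions
  of this example (the kernel versions use unsigned long and struct
  timespec), so the 32-bit overflow noted in the kernel comment does not
  apply to it.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NSEC_PER_SEC 1000000000L

/* Hypothetical stand-in for struct timespec. */
struct sketch_timespec {
	long long tv_sec;
	long tv_nsec;
};

/* Gregorian date -> seconds since 1970-01-01 00:00:00. */
static uint64_t sketch_mktime(unsigned int year, unsigned int mon,
			      unsigned int day, unsigned int hour,
			      unsigned int min, unsigned int sec)
{
	assert(mon >= 1 && mon <= 12);

	/* 1..12 -> 11,12,1..10: puts Feb last since it has the leap day */
	if (mon <= 2) {
		mon += 10;
		year -= 1;
	} else {
		mon -= 2;
	}

	return ((((uint64_t)(year / 4 - year / 100 + year / 400 +
			     367 * mon / 12 + day) +
		  (uint64_t)year * 365 - 719499
		 ) * 24 + hour	/* now have hours */
		) * 60 + min	/* now have minutes */
	       ) * 60 + sec;	/* finally seconds */
}

/* Store sec/nsec so that 0 <= tv_nsec < SKETCH_NSEC_PER_SEC holds. */
static void sketch_set_normalized(struct sketch_timespec *ts,
				  long long sec, long nsec)
{
	assert(ts);

	while (nsec >= SKETCH_NSEC_PER_SEC) {
		nsec -= SKETCH_NSEC_PER_SEC;
		++sec;
	}
	while (nsec < 0) {
		nsec += SKETCH_NSEC_PER_SEC;
		--sec;
	}
	ts->tv_sec = sec;
	ts->tv_nsec = nsec;
}
```

  The date from the kernel comment, 1980-12-31 23:59:59, maps to
  347155199 seconds, and normalizing (0 s, -1 ns) gives tv_sec = -1 with
  tv_nsec = 999999999, i.e. only the seconds field carries the sign.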

From: George Anzinger, off-by-1 bugfix

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

 include/linux/time.h |   52 ++++---------------------------------------
 kernel/time.c        |   61 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 66 insertions(+), 47 deletions(-)

Index: linux-2.6.15-rc5/include/linux/time.h
===================================================================
--- linux-2.6.15-rc5.orig/include/linux/time.h
+++ linux-2.6.15-rc5/include/linux/time.h
@@ -38,38 +38,9 @@ static __inline__ int timespec_equal(str
 	return (a->tv_sec == b->tv_sec) && (a->tv_nsec == b->tv_nsec);
 } 
 
-/* Converts Gregorian date to seconds since 1970-01-01 00:00:00.
- * Assumes input in normal date format, i.e. 1980-12-31 23:59:59
- * => year=1980, mon=12, day=31, hour=23, min=59, sec=59.
- *
- * [For the Julian calendar (which was used in Russia before 1917,
- * Britain & colonies before 1752, anywhere else before 1582,
- * and is still in use by some communities) leave out the
- * -year/100+year/400 terms, and add 10.]
- *
- * This algorithm was first published by Gauss (I think).
- *
- * WARNING: this function will overflow on 2106-02-07 06:28:16 on
- * machines were long is 32-bit! (However, as time_t is signed, we
- * will already get problems at other places on 2038-01-19 03:14:08)
- */
-static inline unsigned long
-mktime (unsigned int year, unsigned int mon,
-	unsigned int day, unsigned int hour,
-	unsigned int min, unsigned int sec)
-{
-	if (0 >= (int) (mon -= 2)) {	/* 1..12 -> 11,12,1..10 */
-		mon += 12;		/* Puts Feb last since it has leap day */
-		year -= 1;
-	}
-
-	return (((
-		(unsigned long) (year/4 - year/100 + year/400 + 367*mon/12 + day) +
-			year*365 - 719499
-	    )*24 + hour /* now have hours */
-	  )*60 + min /* now have minutes */
-	)*60 + sec; /* finally seconds */
-}
+extern unsigned long mktime (unsigned int year, unsigned int mon,
+			     unsigned int day, unsigned int hour,
+			     unsigned int min, unsigned int sec);
 
 extern struct timespec xtime;
 extern struct timespec wall_to_monotonic;
@@ -80,6 +51,8 @@ static inline unsigned long get_seconds(
 	return xtime.tv_sec;
 }
 
+extern void set_normalized_timespec (struct timespec *ts, time_t sec, long nsec);
+
 struct timespec current_kernel_time(void);
 
 #define CURRENT_TIME (current_kernel_time())
@@ -98,21 +71,6 @@ extern void getnstimeofday (struct times
 
 extern struct timespec timespec_trunc(struct timespec t, unsigned gran);
 
-static inline void
-set_normalized_timespec (struct timespec *ts, time_t sec, long nsec)
-{
-	while (nsec >= NSEC_PER_SEC) {
-		nsec -= NSEC_PER_SEC;
-		++sec;
-	}
-	while (nsec < 0) {
-		nsec += NSEC_PER_SEC;
-		--sec;
-	}
-	ts->tv_sec = sec;
-	ts->tv_nsec = nsec;
-}
-
 #endif /* __KERNEL__ */
 
 #define NFDBITS			__NFDBITS
Index: linux-2.6.15-rc5/kernel/time.c
===================================================================
--- linux-2.6.15-rc5.orig/kernel/time.c
+++ linux-2.6.15-rc5/kernel/time.c
@@ -561,6 +561,67 @@ void getnstimeofday(struct timespec *tv)
 EXPORT_SYMBOL_GPL(getnstimeofday);
 #endif
 
+/* Converts Gregorian date to seconds since 1970-01-01 00:00:00.
+ * Assumes input in normal date format, i.e. 1980-12-31 23:59:59
+ * => year=1980, mon=12, day=31, hour=23, min=59, sec=59.
+ *
+ * [For the Julian calendar (which was used in Russia before 1917,
+ * Britain & colonies before 1752, anywhere else before 1582,
+ * and is still in use by some communities) leave out the
+ * -year/100+year/400 terms, and add 10.]
+ *
+ * This algorithm was first published by Gauss (I think).
+ *
+ * WARNING: this function will overflow on 2106-02-07 06:28:16 on
+ * machines were long is 32-bit! (However, as time_t is signed, we
+ * will already get problems at other places on 2038-01-19 03:14:08)
+ */
+unsigned long
+mktime (unsigned int year, unsigned int mon,
+	unsigned int day, unsigned int hour,
+	unsigned int min, unsigned int sec)
+{
+	if (0 >= (int) (mon -= 2)) {	/* 1..12 -> 11,12,1..10 */
+		mon += 12;		/* Puts Feb last since it has leap day */
+		year -= 1;
+	}
+
+	return ((((unsigned long)
+		  (year/4 - year/100 + year/400 + 367*mon/12 + day) +
+		  year*365 - 719499
+	    )*24 + hour /* now have hours */
+	  )*60 + min /* now have minutes */
+	)*60 + sec; /* finally seconds */
+}
+
+/**
+ * set_normalized_timespec - set timespec sec and nsec parts and normalize
+ *
+ * @ts:		pointer to timespec variable to be set
+ * @sec:	seconds to set
+ * @nsec:	nanoseconds to set
+ *
+ * Set seconds and nanoseconds field of a timespec variable and
+ * normalize to the timespec storage format
+ *
+ * Note: The tv_nsec part is always in the range of
+ * 	0 <= tv_nsec < NSEC_PER_SEC
+ * For negative values only the tv_sec field is negative !
+ */
+void set_normalized_timespec (struct timespec *ts, time_t sec, long nsec)
+{
+	while (nsec >= NSEC_PER_SEC) {
+		nsec -= NSEC_PER_SEC;
+		++sec;
+	}
+	while (nsec < 0) {
+		nsec += NSEC_PER_SEC;
+		--sec;
+	}
+	ts->tv_sec = sec;
+	ts->tv_nsec = nsec;
+}
+
 #if (BITS_PER_LONG < 64)
 u64 get_jiffies_64(void)
 {

--



* [patch 04/21] Clean up mktime and make arguments const
  2005-12-06  0:01 [patch 00/21] hrtimer - High-resolution timer subsystem tglx
                   ` (2 preceding siblings ...)
  2005-12-06  0:01 ` [patch 03/21] Deinline mktime and set_normalized_timespec tglx
@ 2005-12-06  0:01 ` tglx
  2005-12-06  0:01 ` [patch 05/21] Export deinlined mktime tglx
                   ` (17 subsequent siblings)
  21 siblings, 0 replies; 74+ messages in thread
From: tglx @ 2005-12-06  0:01 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm, rostedt, johnstul, zippel, mingo

[-- Attachment #1: mktime-set-normalized-timespec-const.patch --]
[-- Type: text/plain, Size: 2695 bytes --]


- add 'const' to mktime arguments, and clean it up a bit

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

 include/linux/time.h |   10 +++++-----
 kernel/time.c        |   15 +++++++++------
 2 files changed, 14 insertions(+), 11 deletions(-)

Index: linux-2.6.15-rc5/include/linux/time.h
===================================================================
--- linux-2.6.15-rc5.orig/include/linux/time.h
+++ linux-2.6.15-rc5/include/linux/time.h
@@ -38,9 +38,11 @@ static __inline__ int timespec_equal(str
 	return (a->tv_sec == b->tv_sec) && (a->tv_nsec == b->tv_nsec);
 } 
 
-extern unsigned long mktime (unsigned int year, unsigned int mon,
-			     unsigned int day, unsigned int hour,
-			     unsigned int min, unsigned int sec);
+extern unsigned long mktime(const unsigned int year, const unsigned int mon,
+			    const unsigned int day, const unsigned int hour,
+			    const unsigned int min, const unsigned int sec);
+
+extern void set_normalized_timespec(struct timespec *ts, time_t sec, long nsec);
 
 extern struct timespec xtime;
 extern struct timespec wall_to_monotonic;
@@ -51,8 +53,6 @@ static inline unsigned long get_seconds(
 	return xtime.tv_sec;
 }
 
-extern void set_normalized_timespec (struct timespec *ts, time_t sec, long nsec);
-
 struct timespec current_kernel_time(void);
 
 #define CURRENT_TIME (current_kernel_time())
Index: linux-2.6.15-rc5/kernel/time.c
===================================================================
--- linux-2.6.15-rc5.orig/kernel/time.c
+++ linux-2.6.15-rc5/kernel/time.c
@@ -577,12 +577,15 @@ EXPORT_SYMBOL_GPL(getnstimeofday);
  * will already get problems at other places on 2038-01-19 03:14:08)
  */
 unsigned long
-mktime (unsigned int year, unsigned int mon,
-	unsigned int day, unsigned int hour,
-	unsigned int min, unsigned int sec)
+mktime(const unsigned int year0, const unsigned int mon0,
+       const unsigned int day, const unsigned int hour,
+       const unsigned int min, const unsigned int sec)
 {
-	if (0 >= (int) (mon -= 2)) {	/* 1..12 -> 11,12,1..10 */
-		mon += 12;		/* Puts Feb last since it has leap day */
+	unsigned int mon = mon0, year = year0;
+
+	/* 1..12 -> 11,12,1..10 */
+	if (0 >= (int) (mon -= 2)) {
+		mon += 12;	/* Puts Feb last since it has leap day */
 		year -= 1;
 	}
 
@@ -608,7 +611,7 @@ mktime (unsigned int year, unsigned int 
  * 	0 <= tv_nsec < NSEC_PER_SEC
  * For negative values only the tv_sec field is negative !
  */
-void set_normalized_timespec (struct timespec *ts, time_t sec, long nsec)
+void set_normalized_timespec(struct timespec *ts, time_t sec, long nsec)
 {
 	while (nsec >= NSEC_PER_SEC) {
 		nsec -= NSEC_PER_SEC;

--



* [patch 05/21] Export deinlined mktime
  2005-12-06  0:01 [patch 00/21] hrtimer - High-resolution timer subsystem tglx
                   ` (3 preceding siblings ...)
  2005-12-06  0:01 ` [patch 04/21] Clean up mktime and make arguments const tglx
@ 2005-12-06  0:01 ` tglx
  2005-12-06  0:01 ` [patch 06/21] Remove unused clock constants tglx
                   ` (16 subsequent siblings)
  21 siblings, 0 replies; 74+ messages in thread
From: tglx @ 2005-12-06  0:01 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm, rostedt, johnstul, zippel, mingo

[-- Attachment #1: deinline-mktime-export.patch --]
[-- Type: text/plain, Size: 770 bytes --]


From: Andrew Morton <akpm@osdl.org>

This is now uninlined, but some modules use it.

Make it a non-GPL export, since the inlined mktime() was also available that
way.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

 kernel/time.c |    2 ++
 1 files changed, 2 insertions(+)

Index: linux-2.6.15-rc5/kernel/time.c
===================================================================
--- linux-2.6.15-rc5.orig/kernel/time.c
+++ linux-2.6.15-rc5/kernel/time.c
@@ -597,6 +597,8 @@ mktime(const unsigned int year0, const u
 	)*60 + sec; /* finally seconds */
 }
 
+EXPORT_SYMBOL(mktime);
+
 /**
  * set_normalized_timespec - set timespec sec and nsec parts and normalize
  *

--



* [patch 06/21] Remove unused clock constants
  2005-12-06  0:01 [patch 00/21] hrtimer - High-resolution timer subsystem tglx
                   ` (4 preceding siblings ...)
  2005-12-06  0:01 ` [patch 05/21] Export deinlined mktime tglx
@ 2005-12-06  0:01 ` tglx
  2005-12-06  0:01 ` [patch 07/21] Coding style clean up of " tglx
                   ` (15 subsequent siblings)
  21 siblings, 0 replies; 74+ messages in thread
From: tglx @ 2005-12-06  0:01 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm, rostedt, johnstul, zippel, mingo

[-- Attachment #1: time-h-remove-unused-clock-constants.patch --]
[-- Type: text/plain, Size: 1297 bytes --]


- remove unused CLOCK_ constants from time.h
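
  As a worked check of what the removal does to the derived masks, the
  sketch below evaluates the old and new definitions using the numeric
  values visible in the diff. The SK_ names and helper functions are
  hypothetical and exist only for this example.

```c
#include <assert.h>

/* Clock IDs as they appear in the diff, renamed to avoid clashes. */
enum {
	SK_CLOCK_REALTIME	= 0,
	SK_CLOCK_MONOTONIC	= 1,
	SK_CLOCK_REALTIME_HR	= 4,	/* removed by the patch */
	SK_CLOCK_MONOTONIC_HR	= 5	/* removed by the patch */
};

/* CLOCKS_MASK before the patch: 0 | 1 | 4 | 5 */
static int sk_old_clocks_mask(void)
{
	return SK_CLOCK_REALTIME | SK_CLOCK_MONOTONIC |
	       SK_CLOCK_REALTIME_HR | SK_CLOCK_MONOTONIC_HR;
}

/* CLOCKS_MASK after the patch: 0 | 1 */
static int sk_new_clocks_mask(void)
{
	return SK_CLOCK_REALTIME | SK_CLOCK_MONOTONIC;
}

/* CLOCKS_MONO before the patch: 1 & 5 */
static int sk_old_clocks_mono(void)
{
	return SK_CLOCK_MONOTONIC & SK_CLOCK_MONOTONIC_HR;
}

/* CLOCKS_MONO after the patch: just CLOCK_MONOTONIC */
static int sk_new_clocks_mono(void)
{
	return SK_CLOCK_MONOTONIC;
}

/* Self-check of the arithmetic above. */
static void sk_check_masks(void)
{
	assert(sk_old_clocks_mask() == 5);
	assert(sk_new_clocks_mask() == 1);
	assert(sk_old_clocks_mono() == sk_new_clocks_mono());
}
```

  Under these values CLOCKS_MASK changes from 5 to 1, while CLOCKS_MONO
  evaluates to 1 both before and after.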

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

 include/linux/time.h |   11 ++++-------
 1 files changed, 4 insertions(+), 7 deletions(-)

Index: linux-2.6.15-rc5/include/linux/time.h
===================================================================
--- linux-2.6.15-rc5.orig/include/linux/time.h
+++ linux-2.6.15-rc5/include/linux/time.h
@@ -103,12 +103,10 @@ struct	itimerval {
 /*
  * The IDs of the various system clocks (for POSIX.1b interval timers).
  */
-#define CLOCK_REALTIME		  0
-#define CLOCK_MONOTONIC	  1
+#define CLOCK_REALTIME		 0
+#define CLOCK_MONOTONIC	  	 1
 #define CLOCK_PROCESS_CPUTIME_ID 2
 #define CLOCK_THREAD_CPUTIME_ID	 3
-#define CLOCK_REALTIME_HR	 4
-#define CLOCK_MONOTONIC_HR	  5
 
 /*
  * The IDs of various hardware clocks
@@ -117,9 +115,8 @@ struct	itimerval {
 
 #define CLOCK_SGI_CYCLE 10
 #define MAX_CLOCKS 16
-#define CLOCKS_MASK  (CLOCK_REALTIME | CLOCK_MONOTONIC | \
-                     CLOCK_REALTIME_HR | CLOCK_MONOTONIC_HR)
-#define CLOCKS_MONO (CLOCK_MONOTONIC & CLOCK_MONOTONIC_HR)
+#define CLOCKS_MASK  (CLOCK_REALTIME | CLOCK_MONOTONIC)
+#define CLOCKS_MONO (CLOCK_MONOTONIC)
 
 /*
  * The various flags for setting POSIX.1b interval timers.

--



* [patch 07/21] Coding style clean up of clock constants
  2005-12-06  0:01 [patch 00/21] hrtimer - High-resolution timer subsystem tglx
                   ` (5 preceding siblings ...)
  2005-12-06  0:01 ` [patch 06/21] Remove unused clock constants tglx
@ 2005-12-06  0:01 ` tglx
  2005-12-06  0:01 ` [patch 08/21] Coding style and white space cleanup tglx
                   ` (14 subsequent siblings)
  21 siblings, 0 replies; 74+ messages in thread
From: tglx @ 2005-12-06  0:01 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm, rostedt, johnstul, zippel, mingo

[-- Attachment #1: time-h-clean-up-clock-constants.patch --]
[-- Type: text/plain, Size: 1383 bytes --]


- clean up the CLOCK_ portions of time.h

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

 include/linux/time.h |   23 +++++++++--------------
 1 files changed, 9 insertions(+), 14 deletions(-)

Index: linux-2.6.15-rc5/include/linux/time.h
===================================================================
--- linux-2.6.15-rc5.orig/include/linux/time.h
+++ linux-2.6.15-rc5/include/linux/time.h
@@ -99,30 +99,25 @@ struct	itimerval {
 	struct	timeval it_value;	/* current value */
 };
 
-
 /*
  * The IDs of the various system clocks (for POSIX.1b interval timers).
  */
-#define CLOCK_REALTIME		 0
-#define CLOCK_MONOTONIC	  	 1
-#define CLOCK_PROCESS_CPUTIME_ID 2
-#define CLOCK_THREAD_CPUTIME_ID	 3
+#define CLOCK_REALTIME			0
+#define CLOCK_MONOTONIC			1
+#define CLOCK_PROCESS_CPUTIME_ID	2
+#define CLOCK_THREAD_CPUTIME_ID		3
 
 /*
  * The IDs of various hardware clocks
  */
-
-
-#define CLOCK_SGI_CYCLE 10
-#define MAX_CLOCKS 16
-#define CLOCKS_MASK  (CLOCK_REALTIME | CLOCK_MONOTONIC)
-#define CLOCKS_MONO (CLOCK_MONOTONIC)
+#define CLOCK_SGI_CYCLE			10
+#define MAX_CLOCKS			16
+#define CLOCKS_MASK			(CLOCK_REALTIME | CLOCK_MONOTONIC)
+#define CLOCKS_MONO			CLOCK_MONOTONIC
 
 /*
  * The various flags for setting POSIX.1b interval timers.
  */
-
-#define TIMER_ABSTIME 0x01
-
+#define TIMER_ABSTIME			0x01
 
 #endif

--



* [patch 08/21] Coding style and white space cleanup
  2005-12-06  0:01 [patch 00/21] hrtimer - High-resolution timer subsystem tglx
                   ` (6 preceding siblings ...)
  2005-12-06  0:01 ` [patch 07/21] Coding style clean up of " tglx
@ 2005-12-06  0:01 ` tglx
  2005-12-06  0:01 ` [patch 09/21] Make clockid_t arguments const tglx
                   ` (13 subsequent siblings)
  21 siblings, 0 replies; 74+ messages in thread
From: tglx @ 2005-12-06  0:01 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm, rostedt, johnstul, zippel, mingo

[-- Attachment #1: time-h-clean-up-rest.patch --]
[-- Type: text/plain, Size: 4579 bytes --]


- style and whitespace cleanup of the rest of time.h.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

 include/linux/time.h |   63 +++++++++++++++++++++++++--------------------------
 1 files changed, 32 insertions(+), 31 deletions(-)

Index: linux-2.6.15-rc5/include/linux/time.h
===================================================================
--- linux-2.6.15-rc5.orig/include/linux/time.h
+++ linux-2.6.15-rc5/include/linux/time.h
@@ -4,7 +4,7 @@
 #include <linux/types.h>
 
 #ifdef __KERNEL__
-#include <linux/seqlock.h>
+# include <linux/seqlock.h>
 #endif
 
 #ifndef _STRUCT_TIMESPEC
@@ -13,7 +13,7 @@ struct timespec {
 	time_t	tv_sec;		/* seconds */
 	long	tv_nsec;	/* nanoseconds */
 };
-#endif /* _STRUCT_TIMESPEC */
+#endif
 
 struct timeval {
 	time_t		tv_sec;		/* seconds */
@@ -27,16 +27,16 @@ struct timezone {
 
 #ifdef __KERNEL__
 
-/* Parameters used to convert the timespec values */
-#define MSEC_PER_SEC (1000L)
-#define USEC_PER_SEC (1000000L)
-#define NSEC_PER_SEC (1000000000L)
-#define NSEC_PER_USEC (1000L)
+/* Parameters used to convert the timespec values: */
+#define MSEC_PER_SEC		1000L
+#define USEC_PER_SEC		1000000L
+#define NSEC_PER_SEC		1000000000L
+#define NSEC_PER_USEC		1000L
 
-static __inline__ int timespec_equal(struct timespec *a, struct timespec *b) 
-{ 
+static __inline__ int timespec_equal(struct timespec *a, struct timespec *b)
+{
 	return (a->tv_sec == b->tv_sec) && (a->tv_nsec == b->tv_nsec);
-} 
+}
 
 extern unsigned long mktime(const unsigned int year, const unsigned int mon,
 			    const unsigned int day, const unsigned int hour,
@@ -49,25 +49,26 @@ extern struct timespec wall_to_monotonic
 extern seqlock_t xtime_lock;
 
 static inline unsigned long get_seconds(void)
-{ 
+{
 	return xtime.tv_sec;
 }
 
 struct timespec current_kernel_time(void);
 
-#define CURRENT_TIME (current_kernel_time())
-#define CURRENT_TIME_SEC ((struct timespec) { xtime.tv_sec, 0 })
+#define CURRENT_TIME		(current_kernel_time())
+#define CURRENT_TIME_SEC	((struct timespec) { xtime.tv_sec, 0 })
 
 extern void do_gettimeofday(struct timeval *tv);
 extern int do_settimeofday(struct timespec *tv);
 extern int do_sys_settimeofday(struct timespec *tv, struct timezone *tz);
-extern void clock_was_set(void); // call when ever the clock is set
+extern void clock_was_set(void); // call whenever the clock is set
 extern int do_posix_clock_monotonic_gettime(struct timespec *tp);
-extern long do_utimes(char __user * filename, struct timeval * times);
+extern long do_utimes(char __user *filename, struct timeval *times);
 struct itimerval;
-extern int do_setitimer(int which, struct itimerval *value, struct itimerval *ovalue);
+extern int do_setitimer(int which, struct itimerval *value,
+			struct itimerval *ovalue);
 extern int do_getitimer(int which, struct itimerval *value);
-extern void getnstimeofday (struct timespec *tv);
+extern void getnstimeofday(struct timespec *tv);
 
 extern struct timespec timespec_trunc(struct timespec t, unsigned gran);
 
@@ -83,24 +84,24 @@ extern struct timespec timespec_trunc(st
 
 /*
  * Names of the interval timers, and structure
- * defining a timer setting.
+ * defining a timer setting:
  */
-#define	ITIMER_REAL	0
-#define	ITIMER_VIRTUAL	1
-#define	ITIMER_PROF	2
-
-struct  itimerspec {
-        struct  timespec it_interval;    /* timer period */
-        struct  timespec it_value;       /* timer expiration */
+#define	ITIMER_REAL		0
+#define	ITIMER_VIRTUAL		1
+#define	ITIMER_PROF		2
+
+struct itimerspec {
+	struct timespec it_interval;	/* timer period */
+	struct timespec it_value;	/* timer expiration */
 };
 
-struct	itimerval {
-	struct	timeval it_interval;	/* timer interval */
-	struct	timeval it_value;	/* current value */
+struct itimerval {
+	struct timeval it_interval;	/* timer interval */
+	struct timeval it_value;	/* current value */
 };
 
 /*
- * The IDs of the various system clocks (for POSIX.1b interval timers).
+ * The IDs of the various system clocks (for POSIX.1b interval timers):
  */
 #define CLOCK_REALTIME			0
 #define CLOCK_MONOTONIC			1
@@ -108,7 +109,7 @@ struct	itimerval {
 #define CLOCK_THREAD_CPUTIME_ID		3
 
 /*
- * The IDs of various hardware clocks
+ * The IDs of various hardware clocks:
  */
 #define CLOCK_SGI_CYCLE			10
 #define MAX_CLOCKS			16
@@ -116,7 +117,7 @@ struct	itimerval {
 #define CLOCKS_MONO			CLOCK_MONOTONIC
 
 /*
- * The various flags for setting POSIX.1b interval timers.
+ * The various flags for setting POSIX.1b interval timers:
  */
 #define TIMER_ABSTIME			0x01
 

--



* [patch 09/21] Make clockid_t arguments const
  2005-12-06  0:01 [patch 00/21] hrtimer - High-resolution timer subsystem tglx
                   ` (7 preceding siblings ...)
  2005-12-06  0:01 ` [patch 08/21] Coding style and white space cleanup tglx
@ 2005-12-06  0:01 ` tglx
  2005-12-06  0:01 ` [patch 10/21] Coding style and white space cleanup tglx
                   ` (12 subsequent siblings)
  21 siblings, 0 replies; 74+ messages in thread
From: tglx @ 2005-12-06  0:01 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm, rostedt, johnstul, zippel, mingo

[-- Attachment #1: posix-timer-const-overhaul.patch --]
[-- Type: text/plain, Size: 14194 bytes --]


- add const arguments to the posix-timers.h API functions

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

 include/linux/posix-timers.h |   22 +++++++++++-----------
 kernel/posix-cpu-timers.c    |   40 ++++++++++++++++++++++------------------
 kernel/posix-timers.c        |   38 +++++++++++++++++++++-----------------
 3 files changed, 54 insertions(+), 46 deletions(-)

Index: linux-2.6.15-rc5/include/linux/posix-timers.h
===================================================================
--- linux-2.6.15-rc5.orig/include/linux/posix-timers.h
+++ linux-2.6.15-rc5/include/linux/posix-timers.h
@@ -72,12 +72,12 @@ struct k_clock_abs {
 };
 struct k_clock {
 	int res;		/* in nano seconds */
-	int (*clock_getres) (clockid_t which_clock, struct timespec *tp);
+	int (*clock_getres) (const clockid_t which_clock, struct timespec *tp);
 	struct k_clock_abs *abs_struct;
-	int (*clock_set) (clockid_t which_clock, struct timespec * tp);
-	int (*clock_get) (clockid_t which_clock, struct timespec * tp);
+	int (*clock_set) (const clockid_t which_clock, struct timespec * tp);
+	int (*clock_get) (const clockid_t which_clock, struct timespec * tp);
 	int (*timer_create) (struct k_itimer *timer);
-	int (*nsleep) (clockid_t which_clock, int flags, struct timespec *);
+	int (*nsleep) (const clockid_t which_clock, int flags, struct timespec *);
 	int (*timer_set) (struct k_itimer * timr, int flags,
 			  struct itimerspec * new_setting,
 			  struct itimerspec * old_setting);
@@ -87,12 +87,12 @@ struct k_clock {
 			   struct itimerspec * cur_setting);
 };
 
-void register_posix_clock(clockid_t clock_id, struct k_clock *new_clock);
+void register_posix_clock(const clockid_t clock_id, struct k_clock *new_clock);
 
 /* Error handlers for timer_create, nanosleep and settime */
 int do_posix_clock_notimer_create(struct k_itimer *timer);
-int do_posix_clock_nonanosleep(clockid_t, int flags, struct timespec *);
-int do_posix_clock_nosettime(clockid_t, struct timespec *tp);
+int do_posix_clock_nonanosleep(const clockid_t, int flags, struct timespec *);
+int do_posix_clock_nosettime(const clockid_t, struct timespec *tp);
 
 /* function to call to trigger timer event */
 int posix_timer_event(struct k_itimer *timr, int si_private);
@@ -117,11 +117,11 @@ struct now_struct {
               }								\
             }while (0)
 
-int posix_cpu_clock_getres(clockid_t which_clock, struct timespec *);
-int posix_cpu_clock_get(clockid_t which_clock, struct timespec *);
-int posix_cpu_clock_set(clockid_t which_clock, const struct timespec *tp);
+int posix_cpu_clock_getres(const clockid_t which_clock, struct timespec *);
+int posix_cpu_clock_get(const clockid_t which_clock, struct timespec *);
+int posix_cpu_clock_set(const clockid_t which_clock, const struct timespec *tp);
 int posix_cpu_timer_create(struct k_itimer *);
-int posix_cpu_nsleep(clockid_t, int, struct timespec *);
+int posix_cpu_nsleep(const clockid_t, int, struct timespec *);
 int posix_cpu_timer_set(struct k_itimer *, int,
 			struct itimerspec *, struct itimerspec *);
 int posix_cpu_timer_del(struct k_itimer *);
Index: linux-2.6.15-rc5/kernel/posix-cpu-timers.c
===================================================================
--- linux-2.6.15-rc5.orig/kernel/posix-cpu-timers.c
+++ linux-2.6.15-rc5/kernel/posix-cpu-timers.c
@@ -7,7 +7,7 @@
 #include <asm/uaccess.h>
 #include <linux/errno.h>
 
-static int check_clock(clockid_t which_clock)
+static int check_clock(const clockid_t which_clock)
 {
 	int error = 0;
 	struct task_struct *p;
@@ -31,7 +31,7 @@ static int check_clock(clockid_t which_c
 }
 
 static inline union cpu_time_count
-timespec_to_sample(clockid_t which_clock, const struct timespec *tp)
+timespec_to_sample(const clockid_t which_clock, const struct timespec *tp)
 {
 	union cpu_time_count ret;
 	ret.sched = 0;		/* high half always zero when .cpu used */
@@ -43,7 +43,7 @@ timespec_to_sample(clockid_t which_clock
 	return ret;
 }
 
-static void sample_to_timespec(clockid_t which_clock,
+static void sample_to_timespec(const clockid_t which_clock,
 			       union cpu_time_count cpu,
 			       struct timespec *tp)
 {
@@ -55,7 +55,7 @@ static void sample_to_timespec(clockid_t
 	}
 }
 
-static inline int cpu_time_before(clockid_t which_clock,
+static inline int cpu_time_before(const clockid_t which_clock,
 				  union cpu_time_count now,
 				  union cpu_time_count then)
 {
@@ -65,7 +65,7 @@ static inline int cpu_time_before(clocki
 		return cputime_lt(now.cpu, then.cpu);
 	}
 }
-static inline void cpu_time_add(clockid_t which_clock,
+static inline void cpu_time_add(const clockid_t which_clock,
 				union cpu_time_count *acc,
 			        union cpu_time_count val)
 {
@@ -75,7 +75,7 @@ static inline void cpu_time_add(clockid_
 		acc->cpu = cputime_add(acc->cpu, val.cpu);
 	}
 }
-static inline union cpu_time_count cpu_time_sub(clockid_t which_clock,
+static inline union cpu_time_count cpu_time_sub(const clockid_t which_clock,
 						union cpu_time_count a,
 						union cpu_time_count b)
 {
@@ -151,7 +151,7 @@ static inline unsigned long long sched_n
 	return (p == current) ? current_sched_time(p) : p->sched_time;
 }
 
-int posix_cpu_clock_getres(clockid_t which_clock, struct timespec *tp)
+int posix_cpu_clock_getres(const clockid_t which_clock, struct timespec *tp)
 {
 	int error = check_clock(which_clock);
 	if (!error) {
@@ -169,7 +169,7 @@ int posix_cpu_clock_getres(clockid_t whi
 	return error;
 }
 
-int posix_cpu_clock_set(clockid_t which_clock, const struct timespec *tp)
+int posix_cpu_clock_set(const clockid_t which_clock, const struct timespec *tp)
 {
 	/*
 	 * You can never reset a CPU clock, but we check for other errors
@@ -186,7 +186,7 @@ int posix_cpu_clock_set(clockid_t which_
 /*
  * Sample a per-thread clock for the given task.
  */
-static int cpu_clock_sample(clockid_t which_clock, struct task_struct *p,
+static int cpu_clock_sample(const clockid_t which_clock, struct task_struct *p,
 			    union cpu_time_count *cpu)
 {
 	switch (CPUCLOCK_WHICH(which_clock)) {
@@ -259,7 +259,7 @@ static int cpu_clock_sample_group_locked
  * Sample a process (thread group) clock for the given group_leader task.
  * Must be called with tasklist_lock held for reading.
  */
-static int cpu_clock_sample_group(clockid_t which_clock,
+static int cpu_clock_sample_group(const clockid_t which_clock,
 				  struct task_struct *p,
 				  union cpu_time_count *cpu)
 {
@@ -273,7 +273,7 @@ static int cpu_clock_sample_group(clocki
 }
 
 
-int posix_cpu_clock_get(clockid_t which_clock, struct timespec *tp)
+int posix_cpu_clock_get(const clockid_t which_clock, struct timespec *tp)
 {
 	const pid_t pid = CPUCLOCK_PID(which_clock);
 	int error = -EINVAL;
@@ -1410,7 +1410,7 @@ void set_process_cpu_timer(struct task_s
 
 static long posix_cpu_clock_nanosleep_restart(struct restart_block *);
 
-int posix_cpu_nsleep(clockid_t which_clock, int flags,
+int posix_cpu_nsleep(const clockid_t which_clock, int flags,
 		     struct timespec *rqtp)
 {
 	struct restart_block *restart_block =
@@ -1514,11 +1514,13 @@ posix_cpu_clock_nanosleep_restart(struct
 #define PROCESS_CLOCK	MAKE_PROCESS_CPUCLOCK(0, CPUCLOCK_SCHED)
 #define THREAD_CLOCK	MAKE_THREAD_CPUCLOCK(0, CPUCLOCK_SCHED)
 
-static int process_cpu_clock_getres(clockid_t which_clock, struct timespec *tp)
+static int process_cpu_clock_getres(const clockid_t which_clock,
+				    struct timespec *tp)
 {
 	return posix_cpu_clock_getres(PROCESS_CLOCK, tp);
 }
-static int process_cpu_clock_get(clockid_t which_clock, struct timespec *tp)
+static int process_cpu_clock_get(const clockid_t which_clock,
+				 struct timespec *tp)
 {
 	return posix_cpu_clock_get(PROCESS_CLOCK, tp);
 }
@@ -1527,16 +1529,18 @@ static int process_cpu_timer_create(stru
 	timer->it_clock = PROCESS_CLOCK;
 	return posix_cpu_timer_create(timer);
 }
-static int process_cpu_nsleep(clockid_t which_clock, int flags,
+static int process_cpu_nsleep(const clockid_t which_clock, int flags,
 			      struct timespec *rqtp)
 {
 	return posix_cpu_nsleep(PROCESS_CLOCK, flags, rqtp);
 }
-static int thread_cpu_clock_getres(clockid_t which_clock, struct timespec *tp)
+static int thread_cpu_clock_getres(const clockid_t which_clock,
+				   struct timespec *tp)
 {
 	return posix_cpu_clock_getres(THREAD_CLOCK, tp);
 }
-static int thread_cpu_clock_get(clockid_t which_clock, struct timespec *tp)
+static int thread_cpu_clock_get(const clockid_t which_clock,
+				struct timespec *tp)
 {
 	return posix_cpu_clock_get(THREAD_CLOCK, tp);
 }
@@ -1545,7 +1549,7 @@ static int thread_cpu_timer_create(struc
 	timer->it_clock = THREAD_CLOCK;
 	return posix_cpu_timer_create(timer);
 }
-static int thread_cpu_nsleep(clockid_t which_clock, int flags,
+static int thread_cpu_nsleep(const clockid_t which_clock, int flags,
 			      struct timespec *rqtp)
 {
 	return -EINVAL;
Index: linux-2.6.15-rc5/kernel/posix-timers.c
===================================================================
--- linux-2.6.15-rc5.orig/kernel/posix-timers.c
+++ linux-2.6.15-rc5/kernel/posix-timers.c
@@ -151,7 +151,7 @@ static void posix_timer_fn(unsigned long
 static u64 do_posix_clock_monotonic_gettime_parts(
 	struct timespec *tp, struct timespec *mo);
 int do_posix_clock_monotonic_gettime(struct timespec *tp);
-static int do_posix_clock_monotonic_get(clockid_t, struct timespec *tp);
+static int do_posix_clock_monotonic_get(const clockid_t, struct timespec *tp);
 
 static struct k_itimer *lock_timer(timer_t timer_id, unsigned long *flags);
 
@@ -176,7 +176,7 @@ static inline void unlock_timer(struct k
  * the function pointer CALL in struct k_clock.
  */
 
-static inline int common_clock_getres(clockid_t which_clock,
+static inline int common_clock_getres(const clockid_t which_clock,
 				      struct timespec *tp)
 {
 	tp->tv_sec = 0;
@@ -184,13 +184,15 @@ static inline int common_clock_getres(cl
 	return 0;
 }
 
-static inline int common_clock_get(clockid_t which_clock, struct timespec *tp)
+static inline int common_clock_get(const clockid_t which_clock,
+				   struct timespec *tp)
 {
 	getnstimeofday(tp);
 	return 0;
 }
 
-static inline int common_clock_set(clockid_t which_clock, struct timespec *tp)
+static inline int common_clock_set(const clockid_t which_clock,
+				   struct timespec *tp)
 {
 	return do_sys_settimeofday(tp, NULL);
 }
@@ -207,7 +209,7 @@ static inline int common_timer_create(st
 /*
  * These ones are defined below.
  */
-static int common_nsleep(clockid_t, int flags, struct timespec *t);
+static int common_nsleep(const clockid_t, int flags, struct timespec *t);
 static void common_timer_get(struct k_itimer *, struct itimerspec *);
 static int common_timer_set(struct k_itimer *, int,
 			    struct itimerspec *, struct itimerspec *);
@@ -216,7 +218,7 @@ static int common_timer_del(struct k_iti
 /*
  * Return nonzero iff we know a priori this clockid_t value is bogus.
  */
-static inline int invalid_clockid(clockid_t which_clock)
+static inline int invalid_clockid(const clockid_t which_clock)
 {
 	if (which_clock < 0)	/* CPU clock, posix_cpu_* will check it */
 		return 0;
@@ -522,7 +524,7 @@ static inline struct task_struct * good_
 	return rtn;
 }
 
-void register_posix_clock(clockid_t clock_id, struct k_clock *new_clock)
+void register_posix_clock(const clockid_t clock_id, struct k_clock *new_clock)
 {
 	if ((unsigned) clock_id >= MAX_CLOCKS) {
 		printk("POSIX clock register failed for clock_id %d\n",
@@ -568,7 +570,7 @@ static void release_posix_timer(struct k
 /* Create a POSIX.1b interval timer. */
 
 asmlinkage long
-sys_timer_create(clockid_t which_clock,
+sys_timer_create(const clockid_t which_clock,
 		 struct sigevent __user *timer_event_spec,
 		 timer_t __user * created_timer_id)
 {
@@ -1195,7 +1197,8 @@ static u64 do_posix_clock_monotonic_gett
 	return jiff;
 }
 
-static int do_posix_clock_monotonic_get(clockid_t clock, struct timespec *tp)
+static int do_posix_clock_monotonic_get(const clockid_t clock,
+					struct timespec *tp)
 {
 	struct timespec wall_to_mono;
 
@@ -1212,7 +1215,7 @@ int do_posix_clock_monotonic_gettime(str
 	return do_posix_clock_monotonic_get(CLOCK_MONOTONIC, tp);
 }
 
-int do_posix_clock_nosettime(clockid_t clockid, struct timespec *tp)
+int do_posix_clock_nosettime(const clockid_t clockid, struct timespec *tp)
 {
 	return -EINVAL;
 }
@@ -1224,7 +1227,8 @@ int do_posix_clock_notimer_create(struct
 }
 EXPORT_SYMBOL_GPL(do_posix_clock_notimer_create);
 
-int do_posix_clock_nonanosleep(clockid_t clock, int flags, struct timespec *t)
+int do_posix_clock_nonanosleep(const clockid_t clock, int flags,
+			       struct timespec *t)
 {
 #ifndef ENOTSUP
 	return -EOPNOTSUPP;	/* aka ENOTSUP in userland for POSIX */
@@ -1234,8 +1238,8 @@ int do_posix_clock_nonanosleep(clockid_t
 }
 EXPORT_SYMBOL_GPL(do_posix_clock_nonanosleep);
 
-asmlinkage long
-sys_clock_settime(clockid_t which_clock, const struct timespec __user *tp)
+asmlinkage long sys_clock_settime(const clockid_t which_clock,
+				  const struct timespec __user *tp)
 {
 	struct timespec new_tp;
 
@@ -1248,7 +1252,7 @@ sys_clock_settime(clockid_t which_clock,
 }
 
 asmlinkage long
-sys_clock_gettime(clockid_t which_clock, struct timespec __user *tp)
+sys_clock_gettime(const clockid_t which_clock, struct timespec __user *tp)
 {
 	struct timespec kernel_tp;
 	int error;
@@ -1265,7 +1269,7 @@ sys_clock_gettime(clockid_t which_clock,
 }
 
 asmlinkage long
-sys_clock_getres(clockid_t which_clock, struct timespec __user *tp)
+sys_clock_getres(const clockid_t which_clock, struct timespec __user *tp)
 {
 	struct timespec rtn_tp;
 	int error;
@@ -1387,7 +1391,7 @@ void clock_was_set(void)
 long clock_nanosleep_restart(struct restart_block *restart_block);
 
 asmlinkage long
-sys_clock_nanosleep(clockid_t which_clock, int flags,
+sys_clock_nanosleep(const clockid_t which_clock, int flags,
 		    const struct timespec __user *rqtp,
 		    struct timespec __user *rmtp)
 {
@@ -1419,7 +1423,7 @@ sys_clock_nanosleep(clockid_t which_cloc
 }
 
 
-static int common_nsleep(clockid_t which_clock,
+static int common_nsleep(const clockid_t which_clock,
 			 int flags, struct timespec *tsave)
 {
 	struct timespec t, dum;

--



* [patch 10/21] Coding style and white space cleanup
  2005-12-06  0:01 [patch 00/21] hrtimer - High-resolution timer subsystem tglx
                   ` (8 preceding siblings ...)
  2005-12-06  0:01 ` [patch 09/21] Make clockid_t arguments const tglx
@ 2005-12-06  0:01 ` tglx
  2005-12-06  0:01 ` [patch 11/21] Create and use timespec_valid macro tglx
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 74+ messages in thread
From: tglx @ 2005-12-06  0:01 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm, rostedt, johnstul, zippel, mingo

[-- Attachment #1: posix-timer-h-cleanup.patch --]
[-- Type: text/plain, Size: 5632 bytes --]


- style/whitespace/macro cleanups of posix-timers.h

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

 include/linux/posix-timers.h |   78 +++++++++++++++++++++++--------------------
 1 files changed, 43 insertions(+), 35 deletions(-)

Index: linux-2.6.15-rc5/include/linux/posix-timers.h
===================================================================
--- linux-2.6.15-rc5.orig/include/linux/posix-timers.h
+++ linux-2.6.15-rc5/include/linux/posix-timers.h
@@ -42,7 +42,7 @@ struct k_itimer {
 	timer_t it_id;			/* timer id */
 	int it_overrun;			/* overrun on pending signal  */
 	int it_overrun_last;		/* overrun on last delivered signal */
-	int it_requeue_pending;         /* waiting to requeue this timer */
+	int it_requeue_pending;		/* waiting to requeue this timer */
 #define REQUEUE_PENDING 1
 	int it_sigev_notify;		/* notify word of sigevent struct */
 	int it_sigev_signo;		/* signo word of sigevent struct */
@@ -52,8 +52,10 @@ struct k_itimer {
 	union {
 		struct {
 			struct timer_list timer;
-			struct list_head abs_timer_entry; /* clock abs_timer_list */
-			struct timespec wall_to_prev;   /* wall_to_monotonic used when set */
+			/* clock abs_timer_list: */
+			struct list_head abs_timer_entry;
+			/* wall_to_monotonic used when set: */
+			struct timespec wall_to_prev;
 			unsigned long incr; /* interval in jiffies */
 		} real;
 		struct cpu_timer_list cpu;
@@ -70,14 +72,16 @@ struct k_clock_abs {
 	struct list_head list;
 	spinlock_t lock;
 };
+
 struct k_clock {
-	int res;		/* in nano seconds */
+	int res;		/* in nanoseconds */
 	int (*clock_getres) (const clockid_t which_clock, struct timespec *tp);
 	struct k_clock_abs *abs_struct;
 	int (*clock_set) (const clockid_t which_clock, struct timespec * tp);
 	int (*clock_get) (const clockid_t which_clock, struct timespec * tp);
 	int (*timer_create) (struct k_itimer *timer);
-	int (*nsleep) (const clockid_t which_clock, int flags, struct timespec *);
+	int (*nsleep) (const clockid_t which_clock, int flags,
+		       struct timespec *);
 	int (*timer_set) (struct k_itimer * timr, int flags,
 			  struct itimerspec * new_setting,
 			  struct itimerspec * old_setting);
@@ -89,7 +93,7 @@ struct k_clock {
 
 void register_posix_clock(const clockid_t clock_id, struct k_clock *new_clock);
 
-/* Error handlers for timer_create, nanosleep and settime */
+/* error handlers for timer_create, nanosleep and settime */
 int do_posix_clock_notimer_create(struct k_itimer *timer);
 int do_posix_clock_nonanosleep(const clockid_t, int flags, struct timespec *);
 int do_posix_clock_nosettime(const clockid_t, struct timespec *tp);
@@ -101,39 +105,43 @@ struct now_struct {
 	unsigned long jiffies;
 };
 
-#define posix_get_now(now) (now)->jiffies = jiffies;
+#define posix_get_now(now) \
+	do { (now)->jiffies = jiffies; } while (0)
+
 #define posix_time_before(timer, now) \
                       time_before((timer)->expires, (now)->jiffies)
 
 #define posix_bump_timer(timr, now)					\
-         do {								\
-              long delta, orun;						\
-	      delta = now.jiffies - (timr)->it.real.timer.expires;	\
-              if (delta >= 0) {						\
-	           orun = 1 + (delta / (timr)->it.real.incr);		\
-	          (timr)->it.real.timer.expires +=			\
-			 orun * (timr)->it.real.incr;			\
-                  (timr)->it_overrun += orun;				\
-              }								\
-            }while (0)
-
-int posix_cpu_clock_getres(const clockid_t which_clock, struct timespec *);
-int posix_cpu_clock_get(const clockid_t which_clock, struct timespec *);
-int posix_cpu_clock_set(const clockid_t which_clock, const struct timespec *tp);
-int posix_cpu_timer_create(struct k_itimer *);
-int posix_cpu_nsleep(const clockid_t, int, struct timespec *);
-int posix_cpu_timer_set(struct k_itimer *, int,
-			struct itimerspec *, struct itimerspec *);
-int posix_cpu_timer_del(struct k_itimer *);
-void posix_cpu_timer_get(struct k_itimer *, struct itimerspec *);
-
-void posix_cpu_timer_schedule(struct k_itimer *);
-
-void run_posix_cpu_timers(struct task_struct *);
-void posix_cpu_timers_exit(struct task_struct *);
-void posix_cpu_timers_exit_group(struct task_struct *);
+	do {								\
+		long delta, orun;					\
+									\
+		delta = (now).jiffies - (timr)->it.real.timer.expires;	\
+		if (delta >= 0) {					\
+			orun = 1 + (delta / (timr)->it.real.incr);	\
+			(timr)->it.real.timer.expires +=		\
+				orun * (timr)->it.real.incr;		\
+			(timr)->it_overrun += orun;			\
+		}							\
+	} while (0)
+
+int posix_cpu_clock_getres(const clockid_t which_clock, struct timespec *ts);
+int posix_cpu_clock_get(const clockid_t which_clock, struct timespec *ts);
+int posix_cpu_clock_set(const clockid_t which_clock, const struct timespec *ts);
+int posix_cpu_timer_create(struct k_itimer *timer);
+int posix_cpu_nsleep(const clockid_t which_clock, int flags,
+		     struct timespec *ts);
+int posix_cpu_timer_set(struct k_itimer *timer, int flags,
+			struct itimerspec *new, struct itimerspec *old);
+int posix_cpu_timer_del(struct k_itimer *timer);
+void posix_cpu_timer_get(struct k_itimer *timer, struct itimerspec *itp);
+
+void posix_cpu_timer_schedule(struct k_itimer *timer);
+
+void run_posix_cpu_timers(struct task_struct *task);
+void posix_cpu_timers_exit(struct task_struct *task);
+void posix_cpu_timers_exit_group(struct task_struct *task);
 
-void set_process_cpu_timer(struct task_struct *, unsigned int,
-			   cputime_t *, cputime_t *);
+void set_process_cpu_timer(struct task_struct *task, unsigned int clock_idx,
+			   cputime_t *newval, cputime_t *oldval);
 
 #endif

--



* [patch 11/21] Create and use timespec_valid macro
  2005-12-06  0:01 [patch 00/21] hrtimer - High-resolution timer subsystem tglx
                   ` (9 preceding siblings ...)
  2005-12-06  0:01 ` [patch 10/21] Coding style and white space cleanup tglx
@ 2005-12-06  0:01 ` tglx
  2005-12-06  0:01 ` [patch 12/21] Validate timespec of do_sys_settimeofday tglx
                   ` (10 subsequent siblings)
  21 siblings, 0 replies; 74+ messages in thread
From: tglx @ 2005-12-06  0:01 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm, rostedt, johnstul, zippel, mingo

[-- Attachment #1: introduce-timespec-valid.patch --]
[-- Type: text/plain, Size: 1608 bytes --]


- add timespec_valid(ts) [returns false if the timespec is denormalized]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

 include/linux/time.h  |    6 ++++++
 kernel/posix-timers.c |    5 ++---
 2 files changed, 8 insertions(+), 3 deletions(-)

Index: linux-2.6.15-rc5/include/linux/time.h
===================================================================
--- linux-2.6.15-rc5.orig/include/linux/time.h
+++ linux-2.6.15-rc5/include/linux/time.h
@@ -44,6 +44,12 @@ extern unsigned long mktime(const unsign
 
 extern void set_normalized_timespec(struct timespec *ts, time_t sec, long nsec);
 
+/*
+ * Returns true if the timespec is normalized, false if denormalized:
+ */
+#define timespec_valid(ts) \
+	(((ts)->tv_sec >= 0) && (((unsigned) (ts)->tv_nsec) < NSEC_PER_SEC))
+
 extern struct timespec xtime;
 extern struct timespec wall_to_monotonic;
 extern seqlock_t xtime_lock;
Index: linux-2.6.15-rc5/kernel/posix-timers.c
===================================================================
--- linux-2.6.15-rc5.orig/kernel/posix-timers.c
+++ linux-2.6.15-rc5/kernel/posix-timers.c
@@ -712,8 +712,7 @@ out:
  */
 static int good_timespec(const struct timespec *ts)
 {
-	if ((!ts) || (ts->tv_sec < 0) ||
-			((unsigned) ts->tv_nsec >= NSEC_PER_SEC))
+	if ((!ts) || !timespec_valid(ts))
 		return 0;
 	return 1;
 }
@@ -1406,7 +1405,7 @@ sys_clock_nanosleep(const clockid_t whic
 	if (copy_from_user(&t, rqtp, sizeof (struct timespec)))
 		return -EFAULT;
 
-	if ((unsigned) t.tv_nsec >= NSEC_PER_SEC || t.tv_sec < 0)
+	if (!timespec_valid(&t))
 		return -EINVAL;
 
 	/*

--



* [patch 12/21] Validate timespec of do_sys_settimeofday
  2005-12-06  0:01 [patch 00/21] hrtimer - High-resolution timer subsystem tglx
                   ` (10 preceding siblings ...)
  2005-12-06  0:01 ` [patch 11/21] Create and use timespec_valid macro tglx
@ 2005-12-06  0:01 ` tglx
  2005-12-06  0:01 ` [patch 13/21] Introduce nsec_t type and conversion functions tglx
                   ` (9 subsequent siblings)
  21 siblings, 0 replies; 74+ messages in thread
From: tglx @ 2005-12-06  0:01 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm, rostedt, johnstul, zippel, mingo

[-- Attachment #1: sys-settimeofday-check-timespec.patch --]
[-- Type: text/plain, Size: 642 bytes --]


- Check that the timespec provided from user space is normalized
  and return -EINVAL if it is not.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

 kernel/time.c |    3 +++
 1 files changed, 3 insertions(+)

Index: linux-2.6.15-rc5/kernel/time.c
===================================================================
--- linux-2.6.15-rc5.orig/kernel/time.c
+++ linux-2.6.15-rc5/kernel/time.c
@@ -154,6 +154,9 @@ int do_sys_settimeofday(struct timespec 
 	static int firsttime = 1;
 	int error = 0;
 
+	if (!timespec_valid(tv))
+		return -EINVAL;
+
 	error = security_settime(tv, tz);
 	if (error)
 		return error;

--



* [patch 13/21] Introduce nsec_t type and conversion functions
  2005-12-06  0:01 [patch 00/21] hrtimer - High-resolution timer subsystem tglx
                   ` (11 preceding siblings ...)
  2005-12-06  0:01 ` [patch 12/21] Validate timespec of do_sys_settimeofday tglx
@ 2005-12-06  0:01 ` tglx
  2005-12-06  0:01 ` [patch 14/21] Introduce ktime_t time format tglx
                   ` (8 subsequent siblings)
  21 siblings, 0 replies; 74+ messages in thread
From: tglx @ 2005-12-06  0:01 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm, rostedt, johnstul, zippel, mingo

[-- Attachment #1: nsec-t.patch --]
[-- Type: text/plain, Size: 3542 bytes --]


- introduce the nsec_t type
- basic nsec conversion routines: timespec_to_ns(), timeval_to_ns(),
  ns_to_timespec(), ns_to_timeval().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

 include/linux/time.h |   47 +++++++++++++++++++++++++++++++++++++++++++++++
 kernel/time.c        |   36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 83 insertions(+)

Index: linux-2.6.15-rc5/include/linux/time.h
===================================================================
--- linux-2.6.15-rc5.orig/include/linux/time.h
+++ linux-2.6.15-rc5/include/linux/time.h
@@ -50,6 +50,12 @@ extern void set_normalized_timespec(stru
 #define timespec_valid(ts) \
 	(((ts)->tv_sec >= 0) && (((unsigned) (ts)->tv_nsec) < NSEC_PER_SEC))
 
+/*
+ * 64-bit nanosec type. Large enough to span 292+ years in nanosecond
+ * resolution. Ought to be enough for a while.
+ */
+typedef s64 nsec_t;
+
 extern struct timespec xtime;
 extern struct timespec wall_to_monotonic;
 extern seqlock_t xtime_lock;
@@ -78,6 +84,47 @@ extern void getnstimeofday(struct timesp
 
 extern struct timespec timespec_trunc(struct timespec t, unsigned gran);
 
+/**
+ * timespec_to_ns - Convert timespec to nanoseconds
+ * @ts:		pointer to the timespec variable to be converted
+ *
+ * Returns the scalar nanosecond representation of the timespec
+ * parameter.
+ */
+static inline nsec_t timespec_to_ns(const struct timespec *ts)
+{
+	return ((nsec_t) ts->tv_sec * NSEC_PER_SEC) + ts->tv_nsec;
+}
+
+/**
+ * timeval_to_ns - Convert timeval to nanoseconds
+ * @tv:		pointer to the timeval variable to be converted
+ *
+ * Returns the scalar nanosecond representation of the timeval
+ * parameter.
+ */
+static inline nsec_t timeval_to_ns(const struct timeval *tv)
+{
+	return ((nsec_t) tv->tv_sec * NSEC_PER_SEC) +
+		tv->tv_usec * NSEC_PER_USEC;
+}
+
+/**
+ * ns_to_timespec - Convert nanoseconds to timespec
+ * @nsec:	the nanoseconds value to be converted
+ *
+ * Returns the timespec representation of the nsec parameter.
+ */
+extern struct timespec ns_to_timespec(const nsec_t nsec);
+
+/**
+ * ns_to_timeval - Convert nanoseconds to timeval
+ * @nsec:	the nanoseconds value to be converted
+ *
+ * Returns the timeval representation of the nsec parameter.
+ */
+extern struct timeval ns_to_timeval(const nsec_t nsec);
+
 #endif /* __KERNEL__ */
 
 #define NFDBITS			__NFDBITS
Index: linux-2.6.15-rc5/kernel/time.c
===================================================================
--- linux-2.6.15-rc5.orig/kernel/time.c
+++ linux-2.6.15-rc5/kernel/time.c
@@ -630,6 +630,42 @@ void set_normalized_timespec(struct time
 	ts->tv_nsec = nsec;
 }
 
+/**
+ * ns_to_timespec - Convert nanoseconds to timespec
+ * @nsec:       the nanoseconds value to be converted
+ *
+ * Returns the timespec representation of the nsec parameter.
+ */
+inline struct timespec ns_to_timespec(const nsec_t nsec)
+{
+	struct timespec ts;
+
+	if (nsec)
+		ts.tv_sec = div_long_long_rem_signed(nsec, NSEC_PER_SEC,
+						     &ts.tv_nsec);
+	else
+		ts.tv_sec = ts.tv_nsec = 0;
+
+	return ts;
+}
+
+/**
+ * ns_to_timeval - Convert nanoseconds to timeval
+ * @nsec:       the nanoseconds value to be converted
+ *
+ * Returns the timeval representation of the nsec parameter.
+ */
+struct timeval ns_to_timeval(const nsec_t nsec)
+{
+	struct timespec ts = ns_to_timespec(nsec);
+	struct timeval tv;
+
+	tv.tv_sec = ts.tv_sec;
+	tv.tv_usec = (suseconds_t) ts.tv_nsec / 1000;
+
+	return tv;
+}
+
 #if (BITS_PER_LONG < 64)
 u64 get_jiffies_64(void)
 {

--



* [patch 14/21] Introduce ktime_t time format
  2005-12-06  0:01 [patch 00/21] hrtimer - High-resolution timer subsystem tglx
                   ` (12 preceding siblings ...)
  2005-12-06  0:01 ` [patch 13/21] Introduce nsec_t type and conversion functions tglx
@ 2005-12-06  0:01 ` tglx
  2005-12-06  0:01 ` [patch 15/21] hrtimer core code tglx
                   ` (7 subsequent siblings)
  21 siblings, 0 replies; 74+ messages in thread
From: tglx @ 2005-12-06  0:01 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm, rostedt, johnstul, zippel, mingo

[-- Attachment #1: ktime-t.patch --]
[-- Type: text/plain, Size: 8657 bytes --]


- introduce ktime_t: nanosecond-resolution time format.

- eliminate the plain s64 scalar type, and always use the union.
  This simplifies the arithmetic. Idea from Roman Zippel.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

 arch/i386/Kconfig     |    4 
 include/linux/ktime.h |  269 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 273 insertions(+)

Index: linux-2.6.15-rc5/arch/i386/Kconfig
===================================================================
--- linux-2.6.15-rc5.orig/arch/i386/Kconfig
+++ linux-2.6.15-rc5/arch/i386/Kconfig
@@ -1055,3 +1055,7 @@ config X86_TRAMPOLINE
 	bool
 	depends on X86_SMP || (X86_VOYAGER && SMP)
 	default y
+
+config KTIME_SCALAR
+	bool
+	default y
Index: linux-2.6.15-rc5/include/linux/ktime.h
===================================================================
--- /dev/null
+++ linux-2.6.15-rc5/include/linux/ktime.h
@@ -0,0 +1,269 @@
+/*
+ *  include/linux/ktime.h
+ *
+ *  ktime_t - nanosecond-resolution time format.
+ *
+ *   Copyright(C) 2005, Thomas Gleixner <tglx@linutronix.de>
+ *   Copyright(C) 2005, Red Hat, Inc., Ingo Molnar
+ *
+ *  data type definitions, declarations, prototypes and macros.
+ *
+ *  Started by: Thomas Gleixner and Ingo Molnar
+ *
+ *  For licencing details see kernel-base/COPYING
+ */
+#ifndef _LINUX_KTIME_H
+#define _LINUX_KTIME_H
+
+#include <linux/time.h>
+#include <linux/jiffies.h>
+
+/*
+ * ktime_t:
+ *
+ * On 64-bit CPUs a single 64-bit variable is used to store the hrtimers
+ * internal representation of time values in scalar nanoseconds. The
+ * design plays out best on 64-bit CPUs, where most conversions are
+ * NOPs and most arithmetic ktime_t operations are plain arithmetic
+ * operations.
+ *
+ * On 32-bit CPUs an optimized representation of the timespec structure
+ * is used to avoid expensive conversions from and to timespecs. The
+ * endian-aware order of the tv struct members is chosen to allow
+ * mathematical operations on the tv64 member of the union too, which
+ * for certain operations produces better code.
+ *
+ * For architectures with efficient support for 64/32-bit conversions the
+ * plain scalar nanosecond based representation can be selected by the
+ * config switch CONFIG_KTIME_SCALAR.
+ */
+typedef union {
+	s64	tv64;
+#if BITS_PER_LONG != 64 && !defined(CONFIG_KTIME_SCALAR)
+	struct {
+# ifdef __BIG_ENDIAN
+	s32	sec, nsec;
+# else
+	s32	nsec, sec;
+# endif
+	} tv;
+#endif
+} ktime_t;
+
+#define KTIME_MAX			(~((u64)1 << 63))
+
+/*
+ * ktime_t definitions when using the 64-bit scalar representation:
+ */
+
+#if (BITS_PER_LONG == 64) || defined(CONFIG_KTIME_SCALAR)
+
+/* Define a ktime_t variable and initialize it to zero: */
+#define DEFINE_KTIME(kt)		ktime_t kt = { .tv64 = 0 }
+
+/**
+ * ktime_set - Set a ktime_t variable from a seconds/nanoseconds value
+ *
+ * @secs:	seconds to set
+ * @nsecs:	nanoseconds to set
+ *
+ * Return the ktime_t representation of the value
+ */
+static inline ktime_t ktime_set(const long secs, const unsigned long nsecs)
+{
+	return (ktime_t) { .tv64 = (s64)secs * NSEC_PER_SEC + (s64)nsecs };
+}
+
+/* Subtract two ktime_t variables. res = lhs - rhs: */
+#define ktime_sub(lhs, rhs) \
+		({ (ktime_t){ .tv64 = (lhs).tv64 - (rhs).tv64 }; })
+
+/* Add two ktime_t variables. res = lhs + rhs: */
+#define ktime_add(lhs, rhs) \
+		({ (ktime_t){ .tv64 = (lhs).tv64 + (rhs).tv64 }; })
+
+/*
+ * Add a ktime_t variable and a scalar nanosecond value.
+ * res = kt + nsval:
+ */
+#define ktime_add_ns(kt, nsval) \
+		({ (ktime_t){ .tv64 = (kt).tv64 + (nsval) }; })
+
+/* convert a timespec to ktime_t format: */
+#define timespec_to_ktime(ts)		ktime_set((ts).tv_sec, (ts).tv_nsec)
+
+/* convert a timeval to ktime_t format: */
+#define timeval_to_ktime(tv)		ktime_set((tv).tv_sec, (tv).tv_usec * 1000)
+
+/* Map the ktime_t to timespec conversion to ns_to_timespec function */
+#define ktime_to_timespec(kt)		ns_to_timespec((kt).tv64)
+
+/* Map the ktime_t to timeval conversion to ns_to_timeval function */
+#define ktime_to_timeval(kt)		ns_to_timeval((kt).tv64)
+
+/* Map the ktime_t to clock_t conversion to the inline in jiffies.h: */
+#define ktime_to_clock_t(kt)		nsec_to_clock_t((kt).tv64)
+
+/* Convert ktime_t to nanoseconds - NOP in the scalar storage format: */
+#define ktime_to_ns(kt)			((kt).tv64)
+
+#else
+
+/*
+ * Helper macros/inlines to get the ktime_t math right in the timespec
+ * representation. The macros are sometimes ugly - their actual use is
+ * pretty okay-ish, given the circumstances. We do all this for
+ * performance reasons. The pure scalar nsec_t based code was nice and
+ * simple, but created too many 64-bit / 32-bit conversions and divisions.
+ *
+ * Be especially aware that negative values are represented in a way
+ * that the tv.sec field is negative and the tv.nsec field is greater
+ * or equal to zero but less than nanoseconds per second. This is the
+ * same representation which is used by timespecs.
+ *
+ *   tv.sec < 0 and 0 <= tv.nsec < NSEC_PER_SEC
+ */
+
+/* Define a ktime_t variable and initialize it to zero: */
+#define DEFINE_KTIME(kt)		ktime_t kt = { .tv64 = 0 }
+
+/* Set a ktime_t variable to a value in sec/nsec representation: */
+static inline ktime_t ktime_set(const long secs, const unsigned long nsecs)
+{
+	return (ktime_t) { .tv = { .sec = secs, .nsec = nsecs } };
+}
+
+/**
+ * ktime_sub - subtract two ktime_t variables
+ *
+ * @lhs:	minuend
+ * @rhs:	subtrahend
+ *
+ * Returns the difference lhs - rhs
+ */
+static inline ktime_t ktime_sub(const ktime_t lhs, const ktime_t rhs)
+{
+	ktime_t res;
+
+	res.tv64 = lhs.tv64 - rhs.tv64;
+	if (res.tv.nsec < 0)
+		res.tv.nsec += NSEC_PER_SEC;
+
+	return res;
+}
+
+/**
+ * ktime_add - add two ktime_t variables
+ *
+ * @add1:	addend1
+ * @add2:	addend2
+ *
+ * Returns the sum of addend1 and addend2
+ */
+static inline ktime_t ktime_add(const ktime_t add1, const ktime_t add2)
+{
+	ktime_t res;
+
+	res.tv64 = add1.tv64 + add2.tv64;
+	/*
+	 * performance trick: the (u32) -NSEC gives 0x00000000Fxxxxxxx
+	 * so we subtract NSEC_PER_SEC and add 1 to the upper 32 bit.
+	 *
+	 * it's equivalent to:
+	 *   tv.nsec -= NSEC_PER_SEC
+	 *   tv.sec ++;
+	 */
+	if (res.tv.nsec >= NSEC_PER_SEC)
+		res.tv64 += (u32)-NSEC_PER_SEC;
+
+	return res;
+}
+
+/**
+ * ktime_add_ns - Add a scalar nanoseconds value to a ktime_t variable
+ *
+ * @kt:		addend
+ * @nsec:	the scalar nsec value to add
+ *
+ * Returns the sum of kt and nsec in ktime_t format
+ */
+extern ktime_t ktime_add_ns(const ktime_t kt, u64 nsec);
+
+/**
+ * timespec_to_ktime - convert a timespec to ktime_t format
+ *
+ * @ts:		the timespec variable to convert
+ *
+ * Returns a ktime_t variable with the converted timespec value
+ */
+static inline ktime_t timespec_to_ktime(const struct timespec ts)
+{
+	return (ktime_t) { .tv = { .sec = (s32)ts.tv_sec,
+			   	   .nsec = (s32)ts.tv_nsec } };
+}
+
+/**
+ * timeval_to_ktime - convert a timeval to ktime_t format
+ *
+ * @tv:		the timeval variable to convert
+ *
+ * Returns a ktime_t variable with the converted timeval value
+ */
+static inline ktime_t timeval_to_ktime(const struct timeval tv)
+{
+	return (ktime_t) { .tv = { .sec = (s32)tv.tv_sec,
+				   .nsec = (s32)tv.tv_usec * 1000 } };
+}
+
+/**
+ * ktime_to_timespec - convert a ktime_t variable to timespec format
+ *
+ * @kt:		the ktime_t variable to convert
+ *
+ * Returns the timespec representation of the ktime value
+ */
+static inline struct timespec ktime_to_timespec(const ktime_t kt)
+{
+	return (struct timespec) { .tv_sec = (time_t) kt.tv.sec,
+				   .tv_nsec = (long) kt.tv.nsec };
+}
+
+/**
+ * ktime_to_timeval - convert a ktime_t variable to timeval format
+ *
+ * @kt:		the ktime_t variable to convert
+ *
+ * Returns the timeval representation of the ktime value
+ */
+static inline struct timeval ktime_to_timeval(const ktime_t kt)
+{
+	return (struct timeval) {
+		.tv_sec = (time_t) kt.tv.sec,
+		.tv_usec = (suseconds_t) (kt.tv.nsec / NSEC_PER_USEC) };
+}
+
+/**
+ * ktime_to_clock_t - convert a ktime_t variable to clock_t format
+ * @kt:		the ktime_t variable to convert
+ *
+ * Returns a clock_t variable with the converted value
+ */
+static inline clock_t ktime_to_clock_t(const ktime_t kt)
+{
+	return nsec_to_clock_t( (u64) kt.tv.sec * NSEC_PER_SEC + kt.tv.nsec);
+}
+
+/**
+ * ktime_to_ns - convert a ktime_t variable to scalar nanoseconds
+ * @kt:		the ktime_t variable to convert
+ *
+ * Returns the scalar nanoseconds representation of kt
+ */
+static inline u64 ktime_to_ns(const ktime_t kt)
+{
+	return (u64) kt.tv.sec * NSEC_PER_SEC + kt.tv.nsec;
+}
+
+#endif
+
+#endif

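For illustration only (not part of the patch): a minimal user-space
sketch of the carry/borrow handling that ktime_add() and ktime_sub() use
in the sec/nsec representation above. It assumes a hypothetical packed
layout with seconds in the upper and nanoseconds in the lower 32 bits;
the model_* names are made up for this sketch and do not exist in the
kernel.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical model: sec in the upper 32 bits, nsec in the lower 32 bits. */
typedef struct { uint64_t tv64; } model_ktime_t;

static inline model_ktime_t model_set(int32_t sec, int32_t nsec)
{
	return (model_ktime_t){
		.tv64 = ((uint64_t)(uint32_t)sec << 32) | (uint32_t)nsec };
}

static inline int32_t model_sec(model_ktime_t kt)
{
	return (int32_t)(uint32_t)(kt.tv64 >> 32);
}

static inline int32_t model_nsec(model_ktime_t kt)
{
	return (int32_t)(uint32_t)kt.tv64;
}

/* One 64-bit add, then normalize nsec with a single extra 64-bit add. */
static inline model_ktime_t model_add(model_ktime_t a, model_ktime_t b)
{
	model_ktime_t res = { .tv64 = a.tv64 + b.tv64 };

	/*
	 * Adding (u32)-NSEC_PER_SEC overflows the lower word: nsec drops
	 * by NSEC_PER_SEC and the carry increments sec.
	 */
	if (model_nsec(res) >= NSEC_PER_SEC)
		res.tv64 += (uint32_t)-NSEC_PER_SEC;

	return res;
}

/* One 64-bit subtract; the borrow has already decremented sec. */
static inline model_ktime_t model_sub(model_ktime_t lhs, model_ktime_t rhs)
{
	model_ktime_t res = { .tv64 = lhs.tv64 - rhs.tv64 };

	/* Fix up the nsec field only, leaving the borrowed sec alone. */
	if (model_nsec(res) < 0)
		res = model_set(model_sec(res),
				model_nsec(res) + NSEC_PER_SEC);

	return res;
}
```

With this model, 1.6s + 2.7s comes out as sec = 4, nsec = 300000000, and
0.2s - 1.0s comes out as sec = -1, nsec = 200000000, which is the
negative-value convention described in the comment block above.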
--


^ permalink raw reply	[flat|nested] 74+ messages in thread

* [patch 15/21] hrtimer core code
  2005-12-06  0:01 [patch 00/21] hrtimer - High-resolution timer subsystem tglx
                   ` (13 preceding siblings ...)
  2005-12-06  0:01 ` [patch 14/21] Introduce ktime_t time format tglx
@ 2005-12-06  0:01 ` tglx
  2005-12-15  3:43   ` Matt Helsley
  2005-12-06  0:01 ` [patch 16/21] hrtimer documentation tglx
                   ` (6 subsequent siblings)
  21 siblings, 1 reply; 74+ messages in thread
From: tglx @ 2005-12-06  0:01 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm, rostedt, johnstul, zippel, mingo

[-- Attachment #1: hrtimer-core.patch --]
[-- Type: text/plain, Size: 22496 bytes --]


- hrtimer subsystem core. It is initialized at bootup and expired by the
  timer interrupt, but is otherwise not utilized by any other subsystem yet.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

 include/linux/hrtimer.h |  130 +++++++++
 include/linux/ktime.h   |   15 +
 init/main.c             |    1 
 kernel/Makefile         |    3 
 kernel/hrtimer.c        |  679 ++++++++++++++++++++++++++++++++++++++++++++++++
 kernel/timer.c          |    1 
 6 files changed, 828 insertions(+), 1 deletion(-)

Index: linux-2.6.15-rc5/include/linux/hrtimer.h
===================================================================
--- /dev/null
+++ linux-2.6.15-rc5/include/linux/hrtimer.h
@@ -0,0 +1,130 @@
+/*
+ *  include/linux/hrtimer.h
+ *
+ *  hrtimers - High-resolution kernel timers
+ *
+ *   Copyright(C) 2005, Thomas Gleixner <tglx@linutronix.de>
+ *   Copyright(C) 2005, Red Hat, Inc., Ingo Molnar
+ *
+ *  data type definitions, declarations, prototypes
+ *
+ *  Started by: Thomas Gleixner and Ingo Molnar
+ *
+ *  For licencing details see kernel-base/COPYING
+ */
+#ifndef _LINUX_HRTIMER_H
+#define _LINUX_HRTIMER_H
+
+#include <linux/rbtree.h>
+#include <linux/ktime.h>
+#include <linux/init.h>
+#include <linux/list.h>
+#include <linux/wait.h>
+
+/*
+ * Mode arguments of xxx_hrtimer functions:
+ */
+enum hrtimer_mode {
+	HRTIMER_ABS,	/* Time value is absolute */
+	HRTIMER_REL,	/* Time value is relative to now */
+};
+
+enum hrtimer_restart {
+	HRTIMER_NORESTART,
+	HRTIMER_RESTART,
+};
+
+/*
+ * Timer states:
+ */
+enum hrtimer_state {
+	HRTIMER_INACTIVE,	/* Timer is inactive */
+	HRTIMER_EXPIRED,		/* Timer is expired */
+	HRTIMER_PENDING,		/* Timer is pending */
+};
+
+struct hrtimer_base;
+
+/**
+ * struct hrtimer - the basic hrtimer structure
+ *
+ * @node:	red black tree node for time ordered insertion
+ * @list:	list head for easier access to the time ordered list,
+ *		without walking the red black tree.
+ * @expires:	the absolute expiry time in the hrtimers internal
+ *		representation. The time is related to the clock on
+ *		which the timer is based.
+ * @state:	state of the timer
+ * @function:	timer expiry callback function
+ * @data:	argument for the callback function
+ * @base:	pointer to the timer base (per cpu and per clock)
+ *
+ * The hrtimer structure must be initialized by init_hrtimer_#CLOCKTYPE()
+ */
+struct hrtimer {
+	struct rb_node		node;
+	struct list_head	list;
+	ktime_t			expires;
+	enum hrtimer_state	state;
+	int			(*function)(void *);
+	void			*data;
+	struct hrtimer_base	*base;
+};
+
+/**
+ * struct hrtimer_base - the timer base for a specific clock
+ *
+ * @index:	clock type index for per_cpu support when moving a timer
+ *		to a base on another cpu.
+ * @lock:	lock protecting the base and associated timers
+ * @active:	red black tree root node for the active timers
+ * @pending:	list of pending timers for simple time ordered access
+ * @resolution:	the resolution of the clock, in nanoseconds
+ * @get_time:	function to retrieve the current time of the clock
+ * @curr_timer:	the timer which is executing a callback right now
+ */
+struct hrtimer_base {
+	clockid_t		index;
+	spinlock_t		lock;
+	struct rb_root		active;
+	struct list_head	pending;
+	unsigned long		resolution;
+	ktime_t			(*get_time)(void);
+	struct hrtimer		*curr_timer;
+};
+
+/* Exported timer functions: */
+
+/* Initialize timers: */
+extern void hrtimer_init(struct hrtimer *timer, const clockid_t which_clock);
+extern void hrtimer_rebase(struct hrtimer *timer, const clockid_t which_clock);
+
+
+/* Basic timer operations: */
+extern int hrtimer_start(struct hrtimer *timer, ktime_t tim,
+			 const enum hrtimer_mode mode);
+extern int hrtimer_cancel(struct hrtimer *timer);
+extern int hrtimer_try_to_cancel(struct hrtimer *timer);
+
+#define hrtimer_restart(timer) hrtimer_start((timer), (timer)->expires, HRTIMER_ABS)
+
+/* Query timers: */
+extern ktime_t hrtimer_get_remaining(const struct hrtimer *timer);
+extern int hrtimer_get_res(const clockid_t which_clock, struct timespec *tp);
+
+static inline int hrtimer_active(const struct hrtimer *timer)
+{
+	return timer->state == HRTIMER_PENDING;
+}
+
+/* Forward a hrtimer so it expires after now: */
+extern unsigned long hrtimer_forward(struct hrtimer *timer,
+				     const ktime_t interval);
+
+/* Soft interrupt function to run the hrtimer queues: */
+extern void hrtimer_run_queues(void);
+
+/* Bootup initialization: */
+extern void __init hrtimers_init(void);
+
+#endif
Index: linux-2.6.15-rc5/include/linux/ktime.h
===================================================================
--- linux-2.6.15-rc5.orig/include/linux/ktime.h
+++ linux-2.6.15-rc5/include/linux/ktime.h
@@ -266,4 +266,19 @@ static inline u64 ktime_to_ns(const ktim
 
 #endif
 
+/*
+ * The resolution of the clocks. The resolution value is returned in
+ * the clock_getres() system call to give application programmers an
+ * idea of the (in)accuracy of timers. Timer values are rounded up to
+ * this resolution value.
+ */
+#define KTIME_REALTIME_RES	(NSEC_PER_SEC/HZ)
+#define KTIME_MONOTONIC_RES	(NSEC_PER_SEC/HZ)
+
+/* Get the monotonic time in timespec format: */
+extern void ktime_get_ts(struct timespec *ts);
+
+/* Get the real (wall-) time in timespec format: */
+#define ktime_get_real_ts(ts)	getnstimeofday(ts)
+
 #endif
Index: linux-2.6.15-rc5/init/main.c
===================================================================
--- linux-2.6.15-rc5.orig/init/main.c
+++ linux-2.6.15-rc5/init/main.c
@@ -487,6 +487,7 @@ asmlinkage void __init start_kernel(void
 	init_IRQ();
 	pidhash_init();
 	init_timers();
+	hrtimers_init();
 	softirq_init();
 	time_init();
 
Index: linux-2.6.15-rc5/kernel/Makefile
===================================================================
--- linux-2.6.15-rc5.orig/kernel/Makefile
+++ linux-2.6.15-rc5/kernel/Makefile
@@ -7,7 +7,8 @@ obj-y     = sched.o fork.o exec_domain.o
 	    sysctl.o capability.o ptrace.o timer.o user.o \
 	    signal.o sys.o kmod.o workqueue.o pid.o \
 	    rcupdate.o intermodule.o extable.o params.o posix-timers.o \
-	    kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o
+	    kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o \
+	    hrtimer.o
 
 obj-$(CONFIG_FUTEX) += futex.o
 obj-$(CONFIG_GENERIC_ISA_DMA) += dma.o
Index: linux-2.6.15-rc5/kernel/hrtimer.c
===================================================================
--- /dev/null
+++ linux-2.6.15-rc5/kernel/hrtimer.c
@@ -0,0 +1,679 @@
+/*
+ *  linux/kernel/hrtimer.c
+ *
+ *  Copyright(C) 2005, Thomas Gleixner <tglx@linutronix.de>
+ *  Copyright(C) 2005, Red Hat, Inc., Ingo Molnar
+ *
+ *  High-resolution kernel timers
+ *
+ *  In contrast to the low-resolution timeout API implemented in
+ *  kernel/timer.c, hrtimers provide finer resolution and accuracy
+ *  depending on system configuration and capabilities.
+ *
+ *  These timers are currently used for:
+ *   - itimers
+ *   - POSIX timers
+ *   - nanosleep
+ *   - precise in-kernel timing
+ *
+ *  Started by: Thomas Gleixner and Ingo Molnar
+ *
+ *  Credits:
+ *	based on kernel/timer.c
+ *
+ *  For licencing details see kernel-base/COPYING
+ */
+
+#include <linux/cpu.h>
+#include <linux/module.h>
+#include <linux/percpu.h>
+#include <linux/hrtimer.h>
+#include <linux/notifier.h>
+#include <linux/syscalls.h>
+#include <linux/interrupt.h>
+
+#include <asm/uaccess.h>
+
+/**
+ * ktime_get - get the monotonic time in ktime_t format
+ *
+ * returns the time in ktime_t format
+ */
+static ktime_t ktime_get(void)
+{
+	struct timespec now;
+
+	ktime_get_ts(&now);
+
+	return timespec_to_ktime(now);
+}
+
+/**
+ * ktime_get_real - get the real (wall-) time in ktime_t format
+ *
+ * returns the time in ktime_t format
+ */
+static ktime_t ktime_get_real(void)
+{
+	struct timespec now;
+
+	getnstimeofday(&now);
+
+	return timespec_to_ktime(now);
+}
+
+EXPORT_SYMBOL_GPL(ktime_get_real);
+
+/*
+ * The timer bases:
+ */
+
+#define MAX_HRTIMER_BASES 2
+
+static DEFINE_PER_CPU(struct hrtimer_base, hrtimer_bases[MAX_HRTIMER_BASES]) =
+{
+	{
+		.index = CLOCK_REALTIME,
+		.get_time = &ktime_get_real,
+		.resolution = KTIME_REALTIME_RES,
+	},
+	{
+		.index = CLOCK_MONOTONIC,
+		.get_time = &ktime_get,
+		.resolution = KTIME_MONOTONIC_RES,
+	},
+};
+
+/**
+ * ktime_get_ts - get the monotonic clock in timespec format
+ *
+ * @ts:		pointer to timespec variable
+ *
+ * The function calculates the monotonic clock from the realtime
+ * clock and the wall_to_monotonic offset and stores the result
+ * in normalized timespec format in the variable pointed to by ts.
+ */
+void ktime_get_ts(struct timespec *ts)
+{
+	struct timespec tomono;
+	unsigned long seq;
+
+	do {
+		seq = read_seqbegin(&xtime_lock);
+		getnstimeofday(ts);
+		tomono = wall_to_monotonic;
+
+	} while (read_seqretry(&xtime_lock, seq));
+
+	set_normalized_timespec(ts, ts->tv_sec + tomono.tv_sec,
+				ts->tv_nsec + tomono.tv_nsec);
+}
+
+/*
+ * Functions and macros which are different for UP/SMP systems are kept in a
+ * single place
+ */
+#ifdef CONFIG_SMP
+
+#define set_curr_timer(b, t)		do { (b)->curr_timer = (t); } while (0)
+
+/*
+ * We are using hashed locking: holding per_cpu(hrtimer_bases)[n].lock
+ * means that all timers which are tied to this base via timer->base are
+ * locked, and the base itself is locked too.
+ *
+ * So __run_timers/migrate_timers can safely modify all timers which could
+ * be found on the lists/queues.
+ *
+ * When the timer's base is locked, and the timer removed from list, it is
+ * possible to set timer->base = NULL and drop the lock: the timer remains
+ * locked.
+ */
+static struct hrtimer_base *lock_hrtimer_base(const struct hrtimer *timer,
+					      unsigned long *flags)
+{
+	struct hrtimer_base *base;
+
+	for (;;) {
+		base = timer->base;
+		if (likely(base != NULL)) {
+			spin_lock_irqsave(&base->lock, *flags);
+			if (likely(base == timer->base))
+				return base;
+			/* The timer has migrated to another CPU: */
+			spin_unlock_irqrestore(&base->lock, *flags);
+		}
+		cpu_relax();
+	}
+}
+
+/*
+ * Switch the timer base to the current CPU when possible.
+ */
+static inline struct hrtimer_base *
+switch_hrtimer_base(struct hrtimer *timer, struct hrtimer_base *base)
+{
+	struct hrtimer_base *new_base;
+
+	new_base = &__get_cpu_var(hrtimer_bases[base->index]);
+
+	if (base != new_base) {
+		/*
+		 * We are trying to schedule the timer on the local CPU.
+		 * However we can't change timer's base while it is running,
+		 * so we keep it on the same CPU. No hassle vs. reprogramming
+		 * the event source in the high resolution case. The softirq
+		 * code will take care of this when the timer function has
+		 * completed. There is no conflict as we hold the lock until
+		 * the timer is enqueued.
+		 */
+		if (unlikely(base->curr_timer == timer))
+			return base;
+
+		/* See the comment in lock_hrtimer_base() */
+		timer->base = NULL;
+		spin_unlock(&base->lock);
+		spin_lock(&new_base->lock);
+		timer->base = new_base;
+	}
+	return new_base;
+}
+
+#else /* CONFIG_SMP */
+
+#define set_curr_timer(b, t)		do { } while (0)
+
+static inline struct hrtimer_base *
+lock_hrtimer_base(const struct hrtimer *timer, unsigned long *flags)
+{
+	struct hrtimer_base *base = timer->base;
+
+	spin_lock_irqsave(&base->lock, *flags);
+
+	return base;
+}
+
+#define switch_hrtimer_base(t, b)	(b)
+
+#endif	/* !CONFIG_SMP */
+
+/*
+ * Functions for the union type storage format of ktime_t which are
+ * too large for inlining:
+ */
+#if BITS_PER_LONG < 64
+# ifndef CONFIG_KTIME_SCALAR
+/**
+ * ktime_add_ns - Add a scalar nanoseconds value to a ktime_t variable
+ *
+ * @kt:		addend
+ * @nsec:	the scalar nsec value to add
+ *
+ * Returns the sum of kt and nsec in ktime_t format
+ */
+ktime_t ktime_add_ns(const ktime_t kt, u64 nsec)
+{
+	ktime_t tmp;
+
+	if (likely(nsec < NSEC_PER_SEC)) {
+		tmp.tv64 = nsec;
+	} else {
+		unsigned long rem = do_div(nsec, NSEC_PER_SEC);
+
+		tmp = ktime_set((long)nsec, rem);
+	}
+
+	return ktime_add(kt, tmp);
+}
+
+#else /* CONFIG_KTIME_SCALAR */
+
+# endif /* !CONFIG_KTIME_SCALAR */
+
+/*
+ * Divide a ktime value by a nanosecond value
+ */
+static unsigned long ktime_divns(const ktime_t kt, nsec_t div)
+{
+	u64 dclc, inc, dns;
+	int sft = 0;
+
+	dclc = dns = ktime_to_ns(kt);
+	inc = div;
+	/* Make sure the divisor is less than 2^32: */
+	while (div >> 32) {
+		sft++;
+		div >>= 1;
+	}
+	dclc >>= sft;
+	do_div(dclc, (unsigned long) div);
+
+	return (unsigned long) dclc;
+}
+
+#else /* BITS_PER_LONG < 64 */
+# define ktime_divns(kt, div)		(unsigned long)((kt).tv64 / (div))
+#endif /* BITS_PER_LONG >= 64 */
+
+/*
+ * Counterpart to lock_hrtimer_base above:
+ */
+static inline
+void unlock_hrtimer_base(const struct hrtimer *timer, unsigned long *flags)
+{
+	spin_unlock_irqrestore(&timer->base->lock, *flags);
+}
+
+/**
+ * hrtimer_forward - forward the timer expiry
+ *
+ * @timer:	hrtimer to forward
+ * @interval:	the interval to forward
+ *
+ * Forward the timer expiry so it will expire in the future.
+ * Returns the number of overruns.
+ */
+unsigned long
+hrtimer_forward(struct hrtimer *timer, const ktime_t interval)
+{
+	unsigned long orun = 1;
+	ktime_t delta, now;
+
+	now = timer->base->get_time();
+
+	delta = ktime_sub(now, timer->expires);
+
+	if (delta.tv64 < 0)
+		return 0;
+
+	if (unlikely(delta.tv64 >= interval.tv64)) {
+		nsec_t incr = ktime_to_ns(interval);
+
+		orun = ktime_divns(delta, incr);
+		timer->expires = ktime_add_ns(timer->expires, incr * orun);
+		if (timer->expires.tv64 > now.tv64)
+			return orun;
+		/*
+		 * This (and the ktime_add() below) is the
+		 * correction for exact:
+		 */
+		orun++;
+	}
+	timer->expires = ktime_add(timer->expires, interval);
+
+	return orun;
+}
+
+/*
+ * enqueue_hrtimer - internal function to (re)start a timer
+ *
+ * The timer is inserted in expiry order. Insertion into the
+ * red black tree is O(log(n)). Must hold the base lock.
+ */
+static void enqueue_hrtimer(struct hrtimer *timer, struct hrtimer_base *base)
+{
+	struct rb_node **link = &base->active.rb_node;
+	struct list_head *prev = &base->pending;
+	struct rb_node *parent = NULL;
+	struct hrtimer *entry;
+
+	/*
+	 * Find the right place in the rbtree:
+	 */
+	while (*link) {
+		parent = *link;
+		entry = rb_entry(parent, struct hrtimer, node);
+		/*
+		 * We don't care about collisions. Nodes with
+		 * the same expiry time stay together.
+		 */
+		if (timer->expires.tv64 < entry->expires.tv64)
+			link = &(*link)->rb_left;
+		else {
+			link = &(*link)->rb_right;
+			prev = &entry->list;
+		}
+	}
+
+	/*
+	 * Insert the timer to the rbtree and to the sorted list:
+	 */
+	rb_link_node(&timer->node, parent, link);
+	rb_insert_color(&timer->node, &base->active);
+	list_add(&timer->list, prev);
+
+	timer->state = HRTIMER_PENDING;
+}
+
+
+/*
+ * __remove_hrtimer - internal function to remove a timer
+ *
+ * Caller must hold the base lock.
+ */
+static void __remove_hrtimer(struct hrtimer *timer, struct hrtimer_base *base)
+{
+	/*
+	 * Remove the timer from the sorted list and from the rbtree:
+	 */
+	list_del(&timer->list);
+	rb_erase(&timer->node, &base->active);
+}
+
+/*
+ * remove hrtimer, called with base lock held
+ */
+static inline int
+remove_hrtimer(struct hrtimer *timer, struct hrtimer_base *base)
+{
+	if (hrtimer_active(timer)) {
+		__remove_hrtimer(timer, base);
+		timer->state = HRTIMER_INACTIVE;
+		return 1;
+	}
+	return 0;
+}
+
+/**
+ * hrtimer_start - (re)start a timer on the current CPU
+ *
+ * @timer:	the timer to be added
+ * @tim:	expiry time
+ * @mode:	expiry mode: absolute (HRTIMER_ABS) or relative (HRTIMER_REL)
+ *
+ * Returns:
+ *  0 on success
+ *  1 when the timer was active
+ */
+int
+hrtimer_start(struct hrtimer *timer, ktime_t tim, const enum hrtimer_mode mode)
+{
+	struct hrtimer_base *base, *new_base;
+	unsigned long flags;
+	int ret;
+
+	base = lock_hrtimer_base(timer, &flags);
+
+	/* Remove an active timer from the queue: */
+	ret = remove_hrtimer(timer, base);
+
+	/* Switch the timer base, if necessary: */
+	new_base = switch_hrtimer_base(timer, base);
+
+	if (mode == HRTIMER_REL)
+		tim = ktime_add(tim, new_base->get_time());
+	timer->expires = tim;
+
+	enqueue_hrtimer(timer, new_base);
+
+	unlock_hrtimer_base(timer, &flags);
+
+	return ret;
+}
+
+/**
+ * hrtimer_try_to_cancel - try to deactivate a timer
+ *
+ * @timer:	hrtimer to stop
+ *
+ * Returns:
+ *  0 when the timer was not active
+ *  1 when the timer was active
+ * -1 when the timer is currently executing the callback function and
+ *    cannot be stopped
+ */
+int hrtimer_try_to_cancel(struct hrtimer *timer)
+{
+	struct hrtimer_base *base;
+	unsigned long flags;
+	int ret = -1;
+
+	base = lock_hrtimer_base(timer, &flags);
+
+	if (base->curr_timer != timer)
+		ret = remove_hrtimer(timer, base);
+
+	unlock_hrtimer_base(timer, &flags);
+
+	return ret;
+
+}
+
+/**
+ * hrtimer_cancel - cancel a timer and wait for the handler to finish.
+ *
+ * @timer:	the timer to be cancelled
+ *
+ * Returns:
+ *  0 when the timer was not active
+ *  1 when the timer was active
+ */
+int hrtimer_cancel(struct hrtimer *timer)
+{
+	for (;;) {
+		int ret = hrtimer_try_to_cancel(timer);
+
+		if (ret >= 0)
+			return ret;
+	}
+}
+
+/**
+ * hrtimer_get_remaining - get remaining time for the timer
+ *
+ * @timer:	the timer to read
+ */
+ktime_t hrtimer_get_remaining(const struct hrtimer *timer)
+{
+	struct hrtimer_base *base;
+	unsigned long flags;
+	ktime_t rem;
+
+	base = lock_hrtimer_base(timer, &flags);
+	rem = ktime_sub(timer->expires, timer->base->get_time());
+	unlock_hrtimer_base(timer, &flags);
+
+	return rem;
+}
+
+/**
+ * hrtimer_rebase - rebase an initialized hrtimer to a different base
+ *
+ * @timer:	the timer to be rebased
+ * @clock_id:	the clock to be used
+ */
+void hrtimer_rebase(struct hrtimer *timer, const clockid_t clock_id)
+{
+	struct hrtimer_base *bases;
+
+	bases = per_cpu(hrtimer_bases, raw_smp_processor_id());
+	timer->base = &bases[clock_id];
+}
+
+/**
+ * hrtimer_init - initialize a timer to the given clock
+ *
+ * @timer:	the timer to be initialized
+ * @clock_id:	the clock to be used
+ */
+void hrtimer_init(struct hrtimer *timer, const clockid_t clock_id)
+{
+	memset(timer, 0, sizeof(struct hrtimer));
+	hrtimer_rebase(timer, clock_id);
+}
+
+/**
+ * hrtimer_get_res - get the timer resolution for a clock
+ *
+ * @which_clock: which clock to query
+ * @tp:		 pointer to timespec variable to store the resolution
+ *
+ * Store the resolution of the clock selected by which_clock in the
+ * variable pointed to by tp.
+ */
+int hrtimer_get_res(const clockid_t which_clock, struct timespec *tp)
+{
+	struct hrtimer_base *bases;
+
+	tp->tv_sec = 0;
+	bases = per_cpu(hrtimer_bases, raw_smp_processor_id());
+	tp->tv_nsec = bases[which_clock].resolution;
+
+	return 0;
+}
+
+/*
+ * Expire the per base hrtimer-queue:
+ */
+static inline void run_hrtimer_queue(struct hrtimer_base *base)
+{
+	ktime_t now = base->get_time();
+
+	spin_lock_irq(&base->lock);
+
+	while (!list_empty(&base->pending)) {
+		struct hrtimer *timer;
+		int (*fn)(void *);
+		int restart;
+		void *data;
+
+		timer = list_entry(base->pending.next, struct hrtimer, list);
+		if (now.tv64 <= timer->expires.tv64)
+			break;
+
+		fn = timer->function;
+		data = timer->data;
+		set_curr_timer(base, timer);
+		__remove_hrtimer(timer, base);
+		spin_unlock_irq(&base->lock);
+
+		/*
+		 * fn == NULL is special case for the simplest timer
+		 * variant - wake up process and do not restart:
+		 */
+		if (!fn) {
+			wake_up_process(data);
+			restart = HRTIMER_NORESTART;
+		} else
+			restart = fn(data);
+
+		spin_lock_irq(&base->lock);
+
+		if (restart == HRTIMER_RESTART)
+			enqueue_hrtimer(timer, base);
+		else
+			timer->state = HRTIMER_EXPIRED;
+	}
+	set_curr_timer(base, NULL);
+	spin_unlock_irq(&base->lock);
+}
+
+/*
+ * Called from timer softirq every jiffy, expire hrtimers:
+ */
+void hrtimer_run_queues(void)
+{
+	struct hrtimer_base *base = __get_cpu_var(hrtimer_bases);
+	int i;
+
+	for (i = 0; i < MAX_HRTIMER_BASES; i++)
+		run_hrtimer_queue(&base[i]);
+}
+
+/*
+ * Functions related to boot-time initialization:
+ */
+static void __devinit init_hrtimers_cpu(int cpu)
+{
+	struct hrtimer_base *base = per_cpu(hrtimer_bases, cpu);
+	int i;
+
+	for (i = 0; i < MAX_HRTIMER_BASES; i++) {
+		spin_lock_init(&base->lock);
+		INIT_LIST_HEAD(&base->pending);
+		base++;
+	}
+}
+
+#ifdef CONFIG_HOTPLUG_CPU
+
+static void migrate_hrtimer_list(struct hrtimer_base *old_base,
+				struct hrtimer_base *new_base)
+{
+	struct hrtimer *timer;
+	struct rb_node *node;
+
+	while ((node = rb_first(&old_base->active))) {
+		timer = rb_entry(node, struct hrtimer, node);
+		__remove_hrtimer(timer, old_base);
+		timer->base = new_base;
+		enqueue_hrtimer(timer, new_base);
+	}
+}
+
+static void migrate_hrtimers(int cpu)
+{
+	struct hrtimer_base *old_base, *new_base;
+	int i;
+
+	BUG_ON(cpu_online(cpu));
+	old_base = per_cpu(hrtimer_bases, cpu);
+	new_base = get_cpu_var(hrtimer_bases);
+
+	local_irq_disable();
+
+	for (i = 0; i < MAX_HRTIMER_BASES; i++) {
+
+		spin_lock(&new_base->lock);
+		spin_lock(&old_base->lock);
+
+		BUG_ON(old_base->curr_timer);
+
+		migrate_hrtimer_list(old_base, new_base);
+
+		spin_unlock(&old_base->lock);
+		spin_unlock(&new_base->lock);
+		old_base++;
+		new_base++;
+	}
+
+	local_irq_enable();
+	put_cpu_var(hrtimer_bases);
+}
+#endif /* CONFIG_HOTPLUG_CPU */
+
+static int __devinit hrtimer_cpu_notify(struct notifier_block *self,
+					unsigned long action, void *hcpu)
+{
+	long cpu = (long)hcpu;
+
+	switch (action) {
+
+	case CPU_UP_PREPARE:
+		init_hrtimers_cpu(cpu);
+		break;
+
+#ifdef CONFIG_HOTPLUG_CPU
+	case CPU_DEAD:
+		migrate_hrtimers(cpu);
+		break;
+#endif
+
+	default:
+		break;
+	}
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block __devinitdata hrtimers_nb = {
+	.notifier_call = hrtimer_cpu_notify,
+};
+
+void __init hrtimers_init(void)
+{
+	hrtimer_cpu_notify(&hrtimers_nb, (unsigned long)CPU_UP_PREPARE,
+			  (void *)(long)smp_processor_id());
+	register_cpu_notifier(&hrtimers_nb);
+}
+
Index: linux-2.6.15-rc5/kernel/timer.c
===================================================================
--- linux-2.6.15-rc5.orig/kernel/timer.c
+++ linux-2.6.15-rc5/kernel/timer.c
@@ -857,6 +857,7 @@ static void run_timer_softirq(struct sof
 {
 	tvec_base_t *base = &__get_cpu_var(tvec_bases);
 
+ 	hrtimer_run_queues();
 	if (time_after_eq(jiffies, base->timer_jiffies))
 		__run_timers(base);
 }

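For illustration only (not part of the patch): a minimal user-space
sketch of the overrun arithmetic in hrtimer_forward() above, using plain
64-bit nanosecond values instead of ktime_t. The name model_forward is
hypothetical, and the sketch assumes interval > 0.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical scalar model of hrtimer_forward(): push *expires past
 * 'now' in whole multiples of 'interval' and return how many intervals
 * were added (0 if the timer has not expired yet).
 */
static inline unsigned long
model_forward(int64_t *expires, int64_t now, int64_t interval)
{
	unsigned long orun = 1;
	int64_t delta = now - *expires;

	/* Not yet expired: nothing to forward. */
	if (delta < 0)
		return 0;

	if (delta >= interval) {
		/* Skip all completely missed intervals in one step. */
		orun = (unsigned long)(delta / interval);
		*expires += interval * (int64_t)orun;
		/*
		 * Kept to mirror the original; with an exact division
		 * this branch is not taken.
		 */
		if (*expires > now)
			return orun;
		orun++;
	}
	/* One more interval so that the expiry lies after 'now'. */
	*expires += interval;

	return orun;
}
```

For example, with expires = 100, now = 350 and interval = 100 the sketch
returns 3 and leaves expires at 400.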
--


^ permalink raw reply	[flat|nested] 74+ messages in thread

* [patch 16/21] hrtimer documentation
  2005-12-06  0:01 [patch 00/21] hrtimer - High-resolution timer subsystem tglx
                   ` (14 preceding siblings ...)
  2005-12-06  0:01 ` [patch 15/21] hrtimer core code tglx
@ 2005-12-06  0:01 ` tglx
  2005-12-06  0:01 ` [patch 17/21] Switch itimers to hrtimer tglx
                   ` (5 subsequent siblings)
  21 siblings, 0 replies; 74+ messages in thread
From: tglx @ 2005-12-06  0:01 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm, rostedt, johnstul, zippel, mingo

[-- Attachment #1: hrtimer-documentation.patch --]
[-- Type: text/plain, Size: 10125 bytes --]


- add hrtimer docbook and design document

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

 Documentation/DocBook/kernel-api.tmpl |    5 
 Documentation/hrtimers.txt            |  178 ++++++++++++++++++++++++++++++++++
 2 files changed, 183 insertions(+)

Index: linux-2.6.15-rc5/Documentation/DocBook/kernel-api.tmpl
===================================================================
--- linux-2.6.15-rc5.orig/Documentation/DocBook/kernel-api.tmpl
+++ linux-2.6.15-rc5/Documentation/DocBook/kernel-api.tmpl
@@ -54,6 +54,11 @@
 !Ekernel/sched.c
 !Ekernel/timer.c
      </sect1>
+     <sect1><title>High-resolution timers</title>
+!Iinclude/linux/ktime.h
+!Iinclude/linux/hrtimer.h
+!Ekernel/hrtimer.c
+     </sect1>
      <sect1><title>Internal Functions</title>
 !Ikernel/exit.c
 !Ikernel/signal.c
Index: linux-2.6.15-rc5/Documentation/hrtimers.txt
===================================================================
--- /dev/null
+++ linux-2.6.15-rc5/Documentation/hrtimers.txt
@@ -0,0 +1,178 @@
+
+hrtimers - subsystem for high-resolution kernel timers
+----------------------------------------------------
+
+This patch introduces a new subsystem for high-resolution kernel timers.
+
+One might ask the question: we already have a timer subsystem
+(kernel/timer.c), why do we need two timer subsystems? After a lot of
+back and forth trying to integrate high-resolution and high-precision
+features into the existing timer framework, and after testing various
+such high-resolution timer implementations in practice, we came to the
+conclusion that the timer wheel code is fundamentally not suitable for
+such an approach. We initially didn't believe this ('there must be a way
+to solve this'), and spent a considerable effort trying to integrate
+things into the timer wheel, but we failed. In hindsight, there are
+several reasons why such integration is hard/impossible:
+
+- the forced handling of low-resolution and high-resolution timers in
+  the same way leads to a lot of compromises, macro magic and #ifdef
+  mess. The timer.c code is very "tightly coded" around jiffies and
+  32-bitness assumptions, and has been honed and micro-optimized for a
+  relatively narrow use case (jiffies in a relatively narrow HZ range)
+  for many years - and thus even small extensions to it easily break
+  the wheel concept, leading to even worse compromises. The timer wheel
+  code is very good and tight code, there's zero problems with it in its
+  current usage - but it is simply not suitable to be extended for
+  high-res timers.
+
+- the unpredictable [O(N)] overhead of cascading leads to delays which
+  necessitate a more complex handling of high resolution timers, which
+  in turn decreases robustness. Such a design still led to rather large
+  timing inaccuracies. Cascading is a fundamental property of the timer
+  wheel concept, it cannot be 'designed out' without inevitably
+  degrading other portions of the timer.c code in an unacceptable way.
+
+- the implementation of the current posix-timer subsystem on top of
+  the timer wheel has already introduced a quite complex handling of
+  the required readjusting of absolute CLOCK_REALTIME timers at
+  settimeofday or NTP time - further underlining our experience by
+  example: that the timer wheel data structure is too rigid for high-res
+  timers.
+
+- the timer wheel code is most optimal for use cases which can be
+  identified as "timeouts". Such timeouts are usually set up to cover
+  error conditions in various I/O paths, such as networking and block
+  I/O. The vast majority of those timers never expire and are rarely
+  recascaded because the expected correct event arrives in time so they
+  can be removed from the timer wheel before any further processing of
+  them becomes necessary. Thus the users of these timeouts can accept
+  the granularity and precision tradeoffs of the timer wheel, and
+  largely expect the timer subsystem to have near-zero overhead.
+  Accurate timing for them is not a core purpose - in fact most of the
+  timeout values used are ad-hoc. For them it is at most a necessary
+  evil to guarantee the processing of actual timeout completions
+  (because most of the timeouts are deleted before completion), which
+  should thus be as cheap and unintrusive as possible.
+
+The primary users of precision timers are user-space applications that
+utilize nanosleep, posix-timers and itimer interfaces. Also, in-kernel
+users like drivers and subsystems which require precise timed events
+(e.g. multimedia) can benefit from the availability of a separate
+high-resolution timer subsystem as well.
+
+While this subsystem does not offer high-resolution clock sources just
+yet, the hrtimer subsystem can be easily extended with high-resolution
+clock capabilities, and patches for that exist and are maturing quickly.
+The increasing demand for realtime and multimedia applications along
+with other potential users for precise timers gives another reason to
+separate the "timeout" and "precise timer" subsystems.
+
+Another potential benefit is that such a separation allows even more
+special-purpose optimization of the existing timer wheel for the low
+resolution and low precision use cases - once the precision-sensitive
+APIs are separated from the timer wheel and are migrated over to
+hrtimers. E.g. we could decrease the frequency of the timeout subsystem
+from 250 Hz to 100 Hz (or even lower).
+
+hrtimer subsystem implementation details
+----------------------------------------
+
+the basic design considerations were:
+
+- simplicity
+
+- data structure not bound to jiffies or any other granularity. All the
+  kernel logic works at 64-bit nanoseconds resolution - no compromises.
+
+- simplification of existing, timing related kernel code
+
+another basic requirement was the immediate enqueueing and ordering of
+timers at activation time. After looking at several possible solutions
+such as radix trees and hashes, we chose the red-black tree as the basic
+data structure. Rbtrees are available as a library in the kernel and are
+used in various performance-critical areas of e.g. memory management and
+file systems. The rbtree is solely used for time sorted ordering, while
+a separate list is used to give the expiry code fast access to the
+queued timers, without having to walk the rbtree.
+
+(This separate list is also useful for later when we'll introduce
+high-resolution clocks, where we need separate pending and expired
+queues while keeping the time-order intact.)
+
+Time-ordered enqueueing is not purely for the purposes of
+high-resolution clocks though, it also simplifies the handling of
+absolute timers based on a low-resolution CLOCK_REALTIME. The existing
+implementation needed to keep an extra list of all armed absolute
+CLOCK_REALTIME timers along with complex locking. In case of
+settimeofday and NTP, all the timers (!) had to be dequeued, the
+time-changing code had to fix them up one by one, and all of them had to
+be enqueued again. The time-ordered enqueueing and the storage of the
+expiry time in absolute time units removes all this complex and poorly
+scaling code from the posix-timer implementation - the clock can simply
+be set without having to touch the rbtree. This also makes the handling
+of posix-timers simpler in general.
+
+The locking and per-CPU behavior of hrtimers was mostly taken from the
+existing timer wheel code, as it is mature and well suited. Sharing code
+was not really a win, due to the different data structures. Also, the
+hrtimer functions now have clearer behavior and clearer names - such as
+hrtimer_try_to_cancel() and hrtimer_cancel() [which are roughly
+equivalent to del_timer() and del_timer_sync()] - so there's no direct
+1:1 mapping between them on the algorithmic level, and thus no real
+potential for code sharing either.
+
+Basic data types: every time value, absolute or relative, is in a
+special nanosecond-resolution type: ktime_t. The kernel-internal
+representation of ktime_t values and operations is implemented via
+macros and inline functions, and can be switched between a "hybrid
+union" type and a plain "scalar" 64-bit nanoseconds representation (at
+compile time). The hybrid union type optimizes time conversions on 32-bit
+CPUs. This build-time-selectable ktime_t storage format was implemented
+to avoid the performance impact of 64-bit multiplications and divisions
+on 32-bit CPUs. Such operations are frequently necessary to convert
+between the storage formats provided by kernel and userspace interfaces
+and the internal time format. (See include/linux/ktime.h for further
+details.)
+
+hrtimers - rounding of timer values
+-----------------------------------
+
+the hrtimer code will round timer events to lower-resolution clocks
+because it has to. Otherwise it will do no artificial rounding at all.
+
+one question is, what resolution value should be returned to the user by
+the clock_getres() interface. This will return whatever real resolution
+a given clock has - be it low-res, high-res, or artificially-low-res.
+
+hrtimers - testing and verification
+----------------------------------
+
+We used the high-resolution clock subsystem on top of hrtimers to verify
+the hrtimer implementation details in practice, and we also ran the posix
+timer tests in order to ensure specification compliance. We also ran
+tests on low-resolution clocks.
+
+The hrtimer patch converts the following kernel functionality to use
+hrtimers:
+
+ - nanosleep
+ - itimers
+ - posix-timers
+
+The conversion of nanosleep and posix-timers enabled the unification of
+nanosleep and clock_nanosleep.
+
+The code was successfully compiled for the following platforms:
+
+ i386, x86_64, ARM, PPC, PPC64, IA64
+
+The code was run-tested on the following platforms:
+
+ i386(UP/SMP), x86_64(UP/SMP), ARM, PPC
+
+hrtimers were also integrated into the -rt tree, along with a
+hrtimers-based high-resolution clock implementation, so the hrtimers
+code got a healthy amount of testing and use in practice.
+
+	Thomas Gleixner, Ingo Molnar

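A minimal user-space sketch of the plain "scalar" representation described
in the documentation above, assuming a signed 64-bit nanosecond count. The
names sketch_ktime_t, sketch_timespec_to_ns() and sketch_ns_to_timespec()
are hypothetical and are not the kernel's ktime.h interface. The division
in the reverse conversion is the kind of 64-bit operation the hybrid union
format is meant to avoid on 32-bit CPUs.

```c
#include <stdint.h>
#include <time.h>

#define SKETCH_NSEC_PER_SEC 1000000000LL

/* Hypothetical stand-in for the scalar format: signed 64-bit nanoseconds. */
typedef int64_t sketch_ktime_t;

/* Convert a normalized timespec (0 <= tv_nsec < 1e9) to nanoseconds. */
static sketch_ktime_t sketch_timespec_to_ns(struct timespec ts)
{
	return (sketch_ktime_t)ts.tv_sec * SKETCH_NSEC_PER_SEC + ts.tv_nsec;
}

/* Convert a non-negative nanosecond count back to a normalized timespec. */
static struct timespec sketch_ns_to_timespec(sketch_ktime_t ns)
{
	struct timespec ts;

	/* 64-bit division and modulo: cheap on 64-bit, costly on 32-bit. */
	ts.tv_sec = (time_t)(ns / SKETCH_NSEC_PER_SEC);
	ts.tv_nsec = (long)(ns % SKETCH_NSEC_PER_SEC);
	return ts;
}
```

Addition and comparison of two such values are single integer operations,
which is why the scalar form is attractive wherever 64-bit arithmetic is
cheap.
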
--


^ permalink raw reply	[flat|nested] 74+ messages in thread

* [patch 17/21] Switch itimers to hrtimer
  2005-12-06  0:01 [patch 00/21] hrtimer - High-resolution timer subsystem tglx
                   ` (15 preceding siblings ...)
  2005-12-06  0:01 ` [patch 16/21] hrtimer documentation tglx
@ 2005-12-06  0:01 ` tglx
  2005-12-06  0:01 ` [patch 18/21] Create hrtimer nanosleep API tglx
                   ` (4 subsequent siblings)
  21 siblings, 0 replies; 74+ messages in thread
From: tglx @ 2005-12-06  0:01 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm, rostedt, johnstul, zippel, mingo

[-- Attachment #1: hrtimer-convert-itimer.patch --]
[-- Type: text/plain, Size: 9316 bytes --]


- switch itimers to a hrtimers-based implementation

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

 fs/exec.c             |    6 +-
 fs/proc/array.c       |    6 +-
 include/linux/sched.h |    5 +-
 include/linux/timer.h |    2 
 kernel/exit.c         |    2 
 kernel/fork.c         |    6 +-
 kernel/itimer.c       |  108 ++++++++++++++++++++++++--------------------------
 7 files changed, 66 insertions(+), 69 deletions(-)

Index: linux-2.6.15-rc5/fs/exec.c
===================================================================
--- linux-2.6.15-rc5.orig/fs/exec.c
+++ linux-2.6.15-rc5/fs/exec.c
@@ -632,10 +632,10 @@ static inline int de_thread(struct task_
 		 * synchronize with any firing (by calling del_timer_sync)
 		 * before we can safely let the old group leader die.
 		 */
-		sig->real_timer.data = (unsigned long)current;
+		sig->real_timer.data = current;
 		spin_unlock_irq(lock);
-		if (del_timer_sync(&sig->real_timer))
-			add_timer(&sig->real_timer);
+		if (hrtimer_cancel(&sig->real_timer))
+			hrtimer_restart(&sig->real_timer);
 		spin_lock_irq(lock);
 	}
 	while (atomic_read(&sig->count) > count) {
Index: linux-2.6.15-rc5/fs/proc/array.c
===================================================================
--- linux-2.6.15-rc5.orig/fs/proc/array.c
+++ linux-2.6.15-rc5/fs/proc/array.c
@@ -330,7 +330,7 @@ static int do_task_stat(struct task_stru
 	unsigned long  min_flt = 0,  maj_flt = 0;
 	cputime_t cutime, cstime, utime, stime;
 	unsigned long rsslim = 0;
-	unsigned long it_real_value = 0;
+	DEFINE_KTIME(it_real_value);
 	struct task_struct *t;
 	char tcomm[sizeof(task->comm)];
 
@@ -386,7 +386,7 @@ static int do_task_stat(struct task_stru
 			utime = cputime_add(utime, task->signal->utime);
 			stime = cputime_add(stime, task->signal->stime);
 		}
-		it_real_value = task->signal->it_real_value;
+		it_real_value = task->signal->real_timer.expires;
 	}
 	ppid = pid_alive(task) ? task->group_leader->real_parent->tgid : 0;
 	read_unlock(&tasklist_lock);
@@ -435,7 +435,7 @@ static int do_task_stat(struct task_stru
 		priority,
 		nice,
 		num_threads,
-		jiffies_to_clock_t(it_real_value),
+		(long) ktime_to_clock_t(it_real_value),
 		start_time,
 		vsize,
 		mm ? get_mm_rss(mm) : 0,
Index: linux-2.6.15-rc5/include/linux/sched.h
===================================================================
--- linux-2.6.15-rc5.orig/include/linux/sched.h
+++ linux-2.6.15-rc5/include/linux/sched.h
@@ -104,6 +104,7 @@ extern unsigned long nr_iowait(void);
 #include <linux/param.h>
 #include <linux/resource.h>
 #include <linux/timer.h>
+#include <linux/hrtimer.h>
 
 #include <asm/processor.h>
 
@@ -402,8 +403,8 @@ struct signal_struct {
 	struct list_head posix_timers;
 
 	/* ITIMER_REAL timer for the process */
-	struct timer_list real_timer;
-	unsigned long it_real_value, it_real_incr;
+	struct hrtimer real_timer;
+	ktime_t it_real_incr;
 
 	/* ITIMER_PROF and ITIMER_VIRTUAL timers for the process */
 	cputime_t it_prof_expires, it_virt_expires;
Index: linux-2.6.15-rc5/include/linux/timer.h
===================================================================
--- linux-2.6.15-rc5.orig/include/linux/timer.h
+++ linux-2.6.15-rc5/include/linux/timer.h
@@ -96,6 +96,6 @@ static inline void add_timer(struct time
 
 extern void init_timers(void);
 extern void run_local_timers(void);
-extern void it_real_fn(unsigned long);
+extern int it_real_fn(void *);
 
 #endif
Index: linux-2.6.15-rc5/kernel/exit.c
===================================================================
--- linux-2.6.15-rc5.orig/kernel/exit.c
+++ linux-2.6.15-rc5/kernel/exit.c
@@ -842,7 +842,7 @@ fastcall NORET_TYPE void do_exit(long co
 	}
 	group_dead = atomic_dec_and_test(&tsk->signal->live);
 	if (group_dead) {
- 		del_timer_sync(&tsk->signal->real_timer);
+ 		hrtimer_cancel(&tsk->signal->real_timer);
 		exit_itimers(tsk->signal);
 		acct_process(code);
 	}
Index: linux-2.6.15-rc5/kernel/fork.c
===================================================================
--- linux-2.6.15-rc5.orig/kernel/fork.c
+++ linux-2.6.15-rc5/kernel/fork.c
@@ -793,10 +793,10 @@ static inline int copy_signal(unsigned l
 	init_sigpending(&sig->shared_pending);
 	INIT_LIST_HEAD(&sig->posix_timers);
 
-	sig->it_real_value = sig->it_real_incr = 0;
+	hrtimer_init(&sig->real_timer, CLOCK_MONOTONIC);
+	sig->it_real_incr.tv64 = 0;
 	sig->real_timer.function = it_real_fn;
-	sig->real_timer.data = (unsigned long) tsk;
-	init_timer(&sig->real_timer);
+	sig->real_timer.data = tsk;
 
 	sig->it_virt_expires = cputime_zero;
 	sig->it_virt_incr = cputime_zero;
Index: linux-2.6.15-rc5/kernel/itimer.c
===================================================================
--- linux-2.6.15-rc5.orig/kernel/itimer.c
+++ linux-2.6.15-rc5/kernel/itimer.c
@@ -12,36 +12,46 @@
 #include <linux/syscalls.h>
 #include <linux/time.h>
 #include <linux/posix-timers.h>
+#include <linux/hrtimer.h>
 
 #include <asm/uaccess.h>
 
-static unsigned long it_real_value(struct signal_struct *sig)
+/**
+ * itimer_get_remtime - get remaining time for the timer
+ *
+ * @timer: the timer to read
+ *
+ * Returns the delta between the expiry time and now, clamped to 1usec
+ * for a pending timer whose expiry has passed and 0 for an inactive one
+ */
+static struct timeval itimer_get_remtime(struct hrtimer *timer)
 {
-	unsigned long val = 0;
-	if (timer_pending(&sig->real_timer)) {
-		val = sig->real_timer.expires - jiffies;
-
-		/* look out for negative/zero itimer.. */
-		if ((long) val <= 0)
-			val = 1;
-	}
-	return val;
+	ktime_t rem = hrtimer_get_remaining(timer);
+
+	/*
+	 * Racy but safe: if the itimer expires after the above
+	 * hrtimer_get_remaining() call but before this condition
+	 * then we return 0 - which is correct.
+	 */
+	if (hrtimer_active(timer)) {
+		if (rem.tv64 <= 0)
+			rem.tv64 = NSEC_PER_USEC;
+	} else
+		rem.tv64 = 0;
+
+	return ktime_to_timeval(rem);
 }
 
 int do_getitimer(int which, struct itimerval *value)
 {
 	struct task_struct *tsk = current;
-	unsigned long interval, val;
 	cputime_t cinterval, cval;
 
 	switch (which) {
 	case ITIMER_REAL:
-		spin_lock_irq(&tsk->sighand->siglock);
-		interval = tsk->signal->it_real_incr;
-		val = it_real_value(tsk->signal);
-		spin_unlock_irq(&tsk->sighand->siglock);
-		jiffies_to_timeval(val, &value->it_value);
-		jiffies_to_timeval(interval, &value->it_interval);
+		value->it_value = itimer_get_remtime(&tsk->signal->real_timer);
+		value->it_interval =
+			ktime_to_timeval(tsk->signal->it_real_incr);
 		break;
 	case ITIMER_VIRTUAL:
 		read_lock(&tasklist_lock);
@@ -113,59 +123,45 @@ asmlinkage long sys_getitimer(int which,
 }
 
 
-void it_real_fn(unsigned long __data)
+/*
+ * The timer is automagically restarted, when interval != 0
+ */
+int it_real_fn(void *data)
 {
-	struct task_struct * p = (struct task_struct *) __data;
-	unsigned long inc = p->signal->it_real_incr;
+	struct task_struct *tsk = (struct task_struct *) data;
 
-	send_group_sig_info(SIGALRM, SEND_SIG_PRIV, p);
+	send_group_sig_info(SIGALRM, SEND_SIG_PRIV, tsk);
 
-	/*
-	 * Now restart the timer if necessary.  We don't need any locking
-	 * here because do_setitimer makes sure we have finished running
-	 * before it touches anything.
-	 * Note, we KNOW we are (or should be) at a jiffie edge here so
-	 * we don't need the +1 stuff.  Also, we want to use the prior
-	 * expire value so as to not "slip" a jiffie if we are late.
-	 * Deal with requesting a time prior to "now" here rather than
-	 * in add_timer.
-	 */
-	if (!inc)
-		return;
-	while (time_before_eq(p->signal->real_timer.expires, jiffies))
-		p->signal->real_timer.expires += inc;
-	add_timer(&p->signal->real_timer);
+	if (tsk->signal->it_real_incr.tv64 != 0) {
+		hrtimer_forward(&tsk->signal->real_timer,
+			       tsk->signal->it_real_incr);
+
+		return HRTIMER_RESTART;
+	}
+	return HRTIMER_NORESTART;
 }
 
 int do_setitimer(int which, struct itimerval *value, struct itimerval *ovalue)
 {
 	struct task_struct *tsk = current;
- 	unsigned long val, interval, expires;
+	struct hrtimer *timer;
+	ktime_t expires;
 	cputime_t cval, cinterval, nval, ninterval;
 
 	switch (which) {
 	case ITIMER_REAL:
-again:
-		spin_lock_irq(&tsk->sighand->siglock);
-		interval = tsk->signal->it_real_incr;
-		val = it_real_value(tsk->signal);
-		/* We are sharing ->siglock with it_real_fn() */
-		if (try_to_del_timer_sync(&tsk->signal->real_timer) < 0) {
-			spin_unlock_irq(&tsk->sighand->siglock);
-			goto again;
-		}
-		tsk->signal->it_real_incr =
-			timeval_to_jiffies(&value->it_interval);
-		expires = timeval_to_jiffies(&value->it_value);
-		if (expires)
-			mod_timer(&tsk->signal->real_timer,
-				  jiffies + 1 + expires);
-		spin_unlock_irq(&tsk->sighand->siglock);
+		timer = &tsk->signal->real_timer;
+		hrtimer_cancel(timer);
 		if (ovalue) {
-			jiffies_to_timeval(val, &ovalue->it_value);
-			jiffies_to_timeval(interval,
-					   &ovalue->it_interval);
+			ovalue->it_value = itimer_get_remtime(timer);
+			ovalue->it_interval
+				= ktime_to_timeval(tsk->signal->it_real_incr);
 		}
+		tsk->signal->it_real_incr =
+			timeval_to_ktime(value->it_interval);
+		expires = timeval_to_ktime(value->it_value);
+		if (expires.tv64 != 0)
+			hrtimer_start(timer, expires, HRTIMER_REL);
 		break;
 	case ITIMER_VIRTUAL:
 		nval = timeval_to_cputime(&value->it_value);

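The removed it_real_fn() re-armed a periodic itimer by adding the interval
to the old expiry until it lay after the current time. The following
user-space sketch computes the same result without a loop, assuming all
values are nanosecond counts. sketch_forward() is a hypothetical name and
is not claimed to be how hrtimer_forward() is implemented. It only mirrors
the semantics of the removed loop and returns how many intervals were
added.

```c
#include <stdint.h>

/*
 * Advance *expires by whole multiples of interval until it lies strictly
 * after now. Returns the number of intervals added, or 0 if the timer has
 * not expired yet (or the interval is not positive).
 */
static uint64_t sketch_forward(int64_t *expires, int64_t now, int64_t interval)
{
	int64_t late;
	uint64_t periods;

	if (interval <= 0 || now < *expires)
		return 0;

	/* How far behind we are, in whole intervals, plus one to pass now. */
	late = now - *expires;
	periods = (uint64_t)(late / interval) + 1;
	*expires += (int64_t)periods * interval;
	return periods;
}
```

Keeping the new expiry on the original interval grid, rather than at
"now + interval", preserves the property noted in the removed comment: a
late callback does not make the timer slip.
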
--


^ permalink raw reply	[flat|nested] 74+ messages in thread

* [patch 18/21] Create hrtimer nanosleep API
  2005-12-06  0:01 [patch 00/21] hrtimer - High-resolution timer subsystem tglx
                   ` (16 preceding siblings ...)
  2005-12-06  0:01 ` [patch 17/21] Switch itimers to hrtimer tglx
@ 2005-12-06  0:01 ` tglx
  2005-12-06  0:01 ` [patch 19/21] Switch sys_nanosleep to hrtimer tglx
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 74+ messages in thread
From: tglx @ 2005-12-06  0:01 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm, rostedt, johnstul, zippel, mingo

[-- Attachment #1: hrtimer-nanosleep-interface.patch --]
[-- Type: text/plain, Size: 4700 bytes --]


- introduce the hrtimer_nanosleep() API.
  Not yet used by any code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

 include/linux/hrtimer.h |    6 ++
 kernel/hrtimer.c        |  127 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 133 insertions(+)

Index: linux-2.6.15-rc5/include/linux/hrtimer.h
===================================================================
--- linux-2.6.15-rc5.orig/include/linux/hrtimer.h
+++ linux-2.6.15-rc5/include/linux/hrtimer.h
@@ -121,6 +121,12 @@ static inline int hrtimer_active(const s
 extern unsigned long hrtimer_forward(struct hrtimer *timer,
 				     const ktime_t interval);
 
+/* Precise sleep: */
+extern long hrtimer_nanosleep(struct timespec *rqtp,
+			      struct timespec __user *rmtp,
+			      const enum hrtimer_mode mode,
+			      const clockid_t clockid);
+
 /* Soft interrupt function to run the hrtimer queues: */
 extern void hrtimer_run_queues(void);
 
Index: linux-2.6.15-rc5/kernel/hrtimer.c
===================================================================
--- linux-2.6.15-rc5.orig/kernel/hrtimer.c
+++ linux-2.6.15-rc5/kernel/hrtimer.c
@@ -581,6 +581,133 @@ void hrtimer_run_queues(void)
 }
 
 /*
+ * Sleep related functions:
+ */
+
+/**
+ * schedule_hrtimer - sleep until timeout
+ *
+ * @timer:	hrtimer variable initialized with the correct clock base
+ * @mode:	timeout value is abs/rel
+ *
+ * Make the current task sleep until the expiry time set in @timer
+ * has passed.
+ *
+ * You can set the task state as follows -
+ *
+ * %TASK_UNINTERRUPTIBLE - the expiry time of @timer is guaranteed to
+ * pass before the routine returns. The routine will return 0
+ *
+ * %TASK_INTERRUPTIBLE - the routine may return early if a signal is
+ * delivered to the current task. In this case the remaining time
+ * will be returned
+ *
+ * The current task state is guaranteed to be TASK_RUNNING when this
+ * routine returns.
+ */
+static ktime_t __sched
+schedule_hrtimer(struct hrtimer *timer, const enum hrtimer_mode mode)
+{
+	/* fn stays NULL, meaning single-shot wakeup: */
+	timer->data = current;
+
+	hrtimer_start(timer, timer->expires, mode);
+
+	schedule();
+	hrtimer_cancel(timer);
+
+	/* Return the remaining time: */
+	if (timer->state != HRTIMER_EXPIRED)
+		return ktime_sub(timer->expires, timer->base->get_time());
+	else
+		return (ktime_t) {.tv64 = 0 };
+}
+
+static inline ktime_t __sched
+schedule_hrtimer_interruptible(struct hrtimer *timer,
+			       const enum hrtimer_mode mode)
+{
+	set_current_state(TASK_INTERRUPTIBLE);
+
+	return schedule_hrtimer(timer, mode);
+}
+
+static long __sched
+nanosleep_restart(struct restart_block *restart, clockid_t clockid)
+{
+	struct timespec __user *rmtp, tu;
+	void *rfn_save = restart->fn;
+	struct hrtimer timer;
+	ktime_t rem;
+
+	restart->fn = do_no_restart_syscall;
+
+	hrtimer_init(&timer, clockid);
+
+	timer.expires.tv64 = ((u64)restart->arg1 << 32) | (u64) restart->arg0;
+
+	rem = schedule_hrtimer_interruptible(&timer, HRTIMER_ABS);
+
+	if (rem.tv64 <= 0)
+		return 0;
+
+	rmtp = (struct timespec __user *) restart->arg2;
+	tu = ktime_to_timespec(rem);
+	if (rmtp && copy_to_user(rmtp, &tu, sizeof(tu)))
+		return -EFAULT;
+
+	restart->fn = rfn_save;
+
+	/* The other values in restart are already filled in */
+	return -ERESTART_RESTARTBLOCK;
+}
+
+static long __sched nanosleep_restart_mono(struct restart_block *restart)
+{
+	return nanosleep_restart(restart, CLOCK_MONOTONIC);
+}
+
+static long __sched nanosleep_restart_real(struct restart_block *restart)
+{
+	return nanosleep_restart(restart, CLOCK_REALTIME);
+}
+
+long hrtimer_nanosleep(struct timespec *rqtp, struct timespec __user *rmtp,
+		       const enum hrtimer_mode mode, const clockid_t clockid)
+{
+	struct restart_block *restart;
+	struct hrtimer timer;
+	struct timespec tu;
+	ktime_t rem;
+
+	hrtimer_init(&timer, clockid);
+
+	timer.expires = timespec_to_ktime(*rqtp);
+
+	rem = schedule_hrtimer_interruptible(&timer, mode);
+	if (rem.tv64 <= 0)
+		return 0;
+
+	/* Absolute timers do not update the rmtp value: */
+	if (mode == HRTIMER_ABS)
+		return -ERESTARTNOHAND;
+
+	tu = ktime_to_timespec(rem);
+
+	if (rmtp && copy_to_user(rmtp, &tu, sizeof(tu)))
+		return -EFAULT;
+
+	restart = &current_thread_info()->restart_block;
+	restart->fn = (clockid == CLOCK_MONOTONIC) ?
+		nanosleep_restart_mono : nanosleep_restart_real;
+	restart->arg0 = timer.expires.tv64 & 0xFFFFFFFF;
+	restart->arg1 = timer.expires.tv64 >> 32;
+	restart->arg2 = (unsigned long) rmtp;
+
+	return -ERESTART_RESTARTBLOCK;
+}
+
+/*
  * Functions related to boot-time initialization:
  */
 static void __devinit init_hrtimers_cpu(int cpu)

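hrtimer_nanosleep() saves the 64-bit absolute expiry in two restart_block
arguments and nanosleep_restart() joins them again. A user-space sketch of
that round trip follows, assuming each argument is only 32 bits wide, as
unsigned long is on 32-bit platforms. The type sketch_arg_t and both
function names are hypothetical.

```c
#include <stdint.h>

/* Models a restart_block argument on a 32-bit platform. */
typedef uint32_t sketch_arg_t;

/* Split a 64-bit expiry into low and high 32-bit halves. */
static void sketch_split_expiry(int64_t expires,
				sketch_arg_t *lo, sketch_arg_t *hi)
{
	uint64_t raw = (uint64_t)expires;

	*lo = (sketch_arg_t)(raw & 0xFFFFFFFFu);
	*hi = (sketch_arg_t)(raw >> 32);
}

/* Rebuild the 64-bit expiry; widen before shifting to keep the high half. */
static int64_t sketch_join_expiry(sketch_arg_t lo, sketch_arg_t hi)
{
	return (int64_t)(((uint64_t)hi << 32) | (uint64_t)lo);
}
```

The casts to a 64-bit type before the shift matter: shifting a 32-bit
value left by 32 would discard the high half or be undefined.
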
--


^ permalink raw reply	[flat|nested] 74+ messages in thread

* [patch 19/21] Switch sys_nanosleep to hrtimer
  2005-12-06  0:01 [patch 00/21] hrtimer - High-resolution timer subsystem tglx
                   ` (17 preceding siblings ...)
  2005-12-06  0:01 ` [patch 18/21] Create hrtimer nanosleep API tglx
@ 2005-12-06  0:01 ` tglx
  2005-12-06  0:01 ` [patch 20/21] Switch clock_nanosleep to hrtimer nanosleep API tglx
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 74+ messages in thread
From: tglx @ 2005-12-06  0:01 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm, rostedt, johnstul, zippel, mingo

[-- Attachment #1: hrtimer-convert-sys-nanosleep.patch --]
[-- Type: text/plain, Size: 2754 bytes --]


- convert sys_nanosleep() to use hrtimer_nanosleep()

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

 kernel/hrtimer.c |   14 +++++++++++++
 kernel/timer.c   |   56 -------------------------------------------------------
 2 files changed, 14 insertions(+), 56 deletions(-)

Index: linux-2.6.15-rc5/kernel/timer.c
===================================================================
--- linux-2.6.15-rc5.orig/kernel/timer.c
+++ linux-2.6.15-rc5/kernel/timer.c
@@ -1119,62 +1119,6 @@ asmlinkage long sys_gettid(void)
 	return current->pid;
 }
 
-static long __sched nanosleep_restart(struct restart_block *restart)
-{
-	unsigned long expire = restart->arg0, now = jiffies;
-	struct timespec __user *rmtp = (struct timespec __user *) restart->arg1;
-	long ret;
-
-	/* Did it expire while we handled signals? */
-	if (!time_after(expire, now))
-		return 0;
-
-	expire = schedule_timeout_interruptible(expire - now);
-
-	ret = 0;
-	if (expire) {
-		struct timespec t;
-		jiffies_to_timespec(expire, &t);
-
-		ret = -ERESTART_RESTARTBLOCK;
-		if (rmtp && copy_to_user(rmtp, &t, sizeof(t)))
-			ret = -EFAULT;
-		/* The 'restart' block is already filled in */
-	}
-	return ret;
-}
-
-asmlinkage long sys_nanosleep(struct timespec __user *rqtp, struct timespec __user *rmtp)
-{
-	struct timespec t;
-	unsigned long expire;
-	long ret;
-
-	if (copy_from_user(&t, rqtp, sizeof(t)))
-		return -EFAULT;
-
-	if ((t.tv_nsec >= 1000000000L) || (t.tv_nsec < 0) || (t.tv_sec < 0))
-		return -EINVAL;
-
-	expire = timespec_to_jiffies(&t) + (t.tv_sec || t.tv_nsec);
-	expire = schedule_timeout_interruptible(expire);
-
-	ret = 0;
-	if (expire) {
-		struct restart_block *restart;
-		jiffies_to_timespec(expire, &t);
-		if (rmtp && copy_to_user(rmtp, &t, sizeof(t)))
-			return -EFAULT;
-
-		restart = &current_thread_info()->restart_block;
-		restart->fn = nanosleep_restart;
-		restart->arg0 = jiffies + expire;
-		restart->arg1 = (unsigned long) rmtp;
-		ret = -ERESTART_RESTARTBLOCK;
-	}
-	return ret;
-}
-
 /*
  * sys_sysinfo - fill in sysinfo struct
  */ 
Index: linux-2.6.15-rc5/kernel/hrtimer.c
===================================================================
--- linux-2.6.15-rc5.orig/kernel/hrtimer.c
+++ linux-2.6.15-rc5/kernel/hrtimer.c
@@ -707,6 +707,20 @@ long hrtimer_nanosleep(struct timespec *
 	return -ERESTART_RESTARTBLOCK;
 }
 
+asmlinkage long
+sys_nanosleep(struct timespec __user *rqtp, struct timespec __user *rmtp)
+{
+	struct timespec tu;
+
+	if (copy_from_user(&tu, rqtp, sizeof(tu)))
+		return -EFAULT;
+
+	if (!timespec_valid(&tu))
+		return -EINVAL;
+
+	return hrtimer_nanosleep(&tu, rmtp, HRTIMER_REL, CLOCK_MONOTONIC);
+}
+
 /*
  * Functions related to boot-time initialization:
  */

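The removed sys_nanosleep() open-coded its range check on the requested
time, while the new one calls timespec_valid(). A user-space sketch of that
check follows, assuming the helper rejects the same inputs as the removed
lines did. sketch_timespec_valid() is a hypothetical name.

```c
#include <stdbool.h>
#include <time.h>

#define SKETCH_NSEC_PER_SEC 1000000000L

/* Accept only non-negative seconds and 0 <= tv_nsec < one second. */
static bool sketch_timespec_valid(const struct timespec *ts)
{
	if (ts->tv_sec < 0)
		return false;
	if (ts->tv_nsec < 0 || ts->tv_nsec >= SKETCH_NSEC_PER_SEC)
		return false;
	return true;
}
```
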
--


^ permalink raw reply	[flat|nested] 74+ messages in thread

* [patch 20/21] Switch clock_nanosleep to hrtimer nanosleep API
  2005-12-06  0:01 [patch 00/21] hrtimer - High-resolution timer subsystem tglx
                   ` (18 preceding siblings ...)
  2005-12-06  0:01 ` [patch 19/21] Switch sys_nanosleep to hrtimer tglx
@ 2005-12-06  0:01 ` tglx
  2005-12-06  0:01 ` [patch 21/21] Convert posix timers completely tglx
  2005-12-06 17:32 ` [patch 00/21] hrtimer - High-resolution timer subsystem Roman Zippel
  21 siblings, 0 replies; 74+ messages in thread
From: tglx @ 2005-12-06  0:01 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm, rostedt, johnstul, zippel, mingo

[-- Attachment #1: hrtimer-convert-posix-clock-nanosleep.patch --]
[-- Type: text/plain, Size: 10532 bytes --]


- Switch clock_nanosleep to use the new nanosleep functions
  in hrtimer.c

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

 include/linux/posix-timers.h |    7 +
 kernel/posix-cpu-timers.c    |   23 +++---
 kernel/posix-timers.c        |  151 +++++++------------------------------------
 3 files changed, 45 insertions(+), 136 deletions(-)

Index: linux-2.6.15-rc5/include/linux/posix-timers.h
===================================================================
--- linux-2.6.15-rc5.orig/include/linux/posix-timers.h
+++ linux-2.6.15-rc5/include/linux/posix-timers.h
@@ -81,7 +81,7 @@ struct k_clock {
 	int (*clock_get) (const clockid_t which_clock, struct timespec * tp);
 	int (*timer_create) (struct k_itimer *timer);
 	int (*nsleep) (const clockid_t which_clock, int flags,
-		       struct timespec *);
+		       struct timespec *, struct timespec __user *);
 	int (*timer_set) (struct k_itimer * timr, int flags,
 			  struct itimerspec * new_setting,
 			  struct itimerspec * old_setting);
@@ -95,7 +95,8 @@ void register_posix_clock(const clockid_
 
 /* error handlers for timer_create, nanosleep and settime */
 int do_posix_clock_notimer_create(struct k_itimer *timer);
-int do_posix_clock_nonanosleep(const clockid_t, int flags, struct timespec *);
+int do_posix_clock_nonanosleep(const clockid_t, int flags, struct timespec *,
+			       struct timespec __user *);
 int do_posix_clock_nosettime(const clockid_t, struct timespec *tp);
 
 /* function to call to trigger timer event */
@@ -129,7 +130,7 @@ int posix_cpu_clock_get(const clockid_t 
 int posix_cpu_clock_set(const clockid_t which_clock, const struct timespec *ts);
 int posix_cpu_timer_create(struct k_itimer *timer);
 int posix_cpu_nsleep(const clockid_t which_clock, int flags,
-		     struct timespec *ts);
+		     struct timespec *rqtp, struct timespec __user *rmtp);
 int posix_cpu_timer_set(struct k_itimer *timer, int flags,
 			struct itimerspec *new, struct itimerspec *old);
 int posix_cpu_timer_del(struct k_itimer *timer);
Index: linux-2.6.15-rc5/kernel/posix-cpu-timers.c
===================================================================
--- linux-2.6.15-rc5.orig/kernel/posix-cpu-timers.c
+++ linux-2.6.15-rc5/kernel/posix-cpu-timers.c
@@ -1411,7 +1411,7 @@ void set_process_cpu_timer(struct task_s
 static long posix_cpu_clock_nanosleep_restart(struct restart_block *);
 
 int posix_cpu_nsleep(const clockid_t which_clock, int flags,
-		     struct timespec *rqtp)
+		     struct timespec *rqtp, struct timespec __user *rmtp)
 {
 	struct restart_block *restart_block =
 	    &current_thread_info()->restart_block;
@@ -1436,7 +1436,6 @@ int posix_cpu_nsleep(const clockid_t whi
 	error = posix_cpu_timer_create(&timer);
 	timer.it_process = current;
 	if (!error) {
-		struct timespec __user *rmtp;
 		static struct itimerspec zero_it;
 		struct itimerspec it = { .it_value = *rqtp,
 					 .it_interval = {} };
@@ -1483,7 +1482,6 @@ int posix_cpu_nsleep(const clockid_t whi
 		/*
 		 * Report back to the user the time still remaining.
 		 */
-		rmtp = (struct timespec __user *) restart_block->arg1;
 		if (rmtp != NULL && !(flags & TIMER_ABSTIME) &&
 		    copy_to_user(rmtp, &it.it_value, sizeof *rmtp))
 			return -EFAULT;
@@ -1491,6 +1489,7 @@ int posix_cpu_nsleep(const clockid_t whi
 		restart_block->fn = posix_cpu_clock_nanosleep_restart;
 		/* Caller already set restart_block->arg1 */
 		restart_block->arg0 = which_clock;
+		restart_block->arg1 = (unsigned long) rmtp;
 		restart_block->arg2 = rqtp->tv_sec;
 		restart_block->arg3 = rqtp->tv_nsec;
 
@@ -1504,10 +1503,15 @@ static long
 posix_cpu_clock_nanosleep_restart(struct restart_block *restart_block)
 {
 	clockid_t which_clock = restart_block->arg0;
-	struct timespec t = { .tv_sec = restart_block->arg2,
-			      .tv_nsec = restart_block->arg3 };
+	struct timespec __user *rmtp;
+	struct timespec t;
+
+	rmtp = (struct timespec __user *) restart_block->arg1;
+	t.tv_sec = restart_block->arg2;
+	t.tv_nsec = restart_block->arg3;
+
 	restart_block->fn = do_no_restart_syscall;
-	return posix_cpu_nsleep(which_clock, TIMER_ABSTIME, &t);
+	return posix_cpu_nsleep(which_clock, TIMER_ABSTIME, &t, rmtp);
 }
 
 
@@ -1530,9 +1534,10 @@ static int process_cpu_timer_create(stru
 	return posix_cpu_timer_create(timer);
 }
 static int process_cpu_nsleep(const clockid_t which_clock, int flags,
-			      struct timespec *rqtp)
+			      struct timespec *rqtp,
+			      struct timespec __user *rmtp)
 {
-	return posix_cpu_nsleep(PROCESS_CLOCK, flags, rqtp);
+	return posix_cpu_nsleep(PROCESS_CLOCK, flags, rqtp, rmtp);
 }
 static int thread_cpu_clock_getres(const clockid_t which_clock,
 				   struct timespec *tp)
@@ -1550,7 +1555,7 @@ static int thread_cpu_timer_create(struc
 	return posix_cpu_timer_create(timer);
 }
 static int thread_cpu_nsleep(const clockid_t which_clock, int flags,
-			      struct timespec *rqtp)
+			      struct timespec *rqtp, struct timespec __user *rmtp)
 {
 	return -EINVAL;
 }
Index: linux-2.6.15-rc5/kernel/posix-timers.c
===================================================================
--- linux-2.6.15-rc5.orig/kernel/posix-timers.c
+++ linux-2.6.15-rc5/kernel/posix-timers.c
@@ -209,7 +209,8 @@ static inline int common_timer_create(st
 /*
  * These ones are defined below.
  */
-static int common_nsleep(const clockid_t, int flags, struct timespec *t);
+static int common_nsleep(const clockid_t, int flags, struct timespec *t,
+			 struct timespec __user *rmtp);
 static void common_timer_get(struct k_itimer *, struct itimerspec *);
 static int common_timer_set(struct k_itimer *, int,
 			    struct itimerspec *, struct itimerspec *);
@@ -1227,7 +1228,7 @@ int do_posix_clock_notimer_create(struct
 EXPORT_SYMBOL_GPL(do_posix_clock_notimer_create);
 
 int do_posix_clock_nonanosleep(const clockid_t clock, int flags,
-			       struct timespec *t)
+			       struct timespec *t, struct timespec __user *r)
 {
 #ifndef ENOTSUP
 	return -EOPNOTSUPP;	/* aka ENOTSUP in userland for POSIX */
@@ -1387,7 +1388,28 @@ void clock_was_set(void)
 	up(&clock_was_set_lock);
 }
 
-long clock_nanosleep_restart(struct restart_block *restart_block);
+/*
+ * nanosleep for monotonic and realtime clocks
+ */
+static int common_nsleep(const clockid_t which_clock, int flags,
+			 struct timespec *tsave, struct timespec __user *rmtp)
+{
+	int mode = flags & TIMER_ABSTIME ? HRTIMER_ABS : HRTIMER_REL;
+	int clockid = which_clock;
+
+	switch (which_clock) {
+	case CLOCK_REALTIME:
+		/* Posix madness. Only absolute timers on clock realtime
+		   are affected by clock set. */
+		if (mode != HRTIMER_ABS)
+			clockid = CLOCK_MONOTONIC;
+	case CLOCK_MONOTONIC:
+		break;
+	default:
+		return -EINVAL;
+	}
+	return hrtimer_nanosleep(tsave, rmtp, mode, clockid);
+}
 
 asmlinkage long
 sys_clock_nanosleep(const clockid_t which_clock, int flags,
@@ -1395,9 +1417,6 @@ sys_clock_nanosleep(const clockid_t whic
 		    struct timespec __user *rmtp)
 {
 	struct timespec t;
-	struct restart_block *restart_block =
-	    &(current_thread_info()->restart_block);
-	int ret;
 
 	if (invalid_clockid(which_clock))
 		return -EINVAL;
@@ -1408,122 +1427,6 @@ sys_clock_nanosleep(const clockid_t whic
 	if (!timespec_valid(&t))
 		return -EINVAL;
 
-	/*
-	 * Do this here as nsleep function does not have the real address.
-	 */
-	restart_block->arg1 = (unsigned long)rmtp;
-
-	ret = CLOCK_DISPATCH(which_clock, nsleep, (which_clock, flags, &t));
-
-	if ((ret == -ERESTART_RESTARTBLOCK) && rmtp &&
-					copy_to_user(rmtp, &t, sizeof (t)))
-		return -EFAULT;
-	return ret;
-}
-
-
-static int common_nsleep(const clockid_t which_clock,
-			 int flags, struct timespec *tsave)
-{
-	struct timespec t, dum;
-	DECLARE_WAITQUEUE(abs_wqueue, current);
-	u64 rq_time = (u64)0;
-	s64 left;
-	int abs;
-	struct restart_block *restart_block =
-	    &current_thread_info()->restart_block;
-
-	abs_wqueue.flags = 0;
-	abs = flags & TIMER_ABSTIME;
-
-	if (restart_block->fn == clock_nanosleep_restart) {
-		/*
-		 * Interrupted by a non-delivered signal, pick up remaining
-		 * time and continue.  Remaining time is in arg2 & 3.
-		 */
-		restart_block->fn = do_no_restart_syscall;
-
-		rq_time = restart_block->arg3;
-		rq_time = (rq_time << 32) + restart_block->arg2;
-		if (!rq_time)
-			return -EINTR;
-		left = rq_time - get_jiffies_64();
-		if (left <= (s64)0)
-			return 0;	/* Already passed */
-	}
-
-	if (abs && (posix_clocks[which_clock].clock_get !=
-			    posix_clocks[CLOCK_MONOTONIC].clock_get))
-		add_wait_queue(&nanosleep_abs_wqueue, &abs_wqueue);
-
-	do {
-		t = *tsave;
-		if (abs || !rq_time) {
-			adjust_abs_time(&posix_clocks[which_clock], &t, abs,
-					&rq_time, &dum);
-		}
-
-		left = rq_time - get_jiffies_64();
-		if (left >= (s64)MAX_JIFFY_OFFSET)
-			left = (s64)MAX_JIFFY_OFFSET;
-		if (left < (s64)0)
-			break;
-
-		schedule_timeout_interruptible(left);
-
-		left = rq_time - get_jiffies_64();
-	} while (left > (s64)0 && !test_thread_flag(TIF_SIGPENDING));
-
-	if (abs_wqueue.task_list.next)
-		finish_wait(&nanosleep_abs_wqueue, &abs_wqueue);
-
-	if (left > (s64)0) {
-
-		/*
-		 * Always restart abs calls from scratch to pick up any
-		 * clock shifting that happened while we are away.
-		 */
-		if (abs)
-			return -ERESTARTNOHAND;
-
-		left *= TICK_NSEC;
-		tsave->tv_sec = div_long_long_rem(left, 
-						  NSEC_PER_SEC, 
-						  &tsave->tv_nsec);
-		/*
-		 * Restart works by saving the time remaing in 
-		 * arg2 & 3 (it is 64-bits of jiffies).  The other
-		 * info we need is the clock_id (saved in arg0). 
-		 * The sys_call interface needs the users 
-		 * timespec return address which _it_ saves in arg1.
-		 * Since we have cast the nanosleep call to a clock_nanosleep
-		 * both can be restarted with the same code.
-		 */
-		restart_block->fn = clock_nanosleep_restart;
-		restart_block->arg0 = which_clock;
-		/*
-		 * Caller sets arg1
-		 */
-		restart_block->arg2 = rq_time & 0xffffffffLL;
-		restart_block->arg3 = rq_time >> 32;
-
-		return -ERESTART_RESTARTBLOCK;
-	}
-
-	return 0;
-}
-/*
- * This will restart clock_nanosleep.
- */
-long
-clock_nanosleep_restart(struct restart_block *restart_block)
-{
-	struct timespec t;
-	int ret = common_nsleep(restart_block->arg0, 0, &t);
-
-	if ((ret == -ERESTART_RESTARTBLOCK) && restart_block->arg1 &&
-	    copy_to_user((struct timespec __user *)(restart_block->arg1), &t,
-			 sizeof (t)))
-		return -EFAULT;
-	return ret;
+	return CLOCK_DISPATCH(which_clock, nsleep,
+			      (which_clock, flags, &t, rmtp));
 }

--



* [patch 21/21] Convert posix timers completely
  2005-12-06  0:01 [patch 00/21] hrtimer - High-resolution timer subsystem tglx
                   ` (19 preceding siblings ...)
  2005-12-06  0:01 ` [patch 20/21] Switch clock_nanosleep to hrtimer nanosleep API tglx
@ 2005-12-06  0:01 ` tglx
  2005-12-06 17:32 ` [patch 00/21] hrtimer - High-resolution timer subsystem Roman Zippel
  21 siblings, 0 replies; 74+ messages in thread
From: tglx @ 2005-12-06  0:01 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm, rostedt, johnstul, zippel, mingo

[-- Attachment #1: hrtimer-convert-posix-timers.patch --]
[-- Type: text/plain, Size: 33728 bytes --]


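- convert posix-timers.c from timer_list and jiffies to hrtimers and ktime_t
- remove the now obsolete abslist code (struct k_clock_abs, abs_list and
  the clock-set fixups in add_clockset_delta())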

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

 include/linux/hrtimer.h      |    7 
 include/linux/posix-timers.h |   37 --
 include/linux/time.h         |    3 
 kernel/posix-timers.c        |  713 ++++++++-----------------------------------
 4 files changed, 143 insertions(+), 617 deletions(-)

Index: linux-2.6.15-rc5/include/linux/posix-timers.h
===================================================================
--- linux-2.6.15-rc5.orig/include/linux/posix-timers.h
+++ linux-2.6.15-rc5/include/linux/posix-timers.h
@@ -51,12 +51,8 @@ struct k_itimer {
 	struct sigqueue *sigq;		/* signal queue entry. */
 	union {
 		struct {
-			struct timer_list timer;
-			/* clock abs_timer_list: */
-			struct list_head abs_timer_entry;
-			/* wall_to_monotonic used when set: */
-			struct timespec wall_to_prev;
-			unsigned long incr; /* interval in jiffies */
+			struct hrtimer timer;
+			ktime_t interval;
 		} real;
 		struct cpu_timer_list cpu;
 		struct {
@@ -68,15 +64,9 @@ struct k_itimer {
 	} it;
 };
 
-struct k_clock_abs {
-	struct list_head list;
-	spinlock_t lock;
-};
-
 struct k_clock {
 	int res;		/* in nanoseconds */
 	int (*clock_getres) (const clockid_t which_clock, struct timespec *tp);
-	struct k_clock_abs *abs_struct;
 	int (*clock_set) (const clockid_t which_clock, struct timespec * tp);
 	int (*clock_get) (const clockid_t which_clock, struct timespec * tp);
 	int (*timer_create) (struct k_itimer *timer);
@@ -102,29 +92,6 @@ int do_posix_clock_nosettime(const clock
 /* function to call to trigger timer event */
 int posix_timer_event(struct k_itimer *timr, int si_private);
 
-struct now_struct {
-	unsigned long jiffies;
-};
-
-#define posix_get_now(now) \
-	do { (now)->jiffies = jiffies; } while (0)
-
-#define posix_time_before(timer, now) \
-                      time_before((timer)->expires, (now)->jiffies)
-
-#define posix_bump_timer(timr, now)					\
-	do {								\
-		long delta, orun;					\
-									\
-		delta = (now).jiffies - (timr)->it.real.timer.expires;	\
-		if (delta >= 0) {					\
-			orun = 1 + (delta / (timr)->it.real.incr);	\
-			(timr)->it.real.timer.expires +=		\
-				orun * (timr)->it.real.incr;		\
-			(timr)->it_overrun += orun;			\
-		}							\
-	} while (0)
-
 int posix_cpu_clock_getres(const clockid_t which_clock, struct timespec *ts);
 int posix_cpu_clock_get(const clockid_t which_clock, struct timespec *ts);
 int posix_cpu_clock_set(const clockid_t which_clock, const struct timespec *ts);
Index: linux-2.6.15-rc5/include/linux/time.h
===================================================================
--- linux-2.6.15-rc5.orig/include/linux/time.h
+++ linux-2.6.15-rc5/include/linux/time.h
@@ -73,8 +73,7 @@ struct timespec current_kernel_time(void
 extern void do_gettimeofday(struct timeval *tv);
 extern int do_settimeofday(struct timespec *tv);
 extern int do_sys_settimeofday(struct timespec *tv, struct timezone *tz);
-extern void clock_was_set(void); // call whenever the clock is set
-extern int do_posix_clock_monotonic_gettime(struct timespec *tp);
+#define do_posix_clock_monotonic_gettime(ts) ktime_get_ts(ts)
 extern long do_utimes(char __user *filename, struct timeval *times);
 struct itimerval;
 extern int do_setitimer(int which, struct itimerval *value,
Index: linux-2.6.15-rc5/kernel/posix-timers.c
===================================================================
--- linux-2.6.15-rc5.orig/kernel/posix-timers.c
+++ linux-2.6.15-rc5/kernel/posix-timers.c
@@ -35,7 +35,6 @@
 #include <linux/interrupt.h>
 #include <linux/slab.h>
 #include <linux/time.h>
-#include <linux/calc64.h>
 
 #include <asm/uaccess.h>
 #include <asm/semaphore.h>
@@ -49,12 +48,6 @@
 #include <linux/workqueue.h>
 #include <linux/module.h>
 
-#define CLOCK_REALTIME_RES TICK_NSEC  /* In nano seconds. */
-
-static inline u64  mpy_l_X_l_ll(unsigned long mpy1,unsigned long mpy2)
-{
-	return (u64)mpy1 * mpy2;
-}
 /*
  * Management arrays for POSIX timers.	 Timers are kept in slab memory
  * Timer ids are allocated by an external routine that keeps track of the
@@ -140,18 +133,18 @@ static DEFINE_SPINLOCK(idr_lock);
  */
 
 static struct k_clock posix_clocks[MAX_CLOCKS];
+
 /*
- * We only have one real clock that can be set so we need only one abs list,
- * even if we should want to have several clocks with differing resolutions.
+ * These ones are defined below.
  */
-static struct k_clock_abs abs_list = {.list = LIST_HEAD_INIT(abs_list.list),
-				      .lock = SPIN_LOCK_UNLOCKED};
+static int common_nsleep(const clockid_t, int flags, struct timespec *t,
+			 struct timespec __user *rmtp);
+static void common_timer_get(struct k_itimer *, struct itimerspec *);
+static int common_timer_set(struct k_itimer *, int,
+			    struct itimerspec *, struct itimerspec *);
+static int common_timer_del(struct k_itimer *timer);
 
-static void posix_timer_fn(unsigned long);
-static u64 do_posix_clock_monotonic_gettime_parts(
-	struct timespec *tp, struct timespec *mo);
-int do_posix_clock_monotonic_gettime(struct timespec *tp);
-static int do_posix_clock_monotonic_get(const clockid_t, struct timespec *tp);
+static int posix_timer_fn(void *data);
 
 static struct k_itimer *lock_timer(timer_t timer_id, unsigned long *flags);
 
@@ -184,10 +177,12 @@ static inline int common_clock_getres(co
 	return 0;
 }
 
-static inline int common_clock_get(const clockid_t which_clock,
-				   struct timespec *tp)
+/*
+ * Get real time for posix timers
+ */
+static int common_clock_get(clockid_t which_clock, struct timespec *tp)
 {
-	getnstimeofday(tp);
+	ktime_get_real_ts(tp);
 	return 0;
 }
 
@@ -199,25 +194,14 @@ static inline int common_clock_set(const
 
 static inline int common_timer_create(struct k_itimer *new_timer)
 {
-	INIT_LIST_HEAD(&new_timer->it.real.abs_timer_entry);
-	init_timer(&new_timer->it.real.timer);
-	new_timer->it.real.timer.data = (unsigned long) new_timer;
+	hrtimer_init(&new_timer->it.real.timer, new_timer->it_clock);
+	new_timer->it.real.timer.data = new_timer;
 	new_timer->it.real.timer.function = posix_timer_fn;
 	return 0;
 }
 
 /*
- * These ones are defined below.
- */
-static int common_nsleep(const clockid_t, int flags, struct timespec *t,
-			 struct timespec __user *rmtp);
-static void common_timer_get(struct k_itimer *, struct itimerspec *);
-static int common_timer_set(struct k_itimer *, int,
-			    struct itimerspec *, struct itimerspec *);
-static int common_timer_del(struct k_itimer *timer);
-
-/*
- * Return nonzero iff we know a priori this clockid_t value is bogus.
+ * Return nonzero if we know a priori this clockid_t value is bogus.
  */
 static inline int invalid_clockid(const clockid_t which_clock)
 {
@@ -227,26 +211,32 @@ static inline int invalid_clockid(const 
 		return 1;
 	if (posix_clocks[which_clock].clock_getres != NULL)
 		return 0;
-#ifndef CLOCK_DISPATCH_DIRECT
 	if (posix_clocks[which_clock].res != 0)
 		return 0;
-#endif
 	return 1;
 }
 
+/*
+ * Get monotonic time for posix timers
+ */
+static int posix_ktime_get_ts(clockid_t which_clock, struct timespec *tp)
+{
+	ktime_get_ts(tp);
+	return 0;
+}
 
 /*
  * Initialize everything, well, just everything in Posix clocks/timers ;)
  */
 static __init int init_posix_timers(void)
 {
-	struct k_clock clock_realtime = {.res = CLOCK_REALTIME_RES,
-					 .abs_struct = &abs_list
+	struct k_clock clock_realtime = {
+		.clock_getres = hrtimer_get_res,
 	};
-	struct k_clock clock_monotonic = {.res = CLOCK_REALTIME_RES,
-		.abs_struct = NULL,
-		.clock_get = do_posix_clock_monotonic_get,
-		.clock_set = do_posix_clock_nosettime
+	struct k_clock clock_monotonic = {
+		.clock_getres = hrtimer_get_res,
+		.clock_get = posix_ktime_get_ts,
+		.clock_set = do_posix_clock_nosettime,
 	};
 
 	register_posix_clock(CLOCK_REALTIME, &clock_realtime);
@@ -260,117 +250,17 @@ static __init int init_posix_timers(void
 
 __initcall(init_posix_timers);
 
-static void tstojiffie(struct timespec *tp, int res, u64 *jiff)
-{
-	long sec = tp->tv_sec;
-	long nsec = tp->tv_nsec + res - 1;
-
-	if (nsec >= NSEC_PER_SEC) {
-		sec++;
-		nsec -= NSEC_PER_SEC;
-	}
-
-	/*
-	 * The scaling constants are defined in <linux/time.h>
-	 * The difference between there and here is that we do the
-	 * res rounding and compute a 64-bit result (well so does that
-	 * but it then throws away the high bits).
-  	 */
-	*jiff =  (mpy_l_X_l_ll(sec, SEC_CONVERSION) +
-		  (mpy_l_X_l_ll(nsec, NSEC_CONVERSION) >> 
-		   (NSEC_JIFFIE_SC - SEC_JIFFIE_SC))) >> SEC_JIFFIE_SC;
-}
-
-/*
- * This function adjusts the timer as needed as a result of the clock
- * being set.  It should only be called for absolute timers, and then
- * under the abs_list lock.  It computes the time difference and sets
- * the new jiffies value in the timer.  It also updates the timers
- * reference wall_to_monotonic value.  It is complicated by the fact
- * that tstojiffies() only handles positive times and it needs to work
- * with both positive and negative times.  Also, for negative offsets,
- * we need to defeat the res round up.
- *
- * Return is true if there is a new time, else false.
- */
-static long add_clockset_delta(struct k_itimer *timr,
-			       struct timespec *new_wall_to)
-{
-	struct timespec delta;
-	int sign = 0;
-	u64 exp;
-
-	set_normalized_timespec(&delta,
-				new_wall_to->tv_sec -
-				timr->it.real.wall_to_prev.tv_sec,
-				new_wall_to->tv_nsec -
-				timr->it.real.wall_to_prev.tv_nsec);
-	if (likely(!(delta.tv_sec | delta.tv_nsec)))
-		return 0;
-	if (delta.tv_sec < 0) {
-		set_normalized_timespec(&delta,
-					-delta.tv_sec,
-					1 - delta.tv_nsec -
-					posix_clocks[timr->it_clock].res);
-		sign++;
-	}
-	tstojiffie(&delta, posix_clocks[timr->it_clock].res, &exp);
-	timr->it.real.wall_to_prev = *new_wall_to;
-	timr->it.real.timer.expires += (sign ? -exp : exp);
-	return 1;
-}
-
-static void remove_from_abslist(struct k_itimer *timr)
-{
-	if (!list_empty(&timr->it.real.abs_timer_entry)) {
-		spin_lock(&abs_list.lock);
-		list_del_init(&timr->it.real.abs_timer_entry);
-		spin_unlock(&abs_list.lock);
-	}
-}
-
 static void schedule_next_timer(struct k_itimer *timr)
 {
-	struct timespec new_wall_to;
-	struct now_struct now;
-	unsigned long seq;
-
-	/*
-	 * Set up the timer for the next interval (if there is one).
-	 * Note: this code uses the abs_timer_lock to protect
-	 * it.real.wall_to_prev and must hold it until exp is set, not exactly
-	 * obvious...
-
-	 * This function is used for CLOCK_REALTIME* and
-	 * CLOCK_MONOTONIC* timers.  If we ever want to handle other
-	 * CLOCKs, the calling code (do_schedule_next_timer) would need
-	 * to pull the "clock" info from the timer and dispatch the
-	 * "other" CLOCKs "next timer" code (which, I suppose should
-	 * also be added to the k_clock structure).
-	 */
-	if (!timr->it.real.incr)
+	if (timr->it.real.interval.tv64 == 0)
 		return;
 
-	do {
-		seq = read_seqbegin(&xtime_lock);
-		new_wall_to =	wall_to_monotonic;
-		posix_get_now(&now);
-	} while (read_seqretry(&xtime_lock, seq));
-
-	if (!list_empty(&timr->it.real.abs_timer_entry)) {
-		spin_lock(&abs_list.lock);
-		add_clockset_delta(timr, &new_wall_to);
-
-		posix_bump_timer(timr, now);
-
-		spin_unlock(&abs_list.lock);
-	} else {
-		posix_bump_timer(timr, now);
-	}
+	timr->it_overrun += hrtimer_forward(&timr->it.real.timer,
+					    timr->it.real.interval);
 	timr->it_overrun_last = timr->it_overrun;
 	timr->it_overrun = -1;
 	++timr->it_requeue_pending;
-	add_timer(&timr->it.real.timer);
+	hrtimer_restart(&timr->it.real.timer);
 }
 
 /*
@@ -391,31 +281,23 @@ void do_schedule_next_timer(struct sigin
 
 	timr = lock_timer(info->si_tid, &flags);
 
-	if (!timr || timr->it_requeue_pending != info->si_sys_private)
-		goto exit;
+	if (timr && timr->it_requeue_pending == info->si_sys_private) {
+		if (timr->it_clock < 0)
+			posix_cpu_timer_schedule(timr);
+		else
+			schedule_next_timer(timr);
 
-	if (timr->it_clock < 0)	/* CPU clock */
-		posix_cpu_timer_schedule(timr);
-	else
-		schedule_next_timer(timr);
-	info->si_overrun = timr->it_overrun_last;
-exit:
-	if (timr)
-		unlock_timer(timr, flags);
+		info->si_overrun = timr->it_overrun_last;
+	}
+	if (timr)
+		unlock_timer(timr, flags);
 }
 
 int posix_timer_event(struct k_itimer *timr,int si_private)
 {
 	memset(&timr->sigq->info, 0, sizeof(siginfo_t));
 	timr->sigq->info.si_sys_private = si_private;
-	/*
-	 * Send signal to the process that owns this timer.
-
-	 * This code assumes that all the possible abs_lists share the
-	 * same lock (there is only one list at this time). If this is
-	 * not the case, the CLOCK info would need to be used to find
-	 * the proper abs list lock.
-	 */
+	/* Send signal to the process that owns this timer. */
 
 	timr->sigq->info.si_signo = timr->it_sigev_signo;
 	timr->sigq->info.si_errno = 0;
@@ -449,64 +331,35 @@ EXPORT_SYMBOL_GPL(posix_timer_event);
 
  * This code is for CLOCK_REALTIME* and CLOCK_MONOTONIC* timers.
  */
-static void posix_timer_fn(unsigned long __data)
+static int posix_timer_fn(void *data)
 {
-	struct k_itimer *timr = (struct k_itimer *) __data;
+	struct k_itimer *timr = data;
 	unsigned long flags;
-	unsigned long seq;
-	struct timespec delta, new_wall_to;
-	u64 exp = 0;
-	int do_notify = 1;
+	int si_private = 0;
+	int ret = HRTIMER_NORESTART;
 
 	spin_lock_irqsave(&timr->it_lock, flags);
-	if (!list_empty(&timr->it.real.abs_timer_entry)) {
-		spin_lock(&abs_list.lock);
-		do {
-			seq = read_seqbegin(&xtime_lock);
-			new_wall_to =	wall_to_monotonic;
-		} while (read_seqretry(&xtime_lock, seq));
-		set_normalized_timespec(&delta,
-					new_wall_to.tv_sec -
-					timr->it.real.wall_to_prev.tv_sec,
-					new_wall_to.tv_nsec -
-					timr->it.real.wall_to_prev.tv_nsec);
-		if (likely((delta.tv_sec | delta.tv_nsec ) == 0)) {
-			/* do nothing, timer is on time */
-		} else if (delta.tv_sec < 0) {
-			/* do nothing, timer is already late */
-		} else {
-			/* timer is early due to a clock set */
-			tstojiffie(&delta,
-				   posix_clocks[timr->it_clock].res,
-				   &exp);
-			timr->it.real.wall_to_prev = new_wall_to;
-			timr->it.real.timer.expires += exp;
-			add_timer(&timr->it.real.timer);
-			do_notify = 0;
-		}
-		spin_unlock(&abs_list.lock);
 
-	}
-	if (do_notify)  {
-		int si_private=0;
+	if (timr->it.real.interval.tv64 != 0)
+		si_private = ++timr->it_requeue_pending;
 
-		if (timr->it.real.incr)
-			si_private = ++timr->it_requeue_pending;
-		else {
-			remove_from_abslist(timr);
+	if (posix_timer_event(timr, si_private)) {
+		/*
+		 * The signal was not sent because it is ignored (SIG_IGN).
+		 * We will not get a callback to restart the timer, AND
+		 * it should be restarted.
+		 */
+		if (timr->it.real.interval.tv64 != 0) {
+			timr->it_overrun +=
+				hrtimer_forward(&timr->it.real.timer,
+						timr->it.real.interval);
+			ret = HRTIMER_RESTART;
 		}
-
-		if (posix_timer_event(timr, si_private))
-			/*
-			 * signal was not sent because of sig_ignor
-			 * we will not get a call back to restart it AND
-			 * it should be restarted.
-			 */
-			schedule_next_timer(timr);
 	}
-	unlock_timer(timr, flags); /* hold thru abs lock to keep irq off */
-}
 
+	unlock_timer(timr, flags);
+	return ret;
+}
 
 static inline struct task_struct * good_sigevent(sigevent_t * event)
 {
@@ -597,8 +450,7 @@ sys_timer_create(const clockid_t which_c
 		goto out;
 	}
 	spin_lock_irq(&idr_lock);
-	error = idr_get_new(&posix_timers_id,
-			    (void *) new_timer,
+	error = idr_get_new(&posix_timers_id, (void *) new_timer,
 			    &new_timer_id);
 	spin_unlock_irq(&idr_lock);
 	if (error == -EAGAIN)
@@ -699,26 +551,6 @@ out:
 }
 
 /*
- * good_timespec
- *
- * This function checks the elements of a timespec structure.
- *
- * Arguments:
- * ts	     : Pointer to the timespec structure to check
- *
- * Return value:
- * If a NULL pointer was passed in, or the tv_nsec field was less than 0
- * or greater than NSEC_PER_SEC, or the tv_sec field was less than 0,
- * this function returns 0. Otherwise it returns 1.
- */
-static int good_timespec(const struct timespec *ts)
-{
-	if ((!ts) || !timespec_valid(ts))
-		return 0;
-	return 1;
-}
-
-/*
  * Locking issues: We need to protect the result of the id look up until
  * we get the timer locked down so it is not deleted under us.  The
  * removal is done under the idr spinlock so we use that here to bridge
@@ -770,39 +602,39 @@ static struct k_itimer * lock_timer(time
 static void
 common_timer_get(struct k_itimer *timr, struct itimerspec *cur_setting)
 {
-	unsigned long expires;
-	struct now_struct now;
+	ktime_t remaining;
+	struct hrtimer *timer = &timr->it.real.timer;
 
-	do
-		expires = timr->it.real.timer.expires;
-	while ((volatile long) (timr->it.real.timer.expires) != expires);
-
-	posix_get_now(&now);
-
-	if (expires &&
-	    ((timr->it_sigev_notify & ~SIGEV_THREAD_ID) == SIGEV_NONE) &&
-	    !timr->it.real.incr &&
-	    posix_time_before(&timr->it.real.timer, &now))
-		timr->it.real.timer.expires = expires = 0;
-	if (expires) {
-		if (timr->it_requeue_pending & REQUEUE_PENDING ||
-		    (timr->it_sigev_notify & ~SIGEV_THREAD_ID) == SIGEV_NONE) {
-			posix_bump_timer(timr, now);
-			expires = timr->it.real.timer.expires;
-		}
-		else
-			if (!timer_pending(&timr->it.real.timer))
-				expires = 0;
-		if (expires)
-			expires -= now.jiffies;
-	}
-	jiffies_to_timespec(expires, &cur_setting->it_value);
-	jiffies_to_timespec(timr->it.real.incr, &cur_setting->it_interval);
+	memset(cur_setting, 0, sizeof(struct itimerspec));
+	remaining = hrtimer_get_remaining(timer);
 
-	if (cur_setting->it_value.tv_sec < 0) {
+	/* Time left ? or timer pending */
+	if (remaining.tv64 > 0 || hrtimer_active(timer))
+		goto calci;
+	/* interval timer ? */
+	if (timr->it.real.interval.tv64 == 0)
+		return;
+	/*
+	 * When a requeue is pending or this is a SIGEV_NONE timer
+	 * move the expiry time forward by intervals, so expiry is >
+	 * now.
+	 */
+	if (timr->it_requeue_pending & REQUEUE_PENDING ||
+	    (timr->it_sigev_notify & ~SIGEV_THREAD_ID) == SIGEV_NONE) {
+		timr->it_overrun +=
+			hrtimer_forward(timer, timr->it.real.interval);
+		remaining = hrtimer_get_remaining(timer);
+	}
+ calci:
+	/* interval timer ? */
+	if (timr->it.real.interval.tv64 != 0)
+		cur_setting->it_interval =
+			ktime_to_timespec(timr->it.real.interval);
+	/* Return 0 only when the timer is expired and not pending */
+	if (remaining.tv64 <= 0)
 		cur_setting->it_value.tv_nsec = 1;
-		cur_setting->it_value.tv_sec = 0;
-	}
+	else
+		cur_setting->it_value = ktime_to_timespec(remaining);
 }
 
 /* Get the time remaining on a POSIX.1b interval timer. */
@@ -826,6 +658,7 @@ sys_timer_gettime(timer_t timer_id, stru
 
 	return 0;
 }
+
 /*
  * Get the number of overruns of a POSIX.1b interval timer.  This is to
  * be the overrun of the timer last delivered.  At the same time we are
@@ -835,7 +668,6 @@ sys_timer_gettime(timer_t timer_id, stru
  * the call back to do_schedule_next_timer().  So all we need to do is
  * to pick up the frozen overrun.
  */
-
 asmlinkage long
 sys_timer_getoverrun(timer_t timer_id)
 {
@@ -852,84 +684,6 @@ sys_timer_getoverrun(timer_t timer_id)
 
 	return overrun;
 }
-/*
- * Adjust for absolute time
- *
- * If absolute time is given and it is not CLOCK_MONOTONIC, we need to
- * adjust for the offset between the timer clock (CLOCK_MONOTONIC) and
- * what ever clock he is using.
- *
- * If it is relative time, we need to add the current (CLOCK_MONOTONIC)
- * time to it to get the proper time for the timer.
- */
-static int adjust_abs_time(struct k_clock *clock, struct timespec *tp, 
-			   int abs, u64 *exp, struct timespec *wall_to)
-{
-	struct timespec now;
-	struct timespec oc = *tp;
-	u64 jiffies_64_f;
-	int rtn =0;
-
-	if (abs) {
-		/*
-		 * The mask pick up the 4 basic clocks 
-		 */
-		if (!((clock - &posix_clocks[0]) & ~CLOCKS_MASK)) {
-			jiffies_64_f = do_posix_clock_monotonic_gettime_parts(
-				&now,  wall_to);
-			/*
-			 * If we are doing a MONOTONIC clock
-			 */
-			if((clock - &posix_clocks[0]) & CLOCKS_MONO){
-				now.tv_sec += wall_to->tv_sec;
-				now.tv_nsec += wall_to->tv_nsec;
-			}
-		} else {
-			/*
-			 * Not one of the basic clocks
-			 */
-			clock->clock_get(clock - posix_clocks, &now);
-			jiffies_64_f = get_jiffies_64();
-		}
-		/*
-		 * Take away now to get delta and normalize
-		 */
-		set_normalized_timespec(&oc, oc.tv_sec - now.tv_sec,
-					oc.tv_nsec - now.tv_nsec);
-	}else{
-		jiffies_64_f = get_jiffies_64();
-	}
-	/*
-	 * Check if the requested time is prior to now (if so set now)
-	 */
-	if (oc.tv_sec < 0)
-		oc.tv_sec = oc.tv_nsec = 0;
-
-	if (oc.tv_sec | oc.tv_nsec)
-		set_normalized_timespec(&oc, oc.tv_sec,
-					oc.tv_nsec + clock->res);
-	tstojiffie(&oc, clock->res, exp);
-
-	/*
-	 * Check if the requested time is more than the timer code
-	 * can handle (if so we error out but return the value too).
-	 */
-	if (*exp > ((u64)MAX_JIFFY_OFFSET))
-			/*
-			 * This is a considered response, not exactly in
-			 * line with the standard (in fact it is silent on
-			 * possible overflows).  We assume such a large 
-			 * value is ALMOST always a programming error and
-			 * try not to compound it by setting a really dumb
-			 * value.
-			 */
-			rtn = -EINVAL;
-	/*
-	 * return the actual jiffies expire time, full 64 bits
-	 */
-	*exp += jiffies_64_f;
-	return rtn;
-}
 
 /* Set a POSIX.1b interval timer. */
 /* timr->it_lock is taken. */
@@ -937,68 +691,48 @@ static inline int
 common_timer_set(struct k_itimer *timr, int flags,
 		 struct itimerspec *new_setting, struct itimerspec *old_setting)
 {
-	struct k_clock *clock = &posix_clocks[timr->it_clock];
-	u64 expire_64;
+	struct hrtimer *timer = &timr->it.real.timer;
 
 	if (old_setting)
 		common_timer_get(timr, old_setting);
 
 	/* disable the timer */
-	timr->it.real.incr = 0;
+	timr->it.real.interval.tv64 = 0;
 	/*
 	 * careful here.  If smp we could be in the "fire" routine which will
 	 * be spinning as we hold the lock.  But this is ONLY an SMP issue.
 	 */
-	if (try_to_del_timer_sync(&timr->it.real.timer) < 0) {
-#ifdef CONFIG_SMP
-		/*
-		 * It can only be active if on an other cpu.  Since
-		 * we have cleared the interval stuff above, it should
-		 * clear once we release the spin lock.  Of course once
-		 * we do that anything could happen, including the
-		 * complete melt down of the timer.  So return with
-		 * a "retry" exit status.
-		 */
+	if (hrtimer_try_to_cancel(timer) < 0)
 		return TIMER_RETRY;
-#endif
-	}
-
-	remove_from_abslist(timr);
 
 	timr->it_requeue_pending = (timr->it_requeue_pending + 2) & 
 		~REQUEUE_PENDING;
 	timr->it_overrun_last = 0;
-	timr->it_overrun = -1;
-	/*
-	 *switch off the timer when it_value is zero
-	 */
-	if (!new_setting->it_value.tv_sec && !new_setting->it_value.tv_nsec) {
-		timr->it.real.timer.expires = 0;
-		return 0;
-	}
 
-	if (adjust_abs_time(clock,
-			    &new_setting->it_value, flags & TIMER_ABSTIME, 
-			    &expire_64, &(timr->it.real.wall_to_prev))) {
-		return -EINVAL;
-	}
-	timr->it.real.timer.expires = (unsigned long)expire_64;
-	tstojiffie(&new_setting->it_interval, clock->res, &expire_64);
-	timr->it.real.incr = (unsigned long)expire_64;
+	/* switch off the timer when it_value is zero */
+	if (!new_setting->it_value.tv_sec && !new_setting->it_value.tv_nsec)
+		return 0;
 
-	/*
-	 * We do not even queue SIGEV_NONE timers!  But we do put them
-	 * in the abs list so we can do that right.
+	/* Posix madness. Only absolute CLOCK_REALTIME timers
+	 * are affected by clock sets. So we must reinitialize
+	 * the timer.
 	 */
-	if (((timr->it_sigev_notify & ~SIGEV_THREAD_ID) != SIGEV_NONE))
-		add_timer(&timr->it.real.timer);
+	if (timr->it_clock == CLOCK_REALTIME && (flags & TIMER_ABSTIME))
+		hrtimer_rebase(timer, CLOCK_REALTIME);
+	else
+		hrtimer_rebase(timer, CLOCK_MONOTONIC);
 
-	if (flags & TIMER_ABSTIME && clock->abs_struct) {
-		spin_lock(&clock->abs_struct->lock);
-		list_add_tail(&(timr->it.real.abs_timer_entry),
-			      &(clock->abs_struct->list));
-		spin_unlock(&clock->abs_struct->lock);
-	}
+	timer->expires = timespec_to_ktime(new_setting->it_value);
+
+	/* Convert interval */
+	timr->it.real.interval = timespec_to_ktime(new_setting->it_interval);
+
+	/* SIGEV_NONE timers are not queued! See common_timer_get() */
+	if (((timr->it_sigev_notify & ~SIGEV_THREAD_ID) == SIGEV_NONE))
+		return 0;
+
+	hrtimer_start(timer, timer->expires, (flags & TIMER_ABSTIME) ?
+		      HRTIMER_ABS : HRTIMER_REL);
 	return 0;
 }
 
@@ -1020,8 +754,8 @@ sys_timer_settime(timer_t timer_id, int 
 	if (copy_from_user(&new_spec, new_setting, sizeof (new_spec)))
 		return -EFAULT;
 
-	if ((!good_timespec(&new_spec.it_interval)) ||
-	    (!good_timespec(&new_spec.it_value)))
+	if (!timespec_valid(&new_spec.it_interval) ||
+	    !timespec_valid(&new_spec.it_value))
 		return -EINVAL;
 retry:
 	timr = lock_timer(timer_id, &flag);
@@ -1037,8 +771,8 @@ retry:
 		goto retry;
 	}
 
-	if (old_setting && !error && copy_to_user(old_setting,
-						  &old_spec, sizeof (old_spec)))
+	if (old_setting && !error &&
+	    copy_to_user(old_setting, &old_spec, sizeof (old_spec)))
 		error = -EFAULT;
 
 	return error;
@@ -1046,24 +780,10 @@ retry:
 
 static inline int common_timer_del(struct k_itimer *timer)
 {
-	timer->it.real.incr = 0;
+	timer->it.real.interval.tv64 = 0;
 
-	if (try_to_del_timer_sync(&timer->it.real.timer) < 0) {
-#ifdef CONFIG_SMP
-		/*
-		 * It can only be active if on an other cpu.  Since
-		 * we have cleared the interval stuff above, it should
-		 * clear once we release the spin lock.  Of course once
-		 * we do that anything could happen, including the
-		 * complete melt down of the timer.  So return with
-		 * a "retry" exit status.
-		 */
+	if (hrtimer_try_to_cancel(&timer->it.real.timer) < 0)
 		return TIMER_RETRY;
-#endif
-	}
-
-	remove_from_abslist(timer);
-
 	return 0;
 }
 
@@ -1079,24 +799,16 @@ sys_timer_delete(timer_t timer_id)
 	struct k_itimer *timer;
 	long flags;
 
-#ifdef CONFIG_SMP
-	int error;
 retry_delete:
-#endif
 	timer = lock_timer(timer_id, &flags);
 	if (!timer)
 		return -EINVAL;
 
-#ifdef CONFIG_SMP
-	error = timer_delete_hook(timer);
-
-	if (error == TIMER_RETRY) {
+	if (timer_delete_hook(timer) == TIMER_RETRY) {
 		unlock_timer(timer, flags);
 		goto retry_delete;
 	}
-#else
-	timer_delete_hook(timer);
-#endif
+
 	spin_lock(&current->sighand->siglock);
 	list_del(&timer->list);
 	spin_unlock(&current->sighand->siglock);
@@ -1113,6 +825,7 @@ retry_delete:
 	release_posix_timer(timer, IT_ID_SET);
 	return 0;
 }
+
 /*
  * return timer owned by the process, used by exit_itimers
  */
@@ -1120,22 +833,13 @@ static inline void itimer_delete(struct 
 {
 	unsigned long flags;
 
-#ifdef CONFIG_SMP
-	int error;
 retry_delete:
-#endif
 	spin_lock_irqsave(&timer->it_lock, flags);
 
-#ifdef CONFIG_SMP
-	error = timer_delete_hook(timer);
-
-	if (error == TIMER_RETRY) {
+	if (timer_delete_hook(timer) == TIMER_RETRY) {
 		unlock_timer(timer, flags);
 		goto retry_delete;
 	}
-#else
-	timer_delete_hook(timer);
-#endif
 	list_del(&timer->list);
 	/*
 	 * This keeps any tasks waiting on the spin lock from thinking
@@ -1164,57 +868,7 @@ void exit_itimers(struct signal_struct *
 	}
 }
 
-/*
- * And now for the "clock" calls
- *
- * These functions are called both from timer functions (with the timer
- * spin_lock_irq() held and from clock calls with no locking.	They must
- * use the save flags versions of locks.
- */
-
-/*
- * We do ticks here to avoid the irq lock ( they take sooo long).
- * The seqlock is great here.  Since we a reader, we don't really care
- * if we are interrupted since we don't take lock that will stall us or
- * any other cpu. Voila, no irq lock is needed.
- *
- */
-
-static u64 do_posix_clock_monotonic_gettime_parts(
-	struct timespec *tp, struct timespec *mo)
-{
-	u64 jiff;
-	unsigned int seq;
-
-	do {
-		seq = read_seqbegin(&xtime_lock);
-		getnstimeofday(tp);
-		*mo = wall_to_monotonic;
-		jiff = jiffies_64;
-
-	} while(read_seqretry(&xtime_lock, seq));
-
-	return jiff;
-}
-
-static int do_posix_clock_monotonic_get(const clockid_t clock,
-					struct timespec *tp)
-{
-	struct timespec wall_to_mono;
-
-	do_posix_clock_monotonic_gettime_parts(tp, &wall_to_mono);
-
-	set_normalized_timespec(tp, tp->tv_sec + wall_to_mono.tv_sec,
-				tp->tv_nsec + wall_to_mono.tv_nsec);
-
-	return 0;
-}
-
-int do_posix_clock_monotonic_gettime(struct timespec *tp)
-{
-	return do_posix_clock_monotonic_get(CLOCK_MONOTONIC, tp);
-}
-
+/* Not available / possible... functions */
 int do_posix_clock_nosettime(const clockid_t clockid, struct timespec *tp)
 {
 	return -EINVAL;
@@ -1288,107 +942,6 @@ sys_clock_getres(const clockid_t which_c
 }
 
 /*
- * The standard says that an absolute nanosleep call MUST wake up at
- * the requested time in spite of clock settings.  Here is what we do:
- * For each nanosleep call that needs it (only absolute and not on
- * CLOCK_MONOTONIC* (as it can not be set)) we thread a little structure
- * into the "nanosleep_abs_list".  All we need is the task_struct pointer.
- * When ever the clock is set we just wake up all those tasks.	 The rest
- * is done by the while loop in clock_nanosleep().
- *
- * On locking, clock_was_set() is called from update_wall_clock which
- * holds (or has held for it) a write_lock_irq( xtime_lock) and is
- * called from the timer bh code.  Thus we need the irq save locks.
- *
- * Also, on the call from update_wall_clock, that is done as part of a
- * softirq thing.  We don't want to delay the system that much (possibly
- * long list of timers to fix), so we defer that work to keventd.
- */
-
-static DECLARE_WAIT_QUEUE_HEAD(nanosleep_abs_wqueue);
-static DECLARE_WORK(clock_was_set_work, (void(*)(void*))clock_was_set, NULL);
-
-static DECLARE_MUTEX(clock_was_set_lock);
-
-void clock_was_set(void)
-{
-	struct k_itimer *timr;
-	struct timespec new_wall_to;
-	LIST_HEAD(cws_list);
-	unsigned long seq;
-
-
-	if (unlikely(in_interrupt())) {
-		schedule_work(&clock_was_set_work);
-		return;
-	}
-	wake_up_all(&nanosleep_abs_wqueue);
-
-	/*
-	 * Check if there exist TIMER_ABSTIME timers to correct.
-	 *
-	 * Notes on locking: This code is run in task context with irq
-	 * on.  We CAN be interrupted!  All other usage of the abs list
-	 * lock is under the timer lock which holds the irq lock as
-	 * well.  We REALLY don't want to scan the whole list with the
-	 * interrupt system off, AND we would like a sequence lock on
-	 * this code as well.  Since we assume that the clock will not
-	 * be set often, it seems ok to take and release the irq lock
-	 * for each timer.  In fact add_timer will do this, so this is
-	 * not an issue.  So we know when we are done, we will move the
-	 * whole list to a new location.  Then as we process each entry,
-	 * we will move it to the actual list again.  This way, when our
-	 * copy is empty, we are done.  We are not all that concerned
-	 * about preemption so we will use a semaphore lock to protect
-	 * aginst reentry.  This way we will not stall another
-	 * processor.  It is possible that this may delay some timers
-	 * that should have expired, given the new clock, but even this
-	 * will be minimal as we will always update to the current time,
-	 * even if it was set by a task that is waiting for entry to
-	 * this code.  Timers that expire too early will be caught by
-	 * the expire code and restarted.
-
-	 * Absolute timers that repeat are left in the abs list while
-	 * waiting for the task to pick up the signal.  This means we
-	 * may find timers that are not in the "add_timer" list, but are
-	 * in the abs list.  We do the same thing for these, save
-	 * putting them back in the "add_timer" list.  (Note, these are
-	 * left in the abs list mainly to indicate that they are
-	 * ABSOLUTE timers, a fact that is used by the re-arm code, and
-	 * for which we have no other flag.)
-
-	 */
-
-	down(&clock_was_set_lock);
-	spin_lock_irq(&abs_list.lock);
-	list_splice_init(&abs_list.list, &cws_list);
-	spin_unlock_irq(&abs_list.lock);
-	do {
-		do {
-			seq = read_seqbegin(&xtime_lock);
-			new_wall_to =	wall_to_monotonic;
-		} while (read_seqretry(&xtime_lock, seq));
-
-		spin_lock_irq(&abs_list.lock);
-		if (list_empty(&cws_list)) {
-			spin_unlock_irq(&abs_list.lock);
-			break;
-		}
-		timr = list_entry(cws_list.next, struct k_itimer,
-				  it.real.abs_timer_entry);
-
-		list_del_init(&timr->it.real.abs_timer_entry);
-		if (add_clockset_delta(timr, &new_wall_to) &&
-		    del_timer(&timr->it.real.timer))  /* timer run yet? */
-			add_timer(&timr->it.real.timer);
-		list_add(&timr->it.real.abs_timer_entry, &abs_list.list);
-		spin_unlock_irq(&abs_list.lock);
-	} while (1);
-
-	up(&clock_was_set_lock);
-}
-
-/*
  * nanosleep for monotonic and realtime clocks
  */
 static int common_nsleep(const clockid_t which_clock, int flags,
@@ -1401,7 +954,7 @@ static int common_nsleep(const clockid_t
 	case CLOCK_REALTIME:
 		/* Posix madness. Only absolute timers on clock realtime
 		   are affected by clock set. */
-		if (mode == HRTIMER_ABS)
+		if (mode != HRTIMER_ABS)
 			clockid = CLOCK_MONOTONIC;
 	case CLOCK_MONOTONIC:
 		break;
Index: linux-2.6.15-rc5/include/linux/hrtimer.h
===================================================================
--- linux-2.6.15-rc5.orig/include/linux/hrtimer.h
+++ linux-2.6.15-rc5/include/linux/hrtimer.h
@@ -93,6 +93,13 @@ struct hrtimer_base {
 	struct hrtimer		*curr_timer;
 };
 
+/*
+ * clock_was_set() is a NOP for non- high-resolution systems. The
+ * time-sorted order guarantees that a timer does not expire early and
+ * is expired in the next softirq when the clock was advanced.
+ */
+#define clock_was_set()		do { } while (0)
+
 /* Exported timer functions: */
 
 /* Initialize timers: */

--


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-06  0:01 [patch 00/21] hrtimer - High-resolution timer subsystem tglx
                   ` (20 preceding siblings ...)
  2005-12-06  0:01 ` [patch 21/21] Convert posix timers completely tglx
@ 2005-12-06 17:32 ` Roman Zippel
  2005-12-06 19:07   ` Ingo Molnar
                     ` (2 more replies)
  21 siblings, 3 replies; 74+ messages in thread
From: Roman Zippel @ 2005-12-06 17:32 UTC (permalink / raw)
  To: tglx; +Cc: linux-kernel, Andrew Morton, rostedt, johnstul, mingo

Hi Thomas,

On Tue, 6 Dec 2005 tglx@linutronix.de wrote:

Before I get into a detailed review, I have to ask a question I already 
asked earlier: are you even interested in a discussion about this?

Since I posted the ptimer patches, I haven't gotten a single direct 
response from you, except some generic description in your last patch.
I would prefer if we could work together on this, but this requires some 
communication. I know I'm sometimes a little hard to understand, but you 
don't even try to ask if something is unclear or to explain the details 
from your perspective.
Slowly I'm asking myself why I should bother, the alternative would be to 
just continue my own patch set. I don't really want that and Andrew 
certainly doesn't want to choose between two versions either. So Thomas, 
please get over yourself and start talking.

> We worked through the subsystem and its users and further reduced the 
> implementation to the basic required infrastructure and generally 
> streamlined it. (We did this with easy extensibility for the high 
> resolution clock support still in mind, so we kept some small extras 
> around.)

It looks better, but could you please explain, what these extras are good 
for?

> After reading the Posix specification again, we came to the conclusion 
> that it is possible to do no rounding at all for the ktime_t values, and 
> to still ensure that the timer is not delivered early.

Nice that you finally also came to that conclusion, after I have been 
saying that for ages. (It's also interesting how you do that without 
giving me any credit for it.)
Nevertheless, if you read my explanation of the rounding carefully and 
look at my implementation, you may notice that I still disagree with the 
actual implementation.

BTW there is one thing I'm currently curious about. Why did you rename the 
timer from high-precision timer to high-resolution timer? hrtimer was just 
a suggestion from Andrew and ptimer would have been fine as well.

bye, Roman

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-06 17:32 ` [patch 00/21] hrtimer - High-resolution timer subsystem Roman Zippel
@ 2005-12-06 19:07   ` Ingo Molnar
  2005-12-07  3:05     ` Roman Zippel
  2005-12-06 22:10   ` Thomas Gleixner
  2005-12-06 22:28   ` Thomas Gleixner
  2 siblings, 1 reply; 74+ messages in thread
From: Ingo Molnar @ 2005-12-06 19:07 UTC (permalink / raw)
  To: Roman Zippel; +Cc: tglx, linux-kernel, Andrew Morton, rostedt, johnstul


* Roman Zippel <zippel@linux-m68k.org> wrote:

> Hi Thomas,
> 
> On Tue, 6 Dec 2005 tglx@linutronix.de wrote:
> 
> Before I get into a detailed review, I have to ask a question I 
> already asked earlier: are you even interested in a discussion about this?

we are certainly interested in a technical discussion!

> I would prefer if we could work together on this, but this requires 
> some communication. I know I'm sometimes a little hard to understand, 
> but you don't even try to ask if something is unclear or to explain 
> the details from your perspective.

you think the reason is that you are "sometimes a little hard to 
understand". Which, as i guess it implies, comes from your superior 
intellectual state of mind, and i am really thankful for your efforts 
trying to educate us mere mortals.

but do you honestly believe that this is the only possible reason? How 
about "your message often gets lost because you often offend people and 
thus do not respect their work" as a possibility? How about "hence it 
has not been much fun to work with you" as a further explanation?

to be able to comprehend what kind of mood we might be in when reading 
your emails these days, how about this little snippet from you, from the 
second email you wrote in the ktimers threads:

"First off, I can understand that you're rather upset with what I wrote,
 unfortunately you got overly defensive, so could you please next time
 not reply immediately and first sleep over it, an overly emotional
 reply is funny to read but not exactly useful."

 http://marc.theaimsgroup.com/?l=linux-kernel&m=112743074308613&w=2

and to tell you my personal perspective, the insults coming from you in 
our direction have not appeared to have stopped since. I am being dead 
serious here, and i'd love nothing else if you stopped doing what you 
are doing and if i didnt have to write such mails and if things got more 
constructive in the end. Insults like the following sentence in this 
very email:

> [...] So Thomas, please get over yourself and start talking.

let me be frank, and show you my initial reply that came to my mind when 
reading the above sentence: "who the f*ck do you think you are to talk 
to _anyone_ like that?". Now i'm usually polite and wont reply like 
that, but one thing is sure: no technical thought was triggered by your 
sentence and no eternal joy filled my mind aching to reply to your 
questions. Suggestion: if you want communication and cooperation, then 
be cooperative to begin with. We are doing Linux kernel programming for 
the fun of it, and the style of discussions matters just as much as the 
style of code.

i'm not sure what eternal sin we've committed to have deserved the 
sometimes hidden, sometimes less hidden trash-talk you've been 
practicing ever since we announced ktimers.

in any case, from me you'll definitely get a reply to every positive or 
constructive question you ask in this thread, but you wont get many 
replies to mails that also include high-horse insults, questions or 
statements. Frankly, i dont have that much time to burn, we've been 
through one ktimer flamewar already and it wasnt overly productive :)

	Ingo

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-06 17:32 ` [patch 00/21] hrtimer - High-resolution timer subsystem Roman Zippel
  2005-12-06 19:07   ` Ingo Molnar
@ 2005-12-06 22:10   ` Thomas Gleixner
  2005-12-07  3:11     ` Roman Zippel
  2005-12-06 22:28   ` Thomas Gleixner
  2 siblings, 1 reply; 74+ messages in thread
From: Thomas Gleixner @ 2005-12-06 22:10 UTC (permalink / raw)
  To: Roman Zippel; +Cc: linux-kernel, Andrew Morton, rostedt, johnstul, mingo

Hi Roman,

On Tue, 2005-12-06 at 18:32 +0100, Roman Zippel wrote:
> Before I get into a detailed review, I have to ask a question I already 
> asked earlier: are you even interested in a discussion about this?

Yes, I am and always was, as long as it is on a technical level.

> Slowly I'm asking myself why I should bother, the alternative would be 
> to just continue my own patch set. I don't really want that and Andrew 
> certainly doesn't want to choose between two versions either. So 
> Thomas, please get over yourself and start talking.

I'm interested in working with others and I do that a lot. It depends a
bit on the attitude of the person who wants to do that. I did not have
the feeling that you are interested in working together. Usually people
who want to participate in a project send patches, suggestions or
testing feedback. Your reaction throughout the whole mail threads was
neither cooperative nor appealing to me. I have no problem at all
accepting criticism and help from others, but your attitude of teaching me
how to do my work was just annoying.

When others have done the hard chores of analysing the underlying
problems and trying to solve them in various ways it is a simple task to
jump in and tell them the big truth of the right solution. Acknowledging
the work of others which led to a maybe imperfect solution in the first
pass and helping in a constructive way to bring it to a better shape is
a different thing.

Sure you can fork off your own project and do what you want if you feel
the urge to do so. We'd prefer to see patches against our queue, but
it's up to you.

I'm replying to the technical points in a different mail.

    tglx



^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-06 17:32 ` [patch 00/21] hrtimer - High-resolution timer subsystem Roman Zippel
  2005-12-06 19:07   ` Ingo Molnar
  2005-12-06 22:10   ` Thomas Gleixner
@ 2005-12-06 22:28   ` Thomas Gleixner
  2005-12-07  9:31     ` Andrew Morton
  2005-12-07 12:18     ` Roman Zippel
  2 siblings, 2 replies; 74+ messages in thread
From: Thomas Gleixner @ 2005-12-06 22:28 UTC (permalink / raw)
  To: Roman Zippel; +Cc: linux-kernel, Andrew Morton, rostedt, johnstul, mingo

Hi Roman,

On Tue, 2005-12-06 at 18:32 +0100, Roman Zippel wrote:

> > We worked through the subsystem and its users and further reduced the 
> > implementation to the basic required infrastructure and generally 
> > streamlined it. (We did this with easy extensibility for the high 
> > resolution clock support still in mind, so we kept some small extras 
> > around.)
> 
> It looks better, but could you please explain, what these extras are 
> good for?

One extra we kept is the list and the state field, which are useful for
the high resolution implementation. We wanted to keep as much as
possible common code [shared between the current low-resolution clock
based hrtimer code and future high-resolution clock based hrtimer code]
to avoid the big #ifdeffery all over the place.

It might turn out in the rework of the hrt bits that the list can go
away, but this is a no-brainer to do. (The very first unpublished version
of ktimers had no list, it got introduced during the hrt addon and
stayed there to make the hrt patch less intrusive.)

> > After reading the Posix specification again, we came to the conclusion 
> > that it is possible to do no rounding at all for the ktime_t values, and 
> > to still ensure that the timer is not delivered early.
> 
> Nice that you finally also came to that conclusion, after I have been 
> saying that for ages. (It's also interesting how you do that without 
> giving me any credit for it.)

Sorry if it was previously your idea and if we didn't credit you for it.
I did not keep track of each word said in these endless mail threads. We
credited every suggestion and idea which we picked up from you, see our
previous emails. If we missed one, it was definitely not intentional.

The decision to change the rounding implementation was not made based on
reading old mail threads. It was made by doing tests and analysis and it
differs a lot from your implementation.

Setting up a periodic timer leads to a summing-up error versus the
expected timeline. This applies to vanilla, ktimers and ptimers alike.
The difference is how this error is showing up:

The vanilla 2.6.15-rc5 kernel has a rather constant error which is in
the range of roughly 1ms per interval mostly independent of the given
interval and the system load. This is roughly 1 sec per 1000 intervals.

Ktimers had a constant summing error which is exactly the delta of the
rounded interval to the real interval. The error is number of intervals
* delta. The error could be deduced by the application, but that's not a
really good idea. For a 50ms interval it is 2sec / 1000 intervals, which
is exactly the 2ms delta between the 50ms requested and the 52ms real
interval on a system with HZ=250.
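
The ktimers figure above is plain arithmetic and can be reproduced in a
few lines. This is a minimal sketch, assuming the old code simply rounded
every interval up to the next multiple of the jiffy length; the helper
names are hypothetical and do not appear in the ktimer code.

```c
#include <assert.h>
#include <stdint.h>

/* Length of one jiffy in milliseconds; assumes HZ divides 1000 evenly. */
int64_t jiffy_ms(int64_t hz)
{
	assert(hz > 0 && 1000 % hz == 0);
	return 1000 / hz;
}

/* Round an interval up to the next multiple of the jiffy length. */
int64_t round_up_to_jiffy(int64_t interval_ms, int64_t hz)
{
	int64_t tick = jiffy_ms(hz);

	assert(interval_ms >= 0);
	return ((interval_ms + tick - 1) / tick) * tick;
}

/* Summing error after `periods` intervals: periods * (rounded - requested). */
int64_t summed_error_ms(int64_t interval_ms, int64_t hz, int64_t periods)
{
	return periods * (round_up_to_jiffy(interval_ms, hz) - interval_ms);
}
```

With HZ=250, round_up_to_jiffy(50, 250) yields 52 and
summed_error_ms(50, 250, 1000) yields 2000, i.e. the 2sec per 1000
intervals quoted above. A 40ms interval is already a multiple of the 4ms
jiffy, so in this model its summed rounding error is 0.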

Ptimers have a rounding error which depends on the delta to the jiffy
resolution and the system load. So it gets rather unpredictable what
happens. The basic error is roughly the same as with ktimers, but the
addon component due to system load is not. For a 50ms interval a summing
error between 2sec and 7sec per 1000 intervals was measured.

So while vanilla and ktimers have a systematic error, ptimer introduces
random Brownian motion!

We analysed the problem again and went through the spec and came to the
conclusion that rounding can be completely omitted. We changed the code
accordingly and did the same tests. The result is a systematic deviation
of the timeline which wanders between 0 and resolution - 1 [i.e. 0-4msec
with HZ=250], but does not introduce a summing error. This behaviour
will be the same when high resolution bits are put on top. Of course the
error then will be significantly smaller.

To sum up the effects of various implementations (and
non-implementations in our hrtimers case) of rounding, a 50 msec
interval timer accumulates the following timeline error (precision
error) over 1000 periods (50 seconds):

 vanilla:        1000 msecs
 ktimers:        2000 msecs
 ptimers:   2000-7000 msecs
 hrtimers:          4 msecs

In the interim low-res ktimers version we were concentrating on the
'multiples of exposed resolution' case. E.g. with 40 msec intervals
(which is 10x 4msec jiffy) you'd only get 0-4msecs longterm error:

 vanilla:        1000 msecs
 ktimers:           4 msecs
 ptimers:      8-2000 msecs
 hrtimers:          4 msecs

> Nevertheless, if you read my explanation of the rounding carefully and 
> look at my implementation, you may notice that I still disagree with 
> the actual implementation.

I started to read it, but your explanation seems to be completely
detached from the testing results and the code.

I can imagine that you don't agree, but you might also elaborate why. I
definitely disagree with your implementation for the following reasons:

You define that absolute interval timers on clock realtime turn, once
the initial time value has expired, into relative timers which are not
affected by time settings. I have no idea how you came to
that interpretation of the spec. I completely disagree [but if you would
like I can go into more detail why I think it's wrong.]

Besides that, the implementation is also completely broken. (You
rebase the timer from the realtime base to the monotonic base inside of
the timer callback function. On return you lock the realtime base and
enqueue the timer into the realtime queue, but the base field of the
timer points to the monotonic queue. It does not take much imagination
to get this exploited into a crash.)

Furthermore, your implementation is calculating the next expiry value
based on the current time of the expiration rather than on the previous
expected expiry time, which would be the natural thing to do. This
detail also explains the system-load dependent random drifting of
ptimers quite well.
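
The difference between the two re-arm strategies can be illustrated with
a toy model. This is only a sketch under stated assumptions, not code
from either patch set: expiry is checked once per jiffy, the handler runs
an optional fixed `latency_ms` after that check, and all function and
parameter names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* First jiffy boundary at or after t_ms: the earliest point at which a
 * once-per-jiffy expiry check can notice a timer that expires at t_ms. */
int64_t next_tick(int64_t t_ms, int64_t tick_ms)
{
	assert(tick_ms > 0 && t_ms >= 0);
	return ((t_ms + tick_ms - 1) / tick_ms) * tick_ms;
}

/* Re-arm from the previous expected expiry: expires += interval.
 * Returns how late the last handler ran versus periods * interval. */
int64_t drift_from_expected(int64_t interval_ms, int64_t tick_ms,
			    int64_t latency_ms, int64_t periods)
{
	int64_t expires = 0, handled = 0;

	for (int64_t i = 0; i < periods; i++) {
		expires += interval_ms;
		handled = next_tick(expires, tick_ms) + latency_ms;
	}
	return handled - periods * interval_ms;
}

/* Re-arm from the time the handler actually ran: expires = now + interval.
 * Lateness of one period is carried into every following period. */
int64_t drift_from_now(int64_t interval_ms, int64_t tick_ms,
		       int64_t latency_ms, int64_t periods)
{
	int64_t expires = interval_ms, handled = 0;

	for (int64_t i = 0; i < periods; i++) {
		handled = next_tick(expires, tick_ms) + latency_ms;
		expires = handled + interval_ms;
	}
	return handled - periods * interval_ms;
}
```

In this model a 50ms timer on a 4ms jiffy stays on its timeline when it
is re-armed from the expected expiry, while re-arming from "now" drifts
by 2000ms over 1000 periods with no extra latency and by more once
handler latency is added. The model ignores everything else a real
system does, so the numbers are illustrative only.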

The changes you did to the timer locking code (also in timer.c) are racy,
and the races are easy to expose. Oleg's locking implementation is there for a good
reason.

Neither do I understand the jiffy-boundness you re-introduced all over
the place. The softirq code is called once per jiffy and the expiry is
checked versus the current time. Basing a new design on jiffies, where
the design intends to be easily extensible to high resolution clocks, is
wrong IMNSHO. Doing a high resolution extension on top of it is just
introducing a lot of #ifdef mess in places where none has to be. We had
that before, and don't want to go back there.

One of your main, often repeated arguments was the complexity of
ktimers. While ktimers held a lot of complex functionality, the "simple"
ptimers .text size is larger than the ktimers one! I know that you claim
that .text size is not a criterion, but Andrew seriously asked what he
gets for the increase of .text.

> BTW there is one thing I'm currently curious about. Why did you rename 
> the timer from high-precision timer to high-resolution timer? hrtimer 
> was just a suggestion from Andrew and ptimer would have been fine as 
> well.

We decided to rename 'ktimer' because Andrew pretty much vetoed the
'ktimeout' queue, and "timer_list" plus "ktimer" looked and sounded
confusing (as we've explained it before). Of the possible target names,
we decided against "ptimer" because it could be confused with "process
timers" and "posix timers". hrtimers is a clear term that indicates what
those timers do, so we picked up Andrew's suggestion as a way out of the
endless naming discussion. Does this satisfy your curiosity?

   tglx



^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-06 19:07   ` Ingo Molnar
@ 2005-12-07  3:05     ` Roman Zippel
  2005-12-08  5:18       ` Paul Jackson
  2005-12-08  9:26       ` Ingo Molnar
  0 siblings, 2 replies; 74+ messages in thread
From: Roman Zippel @ 2005-12-07  3:05 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: tglx, linux-kernel, Andrew Morton, rostedt, johnstul

Hi,

On Tue, 6 Dec 2005, Ingo Molnar wrote:

> you think the reason is that you are "sometimes a little hard to
> understand". Which, as i guess it implies, comes from your superior
> intellectual state of mind, and i am really thankful for your efforts
> trying to educate us mere mortals.

I can assure you my "superior intellectual state of mind" is not much 
different from many other kernel hackers. I have at times strong opinions, 
but who here hasn't?

> to be able to comprehend what kind of mood we might be in when reading 
> your emails these days, how about this little snippet from you, from the 
> second email you wrote in the ktimers threads:
> 
> "First off, I can understand that you're rather upset with what I wrote,
>  unfortunately you got overly defensive, so could you please next time
>  not reply immediately and first sleep over it, an overly emotional
>  reply is funny to read but not exactly useful."

Here we probably get to the root of the problem: we got off on the wrong 
foot. 
In my first email I hadn't much good to say about the initial 
announcement, but it was always meant technically. Anyone who compares 
the first and the following announcement will notice the big improvement. 
Unfortunately Thomas seems to have taken it rather personally (although 
it never was meant that way) and I never got past this first impression 
and ever since I can't get him back to a normal conversation.

> Insults like the following sentence in this very email:
> 
> > [...] So Thomas, please get over yourself and start talking.

I must say it's completely beyond me how this could be "insulting". This 
is my desperate attempt at getting any conversation started. If Thomas 
isn't talking to me at all, I can't resolve any issue he might have with 
me. Instead he's just moping around, pissed at me and simply ignores me, 
which makes a conversation over this channel nearly impossible.

> let me be frank, and show you my initial reply that came to my mind when 
> reading the above sentence: "who the f*ck do you think you are to talk 
> to _anyone_ like that?". Now i'm usually polite and wont reply like 
> that,...

You may not have said it openly like that, but this hostility was still 
noticeable. You disagreed with me on minor issues and used the smallest 
mistake to simply lecture me. From my point the attitude you showed 
towards me is not much different from what you're accusing me of here.
I'm not saying that I'm innocent about this, but any "insult" was never 
intentional and I tried my best to correct any issues after we got off on 
the wrong foot, but I obviously failed at that, I simply never got past 
the initial impression.

> in any case, from me you'll definitely get a reply to every positive or 
> constructive question you ask in this thread, but you wont get many 
> replies to mails that also include high-horse insults, questions or 
> statements.

Let's take the ptimer patches, I got _zero_ direct responses to it and 
it's difficult for me to understand how this could be taken as "high-horse 
insult". As I obviously failed to make my criticism understandable before, 
I produced these patches to provide a technical base for a discussion of 
how this functionality could be merged in the hopes of "Patches wont be 
ignored, i can assure you", unfortunately they were.
Ingo, you might now start to understand my frustration. One positive 
effect at least is that finally some movement got into this mess and you 
managed to produce a simplified version of the timer. OTOH since I never 
got a reply to these patches does that mean they were neither positive nor 
constructive?

bye, Roman

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-06 22:10   ` Thomas Gleixner
@ 2005-12-07  3:11     ` Roman Zippel
  0 siblings, 0 replies; 74+ messages in thread
From: Roman Zippel @ 2005-12-07  3:11 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, Andrew Morton, rostedt, johnstul, mingo

Hi,

On Tue, 6 Dec 2005, Thomas Gleixner wrote:

> I'm interested in working with others and I do that a lot. It depends a
> bit on the attitude of the person who wants to do that. I did not have
> the feeling that you are interested in working together. Usually people
> who want to participate in a project send patches, suggestions or
> testing feedback. Your reaction throughout the whole mail threads was
> neither cooperative nor appealing to me. I have no problem at all
> accepting criticism and help from others, but your attitude of teaching me
> how to do my work was just annoying. 

See my mail to Ingo about most of this. The basic point is you should have 
told me about this earlier, simply ignoring the problem won't make it go 
away. Your annoyance was quite noticeable, but this seemed to include my 
complete contribution. You never said what annoyed you, which makes it 
rather hard for me to change it into a more acceptable form.

bye, Roman

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-06 22:28   ` Thomas Gleixner
@ 2005-12-07  9:31     ` Andrew Morton
  2005-12-07 10:11       ` Ingo Molnar
  2005-12-07 12:18     ` Roman Zippel
  1 sibling, 1 reply; 74+ messages in thread
From: Andrew Morton @ 2005-12-07  9:31 UTC (permalink / raw)
  To: tglx; +Cc: zippel, linux-kernel, rostedt, johnstul, mingo

Thomas Gleixner <tglx@linutronix.de> wrote:
>
> We decided to rename 'ktimer' because Andrew pretty much vetoed the
>  'ktimeout' queue

Well I whined about the rename of timer_list to ktimeout and asked why it
happened.  I don't think anyone replied.

I assume from your above statement that there wasn't really a strong reason
for the rename, and that a new patch series is in the offing, which adds
ktimers and leaves timer_list alone?

Is ktimer a better name than ptimer or hrtimer?

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-07  9:31     ` Andrew Morton
@ 2005-12-07 10:11       ` Ingo Molnar
  2005-12-07 10:20         ` Ingo Molnar
  2005-12-07 10:23         ` Nick Piggin
  0 siblings, 2 replies; 74+ messages in thread
From: Ingo Molnar @ 2005-12-07 10:11 UTC (permalink / raw)
  To: Andrew Morton; +Cc: tglx, zippel, linux-kernel, rostedt, johnstul


* Andrew Morton <akpm@osdl.org> wrote:

> Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > We decided to rename 'ktimer' because Andrew pretty much vetoed the
> >  'ktimeout' queue

let me defuse things a bit here: the above sentence might sound a bit 
bitter, but we really, truly are not. You were more like the sane voice 
slapping us back into reality: for the next 2 years we do not want to be 
buried in tons of timer_list->ktimeout patches, causing disruption all 
across the kernel (and external trees). You definitely did not 'veto' it 
in any way, and in fact you are carrying it in -mm currently.

I did the ktimeout queue in an hour or so, and i dont have strong
feelings about it. I very much agree with you that a mass rename could
easily cause more problems than the added clarity adds - still i had to
try the ktimeout queue, because i'm hopelessly purist at heart :)

Maybe in a few years all substantial kernel code will be managed by a 
network of GIT repositories, and GIT will be extended with automatic 
'mass namespace change' capabilities, making an overnight switchover 
much more practical.

> Well I whined about the rename of timer_list to ktimeout and asked why 
> it happened.  I don't think anyone replied.

we thought we had this issue covered way too many times :) but find 
below my original justification for the ktimeout patch-queue. This is 
just for historical interest now i think.

[ insert the text below here. Time passes as everyone reads it :-) ]

once we take 'mass change of timer_list to ktimeout' out of the possible 
things to do, we've only got these secondary possibilities:

	'struct timer_list, struct ktimer'
	'struct timer_list, struct ptimer'
	'struct timer_list, struct hrtimer'

and having eliminated the first option due to being impractical to pull 
off, we had the choice between 'ptimer' and 'hrtimer', and went for the 
last one, for the following reason [snipped from a mail to Roman]:

| we decided against "ptimer" because it could be confused with "process 
| timers" and "posix timers". hrtimers is a clear term that indicates 
| what those timers do, so we picked up Andrew's suggestion as a way out 
| of the endless naming discussion.

but really ... facing an imperfect naming situation (i do not think 
timer_list is the correct name - just as much as struct inode_list would 
not be correct - but it is the historic name and i think you are right 
that we've got to live with it) i'm a lot less passionate about which one 
to pick. If we had the chance to have perfect naming, i'd definitely 
spend the effort to get it right, but now let's just go with the most 
descriptive one: 'struct hrtimer'.

	Ingo

-----
regarding naming. This is something best left to native speakers, but 
i'm not giving up the issue just yet :-)

i always sensed and agreed that 'struct ktimer' and 'struct timer_list' 
together is confusing. Same for kernel/ktimers.c and kernel/timers.c. So 
no argument about that, this situation cannot continue.

but the reason i am still holding on to 'struct ktimer' is that i think 
the end result should be:

 - 'struct ktimer' (for precise timers where the event of expiry is the 
                    main function)

 - 'struct ktimeout' (for the wheel-based timeouts, where expiry is an 
                      exception)

Similarly, kernel/ktimer.c for ktimers, and kernel/ktimeout.c for 
timeouts.

see the attached finegrained patchqueue that does all the changes to 
rename 'timers' to 'timeouts' [and to clean up the resulting subsystem], 
to see what i'm trying to achieve.

For now i'm ignoring the feasibility issues of a 'mass API change' - 
those are better left up to lkml. The queue does build and compile fine on a 
rather big .config so the only question is - do we want it. Note that 
the patch does not have to touch even a single non-subsystem user of the 
timer.c APIs, so the renaming is robust.

IMO it looks a lot less confusing and dualistic that way. The rename is 
technically feasible and robust mainly because we can do this:

#define timer_list timeout

for the transition period (see the patch-queue). Fortunately timer_list 
is not a generic name. (it's also an incorrect name because it implies 
implementation) Here's the full list of mappings that occur:

#define timer_list			ktimeout

#define TIMER_INITIALIZER		KTIMEOUT_INITIALIZER 
#define DEFINE_TIMER			DEFINE_KTIMEOUT
#define init_timer			ktimeout_init
#define setup_timer			ktimeout_setup
#define timer_pending			ktimeout_pending
#define add_timer_on			ktimeout_add_on
#define del_timer			ktimeout_del
#define __mod_timer			__ktimeout_mod
#define mod_timer			ktimeout_mod
#define next_timer_interrupt		ktimeout_next_interrupt
#define add_timer			ktimeout_add
#define try_to_del_timer_sync		ktimeout_try_to_del_sync
#define del_timer_sync			ktimeout_del_sync
#define del_singleshot_timer_sync	ktimeout_del_singleshot_sync
#define init_timers			init_ktimeouts
#define run_local_timers		run_local_ktimeouts
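
A minimal user-space sketch of why such aliases keep unmodified callers building during a transition period. The struct layout and the function bodies are simplified stand-ins (assumptions for illustration only), not the kernel's timer code; only the alias names are taken from the list above.

```c
#include <assert.h>
#include <stddef.h>

/* New-style object; the fields are a simplified, hypothetical stand-in. */
struct ktimeout {
	unsigned long expires;	/* expiry time, in ticks */
	int pending;		/* non-zero while the timeout is queued */
};

static void ktimeout_init(struct ktimeout *t)
{
	assert(t != NULL);
	t->expires = 0;
	t->pending = 0;
}

static int ktimeout_pending(const struct ktimeout *t)
{
	return t->pending;
}

static void ktimeout_add(struct ktimeout *t)
{
	t->pending = 1;
}

/* Returns 1 if the timeout was pending and got removed, 0 otherwise. */
static int ktimeout_del(struct ktimeout *t)
{
	int was_pending = t->pending;

	t->pending = 0;
	return was_pending;
}

/* Transition-period aliases: the old names resolve to the new ones. */
#define timer_list	ktimeout
#define init_timer	ktimeout_init
#define timer_pending	ktimeout_pending
#define add_timer	ktimeout_add
#define del_timer	ktimeout_del

/* An unmodified "old API" caller: it never mentions ktimeout. */
int legacy_user(void)
{
	struct timer_list t;

	init_timer(&t);
	t.expires = 100;
	add_timer(&t);
	if (!timer_pending(&t))
		return -1;
	return del_timer(&t);
}
```

Because the preprocessor rewrites the old identifiers before compilation, legacy_user() builds against the renamed API without a source change, which is the property the mapping list relies on.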

but maybe 'struct ptimer' and 'struct ktimeout' is the better choice? I 
dont think so, but it's a possibility.

so i believe that:

	- 'struct ktimer', 'struct ktimeout'

is in theory superior naming, compared to:

	- 'struct ptimer', 'struct timer_list'

again, ignoring all the 'do we want to have a massive namespace change' 
issues.

	Ingo

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-07 10:11       ` Ingo Molnar
@ 2005-12-07 10:20         ` Ingo Molnar
  2005-12-07 10:23         ` Nick Piggin
  1 sibling, 0 replies; 74+ messages in thread
From: Ingo Molnar @ 2005-12-07 10:20 UTC (permalink / raw)
  To: Andrew Morton; +Cc: tglx, zippel, linux-kernel, rostedt, johnstul


* Ingo Molnar <mingo@elte.hu> wrote:

> once we take 'mass change of timer_list to ktimeout' out of the possible 
> things to do, we've only got these secondary possibilities:
> 
> 	'struct timer_list, struct ktimer'
> 	'struct timer_list, struct ptimer'
> 	'struct timer_list, struct hrtimer'
> 
> and having eliminated the first option due to being impractical to pull 
> off, [...]

(correction: due to being confusing.)

	Ingo

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-07 10:11       ` Ingo Molnar
  2005-12-07 10:20         ` Ingo Molnar
@ 2005-12-07 10:23         ` Nick Piggin
  2005-12-07 10:49           ` Ingo Molnar
  1 sibling, 1 reply; 74+ messages in thread
From: Nick Piggin @ 2005-12-07 10:23 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, tglx, zippel, linux-kernel, rostedt, johnstul

Ingo Molnar wrote:

> so i believe that:
> 
> 	- 'struct ktimer', 'struct ktimeout'
> 
> is in theory superior naming, compared to:
> 
> 	- 'struct ptimer', 'struct timer_list'
> 

Just curious -- why the "k" thing?

Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-07 10:23         ` Nick Piggin
@ 2005-12-07 10:49           ` Ingo Molnar
  2005-12-07 11:09             ` Nick Piggin
  0 siblings, 1 reply; 74+ messages in thread
From: Ingo Molnar @ 2005-12-07 10:49 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, tglx, zippel, linux-kernel, rostedt, johnstul


* Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> Ingo Molnar wrote:
> 
> >so i believe that:
> >
> >	- 'struct ktimer', 'struct ktimeout'
> >
> >is in theory superior naming, compared to:
> >
> >	- 'struct ptimer', 'struct timer_list'
> >
> 
> Just curious -- why the "k" thing?

yeah. 'struct timer' and 'struct timeout' is even better. I tried it on 
real code and sometimes it looked a bit funny: often we have a 'timeout' 
parameter somewhere that is a scalar or a timeval/timespec. So at least 
for variable names it was useful to have it in this form:

	struct timeout *ktimeout;

	struct timer *ktimer;

otherwise it looked OK. This is also in line with most other 'object 
names' we have in the kernel: struct inode, struct dentry.

	Ingo

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-07 10:49           ` Ingo Molnar
@ 2005-12-07 11:09             ` Nick Piggin
  2005-12-07 11:33               ` Ingo Molnar
  2005-12-07 12:40               ` Roman Zippel
  0 siblings, 2 replies; 74+ messages in thread
From: Nick Piggin @ 2005-12-07 11:09 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, tglx, zippel, linux-kernel, rostedt, johnstul

Ingo Molnar wrote:
> * Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> 
> 
>>Ingo Molnar wrote:
>>
>>
>>>so i believe that:
>>>
>>>	- 'struct ktimer', 'struct ktimeout'
>>>
>>>is in theory superior naming, compared to:
>>>
>>>	- 'struct ptimer', 'struct timer_list'
>>>
>>
>>Just curious -- why the "k" thing?
> 
> 
> yeah. 'struct timer' and 'struct timeout' is even better. I tried it on 

Oh good, glad you think so :)

> real code and sometimes it looked a bit funny: often we have a 'timeout' 
> parameter somewhere that is a scalar or a timeval/timespec. So at least 

Sure... hmm, the names timeout and timer themselves have something
vaguely wrong about them, but I can't quite place my finger on it,
not a real worry though...

Maybe it is that timeout is an end result, but timer is a mechanism.
So maybe it should be 'struct interval', 'struct timeout';
or 'struct timer', 'struct timeout_timer'.

But I don't know really, it isn't a big deal.

Nick

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-07 11:09             ` Nick Piggin
@ 2005-12-07 11:33               ` Ingo Molnar
  2005-12-07 11:40                 ` Nick Piggin
  2005-12-07 13:06                 ` Roman Zippel
  2005-12-07 12:40               ` Roman Zippel
  1 sibling, 2 replies; 74+ messages in thread
From: Ingo Molnar @ 2005-12-07 11:33 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, tglx, zippel, linux-kernel, rostedt, johnstul


* Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> >>Just curious -- why the "k" thing?
> >
> >
> >yeah. 'struct timer' and 'struct timeout' is even better. I tried it on 
> 
> Oh good, glad you think so :)
> 
> >real code and sometimes it looked a bit funny: often we have a 'timeout' 
> >parameter somewhere that is a scalar or a timeval/timespec. So at least 
> 
> Sure... hmm, the names timeout and timer themselves have something 
> vaguely wrong about them, but I can't quite place my finger on it, not 
> a real worry though...
> 
> Maybe it is that timeout is an end result, but timer is a mechanism.  

hm, i think you are right.

> So maybe it should be 'struct interval', 'struct timeout'; or 'struct 
> timer', 'struct timeout_timer'.

maybe 'struct timer' and 'struct hrtimer' is the right solution after 
all, and our latest queue doing 'struct timer_list' + 'struct hrtimer' 
is actually quite close to it.

'struct ptimer' does have a bit of vagueness in it at first sight, do 
you agree with that? (does it mean 'process'? 'posix'? 'precision'?) 

also, hrtimers on low-res clocks do have high internal resolution, but 
they are not precise timing mechanisms in the end, due to the low-res 
clock. So the more generic name would be 'high-resolution timers', not 
'precision timers'. (also, the name 'precision timers' sounds a bit 
funny too, but i dont really know why.)

	Ingo

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-07 11:33               ` Ingo Molnar
@ 2005-12-07 11:40                 ` Nick Piggin
  2005-12-07 13:06                 ` Roman Zippel
  1 sibling, 0 replies; 74+ messages in thread
From: Nick Piggin @ 2005-12-07 11:40 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, tglx, zippel, linux-kernel, rostedt, johnstul

Ingo Molnar wrote:
> * Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> 

>>Maybe it is that timeout is an end result, but timer is a mechanism.  
> 
> 
> hm, i think you are right.
> 
> 
>>So maybe it should be 'struct interval', 'struct timeout'; or 'struct 
>>timer', 'struct timeout_timer'.
> 
> 
> maybe 'struct timer' and 'struct hrtimer' is the right solution after 
> all, and our latest queue doing 'struct timer_list' + 'struct hrtimer' 
> is actually quite close to it.
> 
> 'struct ptimer' does have a bit of vagueness in it at first sight, do 
> you agree with that? (does it mean 'process'? 'posix'? 'precision'?) 
> 

Yes I would agree that the p doesn't add much, whereas hrtimer at least
*rules out* the obvious process and posix.

I can't see a problem with timer and hrtimer myself.

Thanks,
Nick

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-06 22:28   ` Thomas Gleixner
  2005-12-07  9:31     ` Andrew Morton
@ 2005-12-07 12:18     ` Roman Zippel
  2005-12-07 16:55       ` Ingo Molnar
  2005-12-09 17:23       ` Thomas Gleixner
  1 sibling, 2 replies; 74+ messages in thread
From: Roman Zippel @ 2005-12-07 12:18 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, Andrew Morton, rostedt, johnstul, mingo

Hi

On Tue, 6 Dec 2005, Thomas Gleixner wrote:

> > It looks better, but could you please explain, what these extras are 
> > good for?
> 
> One extra we kept is the list and the state field, which are useful for
> the high resolution implementation. We wanted to keep as much as
> possible common code [shared between the current low-resolution clock
> based hrtimer code and future high-resolution clock based hrtimer code]
> to avoid the big #ifdeffery all over the place.
> 
> It might turn out in the rework of the hrt bits that the list can go
> away, but this is a nobrainer to do. (The very first unpublished version
> of ktimers had no list, it got introduced during the hrt addon and
> stayed there to make the hrt patch less intrusive.)

If it's such a nobrainer to remove it, why don't you add it later? As it 
is right now, it's not needed and nobody but you knows what it's good for. 
This way you only make it harder to review the code, if one stumbles over 
these pieces all the time.

> > Nice, that you finally also come to that conclusion, after I said that 
> > already for ages. (It's also interesting how you do that without 
> > giving me any credit for it.)
> 
> Sorry if it was previously your idea and if we didnt credit you for it.
> I did not keep track of each word said in these endless mail threads. We
> credited every suggestion and idea which we picked up from you, see our
> previous emails. If we missed one, it was definitely not intentional.

http://marc.theaimsgroup.com/?l=linux-kernel&m=112755827327101
http://marc.theaimsgroup.com/?l=linux-kernel&m=112760582427537

A bit later ktime_t looked pretty much like the 64bit part of my ktimespec.
I don't want to imply any intention, but please try to see this from my 
perspective, after this happens a number of times.

> Setting up a periodic timer leads to a summing-up error versus the
> expected timeline. This applies both to vanilla, ktimers and ptimers.
> The difference is how this error is showing up:
> 
> The vanilla 2.6.15-rc5 kernel has a rather constant error which is in
> the range of roughly 1ms per interval mostly independent of the given
> interval and the system load. This is roughly 1 sec per 1000 intervals.
> 
> Ktimers had a constant summing error which is exactly the delta of the
> rounded interval to the real interval. The error is number of intervals
> * delta. The error could be deduced by the application, but that's not a
> really good idea. For a 50ms interval it is 2sec / 1000 intervals, which
> is exactly the 2ms delta between the 50ms requested and the 52ms real
> interval on a system with HZ=250
> 
> Ptimers have a rounding error which depends on the delta to the jiffy
> resolution and the system load. So it gets rather unpredictable what
> happens. The basic error is roughly the same as with ktimers, but the
> addon component due to system load is not. For a 50ms interval a summing
> error between 2sec and 7sec per 1000 intervals was measured.
> 
> So while vanilla and ktimers have a systematic error, ptimer introduces
> random Brownian motion!

Thomas, you unfortunately don't provide enough context for these numbers 
for me to reproduce them.
I don't want to guess, so please provide an example to demonstrate this.
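
The ktimers figure in the quoted text (2 sec per 1000 intervals for a 50ms interval at HZ=250) can at least be checked arithmetically. The sketch below assumes the interval is rounded up to the next tick boundary and that HZ divides 1000 evenly; both are assumptions made for illustration, and the function names are hypothetical. It does not reproduce the vanilla or ptimer measurements.

```c
#include <assert.h>

/* Round an interval up to the next multiple of the tick length (both in ms). */
long round_up_to_tick(long interval_ms, long tick_ms)
{
	assert(interval_ms >= 0 && tick_ms > 0);
	return ((interval_ms + tick_ms - 1) / tick_ms) * tick_ms;
}

/*
 * Accumulated error in ms after n periods, when every period really lasts
 * the rounded-up interval instead of the requested one.
 */
long summing_error_ms(long interval_ms, long hz, long n)
{
	long tick_ms = 1000 / hz;	/* HZ=250 gives a 4ms tick */
	long real_ms = round_up_to_tick(interval_ms, tick_ms);

	return n * (real_ms - interval_ms);
}
```

With interval_ms = 50, hz = 250 and n = 1000 the real interval comes out as 52ms and the accumulated error as 2000ms, which matches the quoted 2 sec per 1000 intervals.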

> > Nevertheless, if you read my explanation of the rounding carefully and 
> > look at my implementation, you may notice that I still disagree with 
> > the actual implementation.
> 
> I started to read it, but your explanation seems to be completely
> detached from the testing results and the code.

Thomas, if you don't ask me, I can't help you to understand any issues 
there might have been.

> You define that absolute interval timers on clock realtime change their
> behaviour when the initial time value is expired into relative timers
> which are not affected by time settings. I have no idea how you came to
> that interpretation of the spec. I completely disagree [but if you would
> like I can go into more detail why I think it's wrong.]

Please do. I explained it in one of my patches:

[PATCH 5/9] remove relative timer from abs_list

When an absolute timer expires, it becomes a relative timer, so remove
it from the abs_list.  The TIMER_ABSTIME flag for timer_settime()
changes the interpretation of the it_value member, but it_interval is
always a relative value and clock_settime() only affects absolute time
services.

> Beside of that, the implementation is also completely broken. (You
> rebase the timer from the realtime base to the monotonic base inside of
> the timer callback function. On return you lock the realtime base and
> enqueue the timer into the realtime queue, but the base field of the
> timer points to the monotonic queue. It does not take much imagination
> to get this exploited into a crash.)

If you provide the wrong parameters, you can crash a lot of stuff in the 
kernel. "exploited" usually implies it can be abused from userspace, which 
is not the case here.

> Furthermore, your implementation is calculating the next expiry value
> based on the current time of the expiration rather than on the previous
> expected expiry time, which would be the natural thing to do. This
> detail also explains the system-load dependent random drifting of
> ptimers quite well.

Is this conclusion based on actual testing? The behaviour of ptimer should 
be quite close to the old jiffie based timer, so I'm a bit at a loss here, 
how you get to this conclusion. Please provide more details.
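
The general effect under dispute can be sketched without reference to either implementation. The simulation below is purely illustrative (hypothetical function names, arbitrary time units) and is not a model of the ptimer or ktimer code: it compares re-arming a periodic timer relative to the time the callback actually ran with re-arming relative to the previous programmed expiry.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Re-arm relative to "now": the callback for period i runs latency[i]
 * late, and that lateness is carried into every later expiry.
 * Returns the drift of the last programmed expiry against the ideal timeline.
 */
long drift_rearm_from_now(long interval, const long *latency, int n)
{
	long expiry = interval;	/* first programmed expiry */
	long ideal = interval;	/* where that expiry should be */
	int i;

	assert(latency != NULL && n >= 0);
	for (i = 0; i < n; i++) {
		long now = expiry + latency[i];	/* callback runs late */

		expiry = now + interval;	/* next expiry based on now */
		ideal += interval;
	}
	return expiry - ideal;
}

/*
 * Re-arm relative to the previous expected expiry: the callback may still
 * run late, but the lateness is not folded into the next expiry.
 */
long drift_rearm_from_expiry(long interval, const long *latency, int n)
{
	long expiry = interval;
	long ideal = interval;
	int i;

	(void)latency;	/* lateness does not influence the timeline here */
	assert(n >= 0);
	for (i = 0; i < n; i++) {
		expiry += interval;
		ideal += interval;
	}
	return expiry - ideal;
}
```

In the first variant the drift is the sum of all callback latencies, so it grows with system load; in the second it stays at zero. Whether the ptimer code actually behaves like the first variant is exactly what is being questioned here.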

> The changes you did to the timer locking code (also in timer.c) are racy
> and simply exposable. Oleg's locking implementation is there for a good
> reason.

Thomas, bringing up this issue is really weak. With Oleg's help it's 
already solved, you don't have to warm it up. :(

> Neither do I understand the jiffie boundness you re-introduced all over
> the place. The softirq code is called once per jiffy and the expiry is
> checked versus the current time. Basing a new design on jiffies, where
> the design intends to be easily extensible to high resolution clocks, is
> wrong IMNSHO. Doing a high resolution extension on top of it is just
> introducing a lot of #ifdef mess in places where none has to be. We had
> that before, and dont want to go back there.

I don't understand where you get this from. I explicitly said that higher 
resolution requires a better clock abstraction; basically any place which 
mentions TICK_NSEC has to be cleaned up like this. I'm at a loss why you 
think this requires "a lot of #ifdef mess".

> One of your main, often repeated arguments was the complexity of
> ktimers. While ktimers held a lot of complex functionality, the "simple"
> ptimers .text size is larger than the ktimers one! I know that you claim
> that .text size is not a criterion, but Andrew seriously asked what he
> gets for the increase of .text.

Sorry, but I didn't have as much time as you to fine-tune my implementation. 
I did some quick tests by also deinlining some stuff, which quickly 
brought it down to the ktimer size; integrating some more cleanups should 
do the rest.
Thomas, this is hardly an argument against my implementation; I never 
claimed it to be complete. It was meant as a sanely mergeable version 
compared to your large ktimers patch.

Anyway, thanks for finally responding. There seem to have piled up a 
number of misconceptions, please give it some time to clear up.

bye, Roman

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-07 11:09             ` Nick Piggin
  2005-12-07 11:33               ` Ingo Molnar
@ 2005-12-07 12:40               ` Roman Zippel
  2005-12-07 23:12                 ` Nick Piggin
  1 sibling, 1 reply; 74+ messages in thread
From: Roman Zippel @ 2005-12-07 12:40 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Ingo Molnar, Andrew Morton, tglx, linux-kernel, rostedt, johnstul

Hi,

On Wed, 7 Dec 2005, Nick Piggin wrote:

> Sure... hmm, the names timeout and timer themselves have something
> vaguely wrong about them, but I can't quite place my finger on it,
> not a real worry though...
> 
> Maybe it is that timeout is an end result, but timer is a mechanism.
> So maybe it should be 'struct interval', 'struct timeout';
> or 'struct timer', 'struct timeout_timer'.
> 
> But I don't know really, it isn't a big deal.

Nick, thanks for speaking up about this.
My mistake was to make a big deal out of it, because I knew it would 
confuse more people. After I got the heat for this, it seems nobody else 
wants to get flamed for it.

bye, Roman

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-07 11:33               ` Ingo Molnar
  2005-12-07 11:40                 ` Nick Piggin
@ 2005-12-07 13:06                 ` Roman Zippel
  1 sibling, 0 replies; 74+ messages in thread
From: Roman Zippel @ 2005-12-07 13:06 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Nick Piggin, Andrew Morton, tglx, linux-kernel, rostedt, johnstul

Hi,

On Wed, 7 Dec 2005, Ingo Molnar wrote:

> maybe 'struct timer' and 'struct hrtimer' is the right solution after 
> all, and our latest queue doing 'struct timer_list' + 'struct hrtimer' 
> is actually quite close to it.
> 
> 'struct ptimer' does have a bit of vagueness in it at first sight, do 
> you agree with that? (does it mean 'process'? 'posix'? 'precision'?) 
> 
> also, hrtimers on low-res clocks do have high internal resolution, but 
> they are not precise timing mechanisms in the end, due to the low-res 
> clock. So the more generic name would be 'high-resolution timers', not 
> 'precision timers'. (also, the name 'precision timers' sounds a bit 
> funny too, but i dont really know why.)

My ptimer suggestion was based on your "funny" name "high-precision 
timer". Sorry Ingo, that joke is on you. :-)
Anyway, anything other than ktimer is fine with me.

bye, Roman

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-07 12:18     ` Roman Zippel
@ 2005-12-07 16:55       ` Ingo Molnar
  2005-12-07 17:17         ` Roman Zippel
  2005-12-09 17:23       ` Thomas Gleixner
  1 sibling, 1 reply; 74+ messages in thread
From: Ingo Molnar @ 2005-12-07 16:55 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Thomas Gleixner, linux-kernel, Andrew Morton, rostedt, johnstul


* Roman Zippel <zippel@linux-m68k.org> wrote:

> > Sorry if it was previously your idea and if we didnt credit you for it.
> > I did not keep track of each word said in these endless mail threads. We
> > credited every suggestion and idea which we picked up from you, see our
> > previous emails. If we missed one, it was definitely not intentional.
> 
> http://marc.theaimsgroup.com/?l=linux-kernel&m=112755827327101
> http://marc.theaimsgroup.com/?l=linux-kernel&m=112760582427537
> 
> A bit later ktime_t looked pretty much like the 64bit part of my 
> ktimespec.

and Thomas credited you for that point in his announcement:

 " Roman pointed out that the penalty for some architectures
   would be quite big when using the nsec_t (64bit) scalar time
   storage format. "

  http://marc.theaimsgroup.com/?l=linux-kernel&m=112794069605537&w=2

also, once you came up with actual modifications to the ktimers concept 
(the ptimer queue) we noticed a further refinement of ktime_t in that 
code: the elimination of the plain scalar type. We gave you credit 
again:

" - eliminate the plain s64 scalar type, and always use the union.
    This simplifies the arithmetics. Idea from Roman Zippel. "

see:

   http://marc.theaimsgroup.com/?l=linux-kernel&m=113339663027117&w=2
   http://marc.theaimsgroup.com/?l=linux-kernel&m=113382965626004&w=2

we couldnt take your actual patch/code though, due to the way you 
created the ptimer queue: you took our ktimer queue, added a dozen 
changes to it (intermixed, without keeping/disclosing the changes), then 
you split up the resulting queue. This structure made it largely 
incompatible with our queue, the diff between ktimers and ptimers was 
larger than the two patches themselves, due to the stacked changes! This 
is not a complaint - we are happy you are writing ktimer based code - 
this is just an explanation of why we couldnt take the code/patch as-is 
but had to redo that portion from scratch, based on your idea.

> I don't want to imply any intention, but please try to see this from 
> my perspective, after this happens a number of times.

What the hell are you talking about? I bloody well know how it all 
happened, because i did those simplifications to ktime.h myself, and i 
added the changelog too, crediting you for the idea.

	Ingo

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-07 16:55       ` Ingo Molnar
@ 2005-12-07 17:17         ` Roman Zippel
  2005-12-07 17:57           ` Ingo Molnar
  2005-12-07 18:02           ` Paul Baxter
  0 siblings, 2 replies; 74+ messages in thread
From: Roman Zippel @ 2005-12-07 17:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, linux-kernel, Andrew Morton, rostedt, johnstul

Hi,

On Wed, 7 Dec 2005, Ingo Molnar wrote:

> > A bit later ktime_t looked pretty much like the 64bit part of my 
> > ktimespec.
> 
> and Thomas credited you for that point in his announcement:
> 
>  " Roman pointed out that the penalty for some architectures
>    would be quite big when using the nsec_t (64bit) scalar time
>    storage format. "

"pointed out that the penalty" is a bit different from "provided the 
basic idea of the ktime_t union and half the implementation"...

bye, Roman

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-07 17:17         ` Roman Zippel
@ 2005-12-07 17:57           ` Ingo Molnar
  2005-12-07 18:18             ` Roman Zippel
  2005-12-07 18:02           ` Paul Baxter
  1 sibling, 1 reply; 74+ messages in thread
From: Ingo Molnar @ 2005-12-07 17:57 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Thomas Gleixner, linux-kernel, Andrew Morton, rostedt, johnstul


* Roman Zippel <zippel@linux-m68k.org> wrote:

> > > (It's also interesting how you do that without giving me any 
> > >  credit for it.)
> >
> > Sorry if it was previously your idea and if we didnt credit you for 
> > it.
> > [...]
> >
> > > A bit later ktime_t looked pretty much like the 64bit part of my 
> > > ktimespec.
> > 
> > and Thomas credited you for that point in his announcement:
> > 
> >  " Roman pointed out that the penalty for some architectures
> >    would be quite big when using the nsec_t (64bit) scalar time
> >    storage format. "
> 
> "pointed out that the penalty" is a bit different from "provided the 
> basic idea of the ktime_t union and half the implementation"...

so ... did you change your position from accusing us of not giving you 
_any_ credit:

   "It's also interesting how you do that without giving me
    any credit for it."

to accusing us of not giving you _enough_ credit? Did i get that right?

And on top of that, you now want the credit for providing the basic idea 
for half of the ktimer/hrtimer implementation? Sorry that i did not find 
out in advance that you wanted _that_ ;-)

	Ingo

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-07 17:17         ` Roman Zippel
  2005-12-07 17:57           ` Ingo Molnar
@ 2005-12-07 18:02           ` Paul Baxter
  1 sibling, 0 replies; 74+ messages in thread
From: Paul Baxter @ 2005-12-07 18:02 UTC (permalink / raw)
  To: Roman Zippel, linux-kernel

> "pointed out that the penalty" is a bit different from "provided the
> basic idea of the ktime_t union and half the implementation"...
>
> bye, Roman

This is getting bloody ridiculous.

Roman, you won't get credited for every nuance of what you've said and done, 
neither will Ingo and Thomas and the many others that work hard to make 
Linux a better Operating System.

I admire the fact that you did pick up the gauntlet and produce code which 
has helped further the whole work.

I keep reading this thread because, against all odds, there is a lot of 
technical progress but the constant bickering really does your credibility 
no favours.

Please stop trying to portray yourself as a victim, those that care will 
form an opinion from your words. Please, please have the grace to rise above 
any perceived 'insults' both actual, and, in my view, mostly insubstantial.

A frustrated lurker, who can't wait for high resolution timers and maybe 
even high precision timers one day.

Thank you, Roman, for all your technical efforts.


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-07 17:57           ` Ingo Molnar
@ 2005-12-07 18:18             ` Roman Zippel
  0 siblings, 0 replies; 74+ messages in thread
From: Roman Zippel @ 2005-12-07 18:18 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, linux-kernel, Andrew Morton, rostedt, johnstul

Hi,

On Wed, 7 Dec 2005, Ingo Molnar wrote:

> so ... did you change your position from accusing us of not giving you 
> _any_ credit:
> 
>    "It's also interesting how you do that without giving me
>     any credit for it."
> 
> to accusing us of not giving you _enough_ credit? Did i get that right?
> 
> And on top of that, you now want the credit for providing the basic idea 
> for half of the ktimer/hrtimer implementation? Sorry that i did not find 
> out in advance that you wanted _that_ ;-)

Ingo, please stay serious and don't take things out of context.
I was just asking for some more/better credit for some of the ktime_t core 
ideas. I'll leave it to you now what you make of this, and I won't 
pursue this issue further.
Can we please now get back to the more important issues?

bye, Roman

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-07 12:40               ` Roman Zippel
@ 2005-12-07 23:12                 ` Nick Piggin
  0 siblings, 0 replies; 74+ messages in thread
From: Nick Piggin @ 2005-12-07 23:12 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Ingo Molnar, Andrew Morton, tglx, linux-kernel, rostedt, johnstul

Hi Roman,

Roman Zippel wrote:
> 
> Nick, thanks for speaking up about this.
> My mistake was to make a big deal out of it, because I knew it would 
> confuse more people. After I got the heat for this, it seems nobody else 
> wants to get flamed for it.
> 

I didn't mean to trivialise the issue. I think good naming is
important, however I added the disclaimer because of course I
didn't write any code, so my opinion didn't carry much weight
in that particular situation compared to you guys.

Thanks,
Nick

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-07  3:05     ` Roman Zippel
@ 2005-12-08  5:18       ` Paul Jackson
  2005-12-08  8:12         ` Ingo Molnar
  2005-12-08  9:26       ` Ingo Molnar
  1 sibling, 1 reply; 74+ messages in thread
From: Paul Jackson @ 2005-12-08  5:18 UTC (permalink / raw)
  To: Roman Zippel; +Cc: mingo, tglx, linux-kernel, akpm, rostedt, johnstul

> > > [...] So Thomas, please get over yourself and start talking.
> 
> I must say it's completely beyond me how this could be "insulting".

Well ... fools rush in where angels fear to tread ...

As you note, people had better not take comments on their code as
insulting, if they are going to survive for long on lkml.

However comments on the person that don't match that person's current
self image often cause distress, even if (sometimes -especially- if)
they are accurate.

This 'get over yourself' implies a comment on the person, not the code.
It suggests you think they are on a high horse.

Since Thomas (apparently) didn't think he was afflicted at the
moment with something he needed to 'get over', he probably found that
instruction annoying on first reading.

That he called it an 'insult' is an irrelevant detail.  He's just
saying he found it annoying to read, but like most of us, instead
of saying "I hurt", he's saying "dog bites."  Nevermind that it was
actually a cat.

There is an easy way around this however.  When I feel like telling
someone they are an idiot (or any other ad hominem comment less than
puppy dog happiness), I have better luck calling myself that, as in
"sorry for being such a stupid git, but ...".  Few people object
to the -other- person confessing to being a stupid rude bastard.
They just don't want themselves to be thought of that way, or anything
remotely resembling that.

As I recall, Linus has called himself a bastard more than once, with
just such good affect.

So just take all descriptions of other persons, and flip them around,
pretending to describe yourself.  It will be a bold faced lie, and
totally illogical ... but that's typical in the realm of human
emotions.  The human species is definitely one dorked up bunch.

Imagine that this subthread had begun "Sorry, Thomas, let me get off
my high horse and start talking ...".

Thanks, by the way, for your help back then on cpuset locking.  It was
invaluable.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-08  5:18       ` Paul Jackson
@ 2005-12-08  8:12         ` Ingo Molnar
  0 siblings, 0 replies; 74+ messages in thread
From: Ingo Molnar @ 2005-12-08  8:12 UTC (permalink / raw)
  To: Paul Jackson; +Cc: Roman Zippel, tglx, linux-kernel, akpm, rostedt, johnstul


* Paul Jackson <pj@sgi.com> wrote:

> So just take all descriptions of other persons, and flip them around, 
> pretending to describe yourself.  It will be a bold faced lie, and 
> totally illogical ... but that's typical in the realm of human 
> emotions.  The human species is definitely one dorked up bunch.

> "sorry for being such a stupid git, but ...".

it does not matter that it's a bold faced lie, people technically 'lie' 
about little things all the time and for a good reason - the human 
species (especially males) are way too aggressive by default, so a certain 
conscious buffer zone is needed to even that out. (It is in fact a 
medical condition if that buffer-zone does not exist.)

But there's a crucial difference. A statement from Linus (or you) like 
the one above also shows three more things that are important:

1) that you have a sense of humor, and that you dont take things too
   seriously :) Humor goes a long way defusing differences.

2) that you actually entertain the possibility of being wrong and that
   you do not just want to steamroll the other side with your opinion.

3) that you actually care about the other person. This matters a lot.
   There's a reason why hundreds of people who never met each other join
   on a mailing list and write code, and the reason is definitely not to
   have others piss on their code.

> Thanks, by the way, for your help back then on cpuset locking.  It was 
> invaluable.

i took some time and re-read your thread with Roman about cpusets back 
in September. It was like fresh air! Roman was totally reasonable and 
positive in that thread - so i dont think it's me or Thomas misreading 
his style in this case or something.

anyway, the reason i confronted Roman with the situation was in the 
renewed hope to improve direct communication. I'm still hopeful :)

	Ingo

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-07  3:05     ` Roman Zippel
  2005-12-08  5:18       ` Paul Jackson
@ 2005-12-08  9:26       ` Ingo Molnar
  2005-12-08 13:08         ` Roman Zippel
  1 sibling, 1 reply; 74+ messages in thread
From: Ingo Molnar @ 2005-12-08  9:26 UTC (permalink / raw)
  To: Roman Zippel; +Cc: tglx, linux-kernel, Andrew Morton, rostedt, johnstul


* Roman Zippel <zippel@linux-m68k.org> wrote:

> > to be able to comprehend what kind of mood we might be in when reading 
> > your emails these days, how about this little snippet from you, from the 
> > second email you wrote in the ktimers threads:
> > 
> > "First off, I can understand that you're rather upset with what I wrote,
> >  unfortunately you got overly defensive, so could you please next time
> >  not reply immediately and first sleep over it, an overly emotional
> >  reply is funny to read but not exactly useful."
> 
> Here we probably get to the root of the problem: we got off on the 
> wrong foot.

yes.

> In my first email I hadn't much good to say about the initial 
> announcement, but at any time it was meant technical. Anyone who 
> compares the first and the following announcement will notice the big 
> improvement.  Unfortunately Thomas seemed to have taken it rather 
> personally (although it never was meant that way) [...]

what you did was in essence to piss on his code, concepts and 
description. [oh, i wrote roughly half of ktimers, so you pissed on my 
code too ;-) ]

Here are a few snippets from you that show that most of the negative 
messaging from you was directed against the text Thomas wrote (or 
against Thomas), not against the code:

" How you get to these conclusions is still rather unclear, I don't even
  really know what the problem is from just reading the pretext. "

" What is seriously missing here is the big picture. "

" This no answer at all, you only repeat what you already said above. "

" The main problem with your text is that you jump from one topic to the
  next, making it impossible to create a coherent picture from it. "

" You don't think that having that much timers in first place is little 
  insane (especially if these are kernel timers)? "

" Later it becomes clear that you want high resolution timers, what 
  doesn't become clear is why it's such an essential feature that 
  everyone has to pay the price for it (this does not only include 
  changes in runtime behaviour, but also the proposed API changes). "

" It's nice that you're sure of it, but as long don't provide the means 
  to verify them, they are just assertions. "

note that these were pretty much out of the blue sky, and they pretty 
much set the stage. Given that Thomas is a volunteer too, he does not 
have to bear with what he senses as arbitrary abuse.

> [...] and I never got past this first impression and ever since I 
> can't get him back to a normal conversation.

maybe because your 'get him back to a normal conversation' attempt used 
precisely the same somewhat dismissive and somewhat derogatory tone and 
type of language that your initial mails were using? Past experience is 
extrapolated to the future, so even small negative messages get easily 
blown up, and the "cycle of violence" never stops, because nobody breaks 
the loop.

> > Insults like the following sentence in this very email:
> > 
> > > [...] So Thomas, please get over yourself and start talking.
> 
> I must say it's completely beyond me how this could be "insulting".  

maybe it is insulting because the "get over yourself" implicitly 
suggests that the fault is with Thomas?

Let me give you a few alternatives, that might have had a completely 
different effect and which do not contain any implicit insults:

 "So Thomas, I know we got on to the wrong footing, but lets start
  talking again."

or:

 "So Thomas, I know we had a couple of nasty exchanges in the past, but 
  lets bury the hatchet and try again. I apologize if I offended you in 
  any way in the past."

just try it, really. Even if it's a bold faced lie ;)

> This is my desperate attempt at getting any conversation started. If 
> Thomas isn't talking to me at all, I can't resolve any issue he might 
> have with me. [...]

Thomas wrote you 11 replies in 2.5 months, and some of those were 
extremely detailed. That's a far cry from not talking at all. He did try 
hard, he did get involved in a flamewar with you, which wasnt overly 
productive. But he is a volunteer, he has no obligation to waste time on 
flamewars.

> > let me be frank, and show you my initial reply that came to my mind when 
> > reading the above sentence: "who the f*ck do you think you are to talk 
> > to _anyone_ like that?". Now i'm usually polite and wont reply like 
> > that,...
> 
> You may not have said it openly like that, but this hostility was still 
> noticeable. You disagreed with me on minor issues and used the smallest 
> mistake to simply lecture me. From my point the attitude you showed 
> towards me is not much different from what you're accusing me of here.

yes, i definitely was not nice in a couple of mails, and i'd like to 
apologize for it.

> I'm not saying that I'm innocent about this, but any "insult" was 
> never intentional and I tried my best to correct any issues after we 
> got off on the wrong foot, but I obviously failed at that, I simply 
> never got past the initial impression.

ok, apology taken :)

> > in any case, from me you'll definitely get a reply to every positive or 
> > constructive question you ask in this thread, but you wont get many 
> > replies to mails that also include high-horse insults, question or 
> > statements.
> 
> Let's take the ptimer patches, I got _zero_ direct responses to it 
> [...]

well, direct communication with you has proven to be very unproductive a 
number of times, so what would have been the point? But we did mention 
lots of technical points in our subsequent mails, referring to your 
ptimers queue a number of times. We even adopted the ktime.h 
simplification idea and credited you for it.

also, what did you expect? Basically you came out with a patch-queue
based on ktimers, but you did not send any changes against the ktimers
patch itself, which made it very hard to map the real finegrained
changes you did to ktimers. You provided a writeup of differences, but
they did not fully cover the full scope of changes, relative to ktimers.
You based your queue on a weeks old version of ktimers, which also
raises the possibility that you were working on this for some time, in
private, for whatever reason. (Again, this is not a complaint - we are
happy you are communicating via code - whatever form that code is in.)

> [...] and it's difficult for me to understand how this could be taken 
> as "high-horse insult". As I obviously failed to make my criticism 
> understandable before, I produced these patches to provide a technical 
> base for a discussion of how this functionality could be merged in the 
> hopes of "Patches wont be ignored, i can assure you", unfortunately 
> they were.

they were not ignored - we mentioned the ptimer changes in our 
subsequent announcements, and we Cc:-ed you to those announcements so 
that you get it first-hand.

	Ingo

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-08  9:26       ` Ingo Molnar
@ 2005-12-08 13:08         ` Roman Zippel
  2005-12-08 15:36           ` Steven Rostedt
  0 siblings, 1 reply; 74+ messages in thread
From: Roman Zippel @ 2005-12-08 13:08 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: tglx, linux-kernel, Andrew Morton, rostedt, johnstul

Hi,

On Thu, 8 Dec 2005, Ingo Molnar wrote:

> Here are a few snippets from you that show that most of the negative 
> messaging from you was directed against the text Thomas wrote (or 
> against Thomas), not against the code:

Do you really think quoting me out of context is helping? From my 
perspective you're trying to show me now as the bad guy and I'm not 
accepting that. I don't know what you're trying to do, if you're trying to 
mediate, then you really suck at it, if you just want to piss me off, 
it's working great. :-(

Technically I still stand behind everything I said in that context, in the 
meantime I learned a few new things and I understand them better, so 
some things have become nonissues and I even changed my mind about some 
other things.
OTOH I'm the first to admit that I could have said things nicer, but mail 
is a rather bad channel to transport emotions and whatever I say can be 
taken badly. I really try my best to avoid this, but sometimes it's really 
hard, especially if I can't get past the initial resentment. I gladly 
apologize for any mistake I did and I'll do my best to learn from it, but 
I'm not going to make amends for it forever. At some point it would be 
really nice if you stopped to rub it in what an insensitive clod I am, I 
know that already. 
Ingo, if you want to help me, why don't you go with a good example ahead 
and I'll try to follow you. How about this?

> > > > [...] So Thomas, please get over yourself and start talking.
> > 
> > I must say it's completely beyond me how this could be "insulting".  
> 
> maybe it is insulting because the "get over yourself" implicitly 
> suggests that the fault is with Thomas?

This is a nice example, that _whatever_ I'm saying can be misunderstood. 
Why don't you even try to give me a little credit that above was not meant 
as insult? You make an assumption about what I said and you don't even 
give me a chance to correct myself.
Thomas obviously has some kind of problem with me and unless he starts to 
talk to me, I can't help him to get over whatever problem that is. I'm 
not going away, so we have to get along somehow and this means we have to 
_talk_.
Ingo, you only want to see the "get over yourself" part, whereas my 
emphasis was and is on "talking".

> just try it, really. Even if it's a bold faced lie ;)

I'm a bad liar and as long as I don't know what the problem is, I'll make 
the same mistake over and over. I have no intention of becoming a 
notorious liar.

> Thomas wrote you 11 replies in 2.5 months, and some of those were 
> extremely detailed. That's a far cry from not talking at all.

Some of it was indeed more verbose, but I never got very far with my 
followup questions. Thomas used very often a phrase like "we analyzed the 
problem and we came to the conclusion...". It's great that you and Thomas 
get so well along with each other, but I'm in the disadvantage that I lack 
the information context that you have. What is "extremely detailed" for 
you is lacking context to create a coherent picture for me, so it's 
sometimes really frustrating to pull some information out of you both.

> also, what did you expect? Basically you came out with a patch-queue
> based on ktimers, but you did not send any changes against the ktimers
> patch itself, which made it very hard to map the real finegrained
> changes you did to ktimers.

At the time I only had the huge ktimers patch from -mm to work with.
One primary target was to split out the core (without all the extra 
complexity and extra cleanups) into mergable pieces, which makes it a bit 
pointless to do it relative to this huge patch.
The other main target was the resolution handling, I tried very hard to 
explain the details of it and why I did them this way. A discussion about 
this would have required a _direct_ response, where you point out with 
what you disagree. Random comments in other mails are not helping at all.
The rest are some smaller patches, which are completely independent of 
hrtimer, but even for this I got no response except from Oleg.

> You provided a writeup of differences, but
> they did not fully cover the full scope of changes, relative to ktimers.

I've seen this claim now a few times, but why the hell don't you just ask 
about the things that you think were missing?

bye, Roman

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-08 13:08         ` Roman Zippel
@ 2005-12-08 15:36           ` Steven Rostedt
  0 siblings, 0 replies; 74+ messages in thread
From: Steven Rostedt @ 2005-12-08 15:36 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Ingo Molnar, tglx, linux-kernel, Andrew Morton, johnstul

On Thu, 2005-12-08 at 14:08 +0100, Roman Zippel wrote:
> Hi,
> 
> On Thu, 8 Dec 2005, Ingo Molnar wrote:
> 
> > Here are a few snippets from you that show that most of the negative 
> > messaging from you was directed against the text Thomas wrote (or 
> > against Thomas), not against the code:
> 
> Do you really think quoting me out of context is helping? From my 
> perspective you're trying to show me now as the bad guy and I'm not 
> accepting that. I don't know what you're trying to do, if you're trying to 
> mediate, then you really suck at it, if you just want to piss me off, 
> it's working great. :-(
> 

Ingo, Thomas, Roman, Please!!!! Lets all say sorry to each other and
start over. This thread is starting to become really annoying.  You
three are very intelligent, and I really want to know what you have to
say technically to each other, and that's why I'm not ignoring this
thread.  Yes, all of you were not nice to each other.  Suck it up and
forget about it.

How about this, if you want to flame each other, do it privately. But
when posting to the LKML, Talk technically only.  Don't even try to be
nice, since I'm not sure now any of you can do that to each other.  But
also leave out _any_ personal comments.  So talk _only_ about the code.
If you don't like it, give a technical reason why.

Remember, the rest of the world is watching here.

Thank you,

-- Steve




^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-07 12:18     ` Roman Zippel
  2005-12-07 16:55       ` Ingo Molnar
@ 2005-12-09 17:23       ` Thomas Gleixner
  2005-12-12 13:39         ` Roman Zippel
  1 sibling, 1 reply; 74+ messages in thread
From: Thomas Gleixner @ 2005-12-09 17:23 UTC (permalink / raw)
  To: Roman Zippel; +Cc: linux-kernel, Andrew Morton, rostedt, johnstul, mingo

Hi,

sorry for the late reply. I was travelling and cut off the net for a
while.

On Wed, 2005-12-07 at 13:18 +0100, Roman Zippel wrote:
> > It might turn out in the rework of the hrt bits that the list can go
> > away, but this is a nobrainer to do. (The very first unpublished version
> > of ktimers had no list, it got introduced during the hrt addon and
> > stayed there to make the hrt patch less intrusive.)
> 
> If it's such a nobrainer to remove it, why don't you add it later? As it 
> is right now, it's not needed and nobody but you knows what it's good for. 
> This way you make it only harder to review the code, if one stumbles over 
> these pieces all the time.

Well, if it makes it simpler for you to review the code.

But can you please explain to a code review unaware kernel developer
newbie like me, why this makes a lot of difference and why it makes it
so much easier to review ?

Actually the change adds more code lines and removes one field of the
hrtimer structure, but it has exactly the same functionality: Fast
access to the first expiring timer without walking the rb_tree.


 include/linux/hrtimer.h |    7 ++-----
 kernel/hrtimer.c        |   26 ++++++++++++++------------
 2 files changed, 16 insertions(+), 17 deletions(-)


Index: linux-2.6.15-rc5/include/linux/hrtimer.h
===================================================================
--- linux-2.6.15-rc5.orig/include/linux/hrtimer.h
+++ linux-2.6.15-rc5/include/linux/hrtimer.h
@@ -49,8 +49,6 @@ struct hrtimer_base;
  * struct hrtimer - the basic hrtimer structure
  *
  * @node:	red black tree node for time ordered insertion
- * @list:	list head for easier access to the time ordered list,
- *		without walking the red black tree.
  * @expires:	the absolute expiry time in the hrtimers internal
  *		representation. The time is related to the clock on
  *		which the timer is based.
@@ -63,7 +61,6 @@ struct hrtimer_base;
  */
 struct hrtimer {
 	struct rb_node		node;
-	struct list_head	list;
 	ktime_t			expires;
 	enum hrtimer_state	state;
 	int			(*function)(void *);
@@ -78,7 +75,7 @@ struct hrtimer {
  *		to a base on another cpu.
  * @lock:	lock protecting the base and associated timers
  * @active:	red black tree root node for the active timers
- * @pending:	list of pending timers for simple time ordered access
+ * @first:	pointer to the timer node which expires first
  * @resolution:	the resolution of the clock, in nanoseconds
  * @get_time:	function to retrieve the current time of the clock
  * @curr_timer:	the timer which is executing a callback right now
@@ -87,7 +84,7 @@ struct hrtimer_base {
 	clockid_t		index;
 	spinlock_t		lock;
 	struct rb_root		active;
-	struct list_head	pending;
+	struct rb_node		*first;
 	unsigned long		resolution;
 	ktime_t			(*get_time)(void);
 	struct hrtimer		*curr_timer;
Index: linux-2.6.15-rc5/kernel/hrtimer.c
===================================================================
--- linux-2.6.15-rc5.orig/kernel/hrtimer.c
+++ linux-2.6.15-rc5/kernel/hrtimer.c
@@ -313,7 +313,6 @@ hrtimer_forward(struct hrtimer *timer, c
 static void enqueue_hrtimer(struct hrtimer *timer, struct hrtimer_base *base)
 {
 	struct rb_node **link = &base->active.rb_node;
-	struct list_head *prev = &base->pending;
 	struct rb_node *parent = NULL;
 	struct hrtimer *entry;
 
@@ -329,22 +328,23 @@ static void enqueue_hrtimer(struct hrtim
 		 */
 		if (timer->expires.tv64 < entry->expires.tv64)
 			link = &(*link)->rb_left;
-		else {
+		else
 			link = &(*link)->rb_right;
-			prev = &entry->list;
-		}
 	}
 
 	/*
-	 * Insert the timer to the rbtree and to the sorted list:
+	 * Insert the timer to the rbtree and check whether it
+	 * replaces the first pending timer
 	 */
 	rb_link_node(&timer->node, parent, link);
 	rb_insert_color(&timer->node, &base->active);
-	list_add(&timer->list, prev);
 
 	timer->state = HRTIMER_PENDING;
-}
 
+	if (!base->first || timer->expires.tv64 <
+	    rb_entry(base->first, struct hrtimer, node)->expires.tv64)
+		base->first = &timer->node;
+}
 
 /*
  * __remove_hrtimer - internal function to remove a timer
@@ -354,9 +354,11 @@ static void enqueue_hrtimer(struct hrtim
 static void __remove_hrtimer(struct hrtimer *timer, struct hrtimer_base *base)
 {
 	/*
-	 * Remove the timer from the sorted list and from the rbtree:
+	 * Remove the timer from the rbtree and replace the
+	 * first entry pointer if necessary.
 	 */
-	list_del(&timer->list);
+	if (base->first == &timer->node)
+		base->first = rb_next(&timer->node);
 	rb_erase(&timer->node, &base->active);
 }
 
@@ -528,16 +530,17 @@ int hrtimer_get_res(const clockid_t whic
 static inline void run_hrtimer_queue(struct hrtimer_base *base)
 {
 	ktime_t now = base->get_time();
+	struct rb_node *node;
 
 	spin_lock_irq(&base->lock);
 
-	while (!list_empty(&base->pending)) {
+	while ((node = base->first)) {
 		struct hrtimer *timer;
 		int (*fn)(void *);
 		int restart;
 		void *data;
 
-		timer = list_entry(base->pending.next, struct hrtimer, list);
+		timer = rb_entry(node, struct hrtimer, node);
 		if (now.tv64 <= timer->expires.tv64)
 			break;
 
@@ -590,7 +593,6 @@ static void __devinit init_hrtimers_cpu(
 
 	for (i = 0; i < MAX_HRTIMER_BASES; i++) {
 		spin_lock_init(&base->lock);
-		INIT_LIST_HEAD(&base->pending);
 		base++;
 	}
 }
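
For readers without a kernel tree at hand, the idea behind the patch above
can be sketched in plain userspace C. This is only an illustration under
stated assumptions: it uses an unbalanced binary search tree with parent
pointers instead of the kernel rbtree, and the names tnode, tbase,
tbase_enqueue and tbase_remove are hypothetical, not part of hrtimer. What
it shows is the bookkeeping of the cached "first" pointer on insert and
remove:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct hrtimer / struct hrtimer_base. */
struct tnode {
	struct tnode *parent, *left, *right;
	int64_t expires;
};

struct tbase {
	struct tnode *root;
	struct tnode *first;	/* cached earliest-expiring node, or NULL */
};

/* In-order successor, the role rb_next() plays in the patch. */
static struct tnode *tnode_next(struct tnode *n)
{
	if (n->right) {
		n = n->right;
		while (n->left)
			n = n->left;
		return n;
	}
	while (n->parent && n == n->parent->right)
		n = n->parent;
	return n->parent;
}

/* Insert by expiry time and update the cached first pointer. */
static void tbase_enqueue(struct tbase *base, struct tnode *t)
{
	struct tnode **link, *parent = NULL;

	assert(base && t);
	link = &base->root;
	while (*link) {
		parent = *link;
		link = t->expires < parent->expires ?
			&parent->left : &parent->right;
	}
	t->parent = parent;
	t->left = t->right = NULL;
	*link = t;

	if (!base->first || t->expires < base->first->expires)
		base->first = t;
}

/* Replace the subtree rooted at u with the subtree rooted at v. */
static void transplant(struct tbase *base, struct tnode *u, struct tnode *v)
{
	if (!u->parent)
		base->root = v;
	else if (u == u->parent->left)
		u->parent->left = v;
	else
		u->parent->right = v;
	if (v)
		v->parent = u->parent;
}

/* Advance the cached first pointer if needed, then unlink the node. */
static void tbase_remove(struct tbase *base, struct tnode *t)
{
	if (base->first == t)
		base->first = tnode_next(t);

	if (!t->left) {
		transplant(base, t, t->right);
	} else if (!t->right) {
		transplant(base, t, t->left);
	} else {
		struct tnode *s = tnode_next(t); /* leftmost of right subtree */

		if (s->parent != t) {
			transplant(base, s, s->right);
			s->right = t->right;
			s->right->parent = s;
		}
		transplant(base, t, s);
		s->left = t->left;
		s->left->parent = s;
	}
}
```

As in the patch, the successor is looked up before the node is erased, so
reading base->first stays O(1) and the tree only has to be walked when the
earliest timer itself is removed.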


> > > Nice, that you finally also come to that conclusion, after I said that 
> > > already for ages. (It's also interesting how you do that without 
> > > giving me any credit for it.)
> > 
> > Sorry if it was previously your idea and if we didnt credit you for it.
> > I did not keep track of each word said in these endless mail threads. We
> > credited every suggestion and idea which we picked up from you, see our
> > previous emails. If we missed one, it was definitely not intentional.
> 
> http://marc.theaimsgroup.com/?l=linux-kernel&m=112755827327101
> http://marc.theaimsgroup.com/?l=linux-kernel&m=112760582427537
> 
> A bit later ktime_t looked pretty much like the 64bit part of my ktimespec.
> I don't want to imply any intention, but please try to see this from my 
> perspective, after this happens a number of times.

I have seen your and Ingo's conversation on that and I dont want to add
more flames into this.

> > You define that absolute interval timers on clock realtime change their
> > behaviour when the initial time value is expired into relative timers
> > which are not affected by time settings. I have no idea how you came to
> > that interpretation of the spec. I completely disagree [but if you would
> > like I can go into more detail why I think it's wrong.]
> 
> Please do. I explained it in one of my patches:
> 
> [PATCH 5/9] remove relative timer from abs_list
> 
> When an absolute timer expires, it becomes a relative timer, so remove
> it from the abs_list.  The TIMER_ABSTIME flag for timer_settime()
> changes the interpretation of the it_value member, but it_interval is
> always a relative value and clock_settime() only affects absolute time
> services.

This is your interpretation and I disagree.

If I set up a timer with a 24 hour interval, which should go off
everyday at 6:00 AM, then I expect that this timer does this even when
the clock is set e.g. by daylight saving. I think, that this is a
completely valid interpretation and makes a lot of sense from a
practical point of view. The existing implementation does it that way
already, so why do we want to change this ?
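
The difference between the two interpretations can be put into numbers
with a small sketch. It is plain arithmetic, not kernel code, and rests on
assumptions: times are seconds of wall-clock time, the clock is stepped
exactly once by "step" seconds after the first expiry, and the helper
names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define SECS_PER_DAY 86400

/* Wall-clock time of day (seconds since midnight) for a wall time. */
static int64_t time_of_day(int64_t wall)
{
	assert(wall >= 0);
	return wall % SECS_PER_DAY;
}

/*
 * Interval stays tied to the realtime clock: expiry k is due when the
 * wall clock reads first_wall + k * interval, no matter how the clock
 * was stepped in between.
 */
static int64_t wall_at_expiry_absolute(int64_t first_wall, int64_t interval,
				       int64_t k, int64_t step)
{
	(void)step;	/* a clock step does not move the wall-clock target */
	return first_wall + k * interval;
}

/*
 * Timer turns relative after its first expiry: later expiries are a
 * fixed amount of elapsed time apart, so a clock step after the first
 * expiry shifts the wall-clock reading at which they fire.
 */
static int64_t wall_at_expiry_relative(int64_t first_wall, int64_t interval,
				       int64_t k, int64_t step)
{
	return first_wall + k * interval + (k > 0 ? step : 0);
}
```

With first_wall at 6:00, a 24 hour interval and the clock set forward by
one hour, the absolute variant fires again when the wall clock reads 6:00,
the relative variant when it reads 7:00.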

Also you treat the interval relative to the current time of the callback
function:

timer->expires = ktime_add(timer->base->last_expired,
					   timr->it.real.incr);

This leads to a summing up error and even if the result is similar to
the summing up error of the current vanilla implementation I prefer a
solution which adds the interval to the previous set expiry time

timer->expires = ktime_add(timer->expires,
	        		   timr->it.real.incr);

The spec says:
"Also note that some implementations may choose to adjust time and/or
interval values to exactly match the ticks of the underlying clock."

So there is no requirement to do so. Of course you may, but this takes
simply the name "precision" ad absurdum.
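
The two rearm strategies can be compared with a small simulation. This is
a generic sketch, not the ptimer or hrtimer code: times are plain
nanosecond counts, latency[] is an assumed per-period delay between the
programmed expiry and the moment the callback runs, and all function
names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rearm relative to the time at which the callback actually ran. */
static int64_t rearm_from_now(int64_t now, int64_t interval)
{
	return now + interval;
}

/* Rearm relative to the previously programmed expiry time. */
static int64_t rearm_from_expiry(int64_t expires, int64_t interval)
{
	return expires + interval;
}

/*
 * Run n periods and return how far the next programmed expiry is from
 * the ideal grid start + (n + 1) * interval.
 */
static int64_t simulate_drift(int from_now, int64_t start, int64_t interval,
			      const int64_t *latency, size_t n)
{
	int64_t expires = start + interval;	/* first expiry */
	size_t i;

	for (i = 0; i < n; i++) {
		int64_t now;

		assert(latency[i] >= 0);
		now = expires + latency[i];	/* callback runs late */
		expires = from_now ? rearm_from_now(now, interval)
				   : rearm_from_expiry(expires, interval);
	}
	return expires - (start + (int64_t)(n + 1) * interval);
}
```

With latencies of 3, 1 and 4 units, rearming from "now" ends up 8 units
off the grid (the sum of the latencies), while rearming from the previous
expiry stays on it.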

> > Beside of that, the implementation is also completely broken. (You
> > rebase the timer from the realtime base to the monotonic base inside of
> > the timer callback function. On return you lock the realtime base and
> > enqueue the timer into the realtime queue, but the base field of the
> > timer points to the monotonic queue. It needs not much phantasy to get
> > this exploited into a crash.)
> 
> If you provide the wrong parameters, you can crash a lot of stuff in the 
> kernel. "exploited" usually implies it can be abused from userspace, which 
> is not the case here.

Thanks for teaching me what an "exploit" usually means!

I intentionally wrote "exploited into a crash".

How do you think I got this to crash ? By hacking up some complex kernel
code ? No, simply by running my unmodified test scripts from user space
with completely valid and correct parameters. Of course its also
possible to write a program which actually exploits this.

The implementation is simply broken. Can you just accept this ?

> > Furthermore, your implementation is calculating the next expiry value
> > based on the current time of the expiration rather than on the previous
> > expected expiry time, which would be the natural thing to do. This
> > detail also explains the system-load dependent random drifting of
> > ptimers quite well.
> 
> Is this conclusion based on actual testing? The behaviour of ptimer should 
> be quite close to the old jiffie based timer, so I'm a bit at a loss here, 
> how you get to this conclusion. Please provide more details.

It is based on testing. Do you think I pulled numbers out of my nose ? 

But I have to admit that I did not look close enough into your code and
so I missed the ptimer_run_queue call inside of the lost jiffie loop.
Sorry, my conclusion was wrong. 

The problem seems to be related to the rebase code, which leads to a
wrong expiry value for clock realtime interval timers with the ABSTIME
flag set.

> > The changes you did to the timer locking code (also in timer.c) are racy
> > and simply exposable. Oleg's locking implementation is there for a good
> > reason.
> 
> Thomas, bringing up this issue is really weak. With Oleg's help it's 
> already solved, you don't have to warm it up. :(

I did not warm anything up. I was not aware that Oleg jumped already in
on this - I was not cc'ed and I really did not pay much attention on
LKML during this time. 
I'm familiar enough with locking, that I can recognize such a problem on
my own.

> > Neither do I understand the jiffie boundness you re-introduced all over
> > the place. The softirq code is called once per jiffy and the expiry is
> > checked versus the current time. Basing a new design on jiffies, where
> > the design intends to be easily extensible to high resolution clocks, is
> > wrong IMNSHO. Doing a high resolution extension on top of it is just
> > introducing a lot of #ifdef mess in places where none has to be. We had
> > that before, and dont want to go back there.
> 
> I don't understand where you get this from, I explicitly said that higher 
> resolution requires a better clock abstraction, basically any place which 
> mentions TICK_NSEC has to be cleaned up like this. I'm at a loss why you 
> think this requires "a lot of #ifdef mess".

Why do you need all this jiffie stuff in the first place? It is not
necessary at all. The hrtimer code does not contain a single reference
to jiffies and therefore it does not need anything to clean up. I
consider even a single high resolution timer related #ifdef outside of
hrtimer.c and the clock event abstraction as an unnecessary mess. Sure
you can replace the TICK_NSEC and ktime_to_jiffie stuff completely, but
I still do not see the point why it is necessary to put it there first.
It just makes it overly complex to review and understand :)

I'm happy that we at least agree that we need better clock abstraction
layers. How do you think does our existing high resolution timer
implementation work ? While you explicitly said that it is required, we
explicitly used exactly such a mechanism from the very first day.

Please stop your absurd schoolmasterly attempts to teach me stuff which
I'm well aware of. Can you please accept that I know exactly what I'm
talking about?

> Anyway, thanks for finally responding, there seem to have piled up a 
> number of misconceptions, please give it some time to clear up.

Roman, I have no interest and no intention to spend any more of my
private time on a discussion like this.

I always was and I'm still up for a technical discussion and
cooperation. I'm not vengeful at all and if I ever meet you in person,
the first beers at the bar are on my bill.

But I seriously will ignore you completely, if you keep this tone and
attitude with me.

I'm well aware that LKML is not a nun convent, but the basic rules of
human behaviour and respect still apply.

	tglx



^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-09 17:23       ` Thomas Gleixner
@ 2005-12-12 13:39         ` Roman Zippel
  2005-12-12 16:42           ` Thomas Gleixner
  0 siblings, 1 reply; 74+ messages in thread
From: Roman Zippel @ 2005-12-12 13:39 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, Andrew Morton, rostedt, johnstul, mingo

Hi,

On Fri, 9 Dec 2005, Thomas Gleixner wrote:

> Actually the change adds more code lines and removes one field of the
> hrtimer structure, but it has exactly the same functionality: Fast
> access to the first expiring timer without walking the rb_tree.

Together with the state field this would save 12 bytes, which is IMO very 
well worth considering.
You seem to have some plans for it, the best hint I've found for it is:

+ (This seperate list is also useful for high-resolution timers where we
+ need seperate pending and expired queues while keeping the time-order
+ intact.)"

Could you please elaborate on this?

> > [PATCH 5/9] remove relative timer from abs_list
> > 
> > When an absolute timer expires, it becomes a relative timer, so remove
> > it from the abs_list.  The TIMER_ABSTIME flag for timer_settime()
> > changes the interpretation of the it_value member, but it_interval is
> > always a relative value and clock_settime() only affects absolute time
> > services.
> 
> This is your interpretation and I disagree.
> 
> If I set up a timer with a 24 hour interval, which should go off
> everyday at 6:00 AM, then I expect that this timer does this even when
> the clock is set e.g. by daylight saving. I think, that this is a
> completely valid interpretation and makes a lot of sense from a
> practical point of view. The existing implementation does it that way
> already, so why do we want to change this ?

I don't know whether this behaviour was intentional and why it was done 
this way, so I did this patch to initiate a discussion about this.

I wouldn't say a 1 day interval timer is a very realistic example and the 
old timer wouldn't be very precise for this.
The rationale for example talks about "a periodic timer with an absolute 
_initial_ expiration time", so I could also construct a valid example with 
this expectation. The more I read the spec the more I think the current 
behaviour is not correct, e.g. that ABS_TIME is only relevant for 
it_value.
So I'm interested in specific interpretations of the spec which support 
the current behaviour.

> Also you treat the interval relative to the current time of the callback
> function:
> 
> timer->expires = ktime_add(timer->base->last_expired,
> 					   timr->it.real.incr);
> 
> This leads to a summing up error and even if the result is similar to
> the summing up error of the current vanilla implementation I prefer a
> solution which adds the interval to the previous set expiry time
> 
> timer->expires = ktime_add(timer->expires,
> 	        		   timr->it.real.incr);
> 
> The spec says:
> "Also note that some implementations may choose to adjust time and/or
> interval values to exactly match the ticks of the underlying clock."
> 
> So there is no requirement to do so. Of course you may, but this takes
> simply the name "precision" ad absurdum.
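To make the difference between the two re-arm variants quoted above concrete, here is a minimal userspace sketch. It is not code from either patch set: ns_add() is a hypothetical stand-in for ktime_add() on plain nanosecond counts, and a constant per-expiry processing latency is assumed purely for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for ktime_add(), working on plain nanoseconds. */
static int64_t ns_add(int64_t a, int64_t b)
{
    return a + b;
}

/*
 * Expiry after n periods when every re-arm starts from the time the
 * previous expiry was actually processed (nominal expiry + latency_ns).
 * The latency is added once per period, so it accumulates.
 */
static int64_t rearm_from_processing_time(int64_t first_ns, int64_t interval_ns,
                                          int64_t latency_ns, unsigned int n)
{
    int64_t expires = first_ns;

    for (unsigned int i = 0; i < n; i++)
        expires = ns_add(ns_add(expires, latency_ns), interval_ns);
    return expires;
}

/*
 * Expiry after n periods when every re-arm starts from the previously
 * programmed expiry time. Processing latency never enters the sum.
 */
static int64_t rearm_from_previous_expiry(int64_t first_ns, int64_t interval_ns,
                                          unsigned int n)
{
    int64_t expires = first_ns;

    for (unsigned int i = 0; i < n; i++)
        expires = ns_add(expires, interval_ns);
    return expires;
}
```

Under these assumptions, a 10 ms interval with 50 us of latency per expiry is 5 ms off the ideal timeline after 100 periods in the first variant, while the second variant stays on it.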

Your current implementation contradicts the requirement that values should 
be rounded up to the resolution of the timer, that's exactly what my 
implementation does. The resolution of the timer is currently TICK_NSEC 
(+- ntp correction) and one expiry of it should only cause at most one 
expiry of all pending timers. If I set a 1msec timer in your implementation 
(with HZ=250), I automatically get 3 overruns, even though the timer 
really did only expire once.
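The overrun count in this example can be checked with a small sketch. It is only an illustration under stated assumptions: count_overruns() is a hypothetical helper, not a kernel function, and the timer is assumed to be looked at only when the tick fires.

```c
#include <stdint.h>

/*
 * Hypothetical helper: a periodic timer with period interval_ns was due at
 * expires_ns but is only noticed when the tick fires at now_ns. Without
 * rounding the interval up to the tick length, every further whole interval
 * that fits into the gap is accounted as an overrun.
 */
static uint64_t count_overruns(uint64_t expires_ns, uint64_t now_ns,
                               uint64_t interval_ns)
{
    if (interval_ns == 0 || now_ns < expires_ns)
        return 0;
    return (now_ns - expires_ns) / interval_ns;
}
```

With HZ=250 the tick is 4 ms. A 1 ms timer armed at t=0 is due at 1 ms but first seen at the 4 ms tick, so count_overruns(1000000, 4000000, 1000000) yields 3. With the interval rounded up to the 4 ms tick, count_overruns(4000000, 4000000, 4000000) yields 0.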

Since you don't do any rounding at all anymore, your timer may now expire 
early with low resolution clocks (the old jiffies + 1 problem I explained 
in my ktime_t patch).

Also in the ktimer patch you wrote:

+- also, there is an application surprise factor, the 'do not round
+  intervals' technique can lead to the following sample sequence of
+  events:
+
+    Interval:   1.7ms
+    Resolution: 1ms
+
+    Event timeline:
+
+     2ms - 4ms - 6ms - 7ms - 9ms - 11ms - 12ms - 14ms - 16ms - 17ms ...
+
+  this 2,2,1,2,2,1...msec 'unpredictable and uneven' relative distance
+  of events could surprise applications.

But this is now exactly the behaviour your timer has, why is it not 
"surprising" anymore?
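The quoted event timeline can be reproduced with a few lines of integer arithmetic. This is a sketch, not code from the patches; the helper names are made up, times are in microseconds, and the timer is assumed to start at t=0 with events delivered on the first resolution boundary at or after the exact expiry.

```c
#include <stdint.h>

/* Round value up to the next multiple of resolution (resolution > 0). */
static uint64_t round_up_to(uint64_t value, uint64_t resolution)
{
    return ((value + resolution - 1) / resolution) * resolution;
}

/*
 * Delivery time of the n-th event (n starts at 1) when the interval itself
 * is not rounded: the exact expiry n * interval_us is kept, and the event
 * is delivered on the next resolution boundary.
 */
static uint64_t unrounded_event_us(uint64_t n, uint64_t interval_us,
                                   uint64_t resolution_us)
{
    return round_up_to(n * interval_us, resolution_us);
}
```

For interval_us = 1700 and resolution_us = 1000, n = 1..10 gives 2000, 4000, 6000, 7000, 9000, 11000, 12000, 14000, 16000, 17000, which is the 2,2,1,2,2,1... msec pattern from the quoted text.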

> > > Beside of that, the implementation is also completely broken. (You
> > > rebase the timer from the realtime base to the monotonic base inside of
> > > the timer callback function. On return you lock the realtime base and
> > > enqueue the timer into the realtime queue, but the base field of the
> > > timer points to the monotonic queue. It does not take much imagination to get
> > > this exploited into a crash.)
> > 
> > If you provide the wrong parameters, you can crash a lot of stuff in the 
> > kernel. "exploited" usually implies it can be abused from userspace, which 
> > is not the case here.
> 
> Thanks for teaching me what an "exploit" usually means! 
> 
> I intentionally wrote "exploited into a crash".
> 
> How do you think I got this to crash ? By hacking up some complex kernel
> code ? No, simply by running my unmodified test scripts from user space
> with completely valid and correct parameters. Of course it's also
> possible to write a program which actually exploits this.
> 
> The implementation is simply broken. Can you just accept this ?

I can accept that you found a bug, but for "simply broken" I'm not convinced 
yet.
Sorry, I have not been specific enough, I disagree with your analysis 
above. On return the timer isn't requeued into the realtime queue at all, 
so this can't be the reason for the crash. I guess it's more likely you 
managed to trigger the locking bug.

> > > Furthermore, your implementation is calculating the next expiry value
> > > based on the current time of the expiration rather than on the previous
> > > expected expiry time, which would be the natural thing to do. This
> > > detail also explains the system-load dependent random drifting of
> > > ptimers quite well.
> > 
> > Is this conclusion based on actual testing? The behaviour of ptimer should 
> > be quite close to the old jiffie based timer, so I'm a bit at a loss here, 
> > how you get to this conclusion. Please provide more details.
> 
> It is based on testing. Do you think I pulled numbers out of my nose ? 

Jeez, sorry for asking. :(
You didn't specify anywhere how you got to this conclusion, so I could 
reproduce it myself. Could you please elaborate on this "system-load 
dependent random drifting"?

> > I don't understand where you get this from, I explicitly said that higher 
> > resolution requires a better clock abstraction, basically any place which 
> > mentions TICK_NSEC has to be cleaned up like this. I'm at a loss why you 
> > think this requires "a lot of #ifdef mess".
> 
> Why do you need all this jiffie stuff in the first place? It is not
> necessary at all. The hrtimer code does not contain a single reference
> of jiffies and therefore it does not need anything to clean up. I
> consider even a single high resolution timer related #ifdef outside of
> hrtimer.c and the clock event abstraction as an unnecessary mess. Sure
> you can replace the TICK_NSEC and ktime_to_jiffie stuff completely, but
> I still do not see the point why it is necessary to put it there first.
> It just makes it overly complex to review and understand :)

In this regard I had two major goals: a) keep it as simple as possible, b) 
preserve the current behaviour and I still think I found the best 
compromise so far. This would allow us to first merge the basic 
infrastructure, while reducing the risk of breaking anything.

I don't mind changing the behaviour, but I would prefer to do this in a 
separate step and with an analysis of the possible consequences. This is 
not just about posix-timers, but it also affects itimers, nanosleep and 
possibly other systems in the future. Actually my main focus is not on HR 
posix timer, my main interest is that everything else keeps working and 
doesn't have to pay the price for it.

It's rather likely that if there is a subtle change in behaviour, which 
causes something to break, it's not noticed until it hits a release 
kernel, so I think it's very well worth it to understand and document the 
differences between the implementations.

> Please stop your absurd schoolmasterly attempts to teach me stuff which
> I'm well aware of. Can you please accept that I know exactly what I'm
> talking about?

Sure, I can. I'm sorry I tried to explain things you already know, but if 
you know these things already, then please show it. At this point I'm 
mostly still trying to understand, why you did certain things and 
sometimes I explain things from my perspective in the hopes you would fill 
in the holes from your perspective.

You mostly just post your patches and only explain the conclusion; you 
keep it rather short on how you get to these conclusions, e.g. what other 
alternatives you've already considered. This makes it hard for me to 
figure out what you know exactly from what you're talking about.

> But I seriously will ignore you completely, if you keep this tone and
> attitude with me.

Jeez, cut me some slack, would you? Especially in the last mail I mostly 
just asked for more information. You read something into my mails, that is 
simply not there.

bye, Roman


* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-12 13:39         ` Roman Zippel
@ 2005-12-12 16:42           ` Thomas Gleixner
  2005-12-12 18:37             ` Thomas Gleixner
                               ` (2 more replies)
  0 siblings, 3 replies; 74+ messages in thread
From: Thomas Gleixner @ 2005-12-12 16:42 UTC (permalink / raw)
  To: Roman Zippel; +Cc: linux-kernel, Andrew Morton, rostedt, johnstul, mingo

Hi,

On Mon, 2005-12-12 at 14:39 +0100, Roman Zippel wrote:
> > Actually the change adds more code lines and removes one field of the
> > hrtimer structure, but it has exactly the same functionality: Fast
> > access to the first expiring timer without walking the rb_tree.
> 
> Together with the state field this would save 12 bytes, which is IMO very 
> well worth considering.
> You seem to have some plans for it, the best hint I've found for it is:
> 
> + (This separate list is also useful for high-resolution timers where we
> + need separate pending and expired queues while keeping the time-order
> + intact.)"
> 
> Could you please elaborate on this?

Sure. I have already removed the list_head for the non high resolution
case as it turned out that it does not hurt the high resolution
implementation.

For the high resolution implementation we have to move the expired
timers to a separate list, as we do not want to do complex callback
functions from the event interrupt itself. But we have to reprogram the
next event interrupt, so we need simple access to the timer which
expires first.

The initial implementation did simply move the timer from the pending
list to the expired list without doing the rb_tree removal inside of the
event interrupt handler. That way the next event for reprogramming was
the first event in the pending list.

The new rebased version with the pending list removed does the rb_tree
removal inside the event interrupt and enqueues the timer, for which the
callback function has to be executed in the softirq, to the expired
list. One exception are simple wakeup callback functions, as they are
reasonably fast and we save two context switches. The next event for
reprogramming the event interrupt is retrieved by the pointer in the
base structure.

This way the list head is only necessary for the high resolution case.

The state field is not removed

> > > [PATCH 5/9] remove relative timer from abs_list
> > > 
> > > When an absolute timer expires, it becomes a relative timer, so remove
> > > it from the abs_list.  The TIMER_ABSTIME flag for timer_settime()
> > > changes the interpretation of the it_value member, but it_interval is
> > > always a relative value and clock_settime() only affects absolute time
> > > services.
> > 
> > This is your interpretation and I disagree.
> > 
> > If I set up a timer with a 24 hour interval, which should go off
> > everyday at 6:00 AM, then I expect that this timer does this even when
> > the clock is set e.g. by daylight saving. I think, that this is a
> > completely valid interpretation and makes a lot of sense from a
> > practical point of view. The existing implementation does it that way
> > already, so why do we want to change this ?
> 
> I don't know whether this behaviour was intentional and why it was done 
> this way, so I did this patch to initiate a discussion about this.

Ok.

> I wouldn't say a 1 day interval timer is a very realistic example and the 
> old timer wouldn't be very precise for this.

Sure, as all comparisons are flawed. I just used a simple example to
illustrate my POV.

> The rationale for example talks about "a periodic timer with an absolute 
> _initial_ expiration time", so I could also construct a valid example with 
> this expectation. The more I read the spec the more I think the current 
> behaviour is not correct, e.g. that ABS_TIME is only relevant for 
> it_value.
> So I'm interested in specific interpretations of the spec which support 
> the current behaviour.

Unfortunately you find just the spec all over the place. I fear we have
to find and agree on an interpretation ourselves.

I agree, that the restriction to the initial it_value is definitely
something you can read out of the spec. But it does not make a lot of
sense for me. Also the restriction to TIMER_ABSTIME is somehow strange
as it converts a CLOCK_REALTIME timer to a CLOCK_MONOTONIC timer. I
never understood the rationale behind that.

> > The spec says:
> > "Also note that some implementations may choose to adjust time and/or
> > interval values to exactly match the ticks of the underlying clock."
> > 
> > So there is no requirement to do so. Of course you may, but this takes
> > simply the name "precision" ad absurdum.
> 
> Your current implementation contradicts the requirement that values should 
> be rounded up to the resolution of the timer, that's exactly what my 
> implementation does. The resolution of the timer is currently TICK_NSEC 
> (+- ntp correction) and one expiry of it should only cause at most one 
> expiry of all pending timers. If I set a 1msec timer in your implementation 
> (with HZ=250), I automatically get 3 overruns, even though the timer 
> really did only expire once.

Damn, you are right. We did not take this into account.

> Since you don't do any rounding at all anymore, your timer may now expire 
> early with low resolution clocks (the old jiffies + 1 problem I explained 
> in my ktime_t patch).

It does not expire early. The timer->expires field is still compared
against now. 

> Also in the ktimer patch you wrote:
> 
> +- also, there is an application surprise factor, the 'do not round
> +  intervals' technique can lead to the following sample sequence of
> +  events:
> +
> +    Interval:   1.7ms
> +    Resolution: 1ms
> +
> +    Event timeline:
> +
> +     2ms - 4ms - 6ms - 7ms - 9ms - 11ms - 12ms - 14ms - 16ms - 17ms ...
> +
> +  this 2,2,1,2,2,1...msec 'unpredictable and uneven' relative distance
> +  of events could surprise applications.
> 
> But this is now exactly the behaviour your timer has, why is it not 
> "surprising" anymore?

Yes, we wrote that before. After reconsidering the results we came to
the conclusion that we actually don't need the rounding at all, because
the uneven distance is just as surprising as the summing up errors
introduced by rounding.

> I can accept that you found a bug, but for "simply broken" I'm not convinced 
> yet. Sorry, I have not been specific enough, I disagree with your analysis 
> above. On return the timer isn't requeued into the realtime queue at all, 
> so this can't be the reason for the crash. I guess it's more likely you 
> managed to trigger the locking bug.

Ok. Maybe I did not understand the code at this point.

> You didn't specify anywhere how you got to this conclusion, so I could 
> reproduce it myself. Could you please elaborate on this "system-load 
> dependent random drifting"?

As I said already, my conclusion was wrong. This showed up on a SMP
machine not on UP, when the system load was high. (The timeline was
randomly off)

> > > I don't understand where you get this from, I explicitly said that higher 
> > > resolution requires a better clock abstraction, basically any place which 
> > > mentions TICK_NSEC has to be cleaned up like this. I'm at a loss why you 
> > > think this requires "a lot of #ifdef mess".
> > 
> > Why do you need all this jiffie stuff in the first place? It is not
> > necessary at all. The hrtimer code does not contain a single reference
> > of jiffies and therefore it does not need anything to clean up. I
> > consider even a single high resolution timer related #ifdef outside of
> > hrtimer.c and the clock event abstraction as an unnecessary mess. Sure
> > you can replace the TICK_NSEC and ktime_to_jiffie stuff completely, but
> > I still do not see the point why it is necessary to put it there first.
> > It just makes it overly complex to review and understand :)
> 
> In this regard I had two major goals: a) keep it as simple as possible, b) 
> preserve the current behaviour and I still think I found the best 
> compromise so far. This would allow us to first merge the basic 
> infrastructure, while reducing the risk of breaking anything.
>
> I don't mind changing the behaviour, but I would prefer to do this in a 
> separate step and with an analysis of the possible consequences. This is 
> not just about posix-timers, but it also affects itimers, nanosleep and 
> possibly other systems in the future. Actually my main focus is not on HR 
> posix timer, my main interest is that everything else keeps working and 
> doesn't have to pay the price for it.

While my focus is a clean merging of high resolution timers without
breaking the non hrt case, I still believe that we can find a solution,
where we can have the base implementation without any reference to
jiffies.

> It's rather likely that if there is a subtle change in behaviour, which 
> causes something to break, it's not noticed until it hits a release 
> kernel, so I think it's very well worth it to understand and document the 
> differences between the implementations.

Sure.

Our goal was to keep the code almost identical independent of the
driving clock source.

I try to compare and contrast the two possible solutions:

Rounding the initial expiry time and the interval results in a summing
up error, which depends on the delta of the interval and the
resolution. 

The non rounding solution results in a summing up error for intervals
which are less than the resolution. For intervals >= resolution no
summing up error is happening, but for intervals, which are not a
multiple of the resolution, an uneven interval as close as possible to
the timeline is delivered.

In both cases the timers never expire early and I think both variants
are compliant with the specification.
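The two variants can be compared numerically with a small sketch. It is a simplified model, not the hrtimer code: the function names are invented, times are in microseconds, the timer starts at t=0, and "lag" means the distance between the delivered event and the ideal timeline n * interval.

```c
#include <stdint.h>

/* Round value up to the next multiple of resolution (resolution > 0). */
static uint64_t round_up_res(uint64_t value, uint64_t resolution)
{
    return ((value + resolution - 1) / resolution) * resolution;
}

/*
 * Rounding variant: the interval is rounded up once and then added n times,
 * so the per-period rounding delta accumulates.
 */
static uint64_t rounded_lag_us(uint64_t n, uint64_t interval_us,
                               uint64_t resolution_us)
{
    return n * round_up_res(interval_us, resolution_us) - n * interval_us;
}

/*
 * Non-rounding variant: the exact expiry n * interval_us is kept and only
 * the delivery is pushed to the next resolution boundary, so the lag stays
 * below one resolution step.
 */
static uint64_t unrounded_lag_us(uint64_t n, uint64_t interval_us,
                                 uint64_t resolution_us)
{
    return round_up_res(n * interval_us, resolution_us) - n * interval_us;
}
```

For a 1700 us interval on a 1000 us resolution, the rounding variant is 3000 us behind the ideal timeline after 10 periods and 30000 us behind after 100, while the non-rounding variant never lags by a full resolution step. Neither value is ever negative, which matches the statement that the timers never expire early. The summing up error for intervals shorter than the resolution is not modelled here.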

> Sure, I can. I'm sorry I tried to explain things you already know, but if 
> you know these things already, then please show it. At this point I'm 
> mostly still trying to understand, why you did certain things and 
> sometimes I explain things from my perspective in the hopes you would fill 
> in the holes from your perspective.
>
> You mostly just post your patches and only explain the conclusion; you 
> keep it rather short on how you get to these conclusions, e.g. what other 
> alternatives you've already considered. This makes it hard for me to 
> figure out what you know exactly from what you're talking about.

Ok. Will try to get this better.

	tglx




* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-12 16:42           ` Thomas Gleixner
@ 2005-12-12 18:37             ` Thomas Gleixner
  2005-12-13  1:25             ` George Anzinger
  2005-12-14 20:48             ` Roman Zippel
  2 siblings, 0 replies; 74+ messages in thread
From: Thomas Gleixner @ 2005-12-12 18:37 UTC (permalink / raw)
  To: Roman Zippel; +Cc: linux-kernel, Andrew Morton, rostedt, johnstul, mingo

> On Mon, 2005-12-12 at 14:39 +0100, Roman Zippel wrote:
> > > Actually the change adds more code lines and removes one field of the
> > > hrtimer structure, but it has exactly the same functionality: Fast
> > > access to the first expiring timer without walking the rb_tree.
> > 
> > Together with the state field this would save 12 bytes, which is IMO very 
> > well worth considering.
> > You seem to have some plans for it, the best hint I've found for it is:
> > 
> > + (This separate list is also useful for high-resolution timers where we
> > + need separate pending and expired queues while keeping the time-order
> > + intact.)"
> > 
> > Could you please elaborate on this?
> 
> Sure. I have already removed the list_head for the non high resolution
> case as it turned out that it does not hurt the high resolution
> implementation.
> 
> For the high resolution implementation we have to move the expired
> timers to a separate list, as we do not want to do complex callback
> functions from the event interrupt itself. But we have to reprogram the
> next event interrupt, so we need simple access to the timer which
> expires first.
> 
> The initial implementation did simply move the timer from the pending
> list to the expired list without doing the rb_tree removal inside of the
> event interrupt handler. That way the next event for reprogramming was
> the first event in the pending list.
> 
> The new rebased version with the pending list removed does the rb_tree
> removal inside the event interrupt and enqueues the timer, for which the
> callback function has to be executed in the softirq, to the expired
> list. One exception are simple wakeup callback functions, as they are
> reasonably fast and we save two context switches. The next event for
> reprogramming the event interrupt is retrieved by the pointer in the
> base structure.
> 
> This way the list head is only necessary for the high resolution case.
> 
> The state field is not removed

Oops, I somehow managed to remove the rest of this paragraph :(

The state field is not removed because I'm not a big fan of those
overloaded fields and I prefer to pay the 4 byte penalty for the
separation.
Of course if there is the absolute requirement to reduce the size, I'm
not insisting on keeping it.

	tglx




* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-12 16:42           ` Thomas Gleixner
  2005-12-12 18:37             ` Thomas Gleixner
@ 2005-12-13  1:25             ` George Anzinger
  2005-12-13  9:18               ` Thomas Gleixner
  2005-12-15  1:35               ` Roman Zippel
  2005-12-14 20:48             ` Roman Zippel
  2 siblings, 2 replies; 74+ messages in thread
From: George Anzinger @ 2005-12-13  1:25 UTC (permalink / raw)
  To: tglx; +Cc: Roman Zippel, linux-kernel, Andrew Morton, rostedt, johnstul, mingo

Thomas Gleixner wrote:
~
> 
> 
>>I wouldn't say a 1 day interval timer is a very realistic example and the 
>>old timer wouldn't be very precise for this.
> 
> 
> Sure, as all comparisons are flawed. I just used a simple example to
> illustrate my POV.
> 
> 
>>The rationale for example talks about "a periodic timer with an absolute 
>>_initial_ expiration time", so I could also construct a valid example with 
>>this expectation. The more I read the spec the more I think the current 
>>behaviour is not correct, e.g. that ABS_TIME is only relevant for 
>>it_value.
>>So I'm interested in specific interpretations of the spec which support 
>>the current behaviour.

My $0.02 worth: It is clear (from the standard) that the initial time 
is to be ABS_TIME.  It is also clear that the interval is to be added 
to that time.  IMHO then, the result should have the same property, 
i.e. ABS_TIME.  Sort of like adding an offset to a relative address. 
The result is still relative.
> 
> 
> Unfortunately you find just the spec all over the place. I fear we have
> to find and agree on an interpretation ourselves.
> 
> I agree, that the restriction to the initial it_value is definitely
> something you can read out of the spec. But it does not make a lot of
> sense for me. Also the restriction to TIMER_ABSTIME is somehow strange
> as it converts a CLOCK_REALTIME timer to a CLOCK_MONOTONIC timer. I
> never understood the rationale behind that.

I don't think it really does that.  The TIMER_ABSTIME flag just says 
that the time requested is to be taken as "clock" time (whichever 
clock) AND that this is to be the expire time regardless of clock 
setting.  We, in an attempt to simplify the lists, convert the expire 
time into some common time notation (in most cases we convert relative 
times to absolute times) but this introduces problems because the 
caller has _asked_ for a relative or absolute time and not the other. 
  If the clock can not be set this is not a problem.  If it can, well, 
we need to keep track of what the caller wanted, absolute or relative.

It might help others to understand this if you were to remove the 
clock names from your queues and instead call them "absolute_real" and 
"up_time".  Then it would be more clear, I think, that we are mapping 
user requests onto these queues based on the desired functionality 
without a predilection to put a timer on a given queue just because a 
particular clock was requested.  At this point it becomes clear, for 
example, that a TIMER_ABSTIME request on the real clock is the _only_ 
request that should be mapped to the "absolute_real" list.
> 
~
-- 
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-13  1:25             ` George Anzinger
@ 2005-12-13  9:18               ` Thomas Gleixner
  2005-12-15  1:35               ` Roman Zippel
  1 sibling, 0 replies; 74+ messages in thread
From: Thomas Gleixner @ 2005-12-13  9:18 UTC (permalink / raw)
  To: george
  Cc: Roman Zippel, linux-kernel, Andrew Morton, rostedt, johnstul, mingo

On Mon, 2005-12-12 at 17:25 -0800, George Anzinger wrote:
> >>The rationale for example talks about "a periodic timer with an absolute 
> >>_initial_ expiration time", so I could also construct a valid example with 
> >>this expectation. The more I read the spec the more I think the current 
> >>behaviour is not correct, e.g. that ABS_TIME is only relevant for 
> >>it_value.
> >>So I'm interested in specific interpretations of the spec which support 
> >>the current behaviour.
> 
> My $0.02 worth: It is clear (from the standard) that the initial time 
> is to be ABS_TIME.  It is also clear that the interval is to be added 
> to that time.  IMHO then, the result should have the same property, 
> i.e. ABS_TIME.  Sort of like adding an offset to a relative address. 
> The result is still relative.

So the only difference between a timer with ABSTIME set and one without
is the notion of the initial expiry value, aside from the
clock_settime(CLOCK_REALTIME) speciality.

ABSTIME:
firstexp = it_value
firstexp, firstexp + it_interval, ... firstexp + n * it_interval

non ABSTIME:
firstexp = now + it_value
firstexp, firstexp + it_interval, ... firstexp + n * it_interval

The only limitation of this is that the interval value can not be less
than the resolution of the clock in order to avoid the wrong accounting
of the overflow.
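The two expiry sequences above can be written down as a short sketch. The function names are hypothetical and times are plain nanosecond counts; the clock_settime(CLOCK_REALTIME) special case is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* First expiry: it_value is taken as-is for ABSTIME, else relative to now. */
static int64_t first_expiry_ns(bool abstime, int64_t now_ns, int64_t it_value_ns)
{
    return abstime ? it_value_ns : now_ns + it_value_ns;
}

/*
 * n-th expiry (n = 0 is the first one): firstexp + n * it_interval.
 * The sequence is the same in both cases; only firstexp differs.
 */
static int64_t nth_expiry_ns(bool abstime, int64_t now_ns, int64_t it_value_ns,
                             int64_t it_interval_ns, int64_t n)
{
    return first_expiry_ns(abstime, now_ns, it_value_ns) + n * it_interval_ns;
}
```

For now = 500, it_value = 1000 and it_interval = 100, the expiry with n = 3 is 1300 with ABSTIME and 1800 without it.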

> > Unfortunately you find just the spec all over the place. I fear we have
> > to find and agree on an interpretation ourselves.
> > 
> > I agree, that the restriction to the initial it_value is definitely
> > something you can read out of the spec. But it does not make a lot of
> > sense for me. Also the restriction to TIMER_ABSTIME is somehow strange
> > as it converts a CLOCK_REALTIME timer to a CLOCK_MONOTONIC timer. I
> > never understood the rationale behind that.
> 
> I don't think it really does that.  The TIMER_ABSTIME flag just says 
> that the time requested is to be taken as "clock" time (whichever 
> clock) AND that this is to be the expire time regardless of clock 
> setting.  We, in an attempt to simplify the lists, convert the expire 
> time into some common time notation (in most cases we convert relative 
> times to absolute times) but this introduces problems because the 
> caller has _asked_ for a relative or absolute time and not the other. 
>   If the clock can not be set this is not a problem.  If it can, well, 
> we need to keep track of what the caller wanted, absolute or relative.
> 
> It might help others to understand this if you were to remove the 
> clock names from your queues and instead call them "absolute_real" and 
> "up_time".  Then it would be more clear, I think, that we are mapping 
> user requests onto these queues based on the desired functionality 
> without a predilection to put a timer on a given queue just because a 
> particular clock was requested.  At this point it becomes clear, for 
> example, that a TIMER_ABSTIME request on the real clock is the _only_ 
> request that should be mapped to the "absolute_real" list.

In other words. If there is only CLOCK_REALTIME, then the implementation
has to keep track of absolute and relative timers.

The existence of CLOCK_MONOTONIC and the fact that CLOCK_MONOTONIC is
using the same clock source as CLOCK_REALTIME allows us to optimize the
implementation by putting the relative timers on the monotonic list.

	tglx




* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-12 16:42           ` Thomas Gleixner
  2005-12-12 18:37             ` Thomas Gleixner
  2005-12-13  1:25             ` George Anzinger
@ 2005-12-14 20:48             ` Roman Zippel
  2005-12-14 22:30               ` Thomas Gleixner
  2 siblings, 1 reply; 74+ messages in thread
From: Roman Zippel @ 2005-12-14 20:48 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, Andrew Morton, rostedt, johnstul, mingo

Hi,

On Mon, 12 Dec 2005, Thomas Gleixner wrote:

> For the high resolution implementation we have to move the expired
> timers to a separate list, as we do not want to do complex callback
> functions from the event interrupt itself. But we have to reprogram the
> next event interrupt, so we need simple access to the timer which
> expires first.
> 
> The initial implementation did simply move the timer from the pending
> list to the expired list without doing the rb_tree removal inside of the
> event interrupt handler. That way the next event for reprogramming was
> the first event in the pending list.
> 
> The new rebased version with the pending list removed does the rb_tree
> removal inside the event interrupt and enqueues the timer, for which the
> callback function has to be executed in the softirq, to the expired
> list. One exception are simple wakeup callback functions, as they are
> reasonably fast and we save two context switches. The next event for
> reprogramming the event interrupt is retrieved by the pointer in the
> base structure.
> 
> This way the list head is only necessary for the high resolution case.

Thanks for the explanation. If it's just for reprogramming the interrupt, 
it should be cheaper to just check the rbtree than walk the list to find 
the next expiration time (at least theoretically). This leaves only 
optimizations for rt kernel and from the base kernel point of view I 
prefer the immediate space savings.

> The state field is not removed because I'm not a big fan of those
> overloaded fields and I prefer to pay the 4 byte penalty for the
> separation.
> Of course if there is the absolute requirement to reduce the size, I'm
> not insisting on keeping it.

Well, I'm not a big fan of redundant state information, e.g. the pending 
information can be included in the rb_node (it's not quite as simple as 
with the timer_list, but it's the same thing). The expired information 
(together with the data field) is an optimization for simple sleeps that 
AFAICT only makes a difference in the rt kernel (the saved context switch 
you mentioned above). What makes me more uncomfortable is that this is a 
special case optimization and other callbacks are probably fast as well 
(e.g. wakeup + timer restart).

I can understand you want to keep the difference to the rt kernel small, 
but for me it's more about immediate benefits against uncertain long term 
benefits.

> > The rationale for example talks about "a periodic timer with an absolute 
> > _initial_ expiration time", so I could also construct a valid example with 
> > this expectation. The more I read the spec the more I think the current 
> > behaviour is not correct, e.g. that ABS_TIME is only relevant for 
> > it_value.
> > So I'm interested in specific interpretations of the spec which support 
> > the current behaviour.
> 
> Unfortunately you find just the spec all over the place. I fear we have
> to find and agree on an interpretation ourselves.
> 
> I agree, that the restriction to the initial it_value is definitely
> something you can read out of the spec. But it does not make a lot of
> sense for me. Also the restriction to TIMER_ABSTIME is somehow strange
> as it converts a CLOCK_REALTIME timer to a CLOCK_MONOTONIC timer. I
> never understood the rationale behind that.

As George already said, it's easier to keep these clocks separate. I think 
the spec rationale is also more clear about the intended usage. About 
timers it says: 

"Two timer types are required for a system to support realtime 
applications:

1. One-shot
...

2. Periodic
..."

Basically you have two independent timer types. It's quite explicit that 
only the "initial timer expiration" can be relative or absolute. It 
doesn't say anywhere that there are relative and absolute periodic timers; 
all references to "absolute" or "relative" are only in connection with the 
initial expiration time, and after the initial expiration it becomes a 
periodic timer. At every timer expiration the timer is reloaded with a 
relative time interval.
I can understand that you find this behaviour useful (although other 
people may disagree) and the spec doesn't explicitly say that you must 
not do this, but I really can't convince myself that this is the 
_intended_ behaviour.

> > Since you don't do any rounding at all anymore, your timer may now expire 
> > early with low resolution clocks (the old jiffies + 1 problem I explained 
> > in my ktime_t patch).
> 
> It does not expire early. The timer->expires field is still compared
> against now. 

I don't think that's enough (unless I missed something). Steven maybe 
explained it better than I did in
http://marc.theaimsgroup.com/?l=linux-kernel&m=113047529313935

Even if you set the timer resolution to 1 nsec, there is still the 
resolution of the actual hardware clock and it has to be taken into 
account somewhere when you start a relative timer. Even if the clock 
resolution is usually higher than the normal latency, so the problem won't 
be visible for most people, the general timer code should take this into 
account. If someone doesn't care about high resolution timer, he can still 
use it with a low resolution clock (e.g. jiffies) and then this becomes a 
problem.
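
A small sketch with numbers, under stated assumptions: the clock only 
advances once per tick (jiffies-like, HZ=100, so 10,000 us per tick), 
expiry is checked at each tick with "now >= expires", and the helper names 
coarse_now, rel_expiry and fire_time are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for illustration: a clock that only advances once per tick,
 * e.g. jiffies with HZ=100, so one tick is 10,000 us. */
#define TICK_US 10000ULL

/* What the coarse clock reports at real time t_us: the last tick. */
static uint64_t coarse_now(uint64_t t_us)
{
	return t_us / TICK_US * TICK_US;
}

/* Expiry of a relative timer started at real time start_us.
 * round_up != 0 adds one clock resolution to cover the part of the
 * current tick that has already elapsed but is invisible to us. */
static uint64_t rel_expiry(uint64_t start_us, uint64_t delta_us, int round_up)
{
	assert(delta_us > 0);
	return coarse_now(start_us) + delta_us + (round_up ? TICK_US : 0);
}

/* Real time of the first tick at which "now >= expires" holds. */
static uint64_t fire_time(uint64_t expires_us)
{
	return (expires_us + TICK_US - 1) / TICK_US * TICK_US;
}
```

A 10,000 us relative timer started at real time 39,000 us reads the clock 
as 30,000 and gets expires = 40,000, so it fires at the very next tick 
after only 1,000 us of real time. With one resolution added at start time 
it gets expires = 50,000 and fires after 11,000 us, which is late but 
never early.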

> > But this is now exactly the behaviour your timer has, why is it not 
> > "surprising" anymore?
> 
> Yes, we wrote that before. After reconsidering the results we came to
> the conclusion, that we actually don't need the rounding at all because
> the uneven distance is equally surprising as the summing up errors
> introduced by rounding.

I don't think that the summing up error is surprising at all, the spec is 
quite clear that the time values have to be rounded up to the resolution 
of the timer and it's also the behaviour of the current timer.
This error is actually the expected behaviour for any timer with a 
resolution different from 1 nsec. I don't want to say that we can't have 
such a timer, but I'm not so sure whether this should be the default 
behaviour. I actually prefer George's earlier suggestion of CLOCK_REALTIME 
and CLOCK_REALTIME_HR, where one is possibly faster and the other is more 
precise. Even within the kernel I would prefer to map itimer and nanosleep 
to the first clock (maybe also based on arch/kconfig defaults).
OTOH if the hardware allows it, both clocks can do the same thing, but I 
really would like to have the possibility to give higher (and thus 
possibly more expensive) resolution only to those asking for it.

> > I don't mind changing the behaviour, but I would prefer to do this in a 
> > separate step and with an analysis of the possible consequences. This is 
> > not just about posix-timers, but it also affects itimers, nanosleep and 
> > possibly other systems in the future. Actually my main focus is not on HR 
> > posix timer, my main interest is that everything else keeps working and 
> > doesn't have to pay the price for it.
> 
> While my focus is a clean merging of high resolution timers without
> breaking the non hrt case, I still believe that we can find a solution,
> where we can have the base implementation without any reference to
> jiffies.

This is what I think requires the better clock abstraction, most of it is 
related to the clock resolution, the generic timer code currently has no 
idea of the real resolution of the underlying clock, so I assumed a worst 
case of TICK_NSEC everywhere.

> I try to compare and contrast the two possible solutions:
> 
> Rounding the initial expiry time and the interval results in a summing
> up error, which depends on the delta of the interval and the
> resolution. 
> 
> The non rounding solution results in a summing up error for intervals
> which are less than the resolution. For intervals >= resolution no
> summing up error is happening, but for intervals, which are not a
> multiple of the resolution, an uneven interval as close as possible to
> the timeline is delivered.
> 
> In both cases the timers never expire early and I think both variants
> are compliant with the specification.

What I'd like to avoid is that we have to commit ourselves to only one 
solution right now. I think the first solution is better suited to the low 
resolution timer, that we have right now. The second solution requires a 
better clock framework - this includes better time keeping and 
reprogrammable timer interrupts.
At this point I wouldn't like to settle on just one solution, I would have 
to see more of the infrastructure integrated before doing this. At this point 
I see more advantages in having a choice (be it a Kconfig or even a 
runtime option).
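
For reference, the two variants compared above can be put into numbers. 
This is only a sketch under assumptions of my own: time is in arbitrary 
integer units, the timer starts on a tick boundary, delivery happens at 
the first tick at or after the expiry, and the function names are made up:

```c
#include <assert.h>
#include <stdint.h>

/* Round t up to the next multiple of the clock resolution res. */
static uint64_t round_up_res(uint64_t t, uint64_t res)
{
	assert(res > 0);
	return (t + res - 1) / res * res;
}

/* Variant 1: the interval itself is rounded up once, so every period
 * is stretched and the error against the requested timeline grows. */
static uint64_t nth_expiry_rounded(uint64_t start, uint64_t interval,
				   uint64_t res, unsigned int n)
{
	return round_up_res(start, res) + n * round_up_res(interval, res);
}

/* Variant 2: the ideal timeline start + n * interval is kept exact;
 * only the delivery is deferred to the first tick at or after it. */
static uint64_t nth_expiry_unrounded(uint64_t start, uint64_t interval,
				     uint64_t res, unsigned int n)
{
	return round_up_res(start + n * interval, res);
}
```

With resolution 10 and interval 15, the requested timeline is 15, 30, 45, 
60. Variant 1 delivers at 20, 40, 60, 80, so the error grows by 5 per 
period. Variant 2 delivers at 20, 30, 50, 60: the gaps are uneven (10, 20, 
10) but the error stays below one resolution.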

bye, Roman


* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-14 20:48             ` Roman Zippel
@ 2005-12-14 22:30               ` Thomas Gleixner
  2005-12-15  0:55                 ` George Anzinger
                                   ` (2 more replies)
  0 siblings, 3 replies; 74+ messages in thread
From: Thomas Gleixner @ 2005-12-14 22:30 UTC (permalink / raw)
  To: Roman Zippel; +Cc: linux-kernel, Andrew Morton, rostedt, johnstul, mingo

Hi,

On Wed, 2005-12-14 at 21:48 +0100, Roman Zippel wrote:
> > 
> > This way the list head is only necessary for the high resolution case.
> 
> Thanks for the explanation. If it's just for reprogramming the interrupt, 
> it should be cheaper to just check the rbtree than walk the list to find 
> the next expiration time (at least theoretically). This leaves only 
> optimizations for rt kernel and from the base kernel point of view I 
> prefer the immediate space savings.

The current -hrt queue contains the removal patch of the list_head
already and you interrupted my attempt to send out the patch for -mm :)

> > The state field is not removed because I'm not a big fan of those
> > overloaded fields and I prefer to pay the 4 byte penalty for the
> > separation.
> > Of course if there is the absolute requirement to reduce the size, I'm
> > not insisting on keeping it.
> 
> Well, I'm not a big fan of redundant state information, e.g. the pending 
> information can be included in the rb_node (it's not quite as simple as 
> with the timer_list, but it's the same thing). 

I do not consider this as redundant information. It's a design decision
whether to use a state variable for state information and the rbnode for
rbtree handling or to overload the meaning of the rbnode with
information which is not the natural associated content. 

I'm well aware of those optimization and space saving tricks. I did
microcontroller programming long enough, but - and out of that experience
- I want to avoid them in new designs wherever possible, for
clarity and extensibility reasons.

> The expired information 
> (together with the data field) is an optimization for simple sleeps that 
> AFAICT only makes a difference in the rt kernel (the saved context switch 
> you mentioned above). What makes me more uncomfortable is that this is a 
> special case optimization and other callbacks are probably fast as well 
> (e.g. wakeup + timer restart).

Not only in a -rt kernel, it also saves the context switch for a high
resolution timer enabled kernel, where you actually can execute the
callback in hard interrupt context. We can solve it differently, but we
should carefully think about the extensibility issues. The wakeup +
restart scenario is a good reason to reconsider this.

> I can understand you want to keep the difference to the rt kernel small, 
> but for me it's more about immediate benefits against uncertain long term 
> benefits.

If you have a clear target and the experience of having implemented the
extensions, you have to carefully weigh up the consequences of such
decisions. I'm not talking about "might be implemented by somebody
sometimes features", I'm talking about existing proof of concept
implementations. There is no real justification to ignore well known
consequences.

Of course if you consider the possibility of including high resolution
timers (I'm not talking about -rt) as zero, your requests make sense. 

I disagree because I'm convinced that the problems "high res timers",
"dynamic ticks", "timekeeping", "clock event abstraction" are closely
related and we have high demands to get those solved in a clean way. So
providing some jiffies bound minimal solution in the first place is more
than shortsighted IMO.

> > > The rationale for example talks about "a periodic timer with an absolute 
> > > _initial_ expiration time", so I could also construct a valid example with 
> > > this expectation. The more I read the spec the more I think the current 
> > > behaviour is not correct, e.g. that ABS_TIME is only relevant for 
> > > it_value.
> > > So I'm interested in specific interpretations of the spec which support 
> > > the current behaviour.
> > 
> > Unfortunately you find just the spec all over the place. I fear we have
> > to find and agree on an interpretation ourselves.
> > 
> > I agree, that the restriction to the initial it_value is definitely
> > something you can read out of the spec. But it does not make a lot of
> > sense for me. Also the restriction to TIMER_ABSTIME is somehow strange
> > as it converts a CLOCK_REALTIME timer to a CLOCK_MONOTONIC timer. I
> > never understood the rationale behind that.
> 
> As George already said, it's easier to keep these clocks separate. I think 
> the spec rationale is also more clear about the intended usage. About 
> timers it says: 
> 
> "Two timer types are required for a system to support realtime 
> applications:
> 
> 1. One-shot
> ...
> 
> 2. Periodic
> ..."
> 
> Basically you have two independent timer types. It's quite explicit that 
> only the "initial timer expiration" can be relative or absolute. It 
> doesn't say anywhere that there are relative and absolute periodic timers; 
> all references to "absolute" or "relative" are only in connection with the 
> initial expiration time, and after the initial expiration it becomes a 
> periodic timer. At every timer expiration the timer is reloaded with a 
> relative time interval.
> I can understand that you find this behaviour useful (although other 
> people may disagree) and the spec doesn't explicitly say that you must 
> not do this, but I really can't convince myself that this is the 
> _intended_ behaviour.

George said explicitly:

> My $0.02 worth: It is clear (from the standard) that the initial time 
> is to be ABS_TIME.  It is also clear that the interval is to be added 
> to that time.  IMHO then, the result should have the same property, 
> i.e. ABS_TIME. 

I don't find a way to read out that the interval should not have the
ABSTIME property.


> > > Since you don't do any rounding at all anymore, your timer may now expire 
> > > early with low resolution clocks (the old jiffies + 1 problem I explained 
> > > in my ktime_t patch).
> > 
> > It does not expire early. The timer->expires field is still compared
> > against now. 
> 
> I don't think that's enough (unless I missed something). Steven maybe 
> explained it better than I did in
> http://marc.theaimsgroup.com/?l=linux-kernel&m=113047529313935

Steven said:

> Interesting though, I tried to force this scenario, by changing the
> base->get_time to return jiffies.  I have a jitter test and ran this
> several times, and I could never get it to expire early.  I even changed
> HZ back to 100.
> 
> Then I looked at run_ktimer_queue.  And here we have the compare:
> 
> 		timer = list_entry(base->pending.next, struct ktimer, list);
> 		if (ktime_cmp(now, <=, timer->expires))
> 			break;
> 
> So, the timer does _not_ get processed if it is after or _equal_ to the
> current time.  So although the timer may go off early, the expired queue
> does not get executed.  So the above example would not go off at 3.2,
> but some time in the 4 category

Again, I'm not able to find the problem. 

while(timers_pending()) {
	timer = getnext_timer();
	if (timer->expires > now)
		break;
	execute_callback();
}

Please elaborate how the timer can expire early.

> Even if you set the timer resolution to 1 nsec, there is still the 
> resolution of the actual hardware clock and it has to be taken into 
> account somewhere when you start a relative timer. Even if the clock 
> resolution is usually higher than the normal latency, so the problem won't 
> be visible for most people, the general timer code should take this into 
> account. If someone doesn't care about high resolution timer, he can still 
> use it with a low resolution clock (e.g. jiffies) and then this becomes a 
> problem.

I'm completely lost on this.
Can you please make up a simple example with numbers?

If you disable high resolution timers then the resolution is jiffies.
The simple comparison which determines whether the timer is expired or
not is still valid.

> > > But this is now exactly the behaviour your timer has, why is it not 
> > > "surprising" anymore?
> > 
> > Yes, we wrote that before. After reconsidering the results we came to
> > the conclusion, that we actually don't need the rounding at all because
> > the uneven distance is equally surprising as the summing up errors
> > introduced by rounding.
> 
> I don't think that the summing up error is surprising at all, the spec is 
> quite clear that the time values have to be rounded up to the resolution 
> of the timer and it's also the behaviour of the current timer.

No, the spec is not clear at all about this.

I pointed this out before and I still think that the part of the
RATIONALE section is the key to this decision.

"Also note that some implementations may choose to adjust time and/or
interval values to exactly match the ticks of the underlying clock"

You decide to do the adjustment. I prefer not to do so and I don't buy
any argument which says that the current behaviour is the yardstick for
everything. It can't be. Otherwise we would not be able to introduce
high resolution timers at all. Every application which relies on some
behaviour of the kernel which is not explicitly required by the
specification is broken by definition. 

The compliance requirement is the yardstick, not some random
implementation detail which happens to be compliant.

> This error is actually the expected behaviour for any timer with a 
> resolution different from 1 nsec. I don't want to say that we can't have 
> such a timer, but I'm not so sure whether this should be the default 
> behaviour. I actually prefer George's earlier suggestion of CLOCK_REALTIME 
> and CLOCK_REALTIME_HR, where one is possibly faster and the other is more 
> precise. Even within the kernel I would prefer to map itimer and nanosleep 
> to the first clock (maybe also based on arch/kconfig defaults).
> OTOH if the hardware allows it, both clocks can do the same thing, but I 
> really would like to have the possibility to give higher (and thus 
> possibly more expensive) resolution only to those asking for it.

That's a rather odd approach to me. If we drag this further then we
might consider that only some users (i.e. applications) of -rt patches
are using the enhanced functionalities, which introduces interesting
computational problems (e.g when to treat a mutex as a concurrency
control which is capable of priority inversion or not). 

I vote strongly against introducing private, special purpose APIs and I
consider CLOCK_XXX_HR as such. The proposed hrtimer solution does not
introduce any penalties for people who do not enable a future high
resolution extension. It gives us the benefit of a clean code base which
can be switched simply and non-intrusively to the high
resolution mode. We have done extensive tests on the impact of
converting all users unconditionally to high resolution mode once it is
switched on and the penalty is within the noise range. 

You are explicitly asking for increased complexity with your approach. 

> > > I don't mind changing the behaviour, but I would prefer to do this in a 
> > > separate step and with an analysis of the possible consequences. This is 
> > > not just about posix-timers, but it also affects itimers, nanosleep and 
> > > possibly other systems in the future. Actually my main focus is not on HR 
> > > posix timer, my main interest is that everything else keeps working and 
> > > doesn't have to pay the price for it.
> > 
> > While my focus is a clean merging of high resolution timers without
> > breaking the non hrt case, I still believe that we can find a solution,
> > where we can have the base implementation without any reference to
> > jiffies.
> 
> This is what I think requires the better clock abstraction, most of it is 
> related to the clock resolution, the generic timer code currently has no 
> idea of the real resolution of the underlying clock, so I assumed a worst 
> case of TICK_NSEC everywhere.

Well, can you please show where the current hrtimer implementation is
doing something which requires a better clock abstraction?

The clock abstraction layer is relevant if you actually want to switch
to high resolution mode and until then the coarse interface is
sufficient.

> > I try to compare and contrast the two possible solutions:
> > 
> > Rounding the initial expiry time and the interval results in a summing
> > up error, which depends on the delta of the interval and the
> > resolution. 
> > 
> > The non rounding solution results in a summing up error for intervals
> > which are less than the resolution. For intervals >= resolution no
> > summing up error is happening, but for intervals, which are not a
> > multiple of the resolution, an uneven interval as close as possible to
> > the timeline is delivered.
> > 
> > In both cases the timers never expire early and I think both variants
> > are compliant with the specification.
> 
> What I'd like to avoid is that we have to commit ourselves to only one 
> solution right now. I think the first solution is better suited to the low 
> resolution timer, that we have right now. The second solution requires a 
> better clock framework - this includes better time keeping and 
> reprogrammable timer interrupts.

We have to choose one. Everything else is a bad compromise. There is
nothing worse than making no decision when you are at a point where a
decision is required.

Please provide one reproducible scenario where better time keeping and a
reprogrammable timer interrupt are required. The current hrtimer code
does not need this at all. Only if you want to have higher resolution
do you need this, and we use it in the high resolution timer queue - both
the timekeeping and the reprogramming abstraction layer.

> At this point I wouldn't like to settle on just one solution, I would have 
> to see more of the infrastructure integrated before doing this. At this point 
> I see more advantages in having a choice (be it a Kconfig or even a 
> runtime option).

Well, I do not see why we would want Kconfig, runtime or
whatever choices for a nonexistent problem at all.

	tglx





* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-14 22:30               ` Thomas Gleixner
@ 2005-12-15  0:55                 ` George Anzinger
  2005-12-15 14:18                 ` Steven Rostedt
  2005-12-19 14:50                 ` Roman Zippel
  2 siblings, 0 replies; 74+ messages in thread
From: George Anzinger @ 2005-12-15  0:55 UTC (permalink / raw)
  To: tglx; +Cc: Roman Zippel, linux-kernel, Andrew Morton, rostedt, johnstul, mingo

Thomas Gleixner wrote:
> Hi,
~

> 
> 
>>This error is actually the expected behaviour for any timer with a 
>>resolution different from 1 nsec. I don't want to say that we can't have 
>>such a timer, but I'm not so sure whether this should be the default 
>>behaviour. I actually prefer George's earlier suggestion of CLOCK_REALTIME 
>>and CLOCK_REALTIME_HR, where one is possibly faster and the other is more 
>>precise. Even within the kernel I would prefer to map itimer and nanosleep 
>>to the first clock (maybe also based on arch/kconfig defaults).
>>OTOH if the hardware allows it, both clocks can do the same thing, but I 
>>really would like to have the possibility to give higher (and thus 
>>possibly more expensive) resolution only to those asking for it.
> 
> 
> That's a rather odd approach to me. If we drag this further then we
> might consider that only some users (i.e. applications) of -rt patches
> are using the enhanced functionalities, which introduces interesting
> computational problems (e.g when to treat a mutex as a concurrency
> control which is capable of priority inversion or not). 

Er... what?  This is a non-compute.
> 
> I vote strongly against introducing private, special purpose APIs and I
> consider CLOCK_XXX_HR as such. The proposed hrtimer solution does not
> introduce any penalties for people who do not enable a future high
> resolution extension. It gives us the benefit of a clean code base which
> is capable to be switched simply and non intrusive to the high
> resolution mode. We have done extensive tests on the impact of
> converting all users unconditionally to high resolution mode once it is
> switched on and the penalty is within the noise range. 
> 
> You are explicitly asking for increased complexity with your approach. 

I beg to differ here.  The fact that high res timers, in general, 
require an interrupt per expiry, and that, by definition, we are 
changing the resolution by, I would guess, a couple of orders of 
magnitude implies a much larger overhead.  If we sum this over 
all user timers it can IMHO get out of control.  Given that only a 
very small number of applications really need the extra resolution, I 
think it makes a lot of sense that those applications incur the 
overhead and others, which don't need nor want the higher resolution, 
just use the old low resolution timers.  The notion of switching this 
at configure time implies that a given kernel is going to be used ONLY 
one way or another for all applications, which, AFAICT is just not the 
way most users do things.

As to CLOCK_XXX_HR being a special purpose API, this is only half 
true.  It is a POSIX conforming extension and I do think you can find 
it used elsewhere as well.  On the other hand, if you want to limit 
the higher overhead timers to only those who ask, well, I guess you 
could call that "special purpose".

On the complexity thing, your new organization makes the added 
"complexity" rather non-complex; in fact, you might say it is 
downright simple, for which, thank you.
> 
> 
~
-- 
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-13  1:25             ` George Anzinger
  2005-12-13  9:18               ` Thomas Gleixner
@ 2005-12-15  1:35               ` Roman Zippel
  2005-12-15  2:29                 ` George Anzinger
  1 sibling, 1 reply; 74+ messages in thread
From: Roman Zippel @ 2005-12-15  1:35 UTC (permalink / raw)
  To: George Anzinger
  Cc: tglx, linux-kernel, Andrew Morton, rostedt, johnstul, mingo

Hi,

On Mon, 12 Dec 2005, George Anzinger wrote:

> My $0.02 worth: It is clear (from the standard) that the initial time is to be
> ABS_TIME.

Yes.

>  It is also clear that the interval is to be added to that time.

Not necessarily. It says it_interval is a "reload value", it's used to 
reload the timer to count down to the next expiration.
It's up to the implementation, whether it really counts down this time or 
whether it converts it first into an absolute value.

> IMHO then, the result should have the same property, i.e. ABS_TIME.  Sort of
> like adding an offset to a relative address. The result is still relative.

If the result is relative, why should a clock set have any effect?
IMO the spec makes it quite clear that the initial timer and the periodic 
timer are two different types of timer. The initial timer only 
specifies how the periodic timer is started and the periodic timer itself 
is a "relative time service".
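
To show where the two readings actually diverge, here is a sketch with 
made-up numbers. It models neither the spec text nor the kernel code: 
next_relative and next_absolute are hypothetical helpers, time is in 
arbitrary units, and the settable clock is assumed to equal the monotonic 
clock until it is set:

```c
#include <assert.h>
#include <stdint.h>

/* Reading A (periodic part is a relative service): the reload counts
 * elapsed time, so the next expiry is one interval after the last
 * one on a clock that is unaffected by setting the wall clock. */
static uint64_t next_relative(uint64_t last_mono, uint64_t interval)
{
	return last_mono + interval;
}

/* Reading B (periodic part inherits ABS_TIME): expiries stay on the
 * grid first_abs + k * interval of the settable clock. Returns the
 * first grid point strictly after now_abs and stores in *missed the
 * number of grid points after the first that have already passed. */
static uint64_t next_absolute(uint64_t first_abs, uint64_t interval,
			      uint64_t now_abs, uint64_t *missed)
{
	uint64_t k;

	assert(interval > 0 && now_abs >= first_abs && missed);
	k = (now_abs - first_abs) / interval;
	*missed = k;
	return first_abs + (k + 1) * interval;
}
```

Take an absolute initial expiry at 100 and an interval of 10. The timer 
fires at 100; at 105 the clock is set forward by 47 and now reads 152. 
Under reading A the next expiry is 10 units of elapsed time after the 
last one, i.e. at a clock reading of 157. Under reading B the grid points 
110 to 150 have already passed (5 of them) and the next expiry is at a 
clock reading of 160.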

bye, Roman


* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-15  1:35               ` Roman Zippel
@ 2005-12-15  2:29                 ` George Anzinger
  2005-12-19 14:56                   ` Roman Zippel
  0 siblings, 1 reply; 74+ messages in thread
From: George Anzinger @ 2005-12-15  2:29 UTC (permalink / raw)
  To: Roman Zippel; +Cc: tglx, linux-kernel, Andrew Morton, rostedt, johnstul, mingo

Roman Zippel wrote:
> Hi,
> 
> On Mon, 12 Dec 2005, George Anzinger wrote:
> 
> 
>>My $0.02 worth: It is clear (from the standard) that the initial time is to be
>>ABS_TIME.
> 
> 
> Yes.
> 
> 
>> It is also clear that the interval is to be added to that time.
> 
> 
> Not necessarily. It says it_interval is a "reload value", it's used to 
> reload the timer to count down to the next expiration.
> It's up to the implementation, whether it really counts down this time or 
> whether it converts it first into an absolute value.
> 
> 
>>IMHO then, the result should have the same property, i.e. ABS_TIME.  Sort of
>>like adding an offset to a relative address. The result is still relative.
> 
> 
> If the result is relative, why should a clock set have any effect?
> IMO the spec makes it quite clear that initial timer and the periodic 
> timer are two different types of the timer. The initial timer only 
> specifies how the periodic timer is started and the periodic timer itself 
> is a "relative time service".
> 
Well, I guess we will have to agree to disagree.  That which the 
interval is added to is an absolute time, so I, and others, take the 
result as absolute.  At this point there really is no "conversion" to 
an absolute timer.  Once the timer initial time is absolute, 
everything derived from it, i.e. all intervals added to it, must be 
absolute.

For what it's worth, I do think that the standards folks could have 
done a bit better here.  I, for example, would have liked to have seen 
a discussion about what to do with overrun in the face of clock setting.


-- 
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


* Re: [patch 15/21] hrtimer core code
  2005-12-06  0:01 ` [patch 15/21] hrtimer core code tglx
@ 2005-12-15  3:43   ` Matt Helsley
  0 siblings, 0 replies; 74+ messages in thread
From: Matt Helsley @ 2005-12-15  3:43 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Andrew Morton, rostedt, john stultz, zippel, Ingo Molnar

On Tue, 2005-12-06 at 01:01 +0100, tglx@linutronix.de wrote:
<snip>

> Index: linux-2.6.15-rc5/kernel/hrtimer.c
> ===================================================================
> --- /dev/null
> +++ linux-2.6.15-rc5/kernel/hrtimer.c

<snip>

> +/**
> + * ktime_get_ts - get the monotonic clock in timespec format
> + *
> + * @ts:		pointer to timespec variable
> + *
> + * The function calculates the monotonic clock from the realtime
> + * clock and the wall_to_monotonic offset and stores the result
> + * in normalized timespec format in the variable pointed to by ts.
> + */
> +void ktime_get_ts(struct timespec *ts)
> +{
> +	struct timespec tomono;
> +	unsigned long seq;
> +
> +	do {
> +		seq = read_seqbegin(&xtime_lock);
> +		getnstimeofday(ts);
> +		tomono = wall_to_monotonic;
> +
> +	} while (read_seqretry(&xtime_lock, seq));
> +
> +	set_normalized_timespec(ts, ts->tv_sec + tomono.tv_sec,
> +				ts->tv_nsec + tomono.tv_nsec);
> +}

<snip>

In many other places I've seen the loop structure: 
       do {
               seq = read_seqbegin(&xtime_lock);
...
       } while (unlikely(read_seqretry(&xtime_lock, seq)));

This one lacks the unlikely() in the loop condition. Do high res timers
tend to make the branch hint incorrect?

Thanks,
	-Matt Helsley




* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-14 22:30               ` Thomas Gleixner
  2005-12-15  0:55                 ` George Anzinger
@ 2005-12-15 14:18                 ` Steven Rostedt
  2005-12-19 14:50                 ` Roman Zippel
  2 siblings, 0 replies; 74+ messages in thread
From: Steven Rostedt @ 2005-12-15 14:18 UTC (permalink / raw)
  To: tglx; +Cc: Roman Zippel, linux-kernel, Andrew Morton, johnstul, mingo

On Wed, 2005-12-14 at 23:30 +0100, Thomas Gleixner wrote:

> > 
> > I don't think that's enough (unless I missed something). Steven maybe 
> > explained it better than I did in
> > http://marc.theaimsgroup.com/?l=linux-kernel&m=113047529313935
> 
> Steven said:
> 
> > Interesting though, I tried to force this scenario, by changing the
> > base->get_time to return jiffies.  I have a jitter test and ran this
> > several times, and I could never get it to expire early.  I even changed
> > HZ back to 100.
> > 
> > Then I looked at run_ktimer_queue.  And here we have the compare:
> > 
> > 		timer = list_entry(base->pending.next, struct ktimer, list);
> > 		if (ktime_cmp(now, <=, timer->expires))
> > 			break;
> > 
> > So, the timer does _not_ get processed if it is after or _equal_ to the
> > current time.  So although the timer may go off early, the expired queue
> > does not get executed.  So the above example would not go off at 3.2,
> > but some time in the 4 category
> 
> Again, I'm not able to find the problem. 
> 
> while(timers_pending()) {
> 	timer = getnext_timer();
> 	if (timer->expires > now)
> 		break;
> 	execute_callback();
> }
> 
> Please elaborate how the timer can expire early.

Actually Thomas, the above code doesn't handle it correctly, although
the code you have in hrtimer.c does.  Here you say ">" where it should
be ">=", otherwise you can have the effect that I explained in the
reference that Roman cited.
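
A tiny sketch of the difference between the two loop conditions, in whole 
ticks. The helper names are made up, and the scenario (a timer started 
late in tick 3 with a relative timeout of one tick, so expires = 4) is 
only an assumed example:

```c
#include <assert.h>
#include <stdint.h>

/* Loop condition from the pseudocode: stop only if expires > now,
 * so a timer with expires == now is run. */
static int runs_inclusive(uint64_t now, uint64_t expires)
{
	return !(expires > now);
}

/* Condition matching the quoted run_ktimer_queue compare: stop if
 * now <= expires, so a timer with expires == now stays queued. */
static int runs_strict(uint64_t now, uint64_t expires)
{
	return !(now <= expires);
}

/* First tick value at which the timer is run, counting up from start. */
static uint64_t first_run_tick(uint64_t start, uint64_t expires,
			       int (*runs)(uint64_t, uint64_t))
{
	uint64_t now = start;

	assert(runs);
	while (!runs(now, expires))
		now++;
	return now;
}
```

With the pseudocode condition the timer runs at tick 4, which can be just 
a fraction of a tick after it was started. With the strict compare it 
runs at tick 5, so at least one full tick has passed.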

Although run_hrtimer_queue is still correct, I think you might want to
change the hrtimer_forward.  It has a ">" where it should be ">=".

-- Steve




* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-14 22:30               ` Thomas Gleixner
  2005-12-15  0:55                 ` George Anzinger
  2005-12-15 14:18                 ` Steven Rostedt
@ 2005-12-19 14:50                 ` Roman Zippel
  2005-12-19 22:05                   ` Thomas Gleixner
  2 siblings, 1 reply; 74+ messages in thread
From: Roman Zippel @ 2005-12-19 14:50 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, Andrew Morton, rostedt, johnstul, mingo

Hi,

On Wed, 14 Dec 2005, Thomas Gleixner wrote:

> The current -hrt queue contains the removal patch of the list_head
> already and you interrupted my attempt to send out the patch for -mm :)

Ok.

> > > The state field is not removed because I'm not a big fan of those
> > > overloaded fields and I prefer to pay the 4 byte penalty for the
> > > seperation.
> > > Of course if there is the absolute requirement to reduce the size, I'm
> > > not insisting on keeping it.
> > 
> > Well, I'm not a big fan of redundant state information, e.g. the pending 
> > information can be included in the rb_node (it's not as quite simple as 
> > with the timer_list, but it's the same thing). 
> 
> I do not consider this as redundant information. It's a design decision
> whether to use a state variable for state information and the rbnode for
> rbtree handling or to overload the meaning of the rbnode with
> information which is not the natural associated content. 
> 
> I'm well aware of those optimization and space saving tricks. I did
> microcontroller programming long enough, but - and out of the experience
> - I want to avoid it for new designs where ever it is possible for
> clarity and extensibility reasons.

It's not just about optimization tricks, it's about redundant information. 
Right now the primary function of the state is to tell whether the timer 
node is in the tree or not, so I prefer to add this information directly 
to the rbnode, similar to what we do with normal lists.
The other problem is that such "simple" state variables quickly become 
inadequate as the state machine becomes more complex, especially if 
multiple processes run at the same time (e.g. a timer can be "running" 
and/or "active"). So for me it's also for clarity and extensibility 
reasons, that I want to avoid overloaded state machines and rather keep 
it as simple as possible.
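
The idea of keeping the "is it queued" information in the node itself
can be sketched as follows. This is a minimal illustration, not the
kernel's rbtree code: the struct, its field names and the helpers are
hypothetical, and the self-pointing parent is just one possible
encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tree node; the field names are made up for this sketch. */
struct tnode {
	struct tnode *parent;
	struct tnode *left;
	struct tnode *right;
};

/* An unqueued node points at itself, so no separate state field is needed. */
static void tnode_clear(struct tnode *n)
{
	n->parent = n;
	n->left = NULL;
	n->right = NULL;
}

/* Record the parent on insertion; the root has a NULL parent. */
static void tnode_link(struct tnode *n, struct tnode *parent)
{
	assert(parent != n);	/* a self-pointing parent is reserved for "unqueued" */
	n->parent = parent;
	n->left = NULL;
	n->right = NULL;
}

/* True while the node is linked into a tree. */
static bool tnode_queued(const struct tnode *n)
{
	return n->parent != n;
}
```

A node initialised with tnode_clear() reports tnode_queued() == false
until tnode_link() is called, and again after the next tnode_clear().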

> > The expired information 
> > (together with the data field) is an optimization for simple sleeps that 
> > AFAICT only makes a difference in the rt kernel (the saved context switch 
> > you mentioned above). What makes me more uncomfortable is that this is a 
> > special case optimization and other callbacks are probably fast as well 
> > (e.g. wakeup + timer restart).
> 
> Not only in a -rt kernel, it also saves the context switch for a high
> resolution timer enabled kernel, where you actually can execute the
> callback in hard interrupt context. We can solve it differently, but we
> should carefully think about the extensiblity issues. The wakeup +
> restart szenario is a good reason to reconsider this.

I don't think executing something in the soft or hard interrupt context 
makes a big difference on a normal kernel (at least I wouldn't call it a 
context switch).

> > I can understand you want to keep the difference to the rt kernel small, 
> > but for me it's more about immediate benefits against uncertain long term 
> > benefits.
> 
> If you have a clear target and the experience of having implemented the
> extensions, you have to carefully weigh up the consequences of such
> decisions. I'm not talking about "might be implemented by somebody
> sometimes features", I'm talking about existing proof of concept
> implementations. There is no real justification to ignore well known
> consequences.
> 
> Of course if you consider the possibility of including high resolution
> timers (I'm not talking about -rt) as  zero, your requests make sense. 

Thomas, please don't treat me like an idiot, you may have more experience
with hrtimer, but after working on it for a while I also know what I'm
talking about. Please accept that I have a different focus on this, I want
to keep things as simple as possible. New features should stand on their
own and this includes the complexity they add to the kernel. The new
hrtimer especially cleans up the posix timer stuff greatly, this and
keeping all other users working is my primary focus now.

New features add new complexity and I want to see and evaluate it at the 
time it's added to the kernel, primarily to find solutions to avoid the 
(runtime) complexity for clocks which don't want to or can't support such 
high resolutions, so they don't have to pay the price for these new 
features. I want to keep things flexible and keeping things simple is IMO 
a much better starting point.

> I disagree because I'm convinced that the problems "high res timers",
> "dynamic ticks", "timekeeping", "clock event abstraction" are closely
> related and we have high demands to get those solved in a clean way. So
> providing some jiffies bound minimal solution in the first place is more
> than shortsighted IMO.

You're misunderstanding me, I don't want "some jiffies bound minimal 
solution", I want to solve one problem at a time and fixing the jiffies 
problem requires solving problems in the clock abstraction first, 
otherwise you produce a crutch which works in most cases, but leaves a few 
problem cases behind.

> Goerge said explicitely:
> 
> > My $0.02 worth: It is clear (from the standard) that the initial time 
> > is to be ABS_TIME.  It is also clear that the interval is to be added 
> > to that time.  IMHO then, the result should have the same property, 
> > i.e. ABS_TIME. 
> 
> I dont find a way to read out that the interval should not have the
> ABSTIME property.

That's not what you wrote earlier: "I agree, that the restriction to the 
initial it_value is definitely something you can read out of the spec."

clock_settime says: "..., these time services shall expire when the 
requested relative interval elapses, independently of the new or old value 
of the clock." it_interval is a relative interval and otherwise the spec 
only talks about "an initial expiration time, again either relative or 
absolute," I can't really find a direct connection that TIMER_ABSTIME 
should apply to the interval as well.

> > > > Since you don't do any rounding at all anymore, your timer may now expire 
> > > > early with low resolution clocks (the old jiffies + 1 problem I explained 
> > > > in my ktime_t patch).
> > > 
> > > It does not expire early. The timer->expires field is still compared
> > > against now. 
> > 
> > I don't think that's enough (unless I missed something). Steven maybe 
> > explained it better than I did in
> > http://marc.theaimsgroup.com/?l=linux-kernel&m=113047529313935
> 
> Steven said:
> 
> > Interesting though, I tried to force this scenario, by changing the
> > base->get_time to return jiffies.  I have a jitter test and ran this
> > several times, and I could never get it to expire early.
> [..]
> Please elaborate how the timer can expire early.

At that time you still rounded the values, so it actually worked.
When reading a time t from a clock with resolution r, the real time can be 
anything from t to t+r-1. Assuming it's currently t+r-1 and you try to set 
a relative timer to r-1, you will read t from the clock and arm the timer 
for t+r-1, which will cause the timer to expire at t+r, where it must not 
expire before t+r-1+r-1.
It currently only works because latencies are usually larger than the 
clock resolution, but if I want to configure hrtimer with a low resolution 
clock, the problem can become visible.
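
The arithmetic above can be checked with a small model. The sketch
assumes integer time units, a clock that truncates to a multiple of its
resolution, and an expiry check that only runs on tick boundaries; the
function names are hypothetical and not taken from the hrtimer code.

```c
#include <assert.h>
#include <stdint.h>

/* Coarse clock: reports real time truncated to a multiple of res. */
static int64_t coarse_read(int64_t real, int64_t res)
{
	return real - real % res;
}

/* Expiry is only checked on ticks: first multiple of res at or after t. */
static int64_t first_tick_at_or_after(int64_t t, int64_t res)
{
	return (t + res - 1) / res * res;
}

/*
 * Real time that passes between arming a relative timer at real_now and
 * the tick on which it is found expired, with no rounding of the interval.
 */
static int64_t real_delay(int64_t real_now, int64_t interval, int64_t res)
{
	int64_t expires;

	assert(real_now >= 0 && interval > 0 && res > 0);
	expires = coarse_read(real_now, res) + interval;
	return first_tick_at_or_after(expires, res) - real_now;
}
```

With res = 10 (so t = 10 and t+r-1 = 19), real_delay(19, 9, 10) is 1,
well short of the requested 9, while real_delay(10, 9, 10) is 10.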

> > > > But this is now exactly the bevhaviour your timer has, why is not 
> > > > "surprising" anymore?
> > > 
> > > Yes, we wrote that before. After reconsidering the results we came to
> > > the conclusion, that we actually dont need the rounding at all because
> > > the uneven distance is equally surprising as the summing up errors
> > > introduced by rounding.
> > 
> > I don't think that the summing up error is surprising at all, the spec is 
> > quite clear that the time values have to be rounded up to the resolution 
> > of the timer and it's also the behaviour of the current timer.
> 
> No, the spec is not clear at all about this.
> 
> I pointed this out before and I still think that the part of the
> RATIONALE section is the key to this decision.
> 
> "Also note that some implementations may choose to adjust time and/or
> interval values to exactly match the ticks of the underlying clock"

You basically use this sentence as loophole to take the whole resolution 
rounding rule ad absurdum and I disagree that this sentence means 
"ignore everything above and do whatever you want".

> You decide to do the adjustment. I prefer not to do so and I dont buy
> any argument which says, that the current behaviour is the yardstick for
> everything. It can't be. Otherwise we would not be able to introduce
> high resolution timers at all. Every application which relies on some
> behaviour of the kernel which is not explicitely required by the
> specification is broken by definition. 

The problem is that you force this behaviour also on other hrtimer users
(itimers and nanosleep) and we should be very careful with such behaviour 
changes. My proposal keeps the current behaviour and is less likely to 
break anything. As I said before I'm not against changing the behaviour, 
but it should be done carefully.

> I vote strongly against introducing private, special purpose APIs and I
> consider CLOCK_XXX_HR as such. The proposed hrtimer solution does not
> introduce any penalties for people who do not enable a future high
> resolution extension. It gives us the benefit of a clean code base which
> is capable to be switched simply and non intrusive to the high
> resolution mode. We have done extensive tests on the impact of
> converting all users unconditionally to high resolution mode once it is
> switched on and the penalty is within the noise range. 
> 
> You are explicitely asking for increased complexity with your approach. 

Which in this case would be a good thing. Right now we don't have much
choice in clock source, but that will change soon and I think it would be
good to be able to map a timer to a specific clock source. The gained
flexibility greatly outweighs the required complexity.

> > > While my focus is a clean merging of high resolution timers without
> > > breaking the non hrt case, I still believe that we can find a solution,
> > > where we can have the base implementation without any reference to
> > > jiffies.
> > 
> > This is what I think requires the better clock abstraction, most of it is 
> > related to the clock resolution, the generic timer code currently has no 
> > idea of the real resolution of the underlying clock, so I assumed a worst 
> > case of TICK_NSEC everywhere.
> 
> Well, can you please show where the current hrtimer implementation  is
> doing something which requires a better clock abstraction ?

1. clock resolution is unknown (see above).
2. reprogrammable timer interrupts.

> The clock abstraction layer is relevant if you actuallly want to switch
> to high resolution mode and until then the coarse interface is
> sufficient.

Right and until then it's also not really avoidable, that you find a few 
references to jiffies. I'm not saying that we have to keep them, but 
please only one step at a time.

> > What I'd like to avoid is that we have to commit ourselves to only one 
> > solution right now. I think the first solution is better suited to the low 
> > resolution timer, that we have right now. The second solution requires a 
> > better clock framework - this includes better time keeping and 
> > reprogrammable timer interrupts.
> 
> We have to choose one. Everything else is a bad compromise. There is
> nothing worse than making no decision when you are at a point where a
> decision is required.

Do you really have so little trust in your own code, that we can't afford 
the flexibility and have to hardcode the timer resolution now?

bye, Roman


* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-15  2:29                 ` George Anzinger
@ 2005-12-19 14:56                   ` Roman Zippel
  2005-12-19 20:54                     ` George Anzinger
  0 siblings, 1 reply; 74+ messages in thread
From: Roman Zippel @ 2005-12-19 14:56 UTC (permalink / raw)
  To: George Anzinger
  Cc: tglx, linux-kernel, Andrew Morton, rostedt, johnstul, mingo

Hi,

On Wed, 14 Dec 2005, George Anzinger wrote:

> > > IMHO then, the result should have the same property, i.e. ABS_TIME.  Sort
> > > of
> > > like adding an offset to a relative address. The result is still relative.
> > 
> > 
> > If the result is relative, why should have a clock set any effect?
> > IMO the spec makes it quite clear that initial timer and the periodic timer
> > are two different types of the timer. The initial timer only specifies how
> > the periodic timer is started and the periodic timer itself is a "relative
> > time service".
> > 
> Well, I guess we will have to agree to disagree.

That's easy for you to say. :)
You don't think the current behaviour is wrong.

>  That which the interval is
> added to is an absolute time, so I, and others, take the result as absolute.
> At this point there really is no "conversion" to an absolute timer.  Once the
> timer initial time is absolute, everything derived from it, i.e. all intervals
> added to it, must be absolute.

With this argumentation, any relative timer could be treated this way, you 
have to base a relative timer on something.
While searching for more information I found the NetBSD code and they 
do exactly this, they just convert everything to absolute values and clock 
set affects all timers equally. Is this now more correct?

> For what its worth, I do think that the standards folks could have done a bit
> better here.  I, for example, would have liked to have seen a discussion about
> what to do with overrun in the face of clock setting.

Maybe they thought it wouldn't be necessary :), because a periodic timer
is a relative timer and thus not affected...

bye, Roman


* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-19 14:56                   ` Roman Zippel
@ 2005-12-19 20:54                     ` George Anzinger
  2005-12-21 23:03                       ` Roman Zippel
  0 siblings, 1 reply; 74+ messages in thread
From: George Anzinger @ 2005-12-19 20:54 UTC (permalink / raw)
  To: Roman Zippel; +Cc: tglx, linux-kernel, Andrew Morton, rostedt, johnstul, mingo

Roman Zippel wrote:
> Hi,
> 
> On Wed, 14 Dec 2005, George Anzinger wrote:
> 
> 
>>>>IMHO then, the result should have the same property, i.e. ABS_TIME.  Sort
>>>>of
>>>>like adding an offset to a relative address. The result is still relative.
>>>
>>>
>>>If the result is relative, why should have a clock set any effect?
>>>IMO the spec makes it quite clear that initial timer and the periodic timer
>>>are two different types of the timer. The initial timer only specifies how
>>>the periodic timer is started and the periodic timer itself is a "relative
>>>time service".
>>>
>>
>>Well, I guess we will have to agree to disagree.
> 
> 
> That's easy for you to say. :)
> You don't think the current behaviour is wrong.
> 
> 
One of the issues I see with using your assumption is that moving the
timer to an absolute clock after the initial expiry _may_ lead to
additional quantization errors, depending on how aligned the two
clocks are.

>> That which the interval is
>>added to is an absolute time, so I, and others, take the result as absolute.
>>At this point there really is no "conversion" to an absolute timer.  Once the
>>timer initial time is absolute, everything derived from it, i.e. all intervals
>>added to it, must be absolute.
> 
> 
> With this argumentation, any relative timer could be treated this way, you 
> have to base a relative timer on something.
> While searching for more information I found the NetBSD code and they 
> do exactly this, they just convert everything to absolute values and clock 
> set affects all timers equally. Is this now more correct?
> 
I would guess, then, that either the non-absolute or the absolute 
timer behaves badly in the face of clock setting.  Could you provide a 
pointer to the NetBSD code so I can have a look too?
> 
>>For what its worth, I do think that the standards folks could have done a bit
>>better here.  I, for example, would have liked to have seen a discussion about
>>what to do with overrun in the face of clock setting.
> 
> 
> Maybe they thought it wouldn't be necessary :), because a periodic is a 
> relative timer and thus not affected...

Well, then they could have said that :)  Might have prevented a lot of 
lkml bandwidth usage as well as several days of my time trying to do 
something other than what they might say is the right thing.

-- 
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-19 14:50                 ` Roman Zippel
@ 2005-12-19 22:05                   ` Thomas Gleixner
  0 siblings, 0 replies; 74+ messages in thread
From: Thomas Gleixner @ 2005-12-19 22:05 UTC (permalink / raw)
  To: Roman Zippel; +Cc: linux-kernel, Andrew Morton, rostedt, johnstul, Ingo Molnar

Hi,

On Mon, 2005-12-19 at 15:50 +0100, Roman Zippel wrote: 

> It's not just about optimization tricks, it's about redundant information. 
> Right now the primary function of the state is to tell whether the timer 
> node is in the tree or now, so I prefer to add this information directly 
> to the rbnode, similiar to what we do with normal lists.
> The other problem is that such "simple" state variables quickly become 
> inadequate as the state machine becomes more complex, especially if 
> multiple processes run at the same time (e.g. a timer can be "running" 
> and/or "active"). So for me it's also for clarity and extensibility 
> reasons, that I want to avoid overloaded state machines and rather keep 
> it as simple as possible.

You want to avoid overloaded state machines and therefore overload an
rbnode struct with state information ?

Sorry, -ENOPARSE.

> > Not only in a -rt kernel, it also saves the context switch for a high
> > resolution timer enabled kernel, where you actually can execute the
> > callback in hard interrupt context. We can solve it differently, but we
> > should carefully think about the extensiblity issues. The wakeup +
> > restart szenario is a good reason to reconsider this.
> 
> I don't think executing something in the soft or hard interrupt context 
> makes a big difference on a normal kernel (at least I wouldn't call it a 
> context switch).

Well. What would you call it then?

  Thread A runs
  hrtimer interrupt
  timer X is expired, softirq is woken up

  context switch to softirq

  softirq runs
  timer X callback is executed, thread B is woken up

  context switch to thread B

versus

  Thread A runs
  hrtimer interrupt
  timer X is expired, callback is executed, thread B is woken up

  context switch to thread B

I still call it a context switch, because it is one, except for the case
where the softirq is called in the interrupt return path, but also then
we gain the advantage that we do not have to execute it.

> > > I can understand you want to keep the difference to the rt kernel small, 
> > > but for me it's more about immediate benefits against uncertain long term 
> > > benefits.
> > 
> > If you have a clear target and the experience of having implemented the
> > extensions, you have to carefully weigh up the consequences of such
> > decisions. I'm not talking about "might be implemented by somebody
> > sometimes features", I'm talking about existing proof of concept
> > implementations. There is no real justification to ignore well known
> > consequences.
> > 
> > Of course if you consider the possibility of including high resolution
> > timers (I'm not talking about -rt) as  zero, your requests make sense. 
> 
> Thomas, please don't treat me like an idiot, you may have more experiance 
> with hrtimer, but after working on it for a while I also know what I'm 
> talking about. Please accept that I have different focus on this, I want 
> to keep things as simple as possible. New features should stand on his 
> own and this includes the complexity they add to the kernel. The new 
> hrtimer especially cleans up greatly the posix timer stuff, this and 
> keeping all other users working is my primary focus now.

The basic hrtimer patch without any addons does not introduce
complexities and is simple and keeps everything working. 

Can you please elaborate the new features and complexities instead of
repeating this over and over without pointing out exactly what and where
it is?

> New features add new complexity and I want to see and evaluate it at the 
> time it's added to the kernel, primarily to find solutions to avoid the 
> (runtime) complexity for clocks which don't want to or can't support such 
> high resolutions, so they don't have to pay the price for these new 
> features. I want to keep things flexible and keeping things simple is IMO 
> a much better starting point.

The code is flexible to handle low resolution as well as a later high
resolution extension. It does not introduce additional complexity to
anything. Stop this prayer wheel argumentation and show exactly which
complexity it introduces.

> > I disagree because I'm convinced that the problems "high res timers",
> > "dynamic ticks", "timekeeping", "clock event abstraction" are closely
> > related and we have high demands to get those solved in a clean way. So
> > providing some jiffies bound minimal solution in the first place is more
> > than shortsighted IMO.
> 
> You're misunderstanding me, I don't want "some jiffies bound minimal 
> solution", I want to solve one problem at a time and fixing the jiffies 
> problem requires solving problems in the clock abstraction first, 
> otherwise you produce a crutch which works in most cases, but leaves a few 
> problem cases behind.

Which problem cases please ? 

> > Goerge said explicitely:
> > 
> > > My $0.02 worth: It is clear (from the standard) that the initial time 
> > > is to be ABS_TIME.  It is also clear that the interval is to be added 
> > > to that time.  IMHO then, the result should have the same property, 
> > > i.e. ABS_TIME. 
> > 
> > I dont find a way to read out that the interval should not have the
> > ABSTIME property.
> 
> That's not what you wrote earlier: "I agree, that the restriction to the 
> initial it_value is definitely something you can read out of the spec."

I was talking about George's citation and not about some random pieces
of text cut out of the original context. I still don't find a way to
interpret George's writing in the way you did.

> clock_settime says: "..., these time services shall expire when the 
> requested relative interval elapses, independently of the new or old value 
> of the clock." it_interval is a relative interval and otherwise the spec 
> only talks about "an initial expiration time, again either relative or 
> absolute," I can't really find a direct connection that TIMER_ABSTIME 
> should apply to the interval as well.

timer_settime says: "... The reload value of the timer is set to the value
specified by the it_interval member of value. When a timer is armed with
a non-zero it_interval, a periodic (or repetitive) timer is specified."

I don't see a notion that an interval removes the ABSTIME property
from a timer which was set up with the ABSTIME flag set.

And you are claiming not to change anything in order not to break
anything. The current upstream code is keeping ABSTIME interval timers
on the abslist, so why are you changing this at will for no real good
reason?

> > Please elaborate how the timer can expire early.
> 
> At this time you still did the rounding of the values, so it actually 
> worked.
> When reading a time t from a clock with resolution r, the real time can be 
> anything from t to t+r-1. Assuming it's currently t+r-1 and you try to set 
> a relative timer to r-1, you will read t from the clock and arm the timer 
> for t+r-1, which will cause the timer to expire at t+r, where it must not 
> expire before t+r-1+r-1.

I don't see where you pull this from.

At any given point the clock reads a value between two ticks.

t(tickn) <= now < t(tickn+1), where t(tickn+1) - t(tickn) = resolution

In any given case the interval is added to now. 

expiry = now + interval

The expiry check is still 

if (expiry <= now)
	expire_timer()

The softirq which handles the expiry is called every tick, which
guarantees that the timer is always expired at or past the tick
boundary, but it can never be expired early.

> It currently only works because latencies are usually larger than the 
> clock resolution, but if I want to configure hrtimer with a low resolution 
> clock, the problem can become visible.

Where is this configuration switch in the posted code ? The posted
hrtimers code is low resolution.

Show me a simple example code which makes this become visible.

> > > > > But this is now exactly the bevhaviour your timer has, why is not 
> > > > > "surprising" anymore?
> > > > 
> > > > Yes, we wrote that before. After reconsidering the results we came to
> > > > the conclusion, that we actually dont need the rounding at all because
> > > > the uneven distance is equally surprising as the summing up errors
> > > > introduced by rounding.
> > > 
> > > I don't think that the summing up error is surprising at all, the spec is 
> > > quite clear that the time values have to be rounded up to the resolution 
> > > of the timer and it's also the behaviour of the current timer.
> > 
> > No, the spec is not clear at all about this.
> > 
> > I pointed this out before and I still think that the part of the
> > RATIONALE section is the key to this decision.
> > 
> > "Also note that some implementations may choose to adjust time and/or
> > interval values to exactly match the ticks of the underlying clock"
> 
> You basically use this sentence as loophole to take the whole resolution 
> rounding rule ad absurdum and I disagree that this sentence means 
> "ignore everything above and do whatever you want".

I do not do whatever I want and you well know that. The rounding is still
done at expiry time, which means the rounding happens on the tick
boundary.

"Time values that are between two consecutive non-negative integer
multiples of the resolution of the specified timer will be rounded up to
the larger multiple of the resolution. Quantization error will not cause
the timer to expire earlier than the rounded time value.
....
Also note that some implementations may choose to adjust time and/or
interval values to exactly match the ticks of the underlying clock."
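
The rounding rule in the first sentence of that passage amounts to the
following arithmetic. This is a sketch only: the function name is
hypothetical, values are plain non-negative integers in one time unit,
and overflow near the top of the range is not handled.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round value up to the next multiple of res. Values that already are
 * a multiple of res are returned unchanged.
 */
static int64_t round_up_to_resolution(int64_t value, int64_t res)
{
	int64_t rem;

	assert(value >= 0 && res > 0);
	rem = value % res;
	return rem ? value + (res - rem) : value;
}
```

With a resolution of 10, a value of 25 becomes 30, while 30 stays 30.
Whether this is applied to the requested value when the timer is armed
or happens implicitly by checking expiry on tick boundaries is the
point under discussion here.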

Please tell me where my interpretation is violating the spec.

It is different from your interpretation, that's all.

> > You decide to do the adjustment. I prefer not to do so and I dont buy
> > any argument which says, that the current behaviour is the yardstick for
> > everything. It can't be. Otherwise we would not be able to introduce
> > high resolution timers at all. Every application which relies on some
> > behaviour of the kernel which is not explicitely required by the
> > specification is broken by definition. 
> 
> The problem is that you force this behaviour also on other hrtimer users
> (itimers and nanosleep) and we should be very careful with such behaviour 
> changes. My proposal keeps the current behaviour and is less likely to 
> break anything. As I said before I'm not against changing the behaviour, 
> but it should be done carefully.
> 
> > I vote strongly against introducing private, special purpose APIs and I
> > consider CLOCK_XXX_HR as such. The proposed hrtimer solution does not
> > introduce any penalties for people who do not enable a future high
> > resolution extension. It gives us the benefit of a clean code base which
> > is capable to be switched simply and non intrusive to the high
> > resolution mode. We have done extensive tests on the impact of
> > converting all users unconditionally to high resolution mode once it is
> > switched on and the penalty is within the noise range. 
> > 
> > You are explicitely asking for increased complexity with your approach. 
> 
> Which in this case it would be good thing. Right now we don't have much 
> choice in clock source, but that will change soon and I think it would be 
> a good to be able to map a timer to specific clock source. The gained 
> flexibility outweighs the required complexity greatly.

I really want to know what you are talking about. I have proven, that
without clock abstraction improvements the current code is working
without one single reference to jiffies. When clock abstractions are
available the code will make use of them just by changing the functions
which read the current time.

> > > > While my focus is a clean merging of high resolution timers without
> > > > breaking the non hrt case, I still believe that we can find a solution,
> > > > where we can have the base implementation without any reference to
> > > > jiffies.
> > > 
> > > This is what I think requires the better clock abstraction, most of it is 
> > > related to the clock resolution, the generic timer code currently has no 
> > > idea of the real resolution of the underlying clock, so I assumed a worst 
> > > case of TICK_NSEC everywhere.
> > 
> > Well, can you please show where the current hrtimer implementation  is
> > doing something which requires a better clock abstraction ?
> 
> 1. clock resolution is unknown (see above).
> 2. reprogrammable timer interrupts.

There is no single reference to a reprogrammable timer interrupt in the
current hrtimer code which was posted and replaced the initial ktimer
code. 

Please stop mixing up the high resolution timer implementation on top of
hrtimers with the hrtimers base patch for a cheap argument. We have been
there and I don't see a point in getting back to this kind of discussion.

> > The clock abstraction layer is relevant if you actuallly want to switch
> > to high resolution mode and until then the coarse interface is
> > sufficient.
> 
> Right and until then it's also not really avoidable, that you find a few 
> references to jiffies. I'm not saying that we have to keep them, but 
> please only one step at a time.

No, it is not necessary at all. That's proven and it is one step at a
time. If we can avoid jiffies in the first place why should we put them
there ? What would we gain ?

> > > What I'd like to avoid is that we have to commit ourselves to only one 
> > > solution right now. I think the first solution is better suited to the low 
> > > resolution timer, that we have right now. The second solution requires a 
> > > better clock framework - this includes better time keeping and 
> > > reprogrammable timer interrupts.
> > 
> > We have to choose one. Everything else is a bad compromise. There is
> > nothing worse than making no decision when you are at a point where a
> > decision is required.
> 
> Do you really have so little trust in your own code, that we can't afford 
> the flexibility and have to hardcode the timer resolution now?

I trust my code, but you seem to trust jiffies more than simple math.

What flexibility do you gain with your jiffies implementation ? 
The flexibility to add some replacement code which will look basically
the same way as hrtimers are looking now ?

	tglx




* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-19 20:54                     ` George Anzinger
@ 2005-12-21 23:03                       ` Roman Zippel
  2005-12-22  4:30                         ` George Anzinger
  0 siblings, 1 reply; 74+ messages in thread
From: Roman Zippel @ 2005-12-21 23:03 UTC (permalink / raw)
  To: George Anzinger
  Cc: tglx, linux-kernel, Andrew Morton, rostedt, johnstul, mingo

Hi,

On Mon, 19 Dec 2005, George Anzinger wrote:

> > You don't think the current behaviour is wrong.
> > 
> > 
> One of the issues I see with using your assumption is that moving the timer to
> an absolute clock after the initial expiry _may_ lead to additional
> quantization errors, depending on how aligned the two clocks are.

What do you mean by "moving the timer to an absolute clock"?

> I would guess, then, that either the non-absolute or the absolute timer
> behaves badly in the face of clock setting.  Could you provide a pointer to
> the NetBSD code so I can have a look too?

http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/kern/kern_time.c?rev=1.98&content-type=text/x-cvsweb-markup
AFAICT TIMER_ABSTIME is only used to convert the relative value to an 
absolute value.

bye, Roman


* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-21 23:03                       ` Roman Zippel
@ 2005-12-22  4:30                         ` George Anzinger
  0 siblings, 0 replies; 74+ messages in thread
From: George Anzinger @ 2005-12-22  4:30 UTC (permalink / raw)
  To: Roman Zippel; +Cc: tglx, linux-kernel, Andrew Morton, rostedt, johnstul, mingo

Roman Zippel wrote:
> Hi,
> 
> On Mon, 19 Dec 2005, George Anzinger wrote:
> 
> 
>>>You don't think the current behaviour is wrong.
>>>
>>>
>>
>>One of the issues I see with using your assumption is that moving the timer to
>>an absolute clock after the initial expiry _may_ lead to additional
>>quantization errors, depending on how aligned the two clocks are.
> 
> 
> What do you mean by "moving the timer to an absolute clock"?

The assumption I am making is that the timer is connected to a clock
(CLOCK_MONOTONIC or CLOCK_REALTIME).  Timers on CLOCK_REALTIME with
the absolute flag set should expire at the requested time as read from
that clock, whereas relative timers are not affected by time setting
and thus should be on CLOCK_MONOTONIC.  It is unclear, in general, how
these two clocks relate to each other at the nanosecond level, or so
one might think.  Of course, we can define this problem away by a
particular definition of one of these clocks as being derived from the
other (which we, in fact, do in Linux).
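
As a rough user-space illustration of one clock being derived from the
other, the sketch below reads both POSIX clocks back to back and reports
the offset between them.  If the two share one underlying time source,
that offset should stay roughly constant between reads until
CLOCK_REALTIME is stepped.  The helper names ts_to_ns and
realtime_minus_monotonic_ns are hypothetical; only clock_gettime() and
the two clock IDs come from POSIX, and none of this is kernel code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: flatten a timespec into signed nanoseconds. */
static int64_t ts_to_ns(struct timespec ts)
{
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Hypothetical helper: read both clocks back to back and store
 * CLOCK_REALTIME - CLOCK_MONOTONIC in *offset_ns.
 * Returns 0 on success, -1 if either clock_gettime() call fails.
 */
static int realtime_minus_monotonic_ns(int64_t *offset_ns)
{
    struct timespec rt, mono;

    assert(offset_ns != NULL);
    if (clock_gettime(CLOCK_MONOTONIC, &mono) != 0)
        return -1;
    if (clock_gettime(CLOCK_REALTIME, &rt) != 0)
        return -1;

    *offset_ns = ts_to_ns(rt) - ts_to_ns(mono);
    return 0;
}
```

Calling realtime_minus_monotonic_ns() twice, a few seconds apart, and
comparing the results gives a feel for how closely the two clocks track
each other; the two reads are not atomic, so small differences between
calls are expected.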
> 
> 
>>I would guess, then, that either the non-absolute or the absolute timer
>>behaves badly in the face of clock setting.  Could you provide a pointer to
>>the NetBSD code so I can have a look too?
> 
> 
> http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/kern/kern_time.c?rev=1.98&content-type=text/x-cvsweb-markup
> AFAICT TIMER_ABSTIME is only used to convert the relative value to an 
> absolute value.

Yes, there is also this interesting comment in settime:
/* WHAT DO WE DO ABOUT PENDING REAL-TIME TIMEOUTS??? */

I strongly suspect that this system does NOT expire absolute timers 
and clock_nanosleep calls at the requested time in the face of clock 
setting.

I see NO hooks in the referenced code that would allow them to find 
such timers at clock set time, nor are they entered into any different 
list to make them findable.  It would appear that the absolute 
attribute is lost as soon as the time is converted to a relative time.

In fairness, the POSIX folks added the clock setting requirement a few 
years after the absolute flag was defined...  but, still, there is 
that comment.
	

-- 
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-14 10:03   ` Nicolas Mailhot
@ 2005-12-15  1:11     ` George Anzinger
  0 siblings, 0 replies; 74+ messages in thread
From: George Anzinger @ 2005-12-15  1:11 UTC (permalink / raw)
  To: Nicolas Mailhot; +Cc: Thomas Gleixner, Roman Zippel, linux-kernel

Nicolas Mailhot wrote:
> On Mer 14 décembre 2005 00:38, George Anzinger wrote:
> 
>>Nicolas Mailhot wrote:
>>
>>>"This is your interpretation and I disagree.
>>>
>>>If I set up a timer with a 24 hour interval, which should go off
>>>everyday at 6:00 AM, then I expect that this timer does this even when
>>>the clock is set e.g. by daylight saving. I think, that this is a
>>>completely valid interpretation and makes a lot of sense from a
>>>practical point of view. The existing implementation does it that way
>>>already, so why do we want to change this ?"
>>
>>I think that there is a misunderstanding here.  The kernel timers,
>>at this time, do not know or care about daylight savings time.  This
>>is not really a clock set but a time zone change which does not
>>intrude on the kernel's notion of time (that being, more or less UTC).
> 
> 
> Probably. I freely admit I didn't follow the whole discussion. But the
> example quoted strongly hinted at fudging timers in case of DST, which
> would be very bad if done systematically and not on explicit user request.
> 
> What I meant to write is "do not assume any random clock adjustment
> should change timer duration". Some people want it, others definitely
> don't.
> 
> In case of kernel code legal time should be pretty much irrelevant, so if
> 24h timers are adjusted so they still go off at the same legal hour, that
> would be a bug IMHO.

I am not quite sure what you are asking for here, but, as things stand
today, the kernel's notion of time starts with a set time somewhere
around boot up, be it from RT clock hardware or, possibly, some script
that queries some other system to find the time (NTP or otherwise).
This time is then kept up to date by timer ticks which are assumed to
have some fixed duration with, possibly, small drift corrections via
NTP code.  And then there is the random settimeofday, which the kernel
has to assume is "needed" and correct.

On top of this the POSIX clocks and timers code implements clocks 
which read the current time and a system relative time called 
monotonic time.  We, by convention, roll the monotonic time and uptime 
together, and, assuming that the NTP corrections are telling us 
something about our "rock", we correct both the monotonic time and the 
time of day as per the NTP requests.

Timers are then built on top of these clocks in two ways, again, as 
per the POSIX standard: 1) the relative timer, and 2) the absolute 
timer.  For the relative timer, the specified expiry time is defined 
to be _now_ plus the given interval.  For the absolute timer the 
expiry time is defined as that time when the given clock reaches the 
requested time.
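
From user space, the two flavours map onto the flags argument of
clock_nanosleep().  The sketch below is only an illustration of that
POSIX interface, not of the hrtimer patches themselves; the helper
names ts_add_ns, sleep_relative_ns and sleep_until_realtime are
hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical helper: add ns (>= 0) to ts, keeping tv_nsec normalized. */
static struct timespec ts_add_ns(struct timespec ts, long long ns)
{
    assert(ns >= 0);
    ts.tv_sec += (time_t)(ns / NSEC_PER_SEC);
    ts.tv_nsec += (long)(ns % NSEC_PER_SEC);
    if (ts.tv_nsec >= NSEC_PER_SEC) {
        ts.tv_sec += 1;
        ts.tv_nsec -= NSEC_PER_SEC;
    }
    return ts;
}

/* Relative sleep: expiry is "now plus interval" (flags == 0). */
static int sleep_relative_ns(long long ns)
{
    struct timespec zero = { 0, 0 };
    struct timespec interval = ts_add_ns(zero, ns);

    return clock_nanosleep(CLOCK_MONOTONIC, 0, &interval, NULL);
}

/* Absolute sleep: expiry is when CLOCK_REALTIME reaches *deadline. */
static int sleep_until_realtime(const struct timespec *deadline)
{
    return clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME, deadline, NULL);
}
```

Both helpers return 0 on success or an error number such as EINTR, as
clock_nanosleep() does.  A caller wanting "wake me at 06:00:00 by the
wall clock" would build the deadline from CLOCK_REALTIME and use the
absolute variant; a caller wanting "wake me in N nanoseconds" would use
the relative one.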

The only thing in here that might relate to your "legal" hour is that 
we adjust (via NTP) the clocks so that they, supposedly, run at the 
NBS (or is it the Naval Observatory) rate, give or take a small, 
hopefully, well defined error.
> 

-- 
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-13 23:38 ` George Anzinger
  2005-12-14  8:58   ` Kyle Moffett
@ 2005-12-14 10:03   ` Nicolas Mailhot
  2005-12-15  1:11     ` George Anzinger
  1 sibling, 1 reply; 74+ messages in thread
From: Nicolas Mailhot @ 2005-12-14 10:03 UTC (permalink / raw)
  To: george; +Cc: Thomas Gleixner, Roman Zippel, linux-kernel


On Mer 14 décembre 2005 00:38, George Anzinger wrote:
> Nicolas Mailhot wrote:
>> "This is your interpretation and I disagree.
>>
>> If I set up a timer with a 24 hour interval, which should go off
>> everyday at 6:00 AM, then I expect that this timer does this even when
>> the clock is set e.g. by daylight saving. I think, that this is a
>> completely valid interpretation and makes a lot of sense from a
>> practical point of view. The existing implementation does it that way
>> already, so why do we want to change this ?"
>
> I think that there is a misunderstanding here.  The kernel timers,
> at this time, do not know or care about daylight savings time.  This
> is not really a clock set but a time zone change which does not
> intrude on the kernel's notion of time (that being, more or less UTC).

Probably. I freely admit I didn't follow the whole discussion. But the
example quoted strongly hinted at fudging timers in case of DST, which
would be very bad if done systematically and not on explicit user request.

What I meant to write is "do not assume any random clock adjustment
should change timer duration". Some people want it, others definitely
don't.

In case of kernel code legal time should be pretty much irrelevant, so if
24h timers are adjusted so they still go off at the same legal hour, that
would be a bug IMHO.

-- 
Nicolas Mailhot



* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-13 23:38 ` George Anzinger
@ 2005-12-14  8:58   ` Kyle Moffett
  2005-12-14 10:03   ` Nicolas Mailhot
  1 sibling, 0 replies; 74+ messages in thread
From: Kyle Moffett @ 2005-12-14  8:58 UTC (permalink / raw)
  To: george; +Cc: Nicolas Mailhot, Thomas Gleixner, Roman Zippel, linux-kernel

On Dec 13, 2005, at 18:38, George Anzinger wrote:
> I think that there is a misunderstanding here.  The kernel
> timers, at this time, do not know or care about daylight savings
> time.  This is not really a clock set but a time zone change which
> does not intrude on the kernel's notion of time (that being, more or
> less UTC).

One question I have right now is:  How does the kernel treat time  
slewing?  Sometimes I might want to say: "The clock has continuous  
error and measures 24hours and 2 seconds for every 24 hours of real  
time", in which case the monotonic time should be slewed -2sec/ 
24hours.  On the other hand, I might also want to say: "The clock has  
fixed error and is 2 hours ahead cause some dummy messed up the  
time", so I'm going to fix this over the next 2 weeks by slewing  
backwards 1 hour per 7 days, in which case I do _not_ want the  
monotonic time to be affected (I'm passing 2 days, not 1 day and 22  
hours).  How does the kernel handle this?  I've never seen any good  
description of the NTP and time-control APIs; if there is one out  
there (that's not 42 pages of dry standard), I would love a link.
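
To put numbers on the two cases above, here is a small arithmetic
sketch that converts "so many seconds of error over so many seconds of
real time" into a slew rate in parts per million.  The helper name
slew_ppm is hypothetical and this is plain arithmetic, not any kernel
or NTP interface.

```c
#include <assert.h>

/*
 * Hypothetical helper: slew rate, in parts per million, needed to
 * remove error_s seconds of error over period_s seconds of real time.
 */
static double slew_ppm(double error_s, double period_s)
{
    assert(period_s > 0.0);
    return error_s / period_s * 1e6;
}
```

With these definitions, slew_ppm(2.0, 86400.0) is about 23.15 ppm for
the "2 seconds per 24 hours" case, while slew_ppm(3600.0, 7.0 * 86400.0)
is about 5952.4 ppm for the "1 hour per 7 days" case, so the second
correction is more than two orders of magnitude steeper than the first.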

Cheers,
Kyle Moffett

--
If you don't believe that a case based on [nothing] could potentially  
drag on in court for _years_, then you have no business playing with  
the legal system at all.
   -- Rob Landley





* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
  2005-12-13 12:45 Nicolas Mailhot
@ 2005-12-13 23:38 ` George Anzinger
  2005-12-14  8:58   ` Kyle Moffett
  2005-12-14 10:03   ` Nicolas Mailhot
  0 siblings, 2 replies; 74+ messages in thread
From: George Anzinger @ 2005-12-13 23:38 UTC (permalink / raw)
  To: Nicolas Mailhot; +Cc: Thomas Gleixner, Roman Zippel, linux-kernel

Nicolas Mailhot wrote:
> "This is your interpretation and I disagree.
> 
> If I set up a timer with a 24 hour interval, which should go off
> everyday at 6:00 AM, then I expect that this timer does this even when
> the clock is set e.g. by daylight saving. I think, that this is a
> completely valid interpretation and makes a lot of sense from a
> practical point of view. The existing implementation does it that way
> already, so why do we want to change this ?"

I think that there is a misunderstanding here.  The kernel timers,
at this time, do not know or care about daylight savings time.  This
is not really a clock set but a time zone change which does not
intrude on the kernel's notion of time (that being, more or less UTC).
> 
> Please do not hardcode anywhere 1 day = 24h or something like this.
> Relative timers should stay relative not depend on DST.

As far as timers go, it is only the user who understands any 
abstraction above the second.  I.e. hour, day, min. all are 
abstractions done in user land.

There is, however, one exception, the leap second.  The kernel inserts 
this at midnight UTC and does use a fixed constant (86400) to find 
midnight.
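
The reason a fixed constant works is that POSIX time counts every day
as exactly 86400 seconds, so UTC midnight is simply a multiple of that
constant.  A minimal sketch of the arithmetic, with hypothetical helper
names and no claim to match the kernel's actual leap second code:

```c
#include <assert.h>
#include <stdint.h>

#define SECS_PER_DAY 86400

/* True when utc_sec (seconds since the Epoch) is exactly 00:00:00 UTC. */
static int is_utc_midnight(int64_t utc_sec)
{
    assert(utc_sec >= 0);
    return utc_sec % SECS_PER_DAY == 0;
}

/*
 * Seconds from utc_sec until the next UTC midnight
 * (a full day if utc_sec is already at midnight).
 */
static int64_t secs_to_next_utc_midnight(int64_t utc_sec)
{
    assert(utc_sec >= 0);
    return SECS_PER_DAY - utc_sec % SECS_PER_DAY;
}
```

Nothing here knows about hours, minutes or time zones; those remain
user-land abstractions as described above.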
> 
> If someone needs a timer that goes off every day at the same (legal) time,
> make him ask for every day at that time, not one time + n x 24h.
> 
> Some processes need an exact legal hour
> Other processes need an exact duration

I think what we are saying is that ABS time flag says that the timer 
is supposed to expire at the given time "by the specified clock", 
however that time is arrived at, be it the initial time or the initial 
time plus one or more intervals.  We are NOT saying that these 
intervals are the same size, but only that the given clock says that 
they are the same size, thus any clock setting done during an interval 
can cause that interval to be of a different size.

Without the ABS time flag, we are talking about intervals (the initial 
and subsequent) that are NOT affected by clock setting and are thus as 
close to the requested duration as possible.
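
A toy model of that difference, under the simplifying assumption of a
single clock step during one interval.  All names are hypothetical and
times are whole seconds of real elapsed time since the timer was armed;
this models the semantics described above, not any implementation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Real seconds until a timer with the ABS flag fires, when the clock is
 * stepped by step_s (positive = forward) at step_at_s real seconds after
 * arming.  The expiry is a fixed clock reading, so a step moves it.
 */
static int64_t abs_timer_real_wait(int64_t interval_s, int64_t step_at_s,
                                   int64_t step_s)
{
    int64_t wait;

    assert(interval_s >= 0 && step_at_s >= 0);
    if (step_at_s >= interval_s)
        return interval_s;          /* already fired before the step */
    wait = interval_s - step_s;
    /* A forward step past the expiry fires the timer at the step. */
    return wait < step_at_s ? step_at_s : wait;
}

/* Without the ABS flag the step is ignored: the duration is kept. */
static int64_t rel_timer_real_wait(int64_t interval_s, int64_t step_at_s,
                                   int64_t step_s)
{
    (void)step_at_s;
    (void)step_s;
    assert(interval_s >= 0);
    return interval_s;
}
```

For a 86400 second interval with the clock set forward by 3600 seconds
after 1000 seconds, the ABS timer fires after 82800 real seconds while
the relative one still takes 86400; set the clock back by 3600 instead
and the ABS timer takes 90000.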
> 
> In a DST world that's not the same thing at all - don't assume one or the
> other, have coders request exactly what they need and everyone will be
> happy.

This is why the standard introduced the ABS time flag.  It does NOT, 
however, IMHO touch on the issue of time zone changes introduced by 
shifting into and out of daylight savings time.
> 
> I can tell from experience trying to fix code which assumed one day = 24h
> is not fun at all. And yes sometimes the difference between legal and UTC
> time matters a lot.
> 

-- 
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


* Re: [patch 00/21] hrtimer - High-resolution timer subsystem
@ 2005-12-13 12:45 Nicolas Mailhot
  2005-12-13 23:38 ` George Anzinger
  0 siblings, 1 reply; 74+ messages in thread
From: Nicolas Mailhot @ 2005-12-13 12:45 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Roman Zippel, linux-kernel

"This is your interpretation and I disagree.

If I set up a timer with a 24 hour interval, which should go off
everyday at 6:00 AM, then I expect that this timer does this even when
the clock is set e.g. by daylight saving. I think, that this is a
completely valid interpretation and makes a lot of sense from a
practical point of view. The existing implementation does it that way
already, so why do we want to change this ?"

Please do not hardcode anywhere 1 day = 24h or something like this.
Relative timers should stay relative not depend on DST.

If someone needs a timer that goes off every day at the same (legal) time,
make him ask for every day at that time, not one time + n x 24h.

Some processes need an exact legal hour
Other processes need an exact duration

In a DST world that's not the same thing at all - don't assume one or the
other, have coders request exactly what they need and everyone will be
happy.

I can tell from experience trying to fix code which assumed one day = 24h
is not fun at all. And yes sometimes the difference between legal and UTC
time matters a lot.

-- 
Nicolas Mailhot



end of thread

Thread overview: 74+ messages
2005-12-06  0:01 [patch 00/21] hrtimer - High-resolution timer subsystem tglx
2005-12-06  0:01 ` [patch 01/21] Move div_long_long_rem out of jiffies.h tglx
2005-12-06  0:01 ` [patch 02/21] Remove duplicate div_long_long_rem implementation tglx
2005-12-06  0:01 ` [patch 03/21] Deinline mktime and set_normalized_timespec tglx
2005-12-06  0:01 ` [patch 04/21] Clean up mktime and make arguments const tglx
2005-12-06  0:01 ` [patch 05/21] Export deinlined mktime tglx
2005-12-06  0:01 ` [patch 06/21] Remove unused clock constants tglx
2005-12-06  0:01 ` [patch 07/21] Coding style clean up of " tglx
2005-12-06  0:01 ` [patch 08/21] Coding style and white space cleanup tglx
2005-12-06  0:01 ` [patch 09/21] Make clockid_t arguments const tglx
2005-12-06  0:01 ` [patch 10/21] Coding style and white space cleanup tglx
2005-12-06  0:01 ` [patch 11/21] Create and use timespec_valid macro tglx
2005-12-06  0:01 ` [patch 12/21] Validate timespec of do_sys_settimeofday tglx
2005-12-06  0:01 ` [patch 13/21] Introduce nsec_t type and conversion functions tglx
2005-12-06  0:01 ` [patch 14/21] Introduce ktime_t time format tglx
2005-12-06  0:01 ` [patch 15/21] hrtimer core code tglx
2005-12-15  3:43   ` Matt Helsley
2005-12-06  0:01 ` [patch 16/21] hrtimer documentation tglx
2005-12-06  0:01 ` [patch 17/21] Switch itimers to hrtimer tglx
2005-12-06  0:01 ` [patch 18/21] Create hrtimer nanosleep API tglx
2005-12-06  0:01 ` [patch 19/21] Switch sys_nanosleep to hrtimer tglx
2005-12-06  0:01 ` [patch 20/21] Switch clock_nanosleep to hrtimer nanosleep API tglx
2005-12-06  0:01 ` [patch 21/21] Convert posix timers completely tglx
2005-12-06 17:32 ` [patch 00/21] hrtimer - High-resolution timer subsystem Roman Zippel
2005-12-06 19:07   ` Ingo Molnar
2005-12-07  3:05     ` Roman Zippel
2005-12-08  5:18       ` Paul Jackson
2005-12-08  8:12         ` Ingo Molnar
2005-12-08  9:26       ` Ingo Molnar
2005-12-08 13:08         ` Roman Zippel
2005-12-08 15:36           ` Steven Rostedt
2005-12-06 22:10   ` Thomas Gleixner
2005-12-07  3:11     ` Roman Zippel
2005-12-06 22:28   ` Thomas Gleixner
2005-12-07  9:31     ` Andrew Morton
2005-12-07 10:11       ` Ingo Molnar
2005-12-07 10:20         ` Ingo Molnar
2005-12-07 10:23         ` Nick Piggin
2005-12-07 10:49           ` Ingo Molnar
2005-12-07 11:09             ` Nick Piggin
2005-12-07 11:33               ` Ingo Molnar
2005-12-07 11:40                 ` Nick Piggin
2005-12-07 13:06                 ` Roman Zippel
2005-12-07 12:40               ` Roman Zippel
2005-12-07 23:12                 ` Nick Piggin
2005-12-07 12:18     ` Roman Zippel
2005-12-07 16:55       ` Ingo Molnar
2005-12-07 17:17         ` Roman Zippel
2005-12-07 17:57           ` Ingo Molnar
2005-12-07 18:18             ` Roman Zippel
2005-12-07 18:02           ` Paul Baxter
2005-12-09 17:23       ` Thomas Gleixner
2005-12-12 13:39         ` Roman Zippel
2005-12-12 16:42           ` Thomas Gleixner
2005-12-12 18:37             ` Thomas Gleixner
2005-12-13  1:25             ` George Anzinger
2005-12-13  9:18               ` Thomas Gleixner
2005-12-15  1:35               ` Roman Zippel
2005-12-15  2:29                 ` George Anzinger
2005-12-19 14:56                   ` Roman Zippel
2005-12-19 20:54                     ` George Anzinger
2005-12-21 23:03                       ` Roman Zippel
2005-12-22  4:30                         ` George Anzinger
2005-12-14 20:48             ` Roman Zippel
2005-12-14 22:30               ` Thomas Gleixner
2005-12-15  0:55                 ` George Anzinger
2005-12-15 14:18                 ` Steven Rostedt
2005-12-19 14:50                 ` Roman Zippel
2005-12-19 22:05                   ` Thomas Gleixner
2005-12-13 12:45 Nicolas Mailhot
2005-12-13 23:38 ` George Anzinger
2005-12-14  8:58   ` Kyle Moffett
2005-12-14 10:03   ` Nicolas Mailhot
2005-12-15  1:11     ` George Anzinger
