linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [ANNOUNCE] ktimers subsystem
@ 2005-09-19 16:48 tglx
  2005-09-19 16:48 ` [PATCH] " tglx
                   ` (3 more replies)
  0 siblings, 4 replies; 50+ messages in thread
From: tglx @ 2005-09-19 16:48 UTC (permalink / raw)
  To: linux-kernel; +Cc: mingo, akpm, george, johnstul, paulmck

ktimers seperate the "timer API" from the "timeout API". ktimers are 
used for:
- nanosleep
- posixtimers
- itimers

The following text explains the rationale behind ktimers. It contains

- a general analysis of the current Linux time(r) core system and
patches / projects related to it.  

- a detailed description of necessary changes to the Linux time(r)
core system

- detailed explanation of the ktimer subsystem

- a short explanation of possible follow up patches to demonstrate the
  further possibilities of the ktimers subsystem implementation


Why do we need ktimers ?
========================

Authors: Thomas Gleixner, Ingo Molnar

A lot of discussion took place about Linux timekeeping and timers 
recently. The efforts to integrate the High Resolution Timer patches 
into the -rt tree gave a deep insight into the big picture and initiated 
the ktimers implementation. This document is an analysis of all related 
issues, with the goal of inclusion of ktimers into mainline.


Linux time(rs) status, very short summary
-----------------------------------------

The current upstream timer implementation of Linux is based on a 
periodic system tick (jiffy). This periodic tick initially had a period 
of 10ms. During the 2.5 development series this was changed for some 
architectures to 1ms and recently corrected to 4ms.

The upstream "time of day" (tod) timekeeping code builds on top of the 
periodic ticks and takes architecture dependent sub-tick resolution 
mechanisms into account to provide finer resolution. It also implements 
synchronization with external time sources such as NTP.


Patches and projects related to timers and timekeeping, in history order
------------------------------------------------------------------------

- UTIME   Usec Resolution Timers
- HRT     High Resolution Timers
- VST     Variable Scheduling Timeouts
- DTCK    Dynamic Ticks
- NEWTOD  New timeofday core including reworked NTP

- some related architecture specific code already integrated into the
  kernel (mostly s390 related time interpolator code)
- a couple of others - rather odd attempts to change timer
  resolution. Mostly single purpose patches.

All of those patches have one thing in common. They are restricted to a 
few architectures and address only single problems of timers and 
timekeeping.


UTIME: Microsecond resolution timers 
-------------------------------------

The patch history goes back to Linux 2.0 and is maintained by Dr.  
Douglas Niehaus at Kansas University as part of the KURT (Kansas 
University Realtime) project.

Supported platforms: x86, (PPC, ARM partial)

Implementation:

Originally the usec resolution support was available for all users of 
the timer core system, but during the course of development it turned 
out to be a waste of resources and was restricted to realtime processes.  
The implementation is restricted to nanosleep and itimers. Posix timer 
support is planned. Initially it was implemented on top of the timer 
wheel, but recently converted to a seperate list for high resolution 
timers.

A related and quite interesting point of activity in this project is the 
research on fine granular synchronization of machines in a network.


HRT: High Resolution Timers
---------------------------

George Anzinger forked the UTIME parts of KURT some years ago and 
started the High Resolution Timer project. The usage is restricted to 
posix timers with clock = CLOCK_REALTIME_HR and CLOCK_MONOTONIC_HR.

Supported platforms: x86, PPC, PPC, PPC64, SH, ARM partial

Implementation:

High resolution timers are kept in the timer wheel until they reach the 
jiffy where they expire. On expiry they are moved to a seperate list and 
arm the high resolution timer source. The timer function is handled in a 
seperate softirq. The high resolution portion of the timers is managed 
as a seperate field in the timer structure which holds the fractional 
part. Initially implemented as subjiffies with a given resolution it 
changed to a variable holding the time source cycles to reduce 
conversion overhead. This has impact on the accuracy of cyclic 
schedules and also has some limitations for high resolution time sources
with variable frequencies, e.g. TSC on x86.
Possible changes are work in progress.


VST: Variable Scheduling Timeouts
---------------------------------

A patch closely related to and depending on HRT, also maintained by 
George Anzinger. It provides the suppression of timer ticks during idle 
periods.

Supported platforms: x86, (One ARM platform supported, a related patch 
snippet recently sneaked into the ARM core interrupt handling code)

Implementation:

Whenever the system goes into idle state the timer list(s) are scanned 
to find the next timer and, if it is reasonably far away, VST turns off 
the periodic 1/HZ timer interrupts and sets up a timer interrupt at the 
expiry time of said timer. VST also provides a callback list which is 
used to notify about idle enter / leave events. On the next interrupt, 
be it the VST timer or some other interrupt, the periodic 1/HZ timer is 
restarted and the elapsed time (ticks) is properly accounted for.


DTCK: Dynamic Ticks
-------------------

An implementation similar to VST, but not depending on other patches.  
Maintained by Con Kolivas. It also provides the suppression of timer 
ticks during idle periods.

Supported platforms: x86

Implementation:

Similar to VST, but completely jiffy bound. Very actively maintained in 
recent months, with good progress. The core implementation is leaner 
than VST. It contains some x86 specific bits which have to be seperated 
out. It uses the already existing NO_IDLE_HZ code (s390) instead of 
introducing new duplicated functionality. The configuration interface is 
integrated into sysfs. A generic notification interface is not available 
(yet?).


NEWTOD: New timeofday core including reworked NTP
-------------------------------------------------

John Stultz maintains a set of patches which are related, but have been 
split for easier review and discussion.

- Reworked implementation of NTP synchronization
- Seperation of time of day timekeeping from the timer core

Supported platforms: x86

Implementation:

The time of day code is completely seperated from the periodic tick. The 
code provides a runtime configurable time source selection, which is 
intended to be of generic (architecture spanning) use. One of the 
possible time sources is the periodic tick of course.

The code gets rid of one of the fundamental flaws of Linux time keeping: 
the wrong order of deduction. The current upstream code derives almost 
everything except jiffies from the wall clock time (xtime). This is 
controversial to almost every time reference in the world. Usually time 
references are built on a raw hardware clock which provides a more or 
less accurate monotonic time source. This time source is corrected vs.  
frequency skew. On top of resulting "constant frequency" monotonic time 
source the human time conversions are implemented, e.g. timezones.  
John's patch addresses this nicely and builds the correct order of clock 
source -> frequency correction -> wall clock adjustment. This is one of 
the essential preliminaries to implement nonintrusive, simple and 
effective high resolution time support.

The lively discussion of the patch is not questioning the general idea.  
The main point of criticizm is related to the enforced usage of 64-bit 
variables and 64-bit arithmetic in hot execution paths. This is seen as 
a penalty for 32-bit architectures and for low computing power CPUs 
which are often used in embedded devices, but architectural simplicity 
is a strong argument in favor of 64-bit arithmetics and we have not seen 
a substantial proof of the overhead. (See also the detailed comparison 
of timespec versus 64-bit nsec_t further below.)


Timer related observations
--------------------------

Ticks are a convenient mechanism for a lot of time triggered functions 
like scheduler-ticks, timeouts etc. which require limited resolution and 
precision.

For time of day timekeeping, which requires sub tick resolution 
preferrable in human time units, ticks introduce a bunch of ugliness 
especially when it comes to time synchronizing with high resolution time 
sources. Another astonishing implementation detail of the current time 
keeping is the fact that we get the monotonic clock (defined by POSIX as 
a continous clock source which can not be set) by subtracting a variable 
offset from the real time clock, which can be set by the user and 
corrected by NTP or other mechanisms.

Another well-known drawback of the current tick based implementation is 
the fact that ticks happen even on a completely idle system. This is an 
undesired behaviour for battery powered devices. Resolving this 
currently needs a lot of quirks to the upstream time(rs) system.

The current POSIX timer implementation is also quite complex due to its 
implementation on top of the tick timers. We are forced to e.g. keep 
track of armed CLOCK_REALTIME timers to readjust them when the clock has 
been set. The POSIX timer API, as defined by Posix Specification 1003.1, 
is inconsistent in the treatment of relative and absolute CLOCK_REALTIME 
timers. Absolute timers are influenced by clock_set, relative timers are 
not. This also applies to nanosleep. The specification of nanosleep 
states on the other hand that the sleeping time must not be less than 
the given timeout measured by CLOCK_REALTIME. Setting the clock while a 
nanosleep is scheduled leads to an interesting situation:

 Process 1           Process2
 t1=get_timeofday()
 nanosleep(20s)
                     set_timeofday(relative -20s)
 t2=get_timeofday()
 t2 - t1 =~ 0s

Implementing high resolution timers on top of a the current system also 
requires a lot of quirks to keep the timer API usable for both high 
resolution and tick based timers.

Such kinds of 'interaction artifacts' between the tick based data 
structures and algorithms and the high-resolution data structures, even 
if looked at without knowing anything about the time(r) subsystem, 
already point in the direction of separating 'high resolution time(r)' 
and 'low resolution timeout' APIs and subsystems.

As mentioned earlier, the switch to 1ms ticks during 2.5 development 
series turned out to cause certain regressions and was recently 
corrected to 4ms. One common type of regression was 'timer 
soft-interrupt overrun', i.e. when processing related to a timer tick 
did not finish within one jiffy, causing a domino effect on the timing 
quality of the system.

What are the reasons? During the work on integrating high resolution 
timers into the -rt tree we observed a lot of details related to this 
problem. When changing the period of the timer tick (changing HZ) the 
size of the primary timer wheel in the core code remains unchanged. This 
results in the fact that the primary timer wheel [into which wheel the 
secondary wheels are 'cascaded' periodically] becomes capable of 
handling a smaller time span than before. The CONFIG_BASE_SMALL option 
makes it even worse. Here is a table of the capacity limits of the 
primary wheel:

 HZ     CONFIG_BASE_SMALL=n     CONFIG_BASE_SMALL=y

 100    2560 ms                 640 ms
 250    1024 ms                 256 ms
 1000    256 ms                  64 ms

So one source of regression is the increased necessity to move 
non-expired timers from the outer wheels to the primary wheel.

This alone does not explain all the regressions yet though. We did some 
instrumentation and statistics on the timer code related to common use 
cases, where the regressions showed up - machines with high networking 
and/or disk I/O load.

This revealed a reasonable explanation for this behaviour. Both 
networking and disk I/O arm a lot of timeout timers (the maximum number 
of armed timers during the tests observed was ~400000). The majority of 
those timeouts are in the range of 0.5 seconds. As frequently seen with 
timeout timers, most of those timers never expire, but under high load 
they get easily into a time line where they have to be moved from the 
outer wheel to the primary timer wheel, when HZ=1000. Have a look at the 
timer cascading code [cascade() in kernel/timer.c] to see the penalty...

Another source of regression is the fact that quite a lot of timer 
functions execute long lasting codepaths. E.g. in the networking code 
rt_secret_rebuild() does a loop over rt_hash_mask (1024 in my case), 
over entries and over some subsequent variable sized loops inside each 
step. On a 300MHZ PPC system this accumulated to a worst case total of
>5ms (!). The networking code contains more of those loops in timer
functions and the worst case szenario is when all of those loops happen 
in the same jiffy and block the timely delivery of other timers. There 
are other culprits, but those in the networking code are the most 
obvious ones. This went almost unnoticed on HZ=100 systems, but on 
HZ=1000 based machines those effects surfaced. The change to HZ=250 is 
just hiding the problem rather than solving it.

Another weird effect of the changes to the time tick period is the fact 
that a lot of places in the code are using HZ incorrectly. Even today, 
more than a year after the switchover. Many of those usages still assume 
HZ=100 or even have a completely wrong understanding of the mechanism 
provided by the Linux kernel timer API (which is unrelated to the HZ 
changes of course).

These observations together finally led to the complete seperation of 
the high resolution timer data structures from the jiffy wheel, in the 
HRT-RT integration work. (to further reduce latencies we also separated 
softirq threads, but that is another topic.)


What is solved by the available patches ?
-----------------------------------------

As said before each of the patches addresses a particular part of the 
time(r) related problems. Some of them are competing implementations.

We don't want to put down the efforts of the particular projects, but one 
outstanding patch is John Stultz's work on the new time of day 
subsystem. It really addresses one of the substantial linux time(r) 
problems in a very generic and architecture-independent way, upon which 
the other efforts can build cleanly.

The other patches mostly relate to tickless systems and high resolution 
timers, and are providing great proof of concept implementations but 
suffer from the bindings to particular architectures and the 
restrictions that the current upstream Linux timer core code imposes 
upon them.


What changes are required?
--------------------------

The conclusions of my recent work on Linux time(rs) related problems and 
the analysis of related patches are:

1. The HZ/jiffy based usage of time in the kernel code has to be
   converted to human time units.

2. A clean seperation of all related APIs and subsystems is necessary 
   even if they have interdependencies and shared functionality


| HZ/jiffy conversion

The conversion of users of HZ/jiffy based timing to human time units is 
necessary to allow changes to the core timer subsystem without breaking 
the users all over the kernel. Looking at the code most HZ/jiffie timers 
are using more or less correct conversions from human time units anyway.  
A positive side effect of such a cleanup is the necessary auditing for 
correctness.


| API seperation

- time sources
- time synchronization
- time of day API
- timers API
- timeout API

- time sources:

The number and the resolution of available time sources varies a lot 
over architectures and particular architecture specific platform 
implementations. Some of them are only run time detectable. NEWTOD 
provides a excellent code base for time source abstraction, but a couple 
of details have to be discussed:

  - resolution selection
  - resolution and architecture dependend interface
  - support for tick bound and tick less systems including a clean
    integration into the interrupt handling code.
  - usability of timesources for differrent purposes (timekeeping,
    high/low res event scheduling)
  - 32- vs. 64-bit arithmetic
  
- time synchronisation:

Time syncronization corrects the frequency skew of the time source. It 
must provide a plugable interface for time synchronisation mechanisms to 
allow the flexible implementation of time synchronization sources:

  - None
  - NTP
  - GPS
  - RTC
  - ...

- time of day API:

The time of day API makes use of the eventually frequency corrected time 
sources to implement the "human readable" interface. It is also 
responsible for the translation of the monotonic time source - time 
since system (re)booted - to the wall clock - real time - time. (Real 
time in this context must not be confused with "real time" in the sense 
of determinism.)


- timer API:

The timer API provides finegrained precision timers with relation to the 
time of day subsystem. It provides the functionality of:

- precise interval scheduling
- precise timeouts

with or without high resolution timing support depending on system 
configuration and system capabilities.


- timeout API:

The timeout API provides a coarse resolution interface for timeout 
purposes. As pointed out before the majority of timers are related to 
timeouts. What's the nature of timeouts ?

  - Timeout timers are usually armed to cover an error condition

  - Most of those timers never expire (the non timer related good
    condition arrives before expiry)

  - The demands on resolution are usually quite low. It does not make
    any difference if an error condition is detected a few or even a
    few hundred milliseconds earlier or later. The relevant point is
    that the error is detected at all.

On a heavy loaded web server ~95%+ of all timers (almost all of them 
armed by network or I/O kernel code) are removed before expiry. The 
remaining timers which really expire are mostly timers requested from 
application code. The major usage there is some periodic supervisor 
code, which checks program status or other application relevant
information, and delays.


Conclusion
----------

Before inclusion of extensions to the current timer implementation e.g. 
dynamic ticks, a API seperation and cleanup has to be done. Integrating 
new functionality on top of the current code will just introduce a lot 
of quirks and oddities which make the necessary cleanup and rework 
harder.

John Stultz timeofday patches provide an excellent and solid base to 
solve the first 3 of 5 points of the API seperation changes.

Ktimers add the timer API seperation with a clean way to integrate high 
resolution time keeping.

The combination of both patches provides the grounds and leads the way 
to the cleanup of the timeout API and the implementation of 
dyntick/tickless support without introducing additional ugliness.


What is solved by ktimers ?
===========================

ktimers seperate the "timer API" from the "timeout API". ktimers are 
used for:

- nanosleep
- posixtimers
- itimers

The implementation was done with following constraints in mind:

- Not bound to jiffies
- Multiple time sources
- Per CPU timer queues
- Simplification of absolute CLOCK_REALTIME posix timers
- High resolution timer aware
- Allows the timeout API to reschedule the next event 
  (for tickless systems)

Ktimers enqueue the timers into a time sorted list, which is implemented 
with a rbtree, which is effiecient and already used in other performance 
critical parts of the kernel. This is a bit slower than the timer wheel, 
but due to the fact that the vast majority of timers is actually 
expiring it has to be waged versus the cascading penalty.

The code supports multiple time sources. Currently implemented are 
CLOCK_REALTIME and CLOCK_MONOTONIC. They provide seperate timer queues 
and support functions.

The time ordered implementation and storage of the expiry time as the
time of the selected time source removes the hard work of
reprogramming all armed absolute CLOCK_REALTIME posix timers when the
clock was set

During the initial implementation phase the choice of a time storage 
format had to be done. Dispite the previous discussion about 64-bit time 
storage vs. timespec structures, the decision was made to use plain 
64-bit variables. The rationale behind this is:

1. Simple calculations (add, sub, compare), which are the ones used in
   the fastpaths are simpler and better to handle than struct
   timespec.  

2. The storage size is the same for 32-bit systems, but half the size
   on 64-bit machines

3. The resulting binary code size is smaller due to the simpler fast
   path operations. The comparison of the resulting binary code size
   of a function which resembles parts of the hotpaths in the
   enqueue and expiry code make this very clear. All compiled with
   gcc-3.4 -O2.

	        AMD64	 I386	     ARM	  PPC32	      M68K
   nsec_t_ops	 e2	 11c	      fc	  1ac	       ce
   timespec_ops	19c	 144	     1c0	  280	      156

   Smaller binary code usually executes faster.

4. The kludge introduced by timespec arithmetics is horrible to
   maintain. The simple and straight forward 64-bit calculation have
   much less of surprises and potential error sources hidden.

5. The areas where the 64-bit nsec_t value has to be converted to
   timespec / timeval are very restricted and can be optimized
   further. Except for one odd POSIX timer related case (cyclic timer
   with no signal delivery) the calculation is simple and straight
   forward. The most discussed code in John Stultz timeofday patch is
   the conversion in gettimeofday(). This can be easily solved by a
   low overhead storage in both formats. The POSIX timespec / timeval
   interface to userspace for the apparently often used gettimeofday
   syscall must not be used as an red herring to clutter the complete
   kernel timer and time keeping subsystems with this uneffective
   representation of time.

ktimers are available in a patch series for easier review:

- ktimer_base.patch 

  The basic implementation of ktimers. The timer queues are called
  inside the existing timer softirq. The time ordered queue
  implementation allows to remove all the abs_list functionality from
  posix-timers.c. Converted interfaces: itimers, nanosleep, posix
  timers. There is no change to the current kernel time keeping system
  required. The base patch utilizes existing interfaces. 

The following add on patches are not provided for ad hoc inclusion as
they contain third party patches. The reason for providing this series
is to demonstrate the future use of ktimers and the simple
extensibility for the impelemtation of high resolution timers.
Especially John Stultz timeofday patch is a complete seperate issue
and just used due to the ability to provide high resolution timers in
a simple and non intrusive way.

The full patch series is available from 
http://www.tglx.de/private/tglx/ktimers/patches-ktimer-tod-hrt.tar.bz2

- ktimer_hres.patch

  Generic extension to the base patch to support high resolution
  timers. The high resolution changes are the seperation of the
  softirq, the high resolution interrupt function and the timer
  reprogramming management.
  
- timeofday_b5 patch
  
  Integration of John Stultz timeofday patches to have a clean
  abstraction of time sources for a non intrusive implementation of
  high resolution timers. This patch will be replaced by the reworked
  version which is currently developed by John Stultz.

- timeofday_fixup patch

  Fixup clashing inlines 

- ktimer_tod.patch

  extend the timeofday API and switch ktimers to use the timeofday API

- hres_i386_support.patch

  Patch to support high resolution timers on i386 with local APIC
  timer used for high resolution events. Proof of concept with the
  restriction to local APIC as high resolution event source at the
  moment. The main point is to prove that high resolution timers do
  not require large and intrusive patches anymore due to the cleanup
  of the time system.

Test coverage: 

  The complete patch is tested with the posix timer tests, which all
  pass. It survives a couple of stress tests and shows no flaws when
  integrated into the -rt tree. Of course this is brand new code, but
  it is designed simple and robust from ground and got a thorough
  review by a couple of people.

Some notes to the patch size(s):

  - ktimer_base.patch:
    17 files changed, 1328 insertions(+), 859 deletions(-)
    code     (-)  642 (+) 1058
    comments (-)  217 (+)  270
    The added code is mostly the base functionality of the ktimers
    itself.  The most cleanups happen in posix-timers.c, where all the
    code related to the clock_was_set adjustment of absolute
    CLOCK_REALTIME timers is removed.  Converts itimers, nanosleep and
    posixtimers to ktimer users

  - ktimer_hres_patch  
    3 files changed, 330 insertions(+), 7 deletions(-)
    code     (-)    6 (+)  232
    comments (-)    1 (+)   98
    Add the generic infrastructure for high resolution timers. No non
    POSIX clocks introduced, all ktimer users are automatically converted

  - timeofday_b5.patch
    73 files changed, 2926 insertions(+), 2675 deletions(-)
    code     (-) 2100 (+) 2146
    comments (-)  575 (+)  780
    A balanced patch providing a great improvement of functionality and 
    abstraction.

  - hres_i386_support.patch
    13 files changed, 635 insertions(+), 10 deletions(-)
    code     (-)    9 (+)  386
    comments (-)    1 (+)  249
    The largest addon is a header file containing scaled math operations, which
    needs to be cleaned up.

  - Total patch size  
    97 files changed, 5238 insertions(+), 3544 deletions(-)
    code     (-) 2751 (+) 3829
    comments (-)  793 (+) 1409

  Comparision numbers:
  - hrt-common.patch
    13 files changed, 1464 insertions(+), 108 deletions(-)
    code     (-)   91 (+)  879
    comments (-)   17 (+)  585
    Most code is added to the 
    Only posixtimers are supported. Add seperate non POSIX clocks 
    (CLOCK_REALTIME_HR and CLOCK_MONOTONIC_HR)
    
  - i386-hrt.patch (reduced to apic code)
    13 files changed, 1371 insertions(+), 63 deletions(-)
    code     (-)   51 (+)  910
    comments (-)   12 (+)  461
    
  - combined   
    26 files changed, 2835 insertions(+), 171 deletions(-)
    code     (-)  142 (+) 1789
    comments (-)   29 (+) 1046

  Summary: 

  The ktimer/timeofday/hrt combination adds ~1050 lines of source and
  provides a clean API seperation and a lot of code/functionality
  cleanup.

  The high resolution timer patches add ~1750 lines of code for high
  resolution time keeping without further functional improvements or
  API cleanups.



^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH] ktimers subsystem
  2005-09-19 16:48 [ANNOUNCE] ktimers subsystem tglx
@ 2005-09-19 16:48 ` tglx
  2005-09-19 21:47 ` [ANNOUNCE] " Thomas Gleixner
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 50+ messages in thread
From: tglx @ 2005-09-19 16:48 UTC (permalink / raw)
  To: linux-kernel; +Cc: mingo, akpm, george, johnstul, paulmck

ktimers seperate the "timer API" from the "timeout API". 
ktimers are used for:
- nanosleep
- posixtimers
- itimers

The patch contains the base implementation of ktimers and the
conversion of nanosleep, posixtimers and itimers to ktimer users. 

The patch does not require other changes to the Linux time(r) core
system.

The implementation was done with following constraints in mind:

- Not bound to jiffies
- Multiple time sources
- Per CPU timer queues
- Simplification of absolute CLOCK_REALTIME posix timers
- High resolution timer aware
- Allows the timeout API to reschedule the next event 
  (for tickless systems)

Ktimers enqueue the timers into a time sorted list, which is implemented 
with a rbtree, which is effiecient and already used in other performance 
critical parts of the kernel. This is a bit slower than the timer wheel, 
but due to the fact that the vast majority of timers is actually 
expiring it has to be waged versus the cascading penalty.

The code supports multiple time sources. Currently implemented are 
CLOCK_REALTIME and CLOCK_MONOTONIC. They provide seperate timer queues 
and support functions.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

---

 fs/exec.c                    |    9 
 fs/proc/array.c              |    6 
 include/asm-generic/div64.h  |   18 
 include/linux/ktimer.h       |  143 +++++++
 include/linux/posix-timers.h |   87 ++--
 include/linux/sched.h        |    4 
 include/linux/time.h         |   65 ++-
 include/linux/timer.h        |    2 
 init/main.c                  |    1 
 kernel/Makefile              |    3 
 kernel/exit.c                |    2 
 kernel/fork.c                |    5 
 kernel/itimer.c              |   83 +---
 kernel/ktimers.c             |  845 +++++++++++++++++++++++++++++++++++++++++++
 kernel/posix-cpu-timers.c    |   23 -
 kernel/posix-timers.c        |  832 +++++++-----------------------------------
 kernel/timer.c               |   59 ---
 17 files changed, 1328 insertions(+), 859 deletions(-)

Index: linux-2.6.13.ktimers/fs/exec.c
===================================================================
--- linux-2.6.13.ktimers.orig/fs/exec.c
+++ linux-2.6.13.ktimers/fs/exec.c
@@ -650,9 +650,10 @@ static inline int de_thread(struct task_
 		 * synchronize with any firing (by calling del_timer_sync)
 		 * before we can safely let the old group leader die.
 		 */
-		sig->real_timer.data = (unsigned long)current;
-		if (del_timer_sync(&sig->real_timer))
-			add_timer(&sig->real_timer);
+		sig->real_timer.data = current;
+		if (stop_ktimer(&sig->real_timer))
+			start_ktimer(&sig->real_timer, NULL,
+				     KTIMER_RESTART|KTIMER_NOCHECK);
 	}
 	while (atomic_read(&sig->count) > count) {
 		sig->group_exit_task = current;
@@ -664,7 +665,7 @@ static inline int de_thread(struct task_
 	}
 	sig->group_exit_task = NULL;
 	sig->notify_count = 0;
-	sig->real_timer.data = (unsigned long)current;
+	sig->real_timer.data = current;
 	spin_unlock_irq(lock);
 
 	/*
Index: linux-2.6.13.ktimers/fs/proc/array.c
===================================================================
--- linux-2.6.13.ktimers.orig/fs/proc/array.c
+++ linux-2.6.13.ktimers/fs/proc/array.c
@@ -324,7 +324,7 @@ static int do_task_stat(struct task_stru
 	unsigned long  min_flt = 0,  maj_flt = 0;
 	cputime_t cutime, cstime, utime, stime;
 	unsigned long rsslim = 0;
-	unsigned long it_real_value = 0;
+	nsec_t it_real_value = 0;
 	struct task_struct *t;
 	char tcomm[sizeof(task->comm)];
 
@@ -380,7 +380,7 @@ static int do_task_stat(struct task_stru
 			utime = cputime_add(utime, task->signal->utime);
 			stime = cputime_add(stime, task->signal->stime);
 		}
-		it_real_value = task->signal->it_real_value;
+		it_real_value = task->signal->real_timer.expires;
 	}
 	ppid = pid_alive(task) ? task->group_leader->real_parent->tgid : 0;
 	read_unlock(&tasklist_lock);
@@ -429,7 +429,7 @@ static int do_task_stat(struct task_stru
 		priority,
 		nice,
 		num_threads,
-		jiffies_to_clock_t(it_real_value),
+		(clock_t) nsec_to_clock_t(it_real_value),
 		start_time,
 		vsize,
 		mm ? get_mm_counter(mm, rss) : 0, /* you might want to shift this left 3 */
Index: linux-2.6.13.ktimers/include/linux/ktimer.h
===================================================================
--- /dev/null
+++ linux-2.6.13.ktimers/include/linux/ktimer.h
@@ -0,0 +1,143 @@
+#ifndef _LINUX_KTIMER_H
+#define _LINUX_KTIMER_H
+
+#include <linux/init.h>
+#include <linux/list.h>
+#include <linux/rbtree.h>
+#include <linux/time.h>
+#include <linux/wait.h>
+
+/* Timer API */
+
+struct ktimer_base;
+
+/*
+ * Timer structure must be initialized by init_ktimer_xxx !
+ */
+struct ktimer {
+	struct rb_node		node;
+	struct list_head	list;
+	nsec_t			expires;
+	nsec_t		 	expired;
+	nsec_t		 	interval;
+	int 	 	 	overrun;
+	unsigned long		status;
+	void 			(*function)(void *);
+	void			*data;
+	struct ktimer_base 	*base;
+};
+
+/*
+ * Timer base struct
+ */
+struct ktimer_base {
+	struct ktimer_base	*base;
+	char			*name;
+	spinlock_t		lock;
+	struct rb_root		active;
+	struct list_head	pending;
+	int			count;
+	unsigned long		resolution;
+	nsec_t			(*get_time)(void);
+	struct ktimer		*running_timer;
+	wait_queue_head_t	wait_for_running_timer;
+};
+
+/*
+ * Values for the mode argument of xxx_ktimer functions
+ */
+enum
+{
+	KTIMER_NOREARM,	/* Internal value */
+	KTIMER_ABS,	/* Time value is absolute */
+	KTIMER_REL,	/* Time value is relativ to now */
+	KTIMER_INCR,	/* Time value is relativ to previous expiry time */
+	KTIMER_FORWARD,	/* Timer is rearmed with value. Overruns are accounted */
+	KTIMER_REARM,	/* Timer is rearmed with interval. Overruns are accounted */
+	KTIMER_RESTART	/* Timer is restarted with the stored expiry value */
+};
+
+/* Expiry must not be checked when the timer is started */
+#define KTIMER_NOCHECK		0x10000
+
+#define MAX_KTIMER_BASES	4
+#define KTIMER_POISON		((void *) 0x00100101)
+#define KTIMERS_MAX_NSEC	(~(1LL<<63))
+
+#define ktimer_before(t1, t2) (t1->expires < t2->expires)
+#define ktimer_active(t) ((t)->status != KTIMER_INACTIVE)
+
+enum
+{
+	KTIMER_INACTIVE,
+	KTIMER_PENDING,
+	KTIMER_EXPIRED,
+	KTIMER_EXPIRED_NOQUEUE,
+};
+
+/* Exported functions */
+extern void fastcall init_ktimer_real(struct ktimer *timer);
+extern void fastcall init_ktimer_mono(struct ktimer *timer);
+extern int modify_ktimer(struct ktimer *timer, nsec_t *tim, int mode);
+extern int start_ktimer(struct ktimer *timer, nsec_t *tim, int mode);
+extern int try_to_stop_ktimer(struct ktimer *timer);
+extern int stop_ktimer(struct ktimer *timer);
+extern nsec_t get_remtime_ktimer(struct ktimer *timer, long fake);
+extern nsec_t get_expiry_ktimer(struct ktimer *timer, nsec_t *now);
+extern void __init init_ktimers(void);
+
+/* Conversion functions with rounding based on resolution */
+extern nsec_t ktimer_convert_timeval(struct ktimer *timer, struct timeval *tv);
+extern nsec_t ktimer_convert_timespec(struct ktimer *timer, struct timespec *ts);
+
+/* Posix timers current quirks */
+extern int get_ktimer_mono_res(clockid_t which_clock, struct timespec *tp);
+extern int get_ktimer_real_res(clockid_t which_clock, struct timespec *tp);
+
+/* nanosleep functions */
+long ktimer_nanosleep_mono(struct timespec *rqtp, struct timespec __user *rmtp, int mode);
+long ktimer_nanosleep_real(struct timespec *rqtp, struct timespec __user *rmtp, int mode);
+
+#if defined(CONFIG_SMP)
+extern void wait_for_ktimer(struct ktimer *timer);
+#else
+#define wait_for_ktimer(t) do {} while (0)
+#endif
+
+#define KTIME_REALTIME_RES (NSEC_PER_SEC/HZ)
+#define KTIME_MONOTONIC_RES (NSEC_PER_SEC/HZ)
+
+static inline void get_ktime_mono_ts(struct timespec *ts)
+{
+	unsigned long seq;
+	struct timespec tomono;
+	do {
+		seq = read_seqbegin(&xtime_lock);
+		getnstimeofday(ts);
+		tomono = wall_to_monotonic;
+	} while (read_seqretry(&xtime_lock, seq));
+
+	set_normalized_timespec(ts, ts->tv_sec + tomono.tv_sec,
+				ts->tv_nsec + tomono.tv_nsec);
+}
+
+static inline nsec_t do_get_ktime_mono(void)
+{
+	struct timespec tm;
+	get_ktime_mono_ts(&tm);
+	return timespec_to_ns(&tm);
+}
+
+#define get_ktime_real_ts(ts) getnstimeofday(ts)
+static inline nsec_t do_get_ktime_real(void)
+{
+	struct timeval now;
+
+	do_gettimeofday(&now);
+	return timeval_to_ns(&now);
+}
+
+#define clock_was_set() do { } while (0)
+extern void run_ktimer_queues(void);
+
+#endif
Index: linux-2.6.13.ktimers/include/linux/posix-timers.h
===================================================================
--- linux-2.6.13.ktimers.orig/include/linux/posix-timers.h
+++ linux-2.6.13.ktimers/include/linux/posix-timers.h
@@ -51,10 +51,9 @@ struct k_itimer {
 	struct sigqueue *sigq;		/* signal queue entry. */
 	union {
 		struct {
-			struct timer_list timer;
-			struct list_head abs_timer_entry; /* clock abs_timer_list */
-			struct timespec wall_to_prev;   /* wall_to_monotonic used when set */
-			unsigned long incr; /* interval in jiffies */
+			struct ktimer timer;
+			nsec_t incr;
+			int overrun;
 		} real;
 		struct cpu_timer_list cpu;
 		struct {
@@ -66,10 +65,6 @@ struct k_itimer {
 	} it;
 };
 
-struct k_clock_abs {
-	struct list_head list;
-	spinlock_t lock;
-};
 struct k_clock {
 	int res;		/* in nano seconds */
 	int (*clock_getres) (clockid_t which_clock, struct timespec *tp);
@@ -77,7 +72,7 @@ struct k_clock {
 	int (*clock_set) (clockid_t which_clock, struct timespec * tp);
 	int (*clock_get) (clockid_t which_clock, struct timespec * tp);
 	int (*timer_create) (struct k_itimer *timer);
-	int (*nsleep) (clockid_t which_clock, int flags, struct timespec *);
+	int (*nsleep) (clockid_t which_clock, int flags, struct timespec *, struct timespec *);
 	int (*timer_set) (struct k_itimer * timr, int flags,
 			  struct itimerspec * new_setting,
 			  struct itimerspec * old_setting);
@@ -91,37 +86,71 @@ void register_posix_clock(clockid_t cloc
 
 /* Error handlers for timer_create, nanosleep and settime */
 int do_posix_clock_notimer_create(struct k_itimer *timer);
-int do_posix_clock_nonanosleep(clockid_t, int flags, struct timespec *);
+int do_posix_clock_nonanosleep(clockid_t, int flags, struct timespec *, struct timespec __user *);
 int do_posix_clock_nosettime(clockid_t, struct timespec *tp);
 
 /* function to call to trigger timer event */
 int posix_timer_event(struct k_itimer *timr, int si_private);
 
-struct now_struct {
-	unsigned long jiffies;
-};
+#if (BITS_PER_LONG < 64)
+static inline nsec_t forward_posix_timer(struct k_itimer *t, nsec_t now)
+{
+	nsec_t delta = now - t->it.real.timer.expires;
+	unsigned long long orun = 1;
+
+	if (delta < 0)
+		return -delta;
+
+	if (unlikely(delta >= t->it.real.incr)) {
+		if ((t->it.real.incr >> 32) == 0) {
+			do_div(delta, (unsigned long) t->it.real.incr);
+			orun += delta;
+		} else {
+			int sft = 0;
+			u64 t1 = t->it.real.incr;
+			u64 t2 = delta;
+
+			while(t1 >> 32) {
+				sft++;
+				t2 >>= 1;
+			}
+			t2 >>= sft;
+			do_div(t2, (unsigned long) t1);
+			t2 *= t->it.real.incr
+;
+			if (t2 <= delta)
+				orun++;
+		}
+	}
+	t->it_overrun += (long) orun;
+	t->it.real.timer.expires += orun * t->it.real.incr;
+	return t->it.real.timer.expires - now;
+}
+#else
+static inline nsec_t forward_posix_timer(struct k_itimer *t, nsec_t now)
+{
+	nsec_t delta = now - t->it.real.timer.expires;
+	unsigned long orun = 1;
+
+	if (delta < 0)
+		return -delta;
+
+	if (unlikely(delta >= t->it.real.incr))
+		orun += delta / t->it.real.incr;
+	t->it.real.timer.expires += orun * t->it.real.incr;
+	t->it_overrun += orun;
+	return t->it.real.timer.expires - now;
+}
+#endif
+
 
-#define posix_get_now(now) (now)->jiffies = jiffies;
-#define posix_time_before(timer, now) \
-                      time_before((timer)->expires, (now)->jiffies)
-
-#define posix_bump_timer(timr, now)					\
-         do {								\
-              long delta, orun;						\
-	      delta = now.jiffies - (timr)->it.real.timer.expires;	\
-              if (delta >= 0) {						\
-	           orun = 1 + (delta / (timr)->it.real.incr);		\
-	          (timr)->it.real.timer.expires +=			\
-			 orun * (timr)->it.real.incr;			\
-                  (timr)->it_overrun += orun;				\
-              }								\
-            }while (0)
 
 int posix_cpu_clock_getres(clockid_t which_clock, struct timespec *);
 int posix_cpu_clock_get(clockid_t which_clock, struct timespec *);
 int posix_cpu_clock_set(clockid_t which_clock, const struct timespec *tp);
 int posix_cpu_timer_create(struct k_itimer *);
-int posix_cpu_nsleep(clockid_t, int, struct timespec *);
+int posix_cpu_nsleep(clockid_t, int, struct timespec *,
+		     struct timespec __user *);
 int posix_cpu_timer_set(struct k_itimer *, int,
 			struct itimerspec *, struct itimerspec *);
 int posix_cpu_timer_del(struct k_itimer *);
Index: linux-2.6.13.ktimers/include/linux/sched.h
===================================================================
--- linux-2.6.13.ktimers.orig/include/linux/sched.h
+++ linux-2.6.13.ktimers/include/linux/sched.h
@@ -102,6 +102,7 @@ extern unsigned long nr_iowait(void);
 #include <linux/param.h>
 #include <linux/resource.h>
 #include <linux/timer.h>
+#include <linux/ktimer.h>
 
 #include <asm/processor.h>
 
@@ -313,8 +314,7 @@ struct signal_struct {
 	struct list_head posix_timers;
 
 	/* ITIMER_REAL timer for the process */
-	struct timer_list real_timer;
-	unsigned long it_real_value, it_real_incr;
+	struct ktimer real_timer;
 
 	/* ITIMER_PROF and ITIMER_VIRTUAL timers for the process */
 	cputime_t it_prof_expires, it_virt_expires;
Index: linux-2.6.13.ktimers/include/linux/time.h
===================================================================
--- linux-2.6.13.ktimers.orig/include/linux/time.h
+++ linux-2.6.13.ktimers/include/linux/time.h
@@ -5,6 +5,7 @@
 
 #ifdef __KERNEL__
 #include <linux/seqlock.h>
+#include <asm/div64.h>
 #endif
 
 #ifndef _STRUCT_TIMESPEC
@@ -45,6 +46,11 @@ static __inline__ int timespec_equal(str
 	return (a->tv_sec == b->tv_sec) && (a->tv_nsec == b->tv_nsec);
 } 
 
+#define timespec_valid(ts) \
+(((ts)->tv_sec >= 0) && (((unsigned) (ts)->tv_nsec) < NSEC_PER_SEC))
+
+typedef s64 nsec_t;
+
 /* Converts Gregorian date to seconds since 1970-01-01 00:00:00.
  * Assumes input in normal date format, i.e. 1980-12-31 23:59:59
  * => year=1980, mon=12, day=31, hour=23, min=59, sec=59.
@@ -95,8 +101,7 @@ struct timespec current_kernel_time(void
 extern void do_gettimeofday(struct timeval *tv);
 extern int do_settimeofday(struct timespec *tv);
 extern int do_sys_settimeofday(struct timespec *tv, struct timezone *tz);
-extern void clock_was_set(void); // call when ever the clock is set
-extern int do_posix_clock_monotonic_gettime(struct timespec *tp);
+extern void do_posix_clock_monotonic_gettime(struct timespec *ts);
 extern long do_nanosleep(struct timespec *t);
 extern long do_utimes(char __user * filename, struct timeval * times);
 struct itimerval;
@@ -121,6 +126,49 @@ set_normalized_timespec (struct timespec
 	ts->tv_nsec = nsec;
 }
 
+static inline void div_sign_safe_ns(struct timespec *ts, nsec_t ns)
+{
+	if (unlikely(ns < 0)) {
+		ts->tv_sec = div_long_long_rem(-ns, NSEC_PER_SEC, &ts->tv_nsec);
+		set_normalized_timespec(ts, -ts->tv_sec, -ts->tv_nsec);
+	} else
+		ts->tv_sec = div_long_long_rem(ns, NSEC_PER_SEC, &ts->tv_nsec);
+}
+
+static __inline__ nsec_t timespec_to_ns(struct timespec *s)
+{
+	nsec_t res = (nsec_t) s->tv_sec * NSEC_PER_SEC;
+	return res + (nsec_t) s->tv_nsec;
+}
+
+static __inline__ struct timespec ns_to_timespec(nsec_t n)
+{
+	struct timespec ts;
+
+	if (n)
+		div_sign_safe_ns(&ts, n);
+	else
+		ts.tv_sec = ts.tv_nsec = 0;
+	return ts;
+}
+
+static __inline__ nsec_t timeval_to_ns(struct timeval *s)
+{
+	nsec_t res = (nsec_t) s->tv_sec * NSEC_PER_SEC;
+	return res + (nsec_t) s->tv_usec * NSEC_PER_USEC;
+}
+
+static __inline__ struct timeval ns_to_timeval(nsec_t n)
+{
+	struct timeval tv;
+	if (n) {
+		div_sign_safe_ns((struct timespec *)&tv, n);
+		tv.tv_usec /= 1000;
+	} else
+		tv.tv_sec = tv.tv_usec = 0;
+	return tv;
+}
+
 #endif /* __KERNEL__ */
 
 #define NFDBITS			__NFDBITS
@@ -153,23 +201,18 @@ struct	itimerval {
 /*
  * The IDs of the various system clocks (for POSIX.1b interval timers).
  */
-#define CLOCK_REALTIME		  0
-#define CLOCK_MONOTONIC	  1
+#define CLOCK_REALTIME		 0
+#define CLOCK_MONOTONIC	  	 1
 #define CLOCK_PROCESS_CPUTIME_ID 2
 #define CLOCK_THREAD_CPUTIME_ID	 3
-#define CLOCK_REALTIME_HR	 4
-#define CLOCK_MONOTONIC_HR	  5
 
 /*
  * The IDs of various hardware clocks
  */
-
-
 #define CLOCK_SGI_CYCLE 10
 #define MAX_CLOCKS 16
-#define CLOCKS_MASK  (CLOCK_REALTIME | CLOCK_MONOTONIC | \
-                     CLOCK_REALTIME_HR | CLOCK_MONOTONIC_HR)
-#define CLOCKS_MONO (CLOCK_MONOTONIC & CLOCK_MONOTONIC_HR)
+#define CLOCKS_MASK  (CLOCK_REALTIME | CLOCK_MONOTONIC)
+#define CLOCKS_MONO (CLOCK_MONOTONIC)
 
 /*
  * The various flags for setting POSIX.1b interval timers.
Index: linux-2.6.13.ktimers/include/linux/timer.h
===================================================================
--- linux-2.6.13.ktimers.orig/include/linux/timer.h
+++ linux-2.6.13.ktimers/include/linux/timer.h
@@ -87,6 +87,6 @@ static inline void add_timer(struct time
 
 extern void init_timers(void);
 extern void run_local_timers(void);
-extern void it_real_fn(unsigned long);
+extern void it_real_fn(void *);
 
 #endif
Index: linux-2.6.13.ktimers/init/main.c
===================================================================
--- linux-2.6.13.ktimers.orig/init/main.c
+++ linux-2.6.13.ktimers/init/main.c
@@ -472,6 +472,7 @@ asmlinkage void __init start_kernel(void
 	init_IRQ();
 	pidhash_init();
 	init_timers();
+	init_ktimers();
 	softirq_init();
 	time_init();
 
Index: linux-2.6.13.ktimers/kernel/Makefile
===================================================================
--- linux-2.6.13.ktimers.orig/kernel/Makefile
+++ linux-2.6.13.ktimers/kernel/Makefile
@@ -7,7 +7,8 @@ obj-y     = sched.o fork.o exec_domain.o
 	    sysctl.o capability.o ptrace.o timer.o user.o \
 	    signal.o sys.o kmod.o workqueue.o pid.o \
 	    rcupdate.o intermodule.o extable.o params.o posix-timers.o \
-	    kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o
+	    kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o \
+	    ktimers.o
 
 obj-$(CONFIG_FUTEX) += futex.o
 obj-$(CONFIG_GENERIC_ISA_DMA) += dma.o
Index: linux-2.6.13.ktimers/kernel/exit.c
===================================================================
--- linux-2.6.13.ktimers.orig/kernel/exit.c
+++ linux-2.6.13.ktimers/kernel/exit.c
@@ -830,7 +830,7 @@ fastcall NORET_TYPE void do_exit(long co
 	update_mem_hiwater(tsk);
 	group_dead = atomic_dec_and_test(&tsk->signal->live);
 	if (group_dead) {
- 		del_timer_sync(&tsk->signal->real_timer);
+ 		stop_ktimer(&tsk->signal->real_timer);
 		acct_process(code);
 	}
 	exit_mm(tsk);
Index: linux-2.6.13.ktimers/kernel/fork.c
===================================================================
--- linux-2.6.13.ktimers.orig/kernel/fork.c
+++ linux-2.6.13.ktimers/kernel/fork.c
@@ -774,10 +774,9 @@ static inline int copy_signal(unsigned l
 	init_sigpending(&sig->shared_pending);
 	INIT_LIST_HEAD(&sig->posix_timers);
 
-	sig->it_real_value = sig->it_real_incr = 0;
+	init_ktimer_mono(&sig->real_timer);
 	sig->real_timer.function = it_real_fn;
-	sig->real_timer.data = (unsigned long) tsk;
-	init_timer(&sig->real_timer);
+	sig->real_timer.data = tsk;
 
 	sig->it_virt_expires = cputime_zero;
 	sig->it_virt_incr = cputime_zero;
Index: linux-2.6.13.ktimers/kernel/itimer.c
===================================================================
--- linux-2.6.13.ktimers.orig/kernel/itimer.c
+++ linux-2.6.13.ktimers/kernel/itimer.c
@@ -12,36 +12,22 @@
 #include <linux/syscalls.h>
 #include <linux/time.h>
 #include <linux/posix-timers.h>
+#include <linux/ktimer.h>
 
 #include <asm/uaccess.h>
 
-static unsigned long it_real_value(struct signal_struct *sig)
-{
-	unsigned long val = 0;
-	if (timer_pending(&sig->real_timer)) {
-		val = sig->real_timer.expires - jiffies;
-
-		/* look out for negative/zero itimer.. */
-		if ((long) val <= 0)
-			val = 1;
-	}
-	return val;
-}
-
 int do_getitimer(int which, struct itimerval *value)
 {
 	struct task_struct *tsk = current;
-	unsigned long interval, val;
+	nsec_t interval, val;
 	cputime_t cinterval, cval;
 
 	switch (which) {
 	case ITIMER_REAL:
-		spin_lock_irq(&tsk->sighand->siglock);
-		interval = tsk->signal->it_real_incr;
-		val = it_real_value(tsk->signal);
-		spin_unlock_irq(&tsk->sighand->siglock);
-		jiffies_to_timeval(val, &value->it_value);
-		jiffies_to_timeval(interval, &value->it_interval);
+		interval = tsk->signal->real_timer.interval;
+		val = get_remtime_ktimer(&tsk->signal->real_timer, NSEC_PER_USEC);
+		value->it_value = ns_to_timeval(val);
+		value->it_interval = ns_to_timeval(interval);
 		break;
 	case ITIMER_VIRTUAL:
 		read_lock(&tasklist_lock);
@@ -113,59 +99,34 @@ asmlinkage long sys_getitimer(int which,
 }
 
 
-void it_real_fn(unsigned long __data)
+/*
+ * The timer is automagically restarted, when interval != 0
+ */
+void it_real_fn(void *data)
 {
-	struct task_struct * p = (struct task_struct *) __data;
-	unsigned long inc = p->signal->it_real_incr;
-
-	send_group_sig_info(SIGALRM, SEND_SIG_PRIV, p);
-
-	/*
-	 * Now restart the timer if necessary.  We don't need any locking
-	 * here because do_setitimer makes sure we have finished running
-	 * before it touches anything.
-	 * Note, we KNOW we are (or should be) at a jiffie edge here so
-	 * we don't need the +1 stuff.  Also, we want to use the prior
-	 * expire value so as to not "slip" a jiffie if we are late.
-	 * Deal with requesting a time prior to "now" here rather than
-	 * in add_timer.
-	 */
-	if (!inc)
-		return;
-	while (time_before_eq(p->signal->real_timer.expires, jiffies))
-		p->signal->real_timer.expires += inc;
-	add_timer(&p->signal->real_timer);
+	send_group_sig_info(SIGALRM, SEND_SIG_PRIV, data);
 }
 
 int do_setitimer(int which, struct itimerval *value, struct itimerval *ovalue)
 {
 	struct task_struct *tsk = current;
- 	unsigned long val, interval, expires;
+	struct ktimer *timer;
+ 	nsec_t expires;
 	cputime_t cval, cinterval, nval, ninterval;
 
 	switch (which) {
 	case ITIMER_REAL:
-again:
-		spin_lock_irq(&tsk->sighand->siglock);
-		interval = tsk->signal->it_real_incr;
-		val = it_real_value(tsk->signal);
-		/* We are sharing ->siglock with it_real_fn() */
-		if (try_to_del_timer_sync(&tsk->signal->real_timer) < 0) {
-			spin_unlock_irq(&tsk->sighand->siglock);
-			goto again;
-		}
-		tsk->signal->it_real_incr =
-			timeval_to_jiffies(&value->it_interval);
-		expires = timeval_to_jiffies(&value->it_value);
-		if (expires)
-			mod_timer(&tsk->signal->real_timer,
-				  jiffies + 1 + expires);
-		spin_unlock_irq(&tsk->sighand->siglock);
+		timer = &tsk->signal->real_timer;
+		stop_ktimer(timer);
 		if (ovalue) {
-			jiffies_to_timeval(val, &ovalue->it_value);
-			jiffies_to_timeval(interval,
-					   &ovalue->it_interval);
+			ovalue->it_value = ns_to_timeval(get_remtime_ktimer(timer, NSEC_PER_USEC));
+			ovalue->it_interval = ns_to_timeval(timer->interval);
 		}
+		timer->interval = ktimer_convert_timeval(timer, &value->it_interval);
+		expires = ktimer_convert_timeval(timer, &value->it_value);
+		if (expires)
+			modify_ktimer(timer, &expires, KTIMER_REL | KTIMER_NOCHECK);
+
 		break;
 	case ITIMER_VIRTUAL:
 		nval = timeval_to_cputime(&value->it_value);
Index: linux-2.6.13.ktimers/kernel/ktimers.c
===================================================================
--- /dev/null
+++ linux-2.6.13.ktimers/kernel/ktimers.c
@@ -0,0 +1,845 @@
+/*
+ *  linux/kernel/ktimers.c
+ *
+ *  Copyright(C) 2005 Thomas Gleixner <tglx@linutronix.de>
+ *
+ *  Kudos to Ingo Molnar for review, criticism, ideas
+ *
+ *  Credits:
+ *	Lot of ideas and implementation details taken from
+ *	timer.c and related code
+ *
+ *  Kernel timers
+ *
+ *  In contrast to the timeout related API found in kernel/timer.c,
+ *  ktimers provide finer resolution and accuracy depending on system
+ *  configuration and capabilities.
+ *
+ *  These timers are used for
+ *  - itimers
+ *  - posixtimers
+ *  - nanosleep
+ *  - precise in kernel timing
+ *
+ *  Please do not abuse this API for simple timeouts.
+ *
+ *  For licencing details see kernel-base/COPYING
+ *
+ */
+
+#include <linux/cpu.h>
+#include <linux/interrupt.h>
+#include <linux/module.h>
+#include <linux/notifier.h>
+#include <linux/percpu.h>
+#include <linux/ktimer.h>
+
+#include <asm/uaccess.h>
+
+/*
+ * The SMP/UP kludge goes here
+ */
+#if defined(CONFIG_SMP)
+
+#define set_running_timer(b,t) b->running_timer = t
+#define wake_up_timer_waiters(b) wake_up(&b->wait_for_running_timer)
+#define ktimer_base_can_change (1)
+/*
+ * Wait for a running timer
+ */
+void wait_for_ktimer(struct ktimer *timer)
+{
+	struct ktimer_base *base = timer->base;
+
+	if (base && base->running_timer == timer)
+		wait_event(base->wait_for_running_timer,
+			   base->running_timer != timer);
+}
+
+/*
+ * We are using hashed locking: holding per_cpu(tvec_bases).t_base.lock
+ * means that all timers which are tied to this base via timer->base are
+ * locked, and the base itself is locked too.
+ *
+ * So __run_timers/migrate_timers can safely modify all timers which could
+ * be found on the lists/queues.
+ *
+ * When the timer's base is locked, and the timer removed from list, it is
+ * possible to set timer->base = NULL and drop the lock: the timer remains
+ * locked.
+ */
+static inline struct ktimer_base *lock_ktimer_base(struct ktimer *timer,
+					    unsigned long *flags)
+{
+	struct ktimer_base *base;
+
+	for (;;) {
+		base = timer->base;
+		if (likely(base != NULL)) {
+			spin_lock_irqsave(&base->lock, *flags);
+			if (likely(base == timer->base))
+				return base;
+			/* The timer has migrated to another CPU */
+			spin_unlock_irqrestore(&base->lock, *flags);
+		}
+		cpu_relax();
+	}
+}
+
+static inline struct ktimer_base *switch_ktimer_base(struct ktimer *timer,
+						     struct ktimer_base *base)
+{
+	struct ktimer_base *new_base = &base->base[raw_smp_processor_id()];
+
+	if (base != new_base) {
+		/*
+		 * We are trying to schedule the timer on the local CPU.
+		 * However we can't change timer's base while it is running,
+		 * so we wait for the running timer.
+		 */
+		if (unlikely(base->running_timer == timer)) {
+			return NULL;
+		} else {
+			/* See the comment in lock_timer_base() */
+			timer->base = NULL;
+			spin_unlock(&base->lock);
+			spin_lock(&new_base->lock);
+			timer->base = new_base;
+		}
+	}
+	return new_base;
+}
+
+/*
+ * Get the timer base unlocked
+ *
+ * Take care of timer->base = NULL in switch_ktimer_base !
+ */
+static inline struct ktimer_base *get_ktimer_base_unlocked(struct ktimer *timer)
+{
+	struct ktimer_base *base;
+	while (!(base = timer->base));
+	return base;
+}
+#else
+
+#define set_running_timer(b,t) do {} while (0)
+#define wake_up_timer_waiters(b)  do {} while (0)
+
+static inline struct ktimer_base *lock_ktimer_base(struct ktimer *timer,
+					    unsigned long *flags)
+{
+	struct ktimer_base *base;
+
+	base = timer->base;
+	spin_lock_irqsave(&base->lock, *flags);
+	return base;
+}
+
+#define switch_ktimer_base(t, b) b
+
+#define get_ktimer_base_unlocked(t) (t)->base
+#define ktimer_base_can_change (0)
+
+#endif	/* !CONFIG_SMP */
+
+/*
+ * Convert timespec to nsec_t with resolution adjustment
+ *
+ * Note: We can access base without locking here, as ktimers can
+ * migrate between CPUs but can not be moved from one clock source to
+ * another. The clock source binding is set at init_ktimer_XXX.
+ */
+nsec_t ktimer_convert_timespec(struct ktimer *timer, struct timespec *ts)
+{
+	struct ktimer_base *base = get_ktimer_base_unlocked(timer);
+	nsec_t t;
+	long rem = ts->tv_nsec % base->resolution;
+
+	t = (nsec_t) ts->tv_sec * NSEC_PER_SEC;
+ 	t += (nsec_t) ts->tv_nsec;
+
+	/* Check, if the value has to be rounded */
+	if (rem)
+		t += (nsec_t) (base->resolution - rem);
+	return t;
+}
+
+/*
+ * Convert timeval to nsec_t with resolution adjustment
+ */
+nsec_t ktimer_convert_timeval(struct ktimer *timer, struct timeval *tv)
+{
+	struct timespec ts;
+
+	ts.tv_sec = tv->tv_sec;
+	ts.tv_nsec = tv->tv_usec * NSEC_PER_USEC;
+
+	return ktimer_convert_timespec(timer, &ts);
+}
+
+/*
+ * Internal function to add (re)start a timer
+ *
+ * The timer is inserted in expiry order.
+ * Insertion into the red black tree is O(log(n))
+ *
+ */
+static int enqueue_ktimer(struct ktimer *timer, struct ktimer_base *base,
+			   nsec_t *tim, int mode)
+{
+	struct rb_node **link = &base->active.rb_node;
+	struct rb_node *parent = NULL;
+	struct ktimer *entry;
+	struct list_head *prev = &base->pending;
+	nsec_t now;
+
+	/* Get current time */
+	now = base->get_time();
+
+	/* Timer expiry mode */
+	switch (mode & ~KTIMER_NOCHECK) {
+	case KTIMER_ABS:
+		timer->expires = *tim;
+		break;
+	case KTIMER_REL:
+		timer->expires = now + *tim;
+		break;
+	case KTIMER_INCR:
+		timer->expires += *tim;
+		break;
+	case KTIMER_FORWARD:
+		while (timer->expires < now) {
+			timer->expires += *tim;
+			timer->overrun++;
+		}
+		goto nocheck;
+	case KTIMER_REARM:
+		while (timer->expires < now) {
+			timer->expires += timer->interval;
+			timer->overrun++;
+		}
+		goto nocheck;
+	case KTIMER_RESTART:
+		break;
+	default:
+		BUG();
+	}
+
+	/* Already expired.*/
+	if (timer->expires <= now) {
+		timer->expired = now;
+		/* The caller takes care of expiry */
+		if (!(mode & KTIMER_NOCHECK))
+			return -1;
+	}
+ nocheck:
+
+	while (*link) {
+		parent = *link;
+		entry = rb_entry(parent, struct ktimer, node);
+		/*
+		 * We dont care about collisions. Nodes with
+		 * the same expiry time stay together.
+		 */
+		if (ktimer_before(timer, entry))
+			link = &(*link)->rb_left;
+		else {
+			link = &(*link)->rb_right;
+			prev = &entry->list;
+		}
+	}
+
+	rb_link_node(&timer->node, parent, link);
+	rb_insert_color(&timer->node, &base->active);
+	list_add(&timer->list, prev);
+	timer->status = KTIMER_PENDING;
+	base->count++;
+	return 0;
+}
+
+/*
+ * Internal helper to remove a timer
+ *
+ * The function allows automatic rearming for interval
+ * timers.
+ *
+ */
+static inline void do_remove_ktimer(struct ktimer *timer,
+				    struct ktimer_base *base, int rearm)
+{
+	list_del(&timer->list);
+	rb_erase(&timer->node, &base->active);
+	timer->node.rb_parent = KTIMER_POISON;
+	timer->status = KTIMER_INACTIVE;
+	base->count--;
+	BUG_ON(base->count < 0);
+	/* Auto rearm the timer ? */
+	if (rearm && timer->interval > 0)
+		enqueue_ktimer(timer, base, NULL, KTIMER_REARM);
+}
+
+/*
+ * Called with base lock held
+ */
+static inline int remove_ktimer(struct ktimer *timer, struct ktimer_base *base)
+{
+	if (ktimer_active(timer)) {
+		do_remove_ktimer(timer, base, KTIMER_NOREARM);
+		return 1;
+	}
+	return 0;
+}
+
+/*
+ * Internal function to (re)start a timer.
+ */
+static int internal_restart_ktimer(struct ktimer *timer, nsec_t *tim,
+				   int mode)
+{
+	struct ktimer_base *base, *new_base;
+  	unsigned long flags;
+	int ret;
+
+	BUG_ON(!timer->function);
+
+ retry:
+	base = lock_ktimer_base(timer, &flags);
+
+	/* Remove an active timer from the queue */
+	ret = remove_ktimer(timer, base);
+
+	/* Switch the timer base, if necessary */
+	new_base = switch_ktimer_base(timer, base);
+
+	/* SMP  */
+	if (ktimer_base_can_change && unlikely(!new_base)){
+		spin_unlock_irqrestore(&base->lock, flags);
+		wait_for_ktimer(timer);
+		goto retry;
+	}
+	/*
+	 * When the new timer setting is already expired,
+	 * let the calling code deal with it.
+	 */
+	if (enqueue_ktimer(timer, new_base, tim, mode))
+		ret = -1;
+
+	spin_unlock_irqrestore(&new_base->lock, flags);
+	return ret;
+}
+
+/***
+ * modify_ktimer - modify a running timer
+ * @timer: the timer to be modified
+ * @tim: expiry time (required)
+ * @mode: timer setup mode
+ *
+ */
+int modify_ktimer(struct ktimer *timer, nsec_t *tim, int mode)
+{
+  	BUG_ON(!tim || !timer->function);
+	return internal_restart_ktimer(timer, tim, mode);
+}
+
+/***
+ * start_ktimer - start a timer on current CPU
+ * @timer: the timer to be added
+ * @tim: expiry time (optional, if not set in the timer)
+ * @mode: timer setup mode
+ */
+int start_ktimer(struct ktimer *timer, nsec_t *tim, int mode)
+{
+  	BUG_ON(ktimer_active(timer) || !timer->function);
+
+	return internal_restart_ktimer(timer, tim, mode);
+}
+
+/***
+ * try_to_stop_ktimer - try to deactivate a timer
+ */
+int try_to_stop_ktimer(struct ktimer *timer)
+{
+	struct ktimer_base *base;
+	unsigned long flags;
+	int ret = -1;
+
+	base = lock_ktimer_base(timer, &flags);
+
+	if (base->running_timer != timer) {
+		ret = remove_ktimer(timer, base);
+		if (ret)
+			timer->expired = base->get_time();
+	}
+
+	spin_unlock_irqrestore(&base->lock, flags);
+
+	return ret;
+
+}
+
+/***
+ * stop_timer_sync - deactivate a timer and wait for the handler to finish.
+ * @timer: the timer to be deactivated
+ *
+ */
+int stop_ktimer(struct ktimer *timer)
+{
+	for (;;) {
+		int ret = try_to_stop_ktimer(timer);
+		if (ret >= 0)
+			return ret;
+		wait_for_ktimer(timer);
+	}
+}
+
+/***
+ * get_remtime_ktimer - get remaining time for the timer
+ * @timer: the timer to read
+ * @fake:  when fake > 0 a pending, but expired timer
+ *	   returns fake (itimers need this, uurg)
+ */
+nsec_t get_remtime_ktimer(struct ktimer *timer, long fake)
+{
+	struct ktimer_base *base;
+	unsigned long flags;
+	nsec_t rem;
+
+	base = lock_ktimer_base(timer, &flags);
+	if (ktimer_active(timer)) {
+		rem = timer->expires - base->get_time();
+		if (fake && rem <= 0)
+			rem = (nsec_t) fake;
+	} else {
+		rem = fake ? 0 : timer->expires - timer->expired;
+	}
+	spin_unlock_irqrestore(&base->lock, flags);
+	return rem;
+}
+
+/***
+ * get_expiry_ktimer - get expiry time for the timer
+ * @timer: the timer to read
+ * @now:   if != NULL store current base->time
+ */
+nsec_t get_expiry_ktimer(struct ktimer *timer, nsec_t *now)
+{
+	struct ktimer_base *base;
+	unsigned long flags;
+	nsec_t expiry;
+
+	base = lock_ktimer_base(timer, &flags);
+	expiry = timer->expires;
+	if (now)
+		*now = base->get_time();
+	spin_unlock_irqrestore(&base->lock, flags);
+	return expiry;
+}
+
+/*
+ * Functions related to clock sources
+ */
+static struct ktimer_base *registered_bases[MAX_KTIMER_BASES];
+
+static inline void ktimer_common_init(struct ktimer *timer)
+{
+	memset(timer, 0, sizeof(struct ktimer));
+	timer->node.rb_parent = KTIMER_POISON;
+}
+
+/*
+ * Get monotonic time
+ */
+static nsec_t get_ktime_mono(void)
+{
+	return do_get_ktime_mono();
+}
+
+/*
+ * per-CPU timer queues for monotonic time
+ */
+static struct ktimer_base monotonic_bases[NR_CPUS] __cacheline_aligned = {
+	[0 ... NR_CPUS-1] = {
+		.base = monotonic_bases,
+		.name = "Monotonic",
+		.get_time = &get_ktime_mono,
+		.resolution = KTIME_MONOTONIC_RES,
+	}
+};
+
+/***
+ * init_ktimer_mono - initialize a timer on monotonic time
+ * @timer: the timer to be initialized
+ *
+ */
+void fastcall init_ktimer_mono(struct ktimer *timer)
+{
+	ktimer_common_init(timer);
+	timer->base = &monotonic_bases[raw_smp_processor_id()];
+}
+
+/***
+ * get_ktimer_mono_res - get the monotonic timer resolution
+ *
+ */
+int get_ktimer_mono_res(clockid_t which_clock, struct timespec *tp)
+{
+	tp->tv_sec = 0;
+	tp->tv_nsec = monotonic_bases[0].resolution;
+	return 0;
+}
+
+/*
+ * Get real time
+ */
+static nsec_t get_ktime_real(void)
+{
+	return do_get_ktime_real();
+}
+
+/*
+ * per-CPU timer queues for real time
+ */
+static struct ktimer_base realtime_bases[NR_CPUS] __cacheline_aligned = {
+	[0 ... NR_CPUS-1] = {
+		.base = realtime_bases,
+		.name = "Realtime",
+		.get_time = &get_ktime_real,
+		.resolution = KTIME_REALTIME_RES,
+	}
+};
+
+/***
+ * init_ktimer_real - initialize a timer on real time
+ * @timer: the timer to be initialized
+ *
+ */
+void fastcall init_ktimer_real(struct ktimer *timer)
+{
+	ktimer_common_init(timer);
+	timer->base = &realtime_bases[raw_smp_processor_id()];
+}
+
+/***
+ * get_ktimer_real_res - get the real timer resolution
+ *
+ */
+int get_ktimer_real_res(clockid_t which_clock, struct timespec *tp)
+{
+	tp->tv_sec = 0;
+	tp->tv_nsec = realtime_bases[0].resolution;
+	return 0;
+}
+
+/*
+ * The per base runqueue
+ */
+static inline void run_ktimer_queue(struct ktimer_base *base)
+{
+	nsec_t now = base->get_time();
+
+	spin_lock_irq(&base->lock);
+	for (; !list_empty(&base->pending); ) {
+		void (*fn)(void *);
+		void *data;
+		struct ktimer *timer = list_entry(base->pending.next,
+						  struct ktimer, list);
+		if (timer->expires > now)
+			break;
+		timer->expired = now;
+		fn = timer->function;
+		data = timer->data;
+		set_running_timer(base, timer);
+		do_remove_ktimer(timer, base, KTIMER_REARM);
+		spin_unlock_irq(&base->lock);
+ 		fn(data);
+		spin_lock_irq(&base->lock);
+		set_running_timer(base, NULL);
+	}
+	spin_unlock_irq(&base->lock);
+	wake_up_timer_waiters(base);
+}
+
+/*
+ * Called from timer softirq every jiffy
+ */
+void run_ktimer_queues(void)
+{
+	int cpu = smp_processor_id();
+	int i;
+
+	for (i = 0; i < MAX_KTIMER_BASES; i++) {
+		if (!registered_bases[i])
+			break;
+		run_ktimer_queue(&registered_bases[i][cpu]);
+	}
+}
+
+/*
+ * Functions related to initialization
+ */
+static void __devinit init_ktimers_cpu(int cpu)
+{
+	struct ktimer_base *base;
+	int i;
+
+	for (i = 0; i < MAX_KTIMER_BASES; i++) {
+		base = &registered_bases[i][cpu];
+		if (!base)
+			break;
+		spin_lock_init(&base->lock);
+		INIT_LIST_HEAD(&base->pending);
+		init_waitqueue_head(&base->wait_for_running_timer);
+	}
+}
+
+#ifdef CONFIG_HOTPLUG_CPU
+static void migrate_ktimer_list(struct ktimer_base *old_base,
+				struct ktimer_base *new_base)
+{
+	struct ktimer *timer;
+	struct rb_node *node;
+
+	while ((node = rb_first(&old_base->root))) {
+		timer = rb_entry(node, struct ktimer, tnode);
+		remove_ktimer(timer, old_base);
+		timer->base = new_base;
+		enqueue_ktimer(timer, new_base, NULL);
+	}
+}
+
+static void __devinit migrate_ktimers(int cpu)
+{
+	struct ktimer_base *old_base;
+	struct ktimer_base *new_base;
+	int i;
+
+	BUG_ON(cpu_online(cpu));
+
+	local_irq_disable();
+
+	for (i = 0; i < MAX_KTIMER_BASES; i++) {
+		old_base = &registered_bases[i][cpu];
+		if (!old_base)
+			break;
+		new_base = &registered_bases[i][smp_processor_id()];
+
+		spin_lock(&new_base->lock);
+		spin_lock(&old_base->lock);
+
+		if (old_base->running_timer)
+			BUG();
+
+		migrate_ktimer_list(old_base, new_base);
+
+		spin_unlock(&old_base->lock);
+		spin_unlock(&new_base->lock);
+	}
+
+	local_irq_enable();
+}
+#endif /* CONFIG_HOTPLUG_CPU */
+
+static int __devinit ktimer_cpu_notify(struct notifier_block *self,
+				       unsigned long action, void *hcpu)
+{
+	long cpu = (long)hcpu;
+	switch(action) {
+	case CPU_UP_PREPARE:
+		init_ktimers_cpu(cpu);
+		break;
+#ifdef CONFIG_HOTPLUG_CPU
+	case CPU_DEAD:
+		migrate_ktimers(cpu);
+		break;
+#endif
+	default:
+		break;
+	}
+	return NOTIFY_OK;
+}
+
+static struct notifier_block __devinitdata ktimers_nb = {
+	.notifier_call	= ktimer_cpu_notify,
+};
+
+int __init register_ktime_base(struct ktimer_base *base)
+{
+	int i;
+
+	for (i = 0; i < MAX_KTIMER_BASES; i++) {
+		if (!registered_bases[i]) {
+			registered_bases[i] = base;
+			printk("Registered KTimer base %s\n", base->name);
+			return 0;
+		}
+	}
+	return -1;
+}
+
+void __init init_ktimers(void)
+{
+	register_ktime_base(monotonic_bases);
+	register_ktime_base(realtime_bases);
+
+	ktimer_cpu_notify(&ktimers_nb, (unsigned long)CPU_UP_PREPARE,
+				(void *)(long)smp_processor_id());
+	register_cpu_notifier(&ktimers_nb);
+}
+
+/*
+ * system interface related functions
+ */
+static void process_ktimer(void *data)
+{
+	wake_up_process(data);
+}
+
+/**
+ * schedule_ktimer - sleep until timeout
+ * @timeout: timeout value
+ * @state:   state to use for sleep
+ * @rel:    timeout value is abs/rel
+ *
+ * Make the current task sleep until @timeout is
+ * elapsed.
+ *
+ * You can set the task state as follows -
+ *
+ * %TASK_UNINTERRUPTIBLE - at least @timeout is guaranteed to
+ * pass before the routine returns. The routine will return 0
+ *
+ * %TASK_INTERRUPTIBLE - the routine may return early if a signal is
+ * delivered to the current task. In this case the remaining time
+ * will be returned
+ *
+ * The current task state is guaranteed to be TASK_RUNNING when this
+ * routine returns.
+ *
+ */
+fastcall nsec_t __sched schedule_ktimer(struct ktimer *timer,
+					nsec_t *t, int state, int mode)
+{
+	timer->data = current;
+	timer->function = process_ktimer;
+
+	current->state = state;
+	if (start_ktimer(timer, t, mode)) {
+		current->state = TASK_RUNNING;
+		goto out;
+	}
+	if (current->state != TASK_RUNNING)
+		schedule();
+	stop_ktimer(timer);
+ out:
+	/* Store the absolute expiry time */
+	*t = timer->expires;
+	/* Return the remaining time */
+	return timer->expires - timer->expired;
+}
+
+static long __sched nanosleep_restart(struct ktimer *timer,
+				      struct restart_block *restart)
+{
+	struct timespec tu;
+	nsec_t t, rem;
+	void *rfn = restart->fn;
+	struct timespec __user *rmtp = (struct timespec __user *) restart->arg2;
+
+	restart->fn = do_no_restart_syscall;
+
+	t = (nsec_t) restart->arg0;
+	t += ((nsec_t) restart->arg1) << 32;
+
+	rem = schedule_ktimer(timer, &t, TASK_INTERRUPTIBLE, KTIMER_ABS);
+
+	if (rem <= 0)
+		return 0;
+
+	tu = ns_to_timespec(rem);
+	if (rmtp && copy_to_user(rmtp, &rem, sizeof(rem)))
+		return -EFAULT;
+
+	restart->fn = rfn;
+	/* The other values in restart are already filled in */
+	return -ERESTART_RESTARTBLOCK;
+}
+
+static long __sched nanosleep_restart_mono(struct restart_block *restart)
+{
+	struct ktimer timer;
+
+	init_ktimer_mono(&timer);
+	return nanosleep_restart(&timer, restart);
+}
+
+static long __sched nanosleep_restart_real(struct restart_block *restart)
+{
+	struct ktimer timer;
+
+	init_ktimer_real(&timer);
+	return nanosleep_restart(&timer, restart);
+}
+
+static long ktimer_nanosleep(struct ktimer *timer, struct timespec *rqtp,
+			     struct timespec __user *rmtp, int mode,
+			     long (*rfn)(struct restart_block *))
+{
+	struct timespec tu;
+	nsec_t rem, t;
+	struct restart_block *restart;
+
+	t = ktimer_convert_timespec(timer, rqtp);
+
+	/* t is updated to absolute expiry time ! */
+	rem = schedule_ktimer(timer, &t, TASK_INTERRUPTIBLE, mode);
+
+	if (rem <= 0)
+		return 0;
+
+	tu = ns_to_timespec(rem);
+
+	if (rmtp && copy_to_user(rmtp, &tu, sizeof(tu)))
+		return -EFAULT;
+
+	restart = &current_thread_info()->restart_block;
+	restart->fn = rfn;
+	restart->arg0 = t & 0xFFFFFFFFLL;
+	restart->arg1 = t >> 32;
+	restart->arg2 = (unsigned long) rmtp;
+	return -ERESTART_RESTARTBLOCK;
+
+}
+
+long ktimer_nanosleep_mono(struct timespec *rqtp,
+			   struct timespec __user *rmtp, int mode)
+{
+	struct ktimer timer;
+
+	init_ktimer_mono(&timer);
+	return ktimer_nanosleep(&timer, rqtp, rmtp, mode, nanosleep_restart_mono);
+}
+
+long ktimer_nanosleep_real(struct timespec *rqtp,
+			   struct timespec __user *rmtp, int mode)
+{
+	struct ktimer timer;
+
+	init_ktimer_real(&timer);
+	return ktimer_nanosleep(&timer, rqtp, rmtp, mode, nanosleep_restart_real);
+}
+
+asmlinkage long sys_nanosleep(struct timespec __user *rqtp,
+			      struct timespec __user *rmtp)
+{
+	struct timespec tu;
+
+	if (copy_from_user(&tu, rqtp, sizeof(tu)))
+		return -EFAULT;
+
+	if (!timespec_valid(&tu))
+		return -EINVAL;
+
+	return ktimer_nanosleep_mono(&tu, rmtp, KTIMER_REL);
+}
+
Index: linux-2.6.13.ktimers/kernel/posix-cpu-timers.c
===================================================================
--- linux-2.6.13.ktimers.orig/kernel/posix-cpu-timers.c
+++ linux-2.6.13.ktimers/kernel/posix-cpu-timers.c
@@ -1394,7 +1394,7 @@ void set_process_cpu_timer(struct task_s
 static long posix_cpu_clock_nanosleep_restart(struct restart_block *);
 
 int posix_cpu_nsleep(clockid_t which_clock, int flags,
-		     struct timespec *rqtp)
+		     struct timespec *rqtp, struct timespec __user *rmtp)
 {
 	struct restart_block *restart_block =
 	    &current_thread_info()->restart_block;
@@ -1419,7 +1419,6 @@ int posix_cpu_nsleep(clockid_t which_clo
 	error = posix_cpu_timer_create(&timer);
 	timer.it_process = current;
 	if (!error) {
-		struct timespec __user *rmtp;
 		static struct itimerspec zero_it;
 		struct itimerspec it = { .it_value = *rqtp,
 					 .it_interval = {} };
@@ -1466,7 +1465,6 @@ int posix_cpu_nsleep(clockid_t which_clo
 		/*
 		 * Report back to the user the time still remaining.
 		 */
-		rmtp = (struct timespec __user *) restart_block->arg1;
 		if (rmtp != NULL && !(flags & TIMER_ABSTIME) &&
 		    copy_to_user(rmtp, &it.it_value, sizeof *rmtp))
 			return -EFAULT;
@@ -1474,6 +1472,7 @@ int posix_cpu_nsleep(clockid_t which_clo
 		restart_block->fn = posix_cpu_clock_nanosleep_restart;
 		/* Caller already set restart_block->arg1 */
 		restart_block->arg0 = which_clock;
+		restart_block->arg1 = (unsigned long) rmtp;
 		restart_block->arg2 = rqtp->tv_sec;
 		restart_block->arg3 = rqtp->tv_nsec;
 
@@ -1487,10 +1486,15 @@ static long
 posix_cpu_clock_nanosleep_restart(struct restart_block *restart_block)
 {
 	clockid_t which_clock = restart_block->arg0;
-	struct timespec t = { .tv_sec = restart_block->arg2,
-			      .tv_nsec = restart_block->arg3 };
+	struct timespec __user *rmtp;
+	struct timespec t;
+
+	rmtp = (struct timespec __user *) restart_block->arg1;
+	t.tv_sec = restart_block->arg2;
+	t.tv_nsec = restart_block->arg3;
+
 	restart_block->fn = do_no_restart_syscall;
-	return posix_cpu_nsleep(which_clock, TIMER_ABSTIME, &t);
+	return posix_cpu_nsleep(which_clock, TIMER_ABSTIME, &t, rmtp);
 }
 
 
@@ -1511,9 +1515,10 @@ static int process_cpu_timer_create(stru
 	return posix_cpu_timer_create(timer);
 }
 static int process_cpu_nsleep(clockid_t which_clock, int flags,
-			      struct timespec *rqtp)
+			      struct timespec *rqtp,
+			      struct timespec __user *rmtp)
 {
-	return posix_cpu_nsleep(PROCESS_CLOCK, flags, rqtp);
+	return posix_cpu_nsleep(PROCESS_CLOCK, flags, rqtp, rmtp);
 }
 static int thread_cpu_clock_getres(clockid_t which_clock, struct timespec *tp)
 {
@@ -1529,7 +1534,7 @@ static int thread_cpu_timer_create(struc
 	return posix_cpu_timer_create(timer);
 }
 static int thread_cpu_nsleep(clockid_t which_clock, int flags,
-			      struct timespec *rqtp)
+			      struct timespec *rqtp, struct timespec __user *rmtp)
 {
 	return -EINVAL;
 }
Index: linux-2.6.13.ktimers/kernel/posix-timers.c
===================================================================
--- linux-2.6.13.ktimers.orig/kernel/posix-timers.c
+++ linux-2.6.13.ktimers/kernel/posix-timers.c
@@ -48,21 +48,6 @@
 #include <linux/workqueue.h>
 #include <linux/module.h>
 
-#ifndef div_long_long_rem
-#include <asm/div64.h>
-
-#define div_long_long_rem(dividend,divisor,remainder) ({ \
-		       u64 result = dividend;		\
-		       *remainder = do_div(result,divisor); \
-		       result; })
-
-#endif
-#define CLOCK_REALTIME_RES TICK_NSEC  /* In nano seconds. */
-
-static inline u64  mpy_l_X_l_ll(unsigned long mpy1,unsigned long mpy2)
-{
-	return (u64)mpy1 * mpy2;
-}
 /*
  * Management arrays for POSIX timers.	 Timers are kept in slab memory
  * Timer ids are allocated by an external routine that keeps track of the
@@ -148,18 +133,18 @@ static DEFINE_SPINLOCK(idr_lock);
  */
 
 static struct k_clock posix_clocks[MAX_CLOCKS];
+
 /*
- * We only have one real clock that can be set so we need only one abs list,
- * even if we should want to have several clocks with differing resolutions.
+ * These ones are defined below.
  */
-static struct k_clock_abs abs_list = {.list = LIST_HEAD_INIT(abs_list.list),
-				      .lock = SPIN_LOCK_UNLOCKED};
+static int common_nsleep(clockid_t, int flags, struct timespec *t,
+			 struct timespec __user *rmtp);
+static void common_timer_get(struct k_itimer *, struct itimerspec *);
+static int common_timer_set(struct k_itimer *, int,
+			    struct itimerspec *, struct itimerspec *);
+static int common_timer_del(struct k_itimer *timer);
 
-static void posix_timer_fn(unsigned long);
-static u64 do_posix_clock_monotonic_gettime_parts(
-	struct timespec *tp, struct timespec *mo);
-int do_posix_clock_monotonic_gettime(struct timespec *tp);
-static int do_posix_clock_monotonic_get(clockid_t, struct timespec *tp);
+static void posix_timer_fn(void *data);
 
 static struct k_itimer *lock_timer(timer_t timer_id, unsigned long *flags);
 
@@ -205,21 +190,25 @@ static inline int common_clock_set(clock
 
 static inline int common_timer_create(struct k_itimer *new_timer)
 {
-	INIT_LIST_HEAD(&new_timer->it.real.abs_timer_entry);
-	init_timer(&new_timer->it.real.timer);
-	new_timer->it.real.timer.data = (unsigned long) new_timer;
+	return -EINVAL;
+}
+
+static int timer_create_mono(struct k_itimer *new_timer)
+{
+	init_ktimer_mono(&new_timer->it.real.timer);
+	new_timer->it.real.timer.data = new_timer;
+	new_timer->it.real.timer.function = posix_timer_fn;
+	return 0;
+}
+
+static int timer_create_real(struct k_itimer *new_timer)
+{
+	init_ktimer_real(&new_timer->it.real.timer);
+	new_timer->it.real.timer.data = new_timer;
 	new_timer->it.real.timer.function = posix_timer_fn;
 	return 0;
 }
 
-/*
- * These ones are defined below.
- */
-static int common_nsleep(clockid_t, int flags, struct timespec *t);
-static void common_timer_get(struct k_itimer *, struct itimerspec *);
-static int common_timer_set(struct k_itimer *, int,
-			    struct itimerspec *, struct itimerspec *);
-static int common_timer_del(struct k_itimer *timer);
 
 /*
  * Return nonzero iff we know a priori this clockid_t value is bogus.
@@ -239,19 +228,44 @@ static inline int invalid_clockid(clocki
 	return 1;
 }
 
+/*
+ * Get real time for posix timers
+ */
+static int posix_get_ktime_real_ts(clockid_t which_clock, struct timespec *tp)
+{
+	get_ktime_real_ts(tp);
+	return 0;
+}
+
+/*
+ * Get monotonic time for posix timers
+ */
+static int posix_get_ktime_mono_ts(clockid_t which_clock, struct timespec *tp)
+{
+	get_ktime_mono_ts(tp);
+	return 0;
+}
+
+void do_posix_clock_monotonic_gettime(struct timespec *ts)
+{
+	get_ktime_mono_ts(ts);
+}
 
 /*
  * Initialize everything, well, just everything in Posix clocks/timers ;)
  */
 static __init int init_posix_timers(void)
 {
-	struct k_clock clock_realtime = {.res = CLOCK_REALTIME_RES,
-					 .abs_struct = &abs_list
+	struct k_clock clock_realtime = {
+		.clock_getres = get_ktimer_real_res,
+		.clock_get = posix_get_ktime_real_ts,
+		.timer_create = timer_create_real,
 	};
-	struct k_clock clock_monotonic = {.res = CLOCK_REALTIME_RES,
-		.abs_struct = NULL,
-		.clock_get = do_posix_clock_monotonic_get,
-		.clock_set = do_posix_clock_nosettime
+	struct k_clock clock_monotonic = {
+		.clock_getres = get_ktimer_mono_res,
+		.clock_get = posix_get_ktime_mono_ts,
+		.clock_set = do_posix_clock_nosettime,
+		.timer_create = timer_create_mono,
 	};
 
 	register_posix_clock(CLOCK_REALTIME, &clock_realtime);
@@ -265,117 +279,17 @@ static __init int init_posix_timers(void
 
 __initcall(init_posix_timers);
 
-static void tstojiffie(struct timespec *tp, int res, u64 *jiff)
-{
-	long sec = tp->tv_sec;
-	long nsec = tp->tv_nsec + res - 1;
-
-	if (nsec > NSEC_PER_SEC) {
-		sec++;
-		nsec -= NSEC_PER_SEC;
-	}
-
-	/*
-	 * The scaling constants are defined in <linux/time.h>
-	 * The difference between there and here is that we do the
-	 * res rounding and compute a 64-bit result (well so does that
-	 * but it then throws away the high bits).
-  	 */
-	*jiff =  (mpy_l_X_l_ll(sec, SEC_CONVERSION) +
-		  (mpy_l_X_l_ll(nsec, NSEC_CONVERSION) >> 
-		   (NSEC_JIFFIE_SC - SEC_JIFFIE_SC))) >> SEC_JIFFIE_SC;
-}
-
-/*
- * This function adjusts the timer as needed as a result of the clock
- * being set.  It should only be called for absolute timers, and then
- * under the abs_list lock.  It computes the time difference and sets
- * the new jiffies value in the timer.  It also updates the timers
- * reference wall_to_monotonic value.  It is complicated by the fact
- * that tstojiffies() only handles positive times and it needs to work
- * with both positive and negative times.  Also, for negative offsets,
- * we need to defeat the res round up.
- *
- * Return is true if there is a new time, else false.
- */
-static long add_clockset_delta(struct k_itimer *timr,
-			       struct timespec *new_wall_to)
-{
-	struct timespec delta;
-	int sign = 0;
-	u64 exp;
-
-	set_normalized_timespec(&delta,
-				new_wall_to->tv_sec -
-				timr->it.real.wall_to_prev.tv_sec,
-				new_wall_to->tv_nsec -
-				timr->it.real.wall_to_prev.tv_nsec);
-	if (likely(!(delta.tv_sec | delta.tv_nsec)))
-		return 0;
-	if (delta.tv_sec < 0) {
-		set_normalized_timespec(&delta,
-					-delta.tv_sec,
-					1 - delta.tv_nsec -
-					posix_clocks[timr->it_clock].res);
-		sign++;
-	}
-	tstojiffie(&delta, posix_clocks[timr->it_clock].res, &exp);
-	timr->it.real.wall_to_prev = *new_wall_to;
-	timr->it.real.timer.expires += (sign ? -exp : exp);
-	return 1;
-}
-
-static void remove_from_abslist(struct k_itimer *timr)
-{
-	if (!list_empty(&timr->it.real.abs_timer_entry)) {
-		spin_lock(&abs_list.lock);
-		list_del_init(&timr->it.real.abs_timer_entry);
-		spin_unlock(&abs_list.lock);
-	}
-}
 
 static void schedule_next_timer(struct k_itimer *timr)
 {
-	struct timespec new_wall_to;
-	struct now_struct now;
-	unsigned long seq;
-
-	/*
-	 * Set up the timer for the next interval (if there is one).
-	 * Note: this code uses the abs_timer_lock to protect
-	 * it.real.wall_to_prev and must hold it until exp is set, not exactly
-	 * obvious...
-
-	 * This function is used for CLOCK_REALTIME* and
-	 * CLOCK_MONOTONIC* timers.  If we ever want to handle other
-	 * CLOCKs, the calling code (do_schedule_next_timer) would need
-	 * to pull the "clock" info from the timer and dispatch the
-	 * "other" CLOCKs "next timer" code (which, I suppose should
-	 * also be added to the k_clock structure).
-	 */
 	if (!timr->it.real.incr)
 		return;
 
-	do {
-		seq = read_seqbegin(&xtime_lock);
-		new_wall_to =	wall_to_monotonic;
-		posix_get_now(&now);
-	} while (read_seqretry(&xtime_lock, seq));
-
-	if (!list_empty(&timr->it.real.abs_timer_entry)) {
-		spin_lock(&abs_list.lock);
-		add_clockset_delta(timr, &new_wall_to);
-
-		posix_bump_timer(timr, now);
-
-		spin_unlock(&abs_list.lock);
-	} else {
-		posix_bump_timer(timr, now);
-	}
-	timr->it_overrun_last = timr->it_overrun;
-	timr->it_overrun = -1;
+	timr->it_overrun_last = timr->it.real.overrun;
+	timr->it.real.overrun = timr->it.real.timer.overrun = -1;
 	++timr->it_requeue_pending;
-	add_timer(&timr->it.real.timer);
+	start_ktimer(&timr->it.real.timer, &timr->it.real.incr, KTIMER_FORWARD);
+	timr->it.real.overrun = timr->it.real.timer.overrun;
 }
 
 /*
@@ -413,14 +327,7 @@ int posix_timer_event(struct k_itimer *t
 {
 	memset(&timr->sigq->info, 0, sizeof(siginfo_t));
 	timr->sigq->info.si_sys_private = si_private;
-	/*
-	 * Send signal to the process that owns this timer.
-
-	 * This code assumes that all the possible abs_lists share the
-	 * same lock (there is only one list at this time). If this is
-	 * not the case, the CLOCK info would need to be used to find
-	 * the proper abs list lock.
-	 */
+	/* Send signal to the process that owns this timer.*/
 
 	timr->sigq->info.si_signo = timr->it_sigev_signo;
 	timr->sigq->info.si_errno = 0;
@@ -452,65 +359,28 @@ EXPORT_SYMBOL_GPL(posix_timer_event);
 
  * This code is for CLOCK_REALTIME* and CLOCK_MONOTONIC* timers.
  */
-static void posix_timer_fn(unsigned long __data)
+static void posix_timer_fn(void *data)
 {
-	struct k_itimer *timr = (struct k_itimer *) __data;
+	struct k_itimer *timr = data;
 	unsigned long flags;
-	unsigned long seq;
-	struct timespec delta, new_wall_to;
-	u64 exp = 0;
-	int do_notify = 1;
+	int si_private = 0;
 
 	spin_lock_irqsave(&timr->it_lock, flags);
-	if (!list_empty(&timr->it.real.abs_timer_entry)) {
-		spin_lock(&abs_list.lock);
-		do {
-			seq = read_seqbegin(&xtime_lock);
-			new_wall_to =	wall_to_monotonic;
-		} while (read_seqretry(&xtime_lock, seq));
-		set_normalized_timespec(&delta,
-					new_wall_to.tv_sec -
-					timr->it.real.wall_to_prev.tv_sec,
-					new_wall_to.tv_nsec -
-					timr->it.real.wall_to_prev.tv_nsec);
-		if (likely((delta.tv_sec | delta.tv_nsec ) == 0)) {
-			/* do nothing, timer is on time */
-		} else if (delta.tv_sec < 0) {
-			/* do nothing, timer is already late */
-		} else {
-			/* timer is early due to a clock set */
-			tstojiffie(&delta,
-				   posix_clocks[timr->it_clock].res,
-				   &exp);
-			timr->it.real.wall_to_prev = new_wall_to;
-			timr->it.real.timer.expires += exp;
-			add_timer(&timr->it.real.timer);
-			do_notify = 0;
-		}
-		spin_unlock(&abs_list.lock);
 
-	}
-	if (do_notify)  {
-		int si_private=0;
+	if (timr->it.real.incr)
+		si_private = ++timr->it_requeue_pending;
 
-		if (timr->it.real.incr)
-			si_private = ++timr->it_requeue_pending;
-		else {
-			remove_from_abslist(timr);
-		}
+	if (posix_timer_event(timr, si_private))
+		/*
+		 * signal was not sent because of sig_ignor
+		 * we will not get a call back to restart it AND
+		 * it should be restarted.
+		 */
+		schedule_next_timer(timr);
 
-		if (posix_timer_event(timr, si_private))
-			/*
-			 * signal was not sent because of sig_ignor
-			 * we will not get a call back to restart it AND
-			 * it should be restarted.
-			 */
-			schedule_next_timer(timr);
-	}
 	unlock_timer(timr, flags); /* hold thru abs lock to keep irq off */
 }
 
-
 static inline struct task_struct * good_sigevent(sigevent_t * event)
 {
 	struct task_struct *rtn = current->group_leader;
@@ -774,39 +644,39 @@ static struct k_itimer * lock_timer(time
 static void
 common_timer_get(struct k_itimer *timr, struct itimerspec *cur_setting)
 {
-	unsigned long expires;
-	struct now_struct now;
+	nsec_t expires, now, remaining;
+	struct ktimer *timer = &timr->it.real.timer;
 
-	do
-		expires = timr->it.real.timer.expires;
-	while ((volatile long) (timr->it.real.timer.expires) != expires);
-
-	posix_get_now(&now);
-
-	if (expires &&
-	    ((timr->it_sigev_notify & ~SIGEV_THREAD_ID) == SIGEV_NONE) &&
-	    !timr->it.real.incr &&
-	    posix_time_before(&timr->it.real.timer, &now))
-		timr->it.real.timer.expires = expires = 0;
-	if (expires) {
-		if (timr->it_requeue_pending & REQUEUE_PENDING ||
-		    (timr->it_sigev_notify & ~SIGEV_THREAD_ID) == SIGEV_NONE) {
-			posix_bump_timer(timr, now);
-			expires = timr->it.real.timer.expires;
-		}
-		else
-			if (!timer_pending(&timr->it.real.timer))
-				expires = 0;
-		if (expires)
-			expires -= now.jiffies;
-	}
-	jiffies_to_timespec(expires, &cur_setting->it_value);
-	jiffies_to_timespec(timr->it.real.incr, &cur_setting->it_interval);
-
-	if (cur_setting->it_value.tv_sec < 0) {
+	memset(cur_setting, 0, sizeof(struct itimerspec));
+	expires = get_expiry_ktimer(timer, &now);
+	remaining = expires - now;
+	/* Time left ? or timer pending */
+	if (remaining > 0 || ktimer_active(timer))
+		goto calci;
+	/* interval timer ? */
+	if (!timr->it.real.incr)
+		return;
+	/*
+	 * When a requeue is pending or this is a SIGEV_NONE timer
+	 * move the expiry time forward by intervals, so expiry is >
+	 * now.
+	 * The active (non SIGEV_NONE) rearm should be done
+	 * automatically by the ktimer REARM mode. Thats the next
+	 * iteration.  The REQUEUE_PENDING part will go away !
+	 */
+	if (timr->it_requeue_pending & REQUEUE_PENDING ||
+	    (timr->it_sigev_notify & ~SIGEV_THREAD_ID) == SIGEV_NONE) {
+		remaining = forward_posix_timer(timr, now);
+	}
+ calci:
+	/* interval timer ? */
+	if (timr->it.real.incr)
+		cur_setting->it_interval = ns_to_timespec(timr->it.real.incr);
+	/* Return 0 only, when the timer is expired and not pending */
+	if (remaining <= 0)
 		cur_setting->it_value.tv_nsec = 1;
-		cur_setting->it_value.tv_sec = 0;
-	}
+	else
+		cur_setting->it_value = ns_to_timespec(remaining);
 }
 
 /* Get the time remaining on a POSIX.1b interval timer. */
@@ -830,6 +700,7 @@ sys_timer_gettime(timer_t timer_id, stru
 
 	return 0;
 }
+
 /*
  * Get the number of overruns of a POSIX.1b interval timer.  This is to
  * be the overrun of the timer last delivered.  At the same time we are
@@ -856,84 +727,6 @@ sys_timer_getoverrun(timer_t timer_id)
 
 	return overrun;
 }
-/*
- * Adjust for absolute time
- *
- * If absolute time is given and it is not CLOCK_MONOTONIC, we need to
- * adjust for the offset between the timer clock (CLOCK_MONOTONIC) and
- * what ever clock he is using.
- *
- * If it is relative time, we need to add the current (CLOCK_MONOTONIC)
- * time to it to get the proper time for the timer.
- */
-static int adjust_abs_time(struct k_clock *clock, struct timespec *tp, 
-			   int abs, u64 *exp, struct timespec *wall_to)
-{
-	struct timespec now;
-	struct timespec oc = *tp;
-	u64 jiffies_64_f;
-	int rtn =0;
-
-	if (abs) {
-		/*
-		 * The mask pick up the 4 basic clocks 
-		 */
-		if (!((clock - &posix_clocks[0]) & ~CLOCKS_MASK)) {
-			jiffies_64_f = do_posix_clock_monotonic_gettime_parts(
-				&now,  wall_to);
-			/*
-			 * If we are doing a MONOTONIC clock
-			 */
-			if((clock - &posix_clocks[0]) & CLOCKS_MONO){
-				now.tv_sec += wall_to->tv_sec;
-				now.tv_nsec += wall_to->tv_nsec;
-			}
-		} else {
-			/*
-			 * Not one of the basic clocks
-			 */
-			clock->clock_get(clock - posix_clocks, &now);
-			jiffies_64_f = get_jiffies_64();
-		}
-		/*
-		 * Take away now to get delta and normalize
-		 */
-		set_normalized_timespec(&oc, oc.tv_sec - now.tv_sec,
-					oc.tv_nsec - now.tv_nsec);
-	}else{
-		jiffies_64_f = get_jiffies_64();
-	}
-	/*
-	 * Check if the requested time is prior to now (if so set now)
-	 */
-	if (oc.tv_sec < 0)
-		oc.tv_sec = oc.tv_nsec = 0;
-
-	if (oc.tv_sec | oc.tv_nsec)
-		set_normalized_timespec(&oc, oc.tv_sec,
-					oc.tv_nsec + clock->res);
-	tstojiffie(&oc, clock->res, exp);
-
-	/*
-	 * Check if the requested time is more than the timer code
-	 * can handle (if so we error out but return the value too).
-	 */
-	if (*exp > ((u64)MAX_JIFFY_OFFSET))
-			/*
-			 * This is a considered response, not exactly in
-			 * line with the standard (in fact it is silent on
-			 * possible overflows).  We assume such a large 
-			 * value is ALMOST always a programming error and
-			 * try not to compound it by setting a really dumb
-			 * value.
-			 */
-			rtn = -EINVAL;
-	/*
-	 * return the actual jiffies expire time, full 64 bits
-	 */
-	*exp += jiffies_64_f;
-	return rtn;
-}
 
 /* Set a POSIX.1b interval timer. */
 /* timr->it_lock is taken. */
@@ -941,8 +734,8 @@ static inline int
 common_timer_set(struct k_itimer *timr, int flags,
 		 struct itimerspec *new_setting, struct itimerspec *old_setting)
 {
-	struct k_clock *clock = &posix_clocks[timr->it_clock];
-	u64 expire_64;
+	nsec_t expires;
+	int mode;
 
 	if (old_setting)
 		common_timer_get(timr, old_setting);
@@ -953,56 +746,40 @@ common_timer_set(struct k_itimer *timr, 
 	 * careful here.  If smp we could be in the "fire" routine which will
 	 * be spinning as we hold the lock.  But this is ONLY an SMP issue.
 	 */
-	if (try_to_del_timer_sync(&timr->it.real.timer) < 0) {
-#ifdef CONFIG_SMP
-		/*
-		 * It can only be active if on an other cpu.  Since
-		 * we have cleared the interval stuff above, it should
-		 * clear once we release the spin lock.  Of course once
-		 * we do that anything could happen, including the
-		 * complete melt down of the timer.  So return with
-		 * a "retry" exit status.
-		 */
+	if (try_to_stop_ktimer(&timr->it.real.timer) < 0)
 		return TIMER_RETRY;
-#endif
-	}
-
-	remove_from_abslist(timr);
 
 	timr->it_requeue_pending = (timr->it_requeue_pending + 2) & 
 		~REQUEUE_PENDING;
 	timr->it_overrun_last = 0;
 	timr->it_overrun = -1;
-	/*
-	 *switch off the timer when it_value is zero
-	 */
-	if (!new_setting->it_value.tv_sec && !new_setting->it_value.tv_nsec) {
-		timr->it.real.timer.expires = 0;
+
+	/* switch off the timer when it_value is zero */
+	if (!new_setting->it_value.tv_sec && !new_setting->it_value.tv_nsec)
 		return 0;
-	}
 
-	if (adjust_abs_time(clock,
-			    &new_setting->it_value, flags & TIMER_ABSTIME, 
-			    &expire_64, &(timr->it.real.wall_to_prev))) {
-		return -EINVAL;
-	}
-	timr->it.real.timer.expires = (unsigned long)expire_64;
-	tstojiffie(&new_setting->it_interval, clock->res, &expire_64);
-	timr->it.real.incr = (unsigned long)expire_64;
+	mode = flags & TIMER_ABSTIME ? KTIMER_ABS : KTIMER_REL;
 
-	/*
-	 * We do not even queue SIGEV_NONE timers!  But we do put them
-	 * in the abs list so we can do that right.
+	/* Posix madness. Only absolute CLOCK_REALTIME timers
+	 * are affected by clock sets. So we must reiniatilize
+	 * the timer.
 	 */
+	if (timr->it_clock == CLOCK_REALTIME && mode == KTIMER_ABS)
+		timer_create_real(timr);
+	else
+		timer_create_mono(timr);
+
+	expires = ktimer_convert_timespec(&timr->it.real.timer,
+					  &new_setting->it_value);
+	/* This should be moved to the auto rearm code */
+	timr->it.real.incr = ktimer_convert_timespec(&timr->it.real.timer,
+						     &new_setting->it_interval);
+
+	/* SIGEV_NONE timers are not queued ! See common_timer_get */
 	if (((timr->it_sigev_notify & ~SIGEV_THREAD_ID) != SIGEV_NONE))
-		add_timer(&timr->it.real.timer);
+		start_ktimer(&timr->it.real.timer, &expires,
+			     mode | KTIMER_NOCHECK);
 
-	if (flags & TIMER_ABSTIME && clock->abs_struct) {
-		spin_lock(&clock->abs_struct->lock);
-		list_add_tail(&(timr->it.real.abs_timer_entry),
-			      &(clock->abs_struct->list));
-		spin_unlock(&clock->abs_struct->lock);
-	}
 	return 0;
 }
 
@@ -1037,6 +814,7 @@ retry:
 
 	unlock_timer(timr, flag);
 	if (error == TIMER_RETRY) {
+		wait_for_ktimer(&timr->it.real.timer);
 		rtn = NULL;	// We already got the old time...
 		goto retry;
 	}
@@ -1052,22 +830,8 @@ static inline int common_timer_del(struc
 {
 	timer->it.real.incr = 0;
 
-	if (try_to_del_timer_sync(&timer->it.real.timer) < 0) {
-#ifdef CONFIG_SMP
-		/*
-		 * It can only be active if on an other cpu.  Since
-		 * we have cleared the interval stuff above, it should
-		 * clear once we release the spin lock.  Of course once
-		 * we do that anything could happen, including the
-		 * complete melt down of the timer.  So return with
-		 * a "retry" exit status.
-		 */
+	if (try_to_stop_ktimer(&timer->it.real.timer) < 0)
 		return TIMER_RETRY;
-#endif
-	}
-
-	remove_from_abslist(timer);
-
 	return 0;
 }
 
@@ -1083,24 +847,17 @@ sys_timer_delete(timer_t timer_id)
 	struct k_itimer *timer;
 	long flags;
 
-#ifdef CONFIG_SMP
-	int error;
 retry_delete:
-#endif
 	timer = lock_timer(timer_id, &flags);
 	if (!timer)
 		return -EINVAL;
 
-#ifdef CONFIG_SMP
-	error = timer_delete_hook(timer);
-
-	if (error == TIMER_RETRY) {
+	if (timer_delete_hook(timer) == TIMER_RETRY) {
 		unlock_timer(timer, flags);
+		wait_for_ktimer(&timer->it.real.timer);
 		goto retry_delete;
 	}
-#else
-	timer_delete_hook(timer);
-#endif
+
 	spin_lock(&current->sighand->siglock);
 	list_del(&timer->list);
 	spin_unlock(&current->sighand->siglock);
@@ -1117,6 +874,7 @@ retry_delete:
 	release_posix_timer(timer, IT_ID_SET);
 	return 0;
 }
+
 /*
  * return timer owned by the process, used by exit_itimers
  */
@@ -1124,22 +882,14 @@ static inline void itimer_delete(struct 
 {
 	unsigned long flags;
 
-#ifdef CONFIG_SMP
-	int error;
 retry_delete:
-#endif
 	spin_lock_irqsave(&timer->it_lock, flags);
 
-#ifdef CONFIG_SMP
-	error = timer_delete_hook(timer);
-
-	if (error == TIMER_RETRY) {
+	if (timer_delete_hook(timer) == TIMER_RETRY) {
 		unlock_timer(timer, flags);
+		wait_for_ktimer(&timer->it.real.timer);
 		goto retry_delete;
 	}
-#else
-	timer_delete_hook(timer);
-#endif
 	list_del(&timer->list);
 	/*
 	 * This keeps any tasks waiting on the spin lock from thinking
@@ -1168,60 +918,7 @@ void exit_itimers(struct signal_struct *
 	}
 }
 
-/*
- * And now for the "clock" calls
- *
- * These functions are called both from timer functions (with the timer
- * spin_lock_irq() held and from clock calls with no locking.	They must
- * use the save flags versions of locks.
- */
-
-/*
- * We do ticks here to avoid the irq lock ( they take sooo long).
- * The seqlock is great here.  Since we a reader, we don't really care
- * if we are interrupted since we don't take lock that will stall us or
- * any other cpu. Voila, no irq lock is needed.
- *
- */
-
-static u64 do_posix_clock_monotonic_gettime_parts(
-	struct timespec *tp, struct timespec *mo)
-{
-	u64 jiff;
-	unsigned int seq;
-
-	do {
-		seq = read_seqbegin(&xtime_lock);
-		getnstimeofday(tp);
-		*mo = wall_to_monotonic;
-		jiff = jiffies_64;
-
-	} while(read_seqretry(&xtime_lock, seq));
-
-	return jiff;
-}
-
-static int do_posix_clock_monotonic_get(clockid_t clock, struct timespec *tp)
-{
-	struct timespec wall_to_mono;
-
-	do_posix_clock_monotonic_gettime_parts(tp, &wall_to_mono);
-
-	tp->tv_sec += wall_to_mono.tv_sec;
-	tp->tv_nsec += wall_to_mono.tv_nsec;
-
-	if ((tp->tv_nsec - NSEC_PER_SEC) > 0) {
-		tp->tv_nsec -= NSEC_PER_SEC;
-		tp->tv_sec++;
-	}
-	return 0;
-}
-
-int do_posix_clock_monotonic_gettime(struct timespec *tp)
-{
-	return do_posix_clock_monotonic_get(CLOCK_MONOTONIC, tp);
-}
-
+/* Not available / possible... functions */
 int do_posix_clock_nosettime(clockid_t clockid, struct timespec *tp)
 {
 	return -EINVAL;
@@ -1234,7 +931,8 @@ int do_posix_clock_notimer_create(struct
 }
 EXPORT_SYMBOL_GPL(do_posix_clock_notimer_create);
 
-int do_posix_clock_nonanosleep(clockid_t clock, int flags, struct timespec *t)
+int do_posix_clock_nonanosleep(clockid_t clock, int flags, struct timespec *t,
+			       struct timespec *r)
 {
 #ifndef ENOTSUP
 	return -EOPNOTSUPP;	/* aka ENOTSUP in userland for POSIX */
@@ -1293,125 +991,34 @@ sys_clock_getres(clockid_t which_clock, 
 	return error;
 }
 
-static void nanosleep_wake_up(unsigned long __data)
-{
-	struct task_struct *p = (struct task_struct *) __data;
-
-	wake_up_process(p);
-}
-
 /*
- * The standard says that an absolute nanosleep call MUST wake up at
- * the requested time in spite of clock settings.  Here is what we do:
- * For each nanosleep call that needs it (only absolute and not on
- * CLOCK_MONOTONIC* (as it can not be set)) we thread a little structure
- * into the "nanosleep_abs_list".  All we need is the task_struct pointer.
- * When ever the clock is set we just wake up all those tasks.	 The rest
- * is done by the while loop in clock_nanosleep().
- *
- * On locking, clock_was_set() is called from update_wall_clock which
- * holds (or has held for it) a write_lock_irq( xtime_lock) and is
- * called from the timer bh code.  Thus we need the irq save locks.
- *
- * Also, on the call from update_wall_clock, that is done as part of a
- * softirq thing.  We don't want to delay the system that much (possibly
- * long list of timers to fix), so we defer that work to keventd.
+ * nanosleep for monotonic and realtime clocks
  */
-
-static DECLARE_WAIT_QUEUE_HEAD(nanosleep_abs_wqueue);
-static DECLARE_WORK(clock_was_set_work, (void(*)(void*))clock_was_set, NULL);
-
-static DECLARE_MUTEX(clock_was_set_lock);
-
-void clock_was_set(void)
+static int common_nsleep(clockid_t which_clock, int flags,
+			 struct timespec *tsave, struct timespec __user *rmtp)
 {
-	struct k_itimer *timr;
-	struct timespec new_wall_to;
-	LIST_HEAD(cws_list);
-	unsigned long seq;
-
+	int mode = flags & TIMER_ABSTIME ? KTIMER_ABS : KTIMER_REL;
 
-	if (unlikely(in_interrupt())) {
-		schedule_work(&clock_was_set_work);
-		return;
+	switch (which_clock) {
+	case CLOCK_REALTIME:
+		/* Posix madness. Only absolute timers on clock realtime
+		   are affected by clock set. */
+		if (mode == KTIMER_ABS)
+			return ktimer_nanosleep_real(tsave, rmtp, mode);
+	case CLOCK_MONOTONIC:
+		return ktimer_nanosleep_mono(tsave, rmtp, mode);
+	default:
+		break;
 	}
-	wake_up_all(&nanosleep_abs_wqueue);
-
-	/*
-	 * Check if there exist TIMER_ABSTIME timers to correct.
-	 *
-	 * Notes on locking: This code is run in task context with irq
-	 * on.  We CAN be interrupted!  All other usage of the abs list
-	 * lock is under the timer lock which holds the irq lock as
-	 * well.  We REALLY don't want to scan the whole list with the
-	 * interrupt system off, AND we would like a sequence lock on
-	 * this code as well.  Since we assume that the clock will not
-	 * be set often, it seems ok to take and release the irq lock
-	 * for each timer.  In fact add_timer will do this, so this is
-	 * not an issue.  So we know when we are done, we will move the
-	 * whole list to a new location.  Then as we process each entry,
-	 * we will move it to the actual list again.  This way, when our
-	 * copy is empty, we are done.  We are not all that concerned
-	 * about preemption so we will use a semaphore lock to protect
-	 * aginst reentry.  This way we will not stall another
-	 * processor.  It is possible that this may delay some timers
-	 * that should have expired, given the new clock, but even this
-	 * will be minimal as we will always update to the current time,
-	 * even if it was set by a task that is waiting for entry to
-	 * this code.  Timers that expire too early will be caught by
-	 * the expire code and restarted.
-
-	 * Absolute timers that repeat are left in the abs list while
-	 * waiting for the task to pick up the signal.  This means we
-	 * may find timers that are not in the "add_timer" list, but are
-	 * in the abs list.  We do the same thing for these, save
-	 * putting them back in the "add_timer" list.  (Note, these are
-	 * left in the abs list mainly to indicate that they are
-	 * ABSOLUTE timers, a fact that is used by the re-arm code, and
-	 * for which we have no other flag.)
-
-	 */
-
-	down(&clock_was_set_lock);
-	spin_lock_irq(&abs_list.lock);
-	list_splice_init(&abs_list.list, &cws_list);
-	spin_unlock_irq(&abs_list.lock);
-	do {
-		do {
-			seq = read_seqbegin(&xtime_lock);
-			new_wall_to =	wall_to_monotonic;
-		} while (read_seqretry(&xtime_lock, seq));
-
-		spin_lock_irq(&abs_list.lock);
-		if (list_empty(&cws_list)) {
-			spin_unlock_irq(&abs_list.lock);
-			break;
-		}
-		timr = list_entry(cws_list.next, struct k_itimer,
-				  it.real.abs_timer_entry);
-
-		list_del_init(&timr->it.real.abs_timer_entry);
-		if (add_clockset_delta(timr, &new_wall_to) &&
-		    del_timer(&timr->it.real.timer))  /* timer run yet? */
-			add_timer(&timr->it.real.timer);
-		list_add(&timr->it.real.abs_timer_entry, &abs_list.list);
-		spin_unlock_irq(&abs_list.lock);
-	} while (1);
-
-	up(&clock_was_set_lock);
+	return -EINVAL;
 }
 
-long clock_nanosleep_restart(struct restart_block *restart_block);
-
 asmlinkage long
 sys_clock_nanosleep(clockid_t which_clock, int flags,
 		    const struct timespec __user *rqtp,
 		    struct timespec __user *rmtp)
 {
 	struct timespec t;
-	struct restart_block *restart_block =
-	    &(current_thread_info()->restart_block);
-	int ret;
 
 	if (invalid_clockid(which_clock))
 		return -EINVAL;
@@ -1419,135 +1026,8 @@ sys_clock_nanosleep(clockid_t which_cloc
 	if (copy_from_user(&t, rqtp, sizeof (struct timespec)))
 		return -EFAULT;
 
-	if ((unsigned) t.tv_nsec >= NSEC_PER_SEC || t.tv_sec < 0)
+	if (!timespec_valid(&t))
 		return -EINVAL;
 
-	/*
-	 * Do this here as nsleep function does not have the real address.
-	 */
-	restart_block->arg1 = (unsigned long)rmtp;
-
-	ret = CLOCK_DISPATCH(which_clock, nsleep, (which_clock, flags, &t));
-
-	if ((ret == -ERESTART_RESTARTBLOCK) && rmtp &&
-					copy_to_user(rmtp, &t, sizeof (t)))
-		return -EFAULT;
-	return ret;
-}
-
-
-static int common_nsleep(clockid_t which_clock,
-			 int flags, struct timespec *tsave)
-{
-	struct timespec t, dum;
-	struct timer_list new_timer;
-	DECLARE_WAITQUEUE(abs_wqueue, current);
-	u64 rq_time = (u64)0;
-	s64 left;
-	int abs;
-	struct restart_block *restart_block =
-	    &current_thread_info()->restart_block;
-
-	abs_wqueue.flags = 0;
-	init_timer(&new_timer);
-	new_timer.expires = 0;
-	new_timer.data = (unsigned long) current;
-	new_timer.function = nanosleep_wake_up;
-	abs = flags & TIMER_ABSTIME;
-
-	if (restart_block->fn == clock_nanosleep_restart) {
-		/*
-		 * Interrupted by a non-delivered signal, pick up remaining
-		 * time and continue.  Remaining time is in arg2 & 3.
-		 */
-		restart_block->fn = do_no_restart_syscall;
-
-		rq_time = restart_block->arg3;
-		rq_time = (rq_time << 32) + restart_block->arg2;
-		if (!rq_time)
-			return -EINTR;
-		left = rq_time - get_jiffies_64();
-		if (left <= (s64)0)
-			return 0;	/* Already passed */
-	}
-
-	if (abs && (posix_clocks[which_clock].clock_get !=
-			    posix_clocks[CLOCK_MONOTONIC].clock_get))
-		add_wait_queue(&nanosleep_abs_wqueue, &abs_wqueue);
-
-	do {
-		t = *tsave;
-		if (abs || !rq_time) {
-			adjust_abs_time(&posix_clocks[which_clock], &t, abs,
-					&rq_time, &dum);
-		}
-
-		left = rq_time - get_jiffies_64();
-		if (left >= (s64)MAX_JIFFY_OFFSET)
-			left = (s64)MAX_JIFFY_OFFSET;
-		if (left < (s64)0)
-			break;
-
-		new_timer.expires = jiffies + left;
-		__set_current_state(TASK_INTERRUPTIBLE);
-		add_timer(&new_timer);
-
-		schedule();
-
-		del_timer_sync(&new_timer);
-		left = rq_time - get_jiffies_64();
-	} while (left > (s64)0 && !test_thread_flag(TIF_SIGPENDING));
-
-	if (abs_wqueue.task_list.next)
-		finish_wait(&nanosleep_abs_wqueue, &abs_wqueue);
-
-	if (left > (s64)0) {
-
-		/*
-		 * Always restart abs calls from scratch to pick up any
-		 * clock shifting that happened while we are away.
-		 */
-		if (abs)
-			return -ERESTARTNOHAND;
-
-		left *= TICK_NSEC;
-		tsave->tv_sec = div_long_long_rem(left, 
-						  NSEC_PER_SEC, 
-						  &tsave->tv_nsec);
-		/*
-		 * Restart works by saving the time remaing in 
-		 * arg2 & 3 (it is 64-bits of jiffies).  The other
-		 * info we need is the clock_id (saved in arg0). 
-		 * The sys_call interface needs the users 
-		 * timespec return address which _it_ saves in arg1.
-		 * Since we have cast the nanosleep call to a clock_nanosleep
-		 * both can be restarted with the same code.
-		 */
-		restart_block->fn = clock_nanosleep_restart;
-		restart_block->arg0 = which_clock;
-		/*
-		 * Caller sets arg1
-		 */
-		restart_block->arg2 = rq_time & 0xffffffffLL;
-		restart_block->arg3 = rq_time >> 32;
-
-		return -ERESTART_RESTARTBLOCK;
-	}
-
-	return 0;
-}
-/*
- * This will restart clock_nanosleep.
- */
-long
-clock_nanosleep_restart(struct restart_block *restart_block)
-{
-	struct timespec t;
-	int ret = common_nsleep(restart_block->arg0, 0, &t);
-
-	if ((ret == -ERESTART_RESTARTBLOCK) && restart_block->arg1 &&
-	    copy_to_user((struct timespec __user *)(restart_block->arg1), &t,
-			 sizeof (t)))
-		return -EFAULT;
-	return ret;
+	return CLOCK_DISPATCH(which_clock, nsleep, (which_clock, flags, &t, rmtp));
 }
Index: linux-2.6.13.ktimers/kernel/timer.c
===================================================================
--- linux-2.6.13.ktimers.orig/kernel/timer.c
+++ linux-2.6.13.ktimers/kernel/timer.c
@@ -912,6 +912,7 @@ static void run_timer_softirq(struct sof
 {
 	tvec_base_t *base = &__get_cpu_var(tvec_bases);
 
+ 	run_ktimer_queues();
 	if (time_after_eq(jiffies, base->timer_jiffies))
 		__run_timers(base);
 }
@@ -1159,64 +1160,6 @@ asmlinkage long sys_gettid(void)
 	return current->pid;
 }
 
-static long __sched nanosleep_restart(struct restart_block *restart)
-{
-	unsigned long expire = restart->arg0, now = jiffies;
-	struct timespec __user *rmtp = (struct timespec __user *) restart->arg1;
-	long ret;
-
-	/* Did it expire while we handled signals? */
-	if (!time_after(expire, now))
-		return 0;
-
-	current->state = TASK_INTERRUPTIBLE;
-	expire = schedule_timeout(expire - now);
-
-	ret = 0;
-	if (expire) {
-		struct timespec t;
-		jiffies_to_timespec(expire, &t);
-
-		ret = -ERESTART_RESTARTBLOCK;
-		if (rmtp && copy_to_user(rmtp, &t, sizeof(t)))
-			ret = -EFAULT;
-		/* The 'restart' block is already filled in */
-	}
-	return ret;
-}
-
-asmlinkage long sys_nanosleep(struct timespec __user *rqtp, struct timespec __user *rmtp)
-{
-	struct timespec t;
-	unsigned long expire;
-	long ret;
-
-	if (copy_from_user(&t, rqtp, sizeof(t)))
-		return -EFAULT;
-
-	if ((t.tv_nsec >= 1000000000L) || (t.tv_nsec < 0) || (t.tv_sec < 0))
-		return -EINVAL;
-
-	expire = timespec_to_jiffies(&t) + (t.tv_sec || t.tv_nsec);
-	current->state = TASK_INTERRUPTIBLE;
-	expire = schedule_timeout(expire);
-
-	ret = 0;
-	if (expire) {
-		struct restart_block *restart;
-		jiffies_to_timespec(expire, &t);
-		if (rmtp && copy_to_user(rmtp, &t, sizeof(t)))
-			return -EFAULT;
-
-		restart = &current_thread_info()->restart_block;
-		restart->fn = nanosleep_restart;
-		restart->arg0 = jiffies + expire;
-		restart->arg1 = (unsigned long) rmtp;
-		ret = -ERESTART_RESTARTBLOCK;
-	}
-	return ret;
-}
-
 /*
  * sys_sysinfo - fill in sysinfo struct
  */ 
Index: linux-2.6.13.ktimers/include/asm-generic/div64.h
===================================================================
--- linux-2.6.13.ktimers.orig/include/asm-generic/div64.h
+++ linux-2.6.13.ktimers/include/asm-generic/div64.h
@@ -30,6 +30,24 @@
 	__rem;							\
  })
 
+/*
+ * (long)X = ((long long)divs) / (long)div
+ * (long)rem = ((long long)divs) % (long)div
+ *
+ * Warning, this will do an exception if X overflows.
+ */
+#define div_long_long_rem(a,b,c) div_ll_X_l_rem(a,b,c)
+
+/* x = divs / div; *rem = divs % div; */
+static inline unsigned long div_ll_X_l_rem(unsigned long long divs,
+					   unsigned long div,
+					   unsigned long * rem)
+{
+	unsigned long long it = divs;
+	*rem = do_div(it, div);
+	return (unsigned long)it;
+}
+
 #elif BITS_PER_LONG == 32
 
 extern uint32_t __div64_32(uint64_t *dividend, uint32_t divisor);

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [ANNOUNCE] ktimers subsystem
  2005-09-19 16:48 [ANNOUNCE] ktimers subsystem tglx
  2005-09-19 16:48 ` [PATCH] " tglx
@ 2005-09-19 21:47 ` Thomas Gleixner
  2005-09-19 22:03 ` Christoph Lameter
  2005-09-21 19:50 ` Roman Zippel
  3 siblings, 0 replies; 50+ messages in thread
From: Thomas Gleixner @ 2005-09-19 21:47 UTC (permalink / raw)
  To: linux-kernel; +Cc: mingo, akpm, george, johnstul, paulmck

On Mon, 2005-09-19 at 23:04 +0200, tglx@linutronix.de wrote:
> ktimers seperate the "timer API" from the "timeout API". ktimers are 

Sorry for double posting. 

mailer / operator madness

tglx



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [ANNOUNCE] ktimers subsystem
  2005-09-19 16:48 [ANNOUNCE] ktimers subsystem tglx
  2005-09-19 16:48 ` [PATCH] " tglx
  2005-09-19 21:47 ` [ANNOUNCE] " Thomas Gleixner
@ 2005-09-19 22:03 ` Christoph Lameter
  2005-09-19 22:17   ` Thomas Gleixner
  2005-09-20  0:43   ` George Anzinger
  2005-09-21 19:50 ` Roman Zippel
  3 siblings, 2 replies; 50+ messages in thread
From: Christoph Lameter @ 2005-09-19 22:03 UTC (permalink / raw)
  To: tglx; +Cc: linux-kernel, mingo, akpm, george, johnstul, paulmck

On Mon, 19 Sep 2005 tglx@linutronix.de wrote:

> sources. Another astonishing implementation detail of the current time 
> keeping is the fact that we get the monotonic clock (defined by POSIX as 
> a continous clock source which can not be set) by subtracting a variable 
> offset from the real time clock, which can be set by the user and 
> corrected by NTP or other mechanisms.

The benefit or drawback of that implementation depends which time is more 
important: realtime or monotonic time. I think the most used time value is 
realtime and not monotonic time. Having the real time value in xtime 
saves one addition when retrieving realtime. 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [ANNOUNCE] ktimers subsystem
  2005-09-19 22:03 ` Christoph Lameter
@ 2005-09-19 22:17   ` Thomas Gleixner
  2005-09-19 22:24     ` Christoph Lameter
  2005-09-19 22:39     ` Christopher Friesen
  2005-09-20  0:43   ` George Anzinger
  1 sibling, 2 replies; 50+ messages in thread
From: Thomas Gleixner @ 2005-09-19 22:17 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-kernel, mingo, akpm, george, johnstul, paulmck

On Mon, 2005-09-19 at 15:03 -0700, Christoph Lameter wrote:
> On Mon, 19 Sep 2005 tglx@linutronix.de wrote:
> 
> > sources. Another astonishing implementation detail of the current time 
> > keeping is the fact that we get the monotonic clock (defined by POSIX as 
> > a continous clock source which can not be set) by subtracting a variable 
> > offset from the real time clock, which can be set by the user and 
> > corrected by NTP or other mechanisms.
> 
> The benefit or drawback of that implementation depends which time is more 
> important: realtime or monotonic time. I think the most used time value is 
> realtime and not monotonic time. Having the real time value in xtime 
> saves one addition when retrieving realtime. 

Thats only partially true. 

Granted, the most used time in user space is clock_realtime
(gettimeofday() / clock_gettime(CLOCK_REALTIME). 

But do we really want to discuss one add instruction ?

The most used time in kernel space is clock_monotonic. 
Thats partially a result of the rather odd POSIX specs regarding
relative CLOCK_REALTIME timers. 

Also the basic prerequisite for for high resolution timers is a fast and
simple access to clock_monotonic rather than to a backward corrected
clock_realtime representation. 

Kernel code speed in hot pathes must have precedence over code executed
on behalf of userspace if its not completely out of bounds. One add/sub
is definitely not.

We should rather ask glibc people why gettimeofday() / clock_getttime()
is called inside the library code all over the place for non obvious
reasons.

tglx



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [ANNOUNCE] ktimers subsystem
  2005-09-19 22:17   ` Thomas Gleixner
@ 2005-09-19 22:24     ` Christoph Lameter
  2005-09-19 22:44       ` Thomas Gleixner
                         ` (2 more replies)
  2005-09-19 22:39     ` Christopher Friesen
  1 sibling, 3 replies; 50+ messages in thread
From: Christoph Lameter @ 2005-09-19 22:24 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, mingo, akpm, george, johnstul, paulmck

On Tue, 20 Sep 2005, Thomas Gleixner wrote:

> Also the basic prerequisite for for high resolution timers is a fast and
> simple access to clock_monotonic rather than to a backward corrected
> clock_realtime representation. 

Yup that may be a reason to tolerate the add for realtime.

> We should rather ask glibc people why gettimeofday() / clock_getttime()
> is called inside the library code all over the place for non obvious
> reasons.

You can ask lots of application vendors the same question because its all 
over lots of user space code. The fact is that gettimeofday() / 
clock_gettime() efficiency is very critical to the performance of many 
applications on Linux. That is why the addtion of one add instruction may 
better be carefully considered. Many platforms can execute gettimeofday 
without having to enter the kernel.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [ANNOUNCE] ktimers subsystem
  2005-09-19 22:17   ` Thomas Gleixner
  2005-09-19 22:24     ` Christoph Lameter
@ 2005-09-19 22:39     ` Christopher Friesen
  2005-09-19 22:54       ` Thomas Gleixner
  1 sibling, 1 reply; 50+ messages in thread
From: Christopher Friesen @ 2005-09-19 22:39 UTC (permalink / raw)
  To: tglx
  Cc: Christoph Lameter, linux-kernel, mingo, akpm, george, johnstul, paulmck

Thomas Gleixner wrote:

> We should rather ask glibc people why gettimeofday() / clock_getttime()
> is called inside the library code all over the place for non obvious
> reasons.

 From an app point of view, there are any number of reasons to check the 
time frequently.

--debugging
--flight-recorder style logs
--if you've got timers in your application, you may want to check to 
make sure that you didn't get woken up early (the linux behaviour of 
returning unused time in select is not portable)
--the app might be tracking it's own behaviour, measuring how long code 
paths take for its own accounting purposes
--emulators (vmware, UML, etc.) often want to check the time quite 
frequently


Chris

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [ANNOUNCE] ktimers subsystem
  2005-09-19 22:24     ` Christoph Lameter
@ 2005-09-19 22:44       ` Thomas Gleixner
  2005-09-19 22:50         ` john stultz
  2005-09-19 23:04         ` Christoph Lameter
  2005-09-20  7:10       ` Ingo Molnar
  2005-09-21 19:24       ` Pavel Machek
  2 siblings, 2 replies; 50+ messages in thread
From: Thomas Gleixner @ 2005-09-19 22:44 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-kernel, mingo, akpm, george, johnstul, paulmck

On Mon, 2005-09-19 at 15:24 -0700, Christoph Lameter wrote:

> > We should rather ask glibc people why gettimeofday() / clock_getttime()
> > is called inside the library code all over the place for non obvious
> > reasons.
> 
> You can ask lots of application vendors the same question because its all 
> over lots of user space code. The fact is that gettimeofday() / 
> clock_gettime() efficiency is very critical to the performance of many 
> applications on Linux. That is why the addtion of one add instruction may 
> better be carefully considered. 

Hmm. I don't understand the argument line completely. 

1. The kernel has to provide ugly mechanisms because a lot of
applications implementations are doing the Wrong Thing ?

2. All gettimeofday implementations I have looked at do a lot of math
anyway, so its definitely more interesting to look at those oddities
rather than discussing a single add. John Stulz timeofday rework have a
clean solution for this - please do not argue about the div64 in his
original patches which he is reworking at the moment.

> Many platforms can execute gettimeofday 
> without having to enter the kernel.

Which ones ? How is this achieved with respect to all the time adjust,
correction... code ?

tglx



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [ANNOUNCE] ktimers subsystem
  2005-09-19 22:44       ` Thomas Gleixner
@ 2005-09-19 22:50         ` john stultz
  2005-09-19 22:58           ` Thomas Gleixner
  2005-09-19 23:04         ` Christoph Lameter
  1 sibling, 1 reply; 50+ messages in thread
From: john stultz @ 2005-09-19 22:50 UTC (permalink / raw)
  To: tglx; +Cc: Christoph Lameter, linux-kernel, mingo, akpm, george, paulmck

On Tue, 2005-09-20 at 00:44 +0200, Thomas Gleixner wrote:
> On Mon, 2005-09-19 at 15:24 -0700, Christoph Lameter wrote:
> 
> > > We should rather ask glibc people why gettimeofday() / clock_getttime()
> > > is called inside the library code all over the place for non obvious
> > > reasons.
> > 
> > You can ask lots of application vendors the same question because its all 
> > over lots of user space code. The fact is that gettimeofday() / 
> > clock_gettime() efficiency is very critical to the performance of many 
> > applications on Linux. That is why the addtion of one add instruction may 
> > better be carefully considered. 
> 
> Hmm. I don't understand the argument line completely. 
> 
> 1. The kernel has to provide ugly mechanisms because a lot of
> applications implementations are doing the Wrong Thing ?
> 
> 2. All gettimeofday implementations I have looked at do a lot of math
> anyway, so its definitely more interesting to look at those oddities
> rather than discussing a single add. John Stulz timeofday rework have a
> clean solution for this - please do not argue about the div64 in his
> original patches which he is reworking at the moment.

The simplest solution is to keep wall-time maintained in a timespec as
well as the nsec_t based system_time/wall_time_offset combo. This avoids
the extra add in the hotpath, and only costs an extra add at interrupt
time.

I'll have an updated patch that includes some of Roman's suggestions
from earlier soon.

> > Many platforms can execute gettimeofday 
> > without having to enter the kernel.
> 
> Which ones ? How is this achieved with respect to all the time adjust,
> correction... code ?

Many arches have userspace gtod implementations (x86-64, ppc64, and ia64
as well). Although my timeofday code allows for this as well (I had it
working for x86-64 awhile back). 

thanks
-john



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [ANNOUNCE] ktimers subsystem
  2005-09-19 22:39     ` Christopher Friesen
@ 2005-09-19 22:54       ` Thomas Gleixner
  2005-09-20  4:57         ` Christopher Friesen
  0 siblings, 1 reply; 50+ messages in thread
From: Thomas Gleixner @ 2005-09-19 22:54 UTC (permalink / raw)
  To: Christopher Friesen
  Cc: Christoph Lameter, linux-kernel, mingo, akpm, george, johnstul, paulmck

On Mon, 2005-09-19 at 16:39 -0600, Christopher Friesen wrote:
> Thomas Gleixner wrote:
> 
> > We should rather ask glibc people why gettimeofday() / clock_getttime()
> > is called inside the library code all over the place for non obvious
> > reasons.
> 
>  From an app point of view, there are any number of reasons to check the 
> time frequently.
> 
> --debugging

Non standard case.

> --flight-recorder style logs

If you want to implement such stuff efficiently you rely on rdtscll() on
x86 or other monotonic easy accessible time souces and not on a
permanent call to gettimeofday.

> --if you've got timers in your application, you may want to check to 
> make sure that you didn't get woken up early (the linux behaviour of 
> returning unused time in select is not portable)

#ifdef is portable


Please beware me of red herrings. If application developers code with
respect to random OS worst case behaviour then they should not complain
that OS N is having an additional add instruction in one of the pathes.

tglx



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [ANNOUNCE] ktimers subsystem
  2005-09-19 22:50         ` john stultz
@ 2005-09-19 22:58           ` Thomas Gleixner
  0 siblings, 0 replies; 50+ messages in thread
From: Thomas Gleixner @ 2005-09-19 22:58 UTC (permalink / raw)
  To: john stultz; +Cc: Christoph Lameter, linux-kernel, mingo, akpm, george, paulmck

On Mon, 2005-09-19 at 15:50 -0700, john stultz wrote:
> > 2. All gettimeofday implementations I have looked at do a lot of math
> > anyway, so its definitely more interesting to look at those oddities
> > rather than discussing a single add. John Stulz timeofday rework have a
> > clean solution for this - please do not argue about the div64 in his
> > original patches which he is reworking at the moment.
> 
> The simplest solution is to keep wall-time maintained in a timespec as
> well as the nsec_t based system_time/wall_time_offset combo. This avoids
> the extra add in the hotpath, and only costs an extra add at interrupt
> time.

<NITPICKING>
The crucial question is what's the hot path ? Depending on the
application type I want to avoid the add in the interrupt code. :)
</NITPICKING>


> Many arches have userspace gtod implementations (x86-64, ppc64, and ia64
> as well). Although my timeofday code allows for this as well (I had it
> working for x86-64 awhile back). 

Was not aware of that. Thanks for clarification.

tglx



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [ANNOUNCE] ktimers subsystem
  2005-09-19 22:44       ` Thomas Gleixner
  2005-09-19 22:50         ` john stultz
@ 2005-09-19 23:04         ` Christoph Lameter
  2005-09-19 23:12           ` Thomas Gleixner
  1 sibling, 1 reply; 50+ messages in thread
From: Christoph Lameter @ 2005-09-19 23:04 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, mingo, akpm, george, johnstul, paulmck

On Tue, 20 Sep 2005, Thomas Gleixner wrote:

> Hmm. I don't understand the argument line completely. 
> 
> 1. The kernel has to provide ugly mechanisms because a lot of
> applications implementations are doing the Wrong Thing ?

Lets skip the "wrong thing"... Or are you saying that glibc and all the 
apps are all wrong? 

Applications call gettimeofday for a variety of reasons. One is because it 
is widely available over different platformsn and application want to 
schedule things, need timestamps etc etc.

> > Many platforms can execute gettimeofday 
> > without having to enter the kernel.
> 
> Which ones ? How is this achieved with respect to all the time adjust,
> correction... code ?

IA64 f.e. has a special instruction that allows access to kernel user 
space without having to do a context switch.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [ANNOUNCE] ktimers subsystem
  2005-09-19 23:04         ` Christoph Lameter
@ 2005-09-19 23:12           ` Thomas Gleixner
  2005-09-20  7:14             ` Ingo Molnar
  0 siblings, 1 reply; 50+ messages in thread
From: Thomas Gleixner @ 2005-09-19 23:12 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-kernel, mingo, akpm, george, johnstul, paulmck

On Mon, 2005-09-19 at 16:04 -0700, Christoph Lameter wrote:
> On Tue, 20 Sep 2005, Thomas Gleixner wrote:
> 
> > Hmm. I don't understand the argument line completely. 
> > 
> > 1. The kernel has to provide ugly mechanisms because a lot of
> > applications implementations are doing the Wrong Thing ?
> 
> Lets skip the "wrong thing"... Or are you saying that glibc and all the 
> apps are all wrong? 
> 
> Applications call gettimeofday for a variety of reasons. One is because it 
> is widely available over different platformsn and application want to 
> schedule things, need timestamps etc etc.

Accepted. But I still doubt that the number of calls to gettimeofday is
in anyway justified. The question I'm asking if it is really worth a
long and epic discussion about a single add instruction ?


> > > Many platforms can execute gettimeofday 
> > > without having to enter the kernel.
> > 
> > Which ones ? How is this achieved with respect to all the time adjust,
> > correction... code ?
> 
> IA64 f.e. has a special instruction that allows access to kernel user 
> space without having to do a context switch.

Ok, was not aware of that and John kindly clarified this already. 

tglx



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [ANNOUNCE] ktimers subsystem
  2005-09-19 22:03 ` Christoph Lameter
  2005-09-19 22:17   ` Thomas Gleixner
@ 2005-09-20  0:43   ` George Anzinger
  1 sibling, 0 replies; 50+ messages in thread
From: George Anzinger @ 2005-09-20  0:43 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: tglx, linux-kernel, mingo, akpm, johnstul, paulmck

Christoph Lameter wrote:
> On Mon, 19 Sep 2005 tglx@linutronix.de wrote:
> 
> 
>>sources. Another astonishing implementation detail of the current time 
>>keeping is the fact that we get the monotonic clock (defined by POSIX as 
>>a continuous clock source which can not be set) by subtracting a variable 
>>offset from the real time clock, which can be set by the user and 
>>corrected by NTP or other mechanisms.

Why is this astonishing?  What it really indicates is the nature of 
Linux where in we have just (with 2.6) introduced the concept of 
monotonic time.  As such, and with few users, it made a LOT of sense to 
not upset too much code by making it the primary clock.  In the end, the 
difference between the two clocks is a constant offset and it is only an 
add in one path or the other.

An argument from the other side is that ntp works with CLOCK_REALTIME 
and so that is where and what it corrects.

Agreed, this can be turned around, however, one needs folks like John 
Stultz who take the time to understand ntp as well as all the other 
clock issues to turn things like this around.  Still, we should consider 
carefully IF we want to turn it around.

A far more astonishing thing, IMHO, is the cascade in the timers code...
> 
> 
> The benefit or drawback of that implementation depends which time is more 
> important: realtime or monotonic time. I think the most used time value is 
> realtime and not monotonic time. Having the real time value in xtime 
> saves one addition when retrieving realtime. 
> -
Both sides of this argument have merit.  Much as we would like to, we 
can not change user usage.  AND, in the end, they are, and will continue 
to make far more calls to get the time than the kernel does.  So, in raw 
cpu power (or time) consumed, the user get time will win over kernel 
usage.  Also, the time to do a gettimeofday is easily computed with the 
most simple program...
-- 
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [ANNOUNCE] ktimers subsystem
  2005-09-19 22:54       ` Thomas Gleixner
@ 2005-09-20  4:57         ` Christopher Friesen
  2005-09-20  5:11           ` Thomas Gleixner
  0 siblings, 1 reply; 50+ messages in thread
From: Christopher Friesen @ 2005-09-20  4:57 UTC (permalink / raw)
  To: tglx
  Cc: Christoph Lameter, linux-kernel, mingo, akpm, george, johnstul, paulmck

Thomas Gleixner wrote:
> On Mon, 2005-09-19 at 16:39 -0600, Christopher Friesen wrote:
> 
>>Thomas Gleixner wrote:

>>>We should rather ask glibc people why gettimeofday() / clock_getttime()
>>>is called inside the library code all over the place for non obvious
>>>reasons.

>>--flight-recorder style logs

> If you want to implement such stuff efficiently you rely on rdtscll() on
> x86 or other monotonic easy accessible time souces and not on a
> permanent call to gettimeofday.

Not portable across architectures, and doesn't work across all smp/numa 
environments.  Also not easy to compare with other nodes on the network, 
whereas with ntp-synch'd nodes you can use gettimeofday() for quite 
accurate correlations.

> Please beware me of red herrings. If application developers code with
> respect to random OS worst case behaviour then they should not complain
> that OS N is having an additional add instruction in one of the pathes.

Actually I'm not complaining about additional add instructions.  I was 
just suggesting some reasons why apps might reasonably want to know the 
time frequently.

Chris


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [ANNOUNCE] ktimers subsystem
  2005-09-20  4:57         ` Christopher Friesen
@ 2005-09-20  5:11           ` Thomas Gleixner
  0 siblings, 0 replies; 50+ messages in thread
From: Thomas Gleixner @ 2005-09-20  5:11 UTC (permalink / raw)
  To: Christopher Friesen
  Cc: Christoph Lameter, linux-kernel, mingo, akpm, george, johnstul, paulmck

On Mon, 2005-09-19 at 22:57 -0600, Christopher Friesen wrote:
> >>--flight-recorder style logs
> 
> > If you want to implement such stuff efficiently you rely on rdtscll() on
> > x86 or other monotonic easy accessible time souces and not on a
> > permanent call to gettimeofday.
> 
> Not portable across architectures, and doesn't work across all smp/numa 
> environments.  Also not easy to compare with other nodes on the network, 
> whereas with ntp-synch'd nodes you can use gettimeofday() for quite 
> accurate correlations.

Sorry was a stupid argument. Withdrawn herby 

> > Please beware me of red herrings. If application developers code with
> > respect to random OS worst case behaviour then they should not complain
> > that OS N is having an additional add instruction in one of the pathes.
> 
> Actually I'm not complaining about additional add instructions.  I was 
> just suggesting some reasons why apps might reasonably want to know the 
> time frequently.

ok

tglx



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [ANNOUNCE] ktimers subsystem
  2005-09-19 22:24     ` Christoph Lameter
  2005-09-19 22:44       ` Thomas Gleixner
@ 2005-09-20  7:10       ` Ingo Molnar
  2005-09-21 19:24       ` Pavel Machek
  2 siblings, 0 replies; 50+ messages in thread
From: Ingo Molnar @ 2005-09-20  7:10 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Thomas Gleixner, linux-kernel, akpm, george, johnstul, paulmck


* Christoph Lameter <clameter@engr.sgi.com> wrote:

> > We should rather ask glibc people why gettimeofday() / clock_getttime()
> > is called inside the library code all over the place for non obvious
> > reasons.
> 
> You can ask lots of application vendors the same question because its 
> all over lots of user space code. The fact is that gettimeofday() / 
> clock_gettime() efficiency is very critical to the performance of many 
> applications on Linux. That is why the addtion of one add instruction 
> may better be carefully considered. Many platforms can execute 
> gettimeofday without having to enter the kernel.

i think this line of argument got into a bit of a wrong direction: do we 
seriously consider a single 'add' as an argument to _not_ go to a much 
cleaner implementation? The answer is very simple: we dont. In the core 
kernel we frequently skip other, much worthier micro-optimizations in 
favor of cleanliness. The time subsystem has been limping along for 
many, many years, and to bring new life into it we need John's and 
Thomas's stuff. Simple as that. I'd give up much more than just a single 
cycle add overhead for that ...

it's probably not even worth keeping the timespec 'cached' in parallel 
to nsec_t - but in any case, speedups like that should be considered 
totally separately - cleanliness, the main problem of the whole time 
subsystem, comes first.  _Once_ cleanliness has been achieved, we can 
consider micro-optimizations anew, and judge them by how much they bring 
and how they affect cleanliness.

[ Even when not considering cleanliness at all, the best opportunities 
  for optimizations are elsewhere. E.g. we could speed up 
  sys_gettimeofday() much more by skipping a number of hardware bug 
  workarounds that affect the quality of e.g. the TSC, and other timer 
  hardware details that are simpler on modern hardware. So if someone is 
  after sys_gettimeofday() performance, dont look for a single add (or 
  even a single division), go for the bigger picture first.  E.g.  the 
  vsyscall people went for the bigger picture on modern platforms and 
  sped sys_time up by doing it in userspace most of the time and thus 
  skipping hundreds of cycles of syscall entry overhead - not a cycle 
  like an add is. ]

	Ingo

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [ANNOUNCE] ktimers subsystem
  2005-09-19 23:12           ` Thomas Gleixner
@ 2005-09-20  7:14             ` Ingo Molnar
  0 siblings, 0 replies; 50+ messages in thread
From: Ingo Molnar @ 2005-09-20  7:14 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Christoph Lameter, linux-kernel, akpm, george, johnstul, paulmck


* Thomas Gleixner <tglx@linutronix.de> wrote:

> > Applications call gettimeofday for a variety of reasons. One is because it 
> > is widely available over different platformsn and application want to 
> > schedule things, need timestamps etc etc.
> 
> Accepted. But I still doubt that the number of calls to gettimeofday 
> is in anyway justified. The question I'm asking if it is really worth 
> a long and epic discussion about a single add instruction ?

it is absolutely and emphatically not worth it.

even in a hypothetical scenario [which this patchset is _not_ analogous 
to] where a new, clean subsystem introduces significant overhead, but 
the old subsystem is unclean, we frequently go with the new one - 
because it's so much easier to speed up something that is clean, robust 
and well-designed, than something that has been cobbled together!

	Ingo

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [ANNOUNCE] ktimers subsystem
  2005-09-19 22:24     ` Christoph Lameter
  2005-09-19 22:44       ` Thomas Gleixner
  2005-09-20  7:10       ` Ingo Molnar
@ 2005-09-21 19:24       ` Pavel Machek
  2 siblings, 0 replies; 50+ messages in thread
From: Pavel Machek @ 2005-09-21 19:24 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Thomas Gleixner, linux-kernel, mingo, akpm, george, johnstul, paulmck

Hi!

> > Also the basic prerequisite for for high resolution timers is a fast and
> > simple access to clock_monotonic rather than to a backward corrected
> > clock_realtime representation. 
> 
> Yup that may be a reason to tolerate the add for realtime.
> 
> > We should rather ask glibc people why gettimeofday() / clock_getttime()
> > is called inside the library code all over the place for non obvious
> > reasons.
> 
> You can ask lots of application vendors the same question because its all 
> over lots of user space code. The fact is that gettimeofday() / 
> clock_gettime() efficiency is very critical to the performance of many 
> applications on Linux. That is why the addtion of one add instruction may 
> better be carefully considered. Many platforms can execute gettimeofday 
> without having to enter the kernel.

Eh? One addition is going to be lost in noise compared to syscall overhead.
(For vsyscall, you may be closer to truth, but I doubt it. You could still gain
more than one addition by using some strange calling convention).


-- 
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms         


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [ANNOUNCE] ktimers subsystem
  2005-09-19 16:48 [ANNOUNCE] ktimers subsystem tglx
                   ` (2 preceding siblings ...)
  2005-09-19 22:03 ` Christoph Lameter
@ 2005-09-21 19:50 ` Roman Zippel
  2005-09-21 22:41   ` Thomas Gleixner
  3 siblings, 1 reply; 50+ messages in thread
From: Roman Zippel @ 2005-09-21 19:50 UTC (permalink / raw)
  To: tglx; +Cc: linux-kernel, mingo, akpm, george, johnstul, paulmck

Hi,

On Mon, 19 Sep 2005 tglx@linutronix.de wrote:

First a quick question:

> This revealed a reasonable explanation for this behaviour. Both 
> networking and disk I/O arm a lot of timeout timers (the maximum number 
> of armed timers during the tests observed was ~400000).

This triggers the obvious question: where are these timers coming from? 
You don't think that having that much timers in first place is little 
insane (especially if these are kernel timers)?

Ok, now to the long part:

> The conclusions of my recent work on Linux time(rs) related problems and 
> the analysis of related patches are:
> 
> 1. The HZ/jiffy based usage of time in the kernel code has to be
>    converted to human time units.
> 
> 2. A clean seperation of all related APIs and subsystems is necessary 
>    even if they have interdependencies and shared functionality

How you get to these conclusions is still rather unclear, I don't even 
really know what the problem is from just reading the pretext. You talk 
about previous efforts, you talk about users abusing the timer system, but 
what is actually the problem of the timer system itself?

Later it becomes clear that you want high resolution timers, what doesn't 
become clear is why it's such an essential feature that everyone has to 
pay the price for it (this does not only include changes in runtime 
behaviour, but also the proposed API changes).

What is seriously missing here is the big picture. First off how does it 
currently look like? You rather shortly mention scheduler ticks and your 
analysis basically says only that it's "a bunch of ugliness". There is no 
mention why they are needed in first place and there is no real 
explanation why they are such a big problem for high resolution timers.

Second, an API cleanup is all nice, but the more interesting part is still 
what is behind this API and this part you pretty much leave in the dark. 
Basically how does the new big picture look like and how do high 
resolution timer fit into it? (You are more busy defending the 64bit math, 
than actually explaining why and where it's needed in the first place.)

Sorry, if this sounds harsh, but your announcement is more a random 
collection of information about timers than an explanation of why ktimers 
are desirable. I'm not against high resolution timers per se, but this 
doesn't explain why it has to be high resolution all the way. It also 
doesn't explain how it will interact with Johns work, e.g. I'm only scared 
if I see this in the ktimer_hres patch:

+extern int arch_hrtimer_init(int highres);
+extern int arch_hrtimer_reprogram(nsec_t expires);
+extern void arch_hrtimer_trigger_ints(void);


Ok, so what's missing? From a basic design overview I would expect some 
information about types of time within the kernel and their relationship. 
We basically have three types:
- scheduler time
- wallclock time
- process time

The scheduler time (aka jiffies) is not just used for timeouts, it's the 
basic time unit to schedule cpu time. It's major requirement is simplicity 
- a 32bit value can always be read without locking and calculations based 
on it are simple.
I exclude posix clocks here as it can be used with both wallclock and 
process time. The main difference between them is that the latter is user 
programmable. Here we get to the core problem of timer ticks: the current 
timer system is designed around a simple timer model, which is not 
reprogrammable, so the timer resolution available to user space is 
limited to the timer tick resolution.

Johns patches now introduce two major new concepts as a generic mechanism 
(and not just hidden somewhere in arch code): 1) a timer source 
abstraction, 2) making wallclock updates independent of the timer tick. 
BTW here you completely miss the "main point of criticizm", the 64bit math 
is a problem, but the main problem is that he completely changed the NTP 
kernel model. I don't deny that the NTP code could use some updates 
itself, but that's a completely separate problem. Regarding the timer 
system it's only important how to synchronize NTP time with the kernel 
wallclock time, as soon as you get that right, the whole 64bit math 
problematic becomes irrelevant.

The existence of the timer source abstraction is a major requirement for 
further improvements (in this regard it's already suspicious, that you put 
major changes before Johns patch). The next major change would be to add 
the possibility to reprogram a timer source, the scheduler can use this to 
skip timer ticks and e.g. itimer can offer higher resolution timers. The 
main point here is before we get to any API decisions, we need to develop 
a model how a single time source can drive multiple users. Your split 
between user timers and kernel timeouts leaves this question completely 
open.

The next step (_after_ reprogrammable timer sources) would be increasing 
the timer resolution. Here I'm not at all convinced, that we need to 
change everything to nanosecond resolution, we can easily make this a 
config option which either ties process time resolution to scheduler time 
or makes it independent. The first would make process time a 32bit ms 
value (basically current behaviour), the latter can make it to a 64bit ns 
value. Anyone trying to introduce nsec_t in common code really needs to 
come up with some better arguments why calculations in ns are necessary 
unconditionally, instead of making the resolution configurable.

In summary please provide a larger picture for your changes, it's 
especially important to desribe the relationship between the various 
systems. The API definition is only the last step and is derived from 
these relationships.

bye, Roman

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [ANNOUNCE] ktimers subsystem
  2005-09-21 19:50 ` Roman Zippel
@ 2005-09-21 22:41   ` Thomas Gleixner
  2005-09-22 12:59     ` Ingo Molnar
  2005-09-22 23:09     ` Roman Zippel
  0 siblings, 2 replies; 50+ messages in thread
From: Thomas Gleixner @ 2005-09-21 22:41 UTC (permalink / raw)
  To: Roman Zippel; +Cc: linux-kernel, mingo, akpm, george, johnstul, paulmck

Hi,

On Wed, 2005-09-21 at 21:50 +0200, Roman Zippel wrote:
> First a quick question:
> 
> > This revealed a reasonable explanation for this behaviour. Both 
> > networking and disk I/O arm a lot of timeout timers (the maximum number 
> > of armed timers during the tests observed was ~400000).
> 
> This triggers the obvious question: where are these timers coming from? 
> You don't think that having that much timers in first place is little 
> insane (especially if these are kernel timers)?

Quick answer: Networking and disk I/O. Insane load on a 4 way SMP
machine. Check yourself. :)

FYI. I'm not the only one who observed this.

> Ok, now to the long part:
> 
> > The conclusions of my recent work on Linux time(rs) related problems and 
> > the analysis of related patches are:
> > 
> > 1. The HZ/jiffy based usage of time in the kernel code has to be
> >    converted to human time units.
> > 
> > 2. A clean seperation of all related APIs and subsystems is necessary 
> >    even if they have interdependencies and shared functionality
> 
> How you get to these conclusions is still rather unclear, I don't even 
> really know what the problem is from just reading the pretext. You talk 
> about previous efforts, you talk about users abusing the timer system, but 
> what is actually the problem of the timer system itself?

Actually there are several problems in the timer system:

1. HZ/jiffie boundness in the current implementation is problematic vs.
the conversion of human time units, which is the way programmers think
and the way time values are defined in data sheets, to HZ/jiffies. There
are dozens of places which get this completely wrong - either due to the
missunderstanding or dumbness of programmers or due to the fact that HZ
is not longer =100 and nobody cared to change that affected piece of
code. Using human time units in the API solves this in a clear way.
There is no problem to have conversion functions which convert those to
jiffies/HZ or whatever internal representation during compile or if
necessary run time, but relying on non constant units of time in an API
is utterly wrong and error prone.


2. The all in one solution for "timeouts" and "timers".

I think I made this point rather clear, but nevertheless:

Timeouts are coarse grained functions to catch error conditions. The
vast majority of those never expire. The implementation emphasis is fast
insertion/removal.

Timers are (possibly/desirably) fine grained functions to control
program flow in a time ordered way. The vast majority of those  expire.
The implementation emphasis is time accuracy. Don't confuse accuracy
with high resolution! 

Intermingling those types (1.) /(2.) leads to an obvious conflict in
interests and to restrictions of extensibility / flexibility especially
when (3.) applies.

3. There _are_ provable abuses of the current timer ("timeout") system.
Functions which run longer than the next timer tick have to be
considered insane. I just pointed this out for reference and I sent a
note to the subsystem maintainer(s) to make them aware of that before
writing this. If you read the writeup without being biased in front you
might notice that there are explicit examples of those insane abuses of
timers. If you consider those as legitimate we can stop the discussion
right now. I also pointed out that the HZ=1000->HZ=250 change just hides
this rather than solving the underlying problem.

> Later it becomes clear that you want high resolution timers, what doesn't 
> become clear is why it's such an essential feature that everyone has to 
> pay the price for it (this does not only include changes in runtime 
> behaviour, but also the proposed API changes).

1. Yes, the final goal are high resolution timers. I never denied that.

2. The integration of high resolution timers is a long standing issue,
which was rejected mostly due to the intrusiveness of the proposed
patches.

3. ktimers itself are designed to be aware of a possible extension for
HRT, but they provide a benefit without high resolution timers and
nobody has to pay a price for them when they are not
configured/implemented. The removal of the abstime list reprogramming in
posix-timers.c is definitely a worthwhile cleanup and totaly unrelated
to HRT.

> What is seriously missing here is the big picture. First off how does it 
> currently look like? Y

Point taken. The writeup was intended for people who are familiar with
the current situation.

> You rather shortly mention scheduler ticks and your 
> analysis basically says only that it's "a bunch of ugliness". 

Wrong. I did not say there is a "bunch of ugliness" in general. Even not
between the lines. Period. 

The only places where "ugl*" is used in the whole writeup are:
"ticks introduce a bunch of ugliness especially when it comes to time
synchronizing with high resolution time 
sources."

"The combination of both patches provides the grounds and leads the way
to the cleanup of the timeout API and the implementation of
dyntick/tickless support without introducing additional ugliness."

I'm sure that both assertions are true, especially in the context they
were made.

Roman, please keep this at a serious level. I dont have the intention to
participate on another "UFT-8, Reiser4, sizeof(*p)" lkml debate club.

> There is no mention why they are needed in first place and there is no real 
> explanation why they are such a big problem for hich resolution timers.

Maybe above does shed more light on it ?

> Second, an API cleanup is all nice, but the more interesting part is still 
> what is behind this API and this part you pretty much leave in the dark. 

Whats in the dark ?

> Basically how does the new big picture look like and how do high 
> resolution timer fit into it? (You are more busy defending the 64bit math, 
> than actually explaining why and where it's needed in the first place.)

I also explained why I wanted to seperate "timeout" and "timers" APIs. I
explained why I choose rbtree and I explained why I used 64bit math and
at least why 64 bit math is not that evil as commonly seen. 

There is also code and if you need more details, call my sales
departent....

> Sorry, if this sounds harsh, but your announcement is more a random 
> collection of information about timers than an explanation of why ktimers 
> are desirable.

First of all, this is volunteer work and I _did_ take the time to write
up a detailed explanation at all rather than throwing a random patch
with a 10 line bla into the arena. Do you expect that I write a PhD
thesis on that ?

Second this writeup was not targeted for John User. ....

>  I'm not against high resolution timers per se, but this 
> doesn't explain why it has to be high resolution all the way. 

Where is high resolution all the way. Care to read the patch ? It's high
resolution aware and it does take out odd areas of code by design.

> It also doesn't explain how it will interact with Johns work,

"The following add on patches are not provided for ad hoc inclusion as
they contain third party patches. The reason for providing this series
is to demonstrate the future use of ktimers and the simple extensibility
for the impelemtation of high resolution timers. Especially John Stultz
timeofday patch is a complete seperate issue
and just used due to the ability to provide high resolution timers in a
simple and non intrusive way."

Isn't this clear enough ?

>  e.g. I'm only scared 
> if I see this in the ktimer_hres patch:
> 
> +extern int arch_hrtimer_init(int highres);
> +extern int arch_hrtimer_reprogram(nsec_t expires);
> +extern void arch_hrtimer_trigger_ints(void);

Whats scary ? This is proof of concept. See above !

Have you a simpler solution and did you care to read the comment in
arch/i386/kernel/hrtimer.c ?

> Ok, so what's missing? From a basic design overview I would expect some 
> information about types of time within the kernel and their relationship. 
> We basically have three types:
> - scheduler time
> - wallclock time
> - process time

What about monotonic time ?

> The scheduler time (aka jiffies) is not just used for timeouts, it's the 
> basic time unit to schedule cpu time. It's major requirement is simplicity 
> - a 32bit value can always be read without locking and calculations based 
> on it are simple.
> I exclude posix clocks here as it can be used with both wallclock and 
> process time. 

What about monotonic time ?

> The main difference between them is that the latter is user 
> programmable. 

wallclock is reprogrammable too and it introduces a bunch of horrible
functions in posix-timers.c. grep for abs_list. I explained why its
horrible already.

> Here we get to the core problem of timer ticks: the current 
> timer system is designed around a simple timer model, which is not 
> reprogrammable, so the timer resolution available to user space is 
> limited to the timer tick resolution.
> 
> Johns patches now introduce two major new concepts as a generic mechanism 
> (and not just hidden somewhere in arch code): 1) a timer source 
> abstraction, 2) making wallclock updates independent of the timer tick. 

1. I'm well aware of the addressed problems in Johns patches.

2.I dont see any hidden arch code in the ktimers patch. Do you ?

fs/exec.c                    |    9
 fs/proc/array.c              |    6
 include/asm-generic/div64.h  |   18
 include/linux/ktimer.h       |  142 +++++++
 include/linux/posix-timers.h |   87 ++--
 include/linux/sched.h        |    4
 include/linux/time.h         |   65 ++-
 include/linux/timer.h        |    2
 init/main.c                  |    1
 kernel/Makefile              |    3
 kernel/exit.c                |    2
 kernel/fork.c                |    5
 kernel/itimer.c              |   83 +---
 kernel/ktimers.c             |  826 ++++++++++++++++++++++++++++++++++++++++++
 kernel/posix-cpu-timers.c    |   23 -
 kernel/posix-timers.c        |  832 ++++++++-----------------------------------
 kernel/timer.c               |   59 ---


May I politely remind you, that I provided the complete patch series
just to show the future use and clearly stated that it is just a proof
of concept implemetation on top of ktimers.

> BTW here you completely miss the "main point of criticizm", the 64bit math 
> is a problem, but the main problem is that he completely changed the NTP 
> kernel model. I don't deny that the NTP code could use some updates 
> itself, but that's a completely separate problem. Regarding the timer 
> system it's only important how to synchronize NTP time with the kernel 
> wallclock time, as soon as you get that right, the whole 64bit math 
> problematic becomes irrelevant.

Roman, what are you trying to achieve ? Finding a playground for
rabulistic discussions ?

> The existence of the timer source abstraction is a major requirement for 
> further improvements (in this regard it's already suspicious, that you put 
> major changes before Johns patch). 

Whats suspicious on that ? Seperating the "timeout" API and the "timer"
API has nothing to do with Johns patches.

> The next major change would be to add the possibility to reprogram a 
> timer source, the scheduler can use this to 
> skip timer ticks and e.g. itimer can offer higher resolution timers. The 
> main point here is before we get to any API decisions, we need to develop 
> a model how a single time source can drive multiple users. Your split 
> between user timers and kernel timeouts leaves this question completely 
> open.

Did I claim, that ktimers solve this problem?

No. 

The patches are related but address different aspects of the overall
problem without conflicting with each other. Quite the contrary: they
complement each other.

I clearly stated that the reprogramming of timer events, which are not
addressed by ktimers and I never claimed ktimers does, is a completely
different problem.

> The next step (_after_ reprogrammable timer sources) would be increasing 
> the timer resolution.

Please let me correct you here. Adding reprogrammable timer events
before you have the core timer system ready is wrong by design.
Providing reprogrammable timer events is simple, but when the timer core
system which depends on timer events is not ready for that you implement
useless things and implement likely stuff which is not matching the
requirements of the generic system.

>  Here I'm not at all convinced, that we need to 
> change everything to nanosecond resolution, we can easily make this a 
> config option which either ties process time resolution to scheduler time 
> or makes it independent. The first would make process time a 32bit ms 
> value (basically current behaviour), the latter can make it to a 64bit ns 
> value. Anyone trying to introduce nsec_t in common code really needs to 
> come up with some better arguments why calculations in ns are necessary 
> unconditionally, instead of making the resolution configurable.

Please provide a whatever time unit based and configurable / flexible
solution for that instead of making unprovable claims !

> In summary please provide a larger picture for your changes, it's 
> especially important to desribe the relationship between the various 
> systems. The API definition is only the last step and is derived from 
> these relationships.

Sorry. When you are not able to get the larger picture in your mind, I
doubt that you are the right person to discuss this topic. This kind of
argument is not working with me, especially not when repeated all over
the place. 

Please provide an alternative solution (i.e. code to review) yourself.

tglx




^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [ANNOUNCE] ktimers subsystem
  2005-09-21 22:41   ` Thomas Gleixner
@ 2005-09-22 12:59     ` Ingo Molnar
  2005-09-22 23:09     ` Roman Zippel
  1 sibling, 0 replies; 50+ messages in thread
From: Ingo Molnar @ 2005-09-22 12:59 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Roman Zippel, linux-kernel, akpm, george, johnstul, paulmck


* Thomas Gleixner <tglx@linutronix.de> wrote:

> > > This revealed a reasonable explanation for this behaviour. Both 
> > > networking and disk I/O arm a lot of timeout timers (the maximum number 
> > > of armed timers during the tests observed was ~400000).
> > 
> > This triggers the obvious question: where are these timers coming from? 
> > You don't think that having that much timers in first place is little 
> > insane (especially if these are kernel timers)?
> 
> Quick answer: Networking and disk I/O. Insane load on a 4 way SMP 
> machine. Check yourself. :)

a busy network server can easily have millions of timers pending. I once 
had to increase a server's 16 million tw timer sysctl limit ...

	Ingo

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [ANNOUNCE] ktimers subsystem
  2005-09-21 22:41   ` Thomas Gleixner
  2005-09-22 12:59     ` Ingo Molnar
@ 2005-09-22 23:09     ` Roman Zippel
  2005-09-22 23:31       ` Christopher Friesen
                         ` (3 more replies)
  1 sibling, 4 replies; 50+ messages in thread
From: Roman Zippel @ 2005-09-22 23:09 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, mingo, akpm, george, johnstul, paulmck

Hi,

On Thu, 22 Sep 2005, Thomas Gleixner wrote:

> > > This revealed a reasonable explanation for this behaviour. Both 
> > > networking and disk I/O arm a lot of timeout timers (the maximum number 
> > > of armed timers during the tests observed was ~400000).
> > 
> > This triggers the obvious question: where are these timers coming from? 
> > You don't think that having that much timers in first place is little 
> > insane (especially if these are kernel timers)?
> 
> Quick answer: Networking and disk I/O. Insane load on a 4 way SMP
> machine. Check yourself. :)

This no answer at all, you only repeat what you already said above. :(
Care to share your knowledge?

First off, I can understand that you're rather upset with what I wrote, 
unfortunately you got overly defensive, so could you please next time 
not reply immediately and first sleep over it, an overly emotional reply 
is funny to read but not exactly useful.
The main problem with your text is that you jump from one topic to the 
next, making it impossible to create a coherent picture from it. Maybe 
it's obvious for you, but I'd like you to fill in these holes to better 
understand the design desicions.

> Actually there are several problems in the timer system:
> 
> 1. HZ/jiffie boundness in the current implementation is problematic vs.
> the conversion of human time units, which is the way programmers think
> and the way time values are defined in data sheets, to HZ/jiffies.

This is an API problem. How is this related to core timer system (and more 
specially to ktimers).

> 2. The all in one solution for "timeouts" and "timers".
> 
> I think I made this point rather clear, but nevertheless:
> 
> Timeouts are coarse grained functions to catch error conditions. The
> vast majority of those never expire. The implementation emphasis is fast
> insertion/removal.
> 
> Timers are (possibly/desirably) fine grained functions to control
> program flow in a time ordered way. The vast majority of those  expire.
> The implementation emphasis is time accuracy. Don't confuse accuracy
> with high resolution! 
> 
> Intermingling those types (1.) /(2.) leads to an obvious conflict in
> interests and to restrictions of extensibility / flexibility especially
> when (3.) applies.

You don't make it obvious at all. You jump from problems with _kernel_ 
timers, to the introduction of a new subsytem to manage _process_ timers.
Why don't you fix the problems with kernel timers separately?
Only because kernel timer require less precision, doesn't mean you can 
make them imprecise. The timer accuracy is defined by the timer source and 
I expect it to be the same for both kernel and process timers.

> 3. There _are_ provable abuses of the current timer ("timeout") system.
> Functions which run longer than the next timer tick have to be
> considered insane.

Fine. Again, how exactly does this problem with _kernel_ timer relate to 
the introduction of ktimers?

> The only places where "ugl*" is used in the whole writeup are:
> "ticks introduce a bunch of ugliness especially when it comes to time
> synchronizing with high resolution time 
> sources."
> 
> "The combination of both patches provides the grounds and leads the way
> to the cleanup of the timeout API and the implementation of
> dyntick/tickless support without introducing additional ugliness."
> 
> I'm sure that both assertions are true, especially in the context they
> were made.

It's nice that you're sure of it, but as long don't provide the means to 
verify them, they are just assertions.
You never really expand on what this "bunch of ugliness" is, you talk 
about timer abusers and API problems, but what is the problem with timer 
ticks related to high resolution timers?
How are dyntick/tickless support and ktimers supposed to share the same 
timer source? This never becomes clear from you document and neither from 
your example code.

> > Basically how does the new big picture look like and how do high 
> > resolution timer fit into it? (You are more busy defending the 64bit math, 
> > than actually explaining why and where it's needed in the first place.)
> 
> I also explained why I wanted to seperate "timeout" and "timers" APIs. I
> explained why I choose rbtree and I explained why I used 64bit math and
> at least why 64 bit math is not that evil as commonly seen. 

I don't say that 64bit math is evil, I just question that it's required - 
small, but important difference.
The main problem with your ktimer patch is that it's another all-in-one 
patch, it simply changes too many aspects at once. If you want to 
introduce a new API, you can do so by first introducing a small layer 
which maps to the old layer. This makes it easier to see and prove any 
potential improvement.

> > Sorry, if this sounds harsh, but your announcement is more a random 
> > collection of information about timers than an explanation of why ktimers 
> > are desirable.
> 
> First of all, this is volunteer work and I _did_ take the time to write
> up a detailed explanation at all rather than throwing a random patch
> with a 10 line bla into the arena. Do you expect that I write a PhD
> thesis on that ?

No, but I hope you didn't just expect hoorray calls. I appreciate that you 
try to explain this, but you should also expect criticism. You make it 
sound I'm doing this just for fun and to annoy you, but I'm trying to keep 
the quality level of Linux up and half finished ideas out of it. There is 
a reason that the answer took a few days, I really thought very carefully 
about this based on your document and patches. It's still possible I 
missed something, but feel free to point this out (preferably in a 
civilized manner).

> Second this writeup was not targeted for John User. ....

Well, I'm not sure whom you targeted, but it wasn't coherent enough for 
the avarage LKML reader (at least for those wanting more than just cursory 
information).

> >  I'm not against high resolution timers per se, but this 
> > doesn't explain why it has to be high resolution all the way. 
> 
> Where is high resolution all the way. Care to read the patch ? It's high
> resolution aware and it does take out odd areas of code by design.

It's not just high resolution aware, it makes all calculation in high 
resolution _unconditionally_, which makes it high resolution all the way.

> > It also doesn't explain how it will interact with Johns work,
> 
> "The following add on patches are not provided for ad hoc inclusion as
> they contain third party patches. The reason for providing this series
> is to demonstrate the future use of ktimers and the simple extensibility
> for the impelemtation of high resolution timers. Especially John Stultz
> timeofday patch is a complete seperate issue
> and just used due to the ability to provide high resolution timers in a
> simple and non intrusive way."
> 
> Isn't this clear enough ?

No and I explained why I think that these are not separate issues at all.

> > Ok, so what's missing? From a basic design overview I would expect some 
> > information about types of time within the kernel and their relationship. 
> > We basically have three types:
> > - scheduler time
> > - wallclock time
> > - process time
> 
> What about monotonic time ?

It's derived from wallclock time.

> > The main difference between them is that the latter is user 
> > programmable. 
> 
> wallclock is reprogrammable too and it introduces a bunch of horrible
> functions in posix-timers.c. grep for abs_list. I explained why its
> horrible already.

I said _user_ programmable, wallclock time is usually NTP controlled.

> > Johns patches now introduce two major new concepts as a generic mechanism 
> > (and not just hidden somewhere in arch code): 1) a timer source 
> > abstraction, 2) making wallclock updates independent of the timer tick. 
> 
> 1. I'm well aware of the addressed problems in Johns patches.
> 
> 2.I dont see any hidden arch code in the ktimers patch. Do you ?

That's not what I meant (and if you had taken the time to think about it, 
instead of just being angry at me, I'm sure you would have noticed 
yourself), this is e.g. about code in arch/i386/kernel/timers/ or 
arch/ppc/kernel/time.c.

> > BTW here you completely miss the "main point of criticizm", the 64bit math 
> > is a problem, but the main problem is that he completely changed the NTP 
> > kernel model. I don't deny that the NTP code could use some updates 
> > itself, but that's a completely separate problem. Regarding the timer 
> > system it's only important how to synchronize NTP time with the kernel 
> > wallclock time, as soon as you get that right, the whole 64bit math 
> > problematic becomes irrelevant.
> 
> Roman, what are you trying to achieve ? Finding a playground for
> rabulistic discussions ?

Ok, I'm at a loss here, what are you trying to tell me? Is the above in 
any way incorrect?

> > The existence of the timer source abstraction is a major requirement for 
> > further improvements (in this regard it's already suspicious, that you put 
> > major changes before Johns patch). 
> 
> Whats suspicious on that ? Seperating the "timeout" API and the "timer"
> API has nothing to do with Johns patches.

Related changes should be done in a logical order, which I'm obviously 
disagree about with you.

> > The next major change would be to add the possibility to reprogram a 
> > timer source, the scheduler can use this to 
> > skip timer ticks and e.g. itimer can offer higher resolution timers. The 
> > main point here is before we get to any API decisions, we need to develop 
> > a model how a single time source can drive multiple users. Your split 
> > between user timers and kernel timeouts leaves this question completely 
> > open.
> 
> Did I claim, that ktimers solve this problem?
> 
> No. 
> 
> The patches are related but address different aspects of the overall
> problem without conflicting with each other. Quite the contrary: they
> complement each other.
> 
> I clearly stated that the reprogramming of timer events, which are not
> addressed by ktimers and I never claimed ktimers does, is a completely
> different problem.

No, it's part of the same problem, how are scheduler and your ktimers 
supposed to share the same time source?

> > The next step (_after_ reprogrammable timer sources) would be increasing 
> > the timer resolution.
> 
> Please let me correct you here. Adding reprogrammable timer events
> before you have the core timer system ready is wrong by design.
> Providing reprogrammable timer events is simple, but when the timer core
> system which depends on timer events is not ready for that you implement
> useless things and implement likely stuff which is not matching the
> requirements of the generic system.

So what are these requirements? Please be more specific.

> >  Here I'm not at all convinced, that we need to 
> > change everything to nanosecond resolution, we can easily make this a 
> > config option which either ties process time resolution to scheduler time 
> > or makes it independent. The first would make process time a 32bit ms 
> > value (basically current behaviour), the latter can make it to a 64bit ns 
> > value. Anyone trying to introduce nsec_t in common code really needs to 
> > come up with some better arguments why calculations in ns are necessary 
> > unconditionally, instead of making the resolution configurable.
> 
> Please provide a whatever time unit based and configurable / flexible
> solution for that instead of making unprovable claims !

What unprovable claims? What would change in the basic principles, if you 
would do them with 32bit ms values instead of 64bit ns values? The basic 
math should be the same and should demonstrate the basic principles 
equally well and since the current timer code has only ms (at HZ=1000) 
precision the behaviour should be the same as well.

> > In summary please provide a larger picture for your changes, it's 
> > especially important to desribe the relationship between the various 
> > systems. The API definition is only the last step and is derived from 
> > these relationships.
> 
> Sorry. When you are not able to get the larger picture in your mind, I
> doubt that you are the right person to discuss this topic. This kind of
> argument is not working with me, especially not when repeated all over
> the place. 
> 
> Please provide an alternative solution (i.e. code to review) yourself.

You seriously trying to tell me, that anyone doing reviews must provide 
alternative solution first?
Please refrain from personal attacks, it should be obvious that we have
different ideas how to implement this, what I'm trying to do is to get you 
to explain your picture better and I'm trying to explain my picture, so we
can understand each other better. We won't get very far as long as you are 
just pissed at me for disagreeing with you.

bye, Roman

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [ANNOUNCE] ktimers subsystem
  2005-09-22 23:09     ` Roman Zippel
@ 2005-09-22 23:31       ` Christopher Friesen
  2005-09-23  0:25         ` Roman Zippel
  2005-09-23  2:25       ` john stultz
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 50+ messages in thread
From: Christopher Friesen @ 2005-09-22 23:31 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Thomas Gleixner, linux-kernel, mingo, akpm, george, johnstul, paulmck

Roman Zippel wrote:

> This no answer at all, you only repeat what you already said above. :(
> Care to share your knowledge?

Ingo already gave an example.  "a busy network server can easily have 
millions of timers pending. I once had to increase a server's 16 million 
tw timer sysctl limit ..."

> I don't say that 64bit math is evil, I just question that it's required - 
> small, but important difference.

<snip>

> It's not just high resolution aware, it makes all calculation in high 
> resolution _unconditionally_, which makes it high resolution all the way.

<snip>

> What unprovable claims? What would change in the basic principles, if you 
> would do them with 32bit ms values instead of 64bit ns values? The basic 
> math should be the same and should demonstrate the basic principles 
> equally well and since the current timer code has only ms (at HZ=1000) 
> precision the behaviour should be the same as well.

I see two assumptions that lead to the API using nanoseconds:

1) it is desireable to have a human-time-unit timer API, so that people 
can specify timeouts in easily-understood units
2) eventually we will use sub-ms resolution timers, so it makes sense to 
just jump to nanoseconds as our base timing unit

Are these reasonable starting points, or is there disagreement on these?

Maybe it would make sense to have the API be in nanoseconds and 
internally use 32bit ms for now, and only change to 64bit nanos when we 
actually move to sub-ms resolution timers.

Chris

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [ANNOUNCE] ktimers subsystem
  2005-09-22 23:31       ` Christopher Friesen
@ 2005-09-23  0:25         ` Roman Zippel
  2005-09-23  6:49           ` Thomas Gleixner
  0 siblings, 1 reply; 50+ messages in thread
From: Roman Zippel @ 2005-09-23  0:25 UTC (permalink / raw)
  To: Christopher Friesen
  Cc: Thomas Gleixner, linux-kernel, mingo, akpm, george, johnstul, paulmck

Hi,

On Thu, 22 Sep 2005, Christopher Friesen wrote:

> Roman Zippel wrote:
> 
> > This no answer at all, you only repeat what you already said above. :(
> > Care to share your knowledge?
> 
> Ingo already gave an example.  "a busy network server can easily have millions
> of timers pending. I once had to increase a server's 16 million tw timer
> sysctl limit ..."

I hoped for a more concrete example (i.e. pointer to source), but this one 
at least gave me enough hints where to look.
There are ways to avoid this huge number of added timers, but this 
requires a better analysis of the problem.

> I see two assumptions that lead to the API using nanoseconds:
> 
> 1) it is desireable to have a human-time-unit timer API, so that people can
> specify timeouts in easily-understood units
> 2) eventually we will use sub-ms resolution timers, so it makes sense to just
> jump to nanoseconds as our base timing unit
> 
> Are these reasonable starting points, or is there disagreement on these?
> 
> Maybe it would make sense to have the API be in nanoseconds and internally use
> 32bit ms for now, and only change to 64bit nanos when we actually move to
> sub-ms resolution timers.

Actually the decision to use ns has nothing to do with API issues. 
<linux/jiffies.h> has already a lot of options to specify timeouts for 
kernel timer. The official userspace API is mostly timespec/timeval.
The nsec_t type is an _internal_ type to manage time, so this makes it 
possible to do something like this:

#ifdef CONFIG_HIRES_TIMER
typedef u64 ktime_t;
#else
typedef u32 ktime_t;
#endif

bye, Roman

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [ANNOUNCE] ktimers subsystem
  2005-09-22 23:09     ` Roman Zippel
  2005-09-22 23:31       ` Christopher Friesen
@ 2005-09-23  2:25       ` john stultz
  2005-09-23  8:27       ` Thomas Gleixner
  2005-09-23 15:21       ` Paul E. McKenney
  3 siblings, 0 replies; 50+ messages in thread
From: john stultz @ 2005-09-23  2:25 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Thomas Gleixner, linux-kernel, mingo, akpm, george, paulmck

On Fri, 2005-09-23 at 01:09 +0200, Roman Zippel wrote:
> On Thu, 22 Sep 2005, Thomas Gleixner wrote:
> > > It also doesn't explain how it will interact with Johns work,
> > 
> > "The following add on patches are not provided for ad hoc inclusion as
> > they contain third party patches. The reason for providing this series
> > is to demonstrate the future use of ktimers and the simple extensibility
> > for the impelemtation of high resolution timers. Especially John Stultz
> > timeofday patch is a complete seperate issue
> > and just used due to the ability to provide high resolution timers in a
> > simple and non intrusive way."
> > 
> > Isn't this clear enough ?
> 
> No and I explained why I think that these are not separate issues at all.

You had a long response, and some of the terminology is confusing, so
I'm not sure exactly how to help address your concern. 

You did accurately summarized that in my patches I introduced two major
generic concepts:
1. The time source abstraction of a free-running counter to be used for
timekeeping. (Note: not "timer source" - timers generate interrupts,
time sources do not necessarilly)
2. Allowing the wall-clock (as well as a monotonic system clock) to run
independent of timer ticks, via the time source abstraction.

Your complaint was that in making these change the NTP model has been
changed, although I'm not sure if I agree, but hopefully the patches I
sent out today can continue that discussion. :) 

It is my claim that these two concepts allow for correct and robust
timekeeping, and additionally give some flexibility so larger change
(such as dynamic ticks) can be made without worrying too much about
their effect on timekeeping. However, it does not provide anything
interface wise that is not currently exist, it just merely cleans some
of the interfaces up.

So, ignoring correctness in the face of lost ticks and other timekeeping
problems, my patch is not strictly necessary for what Thomas is doing.
It just so happens that my do_monotonic_clock() interface provided a
correct monotonic system time in nanoseconds and that meshed very well
with Thomas' work.


> > > Ok, so what's missing? From a basic design overview I would expect some 
> > > information about types of time within the kernel and their relationship. 
> > > We basically have three types:
> > > - scheduler time
> > > - wallclock time
> > > - process time


My clarity in writing is sometimes an issue, but let me take a shot at
it(Thomas, or anyone, feel free to correct me).

Currently we have two main domains of time in the kernel: xtime and
jiffies.

jiffies:
	jiffies is a simple HZ frequency software maintained (interrupt based)
counter. Since it is fairly low-res, it easily fits in a single long
int(well, a reasonable portion of it does), and requires no locking to
atomically access, making it very easy and fast to use. It is used for
almost all in-kernel time accounting, from scheduler and processes
accounting to soft-timers. Since it is not exported to userspace, the
counter is less robust on some arches in the face of things like lost
ticks. It is not NTP corrected, has a limited range on some arches (in
it's single long int form) and discrepancies between the requested HZ
value and the actual tick frequency (ACTHZ)can cause additional
confusion when mapping jiffies to actual time.


xtime:
	xtime provides nsec resolution software maintained NTP adjusted wall
clock which is exported to userspace.  Along with wall_to_monotonic we
get a NTP adjusted monotonic clock. This is expected to be very robust
and accurate, however since it is so finely grained it requires 64 bits
(or a timepsec) to store it, which can cause performance concerns in the
cases where nsec resolution is unnecessary.


Now that the time domains are covered, how do we use them?

Soft-timers / Timeouts (aka: In kernel software maintained timers):
	Soft-timers provide a internal kernel mechanism for running code at a
later specified time. Soft timers use jiffies for expiration, so they
are fast to use, but are limited to HZ resolution. As Thomas already
discussed (as well as LWN's article) they are very frequently used for
timeouts that are removed before they expire. Additionally, since they
are jiffies based, they have problems when mapping back and forth with
wall time, and do not robustly handle lost ticks (however due to their
common use, this is not normally an issue).


	When userspace requests for action at a future time are made to the
kernel, they are made using some form of human time unit (flat usecs or
timespecs, whatever). Currently inside the kernel, we must convert these
requests to jiffies and use the soft-timer subsystem. This limits both
the range and low resolution of the request. Additionally, the
discrepancies between HZ, ACTHZ, NTP adjustments and lost ticks can
cause for additional inaccuracies in the conversion.


ktimers (From my understanding, again Thomas, correct me as needed):
	ktimers provide a completely separate soft-timer list, which can use
either the wall-clock or the monotonic-clock as its domain for addition
and expiration. Since users may specify nanosecond resolution requests,
ktimers preserve the request in a nanosecond form. This eliminates any
discrepancies between jiffies and wall or monotonic time, and allows for
future sub-HZ latencies for expiration (in combination with a high-res
hardware-timer interrupt source). Additionally, in-kernel users who
desire high-precision wall/monotonic clock based timers could find
ktimers useful. 


So the existing fast interface remains with the same jiffies time
domain. ktimers just add a secondary high-resolution interface that maps
to the wall/monotonic_clock domain.

> > > The existence of the timer source abstraction is a major requirement for 
> > > further improvements (in this regard it's already suspicious, that you put 
> > > major changes before Johns patch). 
> > 
> > Whats suspicious on that ? Seperating the "timeout" API and the "timer"
> > API has nothing to do with Johns patches.
> 
> Related changes should be done in a logical order, which I'm obviously 
> disagree about with you.

However, in this case the ktimer patch Thomas mailed out is really
independent from my change. I believe my change helps insure his
interfaces behave properly (you don't want your monotonic clock jumping
backwards occasionally!), but they do not affect his code's logic.


The fact that I make wall time update independent of timer ticks is
really just for simplicity and correctness. It is in no way a
requirement for wall/monotonic domain based timers or high-res timers
(my code does not provide any higher resolution interface then what is
already there). 

Maybe if you were talking about the dynamic tick changes, would it make
sense to wait and do my changes first, and Thomas is not proposing that
at this moment.


> > > The next major change would be to add the possibility to reprogram a 
> > > timer source, the scheduler can use this to 
> > > skip timer ticks and e.g. itimer can offer higher resolution timers. The 
> > > main point here is before we get to any API decisions, we need to develop 
> > > a model how a single time source can drive multiple users. Your split 
> > > between user timers and kernel timeouts leaves this question completely 
> > > open.
> > 
> > Did I claim, that ktimers solve this problem?
> > 
> > No. 
> > 
> > The patches are related but address different aspects of the overall
> > problem without conflicting with each other. Quite the contrary: they
> > complement each other.
> > 
> > I clearly stated that the reprogramming of timer events, which are not
> > addressed by ktimers and I never claimed ktimers does, is a completely
> > different problem.
> 
> No, it's part of the same problem, how are scheduler and your ktimers 
> supposed to share the same time source?

They don't share a time source.

ktimers are in the xtime domain, the scheduler is in the jiffies domain.


Sorry, this was really much longer then I wanted it to be. Hopefully it
wasn't too repetitious, and  reasonably clear. 

thanks
-john


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [ANNOUNCE] ktimers subsystem
  2005-09-23  0:25         ` Roman Zippel
@ 2005-09-23  6:49           ` Thomas Gleixner
  2005-09-24  3:15             ` Roman Zippel
  0 siblings, 1 reply; 50+ messages in thread
From: Thomas Gleixner @ 2005-09-23  6:49 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Christopher Friesen, linux-kernel, mingo, akpm, george, johnstul,
	paulmck

On Fri, 2005-09-23 at 02:25 +0200, Roman Zippel wrote:
> > Maybe it would make sense to have the API be in nanoseconds and internally use
> > 32bit ms for now, and only change to 64bit nanos when we actually move to
> > sub-ms resolution timers.
> 
> Actually the decision to use ns has nothing to do with API issues. 
> <linux/jiffies.h> has already a lot of options to specify timeouts for 
> kernel timer. The official userspace API is mostly timespec/timeval.
> The nsec_t type is an _internal_ type to manage time, so this makes it 
> possible to do something like this:
> 
> #ifdef CONFIG_HIRES_TIMER
> typedef u64 ktime_t;
> #else
> typedef u32 ktime_t;
> #endif

Sure that's possible, but the 32bit storage format has its limitiations
and it is not possible to keep the code compatible for both use cases.

Posix timers - both CLOCK_REALTIME and CLOCK_MONOTONIC - can be
programmed in absolute time. In a 32bit representation with ms
resolution we can store ~49 days, so we can not fit the value which come
up from user space wihtout correction/conversion except we limit the use
cases to 49 days uptime and clock realtime < 49days since the epoch.

If we can not fit the given value into the internal representation, we
have to do exactly what the current implementation of clock realtime in
posix-timers.c has to do. Storing information about xtime / monotonic
offset, adding the timer to yet another list (abs_list) convert to
jiffies and in case the clock gets set, run through all the affected
timers in abs_list recalculate the expiry value and requeue them.

The idea of ktimers is to use the requested time given by a timespec in
human time without any corrections, so we actually can avoid the above.

Also doing time ordered insertion into a list introduces incompabilities
between 32/64 bit storage formats.

I carefully waged the necessary quirk load vs. the cleanliness,
simplicity and robustness of a pure 64 bit implementation. The resulting
payload for 32bit systems, which is in the range of 1-3 instructions per
fast path operation (add, sub, compare) is not worth the trouble IMO to
give up a clean, simple and robust design, which also allows high
resolution timers with no big change to the base implementation.


tglx



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [ANNOUNCE] ktimers subsystem
  2005-09-22 23:09     ` Roman Zippel
  2005-09-22 23:31       ` Christopher Friesen
  2005-09-23  2:25       ` john stultz
@ 2005-09-23  8:27       ` Thomas Gleixner
  2005-09-24  2:43         ` Roman Zippel
  2005-09-23 15:21       ` Paul E. McKenney
  3 siblings, 1 reply; 50+ messages in thread
From: Thomas Gleixner @ 2005-09-23  8:27 UTC (permalink / raw)
  To: Roman Zippel; +Cc: linux-kernel, mingo, akpm, george, johnstul, paulmck

On Fri, 2005-09-23 at 01:09 +0200, Roman Zippel wrote:
> > Quick answer: Networking and disk I/O. Insane load on a 4 way SMP
> > machine. Check yourself. :)
> 
> This no answer at all, you only repeat what you already said above. :(
> Care to share your knowledge?

Each network connection, each disk I/O operation arms a timeout timer to
cover error conditions. Increasing the load on those increases the
number of armed timers. At the same time this increased load keeps the
timers longer active as it takes more time to detect that the "good"
condition arrived on time.

> You don't make it obvious at all. You jump from problems with _kernel_ 
> timers, to the introduction of a new subsytem to manage _process_ timers.
> Why don't you fix the problems with kernel timers separately?
> Only because kernel timer require less precision, doesn't mean you can 
> make them imprecise. 

I did not make them imprecise. ktimers changes excactly nothing there.

> The timer accuracy is defined by the timer source and 
> I expect it to be the same for both kernel and process timers.

As John pointed out correctly, this is a seperation of time domains. Is
Johns explanation enough ?

> > > Basically how does the new big picture look like and how do high 
> > > resolution timer fit into it? (You are more busy defending the 64bit math, 
> > > than actually explaining why and where it's needed in the first place.)
> > 
> > I also explained why I wanted to seperate "timeout" and "timers" APIs. I
> > explained why I choose rbtree and I explained why I used 64bit math and
> > at least why 64 bit math is not that evil as commonly seen. 
> 
> I don't say that 64bit math is evil, I just question that it's required - 
> small, but important difference.
> The main problem with your ktimer patch is that it's another all-in-one 
> patch, it simply changes too many aspects at once. If you want to 
> introduce a new API, you can do so by first introducing a small layer 
> which maps to the old layer. This makes it easier to see and prove any 
> potential improvement.

I dont see ktimers as an all in one change everything patch. 

It addresses exactly _one_ problem and nothing else. It seperates time
domains and their representation and handling.

I dont see how you want to map this new API to the old layer.

Lets look at the patch:

ktimers.c/h introduce the new API (no change to existing code)

The other files changed are the conversions of the users to this new
API. Mostly small changes except for posix-timers.c, where the abs_list
handling was removed as it was not longer necessary. 

Providing a new API wihtout making use of it and showing the benefits is
rather pointless. I consider the simplification of posix-timers as a
valuable benefit.

> > >  I'm not against high resolution timers per se, but this 
> > > doesn't explain why it has to be high resolution all the way. 
> > 
> > Where is high resolution all the way. Care to read the patch ? It's high
> > resolution aware and it does take out odd areas of code by design.
> 
> It's not just high resolution aware, it makes all calculation in high 
> resolution _unconditionally_, which makes it high resolution all the way.

Please see the other mail explaining 32/64bit issues.

> > > It also doesn't explain how it will interact with Johns work,
> > 
> > "The following add on patches are not provided for ad hoc inclusion as
> > they contain third party patches. The reason for providing this series
> > is to demonstrate the future use of ktimers and the simple extensibility
> > for the impelemtation of high resolution timers. Especially John Stultz
> > timeofday patch is a complete seperate issue
> > and just used due to the ability to provide high resolution timers in a
> > simple and non intrusive way."
> > 
> > Isn't this clear enough ?
> 
> No and I explained why I think that these are not separate issues at all.

These issues are seperate even if ktimers uses values provided by the
time of day subsystem. 

ktimers need the current time in the representation of CLOCK_REALTIME
and CLOCK_MONOTIC. 

ktimers doe not rely on Johns work.  Ktimers can make use of Johns work
as it uses the values provided by the existing timeofday code now. 

The usage is simpler if Johns work is in place. Nothing else.

> > > The main difference between them is that the latter is user 
> > > programmable. 
> > 
> > wallclock is reprogrammable too and it introduces a bunch of horrible
> > functions in posix-timers.c. grep for abs_list. I explained why its
> > horrible already.
> 
> I said _user_ programmable, wallclock time is usually NTP controlled.

I consider sys_adjtimex() and sys_settimeofday() as user interfaces.
Both affect wallclock and therefor affect timers related to wallclock.

> > 2.I dont see any hidden arch code in the ktimers patch. Do you ?
> 
> That's not what I meant (and if you had taken the time to think about it, 
> instead of just being angry at me, I'm sure you would have noticed 
> yourself), this is e.g. about code in arch/i386/kernel/timers/ or 
> arch/ppc/kernel/time.c.

I dont see what you want. The submitted ktimers patch does not contain a
single change to arch/xxx files.

If you refer to the proof of concept high resolution timer
implementation, then I really do not understand why you are insisting on
that. It is proof of concept to verify the usability of the ktimer base
implementation for high resolution timing - nothing more. I used Johns
patches for that proof of concept implementation as they made life
simpler. 

So there is no point in discussing this.

I clearly said that from the beginning. I provided this addon patch
series to give interested developers a possibility to look at it and
compare and contrast it to the existing high resolution timer
implementations. 

> > > The existence of the timer source abstraction is a major requirement for 
> > > further improvements (in this regard it's already suspicious, that you put 
> > > major changes before Johns patch). 
> > 
> > Whats suspicious on that ? Seperating the "timeout" API and the "timer"
> > API has nothing to do with Johns patches.
> 
> Related changes should be done in a logical order, which I'm obviously 
> disagree about with you.

Again. The patches are orthogonal. So where is the logical order ?

> > I clearly stated that the reprogramming of timer events, which are not
> > addressed by ktimers and I never claimed ktimers does, is a completely
> > different problem.
> 
> No, it's part of the same problem, how are scheduler and your ktimers 
> supposed to share the same time source?

<SNIP...>

Admittedly everything which is dealing with aspects of time is related,
but it can and must be seperated into different subsystems, which make
use of the provided interfaces.

1. Time tick

	- constant frequency tick
	
	Provides interface for:
	reading the current tick count

2. Time of day

	- handles frequency adjustments of the timesource
	- keeps track of monotonic time
	- provides the representation of wall clock time
	- handles the adjustment of wall clock time

	Provides interfaces for:
	reading monotonic time
	reading wallclock time
	adjusting the frequency of the time source
	setting wallclock time


	Makes possibly use of the interface:
	(Depends on the availability of time sources)
	time tick:read tick count

3. Timeout API

	- Time tick based timer handling
	- Solely used for in kernel purposes

	Provides interfaces for:
	adding timers
	modifying timers
	deleting timers

	Makes use of the interface:
	time tick:read tick count

4. Timer API

	- monotonic clock based timers
	- realtime (wallclock) clock based timers
	- Mainly intended for application timers

	Provides interfaces for:
	adding timers
	modifying timers
	deleting timers
	
	Makes use of the interfaces:
	timeofday: read monotonic time	
	timeofday: read wallclock time	

So we have four seperate building blocks related to time, but clearly
seperated.

The current implementation in the kernel is providing only 1,2,3. The
timeofday API is somewhat intermingled with the tick code. The Timer API
is implemented with a bunch of workarounds by using the Timeout API.

ktimers provide the seperation of Timeout API and Timer API and therefor
the seperation of the time domains. ktimers do not need any changes to
1,2,3 but can benefit from what ever improvement is made in the
timeofday domain.

Johns patches address the clear seperation of time ticks and time of day
timekeeping and do not affect the timeout API nor ktimers. Once Johns
work is in place the ktimer code simply makes use of the new interfaces.

High resolution timers and dynsmic ticks need this seperations to make
them less intrusive. Of course we need an additional abstraction layer,
which handles the timer event sources.

tglx




^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [ANNOUNCE] ktimers subsystem
  2005-09-22 23:09     ` Roman Zippel
                         ` (2 preceding siblings ...)
  2005-09-23  8:27       ` Thomas Gleixner
@ 2005-09-23 15:21       ` Paul E. McKenney
  2005-09-24  3:38         ` Roman Zippel
  3 siblings, 1 reply; 50+ messages in thread
From: Paul E. McKenney @ 2005-09-23 15:21 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Thomas Gleixner, linux-kernel, mingo, akpm, george, johnstul

On Fri, Sep 23, 2005 at 01:09:46AM +0200, Roman Zippel wrote:
> On Thu, 22 Sep 2005, Thomas Gleixner wrote:

[ . . . ]

> > > The main difference between them is that the latter is user 
> > > programmable. 
> > 
> > wallclock is reprogrammable too and it introduces a bunch of horrible
> > functions in posix-timers.c. grep for abs_list. I explained why its
> > horrible already.
> 
> I said _user_ programmable, wallclock time is usually NTP controlled.

I believe Thomas is concerned about workloads that need a short-term
stable timebase.  For example, a process-control application might need
to accurately measure a (say) 1500-millisecond time interval.  Both
user-programmability and NTP adjustments to a given timebase could
destroy the needed measurement accuracy.

Such a workload does not need the long-term tie to wallclock time that
NTP provides, but it does need the accurate short-term timekeeping that
NTP cannot provide -- NTP sacrifices short-term accuracy in order to
adjust the clock as needed to gain long-term stability.

Thomas, John, please jump in if I am missing the point here.

						Thanx, Paul

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [ANNOUNCE] ktimers subsystem
  2005-09-23  8:27       ` Thomas Gleixner
@ 2005-09-24  2:43         ` Roman Zippel
  2005-09-24  5:03           ` Ingo Molnar
  2005-09-24  9:04           ` James Bruce
  0 siblings, 2 replies; 50+ messages in thread
From: Roman Zippel @ 2005-09-24  2:43 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, mingo, akpm, george, johnstul, paulmck

Hi,

On Fri, 23 Sep 2005, Thomas Gleixner wrote:

> Each network connection, each disk I/O operation arms a timeout timer to
> cover error conditions. Increasing the load on those increases the
> number of armed timers. At the same time this increased load keeps the
> timers longer active as it takes more time to detect that the "good"
> condition arrived on time.

You're still rather vague here...
Anyway, if the amount of active timer should be a problem, there are ways 
to avoid them. For disk io it's rather simple as the timeout is mostly 
constant:

start i/o:
	req->expire = jiffies + timeout;
	list_add_tail(req->list, &timeout_list);
	if (!timer_pending(timer)) {
		timer->expire = req->expire;
		add_timer(timer);
	}

timeout function:
	req = list_head(&timeout_list);
	if (time_after_eq(req->expire, jiffies)) {
		// error...
	} else {
		timer->expire = req->expire;
		add_timer(timer);
	}

Network timer are a bit more difficult as the timeouts are more dynamic, 
but one can at least delay arming the timer in most cases, by running a 
timer every x ticks:

start timer:
	if (timeout < x)
		add_timer();
	else
		list_add_tail();

timer function:
	list_for_each_entry()
		add_timer();

Should the action be successfull before the timer runs, it only needs to 
remove it from the private list.


> Admittedly everything which is dealing with aspects of time is related,
> but it can and must be seperated into different subsystems, which make
> use of the provided interfaces.
> 
> 1. Time tick
> 
> 	- constant frequency tick
> 	
> 	Provides interface for:
> 	reading the current tick count
> 
> 2. Time of day
> 
> 	- handles frequency adjustments of the timesource
> 	- keeps track of monotonic time
> 	- provides the representation of wall clock time
> 	- handles the adjustment of wall clock time
> 
> 	Provides interfaces for:
> 	reading monotonic time
> 	reading wallclock time
> 	adjusting the frequency of the time source
> 	setting wallclock time
> 
> 
> 	Makes possibly use of the interface:
> 	(Depends on the availability of time sources)
> 	time tick:read tick count
> 
> 3. Timeout API
> 
> 	- Time tick based timer handling
> 	- Solely used for in kernel purposes
> 
> 	Provides interfaces for:
> 	adding timers
> 	modifying timers
> 	deleting timers
> 
> 	Makes use of the interface:
> 	time tick:read tick count
> 
> 4. Timer API
> 
> 	- monotonic clock based timers
> 	- realtime (wallclock) clock based timers
> 	- Mainly intended for application timers
> 
> 	Provides interfaces for:
> 	adding timers
> 	modifying timers
> 	deleting timers
> 	
> 	Makes use of the interfaces:
> 	timeofday: read monotonic time	
> 	timeofday: read wallclock time	
> 
> So we have four seperate building blocks related to time, but clearly
> seperated.

First, please don't say API if you talk about subsystems, it's not the 
same. APIs are part of a subsystem and describe the relationships between 
subsystems. I think that caused a major part of the confusion.


I don't completely agree with your picture above and while these 
subsystems are mostly separate, they are not independent:

current dependencies:

	1. time source ---> 2. wallclock    ---> 4. process timer
	                \-> 3. kernel timer -/

We have currently a very simple time source, which just provides a 
nonprogrammable timer interrupt. Process timer are limited in their 
(programmable) resolution to kernel.

This maybe also explains better the 3 types of time I mentioned (or 
domains if you prefer):
- wallclock time: NTP controlled, only readable and not programmable
- scheduler time: monotonic and unsynchronized, readable and programmable 
  in HZ resolution
- process time: derived from the first two.

Johns work now (hopefully) integrates wallclock functionality into the 
time source:

	time source ---> wallclock time
	ntp library -/

This means the NTP code becomes a library which can be used by the time 
source to provide a synchronized wallclock time.

The next step would be to make the time source programmable, so that we 
can get rid of the kernel timer dependency from the process timer:

	time source ---> kernel timer
	             \-> process timer

In this context the main functionality of your patch now finally becomes 
understandable: making process timer independent of kernel timer. This 
never became clear from your announcement, it talks about a lot of 
unrelated problems, but it never gets to the actual problem.

bye, Roman

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [ANNOUNCE] ktimers subsystem
  2005-09-23  6:49           ` Thomas Gleixner
@ 2005-09-24  3:15             ` Roman Zippel
  2005-09-24  5:16               ` Ingo Molnar
  0 siblings, 1 reply; 50+ messages in thread
From: Roman Zippel @ 2005-09-24  3:15 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Christopher Friesen, linux-kernel, mingo, akpm, george, johnstul,
	paulmck

Hi,

On Fri, 23 Sep 2005, Thomas Gleixner wrote:

> The idea of ktimers is to use the requested time given by a timespec in
> human time without any corrections, so we actually can avoid the above.
> 
> Also doing time ordered insertion into a list introduces incompabilities
> between 32/64 bit storage formats.

Except that the (time) range of the list would be limited I don't really 
see a big difference.
Anyway, the biggest cost is the conversion from/to the 64bit ns value and 
if its main use is sorting, you can use something like this:

typedef union {
	u64 tv64;
	struct {
#ifdef __BIG_ENDIAN
		u32 sec, nsec;
#else
		u32 nsec, sec;
#endif
	} tv;
} ktimespec;

To compare two time values the tv64 value is sufficient.

bye, Roman

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [ANNOUNCE] ktimers subsystem
  2005-09-23 15:21       ` Paul E. McKenney
@ 2005-09-24  3:38         ` Roman Zippel
  0 siblings, 0 replies; 50+ messages in thread
From: Roman Zippel @ 2005-09-24  3:38 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Thomas Gleixner, linux-kernel, mingo, akpm, george, johnstul

Hi,

On Fri, 23 Sep 2005, Paul E. McKenney wrote:

> I believe Thomas is concerned about workloads that need a short-term
> stable timebase.  For example, a process-control application might need
> to accurately measure a (say) 1500-millisecond time interval.  Both
> user-programmability and NTP adjustments to a given timebase could
> destroy the needed measurement accuracy.

NTP adjustments a quite small and not applied all at once, this means as 
soon as the time is synchronized, we could switch CLOCK_MONOTONIC to it.

bye, Roman

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [ANNOUNCE] ktimers subsystem
  2005-09-24  2:43         ` Roman Zippel
@ 2005-09-24  5:03           ` Ingo Molnar
  2005-09-24  9:04           ` James Bruce
  1 sibling, 0 replies; 50+ messages in thread
From: Ingo Molnar @ 2005-09-24  5:03 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Thomas Gleixner, linux-kernel, akpm, george, johnstul, paulmck


* Roman Zippel <zippel@linux-m68k.org> wrote:

> On Fri, 23 Sep 2005, Thomas Gleixner wrote:
> 
> > Each network connection, each disk I/O operation arms a timeout timer to
> > cover error conditions. Increasing the load on those increases the
> > number of armed timers. At the same time this increased load keeps the
> > timers longer active as it takes more time to detect that the "good"
> > condition arrived on time.
> 
> You're still rather vague here...

as i said before, millions of timers are easily possible, and i 
personally saw in excess of 16 million active timers. I hope there was 
nothing vague about that ;-)

	Ingo

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [ANNOUNCE] ktimers subsystem
  2005-09-24  3:15             ` Roman Zippel
@ 2005-09-24  5:16               ` Ingo Molnar
  2005-09-24 10:35                 ` Roman Zippel
  0 siblings, 1 reply; 50+ messages in thread
From: Ingo Molnar @ 2005-09-24  5:16 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Thomas Gleixner, Christopher Friesen, linux-kernel, akpm, george,
	johnstul, paulmck


* Roman Zippel <zippel@linux-m68k.org> wrote:

> On Fri, 23 Sep 2005, Thomas Gleixner wrote:
> 
> > The idea of ktimers is to use the requested time given by a timespec in
> > human time without any corrections, so we actually can avoid the above.
> > 
> > Also doing time ordered insertion into a list introduces incompabilities
> > between 32/64 bit storage formats.
> 
> Except that the (time) range of the list would be limited I don't really 
> see a big difference.
> Anyway, the biggest cost is the conversion from/to the 64bit ns value 
> [...]

Where do you get that notion from? Have you personally measured the 
performance and code size impact of it? If yes, would you mind to share 
the resulting data with us?

Our data is that the use of 64-bit nsec_t significantly reduces the size 
of a representative piece of code (object size in bytes):

                AMD64    I386        ARM          PPC32       M68K
   nsec_t_ops   226      284         252          428         206
   timespec_ops 412      324         448          640         342

i.e. a ~40% size reduction when going to nsec_t on m68k, in that 
particular function. Even larger, ~45% code size reduction on a true 
64-bit platform.

	Ingo

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [ANNOUNCE] ktimers subsystem
  2005-09-24  2:43         ` Roman Zippel
  2005-09-24  5:03           ` Ingo Molnar
@ 2005-09-24  9:04           ` James Bruce
  1 sibling, 0 replies; 50+ messages in thread
From: James Bruce @ 2005-09-24  9:04 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Thomas Gleixner, linux-kernel, mingo, akpm, george, johnstul, paulmck

Roman Zippel wrote:
> On Fri, 23 Sep 2005, Thomas Gleixner wrote:
>>Each network connection, each disk I/O operation arms a timeout timer to
>>cover error conditions. Increasing the load on those increases the
>>number of armed timers. At the same time this increased load keeps the
>>timers longer active as it takes more time to detect that the "good"
>>condition arrived on time.
> 
> You're still rather vague here...
> Anyway, if the amount of active timer should be a problem, there are ways 
> to avoid them. <snip>

What does this have to do with the ktimers work itself?  It's true that 
other parts of the kernel shouldn't create more timers than necessary, 
but the timer subsystem should be able to handle a lot of timers 
regardless of that.  To put it in perspective: A server doesn't run very 
efficiently with a load of 1000, and that should be avoided by proper 
application design.  Yet we still test the scheduler on such workloads, 
don't we?  It's nice to know a subsystem doesn't fall over when its 
stressed.

If you really feel timers are overused, please bring it up with the 
maintainers of *those subsystems* which are overusing it.  There's no 
point in raising the issue with Thomas since he's not responsible for 
how other people use/misuse an existing API.

Perhaps the real issue is that you feel we should police the kernel 
usage of timers, instead of moving to a more scalable implementation. 
This is one of those rare cases however where we can have cleaner, more 
modular code, barely longer than before, which is also more scalable. 
The only thing left to measure is the performance impact, but the 
authors haven't gotten that far yet.  Instead of jumping to conclusions 
now, let's wait until we have some real numbers, shall we?

Jim Bruce

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [ANNOUNCE] ktimers subsystem
  2005-09-24  5:16               ` Ingo Molnar
@ 2005-09-24 10:35                 ` Roman Zippel
  2005-09-24 13:56                   ` Thomas Gleixner
  0 siblings, 1 reply; 50+ messages in thread
From: Roman Zippel @ 2005-09-24 10:35 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, Christopher Friesen, linux-kernel, akpm, george,
	johnstul, paulmck

Hi,

On Sat, 24 Sep 2005, Ingo Molnar wrote:

> > Anyway, the biggest cost is the conversion from/to the 64bit ns value 
> > [...]
> 
> Where do you get that notion from? Have you personally measured the 
> performance and code size impact of it? If yes, would you mind to share 
> the resulting data with us?
> 
> Our data is that the use of 64-bit nsec_t significantly reduces the size 
> of a representative piece of code (object size in bytes):
> 
>                 AMD64    I386        ARM          PPC32       M68K
>    nsec_t_ops   226      284         252          428         206
>    timespec_ops 412      324         448          640         342
> 
> i.e. a ~40% size reduction when going to nsec_t on m68k, in that 
> particular function. Even larger, ~45% code size reduction on a true 
> 64-bit platform.

Without any source these numbers are not verifiable. You don't even 
mention here what that "representative piece of code" is...

Anyway, Thomas mentioned that this would be from the insert/remove code 
and here you omitted the most important part of my mail:

typedef union {
	u64 tv64;
	struct {
#ifdef __BIG_ENDIAN
		u32 sec, nsec;
#else
		u32 nsec, sec;
#endif
	} tv;
} ktimespec;

IOW this would allow to keep the time value in timespec format and use 
your nsec_t_ops for sorting.

bye, Roman

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [ANNOUNCE] ktimers subsystem
  2005-09-24 10:35                 ` Roman Zippel
@ 2005-09-24 13:56                   ` Thomas Gleixner
  2005-09-24 16:51                     ` Daniel Walker
  2005-09-24 23:45                     ` Roman Zippel
  0 siblings, 2 replies; 50+ messages in thread
From: Thomas Gleixner @ 2005-09-24 13:56 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Ingo Molnar, Christopher Friesen, linux-kernel, akpm, george,
	johnstul, paulmck

On Sat, 2005-09-24 at 12:35 +0200, Roman Zippel wrote:
> Hi,
> 
> On Sat, 24 Sep 2005, Ingo Molnar wrote:
> 
> > > Anyway, the biggest cost is the conversion from/to the 64bit ns value 
> > > [...]
> > 
> > Where do you get that notion from? Have you personally measured the 
> > performance and code size impact of it? If yes, would you mind to share 
> > the resulting data with us?
> > 
> > Our data is that the use of 64-bit nsec_t significantly reduces the size 
> > of a representative piece of code (object size in bytes):
> > 
> >                 AMD64    I386        ARM          PPC32       M68K
> >    nsec_t_ops   226      284         252          428         206
> >    timespec_ops 412      324         448          640         342
> > 
> > i.e. a ~40% size reduction when going to nsec_t on m68k, in that 
> > particular function. Even larger, ~45% code size reduction on a true 
> > 64-bit platform.
> 
> Without any source these numbers are not verifiable. You don't even 
> mention here what that "representative piece of code" is...

struct base {
        nsec_t now;
        struct ktimer *timers[16];
        struct ktimer *running;
};

void nsec_t_ops(struct base *base, struct ktimer *next,
                  nsec_t *tim, int mode)
{
        int i;
        nsec_t now = base->now;

        for (i = 0; i < 16; i++) {

                void (*fn)(void *);
                void *data;
                struct ktimer *timer = base->timers[i];

                if (timer->expires > now)
                        break;
                timer->expired = now;
                fn = timer->function;
                data = timer->data;
                base->running = timer;
                fn(data);
                base->running = NULL;
        }

        switch(mode) {
        case 0:
                next->expires = *tim;
                break;
        case 1:
                next->expires = now + *tim;
                break;
        case 2:
                next->expires += *tim;
                break;
        case 3:
                while (next->expires > now) {
                        next->expires += *tim;
                }
                break;
        }
        base->timers[0] = next;
}


versus:

#define NSEC_PER_SEC 1000000000
struct base {
        struct timespec now;
        struct ktimer *timers[16];
        struct ktimer *running;
};

#define timespec_gt(a,b)                       	\
        (((a).tv_sec > (b).tv_sec) ? 1 :        \
        (((a).tv_sec < (b).tv_sec) ? 0 :        \
        ((a).tv_nsec > (b).tv_nsec)))

#define timespec_addptr(a,b)                            \
        (a)->tv_sec = ((a)->tv_sec + (b)->tv_sec);      \
        (a)->tv_nsec = ((a)->tv_nsec + (b)->tv_nsec);   \
        if ((a)->tv_nsec >= NSEC_PER_SEC){               \
                (a)->tv_nsec -= NSEC_PER_SEC;           \
                (a)->tv_sec++;                          \
        }

#define timespec_addppp(c,a,b)                          \
        (c)->tv_sec = ((a)->tv_sec + (b)->tv_sec);      \
        (c)->tv_nsec = ((a)->tv_nsec + (b)->tv_nsec);   \
        if ((c)->tv_nsec >= NSEC_PER_SEC){               \
                (c)->tv_nsec -= NSEC_PER_SEC;           \
                (c)->tv_sec++;                          \
        }


void timespec_ops(struct base *base, struct ktimer *next,
                  struct timespec *tim, int mode)
{
        int i;
        struct timespec now = base->now;

        for (i = 0; i < 16; i++) {

                void (*fn)(void *);
                void *data;
                struct ktimer *timer = base->timers[i];

                if (timespec_gt(timer->expires, now))
                        break;
                timer->expired = now;
                fn = timer->function;
                data = timer->data;
                base->running = timer;
                fn(data);
                base->running = NULL;
        }

        switch(mode) {
        case 0:
                next->expires = *tim;
                break;
        case 1:
                timespec_addppp(&next->expires, &now, tim);
                break;
        case 2:
                timespec_addptr(&next->expires, tim);
                break;
        case 3:
                while (timespec_gt(now, next->expires)) {
                        timespec_addptr(&next->expires, tim);
                }
                break;
        }
        base->timers[0] = next;
}


> Anyway, Thomas mentioned that this would be from the insert/remove code 
> and here you omitted the most important part of my mail:
> 
> typedef union {
> 	u64 tv64;
> 	struct {
> #ifdef __BIG_ENDIAN
> 		u32 sec, nsec;
> #else
> 		u32 nsec, sec;
> #endif
> 	} tv;
> } ktimespec;
> 
> IOW this would allow to keep the time value in timespec format and use 
> your nsec_t_ops for sorting.

Yes, it works for comparisons. 

But for any other operation this construct has the same problem than
struct timespec itself. You need at least an add function which is
always an add and a comparison / correction vs. nsec >= NSEC_PER_SEC. 

The 64 bit nsec_t value can just be used as is without inventing a
wrapper macro for each operation.

The only point, where (k)timespec has an advantage is that the userspace
value must not be converted to nsec_t, but deducing therefor this is the
better overall solution is a fallacy.

nsec_t				ktimespec

syscall:
32x32 mul
64bit add			2 x 32bit move

arm timer:
64 bit add			2 x 32 bit add
				  32 bit compare
				  32 bit sub
				  32 bit add

The 3 operation compensate for the 32x32 
multiplication.

For interval timers you have the 
				  32 bit compare
				  32 bit sub
				  32 bit add
additional overhead for each rearm.

The backward conversion from nsec_t to timespec is almost a non issue.
The vast majority of callers dont provide the second argument to
nanosleep(), setitimer(), set_timer() which makes the conversion
necessary and I think we optimize for the common use case.

Besides that the representation of time in nsec_t values is much
clearer. 

I know that we have to deal with timespecs vs. userspace, but keeping
this representation for kernel internal usage reminds me on the BCD
calculations which were a similar 2^x vs 10^x oddity in the early days
of microprocessors. Of course they were obstinate and survived a
surprisingly long time.

tglx



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [ANNOUNCE] ktimers subsystem
  2005-09-24 13:56                   ` Thomas Gleixner
@ 2005-09-24 16:51                     ` Daniel Walker
  2005-09-24 23:45                     ` Roman Zippel
  1 sibling, 0 replies; 50+ messages in thread
From: Daniel Walker @ 2005-09-24 16:51 UTC (permalink / raw)
  To: tglx
  Cc: Roman Zippel, Ingo Molnar, Christopher Friesen, linux-kernel,
	akpm, george, johnstul, paulmck

On Sat, 2005-09-24 at 15:56 +0200, Thomas Gleixner wrote:
> On Sat, 2005-09-24 at 12:35 +0200, Roman Zippel wrote:
> > Hi,
> > 
> > On Sat, 24 Sep 2005, Ingo Molnar wrote:
> > 
> > > > Anyway, the biggest cost is the conversion from/to the 64bit ns value 
> > > > [...]
> > > 
> > > Where do you get that notion from? Have you personally measured the 
> > > performance and code size impact of it? If yes, would you mind to share 
> > > the resulting data with us?
> > > 
> > > Our data is that the use of 64-bit nsec_t significantly reduces the size 
> > > of a representative piece of code (object size in bytes):
> > > 
> > >                 AMD64    I386        ARM          PPC32       M68K
> > >    nsec_t_ops   226      284         252          428         206
> > >    timespec_ops 412      324         448          640         342
> > > 
> > > i.e. a ~40% size reduction when going to nsec_t on m68k, in that 
> > > particular function. Even larger, ~45% code size reduction on a true 
> > > 64-bit platform.
> > 
> > Without any source these numbers are not verifiable. You don't even 
> > mention here what that "representative piece of code" is...

These numbers are misleading .. Doing a total code comparison shows that
a 2.6.14-rc2+ktimers kernel is slightly bigger than a vanilla 2.6.14-rc2
kernel (gcc 4.0, defconfig) .. So your argument that "small is faster"
must mean ktimers is slower, or at least not faster ..  

Making a speed argument based on code size doesn't make much sense to
me, if it's actually faster then show that it's faster. 

Daniel



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [ANNOUNCE] ktimers subsystem
  2005-09-24 13:56                   ` Thomas Gleixner
  2005-09-24 16:51                     ` Daniel Walker
@ 2005-09-24 23:45                     ` Roman Zippel
  2005-09-25 21:00                       ` Thomas Gleixner
  2005-09-25 21:02                       ` Thomas Gleixner
  1 sibling, 2 replies; 50+ messages in thread
From: Roman Zippel @ 2005-09-24 23:45 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, Christopher Friesen, linux-kernel, akpm, george,
	johnstul, paulmck

Hi,

On Sat, 24 Sep 2005, Thomas Gleixner wrote:

> #define timespec_gt(a,b)                       	\
>         (((a).tv_sec > (b).tv_sec) ? 1 :        \
>         (((a).tv_sec < (b).tv_sec) ? 0 :        \
>         ((a).tv_nsec > (b).tv_nsec)))
> 
> #define timespec_addptr(a,b)                            \
>         (a)->tv_sec = ((a)->tv_sec + (b)->tv_sec);      \
>         (a)->tv_nsec = ((a)->tv_nsec + (b)->tv_nsec);   \
>         if ((a)->tv_nsec >= NSEC_PER_SEC){               \
>                 (a)->tv_nsec -= NSEC_PER_SEC;           \
>                 (a)->tv_sec++;                          \
>         }
> 
> #define timespec_addppp(c,a,b)                          \
>         (c)->tv_sec = ((a)->tv_sec + (b)->tv_sec);      \
>         (c)->tv_nsec = ((a)->tv_nsec + (b)->tv_nsec);   \
>         if ((c)->tv_nsec >= NSEC_PER_SEC){               \
>                 (c)->tv_nsec -= NSEC_PER_SEC;           \
>                 (c)->tv_sec++;                          \
>         }

Alternative for ktimespec:

#define timespec_gt(a,b) ((a).tv64 > (b).tv64)

#if BITS_PER_LONG == 64
#define timespec_addptr(a,b)                            \
        (a).tv64 += (b).tv64;                           \
        if ((a).tv.nsec >= NSEC_PER_SEC) {              \
                (a).tv64 += (u32)-NSEC_PER_SEC;         \
        }
        
#define timespec_addppp(c,a,b)                          \
        (c).tv64 = (a).tv64 + (b).tv64;                 \
        if ((c).tv.nsec >= NSEC_PER_SEC) {              \
                (c).tv64 += (u32)-NSEC_PER_SEC;         \
        }
#else
#define timespec_addptr(a,b)                            \
        (a).tv.sec = ((a).tv.sec + (b).tv.sec);         \
        (a).tv.nsec = ((a).tv.nsec + (b).tv.nsec);      \
        if ((a).tv.nsec >= NSEC_PER_SEC) {              \
                (a).tv.nsec -= NSEC_PER_SEC;            \
                (a).tv.sec++;                           \
        }

#define timespec_addppp(c,a,b)                          \
        (c).tv.sec = ((a).tv.sec + (b).tv.sec);         \
        (c).tv.nsec = ((a).tv.nsec + (b).tv.nsec);      \
        if ((c).tv.nsec >= NSEC_PER_SEC) {              \
                (c).tv.nsec -= NSEC_PER_SEC;            \
                (c).tv.sec++;                           \
        }
#endif  

Adding the necessary conversion to the makes the difference even smaller.

> The only point, where (k)timespec has an advantage is that the userspace
> value must not be converted to nsec_t, but deducing therefor this is the
> better overall solution is a fallacy.

That's your opinion...

> nsec_t				ktimespec
> 
> syscall:
> 32x32 mul
> 64bit add			2 x 32bit move
> 
> arm timer:
> 64 bit add			2 x 32 bit add
> 				  32 bit compare
> 				  32 bit sub
> 				  32 bit add
> 
> The 3 operation compensate for the 32x32 
> multiplication.

The multiply is not necessarly cheap, if the arch has no 32x32->64 
instruction, gcc will generate a call to __muldi3().
Overall for the common case both variations don't differ much in speed 
and size (for a single code path). For a few timers it likely doesn't 
matter and for a lot of timers the tree insert likely dominates.

> The backward conversion from nsec_t to timespec is almost a non issue.
> The vast majority of callers dont provide the second argument to
> nanosleep(), setitimer(), set_timer() which makes the conversion
> necessary and I think we optimize for the common use case.

You know very well, that the conversion back to timespec is the killer in 
your calculation. You graciously decide that the "vast majority" doesn't 
want to read the timer, how did you get to that conclusion?

> Besides that the representation of time in nsec_t values is much
> clearer. 

Well, that depends on the bigger picture, mainly how the timesource 
manages the time. We want to optimize them for a fast get(ns)timeofday, so 
we have already timespec based interfaces. Tick based sources will keep a 
cached xtime timespec, so they either have to convert that to ns or 
maintain another cached value just for your ktimers.
As long as you can't get rid of timespec completely (which is impossible), 
there is a value in keeping it as much as possible as timespec.

bye, Roman

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [ANNOUNCE] ktimers subsystem
  2005-09-24 23:45                     ` Roman Zippel
@ 2005-09-25 21:00                       ` Thomas Gleixner
  2005-09-27 16:54                         ` Roman Zippel
  2005-09-25 21:02                       ` Thomas Gleixner
  1 sibling, 1 reply; 50+ messages in thread
From: Thomas Gleixner @ 2005-09-25 21:00 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Ingo Molnar, Christopher Friesen, linux-kernel, akpm, george,
	johnstul, paulmck

On Sun, 2005-09-25 at 01:45 +0200, Roman Zippel wrote:

> > The backward conversion from nsec_t to timespec is almost a non issue.
> > The vast majority of callers dont provide the second argument to
> > nanosleep(), setitimer(), set_timer() which makes the conversion
> > necessary and I think we optimize for the common use case.
> 
> You know very well, that the conversion back to timespec is the killer in 
> your calculation. You graciously decide that the "vast majority" doesn't 
> want to read the timer, how did you get to that conclusion?

I graciously put instrumentation into _all_ the relevant syscalls on a
desktop and a server machine. The result is that less than 1% of the
calls provide the read back variable.

tglx



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [ANNOUNCE] ktimers subsystem
  2005-09-24 23:45                     ` Roman Zippel
  2005-09-25 21:00                       ` Thomas Gleixner
@ 2005-09-25 21:02                       ` Thomas Gleixner
  2005-09-27 16:48                         ` Roman Zippel
  1 sibling, 1 reply; 50+ messages in thread
From: Thomas Gleixner @ 2005-09-25 21:02 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Ingo Molnar, Christopher Friesen, linux-kernel, akpm, george,
	johnstul, paulmck

On Sun, 2005-09-25 at 01:45 +0200, Roman Zippel wrote:

> The multiply is not necessarly cheap, if the arch has no 32x32->64 
> instruction, gcc will generate a call to __muldi3().

Can you please point out which architectures do not have a 32x32->64
instruction ?

tglx



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [ANNOUNCE] ktimers subsystem
  2005-09-25 21:02                       ` Thomas Gleixner
@ 2005-09-27 16:48                         ` Roman Zippel
  2005-09-27 18:38                           ` Tim Bird
  0 siblings, 1 reply; 50+ messages in thread
From: Roman Zippel @ 2005-09-27 16:48 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, Christopher Friesen, linux-kernel, akpm, george,
	johnstul, paulmck

Hi,

On Sun, 25 Sep 2005, Thomas Gleixner wrote:

> On Sun, 2005-09-25 at 01:45 +0200, Roman Zippel wrote:
> 
> > The multiply is not necessarly cheap, if the arch has no 32x32->64 
> > instruction, gcc will generate a call to __muldi3().
> 
> Can you please point out which architectures do not have a 32x32->64
> instruction ?

I have no complete overview. I know that Motorola actually removed that 
instruction in the M68060 (it causes an emulation trap) and it's still not 
back in newer ColdFire cpus. For arm it's an optional instruction in 
earlier versions (v3). For ppc it's splitted into two instructions.

For the rest you might want to check <asm/div64.h>, if div64 has to be 
emulated, there are good chances this instruction has to be emulated as 
well (especially in smaller embedded archs).

bye, Roman

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [ANNOUNCE] ktimers subsystem
  2005-09-25 21:00                       ` Thomas Gleixner
@ 2005-09-27 16:54                         ` Roman Zippel
  2005-09-27 19:03                           ` Tim Bird
  0 siblings, 1 reply; 50+ messages in thread
From: Roman Zippel @ 2005-09-27 16:54 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, Christopher Friesen, linux-kernel, akpm, george,
	johnstul, paulmck

Hi,

On Sun, 25 Sep 2005, Thomas Gleixner wrote:

> > You know very well, that the conversion back to timespec is the killer in 
> > your calculation. You graciously decide that the "vast majority" doesn't 
> > want to read the timer, how did you get to that conclusion?
> 
> I graciously put instrumentation into _all_ the relevant syscalls on a
> desktop and a server machine. The result is that less than 1% of the
> calls provide the read back variable.

That sill means it is used and if an application actually depends on it, 
it would be penalized by your implementation.
These timers may open up new application (in kernel or user space), where 
this conversion may be needed, so _only_ looking at the current numbers is 
a bit misleading.

bye, Roman

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [ANNOUNCE] ktimers subsystem
  2005-09-27 16:48                         ` Roman Zippel
@ 2005-09-27 18:38                           ` Tim Bird
  2005-09-27 20:36                             ` George Anzinger
  0 siblings, 1 reply; 50+ messages in thread
From: Tim Bird @ 2005-09-27 18:38 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Thomas Gleixner, Ingo Molnar, Christopher Friesen, linux-kernel,
	akpm, george, johnstul, paulmck

Roman Zippel wrote:
> On Sun, 25 Sep 2005, Thomas Gleixner wrote:
>>Can you please point out which architectures do not have a 32x32->64
>>instruction ?

<snip>
> For the rest you might want to check <asm/div64.h>, if div64 has to be 
> emulated, there are good chances this instruction has to be emulated as 
> well (especially in smaller embedded archs).

Hmmm.  In my experience, there are several embedded platforms
with a 32x32->64 instruction, which are lacking a div64 instruction.
I don't think checking for div64 is a very good metric here.

=============================
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Electronics
=============================


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [ANNOUNCE] ktimers subsystem
  2005-09-27 16:54                         ` Roman Zippel
@ 2005-09-27 19:03                           ` Tim Bird
  2005-09-28 16:36                             ` Roman Zippel
  0 siblings, 1 reply; 50+ messages in thread
From: Tim Bird @ 2005-09-27 19:03 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Thomas Gleixner, Ingo Molnar, Christopher Friesen, linux-kernel,
	akpm, george, johnstul, paulmck

Roman Zippel wrote:
> On Sun, 25 Sep 2005, Thomas Gleixner wrote:
>> Roman Zippel wrote:
>>> You know very well, that the conversion back to timespec
>>> is the killer in your calculation. You graciously
>>> decide that the "vast majority" doesn't
>>> want to read the timer, how did you get to that
>>> conclusion?
>>
>> I graciously put instrumentation into _all_ the
>> relevant syscalls on a desktop and a server machine.
>> The result is that less than 1% of the
>> calls provide the read back variable.
> 
> That still means it is used and if an application
> actually depends on it, it would be penalized by
> your implementation. These timers may open up new
> application (in kernel or user space), where
> this conversion may be needed, so _only_ looking
> at the current numbers is a bit misleading.

Oh good heavens!  One can always point to real or
hypothetical cases where a change like this
will result in worse performance.  Will you only
be satisfied if there is provably NO performance
degradation for ANY app on ANY platform?  Even
if the code is easier to maintain, and allows
for improvements in functionality and equal or
better performance for the majority of apps.
and platforms?

We're talking about a tradeoff here, and I,
of all people, should be worried about the
possible impact on low-end embedded hardware.
However, having seen some of the problems with
the current timer system in the kernel, I'm in
favor of looking at some abstraction improvements.

Unless I missed something, ktimers has not been
recommended for mainlining yet.  I suspect (without
having measured it myself yet) that the
core abstraction that it proposes (timers
vs. timeouts) is an important one for improving
the kernel timing system.

Personally, I'd like to see it go into
-mm or some other experimental tree, to give
it a proper shakedown.  If some nasty corner
cases show up, then let them show up under testing
rather than via conjecture.
 -- Tim

=============================
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Electronics
=============================


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [ANNOUNCE] ktimers subsystem
  2005-09-27 18:38                           ` Tim Bird
@ 2005-09-27 20:36                             ` George Anzinger
  0 siblings, 0 replies; 50+ messages in thread
From: George Anzinger @ 2005-09-27 20:36 UTC (permalink / raw)
  To: Tim Bird
  Cc: Roman Zippel, Thomas Gleixner, Ingo Molnar, Christopher Friesen,
	linux-kernel, akpm, johnstul, paulmck

Tim Bird wrote:
> Roman Zippel wrote:
> 
>>On Sun, 25 Sep 2005, Thomas Gleixner wrote:
>>
>>>Can you please point out which architectures do not have a 32x32->64
>>>instruction ?
> 
> 
> <snip>
> 
>>For the rest you might want to check <asm/div64.h>, if div64 has to be 
>>emulated, there are good chances this instruction has to be emulated as 
>>well (especially in smaller embedded archs).
> 
> 
> Hmmm.  In my experience, there are several embedded platforms
> with a 32x32->64 instruction, which are lacking a div64 instruction.
> I don't think checking for div64 is a very good metric here.

Also, even having a div64 instruction does not eliminate the asm/div64.h 
as it checks for results that are >32-bits and does the right thing. 
For example, letting the x86 do this divide with such a result, results 
in a trap.

This is why we need to be very careful where we use the div_ll_l_rem() 
which accesses just this instruction.

-- 
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [ANNOUNCE] ktimers subsystem
  2005-09-27 19:03                           ` Tim Bird
@ 2005-09-28 16:36                             ` Roman Zippel
  0 siblings, 0 replies; 50+ messages in thread
From: Roman Zippel @ 2005-09-28 16:36 UTC (permalink / raw)
  To: Tim Bird
  Cc: Thomas Gleixner, Ingo Molnar, Christopher Friesen, linux-kernel,
	akpm, george, johnstul, paulmck

Hi,

On Tue, 27 Sep 2005, Tim Bird wrote:

> > That still means it is used and if an application
> > actually depends on it, it would be penalized by
> > your implementation. These timers may open up new
> > application (in kernel or user space), where
> > this conversion may be needed, so _only_ looking
> > at the current numbers is a bit misleading.
> 
> Oh good heavens!  One can always point to real or
> hypothetical cases where a change like this
> will result in worse performance.  Will you only
> be satisfied if there is provably NO performance
> degradation for ANY app on ANY platform?

I want to get the focus at the complete picture, as this is a rather 
critical area and I will be satisfied, as soon as I can see all 
consequences and possibilities have been considered.

>  Even
> if the code is easier to maintain, and allows
> for improvements in functionality and equal or
> better performance for the majority of apps.
> and platforms?

If that's case, you're hopefully not afraid of a few questions? Why do I 
have to take the code as is and just believe the claims about it?
I like improvements as everyone, but I also want to verify them and look 
at the alternatives and I can't see anything wrong with it.

> Unless I missed something, ktimers has not been
> recommended for mainlining yet.  I suspect (without
> having measured it myself yet) that the
> core abstraction that it proposes (timers
> vs. timeouts) is an important one for improving
> the kernel timing system.

I'm not saying that the idea is wrong, the general direction is fine, but 
some course correction should be possible?

bye, Roman

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [ANNOUNCE] ktimers subsystem
  2005-09-25 18:20 ` Zwane Mwaikambo
@ 2005-09-26  0:02   ` Sid Boyce
  0 siblings, 0 replies; 50+ messages in thread
From: Sid Boyce @ 2005-09-26  0:02 UTC (permalink / raw)
  To: linux-kernel

Zwane Mwaikambo wrote:
> On Sun, 25 Sep 2005, Sid Boyce wrote:
> 
> 
>>OT, but something that's been bugging me for quite a while.
>>I cut and paste the patch from the email to a file ktimers.patch.
>>"# patch -l -p1 <ktimer.patch" and it returns ---
>> (Patch is indented 1 space.)
>>patching file fs/exec.c
>>patch: **** malformed patch at line 16: }
>>
>>If I prepend 2 tabs to the line, it complains about line 17, I do the same to
>>line 17 and on it moves to the next. from the manpage it reads like the "-l"
>>should take care of the tabs so it only compares the text.
>>Can anyone suggest how to apply the patch? Googling didn't help.
> 
> 
> Save the entire email as a text file and apply it. Cut and paste usually 
> introduces white space damage.
> 
> 
> 

Thanks.
regards
Sid.
-- 
Sid Boyce ... Hamradio License G3VBV, licensed Private Pilot
Retired IBM/Amdahl Mainframes and Sun/Fujitsu Servers Tech Support 
Specialist
Microsoft Windows Free Zone - Linux used for all Computing Tasks

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [ANNOUNCE] ktimers subsystem
  2005-09-25 15:48 Sid Boyce
@ 2005-09-25 18:20 ` Zwane Mwaikambo
  2005-09-26  0:02   ` Sid Boyce
  0 siblings, 1 reply; 50+ messages in thread
From: Zwane Mwaikambo @ 2005-09-25 18:20 UTC (permalink / raw)
  To: Sid Boyce; +Cc: linux-kernel

On Sun, 25 Sep 2005, Sid Boyce wrote:

> OT, but something that's been bugging me for quite a while.
> I cut and paste the patch from the email to a file ktimers.patch.
> "# patch -l -p1 <ktimer.patch" and it returns ---
>  (Patch is indented 1 space.)
> patching file fs/exec.c
> patch: **** malformed patch at line 16: }
> 
> If I prepend 2 tabs to the line, it complains about line 17, I do the same to
> line 17 and on it moves to the next. from the manpage it reads like the "-l"
> should take care of the tabs so it only compares the text.
> Can anyone suggest how to apply the patch? Googling didn't help.

Save the entire email as a text file and apply it. Cut and paste usually 
introduces white space damage.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [ANNOUNCE] ktimers subsystem
@ 2005-09-25 15:48 Sid Boyce
  2005-09-25 18:20 ` Zwane Mwaikambo
  0 siblings, 1 reply; 50+ messages in thread
From: Sid Boyce @ 2005-09-25 15:48 UTC (permalink / raw)
  To: linux-kernel

OT, but something that's been bugging me for quite a while.
I cut and paste the patch from the email to a file ktimers.patch.
"# patch -l -p1 <ktimer.patch" and it returns ---
  (Patch is indented 1 space.)
patching file fs/exec.c
patch: **** malformed patch at line 16: }

If I prepend 2 tabs to the line, it complains about line 17, I do the 
same to line 17 and on it moves to the next. from the manpage it reads 
like the "-l" should take care of the tabs so it only compares the text.
Can anyone suggest how to apply the patch? Googling didn't help.
Regards
Sid.
-- 
Sid Boyce ... Hamradio License G3VBV, licensed Private Pilot
Retired IBM/Amdahl Mainframes and Sun/Fujitsu Servers Tech Support 
Specialist
Microsoft Windows Free Zone - Linux used for all Computing Tasks

^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2005-09-28 16:37 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-09-19 16:48 [ANNOUNCE] ktimers subsystem tglx
2005-09-19 16:48 ` [PATCH] " tglx
2005-09-19 21:47 ` [ANNOUNCE] " Thomas Gleixner
2005-09-19 22:03 ` Christoph Lameter
2005-09-19 22:17   ` Thomas Gleixner
2005-09-19 22:24     ` Christoph Lameter
2005-09-19 22:44       ` Thomas Gleixner
2005-09-19 22:50         ` john stultz
2005-09-19 22:58           ` Thomas Gleixner
2005-09-19 23:04         ` Christoph Lameter
2005-09-19 23:12           ` Thomas Gleixner
2005-09-20  7:14             ` Ingo Molnar
2005-09-20  7:10       ` Ingo Molnar
2005-09-21 19:24       ` Pavel Machek
2005-09-19 22:39     ` Christopher Friesen
2005-09-19 22:54       ` Thomas Gleixner
2005-09-20  4:57         ` Christopher Friesen
2005-09-20  5:11           ` Thomas Gleixner
2005-09-20  0:43   ` George Anzinger
2005-09-21 19:50 ` Roman Zippel
2005-09-21 22:41   ` Thomas Gleixner
2005-09-22 12:59     ` Ingo Molnar
2005-09-22 23:09     ` Roman Zippel
2005-09-22 23:31       ` Christopher Friesen
2005-09-23  0:25         ` Roman Zippel
2005-09-23  6:49           ` Thomas Gleixner
2005-09-24  3:15             ` Roman Zippel
2005-09-24  5:16               ` Ingo Molnar
2005-09-24 10:35                 ` Roman Zippel
2005-09-24 13:56                   ` Thomas Gleixner
2005-09-24 16:51                     ` Daniel Walker
2005-09-24 23:45                     ` Roman Zippel
2005-09-25 21:00                       ` Thomas Gleixner
2005-09-27 16:54                         ` Roman Zippel
2005-09-27 19:03                           ` Tim Bird
2005-09-28 16:36                             ` Roman Zippel
2005-09-25 21:02                       ` Thomas Gleixner
2005-09-27 16:48                         ` Roman Zippel
2005-09-27 18:38                           ` Tim Bird
2005-09-27 20:36                             ` George Anzinger
2005-09-23  2:25       ` john stultz
2005-09-23  8:27       ` Thomas Gleixner
2005-09-24  2:43         ` Roman Zippel
2005-09-24  5:03           ` Ingo Molnar
2005-09-24  9:04           ` James Bruce
2005-09-23 15:21       ` Paul E. McKenney
2005-09-24  3:38         ` Roman Zippel
2005-09-25 15:48 Sid Boyce
2005-09-25 18:20 ` Zwane Mwaikambo
2005-09-26  0:02   ` Sid Boyce

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).