* [ANNOUNCE] ktimers subsystem
From: tglx @ 2005-09-19 16:48 UTC
To: linux-kernel; +Cc: mingo, akpm, george, johnstul, paulmck
ktimers separate the "timer API" from the "timeout API". ktimers are
used for:
- nanosleep
- posixtimers
- itimers
The following text explains the rationale behind ktimers. It contains
- a general analysis of the current Linux time(r) core system and
patches / projects related to it.
- a detailed description of necessary changes to the Linux time(r)
core system
- detailed explanation of the ktimer subsystem
- a short explanation of possible follow up patches to demonstrate the
further possibilities of the ktimers subsystem implementation
Why do we need ktimers ?
========================
Authors: Thomas Gleixner, Ingo Molnar
A lot of discussion took place about Linux timekeeping and timers
recently. The efforts to integrate the High Resolution Timer patches
into the -rt tree gave a deep insight into the big picture and initiated
the ktimers implementation. This document is an analysis of all related
issues, with the goal of inclusion of ktimers into mainline.
Linux time(rs) status, very short summary
-----------------------------------------
The current upstream timer implementation of Linux is based on a
periodic system tick (jiffy). This periodic tick initially had a period
of 10ms. During the 2.5 development series this was changed for some
architectures to 1ms and recently corrected to 4ms.
The upstream "time of day" (tod) timekeeping code builds on top of the
periodic ticks and takes architecture dependent sub-tick resolution
mechanisms into account to provide finer resolution. It also implements
synchronization with external time sources such as NTP.
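The tick-based scheme with sub-tick interpolation can be illustrated by a minimal userspace sketch. It takes the wall time recorded at the last tick and adds the time elapsed since then, as measured by a free-running counter. The struct, the function name and the constant counter frequency are assumptions made for illustration; they are not the kernel's actual interface.

```c
#include <stdint.h>

/* Hypothetical snapshot, updated by the periodic tick handler. */
struct tick_snapshot {
	int64_t  xtime_ns;  /* wall clock time at the last tick, in ns */
	uint64_t cycles;    /* free-running counter value at the last tick */
};

/*
 * Wall clock time between ticks: the last tick's time plus the sub-tick
 * offset derived from the counter. Assumes a constant counter frequency
 * and that delta * 1e9 fits into 64 bits, i.e. the counter is read
 * reasonably soon after the tick.
 */
static int64_t interpolated_time_ns(const struct tick_snapshot *s,
				    uint64_t cycles_now, uint64_t counter_hz)
{
	uint64_t delta = cycles_now - s->cycles;  /* unsigned: wrap-safe */

	return s->xtime_ns + (int64_t)(delta * 1000000000ULL / counter_hz);
}
```

With a 1 MHz counter, 250 cycles after a tick at t = 1 s the function returns 1.00025 s.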
Patches and projects related to timers and timekeeping, in chronological order
------------------------------------------------------------------------
- UTIME Usec Resolution Timers
- HRT High Resolution Timers
- VST Variable Scheduling Timeouts
- DTCK Dynamic Ticks
- NEWTOD New timeofday core including reworked NTP
- some related architecture specific code already integrated into the
kernel (mostly s390 related time interpolator code)
- a couple of others - rather odd attempts to change timer
resolution. Mostly single purpose patches.
All of those patches have one thing in common. They are restricted to a
few architectures and address only single problems of timers and
timekeeping.
UTIME: Microsecond resolution timers
-------------------------------------
The patch history goes back to Linux 2.0 and is maintained by Dr.
Douglas Niehaus at Kansas University as part of the KURT (Kansas
University Realtime) project.
Supported platforms: x86, (PPC, ARM partial)
Implementation:
Originally the usec resolution support was available for all users of
the timer core system, but during the course of development it turned
out to be a waste of resources and was restricted to realtime processes.
The implementation is restricted to nanosleep and itimers. POSIX timer
support is planned. Initially it was implemented on top of the timer
wheel, but it was recently converted to a separate list for high
resolution timers.
A related and quite interesting area of activity in this project is
research on the fine-grained synchronization of machines in a network.
HRT: High Resolution Timers
---------------------------
George Anzinger forked the UTIME parts of KURT some years ago and
started the High Resolution Timer project. Its use is restricted to
POSIX timers with the clocks CLOCK_REALTIME_HR and CLOCK_MONOTONIC_HR.
Supported platforms: x86, PPC, PPC64, SH, ARM (partial)
Implementation:
High resolution timers are kept in the timer wheel until they reach the
jiffy in which they expire. At that point they are moved to a separate
list and arm the high resolution timer source. The timer function is
handled in a separate softirq. The high resolution portion of a timer is
kept in a separate field of the timer structure, which holds the
fractional (sub-jiffy) part. This was initially implemented as
subjiffies with a given resolution. It was later changed to a variable
holding time source cycles, to reduce conversion overhead. This affects
the accuracy of cyclic schedules and also imposes some limitations on
high resolution time sources with variable frequencies, e.g. the TSC on
x86.
Possible changes are work in progress.
VST: Variable Scheduling Timeouts
---------------------------------
A patch closely related to and depending on HRT, also maintained by
George Anzinger. It provides the suppression of timer ticks during idle
periods.
Supported platforms: x86, (One ARM platform supported, a related patch
snippet recently sneaked into the ARM core interrupt handling code)
Implementation:
Whenever the system goes into idle state the timer list(s) are scanned
to find the next timer and, if it is reasonably far away, VST turns off
the periodic 1/HZ timer interrupts and sets up a timer interrupt at the
expiry time of said timer. VST also provides a callback list which is
used to notify about idle enter / leave events. On the next interrupt,
be it the VST timer or some other interrupt, the periodic 1/HZ timer is
restarted and the elapsed time (ticks) is properly accounted for.
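The VST idle decision described above can be sketched roughly as follows. This is not VST's code. The function names and the `min_idle_ticks` threshold are hypothetical, and jiffies are modelled as a plain unsigned counter.

```c
#include <stdint.h>

/*
 * On idle entry: number of ticks until the next pending timer, or 0 if
 * that timer is too close for stopping the periodic tick to be worthwhile.
 * Unsigned subtraction keeps this correct across jiffies wraparound.
 */
static unsigned long vst_idle_ticks(unsigned long now_jiffies,
				    unsigned long next_timer_jiffies,
				    unsigned long min_idle_ticks)
{
	unsigned long delta = next_timer_jiffies - now_jiffies;

	return delta > min_idle_ticks ? delta : 0;
}

/*
 * On the next interrupt (the one-shot timer or any other source):
 * account for the whole ticks that elapsed while the tick was stopped.
 */
static unsigned long vst_account_idle(unsigned long jiffies_at_idle,
				      uint64_t idle_ns, uint64_t tick_ns)
{
	return jiffies_at_idle + (unsigned long)(idle_ns / tick_ns);
}
```

With 4 ms ticks, an idle period of 37.5 ms that is ended by some other interrupt advances jiffies by 9.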
DTCK: Dynamic Ticks
-------------------
An implementation similar to VST, but not depending on other patches.
Maintained by Con Kolivas. It also provides the suppression of timer
ticks during idle periods.
Supported platforms: x86
Implementation:
Similar to VST, but completely jiffy bound. Very actively maintained in
recent months, with good progress. The core implementation is leaner
than VST. It contains some x86-specific bits which have to be separated
out. It uses the already existing NO_IDLE_HZ code (s390) instead of
introducing new duplicated functionality. The configuration interface is
integrated into sysfs. A generic notification interface is not available
(yet?).
NEWTOD: New timeofday core including reworked NTP
-------------------------------------------------
John Stultz maintains a set of patches which are related, but have been
split for easier review and discussion.
- Reworked implementation of NTP synchronization
- Separation of time of day timekeeping from the timer core
Supported platforms: x86
Implementation:
The time of day code is completely separated from the periodic tick. The
code provides runtime-configurable time source selection, which is
intended to be of generic (architecture-spanning) use. One of the
possible time sources is, of course, the periodic tick.
The code gets rid of one of the fundamental flaws of Linux timekeeping:
the wrong order of derivation. The current upstream code derives almost
everything except jiffies from the wall clock time (xtime). This is
contrary to almost every time reference in the world. Usually time
references are built on a raw hardware clock which provides a more or
less accurate monotonic time source. This time source is corrected for
frequency skew. The human time conversions, e.g. timezones, are then
implemented on top of the resulting "constant frequency" monotonic time
source.
John's patch addresses this nicely and builds the correct order of clock
source -> frequency correction -> wall clock adjustment. This is one of
the essential preliminaries to implement nonintrusive, simple and
effective high resolution time support.
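A minimal sketch of this order of derivation, under assumed names: a raw counter is scaled by a correctable multiplier into monotonic nanoseconds, and wall clock time is derived from monotonic time plus an offset. The mult/shift scaling is a common fixed-point technique chosen here for illustration. The struct and the function names are hypothetical and are not taken from John's patch.

```c
#include <stdint.h>

/* Hypothetical layered clock: raw counter -> corrected monotonic -> wall. */
struct layered_clock {
	uint64_t last_cycles;     /* raw counter at the last update */
	int64_t  mono_ns;         /* monotonic ns at the last update */
	uint32_t mult;            /* ns per cycle scaled by 2^shift; frequency
				     correction adjusts only this value */
	uint32_t shift;
	int64_t  wall_offset_ns;  /* realtime = monotonic + offset */
};

/* Layers 1+2: raw cycles scaled into frequency-corrected monotonic time. */
static int64_t lc_monotonic(const struct layered_clock *c, uint64_t cycles_now)
{
	uint64_t delta = cycles_now - c->last_cycles;

	return c->mono_ns + (int64_t)((delta * c->mult) >> c->shift);
}

/* Layer 3: wall clock derived from monotonic time, not the other way round. */
static int64_t lc_realtime(const struct layered_clock *c, uint64_t cycles_now)
{
	return lc_monotonic(c, cycles_now) + c->wall_offset_ns;
}

/* Setting the wall clock changes only the offset; monotonic time is untouched. */
static void lc_set_realtime(struct layered_clock *c, uint64_t cycles_now,
			    int64_t new_real_ns)
{
	c->wall_offset_ns = new_real_ns - lc_monotonic(c, cycles_now);
}
```

Because the monotonic value never depends on the wall clock, setting the time cannot disturb it. This is the reverse of the current upstream derivation described above.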
The lively discussion of the patch does not question the general idea.
The main point of criticism is the enforced use of 64-bit variables and
64-bit arithmetic in hot execution paths. This is seen as a penalty for
32-bit architectures and for CPUs with low computing power, which are
often used in embedded devices. However, architectural simplicity is a
strong argument in favor of 64-bit arithmetic, and we have not seen
substantial proof of the overhead. (See also the detailed comparison
of timespec versus 64-bit nsec_t further below.)
Timer related observations
--------------------------
Ticks are a convenient mechanism for a lot of time triggered functions
like scheduler-ticks, timeouts etc. which require limited resolution and
precision.
For time of day timekeeping, which requires sub-tick resolution,
preferably in human time units, ticks introduce a bunch of ugliness,
especially when it comes to synchronizing with high resolution time
sources. Another astonishing implementation detail of the current
timekeeping is that we get the monotonic clock (defined by POSIX as a
continuous clock which cannot be set) by subtracting a variable offset
from the real time clock, which can be set by the user and corrected by
NTP or other mechanisms.
Another well-known drawback of the current tick based implementation is
the fact that ticks happen even on a completely idle system. This is an
undesired behaviour for battery powered devices. Resolving this
currently requires a lot of quirks in the upstream time(rs) system.
The current POSIX timer implementation is also quite complex because it
is implemented on top of the tick timers. For example, we are forced to
keep track of armed CLOCK_REALTIME timers so that they can be readjusted
when the clock has been set. The POSIX timer API, as defined by the
POSIX 1003.1 specification, is inconsistent in its treatment of relative
and absolute CLOCK_REALTIME timers. Absolute timers are influenced by
clock_set, relative timers are not. This also applies to nanosleep. On
the other hand, the specification of nanosleep states that the sleeping
time must not be less than the given timeout measured by CLOCK_REALTIME.
Setting the clock while a nanosleep is pending leads to an interesting
situation:
Process 1                         Process 2
t1 = get_timeofday()
nanosleep(20s)
                                  set_timeofday(relative -20s)
t2 = get_timeofday()
t2 - t1 =~ 0s
Process 1 really slept for 20s, but measured by CLOCK_REALTIME almost
no time has passed.
Implementing high resolution timers on top of the current system also
requires a lot of quirks to keep the timer API usable for both high
resolution and tick based timers.
Such kinds of 'interaction artifacts' between the tick based data
structures and algorithms and the high-resolution data structures, even
if looked at without knowing anything about the time(r) subsystem,
already point in the direction of separating 'high resolution time(r)'
and 'low resolution timeout' APIs and subsystems.
As mentioned earlier, the switch to 1ms ticks during the 2.5 development
series turned out to cause certain regressions and was recently
corrected to 4ms. One common type of regression was 'timer
soft-interrupt overrun', i.e. the processing related to a timer tick
did not finish within one jiffy, causing a domino effect on the timing
quality of the system.
What are the reasons? During the work on integrating high resolution
timers into the -rt tree we observed a lot of details related to this
problem. When the period of the timer tick is changed (i.e. HZ is
changed), the number of slots in the primary timer wheel in the core
code remains unchanged. As a result, the primary timer wheel [into which
the secondary wheels are periodically 'cascaded'] covers a smaller time
span than before. The CONFIG_BASE_SMALL option makes it even worse. Here
is a table of the capacity limits of the primary wheel:
HZ      CONFIG_BASE_SMALL=n   CONFIG_BASE_SMALL=y
100     2560 ms               640 ms
250     1024 ms               256 ms
1000    256 ms                64 ms
So one source of regression is the increased necessity to move
non-expired timers from the outer wheels to the primary wheel.
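The table follows directly from the slot count: 256 slots in the primary wheel (64 with CONFIG_BASE_SMALL=y) multiplied by the tick period. The sketch below uses hypothetical helper names. It reproduces the table and shows why a typical 0.5 second networking timeout lands in an outer wheel at HZ=1000, and therefore has to be cascaded, but not at HZ=100.

```c
/* Span of the primary wheel in milliseconds: slots * (1000 / HZ). */
static unsigned long primary_wheel_span_ms(unsigned long hz, unsigned long slots)
{
	return slots * 1000UL / hz;
}

/*
 * A timer whose expiry is `slots` or more jiffies away does not fit into
 * the primary wheel. It is put into an outer wheel and must be cascaded
 * down later (unless it is deleted first).
 */
static int needs_cascade(unsigned long timeout_ms, unsigned long hz,
			 unsigned long slots)
{
	unsigned long timeout_jiffies = timeout_ms * hz / 1000UL;

	return timeout_jiffies >= slots;
}
```

At HZ=250 the 0.5 s timeout (125 jiffies) still fits into the 256-slot wheel, but it no longer fits into the 64-slot wheel used with CONFIG_BASE_SMALL=y.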
This alone does not explain all the regressions yet though. We did some
instrumentation and statistics on the timer code related to common use
cases, where the regressions showed up - machines with high networking
and/or disk I/O load.
This revealed a reasonable explanation for this behaviour. Both
networking and disk I/O arm a lot of timeout timers (the maximum number
of armed timers observed during the tests was ~400000). The majority of
those timeouts are in the range of 0.5 seconds. As is typical for
timeout timers, most of them never expire. Under high load, however,
they easily live long enough that they have to be moved from the outer
wheel to the primary timer wheel when HZ=1000. Have a look at the timer
cascading code [cascade() in kernel/timer.c] to see the penalty...
Another source of regression is that quite a lot of timer functions
execute long-running code paths. E.g. in the networking code,
rt_secret_rebuild() loops over rt_hash_mask (1024 in my case), over the
entries and over some subsequent variable-sized loops inside each step.
On a 300MHz PPC system this accumulated to a worst case total of >5ms
(!). The networking code contains more of those loops in timer
functions. The worst case scenario is when all of those loops run in
the same jiffy and block the timely delivery of other timers. There are
other culprits, but those in the networking code are the most obvious
ones. This went almost unnoticed on HZ=100 systems, but on HZ=1000
machines those effects surfaced. The change to HZ=250 just hides the
problem rather than solving it.
Another weird effect of the changes to the timer tick period is that a
lot of places in the code still use HZ incorrectly, even today, more
than a year after the switchover. Many of those usages still assume
HZ=100 or reflect a completely wrong understanding of the mechanism
provided by the Linux kernel timer API (which is, of course, unrelated
to the HZ changes).
Together, these observations finally led to the complete separation of
the high resolution timer data structures from the jiffy wheel in the
HRT-RT integration work. (To further reduce latencies we also separated
softirq threads, but that is another topic.)
What is solved by the available patches ?
-----------------------------------------
As said before each of the patches addresses a particular part of the
time(r) related problems. Some of them are competing implementations.
We don't want to belittle the efforts of the individual projects, but
one outstanding patch is John Stultz's work on the new time of day
subsystem. It really addresses one of the substantial Linux time(r)
problems in a very generic and architecture-independent way, upon which
the other efforts can build cleanly.
The other patches mostly relate to tickless systems and high resolution
timers, and are providing great proof of concept implementations but
suffer from the bindings to particular architectures and the
restrictions that the current upstream Linux timer core code imposes
upon them.
What changes are required?
--------------------------
The conclusions of my recent work on Linux time(rs) related problems and
the analysis of related patches are:
1. The HZ/jiffy based usage of time in the kernel code has to be
converted to human time units.
2. A clean separation of all related APIs and subsystems is necessary,
even if they have interdependencies and shared functionality.
| HZ/jiffy conversion
The conversion of users of HZ/jiffy based timing to human time units is
necessary to allow changes to the core timer subsystem without breaking
the users all over the kernel. Looking at the code, most HZ/jiffy timers
are using more or less correct conversions from human time units anyway.
A positive side effect of such a cleanup is the necessary auditing for
correctness.
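The following example shows why raw jiffy values are fragile. The same hard-coded constant means a different time at each HZ. A value expressed in milliseconds and converted with rounding up stays correct. The kernel provides helpers such as msecs_to_jiffies() for this. The standalone functions below are hypothetical stand-ins that take HZ as a parameter so the effect can be shown for several HZ values.

```c
/* Milliseconds to jiffies, rounding up so a timeout is never shortened. */
static unsigned long ms_to_jiffies_at(unsigned long ms, unsigned long hz)
{
	return (ms * hz + 999UL) / 1000UL;
}

/* Jiffies to milliseconds (truncating). */
static unsigned long jiffies_to_ms_at(unsigned long j, unsigned long hz)
{
	return j * 1000UL / hz;
}
```

A hard-coded timeout of 10 jiffies means 100 ms at HZ=100 but only 10 ms at HZ=1000. The 100 ms value converted at run time yields 10, 25 or 100 jiffies as appropriate.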
| API separation
- time sources
- time synchronization
- time of day API
- timers API
- timeout API
- time sources:
The number and the resolution of available time sources vary a lot
across architectures and particular architecture-specific platform
implementations. Some of them can only be detected at run time. NEWTOD
provides an excellent code base for time source abstraction, but a
couple of details have to be discussed:
- resolution selection
- resolution- and architecture-dependent interface
- support for tick-bound and tickless systems, including a clean
integration into the interrupt handling code
- usability of time sources for different purposes (timekeeping,
high/low res event scheduling)
- 32- vs. 64-bit arithmetic
- time synchronisation:
Time synchronization corrects the frequency skew of the time source. It
must provide a pluggable interface for time synchronization mechanisms
to allow a flexible implementation of time synchronization sources:
- None
- NTP
- GPS
- RTC
- ...
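The following is one possible shape for such a pluggable interface, offered only as a sketch. Each synchronization source reports a frequency error, and the core applies that error to the time source's scaling factor. The struct, the parts-per-billion unit and the function names are assumptions; this is not an existing kernel interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical plug-in descriptor for a time synchronization source. */
struct time_sync_source {
	const char *name;
	/* Measured frequency error of the raw clock in parts per billion;
	 * positive means the clock runs fast. */
	int64_t (*freq_error_ppb)(void *priv);
	void *priv;
};

/* "None": no correction at all. */
static int64_t sync_none(void *priv)
{
	(void)priv;
	return 0;
}

/* Example source reporting a fixed, previously measured error (e.g. from
 * a comparison against an RTC); priv points to the value in ppb. */
static int64_t sync_fixed(void *priv)
{
	return *(const int64_t *)priv;
}

/* Apply the selected source's correction to a ns-per-cycle multiplier:
 * a clock that runs fast gets a smaller multiplier. */
static uint32_t corrected_mult(uint32_t base_mult,
			       const struct time_sync_source *src)
{
	int64_t err = src ? src->freq_error_ppb(src->priv) : 0;

	return (uint32_t)((int64_t)base_mult -
			  (int64_t)base_mult * err / 1000000000LL);
}
```

NTP or GPS would be further sources behind the same function pointer. The core only ever sees a frequency error.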
- time of day API:
The time of day API uses the (possibly frequency-corrected) time
sources to implement the "human readable" interface. It is also
responsible for translating the monotonic time (the time since the
system was (re)booted) into wall clock time (real time). ("Real time"
in this context must not be confused with "real time" in the sense of
determinism.)
- timer API:
The timer API provides fine-grained, precise timers related to the time
of day subsystem. It provides the functionality of:
- precise interval scheduling
- precise timeouts
with or without high resolution timing support depending on system
configuration and system capabilities.
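A small simulation shows why the reference point for rearming matters for precise interval scheduling. The functions are hypothetical, and a constant handling latency is assumed. Rearming relative to the time at which the expiry was handled accumulates that latency on every period. Rearming relative to the previous expiry time, which appears to be what the KTIMER_INCR mode in the patch's ktimer.h describes, does not drift.

```c
#include <stdint.h>

/* Expiry time of the n-th period (n >= 1) when each rearm is computed
 * from the moment the previous expiry was handled, `latency` ns late. */
static int64_t nth_expiry_rearm_from_now(int64_t start, int64_t interval,
					 int64_t latency, int n)
{
	int64_t expires = start + interval;

	for (int i = 1; i < n; i++) {
		int64_t handled_at = expires + latency;  /* callback runs late */
		expires = handled_at + interval;         /* error accumulates */
	}
	return expires;
}

/* The same schedule, but each rearm is computed from the previous expiry. */
static int64_t nth_expiry_rearm_from_prev(int64_t start, int64_t interval,
					  int n)
{
	int64_t expires = start + interval;

	for (int i = 1; i < n; i++)
		expires += interval;  /* latency never enters the schedule */
	return expires;
}
```

With a 1 ms period and 50 us of handling latency, the 10th expiry ends up 450 us late in the first variant and exactly on time in the second.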
- timeout API:
The timeout API provides a coarse resolution interface for timeout
purposes. As pointed out before, the majority of timers are related to
timeouts. What is the nature of timeouts?
- Timeout timers are usually armed to cover an error condition
- Most of those timers never expire (the good, non-timer-related
condition arrives before expiry)
- The demands on resolution are usually quite low. It does not make
any difference if an error condition is detected a few or even a
few hundred milliseconds earlier or later. The relevant point is
that the error is detected at all.
On a heavily loaded web server ~95%+ of all timers (almost all of them
armed by network or I/O kernel code) are removed before expiry. The
remaining timers which really expire are mostly timers requested by
application code. These are mainly used for periodic supervisor code,
which checks program status or other application-relevant information,
and for delays.
Conclusion
----------
Before extensions to the current timer implementation, e.g. dynamic
ticks, are included, an API separation and cleanup has to be done.
Integrating new functionality on top of the current code will just
introduce a lot of quirks and oddities, which make the necessary cleanup
and rework harder.
John Stultz's timeofday patches provide an excellent and solid base for
solving the first 3 of the 5 points of the API separation changes.
Ktimers add the timer API separation, with a clean way to integrate high
resolution timekeeping.
The combination of both patches provides the grounds and leads the way
to the cleanup of the timeout API and the implementation of
dyntick/tickless support without introducing additional ugliness.
What is solved by ktimers ?
===========================
ktimers separate the "timer API" from the "timeout API". ktimers are
used for:
- nanosleep
- posixtimers
- itimers
The implementation was done with the following constraints in mind:
- Not bound to jiffies
- Multiple time sources
- Per CPU timer queues
- Simplification of absolute CLOCK_REALTIME posix timers
- High resolution timer aware
- Allows the timeout API to reschedule the next event
(for tickless systems)
Ktimers enqueue the timers into a time-sorted list, which is implemented
with an rbtree. The rbtree is efficient and is already used in other
performance-critical parts of the kernel. This is a bit slower than the
timer wheel. However, since the vast majority of these timers actually
expire, the cost has to be weighed against the cascading penalty.
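The essential property of the time-sorted queue is that the next timer to expire is always at the front. Expiry processing therefore only ever looks at the head and never has to cascade anything. To keep the sketch short, it substitutes a sorted singly linked list for the rbtree. The ordering is the same, but insertion is O(n) instead of O(log n). The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical timer node; struct ktimer uses an rb_node instead of next. */
struct sq_timer {
	int64_t expires;          /* absolute expiry in ns of the base's clock */
	struct sq_timer *next;
};

/* Insert in expiry order; timers with equal expiry keep FIFO order. */
static void sq_enqueue(struct sq_timer **head, struct sq_timer *t)
{
	struct sq_timer **pos = head;

	while (*pos && (*pos)->expires <= t->expires)
		pos = &(*pos)->next;
	t->next = *pos;
	*pos = t;
}

/* Remove and count all timers due at `now`. Only the head is examined, so
 * the cost depends on the number of expired timers, not the queue size. */
static int sq_run_queue(struct sq_timer **head, int64_t now)
{
	int expired = 0;

	while (*head && (*head)->expires <= now) {
		struct sq_timer *t = *head;

		*head = t->next;
		t->next = NULL;
		expired++;  /* a real queue would call the timer function here */
	}
	return expired;
}
```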
The code supports multiple time sources. Currently implemented are
CLOCK_REALTIME and CLOCK_MONOTONIC. They provide separate timer queues
and support functions.
The time-ordered implementation, together with storing the expiry time
in the time of the selected time source, removes the hard work of
reprogramming all armed absolute CLOCK_REALTIME POSIX timers when the
clock is set.
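The following minimal sketch, with assumed names, shows why setting the clock becomes cheap. An absolute CLOCK_REALTIME timer keeps its expiry in realtime nanoseconds, and the realtime base only tracks the offset between monotonic and real time. Setting the clock updates that offset. Whether a timer is due is decided on the next queue run by comparing its expiry against the base's current realtime, so no armed timer has to be touched.

```c
#include <stdint.h>

/* Hypothetical CLOCK_REALTIME timer base. */
struct rt_base {
	int64_t mono_ns;         /* current monotonic time */
	int64_t wall_offset_ns;  /* realtime = monotonic + offset */
};

static int64_t rt_now(const struct rt_base *b)
{
	return b->mono_ns + b->wall_offset_ns;
}

/* Setting the clock updates only the offset; armed timers are untouched. */
static void rt_clock_set(struct rt_base *b, int64_t new_real_ns)
{
	b->wall_offset_ns = new_real_ns - b->mono_ns;
}

/* An absolute timer stores its expiry as a realtime value and is due once
 * the base's realtime reaches it, no matter how that time was reached. */
static int rt_timer_due(const struct rt_base *b, int64_t abs_expires_ns)
{
	return abs_expires_ns <= rt_now(b);
}
```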
During the initial implementation phase a time storage format had to be
chosen. Despite the previous discussion about 64-bit time storage vs.
timespec structures, the decision was made to use plain 64-bit
variables. The rationale behind this is:
1. Simple calculations (add, sub, compare), which are the ones used in
the fast paths, are simpler and easier to handle than with struct
timespec.
2. The storage size is the same on 32-bit systems, but half the size
on 64-bit machines.
3. The resulting binary code size is smaller due to the simpler fast
path operations. This becomes very clear when comparing the binary code
size of a function which resembles parts of the hot paths in the
enqueue and expiry code. All compiled with gcc-3.4 -O2; code sizes in
hexadecimal:
              AMD64   I386   ARM   PPC32   M68K
nsec_t_ops    e2      11c    fc    1ac     ce
timespec_ops  19c     144    1c0   280     156
Smaller binary code usually executes faster.
4. The kludges introduced by timespec arithmetic are horrible to
maintain. The simple and straightforward 64-bit calculations hide far
fewer surprises and potential sources of error.
5. The areas where the 64-bit nsec_t value has to be converted to
timespec / timeval are very restricted and can be optimized
further. Except for one odd POSIX timer related case (cyclic timer
with no signal delivery), the calculation is simple and
straightforward. The most discussed code in John Stultz's timeofday
patch is the conversion in gettimeofday(). This can easily be solved by
low-overhead storage in both formats. The POSIX timespec / timeval
interface to userspace for the apparently often used gettimeofday
syscall must not be used as a red herring to clutter the complete
kernel timer and timekeeping subsystems with this inefficient
representation of time.
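A userspace sketch makes the difference in the fast-path operations visible. Timespec addition and comparison need carry handling and two-field compares. The nsec_t versions are plain 64-bit operations: a single instruction on 64-bit CPUs and a short add-with-carry sequence on 32-bit ones. Converting back to a timespec at the user interface takes one division plus a remainder. The nsec_t typedef mirrors the patch's type; the helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

typedef int64_t nsec_t;  /* assumption: mirrors the patch's 64-bit nsec_t */
#define NS_PER_SEC 1000000000L

/* timespec addition needs an explicit carry from tv_nsec into tv_sec.
 * Assumes both inputs are normalized (0 <= tv_nsec < 1e9). */
static struct timespec ts_add(struct timespec a, struct timespec b)
{
	struct timespec r;

	r.tv_sec = a.tv_sec + b.tv_sec;
	r.tv_nsec = a.tv_nsec + b.tv_nsec;
	if (r.tv_nsec >= NS_PER_SEC) {
		r.tv_sec++;
		r.tv_nsec -= NS_PER_SEC;
	}
	return r;
}

/* timespec comparison needs to look at two fields. */
static int ts_before(struct timespec a, struct timespec b)
{
	return a.tv_sec < b.tv_sec ||
	       (a.tv_sec == b.tv_sec && a.tv_nsec < b.tv_nsec);
}

/* The same operations on nsec_t are plain 64-bit arithmetic. */
static nsec_t ns_add(nsec_t a, nsec_t b) { return a + b; }
static int ns_before(nsec_t a, nsec_t b) { return a < b; }

/* Conversion at the userspace boundary: one division and one remainder,
 * plus a fix-up so that negative values stay normalized. */
static struct timespec ns_to_ts(nsec_t ns)
{
	struct timespec ts;

	ts.tv_sec = (time_t)(ns / NS_PER_SEC);
	ts.tv_nsec = (long)(ns % NS_PER_SEC);
	if (ts.tv_nsec < 0) {
		ts.tv_sec--;
		ts.tv_nsec += NS_PER_SEC;
	}
	return ts;
}
```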
ktimers are available in a patch series for easier review:
- ktimer_base.patch
The basic implementation of ktimers. The timer queues are called
inside the existing timer softirq. The time-ordered queue
implementation allows removing all the abs_list functionality from
posix-timers.c. Converted interfaces: itimers, nanosleep, posix
timers. There is no change to the current kernel time keeping system
required. The base patch utilizes existing interfaces.
The following add-on patches are not intended for ad hoc inclusion, as
they contain third-party patches. The reason for providing this series
is to demonstrate the future use of ktimers and how easily they can be
extended to implement high resolution timers.
In particular, John Stultz's timeofday patch is a completely separate
issue. It is only used here because it makes it possible to provide
high resolution timers in a simple and non-intrusive way.
The full patch series is available from
http://www.tglx.de/private/tglx/ktimers/patches-ktimer-tod-hrt.tar.bz2
- ktimer_hres.patch
Generic extension to the base patch to support high resolution
timers. The high resolution changes are the separation of the
softirq, the high resolution interrupt function and the timer
reprogramming management.
- timeofday_b5 patch
Integration of John Stultz's timeofday patches to have a clean
abstraction of time sources for a non intrusive implementation of
high resolution timers. This patch will be replaced by the reworked
version which is currently developed by John Stultz.
- timeofday_fixup patch
Fixup clashing inlines
- ktimer_tod.patch
Extend the timeofday API and switch ktimers to use the timeofday API
- hres_i386_support.patch
Patch to support high resolution timers on i386 with local APIC
timer used for high resolution events. Proof of concept with the
restriction to local APIC as high resolution event source at the
moment. The main point is to prove that high resolution timers do
not require large and intrusive patches anymore due to the cleanup
of the time system.
Test coverage:
The complete patch is tested with the posix timer tests, which all
pass. It survives a couple of stress tests and shows no flaws when
integrated into the -rt tree. Of course this is brand new code, but
it was designed from the ground up to be simple and robust, and it got
a thorough review by a couple of people.
Some notes on the patch size(s):
- ktimer_base.patch:
17 files changed, 1328 insertions(+), 859 deletions(-)
code (-) 642 (+) 1058
comments (-) 217 (+) 270
The added code is mostly the base functionality of the ktimers
themselves. Most of the cleanup happens in posix-timers.c, where all
the code related to the clock_was_set adjustment of absolute
CLOCK_REALTIME timers is removed. Converts itimers, nanosleep and
posixtimers to ktimer users.
- ktimer_hres.patch
3 files changed, 330 insertions(+), 7 deletions(-)
code (-) 6 (+) 232
comments (-) 1 (+) 98
Adds the generic infrastructure for high resolution timers. No
non-POSIX clocks are introduced; all ktimer users are converted
automatically.
- timeofday_b5.patch
73 files changed, 2926 insertions(+), 2675 deletions(-)
code (-) 2100 (+) 2146
comments (-) 575 (+) 780
A balanced patch providing a great improvement of functionality and
abstraction.
- hres_i386_support.patch
13 files changed, 635 insertions(+), 10 deletions(-)
code (-) 9 (+) 386
comments (-) 1 (+) 249
The largest addon is a header file containing scaled math operations, which
needs to be cleaned up.
- Total patch size
97 files changed, 5238 insertions(+), 3544 deletions(-)
code (-) 2751 (+) 3829
comments (-) 793 (+) 1409
Comparison numbers:
- hrt-common.patch
13 files changed, 1464 insertions(+), 108 deletions(-)
code (-) 91 (+) 879
comments (-) 17 (+) 585
Most code is added to the
Only posixtimers are supported. Adds separate non-POSIX clocks
(CLOCK_REALTIME_HR and CLOCK_MONOTONIC_HR)
- i386-hrt.patch (reduced to apic code)
13 files changed, 1371 insertions(+), 63 deletions(-)
code (-) 51 (+) 910
comments (-) 12 (+) 461
- combined
26 files changed, 2835 insertions(+), 171 deletions(-)
code (-) 142 (+) 1789
comments (-) 29 (+) 1046
Summary:
The ktimer/timeofday/hrt combination adds ~1050 lines of source and
provides a clean API separation and a lot of code/functionality
cleanup.
The high resolution timer patches add ~1750 lines of code for high
resolution time keeping without further functional improvements or
API cleanups.
* [PATCH] ktimers subsystem
From: tglx @ 2005-09-19 16:48 UTC
To: linux-kernel; +Cc: mingo, akpm, george, johnstul, paulmck
ktimers separate the "timer API" from the "timeout API". ktimers are
used for:
- nanosleep
- posixtimers
- itimers
The patch contains the base implementation of ktimers and the
conversion of nanosleep, posixtimers and itimers to ktimer users. The
patch does not require other changes to the Linux time(r) core system.
The implementation was done with the following constraints in mind:
- Not bound to jiffies
- Multiple time sources
- Per CPU timer queues
- Simplification of absolute CLOCK_REALTIME posix timers
- High resolution timer aware
- Allows the timeout API to reschedule the next event
(for tickless systems)
Ktimers enqueue the timers into a time-sorted list, which is implemented
with an rbtree. The rbtree is efficient and is already used in other
performance-critical parts of the kernel. This is a bit slower than the
timer wheel. However, since the vast majority of these timers actually
expire, the cost has to be weighed against the cascading penalty.
The code supports multiple time sources. Currently implemented are
CLOCK_REALTIME and CLOCK_MONOTONIC. They provide separate timer queues
and support functions.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 fs/exec.c                    |    9
 fs/proc/array.c              |    6
 include/asm-generic/div64.h  |   18
 include/linux/ktimer.h       |  143 +++++++
 include/linux/posix-timers.h |   87 ++--
 include/linux/sched.h        |    4
 include/linux/time.h         |   65 ++-
 include/linux/timer.h        |    2
 init/main.c                  |    1
 kernel/Makefile              |    3
 kernel/exit.c                |    2
 kernel/fork.c                |    5
 kernel/itimer.c              |   83 +---
 kernel/ktimers.c             |  845 +++++++++++++++++++++++++++++++++++++++++++
 kernel/posix-cpu-timers.c    |   23 -
 kernel/posix-timers.c        |  832 +++++-----------------------------------
 kernel/timer.c               |   59 ---
 17 files changed, 1328 insertions(+), 859 deletions(-)

Index: linux-2.6.13.ktimers/fs/exec.c
===================================================================
--- linux-2.6.13.ktimers.orig/fs/exec.c
+++ linux-2.6.13.ktimers/fs/exec.c
@@ -650,9 +650,10 @@ static inline int de_thread(struct task_
 		 * synchronize with any firing (by calling del_timer_sync)
 		 * before we can safely let the old group leader die.
 		 */
-		sig->real_timer.data = (unsigned long)current;
-		if (del_timer_sync(&sig->real_timer))
-			add_timer(&sig->real_timer);
+		sig->real_timer.data = current;
+		if (stop_ktimer(&sig->real_timer))
+			start_ktimer(&sig->real_timer, NULL,
+				     KTIMER_RESTART|KTIMER_NOCHECK);
 	}
 	while (atomic_read(&sig->count) > count) {
 		sig->group_exit_task = current;
@@ -664,7 +665,7 @@ static inline int de_thread(struct task_
 	}
 	sig->group_exit_task = NULL;
 	sig->notify_count = 0;
-	sig->real_timer.data = (unsigned long)current;
+	sig->real_timer.data = current;
 	spin_unlock_irq(lock);
 
 	/*
Index: linux-2.6.13.ktimers/fs/proc/array.c
===================================================================
--- linux-2.6.13.ktimers.orig/fs/proc/array.c
+++ linux-2.6.13.ktimers/fs/proc/array.c
@@ -324,7 +324,7 @@ static int do_task_stat(struct task_stru
 	unsigned long min_flt = 0, maj_flt = 0;
 	cputime_t cutime, cstime, utime, stime;
 	unsigned long rsslim = 0;
-	unsigned long it_real_value = 0;
+	nsec_t it_real_value = 0;
 	struct task_struct *t;
 	char tcomm[sizeof(task->comm)];
 
@@ -380,7 +380,7 @@ static int do_task_stat(struct task_stru
 			utime = cputime_add(utime, task->signal->utime);
 			stime = cputime_add(stime, task->signal->stime);
 		}
-		it_real_value = task->signal->it_real_value;
+		it_real_value = task->signal->real_timer.expires;
 	}
 	ppid = pid_alive(task) ? task->group_leader->real_parent->tgid : 0;
 	read_unlock(&tasklist_lock);
@@ -429,7 +429,7 @@ static int do_task_stat(struct task_stru
 		priority,
 		nice,
 		num_threads,
-		jiffies_to_clock_t(it_real_value),
+		(clock_t) nsec_to_clock_t(it_real_value),
 		start_time,
 		vsize,
 		mm ?
get_mm_counter(mm, rss) : 0, /* you might want to shift this left 3 */
Index: linux-2.6.13.ktimers/include/linux/ktimer.h
===================================================================
--- /dev/null
+++ linux-2.6.13.ktimers/include/linux/ktimer.h
@@ -0,0 +1,143 @@
+#ifndef _LINUX_KTIMER_H
+#define _LINUX_KTIMER_H
+
+#include <linux/init.h>
+#include <linux/list.h>
+#include <linux/rbtree.h>
+#include <linux/time.h>
+#include <linux/wait.h>
+
+/* Timer API */
+
+struct ktimer_base;
+
+/*
+ * Timer structure must be initialized by init_ktimer_xxx !
+ */
+struct ktimer {
+	struct rb_node node;
+	struct list_head list;
+	nsec_t expires;
+	nsec_t expired;
+	nsec_t interval;
+	int overrun;
+	unsigned long status;
+	void (*function)(void *);
+	void *data;
+	struct ktimer_base *base;
+};
+
+/*
+ * Timer base struct
+ */
+struct ktimer_base {
+	struct ktimer_base *base;
+	char *name;
+	spinlock_t lock;
+	struct rb_root active;
+	struct list_head pending;
+	int count;
+	unsigned long resolution;
+	nsec_t (*get_time)(void);
+	struct ktimer *running_timer;
+	wait_queue_head_t wait_for_running_timer;
+};
+
+/*
+ * Values for the mode argument of xxx_ktimer functions
+ */
+enum
+{
+	KTIMER_NOREARM,	/* Internal value */
+	KTIMER_ABS,	/* Time value is absolute */
+	KTIMER_REL,	/* Time value is relative to now */
+	KTIMER_INCR,	/* Time value is relative to the previous expiry time */
+	KTIMER_FORWARD,	/* Timer is rearmed with value. Overruns are accounted */
+	KTIMER_REARM,	/* Timer is rearmed with interval. Overruns are accounted */
+	KTIMER_RESTART	/* Timer is restarted with the stored expiry value */
+};
+
+/* Expiry must not be checked when the timer is started */
+#define KTIMER_NOCHECK 0x10000
+
+#define MAX_KTIMER_BASES 4
+#define KTIMER_POISON ((void *) 0x00100101)
+#define KTIMERS_MAX_NSEC (~(1LL<<63))
+
+#define ktimer_before(t1, t2) (t1->expires < t2->expires)
+#define ktimer_active(t) ((t)->status != KTIMER_INACTIVE)
+
+enum
+{
+	KTIMER_INACTIVE,
+	KTIMER_PENDING,
+	KTIMER_EXPIRED,
+	KTIMER_EXPIRED_NOQUEUE,
+};
+
+/* Exported functions */
+extern void fastcall init_ktimer_real(struct ktimer *timer);
+extern void fastcall init_ktimer_mono(struct ktimer *timer);
+extern int modify_ktimer(struct ktimer *timer, nsec_t *tim, int mode);
+extern int start_ktimer(struct ktimer *timer, nsec_t *tim, int mode);
+extern int try_to_stop_ktimer(struct ktimer *timer);
+extern int stop_ktimer(struct ktimer *timer);
+extern nsec_t get_remtime_ktimer(struct ktimer *timer, long fake);
+extern nsec_t get_expiry_ktimer(struct ktimer *timer, nsec_t *now);
+extern void __init init_ktimers(void);
+
+/* Conversion functions with rounding based on resolution */
+extern nsec_t ktimer_convert_timeval(struct ktimer *timer, struct timeval *tv);
+extern nsec_t ktimer_convert_timespec(struct ktimer *timer, struct timespec *ts);
+
+/* Posix timers current quirks */
+extern int get_ktimer_mono_res(clockid_t which_clock, struct timespec *tp);
+extern int get_ktimer_real_res(clockid_t which_clock, struct timespec *tp);
+
+/* nanosleep functions */
+long ktimer_nanosleep_mono(struct timespec *rqtp, struct timespec __user *rmtp, int mode);
+long ktimer_nanosleep_real(struct timespec *rqtp, struct timespec __user *rmtp, int mode);
+
+#if defined(CONFIG_SMP)
+extern void wait_for_ktimer(struct ktimer *timer);
+#else
+#define wait_for_ktimer(t) do {} while (0)
+#endif
+
+#define KTIME_REALTIME_RES (NSEC_PER_SEC/HZ)
+#define KTIME_MONOTONIC_RES (NSEC_PER_SEC/HZ)
+
+static inline void get_ktime_mono_ts(struct timespec *ts)
+{
+	unsigned long seq;
+	struct timespec tomono;
+	do {
+		seq = read_seqbegin(&xtime_lock);
+		getnstimeofday(ts);
+		tomono = wall_to_monotonic;
+	} while (read_seqretry(&xtime_lock, seq));
+
+	set_normalized_timespec(ts, ts->tv_sec + tomono.tv_sec,
+				ts->tv_nsec + tomono.tv_nsec);
+}
+
+static inline nsec_t do_get_ktime_mono(void)
+{
+	struct timespec tm;
+	get_ktime_mono_ts(&tm);
+	return timespec_to_ns(&tm);
+}
+
+#define get_ktime_real_ts(ts) getnstimeofday(ts)
+static inline nsec_t do_get_ktime_real(void)
+{
+	struct timeval now;
+
+	do_gettimeofday(&now);
+	return timeval_to_ns(&now);
+}
+
+#define clock_was_set() do { } while (0)
+extern void run_ktimer_queues(void);
+
+#endif
Index: linux-2.6.13.ktimers/include/linux/posix-timers.h
===================================================================
--- linux-2.6.13.ktimers.orig/include/linux/posix-timers.h
+++ linux-2.6.13.ktimers/include/linux/posix-timers.h
@@ -51,10 +51,9 @@ struct k_itimer {
 	struct sigqueue *sigq;		/* signal queue entry. */
 	union {
 		struct {
-			struct timer_list timer;
-			struct list_head abs_timer_entry; /* clock abs_timer_list */
-			struct timespec wall_to_prev;   /* wall_to_monotonic used when set */
-			unsigned long incr; /* interval in jiffies */
+			struct ktimer timer;
+			nsec_t incr;
+			int overrun;
 		} real;
 		struct cpu_timer_list cpu;
 		struct {
@@ -66,10 +65,6 @@ struct k_itimer {
 	} it;
 };
 
-struct k_clock_abs {
-	struct list_head list;
-	spinlock_t lock;
-};
 struct k_clock {
 	int res;		/* in nano seconds */
 	int (*clock_getres) (clockid_t which_clock, struct timespec *tp);
@@ -77,7 +72,7 @@ struct k_clock {
 	int (*clock_set) (clockid_t which_clock, struct timespec * tp);
 	int (*clock_get) (clockid_t which_clock, struct timespec * tp);
 	int (*timer_create) (struct k_itimer *timer);
-	int (*nsleep) (clockid_t which_clock, int flags, struct timespec *);
+	int (*nsleep) (clockid_t which_clock, int flags, struct timespec *, struct timespec *);
 	int (*timer_set) (struct k_itimer * timr, int flags,
 			  struct itimerspec * new_setting,
 			  struct itimerspec * old_setting);
@@ -91,37 +86,73 @@ void register_posix_clock(clockid_t cloc
 /* Error handlers for timer_create, nanosleep and settime */
 int do_posix_clock_notimer_create(struct k_itimer *timer);
-int do_posix_clock_nonanosleep(clockid_t, int flags, struct timespec *);
+int do_posix_clock_nonanosleep(clockid_t, int flags, struct timespec *, struct timespec __user *);
 int do_posix_clock_nosettime(clockid_t, struct timespec *tp);
 
 /* function to call to trigger timer event */
 int posix_timer_event(struct k_itimer *timr, int si_private);
 
-struct now_struct {
-	unsigned long jiffies;
-};
+#if (BITS_PER_LONG < 64)
+static inline nsec_t forward_posix_timer(struct k_itimer *t, nsec_t now)
+{
+	nsec_t delta = now - t->it.real.timer.expires;
+	unsigned long long orun = 1;
+
+	if (delta < 0)
+		return -delta;
+
+	if (unlikely(delta >= t->it.real.incr)) {
+		if ((t->it.real.incr >> 32) == 0) {
+			do_div(delta, (unsigned long) t->it.real.incr);
+			orun += delta;
+		} else {
+			int sft = 0;
+			u64 t1 = t->it.real.incr;
+			u64 t2 = delta;
+
+			while (t1 >> 32) {
+				sft++;
+				t1 >>= 1;
+			}
+			t2 >>= sft;
+			do_div(t2, (unsigned long) t1);
+			orun += t2;
+			/* The scaled division may be off by one: correct it */
+			while (orun * t->it.real.incr <= delta)
+				orun++;
+			while ((orun - 1) * t->it.real.incr > delta)
+				orun--;
+		}
+	}
+	t->it_overrun += (long) orun;
+	t->it.real.timer.expires += orun * t->it.real.incr;
+	return t->it.real.timer.expires - now;
+}
+#else
+static inline nsec_t forward_posix_timer(struct k_itimer *t, nsec_t now)
+{
+	nsec_t delta = now - t->it.real.timer.expires;
+	unsigned long orun = 1;
+
+	if (delta < 0)
+		return -delta;
+
+	if (unlikely(delta >= t->it.real.incr))
+		orun += delta / t->it.real.incr;
+	t->it.real.timer.expires += orun * t->it.real.incr;
+	t->it_overrun += orun;
+	return t->it.real.timer.expires - now;
+}
+#endif
+
 
-#define posix_get_now(now) (now)->jiffies = jiffies;
-#define posix_time_before(timer, now) \
-	time_before((timer)->expires, (now)->jiffies)
-
-#define posix_bump_timer(timr, now) \
-	do { \
-		long delta, orun; \
-		delta = now.jiffies - (timr)->it.real.timer.expires; \
-		if (delta >= 0) { \
-			orun = 1 + (delta / (timr)->it.real.incr); \
-			(timr)->it.real.timer.expires += \
-				orun * (timr)->it.real.incr; \
-			(timr)->it_overrun += orun; \
-		} \
-	}while (0)
 
 int posix_cpu_clock_getres(clockid_t which_clock, struct timespec *);
 int posix_cpu_clock_get(clockid_t which_clock, struct timespec *);
 int posix_cpu_clock_set(clockid_t which_clock, const struct timespec *tp);
 int posix_cpu_timer_create(struct k_itimer *);
-int posix_cpu_nsleep(clockid_t, int, struct timespec *);
+int posix_cpu_nsleep(clockid_t, int, struct timespec *,
+		     struct timespec __user *);
 int posix_cpu_timer_set(struct k_itimer *, int,
 			struct itimerspec *, struct itimerspec *);
 int posix_cpu_timer_del(struct k_itimer *);
Index: linux-2.6.13.ktimers/include/linux/sched.h
===================================================================
--- linux-2.6.13.ktimers.orig/include/linux/sched.h
+++ linux-2.6.13.ktimers/include/linux/sched.h
@@ -102,6 +102,7 @@ extern unsigned long nr_iowait(void);
 #include <linux/param.h>
 #include <linux/resource.h>
 #include <linux/timer.h>
+#include <linux/ktimer.h>
 
 #include <asm/processor.h>
 
@@ -313,8 +314,7 @@ struct signal_struct {
 	struct list_head posix_timers;
 
 	/* ITIMER_REAL timer for the process */
-	struct timer_list real_timer;
-	unsigned long it_real_value, it_real_incr;
+	struct ktimer real_timer;
 
 	/* ITIMER_PROF and ITIMER_VIRTUAL timers for the process */
 	cputime_t it_prof_expires, it_virt_expires;
Index: linux-2.6.13.ktimers/include/linux/time.h
===================================================================
--- linux-2.6.13.ktimers.orig/include/linux/time.h
+++ linux-2.6.13.ktimers/include/linux/time.h
@@ -5,6 +5,7 @@
 
 #ifdef __KERNEL__
 #include <linux/seqlock.h>
+#include <asm/div64.h>
 #endif
 
 #ifndef _STRUCT_TIMESPEC
@@ -45,6 +46,11 @@ static __inline__ int timespec_equal(str
 	return (a->tv_sec == b->tv_sec) && (a->tv_nsec == b->tv_nsec);
 }
 
+#define timespec_valid(ts) \
+(((ts)->tv_sec >= 0) && (((unsigned) (ts)->tv_nsec) < NSEC_PER_SEC))
+
+typedef s64 nsec_t;
+
 /* Converts Gregorian date to seconds since 1970-01-01 00:00:00.
  * Assumes input in normal date format, i.e. 1980-12-31 23:59:59
  * => year=1980, mon=12, day=31, hour=23, min=59, sec=59.
@@ -95,8 +101,7 @@ struct timespec current_kernel_time(void
 extern void do_gettimeofday(struct timeval *tv);
 extern int do_settimeofday(struct timespec *tv);
 extern int do_sys_settimeofday(struct timespec *tv, struct timezone *tz);
-extern void clock_was_set(void); // call when ever the clock is set
-extern int do_posix_clock_monotonic_gettime(struct timespec *tp);
+extern void do_posix_clock_monotonic_gettime(struct timespec *ts);
 extern long do_nanosleep(struct timespec *t);
 extern long do_utimes(char __user * filename, struct timeval * times);
 struct itimerval;
@@ -121,6 +126,49 @@ set_normalized_timespec (struct timespec
 	ts->tv_nsec = nsec;
 }
 
+static inline void div_sign_safe_ns(struct timespec *ts, nsec_t ns)
+{
+	if (unlikely(ns < 0)) {
+		ts->tv_sec = div_long_long_rem(-ns, NSEC_PER_SEC, &ts->tv_nsec);
+		set_normalized_timespec(ts, -ts->tv_sec, -ts->tv_nsec);
+	} else
+		ts->tv_sec = div_long_long_rem(ns, NSEC_PER_SEC, &ts->tv_nsec);
+}
+
+static __inline__ nsec_t timespec_to_ns(struct timespec *s)
+{
+	nsec_t res = (nsec_t) s->tv_sec * NSEC_PER_SEC;
+	return res + (nsec_t) s->tv_nsec;
+}
+
+static __inline__ struct timespec ns_to_timespec(nsec_t n)
+{
+	struct timespec ts;
+
+	if (n)
+		div_sign_safe_ns(&ts, n);
+	else
+		ts.tv_sec = ts.tv_nsec = 0;
+	return ts;
+}
+
+static __inline__ nsec_t timeval_to_ns(struct timeval *s)
+{
+	nsec_t res = (nsec_t) s->tv_sec * NSEC_PER_SEC;
+	return res + (nsec_t) s->tv_usec * NSEC_PER_USEC;
+}
+
+static __inline__ struct timeval ns_to_timeval(nsec_t n)
+{
+	struct timeval tv;
+	if (n) {
+		div_sign_safe_ns((struct timespec *)&tv, n);
+		tv.tv_usec /= 1000;
+	} else
+		tv.tv_sec = tv.tv_usec = 0;
+	return tv;
+}
+
 #endif /* __KERNEL__ */
 
 #define NFDBITS __NFDBITS
@@ -153,23 +201,18 @@ struct itimerval {
 /*
  * The IDs of the various system clocks (for POSIX.1b interval timers).
  */
-#define CLOCK_REALTIME		  0
-#define CLOCK_MONOTONIC		  1
+#define CLOCK_REALTIME			0
+#define CLOCK_MONOTONIC			1
 #define CLOCK_PROCESS_CPUTIME_ID	2
 #define CLOCK_THREAD_CPUTIME_ID		3
-#define CLOCK_REALTIME_HR		  4
-#define CLOCK_MONOTONIC_HR		  5
 
 /*
  * The IDs of various hardware clocks
  */
-
-
 #define CLOCK_SGI_CYCLE 10
 #define MAX_CLOCKS 16
-#define CLOCKS_MASK (CLOCK_REALTIME | CLOCK_MONOTONIC | \
-		     CLOCK_REALTIME_HR | CLOCK_MONOTONIC_HR)
-#define CLOCKS_MONO (CLOCK_MONOTONIC & CLOCK_MONOTONIC_HR)
+#define CLOCKS_MASK (CLOCK_REALTIME | CLOCK_MONOTONIC)
+#define CLOCKS_MONO (CLOCK_MONOTONIC)
 
 /*
  * The various flags for setting POSIX.1b interval timers.
Index: linux-2.6.13.ktimers/include/linux/timer.h
===================================================================
--- linux-2.6.13.ktimers.orig/include/linux/timer.h
+++ linux-2.6.13.ktimers/include/linux/timer.h
@@ -87,6 +87,6 @@ static inline void add_timer(struct time
 
 extern void init_timers(void);
 extern void run_local_timers(void);
-extern void it_real_fn(unsigned long);
+extern void it_real_fn(void *);
 
 #endif
Index: linux-2.6.13.ktimers/init/main.c
===================================================================
--- linux-2.6.13.ktimers.orig/init/main.c
+++ linux-2.6.13.ktimers/init/main.c
@@ -472,6 +472,7 @@ asmlinkage void __init start_kernel(void
 	init_IRQ();
 	pidhash_init();
 	init_timers();
+	init_ktimers();
 	softirq_init();
 	time_init();
 
Index: linux-2.6.13.ktimers/kernel/Makefile
===================================================================
--- linux-2.6.13.ktimers.orig/kernel/Makefile
+++ linux-2.6.13.ktimers/kernel/Makefile
@@ -7,7 +7,8 @@ obj-y = sched.o fork.o exec_domain.o
 	    sysctl.o capability.o ptrace.o timer.o user.o \
 	    signal.o sys.o kmod.o workqueue.o pid.o \
 	    rcupdate.o intermodule.o extable.o params.o posix-timers.o \
-	    kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o
+	    kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o \
+	    ktimers.o
 
 obj-$(CONFIG_FUTEX) += futex.o
 obj-$(CONFIG_GENERIC_ISA_DMA) += dma.o
Index: linux-2.6.13.ktimers/kernel/exit.c
===================================================================
--- linux-2.6.13.ktimers.orig/kernel/exit.c
+++ linux-2.6.13.ktimers/kernel/exit.c
@@ -830,7 +830,7 @@ fastcall NORET_TYPE void do_exit(long co
 	update_mem_hiwater(tsk);
 	group_dead = atomic_dec_and_test(&tsk->signal->live);
 	if (group_dead) {
-		del_timer_sync(&tsk->signal->real_timer);
+		stop_ktimer(&tsk->signal->real_timer);
 		acct_process(code);
 	}
 	exit_mm(tsk);
Index: linux-2.6.13.ktimers/kernel/fork.c
===================================================================
--- linux-2.6.13.ktimers.orig/kernel/fork.c
+++ linux-2.6.13.ktimers/kernel/fork.c
@@ -774,10 +774,9 @@ static inline int copy_signal(unsigned l
 	init_sigpending(&sig->shared_pending);
 	INIT_LIST_HEAD(&sig->posix_timers);
 
-	sig->it_real_value = sig->it_real_incr = 0;
+	init_ktimer_mono(&sig->real_timer);
 	sig->real_timer.function = it_real_fn;
-	sig->real_timer.data = (unsigned long) tsk;
-	init_timer(&sig->real_timer);
+	sig->real_timer.data = tsk;
 
 	sig->it_virt_expires = cputime_zero;
 	sig->it_virt_incr = cputime_zero;
Index: linux-2.6.13.ktimers/kernel/itimer.c
===================================================================
--- linux-2.6.13.ktimers.orig/kernel/itimer.c
+++ linux-2.6.13.ktimers/kernel/itimer.c
@@ -12,36 +12,22 @@
 #include <linux/syscalls.h>
 #include <linux/time.h>
 #include <linux/posix-timers.h>
+#include <linux/ktimer.h>
 
 #include <asm/uaccess.h>
 
-static unsigned long it_real_value(struct signal_struct *sig)
-{
-	unsigned long val = 0;
-	if (timer_pending(&sig->real_timer)) {
-		val = sig->real_timer.expires - jiffies;
-
-		/* look out for negative/zero itimer.. */
-		if ((long) val <= 0)
-			val = 1;
-	}
-	return val;
-}
-
 int do_getitimer(int which, struct itimerval *value)
 {
 	struct task_struct *tsk = current;
-	unsigned long interval, val;
+	nsec_t interval, val;
 	cputime_t cinterval, cval;
 
 	switch (which) {
 	case ITIMER_REAL:
-		spin_lock_irq(&tsk->sighand->siglock);
-		interval = tsk->signal->it_real_incr;
-		val = it_real_value(tsk->signal);
-		spin_unlock_irq(&tsk->sighand->siglock);
-		jiffies_to_timeval(val, &value->it_value);
-		jiffies_to_timeval(interval, &value->it_interval);
+		interval = tsk->signal->real_timer.interval;
+		val = get_remtime_ktimer(&tsk->signal->real_timer, NSEC_PER_USEC);
+		value->it_value = ns_to_timeval(val);
+		value->it_interval = ns_to_timeval(interval);
 		break;
 	case ITIMER_VIRTUAL:
 		read_lock(&tasklist_lock);
@@ -113,59 +99,34 @@ asmlinkage long sys_getitimer(int which,
 }
 
 
-void it_real_fn(unsigned long __data)
+/*
+ * The timer is automagically restarted, when interval != 0
+ */
+void it_real_fn(void *data)
 {
-	struct task_struct * p = (struct task_struct *) __data;
-	unsigned long inc = p->signal->it_real_incr;
-
-	send_group_sig_info(SIGALRM, SEND_SIG_PRIV, p);
-
-	/*
-	 * Now restart the timer if necessary. We don't need any locking
-	 * here because do_setitimer makes sure we have finished running
-	 * before it touches anything.
-	 * Note, we KNOW we are (or should be) at a jiffie edge here so
-	 * we don't need the +1 stuff. Also, we want to use the prior
-	 * expire value so as to not "slip" a jiffie if we are late.
-	 * Deal with requesting a time prior to "now" here rather than
-	 * in add_timer.
-	 */
-	if (!inc)
-		return;
-	while (time_before_eq(p->signal->real_timer.expires, jiffies))
-		p->signal->real_timer.expires += inc;
-	add_timer(&p->signal->real_timer);
+	send_group_sig_info(SIGALRM, SEND_SIG_PRIV, data);
 }
 
 int do_setitimer(int which, struct itimerval *value, struct itimerval *ovalue)
 {
 	struct task_struct *tsk = current;
-	unsigned long val, interval, expires;
+	struct ktimer *timer;
+	nsec_t expires;
 	cputime_t cval, cinterval, nval, ninterval;
 
 	switch (which) {
 	case ITIMER_REAL:
-again:
-		spin_lock_irq(&tsk->sighand->siglock);
-		interval = tsk->signal->it_real_incr;
-		val = it_real_value(tsk->signal);
-		/* We are sharing ->siglock with it_real_fn() */
-		if (try_to_del_timer_sync(&tsk->signal->real_timer) < 0) {
-			spin_unlock_irq(&tsk->sighand->siglock);
-			goto again;
-		}
-		tsk->signal->it_real_incr =
-			timeval_to_jiffies(&value->it_interval);
-		expires = timeval_to_jiffies(&value->it_value);
-		if (expires)
-			mod_timer(&tsk->signal->real_timer,
-				  jiffies + 1 + expires);
-		spin_unlock_irq(&tsk->sighand->siglock);
+		timer = &tsk->signal->real_timer;
+		stop_ktimer(timer);
 		if (ovalue) {
-			jiffies_to_timeval(val, &ovalue->it_value);
-			jiffies_to_timeval(interval,
-					   &ovalue->it_interval);
+			ovalue->it_value = ns_to_timeval(get_remtime_ktimer(timer, NSEC_PER_USEC));
+			ovalue->it_interval = ns_to_timeval(timer->interval);
 		}
+		timer->interval = ktimer_convert_timeval(timer, &value->it_interval);
+		expires = ktimer_convert_timeval(timer, &value->it_value);
+		if (expires)
+			modify_ktimer(timer, &expires, KTIMER_REL | KTIMER_NOCHECK);
+
 		break;
 	case ITIMER_VIRTUAL:
 		nval = timeval_to_cputime(&value->it_value);
Index: linux-2.6.13.ktimers/kernel/ktimers.c
===================================================================
--- /dev/null
+++ linux-2.6.13.ktimers/kernel/ktimers.c
@@ -0,0 +1,845 @@
+/*
+ *  linux/kernel/ktimers.c
+ *
+ *  Copyright(C) 2005 Thomas Gleixner <tglx@linutronix.de>
+ *
+ *  Kudos to Ingo Molnar for review, criticism, ideas
+ *
+ *  Credits:
+ *	A lot of ideas and implementation details taken from
+ *	timer.c and related code
+ *
+ *  Kernel timers
+ *
+ *  In contrast to the timeout related API found in kernel/timer.c,
+ *  ktimers provide finer resolution and accuracy depending on system
+ *  configuration and capabilities.
+ *
+ *  These timers are used for
+ *  - itimers
+ *  - posixtimers
+ *  - nanosleep
+ *  - precise in kernel timing
+ *
+ *  Please do not abuse this API for simple timeouts.
+ *
+ *  For licencing details see kernel-base/COPYING
+ *
+ */
+
+#include <linux/cpu.h>
+#include <linux/interrupt.h>
+#include <linux/module.h>
+#include <linux/notifier.h>
+#include <linux/percpu.h>
+#include <linux/ktimer.h>
+
+#include <asm/uaccess.h>
+
+/*
+ * The SMP/UP kludge goes here
+ */
+#if defined(CONFIG_SMP)
+
+#define set_running_timer(b,t) b->running_timer = t
+#define wake_up_timer_waiters(b) wake_up(&b->wait_for_running_timer)
+#define ktimer_base_can_change (1)
+/*
+ * Wait for a running timer
+ */
+void wait_for_ktimer(struct ktimer *timer)
+{
+	struct ktimer_base *base = timer->base;
+
+	if (base && base->running_timer == timer)
+		wait_event(base->wait_for_running_timer,
+			   base->running_timer != timer);
+}
+
+/*
+ * We are using hashed locking: holding per_cpu(tvec_bases).t_base.lock
+ * means that all timers which are tied to this base via timer->base are
+ * locked, and the base itself is locked too.
+ *
+ * So __run_timers/migrate_timers can safely modify all timers which could
+ * be found on the lists/queues.
+ *
+ * When the timer's base is locked, and the timer removed from list, it is
+ * possible to set timer->base = NULL and drop the lock: the timer remains
+ * locked.
+ */
+static inline struct ktimer_base *lock_ktimer_base(struct ktimer *timer,
+						    unsigned long *flags)
+{
+	struct ktimer_base *base;
+
+	for (;;) {
+		base = timer->base;
+		if (likely(base != NULL)) {
+			spin_lock_irqsave(&base->lock, *flags);
+			if (likely(base == timer->base))
+				return base;
+			/* The timer has migrated to another CPU */
+			spin_unlock_irqrestore(&base->lock, *flags);
+		}
+		cpu_relax();
+	}
+}
+
+static inline struct ktimer_base *switch_ktimer_base(struct ktimer *timer,
+						      struct ktimer_base *base)
+{
+	struct ktimer_base *new_base = &base->base[raw_smp_processor_id()];
+
+	if (base != new_base) {
+		/*
+		 * We are trying to schedule the timer on the local CPU.
+		 * However we can't change timer's base while it is running,
+		 * so we wait for the running timer.
+		 */
+		if (unlikely(base->running_timer == timer)) {
+			return NULL;
+		} else {
+			/* See the comment in lock_ktimer_base() */
+			timer->base = NULL;
+			spin_unlock(&base->lock);
+			spin_lock(&new_base->lock);
+			timer->base = new_base;
+		}
+	}
+	return new_base;
+}
+
+/*
+ * Get the timer base unlocked
+ *
+ * Take care of timer->base = NULL in switch_ktimer_base !
+ */
+static inline struct ktimer_base *get_ktimer_base_unlocked(struct ktimer *timer)
+{
+	struct ktimer_base *base;
+	while (!(base = timer->base));
+	return base;
+}
+#else
+
+#define set_running_timer(b,t) do {} while (0)
+#define wake_up_timer_waiters(b) do {} while (0)
+
+static inline struct ktimer_base *lock_ktimer_base(struct ktimer *timer,
+						    unsigned long *flags)
+{
+	struct ktimer_base *base;
+
+	base = timer->base;
+	spin_lock_irqsave(&base->lock, *flags);
+	return base;
+}
+
+#define switch_ktimer_base(t, b) b
+
+#define get_ktimer_base_unlocked(t) (t)->base
+#define ktimer_base_can_change (0)
+
+#endif /* !CONFIG_SMP */
+
+/*
+ * Convert timespec to nsec_t with resolution adjustment
+ *
+ * Note: We can access base without locking here, as ktimers can
+ * migrate between CPUs but can not be moved from one clock source to
+ * another. The clock source binding is set at init_ktimer_XXX.
+ */
+nsec_t ktimer_convert_timespec(struct ktimer *timer, struct timespec *ts)
+{
+	struct ktimer_base *base = get_ktimer_base_unlocked(timer);
+	nsec_t t;
+	long rem = ts->tv_nsec % base->resolution;
+
+	t = (nsec_t) ts->tv_sec * NSEC_PER_SEC;
+	t += (nsec_t) ts->tv_nsec;
+
+	/* Check if the value has to be rounded */
+	if (rem)
+		t += (nsec_t) (base->resolution - rem);
+	return t;
+}
+
+/*
+ * Convert timeval to nsec_t with resolution adjustment
+ */
+nsec_t ktimer_convert_timeval(struct ktimer *timer, struct timeval *tv)
+{
+	struct timespec ts;
+
+	ts.tv_sec = tv->tv_sec;
+	ts.tv_nsec = tv->tv_usec * NSEC_PER_USEC;
+
+	return ktimer_convert_timespec(timer, &ts);
+}
+
+/*
+ * Internal function to add or (re)start a timer
+ *
+ * The timer is inserted in expiry order.
+ * Insertion into the red black tree is O(log(n))
+ *
+ */
+static int enqueue_ktimer(struct ktimer *timer, struct ktimer_base *base,
+			  nsec_t *tim, int mode)
+{
+	struct rb_node **link = &base->active.rb_node;
+	struct rb_node *parent = NULL;
+	struct ktimer *entry;
+	struct list_head *prev = &base->pending;
+	nsec_t now;
+
+	/* Get current time */
+	now = base->get_time();
+
+	/* Timer expiry mode */
+	switch (mode & ~KTIMER_NOCHECK) {
+	case KTIMER_ABS:
+		timer->expires = *tim;
+		break;
+	case KTIMER_REL:
+		timer->expires = now + *tim;
+		break;
+	case KTIMER_INCR:
+		timer->expires += *tim;
+		break;
+	case KTIMER_FORWARD:
+		while (timer->expires < now) {
+			timer->expires += *tim;
+			timer->overrun++;
+		}
+		goto nocheck;
+	case KTIMER_REARM:
+		while (timer->expires < now) {
+			timer->expires += timer->interval;
+			timer->overrun++;
+		}
+		goto nocheck;
+	case KTIMER_RESTART:
+		break;
+	default:
+		BUG();
+	}
+
+	/* Already expired */
+	if (timer->expires <= now) {
+		timer->expired = now;
+		/* The caller takes care of expiry */
+		if (!(mode & KTIMER_NOCHECK))
+			return -1;
+	}
+ nocheck:
+
+	while (*link) {
+		parent = *link;
+		entry = rb_entry(parent, struct ktimer, node);
+		/*
+		 * We don't care about collisions. Nodes with
+		 * the same expiry time stay together.
+		 */
+		if (ktimer_before(timer, entry))
+			link = &(*link)->rb_left;
+		else {
+			link = &(*link)->rb_right;
+			prev = &entry->list;
+		}
+	}
+
+	rb_link_node(&timer->node, parent, link);
+	rb_insert_color(&timer->node, &base->active);
+	list_add(&timer->list, prev);
+	timer->status = KTIMER_PENDING;
+	base->count++;
+	return 0;
+}
+
+/*
+ * Internal helper to remove a timer
+ *
+ * The function allows automatic rearming for interval
+ * timers.
+ *
+ */
+static inline void do_remove_ktimer(struct ktimer *timer,
+				    struct ktimer_base *base, int rearm)
+{
+	list_del(&timer->list);
+	rb_erase(&timer->node, &base->active);
+	timer->node.rb_parent = KTIMER_POISON;
+	timer->status = KTIMER_INACTIVE;
+	base->count--;
+	BUG_ON(base->count < 0);
+	/* Auto rearm the timer ? */
+	if (rearm && timer->interval > 0)
+		enqueue_ktimer(timer, base, NULL, KTIMER_REARM);
+}
+
+/*
+ * Called with base lock held
+ */
+static inline int remove_ktimer(struct ktimer *timer, struct ktimer_base *base)
+{
+	if (ktimer_active(timer)) {
+		do_remove_ktimer(timer, base, KTIMER_NOREARM);
+		return 1;
+	}
+	return 0;
+}
+
+/*
+ * Internal function to (re)start a timer.
+ */
+static int internal_restart_ktimer(struct ktimer *timer, nsec_t *tim,
+				   int mode)
+{
+	struct ktimer_base *base, *new_base;
+	unsigned long flags;
+	int ret;
+
+	BUG_ON(!timer->function);
+
+ retry:
+	base = lock_ktimer_base(timer, &flags);
+
+	/* Remove an active timer from the queue */
+	ret = remove_ktimer(timer, base);
+
+	/* Switch the timer base, if necessary */
+	new_base = switch_ktimer_base(timer, base);
+
+	/* SMP */
+	if (ktimer_base_can_change && unlikely(!new_base)) {
+		spin_unlock_irqrestore(&base->lock, flags);
+		wait_for_ktimer(timer);
+		goto retry;
+	}
+	/*
+	 * When the new timer setting is already expired,
+	 * let the calling code deal with it.
+	 */
+	if (enqueue_ktimer(timer, new_base, tim, mode))
+		ret = -1;
+
+	spin_unlock_irqrestore(&new_base->lock, flags);
+	return ret;
+}
+
+/***
+ * modify_ktimer - modify a running timer
+ * @timer: the timer to be modified
+ * @tim: expiry time (required)
+ * @mode: timer setup mode
+ *
+ */
+int modify_ktimer(struct ktimer *timer, nsec_t *tim, int mode)
+{
+	BUG_ON(!tim || !timer->function);
+	return internal_restart_ktimer(timer, tim, mode);
+}
+
+/***
+ * start_ktimer - start a timer on current CPU
+ * @timer: the timer to be added
+ * @tim: expiry time (optional, if not set in the timer)
+ * @mode: timer setup mode
+ */
+int start_ktimer(struct ktimer *timer, nsec_t *tim, int mode)
+{
+	BUG_ON(ktimer_active(timer) || !timer->function);
+
+	return internal_restart_ktimer(timer, tim, mode);
+}
+
+/***
+ * try_to_stop_ktimer - try to deactivate a timer
+ */
+int try_to_stop_ktimer(struct ktimer *timer)
+{
+	struct ktimer_base *base;
+	unsigned long flags;
+	int ret = -1;
+
+	base = lock_ktimer_base(timer, &flags);
+
+	if (base->running_timer != timer) {
+		ret = remove_ktimer(timer, base);
+		if (ret)
+			timer->expired = base->get_time();
+	}
+
+	spin_unlock_irqrestore(&base->lock, flags);
+
+	return ret;
+
+}
+
+/***
+ * stop_ktimer - deactivate a timer and wait for the handler to finish.
+ * @timer: the timer to be deactivated
+ *
+ */
+int stop_ktimer(struct ktimer *timer)
+{
+	for (;;) {
+		int ret = try_to_stop_ktimer(timer);
+		if (ret >= 0)
+			return ret;
+		wait_for_ktimer(timer);
+	}
+}
+
+/***
+ * get_remtime_ktimer - get remaining time for the timer
+ * @timer: the timer to read
+ * @fake: when fake > 0 a pending, but expired timer
+ *	  returns fake (itimers need this, uurg)
+ */
+nsec_t get_remtime_ktimer(struct ktimer *timer, long fake)
+{
+	struct ktimer_base *base;
+	unsigned long flags;
+	nsec_t rem;
+
+	base = lock_ktimer_base(timer, &flags);
+	if (ktimer_active(timer)) {
+		rem = timer->expires - base->get_time();
+		if (fake && rem <= 0)
+			rem = (nsec_t) fake;
+	} else {
+		rem = fake ? 0 : timer->expires - timer->expired;
+	}
+	spin_unlock_irqrestore(&base->lock, flags);
+	return rem;
+}
+
+/***
+ * get_expiry_ktimer - get expiry time for the timer
+ * @timer: the timer to read
+ * @now: if != NULL store current base->time
+ */
+nsec_t get_expiry_ktimer(struct ktimer *timer, nsec_t *now)
+{
+	struct ktimer_base *base;
+	unsigned long flags;
+	nsec_t expiry;
+
+	base = lock_ktimer_base(timer, &flags);
+	expiry = timer->expires;
+	if (now)
+		*now = base->get_time();
+	spin_unlock_irqrestore(&base->lock, flags);
+	return expiry;
+}
+
+/*
+ * Functions related to clock sources
+ */
+static struct ktimer_base *registered_bases[MAX_KTIMER_BASES];
+
+static inline void ktimer_common_init(struct ktimer *timer)
+{
+	memset(timer, 0, sizeof(struct ktimer));
+	timer->node.rb_parent = KTIMER_POISON;
+}
+
+/*
+ * Get monotonic time
+ */
+static nsec_t get_ktime_mono(void)
+{
+	return do_get_ktime_mono();
+}
+
+/*
+ * per-CPU timer queues for monotonic time
+ */
+static struct ktimer_base monotonic_bases[NR_CPUS] __cacheline_aligned = {
+	[0 ... NR_CPUS-1] = {
+		.base = monotonic_bases,
+		.name = "Monotonic",
+		.get_time = &get_ktime_mono,
+		.resolution = KTIME_MONOTONIC_RES,
+	}
+};
+
+/***
+ * init_ktimer_mono - initialize a timer on monotonic time
+ * @timer: the timer to be initialized
+ *
+ */
+void fastcall init_ktimer_mono(struct ktimer *timer)
+{
+	ktimer_common_init(timer);
+	timer->base = &monotonic_bases[raw_smp_processor_id()];
+}
+
+/***
+ * get_ktimer_mono_res - get the monotonic timer resolution
+ *
+ */
+int get_ktimer_mono_res(clockid_t which_clock, struct timespec *tp)
+{
+	tp->tv_sec = 0;
+	tp->tv_nsec = monotonic_bases[0].resolution;
+	return 0;
+}
+
+/*
+ * Get real time
+ */
+static nsec_t get_ktime_real(void)
+{
+	return do_get_ktime_real();
+}
+
+/*
+ * per-CPU timer queues for real time
+ */
+static struct ktimer_base realtime_bases[NR_CPUS] __cacheline_aligned = {
+	[0 ... NR_CPUS-1] = {
+		.base = realtime_bases,
+		.name = "Realtime",
+		.get_time = &get_ktime_real,
+		.resolution = KTIME_REALTIME_RES,
+	}
+};
+
+/***
+ * init_ktimer_real - initialize a timer on real time
+ * @timer: the timer to be initialized
+ *
+ */
+void fastcall init_ktimer_real(struct ktimer *timer)
+{
+	ktimer_common_init(timer);
+	timer->base = &realtime_bases[raw_smp_processor_id()];
+}
+
+/***
+ * get_ktimer_real_res - get the real timer resolution
+ *
+ */
+int get_ktimer_real_res(clockid_t which_clock, struct timespec *tp)
+{
+	tp->tv_sec = 0;
+	tp->tv_nsec = realtime_bases[0].resolution;
+	return 0;
+}
+
+/*
+ * The per base runqueue
+ */
+static inline void run_ktimer_queue(struct ktimer_base *base)
+{
+	nsec_t now = base->get_time();
+
+	spin_lock_irq(&base->lock);
+	for (; !list_empty(&base->pending); ) {
+		void (*fn)(void *);
+		void *data;
+		struct ktimer *timer = list_entry(base->pending.next,
+						  struct ktimer, list);
+		if (timer->expires > now)
+			break;
+		timer->expired = now;
+		fn = timer->function;
+		data = timer->data;
+		set_running_timer(base, timer);
+		do_remove_ktimer(timer, base, KTIMER_REARM);
+		spin_unlock_irq(&base->lock);
+		fn(data);
+		spin_lock_irq(&base->lock);
+		set_running_timer(base, NULL);
+	}
+	spin_unlock_irq(&base->lock);
+	wake_up_timer_waiters(base);
+}
+
+/*
+ * Called from timer softirq every jiffy
+ */
+void run_ktimer_queues(void)
+{
+	int cpu = smp_processor_id();
+	int i;
+
+	for (i = 0; i < MAX_KTIMER_BASES; i++) {
+		if (!registered_bases[i])
+			break;
+		run_ktimer_queue(&registered_bases[i][cpu]);
+	}
+}
+
+/*
+ * Functions related to initialization
+ */
+static void __devinit init_ktimers_cpu(int cpu)
+{
+	struct ktimer_base *base;
+	int i;
+
+	for (i = 0; i < MAX_KTIMER_BASES; i++) {
+		if (!registered_bases[i])
+			break;
+		base = &registered_bases[i][cpu];
+		spin_lock_init(&base->lock);
+		INIT_LIST_HEAD(&base->pending);
+		init_waitqueue_head(&base->wait_for_running_timer);
+	}
+}
+
+#ifdef CONFIG_HOTPLUG_CPU
+static void migrate_ktimer_list(struct ktimer_base *old_base,
+				struct ktimer_base *new_base)
+{
+	struct ktimer *timer;
+	struct rb_node *node;
+
+	while ((node = rb_first(&old_base->active))) {
+		timer = rb_entry(node, struct ktimer, node);
+		remove_ktimer(timer, old_base);
+		timer->base = new_base;
+		enqueue_ktimer(timer, new_base, NULL, KTIMER_RESTART | KTIMER_NOCHECK);
+	}
+}
+
+static void __devinit migrate_ktimers(int cpu)
+{
+	struct ktimer_base *old_base;
+	struct ktimer_base *new_base;
+	int i;
+
+	BUG_ON(cpu_online(cpu));
+
+	local_irq_disable();
+
+	for (i = 0; i < MAX_KTIMER_BASES; i++) {
+		if (!registered_bases[i])
+			break;
+		old_base = &registered_bases[i][cpu];
+		new_base = &registered_bases[i][smp_processor_id()];
+
+		spin_lock(&new_base->lock);
+		spin_lock(&old_base->lock);
+
+		if (old_base->running_timer)
+			BUG();
+
+		migrate_ktimer_list(old_base, new_base);
+
+		spin_unlock(&old_base->lock);
+		spin_unlock(&new_base->lock);
+	}
+
+	local_irq_enable();
+}
+#endif /* CONFIG_HOTPLUG_CPU */
+
+static int __devinit ktimer_cpu_notify(struct notifier_block *self,
+				       unsigned long action, void *hcpu)
+{
+	long cpu = (long)hcpu;
+	switch(action) {
+	case CPU_UP_PREPARE:
+		init_ktimers_cpu(cpu);
+		break;
+#ifdef CONFIG_HOTPLUG_CPU
+	case CPU_DEAD:
+		migrate_ktimers(cpu);
+		break;
+#endif
+	default:
+		break;
+	}
+	return NOTIFY_OK;
+}
+
+static struct notifier_block __devinitdata ktimers_nb = {
+	.notifier_call = ktimer_cpu_notify,
+};
+
+int __init register_ktime_base(struct ktimer_base *base)
+{
+	int i;
+
+	for (i = 0; i < MAX_KTIMER_BASES; i++) {
+		if (!registered_bases[i]) {
+			registered_bases[i] = base;
+			printk("Registered KTimer base %s\n", base->name);
+			return 0;
+		}
+	}
+	return -1;
+}
+
+void __init init_ktimers(void)
+{
+	register_ktime_base(monotonic_bases);
+	register_ktime_base(realtime_bases);
+
+	ktimer_cpu_notify(&ktimers_nb, (unsigned long)CPU_UP_PREPARE,
+			  (void *)(long)smp_processor_id());
+	register_cpu_notifier(&ktimers_nb);
+}
+
+/*
+ * system interface related functions
+ */
+static void process_ktimer(void *data)
+{
+	wake_up_process(data);
+}
+
+/**
+ * schedule_ktimer - sleep until timeout
+ * @timer: the timer to use for the sleep
+ * @t: timeout value; updated to the absolute expiry time on return
+ * @state: state to use for sleep
+ * @mode: timeout value is abs/rel
+ *
+ * Make the current task sleep until @t has elapsed.
+ *
+ * You can set the task state as follows -
+ *
+ * %TASK_UNINTERRUPTIBLE - at least @t is guaranteed to
+ * pass before the routine returns. The routine will return 0
+ *
+ * %TASK_INTERRUPTIBLE - the routine may return early if a signal is
+ * delivered to the current task. In this case the remaining time
+ * will be returned
+ *
+ * The current task state is guaranteed to be TASK_RUNNING when this
+ * routine returns.
+ * + */ +fastcall nsec_t __sched schedule_ktimer(struct ktimer *timer, + nsec_t *t, int state, int mode) +{ + timer->data = current; + timer->function = process_ktimer; + + current->state = state; + if (start_ktimer(timer, t, mode)) { + current->state = TASK_RUNNING; + goto out; + } + if (current->state != TASK_RUNNING) + schedule(); + stop_ktimer(timer); + out: + /* Store the absolute expiry time */ + *t = timer->expires; + /* Return the remaining time */ + return timer->expires - timer->expired; +} + +static long __sched nanosleep_restart(struct ktimer *timer, + struct restart_block *restart) +{ + struct timespec tu; + nsec_t t, rem; + void *rfn = restart->fn; + struct timespec __user *rmtp = (struct timespec __user *) restart->arg2; + + restart->fn = do_no_restart_syscall; + + t = (nsec_t) restart->arg0; + t += ((nsec_t) restart->arg1) << 32; + + rem = schedule_ktimer(timer, &t, TASK_INTERRUPTIBLE, KTIMER_ABS); + + if (rem <= 0) + return 0; + + tu = ns_to_timespec(rem); + if (rmtp && copy_to_user(rmtp, &rem, sizeof(rem))) + return -EFAULT; + + restart->fn = rfn; + /* The other values in restart are already filled in */ + return -ERESTART_RESTARTBLOCK; +} + +static long __sched nanosleep_restart_mono(struct restart_block *restart) +{ + struct ktimer timer; + + init_ktimer_mono(&timer); + return nanosleep_restart(&timer, restart); +} + +static long __sched nanosleep_restart_real(struct restart_block *restart) +{ + struct ktimer timer; + + init_ktimer_real(&timer); + return nanosleep_restart(&timer, restart); +} + +static long ktimer_nanosleep(struct ktimer *timer, struct timespec *rqtp, + struct timespec __user *rmtp, int mode, + long (*rfn)(struct restart_block *)) +{ + struct timespec tu; + nsec_t rem, t; + struct restart_block *restart; + + t = ktimer_convert_timespec(timer, rqtp); + + /* t is updated to absolute expiry time ! 
*/ + rem = schedule_ktimer(timer, &t, TASK_INTERRUPTIBLE, mode); + + if (rem <= 0) + return 0; + + tu = ns_to_timespec(rem); + + if (rmtp && copy_to_user(rmtp, &tu, sizeof(tu))) + return -EFAULT; + + restart = &current_thread_info()->restart_block; + restart->fn = rfn; + restart->arg0 = t & 0xFFFFFFFFLL; + restart->arg1 = t >> 32; + restart->arg2 = (unsigned long) rmtp; + return -ERESTART_RESTARTBLOCK; + +} + +long ktimer_nanosleep_mono(struct timespec *rqtp, + struct timespec __user *rmtp, int mode) +{ + struct ktimer timer; + + init_ktimer_mono(&timer); + return ktimer_nanosleep(&timer, rqtp, rmtp, mode, nanosleep_restart_mono); +} + +long ktimer_nanosleep_real(struct timespec *rqtp, + struct timespec __user *rmtp, int mode) +{ + struct ktimer timer; + + init_ktimer_real(&timer); + return ktimer_nanosleep(&timer, rqtp, rmtp, mode, nanosleep_restart_real); +} + +asmlinkage long sys_nanosleep(struct timespec __user *rqtp, + struct timespec __user *rmtp) +{ + struct timespec tu; + + if (copy_from_user(&tu, rqtp, sizeof(tu))) + return -EFAULT; + + if (!timespec_valid(&tu)) + return -EINVAL; + + return ktimer_nanosleep_mono(&tu, rmtp, KTIMER_REL); +} + Index: linux-2.6.13.ktimers/kernel/posix-cpu-timers.c =================================================================== --- linux-2.6.13.ktimers.orig/kernel/posix-cpu-timers.c +++ linux-2.6.13.ktimers/kernel/posix-cpu-timers.c @@ -1394,7 +1394,7 @@ void set_process_cpu_timer(struct task_s static long posix_cpu_clock_nanosleep_restart(struct restart_block *); int posix_cpu_nsleep(clockid_t which_clock, int flags, - struct timespec *rqtp) + struct timespec *rqtp, struct timespec __user *rmtp) { struct restart_block *restart_block = &current_thread_info()->restart_block; @@ -1419,7 +1419,6 @@ int posix_cpu_nsleep(clockid_t which_clo error = posix_cpu_timer_create(&timer); timer.it_process = current; if (!error) { - struct timespec __user *rmtp; static struct itimerspec zero_it; struct itimerspec it = { .it_value = *rqtp,
.it_interval = {} }; @@ -1466,7 +1465,6 @@ int posix_cpu_nsleep(clockid_t which_clo /* * Report back to the user the time still remaining. */ - rmtp = (struct timespec __user *) restart_block->arg1; if (rmtp != NULL && !(flags & TIMER_ABSTIME) && copy_to_user(rmtp, &it.it_value, sizeof *rmtp)) return -EFAULT; @@ -1474,6 +1472,7 @@ int posix_cpu_nsleep(clockid_t which_clo restart_block->fn = posix_cpu_clock_nanosleep_restart; /* Caller already set restart_block->arg1 */ restart_block->arg0 = which_clock; + restart_block->arg1 = (unsigned long) rmtp; restart_block->arg2 = rqtp->tv_sec; restart_block->arg3 = rqtp->tv_nsec; @@ -1487,10 +1486,15 @@ static long posix_cpu_clock_nanosleep_restart(struct restart_block *restart_block) { clockid_t which_clock = restart_block->arg0; - struct timespec t = { .tv_sec = restart_block->arg2, - .tv_nsec = restart_block->arg3 }; + struct timespec __user *rmtp; + struct timespec t; + + rmtp = (struct timespec __user *) restart_block->arg1; + t.tv_sec = restart_block->arg2; + t.tv_nsec = restart_block->arg3; + restart_block->fn = do_no_restart_syscall; - return posix_cpu_nsleep(which_clock, TIMER_ABSTIME, &t); + return posix_cpu_nsleep(which_clock, TIMER_ABSTIME, &t, rmtp); } @@ -1511,9 +1515,10 @@ static int process_cpu_timer_create(stru return posix_cpu_timer_create(timer); } static int process_cpu_nsleep(clockid_t which_clock, int flags, - struct timespec *rqtp) + struct timespec *rqtp, + struct timespec __user *rmtp) { - return posix_cpu_nsleep(PROCESS_CLOCK, flags, rqtp); + return posix_cpu_nsleep(PROCESS_CLOCK, flags, rqtp, rmtp); } static int thread_cpu_clock_getres(clockid_t which_clock, struct timespec *tp) { @@ -1529,7 +1534,7 @@ static int thread_cpu_timer_create(struc return posix_cpu_timer_create(timer); } static int thread_cpu_nsleep(clockid_t which_clock, int flags, - struct timespec *rqtp) + struct timespec *rqtp, struct timespec __user *rmtp) { return -EINVAL; } Index: linux-2.6.13.ktimers/kernel/posix-timers.c 
=================================================================== --- linux-2.6.13.ktimers.orig/kernel/posix-timers.c +++ linux-2.6.13.ktimers/kernel/posix-timers.c @@ -48,21 +48,6 @@ #include <linux/workqueue.h> #include <linux/module.h> -#ifndef div_long_long_rem -#include <asm/div64.h> - -#define div_long_long_rem(dividend,divisor,remainder) ({ \ - u64 result = dividend; \ - *remainder = do_div(result,divisor); \ - result; }) - -#endif -#define CLOCK_REALTIME_RES TICK_NSEC /* In nano seconds. */ - -static inline u64 mpy_l_X_l_ll(unsigned long mpy1,unsigned long mpy2) -{ - return (u64)mpy1 * mpy2; -} /* * Management arrays for POSIX timers. Timers are kept in slab memory * Timer ids are allocated by an external routine that keeps track of the @@ -148,18 +133,18 @@ static DEFINE_SPINLOCK(idr_lock); */ static struct k_clock posix_clocks[MAX_CLOCKS]; + /* - * We only have one real clock that can be set so we need only one abs list, - * even if we should want to have several clocks with differing resolutions. + * These ones are defined below. 
*/ -static struct k_clock_abs abs_list = {.list = LIST_HEAD_INIT(abs_list.list), - .lock = SPIN_LOCK_UNLOCKED}; +static int common_nsleep(clockid_t, int flags, struct timespec *t, + struct timespec __user *rmtp); +static void common_timer_get(struct k_itimer *, struct itimerspec *); +static int common_timer_set(struct k_itimer *, int, + struct itimerspec *, struct itimerspec *); +static int common_timer_del(struct k_itimer *timer); -static void posix_timer_fn(unsigned long); -static u64 do_posix_clock_monotonic_gettime_parts( - struct timespec *tp, struct timespec *mo); -int do_posix_clock_monotonic_gettime(struct timespec *tp); -static int do_posix_clock_monotonic_get(clockid_t, struct timespec *tp); +static void posix_timer_fn(void *data); static struct k_itimer *lock_timer(timer_t timer_id, unsigned long *flags); @@ -205,21 +190,25 @@ static inline int common_clock_set(clock static inline int common_timer_create(struct k_itimer *new_timer) { - INIT_LIST_HEAD(&new_timer->it.real.abs_timer_entry); - init_timer(&new_timer->it.real.timer); - new_timer->it.real.timer.data = (unsigned long) new_timer; + return -EINVAL; +} + +static int timer_create_mono(struct k_itimer *new_timer) +{ + init_ktimer_mono(&new_timer->it.real.timer); + new_timer->it.real.timer.data = new_timer; + new_timer->it.real.timer.function = posix_timer_fn; + return 0; +} + +static int timer_create_real(struct k_itimer *new_timer) +{ + init_ktimer_real(&new_timer->it.real.timer); + new_timer->it.real.timer.data = new_timer; new_timer->it.real.timer.function = posix_timer_fn; return 0; } -/* - * These ones are defined below. - */ -static int common_nsleep(clockid_t, int flags, struct timespec *t); -static void common_timer_get(struct k_itimer *, struct itimerspec *); -static int common_timer_set(struct k_itimer *, int, - struct itimerspec *, struct itimerspec *); -static int common_timer_del(struct k_itimer *timer); /* * Return nonzero iff we know a priori this clockid_t value is bogus. 
@@ -239,19 +228,44 @@ static inline int invalid_clockid(clocki return 1; } +/* + * Get real time for posix timers + */ +static int posix_get_ktime_real_ts(clockid_t which_clock, struct timespec *tp) +{ + get_ktime_real_ts(tp); + return 0; +} + +/* + * Get monotonic time for posix timers + */ +static int posix_get_ktime_mono_ts(clockid_t which_clock, struct timespec *tp) +{ + get_ktime_mono_ts(tp); + return 0; +} + +void do_posix_clock_monotonic_gettime(struct timespec *ts) +{ + get_ktime_mono_ts(ts); +} /* * Initialize everything, well, just everything in Posix clocks/timers ;) */ static __init int init_posix_timers(void) { - struct k_clock clock_realtime = {.res = CLOCK_REALTIME_RES, - .abs_struct = &abs_list + struct k_clock clock_realtime = { + .clock_getres = get_ktimer_real_res, + .clock_get = posix_get_ktime_real_ts, + .timer_create = timer_create_real, }; - struct k_clock clock_monotonic = {.res = CLOCK_REALTIME_RES, - .abs_struct = NULL, - .clock_get = do_posix_clock_monotonic_get, - .clock_set = do_posix_clock_nosettime + struct k_clock clock_monotonic = { + .clock_getres = get_ktimer_mono_res, + .clock_get = posix_get_ktime_mono_ts, + .clock_set = do_posix_clock_nosettime, + .timer_create = timer_create_mono, }; register_posix_clock(CLOCK_REALTIME, &clock_realtime); @@ -265,117 +279,17 @@ static __init int init_posix_timers(void __initcall(init_posix_timers); -static void tstojiffie(struct timespec *tp, int res, u64 *jiff) -{ - long sec = tp->tv_sec; - long nsec = tp->tv_nsec + res - 1; - - if (nsec > NSEC_PER_SEC) { - sec++; - nsec -= NSEC_PER_SEC; - } - - /* - * The scaling constants are defined in <linux/time.h> - * The difference between there and here is that we do the - * res rounding and compute a 64-bit result (well so does that - * but it then throws away the high bits). 
- */ - *jiff = (mpy_l_X_l_ll(sec, SEC_CONVERSION) + - (mpy_l_X_l_ll(nsec, NSEC_CONVERSION) >> - (NSEC_JIFFIE_SC - SEC_JIFFIE_SC))) >> SEC_JIFFIE_SC; -} - -/* - * This function adjusts the timer as needed as a result of the clock - * being set. It should only be called for absolute timers, and then - * under the abs_list lock. It computes the time difference and sets - * the new jiffies value in the timer. It also updates the timers - * reference wall_to_monotonic value. It is complicated by the fact - * that tstojiffies() only handles positive times and it needs to work - * with both positive and negative times. Also, for negative offsets, - * we need to defeat the res round up. - * - * Return is true if there is a new time, else false. - */ -static long add_clockset_delta(struct k_itimer *timr, - struct timespec *new_wall_to) -{ - struct timespec delta; - int sign = 0; - u64 exp; - - set_normalized_timespec(&delta, - new_wall_to->tv_sec - - timr->it.real.wall_to_prev.tv_sec, - new_wall_to->tv_nsec - - timr->it.real.wall_to_prev.tv_nsec); - if (likely(!(delta.tv_sec | delta.tv_nsec))) - return 0; - if (delta.tv_sec < 0) { - set_normalized_timespec(&delta, - -delta.tv_sec, - 1 - delta.tv_nsec - - posix_clocks[timr->it_clock].res); - sign++; - } - tstojiffie(&delta, posix_clocks[timr->it_clock].res, &exp); - timr->it.real.wall_to_prev = *new_wall_to; - timr->it.real.timer.expires += (sign ? -exp : exp); - return 1; -} - -static void remove_from_abslist(struct k_itimer *timr) -{ - if (!list_empty(&timr->it.real.abs_timer_entry)) { - spin_lock(&abs_list.lock); - list_del_init(&timr->it.real.abs_timer_entry); - spin_unlock(&abs_list.lock); - } -} static void schedule_next_timer(struct k_itimer *timr) { - struct timespec new_wall_to; - struct now_struct now; - unsigned long seq; - - /* - * Set up the timer for the next interval (if there is one). 
- * Note: this code uses the abs_timer_lock to protect - * it.real.wall_to_prev and must hold it until exp is set, not exactly - * obvious... - - * This function is used for CLOCK_REALTIME* and - * CLOCK_MONOTONIC* timers. If we ever want to handle other - * CLOCKs, the calling code (do_schedule_next_timer) would need - * to pull the "clock" info from the timer and dispatch the - * "other" CLOCKs "next timer" code (which, I suppose should - * also be added to the k_clock structure). - */ if (!timr->it.real.incr) return; - do { - seq = read_seqbegin(&xtime_lock); - new_wall_to = wall_to_monotonic; - posix_get_now(&now); - } while (read_seqretry(&xtime_lock, seq)); - - if (!list_empty(&timr->it.real.abs_timer_entry)) { - spin_lock(&abs_list.lock); - add_clockset_delta(timr, &new_wall_to); - - posix_bump_timer(timr, now); - - spin_unlock(&abs_list.lock); - } else { - posix_bump_timer(timr, now); - } - timr->it_overrun_last = timr->it_overrun; - timr->it_overrun = -1; + timr->it_overrun_last = timr->it.real.overrun; + timr->it.real.overrun = timr->it.real.timer.overrun = -1; ++timr->it_requeue_pending; - add_timer(&timr->it.real.timer); + start_ktimer(&timr->it.real.timer, &timr->it.real.incr, KTIMER_FORWARD); + timr->it.real.overrun = timr->it.real.timer.overrun; } /* @@ -413,14 +327,7 @@ int posix_timer_event(struct k_itimer *t { memset(&timr->sigq->info, 0, sizeof(siginfo_t)); timr->sigq->info.si_sys_private = si_private; - /* - * Send signal to the process that owns this timer. - - * This code assumes that all the possible abs_lists share the - * same lock (there is only one list at this time). If this is - * not the case, the CLOCK info would need to be used to find - * the proper abs list lock. - */ + /* Send signal to the process that owns this timer.*/ timr->sigq->info.si_signo = timr->it_sigev_signo; timr->sigq->info.si_errno = 0; @@ -452,65 +359,28 @@ EXPORT_SYMBOL_GPL(posix_timer_event); * This code is for CLOCK_REALTIME* and CLOCK_MONOTONIC* timers. 
*/ -static void posix_timer_fn(unsigned long __data) +static void posix_timer_fn(void *data) { - struct k_itimer *timr = (struct k_itimer *) __data; + struct k_itimer *timr = data; unsigned long flags; - unsigned long seq; - struct timespec delta, new_wall_to; - u64 exp = 0; - int do_notify = 1; + int si_private = 0; spin_lock_irqsave(&timr->it_lock, flags); - if (!list_empty(&timr->it.real.abs_timer_entry)) { - spin_lock(&abs_list.lock); - do { - seq = read_seqbegin(&xtime_lock); - new_wall_to = wall_to_monotonic; - } while (read_seqretry(&xtime_lock, seq)); - set_normalized_timespec(&delta, - new_wall_to.tv_sec - - timr->it.real.wall_to_prev.tv_sec, - new_wall_to.tv_nsec - - timr->it.real.wall_to_prev.tv_nsec); - if (likely((delta.tv_sec | delta.tv_nsec ) == 0)) { - /* do nothing, timer is on time */ - } else if (delta.tv_sec < 0) { - /* do nothing, timer is already late */ - } else { - /* timer is early due to a clock set */ - tstojiffie(&delta, - posix_clocks[timr->it_clock].res, - &exp); - timr->it.real.wall_to_prev = new_wall_to; - timr->it.real.timer.expires += exp; - add_timer(&timr->it.real.timer); - do_notify = 0; - } - spin_unlock(&abs_list.lock); - } - if (do_notify) { - int si_private=0; + if (timr->it.real.incr) + si_private = ++timr->it_requeue_pending; - if (timr->it.real.incr) - si_private = ++timr->it_requeue_pending; - else { - remove_from_abslist(timr); - } + if (posix_timer_event(timr, si_private)) + /* + * signal was not sent because of sig_ignor + * we will not get a call back to restart it AND + * it should be restarted. + */ + schedule_next_timer(timr); - if (posix_timer_event(timr, si_private)) - /* - * signal was not sent because of sig_ignor - * we will not get a call back to restart it AND - * it should be restarted. 
- */ - schedule_next_timer(timr); - } unlock_timer(timr, flags); /* hold thru abs lock to keep irq off */ } - static inline struct task_struct * good_sigevent(sigevent_t * event) { struct task_struct *rtn = current->group_leader; @@ -774,39 +644,39 @@ static struct k_itimer * lock_timer(time static void common_timer_get(struct k_itimer *timr, struct itimerspec *cur_setting) { - unsigned long expires; - struct now_struct now; + nsec_t expires, now, remaining; + struct ktimer *timer = &timr->it.real.timer; - do - expires = timr->it.real.timer.expires; - while ((volatile long) (timr->it.real.timer.expires) != expires); - - posix_get_now(&now); - - if (expires && - ((timr->it_sigev_notify & ~SIGEV_THREAD_ID) == SIGEV_NONE) && - !timr->it.real.incr && - posix_time_before(&timr->it.real.timer, &now)) - timr->it.real.timer.expires = expires = 0; - if (expires) { - if (timr->it_requeue_pending & REQUEUE_PENDING || - (timr->it_sigev_notify & ~SIGEV_THREAD_ID) == SIGEV_NONE) { - posix_bump_timer(timr, now); - expires = timr->it.real.timer.expires; - } - else - if (!timer_pending(&timr->it.real.timer)) - expires = 0; - if (expires) - expires -= now.jiffies; - } - jiffies_to_timespec(expires, &cur_setting->it_value); - jiffies_to_timespec(timr->it.real.incr, &cur_setting->it_interval); - - if (cur_setting->it_value.tv_sec < 0) { + memset(cur_setting, 0, sizeof(struct itimerspec)); + expires = get_expiry_ktimer(timer, &now); + remaining = expires - now; + /* Time left ? or timer pending */ + if (remaining > 0 || ktimer_active(timer)) + goto calci; + /* interval timer ? */ + if (!timr->it.real.incr) + return; + /* + * When a requeue is pending or this is a SIGEV_NONE timer + * move the expiry time forward by intervals, so expiry is > + * now. + * The active (non SIGEV_NONE) rearm should be done + * automatically by the ktimer REARM mode. That's the next + * iteration. The REQUEUE_PENDING part will go away !
+ */ + if (timr->it_requeue_pending & REQUEUE_PENDING || + (timr->it_sigev_notify & ~SIGEV_THREAD_ID) == SIGEV_NONE) { + remaining = forward_posix_timer(timr, now); + } + calci: + /* interval timer ? */ + if (timr->it.real.incr) + cur_setting->it_interval = ns_to_timespec(timr->it.real.incr); + /* Return 0 only, when the timer is expired and not pending */ + if (remaining <= 0) cur_setting->it_value.tv_nsec = 1; - cur_setting->it_value.tv_sec = 0; - } + else + cur_setting->it_value = ns_to_timespec(remaining); } /* Get the time remaining on a POSIX.1b interval timer. */ @@ -830,6 +700,7 @@ sys_timer_gettime(timer_t timer_id, stru return 0; } + /* * Get the number of overruns of a POSIX.1b interval timer. This is to * be the overrun of the timer last delivered. At the same time we are @@ -856,84 +727,6 @@ sys_timer_getoverrun(timer_t timer_id) return overrun; } -/* - * Adjust for absolute time - * - * If absolute time is given and it is not CLOCK_MONOTONIC, we need to - * adjust for the offset between the timer clock (CLOCK_MONOTONIC) and - * what ever clock he is using. - * - * If it is relative time, we need to add the current (CLOCK_MONOTONIC) - * time to it to get the proper time for the timer. 
- */ -static int adjust_abs_time(struct k_clock *clock, struct timespec *tp, - int abs, u64 *exp, struct timespec *wall_to) -{ - struct timespec now; - struct timespec oc = *tp; - u64 jiffies_64_f; - int rtn =0; - - if (abs) { - /* - * The mask pick up the 4 basic clocks - */ - if (!((clock - &posix_clocks[0]) & ~CLOCKS_MASK)) { - jiffies_64_f = do_posix_clock_monotonic_gettime_parts( - &now, wall_to); - /* - * If we are doing a MONOTONIC clock - */ - if((clock - &posix_clocks[0]) & CLOCKS_MONO){ - now.tv_sec += wall_to->tv_sec; - now.tv_nsec += wall_to->tv_nsec; - } - } else { - /* - * Not one of the basic clocks - */ - clock->clock_get(clock - posix_clocks, &now); - jiffies_64_f = get_jiffies_64(); - } - /* - * Take away now to get delta and normalize - */ - set_normalized_timespec(&oc, oc.tv_sec - now.tv_sec, - oc.tv_nsec - now.tv_nsec); - }else{ - jiffies_64_f = get_jiffies_64(); - } - /* - * Check if the requested time is prior to now (if so set now) - */ - if (oc.tv_sec < 0) - oc.tv_sec = oc.tv_nsec = 0; - - if (oc.tv_sec | oc.tv_nsec) - set_normalized_timespec(&oc, oc.tv_sec, - oc.tv_nsec + clock->res); - tstojiffie(&oc, clock->res, exp); - - /* - * Check if the requested time is more than the timer code - * can handle (if so we error out but return the value too). - */ - if (*exp > ((u64)MAX_JIFFY_OFFSET)) - /* - * This is a considered response, not exactly in - * line with the standard (in fact it is silent on - * possible overflows). We assume such a large - * value is ALMOST always a programming error and - * try not to compound it by setting a really dumb - * value. - */ - rtn = -EINVAL; - /* - * return the actual jiffies expire time, full 64 bits - */ - *exp += jiffies_64_f; - return rtn; -} /* Set a POSIX.1b interval timer. */ /* timr->it_lock is taken. 
*/ @@ -941,8 +734,8 @@ static inline int common_timer_set(struct k_itimer *timr, int flags, struct itimerspec *new_setting, struct itimerspec *old_setting) { - struct k_clock *clock = &posix_clocks[timr->it_clock]; - u64 expire_64; + nsec_t expires; + int mode; if (old_setting) common_timer_get(timr, old_setting); @@ -953,56 +746,40 @@ common_timer_set(struct k_itimer *timr, * careful here. If smp we could be in the "fire" routine which will * be spinning as we hold the lock. But this is ONLY an SMP issue. */ - if (try_to_del_timer_sync(&timr->it.real.timer) < 0) { -#ifdef CONFIG_SMP - /* - * It can only be active if on an other cpu. Since - * we have cleared the interval stuff above, it should - * clear once we release the spin lock. Of course once - * we do that anything could happen, including the - * complete melt down of the timer. So return with - * a "retry" exit status. - */ + if (try_to_stop_ktimer(&timr->it.real.timer) < 0) return TIMER_RETRY; -#endif - } - - remove_from_abslist(timr); timr->it_requeue_pending = (timr->it_requeue_pending + 2) & ~REQUEUE_PENDING; timr->it_overrun_last = 0; timr->it_overrun = -1; - /* - *switch off the timer when it_value is zero - */ - if (!new_setting->it_value.tv_sec && !new_setting->it_value.tv_nsec) { - timr->it.real.timer.expires = 0; + + /* switch off the timer when it_value is zero */ + if (!new_setting->it_value.tv_sec && !new_setting->it_value.tv_nsec) return 0; - } - if (adjust_abs_time(clock, - &new_setting->it_value, flags & TIMER_ABSTIME, - &expire_64, &(timr->it.real.wall_to_prev))) { - return -EINVAL; - } - timr->it.real.timer.expires = (unsigned long)expire_64; - tstojiffie(&new_setting->it_interval, clock->res, &expire_64); - timr->it.real.incr = (unsigned long)expire_64; + mode = flags & TIMER_ABSTIME ? KTIMER_ABS : KTIMER_REL; - /* - * We do not even queue SIGEV_NONE timers! But we do put them - * in the abs list so we can do that right. + /* Posix madness. 
Only absolute CLOCK_REALTIME timers + * are affected by clock sets. So we must reinitialize + * the timer. */ + if (timr->it_clock == CLOCK_REALTIME && mode == KTIMER_ABS) + timer_create_real(timr); + else + timer_create_mono(timr); + + expires = ktimer_convert_timespec(&timr->it.real.timer, + &new_setting->it_value); + /* This should be moved to the auto rearm code */ + timr->it.real.incr = ktimer_convert_timespec(&timr->it.real.timer, + &new_setting->it_interval); + + /* SIGEV_NONE timers are not queued ! See common_timer_get */ if (((timr->it_sigev_notify & ~SIGEV_THREAD_ID) != SIGEV_NONE)) - add_timer(&timr->it.real.timer); + start_ktimer(&timr->it.real.timer, &expires, + mode | KTIMER_NOCHECK); - if (flags & TIMER_ABSTIME && clock->abs_struct) { - spin_lock(&clock->abs_struct->lock); - list_add_tail(&(timr->it.real.abs_timer_entry), - &(clock->abs_struct->list)); - spin_unlock(&clock->abs_struct->lock); - } return 0; } @@ -1037,6 +814,7 @@ retry: unlock_timer(timr, flag); if (error == TIMER_RETRY) { + wait_for_ktimer(&timr->it.real.timer); rtn = NULL; // We already got the old time... goto retry; } @@ -1052,22 +830,8 @@ static inline int common_timer_del(struc { timer->it.real.incr = 0; - if (try_to_del_timer_sync(&timer->it.real.timer) < 0) { -#ifdef CONFIG_SMP - /* - * It can only be active if on an other cpu. Since - * we have cleared the interval stuff above, it should - * clear once we release the spin lock. Of course once - * we do that anything could happen, including the - * complete melt down of the timer. So return with - * a "retry" exit status.
- */ + if (try_to_stop_ktimer(&timer->it.real.timer) < 0) return TIMER_RETRY; -#endif - } - - remove_from_abslist(timer); - return 0; } @@ -1083,24 +847,17 @@ sys_timer_delete(timer_t timer_id) struct k_itimer *timer; long flags; -#ifdef CONFIG_SMP - int error; retry_delete: -#endif timer = lock_timer(timer_id, &flags); if (!timer) return -EINVAL; -#ifdef CONFIG_SMP - error = timer_delete_hook(timer); - - if (error == TIMER_RETRY) { + if (timer_delete_hook(timer) == TIMER_RETRY) { unlock_timer(timer, flags); + wait_for_ktimer(&timer->it.real.timer); goto retry_delete; } -#else - timer_delete_hook(timer); -#endif + spin_lock(&current->sighand->siglock); list_del(&timer->list); spin_unlock(&current->sighand->siglock); @@ -1117,6 +874,7 @@ retry_delete: release_posix_timer(timer, IT_ID_SET); return 0; } + /* * return timer owned by the process, used by exit_itimers */ @@ -1124,22 +882,14 @@ static inline void itimer_delete(struct { unsigned long flags; -#ifdef CONFIG_SMP - int error; retry_delete: -#endif spin_lock_irqsave(&timer->it_lock, flags); -#ifdef CONFIG_SMP - error = timer_delete_hook(timer); - - if (error == TIMER_RETRY) { + if (timer_delete_hook(timer) == TIMER_RETRY) { unlock_timer(timer, flags); + wait_for_ktimer(&timer->it.real.timer); goto retry_delete; } -#else - timer_delete_hook(timer); -#endif list_del(&timer->list); /* * This keeps any tasks waiting on the spin lock from thinking @@ -1168,60 +918,7 @@ void exit_itimers(struct signal_struct * } } -/* - * And now for the "clock" calls - * - * These functions are called both from timer functions (with the timer - * spin_lock_irq() held and from clock calls with no locking. They must - * use the save flags versions of locks. - */ - -/* - * We do ticks here to avoid the irq lock ( they take sooo long). - * The seqlock is great here. Since we a reader, we don't really care - * if we are interrupted since we don't take lock that will stall us or - * any other cpu. Voila, no irq lock is needed.
- * - */ - -static u64 do_posix_clock_monotonic_gettime_parts( - struct timespec *tp, struct timespec *mo) -{ - u64 jiff; - unsigned int seq; - - do { - seq = read_seqbegin(&xtime_lock); - getnstimeofday(tp); - *mo = wall_to_monotonic; - jiff = jiffies_64; - - } while(read_seqretry(&xtime_lock, seq)); - - return jiff; -} - -static int do_posix_clock_monotonic_get(clockid_t clock, struct timespec *tp) -{ - struct timespec wall_to_mono; - - do_posix_clock_monotonic_gettime_parts(tp, &wall_to_mono); - - tp->tv_sec += wall_to_mono.tv_sec; - tp->tv_nsec += wall_to_mono.tv_nsec; - - if ((tp->tv_nsec - NSEC_PER_SEC) > 0) { - tp->tv_nsec -= NSEC_PER_SEC; - tp->tv_sec++; - } - return 0; -} - -int do_posix_clock_monotonic_gettime(struct timespec *tp) -{ - return do_posix_clock_monotonic_get(CLOCK_MONOTONIC, tp); -} - +/* Not available / possible... functions */ int do_posix_clock_nosettime(clockid_t clockid, struct timespec *tp) { return -EINVAL; @@ -1234,7 +931,8 @@ int do_posix_clock_notimer_create(struct } EXPORT_SYMBOL_GPL(do_posix_clock_notimer_create); -int do_posix_clock_nonanosleep(clockid_t clock, int flags, struct timespec *t) +int do_posix_clock_nonanosleep(clockid_t clock, int flags, struct timespec *t, + struct timespec *r) { #ifndef ENOTSUP return -EOPNOTSUPP; /* aka ENOTSUP in userland for POSIX */ @@ -1293,125 +991,34 @@ sys_clock_getres(clockid_t which_clock, return error; } -static void nanosleep_wake_up(unsigned long __data) -{ - struct task_struct *p = (struct task_struct *) __data; - - wake_up_process(p); -} - /* - * The standard says that an absolute nanosleep call MUST wake up at - * the requested time in spite of clock settings. Here is what we do: - * For each nanosleep call that needs it (only absolute and not on - * CLOCK_MONOTONIC* (as it can not be set)) we thread a little structure - * into the "nanosleep_abs_list". All we need is the task_struct pointer. - * When ever the clock is set we just wake up all those tasks. 
The rest - * is done by the while loop in clock_nanosleep(). - * - * On locking, clock_was_set() is called from update_wall_clock which - * holds (or has held for it) a write_lock_irq( xtime_lock) and is - * called from the timer bh code. Thus we need the irq save locks. - * - * Also, on the call from update_wall_clock, that is done as part of a - * softirq thing. We don't want to delay the system that much (possibly - * long list of timers to fix), so we defer that work to keventd. + * nanosleep for monotonic and realtime clocks */ - -static DECLARE_WAIT_QUEUE_HEAD(nanosleep_abs_wqueue); -static DECLARE_WORK(clock_was_set_work, (void(*)(void*))clock_was_set, NULL); - -static DECLARE_MUTEX(clock_was_set_lock); - -void clock_was_set(void) +static int common_nsleep(clockid_t which_clock, int flags, + struct timespec *tsave, struct timespec __user *rmtp) { - struct k_itimer *timr; - struct timespec new_wall_to; - LIST_HEAD(cws_list); - unsigned long seq; - + int mode = flags & TIMER_ABSTIME ? KTIMER_ABS : KTIMER_REL; - if (unlikely(in_interrupt())) { - schedule_work(&clock_was_set_work); - return; + switch (which_clock) { + case CLOCK_REALTIME: + /* Posix madness. Only absolute timers on clock realtime + are affected by clock set. */ + if (mode == KTIMER_ABS) + return ktimer_nanosleep_real(tsave, rmtp, mode); + case CLOCK_MONOTONIC: + return ktimer_nanosleep_mono(tsave, rmtp, mode); + default: + break; } - wake_up_all(&nanosleep_abs_wqueue); - - /* - * Check if there exist TIMER_ABSTIME timers to correct. - * - * Notes on locking: This code is run in task context with irq - * on. We CAN be interrupted! All other usage of the abs list - * lock is under the timer lock which holds the irq lock as - * well. We REALLY don't want to scan the whole list with the - * interrupt system off, AND we would like a sequence lock on - * this code as well. Since we assume that the clock will not - * be set often, it seems ok to take and release the irq lock - * for each timer. 
In fact add_timer will do this, so this is - * not an issue. So we know when we are done, we will move the - * whole list to a new location. Then as we process each entry, - * we will move it to the actual list again. This way, when our - * copy is empty, we are done. We are not all that concerned - * about preemption so we will use a semaphore lock to protect - * aginst reentry. This way we will not stall another - * processor. It is possible that this may delay some timers - * that should have expired, given the new clock, but even this - * will be minimal as we will always update to the current time, - * even if it was set by a task that is waiting for entry to - * this code. Timers that expire too early will be caught by - * the expire code and restarted. - - * Absolute timers that repeat are left in the abs list while - * waiting for the task to pick up the signal. This means we - * may find timers that are not in the "add_timer" list, but are - * in the abs list. We do the same thing for these, save - * putting them back in the "add_timer" list. (Note, these are - * left in the abs list mainly to indicate that they are - * ABSOLUTE timers, a fact that is used by the re-arm code, and - * for which we have no other flag.) - - */ - - down(&clock_was_set_lock); - spin_lock_irq(&abs_list.lock); - list_splice_init(&abs_list.list, &cws_list); - spin_unlock_irq(&abs_list.lock); - do { - do { - seq = read_seqbegin(&xtime_lock); - new_wall_to = wall_to_monotonic; - } while (read_seqretry(&xtime_lock, seq)); - - spin_lock_irq(&abs_list.lock); - if (list_empty(&cws_list)) { - spin_unlock_irq(&abs_list.lock); - break; - } - timr = list_entry(cws_list.next, struct k_itimer, - it.real.abs_timer_entry); - - list_del_init(&timr->it.real.abs_timer_entry); - if (add_clockset_delta(timr, &new_wall_to) && - del_timer(&timr->it.real.timer)) /* timer run yet? 
*/ - add_timer(&timr->it.real.timer); - list_add(&timr->it.real.abs_timer_entry, &abs_list.list); - spin_unlock_irq(&abs_list.lock); - } while (1); - - up(&clock_was_set_lock); + return -EINVAL; } -long clock_nanosleep_restart(struct restart_block *restart_block); - asmlinkage long sys_clock_nanosleep(clockid_t which_clock, int flags, const struct timespec __user *rqtp, struct timespec __user *rmtp) { struct timespec t; - struct restart_block *restart_block = - &(current_thread_info()->restart_block); - int ret; if (invalid_clockid(which_clock)) return -EINVAL; @@ -1419,135 +1026,8 @@ sys_clock_nanosleep(clockid_t which_cloc if (copy_from_user(&t, rqtp, sizeof (struct timespec))) return -EFAULT; - if ((unsigned) t.tv_nsec >= NSEC_PER_SEC || t.tv_sec < 0) + if (!timespec_valid(&t)) return -EINVAL; - /* - * Do this here as nsleep function does not have the real address. - */ - restart_block->arg1 = (unsigned long)rmtp; - - ret = CLOCK_DISPATCH(which_clock, nsleep, (which_clock, flags, &t)); - - if ((ret == -ERESTART_RESTARTBLOCK) && rmtp && - copy_to_user(rmtp, &t, sizeof (t))) - return -EFAULT; - return ret; -} - - -static int common_nsleep(clockid_t which_clock, - int flags, struct timespec *tsave) -{ - struct timespec t, dum; - struct timer_list new_timer; - DECLARE_WAITQUEUE(abs_wqueue, current); - u64 rq_time = (u64)0; - s64 left; - int abs; - struct restart_block *restart_block = - &current_thread_info()->restart_block; - - abs_wqueue.flags = 0; - init_timer(&new_timer); - new_timer.expires = 0; - new_timer.data = (unsigned long) current; - new_timer.function = nanosleep_wake_up; - abs = flags & TIMER_ABSTIME; - - if (restart_block->fn == clock_nanosleep_restart) { - /* - * Interrupted by a non-delivered signal, pick up remaining - * time and continue. Remaining time is in arg2 & 3.
- */ - restart_block->fn = do_no_restart_syscall; - - rq_time = restart_block->arg3; - rq_time = (rq_time << 32) + restart_block->arg2; - if (!rq_time) - return -EINTR; - left = rq_time - get_jiffies_64(); - if (left <= (s64)0) - return 0; /* Already passed */ - } - - if (abs && (posix_clocks[which_clock].clock_get != - posix_clocks[CLOCK_MONOTONIC].clock_get)) - add_wait_queue(&nanosleep_abs_wqueue, &abs_wqueue); - - do { - t = *tsave; - if (abs || !rq_time) { - adjust_abs_time(&posix_clocks[which_clock], &t, abs, - &rq_time, &dum); - } - - left = rq_time - get_jiffies_64(); - if (left >= (s64)MAX_JIFFY_OFFSET) - left = (s64)MAX_JIFFY_OFFSET; - if (left < (s64)0) - break; - - new_timer.expires = jiffies + left; - __set_current_state(TASK_INTERRUPTIBLE); - add_timer(&new_timer); - - schedule(); - - del_timer_sync(&new_timer); - left = rq_time - get_jiffies_64(); - } while (left > (s64)0 && !test_thread_flag(TIF_SIGPENDING)); - - if (abs_wqueue.task_list.next) - finish_wait(&nanosleep_abs_wqueue, &abs_wqueue); - - if (left > (s64)0) { - - /* - * Always restart abs calls from scratch to pick up any - * clock shifting that happened while we are away. - */ - if (abs) - return -ERESTARTNOHAND; - - left *= TICK_NSEC; - tsave->tv_sec = div_long_long_rem(left, - NSEC_PER_SEC, - &tsave->tv_nsec); - /* - * Restart works by saving the time remaing in - * arg2 & 3 (it is 64-bits of jiffies). The other - * info we need is the clock_id (saved in arg0). - * The sys_call interface needs the users - * timespec return address which _it_ saves in arg1. - * Since we have cast the nanosleep call to a clock_nanosleep - * both can be restarted with the same code. - */ - restart_block->fn = clock_nanosleep_restart; - restart_block->arg0 = which_clock; - /* - * Caller sets arg1 - */ - restart_block->arg2 = rq_time & 0xffffffffLL; - restart_block->arg3 = rq_time >> 32; - - return -ERESTART_RESTARTBLOCK; - } - - return 0; -} -/* - * This will restart clock_nanosleep. 
- */ -long -clock_nanosleep_restart(struct restart_block *restart_block) -{ - struct timespec t; - int ret = common_nsleep(restart_block->arg0, 0, &t); - - if ((ret == -ERESTART_RESTARTBLOCK) && restart_block->arg1 && - copy_to_user((struct timespec __user *)(restart_block->arg1), &t, - sizeof (t))) - return -EFAULT; - return ret; + return CLOCK_DISPATCH(which_clock, nsleep, (which_clock, flags, &t, rmtp)); } Index: linux-2.6.13.ktimers/kernel/timer.c =================================================================== --- linux-2.6.13.ktimers.orig/kernel/timer.c +++ linux-2.6.13.ktimers/kernel/timer.c @@ -912,6 +912,7 @@ static void run_timer_softirq(struct sof { tvec_base_t *base = &__get_cpu_var(tvec_bases); + run_ktimer_queues(); if (time_after_eq(jiffies, base->timer_jiffies)) __run_timers(base); } @@ -1159,64 +1160,6 @@ asmlinkage long sys_gettid(void) return current->pid; } -static long __sched nanosleep_restart(struct restart_block *restart) -{ - unsigned long expire = restart->arg0, now = jiffies; - struct timespec __user *rmtp = (struct timespec __user *) restart->arg1; - long ret; - - /* Did it expire while we handled signals? 
*/ - if (!time_after(expire, now)) - return 0; - - current->state = TASK_INTERRUPTIBLE; - expire = schedule_timeout(expire - now); - - ret = 0; - if (expire) { - struct timespec t; - jiffies_to_timespec(expire, &t); - - ret = -ERESTART_RESTARTBLOCK; - if (rmtp && copy_to_user(rmtp, &t, sizeof(t))) - ret = -EFAULT; - /* The 'restart' block is already filled in */ - } - return ret; -} - -asmlinkage long sys_nanosleep(struct timespec __user *rqtp, struct timespec __user *rmtp) -{ - struct timespec t; - unsigned long expire; - long ret; - - if (copy_from_user(&t, rqtp, sizeof(t))) - return -EFAULT; - - if ((t.tv_nsec >= 1000000000L) || (t.tv_nsec < 0) || (t.tv_sec < 0)) - return -EINVAL; - - expire = timespec_to_jiffies(&t) + (t.tv_sec || t.tv_nsec); - current->state = TASK_INTERRUPTIBLE; - expire = schedule_timeout(expire); - - ret = 0; - if (expire) { - struct restart_block *restart; - jiffies_to_timespec(expire, &t); - if (rmtp && copy_to_user(rmtp, &t, sizeof(t))) - return -EFAULT; - - restart = ¤t_thread_info()->restart_block; - restart->fn = nanosleep_restart; - restart->arg0 = jiffies + expire; - restart->arg1 = (unsigned long) rmtp; - ret = -ERESTART_RESTARTBLOCK; - } - return ret; -} - /* * sys_sysinfo - fill in sysinfo struct */ Index: linux-2.6.13.ktimers/include/asm-generic/div64.h =================================================================== --- linux-2.6.13.ktimers.orig/include/asm-generic/div64.h +++ linux-2.6.13.ktimers/include/asm-generic/div64.h @@ -30,6 +30,24 @@ __rem; \ }) +/* + * (long)X = ((long long)divs) / (long)div + * (long)rem = ((long long)divs) % (long)div + * + * Warning, this will do an exception if X overflows. 
+ */ +#define div_long_long_rem(a,b,c) div_ll_X_l_rem(a,b,c) + +/* x = divs / div; *rem = divs % div; */ +static inline unsigned long div_ll_X_l_rem(unsigned long long divs, + unsigned long div, + unsigned long * rem) +{ + unsigned long long it = divs; + *rem = do_div(it, div); + return (unsigned long)it; +} + #elif BITS_PER_LONG == 32 extern uint32_t __div64_32(uint64_t *dividend, uint32_t divisor); ^ permalink raw reply [flat|nested] 50+ messages in thread
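Editorial sketch, not part of the patch: the removed common_nsleep() converts the remaining jiffies to nanoseconds and splits them into a timespec with div_long_long_rem(), and it stores the 64-bit jiffies expiry in two restart-block arguments (arg2 holds the low 32 bits, arg3 the high 32 bits). The user-space C below illustrates both steps under stated assumptions: plain 64-bit division stands in for the kernel-only do_div() macro, and the helper names div_ll_rem_sketch(), split_u64() and join_u64() are hypothetical, not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000UL

/* Hypothetical user-space stand-in for div_ll_X_l_rem(): divide a 64-bit
 * dividend by a 32-bit divisor, return the quotient and store the
 * remainder in *rem. Like the kernel helper, the quotient is narrowed to
 * unsigned long, so the caller must make sure it fits. */
static unsigned long div_ll_rem_sketch(unsigned long long divs,
				       unsigned long div, unsigned long *rem)
{
	assert(div != 0);
	*rem = (unsigned long)(divs % div);
	return (unsigned long)(divs / div);
}

/* Split a 64-bit value the way common_nsleep() filled arg2 (low half)
 * and arg3 (high half). */
static void split_u64(uint64_t v, uint32_t *lo, uint32_t *hi)
{
	*lo = (uint32_t)(v & 0xffffffffULL);
	*hi = (uint32_t)(v >> 32);
}

/* Recombine the halves, mirroring "rq_time = (rq_time << 32) + arg2". */
static uint64_t join_u64(uint32_t lo, uint32_t hi)
{
	return ((uint64_t)hi << 32) + lo;
}
```

For example, 2500000000 ns divided by NSEC_PER_SEC gives 2 with a remainder of 500000000, which is the tv_sec/tv_nsec pair the sleep code would report back to user space.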
* Re: [ANNOUNCE] ktimers subsystem
From: Thomas Gleixner @ 2005-09-19 21:47 UTC
To: linux-kernel; +Cc: mingo, akpm, george, johnstul, paulmck

On Mon, 2005-09-19 at 23:04 +0200, tglx@linutronix.de wrote:
> ktimers separate the "timer API" from the "timeout API". ktimers are

Sorry for double posting. Mailer / operator madness.

tglx
* Re: [ANNOUNCE] ktimers subsystem
From: Christoph Lameter @ 2005-09-19 22:03 UTC
To: tglx; +Cc: linux-kernel, mingo, akpm, george, johnstul, paulmck

On Mon, 19 Sep 2005 tglx@linutronix.de wrote:

> sources. Another astonishing implementation detail of the current time
> keeping is the fact that we get the monotonic clock (defined by POSIX as
> a continuous clock source which can not be set) by subtracting a variable
> offset from the real time clock, which can be set by the user and
> corrected by NTP or other mechanisms.

The benefit or drawback of that implementation depends on which time is
more important: realtime or monotonic time. I think the most used time
value is realtime, not monotonic time. Having the real time value in
xtime saves one addition when retrieving realtime.
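Editorial sketch of the trade-off described above: if the realtime value is stored directly (as xtime is today), reading it needs no arithmetic and monotonic time costs one add of the wall_to_monotonic offset; storing monotonic time instead just moves that add to the realtime read. The model is a hypothetical simplification (a single signed 64-bit nanosecond count per field, no locking such as the xtime_lock seqlock), not the kernel's timekeeping code. It also shows why setting the clock only adjusts the offset, so monotonic time does not jump.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical timekeeping state, all values in nanoseconds. */
struct tk_sketch {
	int64_t realtime_ns;		/* stored clock, like xtime today */
	int64_t wall_to_mono_ns;	/* monotonic = realtime + this */
};

/* Reading the stored clock needs no arithmetic. */
static int64_t tk_get_realtime(const struct tk_sketch *tk)
{
	return tk->realtime_ns;
}

/* Reading the derived clock costs the single add under discussion. */
static int64_t tk_get_monotonic(const struct tk_sketch *tk)
{
	return tk->realtime_ns + tk->wall_to_mono_ns;
}

/* Normal time advance: both clocks move together. */
static void tk_advance(struct tk_sketch *tk, int64_t delta_ns)
{
	assert(delta_ns >= 0);	/* time only moves forward in this model */
	tk->realtime_ns += delta_ns;
}

/* Setting the wall clock shifts the offset by the opposite amount,
 * so the derived monotonic value stays where it was. */
static void tk_set_realtime(struct tk_sketch *tk, int64_t new_realtime_ns)
{
	tk->wall_to_mono_ns -= new_realtime_ns - tk->realtime_ns;
	tk->realtime_ns = new_realtime_ns;
}
```

Swapping the roles (storing monotonic time and deriving realtime) keeps exactly one add; only the read path that pays for it changes.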
* Re: [ANNOUNCE] ktimers subsystem
From: Thomas Gleixner @ 2005-09-19 22:17 UTC
To: Christoph Lameter; +Cc: linux-kernel, mingo, akpm, george, johnstul, paulmck

On Mon, 2005-09-19 at 15:03 -0700, Christoph Lameter wrote:
> On Mon, 19 Sep 2005 tglx@linutronix.de wrote:
>
> > sources. Another astonishing implementation detail of the current time
> > keeping is the fact that we get the monotonic clock (defined by POSIX as
> > a continuous clock source which can not be set) by subtracting a variable
> > offset from the real time clock, which can be set by the user and
> > corrected by NTP or other mechanisms.
>
> The benefit or drawback of that implementation depends on which time is
> more important: realtime or monotonic time. I think the most used time
> value is realtime, not monotonic time. Having the real time value in
> xtime saves one addition when retrieving realtime.

That's only partially true.

Granted, the most used time in user space is clock_realtime
(gettimeofday() / clock_gettime(CLOCK_REALTIME)). But do we really want
to discuss one add instruction?

The most used time in kernel space is clock_monotonic. That's partially
a result of the rather odd POSIX specs regarding relative CLOCK_REALTIME
timers.

Also the basic prerequisite for high resolution timers is fast and
simple access to clock_monotonic rather than to a backward corrected
clock_realtime representation.

Kernel code speed in hot paths must take precedence over code executed
on behalf of user space, unless the cost there is completely out of
bounds. One add/sub definitely is not.

We should rather ask the glibc people why gettimeofday() /
clock_gettime() is called inside the library code all over the place
for non-obvious reasons.

tglx
* Re: [ANNOUNCE] ktimers subsystem
From: Christoph Lameter @ 2005-09-19 22:24 UTC
To: Thomas Gleixner; +Cc: linux-kernel, mingo, akpm, george, johnstul, paulmck

On Tue, 20 Sep 2005, Thomas Gleixner wrote:

> Also the basic prerequisite for high resolution timers is fast and
> simple access to clock_monotonic rather than to a backward corrected
> clock_realtime representation.

Yup, that may be a reason to tolerate the add for realtime.

> We should rather ask the glibc people why gettimeofday() /
> clock_gettime() is called inside the library code all over the place
> for non-obvious reasons.

You can ask lots of application vendors the same question, because it's
all over lots of user space code. The fact is that gettimeofday() /
clock_gettime() efficiency is very critical to the performance of many
applications on Linux. That is why the addition of one add instruction
should be carefully considered. Many platforms can execute gettimeofday
without having to enter the kernel.
* Re: [ANNOUNCE] ktimers subsystem
From: Thomas Gleixner @ 2005-09-19 22:44 UTC
To: Christoph Lameter; +Cc: linux-kernel, mingo, akpm, george, johnstul, paulmck

On Mon, 2005-09-19 at 15:24 -0700, Christoph Lameter wrote:
> > We should rather ask the glibc people why gettimeofday() /
> > clock_gettime() is called inside the library code all over the place
> > for non-obvious reasons.
>
> You can ask lots of application vendors the same question, because it's
> all over lots of user space code. The fact is that gettimeofday() /
> clock_gettime() efficiency is very critical to the performance of many
> applications on Linux. That is why the addition of one add instruction
> should be carefully considered.

Hmm. I don't understand the line of argument completely.

1. The kernel has to provide ugly mechanisms because a lot of
application implementations are doing the Wrong Thing?

2. All gettimeofday implementations I have looked at do a lot of math
anyway, so it's definitely more interesting to look at those oddities
than to discuss a single add. John Stultz's timeofday rework has a
clean solution for this - please do not argue about the div64 in his
original patches, which he is reworking at the moment.

> Many platforms can execute gettimeofday
> without having to enter the kernel.

Which ones? How is this achieved with respect to all the time adjust,
correction... code?

tglx
* Re: [ANNOUNCE] ktimers subsystem
From: john stultz @ 2005-09-19 22:50 UTC
To: tglx; +Cc: Christoph Lameter, linux-kernel, mingo, akpm, george, paulmck

On Tue, 2005-09-20 at 00:44 +0200, Thomas Gleixner wrote:
> On Mon, 2005-09-19 at 15:24 -0700, Christoph Lameter wrote:
> >
> > > We should rather ask the glibc people why gettimeofday() /
> > > clock_gettime() is called inside the library code all over the place
> > > for non-obvious reasons.
> >
> > You can ask lots of application vendors the same question, because it's
> > all over lots of user space code. The fact is that gettimeofday() /
> > clock_gettime() efficiency is very critical to the performance of many
> > applications on Linux. That is why the addition of one add instruction
> > should be carefully considered.
>
> Hmm. I don't understand the line of argument completely.
>
> 1. The kernel has to provide ugly mechanisms because a lot of
> application implementations are doing the Wrong Thing?
>
> 2. All gettimeofday implementations I have looked at do a lot of math
> anyway, so it's definitely more interesting to look at those oddities
> than to discuss a single add. John Stultz's timeofday rework has a
> clean solution for this - please do not argue about the div64 in his
> original patches, which he is reworking at the moment.

The simplest solution is to keep wall time maintained in a timespec as
well as in the nsec_t based system_time/wall_time_offset combo. This
avoids the extra add in the hot path, and only costs an extra add at
interrupt time.

I'll have an updated patch soon that includes some of Roman's
suggestions from earlier.

> > Many platforms can execute gettimeofday
> > without having to enter the kernel.
>
> Which ones? How is this achieved with respect to all the time adjust,
> correction... code?

Many arches have userspace gtod implementations (x86-64, ppc64, and ia64
as well). My timeofday code allows for this too (I had it working for
x86-64 a while back).

thanks
-john
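Editorial sketch of the suggestion above, under stated assumptions: the names are hypothetical, the model is single-threaded (real code would protect updates with a seqlock such as xtime_lock), and time is a plain 64-bit nanosecond count. Wall time is kept both as system time plus offset and as a cached timespec that the tick path advances by the same delta, so readers copy the cache instead of doing the add and the seconds/nanoseconds split themselves.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical time-of-day state. */
struct tod_sketch {
	int64_t system_time_ns;		/* monotonic nanoseconds */
	int64_t wall_time_offset_ns;	/* wall = system + offset */
	struct timespec wall_cache;	/* kept in step for fast readers */
};

/* Slow path: derive wall time with one add on every read. */
static int64_t tod_wall_ns(const struct tod_sketch *t)
{
	return t->system_time_ns + t->wall_time_offset_ns;
}

/* (Re)build the cache, e.g. after settimeofday() changes the offset. */
static void tod_sync_cache(struct tod_sketch *t)
{
	int64_t wall = tod_wall_ns(t);

	assert(wall >= 0);	/* keeps the sec/nsec split simple */
	t->wall_cache.tv_sec = (time_t)(wall / NSEC_PER_SEC);
	t->wall_cache.tv_nsec = (long)(wall % NSEC_PER_SEC);
}

/* Tick path: advance both representations by the same delta. This is
 * where the extra work moves to. */
static void tod_tick(struct tod_sketch *t, int64_t delta_ns)
{
	assert(delta_ns >= 0);
	t->system_time_ns += delta_ns;
	t->wall_cache.tv_sec += (time_t)(delta_ns / NSEC_PER_SEC);
	t->wall_cache.tv_nsec += (long)(delta_ns % NSEC_PER_SEC);
	if (t->wall_cache.tv_nsec >= NSEC_PER_SEC) {
		t->wall_cache.tv_nsec -= NSEC_PER_SEC;
		t->wall_cache.tv_sec++;
	}
}

/* Fast path: readers just copy the cached timespec. */
static struct timespec tod_read_wall(const struct tod_sketch *t)
{
	return t->wall_cache;
}
```

The trade-off is that tod_tick() now does the extra work on every tick, whether or not anyone reads the time.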
* Re: [ANNOUNCE] ktimers subsystem
From: Thomas Gleixner @ 2005-09-19 22:58 UTC
To: john stultz; +Cc: Christoph Lameter, linux-kernel, mingo, akpm, george, paulmck

On Mon, 2005-09-19 at 15:50 -0700, john stultz wrote:
> > 2. All gettimeofday implementations I have looked at do a lot of math
> > anyway, so it's definitely more interesting to look at those oddities
> > than to discuss a single add. John Stultz's timeofday rework has a
> > clean solution for this - please do not argue about the div64 in his
> > original patches, which he is reworking at the moment.
>
> The simplest solution is to keep wall time maintained in a timespec as
> well as in the nsec_t based system_time/wall_time_offset combo. This
> avoids the extra add in the hot path, and only costs an extra add at
> interrupt time.

<NITPICKING>
The crucial question is: what's the hot path? Depending on the
application type, I want to avoid the add in the interrupt code. :)
</NITPICKING>

> Many arches have userspace gtod implementations (x86-64, ppc64, and ia64
> as well). My timeofday code allows for this too (I had it working for
> x86-64 a while back).

I was not aware of that. Thanks for the clarification.

tglx
* Re: [ANNOUNCE] ktimers subsystem
From: Christoph Lameter @ 2005-09-19 23:04 UTC
To: Thomas Gleixner; +Cc: linux-kernel, mingo, akpm, george, johnstul, paulmck

On Tue, 20 Sep 2005, Thomas Gleixner wrote:

> Hmm. I don't understand the line of argument completely.
>
> 1. The kernel has to provide ugly mechanisms because a lot of
> application implementations are doing the Wrong Thing?

Let's skip the "wrong thing"... Or are you saying that glibc and all the
apps are wrong?

Applications call gettimeofday for a variety of reasons. One is that it
is widely available across different platforms, and applications want
to schedule things, need timestamps, etc.

> > Many platforms can execute gettimeofday
> > without having to enter the kernel.
>
> Which ones? How is this achieved with respect to all the time adjust,
> correction... code?

IA64, for example, has a special instruction that gives user space
access to the kernel without having to do a context switch.
* Re: [ANNOUNCE] ktimers subsystem
From: Thomas Gleixner @ 2005-09-19 23:12 UTC
To: Christoph Lameter; +Cc: linux-kernel, mingo, akpm, george, johnstul, paulmck

On Mon, 2005-09-19 at 16:04 -0700, Christoph Lameter wrote:
> On Tue, 20 Sep 2005, Thomas Gleixner wrote:
>
> > Hmm. I don't understand the line of argument completely.
> >
> > 1. The kernel has to provide ugly mechanisms because a lot of
> > application implementations are doing the Wrong Thing?
>
> Let's skip the "wrong thing"... Or are you saying that glibc and all the
> apps are wrong?
>
> Applications call gettimeofday for a variety of reasons. One is that it
> is widely available across different platforms, and applications want
> to schedule things, need timestamps, etc.

Accepted. But I still doubt that the number of calls to gettimeofday
is in any way justified. The question I'm asking is whether it is really
worth a long and epic discussion about a single add instruction.

> > > Many platforms can execute gettimeofday
> > > without having to enter the kernel.
> >
> > Which ones? How is this achieved with respect to all the time adjust,
> > correction... code?
>
> IA64, for example, has a special instruction that gives user space
> access to the kernel without having to do a context switch.

OK, I was not aware of that, and John kindly clarified this already.

tglx
* Re: [ANNOUNCE] ktimers subsystem
From: Ingo Molnar @ 2005-09-20 7:14 UTC
To: Thomas Gleixner
Cc: Christoph Lameter, linux-kernel, akpm, george, johnstul, paulmck

* Thomas Gleixner <tglx@linutronix.de> wrote:

> > Applications call gettimeofday for a variety of reasons. One is that it
> > is widely available across different platforms, and applications want
> > to schedule things, need timestamps, etc.
>
> Accepted. But I still doubt that the number of calls to gettimeofday
> is in any way justified. The question I'm asking is whether it is really
> worth a long and epic discussion about a single add instruction.

It is absolutely and emphatically not worth it.

Even in a hypothetical scenario [which this patchset is _not_ analogous
to] where a new, clean subsystem introduces significant overhead but the
old subsystem is unclean, we frequently go with the new one - because
it's so much easier to speed up something that is clean, robust and
well-designed than something that has been cobbled together!

	Ingo
* Re: [ANNOUNCE] ktimers subsystem
From: Ingo Molnar @ 2005-09-20 7:10 UTC
To: Christoph Lameter
Cc: Thomas Gleixner, linux-kernel, akpm, george, johnstul, paulmck

* Christoph Lameter <clameter@engr.sgi.com> wrote:

> > We should rather ask the glibc people why gettimeofday() /
> > clock_gettime() is called inside the library code all over the place
> > for non-obvious reasons.
>
> You can ask lots of application vendors the same question, because it's
> all over lots of user space code. The fact is that gettimeofday() /
> clock_gettime() efficiency is very critical to the performance of many
> applications on Linux. That is why the addition of one add instruction
> should be carefully considered. Many platforms can execute gettimeofday
> without having to enter the kernel.

I think this line of argument went in a bit of a wrong direction: do we
seriously consider a single 'add' as an argument to _not_ go to a much
cleaner implementation? The answer is very simple: we don't. In the core
kernel we frequently skip other, much worthier micro-optimizations in
favor of cleanliness. The time subsystem has been limping along for
many, many years, and to bring new life into it we need John's and
Thomas's stuff. Simple as that. I'd give up much more than just a single
cycle of add overhead for that...

It's probably not even worth keeping the timespec 'cached' in parallel
to nsec_t - but in any case, speedups like that should be considered
totally separately - cleanliness, the main problem of the whole time
subsystem, comes first. _Once_ cleanliness has been achieved, we can
consider micro-optimizations anew, and judge them by how much they bring
and how they affect cleanliness.

[ Even when not considering cleanliness at all, the best opportunities
  for optimizations are elsewhere. E.g. we could speed up
  sys_gettimeofday() much more by skipping a number of hardware bug
  workarounds that affect the quality of e.g. the TSC, and other timer
  hardware details that are simpler on modern hardware. So if someone is
  after sys_gettimeofday() performance, don't look for a single add (or
  even a single division); go for the bigger picture first. E.g. the
  vsyscall people went for the bigger picture on modern platforms and
  sped sys_time up by doing it in userspace most of the time, thus
  skipping hundreds of cycles of syscall entry overhead - not a single
  cycle like an add. ]

	Ingo
* Re: [ANNOUNCE] ktimers subsystem
From: Pavel Machek @ 2005-09-21 19:24 UTC
To: Christoph Lameter
Cc: Thomas Gleixner, linux-kernel, mingo, akpm, george, johnstul, paulmck

Hi!

> > Also the basic prerequisite for high resolution timers is fast and
> > simple access to clock_monotonic rather than to a backward corrected
> > clock_realtime representation.
>
> Yup, that may be a reason to tolerate the add for realtime.
>
> > We should rather ask the glibc people why gettimeofday() /
> > clock_gettime() is called inside the library code all over the place
> > for non-obvious reasons.
>
> You can ask lots of application vendors the same question, because it's
> all over lots of user space code. The fact is that gettimeofday() /
> clock_gettime() efficiency is very critical to the performance of many
> applications on Linux. That is why the addition of one add instruction
> should be carefully considered. Many platforms can execute gettimeofday
> without having to enter the kernel.

Eh? One addition is going to be lost in the noise compared to syscall
overhead. (For vsyscall, you may be closer to the truth, but I doubt it.
You could still gain more than one addition by using some strange
calling convention.)

--
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms
* Re: [ANNOUNCE] ktimers subsystem
From: Christopher Friesen @ 2005-09-19 22:39 UTC
To: tglx
Cc: Christoph Lameter, linux-kernel, mingo, akpm, george, johnstul, paulmck

Thomas Gleixner wrote:

> We should rather ask the glibc people why gettimeofday() /
> clock_gettime() is called inside the library code all over the place
> for non-obvious reasons.

From an app point of view, there are any number of reasons to check the
time frequently:

--debugging
--flight-recorder style logs
--if you've got timers in your application, you may want to check to
make sure that you didn't get woken up early (the Linux behaviour of
returning unused time in select is not portable)
--the app might be tracking its own behaviour, measuring how long code
paths take for its own accounting purposes
--emulators (vmware, UML, etc.) often want to check the time quite
frequently

Chris
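Editorial sketch related to the third point above: a portable way to avoid acting on an early wakeup is to sleep to an absolute deadline rather than for a relative interval. This uses the POSIX clock_nanosleep() with TIMER_ABSTIME on CLOCK_MONOTONIC, the call whose kernel side this patch set reworks; the helper names timespec_add_ns() and sleep_until_deadline() are hypothetical. Note that clock_nanosleep() returns an error number directly instead of setting errno.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Add ns nanoseconds to *ts, keeping tv_nsec normalized. */
static void timespec_add_ns(struct timespec *ts, long ns)
{
	assert(ns >= 0);
	ts->tv_sec += ns / NSEC_PER_SEC;
	ts->tv_nsec += ns % NSEC_PER_SEC;
	if (ts->tv_nsec >= NSEC_PER_SEC) {
		ts->tv_nsec -= NSEC_PER_SEC;
		ts->tv_sec++;
	}
}

/* Sleep until 'ns' nanoseconds from now on CLOCK_MONOTONIC.
 * Because the deadline is absolute, an interruption by a signal simply
 * retries with the same deadline, so no drift accumulates and the call
 * never returns 0 before the deadline. Returns 0 or an error number. */
static int sleep_until_deadline(long ns)
{
	struct timespec deadline;
	int err;

	if (clock_gettime(CLOCK_MONOTONIC, &deadline) != 0)
		return errno;
	timespec_add_ns(&deadline, ns);

	do {
		err = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME,
				      &deadline, NULL);
	} while (err == EINTR);
	return err;
}
```

With a relative sleep, each retry after a signal would have to recompute the remaining time, which is exactly the non-portable bookkeeping the select() behaviour forces on applications.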
* Re: [ANNOUNCE] ktimers subsystem
From: Thomas Gleixner @ 2005-09-19 22:54 UTC
To: Christopher Friesen
Cc: Christoph Lameter, linux-kernel, mingo, akpm, george, johnstul, paulmck

On Mon, 2005-09-19 at 16:39 -0600, Christopher Friesen wrote:
> Thomas Gleixner wrote:
>
> > We should rather ask the glibc people why gettimeofday() /
> > clock_gettime() is called inside the library code all over the place
> > for non-obvious reasons.
>
> From an app point of view, there are any number of reasons to check the
> time frequently.
>
> --debugging

Non-standard case.

> --flight-recorder style logs

If you want to implement such stuff efficiently, you rely on rdtscll() on
x86 or other monotonic, easily accessible time sources and not on a
permanent call to gettimeofday.

> --if you've got timers in your application, you may want to check to
> make sure that you didn't get woken up early (the Linux behaviour of
> returning unused time in select is not portable)

#ifdef is portable.

Please spare me the red herrings. If application developers code with
respect to random OS worst-case behaviour, then they should not complain
that OS N has an additional add instruction in one of the paths.

tglx
* Re: [ANNOUNCE] ktimers subsystem
From: Christopher Friesen @ 2005-09-20 4:57 UTC
To: tglx
Cc: Christoph Lameter, linux-kernel, mingo, akpm, george, johnstul, paulmck

Thomas Gleixner wrote:
> On Mon, 2005-09-19 at 16:39 -0600, Christopher Friesen wrote:
>
>>Thomas Gleixner wrote:
>>>We should rather ask the glibc people why gettimeofday() /
>>>clock_gettime() is called inside the library code all over the place
>>>for non-obvious reasons.

>>--flight-recorder style logs

> If you want to implement such stuff efficiently, you rely on rdtscll() on
> x86 or other monotonic, easily accessible time sources and not on a
> permanent call to gettimeofday.

Not portable across architectures, and doesn't work across all SMP/NUMA
environments. Also not easy to compare with other nodes on the network,
whereas with NTP-synced nodes you can use gettimeofday() for quite
accurate correlations.

> Please spare me the red herrings. If application developers code with
> respect to random OS worst-case behaviour, then they should not complain
> that OS N has an additional add instruction in one of the paths.

Actually I'm not complaining about additional add instructions. I was
just suggesting some reasons why apps might reasonably want to know the
time frequently.

Chris
* Re: [ANNOUNCE] ktimers subsystem
From: Thomas Gleixner @ 2005-09-20 5:11 UTC
To: Christopher Friesen
Cc: Christoph Lameter, linux-kernel, mingo, akpm, george, johnstul, paulmck

On Mon, 2005-09-19 at 22:57 -0600, Christopher Friesen wrote:
> >>--flight-recorder style logs
>
> > If you want to implement such stuff efficiently, you rely on rdtscll() on
> > x86 or other monotonic, easily accessible time sources and not on a
> > permanent call to gettimeofday.
>
> Not portable across architectures, and doesn't work across all SMP/NUMA
> environments. Also not easy to compare with other nodes on the network,
> whereas with NTP-synced nodes you can use gettimeofday() for quite
> accurate correlations.

Sorry, that was a stupid argument. Withdrawn hereby.

> > Please spare me the red herrings. If application developers code with
> > respect to random OS worst-case behaviour, then they should not complain
> > that OS N has an additional add instruction in one of the paths.
>
> Actually I'm not complaining about additional add instructions. I was
> just suggesting some reasons why apps might reasonably want to know the
> time frequently.

ok

tglx
* Re: [ANNOUNCE] ktimers subsystem
From: George Anzinger @ 2005-09-20 0:43 UTC
To: Christoph Lameter; +Cc: tglx, linux-kernel, mingo, akpm, johnstul, paulmck

Christoph Lameter wrote:
> On Mon, 19 Sep 2005 tglx@linutronix.de wrote:
>
>>sources. Another astonishing implementation detail of the current time
>>keeping is the fact that we get the monotonic clock (defined by POSIX as
>>a continuous clock source which can not be set) by subtracting a variable
>>offset from the real time clock, which can be set by the user and
>>corrected by NTP or other mechanisms.

Why is this astonishing? What it really indicates is the nature of
Linux, wherein we have just (with 2.6) introduced the concept of
monotonic time. As such, and with few users, it made a LOT of sense not
to upset too much code by making it the primary clock. In the end, the
difference between the two clocks is a constant offset, and it is only
an add in one path or the other.

An argument from the other side is that NTP works with CLOCK_REALTIME,
so that is where and what it corrects. Agreed, this can be turned
around; however, one needs folks like John Stultz who take the time to
understand NTP as well as all the other clock issues to turn things like
this around. Still, we should consider carefully IF we want to turn it
around.

A far more astonishing thing, IMHO, is the cascade in the timers code...

> The benefit or drawback of that implementation depends on which time is
> more important: realtime or monotonic time. I think the most used time
> value is realtime, not monotonic time. Having the real time value in
> xtime saves one addition when retrieving realtime.

Both sides of this argument have merit. Much as we would like to, we
cannot change user usage. AND, in the end, users are making, and will
continue to make, far more calls to get the time than the kernel does.
So, in raw CPU power (or time) consumed, user get-time calls will win
over kernel usage. Also, the time to do a gettimeofday is easily
computed with the most simple program...

--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/
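Editorial sketch of the "most simple program" mentioned above: time a large number of gettimeofday() calls against CLOCK_MONOTONIC and report the average. This is a rough micro-benchmark under stated assumptions: the loop overhead is included, and on architectures with a userspace (vsyscall) gettimeofday the call may never enter the kernel. It therefore measures the library call as applications see it, not the kernel path alone. The function name gettimeofday_cost_ns() is hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

/* Average cost of one gettimeofday() call in nanoseconds, measured over
 * 'iterations' back-to-back calls. */
static double gettimeofday_cost_ns(long iterations)
{
	struct timespec start, end;
	struct timeval tv;
	double total_ns;
	long i;

	assert(iterations > 0);
	clock_gettime(CLOCK_MONOTONIC, &start);
	for (i = 0; i < iterations; i++)
		gettimeofday(&tv, NULL);	/* the call being measured */
	clock_gettime(CLOCK_MONOTONIC, &end);

	total_ns = (double)(end.tv_sec - start.tv_sec) * 1e9 +
		   (double)(end.tv_nsec - start.tv_nsec);
	return total_ns / (double)iterations;
}
```

Running it on a platform with a userspace gettimeofday and on one without makes the syscall-entry overhead visible, which is the scale against which a single add should be judged.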
* Re: [ANNOUNCE] ktimers subsystem
From: Roman Zippel @ 2005-09-21 19:50 UTC
To: tglx; +Cc: linux-kernel, mingo, akpm, george, johnstul, paulmck

Hi,

On Mon, 19 Sep 2005 tglx@linutronix.de wrote:

First a quick question:

> This revealed a reasonable explanation for this behaviour. Both
> networking and disk I/O arm a lot of timeout timers (the maximum number
> of armed timers during the tests observed was ~400000).

This triggers the obvious question: where are these timers coming from?
Don't you think that having that many timers in the first place is a
little insane (especially if these are kernel timers)?

Ok, now to the long part:

> The conclusions of my recent work on Linux time(rs) related problems and
> the analysis of related patches are:
>
> 1. The HZ/jiffy based usage of time in the kernel code has to be
> converted to human time units.
>
> 2. A clean separation of all related APIs and subsystems is necessary
> even if they have interdependencies and shared functionality

How you get to these conclusions is still rather unclear; I don't even
really know what the problem is from just reading the pretext. You talk
about previous efforts, you talk about users abusing the timer system,
but what is actually the problem of the timer system itself? Later it
becomes clear that you want high resolution timers; what doesn't become
clear is why it's such an essential feature that everyone has to pay the
price for it (this includes not only changes in runtime behaviour, but
also the proposed API changes).

What is seriously missing here is the big picture. First off, what does
it currently look like? You mention scheduler ticks only briefly, and
your analysis basically says only that it's "a bunch of ugliness".
There is no mention why they are needed in the first place and there is no real explanation why they are such a big problem for high resolution timers.

Second, an API cleanup is all nice, but the more interesting part is still what is behind this API and this part you pretty much leave in the dark. Basically what does the new big picture look like and how do high resolution timers fit into it? (You are more busy defending the 64bit math, than actually explaining why and where it's needed in the first place.)

Sorry, if this sounds harsh, but your announcement is more a random collection of information about timers than an explanation of why ktimers are desirable. I'm not against high resolution timers per se, but this doesn't explain why it has to be high resolution all the way. It also doesn't explain how it will interact with John's work, e.g. I'm only scared if I see this in the ktimer_hres patch:

+extern int arch_hrtimer_init(int highres);
+extern int arch_hrtimer_reprogram(nsec_t expires);
+extern void arch_hrtimer_trigger_ints(void);

Ok, so what's missing? From a basic design overview I would expect some information about types of time within the kernel and their relationship. We basically have three types:
- scheduler time
- wallclock time
- process time

The scheduler time (aka jiffies) is not just used for timeouts, it's the basic time unit to schedule cpu time. Its major requirement is simplicity - a 32bit value can always be read without locking and calculations based on it are simple.

I exclude posix clocks here as they can be used with both wallclock and process time.

The main difference between them is that the latter is user programmable. Here we get to the core problem of timer ticks: the current timer system is designed around a simple timer model, which is not reprogrammable, so the timer resolution available to user space is limited to the timer tick resolution.
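Roman's remark that calculations on a 32-bit jiffies value are simple rests largely on one property: a free-running counter can be compared with modular arithmetic, so wraparound needs no special handling. The kernel's time_after() macro in <linux/jiffies.h> does this with a signed difference. The sketch below is a hypothetical standalone version (ticks_t, ticks_after and ticks_deadline are illustrative names, not kernel API) that uses only unsigned arithmetic and is valid as long as the two values are less than 2^31 ticks apart.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 32-bit free-running tick counter, in the spirit of jiffies. */
typedef uint32_t ticks_t;

/* True if tick value a lies after tick value b. The subtraction wraps
 * modulo 2^32, so the test keeps working when the counter overflows,
 * as long as a and b are less than 2^31 ticks apart. This is the same
 * as asking whether the signed 32-bit difference (b - a) is negative. */
static bool ticks_after(ticks_t a, ticks_t b)
{
    return (ticks_t)(b - a) >= UINT32_C(0x80000000);
}

/* Deadline arithmetic simply wraps as well. */
static ticks_t ticks_deadline(ticks_t now, ticks_t timeout)
{
    return (ticks_t)(now + timeout);
}
```

A deadline computed just before the counter wraps (now = 0xFFFFFFF0, timeout = 0x20) becomes 0x10, and ticks_after(0x10, 0xFFFFFFF0) still correctly reports it as later.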
John's patches now introduce two major new concepts as a generic mechanism (and not just hidden somewhere in arch code): 1) a timer source abstraction, 2) making wallclock updates independent of the timer tick.

BTW here you completely miss the "main point of criticism", the 64bit math is a problem, but the main problem is that he completely changed the NTP kernel model. I don't deny that the NTP code could use some updates itself, but that's a completely separate problem. Regarding the timer system it's only important how to synchronize NTP time with the kernel wallclock time, as soon as you get that right, the whole 64bit math problem becomes irrelevant.

The existence of the timer source abstraction is a major requirement for further improvements (in this regard it's already suspicious, that you put major changes before John's patch). The next major change would be to add the possibility to reprogram a timer source, the scheduler can use this to skip timer ticks and e.g. itimer can offer higher resolution timers. The main point here is before we get to any API decisions, we need to develop a model how a single time source can drive multiple users. Your split between user timers and kernel timeouts leaves this question completely open.

The next step (_after_ reprogrammable timer sources) would be increasing the timer resolution. Here I'm not at all convinced, that we need to change everything to nanosecond resolution, we can easily make this a config option which either ties process time resolution to scheduler time or makes it independent. The first would make process time a 32bit ms value (basically current behaviour), the latter can make it a 64bit ns value. Anyone trying to introduce nsec_t in common code really needs to come up with some better arguments why calculations in ns are necessary unconditionally, instead of making the resolution configurable.
In summary please provide a larger picture for your changes, it's especially important to describe the relationship between the various systems. The API definition is only the last step and is derived from these relationships.

bye, Roman
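One possible reading of Roman's question of how a single time source can drive multiple users: a reprogrammable one-shot source is always set to the earliest pending event, whether that is the next scheduler tick or the earliest timer, and the tick can be skipped while the CPU is idle. This is a hypothetical sketch, not code from ktimers or from John Stultz's patches; struct event_users and next_event_ns() are invented names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state of the two users sharing one one-shot time source. */
struct event_users {
    uint64_t next_sched_tick_ns; /* when the periodic scheduler tick is due */
    bool     cpu_idle;           /* the tick may be skipped while idle */
    bool     have_timer;         /* is any timer pending? */
    uint64_t earliest_timer_ns;  /* expiry at the head of the timer queue */
};

/* Absolute time to program the one-shot source for, or UINT64_MAX if
 * nothing needs to fire. */
static uint64_t next_event_ns(const struct event_users *u)
{
    uint64_t next = u->cpu_idle ? UINT64_MAX : u->next_sched_tick_ns;

    if (u->have_timer && u->earliest_timer_ns < next)
        next = u->earliest_timer_ns;
    return next;
}
```

In this model the scheduler tick is just one more event competing for the same hardware, which is the relationship Roman asks the design to spell out.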
* Re: [ANNOUNCE] ktimers subsystem
From: Thomas Gleixner @ 2005-09-21 22:41 UTC
To: Roman Zippel; +Cc: linux-kernel, mingo, akpm, george, johnstul, paulmck

Hi,

On Wed, 2005-09-21 at 21:50 +0200, Roman Zippel wrote:
> First a quick question:
>
> > This revealed a reasonable explanation for this behaviour. Both
> > networking and disk I/O arm a lot of timeout timers (the maximum number
> > of armed timers during the tests observed was ~400000).
>
> This triggers the obvious question: where are these timers coming from?
> You don't think that having that many timers in the first place is a little
> insane (especially if these are kernel timers)?

Quick answer: Networking and disk I/O. Insane load on a 4 way SMP machine. Check yourself. :)

FYI. I'm not the only one who observed this.

> Ok, now to the long part:
>
> > The conclusions of my recent work on Linux time(rs) related problems and
> > the analysis of related patches are:
> >
> > 1. The HZ/jiffy based usage of time in the kernel code has to be
> > converted to human time units.
> >
> > 2. A clean separation of all related APIs and subsystems is necessary
> > even if they have interdependencies and shared functionality
>
> How you get to these conclusions is still rather unclear, I don't even
> really know what the problem is from just reading the pretext. You talk
> about previous efforts, you talk about users abusing the timer system, but
> what is actually the problem of the timer system itself?

Actually there are several problems in the timer system:

1. HZ/jiffy boundness in the current implementation is problematic vs. the conversion of human time units, which is the way programmers think and the way time values are defined in data sheets, to HZ/jiffies.
There are dozens of places which get this completely wrong - either due to the misunderstanding or dumbness of programmers or due to the fact that HZ is no longer 100 and nobody cared to change the affected piece of code.

Using human time units in the API solves this in a clear way. There is no problem with having conversion functions which convert those to jiffies/HZ or whatever internal representation at compile time or, if necessary, at run time, but relying on non-constant units of time in an API is utterly wrong and error prone.

2. The all in one solution for "timeouts" and "timers".

I think I made this point rather clear, but nevertheless:

Timeouts are coarse grained functions to catch error conditions. The vast majority of those never expire. The implementation emphasis is fast insertion/removal.

Timers are (possibly/desirably) fine grained functions to control program flow in a time ordered way. The vast majority of those expire. The implementation emphasis is time accuracy. Don't confuse accuracy with high resolution!

Intermingling those types (1.) /(2.) leads to an obvious conflict in interests and to restrictions of extensibility / flexibility especially when (3.) applies.

3. There _are_ provable abuses of the current timer ("timeout") system. Functions which run longer than the next timer tick have to be considered insane. I just pointed this out for reference and I sent a note to the subsystem maintainer(s) to make them aware of that before writing this. If you read the writeup without being biased up front you might notice that there are explicit examples of those insane abuses of timers. If you consider those as legitimate we can stop the discussion right now. I also pointed out that the HZ=1000->HZ=250 change just hides this rather than solving the underlying problem.
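Thomas's first point, that hard-coded tick counts silently change meaning when HZ changes, can be shown with a small C sketch. The helpers ms_to_ticks() and ticks_to_ms() and the macro LEGACY_TIMEOUT_TICKS are hypothetical (the kernel's own conversion helpers live in <linux/jiffies.h>); HZ is passed as a parameter only so the different configurations can be compared side by side.

```c
#include <stdint.h>

/* Hypothetical helper: convert a timeout in milliseconds into ticks for a
 * given tick rate hz, rounding up so the tick count never covers less than
 * the requested duration. Saturates instead of wrapping. */
static uint32_t ms_to_ticks(uint32_t ms, uint32_t hz)
{
    uint64_t ticks = ((uint64_t)ms * hz + 999u) / 1000u;
    return ticks > UINT32_MAX ? UINT32_MAX : (uint32_t)ticks;
}

/* Hypothetical helper: real time, in ms, covered by a raw tick count. */
static uint32_t ticks_to_ms(uint32_t ticks, uint32_t hz)
{
    return (uint32_t)((uint64_t)ticks * 1000u / hz);
}

/* The pattern criticised above: a raw tick count written when HZ was
 * 100, where 10 ticks happened to mean 100 ms. */
#define LEGACY_TIMEOUT_TICKS 10u
```

A driver that wrote a raw 10 to mean 100 ms at HZ=100 gets a 10 ms timeout at HZ=1000 (ticks_to_ms(10, 1000) is 10), whereas ms_to_ticks(100, hz) yields 10, 25 and 100 ticks at HZ=100, 250 and 1000, each of which is 100 ms.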
> Later it becomes clear that you want high resolution timers, what doesn't
> become clear is why it's such an essential feature that everyone has to
> pay the price for it (this does not only include changes in runtime
> behaviour, but also the proposed API changes).

1. Yes, the final goal is high resolution timers. I never denied that.

2. The integration of high resolution timers is a long standing issue, which was rejected mostly due to the intrusiveness of the proposed patches.

3. ktimers itself are designed to be aware of a possible extension for HRT, but they provide a benefit without high resolution timers and nobody has to pay a price for them when they are not configured/implemented. The removal of the abstime list reprogramming in posix-timers.c is definitely a worthwhile cleanup and totally unrelated to HRT.

> What is seriously missing here is the big picture. First off, what does it
> currently look like?

Point taken. The writeup was intended for people who are familiar with the current situation.

> You rather shortly mention scheduler ticks and your
> analysis basically says only that it's "a bunch of ugliness".

Wrong. I did not say there is a "bunch of ugliness" in general. Not even between the lines. Period.

The only places where "ugl*" is used in the whole writeup are:

"ticks introduce a bunch of ugliness especially when it comes to time synchronizing with high resolution time sources."

"The combination of both patches provides the grounds and leads the way to the cleanup of the timeout API and the implementation of dyntick/tickless support without introducing additional ugliness."

I'm sure that both assertions are true, especially in the context they were made.

Roman, please keep this at a serious level. I don't intend to participate in another "UTF-8, Reiser4, sizeof(*p)" lkml debate club.
> There is no mention why they are needed in the first place and there is no real
> explanation why they are such a big problem for high resolution timers.

Maybe the above sheds more light on it ?

> Second, an API cleanup is all nice, but the more interesting part is still
> what is behind this API and this part you pretty much leave in the dark.

What's in the dark ?

> Basically what does the new big picture look like and how do high
> resolution timers fit into it? (You are more busy defending the 64bit math,
> than actually explaining why and where it's needed in the first place.)

I also explained why I wanted to separate "timeout" and "timers" APIs. I explained why I choose rbtree and I explained why I used 64bit math and at least why 64 bit math is not that evil as commonly seen. There is also code and if you need more details, call my sales department....

> Sorry, if this sounds harsh, but your announcement is more a random
> collection of information about timers than an explanation of why ktimers
> are desirable.

First of all, this is volunteer work and I _did_ take the time to write up a detailed explanation at all rather than throwing a random patch with a 10 line bla into the arena. Do you expect that I write a PhD thesis on that ?

Second this writeup was not targeted for John User. ....

> I'm not against high resolution timers per se, but this
> doesn't explain why it has to be high resolution all the way.

Where is high resolution all the way. Care to read the patch ? It's high resolution aware and it does take out odd areas of code by design.

> It also doesn't explain how it will interact with John's work,

"The following add on patches are not provided for ad hoc inclusion as they contain third party patches. The reason for providing this series is to demonstrate the future use of ktimers and the simple extensibility for the implementation of high resolution timers.
Especially John Stultz timeofday patch is a completely separate issue and just used due to the ability to provide high resolution timers in a simple and non intrusive way."

Isn't this clear enough ?

> e.g. I'm only scared
> if I see this in the ktimer_hres patch:
>
> +extern int arch_hrtimer_init(int highres);
> +extern int arch_hrtimer_reprogram(nsec_t expires);
> +extern void arch_hrtimer_trigger_ints(void);

What's scary ? This is proof of concept. See above ! Have you a simpler solution and did you care to read the comment in arch/i386/kernel/hrtimer.c ?

> Ok, so what's missing? From a basic design overview I would expect some
> information about types of time within the kernel and their relationship.
> We basically have three types:
> - scheduler time
> - wallclock time
> - process time

What about monotonic time ?

> The scheduler time (aka jiffies) is not just used for timeouts, it's the
> basic time unit to schedule cpu time. Its major requirement is simplicity
> - a 32bit value can always be read without locking and calculations based
> on it are simple.
> I exclude posix clocks here as they can be used with both wallclock and
> process time.

What about monotonic time ?

> The main difference between them is that the latter is user
> programmable.

wallclock is reprogrammable too and it introduces a bunch of horrible functions in posix-timers.c. grep for abs_list. I explained why it's horrible already.

> Here we get to the core problem of timer ticks: the current
> timer system is designed around a simple timer model, which is not
> reprogrammable, so the timer resolution available to user space is
> limited to the timer tick resolution.
>
> John's patches now introduce two major new concepts as a generic mechanism
> (and not just hidden somewhere in arch code): 1) a timer source
> abstraction, 2) making wallclock updates independent of the timer tick.

1. I'm well aware of the addressed problems in John's patches.
2. I don't see any hidden arch code in the ktimers patch. Do you ?

 fs/exec.c                    |    9
 fs/proc/array.c              |    6
 include/asm-generic/div64.h  |   18
 include/linux/ktimer.h       |  142 +++++++
 include/linux/posix-timers.h |   87 ++--
 include/linux/sched.h        |    4
 include/linux/time.h         |   65 ++-
 include/linux/timer.h        |    2
 init/main.c                  |    1
 kernel/Makefile              |    3
 kernel/exit.c                |    2
 kernel/fork.c                |    5
 kernel/itimer.c              |   83 +---
 kernel/ktimers.c             |  826 ++++++++++++++++++++++++++++++++++++++++++
 kernel/posix-cpu-timers.c    |   23 -
 kernel/posix-timers.c        |  832 ++++++++-----------------------------------
 kernel/timer.c               |   59 ---

May I politely remind you, that I provided the complete patch series just to show the future use and clearly stated that it is just a proof of concept implementation on top of ktimers.

> BTW here you completely miss the "main point of criticism", the 64bit math
> is a problem, but the main problem is that he completely changed the NTP
> kernel model. I don't deny that the NTP code could use some updates
> itself, but that's a completely separate problem. Regarding the timer
> system it's only important how to synchronize NTP time with the kernel
> wallclock time, as soon as you get that right, the whole 64bit math
> problem becomes irrelevant.

Roman, what are you trying to achieve ? Finding a playground for rabulistic discussions ?

> The existence of the timer source abstraction is a major requirement for
> further improvements (in this regard it's already suspicious, that you put
> major changes before John's patch).

What's suspicious on that ? Separating the "timeout" API and the "timer" API has nothing to do with John's patches.

> The next major change would be to add the possibility to reprogram a
> timer source, the scheduler can use this to
> skip timer ticks and e.g. itimer can offer higher resolution timers. The
> main point here is before we get to any API decisions, we need to develop
> a model how a single time source can drive multiple users.
> Your split between user timers and kernel timeouts leaves this question
> completely open.

Did I claim that ktimers solve this problem?

No.

The patches are related but address different aspects of the overall problem without conflicting with each other. Quite the contrary: they complement each other.

I clearly stated that the reprogramming of timer events, which are not addressed by ktimers and I never claimed ktimers does, is a completely different problem.

> The next step (_after_ reprogrammable timer sources) would be increasing
> the timer resolution.

Please let me correct you here. Adding reprogrammable timer events before you have the core timer system ready is wrong by design. Providing reprogrammable timer events is simple, but when the timer core system which depends on timer events is not ready for that you implement useless things and implement likely stuff which is not matching the requirements of the generic system.

> Here I'm not at all convinced, that we need to
> change everything to nanosecond resolution, we can easily make this a
> config option which either ties process time resolution to scheduler time
> or makes it independent. The first would make process time a 32bit ms
> value (basically current behaviour), the latter can make it a 64bit ns
> value. Anyone trying to introduce nsec_t in common code really needs to
> come up with some better arguments why calculations in ns are necessary
> unconditionally, instead of making the resolution configurable.

Please provide a whatever time unit based and configurable / flexible solution for that instead of making unprovable claims !

> In summary please provide a larger picture for your changes, it's
> especially important to describe the relationship between the various
> systems. The API definition is only the last step and is derived from
> these relationships.

Sorry. When you are not able to get the larger picture in your mind, I doubt that you are the right person to discuss this topic.
This kind of argument is not working with me, especially not when repeated all over the place.

Please provide an alternative solution (i.e. code to review) yourself.

	tglx
* Re: [ANNOUNCE] ktimers subsystem
From: Ingo Molnar @ 2005-09-22 12:59 UTC
To: Thomas Gleixner
Cc: Roman Zippel, linux-kernel, akpm, george, johnstul, paulmck

* Thomas Gleixner <tglx@linutronix.de> wrote:

> > > This revealed a reasonable explanation for this behaviour. Both
> > > networking and disk I/O arm a lot of timeout timers (the maximum number
> > > of armed timers during the tests observed was ~400000).
> >
> > This triggers the obvious question: where are these timers coming from?
> > You don't think that having that many timers in the first place is a little
> > insane (especially if these are kernel timers)?
>
> Quick answer: Networking and disk I/O. Insane load on a 4 way SMP
> machine. Check yourself. :)

a busy network server can easily have millions of timers pending. I once had to increase a server's 16 million tw timer sysctl limit ...

	Ingo
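Ingo's figure of millions of pending timers shows why Thomas's split between timeouts and timers matters: for timeouts, which rarely expire, insertion and removal cost dominates, while for timers the earliest expiry must be cheap to find. The toy C sketch below contrasts the two. struct tmo, wheel_add() and sorted_add() are hypothetical names; the kernel's real timer wheel cascades over several levels, ktimers use an rbtree rather than a sorted list, and wraparound of the expiry values is ignored for brevity.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical queue entry; expires is an absolute tick value. */
struct tmo {
    uint32_t expires;
    struct tmo *next;
};

#define WHEEL_BITS 8
#define WHEEL_SIZE (1u << WHEEL_BITS)
#define WHEEL_MASK (WHEEL_SIZE - 1u)

/* Timeout-style queue: one hashed timing wheel. Insertion is O(1) no
 * matter how many entries are pending; entries more than WHEEL_SIZE
 * ticks away share a slot with nearer ones and must be re-checked when
 * that slot is processed. */
static struct tmo *wheel[WHEEL_SIZE];

static void wheel_add(struct tmo *t)
{
    struct tmo **slot = &wheel[t->expires & WHEEL_MASK];
    t->next = *slot;
    *slot = t;
}

/* Timer-style queue: kept sorted by expiry so the next deadline is
 * always the head. Insertion is O(n) for this list; a balanced tree
 * such as an rbtree reduces it to O(log n). Equal expiries keep their
 * insertion order. */
static struct tmo *sorted_head;

static void sorted_add(struct tmo *t)
{
    struct tmo **pp = &sorted_head;
    while (*pp != NULL && (*pp)->expires <= t->expires)
        pp = &(*pp)->next;
    t->next = *pp;
    *pp = t;
}
```

wheel_add() stays constant-time however many entries are queued, but finding the next expiry means scanning slots; sorted_add() pays on insertion, but the next expiry is always at sorted_head.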
* Re: [ANNOUNCE] ktimers subsystem
From: Roman Zippel @ 2005-09-22 23:09 UTC
To: Thomas Gleixner; +Cc: linux-kernel, mingo, akpm, george, johnstul, paulmck

Hi,

On Thu, 22 Sep 2005, Thomas Gleixner wrote:

> > > This revealed a reasonable explanation for this behaviour. Both
> > > networking and disk I/O arm a lot of timeout timers (the maximum number
> > > of armed timers during the tests observed was ~400000).
> >
> > This triggers the obvious question: where are these timers coming from?
> > You don't think that having that many timers in the first place is a little
> > insane (especially if these are kernel timers)?
>
> Quick answer: Networking and disk I/O. Insane load on a 4 way SMP
> machine. Check yourself. :)

This is no answer at all, you only repeat what you already said above. :( Care to share your knowledge?

First off, I can understand that you're rather upset with what I wrote, unfortunately you got overly defensive, so could you please next time not reply immediately and first sleep on it, an overly emotional reply is funny to read but not exactly useful.

The main problem with your text is that you jump from one topic to the next, making it impossible to create a coherent picture from it. Maybe it's obvious for you, but I'd like you to fill in these holes to better understand the design decisions.

> Actually there are several problems in the timer system:
>
> 1. HZ/jiffy boundness in the current implementation is problematic vs.
> the conversion of human time units, which is the way programmers think
> and the way time values are defined in data sheets, to HZ/jiffies.

This is an API problem. How is this related to the core timer system (and more specifically to ktimers)?

> 2. The all in one solution for "timeouts" and "timers".
>
> I think I made this point rather clear, but nevertheless:
>
> Timeouts are coarse grained functions to catch error conditions. The
> vast majority of those never expire. The implementation emphasis is fast
> insertion/removal.
>
> Timers are (possibly/desirably) fine grained functions to control
> program flow in a time ordered way. The vast majority of those expire.
> The implementation emphasis is time accuracy. Don't confuse accuracy
> with high resolution!
>
> Intermingling those types (1.) /(2.) leads to an obvious conflict in
> interests and to restrictions of extensibility / flexibility especially
> when (3.) applies.

You don't make it obvious at all. You jump from problems with _kernel_ timers to the introduction of a new subsystem to manage _process_ timers. Why don't you fix the problems with kernel timers separately? Just because kernel timers require less precision doesn't mean you can make them imprecise. The timer accuracy is defined by the timer source and I expect it to be the same for both kernel and process timers.

> 3. There _are_ provable abuses of the current timer ("timeout") system.
> Functions which run longer than the next timer tick have to be
> considered insane.

Fine. Again, how exactly does this problem with _kernel_ timers relate to the introduction of ktimers?

> The only places where "ugl*" is used in the whole writeup are:
> "ticks introduce a bunch of ugliness especially when it comes to time
> synchronizing with high resolution time
> sources."
>
> "The combination of both patches provides the grounds and leads the way
> to the cleanup of the timeout API and the implementation of
> dyntick/tickless support without introducing additional ugliness."
>
> I'm sure that both assertions are true, especially in the context they
> were made.

It's nice that you're sure of it, but as long as you don't provide the means to verify them, they are just assertions.
You never really expand on what this "bunch of ugliness" is, you talk about timer abusers and API problems, but what is the problem with timer ticks related to high resolution timers? How are dyntick/tickless support and ktimers supposed to share the same timer source? This never becomes clear from your document and neither from your example code.

> > Basically what does the new big picture look like and how do high
> > resolution timers fit into it? (You are more busy defending the 64bit math,
> > than actually explaining why and where it's needed in the first place.)
>
> I also explained why I wanted to separate "timeout" and "timers" APIs. I
> explained why I choose rbtree and I explained why I used 64bit math and
> at least why 64 bit math is not that evil as commonly seen.

I don't say that 64bit math is evil, I just question that it's required - small, but important difference.

The main problem with your ktimer patch is that it's another all-in-one patch, it simply changes too many aspects at once. If you want to introduce a new API, you can do so by first introducing a small layer which maps to the old layer. This makes it easier to see and prove any potential improvement.

> > Sorry, if this sounds harsh, but your announcement is more a random
> > collection of information about timers than an explanation of why ktimers
> > are desirable.
>
> First of all, this is volunteer work and I _did_ take the time to write
> up a detailed explanation at all rather than throwing a random patch
> with a 10 line bla into the arena. Do you expect that I write a PhD
> thesis on that ?

No, but I hope you didn't just expect hooray calls. I appreciate that you try to explain this, but you should also expect criticism. You make it sound like I'm doing this just for fun and to annoy you, but I'm trying to keep the quality level of Linux up and half finished ideas out of it.
There is a reason that the answer took a few days, I really thought very carefully about this based on your document and patches. It's still possible I missed something, but feel free to point this out (preferably in a civilized manner).

> Second this writeup was not targeted for John User. ....

Well, I'm not sure whom you targeted, but it wasn't coherent enough for the average LKML reader (at least for those wanting more than just cursory information).

> > I'm not against high resolution timers per se, but this
> > doesn't explain why it has to be high resolution all the way.
>
> Where is high resolution all the way. Care to read the patch ? It's high
> resolution aware and it does take out odd areas of code by design.

It's not just high resolution aware, it makes all calculation in high resolution _unconditionally_, which makes it high resolution all the way.

> > It also doesn't explain how it will interact with John's work,
>
> "The following add on patches are not provided for ad hoc inclusion as
> they contain third party patches. The reason for providing this series
> is to demonstrate the future use of ktimers and the simple extensibility
> for the implementation of high resolution timers. Especially John Stultz
> timeofday patch is a completely separate issue
> and just used due to the ability to provide high resolution timers in a
> simple and non intrusive way."
>
> Isn't this clear enough ?

No and I explained why I think that these are not separate issues at all.

> > Ok, so what's missing? From a basic design overview I would expect some
> > information about types of time within the kernel and their relationship.
> > We basically have three types:
> > - scheduler time
> > - wallclock time
> > - process time
>
> What about monotonic time ?

It's derived from wallclock time.

> > The main difference between them is that the latter is user
> > programmable.
>
> wallclock is reprogrammable too and it introduces a bunch of horrible
> functions in posix-timers.c.
> grep for abs_list. I explained why it's
> horrible already.

I said _user_ programmable, wallclock time is usually NTP controlled.

> > John's patches now introduce two major new concepts as a generic mechanism
> > (and not just hidden somewhere in arch code): 1) a timer source
> > abstraction, 2) making wallclock updates independent of the timer tick.
>
> 1. I'm well aware of the addressed problems in John's patches.
>
> 2. I don't see any hidden arch code in the ktimers patch. Do you ?

That's not what I meant (and if you had taken the time to think about it, instead of just being angry at me, I'm sure you would have noticed yourself), this is e.g. about code in arch/i386/kernel/timers/ or arch/ppc/kernel/time.c.

> > BTW here you completely miss the "main point of criticism", the 64bit math
> > is a problem, but the main problem is that he completely changed the NTP
> > kernel model. I don't deny that the NTP code could use some updates
> > itself, but that's a completely separate problem. Regarding the timer
> > system it's only important how to synchronize NTP time with the kernel
> > wallclock time, as soon as you get that right, the whole 64bit math
> > problem becomes irrelevant.
>
> Roman, what are you trying to achieve ? Finding a playground for
> rabulistic discussions ?

Ok, I'm at a loss here, what are you trying to tell me? Is the above in any way incorrect?

> > The existence of the timer source abstraction is a major requirement for
> > further improvements (in this regard it's already suspicious, that you put
> > major changes before John's patch).
>
> What's suspicious on that ? Separating the "timeout" API and the "timer"
> API has nothing to do with John's patches.

Related changes should be done in a logical order, which I obviously disagree with you about.

> > The next major change would be to add the possibility to reprogram a
> > timer source, the scheduler can use this to
> > skip timer ticks and e.g. itimer can offer higher resolution timers.
> > The main point here is before we get to any API decisions, we need to
> > develop a model how a single time source can drive multiple users. Your
> > split between user timers and kernel timeouts leaves this question
> > completely open.
>
> Did I claim that ktimers solve this problem?
>
> No.
>
> The patches are related but address different aspects of the overall
> problem without conflicting with each other. Quite the contrary: they
> complement each other.
>
> I clearly stated that the reprogramming of timer events, which are not
> addressed by ktimers and I never claimed ktimers does, is a completely
> different problem.

No, it's part of the same problem, how are the scheduler and your ktimers supposed to share the same time source?

> > The next step (_after_ reprogrammable timer sources) would be increasing
> > the timer resolution.
>
> Please let me correct you here. Adding reprogrammable timer events
> before you have the core timer system ready is wrong by design.
> Providing reprogrammable timer events is simple, but when the timer core
> system which depends on timer events is not ready for that you implement
> useless things and implement likely stuff which is not matching the
> requirements of the generic system.

So what are these requirements? Please be more specific.

> > Here I'm not at all convinced, that we need to
> > change everything to nanosecond resolution, we can easily make this a
> > config option which either ties process time resolution to scheduler time
> > or makes it independent. The first would make process time a 32bit ms
> > value (basically current behaviour), the latter can make it a 64bit ns
> > value. Anyone trying to introduce nsec_t in common code really needs to
> > come up with some better arguments why calculations in ns are necessary
> > unconditionally, instead of making the resolution configurable.
>
> Please provide a whatever time unit based and configurable / flexible
> solution for that instead of making unprovable claims !

What unprovable claims? What would change in the basic principles, if you would do them with 32bit ms values instead of 64bit ns values? The basic math should be the same and should demonstrate the basic principles equally well and since the current timer code has only ms (at HZ=1000) precision the behaviour should be the same as well.

> > In summary please provide a larger picture for your changes, it's
> > especially important to describe the relationship between the various
> > systems. The API definition is only the last step and is derived from
> > these relationships.
>
> Sorry. When you are not able to get the larger picture in your mind, I
> doubt that you are the right person to discuss this topic. This kind of
> argument is not working with me, especially not when repeated all over
> the place.
>
> Please provide an alternative solution (i.e. code to review) yourself.

Are you seriously trying to tell me that anyone doing reviews must provide an alternative solution first?

Please refrain from personal attacks, it should be obvious that we have different ideas how to implement this, what I'm trying to do is to get you to explain your picture better and I'm trying to explain my picture, so we can understand each other better. We won't get very far as long as you are just pissed at me for disagreeing with you.

bye, Roman
* Re: [ANNOUNCE] ktimers subsystem
From: Christopher Friesen @ 2005-09-22 23:31 UTC
To: Roman Zippel
Cc: Thomas Gleixner, linux-kernel, mingo, akpm, george, johnstul, paulmck

Roman Zippel wrote:

> This is no answer at all, you only repeat what you already said above. :(
> Care to share your knowledge?

Ingo already gave an example. "a busy network server can easily have millions of timers pending. I once had to increase a server's 16 million tw timer sysctl limit ..."

> I don't say that 64bit math is evil, I just question that it's required -
> small, but important difference.

<snip>

> It's not just high resolution aware, it makes all calculation in high
> resolution _unconditionally_, which makes it high resolution all the way.

<snip>

> What unprovable claims? What would change in the basic principles, if you
> would do them with 32bit ms values instead of 64bit ns values? The basic
> math should be the same and should demonstrate the basic principles
> equally well and since the current timer code has only ms (at HZ=1000)
> precision the behaviour should be the same as well.

I see two assumptions that lead to the API using nanoseconds:

1) it is desirable to have a human-time-unit timer API, so that people can specify timeouts in easily-understood units
2) eventually we will use sub-ms resolution timers, so it makes sense to just jump to nanoseconds as our base timing unit

Are these reasonable starting points, or is there disagreement on these?

Maybe it would make sense to have the API be in nanoseconds and internally use 32bit ms for now, and only change to 64bit nanos when we actually move to sub-ms resolution timers.

Chris
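Chris's compromise, a nanosecond API on top of a 32-bit millisecond internal representation, needs a conversion at the boundary. A minimal sketch, assuming a hypothetical internal type internal_ms_t and helper ns_to_internal_ms(): it rounds up so a request is never shortened, and saturates rather than wrapping when the value exceeds the 32-bit range.

```c
#include <stdint.h>

/* Hypothetical internal representation: 32-bit milliseconds. */
typedef uint32_t internal_ms_t;

/* Convert a relative timeout from a nanosecond-based API into the
 * internal millisecond unit. Rounds up so a request is never shortened
 * and saturates at the largest representable value (~49.7 days). */
static internal_ms_t ns_to_internal_ms(uint64_t ns)
{
    uint64_t ms = ns / 1000000u + (ns % 1000000u != 0);
    return ms > UINT32_MAX ? UINT32_MAX : (internal_ms_t)ms;
}
```

The cost of the scheme shows at both ends: a 1 ns request becomes a full 1 ms, and any relative timeout beyond roughly 49.7 days is clamped.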
* Re: [ANNOUNCE] ktimers subsystem
From: Roman Zippel @ 2005-09-23 0:25 UTC
To: Christopher Friesen
Cc: Thomas Gleixner, linux-kernel, mingo, akpm, george, johnstul, paulmck

Hi,

On Thu, 22 Sep 2005, Christopher Friesen wrote:

> Roman Zippel wrote:
>
> > This no answer at all, you only repeat what you already said above. :(
> > Care to share your knowledge?
>
> Ingo already gave an example. "a busy network server can easily have millions
> of timers pending. I once had to increase a server's 16 million tw timer
> sysctl limit ..."

I hoped for a more concrete example (i.e. a pointer to source), but this
one at least gave me enough hints where to look. There are ways to avoid
this huge number of added timers, but this requires a better analysis of
the problem.

> I see two assumptions that lead to the API using nanoseconds:
>
> 1) it is desirable to have a human-time-unit timer API, so that people can
> specify timeouts in easily-understood units
> 2) eventually we will use sub-ms resolution timers, so it makes sense to just
> jump to nanoseconds as our base timing unit
>
> Are these reasonable starting points, or is there disagreement on these?
>
> Maybe it would make sense to have the API be in nanoseconds and internally use
> 32bit ms for now, and only change to 64bit nanos when we actually move to
> sub-ms resolution timers.

Actually the decision to use ns has nothing to do with API issues.
<linux/jiffies.h> has already a lot of options to specify timeouts for
kernel timer. The official userspace API is mostly timespec/timeval.
The nsec_t type is an _internal_ type to manage time, so this makes it
possible to do something like this:

#ifdef CONFIG_HIRES_TIMER
typedef u64 ktime_t;
#else
typedef u32 ktime_t;
#endif

bye, Roman
* Re: [ANNOUNCE] ktimers subsystem
From: Thomas Gleixner @ 2005-09-23 6:49 UTC
To: Roman Zippel
Cc: Christopher Friesen, linux-kernel, mingo, akpm, george, johnstul, paulmck

On Fri, 2005-09-23 at 02:25 +0200, Roman Zippel wrote:
> > Maybe it would make sense to have the API be in nanoseconds and internally use
> > 32bit ms for now, and only change to 64bit nanos when we actually move to
> > sub-ms resolution timers.
>
> Actually the decision to use ns has nothing to do with API issues.
> <linux/jiffies.h> has already a lot of options to specify timeouts for
> kernel timer. The official userspace API is mostly timespec/timeval.
> The nsec_t type is an _internal_ type to manage time, so this makes it
> possible to do something like this:
>
> #ifdef CONFIG_HIRES_TIMER
> typedef u64 ktime_t;
> #else
> typedef u32 ktime_t;
> #endif

Sure, that's possible, but the 32bit storage format has its limitations
and it is not possible to keep the code compatible for both use cases.

Posix timers - both CLOCK_REALTIME and CLOCK_MONOTONIC - can be programmed
in absolute time. In a 32bit representation with ms resolution we can
store ~49 days, so we cannot fit the values which come from user space
without correction/conversion, unless we limit the use cases to 49 days
of uptime and CLOCK_REALTIME < 49 days since the epoch.

If we cannot fit the given value into the internal representation, we
have to do exactly what the current implementation of CLOCK_REALTIME in
posix-timers.c has to do: store information about the xtime / monotonic
offset, add the timer to yet another list (abs_list), convert to jiffies
and, in case the clock gets set, run through all the affected timers in
abs_list, recalculate the expiry value and requeue them.

The idea of ktimers is to use the requested time given by a timespec in
human time without any corrections, so we actually can avoid the above.

Also doing time ordered insertion into a list introduces incompatibilities
between 32/64 bit storage formats.

I carefully weighed the necessary quirk load against the cleanliness,
simplicity and robustness of a pure 64 bit implementation. The resulting
overhead for 32bit systems, in the range of 1-3 instructions per fast
path operation (add, sub, compare), is IMO not a good enough reason to
give up a clean, simple and robust design, which also allows high
resolution timers with no big change to the base implementation.

tglx
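For a sense of scale behind the "~49 days" figure: an unsigned 32-bit millisecond counter wraps after 2^32 ms, while a signed 64-bit nanosecond counter covers roughly 292 years. The sketch below is plain userspace C, not code from the ktimers patch, and the helper names are hypothetical. It also checks whether an absolute time given in seconds since the epoch would fit into a 32-bit millisecond field without an offset.

```c
#include <stdint.h>

/* Milliseconds and nanoseconds per day (exact integer constants). */
#define MS_PER_DAY (24ULL * 60 * 60 * 1000)
#define NS_PER_DAY (24ULL * 60 * 60 * 1000000000ULL)

/* Whole days an unsigned 32-bit millisecond value can hold: 49. */
uint64_t u32_ms_range_days(void)
{
	return (uint64_t)UINT32_MAX / MS_PER_DAY;
}

/* Whole days a signed 64-bit nanosecond value can hold (~292 years). */
uint64_t s64_ns_range_days(void)
{
	return (uint64_t)INT64_MAX / NS_PER_DAY;
}

/*
 * Hypothetical check: does an absolute time in seconds since the
 * epoch fit into a 32-bit millisecond field as-is? sec * 1000 fits
 * exactly when sec <= floor(UINT32_MAX / 1000).
 */
int fits_u32_ms(uint64_t sec)
{
	return sec <= UINT32_MAX / 1000;
}
```

An absolute CLOCK_REALTIME value such as 1127433600 (2005-09-23 00:00:00 UTC) fails this check by a wide margin. That is why a 32-bit ms representation would need the offset bookkeeping described above.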
* Re: [ANNOUNCE] ktimers subsystem
From: Roman Zippel @ 2005-09-24 3:15 UTC
To: Thomas Gleixner
Cc: Christopher Friesen, linux-kernel, mingo, akpm, george, johnstul, paulmck

Hi,

On Fri, 23 Sep 2005, Thomas Gleixner wrote:

> The idea of ktimers is to use the requested time given by a timespec in
> human time without any corrections, so we actually can avoid the above.
>
> Also doing time ordered insertion into a list introduces incompatibilities
> between 32/64 bit storage formats.

Except that the (time) range of the list would be limited I don't really
see a big difference.
Anyway, the biggest cost is the conversion from/to the 64bit ns value
and if its main use is sorting, you can use something like this:

typedef union {
	u64 tv64;
	struct {
#ifdef __BIG_ENDIAN
		u32 sec, nsec;
#else
		u32 nsec, sec;
#endif
	} tv;
} ktimespec;

To compare two time values the tv64 value is sufficient.

bye, Roman
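A minimal userspace sketch of why comparing tv64 is enough with this layout. Two details are assumptions of the sketch, not of the thread:

- It selects the layout with GCC/Clang's `__BYTE_ORDER__` rather than `__BIG_ENDIAN`. Userspace headers such as glibc's `<endian.h>` define both `__BIG_ENDIAN` and `__LITTLE_ENDIAN`, so an `#ifdef` on one of them says nothing about the host.
- nsec must stay normalized below one second.

`ktimespec_make()` and `ktimespec_gt()` are hypothetical helper names.

```c
#include <stdint.h>

/* Same layout idea as above: sec always lands in the upper 32 bits of tv64. */
typedef union {
	uint64_t tv64;
	struct {
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
		uint32_t sec, nsec;
#else
		uint32_t nsec, sec;
#endif
	} tv;
} ktimespec;

/* Build a ktimespec; nsec is assumed to be normalized (< 1000000000). */
ktimespec ktimespec_make(uint32_t sec, uint32_t nsec)
{
	ktimespec t;

	t.tv.sec = sec;
	t.tv.nsec = nsec;
	return t;
}

/*
 * With sec in the high half and nsec in the low half, ordering the
 * 64-bit integers orders the (sec, nsec) pairs lexicographically.
 */
int ktimespec_gt(ktimespec a, ktimespec b)
{
	return a.tv64 > b.tv64;
}
```

Comparison comes for free this way. Addition still needs a carry check on the nsec half, which is the point Thomas raises later in the thread.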
* Re: [ANNOUNCE] ktimers subsystem
From: Ingo Molnar @ 2005-09-24 5:16 UTC
To: Roman Zippel
Cc: Thomas Gleixner, Christopher Friesen, linux-kernel, akpm, george, johnstul, paulmck

* Roman Zippel <zippel@linux-m68k.org> wrote:

> On Fri, 23 Sep 2005, Thomas Gleixner wrote:
>
> > The idea of ktimers is to use the requested time given by a timespec in
> > human time without any corrections, so we actually can avoid the above.
> >
> > Also doing time ordered insertion into a list introduces incompatibilities
> > between 32/64 bit storage formats.
>
> Except that the (time) range of the list would be limited I don't really
> see a big difference.
> Anyway, the biggest cost is the conversion from/to the 64bit ns value
> [...]

Where do you get that notion from? Have you personally measured the
performance and code size impact of it? If yes, would you mind to share
the resulting data with us?

Our data is that the use of 64-bit nsec_t significantly reduces the size
of a representative piece of code (object size in bytes):

                  AMD64  I386   ARM  PPC32  M68K
   nsec_t_ops       226   284   252    428   206
   timespec_ops     412   324   448    640   342

i.e. a ~40% size reduction when going to nsec_t on m68k, in that
particular function. Even larger, ~45% code size reduction on a true
64-bit platform.

	Ingo
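The quoted percentages follow directly from the table. Computing the same ratio for every column also shows how much the saving varies by architecture, from about 45% on AMD64 down to about 12% on i386:

```latex
\text{reduction} = 1 - \frac{\text{size(nsec\_t\_ops)}}{\text{size(timespec\_ops)}}
\qquad
\begin{aligned}
\text{AMD64:} &\; 1 - 226/412 \approx 0.45 \\
\text{I386:}  &\; 1 - 284/324 \approx 0.12 \\
\text{ARM:}   &\; 1 - 252/448 \approx 0.44 \\
\text{PPC32:} &\; 1 - 428/640 \approx 0.33 \\
\text{M68K:}  &\; 1 - 206/342 \approx 0.40
\end{aligned}
```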
* Re: [ANNOUNCE] ktimers subsystem
From: Roman Zippel @ 2005-09-24 10:35 UTC
To: Ingo Molnar
Cc: Thomas Gleixner, Christopher Friesen, linux-kernel, akpm, george, johnstul, paulmck

Hi,

On Sat, 24 Sep 2005, Ingo Molnar wrote:

> > Anyway, the biggest cost is the conversion from/to the 64bit ns value
> > [...]
>
> Where do you get that notion from? Have you personally measured the
> performance and code size impact of it? If yes, would you mind to share
> the resulting data with us?
>
> Our data is that the use of 64-bit nsec_t significantly reduces the size
> of a representative piece of code (object size in bytes):
>
>                   AMD64  I386   ARM  PPC32  M68K
>    nsec_t_ops       226   284   252    428   206
>    timespec_ops     412   324   448    640   342
>
> i.e. a ~40% size reduction when going to nsec_t on m68k, in that
> particular function. Even larger, ~45% code size reduction on a true
> 64-bit platform.

Without any source these numbers are not verifiable. You don't even
mention here what that "representative piece of code" is...
Anyway, Thomas mentioned that this would be from the insert/remove code
and here you omitted the most important part of my mail:

typedef union {
	u64 tv64;
	struct {
#ifdef __BIG_ENDIAN
		u32 sec, nsec;
#else
		u32 nsec, sec;
#endif
	} tv;
} ktimespec;

IOW this would allow to keep the time value in timespec format and use
your nsec_t_ops for sorting.

bye, Roman
* Re: [ANNOUNCE] ktimers subsystem
From: Thomas Gleixner @ 2005-09-24 13:56 UTC
To: Roman Zippel
Cc: Ingo Molnar, Christopher Friesen, linux-kernel, akpm, george, johnstul, paulmck

On Sat, 2005-09-24 at 12:35 +0200, Roman Zippel wrote:
> Hi,
>
> On Sat, 24 Sep 2005, Ingo Molnar wrote:
>
> > > Anyway, the biggest cost is the conversion from/to the 64bit ns value
> > > [...]
> >
> > Where do you get that notion from? Have you personally measured the
> > performance and code size impact of it? If yes, would you mind to share
> > the resulting data with us?
> >
> > Our data is that the use of 64-bit nsec_t significantly reduces the size
> > of a representative piece of code (object size in bytes):
> >
> >                   AMD64  I386   ARM  PPC32  M68K
> >    nsec_t_ops       226   284   252    428   206
> >    timespec_ops     412   324   448    640   342
> >
> > i.e. a ~40% size reduction when going to nsec_t on m68k, in that
> > particular function. Even larger, ~45% code size reduction on a true
> > 64-bit platform.
>
> Without any source these numbers are not verifiable. You don't even
> mention here what that "representative piece of code" is...

struct base {
	nsec_t now;
	struct ktimer *timers[16];
	struct ktimer *running;
};

void nsec_t_ops(struct base *base, struct ktimer *next, nsec_t *tim, int mode)
{
	int i;
	nsec_t now = base->now;

	for (i = 0; i < 16; i++) {
		void (*fn)(void *);
		void *data;
		struct ktimer *timer = base->timers[i];

		if (timer->expires > now)
			break;
		timer->expired = now;
		fn = timer->function;
		data = timer->data;
		base->running = timer;
		fn(data);
		base->running = NULL;
	}

	switch (mode) {
	case 0:
		next->expires = *tim;
		break;
	case 1:
		next->expires = now + *tim;
		break;
	case 2:
		next->expires += *tim;
		break;
	case 3:
		while (now > next->expires) {
			next->expires += *tim;
		}
		break;
	}
	base->timers[0] = next;
}

versus:

#define NSEC_PER_SEC 1000000000

struct base {
	struct timespec now;
	struct ktimer *timers[16];
	struct ktimer *running;
};

#define timespec_gt(a,b) \
	(((a).tv_sec > (b).tv_sec) ? 1 : \
	(((a).tv_sec < (b).tv_sec) ? 0 : \
	((a).tv_nsec > (b).tv_nsec)))

#define timespec_addptr(a,b) \
	(a)->tv_sec = ((a)->tv_sec + (b)->tv_sec); \
	(a)->tv_nsec = ((a)->tv_nsec + (b)->tv_nsec); \
	if ((a)->tv_nsec >= NSEC_PER_SEC){ \
		(a)->tv_nsec -= NSEC_PER_SEC; \
		(a)->tv_sec++; \
	}

#define timespec_addppp(c,a,b) \
	(c)->tv_sec = ((a)->tv_sec + (b)->tv_sec); \
	(c)->tv_nsec = ((a)->tv_nsec + (b)->tv_nsec); \
	if ((c)->tv_nsec >= NSEC_PER_SEC){ \
		(c)->tv_nsec -= NSEC_PER_SEC; \
		(c)->tv_sec++; \
	}

void timespec_ops(struct base *base, struct ktimer *next,
		  struct timespec *tim, int mode)
{
	int i;
	struct timespec now = base->now;

	for (i = 0; i < 16; i++) {
		void (*fn)(void *);
		void *data;
		struct ktimer *timer = base->timers[i];

		if (timespec_gt(timer->expires, now))
			break;
		timer->expired = now;
		fn = timer->function;
		data = timer->data;
		base->running = timer;
		fn(data);
		base->running = NULL;
	}

	switch (mode) {
	case 0:
		next->expires = *tim;
		break;
	case 1:
		timespec_addppp(&next->expires, &now, tim);
		break;
	case 2:
		timespec_addptr(&next->expires, tim);
		break;
	case 3:
		while (timespec_gt(now, next->expires)) {
			timespec_addptr(&next->expires, tim);
		}
		break;
	}
	base->timers[0] = next;
}

> Anyway, Thomas mentioned that this would be from the insert/remove code
> and here you omitted the most important part of my mail:
>
> typedef union {
> 	u64 tv64;
> 	struct {
> #ifdef __BIG_ENDIAN
> 		u32 sec, nsec;
> #else
> 		u32 nsec, sec;
> #endif
> 	} tv;
> } ktimespec;
>
> IOW this would allow to keep the time value in timespec format and use
> your nsec_t_ops for sorting.

Yes, it works for comparisons. But for any other operation this construct
has the same problem as struct timespec itself. You need at least an add
function which is always an add and a comparison / correction vs.
nsec >= NSEC_PER_SEC. The 64 bit nsec_t value can just be used as is
without inventing a wrapper macro for each operation.

The only point where (k)timespec has an advantage is that the userspace
value need not be converted to nsec_t, but deducing therefrom that this is
the better overall solution is a fallacy.

                nsec_t          ktimespec

syscall:
                32x32 mul
                64bit add       2 x 32bit move

arm timer:
                64 bit add      2 x 32 bit add
                                32 bit compare
                                32 bit sub
                                32 bit add

The 3 operations compensate for the 32x32
multiplication.

For interval timers you have the

                                32 bit compare
                                32 bit sub
                                32 bit add

additional overhead for each rearm.

The backward conversion from nsec_t to timespec is almost a non issue.
The vast majority of callers don't provide the second argument to
nanosleep(), setitimer(), set_timer() which makes the conversion
necessary and I think we optimize for the common use case.

Besides that the representation of time in nsec_t values is much
clearer. I know that we have to deal with timespecs vs. userspace, but
keeping this representation for kernel internal usage reminds me of the
BCD calculations which were a similar 2^x vs 10^x oddity in the early
days of microprocessors. Of course they were obstinate and survived a
surprisingly long time.

tglx
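The operation count above assumes that the syscall path converts a user timespec to nsec_t with one multiply plus one 64-bit add. Converting back needs a 64-bit division, which the thread argues is rarely required. The userspace sketch below shows both directions. Defining nsec_t as int64_t and the two helper names are assumptions for illustration, and negative values are not handled.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

typedef int64_t nsec_t;            /* assumption: signed 64-bit nanoseconds */

#define NSEC_PER_SEC 1000000000L

/*
 * timespec -> nsec_t: one multiply and one 64-bit add. With a 32-bit
 * tv_sec this is the 32x32->64 multiply counted in the table above.
 */
nsec_t timespec_to_ns(const struct timespec *ts)
{
	return (nsec_t)ts->tv_sec * NSEC_PER_SEC + ts->tv_nsec;
}

/*
 * nsec_t -> timespec: a 64-bit division plus remainder, i.e. the
 * "backward conversion" needed only when the caller asks for the
 * remaining time. Assumes ns >= 0.
 */
struct timespec ns_to_timespec(nsec_t ns)
{
	struct timespec ts;

	ts.tv_sec = (time_t)(ns / NSEC_PER_SEC);
	ts.tv_nsec = (long)(ns % NSEC_PER_SEC);
	return ts;
}
```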
* Re: [ANNOUNCE] ktimers subsystem
From: Daniel Walker @ 2005-09-24 16:51 UTC
To: tglx
Cc: Roman Zippel, Ingo Molnar, Christopher Friesen, linux-kernel, akpm, george, johnstul, paulmck

On Sat, 2005-09-24 at 15:56 +0200, Thomas Gleixner wrote:
> On Sat, 2005-09-24 at 12:35 +0200, Roman Zippel wrote:
> > Hi,
> >
> > On Sat, 24 Sep 2005, Ingo Molnar wrote:
> >
> > > > Anyway, the biggest cost is the conversion from/to the 64bit ns value
> > > > [...]
> > >
> > > Where do you get that notion from? Have you personally measured the
> > > performance and code size impact of it? If yes, would you mind to share
> > > the resulting data with us?
> > >
> > > Our data is that the use of 64-bit nsec_t significantly reduces the size
> > > of a representative piece of code (object size in bytes):
> > >
> > >                   AMD64  I386   ARM  PPC32  M68K
> > >    nsec_t_ops       226   284   252    428   206
> > >    timespec_ops     412   324   448    640   342
> > >
> > > i.e. a ~40% size reduction when going to nsec_t on m68k, in that
> > > particular function. Even larger, ~45% code size reduction on a true
> > > 64-bit platform.
> >
> > Without any source these numbers are not verifiable. You don't even
> > mention here what that "representative piece of code" is...

These numbers are misleading. Doing a total code comparison shows that a
2.6.14-rc2+ktimers kernel is slightly bigger than a vanilla 2.6.14-rc2
kernel (gcc 4.0, defconfig). So your argument that "small is faster" must
mean ktimers is slower, or at least not faster. Making a speed argument
based on code size doesn't make much sense to me; if it's actually faster,
then show that it's faster.

Daniel
* Re: [ANNOUNCE] ktimers subsystem
From: Roman Zippel @ 2005-09-24 23:45 UTC
To: Thomas Gleixner
Cc: Ingo Molnar, Christopher Friesen, linux-kernel, akpm, george, johnstul, paulmck

Hi,

On Sat, 24 Sep 2005, Thomas Gleixner wrote:

> #define timespec_gt(a,b) \
> 	(((a).tv_sec > (b).tv_sec) ? 1 : \
> 	(((a).tv_sec < (b).tv_sec) ? 0 : \
> 	((a).tv_nsec > (b).tv_nsec)))
>
> #define timespec_addptr(a,b) \
> 	(a)->tv_sec = ((a)->tv_sec + (b)->tv_sec); \
> 	(a)->tv_nsec = ((a)->tv_nsec + (b)->tv_nsec); \
> 	if ((a)->tv_nsec >= NSEC_PER_SEC){ \
> 		(a)->tv_nsec -= NSEC_PER_SEC; \
> 		(a)->tv_sec++; \
> 	}
>
> #define timespec_addppp(c,a,b) \
> 	(c)->tv_sec = ((a)->tv_sec + (b)->tv_sec); \
> 	(c)->tv_nsec = ((a)->tv_nsec + (b)->tv_nsec); \
> 	if ((c)->tv_nsec >= NSEC_PER_SEC){ \
> 		(c)->tv_nsec -= NSEC_PER_SEC; \
> 		(c)->tv_sec++; \
> 	}

Alternative for ktimespec:

#define timespec_gt(a,b) ((a).tv64 > (b).tv64)

#if BITS_PER_LONG == 64
#define timespec_addptr(a,b) \
	(a).tv64 += (b).tv64; \
	if ((a).tv.nsec >= NSEC_PER_SEC) { \
		(a).tv64 += (u32)-NSEC_PER_SEC; \
	}

#define timespec_addppp(c,a,b) \
	(c).tv64 = (a).tv64 + (b).tv64; \
	if ((c).tv.nsec >= NSEC_PER_SEC) { \
		(c).tv64 += (u32)-NSEC_PER_SEC; \
	}
#else
#define timespec_addptr(a,b) \
	(a).tv.sec = ((a).tv.sec + (b).tv.sec); \
	(a).tv.nsec = ((a).tv.nsec + (b).tv.nsec); \
	if ((a).tv.nsec >= NSEC_PER_SEC) { \
		(a).tv.nsec -= NSEC_PER_SEC; \
		(a).tv.sec++; \
	}

#define timespec_addppp(c,a,b) \
	(c).tv.sec = ((a).tv.sec + (b).tv.sec); \
	(c).tv.nsec = ((a).tv.nsec + (b).tv.nsec); \
	if ((c).tv.nsec >= NSEC_PER_SEC) { \
		(c).tv.nsec -= NSEC_PER_SEC; \
		(c).tv.sec++; \
	}
#endif

Adding the necessary conversion to the makes the difference even smaller.

> The only point where (k)timespec has an advantage is that the userspace
> value need not be converted to nsec_t, but deducing therefrom that this is
> the better overall solution is a fallacy.

That's your opinion...

>                 nsec_t          ktimespec
>
> syscall:
>                 32x32 mul
>                 64bit add       2 x 32bit move
>
> arm timer:
>                 64 bit add      2 x 32 bit add
>                                 32 bit compare
>                                 32 bit sub
>                                 32 bit add
>
> The 3 operations compensate for the 32x32
> multiplication.

The multiply is not necessarily cheap, if the arch has no 32x32->64
instruction, gcc will generate a call to __muldi3().
Overall for the common case both variations don't differ much in speed
and size (for a single code path). For a few timers it likely doesn't
matter and for a lot of timers the tree insert likely dominates.

> The backward conversion from nsec_t to timespec is almost a non issue.
> The vast majority of callers don't provide the second argument to
> nanosleep(), setitimer(), set_timer() which makes the conversion
> necessary and I think we optimize for the common use case.

You know very well, that the conversion back to timespec is the killer in
your calculation. You graciously decide that the "vast majority" doesn't
want to read the timer, how did you get to that conclusion?

> Besides that the representation of time in nsec_t values is much
> clearer.

Well, that depends on the bigger picture, mainly how the timesource
manages the time. We want to optimize them for a fast get(ns)timeofday,
so we already have timespec based interfaces. Tick based sources will
keep a cached xtime timespec, so they either have to convert that to ns
or maintain another cached value just for your ktimers. As long as you
can't get rid of timespec completely (which is impossible), there is
value in keeping it as timespec as much as possible.

bye, Roman
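The 64-bit branch of the alternative relies on a carry trick that is easy to misread. Both nsec halves are below one second, so their sum stays below 2^32 and does not carry into sec. If the sum reaches a full second, adding (u32)-NSEC_PER_SEC, i.e. 2^32 - 10^9, to the whole 64-bit value subtracts 10^9 from the low half and carries exactly 1 into sec.

Below is a minimal userspace sketch of that idea. The union layout is selected with GCC/Clang's `__BYTE_ORDER__` rather than `__BIG_ENDIAN`, which is an assumption of the sketch. `ktimespec_add()` is a hypothetical helper name, and overflow of sec itself is not handled.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000u

/* sec occupies the upper 32 bits of tv64, nsec the lower 32 bits. */
typedef union {
	uint64_t tv64;
	struct {
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
		uint32_t sec, nsec;
#else
		uint32_t nsec, sec;
#endif
	} tv;
} ktimespec;

/* Add two normalized ktimespecs using a single 64-bit add plus a fixup. */
ktimespec ktimespec_add(ktimespec a, ktimespec b)
{
	ktimespec c;

	c.tv64 = a.tv64 + b.tv64;          /* nsec sum < 2e9 < 2^32: no carry yet */
	if (c.tv.nsec >= NSEC_PER_SEC)
		c.tv64 += (uint32_t)-NSEC_PER_SEC; /* nsec -= 1e9, sec += 1 via carry */
	return c;
}
```

For example, adding 1.6 s and 2.7 s yields sec = 4 and nsec = 300000000. The 32-bit branch of the macros reaches the same result with separate field updates.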
* Re: [ANNOUNCE] ktimers subsystem
From: Thomas Gleixner @ 2005-09-25 21:00 UTC
To: Roman Zippel
Cc: Ingo Molnar, Christopher Friesen, linux-kernel, akpm, george, johnstul, paulmck

On Sun, 2005-09-25 at 01:45 +0200, Roman Zippel wrote:
> > The backward conversion from nsec_t to timespec is almost a non issue.
> > The vast majority of callers don't provide the second argument to
> > nanosleep(), setitimer(), set_timer() which makes the conversion
> > necessary and I think we optimize for the common use case.
>
> You know very well, that the conversion back to timespec is the killer in
> your calculation. You graciously decide that the "vast majority" doesn't
> want to read the timer, how did you get to that conclusion?

I graciously put instrumentation into _all_ the relevant syscalls on a
desktop and a server machine. The result is that less than 1% of the
calls provide the read back variable.

tglx
* Re: [ANNOUNCE] ktimers subsystem
From: Roman Zippel @ 2005-09-27 16:54 UTC
To: Thomas Gleixner
Cc: Ingo Molnar, Christopher Friesen, linux-kernel, akpm, george, johnstul, paulmck

Hi,

On Sun, 25 Sep 2005, Thomas Gleixner wrote:

> > You know very well, that the conversion back to timespec is the killer in
> > your calculation. You graciously decide that the "vast majority" doesn't
> > want to read the timer, how did you get to that conclusion?
>
> I graciously put instrumentation into _all_ the relevant syscalls on a
> desktop and a server machine. The result is that less than 1% of the
> calls provide the read back variable.

That still means it is used and if an application actually depends on
it, it would be penalized by your implementation. These timers may open
up new application (in kernel or user space), where this conversion may
be needed, so _only_ looking at the current numbers is a bit misleading.

bye, Roman
* Re: [ANNOUNCE] ktimers subsystem
From: Tim Bird @ 2005-09-27 19:03 UTC
To: Roman Zippel
Cc: Thomas Gleixner, Ingo Molnar, Christopher Friesen, linux-kernel, akpm, george, johnstul, paulmck

Roman Zippel wrote:
> On Sun, 25 Sep 2005, Thomas Gleixner wrote:
>> Roman Zippel wrote:
>>> You know very well, that the conversion back to timespec
>>> is the killer in your calculation. You graciously
>>> decide that the "vast majority" doesn't
>>> want to read the timer, how did you get to that
>>> conclusion?
>>
>> I graciously put instrumentation into _all_ the
>> relevant syscalls on a desktop and a server machine.
>> The result is that less than 1% of the
>> calls provide the read back variable.
>
> That still means it is used and if an application
> actually depends on it, it would be penalized by
> your implementation. These timers may open up new
> application (in kernel or user space), where
> this conversion may be needed, so _only_ looking
> at the current numbers is a bit misleading.

Oh good heavens! One can always point to real or
hypothetical cases where a change like this
will result in worse performance. Will you only
be satisfied if there is provably NO performance
degradation for ANY app on ANY platform? Even
if the code is easier to maintain, and allows
for improvements in functionality and equal or
better performance for the majority of apps
and platforms?

We're talking about a tradeoff here, and I, of all
people, should be worried about the possible impact
on low-end embedded hardware. However, having seen
some of the problems with the current timer system
in the kernel, I'm in favor of looking at some
abstraction improvements.

Unless I missed something, ktimers has not been
recommended for mainlining yet. I suspect (without
having measured it myself yet) that the
core abstraction that it proposes (timers
vs. timeouts) is an important one for improving
the kernel timing system.

Personally, I'd like to see it go into -mm or some
other experimental tree, to give it a proper
shakedown. If some nasty corner cases show up, then
let them show up under testing rather than via
conjecture.

-- Tim

=============================
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Electronics
=============================
* Re: [ANNOUNCE] ktimers subsystem
From: Roman Zippel @ 2005-09-28 16:36 UTC
To: Tim Bird
Cc: Thomas Gleixner, Ingo Molnar, Christopher Friesen, linux-kernel, akpm, george, johnstul, paulmck

Hi,

On Tue, 27 Sep 2005, Tim Bird wrote:

> > That still means it is used and if an application
> > actually depends on it, it would be penalized by
> > your implementation. These timers may open up new
> > application (in kernel or user space), where
> > this conversion may be needed, so _only_ looking
> > at the current numbers is a bit misleading.
>
> Oh good heavens! One can always point to real or
> hypothetical cases where a change like this
> will result in worse performance. Will you only
> be satisfied if there is provably NO performance
> degradation for ANY app on ANY platform?

I want to get the focus on the complete picture, as this is a rather
critical area, and I will be satisfied as soon as I can see that all
consequences and possibilities have been considered.

> Even
> if the code is easier to maintain, and allows
> for improvements in functionality and equal or
> better performance for the majority of apps
> and platforms?

If that's the case, you're hopefully not afraid of a few questions? Why
do I have to take the code as is and just believe the claims about it?
I like improvements as much as everyone, but I also want to verify them
and look at the alternatives, and I can't see anything wrong with that.

> Unless I missed something, ktimers has not been
> recommended for mainlining yet. I suspect (without
> having measured it myself yet) that the
> core abstraction that it proposes (timers
> vs. timeouts) is an important one for improving
> the kernel timing system.

I'm not saying that the idea is wrong, the general direction is fine,
but some course correction should be possible?

bye, Roman
* Re: [ANNOUNCE] ktimers subsystem
From: Thomas Gleixner @ 2005-09-25 21:02 UTC
To: Roman Zippel
Cc: Ingo Molnar, Christopher Friesen, linux-kernel, akpm, george, johnstul, paulmck

On Sun, 2005-09-25 at 01:45 +0200, Roman Zippel wrote:

> The multiply is not necessarily cheap, if the arch has no 32x32->64
> instruction, gcc will generate a call to __muldi3().

Can you please point out which architectures do not have a 32x32->64
instruction ?

tglx
* Re: [ANNOUNCE] ktimers subsystem
From: Roman Zippel @ 2005-09-27 16:48 UTC
To: Thomas Gleixner
Cc: Ingo Molnar, Christopher Friesen, linux-kernel, akpm, george, johnstul, paulmck

Hi,

On Sun, 25 Sep 2005, Thomas Gleixner wrote:

> On Sun, 2005-09-25 at 01:45 +0200, Roman Zippel wrote:
>
> > The multiply is not necessarily cheap, if the arch has no 32x32->64
> > instruction, gcc will generate a call to __muldi3().
>
> Can you please point out which architectures do not have a 32x32->64
> instruction ?

I have no complete overview. I know that Motorola actually removed that
instruction in the M68060 (it causes an emulation trap) and it's still
not back in newer ColdFire cpus. For arm it's an optional instruction in
earlier versions (v3). For ppc it's split into two instructions.
For the rest you might want to check <asm/div64.h>, if div64 has to be
emulated, there are good chances this instruction has to be emulated as
well (especially in smaller embedded archs).

bye, Roman
* Re: [ANNOUNCE] ktimers subsystem
From: Tim Bird @ 2005-09-27 18:38 UTC
To: Roman Zippel
Cc: Thomas Gleixner, Ingo Molnar, Christopher Friesen, linux-kernel, akpm, george, johnstul, paulmck

Roman Zippel wrote:
> On Sun, 25 Sep 2005, Thomas Gleixner wrote:
>>Can you please point out which architectures do not have a 32x32->64
>>instruction ?

<snip>

> For the rest you might want to check <asm/div64.h>, if div64 has to be
> emulated, there are good chances this instruction has to be emulated as
> well (especially in smaller embedded archs).

Hmm. In my experience, there are several embedded platforms
with a 32x32->64 instruction, which are lacking a div64 instruction.
I don't think checking for div64 is a very good metric here.

=============================
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Electronics
=============================
* Re: [ANNOUNCE] ktimers subsystem
From: George Anzinger @ 2005-09-27 20:36 UTC
To: Tim Bird
Cc: Roman Zippel, Thomas Gleixner, Ingo Molnar, Christopher Friesen, linux-kernel, akpm, johnstul, paulmck

Tim Bird wrote:
> Roman Zippel wrote:
>
>>On Sun, 25 Sep 2005, Thomas Gleixner wrote:
>>
>>>Can you please point out which architectures do not have a 32x32->64
>>>instruction ?
>
>
> <snip>
>
>>For the rest you might want to check <asm/div64.h>, if div64 has to be
>>emulated, there are good chances this instruction has to be emulated as
>>well (especially in smaller embedded archs).
>
>
> Hmm. In my experience, there are several embedded platforms
> with a 32x32->64 instruction, which are lacking a div64 instruction.
> I don't think checking for div64 is a very good metric here.

Also, even having a div64 instruction does not eliminate the need for
asm/div64.h, as it checks for results that are >32 bits and does the
right thing. For example, letting the x86 do this divide with such a
result results in a trap. This is why we need to be very careful where
we use div_ll_l_rem(), which accesses just this instruction.

--
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/
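The two primitives under discussion can be written in portable C:

- a 32x32->64 multiply;
- a 64-by-32 division that updates the dividend in place and returns the remainder, which is the job the kernel's do_div() in <asm/div64.h> does.

This is a userspace sketch with hypothetical function names, not the kernel implementations. Whether the multiply becomes a single widening instruction or a libgcc helper such as __muldi3 depends on the target, as the thread discusses. Because the division is written with 64-bit operands, the compiler or its runtime helper handles quotients wider than 32 bits, unlike a raw 64/32 divide instruction on x86.

```c
#include <stdint.h>

/*
 * 32x32->64 multiply. Casting one operand to 64 bits before the
 * multiply keeps the full product; without the cast the product
 * would be computed in 32 bits and silently truncated.
 */
uint64_t mul_u32_u32_sketch(uint32_t a, uint32_t b)
{
	return (uint64_t)a * b;
}

/*
 * do_div()-style helper: divide *n by base in place and return the
 * remainder. The quotient may exceed 32 bits.
 */
uint32_t div_u64_rem_sketch(uint64_t *n, uint32_t base)
{
	uint32_t rem = (uint32_t)(*n % base);

	*n /= base;
	return rem;
}
```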
* Re: [ANNOUNCE] ktimers subsystem
2005-09-22 23:09 ` Roman Zippel
2005-09-22 23:31 ` Christopher Friesen
@ 2005-09-23 2:25 ` john stultz
2005-09-23 8:27 ` Thomas Gleixner
2005-09-23 15:21 ` Paul E. McKenney
3 siblings, 0 replies; 50+ messages in thread
From: john stultz @ 2005-09-23 2:25 UTC
To: Roman Zippel; +Cc: Thomas Gleixner, linux-kernel, mingo, akpm, george, paulmck

On Fri, 2005-09-23 at 01:09 +0200, Roman Zippel wrote:
> On Thu, 22 Sep 2005, Thomas Gleixner wrote:
> > > It also doesn't explain how it will interact with John's work,
> >
> > "The following add on patches are not provided for ad hoc inclusion as
> > they contain third party patches. The reason for providing this series
> > is to demonstrate the future use of ktimers and the simple extensibility
> > for the implementation of high resolution timers. Especially John Stultz's
> > timeofday patch is a completely separate issue
> > and just used due to the ability to provide high resolution timers in a
> > simple and non intrusive way."
> >
> > Isn't this clear enough?
>
> No and I explained why I think that these are not separate issues at all.

You had a long response, and some of the terminology is confusing, so
I'm not sure exactly how to help address your concern.

You accurately summarized that in my patches I introduced two major
generic concepts:

1. The time source abstraction of a free-running counter to be used for
timekeeping. (Note: not "timer source" - timers generate interrupts,
time sources do not necessarily)

2. Allowing the wall-clock (as well as a monotonic system clock) to run
independent of timer ticks, via the time source abstraction.

Your complaint was that in making these changes the NTP model has been
changed. I'm not sure I agree, but hopefully the patches I sent out
today can continue that discussion. :)

It is my claim that these two concepts allow for correct and robust
timekeeping, and additionally give some flexibility so larger changes
(such as dynamic ticks) can be made without worrying too much about
their effect on timekeeping. However, it does not provide anything
interface-wise that does not currently exist; it merely cleans some of
the interfaces up.

So, ignoring correctness in the face of lost ticks and other
timekeeping problems, my patch is not strictly necessary for what
Thomas is doing. It just so happens that my do_monotonic_clock()
interface provided a correct monotonic system time in nanoseconds, and
that meshed very well with Thomas' work.

> > > Ok, so what's missing? From a basic design overview I would expect some
> > > information about types of time within the kernel and their relationship.
> > > We basically have three types:
> > > - scheduler time
> > > - wallclock time
> > > - process time

My clarity in writing is sometimes an issue, but let me take a shot at
it (Thomas, or anyone, feel free to correct me).

Currently we have two main domains of time in the kernel: xtime and
jiffies.

jiffies: jiffies is a simple HZ frequency, software maintained
(interrupt based) counter. Since it is fairly low-res, it easily fits
in a single long int (well, a reasonable portion of it does), and
requires no locking to access atomically, making it very easy and fast
to use. It is used for almost all in-kernel time accounting, from
scheduler and process accounting to soft-timers. Since it is not
exported to userspace, the counter is less robust on some arches in the
face of things like lost ticks. It is not NTP corrected, has a limited
range on some arches (in its single long int form), and discrepancies
between the requested HZ value and the actual tick frequency (ACTHZ)
can cause additional confusion when mapping jiffies to actual time.

xtime: xtime provides an nsec resolution, software maintained, NTP
adjusted wall clock which is exported to userspace. Along with
wall_to_monotonic we get an NTP adjusted monotonic clock. This is
expected to be very robust and accurate; however, since it is so finely
grained it requires 64 bits (or a timespec) to store, which can cause
performance concerns in the cases where nsec resolution is unnecessary.

Now that the time domains are covered, how do we use them?

Soft-timers / Timeouts (aka: in-kernel software maintained timers):
Soft-timers provide an internal kernel mechanism for running code at a
later specified time. Soft-timers use jiffies for expiration, so they
are fast to use, but are limited to HZ resolution. As Thomas already
discussed (as well as LWN's article), they are very frequently used for
timeouts that are removed before they expire. Additionally, since they
are jiffies based, they have problems when mapping back and forth with
wall time, and do not robustly handle lost ticks (however, due to their
common use, this is not normally an issue).

When userspace requests for action at a future time are made to the
kernel, they are made using some form of human time unit (flat usecs or
timespecs, whatever). Currently inside the kernel, we must convert
these requests to jiffies and use the soft-timer subsystem. This limits
both the range and the resolution of the request. Additionally, the
discrepancies between HZ, ACTHZ, NTP adjustments and lost ticks can
cause additional inaccuracies in the conversion.

ktimers (from my understanding, again Thomas, correct me as needed):
ktimers provide a completely separate soft-timer list, which can use
either the wall-clock or the monotonic-clock as its domain for addition
and expiration. Since users may specify nanosecond resolution requests,
ktimers preserve the request in a nanosecond form. This eliminates any
discrepancies between jiffies and wall or monotonic time, and allows
for future sub-HZ latencies for expiration (in combination with a
high-res hardware-timer interrupt source). Additionally, in-kernel
users who desire high-precision wall/monotonic clock based timers could
find ktimers useful.

So the existing fast interface remains with the same jiffies time
domain. ktimers just add a secondary high-resolution interface that
maps to the wall/monotonic_clock domain.

> > > The existence of the timer source abstraction is a major requirement for
> > > further improvements (in this regard it's already suspicious, that you put
> > > major changes before John's patch).
> >
> > What's suspicious about that? Separating the "timeout" API and the "timer"
> > API has nothing to do with John's patches.
>
> Related changes should be done in a logical order, which I obviously
> disagree with you about.

However, in this case the ktimer patch Thomas mailed out is really
independent from my change. I believe my change helps ensure his
interfaces behave properly (you don't want your monotonic clock jumping
backwards occasionally!), but my changes do not affect his code's
logic.

The fact that I make wall time updates independent of timer ticks is
really just for simplicity and correctness. It is in no way a
requirement for wall/monotonic domain based timers or high-res timers
(my code does not provide any higher resolution interface than what is
already there). If you were talking about the dynamic tick changes, it
would make sense to wait and do my changes first, but Thomas is not
proposing that at this moment.

> > > The next major change would be to add the possibility to reprogram a
> > > timer source, the scheduler can use this to
> > > skip timer ticks and e.g. itimer can offer higher resolution timers. The
> > > main point here is before we get to any API decisions, we need to develop
> > > a model how a single time source can drive multiple users. Your split
> > > between user timers and kernel timeouts leaves this question completely
> > > open.
> >
> > Did I claim that ktimers solve this problem?
> >
> > No.
> >
> > The patches are related but address different aspects of the overall
> > problem without conflicting with each other. Quite the contrary: they
> > complement each other.
> >
> > I clearly stated that the reprogramming of timer events, which are not
> > addressed by ktimers and I never claimed ktimers does, is a completely
> > different problem.
>
> No, it's part of the same problem, how are scheduler and your ktimers
> supposed to share the same time source?

They don't share a time source. ktimers are in the xtime domain, the
scheduler is in the jiffies domain.

Sorry, this was really much longer than I wanted it to be. Hopefully it
wasn't too repetitious, and was reasonably clear.

thanks
-john
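John's first concept, a time source as a free-running counter used for timekeeping, can be illustrated with a small sketch. The assumptions here are not taken from John's patch: the structure and function names are hypothetical, the caller reads the counter and passes the value in, and cycles convert to nanoseconds as `(cycles * mult) >> shift`. Because only the masked difference between two reads matters, monotonic time keeps advancing correctly even if timer ticks were lost, which is the independence from ticks described in his second point.

```c
#include <stdint.h>

/* Hypothetical time source descriptor (not the structure from John's
 * patch): a free-running counter that wraps after mask + 1 cycles and
 * whose cycles convert to nanoseconds as (cycles * mult) >> shift. */
struct ts_sketch {
	uint64_t mask;         /* e.g. 0xffffffff for a 32-bit counter */
	uint32_t mult;
	uint32_t shift;
	uint64_t last_cycles;  /* counter value at the previous update */
	uint64_t mono_ns;      /* accumulated monotonic time in ns */
};

/* Fold the cycles elapsed since the previous update into mono_ns.
 * `now` is a fresh counter read supplied by the caller. The masked
 * subtraction handles one counter wrap, so the result does not depend
 * on how many timer ticks arrived (or were lost) in between, as long as
 * updates happen at least once per wrap period and delta * mult fits
 * in 64 bits. */
static uint64_t ts_update(struct ts_sketch *ts, uint64_t now)
{
	uint64_t delta = (now - ts->last_cycles) & ts->mask;

	ts->mono_ns += (delta * ts->mult) >> ts->shift;
	ts->last_cycles = now;
	return ts->mono_ns;
}
```

For a 1 MHz counter, `mult = 1000 << 10` and `shift = 10` give 1000 ns per cycle; a 32-bit counter that wraps from 0xfffffff0 to 0x10 then contributes 32 cycles, i.e. 32000 ns. The multiply-and-shift form also avoids a 64-bit divide on every read.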
* Re: [ANNOUNCE] ktimers subsystem
2005-09-22 23:09 ` Roman Zippel
2005-09-22 23:31 ` Christopher Friesen
2005-09-23 2:25 ` john stultz
@ 2005-09-23 8:27 ` Thomas Gleixner
2005-09-24 2:43 ` Roman Zippel
2005-09-23 15:21 ` Paul E. McKenney
3 siblings, 1 reply; 50+ messages in thread
From: Thomas Gleixner @ 2005-09-23 8:27 UTC
To: Roman Zippel; +Cc: linux-kernel, mingo, akpm, george, johnstul, paulmck

On Fri, 2005-09-23 at 01:09 +0200, Roman Zippel wrote:
> > Quick answer: Networking and disk I/O. Insane load on a 4 way SMP
> > machine. Check yourself. :)
>
> This is no answer at all, you only repeat what you already said above. :(
> Care to share your knowledge?

Each network connection, each disk I/O operation arms a timeout timer to
cover error conditions. Increasing the load on those increases the
number of armed timers. At the same time this increased load keeps the
timers active longer, as it takes more time to detect that the "good"
condition arrived on time.

> You don't make it obvious at all. You jump from problems with _kernel_
> timers, to the introduction of a new subsystem to manage _process_ timers.
> Why don't you fix the problems with kernel timers separately?
> Just because kernel timers require less precision doesn't mean you can
> make them imprecise.

I did not make them imprecise. ktimers change exactly nothing there.

> The timer accuracy is defined by the timer source and
> I expect it to be the same for both kernel and process timers.

As John pointed out correctly, this is a separation of time domains. Is
John's explanation enough?

> > > Basically how does the new big picture look like and how do high
> > > resolution timer fit into it? (You are more busy defending the 64bit math,
> > > than actually explaining why and where it's needed in the first place.)
> >
> > I also explained why I wanted to separate "timeout" and "timers" APIs. I
> > explained why I chose rbtree and I explained why I used 64bit math and
> > at least why 64 bit math is not as evil as commonly believed.
>
> I don't say that 64bit math is evil, I just question that it's required -
> small, but important difference.
> The main problem with your ktimer patch is that it's another all-in-one
> patch, it simply changes too many aspects at once. If you want to
> introduce a new API, you can do so by first introducing a small layer
> which maps to the old layer. This makes it easier to see and prove any
> potential improvement.

I don't see ktimers as an all-in-one, change-everything patch. It
addresses exactly _one_ problem and nothing else. It separates time
domains and their representation and handling.

I don't see how you want to map this new API to the old layer.

Let's look at the patch:

ktimers.c/h introduce the new API (no change to existing code).

The other files changed are the conversions of the users to this new
API. Mostly small changes, except for posix-timers.c, where the abs_list
handling was removed as it was no longer necessary.

Providing a new API without making use of it and showing the benefits
is rather pointless. I consider the simplification of posix-timers a
valuable benefit.

> > > I'm not against high resolution timers per se, but this
> > > doesn't explain why it has to be high resolution all the way.
> >
> > Where is high resolution all the way? Care to read the patch? It's high
> > resolution aware and it does take out odd areas of code by design.
>
> It's not just high resolution aware, it makes all calculation in high
> resolution _unconditionally_, which makes it high resolution all the way.

Please see the other mail explaining 32/64bit issues.

> > > It also doesn't explain how it will interact with John's work,
> >
> > "The following add on patches are not provided for ad hoc inclusion as
> > they contain third party patches. The reason for providing this series
> > is to demonstrate the future use of ktimers and the simple extensibility
> > for the implementation of high resolution timers. Especially John Stultz's
> > timeofday patch is a completely separate issue
> > and just used due to the ability to provide high resolution timers in a
> > simple and non intrusive way."
> >
> > Isn't this clear enough?
>
> No and I explained why I think that these are not separate issues at all.

These issues are separate even if ktimers use values provided by the
time of day subsystem. ktimers need the current time in the
representation of CLOCK_REALTIME and CLOCK_MONOTONIC.

ktimers do not rely on John's work. ktimers can make use of John's work,
as they use the values provided by the existing timeofday code now. The
usage is simpler if John's work is in place. Nothing else.

> > > The main difference between them is that the latter is user
> > > programmable.
> >
> > wallclock is reprogrammable too and it introduces a bunch of horrible
> > functions in posix-timers.c. grep for abs_list. I explained why it's
> > horrible already.
>
> I said _user_ programmable, wallclock time is usually NTP controlled.

I consider sys_adjtimex() and sys_settimeofday() as user interfaces.
Both affect wallclock and therefore affect timers related to wallclock.

> > 2. I don't see any hidden arch code in the ktimers patch. Do you?
>
> That's not what I meant (and if you had taken the time to think about it,
> instead of just being angry at me, I'm sure you would have noticed
> yourself), this is e.g. about code in arch/i386/kernel/timers/ or
> arch/ppc/kernel/time.c.

I don't see what you want. The submitted ktimers patch does not contain
a single change to arch/xxx files.

If you refer to the proof of concept high resolution timer
implementation, then I really do not understand why you are insisting
on that. It is a proof of concept to verify the usability of the ktimer
base implementation for high resolution timing - nothing more. I used
John's patches for that proof of concept implementation as they made
life simpler. So there is no point in discussing this. I clearly said
that from the beginning. I provided this add-on patch series to give
interested developers a possibility to look at it and compare and
contrast it to the existing high resolution timer implementations.

> > > The existence of the timer source abstraction is a major requirement for
> > > further improvements (in this regard it's already suspicious, that you put
> > > major changes before John's patch).
> >
> > What's suspicious about that? Separating the "timeout" API and the "timer"
> > API has nothing to do with John's patches.
>
> Related changes should be done in a logical order, which I obviously
> disagree with you about.

Again. The patches are orthogonal. So where is the logical order?

> > I clearly stated that the reprogramming of timer events, which are not
> > addressed by ktimers and I never claimed ktimers does, is a completely
> > different problem.
>
> No, it's part of the same problem, how are scheduler and your ktimers
> supposed to share the same time source?

<SNIP...>

Admittedly everything which is dealing with aspects of time is related,
but it can and must be separated into different subsystems, which make
use of the provided interfaces.

1. Time tick

   - constant frequency tick

   Provides interface for:
       reading the current tick count

2. Time of day

   - handles frequency adjustments of the timesource
   - keeps track of monotonic time
   - provides the representation of wall clock time
   - handles the adjustment of wall clock time

   Provides interfaces for:
       reading monotonic time
       reading wallclock time
       adjusting the frequency of the time source
       setting wallclock time

   Possibly makes use of the interface
   (depends on the availability of time sources):
       time tick: read tick count

3. Timeout API

   - Time tick based timer handling
   - Solely used for in kernel purposes

   Provides interfaces for:
       adding timers
       modifying timers
       deleting timers

   Makes use of the interface:
       time tick: read tick count

4. Timer API

   - monotonic clock based timers
   - realtime (wallclock) clock based timers
   - Mainly intended for application timers

   Provides interfaces for:
       adding timers
       modifying timers
       deleting timers

   Makes use of the interfaces:
       timeofday: read monotonic time
       timeofday: read wallclock time

So we have four building blocks related to time, but clearly separated.

The current implementation in the kernel is providing only 1, 2 and 3.
The timeofday API is somewhat intermingled with the tick code. The Timer
API is implemented with a bunch of workarounds by using the Timeout API.

ktimers provide the separation of Timeout API and Timer API and
therefore the separation of the time domains. ktimers do not need any
changes to 1, 2 and 3 but can benefit from whatever improvement is made
in the timeofday domain.

John's patches address the clear separation of time ticks and time of
day timekeeping and affect neither the timeout API nor ktimers. Once
John's work is in place the ktimer code simply makes use of the new
interfaces.

High resolution timers and dynamic ticks need this separation to make
them less intrusive. Of course we need an additional abstraction layer,
which handles the timer event sources.

	tglx
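Building block 4 in Thomas's list can be sketched as timers that store an absolute expiry in nanoseconds of their own clock and ask the time-of-day layer (block 2) for the current time. This is a hypothetical illustration, not the ktimers code: all names are invented, and realtime is modelled as monotonic time plus an offset that a settimeofday()-style call moves. It shows why setting the wall clock can make absolute CLOCK_REALTIME timers due at once while CLOCK_MONOTONIC timers are untouched, which appears to be the situation the abs_list code in posix-timers.c exists to handle.

```c
#include <stdbool.h>
#include <stdint.h>

enum tclock { TCLOCK_MONOTONIC, TCLOCK_REALTIME };

/* Stand-in for the time-of-day subsystem (block 2). */
struct tod_sketch {
	int64_t mono_ns;           /* monotonic time, never set by users */
	int64_t wall_offset_ns;    /* realtime = monotonic + this offset */
};

/* A timer keeps an absolute expiry in the domain of its own clock. */
struct timer_sketch {
	enum tclock clock;
	int64_t expires_ns;
};

/* "read monotonic time" / "read wallclock time" from block 2. */
static int64_t tod_read(const struct tod_sketch *tod, enum tclock clock)
{
	return clock == TCLOCK_REALTIME ? tod->mono_ns + tod->wall_offset_ns
	                                : tod->mono_ns;
}

static bool timer_expired(const struct timer_sketch *t,
                          const struct tod_sketch *tod)
{
	return tod_read(tod, t->clock) >= t->expires_ns;
}

/* Setting the wall clock only moves the offset: monotonic timers are
 * unaffected, while absolute realtime timers may now be due. */
static void tod_set_wall(struct tod_sketch *tod, int64_t wall_ns)
{
	tod->wall_offset_ns = wall_ns - tod->mono_ns;
}
```

Keeping each timer in its own clock's domain means no conversion to ticks is stored, so a wall-clock change only requires re-checking the realtime timers rather than recomputing tick-based expiries.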
* Re: [ANNOUNCE] ktimers subsystem
2005-09-23 8:27 ` Thomas Gleixner
@ 2005-09-24 2:43 ` Roman Zippel
2005-09-24 5:03 ` Ingo Molnar
2005-09-24 9:04 ` James Bruce
0 siblings, 2 replies; 50+ messages in thread
From: Roman Zippel @ 2005-09-24 2:43 UTC
To: Thomas Gleixner; +Cc: linux-kernel, mingo, akpm, george, johnstul, paulmck

Hi,

On Fri, 23 Sep 2005, Thomas Gleixner wrote:

> Each network connection, each disk I/O operation arms a timeout timer to
> cover error conditions. Increasing the load on those increases the
> number of armed timers. At the same time this increased load keeps the
> timers active longer, as it takes more time to detect that the "good"
> condition arrived on time.

You're still rather vague here...
Anyway, if the number of active timers should be a problem, there are
ways to avoid them. For disk io it's rather simple as the timeout is
mostly constant:

start i/o:

	req->expire = jiffies + timeout;
	list_add_tail(&req->list, &timeout_list);
	if (!timer_pending(timer)) {
		timer->expires = req->expire;
		add_timer(timer);
	}

timeout function:

	/* oldest request, i.e. the first to expire */
	req = list_entry(timeout_list.next, typeof(*req), list);
	if (time_after_eq(jiffies, req->expire)) {
		// error...
	} else {
		timer->expires = req->expire;
		add_timer(timer);
	}

Network timers are a bit more difficult as the timeouts are more
dynamic, but one can at least delay arming the timer in most cases, by
running a timer every x ticks:

start timer:

	if (timeout < x)
		add_timer();
	else
		list_add_tail();

timer function:

	list_for_each_entry()
		add_timer();

Should the action be successful before the timer runs, it only needs to
remove it from the private list.

> Admittedly everything which is dealing with aspects of time is related,
> but it can and must be separated into different subsystems, which make
> use of the provided interfaces.
>
> 1. Time tick
>
>    - constant frequency tick
>
>    Provides interface for:
>        reading the current tick count
>
> 2. Time of day
>
>    - handles frequency adjustments of the timesource
>    - keeps track of monotonic time
>    - provides the representation of wall clock time
>    - handles the adjustment of wall clock time
>
>    Provides interfaces for:
>        reading monotonic time
>        reading wallclock time
>        adjusting the frequency of the time source
>        setting wallclock time
>
>    Possibly makes use of the interface
>    (depends on the availability of time sources):
>        time tick: read tick count
>
> 3. Timeout API
>
>    - Time tick based timer handling
>    - Solely used for in kernel purposes
>
>    Provides interfaces for:
>        adding timers
>        modifying timers
>        deleting timers
>
>    Makes use of the interface:
>        time tick: read tick count
>
> 4. Timer API
>
>    - monotonic clock based timers
>    - realtime (wallclock) clock based timers
>    - Mainly intended for application timers
>
>    Provides interfaces for:
>        adding timers
>        modifying timers
>        deleting timers
>
>    Makes use of the interfaces:
>        timeofday: read monotonic time
>        timeofday: read wallclock time
>
> So we have four building blocks related to time, but clearly separated.

First, please don't say API if you talk about subsystems, it's not the
same. APIs are part of a subsystem and describe the relationships
between subsystems. I think that caused a major part of the confusion.

I don't completely agree with your picture above, and while these
subsystems are mostly separate, they are not independent:

current dependencies:

1. time source ---> 2. wallclock ---> 4. process timer
               \-> 3. kernel timer -/

We currently have a very simple time source, which just provides a
nonprogrammable timer interrupt. Process timers are limited in their
(programmable) resolution to kernel timer resolution.
This maybe also better explains the 3 types of time I mentioned (or
domains if you prefer):
- wallclock time: NTP controlled, only readable and not programmable
- scheduler time: monotonic and unsynchronized, readable and
  programmable in HZ resolution
- process time: derived from the first two.

John's work now (hopefully) integrates wallclock functionality into the
time source:

time source ---> wallclock time
ntp library -/

This means the NTP code becomes a library which can be used by the time
source to provide a synchronized wallclock time.

The next step would be to make the time source programmable, so that we
can get rid of the kernel timer dependency from the process timer:

time source ---> kernel timer
            \-> process timer

In this context the main functionality of your patch now finally
becomes understandable: making process timers independent of kernel
timers. This never became clear from your announcement; it talks about
a lot of unrelated problems, but it never gets to the actual problem.

bye, Roman
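Roman's disk I/O scheme, one kernel timer covering a FIFO of requests that all share the same constant timeout, can be simulated in user space. Everything below is a hypothetical sketch: `TICK_AFTER_EQ()` mimics the wrap-safe idea of `time_after_eq()`, tick values are passed in explicitly instead of reading `jiffies`, and the request and queue types are invented. Because every request gets the same timeout, list order equals expiry order, so only the head of the list ever needs the timer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Wrap-safe "a is at or after b" for tick counters, mirroring the idea
 * of the kernel's time_after_eq(). */
#define TICK_AFTER_EQ(a, b) ((long)((a) - (b)) >= 0)

struct io_req {                    /* hypothetical request */
	unsigned long expire;          /* tick at which it times out */
	struct io_req *next;
	bool timed_out;
};

struct timeout_fifo {              /* hypothetical per-device queue */
	struct io_req *head, *tail;    /* oldest request first */
	unsigned long timeout;         /* same timeout for every request */
	bool timer_armed;              /* models timer_pending() */
	unsigned long timer_expires;   /* models timer->expires */
};

/* "start i/o": append the request; arm the single timer only if idle. */
static void fifo_start(struct timeout_fifo *q, struct io_req *r,
                       unsigned long now)
{
	r->expire = now + q->timeout;
	r->next = NULL;
	r->timed_out = false;
	if (q->tail)
		q->tail->next = r;
	else
		q->head = r;
	q->tail = r;
	if (!q->timer_armed) {
		q->timer_armed = true;
		q->timer_expires = r->expire;
	}
}

/* Successful completion: just unlink; the timer is left alone. */
static void fifo_complete(struct timeout_fifo *q, struct io_req *r)
{
	struct io_req **pp = &q->head, *prev = NULL;

	while (*pp && *pp != r) {
		prev = *pp;
		pp = &(*pp)->next;
	}
	if (!*pp)
		return;                    /* not queued (e.g. already timed out) */
	*pp = r->next;
	if (q->tail == r)
		q->tail = prev;
}

/* "timeout function": fail every overdue request at the head, then
 * re-arm the timer for the new oldest request, if any. */
static void fifo_timer_fn(struct timeout_fifo *q, unsigned long now)
{
	q->timer_armed = false;
	while (q->head && TICK_AFTER_EQ(now, q->head->expire)) {
		struct io_req *r = q->head;

		q->head = r->next;
		if (!q->head)
			q->tail = NULL;
		r->timed_out = true;       /* error path goes here */
	}
	if (q->head) {
		q->timer_armed = true;
		q->timer_expires = q->head->expire;
	}
}
```

Note the argument order of the expiry test: it is the current tick that must be at or after the request's expiry; swapping the arguments would report requests that have not yet expired as errors. Completions stay cheap because they never touch the timer; a stale timer simply finds an unexpired head and re-arms.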
* Re: [ANNOUNCE] ktimers subsystem
2005-09-24 2:43 ` Roman Zippel
@ 2005-09-24 5:03 ` Ingo Molnar
2005-09-24 9:04 ` James Bruce
1 sibling, 0 replies; 50+ messages in thread
From: Ingo Molnar @ 2005-09-24 5:03 UTC
To: Roman Zippel
Cc: Thomas Gleixner, linux-kernel, akpm, george, johnstul, paulmck

* Roman Zippel <zippel@linux-m68k.org> wrote:

> On Fri, 23 Sep 2005, Thomas Gleixner wrote:
>
> > Each network connection, each disk I/O operation arms a timeout timer to
> > cover error conditions. Increasing the load on those increases the
> > number of armed timers. At the same time this increased load keeps the
> > timers active longer, as it takes more time to detect that the "good"
> > condition arrived on time.
>
> You're still rather vague here...

as i said before, millions of timers are easily possible, and i
personally saw in excess of 16 million active timers. I hope there was
nothing vague about that ;-)

	Ingo
* Re: [ANNOUNCE] ktimers subsystem
2005-09-24 2:43 ` Roman Zippel
2005-09-24 5:03 ` Ingo Molnar
@ 2005-09-24 9:04 ` James Bruce
1 sibling, 0 replies; 50+ messages in thread
From: James Bruce @ 2005-09-24 9:04 UTC
To: Roman Zippel
Cc: Thomas Gleixner, linux-kernel, mingo, akpm, george, johnstul, paulmck

Roman Zippel wrote:
> On Fri, 23 Sep 2005, Thomas Gleixner wrote:
>>Each network connection, each disk I/O operation arms a timeout timer to
>>cover error conditions. Increasing the load on those increases the
>>number of armed timers. At the same time this increased load keeps the
>>timers active longer, as it takes more time to detect that the "good"
>>condition arrived on time.
>
> You're still rather vague here...
> Anyway, if the number of active timers should be a problem, there are ways
> to avoid them.
<snip>

What does this have to do with the ktimers work itself? It's true that
other parts of the kernel shouldn't create more timers than necessary,
but the timer subsystem should be able to handle a lot of timers
regardless of that.

To put it in perspective: A server doesn't run very efficiently with a
load of 1000, and that should be avoided by proper application design.
Yet we still test the scheduler on such workloads, don't we? It's nice
to know a subsystem doesn't fall over when it's stressed.

If you really feel timers are overused, please bring it up with the
maintainers of *those subsystems* which are overusing them. There's no
point in raising the issue with Thomas since he's not responsible for
how other people use or misuse an existing API.

Perhaps the real issue is that you feel we should police the kernel
usage of timers, instead of moving to a more scalable implementation.
This is one of those rare cases, however, where we can have cleaner,
more modular code, barely longer than before, which is also more
scalable. The only thing left to measure is the performance impact, but
the authors haven't gotten that far yet. Instead of jumping to
conclusions now, let's wait until we have some real numbers, shall we?

Jim Bruce
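James's point is that the timer core itself should stay efficient with very large numbers of pending timers (Ingo reports more than 16 million). Per Thomas, the ktimers patch keeps pending timers in an rbtree; the sketch below substitutes a binary min-heap of absolute expiry times, purely as a shorter stand-in with the same O(log n) insertion and O(log n) removal of the earliest expiry. The names are hypothetical, and unlike an rbtree this heap cannot cheaply delete an arbitrary timer without extra index bookkeeping, which matters for timeouts that are usually cancelled before they fire.

```c
#include <stdint.h>
#include <stdlib.h>

/* Min-heap of absolute expiry times in nanoseconds (hypothetical; a
 * real timer queue would store timer objects, not bare times). */
struct timer_heap {
	int64_t *exp;
	size_t n, cap;
};

/* Insert in O(log n): sift the new expiry up towards the root.
 * Returns 0 on success, -1 if memory could not be grown. */
static int heap_add(struct timer_heap *h, int64_t expires)
{
	if (h->n == h->cap) {
		size_t cap = h->cap ? 2 * h->cap : 16;
		int64_t *p = realloc(h->exp, cap * sizeof(*p));

		if (!p)
			return -1;
		h->exp = p;
		h->cap = cap;
	}
	size_t i = h->n++;
	while (i > 0) {
		size_t parent = (i - 1) / 2;

		if (h->exp[parent] <= expires)
			break;
		h->exp[i] = h->exp[parent];
		i = parent;
	}
	h->exp[i] = expires;
	return 0;
}

/* Remove and return the earliest expiry in O(log n); requires n > 0. */
static int64_t heap_pop_first(struct timer_heap *h)
{
	int64_t first = h->exp[0];
	int64_t last = h->exp[--h->n];
	size_t i = 0;

	for (;;) {
		size_t child = 2 * i + 1;

		if (child >= h->n)
			break;
		if (child + 1 < h->n && h->exp[child + 1] < h->exp[child])
			child++;
		if (last <= h->exp[child])
			break;
		h->exp[i] = h->exp[child];
		i = child;
	}
	h->exp[i] = last;   /* harmless when the heap just became empty */
	return first;
}
```

With n pending timers, both operations touch about log2(n) levels, so even 16 million timers need only around 24 comparisons per insert or expiry.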
* Re: [ANNOUNCE] ktimers subsystem
2005-09-22 23:09 ` Roman Zippel
` (2 preceding siblings ...)
2005-09-23 8:27 ` Thomas Gleixner
@ 2005-09-23 15:21 ` Paul E. McKenney
2005-09-24 3:38 ` Roman Zippel
3 siblings, 1 reply; 50+ messages in thread
From: Paul E. McKenney @ 2005-09-23 15:21 UTC
To: Roman Zippel; +Cc: Thomas Gleixner, linux-kernel, mingo, akpm, george, johnstul

On Fri, Sep 23, 2005 at 01:09:46AM +0200, Roman Zippel wrote:
> On Thu, 22 Sep 2005, Thomas Gleixner wrote:

[ . . . ]

> > > The main difference between them is that the latter is user
> > > programmable.
> >
> > wallclock is reprogrammable too and it introduces a bunch of horrible
> > functions in posix-timers.c. grep for abs_list. I explained why it's
> > horrible already.
>
> I said _user_ programmable, wallclock time is usually NTP controlled.

I believe Thomas is concerned about workloads that need a short-term
stable timebase. For example, a process-control application might need
to accurately measure a (say) 1500-millisecond time interval. Both
user-programmability and NTP adjustments to a given timebase could
destroy the needed measurement accuracy.

Such a workload does not need the long-term tie to wallclock time that
NTP provides, but it does need the accurate short-term timekeeping that
NTP cannot provide -- NTP sacrifices short-term accuracy in order to
adjust the clock as needed to gain long-term stability.

Thomas, John, please jump in if I am missing the point here.

Thanx, Paul
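Paul's example workload, timing a 1500 ms interval against a short-term stable clock, maps onto the POSIX CLOCK_MONOTONIC interfaces. A minimal user-space sketch, assuming a POSIX system that provides `clock_gettime()` and `clock_nanosleep()`; the helper names are hypothetical. CLOCK_MONOTONIC is not stepped by settimeofday(), but, as John notes, the monotonic clock is derived from NTP-adjusted xtime, so its rate may still be slewed by NTP.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <time.h>

static int64_t timespec_to_ns(const struct timespec *ts)
{
	return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/* Hypothetical helper: sleep until interval_ns after "now" on
 * CLOCK_MONOTONIC and return the elapsed time measured on the same
 * clock. An absolute deadline (TIMER_ABSTIME) lets an interrupted
 * sleep be restarted without accumulating drift. */
static int64_t sleep_interval_monotonic(int64_t interval_ns)
{
	struct timespec start, deadline, end;

	clock_gettime(CLOCK_MONOTONIC, &start);
	int64_t target = timespec_to_ns(&start) + interval_ns;
	deadline.tv_sec = (time_t)(target / 1000000000LL);
	deadline.tv_nsec = (long)(target % 1000000000LL);

	/* clock_nanosleep() returns the error number directly. */
	while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &deadline,
	                       NULL) == EINTR)
		;

	clock_gettime(CLOCK_MONOTONIC, &end);
	return timespec_to_ns(&end) - timespec_to_ns(&start);
}
```

Paul's case would be `sleep_interval_monotonic(1500000000LL)`. Using CLOCK_REALTIME instead would expose the same measurement to wall-clock steps from settimeofday().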
* Re: [ANNOUNCE] ktimers subsystem
2005-09-23 15:21 ` Paul E. McKenney
@ 2005-09-24 3:38 ` Roman Zippel
0 siblings, 0 replies; 50+ messages in thread
From: Roman Zippel @ 2005-09-24 3:38 UTC
To: Paul E. McKenney
Cc: Thomas Gleixner, linux-kernel, mingo, akpm, george, johnstul

Hi,

On Fri, 23 Sep 2005, Paul E. McKenney wrote:

> I believe Thomas is concerned about workloads that need a short-term
> stable timebase. For example, a process-control application might need
> to accurately measure a (say) 1500-millisecond time interval. Both
> user-programmability and NTP adjustments to a given timebase could
> destroy the needed measurement accuracy.

NTP adjustments are quite small and not applied all at once, which
means that as soon as the time is synchronized, we could switch
CLOCK_MONOTONIC to it.

bye, Roman
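To put Roman's "quite small" in numbers: if NTP slews the clock rate by at most a fraction r, the error it can introduce into a measured interval T is at most r times T. Assuming, for illustration only, a slew bound of r = 500 ppm (an assumed figure, not one stated in this thread) and Paul's 1500 ms interval:

$$
\Delta_{\max} = r \cdot T = 500 \times 10^{-6} \times 1.5\,\mathrm{s} = 7.5 \times 10^{-4}\,\mathrm{s} = 0.75\,\mathrm{ms}
$$

Whether a bound of this size is acceptable depends on the accuracy the application needs; a step of the wall clock via settimeofday(), by contrast, has no such bound.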
* Re: [ANNOUNCE] ktimers subsystem
@ 2005-09-25 15:48 Sid Boyce
2005-09-25 18:20 ` Zwane Mwaikambo
0 siblings, 1 reply; 50+ messages in thread
From: Sid Boyce @ 2005-09-25 15:48 UTC
To: linux-kernel

OT, but something that's been bugging me for quite a while.
I cut and paste the patch from the email to a file ktimers.patch.
"# patch -l -p1 <ktimer.patch" and it returns ---
 (Patch is indented 1 space.)
patching file fs/exec.c
patch: **** malformed patch at line 16: }

If I prepend 2 tabs to the line, it complains about line 17; I do the
same to line 17 and it moves on to the next. From the manpage it reads
like the "-l" should take care of the tabs so it only compares the text.
Can anyone suggest how to apply the patch? Googling didn't help.
Regards
Sid.
-- 
Sid Boyce ... Hamradio License G3VBV, licensed Private Pilot
Retired IBM/Amdahl Mainframes and Sun/Fujitsu Servers Tech Support Specialist
Microsoft Windows Free Zone - Linux used for all Computing Tasks
* Re: [ANNOUNCE] ktimers subsystem
2005-09-25 15:48 Sid Boyce
@ 2005-09-25 18:20 ` Zwane Mwaikambo
2005-09-26 0:02 ` Sid Boyce
0 siblings, 1 reply; 50+ messages in thread
From: Zwane Mwaikambo @ 2005-09-25 18:20 UTC
To: Sid Boyce; +Cc: linux-kernel

On Sun, 25 Sep 2005, Sid Boyce wrote:

> OT, but something that's been bugging me for quite a while.
> I cut and paste the patch from the email to a file ktimers.patch.
> "# patch -l -p1 <ktimer.patch" and it returns ---
>  (Patch is indented 1 space.)
> patching file fs/exec.c
> patch: **** malformed patch at line 16: }
>
> If I prepend 2 tabs to the line, it complains about line 17, I do the same to
> line 17 and on it moves to the next. from the manpage it reads like the "-l"
> should take care of the tabs so it only compares the text.
> Can anyone suggest how to apply the patch? Googling didn't help.

Save the entire email as a text file and apply it. Cut and paste
usually introduces white space damage.
* Re: [ANNOUNCE] ktimers subsystem
2005-09-25 18:20 ` Zwane Mwaikambo
@ 2005-09-26 0:02 ` Sid Boyce
0 siblings, 0 replies; 50+ messages in thread
From: Sid Boyce @ 2005-09-26 0:02 UTC
To: linux-kernel

Zwane Mwaikambo wrote:
> On Sun, 25 Sep 2005, Sid Boyce wrote:
>
>
>>OT, but something that's been bugging me for quite a while.
>>I cut and paste the patch from the email to a file ktimers.patch.
>>"# patch -l -p1 <ktimer.patch" and it returns ---
>> (Patch is indented 1 space.)
>>patching file fs/exec.c
>>patch: **** malformed patch at line 16: }
>>
>>If I prepend 2 tabs to the line, it complains about line 17, I do the same to
>>line 17 and on it moves to the next. from the manpage it reads like the "-l"
>>should take care of the tabs so it only compares the text.
>>Can anyone suggest how to apply the patch? Googling didn't help.
>
>
> Save the entire email as a text file and apply it. Cut and paste usually
> introduces white space damage.
>
>
>
Thanks.
regards
Sid.
-- 
Sid Boyce ... Hamradio License G3VBV, licensed Private Pilot
Retired IBM/Amdahl Mainframes and Sun/Fujitsu Servers Tech Support Specialist
Microsoft Windows Free Zone - Linux used for all Computing Tasks
Thread overview: 50+ messages
2005-09-19 16:48 [ANNOUNCE] ktimers subsystem tglx
2005-09-19 16:48 ` [PATCH] " tglx
2005-09-19 21:47 ` [ANNOUNCE] " Thomas Gleixner
2005-09-19 22:03 ` Christoph Lameter
2005-09-19 22:17 ` Thomas Gleixner
2005-09-19 22:24 ` Christoph Lameter
2005-09-19 22:44 ` Thomas Gleixner
2005-09-19 22:50 ` john stultz
2005-09-19 22:58 ` Thomas Gleixner
2005-09-19 23:04 ` Christoph Lameter
2005-09-19 23:12 ` Thomas Gleixner
2005-09-20 7:14 ` Ingo Molnar
2005-09-20 7:10 ` Ingo Molnar
2005-09-21 19:24 ` Pavel Machek
2005-09-19 22:39 ` Christopher Friesen
2005-09-19 22:54 ` Thomas Gleixner
2005-09-20 4:57 ` Christopher Friesen
2005-09-20 5:11 ` Thomas Gleixner
2005-09-20 0:43 ` George Anzinger
2005-09-21 19:50 ` Roman Zippel
2005-09-21 22:41 ` Thomas Gleixner
2005-09-22 12:59 ` Ingo Molnar
2005-09-22 23:09 ` Roman Zippel
2005-09-22 23:31 ` Christopher Friesen
2005-09-23 0:25 ` Roman Zippel
2005-09-23 6:49 ` Thomas Gleixner
2005-09-24 3:15 ` Roman Zippel
2005-09-24 5:16 ` Ingo Molnar
2005-09-24 10:35 ` Roman Zippel
2005-09-24 13:56 ` Thomas Gleixner
2005-09-24 16:51 ` Daniel Walker
2005-09-24 23:45 ` Roman Zippel
2005-09-25 21:00 ` Thomas Gleixner
2005-09-27 16:54 ` Roman Zippel
2005-09-27 19:03 ` Tim Bird
2005-09-28 16:36 ` Roman Zippel
2005-09-25 21:02 ` Thomas Gleixner
2005-09-27 16:48 ` Roman Zippel
2005-09-27 18:38 ` Tim Bird
2005-09-27 20:36 ` George Anzinger
2005-09-23 2:25 ` john stultz
2005-09-23 8:27 ` Thomas Gleixner
2005-09-24 2:43 ` Roman Zippel
2005-09-24 5:03 ` Ingo Molnar
2005-09-24 9:04 ` James Bruce
2005-09-23 15:21 ` Paul E. McKenney
2005-09-24 3:38 ` Roman Zippel
2005-09-25 15:48 Sid Boyce
2005-09-25 18:20 ` Zwane Mwaikambo
2005-09-26 0:02 ` Sid Boyce