linux-kernel.vger.kernel.org archive mirror
* Re: No 100 HZ timer!
@ 2001-04-12 13:14 Bret Indrelee
  0 siblings, 0 replies; 118+ messages in thread
From: Bret Indrelee @ 2001-04-12 13:14 UTC (permalink / raw)
  To: Linux Kernel Mailing List

On Thu, 12 Apr 2001, Mark Salisbury wrote:
> On Wed, 11 Apr 2001, Bret Indrelee wrote:
> > Current generation PCs can easily handle 1000s of
> > interrupts a second if you keep the overhead small.
> 
> the PC centric implementation of the ticked system is one of its major flaws.
> 
> there are architectures on which the cost of a fixed interval is the same as
> a variable interval, i.e. no reload register, so you must explicitly load a
> value each interrupt anyway.  and if you want accurate intervals, you must
> perform a calculation at each reload, not just load a static value, because
> you don't know how many cycles elapsed from the time the interrupt happened
> till you got to the reload line, because the int may have been masked or
> lower priority than another interrupt.

People were saying the cost of adjusting the PIT was high.

On those archs where this is true, you would want to avoid changing the
interval timer.

On a more reasonable architecture, you would change it each time the head
of the timer list changed.

> also, why handle 1000's of interrupts if you only need to handle 10?

There is no reason.

I was pointing out that current hardware can easily handle this sort of
load, not advocating that distributions change HZ to 1000.


On Linux 2.2, if you change HZ to 400 your system is going to run at
about 70% of speed.  Oddly, the slowdown seems limited to this value;
bumping HZ any higher doesn't cause a further change.

If you reprogram the RTC to generate a 1000 HZ interrupt stream, as I
recall there is about a 3% change in performance.  This is with a
constant-rate interrupt; making it variable-rate would reduce this.

You can verify this yourself.  Get whatever benchmark you prefer and run
it on a system.  Rebuild after changing HZ to 400 and run it again.
Restore HZ, use the RTC driver to produce a 1024 HZ interval, and run
your benchmark again.  Changing HZ had a huge performance impact in
2.2.13; I'm pretty sure that didn't change much in later 2.2 releases.
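
The RTC test described here uses the kernel's standard /dev/rtc driver
interface (RTC_IRQP_SET / RTC_PIE_ON from <linux/rtc.h>).  A minimal
sketch of the setup, not code from this thread; the helper encodes the
driver's power-of-two rate rule, which is why the rate is 1024 Hz rather
than 1000 Hz:

```c
#include <fcntl.h>
#include <linux/rtc.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* The RTC driver only accepts periodic rates that are powers of two
 * between 2 Hz and 8192 Hz -- hence 1024 Hz rather than 1000 Hz. */
int valid_rtc_rate(unsigned long hz)
{
    return hz >= 2 && hz <= 8192 && (hz & (hz - 1)) == 0;
}

/* Start a periodic interrupt stream; returns the fd, or -1 on error.
 * Each subsequent read() on the fd blocks until an RTC interrupt. */
int rtc_start_periodic(unsigned long hz)
{
    int fd;

    if (!valid_rtc_rate(hz))
        return -1;
    fd = open("/dev/rtc", O_RDONLY);
    if (fd < 0)
        return -1;
    if (ioctl(fd, RTC_IRQP_SET, hz) < 0 ||   /* set periodic rate */
        ioctl(fd, RTC_PIE_ON, 0) < 0) {      /* enable periodic irq */
        close(fd);
        return -1;
    }
    return fd;
}
```

Running it for real needs root (or a raised /proc/sys/dev/rtc/max-user-freq).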

Seems to me like there is a problem here.  I'm pretty sure it is a
combination of the overhead of the cascaded timers and the very high
overhead of the scheduler.  Someday this should be fixed.


What I would like to see is support for a higher-resolution timer in
Linux, for those of us who need times down to about 1 ms.  Systems that
can quickly reprogram the interval timer would do so each time a new
value reached the head of the list; others would have to do it more
intelligently.
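
A sketch of that reprogram-on-head-change policy, under assumed names
(struct soft_timer, hw_set_next -- none of this is kernel API): keep the
pending list sorted, and touch the hardware only when the inserted timer
becomes the earliest one.  hw_set_next() is a recording stub standing in
for the real interval-timer load.

```c
#include <stddef.h>

/* Illustrative only: a sorted pending-timer list whose hardware
 * timer is reloaded only when the head of the list changes. */
struct soft_timer {
    unsigned long expires;              /* absolute expiry time */
    struct soft_timer *next;
};

static struct soft_timer *timer_head;   /* kept sorted by expiry */
static unsigned long hw_next;           /* last value "programmed" */

static void hw_set_next(unsigned long expires)
{
    hw_next = expires;                  /* real code would hit the PIT */
}

/* Insert in sorted order; reprogram the hardware only when the
 * new timer becomes the earliest pending one. */
void add_soft_timer(struct soft_timer *t)
{
    struct soft_timer **p = &timer_head;

    while (*p && (*p)->expires <= t->expires)
        p = &(*p)->next;
    t->next = *p;
    *p = t;

    if (timer_head == t)
        hw_set_next(t->expires);
}
```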

Regardless of how it is done, forcing the system to run a constant
1000 HZ interrupt should not impact system performance by more than a
few percent.  If it does, something was done wrong.  Where there is need
for the higher-resolution timer, people will accept the overhead.  On
most systems it would not be used, so there would be no reason to run
the higher-rate timer.


-Bret

------------------------------------------------------------------------------
Bret Indrelee |  Sometimes, to be deep, we must act shallow!
bret@io.com   |  -Riff in The Quatrix



^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-08-14 15:59                                   ` Jamie Lokier
@ 2001-08-14 16:57                                     ` george anzinger
  0 siblings, 0 replies; 118+ messages in thread
From: george anzinger @ 2001-08-14 16:57 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Pavel Machek, high-res-timers-discourse, Alan Cox,
	Mikulas Patocka, David Schleef, Mark Salisbury, Jeff Dike,
	schwidefsky, linux-kernel, Andrew Morton

Jamie Lokier wrote:
> 
> Pavel Machek wrote:
> > > The testing I have done seems to indicate a lower overhead on a lightly
> > > loaded system, about the same overhead with some load, and much more
> > > overhead with a heavy load.  To me this seems like the wrong thing to
> > > do.  We would like as nearly a flat overhead to load curve as we can get
> >
> > Why?  Higher overhead is the price for better-precision timers.  If you
> > rounded all times in "tickless" mode, you'd get about the same overhead,
> > right?
> 
> What Pavel says is true only if the data structure holding pending
> timers still offers O(1) insertion and deletion in the tickless case.
> 
> I recall George Anzinger saying something about the current ticked
> scheduler being able to insert timers that never reach expiry very
> efficiently (simply a list insert + delete), whereas the current
> tickless code could not do that.
> 
> Is that right?  Is it the data structure that is taking too long to
> process, when many timers that do not reach expiry are inserted?  There
> would be one such insertion and deletion on every context switch, right?

No.  It is not the data structure.  I did not change it for the
experimental code.  This, of course, could lead to some additional
overhead in looking up the next timer to figure out when the next
interrupt is needed.  I used the code from the IBM tickless patch, which
did a brute-force search.  In actual fact this is not too bad, as I also
force a timer every second, so at most ten lists need to be examined.
The instrumentation on this lookup hardly shows up, and of course it is
only needed after an interrupt.

The overhead comes from setting up a timer for the "slice" on every
schedule.  Schedule(), in a busy system, gets called often, and "slices"
do not often expire.  The other issue is the additional overhead of the
jiffie (which is a function).  On review of this code, I see that it
needs to protect itself from interrupts, so as written in the
experimental code it is wrong and could lose track of time (however, it
is faster as written).  Neither of these operations takes very long; it
is just that they are called on every schedule, which happens often
enough to make a difference.  Those who optimized the scheduler with
gotos and such knew what they were doing!  It is a function called often
enough to make the optimization worthwhile.

For those who would like to try the experimental system it can be found
at:
http://sourceforge.net/projects/high-res-timers
where it is called the "no HZ tick test".  Do read the release notes to
understand what you are getting.

George


* Re: No 100 HZ timer !
  2001-08-11 11:57                                 ` Pavel Machek
@ 2001-08-14 15:59                                   ` Jamie Lokier
  2001-08-14 16:57                                     ` george anzinger
  0 siblings, 1 reply; 118+ messages in thread
From: Jamie Lokier @ 2001-08-14 15:59 UTC (permalink / raw)
  To: Pavel Machek
  Cc: george anzinger, high-res-timers-discourse, Alan Cox,
	Mikulas Patocka, David Schleef, Mark Salisbury, Jeff Dike,
	schwidefsky, linux-kernel, Andrew Morton

Pavel Machek wrote:
> > The testing I have done seems to indicate a lower overhead on a lightly
> > loaded system, about the same overhead with some load, and much more
> > overhead with a heavy load.  To me this seems like the wrong thing to
> > do.  We would like as nearly a flat overhead to load curve as we can get
> 
> Why?  Higher overhead is the price for better-precision timers.  If you
> rounded all times in "tickless" mode, you'd get about the same overhead,
> right?

What Pavel says is true only if the data structure holding pending
timers still offers O(1) insertion and deletion in the tickless case.

I recall George Anzinger saying something about the current ticked
scheduler being able to insert timers that never reach expiry very
efficiently (simply a list insert + delete), whereas the current
tickless code could not do that.

Is that right?  Is it the data structure that is taking too long to
process, when many timers that do not reach expiry are inserted?  There
would be one such insertion and deletion on every context switch, right?
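
For reference, the O(1) property in question: with a doubly-linked timer
list, a timer that is cancelled before it expires costs one splice-in and
one unlink, regardless of how many timers are pending.  A minimal sketch
of the principle (not the kernel's actual timer-wheel code):

```c
#include <stddef.h>

/* Minimal circular doubly-linked node; the kernel's timer list
 * relies on the same property for this cost argument. */
struct tnode {
    struct tnode *prev, *next;
};

/* O(1): splice in right after the head, no traversal. */
void tlist_insert(struct tnode *head, struct tnode *t)
{
    t->next = head->next;
    t->prev = head;
    head->next->prev = t;
    head->next = t;
}

/* O(1): unlink without knowing the list head or its length. */
void tlist_remove(struct tnode *t)
{
    t->prev->next = t->next;
    t->next->prev = t->prev;
    t->prev = t->next = NULL;
}
```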

-- Jamie


* Re: No 100 HZ timer !
  2001-08-01  1:08                               ` george anzinger
@ 2001-08-11 11:57                                 ` Pavel Machek
  2001-08-14 15:59                                   ` Jamie Lokier
  0 siblings, 1 reply; 118+ messages in thread
From: Pavel Machek @ 2001-08-11 11:57 UTC (permalink / raw)
  To: george anzinger
  Cc: Jamie Lokier, high-res-timers-discourse, Alan Cox,
	Mikulas Patocka, David Schleef, Mark Salisbury, Jeff Dike,
	schwidefsky, linux-kernel, Andrew Morton

Hi!

> The testing I have done seems to indicate a lower overhead on a lightly
> loaded system, about the same overhead with some load, and much more
> overhead with a heavy load.  To me this seems like the wrong thing to
> do.  We would like as nearly a flat overhead to load curve as we can get

Why?  Higher overhead is the price for better-precision timers.  If you
rounded all times in "tickless" mode, you'd get about the same overhead,
right?
								Pavel
-- 
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.



* Re: No 100 HZ timer !
  2001-08-02 21:18                   ` george anzinger
@ 2001-08-02 22:09                     ` Oliver Xymoron
  0 siblings, 0 replies; 118+ messages in thread
From: Oliver Xymoron @ 2001-08-02 22:09 UTC (permalink / raw)
  To: george anzinger; +Cc: Rik van Riel, Chris Friesen, linux-kernel

On Thu, 2 Aug 2001, george anzinger wrote:

> I guess I am confused.  How is it that raising HZ improves throughput?
> And was that before or after the change in the time-slice routines that
> now scale with HZ but were previously fixed?  (That happened somewhere
> around 2.2.14 or 2.2.16 or so.)

My guess is that processes that are woken up for whatever reason get to
run sooner, reducing latency, and thereby increasing throughput when not
compute-bound. Presumably this was with shorter time slices.

--
 "Love the dolphins," she advised him. "Write by W.A.S.T.E.."



* Re: No 100 HZ timer !
  2001-08-02 18:41                 ` Oliver Xymoron
@ 2001-08-02 21:18                   ` george anzinger
  2001-08-02 22:09                     ` Oliver Xymoron
  0 siblings, 1 reply; 118+ messages in thread
From: george anzinger @ 2001-08-02 21:18 UTC (permalink / raw)
  To: Oliver Xymoron; +Cc: Rik van Riel, Chris Friesen, linux-kernel

Oliver Xymoron wrote:
> 
> On Thu, 2 Aug 2001, george anzinger wrote:
> 
> > Oliver Xymoron wrote:
> > >
> > > Does the higher timer granularity cause overall throughput to improve, by
> > > any chance?
> > >
> > Good question.  I have not run any tests for this.  You might want to
> > do so.  To do these tests you would want to build the system with the
> > tickless timers only and with the instrumentation turned off.  I would
> > like to hear the results.
> >
> > In the meantime, here is a best guess.  First, due to hardware
> > limitations, the longest time you can program the timer for is ~50ms.
> > This means you are reducing the load by a factor of 5.  Now the load
> > (i.e. timer overhead) is ~0.12%, so it would go to ~0.025%.  This means
> > that you should have about 0.1% more available for throughput.  Even if
> > we take 10 times this to cover the cache disruptions that no longer
> > occur, I would guess a throughput improvement of no more than 1%.
> > Still, measurements are better than guesses...
> 
> That's not what I'm getting at at all. Simply raising HZ is known to
> improve throughput on many workloads, even with more reschedules: the
> system is able to more finely adapt to changes in available disk and
> memory bandwidth.
> 
> BTW, there are some arguments that tickless is worth doing even on old
> PIC-only systems:
> 
> http://groups.google.com/groups?q=oliver+xymoron+timer&hl=en&group=mlist.linux.kernel&safe=off&rnum=2&selm=linux.kernel.Pine.LNX.4.30.0104111337170.32245-100000%40waste.org
> 
> And I found this while I was looking too:
> 
> http://groups.google.com/groups?q=oliver+xymoron+timer&hl=en&group=mlist.linux.kernel&safe=off&rnum=3&selm=linux.kernel.Pine.LNX.4.10.10010241534110.2957-100000%40waste.org
> 
> ..but no one thought it was interesting at the time.
> 
I guess I am confused.  How is it that raising HZ improves throughput?
And was that before or after the change in the time-slice routines that
now scale with HZ but were previously fixed?  (That happened somewhere
around 2.2.14 or 2.2.16 or so.)

I am writing a POSIX high-resolution-timers package and hope to have
timers with resolution down to at least 1 microsecond; however, this is
independent of ticked or tickless.

The PIT may be slow to program, but it is not that slow.  My timing
shows it to be less than 0.62 microseconds, and this includes the
microsecond-to-PIT-count conversion.  At the same time (on an 800 MHz
PIII) I see interrupt overhead (time to execute an int xx to a
do-nothing interrupt handler) of ~6.5 microseconds.  This turns out, in
the tickless case with a PIT reprogramming, to be more than half of the
total average timer interrupt time.  (I.e. the timer interrupt handler +
the timer list processing took about 6.1 microseconds.  To this we add
the interrupt overhead to get 12.6 microseconds total interrupt time.)
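
The microsecond-to-PIT-count conversion being timed here is fixed
arithmetic: the i8254 PIT counts down at 1193182 Hz.  A plain sketch of
the conversion (illustrative, not George's code; kernel code typically
avoids the 64-bit divide with precomputed scaling):

```c
#define PIT_TICK_RATE 1193182UL     /* i8254 input clock, Hz */

/* Convert a delay in microseconds to a PIT down-count, rounding to
 * the nearest tick.  The PIT latch is only 16 bits wide, which caps
 * a single programmed interval at roughly 55 ms -- the hardware
 * limitation behind the ~50 ms maximum mentioned in this thread. */
static unsigned int usec_to_pit_count(unsigned long usec)
{
    unsigned long long ticks =
        ((unsigned long long)usec * PIT_TICK_RATE + 500000ULL) / 1000000ULL;

    return ticks > 0xFFFF ? 0xFFFF : (unsigned int)ticks;
}
```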

And yes, it is possible to do tickless without the TSC.  The KURT
package has the code to do it, along with the note that they don't think
many such machines will ever use their code...  My experiment/patch does
not do this, as it is just an experiment to see if tickless is worth
doing at all.  What I need to see to be convinced is a realistic load
that shows a measurable improvement with the tickless system.  The
system is there (http://sourceforge.net/projects/high-res-timers)
waiting for the load.

George


* Re: No 100 HZ timer !
  2001-08-02 17:46               ` george anzinger
@ 2001-08-02 18:41                 ` Oliver Xymoron
  2001-08-02 21:18                   ` george anzinger
  0 siblings, 1 reply; 118+ messages in thread
From: Oliver Xymoron @ 2001-08-02 18:41 UTC (permalink / raw)
  To: george anzinger; +Cc: Rik van Riel, Chris Friesen, linux-kernel

On Thu, 2 Aug 2001, george anzinger wrote:

> Oliver Xymoron wrote:
> >
> > Does the higher timer granularity cause overall throughput to improve, by
> > any chance?
> >
> Good question.  I have not run any tests for this.  You might want to do
> so.  To do these tests you would want to build the system with the
> tickless timers only and with the instrumentation turned off.  I would
> like to hear the results.
>
> In the meantime, here is a best guess.  First, due to hardware
> limitations, the longest time you can program the timer for is ~50ms.
> This means you are reducing the load by a factor of 5.  Now the load
> (i.e. timer overhead) is ~0.12%, so it would go to ~0.025%.  This means
> that you should have about 0.1% more available for throughput.  Even if
> we take 10 times this to cover the cache disruptions that no longer
> occur, I would guess a throughput improvement of no more than 1%.
> Still, measurements are better than guesses...

That's not what I'm getting at at all. Simply raising HZ is known to
improve throughput on many workloads, even with more reschedules: the
system is able to more finely adapt to changes in available disk and
memory bandwidth.

BTW, there are some arguments that tickless is worth doing even on old
PIC-only systems:

http://groups.google.com/groups?q=oliver+xymoron+timer&hl=en&group=mlist.linux.kernel&safe=off&rnum=2&selm=linux.kernel.Pine.LNX.4.30.0104111337170.32245-100000%40waste.org

And I found this while I was looking too:

http://groups.google.com/groups?q=oliver+xymoron+timer&hl=en&group=mlist.linux.kernel&safe=off&rnum=3&selm=linux.kernel.Pine.LNX.4.10.10010241534110.2957-100000%40waste.org

..but no one thought it was interesting at the time.

--
 "Love the dolphins," she advised him. "Write by W.A.S.T.E.."



* Re: No 100 HZ timer !
  2001-08-02 17:05             ` Oliver Xymoron
@ 2001-08-02 17:46               ` george anzinger
  2001-08-02 18:41                 ` Oliver Xymoron
  0 siblings, 1 reply; 118+ messages in thread
From: george anzinger @ 2001-08-02 17:46 UTC (permalink / raw)
  To: Oliver Xymoron; +Cc: Rik van Riel, Chris Friesen, linux-kernel

Oliver Xymoron wrote:
> 
> On Thu, 2 Aug 2001, george anzinger wrote:
> 
> > Ok, but then what?  The head timer expires.  Now what?  Since we are not
> > clocking the slice, we don't know when it started.  Seems to me we are
> > just shifting the overhead to a different place and adding additional
> > tests and code to do it.  The add_timer() code is fast.  The timing
> > tests (800 MHz PIII) show the whole setup taking an average of about 1.16
> > microseconds.  The problem is that this happens, under kernel compile,
> > ~300 times per second, so the numbers add up.
> 
> As you said, most of those 'time to reschedule' timers never expire - we
> hit a rescheduling point first, yes? In the old system, we essentially had
> one 'time to reschedule' timer pending at any given time, I'm just trying
> to approximate that.
> 
> > Note that the ticked
> > system timer overhead (interrupts, timekeeping, timers, the works) is
> > about 0.12% of the available CPU.  Under heavy load this rises to about
> > 0.24% according to my measurements.  The tickless system overhead under
> > the same kernel compile load is about 0.12%.  No load is about 0.012%,
> > but heavy load can take it to 12% or more, most of this coming from the
> > accounting overhead in schedule().  Is it worth it?
> 
> Does the higher timer granularity cause overall throughput to improve, by
> any chance?
> 
Good question.  I have not run any tests for this.  You might want to do
so.  To do these tests you would want to build the system with the
tickless timers only and with the instrumentation turned off.  I would
like to hear the results.

In the meantime, here is a best guess.  First, due to hardware
limitations, the longest time you can program the timer for is ~50ms.
This means you are reducing the load by a factor of 5.  Now the load
(i.e. timer overhead) is ~0.12%, so it would go to ~0.025%.  This means
that you should have about 0.1% more available for throughput.  Even if
we take 10 times this to cover the cache disruptions that no longer
occur, I would guess a throughput improvement of no more than 1%.
Still, measurements are better than guesses...
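
The best-guess arithmetic can be restated mechanically (the 0.12%
overhead, the factor-of-5 interval stretch, and the 10x cache fudge are
the figures quoted above; the function is just a restatement, not
anything from the patch):

```c
/* Stretching the timer interval by 'factor' shrinks the tick
 * overhead proportionally; the saving, scaled by a fudge factor
 * for cache effects, bounds the expected throughput gain. */
static double est_gain_pct(double tick_overhead_pct, double factor,
                           double cache_fudge)
{
    double tickless_pct = tick_overhead_pct / factor;

    return (tick_overhead_pct - tickless_pct) * cache_fudge;
}
```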

George


* Re: No 100 HZ timer !
  2001-08-02 16:36           ` george anzinger
  2001-08-02 17:05             ` Oliver Xymoron
@ 2001-08-02 17:26             ` John Alvord
  1 sibling, 0 replies; 118+ messages in thread
From: John Alvord @ 2001-08-02 17:26 UTC (permalink / raw)
  To: linux-kernel

On Thu, 02 Aug 2001 09:36:07 -0700, george anzinger
<george@mvista.com> wrote:
 ...
>  The timing
>tests (800 MHz PIII) show the whole setup taking an average of about 1.16
>microseconds.  The problem is that this happens, under kernel compile,
>~300 times per second, so the numbers add up.  Note that the ticked
>system timer overhead (interrupts, timekeeping, timers, the works) is
>about 0.12% of the available CPU.  Under heavy load this rises to about
>0.24% according to my measurements.  The tickless system overhead under
>the same kernel compile load is about 0.12%.  No load is about 0.012%,
>but heavy load can take it to 12% or more, most of this coming from the
>accounting overhead in schedule().  Is it worth it?

I thought the claimed advantage was on certain S/390 configurations
(running 100s of Linux/390 images under VM) where the cost is
multiplied by the number of images. A measurement on another platform
may be interesting but not conclusive.

john alvord


* Re: No 100 HZ timer !
  2001-08-02 16:36           ` george anzinger
@ 2001-08-02 17:05             ` Oliver Xymoron
  2001-08-02 17:46               ` george anzinger
  2001-08-02 17:26             ` John Alvord
  1 sibling, 1 reply; 118+ messages in thread
From: Oliver Xymoron @ 2001-08-02 17:05 UTC (permalink / raw)
  To: george anzinger; +Cc: Rik van Riel, Chris Friesen, linux-kernel

On Thu, 2 Aug 2001, george anzinger wrote:

> Ok, but then what?  The head timer expires.  Now what?  Since we are not
> clocking the slice, we don't know when it started.  Seems to me we are
> just shifting the overhead to a different place and adding additional
> tests and code to do it.  The add_timer() code is fast.  The timing
> tests (800 MHz PIII) show the whole setup taking an average of about 1.16
> microseconds.  The problem is that this happens, under kernel compile,
> ~300 times per second, so the numbers add up.

As you said, most of those 'time to reschedule' timers never expire - we
hit a rescheduling point first, yes? In the old system, we essentially had
one 'time to reschedule' timer pending at any given time, I'm just trying
to approximate that.

> Note that the ticked
> system timer overhead (interrupts, timekeeping, timers, the works) is
> about 0.12% of the available CPU.  Under heavy load this rises to about
> 0.24% according to my measurements.  The tickless system overhead under
> the same kernel compile load is about 0.12%.  No load is about 0.012%,
> but heavy load can take it to 12% or more, most of this coming from the
> accounting overhead in schedule().  Is it worth it?

Does the higher timer granularity cause overall throughput to improve, by
any chance?

--
 "Love the dolphins," she advised him. "Write by W.A.S.T.E.."



* Re: No 100 HZ timer !
  2001-08-02 14:39         ` Oliver Xymoron
@ 2001-08-02 16:36           ` george anzinger
  2001-08-02 17:05             ` Oliver Xymoron
  2001-08-02 17:26             ` John Alvord
  0 siblings, 2 replies; 118+ messages in thread
From: george anzinger @ 2001-08-02 16:36 UTC (permalink / raw)
  To: Oliver Xymoron; +Cc: Rik van Riel, Chris Friesen, linux-kernel

Oliver Xymoron wrote:
> 
> On Wed, 1 Aug 2001, george anzinger wrote:
> 
> > > Never set the next hit of the timer to (now + MIN_INTERVAL).
> >
> > The overhead under load is _not_ the timer interrupt, it is the context
> > switch that needs to set up a "slice" timer, most of which never
> > expire.  During a kernel compile on an 800 MHz PIII I am seeing ~300
> > context switches per second (i.e. about every 3 ms).  Clearly the
> > switching is being caused by task blocking.  With the ticked system the
> > "slice" timer overhead is constant.
> 
> Can you instead just not set up a reschedule timer if the timer at the
> head of the list is less than MIN_INTERVAL?
> 
> if (slice_timer_needed)
> {
>         /* only arm a slice timer if no pending timer fires sooner */
>         if (time_until(next_timer) > TASK_SLICE)
>         {
>                 next_timer = jiffies() + TASK_SLICE;
>                 add_timer(TASK_SLICE);
>         }
>         slice_timer_needed = 0;
> }
> 
Ok, but then what?  The head timer expires.  Now what?  Since we are not
clocking the slice, we don't know when it started.  Seems to me we are
just shifting the overhead to a different place and adding additional
tests and code to do it.  The add_timer() code is fast.  The timing
tests (800 MHz PIII) show the whole setup taking an average of about 1.16
microseconds.  The problem is that this happens, under kernel compile,
~300 times per second, so the numbers add up.  Note that the ticked
system timer overhead (interrupts, timekeeping, timers, the works) is
about 0.12% of the available CPU.  Under heavy load this rises to about
0.24% according to my measurements.  The tickless system overhead under
the same kernel compile load is about 0.12%.  No load is about 0.012%,
but heavy load can take it to 12% or more, most of this coming from the
accounting overhead in schedule().  Is it worth it?

George


* Re: No 100 HZ timer !
  2001-08-02  6:03       ` george anzinger
@ 2001-08-02 14:39         ` Oliver Xymoron
  2001-08-02 16:36           ` george anzinger
  0 siblings, 1 reply; 118+ messages in thread
From: Oliver Xymoron @ 2001-08-02 14:39 UTC (permalink / raw)
  To: george anzinger; +Cc: Rik van Riel, Chris Friesen, linux-kernel

On Wed, 1 Aug 2001, george anzinger wrote:

> > Never set the next hit of the timer to (now + MIN_INTERVAL).
>
> The overhead under load is _not_ the timer interrupt, it is the context
> switch that needs to set up a "slice" timer, most of which never
> expire.  During a kernel compile on an 800 MHz PIII I am seeing ~300
> context switches per second (i.e. about every 3 ms).  Clearly the
> switching is being caused by task blocking.  With the ticked system the
> "slice" timer overhead is constant.

Can you instead just not set up a reschedule timer if the timer at the
head of the list is less than MIN_INTERVAL?

if (slice_timer_needed)
{
	/* only arm a slice timer if no pending timer fires sooner */
	if (time_until(next_timer) > TASK_SLICE)
	{
		next_timer = jiffies() + TASK_SLICE;
		add_timer(TASK_SLICE);
	}
	slice_timer_needed = 0;
}

--
 "Love the dolphins," she advised him. "Write by W.A.S.T.E.."



* Re: No 100 HZ timer !
  2001-08-02  4:28     ` Rik van Riel
@ 2001-08-02  6:03       ` george anzinger
  2001-08-02 14:39         ` Oliver Xymoron
  0 siblings, 1 reply; 118+ messages in thread
From: george anzinger @ 2001-08-02  6:03 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Chris Friesen, linux-kernel

Rik van Riel wrote:
> 
> On Wed, 1 Aug 2001, george anzinger wrote:
> > Chris Friesen wrote:
> > > george anzinger wrote:
> > >
> > > > The testing I have done seems to indicate a lower overhead on a lightly
> > > > loaded system, about the same overhead with some load, and much more
> > > > overhead with a heavy load.  To me this seems like the wrong thing to
> > >
> > > What about something that tries to get the best of both worlds?  How about a
> > > tickless system that has a max frequency for how often it will schedule?  This
> >
> > How would you do this?  Larger time slices?  But _most_ context
> > switches are not related to end of slice.  Refuse to switch?
> > This just idles the cpu.
> 
> Never set the next hit of the timer to (now + MIN_INTERVAL).
> 
> This way we'll get to run the current task until the timer
> hits or until the current task voluntarily gives up the CPU.

The overhead under load is _not_ the timer interrupt, it is the context
switch that needs to set up a "slice" timer, most of which never
expire.  During a kernel compile on an 800 MHz PIII I am seeing ~300
context switches per second (i.e. about every 3 ms).  Clearly the
switching is being caused by task blocking.  With the ticked system the
"slice" timer overhead is constant.
> 
> We can check for already-expired timers in schedule().

Delaying "alarm" timers is bad form.  
Especially for someone who is working on high-res-timers :)

George


* Re: No 100 HZ timer !
  2001-08-01 21:20   ` george anzinger
@ 2001-08-02  4:28     ` Rik van Riel
  2001-08-02  6:03       ` george anzinger
  0 siblings, 1 reply; 118+ messages in thread
From: Rik van Riel @ 2001-08-02  4:28 UTC (permalink / raw)
  To: george anzinger; +Cc: Chris Friesen, linux-kernel

On Wed, 1 Aug 2001, george anzinger wrote:
> Chris Friesen wrote:
> > george anzinger wrote:
> >
> > > The testing I have done seems to indicate a lower overhead on a lightly
> > > loaded system, about the same overhead with some load, and much more
> > > overhead with a heavy load.  To me this seems like the wrong thing to
> >
> > What about something that tries to get the best of both worlds?  How about a
> > tickless system that has a max frequency for how often it will schedule?  This
>
> How would you do this?  Larger time slices?  But _most_ context
> switches are not related to end of slice.  Refuse to switch?
> This just idles the cpu.

Never set the next hit of the timer to (now + MIN_INTERVAL).

This way we'll get to run the current task until the timer
hits or until the current task voluntarily gives up the CPU.

We can check for already-expired timers in schedule().

regards,

Rik
--
Executive summary of a recent Microsoft press release:
   "we are concerned about the GNU General Public License (GPL)"


		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com/



* Re: No 100 HZ timer !
  2001-08-01 19:34 ` Chris Friesen
  2001-08-01 19:49   ` Richard B. Johnson
@ 2001-08-01 21:20   ` george anzinger
  2001-08-02  4:28     ` Rik van Riel
  1 sibling, 1 reply; 118+ messages in thread
From: george anzinger @ 2001-08-01 21:20 UTC (permalink / raw)
  To: Chris Friesen; +Cc: linux-kernel

Chris Friesen wrote:
> 
> george anzinger wrote:
> 
> > The testing I have done seems to indicate a lower overhead on a lightly
> > loaded system, about the same overhead with some load, and much more
> > overhead with a heavy load.  To me this seems like the wrong thing to
> 
> What about something that tries to get the best of both worlds?  How about a
> tickless system that has a max frequency for how often it will schedule?  This

How would you do this?  Larger time slices?  But _most_ context switches
are not related to end of slice.  Refuse to switch?  This just idles
the CPU.

> would give the tickless advantage for big iron running many lightly loaded
> virtual instances, but have some kind of cap on the overhead under heavy load.
> 
> Does this sound feasable?
> 
I don't think so.  The problem is that the test to see whether the
system should use one or the other way of doing things would itself eat
into the overhead.

Note that we are talking about 0.12% overhead for a ticked system.  Is
it really worth it for what amounts to less than 0.05% (if that much)?

George


* Re: No 100 HZ timer !
  2001-08-01 19:49   ` Richard B. Johnson
  2001-08-01 20:08     ` Mark Salisbury
@ 2001-08-01 20:33     ` george anzinger
  1 sibling, 0 replies; 118+ messages in thread
From: george anzinger @ 2001-08-01 20:33 UTC (permalink / raw)
  To: root; +Cc: Chris Friesen, linux-kernel

"Richard B. Johnson" wrote:
> 
> > george anzinger wrote:
> >
> > > The testing I have done seems to indicate a lower overhead on a lightly
> > > loaded system, about the same overhead with some load, and much more
> > > overhead with a heavy load.  To me this seems like the wrong thing to
> >
> 
> Doesn't the "tick-less" system presume that somebody, somewhere, will
> be sleeping sometime during the 1/HZ interval so that the scheduler
> gets control?
> 
> If everybody's doing:
> 
>         for(;;)
>           number_crunch();
> 
> And no I/O is pending, how does the jiffy count get bumped?

Who cares if it gets bumped?  In the tickless system the jiffy counter
is a function.  Thus, if you need it, it will be current, indeed more
current than in the ticked system, because it is calculated on the spot
and does not rely on an interrupt to "bump" it.
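
A sketch of a jiffy counter computed on demand (names and the cycle
source are illustrative, not the experimental patch; the real code reads
the TSC and, as noted elsewhere in this thread, must also guard against
interrupts):

```c
/* Stand-in for rdtsc / a platform cycle counter, so the idea can
 * be exercised without hardware. */
static unsigned long long fake_cycles;
static unsigned long long read_cycle_counter(void)
{
    return fake_cycles;
}

#define CYCLES_PER_JIFFY 8000000ULL     /* e.g. 800 MHz CPU, HZ = 100 */
static unsigned long long boot_cycles;  /* counter value at boot */

unsigned long get_jiffies(void)
{
    /* Computed on the spot -- always current, no tick required. */
    return (unsigned long)((read_cycle_counter() - boot_cycles) /
                           CYCLES_PER_JIFFY);
}
```
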
> 
> I think the "tick-less" system relies upon a side-effect of
> interactive use that can't be relied upon for design criteria.
> 
Look at the code.  You will find it here:
http://sourceforge.net/projects/high-res-timers

George


* Re: No 100 HZ timer !
  2001-08-01 19:49   ` Richard B. Johnson
@ 2001-08-01 20:08     ` Mark Salisbury
  2001-08-01 20:33     ` george anzinger
  1 sibling, 0 replies; 118+ messages in thread
From: Mark Salisbury @ 2001-08-01 20:08 UTC (permalink / raw)
  To: root, Richard B. Johnson, Chris Friesen; +Cc: george anzinger, linux-kernel

you are misunderstanding what is meant by tickless.

tickless means that instead of having a clock interrupt every 10 ms whether
there are any timed events pending or not, a tickless system only generates an
interrupt when there is a pending timed event.

timed events are such things as:

	process quanta expiration, 
	POSIX timers
	alarm()
etc.

jiffies, as such, don't need to be updated, as a tickless system relies on a
decrementer and a (usually 64-bit) wall clock.  a jiffy is then just the
wall-clock count (or, more likely, the wall clock >> a few bits).

the tickless system is intended to take advantage of some of the more modern
features of newer CPUs (such as useful counters and decrementers).

the fixed-interval, ticked system is a reasonable lowest-common-denominator
solution.  but when hardware is capable of supporting a more flexible, modern
system, it should be an available option.
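the reload step after each event might look like this sketch (the list, the stubbed wall clock, and the decrementer register are all stand-ins, not real kernel names):

```c
#include <stdint.h>
#include <stddef.h>

struct timed_event {
        uint64_t expires;           /* absolute wall-clock count  */
        struct timed_event *next;   /* sorted list, soonest first */
};

static struct timed_event *event_list;   /* pending timed events        */
static uint64_t wall_clock;              /* stub for the 64-bit counter */
static uint64_t decrementer;             /* stub for the decrementer    */

/* Tickless reprogramming: after servicing expired events, arm the
 * decrementer for the next pending one -- or not at all if none. */
static void arm_next_event(void)
{
        uint64_t now = wall_clock;

        if (event_list == NULL) {
                decrementer = 0;        /* no interrupt needed */
                return;
        }
        /* Recompute from "now": the interrupt may have been delayed
         * (masked, or lower priority than another interrupt). */
        decrementer = event_list->expires > now
                    ? event_list->expires - now
                    : 1;                /* already late: fire ASAP */
}
```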



On Wed, 01 Aug 2001, Richard B. Johnson wrote:
> > george anzinger wrote:
> > 
> > > The testing I have done seems to indicate a lower overhead on a lightly
> > > loaded system, about the same overhead with some load, and much more
> > > overhead with a heavy load.  To me this seems like the wrong thing to
> > 
> 
> Doesn't the "tick-less" system presume that somebody, somewhere, will
> be sleeping sometime during the 1/HZ interval so that the scheduler
> gets control?
> 
> If everybody's doing:
> 
> 	for(;;)
>           number_crunch();
> 
> And no I/O is pending, how does the jiffy count get bumped?
> 
> I think the "tick-less" system relies upon a side-effect of
> interactive use that can't be relied upon for design criteria.
> 
> Cheers,
> Dick Johnson
> 
> Penguin : Linux version 2.4.1 on an i686 machine (799.53 BogoMips).
> 
>     I was going to compile a list of innovations that could be
>     attributed to Microsoft. Once I realized that Ctrl-Alt-Del
>     was handled in the BIOS, I found that there aren't any.
> 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
-- 
/*------------------------------------------------**
**   Mark Salisbury | Mercury Computer Systems    **
**   mbs@mc.com     | System OS - Kernel Team     **
**------------------------------------------------**
**  Thanks to all who sponsored me for the        **
**  Multiple Sclerosis Great Mass Getaway!!       **
**  Robert, Michele and I raised $10,000 and the  **
**  raised over $1,000,000 ride as a whole.  The  **
**  ride was great, and we didn't get rained on!  **
**  Thanks again to all of you who made this      **
**  possible!!                                    **
**------------------------------------------------*/



* Re: No 100 HZ timer !
  2001-08-01 19:34 ` Chris Friesen
@ 2001-08-01 19:49   ` Richard B. Johnson
  2001-08-01 20:08     ` Mark Salisbury
  2001-08-01 20:33     ` george anzinger
  2001-08-01 21:20   ` george anzinger
  1 sibling, 2 replies; 118+ messages in thread
From: Richard B. Johnson @ 2001-08-01 19:49 UTC (permalink / raw)
  To: Chris Friesen; +Cc: george anzinger, linux-kernel


> george anzinger wrote:
> 
> > The testing I have done seems to indicate a lower overhead on a lightly
> > loaded system, about the same overhead with some load, and much more
> > overhead with a heavy load.  To me this seems like the wrong thing to
> 

Doesn't the "tick-less" system presume that somebody, somewhere, will
be sleeping sometime during the 1/HZ interval so that the scheduler
gets control?

If everybody's doing:

	for(;;)
          number_crunch();

And no I/O is pending, how does the jiffy count get bumped?

I think the "tick-less" system relies upon a side-effect of
interactive use that can't be relied upon for design criteria.

Cheers,
Dick Johnson

Penguin : Linux version 2.4.1 on an i686 machine (799.53 BogoMips).

    I was going to compile a list of innovations that could be
    attributed to Microsoft. Once I realized that Ctrl-Alt-Del
    was handled in the BIOS, I found that there aren't any.




* Re: No 100 HZ timer !
  2001-08-01 17:22 No 100 HZ timer ! george anzinger
@ 2001-08-01 19:34 ` Chris Friesen
  2001-08-01 19:49   ` Richard B. Johnson
  2001-08-01 21:20   ` george anzinger
  0 siblings, 2 replies; 118+ messages in thread
From: Chris Friesen @ 2001-08-01 19:34 UTC (permalink / raw)
  To: george anzinger, linux-kernel

george anzinger wrote:

> The testing I have done seems to indicate a lower overhead on a lightly
> loaded system, about the same overhead with some load, and much more
> overhead with a heavy load.  To me this seems like the wrong thing to

What about something that tries to get the best of both worlds?  How about a
tickless system that has a maximum frequency for how often it will schedule?
This would give the tickless advantage for big iron running many lightly loaded
virtual instances, but keep some kind of cap on the overhead under heavy load.

Does this sound feasible?
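The cap amounts to clamping the programmed interval to a floor; a sketch (MIN_INTERVAL and the function name are made up for illustration):

```c
#include <stdint.h>

#define MIN_INTERVAL 1000ULL   /* floor, in counter units: caps the rate */

/* Hybrid scheme: tickless, but never program an interval shorter than
 * MIN_INTERVAL.  Events landing inside the floor are batched into one
 * interrupt, bounding the overhead under heavy timer load. */
static uint64_t clamp_interval(uint64_t next_expiry, uint64_t now)
{
        uint64_t delta = next_expiry > now ? next_expiry - now : 0;

        return delta < MIN_INTERVAL ? MIN_INTERVAL : delta;
}
```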

-- 
Chris Friesen                    | MailStop: 043/33/F10  
Nortel Networks                  | work: (613) 765-0557
3500 Carling Avenue              | fax:  (613) 765-2986
Nepean, ON K2H 8E9 Canada        | email: cfriesen@nortelnetworks.com


* No 100 HZ timer !
@ 2001-08-01 17:22 george anzinger
  2001-08-01 19:34 ` Chris Friesen
  0 siblings, 1 reply; 118+ messages in thread
From: george anzinger @ 2001-08-01 17:22 UTC (permalink / raw)
  To: linux-kernel

I have just posted a patch on sourceforge:
 http://sourceforge.net/projects/high-res-timers

to the 2.4.7 kernel with both ticked and tickless options, switchable
at any time via a /proc interface.  The system is instrumented with
Andrew Morton's time pegs with a couple of enhancements so you can easily
see your clock/timer overhead (thanks Andrew).

Please take a look at this system and let me know if a tickless system
is worth further effort.

The testing I have done seems to indicate a lower overhead on a lightly
loaded system, about the same overhead with some load, and much more
overhead with a heavy load.  To me this seems like the wrong thing to
do.  We would like as nearly flat an overhead-to-load curve as we can get,
and the ticked system seems to be much better in this regard.  Still,
there may be applications where this works.

comments?  RESULTS?

George


* Re: No 100 HZ timer !
  2001-04-10 19:28                             ` george anzinger
  2001-04-10 20:02                               ` mark salisbury
@ 2001-08-01  1:08                               ` george anzinger
  2001-08-11 11:57                                 ` Pavel Machek
  1 sibling, 1 reply; 118+ messages in thread
From: george anzinger @ 2001-08-01  1:08 UTC (permalink / raw)
  To: Jamie Lokier, high-res-timers-discourse, Alan Cox,
	Mikulas Patocka, David Schleef, Mark Salisbury, Jeff Dike,
	schwidefsky, linux-kernel, Andrew Morton

I have just posted a patch on sourceforge:
 http://sourceforge.net/projects/high-res-timers

to the 2.4.7 kernel with both ticked and tickless options, switchable
at any time via a /proc interface.  The system is instrumented with
Andrew Morton's time pegs with a couple of enhancements so you can easily
see your clock/timer overhead (thanks Andrew).

Please take a look at this system and let me know if a tickless system
is worth further effort.

The testing I have done seems to indicate a lower overhead on a lightly
loaded system, about the same overhead with some load, and much more
overhead with a heavy load.  To me this seems like the wrong thing to
do.  We would like as nearly flat an overhead-to-load curve as we can get,
and the ticked system seems to be much better in this regard.  Still,
there may be applications where this works.

comments?  RESULTS?

George


* Re: No 100 HZ timer!
  2001-04-23  8:05                             ` Ulrich Windl
@ 2001-04-23 13:22                               ` Mark Salisbury
  0 siblings, 0 replies; 118+ messages in thread
From: Mark Salisbury @ 2001-04-23 13:22 UTC (permalink / raw)
  To: Ulrich Windl, george anzinger; +Cc: linux-kernel

On Mon, 23 Apr 2001, Ulrich Windl wrote:
> IMHO it is doable to comply with POSIX.  Probably not what many
> of the RT freaks expect, but doable.  I've been tuning the nanoseconds for a
> while now...
> 
> Ulrich

thanks for calling me a freak :)

yes, linux could have posix timers cobbled on today and comply with the POSIX
spec.

but, we would like to make it better, not just feature complete.

10 ms timer resolution, while completely in compliance w/ the POSIX spec, is
completely useless for "RT freaks".

remember that 95% of all computers in the world are embedded, and of those most
need at least "soft" RT behavior.  seems like a good growth market for linux to
me.

when a PPC G4 is capable of 30 nanosecond resolution, why would you want to
settle for 10 milliseconds?  (30 billionths of a second vs. 10 thousandths, for
those of you who haven't had to familiarize yourselves with sub-second time
units)

> 
> On 17 Apr 2001, at 11:53, george anzinger wrote:
> 
> > I was thinking that it might be good to remove the POSIX API for the
> > kernel and allow a somewhat simplified interface.  For example, the user
> 
-- 
/*------------------------------------------------**
**   Mark Salisbury | Mercury Computer Systems    **
**   mbs@mc.com     | System OS - Kernel Team     **
**------------------------------------------------**
**  I will be riding in the Multiple Sclerosis    **
**  Great Mass Getaway, a 150 mile bike ride from **
**  Boston to Provincetown.  Last year I raised   **
**  over $1200.  This year I would like to beat   **
**  that.  If you would like to contribute,       **
**  please contact me.                            **
**------------------------------------------------*/



* Re: No 100 HZ timer!
  2001-04-17 18:53                           ` george anzinger
  2001-04-17 19:41                             ` Jamie Lokier
@ 2001-04-23  8:05                             ` Ulrich Windl
  2001-04-23 13:22                               ` Mark Salisbury
  1 sibling, 1 reply; 118+ messages in thread
From: Ulrich Windl @ 2001-04-23  8:05 UTC (permalink / raw)
  To: george anzinger; +Cc: linux-kernel

IMHO it is doable to comply with POSIX.  Probably not what many
of the RT freaks expect, but doable.  I've been tuning the nanoseconds for a
while now...

Ulrich

On 17 Apr 2001, at 11:53, george anzinger wrote:

> I was thinking that it might be good to remove the POSIX API for the
> kernel and allow a somewhat simplified interface.  For example, the user



* Re: No 100 HZ timer!
  2001-04-17 18:53                           ` george anzinger
@ 2001-04-17 19:41                             ` Jamie Lokier
  2001-04-23  8:05                             ` Ulrich Windl
  1 sibling, 0 replies; 118+ messages in thread
From: Jamie Lokier @ 2001-04-17 19:41 UTC (permalink / raw)
  To: george anzinger
  Cc: Mark Salisbury, Ben Greear, Horst von Brand, linux-kernel,
	high-res-timers-discourse

george anzinger wrote:
> > > a.) list insertion of an arbitrary timer,
> > should be O(log(n)) at worst
> > 
> > > b.) removal of canceled and expired timers, and
> > easy to make O(1)
> 
> I thought this was true also, but the priority heap structure that has
> been discussed here has a O(log(n)) removal time.

Several priority queue structures support removal in O(1) time.
Perhaps you are thinking of the classic array-based heap, for
which removal is O(log n) in the general case.
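For example, with a doubly-linked node (the approach Linux's own timer lists take), removal is constant time given just a pointer to the node; a sketch with illustrative names:

```c
#include <stddef.h>

struct tnode {
        struct tnode *prev, *next;
};

/* O(1) removal: unlink in constant time given only a pointer to the
 * node -- no search, no re-heapify.  (An array-based binary heap, by
 * contrast, needs O(log n) work to restore the heap property after
 * deleting an interior element.) */
static void detach(struct tnode *n)
{
        n->prev->next = n->next;
        n->next->prev = n->prev;
        n->next = n->prev = n;      /* mark as detached */
}
```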

-- Jamie


* Re: No 100 HZ timer!
  2001-04-17 12:51                         ` Mark Salisbury
@ 2001-04-17 18:53                           ` george anzinger
  2001-04-17 19:41                             ` Jamie Lokier
  2001-04-23  8:05                             ` Ulrich Windl
  0 siblings, 2 replies; 118+ messages in thread
From: george anzinger @ 2001-04-17 18:53 UTC (permalink / raw)
  To: Mark Salisbury
  Cc: Jamie Lokier, Ben Greear, Horst von Brand, linux-kernel,
	high-res-timers-discourse

Mark Salisbury wrote:
> 
> > Functional Specification for the high-res-timers project.
> >
> > In addition we expect that we will provide a high resolution timer for
> > kernel use (heck, we may provide several).
> 
> what we do here determines what we can do for the user..

I was thinking that it might be good to remove the POSIX API for the
kernel and allow a somewhat simplified interface.  For example, the user
gets to resolution by specifying a CLOCK, where we MAY want to allow the
kernel call to directly specify the resolution.  This has already been
suggested.  I suppose you could say the functional spec should define
the kernel PI (KPI?) as well as the user API, but that is a bit fuzzy at
this time as I think it will depend on how we actually code the user
functionality.  Another example: I am leaning toward using a two-word
uptime composed of jiffies (i.e. 1/HZ since boot) and a machine-dependent
sub-jiffie unit.  Each ARCH would then define this unit as well as the
conversion routines to move it back and forth to nanoseconds,
microseconds, and 1/HZ.  I think this format would play well in task
time accounting, as well as in timer management.

For calls to something like delay(), however, it sucks.  I think these
calls want a single-word, possibly microsecond, time specification.
This gives a 16- or 17-minute max and a 1 microsecond min, which probably
covers 99.99% of all kernel delay needs.
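The two-word format might look like the following sketch; the sub-jiffie unit, HZ value, and names are assumptions for illustration, not part of any posted patch:

```c
#include <stdint.h>

#define HZ              100            /* 10 ms jiffies, as in 2.4  */
#define SUBJIFFIE_NSEC  100ULL         /* assumed arch unit: 100 ns */
#define NSEC_PER_JIFFIE (1000000000ULL / HZ)

/* Two-word uptime: whole jiffies since boot plus an arch-dependent
 * sub-jiffie remainder. */
struct uptime {
        uint64_t jiffies;
        uint32_t sub;                  /* in SUBJIFFIE_NSEC units */
};

/* One of the arch-supplied conversion routines: to nanoseconds. */
static uint64_t uptime_to_ns(struct uptime t)
{
        return t.jiffies * NSEC_PER_JIFFIE
             + (uint64_t)t.sub * SUBJIFFIE_NSEC;
}
```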

Another kernel internal interface should allow the user specified
structures (timespec and timeval).  The notion is to put all the
conversion routines in the timer code to maintain the specified
resolution, and (more importantly), to avoid conversion to a format that
just needs an additional conversion.

To summarize,

I think there is a need for two classes of timer interfaces in the
kernel:

a.) For drivers and others that need "delay()" sorts of things, and
b.) For system calls that handle user specified times.
> 
> > We will provide several "clocks" as defined by the standard.  In
> > particular, the following capabilities will be attached to some clock,
> > regardless of the actual clock "name" we end up using:
> >
> > CLOCK_10MS a wall clock supporting timers with 10 ms resolution (same as
> > linux today).
> >
> > CLOCK_HIGH_RES a wall clock supporting timers with the highest
> > resolution the hardware supports.
> >
> > CLOCK_1US a wall clock supporting timers with 1 micro second resolution
> > (if the hardware allows it).
> >
> > CLOCK_UPTIME a clock that gives the system up time.  (Does this clock
> > need to support timers?)
> >
> > CLOCK_REALTIME a wall clock supporting timers with 1/HZ resolution.
> >
> 
> Too many clocks.  we should have CLOCK_REALTIME and CLOCK_UPTIME for sure, but
> the others are just fluff.  we should have one single clock mechanism for the
> whole system with its resolution and tick/tickless characteristics determined
> at compile time.

I think you already have let the nose of the camel into the tent :) 
Here is what I am thinking:

Suppose an array of structures of type clock.  Clock_id is an index into
this array.  Here is what is in the structure:

struct clock {
	int resolution;
	int (*gettime)();
	int (*settime)();
	int (*convert_to_uptime)();
	int (*convert_from_uptime)();
	/* room for more bright ideas */
};

Now the difference between CLOCK_UPTIME and CLOCK_REALTIME is surely in
the gettime/settime and possibly in the resolution.  But the difference
between CLOCK_REALTIME and CLOCK_1US, CLOCK_HIGH_RES, or CLOCK_10MS is JUST
the resolution!  In other words, all they cost is the table entry.  Note
that CLOCK_GMT, CLOCK_UST, and CLOCK_GPS, etc. all fit nicely into this
same structure.

We should also provide a way to "register" a new clock so the user can
easily configure in additional clocks.  There are ways of doing this
that are really easy to use, e.g. the module_init() macro.
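A registration sketch along those lines (struct layout simplified and all names hypothetical):

```c
#include <stddef.h>

struct clock_ops {
        int resolution;                      /* in nanoseconds          */
        int (*gettime)(long long *ns);
        int (*settime)(long long ns);        /* NULL: clock not settable */
};

#define MAX_CLOCKS 16
static struct clock_ops *clocks[MAX_CLOCKS]; /* clock_id indexes here */

/* Register a new clock in the first free slot; returns its clock_id,
 * or -1 if the table is full.  A module_init() hook could call this. */
static int register_clock(struct clock_ops *ops)
{
        for (int i = 0; i < MAX_CLOCKS; i++) {
                if (clocks[i] == NULL) {
                        clocks[i] = ops;
                        return i;
                }
        }
        return -1;
}
```

Since most of the table entries differ only in `resolution`, adding a CLOCK_1US next to CLOCK_REALTIME really does cost just one slot.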
> 
> also CLOCK_UPTIME should be the "true" clock, with CLOCK_REALTIME just a
> convenience/compliance offset.

If you mean by "true" that this clock can not be set, starts at 0 at
boot time and can only be affected by rate adjustments to get it to pass
a real second every second, I agree.
> 
> > At the same time we will NOT support the following clocks:
> >
> > CLOCK_VIRTUAL a clock measuring the elapsed execution time (real or
> > wall) of a given task.
> >
> > CLOCK_PROFILE a clock used to generate profiling events.
> >
> > CLOCK_???  any clock keyed to a task.
> 
> we could do some KOOL things here but they are more related to process time
> accounting and should be dealt with in that context and as a separate project.
> 
> however our design should take these concepts into account and allow for easy
> integration of these types of functionality.

I agree.
> 
> >
> > (Note that this does not mean that the clock system will not support the
> > virtual and profile clocks, but that they will not be accessible thru
> > the POSIX timer interface.)
> 
I think that, should somebody choose to implement them, they should be
available at least through the clock_xxx interfaces.

With care, this could be done.  We might need to add an item to the
structure saying if the clock supports timers.  If you want timers on
these clocks you would need a different way of clocking them (or so I
think, haven't thought about this in much detail).
> 
> >
> > It would be NICE if we could provide a way to hook other time support
> > into the system.  In particular a
> >
> > CLOCK_WWV or CLOCK_GPS
> >
> 
> CLOCK_NTP
> 
> might be nice.  The problem with these sorts of clocks is that they
> imply an array of function pointers for each clock, and function pointers
> slow the code down because of their unpredictability.  Nevertheless,
> we will attempt to allow easy expansion in this direction.
> >
> > Implications on the current kernel:
> >
> > The high resolution timers will require a fast clock access with the
> > maximum supported resolution in order to convert relative times to
> > absolute times.  This same fast clock will be used to support the
> > various user and system time requests.
> >
> > There are two ways to provide timers to the kernel.  For lack of a
> > better term we will refer to them as "ticked" and "tick less".  Until we
> > have performance information that implies that one or the other of these
> > methods is better in all cases we will provide both ticked and tick less
> > systems.  The variety to be used will be selected at configure time.
> >
> > For tick less systems we will need to provide code to collect execution
> > times.  For the ticked system the current method of collection these
> > times will be used.  This project will NOT attempt to improve the
> > resolution of these timers, however, the high speed, high resolution
> > access to the current time will allow others to augment the system in
> > this area.
> 
> huh? why not?

Well, first, accounting (as I call this usage) is not part of the POSIX
API the project is working on.  Second, it is in a real way
independent.  That is to say, it can be done today with the current
clock system as well as with any new one.  And third, more work is needed
in areas of the system we would not otherwise touch to do them (wait(),
for example).
> 
> >
> > For the tick less system the project will also provide a time slice
> > expiration interrupt.
> >
> > The timer list(s) (all pending timers) need to be organized so that the
> > following operations are fast:
> >
> > a.) list insertion of an arbitrary timer,
> should be O(log(n)) at worst
> 
> > b.) removal of canceled and expired timers, and
> easy to make O(1)

I thought this was true also, but the priority heap structure that has
been discussed here has O(log n) removal time.
> 
> > c.) finding the timer for "NOW" and its immediate followers.
> head of list or top of tree or top of heap or whatever, O(1)
> 
> > Times in the timer list will be absolute and related to system up time.
> > These times will be converted to wall time as needed.
> 
> and ONLY when needed by users
> 
> >
> > The POSIX interface provides for "absolute" timers relative to a given
> 
> do you mean "relative" timers?
No, the user specifies a wall time for the event.
> 
> > clock.  When these timers are related to a "wall" clock they will need
> > adjusting when the wall clock time is adjusted.  These adjustments are
> > done for "leap seconds" and the date command.  (Note, we are keeping
> > timers relative to "uptime" which can not be adjusted.  This means that
> > relative timers and absolute timers related to CLOCK_UPTIME are not
> > affected by the above wall time adjustments.)
> 
> absolute timers will automatically fall out when you adjust CLOCK_UPTIME...
> i.e.  an absolute time is an absolute time and since CLOCK_UPTIME is the
> ultimate arbiter of what absolute time it is, adjusting CLOCK_UPTIME will cause
> the absolute times in the timer list to be handled properly without modifying
> them. (am I making myself clear?  I will try to come up with a better
> description of what I mean)

I don't read the spec this way.  In my reading, the user can specify
that a timer expire at some fixed wall time based on a given clock.  If
that clock moves past that time, we had better expire that timer.  It
doesn't matter if the clock moved because of STD to DST or a leap second
or the date command.  Likewise, if we have converted that CLOCK time to
CLOCK_UPTIME (as we will when we put the timer in the list) we will need
to redo the calculation when any of the above events occur.  Also, as
said earlier, CLOCK_UPTIME can only be set by booting the system.  (By
the way, somewhere in the high-res-timer mailing list archive is a note
requesting (as I read it) CLOCK_UPTIME, a clock that never gives the
same time twice (within resolution/ cpu speed limits), never moves
backward, etc.  See TIME_MONOTONIC at
http://www.cl.cam.ac.uk/~mgk25/c-time/ 
but the whole thing is a good read.)
> 
> > In either a ticked or tick less system, it is expected that resolutions
> > higher than 1/HZ will come with some additional overhead.  For this
> > reason, the CLOCK resolution will be used to round up times for each
> > timer.  When the CLOCK provides 1/HZ (or coarser) resolution, the
> > project will attempt to meet or exceed the current systems timer
> > performance.
> 
> ONLY in a ticked system.

I am not sure what you mean here.  The resolution round-up will tend to
group timers and thus, given the same timer expiration rate, cause fewer
interrupts.  This is true in both systems, with the exception that in
ticked systems the improvement goes flat once the resolution becomes
less than or equal to the tick rate.
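The grouping effect comes from rounding each expiry up to its clock's resolution, so nearby timers land on the same boundary; a sketch (names illustrative):

```c
#include <stdint.h>

/* Round an absolute expiry up to the clock's resolution.  Timers due
 * within the same resolution window land on the same boundary and can
 * share one interrupt, flattening the interrupt rate as load grows. */
static uint64_t round_to_res(uint64_t expires, uint64_t res)
{
        return ((expires + res - 1) / res) * res;
}
```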
> 
> >
> > Safe guards:
> >
> > Given a system speed, there is a repeating timer rate which will consume
> > 100% of the system in handling the timer interrupts.  An attempt will
> > be made to detect this rate and adjust the timer to prevent system
> > lockup.  This adjustment will look like timer overruns to the user
> > (i.e. we will take a percent of the interrupts and record the untaken
> > interrupts as overruns).
> 
> see my earlier e-mail, although it is a simple matter to enforce a minimum
> allowable interval by parameter checking.
> 
> >
> > What the project will NOT do:
> >
> > This project will NOT provide higher resolution accounting (i.e. user
> > and system execution times).
> >
> > The project will NOT provide higher resolution VIRTUAL or PROFILE timers.
> 
> as I said, there are some kool things we could do here, and we should design in
> allowances for future upgrades which implement these things and let it get done
> as a follow-on.
> 
Right.  Remember Linus seems to like small patches.

George


* Re: No 100 HZ timer!
  2001-04-16 19:19                       ` george anzinger
  2001-04-16 20:45                         ` Albert D. Cahalan
  2001-04-16 23:57                         ` Mark Salisbury
@ 2001-04-17 12:51                         ` Mark Salisbury
  2001-04-17 18:53                           ` george anzinger
  2 siblings, 1 reply; 118+ messages in thread
From: Mark Salisbury @ 2001-04-17 12:51 UTC (permalink / raw)
  To: george anzinger
  Cc: Jamie Lokier, Ben Greear, Horst von Brand, linux-kernel,
	high-res-timers-discourse

> Functional Specification for the high-res-timers project.
> 
> In addition we expect that we will provide a high resolution timer for
> kernel use (heck, we may provide several).
 
what we do here determines what we can do for the user..

> We will provide several "clocks" as defined by the standard.  In
> particular, the following capabilities will be attached to some clock,
> regardless of the actual clock "name" we end up using:
> 
> CLOCK_10MS a wall clock supporting timers with 10 ms resolution (same as
> linux today). 
> 
> CLOCK_HIGH_RES a wall clock supporting timers with the highest
> resolution the hardware supports.
> 
> CLOCK_1US a wall clock supporting timers with 1 micro second resolution
> (if the hardware allows it).
> 
> CLOCK_UPTIME a clock that gives the system up time.  (Does this clock
> need to support timers?)
> 
> CLOCK_REALTIME a wall clock supporting timers with 1/HZ resolution.
> 

Too many clocks.  we should have CLOCK_REALTIME and CLOCK_UPTIME for sure, but
the others are just fluff.  we should have one single clock mechanism for the
whole system with its resolution and tick/tickless characteristics determined
at compile time.

also CLOCK_UPTIME should be the "true" clock, with CLOCK_REALTIME just a
convenience/compliance offset.



> At the same time we will NOT support the following clocks:
> 
> CLOCK_VIRTUAL a clock measuring the elapsed execution time (real or
> wall) of a given task.  
> 
> CLOCK_PROFILE a clock used to generate profiling events. 
> 
> CLOCK_???  any clock keyed to a task.

we could do some KOOL things here but they are more related to process time
accounting and should be dealt with in that context and as a separate project.

however our design should take these concepts into account and allow for easy
integration of these types of functionality.

> 
> (Note that this does not mean that the clock system will not support the
> virtual and profile clocks, but that they will not be accessible thru
> the POSIX timer interface.)

I think that, should somebody choose to implement them, they should be
available at least through the clock_xxx interfaces.

> 
> It would be NICE if we could provide a way to hook other time support
> into the system.  In particular a
> 
> CLOCK_WWV or CLOCK_GPS
> 

CLOCK_NTP

might be nice.  The problem with these sorts of clocks is that they
imply an array of function pointers for each clock, and function pointers
slow the code down because of their unpredictability.  Nevertheless,
we will attempt to allow easy expansion in this direction.
> 
> Implications on the current kernel:
> 
> The high resolution timers will require a fast clock access with the
> maximum supported resolution in order to convert relative times to
> absolute times.  This same fast clock will be used to support the
> various user and system time requests.
> 
> There are two ways to provide timers to the kernel.  For lack of a
> better term we will refer to them as "ticked" and "tick less".  Until we
> have performance information that implies that one or the other of these
> methods is better in all cases we will provide both ticked and tick less
> systems.  The variety to be used will be selected at configure time.
> 
> For tick less systems we will need to provide code to collect execution
> times.  For the ticked system the current method of collection these
> times will be used.  This project will NOT attempt to improve the
> resolution of these timers, however, the high speed, high resolution
> access to the current time will allow others to augment the system in
> this area.

huh? why not?

> 
> For the tick less system the project will also provide a time slice
> expiration interrupt.
> 
> The timer list(s) (all pending timers) need to be organized so that the
> following operations are fast:
> 
> a.) list insertion of an arbitrary timer,
should be O(log(n)) at worst

> b.) removal of canceled and expired timers, and
easy to make O(1)

> c.) finding the timer for "NOW" and its immediate followers.
head of list or top of tree or top of heap or whatever, O(1)

> Times in the timer list will be absolute and related to system up time.
> These times will be converted to wall time as needed.
 
and ONLY when needed by users


> 
> The POSIX interface provides for "absolute" timers relative to a given

do you mean "relative" timers?

> clock.  When these timers are related to a "wall" clock they will need
> adjusting when the wall clock time is adjusted.  These adjustments are
> done for "leap seconds" and the date command.  (Note, we are keeping
> timers relative to "uptime" which can not be adjusted.  This means that
> relative timers and absolute timers related to CLOCK_UPTIME are not
> affected by the above wall time adjustments.)

absolute timers will automatically fall out when you adjust CLOCK_UPTIME...
i.e.  an absolute time is an absolute time, and since CLOCK_UPTIME is the
ultimate arbiter of what absolute time it is, adjusting CLOCK_UPTIME will cause
the absolute times in the timer list to be handled properly without modifying
them. (am I making myself clear?  I will try to come up with a better
description of what I mean)

> In either a ticked or tick less system, it is expected that resolutions 
> higher than 1/HZ will come with some additional overhead.  For this 
> reason, the CLOCK resolution will be used to round up times for each 
> timer.  When the CLOCK provides 1/HZ (or coarser) resolution, the 
> project will attempt to meet or exceed the current systems timer 
> performance. 

ONLY in a ticked system.

> 
> Safe guards:
> 
> Given a system speed, there is a repeating timer rate which will consume
> 100% of the system in handling the timer interrupts.  An attempt will
> be made to detect this rate and adjust the timer to prevent system
> lockup.  This adjustment will look like timer overruns to the user
> (i.e. we will take a percent of the interrupts and record the untaken
> interrupts as overruns).

see my earlier e-mail, although it is a simple matter to enforce a minimum
allowable interval by parameter checking.


> 
> What the project will NOT do:
> 
> This project will NOT provide higher resolution accounting (i.e. user
> and system execution times).
> 
> The project will NOT provide higher resolution VIRTUAL or PROFILE timers.

as I said, there are some kool things we could do here, and we should design in
allowances for future upgrades which implement these things and let it get done
as a follow-on.



-- 
/*------------------------------------------------**
**   Mark Salisbury | Mercury Computer Systems    **
**   mbs@mc.com     | System OS - Kernel Team     **
**------------------------------------------------**
**  I will be riding in the Multiple Sclerosis    **
**  Great Mass Getaway, a 150 mile bike ride from **
**  Boston to Provincetown.  Last year I raised   **
**  over $1200.  This year I would like to beat   **
**  that.  If you would like to contribute,       **
**  please contact me.                            **
**------------------------------------------------*/


^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer!
  2001-04-17  0:45                           ` george anzinger
@ 2001-04-17 12:12                             ` Mark Salisbury
  0 siblings, 0 replies; 118+ messages in thread
From: Mark Salisbury @ 2001-04-17 12:12 UTC (permalink / raw)
  To: george anzinger, Mark Salisbury
  Cc: Jamie Lokier, Ben Greear, Horst von Brand, linux-kernel,
	high-res-timers-discourse

On Mon, 16 Apr 2001, you wrote:
> Mark Salisbury wrote:
> > 
> > > Given a system speed, there is a repeating timer rate which will consume
> > > 100% of the system in handling the timer interrupts.  An attempt will
> > > be made to detect this rate and adjust the timer to prevent system
> > > lockup.  This adjustment will look like timer overruns to the user
> > > (i.e. we will take a percent of the interrupts and record the untaken
> > > interrupts as overruns)
> > 
> > just at first blush, there are some things in general but I need to read
> > this again and more closely....
> > 
> > but, with POSIX timers, there is a nifty little restriction/protection built
> > into the spec regarding the re-insertion of short interval repeating timers.
> > that is: a repeating timer will not be re-inserted until AFTER the
> > associated signal handler has been handled.
> 
> Actually what it says is: "Only a single signal shall be queued to the
> process for a given timer at any point in time.  When a timer for which
> a signal is still pending expires, no signal shall be queued, and a
> timer overrun shall occur."
> 
> It then goes on to talk about the overrun count and how it is to be
> managed.
> 
I guess I was confusing what the spec said with the way in which I chose to
comply with the spec.  I calculate overruns (I know when a timer went off, and
how many of its intervals have passed since it went off by
time_since_expire/interval) and don't reinsert until return from the signal
handler.  

this inherently prevents a flood of high-frequency interrupts and allows
processing of the signal handlers to proceed cleanly.
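The overrun arithmetic described above can be sketched in a few lines of C
(hypothetical names; this is an illustration, not the actual
implementation):

```c
/* POSIX overrun count for a repeating timer whose signal is still
 * pending: the number of whole intervals that have elapsed since
 * it expired, i.e. time_since_expire / interval. */
static long overrun_count(long time_since_expire_ns, long interval_ns)
{
    if (interval_ns <= 0 || time_since_expire_ns <= 0)
        return 0;
    return time_since_expire_ns / interval_ns;
}
```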

-- 
/*------------------------------------------------**
**   Mark Salisbury | Mercury Computer Systems    **
**   mbs@mc.com     | System OS - Kernel Team     **
**------------------------------------------------**
**  I will be riding in the Multiple Sclerosis    **
**  Great Mass Getaway, a 150 mile bike ride from **
**  Boston to Provincetown.  Last year I raised   **
**  over $1200.  This year I would like to beat   **
**  that.  If you would like to contribute,       **
**  please contact me.                            **
**------------------------------------------------*/


^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer!
  2001-04-16 23:57                         ` Mark Salisbury
@ 2001-04-17  0:45                           ` george anzinger
  2001-04-17 12:12                             ` Mark Salisbury
  0 siblings, 1 reply; 118+ messages in thread
From: george anzinger @ 2001-04-17  0:45 UTC (permalink / raw)
  To: Mark Salisbury
  Cc: Mark Salisbury, Jamie Lokier, Ben Greear, Horst von Brand,
	linux-kernel, high-res-timers-discourse

Mark Salisbury wrote:
> 
> > Given a system speed, there is a repeating timer rate which will consume
> > 100% of the system in handling the timer interrupts.  An attempt will
> > be made to detect this rate and adjust the timer to prevent system
> > lockup.  This adjustment will look like timer overruns to the user
> > (i.e. we will take a percent of the interrupts and record the untaken
> > interrupts as overruns)
> 
> just at first blush, there are some things in general but I need to read
> this again and more closely....
> 
> but, with POSIX timers, there is a nifty little restriction/protection built
> into the spec regarding the re-insertion of short interval repeating timers.
> that is: a repeating timer will not be re-inserted until AFTER the
> associated signal handler has been handled.

Actually what it says is: "Only a single signal shall be queued to the
process for a given timer at any point in time.  When a timer for which
a signal is still pending expires, no signal shall be queued, and a
timer overrun shall occur."

It then goes on to talk about the overrun count and how it is to be
managed.

What I am suggesting is that the system should detect when these
interrupts would come so fast as to stall the system and just set up a
percent of them while bumping the overrun count as if they had all
occurred.
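The throttling suggested here reduces to simple arithmetic; a sketch with
invented names (given a minimum sustainable interval, deliver every k-th
expiration and account the rest as overruns):

```c
/* If a timer requests an interval shorter than the system can
 * sustain, deliver only every k-th expiration: k is the smallest
 * factor with k * requested >= min_sustainable. */
static long skip_factor(long requested_ns, long min_sustainable_ns)
{
    if (requested_ns >= min_sustainable_ns)
        return 1; /* rate is sustainable: nothing skipped */
    return (min_sustainable_ns + requested_ns - 1) / requested_ns;
}

/* Overruns charged per delivered expiration, as if the skipped
 * interrupts had all occurred. */
static long overruns_per_delivery(long requested_ns, long min_sustainable_ns)
{
    return skip_factor(requested_ns, min_sustainable_ns) - 1;
}
```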

George

> 
> this has some interesting consequences for signal handling and signal
> delivery implementations, but importantly, it ensures that even a flood of
> POSIX timers with very short repeat intervals will be handled cleanly.
> 
> I will get more detailed comments to you tomorrow.
> 
> Mark Salisbury
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer!
  2001-04-16 19:19                       ` george anzinger
  2001-04-16 20:45                         ` Albert D. Cahalan
@ 2001-04-16 23:57                         ` Mark Salisbury
  2001-04-17  0:45                           ` george anzinger
  2001-04-17 12:51                         ` Mark Salisbury
  2 siblings, 1 reply; 118+ messages in thread
From: Mark Salisbury @ 2001-04-16 23:57 UTC (permalink / raw)
  To: george anzinger, Mark Salisbury
  Cc: Jamie Lokier, Ben Greear, Horst von Brand, linux-kernel,
	high-res-timers-discourse


> Given a system speed, there is a repeating timer rate which will consume
> 100% of the system in handling the timer interrupts.  An attempt will
> be made to detect this rate and adjust the timer to prevent system
> lockup.  This adjustment will look like timer overruns to the user
> (i.e. we will take a percent of the interrupts and record the untaken
> interrupts as overruns)

just at first blush, there are some things in general but I need to read
this again and more closely....

but, with POSIX timers, there is a nifty little restriction/protection built
into the spec regarding the re-insertion of short interval repeating timers.
that is: a repeating timer will not be re-inserted until AFTER the
associated signal handler has been handled.

this has some interesting consequences for signal handling and signal
delivery implementations, but importantly, it ensures that even a flood of
POSIX timers with very short repeat intervals will be handled cleanly.

I will get more detailed comments to you tomorrow.

Mark Salisbury


^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer!
  2001-04-16 20:45                         ` Albert D. Cahalan
  2001-04-16 21:29                           ` Chris Wedgwood
@ 2001-04-16 22:25                           ` george anzinger
  1 sibling, 0 replies; 118+ messages in thread
From: george anzinger @ 2001-04-16 22:25 UTC (permalink / raw)
  To: Albert D. Cahalan
  Cc: Mark Salisbury, Jamie Lokier, Ben Greear, Horst von Brand,
	linux-kernel, high-res-timers-discourse

"Albert D. Cahalan" wrote:
> 
> > CLOCK_10MS a wall clock supporting timers with 10 ms resolution (same as
> > linux today).
> 
> Except on the Alpha, and on some ARM systems, etc.
> The HZ constant varies from 10 to 1200.

I suspect we will want to use 10 ms resolution for a clock named
CLOCK_10MS :)
On the other hand we could have a CLOCK_1_OVER_HZ...  Actually with
high-res-timers the actual HZ value becomes a bit less important.  It
would be "nice" to keep 1/HZ == jiffie, however.
> 
> > At the same time we will NOT support the following clocks:
> >
> > CLOCK_VIRTUAL a clock measuring the elapsed execution time (real or
> > wall) of a given task.
> ...
> > For tick less systems we will need to provide code to collect execution
> > times.  For the ticked system the current method of collection these
> > times will be used.  This project will NOT attempt to improve the
> > resolution of these timers, however, the high speed, high resolution
> > access to the current time will allow others to augment the system in
> > this area.
> ...
> > This project will NOT provide higher resolution accounting (i.e. user
> > and system execution times).
> 
> It is nice to have accurate per-process user/system accounting.
> Since you'd be touching the code anyway...

Yeah sure... and will you pick up the ball on all the platform-dependent
code to get high-res-timers on all the other platforms?  On second
thought I am reminded of the corollary to the old saw:  "The squeaking
wheel gets the grease."  Which is: "He who complains most about the
squeaking gets to do the greasing."  Hint, hint.
> 
> > The POSIX interface provides for "absolute" timers relative to a given
> > clock.  When these timers are related to a "wall" clock they will need
> > adjusting when the wall clock time is adjusted.  These adjustments are
> > done for "leap seconds" and the date command.
> 
> This is a BIG can of worms. You have UTC, TAI, GMT, and a loosely
> defined POSIX time that is none of the above. This is a horrid mess,
> even ignoring gravity and speed. :-)
> 
> Can a second be 2 billion nanoseconds?
> Can a nanosecond be twice as long as normal?
> Can a second appear twice, with the nanoseconds getting reset?
> Can a second never appear at all?
> Can you compute times more than 6 months into the future?
> How far does time deviate from solar time? Is this constrained?
> 
> If you deal with leap seconds, you have to have a table of them.
> This table grows with time, with adjustments being made with only
> about 6 months notice. So the user upgrades after a year or two,
> and the installer discovers that the user has been running a
> system that is unaware of the most recent leap second. Arrrgh.
> 
> Sure you want to touch this? The Austin group argued over it for
> a very long time and never did find a really good solution.
> Maybe you should just keep the code simple and fast, without any
> concern for clock adjustments.

There is a relatively simple way to handle this, at least from the
high-res-timers point of view.  First we convert all timers to absolute
uptime.  This is a nice, well-behaved time.  At boot time we peg the wall
clock to the uptime.  Then at any given time, wall time is boot wall
time + uptime.  Then date, leap seconds, etc. affect only the pegged value
of boot wall time.  Using the POSIX CLOCK id we allow the user to ask for
either version of time.  Now if we define an array of struct clock_id
which contains pointers to such things as functions to return time, any
algorithm you might want can be plugged in to bend time as needed.  The
only fly in this mess is the NTP rate adjustment stuff.  This code is
supposed to allow the system to adjust its ticker to produce accurate
seconds, and so gets at the root of the uptime counter, be it in hardware
or software or some combination of the two.  But then that's what makes
life interesting :)
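The pegging scheme above is just arithmetic; a user-space sketch (types
and names invented here, the real code would live in the kernel's
timekeeping core):

```c
/* Wall time is never stored directly: wall = boot_wall + uptime.
 * date(1) and leap-second corrections move only the boot_wall peg,
 * so timers kept as absolute uptime are never rewritten. */
struct clock_peg {
    long long boot_wall_ns; /* wall-clock time at boot (the peg) */
    long long uptime_ns;    /* monotonic, never adjusted */
};

static long long wall_time_ns(const struct clock_peg *c)
{
    return c->boot_wall_ns + c->uptime_ns;
}

/* A settime or leap-second event adjusts the peg, not the timers. */
static void adjust_wall(struct clock_peg *c, long long delta_ns)
{
    c->boot_wall_ns += delta_ns;
}
```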
> 
> > In either a ticked or tick less system, it is expected that resolutions
> > higher than 1/HZ will come with some additional overhead.  For this
> > reason, the CLOCK resolution will be used to round up times for each
> > timer.  When the CLOCK provides 1/HZ (or coarser) resolution, the
> > project will attempt to meet or exceed the current systems timer
> > performance.
> 
> Within the kernel at least, it would be good to let drivers specify
> desired resolution. Then a near-by value could be selected, perhaps
> with some consideration for event type. (for cache reasons)

This could be done, however, I would prefer to provide CLOCK_s to do
this as per the standard.  What does the community say?  In either case
you get different resolutions, but in the latter case the possible
values are fixed at least at configure time.

George

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer!
  2001-04-16 20:45                         ` Albert D. Cahalan
@ 2001-04-16 21:29                           ` Chris Wedgwood
  2001-04-16 22:25                           ` george anzinger
  1 sibling, 0 replies; 118+ messages in thread
From: Chris Wedgwood @ 2001-04-16 21:29 UTC (permalink / raw)
  To: Albert D. Cahalan
  Cc: george anzinger, Mark Salisbury, Jamie Lokier, Ben Greear,
	Horst von Brand, linux-kernel, high-res-timers-discourse

On Mon, Apr 16, 2001 at 04:45:39PM -0400, Albert D. Cahalan wrote:

    > CLOCK_10MS a wall clock supporting timers with 10 ms resolution (same as
    > linux today). 
    
    Except on the Alpha, and on some ARM systems, etc.
    The HZ constant varies from 10 to 1200.

Or some people use higher values on x86:

cw@charon(cw)$ uptime
 14:13:02 up 11 days, 21:44,  2 users,  load average: 0.00, 0.00, 0.00
cw@charon(cw)$ cat /proc/interrupts
           CPU0
  0: 2106736381          XT-PIC  timer
  1:     187633          XT-PIC  keyboard
  2:          0          XT-PIC  cascade
  7:   94128650          XT-PIC  eth0, usb-uhci, usb-uhci
 10:   18265929          XT-PIC  ide2
 12:    3107415          XT-PIC  PS/2 Mouse
 14:     753375          XT-PIC  via82cxxx
 15:          2          XT-PIC  ide1
NMI:          0
ERR:          0



that's 2048 ticks/sec -- I had to hack the kernel in a couple of
places to make things happy with that, but so far it seems to work.

I also had to hack libproc.so.* -- what a terribly obscure way that
it uses to determine what HZ is defined as!

I know this has been beaten to death before but can someone tell me:

   - why we cannot export HZ somewhere

   - why is it called HZ, it seems misleading why not something like
     TICKS_PER_SECOND or similar

   - it seems there are a couple of places that assume HZ==100 still,
     or am I just imagining this (ide code, I forget where now)




  --cw

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer!
  2001-04-16 19:19                       ` george anzinger
@ 2001-04-16 20:45                         ` Albert D. Cahalan
  2001-04-16 21:29                           ` Chris Wedgwood
  2001-04-16 22:25                           ` george anzinger
  2001-04-16 23:57                         ` Mark Salisbury
  2001-04-17 12:51                         ` Mark Salisbury
  2 siblings, 2 replies; 118+ messages in thread
From: Albert D. Cahalan @ 2001-04-16 20:45 UTC (permalink / raw)
  To: george anzinger
  Cc: Mark Salisbury, Jamie Lokier, Ben Greear, Horst von Brand,
	linux-kernel, high-res-timers-discourse

> CLOCK_10MS a wall clock supporting timers with 10 ms resolution (same as
> linux today). 

Except on the Alpha, and on some ARM systems, etc.
The HZ constant varies from 10 to 1200.

> At the same time we will NOT support the following clocks:
> 
> CLOCK_VIRTUAL a clock measuring the elapsed execution time (real or
> wall) of a given task.  
...
> For tick less systems we will need to provide code to collect execution
> times.  For the ticked system the current method of collection these
> times will be used.  This project will NOT attempt to improve the
> resolution of these timers, however, the high speed, high resolution
> access to the current time will allow others to augment the system in
> this area.
...
> This project will NOT provide higher resolution accounting (i.e. user
> and system execution times).

It is nice to have accurate per-process user/system accounting.
Since you'd be touching the code anyway...

> The POSIX interface provides for "absolute" timers relative to a given
> clock.  When these timers are related to a "wall" clock they will need
> adjusting when the wall clock time is adjusted.  These adjustments are
> done for "leap seconds" and the date command.

This is a BIG can of worms. You have UTC, TAI, GMT, and a loosely
defined POSIX time that is none of the above. This is a horrid mess,
even ignoring gravity and speed. :-)

Can a second be 2 billion nanoseconds?
Can a nanosecond be twice as long as normal?
Can a second appear twice, with the nanoseconds getting reset?
Can a second never appear at all?
Can you compute times more than 6 months into the future?
How far does time deviate from solar time? Is this constrained?

If you deal with leap seconds, you have to have a table of them.
This table grows with time, with adjustments being made with only
about 6 months notice. So the user upgrades after a year or two,
and the installer discovers that the user has been running a
system that is unaware of the most recent leap second. Arrrgh.

Sure you want to touch this? The Austin group argued over it for
a very long time and never did find a really good solution.
Maybe you should just keep the code simple and fast, without any
concern for clock adjustments.

> In either a ticked or tick less system, it is expected that resolutions
> higher than 1/HZ will come with some additional overhead.  For this
> reason, the CLOCK resolution will be used to round up times for each
> timer.  When the CLOCK provides 1/HZ (or coarser) resolution, the
> project will attempt to meet or exceed the current systems timer
> performance.

Within the kernel at least, it would be good to let drivers specify
desired resolution. Then a near-by value could be selected, perhaps
with some consideration for event type. (for cache reasons)


^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer!
  2001-04-16 12:36                     ` Mark Salisbury
@ 2001-04-16 19:19                       ` george anzinger
  2001-04-16 20:45                         ` Albert D. Cahalan
                                           ` (2 more replies)
  0 siblings, 3 replies; 118+ messages in thread
From: george anzinger @ 2001-04-16 19:19 UTC (permalink / raw)
  To: Mark Salisbury
  Cc: Jamie Lokier, Ben Greear, Horst von Brand, linux-kernel,
	high-res-timers-discourse

[-- Attachment #1: Type: text/plain, Size: 604 bytes --]

Mark Salisbury wrote:
> 
> all this talk about which data structure to use and how to allocate memory is
> waaaay premature.
> 
> there needs to be a clear definition of the requirements that we wish to meet,
> including whether we are going to do ticked, tickless, or both
> 
> a func spec, for lack of a better term needs to be developed
> 
> then, when we go to design the thing, THEN is when we decide on the particular
> flavor of list/tree/heap/array/dbase that we use.
> 
> let's engineer this thing instead of hacking it.

Absolutely, find first draft attached.

Comments please.

George


~snip~

[-- Attachment #2: high-res-timers-fun-spec.txt --]
[-- Type: text/plain, Size: 5786 bytes --]

Functional Specification for the high-res-timers project.

http://sourceforge.net/projects/high-res-timers

We are developing code to implement the POSIX clocks & timers as defined
by IEEE Std 1003.1b-1993 Section 14.  (For an on line reference see our
home page: http://high-res-timers.sourceforge.net/ )
  
The API specifies the following functions (for details please see the spec):

int clock_settime(clockid_t clock_id, const struct timespec *tp);
int clock_gettime(clockid_t clock_id, struct timespec *tp);
int clock_getres(clockid_t clock_id, struct timespec *res);

int timer_create(clockid_t clock_id, struct sigevent *evp,
                 timer_t *timerid);
int timer_delete(timer_t timerid);

int timer_settime(timer_t timerid, int flags, 
                  const struct itimerspec *value,
                  struct itimerspec *ovalue);
int timer_gettime(timer_t timerid, struct itimerspec *value);
int timer_getoverrun(timer_t timerid);

int nanosleep(const struct timespec *rqtp, struct timespec *rmtp);

In addition we expect that we will provide a high resolution timer for
kernel use (heck, we may provide several).

In all this code we will allow resolutions down to 1 nanosecond (the
maximum provided by the timespec structure).  The actual resolution of
any given clock will be fixed at compile time, and the code will do its
work at a higher resolution to avoid round-off errors as much as
possible.

We will provide several "clocks" as defined by the standard.  In
particular, the following capabilities will be attached to some clock,
regardless of the actual clock "name" we end up using:

CLOCK_10MS a wall clock supporting timers with 10 ms resolution (same as
linux today). 

CLOCK_HIGH_RES a wall clock supporting timers with the highest
resolution the hardware supports.

CLOCK_1US a wall clock supporting timers with 1 micro second resolution
(if the hardware allows it).

CLOCK_UPTIME a clock that gives the system up time.  (Does this clock
need to support timers?)

CLOCK_REALTIME a wall clock supporting timers with 1/HZ resolution.

At the same time we will NOT support the following clocks:

CLOCK_VIRTUAL a clock measuring the elapsed execution time (real or
wall) of a given task.  

CLOCK_PROFILE a clock used to generate profiling events. 

CLOCK_???  any clock keyed to a task.

(Note that this does not mean that the clock system will not support the
virtual and profile clocks, but that they will not be accessible through
the POSIX timer interface.)

It would be NICE if we could provide a way to hook other time support
into the system.  In particular a

CLOCK_WWV or CLOCK_GPS

might be nice.  The problem with these sorts of clocks is that they
imply an array of function pointers for each clock, and function pointers
slow the code down because of their unpredictability.  Nevertheless,
we will attempt to allow easy expansion in this direction.

Implications on the current kernel:

The high resolution timers will require a fast clock access with the
maximum supported resolution in order to convert relative times to
absolute times.  This same fast clock will be used to support the
various user and system time requests.

There are two ways to provide timers to the kernel.  For lack of a
better term we will refer to them as "ticked" and "tick less".  Until we
have performance information that implies that one or the other of these
methods is better in all cases we will provide both ticked and tick less
systems.  The variety to be used will be selected at configure time.

For tick less systems we will need to provide code to collect execution
times.  For the ticked system the current method of collecting these
times will be used.  This project will NOT attempt to improve the
resolution of these timers, however, the high speed, high resolution
access to the current time will allow others to augment the system in
this area.

For the tick less system the project will also provide a time slice
expiration interrupt.

The timer list(s) (all pending timers) need to be organized so that the
following operations are fast:

a.) list insertion of an arbitrary timer,
b.) removal of canceled and expired timers, and
c.) finding the timer for "NOW" and its immediate followers.

Times in the timer list will be absolute and related to system up time.
These times will be converted to wall time as needed.

The POSIX interface provides for "absolute" timers relative to a given
clock.  When these timers are related to a "wall" clock they will need
adjusting when the wall clock time is adjusted.  These adjustments are
done for "leap seconds" and the date command.  (Note, we are keeping
timers relative to "uptime" which can not be adjusted.  This means that
relative timers and absolute timers related to CLOCK_UPTIME are not
affected by the above wall time adjustments.)

In either a ticked or tick less system, it is expected that resolutions
higher than 1/HZ will come with some additional overhead.  For this
reason, the CLOCK resolution will be used to round up times for each
timer.  When the CLOCK provides 1/HZ (or coarser) resolution, the
project will attempt to meet or exceed the current systems timer
performance.
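The round-up rule can be stated in a few lines (an illustrative helper,
not project code):

```c
/* Round a requested time up to the next multiple of the clock's
 * resolution, as the spec proposes; e.g. a 2.3 ms request against
 * a 1 ms CLOCK becomes 3 ms. */
static long long round_up_to_res(long long time_ns, long long res_ns)
{
    return ((time_ns + res_ns - 1) / res_ns) * res_ns;
}
```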

Safe guards:

Given a system speed, there is a repeating timer rate which will consume
100% of the system in handling the timer interrupts.  An attempt will
be made to detect this rate and adjust the timer to prevent system
lockup.  This adjustment will look like timer overruns to the user
(i.e. we will take a percent of the interrupts and record the untaken
interrupts as overruns).

What the project will NOT do:

This project will NOT provide higher resolution accounting (i.e. user
and system execution times).

The project will NOT provide higher resolution VIRTUAL or PROFILE timers.




^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer!
  2001-04-16  2:46                   ` Jamie Lokier
@ 2001-04-16 12:36                     ` Mark Salisbury
  2001-04-16 19:19                       ` george anzinger
  0 siblings, 1 reply; 118+ messages in thread
From: Mark Salisbury @ 2001-04-16 12:36 UTC (permalink / raw)
  To: Jamie Lokier, Ben Greear
  Cc: george anzinger, Horst von Brand, linux-kernel,
	high-res-timers-discourse

all this talk about which data structure to use and how to allocate memory is
waaaay premature.

there needs to be a clear definition of the requirements that we wish to meet,
including whether we are going to do ticked, tickless, or both

a func spec, for lack of a better term needs to be developed

then, when we go to design the thing, THEN is when we decide on the particular
flavor of list/tree/heap/array/dbase that we use.

let's engineer this thing instead of hacking it.



On Sun, 15 Apr 2001, Jamie Lokier wrote:
> Ben Greear wrote:
> > > Note that jumping around the array thrashes no more cache than
> > > traversing a tree (it touches the same number of elements).  I prefer
> > > heap-ordered trees though because fixed size is always a bad idea.
> > 
> > With a tree, you will be allocating and de-allocating for every
> > insert/delete right?
> 
> No, there is no memory allocation.
> 
> > On cache-coherency issues, wouldn't it be more likely to have a cache
> > hit when you are accessing one contigious (ie the array) piece of
> > memory?  A 4-k page will hold a lot of indexes!!
> 
> No, because when traversing an array-heap, you don't access contiguous
> entries.  You might get one or two more cache hits near the root of the
> heap.
> 
> > To get around the fixed size thing..could have
> > the array grow itself when needed (and probably never shrink again).
> 
> You could to that, but then you'd have to deal with memory allocation
> failures and memory deadlocks, making add_timer rather complicated.
> It's not acceptable for add_timer to fail or require kmalloc().
> 
> -- Jamie
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
-- 
/*------------------------------------------------**
**   Mark Salisbury | Mercury Computer Systems    **
**   mbs@mc.com     | System OS - Kernel Team     **
**------------------------------------------------**
**  I will be riding in the Multiple Sclerosis    **
**  Great Mass Getaway, a 150 mile bike ride from **
**  Boston to Provincetown.  Last year I raised   **
**  over $1200.  This year I would like to beat   **
**  that.  If you would like to contribute,       **
**  please contact me.                            **
**------------------------------------------------*/


^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer!
  2001-04-13 23:10               ` Jamie Lokier
@ 2001-04-16  3:02                 ` Ben Greear
  2001-04-16  2:46                   ` Jamie Lokier
  0 siblings, 1 reply; 118+ messages in thread
From: Ben Greear @ 2001-04-16  3:02 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: george anzinger, Horst von Brand, linux-kernel,
	high-res-timers-discourse

Jamie Lokier wrote:
> 
> george anzinger wrote:
> > Horst von Brand wrote:
> > >
> > > Ben Greear <greearb@candelatech.com> said:
> > >
> > > [...]
> > >
> > > > Wouldn't a heap be a good data structure for a list of timers?  Insertion
> > > > is log(n) and finding the one with the least time is O(1), ie pop off the
> > > > front....  It can be implemented in an array which should help cache
> > > > coherency and all those other things they talked about in school :)
> > >
> > > Insertion and deleting the first are both O(log N). Plus the array is fixed
> > > size (bad idea) and the jumping around in the array thrashes the caches.
> > > --
> > And your solution is?
> 
> Note that jumping around the array thrashes no more cache than
> traversing a tree (it touches the same number of elements).  I prefer
> heap-ordered trees though because fixed size is always a bad idea.

With a tree, you will be allocating and de-allocating for every
insert/delete right?  That seems like a reasonable performance
hit that an array-based approach might not have... 

On cache-coherency issues, wouldn't it be more likely to have a cache hit
when you are accessing one contiguous (i.e. the array) piece of memory?
A 4-k page will hold a lot of indexes!!

To get around the fixed size thing..could have
the array grow itself when needed (and probably never shrink again).
This would suck if you did it often, but I'm assuming that it would
quickly grow to the needed size and then stabilize...

> 
> Insertion is O(1) if entries can be predicted to be near
> enough some place in the list, be that the beginning, the end, or some
> marked places in the middle.
> 
> By the way, the current timer implementation only appears to be O(1) if
> you ignore the overhead of having to do a check on every tick, and the
> extra processing on table rollover.  For random timer usage patterns,
> that overhead adds up to O(log n), the same as a heap.
> 
> However for skewed usage patterns (likely in the kernel), the current
> table method avoids the O(log n) sorting overhead because long-delay
> timers are often removed before percolating down to the smallest tables.
> It is possible to produce a general purpose heap which also has this
> property.
> 
> -- Jamie
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
Ben Greear (greearb@candelatech.com)  http://www.candelatech.com
Author of ScryMUD:  scry.wanfear.com 4444        (Released under GPL)
http://scry.wanfear.com               http://scry.wanfear.com/~greear

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer!
  2001-04-16  3:02                 ` Ben Greear
@ 2001-04-16  2:46                   ` Jamie Lokier
  2001-04-16 12:36                     ` Mark Salisbury
  0 siblings, 1 reply; 118+ messages in thread
From: Jamie Lokier @ 2001-04-16  2:46 UTC (permalink / raw)
  To: Ben Greear
  Cc: george anzinger, Horst von Brand, linux-kernel,
	high-res-timers-discourse

Ben Greear wrote:
> > Note that jumping around the array thrashes no more cache than
> > traversing a tree (it touches the same number of elements).  I prefer
> > heap-ordered trees though because fixed size is always a bad idea.
> 
> With a tree, you will be allocating and de-allocating for every
> insert/delete right?

No, there is no memory allocation.

> On cache-coherency issues, wouldn't it be more likely to have a cache
> hit when you are accessing one contiguous (i.e. the array) piece of
> memory?  A 4-k page will hold a lot of indexes!!

No, because when traversing an array-heap, you don't access contiguous
entries.  You might get one or two more cache hits near the root of the
heap.

> To get around the fixed size thing..could have
> the array grow itself when needed (and probably never shrink again).

You could do that, but then you'd have to deal with memory allocation
failures and memory deadlocks, making add_timer rather complicated.
It's not acceptable for add_timer to fail or require kmalloc().

-- Jamie

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer!
  2001-04-13 21:53             ` george anzinger
  2001-04-13 23:10               ` Jamie Lokier
@ 2001-04-16  2:41               ` Ben Greear
  1 sibling, 0 replies; 118+ messages in thread
From: Ben Greear @ 2001-04-16  2:41 UTC (permalink / raw)
  To: george anzinger; +Cc: Horst von Brand, linux-kernel, high-res-timers-discourse

george anzinger wrote:
> 
> Horst von Brand wrote:
> >
> > Ben Greear <greearb@candelatech.com> said:
> >
> > [...]
> >
> > > Wouldn't a heap be a good data structure for a list of timers?  Insertion
> > > is log(n) and finding the one with the least time is O(1), ie pop off the
> > > front....  It can be implemented in an array which should help cache
> > > coherency and all those other things they talked about in school :)
> >
> > Insertion and deleting the first are both O(log N). Plus the array is fixed
> > size (bad idea) and the jumping around in the array thrashes the caches.

Jumping around an array can't be much worse than jumping around
a linked list, can it?

It does not have to be fixed length; growing the array wouldn't be
log(n), but it can still be done... and once you reach maximal
size, you won't be growing it any more...

I had forgotten about the log(n) to delete, though log(n) is
still pretty good.  As others have suggested, it might be good
to have a linked list for very-soon-to-expire timers.  However,
they would have to be few enough that your linear insert was
not worse than a log(n) insert and a log(n) delete...

> > --
> And your solution is?


> 
> George

-- 
Ben Greear (greearb@candelatech.com)  http://www.candelatech.com
Author of ScryMUD:  scry.wanfear.com 4444        (Released under GPL)
http://scry.wanfear.com               http://scry.wanfear.com/~greear

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer!
  2001-04-13 21:53             ` george anzinger
@ 2001-04-13 23:10               ` Jamie Lokier
  2001-04-16  3:02                 ` Ben Greear
  2001-04-16  2:41               ` Ben Greear
  1 sibling, 1 reply; 118+ messages in thread
From: Jamie Lokier @ 2001-04-13 23:10 UTC (permalink / raw)
  To: george anzinger
  Cc: Horst von Brand, Ben Greear, linux-kernel, high-res-timers-discourse

george anzinger wrote:
> Horst von Brand wrote:
> > 
> > Ben Greear <greearb@candelatech.com> said:
> > 
> > [...]
> > 
> > > Wouldn't a heap be a good data structure for a list of timers?  Insertion
> > > is log(n) and finding the one with the least time is O(1), ie pop off the
> > > front....  It can be implemented in an array which should help cache
> > > coherency and all those other things they talked about in school :)
> > 
> > Insertion and deleting the first are both O(log N). Plus the array is fixed
> > size (bad idea) and the jumping around in the array thrashes the caches.
> > --
> And your solution is?

Note that jumping around the array thrashes no more cache than
traversing a tree (it touches the same number of elements).  I prefer
heap-ordered trees though because fixed size is always a bad idea.

Insertion is O(1) if entries can be predicted to land near enough to
some known place in the list, be that the beginning, the end, or some
marked places in the middle.

By the way, the current timer implementation only appears to be O(1) if
you ignore the overhead of having to do a check on every tick, and the
extra processing on table rollover.  For random timer usage patterns,
that overhead adds up to O(log n), the same as a heap.

However for skewed usage patterns (likely in the kernel), the current
table method avoids the O(log n) sorting overhead because long-delay
timers are often removed before percolating down to the smallest tables.
It is possible to produce a general purpose heap which also has this
property.

-- Jamie

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer!
  2001-04-13 16:07               ` george anzinger
@ 2001-04-13 23:00                 ` Jamie Lokier
  0 siblings, 0 replies; 118+ messages in thread
From: Jamie Lokier @ 2001-04-13 23:00 UTC (permalink / raw)
  To: george anzinger
  Cc: Ben Greear, Bret Indrelee, Linux Kernel Mailing List,
	high-res-timers-discourse

george anzinger wrote:
> If we are to do high-res-timers, I think we will always have to do some
> sort of a sort on insert.

Yes, that's called a priority queue...  (And you don't actually sort on
each insertion).

-- Jamie

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer!
  2001-04-13 12:05           ` Horst von Brand
@ 2001-04-13 21:53             ` george anzinger
  2001-04-13 23:10               ` Jamie Lokier
  2001-04-16  2:41               ` Ben Greear
  0 siblings, 2 replies; 118+ messages in thread
From: george anzinger @ 2001-04-13 21:53 UTC (permalink / raw)
  To: Horst von Brand; +Cc: Ben Greear, linux-kernel, high-res-timers-discourse

Horst von Brand wrote:
> 
> Ben Greear <greearb@candelatech.com> said:
> 
> [...]
> 
> > Wouldn't a heap be a good data structure for a list of timers?  Insertion
> > is log(n) and finding the one with the least time is O(1), ie pop off the
> > front....  It can be implemented in an array which should help cache
> > coherency and all those other things they talked about in school :)
> 
> Insertion and deleting the first are both O(log N). Plus the array is fixed
> size (bad idea) and the jumping around in the array thrashes the caches.
> --
And your solution is?

George

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer!
  2001-04-13 10:36             ` Jamie Lokier
@ 2001-04-13 16:07               ` george anzinger
  2001-04-13 23:00                 ` Jamie Lokier
  0 siblings, 1 reply; 118+ messages in thread
From: george anzinger @ 2001-04-13 16:07 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Ben Greear, Bret Indrelee, Linux Kernel Mailing List,
	high-res-timers-discourse

Jamie Lokier wrote:
> 
> george anzinger wrote:
> > > Wouldn't a heap be a good data structure for a list of timers?  Insertion
> > > is log(n) and finding the one with the least time is O(1), ie pop off the
> > > front....  It can be implemented in an array which should help cache
> > > coherency and all those other things they talked about in school :)
> > >
> > I must be missing something here.  You get log(n) from what?  B-trees?
> > How would you manage them to get the needed balance?  Stopping the world
> > to re-balance is worse than the cascade.  I guess I need to read up on
> > this stuff.  A good pointer would be much appreciated.
> 
> Look for "priority queues" and "heaps".  In its simplest form, you use a
> heap-ordered tree, which can be implemented using an array (that's
> usually how it's presented), or having the objects in the heap point to
> each other.
> 
> A heap-ordered tree is not as strictly ordered as, well, an ordered tree
> :-)  The rule is: if A is the parent of B and C, then A expires earlier
> than B, and A expires earlier than C.  There is no constraint on the
> relative expiry times of B and C.
> 
> There is no "stop the world" to rebalance, which you may consider an
> advantage over the present hierarchy of tables.  On the other hand, each
> insertion or deletion operation takes O(log n) time, where n is the
> number of items in the queue.  Although fairly fast, this bound can be
> improved if you know the typical insertion/deletion patterns, to O(1)
> for selected cases.  Also you should know that not all priority queues
> are based on heap-ordered trees.
> 
> Linux' current hierarchy of tables is a good example of optimisation: it
> is optimised for inserting and actually running short term timers, as
> well as inserting and deleting (before running) long term timers.  These
> extremes take O(1) for insertion, removal and expiry, including the
> "stop the world" time.  This should be considered before and move to a
> heap-based priority queue, which may turn out slower.
> 
> > Meanwhile, I keep thinking of a simple doubly linked list in time
> > order.  To speed it up keep an array of pointers to the first N whole
> > jiffie points and maybe pointers to coarser points beyond the first N.
> > Make N, say 512.  Build the pointers as needed.  This should give
> > something like O(n/N) insertion and O(1) removal.
> 
> You've just described the same as the current implementation, but with
> lists for longer term timers.  I.e. slower.  With your coarser points,
> you have to sort the front elements of the coarse list into a fine one
> from time to time.
> 
> The idea of having jiffie-point pointers into a data structure for fast
> insertion is a good one for speeding up any data structure for that
> common case, though.
> 
If we are to do high-res-timers, I think we will always have to do some
sort of a sort on insert.  If we can keep the sort to a VERY short list
and only do it on sub jiffie timers, we will have something that is VERY
close to what we have now.

I think that the density of timers tends to increase as they get closer
to "now".  This should allow coarser pointers for times more removed
from "now".  Still if we sort on insert we will not have to resort
later.

The pointers into the list, however, are perishable and need to be
refreshed as time passes.  My thought was to do this only when a needed
pointer was found to be "expired", and then only for the needed pointer. 
On first need we would have a small overhead, but would remember for
next time.

Thanks for the good input.

George

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer!
  2001-04-13  6:32         ` Ben Greear
  2001-04-13  8:42           ` george anzinger
@ 2001-04-13 12:05           ` Horst von Brand
  2001-04-13 21:53             ` george anzinger
  1 sibling, 1 reply; 118+ messages in thread
From: Horst von Brand @ 2001-04-13 12:05 UTC (permalink / raw)
  To: Ben Greear; +Cc: linux-kernel

Ben Greear <greearb@candelatech.com> said:

[...]

> Wouldn't a heap be a good data structure for a list of timers?  Insertion
> is log(n) and finding the one with the least time is O(1), ie pop off the
> front....  It can be implemented in an array which should help cache
> coherency and all those other things they talked about in school :)

Insertion and deleting the first are both O(log N). Plus the array is fixed
size (bad idea) and the jumping around in the array thrashes the caches.
-- 
Horst von Brand                             vonbrand@sleipnir.valparaiso.cl
Casilla 9G, Vin~a del Mar, Chile                               +56 32 672616

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer!
  2001-04-13  8:42           ` george anzinger
@ 2001-04-13 10:36             ` Jamie Lokier
  2001-04-13 16:07               ` george anzinger
  0 siblings, 1 reply; 118+ messages in thread
From: Jamie Lokier @ 2001-04-13 10:36 UTC (permalink / raw)
  To: george anzinger
  Cc: Ben Greear, Bret Indrelee, Linux Kernel Mailing List,
	high-res-timers-discourse

george anzinger wrote:
> > Wouldn't a heap be a good data structure for a list of timers?  Insertion
> > is log(n) and finding the one with the least time is O(1), ie pop off the
> > front....  It can be implemented in an array which should help cache
> > coherency and all those other things they talked about in school :)
> > 
> I must be missing something here.  You get log(n) from what?  B-trees? 
> How would you manage them to get the needed balance?  Stopping the world
> to re-balance is worse than the cascade.  I guess I need to read up on
> this stuff.  A good pointer would be much appreciated. 

Look for "priority queues" and "heaps".  In its simplest form, you use a
heap-ordered tree, which can be implemented using an array (that's
usually how it's presented), or having the objects in the heap point to
each other.

A heap-ordered tree is not as strictly ordered as, well, an ordered tree
:-)  The rule is: if A is the parent of B and C, then A expires earlier
than B, and A expires earlier than C.  There is no constraint on the
relative expiry times of B and C.

There is no "stop the world" to rebalance, which you may consider an
advantage over the present hierarchy of tables.  On the other hand, each
insertion or deletion operation takes O(log n) time, where n is the
number of items in the queue.  Although fairly fast, this bound can be
improved if you know the typical insertion/deletion patterns, to O(1)
for selected cases.  Also you should know that not all priority queues
are based on heap-ordered trees.

Linux' current hierarchy of tables is a good example of optimisation: it
is optimised for inserting and actually running short term timers, as
well as inserting and deleting (before running) long term timers.  These
extremes take O(1) for insertion, removal and expiry, including the
"stop the world" time.  This should be considered before and move to a
heap-based priority queue, which may turn out slower.

> Meanwhile, I keep thinking of a simple doubly linked list in time
> order.  To speed it up keep an array of pointers to the first N whole
> jiffie points and maybe pointers to coarser points beyond the first N. 
> Make N, say 512.  Build the pointers as needed.  This should give
> something like O(n/N) insertion and O(1) removal.

You've just described the same as the current implementation, but with
lists for longer term timers.  I.e. slower.  With your coarser points,
you have to sort the front elements of the coarse list into a fine one
from time to time.

The idea of having jiffie-point pointers into a data structure for fast
insertion is a good one for speeding up any data structure for that
common case, though.

-- Jamie

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer!
  2001-04-13  6:32         ` Ben Greear
@ 2001-04-13  8:42           ` george anzinger
  2001-04-13 10:36             ` Jamie Lokier
  2001-04-13 12:05           ` Horst von Brand
  1 sibling, 1 reply; 118+ messages in thread
From: george anzinger @ 2001-04-13  8:42 UTC (permalink / raw)
  To: Ben Greear
  Cc: Bret Indrelee, Linux Kernel Mailing List, high-res-timers-discourse

Ben Greear wrote:
> 
> Bret Indrelee wrote:
> >
> > On Thu, 12 Apr 2001, george anzinger wrote:
> > > Bret Indrelee wrote:
> > > >
> > > > On Thu, 12 Apr 2001, george anzinger wrote:
> > > > > Bret Indrelee wrote:
> > > > > > Keep all timers in a sorted double-linked list. Do the insert
> > > > > > intelligently, adding it from the back or front of the list depending on
> > > > > > where it is in relation to existing entries.
> > > > >
> > > > > I think this is too slow, especially for a busy system, but there are
> > > > > solutions...
> > > >
> > > > It is better than the current solution.
> > >
> > Uh, what are we talking about?  The current timer list insert is real
> > close to O(1) and never more than O(5).
> >
> > I don't like the cost of the cascades every (as I recall) 256
> > interrupts. This is more work than is done in the rest of the interrupt
> > processing, happens during the tick interrupt, and results in a rebuild of
> > much of the table.

Right, it needs to go, we need to eliminate the "lumps" in time :)
> >
> > -Bret
> 
> Wouldn't a heap be a good data structure for a list of timers?  Insertion
> is log(n) and finding the one with the least time is O(1), ie pop off the
> front....  It can be implemented in an array which should help cache
> coherency and all those other things they talked about in school :)
> 
I must be missing something here.  You get log(n) from what?  B-trees? 
How would you manage them to get the needed balance?  Stopping the world
to re-balance is worse than the cascade.  I guess I need to read up on
this stuff.  A good pointer would be much appreciated. 

Meanwhile, I keep thinking of a simple doubly linked list in time
order.  To speed it up keep an array of pointers to the first N whole
jiffie points and maybe pointers to coarser points beyond the first N. 
Make N, say 512.  Build the pointers as needed.  This should give
something like O(n/N) insertion and O(1) removal.

George

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer!
  2001-04-13  4:00       ` Bret Indrelee
@ 2001-04-13  6:32         ` Ben Greear
  2001-04-13  8:42           ` george anzinger
  2001-04-13 12:05           ` Horst von Brand
  0 siblings, 2 replies; 118+ messages in thread
From: Ben Greear @ 2001-04-13  6:32 UTC (permalink / raw)
  To: Bret Indrelee
  Cc: george anzinger, Linux Kernel Mailing List, high-res-timers-discourse

Bret Indrelee wrote:
> 
> On Thu, 12 Apr 2001, george anzinger wrote:
> > Bret Indrelee wrote:
> > >
> > > On Thu, 12 Apr 2001, george anzinger wrote:
> > > > Bret Indrelee wrote:
> > > > > Keep all timers in a sorted double-linked list. Do the insert
> > > > > intelligently, adding it from the back or front of the list depending on
> > > > > where it is in relation to existing entries.
> > > >
> > > > I think this is too slow, especially for a busy system, but there are
> > > > solutions...
> > >
> > > It is better than the current solution.
> >
> > Uh, what are we talking about?  The current timer list insert is real
> > close to O(1) and never more than O(5).
> 
> I don't like the cost of the cascades every (as I recall) 256
> interrupts. This is more work than is done in the rest of the interrupt
> processing, happens during the tick interrupt, and results in a rebuild of
> much of the table.
> 
> -Bret

Wouldn't a heap be a good data structure for a list of timers?  Insertion
is log(n) and finding the one with the least time is O(1), ie pop off the
front....  It can be implemented in an array which should help cache
coherency and all those other things they talked about in school :)

Ben
> 
> ------------------------------------------------------------------------------
> Bret Indrelee |  Sometimes, to be deep, we must act shallow!
> bret@io.com   |  -Riff in The Quatrix
> 

-- 
Ben Greear (greearb@candelatech.com)  http://www.candelatech.com
Author of ScryMUD:  scry.wanfear.com 4444        (Released under GPL)
http://scry.wanfear.com               http://scry.wanfear.com/~greear

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer!
  2001-04-12 22:20     ` george anzinger
@ 2001-04-13  4:00       ` Bret Indrelee
  2001-04-13  6:32         ` Ben Greear
  0 siblings, 1 reply; 118+ messages in thread
From: Bret Indrelee @ 2001-04-13  4:00 UTC (permalink / raw)
  To: george anzinger; +Cc: Linux Kernel Mailing List, high-res-timers-discourse

On Thu, 12 Apr 2001, george anzinger wrote:
> Bret Indrelee wrote:
> > 
> > On Thu, 12 Apr 2001, george anzinger wrote:
> > > Bret Indrelee wrote:
> > > > Keep all timers in a sorted double-linked list. Do the insert
> > > > intelligently, adding it from the back or front of the list depending on
> > > > where it is in relation to existing entries.
> > >
> > > I think this is too slow, especially for a busy system, but there are
> > > solutions...
> > 
> > It is better than the current solution.
> 
> Uh, what are we talking about?  The current timer list insert is real
> close to O(1) and never more than O(5).

I don't like the cost of the cascades every (as I recall) 256
interrupts. This is more work than is done in the rest of the interrupt
processing, happens during the tick interrupt, and results in a rebuild of
much of the table.

-Bret

------------------------------------------------------------------------------
Bret Indrelee |  Sometimes, to be deep, we must act shallow!
bret@io.com   |  -Riff in The Quatrix


^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer!
  2001-04-12 21:19   ` Bret Indrelee
@ 2001-04-12 22:20     ` george anzinger
  2001-04-13  4:00       ` Bret Indrelee
  0 siblings, 1 reply; 118+ messages in thread
From: george anzinger @ 2001-04-12 22:20 UTC (permalink / raw)
  To: Bret Indrelee; +Cc: Linux Kernel Mailing List, high-res-timers-discourse

Bret Indrelee wrote:
> 
> On Thu, 12 Apr 2001, george anzinger wrote:
> > Bret Indrelee wrote:
> > > Keep all timers in a sorted double-linked list. Do the insert
> > > intelligently, adding it from the back or front of the list depending on
> > > where it is in relation to existing entries.
> >
> > I think this is too slow, especially for a busy system, but there are
> > solutions...
> 
> It is better than the current solution.

Uh, what are we talking about?  The current timer list insert is real
close to O(1) and never more than O(5).
> 
> The insert takes the most time, having to scan through the list. If you
> had to scan the whole list it would be O(n) with a simple linked list. If
> you insert it from the end, it is almost always going to be less than
> that.

Right, but compared to the current O(5) max, this is just too long.
> 
> The time to remove is O(1).
> 
> Fetching the first element from the list is also O(1), but you may have to
> fetch several items that have all expired. Here you could do something
> clever. Just make sure it is O(1) to determine if the list is empty.
> 
I would hope to move expired timers to another list or just process
them.  In any case they should not be a problem here.

One of the posts that started all this mentioned a tickless system (on
a 360, I think) that used the current timer list.  They had to scan
forward in time to find the next event, and every 10 ms was a new list
to look at.  Conclusion: the current list structure is NOT organized for
tickless timekeeping.

George

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer!
  2001-04-12 17:39 ` george anzinger
@ 2001-04-12 21:19   ` Bret Indrelee
  2001-04-12 22:20     ` george anzinger
  0 siblings, 1 reply; 118+ messages in thread
From: Bret Indrelee @ 2001-04-12 21:19 UTC (permalink / raw)
  To: george anzinger; +Cc: Linux Kernel Mailing List

On Thu, 12 Apr 2001, george anzinger wrote:
> Bret Indrelee wrote:
> > Keep all timers in a sorted double-linked list. Do the insert
> > intelligently, adding it from the back or front of the list depending on
> > where it is in relation to existing entries.
> 
> I think this is too slow, especially for a busy system, but there are
> solutions...

It is better than the current solution.

The insert takes the most time, having to scan through the list. If you
had to scan the whole list it would be O(n) with a simple linked list. If
you insert it from the end, it is almost always going to be less than
that.

The time to remove is O(1).

Fetching the first element from the list is also O(1), but you may have to
fetch several items that have all expired. Here you could do something
clever. Just make sure it is O(1) to determine if the list is empty.

[ snip ]
> > The real trick is to do a lot less processing on every tick than is
> > currently done. Current generation PCs can easily handle 1000s of
> > interrupts a second if you keep the overhead small.
> 
> I don't see the logic here.  Having taken the interrupt, one would tend
> to want to do as much as possible, rather than schedule another
> interrupt to continue the processing.  Rather, I think you are trying to
> say that we can afford to take more interrupts for time keeping.  Still,
> I think what we are trying to get with tickless timers is a system that
> takes FEWER interrupts, not more.

The system should be CAPABLE of handling 1000s of interrupts without
excessive system load.

The actual number of interrupts would be reduced if we adjusted the
interrupt interval based on the head of the list.

Two different things.

-Bret

------------------------------------------------------------------------------
Bret Indrelee |  Sometimes, to be deep, we must act shallow!
bret@io.com   |  -Riff in The Quatrix


^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer!
  2001-04-11 17:56 No 100 HZ timer! Bret Indrelee
@ 2001-04-12 17:39 ` george anzinger
  2001-04-12 21:19   ` Bret Indrelee
  0 siblings, 1 reply; 118+ messages in thread
From: george anzinger @ 2001-04-12 17:39 UTC (permalink / raw)
  To: Bret Indrelee; +Cc: linux-kernel

Bret Indrelee wrote:
> 
> Mikulas Patocka (mikulas@artax.karlin.mff.cuni.cz) wrote:
> > Adding and removing timers happens much more frequently than PIT tick,
> > so
> > comparing these times is pointless.
> >
> > If you have some device and timer protecting it from lockup on buggy
> > hardware, you actually
> >
> > send request to device
> > add timer
> >
> > receive interrupt and read reply
> > remove timer
> >
> > With the current timer semantics, the cost of add timer and del timer is
> > nearly zero. If you had to reprogram the PIT on each request and reply,
> > it
> > would slow things down.
> >
> > Note that you call mod_timer also on each packet received - and in worst
> > case (which may happen), you end up reprogramming the PIT on each
> > packet.
> 
> You can still have nearly zero cost for the normal case. Avoiding worst
> case behaviour is also pretty easy.
> 
> You only reprogram the PIT if you have to change the interval.
> 
> Keep all timers in a sorted double-linked list. Do the insert
> intelligently, adding it from the back or front of the list depending on
> where it is in relation to existing entries.

I think this is too slow, especially for a busy system, but there are
solutions...
> 
> You only need to reprogram the interval timer when:
> 1. You've got a new entry at the head of the list
> AND
> 2. You've reprogrammed the interval to something larger than the time to
> the new head of list.

Uh, I think 1. IMPLIES 2.  If it is at the head, it must be closer in
than what the system is waiting for (unless, of course, its time has
already passed, but let's not consider that here).
> 
> In the case of a device timeout, it is usually not going to be inserted at
> the head of the list. It is very seldom going to actually timeout.

Right, and if the system doesn't have many timers, thus putting this at
the head, it has the time to do the extra work.
> 
> Choose your interval wisely, only increasing it when you know it will pay
> off. The best way of doing this would probably be to track some sort
> of LCD for timeouts.

I wonder about a more relaxed device timeout semantic that says
something like: wait X + next timer interrupt.  This would cause the
timer insert code to find an entry at least X out and put this timer
just after it.  There are other ways to do this, of course.  The question
here is: would this be worthwhile?
> 
> The real trick is to do a lot less processing on every tick than is
> currently done. Current generation PCs can easily handle 1000s of
> interrupts a second if you keep the overhead small.

I don't see the logic here.  Having taken the interrupt, one would tend
to want to do as much as possible, rather than schedule another
interrupt to continue the processing.  Rather, I think you are trying to
say that we can afford to take more interrupts for time keeping.  Still,
I think what we are trying to get with tickless timers is a system that
takes FEWER interrupts, not more.

George

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
@ 2001-04-12 12:58 Mark Salisbury
  0 siblings, 0 replies; 118+ messages in thread
From: Mark Salisbury @ 2001-04-12 12:58 UTC (permalink / raw)
  To: linux-kernel


On Wed, 11 Apr 2001, Bret Indrelee wrote:
> Current generation PCs can easily handle 1000s of
> interrupts a second if you keep the overhead small.

The PC-centric implementation of the ticked system is one of its major flaws.

There are architectures on which the cost of a fixed interval is the same as a
variable interval, i.e. no reload register, so you must explicitly load a
value on each interrupt anyway.  And if you want accurate intervals, you must
perform a calculation on each reload, not just load a static value, because
you don't know how many cycles passed from the time the interrupt happened
until you got to the reload line: the interrupt may have been masked or lower
priority than another interrupt.

Also, why handle 1000s of interrupts if you only need to handle 10?

-- 
/*------------------------------------------------**
**   Mark Salisbury | Mercury Computer Systems    **
**   mbs@mc.com     | System OS - Kernel Team     **
**------------------------------------------------**
**  I will be riding in the Multiple Sclerosis    **
**  Great Mass Getaway, a 150 mile bike ride from **
**  Boston to Provincetown.  Last year I raised   **
**  over $1200.  This year I would like to beat   **
**  that.  If you would like to contribute,       **
**  please contact me.                            **
**------------------------------------------------*/


^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-11 16:13                                 ` Jamie Lokier
@ 2001-04-12  9:51                                   ` Maciej W. Rozycki
  0 siblings, 0 replies; 118+ messages in thread
From: Maciej W. Rozycki @ 2001-04-12  9:51 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Zdenek Kabelac, linux-kernel

On Wed, 11 Apr 2001, Jamie Lokier wrote:

> Think of the original 64k and 256k VGA cards.  I think some of those
> didn't have an irq, but did have a way to read the progress of the
> raster, which you could PLL to using timer interrupts.  Some video games
> still look fine at 320x200 :-)

 The *original* VGA, i.e. the PS/2 one, did have an IRQ, IIRC (according
to docs -- I haven't ever seen one).  Cheap clones might have lacked it,
though.

 Then there is workstation (non-PC) hardware from the early '90s which we
run on and which uses an IRQ-driven interface to graphics adapters -- not
only for vsync.

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +


^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-12  5:25                     ` watermodem
@ 2001-04-12  8:45                       ` Jamie Lokier
  0 siblings, 0 replies; 118+ messages in thread
From: Jamie Lokier @ 2001-04-12  8:45 UTC (permalink / raw)
  To: watermodem; +Cc: linux-kernel

watermodem wrote:
> As somebody who is now debating how to measure latencies in a 
> giga-bit ethernet environment with several components doing 
> L3 switching in much less than 10 micro-seconds ... (hardware)
> I agree that some method is needed to achieve higher resolutions.  
> (Sigh... I will likely need to buy something big and expensive)
> {this is for a system to make use of L. Yarrow's little protocol}

Use Alteon/Netgear cards, everyone else seems to be :-)  We get
measurements on the order of 100ns, if we are just measuring network
latencies.  (Data is not transferred over the PCI bus).

-- Jamie

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-11 19:21                                           ` John Alvord
@ 2001-04-12  8:41                                             ` Jamie Lokier
  0 siblings, 0 replies; 118+ messages in thread
From: Jamie Lokier @ 2001-04-12  8:41 UTC (permalink / raw)
  To: John Alvord
  Cc: george anzinger, Mark Salisbury, mark salisbury,
	high-res-timers-discourse, Alan Cox, Mikulas Patocka,
	David Schleef, Jeff Dike, schwidefsky, linux-kernel

John Alvord wrote:
> I bumped into a funny non-optimization a few years ago. A system with
> a timer queue like the above had been "optimized" by keeping old timer
> elements... ready for new tasks to link onto and activate. The
> granularity was 1 millisecond. Over time, all timer values from 0 to
> roughly 10 minutes had been used. That resulted in 60,000 permanent
> storage fragments laying about... a significant fragmentation problem.
> The end was a forced recycle every month or so.

This is the sort of thing that Linux does with slab, dentry, inode
caches and so on.  In theory the memory is reclaimed as required :-)

It's not a big issue with timers, as the timer elements are fixed size
structures that tend to be embedded in other structures.  So the
lifetime of the timer element is the same as the lifetime of the object
associated with the timer.

-- Jamie

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-10 15:43                   ` Alan Cox
@ 2001-04-12  5:25                     ` watermodem
  2001-04-12  8:45                       ` Jamie Lokier
  0 siblings, 1 reply; 118+ messages in thread
From: watermodem @ 2001-04-12  5:25 UTC (permalink / raw)
  To: linux-kernel

Alan Cox wrote:
> 
> > Timers more precise than 100HZ aren't probably needed - as MIN_RTO is 0.2s
> > and MIN_DELACK is 0.04s, TCP would hardly benefit from them.
> 
> There are a considerable number of people who really do need 1Khz resolution.
> Midi is one of the example cases. That doesn't mean we have to go to a 1KHz
> timer, we may want to combine a 100Hz timer with a smart switch to 1Khz

As somebody who is now debating how to measure latencies in a 
giga-bit ethernet environment with several components doing 
L3 switching in much less than 10 micro-seconds ... (hardware)
I agree that some method is needed to achieve higher resolutions.  
(Sigh... I will likely need to buy something big and expensive)
{this is for a system to make use of L. Yarrow's little protocol}

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-11  2:35                                     ` george anzinger
@ 2001-04-12  0:24                                       ` Mark Salisbury
  0 siblings, 0 replies; 118+ messages in thread
From: Mark Salisbury @ 2001-04-12  0:24 UTC (permalink / raw)
  To: george anzinger; +Cc: high-res-timers-discourse, linux-kernel

> I suspect you might go for ticked if its overhead was less.  The thing
> that makes me think the overhead is high for tick less is the accounting
> and time slice stuff.  This has to be handled each system call and each
> schedule call.  These happen WAY more often than ticks...  Contrariwise,
> I would go for tick less if its overhead is the same or less under
> a reasonable load.  Of course, we would also want to check overload
> sensitivity.

as I said, I think the process time accounting is disjoint from the
ticked/tickless issue and should therefore be considered separately.

in my experience with the tickless method, after all pending timers have
been serviced, determining the interval until the next interrupt takes
only one 64-bit subtraction and a register load (about 5 - 8 cycles)

I think we should spend some serious time with some quick and dirty
prototypes of both methods, both to characterize all the issues raised and
to refine our ideas when it comes time to implement this.

I also think that we should give some thought to implementing both an
improved ticked system and a tickless system to be chosen as CONFIG options
so that the developer putting together a system can make a choice based on
their needs, and not our whims.  I am more than willing to concede that
there is a class of customer to whom a ticked system is appropriate, but I
am quite sure that there is a class for whom the ticked model is completely
unacceptable.

> On a half way loaded system?  Do you think it is that high?  I sort of
> think that, once the system is loaded, there would be a useful timer
> tick most of the time.  Might be useful to do some measurements of this.

any non-useful tick takes away cycles that could be better used performing
FFT's

there are many kinds of loads, some of which generate large numbers of timed
events and some that generate few or none.
I think we want to be able to provide good solutions for both cases.

we should do lots of measurements.

    Mark



^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-11 18:57                                         ` Jamie Lokier
@ 2001-04-11 19:21                                           ` John Alvord
  2001-04-12  8:41                                             ` Jamie Lokier
  0 siblings, 1 reply; 118+ messages in thread
From: John Alvord @ 2001-04-11 19:21 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: george anzinger, Mark Salisbury, mark salisbury,
	high-res-timers-discourse, Alan Cox, Mikulas Patocka,
	David Schleef, Jeff Dike, schwidefsky, linux-kernel

On Wed, 11 Apr 2001 20:57:04 +0200, Jamie Lokier
<lk@tantalophile.demon.co.uk> wrote:

>george anzinger wrote:
>> > A pointer-based priority queue is really not a very complex thing, and
>> > there are ways to optimise them for typical cases like the above.
>> > 
>> Please do enlighten me.  The big problem in my mind is that the
>> pointers, pointing at points in time, are perishable.
>
>Um, I'm not sure what perishability has to do with anything.  Timers at
>the moment can be added, deleted and destroyed and it's no big deal.
>
>Look in an algorithms book under "priority queue".  They usually don't
>say how to use a heap-ordered tree -- usually it's an array -- but you
>can use a tree if you want.  In such a tree, each timer has a link to
>_two_ next timers and one previous timer.  (The previous timer link is
>only needed if you can delete timers before they expire, which is
>required for Linux).
>
>I'm not saying a heap-ordered tree is fastest.  But it's ok, and
>doesn't require any more storage than the `struct timer' objects
>themselves.
>
>It's possible to optimise this kind of data structure rather a lot,
>depending on how you want to use it.  Linux' current timer algorithm is
>a pretty good example of a priority queue optimised for timer ticks,
>where you don't mind doing a small amount of work per tick.
>
>One notable complication with the kernel is that we don't want every
>timer to run at its exactly allocated time, except for some special
>timers.  For example, if you have 100 TCP streams and their
>retransmission times are scheduled for 1.0000s, 1.0001s, 1.0002s, etc.,
>you'd much rather just process them all at once as is done at the moment
>by the 100Hz timer.  This is because cache locality is much more
>important for TCP retransmits than transmit timing resolution.
>
>There's literature online about this class of problems: look up "event
>set" and "simulation" and "fast priority queue".

I bumped into a funny non-optimization a few years ago. A system with
a timer queue like the above had been "optimized" by keeping old timer
elements... ready for new tasks to link onto and activate. The
granularity was 1 millisecond. Over time, all timer values from 0 to
roughly 10 minutes had been used. That resulted in 60,000 permanent
storage fragments laying about... a significant fragmentation problem.
The end was a forced recycle every month or so.

john alvord

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-11 16:59                                       ` george anzinger
@ 2001-04-11 18:57                                         ` Jamie Lokier
  2001-04-11 19:21                                           ` John Alvord
  0 siblings, 1 reply; 118+ messages in thread
From: Jamie Lokier @ 2001-04-11 18:57 UTC (permalink / raw)
  To: george anzinger
  Cc: Mark Salisbury, mark salisbury, high-res-timers-discourse,
	Alan Cox, Mikulas Patocka, David Schleef, Jeff Dike, schwidefsky,
	linux-kernel

george anzinger wrote:
> > A pointer-based priority queue is really not a very complex thing, and
> > there are ways to optimise them for typical cases like the above.
> > 
> Please do enlighten me.  The big problem in my mind is that the
> pointers, pointing at points in time, are perishable.

Um, I'm not sure what perishability has to do with anything.  Timers at
the moment can be added, deleted and destroyed and it's no big deal.

Look in an algorithms book under "priority queue".  They usually don't
say how to use a heap-ordered tree -- usually it's an array -- but you
can use a tree if you want.  In such a tree, each timer has a link to
_two_ next timers and one previous timer.  (The previous timer link is
only needed if you can delete timers before they expire, which is
required for Linux).

I'm not saying a heap-ordered tree is fastest.  But it's ok, and
doesn't require any more storage than the `struct timer' objects
themselves.

It's possible to optimise this kind of data structure rather a lot,
depending on how you want to use it.  Linux' current timer algorithm is
a pretty good example of a priority queue optimised for timer ticks,
where you don't mind doing a small amount of work per tick.

One notable complication with the kernel is that we don't want every
timer to run at its exactly allocated time, except for some special
timers.  For example, if you have 100 TCP streams and their
retransmission times are scheduled for 1.0000s, 1.0001s, 1.0002s, etc.,
you'd much rather just process them all at once as is done at the moment
by the 100Hz timer.  This is because cache locality is much more
important for TCP retransmits than transmit timing resolution.

There's literature online about this class of problems: look up "event
set" and "simulation" and "fast priority queue".

enjoy,
-- Jamie

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-09 22:35         ` Alan Cox
  2001-04-10 11:43           ` David Schleef
@ 2001-04-11 18:43           ` Oliver Xymoron
  1 sibling, 0 replies; 118+ messages in thread
From: Oliver Xymoron @ 2001-04-11 18:43 UTC (permalink / raw)
  To: Alan Cox
  Cc: Mikulas Patocka, Mark Salisbury, Jeff Dike, schwidefsky, linux-kernel

On Mon, 9 Apr 2001, Alan Cox wrote:

> > > Its worth doing even on the ancient x86 boards with the PIT.
> >
> > Note that programming the PIT is sloooooooow and doing it on every timer
> > add_timer/del_timer would be a pain.
>
> You only have to do it occasionally.
>
> When you add a timer newer than the current one
> 	(arguably newer by at least 1/2*HZ sec)

That's only if we want to do no better than the current system. We'd want
a new variable called timer_margin or something, which would be dependent
on interrupt source and processor, and could be tuned up or down via
sysctl.

> When you finish running the timers at an interval and the new interval is
> significantly larger than the current one.

Make that larger or smaller. If we come out of a quiescent state (1 hz
interrupts, say) and start getting 10ms timers, we want to respond to them
right away.

> Remember each tick we poke the PIT anyway

We could also have a HZ_max tunable above which we would not try to
reprogram the interval. On older systems, this could be set at
100-200HZ...

--
 "Love the dolphins," she advised him. "Write by W.A.S.T.E.."


^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer!
@ 2001-04-11 17:56 Bret Indrelee
  2001-04-12 17:39 ` george anzinger
  0 siblings, 1 reply; 118+ messages in thread
From: Bret Indrelee @ 2001-04-11 17:56 UTC (permalink / raw)
  To: Linux Kernel Mailing List

Mikulas Patocka (mikulas@artax.karlin.mff.cuni.cz) wrote:
> Adding and removing timers happens much more frequently than PIT tick,
> so 
> comparing these times is pointless. 
>
> If you have some device and timer protecting it from lockup on buggy 
> hardware, you actually 
>
> send request to device 
> add timer 
>
> receive interrupt and read reply 
> remove timer 
>
> With the current timer semantics, the cost of add timer and del timer is 
> nearly zero. If you had to reprogram the PIT on each request and reply,
> it 
> would slow things down. 
>
> Note that you call mod_timer also on each packet received - and in worst 
> case (which may happen), you end up reprogramming the PIT on each
> packet. 

You can still have nearly zero cost for the normal case. Avoiding worst
case behaviour is also pretty easy.

You only reprogram the PIT if you have to change the interval.

Keep all timers in a sorted double-linked list. Do the insert
intelligently, adding it from the back or front of the list depending on
where it is in relation to existing entries.

You only need to reprogram the interval timer when:
1. You've got a new entry at the head of the list
AND
2. The currently programmed interval is larger than the time to
the new head of the list.

In the case of a device timeout, it is usually not going to be inserted at
the head of the list. It is very seldom going to actually timeout.

Choose your interval wisely, only increasing it when you know it will pay
off. The best way of doing this would probably be to track some sort
of LCD (lowest common denominator) for timeouts.

The real trick is to do a lot less processing on every tick than is
currently done. Current generation PCs can easily handle 1000s of
interrupts a second if you keep the overhead small.

-Bret

------------------------------------------------------------------------------
Bret Indrelee |  Sometimes, to be deep, we must act shallow!
bret@io.com   |  -Riff in The Quatrix



^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-11 16:11                                     ` Jamie Lokier
@ 2001-04-11 16:59                                       ` george anzinger
  2001-04-11 18:57                                         ` Jamie Lokier
  0 siblings, 1 reply; 118+ messages in thread
From: george anzinger @ 2001-04-11 16:59 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Mark Salisbury, mark salisbury, high-res-timers-discourse,
	Alan Cox, Mikulas Patocka, David Schleef, Jeff Dike, schwidefsky,
	linux-kernel

Jamie Lokier wrote:
> 
> Mark Salisbury wrote:
> > > The complexity comes in when you want to maintain indexes into the list
> > > for quick insertion of new timers.  To get the current insert
> > > performance, for example, you would need pointers to (at least) each of
> > > the next 256 centasecond boundaries in the list.  But the list ages, so
> > > these pointers need to be continually updated.  The thought I had was to
> > > update needed pointers (and only those needed) each time an insert was
> > > done and a needed pointer was found to be missing or stale.  Still it
> > > adds complexity that the static structure used now doesn't have.
> >
> > actually, I think a head and tail pointer would be sufficient for most
> > cases. (most new timers are either going to be a new head of list or a new
> > tail, i.e. long duration timeouts that will never be serviced or short
> > duration timers that are going to go off "real soon now (tm)")  the oddball
> > cases would be mostly coming from user-space, i.e. nanosleep, where a longer
> > list insertion disappears in the block/wakeup/context switch overhead
> 
> A pointer-based priority queue is really not a very complex thing, and
> there are ways to optimise them for typical cases like the above.
> 
Please do enlighten me.  The big problem in my mind is that the
pointers, pointing at points in time, are perishable.

George

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-11 11:42                               ` Maciej W. Rozycki
@ 2001-04-11 16:13                                 ` Jamie Lokier
  2001-04-12  9:51                                   ` Maciej W. Rozycki
  0 siblings, 1 reply; 118+ messages in thread
From: Jamie Lokier @ 2001-04-11 16:13 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: Zdenek Kabelac, linux-kernel

Maciej W. Rozycki wrote:
> On Tue, 10 Apr 2001, Zdenek Kabelac wrote:
> 
> > I think it would be wrong if we could not add new very usable features
> > to the system just because some old hardware doesn't support it.
> 
>  s/old/crappy/ -- even old hardware can handle vsync IRQs fine.  E.g. TVGA
> 8900C. 

Think of the original 64k and 256k VGA cards.  I think some of those
didn't have an irq, but did have a way to read the progress of the
raster, which you could PLL to using timer interrupts.  Some video games
still look fine at 320x200 :-)

-- Jamie

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-11  0:48                                   ` Mark Salisbury
  2001-04-11  2:35                                     ` george anzinger
@ 2001-04-11 16:11                                     ` Jamie Lokier
  2001-04-11 16:59                                       ` george anzinger
  1 sibling, 1 reply; 118+ messages in thread
From: Jamie Lokier @ 2001-04-11 16:11 UTC (permalink / raw)
  To: Mark Salisbury
  Cc: george anzinger, mark salisbury, high-res-timers-discourse,
	Alan Cox, Mikulas Patocka, David Schleef, Jeff Dike, schwidefsky,
	linux-kernel

Mark Salisbury wrote:
> > The complexity comes in when you want to maintain indexes into the list
> > for quick insertion of new timers.  To get the current insert
> > performance, for example, you would need pointers to (at least) each of
> > the next 256 centasecond boundaries in the list.  But the list ages, so
> > these pointers need to be continually updated.  The thought I had was to
> > update needed pointers (and only those needed) each time an insert was
> > done and a needed pointer was found to be missing or stale.  Still it
> > adds complexity that the static structure used now doesn't have.
> 
> actually, I think a head and tail pointer would be sufficient for most
> cases. (most new timers are either going to be a new head of list or a new
> tail, i.e. long duration timeouts that will never be serviced or short
> duration timers that are going to go off "real soon now (tm)")  the oddball
> cases would be mostly coming from user-space, i.e. nanosleep, where a longer
> list insertion disappears in the block/wakeup/context switch overhead

A pointer-based priority queue is really not a very complex thing, and
there are ways to optimise them for typical cases like the above.

-- Jamie

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-10 19:50                             ` Zdenek Kabelac
@ 2001-04-11 11:42                               ` Maciej W. Rozycki
  2001-04-11 16:13                                 ` Jamie Lokier
  0 siblings, 1 reply; 118+ messages in thread
From: Maciej W. Rozycki @ 2001-04-11 11:42 UTC (permalink / raw)
  To: Zdenek Kabelac; +Cc: Jamie Lokier, linux-kernel

On Tue, 10 Apr 2001, Zdenek Kabelac wrote:

> I think it would be wrong if we could not add new very usable features
> to the system just because some old hardware doesn't support it.

 s/old/crappy/ -- even old hardware can handle vsync IRQs fine.  E.g. TVGA
8900C. 

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +


^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
@ 2001-04-11  9:06 schwidefsky
  0 siblings, 0 replies; 118+ messages in thread
From: schwidefsky @ 2001-04-11  9:06 UTC (permalink / raw)
  To: george anzinger
  Cc: Jamie Lokier, high-res-timers-discourse, Alan Cox,
	Mikulas Patocka, David Schleef, Mark Salisbury, Jeff Dike,
	linux-kernel



>f) As noted, the account timers (task user/system times) would be much
>more accurate with the tick less approach.  The cost is added code in
>both the system call and the schedule path.
>
>Tentative conclusions:
>
>Currently we feel that the tick less approach is not acceptable due to
>(f).  We felt that this added code would NOT be welcome AND would, in a
>reasonably active system, have much higher overhead than any savings in
>not having a tick.  Also (d) implies a list organization that will, at
>the very least, be harder to understand.  (We have some thoughts here,
>but abandoned the effort because of (f).)  We are, of course, open to
>discussion on this issue and all others related to the project
>objectives.
f) might be true on Intel based systems. At least for s/390 the situation
is a little bit different. Here is an extract from the s/390 part of the
timer patch:

       .macro  UPDATE_ENTER_TIME reload
       la      %r14,thread+_TSS_UTIME(%r9) # pointer to utime
       tm      SP_PSW+1(%r15),0x01      # interrupting from user ?
       jno     0f                       # yes -> add to user time
       la      %r14,8(%r14)             # no -> add to system time
0:     lm      %r0,%r1,0(%r14)          # load user/system time
       sl      %r1,__LC_LAST_MARK+4     # subtract last time mark
       bc      3,BASED(1f)              # borrow ?
       sl      %r0,BASED(.Lc1)
1:     sl      %r0,__LC_LAST_MARK
       stck    __LC_LAST_MARK           # make a new mark
       al      %r1,__LC_LAST_MARK+4     # add new mark -> added delta
       bc      12,BASED(2f)             # carry ?
       al      %r0,BASED(.Lc1)
2:     al      %r0,__LC_LAST_MARK
       stm     %r0,%r1,0(%r14)          # store updated user/system time
       clc     __LC_LAST_MARK(8),__LC_JIFFY_TIMER # check if enough time
       jl      3f                       # passed for a jiffy update
       l       %r1,BASED(.Ltime_warp)
       basr    %r14,%r1
       .if     \reload                  # reload != 0 for system call
       lm      %r2,%r6,SP_R2(%r15)      # reload clobbered parameters
       .endif
3:
       .endm

       .macro  UPDATE_LEAVE_TIME
       l       %r1,BASED(.Ltq_timer)    # test if tq_timer list is empty
       x       %r1,0(%r1)               # tq_timer->next != tq_timer ?
       jz      0f
       l       %r1,BASED(.Ltq_timer_active)
       icm     %r0,15,0(%r1)            # timer event already added ?
       jnz     0f
       l       %r1,BASED(.Ltq_pending)
       basr    %r14,%r1
0:     lm      %r0,%r1,thread+_TSS_STIME(%r9) # load system time
       sl      %r1,__LC_LAST_MARK+4     # subtract last time mark
       bc      3,BASED(1f)              # borrow ?
       sl      %r0,BASED(.Lc1)
1:     sl      %r0,__LC_LAST_MARK
       stck    __LC_LAST_MARK           # make new mark
       al      %r1,__LC_LAST_MARK+4     # add new mark -> added delta
       bc      12,BASED(2f)             # carry ?
       al      %r0,BASED(.Lc1)
2:     al      %r0,__LC_LAST_MARK
       stm     %r0,%r1,thread+_TSS_STIME(%r9) # store system time
       .endm

The two macros UPDATE_ENTER_TIME and UPDATE_LEAVE_TIME are executed
on every system entry/exit. In the case that no special work has to
be done, fewer than 31 instructions are executed in addition to the
normal system entry/exit code. Special work has to be done if more
time than 1/HZ has passed (call time_warp), or if tq_timer contains
an element (call tq_pending).
The accuracy of the timer events has not changed. It still is 1/HZ.
The only thing this patch does is to avoid unneeded interruptions.
I'd be happy if this could be combined with a new, more accurate
timing method.

blue skies,
   Martin

Linux/390 Design & Development, IBM Deutschland Entwicklung GmbH
Schönaicherstr. 220, D-71032 Böblingen, Telefon: 49 - (0)7031 - 16-2247
E-Mail: schwidefsky@de.ibm.com



^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-10 12:14         ` Mark Salisbury
@ 2001-04-11  5:55           ` Karim Yaghmour
  0 siblings, 0 replies; 118+ messages in thread
From: Karim Yaghmour @ 2001-04-11  5:55 UTC (permalink / raw)
  To: Mark Salisbury
  Cc: Martin Mares, Andi Kleen, Jeff Dike, schwidefsky, linux-kernel

Mark Salisbury wrote:
> 
> It would probably be a good compile config option to allow fine or coarse
> process time accounting; that leaves the person setting up the system to
> make the choice based on their needs.
> 

I suggested this a while ago during a discussion about performance
measurement. This would be fairly easy to implement using the patch
provided with the Linux Trace Toolkit, since all entry points and
exit points are known (and it already is available in post-mortem
analysis). Implementing the measurement code within the kernel should
be fairly easy as well, and it would be provided as part of the
compile option. All in all, given the measurements I made, I'd place
the overhead at around 1% for the computations. (The overhead is very
likely to be negligible when eventual fixes are taken into account.)

===================================================
                 Karim Yaghmour
               karym@opersys.com
      Embedded and Real-Time Linux Expert
===================================================

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-11  0:48                                   ` Mark Salisbury
@ 2001-04-11  2:35                                     ` george anzinger
  2001-04-12  0:24                                       ` Mark Salisbury
  2001-04-11 16:11                                     ` Jamie Lokier
  1 sibling, 1 reply; 118+ messages in thread
From: george anzinger @ 2001-04-11  2:35 UTC (permalink / raw)
  To: Mark Salisbury
  Cc: mark salisbury, Jamie Lokier, high-res-timers-discourse,
	Alan Cox, Mikulas Patocka, David Schleef, Jeff Dike, schwidefsky,
	linux-kernel

Mark Salisbury wrote:
> 
> > mark salisbury wrote:
> > >
> > > george anzinger wrote:
> > >
> > > > f) As noted, the account timers (task user/system times) would be much
> > > > more accurate with the tick less approach.  The cost is added code in
> > > > both the system call and the schedule path.
> > > >
> > > > Tentative conclusions:
> > > >
> > > > Currently we feel that the tick less approach is not acceptable due to
> > > > (f).  We felt that this added code would NOT be welcome AND would, in a
> > > > reasonably active system, have much higher overhead than any savings in
> > > > not having a tick.  Also (d) implies a list organization that will, at
> > > > the very least, be harder to understand.  (We have some thoughts here,
> > > > but abandoned the effort because of (f).)  We are, of course, open to
> > > > discussion on this issue and all others related to the project
> > > > objectives.
> > >
> > > f does not imply tick-less is not acceptable, it implies that better
> > > process time accounting is not acceptable.
> >
> > My thinking is that a timer implementation that forced (f) would have
> > problems gaining acceptance (even with me :).  I think a tick less
> > system does force this and this is why we have, at least for the moment,
> > abandoned it.  In no way does this preclude (f) as it is compatible with
> > either ticks or tick less time keeping.  On the other hand, the stated
> > project objectives do not include (f) unless, of course we do a tick
> > less time system.
> > >
> > > list organization is not complex, it is a sorted absolute time list.  I
> would
> > > argue that this is a hell of a lot easier to understand that ticks +
> offsets.
> >
> > The complexity comes in when you want to maintain indexes into the list
> > for quick insertion of new timers.  To get the current insert
> > performance, for example, you would need pointers to (at least) each of
> > the next 256 centasecond boundaries in the list.  But the list ages, so
> > these pointers need to be continually updated.  The thought I had was to
> > update needed pointers (and only those needed) each time an insert was
> > done and a needed pointer was found to be missing or stale.  Still it
> > adds complexity that the static structure used now doesn't have.
> 
> actually, I think a head and tail pointer would be sufficient for most
> cases. (most new timers are either going to be a new head of list or a new
> tail, i.e. long duration timeouts that will never be serviced or short
> duration timers that are going to go off "real soon now (tm)")  the oddball
> cases would be mostly coming from user-space, i.e. nanosleep, where a longer
> list insertion disappears in the block/wakeup/context switch overhead
> 
> > >
> > > still, better process time accounting should be a compile CONFIG option, not
> > > ignored and ruled out because someone thinks that it is too expensive in the
> > > general case.
> >
> > As I said above, we are not ruling it out, but rather, we are not
> > requiring it by going tick less.
> > As I said, it is not clear how you could get
> > CONFIG_USELESS_PROCESS_TIME_ACCOUNTING unless you did a tick every
> > jiffie.  What did you have in mind?
> 
> time accounting can be limited to the quantum expiration and voluntary
> yields in the tickless/useless case.
> 
> > For the most part, I agree.  I am not sure that it makes a lot of sense
> > to mix some of these options, however.  I think it comes down to the
> > question of benefit vs cost.  If keeping an old version around that is
> > not any faster or more efficient in any way would seem too costly to
> > me.  We would like to provide a system that is better in every way and
> > thus eliminate the need to keep the old one around.  We could leave it
> > in as a compile option so folks would have a fall back, I suppose.
> 
> I agree that some combinations don't make much sense _TO_ME_ but that
> doesn't mean they don't meet sombody's needs.
> 
> in my case (embedded, medium hard real time, massively parallel
> multicomputer)  the only choices that makes sense to my customers is
> tickless/useless in deployment and tickless/useful in
> development/profiling/optimization.

I suspect you might go for ticked if its overhead was less.  The thing
that makes me think the overhead is high for tick less is the accounting
and time slice stuff.  This has to be handled each system call and each
schedule call.  These happen WAY more often than ticks...  Contrariwise,
I would go for tick less if its overhead is the same or less under
a reasonable load.  Of course, we would also want to check overload
sensitivity.
> 
> in other cases ticked/useless makes sense (dumb old slow chips)
> 
> I have no idea who would want ticked/useful or hybrid.  i suggested hybrid
> as an alternative in case linus might be reluctant to gutting and replacing
> the sysclock.
> 
> >
> > An Issue no one has raised is that the tick less system would need to
> > start a timer each time it scheduled a task.  This would lead to either
> > slow but very precise time slicing or about what we have today with more
> > schedule overhead.
> 
> no.  you would re-use the timer with a new expiration time and a re-insert.

The issue is not the timer structure itself, but the insert and removal
time AND the time accounting that would have to go on around it.  The
timer would need to be computed and started each time on the way out of
schedule and stopped and the residue saved on schedule entry.  One might
want to roll this into the accounting time stuff...
> 
> also, re overhead: servicing 10 times as many interrupts as necessary
> for system function must cost a fair chunk.
> 
On a half way loaded system?  Do you think it is that high?  I sort of
think that, once the system is loaded, there would be a useful timer
tick most of the time.  Might be useful to do some measurements of this.

George

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-10 22:08                                 ` george anzinger
@ 2001-04-11  0:48                                   ` Mark Salisbury
  2001-04-11  2:35                                     ` george anzinger
  2001-04-11 16:11                                     ` Jamie Lokier
  0 siblings, 2 replies; 118+ messages in thread
From: Mark Salisbury @ 2001-04-11  0:48 UTC (permalink / raw)
  To: george anzinger, mark salisbury
  Cc: Jamie Lokier, high-res-timers-discourse, Alan Cox,
	Mikulas Patocka, David Schleef, Jeff Dike, schwidefsky,
	linux-kernel


> mark salisbury wrote:
> >
> > george anzinger wrote:
> >
> > > f) As noted, the account timers (task user/system times) would be much
> > > more accurate with the tick less approach.  The cost is added code in
> > > both the system call and the schedule path.
> > >
> > > Tentative conclusions:
> > >
> > > Currently we feel that the tick less approach is not acceptable due to
> > > (f).  We felt that this added code would NOT be welcome AND would, in a
> > > reasonably active system, have much higher overhead than any savings in
> > > not having a tick.  Also (d) implies a list organization that will, at
> > > the very least, be harder to understand.  (We have some thoughts here,
> > > but abandoned the effort because of (f).)  We are, of course, open to
> > > discussion on this issue and all others related to the project
> > > objectives.
> >
> > f does not imply tick-less is not acceptable, it implies that better process time
> > accounting is not acceptable.
>
> My thinking is that a timer implementation that forced (f) would have
> problems gaining acceptance (even with me :).  I think a tick less
> system does force this and this is why we have, at least for the moment,
> abandoned it.  In no way does this preclude (f) as it is compatible with
> either ticks or tick less time keeping.  On the other hand, the stated
> project objectives do not include (f) unless, of course we do a tick
> less time system.
> >
> > list organization is not complex, it is a sorted absolute time list.  I would
> > argue that this is a hell of a lot easier to understand than ticks + offsets.
>
> The complexity comes in when you want to maintain indexes into the list
> for quick insertion of new timers.  To get the current insert
> performance, for example, you would need pointers to (at least) each of
> the next 256 centasecond boundaries in the list.  But the list ages, so
> these pointers need to be continually updated.  The thought I had was to
> update needed pointers (and only those needed) each time an insert was
> done and a needed pointer was found to be missing or stale.  Still it
> adds complexity that the static structure used now doesn't have.

actually, I think a head and tail pointer would be sufficient for most
cases. (most new timers are either going to be a new head of list or a new
tail, i.e. long duration timeouts that will never be serviced or short
duration timers that are going to go off "real soon now (tm)")  the oddball
cases would be mostly coming from user-space, i.e. nanosleep, where a longer
list insertion disappears in the block/wakeup/context switch overhead
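
[Editorial note: the head/tail fast path Mark argues for can be sketched as
below -- a doubly linked list kept sorted by absolute expiry, where inserts
first try the O(1) head and tail cases before falling back to a walk.  This is
an illustration, not kernel code; all names are invented.]

```c
/* Sorted absolute-time timer list with head/tail insert fast paths. */
#include <assert.h>
#include <stddef.h>

struct xtimer {
    unsigned long long expires;   /* absolute expiry time */
    struct xtimer *prev, *next;
};

struct xtimer_list {
    struct xtimer *head, *tail;   /* soonest and latest timers */
};

static void xtimer_insert(struct xtimer_list *l, struct xtimer *t)
{
    t->prev = t->next = NULL;
    if (!l->head) {                               /* empty list */
        l->head = l->tail = t;
    } else if (t->expires <= l->head->expires) {  /* new head: O(1) */
        t->next = l->head;
        l->head->prev = t;
        l->head = t;
    } else if (t->expires >= l->tail->expires) {  /* new tail: O(1) */
        t->prev = l->tail;
        l->tail->next = t;
        l->tail = t;
    } else {                                      /* rare middle case */
        struct xtimer *p = l->head;
        while (p->next && p->next->expires < t->expires)
            p = p->next;
        t->next = p->next;
        t->prev = p;
        p->next->prev = t;
        p->next = t;
    }
}
```

Short timeouts hit the head case and far-future timeouts hit the tail case; only
the oddball middle inserts pay for the walk.
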

> >
> > still, better process time accounting should be a compile CONFIG option, not
> > ignored and ruled out because someone thinks that it is too expensive in the
> > general case.
>
> As I said above, we are not ruling it out, but rather, we are not
> requiring it by going tick less.
> As I said, it is not clear how you could get
> CONFIG_USELESS_PROCESS_TIME_ACCOUNTING unless you did a tick every
> jiffie.  What did you have in mind?

time accounting can be limited to the quantum expiration and voluntary
yields in the tickless/useless case.

> For the most part, I agree.  I am not sure that it makes a lot of sense
> to mix some of these options, however.  I think it comes down to the
> question of benefit vs cost.  If keeping an old version around that is
> not any faster or more efficient in any way would seem too costly to
> me.  We would like to provide a system that is better in every way and
> thus eliminate the need to keep the old one around.  We could leave it
> in as a compile option so folks would have a fall back, I suppose.

I agree that some combinations don't make much sense _TO_ME_ but that
doesn't mean they don't meet somebody's needs.

in my case (embedded, medium hard real time, massively parallel
multicomputer)  the only choices that make sense to my customers are
tickless/useless in deployment and tickless/useful in
development/profiling/optimization.

in other cases ticked/useless makes sense (dumb old slow chips)

I have no idea who would want ticked/useful or hybrid.  i suggested hybrid
as an alternative in case linus might be reluctant to gut and replace
the sysclock.

>
> An Issue no one has raised is that the tick less system would need to
> start a timer each time it scheduled a task.  This would lead to either
> slow but very precise time slicing or about what we have today with more
> schedule overhead.

no.  you would re-use the timer with a new expiration time and a re-insert.

also re overhead. the cost of servicing 10 times as many interrupts as
necessary for system function must cost a fair chunk.




^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-10 20:02                               ` mark salisbury
@ 2001-04-10 22:08                                 ` george anzinger
  2001-04-11  0:48                                   ` Mark Salisbury
  0 siblings, 1 reply; 118+ messages in thread
From: george anzinger @ 2001-04-10 22:08 UTC (permalink / raw)
  To: mark salisbury
  Cc: Jamie Lokier, high-res-timers-discourse, Alan Cox,
	Mikulas Patocka, David Schleef, Jeff Dike, schwidefsky,
	linux-kernel

mark salisbury wrote:
> 
> george anzinger wrote:
> 
> > f) As noted, the account timers (task user/system times) would be much
> > more accurate with the tick less approach.  The cost is added code in
> > both the system call and the schedule path.
> >
> > Tentative conclusions:
> >
> > Currently we feel that the tick less approach is not acceptable due to
> > (f).  We felt that this added code would NOT be welcome AND would, in a
> > reasonably active system, have much higher overhead than any savings in
> > not having a tick.  Also (d) implies a list organization that will, at
> > the very least, be harder to understand.  (We have some thoughts here,
> > but abandoned the effort because of (f).)  We are, of course, open to
> > discussion on this issue and all others related to the project
> > objectives.
> 
> f does not imply tick-less is not acceptable, it implies that better process time
> accounting is not acceptable.

My thinking is that a timer implementation that forced (f) would have
problems gaining acceptance (even with me :).  I think a tick less
system does force this and this is why we have, at least for the moment,
abandoned it.  In no way does this preclude (f) as it is compatible with
either ticks or tick less time keeping.  On the other hand, the stated
project objectives do not include (f) unless, of course we do a tick
less time system.
> 
> list organization is not complex, it is a sorted absolute time list.  I would
> argue that this is a hell of a lot easier to understand than ticks + offsets.

The complexity comes in when you want to maintain indexes into the list
for quick insertion of new timers.  To get the current insert
performance, for example, you would need pointers to (at least) each of
the next 256 centasecond boundaries in the list.  But the list ages, so
these pointers need to be continually updated.  The thought I had was to
update needed pointers (and only those needed) each time an insert was
done and a needed pointer was found to be missing or stale.  Still it
adds complexity that the static structure used now doesn't have.
> 
> still, better process time accounting should be a compile CONFIG option, not
> ignored and ruled out because someone thinks that it is too expensive in the
> general case.

As I said above, we are not ruling it out, but rather, we are not
requiring it by going tick less.
> 
> the whole point of linux and CONFIG options is to get you the kernel with the
> features you want, not what someone else wants.
> 
> there should be a whole range of config options associated with this issue:
> 
> CONFIG_JIFFIES   == old jiffies implementation
> CONFIG_TICKLESS  == tickless
> CONFIG_HYBRID  == old jiffies plus a tickless high-res timer system on
>                   the side but not assoc w/ process and global timekeeping
> 
> CONFIG_USELESS_PROCESS_TIME_ACCOUNTING = old style, cheap time acctg
> CONFIG_USEFUL_BUT_COSTS_TIME_ACCOUNTING = accurate but expensive time accounting
> 
> this way, users who want tickless and lousy time acctg can have it AND people who
> want jiffies and good time acctg could have it.

As I said, it is not clear how you could get
CONFIG_USELESS_PROCESS_TIME_ACCOUNTING unless you did a tick every
jiffie.  What did you have in mind?
> 
> these features are largely disjoint and easily separable.  it is also relatively
> trivial to do this in such a way that drivers depending on the jiffie abstraction
> can be supported without modification no matter what the configuration.
> 
For the most part, I agree.  I am not sure that it makes a lot of sense
to mix some of these options, however.  I think it comes down to the
question of benefit vs cost.  If keeping an old version around that is
not any faster or more efficient in any way would seem too costly to
me.  We would like to provide a system that is better in every way and
thus eliminate the need to keep the old one around.  We could leave it
in as a compile option so folks would have a fall back, I suppose.

An Issue no one has raised is that the tick less system would need to
start a timer each time it scheduled a task.  This would lead to either
slow but very precise time slicing or about what we have today with more
schedule overhead.

George


>     Mark Salisbury

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-10 19:28                             ` george anzinger
@ 2001-04-10 20:02                               ` mark salisbury
  2001-04-10 22:08                                 ` george anzinger
  2001-08-01  1:08                               ` george anzinger
  1 sibling, 1 reply; 118+ messages in thread
From: mark salisbury @ 2001-04-10 20:02 UTC (permalink / raw)
  To: george anzinger
  Cc: Jamie Lokier, high-res-timers-discourse, Alan Cox,
	Mikulas Patocka, David Schleef, Jeff Dike, schwidefsky,
	linux-kernel

george anzinger wrote:

> f) As noted, the account timers (task user/system times) would be much
> more accurate with the tick less approach.  The cost is added code in
> both the system call and the schedule path.
>
> Tentative conclusions:
>
> Currently we feel that the tick less approach is not acceptable due to
> (f).  We felt that this added code would NOT be welcome AND would, in a
> reasonably active system, have much higher overhead than any savings in
> not having a tick.  Also (d) implies a list organization that will, at
> the very least, be harder to understand.  (We have some thoughts here,
> but abandoned the effort because of (f).)  We are, of course, open to
> discussion on this issue and all others related to the project
> objectives.

f does not imply tick-less is not acceptable, it implies that better process time
accounting is not acceptable.

list organization is not complex, it is a sorted absolute time list.  I would
argue that this is a hell of a lot easier to understand than ticks + offsets.

still, better process time accounting should be a compile CONFIG option, not
ignored and ruled out because someone thinks that it is too expensive in the
general case.

the whole point of linux and CONFIG options is to get you the kernel with the
features you want, not what someone else wants.

there should be a whole range of config options associated with this issue:

CONFIG_JIFFIES   == old jiffies implementation
CONFIG_TICKLESS  == tickless
CONFIG_HYBRID  == old jiffies plus a tickless high-res timer system on
                  the side but not assoc w/ process and global timekeeping

CONFIG_USELESS_PROCESS_TIME_ACCOUNTING = old style, cheap time acctg
CONFIG_USEFUL_BUT_COSTS_TIME_ACCOUNTING = accurate but expensive time accounting

this way, users who want tickless and lousy time acctg can have it AND people who
want jiffies and good time acctg could have it.

these features are largely disjoint and easily separable.  it is also relatively
trivial to do this in such a way that drivers depending on the jiffie abstraction
can be supported without modification no matter what the configuration.

    Mark Salisbury


^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-10 18:45               ` Stephen D. Williams
@ 2001-04-10 19:59                 ` Andi Kleen
  0 siblings, 0 replies; 118+ messages in thread
From: Andi Kleen @ 2001-04-10 19:59 UTC (permalink / raw)
  To: Stephen D. Williams
  Cc: Andi Kleen, Alan Cox, Mark Salisbury, Jeff Dike, schwidefsky,
	linux-kernel

On Tue, Apr 10, 2001 at 02:45:15PM -0400, Stephen D. Williams wrote:
> When this is rewritten, I would strongly suggest that we find a way to
> make 'gettimeofday' nearly free.  Certain applications need to use this
> frequently while still being portable.  One solution when you do have
> clock ticks is a read-only mapped Int.  Another cheap solution is
> library assembly that adds a cycle clock delta since last system call to
> a 'gettimeofday' value set on every system call return.

The upcoming x86-64 port supports vsyscalls for just that purpose.



-Andi

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-10 18:24                           ` Jamie Lokier
  2001-04-10 19:28                             ` george anzinger
@ 2001-04-10 19:50                             ` Zdenek Kabelac
  2001-04-11 11:42                               ` Maciej W. Rozycki
  1 sibling, 1 reply; 118+ messages in thread
From: Zdenek Kabelac @ 2001-04-10 19:50 UTC (permalink / raw)
  To: Jamie Lokier

Jamie Lokier wrote:
> 
> Alan Cox wrote:
> 
> Except for cards which don't generate an irq on vsync but do let you see
> how the raster is proceeding.  (I vaguely recall these).  For which a
> PLL and, wait for it.... high resolution timer is needed.
> 
> Perhaps that's a 1990s problem that doesn't need a 2000s solution though :-)

I think it would be wrong if we could not add new very usable features
to the system just because some old hardware doesn't support them.
(I also believe this was the original idea why XFree has no user
interface for such an important thing as VBI screen switching - because
not all devices could provide such information.)
I think those who use such an old system probably don't need any
synchronized video or game output - simply because it will not work on
such a system anyway :)
so we should not worry about this.

kabi@i.am


^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-10 17:27                     ` Alan Cox
  2001-04-10 17:35                       ` Jamie Lokier
@ 2001-04-10 19:42                       ` Zdenek Kabelac
  1 sibling, 0 replies; 118+ messages in thread
From: Zdenek Kabelac @ 2001-04-10 19:42 UTC (permalink / raw)
  To: Alan Cox

Alan Cox wrote:
> 
> > Games would like to be able to page flip at vertical refresh time --
> > <1ms accuracy please.  Network traffic shaping benefits from better than
> 
> This is an X issue. I was talking with Jim Gettys about what is needed to
> get the relevant existing X extensions for this working

I've already proposed my /dev/vbi device (currently works only for MGA
cards) - read returns when VBI occurs - works quite well...
(currently in avifile CVS tree)

Anyway, in the good old days AmigaOS had an interrupt service where
devices sent timer requests - they were queued and the timer device
served them in order - I don't see the reason why we should implement
this differently.
If there is no real reason to interrupt the system with more than 100Hz
periodicity then this is ok - the scheduler will simply send a timer
request for rescheduling in 10ms.

Why should we create 1KHz timers or so when this way seems to be much
more elegant and will work even on XXGHz systems.


bye

kabi@i.am


^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-10 18:24                           ` Jamie Lokier
@ 2001-04-10 19:28                             ` george anzinger
  2001-04-10 20:02                               ` mark salisbury
  2001-08-01  1:08                               ` george anzinger
  2001-04-10 19:50                             ` Zdenek Kabelac
  1 sibling, 2 replies; 118+ messages in thread
From: george anzinger @ 2001-04-10 19:28 UTC (permalink / raw)
  To: Jamie Lokier, high-res-timers-discourse
  Cc: Alan Cox, Mikulas Patocka, David Schleef, Mark Salisbury,
	Jeff Dike, schwidefsky, linux-kernel

Just for your information we have a project going that is trying to come
up with a good solution for all of this:

http://sourceforge.net/projects/high-res-timers

We have a mailing list there where we have discussed much of the same
stuff.  The mailing list archives are available at sourceforge.

Lets separate this into findings and tentative conclusions :)

Findings:

a) The University of Kansas and others have done a lot of work here.

b) High resolution timer events can be had with or without changing HZ.

c) High resolution timer events can be had with or without eliminating
the 1/HZ tick.

d) The organization of the timer list should reflect the existence of
the 1/HZ tick or not.  The current structure is not optimal for a "tick
less" implementation.  Better would be strict expire order with indexes
to "interesting times".

e) The current organization of the timer list generates a hiccup every
2.56 seconds to handle "cascading".  Hiccups are bad.

f) As noted, the account timers (task user/system times) would be much
more accurate with the tick less approach.  The cost is added code in
both the system call and the schedule path.  

Tentative conclusions:

Currently we feel that the tick less approach is not acceptable due to
(f).  We felt that this added code would NOT be welcome AND would, in a
reasonably active system, have much higher overhead than any savings in
not having a tick.  Also (d) implies a list organization that will, at
the very least, be harder to understand.  (We have some thoughts here,
but abandoned the effort because of (f).)  We are, of course, open to
discussion on this issue and all others related to the project
objectives.

We would reorganize the current timer list structure to eliminate the
cascade (e) and to add higher resolution entries.  The higher resolution
entries would carry an additional word which would be the fraction of a
jiffie that needs to be added to the jiffie value for the timer.  This
fraction would be in units defined by the platform to best suit the sub
jiffie interrupt generation code.  Each of the timer lists would then be
ordered by time based on this sub jiffie value.  In addition, in order
to eliminate the cascade, each timer list would carry all timers for
times that expire on the (jiffie mod (size of list)).  Thus, with the
current 256 first order lists, all timers with the same (jiffies & 255)
would be in the same list, again in expire order.  We also think that
the list size should be configurable to some power of two.  Again we
welcome discussion of these issues.
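
[Editorial note: the cascade-free bucketing described above can be sketched as
follows -- with a power-of-two number of lists, a timer for absolute jiffie J
lives on list (J & (NLISTS-1)), and within a list timers are ordered by the
full (jiffie, sub-jiffie fraction) pair.  Names and the 16-bit fraction width
are invented for illustration.]

```c
/* Bucketing and ordering sketch for the proposed cascade-free lists. */
#include <assert.h>

#define NLISTS 256   /* configurable power of two */

/* Which of the NLISTS expire-ordered lists a timer belongs on. */
static unsigned int timer_bucket(unsigned long expires_jiffies)
{
    return expires_jiffies & (NLISTS - 1);
}

/* A comparable ordering key: whole jiffies in the high bits, the
 * platform-defined sub-jiffie fraction (here 16 bits) in the low bits. */
static unsigned long long timer_key(unsigned long jiffie, unsigned int frac16)
{
    return ((unsigned long long)jiffie << 16) | (frac16 & 0xffff);
}
```

Because a list holds every timer whose expiry is congruent mod NLISTS, there is
never a cascade step; the tradeoff is that a list can contain timers many
rounds in the future, which is why each list must stay in strict expire order.
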

George

Alan Cox wrote:

>> It's also all interrupts, not only syscalls, and also context switch if you
>> want to be accurate.

>We dont need to be that accurate. Our sample rate is currently so low the
>data is worthless anyway

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-10 12:32             ` Andi Kleen
  2001-04-10 12:36               ` Alan Cox
@ 2001-04-10 18:45               ` Stephen D. Williams
  2001-04-10 19:59                 ` Andi Kleen
  1 sibling, 1 reply; 118+ messages in thread
From: Stephen D. Williams @ 2001-04-10 18:45 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Alan Cox, Mark Salisbury, Jeff Dike, schwidefsky, linux-kernel

When this is rewritten, I would strongly suggest that we find a way to
make 'gettimeofday' nearly free.  Certain applications need to use this
frequently while still being portable.  One solution when you do have
clock ticks is a read-only mapped Int.  Another cheap solution is
library assembly that adds a cycle clock delta since last system call to
a 'gettimeofday' value set on every system call return.
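
[Editorial note: the second trick Stephen describes -- anchor a (timeofday,
cycle-counter) pair at each system call return and extrapolate in user space --
amounts to the arithmetic below.  Structure and names are invented for
illustration; real implementations (e.g. vsyscalls) differ.]

```c
/* Cheap gettimeofday sketch: extrapolate from the last syscall's
 * anchored time using a calibrated cycle counter.  Illustrative only. */
#include <assert.h>

struct timestamp_anchor {
    long long usec_at_syscall;   /* gettimeofday result at last syscall */
    long long cycles_at_syscall; /* cycle counter at the same instant   */
    long long cycles_per_usec;   /* calibrated once, e.g. at boot       */
};

/* Estimate current time in microseconds without entering the kernel. */
static long long fast_gettimeofday_usec(const struct timestamp_anchor *a,
                                        long long cycles_now)
{
    long long delta = cycles_now - a->cycles_at_syscall;
    return a->usec_at_syscall + delta / a->cycles_per_usec;
}
```
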

sdw

Andi Kleen wrote:
> 
> On Tue, Apr 10, 2001 at 01:12:14PM +0100, Alan Cox wrote:
> > Measure the number of clocks executing a timer interrupt. rdtsc is fast. Now
> > consider the fact that out of this you get KHz or better scheduling
> > resolution required for games and midi. I'd say it looks good. I agree
> 
> And measure the number of cycles a gigahertz CPU can do between a 1ms timer.
> And then check how often the typical application executes something like
> gettimeofday.
> 
...

sdw
-- 
sdw@lig.net  http://sdw.st
Stephen D. Williams
43392 Wayside Cir,Ashburn,VA 20147-4622 703-724-0118W 703-995-0407Fax 
Dec2000

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-10 18:17                         ` Alan Cox
@ 2001-04-10 18:24                           ` Jamie Lokier
  2001-04-10 19:28                             ` george anzinger
  2001-04-10 19:50                             ` Zdenek Kabelac
  0 siblings, 2 replies; 118+ messages in thread
From: Jamie Lokier @ 2001-04-10 18:24 UTC (permalink / raw)
  To: Alan Cox
  Cc: Mikulas Patocka, David Schleef, Mark Salisbury, Jeff Dike,
	schwidefsky, linux-kernel

Alan Cox wrote:
> > > This is an X issue. I was talking with Jim Gettys about what is needed to
> > > get the relevant existing X extensions for this working
> > 
> > Last time I looked at XF86DGA (years ago), it seemed to have the right
> > hooks in place.  Just a matter of server implementation.  My
> > recollection is dusty though.
> 
> There is also a timing extension for synchronizing events to happenings. The
> stopper is the kernel interface for the vblank handling since the irq must
> be handled and cleared in kernel context before we return to X. We now think
> we know how to handle that cleanly

Except for cards which don't generate an irq on vsync but do let you see
how the raster is proceeding.  (I vaguely recall these).  For which a
PLL and, wait for it.... high resolution timer is needed.

Perhaps that's a 1990s problem that doesn't need a 2000s solution though :-)

-- Jamie

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-10 17:35                       ` Jamie Lokier
@ 2001-04-10 18:17                         ` Alan Cox
  2001-04-10 18:24                           ` Jamie Lokier
  0 siblings, 1 reply; 118+ messages in thread
From: Alan Cox @ 2001-04-10 18:17 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Alan Cox, Mikulas Patocka, David Schleef, Mark Salisbury,
	Jeff Dike, schwidefsky, linux-kernel

> > This is an X issue. I was talking with Jim Gettys about what is needed to
> > get the relevant existing X extensions for this working
> 
> Last time I looked at XF86DGA (years ago), it seemed to have the right
> hooks in place.  Just a matter of server implementation.  My
> recollection is dusty though.

There is also a timing extension for synchronizing events to happenings. The
stopper is the kernel interface for the vblank handling since the irq must
be handled and cleared in kernel context before we return to X. We now think
we know how to handle that cleanly

Alan


^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-10 11:43           ` David Schleef
  2001-04-10 12:04             ` Mikulas Patocka
  2001-04-10 12:19             ` Mark Salisbury
@ 2001-04-10 17:51             ` yodaiken
  2 siblings, 0 replies; 118+ messages in thread
From: yodaiken @ 2001-04-10 17:51 UTC (permalink / raw)
  To: David Schleef
  Cc: Alan Cox, Mikulas Patocka, Mark Salisbury, Jeff Dike,
	schwidefsky, linux-kernel

On Tue, Apr 10, 2001 at 04:43:36AM -0700, David Schleef wrote:
> However, on machines without a monotonically increasing counter,
> i.e., the TSC, you have to use 8254 timer 0 as both the timebase
> and the interval counter -- you end up slowly losing time because
> of the race condition between reading the timer and writing a
> new interval.  RTAI's solution is to disable kd_mksound and
> use timer 2 as a poor man's TSC.  If either of those is too big
> of a price, it may suffice to report that the timer granularity
> on 486's is 10 ms.

Just for the record, Michael Barabanov did this in RTLinux from before
kd_mksound was a function pointer in 1995. Michael had an optimization
attempt using channel 1 for a while, but on very slow machines this 
was not sufficient and he went back to channel 2. Of course, the 
fundamental problem is that board designers keep putting a 1920s
part in machines built in 2001. 

Here's the comment from the RTLinux 0.5 patch -- all available on the archives
on rtlinux.com.

+/* The main procedure; resets the 8254 timer to generate an interrupt.  The
+ * tricky part is to keep the global time while reprogramming it.  We latch
+ * counters 0 and 2 atomically before and after reprogramming to figure it out.
+ */
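
[Editorial note: the bookkeeping that comment alludes to reduces to arithmetic
like the following.  The 8254 channel 0 counts DOWN from the programmed
interval, so the ticks elapsed since the last reload are (programmed -
latched); latching before and after the reload brackets the ticks lost to the
reprogramming itself.  Function names are invented and no port I/O is shown --
this is a sketch of the accounting, not of the RTLinux patch.]

```c
/* 8254 reload accounting sketch (pure arithmetic, no hardware access). */
#include <assert.h>

/* Ticks consumed since the down-counter was loaded with 'programmed'. */
static unsigned int pit_elapsed(unsigned int programmed, unsigned int latched)
{
    return programmed - latched;
}

/* Advance the global tick count across a reload: credit what ran off
 * the old interval plus the ticks that slipped by while reprogramming
 * (measured by the before/after latches). */
static unsigned long long pit_account_reload(unsigned long long global_ticks,
                                             unsigned int old_programmed,
                                             unsigned int latch_before,
                                             unsigned int reprogram_slip)
{
    return global_ticks + pit_elapsed(old_programmed, latch_before)
                        + reprogram_slip;
}
```
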


-- 
---------------------------------------------------------
Victor Yodaiken 
Finite State Machine Labs: The RTLinux Company.
 www.fsmlabs.com  www.rtlinux.com


^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-10 17:27                     ` Alan Cox
@ 2001-04-10 17:35                       ` Jamie Lokier
  2001-04-10 18:17                         ` Alan Cox
  2001-04-10 19:42                       ` Zdenek Kabelac
  1 sibling, 1 reply; 118+ messages in thread
From: Jamie Lokier @ 2001-04-10 17:35 UTC (permalink / raw)
  To: Alan Cox
  Cc: Mikulas Patocka, David Schleef, Mark Salisbury, Jeff Dike,
	schwidefsky, linux-kernel

Alan Cox wrote:
> > Games would like to be able to page flip at vertical refresh time --
> > <1ms accuracy please.  Network traffic shaping benefits from better than
> 
> This is an X issue. I was talking with Jim Gettys about what is needed to
> get the relevant existing X extensions for this working

Last time I looked at XF86DGA (years ago), it seemed to have the right
hooks in place.  Just a matter of server implementation.  My
recollection is dusty though.

-- Jamie

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-10 17:15                   ` Jamie Lokier
@ 2001-04-10 17:27                     ` Alan Cox
  2001-04-10 17:35                       ` Jamie Lokier
  2001-04-10 19:42                       ` Zdenek Kabelac
  0 siblings, 2 replies; 118+ messages in thread
From: Alan Cox @ 2001-04-10 17:27 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Mikulas Patocka, David Schleef, Alan Cox, Mark Salisbury,
	Jeff Dike, schwidefsky, linux-kernel

> Games would like to be able to page flip at vertical refresh time --
> <1ms accuracy please.  Network traffic shaping benefits from better than

This is an X issue. I was talking with Jim Gettys about what is needed to
get the relevant existing X extensions for this working


^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-10 14:10                 ` Mikulas Patocka
                                     ` (2 preceding siblings ...)
  2001-04-10 15:43                   ` Alan Cox
@ 2001-04-10 17:15                   ` Jamie Lokier
  2001-04-10 17:27                     ` Alan Cox
  3 siblings, 1 reply; 118+ messages in thread
From: Jamie Lokier @ 2001-04-10 17:15 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: David Schleef, Alan Cox, Mark Salisbury, Jeff Dike, schwidefsky,
	linux-kernel

Mikulas Patocka wrote:
> Timers more precise than 100HZ probably aren't needed - as MIN_RTO is 0.2s
> and MIN_DELACK is 0.04s, TCP would hardly benefit from them.

Indeed, using precise timers for TCP would probably degrade performance
-- you should process a group of timer events together, when resolution
is not that important.

There are plenty of apps that need higher resolution.  Software modem
comes to mind (guess why ;), though the device driver supplies the high
resolution timed interrupts in that case.

Games would like to be able to page flip at vertical refresh time --
<1ms accuracy please.  Network traffic shaping benefits from better than
1ms timing.  Video players want to display their frames preferably
without 10ms jitter.

Even that old classic game "snake" benefits from decent timing.  I
worked on an X multiplayer snake implementation which was very
unpleasant and jerky at first.  1. Disable nagle for X connection :-)
Better but still jerky.  2. Write delay loop like this:

     calculate next_event_time
     select (0, 0, 0, next_event_time - 20ms)
     while (gettimeofday() < next_event_time)
       /* Busy loop for last 20ms. */

It's no coincidence that I've had to write another very similar event
loop recently.  You can see, this sort of thing is a real waste of CPU.
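
[Editorial note: a runnable C rendering of the pseudocode above -- coarse-sleep
with select() to within ~20 ms of the deadline, then busy-wait on
gettimeofday() for the remainder.  This is the user-space workaround that
finer-grained kernel timers would make unnecessary; names are ours.]

```c
/* Hybrid coarse-sleep + busy-wait delay loop (POSIX). */
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

static long long now_usec(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (long long)tv.tv_sec * 1000000 + tv.tv_usec;
}

/* Sleep until 'deadline_usec', burning CPU for the last 'spin_usec'. */
static void wait_until(long long deadline_usec, long long spin_usec)
{
    long long coarse = deadline_usec - spin_usec - now_usec();
    if (coarse > 0) {
        struct timeval tv = { coarse / 1000000, coarse % 1000000 };
        select(0, NULL, NULL, NULL, &tv);  /* 10 ms-granular sleep */
    }
    while (now_usec() < deadline_usec)
        ;                                  /* busy loop for the last bit */
}
```

The busy loop is where the wasted CPU goes: with only 10 ms sleep granularity,
the final stretch before the deadline can only be covered by spinning.
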

-- Jamie

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-10 14:10                 ` Mikulas Patocka
  2001-04-10 13:35                   ` root
  2001-04-10 14:22                   ` Andi Kleen
@ 2001-04-10 15:43                   ` Alan Cox
  2001-04-12  5:25                     ` watermodem
  2001-04-10 17:15                   ` Jamie Lokier
  3 siblings, 1 reply; 118+ messages in thread
From: Alan Cox @ 2001-04-10 15:43 UTC (permalink / raw)
  To: mikulas
  Cc: David Schleef, Alan Cox, Mark Salisbury, Jeff Dike, schwidefsky,
	linux-kernel

> Timers more precise than 100HZ probably aren't needed - as MIN_RTO is 0.2s
> and MIN_DELACK is 0.04s, TCP would hardly benefit from them.

There are a considerable number of people who really do need 1Khz resolution.
Midi is one of the example cases. That doesn't mean we have to go to a 1KHz
timer, we may want to combine a 100Hz timer with a smart switch to 1Khz


^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
@ 2001-04-10 14:42 schwidefsky
  0 siblings, 0 replies; 118+ messages in thread
From: schwidefsky @ 2001-04-10 14:42 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: David Schleef, Alan Cox, Mark Salisbury, Jeff Dike, linux-kernel



>BTW. Why do we need to redesign timers at all? The cost of a timer interrupt
>each 1/100 second is nearly zero (1000 instances on S/390 VM is not a common
>case - it is not reasonable to degrade the performance of timers because of
>this).
The cost of the timer interrupts on a single image system is negligible,
true. As I already pointed out in the original proposal we are looking
for a solution that will allow us to minimize the costs of the timer
interrupts when we run many images. For us this case is not unusual and
it is reasonable to degrade performance of a running system by a very
small amount to get rid of the HZ timer. This proposal was never meant
to be the perfect solution for every platform, that is why it is
configurable with the CONFIG_NO_HZ_TIMER option.

blue skies,
   Martin

Linux/390 Design & Development, IBM Deutschland Entwicklung GmbH
Schönaicherstr. 220, D-71032 Böblingen, Telefon: 49 - (0)7031 - 16-2247
E-Mail: schwidefsky@de.ibm.com



^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-10 14:10                 ` Mikulas Patocka
  2001-04-10 13:35                   ` root
@ 2001-04-10 14:22                   ` Andi Kleen
  2001-04-10 15:43                   ` Alan Cox
  2001-04-10 17:15                   ` Jamie Lokier
  3 siblings, 0 replies; 118+ messages in thread
From: Andi Kleen @ 2001-04-10 14:22 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: David Schleef, Alan Cox, Mark Salisbury, Jeff Dike, schwidefsky,
	linux-kernel

On Tue, Apr 10, 2001 at 04:10:28PM +0200, Mikulas Patocka wrote:
> Timers more precise than 100HZ are probably not needed - as MIN_RTO is 0.2s
> and MIN_DELACK is 0.04s, TCP would hardly benefit from them.

In networking, bandwidth shaping and traffic control generally need
higher-precision timers. There are also people who don't like the minimum 10ms
delay for a non-busy wait; it's apparently a problem for some apps.

-Andi

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-10 12:31               ` David Schleef
  2001-04-10 12:34                 ` Mark Salisbury
@ 2001-04-10 14:10                 ` Mikulas Patocka
  2001-04-10 13:35                   ` root
                                     ` (3 more replies)
  1 sibling, 4 replies; 118+ messages in thread
From: Mikulas Patocka @ 2001-04-10 14:10 UTC (permalink / raw)
  To: David Schleef
  Cc: Alan Cox, Mark Salisbury, Jeff Dike, schwidefsky, linux-kernel

> > Note that you call mod_timer also on each packet received - and in worst
> > case (which may happen), you end up reprogramming the PIT on each packet.
> 
> This just indicates that the interface needs to be richer -- i.e.,
> such as having a "lazy timer" that means: "wake me up when _at least_
> N ns have elapsed, but there's no hurry."  If waking you up at N ns
> is expensive, then the wakeup is delayed until something else
> interesting happens.
> 
> This is effectively what we have now anyway.  Perhaps the
> current add_timer() should be mapped to lazy timers.

BTW, why do we need to redesign timers at all? The cost of a timer interrupt
every 1/100 second is nearly zero (1000 instances on S/390 VM is not a common
case - it is not reasonable to degrade timer performance because of
this).

Timers more precise than 100HZ are probably not needed - as MIN_RTO is 0.2s
and MIN_DELACK is 0.04s, TCP would hardly benefit from them.

Mikulas




^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-10 14:10                 ` Mikulas Patocka
@ 2001-04-10 13:35                   ` root
  2001-04-10 14:22                   ` Andi Kleen
                                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 118+ messages in thread
From: root @ 2001-04-10 13:35 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: David Schleef, Alan Cox, Jeff Dike, schwidefsky, linux-kernel

Mikulas Patocka wrote:

> BTW, why do we need to redesign timers at all? The cost of a timer interrupt
> every 1/100 second is nearly zero (1000 instances on S/390 VM is not a common
> case - it is not reasonable to degrade timer performance because of
> this).
>
> Timers more precise than 100HZ are probably not needed - as MIN_RTO is 0.2s
> and MIN_DELACK is 0.04s, TCP would hardly benefit from them.
>

Well, I can think of dozens of real-time applications off the top of my head
that need better than 1ms timing resolution (think sensor fusion). 1000 clock
interrupts/sec is wasteful when what you need is 1 very precisely timed
interrupt.

Why do we redesign anything? To make it better. TCP is not the only thing in
the system.


If you are in love with the existing system, it shouldn't be hard to make it a
config option.


^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-10 12:42           ` Mark Salisbury
@ 2001-04-10 12:54             ` Andi Kleen
  0 siblings, 0 replies; 118+ messages in thread
From: Andi Kleen @ 2001-04-10 12:54 UTC (permalink / raw)
  To: Mark Salisbury; +Cc: Andi Kleen, linux-kernel

On Tue, Apr 10, 2001 at 08:42:33AM -0400, Mark Salisbury wrote:
> On Tue, 10 Apr 2001, Andi Kleen wrote:
> > .... Also generally the kernel has quite a lot of timers. 
> 
> are these generally of the long-interval, failure-timeout type?

A lot of them are, but not all.

-Andi

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
@ 2001-04-10 12:54 schwidefsky
  0 siblings, 0 replies; 118+ messages in thread
From: schwidefsky @ 2001-04-10 12:54 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Alan Cox, Andi Kleen, Mark Salisbury, Jeff Dike, linux-kernel



>Does not sound very attractive at all on non-virtual machines (I see the
>point on UML/VM):
>making system entry/context switches/interrupts slower, making add_timer
>slower, just to process a few less timer interrupts. That's like robbing
>the fast paths for a slow path.
The system entry/exit/context switch is slower. The add_timer/mod_timer is
only a little bit slower in the case where a new soonest timer event has
been created. I think you can forget the additional overhead for
add_timer/mod_timer; it's the additional path length on the system
entry/exit that might be problematic.

blue skies,
   Martin

Linux/390 Design & Development, IBM Deutschland Entwicklung GmbH
Schönaicherstr. 220, D-71032 Böblingen, Telefon: 49 - (0)7031 - 16-2247
E-Mail: schwidefsky@de.ibm.com



^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-10 12:07       ` Mark Salisbury
@ 2001-04-10 12:45         ` Andi Kleen
  2001-04-10 12:42           ` Mark Salisbury
  0 siblings, 1 reply; 118+ messages in thread
From: Andi Kleen @ 2001-04-10 12:45 UTC (permalink / raw)
  To: Mark Salisbury; +Cc: Andi Kleen, linux-kernel

On Tue, Apr 10, 2001 at 08:07:04AM -0400, Mark Salisbury wrote:
> Which kind of U/K accounting are you referring to?
> 
> Are you referring to global changes in world time? Are you referring to time
> used by a process?

The latter.

> 
> I think the reduction of clock interrupts by a factor of 10 would buy us some
> performance margin that could be traded for a slightly more complex handler.

It depends on your workload whether you can trade that in, e.g. when a few
hundred TCP sockets are active, a lot of timer ticks will run some timer
handler. Also, generally the kernel has quite a lot of timers. There is some
reduction on an idle system. That is no doubt useful for VM/UML/VMware, where
you can have idle sessions hanging around, but otherwise it's not very
interesting to optimize idle systems (except maybe for power-saving purposes)


-Andi

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-10 12:45         ` Andi Kleen
@ 2001-04-10 12:42           ` Mark Salisbury
  2001-04-10 12:54             ` Andi Kleen
  0 siblings, 1 reply; 118+ messages in thread
From: Mark Salisbury @ 2001-04-10 12:42 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Andi Kleen, linux-kernel

On Tue, 10 Apr 2001, Andi Kleen wrote:
> .... Also generally the kernel has quite a lot of timers. 

Are these generally of the long-interval, failure-timeout type?
I.e. 5-second device timeouts that are killed before they get executed because
the device didn't time out? If so, they have no effect on the number of timer
interrupts because they generally never get hit. And when they do, you have to
pay the price anyway.

 -- 
/*------------------------------------------------**
**   Mark Salisbury | Mercury Computer Systems    **
**   mbs@mc.com     | System OS - Kernel Team     **
**------------------------------------------------**
**  I will be riding in the Multiple Sclerosis    **
**  Great Mass Getaway, a 150 mile bike ride from **
**  Boston to Provincetown.  Last year I raised   **
**  over $1200.  This year I would like to beat   **
**  that.  If you would like to contribute,       **
**  please contact me.                            **
**------------------------------------------------*/


^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-10 12:36               ` Alan Cox
@ 2001-04-10 12:37                 ` Andi Kleen
  0 siblings, 0 replies; 118+ messages in thread
From: Andi Kleen @ 2001-04-10 12:37 UTC (permalink / raw)
  To: Alan Cox; +Cc: Andi Kleen, Mark Salisbury, Jeff Dike, schwidefsky, linux-kernel

On Tue, Apr 10, 2001 at 01:36:27PM +0100, Alan Cox wrote:
> > It's also all interrupts, not only syscalls, and also context switch if you
> > want to be accurate.
> 
> We don't need to be that accurate. Our sample rate is currently so low that
> the data is worthless anyway

But without checking on context switch you would completely lose the per-pid
system/user/total information that is currently collected.

-Andi

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-10 12:32             ` Andi Kleen
@ 2001-04-10 12:36               ` Alan Cox
  2001-04-10 12:37                 ` Andi Kleen
  2001-04-10 18:45               ` Stephen D. Williams
  1 sibling, 1 reply; 118+ messages in thread
From: Alan Cox @ 2001-04-10 12:36 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Alan Cox, Andi Kleen, Mark Salisbury, Jeff Dike, schwidefsky,
	linux-kernel

> It's also all interrupts, not only syscalls, and also context switch if you
> want to be accurate.

We don't need to be that accurate. Our sample rate is currently so low that
the data is worthless anyway


^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-10 12:31               ` David Schleef
@ 2001-04-10 12:34                 ` Mark Salisbury
  2001-04-10 14:10                 ` Mikulas Patocka
  1 sibling, 0 replies; 118+ messages in thread
From: Mark Salisbury @ 2001-04-10 12:34 UTC (permalink / raw)
  To: David Schleef, David Schleef, Mikulas Patocka
  Cc: Alan Cox, Jeff Dike, schwidefsky, linux-kernel

On Tue, 10 Apr 2001, David Schleef wrote:
> This just indicates that the interface needs to be richer -- i.e.,
> such as having a "lazy timer" that means: "wake me up when _at least_
> N ns have elapsed, but there's no hurry."  If waking you up at N ns
> is expensive, then the wakeup is delayed until something else
> interesting happens.

All POSIX timer and timer-like functions (timer_xxx, nanosleep, alarm, etc.)
are defined to operate on a 'no earlier than' semantic, i.e. if you ask to
nanosleep for 500 nsec, you will sleep for no less than 500 nanoseconds, but
possibly up to 20,000,500 nanoseconds.  This is because the maximum legal POSIX
time resolution is 20,000,000 nanoseconds (1/50th second, or 50Hz, because of
European electricity and old hardware)

-- 
/*------------------------------------------------**
**   Mark Salisbury | Mercury Computer Systems    **
**   mbs@mc.com     | System OS - Kernel Team     **
**------------------------------------------------**
**  I will be riding in the Multiple Sclerosis    **
**  Great Mass Getaway, a 150 mile bike ride from **
**  Boston to Provincetown.  Last year I raised   **
**  over $1200.  This year I would like to beat   **
**  that.  If you would like to contribute,       **
**  please contact me.                            **
**------------------------------------------------*/


^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-10 12:12           ` Alan Cox
  2001-04-10 12:27             ` Mark Salisbury
@ 2001-04-10 12:32             ` Andi Kleen
  2001-04-10 12:36               ` Alan Cox
  2001-04-10 18:45               ` Stephen D. Williams
  1 sibling, 2 replies; 118+ messages in thread
From: Andi Kleen @ 2001-04-10 12:32 UTC (permalink / raw)
  To: Alan Cox; +Cc: Andi Kleen, Mark Salisbury, Jeff Dike, schwidefsky, linux-kernel

On Tue, Apr 10, 2001 at 01:12:14PM +0100, Alan Cox wrote:
> Measure the number of clocks executing a timer interrupt. rdtsc is fast. Now
> consider the fact that out of this you get KHz or better scheduling 
> resolution required for games and midi. I'd say it looks good. I agree

And measure the number of cycles a gigahertz CPU can do between two ticks of a
1ms timer. And then check how often the typical application executes something
like gettimeofday.

> the accounting of user/system time needs care to avoid slowing down syscall
> paths

It's also all interrupts, not only syscalls, and also context switch if you
want to be accurate.

On modern PC hardware it might be possible to do user/system accounting using
performance MSRs. They have a bit in the performance counter control that
allows counting only user or only system events. If you find an event count
that is nearly equivalent to cycles, you have both: total = rdtsc, user = msr,
system = rdtsc - msr. At least PPro-derived CPUs have event 0x16, number of
instructions executed, which might be good enough when multiplied by a factor
if your instruction mix is not too unusual.

Still, even with that, the more complex checking in add_timer doesn't look too good.


-Andi

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-10 12:04             ` Mikulas Patocka
@ 2001-04-10 12:31               ` David Schleef
  2001-04-10 12:34                 ` Mark Salisbury
  2001-04-10 14:10                 ` Mikulas Patocka
  0 siblings, 2 replies; 118+ messages in thread
From: David Schleef @ 2001-04-10 12:31 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Alan Cox, Mark Salisbury, Jeff Dike, schwidefsky, linux-kernel

On Tue, Apr 10, 2001 at 02:04:17PM +0200, Mikulas Patocka wrote:
> 
> Adding and removing timers happens much more frequently than PIT tick, so
> comparing these times is pointless.
> 
> If you have some device and timer protecting it from lockup on buggy
> hardware, you actually
> 
> send request to device
> add timer
> 
> receive interrupt and read reply
> remove timer
> 
> With the current timer semantics, the cost of add_timer and del_timer is
> nearly zero. If you had to reprogram the PIT on each request and reply, it
> would slow things down.
> 
> Note that you call mod_timer also on each packet received - and in worst
> case (which may happen), you end up reprogramming the PIT on each packet.

This just indicates that the interface needs to be richer -- i.e.,
such as having a "lazy timer" that means: "wake me up when _at least_
N ns have elapsed, but there's no hurry."  If waking you up at N ns
is expensive, then the wakeup is delayed until something else
interesting happens.

This is effectively what we have now anyway.  Perhaps the
current add_timer() should be mapped to lazy timers.




dave...


^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-10 12:12           ` Alan Cox
@ 2001-04-10 12:27             ` Mark Salisbury
  2001-04-10 12:32             ` Andi Kleen
  1 sibling, 0 replies; 118+ messages in thread
From: Mark Salisbury @ 2001-04-10 12:27 UTC (permalink / raw)
  To: Alan Cox, Andi Kleen
  Cc: Alan Cox, Andi Kleen, Mark Salisbury, Jeff Dike, schwidefsky,
	linux-kernel

On Tue, 10 Apr 2001, Alan Cox wrote:
> > Does not sound very attractive all at all on non virtual machines (I see the point on
> > UML/VM):
> > making system entry/context switch/interrupts slower, making add_timer slower, just to 
> > process a few less timer interrupts. That's like robbing the fast paths for a slow path.
> 
> Measure the number of clocks executing a timer interrupt. rdtsc is fast. Now
> consider the fact that out of this you get KHz or better scheduling 
> resolution required for games and midi. I'd say it looks good. I agree
> the accounting of user/system time needs care to avoid slowing down syscall
> paths
> 
> Alan

The system I implemented this in went from 25 millisecond resolution to 25-60
nanosecond resolution (depending on the CPU underneath). That is a theoretical
factor of 500,000 to 1,000,000 improvement in resolution for timed events, and
the clock overhead after the change was about the same (+-10% depending on the
underlying CPU).

This is on a system that only had one clock tick per process quantum, as
opposed to the 10 in Linux.





-- 
/*------------------------------------------------**
**   Mark Salisbury | Mercury Computer Systems    **
**   mbs@mc.com     | System OS - Kernel Team     **
**------------------------------------------------**
**  I will be riding in the Multiple Sclerosis    **
**  Great Mass Getaway, a 150 mile bike ride from **
**  Boston to Provincetown.  Last year I raised   **
**  over $1200.  This year I would like to beat   **
**  that.  If you would like to contribute,       **
**  please contact me.                            **
**------------------------------------------------*/


^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-10 11:43           ` David Schleef
  2001-04-10 12:04             ` Mikulas Patocka
@ 2001-04-10 12:19             ` Mark Salisbury
  2001-04-10 17:51             ` yodaiken
  2 siblings, 0 replies; 118+ messages in thread
From: Mark Salisbury @ 2001-04-10 12:19 UTC (permalink / raw)
  To: David Schleef, David Schleef, Alan Cox
  Cc: Mikulas Patocka, Jeff Dike, schwidefsky, linux-kernel

On Tue, 10 Apr 2001, David Schleef wrote:
> i.e., the TSC, you have to use 8254 timer 0 as both the timebase
> and the interval counter -- you end up slowly losing time because
> of the race condition between reading the timer and writing a
> new interval.  

Actually, I have an algorithm to fix that (I had to implement this on a system
with a single 32-bit decrementer - an ADI 21060 SHARC, yuk!).  The algorithm
simulates a free-spinning 64-bit incrementer given a 32-bit interrupting
decrementer under exclusive control of the timekeeping code.  It also takes
into account the read/calculate/write interval.


  
> It would be nice to see any redesign in this area make it more
> modular.  I have hardware that would make it possible to slave
> the Linux system clock directly off a high-accuracy timebase,
> which would be super-useful for some applications.  I've been
> doing some of this already, both as a kernel patch and as part
> of RTAI; search for 'timekeeper' in the LKML archives if interested.
> 
> 
> 
> 
> dave...
-- 
/*------------------------------------------------**
**   Mark Salisbury | Mercury Computer Systems    **
**   mbs@mc.com     | System OS - Kernel Team     **
**------------------------------------------------**
**  I will be riding in the Multiple Sclerosis    **
**  Great Mass Getaway, a 150 mile bike ride from **
**  Boston to Provincetown.  Last year I raised   **
**  over $1200.  This year I would like to beat   **
**  that.  If you would like to contribute,       **
**  please contact me.                            **
**------------------------------------------------*/


^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-10  9:33       ` Martin Mares
  2001-04-10 10:00         ` Albert D. Cahalan
@ 2001-04-10 12:14         ` Mark Salisbury
  2001-04-11  5:55           ` Karim Yaghmour
  1 sibling, 1 reply; 118+ messages in thread
From: Mark Salisbury @ 2001-04-10 12:14 UTC (permalink / raw)
  To: Martin Mares, Andi Kleen; +Cc: Jeff Dike, schwidefsky, linux-kernel

On Tue, 10 Apr 2001, Martin Mares wrote:
> Except for machines with very slow timers we really should account time
> to processes during context switch instead of sampling on timer ticks.
> The current values are in many situations (i.e., lots of processes
> or a process frequently waiting for events bound to timer) a pile
> of random numbers.

Yup.  However, there is a performance penalty even on fast machines for
fine-grained process time accounting, and in the past there has been a
strong reluctance to add overhead to syscalls and other context switches.

It would probably be good to make this a compile-time config option allowing
fine or coarse process time accounting; that leaves the choice to the person
setting up the system, based on their needs.


 -- 
/*------------------------------------------------**
**   Mark Salisbury | Mercury Computer Systems    **
**   mbs@mc.com     | System OS - Kernel Team     **
**------------------------------------------------**
**  I will be riding in the Multiple Sclerosis    **
**  Great Mass Getaway, a 150 mile bike ride from **
**  Boston to Provincetown.  Last year I raised   **
**  over $1200.  This year I would like to beat   **
**  that.  If you would like to contribute,       **
**  please contact me.                            **
**------------------------------------------------*/


^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-10 12:02         ` Andi Kleen
@ 2001-04-10 12:12           ` Alan Cox
  2001-04-10 12:27             ` Mark Salisbury
  2001-04-10 12:32             ` Andi Kleen
  0 siblings, 2 replies; 118+ messages in thread
From: Alan Cox @ 2001-04-10 12:12 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Alan Cox, Andi Kleen, Mark Salisbury, Jeff Dike, schwidefsky,
	linux-kernel

> Does not sound very attractive all at all on non virtual machines (I see the point on
> UML/VM):
> making system entry/context switch/interrupts slower, making add_timer slower, just to 
> process a few less timer interrupts. That's like robbing the fast paths for a slow path.

Measure the number of clocks spent executing a timer interrupt; rdtsc is fast.
Now consider the fact that out of this you get kHz or better scheduling
resolution, required for games and MIDI. I'd say it looks good. I agree
the accounting of user/system time needs care to avoid slowing down syscall
paths

Alan



^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-09 20:12     ` Alan Cox
  2001-04-09 20:32       ` Mark Salisbury
  2001-04-09 22:31       ` Mikulas Patocka
@ 2001-04-10 12:11       ` Mark Salisbury
  2 siblings, 0 replies; 118+ messages in thread
From: Mark Salisbury @ 2001-04-10 12:11 UTC (permalink / raw)
  To: Alan Cox, Mark Salisbury; +Cc: Jeff Dike, schwidefsky, linux-kernel

On Mon, 09 Apr 2001, Alan Cox wrote:
> 
> Its worth doing even on the ancient x86 boards with the PIT. It does require
> some driver changes since
> 
> 
> 	while(time_before(jiffies, we_explode))
> 		poll_things();
> 
> no longer works

jiffies could be replaced easily enough with a macro such as

#define jiffies (get_time_in_jiffies())

then driver code would not need to be touched.

-- 
/*------------------------------------------------**
**   Mark Salisbury | Mercury Computer Systems    **
**   mbs@mc.com     | System OS - Kernel Team     **
**------------------------------------------------**
**  I will be riding in the Multiple Sclerosis    **
**  Great Mass Getaway, a 150 mile bike ride from **
**  Boston to Provincetown.  Last year I raised   **
**  over $1200.  This year I would like to beat   **
**  that.  If you would like to contribute,       **
**  please contact me.                            **
**------------------------------------------------*/


^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-10  5:51     ` Andi Kleen
  2001-04-10  9:33       ` Martin Mares
  2001-04-10 11:18       ` Alan Cox
@ 2001-04-10 12:07       ` Mark Salisbury
  2001-04-10 12:45         ` Andi Kleen
  2 siblings, 1 reply; 118+ messages in thread
From: Mark Salisbury @ 2001-04-10 12:07 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel

Which kind of U/K accounting are you referring to?

Are you referring to global changes in world time? Are you referring to time
used by a process?

I think the reduction of clock interrupts by a factor of 10 would buy us some
performance margin that could be traded for a slightly more complex handler.

On Tue, 10 Apr 2001, Andi Kleen wrote:
> On Mon, Apr 09, 2001 at 02:19:28PM -0400, Mark Salisbury wrote:
> > this is one of linux biggest weaknesses.  the fixed interval timer is a
> > throwback.  it should be replaced with a variable interval timer with interrupts
> > on demand for any system with a cpu sane/modern enough to have an on-chip
> > interrupting decrementer.  (i.e just about any modern chip)
> 
> Just how would you do kernel/user CPU time accounting then ?  It's currently done 
> on every timer tick, and doing it less often would make it useless.
> 
> 
> -Andi
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
-- 
/*------------------------------------------------**
**   Mark Salisbury | Mercury Computer Systems    **
**   mbs@mc.com     | System OS - Kernel Team     **
**------------------------------------------------**
**  I will be riding in the Multiple Sclerosis    **
**  Great Mass Getaway, a 150 mile bike ride from **
**  Boston to Provincetown.  Last year I raised   **
**  over $1200.  This year I would like to beat   **
**  that.  If you would like to contribute,       **
**  please contact me.                            **
**------------------------------------------------*/


^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-10 11:43           ` David Schleef
@ 2001-04-10 12:04             ` Mikulas Patocka
  2001-04-10 12:31               ` David Schleef
  2001-04-10 12:19             ` Mark Salisbury
  2001-04-10 17:51             ` yodaiken
  2 siblings, 1 reply; 118+ messages in thread
From: Mikulas Patocka @ 2001-04-10 12:04 UTC (permalink / raw)
  To: David Schleef
  Cc: Alan Cox, Mark Salisbury, Jeff Dike, schwidefsky, linux-kernel

> > > > Its worth doing even on the ancient x86 boards with the PIT.
> > > 
> > > Note that programming the PIT is sloooooooow and doing it on every timer
> > > add_timer/del_timer would be a pain.
> > 
> > You only have to do it occasionally.
> > 
> > When you add a timer newer than the current one 
> > 	(arguably newer by at least 1/2*HZ sec)
> > When you finish running the timers at an interval and the new interval is
> > significantly larger than the current one.
> > 
> > Remember each tick we poke the PIT anyway
> 
> Reprogramming takes 3-4 times as long.  However, I still agree
> it's a good idea.

Adding and removing timers happens much more frequently than the PIT tick, so
comparing these times is pointless.

If you have some device and timer protecting it from lockup on buggy
hardware, you actually

send request to device
add timer

receive interrupt and read reply
remove timer

With the current timer semantics, the cost of add_timer and del_timer is
nearly zero. If you had to reprogram the PIT on each request and reply, it
would slow things down.

Note that you call mod_timer also on each packet received - and in the worst
case (which may happen), you end up reprogramming the PIT on each packet.

Mikulas


^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-10 11:18       ` Alan Cox
@ 2001-04-10 12:02         ` Andi Kleen
  2001-04-10 12:12           ` Alan Cox
  0 siblings, 1 reply; 118+ messages in thread
From: Andi Kleen @ 2001-04-10 12:02 UTC (permalink / raw)
  To: Alan Cox; +Cc: Andi Kleen, Mark Salisbury, Jeff Dike, schwidefsky, linux-kernel

On Tue, Apr 10, 2001 at 12:18:03PM +0100, Alan Cox wrote:
> > > interrupting decrementer.  (i.e just about any modern chip)
> > 
> > Just how would you do kernel/user CPU time accounting then ?  It's currently done 
> > on every timer tick, and doing it less often would make it useless.
> 
> On the contrary doing it less often but at the right time massively improves
> its accuracy. You do it on reschedule. An rdtsc instruction is cheap and all
> of a sudden you have nearly cycle accurate accounting

Does not sound very attractive at all on non-virtual machines (I see the point on
UML/VM):
making system entry/context switches/interrupts slower, making add_timer slower, just to
process a few fewer timer interrupts. That's like robbing the fast paths for a slow path.

-Andi

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-10 11:38 schwidefsky
@ 2001-04-10 11:54 ` Alan Cox
  0 siblings, 0 replies; 118+ messages in thread
From: Alan Cox @ 2001-04-10 11:54 UTC (permalink / raw)
  To: schwidefsky; +Cc: Alan Cox, Andi Kleen, Mark Salisbury, Jeff Dike, linux-kernel

> If you do the accounting on reschedule, how do you find out how much time
> has been spent in user versus kernel mode? Or do the Intel chips have two
> counters, one for user space execution and one for the kernel?

Unfortunately not, so you'd need to do a little bit per syscall, at least for
non-trivial syscalls.

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-09 22:35         ` Alan Cox
@ 2001-04-10 11:43           ` David Schleef
  2001-04-10 12:04             ` Mikulas Patocka
                               ` (2 more replies)
  2001-04-11 18:43           ` Oliver Xymoron
  1 sibling, 3 replies; 118+ messages in thread
From: David Schleef @ 2001-04-10 11:43 UTC (permalink / raw)
  To: Alan Cox
  Cc: Mikulas Patocka, Mark Salisbury, Jeff Dike, schwidefsky, linux-kernel

On Mon, Apr 09, 2001 at 11:35:44PM +0100, Alan Cox wrote:
> > > Its worth doing even on the ancient x86 boards with the PIT.
> > 
> > Note that programming the PIT is sloooooooow and doing it on every timer
> > add_timer/del_timer would be a pain.
> 
> You only have to do it occasionally.
> 
> When you add a timer newer than the current one 
> 	(arguably newer by at least 1/2*HZ sec)
> When you finish running the timers at an interval and the new interval is
> significantly larger than the current one.
> 
> Remember each tick we poke the PIT anyway

Reprogramming takes 3-4 times as long.  However, I still agree
it's a good idea.

RTAI will run the 8254 timer in one-shot mode if you ask it to.
However, on machines without a monotonically increasing counter,
i.e., the TSC, you have to use 8254 timer 0 as both the timebase
and the interval counter -- you end up slowly losing time because
of the race condition between reading the timer and writing a
new interval.  RTAI's solution is to disable kd_mksound and
use timer 2 as a poor man's TSC.  If either of those is too big
a price, it may suffice to report that the timer granularity
on 486s is 10 ms.

It would be nice to see any redesign in this area make it more
modular.  I have hardware that would make it possible to slave
the Linux system clock directly off a high-accuracy timebase,
which would be super-useful for some applications.  I've been
doing some of this already, both as a kernel patch and as part
of RTAI; search for 'timekeeper' in the LKML archives if interested.




dave...


^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
@ 2001-04-10 11:38 schwidefsky
  2001-04-10 11:54 ` Alan Cox
  0 siblings, 1 reply; 118+ messages in thread
From: schwidefsky @ 2001-04-10 11:38 UTC (permalink / raw)
  To: Alan Cox; +Cc: Andi Kleen, Mark Salisbury, Jeff Dike, linux-kernel



>> Just how would you do kernel/user CPU time accounting then? It's currently
>> done on every timer tick, and doing it less often would make it useless.
>
>On the contrary, doing it less often but at the right time massively improves
>its accuracy. You do it on reschedule. An rdtsc instruction is cheap and all
>of a sudden you have nearly cycle-accurate accounting.
If you do the accounting on reschedule, how do you find out how much time
has been spent in user versus kernel mode? Or do the Intel chips have two
counters, one for user space execution and one for the kernel?

blue skies,
   Martin

Linux/390 Design & Development, IBM Deutschland Entwicklung GmbH
Schönaicherstr. 220, D-71032 Böblingen, Telefon: 49 - (0)7031 - 16-2247
E-Mail: schwidefsky@de.ibm.com



^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-10  5:51     ` Andi Kleen
  2001-04-10  9:33       ` Martin Mares
@ 2001-04-10 11:18       ` Alan Cox
  2001-04-10 12:02         ` Andi Kleen
  2001-04-10 12:07       ` Mark Salisbury
  2 siblings, 1 reply; 118+ messages in thread
From: Alan Cox @ 2001-04-10 11:18 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Mark Salisbury, Jeff Dike, schwidefsky, linux-kernel

> > interrupting decrementer.  (i.e just about any modern chip)
> 
> Just how would you do kernel/user CPU time accounting then ?  It's currently done 
> on every timer tick, and doing it less often would make it useless.

On the contrary doing it less often but at the right time massively improves
its accuracy. You do it on reschedule. An rdtsc instruction is cheap and all
of a sudden you have nearly cycle accurate accounting
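A minimal sketch of the scheme Alan describes: read the cycle counter at each reschedule and charge the elapsed delta to the outgoing task. All names here are illustrative, and `read_cycles()` is a stand-in for the real `rdtsc`; this is not actual kernel code.

```c
#include <stdint.h>

/* Hypothetical per-task accounting state; the field name is
 * illustrative, not the real task_struct layout. */
struct task_acct {
    uint64_t cycles;        /* cycles charged to this task so far */
};

/* Stand-in for rdtsc: in a real kernel this would read the TSC. */
static uint64_t fake_tsc;
static uint64_t read_cycles(void) { return fake_tsc; }

static uint64_t last_switch;    /* TSC value at the previous reschedule */

/* Charge all cycles since the last context switch to the outgoing
 * task, then remember the switch point for the incoming one. */
static void account_on_reschedule(struct task_acct *prev)
{
    uint64_t now = read_cycles();
    prev->cycles += now - last_switch;
    last_switch = now;
}
```

The point is that one cheap counter read per reschedule replaces a statistical sample per tick, so a task that always yields just before the tick no longer escapes accounting.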

Alan


^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-10  9:33       ` Martin Mares
@ 2001-04-10 10:00         ` Albert D. Cahalan
  2001-04-10 12:14         ` Mark Salisbury
  1 sibling, 0 replies; 118+ messages in thread
From: Albert D. Cahalan @ 2001-04-10 10:00 UTC (permalink / raw)
  To: Martin Mares; +Cc: schwidefsky, linux-kernel

Martin Mares writes:
> [lost]

>> Just how would you do kernel/user CPU time accounting then ?
>> It's currently done on every timer tick, and doing it less
>> often would make it useless.
>
> Except for machines with very slow timers we really should account time
> to processes during context switch instead of sampling on timer ticks.
> The current values are in many situations (i.e., lots of processes
> or a process frequently waiting for events bound to timer) a pile
> of random numbers.

Linux should maintain some sort of per-process decaying average.
This data is required for a Unix98-compliant ps program. (for %CPU)
Currently ps is using total CPU usage over the life of the process.
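The decaying average Albert asks for could look like the following fixed-point sketch. The decay factor and fixed-point shift are illustrative choices, not values from any kernel; `sample` is the fraction of the last interval the process actually ran.

```c
/* Fixed-point exponentially decaying CPU-usage estimate, as a sketch
 * of the per-process %CPU figure a Unix98-compliant ps would need.
 * FSHIFT/DECAY are illustrative constants. */
#define FSHIFT  11
#define FIXED_1 (1 << FSHIFT)
#define DECAY   ((int)(0.8 * FIXED_1))  /* keep 80% of the old estimate */

/* avg and sample are fixed-point fractions of one interval (FIXED_1
 * means "ran the whole interval"). */
static unsigned long decay_avg(unsigned long avg, unsigned long sample)
{
    return (avg * DECAY + sample * (FIXED_1 - DECAY)) >> FSHIFT;
}
```

This is the same shape as the kernel's existing `CALC_LOAD` load-average macro, just applied per process instead of system-wide.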

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-10  5:51     ` Andi Kleen
@ 2001-04-10  9:33       ` Martin Mares
  2001-04-10 10:00         ` Albert D. Cahalan
  2001-04-10 12:14         ` Mark Salisbury
  2001-04-10 11:18       ` Alan Cox
  2001-04-10 12:07       ` Mark Salisbury
  2 siblings, 2 replies; 118+ messages in thread
From: Martin Mares @ 2001-04-10  9:33 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Mark Salisbury, Jeff Dike, schwidefsky, linux-kernel

Hi!

> Just how would you do kernel/user CPU time accounting then ?  It's currently done 
> on every timer tick, and doing it less often would make it useless.

Except for machines with very slow timers we really should account time
to processes during context switch instead of sampling on timer ticks.
The current values are in many situations (i.e., lots of processes
or a process frequently waiting for events bound to timer) a pile
of random numbers.

				Have a nice fortnight
-- 
Martin `MJ' Mares <mj@ucw.cz> <mj@suse.cz> http://atrey.karlin.mff.cuni.cz/~mj/
We all live in a yellow subroutine...

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
@ 2001-04-10  7:29 schwidefsky
  0 siblings, 0 replies; 118+ messages in thread
From: schwidefsky @ 2001-04-10  7:29 UTC (permalink / raw)
  To: Alan Cox; +Cc: Mark Salisbury, Jeff Dike, linux-kernel



> Its worth doing even on the ancient x86 boards with the PIT. It does
> require some driver changes since
>
>     while(time_before(jiffies, we_explode))
>          poll_things();
>
> no longer works
On S/390 we have a big advantage here: driver code of this kind does not
exist. That makes it a lot easier for us compared to other architectures.
As I said in the original posting, the patch I have is working fine for
S/390.

blue skies,
   Martin

Linux/390 Design & Development, IBM Deutschland Entwicklung GmbH
Schönaicherstr. 220, D-71032 Böblingen, Telefon: 49 - (0)7031 - 16-2247
E-Mail: schwidefsky@de.ibm.com



^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
@ 2001-04-10  7:27 schwidefsky
  0 siblings, 0 replies; 118+ messages in thread
From: schwidefsky @ 2001-04-10  7:27 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Mark Salisbury, Jeff Dike, linux-kernel



> Just how would you do kernel/user CPU time accounting then ?  It's
> currently done on every timer tick, and doing it less often would make
> it useless.
This part is architecture dependent. For S/390 I chose to do a "STCK" on
every system entry/exit. Dunno if this can be done on other architectures
too; on S/390 this is reasonably cheap (one STCK costs 15 cycles). That
means the kernel/user CPU time accounting is MUCH better now.
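Boundary-based accounting of the kind Martin describes can be sketched as follows: stamp the clock at every kernel entry and exit, and charge the elapsed span to user or kernel time depending on which side of the boundary it fell. `read_clock()` stands in for STCK (or rdtsc); all names are illustrative.

```c
#include <stdint.h>

/* Stand-in for STCK/rdtsc. */
static uint64_t fake_clock;
static uint64_t read_clock(void) { return fake_clock; }

struct cpu_times {
    uint64_t user;          /* time accumulated in user space */
    uint64_t kernel;        /* time accumulated in the kernel */
    uint64_t last_stamp;    /* clock value at the previous boundary */
};

/* Called on entry to the kernel: everything since the last boundary
 * was spent in user space. */
static void account_entry(struct cpu_times *t)
{
    uint64_t now = read_clock();
    t->user += now - t->last_stamp;
    t->last_stamp = now;
}

/* Called on exit back to user space: the span was kernel time. */
static void account_exit(struct cpu_times *t)
{
    uint64_t now = read_clock();
    t->kernel += now - t->last_stamp;
    t->last_stamp = now;
}
```

This answers Martin Mares' question elsewhere in the thread: no second counter is needed, because the entry/exit points themselves tell you which mode the elapsed span belongs to.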

blue skies,
   Martin

Linux/390 Design & Development, IBM Deutschland Entwicklung GmbH
Schönaicherstr. 220, D-71032 Böblingen, Telefon: 49 - (0)7031 - 16-2247
E-Mail: schwidefsky@de.ibm.com



^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-09 18:19   ` Mark Salisbury
  2001-04-09 20:12     ` Alan Cox
@ 2001-04-10  5:51     ` Andi Kleen
  2001-04-10  9:33       ` Martin Mares
                         ` (2 more replies)
  1 sibling, 3 replies; 118+ messages in thread
From: Andi Kleen @ 2001-04-10  5:51 UTC (permalink / raw)
  To: Mark Salisbury; +Cc: Jeff Dike, schwidefsky, linux-kernel

On Mon, Apr 09, 2001 at 02:19:28PM -0400, Mark Salisbury wrote:
> this is one of linux biggest weaknesses.  the fixed interval timer is a
> throwback.  it should be replaced with a variable interval timer with interrupts
> on demand for any system with a cpu sane/modern enough to have an on-chip
> interrupting decrementer.  (i.e just about any modern chip)

Just how would you do kernel/user CPU time accounting then ?  It's currently done 
on every timer tick, and doing it less often would make it useless.


-Andi

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-09 22:31       ` Mikulas Patocka
@ 2001-04-09 22:35         ` Alan Cox
  2001-04-10 11:43           ` David Schleef
  2001-04-11 18:43           ` Oliver Xymoron
  0 siblings, 2 replies; 118+ messages in thread
From: Alan Cox @ 2001-04-09 22:35 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Alan Cox, Mark Salisbury, Jeff Dike, schwidefsky, linux-kernel

> > Its worth doing even on the ancient x86 boards with the PIT.
> 
> Note that programming the PIT is sloooooooow and doing it on every timer
> add_timer/del_timer would be a pain.

You only have to do it occasionally.

When you add a timer newer than the current one 
	(arguably newer by at least 1/2*HZ sec)
When you finish running the timers at an interval and the new interval is
significantly larger than the current one.

Remember each tick we poke the PIT anyway
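Alan's rule for when the PIT actually needs touching can be sketched like this. The half-second slack and the plain (non-wrap-safe) comparison are simplifications; real jiffies arithmetic would use `time_before()`, and all names are illustrative.

```c
#include <stdbool.h>

#define HZ 100

/* Jiffies value the one-shot PIT is currently programmed to fire at. */
static unsigned long programmed_expiry;

/* Returns true when adding a timer for 'expires' requires
 * reprogramming the PIT: only when it becomes the new earliest event,
 * and (arguably) only when it is earlier by a meaningful margin,
 * e.g. half a second, so back-to-back add_timer calls stay cheap. */
static bool pit_needs_reprogram(unsigned long expires)
{
    if (expires + HZ / 2 < programmed_expiry) {
        programmed_expiry = expires;
        return true;
    }
    return false;
}
```

With a rule like this, most add_timer/del_timer calls never go near the slow PIT I/O ports, which addresses Mikulas' objection above.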


^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-09 20:12     ` Alan Cox
  2001-04-09 20:32       ` Mark Salisbury
@ 2001-04-09 22:31       ` Mikulas Patocka
  2001-04-09 22:35         ` Alan Cox
  2001-04-10 12:11       ` Mark Salisbury
  2 siblings, 1 reply; 118+ messages in thread
From: Mikulas Patocka @ 2001-04-09 22:31 UTC (permalink / raw)
  To: Alan Cox; +Cc: Mark Salisbury, Jeff Dike, schwidefsky, linux-kernel

> > this is one of linux biggest weaknesses.  the fixed interval timer is a
> > throwback.  it should be replaced with a variable interval timer with interrupts
> > on demand for any system with a cpu sane/modern enough to have an on-chip
> > interrupting decrementer.  (i.e just about any modern chip)
> 
> Its worth doing even on the ancient x86 boards with the PIT.

Note that programming the PIT is sloooooooow and doing it on every timer
add_timer/del_timer would be a pain.

Mikulas


^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-09 20:12     ` Alan Cox
@ 2001-04-09 20:32       ` Mark Salisbury
  2001-04-09 22:31       ` Mikulas Patocka
  2001-04-10 12:11       ` Mark Salisbury
  2 siblings, 0 replies; 118+ messages in thread
From: Mark Salisbury @ 2001-04-09 20:32 UTC (permalink / raw)
  To: Alan Cox, Mark Salisbury; +Cc: Jeff Dike, schwidefsky, linux-kernel

On Mon, 09 Apr 2001, Alan Cox wrote:
> > this is one of linux biggest weaknesses.  the fixed interval timer is a
> > throwback.  it should be replaced with a variable interval timer with interrupts
> > on demand for any system with a cpu sane/modern enough to have an on-chip
> > interrupting decrementer.  (i.e just about any modern chip)
> 
> Its worth doing even on the ancient x86 boards with the PIT. It does require
> some driver changes since
> 
> 
> 	while(time_before(jiffies, we_explode))
> 		poll_things();
> 
> no longer works
> 

It would be great if this could be one of the 2.5 goals/projects.

it would make all sorts of fun and useful timed event services easier to
implement and provide (potentially) microsecond resolution as opposed to the
current 10Ms.

plus, we would only get one "sysclock" timer interrupt per process quantum
instead of 10.



-- 
/*------------------------------------------------**
**   Mark Salisbury | Mercury Computer Systems    **
**   mbs@mc.com     | System OS - Kernel Team     **
**------------------------------------------------**
**  I will be riding in the Multiple Sclerosis    **
**  Great Mass Getaway, a 150 mile bike ride from **
**  Boston to Provincetown.  Last year I raised   **
**  over $1200.  This year I would like to beat   **
**  that.  If you would like to contribute,       **
**  please contact me.                            **
**------------------------------------------------*/


^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-09 18:19   ` Mark Salisbury
@ 2001-04-09 20:12     ` Alan Cox
  2001-04-09 20:32       ` Mark Salisbury
                         ` (2 more replies)
  2001-04-10  5:51     ` Andi Kleen
  1 sibling, 3 replies; 118+ messages in thread
From: Alan Cox @ 2001-04-09 20:12 UTC (permalink / raw)
  To: Mark Salisbury; +Cc: Jeff Dike, schwidefsky, linux-kernel

> this is one of linux biggest weaknesses.  the fixed interval timer is a
> throwback.  it should be replaced with a variable interval timer with interrupts
> on demand for any system with a cpu sane/modern enough to have an on-chip
> interrupting decrementer.  (i.e just about any modern chip)

Its worth doing even on the ancient x86 boards with the PIT. It does require
some driver changes since


	while(time_before(jiffies, we_explode))
		poll_things();

no longer works


^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-09 15:54 schwidefsky
@ 2001-04-09 18:30 ` Jeff Dike
  2001-04-09 18:19   ` Mark Salisbury
  0 siblings, 1 reply; 118+ messages in thread
From: Jeff Dike @ 2001-04-09 18:30 UTC (permalink / raw)
  To: schwidefsky; +Cc: linux-kernel

schwidefsky@de.ibm.com said:
> I have a suggestion that might seem unusual at first but it is
> important for Linux on S/390. We are facing the problem that we want
> to start many (> 1000) Linux images on a big S/390 machine. Every
> image has its own 100 HZ timer on every processor the images uses
> (normally 1). On a single image system the processor use of the 100 HZ
> timer is not a big deal but with > 1000 images you need a lot of
> processing power just to execute the 100 HZ timers. You quickly end up
> with 100% CPU only for the timer interrupts of otherwise idle images.

This is going to be a problem for UML as well, and I was considering something 
very similar.  I did a quick scan of your prose, and the description sounds 
like what I had in mind.

So, count me in as a supporter of this.

A small request: Since S/390 is not the only port that needs this, I'd be 
happy if it was made as generic as possible (and it may already be, I haven't 
gone through the code yet).

				Jeff



^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: No 100 HZ timer !
  2001-04-09 18:30 ` Jeff Dike
@ 2001-04-09 18:19   ` Mark Salisbury
  2001-04-09 20:12     ` Alan Cox
  2001-04-10  5:51     ` Andi Kleen
  0 siblings, 2 replies; 118+ messages in thread
From: Mark Salisbury @ 2001-04-09 18:19 UTC (permalink / raw)
  To: Jeff Dike, schwidefsky; +Cc: linux-kernel

this is one of linux biggest weaknesses.  the fixed interval timer is a
throwback.  it should be replaced with a variable interval timer with interrupts
on demand for any system with a cpu sane/modern enough to have an on-chip
interrupting decrementer.  (i.e just about any modern chip)

On Mon, 09 Apr 2001, Jeff Dike wrote:
> schwidefsky@de.ibm.com said:
> > I have a suggestion that might seem unusual at first but it is
> > important for Linux on S/390. We are facing the problem that we want
> > to start many (> 1000) Linux images on a big S/390 machine. Every
> > image has its own 100 HZ timer on every processor the images uses
> > (normally 1). On a single image system the processor use of the 100 HZ
> > timer is not a big deal but with > 1000 images you need a lot of
> > processing power just to execute the 100 HZ timers. You quickly end up
> > with 100% CPU only for the timer interrupts of otherwise idle images.
> 
> This is going to be a problem for UML as well, and I was considering something 
> very similar.  I did a quick scan of your prose, and the description sounds 
> like what I had in mind.
> 
> So, count me in as a supporter of this.
> 
> A small request: Since S/390 is not the only port that needs this, I'd be 
> happy if it was made as generic as possible (and it may already be, I haven't 
> gone through the code yet).
> 
> 				Jeff
> 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
-- 
/*------------------------------------------------**
**   Mark Salisbury | Mercury Computer Systems    **
**   mbs@mc.com     | System OS - Kernel Team     **
**------------------------------------------------**
**  I will be riding in the Multiple Sclerosis    **
**  Great Mass Getaway, a 150 mile bike ride from **
**  Boston to Provincetown.  Last year I raised   **
**  over $1200.  This year I would like to beat   **
**  that.  If you would like to contribute,       **
**  please contact me.                            **
**------------------------------------------------*/


^ permalink raw reply	[flat|nested] 118+ messages in thread

* No 100 HZ timer !
@ 2001-04-09 15:54 schwidefsky
  2001-04-09 18:30 ` Jeff Dike
  0 siblings, 1 reply; 118+ messages in thread
From: schwidefsky @ 2001-04-09 15:54 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 5671 bytes --]




Hi,
seems like my first try with the complete patch hasn't made it through
to the mailing list. This is the second try with only the common part of
the patch. Here we go (again):
---

I have a suggestion that might seem unusual at first but it is important
for Linux on S/390. We are facing the problem that we want to start many
(> 1000) Linux images on a big S/390 machine. Every image has its own
100 HZ timer on every processor the images uses (normally 1). On a single
image system the processor use of the 100 HZ timer is not a big deal but
with > 1000 images you need a lot of processing power just to execute the
100 HZ timers. You quickly end up with 100% CPU only for the timer
interrupts of otherwise idle images. Therefore I had a go at the timer
stuff and now I have a system running without the 100 HZ timer. Unluckily
I need to make changes to common code and I want your opinion on it.

The first problem was how to get rid of the jiffies. The solution is
simple. I simply defined a macro that calculates the jiffies value from
the TOD clock:
  #define jiffies ({ \
          uint64_t __ticks; \
          asm ("STCK %0" : "=m" (__ticks) ); \
          __ticks = (__ticks - init_timer_cc) >> 12; \
          do_div(__ticks, (1000000/HZ)); \
          ((unsigned long) __ticks); \
  })
With this define you are independent of the jiffies variable which is no
longer needed so I ifdef'ed the definition. There are some places where a
local variable is named jiffies. You may not replace these so I renamed
them to _jiffies. A kernel compiled with only this change works as always.

The second problem is that you need to be able to find out when the next
timer event is due to happen. You'll find a new function "next_timer_event"
in the patch which traverses tv1-tv5 and returns the timer_list of the next
timer event. It is used in timer_bh to indicate to the backend when the
next interrupt should happen. This leads us to the notifier functions.
Each time a new timer is added, a timer is modified, or a timer expires
the architecture backend needs to reset its timeout value. That is what the
"timer_notify" callback is used for. The implementation on S/390 uses the
clock comparator and looks like this:
  static void s390_timer_notify(unsigned long expires)
  {
          S390_lowcore.timer_event =
                  ((__u64) expires*CLK_TICKS_PER_JIFFY) + init_timer_cc;
          asm volatile ("SCKC %0" : : "m" (S390_lowcore.timer_event));
  }
This causes an interrupt on the cpu which executed s390_timer_notify after
"expires" has passed. That means that timer events are spread over the cpus
in the system. Modified or deleted timer events do not cause a deletion
notification. A cpu might be erroneously interrupted too early because of
a timer event that has been modified or deleted. But that doesn't do any
harm, it is just unnecessary work.

There is a second callback "itimer_notify" that is used to get the per
process timers right. We use the cpu timer for this purpose:
  void set_cpu_timer(void)
  {
          unsigned long min_ticks;
          __u64 time_slice;
          if (current->pid != 0 && current->need_resched == 0) {
                  min_ticks = current->counter;
                  if (current->it_prof_value != 0 &&
                      current->it_prof_value < min_ticks)
                          min_ticks = current->it_prof_value;
                  if (current->it_virt_value != 0 &&
                      current->it_virt_value < min_ticks)
                          min_ticks = current->it_virt_value;
                  time_slice = (__u64) min_ticks*CLK_TICKS_PER_JIFFY;
                  asm volatile ("spt %0" : : "m" (time_slice));
          }
  }
The cpu timer is a one shot timer that interrupts after the specified
amount of time has passed. Not 100% accurate because VM can schedule the
virtual processor before the "spt" has been done, but good enough for per
process timers.

The remaining changes to common code parts deal with the problem that many
ticks may be accounted at once. For example without the 100 HZ timer it is
possible that a process runs for half a second in user space. With the next
interrupt all the ticks between the last update and the interrupt have to
be added to the tick counters. This is why update_wall_time and do_it_prof
have changed and update_process_times2 has been introduced.

That leaves three problems: 1) you need to check on every system entry if
a tick or more has passed and do the update if necessary, 2) you need to
keep track of the elapsed time in user space and in kernel space and 3) you
need to check tq_timer every time the system is left and setup a timer
event for the next timer tick if there is work to do on the timer queue.
These three problems are related and have to be implemented architecture
dependent. A nice thing we get for free is that the user/kernel elapsed
time measurement gets much more accurate.

The number of interrupts in an idle system due to timer activity drops
from 100 per second on every cpu to about 5-6 on all (!) cpus if this
patch is used. Exactly what we want to have.

All this new timer code is only used if the config option
CONFIG_NO_HZ_TIMER is set. Without it everything works as always,
especially for architectures that will not use it.

Now what do you think?

(See attached file: timer_common)

blue skies,
   Martin

Linux/390 Design & Development, IBM Deutschland Entwicklung GmbH
Schönaicherstr. 220, D-71032 Böblingen, Telefon: 49 - (0)7031 - 16-2247
E-Mail: schwidefsky@de.ibm.com

[-- Attachment #2: timer_common --]
[-- Type: application/octet-stream, Size: 13138 bytes --]

diff -urN linux-2.4.3/include/linux/sched.h linux-2.4.3-s390/include/linux/sched.h
--- linux-2.4.3/include/linux/sched.h	Tue Mar 27 01:48:11 2001
+++ linux-2.4.3-s390/include/linux/sched.h	Thu Apr  5 10:23:48 2001
@@ -144,6 +144,7 @@
 extern void cpu_init (void);
 extern void trap_init(void);
 extern void update_process_times(int user);
+extern void update_process_times2(int user, int system);
 extern void update_one_process(struct task_struct *p, unsigned long user,
 			       unsigned long system, int cpu);
 
@@ -534,7 +535,9 @@
 
 #include <asm/current.h>
 
+#ifndef CONFIG_NO_HZ_TIMER
 extern unsigned long volatile jiffies;
+#endif
 extern unsigned long itimer_ticks;
 extern unsigned long itimer_next;
 extern struct timeval xtime;
diff -urN linux-2.4.3/include/linux/time.h linux-2.4.3-s390/include/linux/time.h
--- linux-2.4.3/include/linux/time.h	Tue Mar 27 01:48:10 2001
+++ linux-2.4.3-s390/include/linux/time.h	Thu Apr  5 10:23:48 2001
@@ -42,10 +42,10 @@
 }
 
 static __inline__ void
-jiffies_to_timespec(unsigned long jiffies, struct timespec *value)
+jiffies_to_timespec(unsigned long _jiffies, struct timespec *value)
 {
-	value->tv_nsec = (jiffies % HZ) * (1000000000L / HZ);
-	value->tv_sec = jiffies / HZ;
+	value->tv_nsec = (_jiffies % HZ) * (1000000000L / HZ);
+	value->tv_sec = _jiffies / HZ;
 }
 
 
diff -urN linux-2.4.3/include/linux/timer.h linux-2.4.3-s390/include/linux/timer.h
--- linux-2.4.3/include/linux/timer.h	Tue Mar 27 01:48:10 2001
+++ linux-2.4.3-s390/include/linux/timer.h	Thu Apr  5 10:23:48 2001
@@ -35,6 +35,18 @@
 #define sync_timers()		do { } while (0)
 #endif
 
+#ifdef CONFIG_NO_HZ_TIMER
+/*
+ * Setting timer_notify to something != NULL will make
+ * the timer routines call the notification routine
+ * whenever a new add_timer/mod_timer has set a new
+ * soonest timer event.
+ */
+extern void (*timer_notify)(unsigned long expires);
+extern void (*itimer_notify)(void);
+extern void update_times_irqsave(void);
+#endif
+
 /*
  * mod_timer is a more efficient way to update the expire field of an
  * active timer (if the timer is inactive it will be activated)
diff -urN linux-2.4.3/kernel/itimer.c linux-2.4.3-s390/kernel/itimer.c
--- linux-2.4.3/kernel/itimer.c	Thu Jun 29 19:07:36 2000
+++ linux-2.4.3-s390/kernel/itimer.c	Thu Apr  5 10:23:48 2001
@@ -34,10 +34,10 @@
 	return HZ*sec+usec;
 }
 
-static void jiffiestotv(unsigned long jiffies, struct timeval *value)
+static void jiffiestotv(unsigned long _jiffies, struct timeval *value)
 {
-	value->tv_usec = (jiffies % HZ) * (1000000 / HZ);
-	value->tv_sec = jiffies / HZ;
+	value->tv_usec = (_jiffies % HZ) * (1000000 / HZ);
+	value->tv_sec = _jiffies / HZ;
 }
 
 int do_getitimer(int which, struct itimerval *value)
@@ -105,6 +105,16 @@
 	}
 }
 
+#ifdef CONFIG_NO_HZ_TIMER
+void (*itimer_notify)(void) = NULL;
+
+static inline void do_itimer_notify(void)
+{
+	if (itimer_notify != NULL)
+                (*itimer_notify)();
+}
+#endif
+
 int do_setitimer(int which, struct itimerval *value, struct itimerval *ovalue)
 {
 	register unsigned long i, j;
@@ -132,12 +142,18 @@
 				j++;
 			current->it_virt_value = j;
 			current->it_virt_incr = i;
+#ifdef CONFIG_NO_HZ_TIMER
+			do_itimer_notify();
+#endif
 			break;
 		case ITIMER_PROF:
 			if (j)
 				j++;
 			current->it_prof_value = j;
 			current->it_prof_incr = i;
+#ifdef CONFIG_NO_HZ_TIMER
+			do_itimer_notify();
+#endif
 			break;
 		default:
 			return -EINVAL;
diff -urN linux-2.4.3/kernel/ksyms.c linux-2.4.3-s390/kernel/ksyms.c
--- linux-2.4.3/kernel/ksyms.c	Thu Apr  5 10:23:36 2001
+++ linux-2.4.3-s390/kernel/ksyms.c	Thu Apr  5 10:23:48 2001
@@ -429,7 +429,9 @@
 EXPORT_SYMBOL(interruptible_sleep_on_timeout);
 EXPORT_SYMBOL(schedule);
 EXPORT_SYMBOL(schedule_timeout);
+#ifndef CONFIG_NO_HZ_TIMER
 EXPORT_SYMBOL(jiffies);
+#endif
 EXPORT_SYMBOL(xtime);
 EXPORT_SYMBOL(do_gettimeofday);
 EXPORT_SYMBOL(do_settimeofday);
diff -urN linux-2.4.3/kernel/timer.c linux-2.4.3-s390/kernel/timer.c
--- linux-2.4.3/kernel/timer.c	Sun Dec 10 18:53:19 2000
+++ linux-2.4.3-s390/kernel/timer.c	Thu Apr  5 10:23:48 2001
@@ -65,7 +65,9 @@
 
 extern int do_setitimer(int, struct itimerval *, struct itimerval *);
 
+#ifndef CONFIG_NO_HZ_TIMER
 unsigned long volatile jiffies;
+#endif
 
 unsigned int * prof_buffer;
 unsigned long prof_len;
@@ -173,6 +175,22 @@
 #define timer_exit()		do { } while (0)
 #endif
 
+#ifdef CONFIG_NO_HZ_TIMER
+void (*timer_notify)(unsigned long) = NULL;
+unsigned long notify_jiffy = 0;
+
+static inline void do_timer_notify(struct timer_list *timer)
+{
+	if (timer_notify != NULL) {
+		if (notify_jiffy == 0 ||
+		    time_before(timer->expires, notify_jiffy)) {
+			(*timer_notify)(timer->expires);
+			notify_jiffy = timer->expires;
+		}
+	}
+}
+#endif
+
 void add_timer(struct timer_list *timer)
 {
 	unsigned long flags;
@@ -181,6 +199,9 @@
 	if (timer_pending(timer))
 		goto bug;
 	internal_add_timer(timer);
+#ifdef CONFIG_NO_HZ_TIMER
+	do_timer_notify(timer);
+#endif
 	spin_unlock_irqrestore(&timerlist_lock, flags);
 	return;
 bug:
@@ -206,6 +227,9 @@
 	timer->expires = expires;
 	ret = detach_timer(timer);
 	internal_add_timer(timer);
+#ifdef CONFIG_NO_HZ_TIMER
+	do_timer_notify(timer);
+#endif
 	spin_unlock_irqrestore(&timerlist_lock, flags);
 	return ret;
 }
@@ -323,6 +347,89 @@
 	spin_unlock_irq(&timerlist_lock);
 }
 
+#ifdef CONFIG_NO_HZ_TIMER
+/*
+ * Check timer list for earliest timer
+ */
+static inline struct timer_list *
+earlier_timer_in_list(struct list_head *head, struct timer_list *event)
+{
+	struct list_head *curr;
+
+	if (list_empty(head))
+		return event;
+	curr = head->next;
+	if (event == NULL) {
+		event = list_entry(curr, struct timer_list, list);
+		curr = curr->next;
+	}
+	while (curr != head) {
+		struct timer_list * tmp;
+
+		tmp = list_entry(curr, struct timer_list, list);
+		if (time_before(tmp->expires, event->expires))
+			event = tmp;
+		curr = curr->next;
+	}
+	return event;
+}
+
+/*
+ * Find out when the next timer event is due to happen. This
+ * is used on S/390 to be able to skip timer ticks.
+ * The timerlist_lock must be acquired before calling this function.
+ */
+struct timer_list *next_timer_event(void)
+{
+	struct timer_list *nte = NULL;
+	int i;
+
+	/* Look for the next timer event in tv1. */
+	i = tv1.index;
+	do {
+		struct list_head *head = tv1.vec + i;
+		if (!list_empty(head)) {
+			nte = list_entry(head->next, struct timer_list, list);
+			if (i < tv1.index) {
+				/* 
+				 * The search wrapped. We need to look
+				 * at the next list from tvecs[1] that
+				 * would cascade into tv1.
+				 */
+				head = tvecs[1]->vec + tvecs[1]->index;
+				nte = earlier_timer_in_list(head, nte);
+			}
+			goto out;
+		}
+		i = (i + 1) & TVR_MASK;
+	} while (i != tv1.index);
+
+	/* No event found in tv1. Check tv2-tv5. */
+	for (i = 1; i < NOOF_TVECS; i++) {
+		int j = tvecs[i]->index;
+		do {
+			struct list_head *head = tvecs[i]->vec + j;
+			nte = earlier_timer_in_list(head, NULL);
+			if (nte) {
+				if (j < tvecs[i]->index && i < NOOF_TVECS-1) {
+					/* 
+					 * The search wrapped. We need to look
+					 * at the next list from tvecs[i+1]
+					 * that would cascade into tvecs[i].
+					 */
+					head = tvecs[i+1]->vec+tvecs[i+1]->index;
+					nte = earlier_timer_in_list(head, nte);
+				}
+				goto out;
+			}
+			j = (j + 1) & TVN_MASK;
+		} while (j != tvecs[i]->index);
+	}
+ out:
+	return nte;
+}
+#endif
+
 spinlock_t tqueue_lock = SPIN_LOCK_UNLOCKED;
 
 void tqueue_bh(void)
@@ -458,8 +565,13 @@
 #endif
 }
 
-/* in the NTP reference this is called "hardclock()" */
-static void update_wall_time_one_tick(void)
+/*
+ * The ticks loop used in the past is gone because with
+ * the CONFIG_NO_HZ_TIMER config option on S/390 it is
+ * possible that ticks is a lot bigger than one.
+ *   -- martin
+ */
+static void update_wall_time(unsigned long ticks)
 {
 	if ( (time_adjust_step = time_adjust) != 0 ) {
 	    /* We are doing an adjtime thing. 
@@ -470,21 +582,22 @@
 	     *
 	     * Limit the amount of the step to be in the range
 	     * -tickadj .. +tickadj
+             * per tick.
 	     */
-	     if (time_adjust > tickadj)
-		time_adjust_step = tickadj;
-	     else if (time_adjust < -tickadj)
-		time_adjust_step = -tickadj;
+	     if (time_adjust > tickadj*ticks)
+		time_adjust_step = tickadj*ticks;
+	     else if (time_adjust < -tickadj*ticks)
+		time_adjust_step = -tickadj*ticks;
 	     
 	    /* Reduce by this step the amount of time left  */
 	    time_adjust -= time_adjust_step;
 	}
-	xtime.tv_usec += tick + time_adjust_step;
+	xtime.tv_usec += tick*ticks + time_adjust_step;
 	/*
 	 * Advance the phase, once it gets to one microsecond, then
 	 * advance the tick more.
 	 */
-	time_phase += time_adj;
+	time_phase += time_adj*ticks;
 	if (time_phase <= -FINEUSEC) {
 		long ltemp = -time_phase >> SHIFT_SCALE;
 		time_phase += ltemp << SHIFT_SCALE;
@@ -495,21 +608,6 @@
 		time_phase -= ltemp << SHIFT_SCALE;
 		xtime.tv_usec += ltemp;
 	}
-}
-
-/*
- * Using a loop looks inefficient, but "ticks" is
- * usually just one (we shouldn't be losing ticks,
- * we're doing this this way mainly for interrupt
- * latency reasons, not because we think we'll
- * have lots of lost timer ticks
- */
-static void update_wall_time(unsigned long ticks)
-{
-	do {
-		ticks--;
-		update_wall_time_one_tick();
-	} while (ticks);
 
 	if (xtime.tv_usec >= 1000000) {
 	    xtime.tv_usec -= 1000000;
@@ -527,7 +625,7 @@
 	psecs += (p->times.tms_stime += system);
 	if (psecs / HZ > p->rlim[RLIMIT_CPU].rlim_cur) {
 		/* Send SIGXCPU every second.. */
-		if (!(psecs % HZ))
+		if ((psecs % HZ) < user+system)
 			send_sig(SIGXCPU, p, 1);
 		/* and SIGKILL when we go over max.. */
 		if (psecs / HZ > p->rlim[RLIMIT_CPU].rlim_max)
@@ -540,24 +638,25 @@
 	unsigned long it_virt = p->it_virt_value;
 
 	if (it_virt) {
-		it_virt -= ticks;
-		if (!it_virt) {
+		if (it_virt <= ticks) {
 			it_virt = p->it_virt_incr;
 			send_sig(SIGVTALRM, p, 1);
-		}
+		} else
+			it_virt -= ticks;
 		p->it_virt_value = it_virt;
 	}
 }
 
-static inline void do_it_prof(struct task_struct *p)
+static inline void do_it_prof(struct task_struct *p, unsigned long ticks)
 {
 	unsigned long it_prof = p->it_prof_value;
 
 	if (it_prof) {
-		if (--it_prof == 0) {
+		if (it_prof <= ticks) {
 			it_prof = p->it_prof_incr;
 			send_sig(SIGPROF, p, 1);
-		}
+		} else
+			it_prof -= ticks;
 		p->it_prof_value = it_prof;
 	}
 }
@@ -569,7 +668,7 @@
 	p->per_cpu_stime[cpu] += system;
 	do_process_times(p, user, system);
 	do_it_virt(p, user);
-	do_it_prof(p);
+	do_it_prof(p, user + system);
 }	
 
 /*
@@ -597,6 +696,31 @@
 }
 
 /*
+ * Called from the timer interrupt handler to charge a couple of
+ * system and user ticks. 
+ */
+void update_process_times2(int user, int system)
+{
+	struct task_struct *p = current;
+	int cpu = smp_processor_id();
+
+	update_one_process(p, user, system, cpu);
+	if (p->pid) {
+		p->counter -= user + system;
+		if (p->counter <= 0) {
+			p->counter = 0;
+			p->need_resched = 1;
+		}
+		if (p->nice > 0)
+			kstat.per_cpu_nice[cpu] += user;
+		else
+			kstat.per_cpu_user[cpu] += user;
+		kstat.per_cpu_system[cpu] += system;
+	} else if (local_bh_count(cpu) || local_irq_count(cpu) > 1)
+		kstat.per_cpu_system[cpu] += system;
+}
+
+/*
  * Nr of active tasks - counted in fixed-point numbers
  */
 static unsigned long count_active_tasks(void)
@@ -628,7 +752,7 @@
 	static int count = LOAD_FREQ;
 
 	count -= ticks;
-	if (count < 0) {
+	while (count < 0) {
 		count += LOAD_FREQ;
 		active_tasks = count_active_tasks();
 		CALC_LOAD(avenrun[0], EXP_1, active_tasks);
@@ -650,7 +774,7 @@
 	unsigned long ticks;
 
 	/*
-	 * update_times() is run from the raw timer_bh handler so we
+	 * do_update_times() is run from the raw timer_bh handler so we
 	 * just know that the irqs are locally enabled and so we don't
 	 * need to save/restore the flags of the local CPU here. -arca
 	 */
@@ -665,12 +789,49 @@
 	calc_load(ticks);
 }
 
+void update_times_irqsave(void)
+{
+	unsigned long ticks;
+	unsigned long flags;
+
+	/*
+	 * Unlike do_update_times(), this can be called with irqs in an
+	 * unknown state, so the local CPU's flags are saved and
+	 * restored around the xtime_lock write lock.
+	 */
+	write_lock_irqsave(&xtime_lock, flags);
+
+	ticks = jiffies - wall_jiffies;
+	if (ticks) {
+		wall_jiffies += ticks;
+		update_wall_time(ticks);
+	}
+	write_unlock_irqrestore(&xtime_lock, flags);
+	calc_load(ticks);
+}
+
 void timer_bh(void)
 {
 	update_times();
 	run_timer_list();
+#ifdef CONFIG_NO_HZ_TIMER
+	if (timer_notify != NULL) {
+		struct timer_list *timer;
+		unsigned long flags;
+
+		spin_lock_irqsave(&timerlist_lock, flags);
+		timer = next_timer_event();
+		if (timer != NULL) {
+			(*timer_notify)(timer->expires);
+			notify_jiffy = timer->expires;
+		} else
+			notify_jiffy = 0;
+		spin_unlock_irqrestore(&timerlist_lock, flags);
+	}
+#endif
 }
 
+#ifndef CONFIG_NO_HZ_TIMER
 void do_timer(struct pt_regs *regs)
 {
 	(*(unsigned long *)&jiffies)++;
@@ -683,6 +844,7 @@
 	if (TQ_ACTIVE(tq_timer))
 		mark_bh(TQUEUE_BH);
 }
+#endif
 
 #if !defined(__alpha__) && !defined(__ia64__)
 

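The `if (count < 0)` to `while (count < 0)` change in calc_load() matters once ticks arrive in batches: a single call may now span several LOAD_FREQ periods, and each period must contribute one sample to the exponentially decayed average. A sketch of the catch-up loop, using the 1-minute fixed-point constants as found in the <linux/sched.h> of that era (assuming HZ=100):

```c
#include <assert.h>

#define FSHIFT    11			/* bits of fractional precision */
#define FIXED_1   (1 << FSHIFT)	/* 1.0 in fixed point */
#define LOAD_FREQ (5*100)		/* sample every 5 s at HZ=100 */
#define EXP_1     1884			/* 1/exp(5s/1min) in fixed point */

#define CALC_LOAD(load, exp, n) \
	load = ((load)*(exp) + (n)*(FIXED_1-(exp))) >> FSHIFT;

static unsigned long avenrun0;		/* 1-minute average, fixed point */
static int count = LOAD_FREQ;

/* Catch-up version: with variable-interval ticks, 'ticks' may cover
 * several LOAD_FREQ periods, so loop until count is positive again. */
void calc_load_sketch(unsigned long ticks, unsigned long active_fixed)
{
	count -= ticks;
	while (count < 0) {
		count += LOAD_FREQ;
		CALC_LOAD(avenrun0, EXP_1, active_fixed);
	}
}
```

With the old `if`, a long idle stretch would collapse several missed periods into a single sample and the average would decay too slowly toward the new task count.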
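The CONFIG_NO_HZ_TIMER hook in timer_bh() above peeks at the head of the timer list after expired timers have run and hands the next expiry to a notifier, so the hardware timer can be programmed for exactly that deadline (with 0 meaning nothing pending, matching the patch's notify_jiffy convention). A hypothetical sketch of that lookup, with invented names and the locking omitted:

```c
#include <assert.h>
#include <stddef.h>

struct soft_timer {
	unsigned long expires;		/* absolute deadline in jiffies */
	struct soft_timer *next;	/* list kept sorted by expires */
};

static unsigned long notified;	/* last deadline handed to the notifier */

static void notify_sketch(unsigned long expires)
{
	/* stand-in for (*timer_notify)(expires): reprogram the hw timer */
	notified = expires;
}

/* After running expired timers, report the earliest pending deadline.
 * Returns 0 (and leaves the hardware alone) when the list is empty. */
unsigned long reprogram_from_list(struct soft_timer *head)
{
	if (head == NULL)
		return 0;		/* no pending timer: stay idle */
	notify_sketch(head->expires);
	return head->expires;
}
```

The point of the hook is that the interrupt source only ever needs to fire at the head of the list, not at every HZ tick in between.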