linux-kernel.vger.kernel.org archive mirror
* Re: No 100 HZ timer !
@ 2001-04-12 12:58 Mark Salisbury
  0 siblings, 0 replies; 118+ messages in thread
From: Mark Salisbury @ 2001-04-12 12:58 UTC (permalink / raw)
  To: linux-kernel


On Wed, 11 Apr 2001, Bret Indrelee wrote:
> Current generation PCs can easily handle 1000s of
> interrupts a second if you keep the overhead small.

the PC-centric implementation of the ticked system is one of its major flaws.

there are architectures on which the cost of a fixed interval is the same as a
variable interval, i.e. no reload register, so you must explicitly load a
value on each interrupt anyway.  and if you want accurate intervals, you must
perform a calculation at each reload rather than just load a static value,
because you don't know how many cycles passed between the interrupt firing
and reaching the reload instruction; the interrupt may have been masked or
lower priority than another interrupt.

also, why handle 1000s of interrupts if you only need to handle 10?

-- 
/*------------------------------------------------**
**   Mark Salisbury | Mercury Computer Systems    **
**   mbs@mc.com     | System OS - Kernel Team     **
**------------------------------------------------**
**  I will be riding in the Multiple Sclerosis    **
**  Great Mass Getaway, a 150 mile bike ride from **
**  Boston to Provincetown.  Last year I raised   **
**  over $1200.  This year I would like to beat   **
**  that.  If you would like to contribute,       **
**  please contact me.                            **
**------------------------------------------------*/


^ permalink raw reply	[flat|nested] 118+ messages in thread
* No 100 HZ timer !
@ 2001-08-01 17:22 george anzinger
  2001-08-01 19:34 ` Chris Friesen
  0 siblings, 1 reply; 118+ messages in thread
From: george anzinger @ 2001-08-01 17:22 UTC (permalink / raw)
  To: linux-kernel

I have just posted a patch on sourceforge:
 http://sourceforge.net/projects/high-res-timers

to the 2.4.7 kernel with both ticked and tickless options, switchable
at any time via a /proc interface.  The system is instrumented with
Andrew Morton's time pegs, with a couple of enhancements, so you can easily
see your clock/timer overhead (thanks Andrew).

Please take a look at this system and let me know if a tickless system
is worth further effort.

The testing I have done seems to indicate lower overhead on a lightly
loaded system, about the same overhead with some load, and much more
overhead with a heavy load.  To me this seems like the wrong thing to
do.  We would like as flat an overhead-to-load curve as we can get,
and the ticked system seems to be much better in this regard.  Still,
there may be applications where this works.

comments?  RESULTS?

George

* Re: No 100 HZ timer!
@ 2001-04-12 13:14 Bret Indrelee
  0 siblings, 0 replies; 118+ messages in thread
From: Bret Indrelee @ 2001-04-12 13:14 UTC (permalink / raw)
  To: Linux Kernel Mailing List

On Thu, 12 Apr 2001, Mark Salisbury wrote:
> On Wed, 11 Apr 2001, Bret Indrelee wrote:
> > Current generation PCs can easily handle 1000s of
> > interrupts a second if you keep the overhead small.
> 
> the PC-centric implementation of the ticked system is one of its major flaws.
> 
> there are architectures on which the cost of a fixed interval is the same as a
> variable interval, i.e. no reload register, so you must explicitly load a
> value on each interrupt anyway.  and if you want accurate intervals, you must
> perform a calculation at each reload rather than just load a static value,
> because you don't know how many cycles passed between the interrupt firing
> and reaching the reload instruction; the interrupt may have been masked or
> lower priority than another interrupt.

People were saying the cost of adjusting the PIT was high.

On those archs where this is true, you would want to avoid changing the
interval timer.

On a more reasonable architecture, you would change it each time the head
of the timer list changed.

> also, why handle 1000's of interrupts if you only need to handle 10?

There is no reason.

I was pointing out that current hardware can easily handle this sort of
load, not advocating that distributions change HZ to 1000.


On Linux 2.2, if you change HZ to 400 your system is going to run at
about 70% of speed. The thing is, it is limited to this value: bumping it
any higher doesn't cause a further change.

If you reprogram the RTC to generate a 1000 HZ interrupt pulse, as I
recall there is about a 3% change in performance. This is with a
constant-rate interrupt; making it variable rate would reduce this.

You can verify this yourself. Get whatever benchmark you prefer. Run it on
a system. Rebuild after changing HZ to 400 and run it again. Restore HZ
and use the RTC driver to produce an interval of 1024 HZ, run your
benchmark again. Changing HZ had a huge performance impact in 2.2.13, I'm
pretty sure it didn't change much in later 2.2 releases.

Seems to me like there is a problem here. I'm pretty sure it is a
combination of the overhead of the cascaded timers combined with the
very high overhead of the scheduler that causes this. Someday this should
be fixed.


What I would like to see is support for a higher-resolution timer in
Linux, for those of us that need times down to about 1 ms. Those systems
that can quickly reprogram the interval timer would do so each time a new
value was at the head of the list; others would have to do it more
intelligently.

Regardless of how it is done, forcing the system to run a constant 1000 HZ
interrupt through the system should not impact system performance more
than a few percent. If it does, something was done wrong. When there is
need for the higher resolution timer, people will accept the overhead. On
most systems it would not be used, so there would be no reason to run the
higher rate timer.


-Bret

------------------------------------------------------------------------------
Bret Indrelee |  Sometimes, to be deep, we must act shallow!
bret@io.com   |  -Riff in The Quatrix



* Re: No 100 HZ timer!
@ 2001-04-11 17:56 Bret Indrelee
  2001-04-12 17:39 ` george anzinger
  0 siblings, 1 reply; 118+ messages in thread
From: Bret Indrelee @ 2001-04-11 17:56 UTC (permalink / raw)
  To: Linux Kernel Mailing List

Mikulas Patocka (mikulas@artax.karlin.mff.cuni.cz) wrote:
> Adding and removing timers happens much more frequently than PIT ticks,
> so comparing these times is pointless.
>
> If you have some device and a timer protecting it from lockup on buggy
> hardware, you actually:
>
> send request to device
> add timer
>
> receive interrupt and read reply
> remove timer
>
> With the current timer semantics, the cost of add_timer and del_timer is
> nearly zero. If you had to reprogram the PIT on each request and reply,
> it would slow things down.
>
> Note that you call mod_timer also on each packet received - and in the
> worst case (which may happen), you end up reprogramming the PIT on each
> packet.

You can still have nearly zero cost for the normal case. Avoiding worst
case behaviour is also pretty easy.

You only reprogram the PIT if you have to change the interval.

Keep all timers in a sorted double-linked list. Do the insert
intelligently, adding it from the back or front of the list depending on
where it is in relation to existing entries.

You only need to reprogram the interval timer when:
1. You've got a new entry at the head of the list,
AND
2. The interval currently programmed is larger than the time remaining
until the new head of the list expires.

In the case of a device timeout, it is usually not going to be inserted at
the head of the list. It is very seldom going to actually timeout.

Choose your interval wisely, only increasing it when you know it will pay
off. The best way of doing this would probably be to track some sort
of least common denominator (LCD) of the timeouts.

The real trick is to do a lot less processing on every tick than is
currently done. Current generation PCs can easily handle 1000s of
interrupts a second if you keep the overhead small.

-Bret

------------------------------------------------------------------------------
Bret Indrelee |  Sometimes, to be deep, we must act shallow!
bret@io.com   |  -Riff in The Quatrix



* Re: No 100 HZ timer !
@ 2001-04-11  9:06 schwidefsky
  0 siblings, 0 replies; 118+ messages in thread
From: schwidefsky @ 2001-04-11  9:06 UTC (permalink / raw)
  To: george anzinger
  Cc: Jamie Lokier, high-res-timers-discourse, Alan Cox,
	Mikulas Patocka, David Schleef, Mark Salisbury, Jeff Dike,
	linux-kernel



>f) As noted, the account timers (task user/system times) would be much
>more accurate with the tick less approach.  The cost is added code in
>both the system call and the schedule path.
>
>Tentative conclusions:
>
>Currently we feel that the tick less approach is not acceptable due to
>(f).  We felt that this added code would NOT be welcome AND would, in a
>reasonably active system, have much higher overhead than any savings in
>not having a tick.  Also (d) implies a list organization that will, at
>the very least, be harder to understand.  (We have some thoughts here,
>but abandoned the effort because of (f).)  We are, of course, open to
>discussion on this issue and all others related to the project
>objectives.
f) might be true on Intel-based systems. At least for S/390 the situation
is a little bit different. Here is an extract from the S/390 part of the
timer patch:

       .macro  UPDATE_ENTER_TIME reload
       la      %r14,thread+_TSS_UTIME(%r9) # pointer to utime
       tm      SP_PSW+1(%r15),0x01      # interrupting from user ?
       jno     0f                       # yes -> add to user time
       la      %r14,8(%r14)             # no -> add to system time
0:     lm      %r0,%r1,0(%r14)          # load user/system time
       sl      %r1,__LC_LAST_MARK+4     # subtract last time mark
       bc      3,BASED(1f)              # borrow ?
       sl      %r0,BASED(.Lc1)
1:     sl      %r0,__LC_LAST_MARK
       stck    __LC_LAST_MARK           # make a new mark
       al      %r1,__LC_LAST_MARK+4     # add new mark -> added delta
       bc      12,BASED(2f)             # carry ?
       al      %r0,BASED(.Lc1)
2:     al      %r0,__LC_LAST_MARK
       stm     %r0,%r1,0(%r14)          # store updated user/system time
       clc     __LC_LAST_MARK(8),__LC_JIFFY_TIMER # check if enough time
       jl      3f                       # passed for a jiffy update
       l       %r1,BASED(.Ltime_warp)
       basr    %r14,%r1
       .if     \reload                  # reload != 0 for system call
       lm      %r2,%r6,SP_R2(%r15)      # reload clobbered parameters
       .endif
3:

       .macro  UPDATE_LEAVE_TIME
       l       %r1,BASED(.Ltq_timer)    # test if tq_timer list is empty
       x       %r1,0(%r1)               # tq_timer->next != tq_timer ?
       jz      0f
       l       %r1,BASED(.Ltq_timer_active)
       icm     %r0,15,0(%r1)            # timer event already added ?
       jnz     0f
       l       %r1,BASED(.Ltq_pending)
       basr    %r14,%r1
0:     lm      %r0,%r1,thread+_TSS_STIME(%r9) # load system time
       sl      %r1,__LC_LAST_MARK+4     # subtract last time mark
       bc      3,BASED(1f)              # borrow ?
       sl      %r0,BASED(.Lc1)
1:     sl      %r0,__LC_LAST_MARK
       stck    __LC_LAST_MARK           # make new mark
       al      %r1,__LC_LAST_MARK+4     # add new mark -> added delta
       bc      12,BASED(2f)             # carry ?
       al      %r0,BASED(.Lc1)
2:     al      %r0,__LC_LAST_MARK
       stm     %r0,%r1,thread+_TSS_STIME(%r9) # store system time
       .endm

The two macros UPDATE_ENTER_TIME and UPDATE_LEAVE_TIME are executed
on every system entry/exit. In the case that no special work has to
be done, fewer than 31 instructions are executed in addition to the
normal system entry/exit code. Special work has to be done if more
time than 1/HZ has passed (call time_warp), or if tq_timer contains
an element (call tq_pending).
The accuracy of the timer events has not changed. It is still 1/HZ.
The only thing this patch does is to avoid unneeded interruptions.
I'd be happy if this could be combined with a new, more accurate
timing method.

blue skies,
   Martin

Linux/390 Design & Development, IBM Deutschland Entwicklung GmbH
Schönaicherstr. 220, D-71032 Böblingen, Telefon: 49 - (0)7031 - 16-2247
E-Mail: schwidefsky@de.ibm.com



* Re: No 100 HZ timer !
@ 2001-04-10 14:42 schwidefsky
  0 siblings, 0 replies; 118+ messages in thread
From: schwidefsky @ 2001-04-10 14:42 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: David Schleef, Alan Cox, Mark Salisbury, Jeff Dike, linux-kernel



>BTW. Why do we need to redesign timers at all? The cost of a timer
>interrupt each 1/100 second is nearly zero (1000 instances on S/390 VM is
>not a common case - it is not reasonable to degrade the performance of
>timers because of this).
The cost of the timer interrupts on a single-image system is negligible,
true. As I already pointed out in the original proposal, we are looking
for a solution that will allow us to minimize the cost of the timer
interrupts when we run many images. For us this case is not unusual, and
it is reasonable to degrade the performance of a running system by a very
small amount to get rid of the HZ timer. This proposal was never meant
to be the perfect solution for every platform; that is why it is
configurable with the CONFIG_NO_HZ_TIMER option.

blue skies,
   Martin

Linux/390 Design & Development, IBM Deutschland Entwicklung GmbH
Schönaicherstr. 220, D-71032 Böblingen, Telefon: 49 - (0)7031 - 16-2247
E-Mail: schwidefsky@de.ibm.com



* Re: No 100 HZ timer !
@ 2001-04-10 12:54 schwidefsky
  0 siblings, 0 replies; 118+ messages in thread
From: schwidefsky @ 2001-04-10 12:54 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Alan Cox, Andi Kleen, Mark Salisbury, Jeff Dike, linux-kernel



>Does not sound very attractive at all on non-virtual machines (I see the
>point on UML/VM): making system entry/context switch/interrupts slower,
>making add_timer slower, just to process a few less timer interrupts.
>That's like robbing the fast paths for a slow path.
The system entry/exit/context switch is slower. The add_timer/mod_timer is
only a little bit slower in the case where a new soonest timer event has
been created. I think you can forget the additional overhead for
add_timer/mod_timer; it's the additional path length on the system
entry/exit that might be problematic.

blue skies,
   Martin

Linux/390 Design & Development, IBM Deutschland Entwicklung GmbH
Schönaicherstr. 220, D-71032 Böblingen, Telefon: 49 - (0)7031 - 16-2247
E-Mail: schwidefsky@de.ibm.com



* Re: No 100 HZ timer !
@ 2001-04-10 11:38 schwidefsky
  2001-04-10 11:54 ` Alan Cox
  0 siblings, 1 reply; 118+ messages in thread
From: schwidefsky @ 2001-04-10 11:38 UTC (permalink / raw)
  To: Alan Cox; +Cc: Andi Kleen, Mark Salisbury, Jeff Dike, linux-kernel



>> Just how would you do kernel/user CPU time accounting then ?  It's
>> currently done on every timer tick, and doing it less often would make
>> it useless.
>
>On the contrary, doing it less often but at the right time massively
>improves its accuracy. You do it on reschedule. An rdtsc instruction is
>cheap and all of a sudden you have nearly cycle-accurate accounting
If you do the accounting on reschedule, how do you find out how much time
has been spent in user versus kernel mode? Or do the Intel chips have two
counters, one for user space execution and one for the kernel?

blue skies,
   Martin

Linux/390 Design & Development, IBM Deutschland Entwicklung GmbH
Schönaicherstr. 220, D-71032 Böblingen, Telefon: 49 - (0)7031 - 16-2247
E-Mail: schwidefsky@de.ibm.com



* Re: No 100 HZ timer !
@ 2001-04-10  7:29 schwidefsky
  0 siblings, 0 replies; 118+ messages in thread
From: schwidefsky @ 2001-04-10  7:29 UTC (permalink / raw)
  To: Alan Cox; +Cc: Mark Salisbury, Jeff Dike, linux-kernel



>Its worth doing even on the ancient x86 boards with the PIT. It does
>require some driver changes since
>
>    while(time_before(jiffies, we_explode))
>         poll_things();
>
>no longer works
On S/390 we have a big advantage here. Driver code of this kind does not
exist. That makes it a lot easier for us compared to other architectures.
As I said in the original posting, the patch I have is working fine for
S/390.

blue skies,
   Martin

Linux/390 Design & Development, IBM Deutschland Entwicklung GmbH
Schönaicherstr. 220, D-71032 Böblingen, Telefon: 49 - (0)7031 - 16-2247
E-Mail: schwidefsky@de.ibm.com



* Re: No 100 HZ timer !
@ 2001-04-10  7:27 schwidefsky
  0 siblings, 0 replies; 118+ messages in thread
From: schwidefsky @ 2001-04-10  7:27 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Mark Salisbury, Jeff Dike, linux-kernel



>Just how would you do kernel/user CPU time accounting then ?  It's
>currently done on every timer tick, and doing it less often would make it
>useless.
This part is architecture dependent. For S/390 I chose to do a "STCK" on
every system entry/exit. Dunno if this can be done on other architectures
too; on S/390 it is reasonably cheap (one STCK costs 15 cycles). That
means the kernel/user CPU time accounting is MUCH better now.

blue skies,
   Martin

Linux/390 Design & Development, IBM Deutschland Entwicklung GmbH
Schönaicherstr. 220, D-71032 Böblingen, Telefon: 49 - (0)7031 - 16-2247
E-Mail: schwidefsky@de.ibm.com



* No 100 HZ timer !
@ 2001-04-09 15:54 schwidefsky
  2001-04-09 18:30 ` Jeff Dike
  0 siblings, 1 reply; 118+ messages in thread
From: schwidefsky @ 2001-04-09 15:54 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 5671 bytes --]




Hi,
seems like my first try with the complete patch hasn't made it through
to the mailing list. This is the second try with only the common part of
the patch. Here we go (again):
---

I have a suggestion that might seem unusual at first, but it is important
for Linux on S/390. We are facing the problem that we want to start many
(> 1000) Linux images on a big S/390 machine. Every image has its own
100 HZ timer on every processor the image uses (normally 1). On a single
image system the processor use of the 100 HZ timer is not a big deal, but
with > 1000 images you need a lot of processing power just to execute the
100 HZ timers. You quickly end up with 100% CPU only for the timer
interrupts of otherwise idle images. Therefore I had a go at the timer
stuff and now I have a system running without the 100 HZ timer. Unluckily
I need to make changes to common code and I want your opinion on them.

The first problem was how to get rid of the jiffies. The solution is
simple. I simply defined a macro that calculates the jiffies value from
the TOD clock:
  #define jiffies ({ \
          uint64_t __ticks; \
          asm ("STCK %0" : "=m" (__ticks) ); \
          __ticks = (__ticks - init_timer_cc) >> 12; \
          do_div(__ticks, (1000000/HZ)); \
          ((unsigned long) __ticks); \
  })
With this define you are independent of the jiffies variable, which is no
longer needed, so I ifdef'ed its definition. There are some places where a
local variable is named jiffies. These must not be replaced by the macro,
so I renamed them to _jiffies. A kernel compiled with only this change
works as always.

The second problem is that you need to be able to find out when the next
timer event is due to happen. You'll find a new function "next_timer_event"
in the patch which traverses tv1-tv5 and returns the timer_list of the next
timer event. It is used in timer_bh to indicate to the backend when the
next interrupt should happen. This leads us to the notifier functions.
Each time a new timer is added, a timer is modified, or a timer expires
the architecture backend needs to reset its timeout value. That is what the
"timer_notify" callback is used for. The implementation on S/390 uses the
clock comparator and looks like this:
  static void s390_timer_notify(unsigned long expires)
  {
          S390_lowcore.timer_event =
                  ((__u64) expires*CLK_TICKS_PER_JIFFY) + init_timer_cc;
          asm volatile ("SCKC %0" : : "m" (S390_lowcore.timer_event));
  }
This causes an interrupt on the cpu which executed s390_timer_notify after
"expires" has passed. That means that timer events are spread over the cpus
in the system. Modified or deleted timer events do not cause a deletion
notification. A cpu might be erroneously interrupted too early because of a
timer event that has been modified or deleted, but that doesn't do any
harm; it is just unnecessary work.

There is a second callback "itimer_notify" that is used to get the per
process timers right. We use the cpu timer for this purpose:
  void set_cpu_timer(void)
  {
          unsigned long min_ticks;
          __u64 time_slice;
          if (current->pid != 0 && current->need_resched == 0) {
                  min_ticks = current->counter;
                  if (current->it_prof_value != 0 &&
                      current->it_prof_value < min_ticks)
                          min_ticks = current->it_prof_value;
                  if (current->it_virt_value != 0 &&
                      current->it_virt_value < min_ticks)
                          min_ticks = current->it_virt_value;
                  time_slice = (__u64) min_ticks*CLK_TICKS_PER_JIFFY;
                  asm volatile ("spt %0" : : "m" (time_slice));
          }
  }
The cpu timer is a one-shot timer that interrupts after the specified
amount of time has passed. Not 100% accurate, because VM can schedule the
virtual processor away before the "spt" has been done, but good enough for
per-process timers.
The remaining changes to common code deal with the problem that many
ticks may be accounted at once. For example, without the 100 HZ timer it is
possible that a process runs for half a second in user space. With the next
interrupt, all the ticks between the last update and the interrupt have to
be added to the tick counters. This is why update_wall_time and do_it_prof
have changed and update_process_times2 has been introduced.

That leaves three problems: 1) you need to check on every system entry
whether a tick or more has passed and do the update if necessary, 2) you
need to keep track of the elapsed time in user space and in kernel space,
and 3) you need to check tq_timer every time the system is left and set up
a timer event for the next timer tick if there is work to do on the timer
queue. These three problems are related and have to be implemented in an
architecture-dependent way. A nice thing we get for free is that the
user/kernel elapsed time measurement gets much more accurate.

The number of interrupts in an idle system due to timer activity drops
from 100 per second on every cpu to about 5-6 on all (!) cpus if this patch
is used. Exactly what we want to have.

All this new timer code is only used if the config option
CONFIG_NO_HZ_TIMER is set. Without it everything works as always,
especially for architectures that will not use it.

Now what do you think?

(See attached file: timer_common)

blue skies,
   Martin

Linux/390 Design & Development, IBM Deutschland Entwicklung GmbH
Schönaicherstr. 220, D-71032 Böblingen, Telefon: 49 - (0)7031 - 16-2247
E-Mail: schwidefsky@de.ibm.com

[-- Attachment #2: timer_common --]
[-- Type: application/octet-stream, Size: 13138 bytes --]

diff -urN linux-2.4.3/include/linux/sched.h linux-2.4.3-s390/include/linux/sched.h
--- linux-2.4.3/include/linux/sched.h	Tue Mar 27 01:48:11 2001
+++ linux-2.4.3-s390/include/linux/sched.h	Thu Apr  5 10:23:48 2001
@@ -144,6 +144,7 @@
 extern void cpu_init (void);
 extern void trap_init(void);
 extern void update_process_times(int user);
+extern void update_process_times2(int user, int system);
 extern void update_one_process(struct task_struct *p, unsigned long user,
 			       unsigned long system, int cpu);
 
@@ -534,7 +535,9 @@
 
 #include <asm/current.h>
 
+#ifndef CONFIG_NO_HZ_TIMER
 extern unsigned long volatile jiffies;
+#endif
 extern unsigned long itimer_ticks;
 extern unsigned long itimer_next;
 extern struct timeval xtime;
diff -urN linux-2.4.3/include/linux/time.h linux-2.4.3-s390/include/linux/time.h
--- linux-2.4.3/include/linux/time.h	Tue Mar 27 01:48:10 2001
+++ linux-2.4.3-s390/include/linux/time.h	Thu Apr  5 10:23:48 2001
@@ -42,10 +42,10 @@
 }
 
 static __inline__ void
-jiffies_to_timespec(unsigned long jiffies, struct timespec *value)
+jiffies_to_timespec(unsigned long _jiffies, struct timespec *value)
 {
-	value->tv_nsec = (jiffies % HZ) * (1000000000L / HZ);
-	value->tv_sec = jiffies / HZ;
+	value->tv_nsec = (_jiffies % HZ) * (1000000000L / HZ);
+	value->tv_sec = _jiffies / HZ;
 }
 
 
diff -urN linux-2.4.3/include/linux/timer.h linux-2.4.3-s390/include/linux/timer.h
--- linux-2.4.3/include/linux/timer.h	Tue Mar 27 01:48:10 2001
+++ linux-2.4.3-s390/include/linux/timer.h	Thu Apr  5 10:23:48 2001
@@ -35,6 +35,18 @@
 #define sync_timers()		do { } while (0)
 #endif
 
+#ifdef CONFIG_NO_HZ_TIMER
+/*
+ * Setting timer_notify to something != NULL will make
+ * the timer routines call the notification routine
+ * whenever a new add_timer/mod_timer has set a new
+ * soonest timer event.
+ */
+extern void (*timer_notify)(unsigned long expires);
+extern void (*itimer_notify)(void);
+extern void update_times_irqsave(void);
+#endif
+
 /*
  * mod_timer is a more efficient way to update the expire field of an
  * active timer (if the timer is inactive it will be activated)
diff -urN linux-2.4.3/kernel/itimer.c linux-2.4.3-s390/kernel/itimer.c
--- linux-2.4.3/kernel/itimer.c	Thu Jun 29 19:07:36 2000
+++ linux-2.4.3-s390/kernel/itimer.c	Thu Apr  5 10:23:48 2001
@@ -34,10 +34,10 @@
 	return HZ*sec+usec;
 }
 
-static void jiffiestotv(unsigned long jiffies, struct timeval *value)
+static void jiffiestotv(unsigned long _jiffies, struct timeval *value)
 {
-	value->tv_usec = (jiffies % HZ) * (1000000 / HZ);
-	value->tv_sec = jiffies / HZ;
+	value->tv_usec = (_jiffies % HZ) * (1000000 / HZ);
+	value->tv_sec = _jiffies / HZ;
 }
 
 int do_getitimer(int which, struct itimerval *value)
@@ -105,6 +105,16 @@
 	}
 }
 
+#ifdef CONFIG_NO_HZ_TIMER
+void (*itimer_notify)(void) = NULL;
+
+static inline void do_itimer_notify(void)
+{
+	if (itimer_notify != NULL)
+                (*itimer_notify)();
+}
+#endif
+
 int do_setitimer(int which, struct itimerval *value, struct itimerval *ovalue)
 {
 	register unsigned long i, j;
@@ -132,12 +142,18 @@
 				j++;
 			current->it_virt_value = j;
 			current->it_virt_incr = i;
+#ifdef CONFIG_NO_HZ_TIMER
+			do_itimer_notify();
+#endif
 			break;
 		case ITIMER_PROF:
 			if (j)
 				j++;
 			current->it_prof_value = j;
 			current->it_prof_incr = i;
+#ifdef CONFIG_NO_HZ_TIMER
+			do_itimer_notify();
+#endif
 			break;
 		default:
 			return -EINVAL;
diff -urN linux-2.4.3/kernel/ksyms.c linux-2.4.3-s390/kernel/ksyms.c
--- linux-2.4.3/kernel/ksyms.c	Thu Apr  5 10:23:36 2001
+++ linux-2.4.3-s390/kernel/ksyms.c	Thu Apr  5 10:23:48 2001
@@ -429,7 +429,9 @@
 EXPORT_SYMBOL(interruptible_sleep_on_timeout);
 EXPORT_SYMBOL(schedule);
 EXPORT_SYMBOL(schedule_timeout);
+#ifndef CONFIG_NO_HZ_TIMER
 EXPORT_SYMBOL(jiffies);
+#endif
 EXPORT_SYMBOL(xtime);
 EXPORT_SYMBOL(do_gettimeofday);
 EXPORT_SYMBOL(do_settimeofday);
diff -urN linux-2.4.3/kernel/timer.c linux-2.4.3-s390/kernel/timer.c
--- linux-2.4.3/kernel/timer.c	Sun Dec 10 18:53:19 2000
+++ linux-2.4.3-s390/kernel/timer.c	Thu Apr  5 10:23:48 2001
@@ -65,7 +65,9 @@
 
 extern int do_setitimer(int, struct itimerval *, struct itimerval *);
 
+#ifndef CONFIG_NO_HZ_TIMER
 unsigned long volatile jiffies;
+#endif
 
 unsigned int * prof_buffer;
 unsigned long prof_len;
@@ -173,6 +175,22 @@
 #define timer_exit()		do { } while (0)
 #endif
 
+#ifdef CONFIG_NO_HZ_TIMER
+void (*timer_notify)(unsigned long) = NULL;
+unsigned long notify_jiffy = 0;
+
+static inline void do_timer_notify(struct timer_list *timer)
+{
+	if (timer_notify != NULL) {
+		if (notify_jiffy == 0 ||
+		    time_before(timer->expires, notify_jiffy)) {
+			(*timer_notify)(timer->expires);
+			notify_jiffy = timer->expires;
+		}
+	}
+}
+#endif
+
 void add_timer(struct timer_list *timer)
 {
 	unsigned long flags;
@@ -181,6 +199,9 @@
 	if (timer_pending(timer))
 		goto bug;
 	internal_add_timer(timer);
+#ifdef CONFIG_NO_HZ_TIMER
+	do_timer_notify(timer);
+#endif
 	spin_unlock_irqrestore(&timerlist_lock, flags);
 	return;
 bug:
@@ -206,6 +227,9 @@
 	timer->expires = expires;
 	ret = detach_timer(timer);
 	internal_add_timer(timer);
+#ifdef CONFIG_NO_HZ_TIMER
+	do_timer_notify(timer);
+#endif
 	spin_unlock_irqrestore(&timerlist_lock, flags);
 	return ret;
 }
@@ -323,6 +347,89 @@
 	spin_unlock_irq(&timerlist_lock);
 }
 
+#ifdef CONFIG_NO_HZ_TIMER
+/*
+ * Check timer list for earliest timer
+ */
+static inline struct timer_list *
+earlier_timer_in_list(struct list_head *head, struct timer_list *event)
+{
+	struct list_head *curr;
+
+	if (list_empty(head))
+		return event;
+	curr = head->next;
+	if (event == NULL) {
+		event = list_entry(curr, struct timer_list, list);
+		curr = curr->next;
+	}
+	while (curr != head) {
+		struct timer_list * tmp;
+
+		tmp = list_entry(curr, struct timer_list, list);
+		if (time_before(tmp->expires, event->expires))
+			event = tmp;
+		curr = curr->next;
+	}
+	return event;
+}
+
+/*
+ * Find out when the next timer event is due to happen. This
+ * is used on S/390 to be able to skip timer ticks.
+ * The timerlist_lock must be acquired before calling this function.
+ */
+struct timer_list *next_timer_event(void)
+{
+	struct timer_list *nte = NULL;
+	int i;
+
+	/* Look for the next timer event in tv1. */
+	i = tv1.index;
+	do {
+		struct list_head *head = tv1.vec + i;
+		if (!list_empty(head)) {
+			nte = list_entry(head->next, struct timer_list, list);
+			if (i < tv1.index) {
+				/* 
+				 * The search wrapped. We need to look
+				 * at the next list from tvecs[1] that
+				 * would cascade into tv1.
+				 */
+				head = tvecs[1]->vec + tvecs[1]->index;
+				nte = earlier_timer_in_list(head, nte);
+			}
+			goto out;
+		}
+		i = (i + 1) & TVR_MASK;
+	} while (i != tv1.index);
+
+	/* No event found in tv1. Check tv2-tv5. */
+	for (i = 1; i < NOOF_TVECS; i++) {
+		int j = tvecs[i]->index;
+		do {
+			struct list_head *head = tvecs[i]->vec + j;
+			nte = earlier_timer_in_list(head, NULL);
+			if (nte) {
+				if (j < tvecs[i]->index && i < NOOF_TVECS-1) {
+					/* 
+					 * The search wrapped. We need to look
+					 * at the next list from tvecs[i+1]
+					 * that would cascade into tvecs[i].
+					 */
+					head = tvecs[i+1]->vec+tvecs[i+1]->index;
+					nte = earlier_timer_in_list(head, nte);
+				}
+				goto out;
+			}
+			j = (j + 1) & TVN_MASK;
+		} while (j != tvecs[i]->index);
+	}
+ out:
+	return nte;
+}
+#endif
+
 spinlock_t tqueue_lock = SPIN_LOCK_UNLOCKED;
 
 void tqueue_bh(void)
@@ -458,8 +565,13 @@
 #endif
 }
 
-/* in the NTP reference this is called "hardclock()" */
-static void update_wall_time_one_tick(void)
+/*
+ * The ticks loop used in the past is gone because with
+ * the CONFIG_NO_HZ_TIMER config option on S/390 it is
+ * possible that ticks is a lot bigger than one.
+ *   -- martin
+ */
+static void update_wall_time(unsigned long ticks)
 {
 	if ( (time_adjust_step = time_adjust) != 0 ) {
 	    /* We are doing an adjtime thing. 
@@ -470,21 +582,22 @@
 	     *
 	     * Limit the amount of the step to be in the range
 	     * -tickadj .. +tickadj
+             * per tick.
 	     */
-	     if (time_adjust > tickadj)
-		time_adjust_step = tickadj;
-	     else if (time_adjust < -tickadj)
-		time_adjust_step = -tickadj;
+	     if (time_adjust > tickadj*ticks)
+		time_adjust_step = tickadj*ticks;
+	     else if (time_adjust < -tickadj*ticks)
+		time_adjust_step = -tickadj*ticks;
 	     
 	    /* Reduce by this step the amount of time left  */
 	    time_adjust -= time_adjust_step;
 	}
-	xtime.tv_usec += tick + time_adjust_step;
+	xtime.tv_usec += tick*ticks + time_adjust_step;
 	/*
 	 * Advance the phase, once it gets to one microsecond, then
 	 * advance the tick more.
 	 */
-	time_phase += time_adj;
+	time_phase += time_adj*ticks;
 	if (time_phase <= -FINEUSEC) {
 		long ltemp = -time_phase >> SHIFT_SCALE;
 		time_phase += ltemp << SHIFT_SCALE;
@@ -495,21 +608,6 @@
 		time_phase -= ltemp << SHIFT_SCALE;
 		xtime.tv_usec += ltemp;
 	}
-}
-
-/*
- * Using a loop looks inefficient, but "ticks" is
- * usually just one (we shouldn't be losing ticks,
- * we're doing this this way mainly for interrupt
- * latency reasons, not because we think we'll
- * have lots of lost timer ticks
- */
-static void update_wall_time(unsigned long ticks)
-{
-	do {
-		ticks--;
-		update_wall_time_one_tick();
-	} while (ticks);
 
 	if (xtime.tv_usec >= 1000000) {
 	    xtime.tv_usec -= 1000000;
@@ -527,7 +625,7 @@
 	psecs += (p->times.tms_stime += system);
 	if (psecs / HZ > p->rlim[RLIMIT_CPU].rlim_cur) {
 		/* Send SIGXCPU every second.. */
-		if (!(psecs % HZ))
+		if ((psecs % HZ) < user+system)
 			send_sig(SIGXCPU, p, 1);
 		/* and SIGKILL when we go over max.. */
 		if (psecs / HZ > p->rlim[RLIMIT_CPU].rlim_max)
@@ -540,24 +638,25 @@
 	unsigned long it_virt = p->it_virt_value;
 
 	if (it_virt) {
-		it_virt -= ticks;
-		if (!it_virt) {
+		if (it_virt <= ticks) {
 			it_virt = p->it_virt_incr;
 			send_sig(SIGVTALRM, p, 1);
-		}
+		} else
+			it_virt -= ticks;
 		p->it_virt_value = it_virt;
 	}
 }
 
-static inline void do_it_prof(struct task_struct *p)
+static inline void do_it_prof(struct task_struct *p, unsigned long ticks)
 {
 	unsigned long it_prof = p->it_prof_value;
 
 	if (it_prof) {
-		if (--it_prof == 0) {
+		if (it_prof <= ticks) {
 			it_prof = p->it_prof_incr;
 			send_sig(SIGPROF, p, 1);
-		}
+		} else
+			it_prof -= ticks;
 		p->it_prof_value = it_prof;
 	}
 }
@@ -569,7 +668,7 @@
 	p->per_cpu_stime[cpu] += system;
 	do_process_times(p, user, system);
 	do_it_virt(p, user);
-	do_it_prof(p);
+	do_it_prof(p, user + system);
 }	
 
 /*
@@ -597,6 +696,31 @@
 }
 
 /*
+ * Called from the timer interrupt handler to charge multiple
+ * user and system ticks at once.
+ */
+void update_process_times2(int user, int system)
+{
+	struct task_struct *p = current;
+	int cpu = smp_processor_id();
+
+	update_one_process(p, user, system, cpu);
+	if (p->pid) {
+		p->counter -= user + system;
+		if (p->counter <= 0) {
+			p->counter = 0;
+			p->need_resched = 1;
+		}
+		if (p->nice > 0)
+			kstat.per_cpu_nice[cpu] += user;
+		else
+			kstat.per_cpu_user[cpu] += user;
+		kstat.per_cpu_system[cpu] += system;
+	} else if (local_bh_count(cpu) || local_irq_count(cpu) > 1)
+		kstat.per_cpu_system[cpu] += system;
+}
+
+/*
  * Nr of active tasks - counted in fixed-point numbers
  */
 static unsigned long count_active_tasks(void)
@@ -628,7 +752,7 @@
 	static int count = LOAD_FREQ;
 
 	count -= ticks;
-	if (count < 0) {
+	while (count < 0) {
 		count += LOAD_FREQ;
 		active_tasks = count_active_tasks();
 		CALC_LOAD(avenrun[0], EXP_1, active_tasks);
@@ -650,7 +774,7 @@
 	unsigned long ticks;
 
 	/*
-	 * update_times() is run from the raw timer_bh handler so we
+	 * do_update_times() is run from the raw timer_bh handler so we
 	 * just know that the irqs are locally enabled and so we don't
 	 * need to save/restore the flags of the local CPU here. -arca
 	 */
@@ -665,12 +789,49 @@
 	calc_load(ticks);
 }
 
+void update_times_irqsave(void)
+{
+	unsigned long ticks;
+	unsigned long flags;
+
+	/*
+	 * Unlike do_update_times(), this may be called with irqs in
+	 * any state, so we do need to save/restore the flags of the
+	 * local CPU here.
+	 */
+	write_lock_irqsave(&xtime_lock, flags);
+
+	ticks = jiffies - wall_jiffies;
+	if (ticks) {
+		wall_jiffies += ticks;
+		update_wall_time(ticks);
+	}
+	write_unlock_irqrestore(&xtime_lock, flags);
+	calc_load(ticks);
+}
+
 void timer_bh(void)
 {
 	update_times();
 	run_timer_list();
+#ifdef CONFIG_NO_HZ_TIMER
+	if (timer_notify != NULL) {
+		struct timer_list *timer;
+		unsigned long flags;
+
+		spin_lock_irqsave(&timerlist_lock, flags);
+		timer = next_timer_event();
+		if (timer != NULL) {
+			(*timer_notify)(timer->expires);
+			notify_jiffy = timer->expires;
+		} else
+			notify_jiffy = 0;
+		spin_unlock_irqrestore(&timerlist_lock, flags);
+	}
+#endif
 }
 
+#ifndef CONFIG_NO_HZ_TIMER
 void do_timer(struct pt_regs *regs)
 {
 	(*(unsigned long *)&jiffies)++;
@@ -683,6 +844,7 @@
 	if (TQ_ACTIVE(tq_timer))
 		mark_bh(TQUEUE_BH);
 }
+#endif
 
 #if !defined(__alpha__) && !defined(__ia64__)
 

Thread overview: 118+ messages
2001-04-12 12:58 No 100 HZ timer ! Mark Salisbury
  -- strict thread matches above, loose matches on Subject: below --
2001-08-01 17:22 george anzinger
2001-08-01 19:34 ` Chris Friesen
2001-08-01 19:49   ` Richard B. Johnson
2001-08-01 20:08     ` Mark Salisbury
2001-08-01 20:33     ` george anzinger
2001-08-01 21:20   ` george anzinger
2001-08-02  4:28     ` Rik van Riel
2001-08-02  6:03       ` george anzinger
2001-08-02 14:39         ` Oliver Xymoron
2001-08-02 16:36           ` george anzinger
2001-08-02 17:05             ` Oliver Xymoron
2001-08-02 17:46               ` george anzinger
2001-08-02 18:41                 ` Oliver Xymoron
2001-08-02 21:18                   ` george anzinger
2001-08-02 22:09                     ` Oliver Xymoron
2001-08-02 17:26             ` John Alvord
2001-04-12 13:14 No 100 HZ timer! Bret Indrelee
2001-04-11 17:56 Bret Indrelee
2001-04-12 17:39 ` george anzinger
2001-04-12 21:19   ` Bret Indrelee
2001-04-12 22:20     ` george anzinger
2001-04-13  4:00       ` Bret Indrelee
2001-04-13  6:32         ` Ben Greear
2001-04-13  8:42           ` george anzinger
2001-04-13 10:36             ` Jamie Lokier
2001-04-13 16:07               ` george anzinger
2001-04-13 23:00                 ` Jamie Lokier
2001-04-13 12:05           ` Horst von Brand
2001-04-13 21:53             ` george anzinger
2001-04-13 23:10               ` Jamie Lokier
2001-04-16  3:02                 ` Ben Greear
2001-04-16  2:46                   ` Jamie Lokier
2001-04-16 12:36                     ` Mark Salisbury
2001-04-16 19:19                       ` george anzinger
2001-04-16 20:45                         ` Albert D. Cahalan
2001-04-16 21:29                           ` Chris Wedgwood
2001-04-16 22:25                           ` george anzinger
2001-04-16 23:57                         ` Mark Salisbury
2001-04-17  0:45                           ` george anzinger
2001-04-17 12:12                             ` Mark Salisbury
2001-04-17 12:51                         ` Mark Salisbury
2001-04-17 18:53                           ` george anzinger
2001-04-17 19:41                             ` Jamie Lokier
2001-04-23  8:05                             ` Ulrich Windl
2001-04-23 13:22                               ` Mark Salisbury
2001-04-16  2:41               ` Ben Greear
2001-04-11  9:06 No 100 HZ timer ! schwidefsky
2001-04-10 14:42 schwidefsky
2001-04-10 12:54 schwidefsky
2001-04-10 11:38 schwidefsky
2001-04-10 11:54 ` Alan Cox
2001-04-10  7:29 schwidefsky
2001-04-10  7:27 schwidefsky
2001-04-09 15:54 schwidefsky
2001-04-09 18:30 ` Jeff Dike
2001-04-09 18:19   ` Mark Salisbury
2001-04-09 20:12     ` Alan Cox
2001-04-09 20:32       ` Mark Salisbury
2001-04-09 22:31       ` Mikulas Patocka
2001-04-09 22:35         ` Alan Cox
2001-04-10 11:43           ` David Schleef
2001-04-10 12:04             ` Mikulas Patocka
2001-04-10 12:31               ` David Schleef
2001-04-10 12:34                 ` Mark Salisbury
2001-04-10 14:10                 ` Mikulas Patocka
2001-04-10 13:35                   ` root
2001-04-10 14:22                   ` Andi Kleen
2001-04-10 15:43                   ` Alan Cox
2001-04-12  5:25                     ` watermodem
2001-04-12  8:45                       ` Jamie Lokier
2001-04-10 17:15                   ` Jamie Lokier
2001-04-10 17:27                     ` Alan Cox
2001-04-10 17:35                       ` Jamie Lokier
2001-04-10 18:17                         ` Alan Cox
2001-04-10 18:24                           ` Jamie Lokier
2001-04-10 19:28                             ` george anzinger
2001-04-10 20:02                               ` mark salisbury
2001-04-10 22:08                                 ` george anzinger
2001-04-11  0:48                                   ` Mark Salisbury
2001-04-11  2:35                                     ` george anzinger
2001-04-12  0:24                                       ` Mark Salisbury
2001-04-11 16:11                                     ` Jamie Lokier
2001-04-11 16:59                                       ` george anzinger
2001-04-11 18:57                                         ` Jamie Lokier
2001-04-11 19:21                                           ` John Alvord
2001-04-12  8:41                                             ` Jamie Lokier
2001-08-01  1:08                               ` george anzinger
2001-08-11 11:57                                 ` Pavel Machek
2001-08-14 15:59                                   ` Jamie Lokier
2001-08-14 16:57                                     ` george anzinger
2001-04-10 19:50                             ` Zdenek Kabelac
2001-04-11 11:42                               ` Maciej W. Rozycki
2001-04-11 16:13                                 ` Jamie Lokier
2001-04-12  9:51                                   ` Maciej W. Rozycki
2001-04-10 19:42                       ` Zdenek Kabelac
2001-04-10 12:19             ` Mark Salisbury
2001-04-10 17:51             ` yodaiken
2001-04-11 18:43           ` Oliver Xymoron
2001-04-10 12:11       ` Mark Salisbury
2001-04-10  5:51     ` Andi Kleen
2001-04-10  9:33       ` Martin Mares
2001-04-10 10:00         ` Albert D. Cahalan
2001-04-10 12:14         ` Mark Salisbury
2001-04-11  5:55           ` Karim Yaghmour
2001-04-10 11:18       ` Alan Cox
2001-04-10 12:02         ` Andi Kleen
2001-04-10 12:12           ` Alan Cox
2001-04-10 12:27             ` Mark Salisbury
2001-04-10 12:32             ` Andi Kleen
2001-04-10 12:36               ` Alan Cox
2001-04-10 12:37                 ` Andi Kleen
2001-04-10 18:45               ` Stephen D. Williams
2001-04-10 19:59                 ` Andi Kleen
2001-04-10 12:07       ` Mark Salisbury
2001-04-10 12:45         ` Andi Kleen
2001-04-10 12:42           ` Mark Salisbury
2001-04-10 12:54             ` Andi Kleen
