linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* High resolution timers and BH processing on -RT
@ 2005-01-28  0:13 Thomas Gleixner
  2005-01-28  4:43 ` Ingo Molnar
  0 siblings, 1 reply; 10+ messages in thread
From: Thomas Gleixner @ 2005-01-28  0:13 UTC (permalink / raw)
  To: George Anzinger; +Cc: Ingo Molnar, LKML, Doug Niehaus, Benedikt Spranger

Hi,

some thoughts and observations. 

I'm running -RT + a port of UTIME (the grandfather of HRT, IIRC) on a
couple of machines (x86,ppc,arm - all UP) here.

One part of the problem is the gettimeofday() issue, which seems to be
already addressed by John Stulz's patches.

The more interresting question in terms of realtime is the processing of
the timer queue(s) and the reprogramming of the timer for the next
event.

With -RT enabled the timer queue is solely processed by ksoftirqd, which
handles all the other bottom halfs too. So pushing the priority of
ksoftirqd up is not a solution. 
If I understood the HRT code correctly then this applies here too.
Please correct me if I'm wrong.

This makes the timer queue dependend on a variety of other softirq
clients, so you can not predict which of those clients are active and
what's the computation time they consume.

One possibility to solve this is seperating the softirqs into different
threads. We have done this already and it seems to be an interesting
approach in terms of scalability of bottom half processing.

Another approach is splitting realtime timers and normal timers, which
introduces the ugliness of different timer queues, but it has the
advantage that the rare realtime queue users can be handled in the
(threaded) top half for accuracy sake and the "normal" users are still
running on the jiffy bound vanilla linux timer queue, which is handled
in the non predictible path of the ksoftirqd thread.

I know that this topic was discussed various times already, but the -RT
view adds a different dimension. The modifications for the splitted
queues are minimal invasive in kernel/timer.c, but still introduce the
same amount of noise in nanosleep, itimer and posix_timer code.

Even if we agree that most of the timers are removed from the queue
before the expiry - especially those of the networking code - moving all
timers into high resolution mode results in non trivial noise injection
due to the short reprogramming intervals. 

Some numbers to make this more transparent.

Machine: PIII Celeron 333MHz
RT - T0: 1ms cyclic
RT - T1: 2ms cyclic
....

Load average is ~4.0 for all tests. The numbers are maximum deviation
from the timeline in microseconds. Test time was ~60 minutes for each
szenario.

Running all timers in high resolution mode (ksoftirqd) results in:
[T0  Prio:  60]  2123
[T1  Prio:  59]  2556
[T2  Prio:  58]  2882
[T3  Prio:  57]  2993
[T4  Prio:  56]  2888

Running all timers in high resolution mode (seperated timer softirqd
PRIO=70) results in:
[T0  Prio:  60]  423
[T1  Prio:  59]  372
[T2  Prio:  58]  756
[T3  Prio:  57]  802
[T4  Prio:  56]  1208

Seperating realtime process related timers into the threaded top half
queue (TIMER IRQ PRIO = 90) and running all other timers inside of
ksoftirqd results in:
[T0  Prio:  60]  84
[T1  Prio:  59]  114
[T2  Prio:  58]  120
[T3  Prio:  57]  117
[T4  Prio:  56]  158

I think that these numbers are significant for low power machines. This
area is my main concern - not the highend SMP szenario - as the main
users of realtime systems can be met in the embedded world.

Offtopic, but related question:

I discovered that the USB subsystem introduces 2-5 ms random chaos when
a device is pluged/unplugged. I had not time to track the culprit down
yet. Any pointers ???

tglx



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: High resolution timers and BH processing on -RT
  2005-01-28  0:13 High resolution timers and BH processing on -RT Thomas Gleixner
@ 2005-01-28  4:43 ` Ingo Molnar
  2005-01-28  8:20   ` Thomas Gleixner
  0 siblings, 1 reply; 10+ messages in thread
From: Ingo Molnar @ 2005-01-28  4:43 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: George Anzinger, LKML, Doug Niehaus, Benedikt Spranger


* Thomas Gleixner <tglx@linutronix.de> wrote:

> Some numbers to make this more transparent.
> 
> Machine: PIII Celeron 333MHz
> RT - T0: 1ms cyclic
> RT - T1: 2ms cyclic
> ....
> 
> Load average is ~4.0 for all tests. The numbers are maximum deviation
> from the timeline in microseconds. Test time was ~60 minutes for each
> szenario.
> 
> Running all timers in high resolution mode (ksoftirqd) results in:
> [T0  Prio:  60]  2123
> [T1  Prio:  59]  2556
> [T2  Prio:  58]  2882
> [T3  Prio:  57]  2993
> [T4  Prio:  56]  2888
> 
> Running all timers in high resolution mode (seperated timer softirqd
> PRIO=70) results in:
> [T0  Prio:  60]  423
> [T1  Prio:  59]  372
> [T2  Prio:  58]  756
> [T3  Prio:  57]  802
> [T4  Prio:  56]  1208

is this due to algorithmic/PIT-programming overhead, or due to the noise
introduced by other, non-hard-RT timers? I'd guess the later from the
looks of it, but did your test introduce such noise (via networking and
application workloads?).

	Ingo

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: High resolution timers and BH processing on -RT
  2005-01-28  4:43 ` Ingo Molnar
@ 2005-01-28  8:20   ` Thomas Gleixner
  2005-01-28  8:24     ` Ingo Molnar
  0 siblings, 1 reply; 10+ messages in thread
From: Thomas Gleixner @ 2005-01-28  8:20 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: George Anzinger, LKML, Doug Niehaus, Benedikt Spranger

On Fri, 2005-01-28 at 05:43 +0100, Ingo Molnar wrote:
> * 
> is this due to algorithmic/PIT-programming overhead, or due to the noise
> introduced by other, non-hard-RT timers? I'd guess the later from the
> looks of it, but did your test introduce such noise (via networking and
> application workloads?).
> 

Right, it's due to noise by non-RT timers, which I enforced by adding
networking and applications.

This adds random timer expires and admittedly the PIT reprogramming
overhead is adding portions of that noise.

tglx



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: High resolution timers and BH processing on -RT
  2005-01-28  8:20   ` Thomas Gleixner
@ 2005-01-28  8:24     ` Ingo Molnar
  2005-01-28  8:30       ` Thomas Gleixner
  0 siblings, 1 reply; 10+ messages in thread
From: Ingo Molnar @ 2005-01-28  8:24 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: George Anzinger, LKML, Doug Niehaus, Benedikt Spranger


* Thomas Gleixner <tglx@linutronix.de> wrote:

> > is this due to algorithmic/PIT-programming overhead, or due to the noise
> > introduced by other, non-hard-RT timers? I'd guess the later from the
> > looks of it, but did your test introduce such noise (via networking and
> > application workloads?).
> 
> Right, it's due to noise by non-RT timers, which I enforced by adding
> networking and applications.
> 
> This adds random timer expires and admittedly the PIT reprogramming
> overhead is adding portions of that noise.

i havent seen your latest code - what is the basic data-structure? The
stock kernel has arrays of timers with increasing granularity and a
cascade mechanism to move timers down the arrays as they slowly expire -
but with a high-resolution API (1 usec accuracy?) how does the basic
data structure look like?

Is the "noise" due to timers expiring "at once" - but isnt it unlikely
for 'normal' timers to expire in exactly the same usec as the real
high-resolution one?

or is it that we have a 'group' of normal timers expiring, which, if
they happen to occur _just_ prior a HRT event will generate a larger
delay?

	Ingo

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: High resolution timers and BH processing on -RT
  2005-01-28  8:24     ` Ingo Molnar
@ 2005-01-28  8:30       ` Thomas Gleixner
  2005-01-28  8:47         ` Ingo Molnar
  0 siblings, 1 reply; 10+ messages in thread
From: Thomas Gleixner @ 2005-01-28  8:30 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: George Anzinger, LKML, Doug Niehaus, Benedikt Spranger

On Fri, 2005-01-28 at 09:24 +0100, Ingo Molnar wrote:
> * Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> > > is this due to algorithmic/PIT-programming overhead, or due to the noise
> > > introduced by other, non-hard-RT timers? I'd guess the later from the
> > > looks of it, but did your test introduce such noise (via networking and
> > > application workloads?).
> > 
> > Right, it's due to noise by non-RT timers, which I enforced by adding
> > networking and applications.
> > 
> > This adds random timer expires and admittedly the PIT reprogramming
> > overhead is adding portions of that noise.
> 
> i havent seen your latest code - what is the basic data-structure? The
> stock kernel has arrays of timers with increasing granularity and a
> cascade mechanism to move timers down the arrays as they slowly expire -
> but with a high-resolution API (1 usec accuracy?) how does the basic
> data structure look like?

The current testing code just moves the HRT timers to a seperate list.
It's a linked list at them moment, which must be optimized when the
general dust settles.

> Is the "noise" due to timers expiring "at once" - but isnt it unlikely
> for 'normal' timers to expire in exactly the same usec as the real
> high-resolution one?
> 
> or is it that we have a 'group' of normal timers expiring, which, if
> they happen to occur _just_ prior a HRT event will generate a larger
> delay?
> 

Yep. The timers expire at random times. So it's likely to have short
sequences of timer interrupts going off. This needs reprogramming of the
PIT and processing of the expired timers.

tglx



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: High resolution timers and BH processing on -RT
  2005-01-28  8:30       ` Thomas Gleixner
@ 2005-01-28  8:47         ` Ingo Molnar
  2005-01-28 18:34           ` George Anzinger
  0 siblings, 1 reply; 10+ messages in thread
From: Ingo Molnar @ 2005-01-28  8:47 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: George Anzinger, LKML, Doug Niehaus, Benedikt Spranger


* Thomas Gleixner <tglx@linutronix.de> wrote:

> > or is it that we have a 'group' of normal timers expiring, which, if
> > they happen to occur _just_ prior a HRT event will generate a larger
> > delay?
> 
> Yep. The timers expire at random times. So it's likely to have short
> sequences of timer interrupts going off. This needs reprogramming of
> the PIT and processing of the expired timers.

i dont really like the static splitup of 'normal' vs. 'HRT' timers -
there might in fact be separate priority requirements between HRT timers
too.

i think one possible solution would be to introduce some notion of
'timer priority', and to expire each timer priority level in a separate
timer expiry thread. Priority 0 (lowest) would be expired in ksoftirqd,
and there would be 3 separate threads for say priorities 1-3. Or
something like this. Potentially exposed to user-space as well, via new
APIs. Hm?

To push this even further: in theory timers could inherit the priority
of the task that starts them, and they would be expired in that priority
order - but this probably needs a pretty clever (and most likely
complex) data-structure ...

	Ingo

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: High resolution timers and BH processing on -RT
  2005-01-28  8:47         ` Ingo Molnar
@ 2005-01-28 18:34           ` George Anzinger
  2005-01-28 18:51             ` Ingo Molnar
                               ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: George Anzinger @ 2005-01-28 18:34 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Thomas Gleixner, LKML, Doug Niehaus, Benedikt Spranger

Ingo Molnar wrote:
> * Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> 
>>>or is it that we have a 'group' of normal timers expiring, which, if
>>>they happen to occur _just_ prior a HRT event will generate a larger
>>>delay?
>>
>>Yep. The timers expire at random times. So it's likely to have short
>>sequences of timer interrupts going off. This needs reprogramming of
>>the PIT and processing of the expired timers.

If you can use a machine that has a local apic we can leave the PIT out of it. 
Really this is MUCH preferred.  If your box has a LAPIC, make sure it is not 
disabled by your config setup.

Leaving the PIT out of it, the structure is that HRT timers are put in the 
normal timer list and, when they expire, are moved to a HRT list which only 
contains timers that will expire prior to the next jiffie.  This list is managed 
by interrupt, ideally from the LAPIC, or the PIT is need be.  Aside from the PIT 
reprograming (once per HRT timer plus once to get back to the 1/HZ period), 
there can be delays in getting the timer out of the normal timer list.  The main 
thing here is that the list MUST be processed as close to the jiffie edge as 
possible as any timers due shortly after the jiffie edge will be shadowed by 
this regardless of the HRT interrupt.  Of course, it an expired timer is 
presented to the HRT code by the normal timer expire code, it is expired 
immeadiatly.

A quick comment here on the current RT code.  It looks to me like there is a 
race in timer delivery.  It looks like the softirq is "raised" by the PIT 
interrupt code and the jiffie is bumped by the timer thread.  If the softirq 
gets to run prior to the PIT interrupt thread we could end up in the run_timer 
list code with a stale jiffie value and do nothing.  This would delay normal 
timers for a jiffie and HRT timers for some time less than a jiffie, depending 
on when they were really due.

I thing we should move the raising of the timer softirq to the PIT interrupt 
thread after we release the xtime_lock.
> 
> 
> i dont really like the static splitup of 'normal' vs. 'HRT' timers -
> there might in fact be separate priority requirements between HRT timers
> too.

Yes, and high priority tasks might want low res timers...
> 
> i think one possible solution would be to introduce some notion of
> 'timer priority', and to expire each timer priority level in a separate
> timer expiry thread. Priority 0 (lowest) would be expired in ksoftirqd,
> and there would be 3 separate threads for say priorities 1-3. Or
> something like this. Potentially exposed to user-space as well, via new
> APIs. Hm?
> 
> To push this even further: in theory timers could inherit the priority
> of the task that starts them, and they would be expired in that priority
> order - but this probably needs a pretty clever (and most likely
> complex) data-structure ...

A long time ago in another land, I did such a system.  The timer priority was 
taken from the calling task.  At that time (and now, till convinced otherwise) I 
thought it a _good thing_ to expire timers in order, regardless of their 
priority, so all timers pending delivery were delivered at the priority of the 
highest priority timer in the "batch".  The basic idea was that the interrupt 
code pulled expired timers from the timer list and pushed them into the pending 
list.  In the process it found the highest priority timer in the list.  The 
timer delivery thread was then run at that priority.  This thread adjusted its 
priority downward as needed, but in all cases the timers were delivered in 
strict time order.

Since then, as now, the timer delivery usually just _notified_ a task of a 
pending signal, the low priority timers did not really hold up things for long. 
  Once the high priority timer was delivered and the thread either finished or 
dropped its priority, the waiting task (having been wakened by the signal 
delivery) could switch in.

The primary thing needed for this is a simple and quick way to switch a tasks 
priority, both from outside and from the task itself.
> 

-- 
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: High resolution timers and BH processing on -RT
  2005-01-28 18:34           ` George Anzinger
@ 2005-01-28 18:51             ` Ingo Molnar
  2005-01-28 18:53             ` Ingo Molnar
  2005-01-31  9:12             ` Thomas Gleixner
  2 siblings, 0 replies; 10+ messages in thread
From: Ingo Molnar @ 2005-01-28 18:51 UTC (permalink / raw)
  To: George Anzinger; +Cc: Thomas Gleixner, LKML, Doug Niehaus, Benedikt Spranger


* George Anzinger <george@mvista.com> wrote:

> A quick comment here on the current RT code.  It looks to me like
> there is a race in timer delivery.  It looks like the softirq is
> "raised" by the PIT interrupt code and the jiffie is bumped by the
> timer thread.  If the softirq gets to run prior to the PIT interrupt
> thread we could end up in the run_timer list code with a stale jiffie
> value and do nothing.  This would delay normal timers for a jiffie and
> HRT timers for some time less than a jiffie, depending on when they
> were really due.
> 
> I thing we should move the raising of the timer softirq to the PIT
> interrupt thread after we release the xtime_lock.

ok - mind sending a patch for this?

	Ingo

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: High resolution timers and BH processing on -RT
  2005-01-28 18:34           ` George Anzinger
  2005-01-28 18:51             ` Ingo Molnar
@ 2005-01-28 18:53             ` Ingo Molnar
  2005-01-31  9:12             ` Thomas Gleixner
  2 siblings, 0 replies; 10+ messages in thread
From: Ingo Molnar @ 2005-01-28 18:53 UTC (permalink / raw)
  To: George Anzinger; +Cc: Thomas Gleixner, LKML, Doug Niehaus, Benedikt Spranger


* George Anzinger <george@mvista.com> wrote:

> The primary thing needed for this is a simple and quick way to switch
> a tasks priority, both from outside and from the task itself.

check out sched.c's mutex_setprio(p, prio) and mutex_getprio(p), which
is used by the PI code in kernel/rt.c. It's pretty robust and heavily
used.

	Ingo

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: High resolution timers and BH processing on -RT
  2005-01-28 18:34           ` George Anzinger
  2005-01-28 18:51             ` Ingo Molnar
  2005-01-28 18:53             ` Ingo Molnar
@ 2005-01-31  9:12             ` Thomas Gleixner
  2 siblings, 0 replies; 10+ messages in thread
From: Thomas Gleixner @ 2005-01-31  9:12 UTC (permalink / raw)
  To: george; +Cc: Ingo Molnar, LKML, Doug Niehaus, Benedikt Spranger

On Fri, 2005-01-28 at 10:34 -0800, George Anzinger wrote:
> If you can use a machine that has a local apic we can leave the PIT out of it. 
> Really this is MUCH preferred.  If your box has a LAPIC, make sure it is not 
> disabled by your config setup.

Unfortunately its an embedded box w/o LAPIC. Embedded systems with low
computing power and restricted peripherals are often used in industrial
environments and likely to be a major client of RT/HRT implementations.

> Leaving the PIT out of it, the structure is that HRT timers are put in the 
> normal timer list and, when they expire, are moved to a HRT list which only 
> contains timers that will expire prior to the next jiffie.  This list is managed 
> by interrupt, ideally from the LAPIC, or the PIT is need be.  Aside from the PIT 
> reprograming (once per HRT timer plus once to get back to the 1/HZ period), 
> there can be delays in getting the timer out of the normal timer list.  The main 
> thing here is that the list MUST be processed as close to the jiffie edge as 
> possible as any timers due shortly after the jiffie edge will be shadowed by 
> this regardless of the HRT interrupt.  Of course, it an expired timer is 
> presented to the HRT code by the normal timer expire code, it is expired 
> immeadiatly.

This requires to run the 1/Hz timer code with high priority in order to
run as close as possible to the jiffie edge. I know that you made this
move in order to make the HRT code less intrusive, but I think this
trades code beauty with latency issues. In order to reduce the latency
for HRT timers it will be necessary to split the timer list work into
pieces.

Jiffie interrupt
- Move the HRT timers out of the next jiffie bucket, reprogram HRT
timesource.
- Run the shadowed HRT timers, which are close to the jiffie edge (too
close to reprogram the HRT timesource)
- Move expired low res timers to a work list
- schedule the low res timer work

Not sure, if this is less intrusive than the previous modifications to
add_timer/mod_timer/del_timer, which can nicely be done with simple
macros and does not touch the low res timer list including the
processing at all.

> > i dont really like the static splitup of 'normal' vs. 'HRT' timers -
> > there might in fact be separate priority requirements between HRT timers
> > too.
> 
> Yes, and high priority tasks might want low res timers...

Sure, is there any API available to select the timer type / priority ?

> A long time ago in another land, I did such a system.  The timer priority was 
> taken from the calling task.  At that time (and now, till convinced otherwise) I 
> thought it a _good thing_ to expire timers in order, regardless of their 
> priority, so all timers pending delivery were delivered at the priority of the 
> highest priority timer in the "batch".  The basic idea was that the interrupt 
> code pulled expired timers from the timer list and pushed them into the pending 
> list.  In the process it found the highest priority timer in the list.  The 
> timer delivery thread was then run at that priority.  This thread adjusted its 
> priority downward as needed, but in all cases the timers were delivered in 
> strict time order.
> 
> Since then, as now, the timer delivery usually just _notified_ a task of a 
> pending signal, the low priority timers did not really hold up things for long. 
>   Once the high priority timer was delivered and the thread either finished or 
> dropped its priority, the waiting task (having been wakened by the signal 
> delivery) could switch in.

Make sense.

tglx



^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2005-01-31  9:13 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-01-28  0:13 High resolution timers and BH processing on -RT Thomas Gleixner
2005-01-28  4:43 ` Ingo Molnar
2005-01-28  8:20   ` Thomas Gleixner
2005-01-28  8:24     ` Ingo Molnar
2005-01-28  8:30       ` Thomas Gleixner
2005-01-28  8:47         ` Ingo Molnar
2005-01-28 18:34           ` George Anzinger
2005-01-28 18:51             ` Ingo Molnar
2005-01-28 18:53             ` Ingo Molnar
2005-01-31  9:12             ` Thomas Gleixner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).