Nicholas Mc Guire wrote:
> 
>> Latencies are mainly due to cache refills on the P4. Have you already
>> put load onto your system? If not, worst case latencies will be even
>> longer.
> 
> 
> one possibility we found in RTLinux/GPL to reduce latency is to free up
> TLBs by flushing a few of the TLB hot spots, basically these flushpoints
> are something like:
> 
> __asm__ __volatile__("invlpg %0": :"m"
> (*(char*)__builtin_return_address(0)));
> 
> put at places where we know we don't need those lines any more (i.e.
> after switching tasks or the like). By inserting only a few such
> flushpoints in hot code on the kernel side we found a clear reduction
> of the worst case jitter and interrupt response times.

Interesting. Are these flushpoints present in the latest kernel patches of
RTLinux/GPL? Sounds like a nice thing to play with on a rainy day. :)

> 
> Aside from caches, BTB exhaustion in high load situations is also a
> problem that has not been addressed much in the realtime variants - with
> the P6 families having a botched BTB prediction unit, one can use some
> "strange" constructions to reduce branch penalties - i.e.:
> 
>   if(!condition){slow_path();}
>   else{fast_path();}
> 
> is more predictable than
> 
>   if(condition){fast_path();}
>   else{slow_path();}

I think this is also what likely()/unlikely() teaches the
compiler on x86 (where there is no branch prediction predicate for the
instructions), isn't it?
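
For reference, a minimal sketch of what I mean. The two macros have the
same shape as the ones in the Linux kernel and rely on GCC's
__builtin_expect; fast_path(), slow_path() and handle() are only
placeholders for illustration, not code from any of the patches. Whether
the compiler really lays out the likely side as the fall-through code
depends on compiler version and optimisation level, so check the
generated assembly.

```c
#include <assert.h>

/* Same shape as the Linux kernel macros; needs GCC or Clang. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Placeholder paths, for illustration only. */
int fast_path(void) { return 1; }
int slow_path(void) { return 2; }

/*
 * Hypothetical helper: the hint tells the compiler that the condition
 * usually holds, so it may place fast_path() on the fall-through side
 * and move slow_path() out of line.
 */
int handle(int condition)
{
    if (likely(condition))
        return fast_path();
    return slow_path();
}
```

The hint only influences code layout; handle() returns the same result
with or without it.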

> 
> as in the first case the branch prediction is static, thus the worst case
> is that you are jumping over a few bytes of object code when the condition
> is not met. In the second case the default if the BTB does not yet know
> this branch is to guess not-taken and thus load the jump target of the
> slow path with the overhead of TLB/Cache penalties.
> 
> Regarding the PPC numbers, the surprising thing for me is that the same
> archs are doing MUCH better with old RTAI/RTLinux versions, i.e. 2.4.4
> kernel on a 50MHz MPC860 shows a worst case of 57us - so I do question
> what is going wrong here in the 2.6.X branches of hard-realtime Linux -

You forget that old stuff was kernel-only, lacking a lot of Linux
integration features. Recent I-pipe-based real-time via Xenomai normally
includes support for user-space RT (you can switch it off, but hardly
anyone does). So it's not a useful comparison given that new real-time
projects almost always want full-featured user space these days. For a
fairer comparison, one should consider a simple I-pipe domain that
contains the real-time "application".

> my suspicion is that there is too much work being done on fast-hot CPUs
> and the low-end is being neglected - which is bad as the numbers you
> post here for ADEOS are numbers reachable with mainstream preemptive
> kernel by now as well (of course not on the low end systems though).

That's scenario-dependent. Simple setups like a plain timed task can
reach the dimension of I-pipe-based Xenomai, but more complex scenarios
suffer from the exploding complexity in mainstream Linux, even with -rt.
Just think of "simple" mutexes realised via futexes.

Jan