All of lore.kernel.org
 help / color / mirror / Atom feed
* loading pages to memory causes high latency
@ 2012-10-26 18:00 Stanislav Meduna
  2012-10-26 21:04 ` Thomas Gleixner
  0 siblings, 1 reply; 5+ messages in thread
From: Stanislav Meduna @ 2012-10-26 18:00 UTC (permalink / raw)
  To: linux-rt-users

Hi all,

I have an application that absolutely has to meet a 100 ms deadline.
All threads of the application are SCHED_RR ones with priority
that is higher than any other userspace process in the system and
also than the irq handler of the PATA controller.

The command

 sync ; echo 3 > /proc/sys/vm/drop_caches ; cat /var/log/* > /dev/null

which is around 6 MB of data needed to fetch from a CF card:

- immediately causes [sched_delayed] sched: RT throttling activated,
  even if there is _no_ RT application started yet. That means
  that it is probably the irq task itself that is triggering it,
  penalizing all other good-behaving RT tasks

# ps -eLo pid,class,rtprio,comm  | egrep 'RR|FF'
    3 FF       1 ksoftirqd/0
    6 FF      99 posixcputmr/0
   27 FF      22 irq/14-pata_cs5
   32 FF      50 irq/15-ehci_hcd
   33 FF      50 irq/15-ohci_hcd
   34 FF      50 irq/8-rtc0
  417 FF      50 irq/5-eth0
  438 FF      50 irq/15-eth1


- causes the process to block for a time that is much longer
  than I am comfortable with.

The command
  echo 1000000 > /proc/sys/kernel/sched_rt_runtime_us
fixes the problem but is not really something I want
to enable when developing a real-time capable system
that will be running in remote locations. With 980000
I still get negligible latencies. With 970000 I get 4 ms.
With 960000 I get 15 ms latencies. So it is indeed
the throttling that is firing here.

Data with the application running and sched_rt_runtime_us
at 960000:

# head -3 /sys/kernel/debug/tracing/latency_hist/wakeup/CPU0
#Minimum latency: 13 microseconds
#Average latency: 25 microseconds
#Maximum latency: 24011 microseconds

# cat /sys/kernel/debug/tracing/latency_hist/wakeup/max_latency-CPU0
605 25 24011 (0) myApp <- 0 -21 swapper 2802.650022

# cat /sys/kernel/debug/tracing/trace

#
# wakeup_rt latency trace v1.1.5 on 3.4.15-rt24+
# --------------------------------------------------------------------
# latency: 24011 us, #4/4, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0)
#    -----------------
#    | task: myApp-605 (uid:0 nice:0 policy:2 rt_prio:25)
#    -----------------
#
#                  _------=> CPU#
#                 / _-----=> irqs-off
#                | / _----=> need-resched
#                || / _---=> hardirq/softirq
#                ||| / _--=> preempt-depth
#                |||| / _--=> migrate-disable
#                ||||| /     delay
#  cmd     pid   |||||| time  |   caller
#     \   /      |||||  \   |   /
irq/14-p-27      0d.h3.    1us+:     27: 95:R   + [000]   605: 74:S myApp
irq/14-p-27      0d.h3.    3us!: try_to_wake_up <-wake_up_process
  <idle>-0       0d..3. 24007us+: __schedule <-preempt_schedule
  <idle>-0       0d..3. 24010us :      0:120:R ==> [000]   605: 74:R myApp

The cyclictest or sigwaittest record similar highest numbers.
Setting all of the irq threads to the lower priority does
not matter.

Is this the expected behaviour? Does this point to a problem
in kernel or more likely to a problem in the PATA driver?
The 'd' does not seem right, does it?

Other latencies than wakeup seem to be fine

# head -3 latency_hist/irqsoff/CPU0
#Minimum latency: 0 microseconds
#Average latency: 1 microseconds
#Maximum latency: 77 microseconds

# head -3 latency_hist/preemptirqsoff/CPU0
#Minimum latency: 0 microseconds
#Average latency: 2 microseconds
#Maximum latency: 115 microseconds

#  head -3 latency_hist/preemptoff/CPU0
#Minimum latency: 3 microseconds
#Average latency: 21 microseconds
#Maximum latency: 105 microseconds


Note: I do not have high-precision timers; my board's ACPI
is not recognized and all of the MFGPT  timers on the CS5536
are probably taken by the BIOS. current_clocksource is TSC.
Could this matter?


Geode LX 500 MHz
256 MB RAM, 205 MB free
3.4.15-rt24 + tglx's tiny RCU fix from a week ago
PREEMPT_FULL
e100 network driver
CS5536 companion device which does the PATA (CONFIG_PATA_CS5536)
the CF CARD is detected and used as PIO4
mlockall() for the application
no swap configured

Thanks
-- 
                               Stano

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: loading pages to memory causes high latency
  2012-10-26 18:00 loading pages to memory causes high latency Stanislav Meduna
@ 2012-10-26 21:04 ` Thomas Gleixner
  2012-10-27  8:38   ` Thomas Gleixner
  0 siblings, 1 reply; 5+ messages in thread
From: Thomas Gleixner @ 2012-10-26 21:04 UTC (permalink / raw)
  To: Stanislav Meduna; +Cc: linux-rt-users

On Fri, 26 Oct 2012, Stanislav Meduna wrote:
> Hi all,
> 
> I have an application that absolutely has to meet a 100 ms deadline.
> All threads of the application are SCHED_RR ones with priority
> that is higher than any other userspace process in the system and
> also than the irq handler of the PATA controller.
> 
> The command
> 
>  sync ; echo 3 > /proc/sys/vm/drop_caches ; cat /var/log/* > /dev/null
> 
> which is around 6 MB of data needed to fetch from a CF card:
> 
> - immediately causes [sched_delayed] sched: RT throttling activated,
>   even if there is _no_ RT application started yet. That means
>   that it is probably the irq task itself that is triggering it,
>   penalizing all other good-behaving RT tasks
> 
> # ps -eLo pid,class,rtprio,comm  | egrep 'RR|FF'
>     3 FF       1 ksoftirqd/0
>     6 FF      99 posixcputmr/0
>    27 FF      22 irq/14-pata_cs5
>    32 FF      50 irq/15-ehci_hcd
>    33 FF      50 irq/15-ohci_hcd
>    34 FF      50 irq/8-rtc0
>   417 FF      50 irq/5-eth0
>   438 FF      50 irq/15-eth1
> 
> 
> - causes the process to block for a time that is much longer
>   than I am comfortable with.
> 
> The command
>   echo 1000000 > /proc/sys/kernel/sched_rt_runtime_us
> fixes the problem but is not really something I want
> to enable when developing a real-time capable system
> that will be running in remote locations. With 980000
> I still get negligible latencies. With 970000 I get 4 ms.
> With 960000 I get 15 ms latencies. So it is indeed
> the throttling that is firing here.

Right. So that means that the pata irq or something else loops in
circles for whatever reasons.

For a simple test, can you just set the pata irq thread to SCHED_OTHER
with chrt? Does that cure the problem?

Aside of that I'd really like to see a full function trace of that
scenario. Can you please apply the patch below and run that with the
function tracer enabled. The throttling will stop the trace right at
the point.

Thanks,

	tglx

----------->

--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -868,6 +868,7 @@ static int sched_rt_runtime_exceeded(struct rt_rq *rt_rq)
 			static bool once = false;
 
 			rt_rq->rt_throttled = 1;
+			tracing_off();
 
 			if (!once) {
 				once = true;





^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: loading pages to memory causes high latency
  2012-10-26 21:04 ` Thomas Gleixner
@ 2012-10-27  8:38   ` Thomas Gleixner
  2012-10-27  9:06     ` Stanislav Meduna
  0 siblings, 1 reply; 5+ messages in thread
From: Thomas Gleixner @ 2012-10-27  8:38 UTC (permalink / raw)
  To: Stanislav Meduna; +Cc: linux-rt-users

On Fri, 26 Oct 2012, Thomas Gleixner wrote:
> On Fri, 26 Oct 2012, Stanislav Meduna wrote:
> > The command
> >   echo 1000000 > /proc/sys/kernel/sched_rt_runtime_us
> > fixes the problem but is not really something I want
> > to enable when developing a real-time capable system
> > that will be running in remote locations. With 980000
> > I still get negligible latencies. With 970000 I get 4 ms.
> > With 960000 I get 15 ms latencies. So it is indeed
> > the throttling that is firing here.
> 
> Right. So that means that the pata irq or something else loops in
> circles for whatever reasons.
> 
> For a simple test, can you just set the pata irq thread to SCHED_OTHER
> with chrt? Does that cure the problem?
> 
> Aside of that I'd really like to see a full function trace of that
> scenario. Can you please apply the patch below and run that with the
> function tracer enabled. The throttling will stop the trace right at
> the point.

Ok. The trace shows what happens. The pata driver runs all of its work
in the interrupt handler context and the hardware is fast enough to
saturate the machine. I.e. right after the handler returns and
reenables interrupts the next pata irq fires. Lather rinse and repeat.

On mainline you'd just see the machine stalling for a bit as it would
process interrupts and nothing else. Of course you wont see rt
throttling because all of this happens in hard interrupt context,
except if you force interrupt threading via "threadirqs" on the kernel
command line.

There is not much we can do about that unless we want to do a major
surgery on that pata driver code.

So either you disable the throttler or you reconfigure the pata irq
thread to SCHED_OTHER, which will slow it down a bit, but avoids the
hogging of the machine with a RT task going wild.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: loading pages to memory causes high latency
  2012-10-27  8:38   ` Thomas Gleixner
@ 2012-10-27  9:06     ` Stanislav Meduna
  2012-10-27  9:56       ` Thomas Gleixner
  0 siblings, 1 reply; 5+ messages in thread
From: Stanislav Meduna @ 2012-10-27  9:06 UTC (permalink / raw)
  To: linux-rt-users

On 27.10.2012 10:38, Thomas Gleixner wrote:

> There is not much we can do about that unless we want to do a major
> surgery on that pata driver code.
> 
> So either you disable the throttler or you reconfigure the pata irq
> thread to SCHED_OTHER, which will slow it down a bit, but avoids the
> hogging of the machine with a RT task going wild.

Thank you for the analysis. Yep I'll reconfigure it to SCHED_OTHER -
file access is a totally low-prio thing in my application and I was more
afraid of a possibility where someone fetching logfiles or something
like that will bring the application to the knees.

BTW what granularity does the throttler have? I cannot live with
50 ms lag once in a second, but in my case I can well live with
5 ms lag once in 100 ms (even if that means that the throttler
would be activated more often).

Right now I have

CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ=y
CONFIG_HZ=1000
CONFIG_SCHED_HRTICK=y

and lacking better choices my clock source is TSC. I admit that
I do not know what precisely each setting does - the NO_HZ combined
with HZ=1000 looks a bit weird ;)

Best regards
-- 
                                      Stano


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: loading pages to memory causes high latency
  2012-10-27  9:06     ` Stanislav Meduna
@ 2012-10-27  9:56       ` Thomas Gleixner
  0 siblings, 0 replies; 5+ messages in thread
From: Thomas Gleixner @ 2012-10-27  9:56 UTC (permalink / raw)
  To: Stanislav Meduna; +Cc: linux-rt-users

On Sat, 27 Oct 2012, Stanislav Meduna wrote:
> On 27.10.2012 10:38, Thomas Gleixner wrote:
> 
> > There is not much we can do about that unless we want to do a major
> > surgery on that pata driver code.
> > 
> > So either you disable the throttler or you reconfigure the pata irq
> > thread to SCHED_OTHER, which will slow it down a bit, but avoids the
> > hogging of the machine with a RT task going wild.
> 
> Thank you for the analysis. Yep I'll reconfigure it to SCHED_OTHER -
> file access is a totally low-prio thing in my application and I was more
> afraid of a possibility where someone fetching logfiles or something
> like that will bring the application to the knees.
> 
> BTW what granularity does the throttler have? I cannot live with
> 50 ms lag once in a second, but in my case I can well live with
> 5 ms lag once in 100 ms (even if that means that the throttler
> would be activated more often).

The granularity is microseconds. 

sched_rt_period  defaults to 1000000	== 1.00s
sched_rt_runtime defaults to  950000	== 0.95s

The mechanics of the throttler is that it gives SCHED_OTHER a chance
to run for sched_rt_period - sched_rt_runtime (default 0.05s == 50ms)
if RT tasks hogged the cpu for more than sched_rt_runtime in a row.

This is a safety belt, so you have a chance to kill your RT task which
got lost in a while(1) loop :)

> Right now I have
> 
> CONFIG_TICK_ONESHOT=y
> CONFIG_NO_HZ=y
> CONFIG_HZ=1000
> CONFIG_SCHED_HRTICK=y
> 
> and lacking better choices my clock source is TSC. I admit that
> I do not know what precisely each setting does - the NO_HZ combined
> with HZ=1000 looks a bit weird ;)

HZ is the scheduler tick frequency. That's used for timeslicing
SCHED_OTHER tasks.

NO_HZ is only affecting idle. When the machine goes idle and the next
timer expires AFTER the next scheduler tick, then the scheduler tick
is stopped and the timer device is armed to the timer expiry. That's a
power saving feature as it allows the CPU to stay in deeper
powerstates for a longer time, instead of waking up with HZ frequency
just to see that there is nothing to do, which is silly as we know
that right when we go idle :)

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2012-10-27  9:56 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-10-26 18:00 loading pages to memory causes high latency Stanislav Meduna
2012-10-26 21:04 ` Thomas Gleixner
2012-10-27  8:38   ` Thomas Gleixner
2012-10-27  9:06     ` Stanislav Meduna
2012-10-27  9:56       ` Thomas Gleixner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.