All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: hrtimer: interrupt took 10252 ns - meaning?
@ 2016-06-13  9:16 Jens Koehler
  2016-07-01 10:49 ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 14+ messages in thread
From: Jens Koehler @ 2016-06-13  9:16 UTC (permalink / raw)
  To: linux-rt-users; +Cc: Sebastian Andrzej Siewior

I found now out that the problem with RT throttling could happen when
my application sends UDP broadcasts (and receive its own broadcast).

Is this an already known problem?

[137482.258011] hrtimer: interrupt took 9434 ns
[178458.961966] NOHZ: local_softirq_pending 102
[178458.967583] NOHZ: local_softirq_pending 102
[178458.973295] NOHZ: local_softirq_pending 102
[178458.980249] NOHZ: local_softirq_pending 102
[178458.984994] NOHZ: local_softirq_pending 102
[178458.990582] NOHZ: local_softirq_pending 102
[178458.996292] NOHZ: local_softirq_pending 102
[178459.000348] [sched_delayed] sched: RT throttling activated

Jens

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: hrtimer: interrupt took 10252 ns - meaning?
  2016-06-13  9:16 hrtimer: interrupt took 10252 ns - meaning? Jens Koehler
@ 2016-07-01 10:49 ` Sebastian Andrzej Siewior
  2016-08-08 16:45   ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 14+ messages in thread
From: Sebastian Andrzej Siewior @ 2016-07-01 10:49 UTC (permalink / raw)
  To: Jens Koehler, linux-rt-users

On 06/13/2016 11:16 AM, Jens Koehler wrote:
> I found now out that the problem with RT throttling could happen when
> my application sends UDP broadcasts (and receive its own broadcast).
> 
> Is this an already known problem?

Not to my knowledge. Could you please send a (small) testcase and your
kernel config? I then would take look a check how serious it is :)

> [137482.258011] hrtimer: interrupt took 9434 ns
> [178458.961966] NOHZ: local_softirq_pending 102
> [178458.967583] NOHZ: local_softirq_pending 102
> [178458.973295] NOHZ: local_softirq_pending 102
> [178458.980249] NOHZ: local_softirq_pending 102
> [178458.984994] NOHZ: local_softirq_pending 102
> [178458.990582] NOHZ: local_softirq_pending 102
> [178458.996292] NOHZ: local_softirq_pending 102
> [178459.000348] [sched_delayed] sched: RT throttling activated
> 
> Jens
> 
Sebastian

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: hrtimer: interrupt took 10252 ns - meaning?
  2016-07-01 10:49 ` Sebastian Andrzej Siewior
@ 2016-08-08 16:45   ` Sebastian Andrzej Siewior
  2016-08-12  9:47     ` Jens Koehler
  0 siblings, 1 reply; 14+ messages in thread
From: Sebastian Andrzej Siewior @ 2016-08-08 16:45 UTC (permalink / raw)
  To: Jens Koehler, linux-rt-users

* Sebastian Andrzej Siewior | 2016-07-01 12:49:26 [+0200]:

tried your testcase and it didn't trigger in KVM. To get more
informations out of this I prepared a patch to assist you.

>> [137482.258011] hrtimer: interrupt took 9434 ns

For some reason you looped 3 times through the timer code. If you enable
`timer' events then you should see the "entry" point of the hrtimer irq
followed by hrtimer_expire_entry + hrtimer_expire_exit. Once you run
into that case you can look up your trace and examine which hrtimers
were involved and which took so long.

>> [178458.961966] NOHZ: local_softirq_pending 102
>> [178458.967583] NOHZ: local_softirq_pending 102
>> [178458.973295] NOHZ: local_softirq_pending 102
>> [178458.980249] NOHZ: local_softirq_pending 102
>> [178458.984994] NOHZ: local_softirq_pending 102
>> [178458.990582] NOHZ: local_softirq_pending 102
>> [178458.996292] NOHZ: local_softirq_pending 102
>> [178459.000348] [sched_delayed] sched: RT throttling activated

For those I would recommend the `sched' events. It should show you the
task switches before the NOHZ warning. It might give you a hint why the
scheduler picks the swapper/idle task rather than a task to process
sofitrqis.
Also one thing that comes to mind after I was testing it today: If you
trigger the "RT throttling" then you might see the NOHZ warning. The
timer-softirq might want to run as a softirq thread with RT priority
which won't be scheduled due to the throttling.

One more thing: Your config was from an earlier v3.14 kernel. I
recommend using a later v3.14.73-rt77 (if v3.14 is a must). The later
version has the timers softirq split out of the general softirq thread.
That means the "general" softirq thread which does napi won't run at RT
prio which means it will be excluded from the "RT throttling". We had
also some reports regarding oom, RCU stalls on small/slow CPUs which are
hit with a lot of network traffic (like a ping flood).

>> Jens

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1543,6 +1543,7 @@ void hrtimer_interrupt(struct clock_event_device *dev)
 
 	raw_spin_lock(&cpu_base->lock);
 	entry_time = now = hrtimer_update_base(cpu_base);
+	trace_printk("entry: %llu\n", now.tv64);
 retry:
 	expires_next.tv64 = KTIME_MAX;
 	/*
@@ -1666,6 +1667,8 @@ void hrtimer_interrupt(struct clock_event_device *dev)
 	tick_program_event(expires_next, 1);
 	printk_once(KERN_WARNING "hrtimer: interrupt took %llu ns\n",
 		    ktime_to_ns(delta));
+	trace_printk("time took: %llu\n", delta.tv64);
+	tracing_off();
 out:
 	if (raise)
 		raise_softirq_irqoff(HRTIMER_SOFTIRQ);
diff --git a/kernel/softirq.c b/kernel/softirq.c
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -110,6 +110,7 @@ void softirq_check_pending_idle(void)
 	if (rate_limit >= 10)
 		return;
 
+	trace_printk("check\n");
 	warnpending = local_softirq_pending() & SOFTIRQ_STOP_IDLE_MASK;
 	for (i = 0; i < NR_SOFTIRQS; i++) {
 		struct task_struct *tsk = sr->runner[i];
@@ -134,6 +135,8 @@ void softirq_check_pending_idle(void)
 	if (warnpending) {
 		printk(KERN_ERR "NOHZ: local_softirq_pending %02x\n",
 		       warnpending);
+		trace_printk("NOHZ: %02x\n", warnpending);
+		tracing_off();
 		rate_limit++;
 	}
 }
-- 
2.8.1


Sebastian

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: hrtimer: interrupt took 10252 ns - meaning?
  2016-08-08 16:45   ` Sebastian Andrzej Siewior
@ 2016-08-12  9:47     ` Jens Koehler
  2016-08-12 12:37       ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 14+ messages in thread
From: Jens Koehler @ 2016-08-12  9:47 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: linux-rt-users

Hello Sebastian,

We upgraded the kernel according your recommendation to v3.14.73-rt78
(in rt77 was a bug - https://lwn.net/Articles/694711/) and applied
also suggested patch.

During test we noticed the *hrtimer: interrupt took xxxx ns* message
and got following log.

>          <idle>-0     [000] d..h2.. 45070.884077: hrtimer_interrupt: entry: 45069158989509
>          <idle>-0     [000] d..h2.. 45070.884079: hrtimer_cancel: hrtimer=f717fab8
>          <idle>-0     [000] d..h1.. 45070.884082: hrtimer_expire_entry: hrtimer=f717fab8 function=hrtimer_wakeup now=45069158989509
>          <idle>-0     [000] d..h2.. 45070.884084: sched_waking: comm=mlmodbus pid=824 prio=120 target_cpu=000
>          <idle>-0     [000] dN.h3.. 45070.884088: sched_wakeup: comm=mlmodbus pid=824 prio=120 target_cpu=000
>          <idle>-0     [000] dN.h1.. 45070.884089: hrtimer_expire_exit: hrtimer=f717fab8
>          <idle>-0     [000] dN.h2.. 45070.884093: hrtimer_cancel: hrtimer=c1653500
>          <idle>-0     [000] dN.h1.. 45070.884095: hrtimer_expire_entry: hrtimer=c1653500 function=tick_sched_timer now=45069159007328
>          <idle>-0     [000] dN.h2.. 45070.884105: sched_waking: comm=rcuc/0 pid=14 prio=98 target_cpu=000
>          <idle>-0     [000] dN.h3.. 45070.884108: sched_wakeup: comm=rcuc/0 pid=14 prio=98 target_cpu=000
>          <idle>-0     [000] dN.h1.. 45070.884111: hrtimer_expire_exit: hrtimer=c1653500
>          <idle>-0     [000] dN.h2.. 45070.884113: hrtimer_start: hrtimer=c1653500 function=tick_sched_timer expires=45069160000000 softexpires=45069160000000
>          <idle>-0     [000] dN.h2.. 45070.884116: hrtimer_cancel: hrtimer=e7923f08
>          <idle>-0     [000] dN.h1.. 45070.884118: hrtimer_expire_entry: hrtimer=e7923f08 function=hrtimer_wakeup now=45069159031133
>          <idle>-0     [000] dN.h2.. 45070.884120: sched_waking: comm=PreemptTestApp pid=813 prio=19 target_cpu=000
>          <idle>-0     [000] dN.h3.. 45070.884122: sched_wakeup: comm=PreempTestApp pid=813 prio=19 target_cpu=000
>          <idle>-0     [000] dN.h1.. 45070.884124: hrtimer_expire_exit: hrtimer=e7923f08
>          <idle>-0     [000] dN.h1.. 45070.884137: hrtimer_interrupt: time took: 51718

What could be reason that the timer looped 3 times through the code
before leaving it?

Jens




On Mon, Aug 8, 2016 at 6:45 PM, Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
> * Sebastian Andrzej Siewior | 2016-07-01 12:49:26 [+0200]:
>
> tried your testcase and it didn't trigger in KVM. To get more
> informations out of this I prepared a patch to assist you.
>
>>> [137482.258011] hrtimer: interrupt took 9434 ns
>
> For some reason you looped 3 times through the timer code. If you enable
> `timer' events then you should see the "entry" point of the hrtimer irq
> followed by hrtimer_expire_entry + hrtimer_expire_exit. Once you run
> into that case you can look up your trace and examine which hrtimers
> were involved and which took so long.
>
>>> [178458.961966] NOHZ: local_softirq_pending 102
>>> [178458.967583] NOHZ: local_softirq_pending 102
>>> [178458.973295] NOHZ: local_softirq_pending 102
>>> [178458.980249] NOHZ: local_softirq_pending 102
>>> [178458.984994] NOHZ: local_softirq_pending 102
>>> [178458.990582] NOHZ: local_softirq_pending 102
>>> [178458.996292] NOHZ: local_softirq_pending 102
>>> [178459.000348] [sched_delayed] sched: RT throttling activated
>
> For those I would recommend the `sched' events. It should show you the
> task switches before the NOHZ warning. It might give you a hint why the
> scheduler picks the swapper/idle task rather than a task to process
> sofitrqis.
> Also one thing that comes to mind after I was testing it today: If you
> trigger the "RT throttling" then you might see the NOHZ warning. The
> timer-softirq might want to run as a softirq thread with RT priority
> which won't be scheduled due to the throttling.
>
> One more thing: Your config was from an earlier v3.14 kernel. I
> recommend using a later v3.14.73-rt77 (if v3.14 is a must). The later
> version has the timers softirq split out of the general softirq thread.
> That means the "general" softirq thread which does napi won't run at RT
> prio which means it will be excluded from the "RT throttling". We had
> also some reports regarding oom, RCU stalls on small/slow CPUs which are
> hit with a lot of network traffic (like a ping flood).
>
>>> Jens
>
> diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
> --- a/kernel/hrtimer.c
> +++ b/kernel/hrtimer.c
> @@ -1543,6 +1543,7 @@ void hrtimer_interrupt(struct clock_event_device *dev)
>
>         raw_spin_lock(&cpu_base->lock);
>         entry_time = now = hrtimer_update_base(cpu_base);
> +       trace_printk("entry: %llu\n", now.tv64);
>  retry:
>         expires_next.tv64 = KTIME_MAX;
>         /*
> @@ -1666,6 +1667,8 @@ void hrtimer_interrupt(struct clock_event_device *dev)
>         tick_program_event(expires_next, 1);
>         printk_once(KERN_WARNING "hrtimer: interrupt took %llu ns\n",
>                     ktime_to_ns(delta));
> +       trace_printk("time took: %llu\n", delta.tv64);
> +       tracing_off();
>  out:
>         if (raise)
>                 raise_softirq_irqoff(HRTIMER_SOFTIRQ);
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -110,6 +110,7 @@ void softirq_check_pending_idle(void)
>         if (rate_limit >= 10)
>                 return;
>
> +       trace_printk("check\n");
>         warnpending = local_softirq_pending() & SOFTIRQ_STOP_IDLE_MASK;
>         for (i = 0; i < NR_SOFTIRQS; i++) {
>                 struct task_struct *tsk = sr->runner[i];
> @@ -134,6 +135,8 @@ void softirq_check_pending_idle(void)
>         if (warnpending) {
>                 printk(KERN_ERR "NOHZ: local_softirq_pending %02x\n",
>                        warnpending);
> +               trace_printk("NOHZ: %02x\n", warnpending);
> +               tracing_off();
>                 rate_limit++;
>         }
>  }
> --
> 2.8.1
>
>
> Sebastian

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: hrtimer: interrupt took 10252 ns - meaning?
  2016-08-12  9:47     ` Jens Koehler
@ 2016-08-12 12:37       ` Sebastian Andrzej Siewior
  2016-08-12 18:48         ` Jens Koehler
  2016-09-08 10:36         ` Jens Koehler
  0 siblings, 2 replies; 14+ messages in thread
From: Sebastian Andrzej Siewior @ 2016-08-12 12:37 UTC (permalink / raw)
  To: Jens Koehler; +Cc: linux-rt-users

On 2016-08-12 11:47:31 [+0200], Jens Koehler wrote:
> Hello Sebastian,
Hi Jens,

> During test we noticed the *hrtimer: interrupt took xxxx ns* message
> and got following log.
> 
> >          <idle>-0     [000] d..h2.. 45070.884077: hrtimer_interrupt: entry: 45069158989509
> >          <idle>-0     [000] d..h2.. 45070.884079: hrtimer_cancel: hrtimer=f717fab8
> >          <idle>-0     [000] d..h1.. 45070.884082: hrtimer_expire_entry: hrtimer=f717fab8 function=hrtimer_wakeup now=45069158989509
> >          <idle>-0     [000] d..h2.. 45070.884084: sched_waking: comm=mlmodbus pid=824 prio=120 target_cpu=000
> >          <idle>-0     [000] dN.h3.. 45070.884088: sched_wakeup: comm=mlmodbus pid=824 prio=120 target_cpu=000
> >          <idle>-0     [000] dN.h1.. 45070.884089: hrtimer_expire_exit: hrtimer=f717fab8
> >          <idle>-0     [000] dN.h2.. 45070.884093: hrtimer_cancel: hrtimer=c1653500
> >          <idle>-0     [000] dN.h1.. 45070.884095: hrtimer_expire_entry: hrtimer=c1653500 function=tick_sched_timer now=45069159007328
> >          <idle>-0     [000] dN.h2.. 45070.884105: sched_waking: comm=rcuc/0 pid=14 prio=98 target_cpu=000
> >          <idle>-0     [000] dN.h3.. 45070.884108: sched_wakeup: comm=rcuc/0 pid=14 prio=98 target_cpu=000
> >          <idle>-0     [000] dN.h1.. 45070.884111: hrtimer_expire_exit: hrtimer=c1653500
> >          <idle>-0     [000] dN.h2.. 45070.884113: hrtimer_start: hrtimer=c1653500 function=tick_sched_timer expires=45069160000000 softexpires=45069160000000
> >          <idle>-0     [000] dN.h2.. 45070.884116: hrtimer_cancel: hrtimer=e7923f08
> >          <idle>-0     [000] dN.h1.. 45070.884118: hrtimer_expire_entry: hrtimer=e7923f08 function=hrtimer_wakeup now=45069159031133
> >          <idle>-0     [000] dN.h2.. 45070.884120: sched_waking: comm=PreemptTestApp pid=813 prio=19 target_cpu=000
> >          <idle>-0     [000] dN.h3.. 45070.884122: sched_wakeup: comm=PreempTestApp pid=813 prio=19 target_cpu=000
> >          <idle>-0     [000] dN.h1.. 45070.884124: hrtimer_expire_exit: hrtimer=e7923f08
> >          <idle>-0     [000] dN.h1.. 45070.884137: hrtimer_interrupt: time took: 51718
.000002
.000003
.000002
.000004
.000001
.000004
.000002
.000010
.000003
.000003
.000002
.000003
.000002
.000002
.000002
.000002
.000013
> 
> What could be reason that the timer looped 3 times through the code
> before leaving it?

So we talk here about ~50us. Your initial report was about 9us.
Tracepoint to tracepoint takes approx. 3us except one that needs 10us.
All in all I would say this is okay and you don't have a 3GHz x86 box
here.
You had one timer which expired (hrtimer_wakeup). So you programmed a
new timer via tick_program_event() and this has failed. Probably because
the next event was in the past. This is the second loop where
tick_sched_timer expired. You also failed to program next timer here and
here you have your third loop.
You would have to figure out why programming of the next timer failed.
Maybe min delta is too low. 

> Jens

Sebastian

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: hrtimer: interrupt took 10252 ns - meaning?
  2016-08-12 12:37       ` Sebastian Andrzej Siewior
@ 2016-08-12 18:48         ` Jens Koehler
  2016-09-08 10:36         ` Jens Koehler
  1 sibling, 0 replies; 14+ messages in thread
From: Jens Koehler @ 2016-08-12 18:48 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: linux-rt-users

Thank you for explanation.

In the application running two real time threads. First one runs with
a sleep time of 2 ms and the second one with 100ms. In difference to
Hello World example from
https://rt.wiki.kernel.org/index.php/RT_PREEMPT_HOWTO the calculation
of sleep time starts always with clock_gettime. Could this cause the
problem?

while(1)
{
   //do the stuff

   clock_gettime(CLOCK_MONOTONIC, &tsleep);
   tsleep.tv_nsec += TimerThreadPeriod_ns;
   while (tsleep.tv_nsec >= NSEC_PER_SEC)
   {
      tsleep.tv_nsec -= NSEC_PER_SEC;
      tsleep.tv_sec++;
   }
   clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &tsleep, NULL);
}

How can we find the min delta for our CPU?

Jens

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: hrtimer: interrupt took 10252 ns - meaning?
  2016-08-12 12:37       ` Sebastian Andrzej Siewior
  2016-08-12 18:48         ` Jens Koehler
@ 2016-09-08 10:36         ` Jens Koehler
  2016-09-08 11:37           ` Sebastian Andrzej Siewior
  2016-09-12  9:16           ` Stanislav Meduna
  1 sibling, 2 replies; 14+ messages in thread
From: Jens Koehler @ 2016-09-08 10:36 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: linux-rt-users

Hello Sebastian,

After adding of your patch "disable preemption during CR3 read+write"
we have since 2 weeks no rt throttling event. This is very good
improvement.

Thanks a lot,
Jens

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: hrtimer: interrupt took 10252 ns - meaning?
  2016-09-08 10:36         ` Jens Koehler
@ 2016-09-08 11:37           ` Sebastian Andrzej Siewior
  2016-09-12  9:16           ` Stanislav Meduna
  1 sibling, 0 replies; 14+ messages in thread
From: Sebastian Andrzej Siewior @ 2016-09-08 11:37 UTC (permalink / raw)
  To: Jens Koehler; +Cc: linux-rt-users

On 2016-09-08 12:36:08 [+0200], Jens Koehler wrote:
> Hello Sebastian,
Hallo Jens,

> After adding of your patch "disable preemption during CR3 read+write"
> we have since 2 weeks no rt throttling event. This is very good
> improvement.

didn't see that coming. Glad it works now.

> Thanks a lot,
> Jens

Sebastian

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: hrtimer: interrupt took 10252 ns - meaning?
  2016-09-08 10:36         ` Jens Koehler
  2016-09-08 11:37           ` Sebastian Andrzej Siewior
@ 2016-09-12  9:16           ` Stanislav Meduna
  2016-09-12 10:07             ` Sebastian Andrzej Siewior
  2016-09-12 10:21             ` Vasudev Kamath
  1 sibling, 2 replies; 14+ messages in thread
From: Stanislav Meduna @ 2016-09-12  9:16 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, linux-rt-users

On 9/8/2016 12:36 PM, Jens Koehler wrote:

> disable preemption during CR3 read+write

This got my attention as I have hunted some very similar looking
thing 3 years ago:

https://lkml.org/lkml/2013/5/17/109

Coud it explain this? I am unfortunately not able to quickly
reproduce the environment back then.

-- 
                                    Stano


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: hrtimer: interrupt took 10252 ns - meaning?
  2016-09-12  9:16           ` Stanislav Meduna
@ 2016-09-12 10:07             ` Sebastian Andrzej Siewior
  2016-09-12 10:13               ` Stanislav Meduna
  2016-09-12 10:21             ` Vasudev Kamath
  1 sibling, 1 reply; 14+ messages in thread
From: Sebastian Andrzej Siewior @ 2016-09-12 10:07 UTC (permalink / raw)
  To: Stanislav Meduna; +Cc: linux-rt-users

On 2016-09-12 11:16:55 [+0200], Stanislav Meduna wrote:
> On 9/8/2016 12:36 PM, Jens Koehler wrote:
> 
> > disable preemption during CR3 read+write
> 
> This got my attention as I have hunted some very similar looking
> thing 3 years ago:
> 
> https://lkml.org/lkml/2013/5/17/109

This looks like the same thing. Repeated pagefault on UP, no help after
rewrite of the PTE.

> Coud it explain this? I am unfortunately not able to quickly
> reproduce the environment back then.
What do you want me to explain? I don't remember the thread from back
then. I probably skipped it because we had Steven, Rik, Linus and HPA
looped in.

Sebastian

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: hrtimer: interrupt took 10252 ns - meaning?
  2016-09-12 10:07             ` Sebastian Andrzej Siewior
@ 2016-09-12 10:13               ` Stanislav Meduna
  0 siblings, 0 replies; 14+ messages in thread
From: Stanislav Meduna @ 2016-09-12 10:13 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: linux-rt-users

On 9/12/2016 12:07 PM, Sebastian Andrzej Siewior wrote:

>> Coud it explain this? I am unfortunately not able to quickly
>> reproduce the environment back then.
> What do you want me to explain?

Oh, I meant just whether the issue that patch fixed could explain
what I have seen. Thanks for the confirmation.

-- 
                                      Stano


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: hrtimer: interrupt took 10252 ns - meaning?
  2016-09-12  9:16           ` Stanislav Meduna
  2016-09-12 10:07             ` Sebastian Andrzej Siewior
@ 2016-09-12 10:21             ` Vasudev Kamath
  1 sibling, 0 replies; 14+ messages in thread
From: Vasudev Kamath @ 2016-09-12 10:21 UTC (permalink / raw)
  To: Stanislav Meduna; +Cc: linux-rt-users

On Mon, Sep 12, 2016 at 2:46 PM, Stanislav Meduna <stano@meduna.org> wrote:
> This got my attention as I have hunted some very similar looking
> thing 3 years ago:

When we first started facing this issue, we had explored almost all of
your threads about throttling :-), but we never found a proper answer.
We did not know exact way to reproduce the issue of throttling except
setup running for few days 2-3 will result in throttling. Only one
common behavior in every case of throttling  throttled thread had
minor page fault of 900k to 1m.

When we read description of patch from Sebastian, we had a gut feeling
it may be our case as we were also a x86 UP system and we also had
minor page faults during throttling.

After using patch we did not faced throttling even after running
couple of weeks.

Cheers,
-- 

Vasudev Kamath
https://copyninja.info

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: hrtimer: interrupt took 10252 ns - meaning?
  2016-06-01  7:46 Jens Koehler
@ 2016-06-03 15:59 ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 14+ messages in thread
From: Sebastian Andrzej Siewior @ 2016-06-03 15:59 UTC (permalink / raw)
  To: Jens Koehler; +Cc: linux-rt-users

* Jens Koehler | 2016-06-01 09:46:56 [+0200]:

>Hi,
Hi,

>I am running a linux kernel with rt preempt patch and made also a  rt
>application. Application works good but sometimes I a get a message
>like "hrtimer: interrupt took 10252 ns" or "NOHZ:
>local_softirq_pending 102". What does it mean and is it a problem?

The hrtimer message tries to inform you that for some reasons you had
one or more hrtimers which expired and processing them to around 10us.
At a single point. This means either you had a lot of them or one in
particular took a very long time to finish. This might be problem
because you had a window of 10us in which you were not able to react to
external interrupts.

The NOHZ message means that you had a TIMER_SOFTIRQ and HRTIMER_SOFTIRQ
event pending but you rather went idle than process them. I've seen that
with network on an (almost) idle machine that the softirq was blocked
due to CPU pinning and so there was nothing else to do then to go idle.
There was one report around v3.8-RT I think and back the the debug code
didn't cover the scenario.

>Thanks and regards,
>Jens

Sebastian

^ permalink raw reply	[flat|nested] 14+ messages in thread

* hrtimer: interrupt took 10252 ns - meaning?
@ 2016-06-01  7:46 Jens Koehler
  2016-06-03 15:59 ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 14+ messages in thread
From: Jens Koehler @ 2016-06-01  7:46 UTC (permalink / raw)
  To: linux-rt-users

Hi,

I am running a linux kernel with rt preempt patch and made also a  rt
application. Application works good but sometimes I a get a message
like "hrtimer: interrupt took 10252 ns" or "NOHZ:
local_softirq_pending 102". What does it mean and is it a problem?

Thanks and regards,
Jens

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2016-09-12 10:21 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-13  9:16 hrtimer: interrupt took 10252 ns - meaning? Jens Koehler
2016-07-01 10:49 ` Sebastian Andrzej Siewior
2016-08-08 16:45   ` Sebastian Andrzej Siewior
2016-08-12  9:47     ` Jens Koehler
2016-08-12 12:37       ` Sebastian Andrzej Siewior
2016-08-12 18:48         ` Jens Koehler
2016-09-08 10:36         ` Jens Koehler
2016-09-08 11:37           ` Sebastian Andrzej Siewior
2016-09-12  9:16           ` Stanislav Meduna
2016-09-12 10:07             ` Sebastian Andrzej Siewior
2016-09-12 10:13               ` Stanislav Meduna
2016-09-12 10:21             ` Vasudev Kamath
  -- strict thread matches above, loose matches on Subject: below --
2016-06-01  7:46 Jens Koehler
2016-06-03 15:59 ` Sebastian Andrzej Siewior

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.