* 100% RT load on 2 processor SMP machine
From: Tim Sander @ 2018-10-15 15:44 UTC
  To: linux-rt-users

Hi

I just discovered a bug in my playground realtime application which led to 100%
load on one core. This was the abnormal case where the system was waiting for
a network client to connect, and a FIFO buffer overflow accidentally turned
into a busy-wait loop at SCHED_FIFO priority 98. This in turn led to
a system which did not accept any new network connections.
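
The failure mode boils down to something like the following sketch (not the
actual application code; the two helpers are placeholders for the real FIFO
handling):

	#include <pthread.h>
	#include <sched.h>

	/* placeholders for the application's FIFO handling */
	extern int fifo_overflowed(void);
	extern void wait_for_data(void);

	static void *reader(void *arg)
	{
		struct sched_param sp = { .sched_priority = 98 };

		pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp);
		while (1) {
			if (fifo_overflowed())	/* overflow path ...           */
				continue;	/* ... never blocks: 100% load */
			wait_for_data();	/* normal path blocks here     */
		}
		return NULL;
	}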

The expected behaviour of an SMP system with one processor still lightly
loaded should be that it keeps working "normally" even if the other core is
overloaded with hard realtime work. Are there any explanations for this
behaviour?

The embedded system I am using is a pretty dated Intel/Altera Cortex-A9.

Best regards
Tim


* Re: 100% RT load on 2 processor SMP machine
From: Sebastian Andrzej Siewior @ 2018-10-23  9:36 UTC
  To: Tim Sander; +Cc: linux-rt-users

On 2018-10-15 17:44:42 [+0200], Tim Sander wrote:
> Hi
Hi,

> I just discovered a bug in my playground realtime application which led to 100%
> load on one core. This was the abnormal case where the system was waiting for
> a network client to connect, and a FIFO buffer overflow accidentally turned
> into a busy-wait loop at SCHED_FIFO priority 98. This in turn led to
> a system which did not accept any new network connections.
> 
> The expected behaviour of an SMP system with one processor still lightly
> loaded should be that it keeps working "normally" even if the other core is
> overloaded with hard realtime work. Are there any explanations for this
> behaviour?
> 
You should figure out what blocked you in the end. If you had an interrupt
pinned to the CPU that was hogged by the RT task, then that would explain
why it did not make any progress.
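
You can check from userspace where an interrupt is routed, for instance
(IRQ number here only as an example):

	grep eth0 /proc/interrupts
	cat /proc/irq/29/smp_affinity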
If you disable RT_RUNTIME_SHARE then the RT task (that went wild) will be
throttled:

	echo NO_RT_RUNTIME_SHARE > /sys/kernel/debug/sched_features
> 
> Best regards
> Tim

Sebastian


* Re: 100% RT load on 2 processor SMP machine
From: Tim Sander @ 2018-10-25  9:56 UTC
  To: Sebastian Andrzej Siewior; +Cc: linux-rt-users

Hi Sebastian

Thanks for your feedback.
On Tuesday, 23 October 2018, 11:36:13 CEST, Sebastian Andrzej Siewior wrote:
> > I just discovered a bug in my playground realtime application which led
> > to 100% load on one core. This was the abnormal case where the system was
> > waiting for a network client to connect, and a FIFO buffer overflow
> > accidentally turned into a busy-wait loop at SCHED_FIFO priority 98. This
> > in turn led to a system which did not accept any new network connections.
> > 
> > The expected behaviour of an SMP system with one processor still lightly
> > loaded should be that it keeps working "normally" even if the other core
> > is overloaded with hard realtime work. Are there any explanations for
> > this behaviour?
> You should figure out what blocked you in the end. If you had an interrupt
> pinned to the CPU that was hogged by the RT task, then that would explain
> why it did not make any progress.
I have no CPU affinity set on this system, but /proc/interrupts shows that
only CPU 0 is serving interrupts (besides twd, the rescheduling interrupts
and the function call interrupts). For testing I have set the CPU affinity
of my runaway high priority thread to CPU 1, which made things a little
better, but the system is still barely usable.
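
For reference, the pinning can also be done from the shell, e.g. with the
TID of the runaway thread taken from the ps output below:

	taskset -p -c 1 187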

> If you disable RT_RUNTIME_SHARE then the RT task (that went wild) will be
> throttled:
> 
> 	echo NO_RT_RUNTIME_SHARE > /sys/kernel/debug/sched_features
I could not see any improvement from this, and I have also not seen any
throttling of the RT task. I have enabled the hung task check, but it seems
the tasks are making just enough progress not to trigger it, while the system
is crawling along so badly that normal tasks are not properly scheduled:

[  849.052372] i2c_designware ffc04000.i2c: controller timed out
[  849.058120] rtc-pcf8563 0-0051: pcf8563_read_block_data: read error

htop shows me that the first CPU is only lightly loaded, so in my opinion
there should be no need for stalls on this Intel/Altera Cyclone V ARM
Cortex-A9 system. Also, the runaway task runs at priority 98, so RCU and
the other housekeeping tasks should still be allowed to run alongside it.
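
For reference, the knobs that control the global RT throttling are
(upstream defaults in the comments):

	cat /proc/sys/kernel/sched_rt_period_us	   # 1000000 us = 1 s period
	cat /proc/sys/kernel/sched_rt_runtime_us   # 950000 us -> RT may use 95%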

There probably is some improvement from disabling RT_RUNTIME_SHARE after all:
while I have not seen any throttling, I have seen a BUG message:

[  302.045391] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=-20 stuck for 102s!
[  302.053571] Showing busy workqueues and worker pools:
[  302.058609] workqueue events_highpri: flags=0x10
[  302.063210]   pwq 3: cpus=1 node=0 flags=0x0 nice=-20 active=1/256
[  302.069387]     pending: flush_backlog BAR(82)
[  302.073843] workqueue netns: flags=0xe000a
[  302.077925]   pwq 4: cpus=0-1 flags=0x4 nice=0 active=1/1
[  302.083323]     in-flight: 82:cleanup_net
[  302.087344] workqueue ipv6_addrconf: flags=0x40008
[  302.092118]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/1
[  302.097947]     in-flight: 31:addrconf_verify_work
[  302.102742] pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=1s workers=3 idle: 32 5
[  302.110054] pool 4: cpus=0-1 flags=0x4 nice=0 hung=36s workers=4 idle: 57 148 7
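
As far as I understand, this message comes from the workqueue watchdog
(CONFIG_WQ_WATCHDOG); its threshold can be changed at runtime if a longer
observation window is needed:

	# value in seconds, default 30
	echo 60 > /sys/module/workqueue/parameters/watchdog_thresh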

Below is the output of "ps -Leo pid,tid,class,rtprio,stat,comm,wchan" while the above BUG messages were popping up.

  PID   TID CLS RTPRIO STAT COMMAND         WCHAN
    1     1 TS       - Ss   systemd         epoll_wait
    2     2 TS       - S    kthreadd        kthreadd
    3     3 TS       - I<   rcu_gp          rescuer_thread
    4     4 TS       - I<   rcu_par_gp      rescuer_thread
    5     5 TS       - I    kworker/0:0-rcu worker_thread
    6     6 TS       - I<   kworker/0:0H-ev worker_thread
    7     7 TS       - I    kworker/u4:0-ev worker_thread
    8     8 TS       - I<   mm_percpu_wq    rescuer_thread
    9     9 TS       - S    ksoftirqd/0     smpboot_thread_fn
   10    10 FF       1 S    ktimersoftd/0   smpboot_thread_fn
   11    11 TS       - I    rcu_preempt     rcu_gp_kthread
   12    12 TS       - I    rcu_sched       rcu_gp_kthread
   13    13 TS       - S    rcuc/0          smpboot_thread_fn
   14    14 TS       - S    kswork          swork_kthread
   15    15 FF      99 S    posixcputmr/0   smpboot_thread_fn
   16    16 FF      99 S    migration/0     smpboot_thread_fn
   17    17 FF      99 S    watchdog/0      smpboot_thread_fn
   18    18 TS       - S    cpuhp/0         smpboot_thread_fn
   19    19 TS       - S    cpuhp/1         smpboot_thread_fn
   20    20 FF      99 S    watchdog/1      smpboot_thread_fn
   21    21 FF      99 S    migration/1     smpboot_thread_fn
   22    22 FF      99 S    posixcputmr/1   smpboot_thread_fn
   23    23 TS       - R    rcuc/1          -
   24    24 FF       1 R    ktimersoftd/1   -
   25    25 TS       - R    ksoftirqd/1     -
   26    26 TS       - I    kworker/1:0-eve worker_thread
   27    27 TS       - R<   kworker/1:0H    -
   28    28 TS       - S    kdevtmpfs       devtmpfsd
   29    29 TS       - I<   netns           rescuer_thread
   30    30 TS       - S    rcu_tasks_kthre rcu_tasks_kthread
   31    31 TS       - D    kworker/0:1+ipv rtnl_lock
   32    32 TS       - I    kworker/0:2-eve worker_thread
   33    33 TS       - S    khungtaskd      watchdog
   34    34 TS       - S    oom_reaper      oom_reaper
   35    35 TS       - I<   writeback       rescuer_thread
   36    36 TS       - S    kcompactd0      kcompactd
   37    37 TS       - I<   crypto          rescuer_thread
   38    38 TS       - I<   kblockd         rescuer_thread
   39    39 FF      99 S    watchdogd       kthread_worker_fn
   40    40 TS       - I<   rpciod          rescuer_thread
   41    41 TS       - I<   kworker/u5:0    worker_thread
   42    42 TS       - I<   xprtiod         rescuer_thread
   43    43 TS       - S    kswapd0         kswapd
   44    44 TS       - I<   nfsiod          rescuer_thread
   56    56 FF      50 S    irq/33-denali-n irq_thread
   57    57 TS       - I    kworker/u4:1-ev worker_thread
   59    59 FF      50 S    irq/28-ff706000 irq_thread
   60    60 TS       - I<   ipv6_addrconf   rescuer_thread
   61    61 TS       - I    kworker/1:1-mm_ worker_thread
   62    62 TS       - S    ubi_bgt0d       ubi_thread
   64    64 TS       - S    ubifs_bgt0_1    ubifs_bg_thread
   77    77 TS       - Ss   systemd-journal epoll_wait
   82    82 TS       - D    kworker/u4:2+ne flush_work
  108   108 TS       - Ssl  systemd-timesyn epoll_wait
  108   179 TS       - Ssl  sd-resolve      skb_wait_for_more_packets
  111   111 FF      50 S    irq/30-ffc04000 irq_thread
  112   112 FF      50 S    irq/34-ff705000 irq_thread
  113   113 FF      50 S    irq/31-ffc06000 irq_thread
  114   114 FF      50 S    irq/36-fff00000 irq_thread
  116   116 TS       - S    spi0            kthread_worker_fn
  118   118 TS       - I<   stmmac_wq       rescuer_thread
  148   148 TS       - I    kworker/u4:3    worker_thread
  160   160 TS       - Ss   systemd-network epoll_wait
  161   161 TS       - Ss   ads1115test     hrtimer_nanosleep
  163   163 TS       - Ss   thttpd          poll_schedule_timeout.constprop.3
  164   164 TS       - Ss   dbus-daemon     epoll_wait
  166   166 TS       - Ss   sh              wait
  167   167 TS       - Ss+  agetty          poll_schedule_timeout.constprop.3
  169   169 TS       - Ss   cjet            epoll_wait
  171   171 TS       - Ss   systemd-resolve epoll_wait
  172   172 TS       - Ss   jetstatic.bin   epoll_wait
  173   173 TS       - Ss   devscan.bin     epoll_wait
  175   175 TS       - I    kworker/1:2-eve worker_thread
  176   176 FF      50 S    irq/29-eth0     irq_thread
  177   177 FF      50 S    irq/40-ttyS0    irq_thread
  185   185 TS       - SLsl rawisolwebbits  poll_schedule_timeout.constprop.3
  185   187 FF      98 RLsl RawMeasThread   -
  194   194 TS       - Ss   systemd-udevd   epoll_wait
  200   200 TS       - I<   kworker/0:1H    worker_thread
  204   204 TS       - R+   ps              -

Best regards
Tim

