linux-rt-users.vger.kernel.org archive mirror
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: Udo van den Heuvel <udovdh@xs4all.nl>
Cc: RT <linux-rt-users@vger.kernel.org>
Subject: Re: 5.4.13-rt7 stall on CPU?
Date: Fri, 3 Jul 2020 21:49:34 +0200
Message-ID: <20200703194934.c5sdqwxwgzmgobtq@linutronix.de>
In-Reply-To: <3ef1ba37-6b83-e12a-e493-9c45fa3bb3c1@xs4all.nl>

On 2020-06-27 15:30:20 [+0200], Udo van den Heuvel wrote:
> Hello,
Hi,

> Found this in /var/log/messages:
> 
> Jun 25 16:31:39 vuurmuur pppd[1522583]: local  LL address fe80::ed36:3ac4:4115:e23e
> Jun 25 16:31:39 vuurmuur pppd[1522583]: remote LL address fe80::2a8a:1cff:fee0:9484
> Jun 26 04:50:24 vuurmuur kernel: 002: rcu: INFO: rcu_preempt self-detected stall on CPU
> Jun 26 04:50:24 vuurmuur kernel: 002: rcu:      2-....: (5336 ticks this GP) idle=f6a/1/0x4000000000000002 softirq=347363113/347363115 fqs=2430
> Jun 26 04:50:24 vuurmuur kernel: 002:   (t=5250 jiffies g=608224341 q=1297)
> Jun 26 04:50:24 vuurmuur kernel: 002: NMI backtrace for cpu 2
> Jun 26 04:50:24 vuurmuur kernel: 002: RIP: 0010:__fget_light+0x3d/0x60
> Jun 26 04:50:24 vuurmuur kernel: 002: Code: ca 75 2e 48 8b 50 50 8b 02 39 c7 73 21 89 f9 48 39 c1 48 19 c0 21 c7 48 8b 42 08 48 8d 04 f8 48 8b 00 48 85 c0 74 07 85 70 7c <75> 02 f3 c3 31 c0 c3 ba 01 00 00 00 e8 22 fe ff ff 48 85 c0 74 ee
> Jun 26 04:50:24 vuurmuur kernel: 002:  do_select+0x350/0x7a0
> Jun 26 04:50:24 vuurmuur kernel: 002:  core_sys_select+0x1d0/0x380
> Jun 26 04:50:24 vuurmuur kernel: 002:  __x64_sys_pselect6+0x141/0x190
> Jun 26 05:03:01 vuurmuur named[1433212]: received control channel command 'flush'
> 
> 
> What went wrong?

ntpq entered the kernel via pselect(). Inside that syscall it looped at
some point and RCU couldn't make any progress. Assuming you have
CONFIG_HZ=250, it didn't make any progress for 5250/250 = 21
seconds. During this stall 1297 callbacks piled up. The situation resolved
itself later, because this "rcu_preempt self-detected stall" did not
appear again.
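
If you want to redo that arithmetic for your actual tick rate, here is a
minimal sketch. It pulls t= and q= out of the "(t=... jiffies ...)" line
of the report and converts the jiffies to seconds. The helper names
(parse_stall, stall_seconds) and the list of HZ values are my own and only
for illustration; the real HZ has to come from your kernel config.

```python
import re

# Matches the "(t=... jiffies g=... q=...)" line of an RCU stall report.
STALL_RE = re.compile(r"\(t=(?P<t>\d+) jiffies g=(?P<g>-?\d+) q=(?P<q>\d+)\)")


def parse_stall(line):
    """Return t, g and q as integers, or None if the line does not match."""
    m = STALL_RE.search(line)
    if m is None:
        return None
    return {name: int(value) for name, value in m.groupdict().items()}


def stall_seconds(jiffies, hz):
    """Convert a jiffies count to seconds for a given CONFIG_HZ."""
    return jiffies / hz


if __name__ == "__main__":
    line = ("Jun 26 04:50:24 vuurmuur kernel: 002:   "
            "(t=5250 jiffies g=608224341 q=1297)")
    info = parse_stall(line)
    # HZ is an assumption here; check CONFIG_HZ in your kernel config.
    for hz in (100, 250, 300, 1000):
        secs = stall_seconds(info["t"], hz)
        print(f"HZ={hz}: stalled for {secs:g}s, {info['q']} callbacks queued")
```

With HZ=250 this gives the 21 seconds mentioned above; with a different
tick rate the same t= value means a different duration.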

> How bad is this?
Each callback would free a data structure, i.e. give memory back to the
system. Since ntpq led to an RCU stall, the system could not release
memory. You will eventually run out of memory if this situation does not
get resolved.

> How to avoid?
Can you reproduce this or was this a one-time thing?
I *think* this happened within the loop in __fget_files(). This function
is inlined into __fget_light() and the loop has an RCU section so it would
make sense.
Do you run something at an elevated priority in the system? I don't know
what the other part was doing but somehow one of the file descriptors
(network sockets probably) was about to be closed while the other side
tried to poll() on it.
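
To answer the priority question, something like the following could list
the tasks that run under a realtime policy. This is only a sketch: it
assumes Linux with /proc mounted, the function names are made up for this
mail, and you need enough privileges to query other users' tasks (tasks
it cannot query are skipped silently).

```python
import os

# Fall back to the Linux values if the os module lacks these constants.
SCHED_FIFO = getattr(os, "SCHED_FIFO", 1)
SCHED_RR = getattr(os, "SCHED_RR", 2)
SCHED_RESET_ON_FORK = getattr(os, "SCHED_RESET_ON_FORK", 0x40000000)


def is_realtime(policy):
    """True for SCHED_FIFO/SCHED_RR, ignoring the reset-on-fork flag."""
    return (policy & ~SCHED_RESET_ON_FORK) in (SCHED_FIFO, SCHED_RR)


def realtime_tasks(proc="/proc"):
    """Return (tid, priority, comm) for every realtime thread found."""
    if not hasattr(os, "sched_getscheduler") or not os.path.isdir(proc):
        return []
    found = []
    for pid in filter(str.isdigit, os.listdir(proc)):
        task_dir = os.path.join(proc, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # process exited while we were scanning
        for tid in tids:
            try:
                if not is_realtime(os.sched_getscheduler(int(tid))):
                    continue
                prio = os.sched_getparam(int(tid)).sched_priority
                with open(os.path.join(task_dir, tid, "comm")) as f:
                    comm = f.read().strip()
            except (OSError, ValueError):
                continue  # thread vanished or access was denied
            found.append((int(tid), prio, comm))
    # Highest priority first.
    return sorted(found, key=lambda entry: -entry[1])


if __name__ == "__main__":
    for tid, prio, comm in realtime_tasks():
        print(f"{tid:>8}  prio {prio:>2}  {comm}")
```

On an RT kernel the threaded interrupt handlers will show up in that list
anyway; the interesting entries are user tasks you (or a package) put
there.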

> Kind regards,
> Udo

Sebastian

Thread overview: 5+ messages
2020-06-27 13:30 5.4.13-rt7 stall on CPU? Udo van den Heuvel
2020-07-03 19:49 ` Sebastian Andrzej Siewior [this message]
2020-07-04  3:43   ` Udo van den Heuvel
2020-07-07 17:47     ` Sebastian Andrzej Siewior
2020-07-20  9:21       ` Udo van den Heuvel
