From: "Paul E. McKenney" <paulmck@kernel.org>
To: "Jorge Ramirez-Ortiz, Foundries" <jorge@foundries.io>
Cc: josh@joshtriplett.org, rostedt@goodmis.org,
mathieu.desnoyers@efficios.com, jiangshanlai@gmail.com,
joel@joelfernandes.org, rcu@vger.kernel.org, soc@kernel.org,
linux-arm-kernel@lists.infradead.org
Subject: Re: rcu_preempt detected stalls
Date: Tue, 31 Aug 2021 08:53:59 -0700
Message-ID: <20210831155359.GB4156@paulmck-ThinkPad-P17-Gen-1>
In-Reply-To: <20210831152144.GA28128@trex>
On Tue, Aug 31, 2021 at 05:21:44PM +0200, Jorge Ramirez-Ortiz, Foundries wrote:
> Hi
>
> When enabling CONFIG_PREEMPT and running the stress-ng scheduler class
> tests on arm64 (xilinx zynqmp and imx imx8mm SoCs) we are observing the following.
>
> [ 62.578917] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 62.585015] (detected by 0, t=5253 jiffies, g=3017, q=2972)
> [ 62.590663] rcu: All QSes seen, last rcu_preempt kthread activity 5254 (4294907943-4294902689), jiffies_till_next_fqs=1, root
> +->qsmask 0x0
> [ 62.603086] rcu: rcu_preempt kthread starved for 5258 jiffies! g3017 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
> [ 62.613246] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
The message above really does mean what it says: If your workload
prevents RCU's grace-period kthread ("rcu_preempt" in this case) from
running, you just bought yourself an OOM.
> [ 62.622359] rcu: RCU grace-period kthread stack dump:
> [ 62.627395] task:rcu_preempt state:R running task stack: 0 pid: 14 ppid: 2 flags:0x00000028
> [ 62.637308] Call trace:
> [ 62.639748] __switch_to+0x11c/0x190
> [ 62.643319] __schedule+0x3b8/0x8d8
> [ 62.646796] schedule+0x4c/0x108
> [ 62.650018] schedule_timeout+0x1ac/0x358
> [ 62.654021] rcu_gp_kthread+0x6a8/0x12b8
> [ 62.657933] kthread+0x14c/0x158
> [ 62.661153] ret_from_fork+0x10/0x18
> [ 62.682919] BUG: scheduling while atomic: stress-ng-hrtim/831/0x00000002
> [ 62.689604] Preemption disabled at:
> [ 62.689614] [<ffffffc010059418>] irq_enter_rcu+0x30/0x58
> [ 62.698393] CPU: 0 PID: 831 Comm: stress-ng-hrtim Not tainted 5.10.42+ #5
> [ 62.706296] Hardware name: Zynqmp new (DT)
> [ 62.710115] Call trace:
> [ 62.712548] dump_backtrace+0x0/0x240
> [ 62.716202] show_stack+0x2c/0x38
> [ 62.719510] dump_stack+0xcc/0x104
> [ 62.722904] __schedule_bug+0x78/0xc8
> [ 62.726556] __schedule+0x70c/0x8d8
> [ 62.730037] schedule+0x4c/0x108
> [ 62.733259] do_notify_resume+0x224/0x5d8
> [ 62.737259] work_pending+0xc/0x2a4
>
> The error results in OOM eventually.
>
> RCU priority boosting does work around this issue but it seems to me
> a workaround more than a fix (otherwise boosting would be enabled
> by CONFIG_PREEMPT for arm64 I guess?).
RCU priority boosting sets the rcu_preempt kthread's scheduling priority
to SCHED_FIFO priority level 1 instead of the normal SCHED_OTHER.
Therefore, if you build with CONFIG_RCU_BOOST=n, but manually set the
priority of rcu_preempt to SCHED_FIFO priority level 1, you might also
see this RCU CPU stall warning go away.
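
As a rough illustration of that manual experiment, the sketch below finds the
rcu_preempt kthread through /proc and switches it to SCHED_FIFO priority 1.
The helper names (find_pids_by_comm, set_fifo, boost_rcu_preempt) are
hypothetical and not part of any kernel or RCU tooling; the calls need root
(or CAP_SYS_NICE), and this is only one possible way to run the test, not a
recommended fix.

```python
import os
from pathlib import Path


def find_pids_by_comm(name, proc_root="/proc"):
    """Return the sorted PIDs whose /proc/<pid>/comm equals `name`."""
    pids = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-task entries such as "self" or "sys"
        try:
            comm = (entry / "comm").read_text().strip()
        except OSError:
            continue  # task exited meanwhile, or comm is unreadable
        if comm == name:
            pids.append(int(entry.name))
    return sorted(pids)


def set_fifo(pid, priority=1):
    """Switch `pid` to SCHED_FIFO at `priority` (requires privileges)."""
    os.sched_setscheduler(pid, os.SCHED_FIFO, os.sched_param(priority))


def boost_rcu_preempt():
    """Hypothetical helper: give every rcu_preempt task SCHED_FIFO prio 1."""
    for pid in find_pids_by_comm("rcu_preempt"):
        set_fifo(pid, 1)
```

Calling boost_rcu_preempt() as root on a CONFIG_RCU_BOOST=n kernel should have
the same effect as running "chrt -f -p 1 <pid>" against the rcu_preempt PID
(14 in the stack dump above).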
> The question is: is this an arm64 bug that should be investigated? or
> is this some known corner case of running stress-ng that is already
> understood?
I have not looked at stress-ng, but it is possible to configure your
system so that rcu_preempt gets little or no CPU time, for example,
by placing it into a CPU-poor cgroup on the one hand or by disabling
throttling and running a heavy real-time workload on the other.
Is stress-ng doing something like this?
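
One way to check the throttling half of that question is to read the
real-time throttling sysctls while the test runs. The sketch below assumes
the usual /proc/sys/kernel/sched_rt_runtime_us and sched_rt_period_us files,
where a runtime of -1 means throttling is disabled; the function names are
hypothetical, and whether stress-ng actually changes these values is not
something this thread establishes.

```python
from pathlib import Path


def rt_throttling(runtime_us, period_us):
    """Describe the real-time throttling state for the two sysctl values."""
    if runtime_us == -1:
        return "disabled: real-time tasks may use 100% of CPU time"
    share = 100.0 * runtime_us / period_us
    return f"enabled: real-time tasks limited to {share:.0f}% of CPU time"


def read_rt_sysctls(sys_root="/proc/sys/kernel"):
    """Read (runtime_us, period_us) from the scheduler sysctl files."""
    root = Path(sys_root)
    runtime = int((root / "sched_rt_runtime_us").read_text())
    period = int((root / "sched_rt_period_us").read_text())
    return runtime, period
```

For example, print(rt_throttling(*read_rt_sysctls())) before and during the
stress-ng run would show whether the setting changes. The cgroup half of the
question can be checked by reading /proc/<pid>/cgroup for the rcu_preempt
PID.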
There could of course also be an arm64 problem that affects scheduling,
but I suggest looking closely at what stress-ng is doing first.
Please let me know how it goes!
Thanx, Paul