soc.lore.kernel.org archive mirror
From: Zhouyi Zhou <zhouzhouyi@gmail.com>
To: "Jorge Ramirez-Ortiz, Foundries" <jorge@foundries.io>
Cc: paulmck@kernel.org, Josh Triplett <josh@joshtriplett.org>,
	 rostedt <rostedt@goodmis.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	 Lai Jiangshan <jiangshanlai@gmail.com>,
	"Joel Fernandes, Google" <joel@joelfernandes.org>,
	 rcu <rcu@vger.kernel.org>,
	soc@kernel.org, linux-arm-kernel@lists.infradead.org
Subject: Re: rcu_preempt detected stalls
Date: Wed, 1 Sep 2021 01:01:11 +0800	[thread overview]
Message-ID: <CAABZP2xGnSLVbgxqjKMq=Oj_H7rYfYjuCvmBpmZ4tRptGs3SEw@mail.gmail.com> (raw)
In-Reply-To: <20210831152144.GA28128@trex>

I did an experiment just now on an x86_64 virtual machine; RCU did not
complain after a 10-minute test. I hope this can provide some clue. (A
consolidated command sketch follows the numbered steps below.)

1. I cloned a fresh mainline kernel tree (git clone
https://kernel.source.codeaurora.cn/pub/scm/linux/kernel/git/torvalds/linux.git).
2. I compiled the kernel without CONFIG_RCU_BOOST (# CONFIG_RCU_BOOST is not set).
3. I booted the kernel on an x86_64 VM (kvm -cpu host -smp 16 -hda
./debian10.qcow2 -m 4096 -net
user,hostfwd=tcp::5556-:22,hostfwd=tcp::5555-:19 -net nic,model=e1000
-vnc :30).
4. I ran the test (stress-ng --sequential 16 --class scheduler -t 5m --times).
5. I monitored the system by repeatedly running top and dmesg.
6. After 10 minutes, nothing happened except that dmesg reported the
following two messages:
[  672.528192] sched: DL replenish lagged too much
[  751.127790] hrtimer: interrupt took 12143 ns
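
For reference, here is a rough sketch of the commands behind the steps
above. It is only an approximation of my local setup: the clone URL and
port forwards are copied from above, while the defconfig/scripts/config
invocations and the debian10.qcow2 image (which must already contain the
freshly built kernel) are assumptions, not an exact record.

  # 1. clone a fresh mainline tree
  git clone https://kernel.source.codeaurora.cn/pub/scm/linux/kernel/git/torvalds/linux.git
  cd linux

  # 2. enable preemption (assumption, to mirror the original report) and
  #    disable RCU priority boosting, then build
  make defconfig
  ./scripts/config --enable PREEMPT --disable RCU_BOOST
  make olddefconfig
  make -j"$(nproc)"

  # 3. boot a guest image that has this kernel installed (image name is local)
  kvm -cpu host -smp 16 -m 4096 -hda ./debian10.qcow2 \
      -net user,hostfwd=tcp::5556-:22,hostfwd=tcp::5555-:19 \
      -net nic,model=e1000 -vnc :30

  # 4. inside the guest: run the scheduler-class stressors for 5 minutes
  stress-ng --sequential 16 --class scheduler -t 5m --times

  # 5. in another guest shell: watch for RCU stall warnings as they appear
  dmesg -w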

So, I guess CONFIG_RCU_BOOST is not necessary on x86_64 virtual machines.

Zhouyi

On Tue, Aug 31, 2021 at 11:24 PM Jorge Ramirez-Ortiz, Foundries
<jorge@foundries.io> wrote:
>
> Hi
>
> When enabling CONFIG_PREEMPT and running the stress-ng scheduler-class
> tests on arm64 (Xilinx ZynqMP and i.MX8MM SoCs), we are observing the following.
>
> [   62.578917] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [   62.585015]  (detected by 0, t=5253 jiffies, g=3017, q=2972)
> [   62.590663] rcu: All QSes seen, last rcu_preempt kthread activity 5254 (4294907943-4294902689), jiffies_till_next_fqs=1, root ->qsmask 0x0
> [   62.603086] rcu: rcu_preempt kthread starved for 5258 jiffies! g3017 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
> [   62.613246] rcu:     Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
> [   62.622359] rcu: RCU grace-period kthread stack dump:
> [   62.627395] task:rcu_preempt     state:R  running task     stack:    0 pid:   14 ppid:     2 flags:0x00000028
> [   62.637308] Call trace:
> [   62.639748]  __switch_to+0x11c/0x190
> [   62.643319]  __schedule+0x3b8/0x8d8
> [   62.646796]  schedule+0x4c/0x108
> [   62.650018]  schedule_timeout+0x1ac/0x358
> [   62.654021]  rcu_gp_kthread+0x6a8/0x12b8
> [   62.657933]  kthread+0x14c/0x158
> [   62.661153]  ret_from_fork+0x10/0x18
> [   62.682919] BUG: scheduling while atomic: stress-ng-hrtim/831/0x00000002
> [   62.689604] Preemption disabled at:
> [   62.689614] [<ffffffc010059418>] irq_enter_rcu+0x30/0x58
> [   62.698393] CPU: 0 PID: 831 Comm: stress-ng-hrtim Not tainted 5.10.42+ #5
> [   62.706296] Hardware name: Zynqmp new (DT)
> [   62.710115] Call trace:
> [   62.712548]  dump_backtrace+0x0/0x240
> [   62.716202]  show_stack+0x2c/0x38
> [   62.719510]  dump_stack+0xcc/0x104
> [   62.722904]  __schedule_bug+0x78/0xc8
> [   62.726556]  __schedule+0x70c/0x8d8
> [   62.730037]  schedule+0x4c/0x108
> [   62.733259]  do_notify_resume+0x224/0x5d8
> [   62.737259]  work_pending+0xc/0x2a4
>
> The error eventually results in an OOM.
>
> RCU priority boosting does work around this issue, but it seems to me
> more of a workaround than a fix (otherwise I would expect boosting to be
> enabled by CONFIG_PREEMPT on arm64).
>
> The question is: is this an arm64 bug that should be investigated, or is
> this some known corner case of running stress-ng that is already
> understood?
>
> thanks
> Jorge
>
>
>
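
For context on the RCU priority boosting that Jorge mentions above, here is
a minimal sketch of the Kconfig fragment that enables it. The option names
are as I understand them from the mainline kernel tree, and the delay value
shown is the usual default, not something taken from this thread.

  # .config fragment: turn on RCU priority boosting
  # (RCU_BOOST depends on PREEMPT_RCU and, in recent kernels, RCU_EXPERT)
  CONFIG_PREEMPT=y
  CONFIG_RCU_EXPERT=y
  CONFIG_RCU_BOOST=y
  # milliseconds a grace period may be delayed before boosting kicks in;
  # 500 is the usual default
  CONFIG_RCU_BOOST_DELAY=500

If I remember correctly, the priority used by the boost and grace-period
kthreads can also be raised at boot time with the rcutree.kthread_prio=
kernel parameter.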

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

Thread overview: 14+ messages

2021-08-31 15:21 ` rcu_preempt detected stalls  Jorge Ramirez-Ortiz, Foundries
2021-08-31 15:53   ` Paul E. McKenney
2021-08-31 17:01   ` Zhouyi Zhou [this message]
2021-08-31 17:11     ` Zhouyi Zhou
2021-09-01  1:03       ` Zhouyi Zhou
2021-09-01  4:08         ` Neeraj Upadhyay
2021-09-01  6:47           ` Zhouyi Zhou
2021-09-01  8:23         ` Jorge Ramirez-Ortiz, Foundries
2021-09-01  9:17           ` Zhouyi Zhou