From: "Paul E. McKenney" <paulmck@kernel.org>
To: Qais Yousef <qais.yousef@arm.com>
Cc: rcu@vger.kernel.org, linux-kernel@vger.kernel.org,
	kernel-team@fb.com, mingo@kernel.org, jiangshanlai@gmail.com,
	akpm@linux-foundation.org, mathieu.desnoyers@efficios.com,
	josh@joshtriplett.org, tglx@linutronix.de, peterz@infradead.org,
	rostedt@goodmis.org, dhowells@redhat.com, edumazet@google.com,
	fweisbec@gmail.com, oleg@redhat.com, joel@joelfernandes.org,
	Yanfei Xu <yanfei.xu@windriver.com>
Subject: Re: [PATCH rcu 02/18] rcu: Fix stall-warning deadlock due to non-release of rcu_node ->lock
Date: Tue, 3 Aug 2021 08:52:26 -0700
Message-ID: <20210803155226.GQ4397@paulmck-ThinkPad-P17-Gen-1>
In-Reply-To: <20210803142458.teveyn6t2gwifdcp@e107158-lin.cambridge.arm.com>

On Tue, Aug 03, 2021 at 03:24:58PM +0100, Qais Yousef wrote:
> Hi
> 
> On 07/21/21 13:21, Paul E. McKenney wrote:
> > From: Yanfei Xu <yanfei.xu@windriver.com>
> > 
> > If rcu_print_task_stall() is invoked on an rcu_node structure that does
> > not contain any tasks blocking the current grace period, it takes an
> > early exit that fails to release that rcu_node structure's lock.  This
> > results in a self-deadlock, which is detected by lockdep.
> > 
> > To reproduce this bug:
> > 
> > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 3 --trust-make --configs "TREE03" --kconfig "CONFIG_PROVE_LOCKING=y" --bootargs "rcutorture.stall_cpu=30 rcutorture.stall_cpu_block=1 rcutorture.fwd_progress=0 rcutorture.test_boost=0"
> > 
> > This will also result in other complaints, including RCU's scheduler
> > hook complaining about blocking rather than preemption and an rcutorture
> > writer stall.
> > 
> > Only a partial RCU CPU stall warning message will be printed because of
> > the self-deadlock.
> > 
> > This commit therefore releases the lock on the rcu_print_task_stall()
> > function's early exit path.
> > 
> > Fixes: c583bcb8f5ed ("rcu: Don't invoke try_invoke_on_locked_down_task() with irqs disabled")
> > Signed-off-by: Yanfei Xu <yanfei.xu@windriver.com>
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > ---
> 
> We are seeing a similar stall/deadlock issue on the Android 5.10 kernel; is
> the fix relevant here? I am trying to apply the patches and test, but the
> problem is tricky to reproduce, so I thought it was worth asking first.

Looks like the relevant symptoms to me, so I suggest trying this series
from -rcu:

8baded711edc ("rcu: Fix to include first blocked task in stall warning")
f6b3995a8b56 ("rcu: Fix stall-warning deadlock due to non-release of rcu_node ->lock")

							Thanx, Paul

> 	[ 1010.244334] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> 	[ 1010.250538] rcu: 	2-...!: (42 GPs behind) idle=884/0/0x0 softirq=1646/1647 fqs=0  (false positive?)
> 	[ 1010.259746] rcu: 	3-...!: (1 ticks this GP) idle=134/0/0x0 softirq=3205/3205 fqs=0  (false positive?)
> 	[ 1010.269013] 	(detected by 4, t=6502 jiffies, g=9121, q=110)
> 	[ 1010.274621]
> 	[ 1010.276115] ============================================
> 	[ 1010.281438] WARNING: possible recursive locking detected
> 	[ 1010.286763] 5.10.43-g92fdbb553f50-ab979 #1 Not tainted
> 	[ 1010.291912] --------------------------------------------
> 	[ 1010.297235] swapper/4/0 is trying to acquire lock:
> 	[ 1010.302037] ffff8000121fe618 (rcu_node_0){-.-.}-{2:2}, at: rcu_dump_cpu_stacks+0x60/0xf8
> 	[ 1010.310183]
> 	[ 1010.310183] but task is already holding lock:
> 	[ 1010.316028] ffff8000121fe618 (rcu_node_0){-.-.}-{2:2}, at: rcu_sched_clock_irq+0x658/0xca0
> 	[ 1010.324341]
> 	[ 1010.324341] other info that might help us debug this:
> 	[ 1010.330882]  Possible unsafe locking scenario:
> 	[ 1010.330882]
> 	[ 1010.336813]        CPU0
> 	[ 1010.339263]        ----
> 	[ 1010.341713]   lock(rcu_node_0);
> 	[ 1010.344872]   lock(rcu_node_0);
> 	[ 1010.348029]
> 	[ 1010.348029]  *** DEADLOCK ***
> 	[ 1010.348029]
> 	[ 1010.353961]  May be due to missing lock nesting notation
> 	[ 1010.353961]
> 	[ 1010.360764] 1 lock held by swapper/4/0:
> 	[ 1010.364607]  #0: ffff8000121fe618 (rcu_node_0){-.-.}-{2:2}, at: rcu_sched_clock_irq+0x658/0xca0
> 	[ 1010.373359]
> 	[ 1010.373359] stack backtrace:
> 	[ 1010.377729] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 5.10.43-g92fdbb553f50-ab979 #1
> 	[ 1010.385489] Hardware name: ARM Juno development board (r0) (DT)
> 	[ 1010.391421] Call trace:
> 	[ 1010.393877]  dump_backtrace+0x0/0x1c0
> 	[ 1010.397551]  show_stack+0x24/0x30
> 	[ 1010.400877]  dump_stack_lvl+0xf4/0x130
> 	[ 1010.404636]  dump_stack+0x18/0x58
> 	[ 1010.407962]  __lock_acquire+0xa18/0x1ffc
> 	[ 1010.411895]  lock_acquire.part.0+0xc8/0x30c
> 	[ 1010.416089]  lock_acquire+0x68/0x84
> 	[ 1010.419589]  _raw_spin_lock_irqsave+0x84/0x158
> 	[ 1010.424045]  rcu_dump_cpu_stacks+0x60/0xf8
> 	[ 1010.428154]  rcu_sched_clock_irq+0x8d8/0xca0
> 	[ 1010.432435]  update_process_times+0x6c/0xb0
> 	[ 1010.436632]  tick_sched_handle+0x3c/0x60
> 	[ 1010.440566]  tick_sched_timer+0x58/0xb0
> 	[ 1010.444412]  __hrtimer_run_queues+0x1a4/0x5b0
> 	[ 1010.448781]  hrtimer_interrupt+0xf4/0x2cc
> 	[ 1010.452803]  arch_timer_handler_phys+0x40/0x50
> 	[ 1010.457260]  handle_percpu_devid_irq+0x98/0x180
> 	[ 1010.461803]  __handle_domain_irq+0x80/0xec
> 	[ 1010.465911]  gic_handle_irq+0x5c/0xf0
> 	[ 1010.469583]  el1_irq+0xcc/0x180
> 	[ 1010.472735]  cpuidle_enter_state+0xe8/0x350
> 	[ 1010.476929]  cpuidle_enter+0x44/0x60
> 	[ 1010.480515]  do_idle+0x25c/0x2f0
> 	[ 1010.483751]  cpu_startup_entry+0x34/0x8c
> 	[ 1010.487687]  secondary_start_kernel+0x154/0x190
> 
> 
> Thanks
> 
> --
> Qais Yousef
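
The failure mode described in the quoted commit message (a callee that is expected to drop a caller-acquired lock, but skips the unlock on its early-exit path) can be sketched as a userspace C analogue. This is a hedged illustration, not the kernel code: `struct node`, `print_blocked_tasks()`, `dump_stacks()` and `report_stall()` are hypothetical names standing in for the rcu_node structure, rcu_print_task_stall(), and the later re-acquisition of the same lock in rcu_dump_cpu_stacks() seen in the lockdep report, and a pthread mutex stands in for the rcu_node raw spinlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical stand-in for an rcu_node: a lock plus a count of
 * tasks blocking the current grace period. */
struct node {
	pthread_mutex_t lock;
	int nr_blocked;
};

/*
 * Hypothetical analogue of rcu_print_task_stall(): called with n->lock
 * held, and must release n->lock on every return path.
 * Returns the number of blocked tasks reported.
 */
static int print_blocked_tasks(struct node *n)
{
	int count;

	assert(n != NULL);
	if (n->nr_blocked == 0) {
		/* Early exit: this unlock is the kind of release the fix
		 * adds; omitting it leaves n->lock held. */
		pthread_mutex_unlock(&n->lock);
		return 0;
	}
	count = n->nr_blocked;
	printf("%d task(s) blocking the grace period\n", count);
	pthread_mutex_unlock(&n->lock);
	return count;
}

/*
 * Hypothetical analogue of a later step that takes the same lock again.
 * pthread_mutex_trylock() returns nonzero (EBUSY) if the lock is still
 * held, so a leaked lock is reported as -1 instead of hanging.
 */
static int dump_stacks(struct node *n)
{
	if (pthread_mutex_trylock(&n->lock) != 0)
		return -1;
	pthread_mutex_unlock(&n->lock);
	return 0;
}

/* Take the lock, hand it to print_blocked_tasks(), then re-acquire it. */
static int report_stall(struct node *n)
{
	pthread_mutex_lock(&n->lock);
	print_blocked_tasks(n); /* releases n->lock */
	return dump_stacks(n);
}
```

Using pthread_mutex_trylock() in the sketch turns the self-deadlock into an observable -1 return: deleting the unlock in the early-exit branch makes report_stall() return -1 whenever nr_blocked is 0, which loosely mirrors the recursive acquisition of rcu_node_0 that lockdep flags above.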


Thread overview: 66+ messages
2021-07-21 20:20 [PATCH rcu 0/18] Miscellaneous fixes for v5.15 Paul E. McKenney
2021-07-21 20:21 ` [PATCH rcu 01/18] rcu: Fix to include first blocked task in stall warning Paul E. McKenney
2021-07-21 20:21 ` [PATCH rcu 02/18] rcu: Fix stall-warning deadlock due to non-release of rcu_node ->lock Paul E. McKenney
2021-08-03 14:24   ` Qais Yousef
2021-08-03 15:52     ` Paul E. McKenney [this message]
2021-08-03 16:12       ` Qais Yousef
2021-08-03 16:28         ` Paul E. McKenney
2021-08-03 16:33           ` Qais Yousef
2021-08-04 13:50           ` Qais Yousef
2021-08-04 22:33             ` Paul E. McKenney
2021-08-06  9:56               ` Qais Yousef
2021-08-06  9:57   ` Qais Yousef
2021-08-06 11:43     ` Paul E. McKenney
2021-08-06 12:33       ` Qais Yousef
2021-07-21 20:21 ` [PATCH rcu 03/18] rcu: Remove special bit at the bottom of the ->dynticks counter Paul E. McKenney
2021-07-21 20:21 ` [PATCH rcu 04/18] rcu: Weaken ->dynticks accesses and updates Paul E. McKenney
2021-07-21 20:41   ` Linus Torvalds
2021-07-21 21:25     ` Paul E. McKenney
2021-07-28 17:37   ` [PATCH v2 " Paul E. McKenney
2021-07-28 17:58     ` Linus Torvalds
2021-07-28 18:12       ` Mathieu Desnoyers
2021-07-28 18:32         ` Linus Torvalds
2021-07-28 18:39           ` Mathieu Desnoyers
2021-07-28 18:46         ` Paul E. McKenney
2021-07-28 18:46       ` Paul E. McKenney
2021-07-28 18:57         ` Linus Torvalds
2021-07-28 18:23     ` Mathieu Desnoyers
2021-07-28 18:58       ` Paul E. McKenney
2021-07-28 19:45         ` Paul E. McKenney
2021-07-28 20:03           ` Mathieu Desnoyers
2021-07-28 20:28             ` Paul E. McKenney
2021-07-29 14:41               ` Mathieu Desnoyers
2021-07-29 15:57                 ` Paul E. McKenney
2021-07-29 17:41                   ` Mathieu Desnoyers
2021-07-29 18:05                     ` Paul E. McKenney
2021-07-29 18:42                       ` Mathieu Desnoyers
2021-07-28 20:37     ` Josh Triplett
2021-07-28 20:47       ` Paul E. McKenney
2021-07-28 22:23         ` Frederic Weisbecker
2021-07-29  1:07           ` Paul E. McKenney
2021-07-29  7:58   ` [PATCH " Boqun Feng
2021-07-29 10:53     ` Frederic Weisbecker
2021-07-30  5:56       ` Boqun Feng
2021-07-30 17:18         ` Paul E. McKenney
2021-07-21 20:21 ` [PATCH rcu 05/18] rcu: Mark accesses to ->rcu_read_lock_nesting Paul E. McKenney
2021-07-21 20:21 ` [PATCH rcu 06/18] rculist: Unify documentation about missing list_empty_rcu() Paul E. McKenney
2021-07-21 20:21 ` [PATCH rcu 07/18] rcu/tree: Handle VM stoppage in stall detection Paul E. McKenney
2021-07-21 20:21 ` [PATCH rcu 08/18] rcu: Do not disable GP stall detection in rcu_cpu_stall_reset() Paul E. McKenney
2021-07-21 20:21 ` [PATCH rcu 09/18] rcu: Start timing stall repetitions after warning complete Paul E. McKenney
2021-07-21 20:21 ` [PATCH rcu 10/18] srcutiny: Mark read-side data races Paul E. McKenney
2021-07-29  8:23   ` Boqun Feng
2021-07-29 13:36     ` Paul E. McKenney
2021-07-21 20:21 ` [PATCH rcu 11/18] rcu: Mark lockless ->qsmask read in rcu_check_boost_fail() Paul E. McKenney
2021-07-29  8:54   ` Boqun Feng
2021-07-29 14:03     ` Paul E. McKenney
2021-07-30  2:28       ` Boqun Feng
2021-07-30  3:26         ` Paul E. McKenney
2021-07-21 20:21 ` [PATCH rcu 12/18] rcu: Make rcu_gp_init() and rcu_gp_fqs_loop noinline to conserve stack Paul E. McKenney
2021-07-21 20:21 ` [PATCH rcu 13/18] rcu: Remove trailing spaces and tabs Paul E. McKenney
2021-07-21 20:21 ` [PATCH rcu 14/18] rcu: Mark accesses in tree_stall.h Paul E. McKenney
2021-07-21 20:21 ` [PATCH rcu 15/18] rcu: Remove useless "ret" update in rcu_gp_fqs_loop() Paul E. McKenney
2021-08-03 16:48   ` Joe Perches
2021-08-03 17:10     ` Paul E. McKenney
2021-07-21 20:21 ` [PATCH rcu 16/18] rcu: Use per_cpu_ptr to get the pointer of per_cpu variable Paul E. McKenney
2021-07-21 20:21 ` [PATCH rcu 17/18] rcu: Explain why rcu_all_qs() is a stub in preemptible TREE RCU Paul E. McKenney
2021-07-21 20:21 ` [PATCH rcu 18/18] rcu: Print human-readable message for schedule() in RCU reader Paul E. McKenney
