linux-kernel.vger.kernel.org archive mirror
* [PATCH v2] rcu: fix a deadlock caused by not release rcu_node->lock
@ 2021-05-16  9:50 yanfei.xu
  2021-05-16 12:33 ` Xu, Yanfei
  2021-05-16 22:58 ` Paul E. McKenney
  0 siblings, 2 replies; 4+ messages in thread
From: yanfei.xu @ 2021-05-16  9:50 UTC (permalink / raw)
  To: paulmck, josh, rostedt, mathieu.desnoyers, jiangshanlai, joel
  Cc: rcu, linux-kernel

From: Yanfei Xu <yanfei.xu@windriver.com>

rcu_node->lock isn't released in rcu_print_task_stall() if the rcu_node
doesn't contain any tasks blocking the GP. However, this rcu_node->lock
will be acquired again soon afterwards in rcu_dump_cpu_stacks() when
ndetected is non-zero. As a result, the CPU will hang on this deadlock.

Fixes: c583bcb8f5ed ("rcu: Don't invoke try_invoke_on_locked_down_task() with irqs disabled")
Signed-off-by: Yanfei Xu <yanfei.xu@windriver.com>
---
v1->v2:
    1. Change the lock function to the unlock function.
    2. Add the Fixes tag.

 kernel/rcu/tree_stall.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index b72311d24a9f..b09a7140ef77 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -267,8 +267,10 @@ static int rcu_print_task_stall(struct rcu_node *rnp, unsigned long flags)
 	struct task_struct *ts[8];
 
 	lockdep_assert_irqs_disabled();
-	if (!rcu_preempt_blocked_readers_cgp(rnp))
+	if (!rcu_preempt_blocked_readers_cgp(rnp)) {
+		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 		return 0;
+	}
 	pr_err("\tTasks blocked on level-%d rcu_node (CPUs %d-%d):",
 	       rnp->level, rnp->grplo, rnp->grphi);
 	t = list_entry(rnp->gp_tasks->prev,
-- 
2.27.0
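
The locking contract that the patch restores can be illustrated with a small
userspace model. This is only a sketch under stated assumptions, not kernel
code: struct node_model, model_trylock(), model_unlock(), and the two
print_task_stall_*() functions are hypothetical stand-ins invented for this
illustration. The model assumes, as the patch description says, that the
helper is entered with the lock held and must return with it released,
because a later pass acquires the same lock again.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct rcu_node; not the kernel structure. */
struct node_model {
	bool locked;       /* models rcu_node ->lock */
	int blocked_tasks; /* models tasks blocking the grace period */
};

/* Returns false where a real non-recursive spinlock would spin forever. */
static bool model_trylock(struct node_model *n)
{
	if (n->locked)
		return false;
	n->locked = true;
	return true;
}

static void model_unlock(struct node_model *n)
{
	assert(n->locked); /* unlocking an unheld lock is a bug in this model */
	n->locked = false;
}

/* Pre-patch shape: entered with the lock held, early exit leaks it. */
static int print_task_stall_buggy(struct node_model *n)
{
	if (n->blocked_tasks == 0)
		return 0; /* lock is still held here */
	model_unlock(n);
	return n->blocked_tasks;
}

/* Post-patch shape: every return path releases the lock. */
static int print_task_stall_fixed(struct node_model *n)
{
	if (n->blocked_tasks == 0) {
		model_unlock(n);
		return 0;
	}
	model_unlock(n);
	return n->blocked_tasks;
}

/*
 * Models the caller: take the lock, call the helper, then let a later
 * pass (rcu_dump_cpu_stacks() in the patch description) take the lock
 * again. Returns true if that second acquisition succeeds.
 */
static bool stall_report_completes(int (*print_fn)(struct node_model *),
				   int blocked_tasks)
{
	struct node_model n = { .locked = false, .blocked_tasks = blocked_tasks };

	if (!model_trylock(&n))
		return false;
	print_fn(&n); /* contract: returns with the lock released */
	if (!model_trylock(&n))
		return false; /* a real spinlock would self-deadlock here */
	model_unlock(&n);
	return true;
}
```

In this model, stall_report_completes(print_task_stall_buggy, 0) reports
failure because the early exit leaves the lock held, while the fixed variant
succeeds whether or not any tasks are blocked.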



* Re: [PATCH v2] rcu: fix a deadlock caused by not release rcu_node->lock
  2021-05-16  9:50 [PATCH v2] rcu: fix a deadlock caused by not release rcu_node->lock yanfei.xu
@ 2021-05-16 12:33 ` Xu, Yanfei
  2021-05-16 22:58 ` Paul E. McKenney
  1 sibling, 0 replies; 4+ messages in thread
From: Xu, Yanfei @ 2021-05-16 12:33 UTC (permalink / raw)
  To: paulmck, josh, rostedt, mathieu.desnoyers, jiangshanlai, joel
  Cc: rcu, linux-kernel

Hi Paul,

Should I merge this patch and the previous one into a single patch? If so,
please tell me and I will do it. :)
In addition, before these two patches, the bug leads to a
"BUG: scheduling while atomic:" splat. This is because preemption remains
disabled (preempt_count is nonzero) after the tick irq handler fails to
release the rcu_node->lock.

rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu:    Tasks blocked on level-1 rcu_node (CPUs 0-11):
         (detected by 3, t=6504 jiffies, g=34033, q=10745911)
rcu: All QSes seen, last rcu_preempt kthread activity 28 (4295088530-4295088502), jiffies_till_next_fqs=1, root ->qsmask 0x1
BUG: scheduling while atomic: msgstress04/90186/0x00000002
INFO: lockdep is turned off.
Modules linked in: sch_fq_codel
irq event stamp: 0
hardirqs last  enabled at (0): [<0000000000000000>] 0x0
hardirqs last disabled at (0): [<ffff80001004d57c>] copy_process+0x678/0x2790
softirqs last  enabled at (0): [<ffff80001004d57c>] copy_process+0x678/0x2790
softirqs last disabled at (0): [<0000000000000000>] 0x0
Preemption disabled at:
[<ffff800010402744>] find_and_remove_object+0x34/0xd0
CPU: 3 PID: 90186 Comm: msgstress04 Kdump: loaded Not tainted 5.12.2-yoctodev-standard #1
Hardware name: Marvell OcteonTX CN96XX board (DT)
Call trace:
  dump_backtrace+0x0/0x2cc
  show_stack+0x24/0x30
  dump_stack+0x110/0x188
  __schedule_bug+0x100/0x114
  __schedule+0xe5c/0xfd4
  schedule+0x70/0x16c
  do_notify_resume+0xe4/0x19d0
  work_pending+0xc/0x2a8
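
The link between the leaked lock and the "scheduling while atomic" report
can be sketched with a toy counter. This is an illustration under stated
assumptions, not kernel code: model_preempt_count, model_spin_lock(),
model_spin_unlock(), model_schedule_is_legal(), and model_tick_handler() are
hypothetical names, and the sketch assumes a kernel configuration in which
taking a spinlock raises the preemption count and releasing it lowers the
count again.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counter standing in for the kernel's preempt_count. */
static int model_preempt_count;

/* Taking the lock disables preemption in this model. */
static void model_spin_lock(void)
{
	model_preempt_count++;
}

/* Releasing the lock re-enables preemption in this model. */
static void model_spin_unlock(void)
{
	assert(model_preempt_count > 0);
	model_preempt_count--;
}

/* Models the check behind "BUG: scheduling while atomic". */
static bool model_schedule_is_legal(void)
{
	return model_preempt_count == 0;
}

/*
 * Models a handler that takes the lock and hits an early exit.
 * With release_on_early_exit == false it returns with the lock
 * (and therefore the raised count) still in place.
 */
static void model_tick_handler(bool release_on_early_exit)
{
	model_spin_lock();
	if (!release_on_early_exit)
		return; /* leaked lock: count stays raised */
	model_spin_unlock();
}
```

Calling model_tick_handler(false) leaves the counter raised, so a later
model_schedule_is_legal() check fails, which mirrors the report above.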


Regards,
Yanfei



* Re: [PATCH v2] rcu: fix a deadlock caused by not release rcu_node->lock
  2021-05-16  9:50 [PATCH v2] rcu: fix a deadlock caused by not release rcu_node->lock yanfei.xu
  2021-05-16 12:33 ` Xu, Yanfei
@ 2021-05-16 22:58 ` Paul E. McKenney
  2021-05-17  1:55   ` Xu, Yanfei
  1 sibling, 1 reply; 4+ messages in thread
From: Paul E. McKenney @ 2021-05-16 22:58 UTC (permalink / raw)
  To: yanfei.xu
  Cc: josh, rostedt, mathieu.desnoyers, jiangshanlai, joel, rcu, linux-kernel

On Sun, May 16, 2021 at 05:50:10PM +0800, yanfei.xu@windriver.com wrote:
> From: Yanfei Xu <yanfei.xu@windriver.com>
> 
> rcu_node->lock isn't released in rcu_print_task_stall() if the rcu_node
> doesn't contain any tasks blocking the GP. However, this rcu_node->lock
> will be acquired again soon afterwards in rcu_dump_cpu_stacks() when
> ndetected is non-zero. As a result, the CPU will hang on this deadlock.
> 
> Fixes: c583bcb8f5ed ("rcu: Don't invoke try_invoke_on_locked_down_task() with irqs disabled")
> Signed-off-by: Yanfei Xu <yanfei.xu@windriver.com>

Also a good catch, thank you!  Queued for further review and testing,
wordsmithed as shown below.  The rcutorture scripts have been known to
work on ARM in the past, and might still do so.  (I test on x86.)

As always, please check to make sure that I didn't mess something up.

							Thanx, Paul

------------------------------------------------------------------------

commit e0a9b77f245ae4fe1537120fd5319bf9e091618e
Author: Yanfei Xu <yanfei.xu@windriver.com>
Date:   Sun May 16 17:50:10 2021 +0800

    rcu: Fix stall-warning deadlock due to non-release of rcu_node ->lock
    
    If rcu_print_task_stall() is invoked on an rcu_node structure that does
    not contain any tasks blocking the current grace period, it takes an
    early exit that fails to release that rcu_node structure's lock.  This
    results in a self-deadlock, which is detected by lockdep.
    
    To reproduce this bug:
    
    tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 3 --trust-make --configs "TREE03" --kconfig "CONFIG_PROVE_LOCKING=y" --bootargs "rcutorture.stall_cpu=30 rcutorture.stall_cpu_block=1 rcutorture.fwd_progress=0 rcutorture.test_boost=0"
    
    This will also result in other complaints, including RCU's scheduler
    hook complaining about blocking rather than preemption and an rcutorture
    writer stall.
    
    Only a partial RCU CPU stall warning message will be printed because of
    the self-deadlock.
    
    This commit therefore releases the lock on the rcu_print_task_stall()
    function's early exit path.
    
    Fixes: c583bcb8f5ed ("rcu: Don't invoke try_invoke_on_locked_down_task() with irqs disabled")
    Signed-off-by: Yanfei Xu <yanfei.xu@windriver.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index a10ea1f1f81f..d574e3bbd929 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -267,8 +267,10 @@ static int rcu_print_task_stall(struct rcu_node *rnp, unsigned long flags)
 	struct task_struct *ts[8];
 
 	lockdep_assert_irqs_disabled();
-	if (!rcu_preempt_blocked_readers_cgp(rnp))
+	if (!rcu_preempt_blocked_readers_cgp(rnp)) {
+		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 		return 0;
+	}
 	pr_err("\tTasks blocked on level-%d rcu_node (CPUs %d-%d):",
 	       rnp->level, rnp->grplo, rnp->grphi);
 	t = list_entry(rnp->gp_tasks->prev,


* Re: [PATCH v2] rcu: fix a deadlock caused by not release rcu_node->lock
  2021-05-16 22:58 ` Paul E. McKenney
@ 2021-05-17  1:55   ` Xu, Yanfei
  0 siblings, 0 replies; 4+ messages in thread
From: Xu, Yanfei @ 2021-05-17  1:55 UTC (permalink / raw)
  To: paulmck
  Cc: josh, rostedt, mathieu.desnoyers, jiangshanlai, joel, rcu, linux-kernel



On 5/17/21 6:58 AM, Paul E. McKenney wrote:
> On Sun, May 16, 2021 at 05:50:10PM +0800, yanfei.xu@windriver.com wrote:
>> From: Yanfei Xu <yanfei.xu@windriver.com>
>>
>> rcu_node->lock isn't released in rcu_print_task_stall() if the rcu_node
>> doesn't contain any tasks blocking the GP. However, this rcu_node->lock
>> will be acquired again soon afterwards in rcu_dump_cpu_stacks() when
>> ndetected is non-zero. As a result, the CPU will hang on this deadlock.
>>
>> Fixes: c583bcb8f5ed ("rcu: Don't invoke try_invoke_on_locked_down_task() with irqs disabled")
>> Signed-off-by: Yanfei Xu <yanfei.xu@windriver.com>
> 
> Also a good catch, thank you!  Queued for further review and testing,
> wordsmithed as shown below.  The rcutorture scripts have been known to
> work on ARM in the past, and might still do so.  (I test on x86.)
> 
> As always, please check to make sure that I didn't mess something up.
> 

Looks good to me, Thanks!

Regards,
Yanfei

