* perf related lockdep bug
@ 2015-11-04  5:17 Dave Jones
  2015-11-04 10:21 ` Peter Zijlstra
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Jones @ 2015-11-04  5:17 UTC (permalink / raw)
  To: Linux Kernel, Peter Zijlstra, Ingo Molnar, Stephane Eranian

This is new to me since today's merge window.

	Dave

[  293.955261] ======================================================
[  293.955395] [ INFO: possible circular locking dependency detected ]
[  293.955536] 4.3.0-think+ #3 Not tainted
[  293.955618] -------------------------------------------------------
[  293.955756] trinity-c343/12586 is trying to acquire lock:
[  293.955875]  (rcu_node_0){-.-...}, at: [<ffffffffb00eb734>] rcu_report_exp_cpu_mult+0x24/0x70
[  293.956070] 
               but task is already holding lock:
[  293.956194]  (&ctx->lock){-.....}, at: [<ffffffffb0180453>] perf_lock_task_context+0x53/0x300
[  293.956386] 
               which lock already depends on the new lock.

[  293.956559] 
               the existing dependency chain (in reverse order) is:
[  293.956720] 
               -> #3 (&ctx->lock){-.....}:
[  293.956807]        [<ffffffffb00d53a3>] lock_acquire+0xc3/0x1d0
[  293.956940]        [<ffffffffb07efdd1>] _raw_spin_lock+0x41/0x80
[  293.957075]        [<ffffffffb0185b66>] __perf_event_task_sched_out+0x276/0x700
[  293.957239]        [<ffffffffb07e9f70>] __schedule+0x530/0xaf0
[  293.957370]        [<ffffffffb07ea6df>] schedule+0x3f/0x90
[  293.957493]        [<ffffffffb0001206>] exit_to_usermode_loop+0x76/0xa0
[  293.957642]        [<ffffffffb0001c13>] prepare_exit_to_usermode+0x53/0x60
[  293.957795]        [<ffffffffb07f1867>] retint_user+0x8/0x20
[  293.957923] 
               -> #2 (&rq->lock){-.-.-.}:
[  293.958008]        [<ffffffffb00d53a3>] lock_acquire+0xc3/0x1d0
[  293.958140]        [<ffffffffb07efdd1>] _raw_spin_lock+0x41/0x80
[  293.958275]        [<ffffffffb00b198d>] sched_move_task+0x4d/0x270
[  293.958417]        [<ffffffffb00b1bde>] cpu_cgroup_fork+0xe/0x10
[  293.958551]        [<ffffffffb012a057>] cgroup_post_fork+0x67/0x160
[  293.958692]        [<ffffffffb007ae0a>] copy_process+0x147a/0x1e10
[  293.958831]        [<ffffffffb007b927>] _do_fork+0x97/0x680
[  293.958955]        [<ffffffffb007bf39>] kernel_thread+0x29/0x30
[  293.959087]        [<ffffffffb07e8477>] rest_init+0x27/0x140
[  293.959211]        [<ffffffffb0ef1f72>] start_kernel+0x450/0x45d
[  293.959347]        [<ffffffffb0ef12fe>] x86_64_start_reservations+0x2a/0x2c
[  293.959503]        [<ffffffffb0ef1468>] x86_64_start_kernel+0x168/0x176
[  293.959652] 
               -> #1 (&p->pi_lock){-.-.-.}:
[  293.959742]        [<ffffffffb00d53a3>] lock_acquire+0xc3/0x1d0
[  293.959876]        [<ffffffffb07f00ec>] _raw_spin_lock_irqsave+0x4c/0x90
[  293.960026]        [<ffffffffb00ac070>] try_to_wake_up+0x30/0x5f0
[  293.960165]        [<ffffffffb00ac653>] wake_up_process+0x23/0x40
[  293.960300]        [<ffffffffb0f0b7a0>] rcu_spawn_gp_kthread+0xd5/0xfd
[  293.960449]        [<ffffffffb00002d3>] do_one_initcall+0xb3/0x1f0
[  293.960587]        [<ffffffffb0ef2080>] kernel_init_freeable+0x101/0x234
[  293.960736]        [<ffffffffb07e859e>] kernel_init+0xe/0xe0
[  293.960860]        [<ffffffffb07f10ff>] ret_from_fork+0x3f/0x70
[  293.960993] 
               -> #0 (rcu_node_0){-.-...}:
[  293.961081]        [<ffffffffb00d4722>] __lock_acquire+0x19d2/0x1cf0
[  293.961222]        [<ffffffffb00d53a3>] lock_acquire+0xc3/0x1d0
[  293.961351]        [<ffffffffb07f00ec>] _raw_spin_lock_irqsave+0x4c/0x90
[  293.961501]        [<ffffffffb00eb734>] rcu_report_exp_cpu_mult+0x24/0x70
[  293.961652]        [<ffffffffb00f48e9>] rcu_read_unlock_special+0x109/0x5e0
[  293.961808]        [<ffffffffb00e88f7>] __rcu_read_unlock+0x87/0x90
[  293.961947]        [<ffffffffb018059d>] perf_lock_task_context+0x19d/0x300
[  293.962101]        [<ffffffffb018b5bf>] perf_event_init_task+0x9f/0x370
[  293.962248]        [<ffffffffb007a18a>] copy_process+0x7fa/0x1e10
[  293.962385]        [<ffffffffb007b927>] _do_fork+0x97/0x680
[  293.962508]        [<ffffffffb007bfb9>] SyS_clone+0x19/0x20
[  293.962631]        [<ffffffffb07f0d97>] entry_SYSCALL_64_fastpath+0x12/0x6f
[  293.962787] 
               other info that might help us debug this:

[  293.962955] Chain exists of:
                 rcu_node_0 --> &rq->lock --> &ctx->lock

[  293.963106]  Possible unsafe locking scenario:

[  293.963233]        CPU0                    CPU1
[  293.963332]        ----                    ----
[  293.963430]   lock(&ctx->lock);
[  293.963500]                                lock(&rq->lock);
[  293.963627]                                lock(&ctx->lock);
[  293.963755]   lock(rcu_node_0);
[  293.963824] 
                *** DEADLOCK ***

[  293.963947] 2 locks held by trinity-c343/12586:
[  293.964047]  #0:  (rcu_read_lock){......}, at: [<ffffffffb01804d0>] perf_lock_task_context+0xd0/0x300
[  293.964258]  #1:  (&ctx->lock){-.....}, at: [<ffffffffb0180453>] perf_lock_task_context+0x53/0x300
[  293.972934] 
               stack backtrace:
[  293.990221] CPU: 3 PID: 12586 Comm: trinity-c343 Not tainted 4.3.0-think+ #3
[  293.999210]  ffff8804fb6824a0 ffff88044a0efa60 ffffffffb03607bc ffffffffb1779630
[  294.008115]  ffff88044a0efaa0 ffffffffb00d1453 ffff88044a0efb10 0000000000000002
[  294.017066]  ffff8804fb682468 ffff8804fb6824a0 0000000000000001 ffff8804fb681b80
[  294.026109] Call Trace:
[  294.035079]  [<ffffffffb03607bc>] dump_stack+0x4e/0x82
[  294.044014]  [<ffffffffb00d1453>] print_circular_bug+0x1e3/0x250
[  294.052982]  [<ffffffffb00d4722>] __lock_acquire+0x19d2/0x1cf0
[  294.061926]  [<ffffffffb00cedf9>] ? get_lock_stats+0x19/0x60
[  294.070878]  [<ffffffffb00d53a3>] lock_acquire+0xc3/0x1d0
[  294.079828]  [<ffffffffb00eb734>] ? rcu_report_exp_cpu_mult+0x24/0x70
[  294.088805]  [<ffffffffb07f00ec>] _raw_spin_lock_irqsave+0x4c/0x90
[  294.097771]  [<ffffffffb00eb734>] ? rcu_report_exp_cpu_mult+0x24/0x70
[  294.106807]  [<ffffffffb00eb734>] rcu_report_exp_cpu_mult+0x24/0x70
[  294.115823]  [<ffffffffb00f48e9>] rcu_read_unlock_special+0x109/0x5e0
[  294.124813]  [<ffffffffb00cedf9>] ? get_lock_stats+0x19/0x60
[  294.133801]  [<ffffffffb0180453>] ? perf_lock_task_context+0x53/0x300
[  294.142790]  [<ffffffffb00e88f7>] __rcu_read_unlock+0x87/0x90
[  294.151675]  [<ffffffffb018059d>] perf_lock_task_context+0x19d/0x300
[  294.160420]  [<ffffffffb01804d0>] ? perf_lock_task_context+0xd0/0x300
[  294.169166]  [<ffffffffb018b5bf>] perf_event_init_task+0x9f/0x370
[  294.177832]  [<ffffffffb00a64a5>] ? preempt_count_sub+0xa5/0xf0
[  294.186477]  [<ffffffffb007a18a>] copy_process+0x7fa/0x1e10
[  294.195036]  [<ffffffffb00cedf9>] ? get_lock_stats+0x19/0x60
[  294.203564]  [<ffffffffb00ced1d>] ? trace_hardirqs_off+0xd/0x10
[  294.212042]  [<ffffffffb0146101>] ? __acct_update_integrals+0x91/0xf0
[  294.220411]  [<ffffffffb00a64a5>] ? preempt_count_sub+0xa5/0xf0
[  294.228778]  [<ffffffffb007b927>] _do_fork+0x97/0x680
[  294.237038]  [<ffffffffb018fd73>] ? context_tracking_user_exit+0x13/0x20
[  294.245322]  [<ffffffffb00017ca>] ? enter_from_user_mode+0x1a/0x50
[  294.253586]  [<ffffffffb00018bc>] ? syscall_trace_enter_phase1+0xbc/0x110
[  294.261919]  [<ffffffffb0001017>] ? trace_hardirqs_on_thunk+0x17/0x19
[  294.270306]  [<ffffffffb007bfb9>] SyS_clone+0x19/0x20
[  294.278672]  [<ffffffffb07f0d97>] entry_SYSCALL_64_fastpath+0x12/0x6f


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: perf related lockdep bug
  2015-11-04  5:17 perf related lockdep bug Dave Jones
@ 2015-11-04 10:21 ` Peter Zijlstra
  2015-11-04 10:28   ` Peter Zijlstra
  0 siblings, 1 reply; 19+ messages in thread
From: Peter Zijlstra @ 2015-11-04 10:21 UTC (permalink / raw)
  To: Dave Jones, Linux Kernel, Ingo Molnar, Stephane Eranian
  Cc: Paul McKenney, Andi Kleen

On Wed, Nov 04, 2015 at 12:17:17AM -0500, Dave Jones wrote:
> This is new to me since today's merge window.
> 

Thanks, 'interesting' indeed. I've one complaint from Andi about his
machine going belly up which might be explained by this.

The problem appears to be due to the new RCU expedited grace period
stuff, with rcu_read_unlock() now randomly trying to acquire locks it
previously didn't.
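
For reference, the path involved is roughly the following (a greatly
simplified sketch of the 4.3-era slow path, not the actual
implementation; the quoted backtrace below shows the real call chain):

/* Greatly simplified sketch, for illustration only. */
void __rcu_read_unlock(void)
{
	struct task_struct *t = current;

	if (--t->rcu_read_lock_nesting == 0) {
		barrier();	/* critical section before ->special load */
		if (unlikely(READ_ONCE(t->rcu_read_unlock_special.s)))
			rcu_read_unlock_special(t); /* may take rnp->lock */
	}
}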

Lemme go look at those rcu bits again..

---

> [  293.955261] ======================================================
> [  293.955395] [ INFO: possible circular locking dependency detected ]
> [  293.955536] 4.3.0-think+ #3 Not tainted
> [  293.955618] -------------------------------------------------------
> [  293.955756] trinity-c343/12586 is trying to acquire lock:
> [  293.955875]  (rcu_node_0){-.-...}, at: [<ffffffffb00eb734>] rcu_report_exp_cpu_mult+0x24/0x70
> [  293.956070] 
>                but task is already holding lock:
> [  293.956194]  (&ctx->lock){-.....}, at: [<ffffffffb0180453>] perf_lock_task_context+0x53/0x300
> [  293.956386] 
>                which lock already depends on the new lock.
> 
> [  293.956559] 
>                the existing dependency chain (in reverse order) is:
> [  293.956720] 
>                -> #3 (&ctx->lock){-.....}:
> [  293.956807]        [<ffffffffb00d53a3>] lock_acquire+0xc3/0x1d0
> [  293.956940]        [<ffffffffb07efdd1>] _raw_spin_lock+0x41/0x80
> [  293.957075]        [<ffffffffb0185b66>] __perf_event_task_sched_out+0x276/0x700
> [  293.957239]        [<ffffffffb07e9f70>] __schedule+0x530/0xaf0
> [  293.957370]        [<ffffffffb07ea6df>] schedule+0x3f/0x90
> [  293.957493]        [<ffffffffb0001206>] exit_to_usermode_loop+0x76/0xa0
> [  293.957642]        [<ffffffffb0001c13>] prepare_exit_to_usermode+0x53/0x60
> [  293.957795]        [<ffffffffb07f1867>] retint_user+0x8/0x20
> [  293.957923] 
>                -> #2 (&rq->lock){-.-.-.}:
> [  293.958008]        [<ffffffffb00d53a3>] lock_acquire+0xc3/0x1d0
> [  293.958140]        [<ffffffffb07efdd1>] _raw_spin_lock+0x41/0x80
> [  293.958275]        [<ffffffffb00b198d>] sched_move_task+0x4d/0x270
> [  293.958417]        [<ffffffffb00b1bde>] cpu_cgroup_fork+0xe/0x10
> [  293.958551]        [<ffffffffb012a057>] cgroup_post_fork+0x67/0x160
> [  293.958692]        [<ffffffffb007ae0a>] copy_process+0x147a/0x1e10
> [  293.958831]        [<ffffffffb007b927>] _do_fork+0x97/0x680
> [  293.958955]        [<ffffffffb007bf39>] kernel_thread+0x29/0x30
> [  293.959087]        [<ffffffffb07e8477>] rest_init+0x27/0x140
> [  293.959211]        [<ffffffffb0ef1f72>] start_kernel+0x450/0x45d
> [  293.959347]        [<ffffffffb0ef12fe>] x86_64_start_reservations+0x2a/0x2c
> [  293.959503]        [<ffffffffb0ef1468>] x86_64_start_kernel+0x168/0x176
> [  293.959652] 
>                -> #1 (&p->pi_lock){-.-.-.}:
> [  293.959742]        [<ffffffffb00d53a3>] lock_acquire+0xc3/0x1d0
> [  293.959876]        [<ffffffffb07f00ec>] _raw_spin_lock_irqsave+0x4c/0x90
> [  293.960026]        [<ffffffffb00ac070>] try_to_wake_up+0x30/0x5f0
> [  293.960165]        [<ffffffffb00ac653>] wake_up_process+0x23/0x40
> [  293.960300]        [<ffffffffb0f0b7a0>] rcu_spawn_gp_kthread+0xd5/0xfd
> [  293.960449]        [<ffffffffb00002d3>] do_one_initcall+0xb3/0x1f0
> [  293.960587]        [<ffffffffb0ef2080>] kernel_init_freeable+0x101/0x234
> [  293.960736]        [<ffffffffb07e859e>] kernel_init+0xe/0xe0
> [  293.960860]        [<ffffffffb07f10ff>] ret_from_fork+0x3f/0x70
> [  293.960993] 
>                -> #0 (rcu_node_0){-.-...}:
> [  293.961081]        [<ffffffffb00d4722>] __lock_acquire+0x19d2/0x1cf0
> [  293.961222]        [<ffffffffb00d53a3>] lock_acquire+0xc3/0x1d0
> [  293.961351]        [<ffffffffb07f00ec>] _raw_spin_lock_irqsave+0x4c/0x90
> [  293.961501]        [<ffffffffb00eb734>] rcu_report_exp_cpu_mult+0x24/0x70
> [  293.961652]        [<ffffffffb00f48e9>] rcu_read_unlock_special+0x109/0x5e0
> [  293.961808]        [<ffffffffb00e88f7>] __rcu_read_unlock+0x87/0x90
> [  293.961947]        [<ffffffffb018059d>] perf_lock_task_context+0x19d/0x300
> [  293.962101]        [<ffffffffb018b5bf>] perf_event_init_task+0x9f/0x370
> [  293.962248]        [<ffffffffb007a18a>] copy_process+0x7fa/0x1e10
> [  293.962385]        [<ffffffffb007b927>] _do_fork+0x97/0x680
> [  293.962508]        [<ffffffffb007bfb9>] SyS_clone+0x19/0x20
> [  293.962631]        [<ffffffffb07f0d97>] entry_SYSCALL_64_fastpath+0x12/0x6f
> [  293.962787] 
>                other info that might help us debug this:
> 
> [  293.962955] Chain exists of:
>                  rcu_node_0 --> &rq->lock --> &ctx->lock
> 
> [  293.963106]  Possible unsafe locking scenario:
> 
> [  293.963233]        CPU0                    CPU1
> [  293.963332]        ----                    ----
> [  293.963430]   lock(&ctx->lock);
> [  293.963500]                                lock(&rq->lock);
> [  293.963627]                                lock(&ctx->lock);
> [  293.963755]   lock(rcu_node_0);
> [  293.963824] 
>                 *** DEADLOCK ***
> 
> [  293.963947] 2 locks held by trinity-c343/12586:
> [  293.964047]  #0:  (rcu_read_lock){......}, at: [<ffffffffb01804d0>] perf_lock_task_context+0xd0/0x300
> [  293.964258]  #1:  (&ctx->lock){-.....}, at: [<ffffffffb0180453>] perf_lock_task_context+0x53/0x300
> [  293.972934] 
>                stack backtrace:
> [  293.990221] CPU: 3 PID: 12586 Comm: trinity-c343 Not tainted 4.3.0-think+ #3
> [  293.999210]  ffff8804fb6824a0 ffff88044a0efa60 ffffffffb03607bc ffffffffb1779630
> [  294.008115]  ffff88044a0efaa0 ffffffffb00d1453 ffff88044a0efb10 0000000000000002
> [  294.017066]  ffff8804fb682468 ffff8804fb6824a0 0000000000000001 ffff8804fb681b80
> [  294.026109] Call Trace:
> [  294.035079]  [<ffffffffb03607bc>] dump_stack+0x4e/0x82
> [  294.044014]  [<ffffffffb00d1453>] print_circular_bug+0x1e3/0x250
> [  294.052982]  [<ffffffffb00d4722>] __lock_acquire+0x19d2/0x1cf0
> [  294.061926]  [<ffffffffb00cedf9>] ? get_lock_stats+0x19/0x60
> [  294.070878]  [<ffffffffb00d53a3>] lock_acquire+0xc3/0x1d0
> [  294.079828]  [<ffffffffb00eb734>] ? rcu_report_exp_cpu_mult+0x24/0x70
> [  294.088805]  [<ffffffffb07f00ec>] _raw_spin_lock_irqsave+0x4c/0x90
> [  294.097771]  [<ffffffffb00eb734>] ? rcu_report_exp_cpu_mult+0x24/0x70
> [  294.106807]  [<ffffffffb00eb734>] rcu_report_exp_cpu_mult+0x24/0x70
> [  294.115823]  [<ffffffffb00f48e9>] rcu_read_unlock_special+0x109/0x5e0
> [  294.124813]  [<ffffffffb00cedf9>] ? get_lock_stats+0x19/0x60
> [  294.133801]  [<ffffffffb0180453>] ? perf_lock_task_context+0x53/0x300
> [  294.142790]  [<ffffffffb00e88f7>] __rcu_read_unlock+0x87/0x90
> [  294.151675]  [<ffffffffb018059d>] perf_lock_task_context+0x19d/0x300
> [  294.160420]  [<ffffffffb01804d0>] ? perf_lock_task_context+0xd0/0x300
> [  294.169166]  [<ffffffffb018b5bf>] perf_event_init_task+0x9f/0x370
> [  294.177832]  [<ffffffffb00a64a5>] ? preempt_count_sub+0xa5/0xf0
> [  294.186477]  [<ffffffffb007a18a>] copy_process+0x7fa/0x1e10
> [  294.195036]  [<ffffffffb00cedf9>] ? get_lock_stats+0x19/0x60
> [  294.203564]  [<ffffffffb00ced1d>] ? trace_hardirqs_off+0xd/0x10
> [  294.212042]  [<ffffffffb0146101>] ? __acct_update_integrals+0x91/0xf0
> [  294.220411]  [<ffffffffb00a64a5>] ? preempt_count_sub+0xa5/0xf0
> [  294.228778]  [<ffffffffb007b927>] _do_fork+0x97/0x680
> [  294.237038]  [<ffffffffb018fd73>] ? context_tracking_user_exit+0x13/0x20
> [  294.245322]  [<ffffffffb00017ca>] ? enter_from_user_mode+0x1a/0x50
> [  294.253586]  [<ffffffffb00018bc>] ? syscall_trace_enter_phase1+0xbc/0x110
> [  294.261919]  [<ffffffffb0001017>] ? trace_hardirqs_on_thunk+0x17/0x19
> [  294.270306]  [<ffffffffb007bfb9>] SyS_clone+0x19/0x20
> [  294.278672]  [<ffffffffb07f0d97>] entry_SYSCALL_64_fastpath+0x12/0x6f
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: perf related lockdep bug
  2015-11-04 10:21 ` Peter Zijlstra
@ 2015-11-04 10:28   ` Peter Zijlstra
  2015-11-04 10:50     ` Peter Zijlstra
  2015-11-04 14:01     ` perf related lockdep bug Paul E. McKenney
  0 siblings, 2 replies; 19+ messages in thread
From: Peter Zijlstra @ 2015-11-04 10:28 UTC (permalink / raw)
  To: Dave Jones, Linux Kernel, Ingo Molnar, Stephane Eranian
  Cc: Paul McKenney, Andi Kleen

On Wed, Nov 04, 2015 at 11:21:51AM +0100, Peter Zijlstra wrote:

> The problem appears to be due to the new RCU expedited grace period
> stuff, with rcu_read_unlock() now randomly trying to acquire locks it
> previously didn't.
> 
> Lemme go look at those rcu bits again..

Paul, I think this is because of:

  8203d6d0ee78 ("rcu: Use single-stage IPI algorithm for RCU expedited grace period")

What happens is that the IPI comes in and tags any random
rcu_read_unlock() with the special bit, which then goes on and takes
locks.
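
Roughly, the handler added by that commit does the following (a
simplified sketch, not the exact code): if the interrupted task is
inside an RCU read-side critical section, it is flagged so that the
matching rcu_read_unlock() reports the quiescent state itself;
otherwise the quiescent state is reported right away.

/* Simplified sketch of the expedited-GP IPI handler. */
static void sync_rcu_exp_handler(void *info)
{
	struct rcu_state *rsp = info;
	struct task_struct *t = current;

	if (t->rcu_read_lock_nesting > 0 &&
	    !t->rcu_read_unlock_special.b.blocked) {
		/* Make the matching rcu_read_unlock() report the QS. */
		t->rcu_read_unlock_special.b.exp_need_qs = true;
		return;
	}
	/* Not in a read-side critical section: report immediately. */
	rcu_report_exp_rdp(rsp, this_cpu_ptr(rsp->rda), true);
}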

Now the problem is that we have scheduler activity inside this lock;
the one lockdep reported seems easy enough to fix, see below.

I'll go and see if there's more sites that can cause this.

---
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index f07343b54fe5..a9c57b386258 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4333,8 +4333,8 @@ static int __init rcu_spawn_gp_kthread(void)
 			sp.sched_priority = kthread_prio;
 			sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
 		}
-		wake_up_process(t);
 		raw_spin_unlock_irqrestore(&rnp->lock, flags);
+		wake_up_process(t);
 	}
 	rcu_spawn_nocb_kthreads();
 	rcu_spawn_boost_kthreads();

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: perf related lockdep bug
  2015-11-04 10:28   ` Peter Zijlstra
@ 2015-11-04 10:50     ` Peter Zijlstra
  2015-11-04 13:48       ` Paul E. McKenney
  2015-11-04 14:01     ` perf related lockdep bug Paul E. McKenney
  1 sibling, 1 reply; 19+ messages in thread
From: Peter Zijlstra @ 2015-11-04 10:50 UTC (permalink / raw)
  To: Dave Jones, Linux Kernel, Ingo Molnar, Stephane Eranian
  Cc: Paul McKenney, Andi Kleen

On Wed, Nov 04, 2015 at 11:28:00AM +0100, Peter Zijlstra wrote:
> On Wed, Nov 04, 2015 at 11:21:51AM +0100, Peter Zijlstra wrote:
> 
> > The problem appears to be due to the new RCU expedited grace period
> > stuff, with rcu_read_unlock() now randomly trying to acquire locks it
> > previously didn't.
> > 
> > Lemme go look at those rcu bits again..
> 
> Paul, I think this is because of:
> 
>   8203d6d0ee78 ("rcu: Use single-stage IPI algorithm for RCU expedited grace period")
> 
> What happens is that the IPI comes in and tags any random
> rcu_read_unlock() with the special bit, which then goes on and takes
> locks.
> 
> Now the problem is that we have scheduler activity inside this lock;
> the one lockdep reported seems easy enough to fix, see below.
> 
> I'll go and see if there's more sites that can cause this.

*sigh* yes, there's gobs more. In fact the very first one I looked at:

rcu_dump_cpu_stacks()
  raw_spin_lock_irqsave(&rnp->lock, flags);
  dump_cpu_task()
    pr_info()

That too will end up doing wakeups..

idem:
 - print_other_cpu_stall()
 - rcu_print_detail_task_stall_rnp()


So just like you had to pull out all the rcu_gp_kthread_wake() calls from
under rnp->lock, so too must we pull out all implied wakeups, which very
much include printk().
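
The shape of that refactoring is roughly as follows; report_stall(),
the snapshotted fields, and the message are made up here purely for
illustration:

/* Illustrative sketch only, not an actual kernel function. */
static void report_stall(struct rcu_node *rnp)
{
	unsigned long flags;
	unsigned long gpnum, qsmask;

	raw_spin_lock_irqsave(&rnp->lock, flags);
	/* Snapshot whatever actually needs the lock... */
	gpnum = rnp->gpnum;
	qsmask = rnp->qsmask;
	raw_spin_unlock_irqrestore(&rnp->lock, flags);

	/* ...and only then do anything that can end up waking a task. */
	pr_info("rcu: stall, gpnum %lu, qsmask %#lx\n", gpnum, qsmask);
}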


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: perf related lockdep bug
  2015-11-04 10:50     ` Peter Zijlstra
@ 2015-11-04 13:48       ` Paul E. McKenney
  2015-11-04 14:20         ` Peter Zijlstra
                           ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Paul E. McKenney @ 2015-11-04 13:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dave Jones, Linux Kernel, Ingo Molnar, Stephane Eranian, Andi Kleen

On Wed, Nov 04, 2015 at 11:50:10AM +0100, Peter Zijlstra wrote:
> On Wed, Nov 04, 2015 at 11:28:00AM +0100, Peter Zijlstra wrote:
> > On Wed, Nov 04, 2015 at 11:21:51AM +0100, Peter Zijlstra wrote:
> > 
> > > The problem appears to be due to the new RCU expedited grace period
> > > stuff, with rcu_read_unlock() now randomly trying to acquire locks it
> > > previously didn't.
> > > 
> > > Lemme go look at those rcu bits again..
> > 
> > Paul, I think this is because of:
> > 
> >   8203d6d0ee78 ("rcu: Use single-stage IPI algorithm for RCU expedited grace period")
> > 
> > What happens is that the IPI comes in and tags any random
> > rcu_read_unlock() with the special bit, which then goes on and takes
> > locks.
> > 
> > Now the problem is that we have scheduler activity inside this lock;
> > the one lockdep reported seems easy enough to fix, see below.
> > 
> > I'll go and see if there's more sites that can cause this.
> 
> *sigh* yes, there's gobs more. In fact the very first one I looked at:
> 
> rcu_dump_cpu_stacks()
>   raw_spin_lock_irqsave(&rnp->lock, flags);
>   dump_cpu_task()
>     pr_info()
> 
> That too will end up doing wakeups..
> 
> idem:
>  - print_other_cpu_stall()
>  - rcu_print_detail_task_stall_rnp()
> 
> 
> So just like you had to pull out all the rcu_gp_kthread_wake() calls from
> under rnp->lock, so too must we pull out all implied wakeups, which very
> much include printk().

Ouch!!!  Thank you for the analysis, though I am very surprised that
my testing did not find this.  But pulling all printk()s out from under
rnp->lock is going to re-introduce some stall-warning bugs.

So what other options do I have?

o	I could do raise_softirq(), then report the quiescent state in
	the core RCU code, but I bet that raise_softirq()'s  wakeup gets
	me into just as much trouble.

o	Ditto for workqueues, I suspect.

o	I cannot send an IPI because interrupts are disabled, and that
	would be rather annoying from a real-time perspective in any
	case.

Any others?

Ah, and I see why my testing missed this -- I don't normally exercise
perf while running rcutorture.

So this hit the code in perf_lock_task_context() that disables preemption
across an RCU read-side critical section, which previously sufficed to
prevent this scenario.  What happened this time is as follows:

o	CPU 0 entered perf_lock_task_context(), disabled preemption,
	and entered its RCU read-side critical section.  Of course,
	the whole point of disabling preemption is to prevent the
	matching rcu_read_unlock() from grabbing locks.

o	CPU 1 started an expedited grace period.  It checked CPU
	state, saw that CPU 0 was running in the kernel, and therefore
	IPIed it.

o	The IPI handler running on CPU 0 saw that there was an
	RCU read-side critical section in effect, so it set the
	->exp_need_qs flag.

o	When the matching rcu_read_unlock() executes, it notes that
	->exp_need_qs is set, and therefore grabs the locks that it
	shouldn't, hence lockdep's complaints about deadlock.

This problem is caused by the IPI handler interrupting the RCU read-side
critical section.  One way to prevent the IPI from doing this is to
disable interrupts across the RCU read-side critical section instead
of merely disabling preemption.  This is a reasonable approach given
that acquiring the scheduler locks is going to disable interrupts
in any case.

The (untested) patch below takes this approach.

Thoughts?

							Thanx, Paul

------------------------------------------------------------------------

commit 90ab56c9f90865991155d8f23441a5e135aa33cc
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Wed Nov 4 05:37:08 2015 -0800

    perf: Disable irqs across RCU RS CS that acquires scheduler lock
    
    The perf_lock_task_context() function disables preemption across its
    RCU read-side critical section because that critical section acquires
    a scheduler lock.  If there was a preemption during that RCU read-side
    critical section, the rcu_read_unlock() could attempt to acquire scheduler
    locks, resulting in deadlock.
    
    However, recent optimizations to expedited grace periods mean that IPI
    handlers that execute during preemptible RCU read-side critical sections
    can now cause the subsequent rcu_read_unlock() to acquire scheduler locks.
    Disabling preemption does nothing to prevent these IPI handlers from
    executing, so these optimizations introduced a deadlock.  In theory,
    this deadlock could be avoided by pulling all wakeups and printk()s out
    from rnp->lock critical sections, but in practice this would re-introduce
    some RCU CPU stall warning bugs.
    
    Given that acquiring scheduler locks entails disabling interrupts, these
    deadlocks can be avoided by disabling interrupts (instead of disabling
    preemption) across any RCU read-side critical section that acquires scheduler
    locks and holds them across the rcu_read_unlock().  This commit therefore
    makes this change for perf_lock_task_context().
    
    Reported-by: Dave Jones <davej@codemonkey.org.uk>
    Reported-by: Peter Zijlstra <peterz@infradead.org>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/events/core.c b/kernel/events/core.c
index f548f69c4299..2ff0605a5384 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1050,13 +1050,13 @@ retry:
 	/*
 	 * One of the few rules of preemptible RCU is that one cannot do
 	 * rcu_read_unlock() while holding a scheduler (or nested) lock when
-	 * part of the read side critical section was preemptible -- see
+	 * part of the read side critical section was irqs-enabled -- see
 	 * rcu_read_unlock_special().
 	 *
 	 * Since ctx->lock nests under rq->lock we must ensure the entire read
-	 * side critical section is non-preemptible.
+	 * side critical section has interrupts disabled.
 	 */
-	preempt_disable();
+	local_irq_save(*flags);
 	rcu_read_lock();
 	ctx = rcu_dereference(task->perf_event_ctxp[ctxn]);
 	if (ctx) {
@@ -1070,21 +1070,22 @@ retry:
 		 * if so.  If we locked the right context, then it
 		 * can't get swapped on us any more.
 		 */
-		raw_spin_lock_irqsave(&ctx->lock, *flags);
+		raw_spin_lock(&ctx->lock);
 		if (ctx != rcu_dereference(task->perf_event_ctxp[ctxn])) {
-			raw_spin_unlock_irqrestore(&ctx->lock, *flags);
+			raw_spin_unlock(&ctx->lock);
 			rcu_read_unlock();
-			preempt_enable();
+			local_irq_restore(*flags);
 			goto retry;
 		}
 
 		if (!atomic_inc_not_zero(&ctx->refcount)) {
-			raw_spin_unlock_irqrestore(&ctx->lock, *flags);
+			raw_spin_unlock(&ctx->lock);
 			ctx = NULL;
 		}
 	}
 	rcu_read_unlock();
-	preempt_enable();
+	if (!ctx)
+		local_irq_restore(*flags);
 	return ctx;
 }
 


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: perf related lockdep bug
  2015-11-04 10:28   ` Peter Zijlstra
  2015-11-04 10:50     ` Peter Zijlstra
@ 2015-11-04 14:01     ` Paul E. McKenney
  2015-11-04 14:34       ` Peter Zijlstra
  1 sibling, 1 reply; 19+ messages in thread
From: Paul E. McKenney @ 2015-11-04 14:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dave Jones, Linux Kernel, Ingo Molnar, Stephane Eranian, Andi Kleen

On Wed, Nov 04, 2015 at 11:28:00AM +0100, Peter Zijlstra wrote:
> On Wed, Nov 04, 2015 at 11:21:51AM +0100, Peter Zijlstra wrote:
> 
> > The problem appears to be due to the new RCU expedited grace period
> > stuff, with rcu_read_unlock() now randomly trying to acquire locks it
> > previously didn't.
> > 
> > Lemme go look at those rcu bits again..
> 
> Paul, I think this is because of:
> 
>   8203d6d0ee78 ("rcu: Use single-stage IPI algorithm for RCU expedited grace period")
> 
> What happens is that the IPI comes in and tags any random
> rcu_read_unlock() with the special bit, which then goes on and takes
> locks.
> 
> Now the problem is that we have scheduler activity inside this lock;
> the one lockdep reported seems easy enough to fix, see below.
> 
> I'll go and see if there's more sites that can cause this.

This one only happens during boot time, but it would be good hygiene
in any case.  May I have your SOB on this?

								Thanx, Paul

> ---
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index f07343b54fe5..a9c57b386258 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -4333,8 +4333,8 @@ static int __init rcu_spawn_gp_kthread(void)
>  			sp.sched_priority = kthread_prio;
>  			sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
>  		}
> -		wake_up_process(t);
>  		raw_spin_unlock_irqrestore(&rnp->lock, flags);
> +		wake_up_process(t);
>  	}
>  	rcu_spawn_nocb_kthreads();
>  	rcu_spawn_boost_kthreads();
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: perf related lockdep bug
  2015-11-04 13:48       ` Paul E. McKenney
@ 2015-11-04 14:20         ` Peter Zijlstra
  2015-11-04 15:34           ` Paul E. McKenney
  2015-11-04 15:36           ` Peter Zijlstra
  2015-11-04 20:58         ` Andi Kleen
  2015-11-10  6:39         ` [tip:perf/urgent] perf: Disable IRQs across RCU RS CS that acquires scheduler lock tip-bot for Paul E. McKenney
  2 siblings, 2 replies; 19+ messages in thread
From: Peter Zijlstra @ 2015-11-04 14:20 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Dave Jones, Linux Kernel, Ingo Molnar, Stephane Eranian, Andi Kleen

On Wed, Nov 04, 2015 at 05:48:38AM -0800, Paul E. McKenney wrote:
> Ouch!!!  Thank you for the analysis, though I am very surprised that
> my testing did not find this. 

Yeah, not sure how that ended up not triggering earlier.

I'm thinking of adding a might_wake(), much like we have might_fault(),
and adding that to printk().

> But pulling all printk()s out from under
> rnp->lock is going to re-introduce some stall-warning bugs.

figures :/

> So what other options do I have?

Kill printk() :-) It's unreliable garbage anyway ;-)

> o	I could do raise_softirq(), then report the quiescent state in
> 	the core RCU code, but I bet that raise_softirq()'s  wakeup gets
> 	me into just as much trouble.

Yep..

> o	Ditto for workqueues, I suspect.

Yep..

> o	I cannot send an IPI because interrupts are disabled, and that
> 	would be rather annoying from a real-time perspective in any
> 	case.

Indeed.

> So this hit the code in perf_lock_task_context() that disables preemption
> across an RCU read-side critical section, which previously sufficed to
> prevent this scenario.  What happened this time is as follows:
> 
> o	CPU 0 entered perf_lock_task_context(), disabled preemption,
> 	and entered its RCU read-side critical section.  Of course,
> 	the whole point of disabling preemption is to prevent the
> 	matching rcu_read_unlock() from grabbing locks.
> 
> o	CPU 1 started an expedited grace period.  It checked CPU
> 	state, saw that CPU 0 was running in the kernel, and therefore
> 	IPIed it.
> 
> o	The IPI handler running on CPU 0 saw that there was an
> 	RCU read-side critical section in effect, so it set the
> 	->exp_need_qs flag.
> 
> o	When the matching rcu_read_unlock() executes, it notes that
> 	->exp_need_qs is set, and therefore grabs the locks that it
> 	shouldn't, hence lockdep's complaints about deadlock.
> 
> This problem is caused by the IPI handler interrupting the RCU read-side
> critical section.  One way to prevent the IPI from doing this is to
> disable interrupts across the RCU read-side critical section instead
> of merely disabling preemption.  This is a reasonable approach given
> that acquiring the scheduler locks is going to disable interrupts
> in any case.
> 
> The (untested) patch below takes this approach.
> 
> Thoughts?

Yes, this should work, but now I worry I need to go audit all of perf
and sched for this :/



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: perf related lockdep bug
  2015-11-04 14:01     ` perf related lockdep bug Paul E. McKenney
@ 2015-11-04 14:34       ` Peter Zijlstra
  2015-11-05  1:57         ` Paul E. McKenney
  0 siblings, 1 reply; 19+ messages in thread
From: Peter Zijlstra @ 2015-11-04 14:34 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Dave Jones, Linux Kernel, Ingo Molnar, Stephane Eranian, Andi Kleen

On Wed, Nov 04, 2015 at 06:01:33AM -0800, Paul E. McKenney wrote:
> On Wed, Nov 04, 2015 at 11:28:00AM +0100, Peter Zijlstra wrote:
> > On Wed, Nov 04, 2015 at 11:21:51AM +0100, Peter Zijlstra wrote:
> > 
> > > The problem appears to be due to the new RCU expedited grace period
> > > stuff, with rcu_read_unlock() now randomly trying to acquire locks it
> > > previously didn't.
> > > 
> > > Lemme go look at those rcu bits again..
> > 
> > Paul, I think this is because of:
> > 
> >   8203d6d0ee78 ("rcu: Use single-stage IPI algorithm for RCU expedited grace period")
> > 
> > What happens is that the IPI comes in and tags any random
> > rcu_read_unlock() with the special bit, which then goes on and takes
> > locks.
> > 
> > Now the problem is that we have scheduler activity inside this lock;
> > the one lockdep reported seems easy enough to fix, see below.
> > 
> > I'll go and see if there's more sites that can cause this.
> 
> This one only happens during boot time, but it would be good hygiene
> in any case.  May I have your SOB on this?

Of course,

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: perf related lockdep bug
  2015-11-04 14:20         ` Peter Zijlstra
@ 2015-11-04 15:34           ` Paul E. McKenney
  2015-11-04 15:36           ` Peter Zijlstra
  1 sibling, 0 replies; 19+ messages in thread
From: Paul E. McKenney @ 2015-11-04 15:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dave Jones, Linux Kernel, Ingo Molnar, Stephane Eranian, Andi Kleen

On Wed, Nov 04, 2015 at 03:20:58PM +0100, Peter Zijlstra wrote:
> On Wed, Nov 04, 2015 at 05:48:38AM -0800, Paul E. McKenney wrote:
> > Ouch!!!  Thank you for the analysis, though I am very surprised that
> > my testing did not find this. 
> 
> Yeah, not sure how that ended up not triggering earlier.
> 
> I'm thinking of adding a might_wake(), much like we have might_fault(),
> and adding that to printk().

The idea being that might_wake() complains if a scheduler lock is held?
Sounds like a good idea to me.
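
As a rough illustration of the idea (purely hypothetical, no such
helper exists in mainline; this just reuses the existing might_lock()
annotation and the fact that a real wakeup takes the task's pi_lock):

/*
 * Hypothetical sketch only.  In the spirit of might_fault() and
 * might_lock(), record the lock dependency of a potential wakeup with
 * lockdep on every call, even when no wakeup actually happens, so the
 * dependency also shows up on rarely-taken paths.
 */
#ifdef CONFIG_PROVE_LOCKING
#define might_wake()	might_lock(&current->pi_lock)
#else
#define might_wake()	do { } while (0)
#endif

printk() (and anything else that can end up in wake_up_process()) would
then call might_wake() unconditionally on entry.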

> > But pulling all printk()s out from under
> > rnp->lock is going to re-introduce some stall-warning bugs.
> 
> figures :/
> 
> > So what other options do I have?
> 
> Kill printk() :-) It's unreliable garbage anyway ;-)

;-) ;-) ;-)

> > o	I could do raise_softirq(), then report the quiescent state in
> > 	the core RCU code, but I bet that raise_softirq()'s  wakeup gets
> > 	me into just as much trouble.
> 
> Yep..
> 
> > o	Ditto for workqueues, I suspect.
> 
> Yep..
> 
> > o	I cannot send an IPI because interrupts are disabled, and that
> > 	would be rather annoying from a real-time perspective in any
> > 	case.
> 
> Indeed.
> 
> > So this hit the code in perf_lock_task_context() that disables preemption
> > across an RCU read-side critical section, which previously sufficed to
> > prevent this scenario.  What happened this time is as follows:
> > 
> > o	CPU 0 entered perf_lock_task_context(), disabled preemption,
> > 	and entered its RCU read-side critical section.  Of course,
> > 	the whole point of disabling preemption is to prevent the
> > 	matching rcu_read_unlock() from grabbing locks.
> > 
> > o	CPU 1 started an expedited grace period.  It checked CPU
> > 	state, saw that CPU 0 was running in the kernel, and therefore
> > 	IPIed it.
> > 
> > o	The IPI handler running on CPU 0 saw that there was an
> > 	RCU read-side critical section in effect, so it set the
> > 	->exp_need_qs flag.
> > 
> > o	When the matching rcu_read_unlock() executes, it notes that
> > 	->exp_need_qs is set, and therefore grabs the locks that it
> > 	shouldn't, hence lockdep's complaints about deadlock.
> > 
> > This problem is caused by the IPI handler interrupting the RCU read-side
> > critical section.  One way to prevent the IPI from doing this is to
> > disable interrupts across the RCU read-side critical section instead
> > of merely disabling preemption.  This is a reasonable approach given
> > that acquiring the scheduler locks is going to disable interrupts
> > in any case.
> > 
> > The (untested) patch below takes this approach.
> > 
> > Thoughts?
> 
> Yes, this should work, but now I worry I need to go audit all of perf
> and sched for this :/

Could lockdep be convinced to do the auditing for you?

							Thanx, Paul


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: perf related lockdep bug
  2015-11-04 14:20         ` Peter Zijlstra
  2015-11-04 15:34           ` Paul E. McKenney
@ 2015-11-04 15:36           ` Peter Zijlstra
  2015-11-04 15:51             ` Paul E. McKenney
  1 sibling, 1 reply; 19+ messages in thread
From: Peter Zijlstra @ 2015-11-04 15:36 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Dave Jones, Linux Kernel, Ingo Molnar, Stephane Eranian, Andi Kleen

On Wed, Nov 04, 2015 at 03:20:58PM +0100, Peter Zijlstra wrote:
> On Wed, Nov 04, 2015 at 05:48:38AM -0800, Paul E. McKenney wrote:
> > This problem is caused by the IPI handler interrupting the RCU read-side
> > critical section.  One way to prevent the IPI from doing this is to
> > disable interrupts across the RCU read-side critical section instead
> > of merely disabling preemption.  This is a reasonable approach given
> > that acquiring the scheduler locks is going to disable interrupts
> > in any case.
> > 
> > The (untested) patch below takes this approach.
> > 
> > Thoughts?
> 
> Yes, this should work, but now I worry I need to go audit all of perf
> and sched for this :/

I can't find any other sites just now, so let me queue this.

I also had a brief look to see if you used any other locks under rnp->lock,
but aside from the printk and sched things it seems clean.



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: perf related lockdep bug
  2015-11-04 15:36           ` Peter Zijlstra
@ 2015-11-04 15:51             ` Paul E. McKenney
  0 siblings, 0 replies; 19+ messages in thread
From: Paul E. McKenney @ 2015-11-04 15:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dave Jones, Linux Kernel, Ingo Molnar, Stephane Eranian, Andi Kleen

On Wed, Nov 04, 2015 at 04:36:51PM +0100, Peter Zijlstra wrote:
> On Wed, Nov 04, 2015 at 03:20:58PM +0100, Peter Zijlstra wrote:
> > On Wed, Nov 04, 2015 at 05:48:38AM -0800, Paul E. McKenney wrote:
> > > This problem is caused by the IPI handler interrupting the RCU read-side
> > > critical section.  One way to prevent the IPI from doing this is to
> > > disable interrupts across the RCU read-side critical section instead
> > > of merely disabling preemption.  This is a reasonable approach given
> > > that acquiring the scheduler locks is going to disable interrupts
> > > in any case.
> > > 
> > > The (untested) patch below takes this approach.
> > > 
> > > Thoughts?
> > 
> > Yes, this should work, but now I worry I need to go audit all of perf
> > and sched for this :/
> 
> I can't find any other sites just now, so let me queue this.

Works for me, thank you!  It passes light rcutorture testing, but then
again so did the version without this commit.  :-/

I will queue the needed documentation updates separately.  I don't see
these as urgent, so probably next merge window.

> I also had a brief look to see if you used any other locks under rnp->lock,
> but aside from the printk and sched things it seems clean.

Whew!  ;-)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: perf related lockdep bug
  2015-11-04 13:48       ` Paul E. McKenney
  2015-11-04 14:20         ` Peter Zijlstra
@ 2015-11-04 20:58         ` Andi Kleen
  2015-11-05  0:55           ` Paul E. McKenney
  2015-11-10  6:39         ` [tip:perf/urgent] perf: Disable IRQs across RCU RS CS that acquires scheduler lock tip-bot for Paul E. McKenney
  2 siblings, 1 reply; 19+ messages in thread
From: Andi Kleen @ 2015-11-04 20:58 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Peter Zijlstra, Dave Jones, Linux Kernel, Ingo Molnar,
	Stephane Eranian, Andi Kleen


I tested my perf stress workload with the patch applied on 4.3,
unfortunately got a hang again :-/

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: perf related lockdep bug
  2015-11-04 20:58         ` Andi Kleen
@ 2015-11-05  0:55           ` Paul E. McKenney
  2015-11-05  1:59             ` Paul E. McKenney
  2015-11-05  2:46             ` Andi Kleen
  0 siblings, 2 replies; 19+ messages in thread
From: Paul E. McKenney @ 2015-11-05  0:55 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Peter Zijlstra, Dave Jones, Linux Kernel, Ingo Molnar, Stephane Eranian

On Wed, Nov 04, 2015 at 09:58:36PM +0100, Andi Kleen wrote:
> 
> I tested my perf stress workload with the patch applied on 4.3,
> unfortunately got a hang again :-/

Any diagnostics, sysrq-T output, or whatever?

							Thanx, Paul


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: perf related lockdep bug
  2015-11-04 14:34       ` Peter Zijlstra
@ 2015-11-05  1:57         ` Paul E. McKenney
  0 siblings, 0 replies; 19+ messages in thread
From: Paul E. McKenney @ 2015-11-05  1:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dave Jones, Linux Kernel, Ingo Molnar, Stephane Eranian, Andi Kleen

On Wed, Nov 04, 2015 at 03:34:19PM +0100, Peter Zijlstra wrote:
> On Wed, Nov 04, 2015 at 06:01:33AM -0800, Paul E. McKenney wrote:
> > On Wed, Nov 04, 2015 at 11:28:00AM +0100, Peter Zijlstra wrote:
> > > On Wed, Nov 04, 2015 at 11:21:51AM +0100, Peter Zijlstra wrote:
> > > 
> > > > The problem appears to be due to the new RCU expedited grace period
> > > > stuff, with rcu_read_unlock() now randomly trying to acquire locks it
> > > > previously didn't.
> > > > 
> > > > Lemme go look at those rcu bits again..
> > > 
> > > Paul, I think this is because of:
> > > 
> > >   8203d6d0ee78 ("rcu: Use single-stage IPI algorithm for RCU expedited grace period")
> > > 
> > > What happens is that the IPI comes in and tags any random
> > > rcu_read_unlock() with the special bit, which then goes on and takes
> > > locks.
> > > 
> > > Now the problem is that we have scheduler activity inside this lock;
> > > the one lockdep reported seems easy enough to fix, see below.
> > > 
> > > I'll go and see if there's more sites that can cause this.
> > 
> > This one only happens during boot time, but it would be good hygiene
> > in any case.  May I have your SOB on this?
> 
> Of course,
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Thank you, applied as shown below.

							Thanx, Paul

------------------------------------------------------------------------

commit 05faf451f1239a28fcd63bf4b66c0db57d7b13f9
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Wed Nov 4 08:22:05 2015 -0800

    rcu: Move wakeup out from under rnp->lock
    
    This patch removes a potential deadlock hazard by moving the
    wake_up_process() in rcu_spawn_gp_kthread() out from under rnp->lock.
    
    Signed-off-by: Peter Zijlstra <peterz@infradead.org>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index caf3651fa5c9..183445959d00 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4323,8 +4323,8 @@ static int __init rcu_spawn_gp_kthread(void)
 			sp.sched_priority = kthread_prio;
 			sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
 		}
-		wake_up_process(t);
 		raw_spin_unlock_irqrestore(&rnp->lock, flags);
+		wake_up_process(t);
 	}
 	rcu_spawn_nocb_kthreads();
 	rcu_spawn_boost_kthreads();


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: perf related lockdep bug
  2015-11-05  0:55           ` Paul E. McKenney
@ 2015-11-05  1:59             ` Paul E. McKenney
  2015-11-05  2:46             ` Andi Kleen
  1 sibling, 0 replies; 19+ messages in thread
From: Paul E. McKenney @ 2015-11-05  1:59 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Peter Zijlstra, Dave Jones, Linux Kernel, Ingo Molnar, Stephane Eranian

On Wed, Nov 04, 2015 at 04:55:28PM -0800, Paul E. McKenney wrote:
> On Wed, Nov 04, 2015 at 09:58:36PM +0100, Andi Kleen wrote:
> > 
> > I tested my perf stress workload with the patch applied on 4.3,
> > unfortunately got a hang again :-/
> 
> Any diagnostics, sysrq-T output, or whatever?

Given that it looks like your hang is happening at runtime, I would guess
that the following patch won't help, but who knows?

							Thanx, Paul

------------------------------------------------------------------------

commit 05faf451f1239a28fcd63bf4b66c0db57d7b13f9
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Wed Nov 4 08:22:05 2015 -0800

    rcu: Move wakeup out from under rnp->lock
    
    This patch removes a potential deadlock hazard by moving the
    wake_up_process() in rcu_spawn_gp_kthread() out from under rnp->lock.
    
    Signed-off-by: Peter Zijlstra <peterz@infradead.org>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index caf3651fa5c9..183445959d00 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4323,8 +4323,8 @@ static int __init rcu_spawn_gp_kthread(void)
 			sp.sched_priority = kthread_prio;
 			sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
 		}
-		wake_up_process(t);
 		raw_spin_unlock_irqrestore(&rnp->lock, flags);
+		wake_up_process(t);
 	}
 	rcu_spawn_nocb_kthreads();
 	rcu_spawn_boost_kthreads();


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: perf related lockdep bug
  2015-11-05  0:55           ` Paul E. McKenney
  2015-11-05  1:59             ` Paul E. McKenney
@ 2015-11-05  2:46             ` Andi Kleen
  2015-11-05 14:04               ` Paul E. McKenney
  1 sibling, 1 reply; 19+ messages in thread
From: Andi Kleen @ 2015-11-05  2:46 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Andi Kleen, Peter Zijlstra, Dave Jones, Linux Kernel,
	Ingo Molnar, Stephane Eranian

On Wed, Nov 04, 2015 at 04:55:28PM -0800, Paul E. McKenney wrote:
> On Wed, Nov 04, 2015 at 09:58:36PM +0100, Andi Kleen wrote:
> > 
> > I tested my perf stress workload with the patch applied on 4.3,
> > unfortunately got a hang again :-/
> 
> Any diagnostics, sysrq-T output, or whatever?

The system was dead in the screen saver, no serial console.
I restarted it and may have some more information tomorrow.

The previous hangs were usually timeouts on smp_call_function,
so one CPU disappeared.

I'll try the extra hunk you pointed out.

-Andi

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: perf related lockdep bug
  2015-11-05  2:46             ` Andi Kleen
@ 2015-11-05 14:04               ` Paul E. McKenney
  2015-11-11 13:29                 ` Paul E. McKenney
  0 siblings, 1 reply; 19+ messages in thread
From: Paul E. McKenney @ 2015-11-05 14:04 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Peter Zijlstra, Dave Jones, Linux Kernel, Ingo Molnar, Stephane Eranian

On Thu, Nov 05, 2015 at 03:46:07AM +0100, Andi Kleen wrote:
> On Wed, Nov 04, 2015 at 04:55:28PM -0800, Paul E. McKenney wrote:
> > On Wed, Nov 04, 2015 at 09:58:36PM +0100, Andi Kleen wrote:
> > > 
> > > I tested my perf stress workload with the patch applied on 4.3,
> > > unfortunately got a hang again :-/
> > 
> > Any diagnostics, sysrq-T output, or whatever?
> 
> The system was dead in the screen saver, no serial console.
> I restarted it and may have some more information tomorrow.
> 
> > The previous hangs were usually timeouts on smp_call_function,
> so one CPU disappeared.
> 
> I'll try the extra hunk you pointed out.

Thank you, looking forward to seeing what you get from this.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [tip:perf/urgent] perf: Disable IRQs across RCU RS CS that acquires scheduler lock
  2015-11-04 13:48       ` Paul E. McKenney
  2015-11-04 14:20         ` Peter Zijlstra
  2015-11-04 20:58         ` Andi Kleen
@ 2015-11-10  6:39         ` tip-bot for Paul E. McKenney
  2 siblings, 0 replies; 19+ messages in thread
From: tip-bot for Paul E. McKenney @ 2015-11-10  6:39 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: eranian, acme, davej, tglx, jolsa, linux-kernel, paulmck,
	torvalds, peterz, hpa, mingo

Commit-ID:  2fd59077755c44dbbd9b2fa89cf988235a3a6a2b
Gitweb:     http://git.kernel.org/tip/2fd59077755c44dbbd9b2fa89cf988235a3a6a2b
Author:     Paul E. McKenney <paulmck@linux.vnet.ibm.com>
AuthorDate: Wed, 4 Nov 2015 05:48:38 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 9 Nov 2015 16:13:11 +0100

perf: Disable IRQs across RCU RS CS that acquires scheduler lock

The perf_lock_task_context() function disables preemption across its
RCU read-side critical section because that critical section acquires
a scheduler lock.  If there was a preemption during that RCU read-side
critical section, the rcu_read_unlock() could attempt to acquire scheduler
locks, resulting in deadlock.

However, recent optimizations to expedited grace periods mean that IPI
handlers that execute during preemptible RCU read-side critical sections
can now cause the subsequent rcu_read_unlock() to acquire scheduler locks.
Disabling preemption does nothing to prevent these IPI handlers from
executing, so these optimizations introduced a deadlock.  In theory,
this deadlock could be avoided by pulling all wakeups and printk()s out
from rnp->lock critical sections, but in practice this would re-introduce
some RCU CPU stall warning bugs.

Given that acquiring scheduler locks entails disabling interrupts, these
deadlocks can be avoided by disabling interrupts (instead of disabling
preemption) across any RCU read-side critical section that acquires scheduler
locks and holds them across the rcu_read_unlock().  This commit therefore
makes this change for perf_lock_task_context().

Reported-by: Dave Jones <davej@codemonkey.org.uk>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Stephane Eranian <eranian@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20151104134838.GR29027@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/events/core.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index ea02109..f8e5c44 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1050,13 +1050,13 @@ retry:
 	/*
 	 * One of the few rules of preemptible RCU is that one cannot do
 	 * rcu_read_unlock() while holding a scheduler (or nested) lock when
-	 * part of the read side critical section was preemptible -- see
+	 * part of the read side critical section was irqs-enabled -- see
 	 * rcu_read_unlock_special().
 	 *
 	 * Since ctx->lock nests under rq->lock we must ensure the entire read
-	 * side critical section is non-preemptible.
+	 * side critical section has interrupts disabled.
 	 */
-	preempt_disable();
+	local_irq_save(*flags);
 	rcu_read_lock();
 	ctx = rcu_dereference(task->perf_event_ctxp[ctxn]);
 	if (ctx) {
@@ -1070,21 +1070,22 @@ retry:
 		 * if so.  If we locked the right context, then it
 		 * can't get swapped on us any more.
 		 */
-		raw_spin_lock_irqsave(&ctx->lock, *flags);
+		raw_spin_lock(&ctx->lock);
 		if (ctx != rcu_dereference(task->perf_event_ctxp[ctxn])) {
-			raw_spin_unlock_irqrestore(&ctx->lock, *flags);
+			raw_spin_unlock(&ctx->lock);
 			rcu_read_unlock();
-			preempt_enable();
+			local_irq_restore(*flags);
 			goto retry;
 		}
 
 		if (!atomic_inc_not_zero(&ctx->refcount)) {
-			raw_spin_unlock_irqrestore(&ctx->lock, *flags);
+			raw_spin_unlock(&ctx->lock);
 			ctx = NULL;
 		}
 	}
 	rcu_read_unlock();
-	preempt_enable();
+	if (!ctx)
+		local_irq_restore(*flags);
 	return ctx;
 }
 

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: perf related lockdep bug
  2015-11-05 14:04               ` Paul E. McKenney
@ 2015-11-11 13:29                 ` Paul E. McKenney
  0 siblings, 0 replies; 19+ messages in thread
From: Paul E. McKenney @ 2015-11-11 13:29 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Peter Zijlstra, Dave Jones, Linux Kernel, Ingo Molnar, Stephane Eranian

On Thu, Nov 05, 2015 at 06:04:46AM -0800, Paul E. McKenney wrote:
> On Thu, Nov 05, 2015 at 03:46:07AM +0100, Andi Kleen wrote:
> > On Wed, Nov 04, 2015 at 04:55:28PM -0800, Paul E. McKenney wrote:
> > > On Wed, Nov 04, 2015 at 09:58:36PM +0100, Andi Kleen wrote:
> > > > 
> > > > I tested my perf stress workload with the patch applied on 4.3,
> > > > unfortunately got a hang again :-/
> > > 
> > > Any diagnostics, sysrq-T output, or whatever?
> > 
> > The system was dead in the screen saver, no serial console.
> > I restarted it and may have some more information tomorrow.
> > 
> > > The previous hangs were usually timeouts on smp_call_function,
> > so one CPU disappeared.
> > 
> > I'll try the extra hunk you pointed out.
> 
> Thank you, looking forward to seeing what you get from this.

Just following up -- any new info on this hang?

							Thanx, Paul


^ permalink raw reply	[flat|nested] 19+ messages in thread

Thread overview: 19+ messages
2015-11-04  5:17 perf related lockdep bug Dave Jones
2015-11-04 10:21 ` Peter Zijlstra
2015-11-04 10:28   ` Peter Zijlstra
2015-11-04 10:50     ` Peter Zijlstra
2015-11-04 13:48       ` Paul E. McKenney
2015-11-04 14:20         ` Peter Zijlstra
2015-11-04 15:34           ` Paul E. McKenney
2015-11-04 15:36           ` Peter Zijlstra
2015-11-04 15:51             ` Paul E. McKenney
2015-11-04 20:58         ` Andi Kleen
2015-11-05  0:55           ` Paul E. McKenney
2015-11-05  1:59             ` Paul E. McKenney
2015-11-05  2:46             ` Andi Kleen
2015-11-05 14:04               ` Paul E. McKenney
2015-11-11 13:29                 ` Paul E. McKenney
2015-11-10  6:39         ` [tip:perf/urgent] perf: Disable IRQs across RCU RS CS that acquires scheduler lock tip-bot for Paul E. McKenney
2015-11-04 14:01     ` perf related lockdep bug Paul E. McKenney
2015-11-04 14:34       ` Peter Zijlstra
2015-11-05  1:57         ` Paul E. McKenney
