* sig->cputimer.lock/rq->lock lockdep report
@ 2011-10-06 15:55 Dave Jones
2011-10-20 15:47 ` Paul Bolle
0 siblings, 1 reply; 6+ messages in thread
From: Dave Jones @ 2011-10-06 15:55 UTC (permalink / raw)
To: Linux Kernel
This just started happening the last few days during boot.
[ 35.629250] =======================================================
[ 35.630234] [ INFO: possible circular locking dependency detected ]
[ 35.630234] 3.1.0-rc8+ #16
[ 35.630234] -------------------------------------------------------
[ 35.630234] modprobe/1036 is trying to acquire lock:
[ 35.630234] (&(&sig->cputimer.lock)->rlock){-.....}, at: [<ffffffff8104935e>] account_group_exec_runtime+0x32/0x49
[ 35.630234]
[ 35.630234] but task is already holding lock:
[ 35.630234] (&rq->lock){-.-.-.}, at: [<ffffffff81058638>] scheduler_tick+0x3d/0x238
[ 35.630234]
[ 35.630234] which lock already depends on the new lock.
[ 35.630234]
[ 35.630234]
[ 35.630234] the existing dependency chain (in reverse order) is:
[ 35.630234]
[ 35.630234] -> #2 (&rq->lock){-.-.-.}:
[ 35.630234] [<ffffffff81091209>] lock_acquire+0xf3/0x13e
[ 35.630234] [<ffffffff814e98e3>] _raw_spin_lock+0x40/0x73
[ 35.630234] [<ffffffff81049570>] __task_rq_lock+0x5e/0x8b
[ 35.630234] [<ffffffff81058029>] wake_up_new_task+0x46/0x10d
[ 35.630234] [<ffffffff8105d401>] do_fork+0x201/0x2cd
[ 35.630234] [<ffffffff81016f94>] kernel_thread+0x75/0x77
[ 35.630234] [<ffffffff814c6102>] rest_init+0x26/0xdc
[ 35.630234] [<ffffffff81d48bbe>] start_kernel+0x3ec/0x3f7
[ 35.630234] [<ffffffff81d482c4>] x86_64_start_reservations+0xaf/0xb3
[ 35.630234] [<ffffffff81d483ca>] x86_64_start_kernel+0x102/0x111
[ 35.630234]
[ 35.630234] -> #1 (&p->pi_lock){-.-.-.}:
[ 35.630234] [<ffffffff81091209>] lock_acquire+0xf3/0x13e
[ 35.630234] [<ffffffff814e9a70>] _raw_spin_lock_irqsave+0x4f/0x89
[ 35.630234] [<ffffffff8107df4c>] thread_group_cputimer+0x32/0xbd
[ 35.630234] [<ffffffff8107e002>] cpu_timer_sample_group+0x2b/0x75
[ 35.630234] [<ffffffff8107efc4>] set_process_cpu_timer+0x36/0xe0
[ 35.630234] [<ffffffff8107f0ad>] update_rlimit_cpu+0x3f/0x5a
[ 35.630234] [<ffffffff8123096a>] selinux_bprm_committing_creds+0xc6/0xcb
[ 35.630234] [<ffffffff812280b9>] security_bprm_committing_creds+0x13/0x15
[ 35.630234] [<ffffffff8114a324>] install_exec_creds+0x13/0x52
[ 35.630234] [<ffffffff81189415>] load_elf_binary+0xf3b/0x1762
[ 35.630234] [<ffffffff8114b21b>] search_binary_handler+0xbf/0x263
[ 35.630234] [<ffffffff8114c211>] do_execve_common+0x1e6/0x2dd
[ 35.630234] [<ffffffff8114c34b>] do_execve+0x43/0x45
[ 35.630234] [<ffffffff81017bba>] sys_execve+0x43/0x5a
[ 35.630234] [<ffffffff814f091c>] stub_execve+0x6c/0xc0
[ 35.630234]
[ 35.630234] -> #0 (&(&sig->cputimer.lock)->rlock){-.....}:
[ 35.630234] [<ffffffff81090a29>] __lock_acquire+0xa2f/0xd0c
[ 35.630234] [<ffffffff81091209>] lock_acquire+0xf3/0x13e
[ 35.630234] [<ffffffff814e98e3>] _raw_spin_lock+0x40/0x73
[ 35.630234] [<ffffffff8104935e>] account_group_exec_runtime+0x32/0x49
[ 35.630234] [<ffffffff81055ec2>] update_curr+0x112/0x134
[ 35.630234] [<ffffffff810561fc>] task_tick_fair+0x2f/0x120
[ 35.630234] [<ffffffff810586c1>] scheduler_tick+0xc6/0x238
[ 35.630234] [<ffffffff8106cc1c>] update_process_times+0x65/0x75
[ 35.630234] [<ffffffff8108b96b>] tick_sched_timer+0x72/0x9b
[ 35.630234] [<ffffffff8107f9d4>] __run_hrtimer+0xed/0x1db
[ 35.630234] [<ffffffff81080224>] hrtimer_interrupt+0xea/0x1b4
[ 35.630234] [<ffffffff814f30ca>] smp_apic_timer_interrupt+0x76/0x89
[ 35.630234] [<ffffffff814f0fb3>] apic_timer_interrupt+0x73/0x80
[ 35.630234] [<ffffffff814ea146>] _raw_spin_unlock_irq+0x32/0x5e
[ 35.630234] [<ffffffff8107f0c0>] update_rlimit_cpu+0x52/0x5a
[ 35.630234] [<ffffffff8123096a>] selinux_bprm_committing_creds+0xc6/0xcb
[ 35.630234] [<ffffffff812280b9>] security_bprm_committing_creds+0x13/0x15
[ 35.630234] [<ffffffff8114a324>] install_exec_creds+0x13/0x52
[ 35.630234] [<ffffffff81189415>] load_elf_binary+0xf3b/0x1762
[ 35.630234] [<ffffffff8114b21b>] search_binary_handler+0xbf/0x263
[ 35.630234] [<ffffffff8114c211>] do_execve_common+0x1e6/0x2dd
[ 35.630234] [<ffffffff8114c34b>] do_execve+0x43/0x45
[ 35.630234] [<ffffffff81017bba>] sys_execve+0x43/0x5a
[ 35.630234] [<ffffffff814f091c>] stub_execve+0x6c/0xc0
[ 35.630234]
[ 35.630234] other info that might help us debug this:
[ 35.630234]
[ 35.630234] Chain exists of:
[ 35.630234] &(&sig->cputimer.lock)->rlock --> &p->pi_lock --> &rq->lock
[ 35.630234]
[ 35.630234] Possible unsafe locking scenario:
[ 35.630234]
[ 35.630234] CPU0 CPU1
[ 35.630234] ---- ----
[ 35.630234] lock(&rq->lock);
[ 35.630234] lock(&p->pi_lock);
[ 35.630234] lock(&rq->lock);
[ 35.630234] lock(&(&sig->cputimer.lock)->rlock);
[ 35.630234]
[ 35.630234] *** DEADLOCK ***
[ 35.630234]
[ 35.630234] 2 locks held by modprobe/1036:
[ 35.630234] #0: (&sig->cred_guard_mutex){+.+.+.}, at: [<ffffffff8114bed0>] prepare_bprm_creds+0x2d/0x69
[ 35.630234] #1: (&rq->lock){-.-.-.}, at: [<ffffffff81058638>] scheduler_tick+0x3d/0x238
[ 35.630234]
[ 35.630234] stack backtrace:
[ 35.630234] Pid: 1036, comm: modprobe Not tainted 3.1.0-rc8+ #16
[ 35.630234] Call Trace:
[ 35.630234] <IRQ> [<ffffffff81081179>] ? up+0x39/0x3e
[ 35.630234] [<ffffffff814df59a>] print_circular_bug+0x1f8/0x209
[ 35.630234] [<ffffffff81090a29>] __lock_acquire+0xa2f/0xd0c
[ 35.630234] [<ffffffff81016228>] ? __cycles_2_ns+0xe/0x3a
[ 35.630234] [<ffffffff81082176>] ? local_clock+0x14/0x4c
[ 35.630234] [<ffffffff810164e9>] ? sched_clock+0x9/0xd
[ 35.630234] [<ffffffff8104935e>] ? account_group_exec_runtime+0x32/0x49
[ 35.630234] [<ffffffff81091209>] lock_acquire+0xf3/0x13e
[ 35.630234] [<ffffffff8104935e>] ? account_group_exec_runtime+0x32/0x49
[ 35.630234] [<ffffffff8108d972>] ? put_lock_stats+0xe/0x29
[ 35.630234] [<ffffffff8108b8f9>] ? tick_nohz_handler+0xe4/0xe4
[ 35.630234] [<ffffffff814e98e3>] _raw_spin_lock+0x40/0x73
[ 35.630234] [<ffffffff8104935e>] ? account_group_exec_runtime+0x32/0x49
[ 35.630234] [<ffffffff8104935e>] account_group_exec_runtime+0x32/0x49
[ 35.630234] [<ffffffff81055ec2>] update_curr+0x112/0x134
[ 35.630234] [<ffffffff810561fc>] task_tick_fair+0x2f/0x120
[ 35.630234] [<ffffffff810586c1>] scheduler_tick+0xc6/0x238
[ 35.630234] [<ffffffff8106cc1c>] update_process_times+0x65/0x75
[ 35.630234] [<ffffffff8108b96b>] tick_sched_timer+0x72/0x9b
[ 35.630234] [<ffffffff8107f9d4>] __run_hrtimer+0xed/0x1db
[ 35.630234] [<ffffffff81080224>] hrtimer_interrupt+0xea/0x1b4
[ 35.630234] [<ffffffff814f30ca>] smp_apic_timer_interrupt+0x76/0x89
[ 35.630234] [<ffffffff814f0fb3>] apic_timer_interrupt+0x73/0x80
[ 35.630234] <EOI> [<ffffffff81097ad6>] ? arch_local_irq_enable+0x8/0xd
[ 35.630234] [<ffffffff810916a1>] ? trace_hardirqs_on+0xd/0xf
[ 35.630234] [<ffffffff814ea146>] _raw_spin_unlock_irq+0x32/0x5e
[ 35.630234] [<ffffffff8107f0c0>] update_rlimit_cpu+0x52/0x5a
[ 35.630234] [<ffffffff8123096a>] selinux_bprm_committing_creds+0xc6/0xcb
[ 35.630234] [<ffffffff812280b9>] security_bprm_committing_creds+0x13/0x15
[ 35.630234] [<ffffffff8114a324>] install_exec_creds+0x13/0x52
[ 35.630234] [<ffffffff81189415>] load_elf_binary+0xf3b/0x1762
[ 35.630234] [<ffffffff810164e9>] ? sched_clock+0x9/0xd
[ 35.630234] [<ffffffff8114b212>] ? search_binary_handler+0xb6/0x263
[ 35.630234] [<ffffffff814ea22b>] ? _raw_read_unlock+0x32/0x54
[ 35.630234] [<ffffffff8114b21b>] search_binary_handler+0xbf/0x263
[ 35.630234] [<ffffffff811884da>] ? load_elf_library+0x23b/0x23b
[ 35.630234] [<ffffffff8114c211>] do_execve_common+0x1e6/0x2dd
[ 35.630234] [<ffffffff8114c34b>] do_execve+0x43/0x45
[ 35.630234] [<ffffffff81017bba>] sys_execve+0x43/0x5a
[ 35.630234] [<ffffffff814f091c>] stub_execve+0x6c/0xc0
* Re: sig->cputimer.lock/rq->lock lockdep report
2011-10-06 15:55 sig->cputimer.lock/rq->lock lockdep report Dave Jones
@ 2011-10-20 15:47 ` Paul Bolle
0 siblings, 0 replies; 6+ messages in thread
From: Paul Bolle @ 2011-10-20 15:47 UTC (permalink / raw)
To: Dave Jones; +Cc: Linux Kernel
On Thu, 2011-10-06 at 11:55 -0400, Dave Jones wrote:
> [ 35.630234] [ INFO: possible circular locking dependency detected ]
> [ 35.630234] 3.1.0-rc8+ #16
> [ 35.630234] -------------------------------------------------------
> [ 35.630234] modprobe/1036 is trying to acquire lock:
> [ 35.630234] (&(&sig->cputimer.lock)->rlock){-.....}, at: [<ffffffff8104935e>] account_group_exec_runtime+0x32/0x49
> [ 35.630234]
> [ 35.630234] but task is already holding lock:
> [ 35.630234] (&rq->lock){-.-.-.}, at: [<ffffffff81058638>] scheduler_tick+0x3d/0x238
> [ 35.630234]
> [ 35.630234] which lock already depends on the new lock.
I ran into a rather similar lockdep warning in v3.0.7, so it doesn't
seem to be a regression:
[ 30.389966] =======================================================
[ 30.389970] [ INFO: possible circular locking dependency detected ]
[ 30.389974] 3.0.7-local0.fc14.x86_64 #1
[ 30.389976] -------------------------------------------------------
[ 30.389979] modprobe/809 is trying to acquire lock:
[ 30.389982] (&(&sig->cputimer.lock)->rlock){......}, at: [<ffffffff8104b730>] account_group_exec_runtime+0x34/0x4b
[ 30.389994]
[ 30.389994] but task is already holding lock:
[ 30.389997] (&rq->lock){-.-.-.}, at: [<ffffffff810548d5>] try_to_wake_up+0x18f/0x1d7
[ 30.390005]
[ 30.390005] which lock already depends on the new lock.
[ 30.390007]
[ 30.390009]
[ 30.390010] the existing dependency chain (in reverse order) is:
[ 30.390012]
[ 30.390013] -> #2 (&rq->lock){-.-.-.}:
[ 30.390018] [<ffffffff8108f9c2>] lock_acquire+0xb7/0xfb
[ 30.390020] [<ffffffff814ef2cb>] _raw_spin_lock+0x36/0x69
[ 30.390020] [<ffffffff8104bc86>] __task_rq_lock+0x60/0x8f
[ 30.390020] [<ffffffff81054ccd>] wake_up_new_task+0x46/0xd2
[ 30.390020] [<ffffffff8105c227>] do_fork+0x201/0x2ce
[ 30.390020] [<ffffffff81010db1>] kernel_thread+0x75/0x77
[ 30.390020] [<ffffffff814d6382>] rest_init+0x26/0xbe
[ 30.390020] [<ffffffff81d47bc3>] start_kernel+0x3ee/0x3f9
[ 30.390020] [<ffffffff81d472cb>] x86_64_start_reservations+0xb6/0xba
[ 30.390020] [<ffffffff81d473d0>] x86_64_start_kernel+0x101/0x110
[ 30.390020]
[ 30.390020] -> #1 (&p->pi_lock){-.-.-.}:
[ 30.390020] [<ffffffff8108f9c2>] lock_acquire+0xb7/0xfb
[ 30.390020] [<ffffffff814ef43b>] _raw_spin_lock_irqsave+0x45/0x7f
[ 30.390020] [<ffffffff8107c179>] thread_group_cputimer+0x32/0xbd
[ 30.390020] [<ffffffff8107c22f>] cpu_timer_sample_group+0x2b/0x76
[ 30.390020] [<ffffffff8107d28a>] set_process_cpu_timer+0x36/0xdd
[ 30.390020] [<ffffffff8107d370>] update_rlimit_cpu+0x3f/0x5b
[ 30.390020] [<ffffffff8120a054>] selinux_bprm_committing_creds+0xc9/0xce
[ 30.390020] [<ffffffff812016fc>] security_bprm_committing_creds+0x13/0x15
[ 30.390020] [<ffffffff81147ee1>] install_exec_creds+0x16/0x55
[ 30.390020] [<ffffffff811861b8>] load_elf_binary+0xf65/0x178d
[ 30.390020] [<ffffffff81147d28>] search_binary_handler+0x8f/0x232
[ 30.390020] [<ffffffff81149738>] do_execve_common.clone.21+0x195/0x292
[ 30.390020] [<ffffffff81149850>] do_execve+0x1b/0x1d
[ 30.390020] [<ffffffff810116d2>] sys_execve+0x43/0x5a
[ 30.390020] [<ffffffff814f6c1c>] stub_execve+0x6c/0xc0
[ 30.390020]
[ 30.390020] -> #0 (&(&sig->cputimer.lock)->rlock){......}:
[ 30.390020] [<ffffffff8108f1fa>] __lock_acquire+0xadc/0xdd8
[ 30.390020] [<ffffffff8108f9c2>] lock_acquire+0xb7/0xfb
[ 30.390020] [<ffffffff814ef2cb>] _raw_spin_lock+0x36/0x69
[ 30.390020] [<ffffffff8104b730>] account_group_exec_runtime+0x34/0x4b
[ 30.390020] [<ffffffff8104b819>] update_curr+0xd2/0xf0
[ 30.390020] [<ffffffff8104f405>] check_preempt_wakeup+0xd0/0x1d9
[ 30.390020] [<ffffffff8104a817>] check_preempt_curr+0x34/0x74
[ 30.390020] [<ffffffff8104a997>] ttwu_do_wakeup+0x4d/0xe0
[ 30.390020] [<ffffffff810507a8>] ttwu_do_activate.clone.98+0x5c/0x61
[ 30.390020] [<ffffffff810548e1>] try_to_wake_up+0x19b/0x1d7
[ 30.390020] [<ffffffff8105492f>] default_wake_function+0x12/0x14
[ 30.390020] [<ffffffff81047526>] __wake_up_common+0x4e/0x84
[ 30.390020] [<ffffffff8104d547>] __wake_up+0x39/0x4d
[ 30.390020] [<ffffffff810af184>] audit_log_end+0x71/0x8f
[ 30.390020] [<ffffffff810b3404>] audit_log_exit+0x2d6/0xf63
[ 30.390020] [<ffffffff810b4a95>] audit_syscall_exit+0x4b/0x14b
[ 30.390020] [<ffffffff810146fc>] syscall_trace_leave+0x3d/0x101
[ 30.390020] [<ffffffff814f6a78>] int_check_syscall_exit_work+0x34/0x3d
[ 30.390020]
[ 30.390020] other info that might help us debug this:
[ 30.390020]
[ 30.390020] Chain exists of:
[ 30.390020] &(&sig->cputimer.lock)->rlock --> &p->pi_lock --> &rq->lock
[ 30.390020]
[ 30.390020] Possible unsafe locking scenario:
[ 30.390020]
[ 30.390020] CPU0 CPU1
[ 30.390020] ---- ----
[ 30.390020] lock(&rq->lock);
[ 30.390020] lock(&p->pi_lock);
[ 30.390020] lock(&rq->lock);
[ 30.390020] lock(&(&sig->cputimer.lock)->rlock);
[ 30.390020]
[ 30.390020] *** DEADLOCK ***
[ 30.390020]
[ 30.390020] 3 locks held by modprobe/809:
[ 30.390020] #0: (kauditd_wait.lock){......}, at: [<ffffffff8104d530>] __wake_up+0x22/0x4d
[ 30.390020] #1: (&p->pi_lock){-.-.-.}, at: [<ffffffff81054774>] try_to_wake_up+0x2e/0x1d7
[ 30.390020] #2: (&rq->lock){-.-.-.}, at: [<ffffffff810548d5>] try_to_wake_up+0x18f/0x1d7
[ 30.390020]
[ 30.390020] stack backtrace:
[ 30.390020] Pid: 809, comm: modprobe Not tainted 3.0.7-local0.fc14.x86_64 #1
[ 30.390020] Call Trace:
[ 30.390020] [<ffffffff8108d967>] print_circular_bug+0x1f8/0x209
[ 30.390020] [<ffffffff8108f1fa>] __lock_acquire+0xadc/0xdd8
[ 30.390020] [<ffffffff8104b730>] ? account_group_exec_runtime+0x34/0x4b
[ 30.390020] [<ffffffff8108f9c2>] lock_acquire+0xb7/0xfb
[ 30.390020] [<ffffffff8104b730>] ? account_group_exec_runtime+0x34/0x4b
[ 30.390020] [<ffffffff814ef2cb>] _raw_spin_lock+0x36/0x69
[ 30.390020] [<ffffffff8104b730>] ? account_group_exec_runtime+0x34/0x4b
[ 30.390020] [<ffffffff8104b730>] account_group_exec_runtime+0x34/0x4b
[ 30.390020] [<ffffffff8104b819>] update_curr+0xd2/0xf0
[ 30.390020] [<ffffffff8104f405>] check_preempt_wakeup+0xd0/0x1d9
[ 30.390020] [<ffffffff8104a817>] check_preempt_curr+0x34/0x74
[ 30.390020] [<ffffffff8104a997>] ttwu_do_wakeup+0x4d/0xe0
[ 30.390020] [<ffffffff810507a8>] ttwu_do_activate.clone.98+0x5c/0x61
[ 30.390020] [<ffffffff810548e1>] try_to_wake_up+0x19b/0x1d7
[ 30.390020] [<ffffffff8108d31f>] ? lock_acquired+0x1ea/0x215
[ 30.390020] [<ffffffff8105492f>] default_wake_function+0x12/0x14
[ 30.390020] [<ffffffff81047526>] __wake_up_common+0x4e/0x84
[ 30.390020] [<ffffffff8104d530>] ? __wake_up+0x22/0x4d
[ 30.390020] [<ffffffff8104d547>] __wake_up+0x39/0x4d
[ 30.390020] [<ffffffff810af184>] audit_log_end+0x71/0x8f
[ 30.390020] [<ffffffff810b3404>] audit_log_exit+0x2d6/0xf63
[ 30.390020] [<ffffffff8108f8e3>] ? lock_release+0x18a/0x1b2
[ 30.390020] [<ffffffff810b2592>] ? rcu_read_unlock+0x21/0x23
[ 30.390020] [<ffffffff810b462c>] ? audit_filter_inodes+0x11b/0x12a
[ 30.390020] [<ffffffff810b4a95>] audit_syscall_exit+0x4b/0x14b
[ 30.390020] [<ffffffff814f6a3c>] ? int_very_careful+0x5/0xd
[ 30.390020] [<ffffffff810146fc>] syscall_trace_leave+0x3d/0x101
[ 30.390020] [<ffffffff814f6a78>] int_check_syscall_exit_work+0x34/0x3d
Paul Bolle
* Re: sig->cputimer.lock/rq->lock lockdep report
2011-10-13 14:44 ` Ari Savolainen
@ 2011-10-14 19:01 ` Ari Savolainen
0 siblings, 0 replies; 6+ messages in thread
From: Ari Savolainen @ 2011-10-14 19:01 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Thomas Gleixner, linux-kernel, Dave Jones
Today there were no freezes in nine hours with the patch applied.
Ari
* Re: sig->cputimer.lock/rq->lock lockdep report
2011-10-12 20:50 ` Peter Zijlstra
@ 2011-10-13 14:44 ` Ari Savolainen
2011-10-14 19:01 ` Ari Savolainen
0 siblings, 1 reply; 6+ messages in thread
From: Ari Savolainen @ 2011-10-13 14:44 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Thomas Gleixner, linux-kernel, Dave Jones
The patch works. No more warnings in the log after boot. I have the same
issue on two machines: both freeze after a few hours of use, but the logs
show nothing after a forced reboot. Tomorrow will tell if this was it.
Ari
2011/10/12 Peter Zijlstra <a.p.zijlstra@chello.nl>:
> On Wed, 2011-10-12 at 23:13 +0300, Ari Savolainen wrote:
>> I've got the same problem that Dave reported earlier. I bisected it to
>> commit d670ec13178d "posix-cpu-timers: Cure SMP wobbles".
>
> https://lkml.org/lkml/2011/10/7/282
>
> Does the below work for you? (still not proud of it)
>
> ---
> include/linux/sched.h | 3 +++
> kernel/posix-cpu-timers.c | 6 +++++-
> kernel/sched_stats.h | 4 ++--
> 3 files changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 41d0237..ad9eafc 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -503,6 +503,7 @@ struct task_cputime {
> * @running: non-zero when there are timers running and
> * @cputime receives updates.
> * @lock: lock for fields in this struct.
> + * @runtime_lock: lock for cputime.sum_exec_runtime
> *
> * This structure contains the version of task_cputime, above, that is
> * used for thread group CPU timer calculations.
> @@ -511,6 +512,7 @@ struct thread_group_cputimer {
> struct task_cputime cputime;
> int running;
> spinlock_t lock;
> + spinlock_t runtime_lock;
> };
>
> #include <linux/rwsem.h>
> @@ -2566,6 +2568,7 @@ void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times);
> static inline void thread_group_cputime_init(struct signal_struct *sig)
> {
> spin_lock_init(&sig->cputimer.lock);
> + spin_lock_init(&sig->cputimer.runtime_lock);
> }
>
> /*
> diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
> index c8008dd..fa189a6 100644
> --- a/kernel/posix-cpu-timers.c
> +++ b/kernel/posix-cpu-timers.c
> @@ -284,9 +284,13 @@ void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times)
> * it.
> */
> thread_group_cputime(tsk, &sum);
> + spin_lock(&cputimer->runtime_lock);
> update_gt_cputime(&cputimer->cputime, &sum);
> - }
> + } else
> + spin_lock(&cputimer->runtime_lock);
> +
> *times = cputimer->cputime;
> + spin_unlock(&cputimer->runtime_lock);
> spin_unlock_irqrestore(&cputimer->lock, flags);
> }
>
> diff --git a/kernel/sched_stats.h b/kernel/sched_stats.h
> index 331e01b..a7e2c1a 100644
> --- a/kernel/sched_stats.h
> +++ b/kernel/sched_stats.h
> @@ -330,7 +330,7 @@ static inline void account_group_exec_runtime(struct task_struct *tsk,
> if (!cputimer->running)
> return;
>
> - spin_lock(&cputimer->lock);
> + spin_lock(&cputimer->runtime_lock);
> cputimer->cputime.sum_exec_runtime += ns;
> - spin_unlock(&cputimer->lock);
> + spin_unlock(&cputimer->runtime_lock);
> }
>
>
* Re: sig->cputimer.lock/rq->lock lockdep report
2011-10-12 20:13 Ari Savolainen
@ 2011-10-12 20:50 ` Peter Zijlstra
2011-10-13 14:44 ` Ari Savolainen
0 siblings, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2011-10-12 20:50 UTC (permalink / raw)
To: Ari Savolainen; +Cc: Thomas Gleixner, linux-kernel, Dave Jones
On Wed, 2011-10-12 at 23:13 +0300, Ari Savolainen wrote:
> I've got the same problem that Dave reported earlier. I bisected it to
> commit d670ec13178d "posix-cpu-timers: Cure SMP wobbles".
https://lkml.org/lkml/2011/10/7/282
Does the below work for you? (still not proud of it)
---
include/linux/sched.h | 3 +++
kernel/posix-cpu-timers.c | 6 +++++-
kernel/sched_stats.h | 4 ++--
3 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 41d0237..ad9eafc 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -503,6 +503,7 @@ struct task_cputime {
* @running: non-zero when there are timers running and
* @cputime receives updates.
* @lock: lock for fields in this struct.
+ * @runtime_lock: lock for cputime.sum_exec_runtime
*
* This structure contains the version of task_cputime, above, that is
* used for thread group CPU timer calculations.
@@ -511,6 +512,7 @@ struct thread_group_cputimer {
struct task_cputime cputime;
int running;
spinlock_t lock;
+ spinlock_t runtime_lock;
};
#include <linux/rwsem.h>
@@ -2566,6 +2568,7 @@ void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times);
static inline void thread_group_cputime_init(struct signal_struct *sig)
{
spin_lock_init(&sig->cputimer.lock);
+ spin_lock_init(&sig->cputimer.runtime_lock);
}
/*
diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index c8008dd..fa189a6 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -284,9 +284,13 @@ void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times)
* it.
*/
thread_group_cputime(tsk, &sum);
+ spin_lock(&cputimer->runtime_lock);
update_gt_cputime(&cputimer->cputime, &sum);
- }
+ } else
+ spin_lock(&cputimer->runtime_lock);
+
*times = cputimer->cputime;
+ spin_unlock(&cputimer->runtime_lock);
spin_unlock_irqrestore(&cputimer->lock, flags);
}
diff --git a/kernel/sched_stats.h b/kernel/sched_stats.h
index 331e01b..a7e2c1a 100644
--- a/kernel/sched_stats.h
+++ b/kernel/sched_stats.h
@@ -330,7 +330,7 @@ static inline void account_group_exec_runtime(struct task_struct *tsk,
if (!cputimer->running)
return;
- spin_lock(&cputimer->lock);
+ spin_lock(&cputimer->runtime_lock);
cputimer->cputime.sum_exec_runtime += ns;
- spin_unlock(&cputimer->lock);
+ spin_unlock(&cputimer->runtime_lock);
}
* sig->cputimer.lock/rq->lock lockdep report
@ 2011-10-12 20:13 Ari Savolainen
2011-10-12 20:50 ` Peter Zijlstra
0 siblings, 1 reply; 6+ messages in thread
From: Ari Savolainen @ 2011-10-12 20:13 UTC (permalink / raw)
To: Thomas Gleixner, Peter Zijlstra, linux-kernel; +Cc: Dave Jones
I've got the same problem that Dave reported earlier. I bisected it to
commit d670ec13178d "posix-cpu-timers: Cure SMP wobbles".
[ 23.032034] =======================================================
[ 23.032039] [ INFO: possible circular locking dependency detected ]
[ 23.032042] 3.1.0-rc9+ #6
[ 23.032043] -------------------------------------------------------
[ 23.032045] xmonad/3052 is trying to acquire lock:
[ 23.032047] (&(&sig->cputimer.lock)->rlock){-.....}, at: [<ffffffff810855bc>] update_curr+0x12c/0x260
[ 23.032057]
[ 23.032057] but task is already holding lock:
[ 23.032059] (&rq->lock){-.-.-.}, at: [<ffffffff81089043>] scheduler_tick+0x53/0x3e0
[ 23.032064]
[ 23.032065] which lock already depends on the new lock.
[ 23.032065]
[ 23.032067]
[ 23.032067] the existing dependency chain (in reverse order) is:
[ 23.032069]
[ 23.032070] -> #2 (&rq->lock){-.-.-.}:
[ 23.032073] [<ffffffff810d2ff9>] lock_acquire+0x99/0x200
[ 23.032078] [<ffffffff816f4426>] _raw_spin_lock+0x46/0x80
[ 23.032083] [<ffffffff810881ac>] wake_up_new_task+0x9c/0x2a0
[ 23.032086] [<ffffffff8108f887>] do_fork+0x167/0x3e0
[ 23.032089] [<ffffffff8104d4d1>] kernel_thread+0x71/0x80
[ 23.032093] [<ffffffff816cac02>] rest_init+0x26/0xe4
[ 23.032098] [<ffffffff81ea6b72>] start_kernel+0x365/0x370
[ 23.032103] [<ffffffff81ea6322>] x86_64_start_reservations+0x132/0x136
[ 23.032106] [<ffffffff81ea6416>] x86_64_start_kernel+0xf0/0xf7
[ 23.032109]
[ 23.032110] -> #1 (&p->pi_lock){-.-.-.}:
[ 23.032113] [<ffffffff810d2ff9>] lock_acquire+0x99/0x200
[ 23.032116] [<ffffffff816f45f8>] _raw_spin_lock_irqsave+0x58/0xa0
[ 23.032119] [<ffffffff810ba0ed>] thread_group_cputimer+0x3d/0x100
[ 23.032123] [<ffffffff810ba1e2>] cpu_timer_sample_group+0x32/0xb0
[ 23.032126] [<ffffffff810baaf3>] posix_cpu_timer_set+0xf3/0x350
[ 23.032129] [<ffffffff810b6b18>] sys_timer_settime+0xa8/0x180
[ 23.032134] [<ffffffff816fc2ab>] system_call_fastpath+0x16/0x1b
[ 23.032138]
[ 23.032139] -> #0 (&(&sig->cputimer.lock)->rlock){-.....}:
[ 23.032142] [<ffffffff810d22b5>] __lock_acquire+0x1755/0x1d70
[ 23.032145] [<ffffffff810d2ff9>] lock_acquire+0x99/0x200
[ 23.032147] [<ffffffff816f4426>] _raw_spin_lock+0x46/0x80
[ 23.032150] [<ffffffff810855bc>] update_curr+0x12c/0x260
[ 23.032153] [<ffffffff81085c17>] task_tick_fair+0x37/0x180
[ 23.032156] [<ffffffff810890c4>] scheduler_tick+0xd4/0x3e0
[ 23.032160] [<ffffffff810a3f7e>] update_process_times+0x6e/0x90
[ 23.032164] [<ffffffff810ca314>] tick_sched_timer+0x64/0xc0
[ 23.032169] [<ffffffff810bc38f>] __run_hrtimer+0x6f/0x360
[ 23.032172] [<ffffffff810bcdf3>] hrtimer_interrupt+0xf3/0x220
[ 23.032175] [<ffffffff816fe119>] smp_apic_timer_interrupt+0x69/0x99
[ 23.032179] [<ffffffff816fcdb0>] apic_timer_interrupt+0x70/0x80
[ 23.032182] [<ffffffff8119bfac>] kmem_cache_alloc+0x1fc/0x210
[ 23.032186] [<ffffffff81151c65>] mempool_alloc_slab+0x15/0x20
[ 23.032191] [<ffffffff81151fa9>] mempool_alloc+0x59/0x150
[ 23.032194] [<ffffffff811dd73e>] bio_alloc_bioset+0x3e/0xf0
[ 23.032199] [<ffffffff81510330>] __split_and_process_bio+0x580/0x6b0
[ 23.032203] [<ffffffff815105cf>] dm_request+0x16f/0x230
[ 23.032206] [<ffffffff8133f294>] generic_make_request+0x274/0x700
[ 23.032211] [<ffffffff8133f798>] submit_bio+0x78/0xf0
[ 23.032214] [<ffffffff811e2f60>] mpage_readpages+0x120/0x140
[ 23.032218] [<ffffffff8121b90d>] ext4_readpages+0x1d/0x20
[ 23.032222] [<ffffffff8115b09a>] __do_page_cache_readahead+0x21a/0x2d0
[ 23.032226] [<ffffffff8115b2e1>] ra_submit+0x21/0x30
[ 23.032229] [<ffffffff811516f2>] filemap_fault+0x282/0x4b0
[ 23.032232] [<ffffffff81171ec1>] __do_fault+0x71/0x4b0
[ 23.032237] [<ffffffff81174824>] handle_pte_fault+0x84/0x8e0
[ 23.032240] [<ffffffff8117537f>] handle_mm_fault+0x1bf/0x2d0
[ 23.032243] [<ffffffff816f8041>] do_page_fault+0x141/0x530
[ 23.032247] [<ffffffff816f56bf>] page_fault+0x1f/0x30
[ 23.032250]
[ 23.032251] other info that might help us debug this:
[ 23.032252]
[ 23.032253] Chain exists of:
[ 23.032253] &(&sig->cputimer.lock)->rlock --> &p->pi_lock --> &rq->lock
[ 23.032258]
[ 23.032259] Possible unsafe locking scenario:
[ 23.032260]
[ 23.032262] CPU0 CPU1
[ 23.032263] ---- ----
[ 23.032265] lock(&rq->lock);
[ 23.032267] lock(&p->pi_lock);
[ 23.032270] lock(&rq->lock);
[ 23.032272] lock(&(&sig->cputimer.lock)->rlock);
[ 23.032275]
[ 23.032275] *** DEADLOCK ***
[ 23.032276]
[ 23.032277] 3 locks held by xmonad/3052:
[ 23.032279] #0: (&mm->mmap_sem){++++++}, at: [<ffffffff816f7fcf>] do_page_fault+0xcf/0x530
[ 23.032284] #1: (&md->io_lock){++++..}, at: [<ffffffff8151049f>] dm_request+0x3f/0x230
[ 23.032289] #2: (&rq->lock){-.-.-.}, at: [<ffffffff81089043>] scheduler_tick+0x53/0x3e0
[ 23.032294]
[ 23.032294] stack backtrace:
[ 23.032297] Pid: 3052, comm: xmonad Not tainted 3.1.0-rc9+ #6
[ 23.032299] Call Trace:
[ 23.032301] <IRQ> [<ffffffff816e6d9a>] print_circular_bug+0x23d/0x24e
[ 23.032308] [<ffffffff810d22b5>] __lock_acquire+0x1755/0x1d70
[ 23.032312] [<ffffffff810bf335>] ? sched_clock_local+0x25/0x90
[ 23.032316] [<ffffffff810bf4c8>] ? sched_clock_cpu+0xa8/0x120
[ 23.032318] [<ffffffff810d2ff9>] lock_acquire+0x99/0x200
[ 23.032321] [<ffffffff810855bc>] ? update_curr+0x12c/0x260
[ 23.032324] [<ffffffff816f4426>] _raw_spin_lock+0x46/0x80
[ 23.032327] [<ffffffff810855bc>] ? update_curr+0x12c/0x260
[ 23.032330] [<ffffffff810855bc>] update_curr+0x12c/0x260
[ 23.032333] [<ffffffff81085c17>] task_tick_fair+0x37/0x180
[ 23.032336] [<ffffffff810890c4>] scheduler_tick+0xd4/0x3e0
[ 23.032339] [<ffffffff810a3f7e>] update_process_times+0x6e/0x90
[ 23.032342] [<ffffffff810ca314>] tick_sched_timer+0x64/0xc0
[ 23.032345] [<ffffffff810bc38f>] __run_hrtimer+0x6f/0x360
[ 23.032348] [<ffffffff810ca2b0>] ? tick_nohz_handler+0xf0/0xf0
[ 23.032351] [<ffffffff810bcdf3>] hrtimer_interrupt+0xf3/0x220
[ 23.032354] [<ffffffff816fe119>] smp_apic_timer_interrupt+0x69/0x99
[ 23.032357] [<ffffffff816fcdb0>] apic_timer_interrupt+0x70/0x80
[ 23.032359] <EOI> [<ffffffff810ceeb8>] ? mark_held_locks+0x88/0x150
[ 23.032365] [<ffffffff816ea30f>] ? __slab_alloc.isra.58+0x44c/0x461
[ 23.032368] [<ffffffff81151c65>] ? mempool_alloc_slab+0x15/0x20
[ 23.032371] [<ffffffff81151c65>] ? mempool_alloc_slab+0x15/0x20
[ 23.032374] [<ffffffff810d107f>] ? __lock_acquire+0x51f/0x1d70
[ 23.032377] [<ffffffff81151c65>] ? mempool_alloc_slab+0x15/0x20
[ 23.032380] [<ffffffff8119bfac>] kmem_cache_alloc+0x1fc/0x210
[ 23.032383] [<ffffffff81151c65>] ? mempool_alloc_slab+0x15/0x20
[ 23.032386] [<ffffffff81151c65>] mempool_alloc_slab+0x15/0x20
[ 23.032389] [<ffffffff81151fa9>] mempool_alloc+0x59/0x150
[ 23.032392] [<ffffffff81151c65>] ? mempool_alloc_slab+0x15/0x20
[ 23.032395] [<ffffffff81151fa9>] ? mempool_alloc+0x59/0x150
[ 23.032398] [<ffffffff811dd73e>] bio_alloc_bioset+0x3e/0xf0
[ 23.032401] [<ffffffff81510330>] __split_and_process_bio+0x580/0x6b0
[ 23.032404] [<ffffffff8150fde1>] ? __split_and_process_bio+0x31/0x6b0
[ 23.032408] [<ffffffff810cc80d>] ? trace_hardirqs_off+0xd/0x10
[ 23.032411] [<ffffffff815105cf>] dm_request+0x16f/0x230
[ 23.032414] [<ffffffff81510493>] ? dm_request+0x33/0x230
[ 23.032417] [<ffffffff811dcfc1>] ? __bio_add_page.part.15+0x101/0x210
[ 23.032421] [<ffffffff8133f294>] generic_make_request+0x274/0x700
[ 23.032424] [<ffffffff811dd123>] ? bio_add_page+0x53/0x60
[ 23.032427] [<ffffffff811e2bd4>] ? do_mpage_readpage+0x434/0x630
[ 23.032430] [<ffffffff8133f798>] submit_bio+0x78/0xf0
[ 23.032433] [<ffffffff811e2f60>] mpage_readpages+0x120/0x140
[ 23.032436] [<ffffffff81220110>] ? noalloc_get_block_write+0x30/0x30
[ 23.032439] [<ffffffff81220110>] ? noalloc_get_block_write+0x30/0x30
[ 23.032443] [<ffffffff810cc80d>] ? trace_hardirqs_off+0xd/0x10
[ 23.032446] [<ffffffff810bf58f>] ? local_clock+0x4f/0x60
[ 23.032449] [<ffffffff8121b90d>] ext4_readpages+0x1d/0x20
[ 23.032452] [<ffffffff8115b09a>] __do_page_cache_readahead+0x21a/0x2d0
[ 23.032455] [<ffffffff8115af3e>] ? __do_page_cache_readahead+0xbe/0x2d0
[ 23.032458] [<ffffffff8115b2e1>] ra_submit+0x21/0x30
[ 23.032461] [<ffffffff811516f2>] filemap_fault+0x282/0x4b0
[ 23.032464] [<ffffffff810d107f>] ? __lock_acquire+0x51f/0x1d70
[ 23.032468] [<ffffffff81171ec1>] __do_fault+0x71/0x4b0
[ 23.032471] [<ffffffff81174824>] handle_pte_fault+0x84/0x8e0
[ 23.032474] [<ffffffff816f7fcf>] ? do_page_fault+0xcf/0x530
[ 23.032478] [<ffffffff810ccf8c>] ? lock_release_holdtime.part.23+0x11c/0x1a0
[ 23.032481] [<ffffffff816f80ec>] ? do_page_fault+0x1ec/0x530
[ 23.032484] [<ffffffff8117537f>] handle_mm_fault+0x1bf/0x2d0
[ 23.032487] [<ffffffff816f8041>] do_page_fault+0x141/0x530
[ 23.032490] [<ffffffff81171df3>] ? might_fault+0x53/0xb0
[ 23.032494] [<ffffffff8136241d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 23.032497] [<ffffffff816f56bf>] page_fault+0x1f/0x30
Ari