* sig->cputimer.lock/rq->lock lockdep report
@ 2011-10-06 15:55 Dave Jones
  2011-10-20 15:47 ` Paul Bolle
  0 siblings, 1 reply; 6+ messages in thread
From: Dave Jones @ 2011-10-06 15:55 UTC (permalink / raw)
  To: Linux Kernel

This just started happening the last few days during boot.

[   35.629250] =======================================================
[   35.630234] [ INFO: possible circular locking dependency detected ]
[   35.630234] 3.1.0-rc8+ #16
[   35.630234] -------------------------------------------------------
[   35.630234] modprobe/1036 is trying to acquire lock:
[   35.630234]  (&(&sig->cputimer.lock)->rlock){-.....}, at: [<ffffffff8104935e>] account_group_exec_runtime+0x32/0x49
[   35.630234] 
[   35.630234] but task is already holding lock:
[   35.630234]  (&rq->lock){-.-.-.}, at: [<ffffffff81058638>] scheduler_tick+0x3d/0x238
[   35.630234] 
[   35.630234] which lock already depends on the new lock.
[   35.630234] 
[   35.630234] 
[   35.630234] the existing dependency chain (in reverse order) is:
[   35.630234] 
[   35.630234] -> #2 (&rq->lock){-.-.-.}:
[   35.630234]        [<ffffffff81091209>] lock_acquire+0xf3/0x13e
[   35.630234]        [<ffffffff814e98e3>] _raw_spin_lock+0x40/0x73
[   35.630234]        [<ffffffff81049570>] __task_rq_lock+0x5e/0x8b
[   35.630234]        [<ffffffff81058029>] wake_up_new_task+0x46/0x10d
[   35.630234]        [<ffffffff8105d401>] do_fork+0x201/0x2cd
[   35.630234]        [<ffffffff81016f94>] kernel_thread+0x75/0x77
[   35.630234]        [<ffffffff814c6102>] rest_init+0x26/0xdc
[   35.630234]        [<ffffffff81d48bbe>] start_kernel+0x3ec/0x3f7
[   35.630234]        [<ffffffff81d482c4>] x86_64_start_reservations+0xaf/0xb3
[   35.630234]        [<ffffffff81d483ca>] x86_64_start_kernel+0x102/0x111
[   35.630234] 
[   35.630234] -> #1 (&p->pi_lock){-.-.-.}:
[   35.630234]        [<ffffffff81091209>] lock_acquire+0xf3/0x13e
[   35.630234]        [<ffffffff814e9a70>] _raw_spin_lock_irqsave+0x4f/0x89
[   35.630234]        [<ffffffff8107df4c>] thread_group_cputimer+0x32/0xbd
[   35.630234]        [<ffffffff8107e002>] cpu_timer_sample_group+0x2b/0x75
[   35.630234]        [<ffffffff8107efc4>] set_process_cpu_timer+0x36/0xe0
[   35.630234]        [<ffffffff8107f0ad>] update_rlimit_cpu+0x3f/0x5a
[   35.630234]        [<ffffffff8123096a>] selinux_bprm_committing_creds+0xc6/0xcb
[   35.630234]        [<ffffffff812280b9>] security_bprm_committing_creds+0x13/0x15
[   35.630234]        [<ffffffff8114a324>] install_exec_creds+0x13/0x52
[   35.630234]        [<ffffffff81189415>] load_elf_binary+0xf3b/0x1762
[   35.630234]        [<ffffffff8114b21b>] search_binary_handler+0xbf/0x263
[   35.630234]        [<ffffffff8114c211>] do_execve_common+0x1e6/0x2dd
[   35.630234]        [<ffffffff8114c34b>] do_execve+0x43/0x45
[   35.630234]        [<ffffffff81017bba>] sys_execve+0x43/0x5a
[   35.630234]        [<ffffffff814f091c>] stub_execve+0x6c/0xc0
[   35.630234] 
[   35.630234] -> #0 (&(&sig->cputimer.lock)->rlock){-.....}:
[   35.630234]        [<ffffffff81090a29>] __lock_acquire+0xa2f/0xd0c
[   35.630234]        [<ffffffff81091209>] lock_acquire+0xf3/0x13e
[   35.630234]        [<ffffffff814e98e3>] _raw_spin_lock+0x40/0x73
[   35.630234]        [<ffffffff8104935e>] account_group_exec_runtime+0x32/0x49
[   35.630234]        [<ffffffff81055ec2>] update_curr+0x112/0x134
[   35.630234]        [<ffffffff810561fc>] task_tick_fair+0x2f/0x120
[   35.630234]        [<ffffffff810586c1>] scheduler_tick+0xc6/0x238
[   35.630234]        [<ffffffff8106cc1c>] update_process_times+0x65/0x75
[   35.630234]        [<ffffffff8108b96b>] tick_sched_timer+0x72/0x9b
[   35.630234]        [<ffffffff8107f9d4>] __run_hrtimer+0xed/0x1db
[   35.630234]        [<ffffffff81080224>] hrtimer_interrupt+0xea/0x1b4
[   35.630234]        [<ffffffff814f30ca>] smp_apic_timer_interrupt+0x76/0x89
[   35.630234]        [<ffffffff814f0fb3>] apic_timer_interrupt+0x73/0x80
[   35.630234]        [<ffffffff814ea146>] _raw_spin_unlock_irq+0x32/0x5e
[   35.630234]        [<ffffffff8107f0c0>] update_rlimit_cpu+0x52/0x5a
[   35.630234]        [<ffffffff8123096a>] selinux_bprm_committing_creds+0xc6/0xcb
[   35.630234]        [<ffffffff812280b9>] security_bprm_committing_creds+0x13/0x15
[   35.630234]        [<ffffffff8114a324>] install_exec_creds+0x13/0x52
[   35.630234]        [<ffffffff81189415>] load_elf_binary+0xf3b/0x1762
[   35.630234]        [<ffffffff8114b21b>] search_binary_handler+0xbf/0x263
[   35.630234]        [<ffffffff8114c211>] do_execve_common+0x1e6/0x2dd
[   35.630234]        [<ffffffff8114c34b>] do_execve+0x43/0x45
[   35.630234]        [<ffffffff81017bba>] sys_execve+0x43/0x5a
[   35.630234]        [<ffffffff814f091c>] stub_execve+0x6c/0xc0
[   35.630234] 
[   35.630234] other info that might help us debug this:
[   35.630234] 
[   35.630234] Chain exists of:
[   35.630234]   &(&sig->cputimer.lock)->rlock --> &p->pi_lock --> &rq->lock
[   35.630234] 
[   35.630234]  Possible unsafe locking scenario:
[   35.630234] 
[   35.630234]        CPU0                    CPU1
[   35.630234]        ----                    ----
[   35.630234]   lock(&rq->lock);
[   35.630234]                                lock(&p->pi_lock);
[   35.630234]                                lock(&rq->lock);
[   35.630234]   lock(&(&sig->cputimer.lock)->rlock);
[   35.630234] 
[   35.630234]  *** DEADLOCK ***
[   35.630234] 
[   35.630234] 2 locks held by modprobe/1036:
[   35.630234]  #0:  (&sig->cred_guard_mutex){+.+.+.}, at: [<ffffffff8114bed0>] prepare_bprm_creds+0x2d/0x69
[   35.630234]  #1:  (&rq->lock){-.-.-.}, at: [<ffffffff81058638>] scheduler_tick+0x3d/0x238
[   35.630234] 
[   35.630234] stack backtrace:
[   35.630234] Pid: 1036, comm: modprobe Not tainted 3.1.0-rc8+ #16
[   35.630234] Call Trace:
[   35.630234]  <IRQ>  [<ffffffff81081179>] ? up+0x39/0x3e
[   35.630234]  [<ffffffff814df59a>] print_circular_bug+0x1f8/0x209
[   35.630234]  [<ffffffff81090a29>] __lock_acquire+0xa2f/0xd0c
[   35.630234]  [<ffffffff81016228>] ? __cycles_2_ns+0xe/0x3a
[   35.630234]  [<ffffffff81082176>] ? local_clock+0x14/0x4c
[   35.630234]  [<ffffffff810164e9>] ? sched_clock+0x9/0xd
[   35.630234]  [<ffffffff8104935e>] ? account_group_exec_runtime+0x32/0x49
[   35.630234]  [<ffffffff81091209>] lock_acquire+0xf3/0x13e
[   35.630234]  [<ffffffff8104935e>] ? account_group_exec_runtime+0x32/0x49
[   35.630234]  [<ffffffff8108d972>] ? put_lock_stats+0xe/0x29
[   35.630234]  [<ffffffff8108b8f9>] ? tick_nohz_handler+0xe4/0xe4
[   35.630234]  [<ffffffff814e98e3>] _raw_spin_lock+0x40/0x73
[   35.630234]  [<ffffffff8104935e>] ? account_group_exec_runtime+0x32/0x49
[   35.630234]  [<ffffffff8104935e>] account_group_exec_runtime+0x32/0x49
[   35.630234]  [<ffffffff81055ec2>] update_curr+0x112/0x134
[   35.630234]  [<ffffffff810561fc>] task_tick_fair+0x2f/0x120
[   35.630234]  [<ffffffff810586c1>] scheduler_tick+0xc6/0x238
[   35.630234]  [<ffffffff8106cc1c>] update_process_times+0x65/0x75
[   35.630234]  [<ffffffff8108b96b>] tick_sched_timer+0x72/0x9b
[   35.630234]  [<ffffffff8107f9d4>] __run_hrtimer+0xed/0x1db
[   35.630234]  [<ffffffff81080224>] hrtimer_interrupt+0xea/0x1b4
[   35.630234]  [<ffffffff814f30ca>] smp_apic_timer_interrupt+0x76/0x89
[   35.630234]  [<ffffffff814f0fb3>] apic_timer_interrupt+0x73/0x80
[   35.630234]  <EOI>  [<ffffffff81097ad6>] ? arch_local_irq_enable+0x8/0xd
[   35.630234]  [<ffffffff810916a1>] ? trace_hardirqs_on+0xd/0xf
[   35.630234]  [<ffffffff814ea146>] _raw_spin_unlock_irq+0x32/0x5e
[   35.630234]  [<ffffffff8107f0c0>] update_rlimit_cpu+0x52/0x5a
[   35.630234]  [<ffffffff8123096a>] selinux_bprm_committing_creds+0xc6/0xcb
[   35.630234]  [<ffffffff812280b9>] security_bprm_committing_creds+0x13/0x15
[   35.630234]  [<ffffffff8114a324>] install_exec_creds+0x13/0x52
[   35.630234]  [<ffffffff81189415>] load_elf_binary+0xf3b/0x1762
[   35.630234]  [<ffffffff810164e9>] ? sched_clock+0x9/0xd
[   35.630234]  [<ffffffff8114b212>] ? search_binary_handler+0xb6/0x263
[   35.630234]  [<ffffffff814ea22b>] ? _raw_read_unlock+0x32/0x54
[   35.630234]  [<ffffffff8114b21b>] search_binary_handler+0xbf/0x263
[   35.630234]  [<ffffffff811884da>] ? load_elf_library+0x23b/0x23b
[   35.630234]  [<ffffffff8114c211>] do_execve_common+0x1e6/0x2dd
[   35.630234]  [<ffffffff8114c34b>] do_execve+0x43/0x45
[   35.630234]  [<ffffffff81017bba>] sys_execve+0x43/0x5a
[   35.630234]  [<ffffffff814f091c>] stub_execve+0x6c/0xc0



* Re: sig->cputimer.lock/rq->lock lockdep report
  2011-10-06 15:55 sig->cputimer.lock/rq->lock lockdep report Dave Jones
@ 2011-10-20 15:47 ` Paul Bolle
  0 siblings, 0 replies; 6+ messages in thread
From: Paul Bolle @ 2011-10-20 15:47 UTC (permalink / raw)
  To: Dave Jones; +Cc: Linux Kernel

On Thu, 2011-10-06 at 11:55 -0400, Dave Jones wrote:
> [   35.630234] [ INFO: possible circular locking dependency detected ]
> [   35.630234] 3.1.0-rc8+ #16
> [   35.630234] -------------------------------------------------------
> [   35.630234] modprobe/1036 is trying to acquire lock:
> [   35.630234]  (&(&sig->cputimer.lock)->rlock){-.....}, at: [<ffffffff8104935e>] account_group_exec_runtime+0x32/0x49
> [   35.630234] 
> [   35.630234] but task is already holding lock:
> [   35.630234]  (&rq->lock){-.-.-.}, at: [<ffffffff81058638>] scheduler_tick+0x3d/0x238
> [   35.630234] 
> [   35.630234] which lock already depends on the new lock.

I ran into a rather similar lockdep warning in v3.0.7, so it doesn't
seem to be a regression:

[   30.389966] =======================================================
[   30.389970] [ INFO: possible circular locking dependency detected ]
[   30.389974] 3.0.7-local0.fc14.x86_64 #1
[   30.389976] -------------------------------------------------------
[   30.389979] modprobe/809 is trying to acquire lock:
[   30.389982]  (&(&sig->cputimer.lock)->rlock){......}, at: [<ffffffff8104b730>] account_group_exec_runtime+0x34/0x4b
[   30.389994] 
[   30.389994] but task is already holding lock:
[   30.389997]  (&rq->lock){-.-.-.}, at: [<ffffffff810548d5>] try_to_wake_up+0x18f/0x1d7
[   30.390005] 
[   30.390005] which lock already depends on the new lock.
[   30.390007] 
[   30.390009] 
[   30.390010] the existing dependency chain (in reverse order) is:
[   30.390012] 
[   30.390013] -> #2 (&rq->lock){-.-.-.}:
[   30.390018]        [<ffffffff8108f9c2>] lock_acquire+0xb7/0xfb
[   30.390020]        [<ffffffff814ef2cb>] _raw_spin_lock+0x36/0x69
[   30.390020]        [<ffffffff8104bc86>] __task_rq_lock+0x60/0x8f
[   30.390020]        [<ffffffff81054ccd>] wake_up_new_task+0x46/0xd2
[   30.390020]        [<ffffffff8105c227>] do_fork+0x201/0x2ce
[   30.390020]        [<ffffffff81010db1>] kernel_thread+0x75/0x77
[   30.390020]        [<ffffffff814d6382>] rest_init+0x26/0xbe
[   30.390020]        [<ffffffff81d47bc3>] start_kernel+0x3ee/0x3f9
[   30.390020]        [<ffffffff81d472cb>] x86_64_start_reservations+0xb6/0xba
[   30.390020]        [<ffffffff81d473d0>] x86_64_start_kernel+0x101/0x110
[   30.390020] 
[   30.390020] -> #1 (&p->pi_lock){-.-.-.}:
[   30.390020]        [<ffffffff8108f9c2>] lock_acquire+0xb7/0xfb
[   30.390020]        [<ffffffff814ef43b>] _raw_spin_lock_irqsave+0x45/0x7f
[   30.390020]        [<ffffffff8107c179>] thread_group_cputimer+0x32/0xbd
[   30.390020]        [<ffffffff8107c22f>] cpu_timer_sample_group+0x2b/0x76
[   30.390020]        [<ffffffff8107d28a>] set_process_cpu_timer+0x36/0xdd
[   30.390020]        [<ffffffff8107d370>] update_rlimit_cpu+0x3f/0x5b
[   30.390020]        [<ffffffff8120a054>] selinux_bprm_committing_creds+0xc9/0xce
[   30.390020]        [<ffffffff812016fc>] security_bprm_committing_creds+0x13/0x15
[   30.390020]        [<ffffffff81147ee1>] install_exec_creds+0x16/0x55
[   30.390020]        [<ffffffff811861b8>] load_elf_binary+0xf65/0x178d
[   30.390020]        [<ffffffff81147d28>] search_binary_handler+0x8f/0x232
[   30.390020]        [<ffffffff81149738>] do_execve_common.clone.21+0x195/0x292
[   30.390020]        [<ffffffff81149850>] do_execve+0x1b/0x1d
[   30.390020]        [<ffffffff810116d2>] sys_execve+0x43/0x5a
[   30.390020]        [<ffffffff814f6c1c>] stub_execve+0x6c/0xc0
[   30.390020] 
[   30.390020] -> #0 (&(&sig->cputimer.lock)->rlock){......}:
[   30.390020]        [<ffffffff8108f1fa>] __lock_acquire+0xadc/0xdd8
[   30.390020]        [<ffffffff8108f9c2>] lock_acquire+0xb7/0xfb
[   30.390020]        [<ffffffff814ef2cb>] _raw_spin_lock+0x36/0x69
[   30.390020]        [<ffffffff8104b730>] account_group_exec_runtime+0x34/0x4b
[   30.390020]        [<ffffffff8104b819>] update_curr+0xd2/0xf0
[   30.390020]        [<ffffffff8104f405>] check_preempt_wakeup+0xd0/0x1d9
[   30.390020]        [<ffffffff8104a817>] check_preempt_curr+0x34/0x74
[   30.390020]        [<ffffffff8104a997>] ttwu_do_wakeup+0x4d/0xe0
[   30.390020]        [<ffffffff810507a8>] ttwu_do_activate.clone.98+0x5c/0x61
[   30.390020]        [<ffffffff810548e1>] try_to_wake_up+0x19b/0x1d7
[   30.390020]        [<ffffffff8105492f>] default_wake_function+0x12/0x14
[   30.390020]        [<ffffffff81047526>] __wake_up_common+0x4e/0x84
[   30.390020]        [<ffffffff8104d547>] __wake_up+0x39/0x4d
[   30.390020]        [<ffffffff810af184>] audit_log_end+0x71/0x8f
[   30.390020]        [<ffffffff810b3404>] audit_log_exit+0x2d6/0xf63
[   30.390020]        [<ffffffff810b4a95>] audit_syscall_exit+0x4b/0x14b
[   30.390020]        [<ffffffff810146fc>] syscall_trace_leave+0x3d/0x101
[   30.390020]        [<ffffffff814f6a78>] int_check_syscall_exit_work+0x34/0x3d
[   30.390020] 
[   30.390020] other info that might help us debug this:
[   30.390020] 
[   30.390020] Chain exists of:
[   30.390020]   &(&sig->cputimer.lock)->rlock --> &p->pi_lock --> &rq->lock
[   30.390020] 
[   30.390020]  Possible unsafe locking scenario:
[   30.390020] 
[   30.390020]        CPU0                    CPU1
[   30.390020]        ----                    ----
[   30.390020]   lock(&rq->lock);
[   30.390020]                                lock(&p->pi_lock);
[   30.390020]                                lock(&rq->lock);
[   30.390020]   lock(&(&sig->cputimer.lock)->rlock);
[   30.390020] 
[   30.390020]  *** DEADLOCK ***
[   30.390020] 
[   30.390020] 3 locks held by modprobe/809:
[   30.390020]  #0:  (kauditd_wait.lock){......}, at: [<ffffffff8104d530>] __wake_up+0x22/0x4d
[   30.390020]  #1:  (&p->pi_lock){-.-.-.}, at: [<ffffffff81054774>] try_to_wake_up+0x2e/0x1d7
[   30.390020]  #2:  (&rq->lock){-.-.-.}, at: [<ffffffff810548d5>] try_to_wake_up+0x18f/0x1d7
[   30.390020] 
[   30.390020] stack backtrace:
[   30.390020] Pid: 809, comm: modprobe Not tainted 3.0.7-local0.fc14.x86_64 #1
[   30.390020] Call Trace:
[   30.390020]  [<ffffffff8108d967>] print_circular_bug+0x1f8/0x209
[   30.390020]  [<ffffffff8108f1fa>] __lock_acquire+0xadc/0xdd8
[   30.390020]  [<ffffffff8104b730>] ? account_group_exec_runtime+0x34/0x4b
[   30.390020]  [<ffffffff8108f9c2>] lock_acquire+0xb7/0xfb
[   30.390020]  [<ffffffff8104b730>] ? account_group_exec_runtime+0x34/0x4b
[   30.390020]  [<ffffffff814ef2cb>] _raw_spin_lock+0x36/0x69
[   30.390020]  [<ffffffff8104b730>] ? account_group_exec_runtime+0x34/0x4b
[   30.390020]  [<ffffffff8104b730>] account_group_exec_runtime+0x34/0x4b
[   30.390020]  [<ffffffff8104b819>] update_curr+0xd2/0xf0
[   30.390020]  [<ffffffff8104f405>] check_preempt_wakeup+0xd0/0x1d9
[   30.390020]  [<ffffffff8104a817>] check_preempt_curr+0x34/0x74
[   30.390020]  [<ffffffff8104a997>] ttwu_do_wakeup+0x4d/0xe0
[   30.390020]  [<ffffffff810507a8>] ttwu_do_activate.clone.98+0x5c/0x61
[   30.390020]  [<ffffffff810548e1>] try_to_wake_up+0x19b/0x1d7
[   30.390020]  [<ffffffff8108d31f>] ? lock_acquired+0x1ea/0x215
[   30.390020]  [<ffffffff8105492f>] default_wake_function+0x12/0x14
[   30.390020]  [<ffffffff81047526>] __wake_up_common+0x4e/0x84
[   30.390020]  [<ffffffff8104d530>] ? __wake_up+0x22/0x4d
[   30.390020]  [<ffffffff8104d547>] __wake_up+0x39/0x4d
[   30.390020]  [<ffffffff810af184>] audit_log_end+0x71/0x8f
[   30.390020]  [<ffffffff810b3404>] audit_log_exit+0x2d6/0xf63
[   30.390020]  [<ffffffff8108f8e3>] ? lock_release+0x18a/0x1b2
[   30.390020]  [<ffffffff810b2592>] ? rcu_read_unlock+0x21/0x23
[   30.390020]  [<ffffffff810b462c>] ? audit_filter_inodes+0x11b/0x12a
[   30.390020]  [<ffffffff810b4a95>] audit_syscall_exit+0x4b/0x14b
[   30.390020]  [<ffffffff814f6a3c>] ? int_very_careful+0x5/0xd
[   30.390020]  [<ffffffff810146fc>] syscall_trace_leave+0x3d/0x101
[   30.390020]  [<ffffffff814f6a78>] int_check_syscall_exit_work+0x34/0x3d


Paul Bolle



* Re: sig->cputimer.lock/rq->lock lockdep report
  2011-10-13 14:44   ` Ari Savolainen
@ 2011-10-14 19:01     ` Ari Savolainen
  0 siblings, 0 replies; 6+ messages in thread
From: Ari Savolainen @ 2011-10-14 19:01 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Thomas Gleixner, linux-kernel, Dave Jones

Today there were no freezes in nine hours with the patch applied.

Ari


* Re: sig->cputimer.lock/rq->lock lockdep report
  2011-10-12 20:50 ` Peter Zijlstra
@ 2011-10-13 14:44   ` Ari Savolainen
  2011-10-14 19:01     ` Ari Savolainen
  0 siblings, 1 reply; 6+ messages in thread
From: Ari Savolainen @ 2011-10-13 14:44 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Thomas Gleixner, linux-kernel, Dave Jones

The patch works: no more warnings in the log after boot. I have the
issue on two machines. Both freeze after a few hours of use, but the
log files show nothing after a forced reboot. Tomorrow will tell if
this was it.

Ari

2011/10/12 Peter Zijlstra <a.p.zijlstra@chello.nl>:
> On Wed, 2011-10-12 at 23:13 +0300, Ari Savolainen wrote:
>> I've got the same problem that Dave reported earlier. I bisected it to
>> commit d670ec13178d "posix-cpu-timers: Cure SMP wobbles".
>
> https://lkml.org/lkml/2011/10/7/282
>
> Does the below work for you? (still not proud of it)
>
> ---
>  include/linux/sched.h     |    3 +++
>  kernel/posix-cpu-timers.c |    6 +++++-
>  kernel/sched_stats.h      |    4 ++--
>  3 files changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 41d0237..ad9eafc 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -503,6 +503,7 @@ struct task_cputime {
>  * @running:           non-zero when there are timers running and
>  *                     @cputime receives updates.
>  * @lock:              lock for fields in this struct.
> + * @runtime_lock:      lock for cputime.sum_exec_runtime
>  *
>  * This structure contains the version of task_cputime, above, that is
>  * used for thread group CPU timer calculations.
> @@ -511,6 +512,7 @@ struct thread_group_cputimer {
>        struct task_cputime cputime;
>        int running;
>        spinlock_t lock;
> +       spinlock_t runtime_lock;
>  };
>
>  #include <linux/rwsem.h>
> @@ -2566,6 +2568,7 @@ void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times);
>  static inline void thread_group_cputime_init(struct signal_struct *sig)
>  {
>        spin_lock_init(&sig->cputimer.lock);
> +       spin_lock_init(&sig->cputimer.runtime_lock);
>  }
>
>  /*
> diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
> index c8008dd..fa189a6 100644
> --- a/kernel/posix-cpu-timers.c
> +++ b/kernel/posix-cpu-timers.c
> @@ -284,9 +284,13 @@ void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times)
>                 * it.
>                 */
>                thread_group_cputime(tsk, &sum);
> +               spin_lock(&cputimer->runtime_lock);
>                update_gt_cputime(&cputimer->cputime, &sum);
> -       }
> +       } else
> +               spin_lock(&cputimer->runtime_lock);
> +
>        *times = cputimer->cputime;
> +       spin_unlock(&cputimer->runtime_lock);
>        spin_unlock_irqrestore(&cputimer->lock, flags);
>  }
>
> diff --git a/kernel/sched_stats.h b/kernel/sched_stats.h
> index 331e01b..a7e2c1a 100644
> --- a/kernel/sched_stats.h
> +++ b/kernel/sched_stats.h
> @@ -330,7 +330,7 @@ static inline void account_group_exec_runtime(struct task_struct *tsk,
>        if (!cputimer->running)
>                return;
>
> -       spin_lock(&cputimer->lock);
> +       spin_lock(&cputimer->runtime_lock);
>        cputimer->cputime.sum_exec_runtime += ns;
> -       spin_unlock(&cputimer->lock);
> +       spin_unlock(&cputimer->runtime_lock);
>  }
>
>


* Re: sig->cputimer.lock/rq->lock lockdep report
  2011-10-12 20:13 Ari Savolainen
@ 2011-10-12 20:50 ` Peter Zijlstra
  2011-10-13 14:44   ` Ari Savolainen
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2011-10-12 20:50 UTC (permalink / raw)
  To: Ari Savolainen; +Cc: Thomas Gleixner, linux-kernel, Dave Jones

On Wed, 2011-10-12 at 23:13 +0300, Ari Savolainen wrote:
> I've got the same problem that Dave reported earlier. I bisected it to
> commit d670ec13178d "posix-cpu-timers: Cure SMP wobbles". 

https://lkml.org/lkml/2011/10/7/282

Does the below work for you? (still not proud of it)

---
 include/linux/sched.h     |    3 +++
 kernel/posix-cpu-timers.c |    6 +++++-
 kernel/sched_stats.h      |    4 ++--
 3 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 41d0237..ad9eafc 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -503,6 +503,7 @@ struct task_cputime {
  * @running:		non-zero when there are timers running and
  * 			@cputime receives updates.
  * @lock:		lock for fields in this struct.
+ * @runtime_lock:	lock for cputime.sum_exec_runtime
  *
  * This structure contains the version of task_cputime, above, that is
  * used for thread group CPU timer calculations.
@@ -511,6 +512,7 @@ struct thread_group_cputimer {
 	struct task_cputime cputime;
 	int running;
 	spinlock_t lock;
+	spinlock_t runtime_lock;
 };
 
 #include <linux/rwsem.h>
@@ -2566,6 +2568,7 @@ void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times);
 static inline void thread_group_cputime_init(struct signal_struct *sig)
 {
 	spin_lock_init(&sig->cputimer.lock);
+	spin_lock_init(&sig->cputimer.runtime_lock);
 }
 
 /*
diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
index c8008dd..fa189a6 100644
--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -284,9 +284,13 @@ void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times)
 		 * it.
 		 */
 		thread_group_cputime(tsk, &sum);
+		spin_lock(&cputimer->runtime_lock);
 		update_gt_cputime(&cputimer->cputime, &sum);
-	}
+	} else
+		spin_lock(&cputimer->runtime_lock);
+
 	*times = cputimer->cputime;
+	spin_unlock(&cputimer->runtime_lock);
 	spin_unlock_irqrestore(&cputimer->lock, flags);
 }
 
diff --git a/kernel/sched_stats.h b/kernel/sched_stats.h
index 331e01b..a7e2c1a 100644
--- a/kernel/sched_stats.h
+++ b/kernel/sched_stats.h
@@ -330,7 +330,7 @@ static inline void account_group_exec_runtime(struct task_struct *tsk,
 	if (!cputimer->running)
 		return;
 
-	spin_lock(&cputimer->lock);
+	spin_lock(&cputimer->runtime_lock);
 	cputimer->cputime.sum_exec_runtime += ns;
-	spin_unlock(&cputimer->lock);
+	spin_unlock(&cputimer->runtime_lock);
 }



* sig->cputimer.lock/rq->lock lockdep report
@ 2011-10-12 20:13 Ari Savolainen
  2011-10-12 20:50 ` Peter Zijlstra
  0 siblings, 1 reply; 6+ messages in thread
From: Ari Savolainen @ 2011-10-12 20:13 UTC (permalink / raw)
  To: Thomas Gleixner, Peter Zijlstra, linux-kernel; +Cc: Dave Jones

I've got the same problem that Dave reported earlier. I bisected it to
commit d670ec13178d "posix-cpu-timers: Cure SMP wobbles".

[   23.032034] =======================================================
[   23.032039] [ INFO: possible circular locking dependency detected ]
[   23.032042] 3.1.0-rc9+ #6
[   23.032043] -------------------------------------------------------
[   23.032045] xmonad/3052 is trying to acquire lock:
[   23.032047]  (&(&sig->cputimer.lock)->rlock){-.....}, at: [<ffffffff810855bc>] update_curr+0x12c/0x260
[   23.032057]
[   23.032057] but task is already holding lock:
[   23.032059]  (&rq->lock){-.-.-.}, at: [<ffffffff81089043>] scheduler_tick+0x53/0x3e0
[   23.032064]
[   23.032065] which lock already depends on the new lock.
[   23.032065]
[   23.032067]
[   23.032067] the existing dependency chain (in reverse order) is:
[   23.032069]
[   23.032070] -> #2 (&rq->lock){-.-.-.}:
[   23.032073]        [<ffffffff810d2ff9>] lock_acquire+0x99/0x200
[   23.032078]        [<ffffffff816f4426>] _raw_spin_lock+0x46/0x80
[   23.032083]        [<ffffffff810881ac>] wake_up_new_task+0x9c/0x2a0
[   23.032086]        [<ffffffff8108f887>] do_fork+0x167/0x3e0
[   23.032089]        [<ffffffff8104d4d1>] kernel_thread+0x71/0x80
[   23.032093]        [<ffffffff816cac02>] rest_init+0x26/0xe4
[   23.032098]        [<ffffffff81ea6b72>] start_kernel+0x365/0x370
[   23.032103]        [<ffffffff81ea6322>] x86_64_start_reservations+0x132/0x136
[   23.032106]        [<ffffffff81ea6416>] x86_64_start_kernel+0xf0/0xf7
[   23.032109]
[   23.032110] -> #1 (&p->pi_lock){-.-.-.}:
[   23.032113]        [<ffffffff810d2ff9>] lock_acquire+0x99/0x200
[   23.032116]        [<ffffffff816f45f8>] _raw_spin_lock_irqsave+0x58/0xa0
[   23.032119]        [<ffffffff810ba0ed>] thread_group_cputimer+0x3d/0x100
[   23.032123]        [<ffffffff810ba1e2>] cpu_timer_sample_group+0x32/0xb0
[   23.032126]        [<ffffffff810baaf3>] posix_cpu_timer_set+0xf3/0x350
[   23.032129]        [<ffffffff810b6b18>] sys_timer_settime+0xa8/0x180
[   23.032134]        [<ffffffff816fc2ab>] system_call_fastpath+0x16/0x1b
[   23.032138]
[   23.032139] -> #0 (&(&sig->cputimer.lock)->rlock){-.....}:
[   23.032142]        [<ffffffff810d22b5>] __lock_acquire+0x1755/0x1d70
[   23.032145]        [<ffffffff810d2ff9>] lock_acquire+0x99/0x200
[   23.032147]        [<ffffffff816f4426>] _raw_spin_lock+0x46/0x80
[   23.032150]        [<ffffffff810855bc>] update_curr+0x12c/0x260
[   23.032153]        [<ffffffff81085c17>] task_tick_fair+0x37/0x180
[   23.032156]        [<ffffffff810890c4>] scheduler_tick+0xd4/0x3e0
[   23.032160]        [<ffffffff810a3f7e>] update_process_times+0x6e/0x90
[   23.032164]        [<ffffffff810ca314>] tick_sched_timer+0x64/0xc0
[   23.032169]        [<ffffffff810bc38f>] __run_hrtimer+0x6f/0x360
[   23.032172]        [<ffffffff810bcdf3>] hrtimer_interrupt+0xf3/0x220
[   23.032175]        [<ffffffff816fe119>] smp_apic_timer_interrupt+0x69/0x99
[   23.032179]        [<ffffffff816fcdb0>] apic_timer_interrupt+0x70/0x80
[   23.032182]        [<ffffffff8119bfac>] kmem_cache_alloc+0x1fc/0x210
[   23.032186]        [<ffffffff81151c65>] mempool_alloc_slab+0x15/0x20
[   23.032191]        [<ffffffff81151fa9>] mempool_alloc+0x59/0x150
[   23.032194]        [<ffffffff811dd73e>] bio_alloc_bioset+0x3e/0xf0
[   23.032199]        [<ffffffff81510330>] __split_and_process_bio+0x580/0x6b0
[   23.032203]        [<ffffffff815105cf>] dm_request+0x16f/0x230
[   23.032206]        [<ffffffff8133f294>] generic_make_request+0x274/0x700
[   23.032211]        [<ffffffff8133f798>] submit_bio+0x78/0xf0
[   23.032214]        [<ffffffff811e2f60>] mpage_readpages+0x120/0x140
[   23.032218]        [<ffffffff8121b90d>] ext4_readpages+0x1d/0x20
[   23.032222]        [<ffffffff8115b09a>] __do_page_cache_readahead+0x21a/0x2d0
[   23.032226]        [<ffffffff8115b2e1>] ra_submit+0x21/0x30
[   23.032229]        [<ffffffff811516f2>] filemap_fault+0x282/0x4b0
[   23.032232]        [<ffffffff81171ec1>] __do_fault+0x71/0x4b0
[   23.032237]        [<ffffffff81174824>] handle_pte_fault+0x84/0x8e0
[   23.032240]        [<ffffffff8117537f>] handle_mm_fault+0x1bf/0x2d0
[   23.032243]        [<ffffffff816f8041>] do_page_fault+0x141/0x530
[   23.032247]        [<ffffffff816f56bf>] page_fault+0x1f/0x30
[   23.032250]
[   23.032251] other info that might help us debug this:
[   23.032252]
[   23.032253] Chain exists of:
[   23.032253]   &(&sig->cputimer.lock)->rlock --> &p->pi_lock --> &rq->lock
[   23.032258]
[   23.032259]  Possible unsafe locking scenario:
[   23.032260]
[   23.032262]        CPU0                    CPU1
[   23.032263]        ----                    ----
[   23.032265]   lock(&rq->lock);
[   23.032267]                                lock(&p->pi_lock);
[   23.032270]                                lock(&rq->lock);
[   23.032272]   lock(&(&sig->cputimer.lock)->rlock);
[   23.032275]
[   23.032275]  *** DEADLOCK ***
[   23.032276]
[   23.032277] 3 locks held by xmonad/3052:
[   23.032279]  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff816f7fcf>] do_page_fault+0xcf/0x530
[   23.032284]  #1:  (&md->io_lock){++++..}, at: [<ffffffff8151049f>] dm_request+0x3f/0x230
[   23.032289]  #2:  (&rq->lock){-.-.-.}, at: [<ffffffff81089043>] scheduler_tick+0x53/0x3e0
[   23.032294]
[   23.032294] stack backtrace:
[   23.032297] Pid: 3052, comm: xmonad Not tainted 3.1.0-rc9+ #6
[   23.032299] Call Trace:
[   23.032301]  <IRQ>  [<ffffffff816e6d9a>] print_circular_bug+0x23d/0x24e
[   23.032308]  [<ffffffff810d22b5>] __lock_acquire+0x1755/0x1d70
[   23.032312]  [<ffffffff810bf335>] ? sched_clock_local+0x25/0x90
[   23.032316]  [<ffffffff810bf4c8>] ? sched_clock_cpu+0xa8/0x120
[   23.032318]  [<ffffffff810d2ff9>] lock_acquire+0x99/0x200
[   23.032321]  [<ffffffff810855bc>] ? update_curr+0x12c/0x260
[   23.032324]  [<ffffffff816f4426>] _raw_spin_lock+0x46/0x80
[   23.032327]  [<ffffffff810855bc>] ? update_curr+0x12c/0x260
[   23.032330]  [<ffffffff810855bc>] update_curr+0x12c/0x260
[   23.032333]  [<ffffffff81085c17>] task_tick_fair+0x37/0x180
[   23.032336]  [<ffffffff810890c4>] scheduler_tick+0xd4/0x3e0
[   23.032339]  [<ffffffff810a3f7e>] update_process_times+0x6e/0x90
[   23.032342]  [<ffffffff810ca314>] tick_sched_timer+0x64/0xc0
[   23.032345]  [<ffffffff810bc38f>] __run_hrtimer+0x6f/0x360
[   23.032348]  [<ffffffff810ca2b0>] ? tick_nohz_handler+0xf0/0xf0
[   23.032351]  [<ffffffff810bcdf3>] hrtimer_interrupt+0xf3/0x220
[   23.032354]  [<ffffffff816fe119>] smp_apic_timer_interrupt+0x69/0x99
[   23.032357]  [<ffffffff816fcdb0>] apic_timer_interrupt+0x70/0x80
[   23.032359]  <EOI>  [<ffffffff810ceeb8>] ? mark_held_locks+0x88/0x150
[   23.032365]  [<ffffffff816ea30f>] ? __slab_alloc.isra.58+0x44c/0x461
[   23.032368]  [<ffffffff81151c65>] ? mempool_alloc_slab+0x15/0x20
[   23.032371]  [<ffffffff81151c65>] ? mempool_alloc_slab+0x15/0x20
[   23.032374]  [<ffffffff810d107f>] ? __lock_acquire+0x51f/0x1d70
[   23.032377]  [<ffffffff81151c65>] ? mempool_alloc_slab+0x15/0x20
[   23.032380]  [<ffffffff8119bfac>] kmem_cache_alloc+0x1fc/0x210
[   23.032383]  [<ffffffff81151c65>] ? mempool_alloc_slab+0x15/0x20
[   23.032386]  [<ffffffff81151c65>] mempool_alloc_slab+0x15/0x20
[   23.032389]  [<ffffffff81151fa9>] mempool_alloc+0x59/0x150
[   23.032392]  [<ffffffff81151c65>] ? mempool_alloc_slab+0x15/0x20
[   23.032395]  [<ffffffff81151fa9>] ? mempool_alloc+0x59/0x150
[   23.032398]  [<ffffffff811dd73e>] bio_alloc_bioset+0x3e/0xf0
[   23.032401]  [<ffffffff81510330>] __split_and_process_bio+0x580/0x6b0
[   23.032404]  [<ffffffff8150fde1>] ? __split_and_process_bio+0x31/0x6b0
[   23.032408]  [<ffffffff810cc80d>] ? trace_hardirqs_off+0xd/0x10
[   23.032411]  [<ffffffff815105cf>] dm_request+0x16f/0x230
[   23.032414]  [<ffffffff81510493>] ? dm_request+0x33/0x230
[   23.032417]  [<ffffffff811dcfc1>] ? __bio_add_page.part.15+0x101/0x210
[   23.032421]  [<ffffffff8133f294>] generic_make_request+0x274/0x700
[   23.032424]  [<ffffffff811dd123>] ? bio_add_page+0x53/0x60
[   23.032427]  [<ffffffff811e2bd4>] ? do_mpage_readpage+0x434/0x630
[   23.032430]  [<ffffffff8133f798>] submit_bio+0x78/0xf0
[   23.032433]  [<ffffffff811e2f60>] mpage_readpages+0x120/0x140
[   23.032436]  [<ffffffff81220110>] ? noalloc_get_block_write+0x30/0x30
[   23.032439]  [<ffffffff81220110>] ? noalloc_get_block_write+0x30/0x30
[   23.032443]  [<ffffffff810cc80d>] ? trace_hardirqs_off+0xd/0x10
[   23.032446]  [<ffffffff810bf58f>] ? local_clock+0x4f/0x60
[   23.032449]  [<ffffffff8121b90d>] ext4_readpages+0x1d/0x20
[   23.032452]  [<ffffffff8115b09a>] __do_page_cache_readahead+0x21a/0x2d0
[   23.032455]  [<ffffffff8115af3e>] ? __do_page_cache_readahead+0xbe/0x2d0
[   23.032458]  [<ffffffff8115b2e1>] ra_submit+0x21/0x30
[   23.032461]  [<ffffffff811516f2>] filemap_fault+0x282/0x4b0
[   23.032464]  [<ffffffff810d107f>] ? __lock_acquire+0x51f/0x1d70
[   23.032468]  [<ffffffff81171ec1>] __do_fault+0x71/0x4b0
[   23.032471]  [<ffffffff81174824>] handle_pte_fault+0x84/0x8e0
[   23.032474]  [<ffffffff816f7fcf>] ? do_page_fault+0xcf/0x530
[   23.032478]  [<ffffffff810ccf8c>] ? lock_release_holdtime.part.23+0x11c/0x1a0
[   23.032481]  [<ffffffff816f80ec>] ? do_page_fault+0x1ec/0x530
[   23.032484]  [<ffffffff8117537f>] handle_mm_fault+0x1bf/0x2d0
[   23.032487]  [<ffffffff816f8041>] do_page_fault+0x141/0x530
[   23.032490]  [<ffffffff81171df3>] ? might_fault+0x53/0xb0
[   23.032494]  [<ffffffff8136241d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[   23.032497]  [<ffffffff816f56bf>] page_fault+0x1f/0x30

Ari

