* [2.6.20] BUG: workqueue leaked lock @ 2007-03-13 16:50 Folkert van Heusden 2007-03-15 19:06 ` Andrew Morton 0 siblings, 1 reply; 35+ messages in thread From: Folkert van Heusden @ 2007-03-13 16:50 UTC (permalink / raw) To: linux-kernel ... [ 1756.728209] BUG: workqueue leaked lock or atomic: nfsd4/0x00000000/3577 [ 1756.728271] last function: laundromat_main+0x0/0x69 [nfsd] [ 1756.728392] 2 locks held by nfsd4/3577: [ 1756.728435] #0: (client_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa [ 1756.728679] #1: (&inode->i_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa [ 1756.728923] [<c1003d57>] show_trace_log_lvl+0x1a/0x30 [ 1756.729015] [<c1003d7f>] show_trace+0x12/0x14 [ 1756.729103] [<c1003e79>] dump_stack+0x16/0x18 [ 1756.729187] [<c102c2e8>] run_workqueue+0x167/0x170 [ 1756.729276] [<c102c437>] worker_thread+0x146/0x165 [ 1756.729368] [<c102f797>] kthread+0x97/0xc4 [ 1756.729456] [<c1003bdb>] kernel_thread_helper+0x7/0x10 [ 1756.729547] ======================= [ 1792.436492] svc: unknown version (0 for prog 100003, nfsd) [ 1846.683648] BUG: workqueue leaked lock or atomic: nfsd4/0x00000000/3577 [ 1846.683701] last function: laundromat_main+0x0/0x69 [nfsd] [ 1846.683832] 2 locks held by nfsd4/3577: [ 1846.683885] #0: (client_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa [ 1846.683980] #1: (&inode->i_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa [ 1846.683988] [<c1003d57>] show_trace_log_lvl+0x1a/0x30 [ 1846.683994] [<c1003d7f>] show_trace+0x12/0x14 [ 1846.683997] [<c1003e79>] dump_stack+0x16/0x18 [ 1846.684001] [<c102c2e8>] run_workqueue+0x167/0x170 [ 1846.684006] [<c102c437>] worker_thread+0x146/0x165 [ 1846.684012] [<c102f797>] kthread+0x97/0xc4 [ 1846.684023] [<c1003bdb>] kernel_thread_helper+0x7/0x10 [ 1846.684027] ======================= [ 1936.638930] BUG: workqueue leaked lock or atomic: nfsd4/0x00000000/3577 [ 1936.638992] last function: laundromat_main+0x0/0x69 [nfsd] [ 1936.639141] 2 locks held by nfsd4/3577: [ 1936.639195] #0: 
(client_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa [ 1936.639292] #1: (&inode->i_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa [ 1936.639301] [<c1003d57>] show_trace_log_lvl+0x1a/0x30 [ 1936.639306] [<c1003d7f>] show_trace+0x12/0x14 [ 1936.639311] [<c1003e79>] dump_stack+0x16/0x18 [ 1936.639314] [<c102c2e8>] run_workqueue+0x167/0x170 [ 1936.639321] [<c102c437>] worker_thread+0x146/0x165 [ 1936.639325] [<c102f797>] kthread+0x97/0xc4 [ 1936.639329] [<c1003bdb>] kernel_thread_helper+0x7/0x10 [ 1936.639332] ======================= this repeats several times also: [ 137.324255] ======================================================= [ 137.324359] [ INFO: possible circular locking dependency detected ] [ 137.324412] 2.6.20 #2 [ 137.324457] ------------------------------------------------------- [ 137.324510] tor/4857 is trying to acquire lock: [ 137.324559] (tty_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa [ 137.324765] [ 137.324766] but task is already holding lock: [ 137.324859] (&s->s_dquot.dqptr_sem){----}, at: [<c1099403>] dquot_alloc_space+0x50/0x189 [ 137.325067] [ 137.325069] which lock already depends on the new lock. 
[ 137.325071] [ 137.325206] [ 137.325208] the existing dependency chain (in reverse order) is: [ 137.325300] [ 137.325301] -> #4 (&s->s_dquot.dqptr_sem){----}: [ 137.325501] [<c10354b8>] check_prev_add+0x154/0x206 [ 137.325852] [<c10355d4>] check_prevs_add+0x6a/0xd5 [ 137.326197] [<c1036dd2>] __lock_acquire+0x61c/0xa05 [ 137.326538] [<c1037624>] lock_acquire+0x62/0x81 [ 137.326887] [<c1032b5f>] down_read+0x2b/0x3d [ 137.327241] [<c1099403>] dquot_alloc_space+0x50/0x189 [ 137.327588] [<c10a98e6>] ext3_new_blocks+0x44b/0x5a2 [ 137.327935] [<c10abe78>] ext3_alloc_blocks+0x40/0xdf [ 137.328280] [<c10abf67>] ext3_alloc_branch+0x50/0x21b [ 137.328622] [<c10ac445>] ext3_get_blocks_handle+0x1b8/0x367 [ 137.328980] [<c10ac76e>] ext3_getblk+0x97/0x228 [ 137.329330] [<c10ac919>] ext3_bread+0x1a/0x78 [ 137.329672] [<c10b1bc2>] ext3_mkdir+0xf4/0x270 [ 137.330022] [<c1072a6a>] vfs_mkdir+0xb3/0x161 [ 137.330368] [<c1072ba4>] sys_mkdirat+0x8c/0xc4 [ 137.330714] [<c1072bfc>] sys_mkdir+0x20/0x22 [ 137.331063] [<c1002f78>] syscall_call+0x7/0xb [ 137.331406] [<ffffffff>] 0xffffffff [ 137.331771] [ 137.331772] -> #3 (&ei->truncate_mutex){--..}: [ 137.331979] [<c10354b8>] check_prev_add+0x154/0x206 [ 137.332332] [<c10355d4>] check_prevs_add+0x6a/0xd5 [ 137.332676] [<c1036dd2>] __lock_acquire+0x61c/0xa05 [ 137.333025] [<c1037624>] lock_acquire+0x62/0x81 [ 137.333370] [<c1205c09>] __mutex_lock_slowpath+0x75/0x28c [ 137.333930] [<c1205b88>] mutex_lock+0x8/0xa [ 137.334271] [<c10ae14f>] ext3_truncate+0x170/0x468 [ 137.334613] [<c1058e72>] vmtruncate+0xa6/0x116 [ 137.334949] [<c107d587>] inode_setattr+0x145/0x16c [ 137.335286] [<c10af071>] ext3_setattr+0x150/0x22f [ 137.335627] [<c107d909>] notify_change+0x35b/0x392 [ 137.335968] [<c10679d2>] do_truncate+0x52/0x75 [ 137.336305] [<c1071f5e>] may_open+0x1ec/0x231 [ 137.336642] [<c107211c>] open_namei+0xda/0x59b [ 137.336975] [<c106863a>]
do_filp_open+0x2c/0x53 [ 137.337310] [<c106896f>] do_sys_open+0x52/0xd8 [ 137.337645] [<c1068a11>] sys_open+0x1c/0x1e [ 137.337980] [<c1002f78>] syscall_call+0x7/0xb [ 137.338315] [<ffffffff>] 0xffffffff [ 137.338665] [ 137.338666] -> #2 (&inode->i_alloc_sem){--..}: [ 137.338864] [<c10354b8>] check_prev_add+0x154/0x206 [ 137.339200] [<c10355d4>] check_prevs_add+0x6a/0xd5 [ 137.339535] [<c1036dd2>] __lock_acquire+0x61c/0xa05 [ 137.339874] [<c1037624>] lock_acquire+0x62/0x81 [ 137.340207] [<c1032bf5>] down_write+0x2b/0x45 [ 137.340545] [<c107d890>] notify_change+0x2e2/0x392 [ 137.340886] [<c10679d2>] do_truncate+0x52/0x75 [ 137.341222] [<c1071f5e>] may_open+0x1ec/0x231 [ 137.341557] [<c107211c>] open_namei+0xda/0x59b [ 137.341898] [<c106863a>] do_filp_open+0x2c/0x53 [ 137.342236] [<c106896f>] do_sys_open+0x52/0xd8 [ 137.342572] [<c1068a11>] sys_open+0x1c/0x1e [ 137.343109] [<c1002f78>] syscall_call+0x7/0xb [ 137.343444] [<ffffffff>] 0xffffffff [ 137.343792] [ 137.343793] -> #1 (&sysfs_inode_imutex_key){--..}: [ 137.343988] [<c10354b8>] check_prev_add+0x154/0x206 [ 137.344320] [<c10355d4>] check_prevs_add+0x6a/0xd5 [ 137.344655] [<c1036dd2>] __lock_acquire+0x61c/0xa05 [ 137.344986] [<c1037624>] lock_acquire+0x62/0x81 [ 137.345321] [<c1205c09>] __mutex_lock_slowpath+0x75/0x28c [ 137.345658] [<c1205b88>] mutex_lock+0x8/0xa [ 137.345991] [<c10a5447>] sysfs_hash_and_remove+0x43/0x11c [ 137.346328] [<c10a5ef7>] sysfs_remove_file+0xd/0x12 [ 137.346660] [<c114b00c>] device_remove_file+0x32/0x44 [ 137.346992] [<c114b6d9>] device_del+0x174/0x1d2 [ 137.347325] [<c114b742>] device_unregister+0xb/0x15 [ 137.347661] [<c114b93d>] device_destroy+0x8d/0x9a [ 137.347994] [<c1137a38>] vcs_remove_sysfs+0x1c/0x38 [ 137.348328] [<c113e0d8>] con_close+0x5e/0x6b [ 137.348661] [<c1130103>] release_dev+0x4c4/0x6ce [ 137.348999] [<c113077b>] tty_release+0x12/0x1c [ 137.349332] [<c106a471>] __fput+0x145/0x16e [ 137.349671] [<c106a32a>] fput+0x19/0x1b [ 137.350004] [<c1068a92>] 
filp_close+0x3c/0x75 [ 137.350338] [<c1068b34>] sys_close+0x69/0xb4 [ 137.350671] [<c1002f78>] syscall_call+0x7/0xb [ 137.351003] [<ffffffff>] 0xffffffff [ 137.351352] [ 137.351354] -> #0 (tty_mutex){--..}: [ 137.351544] [<c1035398>] check_prev_add+0x34/0x206 [ 137.352086] [<c10355d4>] check_prevs_add+0x6a/0xd5 [ 137.352421] [<c1036dd2>] __lock_acquire+0x61c/0xa05 [ 137.352759] [<c1037624>] lock_acquire+0x62/0x81 [ 137.353090] [<c1205c09>] __mutex_lock_slowpath+0x75/0x28c [ 137.353426] [<c1205b88>] mutex_lock+0x8/0xa [ 137.353761] [<c1098d79>] print_warning+0x8c/0x15d [ 137.354096] [<c1099537>] dquot_alloc_space+0x184/0x189 [ 137.354434] [<c10a98e6>] ext3_new_blocks+0x44b/0x5a2 [ 137.354773] [<c10abe78>] ext3_alloc_blocks+0x40/0xdf [ 137.355109] [<c10abf67>] ext3_alloc_branch+0x50/0x21b [ 137.355446] [<c10ac445>] ext3_get_blocks_handle+0x1b8/0x367 [ 137.355783] [<c10ac66c>] ext3_get_block+0x78/0xe3 [ 137.356121] [<c108a239>] __block_prepare_write+0x168/0x3fd [ 137.356459] [<c108ad10>] block_prepare_write+0x28/0x3b [ 137.356800] [<c10acbcd>] ext3_prepare_write+0x3f/0x18d [ 137.357138] [<c104dbe5>] generic_file_buffered_write+0x197/0x5e6 [ 137.357474] [<c104e2e2>] __generic_file_aio_write_nolock+0x2ae/0x5ad [ 137.357817] [<c104e6ea>] generic_file_aio_write+0x58/0xc4 [ 137.358156] [<c10aa723>] ext3_file_write+0x2d/0xba [ 137.358495] [<c1069441>] do_sync_write+0xc7/0x116 [ 137.358838] [<c1069612>] vfs_write+0x182/0x187 [ 137.359176] [<c10696b8>] sys_write+0x3d/0x64 [ 137.359513] [<c1002f78>] syscall_call+0x7/0xb [ 137.359848] [<ffffffff>] 0xffffffff [ 137.360200] [ 137.360202] other info that might help us debug this: [ 137.360204] [ 137.360335] 3 locks held by tor/4857: [ 137.360382] #0: (&inode->i_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa [ 137.360625] #1: (&ei->truncate_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa [ 137.361072] #2:
(&s->s_dquot.dqptr_sem){----}, at: [<c1099403>] dquot_alloc_space+0x50/0x189 [ 137.361313] [ 137.361314] stack backtrace: [ 137.361401] [<c1003d57>] show_trace_log_lvl+0x1a/0x30 [ 137.361488] [<c1003d7f>] show_trace+0x12/0x14 [ 137.361571] [<c1003e79>] dump_stack+0x16/0x18 [ 137.361663] [<c1034d6d>] print_circular_bug_tail+0x6f/0x71 [ 137.361752] [<c1035398>] check_prev_add+0x34/0x206 [ 137.361838] [<c10355d4>] check_prevs_add+0x6a/0xd5 [ 137.362786] [<c1036dd2>] __lock_acquire+0x61c/0xa05 [ 137.362877] [<c1037624>] lock_acquire+0x62/0x81 [ 137.362960] [<c1205c09>] __mutex_lock_slowpath+0x75/0x28c [ 137.363047] [<c1205b88>] mutex_lock+0x8/0xa [ 137.363132] [<c1098d79>] print_warning+0x8c/0x15d [ 137.363217] [<c1099537>] dquot_alloc_space+0x184/0x189 [ 137.363302] [<c10a98e6>] ext3_new_blocks+0x44b/0x5a2 [ 137.363391] [<c10abe78>] ext3_alloc_blocks+0x40/0xdf [ 137.363476] [<c10abf67>] ext3_alloc_branch+0x50/0x21b [ 137.363561] [<c10ac445>] ext3_get_blocks_handle+0x1b8/0x367 [ 137.363650] [<c10ac66c>] ext3_get_block+0x78/0xe3 [ 137.363735] [<c108a239>] __block_prepare_write+0x168/0x3fd [ 137.363822] [<c108ad10>] block_prepare_write+0x28/0x3b [ 137.363908] [<c10acbcd>] ext3_prepare_write+0x3f/0x18d [ 137.363996] [<c104dbe5>] generic_file_buffered_write+0x197/0x5e6 [ 137.364084] [<c104e2e2>] __generic_file_aio_write_nolock+0x2ae/0x5ad [ 137.364171] [<c104e6ea>] generic_file_aio_write+0x58/0xc4 [ 137.364254] [<c10aa723>] ext3_file_write+0x2d/0xba [ 137.364340] [<c1069441>] do_sync_write+0xc7/0x116 [ 137.364425] [<c1069612>] vfs_write+0x182/0x187 [ 137.364511] [<c10696b8>] sys_write+0x3d/0x64 [ 137.364595] [<c1002f78>] syscall_call+0x7/0xb [ 137.364682] ======================= [ 137.471738] BUG: workqueue leaked lock or atomic: nfsd4/0x00000000/3577 [ 137.471812] last function: laundromat_main+0x0/0x69 [nfsd] [ 137.471923] 2 locks held by nfsd4/3577: [ 137.471971] #0: (client_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa [ 137.472209] #1: (&inode->i_mutex){--..}, at: 
[<c1205b88>] mutex_lock+0x8/0xa [ 137.472445] [<c1003d57>] show_trace_log_lvl+0x1a/0x30 [ 137.472528] [<c1003d7f>] show_trace+0x12/0x14 [ 137.472613] [<c1003e79>] dump_stack+0x16/0x18 [ 137.472695] [<c102c2e8>] run_workqueue+0x167/0x170 [ 137.472779] [<c102c437>] worker_thread+0x146/0x165 [ 137.472861] [<c102f797>] kthread+0x97/0xc4 [ 137.472943] [<c1003bdb>] kernel_thread_helper+0x7/0x10 [ 137.473026] ======================= Folkert van Heusden -- MultiTail is a versatile tool for watching logfiles and output of commands. Filtering, coloring, merging, diff-view, etc. http://www.vanheusden.com/multitail/ ---------------------------------------------------------------------- Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [2.6.20] BUG: workqueue leaked lock 2007-03-13 16:50 [2.6.20] BUG: workqueue leaked lock Folkert van Heusden @ 2007-03-15 19:06 ` Andrew Morton 2007-03-15 19:17 ` Folkert van Heusden 2007-03-16 8:41 ` Peter Zijlstra 0 siblings, 2 replies; 35+ messages in thread From: Andrew Morton @ 2007-03-15 19:06 UTC (permalink / raw) To: Folkert van Heusden; +Cc: linux-kernel, Oleg Nesterov, Neil Brown > On Tue, 13 Mar 2007 17:50:14 +0100 Folkert van Heusden <folkert@vanheusden.com> wrote: > ... > [ 1756.728209] BUG: workqueue leaked lock or atomic: nfsd4/0x00000000/3577 > [ 1756.728271] last function: laundromat_main+0x0/0x69 [nfsd] > [ 1756.728392] 2 locks held by nfsd4/3577: > [ 1756.728435] #0: (client_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa > [ 1756.728679] #1: (&inode->i_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa > [ 1756.728923] [<c1003d57>] show_trace_log_lvl+0x1a/0x30 > [ 1756.729015] [<c1003d7f>] show_trace+0x12/0x14 > [ 1756.729103] [<c1003e79>] dump_stack+0x16/0x18 > [ 1756.729187] [<c102c2e8>] run_workqueue+0x167/0x170 > [ 1756.729276] [<c102c437>] worker_thread+0x146/0x165 > [ 1756.729368] [<c102f797>] kthread+0x97/0xc4 > [ 1756.729456] [<c1003bdb>] kernel_thread_helper+0x7/0x10 > [ 1756.729547] ======================= > [ 1792.436492] svc: unknown version (0 for prog 100003, nfsd) > [ 1846.683648] BUG: workqueue leaked lock or atomic: nfsd4/0x00000000/3577 > [ 1846.683701] last function: laundromat_main+0x0/0x69 [nfsd] > [ 1846.683832] 2 locks held by nfsd4/3577: > [ 1846.683885] #0: (client_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa > [ 1846.683980] #1: (&inode->i_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa > [ 1846.683988] [<c1003d57>] show_trace_log_lvl+0x1a/0x30 > [ 1846.683994] [<c1003d7f>] show_trace+0x12/0x14 > [ 1846.683997] [<c1003e79>] dump_stack+0x16/0x18 > [ 1846.684001] [<c102c2e8>] run_workqueue+0x167/0x170 > [ 1846.684006] [<c102c437>] worker_thread+0x146/0x165 > [ 1846.684012] [<c102f797>] kthread+0x97/0xc4 > [ 
1846.684023] [<c1003bdb>] kernel_thread_helper+0x7/0x10 Oleg, that's a fairly incomprehensible message we have in there. Can you please explain what it means?
* Re: [2.6.20] BUG: workqueue leaked lock 2007-03-15 19:06 ` Andrew Morton @ 2007-03-15 19:17 ` Folkert van Heusden 2007-03-16 14:49 ` Jarek Poplawski 2007-03-20 11:17 ` dquot.c: possible circular locking " Jarek Poplawski 2007-03-16 8:41 ` Peter Zijlstra 1 sibling, 2 replies; 35+ messages in thread From: Folkert van Heusden @ 2007-03-15 19:17 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel, Oleg Nesterov, Neil Brown > > On Tue, 13 Mar 2007 17:50:14 +0100 Folkert van Heusden <folkert@vanheusden.com> wrote: > > ... > > [ 1756.728209] BUG: workqueue leaked lock or atomic: nfsd4/0x00000000/3577 ... > > [ 1846.684023] [<c1003bdb>] kernel_thread_helper+0x7/0x10 > Oleg, that's a fairly incomprehensible message we have in there. Can you > please explain what it means? Haha ok :-) Since I run 2.6.20 with these debugging switches enabled, I occasionally get errors like these. I ALWAYS get the following error when the system first boots, when the TOR executable is started: [ 137.324255] ======================================================= [ 137.324359] [ INFO: possible circular locking dependency detected ] [ 137.324412] 2.6.20 #2 [ 137.324457] ------------------------------------------------------- [ 137.324510] tor/4857 is trying to acquire lock: [ 137.324559] (tty_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa [ 137.324765] [ 137.324766] but task is already holding lock: [ 137.324859] (&s->s_dquot.dqptr_sem){----}, at: [<c1099403>] dquot_alloc_space+0x50/0x189 [ 137.325067] [ 137.325069] which lock already depends on the new lock. 
[ 137.325071] [ 137.325206] [ 137.325208] the existing dependency chain (in reverse order) is: [ 137.325300] [ 137.325301] -> #4 (&s->s_dquot.dqptr_sem){----}: [ 137.325501] [<c10354b8>] check_prev_add+0x154/0x206 [ 137.325852] [<c10355d4>] check_prevs_add+0x6a/0xd5 [ 137.326197] [<c1036dd2>] __lock_acquire+0x61c/0xa05 [ 137.326538] [<c1037624>] lock_acquire+0x62/0x81 [ 137.326887] [<c1032b5f>] down_read+0x2b/0x3d [ 137.327241] [<c1099403>] dquot_alloc_space+0x50/0x189 [ 137.327588] [<c10a98e6>] ext3_new_blocks+0x44b/0x5a2 [ 137.327935] [<c10abe78>] ext3_alloc_blocks+0x40/0xdf [ 137.328280] [<c10abf67>] ext3_alloc_branch+0x50/0x21b [ 137.328622] [<c10ac445>] ext3_get_blocks_handle+0x1b8/0x367 [ 137.328980] [<c10ac76e>] ext3_getblk+0x97/0x228 [ 137.329330] [<c10ac919>] ext3_bread+0x1a/0x78 [ 137.329672] [<c10b1bc2>] ext3_mkdir+0xf4/0x270 [ 137.330022] [<c1072a6a>] vfs_mkdir+0xb3/0x161 [ 137.330368] [<c1072ba4>] sys_mkdirat+0x8c/0xc4 [ 137.330714] [<c1072bfc>] sys_mkdir+0x20/0x22 [ 137.331063] [<c1002f78>] syscall_call+0x7/0xb [ 137.331406] [<ffffffff>] 0xffffffff [ 137.331771] [ 137.331772] -> #3 (&ei->truncate_mutex){--..}: [ 137.331979] [<c10354b8>] check_prev_add+0x154/0x206 [ 137.332332] [<c10355d4>] check_prevs_add+0x6a/0xd5 [ 137.332676] [<c1036dd2>] __lock_acquire+0x61c/0xa05 [ 137.333025] [<c1037624>] lock_acquire+0x62/0x81 [ 137.333370] [<c1205c09>] __mutex_lock_slowpath+0x75/0x28c [ 137.333930] [<c1205b88>] mutex_lock+0x8/0xa [ 137.334271] [<c10ae14f>] ext3_truncate+0x170/0x468 [ 137.334613] [<c1058e72>] vmtruncate+0xa6/0x116 [ 137.334949] [<c107d587>] inode_setattr+0x145/0x16c [ 137.335286] [<c10af071>] ext3_setattr+0x150/0x22f [ 137.335627] [<c107d909>] notify_change+0x35b/0x392 [ 137.335968] [<c10679d2>] do_truncate+0x52/0x75 [ 137.336305] [<c1071f5e>] may_open+0x1ec/0x231 [ 137.336642] [<c107211c>] open_namei+0xda/0x59b [ 137.336975] [<c106863a>] do_filp_open+0x2c/0x53 [ 137.337310] [<c106896f>] do_sys_open+0x52/0xd8 [ 137.337645] [<c1068a11>] 
sys_open+0x1c/0x1e [ 137.337980] [<c1002f78>] syscall_call+0x7/0xb [ 137.338315] [<ffffffff>] 0xffffffff [ 137.338665] [ 137.338666] -> #2 (&inode->i_alloc_sem){--..}: [ 137.338864] [<c10354b8>] check_prev_add+0x154/0x206 [ 137.339200] [<c10355d4>] check_prevs_add+0x6a/0xd5 [ 137.339535] [<c1036dd2>] __lock_acquire+0x61c/0xa05 [ 137.339200] [<c10355d4>] check_prevs_add+0x6a/0xd5 [ 137.339535] [<c1036dd2>] __lock_acquire+0x61c/0xa05 [ 137.339874] [<c1037624>] lock_acquire+0x62/0x81 [ 137.340207] [<c1032bf5>] down_write+0x2b/0x45 [ 137.340545] [<c107d890>] notify_change+0x2e2/0x392 [ 137.340886] [<c10679d2>] do_truncate+0x52/0x75 [ 137.341222] [<c1071f5e>] may_open+0x1ec/0x231 [ 137.341557] [<c107211c>] open_namei+0xda/0x59b [ 137.341898] [<c1068a11>] sys_open+0x1c/0x1e [ 137.343109] [<c1002f78>] syscall_call+0x7/0xb [ 137.343444] [<ffffffff>] 0xffffffff [ 137.343792] [ 137.343793] -> #1 (&sysfs_inode_imutex_key){--..}: [ 137.343988] [<c10354b8>] check_prev_add+0x154/0x206 [ 137.344320] [<c10355d4>] check_prevs_add+0x6a/0xd5 [ 137.344655] [<c1036dd2>] __lock_acquire+0x61c/0xa05 [ 137.344986] [<c1037624>] lock_acquire+0x62/0x81 [ 137.345321] [<c1205c09>] __mutex_lock_slowpath+0x75/0x28c [ 137.345658] [<c1205b88>] mutex_lock+0x8/0xa [ 137.345991] [<c10a5447>] sysfs_hash_and_remove+0x43/0x11c [ 137.346328] [<c10a5ef7>] sysfs_remove_file+0xd/0x12 [ 137.346660] [<c114b00c>] device_remove_file+0x32/0x44 [ 137.346992] [<c114b6d9>] device_del+0x174/0x1d2 [ 137.347325] [<c114b742>] device_unregister+0xb/0x15 [ 137.347661] [<c114b93d>] device_destroy+0x8d/0x9a [ 137.347994] [<c1137a38>] vcs_remove_sysfs+0x1c/0x38 [ 137.348328] [<c113e0d8>] con_close+0x5e/0x6b [ 137.348661] [<c1130103>] release_dev+0x4c4/0x6ce [ 137.348999] [<c113077b>] tty_release+0x12/0x1c [ 137.349332] [<c10aa723>] ext3_file_write+0x2d/0xba [ 137.358495] [<c1069441>] do_sync_write+0xc7/0x116 [ 137.358838] [<c1069612>] vfs_write+0x182/0x187 [ 137.359176] [<c10696b8>] sys_write+0x3d/0x64 [ 137.359513] 
[<c1002f78>] syscall_call+0x7/0xb [ 137.359848] [<ffffffff>] 0xffffffff [ 137.360200] [ 137.360202] other info that might help us debug this: [ 137.360204] [ 137.360335] 3 locks held by tor/4857: [ 137.360382] #0: (&inode->i_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa [ 137.360625] #1: (&ei->truncate_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa [ 137.361072] #2: (&s->s_dquot.dqptr_sem){----}, at: [<c1099403>] dquot_alloc_space+0x50/0x189 [ 137.361313] [ 137.361314] stack backtrace: [ 137.361401] [<c1003d57>] show_trace_log_lvl+0x1a/0x30 [ 137.361488] [<c1003d7f>] show_trace+0x12/0x14 [ 137.361571] [<c1003e79>] dump_stack+0x16/0x18 [ 137.361663] [<c1034d6d>] print_circular_bug_tail+0x6f/0x71 [ 137.361752] [<c1035398>] check_prev_add+0x34/0x206 [ 137.361838] [<c10355d4>] check_prevs_add+0x6a/0xd5 [ 137.362786] [<c1036dd2>] __lock_acquire+0x61c/0xa05 [ 137.362877] [<c1037624>] lock_acquire+0x62/0x81 [ 137.362960] [<c1205c09>] __mutex_lock_slowpath+0x75/0x28c [ 137.363047] [<c1205b88>] mutex_lock+0x8/0xa [ 137.363132] [<c1098d79>] print_warning+0x8c/0x15d [ 137.363217] [<c1099537>] dquot_alloc_space+0x184/0x189 [ 137.363302] [<c10a98e6>] ext3_new_blocks+0x44b/0x5a2 [ 137.363391] [<c10abe78>] ext3_alloc_blocks+0x40/0xdf [ 137.363476] [<c10abf67>] ext3_alloc_branch+0x50/0x21b [ 137.363561] [<c10ac445>] ext3_get_blocks_handle+0x1b8/0x367 [ 137.363650] [<c10ac66c>] ext3_get_block+0x78/0xe3 [ 137.363735] [<c108a239>] __block_prepare_write+0x168/0x3fd [ 137.363822] [<c108ad10>] block_prepare_write+0x28/0x3b [ 137.363908] [<c10acbcd>] ext3_prepare_write+0x3f/0x18d [ 137.363996] [<c104dbe5>] generic_file_buffered_write+0x197/0x5e6 [ 137.364084] [<c104e2e2>] __generic_file_aio_write_nolock+0x2ae/0x5ad [ 137.364171] [<c104e6ea>] generic_file_aio_write+0x58/0xc4 [ 137.364254] [<c10aa723>] ext3_file_write+0x2d/0xba [ 137.364340] [<c1069441>] do_sync_write+0xc7/0x116 [ 137.364425] [<c1069612>] vfs_write+0x182/0x187 [ 137.364511] [<c10696b8>] sys_write+0x3d/0x64 [ 
137.364595] [<c1002f78>] syscall_call+0x7/0xb [ 137.364682] ======================= Now apart from this reproducible problem, I suddenly also get these errors in dmesg: [ 137.471738] BUG: workqueue leaked lock or atomic: nfsd4/0x00000000/3577 [ 137.471812] last function: laundromat_main+0x0/0x69 [nfsd] [ 137.471923] 2 locks held by nfsd4/3577: [ 137.471971] #0: (client_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa [ 137.472209] #1: (&inode->i_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa [ 137.472445] [<c1003d57>] show_trace_log_lvl+0x1a/0x30 [ 137.472528] [<c1003d7f>] show_trace+0x12/0x14 [ 137.472613] [<c1003e79>] dump_stack+0x16/0x18 [ 137.472695] [<c102c2e8>] run_workqueue+0x167/0x170 [ 137.472779] [<c102c437>] worker_thread+0x146/0x165 [ 137.472861] [<c102f797>] kthread+0x97/0xc4 [ 137.472943] [<c1003bdb>] kernel_thread_helper+0x7/0x10 These happen around twice per minute, and then once every 2 minutes. The last one (while writing this message) is: [182932.590394] BUG: workqueue leaked lock or atomic: nfsd4/0x00000000/3577 [182932.590450] last function: laundromat_main+0x0/0x69 [nfsd] [182932.590557] 2 locks held by nfsd4/3577: [182932.590600] #0: (client_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa [182932.590830] #1: (&inode->i_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa [182932.591061] [<c1003d57>] show_trace_log_lvl+0x1a/0x30 [182932.591145] [<c1003d7f>] show_trace+0x12/0x14 [182932.591234] [<c1003e79>] dump_stack+0x16/0x18 [182932.591316] [<c102c2e8>] run_workqueue+0x167/0x170 [182932.591399] [<c102c437>] worker_thread+0x146/0x165 [182932.591482] [<c102f797>] kthread+0x97/0xc4 [182932.591564] [<c1003bdb>] kernel_thread_helper+0x7/0x10 Config can be found here: http://keetweej.vanheusden.com/~folkert/config-2.6.20.txt System is a P4 @ 3.2GHz with 2 GB of RAM and IDE hard disks, with 5 NFS mounts from a server connected via a 1 Gbit LAN. The other end (the server) runs 2.6.19.1. 
On the problem-system this is the output of nfsstat: Server rpc stats: calls badcalls badauth badclnt xdrcall 1833 611 611 0 0 Server nfs v2: null getattr setattr root lookup readlink 611 100% 0 0% 0 0% 0 0% 0 0% 0 0% read wrcache write create remove rename 0 0% 0 0% 0 0% 0 0% 0 0% 0 0% link symlink mkdir rmdir readdir fsstat 0 0% 0 0% 0 0% 0 0% 0 0% 0 0% Server nfs v3: null getattr setattr lookup access readlink 611 100% 0 0% 0 0% 0 0% 0 0% 0 0% read write create mkdir symlink mknod 0 0% 0 0% 0 0% 0 0% 0 0% 0 0% remove rmdir rename link readdir readdirplus 0 0% 0 0% 0 0% 0 0% 0 0% 0 0% fsstat fsinfo pathconf commit 0 0% 0 0% 0 0% 0 0% Server nfs v4: null compound 611 100% 0 0% Server nfs v4 operations: op0-unused op1-unused op2-future access close commit 0 0% 0 0% 0 0% 0 0% 0 0% 0 0% create delegpurge delegreturn getattr getfh link 0 0% 0 0% 0 0% 0 0% 0 0% 0 0% lock lockt locku lookup lookup_root nverify 0 0% 0 0% 0 0% 0 0% 0 0% 0 0% open openattr open_conf open_dgrd putfh putpubfh 0 0% 0 0% 0 0% 0 0% 0 0% 0 0% putrootfh read readdir readlink remove rename 0 0% 0 0% 0 0% 0 0% 0 0% 0 0% renew restorefh savefh secinfo setattr setcltid 0 0% 0 0% 0 0% 0 0% 0 0% 0 0% setcltidconf verify write rellockowner 0 0% 0 0% 0 0% 0 0% Client rpc stats: calls retrans authrefrsh 1404874 5502 0 Client nfs v2: null getattr setattr root lookup readlink 0 0% 157240 11% 11360 0% 0 0% 707396 50% 20434 1% read wrcache write create remove rename 18 0% 0 0% 471048 33% 2900 0% 323 0% 2888 0% link symlink mkdir rmdir readdir fsstat 0 0% 2 0% 60 0% 32 0% 27463 1% 3709 0% Client nfs v4: null read write commit open open_conf 0 0% 0 0% 0 0% 0 0% 0 0% 0 0% open_noat open_dgrd close setattr fsinfo renew 0 0% 0 0% 0 0% 0 0% 0 0% 0 0% setclntid confirm lock lockt locku access 0 0% 0 0% 0 0% 0 0% 0 0% 0 0% getattr lookup lookup_root remove rename link 0 0% 0 0% 0 0% 0 0% 0 0% 0 0% symlink create pathconf statfs readlink readdir 0 0% 0 0% 0 0% 0 0% 0 0% 0 0% server_caps delegreturn 0 0% 0 0% Folkert 
van Heusden -- www.biglumber.com <- site where one can exchange PGP key signatures ---------------------------------------------------------------------- Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com
* Re: [2.6.20] BUG: workqueue leaked lock 2007-03-15 19:17 ` Folkert van Heusden @ 2007-03-16 14:49 ` Jarek Poplawski 2007-03-20 11:17 ` dquot.c: possible circular locking " Jarek Poplawski 1 sibling, 0 replies; 35+ messages in thread From: Jarek Poplawski @ 2007-03-16 14:49 UTC (permalink / raw) To: Folkert van Heusden Cc: Andrew Morton, linux-kernel, Oleg Nesterov, Neil Brown On 15-03-2007 20:17, Folkert van Heusden wrote: >>> On Tue, 13 Mar 2007 17:50:14 +0100 Folkert van Heusden <folkert@vanheusden.com> wrote: >>> ... >>> [ 1756.728209] BUG: workqueue leaked lock or atomic: nfsd4/0x00000000/3577 > ... >>> [ 1846.684023] [<c1003bdb>] kernel_thread_helper+0x7/0x10 >> Oleg, that's a fairly incomprehensible message we have in there. Can you >> please explain what it means? > > Haha ok :-) > > Since I run 2.6.20 with these debugging switches enabled, I > occasionally get errors like these. I ALWAYS get the following error > when the system first boots, when the TOR executable is started: > > [ 137.324255] ======================================================= > [ 137.324359] [ INFO: possible circular locking dependency detected ] > [ 137.324412] 2.6.20 #2 > [ 137.324457] ------------------------------------------------------- Hi, IMHO you should now switch off lockdep ("Lock debugging: ..." options) until you update or get some patch for this circular problem. Your logs should be sufficient for the nfsd people, and lockdep turns off checking after this anyway. I don't know if this is a known and real locking problem, but it would be interesting now if the workqueue error were to reappear. It seems the nfsd code is bailing out somewhere with locks held, maybe because of problems with lockdep's error handling. Regards, Jarek P.
* dquot.c: possible circular locking Re: [2.6.20] BUG: workqueue leaked lock 2007-03-15 19:17 ` Folkert van Heusden 2007-03-16 14:49 ` Jarek Poplawski @ 2007-03-20 11:17 ` Jarek Poplawski 2007-03-20 11:22 ` Jarek Poplawski 1 sibling, 1 reply; 35+ messages in thread From: Jarek Poplawski @ 2007-03-20 11:17 UTC (permalink / raw) To: Folkert van Heusden Cc: Andrew Morton, linux-kernel, Oleg Nesterov, Neil Brown, Christoph Hellwig, Jan Kara On 15-03-2007 20:17, Folkert van Heusden wrote: >>> On Tue, 13 Mar 2007 17:50:14 +0100 Folkert van Heusden <folkert@vanheusden.com> wrote: ... > Haha ok :-) > > Since I run 2.6.20 with these debugging switches enabled, I > occasionally get errors like these. I ALWAYS get the following error > when the system first boots, when the TOR executable is started: > > [ 137.324255] ======================================================= > [ 137.324359] [ INFO: possible circular locking dependency detected ] > [ 137.324412] 2.6.20 #2 > [ 137.324457] ------------------------------------------------------- > [ 137.324510] tor/4857 is trying to acquire lock: > [ 137.324559] (tty_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa > [ 137.324765] > [ 137.324766] but task is already holding lock: > [ 137.324859] (&s->s_dquot.dqptr_sem){----}, at: [<c1099403>] dquot_alloc_space+0x50/0x189 > [ 137.325067] > [ 137.325069] which lock already depends on the new lock. IMHO lockdep found that two locks are taken in different orders: -> #1: 1) tty_mutex in con_close() 2) dqptr_sem (somewhere later) -> #0: 1) dqptr_sem 2) tty_mutex in dquot_alloc_space() with print_warning() Probably print_warning() and flush_warnings() should be reworked to do the printing without dqptr_sem held, or the locking order should be changed. I hope somebody from this CC can work it out better. Regards, Jarek P. 
> [ 137.325071] > [ 137.325206] > [ 137.325208] the existing dependency chain (in reverse order) is: > [ 137.325300] > [ 137.325301] -> #4 (&s->s_dquot.dqptr_sem){----}: > [ 137.325501] [<c10354b8>] check_prev_add+0x154/0x206 > [ 137.325852] [<c10355d4>] check_prevs_add+0x6a/0xd5 > [ 137.326197] [<c1036dd2>] __lock_acquire+0x61c/0xa05 > [ 137.326538] [<c1037624>] lock_acquire+0x62/0x81 > [ 137.326887] [<c1032b5f>] down_read+0x2b/0x3d > [ 137.327241] [<c1099403>] dquot_alloc_space+0x50/0x189 > [ 137.327588] [<c10a98e6>] ext3_new_blocks+0x44b/0x5a2 > [ 137.327935] [<c10abe78>] ext3_alloc_blocks+0x40/0xdf > [ 137.328280] [<c10abf67>] ext3_alloc_branch+0x50/0x21b > [ 137.328622] [<c10ac445>] ext3_get_blocks_handle+0x1b8/0x367 > [ 137.328980] [<c10ac76e>] ext3_getblk+0x97/0x228 > [ 137.329330] [<c10ac919>] ext3_bread+0x1a/0x78 > [ 137.329672] [<c10b1bc2>] ext3_mkdir+0xf4/0x270 > [ 137.330022] [<c1072a6a>] vfs_mkdir+0xb3/0x161 > [ 137.330368] [<c1072ba4>] sys_mkdirat+0x8c/0xc4 > [ 137.330714] [<c1072bfc>] sys_mkdir+0x20/0x22 > [ 137.331063] [<c1002f78>] syscall_call+0x7/0xb > [ 137.331406] [<ffffffff>] 0xffffffff > [ 137.331771] > [ 137.331772] -> #3 (&ei->truncate_mutex){--..}: > [ 137.331979] [<c10354b8>] check_prev_add+0x154/0x206 > [ 137.332332] [<c10355d4>] check_prevs_add+0x6a/0xd5 > [ 137.332676] [<c1036dd2>] __lock_acquire+0x61c/0xa05 > [ 137.333025] [<c1037624>] lock_acquire+0x62/0x81 > [ 137.333370] [<c1205c09>] __mutex_lock_slowpath+0x75/0x28c > [ 137.333930] [<c1205b88>] mutex_lock+0x8/0xa > [ 137.334271] [<c10ae14f>] ext3_truncate+0x170/0x468 > [ 137.334613] [<c1058e72>] vmtruncate+0xa6/0x116 > [ 137.334949] [<c107d587>] inode_setattr+0x145/0x16c > [ 137.335286] [<c10af071>] ext3_setattr+0x150/0x22f > [ 137.335627] [<c107d909>] notify_change+0x35b/0x392 > [ 137.335968] [<c10679d2>] do_truncate+0x52/0x75 > [ 137.336305] [<c1071f5e>] may_open+0x1ec/0x231 > [ 137.336642] [<c107211c>] open_namei+0xda/0x59b > [ 137.336975] [<c106863a>] do_filp_open+0x2c/0x53 > [ 
137.337310] [<c106896f>] do_sys_open+0x52/0xd8 > [ 137.337645] [<c1068a11>] sys_open+0x1c/0x1e > [ 137.337980] [<c1002f78>] syscall_call+0x7/0xb > [ 137.338315] [<ffffffff>] 0xffffffff > [ 137.338665] > [ 137.338666] -> #2 (&inode->i_alloc_sem){--..}: > [ 137.338864] [<c10354b8>] check_prev_add+0x154/0x206 > [ 137.339200] [<c10355d4>] check_prevs_add+0x6a/0xd5 > [ 137.339535] [<c1036dd2>] __lock_acquire+0x61c/0xa05 > [ 137.339200] [<c10355d4>] check_prevs_add+0x6a/0xd5 > [ 137.339535] [<c1036dd2>] __lock_acquire+0x61c/0xa05 > [ 137.339874] [<c1037624>] lock_acquire+0x62/0x81 > [ 137.340207] [<c1032bf5>] down_write+0x2b/0x45 > [ 137.340545] [<c107d890>] notify_change+0x2e2/0x392 > [ 137.340886] [<c10679d2>] do_truncate+0x52/0x75 > [ 137.341222] [<c1071f5e>] may_open+0x1ec/0x231 > [ 137.341557] [<c107211c>] open_namei+0xda/0x59b > [ 137.341898] [<c1068a11>] sys_open+0x1c/0x1e > [ 137.343109] [<c1002f78>] syscall_call+0x7/0xb > [ 137.343444] [<ffffffff>] 0xffffffff > [ 137.343792] > [ 137.343793] -> #1 (&sysfs_inode_imutex_key){--..}: > [ 137.343988] [<c10354b8>] check_prev_add+0x154/0x206 > [ 137.344320] [<c10355d4>] check_prevs_add+0x6a/0xd5 > [ 137.344655] [<c1036dd2>] __lock_acquire+0x61c/0xa05 > [ 137.344986] [<c1037624>] lock_acquire+0x62/0x81 > [ 137.345321] [<c1205c09>] __mutex_lock_slowpath+0x75/0x28c > [ 137.345658] [<c1205b88>] mutex_lock+0x8/0xa > [ 137.345991] [<c10a5447>] sysfs_hash_and_remove+0x43/0x11c > [ 137.346328] [<c10a5ef7>] sysfs_remove_file+0xd/0x12 > [ 137.346660] [<c114b00c>] device_remove_file+0x32/0x44 > [ 137.346992] [<c114b6d9>] device_del+0x174/0x1d2 > [ 137.347325] [<c114b742>] device_unregister+0xb/0x15 > [ 137.347661] [<c114b93d>] device_destroy+0x8d/0x9a > [ 137.347994] [<c1137a38>] vcs_remove_sysfs+0x1c/0x38 > [ 137.348328] [<c113e0d8>] con_close+0x5e/0x6b > [ 137.348661] [<c1130103>] release_dev+0x4c4/0x6ce > [ 137.348999] [<c113077b>] tty_release+0x12/0x1c > [ 137.349332] [<c10aa723>] ext3_file_write+0x2d/0xba > [ 137.358495] 
[<c1069441>] do_sync_write+0xc7/0x116 > [ 137.358838] [<c1069612>] vfs_write+0x182/0x187 > [ 137.359176] [<c10696b8>] sys_write+0x3d/0x64 > [ 137.359513] [<c1002f78>] syscall_call+0x7/0xb > [ 137.359848] [<ffffffff>] 0xffffffff > [ 137.360200] > [ 137.360202] other info that might help us debug this: > [ 137.360204] > [ 137.360335] 3 locks held by tor/4857: > [ 137.360382] #0: (&inode->i_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa > [ 137.360625] #1: (&ei->truncate_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa > [ 137.361072] #2: (&s->s_dquot.dqptr_sem){----}, at: [<c1099403>] dquot_alloc_space+0x50/0x189 > [ 137.361313] > [ 137.361314] stack backtrace: > [ 137.361401] [<c1003d57>] show_trace_log_lvl+0x1a/0x30 > [ 137.361488] [<c1003d7f>] show_trace+0x12/0x14 > [ 137.361571] [<c1003e79>] dump_stack+0x16/0x18 > [ 137.361663] [<c1034d6d>] print_circular_bug_tail+0x6f/0x71 > [ 137.361752] [<c1035398>] check_prev_add+0x34/0x206 > [ 137.361838] [<c10355d4>] check_prevs_add+0x6a/0xd5 > [ 137.362786] [<c1036dd2>] __lock_acquire+0x61c/0xa05 > [ 137.362877] [<c1037624>] lock_acquire+0x62/0x81 > [ 137.362960] [<c1205c09>] __mutex_lock_slowpath+0x75/0x28c > [ 137.363047] [<c1205b88>] mutex_lock+0x8/0xa > [ 137.363132] [<c1098d79>] print_warning+0x8c/0x15d > [ 137.363217] [<c1099537>] dquot_alloc_space+0x184/0x189 > [ 137.363302] [<c10a98e6>] ext3_new_blocks+0x44b/0x5a2 > [ 137.363391] [<c10abe78>] ext3_alloc_blocks+0x40/0xdf > [ 137.363476] [<c10abf67>] ext3_alloc_branch+0x50/0x21b > [ 137.363561] [<c10ac445>] ext3_get_blocks_handle+0x1b8/0x367 > [ 137.363650] [<c10ac66c>] ext3_get_block+0x78/0xe3 > [ 137.363735] [<c108a239>] __block_prepare_write+0x168/0x3fd > [ 137.363822] [<c108ad10>] block_prepare_write+0x28/0x3b > [ 137.363908] [<c10acbcd>] ext3_prepare_write+0x3f/0x18d > [ 137.363996] [<c104dbe5>] generic_file_buffered_write+0x197/0x5e6 > [ 137.364084] [<c104e2e2>] __generic_file_aio_write_nolock+0x2ae/0x5ad > [ 137.364171] [<c104e6ea>] 
generic_file_aio_write+0x58/0xc4 > [ 137.364254] [<c10aa723>] ext3_file_write+0x2d/0xba > [ 137.364340] [<c1069441>] do_sync_write+0xc7/0x116 > [ 137.364425] [<c1069612>] vfs_write+0x182/0x187 > [ 137.364511] [<c10696b8>] sys_write+0x3d/0x64 > [ 137.364595] [<c1002f78>] syscall_call+0x7/0xb > [ 137.364682] ======================= > ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: dquot.c: possible circular locking Re: [2.6.20] BUG: workqueue leaked lock 2007-03-20 11:17 ` dquot.c: possible circular locking " Jarek Poplawski @ 2007-03-20 11:22 ` Jarek Poplawski 2007-03-20 11:31 ` Jarek Poplawski 0 siblings, 1 reply; 35+ messages in thread From: Jarek Poplawski @ 2007-03-20 11:22 UTC (permalink / raw) To: Folkert van Heusden Cc: Andrew Morton, linux-kernel, Oleg Nesterov, Neil Brown, Christoph Hellwig, Jan Kara On Tue, Mar 20, 2007 at 12:17:01PM +0100, Jarek Poplawski wrote: ... > IMHO lockdep found that two locks are taken in different order: > > -> #1: 1) tty_mutex in con_console() 2) dqptr_sem (somewhere later) > -> #0: 1) dqptr_sem 2) tty_console in dquot_alloc_space() with print_warning() Should be: -> #0: 1) dqptr_sem 2) tty_mutex in dquot_alloc_space() with print_warning() Jarek P. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: dquot.c: possible circular locking Re: [2.6.20] BUG: workqueue leaked lock 2007-03-20 11:22 ` Jarek Poplawski @ 2007-03-20 11:31 ` Jarek Poplawski 2007-03-20 12:19 ` Jan Kara 0 siblings, 1 reply; 35+ messages in thread From: Jarek Poplawski @ 2007-03-20 11:31 UTC (permalink / raw) To: Folkert van Heusden Cc: Andrew Morton, linux-kernel, Oleg Nesterov, Neil Brown, Christoph Hellwig, Jan Kara On Tue, Mar 20, 2007 at 12:22:53PM +0100, Jarek Poplawski wrote: > On Tue, Mar 20, 2007 at 12:17:01PM +0100, Jarek Poplawski wrote: > ... > > IMHO lockdep found that two locks are taken in different order: > > > > -> #1: 1) tty_mutex in con_console() 2) dqptr_sem (somewhere later) > > -> #0: 1) dqptr_sem 2) tty_console in dquot_alloc_space() with print_warning() Once more - should be: -> #1: 1) tty_mutex in con_close() 2) dqptr_sem (somewhere later) -> #0: 1) dqptr_sem 2) tty_mutex in dquot_alloc_space() with print_warning() Very sorry!!! Jarek P. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: dquot.c: possible circular locking Re: [2.6.20] BUG: workqueue leaked lock 2007-03-20 11:31 ` Jarek Poplawski @ 2007-03-20 12:19 ` Jan Kara 2007-03-20 13:35 ` Arjan van de Ven 2007-03-20 13:44 ` Jarek Poplawski 0 siblings, 2 replies; 35+ messages in thread From: Jan Kara @ 2007-03-20 12:19 UTC (permalink / raw) To: Jarek Poplawski Cc: Folkert van Heusden, Andrew Morton, linux-kernel, Oleg Nesterov, Neil Brown, Christoph Hellwig On Tue 20-03-07 12:31:51, Jarek Poplawski wrote: > On Tue, Mar 20, 2007 at 12:22:53PM +0100, Jarek Poplawski wrote: > > On Tue, Mar 20, 2007 at 12:17:01PM +0100, Jarek Poplawski wrote: > > ... > > > IMHO lockdep found that two locks are taken in different order: > > > > > > -> #1: 1) tty_mutex in con_console() 2) dqptr_sem (somewhere later) > > > -> #0: 1) dqptr_sem 2) tty_console in dquot_alloc_space() with print_warning() > > Once more - should be: > -> #1: 1) tty_mutex in con_close() 2) dqptr_sem (somewhere later) > -> #0: 1) dqptr_sem 2) tty_mutex in dquot_alloc_space() with print_warning() Yes, I was looking at it. Hmm, we can possibly get rid of tty_mutex being acquired under dqptr_sem in quota code. But looking at the path from con_close() there's another inversion with i_mutex which is also acquired along the path for sysfs. And we can hardly get rid of it in the quota code. Now none of these is a real deadlock as quota should never call print_warning() for sysfs (it doesn't use quota) but still it's nasty. I suppose tty_mutex is above i_mutex because of those sysfs calls and it seems sysfs must be called under tty_mutex because of races with init_dev(). So it's not easy to get rid of that dependency either. Honza -- Jan Kara <jack@suse.cz> SuSE CR Labs ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: dquot.c: possible circular locking Re: [2.6.20] BUG: workqueue leaked lock 2007-03-20 12:19 ` Jan Kara @ 2007-03-20 13:35 ` Arjan van de Ven 2007-03-20 14:21 ` Jan Kara 2007-03-20 13:44 ` Jarek Poplawski 1 sibling, 1 reply; 35+ messages in thread From: Arjan van de Ven @ 2007-03-20 13:35 UTC (permalink / raw) To: Jan Kara Cc: Jarek Poplawski, Folkert van Heusden, Andrew Morton, linux-kernel, Oleg Nesterov, Neil Brown, Christoph Hellwig > Yes, I was looking at it. Hmm, we can possibly get rid of tty_mutex being > acquired under dqptr_sem in quota code. But looking at the path from > con_close() there's another inversion with i_mutex which is also acquired > along the path for sysfs. And we can hardly get rid of it in the quota code. > Now none of these is a real deadlock as quota should never call > print_warning() for sysfs (it doesn't use quota) but still it's nasty. I > suppose tty_mutex is above i_mutex because of those sysfs calls and it > seems sysfs must be called under tty_mutex because of races with > init_dev(). So it's not easy to get rid of that dependency either. maybe a far more serious option: Why on EARTH is the quota code going to TTY's directly? That's just WRONG. Maybe it wasn't 10 years ago, but nowadays most people use graphical user interfaces and the like... can we please just get rid of this instead? ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: dquot.c: possible circular locking Re: [2.6.20] BUG: workqueue leaked lock 2007-03-20 13:35 ` Arjan van de Ven @ 2007-03-20 14:21 ` Jan Kara 2007-03-20 14:18 ` Arjan van de Ven 0 siblings, 1 reply; 35+ messages in thread From: Jan Kara @ 2007-03-20 14:21 UTC (permalink / raw) To: Arjan van de Ven Cc: Jarek Poplawski, Folkert van Heusden, Andrew Morton, linux-kernel, Oleg Nesterov, Neil Brown, Christoph Hellwig On Tue 20-03-07 14:35:10, Arjan van de Ven wrote: > > > > Yes, I was looking at it. Hmm, we can possibly get rid of tty_mutex being > > acquired under dqptr_sem in quota code. But looking at the path from > > con_close() there's another inversion with i_mutex which is also acquired > > along the path for sysfs. And we can hardly get rid of it in the quota code. > > Now none of these is a real deadlock as quota should never call > > print_warning() for sysfs (it doesn't use quota) but still it's nasty. I > > suppose tty_mutex is above i_mutex because of those sysfs calls and it > > seems sysfs must be called under tty_mutex because of races with > > init_dev(). So it's not easy to get rid of that dependency either. > > maybe a far more serious option: Why on EARTH is the quota code going to > TTY's directly? That's just WRONG. Maybe it wasn't 10 years ago, but > nowadays most people use graphical user interfaces and the like... We've been discussing this sometimes back in August ;) (http://lkml.org/lkml/2006/8/8/237) and we've decided to leave the code in. The only reason why I think it should stay in is the existence of quota softlimits. There it's nice to warn the user and there's no other way to propagate this information into userspace (as the write succeeds). One solution would be to leave the warning to some userspace process (like warnquota) run from cron but still I'm not sure we should change the behavior. Honza -- Jan Kara <jack@suse.cz> SuSE CR Labs ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: dquot.c: possible circular locking Re: [2.6.20] BUG: workqueue leaked lock 2007-03-20 14:21 ` Jan Kara @ 2007-03-20 14:18 ` Arjan van de Ven 0 siblings, 0 replies; 35+ messages in thread From: Arjan van de Ven @ 2007-03-20 14:18 UTC (permalink / raw) To: Jan Kara Cc: Jarek Poplawski, Folkert van Heusden, Andrew Morton, linux-kernel, Oleg Nesterov, Neil Brown, Christoph Hellwig On Tue, 2007-03-20 at 15:21 +0100, Jan Kara wrote: > On Tue 20-03-07 14:35:10, Arjan van de Ven wrote: > > > > > > > Yes, I was looking at it. Hmm, we can possibly get rid of tty_mutex being > > > acquired under dqptr_sem in quota code. But looking at the path from > > > con_close() there's another inversion with i_mutex which is also acquired > > > along the path for sysfs. And we can hardly get rid of it in the quota code. > > > Now none of these is a real deadlock as quota should never call > > > print_warning() for sysfs (it doesn't use quota) but still it's nasty. I > > > suppose tty_mutex is above i_mutex because of those sysfs calls and it > > > seems sysfs must be called under tty_mutex because of races with > > > init_dev(). So it's not easy to get rid of that dependency either. > > > > maybe a far more serious option: Why on EARTH is the quota code going to > > TTY's directly? That's just WRONG. Maybe it wasn't 10 years ago, but > > nowadays most people use graphical user interfaces and the like... > We've been discussing this sometimes back in August ;) > (http://lkml.org/lkml/2006/8/8/237) and we've decided to leave the code in. > The only reason why I think it should stay in is the existence of quota > softlimits. There it's nice to warn the user and there's no other way to > propagate this information into userspace (as the write succeeds). > One solution would be to leave the warning to some userspace process > (like warnquota) run from cron but still I'm not sure we should change the > behavior. 
or send a uevent or something -- if you want to mail me at work (you don't), use arjan (at) linux.intel.com Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: dquot.c: possible circular locking Re: [2.6.20] BUG: workqueue leaked lock 2007-03-20 12:19 ` Jan Kara 2007-03-20 13:35 ` Arjan van de Ven @ 2007-03-20 13:44 ` Jarek Poplawski 2007-03-20 14:00 ` Jan Kara 1 sibling, 1 reply; 35+ messages in thread From: Jarek Poplawski @ 2007-03-20 13:44 UTC (permalink / raw) To: Jan Kara Cc: Folkert van Heusden, Andrew Morton, linux-kernel, Oleg Nesterov, Neil Brown, Christoph Hellwig On Tue, Mar 20, 2007 at 01:19:09PM +0100, Jan Kara wrote: > On Tue 20-03-07 12:31:51, Jarek Poplawski wrote: > > On Tue, Mar 20, 2007 at 12:22:53PM +0100, Jarek Poplawski wrote: > > > On Tue, Mar 20, 2007 at 12:17:01PM +0100, Jarek Poplawski wrote: > > > ... > > > > IMHO lockdep found that two locks are taken in different order: > > > > > > > > -> #1: 1) tty_mutex in con_console() 2) dqptr_sem (somewhere later) > > > > -> #0: 1) dqptr_sem 2) tty_console in dquot_alloc_space() with print_warning() > > > > Once more - should be: > > -> #1: 1) tty_mutex in con_close() 2) dqptr_sem (somewhere later) > > -> #0: 1) dqptr_sem 2) tty_mutex in dquot_alloc_space() with print_warning() > Yes, I was looking at it. Hmm, we can possibly get rid of tty_mutex being > acquired under dqptr_sem in quota code. But looking at the path from > con_close() there's another inversion with i_mutex which is also acquired > along the path for sysfs. And we can hardly get rid of it in the quota code. > Now none of these is a real deadlock as quota should never call > print_warning() for sysfs (it doesn't use quota) but still it's nasty. I > suppose tty_mutex is above i_mutex because of those sysfs calls and it > seems sysfs must be called under tty_mutex because of races with > init_dev(). So it's not easy to get rid of that dependency either. I wonder if this cannot be done with a workqueue (message to a buffer, maybe after try_lock on tty_mutex) and BTW isn't there any "modern" way to queue console messages like these? Jarek P. 
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: dquot.c: possible circular locking Re: [2.6.20] BUG: workqueue leaked lock 2007-03-20 13:44 ` Jarek Poplawski @ 2007-03-20 14:00 ` Jan Kara 0 siblings, 0 replies; 35+ messages in thread From: Jan Kara @ 2007-03-20 14:00 UTC (permalink / raw) To: Jarek Poplawski Cc: Folkert van Heusden, Andrew Morton, linux-kernel, Oleg Nesterov, Neil Brown, Christoph Hellwig On Tue 20-03-07 14:44:46, Jarek Poplawski wrote: > On Tue, Mar 20, 2007 at 01:19:09PM +0100, Jan Kara wrote: > > On Tue 20-03-07 12:31:51, Jarek Poplawski wrote: > > > On Tue, Mar 20, 2007 at 12:22:53PM +0100, Jarek Poplawski wrote: > > > > On Tue, Mar 20, 2007 at 12:17:01PM +0100, Jarek Poplawski wrote: > > > > ... > > > > > IMHO lockdep found that two locks are taken in different order: > > > > > > > > > > -> #1: 1) tty_mutex in con_console() 2) dqptr_sem (somewhere later) > > > > > -> #0: 1) dqptr_sem 2) tty_console in dquot_alloc_space() with print_warning() > > > > > > Once more - should be: > > > -> #1: 1) tty_mutex in con_close() 2) dqptr_sem (somewhere later) > > > -> #0: 1) dqptr_sem 2) tty_mutex in dquot_alloc_space() with print_warning() > > Yes, I was looking at it. Hmm, we can possibly get rid of tty_mutex being > > acquired under dqptr_sem in quota code. But looking at the path from > > con_close() there's another inversion with i_mutex which is also acquired > > along the path for sysfs. And we can hardly get rid of it in the quota code. > > Now none of these is a real deadlock as quota should never call > > print_warning() for sysfs (it doesn't use quota) but still it's nasty. I > > suppose tty_mutex is above i_mutex because of those sysfs calls and it > > seems sysfs must be called under tty_mutex because of races with > > init_dev(). So it's not easy to get rid of that dependency either. > > I wonder if this cannot be done with a workqueue (message to a buffer, > maybe after try_lock on tty_mutex) and BTW isn't there any "modern" > way to queue console messages like these? 
I don't know about any such way... Honza -- Jan Kara <jack@suse.cz> SuSE CR Labs ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [2.6.20] BUG: workqueue leaked lock 2007-03-15 19:06 ` Andrew Morton 2007-03-15 19:17 ` Folkert van Heusden @ 2007-03-16 8:41 ` Peter Zijlstra 2007-03-16 11:39 ` Andrew Morton 1 sibling, 1 reply; 35+ messages in thread From: Peter Zijlstra @ 2007-03-16 8:41 UTC (permalink / raw) To: Andrew Morton Cc: Folkert van Heusden, linux-kernel, Oleg Nesterov, Neil Brown On Thu, 2007-03-15 at 11:06 -0800, Andrew Morton wrote: > > On Tue, 13 Mar 2007 17:50:14 +0100 Folkert van Heusden <folkert@vanheusden.com> wrote: > > ... > > [ 1756.728209] BUG: workqueue leaked lock or atomic: nfsd4/0x00000000/3577 > > [ 1756.728271] last function: laundromat_main+0x0/0x69 [nfsd] > > [ 1756.728392] 2 locks held by nfsd4/3577: > > [ 1756.728435] #0: (client_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa > > [ 1756.728679] #1: (&inode->i_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa > > [ 1756.728923] [<c1003d57>] show_trace_log_lvl+0x1a/0x30 > > [ 1756.729015] [<c1003d7f>] show_trace+0x12/0x14 > > [ 1756.729103] [<c1003e79>] dump_stack+0x16/0x18 > > [ 1756.729187] [<c102c2e8>] run_workqueue+0x167/0x170 > > [ 1756.729276] [<c102c437>] worker_thread+0x146/0x165 > > [ 1756.729368] [<c102f797>] kthread+0x97/0xc4 > > [ 1756.729456] [<c1003bdb>] kernel_thread_helper+0x7/0x10 > > [ 1756.729547] ======================= > > [ 1792.436492] svc: unknown version (0 for prog 100003, nfsd) > > [ 1846.683648] BUG: workqueue leaked lock or atomic: nfsd4/0x00000000/3577 > > [ 1846.683701] last function: laundromat_main+0x0/0x69 [nfsd] > > [ 1846.683832] 2 locks held by nfsd4/3577: > > [ 1846.683885] #0: (client_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa > > [ 1846.683980] #1: (&inode->i_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa > > [ 1846.683988] [<c1003d57>] show_trace_log_lvl+0x1a/0x30 > > [ 1846.683994] [<c1003d7f>] show_trace+0x12/0x14 > > [ 1846.683997] [<c1003e79>] dump_stack+0x16/0x18 > > [ 1846.684001] [<c102c2e8>] run_workqueue+0x167/0x170 > > [ 1846.684006] [<c102c437>] 
worker_thread+0x146/0x165 > > [ 1846.684012] [<c102f797>] kthread+0x97/0xc4 > > [ 1846.684023] [<c1003bdb>] kernel_thread_helper+0x7/0x10 > > Oleg, that's a fairly incomprehensible message we have in there. Can you > please explain what it means? I think I'm responsible for this message (commit d5abe669172f20a4129a711de0f250a4e07db298); what it says is that the function executed by the workqueue (here laundromat_main) leaked an atomic context or is still holding locks (2 locks in this case). ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [2.6.20] BUG: workqueue leaked lock 2007-03-16 8:41 ` Peter Zijlstra @ 2007-03-16 11:39 ` Andrew Morton 2007-03-19 6:24 ` Neil Brown 0 siblings, 1 reply; 35+ messages in thread From: Andrew Morton @ 2007-03-16 11:39 UTC (permalink / raw) To: Peter Zijlstra Cc: Folkert van Heusden, linux-kernel, Oleg Nesterov, Neil Brown On Fri, 16 Mar 2007 09:41:20 +0100 Peter Zijlstra <a.p.zijlstra@chello.nl> wrote: > On Thu, 2007-03-15 at 11:06 -0800, Andrew Morton wrote: > > > On Tue, 13 Mar 2007 17:50:14 +0100 Folkert van Heusden <folkert@vanheusden.com> wrote: > > > ... > > > [ 1756.728209] BUG: workqueue leaked lock or atomic: nfsd4/0x00000000/3577 > > > [ 1756.728271] last function: laundromat_main+0x0/0x69 [nfsd] > > > [ 1756.728392] 2 locks held by nfsd4/3577: > > > [ 1756.728435] #0: (client_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa > > > [ 1756.728679] #1: (&inode->i_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa > > > [ 1756.728923] [<c1003d57>] show_trace_log_lvl+0x1a/0x30 > > > [ 1756.729015] [<c1003d7f>] show_trace+0x12/0x14 > > > [ 1756.729103] [<c1003e79>] dump_stack+0x16/0x18 > > > [ 1756.729187] [<c102c2e8>] run_workqueue+0x167/0x170 > > > [ 1756.729276] [<c102c437>] worker_thread+0x146/0x165 > > > [ 1756.729368] [<c102f797>] kthread+0x97/0xc4 > > > [ 1756.729456] [<c1003bdb>] kernel_thread_helper+0x7/0x10 > > > [ 1756.729547] ======================= > > > [ 1792.436492] svc: unknown version (0 for prog 100003, nfsd) > > > [ 1846.683648] BUG: workqueue leaked lock or atomic: nfsd4/0x00000000/3577 > > > [ 1846.683701] last function: laundromat_main+0x0/0x69 [nfsd] > > > [ 1846.683832] 2 locks held by nfsd4/3577: > > > [ 1846.683885] #0: (client_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa > > > [ 1846.683980] #1: (&inode->i_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa > > > [ 1846.683988] [<c1003d57>] show_trace_log_lvl+0x1a/0x30 > > > [ 1846.683994] [<c1003d7f>] show_trace+0x12/0x14 > > > [ 1846.683997] [<c1003e79>] dump_stack+0x16/0x18 
> > > [ 1846.684001] [<c102c2e8>] run_workqueue+0x167/0x170 > > > [ 1846.684006] [<c102c437>] worker_thread+0x146/0x165 > > > [ 1846.684012] [<c102f797>] kthread+0x97/0xc4 > > > [ 1846.684023] [<c1003bdb>] kernel_thread_helper+0x7/0x10 > > > > Oleg, that's a fairly incomprehensible message we have in there. Can you > > please explain what it means? > > I think I'm responsible for this message (commit > > d5abe669172f20a4129a711de0f250a4e07db298); what it says is that the > > function executed by the workqueue (here laundromat_main) leaked an > > atomic context or is still holding locks (2 locks in this case). > > OK. That's not necessarily a bug: one could envisage a (weird) piece of code which takes a lock then releases it on a later workqueue invocation. But I'm not sure that nfs4_laundromat() is actually supposed to be doing anything like that. Then again, maybe it is: it seems to be waddling through a directory under the control of a little state machine, with timeouts. Neil: help? ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [2.6.20] BUG: workqueue leaked lock 2007-03-16 11:39 ` Andrew Morton @ 2007-03-19 6:24 ` Neil Brown 2007-03-20 9:37 ` [PATCH] " Jarek Poplawski 0 siblings, 1 reply; 35+ messages in thread From: Neil Brown @ 2007-03-19 6:24 UTC (permalink / raw) To: Andrew Morton Cc: Peter Zijlstra, Folkert van Heusden, linux-kernel, Oleg Nesterov, J. Bruce Fields On Friday March 16, akpm@linux-foundation.org wrote: > > OK. That's not necessarily a bug: one could envisage a (weird) piece of > code which takes a lock then releases it on a later workqueue invokation. > But I'm not sure that nfs4_laundromat() is actually supposed to be doing > anything like that. > > Then again, maybe it is: it seems to be waddling through a directory under > the control of a little state machine, with timeouts. > > Neil: help? I'm quite certain that laundromat_main does *not* leave client_mutex locked as the last thing it does is call nfs4_unlock_state which is mutex_unlock(&client_mutex); To me, that raises some doubt about whether the lock leak check is working properly... It is somewhat harder to track locking of i_mutex, but it seems to me that every time it is taken, it is released again shortly afterwards. So I think this must be a problem with leak detection, not with NFSd. NeilBrown > On Fri, 16 Mar 2007 09:41:20 +0100 Peter Zijlstra <a.p.zijlstra@chello.nl> wrote: > > > On Thu, 2007-03-15 at 11:06 -0800, Andrew Morton wrote: > > > > On Tue, 13 Mar 2007 17:50:14 +0100 Folkert van Heusden <folkert@vanheusden.com> wrote: > > > > ... 
> > > > [ 1756.728209] BUG: workqueue leaked lock or atomic: nfsd4/0x00000000/3577 > > > > [ 1756.728271] last function: laundromat_main+0x0/0x69 [nfsd] > > > > [ 1756.728392] 2 locks held by nfsd4/3577: > > > > [ 1756.728435] #0: (client_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa > > > > [ 1756.728679] #1: (&inode->i_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa > > > > [ 1756.728923] [<c1003d57>] show_trace_log_lvl+0x1a/0x30 > > > > [ 1756.729015] [<c1003d7f>] show_trace+0x12/0x14 > > > > [ 1756.729103] [<c1003e79>] dump_stack+0x16/0x18 > > > > [ 1756.729187] [<c102c2e8>] run_workqueue+0x167/0x170 > > > > [ 1756.729276] [<c102c437>] worker_thread+0x146/0x165 > > > > [ 1756.729368] [<c102f797>] kthread+0x97/0xc4 > > > > [ 1756.729456] [<c1003bdb>] kernel_thread_helper+0x7/0x10 > > > > [ 1756.729547] ======================= > > > > [ 1792.436492] svc: unknown version (0 for prog 100003, nfsd) > > > > [ 1846.683648] BUG: workqueue leaked lock or atomic: nfsd4/0x00000000/3577 > > > > [ 1846.683701] last function: laundromat_main+0x0/0x69 [nfsd] > > > > [ 1846.683832] 2 locks held by nfsd4/3577: > > > > [ 1846.683885] #0: (client_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa > > > > [ 1846.683980] #1: (&inode->i_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa > > > > [ 1846.683988] [<c1003d57>] show_trace_log_lvl+0x1a/0x30 > > > > [ 1846.683994] [<c1003d7f>] show_trace+0x12/0x14 > > > > [ 1846.683997] [<c1003e79>] dump_stack+0x16/0x18 > > > > [ 1846.684001] [<c102c2e8>] run_workqueue+0x167/0x170 > > > > [ 1846.684006] [<c102c437>] worker_thread+0x146/0x165 > > > > [ 1846.684012] [<c102f797>] kthread+0x97/0xc4 > > > > [ 1846.684023] [<c1003bdb>] kernel_thread_helper+0x7/0x10 > > > > > > Oleg, that's a fairly incomprehensible message we have in there. Can you > > > please explain what it means? 
> > > > I think I'm responsible for this message (commit > > d5abe669172f20a4129a711de0f250a4e07db298); what is says is that the > > function executed by the workqueue (here laundromat_main) leaked an > > atomic context or is still holding locks (2 locks in this case). > > ^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH] Re: [2.6.20] BUG: workqueue leaked lock 2007-03-19 6:24 ` Neil Brown @ 2007-03-20 9:37 ` Jarek Poplawski 2007-03-20 16:07 ` Oleg Nesterov 0 siblings, 1 reply; 35+ messages in thread From: Jarek Poplawski @ 2007-03-20 9:37 UTC (permalink / raw) To: Neil Brown Cc: Andrew Morton, Peter Zijlstra, Folkert van Heusden, linux-kernel, Oleg Nesterov, J. Bruce Fields, Ingo Molnar On 19-03-2007 07:24, Neil Brown wrote: > On Friday March 16, akpm@linux-foundation.org wrote: >> OK. That's not necessarily a bug: one could envisage a (weird) piece of >> code which takes a lock then releases it on a later workqueue invokation. >> But I'm not sure that nfs4_laundromat() is actually supposed to be doing >> anything like that. >> >> Then again, maybe it is: it seems to be waddling through a directory under >> the control of a little state machine, with timeouts. >> >> Neil: help? > > I'm quite certain that laundromat_main does *not* leave client_mutex > locked as the last thing it does is call nfs4_unlock_state which is > mutex_unlock(&client_mutex); > To me, that raises some doubt about whether the lock leak check is > working properly... > It is somewhat harder to track locking of i_mutex, but it seems to me > that every time it is taken, it is released again shortly afterwards. > > So I think this must be a problem with leak detection, not with NFSd. > > NeilBrown > > >> On Fri, 16 Mar 2007 09:41:20 +0100 Peter Zijlstra <a.p.zijlstra@chello.nl> wrote: >> >>> On Thu, 2007-03-15 at 11:06 -0800, Andrew Morton wrote: >>>>> On Tue, 13 Mar 2007 17:50:14 +0100 Folkert van Heusden <folkert@vanheusden.com> wrote: >>>>> ... 
>>>>> [ 1756.728209] BUG: workqueue leaked lock or atomic: nfsd4/0x00000000/3577 >>>>> [ 1756.728271] last function: laundromat_main+0x0/0x69 [nfsd] >>>>> [ 1756.728392] 2 locks held by nfsd4/3577: >>>>> [ 1756.728435] #0: (client_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa >>>>> [ 1756.728679] #1: (&inode->i_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa >>>>> [ 1756.728923] [<c1003d57>] show_trace_log_lvl+0x1a/0x30 >>>>> [ 1756.729015] [<c1003d7f>] show_trace+0x12/0x14 >>>>> [ 1756.729103] [<c1003e79>] dump_stack+0x16/0x18 >>>>> [ 1756.729187] [<c102c2e8>] run_workqueue+0x167/0x170 >>>>> [ 1756.729276] [<c102c437>] worker_thread+0x146/0x165 >>>>> [ 1756.729368] [<c102f797>] kthread+0x97/0xc4 >>>>> [ 1756.729456] [<c1003bdb>] kernel_thread_helper+0x7/0x10 >>>>> [ 1756.729547] ======================= ... >>> I think I'm responsible for this message (commit >>> d5abe669172f20a4129a711de0f250a4e07db298); what is says is that the >>> function executed by the workqueue (here laundromat_main) leaked an >>> atomic context or is still holding locks (2 locks in this case). This check is valid with keventd, but it looks like nfsd runs kthread by itself. I'm not sure it's illegal to hold locks then, so, if I'm not wrong with this, here is my patch proposal (for testing) to take this into consideration. 
Reported-by: Folkert van Heusden <folkert@vanheusden.com>
Signed-off-by: Jarek Poplawski <jarkao2@o2.pl>
---
diff -Nurp 2.6.21-rc4-git4-/kernel/workqueue.c 2.6.21-rc4-git4/kernel/workqueue.c
--- 2.6.21-rc4-git4-/kernel/workqueue.c	2007-02-04 19:44:54.000000000 +0100
+++ 2.6.21-rc4-git4/kernel/workqueue.c	2007-03-20 09:30:46.000000000 +0100
@@ -316,6 +316,7 @@ static void run_workqueue(struct cpu_wor
 		struct work_struct *work = list_entry(cwq->worklist.next,
 						struct work_struct, entry);
 		work_func_t f = work->func;
+		int ld;
 
 		list_del_init(cwq->worklist.next);
 		spin_unlock_irqrestore(&cwq->lock, flags);
@@ -323,13 +324,15 @@ static void run_workqueue(struct cpu_wor
 		BUG_ON(get_wq_data(work) != cwq);
 		if (!test_bit(WORK_STRUCT_NOAUTOREL, work_data_bits(work)))
 			work_release(work);
+		ld = lockdep_depth(current);
+
 		f(work);
 
-		if (unlikely(in_atomic() || lockdep_depth(current) > 0)) {
+		if (unlikely(in_atomic() || (ld -= lockdep_depth(current)))) {
 			printk(KERN_ERR "BUG: workqueue leaked lock or atomic: "
-					"%s/0x%08x/%d\n",
+					"%s/0x%08x/%d/%d\n",
 					current->comm, preempt_count(),
-					current->pid);
+					current->pid, ld);
 			printk(KERN_ERR "    last function: ");
 			print_symbol("%s\n", (unsigned long)f);
 			debug_show_held_locks(current);

^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: [PATCH] Re: [2.6.20] BUG: workqueue leaked lock 2007-03-20 9:37 ` [PATCH] " Jarek Poplawski @ 2007-03-20 16:07 ` Oleg Nesterov 2007-03-21 8:05 ` Jarek Poplawski 0 siblings, 1 reply; 35+ messages in thread From: Oleg Nesterov @ 2007-03-20 16:07 UTC (permalink / raw) To: Jarek Poplawski Cc: Neil Brown, Andrew Morton, Peter Zijlstra, Folkert van Heusden, linux-kernel, J. Bruce Fields, Ingo Molnar On 03/20, Jarek Poplawski wrote: > > >> On Fri, 16 Mar 2007 09:41:20 +0100 Peter Zijlstra <a.p.zijlstra@chello.nl> wrote: > >> > >>> On Thu, 2007-03-15 at 11:06 -0800, Andrew Morton wrote: > >>>>> On Tue, 13 Mar 2007 17:50:14 +0100 Folkert van Heusden <folkert@vanheusden.com> wrote: > >>>>> ... > >>>>> [ 1756.728209] BUG: workqueue leaked lock or atomic: nfsd4/0x00000000/3577 > >>>>> [ 1756.728271] last function: laundromat_main+0x0/0x69 [nfsd] > >>>>> [ 1756.728392] 2 locks held by nfsd4/3577: > >>>>> [ 1756.728435] #0: (client_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa > >>>>> [ 1756.728679] #1: (&inode->i_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa > >>>>> [ 1756.728923] [<c1003d57>] show_trace_log_lvl+0x1a/0x30 > >>>>> [ 1756.729015] [<c1003d7f>] show_trace+0x12/0x14 > >>>>> [ 1756.729103] [<c1003e79>] dump_stack+0x16/0x18 > >>>>> [ 1756.729187] [<c102c2e8>] run_workqueue+0x167/0x170 > >>>>> [ 1756.729276] [<c102c437>] worker_thread+0x146/0x165 > >>>>> [ 1756.729368] [<c102f797>] kthread+0x97/0xc4 > >>>>> [ 1756.729456] [<c1003bdb>] kernel_thread_helper+0x7/0x10 > >>>>> [ 1756.729547] ======================= > ... > >>> I think I'm responsible for this message (commit > >>> d5abe669172f20a4129a711de0f250a4e07db298); what is says is that the > >>> function executed by the workqueue (here laundromat_main) leaked an > >>> atomic context or is still holding locks (2 locks in this case). > > This check is valid with keventd, but it looks like nfsd runs > kthread by itself. 
I'm not sure it's illegal to hold locks then, nfsd creates laundry_wq by itself, yes, but cwq->thread runs with lockdep_depth() == 0. Unless we have a bug with lockdep_depth(), lockdep_depth() != 0 means that work->func() returns with a lock held (or it can flush its own workqueue under lock, but in that case we should have a different trace). Personally I agree with Andrew: > > > OK. That's not necessarily a bug: one could envisage a (weird) piece of > > code which takes a lock then releases it on a later workqueue invokation. > @@ -323,13 +324,15 @@ static void run_workqueue(struct cpu_wor > BUG_ON(get_wq_data(work) != cwq); > if (!test_bit(WORK_STRUCT_NOAUTOREL, work_data_bits(work))) > work_release(work); > + ld = lockdep_depth(current); > + > f(work); > > - if (unlikely(in_atomic() || lockdep_depth(current) > 0)) { > + if (unlikely(in_atomic() || (ld -= lockdep_depth(current)))) { and with this change we will also have a BUG report on "then releases it on a later workqueue invokation". Btw. __nfs4_state_shutdown() does cancel_delayed_work(&laundromat_work). This is unneeded and imho confusing. laundry_wq was already destroyed, it would be BUG to have a pending delayed work. Oleg. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH] Re: [2.6.20] BUG: workqueue leaked lock 2007-03-20 16:07 ` Oleg Nesterov @ 2007-03-21 8:05 ` Jarek Poplawski 2007-03-21 14:46 ` Oleg Nesterov 0 siblings, 1 reply; 35+ messages in thread From: Jarek Poplawski @ 2007-03-21 8:05 UTC (permalink / raw) To: Oleg Nesterov Cc: Neil Brown, Andrew Morton, Peter Zijlstra, Folkert van Heusden, linux-kernel, J. Bruce Fields, Ingo Molnar On Tue, Mar 20, 2007 at 07:07:59PM +0300, Oleg Nesterov wrote: > On 03/20, Jarek Poplawski wrote: ... > > >>> On Thu, 2007-03-15 at 11:06 -0800, Andrew Morton wrote: > > >>>>> On Tue, 13 Mar 2007 17:50:14 +0100 Folkert van Heusden <folkert@vanheusden.com> wrote: > > >>>>> ... > > >>>>> [ 1756.728209] BUG: workqueue leaked lock or atomic: nfsd4/0x00000000/3577 > > >>>>> [ 1756.728271] last function: laundromat_main+0x0/0x69 [nfsd] > > >>>>> [ 1756.728392] 2 locks held by nfsd4/3577: > > >>>>> [ 1756.728435] #0: (client_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa > > >>>>> [ 1756.728679] #1: (&inode->i_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa > > >>>>> [ 1756.728923] [<c1003d57>] show_trace_log_lvl+0x1a/0x30 > > >>>>> [ 1756.729015] [<c1003d7f>] show_trace+0x12/0x14 > > >>>>> [ 1756.729103] [<c1003e79>] dump_stack+0x16/0x18 > > >>>>> [ 1756.729187] [<c102c2e8>] run_workqueue+0x167/0x170 > > >>>>> [ 1756.729276] [<c102c437>] worker_thread+0x146/0x165 > > >>>>> [ 1756.729368] [<c102f797>] kthread+0x97/0xc4 > > >>>>> [ 1756.729456] [<c1003bdb>] kernel_thread_helper+0x7/0x10 > > >>>>> [ 1756.729547] ======================= ... > > This check is valid with keventd, but it looks like nfsd runs > > kthread by itself. I'm not sure it's illegal to hold locks then, > > nfsd creates laundry_wq by itself, yes, but cwq->thread runs with > lockdep_depth() == 0. Unless we have a bug with lockdep_depth(), > lockdep_depth() != 0 means that work->func() returns with a lock > held (or it can flush its own workqueue under lock, but in that case > we should have a different trace). 
IMHO you can only say this thread is supposed to run with lockdep_depth() == 0. lockdep_depth is counted within a process, which starts before f(), so the only way to say f() leaked locks is to check these locks before and after f(). > > Personally I agree with Andrew: > > > > > OK. That's not necessarily a bug: one could envisage a (weird) piece of > > > code which takes a lock then releases it on a later workqueue invokation. But this code is named here as laundromat_main and it doesn't seem to work like this. > > > @@ -323,13 +324,15 @@ static void run_workqueue(struct cpu_wor > > BUG_ON(get_wq_data(work) != cwq); > > if (!test_bit(WORK_STRUCT_NOAUTOREL, work_data_bits(work))) > > work_release(work); > > + ld = lockdep_depth(current); > > + > > f(work); > > > > - if (unlikely(in_atomic() || lockdep_depth(current) > 0)) { > > + if (unlikely(in_atomic() || (ld -= lockdep_depth(current)))) { > > and with this change we will also have a BUG report on "then releases it on a > later workqueue invokation". Then we could say at least this code is weird (buggy - in my opinion). This patch doesn't change the way the "standard" code is treated, so I cannot see any possibility to get it worse then now. Regards, Jarek P. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH] Re: [2.6.20] BUG: workqueue leaked lock 2007-03-21 8:05 ` Jarek Poplawski @ 2007-03-21 14:46 ` Oleg Nesterov 2007-03-21 15:16 ` Jarek Poplawski 0 siblings, 1 reply; 35+ messages in thread From: Oleg Nesterov @ 2007-03-21 14:46 UTC (permalink / raw) To: Jarek Poplawski Cc: Neil Brown, Andrew Morton, Peter Zijlstra, Folkert van Heusden, linux-kernel, J. Bruce Fields, Ingo Molnar On 03/21, Jarek Poplawski wrote: > > On Tue, Mar 20, 2007 at 07:07:59PM +0300, Oleg Nesterov wrote: > > On 03/20, Jarek Poplawski wrote: > ... > > > >>> On Thu, 2007-03-15 at 11:06 -0800, Andrew Morton wrote: > > > >>>>> On Tue, 13 Mar 2007 17:50:14 +0100 Folkert van Heusden <folkert@vanheusden.com> wrote: > > > >>>>> ... > > > >>>>> [ 1756.728209] BUG: workqueue leaked lock or atomic: nfsd4/0x00000000/3577 > > > >>>>> [ 1756.728271] last function: laundromat_main+0x0/0x69 [nfsd] > > > >>>>> [ 1756.728392] 2 locks held by nfsd4/3577: > > > >>>>> [ 1756.728435] #0: (client_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa > > > >>>>> [ 1756.728679] #1: (&inode->i_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa > > > >>>>> [ 1756.728923] [<c1003d57>] show_trace_log_lvl+0x1a/0x30 > > > >>>>> [ 1756.729015] [<c1003d7f>] show_trace+0x12/0x14 > > > >>>>> [ 1756.729103] [<c1003e79>] dump_stack+0x16/0x18 > > > >>>>> [ 1756.729187] [<c102c2e8>] run_workqueue+0x167/0x170 > > > >>>>> [ 1756.729276] [<c102c437>] worker_thread+0x146/0x165 > > > >>>>> [ 1756.729368] [<c102f797>] kthread+0x97/0xc4 > > > >>>>> [ 1756.729456] [<c1003bdb>] kernel_thread_helper+0x7/0x10 > > > >>>>> [ 1756.729547] ======================= > ... > > > This check is valid with keventd, but it looks like nfsd runs > > > kthread by itself. I'm not sure it's illegal to hold locks then, > > > > nfsd creates laundry_wq by itself, yes, but cwq->thread runs with > > lockdep_depth() == 0. 
Unless we have a bug with lockdep_depth(), > > lockdep_depth() != 0 means that work->func() returns with a lock > > held (or it can flush its own workqueue under lock, but in that case > > we should have a different trace). > > IMHO you can only say this thread is supposed to run with > lockdep_depth() == 0. lockdep_depth is counted within a process, > which starts before f(), so the only way to say f() leaked locks > is to check these locks before and after f(). Sorry, I can't understand you. lockdep_depth is counted within a process, which starts before f(), yes. This process is cwq->thread, it was forked during create_workqueue(). It does not take any locks directly, only by calling work->func(). laundry_wq doesn't differ from keventd_wq or any other wq in this sense. nfsd does not "runs kthread by itself", it inserts the work and wakes up cwq->thread. > > Personally I agree with Andrew: > > > > > > > OK. That's not necessarily a bug: one could envisage a (weird) piece of > > > > code which takes a lock then releases it on a later workqueue invokation. > > But this code is named here as laundromat_main and it doesn't > seem to work like this. This means we have a problem with leak detection. > > > + ld = lockdep_depth(current); > > > + > > > f(work); > > > > > > - if (unlikely(in_atomic() || lockdep_depth(current) > 0)) { > > > + if (unlikely(in_atomic() || (ld -= lockdep_depth(current)))) { > > > > and with this change we will also have a BUG report on "then releases it on a > > later workqueue invokation". > > Then we could say at least this code is weird (buggy - in my opinion). > This patch doesn't change the way the "standard" code is treated, > so I cannot see any possibility to get it worse then now. I didn't mean this patch makes things worse (except it conflicts with other patches in -mm tree). In fact, it may improve the diagnostics. My point is that this patch afaics has nothing to do with the discussed problem. Oleg. 
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH] Re: [2.6.20] BUG: workqueue leaked lock 2007-03-21 14:46 ` Oleg Nesterov @ 2007-03-21 15:16 ` Jarek Poplawski 2007-03-21 15:17 ` Folkert van Heusden 2007-03-21 15:29 ` Oleg Nesterov 0 siblings, 2 replies; 35+ messages in thread From: Jarek Poplawski @ 2007-03-21 15:16 UTC (permalink / raw) To: Oleg Nesterov Cc: Neil Brown, Andrew Morton, Peter Zijlstra, Folkert van Heusden, linux-kernel, J. Bruce Fields, Ingo Molnar On Wed, Mar 21, 2007 at 05:46:20PM +0300, Oleg Nesterov wrote: > On 03/21, Jarek Poplawski wrote: > > > > On Tue, Mar 20, 2007 at 07:07:59PM +0300, Oleg Nesterov wrote: > > > On 03/20, Jarek Poplawski wrote: > > ... > > > > >>> On Thu, 2007-03-15 at 11:06 -0800, Andrew Morton wrote: > > > > >>>>> On Tue, 13 Mar 2007 17:50:14 +0100 Folkert van Heusden <folkert@vanheusden.com> wrote: > > > > >>>>> ... > > > > >>>>> [ 1756.728209] BUG: workqueue leaked lock or atomic: nfsd4/0x00000000/3577 > > > > >>>>> [ 1756.728271] last function: laundromat_main+0x0/0x69 [nfsd] > > > > >>>>> [ 1756.728392] 2 locks held by nfsd4/3577: > > > > >>>>> [ 1756.728435] #0: (client_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa > > > > >>>>> [ 1756.728679] #1: (&inode->i_mutex){--..}, at: [<c1205b88>] mutex_lock+0x8/0xa > > > > >>>>> [ 1756.728923] [<c1003d57>] show_trace_log_lvl+0x1a/0x30 > > > > >>>>> [ 1756.729015] [<c1003d7f>] show_trace+0x12/0x14 > > > > >>>>> [ 1756.729103] [<c1003e79>] dump_stack+0x16/0x18 > > > > >>>>> [ 1756.729187] [<c102c2e8>] run_workqueue+0x167/0x170 > > > > >>>>> [ 1756.729276] [<c102c437>] worker_thread+0x146/0x165 > > > > >>>>> [ 1756.729368] [<c102f797>] kthread+0x97/0xc4 > > > > >>>>> [ 1756.729456] [<c1003bdb>] kernel_thread_helper+0x7/0x10 > > > > >>>>> [ 1756.729547] ======================= > > ... > > > > This check is valid with keventd, but it looks like nfsd runs > > > > kthread by itself. 
I'm not sure it's illegal to hold locks then, > > > > > > nfsd creates laundry_wq by itself, yes, but cwq->thread runs with > > > lockdep_depth() == 0. Unless we have a bug with lockdep_depth(), > > > lockdep_depth() != 0 means that work->func() returns with a lock > > > held (or it can flush its own workqueue under lock, but in that case > > > we should have a different trace). > > > > IMHO you can only say this thread is supposed to run with > > lockdep_depth() == 0. lockdep_depth is counted within a process, > > which starts before f(), so the only way to say f() leaked locks > > is to check these locks before and after f(). > > Sorry, I can't understand you. lockdep_depth is counted within a process, > which starts before f(), yes. This process is cwq->thread, it was forked > during create_workqueue(). It does not take any locks directly, only by > calling work->func(). laundry_wq doesn't differ from keventd_wq or any other > wq in this sense. nfsd does not "runs kthread by itself", it inserts the > work and wakes up cwq->thread. I think we know how it all should start and go. If only analyzing the code could be enough, this current check of lockdep_depth() is unnecessary too, because laundromat_code code looks as good as run_workqueue code. I send it for testing and I mean it - something strange is going here, so some checks should be added - I didn't say mine is the right and will certainly help. So, if you have another idea for looking after it, let us know. > > > > Personally I agree with Andrew: > > > > > > > > > OK. That's not necessarily a bug: one could envisage a (weird) piece of > > > > > code which takes a lock then releases it on a later workqueue invokation. > > > > But this code is named here as laundromat_main and it doesn't > > seem to work like this. > > This means we have a problem with leak detection. I totally agree! 
> > > > > + ld = lockdep_depth(current); > > > > + > > > > f(work); > > > > > > > > - if (unlikely(in_atomic() || lockdep_depth(current) > 0)) { > > > > + if (unlikely(in_atomic() || (ld -= lockdep_depth(current)))) { > > > > > > and with this change we will also have a BUG report on "then releases it on a > > > later workqueue invokation". > > > > Then we could say at least this code is weird (buggy - in my opinion). > > This patch doesn't change the way the "standard" code is treated, > > so I cannot see any possibility to get it worse then now. > > I didn't mean this patch makes things worse (except it conflicts with other > patches in -mm tree). In fact, it may improve the diagnostics. My point is > that this patch afaics has nothing to do with the discussed problem. I thought you are trying to diagnose a bug and maybe Folkert could test, if this patch makes any difference. Sorry, if I missed the real problem. Cheers, Jarek P. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH] Re: [2.6.20] BUG: workqueue leaked lock 2007-03-21 15:16 ` Jarek Poplawski @ 2007-03-21 15:17 ` Folkert van Heusden 2007-03-21 15:29 ` Oleg Nesterov 1 sibling, 0 replies; 35+ messages in thread From: Folkert van Heusden @ 2007-03-21 15:17 UTC (permalink / raw) To: Jarek Poplawski Cc: Oleg Nesterov, Neil Brown, Andrew Morton, Peter Zijlstra, linux-kernel, J. Bruce Fields, Ingo Molnar > I thought you are trying to diagnose a bug and maybe Folkert could test, > if this patch makes any difference. Sorry, if I missed the real problem. Yes please let me know when I can test a patch. Folkert van Heusden -- www.biglumber.com <- site where one can exchange PGP key signatures ---------------------------------------------------------------------- Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH] Re: [2.6.20] BUG: workqueue leaked lock 2007-03-21 15:16 ` Jarek Poplawski 2007-03-21 15:17 ` Folkert van Heusden @ 2007-03-21 15:29 ` Oleg Nesterov 2007-03-21 18:16 ` Oleg Nesterov 1 sibling, 1 reply; 35+ messages in thread From: Oleg Nesterov @ 2007-03-21 15:29 UTC (permalink / raw) To: Jarek Poplawski Cc: Neil Brown, Andrew Morton, Peter Zijlstra, Folkert van Heusden, linux-kernel, J. Bruce Fields, Ingo Molnar On 03/21, Jarek Poplawski wrote: > > > Sorry, I can't understand you. lockdep_depth is counted within a process, > > which starts before f(), yes. This process is cwq->thread, it was forked > > during create_workqueue(). It does not take any locks directly, only by > > calling work->func(). laundry_wq doesn't differ from keventd_wq or any other > > wq in this sense. nfsd does not "runs kthread by itself", it inserts the > > work and wakes up cwq->thread. > > I think we know how it all should start and go. If only analyzing > the code could be enough, this current check of lockdep_depth() > is unnecessary too, because laundromat_code code looks as good > as run_workqueue code. I send it for testing and I mean it - > something strange is going here, so some checks should be added > - I didn't say mine is the right and will certainly help. Ah, sorry, I misunderstood you. I had (my fault) a false impression that you propose this patch as a "fix", while we seem to agree that the real source of this problem remains unknown. > So, if > you have another idea for looking after it, let us know. No, I don't. Oleg. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH] Re: [2.6.20] BUG: workqueue leaked lock 2007-03-21 15:29 ` Oleg Nesterov @ 2007-03-21 18:16 ` Oleg Nesterov 2007-03-22 6:11 ` [PATCH] lockdep: lockdep_depth vs. debug_locks " Jarek Poplawski 0 siblings, 1 reply; 35+ messages in thread From: Oleg Nesterov @ 2007-03-21 18:16 UTC (permalink / raw) To: Jarek Poplawski Cc: Neil Brown, Andrew Morton, Peter Zijlstra, Folkert van Heusden, linux-kernel, J. Bruce Fields, Ingo Molnar On 03/21, Oleg Nesterov wrote: > > On 03/21, Jarek Poplawski wrote: > > > > So, if > > you have another idea for looking after it, let us know. > > No, I don't. Perhaps we can add lockdep_depth() checks to laundromat_main/nfs4_laundromat to rule out workqueue.c. According to the trace, we return with client_mutex held, this seems impossible. So we can also print the value of mutex_is_locked() if lockdep_depth() != 0. Oleg. ^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH] lockdep: lockdep_depth vs. debug_locks Re: [2.6.20] BUG: workqueue leaked lock 2007-03-21 18:16 ` Oleg Nesterov @ 2007-03-22 6:11 ` Jarek Poplawski 2007-03-22 6:28 ` Andrew Morton ` (2 more replies) 0 siblings, 3 replies; 35+ messages in thread From: Jarek Poplawski @ 2007-03-22 6:11 UTC (permalink / raw) To: Oleg Nesterov Cc: Neil Brown, Andrew Morton, Peter Zijlstra, Folkert van Heusden, linux-kernel, J. Bruce Fields, Ingo Molnar Here is some joke: [PATCH] lockdep: lockdep_depth vs. debug_locks lockdep really shouldn't be used when debug_locks == 0! Reported-by: Folkert van Heusden <folkert@vanheusden.com> Inspired-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Jarek Poplawski <jarkao2@o2.pl> --- diff -Nurp 2.6.21-rc4-git4-/include/linux/lockdep.h 2.6.21-rc4-git4/include/linux/lockdep.h --- 2.6.21-rc4-git4-/include/linux/lockdep.h 2007-03-20 20:24:17.000000000 +0100 +++ 2.6.21-rc4-git4/include/linux/lockdep.h 2007-03-21 22:32:41.000000000 +0100 @@ -245,7 +245,7 @@ extern void lock_release(struct lockdep_ # define INIT_LOCKDEP .lockdep_recursion = 0, -#define lockdep_depth(tsk) ((tsk)->lockdep_depth) +#define lockdep_depth(tsk) (debug_locks ? (tsk)->lockdep_depth : 0) #else /* !LOCKDEP */ ^ permalink raw reply [flat|nested] 35+ messages in thread
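For readers following along, the one-line patch can be exercised in userspace. This sketch (simplified, hypothetical types) shows the behavior it fixes: once lockdep disables itself by clearing debug_locks, the per-task depth counter is frozen at whatever value it had, so reading it raw produces bogus "leaked lock" reports; the guarded macro reports 0 instead.

```c
/* Minimal stand-ins for the kernel types involved. */
struct task { int lockdep_depth; };

static int debug_locks = 1;            /* cleared when lockdep hits an error */
static struct task worker = { 0 };

/* The patched macro: ignore the (possibly stale) counter once tracking stopped. */
#define lockdep_depth(tsk) (debug_locks ? (tsk)->lockdep_depth : 0)

static int depth_while_tracking(void)
{
    debug_locks = 1;
    worker.lockdep_depth = 2;          /* two locks really held */
    return lockdep_depth(&worker);
}

static int depth_after_lockdep_off(void)
{
    debug_locks = 0;                   /* lockdep gave up; counter is stale */
    worker.lockdep_depth = 2;
    return lockdep_depth(&worker);
}
```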
* Re: [PATCH] lockdep: lockdep_depth vs. debug_locks Re: [2.6.20] BUG: workqueue leaked lock 2007-03-22 6:11 ` [PATCH] lockdep: lockdep_depth vs. debug_locks " Jarek Poplawski @ 2007-03-22 6:28 ` Andrew Morton 2007-03-22 7:06 ` Jarek Poplawski ` (2 more replies) 2007-03-22 6:57 ` [PATCH] lockdep: debug_show_all_locks & debug_show_held_locks vs. debug_locks Jarek Poplawski 2007-03-22 7:22 ` [PATCH] lockdep: lockdep_depth vs. debug_locks Re: [2.6.20] BUG: workqueue leaked lock Peter Zijlstra 2 siblings, 3 replies; 35+ messages in thread From: Andrew Morton @ 2007-03-22 6:28 UTC (permalink / raw) To: Jarek Poplawski Cc: Oleg Nesterov, Neil Brown, Peter Zijlstra, Folkert van Heusden, linux-kernel, J. Bruce Fields, Ingo Molnar On Thu, 22 Mar 2007 07:11:19 +0100 Jarek Poplawski <jarkao2@o2.pl> wrote: > > Here is some joke: > > [PATCH] lockdep: lockdep_depth vs. debug_locks > > lockdep really shouldn't be used when debug_locks == 0! > This isn't a very good changelog. > > Reported-by: Folkert van Heusden <folkert@vanheusden.com> > Inspired-by: Oleg Nesterov <oleg@tv-sign.ru> > Signed-off-by: Jarek Poplawski <jarkao2@o2.pl> > > --- > > diff -Nurp 2.6.21-rc4-git4-/include/linux/lockdep.h 2.6.21-rc4-git4/include/linux/lockdep.h > --- 2.6.21-rc4-git4-/include/linux/lockdep.h 2007-03-20 20:24:17.000000000 +0100 > +++ 2.6.21-rc4-git4/include/linux/lockdep.h 2007-03-21 22:32:41.000000000 +0100 > @@ -245,7 +245,7 @@ extern void lock_release(struct lockdep_ > > # define INIT_LOCKDEP .lockdep_recursion = 0, > > -#define lockdep_depth(tsk) ((tsk)->lockdep_depth) > +#define lockdep_depth(tsk) (debug_locks ? (tsk)->lockdep_depth : 0) > > #else /* !LOCKDEP */ > What problem does this solve, and how does it solve it? I assume that some codepath is incrementing ->lockdep_depth even when debug_locks==0. Isn't that wrong of it? ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH] lockdep: lockdep_depth vs. debug_locks Re: [2.6.20] BUG: workqueue leaked lock 2007-03-22 6:28 ` Andrew Morton @ 2007-03-22 7:06 ` Jarek Poplawski 2007-03-22 7:23 ` Jarek Poplawski 2007-03-22 7:13 ` Jarek Poplawski 2007-03-22 8:26 ` Jarek Poplawski 2 siblings, 1 reply; 35+ messages in thread From: Jarek Poplawski @ 2007-03-22 7:06 UTC (permalink / raw) To: Andrew Morton Cc: Oleg Nesterov, Neil Brown, Peter Zijlstra, Folkert van Heusden, linux-kernel, J. Bruce Fields, Ingo Molnar On Wed, Mar 21, 2007 at 10:28:02PM -0800, Andrew Morton wrote: > On Thu, 22 Mar 2007 07:11:19 +0100 Jarek Poplawski <jarkao2@o2.pl> wrote: > > > > > Here is some joke: > > > > [PATCH] lockdep: lockdep_depth vs. debug_locks > > > > lockdep really shouldn't be used when debug_locks == 0! > > > > This isn't a very good changelog. > > > > > Reported-by: Folkert van Heusden <folkert@vanheusden.com> > > Inspired-by: Oleg Nesterov <oleg@tv-sign.ru> > > Signed-off-by: Jarek Poplawski <jarkao2@o2.pl> > > > > --- > > > > diff -Nurp 2.6.21-rc4-git4-/include/linux/lockdep.h 2.6.21-rc4-git4/include/linux/lockdep.h > > --- 2.6.21-rc4-git4-/include/linux/lockdep.h 2007-03-20 20:24:17.000000000 +0100 > > +++ 2.6.21-rc4-git4/include/linux/lockdep.h 2007-03-21 22:32:41.000000000 +0100 > > @@ -245,7 +245,7 @@ extern void lock_release(struct lockdep_ > > > > # define INIT_LOCKDEP .lockdep_recursion = 0, > > > > -#define lockdep_depth(tsk) ((tsk)->lockdep_depth) > > +#define lockdep_depth(tsk) (debug_locks ? (tsk)->lockdep_depth : 0) > > > > #else /* !LOCKDEP */ > > > > What problem does this solve, and how does it solve it? > > I assume that some codepath is incrementing ->lockdep_depth even when > debug_locks==0. Isn't that wrong of it? > This should definitely solve this problem - as it was said a few times before lockdep stops registering locks after a bug, so even the lock which caused the warning isn't reported. 

Here lockdep found a bug in a workqueue function, so after this no lock or unlock is counted or reported. I think Ingo can tell more. Regards, Jarek P. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH] lockdep: lockdep_depth vs. debug_locks Re: [2.6.20] BUG: workqueue leaked lock 2007-03-22 7:06 ` Jarek Poplawski @ 2007-03-22 7:23 ` Jarek Poplawski 0 siblings, 0 replies; 35+ messages in thread From: Jarek Poplawski @ 2007-03-22 7:23 UTC (permalink / raw) To: Andrew Morton Cc: Oleg Nesterov, Neil Brown, Peter Zijlstra, Folkert van Heusden, linux-kernel, J. Bruce Fields, Ingo Molnar On Thu, Mar 22, 2007 at 08:06:44AM +0100, Jarek Poplawski wrote: ... > This should definitely solve this problem - as it was said > a few times before lockdep stops registering locks after > a bug, so even the lock which caused the warning isn't > reported. Here lockdep found a bug in a workqueue function, More precisely: lockdep found a bug during a run of the workqueue function - this could also be caused by a bug in other code running simultaneously. Jarek P. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH] lockdep: lockdep_depth vs. debug_locks Re: [2.6.20] BUG: workqueue leaked lock 2007-03-22 6:28 ` Andrew Morton 2007-03-22 7:06 ` Jarek Poplawski @ 2007-03-22 7:13 ` Jarek Poplawski 2007-03-22 8:26 ` Jarek Poplawski 2 siblings, 0 replies; 35+ messages in thread From: Jarek Poplawski @ 2007-03-22 7:13 UTC (permalink / raw) To: Andrew Morton Cc: Oleg Nesterov, Neil Brown, Peter Zijlstra, Folkert van Heusden, linux-kernel, J. Bruce Fields, Ingo Molnar On Wed, Mar 21, 2007 at 10:28:02PM -0800, Andrew Morton wrote: ... > I assume that some codepath is incrementing ->lockdep_depth even when > debug_locks==0. Isn't that wrong of it? > lockdep simply stops updating lockdep_depth just after (during) a bug or a WARN. Jarek P. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH] lockdep: lockdep_depth vs. debug_locks Re: [2.6.20] BUG: workqueue leaked lock 2007-03-22 6:28 ` Andrew Morton 2007-03-22 7:06 ` Jarek Poplawski 2007-03-22 7:13 ` Jarek Poplawski @ 2007-03-22 8:26 ` Jarek Poplawski 2 siblings, 0 replies; 35+ messages in thread From: Jarek Poplawski @ 2007-03-22 8:26 UTC (permalink / raw) To: Andrew Morton Cc: Oleg Nesterov, Neil Brown, Peter Zijlstra, Folkert van Heusden, linux-kernel, J. Bruce Fields, Ingo Molnar On Wed, Mar 21, 2007 at 10:28:02PM -0800, Andrew Morton wrote: > On Thu, 22 Mar 2007 07:11:19 +0100 Jarek Poplawski <jarkao2@o2.pl> wrote: > > > > > Here is some joke: > > > > [PATCH] lockdep: lockdep_depth vs. debug_locks > > > > lockdep really shouldn't be used when debug_locks == 0! > > > > This isn't a very good changelog. Actually it's not a changelog (it's a joke). Jarek P. ^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH] lockdep: debug_show_all_locks & debug_show_held_locks vs. debug_locks 2007-03-22 6:11 ` [PATCH] lockdep: lockdep_depth vs. debug_locks " Jarek Poplawski 2007-03-22 6:28 ` Andrew Morton @ 2007-03-22 6:57 ` Jarek Poplawski 2007-03-22 7:23 ` Peter Zijlstra 2007-03-22 7:22 ` [PATCH] lockdep: lockdep_depth vs. debug_locks Re: [2.6.20] BUG: workqueue leaked lock Peter Zijlstra 2 siblings, 1 reply; 35+ messages in thread From: Jarek Poplawski @ 2007-03-22 6:57 UTC (permalink / raw) To: Oleg Nesterov Cc: Neil Brown, Andrew Morton, Peter Zijlstra, Folkert van Heusden, linux-kernel, J. Bruce Fields, Ingo Molnar And here is some addition. [PATCH] lockdep: debug_show_all_locks & debug_show_held_locks vs. debug_locks lockdep's data shouldn't be used when debug_locks == 0 because it's not updated after this, so it's more misleading than helpful. PS: probably lockdep's current-> fields should be reset after it turns debug_locks off: so, after printing a bug report, but before return from exported functions, but there are really a lot of these possibilities (e.g. after DEBUG_LOCKS_WARN_ON), so, something could be missed. (Of course direct use of this fields isn't recommended either.) 
Reported-by: Folkert van Heusden <folkert@vanheusden.com> Inspired-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Jarek Poplawski <jarkao2@o2.pl> --- diff -Nurp 2.6.21-rc4-git4-/kernel/lockdep.c 2.6.21-rc4-git4/kernel/lockdep.c --- 2.6.21-rc4-git4-/kernel/lockdep.c 2007-03-21 22:46:26.000000000 +0100 +++ 2.6.21-rc4-git4/kernel/lockdep.c 2007-03-21 23:05:17.000000000 +0100 @@ -2742,6 +2742,10 @@ void debug_show_all_locks(void) int count = 10; int unlock = 1; + if (unlikely(!debug_locks)) { + printk("INFO: lockdep is turned off.\n"); + return; + } printk("\nShowing all locks held in the system:\n"); /* @@ -2785,6 +2789,10 @@ EXPORT_SYMBOL_GPL(debug_show_all_locks); void debug_show_held_locks(struct task_struct *task) { + if (unlikely(!debug_locks)) { + printk("INFO: lockdep is turned off.\n"); + return; + } lockdep_print_held_locks(task); } ^ permalink raw reply [flat|nested] 35+ messages in thread
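The same principle applies in this second patch: the reporting helpers should bail out early once tracking has stopped, because the held-lock lists they would print are stale. A userspace C sketch of the guard (hypothetical names, greatly simplified):

```c
#include <stdio.h>

static int debug_locks = 1;
static int report_walks;  /* counts walks over the per-task lock lists */

static void show_held_locks(void)
{
    if (!debug_locks) {
        printf("INFO: lockdep is turned off.\n");
        return;           /* never touch the stale per-task lists */
    }
    report_walks++;       /* would print current->held_locks here */
}

static int reports_with_tracking(void)
{
    debug_locks = 1;
    report_walks = 0;
    show_held_locks();
    return report_walks;
}

static int reports_without_tracking(void)
{
    debug_locks = 0;
    report_walks = 0;
    show_held_locks();
    return report_walks;
}
```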
* Re: [PATCH] lockdep: debug_show_all_locks & debug_show_held_locks vs. debug_locks 2007-03-22 6:57 ` [PATCH] lockdep: debug_show_all_locks & debug_show_held_locks vs. debug_locks Jarek Poplawski @ 2007-03-22 7:23 ` Peter Zijlstra 2007-03-22 9:06 ` Ingo Molnar 0 siblings, 1 reply; 35+ messages in thread From: Peter Zijlstra @ 2007-03-22 7:23 UTC (permalink / raw) To: Jarek Poplawski Cc: Oleg Nesterov, Neil Brown, Andrew Morton, Folkert van Heusden, linux-kernel, J. Bruce Fields, Ingo Molnar On Thu, 2007-03-22 at 07:57 +0100, Jarek Poplawski wrote: > And here is some addition. > > [PATCH] lockdep: debug_show_all_locks & debug_show_held_locks vs. debug_locks > > lockdep's data shouldn't be used when debug_locks == 0 > because it's not updated after this, so it's more misleading > than helpful. > > PS: probably lockdep's current-> fields should be reset after > it turns debug_locks off: so, after printing a bug report, but > before return from exported functions, but there are really > a lot of these possibilities (e.g. after DEBUG_LOCKS_WARN_ON), > so, something could be missed. (Of course direct use of this > fields isn't recommended either.) 
> > Reported-by: Folkert van Heusden <folkert@vanheusden.com> > Inspired-by: Oleg Nesterov <oleg@tv-sign.ru> > Signed-off-by: Jarek Poplawski <jarkao2@o2.pl> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> > --- > > diff -Nurp 2.6.21-rc4-git4-/kernel/lockdep.c 2.6.21-rc4-git4/kernel/lockdep.c > --- 2.6.21-rc4-git4-/kernel/lockdep.c 2007-03-21 22:46:26.000000000 +0100 > +++ 2.6.21-rc4-git4/kernel/lockdep.c 2007-03-21 23:05:17.000000000 +0100 > @@ -2742,6 +2742,10 @@ void debug_show_all_locks(void) > int count = 10; > int unlock = 1; > > + if (unlikely(!debug_locks)) { > + printk("INFO: lockdep is turned off.\n"); > + return; > + } > printk("\nShowing all locks held in the system:\n"); > > /* > @@ -2785,6 +2789,10 @@ EXPORT_SYMBOL_GPL(debug_show_all_locks); > > void debug_show_held_locks(struct task_struct *task) > { > + if (unlikely(!debug_locks)) { > + printk("INFO: lockdep is turned off.\n"); > + return; > + } > lockdep_print_held_locks(task); > } > ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH] lockdep: debug_show_all_locks & debug_show_held_locks vs. debug_locks 2007-03-22 7:23 ` Peter Zijlstra @ 2007-03-22 9:06 ` Ingo Molnar 0 siblings, 0 replies; 35+ messages in thread From: Ingo Molnar @ 2007-03-22 9:06 UTC (permalink / raw) To: Peter Zijlstra Cc: Jarek Poplawski, Oleg Nesterov, Neil Brown, Andrew Morton, Folkert van Heusden, linux-kernel, J. Bruce Fields * Peter Zijlstra <a.p.zijlstra@chello.nl> wrote: > On Thu, 2007-03-22 at 07:57 +0100, Jarek Poplawski wrote: > > And here is some addition. > > > > [PATCH] lockdep: debug_show_all_locks & debug_show_held_locks vs. debug_locks > > > > lockdep's data shouldn't be used when debug_locks == 0 > > because it's not updated after this, so it's more misleading > > than helpful. > > > > PS: probably lockdep's current-> fields should be reset after > > it turns debug_locks off: so, after printing a bug report, but > > before return from exported functions, but there are really > > a lot of these possibilities (e.g. after DEBUG_LOCKS_WARN_ON), > > so, something could be missed. (Of course direct use of this > > fields isn't recommended either.) > > > > Reported-by: Folkert van Heusden <folkert@vanheusden.com> > > Inspired-by: Oleg Nesterov <oleg@tv-sign.ru> > > Signed-off-by: Jarek Poplawski <jarkao2@o2.pl> > > Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Ingo Molnar <mingo@elte.hu> Ingo ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH] lockdep: lockdep_depth vs. debug_locks Re: [2.6.20] BUG: workqueue leaked lock 2007-03-22 6:11 ` [PATCH] lockdep: lockdep_depth vs. debug_locks " Jarek Poplawski 2007-03-22 6:28 ` Andrew Morton 2007-03-22 6:57 ` [PATCH] lockdep: debug_show_all_locks & debug_show_held_locks vs. debug_locks Jarek Poplawski @ 2007-03-22 7:22 ` Peter Zijlstra 2007-03-22 9:06 ` Ingo Molnar 2 siblings, 1 reply; 35+ messages in thread From: Peter Zijlstra @ 2007-03-22 7:22 UTC (permalink / raw) To: Jarek Poplawski Cc: Oleg Nesterov, Neil Brown, Andrew Morton, Folkert van Heusden, linux-kernel, J. Bruce Fields, Ingo Molnar On Thu, 2007-03-22 at 07:11 +0100, Jarek Poplawski wrote: > Here is some joke: > > [PATCH] lockdep: lockdep_depth vs. debug_locks > > lockdep really shouldn't be used when debug_locks == 0! This happens then lockdep reports a fatal error, at which point it will stop tracking locks and leave whatever state was there. > Reported-by: Folkert van Heusden <folkert@vanheusden.com> > Inspired-by: Oleg Nesterov <oleg@tv-sign.ru> > Signed-off-by: Jarek Poplawski <jarkao2@o2.pl> This looks sane, thanks for figuring this out. Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> > --- > > diff -Nurp 2.6.21-rc4-git4-/include/linux/lockdep.h 2.6.21-rc4-git4/include/linux/lockdep.h > --- 2.6.21-rc4-git4-/include/linux/lockdep.h 2007-03-20 20:24:17.000000000 +0100 > +++ 2.6.21-rc4-git4/include/linux/lockdep.h 2007-03-21 22:32:41.000000000 +0100 > @@ -245,7 +245,7 @@ extern void lock_release(struct lockdep_ > > # define INIT_LOCKDEP .lockdep_recursion = 0, > > -#define lockdep_depth(tsk) ((tsk)->lockdep_depth) > +#define lockdep_depth(tsk) (debug_locks ? (tsk)->lockdep_depth : 0) > > #else /* !LOCKDEP */ > ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH] lockdep: lockdep_depth vs. debug_locks Re: [2.6.20] BUG: workqueue leaked lock 2007-03-22 7:22 ` [PATCH] lockdep: lockdep_depth vs. debug_locks Re: [2.6.20] BUG: workqueue leaked lock Peter Zijlstra @ 2007-03-22 9:06 ` Ingo Molnar 0 siblings, 0 replies; 35+ messages in thread From: Ingo Molnar @ 2007-03-22 9:06 UTC (permalink / raw) To: Peter Zijlstra Cc: Jarek Poplawski, Oleg Nesterov, Neil Brown, Andrew Morton, Folkert van Heusden, linux-kernel, J. Bruce Fields * Peter Zijlstra <a.p.zijlstra@chello.nl> wrote: > On Thu, 2007-03-22 at 07:11 +0100, Jarek Poplawski wrote: > > Here is some joke: > > > > [PATCH] lockdep: lockdep_depth vs. debug_locks > > > > lockdep really shouldn't be used when debug_locks == 0! > > This happens then lockdep reports a fatal error, at which point > it will stop tracking locks and leave whatever state was there. > > > Reported-by: Folkert van Heusden <folkert@vanheusden.com> > > Inspired-by: Oleg Nesterov <oleg@tv-sign.ru> > > Signed-off-by: Jarek Poplawski <jarkao2@o2.pl> > > This looks sane, thanks for figuring this out. > > Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Ingo Molnar <mingo@elte.hu> Ingo ^ permalink raw reply [flat|nested] 35+ messages in thread
end of thread, other threads:[~2007-03-22 9:07 UTC | newest] Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2007-03-13 16:50 [2.6.20] BUG: workqueue leaked lock Folkert van Heusden 2007-03-15 19:06 ` Andrew Morton 2007-03-15 19:17 ` Folkert van Heusden 2007-03-16 14:49 ` Jarek Poplawski 2007-03-20 11:17 ` dquot.c: possible circular locking " Jarek Poplawski 2007-03-20 11:22 ` Jarek Poplawski 2007-03-20 11:31 ` Jarek Poplawski 2007-03-20 12:19 ` Jan Kara 2007-03-20 13:35 ` Arjan van de Ven 2007-03-20 14:21 ` Jan Kara 2007-03-20 14:18 ` Arjan van de Ven 2007-03-20 13:44 ` Jarek Poplawski 2007-03-20 14:00 ` Jan Kara 2007-03-16 8:41 ` Peter Zijlstra 2007-03-16 11:39 ` Andrew Morton 2007-03-19 6:24 ` Neil Brown 2007-03-20 9:37 ` [PATCH] " Jarek Poplawski 2007-03-20 16:07 ` Oleg Nesterov 2007-03-21 8:05 ` Jarek Poplawski 2007-03-21 14:46 ` Oleg Nesterov 2007-03-21 15:16 ` Jarek Poplawski 2007-03-21 15:17 ` Folkert van Heusden 2007-03-21 15:29 ` Oleg Nesterov 2007-03-21 18:16 ` Oleg Nesterov 2007-03-22 6:11 ` [PATCH] lockdep: lockdep_depth vs. debug_locks " Jarek Poplawski 2007-03-22 6:28 ` Andrew Morton 2007-03-22 7:06 ` Jarek Poplawski 2007-03-22 7:23 ` Jarek Poplawski 2007-03-22 7:13 ` Jarek Poplawski 2007-03-22 8:26 ` Jarek Poplawski 2007-03-22 6:57 ` [PATCH] lockdep: debug_show_all_locks & debug_show_held_locks vs. debug_locks Jarek Poplawski 2007-03-22 7:23 ` Peter Zijlstra 2007-03-22 9:06 ` Ingo Molnar 2007-03-22 7:22 ` [PATCH] lockdep: lockdep_depth vs. debug_locks Re: [2.6.20] BUG: workqueue leaked lock Peter Zijlstra 2007-03-22 9:06 ` Ingo Molnar