All of lore.kernel.org
 help / color / mirror / Atom feed
* btrfs deadlock in 3.5-rc3
@ 2012-06-23  8:50 Stefan Priebe
  2012-06-23 13:46 ` Michael
  2012-06-25 13:08 ` Josef Bacik
  0 siblings, 2 replies; 23+ messages in thread
From: Stefan Priebe @ 2012-06-23  8:50 UTC (permalink / raw)
  To: linux-btrfs

Hello list,

i've seen this deadlock today and can reproduce it while using ceph.

Is this a known bug?

[  599.514534] INFO: task kworker/6:0:29 blocked for more than 120 seconds.
[  599.530394] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[  599.546832] kworker/6:0     D ffffffff8180ebc0     0    29      2 
0x00000000
[  599.563725]  ffff880e78f4dc90 0000000000000046 ffff880e78f4a180 
0000000000012280
[  599.580812]  ffff880e78f4dfd8 ffff880e78f4c010 0000000000012280 
0000000000012280
[  599.580813]  ffff880e78f4dfd8 0000000000012280 ffff880e78e8c340 
ffff880e78f4a180
[  599.580814] Call Trace:
[  599.580820]  [<ffffffff81620334>] schedule+0x24/0x70
[  599.580832]  [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs]
[  599.580836]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
[  599.580844]  [<ffffffffa009d095>] 
btrfs_commit_transaction+0x655/0xab0 [btrfs]
[  599.580846]  [<ffffffff81076058>] ? idle_balance+0x108/0x130
[  599.580847]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
[  599.580853]  [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 
[btrfs]
[  599.580859]  [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs]
[  599.580861]  [<ffffffff81058eff>] process_one_work+0x11f/0x470
[  599.580862]  [<ffffffff8105b128>] worker_thread+0x178/0x400
[  599.580863]  [<ffffffff8105afb0>] ? manage_workers+0x210/0x210
[  599.580865]  [<ffffffff8105fc46>] kthread+0x96/0xa0
[  599.580866]  [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10
[  599.580868]  [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130
[  599.580869]  [<ffffffff81622dd0>] ? gs_change+0xb/0xb
[  599.580872] INFO: task kworker/2:1:1184 blocked for more than 120 
seconds.
[  599.580872] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[  599.580873] kworker/2:1     D ffffffff8180ebc0     0  1184      2 
0x00000000
[  599.580874]  ffff880e448cfc90 0000000000000046 ffff880e46cce380 
0000000000012280
[  599.580875]  ffff880e448cffd8 ffff880e448ce010 0000000000012280 
0000000000012280
[  599.580876]  ffff880e448cffd8 0000000000012280 ffff880e78e5a240 
ffff880e46cce380
[  599.580877] Call Trace:
[  599.580879]  [<ffffffff81620334>] schedule+0x24/0x70
[  599.580884]  [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs]
[  599.580886]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
[  599.580891]  [<ffffffffa009d095>] 
btrfs_commit_transaction+0x655/0xab0 [btrfs]
[  599.580893]  [<ffffffff81076058>] ? idle_balance+0x108/0x130
[  599.580894]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
[  599.580899]  [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 
[btrfs]
[  599.580905]  [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs]
[  599.580906]  [<ffffffff81058eff>] process_one_work+0x11f/0x470
[  599.580907]  [<ffffffff8105b128>] worker_thread+0x178/0x400
[  599.580908]  [<ffffffff8105afb0>] ? manage_workers+0x210/0x210
[  599.580910]  [<ffffffff8105fc46>] kthread+0x96/0xa0
[  599.580911]  [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10
[  599.580913]  [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130
[  599.580914]  [<ffffffff81622dd0>] ? gs_change+0xb/0xb
[  599.580915] INFO: task kworker/5:1:1186 blocked for more than 120 
seconds.
[  599.580915] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[  599.580916] kworker/5:1     D ffffffff8180ebc0     0  1186      2 
0x00000000
[  599.580917]  ffff880e448a3c90 0000000000000046 ffff880e46d36600 
0000000000012280
[  599.580919]  ffff880e448a3fd8 ffff880e448a2010 0000000000012280 
0000000000012280
[  599.580920]  ffff880e448a3fd8 0000000000012280 ffff880e78e88300 
ffff880e46d36600
[  599.580920] Call Trace:
[  599.580922]  [<ffffffff81620334>] schedule+0x24/0x70
[  599.580927]  [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs]
[  599.580929]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
[  599.580934]  [<ffffffffa009d095>] 
btrfs_commit_transaction+0x655/0xab0 [btrfs]
[  599.580936]  [<ffffffff81076058>] ? idle_balance+0x108/0x130
[  599.580937]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
[  599.580942]  [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 
[btrfs]
[  599.580948]  [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs]
[  599.580949]  [<ffffffff81058eff>] process_one_work+0x11f/0x470
[  599.580950]  [<ffffffff8105b128>] worker_thread+0x178/0x400
[  599.580951]  [<ffffffff8105afb0>] ? manage_workers+0x210/0x210
[  599.580952]  [<ffffffff8105fc46>] kthread+0x96/0xa0
[  599.580954]  [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10
[  599.580955]  [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130
[  599.580957]  [<ffffffff81622dd0>] ? gs_change+0xb/0xb
[  599.580966] INFO: task ceph-osd:3248 blocked for more than 120 seconds.
[  599.580966] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[  599.580967] ceph-osd        D 0000000000000006     0  3248      1 
0x00000000
[  599.580968]  ffff880bc4201c38 0000000000000082 ffff880e3efc4480 
0000000000012280
[  599.580970]  ffff880bc4201fd8 ffff880bc4200010 0000000000012280 
0000000000012280
[  599.580971]  ffff880bc4201fd8 0000000000012280 ffff880e78f4a180 
ffff880e3efc4480
[  599.580971] Call Trace:
[  599.580972]  [<ffffffff81620334>] schedule+0x24/0x70
[  599.580978]  [<ffffffffa009dd65>] 
btrfs_commit_transaction_async+0x1d5/0x240 [btrfs]
[  599.580983]  [<ffffffffa0081eb6>] ? block_rsv_add_bytes+0x26/0x60 [btrfs]
[  599.580985]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
[  599.580990]  [<ffffffffa0081f25>] ? block_rsv_migrate_bytes+0x35/0x50 
[btrfs]
[  599.580997]  [<ffffffffa00ced2e>] btrfs_mksubvol+0x2be/0x350 [btrfs]
[  599.581004]  [<ffffffffa00ceec9>] 
btrfs_ioctl_snap_create_transid+0x109/0x1a0 [btrfs]
[  599.581010]  [<ffffffffa00d0bc4>] 
btrfs_ioctl_snap_create_v2+0x84/0xf0 [btrfs]
[  599.581016]  [<ffffffffa00d353f>] btrfs_ioctl+0x76f/0x12d0 [btrfs]
[  599.581019]  [<ffffffff8114459a>] ? fsnotify+0x1ba/0x2e0
[  599.581021]  [<ffffffff8111ade3>] do_vfs_ioctl+0x93/0x4f0
[  599.581022]  [<ffffffff8111b28a>] sys_ioctl+0x4a/0x80
[  599.581024]  [<ffffffff81621ba2>] system_call_fastpath+0x16/0x1b
[  599.581035] INFO: task ceph-osd:3278 blocked for more than 120 seconds.
[  599.581035] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[  599.581036] ceph-osd        D 0000000000000002     0  3278      1 
0x00000000
[  599.581038]  ffff880bc41fbc38 0000000000000082 ffff880e412283c0 
0000000000012280
[  599.581039]  ffff880bc41fbfd8 ffff880bc41fa010 0000000000012280 
0000000000012280
[  599.581040]  ffff880bc41fbfd8 0000000000012280 ffff880e46cce380 
ffff880e412283c0
[  599.581040] Call Trace:
[  599.581042]  [<ffffffff81620334>] schedule+0x24/0x70
[  599.581048]  [<ffffffffa009dd65>] 
btrfs_commit_transaction_async+0x1d5/0x240 [btrfs]
[  599.581053]  [<ffffffffa0081eb6>] ? block_rsv_add_bytes+0x26/0x60 [btrfs]
[  599.581055]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
[  599.581059]  [<ffffffffa0081f25>] ? block_rsv_migrate_bytes+0x35/0x50 
[btrfs]
[  599.581066]  [<ffffffffa00ced2e>] btrfs_mksubvol+0x2be/0x350 [btrfs]
[  599.581071]  [<ffffffffa00ceec9>] 
btrfs_ioctl_snap_create_transid+0x109/0x1a0 [btrfs]
[  599.581077]  [<ffffffffa00d0bc4>] 
btrfs_ioctl_snap_create_v2+0x84/0xf0 [btrfs]
[  599.581082]  [<ffffffffa00d353f>] btrfs_ioctl+0x76f/0x12d0 [btrfs]
[  599.581084]  [<ffffffff8114459a>] ? fsnotify+0x1ba/0x2e0
[  599.581085]  [<ffffffff8111ade3>] do_vfs_ioctl+0x93/0x4f0
[  599.581086]  [<ffffffff8111b28a>] sys_ioctl+0x4a/0x80
[  599.581088]  [<ffffffff81621ba2>] system_call_fastpath+0x16/0x1b
[  599.581094] INFO: task ceph-osd:3330 blocked for more than 120 seconds.
[  599.581094] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[  599.581095] ceph-osd        D 0000000000000005     0  3330      1 
0x00000000
[  599.581096]  ffff880bc3bf9c38 0000000000000086 ffff880bc3bd4740 
0000000000012280
[  599.581098]  ffff880bc3bf9fd8 ffff880bc3bf8010 0000000000012280 
0000000000012280
[  599.581099]  ffff880bc3bf9fd8 0000000000012280 ffff880e46d36600 
ffff880bc3bd4740
[  599.581099] Call Trace:
[  599.581101]  [<ffffffff81620334>] schedule+0x24/0x70
[  599.581106]  [<ffffffffa009dd65>] 
btrfs_commit_transaction_async+0x1d5/0x240 [btrfs]
[  599.581111]  [<ffffffffa0081eb6>] ? block_rsv_add_bytes+0x26/0x60 [btrfs]
[  599.581113]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
[  599.581118]  [<ffffffffa0081f25>] ? block_rsv_migrate_bytes+0x35/0x50 
[btrfs]
[  599.581123]  [<ffffffffa00ced2e>] btrfs_mksubvol+0x2be/0x350 [btrfs]
[  599.581129]  [<ffffffffa00ceec9>] 
btrfs_ioctl_snap_create_transid+0x109/0x1a0 [btrfs]
[  599.581134]  [<ffffffffa00d0bc4>] 
btrfs_ioctl_snap_create_v2+0x84/0xf0 [btrfs]
[  599.581139]  [<ffffffffa00d353f>] btrfs_ioctl+0x76f/0x12d0 [btrfs]
[  599.581141]  [<ffffffff8114459a>] ? fsnotify+0x1ba/0x2e0
[  599.581142]  [<ffffffff8111ade3>] do_vfs_ioctl+0x93/0x4f0
[  599.581143]  [<ffffffff8111b28a>] sys_ioctl+0x4a/0x80
[  599.581145]  [<ffffffff81621ba2>] system_call_fastpath+0x16/0x1b
[  719.395814] INFO: task kworker/6:0:29 blocked for more than 120 seconds.
[  719.417426] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[  719.439537] kworker/6:0     D ffffffff8180ebc0     0    29      2 
0x00000000
[  719.461937]  ffff880e78f4dc90 0000000000000046 ffff880e78f4a180 
0000000000012280
[  719.484547]  ffff880e78f4dfd8 ffff880e78f4c010 0000000000012280 
0000000000012280
[  719.507211]  ffff880e78f4dfd8 0000000000012280 ffff880e78e8c340 
ffff880e78f4a180
[  719.530071] Call Trace:
[  719.552729]  [<ffffffff81620334>] schedule+0x24/0x70
[  719.552741]  [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs]
[  719.552744]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
[  719.552752]  [<ffffffffa009d095>] 
btrfs_commit_transaction+0x655/0xab0 [btrfs]
[  719.552755]  [<ffffffff81076058>] ? idle_balance+0x108/0x130
[  719.552756]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
[  719.552762]  [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 
[btrfs]
[  719.552768]  [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs]
[  719.552770]  [<ffffffff81058eff>] process_one_work+0x11f/0x470
[  719.552771]  [<ffffffff8105b128>] worker_thread+0x178/0x400
[  719.552772]  [<ffffffff8105afb0>] ? manage_workers+0x210/0x210
[  719.552774]  [<ffffffff8105fc46>] kthread+0x96/0xa0
[  719.552776]  [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10
[  719.552777]  [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130
[  719.552779]  [<ffffffff81622dd0>] ? gs_change+0xb/0xb
[  719.552781] INFO: task kworker/3:1:578 blocked for more than 120 seconds.
[  719.552781] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[  719.552783] kworker/3:1     D ffffffff8180ebc0     0   578      2 
0x00000000
[  719.552784]  ffff880e45dd3c90 0000000000000046 ffff880e7890e080 
0000000000012280
[  719.552786]  ffff880e45dd3fd8 ffff880e45dd2010 0000000000012280 
0000000000012280
[  719.552787]  ffff880e45dd3fd8 0000000000012280 ffff880e78e7e280 
ffff880e7890e080
[  719.552787] Call Trace:
[  719.552789]  [<ffffffff81620334>] schedule+0x24/0x70
[  719.552795]  [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs]
[  719.552797]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
[  719.552802]  [<ffffffffa009d095>] 
btrfs_commit_transaction+0x655/0xab0 [btrfs]
[  719.552804]  [<ffffffff81076058>] ? idle_balance+0x108/0x130
[  719.552805]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
[  719.552810]  [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 
[btrfs]
[  719.552816]  [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs]
[  719.552817]  [<ffffffff81058eff>] process_one_work+0x11f/0x470
[  719.552818]  [<ffffffff8105b128>] worker_thread+0x178/0x400
[  719.552819]  [<ffffffff8105afb0>] ? manage_workers+0x210/0x210
[  719.552820]  [<ffffffff8105fc46>] kthread+0x96/0xa0
[  719.552822]  [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10
[  719.552824]  [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130
[  719.552825]  [<ffffffff81622dd0>] ? gs_change+0xb/0xb
[  719.552827] INFO: task kworker/2:1:1184 blocked for more than 120 
seconds.
[  719.552827] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[  719.552828] kworker/2:1     D ffffffff8180ebc0     0  1184      2 
0x00000000
[  719.552830]  ffff880e448cfc90 0000000000000046 ffff880e46cce380 
0000000000012280
[  719.552831]  ffff880e448cffd8 ffff880e448ce010 0000000000012280 
0000000000012280
[  719.552832]  ffff880e448cffd8 0000000000012280 ffff880e78e5a240 
ffff880e46cce380
[  719.552832] Call Trace:
[  719.552834]  [<ffffffff81620334>] schedule+0x24/0x70
[  719.552839]  [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs]
[  719.552841]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
[  719.552846]  [<ffffffffa009d095>] 
btrfs_commit_transaction+0x655/0xab0 [btrfs]
[  719.552848]  [<ffffffff81076058>] ? idle_balance+0x108/0x130
[  719.552849]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
[  719.552854]  [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 
[btrfs]
[  719.552859]  [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs]
[  719.552861]  [<ffffffff81058eff>] process_one_work+0x11f/0x470
[  719.552862]  [<ffffffff8105b128>] worker_thread+0x178/0x400
[  719.552863]  [<ffffffff8105afb0>] ? manage_workers+0x210/0x210
[  719.552864]  [<ffffffff8105fc46>] kthread+0x96/0xa0
[  719.552866]  [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10
[  719.552867]  [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130
[  719.552868]  [<ffffffff81622dd0>] ? gs_change+0xb/0xb
[  719.552869] INFO: task kworker/5:1:1186 blocked for more than 120 
seconds.
[  719.552869] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[  719.552870] kworker/5:1     D ffffffff8180ebc0     0  1186      2 
0x00000000
[  719.552872]  ffff880e448a3c90 0000000000000046 ffff880e46d36600 
0000000000012280
[  719.552873]  ffff880e448a3fd8 ffff880e448a2010 0000000000012280 
0000000000012280
[  719.552874]  ffff880e448a3fd8 0000000000012280 ffff880e78e88300 
ffff880e46d36600
[  719.552874] Call Trace:
[  719.552876]  [<ffffffff81620334>] schedule+0x24/0x70
[  719.552881]  [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs]
[  719.552883]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
[  719.552888]  [<ffffffffa009d095>] 
btrfs_commit_transaction+0x655/0xab0 [btrfs]
[  719.552890]  [<ffffffff81076058>] ? idle_balance+0x108/0x130
[  719.552891]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
[  719.552896]  [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 
[btrfs]
[  719.552901]  [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs]
[  719.552902]  [<ffffffff81058eff>] process_one_work+0x11f/0x470
[  719.552903]  [<ffffffff8105b128>] worker_thread+0x178/0x400
[  719.552904]  [<ffffffff8105afb0>] ? manage_workers+0x210/0x210
[  719.552906]  [<ffffffff8105fc46>] kthread+0x96/0xa0
[  719.552907]  [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10
[  719.552909]  [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130
[  719.552910]  [<ffffffff81622dd0>] ? gs_change+0xb/0xb

Stefan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: btrfs deadlock in 3.5-rc3
  2012-06-23  8:50 btrfs deadlock in 3.5-rc3 Stefan Priebe
@ 2012-06-23 13:46 ` Michael
  2012-06-23 14:55   ` Stefan Priebe
  2012-06-25 13:08 ` Josef Bacik
  1 sibling, 1 reply; 23+ messages in thread
From: Michael @ 2012-06-23 13:46 UTC (permalink / raw)
  To: Stefan Priebe; +Cc: linux-btrfs

Hi Stefan,
Could you please give us details about your setup, especially what
kernel/btrfs version. If you aren't on the most recent, it would be
great if you could update and see if you can still reproduce, as there
was a major change in btrfs recently for Ceph. I'm not sure what
version it made it into.
Regards,
Michael

On Sat, Jun 23, 2012 at 3:50 AM, Stefan Priebe <s.priebe@profihost.ag> wrote:
>
> Hello list,
>
> i've seen this deadlock today and can reproduce it while using ceph.
>
> Is this a known bug?
>
> [  599.514534] INFO: task kworker/6:0:29 blocked for more than 120
> seconds.
> [  599.530394] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [  599.546832] kworker/6:0     D ffffffff8180ebc0     0    29      2
> 0x00000000
> [  599.563725]  ffff880e78f4dc90 0000000000000046 ffff880e78f4a180
> 0000000000012280
> [  599.580812]  ffff880e78f4dfd8 ffff880e78f4c010 0000000000012280
> 0000000000012280
> [  599.580813]  ffff880e78f4dfd8 0000000000012280 ffff880e78e8c340
> ffff880e78f4a180
> [  599.580814] Call Trace:
> [  599.580820]  [<ffffffff81620334>] schedule+0x24/0x70
> [  599.580832]  [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs]
> [  599.580836]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
> [  599.580844]  [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0
> [btrfs]
> [  599.580846]  [<ffffffff81076058>] ? idle_balance+0x108/0x130
> [  599.580847]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
> [  599.580853]  [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20
> [btrfs]
> [  599.580859]  [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs]
> [  599.580861]  [<ffffffff81058eff>] process_one_work+0x11f/0x470
> [  599.580862]  [<ffffffff8105b128>] worker_thread+0x178/0x400
> [  599.580863]  [<ffffffff8105afb0>] ? manage_workers+0x210/0x210
> [  599.580865]  [<ffffffff8105fc46>] kthread+0x96/0xa0
> [  599.580866]  [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10
> [  599.580868]  [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130
> [  599.580869]  [<ffffffff81622dd0>] ? gs_change+0xb/0xb
> [  599.580872] INFO: task kworker/2:1:1184 blocked for more than 120
> seconds.
> [  599.580872] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [  599.580873] kworker/2:1     D ffffffff8180ebc0     0  1184      2
> 0x00000000
> [  599.580874]  ffff880e448cfc90 0000000000000046 ffff880e46cce380
> 0000000000012280
> [  599.580875]  ffff880e448cffd8 ffff880e448ce010 0000000000012280
> 0000000000012280
> [  599.580876]  ffff880e448cffd8 0000000000012280 ffff880e78e5a240
> ffff880e46cce380
> [  599.580877] Call Trace:
> [  599.580879]  [<ffffffff81620334>] schedule+0x24/0x70
> [  599.580884]  [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs]
> [  599.580886]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
> [  599.580891]  [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0
> [btrfs]
> [  599.580893]  [<ffffffff81076058>] ? idle_balance+0x108/0x130
> [  599.580894]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
> [  599.580899]  [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20
> [btrfs]
> [  599.580905]  [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs]
> [  599.580906]  [<ffffffff81058eff>] process_one_work+0x11f/0x470
> [  599.580907]  [<ffffffff8105b128>] worker_thread+0x178/0x400
> [  599.580908]  [<ffffffff8105afb0>] ? manage_workers+0x210/0x210
> [  599.580910]  [<ffffffff8105fc46>] kthread+0x96/0xa0
> [  599.580911]  [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10
> [  599.580913]  [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130
> [  599.580914]  [<ffffffff81622dd0>] ? gs_change+0xb/0xb
> [  599.580915] INFO: task kworker/5:1:1186 blocked for more than 120
> seconds.
> [  599.580915] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [  599.580916] kworker/5:1     D ffffffff8180ebc0     0  1186      2
> 0x00000000
> [  599.580917]  ffff880e448a3c90 0000000000000046 ffff880e46d36600
> 0000000000012280
> [  599.580919]  ffff880e448a3fd8 ffff880e448a2010 0000000000012280
> 0000000000012280
> [  599.580920]  ffff880e448a3fd8 0000000000012280 ffff880e78e88300
> ffff880e46d36600
> [  599.580920] Call Trace:
> [  599.580922]  [<ffffffff81620334>] schedule+0x24/0x70
> [  599.580927]  [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs]
> [  599.580929]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
> [  599.580934]  [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0
> [btrfs]
> [  599.580936]  [<ffffffff81076058>] ? idle_balance+0x108/0x130
> [  599.580937]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
> [  599.580942]  [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20
> [btrfs]
> [  599.580948]  [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs]
> [  599.580949]  [<ffffffff81058eff>] process_one_work+0x11f/0x470
> [  599.580950]  [<ffffffff8105b128>] worker_thread+0x178/0x400
> [  599.580951]  [<ffffffff8105afb0>] ? manage_workers+0x210/0x210
> [  599.580952]  [<ffffffff8105fc46>] kthread+0x96/0xa0
> [  599.580954]  [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10
> [  599.580955]  [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130
> [  599.580957]  [<ffffffff81622dd0>] ? gs_change+0xb/0xb
> [  599.580966] INFO: task ceph-osd:3248 blocked for more than 120 seconds.
> [  599.580966] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [  599.580967] ceph-osd        D 0000000000000006     0  3248      1
> 0x00000000
> [  599.580968]  ffff880bc4201c38 0000000000000082 ffff880e3efc4480
> 0000000000012280
> [  599.580970]  ffff880bc4201fd8 ffff880bc4200010 0000000000012280
> 0000000000012280
> [  599.580971]  ffff880bc4201fd8 0000000000012280 ffff880e78f4a180
> ffff880e3efc4480
> [  599.580971] Call Trace:
> [  599.580972]  [<ffffffff81620334>] schedule+0x24/0x70
> [  599.580978]  [<ffffffffa009dd65>]
> btrfs_commit_transaction_async+0x1d5/0x240 [btrfs]
> [  599.580983]  [<ffffffffa0081eb6>] ? block_rsv_add_bytes+0x26/0x60
> [btrfs]
> [  599.580985]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
> [  599.580990]  [<ffffffffa0081f25>] ? block_rsv_migrate_bytes+0x35/0x50
> [btrfs]
> [  599.580997]  [<ffffffffa00ced2e>] btrfs_mksubvol+0x2be/0x350 [btrfs]
> [  599.581004]  [<ffffffffa00ceec9>]
> btrfs_ioctl_snap_create_transid+0x109/0x1a0 [btrfs]
> [  599.581010]  [<ffffffffa00d0bc4>] btrfs_ioctl_snap_create_v2+0x84/0xf0
> [btrfs]
> [  599.581016]  [<ffffffffa00d353f>] btrfs_ioctl+0x76f/0x12d0 [btrfs]
> [  599.581019]  [<ffffffff8114459a>] ? fsnotify+0x1ba/0x2e0
> [  599.581021]  [<ffffffff8111ade3>] do_vfs_ioctl+0x93/0x4f0
> [  599.581022]  [<ffffffff8111b28a>] sys_ioctl+0x4a/0x80
> [  599.581024]  [<ffffffff81621ba2>] system_call_fastpath+0x16/0x1b
> [  599.581035] INFO: task ceph-osd:3278 blocked for more than 120 seconds.
> [  599.581035] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [  599.581036] ceph-osd        D 0000000000000002     0  3278      1
> 0x00000000
> [  599.581038]  ffff880bc41fbc38 0000000000000082 ffff880e412283c0
> 0000000000012280
> [  599.581039]  ffff880bc41fbfd8 ffff880bc41fa010 0000000000012280
> 0000000000012280
> [  599.581040]  ffff880bc41fbfd8 0000000000012280 ffff880e46cce380
> ffff880e412283c0
> [  599.581040] Call Trace:
> [  599.581042]  [<ffffffff81620334>] schedule+0x24/0x70
> [  599.581048]  [<ffffffffa009dd65>]
> btrfs_commit_transaction_async+0x1d5/0x240 [btrfs]
> [  599.581053]  [<ffffffffa0081eb6>] ? block_rsv_add_bytes+0x26/0x60
> [btrfs]
> [  599.581055]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
> [  599.581059]  [<ffffffffa0081f25>] ? block_rsv_migrate_bytes+0x35/0x50
> [btrfs]
> [  599.581066]  [<ffffffffa00ced2e>] btrfs_mksubvol+0x2be/0x350 [btrfs]
> [  599.581071]  [<ffffffffa00ceec9>]
> btrfs_ioctl_snap_create_transid+0x109/0x1a0 [btrfs]
> [  599.581077]  [<ffffffffa00d0bc4>] btrfs_ioctl_snap_create_v2+0x84/0xf0
> [btrfs]
> [  599.581082]  [<ffffffffa00d353f>] btrfs_ioctl+0x76f/0x12d0 [btrfs]
> [  599.581084]  [<ffffffff8114459a>] ? fsnotify+0x1ba/0x2e0
> [  599.581085]  [<ffffffff8111ade3>] do_vfs_ioctl+0x93/0x4f0
> [  599.581086]  [<ffffffff8111b28a>] sys_ioctl+0x4a/0x80
> [  599.581088]  [<ffffffff81621ba2>] system_call_fastpath+0x16/0x1b
> [  599.581094] INFO: task ceph-osd:3330 blocked for more than 120 seconds.
> [  599.581094] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [  599.581095] ceph-osd        D 0000000000000005     0  3330      1
> 0x00000000
> [  599.581096]  ffff880bc3bf9c38 0000000000000086 ffff880bc3bd4740
> 0000000000012280
> [  599.581098]  ffff880bc3bf9fd8 ffff880bc3bf8010 0000000000012280
> 0000000000012280
> [  599.581099]  ffff880bc3bf9fd8 0000000000012280 ffff880e46d36600
> ffff880bc3bd4740
> [  599.581099] Call Trace:
> [  599.581101]  [<ffffffff81620334>] schedule+0x24/0x70
> [  599.581106]  [<ffffffffa009dd65>]
> btrfs_commit_transaction_async+0x1d5/0x240 [btrfs]
> [  599.581111]  [<ffffffffa0081eb6>] ? block_rsv_add_bytes+0x26/0x60
> [btrfs]
> [  599.581113]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
> [  599.581118]  [<ffffffffa0081f25>] ? block_rsv_migrate_bytes+0x35/0x50
> [btrfs]
> [  599.581123]  [<ffffffffa00ced2e>] btrfs_mksubvol+0x2be/0x350 [btrfs]
> [  599.581129]  [<ffffffffa00ceec9>]
> btrfs_ioctl_snap_create_transid+0x109/0x1a0 [btrfs]
> [  599.581134]  [<ffffffffa00d0bc4>] btrfs_ioctl_snap_create_v2+0x84/0xf0
> [btrfs]
> [  599.581139]  [<ffffffffa00d353f>] btrfs_ioctl+0x76f/0x12d0 [btrfs]
> [  599.581141]  [<ffffffff8114459a>] ? fsnotify+0x1ba/0x2e0
> [  599.581142]  [<ffffffff8111ade3>] do_vfs_ioctl+0x93/0x4f0
> [  599.581143]  [<ffffffff8111b28a>] sys_ioctl+0x4a/0x80
> [  599.581145]  [<ffffffff81621ba2>] system_call_fastpath+0x16/0x1b
> [  719.395814] INFO: task kworker/6:0:29 blocked for more than 120
> seconds.
> [  719.417426] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [  719.439537] kworker/6:0     D ffffffff8180ebc0     0    29      2
> 0x00000000
> [  719.461937]  ffff880e78f4dc90 0000000000000046 ffff880e78f4a180
> 0000000000012280
> [  719.484547]  ffff880e78f4dfd8 ffff880e78f4c010 0000000000012280
> 0000000000012280
> [  719.507211]  ffff880e78f4dfd8 0000000000012280 ffff880e78e8c340
> ffff880e78f4a180
> [  719.530071] Call Trace:
> [  719.552729]  [<ffffffff81620334>] schedule+0x24/0x70
> [  719.552741]  [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs]
> [  719.552744]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
> [  719.552752]  [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0
> [btrfs]
> [  719.552755]  [<ffffffff81076058>] ? idle_balance+0x108/0x130
> [  719.552756]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
> [  719.552762]  [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20
> [btrfs]
> [  719.552768]  [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs]
> [  719.552770]  [<ffffffff81058eff>] process_one_work+0x11f/0x470
> [  719.552771]  [<ffffffff8105b128>] worker_thread+0x178/0x400
> [  719.552772]  [<ffffffff8105afb0>] ? manage_workers+0x210/0x210
> [  719.552774]  [<ffffffff8105fc46>] kthread+0x96/0xa0
> [  719.552776]  [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10
> [  719.552777]  [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130
> [  719.552779]  [<ffffffff81622dd0>] ? gs_change+0xb/0xb
> [  719.552781] INFO: task kworker/3:1:578 blocked for more than 120
> seconds.
> [  719.552781] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [  719.552783] kworker/3:1     D ffffffff8180ebc0     0   578      2
> 0x00000000
> [  719.552784]  ffff880e45dd3c90 0000000000000046 ffff880e7890e080
> 0000000000012280
> [  719.552786]  ffff880e45dd3fd8 ffff880e45dd2010 0000000000012280
> 0000000000012280
> [  719.552787]  ffff880e45dd3fd8 0000000000012280 ffff880e78e7e280
> ffff880e7890e080
> [  719.552787] Call Trace:
> [  719.552789]  [<ffffffff81620334>] schedule+0x24/0x70
> [  719.552795]  [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs]
> [  719.552797]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
> [  719.552802]  [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0
> [btrfs]
> [  719.552804]  [<ffffffff81076058>] ? idle_balance+0x108/0x130
> [  719.552805]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
> [  719.552810]  [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20
> [btrfs]
> [  719.552816]  [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs]
> [  719.552817]  [<ffffffff81058eff>] process_one_work+0x11f/0x470
> [  719.552818]  [<ffffffff8105b128>] worker_thread+0x178/0x400
> [  719.552819]  [<ffffffff8105afb0>] ? manage_workers+0x210/0x210
> [  719.552820]  [<ffffffff8105fc46>] kthread+0x96/0xa0
> [  719.552822]  [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10
> [  719.552824]  [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130
> [  719.552825]  [<ffffffff81622dd0>] ? gs_change+0xb/0xb
> [  719.552827] INFO: task kworker/2:1:1184 blocked for more than 120
> seconds.
> [  719.552827] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [  719.552828] kworker/2:1     D ffffffff8180ebc0     0  1184      2
> 0x00000000
> [  719.552830]  ffff880e448cfc90 0000000000000046 ffff880e46cce380
> 0000000000012280
> [  719.552831]  ffff880e448cffd8 ffff880e448ce010 0000000000012280
> 0000000000012280
> [  719.552832]  ffff880e448cffd8 0000000000012280 ffff880e78e5a240
> ffff880e46cce380
> [  719.552832] Call Trace:
> [  719.552834]  [<ffffffff81620334>] schedule+0x24/0x70
> [  719.552839]  [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs]
> [  719.552841]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
> [  719.552846]  [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0
> [btrfs]
> [  719.552848]  [<ffffffff81076058>] ? idle_balance+0x108/0x130
> [  719.552849]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
> [  719.552854]  [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20
> [btrfs]
> [  719.552859]  [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs]
> [  719.552861]  [<ffffffff81058eff>] process_one_work+0x11f/0x470
> [  719.552862]  [<ffffffff8105b128>] worker_thread+0x178/0x400
> [  719.552863]  [<ffffffff8105afb0>] ? manage_workers+0x210/0x210
> [  719.552864]  [<ffffffff8105fc46>] kthread+0x96/0xa0
> [  719.552866]  [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10
> [  719.552867]  [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130
> [  719.552868]  [<ffffffff81622dd0>] ? gs_change+0xb/0xb
> [  719.552869] INFO: task kworker/5:1:1186 blocked for more than 120
> seconds.
> [  719.552869] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [  719.552870] kworker/5:1     D ffffffff8180ebc0     0  1186      2
> 0x00000000
> [  719.552872]  ffff880e448a3c90 0000000000000046 ffff880e46d36600
> 0000000000012280
> [  719.552873]  ffff880e448a3fd8 ffff880e448a2010 0000000000012280
> 0000000000012280
> [  719.552874]  ffff880e448a3fd8 0000000000012280 ffff880e78e88300
> ffff880e46d36600
> [  719.552874] Call Trace:
> [  719.552876]  [<ffffffff81620334>] schedule+0x24/0x70
> [  719.552881]  [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs]
> [  719.552883]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
> [  719.552888]  [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0
> [btrfs]
> [  719.552890]  [<ffffffff81076058>] ? idle_balance+0x108/0x130
> [  719.552891]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
> [  719.552896]  [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20
> [btrfs]
> [  719.552901]  [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs]
> [  719.552902]  [<ffffffff81058eff>] process_one_work+0x11f/0x470
> [  719.552903]  [<ffffffff8105b128>] worker_thread+0x178/0x400
> [  719.552904]  [<ffffffff8105afb0>] ? manage_workers+0x210/0x210
> [  719.552906]  [<ffffffff8105fc46>] kthread+0x96/0xa0
> [  719.552907]  [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10
> [  719.552909]  [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130
> [  719.552910]  [<ffffffff81622dd0>] ? gs_change+0xb/0xb
>
> Stefan
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: btrfs deadlock in 3.5-rc3
  2012-06-23 13:46 ` Michael
@ 2012-06-23 14:55   ` Stefan Priebe
  0 siblings, 0 replies; 23+ messages in thread
From: Stefan Priebe @ 2012-06-23 14:55 UTC (permalink / raw)
  To: Michael; +Cc: linux-btrfs

I'm on linux kernel 3.5.0-rc3 like I mentioned in subject ;-). Kernel.org says this is the latest.

Greets
Stefan

Am 23.06.2012 um 15:46 schrieb Michael <mike@draftx.net>:

> Hi Stefan,
> Could you please give us details about your setup, especially what
> kernel/btrfs version. If you aren't on the most recent, it would be
> great if you could update and see if you can still reproduce, as there
> was a major change in btrfs recently for Ceph. I'm not sure what
> version it made it into.
> Regards,
> Michael
> 
> On Sat, Jun 23, 2012 at 3:50 AM, Stefan Priebe <s.priebe@profihost.ag> wrote:
>> 
>> Hello list,
>> 
>> i've seen this deadlock today and can reproduce it while using ceph.
>> 
>> Is this a known bug?
>> 
>> [  599.514534] INFO: task kworker/6:0:29 blocked for more than 120
>> seconds.
>> [  599.530394] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
>> this message.
>> [  599.546832] kworker/6:0     D ffffffff8180ebc0     0    29      2
>> 0x00000000
>> [  599.563725]  ffff880e78f4dc90 0000000000000046 ffff880e78f4a180
>> 0000000000012280
>> [  599.580812]  ffff880e78f4dfd8 ffff880e78f4c010 0000000000012280
>> 0000000000012280
>> [  599.580813]  ffff880e78f4dfd8 0000000000012280 ffff880e78e8c340
>> ffff880e78f4a180
>> [  599.580814] Call Trace:
>> [  599.580820]  [<ffffffff81620334>] schedule+0x24/0x70
>> [  599.580832]  [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs]
>> [  599.580836]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
>> [  599.580844]  [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0
>> [btrfs]
>> [  599.580846]  [<ffffffff81076058>] ? idle_balance+0x108/0x130
>> [  599.580847]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
>> [  599.580853]  [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20
>> [btrfs]
>> [  599.580859]  [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs]
>> [  599.580861]  [<ffffffff81058eff>] process_one_work+0x11f/0x470
>> [  599.580862]  [<ffffffff8105b128>] worker_thread+0x178/0x400
>> [  599.580863]  [<ffffffff8105afb0>] ? manage_workers+0x210/0x210
>> [  599.580865]  [<ffffffff8105fc46>] kthread+0x96/0xa0
>> [  599.580866]  [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10
>> [  599.580868]  [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130
>> [  599.580869]  [<ffffffff81622dd0>] ? gs_change+0xb/0xb
>> [  599.580872] INFO: task kworker/2:1:1184 blocked for more than 120
>> seconds.
>> [  599.580872] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
>> this message.
>> [  599.580873] kworker/2:1     D ffffffff8180ebc0     0  1184      2
>> 0x00000000
>> [  599.580874]  ffff880e448cfc90 0000000000000046 ffff880e46cce380
>> 0000000000012280
>> [  599.580875]  ffff880e448cffd8 ffff880e448ce010 0000000000012280
>> 0000000000012280
>> [  599.580876]  ffff880e448cffd8 0000000000012280 ffff880e78e5a240
>> ffff880e46cce380
>> [  599.580877] Call Trace:
>> [  599.580879]  [<ffffffff81620334>] schedule+0x24/0x70
>> [  599.580884]  [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs]
>> [  599.580886]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
>> [  599.580891]  [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0
>> [btrfs]
>> [  599.580893]  [<ffffffff81076058>] ? idle_balance+0x108/0x130
>> [  599.580894]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
>> [  599.580899]  [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20
>> [btrfs]
>> [  599.580905]  [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs]
>> [  599.580906]  [<ffffffff81058eff>] process_one_work+0x11f/0x470
>> [  599.580907]  [<ffffffff8105b128>] worker_thread+0x178/0x400
>> [  599.580908]  [<ffffffff8105afb0>] ? manage_workers+0x210/0x210
>> [  599.580910]  [<ffffffff8105fc46>] kthread+0x96/0xa0
>> [  599.580911]  [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10
>> [  599.580913]  [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130
>> [  599.580914]  [<ffffffff81622dd0>] ? gs_change+0xb/0xb
>> [  599.580915] INFO: task kworker/5:1:1186 blocked for more than 120
>> seconds.
>> [  599.580915] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
>> this message.
>> [  599.580916] kworker/5:1     D ffffffff8180ebc0     0  1186      2
>> 0x00000000
>> [  599.580917]  ffff880e448a3c90 0000000000000046 ffff880e46d36600
>> 0000000000012280
>> [  599.580919]  ffff880e448a3fd8 ffff880e448a2010 0000000000012280
>> 0000000000012280
>> [  599.580920]  ffff880e448a3fd8 0000000000012280 ffff880e78e88300
>> ffff880e46d36600
>> [  599.580920] Call Trace:
>> [  599.580922]  [<ffffffff81620334>] schedule+0x24/0x70
>> [  599.580927]  [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs]
>> [  599.580929]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
>> [  599.580934]  [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0
>> [btrfs]
>> [  599.580936]  [<ffffffff81076058>] ? idle_balance+0x108/0x130
>> [  599.580937]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
>> [  599.580942]  [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20
>> [btrfs]
>> [  599.580948]  [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs]
>> [  599.580949]  [<ffffffff81058eff>] process_one_work+0x11f/0x470
>> [  599.580950]  [<ffffffff8105b128>] worker_thread+0x178/0x400
>> [  599.580951]  [<ffffffff8105afb0>] ? manage_workers+0x210/0x210
>> [  599.580952]  [<ffffffff8105fc46>] kthread+0x96/0xa0
>> [  599.580954]  [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10
>> [  599.580955]  [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130
>> [  599.580957]  [<ffffffff81622dd0>] ? gs_change+0xb/0xb
>> [  599.580966] INFO: task ceph-osd:3248 blocked for more than 120 seconds.
>> [  599.580966] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
>> this message.
>> [  599.580967] ceph-osd        D 0000000000000006     0  3248      1
>> 0x00000000
>> [  599.580968]  ffff880bc4201c38 0000000000000082 ffff880e3efc4480
>> 0000000000012280
>> [  599.580970]  ffff880bc4201fd8 ffff880bc4200010 0000000000012280
>> 0000000000012280
>> [  599.580971]  ffff880bc4201fd8 0000000000012280 ffff880e78f4a180
>> ffff880e3efc4480
>> [  599.580971] Call Trace:
>> [  599.580972]  [<ffffffff81620334>] schedule+0x24/0x70
>> [  599.580978]  [<ffffffffa009dd65>]
>> btrfs_commit_transaction_async+0x1d5/0x240 [btrfs]
>> [  599.580983]  [<ffffffffa0081eb6>] ? block_rsv_add_bytes+0x26/0x60
>> [btrfs]
>> [  599.580985]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
>> [  599.580990]  [<ffffffffa0081f25>] ? block_rsv_migrate_bytes+0x35/0x50
>> [btrfs]
>> [  599.580997]  [<ffffffffa00ced2e>] btrfs_mksubvol+0x2be/0x350 [btrfs]
>> [  599.581004]  [<ffffffffa00ceec9>]
>> btrfs_ioctl_snap_create_transid+0x109/0x1a0 [btrfs]
>> [  599.581010]  [<ffffffffa00d0bc4>] btrfs_ioctl_snap_create_v2+0x84/0xf0
>> [btrfs]
>> [  599.581016]  [<ffffffffa00d353f>] btrfs_ioctl+0x76f/0x12d0 [btrfs]
>> [  599.581019]  [<ffffffff8114459a>] ? fsnotify+0x1ba/0x2e0
>> [  599.581021]  [<ffffffff8111ade3>] do_vfs_ioctl+0x93/0x4f0
>> [  599.581022]  [<ffffffff8111b28a>] sys_ioctl+0x4a/0x80
>> [  599.581024]  [<ffffffff81621ba2>] system_call_fastpath+0x16/0x1b
>> [  599.581035] INFO: task ceph-osd:3278 blocked for more than 120 seconds.
>> [  599.581035] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
>> this message.
>> [  599.581036] ceph-osd        D 0000000000000002     0  3278      1
>> 0x00000000
>> [  599.581038]  ffff880bc41fbc38 0000000000000082 ffff880e412283c0
>> 0000000000012280
>> [  599.581039]  ffff880bc41fbfd8 ffff880bc41fa010 0000000000012280
>> 0000000000012280
>> [  599.581040]  ffff880bc41fbfd8 0000000000012280 ffff880e46cce380
>> ffff880e412283c0
>> [  599.581040] Call Trace:
>> [  599.581042]  [<ffffffff81620334>] schedule+0x24/0x70
>> [  599.581048]  [<ffffffffa009dd65>]
>> btrfs_commit_transaction_async+0x1d5/0x240 [btrfs]
>> [  599.581053]  [<ffffffffa0081eb6>] ? block_rsv_add_bytes+0x26/0x60
>> [btrfs]
>> [  599.581055]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
>> [  599.581059]  [<ffffffffa0081f25>] ? block_rsv_migrate_bytes+0x35/0x50
>> [btrfs]
>> [  599.581066]  [<ffffffffa00ced2e>] btrfs_mksubvol+0x2be/0x350 [btrfs]
>> [  599.581071]  [<ffffffffa00ceec9>]
>> btrfs_ioctl_snap_create_transid+0x109/0x1a0 [btrfs]
>> [  599.581077]  [<ffffffffa00d0bc4>] btrfs_ioctl_snap_create_v2+0x84/0xf0
>> [btrfs]
>> [  599.581082]  [<ffffffffa00d353f>] btrfs_ioctl+0x76f/0x12d0 [btrfs]
>> [  599.581084]  [<ffffffff8114459a>] ? fsnotify+0x1ba/0x2e0
>> [  599.581085]  [<ffffffff8111ade3>] do_vfs_ioctl+0x93/0x4f0
>> [  599.581086]  [<ffffffff8111b28a>] sys_ioctl+0x4a/0x80
>> [  599.581088]  [<ffffffff81621ba2>] system_call_fastpath+0x16/0x1b
>> [  599.581094] INFO: task ceph-osd:3330 blocked for more than 120 seconds.
>> [  599.581094] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
>> this message.
>> [  599.581095] ceph-osd        D 0000000000000005     0  3330      1
>> 0x00000000
>> [  599.581096]  ffff880bc3bf9c38 0000000000000086 ffff880bc3bd4740
>> 0000000000012280
>> [  599.581098]  ffff880bc3bf9fd8 ffff880bc3bf8010 0000000000012280
>> 0000000000012280
>> [  599.581099]  ffff880bc3bf9fd8 0000000000012280 ffff880e46d36600
>> ffff880bc3bd4740
>> [  599.581099] Call Trace:
>> [  599.581101]  [<ffffffff81620334>] schedule+0x24/0x70
>> [  599.581106]  [<ffffffffa009dd65>]
>> btrfs_commit_transaction_async+0x1d5/0x240 [btrfs]
>> [  599.581111]  [<ffffffffa0081eb6>] ? block_rsv_add_bytes+0x26/0x60
>> [btrfs]
>> [  599.581113]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
>> [  599.581118]  [<ffffffffa0081f25>] ? block_rsv_migrate_bytes+0x35/0x50
>> [btrfs]
>> [  599.581123]  [<ffffffffa00ced2e>] btrfs_mksubvol+0x2be/0x350 [btrfs]
>> [  599.581129]  [<ffffffffa00ceec9>]
>> btrfs_ioctl_snap_create_transid+0x109/0x1a0 [btrfs]
>> [  599.581134]  [<ffffffffa00d0bc4>] btrfs_ioctl_snap_create_v2+0x84/0xf0
>> [btrfs]
>> [  599.581139]  [<ffffffffa00d353f>] btrfs_ioctl+0x76f/0x12d0 [btrfs]
>> [  599.581141]  [<ffffffff8114459a>] ? fsnotify+0x1ba/0x2e0
>> [  599.581142]  [<ffffffff8111ade3>] do_vfs_ioctl+0x93/0x4f0
>> [  599.581143]  [<ffffffff8111b28a>] sys_ioctl+0x4a/0x80
>> [  599.581145]  [<ffffffff81621ba2>] system_call_fastpath+0x16/0x1b
>> [  719.395814] INFO: task kworker/6:0:29 blocked for more than 120
>> seconds.
>> [  719.417426] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
>> this message.
>> [  719.439537] kworker/6:0     D ffffffff8180ebc0     0    29      2
>> 0x00000000
>> [  719.461937]  ffff880e78f4dc90 0000000000000046 ffff880e78f4a180
>> 0000000000012280
>> [  719.484547]  ffff880e78f4dfd8 ffff880e78f4c010 0000000000012280
>> 0000000000012280
>> [  719.507211]  ffff880e78f4dfd8 0000000000012280 ffff880e78e8c340
>> ffff880e78f4a180
>> [  719.530071] Call Trace:
>> [  719.552729]  [<ffffffff81620334>] schedule+0x24/0x70
>> [  719.552741]  [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs]
>> [  719.552744]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
>> [  719.552752]  [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0
>> [btrfs]
>> [  719.552755]  [<ffffffff81076058>] ? idle_balance+0x108/0x130
>> [  719.552756]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
>> [  719.552762]  [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20
>> [btrfs]
>> [  719.552768]  [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs]
>> [  719.552770]  [<ffffffff81058eff>] process_one_work+0x11f/0x470
>> [  719.552771]  [<ffffffff8105b128>] worker_thread+0x178/0x400
>> [  719.552772]  [<ffffffff8105afb0>] ? manage_workers+0x210/0x210
>> [  719.552774]  [<ffffffff8105fc46>] kthread+0x96/0xa0
>> [  719.552776]  [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10
>> [  719.552777]  [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130
>> [  719.552779]  [<ffffffff81622dd0>] ? gs_change+0xb/0xb
>> [  719.552781] INFO: task kworker/3:1:578 blocked for more than 120
>> seconds.
>> [  719.552781] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
>> this message.
>> [  719.552783] kworker/3:1     D ffffffff8180ebc0     0   578      2
>> 0x00000000
>> [  719.552784]  ffff880e45dd3c90 0000000000000046 ffff880e7890e080
>> 0000000000012280
>> [  719.552786]  ffff880e45dd3fd8 ffff880e45dd2010 0000000000012280
>> 0000000000012280
>> [  719.552787]  ffff880e45dd3fd8 0000000000012280 ffff880e78e7e280
>> ffff880e7890e080
>> [  719.552787] Call Trace:
>> [  719.552789]  [<ffffffff81620334>] schedule+0x24/0x70
>> [  719.552795]  [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs]
>> [  719.552797]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
>> [  719.552802]  [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0
>> [btrfs]
>> [  719.552804]  [<ffffffff81076058>] ? idle_balance+0x108/0x130
>> [  719.552805]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
>> [  719.552810]  [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20
>> [btrfs]
>> [  719.552816]  [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs]
>> [  719.552817]  [<ffffffff81058eff>] process_one_work+0x11f/0x470
>> [  719.552818]  [<ffffffff8105b128>] worker_thread+0x178/0x400
>> [  719.552819]  [<ffffffff8105afb0>] ? manage_workers+0x210/0x210
>> [  719.552820]  [<ffffffff8105fc46>] kthread+0x96/0xa0
>> [  719.552822]  [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10
>> [  719.552824]  [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130
>> [  719.552825]  [<ffffffff81622dd0>] ? gs_change+0xb/0xb
>> [  719.552827] INFO: task kworker/2:1:1184 blocked for more than 120
>> seconds.
>> [  719.552827] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
>> this message.
>> [  719.552828] kworker/2:1     D ffffffff8180ebc0     0  1184      2
>> 0x00000000
>> [  719.552830]  ffff880e448cfc90 0000000000000046 ffff880e46cce380
>> 0000000000012280
>> [  719.552831]  ffff880e448cffd8 ffff880e448ce010 0000000000012280
>> 0000000000012280
>> [  719.552832]  ffff880e448cffd8 0000000000012280 ffff880e78e5a240
>> ffff880e46cce380
>> [  719.552832] Call Trace:
>> [  719.552834]  [<ffffffff81620334>] schedule+0x24/0x70
>> [  719.552839]  [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs]
>> [  719.552841]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
>> [  719.552846]  [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0
>> [btrfs]
>> [  719.552848]  [<ffffffff81076058>] ? idle_balance+0x108/0x130
>> [  719.552849]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
>> [  719.552854]  [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20
>> [btrfs]
>> [  719.552859]  [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs]
>> [  719.552861]  [<ffffffff81058eff>] process_one_work+0x11f/0x470
>> [  719.552862]  [<ffffffff8105b128>] worker_thread+0x178/0x400
>> [  719.552863]  [<ffffffff8105afb0>] ? manage_workers+0x210/0x210
>> [  719.552864]  [<ffffffff8105fc46>] kthread+0x96/0xa0
>> [  719.552866]  [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10
>> [  719.552867]  [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130
>> [  719.552868]  [<ffffffff81622dd0>] ? gs_change+0xb/0xb
>> [  719.552869] INFO: task kworker/5:1:1186 blocked for more than 120
>> seconds.
>> [  719.552869] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
>> this message.
>> [  719.552870] kworker/5:1     D ffffffff8180ebc0     0  1186      2
>> 0x00000000
>> [  719.552872]  ffff880e448a3c90 0000000000000046 ffff880e46d36600
>> 0000000000012280
>> [  719.552873]  ffff880e448a3fd8 ffff880e448a2010 0000000000012280
>> 0000000000012280
>> [  719.552874]  ffff880e448a3fd8 0000000000012280 ffff880e78e88300
>> ffff880e46d36600
>> [  719.552874] Call Trace:
>> [  719.552876]  [<ffffffff81620334>] schedule+0x24/0x70
>> [  719.552881]  [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs]
>> [  719.552883]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
>> [  719.552888]  [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0
>> [btrfs]
>> [  719.552890]  [<ffffffff81076058>] ? idle_balance+0x108/0x130
>> [  719.552891]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
>> [  719.552896]  [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20
>> [btrfs]
>> [  719.552901]  [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs]
>> [  719.552902]  [<ffffffff81058eff>] process_one_work+0x11f/0x470
>> [  719.552903]  [<ffffffff8105b128>] worker_thread+0x178/0x400
>> [  719.552904]  [<ffffffff8105afb0>] ? manage_workers+0x210/0x210
>> [  719.552906]  [<ffffffff8105fc46>] kthread+0x96/0xa0
>> [  719.552907]  [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10
>> [  719.552909]  [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130
>> [  719.552910]  [<ffffffff81622dd0>] ? gs_change+0xb/0xb
>> 
>> Stefan
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: btrfs deadlock in 3.5-rc3
  2012-06-23  8:50 btrfs deadlock in 3.5-rc3 Stefan Priebe
  2012-06-23 13:46 ` Michael
@ 2012-06-25 13:08 ` Josef Bacik
  2012-06-25 14:08   ` Stefan Priebe - Profihost AG
  1 sibling, 1 reply; 23+ messages in thread
From: Josef Bacik @ 2012-06-25 13:08 UTC (permalink / raw)
  To: Stefan Priebe; +Cc: linux-btrfs

On 06/23/2012 04:50 AM, Stefan Priebe wrote:
> Hello list,
>
> i've seen this deadlock today and can reproduce it while using ceph.
>
> Is this a known bug?
>
> [  599.514534] INFO: task kworker/6:0:29 blocked for more than 120 seconds.
> [  599.530394] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [  599.546832] kworker/6:0     D ffffffff8180ebc0     0    29      2
> 0x00000000
> [  599.563725]  ffff880e78f4dc90 0000000000000046 ffff880e78f4a180
> 0000000000012280
> [  599.580812]  ffff880e78f4dfd8 ffff880e78f4c010 0000000000012280
> 0000000000012280
> [  599.580813]  ffff880e78f4dfd8 0000000000012280 ffff880e78e8c340
> ffff880e78f4a180
> [  599.580814] Call Trace:
> [  599.580820]  [<ffffffff81620334>] schedule+0x24/0x70
> [  599.580832]  [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs]
> [  599.580836]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
> [  599.580844]  [<ffffffffa009d095>]
> btrfs_commit_transaction+0x655/0xab0 [btrfs]
> [  599.580846]  [<ffffffff81076058>] ? idle_balance+0x108/0x130
> [  599.580847]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
> [  599.580853]  [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20
> [btrfs]
> [  599.580859]  [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs]
> [  599.580861]  [<ffffffff81058eff>] process_one_work+0x11f/0x470
> [  599.580862]  [<ffffffff8105b128>] worker_thread+0x178/0x400
> [  599.580863]  [<ffffffff8105afb0>] ? manage_workers+0x210/0x210
> [  599.580865]  [<ffffffff8105fc46>] kthread+0x96/0xa0
> [  599.580866]  [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10
> [  599.580868]  [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130
> [  599.580869]  [<ffffffff81622dd0>] ? gs_change+0xb/0xb
> [  599.580872] INFO: task kworker/2:1:1184 blocked for more than 120
> seconds.
> [  599.580872] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [  599.580873] kworker/2:1     D ffffffff8180ebc0     0  1184      2
> 0x00000000
> [  599.580874]  ffff880e448cfc90 0000000000000046 ffff880e46cce380
> 0000000000012280
> [  599.580875]  ffff880e448cffd8 ffff880e448ce010 0000000000012280
> 0000000000012280
> [  599.580876]  ffff880e448cffd8 0000000000012280 ffff880e78e5a240
> ffff880e46cce380
> [  599.580877] Call Trace:
> [  599.580879]  [<ffffffff81620334>] schedule+0x24/0x70
> [  599.580884]  [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs]
> [  599.580886]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
> [  599.580891]  [<ffffffffa009d095>]
> btrfs_commit_transaction+0x655/0xab0 [btrfs]
> [  599.580893]  [<ffffffff81076058>] ? idle_balance+0x108/0x130
> [  599.580894]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
> [  599.580899]  [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20
> [btrfs]
> [  599.580905]  [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs]
> [  599.580906]  [<ffffffff81058eff>] process_one_work+0x11f/0x470
> [  599.580907]  [<ffffffff8105b128>] worker_thread+0x178/0x400
> [  599.580908]  [<ffffffff8105afb0>] ? manage_workers+0x210/0x210
> [  599.580910]  [<ffffffff8105fc46>] kthread+0x96/0xa0
> [  599.580911]  [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10
> [  599.580913]  [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130
> [  599.580914]  [<ffffffff81622dd0>] ? gs_change+0xb/0xb
> [  599.580915] INFO: task kworker/5:1:1186 blocked for more than 120
> seconds.
> [  599.580915] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [  599.580916] kworker/5:1     D ffffffff8180ebc0     0  1186      2
> 0x00000000
> [  599.580917]  ffff880e448a3c90 0000000000000046 ffff880e46d36600
> 0000000000012280
> [  599.580919]  ffff880e448a3fd8 ffff880e448a2010 0000000000012280
> 0000000000012280
> [  599.580920]  ffff880e448a3fd8 0000000000012280 ffff880e78e88300
> ffff880e46d36600
> [  599.580920] Call Trace:
> [  599.580922]  [<ffffffff81620334>] schedule+0x24/0x70
> [  599.580927]  [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs]
> [  599.580929]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
> [  599.580934]  [<ffffffffa009d095>]
> btrfs_commit_transaction+0x655/0xab0 [btrfs]
> [  599.580936]  [<ffffffff81076058>] ? idle_balance+0x108/0x130
> [  599.580937]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
> [  599.580942]  [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20
> [btrfs]
> [  599.580948]  [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs]
> [  599.580949]  [<ffffffff81058eff>] process_one_work+0x11f/0x470
> [  599.580950]  [<ffffffff8105b128>] worker_thread+0x178/0x400
> [  599.580951]  [<ffffffff8105afb0>] ? manage_workers+0x210/0x210
> [  599.580952]  [<ffffffff8105fc46>] kthread+0x96/0xa0
> [  599.580954]  [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10
> [  599.580955]  [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130
> [  599.580957]  [<ffffffff81622dd0>] ? gs_change+0xb/0xb
> [  599.580966] INFO: task ceph-osd:3248 blocked for more than 120 seconds.
> [  599.580966] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [  599.580967] ceph-osd        D 0000000000000006     0  3248      1
> 0x00000000
> [  599.580968]  ffff880bc4201c38 0000000000000082 ffff880e3efc4480
> 0000000000012280
> [  599.580970]  ffff880bc4201fd8 ffff880bc4200010 0000000000012280
> 0000000000012280
> [  599.580971]  ffff880bc4201fd8 0000000000012280 ffff880e78f4a180
> ffff880e3efc4480
> [  599.580971] Call Trace:
> [  599.580972]  [<ffffffff81620334>] schedule+0x24/0x70
> [  599.580978]  [<ffffffffa009dd65>]
> btrfs_commit_transaction_async+0x1d5/0x240 [btrfs]
> [  599.580983]  [<ffffffffa0081eb6>] ? block_rsv_add_bytes+0x26/0x60 [btrfs]
> [  599.580985]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
> [  599.580990]  [<ffffffffa0081f25>] ? block_rsv_migrate_bytes+0x35/0x50
> [btrfs]
> [  599.580997]  [<ffffffffa00ced2e>] btrfs_mksubvol+0x2be/0x350 [btrfs]
> [  599.581004]  [<ffffffffa00ceec9>]
> btrfs_ioctl_snap_create_transid+0x109/0x1a0 [btrfs]
> [  599.581010]  [<ffffffffa00d0bc4>]
> btrfs_ioctl_snap_create_v2+0x84/0xf0 [btrfs]
> [  599.581016]  [<ffffffffa00d353f>] btrfs_ioctl+0x76f/0x12d0 [btrfs]
> [  599.581019]  [<ffffffff8114459a>] ? fsnotify+0x1ba/0x2e0
> [  599.581021]  [<ffffffff8111ade3>] do_vfs_ioctl+0x93/0x4f0
> [  599.581022]  [<ffffffff8111b28a>] sys_ioctl+0x4a/0x80
> [  599.581024]  [<ffffffff81621ba2>] system_call_fastpath+0x16/0x1b
> [  599.581035] INFO: task ceph-osd:3278 blocked for more than 120 seconds.
> [  599.581035] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [  599.581036] ceph-osd        D 0000000000000002     0  3278      1
> 0x00000000
> [  599.581038]  ffff880bc41fbc38 0000000000000082 ffff880e412283c0
> 0000000000012280
> [  599.581039]  ffff880bc41fbfd8 ffff880bc41fa010 0000000000012280
> 0000000000012280
> [  599.581040]  ffff880bc41fbfd8 0000000000012280 ffff880e46cce380
> ffff880e412283c0
> [  599.581040] Call Trace:
> [  599.581042]  [<ffffffff81620334>] schedule+0x24/0x70
> [  599.581048]  [<ffffffffa009dd65>]
> btrfs_commit_transaction_async+0x1d5/0x240 [btrfs]
> [  599.581053]  [<ffffffffa0081eb6>] ? block_rsv_add_bytes+0x26/0x60 [btrfs]
> [  599.581055]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
> [  599.581059]  [<ffffffffa0081f25>] ? block_rsv_migrate_bytes+0x35/0x50
> [btrfs]
> [  599.581066]  [<ffffffffa00ced2e>] btrfs_mksubvol+0x2be/0x350 [btrfs]
> [  599.581071]  [<ffffffffa00ceec9>]
> btrfs_ioctl_snap_create_transid+0x109/0x1a0 [btrfs]
> [  599.581077]  [<ffffffffa00d0bc4>]
> btrfs_ioctl_snap_create_v2+0x84/0xf0 [btrfs]
> [  599.581082]  [<ffffffffa00d353f>] btrfs_ioctl+0x76f/0x12d0 [btrfs]
> [  599.581084]  [<ffffffff8114459a>] ? fsnotify+0x1ba/0x2e0
> [  599.581085]  [<ffffffff8111ade3>] do_vfs_ioctl+0x93/0x4f0
> [  599.581086]  [<ffffffff8111b28a>] sys_ioctl+0x4a/0x80
> [  599.581088]  [<ffffffff81621ba2>] system_call_fastpath+0x16/0x1b
> [  599.581094] INFO: task ceph-osd:3330 blocked for more than 120 seconds.
> [  599.581094] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [  599.581095] ceph-osd        D 0000000000000005     0  3330      1
> 0x00000000
> [  599.581096]  ffff880bc3bf9c38 0000000000000086 ffff880bc3bd4740
> 0000000000012280
> [  599.581098]  ffff880bc3bf9fd8 ffff880bc3bf8010 0000000000012280
> 0000000000012280
> [  599.581099]  ffff880bc3bf9fd8 0000000000012280 ffff880e46d36600
> ffff880bc3bd4740
> [  599.581099] Call Trace:
> [  599.581101]  [<ffffffff81620334>] schedule+0x24/0x70
> [  599.581106]  [<ffffffffa009dd65>]
> btrfs_commit_transaction_async+0x1d5/0x240 [btrfs]
> [  599.581111]  [<ffffffffa0081eb6>] ? block_rsv_add_bytes+0x26/0x60 [btrfs]
> [  599.581113]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
> [  599.581118]  [<ffffffffa0081f25>] ? block_rsv_migrate_bytes+0x35/0x50
> [btrfs]
> [  599.581123]  [<ffffffffa00ced2e>] btrfs_mksubvol+0x2be/0x350 [btrfs]
> [  599.581129]  [<ffffffffa00ceec9>]
> btrfs_ioctl_snap_create_transid+0x109/0x1a0 [btrfs]
> [  599.581134]  [<ffffffffa00d0bc4>]
> btrfs_ioctl_snap_create_v2+0x84/0xf0 [btrfs]
> [  599.581139]  [<ffffffffa00d353f>] btrfs_ioctl+0x76f/0x12d0 [btrfs]
> [  599.581141]  [<ffffffff8114459a>] ? fsnotify+0x1ba/0x2e0
> [  599.581142]  [<ffffffff8111ade3>] do_vfs_ioctl+0x93/0x4f0
> [  599.581143]  [<ffffffff8111b28a>] sys_ioctl+0x4a/0x80
> [  599.581145]  [<ffffffff81621ba2>] system_call_fastpath+0x16/0x1b
> [  719.395814] INFO: task kworker/6:0:29 blocked for more than 120 seconds.
> [  719.417426] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [  719.439537] kworker/6:0     D ffffffff8180ebc0     0    29      2
> 0x00000000
> [  719.461937]  ffff880e78f4dc90 0000000000000046 ffff880e78f4a180
> 0000000000012280
> [  719.484547]  ffff880e78f4dfd8 ffff880e78f4c010 0000000000012280
> 0000000000012280
> [  719.507211]  ffff880e78f4dfd8 0000000000012280 ffff880e78e8c340
> ffff880e78f4a180
> [  719.530071] Call Trace:
> [  719.552729]  [<ffffffff81620334>] schedule+0x24/0x70
> [  719.552741]  [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs]
> [  719.552744]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
> [  719.552752]  [<ffffffffa009d095>]
> btrfs_commit_transaction+0x655/0xab0 [btrfs]
> [  719.552755]  [<ffffffff81076058>] ? idle_balance+0x108/0x130
> [  719.552756]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
> [  719.552762]  [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20
> [btrfs]
> [  719.552768]  [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs]
> [  719.552770]  [<ffffffff81058eff>] process_one_work+0x11f/0x470
> [  719.552771]  [<ffffffff8105b128>] worker_thread+0x178/0x400
> [  719.552772]  [<ffffffff8105afb0>] ? manage_workers+0x210/0x210
> [  719.552774]  [<ffffffff8105fc46>] kthread+0x96/0xa0
> [  719.552776]  [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10
> [  719.552777]  [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130
> [  719.552779]  [<ffffffff81622dd0>] ? gs_change+0xb/0xb
> [  719.552781] INFO: task kworker/3:1:578 blocked for more than 120 seconds.
> [  719.552781] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [  719.552783] kworker/3:1     D ffffffff8180ebc0     0   578      2
> 0x00000000
> [  719.552784]  ffff880e45dd3c90 0000000000000046 ffff880e7890e080
> 0000000000012280
> [  719.552786]  ffff880e45dd3fd8 ffff880e45dd2010 0000000000012280
> 0000000000012280
> [  719.552787]  ffff880e45dd3fd8 0000000000012280 ffff880e78e7e280
> ffff880e7890e080
> [  719.552787] Call Trace:
> [  719.552789]  [<ffffffff81620334>] schedule+0x24/0x70
> [  719.552795]  [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs]
> [  719.552797]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
> [  719.552802]  [<ffffffffa009d095>]
> btrfs_commit_transaction+0x655/0xab0 [btrfs]
> [  719.552804]  [<ffffffff81076058>] ? idle_balance+0x108/0x130
> [  719.552805]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
> [  719.552810]  [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20
> [btrfs]
> [  719.552816]  [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs]
> [  719.552817]  [<ffffffff81058eff>] process_one_work+0x11f/0x470
> [  719.552818]  [<ffffffff8105b128>] worker_thread+0x178/0x400
> [  719.552819]  [<ffffffff8105afb0>] ? manage_workers+0x210/0x210
> [  719.552820]  [<ffffffff8105fc46>] kthread+0x96/0xa0
> [  719.552822]  [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10
> [  719.552824]  [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130
> [  719.552825]  [<ffffffff81622dd0>] ? gs_change+0xb/0xb
> [  719.552827] INFO: task kworker/2:1:1184 blocked for more than 120
> seconds.
> [  719.552827] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [  719.552828] kworker/2:1     D ffffffff8180ebc0     0  1184      2
> 0x00000000
> [  719.552830]  ffff880e448cfc90 0000000000000046 ffff880e46cce380
> 0000000000012280
> [  719.552831]  ffff880e448cffd8 ffff880e448ce010 0000000000012280
> 0000000000012280
> [  719.552832]  ffff880e448cffd8 0000000000012280 ffff880e78e5a240
> ffff880e46cce380
> [  719.552832] Call Trace:
> [  719.552834]  [<ffffffff81620334>] schedule+0x24/0x70
> [  719.552839]  [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs]
> [  719.552841]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
> [  719.552846]  [<ffffffffa009d095>]
> btrfs_commit_transaction+0x655/0xab0 [btrfs]
> [  719.552848]  [<ffffffff81076058>] ? idle_balance+0x108/0x130
> [  719.552849]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
> [  719.552854]  [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20
> [btrfs]
> [  719.552859]  [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs]
> [  719.552861]  [<ffffffff81058eff>] process_one_work+0x11f/0x470
> [  719.552862]  [<ffffffff8105b128>] worker_thread+0x178/0x400
> [  719.552863]  [<ffffffff8105afb0>] ? manage_workers+0x210/0x210
> [  719.552864]  [<ffffffff8105fc46>] kthread+0x96/0xa0
> [  719.552866]  [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10
> [  719.552867]  [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130
> [  719.552868]  [<ffffffff81622dd0>] ? gs_change+0xb/0xb
> [  719.552869] INFO: task kworker/5:1:1186 blocked for more than 120
> seconds.
> [  719.552869] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [  719.552870] kworker/5:1     D ffffffff8180ebc0     0  1186      2
> 0x00000000
> [  719.552872]  ffff880e448a3c90 0000000000000046 ffff880e46d36600
> 0000000000012280
> [  719.552873]  ffff880e448a3fd8 ffff880e448a2010 0000000000012280
> 0000000000012280
> [  719.552874]  ffff880e448a3fd8 0000000000012280 ffff880e78e88300
> ffff880e46d36600
> [  719.552874] Call Trace:
> [  719.552876]  [<ffffffff81620334>] schedule+0x24/0x70
> [  719.552881]  [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs]
> [  719.552883]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
> [  719.552888]  [<ffffffffa009d095>]
> btrfs_commit_transaction+0x655/0xab0 [btrfs]
> [  719.552890]  [<ffffffff81076058>] ? idle_balance+0x108/0x130
> [  719.552891]  [<ffffffff81060140>] ? wake_up_bit+0x40/0x40
> [  719.552896]  [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20
> [btrfs]
> [  719.552901]  [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs]
> [  719.552902]  [<ffffffff81058eff>] process_one_work+0x11f/0x470
> [  719.552903]  [<ffffffff8105b128>] worker_thread+0x178/0x400
> [  719.552904]  [<ffffffff8105afb0>] ? manage_workers+0x210/0x210
> [  719.552906]  [<ffffffff8105fc46>] kthread+0x96/0xa0
> [  719.552907]  [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10
> [  719.552909]  [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130
> [  719.552910]  [<ffffffff81622dd0>] ? gs_change+0xb/0xb
>

This isn't showing the guy who's actually trying to commit the 
transaction.  Next time this happens can you do a sysrq+w and capture 
the output and post it here so we can see what everybody is doing?  Thanks,

Josef

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: btrfs deadlock in 3.5-rc3
  2012-06-25 13:08 ` Josef Bacik
@ 2012-06-25 14:08   ` Stefan Priebe - Profihost AG
  2012-06-25 14:20     ` Josef Bacik
  0 siblings, 1 reply; 23+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-06-25 14:08 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs

Am 25.06.2012 15:08, schrieb Josef Bacik:>
 > This isn't showing the guy who's actually trying to commit the
 > transaction.  Next time this happens can you do a sysrq+w and capture
 > the output and post it here so we can see what everybody is doing? 
Thanks,
 >
 > Josef

No problem.

Kernel trace:
http://pastebin.com/raw.php?i=puZkCRCn

# sys rq w trigger:
# echo w > /proc/sysrq-trigger
http://pastebin.com/raw.php?i=AQA8xxCX

Hope that helps. I'm able to trigger this pretty easily with ceph. So i 
can produce as much info as you want.

Thanks!

Greets,
Stefan


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: btrfs deadlock in 3.5-rc3
  2012-06-25 14:08   ` Stefan Priebe - Profihost AG
@ 2012-06-25 14:20     ` Josef Bacik
  2012-06-25 14:45       ` Stefan Priebe - Profihost AG
  0 siblings, 1 reply; 23+ messages in thread
From: Josef Bacik @ 2012-06-25 14:20 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: Josef Bacik, linux-btrfs

On Mon, Jun 25, 2012 at 08:08:31AM -0600, Stefan Priebe - Profihost AG wrote:
> Am 25.06.2012 15:08, schrieb Josef Bacik:>
>  > This isn't showing the guy who's actually trying to commit the
>  > transaction.  Next time this happens can you do a sysrq+w and capture
>  > the output and post it here so we can see what everybody is doing? 
> Thanks,
>  >
>  > Josef
> 
> No problem.
> 
> Kernel trace:
> http://pastebin.com/raw.php?i=puZkCRCn
> 
> # sys rq w trigger:
> # echo w > /proc/sysrq-trigger
> http://pastebin.com/raw.php?i=AQA8xxCX


Thats weird, sysrq+w should have a bunch of stacktraces but it's empty, so
unless theres a bug theres nothing blocked.  Is the box actually hung or is it
just taking forever?  Maybe try sysrq+w again to see if the one you pasted was
just a fluke?  Thanks,

Josef 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: btrfs deadlock in 3.5-rc3
  2012-06-25 14:20     ` Josef Bacik
@ 2012-06-25 14:45       ` Stefan Priebe - Profihost AG
  2012-06-25 14:48         ` Josef Bacik
  0 siblings, 1 reply; 23+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-06-25 14:45 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs

>
> Thats weird, sysrq+w should have a bunch of stacktraces but it's empty, so
> unless theres a bug theres nothing blocked.  Is the box actually hung or is it
> just taking forever?  Maybe try sysrq+w again to see if the one you pasted was
> just a fluke?  Thanks,

This one looks better:
http://pastebin.com/raw.php?i=R4pztDRt

Sorry.

Stefan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: btrfs deadlock in 3.5-rc3
  2012-06-25 14:45       ` Stefan Priebe - Profihost AG
@ 2012-06-25 14:48         ` Josef Bacik
  2012-06-25 17:38           ` Stefan Priebe
  0 siblings, 1 reply; 23+ messages in thread
From: Josef Bacik @ 2012-06-25 14:48 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: Josef Bacik, linux-btrfs

On Mon, Jun 25, 2012 at 08:45:02AM -0600, Stefan Priebe - Profihost AG wrote:
> >
> > Thats weird, sysrq+w should have a bunch of stacktraces but it's empty, so
> > unless theres a bug theres nothing blocked.  Is the box actually hung or is it
> > just taking forever?  Maybe try sysrq+w again to see if the one you pasted was
> > just a fluke?  Thanks,
> 
> This one looks better:
> http://pastebin.com/raw.php?i=R4pztDRt
> 

Ok looks like you have discard turned on.  Can you turn that off and see if you
can still reproduce the deadlock?  If so sysrq+w again, if not then I know where
to look ;).  Thanks,

Josef

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: btrfs deadlock in 3.5-rc3
  2012-06-25 14:48         ` Josef Bacik
@ 2012-06-25 17:38           ` Stefan Priebe
  2012-06-25 18:02             ` Josef Bacik
  0 siblings, 1 reply; 23+ messages in thread
From: Stefan Priebe @ 2012-06-25 17:38 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs


Am 25.06.2012 16:48, schrieb Josef Bacik:
> On Mon, Jun 25, 2012 at 08:45:02AM -0600, Stefan Priebe - Profihost AG wrote:
>>>
>>> Thats weird, sysrq+w should have a bunch of stacktraces but it's empty, so
>>> unless theres a bug theres nothing blocked.  Is the box actually hung or is it
>>> just taking forever?  Maybe try sysrq+w again to see if the one you pasted was
>>> just a fluke?  Thanks,
>>
>> This one looks better:
>> http://pastebin.com/raw.php?i=R4pztDRt
>>
>
> Ok looks like you have discard turned on.
Yes

 >  Can you turn that off and see if you
> can still reproduce the deadlock?  If so sysrq+w again, if not then I know where
> to look ;).  Thanks,
without discard i can't reproduce but random write speed with ceph 
without discard is a LOT slower (around 8000 iops/s instead of 
13000iops/s). So i don't know if it is discard or if i'm just not able 
to trigger it.

Thanks!

Stefan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: btrfs deadlock in 3.5-rc3
  2012-06-25 17:38           ` Stefan Priebe
@ 2012-06-25 18:02             ` Josef Bacik
  2012-06-25 18:28               ` Stefan Priebe
  0 siblings, 1 reply; 23+ messages in thread
From: Josef Bacik @ 2012-06-25 18:02 UTC (permalink / raw)
  To: Stefan Priebe; +Cc: Josef Bacik, linux-btrfs

On Mon, Jun 25, 2012 at 11:38:38AM -0600, Stefan Priebe wrote:
> 
> Am 25.06.2012 16:48, schrieb Josef Bacik:
> > On Mon, Jun 25, 2012 at 08:45:02AM -0600, Stefan Priebe - Profihost AG wrote:
> >>>
> >>> Thats weird, sysrq+w should have a bunch of stacktraces but it's empty, so
> >>> unless theres a bug theres nothing blocked.  Is the box actually hung or is it
> >>> just taking forever?  Maybe try sysrq+w again to see if the one you pasted was
> >>> just a fluke?  Thanks,
> >>
> >> This one looks better:
> >> http://pastebin.com/raw.php?i=R4pztDRt
> >>
> >
> > Ok looks like you have discard turned on.
> Yes
> 
>  >  Can you turn that off and see if you
> > can still reproduce the deadlock?  If so sysrq+w again, if not then I know where
> > to look ;).  Thanks,
> without discard i can't reproduce but random write speed with ceph 
> without discard is a LOT slower (around 8000 iops/s instead of 
> 13000iops/s). So i don't know if it is discard or if i'm just not able 
> to trigger it.
> 

Ouch, what kind of drive goes faster with discard _on_?  Anyway it looks like
we're waiting for the discard to come back, so either its your drive or theres a
bug in the block layer.  Maybe try an older kernel and see if it's broken there,
and then bisect it down?  Thanks,

Josef

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: btrfs deadlock in 3.5-rc3
  2012-06-25 18:02             ` Josef Bacik
@ 2012-06-25 18:28               ` Stefan Priebe
  2012-06-25 19:33                 ` Stefan Priebe
  0 siblings, 1 reply; 23+ messages in thread
From: Stefan Priebe @ 2012-06-25 18:28 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs

Am 25.06.2012 20:02, schrieb Josef Bacik:
>>   >  Can you turn that off and see if you
>>> can still reproduce the deadlock?  If so sysrq+w again, if not then I know where
>>> to look ;).  Thanks,
>> without discard i can't reproduce but random write speed with ceph
>> without discard is a LOT slower (around 8000 iops/s instead of
>> 13000iops/s). So i don't know if it is discard or if i'm just not able
>> to trigger it.
>>
>
> Ouch, what kind of drive goes faster with discard _on_?  Anyway it looks like
> we're waiting for the discard to come back, so either its your drive or theres a
> bug in the block layer.  Maybe try an older kernel and see if it's broken there,
> and then bisect it down?  Thanks,

Intel 520 series SSD. I can't try an older kernel version as ceph 
crashes with them pretty fast due to other bugs which where fixed with 
3.5-rc1.

Greets
Stefan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: btrfs deadlock in 3.5-rc3
  2012-06-25 18:28               ` Stefan Priebe
@ 2012-06-25 19:33                 ` Stefan Priebe
  2012-06-25 20:11                   ` Josef Bacik
  0 siblings, 1 reply; 23+ messages in thread
From: Stefan Priebe @ 2012-06-25 19:33 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs

With v3.4 the same. Can't go back more as this really results in very 
fast corruption. Any ideas how to debug?

Stefan

Am 25.06.2012 20:28, schrieb Stefan Priebe:
> Am 25.06.2012 20:02, schrieb Josef Bacik:
>>>   >  Can you turn that off and see if you
>>>> can still reproduce the deadlock?  If so sysrq+w again, if not then
>>>> I know where
>>>> to look ;).  Thanks,
>>> without discard i can't reproduce but random write speed with ceph
>>> without discard is a LOT slower (around 8000 iops/s instead of
>>> 13000iops/s). So i don't know if it is discard or if i'm just not able
>>> to trigger it.
>>>
>>
>> Ouch, what kind of drive goes faster with discard _on_?  Anyway it
>> looks like
>> we're waiting for the discard to come back, so either its your drive
>> or theres a
>> bug in the block layer.  Maybe try an older kernel and see if it's
>> broken there,
>> and then bisect it down?  Thanks,
>
> Intel 520 series SSD. I can't try an older kernel version as ceph
> crashes with them pretty fast due to other bugs which where fixed with
> 3.5-rc1.
>
> Greets
> Stefan


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: btrfs deadlock in 3.5-rc3
  2012-06-25 19:33                 ` Stefan Priebe
@ 2012-06-25 20:11                   ` Josef Bacik
  2012-06-25 20:20                     ` Stefan Priebe
  0 siblings, 1 reply; 23+ messages in thread
From: Josef Bacik @ 2012-06-25 20:11 UTC (permalink / raw)
  To: Stefan Priebe; +Cc: Josef Bacik, linux-btrfs

On Mon, Jun 25, 2012 at 01:33:09PM -0600, Stefan Priebe wrote:
> With v3.4 the same. Can't go back more as this really results in very 
> fast corruption. Any ideas how to debug?
> 

What workload are you running?  I have a ssd here with discard support I can try
and reproduce on.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: btrfs deadlock in 3.5-rc3
  2012-06-25 20:11                   ` Josef Bacik
@ 2012-06-25 20:20                     ` Stefan Priebe
  2012-06-25 20:23                       ` Josef Bacik
  0 siblings, 1 reply; 23+ messages in thread
From: Stefan Priebe @ 2012-06-25 20:20 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs

Am 25.06.2012 22:11, schrieb Josef Bacik:
> On Mon, Jun 25, 2012 at 01:33:09PM -0600, Stefan Priebe wrote:
>> With v3.4 the same. Can't go back more as this really results in very
>> fast corruption. Any ideas how to debug?
>>
>
> What workload are you running?  I have a ssd here with discard support I can try
> and reproduce on.  Thanks,

i'm using fio with 50 jobs and randwrite of 4k blocks in ceph but i 
don't know which load ceph then exactly generates. ;-(

Stefan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: btrfs deadlock in 3.5-rc3
  2012-06-25 20:20                     ` Stefan Priebe
@ 2012-06-25 20:23                       ` Josef Bacik
  2012-06-25 20:33                         ` Stefan Priebe
  2012-06-26 16:47                         ` Stefan Priebe
  0 siblings, 2 replies; 23+ messages in thread
From: Josef Bacik @ 2012-06-25 20:23 UTC (permalink / raw)
  To: Stefan Priebe; +Cc: Josef Bacik, linux-btrfs

On Mon, Jun 25, 2012 at 02:20:31PM -0600, Stefan Priebe wrote:
> Am 25.06.2012 22:11, schrieb Josef Bacik:
> > On Mon, Jun 25, 2012 at 01:33:09PM -0600, Stefan Priebe wrote:
> >> With v3.4 the same. Can't go back more as this really results in very
> >> fast corruption. Any ideas how to debug?
> >>
> >
> > What workload are you running?  I have a ssd here with discard support I can try
> > and reproduce on.  Thanks,
> 
> i'm using fio with 50 jobs and randwrite of 4k blocks in ceph but i 
> don't know which load ceph then exactly generates. ;-(
> 

Thats fine, I have this handy "create a local ceph cluster" script from an
earlier problem, just send me your fio job and I'll run it locally.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: btrfs deadlock in 3.5-rc3
  2012-06-25 20:23                       ` Josef Bacik
@ 2012-06-25 20:33                         ` Stefan Priebe
  2012-06-26 16:47                         ` Stefan Priebe
  1 sibling, 0 replies; 23+ messages in thread
From: Stefan Priebe @ 2012-06-25 20:33 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs

Am 25.06.2012 22:23, schrieb Josef Bacik:
> On Mon, Jun 25, 2012 at 02:20:31PM -0600, Stefan Priebe wrote:
>> Am 25.06.2012 22:11, schrieb Josef Bacik:
>>> On Mon, Jun 25, 2012 at 01:33:09PM -0600, Stefan Priebe wrote:
>>>> With v3.4 the same. Can't go back more as this really results in very
>>>> fast corruption. Any ideas how to debug?
>>>>
>>>
>>> What workload are you running?  I have a ssd here with discard support I can try
>>> and reproduce on.  Thanks,
>>
>> i'm using fio with 50 jobs and randwrite of 4k blocks in ceph but i
>> don't know which load ceph then exactly generates. ;-(
>>
>
> Thats fine, I have this handy "create a local ceph cluster" script from an
> earlier problem, just send me your fio job and I'll run it locally.  Thanks,

OK my fio job is running in a KVM using a RBD:

fio --filename=$DISK --direct=1 --rw=randwrite --bs=4k --size=200G 
--numjobs=50 --runtime=300 --group_reporting --name=file1

Backed by 3 server with 4 OSDs (all intel SSDs) each running btrfs on
top.

THANKS!

Greets
Stefan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: btrfs deadlock in 3.5-rc3
  2012-06-25 20:23                       ` Josef Bacik
  2012-06-25 20:33                         ` Stefan Priebe
@ 2012-06-26 16:47                         ` Stefan Priebe
  2012-06-26 20:14                           ` Josef Bacik
  1 sibling, 1 reply; 23+ messages in thread
From: Stefan Priebe @ 2012-06-26 16:47 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs

Am 25.06.2012 22:23, schrieb Josef Bacik:
> On Mon, Jun 25, 2012 at 02:20:31PM -0600, Stefan Priebe wrote:
>> Am 25.06.2012 22:11, schrieb Josef Bacik:
>>> On Mon, Jun 25, 2012 at 01:33:09PM -0600, Stefan Priebe wrote:
>>>> With v3.4 the same. Can't go back more as this really results in very
>>>> fast corruption. Any ideas how to debug?
>>>>
>>>
>>> What workload are you running?  I have a ssd here with discard support I can try
>>> and reproduce on.  Thanks,
>>
>> i'm using fio with 50 jobs and randwrite of 4k blocks in ceph but i
>> don't know which load ceph then exactly generates. ;-(
>>
>
> Thats fine, I have this handy "create a local ceph cluster" script from an
> earlier problem, just send me your fio job and I'll run it locally.  Thanks,

Where you able to find anything? Can i do more or different testing?

Stefan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: btrfs deadlock in 3.5-rc3
  2012-06-26 16:47                         ` Stefan Priebe
@ 2012-06-26 20:14                           ` Josef Bacik
  2012-06-26 20:19                             ` Stefan Priebe
  0 siblings, 1 reply; 23+ messages in thread
From: Josef Bacik @ 2012-06-26 20:14 UTC (permalink / raw)
  To: Stefan Priebe; +Cc: Josef Bacik, linux-btrfs

On Tue, Jun 26, 2012 at 10:47:56AM -0600, Stefan Priebe wrote:
> Am 25.06.2012 22:23, schrieb Josef Bacik:
> > On Mon, Jun 25, 2012 at 02:20:31PM -0600, Stefan Priebe wrote:
> >> Am 25.06.2012 22:11, schrieb Josef Bacik:
> >>> On Mon, Jun 25, 2012 at 01:33:09PM -0600, Stefan Priebe wrote:
> >>>> With v3.4 the same. Can't go back more as this really results in very
> >>>> fast corruption. Any ideas how to debug?
> >>>>
> >>>
> >>> What workload are you running?  I have a ssd here with discard support I can try
> >>> and reproduce on.  Thanks,
> >>
> >> i'm using fio with 50 jobs and randwrite of 4k blocks in ceph but i
> >> don't know which load ceph then exactly generates. ;-(
> >>
> >
> > Thats fine, I have this handy "create a local ceph cluster" script from an
> > earlier problem, just send me your fio job and I'll run it locally.  Thanks,
> 
> Where you able to find anything? Can i do more or different testing?
> 

I can't reproduce so I'm going to have to figure out a way to debug it through
you, as soon as I think of something I will let you know.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: btrfs deadlock in 3.5-rc3
  2012-06-26 20:14                           ` Josef Bacik
@ 2012-06-26 20:19                             ` Stefan Priebe
  2012-06-26 20:48                               ` Josef Bacik
  0 siblings, 1 reply; 23+ messages in thread
From: Stefan Priebe @ 2012-06-26 20:19 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs

Am 26.06.2012 22:14, schrieb Josef Bacik:
> I can't reproduce so I'm going to have to figure out a way to debug it through
> you, as soon as I think of something I will let you know.  Thanks,
>

Thanks. You mentioned that discard shouldn't have any positive effects 
on a SSD.

May i see a sideffect? I mean with discard 13.000 IOPs in ceph without 
discard just 6000-9000 IOPs could this be real or might this just happen 
due to the bug i see?

Stefan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: btrfs deadlock in 3.5-rc3
  2012-06-26 20:19                             ` Stefan Priebe
@ 2012-06-26 20:48                               ` Josef Bacik
  2012-06-27  5:47                                 ` Stefan Priebe - Profihost AG
  0 siblings, 1 reply; 23+ messages in thread
From: Josef Bacik @ 2012-06-26 20:48 UTC (permalink / raw)
  To: Stefan Priebe; +Cc: Josef Bacik, linux-btrfs

On Tue, Jun 26, 2012 at 02:19:17PM -0600, Stefan Priebe wrote:
> Am 26.06.2012 22:14, schrieb Josef Bacik:
> > I can't reproduce so I'm going to have to figure out a way to debug it through
> > you, as soon as I think of something I will let you know.  Thanks,
> >
> 
> Thanks. You mentioned that discard shouldn't have any positive effects 
> on a SSD.
> 
> May i see a sideffect? I mean with discard 13.000 IOPs in ceph without 
> discard just 6000-9000 IOPs could this be real or might this just happen 
> due to the bug i see?
> 

Beats me, it would seem to me that discard would make things slower since we
have to wait for the commit to discard everybody, but who knows, stranger things
have happened.  Can you reproduce 2 more times and get sysrq+w each time so I
have a few different outputs to compare, maybe I'm missing something.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: btrfs deadlock in 3.5-rc3
  2012-06-26 20:48                               ` Josef Bacik
@ 2012-06-27  5:47                                 ` Stefan Priebe - Profihost AG
  2012-06-27 13:30                                   ` Josef Bacik
  2012-06-27 21:17                                   ` Josef Bacik
  0 siblings, 2 replies; 23+ messages in thread
From: Stefan Priebe - Profihost AG @ 2012-06-27  5:47 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs

Yes i will do so. Right now i was trying to compare discard with non 
discard with this simple command:
for i in `seq 0 1 1000`; do dd if=/dev/zero of=t_$i bs=4M count=1; rm 
t_$i; done;

But i hit a new bug:

[39577.660228] BUG: unable to handle kernel paging request at 
fffffffffffffe50
[39577.686517] IP: [<ffffffff8131b4f3>] btrfs_finish_ordered_io+0x23/0x3f0
[39577.713417] PGD 1c0d067 PUD 1c0e067 PMD 0
[39577.740039] Oops: 0000 [#1] SMP
[39577.766401] CPU 6
[39577.792540] Modules linked in: nf_conntrack_ipv4 nf_conntrack 
nf_defrag_ipv4 ipv6 i2c_i801 coretemp i2c_core ixgbe(O) [last unloaded: 
scsi_wait_scan]
[39577.847511]
[39577.847513] Pid: 3447, comm: btrfs-endio-wri Tainted: G           O 
3.5.0-rc4intel #15 Supermicro 
X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F
[39577.847516] RIP: 0010:[<ffffffff8131b4f3>]  [<ffffffff8131b4f3>] 
btrfs_finish_ordered_io+0x23/0x3f0
[39577.847516] RSP: 0018:ffff880e3b861d90  EFLAGS: 00010287
[39577.847517] RAX: ffff880e3b861e90 RBX: ffff880e3a8fb100 RCX: 
ffff880e3b861e90
[39577.847517] RDX: ffff880e3b861e90 RSI: ffff880e3a8fb190 RDI: 
ffff880e3a8fb100
[39577.847518] RBP: ffff880e3b861e10 R08: dead000000100100 R09: 
dead000000200200
[39577.847518] R10: 0000000000000000 R11: 0000000000000001 R12: 
ffff880e3a624770
[39577.847518] R13: 0000000000000000 R14: ffff880e3a8fb1b8 R15: 
ffff880e3b861e80
[39577.847519] FS:  0000000000000000(0000) GS:ffff880e7fd80000(0000) 
knlGS:0000000000000000
[39577.847520] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[39577.847520] CR2: fffffffffffffe50 CR3: 0000000001c0b000 CR4: 
00000000000407e0
[39577.847521] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
0000000000000000
[39577.847521] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 
0000000000000400
[39577.847522] Process btrfs-endio-wri (pid: 3447, threadinfo 
ffff880e3b860000, task ffff880e40e58000)
[39577.847522] Stack:
[39577.847524]  0000000000000000 dead000000200200 0000000100965b86 
ffff880e40e94000
[39577.847525]  ffffffff8104dc20 ffff880e40e58000 ffffffffffffffff 
0000000000000000
[39577.847526]  0000000000000000 0000000000000000 ffff880e40e58000 
ffff880e3a624720
[39577.847527] Call Trace:
[39577.847530]  [<ffffffff8104dc20>] ? lock_timer_base+0x70/0x70
[39577.847531]  [<ffffffff8131b8d0>] finish_ordered_fn+0x10/0x20
[39577.847533]  [<ffffffff8133f38e>] worker_loop+0x14e/0x530
[39577.847534]  [<ffffffff8133f240>] ? btrfs_queue_worker+0x310/0x310
[39577.847535]  [<ffffffff8133f240>] ? btrfs_queue_worker+0x310/0x310
[39577.847538]  [<ffffffff8105ffd6>] kthread+0x96/0xa0
[39577.847541]  [<ffffffff816dc594>] kernel_thread_helper+0x4/0x10
[39577.847543]  [<ffffffff8105ff40>] ? kthread_worker_fn+0x130/0x130
[39577.847544]  [<ffffffff816dc590>] ? gs_change+0xb/0xb
[39577.847555] Code: 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 c4 80 48 
89 5d d8 4c 89 65 e0 4c 89 6d e8 4c 89 75 f0 4c 89 7d f8 48 89 fb 4c 8b 
6f 38 <4d> 8b a5 50 fe ff ff 4d 8d 95 50 fe ff ff 48 c7 45 c8 00 00 00
[39577.847556] RIP  [<ffffffff8131b4f3>] btrfs_finish_ordered_io+0x23/0x3f0
[39577.847557]  RSP <ffff880e3b861d90>
[39577.847557] CR2: fffffffffffffe50
[39577.847558] ---[ end trace 27bdc0b318ad6463 ]---

Am 26.06.2012 22:48, schrieb Josef Bacik:
> On Tue, Jun 26, 2012 at 02:19:17PM -0600, Stefan Priebe wrote:
>> Am 26.06.2012 22:14, schrieb Josef Bacik:
>>> I can't reproduce so I'm going to have to figure out a way to debug it through
>>> you, as soon as I think of something I will let you know.  Thanks,
>>>
>>
>> Thanks. You mentioned that discard shouldn't have any positive effects
>> on a SSD.
>>
>> May i see a sideffect? I mean with discard 13.000 IOPs in ceph without
>> discard just 6000-9000 IOPs could this be real or might this just happen
>> due to the bug i see?
>>
>
> Beats me, it would seem to me that discard would make things slower since we
> have to wait for the commit to discard everybody, but who knows, stranger things
> have happened.  Can you reproduce 2 more times and get sysrq+w each time so I
> have a few different outputs to compare, maybe I'm missing something.  Thanks,
>
> Josef
>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: btrfs deadlock in 3.5-rc3
  2012-06-27  5:47                                 ` Stefan Priebe - Profihost AG
@ 2012-06-27 13:30                                   ` Josef Bacik
  2012-06-27 21:17                                   ` Josef Bacik
  1 sibling, 0 replies; 23+ messages in thread
From: Josef Bacik @ 2012-06-27 13:30 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: Josef Bacik, linux-btrfs

On Tue, Jun 26, 2012 at 11:47:15PM -0600, Stefan Priebe - Profihost AG wrote:
> Yes i will do so. Right now i was trying to compare discard with non 
> discard with this simple command:
> for i in `seq 0 1 1000`; do dd if=/dev/zero of=t_$i bs=4M count=1; rm 
> t_$i; done;
> 
> But i hit a new bug:
> 

Can you gdb btrfs.ko and do 

list *(btrfs_finish_ordered_io+0x23)

and what are your mount options?  Thanks,

Josef

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: btrfs deadlock in 3.5-rc3
  2012-06-27  5:47                                 ` Stefan Priebe - Profihost AG
  2012-06-27 13:30                                   ` Josef Bacik
@ 2012-06-27 21:17                                   ` Josef Bacik
  1 sibling, 0 replies; 23+ messages in thread
From: Josef Bacik @ 2012-06-27 21:17 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: Josef Bacik, linux-btrfs

On Tue, Jun 26, 2012 at 11:47:15PM -0600, Stefan Priebe - Profihost AG wrote:
> Yes i will do so. Right now i was trying to compare discard with non 
> discard with this simple command:
> for i in `seq 0 1 1000`; do dd if=/dev/zero of=t_$i bs=4M count=1; rm 
> t_$i; done;
> 
> But i hit a new bug:
> 

Fixed this one with

Btrfs: hold a ref on the inode during writepages

Apply that and carry on with your tests :).  Thanks,

Josef

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2012-06-27 21:17 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-06-23  8:50 btrfs deadlock in 3.5-rc3 Stefan Priebe
2012-06-23 13:46 ` Michael
2012-06-23 14:55   ` Stefan Priebe
2012-06-25 13:08 ` Josef Bacik
2012-06-25 14:08   ` Stefan Priebe - Profihost AG
2012-06-25 14:20     ` Josef Bacik
2012-06-25 14:45       ` Stefan Priebe - Profihost AG
2012-06-25 14:48         ` Josef Bacik
2012-06-25 17:38           ` Stefan Priebe
2012-06-25 18:02             ` Josef Bacik
2012-06-25 18:28               ` Stefan Priebe
2012-06-25 19:33                 ` Stefan Priebe
2012-06-25 20:11                   ` Josef Bacik
2012-06-25 20:20                     ` Stefan Priebe
2012-06-25 20:23                       ` Josef Bacik
2012-06-25 20:33                         ` Stefan Priebe
2012-06-26 16:47                         ` Stefan Priebe
2012-06-26 20:14                           ` Josef Bacik
2012-06-26 20:19                             ` Stefan Priebe
2012-06-26 20:48                               ` Josef Bacik
2012-06-27  5:47                                 ` Stefan Priebe - Profihost AG
2012-06-27 13:30                                   ` Josef Bacik
2012-06-27 21:17                                   ` Josef Bacik

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.