* btrfs deadlock in 3.5-rc3 @ 2012-06-23 8:50 Stefan Priebe 2012-06-23 13:46 ` Michael 2012-06-25 13:08 ` Josef Bacik 0 siblings, 2 replies; 23+ messages in thread From: Stefan Priebe @ 2012-06-23 8:50 UTC (permalink / raw) To: linux-btrfs Hello list, i've seen this deadlock today and can reproduce it while using ceph. Is this a known bug? [ 599.514534] INFO: task kworker/6:0:29 blocked for more than 120 seconds. [ 599.530394] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 599.546832] kworker/6:0 D ffffffff8180ebc0 0 29 2 0x00000000 [ 599.563725] ffff880e78f4dc90 0000000000000046 ffff880e78f4a180 0000000000012280 [ 599.580812] ffff880e78f4dfd8 ffff880e78f4c010 0000000000012280 0000000000012280 [ 599.580813] ffff880e78f4dfd8 0000000000012280 ffff880e78e8c340 ffff880e78f4a180 [ 599.580814] Call Trace: [ 599.580820] [<ffffffff81620334>] schedule+0x24/0x70 [ 599.580832] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] [ 599.580836] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 [ 599.580844] [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0 [btrfs] [ 599.580846] [<ffffffff81076058>] ? idle_balance+0x108/0x130 [ 599.580847] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 [ 599.580853] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 [btrfs] [ 599.580859] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] [ 599.580861] [<ffffffff81058eff>] process_one_work+0x11f/0x470 [ 599.580862] [<ffffffff8105b128>] worker_thread+0x178/0x400 [ 599.580863] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 [ 599.580865] [<ffffffff8105fc46>] kthread+0x96/0xa0 [ 599.580866] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 [ 599.580868] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 [ 599.580869] [<ffffffff81622dd0>] ? gs_change+0xb/0xb [ 599.580872] INFO: task kworker/2:1:1184 blocked for more than 120 seconds. [ 599.580872] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 599.580873] kworker/2:1 D ffffffff8180ebc0 0 1184 2 0x00000000 [ 599.580874] ffff880e448cfc90 0000000000000046 ffff880e46cce380 0000000000012280 [ 599.580875] ffff880e448cffd8 ffff880e448ce010 0000000000012280 0000000000012280 [ 599.580876] ffff880e448cffd8 0000000000012280 ffff880e78e5a240 ffff880e46cce380 [ 599.580877] Call Trace: [ 599.580879] [<ffffffff81620334>] schedule+0x24/0x70 [ 599.580884] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] [ 599.580886] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 [ 599.580891] [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0 [btrfs] [ 599.580893] [<ffffffff81076058>] ? idle_balance+0x108/0x130 [ 599.580894] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 [ 599.580899] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 [btrfs] [ 599.580905] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] [ 599.580906] [<ffffffff81058eff>] process_one_work+0x11f/0x470 [ 599.580907] [<ffffffff8105b128>] worker_thread+0x178/0x400 [ 599.580908] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 [ 599.580910] [<ffffffff8105fc46>] kthread+0x96/0xa0 [ 599.580911] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 [ 599.580913] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 [ 599.580914] [<ffffffff81622dd0>] ? gs_change+0xb/0xb [ 599.580915] INFO: task kworker/5:1:1186 blocked for more than 120 seconds. [ 599.580915] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 599.580916] kworker/5:1 D ffffffff8180ebc0 0 1186 2 0x00000000 [ 599.580917] ffff880e448a3c90 0000000000000046 ffff880e46d36600 0000000000012280 [ 599.580919] ffff880e448a3fd8 ffff880e448a2010 0000000000012280 0000000000012280 [ 599.580920] ffff880e448a3fd8 0000000000012280 ffff880e78e88300 ffff880e46d36600 [ 599.580920] Call Trace: [ 599.580922] [<ffffffff81620334>] schedule+0x24/0x70 [ 599.580927] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] [ 599.580929] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 [ 599.580934] [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0 [btrfs] [ 599.580936] [<ffffffff81076058>] ? idle_balance+0x108/0x130 [ 599.580937] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 [ 599.580942] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 [btrfs] [ 599.580948] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] [ 599.580949] [<ffffffff81058eff>] process_one_work+0x11f/0x470 [ 599.580950] [<ffffffff8105b128>] worker_thread+0x178/0x400 [ 599.580951] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 [ 599.580952] [<ffffffff8105fc46>] kthread+0x96/0xa0 [ 599.580954] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 [ 599.580955] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 [ 599.580957] [<ffffffff81622dd0>] ? gs_change+0xb/0xb [ 599.580966] INFO: task ceph-osd:3248 blocked for more than 120 seconds. [ 599.580966] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 599.580967] ceph-osd D 0000000000000006 0 3248 1 0x00000000 [ 599.580968] ffff880bc4201c38 0000000000000082 ffff880e3efc4480 0000000000012280 [ 599.580970] ffff880bc4201fd8 ffff880bc4200010 0000000000012280 0000000000012280 [ 599.580971] ffff880bc4201fd8 0000000000012280 ffff880e78f4a180 ffff880e3efc4480 [ 599.580971] Call Trace: [ 599.580972] [<ffffffff81620334>] schedule+0x24/0x70 [ 599.580978] [<ffffffffa009dd65>] btrfs_commit_transaction_async+0x1d5/0x240 [btrfs] [ 599.580983] [<ffffffffa0081eb6>] ? block_rsv_add_bytes+0x26/0x60 [btrfs] [ 599.580985] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 [ 599.580990] [<ffffffffa0081f25>] ? block_rsv_migrate_bytes+0x35/0x50 [btrfs] [ 599.580997] [<ffffffffa00ced2e>] btrfs_mksubvol+0x2be/0x350 [btrfs] [ 599.581004] [<ffffffffa00ceec9>] btrfs_ioctl_snap_create_transid+0x109/0x1a0 [btrfs] [ 599.581010] [<ffffffffa00d0bc4>] btrfs_ioctl_snap_create_v2+0x84/0xf0 [btrfs] [ 599.581016] [<ffffffffa00d353f>] btrfs_ioctl+0x76f/0x12d0 [btrfs] [ 599.581019] [<ffffffff8114459a>] ? fsnotify+0x1ba/0x2e0 [ 599.581021] [<ffffffff8111ade3>] do_vfs_ioctl+0x93/0x4f0 [ 599.581022] [<ffffffff8111b28a>] sys_ioctl+0x4a/0x80 [ 599.581024] [<ffffffff81621ba2>] system_call_fastpath+0x16/0x1b [ 599.581035] INFO: task ceph-osd:3278 blocked for more than 120 seconds. [ 599.581035] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 599.581036] ceph-osd D 0000000000000002 0 3278 1 0x00000000 [ 599.581038] ffff880bc41fbc38 0000000000000082 ffff880e412283c0 0000000000012280 [ 599.581039] ffff880bc41fbfd8 ffff880bc41fa010 0000000000012280 0000000000012280 [ 599.581040] ffff880bc41fbfd8 0000000000012280 ffff880e46cce380 ffff880e412283c0 [ 599.581040] Call Trace: [ 599.581042] [<ffffffff81620334>] schedule+0x24/0x70 [ 599.581048] [<ffffffffa009dd65>] btrfs_commit_transaction_async+0x1d5/0x240 [btrfs] [ 599.581053] [<ffffffffa0081eb6>] ? block_rsv_add_bytes+0x26/0x60 [btrfs] [ 599.581055] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 [ 599.581059] [<ffffffffa0081f25>] ? block_rsv_migrate_bytes+0x35/0x50 [btrfs] [ 599.581066] [<ffffffffa00ced2e>] btrfs_mksubvol+0x2be/0x350 [btrfs] [ 599.581071] [<ffffffffa00ceec9>] btrfs_ioctl_snap_create_transid+0x109/0x1a0 [btrfs] [ 599.581077] [<ffffffffa00d0bc4>] btrfs_ioctl_snap_create_v2+0x84/0xf0 [btrfs] [ 599.581082] [<ffffffffa00d353f>] btrfs_ioctl+0x76f/0x12d0 [btrfs] [ 599.581084] [<ffffffff8114459a>] ? fsnotify+0x1ba/0x2e0 [ 599.581085] [<ffffffff8111ade3>] do_vfs_ioctl+0x93/0x4f0 [ 599.581086] [<ffffffff8111b28a>] sys_ioctl+0x4a/0x80 [ 599.581088] [<ffffffff81621ba2>] system_call_fastpath+0x16/0x1b [ 599.581094] INFO: task ceph-osd:3330 blocked for more than 120 seconds. [ 599.581094] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 599.581095] ceph-osd D 0000000000000005 0 3330 1 0x00000000 [ 599.581096] ffff880bc3bf9c38 0000000000000086 ffff880bc3bd4740 0000000000012280 [ 599.581098] ffff880bc3bf9fd8 ffff880bc3bf8010 0000000000012280 0000000000012280 [ 599.581099] ffff880bc3bf9fd8 0000000000012280 ffff880e46d36600 ffff880bc3bd4740 [ 599.581099] Call Trace: [ 599.581101] [<ffffffff81620334>] schedule+0x24/0x70 [ 599.581106] [<ffffffffa009dd65>] btrfs_commit_transaction_async+0x1d5/0x240 [btrfs] [ 599.581111] [<ffffffffa0081eb6>] ? block_rsv_add_bytes+0x26/0x60 [btrfs] [ 599.581113] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 [ 599.581118] [<ffffffffa0081f25>] ? block_rsv_migrate_bytes+0x35/0x50 [btrfs] [ 599.581123] [<ffffffffa00ced2e>] btrfs_mksubvol+0x2be/0x350 [btrfs] [ 599.581129] [<ffffffffa00ceec9>] btrfs_ioctl_snap_create_transid+0x109/0x1a0 [btrfs] [ 599.581134] [<ffffffffa00d0bc4>] btrfs_ioctl_snap_create_v2+0x84/0xf0 [btrfs] [ 599.581139] [<ffffffffa00d353f>] btrfs_ioctl+0x76f/0x12d0 [btrfs] [ 599.581141] [<ffffffff8114459a>] ? fsnotify+0x1ba/0x2e0 [ 599.581142] [<ffffffff8111ade3>] do_vfs_ioctl+0x93/0x4f0 [ 599.581143] [<ffffffff8111b28a>] sys_ioctl+0x4a/0x80 [ 599.581145] [<ffffffff81621ba2>] system_call_fastpath+0x16/0x1b [ 719.395814] INFO: task kworker/6:0:29 blocked for more than 120 seconds. [ 719.417426] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 719.439537] kworker/6:0 D ffffffff8180ebc0 0 29 2 0x00000000 [ 719.461937] ffff880e78f4dc90 0000000000000046 ffff880e78f4a180 0000000000012280 [ 719.484547] ffff880e78f4dfd8 ffff880e78f4c010 0000000000012280 0000000000012280 [ 719.507211] ffff880e78f4dfd8 0000000000012280 ffff880e78e8c340 ffff880e78f4a180 [ 719.530071] Call Trace: [ 719.552729] [<ffffffff81620334>] schedule+0x24/0x70 [ 719.552741] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] [ 719.552744] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 [ 719.552752] [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0 [btrfs] [ 719.552755] [<ffffffff81076058>] ? idle_balance+0x108/0x130 [ 719.552756] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 [ 719.552762] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 [btrfs] [ 719.552768] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] [ 719.552770] [<ffffffff81058eff>] process_one_work+0x11f/0x470 [ 719.552771] [<ffffffff8105b128>] worker_thread+0x178/0x400 [ 719.552772] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 [ 719.552774] [<ffffffff8105fc46>] kthread+0x96/0xa0 [ 719.552776] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 [ 719.552777] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 [ 719.552779] [<ffffffff81622dd0>] ? gs_change+0xb/0xb [ 719.552781] INFO: task kworker/3:1:578 blocked for more than 120 seconds. [ 719.552781] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 719.552783] kworker/3:1 D ffffffff8180ebc0 0 578 2 0x00000000 [ 719.552784] ffff880e45dd3c90 0000000000000046 ffff880e7890e080 0000000000012280 [ 719.552786] ffff880e45dd3fd8 ffff880e45dd2010 0000000000012280 0000000000012280 [ 719.552787] ffff880e45dd3fd8 0000000000012280 ffff880e78e7e280 ffff880e7890e080 [ 719.552787] Call Trace: [ 719.552789] [<ffffffff81620334>] schedule+0x24/0x70 [ 719.552795] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] [ 719.552797] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 [ 719.552802] [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0 [btrfs] [ 719.552804] [<ffffffff81076058>] ? idle_balance+0x108/0x130 [ 719.552805] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 [ 719.552810] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 [btrfs] [ 719.552816] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] [ 719.552817] [<ffffffff81058eff>] process_one_work+0x11f/0x470 [ 719.552818] [<ffffffff8105b128>] worker_thread+0x178/0x400 [ 719.552819] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 [ 719.552820] [<ffffffff8105fc46>] kthread+0x96/0xa0 [ 719.552822] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 [ 719.552824] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 [ 719.552825] [<ffffffff81622dd0>] ? gs_change+0xb/0xb [ 719.552827] INFO: task kworker/2:1:1184 blocked for more than 120 seconds. [ 719.552827] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 719.552828] kworker/2:1 D ffffffff8180ebc0 0 1184 2 0x00000000 [ 719.552830] ffff880e448cfc90 0000000000000046 ffff880e46cce380 0000000000012280 [ 719.552831] ffff880e448cffd8 ffff880e448ce010 0000000000012280 0000000000012280 [ 719.552832] ffff880e448cffd8 0000000000012280 ffff880e78e5a240 ffff880e46cce380 [ 719.552832] Call Trace: [ 719.552834] [<ffffffff81620334>] schedule+0x24/0x70 [ 719.552839] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] [ 719.552841] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 [ 719.552846] [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0 [btrfs] [ 719.552848] [<ffffffff81076058>] ? idle_balance+0x108/0x130 [ 719.552849] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 [ 719.552854] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 [btrfs] [ 719.552859] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] [ 719.552861] [<ffffffff81058eff>] process_one_work+0x11f/0x470 [ 719.552862] [<ffffffff8105b128>] worker_thread+0x178/0x400 [ 719.552863] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 [ 719.552864] [<ffffffff8105fc46>] kthread+0x96/0xa0 [ 719.552866] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 [ 719.552867] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 [ 719.552868] [<ffffffff81622dd0>] ? gs_change+0xb/0xb [ 719.552869] INFO: task kworker/5:1:1186 blocked for more than 120 seconds. [ 719.552869] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 719.552870] kworker/5:1 D ffffffff8180ebc0 0 1186 2 0x00000000 [ 719.552872] ffff880e448a3c90 0000000000000046 ffff880e46d36600 0000000000012280 [ 719.552873] ffff880e448a3fd8 ffff880e448a2010 0000000000012280 0000000000012280 [ 719.552874] ffff880e448a3fd8 0000000000012280 ffff880e78e88300 ffff880e46d36600 [ 719.552874] Call Trace: [ 719.552876] [<ffffffff81620334>] schedule+0x24/0x70 [ 719.552881] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] [ 719.552883] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 [ 719.552888] [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0 [btrfs] [ 719.552890] [<ffffffff81076058>] ? idle_balance+0x108/0x130 [ 719.552891] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 [ 719.552896] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 [btrfs] [ 719.552901] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] [ 719.552902] [<ffffffff81058eff>] process_one_work+0x11f/0x470 [ 719.552903] [<ffffffff8105b128>] worker_thread+0x178/0x400 [ 719.552904] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 [ 719.552906] [<ffffffff8105fc46>] kthread+0x96/0xa0 [ 719.552907] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 [ 719.552909] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 [ 719.552910] [<ffffffff81622dd0>] ? gs_change+0xb/0xb Stefan ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: btrfs deadlock in 3.5-rc3 2012-06-23 8:50 btrfs deadlock in 3.5-rc3 Stefan Priebe @ 2012-06-23 13:46 ` Michael 2012-06-23 14:55 ` Stefan Priebe 2012-06-25 13:08 ` Josef Bacik 1 sibling, 1 reply; 23+ messages in thread From: Michael @ 2012-06-23 13:46 UTC (permalink / raw) To: Stefan Priebe; +Cc: linux-btrfs Hi Stefan, Could you please give us details about your setup, especially what kernel/btrfs version. If you aren't on the most recent, it would be great if you could update and see if you can still reproduce, as there was a major change in btrfs recently for Ceph. I'm not sure what version it made it into. Regards, Michael On Sat, Jun 23, 2012 at 3:50 AM, Stefan Priebe <s.priebe@profihost.ag> wrote: > > Hello list, > > i've seen this deadlock today and can reproduce it while using ceph. > > Is this a known bug? > > [ 599.514534] INFO: task kworker/6:0:29 blocked for more than 120 > seconds. > [ 599.530394] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables > this message. > [ 599.546832] kworker/6:0 D ffffffff8180ebc0 0 29 2 > 0x00000000 > [ 599.563725] ffff880e78f4dc90 0000000000000046 ffff880e78f4a180 > 0000000000012280 > [ 599.580812] ffff880e78f4dfd8 ffff880e78f4c010 0000000000012280 > 0000000000012280 > [ 599.580813] ffff880e78f4dfd8 0000000000012280 ffff880e78e8c340 > ffff880e78f4a180 > [ 599.580814] Call Trace: > [ 599.580820] [<ffffffff81620334>] schedule+0x24/0x70 > [ 599.580832] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] > [ 599.580836] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 599.580844] [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0 > [btrfs] > [ 599.580846] [<ffffffff81076058>] ? idle_balance+0x108/0x130 > [ 599.580847] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 599.580853] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 > [btrfs] > [ 599.580859] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] > [ 599.580861] [<ffffffff81058eff>] process_one_work+0x11f/0x470 > [ 599.580862] [<ffffffff8105b128>] worker_thread+0x178/0x400 > [ 599.580863] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 > [ 599.580865] [<ffffffff8105fc46>] kthread+0x96/0xa0 > [ 599.580866] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 > [ 599.580868] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 > [ 599.580869] [<ffffffff81622dd0>] ? gs_change+0xb/0xb > [ 599.580872] INFO: task kworker/2:1:1184 blocked for more than 120 > seconds. > [ 599.580872] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables > this message. > [ 599.580873] kworker/2:1 D ffffffff8180ebc0 0 1184 2 > 0x00000000 > [ 599.580874] ffff880e448cfc90 0000000000000046 ffff880e46cce380 > 0000000000012280 > [ 599.580875] ffff880e448cffd8 ffff880e448ce010 0000000000012280 > 0000000000012280 > [ 599.580876] ffff880e448cffd8 0000000000012280 ffff880e78e5a240 > ffff880e46cce380 > [ 599.580877] Call Trace: > [ 599.580879] [<ffffffff81620334>] schedule+0x24/0x70 > [ 599.580884] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] > [ 599.580886] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 599.580891] [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0 > [btrfs] > [ 599.580893] [<ffffffff81076058>] ? idle_balance+0x108/0x130 > [ 599.580894] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 599.580899] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 > [btrfs] > [ 599.580905] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] > [ 599.580906] [<ffffffff81058eff>] process_one_work+0x11f/0x470 > [ 599.580907] [<ffffffff8105b128>] worker_thread+0x178/0x400 > [ 599.580908] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 > [ 599.580910] [<ffffffff8105fc46>] kthread+0x96/0xa0 > [ 599.580911] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 > [ 599.580913] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 > [ 599.580914] [<ffffffff81622dd0>] ? gs_change+0xb/0xb > [ 599.580915] INFO: task kworker/5:1:1186 blocked for more than 120 > seconds. > [ 599.580915] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables > this message. > [ 599.580916] kworker/5:1 D ffffffff8180ebc0 0 1186 2 > 0x00000000 > [ 599.580917] ffff880e448a3c90 0000000000000046 ffff880e46d36600 > 0000000000012280 > [ 599.580919] ffff880e448a3fd8 ffff880e448a2010 0000000000012280 > 0000000000012280 > [ 599.580920] ffff880e448a3fd8 0000000000012280 ffff880e78e88300 > ffff880e46d36600 > [ 599.580920] Call Trace: > [ 599.580922] [<ffffffff81620334>] schedule+0x24/0x70 > [ 599.580927] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] > [ 599.580929] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 599.580934] [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0 > [btrfs] > [ 599.580936] [<ffffffff81076058>] ? idle_balance+0x108/0x130 > [ 599.580937] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 599.580942] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 > [btrfs] > [ 599.580948] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] > [ 599.580949] [<ffffffff81058eff>] process_one_work+0x11f/0x470 > [ 599.580950] [<ffffffff8105b128>] worker_thread+0x178/0x400 > [ 599.580951] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 > [ 599.580952] [<ffffffff8105fc46>] kthread+0x96/0xa0 > [ 599.580954] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 > [ 599.580955] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 > [ 599.580957] [<ffffffff81622dd0>] ? gs_change+0xb/0xb > [ 599.580966] INFO: task ceph-osd:3248 blocked for more than 120 seconds. > [ 599.580966] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables > this message. > [ 599.580967] ceph-osd D 0000000000000006 0 3248 1 > 0x00000000 > [ 599.580968] ffff880bc4201c38 0000000000000082 ffff880e3efc4480 > 0000000000012280 > [ 599.580970] ffff880bc4201fd8 ffff880bc4200010 0000000000012280 > 0000000000012280 > [ 599.580971] ffff880bc4201fd8 0000000000012280 ffff880e78f4a180 > ffff880e3efc4480 > [ 599.580971] Call Trace: > [ 599.580972] [<ffffffff81620334>] schedule+0x24/0x70 > [ 599.580978] [<ffffffffa009dd65>] > btrfs_commit_transaction_async+0x1d5/0x240 [btrfs] > [ 599.580983] [<ffffffffa0081eb6>] ? block_rsv_add_bytes+0x26/0x60 > [btrfs] > [ 599.580985] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 599.580990] [<ffffffffa0081f25>] ? block_rsv_migrate_bytes+0x35/0x50 > [btrfs] > [ 599.580997] [<ffffffffa00ced2e>] btrfs_mksubvol+0x2be/0x350 [btrfs] > [ 599.581004] [<ffffffffa00ceec9>] > btrfs_ioctl_snap_create_transid+0x109/0x1a0 [btrfs] > [ 599.581010] [<ffffffffa00d0bc4>] btrfs_ioctl_snap_create_v2+0x84/0xf0 > [btrfs] > [ 599.581016] [<ffffffffa00d353f>] btrfs_ioctl+0x76f/0x12d0 [btrfs] > [ 599.581019] [<ffffffff8114459a>] ? fsnotify+0x1ba/0x2e0 > [ 599.581021] [<ffffffff8111ade3>] do_vfs_ioctl+0x93/0x4f0 > [ 599.581022] [<ffffffff8111b28a>] sys_ioctl+0x4a/0x80 > [ 599.581024] [<ffffffff81621ba2>] system_call_fastpath+0x16/0x1b > [ 599.581035] INFO: task ceph-osd:3278 blocked for more than 120 seconds. > [ 599.581035] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables > this message. > [ 599.581036] ceph-osd D 0000000000000002 0 3278 1 > 0x00000000 > [ 599.581038] ffff880bc41fbc38 0000000000000082 ffff880e412283c0 > 0000000000012280 > [ 599.581039] ffff880bc41fbfd8 ffff880bc41fa010 0000000000012280 > 0000000000012280 > [ 599.581040] ffff880bc41fbfd8 0000000000012280 ffff880e46cce380 > ffff880e412283c0 > [ 599.581040] Call Trace: > [ 599.581042] [<ffffffff81620334>] schedule+0x24/0x70 > [ 599.581048] [<ffffffffa009dd65>] > btrfs_commit_transaction_async+0x1d5/0x240 [btrfs] > [ 599.581053] [<ffffffffa0081eb6>] ? block_rsv_add_bytes+0x26/0x60 > [btrfs] > [ 599.581055] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 599.581059] [<ffffffffa0081f25>] ? block_rsv_migrate_bytes+0x35/0x50 > [btrfs] > [ 599.581066] [<ffffffffa00ced2e>] btrfs_mksubvol+0x2be/0x350 [btrfs] > [ 599.581071] [<ffffffffa00ceec9>] > btrfs_ioctl_snap_create_transid+0x109/0x1a0 [btrfs] > [ 599.581077] [<ffffffffa00d0bc4>] btrfs_ioctl_snap_create_v2+0x84/0xf0 > [btrfs] > [ 599.581082] [<ffffffffa00d353f>] btrfs_ioctl+0x76f/0x12d0 [btrfs] > [ 599.581084] [<ffffffff8114459a>] ? fsnotify+0x1ba/0x2e0 > [ 599.581085] [<ffffffff8111ade3>] do_vfs_ioctl+0x93/0x4f0 > [ 599.581086] [<ffffffff8111b28a>] sys_ioctl+0x4a/0x80 > [ 599.581088] [<ffffffff81621ba2>] system_call_fastpath+0x16/0x1b > [ 599.581094] INFO: task ceph-osd:3330 blocked for more than 120 seconds. > [ 599.581094] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables > this message. > [ 599.581095] ceph-osd D 0000000000000005 0 3330 1 > 0x00000000 > [ 599.581096] ffff880bc3bf9c38 0000000000000086 ffff880bc3bd4740 > 0000000000012280 > [ 599.581098] ffff880bc3bf9fd8 ffff880bc3bf8010 0000000000012280 > 0000000000012280 > [ 599.581099] ffff880bc3bf9fd8 0000000000012280 ffff880e46d36600 > ffff880bc3bd4740 > [ 599.581099] Call Trace: > [ 599.581101] [<ffffffff81620334>] schedule+0x24/0x70 > [ 599.581106] [<ffffffffa009dd65>] > btrfs_commit_transaction_async+0x1d5/0x240 [btrfs] > [ 599.581111] [<ffffffffa0081eb6>] ? block_rsv_add_bytes+0x26/0x60 > [btrfs] > [ 599.581113] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 599.581118] [<ffffffffa0081f25>] ? block_rsv_migrate_bytes+0x35/0x50 > [btrfs] > [ 599.581123] [<ffffffffa00ced2e>] btrfs_mksubvol+0x2be/0x350 [btrfs] > [ 599.581129] [<ffffffffa00ceec9>] > btrfs_ioctl_snap_create_transid+0x109/0x1a0 [btrfs] > [ 599.581134] [<ffffffffa00d0bc4>] btrfs_ioctl_snap_create_v2+0x84/0xf0 > [btrfs] > [ 599.581139] [<ffffffffa00d353f>] btrfs_ioctl+0x76f/0x12d0 [btrfs] > [ 599.581141] [<ffffffff8114459a>] ? fsnotify+0x1ba/0x2e0 > [ 599.581142] [<ffffffff8111ade3>] do_vfs_ioctl+0x93/0x4f0 > [ 599.581143] [<ffffffff8111b28a>] sys_ioctl+0x4a/0x80 > [ 599.581145] [<ffffffff81621ba2>] system_call_fastpath+0x16/0x1b > [ 719.395814] INFO: task kworker/6:0:29 blocked for more than 120 > seconds. > [ 719.417426] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables > this message. > [ 719.439537] kworker/6:0 D ffffffff8180ebc0 0 29 2 > 0x00000000 > [ 719.461937] ffff880e78f4dc90 0000000000000046 ffff880e78f4a180 > 0000000000012280 > [ 719.484547] ffff880e78f4dfd8 ffff880e78f4c010 0000000000012280 > 0000000000012280 > [ 719.507211] ffff880e78f4dfd8 0000000000012280 ffff880e78e8c340 > ffff880e78f4a180 > [ 719.530071] Call Trace: > [ 719.552729] [<ffffffff81620334>] schedule+0x24/0x70 > [ 719.552741] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] > [ 719.552744] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 719.552752] [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0 > [btrfs] > [ 719.552755] [<ffffffff81076058>] ? idle_balance+0x108/0x130 > [ 719.552756] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 719.552762] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 > [btrfs] > [ 719.552768] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] > [ 719.552770] [<ffffffff81058eff>] process_one_work+0x11f/0x470 > [ 719.552771] [<ffffffff8105b128>] worker_thread+0x178/0x400 > [ 719.552772] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 > [ 719.552774] [<ffffffff8105fc46>] kthread+0x96/0xa0 > [ 719.552776] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 > [ 719.552777] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 > [ 719.552779] [<ffffffff81622dd0>] ? gs_change+0xb/0xb > [ 719.552781] INFO: task kworker/3:1:578 blocked for more than 120 > seconds. > [ 719.552781] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables > this message. > [ 719.552783] kworker/3:1 D ffffffff8180ebc0 0 578 2 > 0x00000000 > [ 719.552784] ffff880e45dd3c90 0000000000000046 ffff880e7890e080 > 0000000000012280 > [ 719.552786] ffff880e45dd3fd8 ffff880e45dd2010 0000000000012280 > 0000000000012280 > [ 719.552787] ffff880e45dd3fd8 0000000000012280 ffff880e78e7e280 > ffff880e7890e080 > [ 719.552787] Call Trace: > [ 719.552789] [<ffffffff81620334>] schedule+0x24/0x70 > [ 719.552795] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] > [ 719.552797] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 719.552802] [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0 > [btrfs] > [ 719.552804] [<ffffffff81076058>] ? idle_balance+0x108/0x130 > [ 719.552805] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 719.552810] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 > [btrfs] > [ 719.552816] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] > [ 719.552817] [<ffffffff81058eff>] process_one_work+0x11f/0x470 > [ 719.552818] [<ffffffff8105b128>] worker_thread+0x178/0x400 > [ 719.552819] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 > [ 719.552820] [<ffffffff8105fc46>] kthread+0x96/0xa0 > [ 719.552822] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 > [ 719.552824] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 > [ 719.552825] [<ffffffff81622dd0>] ? gs_change+0xb/0xb > [ 719.552827] INFO: task kworker/2:1:1184 blocked for more than 120 > seconds. > [ 719.552827] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables > this message. > [ 719.552828] kworker/2:1 D ffffffff8180ebc0 0 1184 2 > 0x00000000 > [ 719.552830] ffff880e448cfc90 0000000000000046 ffff880e46cce380 > 0000000000012280 > [ 719.552831] ffff880e448cffd8 ffff880e448ce010 0000000000012280 > 0000000000012280 > [ 719.552832] ffff880e448cffd8 0000000000012280 ffff880e78e5a240 > ffff880e46cce380 > [ 719.552832] Call Trace: > [ 719.552834] [<ffffffff81620334>] schedule+0x24/0x70 > [ 719.552839] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] > [ 719.552841] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 719.552846] [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0 > [btrfs] > [ 719.552848] [<ffffffff81076058>] ? idle_balance+0x108/0x130 > [ 719.552849] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 719.552854] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 > [btrfs] > [ 719.552859] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] > [ 719.552861] [<ffffffff81058eff>] process_one_work+0x11f/0x470 > [ 719.552862] [<ffffffff8105b128>] worker_thread+0x178/0x400 > [ 719.552863] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 > [ 719.552864] [<ffffffff8105fc46>] kthread+0x96/0xa0 > [ 719.552866] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 > [ 719.552867] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 > [ 719.552868] [<ffffffff81622dd0>] ? gs_change+0xb/0xb > [ 719.552869] INFO: task kworker/5:1:1186 blocked for more than 120 > seconds. > [ 719.552869] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables > this message. > [ 719.552870] kworker/5:1 D ffffffff8180ebc0 0 1186 2 > 0x00000000 > [ 719.552872] ffff880e448a3c90 0000000000000046 ffff880e46d36600 > 0000000000012280 > [ 719.552873] ffff880e448a3fd8 ffff880e448a2010 0000000000012280 > 0000000000012280 > [ 719.552874] ffff880e448a3fd8 0000000000012280 ffff880e78e88300 > ffff880e46d36600 > [ 719.552874] Call Trace: > [ 719.552876] [<ffffffff81620334>] schedule+0x24/0x70 > [ 719.552881] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] > [ 719.552883] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 719.552888] [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0 > [btrfs] > [ 719.552890] [<ffffffff81076058>] ? idle_balance+0x108/0x130 > [ 719.552891] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 719.552896] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 > [btrfs] > [ 719.552901] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] > [ 719.552902] [<ffffffff81058eff>] process_one_work+0x11f/0x470 > [ 719.552903] [<ffffffff8105b128>] worker_thread+0x178/0x400 > [ 719.552904] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 > [ 719.552906] [<ffffffff8105fc46>] kthread+0x96/0xa0 > [ 719.552907] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 > [ 719.552909] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 > [ 719.552910] [<ffffffff81622dd0>] ? gs_change+0xb/0xb > > Stefan > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: btrfs deadlock in 3.5-rc3 2012-06-23 13:46 ` Michael @ 2012-06-23 14:55 ` Stefan Priebe 0 siblings, 0 replies; 23+ messages in thread From: Stefan Priebe @ 2012-06-23 14:55 UTC (permalink / raw) To: Michael; +Cc: linux-btrfs I'm on linux kernel 3.5.0-rc3 like I mentioned in subject ;-). Kernel.org says this is the latest. Greets Stefan Am 23.06.2012 um 15:46 schrieb Michael <mike@draftx.net>: > Hi Stefan, > Could you please give us details about your setup, especially what > kernel/btrfs version. If you aren't on the most recent, it would be > great if you could update and see if you can still reproduce, as there > was a major change in btrfs recently for Ceph. I'm not sure what > version it made it into. > Regards, > Michael > > On Sat, Jun 23, 2012 at 3:50 AM, Stefan Priebe <s.priebe@profihost.ag> wrote: >> >> Hello list, >> >> i've seen this deadlock today and can reproduce it while using ceph. >> >> Is this a known bug? >> >> [ 599.514534] INFO: task kworker/6:0:29 blocked for more than 120 >> seconds. >> [ 599.530394] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables >> this message. >> [ 599.546832] kworker/6:0 D ffffffff8180ebc0 0 29 2 >> 0x00000000 >> [ 599.563725] ffff880e78f4dc90 0000000000000046 ffff880e78f4a180 >> 0000000000012280 >> [ 599.580812] ffff880e78f4dfd8 ffff880e78f4c010 0000000000012280 >> 0000000000012280 >> [ 599.580813] ffff880e78f4dfd8 0000000000012280 ffff880e78e8c340 >> ffff880e78f4a180 >> [ 599.580814] Call Trace: >> [ 599.580820] [<ffffffff81620334>] schedule+0x24/0x70 >> [ 599.580832] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] >> [ 599.580836] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 >> [ 599.580844] [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0 >> [btrfs] >> [ 599.580846] [<ffffffff81076058>] ? idle_balance+0x108/0x130 >> [ 599.580847] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 >> [ 599.580853] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 >> [btrfs] >> [ 599.580859] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] >> [ 599.580861] [<ffffffff81058eff>] process_one_work+0x11f/0x470 >> [ 599.580862] [<ffffffff8105b128>] worker_thread+0x178/0x400 >> [ 599.580863] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 >> [ 599.580865] [<ffffffff8105fc46>] kthread+0x96/0xa0 >> [ 599.580866] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 >> [ 599.580868] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 >> [ 599.580869] [<ffffffff81622dd0>] ? gs_change+0xb/0xb >> [ 599.580872] INFO: task kworker/2:1:1184 blocked for more than 120 >> seconds. >> [ 599.580872] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables >> this message. >> [ 599.580873] kworker/2:1 D ffffffff8180ebc0 0 1184 2 >> 0x00000000 >> [ 599.580874] ffff880e448cfc90 0000000000000046 ffff880e46cce380 >> 0000000000012280 >> [ 599.580875] ffff880e448cffd8 ffff880e448ce010 0000000000012280 >> 0000000000012280 >> [ 599.580876] ffff880e448cffd8 0000000000012280 ffff880e78e5a240 >> ffff880e46cce380 >> [ 599.580877] Call Trace: >> [ 599.580879] [<ffffffff81620334>] schedule+0x24/0x70 >> [ 599.580884] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] >> [ 599.580886] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 >> [ 599.580891] [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0 >> [btrfs] >> [ 599.580893] [<ffffffff81076058>] ? idle_balance+0x108/0x130 >> [ 599.580894] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 >> [ 599.580899] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 >> [btrfs] >> [ 599.580905] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] >> [ 599.580906] [<ffffffff81058eff>] process_one_work+0x11f/0x470 >> [ 599.580907] [<ffffffff8105b128>] worker_thread+0x178/0x400 >> [ 599.580908] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 >> [ 599.580910] [<ffffffff8105fc46>] kthread+0x96/0xa0 >> [ 599.580911] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 >> [ 599.580913] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 >> [ 599.580914] [<ffffffff81622dd0>] ? gs_change+0xb/0xb >> [ 599.580915] INFO: task kworker/5:1:1186 blocked for more than 120 >> seconds. >> [ 599.580915] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables >> this message. >> [ 599.580916] kworker/5:1 D ffffffff8180ebc0 0 1186 2 >> 0x00000000 >> [ 599.580917] ffff880e448a3c90 0000000000000046 ffff880e46d36600 >> 0000000000012280 >> [ 599.580919] ffff880e448a3fd8 ffff880e448a2010 0000000000012280 >> 0000000000012280 >> [ 599.580920] ffff880e448a3fd8 0000000000012280 ffff880e78e88300 >> ffff880e46d36600 >> [ 599.580920] Call Trace: >> [ 599.580922] [<ffffffff81620334>] schedule+0x24/0x70 >> [ 599.580927] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] >> [ 599.580929] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 >> [ 599.580934] [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0 >> [btrfs] >> [ 599.580936] [<ffffffff81076058>] ? idle_balance+0x108/0x130 >> [ 599.580937] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 >> [ 599.580942] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 >> [btrfs] >> [ 599.580948] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] >> [ 599.580949] [<ffffffff81058eff>] process_one_work+0x11f/0x470 >> [ 599.580950] [<ffffffff8105b128>] worker_thread+0x178/0x400 >> [ 599.580951] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 >> [ 599.580952] [<ffffffff8105fc46>] kthread+0x96/0xa0 >> [ 599.580954] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 >> [ 599.580955] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 >> [ 599.580957] [<ffffffff81622dd0>] ? gs_change+0xb/0xb >> [ 599.580966] INFO: task ceph-osd:3248 blocked for more than 120 seconds. >> [ 599.580966] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables >> this message. >> [ 599.580967] ceph-osd D 0000000000000006 0 3248 1 >> 0x00000000 >> [ 599.580968] ffff880bc4201c38 0000000000000082 ffff880e3efc4480 >> 0000000000012280 >> [ 599.580970] ffff880bc4201fd8 ffff880bc4200010 0000000000012280 >> 0000000000012280 >> [ 599.580971] ffff880bc4201fd8 0000000000012280 ffff880e78f4a180 >> ffff880e3efc4480 >> [ 599.580971] Call Trace: >> [ 599.580972] [<ffffffff81620334>] schedule+0x24/0x70 >> [ 599.580978] [<ffffffffa009dd65>] >> btrfs_commit_transaction_async+0x1d5/0x240 [btrfs] >> [ 599.580983] [<ffffffffa0081eb6>] ? block_rsv_add_bytes+0x26/0x60 >> [btrfs] >> [ 599.580985] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 >> [ 599.580990] [<ffffffffa0081f25>] ? block_rsv_migrate_bytes+0x35/0x50 >> [btrfs] >> [ 599.580997] [<ffffffffa00ced2e>] btrfs_mksubvol+0x2be/0x350 [btrfs] >> [ 599.581004] [<ffffffffa00ceec9>] >> btrfs_ioctl_snap_create_transid+0x109/0x1a0 [btrfs] >> [ 599.581010] [<ffffffffa00d0bc4>] btrfs_ioctl_snap_create_v2+0x84/0xf0 >> [btrfs] >> [ 599.581016] [<ffffffffa00d353f>] btrfs_ioctl+0x76f/0x12d0 [btrfs] >> [ 599.581019] [<ffffffff8114459a>] ? fsnotify+0x1ba/0x2e0 >> [ 599.581021] [<ffffffff8111ade3>] do_vfs_ioctl+0x93/0x4f0 >> [ 599.581022] [<ffffffff8111b28a>] sys_ioctl+0x4a/0x80 >> [ 599.581024] [<ffffffff81621ba2>] system_call_fastpath+0x16/0x1b >> [ 599.581035] INFO: task ceph-osd:3278 blocked for more than 120 seconds. >> [ 599.581035] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables >> this message. >> [ 599.581036] ceph-osd D 0000000000000002 0 3278 1 >> 0x00000000 >> [ 599.581038] ffff880bc41fbc38 0000000000000082 ffff880e412283c0 >> 0000000000012280 >> [ 599.581039] ffff880bc41fbfd8 ffff880bc41fa010 0000000000012280 >> 0000000000012280 >> [ 599.581040] ffff880bc41fbfd8 0000000000012280 ffff880e46cce380 >> ffff880e412283c0 >> [ 599.581040] Call Trace: >> [ 599.581042] [<ffffffff81620334>] schedule+0x24/0x70 >> [ 599.581048] [<ffffffffa009dd65>] >> btrfs_commit_transaction_async+0x1d5/0x240 [btrfs] >> [ 599.581053] [<ffffffffa0081eb6>] ? block_rsv_add_bytes+0x26/0x60 >> [btrfs] >> [ 599.581055] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 >> [ 599.581059] [<ffffffffa0081f25>] ? block_rsv_migrate_bytes+0x35/0x50 >> [btrfs] >> [ 599.581066] [<ffffffffa00ced2e>] btrfs_mksubvol+0x2be/0x350 [btrfs] >> [ 599.581071] [<ffffffffa00ceec9>] >> btrfs_ioctl_snap_create_transid+0x109/0x1a0 [btrfs] >> [ 599.581077] [<ffffffffa00d0bc4>] btrfs_ioctl_snap_create_v2+0x84/0xf0 >> [btrfs] >> [ 599.581082] [<ffffffffa00d353f>] btrfs_ioctl+0x76f/0x12d0 [btrfs] >> [ 599.581084] [<ffffffff8114459a>] ? fsnotify+0x1ba/0x2e0 >> [ 599.581085] [<ffffffff8111ade3>] do_vfs_ioctl+0x93/0x4f0 >> [ 599.581086] [<ffffffff8111b28a>] sys_ioctl+0x4a/0x80 >> [ 599.581088] [<ffffffff81621ba2>] system_call_fastpath+0x16/0x1b >> [ 599.581094] INFO: task ceph-osd:3330 blocked for more than 120 seconds. >> [ 599.581094] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables >> this message. >> [ 599.581095] ceph-osd D 0000000000000005 0 3330 1 >> 0x00000000 >> [ 599.581096] ffff880bc3bf9c38 0000000000000086 ffff880bc3bd4740 >> 0000000000012280 >> [ 599.581098] ffff880bc3bf9fd8 ffff880bc3bf8010 0000000000012280 >> 0000000000012280 >> [ 599.581099] ffff880bc3bf9fd8 0000000000012280 ffff880e46d36600 >> ffff880bc3bd4740 >> [ 599.581099] Call Trace: >> [ 599.581101] [<ffffffff81620334>] schedule+0x24/0x70 >> [ 599.581106] [<ffffffffa009dd65>] >> btrfs_commit_transaction_async+0x1d5/0x240 [btrfs] >> [ 599.581111] [<ffffffffa0081eb6>] ? block_rsv_add_bytes+0x26/0x60 >> [btrfs] >> [ 599.581113] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 >> [ 599.581118] [<ffffffffa0081f25>] ? block_rsv_migrate_bytes+0x35/0x50 >> [btrfs] >> [ 599.581123] [<ffffffffa00ced2e>] btrfs_mksubvol+0x2be/0x350 [btrfs] >> [ 599.581129] [<ffffffffa00ceec9>] >> btrfs_ioctl_snap_create_transid+0x109/0x1a0 [btrfs] >> [ 599.581134] [<ffffffffa00d0bc4>] btrfs_ioctl_snap_create_v2+0x84/0xf0 >> [btrfs] >> [ 599.581139] [<ffffffffa00d353f>] btrfs_ioctl+0x76f/0x12d0 [btrfs] >> [ 599.581141] [<ffffffff8114459a>] ? fsnotify+0x1ba/0x2e0 >> [ 599.581142] [<ffffffff8111ade3>] do_vfs_ioctl+0x93/0x4f0 >> [ 599.581143] [<ffffffff8111b28a>] sys_ioctl+0x4a/0x80 >> [ 599.581145] [<ffffffff81621ba2>] system_call_fastpath+0x16/0x1b >> [ 719.395814] INFO: task kworker/6:0:29 blocked for more than 120 >> seconds. >> [ 719.417426] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables >> this message. >> [ 719.439537] kworker/6:0 D ffffffff8180ebc0 0 29 2 >> 0x00000000 >> [ 719.461937] ffff880e78f4dc90 0000000000000046 ffff880e78f4a180 >> 0000000000012280 >> [ 719.484547] ffff880e78f4dfd8 ffff880e78f4c010 0000000000012280 >> 0000000000012280 >> [ 719.507211] ffff880e78f4dfd8 0000000000012280 ffff880e78e8c340 >> ffff880e78f4a180 >> [ 719.530071] Call Trace: >> [ 719.552729] [<ffffffff81620334>] schedule+0x24/0x70 >> [ 719.552741] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] >> [ 719.552744] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 >> [ 719.552752] [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0 >> [btrfs] >> [ 719.552755] [<ffffffff81076058>] ? idle_balance+0x108/0x130 >> [ 719.552756] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 >> [ 719.552762] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 >> [btrfs] >> [ 719.552768] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] >> [ 719.552770] [<ffffffff81058eff>] process_one_work+0x11f/0x470 >> [ 719.552771] [<ffffffff8105b128>] worker_thread+0x178/0x400 >> [ 719.552772] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 >> [ 719.552774] [<ffffffff8105fc46>] kthread+0x96/0xa0 >> [ 719.552776] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 >> [ 719.552777] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 >> [ 719.552779] [<ffffffff81622dd0>] ? gs_change+0xb/0xb >> [ 719.552781] INFO: task kworker/3:1:578 blocked for more than 120 >> seconds. >> [ 719.552781] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables >> this message. >> [ 719.552783] kworker/3:1 D ffffffff8180ebc0 0 578 2 >> 0x00000000 >> [ 719.552784] ffff880e45dd3c90 0000000000000046 ffff880e7890e080 >> 0000000000012280 >> [ 719.552786] ffff880e45dd3fd8 ffff880e45dd2010 0000000000012280 >> 0000000000012280 >> [ 719.552787] ffff880e45dd3fd8 0000000000012280 ffff880e78e7e280 >> ffff880e7890e080 >> [ 719.552787] Call Trace: >> [ 719.552789] [<ffffffff81620334>] schedule+0x24/0x70 >> [ 719.552795] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] >> [ 719.552797] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 >> [ 719.552802] [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0 >> [btrfs] >> [ 719.552804] [<ffffffff81076058>] ? idle_balance+0x108/0x130 >> [ 719.552805] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 >> [ 719.552810] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 >> [btrfs] >> [ 719.552816] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] >> [ 719.552817] [<ffffffff81058eff>] process_one_work+0x11f/0x470 >> [ 719.552818] [<ffffffff8105b128>] worker_thread+0x178/0x400 >> [ 719.552819] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 >> [ 719.552820] [<ffffffff8105fc46>] kthread+0x96/0xa0 >> [ 719.552822] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 >> [ 719.552824] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 >> [ 719.552825] [<ffffffff81622dd0>] ? gs_change+0xb/0xb >> [ 719.552827] INFO: task kworker/2:1:1184 blocked for more than 120 >> seconds. >> [ 719.552827] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables >> this message. >> [ 719.552828] kworker/2:1 D ffffffff8180ebc0 0 1184 2 >> 0x00000000 >> [ 719.552830] ffff880e448cfc90 0000000000000046 ffff880e46cce380 >> 0000000000012280 >> [ 719.552831] ffff880e448cffd8 ffff880e448ce010 0000000000012280 >> 0000000000012280 >> [ 719.552832] ffff880e448cffd8 0000000000012280 ffff880e78e5a240 >> ffff880e46cce380 >> [ 719.552832] Call Trace: >> [ 719.552834] [<ffffffff81620334>] schedule+0x24/0x70 >> [ 719.552839] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] >> [ 719.552841] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 >> [ 719.552846] [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0 >> [btrfs] >> [ 719.552848] [<ffffffff81076058>] ? idle_balance+0x108/0x130 >> [ 719.552849] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 >> [ 719.552854] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 >> [btrfs] >> [ 719.552859] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] >> [ 719.552861] [<ffffffff81058eff>] process_one_work+0x11f/0x470 >> [ 719.552862] [<ffffffff8105b128>] worker_thread+0x178/0x400 >> [ 719.552863] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 >> [ 719.552864] [<ffffffff8105fc46>] kthread+0x96/0xa0 >> [ 719.552866] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 >> [ 719.552867] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 >> [ 719.552868] [<ffffffff81622dd0>] ? gs_change+0xb/0xb >> [ 719.552869] INFO: task kworker/5:1:1186 blocked for more than 120 >> seconds. >> [ 719.552869] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables >> this message. >> [ 719.552870] kworker/5:1 D ffffffff8180ebc0 0 1186 2 >> 0x00000000 >> [ 719.552872] ffff880e448a3c90 0000000000000046 ffff880e46d36600 >> 0000000000012280 >> [ 719.552873] ffff880e448a3fd8 ffff880e448a2010 0000000000012280 >> 0000000000012280 >> [ 719.552874] ffff880e448a3fd8 0000000000012280 ffff880e78e88300 >> ffff880e46d36600 >> [ 719.552874] Call Trace: >> [ 719.552876] [<ffffffff81620334>] schedule+0x24/0x70 >> [ 719.552881] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] >> [ 719.552883] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 >> [ 719.552888] [<ffffffffa009d095>] btrfs_commit_transaction+0x655/0xab0 >> [btrfs] >> [ 719.552890] [<ffffffff81076058>] ? idle_balance+0x108/0x130 >> [ 719.552891] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 >> [ 719.552896] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 >> [btrfs] >> [ 719.552901] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] >> [ 719.552902] [<ffffffff81058eff>] process_one_work+0x11f/0x470 >> [ 719.552903] [<ffffffff8105b128>] worker_thread+0x178/0x400 >> [ 719.552904] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 >> [ 719.552906] [<ffffffff8105fc46>] kthread+0x96/0xa0 >> [ 719.552907] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 >> [ 719.552909] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 >> [ 719.552910] [<ffffffff81622dd0>] ? gs_change+0xb/0xb >> >> Stefan >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: btrfs deadlock in 3.5-rc3 2012-06-23 8:50 btrfs deadlock in 3.5-rc3 Stefan Priebe 2012-06-23 13:46 ` Michael @ 2012-06-25 13:08 ` Josef Bacik 2012-06-25 14:08 ` Stefan Priebe - Profihost AG 1 sibling, 1 reply; 23+ messages in thread From: Josef Bacik @ 2012-06-25 13:08 UTC (permalink / raw) To: Stefan Priebe; +Cc: linux-btrfs On 06/23/2012 04:50 AM, Stefan Priebe wrote: > Hello list, > > i've seen this deadlock today and can reproduce it while using ceph. > > Is this a known bug? > > [ 599.514534] INFO: task kworker/6:0:29 blocked for more than 120 seconds. > [ 599.530394] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [ 599.546832] kworker/6:0 D ffffffff8180ebc0 0 29 2 > 0x00000000 > [ 599.563725] ffff880e78f4dc90 0000000000000046 ffff880e78f4a180 > 0000000000012280 > [ 599.580812] ffff880e78f4dfd8 ffff880e78f4c010 0000000000012280 > 0000000000012280 > [ 599.580813] ffff880e78f4dfd8 0000000000012280 ffff880e78e8c340 > ffff880e78f4a180 > [ 599.580814] Call Trace: > [ 599.580820] [<ffffffff81620334>] schedule+0x24/0x70 > [ 599.580832] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] > [ 599.580836] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 599.580844] [<ffffffffa009d095>] > btrfs_commit_transaction+0x655/0xab0 [btrfs] > [ 599.580846] [<ffffffff81076058>] ? idle_balance+0x108/0x130 > [ 599.580847] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 599.580853] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 > [btrfs] > [ 599.580859] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] > [ 599.580861] [<ffffffff81058eff>] process_one_work+0x11f/0x470 > [ 599.580862] [<ffffffff8105b128>] worker_thread+0x178/0x400 > [ 599.580863] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 > [ 599.580865] [<ffffffff8105fc46>] kthread+0x96/0xa0 > [ 599.580866] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 > [ 599.580868] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 > [ 599.580869] [<ffffffff81622dd0>] ? gs_change+0xb/0xb > [ 599.580872] INFO: task kworker/2:1:1184 blocked for more than 120 > seconds. > [ 599.580872] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [ 599.580873] kworker/2:1 D ffffffff8180ebc0 0 1184 2 > 0x00000000 > [ 599.580874] ffff880e448cfc90 0000000000000046 ffff880e46cce380 > 0000000000012280 > [ 599.580875] ffff880e448cffd8 ffff880e448ce010 0000000000012280 > 0000000000012280 > [ 599.580876] ffff880e448cffd8 0000000000012280 ffff880e78e5a240 > ffff880e46cce380 > [ 599.580877] Call Trace: > [ 599.580879] [<ffffffff81620334>] schedule+0x24/0x70 > [ 599.580884] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] > [ 599.580886] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 599.580891] [<ffffffffa009d095>] > btrfs_commit_transaction+0x655/0xab0 [btrfs] > [ 599.580893] [<ffffffff81076058>] ? idle_balance+0x108/0x130 > [ 599.580894] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 599.580899] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 > [btrfs] > [ 599.580905] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] > [ 599.580906] [<ffffffff81058eff>] process_one_work+0x11f/0x470 > [ 599.580907] [<ffffffff8105b128>] worker_thread+0x178/0x400 > [ 599.580908] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 > [ 599.580910] [<ffffffff8105fc46>] kthread+0x96/0xa0 > [ 599.580911] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 > [ 599.580913] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 > [ 599.580914] [<ffffffff81622dd0>] ? gs_change+0xb/0xb > [ 599.580915] INFO: task kworker/5:1:1186 blocked for more than 120 > seconds. > [ 599.580915] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [ 599.580916] kworker/5:1 D ffffffff8180ebc0 0 1186 2 > 0x00000000 > [ 599.580917] ffff880e448a3c90 0000000000000046 ffff880e46d36600 > 0000000000012280 > [ 599.580919] ffff880e448a3fd8 ffff880e448a2010 0000000000012280 > 0000000000012280 > [ 599.580920] ffff880e448a3fd8 0000000000012280 ffff880e78e88300 > ffff880e46d36600 > [ 599.580920] Call Trace: > [ 599.580922] [<ffffffff81620334>] schedule+0x24/0x70 > [ 599.580927] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] > [ 599.580929] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 599.580934] [<ffffffffa009d095>] > btrfs_commit_transaction+0x655/0xab0 [btrfs] > [ 599.580936] [<ffffffff81076058>] ? idle_balance+0x108/0x130 > [ 599.580937] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 599.580942] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 > [btrfs] > [ 599.580948] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] > [ 599.580949] [<ffffffff81058eff>] process_one_work+0x11f/0x470 > [ 599.580950] [<ffffffff8105b128>] worker_thread+0x178/0x400 > [ 599.580951] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 > [ 599.580952] [<ffffffff8105fc46>] kthread+0x96/0xa0 > [ 599.580954] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 > [ 599.580955] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 > [ 599.580957] [<ffffffff81622dd0>] ? gs_change+0xb/0xb > [ 599.580966] INFO: task ceph-osd:3248 blocked for more than 120 seconds. > [ 599.580966] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [ 599.580967] ceph-osd D 0000000000000006 0 3248 1 > 0x00000000 > [ 599.580968] ffff880bc4201c38 0000000000000082 ffff880e3efc4480 > 0000000000012280 > [ 599.580970] ffff880bc4201fd8 ffff880bc4200010 0000000000012280 > 0000000000012280 > [ 599.580971] ffff880bc4201fd8 0000000000012280 ffff880e78f4a180 > ffff880e3efc4480 > [ 599.580971] Call Trace: > [ 599.580972] [<ffffffff81620334>] schedule+0x24/0x70 > [ 599.580978] [<ffffffffa009dd65>] > btrfs_commit_transaction_async+0x1d5/0x240 [btrfs] > [ 599.580983] [<ffffffffa0081eb6>] ? block_rsv_add_bytes+0x26/0x60 [btrfs] > [ 599.580985] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 599.580990] [<ffffffffa0081f25>] ? block_rsv_migrate_bytes+0x35/0x50 > [btrfs] > [ 599.580997] [<ffffffffa00ced2e>] btrfs_mksubvol+0x2be/0x350 [btrfs] > [ 599.581004] [<ffffffffa00ceec9>] > btrfs_ioctl_snap_create_transid+0x109/0x1a0 [btrfs] > [ 599.581010] [<ffffffffa00d0bc4>] > btrfs_ioctl_snap_create_v2+0x84/0xf0 [btrfs] > [ 599.581016] [<ffffffffa00d353f>] btrfs_ioctl+0x76f/0x12d0 [btrfs] > [ 599.581019] [<ffffffff8114459a>] ? fsnotify+0x1ba/0x2e0 > [ 599.581021] [<ffffffff8111ade3>] do_vfs_ioctl+0x93/0x4f0 > [ 599.581022] [<ffffffff8111b28a>] sys_ioctl+0x4a/0x80 > [ 599.581024] [<ffffffff81621ba2>] system_call_fastpath+0x16/0x1b > [ 599.581035] INFO: task ceph-osd:3278 blocked for more than 120 seconds. > [ 599.581035] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [ 599.581036] ceph-osd D 0000000000000002 0 3278 1 > 0x00000000 > [ 599.581038] ffff880bc41fbc38 0000000000000082 ffff880e412283c0 > 0000000000012280 > [ 599.581039] ffff880bc41fbfd8 ffff880bc41fa010 0000000000012280 > 0000000000012280 > [ 599.581040] ffff880bc41fbfd8 0000000000012280 ffff880e46cce380 > ffff880e412283c0 > [ 599.581040] Call Trace: > [ 599.581042] [<ffffffff81620334>] schedule+0x24/0x70 > [ 599.581048] [<ffffffffa009dd65>] > btrfs_commit_transaction_async+0x1d5/0x240 [btrfs] > [ 599.581053] [<ffffffffa0081eb6>] ? block_rsv_add_bytes+0x26/0x60 [btrfs] > [ 599.581055] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 599.581059] [<ffffffffa0081f25>] ? block_rsv_migrate_bytes+0x35/0x50 > [btrfs] > [ 599.581066] [<ffffffffa00ced2e>] btrfs_mksubvol+0x2be/0x350 [btrfs] > [ 599.581071] [<ffffffffa00ceec9>] > btrfs_ioctl_snap_create_transid+0x109/0x1a0 [btrfs] > [ 599.581077] [<ffffffffa00d0bc4>] > btrfs_ioctl_snap_create_v2+0x84/0xf0 [btrfs] > [ 599.581082] [<ffffffffa00d353f>] btrfs_ioctl+0x76f/0x12d0 [btrfs] > [ 599.581084] [<ffffffff8114459a>] ? fsnotify+0x1ba/0x2e0 > [ 599.581085] [<ffffffff8111ade3>] do_vfs_ioctl+0x93/0x4f0 > [ 599.581086] [<ffffffff8111b28a>] sys_ioctl+0x4a/0x80 > [ 599.581088] [<ffffffff81621ba2>] system_call_fastpath+0x16/0x1b > [ 599.581094] INFO: task ceph-osd:3330 blocked for more than 120 seconds. > [ 599.581094] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [ 599.581095] ceph-osd D 0000000000000005 0 3330 1 > 0x00000000 > [ 599.581096] ffff880bc3bf9c38 0000000000000086 ffff880bc3bd4740 > 0000000000012280 > [ 599.581098] ffff880bc3bf9fd8 ffff880bc3bf8010 0000000000012280 > 0000000000012280 > [ 599.581099] ffff880bc3bf9fd8 0000000000012280 ffff880e46d36600 > ffff880bc3bd4740 > [ 599.581099] Call Trace: > [ 599.581101] [<ffffffff81620334>] schedule+0x24/0x70 > [ 599.581106] [<ffffffffa009dd65>] > btrfs_commit_transaction_async+0x1d5/0x240 [btrfs] > [ 599.581111] [<ffffffffa0081eb6>] ? block_rsv_add_bytes+0x26/0x60 [btrfs] > [ 599.581113] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 599.581118] [<ffffffffa0081f25>] ? block_rsv_migrate_bytes+0x35/0x50 > [btrfs] > [ 599.581123] [<ffffffffa00ced2e>] btrfs_mksubvol+0x2be/0x350 [btrfs] > [ 599.581129] [<ffffffffa00ceec9>] > btrfs_ioctl_snap_create_transid+0x109/0x1a0 [btrfs] > [ 599.581134] [<ffffffffa00d0bc4>] > btrfs_ioctl_snap_create_v2+0x84/0xf0 [btrfs] > [ 599.581139] [<ffffffffa00d353f>] btrfs_ioctl+0x76f/0x12d0 [btrfs] > [ 599.581141] [<ffffffff8114459a>] ? fsnotify+0x1ba/0x2e0 > [ 599.581142] [<ffffffff8111ade3>] do_vfs_ioctl+0x93/0x4f0 > [ 599.581143] [<ffffffff8111b28a>] sys_ioctl+0x4a/0x80 > [ 599.581145] [<ffffffff81621ba2>] system_call_fastpath+0x16/0x1b > [ 719.395814] INFO: task kworker/6:0:29 blocked for more than 120 seconds. > [ 719.417426] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [ 719.439537] kworker/6:0 D ffffffff8180ebc0 0 29 2 > 0x00000000 > [ 719.461937] ffff880e78f4dc90 0000000000000046 ffff880e78f4a180 > 0000000000012280 > [ 719.484547] ffff880e78f4dfd8 ffff880e78f4c010 0000000000012280 > 0000000000012280 > [ 719.507211] ffff880e78f4dfd8 0000000000012280 ffff880e78e8c340 > ffff880e78f4a180 > [ 719.530071] Call Trace: > [ 719.552729] [<ffffffff81620334>] schedule+0x24/0x70 > [ 719.552741] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] > [ 719.552744] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 719.552752] [<ffffffffa009d095>] > btrfs_commit_transaction+0x655/0xab0 [btrfs] > [ 719.552755] [<ffffffff81076058>] ? idle_balance+0x108/0x130 > [ 719.552756] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 719.552762] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 > [btrfs] > [ 719.552768] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] > [ 719.552770] [<ffffffff81058eff>] process_one_work+0x11f/0x470 > [ 719.552771] [<ffffffff8105b128>] worker_thread+0x178/0x400 > [ 719.552772] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 > [ 719.552774] [<ffffffff8105fc46>] kthread+0x96/0xa0 > [ 719.552776] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 > [ 719.552777] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 > [ 719.552779] [<ffffffff81622dd0>] ? gs_change+0xb/0xb > [ 719.552781] INFO: task kworker/3:1:578 blocked for more than 120 seconds. > [ 719.552781] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [ 719.552783] kworker/3:1 D ffffffff8180ebc0 0 578 2 > 0x00000000 > [ 719.552784] ffff880e45dd3c90 0000000000000046 ffff880e7890e080 > 0000000000012280 > [ 719.552786] ffff880e45dd3fd8 ffff880e45dd2010 0000000000012280 > 0000000000012280 > [ 719.552787] ffff880e45dd3fd8 0000000000012280 ffff880e78e7e280 > ffff880e7890e080 > [ 719.552787] Call Trace: > [ 719.552789] [<ffffffff81620334>] schedule+0x24/0x70 > [ 719.552795] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] > [ 719.552797] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 719.552802] [<ffffffffa009d095>] > btrfs_commit_transaction+0x655/0xab0 [btrfs] > [ 719.552804] [<ffffffff81076058>] ? idle_balance+0x108/0x130 > [ 719.552805] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 719.552810] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 > [btrfs] > [ 719.552816] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] > [ 719.552817] [<ffffffff81058eff>] process_one_work+0x11f/0x470 > [ 719.552818] [<ffffffff8105b128>] worker_thread+0x178/0x400 > [ 719.552819] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 > [ 719.552820] [<ffffffff8105fc46>] kthread+0x96/0xa0 > [ 719.552822] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 > [ 719.552824] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 > [ 719.552825] [<ffffffff81622dd0>] ? gs_change+0xb/0xb > [ 719.552827] INFO: task kworker/2:1:1184 blocked for more than 120 > seconds. > [ 719.552827] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [ 719.552828] kworker/2:1 D ffffffff8180ebc0 0 1184 2 > 0x00000000 > [ 719.552830] ffff880e448cfc90 0000000000000046 ffff880e46cce380 > 0000000000012280 > [ 719.552831] ffff880e448cffd8 ffff880e448ce010 0000000000012280 > 0000000000012280 > [ 719.552832] ffff880e448cffd8 0000000000012280 ffff880e78e5a240 > ffff880e46cce380 > [ 719.552832] Call Trace: > [ 719.552834] [<ffffffff81620334>] schedule+0x24/0x70 > [ 719.552839] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] > [ 719.552841] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 719.552846] [<ffffffffa009d095>] > btrfs_commit_transaction+0x655/0xab0 [btrfs] > [ 719.552848] [<ffffffff81076058>] ? idle_balance+0x108/0x130 > [ 719.552849] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 719.552854] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 > [btrfs] > [ 719.552859] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] > [ 719.552861] [<ffffffff81058eff>] process_one_work+0x11f/0x470 > [ 719.552862] [<ffffffff8105b128>] worker_thread+0x178/0x400 > [ 719.552863] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 > [ 719.552864] [<ffffffff8105fc46>] kthread+0x96/0xa0 > [ 719.552866] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 > [ 719.552867] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 > [ 719.552868] [<ffffffff81622dd0>] ? gs_change+0xb/0xb > [ 719.552869] INFO: task kworker/5:1:1186 blocked for more than 120 > seconds. > [ 719.552869] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [ 719.552870] kworker/5:1 D ffffffff8180ebc0 0 1186 2 > 0x00000000 > [ 719.552872] ffff880e448a3c90 0000000000000046 ffff880e46d36600 > 0000000000012280 > [ 719.552873] ffff880e448a3fd8 ffff880e448a2010 0000000000012280 > 0000000000012280 > [ 719.552874] ffff880e448a3fd8 0000000000012280 ffff880e78e88300 > ffff880e46d36600 > [ 719.552874] Call Trace: > [ 719.552876] [<ffffffff81620334>] schedule+0x24/0x70 > [ 719.552881] [<ffffffffa009b4c5>] wait_for_commit+0x55/0x90 [btrfs] > [ 719.552883] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 719.552888] [<ffffffffa009d095>] > btrfs_commit_transaction+0x655/0xab0 [btrfs] > [ 719.552890] [<ffffffff81076058>] ? idle_balance+0x108/0x130 > [ 719.552891] [<ffffffff81060140>] ? wake_up_bit+0x40/0x40 > [ 719.552896] [<ffffffffa009d7f0>] ? btrfs_end_transaction+0x20/0x20 > [btrfs] > [ 719.552901] [<ffffffffa009d80a>] do_async_commit+0x1a/0x30 [btrfs] > [ 719.552902] [<ffffffff81058eff>] process_one_work+0x11f/0x470 > [ 719.552903] [<ffffffff8105b128>] worker_thread+0x178/0x400 > [ 719.552904] [<ffffffff8105afb0>] ? manage_workers+0x210/0x210 > [ 719.552906] [<ffffffff8105fc46>] kthread+0x96/0xa0 > [ 719.552907] [<ffffffff81622dd4>] kernel_thread_helper+0x4/0x10 > [ 719.552909] [<ffffffff8105fbb0>] ? kthread_worker_fn+0x130/0x130 > [ 719.552910] [<ffffffff81622dd0>] ? gs_change+0xb/0xb > This isn't showing the guy who's actually trying to commit the transaction. Next time this happens can you do a sysrq+w and capture the output and post it here so we can see what everybody is doing? Thanks, Josef ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: btrfs deadlock in 3.5-rc3 2012-06-25 13:08 ` Josef Bacik @ 2012-06-25 14:08 ` Stefan Priebe - Profihost AG 2012-06-25 14:20 ` Josef Bacik 0 siblings, 1 reply; 23+ messages in thread From: Stefan Priebe - Profihost AG @ 2012-06-25 14:08 UTC (permalink / raw) To: Josef Bacik; +Cc: linux-btrfs Am 25.06.2012 15:08, schrieb Josef Bacik:> > This isn't showing the guy who's actually trying to commit the > transaction. Next time this happens can you do a sysrq+w and capture > the output and post it here so we can see what everybody is doing? Thanks, > > Josef No problem. Kernel trace: http://pastebin.com/raw.php?i=puZkCRCn # sys rq w trigger: # echo w > /proc/sysrq-trigger http://pastebin.com/raw.php?i=AQA8xxCX Hope that helps. I'm able to trigger this pretty easily with ceph. So i can produce as much info as you want. Thanks! Greets, Stefan ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: btrfs deadlock in 3.5-rc3 2012-06-25 14:08 ` Stefan Priebe - Profihost AG @ 2012-06-25 14:20 ` Josef Bacik 2012-06-25 14:45 ` Stefan Priebe - Profihost AG 0 siblings, 1 reply; 23+ messages in thread From: Josef Bacik @ 2012-06-25 14:20 UTC (permalink / raw) To: Stefan Priebe - Profihost AG; +Cc: Josef Bacik, linux-btrfs On Mon, Jun 25, 2012 at 08:08:31AM -0600, Stefan Priebe - Profihost AG wrote: > Am 25.06.2012 15:08, schrieb Josef Bacik:> > > This isn't showing the guy who's actually trying to commit the > > transaction. Next time this happens can you do a sysrq+w and capture > > the output and post it here so we can see what everybody is doing? > Thanks, > > > > Josef > > No problem. > > Kernel trace: > http://pastebin.com/raw.php?i=puZkCRCn > > # sys rq w trigger: > # echo w > /proc/sysrq-trigger > http://pastebin.com/raw.php?i=AQA8xxCX Thats weird, sysrq+w should have a bunch of stacktraces but it's empty, so unless theres a bug theres nothing blocked. Is the box actually hung or is it just taking forever? Maybe try sysrq+w again to see if the one you pasted was just a fluke? Thanks, Josef ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: btrfs deadlock in 3.5-rc3 2012-06-25 14:20 ` Josef Bacik @ 2012-06-25 14:45 ` Stefan Priebe - Profihost AG 2012-06-25 14:48 ` Josef Bacik 0 siblings, 1 reply; 23+ messages in thread From: Stefan Priebe - Profihost AG @ 2012-06-25 14:45 UTC (permalink / raw) To: Josef Bacik; +Cc: linux-btrfs > > Thats weird, sysrq+w should have a bunch of stacktraces but it's empty, so > unless theres a bug theres nothing blocked. Is the box actually hung or is it > just taking forever? Maybe try sysrq+w again to see if the one you pasted was > just a fluke? Thanks, This one looks better: http://pastebin.com/raw.php?i=R4pztDRt Sorry. Stefan ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: btrfs deadlock in 3.5-rc3 2012-06-25 14:45 ` Stefan Priebe - Profihost AG @ 2012-06-25 14:48 ` Josef Bacik 2012-06-25 17:38 ` Stefan Priebe 0 siblings, 1 reply; 23+ messages in thread From: Josef Bacik @ 2012-06-25 14:48 UTC (permalink / raw) To: Stefan Priebe - Profihost AG; +Cc: Josef Bacik, linux-btrfs On Mon, Jun 25, 2012 at 08:45:02AM -0600, Stefan Priebe - Profihost AG wrote: > > > > Thats weird, sysrq+w should have a bunch of stacktraces but it's empty, so > > unless theres a bug theres nothing blocked. Is the box actually hung or is it > > just taking forever? Maybe try sysrq+w again to see if the one you pasted was > > just a fluke? Thanks, > > This one looks better: > http://pastebin.com/raw.php?i=R4pztDRt > Ok looks like you have discard turned on. Can you turn that off and see if you can still reproduce the deadlock? If so sysrq+w again, if not then I know where to look ;). Thanks, Josef ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: btrfs deadlock in 3.5-rc3 2012-06-25 14:48 ` Josef Bacik @ 2012-06-25 17:38 ` Stefan Priebe 2012-06-25 18:02 ` Josef Bacik 0 siblings, 1 reply; 23+ messages in thread From: Stefan Priebe @ 2012-06-25 17:38 UTC (permalink / raw) To: Josef Bacik; +Cc: linux-btrfs Am 25.06.2012 16:48, schrieb Josef Bacik: > On Mon, Jun 25, 2012 at 08:45:02AM -0600, Stefan Priebe - Profihost AG wrote: >>> >>> Thats weird, sysrq+w should have a bunch of stacktraces but it's empty, so >>> unless theres a bug theres nothing blocked. Is the box actually hung or is it >>> just taking forever? Maybe try sysrq+w again to see if the one you pasted was >>> just a fluke? Thanks, >> >> This one looks better: >> http://pastebin.com/raw.php?i=R4pztDRt >> > > Ok looks like you have discard turned on. Yes > Can you turn that off and see if you > can still reproduce the deadlock? If so sysrq+w again, if not then I know where > to look ;). Thanks, without discard i can't reproduce but random write speed with ceph without discard is a LOT slower (around 8000 iops/s instead of 13000iops/s). So i don't know if it is discard or if i'm just not able to trigger it. Thanks! Stefan ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: btrfs deadlock in 3.5-rc3 2012-06-25 17:38 ` Stefan Priebe @ 2012-06-25 18:02 ` Josef Bacik 2012-06-25 18:28 ` Stefan Priebe 0 siblings, 1 reply; 23+ messages in thread From: Josef Bacik @ 2012-06-25 18:02 UTC (permalink / raw) To: Stefan Priebe; +Cc: Josef Bacik, linux-btrfs On Mon, Jun 25, 2012 at 11:38:38AM -0600, Stefan Priebe wrote: > > Am 25.06.2012 16:48, schrieb Josef Bacik: > > On Mon, Jun 25, 2012 at 08:45:02AM -0600, Stefan Priebe - Profihost AG wrote: > >>> > >>> Thats weird, sysrq+w should have a bunch of stacktraces but it's empty, so > >>> unless theres a bug theres nothing blocked. Is the box actually hung or is it > >>> just taking forever? Maybe try sysrq+w again to see if the one you pasted was > >>> just a fluke? Thanks, > >> > >> This one looks better: > >> http://pastebin.com/raw.php?i=R4pztDRt > >> > > > > Ok looks like you have discard turned on. > Yes > > > Can you turn that off and see if you > > can still reproduce the deadlock? If so sysrq+w again, if not then I know where > > to look ;). Thanks, > without discard i can't reproduce but random write speed with ceph > without discard is a LOT slower (around 8000 iops/s instead of > 13000iops/s). So i don't know if it is discard or if i'm just not able > to trigger it. > Ouch, what kind of drive goes faster with discard _on_? Anyway it looks like we're waiting for the discard to come back, so either its your drive or theres a bug in the block layer. Maybe try an older kernel and see if it's broken there, and then bisect it down? Thanks, Josef ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: btrfs deadlock in 3.5-rc3 2012-06-25 18:02 ` Josef Bacik @ 2012-06-25 18:28 ` Stefan Priebe 2012-06-25 19:33 ` Stefan Priebe 0 siblings, 1 reply; 23+ messages in thread From: Stefan Priebe @ 2012-06-25 18:28 UTC (permalink / raw) To: Josef Bacik; +Cc: linux-btrfs Am 25.06.2012 20:02, schrieb Josef Bacik: >> > Can you turn that off and see if you >>> can still reproduce the deadlock? If so sysrq+w again, if not then I know where >>> to look ;). Thanks, >> without discard i can't reproduce but random write speed with ceph >> without discard is a LOT slower (around 8000 iops/s instead of >> 13000iops/s). So i don't know if it is discard or if i'm just not able >> to trigger it. >> > > Ouch, what kind of drive goes faster with discard _on_? Anyway it looks like > we're waiting for the discard to come back, so either its your drive or theres a > bug in the block layer. Maybe try an older kernel and see if it's broken there, > and then bisect it down? Thanks, Intel 520 series SSD. I can't try an older kernel version as ceph crashes with them pretty fast due to other bugs which where fixed with 3.5-rc1. Greets Stefan ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: btrfs deadlock in 3.5-rc3 2012-06-25 18:28 ` Stefan Priebe @ 2012-06-25 19:33 ` Stefan Priebe 2012-06-25 20:11 ` Josef Bacik 0 siblings, 1 reply; 23+ messages in thread From: Stefan Priebe @ 2012-06-25 19:33 UTC (permalink / raw) To: Josef Bacik; +Cc: linux-btrfs With v3.4 the same. Can't go back more as this really results in very fast corruption. Any ideas how to debug? Stefan Am 25.06.2012 20:28, schrieb Stefan Priebe: > Am 25.06.2012 20:02, schrieb Josef Bacik: >>> > Can you turn that off and see if you >>>> can still reproduce the deadlock? If so sysrq+w again, if not then >>>> I know where >>>> to look ;). Thanks, >>> without discard i can't reproduce but random write speed with ceph >>> without discard is a LOT slower (around 8000 iops/s instead of >>> 13000iops/s). So i don't know if it is discard or if i'm just not able >>> to trigger it. >>> >> >> Ouch, what kind of drive goes faster with discard _on_? Anyway it >> looks like >> we're waiting for the discard to come back, so either its your drive >> or theres a >> bug in the block layer. Maybe try an older kernel and see if it's >> broken there, >> and then bisect it down? Thanks, > > Intel 520 series SSD. I can't try an older kernel version as ceph > crashes with them pretty fast due to other bugs which where fixed with > 3.5-rc1. > > Greets > Stefan ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: btrfs deadlock in 3.5-rc3 2012-06-25 19:33 ` Stefan Priebe @ 2012-06-25 20:11 ` Josef Bacik 2012-06-25 20:20 ` Stefan Priebe 0 siblings, 1 reply; 23+ messages in thread From: Josef Bacik @ 2012-06-25 20:11 UTC (permalink / raw) To: Stefan Priebe; +Cc: Josef Bacik, linux-btrfs On Mon, Jun 25, 2012 at 01:33:09PM -0600, Stefan Priebe wrote: > With v3.4 the same. Can't go back more as this really results in very > fast corruption. Any ideas how to debug? > What workload are you running? I have a ssd here with discard support I can try and reproduce on. Thanks, Josef ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: btrfs deadlock in 3.5-rc3 2012-06-25 20:11 ` Josef Bacik @ 2012-06-25 20:20 ` Stefan Priebe 2012-06-25 20:23 ` Josef Bacik 0 siblings, 1 reply; 23+ messages in thread From: Stefan Priebe @ 2012-06-25 20:20 UTC (permalink / raw) To: Josef Bacik; +Cc: linux-btrfs Am 25.06.2012 22:11, schrieb Josef Bacik: > On Mon, Jun 25, 2012 at 01:33:09PM -0600, Stefan Priebe wrote: >> With v3.4 the same. Can't go back more as this really results in very >> fast corruption. Any ideas how to debug? >> > > What workload are you running? I have a ssd here with discard support I can try > and reproduce on. Thanks, i'm using fio with 50 jobs and randwrite of 4k blocks in ceph but i don't know which load ceph then exactly generates. ;-( Stefan ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: btrfs deadlock in 3.5-rc3 2012-06-25 20:20 ` Stefan Priebe @ 2012-06-25 20:23 ` Josef Bacik 2012-06-25 20:33 ` Stefan Priebe 2012-06-26 16:47 ` Stefan Priebe 0 siblings, 2 replies; 23+ messages in thread From: Josef Bacik @ 2012-06-25 20:23 UTC (permalink / raw) To: Stefan Priebe; +Cc: Josef Bacik, linux-btrfs On Mon, Jun 25, 2012 at 02:20:31PM -0600, Stefan Priebe wrote: > Am 25.06.2012 22:11, schrieb Josef Bacik: > > On Mon, Jun 25, 2012 at 01:33:09PM -0600, Stefan Priebe wrote: > >> With v3.4 the same. Can't go back more as this really results in very > >> fast corruption. Any ideas how to debug? > >> > > > > What workload are you running? I have a ssd here with discard support I can try > > and reproduce on. Thanks, > > i'm using fio with 50 jobs and randwrite of 4k blocks in ceph but i > don't know which load ceph then exactly generates. ;-( > Thats fine, I have this handy "create a local ceph cluster" script from an earlier problem, just send me your fio job and I'll run it locally. Thanks, Josef ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: btrfs deadlock in 3.5-rc3 2012-06-25 20:23 ` Josef Bacik @ 2012-06-25 20:33 ` Stefan Priebe 2012-06-26 16:47 ` Stefan Priebe 1 sibling, 0 replies; 23+ messages in thread From: Stefan Priebe @ 2012-06-25 20:33 UTC (permalink / raw) To: Josef Bacik; +Cc: linux-btrfs Am 25.06.2012 22:23, schrieb Josef Bacik: > On Mon, Jun 25, 2012 at 02:20:31PM -0600, Stefan Priebe wrote: >> Am 25.06.2012 22:11, schrieb Josef Bacik: >>> On Mon, Jun 25, 2012 at 01:33:09PM -0600, Stefan Priebe wrote: >>>> With v3.4 the same. Can't go back more as this really results in very >>>> fast corruption. Any ideas how to debug? >>>> >>> >>> What workload are you running? I have a ssd here with discard support I can try >>> and reproduce on. Thanks, >> >> i'm using fio with 50 jobs and randwrite of 4k blocks in ceph but i >> don't know which load ceph then exactly generates. ;-( >> > > Thats fine, I have this handy "create a local ceph cluster" script from an > earlier problem, just send me your fio job and I'll run it locally. Thanks, OK my fio job is running in a KVM using a RBD: fio --filename=$DISK --direct=1 --rw=randwrite --bs=4k --size=200G --numjobs=50 --runtime=300 --group_reporting --name=file1 Backed by 3 server with 4 OSDs (all intel SSDs) each running btrfs on top. THANKS! Greets Stefan ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: btrfs deadlock in 3.5-rc3 2012-06-25 20:23 ` Josef Bacik 2012-06-25 20:33 ` Stefan Priebe @ 2012-06-26 16:47 ` Stefan Priebe 2012-06-26 20:14 ` Josef Bacik 1 sibling, 1 reply; 23+ messages in thread From: Stefan Priebe @ 2012-06-26 16:47 UTC (permalink / raw) To: Josef Bacik; +Cc: linux-btrfs Am 25.06.2012 22:23, schrieb Josef Bacik: > On Mon, Jun 25, 2012 at 02:20:31PM -0600, Stefan Priebe wrote: >> Am 25.06.2012 22:11, schrieb Josef Bacik: >>> On Mon, Jun 25, 2012 at 01:33:09PM -0600, Stefan Priebe wrote: >>>> With v3.4 the same. Can't go back more as this really results in very >>>> fast corruption. Any ideas how to debug? >>>> >>> >>> What workload are you running? I have a ssd here with discard support I can try >>> and reproduce on. Thanks, >> >> i'm using fio with 50 jobs and randwrite of 4k blocks in ceph but i >> don't know which load ceph then exactly generates. ;-( >> > > Thats fine, I have this handy "create a local ceph cluster" script from an > earlier problem, just send me your fio job and I'll run it locally. Thanks, Where you able to find anything? Can i do more or different testing? Stefan ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: btrfs deadlock in 3.5-rc3 2012-06-26 16:47 ` Stefan Priebe @ 2012-06-26 20:14 ` Josef Bacik 2012-06-26 20:19 ` Stefan Priebe 0 siblings, 1 reply; 23+ messages in thread From: Josef Bacik @ 2012-06-26 20:14 UTC (permalink / raw) To: Stefan Priebe; +Cc: Josef Bacik, linux-btrfs On Tue, Jun 26, 2012 at 10:47:56AM -0600, Stefan Priebe wrote: > Am 25.06.2012 22:23, schrieb Josef Bacik: > > On Mon, Jun 25, 2012 at 02:20:31PM -0600, Stefan Priebe wrote: > >> Am 25.06.2012 22:11, schrieb Josef Bacik: > >>> On Mon, Jun 25, 2012 at 01:33:09PM -0600, Stefan Priebe wrote: > >>>> With v3.4 the same. Can't go back more as this really results in very > >>>> fast corruption. Any ideas how to debug? > >>>> > >>> > >>> What workload are you running? I have a ssd here with discard support I can try > >>> and reproduce on. Thanks, > >> > >> i'm using fio with 50 jobs and randwrite of 4k blocks in ceph but i > >> don't know which load ceph then exactly generates. ;-( > >> > > > > Thats fine, I have this handy "create a local ceph cluster" script from an > > earlier problem, just send me your fio job and I'll run it locally. Thanks, > > Where you able to find anything? Can i do more or different testing? > I can't reproduce so I'm going to have to figure out a way to debug it through you, as soon as I think of something I will let you know. Thanks, Josef ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: btrfs deadlock in 3.5-rc3 2012-06-26 20:14 ` Josef Bacik @ 2012-06-26 20:19 ` Stefan Priebe 2012-06-26 20:48 ` Josef Bacik 0 siblings, 1 reply; 23+ messages in thread From: Stefan Priebe @ 2012-06-26 20:19 UTC (permalink / raw) To: Josef Bacik; +Cc: linux-btrfs Am 26.06.2012 22:14, schrieb Josef Bacik: > I can't reproduce so I'm going to have to figure out a way to debug it through > you, as soon as I think of something I will let you know. Thanks, > Thanks. You mentioned that discard shouldn't have any positive effects on a SSD. May i see a sideffect? I mean with discard 13.000 IOPs in ceph without discard just 6000-9000 IOPs could this be real or might this just happen due to the bug i see? Stefan ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: btrfs deadlock in 3.5-rc3 2012-06-26 20:19 ` Stefan Priebe @ 2012-06-26 20:48 ` Josef Bacik 2012-06-27 5:47 ` Stefan Priebe - Profihost AG 0 siblings, 1 reply; 23+ messages in thread From: Josef Bacik @ 2012-06-26 20:48 UTC (permalink / raw) To: Stefan Priebe; +Cc: Josef Bacik, linux-btrfs On Tue, Jun 26, 2012 at 02:19:17PM -0600, Stefan Priebe wrote: > Am 26.06.2012 22:14, schrieb Josef Bacik: > > I can't reproduce so I'm going to have to figure out a way to debug it through > > you, as soon as I think of something I will let you know. Thanks, > > > > Thanks. You mentioned that discard shouldn't have any positive effects > on a SSD. > > May i see a sideffect? I mean with discard 13.000 IOPs in ceph without > discard just 6000-9000 IOPs could this be real or might this just happen > due to the bug i see? > Beats me, it would seem to me that discard would make things slower since we have to wait for the commit to discard everybody, but who knows, stranger things have happened. Can you reproduce 2 more times and get sysrq+w each time so I have a few different outputs to compare, maybe I'm missing something. Thanks, Josef ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: btrfs deadlock in 3.5-rc3 2012-06-26 20:48 ` Josef Bacik @ 2012-06-27 5:47 ` Stefan Priebe - Profihost AG 2012-06-27 13:30 ` Josef Bacik 2012-06-27 21:17 ` Josef Bacik 0 siblings, 2 replies; 23+ messages in thread From: Stefan Priebe - Profihost AG @ 2012-06-27 5:47 UTC (permalink / raw) To: Josef Bacik; +Cc: linux-btrfs Yes i will do so. Right now i was trying to compare discard with non discard with this simple command: for i in `seq 0 1 1000`; do dd if=/dev/zero of=t_$i bs=4M count=1; rm t_$i; done; But i hit a new bug: [39577.660228] BUG: unable to handle kernel paging request at fffffffffffffe50 [39577.686517] IP: [<ffffffff8131b4f3>] btrfs_finish_ordered_io+0x23/0x3f0 [39577.713417] PGD 1c0d067 PUD 1c0e067 PMD 0 [39577.740039] Oops: 0000 [#1] SMP [39577.766401] CPU 6 [39577.792540] Modules linked in: nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ipv6 i2c_i801 coretemp i2c_core ixgbe(O) [last unloaded: scsi_wait_scan] [39577.847511] [39577.847513] Pid: 3447, comm: btrfs-endio-wri Tainted: G O 3.5.0-rc4intel #15 Supermicro X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F [39577.847516] RIP: 0010:[<ffffffff8131b4f3>] [<ffffffff8131b4f3>] btrfs_finish_ordered_io+0x23/0x3f0 [39577.847516] RSP: 0018:ffff880e3b861d90 EFLAGS: 00010287 [39577.847517] RAX: ffff880e3b861e90 RBX: ffff880e3a8fb100 RCX: ffff880e3b861e90 [39577.847517] RDX: ffff880e3b861e90 RSI: ffff880e3a8fb190 RDI: ffff880e3a8fb100 [39577.847518] RBP: ffff880e3b861e10 R08: dead000000100100 R09: dead000000200200 [39577.847518] R10: 0000000000000000 R11: 0000000000000001 R12: ffff880e3a624770 [39577.847518] R13: 0000000000000000 R14: ffff880e3a8fb1b8 R15: ffff880e3b861e80 [39577.847519] FS: 0000000000000000(0000) GS:ffff880e7fd80000(0000) knlGS:0000000000000000 [39577.847520] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [39577.847520] CR2: fffffffffffffe50 CR3: 0000000001c0b000 CR4: 00000000000407e0 [39577.847521] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [39577.847521] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [39577.847522] Process btrfs-endio-wri (pid: 3447, threadinfo ffff880e3b860000, task ffff880e40e58000) [39577.847522] Stack: [39577.847524] 0000000000000000 dead000000200200 0000000100965b86 ffff880e40e94000 [39577.847525] ffffffff8104dc20 ffff880e40e58000 ffffffffffffffff 0000000000000000 [39577.847526] 0000000000000000 0000000000000000 ffff880e40e58000 ffff880e3a624720 [39577.847527] Call Trace: [39577.847530] [<ffffffff8104dc20>] ? lock_timer_base+0x70/0x70 [39577.847531] [<ffffffff8131b8d0>] finish_ordered_fn+0x10/0x20 [39577.847533] [<ffffffff8133f38e>] worker_loop+0x14e/0x530 [39577.847534] [<ffffffff8133f240>] ? btrfs_queue_worker+0x310/0x310 [39577.847535] [<ffffffff8133f240>] ? btrfs_queue_worker+0x310/0x310 [39577.847538] [<ffffffff8105ffd6>] kthread+0x96/0xa0 [39577.847541] [<ffffffff816dc594>] kernel_thread_helper+0x4/0x10 [39577.847543] [<ffffffff8105ff40>] ? kthread_worker_fn+0x130/0x130 [39577.847544] [<ffffffff816dc590>] ? gs_change+0xb/0xb [39577.847555] Code: 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 c4 80 48 89 5d d8 4c 89 65 e0 4c 89 6d e8 4c 89 75 f0 4c 89 7d f8 48 89 fb 4c 8b 6f 38 <4d> 8b a5 50 fe ff ff 4d 8d 95 50 fe ff ff 48 c7 45 c8 00 00 00 [39577.847556] RIP [<ffffffff8131b4f3>] btrfs_finish_ordered_io+0x23/0x3f0 [39577.847557] RSP <ffff880e3b861d90> [39577.847557] CR2: fffffffffffffe50 [39577.847558] ---[ end trace 27bdc0b318ad6463 ]--- Am 26.06.2012 22:48, schrieb Josef Bacik: > On Tue, Jun 26, 2012 at 02:19:17PM -0600, Stefan Priebe wrote: >> Am 26.06.2012 22:14, schrieb Josef Bacik: >>> I can't reproduce so I'm going to have to figure out a way to debug it through >>> you, as soon as I think of something I will let you know. Thanks, >>> >> >> Thanks. You mentioned that discard shouldn't have any positive effects >> on a SSD. >> >> May i see a sideffect? I mean with discard 13.000 IOPs in ceph without >> discard just 6000-9000 IOPs could this be real or might this just happen >> due to the bug i see? >> > > Beats me, it would seem to me that discard would make things slower since we > have to wait for the commit to discard everybody, but who knows, stranger things > have happened. Can you reproduce 2 more times and get sysrq+w each time so I > have a few different outputs to compare, maybe I'm missing something. Thanks, > > Josef > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: btrfs deadlock in 3.5-rc3 2012-06-27 5:47 ` Stefan Priebe - Profihost AG @ 2012-06-27 13:30 ` Josef Bacik 2012-06-27 21:17 ` Josef Bacik 1 sibling, 0 replies; 23+ messages in thread From: Josef Bacik @ 2012-06-27 13:30 UTC (permalink / raw) To: Stefan Priebe - Profihost AG; +Cc: Josef Bacik, linux-btrfs On Tue, Jun 26, 2012 at 11:47:15PM -0600, Stefan Priebe - Profihost AG wrote: > Yes i will do so. Right now i was trying to compare discard with non > discard with this simple command: > for i in `seq 0 1 1000`; do dd if=/dev/zero of=t_$i bs=4M count=1; rm > t_$i; done; > > But i hit a new bug: > Can you gdb btrfs.ko and do list *(btrfs_finish_ordered_io+0x23) and what are your mount options? Thanks, Josef ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: btrfs deadlock in 3.5-rc3 2012-06-27 5:47 ` Stefan Priebe - Profihost AG 2012-06-27 13:30 ` Josef Bacik @ 2012-06-27 21:17 ` Josef Bacik 1 sibling, 0 replies; 23+ messages in thread From: Josef Bacik @ 2012-06-27 21:17 UTC (permalink / raw) To: Stefan Priebe - Profihost AG; +Cc: Josef Bacik, linux-btrfs On Tue, Jun 26, 2012 at 11:47:15PM -0600, Stefan Priebe - Profihost AG wrote: > Yes i will do so. Right now i was trying to compare discard with non > discard with this simple command: > for i in `seq 0 1 1000`; do dd if=/dev/zero of=t_$i bs=4M count=1; rm > t_$i; done; > > But i hit a new bug: > Fixed this one with Btrfs: hold a ref on the inode during writepages Apply that and carry on with your tests :). Thanks, Josef ^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2012-06-27 21:17 UTC | newest] Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2012-06-23 8:50 btrfs deadlock in 3.5-rc3 Stefan Priebe 2012-06-23 13:46 ` Michael 2012-06-23 14:55 ` Stefan Priebe 2012-06-25 13:08 ` Josef Bacik 2012-06-25 14:08 ` Stefan Priebe - Profihost AG 2012-06-25 14:20 ` Josef Bacik 2012-06-25 14:45 ` Stefan Priebe - Profihost AG 2012-06-25 14:48 ` Josef Bacik 2012-06-25 17:38 ` Stefan Priebe 2012-06-25 18:02 ` Josef Bacik 2012-06-25 18:28 ` Stefan Priebe 2012-06-25 19:33 ` Stefan Priebe 2012-06-25 20:11 ` Josef Bacik 2012-06-25 20:20 ` Stefan Priebe 2012-06-25 20:23 ` Josef Bacik 2012-06-25 20:33 ` Stefan Priebe 2012-06-26 16:47 ` Stefan Priebe 2012-06-26 20:14 ` Josef Bacik 2012-06-26 20:19 ` Stefan Priebe 2012-06-26 20:48 ` Josef Bacik 2012-06-27 5:47 ` Stefan Priebe - Profihost AG 2012-06-27 13:30 ` Josef Bacik 2012-06-27 21:17 ` Josef Bacik
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.