* 3.15-rc6 - btrfs-transacti:4157 blocked for more than 120 seconds. @ 2014-05-27 18:11 Torbjørn 2014-05-27 19:09 ` Chris Mason 0 siblings, 1 reply; 12+ messages in thread From: Torbjørn @ 2014-05-27 18:11 UTC (permalink / raw) To: linux-btrfs Hi, Btrfs-transaction keeps blocking for me on all 3.15-rc versions. 3.14 does not have this issue. The process never gets unstuck. btrfs fi sync does not help. A hard reboot seems to be the only way to recover. The volume is still readable when it's in this state. dmesg + sysrq-w is available at http://pastebin.com/vHQnRE2F It's over 6000 lines, and would most likely not be allowed on the list. The blocking happons on a server with local kvm-clients reading and writing to a local btrfs-volume over nfs. The btrfs-volume is on top of dm-crypt devices. Any additional info I can give to help? Tests you want me to run? -- Torbjørn --snip-- [138011.263478] INFO: task btrfs-transacti:4157 blocked for more than 120 seconds. [138011.263558] Tainted: G E 3.15.0-rc6 #58 [138011.263605] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [138011.263687] btrfs-transacti D ffff88042fc94500 0 4157 2 0x00000000 [138011.263744] ffff880400737e10 0000000000000002 ffff880403b33260 ffff880400737fd8 [138011.263831] 0000000000014500 0000000000014500 ffff88041973b260 ffff8802a9c97b40 [138011.263919] ffff880403a0e800 ffff880400737e30 ffff8800c950ac00 ffff8800c950ac58 [138011.264007] Call Trace: [138011.264045] [<ffffffff816f81a9>] schedule+0x29/0x70 [138011.264097] [<ffffffffa01a6f65>] btrfs_commit_transaction+0x315/0xa10 [btrfs] [138011.264178] [<ffffffff810acf40>] ? prepare_to_wait_event+0x100/0x100 [138011.264235] [<ffffffffa01a2ea5>] transaction_kthread+0x1b5/0x240 [btrfs] [138011.264293] [<ffffffffa01a2cf0>] ? btrfs_cleanup_transaction+0x560/0x560 [btrfs] [138011.264376] [<ffffffff81088e72>] kthread+0xd2/0xf0 [138011.264422] [<ffffffff81088da0>] ? kthread_create_on_node+0x190/0x190 [138011.264475] [<ffffffff81704dfc>] ret_from_fork+0x7c/0xb0 [138011.264522] [<ffffffff81088da0>] ? kthread_create_on_node+0x190/0x190 --snip-- ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 3.15-rc6 - btrfs-transacti:4157 blocked for more than 120 seconds. 2014-05-27 18:11 3.15-rc6 - btrfs-transacti:4157 blocked for more than 120 seconds Torbjørn @ 2014-05-27 19:09 ` Chris Mason [not found] ` <5384F0B4.7040309@skagestad.org> 0 siblings, 1 reply; 12+ messages in thread From: Chris Mason @ 2014-05-27 19:09 UTC (permalink / raw) To: Torbjørn, linux-btrfs On 05/27/2014 02:11 PM, Torbjørn wrote: > Hi, > > Btrfs-transaction keeps blocking for me on all 3.15-rc versions. > 3.14 does not have this issue. > The process never gets unstuck. btrfs fi sync does not help. A hard > reboot seems to be the only way to recover. > > The volume is still readable when it's in this state. > > dmesg + sysrq-w is available at > https://urldefense.proofpoint.com/v1/url?u=http://pastebin.com/vHQnRE2F&k=ZVNjlDMF0FElm4dQtryO4A%3D%3D%0A&r=6%2FL0lzzDhu0Y1hL9xm%2BQyA%3D%3D%0A&m=IKSs%2F0C3x9a0LIiVKFmZVoP9lSAZ%2BK9JgEkchLEAAzM%3D%0A&s=127b40cc34dbb205b5277e6081b082f26e84fc417d35310f3aeee04998a679a8 > > It's over 6000 lines, and would most likely not be allowed on the list. > > The blocking happons on a server with local kvm-clients reading and > writing to a local btrfs-volume over nfs. > The btrfs-volume is on top of dm-crypt devices. > > Any additional info I can give to help? > Tests you want me to run? Very strange, since I don't actually see what we're waiting for. Can you please either send me your btrfs.ko or use gdb to see where this statement is: btrfs_commit_transaction+0x315 The syntax is gdb btrfs.ko gdb> list *btrfs_commit_transaction+0x315 -chris ^ permalink raw reply [flat|nested] 12+ messages in thread
[parent not found: <5384F0B4.7040309@skagestad.org>]
* Re: 3.15-rc6 - btrfs-transacti:4157 blocked for more than 120 seconds. [not found] ` <5384F0B4.7040309@skagestad.org> @ 2014-05-27 20:42 ` Torbjørn 2014-05-27 20:50 ` Chris Mason 0 siblings, 1 reply; 12+ messages in thread From: Torbjørn @ 2014-05-27 20:42 UTC (permalink / raw) To: linux-btrfs; +Cc: Chris Mason On 05/27/2014 10:08 PM, Torbjørn wrote: > On 05/27/2014 09:09 PM, Chris Mason wrote: >> >> On 05/27/2014 02:11 PM, Torbjørn wrote: >>> Hi, >>> >>> Btrfs-transaction keeps blocking for me on all 3.15-rc versions. >>> 3.14 does not have this issue. >>> The process never gets unstuck. btrfs fi sync does not help. A hard >>> reboot seems to be the only way to recover. >>> >>> The volume is still readable when it's in this state. >>> >>> dmesg + sysrq-w is available at >>> https://urldefense.proofpoint.com/v1/url?u=http://pastebin.com/vHQnRE2F&k=ZVNjlDMF0FElm4dQtryO4A%3D%3D%0A&r=6%2FL0lzzDhu0Y1hL9xm%2BQyA%3D%3D%0A&m=IKSs%2F0C3x9a0LIiVKFmZVoP9lSAZ%2BK9JgEkchLEAAzM%3D%0A&s=127b40cc34dbb205b5277e6081b082f26e84fc417d35310f3aeee04998a679a8 >>> >>> >>> It's over 6000 lines, and would most likely not be allowed on the list. >>> >>> The blocking happons on a server with local kvm-clients reading and >>> writing to a local btrfs-volume over nfs. >>> The btrfs-volume is on top of dm-crypt devices. >>> >>> Any additional info I can give to help? >>> Tests you want me to run? >> Very strange, since I don't actually see what we're waiting for. Can >> you please either send me your btrfs.ko or use gdb to see where this >> statement is: >> >> >> btrfs_commit_transaction+0x315 >> >> The syntax is >> >> gdb btrfs.ko >> gdb> list *btrfs_commit_transaction+0x315 >> >> -chris > Sure, here you go. > > Reading symbols from btrfs.ko...done. > (gdb) list *btrfs_commit_transaction+0x315 > 0x30f95 is in btrfs_commit_transaction (fs/btrfs/transaction.c:1752). > 1747 * COMMIT_DOING so make sure to wait for num_writers to == > 1 again. > 1748 */ > 1749 spin_lock(&root->fs_info->trans_lock); > 1750 cur_trans->state = TRANS_STATE_COMMIT_DOING; > 1751 spin_unlock(&root->fs_info->trans_lock); > 1752 wait_event(cur_trans->writer_wait, > 1753 atomic_read(&cur_trans->num_writers) == 1); > 1754 > 1755 /* ->aborted might be set after the previous check, so > check it */ > 1756 if (unlikely(ACCESS_ONCE(cur_trans->aborted))) { > (gdb) > > I'm attaching the btrfs.ko as well, hopefully the 20M file gets through. > -- > Torbjørn Seems like the attachment stopped the email from reaching the list. Resending with only debug info. -- Torbjørn ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 3.15-rc6 - btrfs-transacti:4157 blocked for more than 120 seconds. 2014-05-27 20:42 ` Torbjørn @ 2014-05-27 20:50 ` Chris Mason 2014-05-27 21:15 ` Chris Mason 0 siblings, 1 reply; 12+ messages in thread From: Chris Mason @ 2014-05-27 20:50 UTC (permalink / raw) To: Torbjørn, linux-btrfs On 05/27/2014 04:42 PM, Torbjørn wrote: > On 05/27/2014 10:08 PM, Torbjørn wrote: >> On 05/27/2014 09:09 PM, Chris Mason wrote: >>> >>> On 05/27/2014 02:11 PM, Torbjørn wrote: >>>> Hi, >>>> >>>> Btrfs-transaction keeps blocking for me on all 3.15-rc versions. >>>> 3.14 does not have this issue. >>>> The process never gets unstuck. btrfs fi sync does not help. A hard >>>> reboot seems to be the only way to recover. >>>> >>>> The volume is still readable when it's in this state. >>>> >>>> dmesg + sysrq-w is available at >>>> https://urldefense.proofpoint.com/v1/url?u=http://pastebin.com/vHQnRE2F&k=ZVNjlDMF0FElm4dQtryO4A%3D%3D%0A&r=6%2FL0lzzDhu0Y1hL9xm%2BQyA%3D%3D%0A&m=IKSs%2F0C3x9a0LIiVKFmZVoP9lSAZ%2BK9JgEkchLEAAzM%3D%0A&s=127b40cc34dbb205b5277e6081b082f26e84fc417d35310f3aeee04998a679a8 >>>> >>>> >>>> It's over 6000 lines, and would most likely not be allowed on the list. >>>> >>>> The blocking happons on a server with local kvm-clients reading and >>>> writing to a local btrfs-volume over nfs. >>>> The btrfs-volume is on top of dm-crypt devices. >>>> >>>> Any additional info I can give to help? >>>> Tests you want me to run? >>> Very strange, since I don't actually see what we're waiting for. Can >>> you please either send me your btrfs.ko or use gdb to see where this >>> statement is: >>> >>> >>> btrfs_commit_transaction+0x315 >>> >>> The syntax is >>> >>> gdb btrfs.ko >>> gdb> list *btrfs_commit_transaction+0x315 >>> >>> -chris >> Sure, here you go. >> >> Reading symbols from btrfs.ko...done. >> (gdb) list *btrfs_commit_transaction+0x315 >> 0x30f95 is in btrfs_commit_transaction (fs/btrfs/transaction.c:1752). >> 1747 * COMMIT_DOING so make sure to wait for num_writers to == >> 1 again. >> 1748 */ >> 1749 spin_lock(&root->fs_info->trans_lock); >> 1750 cur_trans->state = TRANS_STATE_COMMIT_DOING; >> 1751 spin_unlock(&root->fs_info->trans_lock); >> 1752 wait_event(cur_trans->writer_wait, >> 1753 atomic_read(&cur_trans->num_writers) == 1); >> 1754 >> 1755 /* ->aborted might be set after the previous check, so >> check it */ >> 1756 if (unlikely(ACCESS_ONCE(cur_trans->aborted))) { >> (gdb) >> >> I'm attaching the btrfs.ko as well, hopefully the 20M file gets through. Thanks, this is enough. Someone has the transaction pegged open. Checking your sysrq again. -chris ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 3.15-rc6 - btrfs-transacti:4157 blocked for more than 120 seconds. 2014-05-27 20:50 ` Chris Mason @ 2014-05-27 21:15 ` Chris Mason 2014-05-28 5:53 ` Torbjørn 0 siblings, 1 reply; 12+ messages in thread From: Chris Mason @ 2014-05-27 21:15 UTC (permalink / raw) To: Torbjørn, linux-btrfs On 05/27/2014 04:50 PM, Chris Mason wrote: > On 05/27/2014 04:42 PM, Torbjørn wrote: >> On 05/27/2014 10:08 PM, Torbjørn wrote: >>> On 05/27/2014 09:09 PM, Chris Mason wrote: >>>> >>>> On 05/27/2014 02:11 PM, Torbjørn wrote: >>>>> Hi, >>>>> >>>>> Btrfs-transaction keeps blocking for me on all 3.15-rc versions. >>>>> 3.14 does not have this issue. >>>>> The process never gets unstuck. btrfs fi sync does not help. A hard >>>>> reboot seems to be the only way to recover. >>>>> >>>>> The volume is still readable when it's in this state. >>>>> >>>>> dmesg + sysrq-w is available at >>>>> https://urldefense.proofpoint.com/v1/url?u=http://pastebin.com/vHQnRE2F&k=ZVNjlDMF0FElm4dQtryO4A%3D%3D%0A&r=6%2FL0lzzDhu0Y1hL9xm%2BQyA%3D%3D%0A&m=IKSs%2F0C3x9a0LIiVKFmZVoP9lSAZ%2BK9JgEkchLEAAzM%3D%0A&s=127b40cc34dbb205b5277e6081b082f26e84fc417d35310f3aeee04998a679a8 >>>>> >>>>> >>>>> It's over 6000 lines, and would most likely not be allowed on the list. >>>>> >>>>> The blocking happons on a server with local kvm-clients reading and >>>>> writing to a local btrfs-volume over nfs. >>>>> The btrfs-volume is on top of dm-crypt devices. >>>>> >>>>> Any additional info I can give to help? >>>>> Tests you want me to run? >>>> Very strange, since I don't actually see what we're waiting for. Can >>>> you please either send me your btrfs.ko or use gdb to see where this >>>> statement is: >>>> >>>> >>>> btrfs_commit_transaction+0x315 >>>> >>>> The syntax is >>>> >>>> gdb btrfs.ko >>>> gdb> list *btrfs_commit_transaction+0x315 >>>> >>>> -chris >>> Sure, here you go. >>> >>> Reading symbols from btrfs.ko...done. >>> (gdb) list *btrfs_commit_transaction+0x315 >>> 0x30f95 is in btrfs_commit_transaction (fs/btrfs/transaction.c:1752). >>> 1747 * COMMIT_DOING so make sure to wait for num_writers to == >>> 1 again. >>> 1748 */ >>> 1749 spin_lock(&root->fs_info->trans_lock); >>> 1750 cur_trans->state = TRANS_STATE_COMMIT_DOING; >>> 1751 spin_unlock(&root->fs_info->trans_lock); >>> 1752 wait_event(cur_trans->writer_wait, >>> 1753 atomic_read(&cur_trans->num_writers) == 1); >>> 1754 >>> 1755 /* ->aborted might be set after the previous check, so >>> check it */ >>> 1756 if (unlikely(ACCESS_ONCE(cur_trans->aborted))) { >>> (gdb) >>> >>> I'm attaching the btrfs.ko as well, hopefully the 20M file gets through. Ok, we're stuck here. The transaction won't coomplete until this disk IO is done. Since this is just a read, are you able to read from the device when this is happening? This would be the dm-crypt block device w/btrfs on it. [180625.987870] kworker/u16:12 D ffff88042fd94500 0 15271 2 0x00000000 [180625.987935] Workqueue: btrfs-delalloc normal_work_helper [btrfs] [180625.987987] ffff880107a4f648 0000000000000002 ffff88001383b260 ffff880107a4ffd8 [180625.988075] 0000000000014500 0000000000014500 ffff880419749930 ffff88042fd94e18 [180625.988163] ffff88042ffadce8 0000000000000002 ffffffff8114df40 ffff880107a4f6c0 [180625.988251] Call Trace: [180625.988291] [<ffffffff8114df40>] ? wait_on_page_read+0x60/0x60 [180625.988342] [<ffffffff816f84cd>] io_schedule+0x9d/0x140 [180625.988391] [<ffffffff8114df4e>] sleep_on_page+0xe/0x20 [180625.988440] [<ffffffff816f8962>] __wait_on_bit+0x62/0x90 [180625.988490] [<ffffffff8114dd0f>] wait_on_page_bit+0x7f/0x90 [180625.988541] [<ffffffff810acf80>] ? autoremove_wake_function+0x40/0x40 [180625.988601] [<ffffffffa01c8e8a>] read_extent_buffer_pages+0x2ca/0x300 [btrfs] [180625.988687] [<ffffffffa019dd70>] ? free_root_pointers+0x60/0x60 [btrfs] [180625.988746] [<ffffffffa019efa3>] btree_read_extent_buffer_pages.constprop.52+0xb3/0x120 [btrfs] [180625.988839] [<ffffffffa01a00c8>] read_tree_block+0x38/0x60 [btrfs] [180625.988895] [<ffffffffa01828c8>] read_block_for_search.isra.33+0x148/0x380 [btrfs] [180625.988983] [<ffffffffa0187f97>] btrfs_next_old_leaf+0x297/0x4a0 [btrfs] [180625.989041] [<ffffffffa01881b0>] btrfs_next_leaf+0x10/0x20 [btrfs] [180625.989099] [<ffffffffa01cdc9c>] find_free_dev_extent+0xbc/0x350 [btrfs] [180625.989159] [<ffffffffa01ce0e4>] __btrfs_alloc_chunk+0x1b4/0x770 [btrfs] [180625.989214] [<ffffffff8101ad25>] ? native_sched_clock+0x35/0x90 [180625.989265] [<ffffffff811bef49>] ? __sb_start_write+0x49/0xe0 [180625.989322] [<ffffffffa01d0b94>] btrfs_alloc_chunk+0x34/0x40 [btrfs] [180625.989380] [<ffffffffa018f9fe>] do_chunk_alloc+0x23e/0x410 [btrfs] [180625.989438] [<ffffffffa0194753>] find_free_extent+0xb03/0xbb0 [btrfs] [180625.989496] [<ffffffffa01949d8>] btrfs_reserve_extent+0xa8/0x1a0 [btrfs] [180625.989555] [<ffffffffa01ad9f5>] cow_file_range+0x135/0x440 [btrfs] [180625.989613] [<ffffffffa01aecff>] submit_compressed_extents+0x1bf/0x480 [btrfs] [180625.989700] [<ffffffffa01ac804>] ? async_cow_free+0x24/0x30 [btrfs] [180625.989758] [<ffffffffa01aefc0>] ? submit_compressed_extents+0x480/0x480 [btrfs] [180625.989845] [<ffffffffa01af046>] async_cow_submit+0x86/0x90 [btrfs] [180625.989904] [<ffffffffa01d5333>] normal_work_helper+0x193/0x2b0 [btrfs] [180625.989957] [<ffffffff81081532>] process_one_work+0x182/0x450 [180625.990008] [<ffffffff81082331>] worker_thread+0x121/0x410 [180625.990058] [<ffffffff81082210>] ? rescuer_thread+0x430/0x430 [180625.990109] [<ffffffff81088e72>] kthread+0xd2/0xf0 [180625.990156] [<ffffffff81088da0>] ? kthread_create_on_node+0x190/0x190 [180625.990210] [<ffffffff81704dfc>] ret_from_fork+0x7c/0xb0 [180625.990259] [<ffffffff81088da0>] ? kthread_create_on_node+0x190/0x190 ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 3.15-rc6 - btrfs-transacti:4157 blocked for more than 120 seconds. 2014-05-27 21:15 ` Chris Mason @ 2014-05-28 5:53 ` Torbjørn 2014-05-28 13:41 ` Chris Mason 0 siblings, 1 reply; 12+ messages in thread From: Torbjørn @ 2014-05-28 5:53 UTC (permalink / raw) To: Chris Mason, linux-btrfs On 27. mai 2014 23:15, Chris Mason wrote: > On 05/27/2014 04:50 PM, Chris Mason wrote: >> On 05/27/2014 04:42 PM, Torbjørn wrote: >>> On 05/27/2014 10:08 PM, Torbjørn wrote: >>>> On 05/27/2014 09:09 PM, Chris Mason wrote: >>>>> On 05/27/2014 02:11 PM, Torbjørn wrote: >>>>>> Hi, >>>>>> >>>>>> Btrfs-transaction keeps blocking for me on all 3.15-rc versions. >>>>>> 3.14 does not have this issue. >>>>>> The process never gets unstuck. btrfs fi sync does not help. A hard >>>>>> reboot seems to be the only way to recover. >>>>>> >>>>>> The volume is still readable when it's in this state. >>>>>> >>>>>> dmesg + sysrq-w is available at >>>>>> https://urldefense.proofpoint.com/v1/url?u=http://pastebin.com/vHQnRE2F&k=ZVNjlDMF0FElm4dQtryO4A%3D%3D%0A&r=6%2FL0lzzDhu0Y1hL9xm%2BQyA%3D%3D%0A&m=IKSs%2F0C3x9a0LIiVKFmZVoP9lSAZ%2BK9JgEkchLEAAzM%3D%0A&s=127b40cc34dbb205b5277e6081b082f26e84fc417d35310f3aeee04998a679a8 >>>>>> >>>>>> >>>>>> It's over 6000 lines, and would most likely not be allowed on the list. >>>>>> >>>>>> The blocking happons on a server with local kvm-clients reading and >>>>>> writing to a local btrfs-volume over nfs. >>>>>> The btrfs-volume is on top of dm-crypt devices. >>>>>> >>>>>> Any additional info I can give to help? >>>>>> Tests you want me to run? >>>>> Very strange, since I don't actually see what we're waiting for. Can >>>>> you please either send me your btrfs.ko or use gdb to see where this >>>>> statement is: >>>>> >>>>> >>>>> btrfs_commit_transaction+0x315 >>>>> >>>>> The syntax is >>>>> >>>>> gdb btrfs.ko >>>>> gdb> list *btrfs_commit_transaction+0x315 >>>>> >>>>> -chris >>>> Sure, here you go. >>>> >>>> Reading symbols from btrfs.ko...done. >>>> (gdb) list *btrfs_commit_transaction+0x315 >>>> 0x30f95 is in btrfs_commit_transaction (fs/btrfs/transaction.c:1752). >>>> 1747 * COMMIT_DOING so make sure to wait for num_writers to == >>>> 1 again. >>>> 1748 */ >>>> 1749 spin_lock(&root->fs_info->trans_lock); >>>> 1750 cur_trans->state = TRANS_STATE_COMMIT_DOING; >>>> 1751 spin_unlock(&root->fs_info->trans_lock); >>>> 1752 wait_event(cur_trans->writer_wait, >>>> 1753 atomic_read(&cur_trans->num_writers) == 1); >>>> 1754 >>>> 1755 /* ->aborted might be set after the previous check, so >>>> check it */ >>>> 1756 if (unlikely(ACCESS_ONCE(cur_trans->aborted))) { >>>> (gdb) >>>> >>>> I'm attaching the btrfs.ko as well, hopefully the 20M file gets through. > > Ok, we're stuck here. The transaction won't coomplete until this disk IO is done. > > Since this is just a read, are you able to read from the device when this is > happening? This would be the dm-crypt block device w/btrfs on it. > > [180625.987870] kworker/u16:12 D ffff88042fd94500 0 15271 2 0x00000000 > [180625.987935] Workqueue: btrfs-delalloc normal_work_helper [btrfs] > [180625.987987] ffff880107a4f648 0000000000000002 ffff88001383b260 ffff880107a4ffd8 > [180625.988075] 0000000000014500 0000000000014500 ffff880419749930 ffff88042fd94e18 > [180625.988163] ffff88042ffadce8 0000000000000002 ffffffff8114df40 ffff880107a4f6c0 > [180625.988251] Call Trace: > [180625.988291] [<ffffffff8114df40>] ? wait_on_page_read+0x60/0x60 > [180625.988342] [<ffffffff816f84cd>] io_schedule+0x9d/0x140 > [180625.988391] [<ffffffff8114df4e>] sleep_on_page+0xe/0x20 > [180625.988440] [<ffffffff816f8962>] __wait_on_bit+0x62/0x90 > [180625.988490] [<ffffffff8114dd0f>] wait_on_page_bit+0x7f/0x90 > [180625.988541] [<ffffffff810acf80>] ? autoremove_wake_function+0x40/0x40 > [180625.988601] [<ffffffffa01c8e8a>] read_extent_buffer_pages+0x2ca/0x300 [btrfs] > [180625.988687] [<ffffffffa019dd70>] ? free_root_pointers+0x60/0x60 [btrfs] > [180625.988746] [<ffffffffa019efa3>] btree_read_extent_buffer_pages.constprop.52+0xb3/0x120 [btrfs] > [180625.988839] [<ffffffffa01a00c8>] read_tree_block+0x38/0x60 [btrfs] > [180625.988895] [<ffffffffa01828c8>] read_block_for_search.isra.33+0x148/0x380 [btrfs] > [180625.988983] [<ffffffffa0187f97>] btrfs_next_old_leaf+0x297/0x4a0 [btrfs] > [180625.989041] [<ffffffffa01881b0>] btrfs_next_leaf+0x10/0x20 [btrfs] > [180625.989099] [<ffffffffa01cdc9c>] find_free_dev_extent+0xbc/0x350 [btrfs] > [180625.989159] [<ffffffffa01ce0e4>] __btrfs_alloc_chunk+0x1b4/0x770 [btrfs] > [180625.989214] [<ffffffff8101ad25>] ? native_sched_clock+0x35/0x90 > [180625.989265] [<ffffffff811bef49>] ? __sb_start_write+0x49/0xe0 > [180625.989322] [<ffffffffa01d0b94>] btrfs_alloc_chunk+0x34/0x40 [btrfs] > [180625.989380] [<ffffffffa018f9fe>] do_chunk_alloc+0x23e/0x410 [btrfs] > [180625.989438] [<ffffffffa0194753>] find_free_extent+0xb03/0xbb0 [btrfs] > [180625.989496] [<ffffffffa01949d8>] btrfs_reserve_extent+0xa8/0x1a0 [btrfs] > [180625.989555] [<ffffffffa01ad9f5>] cow_file_range+0x135/0x440 [btrfs] > [180625.989613] [<ffffffffa01aecff>] submit_compressed_extents+0x1bf/0x480 [btrfs] > [180625.989700] [<ffffffffa01ac804>] ? async_cow_free+0x24/0x30 [btrfs] > [180625.989758] [<ffffffffa01aefc0>] ? submit_compressed_extents+0x480/0x480 [btrfs] > [180625.989845] [<ffffffffa01af046>] async_cow_submit+0x86/0x90 [btrfs] > [180625.989904] [<ffffffffa01d5333>] normal_work_helper+0x193/0x2b0 [btrfs] > [180625.989957] [<ffffffff81081532>] process_one_work+0x182/0x450 > [180625.990008] [<ffffffff81082331>] worker_thread+0x121/0x410 > [180625.990058] [<ffffffff81082210>] ? rescuer_thread+0x430/0x430 > [180625.990109] [<ffffffff81088e72>] kthread+0xd2/0xf0 > [180625.990156] [<ffffffff81088da0>] ? kthread_create_on_node+0x190/0x190 > [180625.990210] [<ffffffff81704dfc>] ret_from_fork+0x7c/0xb0 > [180625.990259] [<ffffffff81088da0>] ? kthread_create_on_node+0x190/0x190 > It's actually a raid10 array of 11 dm-crypt devices. I'm able to read data from the array (accessing files), and also read directly from all the underlying dm-crypt devices using dd, if that's what you meant. I have not rebooted the system since that dmesg, so it's still stuck in the same state. I can keep it like that for some days. The system is not critical at all. -- Torbjørn ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 3.15-rc6 - btrfs-transacti:4157 blocked for more than 120 seconds. 2014-05-28 5:53 ` Torbjørn @ 2014-05-28 13:41 ` Chris Mason 2014-05-28 14:56 ` Torbjørn 0 siblings, 1 reply; 12+ messages in thread From: Chris Mason @ 2014-05-28 13:41 UTC (permalink / raw) To: Torbjørn, linux-btrfs On 05/28/2014 01:53 AM, Torbjørn wrote: > It's actually a raid10 array of 11 dm-crypt devices. > I'm able to read data from the array (accessing files), and also read > directly from all the underlying dm-crypt devices using dd, if that's > what you meant. > > I have not rebooted the system since that dmesg, so it's still stuck in > the same state. > I can keep it like that for some days. The system is not critical at all. > Thanks, Can you please capture the result of sysrq-t and sysrq-l? Maybe the IO is stuck in a thread that isn't in the D state. -chris ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 3.15-rc6 - btrfs-transacti:4157 blocked for more than 120 seconds. 2014-05-28 13:41 ` Chris Mason @ 2014-05-28 14:56 ` Torbjørn 2014-06-01 21:29 ` Marc MERLIN 0 siblings, 1 reply; 12+ messages in thread From: Torbjørn @ 2014-05-28 14:56 UTC (permalink / raw) To: Chris Mason, linux-btrfs On 05/28/2014 03:41 PM, Chris Mason wrote: > On 05/28/2014 01:53 AM, Torbjørn wrote: > >> It's actually a raid10 array of 11 dm-crypt devices. >> I'm able to read data from the array (accessing files), and also read >> directly from all the underlying dm-crypt devices using dd, if that's >> what you meant. >> >> I have not rebooted the system since that dmesg, so it's still stuck in >> the same state. >> I can keep it like that for some days. The system is not critical at all. >> > Thanks, > > Can you please capture the result of sysrq-t and sysrq-l? Maybe the IO > is stuck in a thread that isn't in the D state. > > -chris The output of sysrq-t was too much data for pastebin, will send both to you as attachments outside of the list. -- Torbjørn ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 3.15-rc6 - btrfs-transacti:4157 blocked for more than 120 seconds. 2014-05-28 14:56 ` Torbjørn @ 2014-06-01 21:29 ` Marc MERLIN 2014-06-02 3:55 ` Andrew McGlashan ` (2 more replies) 0 siblings, 3 replies; 12+ messages in thread From: Marc MERLIN @ 2014-06-01 21:29 UTC (permalink / raw) To: Torbjørn; +Cc: Chris Mason, linux-btrfs On Wed, May 28, 2014 at 04:56:56PM +0200, Torbjørn wrote: > On 05/28/2014 03:41 PM, Chris Mason wrote: > >On 05/28/2014 01:53 AM, Torbjørn wrote: > > > >>It's actually a raid10 array of 11 dm-crypt devices. > >>I'm able to read data from the array (accessing files), and also read > >>directly from all the underlying dm-crypt devices using dd, if that's > >>what you meant. > >> > >>I have not rebooted the system since that dmesg, so it's still stuck in > >>the same state. > >>I can keep it like that for some days. The system is not critical at all. > >> > >Thanks, > > > >Can you please capture the result of sysrq-t and sysrq-l? Maybe the IO > >is stuck in a thread that isn't in the D state. > > > >-chris > The output of sysrq-t was too much data for pastebin, will send both > to you as attachments outside of the list. Considering I have experienced this with with 3.15rc but wasn't able to save a sysrq since my save file and syslog got lost, did you find out anything from that? Thanks, Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems .... .... what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ | PGP 1024R/763BE901 ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 3.15-rc6 - btrfs-transacti:4157 blocked for more than 120 seconds. 2014-06-01 21:29 ` Marc MERLIN @ 2014-06-02 3:55 ` Andrew McGlashan 2014-06-02 7:29 ` Torbjørn 2014-06-05 11:44 ` Gary Coulbourne 2 siblings, 0 replies; 12+ messages in thread From: Andrew McGlashan @ 2014-06-02 3:55 UTC (permalink / raw) To: Marc MERLIN, Torbjørn; +Cc: Chris Mason, linux-btrfs Hi, I'm getting "blocked for more than 120 seconds" messages too. In my case, it is for a simple RAID1 volume that is rebuilding for the first time (new volume), whilst mounted as a LUKS crypt volume and writing /dev/zero before actual deployment. I am running a stock Wheezy kernel: ~ # uname -a Linux n4800eco-a 3.2.0-4-amd64 #1 SMP Debian 3.2.57-3 x86_64 GNU/Linux As I am deliberately NOT running btrfs, the common link may be mdadm or LUKS. I'm also noticing btrfs modules being loaded on another box, even though I am not at all ready to trust btrfs for any file system at this time. There may be some bug with mdadm/crypt(LUKS)/btrfs -- or it may have nothing to do with btrfs. My problem almost certainly isn't btrfs unless there is some kernel code giving me grief without loading the btrfs module. More details of my problem: https://lists.debian.org/debian-user/2014/06/msg00070.html NB: working through my problem in a dropbear busybox environment that hasn't loaded btrfs module. The kernel is the same as the /normal/ system, compliments of mkinitramfs tool.... Cheers A. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 3.15-rc6 - btrfs-transacti:4157 blocked for more than 120 seconds. 2014-06-01 21:29 ` Marc MERLIN 2014-06-02 3:55 ` Andrew McGlashan @ 2014-06-02 7:29 ` Torbjørn 2014-06-05 11:44 ` Gary Coulbourne 2 siblings, 0 replies; 12+ messages in thread From: Torbjørn @ 2014-06-02 7:29 UTC (permalink / raw) To: Marc MERLIN; +Cc: Chris Mason, linux-btrfs On 01. juni 2014 23:29, Marc MERLIN wrote: > On Wed, May 28, 2014 at 04:56:56PM +0200, Torbjørn wrote: >> On 05/28/2014 03:41 PM, Chris Mason wrote: >>> On 05/28/2014 01:53 AM, Torbjørn wrote: >>> >>>> It's actually a raid10 array of 11 dm-crypt devices. >>>> I'm able to read data from the array (accessing files), and also read >>>> directly from all the underlying dm-crypt devices using dd, if that's >>>> what you meant. >>>> >>>> I have not rebooted the system since that dmesg, so it's still stuck in >>>> the same state. >>>> I can keep it like that for some days. The system is not critical at all. >>>> >>> Thanks, >>> >>> Can you please capture the result of sysrq-t and sysrq-l? Maybe the IO >>> is stuck in a thread that isn't in the D state. >>> >>> -chris >> The output of sysrq-t was too much data for pastebin, will send both >> to you as attachments outside of the list. > Considering I have experienced this with with 3.15rc but wasn't able to > save a sysrq since my save file and syslog got lost, did you find out > anything from that? > > Thanks, > Marc Hi Marc, I have not done any real investigation of those yet, so no. The server is mainly for testing, so I had already netconsole set up for it. Btrfs only blocked on the one array though. Root is btrfs raid1 with 2 devices, and it was operating as normal. If it's of any interest, I can make the captured logs available for download. -- Torbjørn ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 3.15-rc6 - btrfs-transacti:4157 blocked for more than 120 seconds. 2014-06-01 21:29 ` Marc MERLIN 2014-06-02 3:55 ` Andrew McGlashan 2014-06-02 7:29 ` Torbjørn @ 2014-06-05 11:44 ` Gary Coulbourne 2 siblings, 0 replies; 12+ messages in thread From: Gary Coulbourne @ 2014-06-05 11:44 UTC (permalink / raw) To: linux-btrfs I am also getting this... My system is configured like so: 3.15.0-rc8 Btrfs v3.12 #btrfs fi show Label: none uuid: e421ceeb-6e4e-4c7e-84b4-6f25442745fa Total devices 1 FS bytes used 22.51GiB devid 1 size 116.75GiB used 28.02GiB path /dev/sda4 Label: tank uuid: 52044d1c-5308-40f7-9d21-4edca3b63d05 Total devices 5 FS bytes used 1.87TiB devid 2 size 931.51GiB used 367.00GiB path /dev/sdc devid 3 size 931.51GiB used 366.00GiB path /dev/sdf devid 4 size 2.73TiB used 1.88TiB path /dev/sde devid 5 size 1.36TiB used 833.00GiB path /dev/sdd devid 6 size 931.51GiB used 364.03GiB path /dev/sdb /tank is the filesystem giving the trouble. Here's the df for it: Data, RAID1: total=1.88TiB, used=1.87TiB System, RAID1: total=32.00MiB, used=276.00KiB Metadata, RAID1: total=4.00GiB, used=2.84GiB unknown, single: total=512.00MiB, used=0.00 It is just a raid 1, zlib-compressed. No encryption. If I do any kind of large file transfers (large amounts OR many files, not sure what aspect causes it) with it, the following happens: My load average shoots up to HUGE numbers, the filesystem becomes unusable although directories are still readable that do not have files that are being accessed in them, and the dmesg begins filling with: Jun 5 06:55:40 maul kernel: [ 361.276631] INFO: task kworker/u8:0:6 blocked for more than 120 seconds. Jun 5 06:55:40 maul kernel: [ 361.276640] Not tainted 3.15.0-rc8 #1 Jun 5 06:55:40 maul kernel: [ 361.276644] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jun 5 06:55:40 maul kernel: [ 361.276649] kworker/u8:0 D 0000000000000000 0 6 2 0x00000000 Jun 5 06:55:40 maul kernel: [ 361.276665] Workqueue: writeback bdi_writeback_workfn (flush-btrfs-2) Jun 5 06:55:40 maul kernel: [ 361.276671] ffff8802331c79f8 0000000000000002 ffff8802331c8000 ffff8802331c7fd8 Jun 5 06:55:40 maul kernel: [ 361.276679] 00000000000147c0 00000000000147c0 ffff880233226540 ffff8802331c8000 Jun 5 06:55:40 maul kernel: [ 361.276686] ffff88023ed150d8 ffff88023efdeb68 ffff8802331c7a80 0000000000000002 Jun 5 06:55:40 maul kernel: [ 361.276693] Call Trace: Jun 5 06:55:40 maul kernel: [ 361.276706] [<ffffffff8114a1d0>] ? wait_on_page_read+0x60/0x60 Jun 5 06:55:40 maul kernel: [ 361.276714] [<ffffffff817aeb9f>] io_schedule+0xaf/0x150 Jun 5 06:55:40 maul kernel: [ 361.276721] [<ffffffff8114a1de>] sleep_on_page+0xe/0x20 Jun 5 06:55:40 maul kernel: [ 361.276727] [<ffffffff817af213>] __wait_on_bit_lock+0x53/0xb0 Jun 5 06:55:40 maul kernel: [ 361.276734] [<ffffffff8114a2fa>] __lock_page+0x6a/0x70 Jun 5 06:55:40 maul kernel: [ 361.276741] [<ffffffff810a4430>] ? autoremove_wake_function+0x40/0x40 Jun 5 06:55:40 maul kernel: [ 361.276749] [<ffffffff8131579e>] ? flush_write_bio+0xe/0x10 Jun 5 06:55:40 maul kernel: [ 361.276756] [<ffffffff81319d50>] extent_write_cache_pages.isra.28.constprop.50+0x230/0x350 Jun 5 06:55:40 maul kernel: [ 361.276763] [<ffffffff8109cb33>] ? find_busiest_group+0x133/0x830 Jun 5 06:55:40 maul kernel: [ 361.276771] [<ffffffff8131b06c>] extent_writepages+0x4c/0x60 Jun 5 06:55:40 maul kernel: [ 361.276779] [<ffffffff812fecb0>] ? btrfs_writepage_end_io_hook+0x190/0x190 Jun 5 06:55:40 maul kernel: [ 361.276785] [<ffffffff817b7134>] ? preempt_count_add+0x54/0xa0 Jun 5 06:55:40 maul kernel: [ 361.276791] [<ffffffff812fc178>] btrfs_writepages+0x28/0x30 Jun 5 06:55:40 maul kernel: [ 361.276798] [<ffffffff81157dfe>] do_writepages+0x1e/0x40 Jun 5 06:55:40 maul kernel: [ 361.276805] [<ffffffff811ea2d0>] __writeback_single_inode+0x40/0x2a0 Jun 5 06:55:40 maul kernel: [ 361.276812] [<ffffffff811ed4ff>] writeback_sb_inodes+0x23f/0x3f0 Jun 5 06:55:40 maul kernel: [ 361.276821] [<ffffffff811ed747>] __writeback_inodes_wb+0x97/0xd0 Jun 5 06:55:40 maul kernel: [ 361.276828] [<ffffffff811ed97b>] wb_writeback+0x1fb/0x310 Jun 5 06:55:40 maul kernel: [ 361.276836] [<ffffffff811eddd6>] bdi_writeback_workfn+0x1d6/0x490 Jun 5 06:55:40 maul kernel: [ 361.276844] [<ffffffff810779c8>] process_one_work+0x178/0x4b0 Jun 5 06:55:40 maul kernel: [ 361.276850] [<ffffffff81078731>] worker_thread+0x131/0x3d0 Jun 5 06:55:40 maul kernel: [ 361.276856] [<ffffffff81078600>] ? manage_workers.isra.25+0x2c0/0x2c0 Jun 5 06:55:40 maul kernel: [ 361.276863] [<ffffffff8107eabb>] kthread+0xdb/0x100 Jun 5 06:55:40 maul kernel: [ 361.276871] [<ffffffff8107e9e0>] ? kthread_create_on_node+0x190/0x190 Jun 5 06:55:40 maul kernel: [ 361.276878] [<ffffffff817bb7fc>] ret_from_fork+0x7c/0xb0 Jun 5 06:55:40 maul kernel: [ 361.276884] [<ffffffff8107e9e0>] ? kthread_create_on_node+0x190/0x190 Jun 5 06:55:40 maul kernel: [ 361.276896] INFO: task kworker/u8:1:55 blocked for more than 120 seconds. Jun 5 06:55:40 maul kernel: [ 361.276900] Not tainted 3.15.0-rc8 #1 Jun 5 06:55:40 maul kernel: [ 361.276903] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jun 5 06:55:40 maul kernel: [ 361.276906] kworker/u8:1 D ffffffff814231d7 0 55 2 0x00000000 Jun 5 06:55:40 maul kernel: [ 361.276916] Workqueue: btrfs-cache normal_work_helper Jun 5 06:55:40 maul kernel: [ 361.276919] ffff880232ed3c60 0000000000000002 ffff880232e11950 ffff880232ed3fd8 Jun 5 06:55:40 maul kernel: [ 361.276927] 00000000000147c0 00000000000147c0 ffff88022def9950 ffff880232e11950 Jun 5 06:55:40 maul kernel: [ 361.276933] ffff880232e11950 ffff88022f75c680 ffff88022f75c688 0000000000000040 Jun 5 06:55:40 maul kernel: [ 361.276940] Call Trace: Jun 5 06:55:40 maul kernel: [ 361.276949] [<ffffffff817ae839>] schedule+0x29/0x70 Jun 5 06:55:40 maul kernel: [ 361.276955] [<ffffffff817b19a5>] rwsem_down_read_failed+0xc5/0x160 Jun 5 06:55:40 maul kernel: [ 361.276962] [<ffffffff812da77c>] ? btrfs_next_old_leaf+0x1dc/0x4a0 Jun 5 06:55:40 maul kernel: [ 361.276971] [<ffffffff814191f4>] call_rwsem_down_read_failed+0x14/0x30 Jun 5 06:55:40 maul kernel: [ 361.276979] [<ffffffff817b13b7>] ? down_read+0x17/0x20 Jun 5 06:55:40 maul kernel: [ 361.276985] [<ffffffff812dfc5b>] caching_thread+0xeb/0x490 Jun 5 06:55:40 maul kernel: [ 361.276993] [<ffffffff8132960f>] normal_work_helper+0x12f/0x310 Jun 5 06:55:40 maul kernel: [ 361.276999] [<ffffffff810779c8>] process_one_work+0x178/0x4b0 Jun 5 06:55:40 maul kernel: [ 361.277005] [<ffffffff81078731>] worker_thread+0x131/0x3d0 Jun 5 06:55:40 maul kernel: [ 361.277011] [<ffffffff81078600>] ? manage_workers.isra.25+0x2c0/0x2c0 Jun 5 06:55:40 maul kernel: [ 361.277018] [<ffffffff8107eabb>] kthread+0xdb/0x100 Jun 5 06:55:40 maul kernel: [ 361.277025] [<ffffffff8107e9e0>] ? kthread_create_on_node+0x190/0x190 Jun 5 06:55:40 maul kernel: [ 361.277032] [<ffffffff817bb7fc>] ret_from_fork+0x7c/0xb0 Jun 5 06:55:40 maul kernel: [ 361.277039] [<ffffffff8107e9e0>] ? kthread_create_on_node+0x190/0x190 This had all seemed to be fine in kernel 3.13. Peace, Gary ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2014-06-05 12:10 UTC | newest] Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2014-05-27 18:11 3.15-rc6 - btrfs-transacti:4157 blocked for more than 120 seconds Torbjørn 2014-05-27 19:09 ` Chris Mason [not found] ` <5384F0B4.7040309@skagestad.org> 2014-05-27 20:42 ` Torbjørn 2014-05-27 20:50 ` Chris Mason 2014-05-27 21:15 ` Chris Mason 2014-05-28 5:53 ` Torbjørn 2014-05-28 13:41 ` Chris Mason 2014-05-28 14:56 ` Torbjørn 2014-06-01 21:29 ` Marc MERLIN 2014-06-02 3:55 ` Andrew McGlashan 2014-06-02 7:29 ` Torbjørn 2014-06-05 11:44 ` Gary Coulbourne
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).