3.15-rc6 - btrfs-transacti:4157 blocked for more than 120 seconds.

linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* 3.15-rc6 - btrfs-transacti:4157 blocked for more than 120 seconds.
@ 2014-05-27 18:11 Torbjørn
  2014-05-27 19:09 ` Chris Mason
  0 siblings, 1 reply; 12+ messages in thread
From: Torbjørn @ 2014-05-27 18:11 UTC (permalink / raw)
  To: linux-btrfs

Hi,

Btrfs-transaction keeps blocking for me on all 3.15-rc versions.
3.14 does not have this issue.
The process never gets unstuck. btrfs fi sync does not help. A hard 
reboot seems to be the only way to recover.

The volume is still readable when it's in this state.

dmesg + sysrq-w is available at http://pastebin.com/vHQnRE2F
It's over 6000 lines, and would most likely not be allowed on the list.

The blocking happons on a server with local kvm-clients reading and 
writing to a local btrfs-volume over nfs.
The btrfs-volume is on top of dm-crypt devices.

Any additional info I can give to help?
Tests you want me to run?

--
Torbjørn

--snip--
[138011.263478] INFO: task btrfs-transacti:4157 blocked for more than 
120 seconds.
[138011.263558]       Tainted: G            E 3.15.0-rc6 #58
[138011.263605] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[138011.263687] btrfs-transacti D ffff88042fc94500     0  4157 2 0x00000000
[138011.263744]  ffff880400737e10 0000000000000002 ffff880403b33260 
ffff880400737fd8
[138011.263831]  0000000000014500 0000000000014500 ffff88041973b260 
ffff8802a9c97b40
[138011.263919]  ffff880403a0e800 ffff880400737e30 ffff8800c950ac00 
ffff8800c950ac58
[138011.264007] Call Trace:
[138011.264045]  [<ffffffff816f81a9>] schedule+0x29/0x70
[138011.264097]  [<ffffffffa01a6f65>] 
btrfs_commit_transaction+0x315/0xa10 [btrfs]
[138011.264178]  [<ffffffff810acf40>] ? prepare_to_wait_event+0x100/0x100
[138011.264235]  [<ffffffffa01a2ea5>] transaction_kthread+0x1b5/0x240 
[btrfs]
[138011.264293]  [<ffffffffa01a2cf0>] ? 
btrfs_cleanup_transaction+0x560/0x560 [btrfs]
[138011.264376]  [<ffffffff81088e72>] kthread+0xd2/0xf0
[138011.264422]  [<ffffffff81088da0>] ? kthread_create_on_node+0x190/0x190
[138011.264475]  [<ffffffff81704dfc>] ret_from_fork+0x7c/0xb0
[138011.264522]  [<ffffffff81088da0>] ? kthread_create_on_node+0x190/0x190
--snip--

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 3.15-rc6 - btrfs-transacti:4157 blocked for more than 120 seconds.
  2014-05-27 18:11 3.15-rc6 - btrfs-transacti:4157 blocked for more than 120 seconds Torbjørn
@ 2014-05-27 19:09 ` Chris Mason
       [not found]   ` <5384F0B4.7040309@skagestad.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Chris Mason @ 2014-05-27 19:09 UTC (permalink / raw)
  To: Torbjørn, linux-btrfs



On 05/27/2014 02:11 PM, Torbjørn wrote:
> Hi,
> 
> Btrfs-transaction keeps blocking for me on all 3.15-rc versions.
> 3.14 does not have this issue.
> The process never gets unstuck. btrfs fi sync does not help. A hard
> reboot seems to be the only way to recover.
> 
> The volume is still readable when it's in this state.
> 
> dmesg + sysrq-w is available at
> https://urldefense.proofpoint.com/v1/url?u=http://pastebin.com/vHQnRE2F&k=ZVNjlDMF0FElm4dQtryO4A%3D%3D%0A&r=6%2FL0lzzDhu0Y1hL9xm%2BQyA%3D%3D%0A&m=IKSs%2F0C3x9a0LIiVKFmZVoP9lSAZ%2BK9JgEkchLEAAzM%3D%0A&s=127b40cc34dbb205b5277e6081b082f26e84fc417d35310f3aeee04998a679a8
> 
> It's over 6000 lines, and would most likely not be allowed on the list.
> 
> The blocking happons on a server with local kvm-clients reading and
> writing to a local btrfs-volume over nfs.
> The btrfs-volume is on top of dm-crypt devices.
> 
> Any additional info I can give to help?
> Tests you want me to run?

Very strange, since I don't actually see what we're waiting for.  Can
you please either send me your btrfs.ko or use gdb to see where this
statement is:


btrfs_commit_transaction+0x315

The syntax is

gdb btrfs.ko
gdb> list *btrfs_commit_transaction+0x315

-chris

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 3.15-rc6 - btrfs-transacti:4157 blocked for more than 120 seconds.
       [not found]   ` <5384F0B4.7040309@skagestad.org>
@ 2014-05-27 20:42     ` Torbjørn
  2014-05-27 20:50       ` Chris Mason
  0 siblings, 1 reply; 12+ messages in thread
From: Torbjørn @ 2014-05-27 20:42 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Chris Mason

On 05/27/2014 10:08 PM, Torbjørn wrote:
> On 05/27/2014 09:09 PM, Chris Mason wrote:
>>
>> On 05/27/2014 02:11 PM, Torbjørn wrote:
>>> Hi,
>>>
>>> Btrfs-transaction keeps blocking for me on all 3.15-rc versions.
>>> 3.14 does not have this issue.
>>> The process never gets unstuck. btrfs fi sync does not help. A hard
>>> reboot seems to be the only way to recover.
>>>
>>> The volume is still readable when it's in this state.
>>>
>>> dmesg + sysrq-w is available at
>>> https://urldefense.proofpoint.com/v1/url?u=http://pastebin.com/vHQnRE2F&k=ZVNjlDMF0FElm4dQtryO4A%3D%3D%0A&r=6%2FL0lzzDhu0Y1hL9xm%2BQyA%3D%3D%0A&m=IKSs%2F0C3x9a0LIiVKFmZVoP9lSAZ%2BK9JgEkchLEAAzM%3D%0A&s=127b40cc34dbb205b5277e6081b082f26e84fc417d35310f3aeee04998a679a8 
>>>
>>>
>>> It's over 6000 lines, and would most likely not be allowed on the list.
>>>
>>> The blocking happons on a server with local kvm-clients reading and
>>> writing to a local btrfs-volume over nfs.
>>> The btrfs-volume is on top of dm-crypt devices.
>>>
>>> Any additional info I can give to help?
>>> Tests you want me to run?
>> Very strange, since I don't actually see what we're waiting for.  Can
>> you please either send me your btrfs.ko or use gdb to see where this
>> statement is:
>>
>>
>> btrfs_commit_transaction+0x315
>>
>> The syntax is
>>
>> gdb btrfs.ko
>> gdb> list *btrfs_commit_transaction+0x315
>>
>> -chris
> Sure, here you go.
>
> Reading symbols from btrfs.ko...done.
> (gdb) list *btrfs_commit_transaction+0x315
> 0x30f95 is in btrfs_commit_transaction (fs/btrfs/transaction.c:1752).
> 1747         * COMMIT_DOING so make sure to wait for num_writers to == 
> 1 again.
> 1748         */
> 1749        spin_lock(&root->fs_info->trans_lock);
> 1750        cur_trans->state = TRANS_STATE_COMMIT_DOING;
> 1751        spin_unlock(&root->fs_info->trans_lock);
> 1752        wait_event(cur_trans->writer_wait,
> 1753               atomic_read(&cur_trans->num_writers) == 1);
> 1754
> 1755        /* ->aborted might be set after the previous check, so 
> check it */
> 1756        if (unlikely(ACCESS_ONCE(cur_trans->aborted))) {
> (gdb)
>
> I'm attaching the btrfs.ko as well, hopefully the 20M file gets through.
> -- 
> Torbjørn
Seems like the attachment stopped the email from reaching the list.
Resending with only debug info.

--
Torbjørn

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 3.15-rc6 - btrfs-transacti:4157 blocked for more than 120 seconds.
  2014-05-27 20:42     ` Torbjørn
@ 2014-05-27 20:50       ` Chris Mason
  2014-05-27 21:15         ` Chris Mason
  0 siblings, 1 reply; 12+ messages in thread
From: Chris Mason @ 2014-05-27 20:50 UTC (permalink / raw)
  To: Torbjørn, linux-btrfs

On 05/27/2014 04:42 PM, Torbjørn wrote:
> On 05/27/2014 10:08 PM, Torbjørn wrote:
>> On 05/27/2014 09:09 PM, Chris Mason wrote:
>>>
>>> On 05/27/2014 02:11 PM, Torbjørn wrote:
>>>> Hi,
>>>>
>>>> Btrfs-transaction keeps blocking for me on all 3.15-rc versions.
>>>> 3.14 does not have this issue.
>>>> The process never gets unstuck. btrfs fi sync does not help. A hard
>>>> reboot seems to be the only way to recover.
>>>>
>>>> The volume is still readable when it's in this state.
>>>>
>>>> dmesg + sysrq-w is available at
>>>> https://urldefense.proofpoint.com/v1/url?u=http://pastebin.com/vHQnRE2F&k=ZVNjlDMF0FElm4dQtryO4A%3D%3D%0A&r=6%2FL0lzzDhu0Y1hL9xm%2BQyA%3D%3D%0A&m=IKSs%2F0C3x9a0LIiVKFmZVoP9lSAZ%2BK9JgEkchLEAAzM%3D%0A&s=127b40cc34dbb205b5277e6081b082f26e84fc417d35310f3aeee04998a679a8
>>>>
>>>>
>>>> It's over 6000 lines, and would most likely not be allowed on the list.
>>>>
>>>> The blocking happons on a server with local kvm-clients reading and
>>>> writing to a local btrfs-volume over nfs.
>>>> The btrfs-volume is on top of dm-crypt devices.
>>>>
>>>> Any additional info I can give to help?
>>>> Tests you want me to run?
>>> Very strange, since I don't actually see what we're waiting for.  Can
>>> you please either send me your btrfs.ko or use gdb to see where this
>>> statement is:
>>>
>>>
>>> btrfs_commit_transaction+0x315
>>>
>>> The syntax is
>>>
>>> gdb btrfs.ko
>>> gdb> list *btrfs_commit_transaction+0x315
>>>
>>> -chris
>> Sure, here you go.
>>
>> Reading symbols from btrfs.ko...done.
>> (gdb) list *btrfs_commit_transaction+0x315
>> 0x30f95 is in btrfs_commit_transaction (fs/btrfs/transaction.c:1752).
>> 1747         * COMMIT_DOING so make sure to wait for num_writers to ==
>> 1 again.
>> 1748         */
>> 1749        spin_lock(&root->fs_info->trans_lock);
>> 1750        cur_trans->state = TRANS_STATE_COMMIT_DOING;
>> 1751        spin_unlock(&root->fs_info->trans_lock);
>> 1752        wait_event(cur_trans->writer_wait,
>> 1753               atomic_read(&cur_trans->num_writers) == 1);
>> 1754
>> 1755        /* ->aborted might be set after the previous check, so
>> check it */
>> 1756        if (unlikely(ACCESS_ONCE(cur_trans->aborted))) {
>> (gdb)
>>
>> I'm attaching the btrfs.ko as well, hopefully the 20M file gets through.

Thanks, this is enough.  Someone has the transaction pegged open.
Checking your sysrq again.

-chris


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 3.15-rc6 - btrfs-transacti:4157 blocked for more than 120 seconds.
  2014-05-27 20:50       ` Chris Mason
@ 2014-05-27 21:15         ` Chris Mason
  2014-05-28  5:53           ` Torbjørn
  0 siblings, 1 reply; 12+ messages in thread
From: Chris Mason @ 2014-05-27 21:15 UTC (permalink / raw)
  To: Torbjørn, linux-btrfs

On 05/27/2014 04:50 PM, Chris Mason wrote:
> On 05/27/2014 04:42 PM, Torbjørn wrote:
>> On 05/27/2014 10:08 PM, Torbjørn wrote:
>>> On 05/27/2014 09:09 PM, Chris Mason wrote:
>>>>
>>>> On 05/27/2014 02:11 PM, Torbjørn wrote:
>>>>> Hi,
>>>>>
>>>>> Btrfs-transaction keeps blocking for me on all 3.15-rc versions.
>>>>> 3.14 does not have this issue.
>>>>> The process never gets unstuck. btrfs fi sync does not help. A hard
>>>>> reboot seems to be the only way to recover.
>>>>>
>>>>> The volume is still readable when it's in this state.
>>>>>
>>>>> dmesg + sysrq-w is available at
>>>>> https://urldefense.proofpoint.com/v1/url?u=http://pastebin.com/vHQnRE2F&k=ZVNjlDMF0FElm4dQtryO4A%3D%3D%0A&r=6%2FL0lzzDhu0Y1hL9xm%2BQyA%3D%3D%0A&m=IKSs%2F0C3x9a0LIiVKFmZVoP9lSAZ%2BK9JgEkchLEAAzM%3D%0A&s=127b40cc34dbb205b5277e6081b082f26e84fc417d35310f3aeee04998a679a8
>>>>>
>>>>>
>>>>> It's over 6000 lines, and would most likely not be allowed on the list.
>>>>>
>>>>> The blocking happons on a server with local kvm-clients reading and
>>>>> writing to a local btrfs-volume over nfs.
>>>>> The btrfs-volume is on top of dm-crypt devices.
>>>>>
>>>>> Any additional info I can give to help?
>>>>> Tests you want me to run?
>>>> Very strange, since I don't actually see what we're waiting for.  Can
>>>> you please either send me your btrfs.ko or use gdb to see where this
>>>> statement is:
>>>>
>>>>
>>>> btrfs_commit_transaction+0x315
>>>>
>>>> The syntax is
>>>>
>>>> gdb btrfs.ko
>>>> gdb> list *btrfs_commit_transaction+0x315
>>>>
>>>> -chris
>>> Sure, here you go.
>>>
>>> Reading symbols from btrfs.ko...done.
>>> (gdb) list *btrfs_commit_transaction+0x315
>>> 0x30f95 is in btrfs_commit_transaction (fs/btrfs/transaction.c:1752).
>>> 1747         * COMMIT_DOING so make sure to wait for num_writers to ==
>>> 1 again.
>>> 1748         */
>>> 1749        spin_lock(&root->fs_info->trans_lock);
>>> 1750        cur_trans->state = TRANS_STATE_COMMIT_DOING;
>>> 1751        spin_unlock(&root->fs_info->trans_lock);
>>> 1752        wait_event(cur_trans->writer_wait,
>>> 1753               atomic_read(&cur_trans->num_writers) == 1);
>>> 1754
>>> 1755        /* ->aborted might be set after the previous check, so
>>> check it */
>>> 1756        if (unlikely(ACCESS_ONCE(cur_trans->aborted))) {
>>> (gdb)
>>>
>>> I'm attaching the btrfs.ko as well, hopefully the 20M file gets through.


Ok, we're stuck here.  The transaction won't coomplete until this disk IO is done.

Since this is just a read, are you able to read from the device when this is
happening?  This would be the dm-crypt block device w/btrfs on it.

[180625.987870] kworker/u16:12  D ffff88042fd94500     0 15271      2 0x00000000 
[180625.987935] Workqueue: btrfs-delalloc normal_work_helper [btrfs]
[180625.987987]  ffff880107a4f648 0000000000000002 ffff88001383b260 ffff880107a4ffd8
[180625.988075]  0000000000014500 0000000000014500 ffff880419749930 ffff88042fd94e18
[180625.988163]  ffff88042ffadce8 0000000000000002 ffffffff8114df40 ffff880107a4f6c0
[180625.988251] Call Trace:
[180625.988291]  [<ffffffff8114df40>] ? wait_on_page_read+0x60/0x60
[180625.988342]  [<ffffffff816f84cd>] io_schedule+0x9d/0x140
[180625.988391]  [<ffffffff8114df4e>] sleep_on_page+0xe/0x20
[180625.988440]  [<ffffffff816f8962>] __wait_on_bit+0x62/0x90
[180625.988490]  [<ffffffff8114dd0f>] wait_on_page_bit+0x7f/0x90
[180625.988541]  [<ffffffff810acf80>] ? autoremove_wake_function+0x40/0x40
[180625.988601]  [<ffffffffa01c8e8a>] read_extent_buffer_pages+0x2ca/0x300 [btrfs]
[180625.988687]  [<ffffffffa019dd70>] ? free_root_pointers+0x60/0x60 [btrfs]
[180625.988746]  [<ffffffffa019efa3>] btree_read_extent_buffer_pages.constprop.52+0xb3/0x120 [btrfs]
[180625.988839]  [<ffffffffa01a00c8>] read_tree_block+0x38/0x60 [btrfs]
[180625.988895]  [<ffffffffa01828c8>] read_block_for_search.isra.33+0x148/0x380 [btrfs]
[180625.988983]  [<ffffffffa0187f97>] btrfs_next_old_leaf+0x297/0x4a0 [btrfs]
[180625.989041]  [<ffffffffa01881b0>] btrfs_next_leaf+0x10/0x20 [btrfs]
[180625.989099]  [<ffffffffa01cdc9c>] find_free_dev_extent+0xbc/0x350 [btrfs]
[180625.989159]  [<ffffffffa01ce0e4>] __btrfs_alloc_chunk+0x1b4/0x770 [btrfs]
[180625.989214]  [<ffffffff8101ad25>] ? native_sched_clock+0x35/0x90
[180625.989265]  [<ffffffff811bef49>] ? __sb_start_write+0x49/0xe0
[180625.989322]  [<ffffffffa01d0b94>] btrfs_alloc_chunk+0x34/0x40 [btrfs]
[180625.989380]  [<ffffffffa018f9fe>] do_chunk_alloc+0x23e/0x410 [btrfs]
[180625.989438]  [<ffffffffa0194753>] find_free_extent+0xb03/0xbb0 [btrfs]
[180625.989496]  [<ffffffffa01949d8>] btrfs_reserve_extent+0xa8/0x1a0 [btrfs]
[180625.989555]  [<ffffffffa01ad9f5>] cow_file_range+0x135/0x440 [btrfs]
[180625.989613]  [<ffffffffa01aecff>] submit_compressed_extents+0x1bf/0x480 [btrfs]
[180625.989700]  [<ffffffffa01ac804>] ? async_cow_free+0x24/0x30 [btrfs]
[180625.989758]  [<ffffffffa01aefc0>] ? submit_compressed_extents+0x480/0x480 [btrfs]
[180625.989845]  [<ffffffffa01af046>] async_cow_submit+0x86/0x90 [btrfs]
[180625.989904]  [<ffffffffa01d5333>] normal_work_helper+0x193/0x2b0 [btrfs]
[180625.989957]  [<ffffffff81081532>] process_one_work+0x182/0x450
[180625.990008]  [<ffffffff81082331>] worker_thread+0x121/0x410
[180625.990058]  [<ffffffff81082210>] ? rescuer_thread+0x430/0x430
[180625.990109]  [<ffffffff81088e72>] kthread+0xd2/0xf0
[180625.990156]  [<ffffffff81088da0>] ? kthread_create_on_node+0x190/0x190
[180625.990210]  [<ffffffff81704dfc>] ret_from_fork+0x7c/0xb0
[180625.990259]  [<ffffffff81088da0>] ? kthread_create_on_node+0x190/0x190


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 3.15-rc6 - btrfs-transacti:4157 blocked for more than 120 seconds.
  2014-05-27 21:15         ` Chris Mason
@ 2014-05-28  5:53           ` Torbjørn
  2014-05-28 13:41             ` Chris Mason
  0 siblings, 1 reply; 12+ messages in thread
From: Torbjørn @ 2014-05-28  5:53 UTC (permalink / raw)
  To: Chris Mason, linux-btrfs

On 27. mai 2014 23:15, Chris Mason wrote:
> On 05/27/2014 04:50 PM, Chris Mason wrote:
>> On 05/27/2014 04:42 PM, Torbjørn wrote:
>>> On 05/27/2014 10:08 PM, Torbjørn wrote:
>>>> On 05/27/2014 09:09 PM, Chris Mason wrote:
>>>>> On 05/27/2014 02:11 PM, Torbjørn wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Btrfs-transaction keeps blocking for me on all 3.15-rc versions.
>>>>>> 3.14 does not have this issue.
>>>>>> The process never gets unstuck. btrfs fi sync does not help. A hard
>>>>>> reboot seems to be the only way to recover.
>>>>>>
>>>>>> The volume is still readable when it's in this state.
>>>>>>
>>>>>> dmesg + sysrq-w is available at
>>>>>> https://urldefense.proofpoint.com/v1/url?u=http://pastebin.com/vHQnRE2F&k=ZVNjlDMF0FElm4dQtryO4A%3D%3D%0A&r=6%2FL0lzzDhu0Y1hL9xm%2BQyA%3D%3D%0A&m=IKSs%2F0C3x9a0LIiVKFmZVoP9lSAZ%2BK9JgEkchLEAAzM%3D%0A&s=127b40cc34dbb205b5277e6081b082f26e84fc417d35310f3aeee04998a679a8
>>>>>>
>>>>>>
>>>>>> It's over 6000 lines, and would most likely not be allowed on the list.
>>>>>>
>>>>>> The blocking happons on a server with local kvm-clients reading and
>>>>>> writing to a local btrfs-volume over nfs.
>>>>>> The btrfs-volume is on top of dm-crypt devices.
>>>>>>
>>>>>> Any additional info I can give to help?
>>>>>> Tests you want me to run?
>>>>> Very strange, since I don't actually see what we're waiting for.  Can
>>>>> you please either send me your btrfs.ko or use gdb to see where this
>>>>> statement is:
>>>>>
>>>>>
>>>>> btrfs_commit_transaction+0x315
>>>>>
>>>>> The syntax is
>>>>>
>>>>> gdb btrfs.ko
>>>>> gdb> list *btrfs_commit_transaction+0x315
>>>>>
>>>>> -chris
>>>> Sure, here you go.
>>>>
>>>> Reading symbols from btrfs.ko...done.
>>>> (gdb) list *btrfs_commit_transaction+0x315
>>>> 0x30f95 is in btrfs_commit_transaction (fs/btrfs/transaction.c:1752).
>>>> 1747         * COMMIT_DOING so make sure to wait for num_writers to ==
>>>> 1 again.
>>>> 1748         */
>>>> 1749        spin_lock(&root->fs_info->trans_lock);
>>>> 1750        cur_trans->state = TRANS_STATE_COMMIT_DOING;
>>>> 1751        spin_unlock(&root->fs_info->trans_lock);
>>>> 1752        wait_event(cur_trans->writer_wait,
>>>> 1753               atomic_read(&cur_trans->num_writers) == 1);
>>>> 1754
>>>> 1755        /* ->aborted might be set after the previous check, so
>>>> check it */
>>>> 1756        if (unlikely(ACCESS_ONCE(cur_trans->aborted))) {
>>>> (gdb)
>>>>
>>>> I'm attaching the btrfs.ko as well, hopefully the 20M file gets through.
>
> Ok, we're stuck here.  The transaction won't coomplete until this disk IO is done.
>
> Since this is just a read, are you able to read from the device when this is
> happening?  This would be the dm-crypt block device w/btrfs on it.
>
> [180625.987870] kworker/u16:12  D ffff88042fd94500     0 15271      2 0x00000000
> [180625.987935] Workqueue: btrfs-delalloc normal_work_helper [btrfs]
> [180625.987987]  ffff880107a4f648 0000000000000002 ffff88001383b260 ffff880107a4ffd8
> [180625.988075]  0000000000014500 0000000000014500 ffff880419749930 ffff88042fd94e18
> [180625.988163]  ffff88042ffadce8 0000000000000002 ffffffff8114df40 ffff880107a4f6c0
> [180625.988251] Call Trace:
> [180625.988291]  [<ffffffff8114df40>] ? wait_on_page_read+0x60/0x60
> [180625.988342]  [<ffffffff816f84cd>] io_schedule+0x9d/0x140
> [180625.988391]  [<ffffffff8114df4e>] sleep_on_page+0xe/0x20
> [180625.988440]  [<ffffffff816f8962>] __wait_on_bit+0x62/0x90
> [180625.988490]  [<ffffffff8114dd0f>] wait_on_page_bit+0x7f/0x90
> [180625.988541]  [<ffffffff810acf80>] ? autoremove_wake_function+0x40/0x40
> [180625.988601]  [<ffffffffa01c8e8a>] read_extent_buffer_pages+0x2ca/0x300 [btrfs]
> [180625.988687]  [<ffffffffa019dd70>] ? free_root_pointers+0x60/0x60 [btrfs]
> [180625.988746]  [<ffffffffa019efa3>] btree_read_extent_buffer_pages.constprop.52+0xb3/0x120 [btrfs]
> [180625.988839]  [<ffffffffa01a00c8>] read_tree_block+0x38/0x60 [btrfs]
> [180625.988895]  [<ffffffffa01828c8>] read_block_for_search.isra.33+0x148/0x380 [btrfs]
> [180625.988983]  [<ffffffffa0187f97>] btrfs_next_old_leaf+0x297/0x4a0 [btrfs]
> [180625.989041]  [<ffffffffa01881b0>] btrfs_next_leaf+0x10/0x20 [btrfs]
> [180625.989099]  [<ffffffffa01cdc9c>] find_free_dev_extent+0xbc/0x350 [btrfs]
> [180625.989159]  [<ffffffffa01ce0e4>] __btrfs_alloc_chunk+0x1b4/0x770 [btrfs]
> [180625.989214]  [<ffffffff8101ad25>] ? native_sched_clock+0x35/0x90
> [180625.989265]  [<ffffffff811bef49>] ? __sb_start_write+0x49/0xe0
> [180625.989322]  [<ffffffffa01d0b94>] btrfs_alloc_chunk+0x34/0x40 [btrfs]
> [180625.989380]  [<ffffffffa018f9fe>] do_chunk_alloc+0x23e/0x410 [btrfs]
> [180625.989438]  [<ffffffffa0194753>] find_free_extent+0xb03/0xbb0 [btrfs]
> [180625.989496]  [<ffffffffa01949d8>] btrfs_reserve_extent+0xa8/0x1a0 [btrfs]
> [180625.989555]  [<ffffffffa01ad9f5>] cow_file_range+0x135/0x440 [btrfs]
> [180625.989613]  [<ffffffffa01aecff>] submit_compressed_extents+0x1bf/0x480 [btrfs]
> [180625.989700]  [<ffffffffa01ac804>] ? async_cow_free+0x24/0x30 [btrfs]
> [180625.989758]  [<ffffffffa01aefc0>] ? submit_compressed_extents+0x480/0x480 [btrfs]
> [180625.989845]  [<ffffffffa01af046>] async_cow_submit+0x86/0x90 [btrfs]
> [180625.989904]  [<ffffffffa01d5333>] normal_work_helper+0x193/0x2b0 [btrfs]
> [180625.989957]  [<ffffffff81081532>] process_one_work+0x182/0x450
> [180625.990008]  [<ffffffff81082331>] worker_thread+0x121/0x410
> [180625.990058]  [<ffffffff81082210>] ? rescuer_thread+0x430/0x430
> [180625.990109]  [<ffffffff81088e72>] kthread+0xd2/0xf0
> [180625.990156]  [<ffffffff81088da0>] ? kthread_create_on_node+0x190/0x190
> [180625.990210]  [<ffffffff81704dfc>] ret_from_fork+0x7c/0xb0
> [180625.990259]  [<ffffffff81088da0>] ? kthread_create_on_node+0x190/0x190
>
It's actually a raid10 array of 11 dm-crypt devices.
I'm able to read data from the array (accessing files), and also read 
directly from all the underlying dm-crypt devices using dd, if that's 
what you meant.

I have not rebooted the system since that dmesg, so it's still stuck in 
the same state.
I can keep it like that for some days. The system is not critical at all.

--
Torbjørn

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 3.15-rc6 - btrfs-transacti:4157 blocked for more than 120 seconds.
  2014-05-28  5:53           ` Torbjørn
@ 2014-05-28 13:41             ` Chris Mason
  2014-05-28 14:56               ` Torbjørn
  0 siblings, 1 reply; 12+ messages in thread
From: Chris Mason @ 2014-05-28 13:41 UTC (permalink / raw)
  To: Torbjørn, linux-btrfs

On 05/28/2014 01:53 AM, Torbjørn wrote:

> It's actually a raid10 array of 11 dm-crypt devices.
> I'm able to read data from the array (accessing files), and also read
> directly from all the underlying dm-crypt devices using dd, if that's
> what you meant.
> 
> I have not rebooted the system since that dmesg, so it's still stuck in
> the same state.
> I can keep it like that for some days. The system is not critical at all.
> 

Thanks,

Can you please capture the result of sysrq-t and sysrq-l?  Maybe the IO
is stuck in a thread that isn't in the D state.

-chris

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 3.15-rc6 - btrfs-transacti:4157 blocked for more than 120 seconds.
  2014-05-28 13:41             ` Chris Mason
@ 2014-05-28 14:56               ` Torbjørn
  2014-06-01 21:29                 ` Marc MERLIN
  0 siblings, 1 reply; 12+ messages in thread
From: Torbjørn @ 2014-05-28 14:56 UTC (permalink / raw)
  To: Chris Mason, linux-btrfs

On 05/28/2014 03:41 PM, Chris Mason wrote:
> On 05/28/2014 01:53 AM, Torbjørn wrote:
>
>> It's actually a raid10 array of 11 dm-crypt devices.
>> I'm able to read data from the array (accessing files), and also read
>> directly from all the underlying dm-crypt devices using dd, if that's
>> what you meant.
>>
>> I have not rebooted the system since that dmesg, so it's still stuck in
>> the same state.
>> I can keep it like that for some days. The system is not critical at all.
>>
> Thanks,
>
> Can you please capture the result of sysrq-t and sysrq-l?  Maybe the IO
> is stuck in a thread that isn't in the D state.
>
> -chris
The output of sysrq-t was too much data for pastebin, will send both to 
you as attachments outside of the list.

--
Torbjørn

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 3.15-rc6 - btrfs-transacti:4157 blocked for more than 120 seconds.
  2014-05-28 14:56               ` Torbjørn
@ 2014-06-01 21:29                 ` Marc MERLIN
  2014-06-02  3:55                   ` Andrew McGlashan
                                     ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Marc MERLIN @ 2014-06-01 21:29 UTC (permalink / raw)
  To: Torbjørn; +Cc: Chris Mason, linux-btrfs

On Wed, May 28, 2014 at 04:56:56PM +0200, Torbjørn wrote:
> On 05/28/2014 03:41 PM, Chris Mason wrote:
> >On 05/28/2014 01:53 AM, Torbjørn wrote:
> >
> >>It's actually a raid10 array of 11 dm-crypt devices.
> >>I'm able to read data from the array (accessing files), and also read
> >>directly from all the underlying dm-crypt devices using dd, if that's
> >>what you meant.
> >>
> >>I have not rebooted the system since that dmesg, so it's still stuck in
> >>the same state.
> >>I can keep it like that for some days. The system is not critical at all.
> >>
> >Thanks,
> >
> >Can you please capture the result of sysrq-t and sysrq-l?  Maybe the IO
> >is stuck in a thread that isn't in the D state.
> >
> >-chris
> The output of sysrq-t was too much data for pastebin, will send both
> to you as attachments outside of the list.

Considering I have experienced this with with 3.15rc but wasn't able to
save a sysrq since my save file and syslog got lost, did you find out
anything from that?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 3.15-rc6 - btrfs-transacti:4157 blocked for more than 120 seconds.
  2014-06-01 21:29                 ` Marc MERLIN
@ 2014-06-02  3:55                   ` Andrew McGlashan
  2014-06-02  7:29                   ` Torbjørn
  2014-06-05 11:44                   ` Gary Coulbourne
  2 siblings, 0 replies; 12+ messages in thread
From: Andrew McGlashan @ 2014-06-02  3:55 UTC (permalink / raw)
  To: Marc MERLIN, Torbjørn; +Cc: Chris Mason, linux-btrfs

Hi,

I'm getting "blocked for more than 120 seconds" messages too.

In my case, it is for a simple RAID1 volume that is rebuilding for the
first time (new volume), whilst mounted as a LUKS crypt volume and
writing /dev/zero before actual deployment.

I am running a stock Wheezy kernel:

~ # uname -a
Linux n4800eco-a 3.2.0-4-amd64 #1 SMP Debian 3.2.57-3 x86_64 GNU/Linux

As I am deliberately NOT running btrfs, the common link may be mdadm or
LUKS.

I'm also noticing btrfs modules being loaded on another box, even though
I am not at all ready to trust btrfs for any file system at this time.

There may be some bug with mdadm/crypt(LUKS)/btrfs -- or it may have
nothing to do with btrfs.  My problem almost certainly isn't btrfs
unless there is some kernel code giving me grief without loading the
btrfs module.

More details of my problem:

 https://lists.debian.org/debian-user/2014/06/msg00070.html

NB: working through my problem in a dropbear busybox environment that
hasn't loaded btrfs module.  The kernel is the same as the /normal/
system, compliments of mkinitramfs tool....

Cheers
A.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 3.15-rc6 - btrfs-transacti:4157 blocked for more than 120 seconds.
  2014-06-01 21:29                 ` Marc MERLIN
  2014-06-02  3:55                   ` Andrew McGlashan
@ 2014-06-02  7:29                   ` Torbjørn
  2014-06-05 11:44                   ` Gary Coulbourne
  2 siblings, 0 replies; 12+ messages in thread
From: Torbjørn @ 2014-06-02  7:29 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: Chris Mason, linux-btrfs

On 01. juni 2014 23:29, Marc MERLIN wrote:
> On Wed, May 28, 2014 at 04:56:56PM +0200, Torbjørn wrote:
>> On 05/28/2014 03:41 PM, Chris Mason wrote:
>>> On 05/28/2014 01:53 AM, Torbjørn wrote:
>>>
>>>> It's actually a raid10 array of 11 dm-crypt devices.
>>>> I'm able to read data from the array (accessing files), and also read
>>>> directly from all the underlying dm-crypt devices using dd, if that's
>>>> what you meant.
>>>>
>>>> I have not rebooted the system since that dmesg, so it's still stuck in
>>>> the same state.
>>>> I can keep it like that for some days. The system is not critical at all.
>>>>
>>> Thanks,
>>>
>>> Can you please capture the result of sysrq-t and sysrq-l?  Maybe the IO
>>> is stuck in a thread that isn't in the D state.
>>>
>>> -chris
>> The output of sysrq-t was too much data for pastebin, will send both
>> to you as attachments outside of the list.
> Considering I have experienced this with with 3.15rc but wasn't able to
> save a sysrq since my save file and syslog got lost, did you find out
> anything from that?
>
> Thanks,
> Marc
Hi Marc,

I have not done any real investigation of those yet, so no.

The server is mainly for testing, so I had already netconsole set up for it.
Btrfs only blocked on the one array though. Root is btrfs raid1 with 2 
devices, and it was operating as normal.

If it's of any interest, I can make the captured logs available for 
download.

--
Torbjørn


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 3.15-rc6 - btrfs-transacti:4157 blocked for more than 120 seconds.
  2014-06-01 21:29                 ` Marc MERLIN
  2014-06-02  3:55                   ` Andrew McGlashan
  2014-06-02  7:29                   ` Torbjørn
@ 2014-06-05 11:44                   ` Gary Coulbourne
  2 siblings, 0 replies; 12+ messages in thread
From: Gary Coulbourne @ 2014-06-05 11:44 UTC (permalink / raw)
  To: linux-btrfs

I am also getting this...

My system is configured like so:

3.15.0-rc8 
Btrfs v3.12

#btrfs fi show
Label: none  uuid: e421ceeb-6e4e-4c7e-84b4-6f25442745fa
	Total devices 1 FS bytes used 22.51GiB
	devid    1 size 116.75GiB used 28.02GiB path /dev/sda4

Label: tank  uuid: 52044d1c-5308-40f7-9d21-4edca3b63d05
	Total devices 5 FS bytes used 1.87TiB
	devid    2 size 931.51GiB used 367.00GiB path /dev/sdc
	devid    3 size 931.51GiB used 366.00GiB path /dev/sdf
	devid    4 size 2.73TiB used 1.88TiB path /dev/sde
	devid    5 size 1.36TiB used 833.00GiB path /dev/sdd
	devid    6 size 931.51GiB used 364.03GiB path /dev/sdb

/tank is the filesystem giving the trouble.  Here's the df for it:

Data, RAID1: total=1.88TiB, used=1.87TiB
System, RAID1: total=32.00MiB, used=276.00KiB
Metadata, RAID1: total=4.00GiB, used=2.84GiB
unknown, single: total=512.00MiB, used=0.00

It is just a raid 1, zlib-compressed.  No encryption.

If I do any kind of large file transfers (large amounts OR many files, not
sure what aspect causes it) with it, the following happens:

My load average shoots up to HUGE numbers, the filesystem becomes unusable
although directories are still readable that do not have files that are
being accessed in them, and the dmesg begins filling with:

Jun  5 06:55:40 maul kernel: [  361.276631] INFO: task kworker/u8:0:6
blocked for more than 120 seconds.
Jun  5 06:55:40 maul kernel: [  361.276640]       Not tainted 3.15.0-rc8 #1
Jun  5 06:55:40 maul kernel: [  361.276644] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun  5 06:55:40 maul kernel: [  361.276649] kworker/u8:0    D
0000000000000000     0     6      2 0x00000000
Jun  5 06:55:40 maul kernel: [  361.276665] Workqueue: writeback
bdi_writeback_workfn (flush-btrfs-2)
Jun  5 06:55:40 maul kernel: [  361.276671]  ffff8802331c79f8
0000000000000002 ffff8802331c8000 ffff8802331c7fd8
Jun  5 06:55:40 maul kernel: [  361.276679]  00000000000147c0
00000000000147c0 ffff880233226540 ffff8802331c8000
Jun  5 06:55:40 maul kernel: [  361.276686]  ffff88023ed150d8
ffff88023efdeb68 ffff8802331c7a80 0000000000000002
Jun  5 06:55:40 maul kernel: [  361.276693] Call Trace:
Jun  5 06:55:40 maul kernel: [  361.276706]  [<ffffffff8114a1d0>] ?
wait_on_page_read+0x60/0x60
Jun  5 06:55:40 maul kernel: [  361.276714]  [<ffffffff817aeb9f>]
io_schedule+0xaf/0x150
Jun  5 06:55:40 maul kernel: [  361.276721]  [<ffffffff8114a1de>]
sleep_on_page+0xe/0x20
Jun  5 06:55:40 maul kernel: [  361.276727]  [<ffffffff817af213>]
__wait_on_bit_lock+0x53/0xb0
Jun  5 06:55:40 maul kernel: [  361.276734]  [<ffffffff8114a2fa>]
__lock_page+0x6a/0x70
Jun  5 06:55:40 maul kernel: [  361.276741]  [<ffffffff810a4430>] ?
autoremove_wake_function+0x40/0x40
Jun  5 06:55:40 maul kernel: [  361.276749]  [<ffffffff8131579e>] ?
flush_write_bio+0xe/0x10
Jun  5 06:55:40 maul kernel: [  361.276756]  [<ffffffff81319d50>]
extent_write_cache_pages.isra.28.constprop.50+0x230/0x350
Jun  5 06:55:40 maul kernel: [  361.276763]  [<ffffffff8109cb33>] ?
find_busiest_group+0x133/0x830
Jun  5 06:55:40 maul kernel: [  361.276771]  [<ffffffff8131b06c>]
extent_writepages+0x4c/0x60
Jun  5 06:55:40 maul kernel: [  361.276779]  [<ffffffff812fecb0>] ?
btrfs_writepage_end_io_hook+0x190/0x190
Jun  5 06:55:40 maul kernel: [  361.276785]  [<ffffffff817b7134>] ?
preempt_count_add+0x54/0xa0
Jun  5 06:55:40 maul kernel: [  361.276791]  [<ffffffff812fc178>]
btrfs_writepages+0x28/0x30
Jun  5 06:55:40 maul kernel: [  361.276798]  [<ffffffff81157dfe>]
do_writepages+0x1e/0x40
Jun  5 06:55:40 maul kernel: [  361.276805]  [<ffffffff811ea2d0>]
__writeback_single_inode+0x40/0x2a0
Jun  5 06:55:40 maul kernel: [  361.276812]  [<ffffffff811ed4ff>]
writeback_sb_inodes+0x23f/0x3f0
Jun  5 06:55:40 maul kernel: [  361.276821]  [<ffffffff811ed747>]
__writeback_inodes_wb+0x97/0xd0
Jun  5 06:55:40 maul kernel: [  361.276828]  [<ffffffff811ed97b>]
wb_writeback+0x1fb/0x310
Jun  5 06:55:40 maul kernel: [  361.276836]  [<ffffffff811eddd6>]
bdi_writeback_workfn+0x1d6/0x490
Jun  5 06:55:40 maul kernel: [  361.276844]  [<ffffffff810779c8>]
process_one_work+0x178/0x4b0
Jun  5 06:55:40 maul kernel: [  361.276850]  [<ffffffff81078731>]
worker_thread+0x131/0x3d0
Jun  5 06:55:40 maul kernel: [  361.276856]  [<ffffffff81078600>] ?
manage_workers.isra.25+0x2c0/0x2c0
Jun  5 06:55:40 maul kernel: [  361.276863]  [<ffffffff8107eabb>]
kthread+0xdb/0x100
Jun  5 06:55:40 maul kernel: [  361.276871]  [<ffffffff8107e9e0>] ?
kthread_create_on_node+0x190/0x190
Jun  5 06:55:40 maul kernel: [  361.276878]  [<ffffffff817bb7fc>]
ret_from_fork+0x7c/0xb0
Jun  5 06:55:40 maul kernel: [  361.276884]  [<ffffffff8107e9e0>] ?
kthread_create_on_node+0x190/0x190
Jun  5 06:55:40 maul kernel: [  361.276896] INFO: task kworker/u8:1:55
blocked for more than 120 seconds.
Jun  5 06:55:40 maul kernel: [  361.276900]       Not tainted 3.15.0-rc8 #1
Jun  5 06:55:40 maul kernel: [  361.276903] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun  5 06:55:40 maul kernel: [  361.276906] kworker/u8:1    D
ffffffff814231d7     0    55      2 0x00000000
Jun  5 06:55:40 maul kernel: [  361.276916] Workqueue: btrfs-cache
normal_work_helper
Jun  5 06:55:40 maul kernel: [  361.276919]  ffff880232ed3c60
0000000000000002 ffff880232e11950 ffff880232ed3fd8
Jun  5 06:55:40 maul kernel: [  361.276927]  00000000000147c0
00000000000147c0 ffff88022def9950 ffff880232e11950
Jun  5 06:55:40 maul kernel: [  361.276933]  ffff880232e11950
ffff88022f75c680 ffff88022f75c688 0000000000000040
Jun  5 06:55:40 maul kernel: [  361.276940] Call Trace:
Jun  5 06:55:40 maul kernel: [  361.276949]  [<ffffffff817ae839>]
schedule+0x29/0x70
Jun  5 06:55:40 maul kernel: [  361.276955]  [<ffffffff817b19a5>]
rwsem_down_read_failed+0xc5/0x160
Jun  5 06:55:40 maul kernel: [  361.276962]  [<ffffffff812da77c>] ?
btrfs_next_old_leaf+0x1dc/0x4a0
Jun  5 06:55:40 maul kernel: [  361.276971]  [<ffffffff814191f4>]
call_rwsem_down_read_failed+0x14/0x30
Jun  5 06:55:40 maul kernel: [  361.276979]  [<ffffffff817b13b7>] ?
down_read+0x17/0x20
Jun  5 06:55:40 maul kernel: [  361.276985]  [<ffffffff812dfc5b>]
caching_thread+0xeb/0x490
Jun  5 06:55:40 maul kernel: [  361.276993]  [<ffffffff8132960f>]
normal_work_helper+0x12f/0x310
Jun  5 06:55:40 maul kernel: [  361.276999]  [<ffffffff810779c8>]
process_one_work+0x178/0x4b0
Jun  5 06:55:40 maul kernel: [  361.277005]  [<ffffffff81078731>]
worker_thread+0x131/0x3d0
Jun  5 06:55:40 maul kernel: [  361.277011]  [<ffffffff81078600>] ?
manage_workers.isra.25+0x2c0/0x2c0
Jun  5 06:55:40 maul kernel: [  361.277018]  [<ffffffff8107eabb>]
kthread+0xdb/0x100
Jun  5 06:55:40 maul kernel: [  361.277025]  [<ffffffff8107e9e0>] ?
kthread_create_on_node+0x190/0x190
Jun  5 06:55:40 maul kernel: [  361.277032]  [<ffffffff817bb7fc>]
ret_from_fork+0x7c/0xb0
Jun  5 06:55:40 maul kernel: [  361.277039]  [<ffffffff8107e9e0>] ?
kthread_create_on_node+0x190/0x190

This had all seemed to be fine in kernel 3.13.

Peace,
Gary


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2014-06-05 12:10 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-27 18:11 3.15-rc6 - btrfs-transacti:4157 blocked for more than 120 seconds Torbjørn
2014-05-27 19:09 ` Chris Mason
     [not found]   ` <5384F0B4.7040309@skagestad.org>
2014-05-27 20:42     ` Torbjørn
2014-05-27 20:50       ` Chris Mason
2014-05-27 21:15         ` Chris Mason
2014-05-28  5:53           ` Torbjørn
2014-05-28 13:41             ` Chris Mason
2014-05-28 14:56               ` Torbjørn
2014-06-01 21:29                 ` Marc MERLIN
2014-06-02  3:55                   ` Andrew McGlashan
2014-06-02  7:29                   ` Torbjørn
2014-06-05 11:44                   ` Gary Coulbourne

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).