All of lore.kernel.org
 help / color / mirror / Atom feed
* Hanging after frequent use of systemd-nspawn --ephemeral
@ 2018-01-14  1:36 Johannes Ernst
  2018-01-14  2:27 ` Qu Wenruo
  0 siblings, 1 reply; 10+ messages in thread
From: Johannes Ernst @ 2018-01-14  1:36 UTC (permalink / raw)
  To: linux-btrfs

Summary: frequent “hangs” of the system with dmesg saying:

task systemd:22229 blocked for more than 120 seconds.
[ 2948.928653]       Tainted: G         C O    4.14.9-1-ARCH #1
[ 2948.928658] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2948.928665] systemd         D    0 22229  22226 0x00000100
[ 2948.928671] Call Trace:
[ 2948.928684]  ? __schedule+0x290/0x890
[ 2948.928690]  schedule+0x2f/0x90
[ 2948.928744]  btrfs_tree_read_lock+0xb6/0x100 [btrfs]
[ 2948.928752]  ? wait_woken+0x80/0x80
[ 2948.928799]  find_parent_nodes+0x341/0xfd0 [btrfs]
[ 2948.928827]  ? btrfs_search_slot+0x84c/0x9f0 [btrfs]
[ 2948.928873]  ? btrfs_find_all_roots_safe+0x9e/0x110 [btrfs]
[ 2948.928912]  btrfs_find_all_roots_safe+0x9e/0x110 [btrfs]
[ 2948.928950]  btrfs_find_all_roots+0x45/0x60 [btrfs]
[ 2948.928987]  btrfs_qgroup_trace_extent_post+0x30/0x60 [btrfs]
[ 2948.929026]  btrfs_add_delayed_tree_ref+0x1db/0x200 [btrfs]
[ 2948.929055]  btrfs_inc_extent_ref+0xa6/0xe0 [btrfs]
[ 2948.929084]  __btrfs_mod_ref+0x144/0x230 [btrfs]
[ 2948.929110]  ? add_pinned_bytes+0x40/0x40 [btrfs]
[ 2948.929136]  update_ref_for_cow+0xd2/0x320 [btrfs]
[ 2948.929161]  __btrfs_cow_block+0x1f9/0x5b0 [btrfs]
[ 2948.929187]  btrfs_cow_block+0xca/0x1c0 [btrfs]
[ 2948.929212]  btrfs_search_slot+0x321/0x9f0 [btrfs]
[ 2948.929239]  btrfs_insert_empty_items+0x66/0xb0 [btrfs]
[ 2948.929271]  insert_with_overflow+0x44/0x110 [btrfs]
[ 2948.929303]  btrfs_insert_dir_item+0xcd/0x280 [btrfs]
[ 2948.929342]  btrfs_add_link+0xe8/0x430 [btrfs]
[ 2948.929349]  ? getnstimeofday64+0xa/0x20
[ 2948.929384]  btrfs_mkdir+0x1d3/0x210 [btrfs]
[ 2948.929392]  vfs_mkdir+0xd7/0x140
[ 2948.929397]  SyS_mkdir+0x7a/0xf0
[ 2948.929404]  do_syscall_64+0x55/0x110
[ 2948.929409]  entry_SYSCALL64_slow_path+0x25/0x25
[ 2948.929413] RIP: 0033:0x7f8735697687
[ 2948.929415] RSP: 002b:00007ffc8e72d028 EFLAGS: 00000246 ORIG_RAX: 0000000000000053
[ 2948.929419] RAX: ffffffffffffffda RBX: 000000000003a2f8 RCX: 00007f8735697687
[ 2948.929421] RDX: 0000000000000156 RSI: 00000000000001c0 RDI: 0000561976693a80
[ 2948.929423] RBP: 0000000000000001 R08: 000056197666b880 R09: 0000561974c0f734
[ 2948.929425] R10: 00007ffc8e72d030 R11: 0000000000000246 R12: 0000561976693a80
[ 2948.929427] R13: 0000561976693ad3 R14: 00007f8735732f80 R15: 8421084210842109


Detail:

I build a Linux distro called UBOS [1]. For testing purposes, we boot UBOS in a systemd-nspawn container, do some automated tests, and poweroff the container. A few dozen of those tests can happen in rapid succession (but not in parallel). 

We run systemd-nspawn with the --ephemeral flag in some directory on a btrfs filesystem, which causes systemd to allocate a temporary subvolume that goes away again when the container quits, leaving the filesystem unchanged (this is very handy for testing purposes).
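In shell terms, the loop looks roughly like this (a dry-run sketch; the image path and the run-tests command are hypothetical placeholders, not the actual UBOS tooling):

```shell
#!/bin/sh
# Dry-run sketch of the UBOS test loop described above. IMAGE and the
# run-tests command are hypothetical placeholders.
IMAGE="${IMAGE:-/var/lib/machines/ubos-test}"   # must live on a btrfs filesystem
run() { echo "would run: $*"; }                 # stub; the real driver executes
for i in 1 2 3; do
    # --ephemeral snapshots IMAGE into a temporary subvolume that is
    # deleted again when the container exits, leaving IMAGE unchanged.
    run systemd-nspawn --ephemeral --directory="$IMAGE" /usr/bin/run-tests
done
```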

That works well … but not for long. Often we don’t make it through the test suite: the starting of new containers hangs. Other filesystem operations also hang. The above stack trace, or something rather similar, shows up in dmesg (not in the journal, because that hangs, too!). This repeats, but I don’t see any other relevant messages. Only a reboot seems to allow recovery.

This happens on two-disk RAID1 as well as one-disk systems, and it happens on x86_64, armv6, armv7 and aarch64. So it does not seem to have anything to do with any particular disk or filesystem. The disks have plenty of space. It has been the same behavior for some time, so it’s not a recent problem. UBOS is a derivative of Arch, so it’s fairly current in terms of code. Current kernel versions are 4.14.x on all platforms.

Any idea what this might be or how to debug this? 
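For gathering more data on D-state hangs like this, a common starting point is the SysRq blocked-task dump (a sketch; the /proc operations need root and are left as comments here):

```shell
#!/bin/sh
# Sketch: inspecting tasks stuck in uninterruptible (D) sleep.
# The /proc operations below need root, so they are shown as comments only:
#   echo w > /proc/sysrq-trigger   # dump stacks of all blocked tasks to dmesg
#   cat /proc/22229/stack          # kernel stack of one specific hung pid
# Listing D-state processes works unprivileged:
ps -eo pid,stat,comm | awk '$2 ~ /^D/ {print}'
```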

Thanks,



Johannes.


[1] https://ubos.net/


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Hanging after frequent use of systemd-nspawn --ephemeral
  2018-01-14  1:36 Hanging after frequent use of systemd-nspawn --ephemeral Johannes Ernst
@ 2018-01-14  2:27 ` Qu Wenruo
  2018-01-14 16:30   ` Duncan
                     ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Qu Wenruo @ 2018-01-14  2:27 UTC (permalink / raw)
  To: Johannes Ernst, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 4836 bytes --]



On 2018年01月14日 09:36, Johannes Ernst wrote:
> Summary: frequent “hangs” of the system with dmesg saying:
> 
> task systemd:22229 blocked for more than 120 seconds.
> [ 2948.928653]       Tainted: G         C O    4.14.9-1-ARCH #1
> [ 2948.928658] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 2948.928665] systemd         D    0 22229  22226 0x00000100
> [ 2948.928671] Call Trace:
> [ 2948.928684]  ? __schedule+0x290/0x890
> [ 2948.928690]  schedule+0x2f/0x90
> [ 2948.928744]  btrfs_tree_read_lock+0xb6/0x100 [btrfs]
> [ 2948.928752]  ? wait_woken+0x80/0x80
> [ 2948.928799]  find_parent_nodes+0x341/0xfd0 [btrfs]
> [ 2948.928827]  ? btrfs_search_slot+0x84c/0x9f0 [btrfs]
> [ 2948.928873]  ? btrfs_find_all_roots_safe+0x9e/0x110 [btrfs]
> [ 2948.928912]  btrfs_find_all_roots_safe+0x9e/0x110 [btrfs]
> [ 2948.928950]  btrfs_find_all_roots+0x45/0x60 [btrfs]
> [ 2948.928987]  btrfs_qgroup_trace_extent_post+0x30/0x60 [btrfs]

You're using qgroup, and the hang happens while finding the old_roots of one
extent, which only searches commit roots.

Normally this shouldn't cause any problem, especially for commit roots.

Is there any special operation happening?

> [ 2948.929026]  btrfs_add_delayed_tree_ref+0x1db/0x200 [btrfs]
> [ 2948.929055]  btrfs_inc_extent_ref+0xa6/0xe0 [btrfs]
> [ 2948.929084]  __btrfs_mod_ref+0x144/0x230 [btrfs]
> [ 2948.929110]  ? add_pinned_bytes+0x40/0x40 [btrfs]
> [ 2948.929136]  update_ref_for_cow+0xd2/0x320 [btrfs]
> [ 2948.929161]  __btrfs_cow_block+0x1f9/0x5b0 [btrfs]
> [ 2948.929187]  btrfs_cow_block+0xca/0x1c0 [btrfs]
> [ 2948.929212]  btrfs_search_slot+0x321/0x9f0 [btrfs]
> [ 2948.929239]  btrfs_insert_empty_items+0x66/0xb0 [btrfs]
> [ 2948.929271]  insert_with_overflow+0x44/0x110 [btrfs]
> [ 2948.929303]  btrfs_insert_dir_item+0xcd/0x280 [btrfs]
> [ 2948.929342]  btrfs_add_link+0xe8/0x430 [btrfs]
> [ 2948.929349]  ? getnstimeofday64+0xa/0x20
> [ 2948.929384]  btrfs_mkdir+0x1d3/0x210 [btrfs]
> [ 2948.929392]  vfs_mkdir+0xd7/0x140
> [ 2948.929397]  SyS_mkdir+0x7a/0xf0
> [ 2948.929404]  do_syscall_64+0x55/0x110
> [ 2948.929409]  entry_SYSCALL64_slow_path+0x25/0x25
> [ 2948.929413] RIP: 0033:0x7f8735697687
> [ 2948.929415] RSP: 002b:00007ffc8e72d028 EFLAGS: 00000246 ORIG_RAX: 0000000000000053
> [ 2948.929419] RAX: ffffffffffffffda RBX: 000000000003a2f8 RCX: 00007f8735697687
> [ 2948.929421] RDX: 0000000000000156 RSI: 00000000000001c0 RDI: 0000561976693a80
> [ 2948.929423] RBP: 0000000000000001 R08: 000056197666b880 R09: 0000561974c0f734
> [ 2948.929425] R10: 00007ffc8e72d030 R11: 0000000000000246 R12: 0000561976693a80
> [ 2948.929427] R13: 0000561976693ad3 R14: 00007f8735732f80 R15: 8421084210842109
> 
> 
> Detail:
> 
> I build a Linux distro called UBOS [1]. For testing purposes, we boot UBOS in a systemd-nspawn container, do some automated tests, and poweroff the container. A few dozen of those tests can happen in rapid succession (but not in parallel). 
> 
> We run systemd-nspawn with the --ephemeral flag in some directory on a btrfs filesystem, which causes systemd to allocate a temporary subvolume that goes away again when the container quits, leaving the filesystem unchanged (this is very handy for testing purposes).
> 
> That works well … but not for long. Often we don’t make it through the test suite: the starting of new containers hangs. Other filesystem operations also hang. The above stack trace, or something rather similar, shows up in dmesg (not in the journal, because that hangs, too!). This repeats, but I don’t see any other relevant messages. Only a reboot seems to allow recovery.

So Qgroup is used to limit disk usage of each container?

Maybe it's related to subvolume deletion?

Anyway, if qgroup is not necessary, disabling qgroup should fix your
problem.
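Concretely, that would be something like the following (a dry-run sketch; the mount point is a placeholder):

```shell
#!/bin/sh
# Sketch: check and then disable qgroup accounting. MNT is a placeholder
# for the filesystem backing the containers; with DRY_RUN=1 (the default)
# the commands are only printed, not executed.
MNT="${MNT:-/build}"
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = 1 ]; then echo "would run: $*"; else "$@"; fi; }
run btrfs qgroup show "$MNT"     # lists qgroups; fails if quota was never enabled
run btrfs quota disable "$MNT"   # turns qgroup accounting off entirely
```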

Despite that, does it really hang?
Qgroup dramatically increases the overhead of deleting a subvolume or
balancing the fs.
Maybe it's just a little slow?

Thanks,
Qu

> 
> This happens on two-disk RAID1 as well as one-disk systems, and it happens on x86_64, armv6, armv7 and aarch64. So it does not seem to have anything to do with any particular disk or filesystem. The disks have plenty of space. It has been the same behavior for some time, so it’s not a recent problem. UBOS is a derivative of Arch, so it’s fairly current in terms of code. Current kernel versions are 4.14.x on all platforms.
> 
> Any idea what this might be or how to debug this? 
> 
> Thanks,
> 
> 
> 
> Johannes.
> 
> 
> [1] https://ubos.net/
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 520 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Hanging after frequent use of systemd-nspawn --ephemeral
  2018-01-14  2:27 ` Qu Wenruo
@ 2018-01-14 16:30   ` Duncan
  2018-01-14 20:17   ` Johannes Ernst
  2018-01-14 20:18   ` Johannes Ernst
  2 siblings, 0 replies; 10+ messages in thread
From: Duncan @ 2018-01-14 16:30 UTC (permalink / raw)
  To: linux-btrfs

Qu Wenruo posted on Sun, 14 Jan 2018 10:27:40 +0800 as excerpted:

> Despite that, does it really hang?
> Qgroup dramatically increases the overhead of deleting a subvolume or
> balancing the fs.
> Maybe it's just a little slow?

Same question about the "hang" here.

Note that btrfs is optimized to make snapshot creation fast, while 
snapshot deletion has to do more work to clean things up.  So even 
without qgroup enabled, deletion can take a bit of time (much longer than 
creation, which should be nearly instantaneous in human terms) if there's 
a lot of reflinks and the like to clean up.

And qgroups make btrfs do much more work to track that as well, so as Qu 
says, that'll make snapshot deletion take even longer, and you probably 
want it disabled unless you actually need the feature for something 
you're doing.
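The asymmetry shows up on the command line; deletion in particular is asynchronous, and `btrfs subvolume sync` is what actually waits for the background cleanup (dry-run sketch; the paths are placeholders):

```shell
#!/bin/sh
# Sketch: snapshot creation is near-instant, while deletion only *queues*
# the cleanup and the real work happens in the background. Paths are
# placeholders; run() is a dry-run stub so nothing is executed here.
run() { echo "would run: $*"; }
run btrfs subvolume snapshot /build/image /build/snap   # fast: extents are shared
run btrfs subvolume delete /build/snap                  # returns quickly
run btrfs subvolume sync /build                         # waits for the cleanup
```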

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Hanging after frequent use of systemd-nspawn --ephemeral
  2018-01-14  2:27 ` Qu Wenruo
  2018-01-14 16:30   ` Duncan
@ 2018-01-14 20:17   ` Johannes Ernst
  2018-01-15  0:48     ` Qu Wenruo
  2018-01-14 20:18   ` Johannes Ernst
  2 siblings, 1 reply; 10+ messages in thread
From: Johannes Ernst @ 2018-01-14 20:17 UTC (permalink / raw)
  To: linux-btrfs

> On Jan 13, 2018, at 18:27, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> On 2018年01月14日 09:36, Johannes Ernst wrote:
>> Summary: frequent “hangs” of the system with dmesg saying:
>> 
>> task systemd:22229 blocked for more than 120 seconds.
>> [ 2948.928653]       Tainted: G         C O    4.14.9-1-ARCH #1
>> [ 2948.928658] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [ 2948.928665] systemd         D    0 22229  22226 0x00000100
>> [ 2948.928671] Call Trace:
>> [ 2948.928684]  ? __schedule+0x290/0x890
>> [ 2948.928690]  schedule+0x2f/0x90
>> [ 2948.928744]  btrfs_tree_read_lock+0xb6/0x100 [btrfs]
>> [ 2948.928752]  ? wait_woken+0x80/0x80
>> [ 2948.928799]  find_parent_nodes+0x341/0xfd0 [btrfs]
>> [ 2948.928827]  ? btrfs_search_slot+0x84c/0x9f0 [btrfs]
>> [ 2948.928873]  ? btrfs_find_all_roots_safe+0x9e/0x110 [btrfs]
>> [ 2948.928912]  btrfs_find_all_roots_safe+0x9e/0x110 [btrfs]
>> [ 2948.928950]  btrfs_find_all_roots+0x45/0x60 [btrfs]
>> [ 2948.928987]  btrfs_qgroup_trace_extent_post+0x30/0x60 [btrfs]
> 
> You're using qgroup, and the hang happens while finding the old_roots of one
> extent, which only searches commit roots.
> 
> Normally this shouldn't cause any problem, especially for commit roots.
> 
> Is there any special operation happening?

Nope. It appears to happen right when systemd-nspawn begins to run, and I am not executing any other commands.

I did not realize qgroups were involved … all I did was run mkfs.btrfs and systemd-nspawn :-) Perhaps the default should be qgroups off? (But I digress)

>> That works well … but not for long. Often we don’t make it through the test suite: the starting of new containers hangs. Other filesystem operations also hang. The above stack trace, or something rather similar, shows up in dmesg (not in the journal, because that hangs, too!). This repeats, but I don’t see any other relevant messages. Only a reboot seems to allow recovery.
> 
> So Qgroup is used to limit disk usage of each container?
> 
> Maybe it's related to subvolume deletion?
> 
> Anyway, if qgroup is not necessary, disabling qgroup should fix your
> problem.
> 
> Despite that, does it really hang?
> Qgroup dramatically increases the overhead of deleting a subvolume or
> balancing the fs.
> Maybe it's just a little slow?

I have waited for several hours and the system has not recovered (I walked away from the running tests and returned hours later).

Update: so I executed “btrfs quota disable” on all relevant file systems (right?). Running tests again, this morning I’m getting:

INFO: task systemd-journal:20876 blocked for more than 120 seconds.
[ 5037.962603]       Tainted: G         C O    4.14.9-1-ARCH #1
[ 5037.962609] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5037.962616] systemd-journal D    0 20876  20860 0x00000100
[ 5037.962622] Call Trace:
[ 5037.962635]  ? __schedule+0x290/0x890
[ 5037.962640]  ? __slab_free+0x14e/0x300
[ 5037.962645]  ? _copy_to_iter+0x8f/0x3d0
[ 5037.962651]  schedule+0x2f/0x90
[ 5037.962704]  btrfs_tree_read_lock+0xb6/0x100 [btrfs]
[ 5037.962713]  ? wait_woken+0x80/0x80
[ 5037.962739]  btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
[ 5037.962767]  btrfs_search_slot+0x703/0x9f0 [btrfs]
[ 5037.962796]  btrfs_insert_empty_items+0x66/0xb0 [btrfs]
[ 5037.962841]  btrfs_insert_orphan_item+0x66/0xa0 [btrfs]
[ 5037.962880]  btrfs_orphan_add+0xa1/0x200 [btrfs]
[ 5037.962919]  btrfs_setattr+0x123/0x3b0 [btrfs]
[ 5037.962926]  notify_change+0x2fd/0x420
[ 5037.962933]  do_truncate+0x75/0xc0
[ 5037.962940]  do_sys_ftruncate.constprop.19+0xe7/0x100
[ 5037.962947]  do_syscall_64+0x55/0x110
[ 5037.962952]  entry_SYSCALL64_slow_path+0x25/0x25
[ 5037.962956] RIP: 0033:0x7fd9423697ba
[ 5037.962958] RSP: 002b:00007fff1179cc18 EFLAGS: 00000206 ORIG_RAX: 000000000000004d
[ 5037.962962] RAX: ffffffffffffffda RBX: 000055bd48cbe9f0 RCX: 00007fd9423697ba
[ 5037.962965] RDX: 000055bd48cbe660 RSI: 0000000000640000 RDI: 000000000000000f
[ 5037.962966] RBP: 00007fff1179cc50 R08: 000055bd48cbc62c R09: 000055bd48cbea6c
[ 5037.962968] R10: 000055bd48cbe9f0 R11: 0000000000000206 R12: 00007fff1179cc48
[ 5037.962970] R13: 0000000000000003 R14: 000562b7234e71a9 R15: 000055bd47749ca0

and doing a simple “touch /build/tmp/foo” never returns. 10+ hours have passed since the previous command was issued.
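A bounded probe can help distinguish a hard hang from mere slowness (a sketch; TESTDIR is a placeholder and defaults to /tmp so the command is harmless):

```shell
#!/bin/sh
# Sketch: bound a filesystem operation so "hung" vs. "merely slow" becomes
# measurable. TESTDIR is a placeholder; /tmp keeps the probe harmless.
TESTDIR="${TESTDIR:-/tmp}"
if timeout 30 touch "$TESTDIR/btrfs-probe"; then
    echo "fs responsive"
else
    echo "fs blocked or very slow"   # touch did not return within 30 seconds
fi
rm -f "$TESTDIR/btrfs-probe"
```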



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Hanging after frequent use of systemd-nspawn --ephemeral
  2018-01-14  2:27 ` Qu Wenruo
  2018-01-14 16:30   ` Duncan
  2018-01-14 20:17   ` Johannes Ernst
@ 2018-01-14 20:18   ` Johannes Ernst
  2 siblings, 0 replies; 10+ messages in thread
From: Johannes Ernst @ 2018-01-14 20:18 UTC (permalink / raw)
  To: linux-btrfs

> On Jan 13, 2018, at 18:27, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> On 2018年01月14日 09:36, Johannes Ernst wrote:
>> Summary: frequent “hangs” of the system with dmesg saying:
>> 
>> task systemd:22229 blocked for more than 120 seconds.
>> [ 2948.928653]       Tainted: G         C O    4.14.9-1-ARCH #1
>> [ 2948.928658] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [ 2948.928665] systemd         D    0 22229  22226 0x00000100
>> [ 2948.928671] Call Trace:
>> [ 2948.928684]  ? __schedule+0x290/0x890
>> [ 2948.928690]  schedule+0x2f/0x90
>> [ 2948.928744]  btrfs_tree_read_lock+0xb6/0x100 [btrfs]
>> [ 2948.928752]  ? wait_woken+0x80/0x80
>> [ 2948.928799]  find_parent_nodes+0x341/0xfd0 [btrfs]
>> [ 2948.928827]  ? btrfs_search_slot+0x84c/0x9f0 [btrfs]
>> [ 2948.928873]  ? btrfs_find_all_roots_safe+0x9e/0x110 [btrfs]
>> [ 2948.928912]  btrfs_find_all_roots_safe+0x9e/0x110 [btrfs]
>> [ 2948.928950]  btrfs_find_all_roots+0x45/0x60 [btrfs]
>> [ 2948.928987]  btrfs_qgroup_trace_extent_post+0x30/0x60 [btrfs]
> 
> You're using qgroup, and the hang happens while finding the old_roots of one
> extent, which only searches commit roots.
> 
> Normally this shouldn't cause any problem, especially for commit roots.
> 
> Is there any special operation happening?

Nope. It appears to happen right when systemd-nspawn begins to run, and I am not executing any other commands.

I did not realize qgroups were involved … all I did was run mkfs.btrfs and systemd-nspawn :-) Perhaps the default should be qgroups off? (But I digress)

>> That works well … but not for long. Often we don’t make it through the test suite: the starting of new containers hangs. Other filesystem operations also hang. The above stack trace, or something rather similar, shows up in dmesg (not in the journal, because that hangs, too!). This repeats, but I don’t see any other relevant messages. Only a reboot seems to allow recovery.
> 
> So Qgroup is used to limit disk usage of each container?
> 
> Maybe it's related to subvolume deletion?
> 
> Anyway, if qgroup is not necessary, disabling qgroup should fix your
> problem.
> 
> Despite that, does it really hang?
> Qgroup dramatically increases the overhead of deleting a subvolume or
> balancing the fs.
> Maybe it's just a little slow?

I have waited for several hours and the system has not recovered (I walked away from the running tests and returned hours later).

Update: so I executed “btrfs quota disable” on all relevant file systems (right?). Running tests again, this morning I’m getting:

INFO: task systemd-journal:20876 blocked for more than 120 seconds.
[ 5037.962603]       Tainted: G         C O    4.14.9-1-ARCH #1
[ 5037.962609] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5037.962616] systemd-journal D    0 20876  20860 0x00000100
[ 5037.962622] Call Trace:
[ 5037.962635]  ? __schedule+0x290/0x890
[ 5037.962640]  ? __slab_free+0x14e/0x300
[ 5037.962645]  ? _copy_to_iter+0x8f/0x3d0
[ 5037.962651]  schedule+0x2f/0x90
[ 5037.962704]  btrfs_tree_read_lock+0xb6/0x100 [btrfs]
[ 5037.962713]  ? wait_woken+0x80/0x80
[ 5037.962739]  btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
[ 5037.962767]  btrfs_search_slot+0x703/0x9f0 [btrfs]
[ 5037.962796]  btrfs_insert_empty_items+0x66/0xb0 [btrfs]
[ 5037.962841]  btrfs_insert_orphan_item+0x66/0xa0 [btrfs]
[ 5037.962880]  btrfs_orphan_add+0xa1/0x200 [btrfs]
[ 5037.962919]  btrfs_setattr+0x123/0x3b0 [btrfs]
[ 5037.962926]  notify_change+0x2fd/0x420
[ 5037.962933]  do_truncate+0x75/0xc0
[ 5037.962940]  do_sys_ftruncate.constprop.19+0xe7/0x100
[ 5037.962947]  do_syscall_64+0x55/0x110
[ 5037.962952]  entry_SYSCALL64_slow_path+0x25/0x25
[ 5037.962956] RIP: 0033:0x7fd9423697ba
[ 5037.962958] RSP: 002b:00007fff1179cc18 EFLAGS: 00000206 ORIG_RAX: 000000000000004d
[ 5037.962962] RAX: ffffffffffffffda RBX: 000055bd48cbe9f0 RCX: 00007fd9423697ba
[ 5037.962965] RDX: 000055bd48cbe660 RSI: 0000000000640000 RDI: 000000000000000f
[ 5037.962966] RBP: 00007fff1179cc50 R08: 000055bd48cbc62c R09: 000055bd48cbea6c
[ 5037.962968] R10: 000055bd48cbe9f0 R11: 0000000000000206 R12: 00007fff1179cc48
[ 5037.962970] R13: 0000000000000003 R14: 000562b7234e71a9 R15: 000055bd47749ca0

and doing a simple “touch /build/tmp/foo” never returns. 10+ hours have passed since the previous command was issued.



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Hanging after frequent use of systemd-nspawn --ephemeral
  2018-01-14 20:17   ` Johannes Ernst
@ 2018-01-15  0:48     ` Qu Wenruo
  2018-01-15  0:58       ` Johannes Ernst
  0 siblings, 1 reply; 10+ messages in thread
From: Qu Wenruo @ 2018-01-15  0:48 UTC (permalink / raw)
  To: Johannes Ernst, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 5160 bytes --]



On 2018年01月15日 04:17, Johannes Ernst wrote:
>> On Jan 13, 2018, at 18:27, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>> On 2018年01月14日 09:36, Johannes Ernst wrote:
>>> Summary: frequent “hangs” of the system with dmesg saying:
>>>
>>> task systemd:22229 blocked for more than 120 seconds.
>>> [ 2948.928653]       Tainted: G         C O    4.14.9-1-ARCH #1
>>> [ 2948.928658] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> [ 2948.928665] systemd         D    0 22229  22226 0x00000100
>>> [ 2948.928671] Call Trace:
>>> [ 2948.928684]  ? __schedule+0x290/0x890
>>> [ 2948.928690]  schedule+0x2f/0x90
>>> [ 2948.928744]  btrfs_tree_read_lock+0xb6/0x100 [btrfs]
>>> [ 2948.928752]  ? wait_woken+0x80/0x80
>>> [ 2948.928799]  find_parent_nodes+0x341/0xfd0 [btrfs]
>>> [ 2948.928827]  ? btrfs_search_slot+0x84c/0x9f0 [btrfs]
>>> [ 2948.928873]  ? btrfs_find_all_roots_safe+0x9e/0x110 [btrfs]
>>> [ 2948.928912]  btrfs_find_all_roots_safe+0x9e/0x110 [btrfs]
>>> [ 2948.928950]  btrfs_find_all_roots+0x45/0x60 [btrfs]
>>> [ 2948.928987]  btrfs_qgroup_trace_extent_post+0x30/0x60 [btrfs]
>>
>> You're using qgroup, and the hang happens while finding the old_roots of one
>> extent, which only searches commit roots.
>>
>> Normally this shouldn't cause any problem, especially for commit roots.
>>
>> Is there any special operation happening?
> 
> Nope. It appears to happen right when systemd-nspawn begins to run, and I am not executing any other commands.
> 
> I did not realize qgroups were involved … all I did was run mkfs.btrfs and systemd-nspawn :-) Perhaps the default should be qgroups off? (But I digress)
> 
>>> That works well … but not for long. Often we don’t make it through the test suite: the starting of new containers hangs. Other filesystem operations also hang. The above stack trace, or something rather similar, shows up in dmesg (not in the journal, because that hangs, too!). This repeats, but I don’t see any other relevant messages. Only a reboot seems to allow recovery.
>>
>> So Qgroup is used to limit disk usage of each container?
>>
>> Maybe it's related to subvolume deletion?
>>
>> Anyway, if qgroup is not necessary, disabling qgroup should fix your
>> problem.
>>
>> Despite that, does it really hang?
>> Qgroup dramatically increases the overhead of deleting a subvolume or
>> balancing the fs.
>> Maybe it's just a little slow?
> 
> I have waited for several hours and the system has not recovered (I walked away from the running tests and returned hours later).
> 
> Update: so I executed “btrfs quota disable” on all relevant file systems (right?). Running tests again, this morning I’m getting:

So it's not a quota bug. (At least I have one less thing to worry about.)

> 
> INFO: task systemd-journal:20876 blocked for more than 120 seconds.
> [ 5037.962603]       Tainted: G         C O    4.14.9-1-ARCH #1
> [ 5037.962609] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 5037.962616] systemd-journal D    0 20876  20860 0x00000100
> [ 5037.962622] Call Trace:
> [ 5037.962635]  ? __schedule+0x290/0x890
> [ 5037.962640]  ? __slab_free+0x14e/0x300
> [ 5037.962645]  ? _copy_to_iter+0x8f/0x3d0
> [ 5037.962651]  schedule+0x2f/0x90
> [ 5037.962704]  btrfs_tree_read_lock+0xb6/0x100 [btrfs]

Still, some tree write is blocking all incoming reads.

BTW, did your tests include any send and relocation operations?

IIRC there is a bug where send and relocation (I'm not sure though)
can cause a similar problem.

Thanks,
Qu

> [ 5037.962713]  ? wait_woken+0x80/0x80
> [ 5037.962739]  btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
> [ 5037.962767]  btrfs_search_slot+0x703/0x9f0 [btrfs]
> [ 5037.962796]  btrfs_insert_empty_items+0x66/0xb0 [btrfs]
> [ 5037.962841]  btrfs_insert_orphan_item+0x66/0xa0 [btrfs]
> [ 5037.962880]  btrfs_orphan_add+0xa1/0x200 [btrfs]
> [ 5037.962919]  btrfs_setattr+0x123/0x3b0 [btrfs]
> [ 5037.962926]  notify_change+0x2fd/0x420
> [ 5037.962933]  do_truncate+0x75/0xc0
> [ 5037.962940]  do_sys_ftruncate.constprop.19+0xe7/0x100
> [ 5037.962947]  do_syscall_64+0x55/0x110
> [ 5037.962952]  entry_SYSCALL64_slow_path+0x25/0x25
> [ 5037.962956] RIP: 0033:0x7fd9423697ba
> [ 5037.962958] RSP: 002b:00007fff1179cc18 EFLAGS: 00000206 ORIG_RAX: 000000000000004d
> [ 5037.962962] RAX: ffffffffffffffda RBX: 000055bd48cbe9f0 RCX: 00007fd9423697ba
> [ 5037.962965] RDX: 000055bd48cbe660 RSI: 0000000000640000 RDI: 000000000000000f
> [ 5037.962966] RBP: 00007fff1179cc50 R08: 000055bd48cbc62c R09: 000055bd48cbea6c
> [ 5037.962968] R10: 000055bd48cbe9f0 R11: 0000000000000206 R12: 00007fff1179cc48
> [ 5037.962970] R13: 0000000000000003 R14: 000562b7234e71a9 R15: 000055bd47749ca0
> 
> and doing a simple “touch /build/tmp/foo” never returns. 10+ hours have passed since the previous command was issued.
> 
> 
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 520 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Hanging after frequent use of systemd-nspawn --ephemeral
  2018-01-15  0:48     ` Qu Wenruo
@ 2018-01-15  0:58       ` Johannes Ernst
  2018-01-15  1:34         ` Qu Wenruo
  0 siblings, 1 reply; 10+ messages in thread
From: Johannes Ernst @ 2018-01-15  0:58 UTC (permalink / raw)
  To: linux-btrfs

>> INFO: task systemd-journal:20876 blocked for more than 120 seconds.
>> [ 5037.962603]       Tainted: G         C O    4.14.9-1-ARCH #1
>> [ 5037.962609] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [ 5037.962616] systemd-journal D    0 20876  20860 0x00000100
>> [ 5037.962622] Call Trace:
>> [ 5037.962635]  ? __schedule+0x290/0x890
>> [ 5037.962640]  ? __slab_free+0x14e/0x300
>> [ 5037.962645]  ? _copy_to_iter+0x8f/0x3d0
>> [ 5037.962651]  schedule+0x2f/0x90
>> [ 5037.962704]  btrfs_tree_read_lock+0xb6/0x100 [btrfs]
> 
> Still, some tree write is blocking all incoming reads.
> 
> BTW, did your tests include any send and relocation operations?
> 
> IIRC there is a bug where send and relocation (I'm not sure though)
> can cause a similar problem.

No. I’m doing nothing btrfs-specific at all; it’s just whatever systemd-nspawn --ephemeral does.

Cheers,



Johannes.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Hanging after frequent use of systemd-nspawn --ephemeral
  2018-01-15  0:58       ` Johannes Ernst
@ 2018-01-15  1:34         ` Qu Wenruo
  2018-01-15  1:47           ` Johannes Ernst
  0 siblings, 1 reply; 10+ messages in thread
From: Qu Wenruo @ 2018-01-15  1:34 UTC (permalink / raw)
  To: Johannes Ernst, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 1365 bytes --]



On 2018年01月15日 08:58, Johannes Ernst wrote:
>>> INFO: task systemd-journal:20876 blocked for more than 120 seconds.
>>> [ 5037.962603]       Tainted: G         C O    4.14.9-1-ARCH #1
>>> [ 5037.962609] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> [ 5037.962616] systemd-journal D    0 20876  20860 0x00000100
>>> [ 5037.962622] Call Trace:
>>> [ 5037.962635]  ? __schedule+0x290/0x890
>>> [ 5037.962640]  ? __slab_free+0x14e/0x300
>>> [ 5037.962645]  ? _copy_to_iter+0x8f/0x3d0
>>> [ 5037.962651]  schedule+0x2f/0x90
>>> [ 5037.962704]  btrfs_tree_read_lock+0xb6/0x100 [btrfs]
>>
>> Still, some tree write is blocking all incoming reads.
>>
>> BTW, did your tests include any send and relocation operations?
>>
>> IIRC there is a bug where send and relocation (I'm not sure though)
>> can cause a similar problem.
> 
> No. I’m doing nothing btrfs-specific at all; it’s just whatever systemd-nspawn --ephemeral does.

So, only systemd-nspawn --ephemeral, and nothing else at all?

Did nothing happen inside the container, either?

Thanks,
Qu

> 
> Cheers,
> 
> 
> 
> Johannes.
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 520 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Hanging after frequent use of systemd-nspawn --ephemeral
  2018-01-15  1:34         ` Qu Wenruo
@ 2018-01-15  1:47           ` Johannes Ernst
  2018-01-15  1:55             ` Qu Wenruo
  0 siblings, 1 reply; 10+ messages in thread
From: Johannes Ernst @ 2018-01-15  1:47 UTC (permalink / raw)
  To: linux-btrfs

>>>> INFO: task systemd-journal:20876 blocked for more than 120 seconds.
>>>> [ 5037.962603]       Tainted: G         C O    4.14.9-1-ARCH #1
>>>> [ 5037.962609] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>> [ 5037.962616] systemd-journal D    0 20876  20860 0x00000100
>>>> [ 5037.962622] Call Trace:
>>>> [ 5037.962635]  ? __schedule+0x290/0x890
>>>> [ 5037.962640]  ? __slab_free+0x14e/0x300
>>>> [ 5037.962645]  ? _copy_to_iter+0x8f/0x3d0
>>>> [ 5037.962651]  schedule+0x2f/0x90
>>>> [ 5037.962704]  btrfs_tree_read_lock+0xb6/0x100 [btrfs]
>>> 
>>> Still, some tree write is blocking all incoming reads.
>>> 
>>> BTW, did your tests include any send and relocation operations?
>>> 
>>> IIRC there is a bug where send and relocation (I'm not sure though)
>>> can cause a similar problem.
>> 
>> No. I’m doing nothing btrfs-specific at all; it’s just whatever systemd-nspawn --ephemeral does.
> 
> So, only systemd-nspawn --ephemeral, and nothing else at all?
> 
> Did nothing happen inside the container, either?


Gotcha. (Sorry for missing that)

Inside the container, snapper might be invoked. Here are the snippets:

snapper -c root create-config -t ubos-default /
snapper setup-quota
snapper -c root create --type pre --print-number
snapper -c root create --type post --pre-number <n>

Here is the snapper config:

# Snapper default configuration for UBOS.
#
# subvolume to snapshot
SUBVOLUME="/"

# filesystem type
FSTYPE="btrfs"

# btrfs qgroup for space aware cleanup algorithms
QGROUP=""

# fraction of the filesystems space the snapshots may use
SPACE_LIMIT="0.2"

# users and groups allowed to work with config
ALLOW_USERS=""
ALLOW_GROUPS=""

# sync users and groups from ALLOW_USERS and ALLOW_GROUPS to .snapshots
# directory
SYNC_ACL="no"

# start comparing pre- and post-snapshot in background after creating
# post-snapshot
BACKGROUND_COMPARISON="no"

# run daily number cleanup
NUMBER_CLEANUP="yes"

# limit for number cleanup; unit is seconds
NUMBER_MIN_AGE="1800"
NUMBER_LIMIT="50"
NUMBER_LIMIT_IMPORTANT="10"

# create hourly snapshots
TIMELINE_CREATE="no"

# cleanup hourly snapshots after some time
TIMELINE_CLEANUP="yes"

# limits for timeline cleanup; unit is seconds
TIMELINE_MIN_AGE="1800"
TIMELINE_LIMIT_HOURLY="12"
TIMELINE_LIMIT_DAILY="7"
TIMELINE_LIMIT_WEEKLY="0"
TIMELINE_LIMIT_MONTHLY="6"
TIMELINE_LIMIT_YEARLY="0"

# cleanup empty pre-post-pairs
EMPTY_PRE_POST_CLEANUP="yes"

# limits for empty pre-post-pair cleanup
EMPTY_PRE_POST_MIN_AGE="1800"






^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Hanging after frequent use of systemd-nspawn --ephemeral
  2018-01-15  1:47           ` Johannes Ernst
@ 2018-01-15  1:55             ` Qu Wenruo
  0 siblings, 0 replies; 10+ messages in thread
From: Qu Wenruo @ 2018-01-15  1:55 UTC (permalink / raw)
  To: Johannes Ernst, linux-btrfs





On 2018-01-15 09:47, Johannes Ernst wrote:
>>>>> INFO: task systemd-journal:20876 blocked for more than 120 seconds.
>>>>> [ 5037.962603]       Tainted: G         C O    4.14.9-1-ARCH #1
>>>>> [ 5037.962609] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>> [ 5037.962616] systemd-journal D    0 20876  20860 0x00000100
>>>>> [ 5037.962622] Call Trace:
>>>>> [ 5037.962635]  ? __schedule+0x290/0x890
>>>>> [ 5037.962640]  ? __slab_free+0x14e/0x300
>>>>> [ 5037.962645]  ? _copy_to_iter+0x8f/0x3d0
>>>>> [ 5037.962651]  schedule+0x2f/0x90
>>>>> [ 5037.962704]  btrfs_tree_read_lock+0xb6/0x100 [btrfs]
>>>>
>>>> Still some tree write is blocking all incoming read.
>>>>
>>>> BTW, did your tests include any send and relocation operations?
>>>>
>>>> IIRC there is a bug where send and relocation (I'm not sure though)
>>>> can cause a similar problem.
>>>
>>> No. I’m doing nothing btrfs-specific at all; it’s just whatever systemd-nspawn --ephemeral does.
>>
>> So, only systemd-nspawn --ephemeral, and nothing else at all?
>>
>> Did nothing happen inside the container either?
> 
> 
> Gotcha. (Sorry for missing that)
> 
> Inside the container, snapper might be invoked. Here are the snippets:
> 
> snapper -c root create-config -t ubos-default /
> snapper setup-quota
> snapper -c root create --type pre --print-number
> snapper -c root create --type post --pre-number <n>
> 
> Here is the snapper config:
> 
> # Snapper default configuration for UBOS.
> #
> # subvolume to snapshot
> SUBVOLUME="/"
> 
> # filesystem type
> FSTYPE="btrfs"
> 
> # btrfs qgroup for space aware cleanup algorithms
> QGROUP=""

What about disabling qgroup here?

And since I'm not familiar with snapper, I'm not quite sure what it
will do.

But what about disabling snapper and trying again, to see if btrfs still hangs?
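
To rule qgroups in or out concretely, a sketch along these lines could be used (an illustrative helper, not an existing tool; it assumes root and a btrfs filesystem mounted at the given path). The call trace in this thread goes through btrfs_qgroup_trace_extent_post, so turning quotas off should change the behavior if qgroups are involved:

```shell
#!/bin/bash
# Disable qgroup accounting on a btrfs filesystem and verify it is off.
disable_qgroups() {
    local mnt="${1:-/}"
    btrfs quota disable "$mnt" || return 1
    # "btrfs qgroup show" fails once quotas are disabled
    if btrfs qgroup show "$mnt" >/dev/null 2>&1; then
        echo "qgroups still enabled on $mnt"
        return 1
    fi
    echo "qgroups disabled on $mnt"
}
```

Note that, if I understand it correctly, the `snapper setup-quota` step above is what enables qgroups in the first place, so that step would need to be skipped too.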

Thanks,
Qu

> 
> # fraction of the filesystems space the snapshots may use
> SPACE_LIMIT="0.2"
> 
> # users and groups allowed to work with config
> ALLOW_USERS=""
> ALLOW_GROUPS=""
> 
> # sync users and groups from ALLOW_USERS and ALLOW_GROUPS to .snapshots
> # directory
> SYNC_ACL="no"
> 
> # start comparing pre- and post-snapshot in background after creating
> # post-snapshot
> BACKGROUND_COMPARISON="no"
> 
> # run daily number cleanup
> NUMBER_CLEANUP="yes"
> 
> # limit for number cleanup; unit is seconds
> NUMBER_MIN_AGE="1800"
> NUMBER_LIMIT="50"
> NUMBER_LIMIT_IMPORTANT="10"
> 
> # create hourly snapshots
> TIMELINE_CREATE="no"
> 
> # cleanup hourly snapshots after some time
> TIMELINE_CLEANUP="yes"
> 
> # limits for timeline cleanup; unit is seconds
> TIMELINE_MIN_AGE="1800"
> TIMELINE_LIMIT_HOURLY="12"
> TIMELINE_LIMIT_DAILY="7"
> TIMELINE_LIMIT_WEEKLY="0"
> TIMELINE_LIMIT_MONTHLY="6"
> TIMELINE_LIMIT_YEARLY="0"
> 
> # cleanup empty pre-post-pairs
> EMPTY_PRE_POST_CLEANUP="yes"
> 
> # limits for empty pre-post-pair cleanup
> EMPTY_PRE_POST_MIN_AGE="1800"
> 



^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2018-01-15  1:55 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-14  1:36 Hanging after frequent use of systemd-nspawn --ephemeral Johannes Ernst
2018-01-14  2:27 ` Qu Wenruo
2018-01-14 16:30   ` Duncan
2018-01-14 20:17   ` Johannes Ernst
2018-01-15  0:48     ` Qu Wenruo
2018-01-15  0:58       ` Johannes Ernst
2018-01-15  1:34         ` Qu Wenruo
2018-01-15  1:47           ` Johannes Ernst
2018-01-15  1:55             ` Qu Wenruo
2018-01-14 20:18   ` Johannes Ernst
