* Indefinite hang in reserve_metadata_bytes on kernel 3.18.3
@ 2015-01-29  1:44 Steven Schlansker
  2015-01-29 10:08 ` Holger Hoffstätte
       [not found] ` <54CA06A6.3070108@googlemail.com>
  0 siblings, 2 replies; 4+ messages in thread
From: Steven Schlansker @ 2015-01-29  1:44 UTC (permalink / raw)
  To: linux-btrfs

[ Please CC me on responses, I am not subscribed to the list ]

Hello linux-btrfs,

I am running a cluster of Docker containers managed by Apache Mesos.  Until recently, we'd found btrfs to be the most reliable storage backend for Docker.  But now we are having trouble: large numbers of our slave nodes go offline due to processes hanging indefinitely inside of btrfs.

We were initially running Ubuntu kernel 3.13.0-44, but had serious troubles, so I moved to 3.18.3 to preemptively address the "You should run a recent vanilla kernel" response I expected from this mailing list :)

The symptoms are an endlessly increasing stream of hung tasks and high load average.  The first process (a Mesos slave task) to hang is stuck here, according to /proc/*/stack:

[<ffffffffa004333a>] reserve_metadata_bytes+0xca/0x4c0 [btrfs]
[<ffffffffa00443c9>] btrfs_delalloc_reserve_metadata+0x149/0x490 [btrfs]
[<ffffffffa006dbe2>] __btrfs_buffered_write+0x162/0x590 [btrfs]
[<ffffffffa006e297>] btrfs_file_write_iter+0x287/0x4e0 [btrfs]
[<ffffffff811dc6f1>] new_sync_write+0x81/0xb0
[<ffffffff811dd017>] vfs_write+0xb7/0x1f0
[<ffffffff811dda96>] SyS_write+0x46/0xb0
[<ffffffff8187842d>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
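
A sketch of how stacks like these can be collected across all tasks at once, by scanning for processes in uninterruptible (D) sleep; reading /proc/<pid>/stack typically requires root, so unreadable stacks simply come back empty:

```shell
# List tasks stuck in uninterruptible sleep (D state) and dump their
# kernel stacks. Vanished or unreadable entries are silently skipped.
for pid in /proc/[0-9]*; do
    state=$(awk '/^State:/ {print $2}' "$pid/status" 2>/dev/null)
    if [ "$state" = "D" ]; then
        echo "=== ${pid##*/}: $(cat "$pid/comm" 2>/dev/null) ==="
        cat "$pid/stack" 2>/dev/null
    fi
done
```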

A further attempt to access btrfs ( ls -Rl /mnt ) hangs here:

[<ffffffffa004333a>] reserve_metadata_bytes+0xca/0x4c0 [btrfs]
[<ffffffffa0043cc0>] btrfs_block_rsv_add+0x30/0x60 [btrfs]
[<ffffffffa005c1ba>] start_transaction+0x45a/0x5a0 [btrfs]
[<ffffffffa005c31b>] btrfs_start_transaction+0x1b/0x20 [btrfs]
[<ffffffffa0061f88>] btrfs_dirty_inode+0xb8/0xe0 [btrfs]
[<ffffffffa0062014>] btrfs_update_time+0x64/0xd0 [btrfs]
[<ffffffff811f7c65>] update_time+0x25/0xc0
[<ffffffff811f7dfa>] touch_atime+0xfa/0x140
[<ffffffff811e2621>] SyS_readlink+0xd1/0x130
[<ffffffff8187842d>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

The only other processes with stack traces in btrfs code are the cleaner and transaction kthreads, sleeping here:

[<ffffffffa00536c5>] cleaner_kthread+0x165/0x190 [btrfs]
[<ffffffff8108d6b2>] kthread+0xd2/0xf0
[<ffffffff8187837c>] ret_from_fork+0x7c/0xb0
[<ffffffffffffffff>] 0xffffffffffffffff

[<ffffffffa0057139>] transaction_kthread+0x1f9/0x240 [btrfs]
[<ffffffff8108d6b2>] kthread+0xd2/0xf0
[<ffffffff8187837c>] ret_from_fork+0x7c/0xb0
[<ffffffffffffffff>] 0xffffffffffffffff

Here's the information asked for on the mailing list page:

Linux ip-10-70-6-163 3.18.3 #4 SMP Tue Jan 27 20:14:45 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

root@ip-10-70-6-163:/proc/1086# btrfs fi show
Label: none  uuid: 5c10d6f5-6207-41fd-8756-6399fab731f5
	Total devices 2 FS bytes used 14.63GiB
	devid    1 size 74.99GiB used 9.01GiB path /dev/xvdc
	devid    2 size 74.99GiB used 9.01GiB path /dev/xvdd

Btrfs v3.12

root@ip-10-70-6-163:/proc/1086# btrfs fi df /mnt
Data, RAID0: total=16.00GiB, used=13.28GiB
System, RAID0: total=16.00MiB, used=16.00KiB
Metadata, RAID0: total=2.00GiB, used=1.35GiB
unknown, single: total=336.00MiB, used=0.00


What can I do to further diagnose this problem?  How do I keep my cluster from falling down around me in many tiny pieces?
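
One standard way to gather more data on hung tasks like these is SysRq-w, which makes the kernel log the stacks of every blocked (D state) task. A sketch, assuming root and that sysrq is enabled:

```shell
# Ask the kernel to dump all blocked tasks to the log, then read it back.
# Harmless no-op message if we lack the needed privileges.
if [ -w /proc/sysrq-trigger ]; then
    echo w > /proc/sysrq-trigger
    dmesg | tail -n 50
else
    echo "need root (and kernel.sysrq enabled) to write /proc/sysrq-trigger" >&2
fi
```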

Thanks,
Steven Schlansker

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Indefinite hang in reserve_metadata_bytes on kernel 3.18.3
  2015-01-29  1:44 Indefinite hang in reserve_metadata_bytes on kernel 3.18.3 Steven Schlansker
@ 2015-01-29 10:08 ` Holger Hoffstätte
       [not found] ` <54CA06A6.3070108@googlemail.com>
  1 sibling, 0 replies; 4+ messages in thread
From: Holger Hoffstätte @ 2015-01-29 10:08 UTC (permalink / raw)
  To: linux-btrfs

On Thu, 29 Jan 2015 01:44:02 +0000, Steven Schlansker wrote:

[..snip..]

> The symptoms are an endlessly increasing stream of hung tasks and high

Please try 3.18.5 (-rc1 is good) which contains the following fix:

"workqueue: fix subtle pool management issue which can stall whole
 worker_pool"

see: http://article.gmane.org/gmane.linux.kernel.stable/122074

No promises, but it seems sufficiently causally related.
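
To confirm that a given kernel actually contains the fix, the tree's changelog can be searched for the commit subject. A sketch, assuming a kernel git checkout (the KSRC default path is an assumption):

```shell
# Search the kernel changelog for the workqueue fix by subject line.
KSRC=${KSRC:-/usr/src/linux}
if [ -d "$KSRC/.git" ]; then
    git -C "$KSRC" log --oneline \
        --grep='workqueue: fix subtle pool management issue'
else
    echo "no kernel git checkout at $KSRC" >&2
fi
```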

-h



* Re: Indefinite hang in reserve_metadata_bytes on kernel 3.18.3
       [not found] ` <54CA06A6.3070108@googlemail.com>
@ 2015-01-29 22:50   ` Steven Schlansker
  2015-01-30 10:02     ` Holger Hoffstätte
  0 siblings, 1 reply; 4+ messages in thread
From: Steven Schlansker @ 2015-01-29 22:50 UTC (permalink / raw)
  To: linux-btrfs, Holger Hoffstätte

Hi Holger,

On Jan 29, 2015, at 2:08 AM, Holger Hoffstätte <holger.hoffstaette@googlemail.com> wrote:

> [This mail was also posted to gmane.comp.file-systems.btrfs.]
> 
> On Thu, 29 Jan 2015 01:44:02 +0000, Steven Schlansker wrote:
> 
> [..snip..]
> 
>> The symptoms are an endlessly increasing stream of hung tasks and high
> 
> Please try 3.18.5 (-rc1 is good) which contains the following fix:
> 
> "workqueue: fix subtle pool management issue which can stall whole
> worker_pool"
> 
> see: http://article.gmane.org/gmane.linux.kernel.stable/122074
> 
> No promises but it seems sufficiently causally related..

Thank you for the suggestion.  I did not find 3.18.5 available in any form other than a large number of *.patch files, so I went for 3.19-rc6 instead (which I verified contains this commit).

Now I am getting:

[ 1224.728313] ------------[ cut here ]------------
[ 1224.728323] kernel BUG at fs/btrfs/extent-tree.c:7362!
[ 1224.728327] invalid opcode: 0000 [#1] SMP 
[ 1224.728331] Modules linked in: dm_multipath(E) scsi_dh(E) x86_pkg_temp_thermal(E) coretemp(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) ablk_helper(E) cryptd(E) btrfs(E)
[ 1224.728347] CPU: 3 PID: 18072 Comm: gunicorn Tainted: G            E  3.19.0-rc6 #2
[ 1224.728351] task: ffff8803eafeb1c0 ti: ffff8803e8380000 task.ti: ffff8803e8380000
[ 1224.728354] RIP: e030:[<ffffffffa00210d3>]  [<ffffffffa00210d3>] btrfs_alloc_tree_block+0x3c3/0x3d0 [btrfs]
[ 1224.728372] RSP: e02b:ffff8803e83838b8  EFLAGS: 00010202
[ 1224.728375] RAX: fffffffffffffff4 RBX: fffffffffffffff4 RCX: 00000000000032ca
[ 1224.728377] RDX: 00000000000032c9 RSI: ffff8802660c6b90 RDI: ffff8807543ace00
[ 1224.728380] RBP: ffff8803e8383958 R08: 000000000001e560 R09: ffff88075a2de560
[ 1224.728383] R10: ffffffffa004fe02 R11: ffffea0009983180 R12: ffff88073dfaf8f0
[ 1224.728386] R13: ffff8807557fe170 R14: 0000000000000000 R15: ffff8800fe05d000
[ 1224.728393] FS:  00007fc8e0fc7740(0000) GS:ffff88075a2c0000(0000) knlGS:0000000000000000
[ 1224.728396] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1224.728399] CR2: 00007fc8e0fcd000 CR3: 00000003e8695000 CR4: 0000000000002660
[ 1224.728402] Stack:
[ 1224.728404]  0000000000000000 0000000000000000 ffff8803e83838d8 ffffffffa002b5be
[ 1224.728409]  ffff8803e83839a7 0000000000000061 ffff8807557fe000 0000000000000000
[ 1224.728413]  00000000000001d3 0000400000000000 ffff8802660c6a68 00ffffffa005782c
[ 1224.728417] Call Trace:
[ 1224.728430]  [<ffffffffa002b5be>] ? btree_set_page_dirty+0xe/0x10 [btrfs]
[ 1224.728442]  [<ffffffffa000b46c>] __btrfs_cow_block+0x11c/0x530 [btrfs]
[ 1224.728454]  [<ffffffffa000ba3c>] btrfs_cow_block+0x12c/0x1d0 [btrfs]
[ 1224.728465]  [<ffffffffa000f673>] btrfs_search_slot+0x1f3/0xa40 [btrfs]
[ 1224.728472]  [<ffffffff81003610>] ? xen_write_msr_safe+0x40/0x70
[ 1224.728477]  [<ffffffff8101260f>] ? __switch_to+0x15f/0x580
[ 1224.728482]  [<ffffffff811fa186>] ? inode_init_always+0x106/0x1e0
[ 1224.728490]  [<ffffffffa0011700>] btrfs_insert_empty_items+0x70/0xc0 [btrfs]
[ 1224.728494]  [<ffffffff811fc5a2>] ? insert_inode_locked4+0xe2/0x190
[ 1224.728506]  [<ffffffffa00413ff>] btrfs_new_inode+0x1bf/0x540 [btrfs]
[ 1224.728517]  [<ffffffffa0042bcf>] btrfs_create+0xdf/0x210 [btrfs]
[ 1224.728521]  [<ffffffff811ebf75>] vfs_create+0xd5/0x140
[ 1224.728524]  [<ffffffff811efaa3>] do_last+0x1013/0x1210
[ 1224.728528]  [<ffffffff811ecff1>] ? path_init+0xc1/0x470
[ 1224.728532]  [<ffffffff811efd24>] path_openat+0x84/0x630
[ 1224.728536]  [<ffffffff811e29f5>] ? __sb_end_write+0x35/0x70
[ 1224.728540]  [<ffffffff811f146a>] do_filp_open+0x3a/0x90
[ 1224.728544]  [<ffffffff811fe007>] ? __alloc_fd+0xa7/0x130
[ 1224.728548]  [<ffffffff811df548>] do_sys_open+0x128/0x220
[ 1224.728552]  [<ffffffff811df65e>] SyS_open+0x1e/0x20
[ 1224.728557]  [<ffffffff819521ed>] system_call_fastpath+0x16/0x1b
[ 1224.728560] Code: 3d 00 f0 ff ff 0f 87 b8 fd ff ff 44 8b 6d ac 4d 01 af d0 03 00 00 48 83 c4 78 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 0b 0f 0b 0f 0b <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 
[ 1224.728597] RIP  [<ffffffffa00210d3>] btrfs_alloc_tree_block+0x3c3/0x3d0 [btrfs]
[ 1224.728610]  RSP <ffff8803e83838b8>
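
For reference, the symbol+offset in an oops like this (btrfs_alloc_tree_block+0x3c3) can be mapped back to a source line when debug info is available; a sketch using gdb on the btrfs module (paths are assumptions, and the module must match the kernel that produced the oops):

```shell
# Resolve the faulting symbol+offset from the oops to a source line.
# Requires a btrfs.ko built with debug info; a vmlinux works similarly
# for non-module symbols.
KO=/lib/modules/$(uname -r)/kernel/fs/btrfs/btrfs.ko
if command -v gdb >/dev/null && [ -r "$KO" ]; then
    gdb -batch -ex 'info line *(btrfs_alloc_tree_block+0x3c3)' "$KO"
else
    echo "gdb or $KO not available on this machine" >&2
fi
```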




* Re: Indefinite hang in reserve_metadata_bytes on kernel 3.18.3
  2015-01-29 22:50   ` Steven Schlansker
@ 2015-01-30 10:02     ` Holger Hoffstätte
  0 siblings, 0 replies; 4+ messages in thread
From: Holger Hoffstätte @ 2015-01-30 10:02 UTC (permalink / raw)
  To: linux-btrfs

On Thu, 29 Jan 2015 22:50:20 +0000, Steven Schlansker wrote:

> Thank you for the suggestion.  I did not find a 3.18.5 presented in any
> form other than as a large number of *.patch files, so I went for
> 3.19-rc6 instead (which I verified has this commit)

3.18.5 is out now, but it shouldn't matter; in your case it looks like 
something else is wrong.

> Now I am getting:
> 
> [ 1224.728313] ------------[ cut here ]------------
> [ 1224.728323] kernel BUG at fs/btrfs/extent-tree.c:7362!

That's -ENOMEM in btrfs_alloc_tree_block(); no definite idea what could 
(really) cause that. In any case your first order of business should be 
to get up-to-date btrfs-progs (>= 3.18.2) and see what a check says. 
Your 3.12 is really old.
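
A sketch of that read-only check with current btrfs-progs (the filesystem must be unmounted first; /dev/xvdc is taken from the 'btrfs fi show' output earlier in the thread):

```shell
# Run a consistency check on the unmounted device. 'btrfs check' is
# read-only by default; never add --repair without a backup.
DEV=${DEV:-/dev/xvdc}
if command -v btrfs >/dev/null && [ -b "$DEV" ]; then
    umount /mnt 2>/dev/null || true
    btrfs check "$DEV"
else
    echo "btrfs-progs or $DEV not available on this machine" >&2
fi
```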

Another thing that I have found helpful is to mount the fs in question at 
least once without the free-space cache, i.e. with the -o 
clear_cache,nospace_cache options. btrfs tries to detect whether this 
cache is out of sync or corrupted, but I've seen it lead to weird problems 
down the road on rare occasions. So mount the fs without it, maybe do a 
little cleanup/work, cleanly unmount it, and then try to remount/work with 
the cache enabled.
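
The mount sequence described above might look like this (a sketch; /dev/xvdc and /mnt are taken from earlier in the thread):

```shell
# Mount once with the space cache cleared and disabled, do some work,
# then remount normally so the cache is rebuilt from scratch.
DEV=${DEV:-/dev/xvdc}
if [ -b "$DEV" ]; then
    mount -o clear_cache,nospace_cache "$DEV" /mnt
    # ... light cleanup/work here ...
    umount /mnt
    mount "$DEV" /mnt    # subsequent mounts use the cache normally
else
    echo "block device $DEV not present on this machine" >&2
fi
```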

If you first created & worked on this fs with 3.13, you may have other 
as-yet-undetected problems lurking.

That's really all I can recommend for now from here. Good luck!

-h


