* Deadlock/high load
From: Alin Dobre @ 2014-06-12 15:15 UTC (permalink / raw)
  To: linux-btrfs

Hi all,

I have a problem that triggers quite often on our production machines. I
don't know what triggers it or how to reproduce it, but the machine
enters some sort of deadlock state where it consumes all the I/O and the
load average goes very high within seconds (it even gets to over 200).
Usually within a minute or less the machine is unresponsive and we have
to reset it. Occasionally the load just stays high (~25) for hours and
never comes down again, but that is rare. In general, the machine is
either already unresponsive or about to become unresponsive.

The last machine that hit this has 40 cores, and the btrfs filesystem
runs on SSDs. We hit the problem on a plain 3.14 kernel, and also on the
latest 3.14.6 kernel plus all the patches whose summary is marked
"btrfs:" that made it into 3.15, straightforwardly backported
(cherry-picked) to 3.14.
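
For anyone who wants the same patch set, picking it up looks roughly
like this (the commit range and selection here are only illustrative,
not the exact procedure we used):

git log --oneline --reverse v3.14..v3.15 -- fs/btrfs/
git cherry-pick <commit>    # for each selected commit, on top of 3.14.6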

There is no suspicious (malicious) activity from the running processes either.

I noticed there was another report on 3.13 which was solved by a 3.15-rc
patch, but it doesn't seem to be the same issue.

Since the only chance to obtain anything was via a SysRq dump, here's
what I could get from the last "w" trigger (tasks in uninterruptible
(blocked) state), showing only the tasks related to btrfs:

btrfs-transacti D 000000000000000e     0  2483      2 0x00080008
 ffff881fd05975d8 ffffffff81880a27 ffff881fd05974e8 ffff881fd05974f8
 ffff881fd0596010 ffff881fd05975d8 0000000000011bc0 ffff881fd13398f0
 0000000000011bc0 0000000000011bc0 ffff881fd28ecad0 ffff881fd13398f0
Call Trace:
 [<ffffffff81880a27>] ? __schedule+0x687/0x72c
 [<ffffffff8163aaf0>] ? do_release_stripe+0xeb/0x182
 [<ffffffff8114a076>] ? zone_statistics+0x77/0x7e
 [<ffffffff8163fed0>] ? raid5_unplug+0xaa/0xb3
 [<ffffffff813cb87e>] ? blk_flush_plug_list+0x99/0x1f0
 [<ffffffff8163c24d>] ? get_active_stripe+0x65/0x5ca
 [<ffffffff810f8704>] ? prepare_to_wait+0x71/0x7c
 [<ffffffff816431f9>] ? make_request+0x7b0/0x999
 [<ffffffff816429d4>] ? release_stripe_plug+0x20/0x95
 [<ffffffff810f8497>] ? bit_waitqueue+0xb0/0xb0
 [<ffffffff810f8497>] ? bit_waitqueue+0xb0/0xb0
 [<ffffffff8166375a>] ? md_make_request+0xfa/0x215
 [<ffffffff81324f22>] ? __btrfs_map_block+0xd6f/0xd89
 [<ffffffff813ca63c>] ? generic_make_request+0x99/0xda
 [<ffffffff813ca770>] ? submit_bio+0xf3/0xfe
 [<ffffffff813251de>] ? submit_stripe_bio+0x77/0x82
 [<ffffffff813255b6>] ? btrfs_map_bio+0x3cd/0x440
 [<ffffffff812fdc1d>] ? csum_tree_block+0x1c1/0x1ec
 [<ffffffff812fdfa6>] ? btree_submit_bio_hook+0x97/0xf0
 [<ffffffff811b561e>] ? __bio_add_page+0x153/0x1de
 [<ffffffff8131ab64>] ? submit_one_bio+0x63/0x90
 [<ffffffff8113c61b>] ? account_page_writeback+0x28/0x2d
 [<ffffffff8131b504>] ? submit_extent_page+0xe7/0x17e
 [<ffffffff81320796>] ? btree_write_cache_pages+0x44c/0x71a
 [<ffffffff8131b272>] ? extent_range_clear_dirty_for_io+0x5a/0x5a
 [<ffffffff812fc41a>] ? btree_writepages+0x4a/0x53
 [<ffffffff8113cf5f>] ? do_writepages+0x1b/0x24
 [<ffffffff81134f76>] ? __filemap_fdatawrite_range+0x4e/0x50
 [<ffffffff81135b55>] ? filemap_fdatawrite_range+0xe/0x10
 [<ffffffff813020c1>] ? btrfs_write_marked_extents+0x83/0xd1
 [<ffffffff8130216b>] ? btrfs_write_and_wait_transaction+0x5c/0x8a
 [<ffffffff81302ee2>] ? btrfs_commit_transaction+0x68b/0x87c
 [<ffffffff810cf0b7>] ? del_timer+0x87/0x87
 [<ffffffff812fef3f>] ? transaction_kthread+0x114/0x1e9
 [<ffffffff812fee2b>] ? close_ctree+0x280/0x280
 [<ffffffff810df1ff>] ? kthread+0xc9/0xd1
 [<ffffffff810df136>] ? kthread_freezable_should_stop+0x5b/0x5b
 [<ffffffff818842cc>] ? ret_from_fork+0x7c/0xb0
 [<ffffffff810df136>] ? kthread_freezable_should_stop+0x5b/0x5b
rs:main Q:Reg   D 0000000000000002     0  6857   4976 0x00000000
 ffff883fc9b0bb08 0000000000000002 ffff883fc9b0b9e8 ffff883fc9b0ba28
 ffff883fc9b0a010 ffff883fc9b0bb08 0000000000011bc0 ffff883fc794db70
 0000000000011bc0 0000000000011bc0 ffff881fd28e8000 ffff883fc794db70
Call Trace:
 [<ffffffff81040a93>] ? native_sched_clock+0x17/0xd3
 [<ffffffff810406e7>] ? sched_clock+0x9/0xd
 [<ffffffff810eb7c2>] ? arch_vtime_task_switch+0x81/0x86
 [<ffffffff810ebc88>] ? vtime_common_task_switch+0x29/0x2d
 [<ffffffff8104072d>] ? read_tsc+0x9/0x1b
 [<ffffffff81880c2a>] schedule+0x6e/0x70
 [<ffffffff81880cbf>] io_schedule+0x93/0xd7
 [<ffffffff81134170>] ? __lock_page+0x68/0x68
 [<ffffffff81134179>] sleep_on_page+0x9/0xd
 [<ffffffff8188118f>] __wait_on_bit+0x45/0x7a
 [<ffffffff8113444e>] wait_on_page_bit+0x71/0x78
 [<ffffffff810f84f3>] ? wake_atomic_t_function+0x28/0x28
 [<ffffffff81311056>] prepare_pages+0xd2/0x11b
 [<ffffffff813143a5>] __btrfs_buffered_write+0x214/0x482
 [<ffffffff8110eb76>] ? futex_wait+0x176/0x239
 [<ffffffff810c9650>] ? current_fs_time+0x22/0x29
 [<ffffffff81314a2a>] btrfs_file_aio_write+0x417/0x507
 [<ffffffff81198eb3>] ? path_openat+0x593/0x5cc
 [<ffffffff8118c275>] do_sync_write+0x59/0x79
 [<ffffffff8118d53e>] vfs_write+0xd3/0x172
 [<ffffffff8118d69e>] SyS_write+0x4b/0x8f
 [<ffffffff81884505>] tracesys+0xd0/0xd5
freshclam       D 0000000000000002     0  8305   4976 0x00000000
 ffff883fbc1b1b08 0000000000000002 ffff881fdfc72760 ffff883fbc1b1a28
 ffff883fbc1b0010 ffff883fbc1b1b08 0000000000011bc0 ffff883fb5ca31e0
 0000000000011bc0 0000000000011bc0 ffffffff81b8d440 ffff883fb5ca31e0
Call Trace:
 [<ffffffff8113f65f>] ? lru_cache_add+0x9/0xb
 [<ffffffff8115d727>] ? page_add_new_anon_rmap+0x108/0x11a
 [<ffffffff811535a8>] ? handle_mm_fault+0xbdf/0xc84
 [<ffffffff8104072d>] ? read_tsc+0x9/0x1b
 [<ffffffff81880c2a>] schedule+0x6e/0x70
 [<ffffffff81880cbf>] io_schedule+0x93/0xd7
 [<ffffffff81134170>] ? __lock_page+0x68/0x68
 [<ffffffff81134179>] sleep_on_page+0x9/0xd
 [<ffffffff8188118f>] __wait_on_bit+0x45/0x7a
 [<ffffffff8113444e>] wait_on_page_bit+0x71/0x78
 [<ffffffff810f84f3>] ? wake_atomic_t_function+0x28/0x28
 [<ffffffff81311056>] prepare_pages+0xd2/0x11b
 [<ffffffff813143a5>] __btrfs_buffered_write+0x214/0x482
 [<ffffffff81194a71>] ? complete_walk+0x84/0xc9
 [<ffffffff810c9650>] ? current_fs_time+0x22/0x29
 [<ffffffff81314a2a>] btrfs_file_aio_write+0x417/0x507
 [<ffffffff811991d6>] ? final_putname+0x33/0x37
 [<ffffffff8118c275>] do_sync_write+0x59/0x79
 [<ffffffff8118d53e>] vfs_write+0xd3/0x172
 [<ffffffff8118d69e>] SyS_write+0x4b/0x8f
 [<ffffffff81884505>] tracesys+0xd0/0xd5
nginx           D 0000000000000002     0 12360  12358 0x00000000
 ffff881f9ef6bb08 0000000000000002 ffff881f9ef6ba08 ffff881f9ef6ba28
 ffff881f9ef6a010 ffff881f9ef6bb08 0000000000011bc0 ffff881fb894ec10
 0000000000011bc0 0000000000011bc0 ffffffff81b8d440 ffff881fb894ec10
Call Trace:
 [<ffffffff816db51b>] ? netif_rx_internal+0xc9/0xda
 [<ffffffff816db535>] ? netif_rx+0x9/0xb
 [<ffffffff8150c554>] ? loopback_xmit+0x9a/0xb4
 [<ffffffff8104072d>] ? read_tsc+0x9/0x1b
 [<ffffffff81880c2a>] schedule+0x6e/0x70
 [<ffffffff81880cbf>] io_schedule+0x93/0xd7
 [<ffffffff81134170>] ? __lock_page+0x68/0x68
 [<ffffffff81134179>] sleep_on_page+0x9/0xd
 [<ffffffff8188118f>] __wait_on_bit+0x45/0x7a
 [<ffffffff8113444e>] wait_on_page_bit+0x71/0x78
 [<ffffffff810f84f3>] ? wake_atomic_t_function+0x28/0x28
 [<ffffffff81311056>] prepare_pages+0xd2/0x11b
 [<ffffffff813143a5>] __btrfs_buffered_write+0x214/0x482
 [<ffffffff810c9650>] ? current_fs_time+0x22/0x29
 [<ffffffff81314a2a>] btrfs_file_aio_write+0x417/0x507
 [<ffffffff816c442b>] ? sock_destroy_inode+0x2e/0x32
 [<ffffffff8118c275>] do_sync_write+0x59/0x79
 [<ffffffff8118d53e>] vfs_write+0xd3/0x172
 [<ffffffff8118d69e>] SyS_write+0x4b/0x8f
 [<ffffffff81884505>] tracesys+0xd0/0xd5
kworker/u81:1   D ffff883fd150e800     0 35017      2 0x00000000
Workqueue: writeback bdi_writeback_workfn (flush-9:0)
 ffff881f60ae56d8 0000000000000002 ffff881f60ae55c8 ffff881f60ae55f8
 ffff881f60ae4010 ffff881f60ae56d8 0000000000011bc0 ffff881fd12bf460
 0000000000011bc0 0000000000011bc0 ffff881fd28e8850 ffff881fd12bf460
Call Trace:
 [<ffffffff810ea9a7>] ? default_wake_function+0xd/0xf
 [<ffffffff810f84a8>] ? autoremove_wake_function+0x11/0x34
 [<ffffffff810f838c>] ? __wake_up_common+0x49/0x7f
 [<ffffffff81880c2a>] schedule+0x6e/0x70
 [<ffffffff816613c8>] md_write_start+0x12a/0x142
 [<ffffffff810f8497>] ? bit_waitqueue+0xb0/0xb0
 [<ffffffff81631e4f>] make_request+0x61/0xc26
 [<ffffffff811b561e>] ? __bio_add_page+0x153/0x1de
 [<ffffffff81134218>] ? unlock_page+0x22/0x26
 [<ffffffff8113efc1>] ? release_pages+0x1f2/0x201
 [<ffffffff8166375a>] md_make_request+0xfa/0x215
 [<ffffffff810f874e>] ? __wake_up+0x3f/0x48
 [<ffffffff813ca63c>] generic_make_request+0x99/0xda
 [<ffffffff813ca770>] submit_bio+0xf3/0xfe
 [<ffffffff811f4c1c>] ext4_io_submit+0x24/0x43
 [<ffffffff811f46f7>] ext4_writepages+0x8bc/0xa07
 [<ffffffff813cb9e8>] ? blk_finish_plug+0x13/0x34
 [<ffffffff8113cf5f>] do_writepages+0x1b/0x24
 [<ffffffff811abb30>] __writeback_single_inode+0x40/0xf3
 [<ffffffff811ac99c>] writeback_sb_inodes+0x21d/0x391
 [<ffffffff8118e6ba>] ? put_super+0x2c/0x31
 [<ffffffff811acb83>] __writeback_inodes_wb+0x73/0xb4
 [<ffffffff811acccd>] wb_writeback+0x109/0x19c
 [<ffffffff8113d0f1>] ? bdi_dirty_limit+0x2c/0x89
 [<ffffffff811acea9>] wb_do_writeback+0x149/0x16d
 [<ffffffff811acf3a>] bdi_writeback_workfn+0x6d/0x16f
 [<ffffffff81310708>] ? finish_ordered_fn+0x10/0x12
 [<ffffffff8132a747>] ? normal_work_helper+0xcc/0x18e
 [<ffffffff810d9977>] ? pwq_dec_nr_in_flight+0xe3/0xec
 [<ffffffff810d9bd3>] process_one_work+0x253/0x368
 [<ffffffff810d9ed2>] worker_thread+0x1ea/0x343
 [<ffffffff810d9ce8>] ? process_one_work+0x368/0x368
 [<ffffffff810df1ff>] kthread+0xc9/0xd1
 [<ffffffff810df136>] ? kthread_freezable_should_stop+0x5b/0x5b
 [<ffffffff818842cc>] ret_from_fork+0x7c/0xb0
 [<ffffffff810df136>] ? kthread_freezable_should_stop+0x5b/0x5b
kworker/u82:2   D 0000000000000000     0 35547      2 0x00080000
Workqueue: btrfs-submit normal_work_helper
 0000000000000000 0000000000000000 0000000000000008 ffff881fd0ec0040
 ffff881fd0ec0070 0000000000000000 0000000091827364 ffff883eec757d30
 ffff883eec757d30 ffff883eec757d40 ffff883eec757d40 ffff883fd157a0c0
Call Trace:
 [<ffffffff813241b1>] ? pending_bios_fn+0x10/0x12
 [<ffffffff8132a747>] ? normal_work_helper+0xcc/0x18e
 [<ffffffff810d9bd3>] ? process_one_work+0x253/0x368
 [<ffffffff810d9ed2>] ? worker_thread+0x1ea/0x343
 [<ffffffff810d9ce8>] ? process_one_work+0x368/0x368
 [<ffffffff810df1ff>] ? kthread+0xc9/0xd1
 [<ffffffff810df136>] ? kthread_freezable_should_stop+0x5b/0x5b
 [<ffffffff818842cc>] ? ret_from_fork+0x7c/0xb0
 [<ffffffff810df136>] ? kthread_freezable_should_stop+0x5b/0x5b
php-fpm         D ffffffff81134170     0 16086  12386 0x00000000
 ffff881f474c3b28 0000000000000002 0000001400000010 ffff881f474c3a48
 ffff881f474c2010 ffff881f474c3b28 0000000000011bc0 ffff881f7184f460
 0000000000011bc0 0000000000011bc0 ffff881fd28bec10 ffff881f7184f460
Call Trace:
 [<ffffffff81348277>] ? __btrfs_release_delayed_node+0x184/0x1e2
 [<ffffffff812e9c53>] ? btrfs_block_rsv_check+0x55/0x61
 [<ffffffff8118e25b>] ? __sb_end_write+0x2d/0x5c
 [<ffffffff81303380>] ? __btrfs_end_transaction+0x2ad/0x2d1
 [<ffffffff8104072d>] ? read_tsc+0x9/0x1b
 [<ffffffff81134170>] ? __lock_page+0x68/0x68
 [<ffffffff81880c2a>] schedule+0x6e/0x70
 [<ffffffff81880cbf>] io_schedule+0x93/0xd7
 [<ffffffff81134179>] sleep_on_page+0x9/0xd
 [<ffffffff81880fd3>] __wait_on_bit_lock+0x43/0x8f
 [<ffffffff81134169>] __lock_page+0x61/0x68
 [<ffffffff810f84f3>] ? wake_atomic_t_function+0x28/0x28
 [<ffffffff8130ee2a>] btrfs_page_mkwrite+0xb0/0x2c2
 [<ffffffff81151a47>] do_wp_page+0x1e4/0x79a
 [<ffffffff811969b1>] ? path_lookupat+0x5db/0x64d
 [<ffffffff81153584>] handle_mm_fault+0xbbb/0xc84
 [<ffffffff811991d6>] ? final_putname+0x33/0x37
 [<ffffffff81199344>] ? user_path_at_empty+0x5e/0x8f
 [<ffffffff810f99ed>] ? cpuacct_account_field+0x55/0x5e
 [<ffffffff8106b08c>] __do_page_fault+0x3bb/0x3e2
 [<ffffffff811061e9>] ? rcu_eqs_enter+0x70/0x83
 [<ffffffff8110620a>] ? rcu_user_enter+0xe/0x10
 [<ffffffff810f99ed>] ? cpuacct_account_field+0x55/0x5e
 [<ffffffff810ebcfa>] ? account_user_time+0x6e/0x97
 [<ffffffff810ebd70>] ? vtime_account_user+0x4d/0x52
 [<ffffffff8106b0f7>] do_page_fault+0x44/0x61
 [<ffffffff81883e38>] page_fault+0x28/0x30

The php-fpm process then appears 14 more times with the same backtrace.
In total, the blocked btrfs calls break down as follows:

$ grep btrfs blocked.txt | sort
 [<ffffffff812e9c53>] ? btrfs_block_rsv_check+0x55/0x61
 [<ffffffff812e9c53>] ? btrfs_block_rsv_check+0x55/0x61
 [<ffffffff812e9c53>] ? btrfs_block_rsv_check+0x55/0x61
 [<ffffffff812e9c53>] ? btrfs_block_rsv_check+0x55/0x61
 [<ffffffff812e9c53>] ? btrfs_block_rsv_check+0x55/0x61
 [<ffffffff812e9c53>] ? btrfs_block_rsv_check+0x55/0x61
 [<ffffffff812e9c53>] ? btrfs_block_rsv_check+0x55/0x61
 [<ffffffff812e9c53>] ? btrfs_block_rsv_check+0x55/0x61
 [<ffffffff812e9c53>] ? btrfs_block_rsv_check+0x55/0x61
 [<ffffffff812e9c53>] ? btrfs_block_rsv_check+0x55/0x61
 [<ffffffff812e9c53>] ? btrfs_block_rsv_check+0x55/0x61
 [<ffffffff812e9c53>] ? btrfs_block_rsv_check+0x55/0x61
 [<ffffffff812e9c53>] ? btrfs_block_rsv_check+0x55/0x61
 [<ffffffff812e9c53>] ? btrfs_block_rsv_check+0x55/0x61
 [<ffffffff813020c1>] ? btrfs_write_marked_extents+0x83/0xd1
 [<ffffffff8130216b>] ? btrfs_write_and_wait_transaction+0x5c/0x8a
 [<ffffffff81302ee2>] ? btrfs_commit_transaction+0x68b/0x87c
 [<ffffffff81303380>] ? __btrfs_end_transaction+0x2ad/0x2d1
 [<ffffffff81303380>] ? __btrfs_end_transaction+0x2ad/0x2d1
 [<ffffffff81303380>] ? __btrfs_end_transaction+0x2ad/0x2d1
 [<ffffffff81303380>] ? __btrfs_end_transaction+0x2ad/0x2d1
 [<ffffffff81303380>] ? __btrfs_end_transaction+0x2ad/0x2d1
 [<ffffffff81303380>] ? __btrfs_end_transaction+0x2ad/0x2d1
 [<ffffffff81303380>] ? __btrfs_end_transaction+0x2ad/0x2d1
 [<ffffffff81303380>] ? __btrfs_end_transaction+0x2ad/0x2d1
 [<ffffffff81303380>] ? __btrfs_end_transaction+0x2ad/0x2d1
 [<ffffffff81303380>] ? __btrfs_end_transaction+0x2ad/0x2d1
 [<ffffffff81303380>] ? __btrfs_end_transaction+0x2ad/0x2d1
 [<ffffffff81303380>] ? __btrfs_end_transaction+0x2ad/0x2d1
 [<ffffffff81303380>] ? __btrfs_end_transaction+0x2ad/0x2d1
 [<ffffffff81303380>] ? __btrfs_end_transaction+0x2ad/0x2d1
 [<ffffffff8130ee2a>] btrfs_page_mkwrite+0xb0/0x2c2
 [<ffffffff8130ee2a>] btrfs_page_mkwrite+0xb0/0x2c2
 [<ffffffff8130ee2a>] btrfs_page_mkwrite+0xb0/0x2c2
 [<ffffffff8130ee2a>] btrfs_page_mkwrite+0xb0/0x2c2
 [<ffffffff8130ee2a>] btrfs_page_mkwrite+0xb0/0x2c2
 [<ffffffff8130ee2a>] btrfs_page_mkwrite+0xb0/0x2c2
 [<ffffffff8130ee2a>] btrfs_page_mkwrite+0xb0/0x2c2
 [<ffffffff8130ee2a>] btrfs_page_mkwrite+0xb0/0x2c2
 [<ffffffff8130ee2a>] btrfs_page_mkwrite+0xb0/0x2c2
 [<ffffffff8130ee2a>] btrfs_page_mkwrite+0xb0/0x2c2
 [<ffffffff8130ee2a>] btrfs_page_mkwrite+0xb0/0x2c2
 [<ffffffff8130ee2a>] btrfs_page_mkwrite+0xb0/0x2c2
 [<ffffffff8130ee2a>] btrfs_page_mkwrite+0xb0/0x2c2
 [<ffffffff8130ee69>] btrfs_page_mkwrite+0xef/0x2c2
 [<ffffffff813143a5>] __btrfs_buffered_write+0x214/0x482
 [<ffffffff813143a5>] __btrfs_buffered_write+0x214/0x482
 [<ffffffff813143a5>] __btrfs_buffered_write+0x214/0x482
 [<ffffffff81314a2a>] btrfs_file_aio_write+0x417/0x507
 [<ffffffff81314a2a>] btrfs_file_aio_write+0x417/0x507
 [<ffffffff81314a2a>] btrfs_file_aio_write+0x417/0x507
 [<ffffffff81324f22>] ? __btrfs_map_block+0xd6f/0xd89
 [<ffffffff813255b6>] ? btrfs_map_bio+0x3cd/0x440
 [<ffffffff81348277>] ? __btrfs_release_delayed_node+0x184/0x1e2
 [<ffffffff81348277>] ? __btrfs_release_delayed_node+0x184/0x1e2
 [<ffffffff81348277>] ? __btrfs_release_delayed_node+0x184/0x1e2
 [<ffffffff81348277>] ? __btrfs_release_delayed_node+0x184/0x1e2
 [<ffffffff81348277>] ? __btrfs_release_delayed_node+0x184/0x1e2
 [<ffffffff81348277>] ? __btrfs_release_delayed_node+0x184/0x1e2
 [<ffffffff81348277>] ? __btrfs_release_delayed_node+0x184/0x1e2
 [<ffffffff81348277>] ? __btrfs_release_delayed_node+0x184/0x1e2
 [<ffffffff81348277>] ? __btrfs_release_delayed_node+0x184/0x1e2
 [<ffffffff81348277>] ? __btrfs_release_delayed_node+0x184/0x1e2
 [<ffffffff81348277>] ? __btrfs_release_delayed_node+0x184/0x1e2
 [<ffffffff81348277>] ? __btrfs_release_delayed_node+0x184/0x1e2
 [<ffffffff81348277>] ? __btrfs_release_delayed_node+0x184/0x1e2
 [<ffffffff81348277>] ? __btrfs_release_delayed_node+0x184/0x1e2
Workqueue: btrfs-submit normal_work_helper
btrfs-transacti D 000000000000000e     0  2483      2 0x00080008

I can send the whole file, if someone is interested.

Appreciate any feedback.

Cheers,
Alin.

* Re: Deadlock/high load
From: Liu Bo @ 2014-06-13  3:37 UTC (permalink / raw)
  To: Alin Dobre; +Cc: linux-btrfs

On Thu, Jun 12, 2014 at 04:15:27PM +0100, Alin Dobre wrote:
> Hi all,
> 
> I have a problem that triggers quite often on our production machines. I
> don't know what triggers it or how to reproduce it, but the machine
> enters some sort of deadlock state where it consumes all the I/O and the
> load average goes very high within seconds (it even gets to over 200).
> Usually within a minute or less the machine is unresponsive and we have
> to reset it. Occasionally the load just stays high (~25) for hours and
> never comes down again, but that is rare. In general, the machine is
> either already unresponsive or about to become unresponsive.
> 
> The last machine that hit this has 40 cores, and the btrfs filesystem
> runs on SSDs. We hit the problem on a plain 3.14 kernel, and also on the
> latest 3.14.6 kernel plus all the patches whose summary is marked
> "btrfs:" that made it into 3.15, straightforwardly backported
> (cherry-picked) to 3.14.
> 
> There is no suspicious (malicious) activity from the running processes either.
> 
> I noticed there was another report on 3.13 which was solved by a 3.15-rc
> patch, but it doesn't seem to be the same issue.

The output of 'btrfs filesystem df' would be appreciated; it can help
determine whether the FS has entered an 'almost full' situation, which
may trigger a bug where pages are not marked with the writeback tag,
leaving processes waiting endlessly.

-liubo

* Re: Deadlock/high load
From: Alin Dobre @ 2014-06-13  6:46 UTC (permalink / raw)
  To: bo.li.liu; +Cc: linux-btrfs

On 13/06/14 04:37, Liu Bo wrote:
> The output of 'btrfs filesystem df' would be appreciated; it can help
> determine whether the FS has entered an 'almost full' situation, which
> may trigger a bug where pages are not marked with the writeback tag,
> leaving processes waiting endlessly.

I'll send it as soon as it happens again, thanks!

Cheers,
Alin.

* Re: Deadlock/high load
From: Alin Dobre @ 2014-06-13  6:50 UTC (permalink / raw)
  To: bo.li.liu; +Cc: linux-btrfs

On 13/06/14 04:37, Liu Bo wrote:
> The output of 'btrfs filesystem df' would be appreciated; it can help
> determine whether the FS has entered an 'almost full' situation, which
> may trigger a bug where pages are not marked with the writeback tag,
> leaving processes waiting endlessly.

The output now (when the host is stable) is:
Data, single: total=54.01GiB, used=24.48GiB
System, single: total=4.00MiB, used=16.00KiB
Metadata, single: total=3.01GiB, used=1.31GiB
unknown, single: total=464.00MiB, used=0.00

Cheers,
Alin.

* Re: Deadlock/high load
From: Alin Dobre @ 2014-06-20 16:22 UTC (permalink / raw)
  To: linux-btrfs

On 12/06/14 16:15, Alin Dobre wrote:
> Hi all,
> 
> I have a problem that triggers quite often on our production machines. I
> don't know what triggers it or how to reproduce it, but the machine
> enters some sort of deadlock state where it consumes all the I/O and the
> load average goes very high within seconds (it even gets to over 200).
> Usually within a minute or less the machine is unresponsive and we have
> to reset it. Occasionally the load just stays high (~25) for hours and
> never comes down again, but that is rare. In general, the machine is
> either already unresponsive or about to become unresponsive.
> 
> The last machine that hit this has 40 cores, and the btrfs filesystem
> runs on SSDs. We hit the problem on a plain 3.14 kernel, and also on the
> latest 3.14.6 kernel plus all the patches whose summary is marked
> "btrfs:" that made it into 3.15, straightforwardly backported
> (cherry-picked) to 3.14.
> 
> There is no suspicious (malicious) activity from the running processes either.
> 
> I noticed there was another report on 3.13 which was solved by a 3.15-rc
> patch, but it doesn't seem to be the same issue.
> 
> Since the only chance to obtain anything was via a SysRq dump, here's
> what I could get from the last "w" trigger (tasks in uninterruptible
> (blocked) state), showing only the tasks related to btrfs:

I tried to reproduce this on a slower/older machine with older SSDs and
got nowhere; the machine stayed up. However, when I tried one of our
faster/newer machines, also with newer and faster SSDs, I managed to
reproduce it twice.

I should mention that the disks are set up in an MD RAID 6 array, with
btrfs (single profile for both data and metadata) on top of that. I
reproduced it by running bonnie++ (bonnie++ -d /home/bonnie -s 4g -m test
-r 1024 -x 100 -u bonnie) inside a container that was memory-capped to
1GB with the help of cgroups (hence the -r 1024).
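
The container setup itself is more involved, but the memory cap amounts
to something like the following with the cgroup v1 memory controller
(the /sys/fs/cgroup/memory mount point and the cgroup name are just
illustrative):

mkdir /sys/fs/cgroup/memory/bonnie
echo 1G > /sys/fs/cgroup/memory/bonnie/memory.limit_in_bytes
echo $$ > /sys/fs/cgroup/memory/bonnie/tasks   # current shell and its children
bonnie++ -d /home/bonnie -s 4g -m test -r 1024 -x 100 -u bonnie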

Just before the machine stopped being fully responsive, I had three
processes consuming 100% CPU: md128_raid6, btrfs-transact and
kworker/u82:6. The load was fairly low, but atop stopped working at a
load average of ~5.

I couldn't dump the SysRq blocked processes this time, but the above
three processes also appear in my initial report.

As per Liu Bo's request, the output of 'btrfs filesystem df' at the
moment atop was already unresponsive was:
Data, single: total=73.01GiB, used=28.05GiB
System, single: total=4.00MiB, used=16.00KiB
Metadata, single: total=3.01GiB, used=1.04GiB
unknown, single: total=368.00MiB, used=0.00

Another thing worth mentioning is that our production machines also see
a fairly high volume of snapshot (or, more rarely, plain creation) and
deletion operations on quota-enabled subvolumes.
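
In case it matters for reproduction, the operations are of this kind
(mount point and subvolume names are just illustrative):

btrfs quota enable /srv/pool
btrfs subvolume snapshot /srv/pool/base /srv/pool/customer-snap
btrfs subvolume create /srv/pool/customer-new
btrfs subvolume delete /srv/pool/customer-snap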

Cheers,
Alin.

* Re: Deadlock/high load
From: Alin Dobre @ 2014-06-27 16:12 UTC (permalink / raw)
  To: linux-btrfs, linux-raid, linux-kernel; +Cc: NeilBrown, Chris Mason

Hello,

When using parity md RAID backed by fast SSD disks, with btrfs on top of
it, under intensive I/O the machine enters a sort of deadlock and the
load average grows until the machine is no longer responsive.

When the deadlock happens, there are 2 processes consuming 100% CPU, as
reported by atop:
P1. mdX_raidY
P2. btrfs-transaction
where X is the number of the md device (130 in our case) and Y is the
RAID level, which is one of 4, 5, or 6, since the problem only happens
with parity RAID. Sometimes there's also a kworker process along with the
two above. The SysRq dump for these processes is pasted at the end of
this e-mail.

As the test table below shows, the problem doesn't happen with a
filesystem other than btrfs, and it doesn't happen with non-parity RAID
configurations (RAID 0, 1, 10).

I managed to reproduce the problem using ramdisks instead of SSDs for
the base block storage layer. It doesn't reproduce with slower spinning
disks, or even with some older SSDs, so device speed is an important
factor in this race condition.

I have tried all current kernels, from 3.15 through 3.16-rc2 and
linux-next-20140624. The problem is still there.

Here are the steps to reproduce.

R1. Install the Bonnie++ disk benchmark tool. I installed it manually
from [1]. Also make sure ramdisk block devices are configured in the
kernel.
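
If the ram block devices are built as a module, they can be created with
something like this (the per-device size, given in KiB, is just
illustrative):

modprobe brd rd_nr=10 rd_size=4194304
ls /dev/ram*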

R2. Create the md array on top of the /dev/ramX blocks:
mdadm --create --auto=md \
  --level=6 --raid-devices=10 \
  --bitmap=internal --bitmap-chunk=65536 \
  --metadata=0.90 --assume-clean \
  --run --force \
  /dev/md130 \
  /dev/ram{0..9}

R3. Create and mount the filesystem
mkfs.btrfs -K -f -d single -m single /dev/md130
# (a plain "mkfs.ext4 /dev/md130" was used for the ext4 test, see below)
mount /dev/md130 /tmp/test

R4. Create the test environment
mkdir /tmp/test/bonnie
chown nobody /tmp/test/bonnie

R5. Run bonnie
bonnie++ -d /tmp/test/bonnie -s 20480m -m test -r 10240 -x 100 -u nobody


After between 1 and 12 run cycles, the machine enters the state
described at the beginning of this e-mail.
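
One simple way to watch for the onset is to look for the md/btrfs
threads pinned at 100% CPU and for tasks stuck in D state, e.g.
(commands are just illustrative):

watch -n1 'ps -eo pid,stat,pcpu,comm --sort=-pcpu | head -15'
ps -eo pid,stat,comm | awk '$2 ~ /^D/'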

Below are the tests I have performed, with their results. FAIL means
that the machine breaks and the problem happens.
T1. md raid 6 + btrfs + discard => FAIL
T2. md raid 6 + btrfs - discard => FAIL
T3. md raid 5 + btrfs + discard => FAIL
T4. md raid 4 + btrfs + discard => FAIL
T5. md raid 6 + ext4 - discard => PASS
T6. md raid 10 + btrfs + discard => PASS
T7. md raid 0 + btrfs + discard => PASS
T8. md raid 1 + btrfs + discard => PASS
T9. btrfs + discard => PASS

The discard option (as in mount -o ssd,discard) is present in the above
tests to make the environment as close to our production machines as
possible, but in the end it doesn't seem to be a root cause. The same
goes for RAID 6: it is the level we use on our production machines, so
it was preferred over other levels for most of the tests.

The SysRq "t" dump for the two main processes above follows. The full
output is available at [3]. You can also find some information in the
SysRq output from the original report at [2].

md130_raid6     R  running task        0  1146      2 0x00080008
ffffea005be81e00 0000000000000000 0000000000000000 ffff880ffb107af8
ffffffff818478a0 0000000000000000 0000000000000000 ffff880ffb107b20
ffffffff818478a0 0000000000000000 ffff881c0810f478 0000000180190018
Call Trace:
[<ffffffff818478a0>] ? __slab_free+0x20b/0x2d9
[<ffffffff818478a0>] ? __slab_free+0x20b/0x2d9
[<ffffffff81178dd4>] ? kmem_cache_free+0x17e/0x1a3
[<ffffffff8113a809>] ? mempool_free_slab+0x12/0x14
[<ffffffff8113ac77>] ? mempool_free+0x71/0x7a
[<ffffffff813b3dd4>] ? bio_put+0x59/0x68
[<ffffffff810f4518>] ? select_task_rq_fair+0x532/0x565
[<ffffffff81616cea>] ? handle_stripe+0x1a97/0x1c06
[<ffffffff810ff20b>] ? autoremove_wake_function+0xd/0x2f
[<ffffffff8160f11b>] ? release_inactive_stripe_list+0x65/0x108
[<ffffffff81617077>] ? handle_active_stripes.isra.30+0x21e/0x2b3
[<ffffffff81617501>] ? raid5d+0x358/0x3c1
[<ffffffff810d7ec5>] ? del_timer_sync+0x25/0x43
[<ffffffff8163410a>] ? md_thread+0x10f/0x125
[<ffffffff810ff1fe>] ? __wake_up_sync+0xd/0xd
[<ffffffff81633ffb>] ? md_register_thread+0xcb/0xcb
[<ffffffff810e72cc>] ? kthread+0xd6/0xde
[<ffffffff810e71f6>] ? kthread_create_on_node+0x167/0x167
[<ffffffff8185043c>] ? ret_from_fork+0x7c/0xb0
[<ffffffff810e71f6>] ? kthread_create_on_node+0x167/0x167

btrfs-transacti S ffff882026d119c0     0   704      2 0x00000000
ffff8817f9c77df8 0000000000000002 ffff8817f9c77fd8 ffff8817fbf45f90
00000000000119c0 ffff881ffb173420 ffff8817fbf45748 ffff8817f9c77d40
ffffffff810f45b2 ffff8817f9c77d88 ffffffff810f8d15 ffff8817f9c77d90
Call Trace:
[<ffffffff810f45b2>] ? __enqueue_entity+0x67/0x69
[<ffffffff810f8d15>] ? enqueue_entity+0x4bd/0x5e1
[<ffffffff8106040c>] ? physflat_send_IPI_mask+0x9/0xb
[<ffffffff8105a0f2>] ? native_smp_send_reschedule+0x56/0x58
[<ffffffff810ed64a>] ? resched_task+0x45/0x48
[<ffffffff8184d4d8>] schedule+0x65/0x67
[<ffffffff8184cb07>] schedule_timeout+0x13f/0x167
[<ffffffff810d7a6d>] ? __internal_add_timer+0xb5/0xb5
[<ffffffff812f1ba1>] transaction_kthread+0x1a2/0x1cb
[<ffffffff812f19ff>] ? btrfs_cleanup_transaction+0x45b/0x45b
[<ffffffff810e72cc>] kthread+0xd6/0xde
[<ffffffff810e71f6>] ? kthread_create_on_node+0x167/0x167
[<ffffffff8185043c>] ret_from_fork+0x7c/0xb0
[<ffffffff810e71f6>] ? kthread_create_on_node+0x167/0x167


I can reproduce this under different testing conditions and gather some
more data, if requested.

Thanks for any insight or feedback.

Links referenced above:
[1]
http://archive.ubuntu.com/ubuntu/pool/main/b/bonnie++/bonnie++_1.97.1.tar.gz
[2] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/35865
[3] http://84.45.121.180/sysrq.txt


Cheers,
Alin.
