All of lore.kernel.org
 help / color / mirror / Atom feed
* hang on 3.9, 3.10-rc5
@ 2013-06-11 15:43 Sage Weil
  2013-06-11 16:21 ` Chris Mason
  2013-06-18 16:37 ` Josef Bacik
  0 siblings, 2 replies; 23+ messages in thread
From: Sage Weil @ 2013-06-11 15:43 UTC (permalink / raw)
  To: linux-btrfs

I'm also seeing this hang regularly with both 3.9 and 3.10-rc5.  Is this 
is a known problem?  In this case there is no powercycling; just a regular 
ceph-osd workload.

Thanks-
sage


[ 2885.479116] INFO: task kworker/u64:1:28713 blocked for more than 120 seconds.
[ 2885.486277] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2885.494201] kworker/u64:1   D ffff880222b81f90     0 28713      2 0x00000000
[ 2885.501355] Workqueue: writeback bdi_writeback_workfn (flush-btrfs-12)
[ 2885.507924]  ffff88021c3d7238 0000000000000046 0000000000000286 ffff880082af52e8
[ 2885.515456]  ffff880222b81f90 ffff88021c3d7fd8 ffff88021c3d7fd8 ffff88021c3d7fd8
[ 2885.522987]  ffff880225d2bf20 ffff880222b81f90 ffff88021c3d7248 ffff880082af51c0
[ 2885.530497] Call Trace:
[ 2885.532964]  [<ffffffff81635ed9>] schedule+0x29/0x70
[ 2885.537972]  [<ffffffffa021ec95>] btrfs_tree_lock+0x75/0x270 [btrfs]
[ 2885.544407]  [<ffffffff81067b50>] ? __init_waitqueue_head+0x60/0x60
[ 2885.550760]  [<ffffffffa01c22cb>] btrfs_lock_root_node+0x3b/0x50 [btrfs]
[ 2885.557490]  [<ffffffffa01c77ba>] btrfs_search_slot+0x87a/0x940 [btrfs]
[ 2885.564196]  [<ffffffffa01ff76e>] ? free_extent_map+0x4e/0x90 [btrfs]
[ 2885.570738]  [<ffffffffa01dd198>] btrfs_lookup_file_extent+0x38/0x40 [btrfs]
[ 2885.577823]  [<ffffffffa01fd1be>] __btrfs_drop_extents+0x12e/0xaf0 [btrfs]
[ 2885.584737]  [<ffffffff8107b093>] ? select_task_rq_fair+0x53/0x8a0
[ 2885.590977]  [<ffffffff81167788>] ? kmem_cache_alloc+0xd8/0x160
[ 2885.596934]  [<ffffffffa01fe63e>] btrfs_drop_extents+0x6e/0xa0 [btrfs]
[ 2885.603531]  [<ffffffffa01f16c1>] cow_file_range_inline+0xf1/0x1e0 [btrfs]
[ 2885.610451]  [<ffffffffa01f1ad6>] __cow_file_range+0x326/0x4b0 [btrfs]
[ 2885.617014]  [<ffffffffa01e89ea>] ? join_transaction.isra.32+0x10a/0x3c0 [btrfs]
[ 2885.624520]  [<ffffffffa01f27d5>] cow_file_range+0x95/0xe0 [btrfs]
[ 2885.630772]  [<ffffffffa01f2b5b>] run_delalloc_range+0x33b/0x360 [btrfs]
[ 2885.637581]  [<ffffffffa0208120>] __extent_writepage+0x5e0/0x770 [btrfs]
[ 2885.644357]  [<ffffffff8111cf9f>] ? find_get_pages_tag+0x11f/0x1c0
[ 2885.650614]  [<ffffffff8111ceae>] ? find_get_pages_tag+0x2e/0x1c0
[ 2885.656744]  [<ffffffffa0208532>] extent_write_cache_pages.isra.29.constprop.41+0x282/0x3e0 [btrfs]
[ 2885.665906]  [<ffffffffa01ec8c0>] ? add_pending_csums.isra.44+0x70/0x70 [btrfs]
[ 2885.673325]  [<ffffffffa01ebde0>] ? btrfs_readpage_end_io_hook+0x270/0x270 [btrfs]
[ 2885.680990]  [<ffffffff8119815c>] ? __writeback_single_inode+0x6c/0x2a0
[ 2885.687792]  [<ffffffffa020892e>] extent_writepages+0x4e/0x70 [btrfs]
[ 2885.694296]  [<ffffffffa01eeec0>] ? can_nocow_odirect+0x330/0x330 [btrfs]
[ 2885.701787]  [<ffffffffa01ec698>] btrfs_writepages+0x28/0x30 [btrfs]
[ 2885.708164]  [<ffffffff811289e3>] do_writepages+0x23/0x40
[ 2885.713679]  [<ffffffff8119813e>] __writeback_single_inode+0x4e/0x2a0
[ 2885.720347]  [<ffffffff81199630>] writeback_sb_inodes+0x280/0x3e0
[ 2885.726528]  [<ffffffff8119982e>] __writeback_inodes_wb+0x9e/0xd0
[ 2885.732766]  [<ffffffff81199a5b>] wb_writeback+0x1fb/0x310
[ 2885.738271]  [<ffffffff81127a21>] ? bdi_dirty_limit+0x31/0xc0
[ 2885.744266]  [<ffffffff8119b3e8>] wb_do_writeback+0x138/0x1e0
[ 2885.750091]  [<ffffffff8119b50a>] bdi_writeback_workfn+0x7a/0x200
[ 2885.756207]  [<ffffffff8105f4ba>] process_one_work+0x1da/0x540
[ 2885.762111]  [<ffffffff8105f44f>] ? process_one_work+0x16f/0x540
[ 2885.768138]  [<ffffffff8106069c>] worker_thread+0x11c/0x370
[ 2885.773836]  [<ffffffff81060580>] ? manage_workers.isra.20+0x2e0/0x2e0
[ 2885.780513]  [<ffffffff8106735a>] kthread+0xea/0xf0
[ 2885.785412]  [<ffffffff81067270>] ? flush_kthread_worker+0x150/0x150
[ 2885.791805]  [<ffffffff8163ff9c>] ret_from_fork+0x7c/0xb0
[ 2885.797227]  [<ffffffff81067270>] ? flush_kthread_worker+0x150/0x150
[ 2885.803694] INFO: lockdep is turned off.
[ 2885.807640] INFO: task btrfs-transacti:23992 blocked for more than 120 seconds.
[ 2885.815086] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2885.823067] btrfs-transacti D ffff880222493f20     0 23992      2 0x00000000
[ 2885.830265]  ffff88020cebfa28 0000000000000046 0000000000000286 ffff880082af52e8
[ 2885.837761]  ffff880222493f20 ffff88020cebffd8 ffff88020cebffd8 ffff88020cebffd8
[ 2885.845281]  ffff880225d29f90 ffff880222493f20 ffff88020cebfa38 ffff880082af51c0
[ 2885.852840] Call Trace:
[ 2885.855306]  [<ffffffff81635ed9>] schedule+0x29/0x70
[ 2885.860393]  [<ffffffffa021ec95>] btrfs_tree_lock+0x75/0x270 [btrfs]
[ 2885.866772]  [<ffffffff81067b50>] ? __init_waitqueue_head+0x60/0x60
[ 2885.873123]  [<ffffffffa01c22cb>] btrfs_lock_root_node+0x3b/0x50 [btrfs]
[ 2885.879881]  [<ffffffffa01c77ba>] btrfs_search_slot+0x87a/0x940 [btrfs]
[ 2885.886517]  [<ffffffff8163760b>] ? _raw_spin_unlock+0x2b/0x40
[ 2885.892465]  [<ffffffffa01c8f28>] btrfs_insert_empty_items+0x78/0xd0 [btrfs]
[ 2885.899574]  [<ffffffffa0238d54>] btrfs_insert_delayed_items+0x84/0x460 [btrfs]
[ 2885.906936]  [<ffffffffa0239938>] __btrfs_run_delayed_items+0xb8/0x1e0 [btrfs]
[ 2885.914285]  [<ffffffffa0239a93>] btrfs_run_delayed_items+0x13/0x20 [btrfs]
[ 2885.921371]  [<ffffffffa01e7c32>] btrfs_flush_all_pending_stuffs+0xa2/0xe0 [btrfs]
[ 2885.929040]  [<ffffffffa01e9511>] btrfs_commit_transaction+0x201/0xa50 [btrfs]
[ 2885.937489]  [<ffffffff81067b50>] ? __init_waitqueue_head+0x60/0x60
[ 2885.943838]  [<ffffffffa01e289d>] transaction_kthread+0x1ad/0x230 [btrfs]
[ 2885.950705]  [<ffffffffa01e26f0>] ? check_leaf.isra.108+0x330/0x330 [btrfs]
[ 2885.957692]  [<ffffffff8106735a>] kthread+0xea/0xf0
[ 2885.962630]  [<ffffffff81067270>] ? flush_kthread_worker+0x150/0x150
[ 2885.969024]  [<ffffffff8163ff9c>] ret_from_fork+0x7c/0xb0
[ 2885.974473]  [<ffffffff81067270>] ? flush_kthread_worker+0x150/0x150
[ 2885.980918] INFO: lockdep is turned off.
[ 2885.984960] INFO: task ceph-osd:24240 blocked for more than 120 seconds.
[ 2885.991751] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2885.999743] ceph-osd        D ffff88022431f830     0 24240  24119 0x00000000
[ 2886.007005]  ffff88021af51be8 0000000000000046 0000000000000282 ffff88022431f8a0
[ 2886.014543]  ffff88021af48000 ffff88021af51fd8 ffff88021af51fd8 ffff88021af51fd8
[ 2886.022103]  ffff880225d2deb0 ffff88021af48000 ffff88021af51bf8 ffff88022431f8a0
[ 2886.029653] Call Trace:
[ 2886.032137]  [<ffffffff81635ed9>] schedule+0x29/0x70
[ 2886.037220]  [<ffffffffa0222275>] log_one_extent.isra.22+0x485/0x750 [btrfs]
[ 2886.044315]  [<ffffffff81067b50>] ? __init_waitqueue_head+0x60/0x60
[ 2886.050672]  [<ffffffffa022486c>] btrfs_log_inode+0x7bc/0x8e0 [btrfs]
[ 2886.057224]  [<ffffffffa02259b2>] btrfs_log_inode_parent+0x192/0x460 [btrfs]
[ 2886.064373]  [<ffffffff8118585c>] ? dget_parent+0x1c/0xd0
[ 2886.069893]  [<ffffffffa02267d6>] btrfs_log_dentry_safe+0x46/0x70 [btrfs]
[ 2886.076754]  [<ffffffffa01fadf5>] btrfs_sync_file+0x155/0x2c0 [btrfs]
[ 2886.083288]  [<ffffffff8118c400>] ? fget+0x90/0xe0
[ 2886.088122]  [<ffffffff8118c48c>] ? fget_light+0x3c/0x130
[ 2886.093602]  [<ffffffff8119ef3d>] do_fsync+0x5d/0x90
[ 2886.098588]  [<ffffffff8131e7fe>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 2886.105069]  [<ffffffff8119f350>] SyS_fsync+0x10/0x20
[ 2886.110215]  [<ffffffff81640042>] system_call_fastpath+0x16/0x1b
[ 2886.116241] INFO: lockdep is turned off.
[ 2886.120232] INFO: task ceph-osd:24324 blocked for more than 120 seconds.
[ 2886.126978] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2886.134868] ceph-osd        D ffff8802228e1f90     0 24324  24119 0x00000000
[ 2886.141997]  ffff8802228eb948 0000000000000046 0000000000000282 ffff880082af52e8
[ 2886.149528]  ffff8802228e1f90 ffff8802228ebfd8 ffff8802228ebfd8 ffff8802228ebfd8
[ 2886.157041]  ffff880225d2bf20 ffff8802228e1f90 ffff8802228eb958 ffff880082af51c0
[ 2886.164550] Call Trace:
[ 2886.167015]  [<ffffffff81635ed9>] schedule+0x29/0x70
[ 2886.172093]  [<ffffffffa021ec95>] btrfs_tree_lock+0x75/0x270 [btrfs]
[ 2886.178471]  [<ffffffff81067b50>] ? __init_waitqueue_head+0x60/0x60
[ 2886.184915]  [<ffffffffa01c22cb>] btrfs_lock_root_node+0x3b/0x50 [btrfs]
[ 2886.191689]  [<ffffffffa01c77ba>] btrfs_search_slot+0x87a/0x940 [btrfs]
[ 2886.198333]  [<ffffffffa01c457c>] ? comp_keys+0x2c/0x30 [btrfs]
[ 2886.204398]  [<ffffffffa01c8f28>] btrfs_insert_empty_items+0x78/0xd0 [btrfs]
[ 2886.211530]  [<ffffffff812ee41b>] ? chksum_update+0x1b/0x30
[ 2886.217138]  [<ffffffffa01dc30c>] insert_with_overflow+0x3c/0x110 [btrfs]
[ 2886.224033]  [<ffffffffa01dc678>] btrfs_insert_xattr_item+0x98/0x190 [btrfs]
[ 2886.231174]  [<ffffffff8163760b>] ? _raw_spin_unlock+0x2b/0x40
[ 2886.237044]  [<ffffffffa02093d9>] ? free_extent_buffer+0x49/0xc0 [btrfs]
[ 2886.243853]  [<ffffffffa02093f1>] ? free_extent_buffer+0x61/0xc0 [btrfs]
[ 2886.250649]  [<ffffffffa0200759>] do_setxattr+0x1a9/0x2d0 [btrfs]
[ 2886.256778]  [<ffffffffa0200a83>] __btrfs_setxattr+0x93/0x110 [btrfs]
[ 2886.263379]  [<ffffffffa0200f60>] btrfs_setxattr+0x90/0xd0 [btrfs]
[ 2886.269642]  [<ffffffff81193cfb>] __vfs_setxattr_noperm+0x7b/0x1c0
[ 2886.275841]  [<ffffffff81193f04>] vfs_setxattr+0xc4/0xd0
[ 2886.281236]  [<ffffffff8119404e>] setxattr+0x13e/0x1e0
[ 2886.286393]  [<ffffffff8118cc92>] ? __fd_install+0x32/0x70
[ 2886.291999]  [<ffffffff811717f7>] ? __sb_start_write+0xe7/0x1b0
[ 2886.298016]  [<ffffffff8118f118>] ? mnt_want_write_file+0x28/0x60
[ 2886.304201]  [<ffffffff8118c48c>] ? fget_light+0x3c/0x130
[ 2886.309753]  [<ffffffff8118f118>] ? mnt_want_write_file+0x28/0x60
[ 2886.315938]  [<ffffffff8118f028>] ? __mnt_want_write+0x58/0x70
[ 2886.321869]  [<ffffffff811944ee>] SyS_fsetxattr+0xbe/0x100
[ 2886.327375]  [<ffffffff81640042>] system_call_fastpath+0x16/0x1b
[ 2886.333517] INFO: lockdep is turned off.
[ 2886.337487] INFO: task ceph-osd:24325 blocked for more than 120 seconds.
[ 2886.344213] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2886.352127] ceph-osd        D ffff8802228e3f20     0 24325  24119 0x00000000
[ 2886.359299]  ffff8802228ed948 0000000000000046 0000000000000282 ffff880082af52e8
[ 2886.366793]  ffff8802228e3f20 ffff8802228edfd8 ffff8802228edfd8 ffff8802228edfd8
[ 2886.374302]  ffff880225d38000 ffff8802228e3f20 ffff8802228ed958 ffff880082af51c0
[ 2886.381820] Call Trace:
[ 2886.384282]  [<ffffffff81635ed9>] schedule+0x29/0x70
[ 2886.389275]  [<ffffffffa021ec95>] btrfs_tree_lock+0x75/0x270 [btrfs]
[ 2886.395716]  [<ffffffff81067b50>] ? __init_waitqueue_head+0x60/0x60
[ 2886.402063]  [<ffffffffa01c22cb>] btrfs_lock_root_node+0x3b/0x50 [btrfs]
[ 2886.408785]  [<ffffffffa01c77ba>] btrfs_search_slot+0x87a/0x940 [btrfs]
[ 2886.415501]  [<ffffffffa01c457c>] ? comp_keys+0x2c/0x30 [btrfs]
[ 2886.421487]  [<ffffffffa01c8f28>] btrfs_insert_empty_items+0x78/0xd0 [btrfs]
[ 2886.428555]  [<ffffffff812ee41b>] ? chksum_update+0x1b/0x30
[ 2886.434205]  [<ffffffffa01dc30c>] insert_with_overflow+0x3c/0x110 [btrfs]
[ 2886.441080]  [<ffffffffa01dc678>] btrfs_insert_xattr_item+0x98/0x190 [btrfs]
[ 2886.448146]  [<ffffffff8163760b>] ? _raw_spin_unlock+0x2b/0x40
[ 2886.454117]  [<ffffffffa02093d9>] ? free_extent_buffer+0x49/0xc0 [btrfs]
[ 2886.460897]  [<ffffffffa02093f1>] ? free_extent_buffer+0x61/0xc0 [btrfs]
[ 2886.467626]  [<ffffffffa0200759>] do_setxattr+0x1a9/0x2d0 [btrfs]
[ 2886.473805]  [<ffffffffa0200a83>] __btrfs_setxattr+0x93/0x110 [btrfs]
[ 2886.480324]  [<ffffffffa0200f60>] btrfs_setxattr+0x90/0xd0 [btrfs]
[ 2886.486521]  [<ffffffff81193cfb>] __vfs_setxattr_noperm+0x7b/0x1c0
[ 2886.492826]  [<ffffffff81193f04>] vfs_setxattr+0xc4/0xd0
[ 2886.498225]  [<ffffffff8119404e>] setxattr+0x13e/0x1e0
[ 2886.503433]  [<ffffffff811717f7>] ? __sb_start_write+0xe7/0x1b0
[ 2886.509370]  [<ffffffff8118f118>] ? mnt_want_write_file+0x28/0x60
[ 2886.515561]  [<ffffffff8118c48c>] ? fget_light+0x3c/0x130
[ 2886.521030]  [<ffffffff8118f118>] ? mnt_want_write_file+0x28/0x60
[ 2886.527174]  [<ffffffff8118de82>] ? mnt_clone_write+0x12/0x30
[ 2886.532989]  [<ffffffff811944ee>] SyS_fsetxattr+0xbe/0x100
[ 2886.538492]  [<ffffffff81640042>] system_call_fastpath+0x16/0x1b
[ 2886.544576] INFO: lockdep is turned off.
[ 2886.548528] INFO: task ceph-osd:24340 blocked for more than 120 seconds.
[ 2886.555295] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2886.563210] ceph-osd        D ffff88020aae1f90     0 24340  24119 0x00000000
[ 2886.570348]  ffff88020aaebbf8 0000000000000046 ffff88020aaebc18 0000000000000246
[ 2886.577847]  ffff88020aae1f90 ffff88020aaebfd8 ffff88020aaebfd8 ffff88020aaebfd8
[ 2886.585345]  ffff880225d3deb0 ffff88020aae1f90 0000000000000000 ffff88021a8b84e0
[ 2886.592993] Call Trace:
[ 2886.595498]  [<ffffffff81635ed9>] schedule+0x29/0x70
[ 2886.600533]  [<ffffffff8163620e>] schedule_preempt_disabled+0xe/0x10
[ 2886.606904]  [<ffffffff8163357c>] mutex_lock_nested+0x15c/0x360
[ 2886.612887]  [<ffffffff8162eae1>] ? lookup_slow+0x38/0xab
[ 2886.618379]  [<ffffffff8162eae1>] lookup_slow+0x38/0xab
[ 2886.623717]  [<ffffffff8117c372>] path_lookupat+0x222/0x790
[ 2886.629306]  [<ffffffff811676e9>] ? kmem_cache_alloc+0x39/0x160
[ 2886.635300]  [<ffffffff8117c914>] filename_lookup+0x34/0xc0
[ 2886.640915]  [<ffffffff8117fbd9>] user_path_at_empty+0x59/0xa0
[ 2886.646767]  [<ffffffff8113e27e>] ? might_fault+0x8e/0xa0
[ 2886.652234]  [<ffffffff8113e235>] ? might_fault+0x45/0xa0
[ 2886.657726]  [<ffffffff811746a6>] ? cp_new_stat+0x116/0x130
[ 2886.663361]  [<ffffffff8117fc31>] user_path_at+0x11/0x20
[ 2886.668691]  [<ffffffff81174921>] vfs_fstatat+0x51/0xb0
[ 2886.674008]  [<ffffffff81174a4b>] vfs_stat+0x1b/0x20
[ 2886.678988]  [<ffffffff81174a65>] SYSC_newstat+0x15/0x30
[ 2886.684347]  [<ffffffff8131e7fe>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 2886.690898]  [<ffffffff81174bbe>] SyS_newstat+0xe/0x10
[ 2886.696068]  [<ffffffff81640042>] system_call_fastpath+0x16/0x1b
[ 2886.702101] INFO: lockdep is turned off.
[ 2886.706075] INFO: task btrfs-endio-wri:26078 blocked for more than 120 seconds.
[ 2886.713450] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2886.721404] btrfs-endio-wri D ffff8802187e1f90     0 26078      2 0x00000000
[ 2886.728564]  ffff8802152db918 0000000000000046 0000000000000282 ffff880082af52e8
[ 2886.736122]  ffff8802187e1f90 ffff8802152dbfd8 ffff8802152dbfd8 ffff8802152dbfd8
[ 2886.743676]  ffff880225d29f90 ffff8802187e1f90 ffff8802152db928 ffff880082af51c0
[ 2886.751311] Call Trace:
[ 2886.753774]  [<ffffffff81635ed9>] schedule+0x29/0x70
[ 2886.758831]  [<ffffffffa021ec95>] btrfs_tree_lock+0x75/0x270 [btrfs]
[ 2886.765259]  [<ffffffff81067b50>] ? __init_waitqueue_head+0x60/0x60
[ 2886.771663]  [<ffffffffa01c22cb>] btrfs_lock_root_node+0x3b/0x50 [btrfs]
[ 2886.778438]  [<ffffffffa01c77ba>] btrfs_search_slot+0x87a/0x940 [btrfs]
[ 2886.785128]  [<ffffffff8163760b>] ? _raw_spin_unlock+0x2b/0x40
[ 2886.791036]  [<ffffffffa01dd198>] btrfs_lookup_file_extent+0x38/0x40 [btrfs]
[ 2886.798119]  [<ffffffffa01fd1be>] __btrfs_drop_extents+0x12e/0xaf0 [btrfs]
[ 2886.805033]  [<ffffffff810a317d>] ? trace_hardirqs_on+0xd/0x10
[ 2886.810900]  [<ffffffff81167788>] ? kmem_cache_alloc+0xd8/0x160
[ 2886.816883]  [<ffffffffa01e8905>] ? join_transaction.isra.32+0x25/0x3c0 [btrfs]
[ 2886.824258]  [<ffffffffa01fe63e>] btrfs_drop_extents+0x6e/0xa0 [btrfs]
[ 2886.830872]  [<ffffffffa01ecf9b>] insert_reserved_file_extent.constprop.59+0x7b/0x2c0 [btrfs]
[ 2886.839450]  [<ffffffffa01f6b3e>] btrfs_finish_ordered_io+0x51e/0x9d0 [btrfs]
[ 2886.846640]  [<ffffffffa02171e1>] ? worker_loop+0xa1/0x580 [btrfs]
[ 2886.852896]  [<ffffffffa01f7005>] finish_ordered_fn+0x15/0x20 [btrfs]
[ 2886.859372]  [<ffffffffa02172aa>] worker_loop+0x16a/0x580 [btrfs]
[ 2886.865504]  [<ffffffffa0217140>] ? btrfs_queue_worker+0x320/0x320 [btrfs]
[ 2886.872456]  [<ffffffff8106735a>] kthread+0xea/0xf0
[ 2886.877371]  [<ffffffff81067270>] ? flush_kthread_worker+0x150/0x150
[ 2886.883763]  [<ffffffff8163ff9c>] ret_from_fork+0x7c/0xb0
[ 2886.889180]  [<ffffffff81067270>] ? flush_kthread_worker+0x150/0x150
[ 2886.895574] INFO: lockdep is turned off.
[ 2886.899513] INFO: task btrfs-endio-wri:380 blocked for more than 120 seconds.
[ 2886.907921] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2886.915833] btrfs-endio-wri D ffff88020c1ebf20     0   380      2 0x00000000
[ 2886.923008]  ffff880203e83a68 0000000000000046 0000000000000286 ffff880082af52e8
[ 2886.930711]  ffff88020c1ebf20 ffff880203e83fd8 ffff880203e83fd8 ffff880203e83fd8
[ 2886.938177]  ffff88020d0e9f90 ffff88020c1ebf20 ffff880203e83a78 ffff880082af51c0
[ 2886.945817] Call Trace:
[ 2886.948280]  [<ffffffff81635ed9>] schedule+0x29/0x70
[ 2886.953326]  [<ffffffffa021ec95>] btrfs_tree_lock+0x75/0x270 [btrfs]
[ 2886.959752]  [<ffffffff81067b50>] ? __init_waitqueue_head+0x60/0x60
[ 2886.966097]  [<ffffffffa01c22cb>] btrfs_lock_root_node+0x3b/0x50 [btrfs]
[ 2886.972931]  [<ffffffffa01c77ba>] btrfs_search_slot+0x87a/0x940 [btrfs]
[ 2886.979633]  [<ffffffffa01c8f28>] btrfs_insert_empty_items+0x78/0xd0 [btrfs]
[ 2886.986750]  [<ffffffffa01ed01f>] insert_reserved_file_extent.constprop.59+0xff/0x2c0 [btrfs]
[ 2886.995495]  [<ffffffffa01f6b3e>] btrfs_finish_ordered_io+0x51e/0x9d0 [btrfs]
[ 2887.002712]  [<ffffffffa02171e1>] ? worker_loop+0xa1/0x580 [btrfs]
[ 2887.008919]  [<ffffffffa01f7005>] finish_ordered_fn+0x15/0x20 [btrfs]
[ 2887.015434]  [<ffffffffa02172aa>] worker_loop+0x16a/0x580 [btrfs]
[ 2887.021585]  [<ffffffffa0217140>] ? btrfs_queue_worker+0x320/0x320 [btrfs]
[ 2887.028478]  [<ffffffff8106735a>] kthread+0xea/0xf0
[ 2887.033400]  [<ffffffff81067270>] ? flush_kthread_worker+0x150/0x150
[ 2887.039779]  [<ffffffff8163ff9c>] ret_from_fork+0x7c/0xb0
[ 2887.045215]  [<ffffffff81067270>] ? flush_kthread_worker+0x150/0x150
[ 2887.051637] INFO: lockdep is turned off.
[ 3007.122737] INFO: task kworker/u64:1:28713 blocked for more than 120 seconds.
[ 3007.129898] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3007.137877] kworker/u64:1   D ffff880222b81f90     0 28713      2 0x00000000
[ 3007.145094] Workqueue: writeback bdi_writeback_workfn (flush-btrfs-12)
[ 3007.151678]  ffff88021c3d7238 0000000000000046 0000000000000286 ffff880082af52e8
[ 3007.159483]  ffff880222b81f90 ffff88021c3d7fd8 ffff88021c3d7fd8 ffff88021c3d7fd8
[ 3007.167063]  ffff880225d2bf20 ffff880222b81f90 ffff88021c3d7248 ffff880082af51c0
[ 3007.174592] Call Trace:
[ 3007.177059]  [<ffffffff81635ed9>] schedule+0x29/0x70
[ 3007.182065]  [<ffffffffa021ec95>] btrfs_tree_lock+0x75/0x270 [btrfs]
[ 3007.188504]  [<ffffffff81067b50>] ? __init_waitqueue_head+0x60/0x60
[ 3007.194919]  [<ffffffffa01c22cb>] btrfs_lock_root_node+0x3b/0x50 [btrfs]
[ 3007.201663]  [<ffffffffa01c77ba>] btrfs_search_slot+0x87a/0x940 [btrfs]
[ 3007.208372]  [<ffffffffa01ff76e>] ? free_extent_map+0x4e/0x90 [btrfs]
[ 3007.214944]  [<ffffffffa01dd198>] btrfs_lookup_file_extent+0x38/0x40 [btrfs]
[ 3007.222144]  [<ffffffffa01fd1be>] __btrfs_drop_extents+0x12e/0xaf0 [btrfs]
[ 3007.229106]  [<ffffffff8107b093>] ? select_task_rq_fair+0x53/0x8a0
[ 3007.235436]  [<ffffffff81167788>] ? kmem_cache_alloc+0xd8/0x160
[ 3007.241394]  [<ffffffffa01fe63e>] btrfs_drop_extents+0x6e/0xa0 [btrfs]
[ 3007.248024]  [<ffffffffa01f16c1>] cow_file_range_inline+0xf1/0x1e0 [btrfs]
[ 3007.255058]  [<ffffffffa01f1ad6>] __cow_file_range+0x326/0x4b0 [btrfs]
[ 3007.261633]  [<ffffffffa01e89ea>] ? join_transaction.isra.32+0x10a/0x3c0 [btrfs]
[ 3007.269142]  [<ffffffffa01f27d5>] cow_file_range+0x95/0xe0 [btrfs]
[ 3007.275452]  [<ffffffffa01f2b5b>] run_delalloc_range+0x33b/0x360 [btrfs]
[ 3007.282316]  [<ffffffffa0208120>] __extent_writepage+0x5e0/0x770 [btrfs]
[ 3007.289136]  [<ffffffff8111cf9f>] ? find_get_pages_tag+0x11f/0x1c0
[ 3007.295491]  [<ffffffff8111ceae>] ? find_get_pages_tag+0x2e/0x1c0
[ 3007.301622]  [<ffffffffa0208532>] extent_write_cache_pages.isra.29.constprop.41+0x282/0x3e0 [btrfs]
[ 3007.310834]  [<ffffffffa01ec8c0>] ? add_pending_csums.isra.44+0x70/0x70 [btrfs]
[ 3007.318319]  [<ffffffffa01ebde0>] ? btrfs_readpage_end_io_hook+0x270/0x270 [btrfs]
[ 3007.325995]  [<ffffffff8119815c>] ? __writeback_single_inode+0x6c/0x2a0
[ 3007.332648]  [<ffffffffa020892e>] extent_writepages+0x4e/0x70 [btrfs]
[ 3007.339197]  [<ffffffffa01eeec0>] ? can_nocow_odirect+0x330/0x330 [btrfs]
[ 3007.346084]  [<ffffffffa01ec698>] btrfs_writepages+0x28/0x30 [btrfs]
[ 3007.352459]  [<ffffffff811289e3>] do_writepages+0x23/0x40
[ 3007.357999]  [<ffffffff8119813e>] __writeback_single_inode+0x4e/0x2a0
[ 3007.364535]  [<ffffffff81199630>] writeback_sb_inodes+0x280/0x3e0
[ 3007.370653]  [<ffffffff8119982e>] __writeback_inodes_wb+0x9e/0xd0
[ 3007.376908]  [<ffffffff81199a5b>] wb_writeback+0x1fb/0x310
[ 3007.382483]  [<ffffffff81127a21>] ? bdi_dirty_limit+0x31/0xc0
[ 3007.388313]  [<ffffffff8119b3e8>] wb_do_writeback+0x138/0x1e0
[ 3007.394196]  [<ffffffff8119b50a>] bdi_writeback_workfn+0x7a/0x200
[ 3007.400323]  [<ffffffff8105f4ba>] process_one_work+0x1da/0x540
[ 3007.406271]  [<ffffffff8105f44f>] ? process_one_work+0x16f/0x540
[ 3007.412356]  [<ffffffff8106069c>] worker_thread+0x11c/0x370
[ 3007.417974]  [<ffffffff81060580>] ? manage_workers.isra.20+0x2e0/0x2e0
[ 3007.424767]  [<ffffffff8106735a>] kthread+0xea/0xf0
[ 3007.429766]  [<ffffffff81067270>] ? flush_kthread_worker+0x150/0x150
[ 3007.436263]  [<ffffffff8163ff9c>] ret_from_fork+0x7c/0xb0
[ 3007.441812]  [<ffffffff81067270>] ? flush_kthread_worker+0x150/0x150
[ 3007.448303] INFO: lockdep is turned off.
[ 3007.452250] INFO: task btrfs-transacti:23992 blocked for more than 120 seconds.
[ 3007.459765] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3007.467753] btrfs-transacti D ffff880222493f20     0 23992      2 0x00000000
[ 3007.475023]  ffff88020cebfa28 0000000000000046 0000000000000286 ffff880082af52e8
[ 3007.482676]  ffff880222493f20 ffff88020cebffd8 ffff88020cebffd8 ffff88020cebffd8
[ 3007.490257]  ffff880225d29f90 ffff880222493f20 ffff88020cebfa38 ffff880082af51c0
[ 3007.497889] Call Trace:
[ 3007.500354]  [<ffffffff81635ed9>] schedule+0x29/0x70
[ 3007.505452]  [<ffffffffa021ec95>] btrfs_tree_lock+0x75/0x270 [btrfs]
[ 3007.511956]  [<ffffffff81067b50>] ? __init_waitqueue_head+0x60/0x60
[ 3007.518358]  [<ffffffffa01c22cb>] btrfs_lock_root_node+0x3b/0x50 [btrfs]
[ 3007.525223]  [<ffffffffa01c77ba>] btrfs_search_slot+0x87a/0x940 [btrfs]
[ 3007.531917]  [<ffffffff8163760b>] ? _raw_spin_unlock+0x2b/0x40
[ 3007.537876]  [<ffffffffa01c8f28>] btrfs_insert_empty_items+0x78/0xd0 [btrfs]
[ 3007.545100]  [<ffffffffa0238d54>] btrfs_insert_delayed_items+0x84/0x460 [btrfs]
[ 3007.552485]  [<ffffffffa0239938>] __btrfs_run_delayed_items+0xb8/0x1e0 [btrfs]
[ 3007.559882]  [<ffffffffa0239a93>] btrfs_run_delayed_items+0x13/0x20 [btrfs]
[ 3007.567061]  [<ffffffffa01e7c32>] btrfs_flush_all_pending_stuffs+0xa2/0xe0 [btrfs]
[ 3007.574806]  [<ffffffffa01e9511>] btrfs_commit_transaction+0x201/0xa50 [btrfs]
[ 3007.582066]  [<ffffffff81067b50>] ? __init_waitqueue_head+0x60/0x60
[ 3007.588543]  [<ffffffffa01e289d>] transaction_kthread+0x1ad/0x230 [btrfs]
[ 3007.595429]  [<ffffffffa01e26f0>] ? check_leaf.isra.108+0x330/0x330 [btrfs]
[ 3007.602417]  [<ffffffff8106735a>] kthread+0xea/0xf0
[ 3007.607443]  [<ffffffff81067270>] ? flush_kthread_worker+0x150/0x150
[ 3007.613879]  [<ffffffff8163ff9c>] ret_from_fork+0x7c/0xb0
[ 3007.619299]  [<ffffffff81067270>] ? flush_kthread_worker+0x150/0x150
[ 3007.625788] INFO: lockdep is turned off.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: hang on 3.9, 3.10-rc5
  2013-06-11 15:43 hang on 3.9, 3.10-rc5 Sage Weil
@ 2013-06-11 16:21 ` Chris Mason
       [not found]   ` <alpine.DEB.2.00.1306121317300.27596@cobra.newdream.net>
  2013-06-18 16:37 ` Josef Bacik
  1 sibling, 1 reply; 23+ messages in thread
From: Chris Mason @ 2013-06-11 16:21 UTC (permalink / raw)
  To: Sage Weil, linux-btrfs

Quoting Sage Weil (2013-06-11 11:43:30)
> I'm also seeing this hang regularly with both 3.9 and 3.10-rc5.  Is this
> is a known problem?  In this case there is no powercycling; just a regular
> ceph-osd workload.

Everyone here is waiting for the root node, but it isn't immediately
clear who has the lock.  log_one_extent is the most likely suspect, but
I can't see how it would be scheduling with the root lock held.

Could you please sysrq-w?

-chris

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: hang on 3.9, 3.10-rc5
       [not found]   ` <alpine.DEB.2.00.1306121317300.27596@cobra.newdream.net>
@ 2013-06-18 15:56     ` Sage Weil
  2013-06-18 16:36       ` Chris Mason
  0 siblings, 1 reply; 23+ messages in thread
From: Sage Weil @ 2013-06-18 15:56 UTC (permalink / raw)
  To: Chris Mason; +Cc: linux-btrfs

On Wed, 12 Jun 2013, Sage Weil wrote:
> On Tue, 11 Jun 2013, Chris Mason wrote:
> > Quoting Sage Weil (2013-06-11 11:43:30)
> > > I'm also seeing this hang regularly with both 3.9 and 3.10-rc5.  Is this
> > > is a known problem?  In this case there is no powercycling; just a regular
> > > ceph-osd workload.
> > 
> > Everyone here is waiting for the root node, but it isn't immediately
> > clear who has the lock.  log_one_extent is the most likely suspect, but
> > I can't see how it would be scheduling with the root lock held.
> > 
> > Could you please sysrq-w?
> 
> Sorry for the slow reply; had to wait for it to reproduce again.  Attached 
> both the original dmesg output and the blocked tasks.  I'll leave the box 
> wedged this time in case there is any other info that can help.

Still seeing this hang with the latest from linus' tree.  Is there another 
tree I should try testing against?

Thanks!
sage


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: hang on 3.9, 3.10-rc5
  2013-06-18 15:56     ` Sage Weil
@ 2013-06-18 16:36       ` Chris Mason
  0 siblings, 0 replies; 23+ messages in thread
From: Chris Mason @ 2013-06-18 16:36 UTC (permalink / raw)
  To: Sage Weil; +Cc: linux-btrfs

Quoting Sage Weil (2013-06-18 11:56:37)
> On Wed, 12 Jun 2013, Sage Weil wrote:
> > On Tue, 11 Jun 2013, Chris Mason wrote:
> > > Quoting Sage Weil (2013-06-11 11:43:30)
> > > > I'm also seeing this hang regularly with both 3.9 and 3.10-rc5.  Is this
> > > > is a known problem?  In this case there is no powercycling; just a regular
> > > > ceph-osd workload.
> > > 
> > > Everyone here is waiting for the root node, but it isn't immediately
> > > clear who has the lock.  log_one_extent is the most likely suspect, but
> > > I can't see how it would be scheduling with the root lock held.
> > > 
> > > Could you please sysrq-w?
> > 
> > Sorry for the slow reply; had to wait for it to reproduce again.  Attached 
> > both the original dmesg output and the blocked tasks.  I'll leave the box 
> > wedged this time in case there is any other info that can help.
> 
> Still seeing this hang with the latest from linus' tree.  Is there another 
> tree I should try testing against?

No, I don't see anything in -next that should fix this.  Trying to
reproduce here...

-chris

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: hang on 3.9, 3.10-rc5
  2013-06-11 15:43 hang on 3.9, 3.10-rc5 Sage Weil
  2013-06-11 16:21 ` Chris Mason
@ 2013-06-18 16:37 ` Josef Bacik
  2013-06-18 16:59   ` Chris Mason
  2013-06-18 17:19   ` Jon Nelson
  1 sibling, 2 replies; 23+ messages in thread
From: Josef Bacik @ 2013-06-18 16:37 UTC (permalink / raw)
  To: Sage Weil; +Cc: linux-btrfs

On Tue, Jun 11, 2013 at 11:43:30AM -0400, Sage Weil wrote:
> I'm also seeing this hang regularly with both 3.9 and 3.10-rc5.  Is this 
> is a known problem?  In this case there is no powercycling; just a regular 
> ceph-osd workload.
> 

Have you gotten sysrq+w?  Can you tell me where

log_one_extent.isra.22+0x485

is on your box?  Thanks,

Josef

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: hang on 3.9, 3.10-rc5
  2013-06-18 16:37 ` Josef Bacik
@ 2013-06-18 16:59   ` Chris Mason
  2013-06-20  3:44     ` Sage Weil
  2013-06-18 17:19   ` Jon Nelson
  1 sibling, 1 reply; 23+ messages in thread
From: Chris Mason @ 2013-06-18 16:59 UTC (permalink / raw)
  To: Josef Bacik, Sage Weil; +Cc: linux-btrfs

Quoting Josef Bacik (2013-06-18 12:37:06)
> On Tue, Jun 11, 2013 at 11:43:30AM -0400, Sage Weil wrote:
> > I'm also seeing this hang regularly with both 3.9 and 3.10-rc5.  Is this 
> > is a known problem?  In this case there is no powercycling; just a regular 
> > ceph-osd workload.
> > 
> 
> Have you gotten sysrq+w?  Can you tell me where

He attached it last week.

> 
> log_one_extent.isra.22+0x485
> 
> is on your box?  Thanks,

It's very suspect that both of the times he logged this
log_one_extent popped up.  That should be: 

                wait_event(ordered->wait, ordered->csum_bytes_left == 0);

But Sage it would definitely help if you could confirm.

If we follow log_one_extent all the way up to btrfs_log_inode:

               } else if (test_and_clear_bit(BTRFS_INODE_COPY_EVERYTHING,
                                              &BTRFS_I(inode)->runtime_flags)) { 
                        if (inode_only == LOG_INODE_ALL)
                                fast_search = true;
                        max_key.type = BTRFS_XATTR_ITEM_KEY;
                        ret = drop_objectid_items(trans, log, path, ino,
                                                  max_key.type);

Now fast_search is true, but we don't jump directly to logging the
extent.  The while loop runs, we hit the first break.  ins_nr is zero.

Then we:
       if (fast_search) {
                btrfs_release_path(dst_path);
                ret = btrfs_log_changed_extents(trans, root, inode, dst_path);
                if (ret) {
                        err = ret;
                        goto out_unlock;
                }


Very long way of saying I think we're one release_path short.  Sage, I
haven't tested this at all yet, I was hoping to trigger it first.

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index c276ac9..c1954b3 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3730,6 +3730,7 @@ next_slot:
 log_extents:
 	if (fast_search) {
 		btrfs_release_path(dst_path);
+		btrfs_release_path(path);
 		ret = btrfs_log_changed_extents(trans, root, inode, dst_path);
 		if (ret) {
 			err = ret;

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: hang on 3.9, 3.10-rc5
  2013-06-18 16:37 ` Josef Bacik
  2013-06-18 16:59   ` Chris Mason
@ 2013-06-18 17:19   ` Jon Nelson
  2013-06-21  1:11     ` Chris Mason
  2013-06-21 14:54     ` Johannes Hirte
  1 sibling, 2 replies; 23+ messages in thread
From: Jon Nelson @ 2013-06-18 17:19 UTC (permalink / raw)
  To: linux-btrfs

Josef Bacik <jbacik <at> fusionio.com> writes:

> 
> On Tue, Jun 11, 2013 at 11:43:30AM -0400, Sage Weil wrote:
> > I'm also seeing this hang regularly with both 3.9 and 3.10-rc5.  Is this 
> > is a known problem?  In this case there is no powercycling; just a regular 
> > ceph-osd workload.

..


I'm able to cause a complete kernel hang by defrag'ing even one 
file on 3.9.X (3.9.0 through 3.9.4, so far).




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: hang on 3.9, 3.10-rc5
  2013-06-18 16:59   ` Chris Mason
@ 2013-06-20  3:44     ` Sage Weil
  2013-06-20 21:56       ` Sage Weil
  0 siblings, 1 reply; 23+ messages in thread
From: Sage Weil @ 2013-06-20  3:44 UTC (permalink / raw)
  To: Chris Mason; +Cc: Josef Bacik, linux-btrfs

Hi Chris,

On Tue, 18 Jun 2013, Chris Mason wrote:
> [...]
> Very long way of saying I think we're one release_path short.  Sage, I
> haven't tested this at all yet, I was hoping to trigger it first.
> 
> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> index c276ac9..c1954b3 100644
> --- a/fs/btrfs/tree-log.c
> +++ b/fs/btrfs/tree-log.c
> @@ -3730,6 +3730,7 @@ next_slot:
>  log_extents:
>  	if (fast_search) {
>  		btrfs_release_path(dst_path);
> +		btrfs_release_path(path);
>  		ret = btrfs_log_changed_extents(trans, root, inode, dst_path);
>  		if (ret) {
>  			err = ret;

This seems to be doing the trick.  I'll keep testing overnight, but so far 
so good!

Thanks-
sage


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: hang on 3.9, 3.10-rc5
  2013-06-20  3:44     ` Sage Weil
@ 2013-06-20 21:56       ` Sage Weil
  2013-06-21  0:42         ` Chris Mason
  0 siblings, 1 reply; 23+ messages in thread
From: Sage Weil @ 2013-06-20 21:56 UTC (permalink / raw)
  To: Chris Mason; +Cc: Josef Bacik, linux-btrfs

On Wed, 19 Jun 2013, Sage Weil wrote:
> Hi Chris,
> 
> On Tue, 18 Jun 2013, Chris Mason wrote:
> > [...]
> > Very long way of saying I think we're one release_path short.  Sage, I
> > haven't tested this at all yet, I was hoping to trigger it first.
> > 
> > diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> > index c276ac9..c1954b3 100644
> > --- a/fs/btrfs/tree-log.c
> > +++ b/fs/btrfs/tree-log.c
> > @@ -3730,6 +3730,7 @@ next_slot:
> >  log_extents:
> >  	if (fast_search) {
> >  		btrfs_release_path(dst_path);
> > +		btrfs_release_path(path);
> >  		ret = btrfs_log_changed_extents(trans, root, inode, dst_path);
> >  		if (ret) {
> >  			err = ret;
> 
> This seems to be doing the trick.  I'll keep testing overnight, but so far 
> so good!

...and it's still holding up well in QA.

Thanks, Chris!
sage

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: hang on 3.9, 3.10-rc5
  2013-06-20 21:56       ` Sage Weil
@ 2013-06-21  0:42         ` Chris Mason
  2013-06-21  1:00           ` Sage Weil
  2013-08-09 18:04           ` Sage Weil
  0 siblings, 2 replies; 23+ messages in thread
From: Chris Mason @ 2013-06-21  0:42 UTC (permalink / raw)
  To: Sage Weil; +Cc: Josef Bacik, linux-btrfs

Quoting Sage Weil (2013-06-20 17:56:19)
> On Wed, 19 Jun 2013, Sage Weil wrote:
> > Hi Chris,
> > 
> > On Tue, 18 Jun 2013, Chris Mason wrote:
> > > [...]
> > > Very long way of saying I think we're one release_path short.  Sage, I
> > > haven't tested this at all yet, I was hoping to trigger it first.
> > > 
> > > diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> > > index c276ac9..c1954b3 100644
> > > --- a/fs/btrfs/tree-log.c
> > > +++ b/fs/btrfs/tree-log.c
> > > @@ -3730,6 +3730,7 @@ next_slot:
> > >  log_extents:
> > >     if (fast_search) {
> > >             btrfs_release_path(dst_path);
> > > +           btrfs_release_path(path);
> > >             ret = btrfs_log_changed_extents(trans, root, inode, dst_path);
> > >             if (ret) {
> > >                     err = ret;
> > 
> > This seems to be doing the trick.  I'll keep testing overnight, but so far 
> > so good!
> 
> ...and it's still holding up well in QA.

Awesome, thanks for getting the traces for us.  Looks like this one has
been around since v3.7, so I'm not going to try and sneak it into the
3.10 final.  I'll have it in the next merge window and for stable.

-chris


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: hang on 3.9, 3.10-rc5
  2013-06-21  0:42         ` Chris Mason
@ 2013-06-21  1:00           ` Sage Weil
  2013-06-21  1:04             ` Chris Mason
  2013-08-09 18:04           ` Sage Weil
  1 sibling, 1 reply; 23+ messages in thread
From: Sage Weil @ 2013-06-21  1:00 UTC (permalink / raw)
  To: Chris Mason; +Cc: Josef Bacik, linux-btrfs

On Thu, 20 Jun 2013, Chris Mason wrote:
> Quoting Sage Weil (2013-06-20 17:56:19)
> > On Wed, 19 Jun 2013, Sage Weil wrote:
> > > Hi Chris,
> > > 
> > > On Tue, 18 Jun 2013, Chris Mason wrote:
> > > > [...]
> > > > Very long way of saying I think we're one release_path short.  Sage, I
> > > > haven't tested this at all yet, I was hoping to trigger it first.
> > > > 
> > > > diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> > > > index c276ac9..c1954b3 100644
> > > > --- a/fs/btrfs/tree-log.c
> > > > +++ b/fs/btrfs/tree-log.c
> > > > @@ -3730,6 +3730,7 @@ next_slot:
> > > >  log_extents:
> > > >     if (fast_search) {
> > > >             btrfs_release_path(dst_path);
> > > > +           btrfs_release_path(path);
> > > >             ret = btrfs_log_changed_extents(trans, root, inode, dst_path);
> > > >             if (ret) {
> > > >                     err = ret;
> > > 
> > > This seems to be doing the trick.  I'll keep testing overnight, but so far 
> > > so good!
> > 
> > ...and it's still holding up well in QA.
> 
> Awesome, thanks for getting the traces for us.  Looks like this one has
> been around since v3.7, so I'm not going to try and sneak it into the
> 3.10 final.  I'll have it in the next merge window and for stable.

Weird, these same tests have been running on it nightly for ages and it 
seems like these failures just started with 3.9.  Perhaps some other 
change made it hang when it didn't before?

In any case, thanks!
sage

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: hang on 3.9, 3.10-rc5
  2013-06-21  1:00           ` Sage Weil
@ 2013-06-21  1:04             ` Chris Mason
  2013-06-21  1:09               ` Sage Weil
  0 siblings, 1 reply; 23+ messages in thread
From: Chris Mason @ 2013-06-21  1:04 UTC (permalink / raw)
  To: Sage Weil; +Cc: Josef Bacik, linux-btrfs

Quoting Sage Weil (2013-06-20 21:00:21)
> On Thu, 20 Jun 2013, Chris Mason wrote:
> > Awesome, thanks for getting the traces for us.  Looks like this one has
> > been around since v3.7, so I'm not going to try and sneak it into the
> > 3.10 final.  I'll have it in the next merge window and for stable.
> 
> Weird, these same tests have been running on it nightly for ages and it 
> seems like these failures just started with 3.9.  Perhaps some other 
> change made it hang when it didn't before?
> 

It's always possible, there are a ton of moving pieces here.  The
wait_event you were hung on was waiting for crcs to finish, and that
part at least isn't new.

Somewhat unrelated, but are you still using notreelog?

-chris


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: hang on 3.9, 3.10-rc5
  2013-06-21  1:04             ` Chris Mason
@ 2013-06-21  1:09               ` Sage Weil
  0 siblings, 0 replies; 23+ messages in thread
From: Sage Weil @ 2013-06-21  1:09 UTC (permalink / raw)
  To: Chris Mason; +Cc: Josef Bacik, linux-btrfs

On Thu, 20 Jun 2013, Chris Mason wrote:
> Quoting Sage Weil (2013-06-20 21:00:21)
> > On Thu, 20 Jun 2013, Chris Mason wrote:
> > > Awesome, thanks for getting the traces for us.  Looks like this one has
> > > been around since v3.7, so I'm not going to try and sneak it into the
> > > 3.10 final.  I'll have it in the next merge window and for stable.
> > 
> > Weird, these same tests have been running on it nightly for ages and it 
> > seems like these failures just started with 3.9.  Perhaps some other 
> > change made it hang when it didn't before?
> > 
> 
> It's always possible, there are a ton of moving pieces here.  The
> wait_event you were hung on was waiting for crcs to finish, and that
> part at least isn't new.

K.  There was also a shift of writes to leveldb (which does the mmap 
thing), so that may explain the change in behavior.

> Somewhat unrelated, but are you still using notreelog?

Nope, just noatime.

Thanks-
sage

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: hang on 3.9, 3.10-rc5
  2013-06-18 17:19   ` Jon Nelson
@ 2013-06-21  1:11     ` Chris Mason
  2013-06-21  1:46       ` Jon Nelson
  2013-06-21 14:54     ` Johannes Hirte
  1 sibling, 1 reply; 23+ messages in thread
From: Chris Mason @ 2013-06-21  1:11 UTC (permalink / raw)
  To: Jon Nelson, linux-btrfs

Quoting Jon Nelson (2013-06-18 13:19:04)
> Josef Bacik <jbacik <at> fusionio.com> writes:
> 
> > 
> > On Tue, Jun 11, 2013 at 11:43:30AM -0400, Sage Weil wrote:
> > > I'm also seeing this hang regularly with both 3.9 and 3.10-rc5.  Is this 
> > > is a known problem?  In this case there is no powercycling; just a regular 
> > > ceph-osd workload.
> 
> ..
> 
> 
> I'm able to cause a complete kernel hang by defrag'ing even one 
> file on 3.9.X (3.9.0 through 3.9.4, so far).

I'm not able to reproduce this here.  Could you please capture the
output from sysrq-w during the hang?  (It will all go to your dmesg)

-chris


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: hang on 3.9, 3.10-rc5
  2013-06-21  1:11     ` Chris Mason
@ 2013-06-21  1:46       ` Jon Nelson
  2013-06-21  5:40         ` Chris Murphy
                           ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Jon Nelson @ 2013-06-21  1:46 UTC (permalink / raw)
  To: Chris Mason; +Cc: Jon Nelson, linux-btrfs

Is this what you are looking for?
After this, the CPU gets "stuck" and I have to reboot.


[360491.932226] ------------[ cut here ]------------
[360491.932261] kernel BUG at
/home/abuild/rpmbuild/BUILD/kernel-desktop-3.9.6/linux-3.9/fs/btrfs/ctree.c:1144!
[360491.932312] invalid opcode: 0000 [#1] PREEMPT SMP
[360491.932344] Modules linked in: xfs nilfs2 jfs usb_storage
nls_iso8859_1 nls_cp437 vfat fat mmc_block nfsv4 auth_rpcgss nfs
fscache lockd sunrpc tun snd_usb_audio snd_usbmidi_lib snd_rawmidi
snd_seq_device fuse xt_tcpudp xt_pkttype xt_LOG xt_limit af_packet
ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ipt_REJECT
iptable_raw xt_CT iptable_filter ip6table_mangle
nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4
nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack ip6table_filter
ip6_tables x_tables arc4 iwldvm mac80211 snd_hda_codec_hdmi
snd_hda_codec_conexant iTCO_wdt iTCO_vendor_support mperf coretemp
kvm_intel kvm snd_hda_intel snd_hda_codec microcode snd_hwdep snd_pcm
thinkpad_acpi snd_timer joydev pcspkr sr_mod snd tpm_tis iwlwifi cdrom
e1000e wmi tpm battery ac tpm_bios cfg80211 sdhci_pci soundcore ptp
sdhci snd_page_alloc i2c_i801 rfkill pps_core mmc_core lpc_ich
mfd_core tcp_westwood sg autofs4 btrfs raid6_pq zlib_deflate xor
libcrc32c sha256_generic dm_crypt dm_mod crc32_pclmul
ghash_clmulni_intel crc32c_intel aesni_intel ablk_helper cryptd lrw
aes_x86_64 xts gf128mul thermal i915 drm_kms_helper drm i2c_algo_bit
video button processor thermal_sys scsi_dh_rdac scsi_dh_hp_sw
scsi_dh_emc scsi_dh_alua scsi_dh ata_generic ata_piix
[360491.933095] CPU 3
[360491.933110] Pid: 22166, comm: btrfs-endio-wri Not tainted
3.9.6-1.g8ead728-desktop #1 LENOVO 4239CTO/4239CTO
[360491.933161] RIP: 0010:[<ffffffffa022075b>]  [<ffffffffa022075b>]
__tree_mod_log_rewind+0x23b/0x240 [btrfs]
[360491.933225] RSP: 0018:ffff88014cad7888  EFLAGS: 00010297
[360491.933253] RAX: 0000000000000000 RBX: ffff880065705b40 RCX:
ffff88014cad7828
[360491.933289] RDX: 000000000247cc54 RSI: ffff88014949e821 RDI:
ffff8801bf916640
[360491.933326] RBP: ffff880073048490 R08: 0000000000001000 R09:
ffff88014cad7838
[360491.933362] R10: 0000000000000000 R11: 0000000000000000 R12:
ffff8801c0556640
[360491.933398] R13: 000000000001731e R14: 000000000000003d R15:
ffff880158b76340
[360491.933435] FS:  0000000000000000(0000) GS:ffff88021e2c0000(0000)
knlGS:0000000000000000
[360491.933476] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[360491.933506] CR2: 00007f0a360017f8 CR3: 000000015be03000 CR4:
00000000000407e0
[360491.933543] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[360491.933579] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[360491.933617] Process btrfs-endio-wri (pid: 22166, threadinfo
ffff88014cad6000, task ffff8801a5f8a6c0)
[360491.933662] Stack:
[360491.933674]  ffff88011b573800 ffff88010126e7f0 0000000000000001
0000160000000000
[360491.933719]  ffff880073048490 ffffffffa0228d45 6db6db6db6db6db7
ffff8801c0556640
[360491.933763]  ffff88020e764000 ffff880000000000 000000000001731e
ffff880197cb8158
[360491.933807] Call Trace:
[360491.933848]  [<ffffffffa0228d45>] btrfs_search_old_slot+0x635/0x950 [btrfs]
[360491.933909]  [<ffffffffa02a1ec6>]
__resolve_indirect_refs+0x156/0x640 [btrfs]
[360491.934044]  [<ffffffffa02a2e0c>] find_parent_nodes+0x95c/0x1050 [btrfs]
[360491.934176]  [<ffffffffa02a3592>] btrfs_find_all_roots+0x92/0x100 [btrfs]
[360491.934307]  [<ffffffffa02a401e>] iterate_extent_inodes+0x16e/0x370 [btrfs]
[360491.934440]  [<ffffffffa02a42b8>]
iterate_inodes_from_logical+0x98/0xc0 [btrfs]
[360491.934572]  [<ffffffffa024c1c8>] record_extent_backrefs+0x68/0xe0 [btrfs]
[360491.934652]  [<ffffffffa0256d80>]
btrfs_finish_ordered_io+0x150/0x990 [btrfs]
[360491.934739]  [<ffffffffa0276ef3>] worker_loop+0x153/0x560 [btrfs]
[360491.934833]  [<ffffffff810697c3>] kthread+0xb3/0xc0
[360491.934864]  [<ffffffff815dc6bc>] ret_from_fork+0x7c/0xb0
[360491.934896] DWARF2 unwinder stuck at ret_from_fork+0x7c/0xb0
[360491.934925]
[360491.934934] Leftover inexact backtrace:
[360491.934934]
[360491.934965]  [<ffffffff81069710>] ? kthread_create_on_node+0x120/0x120
[360491.934999] Code: c1 48 63 43 58 48 89 c2 48 c1 e2 05 48 8d 54 10
65 48 63 43 2c 48 89 c6 48 c1 e6 05 48 8d 74 30 65 e8 3a af 04 00 e9
b3 fe ff ff <0f> 0b 0f 0b 90 41 57 41 56 41 55 41 54 55 48 89 fd 53 48
83 ec
[360491.935188] RIP  [<ffffffffa022075b>]
__tree_mod_log_rewind+0x23b/0x240 [btrfs]
[360491.935233]  RSP <ffff88014cad7888>
[360491.946047] ---[ end trace 1475a0830dcadf9c ]---
[360491.946051] note: btrfs-endio-wri[22166] exited with preempt_count 1

On Thu, Jun 20, 2013 at 8:11 PM, Chris Mason <chris.mason@fusionio.com> wrote:
> Quoting Jon Nelson (2013-06-18 13:19:04)
>> Josef Bacik <jbacik <at> fusionio.com> writes:
>>
>> >
>> > On Tue, Jun 11, 2013 at 11:43:30AM -0400, Sage Weil wrote:
>> > > I'm also seeing this hang regularly with both 3.9 and 3.10-rc5.  Is this
>> > > is a known problem?  In this case there is no powercycling; just a regular
>> > > ceph-osd workload.
>>
>> ..
>>
>>
>> I'm able to cause a complete kernel hang by defrag'ing even one
>> file on 3.9.X (3.9.0 through 3.9.4, so far).
>
> I'm not able to reproduce this here.  Could you please capture the
> output from sysrq-w during the hang?  (It will all go to your dmesg)
>
> -chris
>



-- 
Jon
Software Blacksmith

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: hang on 3.9, 3.10-rc5
  2013-06-21  1:46       ` Jon Nelson
@ 2013-06-21  5:40         ` Chris Murphy
  2013-06-21  6:42         ` Clemens Eisserer
  2013-06-21 11:30         ` Chris Mason
  2 siblings, 0 replies; 23+ messages in thread
From: Chris Murphy @ 2013-06-21  5:40 UTC (permalink / raw)
  To: Jon Nelson; +Cc: linux-btrfs


On Jun 20, 2013, at 7:46 PM, Jon Nelson <jnelson@jamponi.net> wrote:

> Is this what you are looking for?

If you're able to reproduce while you're remoted in via ssh, then if you get the dmesg at least you won't have to spend time trying to save it somewhere since you'll have it on the remote system's terminal window.

https://www.kernel.org/doc/Documentation/sysrq.txt

So basically:
echo w > /proc/sysrq-trigger
dmesg




Chris Murphy


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: hang on 3.9, 3.10-rc5
  2013-06-21  1:46       ` Jon Nelson
  2013-06-21  5:40         ` Chris Murphy
@ 2013-06-21  6:42         ` Clemens Eisserer
  2013-06-25 15:52           ` Liu Bo
  2013-06-21 11:30         ` Chris Mason
  2 siblings, 1 reply; 23+ messages in thread
From: Clemens Eisserer @ 2013-06-21  6:42 UTC (permalink / raw)
  To: linux-btrfs

Hi Jon,

> Is this what you are looking for?
> After this, the CPU gets "stuck" and I have to reboot.

This issue has already been reported:
https://bugzilla.kernel.org/show_bug.cgi?id=59451

Regards, Clemens

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: hang on 3.9, 3.10-rc5
  2013-06-21  1:46       ` Jon Nelson
  2013-06-21  5:40         ` Chris Murphy
  2013-06-21  6:42         ` Clemens Eisserer
@ 2013-06-21 11:30         ` Chris Mason
  2013-06-21 13:14           ` Jon Nelson
  2 siblings, 1 reply; 23+ messages in thread
From: Chris Mason @ 2013-06-21 11:30 UTC (permalink / raw)
  To: Jon Nelson; +Cc: Jon Nelson, linux-btrfs

Quoting Jon Nelson (2013-06-20 21:46:46)
> Is this what you are looking for?
> After this, the CPU gets "stuck" and I have to reboot.
> 
> 
> [360491.932226] ------------[ cut here ]------------
> [360491.932261] kernel BUG at
> /home/abuild/rpmbuild/BUILD/kernel-desktop-3.9.6/linux-3.9/fs/btrfs/ctree.c:1144!

Aha, this is not a hang but a crash.  I know the end result is the same
(you push the reset button) but for a crash we always want to see this
oops message.

I'll try to reproduce (and check out the bugzilla too).  Do you have
compression on?

-chris

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: hang on 3.9, 3.10-rc5
  2013-06-21 11:30         ` Chris Mason
@ 2013-06-21 13:14           ` Jon Nelson
  0 siblings, 0 replies; 23+ messages in thread
From: Jon Nelson @ 2013-06-21 13:14 UTC (permalink / raw)
  To: Chris Mason; +Cc: Jon Nelson, linux-btrfs

On Fri, Jun 21, 2013 at 6:30 AM, Chris Mason <chris.mason@fusionio.com> wrote:
> Quoting Jon Nelson (2013-06-20 21:46:46)
>> Is this what you are looking for?
>> After this, the CPU gets "stuck" and I have to reboot.
>>
>>
>> [360491.932226] ------------[ cut here ]------------
>> [360491.932261] kernel BUG at
>> /home/abuild/rpmbuild/BUILD/kernel-desktop-3.9.6/linux-3.9/fs/btrfs/ctree.c:1144!
>
> Aha, this is not a hang but a crash.  I know the end result is the same
> (you push the reset button) but for a crash we always want to see this
> oops message.

In my case, "pick a file, any file". I randomly choose a file, and
this most recent time I chose $HOME/.config/Clementine/jamendo.db.

> I'll try to reproduce (and check out the bugzilla too).  Do you have>
> -chris
> compression on?

I don't have compression on:

/blah/blah on / type btrfs (rw,relatime,ssd,space_cache,enospc_debug)

But those mount opts don't seem to matter.
I checked out the bugzilla entry, it sure looks similar!


--
Jon
Software Blacksmith

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: hang on 3.9, 3.10-rc5
  2013-06-18 17:19   ` Jon Nelson
  2013-06-21  1:11     ` Chris Mason
@ 2013-06-21 14:54     ` Johannes Hirte
  1 sibling, 0 replies; 23+ messages in thread
From: Johannes Hirte @ 2013-06-21 14:54 UTC (permalink / raw)
  To: linux-btrfs

On Tue, 18 Jun 2013 17:19:04 +0000 (UTC)
Jon Nelson <jnelson+btrfs@jamponi.net> wrote:

> Josef Bacik <jbacik <at> fusionio.com> writes:
> 
> > 
> > On Tue, Jun 11, 2013 at 11:43:30AM -0400, Sage Weil wrote:
> > > I'm also seeing this hang regularly with both 3.9 and 3.10-rc5.
> > > Is this is a known problem?  In this case there is no
> > > powercycling; just a regular ceph-osd workload.
> 
> ..
> 
> 
> I'm able to cause a complete kernel hang by defrag'ing even one 
> file on 3.9.X (3.9.0 through 3.9.4, so far).

I see similar behavior with autodefrag enabled. When fetching mails
with claws (piped through bogofilter) the whole system got stuck more
or less. I can switch between the tasks but everything involving I/O is
hanging. Most time I was able to solve this with the sync command in a
shell. I got only one time a backtrace from hung task checker:

Jun 20 12:37:47 localhost kernel: INFO: task btrfs-cleaner:771 blocked for more than 120 seconds.
Jun 20 12:37:47 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 20 12:37:47 localhost kernel: btrfs-cleaner   D ffff88011fc91740     0   771      2 0x00000000
Jun 20 12:37:47 localhost kernel: ffff88011aaff530 0000000000000002 ffff880119929fd8 0000000000011740
Jun 20 12:37:47 localhost kernel: ffff880119929fd8 ffff88011aabddc0 ffff880119928000 ffff880119929cb8
Jun 20 12:37:47 localhost kernel: ffff8800aaac22b4 ffff8800aaac22b0 0000000000000000 ffff8800aaac22b8
Jun 20 12:37:47 localhost kernel: Call Trace:
Jun 20 12:37:47 localhost kernel: [<ffffffff81487fd4>] ? schedule_preempt_disabled+0x19/0x1e
Jun 20 12:37:47 localhost kernel: [<ffffffff81486fa1>] ? __mutex_lock_common.isra.9+0x19d/0x283
Jun 20 12:37:47 localhost kernel: [<ffffffff8109856d>] ? ondemand_readahead+0x15d/0x200
Jun 20 12:37:47 localhost kernel: [<ffffffff81486dbf>] ? mutex_lock+0xe/0x1d
Jun 20 12:37:47 localhost kernel: [<ffffffff81171579>] ? btrfs_defrag_file+0x41a/0xa4e
Jun 20 12:37:47 localhost kernel: [<ffffffff814889b0>] ? _raw_spin_unlock+0x27/0x31
Jun 20 12:37:47 localhost kernel: [<ffffffff81159eb9>] ? btrfs_run_defrag_inodes+0x1f7/0x2d3
Jun 20 12:37:47 localhost kernel: [<ffffffff8115100e>] ? btrfs_run_delayed_iputs+0x44/0xbe
Jun 20 12:37:47 localhost kernel: [<ffffffff811459d5>] ? cleaner_kthread+0x89/0xf3
Jun 20 12:37:47 localhost kernel: [<ffffffff8114594c>] ? transaction_kthread+0x17a/0x17a
Jun 20 12:37:47 localhost kernel: [<ffffffff81042c00>] ? kthread+0x7d/0x85
Jun 20 12:37:47 localhost kernel: [<ffffffff81040000>] ? thaw_workqueues+0xd3/0xff
Jun 20 12:37:47 localhost kernel: [<ffffffff81042b83>] ? __kthread_parkme+0x59/0x59
Jun 20 12:37:47 localhost kernel: [<ffffffff8148942c>] ? ret_from_fork+0x7c/0xb0
Jun 20 12:37:47 localhost kernel: [<ffffffff81042b83>] ? __kthread_parkme+0x59/0x59
Jun 20 12:37:47 localhost kernel: INFO: task konqueror:2384 blocked for more than 120 seconds.
Jun 20 12:37:47 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 20 12:37:47 localhost kernel: konqueror       D ffff88011fc11740     0  2384   2133 0x00000000
Jun 20 12:37:47 localhost kernel: ffff8800c82dce20 0000000000000002 ffff8800c0c55fd8 0000000000011740
Jun 20 12:37:47 localhost kernel: ffff8800c0c55fd8 ffffffff81a10400 ffff8800c0c54000 ffff8800c0c55e90
Jun 20 12:37:47 localhost kernel: ffff8800c40aeb44 ffff8800c40aeb40 0000000000000000 ffff8800c40aeb48
Jun 20 12:37:47 localhost kernel: Call Trace:
Jun 20 12:37:47 localhost kernel: [<ffffffff81487fd4>] ? schedule_preempt_disabled+0x19/0x1e
Jun 20 12:37:47 localhost kernel: [<ffffffff81486fa1>] ? __mutex_lock_common.isra.9+0x19d/0x283
Jun 20 12:37:47 localhost kernel: [<ffffffff81486dbf>] ? mutex_lock+0xe/0x1d
Jun 20 12:37:47 localhost kernel: [<ffffffff810d3fb1>] ? do_unlinkat+0x88/0x17a
Jun 20 12:37:47 localhost kernel: [<ffffffff814889b0>] ? _raw_spin_unlock+0x27/0x31
Jun 20 12:37:47 localhost kernel: [<ffffffff8100ae75>] ? syscall_trace_enter+0xcf/0x145
Jun 20 12:37:47 localhost kernel: [<ffffffff814896cb>] ? tracesys+0xdd/0xe2
Jun 20 12:37:47 localhost kernel: INFO: task konqueror:2385 blocked for more than 120 seconds.
Jun 20 12:37:47 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 20 12:37:47 localhost kernel: konqueror       D ffff88011fc91740     0  2385   2133 0x00000000
Jun 20 12:37:47 localhost kernel: ffff8800c82dd5f0 0000000000000002 ffff8800c0d1dfd8 0000000000011740
Jun 20 12:37:47 localhost kernel: ffff8800c0d1dfd8 ffff88011aabddc0 ffff8800c0d1c000 ffff8800c0d1de90
Jun 20 12:37:47 localhost kernel: ffff8800c40aeb44 ffff8800c40aeb40 0000000000000000 ffff8800c40aeb48
Jun 20 12:37:47 localhost kernel: Call Trace:
Jun 20 12:37:47 localhost kernel: [<ffffffff81487fd4>] ? schedule_preempt_disabled+0x19/0x1e
Jun 20 12:37:47 localhost kernel: [<ffffffff81486fa1>] ? __mutex_lock_common.isra.9+0x19d/0x283
Jun 20 12:37:47 localhost kernel: [<ffffffff81486dbf>] ? mutex_lock+0xe/0x1d
Jun 20 12:37:47 localhost kernel: [<ffffffff810d3fb1>] ? do_unlinkat+0x88/0x17a
Jun 20 12:37:47 localhost kernel: [<ffffffff814889b0>] ? _raw_spin_unlock+0x27/0x31
Jun 20 12:37:47 localhost kernel: [<ffffffff8100ae75>] ? syscall_trace_enter+0xcf/0x145
Jun 20 12:37:47 localhost kernel: [<ffffffff814896cb>] ? tracesys+0xdd/0xe2
Jun 20 12:37:47 localhost kernel: INFO: task konqueror:2393 blocked for more than 120 seconds.
Jun 20 12:37:47 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 20 12:37:47 localhost kernel: konqueror       D ffff88011fc91740     0  2393   2133 0x00000000
Jun 20 12:37:47 localhost kernel: ffff8800c82df530 0000000000000002 ffff8800c0daffd8 0000000000011740
Jun 20 12:37:47 localhost kernel: ffff8800c0daffd8 ffff88011aabddc0 ffff8800ddd9ade0 ffff88011816e1e8
Jun 20 12:37:47 localhost kernel: 0000000000000000 0000000000000000 00000000003c0000 0000000000000000
Jun 20 12:37:47 localhost kernel: Call Trace:
Jun 20 12:37:47 localhost kernel: [<ffffffff8114aa11>] ? wait_current_trans.isra.19+0xaa/0xdc
Jun 20 12:37:47 localhost kernel: [<ffffffff81043492>] ? finish_wait+0x60/0x60
Jun 20 12:37:47 localhost kernel: [<ffffffff8114c663>] ? start_transaction+0x3b9/0x3e9
Jun 20 12:37:47 localhost kernel: [<ffffffff8114d71b>] ? __unlink_start_trans+0x46/0x37f
Jun 20 12:37:47 localhost kernel: [<ffffffff810d9b18>] ? __d_lookup+0xdb/0xed
Jun 20 12:37:47 localhost kernel: [<ffffffff810d242d>] ? path_lookupat+0x6d/0x2a7
Jun 20 12:37:47 localhost kernel: [<ffffffff811537be>] ? btrfs_unlink+0x1f/0x91
Jun 20 12:37:47 localhost kernel: [<ffffffff810d3ee2>] ? vfs_unlink+0x61/0xa8
Jun 20 12:37:47 localhost kernel: [<ffffffff810d4000>] ? do_unlinkat+0xd7/0x17a
Jun 20 12:37:47 localhost kernel: [<ffffffff814889b0>] ? _raw_spin_unlock+0x27/0x31
Jun 20 12:37:47 localhost kernel: [<ffffffff8100ae75>] ? syscall_trace_enter+0xcf/0x145
Jun 20 12:37:47 localhost kernel: [<ffffffff814896cb>] ? tracesys+0xdd/0xe2
Jun 20 12:37:47 localhost kernel: INFO: task konqueror:2398 blocked for more than 120 seconds.
Jun 20 12:37:47 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 20 12:37:47 localhost kernel: konqueror       D ffff880106fc0000     0  2398   2133 0x00000000
Jun 20 12:37:47 localhost kernel: ffff880106fc0000 0000000000000002 ffff8800c0fa3fd8 0000000000011740
Jun 20 12:37:47 localhost kernel: ffff8800c0fa3fd8 ffff8800c82dd5f0 ffff8800c0fa2000 ffff8800c0fa3e90
Jun 20 12:37:47 localhost kernel: ffff8800c40aeb44 ffff8800c40aeb40 0000000000000000 ffff8800c40aeb48
Jun 20 12:37:47 localhost kernel: Call Trace:
Jun 20 12:37:47 localhost kernel: [<ffffffff81487fd4>] ? schedule_preempt_disabled+0x19/0x1e
Jun 20 12:37:47 localhost kernel: [<ffffffff81486fa1>] ? __mutex_lock_common.isra.9+0x19d/0x283
Jun 20 12:37:47 localhost kernel: [<ffffffff81486dbf>] ? mutex_lock+0xe/0x1d
Jun 20 12:37:47 localhost kernel: [<ffffffff810d3fb1>] ? do_unlinkat+0x88/0x17a
Jun 20 12:37:47 localhost kernel: [<ffffffff814889b0>] ? _raw_spin_unlock+0x27/0x31
Jun 20 12:37:47 localhost kernel: [<ffffffff8100ae75>] ? syscall_trace_enter+0xcf/0x145
Jun 20 12:37:47 localhost kernel: [<ffffffff814896cb>] ? tracesys+0xdd/0xe2
Jun 20 12:37:47 localhost kernel: INFO: task konqueror:2400 blocked for more than 120 seconds.
Jun 20 12:37:47 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 20 12:37:47 localhost kernel: konqueror       D ffff88011fc91740     0  2400   2133 0x00000000
Jun 20 12:37:47 localhost kernel: ffff8800d8ebc650 0000000000000002 ffff8800bece9fd8 0000000000011740
Jun 20 12:37:47 localhost kernel: ffff8800bece9fd8 ffff88011aabddc0 ffff8800bece8000 ffff8800bece9e90
Jun 20 12:37:47 localhost kernel: ffff8800c40aeb44 ffff8800c40aeb40 0000000000000000 ffff8800c40aeb48
Jun 20 12:37:47 localhost kernel: Call Trace:
Jun 20 12:37:47 localhost kernel: [<ffffffff81487fd4>] ? schedule_preempt_disabled+0x19/0x1e
Jun 20 12:37:47 localhost kernel: [<ffffffff81486fa1>] ? __mutex_lock_common.isra.9+0x19d/0x283
Jun 20 12:37:47 localhost kernel: [<ffffffff81486dbf>] ? mutex_lock+0xe/0x1d
Jun 20 12:37:47 localhost kernel: [<ffffffff810d3fb1>] ? do_unlinkat+0x88/0x17a
Jun 20 12:37:47 localhost kernel: [<ffffffff814889b0>] ? _raw_spin_unlock+0x27/0x31
Jun 20 12:37:47 localhost kernel: [<ffffffff8100ae75>] ? syscall_trace_enter+0xcf/0x145
Jun 20 12:37:47 localhost kernel: [<ffffffff814896cb>] ? tracesys+0xdd/0xe2
Jun 20 12:37:47 localhost kernel: INFO: task konqueror:12930 blocked for more than 120 seconds.
Jun 20 12:37:47 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 20 12:37:47 localhost kernel: konqueror       D ffff88011fc91740     0 12930   2133 0x00000000
Jun 20 12:37:47 localhost kernel: ffff8800acf22710 0000000000000002 ffff8801018a1fd8 0000000000011740
Jun 20 12:37:47 localhost kernel: ffff8801018a1fd8 ffff88011aabddc0 ffff8801018a0000 ffff8801018a1e90
Jun 20 12:37:47 localhost kernel: ffff8800c40aeb44 ffff8800c40aeb40 0000000000000000 ffff8800c40aeb48
Jun 20 12:37:47 localhost kernel: Call Trace:
Jun 20 12:37:47 localhost kernel: [<ffffffff81487fd4>] ? schedule_preempt_disabled+0x19/0x1e
Jun 20 12:37:47 localhost kernel: [<ffffffff81486fa1>] ? __mutex_lock_common.isra.9+0x19d/0x283
Jun 20 12:37:47 localhost kernel: [<ffffffff81486dbf>] ? mutex_lock+0xe/0x1d
Jun 20 12:37:47 localhost kernel: [<ffffffff810d3fb1>] ? do_unlinkat+0x88/0x17a
Jun 20 12:37:47 localhost kernel: [<ffffffff814889b0>] ? _raw_spin_unlock+0x27/0x31
Jun 20 12:37:47 localhost kernel: [<ffffffff8100ae75>] ? syscall_trace_enter+0xcf/0x145
Jun 20 12:37:47 localhost kernel: [<ffffffff814896cb>] ? tracesys+0xdd/0xe2
Jun 20 12:37:47 localhost kernel: INFO: task konqueror:18283 blocked for more than 120 seconds.
Jun 20 12:37:47 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 20 12:37:47 localhost kernel: konqueror       D ffff88011fc11740     0 18283   2133 0x00000000
Jun 20 12:37:47 localhost kernel: ffff8801030f8000 0000000000000002 ffff8800a8a9dfd8 0000000000011740
Jun 20 12:37:47 localhost kernel: ffff8800a8a9dfd8 ffffffff81a10400 ffff8800a8a9c000 ffff8800a8a9de90
Jun 20 12:37:47 localhost kernel: ffff8800c40aeb44 ffff8800c40aeb40 0000000000000000 ffff8800c40aeb48
Jun 20 12:37:47 localhost kernel: Call Trace:
Jun 20 12:37:47 localhost kernel: [<ffffffff81487fd4>] ? schedule_preempt_disabled+0x19/0x1e
Jun 20 12:37:47 localhost kernel: [<ffffffff81486fa1>] ? __mutex_lock_common.isra.9+0x19d/0x283
Jun 20 12:37:47 localhost kernel: [<ffffffff81486dbf>] ? mutex_lock+0xe/0x1d
Jun 20 12:37:47 localhost kernel: [<ffffffff810d3fb1>] ? do_unlinkat+0x88/0x17a
Jun 20 12:37:47 localhost kernel: [<ffffffff814889b0>] ? _raw_spin_unlock+0x27/0x31
Jun 20 12:37:47 localhost kernel: [<ffffffff8100ae75>] ? syscall_trace_enter+0xcf/0x145
Jun 20 12:37:47 localhost kernel: [<ffffffff814896cb>] ? tracesys+0xdd/0xe2
Jun 20 12:37:47 localhost kernel: INFO: task konqueror:21478 blocked for more than 120 seconds.
Jun 20 12:37:47 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 20 12:37:47 localhost kernel: konqueror       D ffff88011fc11740     0 21478   2133 0x00000000
Jun 20 12:37:47 localhost kernel: ffff88011aafc650 0000000000000002 ffff88010852bfd8 0000000000011740
Jun 20 12:37:47 localhost kernel: ffff88010852bfd8 ffffffff81a10400 ffff88010852a000 ffff88010852be90
Jun 20 12:37:49 localhost kernel: ffff8800c40aeb44 ffff8800c40aeb40 0000000000000000 ffff8800c40aeb48
Jun 20 12:37:49 localhost kernel: Call Trace:
Jun 20 12:37:49 localhost kernel: [<ffffffff81487fd4>] ? schedule_preempt_disabled+0x19/0x1e
Jun 20 12:37:49 localhost kernel: [<ffffffff81486fa1>] ? __mutex_lock_common.isra.9+0x19d/0x283
Jun 20 12:37:49 localhost kernel: [<ffffffff81486dbf>] ? mutex_lock+0xe/0x1d
Jun 20 12:37:49 localhost kernel: [<ffffffff810d3fb1>] ? do_unlinkat+0x88/0x17a
Jun 20 12:37:49 localhost kernel: [<ffffffff814889b0>] ? _raw_spin_unlock+0x27/0x31
Jun 20 12:37:49 localhost kernel: [<ffffffff8100ae75>] ? syscall_trace_enter+0xcf/0x145
Jun 20 12:37:49 localhost kernel: [<ffffffff814896cb>] ? tracesys+0xdd/0xe2
Jun 20 12:37:49 localhost kernel: INFO: task bogofilter:21625 blocked for more than 120 seconds.
Jun 20 12:37:49 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 20 12:37:49 localhost kernel: bogofilter      D ffff88011fc91740     0 21625      1 0x00000000
Jun 20 12:37:49 localhost kernel: ffff88009f116d60 0000000000000002 ffff8800296e7fd8 0000000000011740
Jun 20 12:37:49 localhost kernel: ffff8800296e7fd8 ffff88011aabddc0 ffff88004b568da8 ffff88002970a800
Jun 20 12:37:49 localhost kernel: 0000000000000000 ffff8800aaa2a2e0 ffff8800aaa2a350 ffff88002970ac60
Jun 20 12:37:49 localhost kernel: Call Trace:
Jun 20 12:37:49 localhost kernel: [<ffffffff811763fd>] ? log_one_extent.isra.24+0x441/0x58b
Jun 20 12:37:49 localhost kernel: [<ffffffff81043492>] ? finish_wait+0x60/0x60
Jun 20 12:37:49 localhost kernel: [<ffffffff81177ee9>] ? btrfs_log_inode+0x5d0/0x71e
Jun 20 12:37:49 localhost kernel: [<ffffffff811791f5>] ? btrfs_log_inode_parent+0x1f1/0x2dd
Jun 20 12:37:49 localhost kernel: [<ffffffff81179c34>] ? btrfs_log_dentry_safe+0x31/0x45
Jun 20 12:37:49 localhost kernel: [<ffffffff81158e69>] ? btrfs_sync_file+0x11a/0x1e2
Jun 20 12:37:49 localhost kernel: [<ffffffff810e8211>] ? do_fsync+0x2b/0x50
Jun 20 12:37:49 localhost kernel: [<ffffffff8148966c>] ? tracesys+0x7e/0xe2
Jun 20 12:37:49 localhost kernel: [<ffffffff810e8402>] ? SyS_fdatasync+0xa/0xd
Jun 20 12:37:49 localhost kernel: [<ffffffff814896cb>] ? tracesys+0xdd/0xe2

After disabling autodefrag it didn't happen again. The kernel is the latest git.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: hang on 3.9, 3.10-rc5
  2013-06-21  6:42         ` Clemens Eisserer
@ 2013-06-25 15:52           ` Liu Bo
  0 siblings, 0 replies; 23+ messages in thread
From: Liu Bo @ 2013-06-25 15:52 UTC (permalink / raw)
  To: Clemens Eisserer; +Cc: linux-btrfs, Jon Nelson, Chris Mason

On Fri, Jun 21, 2013 at 08:42:55AM +0200, Clemens Eisserer wrote:
> Hi Jon,
> 
> > Is this what you are looking for?
> > After this, the CPU gets "stuck" and I have to reboot.
> 
> This issue has already been reported:
> https://bugzilla.kernel.org/show_bug.cgi?id=59451

Hi,

Could you please try this patch?

(at least it addresses my ulist_add_merge crash :))

thanks,
liubo

diff --git a/fs/btrfs/ulist.c b/fs/btrfs/ulist.c
index 7b417e2..69a9c32 100644
--- a/fs/btrfs/ulist.c
+++ b/fs/btrfs/ulist.c
@@ -73,7 +73,6 @@ void ulist_fini(struct ulist *ulist)
 	if (ulist->nodes_alloced > ULIST_SIZE)
 		kfree(ulist->nodes);
 	ulist->nodes_alloced = 0;	/* in case ulist_fini is called twice */
-	ulist->root = RB_ROOT;
 }
 EXPORT_SYMBOL(ulist_fini);
 
@@ -205,6 +204,7 @@ int ulist_add_merge(struct ulist *ulist, u64 val, u64 aux,
 		u64 new_alloced = ulist->nodes_alloced + 128;
 		struct ulist_node *new_nodes;
 		void *old = NULL;
+		int i;
 
 		/*
 		 * if nodes_alloced == ULIST_SIZE no memory has been allocated
@@ -222,6 +222,17 @@ int ulist_add_merge(struct ulist *ulist, u64 val, u64 aux,
 			memcpy(new_nodes, ulist->int_nodes,
 			       sizeof(ulist->int_nodes));
 
+		/*
+		 * krealloc actually uses memcpy, which does not copy rb_node
+		 * pointers, so we have to do it ourselves.  Otherwise we may
+		 * be bitten by crashes.
+		 */
+		for (i = 0; i < ulist->nnodes; i++) {
+			rb_erase(&ulist->nodes[i].rb_node, &ulist->root);
+			ret = ulist_rbtree_insert(ulist, &new_nodes[i]);
+			BUG_ON(ret);
+		}
+
 		ulist->nodes = new_nodes;
 		ulist->nodes_alloced = new_alloced;
 	}


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: hang on 3.9, 3.10-rc5
  2013-06-21  0:42         ` Chris Mason
  2013-06-21  1:00           ` Sage Weil
@ 2013-08-09 18:04           ` Sage Weil
  2013-08-09 18:08             ` Chris Mason
  1 sibling, 1 reply; 23+ messages in thread
From: Sage Weil @ 2013-08-09 18:04 UTC (permalink / raw)
  To: Chris Mason; +Cc: Josef Bacik, linux-btrfs

Hi Chris,

On Thu, 20 Jun 2013, Chris Mason wrote:
> Quoting Sage Weil (2013-06-20 17:56:19)
> > On Wed, 19 Jun 2013, Sage Weil wrote:
> > > Hi Chris,
> > > 
> > > On Tue, 18 Jun 2013, Chris Mason wrote:
> > > > [...]
> > > > Very long way of saying I think we're one release_path short.  Sage, I
> > > > haven't tested this at all yet, I was hoping to trigger it first.
> > > > 
> > > > diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> > > > index c276ac9..c1954b3 100644
> > > > --- a/fs/btrfs/tree-log.c
> > > > +++ b/fs/btrfs/tree-log.c
> > > > @@ -3730,6 +3730,7 @@ next_slot:
> > > >  log_extents:
> > > >     if (fast_search) {
> > > >             btrfs_release_path(dst_path);
> > > > +           btrfs_release_path(path);
> > > >             ret = btrfs_log_changed_extents(trans, root, inode, dst_path);
> > > >             if (ret) {
> > > >                     err = ret;
> > > 
> > > This seems to be doing the trick.  I'll keep testing overnight, but so far 
> > > so good!
> > 
> > ...and it's still holding up well in QA.
> 
> Awesome, thanks for getting the traces for us.  Looks like this one has
> been around since v3.7, so I'm not going to try and sneak it into the
> 3.10 final.  I'll have it in the next merge window and for stable.

I was just pulling in the current -rc for testing and realized that this 
isn't upstream yet.  Was this missed?  Sorry I didn't check earlier.

Thanks!
sage

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: hang on 3.9, 3.10-rc5
  2013-08-09 18:04           ` Sage Weil
@ 2013-08-09 18:08             ` Chris Mason
  0 siblings, 0 replies; 23+ messages in thread
From: Chris Mason @ 2013-08-09 18:08 UTC (permalink / raw)
  To: Sage Weil; +Cc: Josef Bacik, linux-btrfs

Quoting Sage Weil (2013-08-09 14:04:15)
> Hi Chris,
> 
> On Thu, 20 Jun 2013, Chris Mason wrote:
> > Quoting Sage Weil (2013-06-20 17:56:19)
> > > On Wed, 19 Jun 2013, Sage Weil wrote:
> > > > Hi Chris,
> > > > 
> > > > On Tue, 18 Jun 2013, Chris Mason wrote:
> > > > > [...]
> > > > > Very long way of saying I think we're one release_path short.  Sage, I
> > > > > haven't tested this at all yet, I was hoping to trigger it first.
> > > > > 
> > > > > diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> > > > > index c276ac9..c1954b3 100644
> > > > > --- a/fs/btrfs/tree-log.c
> > > > > +++ b/fs/btrfs/tree-log.c
> > > > > @@ -3730,6 +3730,7 @@ next_slot:
> > > > >  log_extents:
> > > > >     if (fast_search) {
> > > > >             btrfs_release_path(dst_path);
> > > > > +           btrfs_release_path(path);
> > > > >             ret = btrfs_log_changed_extents(trans, root, inode, dst_path);
> > > > >             if (ret) {
> > > > >                     err = ret;
> > > > 
> > > > This seems to be doing the trick.  I'll keep testing overnight, but so far 
> > > > so good!
> > > 
> > > ...and it's still holding up well in QA.
> > 
> > Awesome, thanks for getting the traces for us.  Looks like this one has
> > been around since v3.7, so I'm not going to try and sneak it into the
> > 3.10 final.  I'll have it in the next merge window and for stable.
> 
> I was just pulling in the current -rc for testing and realized that this 
> isn't upstream yet.  Was this missed?  Sorry I didn't check earlier.

Ack, thanks Sage.  It'll go into the next pull (cc stable)

-chris


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2013-08-09 18:08 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-06-11 15:43 hang on 3.9, 3.10-rc5 Sage Weil
2013-06-11 16:21 ` Chris Mason
     [not found]   ` <alpine.DEB.2.00.1306121317300.27596@cobra.newdream.net>
2013-06-18 15:56     ` Sage Weil
2013-06-18 16:36       ` Chris Mason
2013-06-18 16:37 ` Josef Bacik
2013-06-18 16:59   ` Chris Mason
2013-06-20  3:44     ` Sage Weil
2013-06-20 21:56       ` Sage Weil
2013-06-21  0:42         ` Chris Mason
2013-06-21  1:00           ` Sage Weil
2013-06-21  1:04             ` Chris Mason
2013-06-21  1:09               ` Sage Weil
2013-08-09 18:04           ` Sage Weil
2013-08-09 18:08             ` Chris Mason
2013-06-18 17:19   ` Jon Nelson
2013-06-21  1:11     ` Chris Mason
2013-06-21  1:46       ` Jon Nelson
2013-06-21  5:40         ` Chris Murphy
2013-06-21  6:42         ` Clemens Eisserer
2013-06-25 15:52           ` Liu Bo
2013-06-21 11:30         ` Chris Mason
2013-06-21 13:14           ` Jon Nelson
2013-06-21 14:54     ` Johannes Hirte

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.