linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* 3.16.2 btrfs deadlock
@ 2014-10-05 20:29 Marc MERLIN
  2014-10-05 23:16 ` 3.16.2 btrfs deadlock -> detecting deadlocks with cron Marc MERLIN
  2014-10-07 21:22 ` 3.16.2 btrfs deadlock Chris Mason
  0 siblings, 2 replies; 4+ messages in thread
From: Marc MERLIN @ 2014-10-05 20:29 UTC (permalink / raw)
  To: linux-btrfs

Deadlocks have been less frequent (good), but here is one.

An rsync from 5 days ago got stuck on btrfs it seems, and things just 
started piling up on top until the system deadlocked

I see btrfs-transaction saying wait on page, but if it's RAM, I had
plenty left:
             total       used       free     shared    buffers     cached
Mem:       7894580    6950108     944472          0         40    4816148
-/+ buffers/cache:    2133920    5760660
Swap:     15616764     767004   14849760


Here's the trace:
SysRq : Show Blocked State
  task                        PC stack   pid father
md8_raid5       D ffff88017028cb80     0   675      2 0x00000000
 ffff88020fd67aa8 0000000000000046 ffffffff812f1799 ffff88020fd67fd8
 ffff880037228410 00000000000140c0 ffff88021e3940c0 ffff880037228410
 ffff8801f5579bf0 0000000000000004 ffff880211ad07c8 ffff88020fd67ab8
Call Trace:
 [<ffffffff812f1799>] ? blk_flush_plug_list+0x1bc/0x1cb
 [<ffffffff8163a889>] schedule+0x6e/0x70
 [<ffffffff8163aa2b>] io_schedule+0x60/0x7a
 [<ffffffff812ef8f6>] get_request+0x4b8/0x56a
 [<ffffffff8130ba01>] ? cfq_merge+0x49/0x9e
 [<ffffffff81087b70>] ? finish_wait+0x65/0x65
 [<ffffffff812f1921>] blk_queue_bio+0x179/0x262
 [<ffffffff812eec2c>] generic_make_request+0x9c/0xdb
 [<ffffffffc042b31d>] handle_stripe+0x1e41/0x2166 [raid456]
 [<ffffffff81017316>] ? ___preempt_schedule+0x56/0xa8
 [<ffffffff8163d0d9>] ? _raw_spin_unlock_irqrestore+0x1f/0x32
 [<ffffffffc042b925>] handle_active_stripes.isra.22+0x2e3/0x359 [raid456]
 [<ffffffff8151d35e>] ? md_wakeup_thread+0x55/0x58
 [<ffffffffc042bdd6>] raid5d+0x330/0x428 [raid456]
 [<ffffffff810795b1>] ? get_parent_ip+0xd/0x3c
 [<ffffffff815202f9>] md_thread+0x11c/0x13a
 [<ffffffff81087b70>] ? finish_wait+0x65/0x65
 [<ffffffff815201dd>] ? bb_store+0x55/0x55
 [<ffffffff8106e3a9>] kthread+0xae/0xb6
 [<ffffffff8106e2fb>] ? __kthread_parkme+0x61/0x61
 [<ffffffff8163d8fc>] ret_from_fork+0x7c/0xb0
 [<ffffffff8106e2fb>] ? __kthread_parkme+0x61/0x61
btrfs-transacti D 0000000000000002     0   781      2 0x00000000
 ffff88020fd6b8c0 0000000000000046 ffff88020fd6b890 ffff88020fd6bfd8
 ffff88021150e3d0 00000000000140c0 ffff88021e3140c0 ffff88021150e3d0
 ffff88020fd6b960 ffffffff81106441 0000000000000002 ffff88020fd6b8d0
Call Trace:
 [<ffffffff81106441>] ? wait_on_page_read+0x3c/0x3c
 [<ffffffff8163a889>] schedule+0x6e/0x70
 [<ffffffff8163aa2b>] io_schedule+0x60/0x7a
 [<ffffffff8110644f>] sleep_on_page+0xe/0x12
 [<ffffffff8163acbd>] __wait_on_bit+0x48/0x7a
 [<ffffffff811063ef>] wait_on_page_bit+0x7a/0x7c
 [<ffffffff81087ba4>] ? autoremove_wake_function+0x34/0x34
 [<ffffffff812504b5>] read_extent_buffer_pages+0x1bf/0x204
 [<ffffffff8122d7f9>] ? free_root_pointers+0x5b/0x5b
 [<ffffffff8122e4ec>] btree_read_extent_buffer_pages.constprop.44+0x66/0x100
 [<ffffffff8122f532>] read_tree_block+0x2f/0x47
 [<ffffffff81214f20>] read_node_slot+0x85/0xb2
 [<ffffffff812182e3>] push_leaf_right+0x76/0x16b
 [<ffffffff81218b24>] split_leaf+0xc5/0x567
 [<ffffffff81215755>] ? leaf_space_used+0xa9/0xd8
 [<ffffffff8163d188>] ? _raw_write_unlock+0x16/0x29
 [<ffffffff81262878>] ? btrfs_set_lock_blocking_rw+0x70/0xb5
 [<ffffffff8121961a>] btrfs_search_slot+0x654/0x6ef
 [<ffffffff8121a8f5>] btrfs_insert_empty_items+0x5d/0xa8
 [<ffffffff812253ff>] __btrfs_run_delayed_refs+0x706/0xc49
 [<ffffffff812274bd>] btrfs_run_delayed_refs+0x71/0x197
 [<ffffffff81234d62>] btrfs_commit_transaction+0x3d/0x87f
 [<ffffffff812317b1>] transaction_kthread+0xf8/0x1ab
 [<ffffffff812316b9>] ? btrfs_cleanup_transaction+0x436/0x436
 [<ffffffff8106e3a9>] kthread+0xae/0xb6
 [<ffffffff8106e2fb>] ? __kthread_parkme+0x61/0x61
 [<ffffffff8163d8fc>] ret_from_fork+0x7c/0xb0
 [<ffffffff8106e2fb>] ? __kthread_parkme+0x61/0x61
btrfs-transacti D 0000000000000002     0  3349      2 0x00000000
 ffff8802085b3a60 0000000000000046 ffff8802085b3a30 ffff8802085b3fd8
 ffff8800c6836350 00000000000140c0 ffff88021e3140c0 ffff8800c6836350
 ffff8802085b3b00 ffffffff81106441 0000000000000002 ffff8802085b3a70
Call Trace:
 [<ffffffff81106441>] ? wait_on_page_read+0x3c/0x3c
 [<ffffffff8163a889>] schedule+0x6e/0x70
 [<ffffffff8163aa2b>] io_schedule+0x60/0x7a
 [<ffffffff8110644f>] sleep_on_page+0xe/0x12
 [<ffffffff8163acbd>] __wait_on_bit+0x48/0x7a
 [<ffffffff811063ef>] wait_on_page_bit+0x7a/0x7c
 [<ffffffff81087ba4>] ? autoremove_wake_function+0x34/0x34
 [<ffffffff8110766f>] filemap_fdatawait_range+0x7e/0x126
 [<ffffffff81249f9c>] btrfs_wait_ordered_range+0x6a/0x11a
 [<ffffffff8126ad42>] __btrfs_write_out_cache+0x29b/0x387
 [<ffffffff8126b2eb>] btrfs_write_out_cache+0xab/0xe3
 [<ffffffff81227ba1>] btrfs_write_dirty_block_groups+0x543/0x561
 [<ffffffff81635381>] commit_cowonly_roots+0x14d/0x219
 [<ffffffff81235142>] btrfs_commit_transaction+0x41d/0x87f
 [<ffffffff812317b1>] transaction_kthread+0xf8/0x1ab
 [<ffffffff812316b9>] ? btrfs_cleanup_transaction+0x436/0x436
 [<ffffffff8106e3a9>] kthread+0xae/0xb6
 [<ffffffff8106e2fb>] ? __kthread_parkme+0x61/0x61
 [<ffffffff8163d8fc>] ret_from_fork+0x7c/0xb0
 [<ffffffff8106e2fb>] ? __kthread_parkme+0x61/0x61
md8_resync      D 0000000000000002     0 20574      2 0x00000080
 ffff88006666fbd0 0000000000000046 ffff88021149f618 ffff88006666ffd8
 ffff880015f20510 00000000000140c0 ffff88021149f400 ffff88021149f424
 0000000000000000 ffff88021149f5f8 ffff88006666fc28 ffff88006666fbe0
Call Trace:
 [<ffffffff8163a889>] schedule+0x6e/0x70
 [<ffffffffc0428fcf>] get_active_stripe+0x225/0x575 [raid456]
 [<ffffffff81087b70>] ? finish_wait+0x65/0x65
 [<ffffffffc042c876>] sync_request+0x25d/0x2d0 [raid456]
 [<ffffffff81521461>] md_do_sync+0x7fb/0xc2f
 [<ffffffff81087b70>] ? finish_wait+0x65/0x65
 [<ffffffff815202f9>] md_thread+0x11c/0x13a
 [<ffffffff815201dd>] ? bb_store+0x55/0x55
 [<ffffffff8106e3a9>] kthread+0xae/0xb6
 [<ffffffff8106e2fb>] ? __kthread_parkme+0x61/0x61
 [<ffffffff8163d8fc>] ret_from_fork+0x7c/0xb0
 [<ffffffff8106e2fb>] ? __kthread_parkme+0x61/0x61
kworker/0:2     D ffff88010e4bf440     0 28539      2 0x00000080
Workqueue: events do_sync_work
 ffff880075083b60 0000000000000046 ffff880075083b30 ffff880075083fd8
 ffff880017c3a210 00000000000140c0 ffff88021e2140c0 ffff880017c3a210
 ffff880075083c00 ffffffff81106441 0000000000000002 ffff880075083b70
Call Trace:
 [<ffffffff81106441>] ? wait_on_page_read+0x3c/0x3c
 [<ffffffff8163a889>] schedule+0x6e/0x70
 [<ffffffff8163aa2b>] io_schedule+0x60/0x7a
 [<ffffffff8110644f>] sleep_on_page+0xe/0x12
 [<ffffffff8163acbd>] __wait_on_bit+0x48/0x7a
 [<ffffffff811063ef>] wait_on_page_bit+0x7a/0x7c
 [<ffffffff81087ba4>] ? autoremove_wake_function+0x34/0x34
 [<ffffffff81111cf3>] ? pagevec_lookup_tag+0x21/0x29
 [<ffffffff8110766f>] filemap_fdatawait_range+0x7e/0x126
 [<ffffffff81107738>] filemap_fdatawait+0x21/0x23
 [<ffffffff81181b06>] sync_inodes_sb+0x14e/0x1a0
 [<ffffffff81185854>] ? vfs_fsync+0x1e/0x1e
 [<ffffffff81185868>] sync_inodes_one_sb+0x14/0x16
 [<ffffffff8116337e>] iterate_supers+0x6e/0xc2
 [<ffffffff811858bf>] do_sync_work+0x25/0x8b
 [<ffffffff81068bf7>] process_one_work+0x195/0x2d2
 [<ffffffff81068fd8>] worker_thread+0x275/0x352
 [<ffffffff81068d63>] ? process_scheduled_works+0x2f/0x2f
 [<ffffffff8106e3a9>] kthread+0xae/0xb6
 [<ffffffff8106e2fb>] ? __kthread_parkme+0x61/0x61
 [<ffffffff8163d8fc>] ret_from_fork+0x7c/0xb0
 [<ffffffff8106e2fb>] ? __kthread_parkme+0x61/0x61
kworker/u8:10   D 0000000000000000     0 20493      2 0x00000080
Workqueue: btrfs-extent-refs btrfs_extent_refs_helper
 ffff88007d427cd8 0000000000000046 ffff8800c96aa270 ffff88007d427fd8
 ffff88003c7e62d0 00000000000140c0 ffff880110e8e1e0 ffff8802115c11f0
 ffff8802115c1000 ffff88010b2a0e80 00000000fffffff0 ffff88007d427ce8
Call Trace:
 [<ffffffff8163a889>] schedule+0x6e/0x70
 [<ffffffff81234554>] wait_current_trans.isra.15+0x98/0xf4
 [<ffffffff81087b70>] ? finish_wait+0x65/0x65
 [<ffffffff812357c8>] start_transaction+0x224/0x50f
 [<ffffffff81235b04>] btrfs_join_transaction+0x17/0x19
 [<ffffffff812275fb>] delayed_ref_async_start+0x18/0x7b
 [<ffffffff8125ac72>] normal_work_helper+0xf1/0x233
 [<ffffffff8125af61>] btrfs_extent_refs_helper+0x12/0x14
 [<ffffffff81068bf7>] process_one_work+0x195/0x2d2
 [<ffffffff81068fd8>] worker_thread+0x275/0x352
 [<ffffffff81068d63>] ? process_scheduled_works+0x2f/0x2f
 [<ffffffff8106e3a9>] kthread+0xae/0xb6
 [<ffffffff8106e2fb>] ? __kthread_parkme+0x61/0x61
 [<ffffffff8163d8fc>] ret_from_fork+0x7c/0xb0
 [<ffffffff8106e2fb>] ? __kthread_parkme+0x61/0x61
kworker/0:4     D ffff88017028cb80     0 22618      2 0x00000080
Workqueue: kcryptd kcryptd_crypt [dm_crypt]
 ffff88006c577b88 0000000000000046 ffff88021149f618 ffff88006c577fd8
 ffff8800017d41d0 00000000000140c0 ffff88021149f400 ffff88021149f418
 0000000000000000 ffff88021149f5c8 ffff88006c577be0 ffff88006c577b98
Call Trace:
 [<ffffffff8163a889>] schedule+0x6e/0x70
 [<ffffffffc0428fcf>] get_active_stripe+0x225/0x575 [raid456]
 [<ffffffff8107965a>] ? preempt_count_add+0x7a/0x8d
 [<ffffffff81087b70>] ? finish_wait+0x65/0x65
 [<ffffffffc042d079>] make_request+0x790/0xa04 [raid456]
 [<ffffffff81087b70>] ? finish_wait+0x65/0x65
 [<ffffffff81521ac5>] md_make_request+0x105/0x213
 [<ffffffff812eec2c>] generic_make_request+0x9c/0xdb
 [<ffffffffc0437d1f>] kcryptd_crypt_write_io_submit+0x9d/0xa2 [dm_crypt]
 [<ffffffffc0438418>] kcryptd_crypt+0x261/0x411 [dm_crypt]
 [<ffffffff81068bf7>] process_one_work+0x195/0x2d2
 [<ffffffff81068fd8>] worker_thread+0x275/0x352
 [<ffffffff81068d63>] ? process_scheduled_works+0x2f/0x2f
 [<ffffffff8106e3a9>] kthread+0xae/0xb6
 [<ffffffff8106e2fb>] ? __kthread_parkme+0x61/0x61
 [<ffffffff8163d8fc>] ret_from_fork+0x7c/0xb0
 [<ffffffff8106e2fb>] ? __kthread_parkme+0x61/0x61
rsync           D ffff880129726080     0 28673   9798 0x20020080
 ffff8800484d7c30 0000000000000082 ffff88013f4976d8 ffff8800484d7fd8
 ffff88007cfe61d0 00000000000140c0 ffff880150dce660 ffff880150dce664
 ffff88007cfe61d0 ffff880150dce668 00000000ffffffff ffff8800484d7c40
Call Trace:
 [<ffffffff8163a889>] schedule+0x6e/0x70
 [<ffffffff8163ab47>] schedule_preempt_disabled+0x18/0x24
 [<ffffffff8163bc6a>] __mutex_lock_slowpath+0x160/0x1d7
 [<ffffffff811690f4>] ? lookup_fast+0x15f/0x228
 [<ffffffff8163bcf8>] mutex_lock+0x17/0x27
 [<ffffffff8116956c>] walk_component+0x69/0x180
 [<ffffffff811696b1>] lookup_last+0x2e/0x30
 [<ffffffff8116a357>] path_lookupat+0x90/0x2d7
 [<ffffffff81148eb1>] ? kmemleak_alloc_recursive.constprop.44+0x16/0x18
 [<ffffffff8114aa0b>] ? kmem_cache_alloc+0xff/0x17e
 [<ffffffff8116a5c7>] filename_lookup+0x29/0x7f
 [<ffffffff8116d2ed>] user_path_at_empty+0x59/0x8c
 [<ffffffff8163d0a7>] ? _raw_spin_unlock+0x17/0x2a
 [<ffffffff81125b6e>] ? list_lru_add+0xc4/0xdb
 [<ffffffff81178f06>] ? mntput_no_expire+0x33/0x12b
 [<ffffffff8116d331>] user_path_at+0x11/0x13
 [<ffffffff811648a4>] vfs_fstatat+0x49/0x96
 [<ffffffff810c2efb>] ? from_kgid_munged+0x12/0x1e
 [<ffffffff8116498c>] vfs_lstat+0x1e/0x20
 [<ffffffff8104b32d>] sys32_lstat64+0x1a/0x33
 [<ffffffff811682a0>] ? path_put+0x1e/0x21
 [<ffffffff810a4aae>] ? current_kernel_time+0xe/0x32
 [<ffffffff810cb5d4>] ? __audit_syscall_entry+0xa6/0xc8
 [<ffffffff816401cc>] sysenter_dispatch+0x7/0x21
kworker/3:2     D 0000000000000000     0 29416      2 0x00000080
Workqueue: kcryptd kcryptd_crypt [dm_crypt]
 ffff88000549bbe0 0000000000000046 ffff880211117298 ffff88000549bfd8
 ffff88002c84a150 00000000000140c0 ffff880211117000 ffff880211117298
 ffff880211117290 0000000000000001 ffff8800c878d700 ffff88000549bbf0
Call Trace:
 [<ffffffff8163a889>] schedule+0x6e/0x70
 [<ffffffff8151d5b1>] md_write_start+0x11e/0x137
 [<ffffffff81087b70>] ? finish_wait+0x65/0x65
 [<ffffffff81510c5a>] make_request+0x3f/0xa96
 [<ffffffff812e3827>] ? crypto_aes_set_key+0x2c/0x2c
 [<ffffffff812e3827>] ? crypto_aes_set_key+0x2c/0x2c
 [<ffffffff81521ac5>] md_make_request+0x105/0x213
 [<ffffffff812eec2c>] generic_make_request+0x9c/0xdb
 [<ffffffffc0437d1f>] kcryptd_crypt_write_io_submit+0x9d/0xa2 [dm_crypt]
 [<ffffffffc0438418>] kcryptd_crypt+0x261/0x411 [dm_crypt]
 [<ffffffff81068bf7>] process_one_work+0x195/0x2d2
 [<ffffffff81068fd8>] worker_thread+0x275/0x352
 [<ffffffff81068d63>] ? process_scheduled_works+0x2f/0x2f
 [<ffffffff8106e3a9>] kthread+0xae/0xb6
 [<ffffffff8106e2fb>] ? __kthread_parkme+0x61/0x61
 [<ffffffff8163d8fc>] ret_from_fork+0x7c/0xb0
 [<ffffffff8106e2fb>] ? __kthread_parkme+0x61/0x61
rsync           D ffff88018351f040     0 30350  28673 0x20020080
 ffff88002e6fbbb0 0000000000000086 ffff8800c96aa270 ffff88002e6fbfd8
 ffff8800824be510 00000000000140c0 ffff880110e8e1e0 ffff8800c96a19f0
 ffff8800c96a1800 ffff880209e9f980 0000000000000000 ffff88002e6fbbc0
Call Trace:
 [<ffffffff8163a889>] schedule+0x6e/0x70
 [<ffffffff81234554>] wait_current_trans.isra.15+0x98/0xf4
 [<ffffffff81087b70>] ? finish_wait+0x65/0x65
 [<ffffffff81235a43>] start_transaction+0x49f/0x50f
 [<ffffffff81235ace>] btrfs_start_transaction+0x1b/0x1d
 [<ffffffff8123fdae>] btrfs_create+0x3c/0x1ce
 [<ffffffff812a401b>] ? security_inode_permission+0x1c/0x23
 [<ffffffff81169b0b>] ? __inode_permission+0x79/0xa4
 [<ffffffff8116adc9>] vfs_create+0x66/0x8c
 [<ffffffff8116bb2b>] do_last+0x5af/0xa23
 [<ffffffff8116c1d6>] path_openat+0x237/0x4de
 [<ffffffff8116d5d5>] do_filp_open+0x3a/0x7f
 [<ffffffff810795b1>] ? get_parent_ip+0xd/0x3c
 [<ffffffff8163d0a7>] ? _raw_spin_unlock+0x17/0x2a
 [<ffffffff811775f8>] ? __alloc_fd+0xea/0xf9
 [<ffffffff8115fb65>] do_sys_open+0x70/0xff
 [<ffffffff8119d7bc>] compat_SyS_open+0x1b/0x1d
 [<ffffffff816401cc>] sysenter_dispatch+0x7/0x21
exim            D 0000000000000001     0  9533   4717 0x20020080
 ffff8800016bfc30 0000000000000082 ffff8800016bfc00 ffff8800016bffd8
 ffff8800b9fac410 00000000000140c0 ffff88021e2940c0 ffff8800b9fac410
 ffff8800016bfcd0 ffffffff81106441 0000000000000002 ffff8800016bfc40
Call Trace:
 [<ffffffff81106441>] ? wait_on_page_read+0x3c/0x3c
 [<ffffffff8163a889>] schedule+0x6e/0x70
 [<ffffffff8163aa2b>] io_schedule+0x60/0x7a
 [<ffffffff8110644f>] sleep_on_page+0xe/0x12
 [<ffffffff8163acbd>] __wait_on_bit+0x48/0x7a
 [<ffffffff811063ef>] wait_on_page_bit+0x7a/0x7c
 [<ffffffff81087ba4>] ? autoremove_wake_function+0x34/0x34
 [<ffffffff8124ae77>] wait_on_page_writeback+0x16/0x18
 [<ffffffff8124ed06>] extent_write_cache_pages.isra.16.constprop.31+0x15b/0x2c3
 [<ffffffff8163b2e4>] ? mutex_unlock+0x16/0x18
 [<ffffffff81243e32>] ? btrfs_file_write_iter+0x34e/0x41c
 [<ffffffff8124f2a2>] extent_writepages+0x4b/0x5c
 [<ffffffff81238e7c>] ? btrfs_submit_direct+0x3f9/0x3f9
 [<ffffffff81237568>] btrfs_writepages+0x28/0x2a
 [<ffffffff81110efe>] do_writepages+0x1e/0x2c
 [<ffffffff81107b4f>] __filemap_fdatawrite_range+0x55/0x57
 [<ffffffff81107b82>] filemap_fdatawrite_range+0x13/0x15
 [<ffffffff81242a99>] btrfs_sync_file+0xaa/0x2cc
 [<ffffffff811627cb>] ? __sb_end_write+0x2d/0x5b
 [<ffffffff8118582c>] vfs_fsync_range+0x1b/0x25
 [<ffffffff81185852>] vfs_fsync+0x1c/0x1e
 [<ffffffff81185a5e>] do_fsync+0x2c/0x4c
 [<ffffffff81185c7e>] SyS_fdatasync+0x13/0x17
 [<ffffffff816401cc>] sysenter_dispatch+0x7/0x21


Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 3
CPU: 3 PID: 9583 Comm: sadc Not tainted 3.16.2-amd64-i915-preempt-20140714 #1
Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3806 08/20/2012
 0000000000000000 ffff88021e386c20 ffffffff816369d6 ffffffff81aa16ef
 ffff88021e386c98 ffffffff81632988 0000000000000010 ffff88021e386ca8
 ffff88021e386c48 0000000000000000 0000000000000003 ffff88021e386ef8
Call Trace:
 <NMI>  [<ffffffff816369d6>] dump_stack+0x4e/0x7a
 [<ffffffff81632988>] panic+0xcb/0x1fe
 [<ffffffff810d0c32>] watchdog_overflow_callback+0xbb/0x110
 [<ffffffff810ff920>] __perf_event_overflow+0x155/0x1e3
 [<ffffffff81021695>] ? x86_perf_event_set_period+0x103/0x113
 [<ffffffff810ffe31>] perf_event_overflow+0x14/0x16
 [<ffffffff810277ec>] intel_pmu_handle_irq+0x2d7/0x364
 [<ffffffff810205ce>] perf_event_nmi_handler+0x25/0x3e
 [<ffffffff810122a5>] nmi_handle+0x60/0x11c
 [<ffffffff81038b62>] ? cpumask_clear_cpu.constprop.3+0x11/0x11
 [<ffffffff81012573>] default_do_nmi+0x74/0x157
 [<ffffffff810126b8>] do_nmi+0x62/0xa2
 [<ffffffff8163fde1>] end_repeat_nmi+0x1e/0x2e
 [<ffffffff8108c679>] ? do_raw_spin_lock+0x19/0x27
 [<ffffffff8108c679>] ? do_raw_spin_lock+0x19/0x27
 [<ffffffff8108c679>] ? do_raw_spin_lock+0x19/0x27
 <<EOE>>  [<ffffffff8163d54a>] _raw_spin_lock_irq+0x24/0x27
 [<ffffffff813de347>] uart_proc_show+0x134/0x331
 [<ffffffff8117bcca>] seq_read+0x150/0x302
 [<ffffffff812a3a92>] ? fsnotify_perm+0x60/0x6b
 [<ffffffff811ae760>] proc_reg_read+0x5a/0x7c
 [<ffffffff8116069b>] vfs_read+0x88/0xfa
 [<ffffffff81160e9a>] SyS_read+0x46/0x79
 [<ffffffff816401cc>] sysenter_dispatch+0x7/0x21

      131.048038         0.016424         0.007621 /
    kworker/2:1H   347 4825481473.873853  32768954   100 4825481473.873853   1024031.415751 2431265490.458642 /

          bioset   656      7340.438018         2   100      7340.438018         0.079953         0.032596 /
          bioset   674      7428.235256         2   100      7428.235256         0.011555         0.006256 /
    btrfs-worker   761      8129.795955         2   100      8129.795955         0.096081         0.046142 /
 btrfs-worker-hi   763 3243544430.862921         4   100 3243544430.862921         0.119452 1603390139.421423 /
 btrfs-flush_del   765 4740941088.983912         4   100 4740941088.983912         0.105922 2389612400.423491 /
     btrfs-fixup   768      8175.005182         2   100      8175.005182         0.009229         0.006478 /
     btrfs-endio   769 4805566466.196436        64   100 4805566466.196436         9.582823 2423209769.669892 /
 btrfs-endio-met   771      8202.024157         2   100      8202.024157         0.007683         0.003995 /
 btrfs-endio-rai   772      8211.030439         2   100      8211.030439         0.007657         0.003322 /
       btrfs-rmw   773      8220.036746         2   100      8220.036746         0.007690         0.003264 /
 btrfs-freespace   775 4731100793.494699       131   100 4731100793.494699         3.131112 2383851399.473326 /
 btrfs-delayed-m   776  65137379.868524         3   100  65137379.868524         0.113543  26290219.856218 /
 btrfs-readahead   777      8


-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: 3.16.2 btrfs deadlock -> detecting deadlocks with cron
  2014-10-05 20:29 3.16.2 btrfs deadlock Marc MERLIN
@ 2014-10-05 23:16 ` Marc MERLIN
  2014-10-07 21:22 ` 3.16.2 btrfs deadlock Chris Mason
  1 sibling, 0 replies; 4+ messages in thread
From: Marc MERLIN @ 2014-10-05 23:16 UTC (permalink / raw)
  To: linux-btrfs

On Sun, Oct 05, 2014 at 01:29:37PM -0700, Marc MERLIN wrote:
> Deadlocks have been less frequent (good), but here is one.
> 
> An rsync from 5 days ago got stuck on btrfs it seems, and things just 
> started piling up on top until the system deadlocked

This gave me a chance to fix my cronjob that should have detected this
earlier (there is no fix but rebooting, but I can reboot earlier and
before the watchdog kills everything without syncing my software raid
five arrays first).

I just polished and released the crontab below (posted on
http://marc.merlins.org/perso/btrfs/post_2014-10-05_Btrfs-Tips_-Catch-Btrfs-Deadlocks.html
)

You can paste this template in your crontab 

SHELL=/bin/bash

# If load average is more than MAXLA, show load average and all blocked processes
# As any time show anything blocked on wait_current_trans.isra.15 (used to be a btrfs hang bug)
# Also show swap if it drops below MINSWAP
# We pipe into bc because shell comparison doesn't do floating point.
*/5 * * * * nobody MAXLA=25; MINSWAP=10; if [[ $(echo "$(awk '{print $1}' < /proc/loadavg) > $MAXLA" | bc) == 1 ]]; then cat /proc/loadavg; ps -eo state,pid,etime,wchan:30,args | grep W |grep -v "^[RS]" ; fi; ps -eo pid,etime,wchan:30,args | grep -q [w]ait_current_trans.isra.15; if [[ $(echo "$(free | grep 'Swap' | awk '{t = $2; f = $4; print (f/t*100)}') < $MINSWAP" | bc) == 1 ]]; then free; fi

Cheers,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: 3.16.2 btrfs deadlock
  2014-10-05 20:29 3.16.2 btrfs deadlock Marc MERLIN
  2014-10-05 23:16 ` 3.16.2 btrfs deadlock -> detecting deadlocks with cron Marc MERLIN
@ 2014-10-07 21:22 ` Chris Mason
  2014-10-07 23:45   ` Marc MERLIN
  1 sibling, 1 reply; 4+ messages in thread
From: Chris Mason @ 2014-10-07 21:22 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: linux-btrfs



On Sun, Oct 5, 2014 at 4:29 PM, Marc MERLIN <marc@merlins.org> wrote:
> Deadlocks have been less frequent (good), but here is one.
> 
> An rsync from 5 days ago got stuck on btrfs it seems, and things just
> started piling up on top until the system deadlocked
> 
> I see btrfs-transaction saying wait on page, but if it's RAM, I had
> plenty left:
>              total       used       free     shared    buffers     
> cached
> Mem:       7894580    6950108     944472          0         40    
> 4816148
> -/+ buffers/cache:    2133920    5760660
> Swap:     15616764     767004   14849760
> 
> 
> Here's the trace:
> SysRq : Show Blocked State
>   task                        PC stack   pid father
> md8_raid5       D ffff88017028cb80     0   675      2 0x00000000
>  ffff88020fd67aa8 0000000000000046 ffffffff812f1799 ffff88020fd67fd8
>  ffff880037228410 00000000000140c0 ffff88021e3940c0 ffff880037228410
>  ffff8801f5579bf0 0000000000000004 ffff880211ad07c8 ffff88020fd67ab8
> Call Trace:
>  [<ffffffff812f1799>] ? blk_flush_plug_list+0x1bc/0x1cb
>  [<ffffffff8163a889>] schedule+0x6e/0x70
>  [<ffffffff8163aa2b>] io_schedule+0x60/0x7a
>  [<ffffffff812ef8f6>] get_request+0x4b8/0x56a
>  [<ffffffff8130ba01>] ? cfq_merge+0x49/0x9e
>  [<ffffffff81087b70>] ? finish_wait+0x65/0x65
>  [<ffffffff812f1921>] blk_queue_bio+0x179/0x262
>  [<ffffffff812eec2c>] generic_make_request+0x9c/0xdb
>  [<ffffffffc042b31d>] handle_stripe+0x1e41/0x2166 [raid456]
>  [<ffffffff81017316>] ? ___preempt_schedule+0x56/0xa8
>  [<ffffffff8163d0d9>] ? _raw_spin_unlock_irqrestore+0x1f/0x32
>  [<ffffffffc042b925>] handle_active_stripes.isra.22+0x2e3/0x359 
> [raid456]
>  [<ffffffff8151d35e>] ? md_wakeup_thread+0x55/0x58
>  [<ffffffffc042bdd6>] raid5d+0x330/0x428 [raid456]
>  [<ffffffff810795b1>] ? get_parent_ip+0xd/0x3c
>  [<ffffffff815202f9>] md_thread+0x11c/0x13a
>  [<ffffffff81087b70>] ? finish_wait+0x65/0x65
>  [<ffffffff815201dd>] ? bb_store+0x55/0x55
>  [<ffffffff8106e3a9>] kthread+0xae/0xb6
>  [<ffffffff8106e2fb>] ? __kthread_parkme+0x61/0x61
>  [<ffffffff8163d8fc>] ret_from_fork+0x7c/0xb0
>  [<ffffffff8106e2fb>] ? __kthread_parkme+0x61/0x61

This trace shows we're stuck somewhere different from the 3.15 stalls.  
md is waiting for a request, and unfortunately those are outside of 
btrfs completely.  It's likely that if you had let it sit, the box 
would have eventually dig its way out.

-chris



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: 3.16.2 btrfs deadlock
  2014-10-07 21:22 ` 3.16.2 btrfs deadlock Chris Mason
@ 2014-10-07 23:45   ` Marc MERLIN
  0 siblings, 0 replies; 4+ messages in thread
From: Marc MERLIN @ 2014-10-07 23:45 UTC (permalink / raw)
  To: Chris Mason; +Cc: linux-btrfs

On Tue, Oct 07, 2014 at 05:22:52PM -0400, Chris Mason wrote:
> >Here's the trace:
> >SysRq : Show Blocked State
> >  task                        PC stack   pid father
> >md8_raid5       D ffff88017028cb80     0   675      2 0x00000000
> > ffff88020fd67aa8 0000000000000046 ffffffff812f1799 ffff88020fd67fd8
> > ffff880037228410 00000000000140c0 ffff88021e3940c0 ffff880037228410
> > ffff8801f5579bf0 0000000000000004 ffff880211ad07c8 ffff88020fd67ab8
> >Call Trace:
> > [<ffffffff812f1799>] ? blk_flush_plug_list+0x1bc/0x1cb
> > [<ffffffff8163a889>] schedule+0x6e/0x70
> > [<ffffffff8163aa2b>] io_schedule+0x60/0x7a
> > [<ffffffff812ef8f6>] get_request+0x4b8/0x56a
> > [<ffffffff8130ba01>] ? cfq_merge+0x49/0x9e
> > [<ffffffff81087b70>] ? finish_wait+0x65/0x65
> > [<ffffffff812f1921>] blk_queue_bio+0x179/0x262
> > [<ffffffff812eec2c>] generic_make_request+0x9c/0xdb
> > [<ffffffffc042b31d>] handle_stripe+0x1e41/0x2166 [raid456]
> > [<ffffffff81017316>] ? ___preempt_schedule+0x56/0xa8
> > [<ffffffff8163d0d9>] ? _raw_spin_unlock_irqrestore+0x1f/0x32
> > [<ffffffffc042b925>] handle_active_stripes.isra.22+0x2e3/0x359
> >[raid456]
> > [<ffffffff8151d35e>] ? md_wakeup_thread+0x55/0x58
> > [<ffffffffc042bdd6>] raid5d+0x330/0x428 [raid456]
> > [<ffffffff810795b1>] ? get_parent_ip+0xd/0x3c
> > [<ffffffff815202f9>] md_thread+0x11c/0x13a
> > [<ffffffff81087b70>] ? finish_wait+0x65/0x65
> > [<ffffffff815201dd>] ? bb_store+0x55/0x55
> > [<ffffffff8106e3a9>] kthread+0xae/0xb6
> > [<ffffffff8106e2fb>] ? __kthread_parkme+0x61/0x61
> > [<ffffffff8163d8fc>] ret_from_fork+0x7c/0xb0
> > [<ffffffff8106e2fb>] ? __kthread_parkme+0x61/0x61
> 
> This trace shows we're stuck somewhere different from the 3.15
> stalls.  md is waiting for a request, and unfortunately those are
> outside of btrfs completely.  It's likely that if you had let it
> sit, the box would have eventually dig its way out.

Thanks for having a look.
I didn't actually reboot it, it deadlocked and hit a CPU stuck watchdog
and self rebooted.

But as long as it's not btrfs, then that's good :)

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2014-10-07 23:45 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-10-05 20:29 3.16.2 btrfs deadlock Marc MERLIN
2014-10-05 23:16 ` 3.16.2 btrfs deadlock -> detecting deadlocks with cron Marc MERLIN
2014-10-07 21:22 ` 3.16.2 btrfs deadlock Chris Mason
2014-10-07 23:45   ` Marc MERLIN

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).