linux-btrfs.vger.kernel.org archive mirror
From: Marc MERLIN <marc@merlins.org>
To: linux-btrfs@vger.kernel.org
Subject: Re: 3.15.0-rc5: btrfs and sync deadlock: call_rwsem_down_read_failed / balance seems to create locks that block everything else
Date: Thu, 22 May 2014 06:15:29 -0700
Message-ID: <20140522131528.GB22952@merlins.org>
In-Reply-To: <20140522090921.GA12037@merlins.org>

On Thu, May 22, 2014 at 02:09:21AM -0700, Marc MERLIN wrote:
> I got my laptop to hang all IO to one of its devices again, this time
> drive #2.
> This is the 3rd time it has happened, and I've already lost data as a
> result, since anything that hasn't hit the disk by then never makes it.
> 
> I was doing balance and btrfs send/receive.
> Then cron started a scrub in the background too.
> 
> IO to drive #1 was working fine, I didn't even notice that drive #2 IO
> was hung.
> 
> And then I typed sync and it never returned.
> 
> legolas:~# ps -eo pid,user,args,wchan  | grep  sync
> 23605 root     sync                        call_rwsem_down_read_failed
> 31885 root     sync                        call_rwsem_down_read_failed
> 
> What does this mean when sync is stuck that way?
> 
> When I'm in that state, accessing btrfs on drive 1 still works (read and
> write).
> Any access on drive 2 through btrfs hangs.
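For reference, tasks stuck like the sync above can also be listed without ps by scanning /proc for tasks in state D (uninterruptible sleep) and reading their wait channel. This is a minimal sketch, assuming a Linux /proc where /proc/<pid>/stat and /proc/<pid>/wchan are readable; on newer kernels wchan may read as "0" without sufficient privileges. blocked_tasks() is a hypothetical helper, not an existing tool:

```python
import os

def blocked_tasks(proc_root="/proc"):
    """Return (pid, comm, wchan) for every task currently in state D."""
    result = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "stat")) as f:
                stat = f.read()
            with open(os.path.join(base, "wchan")) as f:
                wchan = f.read().strip()
        except OSError:
            continue  # process exited mid-scan, or access was denied
        # comm is wrapped in parentheses and may itself contain spaces or ')',
        # so locate the LAST ')' before reading the single-letter state field.
        close = stat.rindex(")")
        comm = stat[stat.index("(") + 1 : close]
        state = stat[close + 2]
        if state == "D":
            result.append((int(entry), comm, wchan))
    return result

if __name__ == "__main__":
    for pid, comm, wchan in blocked_tasks():
        print(f"{pid:>7} {comm:<16} {wchan}")
```

Running it while sync is hung should show the same wait channels that the ps command above reports.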

After a reboot, drive 2 hung again quickly:
[ 1559.667362] INFO: task btrfs-balance:3280 blocked for more than 120 seconds.
[ 1559.667374]       Not tainted 3.15.0-rc5-amd64-i915-preempt-20140216s2 #1
[ 1559.667379] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1559.667383] btrfs-balance   D 0000000000000001     0  3280      2 0x00000000
[ 1559.667395]  ffff880408531c20 0000000000000046 000000000003da54 ffff880408531fd8
[ 1559.667405]  ffff880408fe8110 00000000000141c0 ffff8800ca1cc5e0 ffff8800ca1cc5e4
[ 1559.667414]  ffff880408fe8110 ffff8800ca1cc5e8 00000000ffffffff ffff880408531c30
[ 1559.667423] Call Trace:
[ 1559.667442]  [<ffffffff8161c896>] schedule+0x73/0x75
[ 1559.667451]  [<ffffffff8161cb57>] schedule_preempt_disabled+0x18/0x24
[ 1559.667459]  [<ffffffff8161dc7a>] __mutex_lock_slowpath+0x160/0x1d7
[ 1559.667466]  [<ffffffff8161dd08>] mutex_lock+0x17/0x27
[ 1559.667475]  [<ffffffff8126adb7>] btrfs_relocate_block_group+0x153/0x26d
[ 1559.667486]  [<ffffffff81249838>] btrfs_relocate_chunk.isra.23+0x5c/0x5e8
[ 1559.667494]  [<ffffffff8161efbb>] ? _raw_spin_unlock+0x17/0x2a
[ 1559.667502]  [<ffffffff81245584>] ? free_extent_buffer+0x8a/0x8d
[ 1559.667510]  [<ffffffff8124c0be>] btrfs_balance+0x9b6/0xb74
[ 1559.667517]  [<ffffffff81615c3d>] ? printk+0x54/0x56
[ 1559.667526]  [<ffffffff8124c27c>] ? btrfs_balance+0xb74/0xb74
[ 1559.667534]  [<ffffffff8124c2d5>] balance_kthread+0x59/0x7b
[ 1559.667542]  [<ffffffff8106b467>] kthread+0xae/0xb6
[ 1559.667549]  [<ffffffff8106b3b9>] ? __kthread_parkme+0x61/0x61
[ 1559.667557]  [<ffffffff81625b3c>] ret_from_fork+0x7c/0xb0
[ 1559.667563]  [<ffffffff8106b3b9>] ? __kthread_parkme+0x61/0x61
[ 1679.595668] INFO: task btrfs-balance:3280 blocked for more than 120 seconds.
[ 1679.595680]       Not tainted 3.15.0-rc5-amd64-i915-preempt-20140216s2 #1
[ 1679.595685] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
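The same task being reported again 120 seconds later means it is still blocked, not that a new hang started. When a log contains many such reports, counting them per task helps. A minimal sketch, assuming the "INFO: task NAME:PID blocked for more than N seconds." wording shown in this log (other kernel versions may phrase it differently); hung_tasks() is a hypothetical helper:

```python
import re
from collections import Counter

# Matches the hung-task header seen in this log, e.g.
# "INFO: task btrfs-balance:3280 blocked for more than 120 seconds."
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds\."
)

def hung_tasks(log_text):
    """Count hung-task reports per (comm, pid) in a dmesg/kern.log excerpt."""
    counts = Counter()
    for line in log_text.splitlines():
        m = HUNG_TASK_RE.search(line)
        if m:
            counts[(m.group("comm"), int(m.group("pid")))] += 1
    return counts
```

Feeding it the excerpt above would attribute both reports to btrfs-balance, PID 3280.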

Balance cancel hangs too, and so does sync again:
legolas:~# ps -eo pid,user,args,wchan  | grep  btrfs
  527 root     [btrfs-worker]              rescuer_thread
  528 root     [btrfs-worker-hi]           rescuer_thread
  529 root     [btrfs-delalloc]            rescuer_thread
  530 root     [btrfs-flush_del]           rescuer_thread
  531 root     [btrfs-cache]               rescuer_thread
  532 root     [btrfs-submit]              rescuer_thread
  533 root     [btrfs-fixup]               rescuer_thread
  534 root     [btrfs-endio]               rescuer_thread
  535 root     [btrfs-endio-met]           rescuer_thread
  536 root     [btrfs-endio-met]           rescuer_thread
  537 root     [btrfs-endio-rai]           rescuer_thread
  538 root     [btrfs-rmw]                 rescuer_thread
  539 root     [btrfs-endio-wri]           rescuer_thread
  540 root     [btrfs-freespace]           rescuer_thread
  541 root     [btrfs-delayed-m]           rescuer_thread
  542 root     [btrfs-readahead]           rescuer_thread
  543 root     [btrfs-qgroup-re]           rescuer_thread
  544 root     [btrfs-cleaner]             cleaner_kthread
  545 root     [btrfs-transacti]           transaction_kthread
 2267 root     [btrfs-worker]              rescuer_thread
 2268 root     [btrfs-worker-hi]           rescuer_thread
 2269 root     [btrfs-delalloc]            rescuer_thread
 2271 root     [btrfs-flush_del]           rescuer_thread
 2272 root     [btrfs-cache]               rescuer_thread
 2275 root     [btrfs-submit]              rescuer_thread
 2276 root     [btrfs-fixup]               rescuer_thread
 2277 root     [btrfs-endio]               rescuer_thread
 2278 root     [btrfs-endio-met]           rescuer_thread
 2279 root     [btrfs-endio-met]           rescuer_thread
 2281 root     [btrfs-endio-rai]           rescuer_thread
 2282 root     [btrfs-rmw]                 rescuer_thread
 2283 root     [btrfs-endio-wri]           rescuer_thread
 2284 root     [btrfs-freespace]           rescuer_thread
 2285 root     [btrfs-delayed-m]           rescuer_thread
 2286 root     [btrfs-readahead]           rescuer_thread
 2288 root     [btrfs-qgroup-re]           rescuer_thread
 3278 root     [btrfs-cleaner]             sleep_on_page
 3279 root     [btrfs-transacti]           sleep_on_page
 3280 root     [btrfs-balance]             btrfs_relocate_block_group
14727 root     [kworker/u16:47]            btrfs_tree_lock
14770 root     [kworker/u16:90]            btrfs_tree_lock
22551 root     btrfs send var_ro.20140522_ pipe_wait
22552 root     btrfs receive /mnt/btrfs_po balance_dirty_pages_ratelimited
22593 root     [kworker/u16:3]             btrfs_tree_lock
25054 root     btrfs balance cancel .      btrfs_cancel_balance
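With a listing this long, grouping the processes by wait channel makes it easier to see which ones are piled up on the same lock (here btrfs_tree_lock and sleep_on_page). A minimal sketch, assuming the wchan column never contains spaces (kernel symbol names or "-") and accepting that ps may truncate args, as with "var_ro.20140522_"; group_by_wchan() is a hypothetical helper:

```python
from collections import defaultdict

def group_by_wchan(ps_output):
    """Group `ps -eo pid,user,args,wchan` lines by their wait channel."""
    groups = defaultdict(list)
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[0].isdigit():
            continue  # skip the ps header line and anything malformed
        pid, wchan = int(fields[0]), fields[-1]
        # args sits between the user and wchan columns and may contain spaces
        args = " ".join(fields[2:-1])
        groups[wchan].append((pid, args))
    return dict(groups)
```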

I was able to stop my btrfs send/receive, which in turn unblocked sync; it
succeeded too (about 2 minutes later).
btrfs balance cancel did not return, but maybe that's normal.
I see:
legolas:~# btrfs balance status /mnt/btrfs_pool2/
Balance on '/mnt/btrfs_pool2/' is running, cancel requested
383 out of about 388 chunks balanced (457 considered),   1% left

It's been running for at least 15 minutes in 'cancel mode'. Is that normal?
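One way to tell whether a cancelled balance is still making progress is to capture the status output twice, some minutes apart, and compare the chunk counts. A minimal sketch of the parsing step, assuming the progress line format shown above (other btrfs-progs versions may print it differently); parse_balance_progress() is a hypothetical helper:

```python
import re

# Matches the progress line shown above, e.g.
# "383 out of about 388 chunks balanced (457 considered),   1% left"
PROGRESS_RE = re.compile(
    r"(?P<done>\d+) out of about (?P<total>\d+) chunks balanced "
    r"\((?P<considered>\d+) considered\),\s+(?P<left>\d+)% left"
)

def parse_balance_progress(status_text):
    """Return (done, total, considered, percent_left), or None if absent."""
    m = PROGRESS_RE.search(status_text)
    if m is None:
        return None
    return tuple(int(m.group(k)) for k in ("done", "total", "considered", "left"))
```

If "done" stays unchanged between two samples, the balance thread is most likely still waiting on the current chunk rather than moving forward.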

The system doesn't seem hung, but running anything else while a balance is
in progress seems to create an avalanche of locks that blocks everything.

Is that a known performance problem?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


Thread overview: 8+ messages
2014-05-22  9:09 3.15.0-rc5: btrfs and sync deadlock: call_rwsem_down_read_failed Marc MERLIN
2014-05-22 13:15 ` Marc MERLIN [this message]
2014-05-22 20:52   ` 3.15.0-rc5: btrfs and sync deadlock: call_rwsem_down_read_failed / balance seems to create locks that block everything else Duncan
2014-05-23  0:22     ` Marc MERLIN
2014-05-23 14:17       ` 3.15.0-rc5: now sync and mount are hung on call_rwsem_down_write_failed Marc MERLIN
2014-05-23 20:24         ` Chris Mason
2014-05-23 23:13           ` Marc MERLIN
2014-05-27 19:27             ` Chris Mason
