linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Nikolay Borisov <nborisov@suse.com>
To: David Newall <btrfs@davidnewall.com>, linux-btrfs@vger.kernel.org
Subject: Re: Mount/df/PAM login hangs during rsync to btrfs subvolume, or maybe doing btrfs subvolume snapshot
Date: Wed, 11 Sep 2019 13:52:59 +0300	[thread overview]
Message-ID: <133d5657-a045-ad53-31f0-75714d983255@suse.com> (raw)
In-Reply-To: <cfc872b2-ea1e-57b4-f548-48679daad069@davidnewall.com>



On 11.09.19 г. 13:21 ч., David Newall wrote:
>> echo w > /proc/sysrq-trigger
> 
> Interesting.
> 
> One material point which I failed to mention is that the btrfs volume is
> on an encrypted volume (cryptsetup luksOpen /dev/vdc backups).
> 
> The first step, "mount -r /dev/vg/ext2fs-snapshot
> /btrfs-backup-volume/local-snapshot", seemed to trigger the problem. 
> When I did the echo to sysrq-trigger, it seemed to stop blocking, but
> that might have been a coincidence.  After the echo, kernel output
> exceeded 100KB, so I saved it to https://davidnewall.com/kern.1

Nothing useful in that log, everything seems normal. 

> 
> During rsync (--archive --one-file-system --hard-links --inplace
> --numeric-ids --delete /btrfs-backup-volume/local-snapshot/
> /btrfs-backup-volume/data/), initially there was no problem, but, then
> it (df) seemed to hang again.  The rsync took a long time to complete,
> and before it did finish, I did the echo to sysrq-trigger again; kernel
> output is saved to https://davidnewall.com/kern.2

Same thing here

> 
> The rsync finished not long after the echo to sysrq-trigger, but that's
> probably also a coincidence.  After rsync completed, df still hung.  I
> did another echo to sysrq-trigger, and saved kernel output to
> https://davidnewall.com/kern.3

So here it seems df is blocked inside btrfs_show_devname trying to acquire device_list_mutex:

Sep 11 19:24:11 crowies kernel: [1109983.104578] Call Trace:
Sep 11 19:24:11 crowies kernel: [1109983.104580]  [<ffffffff81860625>] schedule+0x35/0x80
Sep 11 19:24:11 crowies kernel: [1109983.104582]  [<ffffffff8186097e>] schedule_preempt_disabled+0xe/0x10
Sep 11 19:24:11 crowies kernel: [1109983.104584]  [<ffffffff81862817>] __mutex_lock_slowpath+0xb7/0x130
Sep 11 19:24:11 crowies kernel: [1109983.104586]  [<ffffffff818628af>] mutex_lock+0x1f/0x30
Sep 11 19:24:11 crowies kernel: [1109983.104599]  [<ffffffffc01a0edb>] btrfs_show_devname+0x2b/0xe0 [btrfs]
Sep 11 19:24:11 crowies kernel: [1109983.104603]  [<ffffffff81261547>] show_mountinfo+0x1b7/0x2a0
Sep 11 19:24:11 crowies kernel: [1109983.104608]  [<ffffffff8123d25d>] m_show+0x1d/0x20
Sep 11 19:24:11 crowies kernel: [1109983.104611]  [<ffffffff81242af0>] seq_read+0x300/0x3d0
Sep 11 19:24:11 crowies kernel: [1109983.104614]  [<ffffffff8121d1eb>] __vfs_read+0x1b/0x40
Sep 11 19:24:11 crowies kernel: [1109983.104616]  [<ffffffff8121d966>] vfs_read+0x86/0x130
Sep 11 19:24:11 crowies kernel: [1109983.104618]  [<ffffffff8121e6bc>] SyS_read+0x5c/0xe0
Sep 11 19:24:11 crowies kernel: [1109983.104623]  [<ffffffff8106f0a7>] ? trace_do_page_fault+0x37/0xe0
Sep 11 19:24:11 crowies kernel: [1109983.104625]  [<ffffffff81864f9b>] entry_SYSCALL_64_fastpath+0x22/0xcb

Looking at the other threads transaction commit is writing the superblocks
to the underlying storage: 

Sep 11 19:24:11 crowies kernel: [1109983.104414]  [<ffffffff818600e1>] ? __schedule+0x301/0x810
Sep 11 19:24:11 crowies kernel: [1109983.104417]  [<ffffffff81860625>] schedule+0x35/0x80
Sep 11 19:24:11 crowies kernel: [1109983.104419]  [<ffffffff81863be4>] schedule_timeout+0x1b4/0x270
Sep 11 19:24:11 crowies kernel: [1109983.104421]  [<ffffffff81860121>] ? __schedule+0x341/0x810
Sep 11 19:24:11 crowies kernel: [1109983.104424]  [<ffffffff81861132>] wait_for_completion+0xb2/0x190
Sep 11 19:24:11 crowies kernel: [1109983.104429]  [<ffffffff810b34a0>] ? wake_up_q+0x70/0x70
Sep 11 19:24:11 crowies kernel: [1109983.104485]  [<ffffffffc01ce557>] write_all_supers.isra.43+0x977/0xb10 [btrfs]
Sep 11 19:24:11 crowies kernel: [1109983.104501]  [<ffffffffc01cf7b7>] write_ctree_super+0x17/0x20 [btrfs]
Sep 11 19:24:11 crowies kernel: [1109983.104517]  [<ffffffffc01d658a>] btrfs_commit_transaction+0x7fa/0xb10 [btrfs]
Sep 11 19:24:11 crowies kernel: [1109983.104532]  [<ffffffffc01d13c8>] transaction_kthread+0x1c8/0x230 [btrfs]
Sep 11 19:24:11 crowies kernel: [1109983.104548]  [<ffffffffc01d1200>] ? btrfs_cleanup_transaction+0x570/0x570 [btrfs]
Sep 11 19:24:11 crowies kernel: [1109983.104553]  [<ffffffff810a7707>] kthread+0xe7/0x100
Sep 11 19:24:11 crowies kernel: [1109983.104555]  [<ffffffff818600e1>] ? __schedule+0x301/0x810
Sep 11 19:24:11 crowies kernel: [1109983.104557]  [<ffffffff810a7620>] ? kthread_create_on_node+0x1e0/0x1e0
Sep 11 19:24:11 crowies kernel: [1109983.104560]  [<ffffffff81865425>] ret_from_fork+0x55/0x80
Sep 11 19:24:11 crowies kernel: [1109983.104562]  [<ffffffff810a7620>] ? kthread_create_on_node+0x1e0/0x1e0

And it's waiting for that write to complete, under device_list_mutex. From my POV it
seems you have slow storage and df is blocked while transaction commit is finished, in 
particular, the last phase of transaction commit - writing the superblocks. 


At this point I don't see anything which would suggest foul play. Newer kernels have changed
the locking in btrfs_show_devname and it takes an RCU rather than device_list_mutex. Also 
4.4.0-159 is ubuntu's own kernel and 4.4 is rather old at this stage. The only remediation I could
suggest is to upgrade your kernel either via ubuntu's HWE or compiling one on your own. 


> 
> I tried a minor change in procedure to see if it would bring the system
> back to normal response.  Normally I'd do "btrfs subvolume snapshot",
> but I tried unmounting the lvm2 snapshot first (umount
> /btrfs-backup-volume/local-snapshot).  It did not complete within the
> expected time, and another echo to sysrq-trigger resulted in
> https://davidnewall.com/kern.4
> 
> Eventually the umount completed and system came back to normal response.
> 
> I did the btrfs subvolume snapshot, and it completed faster than I could
> notice without causing any issues.
> 
> After unmounting the btrfs volume, I tried each step again, and
> everything completed within expected times without causing any hang.
> 
> Something which I did previously mention, but I'll repeat because it
> might well be important, is that the base ext2 filesystem is on a
> drbd-replicated volume.  I don't know if it's part of the problem, and I
> observe that the hang condition was not triggered at the point of
> creating the lvm2 snapshot.
> 
> I greatly appreciate your advice and help.
> 
> Thanks,
> 
> David
> 
> 

  reply	other threads:[~2019-09-11 10:53 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-11  6:45 Mount/df/PAM login hangs during rsync to btrfs subvolume, or maybe doing btrfs subvolume snapshot David Newall
2019-09-11  6:55 ` Nikolay Borisov
2019-09-11  7:03   ` David Newall
2019-09-11 10:21   ` David Newall
2019-09-11 10:52     ` Nikolay Borisov [this message]
2019-09-12  4:38       ` David Newall
2019-09-12  6:11         ` Nikolay Borisov
2019-09-12  6:28           ` Nikolay Borisov
2019-09-12  7:05             ` Qu Wenruo
2019-09-12 14:03               ` David Newall
2019-09-12 14:12                 ` Nikolay Borisov
2019-09-12 14:16                   ` Qu Wenruo
2019-09-12 14:14                 ` Qu Wenruo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=133d5657-a045-ad53-31f0-75714d983255@suse.com \
    --to=nborisov@suse.com \
    --cc=btrfs@davidnewall.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).