From: Nikolay Borisov <nborisov@suse.com>
To: David Newall <btrfs@davidnewall.com>, linux-btrfs@vger.kernel.org
Subject: Re: Mount/df/PAM login hangs during rsync to btrfs subvolume, or maybe doing btrfs subvolume snapshot
Date: Wed, 11 Sep 2019 09:55:43 +0300
Message-ID: <1a651f17-ba40-2f17-403e-69999e927b2d@suse.com>
In-Reply-To: <c00dfaf7-81a4-5e79-6279-b4af53f7f928@davidnewall.com>
On 11.09.19 at 9:45, David Newall wrote:
> Hi All,
>
> I might have misunderstood how to report a problem. I registered for
> bugzilla and reported a bug
> (https://bugzilla.kernel.org/show_bug.cgi?id=204757), but, perhaps I
> should have sent this message to this mailing list, first. My apologies
> if I bungled it.
>
> I've been trying to track down a problem, intermittently, for a long
> time, and now need to reach out for advice. I apologise in advance for
> the quality of this report, which I feel includes more detail than
> needed, yet may be missing what's important. I'm trying my best.
>
> The brief summary is that my system hangs during SSH login while a
> backup is in progress. Sshd uses PAM authentication. The problem seems
> to be related to mounts as df and mount also hang.
>
> The longer details are: I'm running Ubuntu 16.04.5 on a 64-bit VM under
> kvm. I back up data using the following steps:
>
> 1. Take an LVM2 snapshot of the (non-root) ext2 file-system mounted as
> /data;
> 2. Mount a btrfs file system as /backup;
> 3. Mount the snapshot over an empty directory (may be subvolume; does it
> make a difference?) on /backup/snapshot;
> 4. Rsync the snapshot (with --archive --one-file-system --hard-links
> --inplace --numeric-ids --delete) to a subvolume /backup/data (thus it
> always contains /data as at last backup);
> 5. Take btrfs subvolume snapshot of /backup/data;
> 6. Unmount /backup/snapshot and /backup.
>
> By the time I get called, SSH logins via PAM hang (but complete
> "immediately" if I re-configure sshd for UsePAM no). Sessions which are
> still logged in seem unaffected, except df and mount both hang. I don't
> know what else hangs.
>
> During all of these steps, /data is almost static, maybe even
> completely static.
>
> I've queried my user, carefully, to determine the exact step where it
> starts to hang, and am 90% confident in her answer, which indicates that
> the hang-condition starts during rsync.
>
> Processes that were hanging complete normally when subvolume snapshot
> finishes.
>
> There's a chance that processes complete when the snapshot or btrfs
> file-system is unmounted, but I think it's before then because I've
> tried running each step by hand, was unable to reproduce the problem,
> probably because the amount of data to rsync in real-use is much larger
> than I tried writing during that test. At any rate, during that test I
> could log in between and during each step of the procedure.
>
> The only messages in dmesg are "mounting ext2 file system using the ext4
> subsystem" and "mounted filesystem without journal. Opts: (null)", which
> sounds right as I use "mount" instead of "mount -text2".
>
> When I tried running df under strace, strace's output was:
>
> open("/proc/self/mountinfo", O_RDONLY) = 3
> fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
> read(3, "18 24 0:17 / /sys rw,nosuid,node"..., 1024) = 1024
> read(3, "ystemd/systemd-cgroups-agent,nam"..., 1024) = 1024
> read(3, "t rw,nosuid,nodev,noexec,relatim"..., 1024) = 1024
> read(3,
>
> After the subvolume snapshot completed, strace continued producing output:
>
> "fs lxcfs rw,user_id=0,group_id=0"..., 1024) = 624
> --- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=28055,
> si_uid=1000} ---
> read(3, "", 1024) = 0
> lseek(3, 0, SEEK_CUR) = 3696
> close(3) = 0
>
> I think the SIGCONT was because I suspended the parent, strace, using
> Ctrl-Z.
>
> I could just leave sshd doing non-PAM authentication but I think that's
> the wrong approach. How do I zero in on this problem?
When the issue manifests, run:

    echo w > /proc/sysrq-trigger

This should provide a backtrace for all threads that are currently in
uninterruptible sleep. If it's a deadlock due to btrfs being stuck, we
should see it. Also, please provide your exact kernel version.
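
As a rough sketch of how both pieces of information could be gathered in
one go: the backtraces end up in the kernel log, so they can be read back
with dmesg afterwards. The function name collect_hang_report and the
SYSRQ_TRIGGER variable below are made up for this example, and it assumes
a root shell that is still responsive while the hang is in progress.

```sh
#!/bin/sh
# Sketch only, not a definitive procedure. collect_hang_report and
# SYSRQ_TRIGGER are hypothetical names introduced for this example.

# Path of the SysRq trigger file; overridable so the function can be dry-run.
: "${SYSRQ_TRIGGER:=/proc/sysrq-trigger}"

# Usage: collect_hang_report OUTPUT_FILE   (run as root during the hang)
collect_hang_report() {
    out="$1"

    # Record the exact kernel version first.
    echo "kernel: $(uname -r)" > "$out" || return 1

    # 'w' asks the kernel to log backtraces of tasks in uninterruptible sleep.
    echo w > "$SYSRQ_TRIGGER" || return 1

    # The backtraces go to the kernel log, so append it to the report.
    dmesg >> "$out" 2>/dev/null || echo "(could not read kernel log)" >> "$out"
}
```

For example, calling collect_hang_report /tmp/hang-report.txt as root
while df is stuck would leave a single file that can be attached to a
reply.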
>
> Thanks,
>
> David
>
>