linux-btrfs.vger.kernel.org archive mirror
From: Nikolay Borisov <nborisov@suse.com>
To: David Newall <btrfs@davidnewall.com>, linux-btrfs@vger.kernel.org
Subject: Re: Mount/df/PAM login hangs during rsync to btrfs subvolume, or maybe doing btrfs subvolume snapshot
Date: Wed, 11 Sep 2019 09:55:43 +0300
Message-ID: <1a651f17-ba40-2f17-403e-69999e927b2d@suse.com>
In-Reply-To: <c00dfaf7-81a4-5e79-6279-b4af53f7f928@davidnewall.com>



On 11.09.19 at 9:45, David Newall wrote:
> Hi All,
> 
> I might have misunderstood how to report a problem.  I registered for
> bugzilla and reported a bug
> (https://bugzilla.kernel.org/show_bug.cgi?id=204757), but, perhaps I
> should have sent this message to this mailing list, first.  My apologies
> if I bungled it.
> 
> I've been trying to track down a problem, intermittently, for a long
> time, and now need to reach out for advice.  I apologise in advance for
> the quality of this report, which I feel includes more detail than
> needed, yet may be missing what's important.  I'm trying my best.
> 
> The brief summary is that my system hangs during SSH login while a
> backup is in progress.  Sshd uses PAM authentication.  The problem seems
> to be related to mounts as df and mount also hang.
> 
> The longer details are:  I'm running Ubuntu 16.04.5 on a 64-bit VM under
> kvm.  I backup data using the following steps:
> 
> 1. Take an LVM2 snapshot of the (non-root) ext2 file-system mounted as
> /data;
> 2. Mount a btrfs file system as /backup;
> 3. Mount the snapshot over an empty directory (may be subvolume; does it
> make a difference?) on /backup/snapshot;
> 4. Rsync the snapshot (with --archive --one-file-system --hard-links
> --inplace --numeric-ids --delete) to a subvolume /backup/data (thus it
> always contains /data as at last backup);
> 5. Take btrfs subvolume snapshot of /backup/data;
> 6. Unmount /backup/snapshot and /backup.
> 
> By the time I get called, SSH logins via PAM hang (but complete
> "immediately" if I re-configure sshd for UsePAM no).  Sessions which are
> still logged in seem unaffected, except df and mount both hang.  I don't
> know what else hangs.
> 
> During all of these steps, /data is almost static, and may even be
> completely static.
> 
> I've queried my user, carefully, to determine the exact step where it
> starts to hang, and am 90% confident in her answer, which indicates that
> the hang-condition starts during rsync.
> 
> Processes that were hanging complete normally when subvolume snapshot
> finishes.
> 
> There's a chance that processes complete when the snapshot or btrfs
> file-system is unmounted, but I think it's before then because I've
> tried running each step by hand, was unable to reproduce the problem,
> probably because the amount of data to rsync in real-use is much larger
> than I tried writing during that test.  At any rate, during that test I
> could log in between and during each step of the procedure.
> 
> The only messages in dmesg are "mounting ext2 file system using the ext4
> subsystem" and "mounted filesystem without journal. Opts: (null)", which
> sounds right as I use "mount" instead of "mount -text2".
> 
> When I tried running df under strace, strace's output was:
> 
>   open("/proc/self/mountinfo", O_RDONLY)  = 3
>   fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
>   read(3, "18 24 0:17 / /sys rw,nosuid,node"..., 1024) = 1024
>   read(3, "ystemd/systemd-cgroups-agent,nam"..., 1024) = 1024
>   read(3, "t rw,nosuid,nodev,noexec,relatim"..., 1024) = 1024
>   read(3,
> 
> After the subvolume snapshot completed, strace continued producing output:
> 
>   "fs lxcfs rw,user_id=0,group_id=0"..., 1024) = 624
>   --- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=28055,
> si_uid=1000} ---
>   read(3, "", 1024)                       = 0
>   lseek(3, 0, SEEK_CUR)                   = 3696
>   close(3)                                = 0
> 
> I think the SIGCONT was because I suspended the parent, strace, using
> Ctrl-Z.
> 
> I could just leave sshd doing non-PAM authentication but I think that's
> the wrong approach.  How do I zero in on this problem?

When the issue manifests, run:

echo w > /proc/sysrq-trigger

This should write a backtrace to the kernel log (visible via dmesg) for
every task currently in uninterruptible sleep. If btrfs is stuck in a
deadlock, it should show up there. Also provide your exact kernel version.
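
The requests above could be gathered in one go with something like the
sketch below. It is only an illustration: the function names
(stat_state, list_blocked, collect_report) and the default output file
name blocked-tasks.txt are made up here, not taken from the thread.
list_blocked is an extra, unprivileged check that reads task states from
/proc; collect_report needs root and should be run while the hang is
actually in progress.

```shell
#!/bin/sh
# Sketch only; all names below are hypothetical helpers.

# Print the state field from one /proc/<pid>/stat line.
# The comm field may contain spaces or ')', so cut after the last ") ".
stat_state() {
    rest=${1##*) }
    printf '%s\n' "${rest%% *}"
}

# List tasks currently in uninterruptible sleep (state D) as "pid (comm)".
# Works without root; tasks that exit mid-scan are skipped.
list_blocked() {
    for f in /proc/[0-9]*/task/[0-9]*/stat; do
        { read -r line < "$f"; } 2>/dev/null || continue
        [ "$(stat_state "$line")" = D ] && printf '%s)\n' "${line%) *}"
    done
    return 0
}

# Save the kernel version, then ask the kernel to dump backtraces of
# blocked tasks and append the kernel log. Root only.
collect_report() {
    out=${1:-blocked-tasks.txt}
    uname -r > "$out"
    echo w > /proc/sysrq-trigger
    dmesg >> "$out"
}
```

After sourcing the file in a root shell, calling collect_report during
the hang leaves the kernel version and the kernel log, including the
blocked-task dump, in a single file that can be attached to a reply.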

> 
> Thanks,
> 
> David
> 
> 

Thread overview: 13+ messages
2019-09-11  6:45 Mount/df/PAM login hangs during rsync to btrfs subvolume, or maybe doing btrfs subvolume snapshot David Newall
2019-09-11  6:55 ` Nikolay Borisov [this message]
2019-09-11  7:03   ` David Newall
2019-09-11 10:21   ` David Newall
2019-09-11 10:52     ` Nikolay Borisov
2019-09-12  4:38       ` David Newall
2019-09-12  6:11         ` Nikolay Borisov
2019-09-12  6:28           ` Nikolay Borisov
2019-09-12  7:05             ` Qu Wenruo
2019-09-12 14:03               ` David Newall
2019-09-12 14:12                 ` Nikolay Borisov
2019-09-12 14:16                   ` Qu Wenruo
2019-09-12 14:14                 ` Qu Wenruo
