linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: David Newall <btrfs@davidnewall.com>
To: linux-btrfs@vger.kernel.org
Subject: Mount/df/PAM login hangs during rsync to btrfs subvolume, or maybe doing btrfs subvolume snapshot
Date: Wed, 11 Sep 2019 16:15:06 +0930	[thread overview]
Message-ID: <c00dfaf7-81a4-5e79-6279-b4af53f7f928@davidnewall.com> (raw)

Hi All,

I might have misunderstood how to report a problem.  I registered for 
bugzilla and reported a bug 
(https://bugzilla.kernel.org/show_bug.cgi?id=204757), but, perhaps I 
should have sent this message to this mailing list, first.  My apologies 
if I bungled it.

I've been trying to track down a problem, intermittently, for a long 
time, and now need to reach out for advice.  I apologise in advance for 
the quality of this report, which I feel includes more detail than 
needed, yet may be missing what's important.  I'm trying my best.

The brief summary is that my system hangs during SSH login while a 
backup is in progress.  Sshd uses PAM authentication.  The problem seems 
to be related to mounts as df and mount also hang.

The longer details are:  I'm running Ubuntu 16.04.5 on a 64-bit VM under 
kvm.  I backup data using the following steps:

1. Take an LVM2 snapshot of the (non-root) ext2 file-system mounted as 
/data;
2. Mount a btrfs file system as /backup;
2. Mount the snapshot over an empty directory (may be subvolume; does it 
make a difference?) on /backup/snapshot;
3. Rsync the snapshot (with --archive --one-file-system --hard-links 
--inplace --numeric-ids --delete) to a subvolume /backup/data (thus it 
always contains /data as at last backup);
4. Take btrfs subvolume snapshot of /backup/data;
5. Unmount /backup/snapshot and /backup.

By the time I get called, SSH logins via PAM hang (but complete 
"immediately" if I re-configure sshd for UsePAM no).  Sessions which are 
still logged in seem unaffected, except df and mount both hang.  I don't 
know what else hangs.

During all of these steps, the /data is almost static, maybe even be 
completely static.

I've queried my user, carefully, to determine the exact step where it 
starts to hang, and am 90% confident in her answer, which indicates that 
the hang-condition starts during rsync.

Processes that were hanging complete normally when subvolume snapshot 
finishes.

There's a chance that processes complete when the snapshot or btrfs 
file-system is unmounted, but I think it's before then because I've 
tried running each step by hand, was unable to reproduce the problem, 
probably because the amount of data to rsync in real-use is much larger 
than I tried writing during that test.  At any rate, during that test I 
could log in between and during each step of the procedure.

The only messages in dmesg are "mounting ext2 file system using the ext4 
subsystem" and "mounted filesystem without journal. Opts: (null)", which 
sounds right as I use "mount" instead of "mount -text2".

When I tried running df under strace, strace's output was:

   open("/proc/self/mountinfo", O_RDONLY)  = 3
   fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
   read(3, "18 24 0:17 / /sys rw,nosuid,node"..., 1024) = 1024
   read(3, "ystemd/systemd-cgroups-agent,nam"..., 1024) = 1024
   read(3, "t rw,nosuid,nodev,noexec,relatim"..., 1024) = 1024
   read(3,

After the subvolume snapshot completed, strace continued producing output:

   "fs lxcfs rw,user_id=0,group_id=0"..., 1024) = 624
   --- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=28055, 
si_uid=1000} ---
   read(3, "", 1024)                       = 0
   lseek(3, 0, SEEK_CUR)                   = 3696
   close(3)                                = 0

I think the SIGCONT was because I suspended the parent, strace, using 
Ctrl-Z.

I could just leave sshd doing non-PAM authentication but I think that's 
the wrong approach.  How do I zero in on this problem?

Thanks,

David


             reply	other threads:[~2019-09-11  6:50 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-11  6:45 David Newall [this message]
2019-09-11  6:55 ` Mount/df/PAM login hangs during rsync to btrfs subvolume, or maybe doing btrfs subvolume snapshot Nikolay Borisov
2019-09-11  7:03   ` David Newall
2019-09-11 10:21   ` David Newall
2019-09-11 10:52     ` Nikolay Borisov
2019-09-12  4:38       ` David Newall
2019-09-12  6:11         ` Nikolay Borisov
2019-09-12  6:28           ` Nikolay Borisov
2019-09-12  7:05             ` Qu Wenruo
2019-09-12 14:03               ` David Newall
2019-09-12 14:12                 ` Nikolay Borisov
2019-09-12 14:16                   ` Qu Wenruo
2019-09-12 14:14                 ` Qu Wenruo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c00dfaf7-81a4-5e79-6279-b4af53f7f928@davidnewall.com \
    --to=btrfs@davidnewall.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).