linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Mount/df/PAM login hangs during rsync to btrfs subvolume, or maybe doing btrfs subvolume snapshot
@ 2019-09-11  6:45 David Newall
  2019-09-11  6:55 ` Nikolay Borisov
  0 siblings, 1 reply; 13+ messages in thread
From: David Newall @ 2019-09-11  6:45 UTC (permalink / raw)
  To: linux-btrfs

Hi All,

I might have misunderstood how to report a problem.  I registered for 
bugzilla and reported a bug 
(https://bugzilla.kernel.org/show_bug.cgi?id=204757), but, perhaps I 
should have sent this message to this mailing list, first.  My apologies 
if I bungled it.

I've been trying to track down a problem, intermittently, for a long 
time, and now need to reach out for advice.  I apologise in advance for 
the quality of this report, which I feel includes more detail than 
needed, yet may be missing what's important.  I'm trying my best.

The brief summary is that my system hangs during SSH login while a 
backup is in progress.  Sshd uses PAM authentication.  The problem seems 
to be related to mounts as df and mount also hang.

The longer details are:  I'm running Ubuntu 16.04.5 on a 64-bit VM under 
kvm.  I backup data using the following steps:

1. Take an LVM2 snapshot of the (non-root) ext2 file-system mounted as 
/data;
2. Mount a btrfs file system as /backup;
2. Mount the snapshot over an empty directory (may be subvolume; does it 
make a difference?) on /backup/snapshot;
3. Rsync the snapshot (with --archive --one-file-system --hard-links 
--inplace --numeric-ids --delete) to a subvolume /backup/data (thus it 
always contains /data as at last backup);
4. Take btrfs subvolume snapshot of /backup/data;
5. Unmount /backup/snapshot and /backup.

By the time I get called, SSH logins via PAM hang (but complete 
"immediately" if I re-configure sshd for UsePAM no).  Sessions which are 
still logged in seem unaffected, except df and mount both hang.  I don't 
know what else hangs.

During all of these steps, the /data is almost static, maybe even be 
completely static.

I've queried my user, carefully, to determine the exact step where it 
starts to hang, and am 90% confident in her answer, which indicates that 
the hang-condition starts during rsync.

Processes that were hanging complete normally when subvolume snapshot 
finishes.

There's a chance that processes complete when the snapshot or btrfs 
file-system is unmounted, but I think it's before then because I've 
tried running each step by hand, was unable to reproduce the problem, 
probably because the amount of data to rsync in real-use is much larger 
than I tried writing during that test.  At any rate, during that test I 
could log in between and during each step of the procedure.

The only messages in dmesg are "mounting ext2 file system using the ext4 
subsystem" and "mounted filesystem without journal. Opts: (null)", which 
sounds right as I use "mount" instead of "mount -text2".

When I tried running df under strace, strace's output was:

   open("/proc/self/mountinfo", O_RDONLY)  = 3
   fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
   read(3, "18 24 0:17 / /sys rw,nosuid,node"..., 1024) = 1024
   read(3, "ystemd/systemd-cgroups-agent,nam"..., 1024) = 1024
   read(3, "t rw,nosuid,nodev,noexec,relatim"..., 1024) = 1024
   read(3,

After the subvolume snapshot completed, strace continued producing output:

   "fs lxcfs rw,user_id=0,group_id=0"..., 1024) = 624
   --- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=28055, 
si_uid=1000} ---
   read(3, "", 1024)                       = 0
   lseek(3, 0, SEEK_CUR)                   = 3696
   close(3)                                = 0

I think the SIGCONT was because I suspended the parent, strace, using 
Ctrl-Z.

I could just leave sshd doing non-PAM authentication but I think that's 
the wrong approach.  How do I zero in on this problem?

Thanks,

David


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Mount/df/PAM login hangs during rsync to btrfs subvolume, or maybe doing btrfs subvolume snapshot
  2019-09-11  6:45 Mount/df/PAM login hangs during rsync to btrfs subvolume, or maybe doing btrfs subvolume snapshot David Newall
@ 2019-09-11  6:55 ` Nikolay Borisov
  2019-09-11  7:03   ` David Newall
  2019-09-11 10:21   ` David Newall
  0 siblings, 2 replies; 13+ messages in thread
From: Nikolay Borisov @ 2019-09-11  6:55 UTC (permalink / raw)
  To: David Newall, linux-btrfs



On 11.09.19 г. 9:45 ч., David Newall wrote:
> Hi All,
> 
> I might have misunderstood how to report a problem.  I registered for
> bugzilla and reported a bug
> (https://bugzilla.kernel.org/show_bug.cgi?id=204757), but, perhaps I
> should have sent this message to this mailing list, first.  My apologies
> if I bungled it.
> 
> I've been trying to track down a problem, intermittently, for a long
> time, and now need to reach out for advice.  I apologise in advance for
> the quality of this report, which I feel includes more detail than
> needed, yet may be missing what's important.  I'm trying my best.
> 
> The brief summary is that my system hangs during SSH login while a
> backup is in progress.  Sshd uses PAM authentication.  The problem seems
> to be related to mounts as df and mount also hang.
> 
> The longer details are:  I'm running Ubuntu 16.04.5 on a 64-bit VM under
> kvm.  I backup data using the following steps:
> 
> 1. Take an LVM2 snapshot of the (non-root) ext2 file-system mounted as
> /data;
> 2. Mount a btrfs file system as /backup;
> 2. Mount the snapshot over an empty directory (may be subvolume; does it
> make a difference?) on /backup/snapshot;
> 3. Rsync the snapshot (with --archive --one-file-system --hard-links
> --inplace --numeric-ids --delete) to a subvolume /backup/data (thus it
> always contains /data as at last backup);
> 4. Take btrfs subvolume snapshot of /backup/data;
> 5. Unmount /backup/snapshot and /backup.
> 
> By the time I get called, SSH logins via PAM hang (but complete
> "immediately" if I re-configure sshd for UsePAM no).  Sessions which are
> still logged in seem unaffected, except df and mount both hang.  I don't
> know what else hangs.
> 
> During all of these steps, the /data is almost static, maybe even be
> completely static.
> 
> I've queried my user, carefully, to determine the exact step where it
> starts to hang, and am 90% confident in her answer, which indicates that
> the hang-condition starts during rsync.
> 
> Processes that were hanging complete normally when subvolume snapshot
> finishes.
> 
> There's a chance that processes complete when the snapshot or btrfs
> file-system is unmounted, but I think it's before then because I've
> tried running each step by hand, was unable to reproduce the problem,
> probably because the amount of data to rsync in real-use is much larger
> than I tried writing during that test.  At any rate, during that test I
> could log in between and during each step of the procedure.
> 
> The only messages in dmesg are "mounting ext2 file system using the ext4
> subsystem" and "mounted filesystem without journal. Opts: (null)", which
> sounds right as I use "mount" instead of "mount -text2".
> 
> When I tried running df under strace, strace's output was:
> 
>   open("/proc/self/mountinfo", O_RDONLY)  = 3
>   fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
>   read(3, "18 24 0:17 / /sys rw,nosuid,node"..., 1024) = 1024
>   read(3, "ystemd/systemd-cgroups-agent,nam"..., 1024) = 1024
>   read(3, "t rw,nosuid,nodev,noexec,relatim"..., 1024) = 1024
>   read(3,
> 
> After the subvolume snapshot completed, strace continued producing output:
> 
>   "fs lxcfs rw,user_id=0,group_id=0"..., 1024) = 624
>   --- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=28055,
> si_uid=1000} ---
>   read(3, "", 1024)                       = 0
>   lseek(3, 0, SEEK_CUR)                   = 3696
>   close(3)                                = 0
> 
> I think the SIGCONT was because I suspended the parent, strace, using
> Ctrl-Z.
> 
> I could just leave sshd doing non-PAM authentication but I think that's
> the wrong approach.  How do I zero in on this problem?

When the issue manifests do :

echo w > /proc/sysrq-trigger

This should provide a backtrace for all threads which are currently in
uninterruptible sleep. If it's a deadlock due to btrfs being stuck we
should see it. Also provide your exact kernel version.

> 
> Thanks,
> 
> David
> 
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Mount/df/PAM login hangs during rsync to btrfs subvolume, or maybe doing btrfs subvolume snapshot
  2019-09-11  6:55 ` Nikolay Borisov
@ 2019-09-11  7:03   ` David Newall
  2019-09-11 10:21   ` David Newall
  1 sibling, 0 replies; 13+ messages in thread
From: David Newall @ 2019-09-11  7:03 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs

> provide your exact kernel version.

I apologise.  I included that in bugzilla but forgot to include it in my 
message to this list.  It's 4.4.0-159.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Mount/df/PAM login hangs during rsync to btrfs subvolume, or maybe doing btrfs subvolume snapshot
  2019-09-11  6:55 ` Nikolay Borisov
  2019-09-11  7:03   ` David Newall
@ 2019-09-11 10:21   ` David Newall
  2019-09-11 10:52     ` Nikolay Borisov
  1 sibling, 1 reply; 13+ messages in thread
From: David Newall @ 2019-09-11 10:21 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs

> echo w > /proc/sysrq-trigger

Interesting.

One material point which I failed to mention is that the btrfs volume is 
on an encrypted volume (cryptsetup luksOpen /dev/vdc backups).

The first step, "mount -r /dev/vg/ext2fs-snapshot 
/btrfs-backup-volume/local-snapshot", seemed to trigger the problem.  
When I did the echo to sysrq-trigger, it seemed to stop blocking, but 
that might have been a coincidence.  After the echo, kernel output 
exceeded 100KB, so I saved it to https://davidnewall.com/kern.1

During rsync (--archive --one-file-system --hard-links --inplace 
--numeric-ids --delete /btrfs-backup-volume/local-snapshot/ 
/btrfs-backup-volume/data/), initially there was no problem, but, then 
it (df) seemed to hang again.  The rsync took a long time to complete, 
and before it did finish, I did the echo to sysrq-trigger again; kernel 
output is saved to https://davidnewall.com/kern.2

The rsync finished not long after the echo to sysrq-trigger, but that's 
probably also a coincidence.  After rsync completed, df still hung.  I 
did another echo to sysrq-trigger, and saved kernel output to 
https://davidnewall.com/kern.3

I tried a minor change in procedure to see if it would bring the system 
back to normal response.  Normally I'd do "btrfs subvolume snapshot", 
but I tried unmounting the lvm2 snapshot first (umount 
/btrfs-backup-volume/local-snapshot).  It did not complete within the 
expected time, and another echo to sysrq-trigger resulted in 
https://davidnewall.com/kern.4

Eventually the umount completed and system came back to normal response.

I did the btrfs subvolume snapshot, and it completed faster than I could 
notice without causing any issues.

After unmounting the btrfs volume, I tried each step again, and 
everything completed within expected times without causing any hang.

Something which I did previously mention, but I'll repeat because it 
might well be important, is that the base ext2 filesystem is on a 
drbd-replicated volume.  I don't know if it's part of the problem, and I 
observe that the hang condition was not triggered at the point of 
creating the lvm2 snapshot.

I greatly appreciate your advice and help.

Thanks,

David


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Mount/df/PAM login hangs during rsync to btrfs subvolume, or maybe doing btrfs subvolume snapshot
  2019-09-11 10:21   ` David Newall
@ 2019-09-11 10:52     ` Nikolay Borisov
  2019-09-12  4:38       ` David Newall
  0 siblings, 1 reply; 13+ messages in thread
From: Nikolay Borisov @ 2019-09-11 10:52 UTC (permalink / raw)
  To: David Newall, linux-btrfs



On 11.09.19 г. 13:21 ч., David Newall wrote:
>> echo w > /proc/sysrq-trigger
> 
> Interesting.
> 
> One material point which I failed to mention is that the btrfs volume is
> on an encrypted volume (cryptsetup luksOpen /dev/vdc backups).
> 
> The first step, "mount -r /dev/vg/ext2fs-snapshot
> /btrfs-backup-volume/local-snapshot", seemed to trigger the problem. 
> When I did the echo to sysrq-trigger, it seemed to stop blocking, but
> that might have been a coincidence.  After the echo, kernel output
> exceeded 100KB, so I saved it to https://davidnewall.com/kern.1

Nothing useful in that log, everything seems normal. 

> 
> During rsync (--archive --one-file-system --hard-links --inplace
> --numeric-ids --delete /btrfs-backup-volume/local-snapshot/
> /btrfs-backup-volume/data/), initially there was no problem, but, then
> it (df) seemed to hang again.  The rsync took a long time to complete,
> and before it did finish, I did the echo to sysrq-trigger again; kernel
> output is saved to https://davidnewall.com/kern.2

Same thing here

> 
> The rsync finished not long after the echo to sysrq-trigger, but that's
> probably also a coincidence.  After rsync completed, df still hung.  I
> did another echo to sysrq-trigger, and saved kernel output to
> https://davidnewall.com/kern.3

So here it seems df is blocked inside btrfs_show_devname trying to acquire device_list_mutex:

Sep 11 19:24:11 crowies kernel: [1109983.104578] Call Trace:
Sep 11 19:24:11 crowies kernel: [1109983.104580]  [<ffffffff81860625>] schedule+0x35/0x80
Sep 11 19:24:11 crowies kernel: [1109983.104582]  [<ffffffff8186097e>] schedule_preempt_disabled+0xe/0x10
Sep 11 19:24:11 crowies kernel: [1109983.104584]  [<ffffffff81862817>] __mutex_lock_slowpath+0xb7/0x130
Sep 11 19:24:11 crowies kernel: [1109983.104586]  [<ffffffff818628af>] mutex_lock+0x1f/0x30
Sep 11 19:24:11 crowies kernel: [1109983.104599]  [<ffffffffc01a0edb>] btrfs_show_devname+0x2b/0xe0 [btrfs]
Sep 11 19:24:11 crowies kernel: [1109983.104603]  [<ffffffff81261547>] show_mountinfo+0x1b7/0x2a0
Sep 11 19:24:11 crowies kernel: [1109983.104608]  [<ffffffff8123d25d>] m_show+0x1d/0x20
Sep 11 19:24:11 crowies kernel: [1109983.104611]  [<ffffffff81242af0>] seq_read+0x300/0x3d0
Sep 11 19:24:11 crowies kernel: [1109983.104614]  [<ffffffff8121d1eb>] __vfs_read+0x1b/0x40
Sep 11 19:24:11 crowies kernel: [1109983.104616]  [<ffffffff8121d966>] vfs_read+0x86/0x130
Sep 11 19:24:11 crowies kernel: [1109983.104618]  [<ffffffff8121e6bc>] SyS_read+0x5c/0xe0
Sep 11 19:24:11 crowies kernel: [1109983.104623]  [<ffffffff8106f0a7>] ? trace_do_page_fault+0x37/0xe0
Sep 11 19:24:11 crowies kernel: [1109983.104625]  [<ffffffff81864f9b>] entry_SYSCALL_64_fastpath+0x22/0xcb

Looking at the other threads transaction commit is writing the superblocks
to the underlying storage: 

Sep 11 19:24:11 crowies kernel: [1109983.104414]  [<ffffffff818600e1>] ? __schedule+0x301/0x810
Sep 11 19:24:11 crowies kernel: [1109983.104417]  [<ffffffff81860625>] schedule+0x35/0x80
Sep 11 19:24:11 crowies kernel: [1109983.104419]  [<ffffffff81863be4>] schedule_timeout+0x1b4/0x270
Sep 11 19:24:11 crowies kernel: [1109983.104421]  [<ffffffff81860121>] ? __schedule+0x341/0x810
Sep 11 19:24:11 crowies kernel: [1109983.104424]  [<ffffffff81861132>] wait_for_completion+0xb2/0x190
Sep 11 19:24:11 crowies kernel: [1109983.104429]  [<ffffffff810b34a0>] ? wake_up_q+0x70/0x70
Sep 11 19:24:11 crowies kernel: [1109983.104485]  [<ffffffffc01ce557>] write_all_supers.isra.43+0x977/0xb10 [btrfs]
Sep 11 19:24:11 crowies kernel: [1109983.104501]  [<ffffffffc01cf7b7>] write_ctree_super+0x17/0x20 [btrfs]
Sep 11 19:24:11 crowies kernel: [1109983.104517]  [<ffffffffc01d658a>] btrfs_commit_transaction+0x7fa/0xb10 [btrfs]
Sep 11 19:24:11 crowies kernel: [1109983.104532]  [<ffffffffc01d13c8>] transaction_kthread+0x1c8/0x230 [btrfs]
Sep 11 19:24:11 crowies kernel: [1109983.104548]  [<ffffffffc01d1200>] ? btrfs_cleanup_transaction+0x570/0x570 [btrfs]
Sep 11 19:24:11 crowies kernel: [1109983.104553]  [<ffffffff810a7707>] kthread+0xe7/0x100
Sep 11 19:24:11 crowies kernel: [1109983.104555]  [<ffffffff818600e1>] ? __schedule+0x301/0x810
Sep 11 19:24:11 crowies kernel: [1109983.104557]  [<ffffffff810a7620>] ? kthread_create_on_node+0x1e0/0x1e0
Sep 11 19:24:11 crowies kernel: [1109983.104560]  [<ffffffff81865425>] ret_from_fork+0x55/0x80
Sep 11 19:24:11 crowies kernel: [1109983.104562]  [<ffffffff810a7620>] ? kthread_create_on_node+0x1e0/0x1e0

And it's waiting for that write to complete, under device_list_mutex. From my POV it
seems you have slow storage and df is blocked while transaction commit is finished, in 
particular, the last phase of transaction commit - writing the superblocks. 


At this point I don't see anything which would suggest foul play. Newer kernels have changed
the locking in btrfs_show_devname and it takes an RCU rather than device_list_mutex. Also 
4.4.0-159 is ubuntu's own kernel and 4.4 is rather old at this stage. The only remediation I could
suggest is to upgrade your kernel either via ubuntu's HWE or compiling one on your own. 


> 
> I tried a minor change in procedure to see if it would bring the system
> back to normal response.  Normally I'd do "btrfs subvolume snapshot",
> but I tried unmounting the lvm2 snapshot first (umount
> /btrfs-backup-volume/local-snapshot).  It did not complete within the
> expected time, and another echo to sysrq-trigger resulted in
> https://davidnewall.com/kern.4
> 
> Eventually the umount completed and system came back to normal response.
> 
> I did the btrfs subvolume snapshot, and it completed faster than I could
> notice without causing any issues.
> 
> After unmounting the btrfs volume, I tried each step again, and
> everything completed within expected times without causing any hang.
> 
> Something which I did previously mention, but I'll repeat because it
> might well be important, is that the base ext2 filesystem is on a
> drbd-replicated volume.  I don't know if it's part of the problem, and I
> observe that the hang condition was not triggered at the point of
> creating the lvm2 snapshot.
> 
> I greatly appreciate your advice and help.
> 
> Thanks,
> 
> David
> 
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Mount/df/PAM login hangs during rsync to btrfs subvolume, or maybe doing btrfs subvolume snapshot
  2019-09-11 10:52     ` Nikolay Borisov
@ 2019-09-12  4:38       ` David Newall
  2019-09-12  6:11         ` Nikolay Borisov
  0 siblings, 1 reply; 13+ messages in thread
From: David Newall @ 2019-09-12  4:38 UTC (permalink / raw)
  To: Nikolay Borisov, David Newall, linux-btrfs

On 11/9/19 8:22 pm, Nikolay Borisov wrote:
>> I saved it tohttps://davidnewall.com/kern.1
> Nothing useful in that log, everything seems normal.

Thank you, again, for your help.  I am grateful.

If I understand the output, both df and mount are waiting for 
show_mountinfo which is waiting for btrfs_show_devname which is waiting 
to get a lock.  They have to wait to find the devname for ten minutes.  
Is that really normal?

I'm not saying that btrfs is behind it, but it does seem like something 
is not right.

I notice that there's a waiting btrfs-transation, too.  I don't know 
what the transaction would be, and no doubt it's completely appropriate, 
even though the use of btrfs at that point is merely one mount (and one 
mount of ext2 over a sub-directory, probably no involvement by btrfs.)

I see, too, that systemd is waiting for btrfs_show_devname.  It's a 
pattern.

I take your point about the kernel being somewhat old and accept that 
I'm unlikely to get far without confirming the problem on a recent kernel.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Mount/df/PAM login hangs during rsync to btrfs subvolume, or maybe doing btrfs subvolume snapshot
  2019-09-12  4:38       ` David Newall
@ 2019-09-12  6:11         ` Nikolay Borisov
  2019-09-12  6:28           ` Nikolay Borisov
  0 siblings, 1 reply; 13+ messages in thread
From: Nikolay Borisov @ 2019-09-12  6:11 UTC (permalink / raw)
  To: David Newall, linux-btrfs



On 12.09.19 г. 7:38 ч., David Newall wrote:
> On 11/9/19 8:22 pm, Nikolay Borisov wrote:
>>> I saved it tohttps://davidnewall.com/kern.1
>> Nothing useful in that log, everything seems normal.
> 
> Thank you, again, for your help.  I am grateful.
> 
> If I understand the output, both df and mount are waiting for
> show_mountinfo which is waiting for btrfs_show_devname which is waiting
> to get a lock.  They have to wait to find the devname for ten minutes. 
> Is that really normal?

It's normal for them to wait for the lock, it's not normal for the lock
to be taken for 10 minutes.

> 
> I'm not saying that btrfs is behind it, but it does seem like something
> is not right.

From my POV what's wrong is the fact that btrfs transaction commit is
taking a long time. Is it possible that the underlying storage is
exhibiting problems e.g. a dying disk/ssd?

> 
> I notice that there's a waiting btrfs-transation, too.  I don't know
> what the transaction would be, and no doubt it's completely appropriate,
> even though the use of btrfs at that point is merely one mount (and one
> mount of ext2 over a sub-directory, probably no involvement by btrfs.)
> 
> I see, too, that systemd is waiting for btrfs_show_devname.  It's a
> pattern.
> 
> I take your point about the kernel being somewhat old and accept that
> I'm unlikely to get far without confirming the problem on a recent kernel.
> 
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Mount/df/PAM login hangs during rsync to btrfs subvolume, or maybe doing btrfs subvolume snapshot
  2019-09-12  6:11         ` Nikolay Borisov
@ 2019-09-12  6:28           ` Nikolay Borisov
  2019-09-12  7:05             ` Qu Wenruo
  0 siblings, 1 reply; 13+ messages in thread
From: Nikolay Borisov @ 2019-09-12  6:28 UTC (permalink / raw)
  To: David Newall, linux-btrfs



On 12.09.19 г. 9:11 ч., Nikolay Borisov wrote:
> 
> 
> On 12.09.19 г. 7:38 ч., David Newall wrote:
>> On 11/9/19 8:22 pm, Nikolay Borisov wrote:
>>>> I saved it tohttps://davidnewall.com/kern.1
>>> Nothing useful in that log, everything seems normal.
>>
>> Thank you, again, for your help.  I am grateful.
>>
>> If I understand the output, both df and mount are waiting for
>> show_mountinfo which is waiting for btrfs_show_devname which is waiting
>> to get a lock.  They have to wait to find the devname for ten minutes. 
>> Is that really normal?
> 
> It's normal for them to wait for the lock, it's not normal for the lock
> to be taken for 10 minutes.
> 
>>
>> I'm not saying that btrfs is behind it, but it does seem like something
>> is not right.
> 
> From my POV what's wrong is the fact that btrfs transaction commit is
> taking a long time. Is it possible that the underlying storage is
> exhibiting problems e.g. a dying disk/ssd?

Actually when the issue occurs again can you sample the output of echo w
> /proc/sysrq-trigger. Because right now you have provided 3 samples in
the course of I don't know how many minutes. So they just give a
momentarily glimpse into what's happening. E.g. just because we saw
btrfs transaction/btrfs_show_devname doesn't necessarily mean that's
what's happening (Though having the same consistent state in the 3 logs
kind of suggests otherwise).

> 
>>
>> I notice that there's a waiting btrfs-transation, too.  I don't know
>> what the transaction would be, and no doubt it's completely appropriate,
>> even though the use of btrfs at that point is merely one mount (and one
>> mount of ext2 over a sub-directory, probably no involvement by btrfs.)
>>
>> I see, too, that systemd is waiting for btrfs_show_devname.  It's a
>> pattern.
>>
>> I take your point about the kernel being somewhat old and accept that
>> I'm unlikely to get far without confirming the problem on a recent kernel.
>>
>>
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Mount/df/PAM login hangs during rsync to btrfs subvolume, or maybe doing btrfs subvolume snapshot
  2019-09-12  6:28           ` Nikolay Borisov
@ 2019-09-12  7:05             ` Qu Wenruo
  2019-09-12 14:03               ` David Newall
  0 siblings, 1 reply; 13+ messages in thread
From: Qu Wenruo @ 2019-09-12  7:05 UTC (permalink / raw)
  To: Nikolay Borisov, David Newall, linux-btrfs



On 2019/9/12 下午2:28, Nikolay Borisov wrote:
>
>
> On 12.09.19 г. 9:11 ч., Nikolay Borisov wrote:
>>
>>
>> On 12.09.19 г. 7:38 ч., David Newall wrote:
>>> On 11/9/19 8:22 pm, Nikolay Borisov wrote:
>>>>> I saved it tohttps://davidnewall.com/kern.1
>>>> Nothing useful in that log, everything seems normal.
>>>
>>> Thank you, again, for your help.  I am grateful.
>>>
>>> If I understand the output, both df and mount are waiting for
>>> show_mountinfo which is waiting for btrfs_show_devname which is waiting
>>> to get a lock.  They have to wait to find the devname for ten minutes. 
>>> Is that really normal?
>>
>> It's normal for them to wait for the lock, it's not normal for the lock
>> to be taken for 10 minutes.
>>
>>>
>>> I'm not saying that btrfs is behind it, but it does seem like something
>>> is not right.
>>
>> From my POV what's wrong is the fact that btrfs transaction commit is
>> taking a long time. Is it possible that the underlying storage is
>> exhibiting problems e.g. a dying disk/ssd?
>
> Actually when the issue occurs again can you sample the output of echo w
>> /proc/sysrq-trigger. Because right now you have provided 3 samples in
> the course of I don't know how many minutes. So they just give a
> momentarily glimpse into what's happening. E.g. just because we saw
> btrfs transaction/btrfs_show_devname doesn't necessarily mean that's
> what's happening (Though having the same consistent state in the 3 logs
> kind of suggests otherwise).

I see all dumps are waiting for write_all_supers.

Would you please provide the code context of write_all_supers.isra.43+0x977?

I guess it's wait_dev_flush(), which is just really waiting for disk writes.

Would you please check how fast (or how slow in this particular case)
the related disks are?
To me, it really looks like just too slow devices.

Thanks,
Qu

>
>>
>>>
>>> I notice that there's a waiting btrfs-transation, too.  I don't know
>>> what the transaction would be, and no doubt it's completely appropriate,
>>> even though the use of btrfs at that point is merely one mount (and one
>>> mount of ext2 over a sub-directory, probably no involvement by btrfs.)
>>>
>>> I see, too, that systemd is waiting for btrfs_show_devname.  It's a
>>> pattern.
>>>
>>> I take your point about the kernel being somewhat old and accept that
>>> I'm unlikely to get far without confirming the problem on a recent kernel.
>>>
>>>
>>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Mount/df/PAM login hangs during rsync to btrfs subvolume, or maybe doing btrfs subvolume snapshot
  2019-09-12  7:05             ` Qu Wenruo
@ 2019-09-12 14:03               ` David Newall
  2019-09-12 14:12                 ` Nikolay Borisov
  2019-09-12 14:14                 ` Qu Wenruo
  0 siblings, 2 replies; 13+ messages in thread
From: David Newall @ 2019-09-12 14:03 UTC (permalink / raw)
  To: Qu Wenruo, Nikolay Borisov, linux-btrfs

Hello Qu,

Thank you very much for helping me with this.

On 12/9/19 4:35 pm, Qu Wenruo wrote:
> Would you please check how fast (or how slow in this particular case)
> the related disks are?
> To me, it really looks like just too slow devices.

I discover that you are correct about the underlying storage being 
slow.  Nikolay suggested that, too.

Although I mentioned that the filesystem is encrypted with luks on the 
VM, I didn't say that the underlying storage is connected via multipath 
iSCSI (with two paths) on the host server, and provided to the VM via 
KVM as Virtio disk, which should be fine, but, using dd (bs=1024k 
count=15) on the VM, I'm seeing a woeful 255KB/s read speed through the 
encryption layer, and 274KB/s from the raw disk.  :-(

On the host, I'm seeing 2MB/s via one path and 846KB/s via the other, so 
I think that's where I need to turn my attention.  (Time to benchmark, 
turn off one path, and speak to the DC management.)

> I see all dumps are waiting for write_all_supers.
>
> Would you please provide the code context of write_all_supers.isra.43+0x977?
>
> I guess it's wait_dev_flush(), which is just really waiting for disk writes.

Sorry, I don't understand what you mean by "code context".  Maybe the 
question is now moot.

Although it's now apparent that I've got a really slow disk, I still 
wonder if btrfs is holding a lock for an unnecessarily long time 
(assuming that it is btrfs holding the lock.)  I feel that having to 
wait tens of minutes to find the device names of mounted devices could 
never be intended, so there might be something that needs tweaking.

On 12/9/19 3:58 pm, Nikolay Borisov wrote:
> Actually when the issue occurs again can you sample the output of
> echo w > /proc/sysrq-trigger.  Because right now you have provided
> 3 samples in the course of I don't know how many minutes. So they just give
> a momentarily glimpse into what's happening. E.g. just because we saw
> btrfs transaction/btrfs_show_devname doesn't necessarily mean that's
> what's happening (Though having the same consistent state in the 3 logs
> kind of suggests otherwise).

Again, it's probably all moot, now, but I did take samples at about 
20-second intervals during 20-minutes of the "hang" period while rsync 
was running.  See https://davidnewall.com/kern.5 through kern.62.

Thanks to all for your help.

Regards,

David


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Mount/df/PAM login hangs during rsync to btrfs subvolume, or maybe doing btrfs subvolume snapshot
  2019-09-12 14:03               ` David Newall
@ 2019-09-12 14:12                 ` Nikolay Borisov
  2019-09-12 14:16                   ` Qu Wenruo
  2019-09-12 14:14                 ` Qu Wenruo
  1 sibling, 1 reply; 13+ messages in thread
From: Nikolay Borisov @ 2019-09-12 14:12 UTC (permalink / raw)
  To: David Newall, Qu Wenruo, linux-btrfs



On 12.09.19 г. 17:03 ч., David Newall wrote:
> Hello Qu,
> 
> Thank you very much for helping me with this.
> 
> On 12/9/19 4:35 pm, Qu Wenruo wrote:
>> Would you please check how fast (or how slow in this particular case)
>> the related disks are?
>> To me, it really looks like just too slow devices.
> 
> I discover that you are correct about the underlying storage being
> slow.  Nikolay suggested that, too.
> 
> Although I mentioned that the filesystem is encrypted with luks on the
> VM, I didn't say that the underlying storage is connected via multipath
> iSCSI (with two paths) on the host server, and provided to the VM via
> KVM as Virtio disk, which should be fine, but, using dd (bs=1024k
> count=15) on the VM, I'm seeing a woeful 255KB/s read speed through the
> encryption layer, and 274KB/s from the raw disk.  :-(
> 
> On the host, I'm seeing 2MB/s via one path and 846KB/s via the other, so
> I think that's where I need to turn my attention.  (Time to benchmark,
> turn off one path, and speak to the DC management.)
> 
>> I see all dumps are waiting for write_all_supers.
>>
>> Would you please provide the code context of
>> write_all_supers.isra.43+0x977?
>>
>> I guess it's wait_dev_flush(), which is just really waiting for disk
>> writes.
> 
> Sorry, I don't understand what you mean by "code context".  Maybe the
> question is now moot.
> 
> Although it's now apparent that I've got a really slow disk, I still
> wonder if btrfs is holding a lock for an unnecessarily long time
> (assuming that it is btrfs holding the lock.)  I feel that having to
> wait tens of minutes to find the device names of mounted devices could
> never be intended, so there might be something that needs tweaking.
> 

With the kernel you are using that's how things were structured for
various reasons. Recent kernel versions do not take device_list_mutex
when printing the device name. So if you update your kernel to one which
contains :

88c14590cdd6 ("btrfs: use RCU in btrfs_show_devname for device list
traversal")

this particular problem would be gone. Looking at the history of that
commit this means kernel 4.17 at least.

<snip>

> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Mount/df/PAM login hangs during rsync to btrfs subvolume, or maybe doing btrfs subvolume snapshot
  2019-09-12 14:03               ` David Newall
  2019-09-12 14:12                 ` Nikolay Borisov
@ 2019-09-12 14:14                 ` Qu Wenruo
  1 sibling, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2019-09-12 14:14 UTC (permalink / raw)
  To: David Newall, Nikolay Borisov, linux-btrfs



On 2019/9/12 下午10:03, David Newall wrote:
> Hello Qu,
>
> Thank you very much for helping me with this.
>
> On 12/9/19 4:35 pm, Qu Wenruo wrote:
>> Would you please check how fast (or how slow in this particular case)
>> the related disks are?
>> To me, it really looks like just too slow devices.
>
> I discover that you are correct about the underlying storage being
> slow.  Nikolay suggested that, too.
>
> Although I mentioned that the filesystem is encrypted with luks on the
> VM, I didn't say that the underlying storage is connected via multipath
> iSCSI (with two paths) on the host server, and provided to the VM via
> KVM as Virtio disk, which should be fine, but, using dd (bs=1024k
> count=15) on the VM, I'm seeing a woeful 255KB/s read speed through the
> encryption layer, and 274KB/s from the raw disk.  :-(
>
> On the host, I'm seeing 2MB/s via one path and 846KB/s via the other, so
> I think that's where I need to turn my attention.  (Time to benchmark,
> turn off one path, and speak to the DC management.)

Glad we found the root cause.

>
>> I see all dumps are waiting for write_all_supers.
>>
>> Would you please provide the code context of
>> write_all_supers.isra.43+0x977?
>>
>> I guess it's wait_dev_flush(), which is just really waiting for disk
>> writes.
>
> Sorry, I don't understand what you mean by "code context".  Maybe the
> question is now moot.
>
> Although it's now apparent that I've got a really slow disk, I still
> wonder if btrfs is holding a lock for an unnecessarily long time
> (assuming that it is btrfs holding the lock.)  I feel that having to
> wait tens of minutes to find the device names of mounted devices could
> never be intended, so there might be something that needs tweaking.

It's not completely unnecessary, but you're right, we can enhance it.

It's the device mutex. At the context of committing a transaction, we
definitely don't want a random new device joining in while we're
iterating devices to flush each device.

However you're still right, since the flush can be slow, we shouldn't
block other dev list read operations, thus it may be a good idea to make
fs_devices->device_list_mutex a rw_semaphore.
So that we only block device add/remove while still allow other device
list read-only operations to kick in.

We may need to take a look into, but please also take in mind that, the
benefit may only be obvious for such slow device, so it's up to the
developers.

Thanks,
Qu

>
> On 12/9/19 3:58 pm, Nikolay Borisov wrote:
>> Actually when the issue occurs again can you sample the output of
>> echo w > /proc/sysrq-trigger.  Because right now you have provided
>> 3 samples in the course of I don't know how many minutes. So they just
>> give
>> a momentarily glimpse into what's happening. E.g. just because we saw
>> btrfs transaction/btrfs_show_devname doesn't necessarily mean that's
>> what's happening (Though having the same consistent state in the 3 logs
>> kind of suggests otherwise).
>
> Again, it's probably all moot, now, but I did take samples at about
> 20-second intervals during 20-minutes of the "hang" period while rsync
> was running.  See https://davidnewall.com/kern.5 through kern.62.
>
> Thanks to all for your help.
>
> Regards,
>
> David
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Mount/df/PAM login hangs during rsync to btrfs subvolume, or maybe doing btrfs subvolume snapshot
  2019-09-12 14:12                 ` Nikolay Borisov
@ 2019-09-12 14:16                   ` Qu Wenruo
  0 siblings, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2019-09-12 14:16 UTC (permalink / raw)
  To: Nikolay Borisov, David Newall, linux-btrfs



On 2019/9/12 下午10:12, Nikolay Borisov wrote:
>
>
> On 12.09.19 г. 17:03 ч., David Newall wrote:
>> Hello Qu,
>>
>> Thank you very much for helping me with this.
>>
>> On 12/9/19 4:35 pm, Qu Wenruo wrote:
>>> Would you please check how fast (or how slow in this particular case)
>>> the related disks are?
>>> To me, it really looks like just too slow devices.
>>
>> I discover that you are correct about the underlying storage being
>> slow.  Nikolay suggested that, too.
>>
>> Although I mentioned that the filesystem is encrypted with luks on the
>> VM, I didn't say that the underlying storage is connected via multipath
>> iSCSI (with two paths) on the host server, and provided to the VM via
>> KVM as Virtio disk, which should be fine, but, using dd (bs=1024k
>> count=15) on the VM, I'm seeing a woeful 255KB/s read speed through the
>> encryption layer, and 274KB/s from the raw disk.  :-(
>>
>> On the host, I'm seeing 2MB/s via one path and 846KB/s via the other, so
>> I think that's where I need to turn my attention.  (Time to benchmark,
>> turn off one path, and speak to the DC management.)
>>
>>> I see all dumps are waiting for write_all_supers.
>>>
>>> Would you please provide the code context of
>>> write_all_supers.isra.43+0x977?
>>>
>>> I guess it's wait_dev_flush(), which is just really waiting for disk
>>> writes.
>>
>> Sorry, I don't understand what you mean by "code context".  Maybe the
>> question is now moot.
>>
>> Although it's now apparent that I've got a really slow disk, I still
>> wonder if btrfs is holding a lock for an unnecessarily long time
>> (assuming that it is btrfs holding the lock.)  I feel that having to
>> wait tens of minutes to find the device names of mounted devices could
>> never be intended, so there might be something that needs tweaking.
>>
>
> With the kernel you are using that's how things were structured for
> various reasons. Recent kernel versions do not take device_list_mutex
> when printing the device name. So if you update your kernel to one which
> contains :
>
> 88c14590cdd6 ("btrfs: use RCU in btrfs_show_devname for device list
> traversal")

Oh, we have already addressed that in a better way than rw_semaphore...

I must have lived under a rock for a while...

Thanks,
Qu
>
> this particular problem would be gone. Looking at the history of that
> commit this means kernel 4.17 at least.
>
> <snip>
>
>>

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2019-09-12 14:16 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-11  6:45 Mount/df/PAM login hangs during rsync to btrfs subvolume, or maybe doing btrfs subvolume snapshot David Newall
2019-09-11  6:55 ` Nikolay Borisov
2019-09-11  7:03   ` David Newall
2019-09-11 10:21   ` David Newall
2019-09-11 10:52     ` Nikolay Borisov
2019-09-12  4:38       ` David Newall
2019-09-12  6:11         ` Nikolay Borisov
2019-09-12  6:28           ` Nikolay Borisov
2019-09-12  7:05             ` Qu Wenruo
2019-09-12 14:03               ` David Newall
2019-09-12 14:12                 ` Nikolay Borisov
2019-09-12 14:16                   ` Qu Wenruo
2019-09-12 14:14                 ` Qu Wenruo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).