linux-lvm.redhat.com archive mirror
From: Martin Wilck <mwilck@suse.com>
To: LVM general discussion and development <linux-lvm@redhat.com>,
	Christian Hesse <list@eworm.de>
Cc: Oleksandr Natalenko <oleksandr@redhat.com>,
	Heming Zhao <heming.zhao@suse.com>
Subject: Re: [linux-lvm] [PATCH 1/1] pvscan: wait for udevd
Date: Thu, 18 Feb 2021 16:19:01 +0100	[thread overview]
Message-ID: <ea269d6255c4cd48bb60771aa7e4204e4f70db8a.camel@suse.com> (raw)
In-Reply-To: <20210217133826.u4gglfsowqfvxdff@spock.localdomain>

On Wed, 2021-02-17 at 14:38 +0100, Oleksandr Natalenko wrote:
> Hi.
> 

Thanks for the logs!

> I'm not sure this issue is reproducible with any kind of LVM layout.
> What I have is thin-LVM-on-LUKS-on-LVM:

I saw MD in your other logs...?

More comments below.

> With regard to the journal, here it is (from the same machine in the
> Arch bugreport; matches the second layout above):
> 
> 
> [~]> LC_TIME=C sudo journalctl -b -10 -u lvm2-pvscan@\*
> -- Journal begins at Fri 2020-12-18 16:33:22 CET, ends at Wed 2021-
> 02-17 14:28:05 CET. --
> Feb 10 17:24:17 archlinux systemd[1]: Starting LVM event activation
> on device 9:0...
> Feb 10 17:24:17 archlinux lvm[463]:   pvscan[463] PV /dev/md0 online,
> VG base is complete.
> Feb 10 17:24:17 archlinux lvm[463]:   pvscan[463] VG base run
> autoactivation.
> Feb 10 17:24:17 archlinux lvm[463]:   2 logical volume(s) in volume
> group "base" now active
> Feb 10 17:24:17 archlinux systemd[1]: Finished LVM event activation
> on device 9:0.
> Feb 10 17:24:26 archlinux systemd[1]: Starting LVM event activation
> on device 253:2...
> Feb 10 17:24:26 archlinux lvm[643]:   pvscan[643] PV /dev/mapper/sys
> online, VG sys is complete.

All good up to here, but then...

> Feb 10 17:24:26 archlinux lvm[643]:   pvscan[643] VG sys run
> autoactivation.
> Feb 10 17:24:26 archlinux lvm[643]:   /usr/bin/dmeventd: stat failed:
> No such file or directory

What's going on here? pvscan trying to start dmeventd? Why? There's a
dedicated service for starting dmeventd (lvm2-monitor.service). I can
see that running dmeventd makes sense because you have thin pools, but
I'm at a loss why it has to be started this early during boot.

This is a curious message: it looks as if pvscan was running from an
environment (initramfs??) where dmeventd wasn't available. The message
is repeated, and after that, pvscan appears to hang...
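
(As an aside: if monitoring really shouldn't happen that early, the
initramfs copy of lvm.conf can disable it. A minimal sketch, assuming
the standard config layout — I haven't tested this on your setup:)

```
# initramfs copy of /etc/lvm/lvm.conf (sketch)
activation {
	# don't register LVs with dmeventd during autoactivation
	monitoring = 0
}
```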


> Feb 10 17:24:26 archlinux lvm[643]:   /usr/bin/dmeventd: stat failed:
> No such file or directory
> Feb 10 17:24:26 archlinux lvm[643]:   WARNING: Failed to monitor
> sys/pool.
> Feb 10 17:24:26 archlinux systemd[1]: Stopping LVM event activation
> on device 9:0...

Here I suppose systemd is switching root, and trying to stop jobs,
including the pvscan job.


> Feb 10 17:24:26 archlinux lvm[720]:   pvscan[720] PV /dev/md0 online.
> Feb 10 17:24:26 archlinux lvm[643]:   /usr/bin/dmeventd: stat failed:
> No such file or directory
> Feb 10 17:24:26 archlinux lvm[643]:   WARNING: Failed to monitor
> sys/pool.
> Feb 10 17:24:56 spock systemd[1]: lvm2-pvscan@253:2.service: State
> 'stop-sigterm' timed out. Killing.
> Feb 10 17:24:56 spock systemd[1]: lvm2-pvscan@253:2.service: Killing
> process 643 (lvm) with signal SIGKILL.
> Feb 10 17:24:56 spock systemd[1]: lvm2-pvscan@253:2.service: Main
> process exited, code=killed, status=9/KILL
> Feb 10 17:24:56 spock systemd[1]: lvm2-pvscan@253:2.service: Failed
> with result 'timeout'.
> Feb 10 17:24:56 spock systemd[1]: Stopped LVM event activation on
> device 253:2.

So what's timing out here is the attempt to _stop_ pvscan. That's
curious. It looks like a problem in pvscan to me: it didn't react to a
TERM signal for 30 s.

It's also worth noting that the parallel pvscan process for device 9:0
terminated correctly (didn't hang).
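
For illustration, the escalation systemd applies here (TERM, wait for
the stop timeout, then KILL) can be sketched like this — the timeout
values below are illustrative, not systemd's actual defaults:

```python
import signal
import subprocess
import sys
import time

def stop_with_timeout(proc, timeout):
    """Mimic systemd's 'stop-sigterm' phase: TERM, wait, then KILL."""
    proc.terminate()                      # SIGTERM, like the stop job
    try:
        proc.wait(timeout=timeout)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()                       # SIGKILL after the timeout
        proc.wait()
        return "killed"

if __name__ == "__main__":
    # A child that ignores SIGTERM, like the hanging pvscan[643] above.
    child = subprocess.Popen([sys.executable, "-c",
        "import signal, time; "
        "signal.signal(signal.SIGTERM, signal.SIG_IGN); time.sleep(60)"])
    time.sleep(0.5)   # let the child install its handler first
    print(stop_with_timeout(child, timeout=1.0))
```

A child that handles TERM normally would report "terminated" instead;
only the TERM-ignoring one gets escalated to "killed", which matches
the 'stop-sigterm' timed out / SIGKILL sequence in the log above.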

> 
> [~]> LC_TIME=C sudo journalctl -b -10 --grep pvscan
> -- Journal begins at Fri 2020-12-18 16:33:22 CET, ends at Wed 2021-
> 02-17 14:31:27 CET. --
> Feb 10 17:24:17 archlinux systemd[1]: Created slice system-
> lvm2\x2dpvscan.slice.
> Feb 10 17:24:17 archlinux lvm[463]:   pvscan[463] PV /dev/md0 online,
> VG base is complete.
> Feb 10 17:24:17 archlinux lvm[463]:   pvscan[463] VG base run
> autoactivation.
> Feb 10 17:24:17 archlinux audit[1]: SERVICE_START pid=1 uid=0
> auid=4294967295 ses=4294967295 msg='unit=lvm2-pvscan@9:0
> comm="systemd" exe="/init" hostname=? addr=? terminal=? res=success'
> Feb 10 17:24:17 archlinux kernel: audit: type=1130
> audit(1612974257.986:6): pid=1 uid=0 auid=4294967295 ses=4294967295 
> msg='unit=lvm2-pvscan@9:0 comm="systemd" exe="/init" hostname=?
> addr=? terminal=? res=success'
> Feb 10 17:24:26 archlinux lvm[643]:   pvscan[643] PV /dev/mapper/sys
> online, VG sys is complete.
> Feb 10 17:24:26 archlinux lvm[643]:   pvscan[643] VG sys run
> autoactivation.
> Feb 10 17:24:26 archlinux lvm[720]:   pvscan[720] PV /dev/md0 online.
> Feb 10 17:24:27 spock systemd[1]: lvm2-pvscan@9:0.service: Control
> process exited, code=killed, status=15/TERM
> Feb 10 17:24:27 spock systemd[1]: lvm2-pvscan@9:0.service: Failed
> with result 'signal'.
> Feb 10 17:24:26 spock audit[1]: SERVICE_STOP pid=1 uid=0
> auid=4294967295 ses=4294967295 msg='unit=lvm2-pvscan@9:0
> comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=?
> terminal=? res=failed'
> Feb 10 17:24:27 spock systemd[1]: Requested transaction contradicts
> existing jobs: Transaction for lvm2-pvscan@253:2.service/start is
> destructive (lvm2-pvscan@253:2.service has 'stop' job queued, but
> 'start' is included in transaction).

This one is curious, too. Apparently systemd is done switching root,
while the pvscan service from the initrd is still hanging, so the
"stop" job for it isn't finished. This looks like a problem in systemd
to me. It shouldn't have proceeded with switching root before
successfully terminating pvscan.


> Feb 10 17:24:27 spock systemd[1]: sys-devices-virtual-block-
> dm\x2d2.device: Failed to enqueue SYSTEMD_WANTS= job, ignoring:
> Transaction for lvm2-pvscan@253:2.service/start is destructive
> (lvm2-pvscan@253:2.service has 'stop' job queued, but 'start' is
> included in transaction).

So systemd doesn't enqueue another pvscan job for this device because 
the old one isn't stopped yet. Looks fishy, again.

> 
> FWIW, here's what it looks like now, after adding the "After=" fix:
> [~]> LC_TIME=C sudo journalctl -b -u lvm2-pvscan@\*
> -- Journal begins at Fri 2020-12-18 16:33:22 CET, ends at Wed 2021-
> 02-17 14:29:15 CET. --
> Feb 16 22:13:27 archlinux systemd[1]: Starting LVM event activation
> on device 9:0...
> Feb 16 22:13:27 archlinux lvm[495]:   pvscan[495] PV /dev/md0 online,
> VG base is complete.
> Feb 16 22:13:27 archlinux lvm[495]:   pvscan[495] VG base run
> autoactivation.
> Feb 16 22:13:27 archlinux lvm[495]:   2 logical volume(s) in volume
> group "base" now active
> Feb 16 22:13:28 archlinux systemd[1]: Finished LVM event activation
> on device 9:0.
> Feb 16 22:13:37 archlinux systemd[1]: Starting LVM event activation
> on device 253:2...
> Feb 16 22:13:37 archlinux lvm[665]:   pvscan[665] PV /dev/mapper/sys
> online, VG sys is complete.
> Feb 16 22:13:37 archlinux lvm[665]:   pvscan[665] VG sys run
> autoactivation.
> Feb 16 22:13:37 archlinux lvm[665]:   /usr/bin/dmeventd: stat failed:
> No such file or directory
> Feb 16 22:13:37 archlinux lvm[665]:   WARNING: Failed to monitor
> sys/pool.
> Feb 16 22:13:37 archlinux lvm[665]:   /usr/bin/dmeventd: stat failed:
> No such file or directory
> Feb 16 22:13:37 archlinux lvm[665]:   WARNING: Failed to monitor
> sys/pool.
> Feb 16 22:13:37 archlinux lvm[665]:   /usr/bin/dmeventd: stat failed:
> No such file or directory
> Feb 16 22:13:37 archlinux lvm[665]:   WARNING: Failed to monitor
> sys/pool.
> Feb 16 22:13:37 archlinux lvm[665]:   /usr/bin/dmeventd: stat failed:
> No such file or directory
> Feb 16 22:13:37 archlinux lvm[665]:   WARNING: Failed to monitor
> sys/pool.
> Feb 16 22:13:37 archlinux lvm[665]:   /usr/bin/dmeventd: stat failed:
> No such file or directory
> Feb 16 22:13:37 archlinux lvm[665]:   WARNING: Failed to monitor
> sys/pool.
> Feb 16 22:13:37 archlinux lvm[665]:   /usr/bin/dmeventd: stat failed:
> No such file or directory
> Feb 16 22:13:37 archlinux lvm[665]:   WARNING: Failed to monitor
> sys/pool.
> Feb 16 22:13:37 archlinux systemd[1]: Stopping LVM event activation
> on device 9:0...
> Feb 16 22:13:37 archlinux lvm[743]:   pvscan[743] PV /dev/md0 online.
> Feb 16 22:13:37 archlinux lvm[665]:   /usr/bin/dmeventd: stat failed:
> No such file or directory
> Feb 16 22:13:37 archlinux lvm[665]:   WARNING: Failed to monitor
> sys/pool.
> Feb 16 22:13:37 archlinux lvm[665]:   8 logical volume(s) in volume
> group "sys" now active
> Feb 16 22:13:37 archlinux systemd[1]: lvm2-pvscan@9:0.service:
> Succeeded.
> Feb 16 22:13:37 archlinux systemd[1]: Stopped LVM event activation on
> device 9:0.
> Feb 16 22:13:37 archlinux systemd[1]: lvm2-pvscan@253:2.service: Main
> process exited, code=killed, status=15/TERM
> Feb 16 22:13:37 archlinux systemd[1]: lvm2-pvscan@253:2.service:
> Failed with result 'signal'.
> Feb 16 22:13:37 archlinux systemd[1]: Stopped LVM event activation on
> device 253:2.

We have the same error messages about dmeventd as before. But unlike
before, pvscan was successfully terminated this time.

I think this is the explanation: with the "After=", systemd kills
pvscan first and then udevd. Without the "After=", it may kill them
both at once, and that might have caused pvscan to hang and not
terminate cleanly. Apparently it was still waiting for some udev-
related action to complete.
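
For reference, if I understand the fix correctly, it amounts to an
ordering dependency roughly like the following in lvm2-pvscan@.service
(sketch only — the exact unit name for udevd may differ per distro):

```
[Unit]
# Order pvscan after udevd: on shutdown/switch-root, systemd then stops
# pvscan first and udevd afterwards, instead of both at once.
After=systemd-udevd.service
```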

Thanks
Martin




