linux-lvm.redhat.com archive mirror
* Re: [linux-lvm] [systemd-devel] Possible race condition with LVM activation during boot
       [not found] <20190206183841.69e38617@gmail.com>
@ 2019-02-07 10:18 ` Peter Rajnoha
  2019-02-07 18:13   ` suscricions
  0 siblings, 1 reply; 4+ messages in thread
From: Peter Rajnoha @ 2019-02-07 10:18 UTC (permalink / raw)
  To: suscricions; +Cc: systemd-devel, linux-lvm

On 2/6/19 6:38 PM, suscricions@gmail.com wrote:
> Hi,
> 
> First of all, apologies if this is not the correct channel to request
> help with this issue. I've tried asking in the Arch Linux forums
> without luck so far.
> 
> Long story short, from time to time I'm dropped to a rescue shell
> during boot because of a logical volume that cannot be found, so the
> respective .mount unit fails, making local-fs.target stop the normal
> boot process.
> 
> local-fs.target: Job local-fs.target/start failed with result
> 'dependency'.
> 
> Under normal circumstances I'd assume that a logical volume should be
> activated first in order to be mounted, but a few times mounting happens
> first, causing the error. I think this is a race condition or
> something similar because it strikes randomly. Rebooting avoids the
> problem for the moment; it has happened twice during the past few days.
> 
> All the relevant parts of the logs and information about my system's
> partition scheme are posted here:
> https://bbs.archlinux.org/viewtopic.php?pid=1830442
> 
> I hope some of you can help me find the root cause, and again apologies
> if this is not the place or the issue is too obvious.
> 

If there's a mount unit, it's bound to a certain device, and systemd
waits for that device to appear on the system. So yes, the device must
be activated before the mounting happens. If the device doesn't appear
within a timeout, you usually get dropped to a rescue shell.
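
You can see that binding directly if you look at the generated unit. As
an illustration only (the unit name below is a guess based on your
mount point, so adjust it to whatever "systemctl list-units -t mount"
shows on your machine):

  # show the generated mount unit and its device dependency
  systemctl cat mnt-multimedia.mount
  # a unit generated from /etc/fstab typically contains roughly:
  #   [Mount]
  #   What=/dev/disk/by-uuid/cabdab31-983b-401f-be30-dda0ae462080
  #   Where=/mnt/multimedia
  # plus an implicit requirement on, and ordering after, the matching
  # .device unit -- that is the device systemd waits for before mounting.
  # fstab options like "nofail" or "x-systemd.device-timeout=30s" only
  # change how the wait and the failure are handled, not the dependency.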

Please, if you hit the problem again, try to collect LVM debug info by
running the "lvmdump -u -l -s" command, which creates a tarball with
various LVM-related debug info we can analyze. It contains the content
of the journal for the current boot, the udev environment, the LVM
configuration, a device stack listing, and various other useful
information for more thorough debugging.
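
For example (run it as root; the exact output file name is from memory
and may differ between versions, so treat the path as illustrative):

  # run right after hitting the failure, or from the rescue shell
  lvmdump -u -l -s
  # writes a tarball such as ./lvmdump-<hostname>-<date>.tgz that you
  # can attach to your reply or upload somewhere we can reach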

I'm adding CC linux-lvm, let's move this discussion there. Thanks.

-- 
Peter


* Re: [linux-lvm] [systemd-devel] Possible race condition with LVM activation during boot
  2019-02-07 10:18 ` [linux-lvm] [systemd-devel] Possible race condition with LVM activation during boot Peter Rajnoha
@ 2019-02-07 18:13   ` suscricions
  2019-02-07 21:51     ` Martin Wilck
  0 siblings, 1 reply; 4+ messages in thread
From: suscricions @ 2019-02-07 18:13 UTC (permalink / raw)
  To: systemd-devel; +Cc: linux-lvm

On Thu, 7 Feb 2019 11:18:40 +0100
Peter Rajnoha <prajnoha@redhat.com> wrote:

> On 2/6/19 6:38 PM, suscricions@gmail.com wrote:
> > Hi,
> > 
> > First of all, apologies if this is not the correct channel to request
> > help with this issue. I've tried asking in the Arch Linux forums
> > without luck so far.
> > 
> > Long story short, from time to time I'm dropped to a rescue shell
> > during boot because of a logical volume that cannot be found, so the
> > respective .mount unit fails, making local-fs.target stop the
> > normal boot process.
> > 
> > local-fs.target: Job local-fs.target/start failed with result
> > 'dependency'.
> > 
> > Under normal circumstances I'd assume that a logical volume should
> > be activated first in order to be mounted, but a few times mounting
> > happens first, causing the error. I think this is a race condition
> > or something similar because it strikes randomly. Rebooting avoids
> > the problem for the moment; it has happened twice during the past
> > few days.
> > 
> > All the relevant parts of the logs and information about my system's
> > partition scheme are posted here:
> > https://bbs.archlinux.org/viewtopic.php?pid=1830442
> > 
> > I hope some of you can help me find the root cause, and again
> > apologies if this is not the place or the issue is too obvious.
> >   
> 
> If there's a mount unit, it's bound to a certain device, and systemd
> waits for that device to appear on the system. So yes, the device must
> be activated before the mounting happens. If the device doesn't appear
> within a timeout, you usually get dropped to a rescue shell.
> 
> Please, if you hit the problem again, try to collect LVM debug info by
> running the "lvmdump -u -l -s" command, which creates a tarball with
> various LVM-related debug info we can analyze. It contains the content
> of the journal for the current boot, the udev environment, the LVM
> configuration, a device stack listing, and various other useful
> information for more thorough debugging.
> 
> I'm adding CC linux-lvm, let's move this discussion there. Thanks.
> 

There's been a reply in the Arch Linux forums, and at least I can apply
some contingency measures. If it happens again, I will provide more info
following your advice.

Many thanks!

--
Miguel


* Re: [linux-lvm] [systemd-devel] Possible race condition with LVM activation during boot
  2019-02-07 18:13   ` suscricions
@ 2019-02-07 21:51     ` Martin Wilck
  2019-02-08 13:12       ` suscricions
  0 siblings, 1 reply; 4+ messages in thread
From: Martin Wilck @ 2019-02-07 21:51 UTC (permalink / raw)
  To: suscricions, systemd-devel; +Cc: linux-lvm

On Thu, 2019-02-07 at 19:13 +0100, suscricions@gmail.com wrote:
> On Thu, 7 Feb 2019 11:18:40 +0100
> 
> There's been a reply in the Arch Linux forums, and at least I can
> apply some contingency measures. If it happens again, I will provide
> more info following your advice.

The log shows clearly that the device was available first:

feb 06 12:07:09 systemd[1]: Starting File System Check on /dev/disk/by-uuid/cabdab31-983b-401f-be30-dda0ae462080...
feb 06 12:07:09 systemd-fsck[520]: multimedia: clean, 1051/953984 files, 193506294/244189184 blocks
feb 06 12:07:09 systemd[1]: Started File System Check on /dev/disk/by-uuid/cabdab31-983b-401f-be30-dda0ae462080.

That wouldn't be possible without the device being visible to the system.
Shortly after you get

feb 06 12:07:09 mount[544]: mount: /mnt/multimedia: special device /dev/disk/by-uuid/cabdab31-983b-401f-be30-dda0ae462080 does not exist.

... So the device that had already been visible must have disappeared temporarily.
After the mount failure, we see messages about the LV becoming active:

[...]
feb 06 12:07:09 lvm[483]:   1 logical volume(s) in volume group "storage" now active
[...]
feb 06 12:07:09 lvm[494]:   1 logical volume(s) in volume group "storage" now active

There are two "lvm pvscan" processes (483 and 494) that may be interfering 
with each other and/or with the "mount" process. These processes are running 
on 8:17 (/dev/sdb1) and 254:1. I couldn't figure out from your logs what this latter
device might be. 
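
If you want to check, something like this on the affected machine
should tell us what 254:1 is (254 is commonly the device-mapper major,
so it is probably one of your LVs or another dm device):

  # list block devices together with their major:minor numbers
  lsblk -o NAME,MAJ:MIN,TYPE,MOUNTPOINT
  # or, for device-mapper devices specifically
  dmsetup info -c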

Wild guess: the pvscan processes working on the VG while it's already
visible are causing the device to go offline for a short time span,
and if the mount is attempted in that time window, the error occurs.
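
For context, this is roughly how autoactivation is triggered on an
lvmetad-based setup (paraphrased from the shipped udev rule and
service; the exact text may differ between lvm2 versions):

  # 69-dm-lvm-metad.rules (simplified): each PV that appears makes udev
  # pull in a per-device pvscan service via
  #   ENV{SYSTEMD_WANTS}+="lvm2-pvscan@<major>:<minor>.service"
  # and lvm2-pvscan@.service then runs approximately
  #   lvm pvscan --cache --activate ay <major>:<minor>
  # so each PV spawns its own pvscan, which would explain the two
  # concurrent lvm processes (483 and 494) in your log.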

There's another lvm process (340) started earlier by the lvm2-monitor
unit ("Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or
progress polling."). Immediately after termination of this process, the
device seems to be detected by systemd and the above fsck/mount
sequence begins, while the pvscan processes are still running. "lvm2-
monitor.service" runs "vgchange --monitor y". It almost looks as if
this had caused the devices to be visible to systemd, but that would be
wrong AFAICT.

Can you reproduce this with "udev.log-priority=debug"?
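
(That parameter goes on the kernel command line, so it is easiest to
add it in the bootloader for a single boot. Illustrative example only,
keep your existing parameters as they are:)

  # append to the existing kernel command line for one boot
  <your existing parameters> udev.log-priority=debug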

Regards,
Martin


* Re: [linux-lvm] [systemd-devel] Possible race condition with LVM activation during boot
  2019-02-07 21:51     ` Martin Wilck
@ 2019-02-08 13:12       ` suscricions
  0 siblings, 0 replies; 4+ messages in thread
From: suscricions @ 2019-02-08 13:12 UTC (permalink / raw)
  To: systemd-devel; +Cc: linux-lvm

On Thu, 07 Feb 2019 22:51:17 +0100
Martin Wilck <mwilck@suse.de> wrote:

> On Thu, 2019-02-07 at 19:13 +0100, suscricions@gmail.com wrote:
> > On Thu, 7 Feb 2019 11:18:40 +0100
> > 
> > There's been a reply in the Arch Linux forums, and at least I can
> > apply some contingency measures. If it happens again, I will provide
> > more info following your advice.
> 
> The log shows clearly that the device was available first:
> 
> feb 06 12:07:09 systemd[1]: Starting File System Check on /dev/disk/by-uuid/cabdab31-983b-401f-be30-dda0ae462080...
> feb 06 12:07:09 systemd-fsck[520]: multimedia: clean, 1051/953984 files, 193506294/244189184 blocks
> feb 06 12:07:09 systemd[1]: Started File System Check on /dev/disk/by-uuid/cabdab31-983b-401f-be30-dda0ae462080.
> 
> That wouldn't be possible without the device being visible to the
> system. Shortly after you get
> 
> feb 06 12:07:09 mount[544]: mount: /mnt/multimedia: special device /dev/disk/by-uuid/cabdab31-983b-401f-be30-dda0ae462080 does not exist.
> 
> ... So the device that had already been visible must have disappeared
> temporarily. After the mount failure, we see messages about the LV
> becoming active:
> 
> [...]
> feb 06 12:07:09 lvm[483]:   1 logical volume(s) in volume group "storage" now active
> [...]
> feb 06 12:07:09 lvm[494]:   1 logical volume(s) in volume group "storage" now active
> 
> There are two "lvm pvscan" processes (483 and 494) that may be
> interfering with each other and/or with the "mount" process. These
> processes are running on 8:17 (/dev/sdb1) and 254:1. I couldn't
> figure out from your logs what this latter device might be. 
> 
> Wild guess: the pvscan processes working on the VG while it's already
> visible are causing the device to go offline for a short time span,
> and if the mount is attempted in that time window, the error occurs.
> 
> There's another lvm process (340) started earlier by the lvm2-monitor
> unit ("Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or
> progress polling."). Immediately after termination of this process,
> the device seems to be detected by systemd and the above fsck/mount
> sequence begins, while the pvscan processes are still running. "lvm2-
> monitor.service" runs "vgchange --monitor y". It almost looks as if
> this had caused the devices to be visible to systemd, but that would
> be wrong AFAICT.
> 
> Can you reproduce this with "udev.log-priority=debug"?
> 
> Regards,
> Martin
> 
> 
> 

In /etc/udev/udev.conf, with the default values, there's "udev_log=info"
commented out. Do you mean changing the logging level here?

In the meantime, I've pasted online some info about my LVM layout that could help.

https://termbin.com/ipdq4


Best.


