* Multipath bootup failure
@ 2024-01-13  0:51 Benjamin Marzinski
From: Benjamin Marzinski @ 2024-01-13  0:51 UTC (permalink / raw)
  To: Martin Wilck; +Cc: device-mapper development

Booting a machine with a multipathed kpartx root device fails for me
using the fedora rawhide multipath packages, which are based on the
0.9.7 release. Using LVM on top works. The issue is that when the root
device is directly on a partition, dracut finds it on one of the path
devices, and starts using that. If multipathd isn't running when the
uevent for that path device is processed, it won't be claimed by
multipath (starting in 0.9.7), since there is no multipathd.socket in
the initramfs and no systemd_service_enabled().  Afterwards, multipathd
creates a multipath device on top of the device, claims it, and removes
the partitions. If this happens while dracut is attempting to mount the
root device, the boot fails. In practice, it usually failed for me.

Reverting 6fad1464 ("libmpathutil: remove systemd_service_enabled()")
resolves the problem. When I tried to add 

Before=dracut-pre-mount.service

to dracut's version of multipathd.service instead, it works over 95% of
the time, but it still occasionally fails. The issue is that even though
multipathd creates the multipath device before signaling that it has
started up (meaning that dracut won't start working towards mounting the
root device until after the multipath device exists), dracut won't know
not to use the underlying partition device until it processes the
uevents that get triggered by multipathd creating the device. And it
won't be able to use the kpartx device until it processes the uevents
that get triggered by kpartx, which runs when the multipath device
uevents are processed. Depending on how quickly dracut processes these
events relative to the rest of the bootup work, it can still hang. I've
tested adding

Before=systemd-udev-trigger.service

to multipathd.service with no failures so far.  This requires fixing
multipathd-configure.service, so that there aren't any dependency
conflicts, but that should happen anyway.  I need to talk to the CoreOS
people who added this, but I think the only necessary dependency for
multipathd-configure.service to come after is

After=dracut-cmdline.service

With this, I think that multipathd should always be running before
device uevents get processed, but perhaps it needs to be before
systemd-udevd.service instead.

If it's not possible to guarantee that multipathd has started before we
process uevents so that we always claim the path devices as soon as they
appear, then to close this race window, we need to either wait after
multipathd starts for all the uevents to settle (and I don't think we
want to get back into the business of relying on udev-settle), or to go
back to some method of making multipath able to claim devices before
multipathd starts. 

Or we do something more clever. Thoughts?

-Ben



* Re: Multipath bootup failure
@ 2024-01-15 12:06 ` Martin Wilck
From: Martin Wilck @ 2024-01-15 12:06 UTC (permalink / raw)
  To: Benjamin Marzinski; +Cc: device-mapper development

On Fri, 2024-01-12 at 19:51 -0500, Benjamin Marzinski wrote:
> Booting a machine with a multipathed kpartx root device fails for me
> using the fedora rawhide multipath packages, which are based on the
> 0.9.7 release. Using LVM on top works. The issue is that when the root
> device is directly on a partition, dracut finds it on one of the path
> devices, and starts using that. If multipathd isn't running when the
> uevent for that path device is processed, it won't be claimed by
> multipath (starting in 0.9.7), since there is no multipathd.socket in
> the initramfs and no systemd_service_enabled().  Afterwards, multipathd
> creates a multipath device on top of the device, claims it, and removes
> the partitions. If this happens while dracut is attempting to mount the
> root device, the boot fails. In practice, it usually failed for me.
> 
> Reverting 6fad1464 ("libmpathutil: remove systemd_service_enabled()")
> resolves the problem. When I tried to add
> 
> Before=dracut-pre-mount.service
> 
> to dracut's version of multipathd.service instead, it works over 95% of
> the time, but it still occasionally fails. The issue is that even though
> multipathd creates the multipath device before signaling that it has
> started up (meaning that dracut won't start working towards mounting the
> root device until after the multipath device exists), dracut won't know
> not to use the underlying partition device until it processes the
> uevents that get triggered by multipathd creating the device. And it
> won't be able to use the kpartx device until it processes the uevents
> that get triggered by kpartx, which runs when the multipath device
> uevents are processed. Depending on how quickly dracut processes these
> events relative to the rest of the bootup work, it can still hang. I've
> tested adding
> 
> Before=systemd-udev-trigger.service
> 
> to multipathd.service with no failures so far.  This requires fixing
> multipathd-configure.service, so that there aren't any dependency
> conflicts, but that should happen anyway.  I need to talk to the CoreOS
> people who added this, but I think the only necessary dependency for
> multipathd-configure.service to come after is

Disclaimer: I have no experience with multipathd-configure.service.

> 
> After=dracut-cmdline.service
> 
> With this, I think that multipathd should always be running before
> device uevents get processed, but perhaps it needs to be before
> systemd-udevd.service instead.

Yes indeed. I thought this was already the case with the "recent"
changes made to dracut's multipath module:

297525c fix(multipath): remove dependency on multipathd.socket
6246da4 fix(multipathd.service): drop dependencies on iscsi and iscsid
a247d2b fix(multipathd.service): adapt to upstream multipath-tools unit file
371b338 fix(multipathd.service): remove dependency on systemd-udev-settle

Basically multipathd.service should have (almost) no "After="
dependencies, making it start very early during boot, and definitely
before systemd-udev-trigger.service. Actually, it should start up after
the udevd sockets (systemd-udevd-control.socket and
systemd-udevd-kernel.socket), but before systemd-udevd.service. This way
we'd ensure that we don't miss any uevents.
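To make the ordering above concrete, here is a minimal Python sketch (not part of multipath-tools, dracut or systemd) that turns Before=/After= directives into a valid start order using the standard library's graphlib. It models only ordering, not systemd's job and transaction handling; the udevd socket unit names and the helper name ordering_graph are assumptions made for illustration.

```python
from graphlib import TopologicalSorter

# Ordering wanted for multipathd.service in the initramfs, as described
# above. The udevd socket names are an assumption for illustration.
UNITS = {
    "multipathd.service": [
        "After=systemd-udevd-control.socket systemd-udevd-kernel.socket",
        "Before=systemd-udevd.service",
        "Before=systemd-udev-trigger.service",
    ],
}


def ordering_graph(units):
    """Map each unit to the set of units that must start before it."""
    preds = {}
    for unit, directives in units.items():
        preds.setdefault(unit, set())
        for directive in directives:
            key, _, targets = directive.partition("=")
            for other in targets.split():  # a directive may list several units
                preds.setdefault(other, set())
                if key == "After":
                    preds[unit].add(other)   # other starts first
                elif key == "Before":
                    preds[other].add(unit)   # unit starts first
    return preds


order = list(TopologicalSorter(ordering_graph(UNITS)).static_order())
# Several orders are valid; only the relative positions matter.
for name in ("systemd-udevd-kernel.socket", "multipathd.service",
             "systemd-udevd.service", "systemd-udev-trigger.service"):
    print(order.index(name), name)
```

Note that Before=/After= only order units that are both part of the boot transaction; they do not pull either unit in, so multipathd.service still has to be included and enabled in the initramfs for this ordering to matter.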

I don't quite understand why this wasn't the case for you. Was it
caused by multipathd-configure.service and its dependencies?

(Note also my pending dracut PR
https://github.com/dracutdevs/dracut/pull/2563#issuecomment-1823525208
where I'm trying to get rid of the dracut-specific multipathd.service
file).

> If it's not possible to guarantee that multipathd has started before
> we process uevents so that we always claim the path devices as soon as
> they appear, then to close this race window, we need to either wait
> after multipathd starts for all the uevents to settle (and I don't
> think we want to get back into the business of relying on udev-settle),
> or to go back to some method of making multipath able to claim devices
> before multipathd starts.

I don't think this will be necessary. We just need to get the
dependencies right. Your example shows, though, that it might be
sufficient to just add another service (here I suspect
multipathd-configure.service) to mess up the deps. We can consider
adding an explicit

Before=systemd-udevd.service

to our unit file. This way it'd be guaranteed that we start up before
udevd, and if some other unit got it wrong, systemd should report a
dependency cycle.
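The cycle-reporting argument can be sketched the same way. In this minimal Python model (ordering only, using graphlib), example-configure.service is a hypothetical unit that wants to run after systemd-udevd.service but before multipathd.service. Combined with an explicit Before=systemd-udevd.service on multipathd.service, no valid start order exists and graphlib raises CycleError. systemd itself handles this at boot by logging the ordering cycle and deleting one of the jobs to break it, so the conflict becomes visible instead of silently producing the wrong order. The helper names ordering_graph and find_cycle are illustrative only.

```python
from graphlib import CycleError, TopologicalSorter

# multipathd.service carries the explicit ordering proposed above;
# example-configure.service is a made-up unit that "gets it wrong".
UNITS = {
    "multipathd.service": ["Before=systemd-udevd.service"],
    "example-configure.service": [
        "After=systemd-udevd.service",
        "Before=multipathd.service",
    ],
}


def ordering_graph(units):
    """Map each unit to the set of units that must start before it."""
    preds = {}
    for unit, directives in units.items():
        preds.setdefault(unit, set())
        for directive in directives:
            key, _, targets = directive.partition("=")
            for other in targets.split():
                preds.setdefault(other, set())
                if key == "After":
                    preds[unit].add(other)
                elif key == "Before":
                    preds[other].add(unit)
    return preds


def find_cycle(units):
    """Return one ordering cycle as a list of unit names, or None."""
    try:
        # static_order() raises CycleError if no valid order exists.
        list(TopologicalSorter(ordering_graph(units)).static_order())
    except CycleError as err:
        # args[1] holds the cycle; its first and last entries are equal.
        return err.args[1]
    return None


print(find_cycle({"multipathd.service": ["Before=systemd-udevd.service"]}))  # None
print(find_cycle(UNITS))
```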

Regards
Martin




* Re: Multipath bootup failure
@ 2024-01-15 15:47   ` Benjamin Marzinski
From: Benjamin Marzinski @ 2024-01-15 15:47 UTC (permalink / raw)
  To: Martin Wilck; +Cc: device-mapper development

On Mon, Jan 15, 2024 at 01:06:56PM +0100, Martin Wilck wrote:
> > I've tested adding
> > 
> > Before=systemd-udev-trigger.service
> > 
> > to multipathd.service with no failures so far.  This requires fixing
> > multipathd-configure.service, so that there aren't any dependency
> > conflicts, but that should happen anyway.  I need to talk to the CoreOS
> > people who added this, but I think the only necessary dependency for
> > multipathd-configure.service to come after is
> 
> Disclaimer: I have no experience with multipathd-configure.service.

This is from the CoreOS team, and I've never really understood their
need for it outside of setting up a new system.

> > 
> > After=dracut-cmdline.service
> > 
> > With this, I think that multipathd should always be running before
> > device uevents get processed, but perhaps it needs to be before
> > systemd-udevd.service instead.
> 
> Yes indeed. I thought this was already the case with the "recent"
> changes made to dracut's multipath module:
> 
> 297525c fix(multipath): remove dependency on multipathd.socket
> 6246da4 fix(multipathd.service): drop dependencies on iscsi and iscsid
> a247d2b fix(multipathd.service): adapt to upstream multipath-tools unit file
> 371b338 fix(multipathd.service): remove dependency on systemd-udev-settle
> 
> Basically multipathd.service should have (almost) no "After="
> dependencies, making it start very early during boot, and definitely
> before systemd-udev-trigger.service. Actually, it should start up after
> the udevd sockets (systemd-udevd-control.socket and
> systemd-udevd-kernel.socket), but before systemd-udevd.service. This way
> we'd ensure that we don't miss any uevents.
> 
> I don't quite understand why this wasn't the case for you. Was it
> caused by multipathd-configure.service and its dependencies?

Yeah, to get it that early I did need to change the
multipathd-configure.service dependencies to something sensible.

> (Note also my pending dracut PR
> https://github.com/dracutdevs/dracut/pull/2563#issuecomment-1823525208
> where I'm trying to get rid of the dracut-specific multipathd.service
> file).
> 
> > If it's not possible to guarantee that multipathd has started before
> > we process uevents so that we always claim the path devices as soon as
> > they appear, then to close this race window, we need to either wait
> > after multipathd starts for all the uevents to settle (and I don't
> > think we want to get back into the business of relying on udev-settle),
> > or to go back to some method of making multipath able to claim devices
> > before multipathd starts.
> 
> I don't think this will be necessary. We just need to get the
> dependencies right. Your example shows, though, that it might be
> sufficient to just add another service (here I suspect
> multipathd-configure.service) to mess up the deps. We can consider
> adding an explicit
> 
> Before=systemd-udevd.service
> 
> to our unit file. This way it'd be guaranteed that we start up before
> udevd, and if some other unit got it wrong, systemd should report a
> dependency cycle.
> 

Yep, I was planning on posting a fix to change this, once I have a
discussion with the CoreOS people about changing
multipathd-configure.service.

-Ben
 
> Regards
> Martin
> 


