* dracut, degraded md arrays, resume and systemd.
@ 2015-03-11  0:28 NeilBrown
From: NeilBrown @ 2015-03-11  0:28 UTC
  To: initramfs-u79uwXL29TY76Z2rM5mHXA



Hi,
 I have a problem, and I'm not entirely sure where to fix it.

 I have a system with 2 drives, each partitioned into 3 partitions, and those
 partitions are combined into md/raid RAID1 arrays.
 The arrays are used for /boot, swap and an LVM PV, which holds the root
 filesystem in a VG.  Normally this all works wonderfully.

 If I shut down, remove one drive, and boot, it doesn't.

 When mdadm sees a newly-degraded array like this (i.e. the one drive it
 finds hasn't recorded that the other device is missing), it won't
 assemble the array until it is explicitly told that all devices have been
 found.
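An array left in this state exists in sysfs but has not been started. The sketch below is a hypothetical helper, not dracut's mdraid_start: it assumes such arrays report `inactive` in `/sys/block/mdX/md/array_state`, takes an optional root directory (an assumption, so it can be tried on a scratch tree), and only prints the `mdadm -R` command it would run instead of running it.

```shell
#!/bin/sh
# list_inactive_md [root]
# Print a force-run command for every md array whose sysfs array_state
# is "inactive" (assembled but not started). Dry run only: nothing is
# executed. The optional root argument is a hypothetical convenience;
# with no argument the real /sys is used.
list_inactive_md() {
    _sysroot=${1:-}
    for _state in "$_sysroot"/sys/block/md*/md/array_state; do
        [ -r "$_state" ] || continue                   # glob matched nothing
        [ "$(cat "$_state")" = "inactive" ] || continue
        _md=${_state%/md/array_state}                  # .../sys/block/mdX
        _md=${_md##*/}                                 # mdX
        echo "mdadm -R /dev/$_md"
    done
}
```

On a scratch tree containing an inactive md0 and a clean md1, `list_inactive_md "$tmp"` prints only `mdadm -R /dev/md0`.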

 dracut has code to do exactly this: /sbin/mdraid_start is placed on the
 'timeout' queue to be run after a suitable timeout, and this script does
 the right thing.  The md arrays are all assembled, the LVM PV is found so
 the VG and LV are assembled, and the root filesystem is mounted.

 HOWEVER, before all that happens, systemd stops waiting and gives up,
 moments too soon.

 There are (at least) two other scripts on the 'timeout' queue which run
 immediately after mdraid_start.
 One is lvmscan, which probably does the right thing; I don't think it is
 strictly necessary in my setup, and it certainly isn't a problem.

 But there is also the 'resume.sh' script:

        {
            printf -- "%s" 'warn "Cancelling resume operation. Device not found.";'
            printf -- ' cancel_wait_for_dev /dev/resume; rm -f -- "$job" "%s/initqueue/settled/resume.sh";\n' "$hookdir"
        } >> $hookdir/initqueue/timeout/resume.sh
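For reference, this is roughly what that block leaves in the timeout queue. The sketch points `hookdir` at a scratch directory (an assumption, purely so it can run outside an initramfs) and passes `$hookdir` as the argument for the `%s` in the second format string; `$job` stays literal because it sits inside single quotes.

```shell
#!/bin/sh
# Reproduce the resume.sh timeout job in a scratch hook directory.
hookdir=$(mktemp -d)
mkdir -p "$hookdir/initqueue/timeout"
{
    printf -- "%s" 'warn "Cancelling resume operation. Device not found.";'
    printf -- ' cancel_wait_for_dev /dev/resume; rm -f -- "$job" "%s/initqueue/settled/resume.sh";\n' "$hookdir"
} >> "$hookdir/initqueue/timeout/resume.sh"

# The generated job is one line of shell that the initqueue sources later.
cat "$hookdir/initqueue/timeout/resume.sh"
```

The printed line starts with the `warn` call, then the `cancel_wait_for_dev /dev/resume` call whose `systemctl daemon-reload` side effect is discussed next.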

 This calls cancel_wait_for_dev, which is supposed to cancel the wait for the
 swap array.  Unfortunately, it also cancels the wait for the root filesystem.
 I'm not completely sure why, but it is certainly related to the
     systemctl daemon-reload
 command that cancel_wait_for_dev schedules.  If I comment that out,
 everything works (it also works if I boot with noresume).

 But I don't think that is the whole problem.
 If I had swap on an LVM volume, I suspect it might not be found by the time
 the resume.sh script runs, and so the resume attempt would be aborted
 incorrectly.

 My feeling is that if any script in the 'timeout' queue makes any progress,
 then the remaining scripts should be delayed for another timeout.

 There is code that does something a little bit like this:

    if [ $main_loop -gt $((2*$RDRETRY/3)) ]; then
        for job in $hookdir/initqueue/timeout/*.sh; do
            [ -e "$job" ] || break
            job=$job . $job
            udevadm settle --timeout=0 >/dev/null 2>&1 || main_loop=0
            [ -f $hookdir/initqueue/work ] && main_loop=0
        done
    fi

So if 'udevadm settle --timeout=0' fails, or if initqueue/work has been
created, main_loop is set to 0, which seems to mean "try again".
However, the remaining jobs in the queue are still run.

Also, in my case, neither of these conditions triggers.  mdraid_start doesn't
create the 'initqueue/work' file, and 'udevadm settle' has never (as far as I
can tell) actually honoured "--timeout=0" as documented.
It waits indefinitely for the queue to become empty, then succeeds.

So my proposed solution (really just a suggestion; I suspect something else
would be better) is to:
1/ modify the above loop to break out if main_loop is ever reset to zero, and
2/ modify mdraid_start to create initqueue/work if it finds anything to do.
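The combined effect of 1/ and 2/ can be tried in isolation. This is a simplified, hypothetical model of the loop quoted above: `udevadm settle` is left out, `hookdir` is a scratch directory, and a job that makes progress signals it by creating `initqueue/work`, as 2/ proposes for mdraid_start.

```shell
#!/bin/sh
# run_timeout_jobs <main_loop>
# Simplified model of the initqueue timeout loop with proposed fix 1/.
# Sets the global main_loop, like the real loop does.
run_timeout_jobs() {
    main_loop=$1
    for job in "$hookdir"/initqueue/timeout/*.sh; do
        [ -e "$job" ] || break
        job=$job . "$job"
        # A job that did something useful leaves initqueue/work behind.
        [ -f "$hookdir/initqueue/work" ] && main_loop=0
        # Fix 1/: stop here, so later jobs get another full timeout.
        [ "$main_loop" -eq 0 ] && break
    done
}
```

With a first job that creates `work` (standing in for mdraid_start) and a second that only logs (standing in for resume.sh), only the first one runs on this pass and main_loop ends up at 0.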

The following patch makes that explicit.  It seems to fix my problem.

Is this a good way to fix it?  Is there something better?

BTW, I also have a problem with a similar configuration where the md/raid
RAID1 is encrypted.  systemd gives up waiting for the encrypted device after
90 seconds:
[   92.250437] linux systemd[1]: Dependency failed for Cryptography Setup for cr_md1.

but mdraid_start doesn't run until 120 seconds have elapsed.
I haven't looked into how encrypted devices are set up yet, but if anyone has
suggestions, I'm very interested :-)

Thanks,
NeilBrown



diff --git a/modules.d/90mdraid/mdraid_start.sh b/modules.d/90mdraid/mdraid_start.sh
index 761e64f312d3..400ab5dc46c7 100755
--- a/modules.d/90mdraid/mdraid_start.sh
+++ b/modules.d/90mdraid/mdraid_start.sh
@@ -27,6 +27,7 @@ _md_force_run() {
 
         _path_d="${_path_s%/*}/degraded"
         [ ! -r "$_path_d" ] && continue
+        > $hookdir/initqueue/work
     done
 }
 
diff --git a/modules.d/98systemd/dracut-initqueue.sh b/modules.d/98systemd/dracut-initqueue.sh
index 88cd1e056ed7..af9cec2c5b8c 100755
--- a/modules.d/98systemd/dracut-initqueue.sh
+++ b/modules.d/98systemd/dracut-initqueue.sh
@@ -60,6 +60,7 @@ while :; do
             job=$job . $job
             udevadm settle --timeout=0 >/dev/null 2>&1 || main_loop=0
             [ -f $hookdir/initqueue/work ] && main_loop=0
+            [ $main_loop -eq 0 ] && break
         done
     fi
 


* Re: dracut, degraded md arrays, resume and systemd.
@ 2015-03-12  7:41 NeilBrown
From: NeilBrown @ 2015-03-12  7:41 UTC
  To: initramfs-u79uwXL29TY76Z2rM5mHXA


On Wed, 11 Mar 2015 11:28:45 +1100 NeilBrown <neilb-l3A5Bk7waGM@public.gmane.org> wrote:


> BTW, I also have a problem in a similar config where the md/raid RAID1 is
> encrypted.  Systemd gives up waiting for the encrypted device after 90
> seconds:
> [   92.250437] linux systemd[1]: Dependency failed for Cryptography Setup for cr_md1.
> 
> but mdraid_start doesn't get run until 120 seconds have elapsed.
> I haven't looked into setup of encrypted devices yet, but if anyone has
> suggestions, I'm very interested :-)

I've looked into this problem some more.

The main problem is that generator_wait_for_dev in rootfs-generator.sh
doesn't do the same thing every time it is run.

When you run "systemctl daemon-reload", systemd removes everything under
/run/systemd/generator and then re-runs all the generators.

generator_wait_for_dev only creates files in /run/systemd/generator if
the .../initqueue/finished/devexists.... file doesn't exist.
So on the first run, that file is created, and so are the generator files.
On subsequent runs nothing is created, so the /run/systemd/generator files
are not recreated.
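The failure mode can be reproduced with two toy versions of the function. This is a hypothetical, heavily simplified model: `hookdir` and `gendir` are scratch directories standing in for the dracut hook directory and /run/systemd/generator, the generator output is reduced to one empty file, and `rm -rf "$gendir"` stands in for what daemon-reload does before re-running the generators.

```shell
#!/bin/sh
# Old behaviour: return early once the devexists marker exists, so the
# generator output is only ever written on the first run.
gen_old() {
    [ -e "$hookdir/devexists-$1.sh" ] && return 0
    printf '[ -e "%s" ]\n' "/dev/$1" > "$hookdir/devexists-$1.sh"
    mkdir -p "$gendir" && : > "$gendir/$1-timeout.conf"
}

# Patched behaviour: only the marker is conditional; the generator
# output is (re)written on every run.
gen_new() {
    if ! [ -e "$hookdir/devexists-$1.sh" ]; then
        printf '[ -e "%s" ]\n' "/dev/$1" > "$hookdir/devexists-$1.sh"
    fi
    mkdir -p "$gendir" && : > "$gendir/$1-timeout.conf"
}

# Run a generator, simulate daemon-reload (wipe generator output, then
# re-run), and report whether the generator output survived.
simulate() {
    hookdir=$(mktemp -d)
    gendir=$hookdir/generator
    "$1" md2
    rm -rf "$gendir"
    "$1" md2
    if [ -e "$gendir/md2-timeout.conf" ]; then echo kept; else echo lost; fi
    rm -rf "$hookdir"
}
```

`simulate gen_old` prints `lost` and `simulate gen_new` prints `kept`, which is the difference the rootfs-generator.sh patch below aims for.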

This means that the timeout set by timeout.conf is ignored, and the default
timeout is used instead.
The default timeout is 90 seconds.  The rd.retry timeout, which determines
when the md array will be assembled, is 120 seconds.  So systemd times out
first.
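The race is plain arithmetic. The sketch uses the condition from the initqueue loop quoted in the first message (`main_loop -gt $((2*$RDRETRY/3))`); the helper names are hypothetical, main_loop is assumed to advance once per second (a simplification), and 180 is only an illustrative RDRETRY value whose two thirds comes to the 120 seconds seen here.

```shell
#!/bin/sh
# Tick count that main_loop must exceed before the timeout queue runs,
# per: if [ $main_loop -gt $((2*$RDRETRY/3)) ]
timeout_queue_threshold() {
    echo $((2 * $1 / 3))
}

# systemd_gives_up_first <systemd_timeout_s> <rdretry>
# Succeeds if systemd's job timeout expires no later than the point at
# which the timeout queue (and thus mdraid_start) could first run,
# assuming one main_loop tick per second.
systemd_gives_up_first() {
    [ "$1" -le "$(timeout_queue_threshold "$2")" ]
}
```

With these assumptions, `timeout_queue_threshold 180` gives 120, so a 90 second systemd default loses the race, while a device timeout above 120 seconds would not.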

If I apply this patch:

diff --git a/modules.d/98systemd/rootfs-generator.sh b/modules.d/98systemd/rootfs-generator.sh
index f3c7d1f237df..97512c07ab06 100755
--- a/modules.d/98systemd/rootfs-generator.sh
+++ b/modules.d/98systemd/rootfs-generator.sh
@@ -11,14 +11,15 @@ generator_wait_for_dev()
     _timeout=$(getarg rd.timeout)
     _timeout=${_timeout:-0}
 
-    [ -e "$hookdir/initqueue/finished/devexists-${_name}.sh" ] && return 0
+    if ! [ -e "$hookdir/initqueue/finished/devexists-${_name}.sh" ]; then
 
-    printf '[ -e "%s" ]\n' $1 \
-        >> "$hookdir/initqueue/finished/devexists-${_name}.sh"
-    {
-        printf '[ -e "%s" ] || ' $1
-        printf 'warn "\"%s\" does not exist"\n' $1
-    } >> "$hookdir/emergency/80-${_name}.sh"
+        printf '[ -e "%s" ]\n' $1 \
+            >> "$hookdir/initqueue/finished/devexists-${_name}.sh"
+        {
+            printf '[ -e "%s" ] || ' $1
+            printf 'warn "\"%s\" does not exist"\n' $1
+        } >> "$hookdir/emergency/80-${_name}.sh"
+    fi
 
     _name=$(dev_unit_name "$1")
     if ! [ -L /run/systemd/generator/initrd.target.wants/${_name}.device ]; then


Then it does the right thing ... mostly.

I get a successful boot, but there are warning messages about
a timeout waiting for dev-md-2.device.  These don't seem to be fatal,
but it would be good to get rid of them.

systemd knows about this device because the cryptsetup generator tells it.
So it seems to make sense to tell systemd that all devices in /etc/crypttab
should have the correct timeout.  We cannot simply use "wait_for_dev", as we
don't necessarily want to wait for the device; we just want to be sure that
systemd doesn't complain about it.
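As a rough illustration of what giving a device "the correct timeout" means on the systemd side, the sketch below writes a `timeout.conf` drop-in for the `.device` unit of each device named in a crypttab-style file (fields: name, device, key file, options). It is a hypothetical stand-in, not dracut's set_systemd_timeout_for_dev: the file paths are parameters so it can run on scratch files, only plain /dev paths are handled (`UUID=` entries are skipped), the unit-name escaping is simplified (real escaping, as done by `systemd-escape -p`, also encodes characters such as `-`), and it assumes the drop-in uses systemd's `JobTimeoutSec=` option.

```shell
#!/bin/sh
# Simplified path -> unit name: /dev/md2 -> dev-md2.device
simple_unit_name() {
    _p=${1#/}
    printf '%s.device\n' "$(printf '%s' "$_p" | tr '/' '-')"
}

# set_timeout_for_crypttab <crypttab> <unitdir> <seconds>
# Give every crypttab backing device a job timeout via a drop-in,
# without waiting for the device itself.
set_timeout_for_crypttab() {
    [ -r "$1" ] || return 0
    while read -r _name _dev _rest; do
        case "$_name" in ''|'#'*) continue ;; esac      # blank or comment
        case "$_dev" in /dev/*) ;; *) continue ;; esac  # skip UUID= etc.
        _unit=$(simple_unit_name "$_dev")
        mkdir -p "$2/$_unit.d"
        printf '[Unit]\nJobTimeoutSec=%s\n' "$3" > "$2/$_unit.d/timeout.conf"
    done < "$1"
}
```

For a crypttab line `cr_md1 /dev/md1 none luks`, this writes `dev-md1.device.d/timeout.conf` under the given unit directory.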

So I have split "set_systemd_timeout_for_dev" out of "wait_for_dev", and then
called it on all crypttab devices, as shown in these patches:

diff --git a/modules.d/90crypt/parse-crypt.sh b/modules.d/90crypt/parse-crypt.sh
index 94ad1f63ae6f..5a64652cc51c 100755
--- a/modules.d/90crypt/parse-crypt.sh
+++ b/modules.d/90crypt/parse-crypt.sh
@@ -14,6 +14,10 @@ else
     LUKS=$(getargs rd.luks.uuid -d rd_LUKS_UUID)
     tout=$(getarg rd.luks.key.tout)
 
+    [ -e /etc/crypttab ] && while read -r _name _dev _rest ; do
+        set_systemd_timeout_for_dev "$_dev"
+    done < /etc/crypttab
+
     if [ -n "$LUKS" ]; then
         for luksid in $LUKS; do
 
diff --git a/modules.d/99base/dracut-lib.sh b/modules.d/99base/dracut-lib.sh
index 079c9a21ecad..15e6b992b114 100755
--- a/modules.d/99base/dracut-lib.sh
+++ b/modules.d/99base/dracut-lib.sh
@@ -892,12 +892,10 @@ dev_unit_name()
     printf -- "%s" "$dev"
 }
 
-# wait_for_dev <dev>
-#
-# Installs a initqueue-finished script,
-# which will cause the main loop only to exit,
-# if the device <dev> is recognized by the system.
-wait_for_dev()
+# set_systemd_timeout_for_dev <dev>
+# Set 'rd.timeout' as the systemd timeout for <dev>
+
+set_systemd_timeout_for_dev()
 {
     local _name
     local _needreload
@@ -912,19 +910,6 @@ wait_for_dev()
     _timeout=$(getarg rd.timeout)
     _timeout=${_timeout:-0}
 
-    _name="$(str_replace "$1" '/' '\x2f')"
-
-    type mark_hostonly >/dev/null 2>&1 && mark_hostonly "$hookdir/initqueue/finished/devexists-${_name}.sh"
-
-    [ -e "${PREFIX}$hookdir/initqueue/finished/devexists-${_name}.sh" ] && return 0
-
-    printf '[ -e "%s" ]\n' $1 \
-        >> "${PREFIX}$hookdir/initqueue/finished/devexists-${_name}.sh"
-    {
-        printf '[ -e "%s" ] || ' $1
-        printf 'warn "\"%s\" does not exist"\n' $1
-    } >> "${PREFIX}$hookdir/emergency/80-${_name}.sh"
-
     if [ -n "$DRACUT_SYSTEMD" ]; then
         _name=$(dev_unit_name "$1")
         if ! [ -L ${PREFIX}/etc/systemd/system/initrd.target.wants/${_name}.device ]; then
@@ -949,6 +934,36 @@ wait_for_dev()
         fi
     fi
 }
+# wait_for_dev <dev>
+#
+# Installs a initqueue-finished script,
+# which will cause the main loop only to exit,
+# if the device <dev> is recognized by the system.
+wait_for_dev()
+{
+    local _name
+    local _noreload
+
+    if [ "$1" = "-n" ]; then
+        _noreload=-n
+        shift
+    fi
+
+    _name="$(str_replace "$1" '/' '\x2f')"
+
+    type mark_hostonly >/dev/null 2>&1 && mark_hostonly "$hookdir/initqueue/finished/devexists-${_name}.sh"
+
+    [ -e "${PREFIX}$hookdir/initqueue/finished/devexists-${_name}.sh" ] && return 0
+
+    printf '[ -e "%s" ]\n' $1 \
+        >> "${PREFIX}$hookdir/initqueue/finished/devexists-${_name}.sh"
+    {
+        printf '[ -e "%s" ] || ' $1
+        printf 'warn "\"%s\" does not exist"\n' $1
+    } >> "${PREFIX}$hookdir/emergency/80-${_name}.sh"
+
+    set_systemd_timeout_for_dev $_noreload $1
+}
 
 cancel_wait_for_dev()
 {


and that seems to do what I want.

Is this an appropriate fix?
If you like, I can send them as properly formatted patches.

Thanks,
NeilBrown


* Re: dracut, degraded md arrays, resume and systemd.
@ 2015-03-16  8:45 Harald Hoyer
From: Harald Hoyer @ 2015-03-16  8:45 UTC
  To: NeilBrown, initramfs-u79uwXL29TY76Z2rM5mHXA


On 12.03.2015 08:41, NeilBrown wrote:
> Is this an appropriate fix? If you like I can send them as properly
> formatted patches.

Thanks for your debugging time. If you don't mind, I would like to have those
properly formatted patches :)

Thanks again!