All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC dracut 1/1] unexpected boot failure due to races between iscsistart for multiple interfaces
@ 2017-12-04 20:14 cathy.zhou-QHcLZuEGTsvQT0dZR+AlfA
       [not found] ` <1512418484-3948-1-git-send-email-cathy.zhou-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
       [not found] ` <9326cbf7-b533-4cf2-baac-6699ee5dc4d6@googlegroups.com>
  0 siblings, 2 replies; 5+ messages in thread
From: cathy.zhou-QHcLZuEGTsvQT0dZR+AlfA @ 2017-12-04 20:14 UTC (permalink / raw)
  To: initramfs-u79uwXL29TY76Z2rM5mHXA
  Cc: cathy.zhou-QHcLZuEGTsvQT0dZR+AlfA,
	si-wei.liu-QHcLZuEGTsvQT0dZR+AlfA,
	konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA,
	open-iscsi-/JYPxA39Uh5TLH3MbocFFw

From: Cathy Zhou <Cathy.Zhou-veTT2BtV2gBXrIkS9f7CXA@public.gmane.org>

Co-existence of iscsid and iscsistart can lead to unexpected iSCSI boot
behavior:

We ran into iSCSI netboot failure when multiple interfaces are configured
with dhcp. The problem is that iscsistart is not designed to be working
together with iscsid. When an interface gets the dhcp offer successfully,
the iscsiroot script is run which starts the iscsistart service to
establish the iSCSI session. With the existence of iscsid, the iscsistart
service's attempt to setup its own mgmt ipc fails. Instead, the request
to login to the iscsi target is handled by the mgmt ipc of iscsid. After
iscsistart finishes its login attempt, it eventually sends a
stop_event_loop request to stop the mgmt process. As the result, it
terminates iscsid.

I don't think this behavior is expected. Further, we believe the restart
of the iscsid service in iscsiroot.sh for multiple interfaces could also
interfere with each other and race. That could also cause unexpected
behavior.

Specifically, in our multiple interface iSCSI boot case, below is what
happened:

a. eth0 successfully got the dhcp offer, the iscsiroot script was run and
eventually ran "system-run" to start the "oneshot" iscsistart service.

b. Before step a succeeded, the iscsiroot script was also run for eth1.
It checks the status of the first iscsistart service instance, which was
still "activating". So the iscsistart service was restarted and that
killed the first instance. Note that in this case, the stop_event_loop
request has not been sent, and iscsid was not terminated. As the result,
the second iscsistart instance also failed because of the "existing
session" error, as the half-baked session already existed in iscsid.

Based on iscsistart(8), iscsistart's primary use is to start session
used for iSCSI root boot, and is meant to work indpependently from iscsid.
Making the two work together would need coordination between each
other, which would be complicate and unnecessary.

Therefore, to fix the issue, we'd either choose to remove the use of
iscsid and solely use iscsistart or to remove iscsistart and use iscsid
and iscsiadm to manage the iSCSI sessions.

Our fix chooses the former. We tested the change in our setup.

Signed-off-by: Cathy Zhou <Cathy.Zhou-veTT2BtV2gBXrIkS9f7CXA@public.gmane.org>
---
 modules.d/95iscsi/cleanup-iscsi.sh   |  2 +-
 modules.d/95iscsi/iscsiroot.sh       | 25 +------------------------
 modules.d/95iscsi/module-setup.sh    | 30 ------------------------------
 modules.d/95iscsi/parse-iscsiroot.sh | 10 ----------
 4 files changed, 2 insertions(+), 65 deletions(-)

diff --git a/modules.d/95iscsi/cleanup-iscsi.sh b/modules.d/95iscsi/cleanup-iscsi.sh
index bfc8aefc..e97d65ac 100755
--- a/modules.d/95iscsi/cleanup-iscsi.sh
+++ b/modules.d/95iscsi/cleanup-iscsi.sh
@@ -1,4 +1,4 @@
 #!/bin/sh
 
-[ -z "${DRACUT_SYSTEMD}" ] && [ -e /sys/module/bnx2i ] && killproc iscsiuio
+[ -e /sys/module/bnx2i ] && killproc iscsiuio
 
diff --git a/modules.d/95iscsi/iscsiroot.sh b/modules.d/95iscsi/iscsiroot.sh
index aefd263d..c7f1c474 100755
--- a/modules.d/95iscsi/iscsiroot.sh
+++ b/modules.d/95iscsi/iscsiroot.sh
@@ -36,7 +36,7 @@ iroot=${iroot#:}
 # figured out a way how to check whether this is built-in or not
 modprobe crc32c 2>/dev/null
 
-if [ -z "${DRACUT_SYSTEMD}" ] && [ -e /sys/module/bnx2i ] && ! [ -e /tmp/iscsiuio-started ]; then
+[ -e /sys/module/bnx2i ] && ! [ -e /tmp/iscsiuio-started ]; then
         iscsiuio
         > /tmp/iscsiuio-started
 fi
@@ -117,11 +117,6 @@ handle_netroot()
            mkdir -p /etc/iscsi
            ln -fs /run/initiatorname.iscsi /etc/iscsi/initiatorname.iscsi
            > /tmp/iscsi_set_initiator
-           if [ -n "$DRACUT_SYSTEMD" ]; then
-               systemctl try-restart iscsid
-               # FIXME: iscsid is not yet ready, when the service is :-/
-               sleep 1
-           fi
     fi
 
     if [ -z "$iscsi_initiator" ]; then
@@ -138,11 +133,6 @@ handle_netroot()
         mkdir -p /etc/iscsi
         ln -fs /run/initiatorname.iscsi /etc/iscsi/initiatorname.iscsi
         > /tmp/iscsi_set_initiator
-        if [ -n "$DRACUT_SYSTEMD" ]; then
-            systemctl try-restart iscsid
-            # FIXME: iscsid is not yet ready, when the service is :-/
-            sleep 1
-        fi
     fi
 
 
@@ -163,11 +153,6 @@ handle_netroot()
     if ! [ -e /etc/iscsi/initiatorname.iscsi ]; then
         mkdir -p /etc/iscsi
         ln -fs /run/initiatorname.iscsi /etc/iscsi/initiatorname.iscsi
-        if [ -n "$DRACUT_SYSTEMD" ]; then
-            systemctl try-restart iscsid
-            # FIXME: iscsid is not yet ready, when the service is :-/
-            sleep 1
-        fi
     fi
 # FIXME $iscsi_protocol??
 
@@ -234,14 +219,6 @@ if [ "$netif" != "timeout" ] && getargbool 1 rd.iscsi.waitnet; then
     all_ifaces_setup || exit 0
 fi
 
-if [ "$netif" = "timeout" ] && all_ifaces_setup; then
-    # s.th. went wrong and the timeout script hits
-    # restart
-    systemctl restart iscsid
-    # damn iscsid is not ready after unit says it's ready
-    sleep 2
-fi
-
 if getargbool 0 rd.iscsi.firmware -d -y iscsi_firmware ; then
     if [ "$netif" = "timeout" ] || [ "$netif" = "online" ]; then
         handle_firmware
diff --git a/modules.d/95iscsi/module-setup.sh b/modules.d/95iscsi/module-setup.sh
index 04937b5b..1d185a84 100755
--- a/modules.d/95iscsi/module-setup.sh
+++ b/modules.d/95iscsi/module-setup.sh
@@ -199,36 +199,6 @@ install() {
     inst "$moddir/iscsiroot.sh" "/sbin/iscsiroot"
     if ! dracut_module_included "systemd"; then
         inst "$moddir/mount-lun.sh" "/bin/mount-lun.sh"
-    else
-        inst_multiple -o \
-                      $systemdsystemunitdir/iscsi.service \
-                      $systemdsystemunitdir/iscsid.service \
-                      $systemdsystemunitdir/iscsid.socket \
-                      $systemdsystemunitdir/iscsiuio.service \
-                      $systemdsystemunitdir/iscsiuio.socket \
-                      iscsiadm iscsid
-
-        mkdir -p "${initdir}/$systemdsystemunitdir/sockets.target.wants"
-        for i in \
-                iscsiuio.socket \
-            ; do
-            ln_r "$systemdsystemunitdir/${i}" "$systemdsystemunitdir/sockets.target.wants/${i}"
-        done
-
-        mkdir -p "${initdir}/$systemdsystemunitdir/basic.target.wants"
-        for i in \
-                iscsid.service \
-            ; do
-            ln_r "$systemdsystemunitdir/${i}" "$systemdsystemunitdir/basic.target.wants/${i}"
-        done
-
-        # Make sure iscsid is started after dracut-cmdline and ready for the initqueue
-        mkdir -p "${initdir}/$systemdsystemunitdir/iscsid.service.d"
-        (
-            echo "[Unit]"
-            echo "After=dracut-cmdline.service"
-            echo "Before=dracut-initqueue.service"
-        ) > "${initdir}/$systemdsystemunitdir/iscsid.service.d/dracut.conf"
     fi
     inst_dir /var/lib/iscsi
     dracut_need_initqueue
diff --git a/modules.d/95iscsi/parse-iscsiroot.sh b/modules.d/95iscsi/parse-iscsiroot.sh
index 43b2e088..c8c66ccf 100755
--- a/modules.d/95iscsi/parse-iscsiroot.sh
+++ b/modules.d/95iscsi/parse-iscsiroot.sh
@@ -116,11 +116,6 @@ if arg=$(getarg rd.iscsi.initiator -d iscsi_initiator=) && [ -n "$arg" ] && ! [
     if ! [ -e /etc/iscsi/initiatorname.iscsi ]; then
         mkdir -p /etc/iscsi
         ln -fs /run/initiatorname.iscsi /etc/iscsi/initiatorname.iscsi
-        if [ -n "$DRACUT_SYSTEMD" ]; then
-            systemctl try-restart iscsid
-            # FIXME: iscsid is not yet ready, when the service is :-/
-            sleep 1
-        fi
     fi
 fi
 
@@ -133,11 +128,6 @@ if [ -z $iscsi_initiator ] && [ -f /sys/firmware/ibft/initiator/initiator-name ]
         mkdir -p /etc/iscsi
         ln -fs /run/initiatorname.iscsi /etc/iscsi/initiatorname.iscsi
         > /tmp/iscsi_set_initiator
-        if [ -n "$DRACUT_SYSTEMD" ]; then
-            systemctl try-restart iscsid
-            # FIXME: iscsid is not yet ready, when the service is :-/
-            sleep 1
-        fi
     fi
 fi
 
-- 
2.11.1

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [RFC dracut 1/1] unexpected boot failure due to races between iscsistart for multiple interfaces
       [not found] ` <1512418484-3948-1-git-send-email-cathy.zhou-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2017-12-08 19:09   ` Cathy Zhou
  0 siblings, 0 replies; 5+ messages in thread
From: Cathy Zhou @ 2017-12-08 19:09 UTC (permalink / raw)
  To: initramfs-u79uwXL29TY76Z2rM5mHXA
  Cc: si-wei.liu-QHcLZuEGTsvQT0dZR+AlfA,
	konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA,
	open-iscsi-/JYPxA39Uh5TLH3MbocFFw

A Reminder...

Looking forward to any comments.

Thanks
- Cathy

On 12/4/2017 12:14 PM, cathy.zhou-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org wrote:
> From: Cathy Zhou <Cathy.Zhou-veTT2BtV2gBXrIkS9f7CXA@public.gmane.org>
>
> Co-existence of iscsid and iscsistart can lead to unexpected iSCSI boot
> behavior:
>
> We ran into iSCSI netboot failure when multiple interfaces are configured
> with dhcp. The problem is that iscsistart is not designed to be working
> together with iscsid. When an interface gets the dhcp offer successfully,
> the iscsiroot script is run which starts the iscsistart service to
> establish the iSCSI session. With the existence of iscsid, the iscsistart
> service's attempt to setup its own mgmt ipc fails. Instead, the request
> to login to the iscsi target is handled by the mgmt ipc of iscsid. After
> iscsistart finishes its login attempt, it eventually sends a
> stop_event_loop request to stop the mgmt process. As the result, it
> terminates iscsid.
>
> I don't think this behavior is expected. Further, we believe the restart
> of the iscsid service in iscsiroot.sh for multiple interfaces could also
> interfere with each other and race. That could also cause unexpected
> behavior.
>
> Specifically, in our multiple interface iSCSI boot case, below is what
> happened:
>
> a. eth0 successfully got the dhcp offer, the iscsiroot script was run and
> eventually ran "system-run" to start the "oneshot" iscsistart service.
>
> b. Before step a succeeded, the iscsiroot script was also run for eth1.
> It checks the status of the first iscsistart service instance, which was
> still "activating". So the iscsistart service was restarted and that
> killed the first instance. Note that in this case, the stop_event_loop
> request has not been sent, and iscsid was not terminated. As the result,
> the second iscsistart instance also failed because of the "existing
> session" error, as the half-baked session already existed in iscsid.
>
> Based on iscsistart(8), iscsistart's primary use is to start session
> used for iSCSI root boot, and is meant to work indpependently from iscsid.
> Making the two work together would need coordination between each
> other, which would be complicate and unnecessary.
>
> Therefore, to fix the issue, we'd either choose to remove the use of
> iscsid and solely use iscsistart or to remove iscsistart and use iscsid
> and iscsiadm to manage the iSCSI sessions.
>
> Our fix chooses the former. We tested the change in our setup.
>
> Signed-off-by: Cathy Zhou <Cathy.Zhou-veTT2BtV2gBXrIkS9f7CXA@public.gmane.org>
> ---
>   modules.d/95iscsi/cleanup-iscsi.sh   |  2 +-
>   modules.d/95iscsi/iscsiroot.sh       | 25 +------------------------
>   modules.d/95iscsi/module-setup.sh    | 30 ------------------------------
>   modules.d/95iscsi/parse-iscsiroot.sh | 10 ----------
>   4 files changed, 2 insertions(+), 65 deletions(-)
>
> diff --git a/modules.d/95iscsi/cleanup-iscsi.sh b/modules.d/95iscsi/cleanup-iscsi.sh
> index bfc8aefc..e97d65ac 100755
> --- a/modules.d/95iscsi/cleanup-iscsi.sh
> +++ b/modules.d/95iscsi/cleanup-iscsi.sh
> @@ -1,4 +1,4 @@
>   #!/bin/sh
>   
> -[ -z "${DRACUT_SYSTEMD}" ] && [ -e /sys/module/bnx2i ] && killproc iscsiuio
> +[ -e /sys/module/bnx2i ] && killproc iscsiuio
>   
> diff --git a/modules.d/95iscsi/iscsiroot.sh b/modules.d/95iscsi/iscsiroot.sh
> index aefd263d..c7f1c474 100755
> --- a/modules.d/95iscsi/iscsiroot.sh
> +++ b/modules.d/95iscsi/iscsiroot.sh
> @@ -36,7 +36,7 @@ iroot=${iroot#:}
>   # figured out a way how to check whether this is built-in or not
>   modprobe crc32c 2>/dev/null
>   
> -if [ -z "${DRACUT_SYSTEMD}" ] && [ -e /sys/module/bnx2i ] && ! [ -e /tmp/iscsiuio-started ]; then
> +[ -e /sys/module/bnx2i ] && ! [ -e /tmp/iscsiuio-started ]; then
>           iscsiuio
>           > /tmp/iscsiuio-started
>   fi
> @@ -117,11 +117,6 @@ handle_netroot()
>              mkdir -p /etc/iscsi
>              ln -fs /run/initiatorname.iscsi /etc/iscsi/initiatorname.iscsi
>              > /tmp/iscsi_set_initiator
> -           if [ -n "$DRACUT_SYSTEMD" ]; then
> -               systemctl try-restart iscsid
> -               # FIXME: iscsid is not yet ready, when the service is :-/
> -               sleep 1
> -           fi
>       fi
>   
>       if [ -z "$iscsi_initiator" ]; then
> @@ -138,11 +133,6 @@ handle_netroot()
>           mkdir -p /etc/iscsi
>           ln -fs /run/initiatorname.iscsi /etc/iscsi/initiatorname.iscsi
>           > /tmp/iscsi_set_initiator
> -        if [ -n "$DRACUT_SYSTEMD" ]; then
> -            systemctl try-restart iscsid
> -            # FIXME: iscsid is not yet ready, when the service is :-/
> -            sleep 1
> -        fi
>       fi
>   
>   
> @@ -163,11 +153,6 @@ handle_netroot()
>       if ! [ -e /etc/iscsi/initiatorname.iscsi ]; then
>           mkdir -p /etc/iscsi
>           ln -fs /run/initiatorname.iscsi /etc/iscsi/initiatorname.iscsi
> -        if [ -n "$DRACUT_SYSTEMD" ]; then
> -            systemctl try-restart iscsid
> -            # FIXME: iscsid is not yet ready, when the service is :-/
> -            sleep 1
> -        fi
>       fi
>   # FIXME $iscsi_protocol??
>   
> @@ -234,14 +219,6 @@ if [ "$netif" != "timeout" ] && getargbool 1 rd.iscsi.waitnet; then
>       all_ifaces_setup || exit 0
>   fi
>   
> -if [ "$netif" = "timeout" ] && all_ifaces_setup; then
> -    # s.th. went wrong and the timeout script hits
> -    # restart
> -    systemctl restart iscsid
> -    # damn iscsid is not ready after unit says it's ready
> -    sleep 2
> -fi
> -
>   if getargbool 0 rd.iscsi.firmware -d -y iscsi_firmware ; then
>       if [ "$netif" = "timeout" ] || [ "$netif" = "online" ]; then
>           handle_firmware
> diff --git a/modules.d/95iscsi/module-setup.sh b/modules.d/95iscsi/module-setup.sh
> index 04937b5b..1d185a84 100755
> --- a/modules.d/95iscsi/module-setup.sh
> +++ b/modules.d/95iscsi/module-setup.sh
> @@ -199,36 +199,6 @@ install() {
>       inst "$moddir/iscsiroot.sh" "/sbin/iscsiroot"
>       if ! dracut_module_included "systemd"; then
>           inst "$moddir/mount-lun.sh" "/bin/mount-lun.sh"
> -    else
> -        inst_multiple -o \
> -                      $systemdsystemunitdir/iscsi.service \
> -                      $systemdsystemunitdir/iscsid.service \
> -                      $systemdsystemunitdir/iscsid.socket \
> -                      $systemdsystemunitdir/iscsiuio.service \
> -                      $systemdsystemunitdir/iscsiuio.socket \
> -                      iscsiadm iscsid
> -
> -        mkdir -p "${initdir}/$systemdsystemunitdir/sockets.target.wants"
> -        for i in \
> -                iscsiuio.socket \
> -            ; do
> -            ln_r "$systemdsystemunitdir/${i}" "$systemdsystemunitdir/sockets.target.wants/${i}"
> -        done
> -
> -        mkdir -p "${initdir}/$systemdsystemunitdir/basic.target.wants"
> -        for i in \
> -                iscsid.service \
> -            ; do
> -            ln_r "$systemdsystemunitdir/${i}" "$systemdsystemunitdir/basic.target.wants/${i}"
> -        done
> -
> -        # Make sure iscsid is started after dracut-cmdline and ready for the initqueue
> -        mkdir -p "${initdir}/$systemdsystemunitdir/iscsid.service.d"
> -        (
> -            echo "[Unit]"
> -            echo "After=dracut-cmdline.service"
> -            echo "Before=dracut-initqueue.service"
> -        ) > "${initdir}/$systemdsystemunitdir/iscsid.service.d/dracut.conf"
>       fi
>       inst_dir /var/lib/iscsi
>       dracut_need_initqueue
> diff --git a/modules.d/95iscsi/parse-iscsiroot.sh b/modules.d/95iscsi/parse-iscsiroot.sh
> index 43b2e088..c8c66ccf 100755
> --- a/modules.d/95iscsi/parse-iscsiroot.sh
> +++ b/modules.d/95iscsi/parse-iscsiroot.sh
> @@ -116,11 +116,6 @@ if arg=$(getarg rd.iscsi.initiator -d iscsi_initiator=) && [ -n "$arg" ] && ! [
>       if ! [ -e /etc/iscsi/initiatorname.iscsi ]; then
>           mkdir -p /etc/iscsi
>           ln -fs /run/initiatorname.iscsi /etc/iscsi/initiatorname.iscsi
> -        if [ -n "$DRACUT_SYSTEMD" ]; then
> -            systemctl try-restart iscsid
> -            # FIXME: iscsid is not yet ready, when the service is :-/
> -            sleep 1
> -        fi
>       fi
>   fi
>   
> @@ -133,11 +128,6 @@ if [ -z $iscsi_initiator ] && [ -f /sys/firmware/ibft/initiator/initiator-name ]
>           mkdir -p /etc/iscsi
>           ln -fs /run/initiatorname.iscsi /etc/iscsi/initiatorname.iscsi
>           > /tmp/iscsi_set_initiator
> -        if [ -n "$DRACUT_SYSTEMD" ]; then
> -            systemctl try-restart iscsid
> -            # FIXME: iscsid is not yet ready, when the service is :-/
> -            sleep 1
> -        fi
>       fi
>   fi
>   

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC dracut 1/1] unexpected boot failure due to races between iscsistart for multiple interfaces
       [not found]         ` <B00ADBFE-F00B-4BCD-A7FE-0FEFE2A64F9D-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2017-12-21 18:54           ` Cathy Zhou
       [not found]             ` <29030bdf-e65a-dc01-ee1e-48498dafdc8f-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Cathy Zhou @ 2017-12-21 18:54 UTC (permalink / raw)
  To: Lee Duncan; +Cc: open-iscsi, initramfs-u79uwXL29TY76Z2rM5mHXA, Si-Wei


On 12/20/2017 5:39 PM, Lee Duncan wrote:
> Hi Cathy:
>
> My comments are (well) below ...
Lee,

Thanks. See more inline:
>
>> On Dec 20, 2017, at 3:04 PM, Cathy Zhou <cathy.zhou-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org 
>> <mailto:cathy.zhou-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>> wrote:
>>
>> Lee-man,
>>
>> Thanks for your comments. See inline:
>>
>> On 12/20/2017 1:47 PM, The Lee-Man wrote:
>>>
>>>
>>> On Friday, December 8, 2017 at 4:30:36 PM UTC-8, The Lee-Man wrote:
>>>
>>>    On Thursday, December 7, 2017 at 11:49:53 AM UTC-8,
>>> cathy.zhou-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org <mailto:cathy.zhou-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> 
>>> <mailto:cathy.zhou-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
>>>
>>>        From: Cathy Zhou <Cathy.Zhou-veTT2BtV2gBXrIkS9f7CXA@public.gmane.org 
>>> <mailto:Cathy.Zhou-veTT2BtV2gBXrIkS9f7CXA@public.gmane.org>>
>>>
>>>        Co-existence of iscsid and iscsistart can lead to unexpected
>>>        iSCSI boot
>>>        behavior:
>>>
>>>        We ran into iSCSI netboot failure when multiple interfaces are
>>>        configured
>>>        with dhcp. The problem is that iscsistart is not designed to
>>>        be working
>>>        together with iscsid. When an interface gets the dhcp offer
>>>        successfully,
>>>        the iscsiroot script is run which starts the iscsistart
>>>        service to
>>>        establish the iSCSI session. With the existence of iscsid, the
>>>        iscsistart
>>>        service's attempt to setup its own mgmt ipc fails. Instead,
>>>        the request
>>>        to login to the iscsi target is handled by the mgmt ipc of
>>>        iscsid. After
>>>        iscsistart finishes its login attempt, it eventually sends a
>>>        stop_event_loop request to stop the mgmt process. As the
>>>        result, it
>>>        terminates iscsid.
>>>
>>>        I don't think this behavior is expected. Further, we believe
>>>        the restart
>>>        of the iscsid service in iscsiroot.sh for multiple interfaces
>>>        could also
>>>        interfere with each other and race. That could also cause
>>>        unexpected
>>>        behavior.
>>>
>>>        Specifically, in our multiple interface iSCSI boot case, below
>>>        is what
>>>        happened:
>>>
>>>        a. eth0 successfully got the dhcp offer, the iscsiroot script
>>>        was run and
>>>        eventually ran "system-run" to start the "oneshot" iscsistart
>>>        service.
>>>
>>>        b. Before step a succeeded, the iscsiroot script was also run
>>>        for eth1.
>>>        It checks the status of the first iscsistart service instance,
>>>        which was
>>>        still "activating". So the iscsistart service was restarted
>>>        and that
>>>        killed the first instance. Note that in this case, the
>>>        stop_event_loop
>>>        request has not been sent, and iscsid was not terminated. As
>>>        the result,
>>>        the second iscsistart instance also failed because of the
>>>        "existing
>>>        session" error, as the half-baked session already existed in
>>>        iscsid.
>>>
>>>        Based on iscsistart(8), iscsistart's primary use is to start
>>>        session
>>>        used for iSCSI root boot, and is meant to work indpependently
>>>        from iscsid.
>>>        Making the two work together would need coordination between each
>>>        other, which would be complicate and unnecessary.
>>>
>>>
>>>    No, I believe iscsistart was mean to run *instead* of iscsid, not
>>>    independently.
>>>
>> That was what I meant: iscsistart is not to run together with iscsid.
>>>
>>>
>>>        Therefore, to fix the issue, we'd either choose to remove the
>>>        use of
>>>        iscsid and solely use iscsistart or to remove iscsistart and
>>>        use iscsid
>>>        and iscsiadm to manage the iSCSI sessions.
>>>
>>>        Our fix chooses the former. We tested the change in our setup.
>>>
>>>        Signed-off-by: Cathy Zhou <Cathy.Zhou-veTT2BtV2gBXrIkS9f7CXA@public.gmane.org 
>>> <mailto:Cathy.Zhou-veTT2BtV2gBXrIkS9f7CXA@public.gmane.org>>
>>>        ---
>>>         modules.d/95iscsi/cleanup-iscsi.sh   |  2 +-
>>>         modules.d/95iscsi/iscsiroot.sh       | 25
>>>        +------------------------
>>>         modules.d/95iscsi/module-setup.sh    | 30
>>>        ------------------------------
>>>         modules.d/95iscsi/parse-iscsiroot.sh | 10 ----------
>>>         4 files changed, 2 insertions(+), 65 deletions(-)
>>>
>>>        diff --git a/modules.d/95iscsi/cleanup-iscsi.sh
>>>        b/modules.d/95iscsi/cleanup-iscsi.sh
>>>        index bfc8aefc..e97d65ac 100755
>>>        --- a/modules.d/95iscsi/cleanup-iscsi.sh
>>>        +++ b/modules.d/95iscsi/cleanup-iscsi.sh
>>>        @@ -1,4 +1,4 @@
>>>         #!/bin/sh
>>>
>>>        -[ -z "${DRACUT_SYSTEMD}" ] && [ -e /sys/module/bnx2i ] &&
>>>        killproc iscsiuio
>>>        +[ -e /sys/module/bnx2i ] && killproc iscsiuio
>>>
>>>
>>>    Why are you removing the DRACUT_SYSTEMD checks?
>>>
>> Because before my change, iscsiuio is started in the form of service 
>> as a dependency of the iscsid service. My change removed the iscsid 
>> service, so iscsiuio needs to be started explicitly for iSCSI offload 
>> for bnx2i driver.
>>>
>>>
>>>        diff --git a/modules.d/95iscsi/iscsiroot.sh
>>>        b/modules.d/95iscsi/iscsiroot.sh
>>>        index aefd263d..c7f1c474 100755
>>>        --- a/modules.d/95iscsi/iscsiroot.sh
>>>        +++ b/modules.d/95iscsi/iscsiroot.sh
>>>        @@ -36,7 +36,7 @@ iroot=${iroot#:}
>>>         # figured out a way how to check whether this is built-in or not
>>>         modprobe crc32c 2>/dev/null
>>>
>>>        -if [ -z "${DRACUT_SYSTEMD}" ] && [ -e /sys/module/bnx2i ] &&
>>>        ! [ -e /tmp/iscsiuio-started ]; then
>>>        +[ -e /sys/module/bnx2i ] && ! [ -e /tmp/iscsiuio-started ]; then
>>>                 iscsiuio
>>>                 > /tmp/iscsiuio-started
>>>         fi
>>>        @@ -117,11 +117,6 @@ handle_netroot()
>>>                    mkdir -p /etc/iscsi
>>>                    ln -fs /run/initiatorname.iscsi
>>>        /etc/iscsi/initiatorname.iscsi
>>>                    > /tmp/iscsi_set_initiator
>>>        -           if [ -n "$DRACUT_SYSTEMD" ]; then
>>>        -               systemctl try-restart iscsid
>>>        -               # FIXME: iscsid is not yet ready, when the
>>>        service is :-/
>>>        -               sleep 1
>>>        -           fi
>>>             fi
>>>
>>>             if [ -z "$iscsi_initiator" ]; then
>>>        @@ -138,11 +133,6 @@ handle_netroot()
>>>                 mkdir -p /etc/iscsi
>>>                 ln -fs /run/initiatorname.iscsi
>>>        /etc/iscsi/initiatorname.iscsi
>>>                 > /tmp/iscsi_set_initiator
>>>        -        if [ -n "$DRACUT_SYSTEMD" ]; then
>>>        -            systemctl try-restart iscsid
>>>        -            # FIXME: iscsid is not yet ready, when the
>>>        service is :-/
>>>        -            sleep 1
>>>        -        fi
>>>             fi
>>>
>>>
>>>        @@ -163,11 +153,6 @@ handle_netroot()
>>>             if ! [ -e /etc/iscsi/initiatorname.iscsi ]; then
>>>                 mkdir -p /etc/iscsi
>>>                 ln -fs /run/initiatorname.iscsi
>>>        /etc/iscsi/initiatorname.iscsi
>>>        -        if [ -n "$DRACUT_SYSTEMD" ]; then
>>>        -            systemctl try-restart iscsid
>>>        -            # FIXME: iscsid is not yet ready, when the
>>>        service is :-/
>>>        -            sleep 1
>>>        -        fi
>>>             fi
>>>         # FIXME $iscsi_protocol??
>>>
>>>        @@ -234,14 +219,6 @@ if [ "$netif" != "timeout" ] &&
>>>        getargbool 1 rd.iscsi.waitnet; then
>>>             all_ifaces_setup || exit 0
>>>         fi
>>>
>>>        -if [ "$netif" = "timeout" ] && all_ifaces_setup; then
>>>        -    # s.th <http://s.th> <http://s.th>. went wrong and the 
>>> timeout script hits
>>>        -    # restart
>>>        -    systemctl restart iscsid
>>>        -    # damn iscsid is not ready after unit says it's ready
>>>        -    sleep 2
>>>        -fi
>>>        -
>>>         if getargbool 0 rd.iscsi.firmware -d -y iscsi_firmware ; then
>>>             if [ "$netif" = "timeout" ] || [ "$netif" = "online" ]; then
>>>                 handle_firmware
>>>        diff --git a/modules.d/95iscsi/module-setup.sh
>>>        b/modules.d/95iscsi/module-setup.sh
>>>        index 04937b5b..1d185a84 100755
>>>        --- a/modules.d/95iscsi/module-setup.sh
>>>        +++ b/modules.d/95iscsi/module-setup.sh
>>>        @@ -199,36 +199,6 @@ install() {
>>>             inst "$moddir/iscsiroot.sh" "/sbin/iscsiroot"
>>>             if ! dracut_module_included "systemd"; then
>>>                 inst "$moddir/mount-lun.sh" "/bin/mount-lun.sh"
>>>        -    else
>>>        -        inst_multiple -o \
>>>        -  $systemdsystemunitdir/iscsi.service \
>>>        -  $systemdsystemunitdir/iscsid.service \
>>>        -  $systemdsystemunitdir/iscsid.socket \
>>>        -  $systemdsystemunitdir/iscsiuio.service \
>>>        -  $systemdsystemunitdir/iscsiuio.socket \
>>>        -                      iscsiadm iscsid
>>>        -
>>>        -        mkdir -p
>>>        "${initdir}/$systemdsystemunitdir/sockets.target.wants"
>>>        -        for i in \
>>>        -                iscsiuio.socket \
>>>        -            ; do
>>>        -            ln_r "$systemdsystemunitdir/${i}"
>>>        "$systemdsystemunitdir/sockets.target.wants/${i}"
>>>        -        done
>>>        -
>>>        -        mkdir -p
>>>        "${initdir}/$systemdsystemunitdir/basic.target.wants"
>>>        -        for i in \
>>>        -                iscsid.service \
>>>        -            ; do
>>>        -            ln_r "$systemdsystemunitdir/${i}"
>>>        "$systemdsystemunitdir/basic.target.wants/${i}"
>>>        -        done
>>>        -
>>>        -        # Make sure iscsid is started after dracut-cmdline
>>>        and ready for the initqueue
>>>        -        mkdir -p
>>>        "${initdir}/$systemdsystemunitdir/iscsid.service.d"
>>>        -        (
>>>        -            echo "[Unit]"
>>>        -            echo "After=dracut-cmdline.service"
>>>        -            echo "Before=dracut-initqueue.service"
>>>        -        ) >
>>>        "${initdir}/$systemdsystemunitdir/iscsid.service.d/dracut.conf"
>>>
>>>
>>>    I'm not sure why you've removed all of this.
>>>
>> Because the above is added for the iscsid service, which is no longer 
>> needed.
>>>
>>>
>>>             fi
>>>             inst_dir /var/lib/iscsi
>>>             dracut_need_initqueue
>>>        diff --git a/modules.d/95iscsi/parse-iscsiroot.sh
>>>        b/modules.d/95iscsi/parse-iscsiroot.sh
>>>        index 43b2e088..c8c66ccf 100755
>>>        --- a/modules.d/95iscsi/parse-iscsiroot.sh
>>>        +++ b/modules.d/95iscsi/parse-iscsiroot.sh
>>>        @@ -116,11 +116,6 @@ if arg=$(getarg rd.iscsi.initiator -d
>>>        iscsi_initiator=) && [ -n "$arg" ] && ! [
>>>             if ! [ -e /etc/iscsi/initiatorname.iscsi ]; then
>>>                 mkdir -p /etc/iscsi
>>>                 ln -fs /run/initiatorname.iscsi
>>>        /etc/iscsi/initiatorname.iscsi
>>>        -        if [ -n "$DRACUT_SYSTEMD" ]; then
>>>        -            systemctl try-restart iscsid
>>>        -            # FIXME: iscsid is not yet ready, when the
>>>        service is :-/
>>>        -            sleep 1
>>>        -        fi
>>>             fi
>>>         fi
>>>
>>>        @@ -133,11 +128,6 @@ if [ -z $iscsi_initiator ] && [ -f
>>>        /sys/firmware/ibft/initiator/initiator-name ]
>>>                 mkdir -p /etc/iscsi
>>>                 ln -fs /run/initiatorname.iscsi
>>>        /etc/iscsi/initiatorname.iscsi
>>>                 > /tmp/iscsi_set_initiator
>>>        -        if [ -n "$DRACUT_SYSTEMD" ]; then
>>>        -            systemctl try-restart iscsid
>>>        -            # FIXME: iscsid is not yet ready, when the
>>>        service is :-/
>>>        -            sleep 1
>>>        -        fi
>>>             fi
>>>         fi
>>>
>>>        --         2.11.1
>>>
>>>    So in general you've stopped starting or restarting the iscsid
>>>    service?
>>>
>> Yes. Actually we do not even start the iscsid service from the 
>> beginning. The whole fix is to remove iscsid but only run iscsistart.
>
> Okay, I’m glad you made this clear, because this is the problem with 
> your approach.
>
> The iscsistart program is not sufficient for supporting an iSCSI 
> initiator. Namely, it does not manage sessions, so it can’t do any 
> error handling (i.e. log out, log back in).
Yes, I understand that. But I don't see this is a problem with my 
approach. From what I can see, there is no need for log out/log back in. 
iscsid is to work with iscsiadm to manage sessions. But there is no such 
usage of iscisadm in dracut as far as I can see.

There is only one iscsiadm instance, and I looked into it and made sure 
that does not need the existence of iscsid.
>
> The whole reason the iscsistart program was created was that, at iSCSI 
> boot time, we needed a lighter-weight program than the iscsiadm/iscsid 
> pair to connect to a single target.

>>>    Have you tested this with iscsiuio/brcm? I have no way currently
>>>    to test to make sure things all still work.
>>>
>> We will, and also planning to add more test cases to get more 
>> iscsiboot coverage. But because this is a significant change, we'd 
>> like to get some comments to say this is the right approach.
>>>
>>>
>>>    Note: dracut is *not* my specialty, but I have looked at this
>>>    stuff in the SUSE distribution, and it seems like a bit of a house
>>>    of cards. :(
>>>
>>
>> I agree. dracut is not my specialty either.
>>
>> Thank you very much!
>> - Cathy
>>
>
> Bottom line, I need to better understand why you are using iscsitart 
> in the first place, if you are not booting from an iSCSI disc. And if 
> you are using iscsistart, why are you running iscsid before iscsistart 
> is done?
iscsistart was used before my changes. I didn't change that. The problem 
is that iscsid was started before iscsistart and that was what I am 
trying to fix.

Hopefully my fix makes more sense now. Please let me know if you still 
have doubts.

Thanks
- Cathy
>
> -- 
> Lee Duncan
>
> "/Ambition is the last refuge of failure/." -- Oscar Wilde
>
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC dracut 1/1] unexpected boot failure due to races between iscsistart for multiple interfaces
       [not found]             ` <29030bdf-e65a-dc01-ee1e-48498dafdc8f-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2017-12-22  1:03               ` Lee Duncan
       [not found]                 ` <C94AC86B-C41E-4B3A-B375-3405E25BD45D-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Lee Duncan @ 2017-12-22  1:03 UTC (permalink / raw)
  To: Cathy Zhou; +Cc: open-iscsi, initramfs-u79uwXL29TY76Z2rM5mHXA, Si-Wei

Hi Cathy:

Too many quote levels, so I’m starting from scratch.

I understand now that what you are talking about is your dracut environment, not run time iscsi initiator support.

In this environment, how is iscsid getting started?

As you know, the change you propose is a dracut change, not an open-iscsi change. Never the less, I’d like to reply to your request for comment.

I think the key is in this, from your first RFC email:

>        Therefore, to fix the issue, we'd either choose to remove the
>        use of
>        iscsid and solely use iscsistart or to remove iscsistart and
>        use iscsid
>        and iscsiadm to manage the iSCSI sessions.

That seems like the wrong approach to me. iscsiadm/iscsid were designed for this. The iscsistart script was designed for sing, one-shot use, and is not used in dracut now at all, AFAIK.

— 
Lee Duncan

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC dracut 1/1] unexpected boot failure due to races between iscsistart for multiple interfaces
       [not found]                 ` <C94AC86B-C41E-4B3A-B375-3405E25BD45D-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2017-12-22  6:20                   ` si-wei liu
  0 siblings, 0 replies; 5+ messages in thread
From: si-wei liu @ 2017-12-22  6:20 UTC (permalink / raw)
  To: Lee Duncan, Cathy Zhou; +Cc: open-iscsi, initramfs-u79uwXL29TY76Z2rM5mHXA



On 12/21/2017 05:03 PM, Lee Duncan wrote:
> That seems like the wrong approach to me. iscsiadm/iscsid were designed for this. The iscsistart script was designed for sing, one-shot use, and is not used in dracut now at all, AFAIK.
The latter is not true, unfortunately. Distros like OpenSUSE and SLES 
does move to iscsiadm to manage sessions for their dracut fork,  
however, the upstream dracut still uses iscsistart now. The context is 
that it might take a while for the SUSE patches to be upstream ready, 
hence we need an interim that addressed the particular iSCSI boot issue 
when dealing with multiple interfaces, until iscsiadm dracut patches 
become avail. Please see:

https://www.spinics.net/lists/linux-initramfs/msg04502.html
https://www.spinics.net/lists/linux-initramfs/msg04505.html
https://www.spinics.net/lists/linux-initramfs/msg04506.html


Thanks,
-Siwei

>
> —
> Lee Duncan
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2017-12-22  6:20 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-04 20:14 [RFC dracut 1/1] unexpected boot failure due to races between iscsistart for multiple interfaces cathy.zhou-QHcLZuEGTsvQT0dZR+AlfA
     [not found] ` <1512418484-3948-1-git-send-email-cathy.zhou-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2017-12-08 19:09   ` Cathy Zhou
     [not found] ` <9326cbf7-b533-4cf2-baac-6699ee5dc4d6@googlegroups.com>
     [not found]   ` <a759d3d3-5c21-4a28-a205-1390561abc2a@googlegroups.com>
     [not found]     ` <213a0c37-587e-7375-341b-680faa752cde@oracle.com>
     [not found]       ` <B00ADBFE-F00B-4BCD-A7FE-0FEFE2A64F9D@gmail.com>
     [not found]         ` <B00ADBFE-F00B-4BCD-A7FE-0FEFE2A64F9D-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-12-21 18:54           ` Cathy Zhou
     [not found]             ` <29030bdf-e65a-dc01-ee1e-48498dafdc8f-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2017-12-22  1:03               ` Lee Duncan
     [not found]                 ` <C94AC86B-C41E-4B3A-B375-3405E25BD45D-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-12-22  6:20                   ` si-wei liu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.