All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Christie <michael.christie@oracle.com>
To: lduncan@suse.com, martin.petersen@oracle.com,
	rbharath@google.com, krisman@collabora.com,
	linux-scsi@vger.kernel.org, jejb@linux.ibm.com
Subject: [RFC 0/7] iscsi: Fix in kernel conn failure handling
Date: Sun, 11 Apr 2021 02:55:38 -0500	[thread overview]
Message-ID: <20210411075545.27866-1-michael.christie@oracle.com> (raw)

The patch 0ab710458da1 ("scsi: iscsi: Perform connection failure entirely
in kernel space") has the following regressions or bugs that this patch
set fixes.

1. It can return cmds to upper layers like dm-multipath where that can
retry them. After they are successful the fs/app can send new IO to the
same sectors, but we've left the cmds running in FW or in the net layer.
We need to be calling ep_disconnect.

2. The drivers that implement ep_disconnect expect that it's called
before conn_stop. Besides crashes, if the cleanup_task callout is called
before ep_disconnect it might free up driver/card resources for session1
then they could be allocated for session2. But because the driver's
ep_disconnect is not called it has not cleaned up the firmware so the
card is still using the resources for the original cmd.

3. The system shutdown case does not work for the eh path. Passing
stop_conn STOP_CONN_TERM will never block the session and start the
recovery timer, because for that flag userspace will do the unbind
and destroy events which would remove the devices and wake up and kill
the eh. We should be using STOP_CONN_RECOVER.

4. The stop_conn_work_fn can run after userspace has done it's
recovery and we are happily using the session. We will then end up
with various bugs depending on what is going on at the time.

When we add ep_disconnect we need to make sure they only exec once
and exec in order.

5. returning -EAGAIN in iscsi_if_destroy_conn if we haven't yet run
the in kernel stop_conn function is breaking userspace. We should have
been doing this for the caller.

The patchset should also maintain support for the fix in 7e7cd796f277
("scsi: iscsi: Fix deadlock on recovery path during GFP_IO reclaim").
I'm not 100% sure about that though. This patchset allows us to do
max_active conn cleanups in parallel (256 default and up to 512). We
used to only do 1 at a time. I'm not sure if this will allow us to hit
the issue described in that patch more easily or it will be better
because we have a higher chance of cleaning up commands that can be
failed over to another path and free up dirty memory.

I'm still testing the patches, but wanted to get some feedback from
the google and collabora devs that made the original patches.




             reply	other threads:[~2021-04-11  7:56 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-11  7:55 Mike Christie [this message]
2021-04-11  7:55 ` [RFC 1/7] scsi: iscsi: fix iscsi cls conn state Mike Christie
2021-04-11  7:55 ` [RFC 2/7] scsi: iscsi: force immediate failure during shutdown Mike Christie
2021-04-11  7:55 ` [RFC 3/7] scsi: iscsi: make iscsi_eh_timer wq generic Mike Christie
2021-04-11  7:55 ` [RFC 4/7] scsi: iscsi: have callers do the put on iscsi_lookup_endpoint Mike Christie
2021-04-11  7:55 ` [RFC 5/7] scsi: iscsi: fix some of the bugs with Perform connection failure entirely in kernel space Mike Christie
2021-04-11  7:55 ` [RFC 6/7] scsi: iscsi: remove conn_mutex Mike Christie
2021-04-11  7:55 ` [RFC 7/7] scsi: iscsi_tcp: set no linger Mike Christie
2021-04-11  8:29   ` Mike Christie
2021-04-11 17:44 ` [RFC 0/7] iscsi: Fix in kernel conn failure handling Mike Christie

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210411075545.27866-1-michael.christie@oracle.com \
    --to=michael.christie@oracle.com \
    --cc=jejb@linux.ibm.com \
    --cc=krisman@collabora.com \
    --cc=lduncan@suse.com \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=rbharath@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.