linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sagi Grimberg <sagi@grimberg.me>
To: Mike Christie <michael.christie@oracle.com>,
	Dmitry Bogdanov <d.bogdanov@yadro.com>
Cc: mlombard@redhat.com, martin.petersen@oracle.com,
	mgurtovoy@nvidia.com, linux-scsi@vger.kernel.org,
	target-devel@vger.kernel.org
Subject: Re: [PATCH v2 09/13] scsi: target: iscsit: Fix isert disconnect handling during login
Date: Mon, 23 Jan 2023 11:53:29 +0200	[thread overview]
Message-ID: <d2629836-78f0-ddc6-c654-80a14d16eec1@grimberg.me> (raw)
In-Reply-To: <e21571bf-2244-7b12-9765-f3ebe3a024ad@oracle.com>



On 1/18/23 04:14, Mike Christie wrote:
> On 1/13/23 08:08, Dmitry Bogdanov wrote:
>> On Wed, Jan 11, 2023 at 09:08:28PM -0600, Mike Christie wrote:
>>>
>>> If we get a disconnect event while logging in we can end up in a state
>>> where will never be able to relogin. This happens when:
>>>
>>> 1. login thread has put us into TARG_CONN_STATE_IN_LOGIN
>>> 2. isert then does
>>>
>>> isert_disconnected_handler -> iscsit_cause_connection_reinstatement
>>>
>>> which sets the conn connection_reinstatement flag. Nothing else happens
>>> because we are only in IN_LOGIN. The tx/rx threads are not running yet
>>> so we can't start recovery from those contexts at this time.
>>>
>>> 3. The login thread finishes processing the login pdu and thinks login is
>>> done. It sets us into TARG_CONN_STATE_LOGGED_IN/TARG_SESS_STATE_LOGGED_IN.
>>> This starts the rx/tx threads.
>>>
>>> 4. The initiator thought it disconnected the connection at 2, and has
>>> since sent a new connect which is now handled. This leads us to eventually
>>> run:
>>>
>>> iscsi_check_for_session_reinstatement-> iscsit_stop_session ->
>>> iscsit_cause_connection_reinstatement
>>>
>>> iscsit_stop_session sees the old conn and does
>>> iscsit_cause_connection_reinstatement which sees connection_reinstatement
>>> is set so it just returns instead of trying to kill the tx/rx threads
>>> which would have caused recovery to start.
>>>
>>> 5. iscsit_stop_session then waits on session_wait_comp which will never
>>> happen since we didn't kill the tx/rx threads.
>>>
>>> This has the iscsit login code check if a fabric driver ran
>>> iscsit_cause_connection_reinstatement during the login process similar
>>> to how we check for the sk state for tcp based iscsit. This will prevent
>>> us from getting into 3 and creating a ghost session.
>>>
>>> Signed-off-by: Mike Christie <michael.christie@oracle.com>
>>> ---
>>>   drivers/target/iscsi/iscsi_target_nego.c | 10 ++++++++++
>>>   1 file changed, 10 insertions(+)
>>>
>>> diff --git a/drivers/target/iscsi/iscsi_target_nego.c b/drivers/target/iscsi/iscsi_target_nego.c
>>> index ff49c8f3fe24..2dd81752d4c9 100644
>>> --- a/drivers/target/iscsi/iscsi_target_nego.c
>>> +++ b/drivers/target/iscsi/iscsi_target_nego.c
>>> @@ -350,6 +350,16 @@ static int iscsi_target_do_tx_login_io(struct iscsit_conn *conn, struct iscsi_lo
>>>                                              ISCSI_LOGIN_STATUS_NO_RESOURCES);
>>>                          return -1;
>>>                  }
>>> +
>>> +               /*
>>> +                * isert doesn't know the iscsit state and uses
>>> +                * iscsit_cause_connection_reinstatement as a generic error
>>> +                * notification system. It may call it before we are in FFP.
>>> +                * Handle this now in case it signaled a failure before the
>>> +                * rx/tx threads were up and could start recovery.
>>> +                */
>>> +               if (atomic_read(&conn->connection_reinstatement))
>>> +                       goto err;
>>
>> Why only for login->login_complete case? In other case the session will
>> not hang? Will it be droppped on login timeout or something else?
> 
> It will not hang. If you hit an error with isert before we think we can go into
> FFP then the login timeout currently cleans up the conn and session.
> 
>>
>> May be the root cause is point 2 itself - calling iscsit_cause_connection_reinstatement
>> in not ISER_CONN_FULL_FEATURE state where there are no TX_RX threads?
>> I mean that was a misuse of iscsit_cause_connection_reinstatement?
> 
> Let me drop this patch for now. After writing the response above about normally it just
> times out, and thinking about your question here I think to really fix this we want to
> fully integrate isert login into iscsit.
> 
> The root problem is that isert login is not integrated into iscsit, so there is really
> no error handling. iscsit_cause_connection_reinstatement was supposed to do it, but it
> doesn't do anything.
> 
> So we need to separate the LOGIN_FLAGS_CLOSED tests and setting from the iscsi_target_sk*
> code. Add a helper to set that bit and do some state checks, and make the checks generic
> in the login code (not tied to having a socket). Convert iscsit and isert to use the new
> helper. Then handle the LOGIN_FLAGS_READ_ACTIVE/LOGIN_FLAGS_WRITE_ACTIVE and login_work
> stuff. Then review cxgb?
> 
> I'll do that later after fixing the command cleanup stuff in this patchsdet and the
> other patchsets I have outstanding.

Perhaps the fix is as simple as checking the iscsi state to be also 
full-featured and if not, do a simple error handling within
iser_disconnected_handler?

  reply	other threads:[~2023-01-23  9:53 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-12  3:08 [PATCH v2 00/13] target task management fixes Mike Christie
2023-01-12  3:08 ` [PATCH v2 01/13] scsi: target: Move sess cmd counter to new struct Mike Christie
2023-01-12  3:08 ` [PATCH v2 02/13] scsi: target: Move cmd counter allocation Mike Christie
2023-01-12  3:08 ` [PATCH v2 03/13] scsi: target: Pass in cmd counter to use during cmd setup Mike Christie
2023-01-12  3:08 ` [PATCH v2 04/13] scsi: target: iscsit/isert: Alloc per conn cmd counter Mike Christie
2023-01-12  3:08 ` [PATCH v2 05/13] scsi: target: iscsit: stop/wait on cmds during conn close Mike Christie
2023-01-12  3:08 ` [PATCH v2 06/13] scsi: target: iscsit: Fix TAS handling during conn cleanup Mike Christie
2023-01-12  3:08 ` [PATCH v2 07/13] scsi: target: Fix multiple LUN_RESET handling Mike Christie
2023-01-12  3:08 ` [PATCH v2 08/13] scsi: target: drop tas arg from __transport_wait_for_tasks Mike Christie
2023-01-12  3:08 ` [PATCH v2 09/13] scsi: target: iscsit: Fix isert disconnect handling during login Mike Christie
2023-01-13 14:08   ` Dmitry Bogdanov
2023-01-18  2:14     ` Mike Christie
2023-01-23  9:53       ` Sagi Grimberg [this message]
2023-01-24  2:40         ` michael.christie
2023-01-12  3:08 ` [PATCH v2 10/13] scsi: target: iscsit: Add helper to check when cmd has failed Mike Christie
2023-01-12  3:08 ` [PATCH v2 11/13] scsi: target: Treat CMD_T_FABRIC_STOP like CMD_T_STOP Mike Christie
2023-01-13 14:15   ` Dmitry Bogdanov
2023-01-17 11:52     ` Dmitry Bogdanov
2023-01-17 18:45       ` Mike Christie
2023-01-17 19:55         ` Dmitry Bogdanov
2023-01-18 19:12           ` Mike Christie
2023-01-12  3:08 ` [PATCH v2 12/13] scsi: target: iscsit: Cleanup isert commands at conn closure Mike Christie
2023-01-12  3:08 ` [PATCH v2 13/13] IB/isert: Fix hang in target_wait_for_cmds Mike Christie
2023-01-23  9:50   ` Sagi Grimberg
2023-01-12  3:13 ` [PATCH v2 00/13] target task management fixes Mike Christie
2023-01-23  9:54   ` Sagi Grimberg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d2629836-78f0-ddc6-c654-80a14d16eec1@grimberg.me \
    --to=sagi@grimberg.me \
    --cc=d.bogdanov@yadro.com \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=mgurtovoy@nvidia.com \
    --cc=michael.christie@oracle.com \
    --cc=mlombard@redhat.com \
    --cc=target-devel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).