Re: [PATCH 1/3] target: iscsi: fix hang in the iSCSI login code

From: Mike Christie <michael.christie@oracle.com>
To: Maurizio Lombardi <mlombard@redhat.com>
Cc: martin.petersen@oracle.com, serapheim.dimitro@delphix.com,
	target-devel@vger.kernel.org
Subject: Re: [PATCH 1/3] target: iscsi: fix hang in the iSCSI login code
Date: Tue, 14 Mar 2023 16:23:04 -0500	[thread overview]
Message-ID: <afb25e86-3b1a-3b19-f257-e748d0900005@oracle.com> (raw)
In-Reply-To: <CAFL455=HQ9-juB5fCqRJYmLK-jH3RuLCQM1Rk6bzG4QA-yWq4Q@mail.gmail.com>

On 3/14/23 6:09 AM, Maurizio Lombardi wrote:
> út 14. 3. 2023 v 0:52 odesílatel Mike Christie
> <michael.christie@oracle.com> napsal:
>>
>>> +     case TCP_CLOSE:
>>> +             pr_debug("__iscsi_target_sk_check_close: socket closing,"
>>>                       "returning TRUE\n");
>>
>> Don't need to break up a string. We do it a lot in the lio code, but we've
>> been trying not to in new code.
>>
>>> +             /*
>>> +              * Restart the login timer to prevent the
>>> +              * login process from getting stuck if the initiator
>>
>> I would fix up the formatting so the first line is longer.
> 
> Ok
> 
>>> @@ -1358,6 +1348,9 @@ int iscsi_target_start_negotiation(
>>>               set_bit(LOGIN_FLAGS_INITIAL_PDU, &conn->login_flags);
>>>               write_unlock_bh(&sk->sk_callback_lock);
>>>       }
>>> +
>>> +     iscsit_start_login_timer(conn);
>>
>> At this time, we have the np->np_login_timer running right?
> 
> Yes.
> 
>>
>> Don't we only need to start this new timer when we know there are
>> more PDUs and the connection is good (iscsi_target_do_login returns
>> 0 and iscsi_target_sk_check_and_clear returns 0)?
> 
> The moment iscsi_target_sk_check_and_clear() clears the
> LOGIN_FLAGS_INITIAL_PDU flag
> and returns 0, the login worker may be already running.
> If we start the timer after the call to
> iscsi_target_sk_check_and_clear(), we could have a race condition:
> 
> 1) login_work runs and reschedules itself non-stop because
> LOGIN_FLAGS_INITIAL_PDU is set
> 2) login kthread calls  iscsi_target_sk_check_and_clear() and clears
> LOGIN_FLAGS_INITIAL_PDU
> 3) login work runs and completes the login
> 4) login kthread starts the timer
> 5) No one stops the timer, it fires and kills the connection despite
> the fact the login was successful.
> 
> I could however replace this code:
>
> ret = iscsi_target_do_login(conn, login);
>  if (!ret && iscsi_target_sk_check_and_clear(conn, LOGIN_FLAGS_INITIAL_PDU))
>            ret = -1;
> 
> with the following, if you like it more:
> 
> ret = iscsi_target_do_login(conn, login);
> if (!ret) {
>       iscsit_start_login_timer(conn);
>       if (iscsi_target_sk_check_and_clear(conn, LOGIN_FLAGS_INITIAL_PDU)) {
>            iscsit_stop_login_timer(conn);
>            ret = -1;
>       }
> }

Ah yeah, I wasn't thinking specifically about this race when I wrote the
above comment. With the combined timer below, I was thinking this is handled
when you set/check the login_kworker thread.

> 
>>
>> I think you can just kill np timer and only use the login_timer for
>> both cases. So I mean set the thread to kill as the login one and start
>> this login_timer in __iscsi_target_login_thread where we used to call
>> iscsi_start_login_thread_timer. You would then mod the timer when we
>> transition from iscsi_target_start_negotiation to waiting for the next
>> PDU.
>>
> 
> Yes, maybe, but I would need to find a way to detect if conn->login_kworker
> is pointing to the login thread or to the login_work's thread, because
> the np_login_timer is supposed to clear the ISCSI_TF_RUNNING flag.
> 
> maybe something like this:
> 
> if (conn->login_kworker == conn->tpg_np->tpg_np->np_thread) {
>      spin_lock_bh(&np->np_thread_lock);
>      if (!(np->np_login_timer_flags & ISCSI_TF_STOP))
>            np->np_login_timer_flags &= ~ISCSI_TF_RUNNING;
>      spin_unlock_bh(&np->np_thread_lock);
> }

We don't need any of the np_login_timer_flags code if we are using your per
conn login_timer do we?

For the new timer:
- We are adding one per conn timer.
- We use that for both the initial pdu and later ones.
- The timeout function, sends a signal if there is a thread set or does whatever
we figure out below for the case where there is no thread (we don't do any
np_login_timer_flags stuff).
- We probably don't need to do both the signal and whatever we decide below.
Or, we need to check some of the LOGIN_FLAGS since for example we don't
need to queue the login_work and set LOGIN_FLAGS_CLOSED if LOGIN_FLAGS_INITIAL_PDU
is set.
- The iscsi_start_login_timer function handles setting the login_kworker thread.

So we do:
1. Replace iscsi_start_login_thread_timer/iscsi_stop_login_thread_timer with
iscsit_start_login_timer/iscsit_stop_login_timer

__iscsi_target_login_thread only calls iscsit_stop_login_timer on failure.
On success it will either timeout waiting for the next PDU or
iscsi_target_do_login_rx will called and reset the timer.

2. You also have a iscsit_mod_login_timer depending on how you want to handle
the race you described above.
3. iscsi_target_start_negotiation only mods the timer if iscsi_target_do_login/
iscsi_target_sk_check_and_clear is successful and if iscsi_target_do_login_rx
has not already run.

For the latter, in iscsi_start/mod_login_timer you could add a check like your
np_thread above where if the login_work has already reset login_kworker then
we just return. Or we can add a new LOGIN_FLAGS flag, or add a bool arg to
iscsi_start/mod_login_timer where if set to true and if the login_kworker thread
does not equal current, then you know iscsi_target_do_login_rx already took over
the timer and do nothing.

4. You call iscsit_start_login_timer/iscsit_stop_login_timer like you are in
iscsi_target_do_login_rx.

> 
>> For isert and cxgbit we won't have conn->sock set so I think you need some
>> sort of callout for those drivers, or maybe set LOGIN_FLAGS_CLOSED and queue
>> the login_work. Maybe the latter will work for all drivers as well. You probably
>> need some extra locking and LOGIN_FLAGS checks to handle an issue similar to
>> below.
> 
> Hmm, that would need to be tested, because LOGIN_FLAGS_CLOSED is supposed> to be set when the socket is already in the process of getting closed
> (it's state is TCP_CLOSE_WAIT or TCP_FIN_WAIT* or whatever)
> So If I set LOGIN_FLAGS_CLOSED in the timer and the socket is
> TCP_ESTABLISHED it means that I am trying to
> do the opposite, will the socket be properly closed
> by isert/cxgbit in this case?

I'm open to suggestions. I don't really care how we fix it, but it should
work for all the drivers.

If we go the login_work route, ignore LOGIN_FLAGS_CLOSED for a second. I'm
just talking about the code path we use for error handling in this type of
case. We can easily just add a new bit.

- For iscsit tcp, if we are reading in data in iscsit_get_login_rx, and the
timer fires and we get a signal, iscsit_get_login_rx will return a failure
and we do the goto err path. So in this case we, the connection is not closed
and we go through iscsi_target_login_drop, so we know it works.

isert/cxgbit doesn't use sockets. We don't hit any of the LOGIN_FLAGS or
sk_state checks for them. The tcp connection might or might not be closed
for both drivers when the timer fires.

- For cxgbit, we can be waiting in iscsit_get_login_rx. It seems to kick this off
after we send a login PDU. The connection might be open or closed, then we timeout
and get a signal and break from our wait in iscsit_get_login_rx. We then do the
goto err path like iscsit tcp.

- For isert, after the first PDU we queue login_work when we get a login PDU.
So, I don't think we do anything right now if after the first PDU nothing is
sent like in your bug. iscsi_target_login_sess_out works for the initial PDU
timing out though, so we know we can call it when the conn is open.

Note: The isert_disconnected_handler is sort of like tcp's sk_state_change callout,
but isert just calls to iscsit_cause_connection_reinstatement which doesn't do
anything (it doesn't set LOGIN_FLAGS and iscsi_target_nego.c doesn't have any
checks for what isert is doing).

My suggestion to set LOGIN_FLAGS_CLOSED and then to queue the login work would
work like the iscsit tcp case above. When the work runs, it sees the bit and we
go down the goto err path and run iscsi_target_login_drop.

- If your concern is that we would abuse LOGIN_FLAGS_CLOSED, then we can add a
new bit.
- I didn't write out everything. Like the ordering of cancel_delayed_work and
iscsit_stop_login_timer would need to be reversed. We probably can add some
more LOGIN_FLAGS checks like in iscsi_target_sk_state_change so we don't always
queue the login_work like is done when the ACTIVE bits are set (or we could
kill that and just always queue the login_work).

If your concern was that we have no idea about all the code paths then yeah
it needs testing.