All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Christie <mchristi@redhat.com>
To: Hou Pu <houpu@bytedance.com>,
	martin.petersen@oracle.com, linux-scsi@vger.kernel.org,
	target-devel@vger.kernel.org
Subject: Re: [PATCH v3 1/2] iscsi-target: fix login error when receiving is too fast
Date: Tue, 28 Apr 2020 17:50:43 +0000	[thread overview]
Message-ID: <f8e06fd6-d2d5-b237-7d32-86ee3277e85f@redhat.com> (raw)
In-Reply-To: <ef4dce23-dca8-c75a-0e18-c4bb49fe503a@bytedance.com>

On 4/26/20 1:09 AM, Hou Pu wrote:
> 
>>>>> +     */
>>>>> +    if (conn->sock) {
>>>>> +        struct sock *sk = conn->sock->sk;
>>>>> +
>>>>> +        write_lock_bh(&sk->sk_callback_lock);
>>>>> +        clear_bit(LOGIN_FLAGS_READ_ACTIVE, &conn->login_flags);
>>>>> +        set_bit(LOGIN_FLAGS_WRITE_ACTIVE, &conn->login_flags);
>>>>> +        write_unlock_bh(&sk->sk_callback_lock);
>>>>> +    }
>>>>> +
>>>> Hey,
>>>>
>>>> I had one more question.
>>>>
>>>> With the above code, I think we have a race where if we clear the bit
>>>> above early and iscsi_target_sk_data_ready runs while
>>>> iscsi_target_do_login_rx is still running then we could queue the work
>>>> an extra time and get stuck. Because the bit is now not set, if
>>>> iscsi_target_sk_data_ready runs it will end up calling
>>>> schedule_delayed_work which will queue up the work again since the work
>>>> is running and not pending.
>>
>> Yes. I was trying to allow queuing the delayed work early.
>>
>>>>
>>>> If that is correct and we hit the race what happens if this was the
>>>> last
>>>> login pdu, and we are supposed to go to full feature phase next? For
>>>> example if iscsi_target_do_login_rx runs an extra time, will we end up
>>>> stuck waiting in iscsi_target_do_login_rx's call to:
>>>>
>>>> rc = conn->conn_transport->iscsit_get_login_rx(conn, login);
>>>>
>>>> ?
>>
>> For the last login pdu, we may have race as you said. thanks for
>> pointing it out.
>>
>> I was trying to image a case where we can hit the race, normally it is
>> case a).
>>
>> a). initiator send last login pdu -> target received -> target replied
>>
>> b). initiator send last login pdu -> target received -> initiator send
>> something -> target replied
>>
>> in case b). we will queue another delayed work which we should not. 
>> After the target replied
>>
>> the last login pdu, conn->conn_login is freed. we might visited it in
>> the delayed wo>>
>>
>>> Just answering my own question. It looks like we do not get stuck. But
>>> we either get nothing on the session so the login timeout fires and we
>>> drop the session. Or, we get a PDU and the login thread reads it in
>>> before the normal rx thread does, but it assumes it is a login related
>>> and we most likely drop the session due to invalid fields.
>>>
>>> I think in iscsi_target_restore_sock_callbacks we want to do a:
>>>
>>> cancel_delayed_work(&conn->login_work)
>>>
>>> after we reset the callbacks and drop the sk_callback_lock lock.
>>
>> I am not very sure if we could or if it is good to cancel_delayed_work
>> from the work itself.
>>
>> If it is ok then i am ok with it. Or in another way, I think we could
>> just clear
>>
>> LOGIN_FLAGS_READ_ACTIVE and set LOGIN_FLAGS_WRITE_ACTIVE
>>
>> after iscsi_target_restore_sock_callbacks when finish process last
>> login pdu.
> 
> That would look like (in iscsi_target_do_tx_login_io):
> 
> diff --git a/drivers/target/iscsi/iscsi_target_nego.c
> b/drivers/target/iscsi/iscsi_target_nego.c
> index 685d771b51d4..4d0658731382 100644
> --- a/drivers/target/iscsi/iscsi_target_nego.c
> +++ b/drivers/target/iscsi/iscsi_target_nego.c
> @@ -328,6 +328,28 @@ static int iscsi_target_do_tx_login_io(struct
> iscsi_conn *conn, struct iscsi_log
>         u32 padding = 0;
>         struct iscsi_login_rsp *login_rsp;
> 
> +       /*
> +        * LOGIN_FLAGS_READ_ACTIVE is cleared so that sk_data_ready
> +        * could be trigger again after this.
> +        *
> +        * LOGIN_FLAGS_WRITE_ACTIVE is cleared after we successfully
> +        * process a login pdu, so that sk_state_chage could do login
> +        * cleanup as needed if the socket is closed. If a delayed work is
> +        * ongoing (LOGIN_FLAGS_WRITE_ACTIVE or LOGIN_FLAGS_READ_ACTIVE),
> +        * sk_state_change will leave the cleanup to the delayed work or
> +        * it will schedule a delayed work to do cleanup.
> +        */
> +       if (conn->sock) {
> +               struct sock *sk = conn->sock->sk;
> +
> +               write_lock_bh(&sk->sk_callback_lock);
> +               if (!test_bit(LOGIN_FLAGS_INITIAL_PDU,
> &conn->login_flags)) {
> +                       clear_bit(LOGIN_FLAGS_READ_ACTIVE,
> &conn->login_flags);
> +                       set_bit(LOGIN_FLAGS_WRITE_ACTIVE,
> &conn->login_flags);
> +               }
> +               write_unlock_bh(&sk->sk_callback_lock);
> +       }

You lost me. I didn't understand this part. Would you still be doing the
above bit manipulation in iscsi_target_do_login_rx too?

Is the above code then to handle when
iscsi_target_start_negotiation->iscsi_target_do_login->iscsi_target_do_tx_login_io
runs?

I was thinking when you mentioned the final login PDU you were going to
do something when you detect login->login_complete is true.

This code is not my expertise area, so I might just not be understanding
something simple about how it all works.

WARNING: multiple messages have this Message-ID (diff)
From: Mike Christie <mchristi@redhat.com>
To: Hou Pu <houpu@bytedance.com>,
	martin.petersen@oracle.com, linux-scsi@vger.kernel.org,
	target-devel@vger.kernel.org
Subject: Re: [PATCH v3 1/2] iscsi-target: fix login error when receiving is too fast
Date: Tue, 28 Apr 2020 12:50:43 -0500	[thread overview]
Message-ID: <f8e06fd6-d2d5-b237-7d32-86ee3277e85f@redhat.com> (raw)
In-Reply-To: <ef4dce23-dca8-c75a-0e18-c4bb49fe503a@bytedance.com>

On 4/26/20 1:09 AM, Hou Pu wrote:
> 
>>>>> +     */
>>>>> +    if (conn->sock) {
>>>>> +        struct sock *sk = conn->sock->sk;
>>>>> +
>>>>> +        write_lock_bh(&sk->sk_callback_lock);
>>>>> +        clear_bit(LOGIN_FLAGS_READ_ACTIVE, &conn->login_flags);
>>>>> +        set_bit(LOGIN_FLAGS_WRITE_ACTIVE, &conn->login_flags);
>>>>> +        write_unlock_bh(&sk->sk_callback_lock);
>>>>> +    }
>>>>> +
>>>> Hey,
>>>>
>>>> I had one more question.
>>>>
>>>> With the above code, I think we have a race where if we clear the bit
>>>> above early and iscsi_target_sk_data_ready runs while
>>>> iscsi_target_do_login_rx is still running then we could queue the work
>>>> an extra time and get stuck. Because the bit is now not set, if
>>>> iscsi_target_sk_data_ready runs it will end up calling
>>>> schedule_delayed_work which will queue up the work again since the work
>>>> is running and not pending.
>>
>> Yes. I was trying to allow queuing the delayed work early.
>>
>>>>
>>>> If that is correct and we hit the race what happens if this was the
>>>> last
>>>> login pdu, and we are supposed to go to full feature phase next? For
>>>> example if iscsi_target_do_login_rx runs an extra time, will we end up
>>>> stuck waiting in iscsi_target_do_login_rx's call to:
>>>>
>>>> rc = conn->conn_transport->iscsit_get_login_rx(conn, login);
>>>>
>>>> ?
>>
>> For the last login pdu, we may have race as you said. thanks for
>> pointing it out.
>>
>> I was trying to image a case where we can hit the race, normally it is
>> case a).
>>
>> a). initiator send last login pdu -> target received -> target replied
>>
>> b). initiator send last login pdu -> target received -> initiator send
>> something -> target replied
>>
>> in case b). we will queue another delayed work which we should not. 
>> After the target replied
>>
>> the last login pdu, conn->conn_login is freed. we might visited it in
>> the delayed wo>>
>>
>>> Just answering my own question. It looks like we do not get stuck. But
>>> we either get nothing on the session so the login timeout fires and we
>>> drop the session. Or, we get a PDU and the login thread reads it in
>>> before the normal rx thread does, but it assumes it is a login related
>>> and we most likely drop the session due to invalid fields.
>>>
>>> I think in iscsi_target_restore_sock_callbacks we want to do a:
>>>
>>> cancel_delayed_work(&conn->login_work)
>>>
>>> after we reset the callbacks and drop the sk_callback_lock lock.
>>
>> I am not very sure if we could or if it is good to cancel_delayed_work
>> from the work itself.
>>
>> If it is ok then i am ok with it. Or in another way, I think we could
>> just clear
>>
>> LOGIN_FLAGS_READ_ACTIVE and set LOGIN_FLAGS_WRITE_ACTIVE
>>
>> after iscsi_target_restore_sock_callbacks when finish process last
>> login pdu.
> 
> That would look like (in iscsi_target_do_tx_login_io):
> 
> diff --git a/drivers/target/iscsi/iscsi_target_nego.c
> b/drivers/target/iscsi/iscsi_target_nego.c
> index 685d771b51d4..4d0658731382 100644
> --- a/drivers/target/iscsi/iscsi_target_nego.c
> +++ b/drivers/target/iscsi/iscsi_target_nego.c
> @@ -328,6 +328,28 @@ static int iscsi_target_do_tx_login_io(struct
> iscsi_conn *conn, struct iscsi_log
>         u32 padding = 0;
>         struct iscsi_login_rsp *login_rsp;
> 
> +       /*
> +        * LOGIN_FLAGS_READ_ACTIVE is cleared so that sk_data_ready
> +        * could be trigger again after this.
> +        *
> +        * LOGIN_FLAGS_WRITE_ACTIVE is cleared after we successfully
> +        * process a login pdu, so that sk_state_chage could do login
> +        * cleanup as needed if the socket is closed. If a delayed work is
> +        * ongoing (LOGIN_FLAGS_WRITE_ACTIVE or LOGIN_FLAGS_READ_ACTIVE),
> +        * sk_state_change will leave the cleanup to the delayed work or
> +        * it will schedule a delayed work to do cleanup.
> +        */
> +       if (conn->sock) {
> +               struct sock *sk = conn->sock->sk;
> +
> +               write_lock_bh(&sk->sk_callback_lock);
> +               if (!test_bit(LOGIN_FLAGS_INITIAL_PDU,
> &conn->login_flags)) {
> +                       clear_bit(LOGIN_FLAGS_READ_ACTIVE,
> &conn->login_flags);
> +                       set_bit(LOGIN_FLAGS_WRITE_ACTIVE,
> &conn->login_flags);
> +               }
> +               write_unlock_bh(&sk->sk_callback_lock);
> +       }

You lost me. I didn't understand this part. Would you still be doing the
above bit manipulation in iscsi_target_do_login_rx too?

Is the above code then to handle when
iscsi_target_start_negotiation->iscsi_target_do_login->iscsi_target_do_tx_login_io
runs?

I was thinking when you mentioned the final login PDU you were going to
do something when you detect login->login_complete is true.

This code is not my expertise area, so I might just not be understanding
something simple about how it all works.


  reply	other threads:[~2020-04-28 17:50 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-24 12:35 [PATCH v3 0/2] iscsi-target: fix login error when receiving is too fast Hou Pu
2020-04-24 12:35 ` Hou Pu
2020-04-24 12:35 ` [PATCH v3 1/2] " Hou Pu
2020-04-24 12:35   ` Hou Pu
2020-04-24 17:01   ` Mike Christie
2020-04-24 17:01     ` Mike Christie
2020-04-25 17:45     ` Mike Christie
2020-04-25 17:45       ` Mike Christie
2020-04-26  5:49       ` [External] " Hou Pu
2020-04-26  5:49         ` Hou Pu
2020-04-26  6:09         ` Hou Pu
2020-04-26  6:09           ` Hou Pu
2020-04-28 17:50           ` Mike Christie [this message]
2020-04-28 17:50             ` Mike Christie
2020-04-29  6:59             ` Hou Pu
2020-04-29  6:59               ` Hou Pu
2020-04-24 12:35 ` [PATCH v3 2/2] iscsi-target: Fix inconsistent debug message in __iscsi_target_sk_check_close Hou Pu
2020-04-24 12:35   ` Hou Pu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f8e06fd6-d2d5-b237-7d32-86ee3277e85f@redhat.com \
    --to=mchristi@redhat.com \
    --cc=houpu@bytedance.com \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=target-devel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.