From: Mike Christie <michael.christie@oracle.com>
To: Maurizio Lombardi <mlombard@redhat.com>,
"Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org, target-devel@vger.kernel.org,
bvanassche@acm.org
Subject: Re: [PATCH 2/2] target: iscsi: fix a race condition when aborting a task
Date: Wed, 28 Oct 2020 20:37:55 +0000 [thread overview]
Message-ID: <d7107857-ef7a-3c88-8146-a5e7abce5ce6@oracle.com> (raw)
In-Reply-To: <a936cc4e-1610-5201-5960-107689b81820@redhat.com>
On 10/28/20 12:09 PM, Maurizio Lombardi wrote:
>
>
> Dne 27. 10. 20 v 21:03 Michael Christie napsal(a):
>>
>>
>>> On Oct 27, 2020, at 12:54 PM, Mike Christie <michael.christie@oracle.com> wrote:
>>>
>>> On 10/27/20 8:49 AM, Maurizio Lombardi wrote:
>>>> Hello Mike,
>>>>
>>>> Dne 22. 10. 20 v 4:42 Mike Christie napsal(a):
>>>>> If we free the cmd from the abort path, then for your conn stop plus abort race case, could we do:
>>>>>
>>>>> 1. thread1 runs iscsit_release_commands_from_conn and sets CMD_T_FABRIC_STOP.
>>>>> 2. thread2 runs iscsit_aborted_task and then does __iscsit_free_cmd. It then returns from the aborted_task callout and we finish target_handle_abort and do:
>>>>>
>>>>> target_handle_abort -> transport_cmd_check_stop_to_fabric -> lio_check_stop_free -> target_put_sess_cmd
>>>>>
>>>>> The cmd is now freed.
>>>>> 3. thread1 now finishes iscsit_release_commands_from_conn and runs iscsit_free_cmd while accessing a command we just released.
>>>>>
>>>>>
>>>>
>>>> Thanks for the review!
>>>>
>>>> There are definitely some problems with task aborts and commands' refcounting *
>>>> but this is a different bug than the one this patch is trying to solve (a race to list_del_init());
>>>> unless you are saying that abort tasks should never be executed when the connection
>>>> is going down and we have to prevent such cases from happening at all.
>>>
>>> Yeah, I think if we prevent the race then we fix the refcount issue and your issue.
>>> Here is a patch that is only compile tested:
>>>
>>> From 209709bcedd9a6ce6003e6bb86f3ebf547dca6af Mon Sep 17 00:00:00 2001
>>> From: Mike Christie <michael.christie@oracle.com>
>>> Date: Tue, 27 Oct 2020 12:30:53 -0500
>>> Subject: [PATCH] iscsi target: fix cmd abort vs fabric stop race
>>>
>>> The abort and cmd stop paths can race where:
>>>
>>> 1. thread1 runs iscsit_release_commands_from_conn and sets
>>> CMD_T_FABRIC_STOP.
>>> 2. thread2 runs iscsit_aborted_task and then does __iscsit_free_cmd. It
>>> then returns from the aborted_task callout and we finish
>>> target_handle_abort and do:
>>>
>>> target_handle_abort -> transport_cmd_check_stop_to_fabric ->
>>> lio_check_stop_free -> target_put_sess_cmd
>>>
>>> The cmd is now freed.
>>> 3. thread1 now finishes iscsit_release_commands_from_conn and runs
>>> iscsit_free_cmd while accessing a command we just released.
>>>
>>> In __target_check_io_state we check for CMD_T_FABRIC_STOP and set the
>>> CMD_T_ABORTED if the driver is not cleaning up the cmd because of
>>> a session shutdown. However, iscsit_release_commands_from_conn only
>>> sets the CMD_T_FABRIC_STOP and does not check to see if the abort path
>>> has claimed completion ownership of the command.
>>>
>>> This adds a check in iscsit_release_commands_from_conn so only the
>>> abort or fabric stop path cleanup the command.
>>> ---
>>> drivers/target/iscsi/iscsi_target.c | 13 +++++++++++--
>>> 1 file changed, 11 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
>>> index f77e5ee..85027d3 100644
>>> --- a/drivers/target/iscsi/iscsi_target.c
>>> +++ b/drivers/target/iscsi/iscsi_target.c
>>> @@ -483,8 +483,7 @@ int iscsit_queue_rsp(struct iscsi_conn *conn, struct iscsi_cmd *cmd)
>>> void iscsit_aborted_task(struct iscsi_conn *conn, struct iscsi_cmd *cmd)
>>> {
>>> spin_lock_bh(&conn->cmd_lock);
>>> - if (!list_empty(&cmd->i_conn_node) &&
>>> - !(cmd->se_cmd.transport_state & CMD_T_FABRIC_STOP))
>>> + if (!list_empty(&cmd->i_conn_node))
>>> list_del_init(&cmd->i_conn_node);
>>> spin_unlock_bh(&conn->cmd_lock);
>>>
>>> @@ -4088,6 +4087,16 @@ static void iscsit_release_commands_from_conn(struct iscsi_conn *conn)
>>>
>>> if (se_cmd->se_tfo != NULL) {
>>> spin_lock_irq(&se_cmd->t_state_lock);
>>> + if (se_cmd->transport_state & CMD_T_ABORTED) {
>>> + /*
>>> + * LIO's abort path owns the cleanup for this,
>>> + * so put it back on the list and let
>>> + * aborted_task handle it.
>>> + */
>>> + list_add_tail(&cmd->i_conn_node,
>>> + &conn->conn_cmd_list);
>>
>>
>> That should have been a move from the tmp list back to the conn_cmd_list.
>
> Nice, it looks simple and I will test it.
> I am a bit worried there could be other possible race conditions.
>
> Example:
>
> thread1: connection is going to be closed,
> iscsit_release_commands_from_conn() finds a command that is about
> to be aborted, re-adds it to conn_cmd_list and proceeds.
> iscsit_close_connection() decreases the conn usage count and finally calls iscsit_free_conn(conn)
> that destroys the conn structure.
>
> thread2: iscsit_aborted_task() gets called and tries to lock the conn->cmd_lock spinlock, dereferencing
> an invalid pointer.
Nice catch. You are right.
>
> Possible solutions that I can think of:
>
> - Make iscsit_release_commands_from_conn() wait for the abort task to finish
Yeah you could set a completion in there then have aborted_task do the
complete() call maybe?
There is also the transport_cmd_check_stop_to_fabric
t_transport_stop_comp completion and transport_wait_for_tasks pairing
that seems close to what we want. I think we would have to change it to
make it work for this case. Oh yeah, that is more of a brain stroming
guess than a review type of comment :)
> or
> - abort handler could hold a reference to the conn structure so that iscsit_close_connection()
> will sleep when calling iscsit_check_conn_usage_count(conn) until abort finishes.
>
next prev parent reply other threads:[~2020-10-28 20:37 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-10-07 14:53 [PATCH 0/2] fix race conditions with task aborts Maurizio Lombardi
2020-10-07 14:53 ` [PATCH 1/2] target: iscsi: prevent a race condition in iscsit_unmap_cmd() Maurizio Lombardi
2020-10-08 2:15 ` Bart Van Assche
2020-10-08 9:42 ` Maurizio Lombardi
2020-10-07 14:53 ` [PATCH 2/2] target: iscsi: fix a race condition when aborting a task Maurizio Lombardi
2020-10-22 2:42 ` Mike Christie
2020-10-27 13:49 ` Maurizio Lombardi
2020-10-27 17:54 ` Mike Christie
2020-10-27 20:03 ` Michael Christie
2020-10-28 17:09 ` Maurizio Lombardi
2020-10-28 20:37 ` Mike Christie [this message]
2020-11-10 21:29 ` Maurizio Lombardi
2020-11-10 23:08 ` Mike Christie
2020-11-11 2:16 ` Mike Christie
2020-11-11 14:58 ` Maurizio Lombardi
2020-11-11 15:37 ` Michael Christie
2020-11-11 15:48 ` Maurizio Lombardi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=d7107857-ef7a-3c88-8146-a5e7abce5ce6@oracle.com \
--to=michael.christie@oracle.com \
--cc=bvanassche@acm.org \
--cc=linux-scsi@vger.kernel.org \
--cc=martin.petersen@oracle.com \
--cc=mlombard@redhat.com \
--cc=target-devel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).