Target-devel archive on lore.kernel.org
 help / color / Atom feed
* [PATCH 0/2] fix race conditions with task aborts
@ 2020-10-07 14:53 Maurizio Lombardi
  2020-10-07 14:53 ` [PATCH 1/2] target: iscsi: prevent a race condition in iscsit_unmap_cmd() Maurizio Lombardi
  2020-10-07 14:53 ` [PATCH 2/2] target: iscsi: fix a race condition when aborting a task Maurizio Lombardi
  0 siblings, 2 replies; 17+ messages in thread
From: Maurizio Lombardi @ 2020-10-07 14:53 UTC (permalink / raw)
  To: martin.petersen; +Cc: linux-scsi, target-devel, bvanassche, michael.christie

This patchset fixes some possible race conditions when
2 threads call iscsit_free_cmd()/__iscsit_free_cmd() at
the same time.

Maurizio Lombardi (2):
  target: iscsi: prevent a race condition in iscsit_unmap_cmd()
  target: iscsi: fix a race condition when aborting a task

 drivers/target/iscsi/iscsi_target_login.c | 1 +
 drivers/target/iscsi/iscsi_target_util.c  | 7 ++++---
 include/target/iscsi/iscsi_target_core.h  | 1 +
 3 files changed, 6 insertions(+), 3 deletions(-)

-- 
2.26.2

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/2] target: iscsi: prevent a race condition in iscsit_unmap_cmd()
  2020-10-07 14:53 [PATCH 0/2] fix race conditions with task aborts Maurizio Lombardi
@ 2020-10-07 14:53 ` Maurizio Lombardi
  2020-10-08  2:15   ` Bart Van Assche
  2020-10-07 14:53 ` [PATCH 2/2] target: iscsi: fix a race condition when aborting a task Maurizio Lombardi
  1 sibling, 1 reply; 17+ messages in thread
From: Maurizio Lombardi @ 2020-10-07 14:53 UTC (permalink / raw)
  To: martin.petersen; +Cc: linux-scsi, target-devel, bvanassche, michael.christie

A potential race condition may occur in iscsit_unmap_cmd() if the
__iscsit_free_cmd() function is called by two different threads.

This patch adds a spinlock to serialize the calls to
iscsit_unmap_cmd()

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
---
 drivers/target/iscsi/iscsi_target_login.c | 1 +
 drivers/target/iscsi/iscsi_target_util.c  | 5 ++++-
 include/target/iscsi/iscsi_target_core.h  | 1 +
 3 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/target/iscsi/iscsi_target_login.c b/drivers/target/iscsi/iscsi_target_login.c
index 893d1b406c29..e16ceee87bba 100644
--- a/drivers/target/iscsi/iscsi_target_login.c
+++ b/drivers/target/iscsi/iscsi_target_login.c
@@ -1110,6 +1110,7 @@ static struct iscsi_conn *iscsit_alloc_conn(struct iscsi_np *np)
 	spin_lock_init(&conn->nopin_timer_lock);
 	spin_lock_init(&conn->response_queue_lock);
 	spin_lock_init(&conn->state_lock);
+	spin_lock_init(&conn->unmap_cmd_lock);
 
 	timer_setup(&conn->nopin_response_timer,
 		    iscsit_handle_nopin_response_timeout, 0);
diff --git a/drivers/target/iscsi/iscsi_target_util.c b/drivers/target/iscsi/iscsi_target_util.c
index 45ba07c6ec27..3082f5bde9fa 100644
--- a/drivers/target/iscsi/iscsi_target_util.c
+++ b/drivers/target/iscsi/iscsi_target_util.c
@@ -755,8 +755,11 @@ void __iscsit_free_cmd(struct iscsi_cmd *cmd, bool check_queues)
 		iscsit_remove_cmd_from_response_queue(cmd, conn);
 	}
 
-	if (conn && conn->conn_transport->iscsit_unmap_cmd)
+	if (conn && conn->conn_transport->iscsit_unmap_cmd) {
+		spin_lock(&conn->unmap_cmd_lock);
 		conn->conn_transport->iscsit_unmap_cmd(conn, cmd);
+		spin_unlock(&conn->unmap_cmd_lock);
+	}
 }
 
 void iscsit_free_cmd(struct iscsi_cmd *cmd, bool shutdown)
diff --git a/include/target/iscsi/iscsi_target_core.h b/include/target/iscsi/iscsi_target_core.h
index 1eccb2ac7d02..ae7ac0134c8c 100644
--- a/include/target/iscsi/iscsi_target_core.h
+++ b/include/target/iscsi/iscsi_target_core.h
@@ -575,6 +575,7 @@ struct iscsi_conn {
 	spinlock_t		nopin_timer_lock;
 	spinlock_t		response_queue_lock;
 	spinlock_t		state_lock;
+	spinlock_t		unmap_cmd_lock;
 	/* libcrypto RX and TX contexts for crc32c */
 	struct ahash_request	*conn_rx_hash;
 	struct ahash_request	*conn_tx_hash;
-- 
2.26.2

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 2/2] target: iscsi: fix a race condition when aborting a task
  2020-10-07 14:53 [PATCH 0/2] fix race conditions with task aborts Maurizio Lombardi
  2020-10-07 14:53 ` [PATCH 1/2] target: iscsi: prevent a race condition in iscsit_unmap_cmd() Maurizio Lombardi
@ 2020-10-07 14:53 ` Maurizio Lombardi
  2020-10-22  2:42   ` Mike Christie
  1 sibling, 1 reply; 17+ messages in thread
From: Maurizio Lombardi @ 2020-10-07 14:53 UTC (permalink / raw)
  To: martin.petersen; +Cc: linux-scsi, target-devel, bvanassche, michael.christie

The iscsit_release_commands_from_conn() function does the following
operations:

1) locks the cmd_lock spinlock
2) Scans the list of commands and sets the CMD_T_FABRIC_STOP flag
3) Releases the cmd_lock spinlock
4) Rescans the list again and clears the i_conn_node link of each command

If an abort task timer is fired between 3) and 4), it will find
the CMD_T_FABRIC_STOP flag set and won't call list_del_init();
therefore it may end up calling __iscsit_free_cmd() with a
non-empty i_conn_node list, thus triggering the warning.

Considering that:

- we expect list_del_init() to be executed by
  iscsit_release_commands_from_conn() when the CMD_T_FABRIC_STOP is set.
- iscsit_aborted_task() is the only function that calls __iscsit_free_cmd()
  directly, while all the other functions call iscsit_free_cmd().
- the warning in __iscsit_free_cmd() is a duplicate (the same warning
  can be found in iscsit_free_cmd().

We can fix the bug by simply removing the warning from __iscsit_free_cmd()

kernel: ------------[ cut here ]------------
kernel: WARNING: CPU: 1 PID: 21173 at drivers/target/iscsi/iscsi_target_util.c:720 __iscsit_free_cmd+0x26e/0x290 [iscsi_target_mod]
...
kernel: CPU: 1 PID: 21173 Comm: kworker/u8:3 Kdump: loaded Not tainted 3.10.0-1062.4.1.el7.x86_64 #1
kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/17/2015
kernel: Workqueue: tmr-user target_tmr_work [target_core_mod]
kernel: Call Trace:
kernel: [<ffffffff91d78ba4>] dump_stack+0x19/0x1b
kernel: [<ffffffff9169a958>] __warn+0xd8/0x100
kernel: [<ffffffff9169aa9d>] warn_slowpath_null+0x1d/0x20
kernel: [<ffffffffc0a6f69e>] __iscsit_free_cmd+0x26e/0x290 [iscsi_target_mod]
kernel: [<ffffffffc0a70bc4>] iscsit_aborted_task+0x64/0x70 [iscsi_target_mod]
kernel: [<ffffffffc0a7830a>] lio_aborted_task+0x2a/0x30 [iscsi_target_mod]
kernel: [<ffffffffc09fa516>] transport_cmd_finish_abort+0x66/0xb0 [target_core_mod]
kernel: [<ffffffffc09f4d92>] core_tmr_abort_task+0x102/0x180 [target_core_mod]
kernel: [<ffffffffc09f7bb2>] target_tmr_work+0x152/0x170 [target_core_mod]
kernel: [<ffffffff916bd1df>] process_one_work+0x17f/0x440
kernel: [<ffffffff916be2f6>] worker_thread+0x126/0x3c0
kernel: [<ffffffff916be1d0>] ? manage_workers.isra.26+0x2a0/0x2a0
kernel: [<ffffffff916c51b1>] kthread+0xd1/0xe0
kernel: [<ffffffff916c50e0>] ? insert_kthread_work+0x40/0x40
kernel: [<ffffffff91d8bd37>] ret_from_fork_nospec_begin+0x21/0x21
kernel: [<ffffffff916c50e0>] ? insert_kthread_work+0x40/0x40
kernel: ---[ end trace ed2119501826ec7a ]---


Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
---
 drivers/target/iscsi/iscsi_target_util.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/target/iscsi/iscsi_target_util.c b/drivers/target/iscsi/iscsi_target_util.c
index 3082f5bde9fa..61233755ece0 100644
--- a/drivers/target/iscsi/iscsi_target_util.c
+++ b/drivers/target/iscsi/iscsi_target_util.c
@@ -741,8 +741,6 @@ void __iscsit_free_cmd(struct iscsi_cmd *cmd, bool check_queues)
 {
 	struct iscsi_conn *conn = cmd->conn;
 
-	WARN_ON(!list_empty(&cmd->i_conn_node));
-
 	if (cmd->data_direction = DMA_TO_DEVICE) {
 		iscsit_stop_dataout_timer(cmd);
 		iscsit_free_r2ts_from_list(cmd);
-- 
2.26.2

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] target: iscsi: prevent a race condition in iscsit_unmap_cmd()
  2020-10-07 14:53 ` [PATCH 1/2] target: iscsi: prevent a race condition in iscsit_unmap_cmd() Maurizio Lombardi
@ 2020-10-08  2:15   ` Bart Van Assche
  2020-10-08  9:42     ` Maurizio Lombardi
  0 siblings, 1 reply; 17+ messages in thread
From: Bart Van Assche @ 2020-10-08  2:15 UTC (permalink / raw)
  To: Maurizio Lombardi, martin.petersen
  Cc: linux-scsi, target-devel, michael.christie

On 2020-10-07 07:53, Maurizio Lombardi wrote:
> A potential race condition may occur in iscsit_unmap_cmd() if the
> __iscsit_free_cmd() function is called by two different threads.
> 
> This patch adds a spinlock to serialize the calls to
> iscsit_unmap_cmd()
> 
> Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
> ---
>  drivers/target/iscsi/iscsi_target_login.c | 1 +
>  drivers/target/iscsi/iscsi_target_util.c  | 5 ++++-
>  include/target/iscsi/iscsi_target_core.h  | 1 +
>  3 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/target/iscsi/iscsi_target_login.c b/drivers/target/iscsi/iscsi_target_login.c
> index 893d1b406c29..e16ceee87bba 100644
> --- a/drivers/target/iscsi/iscsi_target_login.c
> +++ b/drivers/target/iscsi/iscsi_target_login.c
> @@ -1110,6 +1110,7 @@ static struct iscsi_conn *iscsit_alloc_conn(struct iscsi_np *np)
>  	spin_lock_init(&conn->nopin_timer_lock);
>  	spin_lock_init(&conn->response_queue_lock);
>  	spin_lock_init(&conn->state_lock);
> +	spin_lock_init(&conn->unmap_cmd_lock);
>  
>  	timer_setup(&conn->nopin_response_timer,
>  		    iscsit_handle_nopin_response_timeout, 0);
> diff --git a/drivers/target/iscsi/iscsi_target_util.c b/drivers/target/iscsi/iscsi_target_util.c
> index 45ba07c6ec27..3082f5bde9fa 100644
> --- a/drivers/target/iscsi/iscsi_target_util.c
> +++ b/drivers/target/iscsi/iscsi_target_util.c
> @@ -755,8 +755,11 @@ void __iscsit_free_cmd(struct iscsi_cmd *cmd, bool check_queues)
>  		iscsit_remove_cmd_from_response_queue(cmd, conn);
>  	}
>  
> -	if (conn && conn->conn_transport->iscsit_unmap_cmd)
> +	if (conn && conn->conn_transport->iscsit_unmap_cmd) {
> +		spin_lock(&conn->unmap_cmd_lock);
>  		conn->conn_transport->iscsit_unmap_cmd(conn, cmd);
> +		spin_unlock(&conn->unmap_cmd_lock);
> +	}
>  }

This looks weird to me. Shouldn't the iSCSI target code make sure that
__iscsit_free_cmd() is called once per command instead of allowing concurrent
calls of that function and serializing iscsit_unmap_cmd() calls?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] target: iscsi: prevent a race condition in iscsit_unmap_cmd()
  2020-10-08  2:15   ` Bart Van Assche
@ 2020-10-08  9:42     ` Maurizio Lombardi
  0 siblings, 0 replies; 17+ messages in thread
From: Maurizio Lombardi @ 2020-10-08  9:42 UTC (permalink / raw)
  To: Bart Van Assche, martin.petersen
  Cc: linux-scsi, target-devel, michael.christie



Dne 08. 10. 20 v 4:15 Bart Van Assche napsal(a):
> On 2020-10-07 07:53, Maurizio Lombardi wrote:
>> A potential race condition may occur in iscsit_unmap_cmd() if the
>> __iscsit_free_cmd() function is called by two different threads.
>>
>> This patch adds a spinlock to serialize the calls to
>> iscsit_unmap_cmd()
>>
>> Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
>> ---
>>  drivers/target/iscsi/iscsi_target_login.c | 1 +
>>  drivers/target/iscsi/iscsi_target_util.c  | 5 ++++-
>>  include/target/iscsi/iscsi_target_core.h  | 1 +
>>  3 files changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/target/iscsi/iscsi_target_login.c b/drivers/target/iscsi/iscsi_target_login.c
>> index 893d1b406c29..e16ceee87bba 100644
>> --- a/drivers/target/iscsi/iscsi_target_login.c
>> +++ b/drivers/target/iscsi/iscsi_target_login.c
>> @@ -1110,6 +1110,7 @@ static struct iscsi_conn *iscsit_alloc_conn(struct iscsi_np *np)
>>  	spin_lock_init(&conn->nopin_timer_lock);
>>  	spin_lock_init(&conn->response_queue_lock);
>>  	spin_lock_init(&conn->state_lock);
>> +	spin_lock_init(&conn->unmap_cmd_lock);
>>  
>>  	timer_setup(&conn->nopin_response_timer,
>>  		    iscsit_handle_nopin_response_timeout, 0);
>> diff --git a/drivers/target/iscsi/iscsi_target_util.c b/drivers/target/iscsi/iscsi_target_util.c
>> index 45ba07c6ec27..3082f5bde9fa 100644
>> --- a/drivers/target/iscsi/iscsi_target_util.c
>> +++ b/drivers/target/iscsi/iscsi_target_util.c
>> @@ -755,8 +755,11 @@ void __iscsit_free_cmd(struct iscsi_cmd *cmd, bool check_queues)
>>  		iscsit_remove_cmd_from_response_queue(cmd, conn);
>>  	}
>>  
>> -	if (conn && conn->conn_transport->iscsit_unmap_cmd)
>> +	if (conn && conn->conn_transport->iscsit_unmap_cmd) {
>> +		spin_lock(&conn->unmap_cmd_lock);
>>  		conn->conn_transport->iscsit_unmap_cmd(conn, cmd);
>> +		spin_unlock(&conn->unmap_cmd_lock);
>> +	}
>>  }
> 
> This looks weird to me. Shouldn't the iSCSI target code make sure that
> __iscsit_free_cmd() is called once per command instead of allowing concurrent
> calls of that function and serializing iscsit_unmap_cmd() calls?

__iscsit_free_cmd() doesn't actually "free" the command.
If you look at the code, you will notice that the only thing that it does is to stop some timers,
remove the command from the immediate and response queues and call the unmap callback.

In some cases, the iscsit_free_cmd() calls __iscsit_free_cmd() twice
against the same command (yes, it's safe).

It should also be safe to execute __iscsit_free_cmd() even after a concurrent call to
iscsit_free_cmd() has been completed because the abort timer holds a
reference against the command.
Additionally, all the functions called by __iscsit_free_cmd() are thread-safe, with
the exception of the unmap callback that may crash the kernel if the cxgbit offload driver is used.

Maurizio

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] target: iscsi: fix a race condition when aborting a task
  2020-10-07 14:53 ` [PATCH 2/2] target: iscsi: fix a race condition when aborting a task Maurizio Lombardi
@ 2020-10-22  2:42   ` Mike Christie
  2020-10-27 13:49     ` Maurizio Lombardi
  0 siblings, 1 reply; 17+ messages in thread
From: Mike Christie @ 2020-10-22  2:42 UTC (permalink / raw)
  To: Maurizio Lombardi, martin.petersen; +Cc: linux-scsi, target-devel, bvanassche

On 10/7/20 9:53 AM, Maurizio Lombardi wrote:
> The iscsit_release_commands_from_conn() function does the following
> operations:
> 
> 1) locks the cmd_lock spinlock
> 2) Scans the list of commands and sets the CMD_T_FABRIC_STOP flag
> 3) Releases the cmd_lock spinlock
> 4) Rescans the list again and clears the i_conn_node link of each command
> 
> If an abort task timer is fired between 3) and 4), it will find
> the CMD_T_FABRIC_STOP flag set and won't call list_del_init();
> therefore it may end up calling __iscsit_free_cmd() with a
> non-empty i_conn_node list, thus triggering the warning.
> 
> Considering that:
> 
> - we expect list_del_init() to be executed by
>   iscsit_release_commands_from_conn() when the CMD_T_FABRIC_STOP is set.
> - iscsit_aborted_task() is the only function that calls __iscsit_free_cmd()
>   directly, while all the other functions call iscsit_free_cmd().
> - the warning in __iscsit_free_cmd() is a duplicate (the same warning
>   can be found in iscsit_free_cmd().
> 

For iscsi, what does the last put on the cmd for a successful abort. Ignore this conn stop case.

If we abort a command successfully, the abort path does:

target_handle_abort -> transport_cmd_check_stop_to_fabric -> lio_check_stop_free -> target_put_sess_cmd

When we create the cmd we do transport_init_se_cmd and that sets the refcount=1. We do target_get_sess_cmd with ack_kref=true so that sets refcount=2.

On the abort completion path, target_handle_abort does target_put_sess_cmd (refcount=1 after), check_stop_free ends up doing a put (refcount=0 after).

If we free the cmd from the abort path, then for your conn stop plus abort race case, could we do:

1. thread1 runs iscsit_release_commands_from_conn and sets CMD_T_FABRIC_STOP.
2. thread2 runs iscsit_aborted_task and then does __iscsit_free_cmd. It then returns from the aborted_task callout and we finish target_handle_abort and do:

target_handle_abort -> transport_cmd_check_stop_to_fabric -> lio_check_stop_free -> target_put_sess_cmd

The cmd is now freed.
3. thread1 now finishes iscsit_release_commands_from_conn and runs iscsit_free_cmd while accessing a command we just released.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] target: iscsi: fix a race condition when aborting a task
  2020-10-22  2:42   ` Mike Christie
@ 2020-10-27 13:49     ` Maurizio Lombardi
  2020-10-27 17:54       ` Mike Christie
  0 siblings, 1 reply; 17+ messages in thread
From: Maurizio Lombardi @ 2020-10-27 13:49 UTC (permalink / raw)
  To: Mike Christie, martin.petersen; +Cc: linux-scsi, target-devel, bvanassche

Hello Mike,

Dne 22. 10. 20 v 4:42 Mike Christie napsal(a):
> If we free the cmd from the abort path, then for your conn stop plus abort race case, could we do:
> 
> 1. thread1 runs iscsit_release_commands_from_conn and sets CMD_T_FABRIC_STOP.
> 2. thread2 runs iscsit_aborted_task and then does __iscsit_free_cmd. It then returns from the aborted_task callout and we finish target_handle_abort and do:
> 
> target_handle_abort -> transport_cmd_check_stop_to_fabric -> lio_check_stop_free -> target_put_sess_cmd
> 
> The cmd is now freed.
> 3. thread1 now finishes iscsit_release_commands_from_conn and runs iscsit_free_cmd while accessing a command we just released.
> 
> 

Thanks for the review!

There are definitely some problems with task aborts and commands' refcounting *
but this is a different bug than the one this patch is trying to solve (a race to list_del_init());
unless you are saying that abort tasks should never be executed when the connection 
is going down and we have to prevent such cases from happening at all.

* I did a test a few days ago: 
1) blocked the backstore device on the target
2) issued a write command on the initiator
3) waited for abort task and connection shutdown
4) switched the target backstore device to state "running"

I triggered the following warnings in target_handle_abort():

        } else {
                /*
                 * Allow the fabric driver to unmap any resources before
                 * releasing the descriptor via TFO->release_cmd().
                 */
                cmd->se_tfo->aborted_task(cmd);
                if (ack_kref)
                        WARN_ON_ONCE(target_put_sess_cmd(cmd) != 0); <-----------------------
                /*
                 * To do: establish a unit attention condition on the I_T
                 * nexus associated with cmd. See also the paragraph "Aborting
                 * commands" in SAM.
                 */
        }

        WARN_ON_ONCE(kref_read(&cmd->cmd_kref) = 0); <----------------------

        transport_lun_remove_cmd(cmd);

        transport_cmd_check_stop_to_fabric(cmd);
}

target_handle_abort() is executed against a command which has already been released.

[  240.883865] iSCSI/iqn.1994-05.com.redhat:9489e049fb5e: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION.
[  353.848593] ABORT_TASK: Found referenced iSCSI task_tag: 101
[  386.521582] iSCSI Login timeout on Network Portal 0.0.0.0:3260
[  396.537133] WARNING: CPU: 1 PID: 2936 at drivers/target/target_core_transport.c:805 target_handle_abort+0xce/0x180 [target_core_mod]
[  396.537139] Modules linked in: iscsi_target_mod target_core_user uio target_core_pscsi target_core_file target_core_iblock target_core_mod uinput nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink sunrpc bochs_drm drm_vram_helper drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ppdev drm joydev pcspkr i2c_piix4 parport_pc parport xfs libcrc32c sr_mod cdrom sd_mod sg ata_generic ata_piix libata virtio_net net_failover failover serio_raw fuse [last unloaded: ip_tables]
[  396.537302] CPU: 1 PID: 2936 Comm: kworker/1:2 Kdump: loaded Not tainted 4.18.0-240.7.el8.x86_64 #1
[  396.537306] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
[  396.537353] Workqueue: target_completion target_abort_work [target_core_mod]
[  396.537394] RIP: 0010:target_handle_abort+0xce/0x180 [target_core_mod]
[  396.537399] Code: 00 e8 e6 db 00 f1 41 81 e4 00 00 40 00 74 9d 48 8d bb 0c 01 00 00 f0 ff 8b 0c 01 00 00 0f 88 ed cd 00 00 75 87 e8 32 ed ff ff <0f> 0b e9 7b ff ff ff c6 03 40 0f 1f 44 00 00 0f 1f 44 00 00 48 8b
[  396.537402] RSP: 0018:ffffac1b818bbe80 EFLAGS: 00010286
[  396.537407] RAX: 0000312f0022d690 RBX: ffff9aec21595350 RCX: 0000000000000000
[  396.537410] RDX: 0000000000000000 RSI: 0000000000000282 RDI: 0000000000000282
[  396.537413] RBP: 0000000000000000 R08: ffff9aebf4d65a00 R09: ffff9aebf4c12d98
[  396.537416] R10: 0000000000000082 R11: 000000000000002e R12: 0000000000400000
[  396.537418] R13: 0000000000000000 R14: ffff9aec02cd9c00 R15: ffff9aec21595488
[  396.537423] FS:  0000000000000000(0000) GS:ffff9aec7fa80000(0000) knlGS:0000000000000000
[  396.537427] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  396.537430] CR2: 000055bfa25e9110 CR3: 00000000614c6000 CR4: 00000000000006e0
[  396.537444] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  396.537447] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  396.537449] Call Trace:
[  396.537504]  process_one_work+0x1a7/0x360
[  396.537525]  worker_thread+0x30/0x390
[  396.537532]  ? create_worker+0x1a0/0x1a0
[  396.537536]  kthread+0x112/0x130
[  396.537542]  ? kthread_flush_work_fn+0x10/0x10
[  396.537550]  ret_from_fork+0x35/0x40
[  396.537570] ---[ end trace d6b5bf9aea218e55 ]---
[  396.537734] WARNING: CPU: 1 PID: 2936 at drivers/target/target_core_transport.c:813 target_handle_abort+0x178/0x180 [target_core_mod]
[  396.537737] Modules linked in: iscsi_target_mod target_core_user uio target_core_pscsi target_core_file target_core_iblock target_core_mod uinput nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink sunrpc bochs_drm drm_vram_helper drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ppdev drm joydev pcspkr i2c_piix4 parport_pc parport xfs libcrc32c sr_mod cdrom sd_mod sg ata_generic ata_piix libata virtio_net net_failover failover serio_raw fuse [last unloaded: ip_tables]
[  396.537811] CPU: 1 PID: 2936 Comm: kworker/1:2 Kdump: loaded Tainted: G        W        --------- -  - 4.18.0-240.7.el8.x86_64 #1
[  396.537814] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
[  396.537860] Workqueue: target_completion target_abort_work [target_core_mod]
[  396.537903] RIP: 0010:target_handle_abort+0x178/0x180 [target_core_mod]
[  396.537909] Code: 67 03 00 48 85 ed 74 1d 48 8b 45 00 48 8b 7d 08 48 83 c5 18 48 89 de e8 26 db 00 f1 48 8b 45 00 48 85 c0 75 e7 e9 6a ff ff ff <0f> 0b e9 df fe ff ff 90 0f 1f 44 00 00 55 48 8d af 08 01 00 00 53
[  396.537912] RSP: 0018:ffffac1b818bbe80 EFLAGS: 00010246
[  396.537916] RAX: 0000000000000000 RBX: ffff9aec21595350 RCX: 0000000000000000
[  396.537919] RDX: 0000000000000000 RSI: 0000000000000282 RDI: 0000000000000282
[  396.537922] RBP: 0000000000000000 R08: ffff9aebf4d65a00 R09: ffff9aebf4c12d98
[  396.537925] R10: 0000000000000082 R11: 000000000000002e R12: 0000000000400000
[  396.537928] R13: 0000000000000000 R14: ffff9aec02cd9c00 R15: ffff9aec21595488
[  396.537933] FS:  0000000000000000(0000) GS:ffff9aec7fa80000(0000) knlGS:0000000000000000
[  396.537936] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  396.537939] CR2: 000055bfa25e9110 CR3: 00000000614c6000 CR4: 00000000000006e0
[  396.537951] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  396.537954] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  396.537956] Call Trace:
[  396.537971]  process_one_work+0x1a7/0x360
[  396.537979]  worker_thread+0x30/0x390
[  396.537985]  ? create_worker+0x1a0/0x1a0
[  396.537989]  kthread+0x112/0x130
[  396.537994]  ? kthread_flush_work_fn+0x10/0x10
[  396.538009]  ret_from_fork+0x35/0x40
[  396.538016] ---[ end trace d6b5bf9aea218e56 ]---
[  396.538221] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 101

Maurizio

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] target: iscsi: fix a race condition when aborting a task
  2020-10-27 13:49     ` Maurizio Lombardi
@ 2020-10-27 17:54       ` Mike Christie
  2020-10-27 20:03         ` Michael Christie
  0 siblings, 1 reply; 17+ messages in thread
From: Mike Christie @ 2020-10-27 17:54 UTC (permalink / raw)
  To: Maurizio Lombardi, martin.petersen; +Cc: linux-scsi, target-devel, bvanassche

On 10/27/20 8:49 AM, Maurizio Lombardi wrote:
> Hello Mike,
> 
> Dne 22. 10. 20 v 4:42 Mike Christie napsal(a):
>> If we free the cmd from the abort path, then for your conn stop plus abort race case, could we do:
>>
>> 1. thread1 runs iscsit_release_commands_from_conn and sets CMD_T_FABRIC_STOP.
>> 2. thread2 runs iscsit_aborted_task and then does __iscsit_free_cmd. It then returns from the aborted_task callout and we finish target_handle_abort and do:
>>
>> target_handle_abort -> transport_cmd_check_stop_to_fabric -> lio_check_stop_free -> target_put_sess_cmd
>>
>> The cmd is now freed.
>> 3. thread1 now finishes iscsit_release_commands_from_conn and runs iscsit_free_cmd while accessing a command we just released.
>>
>>
> 
> Thanks for the review!
> 
> There are definitely some problems with task aborts and commands' refcounting *
> but this is a different bug than the one this patch is trying to solve (a race to list_del_init());
> unless you are saying that abort tasks should never be executed when the connection 
> is going down and we have to prevent such cases from happening at all.

Yeah, I think if we prevent the race then we fix the refcount issue and your issue.
Here is a patch that is only compile tested:

From 209709bcedd9a6ce6003e6bb86f3ebf547dca6af Mon Sep 17 00:00:00 2001
From: Mike Christie <michael.christie@oracle.com>
Date: Tue, 27 Oct 2020 12:30:53 -0500
Subject: [PATCH] iscsi target: fix cmd abort vs fabric stop race

The abort and cmd stop paths can race where:

1. thread1 runs iscsit_release_commands_from_conn and sets
CMD_T_FABRIC_STOP.
2. thread2 runs iscsit_aborted_task and then does __iscsit_free_cmd. It
then returns from the aborted_task callout and we finish
target_handle_abort and do:

target_handle_abort -> transport_cmd_check_stop_to_fabric ->
lio_check_stop_free -> target_put_sess_cmd

The cmd is now freed.
3. thread1 now finishes iscsit_release_commands_from_conn and runs
iscsit_free_cmd while accessing a command we just released.

In __target_check_io_state we check for CMD_T_FABRIC_STOP and set the
CMD_T_ABORTED if the driver is not cleaning up the cmd because of
a session shutdown. However, iscsit_release_commands_from_conn only
sets the CMD_T_FABRIC_STOP and does not check to see if the abort path
has claimed completion ownership of the command.

This adds a check in iscsit_release_commands_from_conn so only the
abort or fabric stop path cleanup the command.
---
 drivers/target/iscsi/iscsi_target.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
index f77e5ee..85027d3 100644
--- a/drivers/target/iscsi/iscsi_target.c
+++ b/drivers/target/iscsi/iscsi_target.c
@@ -483,8 +483,7 @@ int iscsit_queue_rsp(struct iscsi_conn *conn, struct iscsi_cmd *cmd)
 void iscsit_aborted_task(struct iscsi_conn *conn, struct iscsi_cmd *cmd)
 {
 	spin_lock_bh(&conn->cmd_lock);
-	if (!list_empty(&cmd->i_conn_node) &&
-	    !(cmd->se_cmd.transport_state & CMD_T_FABRIC_STOP))
+	if (!list_empty(&cmd->i_conn_node))
 		list_del_init(&cmd->i_conn_node);
 	spin_unlock_bh(&conn->cmd_lock);
 
@@ -4088,6 +4087,16 @@ static void iscsit_release_commands_from_conn(struct iscsi_conn *conn)
 
 		if (se_cmd->se_tfo != NULL) {
 			spin_lock_irq(&se_cmd->t_state_lock);
+			if (se_cmd->transport_state & CMD_T_ABORTED) {
+				/*
+				 * LIO's abort path owns the cleanup for this,
+				 * so put it back on the list and let
+				 * aborted_task handle it.
+				 */
+				list_add_tail(&cmd->i_conn_node,
+					      &conn->conn_cmd_list);
+				continue;
+			}
 			se_cmd->transport_state |= CMD_T_FABRIC_STOP;
 			spin_unlock_irq(&se_cmd->t_state_lock);
 		}
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] target: iscsi: fix a race condition when aborting a task
  2020-10-27 17:54       ` Mike Christie
@ 2020-10-27 20:03         ` Michael Christie
  2020-10-28 17:09           ` Maurizio Lombardi
  0 siblings, 1 reply; 17+ messages in thread
From: Michael Christie @ 2020-10-27 20:03 UTC (permalink / raw)
  To: Maurizio Lombardi, Martin K. Petersen
  Cc: linux-scsi, target-devel, bvanassche



> On Oct 27, 2020, at 12:54 PM, Mike Christie <michael.christie@oracle.com> wrote:
> 
> On 10/27/20 8:49 AM, Maurizio Lombardi wrote:
>> Hello Mike,
>> 
>> Dne 22. 10. 20 v 4:42 Mike Christie napsal(a):
>>> If we free the cmd from the abort path, then for your conn stop plus abort race case, could we do:
>>> 
>>> 1. thread1 runs iscsit_release_commands_from_conn and sets CMD_T_FABRIC_STOP.
>>> 2. thread2 runs iscsit_aborted_task and then does __iscsit_free_cmd. It then returns from the aborted_task callout and we finish target_handle_abort and do:
>>> 
>>> target_handle_abort -> transport_cmd_check_stop_to_fabric -> lio_check_stop_free -> target_put_sess_cmd
>>> 
>>> The cmd is now freed.
>>> 3. thread1 now finishes iscsit_release_commands_from_conn and runs iscsit_free_cmd while accessing a command we just released.
>>> 
>>> 
>> 
>> Thanks for the review!
>> 
>> There are definitely some problems with task aborts and commands' refcounting *
>> but this is a different bug than the one this patch is trying to solve (a race to list_del_init());
>> unless you are saying that abort tasks should never be executed when the connection 
>> is going down and we have to prevent such cases from happening at all.
> 
> Yeah, I think if we prevent the race then we fix the refcount issue and your issue.
> Here is a patch that is only compile tested:
> 
> From 209709bcedd9a6ce6003e6bb86f3ebf547dca6af Mon Sep 17 00:00:00 2001
> From: Mike Christie <michael.christie@oracle.com>
> Date: Tue, 27 Oct 2020 12:30:53 -0500
> Subject: [PATCH] iscsi target: fix cmd abort vs fabric stop race
> 
> The abort and cmd stop paths can race where:
> 
> 1. thread1 runs iscsit_release_commands_from_conn and sets
> CMD_T_FABRIC_STOP.
> 2. thread2 runs iscsit_aborted_task and then does __iscsit_free_cmd. It
> then returns from the aborted_task callout and we finish
> target_handle_abort and do:
> 
> target_handle_abort -> transport_cmd_check_stop_to_fabric ->
> lio_check_stop_free -> target_put_sess_cmd
> 
> The cmd is now freed.
> 3. thread1 now finishes iscsit_release_commands_from_conn and runs
> iscsit_free_cmd while accessing a command we just released.
> 
> In __target_check_io_state we check for CMD_T_FABRIC_STOP and set the
> CMD_T_ABORTED if the driver is not cleaning up the cmd because of
> a session shutdown. However, iscsit_release_commands_from_conn only
> sets the CMD_T_FABRIC_STOP and does not check to see if the abort path
> has claimed completion ownership of the command.
> 
> This adds a check in iscsit_release_commands_from_conn so only the
> abort or fabric stop path cleanup the command.
> ---
> drivers/target/iscsi/iscsi_target.c | 13 +++++++++++--
> 1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
> index f77e5ee..85027d3 100644
> --- a/drivers/target/iscsi/iscsi_target.c
> +++ b/drivers/target/iscsi/iscsi_target.c
> @@ -483,8 +483,7 @@ int iscsit_queue_rsp(struct iscsi_conn *conn, struct iscsi_cmd *cmd)
> void iscsit_aborted_task(struct iscsi_conn *conn, struct iscsi_cmd *cmd)
> {
> 	spin_lock_bh(&conn->cmd_lock);
> -	if (!list_empty(&cmd->i_conn_node) &&
> -	    !(cmd->se_cmd.transport_state & CMD_T_FABRIC_STOP))
> +	if (!list_empty(&cmd->i_conn_node))
> 		list_del_init(&cmd->i_conn_node);
> 	spin_unlock_bh(&conn->cmd_lock);
> 
> @@ -4088,6 +4087,16 @@ static void iscsit_release_commands_from_conn(struct iscsi_conn *conn)
> 
> 		if (se_cmd->se_tfo != NULL) {
> 			spin_lock_irq(&se_cmd->t_state_lock);
> +			if (se_cmd->transport_state & CMD_T_ABORTED) {
> +				/*
> +				 * LIO's abort path owns the cleanup for this,
> +				 * so put it back on the list and let
> +				 * aborted_task handle it.
> +				 */
> +				list_add_tail(&cmd->i_conn_node,
> +					      &conn->conn_cmd_list);


That should have been a move from the tmp list back to the conn_cmd_list.


> +				continue;
> +			}
> 			se_cmd->transport_state |= CMD_T_FABRIC_STOP;
> 			spin_unlock_irq(&se_cmd->t_state_lock);
> 		}
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] target: iscsi: fix a race condition when aborting a task
  2020-10-27 20:03         ` Michael Christie
@ 2020-10-28 17:09           ` Maurizio Lombardi
  2020-10-28 20:37             ` Mike Christie
  2020-11-11  2:16             ` Mike Christie
  0 siblings, 2 replies; 17+ messages in thread
From: Maurizio Lombardi @ 2020-10-28 17:09 UTC (permalink / raw)
  To: Michael Christie, Martin K. Petersen; +Cc: linux-scsi, target-devel, bvanassche



Dne 27. 10. 20 v 21:03 Michael Christie napsal(a):
> 
> 
>> On Oct 27, 2020, at 12:54 PM, Mike Christie <michael.christie@oracle.com> wrote:
>>
>> On 10/27/20 8:49 AM, Maurizio Lombardi wrote:
>>> Hello Mike,
>>>
>>> Dne 22. 10. 20 v 4:42 Mike Christie napsal(a):
>>>> If we free the cmd from the abort path, then for your conn stop plus abort race case, could we do:
>>>>
>>>> 1. thread1 runs iscsit_release_commands_from_conn and sets CMD_T_FABRIC_STOP.
>>>> 2. thread2 runs iscsit_aborted_task and then does __iscsit_free_cmd. It then returns from the aborted_task callout and we finish target_handle_abort and do:
>>>>
>>>> target_handle_abort -> transport_cmd_check_stop_to_fabric -> lio_check_stop_free -> target_put_sess_cmd
>>>>
>>>> The cmd is now freed.
>>>> 3. thread1 now finishes iscsit_release_commands_from_conn and runs iscsit_free_cmd while accessing a command we just released.
>>>>
>>>>
>>>
>>> Thanks for the review!
>>>
>>> There are definitely some problems with task aborts and commands' refcounting *
>>> but this is a different bug than the one this patch is trying to solve (a race to list_del_init());
>>> unless you are saying that abort tasks should never be executed when the connection 
>>> is going down and we have to prevent such cases from happening at all.
>>
>> Yeah, I think if we prevent the race then we fix the refcount issue and your issue.
>> Here is a patch that is only compile tested:
>>
>> From 209709bcedd9a6ce6003e6bb86f3ebf547dca6af Mon Sep 17 00:00:00 2001
>> From: Mike Christie <michael.christie@oracle.com>
>> Date: Tue, 27 Oct 2020 12:30:53 -0500
>> Subject: [PATCH] iscsi target: fix cmd abort vs fabric stop race
>>
>> The abort and cmd stop paths can race where:
>>
>> 1. thread1 runs iscsit_release_commands_from_conn and sets
>> CMD_T_FABRIC_STOP.
>> 2. thread2 runs iscsit_aborted_task and then does __iscsit_free_cmd. It
>> then returns from the aborted_task callout and we finish
>> target_handle_abort and do:
>>
>> target_handle_abort -> transport_cmd_check_stop_to_fabric ->
>> lio_check_stop_free -> target_put_sess_cmd
>>
>> The cmd is now freed.
>> 3. thread1 now finishes iscsit_release_commands_from_conn and runs
>> iscsit_free_cmd while accessing a command we just released.
>>
>> In __target_check_io_state we check for CMD_T_FABRIC_STOP and set the
>> CMD_T_ABORTED if the driver is not cleaning up the cmd because of
>> a session shutdown. However, iscsit_release_commands_from_conn only
>> sets the CMD_T_FABRIC_STOP and does not check to see if the abort path
>> has claimed completion ownership of the command.
>>
>> This adds a check in iscsit_release_commands_from_conn so only the
>> abort or fabric stop path cleanup the command.
>> ---
>> drivers/target/iscsi/iscsi_target.c | 13 +++++++++++--
>> 1 file changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
>> index f77e5ee..85027d3 100644
>> --- a/drivers/target/iscsi/iscsi_target.c
>> +++ b/drivers/target/iscsi/iscsi_target.c
>> @@ -483,8 +483,7 @@ int iscsit_queue_rsp(struct iscsi_conn *conn, struct iscsi_cmd *cmd)
>> void iscsit_aborted_task(struct iscsi_conn *conn, struct iscsi_cmd *cmd)
>> {
>> 	spin_lock_bh(&conn->cmd_lock);
>> -	if (!list_empty(&cmd->i_conn_node) &&
>> -	    !(cmd->se_cmd.transport_state & CMD_T_FABRIC_STOP))
>> +	if (!list_empty(&cmd->i_conn_node))
>> 		list_del_init(&cmd->i_conn_node);
>> 	spin_unlock_bh(&conn->cmd_lock);
>>
>> @@ -4088,6 +4087,16 @@ static void iscsit_release_commands_from_conn(struct iscsi_conn *conn)
>>
>> 		if (se_cmd->se_tfo != NULL) {
>> 			spin_lock_irq(&se_cmd->t_state_lock);
>> +			if (se_cmd->transport_state & CMD_T_ABORTED) {
>> +				/*
>> +				 * LIO's abort path owns the cleanup for this,
>> +				 * so put it back on the list and let
>> +				 * aborted_task handle it.
>> +				 */
>> +				list_add_tail(&cmd->i_conn_node,
>> +					      &conn->conn_cmd_list);
> 
> 
> That should have been a move from the tmp list back to the conn_cmd_list.

Nice, it looks simple and I will test it.
I am a bit worried there could be other possible race conditions.

Example: 

thread1: connection is going to be closed,
iscsit_release_commands_from_conn() finds a command that is about
to be aborted, re-adds it to conn_cmd_list and proceeds.
iscsit_close_connection() decreases the conn usage count and finally calls iscsit_free_conn(conn)
that destroys the conn structure.

thread2: iscsit_aborted_task() gets called and tries to lock the conn->cmd_lock spinlock, dereferencing
an invalid pointer.

Possible solutions that I can think of:

- Make iscsit_release_commands_from_conn() wait for the abort task to finish
or
- abort handler could hold a reference to the conn structure so that iscsit_close_connection()
will sleep when calling iscsit_check_conn_usage_count(conn) until abort finishes.


Maurizio

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] target: iscsi: fix a race condition when aborting a task
  2020-10-28 17:09           ` Maurizio Lombardi
@ 2020-10-28 20:37             ` Mike Christie
  2020-11-10 21:29               ` Maurizio Lombardi
  2020-11-11  2:16             ` Mike Christie
  1 sibling, 1 reply; 17+ messages in thread
From: Mike Christie @ 2020-10-28 20:37 UTC (permalink / raw)
  To: Maurizio Lombardi, Martin K. Petersen
  Cc: linux-scsi, target-devel, bvanassche

On 10/28/20 12:09 PM, Maurizio Lombardi wrote:
> 
> 
> Dne 27. 10. 20 v 21:03 Michael Christie napsal(a):
>>
>>
>>> On Oct 27, 2020, at 12:54 PM, Mike Christie <michael.christie@oracle.com> wrote:
>>>
>>> On 10/27/20 8:49 AM, Maurizio Lombardi wrote:
>>>> Hello Mike,
>>>>
>>>> Dne 22. 10. 20 v 4:42 Mike Christie napsal(a):
>>>>> If we free the cmd from the abort path, then for your conn stop plus abort race case, could we do:
>>>>>
>>>>> 1. thread1 runs iscsit_release_commands_from_conn and sets CMD_T_FABRIC_STOP.
>>>>> 2. thread2 runs iscsit_aborted_task and then does __iscsit_free_cmd. It then returns from the aborted_task callout and we finish target_handle_abort and do:
>>>>>
>>>>> target_handle_abort -> transport_cmd_check_stop_to_fabric -> lio_check_stop_free -> target_put_sess_cmd
>>>>>
>>>>> The cmd is now freed.
>>>>> 3. thread1 now finishes iscsit_release_commands_from_conn and runs iscsit_free_cmd while accessing a command we just released.
>>>>>
>>>>>
>>>>
>>>> Thanks for the review!
>>>>
>>>> There are definitely some problems with task aborts and commands' refcounting *
>>>> but this is a different bug than the one this patch is trying to solve (a race to list_del_init());
>>>> unless you are saying that abort tasks should never be executed when the connection
>>>> is going down and we have to prevent such cases from happening at all.
>>>
>>> Yeah, I think if we prevent the race then we fix the refcount issue and your issue.
>>> Here is a patch that is only compile tested:
>>>
>>>  From 209709bcedd9a6ce6003e6bb86f3ebf547dca6af Mon Sep 17 00:00:00 2001
>>> From: Mike Christie <michael.christie@oracle.com>
>>> Date: Tue, 27 Oct 2020 12:30:53 -0500
>>> Subject: [PATCH] iscsi target: fix cmd abort vs fabric stop race
>>>
>>> The abort and cmd stop paths can race where:
>>>
>>> 1. thread1 runs iscsit_release_commands_from_conn and sets
>>> CMD_T_FABRIC_STOP.
>>> 2. thread2 runs iscsit_aborted_task and then does __iscsit_free_cmd. It
>>> then returns from the aborted_task callout and we finish
>>> target_handle_abort and do:
>>>
>>> target_handle_abort -> transport_cmd_check_stop_to_fabric ->
>>> lio_check_stop_free -> target_put_sess_cmd
>>>
>>> The cmd is now freed.
>>> 3. thread1 now finishes iscsit_release_commands_from_conn and runs
>>> iscsit_free_cmd while accessing a command we just released.
>>>
>>> In __target_check_io_state we check for CMD_T_FABRIC_STOP and set the
>>> CMD_T_ABORTED if the driver is not cleaning up the cmd because of
>>> a session shutdown. However, iscsit_release_commands_from_conn only
>>> sets the CMD_T_FABRIC_STOP and does not check to see if the abort path
>>> has claimed completion ownership of the command.
>>>
>>> This adds a check in iscsit_release_commands_from_conn so only the
>>> abort or fabric stop path cleanup the command.
>>> ---
>>> drivers/target/iscsi/iscsi_target.c | 13 +++++++++++--
>>> 1 file changed, 11 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
>>> index f77e5ee..85027d3 100644
>>> --- a/drivers/target/iscsi/iscsi_target.c
>>> +++ b/drivers/target/iscsi/iscsi_target.c
>>> @@ -483,8 +483,7 @@ int iscsit_queue_rsp(struct iscsi_conn *conn, struct iscsi_cmd *cmd)
>>> void iscsit_aborted_task(struct iscsi_conn *conn, struct iscsi_cmd *cmd)
>>> {
>>> 	spin_lock_bh(&conn->cmd_lock);
>>> -	if (!list_empty(&cmd->i_conn_node) &&
>>> -	    !(cmd->se_cmd.transport_state & CMD_T_FABRIC_STOP))
>>> +	if (!list_empty(&cmd->i_conn_node))
>>> 		list_del_init(&cmd->i_conn_node);
>>> 	spin_unlock_bh(&conn->cmd_lock);
>>>
>>> @@ -4088,6 +4087,16 @@ static void iscsit_release_commands_from_conn(struct iscsi_conn *conn)
>>>
>>> 		if (se_cmd->se_tfo != NULL) {
>>> 			spin_lock_irq(&se_cmd->t_state_lock);
>>> +			if (se_cmd->transport_state & CMD_T_ABORTED) {
>>> +				/*
>>> +				 * LIO's abort path owns the cleanup for this,
>>> +				 * so put it back on the list and let
>>> +				 * aborted_task handle it.
>>> +				 */
>>> +				list_add_tail(&cmd->i_conn_node,
>>> +					      &conn->conn_cmd_list);
>>
>>
>> That should have been a move from the tmp list back to the conn_cmd_list.
> 
> Nice, it looks simple and I will test it.
> I am a bit worried there could be other possible race conditions.
> 
> Example:
> 
> thread1: connection is going to be closed,
> iscsit_release_commands_from_conn() finds a command that is about
> to be aborted, re-adds it to conn_cmd_list and proceeds.
> iscsit_close_connection() decreases the conn usage count and finally calls iscsit_free_conn(conn)
> that destroys the conn structure.
> 
> thread2: iscsit_aborted_task() gets called and tries to lock the conn->cmd_lock spinlock, dereferencing
> an invalid pointer.

Nice catch. You are right.

> 
> Possible solutions that I can think of:
> 
> - Make iscsit_release_commands_from_conn() wait for the abort task to finish

Yeah you could set a completion in there then have aborted_task do the 
complete() call maybe?

There is also the transport_cmd_check_stop_to_fabric 
t_transport_stop_comp completion and transport_wait_for_tasks pairing 
that seems close to what we want. I think we would have to change it to 
make it work for this case. Oh yeah, that is more of a brain stroming 
guess than a review type of comment :)


> or
> - abort handler could hold a reference to the conn structure so that iscsit_close_connection()
> will sleep when calling iscsit_check_conn_usage_count(conn) until abort finishes.
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] target: iscsi: fix a race condition when aborting a task
  2020-10-28 20:37             ` Mike Christie
@ 2020-11-10 21:29               ` Maurizio Lombardi
  2020-11-10 23:08                 ` Mike Christie
  0 siblings, 1 reply; 17+ messages in thread
From: Maurizio Lombardi @ 2020-11-10 21:29 UTC (permalink / raw)
  To: Mike Christie, Martin K. Petersen
  Cc: linux-scsi, target-devel, bvanassche, m.lombardi85



Dne 28. 10. 20 v 21:37 Mike Christie napsal(a):
>>
>> Possible solutions that I can think of:
>>
>> - Make iscsit_release_commands_from_conn() wait for the abort task to finish
> 
> Yeah you could set a completion in there then have aborted_task do the complete() call maybe?
> 

We could do something like this, what do you think?

diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
index 067074ef50818..ffd3dbc53a42f 100644
--- a/drivers/target/iscsi/iscsi_target.c
+++ b/drivers/target/iscsi/iscsi_target.c
@@ -490,13 +490,16 @@ EXPORT_SYMBOL(iscsit_queue_rsp);
 
 void iscsit_aborted_task(struct iscsi_conn *conn, struct iscsi_cmd *cmd)
 {
+	struct se_cmd *se_cmd = cmd->se_cmd.se_tfo ? &cmd->se_cmd : NULL;
+
 	spin_lock_bh(&conn->cmd_lock);
-	if (!list_empty(&cmd->i_conn_node) &&
-	    !(cmd->se_cmd.transport_state & CMD_T_FABRIC_STOP))
+	if (!list_empty(&cmd->i_conn_node))
 		list_del_init(&cmd->i_conn_node);
 	spin_unlock_bh(&conn->cmd_lock);
 
 	__iscsit_free_cmd(cmd, true);
+	if (se_cmd && se_cmd->abrt_task_compl)
+		complete(se_cmd->abrt_task_compl);
 }
 EXPORT_SYMBOL(iscsit_aborted_task);
 
@@ -4080,6 +4083,7 @@ int iscsi_target_rx_thread(void *arg)
 
 static void iscsit_release_commands_from_conn(struct iscsi_conn *conn)
 {
+	DECLARE_COMPLETION_ONSTACK(compl);
 	LIST_HEAD(tmp_list);
 	struct iscsi_cmd *cmd = NULL, *cmd_tmp = NULL;
 	struct iscsi_session *sess = conn->sess;
@@ -4096,8 +4100,24 @@ static void iscsit_release_commands_from_conn(struct iscsi_conn *conn)
 
 		if (se_cmd->se_tfo != NULL) {
 			spin_lock_irq(&se_cmd->t_state_lock);
+			if (se_cmd->transport_state & CMD_T_ABORTED) {
+				/*
+				 * LIO's abort path owns the cleanup for this,
+				 * so put it back on the list and let
+				 * aborted_task handle it.
+				 */
+				list_move_tail(&cmd->i_conn_node, &conn->conn_cmd_list);
+				WARN_ON_ONCE(se_cmd->abrt_task_compl);
+				se_cmd->abrt_task_compl = &compl;
+			}
 			se_cmd->transport_state |= CMD_T_FABRIC_STOP;
 			spin_unlock_irq(&se_cmd->t_state_lock);
+
+			if (se_cmd->abrt_task_compl) {
+				spin_unlock_bh(&conn->cmd_lock);
+				wait_for_completion(&compl);
+				spin_lock_bh(&conn->cmd_lock);
+			}
 		}
 	}
 	spin_unlock_bh(&conn->cmd_lock);
diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index db53a0d649da7..5611e6c00f18c 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -1391,6 +1391,7 @@ void transport_init_se_cmd(
 	init_completion(&cmd->t_transport_stop_comp);
 	cmd->free_compl = NULL;
 	cmd->abrt_compl = NULL;
+	cmd->abrt_task_compl = NULL;
 	spin_lock_init(&cmd->t_state_lock);
 	INIT_WORK(&cmd->work, NULL);
 	kref_init(&cmd->cmd_kref);
diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
index 549947d407cfd..25cc451930281 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -491,6 +491,7 @@ struct se_cmd {
 	struct list_head	se_cmd_list;
 	struct completion	*free_compl;
 	struct completion	*abrt_compl;
+	struct completion	*abrt_task_compl;
 	const struct target_core_fabric_ops *se_tfo;
 	sense_reason_t		(*execute_cmd)(struct se_cmd *);
 	sense_reason_t (*transport_complete_callback)(struct se_cmd *, bool, int *);
-- 
2.26.2

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] target: iscsi: fix a race condition when aborting a task
  2020-11-10 21:29               ` Maurizio Lombardi
@ 2020-11-10 23:08                 ` Mike Christie
  0 siblings, 0 replies; 17+ messages in thread
From: Mike Christie @ 2020-11-10 23:08 UTC (permalink / raw)
  To: Maurizio Lombardi, Martin K. Petersen
  Cc: linux-scsi, target-devel, bvanassche, m.lombardi85

On 11/10/20 3:29 PM, Maurizio Lombardi wrote:
> 
> 
> Dne 28. 10. 20 v 21:37 Mike Christie napsal(a):
>>>
>>> Possible solutions that I can think of:
>>>
>>> - Make iscsit_release_commands_from_conn() wait for the abort task to finish
>>
>> Yeah you could set a completion in there then have aborted_task do the complete() call maybe?
>>
> 
> We could do something like this, what do you think?
> 
> diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
> index 067074ef50818..ffd3dbc53a42f 100644
> --- a/drivers/target/iscsi/iscsi_target.c
> +++ b/drivers/target/iscsi/iscsi_target.c
> @@ -490,13 +490,16 @@ EXPORT_SYMBOL(iscsit_queue_rsp);
>   
>   void iscsit_aborted_task(struct iscsi_conn *conn, struct iscsi_cmd *cmd)
>   {
> +	struct se_cmd *se_cmd = cmd->se_cmd.se_tfo ? &cmd->se_cmd : NULL;
> +
>   	spin_lock_bh(&conn->cmd_lock);
> -	if (!list_empty(&cmd->i_conn_node) &&
> -	    !(cmd->se_cmd.transport_state & CMD_T_FABRIC_STOP))
> +	if (!list_empty(&cmd->i_conn_node))
>   		list_del_init(&cmd->i_conn_node);
>   	spin_unlock_bh(&conn->cmd_lock);
>   
>   	__iscsit_free_cmd(cmd, true);
> +	if (se_cmd && se_cmd->abrt_task_compl)
> +		complete(se_cmd->abrt_task_compl);
>   }
>   EXPORT_SYMBOL(iscsit_aborted_task);
>   
> @@ -4080,6 +4083,7 @@ int iscsi_target_rx_thread(void *arg)
>   
>   static void iscsit_release_commands_from_conn(struct iscsi_conn *conn)
>   {
> +	DECLARE_COMPLETION_ONSTACK(compl);
>   	LIST_HEAD(tmp_list);
>   	struct iscsi_cmd *cmd = NULL, *cmd_tmp = NULL;
>   	struct iscsi_session *sess = conn->sess;
> @@ -4096,8 +4100,24 @@ static void iscsit_release_commands_from_conn(struct iscsi_conn *conn)
>   
>   		if (se_cmd->se_tfo != NULL) {
>   			spin_lock_irq(&se_cmd->t_state_lock);
> +			if (se_cmd->transport_state & CMD_T_ABORTED) {
> +				/*
> +				 * LIO's abort path owns the cleanup for this,
> +				 * so put it back on the list and let
> +				 * aborted_task handle it.
> +				 */
> +				list_move_tail(&cmd->i_conn_node, &conn->conn_cmd_list);
> +				WARN_ON_ONCE(se_cmd->abrt_task_compl);
> +				se_cmd->abrt_task_compl = &compl;
> +			}
>   			se_cmd->transport_state |= CMD_T_FABRIC_STOP;
>   			spin_unlock_irq(&se_cmd->t_state_lock);
> +
> +			if (se_cmd->abrt_task_compl) {
> +				spin_unlock_bh(&conn->cmd_lock);
> +				wait_for_completion(&compl);
> +				spin_lock_bh(&conn->cmd_lock);

You can still hit your freed conn race I think. The aborted_task callout 
is not the last time we are referencing the iscsi cmd and conn. That 
code path still has to do the release_cmd callout for example. Once we 
get past this wait_for_completion the abort code path could be still 
running. If this iscsit_release_commands_from_conn completes first then 
we can hit your case where we free the conn before the abort code path 
has done release_cmd and we could hit that cmd->conn->sess reference.

I think if you do the complete in iscsit_release_cmd then we would not 
hit that issue.

There might be a second issue though. What happens if aborted_task ran 
first and deleted the cmd from the conn_cmd_list. It would then be 
running while iscsit_release_commands_from_conn is running. We would 
then not do the wait_for_completion above.


> +			}
>   		}
>   	}
>   	spin_unlock_bh(&conn->cmd_lock);
> diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
> index db53a0d649da7..5611e6c00f18c 100644
> --- a/drivers/target/target_core_transport.c
> +++ b/drivers/target/target_core_transport.c
> @@ -1391,6 +1391,7 @@ void transport_init_se_cmd(
>   	init_completion(&cmd->t_transport_stop_comp);
>   	cmd->free_compl = NULL;
>   	cmd->abrt_compl = NULL;
> +	cmd->abrt_task_compl = NULL;
>   	spin_lock_init(&cmd->t_state_lock);
>   	INIT_WORK(&cmd->work, NULL);
>   	kref_init(&cmd->cmd_kref);
> diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
> index 549947d407cfd..25cc451930281 100644
> --- a/include/target/target_core_base.h
> +++ b/include/target/target_core_base.h
> @@ -491,6 +491,7 @@ struct se_cmd {
>   	struct list_head	se_cmd_list;
>   	struct completion	*free_compl;
>   	struct completion	*abrt_compl;
> +	struct completion	*abrt_task_compl;

This should be on the iscsi cmd since only iscsi uses it.

>   	const struct target_core_fabric_ops *se_tfo;
>   	sense_reason_t		(*execute_cmd)(struct se_cmd *);
>   	sense_reason_t (*transport_complete_callback)(struct se_cmd *, bool, int *);
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] target: iscsi: fix a race condition when aborting a task
  2020-10-28 17:09           ` Maurizio Lombardi
  2020-10-28 20:37             ` Mike Christie
@ 2020-11-11  2:16             ` Mike Christie
  2020-11-11 14:58               ` Maurizio Lombardi
  1 sibling, 1 reply; 17+ messages in thread
From: Mike Christie @ 2020-11-11  2:16 UTC (permalink / raw)
  To: Maurizio Lombardi, Martin K. Petersen
  Cc: linux-scsi, target-devel, bvanassche

On 10/28/20 12:09 PM, Maurizio Lombardi wrote:
> 
> 
> Dne 27. 10. 20 v 21:03 Michael Christie napsal(a):
>>
>>
>>> On Oct 27, 2020, at 12:54 PM, Mike Christie <michael.christie@oracle.com> wrote:
>>>
>>> On 10/27/20 8:49 AM, Maurizio Lombardi wrote:
>>>> Hello Mike,
>>>>
>>>> Dne 22. 10. 20 v 4:42 Mike Christie napsal(a):
>>>>> If we free the cmd from the abort path, then for your conn stop plus abort race case, could we do:
>>>>>
>>>>> 1. thread1 runs iscsit_release_commands_from_conn and sets CMD_T_FABRIC_STOP.
>>>>> 2. thread2 runs iscsit_aborted_task and then does __iscsit_free_cmd. It then returns from the aborted_task callout and we finish target_handle_abort and do:
>>>>>
>>>>> target_handle_abort -> transport_cmd_check_stop_to_fabric -> lio_check_stop_free -> target_put_sess_cmd
>>>>>
>>>>> The cmd is now freed.
>>>>> 3. thread1 now finishes iscsit_release_commands_from_conn and runs iscsit_free_cmd while accessing a command we just released.
>>>>>
>>>>>
>>>>
>>>> Thanks for the review!
>>>>
>>>> There are definitely some problems with task aborts and commands' refcounting *
>>>> but this is a different bug than the one this patch is trying to solve (a race to list_del_init());
>>>> unless you are saying that abort tasks should never be executed when the connection
>>>> is going down and we have to prevent such cases from happening at all.
>>>
>>> Yeah, I think if we prevent the race then we fix the refcount issue and your issue.
>>> Here is a patch that is only compile tested:
>>>
>>>  From 209709bcedd9a6ce6003e6bb86f3ebf547dca6af Mon Sep 17 00:00:00 2001
>>> From: Mike Christie <michael.christie@oracle.com>
>>> Date: Tue, 27 Oct 2020 12:30:53 -0500
>>> Subject: [PATCH] iscsi target: fix cmd abort vs fabric stop race
>>>
>>> The abort and cmd stop paths can race where:
>>>
>>> 1. thread1 runs iscsit_release_commands_from_conn and sets
>>> CMD_T_FABRIC_STOP.
>>> 2. thread2 runs iscsit_aborted_task and then does __iscsit_free_cmd. It
>>> then returns from the aborted_task callout and we finish
>>> target_handle_abort and do:
>>>
>>> target_handle_abort -> transport_cmd_check_stop_to_fabric ->
>>> lio_check_stop_free -> target_put_sess_cmd
>>>
>>> The cmd is now freed.
>>> 3. thread1 now finishes iscsit_release_commands_from_conn and runs
>>> iscsit_free_cmd while accessing a command we just released.
>>>
>>> In __target_check_io_state we check for CMD_T_FABRIC_STOP and set the
>>> CMD_T_ABORTED if the driver is not cleaning up the cmd because of
>>> a session shutdown. However, iscsit_release_commands_from_conn only
>>> sets the CMD_T_FABRIC_STOP and does not check to see if the abort path
>>> has claimed completion ownership of the command.
>>>
>>> This adds a check in iscsit_release_commands_from_conn so only the
>>> abort or fabric stop path cleanup the command.
>>> ---
>>> drivers/target/iscsi/iscsi_target.c | 13 +++++++++++--
>>> 1 file changed, 11 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
>>> index f77e5ee..85027d3 100644
>>> --- a/drivers/target/iscsi/iscsi_target.c
>>> +++ b/drivers/target/iscsi/iscsi_target.c
>>> @@ -483,8 +483,7 @@ int iscsit_queue_rsp(struct iscsi_conn *conn, struct iscsi_cmd *cmd)
>>> void iscsit_aborted_task(struct iscsi_conn *conn, struct iscsi_cmd *cmd)
>>> {
>>> 	spin_lock_bh(&conn->cmd_lock);
>>> -	if (!list_empty(&cmd->i_conn_node) &&
>>> -	    !(cmd->se_cmd.transport_state & CMD_T_FABRIC_STOP))
>>> +	if (!list_empty(&cmd->i_conn_node))
>>> 		list_del_init(&cmd->i_conn_node);
>>> 	spin_unlock_bh(&conn->cmd_lock);
>>>
>>> @@ -4088,6 +4087,16 @@ static void iscsit_release_commands_from_conn(struct iscsi_conn *conn)
>>>
>>> 		if (se_cmd->se_tfo != NULL) {
>>> 			spin_lock_irq(&se_cmd->t_state_lock);
>>> +			if (se_cmd->transport_state & CMD_T_ABORTED) {
>>> +				/*
>>> +				 * LIO's abort path owns the cleanup for this,
>>> +				 * so put it back on the list and let
>>> +				 * aborted_task handle it.
>>> +				 */
>>> +				list_add_tail(&cmd->i_conn_node,
>>> +					      &conn->conn_cmd_list);
>>
>>
>> That should have been a move from the tmp list back to the conn_cmd_list.
> 
> Nice, it looks simple and I will test it.
> I am a bit worried there could be other possible race conditions.
> 
> Example:
> 
> thread1: connection is going to be closed,
> iscsit_release_commands_from_conn() finds a command that is about
> to be aborted, re-adds it to conn_cmd_list and proceeds.
> iscsit_close_connection() decreases the conn usage count and finally calls iscsit_free_conn(conn)
> that destroys the conn structure.
> 
> thread2: iscsit_aborted_task() gets called and tries to lock the conn->cmd_lock spinlock, dereferencing
> an invalid pointer.

Hey, I tested this out and I do not think this will happen. We will get 
stuck waiting on the TMF completion for the affected cmd/cmds.

In conn_cmd_list we would have [CMD1 -> ABORT TMF]. Those cmds get moved 
to the tmp list. It might happen where CMD1's CMD_T_ABORTED bit is set, 
and iscsit_release_commands_from_conn will would put it back onto the 
conn_cmd_list. But then it will see the ABORT on the list. We will then 
wait on the ABORT in:

iscsit_release_commands_from_conn -> iscsit_free_cmd -> 
transport_generic_free_cmd.

The abort at this time would be waiting on CMD1 in 
target_put_cmd_and_wait in target_core_tmr.c. Eventually CMD1 completes, 
aborted_task is run and the cmd is released. target_release_cmd_kref 
will then do complete on abort_compl. The abort will then wake from 
target_put_cmd_and_wait and will complete. We will then return from 
transport_generic_free_cmd for the abort.

A LUN RESET should work similarly but we would be waiting on the cmds 
and aborts and then the LUN RESET would complete.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] target: iscsi: fix a race condition when aborting a task
  2020-11-11  2:16             ` Mike Christie
@ 2020-11-11 14:58               ` Maurizio Lombardi
  2020-11-11 15:37                 ` Michael Christie
  0 siblings, 1 reply; 17+ messages in thread
From: Maurizio Lombardi @ 2020-11-11 14:58 UTC (permalink / raw)
  To: Mike Christie, Martin K. Petersen
  Cc: linux-scsi, target-devel, bvanassche, m.lombardi85



Dne 11. 11. 20 v 3:16 Mike Christie napsal(a):
> Hey, I tested this out and I do not think this will happen. We will get stuck waiting on the TMF completion for the affected cmd/cmds.
> 
> In conn_cmd_list we would have [CMD1 -> ABORT TMF]. Those cmds get moved to the tmp list. It might happen where CMD1's CMD_T_ABORTED bit is set, and iscsit_release_commands_from_conn will would put it back onto the conn_cmd_list. But then it will see the ABORT on the list. We will then wait on the ABORT in:
> 
> iscsit_release_commands_from_conn -> iscsit_free_cmd -> transport_generic_free_cmd.

Hi Mike,

I'm not sure if I understood this part.

The commands are moved to the tmp_list;
we check for CMD_T_ABORTED and eventually move the commands from tmp_list back to conn_cmd_list
because it's the abort task the one that should do the cleanup.

iscsit_release_commands_from_conn() then scans the tmp_list and calls iscsit_free_cmd()... but not against
those commands with CMD_T_ABORTED flag set because we just moved them back to conn_cmd_list
and aren't linked to tmp_list anymore.

Am I missing something?

Maurizio

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] target: iscsi: fix a race condition when aborting a task
  2020-11-11 14:58               ` Maurizio Lombardi
@ 2020-11-11 15:37                 ` Michael Christie
  2020-11-11 15:48                   ` Maurizio Lombardi
  0 siblings, 1 reply; 17+ messages in thread
From: Michael Christie @ 2020-11-11 15:37 UTC (permalink / raw)
  To: Maurizio Lombardi
  Cc: Martin K. Petersen, linux-scsi, target-devel, bvanassche, m.lombardi85



> On Nov 11, 2020, at 8:58 AM, Maurizio Lombardi <mlombard@redhat.com> wrote:
> 
> 
> 
> Dne 11. 11. 20 v 3:16 Mike Christie napsal(a):
>> Hey, I tested this out and I do not think this will happen. We will get stuck waiting on the TMF completion for the affected cmd/cmds.
>> 
>> In conn_cmd_list we would have [CMD1 -> ABORT TMF]. Those cmds get moved to the tmp list. It might happen where CMD1's CMD_T_ABORTED bit is set, and iscsit_release_commands_from_conn will would put it back onto the conn_cmd_list. But then it will see the ABORT on the list. We will then wait on the ABORT in:
>> 
>> iscsit_release_commands_from_conn -> iscsit_free_cmd -> transport_generic_free_cmd.
> 
> Hi Mike,
> 
> I'm not sure if I understood this part.
> 
> The commands are moved to the tmp_list;
> we check for CMD_T_ABORTED and eventually move the commands from tmp_list back to conn_cmd_list
> because it's the abort task the one that should do the cleanup.

I’m not sure what you mean here. Are you saying both CMD1’s se_cmd and the ABORT’s se_cmd will have the CMD_T_ABORTED bit set and will both go through the aborted_task callout?


> 
> iscsit_release_commands_from_conn() then scans the tmp_list and calls iscsit_free_cmd()... but not against
> those commands with CMD_T_ABORTED flag set because we just moved them back to conn_cmd_list
> and aren't linked to tmp_list anymore.
> 
> Am I missing something?


If you have a SCSI READ/WRITE se_cmd (CMD1 in my example) and a ABORT se_cmd (ABORT TMF in my example) on the conn_cmd_list, then the ABORT’s se_cmd would not have the CMD_T_ABORTED bit set, right? If so, what sets it?

If the SCSI R/W has the CMD_T_ABORTED bit set, we move it it back to the conn_cmd_list and the abort code path cleans it up. But then we still have the ABORT’s se_cmd on the tmp_list. We will then call 

transport_generic_free_cmd(wait_for_tasks=true) -> __transport_wait_for_tasks(fabric_stop=true)

And wait for the ABORT to complete, and the ABORT does not complete until the last ref on the command it’s aborting completes.

If you have a LUN RESET in the mix like:

[CMD1 -> ABORT TMF -> LUN RESET TMF]

Then CMD1 and the ABORT could have their CMD_T_ABORTED bit set. core_tmr_drain_tmr_list would call __target_check_io_state during the RESET processing. However, in this case, the LUN RESET’s se_cmd would not have the bit set, so we would end up waiting like I described for that to complete. In that case though the RESET waits for the cmds and tmfs it is cleaning up.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] target: iscsi: fix a race condition when aborting a task
  2020-11-11 15:37                 ` Michael Christie
@ 2020-11-11 15:48                   ` Maurizio Lombardi
  0 siblings, 0 replies; 17+ messages in thread
From: Maurizio Lombardi @ 2020-11-11 15:48 UTC (permalink / raw)
  To: Michael Christie
  Cc: Martin K. Petersen, linux-scsi, target-devel, bvanassche, m.lombardi85



Dne 11. 11. 20 v 16:37 Michael Christie napsal(a):
> If the SCSI R/W has the CMD_T_ABORTED bit set, we move it it back to the conn_cmd_list and the abort code path cleans it up. But then we still have the ABORT’s se_cmd on the tmp_list. We will then call 
> 
> transport_generic_free_cmd(wait_for_tasks=true) -> __transport_wait_for_tasks(fabric_stop=true)
> 
> And wait for the ABORT to complete, and the ABORT does not complete until the last ref on the command it’s aborting completes.


Right. now I understand it.
Thanks.

Maurizio

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, back to index

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-07 14:53 [PATCH 0/2] fix race conditions with task aborts Maurizio Lombardi
2020-10-07 14:53 ` [PATCH 1/2] target: iscsi: prevent a race condition in iscsit_unmap_cmd() Maurizio Lombardi
2020-10-08  2:15   ` Bart Van Assche
2020-10-08  9:42     ` Maurizio Lombardi
2020-10-07 14:53 ` [PATCH 2/2] target: iscsi: fix a race condition when aborting a task Maurizio Lombardi
2020-10-22  2:42   ` Mike Christie
2020-10-27 13:49     ` Maurizio Lombardi
2020-10-27 17:54       ` Mike Christie
2020-10-27 20:03         ` Michael Christie
2020-10-28 17:09           ` Maurizio Lombardi
2020-10-28 20:37             ` Mike Christie
2020-11-10 21:29               ` Maurizio Lombardi
2020-11-10 23:08                 ` Mike Christie
2020-11-11  2:16             ` Mike Christie
2020-11-11 14:58               ` Maurizio Lombardi
2020-11-11 15:37                 ` Michael Christie
2020-11-11 15:48                   ` Maurizio Lombardi

Target-devel archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/target-devel/0 target-devel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 target-devel target-devel/ https://lore.kernel.org/target-devel \
		target-devel@vger.kernel.org
	public-inbox-index target-devel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.target-devel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git