* [PATCH 00/18] target: TMF and recovery fixes
@ 2023-03-09 22:32 Mike Christie
  2023-03-09 22:32 ` [PATCH 01/18] scsi: target: Move sess cmd counter to new struct Mike Christie
                   ` (17 more replies)
  0 siblings, 18 replies; 30+ messages in thread
From: Mike Christie @ 2023-03-09 22:32 UTC (permalink / raw)
  To: mlombard, martin.petersen, mgurtovoy, sagi, d.bogdanov,
	linux-scsi, target-devel

The following patches apply over Martin's 6.4 branches and Linus's tree.
They fix a couple of regressions/bugs in iscsit that occur when multiple
sessions access the same se_device, TMRs are executing, and a connection
is closed. They also fix some bugs in isert for the single session case
where TMRs are executing and the connection is closed, and they include
Dmitry's fix for cmd cleanup when ERL2 is used.

v4:
- Fix
[PATCH v3 07/14] scsi: target: Fix multiple LUN_RESET handling
so it doesn't add back the bug where resets can wait on each other
and deadlock.
- Updated
[PATCH v3 06/14] scsi: target: iscsit: Fix TAS handling during conn
so it handles the case where commands have passed the STOP checks and
have not yet hit them.
- Added patch to handle another hang found while testing where we wait
on the free_compl but never drop a ref due to being in the wrong state.
v3:
- Drop patch "iscsit: Fix isert disconnect handling during login"
- Add patch to drop cmd lock in completion callout
- Add patch to clean up recovery commands earlier when ERL2 is used
- Add patch to handle use-after-free in isert
v2:
- Fix wait/stop use during login failures
- Add patches to support isert

* [PATCH 01/18] scsi: target: Move sess cmd counter to new struct
  2023-03-09 22:32 [PATCH 00/18] target: TMF and recovery fixes Mike Christie
@ 2023-03-09 22:32 ` Mike Christie
  2023-03-09 22:32 ` [PATCH 02/18] scsi: target: Move cmd counter allocation Mike Christie
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 30+ messages in thread
From: Mike Christie @ 2023-03-09 22:32 UTC (permalink / raw)
  To: mlombard, martin.petersen, mgurtovoy, sagi, d.bogdanov,
	linux-scsi, target-devel
  Cc: Mike Christie

iSCSI needs to wait on outstanding commands the way the srp and FC/FCoE
drivers do. It can't use target_stop_session, because for MCS support we
can't stop the entire session during recovery: if the other connections
are ok we want to be able to continue executing IO on them.

This patch moves the per-session cmd counters to a new struct so iSCSI
can allocate them per connection. The xcopy code can also skip allocating
one in the future, since it doesn't need to track commands.
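
As a rough sketch of where this is headed (the per-counter helpers below
are only exported later in this series, and the example_* wrappers are
made up for illustration):

#include <target/target_core_base.h>
#include <target/target_core_fabric.h>

/* Today: recovery has to stop the whole session, stalling every MCS conn. */
static void example_session_wide_stop(struct se_session *se_sess)
{
        target_stop_session(se_sess);
        target_wait_for_sess_cmds(se_sess);
}

/* Goal: stop and wait only on the counter owned by the failed connection. */
static void example_per_conn_stop(struct target_cmd_counter *cmd_cnt)
{
        target_stop_cmd_counter(cmd_cnt);
        target_wait_for_cmds(cmd_cnt);
}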

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Maurizio Lombardi <mlombard@redhat.com>

---
 drivers/target/target_core_tpg.c         |   2 +-
 drivers/target/target_core_transport.c   | 135 ++++++++++++++++-------
 include/target/iscsi/iscsi_target_core.h |   1 +
 include/target/target_core_base.h        |  13 ++-
 4 files changed, 107 insertions(+), 44 deletions(-)

diff --git a/drivers/target/target_core_tpg.c b/drivers/target/target_core_tpg.c
index 736847c933e5..8ebccdbd94f0 100644
--- a/drivers/target/target_core_tpg.c
+++ b/drivers/target/target_core_tpg.c
@@ -328,7 +328,7 @@ static void target_shutdown_sessions(struct se_node_acl *acl)
 restart:
 	spin_lock_irqsave(&acl->nacl_sess_lock, flags);
 	list_for_each_entry(sess, &acl->acl_sess_list, sess_acl_list) {
-		if (atomic_read(&sess->stopped))
+		if (sess->cmd_cnt && atomic_read(&sess->cmd_cnt->stopped))
 			continue;
 
 		list_del_init(&sess->sess_acl_list);
diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index 5926316252eb..3d6034f00dcd 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -220,11 +220,49 @@ void transport_subsystem_check_init(void)
 	sub_api_initialized = 1;
 }
 
-static void target_release_sess_cmd_refcnt(struct percpu_ref *ref)
+static void target_release_cmd_refcnt(struct percpu_ref *ref)
 {
-	struct se_session *sess = container_of(ref, typeof(*sess), cmd_count);
+	struct target_cmd_counter *cmd_cnt  = container_of(ref,
+							   typeof(*cmd_cnt),
+							   refcnt);
+	wake_up(&cmd_cnt->refcnt_wq);
+}
+
+static struct target_cmd_counter *target_alloc_cmd_counter(void)
+{
+	struct target_cmd_counter *cmd_cnt;
+	int rc;
+
+	cmd_cnt = kzalloc(sizeof(*cmd_cnt), GFP_KERNEL);
+	if (!cmd_cnt)
+		return NULL;
 
-	wake_up(&sess->cmd_count_wq);
+	init_completion(&cmd_cnt->stop_done);
+	init_waitqueue_head(&cmd_cnt->refcnt_wq);
+	atomic_set(&cmd_cnt->stopped, 0);
+
+	rc = percpu_ref_init(&cmd_cnt->refcnt, target_release_cmd_refcnt, 0,
+			     GFP_KERNEL);
+	if (rc)
+		goto free_cmd_cnt;
+
+	return cmd_cnt;
+
+free_cmd_cnt:
+	kfree(cmd_cnt);
+	return NULL;
+}
+
+static void target_free_cmd_counter(struct target_cmd_counter *cmd_cnt)
+{
+	/*
+	 * Drivers like loop do not call target_stop_session during session
+	 * shutdown so we have to drop the ref taken at init time here.
+	 */
+	if (!atomic_read(&cmd_cnt->stopped))
+		percpu_ref_put(&cmd_cnt->refcnt);
+
+	percpu_ref_exit(&cmd_cnt->refcnt);
 }
 
 /**
@@ -238,25 +276,17 @@ int transport_init_session(struct se_session *se_sess)
 	INIT_LIST_HEAD(&se_sess->sess_list);
 	INIT_LIST_HEAD(&se_sess->sess_acl_list);
 	spin_lock_init(&se_sess->sess_cmd_lock);
-	init_waitqueue_head(&se_sess->cmd_count_wq);
-	init_completion(&se_sess->stop_done);
-	atomic_set(&se_sess->stopped, 0);
-	return percpu_ref_init(&se_sess->cmd_count,
-			       target_release_sess_cmd_refcnt, 0, GFP_KERNEL);
+	se_sess->cmd_cnt = target_alloc_cmd_counter();
+	if (!se_sess->cmd_cnt)
+		return -ENOMEM;
+
+	return  0;
 }
 EXPORT_SYMBOL(transport_init_session);
 
 void transport_uninit_session(struct se_session *se_sess)
 {
-	/*
-	 * Drivers like iscsi and loop do not call target_stop_session
-	 * during session shutdown so we have to drop the ref taken at init
-	 * time here.
-	 */
-	if (!atomic_read(&se_sess->stopped))
-		percpu_ref_put(&se_sess->cmd_count);
-
-	percpu_ref_exit(&se_sess->cmd_count);
+	target_free_cmd_counter(se_sess->cmd_cnt);
 }
 
 /**
@@ -2970,9 +3000,16 @@ int target_get_sess_cmd(struct se_cmd *se_cmd, bool ack_kref)
 		se_cmd->se_cmd_flags |= SCF_ACK_KREF;
 	}
 
-	if (!percpu_ref_tryget_live(&se_sess->cmd_count))
-		ret = -ESHUTDOWN;
-
+	/*
+	 * Users like xcopy do not use counters since they never do a stop
+	 * and wait.
+	 */
+	if (se_sess->cmd_cnt) {
+		if (!percpu_ref_tryget_live(&se_sess->cmd_cnt->refcnt))
+			ret = -ESHUTDOWN;
+		else
+			se_cmd->cmd_cnt = se_sess->cmd_cnt;
+	}
 	if (ret && ack_kref)
 		target_put_sess_cmd(se_cmd);
 
@@ -2993,7 +3030,7 @@ static void target_free_cmd_mem(struct se_cmd *cmd)
 static void target_release_cmd_kref(struct kref *kref)
 {
 	struct se_cmd *se_cmd = container_of(kref, struct se_cmd, cmd_kref);
-	struct se_session *se_sess = se_cmd->se_sess;
+	struct target_cmd_counter *cmd_cnt = se_cmd->cmd_cnt;
 	struct completion *free_compl = se_cmd->free_compl;
 	struct completion *abrt_compl = se_cmd->abrt_compl;
 
@@ -3004,7 +3041,8 @@ static void target_release_cmd_kref(struct kref *kref)
 	if (abrt_compl)
 		complete(abrt_compl);
 
-	percpu_ref_put(&se_sess->cmd_count);
+	if (cmd_cnt)
+		percpu_ref_put(&cmd_cnt->refcnt);
 }
 
 /**
@@ -3123,46 +3161,65 @@ void target_show_cmd(const char *pfx, struct se_cmd *cmd)
 }
 EXPORT_SYMBOL(target_show_cmd);
 
-static void target_stop_session_confirm(struct percpu_ref *ref)
+static void target_stop_cmd_counter_confirm(struct percpu_ref *ref)
+{
+	struct target_cmd_counter *cmd_cnt = container_of(ref,
+						struct target_cmd_counter,
+						refcnt);
+	complete_all(&cmd_cnt->stop_done);
+}
+
+/**
+ * target_stop_cmd_counter - Stop new IO from being added to the counter.
+ * @cmd_cnt: counter to stop
+ */
+static void target_stop_cmd_counter(struct target_cmd_counter *cmd_cnt)
 {
-	struct se_session *se_sess = container_of(ref, struct se_session,
-						  cmd_count);
-	complete_all(&se_sess->stop_done);
+	pr_debug("Stopping command counter.\n");
+	if (!atomic_cmpxchg(&cmd_cnt->stopped, 0, 1))
+		percpu_ref_kill_and_confirm(&cmd_cnt->refcnt,
+					    target_stop_cmd_counter_confirm);
 }
 
 /**
  * target_stop_session - Stop new IO from being queued on the session.
- * @se_sess:    session to stop
+ * @se_sess: session to stop
  */
 void target_stop_session(struct se_session *se_sess)
 {
-	pr_debug("Stopping session queue.\n");
-	if (atomic_cmpxchg(&se_sess->stopped, 0, 1) == 0)
-		percpu_ref_kill_and_confirm(&se_sess->cmd_count,
-					    target_stop_session_confirm);
+	target_stop_cmd_counter(se_sess->cmd_cnt);
 }
 EXPORT_SYMBOL(target_stop_session);
 
 /**
- * target_wait_for_sess_cmds - Wait for outstanding commands
- * @se_sess:    session to wait for active I/O
+ * target_wait_for_cmds - Wait for outstanding cmds.
+ * @cmd_cnt: counter to wait for active I/O for.
  */
-void target_wait_for_sess_cmds(struct se_session *se_sess)
+static void target_wait_for_cmds(struct target_cmd_counter *cmd_cnt)
 {
 	int ret;
 
-	WARN_ON_ONCE(!atomic_read(&se_sess->stopped));
+	WARN_ON_ONCE(!atomic_read(&cmd_cnt->stopped));
 
 	do {
 		pr_debug("Waiting for running cmds to complete.\n");
-		ret = wait_event_timeout(se_sess->cmd_count_wq,
-				percpu_ref_is_zero(&se_sess->cmd_count),
-				180 * HZ);
+		ret = wait_event_timeout(cmd_cnt->refcnt_wq,
+					 percpu_ref_is_zero(&cmd_cnt->refcnt),
+					 180 * HZ);
 	} while (ret <= 0);
 
-	wait_for_completion(&se_sess->stop_done);
+	wait_for_completion(&cmd_cnt->stop_done);
 	pr_debug("Waiting for cmds done.\n");
 }
+
+/**
+ * target_wait_for_sess_cmds - Wait for outstanding commands
+ * @se_sess: session to wait for active I/O
+ */
+void target_wait_for_sess_cmds(struct se_session *se_sess)
+{
+	target_wait_for_cmds(se_sess->cmd_cnt);
+}
 EXPORT_SYMBOL(target_wait_for_sess_cmds);
 
 /*
diff --git a/include/target/iscsi/iscsi_target_core.h b/include/target/iscsi/iscsi_target_core.h
index 94d06ddfd80a..229118156a1f 100644
--- a/include/target/iscsi/iscsi_target_core.h
+++ b/include/target/iscsi/iscsi_target_core.h
@@ -600,6 +600,7 @@ struct iscsit_conn {
 	struct iscsi_tpg_np	*tpg_np;
 	/* Pointer to parent session */
 	struct iscsit_session	*sess;
+	struct target_cmd_counter *cmd_cnt;
 	int			bitmap_id;
 	int			rx_thread_active;
 	struct task_struct	*rx_thread;
diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
index 12c9ba16217e..bd299790e99c 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -494,6 +494,7 @@ struct se_cmd {
 	struct se_lun		*se_lun;
 	/* Only used for internal passthrough and legacy TCM fabric modules */
 	struct se_session	*se_sess;
+	struct target_cmd_counter *cmd_cnt;
 	struct se_tmr_req	*se_tmr_req;
 	struct llist_node	se_cmd_list;
 	struct completion	*free_compl;
@@ -619,22 +620,26 @@ static inline struct se_node_acl *fabric_stat_to_nacl(struct config_item *item)
 			acl_fabric_stat_group);
 }
 
-struct se_session {
+struct target_cmd_counter {
+	struct percpu_ref	refcnt;
+	wait_queue_head_t	refcnt_wq;
+	struct completion	stop_done;
 	atomic_t		stopped;
+};
+
+struct se_session {
 	u64			sess_bin_isid;
 	enum target_prot_op	sup_prot_ops;
 	enum target_prot_type	sess_prot_type;
 	struct se_node_acl	*se_node_acl;
 	struct se_portal_group *se_tpg;
 	void			*fabric_sess_ptr;
-	struct percpu_ref	cmd_count;
 	struct list_head	sess_list;
 	struct list_head	sess_acl_list;
 	spinlock_t		sess_cmd_lock;
-	wait_queue_head_t	cmd_count_wq;
-	struct completion	stop_done;
 	void			*sess_cmd_map;
 	struct sbitmap_queue	sess_tag_pool;
+	struct target_cmd_counter *cmd_cnt;
 };
 
 struct se_device;
-- 
2.31.1



* [PATCH 02/18] scsi: target: Move cmd counter allocation
  2023-03-09 22:32 [PATCH 00/18] target: TMF and recovery fixes Mike Christie
  2023-03-09 22:32 ` [PATCH 01/18] scsi: target: Move sess cmd counter to new struct Mike Christie
@ 2023-03-09 22:32 ` Mike Christie
  2023-03-09 22:32 ` [PATCH 03/18] scsi: target: Pass in cmd counter to use during cmd setup Mike Christie
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 30+ messages in thread
From: Mike Christie @ 2023-03-09 22:32 UTC (permalink / raw)
  To: mlombard, martin.petersen, mgurtovoy, sagi, d.bogdanov,
	linux-scsi, target-devel
  Cc: Mike Christie

iSCSI needs to allocate its cmd counter per connection for MCS support,
where we need to stop and wait on the commands running on a single
connection instead of on the whole session. This moves the cmd counter
allocation to target_setup_session, which is used by the drivers that
need the stop+wait behavior per session.

xcopy doesn't need stop+wait at all, so we are ok moving the cmd counter
allocation outside of transport_init_session.
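
The resulting rule, sketched (target_alloc_cmd_counter is the helper this
patch exports; example_attach_cmd_counter is illustrative only): sessions
built through target_setup_session get a counter implicitly, while code
that builds a session by hand and still wants stop+wait semantics has to
attach one itself, as the temporary iscsi login hunk below does:

#include <linux/errno.h>
#include <target/target_core_base.h>
#include <target/target_core_fabric.h>

static int example_attach_cmd_counter(struct se_session *se_sess)
{
        /* Only needed when not going through target_setup_session(). */
        se_sess->cmd_cnt = target_alloc_cmd_counter();
        if (!se_sess->cmd_cnt)
                return -ENOMEM;
        return 0;
}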

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/target/iscsi/iscsi_target_login.c | 10 +++++
 drivers/target/target_core_internal.h     |  1 -
 drivers/target/target_core_transport.c    | 55 +++++++++++------------
 drivers/target/target_core_xcopy.c        | 15 +------
 include/target/target_core_fabric.h       |  4 +-
 5 files changed, 42 insertions(+), 43 deletions(-)

diff --git a/drivers/target/iscsi/iscsi_target_login.c b/drivers/target/iscsi/iscsi_target_login.c
index 27e448c2d066..8ab6c0107d89 100644
--- a/drivers/target/iscsi/iscsi_target_login.c
+++ b/drivers/target/iscsi/iscsi_target_login.c
@@ -324,8 +324,18 @@ static int iscsi_login_zero_tsih_s1(
 		goto free_ops;
 	}
 
+	/*
+	 * This is temp for iser. It will be moved to per conn in later
+	 * patches for iscsi.
+	 */
+	sess->se_sess->cmd_cnt = target_alloc_cmd_counter();
+	if (!sess->se_sess->cmd_cnt)
+		goto free_se_sess;
+
 	return 0;
 
+free_se_sess:
+	transport_free_session(sess->se_sess);
 free_ops:
 	kfree(sess->sess_ops);
 free_id:
diff --git a/drivers/target/target_core_internal.h b/drivers/target/target_core_internal.h
index 38a6d08f75b3..85e35cf582e5 100644
--- a/drivers/target/target_core_internal.h
+++ b/drivers/target/target_core_internal.h
@@ -138,7 +138,6 @@ int	init_se_kmem_caches(void);
 void	release_se_kmem_caches(void);
 u32	scsi_get_new_index(scsi_index_t);
 void	transport_subsystem_check_init(void);
-void	transport_uninit_session(struct se_session *);
 unsigned char *transport_dump_cmd_direction(struct se_cmd *);
 void	transport_dump_dev_state(struct se_device *, char *, int *);
 void	transport_dump_dev_info(struct se_device *, struct se_lun *,
diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index 3d6034f00dcd..60647a49a1d3 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -228,7 +228,7 @@ static void target_release_cmd_refcnt(struct percpu_ref *ref)
 	wake_up(&cmd_cnt->refcnt_wq);
 }
 
-static struct target_cmd_counter *target_alloc_cmd_counter(void)
+struct target_cmd_counter *target_alloc_cmd_counter(void)
 {
 	struct target_cmd_counter *cmd_cnt;
 	int rc;
@@ -252,6 +252,7 @@ static struct target_cmd_counter *target_alloc_cmd_counter(void)
 	kfree(cmd_cnt);
 	return NULL;
 }
+EXPORT_SYMBOL_GPL(target_alloc_cmd_counter);
 
 static void target_free_cmd_counter(struct target_cmd_counter *cmd_cnt)
 {
@@ -271,24 +272,14 @@ static void target_free_cmd_counter(struct target_cmd_counter *cmd_cnt)
  *
  * The caller must have zero-initialized @se_sess before calling this function.
  */
-int transport_init_session(struct se_session *se_sess)
+void transport_init_session(struct se_session *se_sess)
 {
 	INIT_LIST_HEAD(&se_sess->sess_list);
 	INIT_LIST_HEAD(&se_sess->sess_acl_list);
 	spin_lock_init(&se_sess->sess_cmd_lock);
-	se_sess->cmd_cnt = target_alloc_cmd_counter();
-	if (!se_sess->cmd_cnt)
-		return -ENOMEM;
-
-	return  0;
 }
 EXPORT_SYMBOL(transport_init_session);
 
-void transport_uninit_session(struct se_session *se_sess)
-{
-	target_free_cmd_counter(se_sess->cmd_cnt);
-}
-
 /**
  * transport_alloc_session - allocate a session object and initialize it
  * @sup_prot_ops: bitmask that defines which T10-PI modes are supported.
@@ -296,7 +287,6 @@ void transport_uninit_session(struct se_session *se_sess)
 struct se_session *transport_alloc_session(enum target_prot_op sup_prot_ops)
 {
 	struct se_session *se_sess;
-	int ret;
 
 	se_sess = kmem_cache_zalloc(se_sess_cache, GFP_KERNEL);
 	if (!se_sess) {
@@ -304,11 +294,7 @@ struct se_session *transport_alloc_session(enum target_prot_op sup_prot_ops)
 				" se_sess_cache\n");
 		return ERR_PTR(-ENOMEM);
 	}
-	ret = transport_init_session(se_sess);
-	if (ret < 0) {
-		kmem_cache_free(se_sess_cache, se_sess);
-		return ERR_PTR(ret);
-	}
+	transport_init_session(se_sess);
 	se_sess->sup_prot_ops = sup_prot_ops;
 
 	return se_sess;
@@ -474,8 +460,13 @@ target_setup_session(struct se_portal_group *tpg,
 		     int (*callback)(struct se_portal_group *,
 				     struct se_session *, void *))
 {
+	struct target_cmd_counter *cmd_cnt;
 	struct se_session *sess;
+	int rc;
 
+	cmd_cnt = target_alloc_cmd_counter();
+	if (!cmd_cnt)
+		return ERR_PTR(-ENOMEM);
 	/*
 	 * If the fabric driver is using percpu-ida based pre allocation
 	 * of I/O descriptor tags, go ahead and perform that setup now..
@@ -485,29 +476,36 @@ target_setup_session(struct se_portal_group *tpg,
 	else
 		sess = transport_alloc_session(prot_op);
 
-	if (IS_ERR(sess))
-		return sess;
+	if (IS_ERR(sess)) {
+		rc = PTR_ERR(sess);
+		goto free_cnt;
+	}
+	sess->cmd_cnt = cmd_cnt;
 
 	sess->se_node_acl = core_tpg_check_initiator_node_acl(tpg,
 					(unsigned char *)initiatorname);
 	if (!sess->se_node_acl) {
-		transport_free_session(sess);
-		return ERR_PTR(-EACCES);
+		rc = -EACCES;
+		goto free_sess;
 	}
 	/*
 	 * Go ahead and perform any remaining fabric setup that is
 	 * required before transport_register_session().
 	 */
 	if (callback != NULL) {
-		int rc = callback(tpg, sess, private);
-		if (rc) {
-			transport_free_session(sess);
-			return ERR_PTR(rc);
-		}
+		rc = callback(tpg, sess, private);
+		if (rc)
+			goto free_sess;
 	}
 
 	transport_register_session(tpg, sess->se_node_acl, sess, private);
 	return sess;
+
+free_sess:
+	transport_free_session(sess);
+free_cnt:
+	target_free_cmd_counter(cmd_cnt);
+	return ERR_PTR(rc);
 }
 EXPORT_SYMBOL(target_setup_session);
 
@@ -632,7 +630,8 @@ void transport_free_session(struct se_session *se_sess)
 		sbitmap_queue_free(&se_sess->sess_tag_pool);
 		kvfree(se_sess->sess_cmd_map);
 	}
-	transport_uninit_session(se_sess);
+	if (se_sess->cmd_cnt)
+		target_free_cmd_counter(se_sess->cmd_cnt);
 	kmem_cache_free(se_sess_cache, se_sess);
 }
 EXPORT_SYMBOL(transport_free_session);
diff --git a/drivers/target/target_core_xcopy.c b/drivers/target/target_core_xcopy.c
index 49eaee022ef1..49a83500c8b7 100644
--- a/drivers/target/target_core_xcopy.c
+++ b/drivers/target/target_core_xcopy.c
@@ -461,8 +461,6 @@ static const struct target_core_fabric_ops xcopy_pt_tfo = {
 
 int target_xcopy_setup_pt(void)
 {
-	int ret;
-
 	xcopy_wq = alloc_workqueue("xcopy_wq", WQ_MEM_RECLAIM, 0);
 	if (!xcopy_wq) {
 		pr_err("Unable to allocate xcopy_wq\n");
@@ -479,9 +477,7 @@ int target_xcopy_setup_pt(void)
 	INIT_LIST_HEAD(&xcopy_pt_nacl.acl_list);
 	INIT_LIST_HEAD(&xcopy_pt_nacl.acl_sess_list);
 	memset(&xcopy_pt_sess, 0, sizeof(struct se_session));
-	ret = transport_init_session(&xcopy_pt_sess);
-	if (ret < 0)
-		goto destroy_wq;
+	transport_init_session(&xcopy_pt_sess);
 
 	xcopy_pt_nacl.se_tpg = &xcopy_pt_tpg;
 	xcopy_pt_nacl.nacl_sess = &xcopy_pt_sess;
@@ -490,19 +486,12 @@ int target_xcopy_setup_pt(void)
 	xcopy_pt_sess.se_node_acl = &xcopy_pt_nacl;
 
 	return 0;
-
-destroy_wq:
-	destroy_workqueue(xcopy_wq);
-	xcopy_wq = NULL;
-	return ret;
 }
 
 void target_xcopy_release_pt(void)
 {
-	if (xcopy_wq) {
+	if (xcopy_wq)
 		destroy_workqueue(xcopy_wq);
-		transport_uninit_session(&xcopy_pt_sess);
-	}
 }
 
 /*
diff --git a/include/target/target_core_fabric.h b/include/target/target_core_fabric.h
index 38f0662476d1..65527174b8bc 100644
--- a/include/target/target_core_fabric.h
+++ b/include/target/target_core_fabric.h
@@ -133,7 +133,9 @@ struct se_session *target_setup_session(struct se_portal_group *,
 				struct se_session *, void *));
 void target_remove_session(struct se_session *);
 
-int transport_init_session(struct se_session *se_sess);
+struct target_cmd_counter *target_alloc_cmd_counter(void);
+
+void transport_init_session(struct se_session *se_sess);
 struct se_session *transport_alloc_session(enum target_prot_op);
 int transport_alloc_session_tags(struct se_session *, unsigned int,
 		unsigned int);
-- 
2.31.1



* [PATCH 03/18] scsi: target: Pass in cmd counter to use during cmd setup
  2023-03-09 22:32 [PATCH 00/18] target: TMF and recovery fixes Mike Christie
  2023-03-09 22:32 ` [PATCH 01/18] scsi: target: Move sess cmd counter to new struct Mike Christie
  2023-03-09 22:32 ` [PATCH 02/18] scsi: target: Move cmd counter allocation Mike Christie
@ 2023-03-09 22:32 ` Mike Christie
  2023-03-09 22:32 ` [PATCH 04/18] scsi: target: iscsit/isert: Alloc per conn cmd counter Mike Christie
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 30+ messages in thread
From: Mike Christie @ 2023-03-09 22:32 UTC (permalink / raw)
  To: mlombard, martin.petersen, mgurtovoy, sagi, d.bogdanov,
	linux-scsi, target-devel
  Cc: Mike Christie

This allows target_get_sess_cmd users to pass in the cmd counter they
want to use. Right now we pass in the session's cmd counter, but in the
next patch iSCSI will switch from a per-session to a per-connection
counter, so this patch is needed for that conversion.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/target/iscsi/iscsi_target.c    | 10 +++++----
 drivers/target/target_core_transport.c | 28 ++++++++++++--------------
 drivers/target/target_core_xcopy.c     |  8 ++++----
 drivers/usb/gadget/function/f_tcm.c    |  4 ++--
 include/target/target_core_fabric.h    |  8 +++++---
 5 files changed, 30 insertions(+), 28 deletions(-)

diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
index baf4da7bb3b4..87927a36f90d 100644
--- a/drivers/target/iscsi/iscsi_target.c
+++ b/drivers/target/iscsi/iscsi_target.c
@@ -1190,9 +1190,10 @@ int iscsit_setup_scsi_cmd(struct iscsit_conn *conn, struct iscsit_cmd *cmd,
 	 * Initialize struct se_cmd descriptor from target_core_mod infrastructure
 	 */
 	__target_init_cmd(&cmd->se_cmd, &iscsi_ops,
-			 conn->sess->se_sess, be32_to_cpu(hdr->data_length),
-			 cmd->data_direction, sam_task_attr,
-			 cmd->sense_buffer + 2, scsilun_to_int(&hdr->lun));
+			  conn->sess->se_sess, be32_to_cpu(hdr->data_length),
+			  cmd->data_direction, sam_task_attr,
+			  cmd->sense_buffer + 2, scsilun_to_int(&hdr->lun),
+			  conn->sess->se_sess->cmd_cnt);
 
 	pr_debug("Got SCSI Command, ITT: 0x%08x, CmdSN: 0x%08x,"
 		" ExpXferLen: %u, Length: %u, CID: %hu\n", hdr->itt,
@@ -2055,7 +2056,8 @@ iscsit_handle_task_mgt_cmd(struct iscsit_conn *conn, struct iscsit_cmd *cmd,
 	__target_init_cmd(&cmd->se_cmd, &iscsi_ops,
 			  conn->sess->se_sess, 0, DMA_NONE,
 			  TCM_SIMPLE_TAG, cmd->sense_buffer + 2,
-			  scsilun_to_int(&hdr->lun));
+			  scsilun_to_int(&hdr->lun),
+			  conn->sess->se_sess->cmd_cnt);
 
 	target_get_sess_cmd(&cmd->se_cmd, true);
 
diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index 60647a49a1d3..c395606ab1a9 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -1441,14 +1441,12 @@ target_cmd_size_check(struct se_cmd *cmd, unsigned int size)
  *
  * Preserves the value of @cmd->tag.
  */
-void __target_init_cmd(
-	struct se_cmd *cmd,
-	const struct target_core_fabric_ops *tfo,
-	struct se_session *se_sess,
-	u32 data_length,
-	int data_direction,
-	int task_attr,
-	unsigned char *sense_buffer, u64 unpacked_lun)
+void __target_init_cmd(struct se_cmd *cmd,
+		       const struct target_core_fabric_ops *tfo,
+		       struct se_session *se_sess, u32 data_length,
+		       int data_direction, int task_attr,
+		       unsigned char *sense_buffer, u64 unpacked_lun,
+		       struct target_cmd_counter *cmd_cnt)
 {
 	INIT_LIST_HEAD(&cmd->se_delayed_node);
 	INIT_LIST_HEAD(&cmd->se_qf_node);
@@ -1468,6 +1466,7 @@ void __target_init_cmd(
 	cmd->sam_task_attr = task_attr;
 	cmd->sense_buffer = sense_buffer;
 	cmd->orig_fe_lun = unpacked_lun;
+	cmd->cmd_cnt = cmd_cnt;
 
 	if (!(cmd->se_cmd_flags & SCF_USE_CPUID))
 		cmd->cpuid = raw_smp_processor_id();
@@ -1687,7 +1686,8 @@ int target_init_cmd(struct se_cmd *se_cmd, struct se_session *se_sess,
 	 * target_core_fabric_ops->queue_status() callback
 	 */
 	__target_init_cmd(se_cmd, se_tpg->se_tpg_tfo, se_sess, data_length,
-			  data_dir, task_attr, sense, unpacked_lun);
+			  data_dir, task_attr, sense, unpacked_lun,
+			  se_sess->cmd_cnt);
 
 	/*
 	 * Obtain struct se_cmd->cmd_kref reference. A second kref_get here is
@@ -1982,7 +1982,8 @@ int target_submit_tmr(struct se_cmd *se_cmd, struct se_session *se_sess,
 	BUG_ON(!se_tpg);
 
 	__target_init_cmd(se_cmd, se_tpg->se_tpg_tfo, se_sess,
-			  0, DMA_NONE, TCM_SIMPLE_TAG, sense, unpacked_lun);
+			  0, DMA_NONE, TCM_SIMPLE_TAG, sense, unpacked_lun,
+			  se_sess->cmd_cnt);
 	/*
 	 * FIXME: Currently expect caller to handle se_cmd->se_tmr_req
 	 * allocation failure.
@@ -2986,7 +2987,6 @@ EXPORT_SYMBOL(transport_generic_free_cmd);
  */
 int target_get_sess_cmd(struct se_cmd *se_cmd, bool ack_kref)
 {
-	struct se_session *se_sess = se_cmd->se_sess;
 	int ret = 0;
 
 	/*
@@ -3003,11 +3003,9 @@ int target_get_sess_cmd(struct se_cmd *se_cmd, bool ack_kref)
 	 * Users like xcopy do not use counters since they never do a stop
 	 * and wait.
 	 */
-	if (se_sess->cmd_cnt) {
-		if (!percpu_ref_tryget_live(&se_sess->cmd_cnt->refcnt))
+	if (se_cmd->cmd_cnt) {
+		if (!percpu_ref_tryget_live(&se_cmd->cmd_cnt->refcnt))
 			ret = -ESHUTDOWN;
-		else
-			se_cmd->cmd_cnt = se_sess->cmd_cnt;
 	}
 	if (ret && ack_kref)
 		target_put_sess_cmd(se_cmd);
diff --git a/drivers/target/target_core_xcopy.c b/drivers/target/target_core_xcopy.c
index 49a83500c8b7..91ed015b588c 100644
--- a/drivers/target/target_core_xcopy.c
+++ b/drivers/target/target_core_xcopy.c
@@ -591,8 +591,8 @@ static int target_xcopy_read_source(
 		(unsigned long long)src_lba, transfer_length_block, src_bytes);
 
 	__target_init_cmd(se_cmd, &xcopy_pt_tfo, &xcopy_pt_sess, src_bytes,
-			  DMA_FROM_DEVICE, 0, &xpt_cmd.sense_buffer[0], 0);
-
+			  DMA_FROM_DEVICE, 0, &xpt_cmd.sense_buffer[0], 0,
+			  NULL);
 	rc = target_xcopy_setup_pt_cmd(&xpt_cmd, xop, src_dev, &cdb[0],
 				remote_port);
 	if (rc < 0) {
@@ -636,8 +636,8 @@ static int target_xcopy_write_destination(
 		(unsigned long long)dst_lba, transfer_length_block, dst_bytes);
 
 	__target_init_cmd(se_cmd, &xcopy_pt_tfo, &xcopy_pt_sess, dst_bytes,
-			  DMA_TO_DEVICE, 0, &xpt_cmd.sense_buffer[0], 0);
-
+			  DMA_TO_DEVICE, 0, &xpt_cmd.sense_buffer[0], 0,
+			  NULL);
 	rc = target_xcopy_setup_pt_cmd(&xpt_cmd, xop, dst_dev, &cdb[0],
 				remote_port);
 	if (rc < 0) {
diff --git a/drivers/usb/gadget/function/f_tcm.c b/drivers/usb/gadget/function/f_tcm.c
index 658e2e21fdd0..c21acebe8aae 100644
--- a/drivers/usb/gadget/function/f_tcm.c
+++ b/drivers/usb/gadget/function/f_tcm.c
@@ -1054,7 +1054,7 @@ static void usbg_cmd_work(struct work_struct *work)
 				  tv_nexus->tvn_se_sess->se_tpg->se_tpg_tfo,
 				  tv_nexus->tvn_se_sess, cmd->data_len, DMA_NONE,
 				  cmd->prio_attr, cmd->sense_iu.sense,
-				  cmd->unpacked_lun);
+				  cmd->unpacked_lun, NULL);
 		goto out;
 	}
 
@@ -1183,7 +1183,7 @@ static void bot_cmd_work(struct work_struct *work)
 				  tv_nexus->tvn_se_sess->se_tpg->se_tpg_tfo,
 				  tv_nexus->tvn_se_sess, cmd->data_len, DMA_NONE,
 				  cmd->prio_attr, cmd->sense_iu.sense,
-				  cmd->unpacked_lun);
+				  cmd->unpacked_lun, NULL);
 		goto out;
 	}
 
diff --git a/include/target/target_core_fabric.h b/include/target/target_core_fabric.h
index 65527174b8bc..d507e7885f17 100644
--- a/include/target/target_core_fabric.h
+++ b/include/target/target_core_fabric.h
@@ -151,9 +151,11 @@ void	transport_deregister_session_configfs(struct se_session *);
 void	transport_deregister_session(struct se_session *);
 
 
-void	__target_init_cmd(struct se_cmd *,
-		const struct target_core_fabric_ops *,
-		struct se_session *, u32, int, int, unsigned char *, u64);
+void	__target_init_cmd(struct se_cmd *cmd,
+		const struct target_core_fabric_ops *tfo,
+		struct se_session *sess, u32 data_length, int data_direction,
+		int task_attr, unsigned char *sense_buffer, u64 unpacked_lun,
+		struct target_cmd_counter *cmd_cnt);
 int	target_init_cmd(struct se_cmd *se_cmd, struct se_session *se_sess,
 		unsigned char *sense, u64 unpacked_lun, u32 data_length,
 		int task_attr, int data_dir, int flags);
-- 
2.31.1



* [PATCH 04/18] scsi: target: iscsit/isert: Alloc per conn cmd counter
  2023-03-09 22:32 [PATCH 00/18] target: TMF and recovery fixes Mike Christie
                   ` (2 preceding siblings ...)
  2023-03-09 22:32 ` [PATCH 03/18] scsi: target: Pass in cmd counter to use during cmd setup Mike Christie
@ 2023-03-09 22:32 ` Mike Christie
  2023-03-09 22:32 ` [PATCH 05/18] scsi: target: iscsit: stop/wait on cmds during conn close Mike Christie
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 30+ messages in thread
From: Mike Christie @ 2023-03-09 22:32 UTC (permalink / raw)
  To: mlombard, martin.petersen, mgurtovoy, sagi, d.bogdanov,
	linux-scsi, target-devel
  Cc: Mike Christie

This has iscsit allocate a per-connection cmd counter and converts
iscsit/isert to use it instead of the per-session one.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/infiniband/ulp/isert/ib_isert.c   |  4 ++--
 drivers/target/iscsi/iscsi_target.c       |  4 ++--
 drivers/target/iscsi/iscsi_target_login.c | 17 +++++++----------
 drivers/target/target_core_transport.c    |  9 ++++++---
 include/target/target_core_fabric.h       |  3 +++
 5 files changed, 20 insertions(+), 17 deletions(-)

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index 75404885cf98..f290cd49698e 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -2506,8 +2506,8 @@ isert_wait4cmds(struct iscsit_conn *conn)
 	isert_info("iscsit_conn %p\n", conn);
 
 	if (conn->sess) {
-		target_stop_session(conn->sess->se_sess);
-		target_wait_for_sess_cmds(conn->sess->se_sess);
+		target_stop_cmd_counter(conn->cmd_cnt);
+		target_wait_for_cmds(conn->cmd_cnt);
 	}
 }
 
diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
index 87927a36f90d..11115c207844 100644
--- a/drivers/target/iscsi/iscsi_target.c
+++ b/drivers/target/iscsi/iscsi_target.c
@@ -1193,7 +1193,7 @@ int iscsit_setup_scsi_cmd(struct iscsit_conn *conn, struct iscsit_cmd *cmd,
 			  conn->sess->se_sess, be32_to_cpu(hdr->data_length),
 			  cmd->data_direction, sam_task_attr,
 			  cmd->sense_buffer + 2, scsilun_to_int(&hdr->lun),
-			  conn->sess->se_sess->cmd_cnt);
+			  conn->cmd_cnt);
 
 	pr_debug("Got SCSI Command, ITT: 0x%08x, CmdSN: 0x%08x,"
 		" ExpXferLen: %u, Length: %u, CID: %hu\n", hdr->itt,
@@ -2057,7 +2057,7 @@ iscsit_handle_task_mgt_cmd(struct iscsit_conn *conn, struct iscsit_cmd *cmd,
 			  conn->sess->se_sess, 0, DMA_NONE,
 			  TCM_SIMPLE_TAG, cmd->sense_buffer + 2,
 			  scsilun_to_int(&hdr->lun),
-			  conn->sess->se_sess->cmd_cnt);
+			  conn->cmd_cnt);
 
 	target_get_sess_cmd(&cmd->se_cmd, true);
 
diff --git a/drivers/target/iscsi/iscsi_target_login.c b/drivers/target/iscsi/iscsi_target_login.c
index 8ab6c0107d89..274bdd7845ca 100644
--- a/drivers/target/iscsi/iscsi_target_login.c
+++ b/drivers/target/iscsi/iscsi_target_login.c
@@ -324,18 +324,8 @@ static int iscsi_login_zero_tsih_s1(
 		goto free_ops;
 	}
 
-	/*
-	 * This is temp for iser. It will be moved to per conn in later
-	 * patches for iscsi.
-	 */
-	sess->se_sess->cmd_cnt = target_alloc_cmd_counter();
-	if (!sess->se_sess->cmd_cnt)
-		goto free_se_sess;
-
 	return 0;
 
-free_se_sess:
-	transport_free_session(sess->se_sess);
 free_ops:
 	kfree(sess->sess_ops);
 free_id:
@@ -1157,8 +1147,14 @@ static struct iscsit_conn *iscsit_alloc_conn(struct iscsi_np *np)
 		goto free_conn_cpumask;
 	}
 
+	conn->cmd_cnt = target_alloc_cmd_counter();
+	if (!conn->cmd_cnt)
+		goto free_conn_allowed_cpumask;
+
 	return conn;
 
+free_conn_allowed_cpumask:
+	free_cpumask_var(conn->allowed_cpumask);
 free_conn_cpumask:
 	free_cpumask_var(conn->conn_cpumask);
 free_conn_ops:
@@ -1172,6 +1168,7 @@ static struct iscsit_conn *iscsit_alloc_conn(struct iscsi_np *np)
 
 void iscsit_free_conn(struct iscsit_conn *conn)
 {
+	target_free_cmd_counter(conn->cmd_cnt);
 	free_cpumask_var(conn->allowed_cpumask);
 	free_cpumask_var(conn->conn_cpumask);
 	kfree(conn->conn_ops);
diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index c395606ab1a9..86adff2a86ed 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -254,7 +254,7 @@ struct target_cmd_counter *target_alloc_cmd_counter(void)
 }
 EXPORT_SYMBOL_GPL(target_alloc_cmd_counter);
 
-static void target_free_cmd_counter(struct target_cmd_counter *cmd_cnt)
+void target_free_cmd_counter(struct target_cmd_counter *cmd_cnt)
 {
 	/*
 	 * Drivers like loop do not call target_stop_session during session
@@ -265,6 +265,7 @@ static void target_free_cmd_counter(struct target_cmd_counter *cmd_cnt)
 
 	percpu_ref_exit(&cmd_cnt->refcnt);
 }
+EXPORT_SYMBOL_GPL(target_free_cmd_counter);
 
 /**
  * transport_init_session - initialize a session object
@@ -3170,13 +3171,14 @@ static void target_stop_cmd_counter_confirm(struct percpu_ref *ref)
  * target_stop_cmd_counter - Stop new IO from being added to the counter.
  * @cmd_cnt: counter to stop
  */
-static void target_stop_cmd_counter(struct target_cmd_counter *cmd_cnt)
+void target_stop_cmd_counter(struct target_cmd_counter *cmd_cnt)
 {
 	pr_debug("Stopping command counter.\n");
 	if (!atomic_cmpxchg(&cmd_cnt->stopped, 0, 1))
 		percpu_ref_kill_and_confirm(&cmd_cnt->refcnt,
 					    target_stop_cmd_counter_confirm);
 }
+EXPORT_SYMBOL_GPL(target_stop_cmd_counter);
 
 /**
  * target_stop_session - Stop new IO from being queued on the session.
@@ -3192,7 +3194,7 @@ EXPORT_SYMBOL(target_stop_session);
  * target_wait_for_cmds - Wait for outstanding cmds.
  * @cmd_cnt: counter to wait for active I/O for.
  */
-static void target_wait_for_cmds(struct target_cmd_counter *cmd_cnt)
+void target_wait_for_cmds(struct target_cmd_counter *cmd_cnt)
 {
 	int ret;
 
@@ -3208,6 +3210,7 @@ static void target_wait_for_cmds(struct target_cmd_counter *cmd_cnt)
 	wait_for_completion(&cmd_cnt->stop_done);
 	pr_debug("Waiting for cmds done.\n");
 }
+EXPORT_SYMBOL_GPL(target_wait_for_cmds);
 
 /**
  * target_wait_for_sess_cmds - Wait for outstanding commands
diff --git a/include/target/target_core_fabric.h b/include/target/target_core_fabric.h
index d507e7885f17..b188b1e90e1e 100644
--- a/include/target/target_core_fabric.h
+++ b/include/target/target_core_fabric.h
@@ -133,7 +133,10 @@ struct se_session *target_setup_session(struct se_portal_group *,
 				struct se_session *, void *));
 void target_remove_session(struct se_session *);
 
+void target_stop_cmd_counter(struct target_cmd_counter *cmd_cnt);
+void target_wait_for_cmds(struct target_cmd_counter *cmd_cnt);
 struct target_cmd_counter *target_alloc_cmd_counter(void);
+void target_free_cmd_counter(struct target_cmd_counter *cmd_cnt);
 
 void transport_init_session(struct se_session *se_sess);
 struct se_session *transport_alloc_session(enum target_prot_op);
-- 
2.31.1



* [PATCH 05/18] scsi: target: iscsit: stop/wait on cmds during conn close
  2023-03-09 22:32 [PATCH 00/18] target: TMF and recovery fixes Mike Christie
                   ` (3 preceding siblings ...)
  2023-03-09 22:32 ` [PATCH 04/18] scsi: target: iscsit/isert: Alloc per conn cmd counter Mike Christie
@ 2023-03-09 22:32 ` Mike Christie
  2023-03-09 22:33 ` [PATCH 06/18] scsi: target: Drop t_state_lock use in compare_and_write_post Mike Christie
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 30+ messages in thread
From: Mike Christie @ 2023-03-09 22:32 UTC (permalink / raw)
  To: mlombard, martin.petersen, mgurtovoy, sagi, d.bogdanov,
	linux-scsi, target-devel
  Cc: Mike Christie

This fixes a bug added in:

commit f36199355c64 ("scsi: target: iscsi: Fix cmd abort fabric stop
race")

If we have multiple sessions to the same se_device we can hit a race where
a LUN_RESET on one session cleans up the se_cmds from under another
session which is being closed. This results in the closing session freeing
its conn/session structs while they are still in use.

The bug is:

1. Session1 has IO se_cmd1.
2. Session2 can also have se_cmds for IO and optionally TMRs for ABORTS
but then gets a LUN_RESET.
3. The LUN_RESET on session2 sees the se_cmds on session1 and during
the drain stages marks them all with CMD_T_ABORTED.
4. session1 is now closed so iscsit_release_commands_from_conn only sees
se_cmds with the CMD_T_ABORTED bit set and returns immediately even
though we have outstanding commands.
5. session1's connection and session are freed.
6. The backend request for se_cmd1 completes and it accesses the freed
connection/session.

This hooks the iscsit layer into the cmd counter code, so we can wait for
all outstanding se_cmds before freeing the connection.
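
For a picture of why the per-connection counter closes this race (the
actual change is only the small hunk below; example_wait_for_conn_cmds is
illustrative only):

#include <target/iscsi/iscsi_target_core.h>
#include <target/target_core_fabric.h>

static void example_wait_for_conn_cmds(struct iscsit_conn *conn)
{
        /*
         * Every se_cmd submitted on this connection took a reference on
         * conn->cmd_cnt in target_get_sess_cmd() and only drops it from
         * target_release_cmd_kref() when the command is truly done,
         * regardless of whether another session's LUN_RESET marked it
         * CMD_T_ABORTED.
         */
        target_stop_cmd_counter(conn->cmd_cnt); /* no new commands */
        target_wait_for_cmds(conn->cmd_cnt);    /* step 6 can't outlive the conn */
}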

Fixes: f36199355c64 ("scsi: target: iscsi: Fix cmd abort fabric stop race")
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Maurizio Lombardi <mlombard@redhat.com>

---
 drivers/target/iscsi/iscsi_target.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
index 11115c207844..83b007141229 100644
--- a/drivers/target/iscsi/iscsi_target.c
+++ b/drivers/target/iscsi/iscsi_target.c
@@ -4245,6 +4245,16 @@ static void iscsit_release_commands_from_conn(struct iscsit_conn *conn)
 		iscsit_free_cmd(cmd, true);
 
 	}
+
+	/*
+	 * Wait on commands that were cleaned up via the aborted_task path.
+	 * LLDs that implement iscsit_wait_conn will already have waited for
+	 * commands.
+	 */
+	if (!conn->conn_transport->iscsit_wait_conn) {
+		target_stop_cmd_counter(conn->cmd_cnt);
+		target_wait_for_cmds(conn->cmd_cnt);
+	}
 }
 
 static void iscsit_stop_timers_for_cmds(
-- 
2.31.1



* [PATCH 06/18] scsi: target: Drop t_state_lock use in compare_and_write_post
  2023-03-09 22:32 [PATCH 00/18] target: TMF and recovery fixes Mike Christie
                   ` (4 preceding siblings ...)
  2023-03-09 22:32 ` [PATCH 05/18] scsi: target: iscsit: stop/wait on cmds during conn close Mike Christie
@ 2023-03-09 22:33 ` Mike Christie
  2023-03-09 22:33 ` [PATCH 07/18] scsi: target: Treat CMD_T_FABRIC_STOP like CMD_T_STOP Mike Christie
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 30+ messages in thread
From: Mike Christie @ 2023-03-09 22:33 UTC (permalink / raw)
  To: mlombard, martin.petersen, mgurtovoy, sagi, d.bogdanov,
	linux-scsi, target-devel
  Cc: Mike Christie

We don't need t_state_lock in compare_and_write_post while checking
scsi_status, because we have just set it before calling the callback and
there is no chance of racing with other accesses.

This patch removes the locking there so that in the next patch we can
hold the lock when calling the callbacks.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/target/target_core_sbc.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/target/target_core_sbc.c b/drivers/target/target_core_sbc.c
index 7536ca797606..c1cf37a1b4ce 100644
--- a/drivers/target/target_core_sbc.c
+++ b/drivers/target/target_core_sbc.c
@@ -352,14 +352,12 @@ static sense_reason_t compare_and_write_post(struct se_cmd *cmd, bool success,
 	struct se_device *dev = cmd->se_dev;
 	sense_reason_t ret = TCM_NO_SENSE;
 
-	spin_lock_irq(&cmd->t_state_lock);
 	if (success) {
 		*post_ret = 1;
 
 		if (cmd->scsi_status == SAM_STAT_CHECK_CONDITION)
 			ret = TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
 	}
-	spin_unlock_irq(&cmd->t_state_lock);
 
 	/*
 	 * Unlock ->caw_sem originally obtained during sbc_compare_and_write()
-- 
2.31.1



* [PATCH 07/18] scsi: target: Treat CMD_T_FABRIC_STOP like CMD_T_STOP
  2023-03-09 22:32 [PATCH 00/18] target: TMF and recovery fixes Mike Christie
                   ` (5 preceding siblings ...)
  2023-03-09 22:33 ` [PATCH 06/18] scsi: target: Drop t_state_lock use in compare_and_write_post Mike Christie
@ 2023-03-09 22:33 ` Mike Christie
  2023-03-15 10:47   ` Dmitry Bogdanov
  2023-03-09 22:33 ` [PATCH 08/18] scsi: target: iscsit: Add helper to check when cmd has failed Mike Christie
                   ` (10 subsequent siblings)
  17 siblings, 1 reply; 30+ messages in thread
From: Mike Christie @ 2023-03-09 22:33 UTC (permalink / raw)
  To: mlombard, martin.petersen, mgurtovoy, sagi, d.bogdanov,
	linux-scsi, target-devel
  Cc: Mike Christie

iscsit will set CMD_T_FABRIC_STOP on running commands when its transport
connection is down and it can't send/recv IO (the tx/rx threads are
killed, or the cleanup thread is run from the one that's still up). It
will then loop over the running commands and wait for LIO core to
complete them, or clean them up if they were on an internal queue waiting
to be sent or acked.

Currently, CMD_T_FABRIC_STOP only stops TMRs from operating on the
command, but for isert we need to prevent LIO core from calling into the
iscsit callouts when the connection is being brought down. If LIO core
queues commands to iscsit and they end up on an internal queue instead of
being passed back to the driver, we can end up hanging on a command
completion that never occurs, because the command is stuck on the
internal list (the tx thread is stopped at this time, so it will never
loop over the response list and call into isert). We also want to sync up
on a point where we no longer call into isert so it can clean up its
structs.

This has LIO core treat CMD_T_FABRIC_STOP like CMD_T_STOP during command
execution, and it also fixes the locking around the target_cmd_interrupted
calls so fabric modules can make sure cmds are never marked both
CMD_T_COMPLETE and CMD_T_STOP|CMD_T_FABRIC_STOP.
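
A sketch of what the locking change buys a fabric module during teardown
(example_mark_fabric_stop is illustrative; iscsit does roughly this when
it marks commands in its connection-close path):

#include <linux/spinlock.h>
#include <target/target_core_base.h>

static bool example_mark_fabric_stop(struct se_cmd *se_cmd)
{
        unsigned long flags;
        bool marked = false;

        spin_lock_irqsave(&se_cmd->t_state_lock, flags);
        if (!(se_cmd->transport_state & CMD_T_COMPLETE)) {
                /*
                 * target_cmd_interrupted() now runs under t_state_lock too,
                 * so the command can no longer end up with both
                 * CMD_T_COMPLETE and CMD_T_FABRIC_STOP set.
                 */
                se_cmd->transport_state |= CMD_T_FABRIC_STOP;
                marked = true;
        }
        spin_unlock_irqrestore(&se_cmd->t_state_lock, flags);
        return marked;
}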

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/target/target_core_sbc.c       |  2 +-
 drivers/target/target_core_transport.c | 27 +++++++++++++++-----------
 2 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/drivers/target/target_core_sbc.c b/drivers/target/target_core_sbc.c
index c1cf37a1b4ce..ff1ae779543f 100644
--- a/drivers/target/target_core_sbc.c
+++ b/drivers/target/target_core_sbc.c
@@ -457,7 +457,7 @@ static sense_reason_t compare_and_write_callback(struct se_cmd *cmd, bool succes
 		 * we don't have to perform the write operation.
 		 */
 		WARN_ON(!(cmd->transport_state &
-			(CMD_T_ABORTED | CMD_T_STOP)));
+			(CMD_T_ABORTED | CMD_T_STOP | CMD_T_FABRIC_STOP)));
 		goto out;
 	}
 	/*
diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index 86adff2a86ed..1c23079a5d7f 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -737,8 +737,8 @@ static int transport_cmd_check_stop_to_fabric(struct se_cmd *cmd)
 	 * Determine if frontend context caller is requesting the stopping of
 	 * this command for frontend exceptions.
 	 */
-	if (cmd->transport_state & CMD_T_STOP) {
-		pr_debug("%s:%d CMD_T_STOP for ITT: 0x%08llx\n",
+	if (cmd->transport_state & (CMD_T_STOP | CMD_T_FABRIC_STOP)) {
+		pr_debug("%s:%d CMD_T_STOP|CMD_T_FABRIC_STOP for ITT: 0x%08llx\n",
 			__func__, __LINE__, cmd->tag);
 
 		spin_unlock_irqrestore(&cmd->t_state_lock, flags);
@@ -889,7 +889,7 @@ static bool target_cmd_interrupted(struct se_cmd *cmd)
 		INIT_WORK(&cmd->work, target_abort_work);
 		queue_work(target_completion_wq, &cmd->work);
 		return true;
-	} else if (cmd->transport_state & CMD_T_STOP) {
+	} else if (cmd->transport_state & (CMD_T_STOP | CMD_T_FABRIC_STOP)) {
 		if (cmd->transport_complete_callback)
 			cmd->transport_complete_callback(cmd, false, &post_ret);
 		complete_all(&cmd->t_transport_stop_comp);
@@ -907,13 +907,15 @@ void target_complete_cmd_with_sense(struct se_cmd *cmd, u8 scsi_status,
 	int success, cpu;
 	unsigned long flags;
 
-	if (target_cmd_interrupted(cmd))
+	spin_lock_irqsave(&cmd->t_state_lock, flags);
+	if (target_cmd_interrupted(cmd)) {
+		spin_unlock_irqrestore(&cmd->t_state_lock, flags);
 		return;
+	}
 
 	cmd->scsi_status = scsi_status;
 	cmd->sense_reason = sense_reason;
 
-	spin_lock_irqsave(&cmd->t_state_lock, flags);
 	switch (cmd->scsi_status) {
 	case SAM_STAT_CHECK_CONDITION:
 		if (cmd->se_cmd_flags & SCF_TRANSPORT_TASK_SENSE)
@@ -2277,10 +2279,12 @@ void target_execute_cmd(struct se_cmd *cmd)
 	 *
 	 * If the received CDB has already been aborted stop processing it here.
 	 */
-	if (target_cmd_interrupted(cmd))
+	spin_lock_irq(&cmd->t_state_lock);
+	if (target_cmd_interrupted(cmd)) {
+		spin_unlock_irq(&cmd->t_state_lock);
 		return;
+	}
 
-	spin_lock_irq(&cmd->t_state_lock);
 	cmd->t_state = TRANSPORT_PROCESSING;
 	cmd->transport_state |= CMD_T_ACTIVE | CMD_T_SENT;
 	spin_unlock_irq(&cmd->t_state_lock);
@@ -2847,9 +2851,9 @@ transport_generic_new_cmd(struct se_cmd *cmd)
 	 * Determine if frontend context caller is requesting the stopping of
 	 * this command for frontend exceptions.
 	 */
-	if (cmd->transport_state & CMD_T_STOP &&
+	if (cmd->transport_state & (CMD_T_STOP | CMD_T_FABRIC_STOP) &&
 	    !cmd->se_tfo->write_pending_must_be_called) {
-		pr_debug("%s:%d CMD_T_STOP for ITT: 0x%08llx\n",
+		pr_debug("%s:%d CMD_T_STOP|CMD_T_FABRIC_STOP for ITT: 0x%08llx\n",
 			 __func__, __LINE__, cmd->tag);
 
 		spin_unlock_irqrestore(&cmd->t_state_lock, flags);
@@ -2880,11 +2884,12 @@ static void transport_write_pending_qf(struct se_cmd *cmd)
 	bool stop;
 
 	spin_lock_irqsave(&cmd->t_state_lock, flags);
-	stop = (cmd->transport_state & (CMD_T_STOP | CMD_T_ABORTED));
+	stop = (cmd->transport_state &
+		(CMD_T_STOP | CMD_T_FABRIC_STOP | CMD_T_ABORTED));
 	spin_unlock_irqrestore(&cmd->t_state_lock, flags);
 
 	if (stop) {
-		pr_debug("%s:%d CMD_T_STOP|CMD_T_ABORTED for ITT: 0x%08llx\n",
+		pr_debug("%s:%d CMD_T_STOP|CMD_T_FABRIC_STOP|CMD_T_ABORTED for ITT: 0x%08llx\n",
 			__func__, __LINE__, cmd->tag);
 		complete_all(&cmd->t_transport_stop_comp);
 		return;
-- 
2.31.1



* [PATCH 08/18] scsi: target: iscsit: Add helper to check when cmd has failed
  2023-03-09 22:32 [PATCH 00/18] target: TMF and recovery fixes Mike Christie
                   ` (6 preceding siblings ...)
  2023-03-09 22:33 ` [PATCH 07/18] scsi: target: Treat CMD_T_FABRIC_STOP like CMD_T_STOP Mike Christie
@ 2023-03-09 22:33 ` Mike Christie
  2023-03-09 22:33 ` [PATCH 09/18] scsi: target: iscsit: Cleanup isert commands at conn closure Mike Christie
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 30+ messages in thread
From: Mike Christie @ 2023-03-09 22:33 UTC (permalink / raw)
  To: mlombard, martin.petersen, mgurtovoy, sagi, d.bogdanov,
	linux-scsi, target-devel
  Cc: Mike Christie

This moves the check in lio_queue_status to a new helper so other code
can use it to check for commands that were failed by LIO core or iscsit.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Maurizio Lombardi <mlombard@redhat.com>

---
 drivers/target/iscsi/iscsi_target_configfs.c | 3 +--
 include/target/iscsi/iscsi_target_core.h     | 5 +++++
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/target/iscsi/iscsi_target_configfs.c b/drivers/target/iscsi/iscsi_target_configfs.c
index 5d0f51822414..82c1d335c369 100644
--- a/drivers/target/iscsi/iscsi_target_configfs.c
+++ b/drivers/target/iscsi/iscsi_target_configfs.c
@@ -1411,9 +1411,8 @@ static int lio_queue_status(struct se_cmd *se_cmd)
 
 	cmd->i_state = ISTATE_SEND_STATUS;
 
-	if (cmd->se_cmd.scsi_status || cmd->sense_reason) {
+	if (iscsit_cmd_failed(cmd))
 		return iscsit_add_cmd_to_response_queue(cmd, conn, cmd->i_state);
-	}
 	return conn->conn_transport->iscsit_queue_status(conn, cmd);
 }
 
diff --git a/include/target/iscsi/iscsi_target_core.h b/include/target/iscsi/iscsi_target_core.h
index 229118156a1f..938dee8b7a51 100644
--- a/include/target/iscsi/iscsi_target_core.h
+++ b/include/target/iscsi/iscsi_target_core.h
@@ -913,6 +913,11 @@ static inline u32 session_get_next_ttt(struct iscsit_session *session)
 	return ttt;
 }
 
+static inline bool iscsit_cmd_failed(struct iscsit_cmd *cmd)
+{
+	return cmd->se_cmd.scsi_status || cmd->sense_reason;
+}
+
 extern struct iscsit_cmd *iscsit_find_cmd_from_itt(struct iscsit_conn *, itt_t);
 
 extern void iscsit_thread_check_cpumask(struct iscsit_conn *conn,
-- 
2.31.1



* [PATCH 09/18] scsi: target: iscsit: Cleanup isert commands at conn closure
  2023-03-09 22:32 [PATCH 00/18] target: TMF and recovery fixes Mike Christie
                   ` (7 preceding siblings ...)
  2023-03-09 22:33 ` [PATCH 08/18] scsi: target: iscsit: Add helper to check when cmd has failed Mike Christie
@ 2023-03-09 22:33 ` Mike Christie
  2023-03-09 22:33 ` [PATCH 10/18] IB/isert: Fix hang in target_wait_for_cmds Mike Christie
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 30+ messages in thread
From: Mike Christie @ 2023-03-09 22:33 UTC (permalink / raw)
  To: mlombard, martin.petersen, mgurtovoy, sagi, d.bogdanov,
	linux-scsi, target-devel
  Cc: Mike Christie

Currently, isert does target_wait_for_cmds before iscsit calls
iscsit_release_commands_from_conn, because we can race where LIO core
calls into isert when it wants to clean up the connection. The wait
prevents isert from freeing the connection while it is still trying to
post responses, but it can result in a hang during connection closure if
there are se_cmds on the iscsit response queue, because when isert calls
target_wait_for_cmds the tx thread is stopped or we are running the wait
from it.

For example, this is hit when a command times out on the initiator, the
initiator sends an ABORT, and then the connection is closed. When the
command completes it will be placed on the response queue if TAS is set,
and the ABORT response will be placed on the response queue as well. So
at the very least we will hang waiting on the last put of the ABORT's
se_cmd, which will never happen.

This patch adds support to iscsit so it can handle both isert and iscsit
commands that are still running during connection closure, giving us a
common place for the cleanup code.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/target/iscsi/iscsi_target.c      | 33 ++++++++++++++++++------
 drivers/target/iscsi/iscsi_target_util.c |  8 +++++-
 2 files changed, 32 insertions(+), 9 deletions(-)

diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
index 83b007141229..2e9c0d7b36a9 100644
--- a/drivers/target/iscsi/iscsi_target.c
+++ b/drivers/target/iscsi/iscsi_target.c
@@ -4230,6 +4230,15 @@ static void iscsit_release_commands_from_conn(struct iscsit_conn *conn)
 				 */
 				list_move_tail(&cmd->i_conn_node,
 					       &conn->conn_cmd_list);
+			} else if (conn->sess->sess_ops->RDMAExtensions &&
+				   (se_cmd->transport_state & CMD_T_COMPLETE) &&
+				   !iscsit_cmd_failed(cmd)) {
+				/*
+				 * isert is still handling these cmds so wait in
+				 * target_wait_for_cmds.
+				 */
+				list_move_tail(&cmd->i_conn_node,
+					       &conn->conn_cmd_list);
 			} else {
 				se_cmd->transport_state |= CMD_T_FABRIC_STOP;
 			}
@@ -4242,19 +4251,27 @@ static void iscsit_release_commands_from_conn(struct iscsit_conn *conn)
 		list_del_init(&cmd->i_conn_node);
 
 		iscsit_increment_maxcmdsn(cmd, sess);
+		/*
+		 * Free cmds that:
+		 * 1. we have not got acks for.
+		 * 2. are (or will be when the backend completes them) stuck
+		 * on the response/immediate queue (failed cmds, TMRs, iscsi
+		 * reqs).
+		 * 3. completed ok on the backend, but hit the CMD_T_FABRIC_STOP
+		 * or CMD_T_STOP checks.
+		 */
 		iscsit_free_cmd(cmd, true);
-
 	}
 
 	/*
-	 * Wait on commands that were cleaned up via the aborted_task path.
-	 * LLDs that implement iscsit_wait_conn will already have waited for
-	 * commands.
+	 * We need to wait:
+	 * 1. for commands that are being cleaned up via the aborted_task path.
+	 * 2. for isert we need to wait for iscsit_queue_status calls
+	 * that posted a response after the ib_drain_qp call returned but
+	 * have not yet called isert_send_done.
 	 */
-	if (!conn->conn_transport->iscsit_wait_conn) {
-		target_stop_cmd_counter(conn->cmd_cnt);
-		target_wait_for_cmds(conn->cmd_cnt);
-	}
+	target_stop_cmd_counter(conn->cmd_cnt);
+	target_wait_for_cmds(conn->cmd_cnt);
 }
 
 static void iscsit_stop_timers_for_cmds(
diff --git a/drivers/target/iscsi/iscsi_target_util.c b/drivers/target/iscsi/iscsi_target_util.c
index 26dc8ed3045b..b0d7d6c73a1c 100644
--- a/drivers/target/iscsi/iscsi_target_util.c
+++ b/drivers/target/iscsi/iscsi_target_util.c
@@ -753,7 +753,13 @@ void iscsit_free_cmd(struct iscsit_cmd *cmd, bool shutdown)
 	if (se_cmd) {
 		rc = transport_generic_free_cmd(se_cmd, shutdown);
 		if (!rc && shutdown && se_cmd->se_sess) {
-			__iscsit_free_cmd(cmd, shutdown);
+			struct iscsit_conn *conn = cmd->conn;
+			/*
+			 * The command wasn't aborted via ABORT_TASK but didn't
+			 * reach the driver so allow it to cleanup resources
+			 * now.
+			 */
+			conn->conn_transport->iscsit_aborted_task(conn, cmd);
 			target_put_sess_cmd(se_cmd);
 		}
 	} else {
-- 
2.31.1



* [PATCH 10/18] IB/isert: Fix hang in target_wait_for_cmds
  2023-03-09 22:32 [PATCH 00/18] target: TMF and recovery fixes Mike Christie
                   ` (8 preceding siblings ...)
  2023-03-09 22:33 ` [PATCH 09/18] scsi: target: iscsit: Cleanup isert commands at conn closure Mike Christie
@ 2023-03-09 22:33 ` Mike Christie
  2023-03-09 22:33 ` [PATCH 11/18] IB/isert: Fix use after free during conn cleanup Mike Christie
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 30+ messages in thread
From: Mike Christie @ 2023-03-09 22:33 UTC (permalink / raw)
  To: mlombard, martin.petersen, mgurtovoy, sagi, d.bogdanov,
	linux-scsi, target-devel
  Cc: Mike Christie

This removes the target_wait_for_cmds call from isert to fix a hang that
occurs when isert calls target_wait_for_cmds to wait on running commands
but ends up also waiting on failed SCSI commands or TMR responses that
are sitting on the iscsit response queue. When isert_wait_conn is called
the tx thread is down, so the response queue will not be processed and
the target_wait_for_cmds call will never wake up.

This is safe because iscsit can now handle cleaning up both iscsit and
isert commands that are running/completing or stuck on the response
queue.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/infiniband/ulp/isert/ib_isert.c | 12 ------------
 1 file changed, 12 deletions(-)

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index f290cd49698e..516fa37494e1 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -2500,17 +2500,6 @@ isert_wait4logout(struct isert_conn *isert_conn)
 	}
 }
 
-static void
-isert_wait4cmds(struct iscsit_conn *conn)
-{
-	isert_info("iscsit_conn %p\n", conn);
-
-	if (conn->sess) {
-		target_stop_cmd_counter(conn->cmd_cnt);
-		target_wait_for_cmds(conn->cmd_cnt);
-	}
-}
-
 /**
  * isert_put_unsol_pending_cmds() - Drop commands waiting for
  *     unsolicitate dataout
@@ -2558,7 +2547,6 @@ static void isert_wait_conn(struct iscsit_conn *conn)
 
 	ib_drain_qp(isert_conn->qp);
 	isert_put_unsol_pending_cmds(conn);
-	isert_wait4cmds(conn);
 	isert_wait4logout(isert_conn);
 
 	queue_work(isert_release_wq, &isert_conn->release_work);
-- 
2.31.1



* [PATCH 11/18] IB/isert: Fix use after free during conn cleanup
  2023-03-09 22:32 [PATCH 00/18] target: TMF and recovery fixes Mike Christie
                   ` (9 preceding siblings ...)
  2023-03-09 22:33 ` [PATCH 10/18] IB/isert: Fix hang in target_wait_for_cmds Mike Christie
@ 2023-03-09 22:33 ` Mike Christie
  2023-03-15 15:21   ` Sagi Grimberg
  2023-03-09 22:33 ` [PATCH 12/18] scsi: target: iscsit: free cmds before session free Mike Christie
                   ` (6 subsequent siblings)
  17 siblings, 1 reply; 30+ messages in thread
From: Mike Christie @ 2023-03-09 22:33 UTC (permalink / raw)
  To: mlombard, martin.petersen, mgurtovoy, sagi, d.bogdanov,
	linux-scsi, target-devel
  Cc: Mike Christie

We can end up freeing a command too early during conn cleanup by doing:

1. Send isert N 1MB writes.
2. Pull the cable or down the port isert is using.
3. isert_wait_conn's call to isert_put_unsol_pending_cmds races
with a running isert_send_done -> isert_completion_put -> isert_put_cmd,
where isert_put_unsol_pending_cmds sees the cmd on the conn_cmd_list
and does an isert_put_cmd which frees the cmd. Then isert_send_done
runs and accesses the freed command while doing its normal command
completion:
isert_completion_put -> isert_put_cmd -> transport_generic_free_cmd

This patch has us set write_data_done to the full transfer length, so
isert_put_unsol_pending_cmds can correctly detect commands which we
will not be calling isert_send_done for.
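
As a rough illustration (not code from this patch), a command should only
count as waiting for unsolicited data-out while write_data_done is short of
the transfer length; setting it to se_cmd->data_length takes it out of that
set. The exact condition the driver uses may differ - this is an assumption:

        /*
         * Hypothetical, simplified check: after the fix the RDMA READ
         * completion leaves write_data_done == se_cmd.data_length, so
         * this returns false and the cmd is left for isert_send_done
         * to put.
         */
        static bool isert_cmd_waiting_for_dataout(struct iscsit_cmd *cmd)
        {
                return cmd->write_data_done < cmd->se_cmd.data_length;
        }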

Fixes: 38a2d0d429f1 ("IB/isert: convert to the generic RDMA READ/WRITE API")
[code written and suggested by]
Suggested-by: Sagi Grimberg <sagi@grimberg.me>

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Maurizio Lombardi <mlombard@redhat.com>

---
 drivers/infiniband/ulp/isert/ib_isert.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index 516fa37494e1..a44da60352f6 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -1636,7 +1636,7 @@ isert_rdma_read_done(struct ib_cq *cq, struct ib_wc *wc)
 	if (isert_prot_cmd(isert_conn, se_cmd))
 		ret = isert_check_pi_status(se_cmd, isert_cmd->rw.reg->mr);
 	isert_rdma_rw_ctx_destroy(isert_cmd, isert_conn);
-	cmd->write_data_done = 0;
+	cmd->write_data_done = se_cmd->data_length; /* done fetching data */
 
 	isert_dbg("Cmd: %p RDMA_READ comp calling execute_cmd\n", isert_cmd);
 	spin_lock_bh(&cmd->istate_lock);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 12/18] scsi: target: iscsit: free cmds before session free
  2023-03-09 22:32 [PATCH 00/18] target: TMF and recovery fixes Mike Christie
                   ` (10 preceding siblings ...)
  2023-03-09 22:33 ` [PATCH 11/18] IB/isert: Fix use after free during conn cleanup Mike Christie
@ 2023-03-09 22:33 ` Mike Christie
  2023-03-09 22:33 ` [PATCH 13/18] scsi: target: Fix multiple LUN_RESET handling Mike Christie
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 30+ messages in thread
From: Mike Christie @ 2023-03-09 22:33 UTC (permalink / raw)
  To: mlombard, martin.petersen, mgurtovoy, sagi, d.bogdanov,
	linux-scsi, target-devel
  Cc: Forza, Mike Christie

From: Dmitry Bogdanov <d.bogdanov@yadro.com>

Commands from recovery entries are freed after their session has been
closed. That leads to a use-after-free at command free, or a NULL pointer
dereference with a call trace such as:

Time2Retain timer expired for SID: 1, cleaning up iSCSI session.
BUG: kernel NULL pointer dereference, address: 0000000000000140
RIP: 0010:sbitmap_queue_clear+0x3a/0xa0
Call Trace:
 target_release_cmd_kref+0xd1/0x1f0 [target_core_mod]
 transport_generic_free_cmd+0xd1/0x180 [target_core_mod]
 iscsit_free_cmd+0x53/0xd0 [iscsi_target_mod]
 iscsit_free_connection_recovery_entries+0x29d/0x320 [iscsi_target_mod]
 iscsit_close_session+0x13a/0x140 [iscsi_target_mod]
 iscsit_check_post_dataout+0x440/0x440 [iscsi_target_mod]
 call_timer_fn+0x24/0x140

Move the cleanup of recovery entries to before the session is freed.

Reported-by: Forza <forza@tnonline.net>
Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Maurizio Lombardi <mlombard@redhat.com>
---
 drivers/target/iscsi/iscsi_target.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
index 2e9c0d7b36a9..cd9ef668a054 100644
--- a/drivers/target/iscsi/iscsi_target.c
+++ b/drivers/target/iscsi/iscsi_target.c
@@ -4546,6 +4546,9 @@ int iscsit_close_session(struct iscsit_session *sess, bool can_sleep)
 	iscsit_stop_time2retain_timer(sess);
 	spin_unlock_bh(&se_tpg->session_lock);
 
+	if (sess->sess_ops->ErrorRecoveryLevel == 2)
+		iscsit_free_connection_recovery_entries(sess);
+
 	/*
 	 * transport_deregister_session_configfs() will clear the
 	 * struct se_node_acl->nacl_sess pointer now as a iscsi_np process context
@@ -4569,9 +4572,6 @@ int iscsit_close_session(struct iscsit_session *sess, bool can_sleep)
 
 	transport_deregister_session(sess->se_sess);
 
-	if (sess->sess_ops->ErrorRecoveryLevel == 2)
-		iscsit_free_connection_recovery_entries(sess);
-
 	iscsit_free_all_ooo_cmdsns(sess);
 
 	spin_lock_bh(&se_tpg->session_lock);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 13/18] scsi: target: Fix multiple LUN_RESET handling
  2023-03-09 22:32 [PATCH 00/18] target: TMF and recovery fixes Mike Christie
                   ` (11 preceding siblings ...)
  2023-03-09 22:33 ` [PATCH 12/18] scsi: target: iscsit: free cmds before session free Mike Christie
@ 2023-03-09 22:33 ` Mike Christie
  2023-03-15 16:13   ` Dmitry Bogdanov
  2023-03-09 22:33 ` [PATCH 14/18] scsi: target: Don't set CMD_T_FABRIC_STOP for aborted tasks Mike Christie
                   ` (4 subsequent siblings)
  17 siblings, 1 reply; 30+ messages in thread
From: Mike Christie @ 2023-03-09 22:33 UTC (permalink / raw)
  To: mlombard, martin.petersen, mgurtovoy, sagi, d.bogdanov,
	linux-scsi, target-devel
  Cc: Mike Christie

This fixes a bug where an initiator thinks a LUN_RESET has cleaned
up running commands when it hasn't. The bug was added in:

commit 51ec502a3266 ("target: Delete tmr from list before processing")

The problem occurs when:

1. We have N IO cmds running in the target layer spread over 2 sessions.
2. The initiator sends a LUN_RESET for each session.
3. session1's LUN_RESET loops over all the running commands from both
sessions and moves them to its local drain_task_list.
4. session2's LUN_RESET does not see the LUN_RESET from session1 because
the commit above has it remove itself. session2 also does not see any
commands since the other reset moved them off the state lists.
5. session2's LUN_RESET will then complete with a successful response.
6. session2's initiator believes the running commands on its session are
now cleaned up due to the successful response and cleans up the running
commands from its side. It then restarts them.
7. The commands do eventually complete on the backend and the target
starts to return aborted task statuses for them. The initiator will
either throw an invalid ITT error or might accidentally look up a new task
if the ITT has been reallocated already.

This fixes the bug by reverting the patch, and also serializes the
execution of LUN_RESETs and Preempt and Aborts. The latter is necessary
because it turns out the commit accidentally fixed a bug where if there
are 2 LUN RESETs executing they can see each other on the dev_tmr_list,
put the other one on their local drain list, then end up waiting on each
other resulting in a deadlock.
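
Condensed, the serialization the diff below implements looks roughly like
this (a sketch only, not the actual hunks):

        /*
         * transport_generic_handle_tmr(): only let one LUN_RESET run at
         * a time.  A reset that arrives while another is running stays
         * queued on dev->lun_reset_tmr_list and is kicked later.
         */
        spin_lock_irqsave(&dev->se_tmr_lock, flags);
        if (cmd->se_tmr_req->function == TMR_LUN_RESET) {
                if (dev->dev_flags & DF_RESETTING_LUN) {
                        spin_unlock_irqrestore(&dev->se_tmr_lock, flags);
                        return 0;       /* the running reset will schedule us */
                }
                dev->dev_flags |= DF_RESETTING_LUN;
        }
        spin_unlock_irqrestore(&dev->se_tmr_lock, flags);

        /*
         * target_tmr_work(), once a LUN_RESET finishes: either clear the
         * flag or schedule the next queued reset (the diff open-codes the
         * "another reset is queued" check).
         */
        spin_lock_irq(&dev->se_tmr_lock);
        if (another_lun_reset_queued(dev))      /* hypothetical helper */
                sched_reset = true;
        else
                dev->dev_flags &= ~DF_RESETTING_LUN;
        spin_unlock_irq(&dev->se_tmr_lock);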

Fixes: 51ec502a3266 ("target: Delete tmr from list before processing")
Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/target/target_core_device.c    | 15 ++++++--
 drivers/target/target_core_tmr.c       | 15 ++++----
 drivers/target/target_core_transport.c | 50 ++++++++++++++++++++++++--
 include/target/target_core_base.h      |  5 ++-
 4 files changed, 74 insertions(+), 11 deletions(-)

diff --git a/drivers/target/target_core_device.c b/drivers/target/target_core_device.c
index f6e58410ec3f..c9f75ed1566b 100644
--- a/drivers/target/target_core_device.c
+++ b/drivers/target/target_core_device.c
@@ -179,7 +179,16 @@ int transport_lookup_tmr_lun(struct se_cmd *se_cmd)
 	se_tmr->tmr_dev = rcu_dereference_raw(se_lun->lun_se_dev);
 
 	spin_lock_irqsave(&se_tmr->tmr_dev->se_tmr_lock, flags);
-	list_add_tail(&se_tmr->tmr_list, &se_tmr->tmr_dev->dev_tmr_list);
+	switch (se_tmr->function) {
+	case TMR_ABORT_TASK:
+		list_add_tail(&se_tmr->tmr_list,
+			      &se_tmr->tmr_dev->generic_tmr_list);
+		break;
+	case TMR_LUN_RESET:
+		list_add_tail(&se_tmr->tmr_list,
+			      &se_tmr->tmr_dev->lun_reset_tmr_list);
+		break;
+	}
 	spin_unlock_irqrestore(&se_tmr->tmr_dev->se_tmr_lock, flags);
 
 	return 0;
@@ -761,7 +770,8 @@ struct se_device *target_alloc_device(struct se_hba *hba, const char *name)
 	dev->hba_index = hba->hba_index;
 
 	INIT_LIST_HEAD(&dev->dev_sep_list);
-	INIT_LIST_HEAD(&dev->dev_tmr_list);
+	INIT_LIST_HEAD(&dev->generic_tmr_list);
+	INIT_LIST_HEAD(&dev->lun_reset_tmr_list);
 	INIT_LIST_HEAD(&dev->delayed_cmd_list);
 	INIT_LIST_HEAD(&dev->qf_cmd_list);
 	spin_lock_init(&dev->delayed_cmd_lock);
@@ -782,6 +792,7 @@ struct se_device *target_alloc_device(struct se_hba *hba, const char *name)
 	spin_lock_init(&dev->t10_alua.lba_map_lock);
 
 	INIT_WORK(&dev->delayed_cmd_work, target_do_delayed_work);
+	mutex_init(&dev->lun_reset_mutex);
 
 	dev->t10_wwn.t10_dev = dev;
 	/*
diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c
index 2b95b4550a63..88d2a7839876 100644
--- a/drivers/target/target_core_tmr.c
+++ b/drivers/target/target_core_tmr.c
@@ -184,13 +184,11 @@ static void core_tmr_drain_tmr_list(
 	unsigned long flags;
 	bool rc;
 	/*
-	 * Release all pending and outgoing TMRs aside from the received
-	 * LUN_RESET tmr..
+	 * Release all pending and outgoing TMRs except for LUN_RESETS.
 	 */
 	spin_lock_irqsave(&dev->se_tmr_lock, flags);
-	if (tmr)
-		list_del_init(&tmr->tmr_list);
-	list_for_each_entry_safe(tmr_p, tmr_pp, &dev->dev_tmr_list, tmr_list) {
+	list_for_each_entry_safe(tmr_p, tmr_pp, &dev->generic_tmr_list,
+				 tmr_list) {
 		cmd = tmr_p->task_cmd;
 		if (!cmd) {
 			pr_err("Unable to locate struct se_cmd for TMR\n");
@@ -379,14 +377,19 @@ int core_tmr_lun_reset(
 				tmr_nacl->initiatorname);
 		}
 	}
+
+	/* Serialize LUN RESET TMRs and preempt and aborts */
+	mutex_lock(&dev->lun_reset_mutex);
+
 	pr_debug("LUN_RESET: %s starting for [%s], tas: %d\n",
 		(preempt_and_abort_list) ? "Preempt" : "TMR",
 		dev->transport->name, tas);
-
 	core_tmr_drain_tmr_list(dev, tmr, preempt_and_abort_list);
 	core_tmr_drain_state_list(dev, prout_cmd, tmr_sess, tas,
 				preempt_and_abort_list);
 
+	mutex_unlock(&dev->lun_reset_mutex);
+
 	/*
 	 * Clear any legacy SPC-2 reservation when called during
 	 * LOGICAL UNIT RESET
diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index 1c23079a5d7f..3c732b1b5389 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -3574,6 +3574,7 @@ static void target_tmr_work(struct work_struct *work)
 	struct se_cmd *cmd = container_of(work, struct se_cmd, work);
 	struct se_device *dev = cmd->se_dev;
 	struct se_tmr_req *tmr = cmd->se_tmr_req;
+	bool sched_reset = false;
 	int ret;
 
 	if (cmd->transport_state & CMD_T_ABORTED)
@@ -3596,6 +3597,22 @@ static void target_tmr_work(struct work_struct *work)
 			target_dev_ua_allocate(dev, 0x29,
 					       ASCQ_29H_BUS_DEVICE_RESET_FUNCTION_OCCURRED);
 		}
+
+		/*
+		 * If this is the last reset the device can be freed after we
+		 * run transport_cmd_check_stop_to_fabric. Figure out if there
+		 * are other resets that need to be scheduled while we know we
+		 * have a refcount on the device.
+		 */
+		spin_lock_irq(&dev->se_tmr_lock);
+		if (list_first_entry(&dev->lun_reset_tmr_list,
+				     struct se_tmr_req, tmr_list) !=
+		    list_last_entry(&dev->lun_reset_tmr_list,
+				    struct se_tmr_req, tmr_list))
+			sched_reset = true;
+		else
+			dev->dev_flags &= ~DF_RESETTING_LUN;
+		spin_unlock_irq(&dev->se_tmr_lock);
 		break;
 	case TMR_TARGET_WARM_RESET:
 		tmr->response = TMR_FUNCTION_REJECTED;
@@ -3617,15 +3634,26 @@ static void target_tmr_work(struct work_struct *work)
 
 	transport_lun_remove_cmd(cmd);
 	transport_cmd_check_stop_to_fabric(cmd);
+
+	if (!sched_reset)
+		return;
+
+	spin_lock_irq(&dev->se_tmr_lock);
+	tmr = list_first_entry(&dev->lun_reset_tmr_list, struct se_tmr_req,
+			       tmr_list);
+	spin_unlock_irq(&dev->se_tmr_lock);
+
+	INIT_WORK(&tmr->task_cmd->work, target_tmr_work);
+	schedule_work(&tmr->task_cmd->work);
 	return;
 
 aborted:
 	target_handle_abort(cmd);
 }
 
-int transport_generic_handle_tmr(
-	struct se_cmd *cmd)
+int transport_generic_handle_tmr(struct se_cmd *cmd)
 {
+	struct se_device *dev = cmd->se_dev;
 	unsigned long flags;
 	bool aborted = false;
 
@@ -3646,8 +3674,26 @@ int transport_generic_handle_tmr(
 		return 0;
 	}
 
+	spin_lock_irqsave(&dev->se_tmr_lock, flags);
+	if (cmd->se_tmr_req->function == TMR_LUN_RESET) {
+		/*
+		 * We only allow one reset to execute at a time to prevent
+		 * one reset waiting on another, and to make sure one reset
+		 * does not claim all the cmds causing the other reset to
+		 * return early.
+		 */
+		if (dev->dev_flags & DF_RESETTING_LUN) {
+			spin_unlock_irqrestore(&dev->se_tmr_lock, flags);
+			goto done;
+		}
+
+		dev->dev_flags |= DF_RESETTING_LUN;
+	}
+	spin_unlock_irqrestore(&dev->se_tmr_lock, flags);
+
 	INIT_WORK(&cmd->work, target_tmr_work);
 	schedule_work(&cmd->work);
+done:
 	return 0;
 }
 EXPORT_SYMBOL(transport_generic_handle_tmr);
diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
index bd299790e99c..0a5b51f8e5e8 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -804,6 +804,7 @@ struct se_device {
 #define DF_USING_UDEV_PATH			0x00000008
 #define DF_USING_ALIAS				0x00000010
 #define DF_READ_ONLY				0x00000020
+#define DF_RESETTING_LUN			0x00000040
 	u8			transport_flags;
 	/* Physical device queue depth */
 	u32			queue_depth;
@@ -840,7 +841,8 @@ struct se_device {
 	/* Used for SPC-3 Persistent Reservations */
 	struct t10_pr_registration *dev_pr_res_holder;
 	struct list_head	dev_sep_list;
-	struct list_head	dev_tmr_list;
+	struct list_head	generic_tmr_list;
+	struct list_head	lun_reset_tmr_list;
 	struct work_struct	qf_work_queue;
 	struct work_struct	delayed_cmd_work;
 	struct list_head	delayed_cmd_list;
@@ -872,6 +874,7 @@ struct se_device {
 	struct rcu_head		rcu_head;
 	int			queue_cnt;
 	struct se_device_queue	*queues;
+	struct mutex		lun_reset_mutex;
 };
 
 struct target_opcode_descriptor {
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 14/18] scsi: target: Don't set CMD_T_FABRIC_STOP for aborted tasks
  2023-03-09 22:32 [PATCH 00/18] target: TMF and recovery fixes Mike Christie
                   ` (12 preceding siblings ...)
  2023-03-09 22:33 ` [PATCH 13/18] scsi: target: Fix multiple LUN_RESET handling Mike Christie
@ 2023-03-09 22:33 ` Mike Christie
  2023-03-09 22:33 ` [PATCH 15/18] scsi: target: iscsit: Fix TAS handling during conn cleanup Mike Christie
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 30+ messages in thread
From: Mike Christie @ 2023-03-09 22:33 UTC (permalink / raw)
  To: mlombard, martin.petersen, mgurtovoy, sagi, d.bogdanov,
	linux-scsi, target-devel
  Cc: Mike Christie

We don't want to set CMD_T_FABRIC_STOP on commands with CMD_T_ABORTED
set, because we can end up with hangs where:

1. __transport_wait_for_tasks sets CMD_T_FABRIC_STOP but because
CMD_T_ABORTED is set we hit the:

        if (fabric_stop && *aborted)
                return false;

and wait in transport_generic_free_cmd's aborted wait on the free_compl.

2. transport_cmd_check_stop_to_fabric sees CMD_T_FABRIC_STOP set and
does the complete_all call on the t_transport_stop_comp and returns.

3. We are now hung because we are waiting on the free_compl, which will
never complete, because transport_cmd_check_stop_to_fabric completed the
t_transport_stop_comp and did not make the check_stop_free call that
normally drops the refcount needed for the free_compl to be woken up.

This patch has us either set the CMD_T_FABRIC_STOP bit and wait on the
t_transport_stop_comp, or, if CMD_T_ABORTED is already set, leave
CMD_T_FABRIC_STOP unset and wait on the free_compl completion.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/target/target_core_transport.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index 3c732b1b5389..1e42fd3ac8a8 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -3245,11 +3245,14 @@ __transport_wait_for_tasks(struct se_cmd *cmd, bool fabric_stop,
 {
 	lockdep_assert_held(&cmd->t_state_lock);
 
-	if (fabric_stop)
-		cmd->transport_state |= CMD_T_FABRIC_STOP;
-
 	if (cmd->transport_state & CMD_T_ABORTED)
 		*aborted = true;
+	/*
+	 * We can be either CMD_T_FABRIC_STOP and wait below or CMD_T_ABORTED
+	 * and return early below and wait in transport_generic_free_cmd.
+	 */
+	if (fabric_stop && !*aborted)
+		cmd->transport_state |= CMD_T_FABRIC_STOP;
 
 	if (cmd->transport_state & CMD_T_TAS)
 		*tas = true;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 15/18] scsi: target: iscsit: Fix TAS handling during conn cleanup
  2023-03-09 22:32 [PATCH 00/18] target: TMF and recovery fixes Mike Christie
                   ` (13 preceding siblings ...)
  2023-03-09 22:33 ` [PATCH 14/18] scsi: target: Don't set CMD_T_FABRIC_STOP for aborted tasks Mike Christie
@ 2023-03-09 22:33 ` Mike Christie
  2023-03-09 22:33 ` [PATCH 16/18] scsi: target: drop tas arg from __transport_wait_for_tasks Mike Christie
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 30+ messages in thread
From: Mike Christie @ 2023-03-09 22:33 UTC (permalink / raw)
  To: mlombard, martin.petersen, mgurtovoy, sagi, d.bogdanov,
	linux-scsi, target-devel
  Cc: Mike Christie

This fixes a bug added in:

commit f36199355c64 ("scsi: target: iscsi: Fix cmd abort fabric stop
race")

If CMD_T_TAS is set on the se_cmd we must call iscsit_free_cmd to do the
last put on the cmd and free it, because the connection is down and we
will not end up sending the response and doing the put from the normal IO
path. This patch adds a check for CMD_T_TAS in
iscsit_release_commands_from_conn so we now detect this case and run
iscsit_free_cmd.

Fixes: f36199355c64 ("scsi: target: iscsi: Fix cmd abort fabric stop race")
Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/target/iscsi/iscsi_target.c | 52 ++++++++++++++++-------------
 1 file changed, 29 insertions(+), 23 deletions(-)

diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
index cd9ef668a054..a64b984b9dca 100644
--- a/drivers/target/iscsi/iscsi_target.c
+++ b/drivers/target/iscsi/iscsi_target.c
@@ -4220,30 +4220,36 @@ static void iscsit_release_commands_from_conn(struct iscsit_conn *conn)
 	list_for_each_entry_safe(cmd, cmd_tmp, &tmp_list, i_conn_node) {
 		struct se_cmd *se_cmd = &cmd->se_cmd;
 
-		if (se_cmd->se_tfo != NULL) {
-			spin_lock_irq(&se_cmd->t_state_lock);
-			if (se_cmd->transport_state & CMD_T_ABORTED) {
-				/*
-				 * LIO's abort path owns the cleanup for this,
-				 * so put it back on the list and let
-				 * aborted_task handle it.
-				 */
-				list_move_tail(&cmd->i_conn_node,
-					       &conn->conn_cmd_list);
-			} else if (conn->sess->sess_ops->RDMAExtensions &&
-				   (se_cmd->transport_state & CMD_T_COMPLETE) &&
-				   !iscsit_cmd_failed(cmd)) {
-				/*
-				 * isert is still handling these cmds so wait in
-				 * target_wait_for_cmds.
-				 */
+		if (!se_cmd->se_tfo)
+			continue;
+
+		spin_lock_irq(&se_cmd->t_state_lock);
+		if (se_cmd->transport_state & CMD_T_ABORTED) {
+			/*
+			 * If TAS is set, we complete it similar to a normal
+			 * cmd and will wait on and free it below. We don't
+			 * set any of the STOP bits, so we work like other
+			 * drivers and wait on the cmd's free_compl.
+			 *
+			 * If TAS is not set, then LIO's abort path owns the
+			 * cleanup, so we put it back on the list and let
+			 * the aborted_task path handle it.
+			 */
+			if (!(se_cmd->transport_state & CMD_T_TAS))
 				list_move_tail(&cmd->i_conn_node,
 					       &conn->conn_cmd_list);
-			} else {
-				se_cmd->transport_state |= CMD_T_FABRIC_STOP;
-			}
-			spin_unlock_irq(&se_cmd->t_state_lock);
+		} else if (conn->sess->sess_ops->RDMAExtensions &&
+			   (se_cmd->transport_state & CMD_T_COMPLETE) &&
+			   !iscsit_cmd_failed(cmd)) {
+			/*
+			 * isert is still handling these cmds so wait in
+			 * target_wait_for_cmds.
+			 */
+			list_move_tail(&cmd->i_conn_node, &conn->conn_cmd_list);
+		} else {
+			se_cmd->transport_state |= CMD_T_FABRIC_STOP;
 		}
+		spin_unlock_irq(&se_cmd->t_state_lock);
 	}
 	spin_unlock_bh(&conn->cmd_lock);
 
@@ -4256,7 +4262,7 @@ static void iscsit_release_commands_from_conn(struct iscsit_conn *conn)
 		 * 1. we have not got acks for.
 		 * 2. are (or will be when the backend completes them) stuck
 		 * on the response/immediate queue (failed cmds, TMRs, iscsi
-		 * reqs).
+		 * reqs, aborted cmds with TAS).
 		 * 3. completed ok on the backend, but hit the CMD_T_FABRIC_STOP
 		 * or CMD_T_STOP checks.
 		 */
@@ -4265,7 +4271,7 @@ static void iscsit_release_commands_from_conn(struct iscsit_conn *conn)
 
 	/*
 	 * We need to wait:
-	 * 1. for commands that are being cleaned up via the aborted_task path.
+	 * 1. for commands that are being aborted with no TAS.
 	 * 2. for isert we need to wait for iscsit_queue_status calls
 	 * that posted a response after the ib_drain_qp call returned but
 	 * have not yet called isert_send_done.
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 16/18] scsi: target: drop tas arg from __transport_wait_for_tasks
  2023-03-09 22:32 [PATCH 00/18] target: TMF and recovery fixes Mike Christie
                   ` (14 preceding siblings ...)
  2023-03-09 22:33 ` [PATCH 15/18] scsi: target: iscsit: Fix TAS handling during conn cleanup Mike Christie
@ 2023-03-09 22:33 ` Mike Christie
  2023-03-09 22:33 ` [PATCH 17/18] scsi: target: Remove sess_cmd_lock Mike Christie
  2023-03-09 22:33 ` [PATCH 18/18] scsi: target: Move tag pr_debug to before we do a put on the cmd Mike Christie
  17 siblings, 0 replies; 30+ messages in thread
From: Mike Christie @ 2023-03-09 22:33 UTC (permalink / raw)
  To: mlombard, martin.petersen, mgurtovoy, sagi, d.bogdanov,
	linux-scsi, target-devel
  Cc: Mike Christie

The tas arg is no longer used by callers of __transport_wait_for_tasks,
so drop it. Also fix up the comment about us skipping the put when TAS is
not set, because that was not correct: we always did the put.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
---
 drivers/target/target_core_transport.c | 22 ++++++++--------------
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index 1e42fd3ac8a8..692104325b38 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -2904,15 +2904,14 @@ static void transport_write_pending_qf(struct se_cmd *cmd)
 }
 
 static bool
-__transport_wait_for_tasks(struct se_cmd *, bool, bool *, bool *,
-			   unsigned long *flags);
+__transport_wait_for_tasks(struct se_cmd *, bool, bool *, unsigned long *flags);
 
-static void target_wait_free_cmd(struct se_cmd *cmd, bool *aborted, bool *tas)
+static void target_wait_free_cmd(struct se_cmd *cmd, bool *aborted)
 {
 	unsigned long flags;
 
 	spin_lock_irqsave(&cmd->t_state_lock, flags);
-	__transport_wait_for_tasks(cmd, true, aborted, tas, &flags);
+	__transport_wait_for_tasks(cmd, true, aborted, &flags);
 	spin_unlock_irqrestore(&cmd->t_state_lock, flags);
 }
 
@@ -2946,8 +2945,6 @@ void target_put_cmd_and_wait(struct se_cmd *cmd)
  *   CMD_T_COMPLETE has been set.
  * - CMD_T_ABORTED is set atomically after the CMD_T_COMPLETE check for
  *   commands that will be aborted.
- * - If the CMD_T_ABORTED flag is set but CMD_T_TAS has not been set
- *   transport_generic_free_cmd() skips its call to target_put_sess_cmd().
  * - For aborted commands for which CMD_T_TAS has been set .queue_status() will
  *   be called and will drop a reference.
  * - For aborted commands for which CMD_T_TAS has not been set .aborted_task()
@@ -2957,10 +2954,10 @@ int transport_generic_free_cmd(struct se_cmd *cmd, int wait_for_tasks)
 {
 	DECLARE_COMPLETION_ONSTACK(compl);
 	int ret = 0;
-	bool aborted = false, tas = false;
+	bool aborted = false;
 
 	if (wait_for_tasks)
-		target_wait_free_cmd(cmd, &aborted, &tas);
+		target_wait_free_cmd(cmd, &aborted);
 
 	if (cmd->se_cmd_flags & SCF_SE_LUN_CMD) {
 		/*
@@ -3239,7 +3236,7 @@ void transport_clear_lun_ref(struct se_lun *lun)
 
 static bool
 __transport_wait_for_tasks(struct se_cmd *cmd, bool fabric_stop,
-			   bool *aborted, bool *tas, unsigned long *flags)
+			   bool *aborted, unsigned long *flags)
 	__releases(&cmd->t_state_lock)
 	__acquires(&cmd->t_state_lock)
 {
@@ -3254,9 +3251,6 @@ __transport_wait_for_tasks(struct se_cmd *cmd, bool fabric_stop,
 	if (fabric_stop && !*aborted)
 		cmd->transport_state |= CMD_T_FABRIC_STOP;
 
-	if (cmd->transport_state & CMD_T_TAS)
-		*tas = true;
-
 	if (!(cmd->se_cmd_flags & SCF_SE_LUN_CMD) &&
 	    !(cmd->se_cmd_flags & SCF_SCSI_TMR_CDB))
 		return false;
@@ -3297,10 +3291,10 @@ __transport_wait_for_tasks(struct se_cmd *cmd, bool fabric_stop,
 bool transport_wait_for_tasks(struct se_cmd *cmd)
 {
 	unsigned long flags;
-	bool ret, aborted = false, tas = false;
+	bool ret, aborted = false;
 
 	spin_lock_irqsave(&cmd->t_state_lock, flags);
-	ret = __transport_wait_for_tasks(cmd, false, &aborted, &tas, &flags);
+	ret = __transport_wait_for_tasks(cmd, false, &aborted, &flags);
 	spin_unlock_irqrestore(&cmd->t_state_lock, flags);
 
 	return ret;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 17/18] scsi: target: Remove sess_cmd_lock
  2023-03-09 22:32 [PATCH 00/18] target: TMF and recovery fixes Mike Christie
                   ` (15 preceding siblings ...)
  2023-03-09 22:33 ` [PATCH 16/18] scsi: target: drop tas arg from __transport_wait_for_tasks Mike Christie
@ 2023-03-09 22:33 ` Mike Christie
  2023-03-09 22:33 ` [PATCH 18/18] scsi: target: Move tag pr_debug to before we do a put on the cmd Mike Christie
  17 siblings, 0 replies; 30+ messages in thread
From: Mike Christie @ 2023-03-09 22:33 UTC (permalink / raw)
  To: mlombard, martin.petersen, mgurtovoy, sagi, d.bogdanov,
	linux-scsi, target-devel
  Cc: Mike Christie

We used to use the sess_cmd_lock to grab a ref to the se_cmd and add it to
the se_cmd_list if the session was not being stopped. And if the cmd was
being completed while __target_check_io_state was being run, we held
the lock while taking a ref, because we were looping over the se_cmd_list
and because we didn't remove the cmd from the list until after the last
ref was released in target_release_cmd_kref.

In:

Commit 6f55b06f9b07 ("scsi: target: Drop sess_cmd_lock from I/O path")

we switched from the sess list and lock use to the per cpu session
cmd_count, so the lock was no longer needed, but it didn't get removed
from __target_check_io_state. This patch removes the last target uses of
sess_cmd_lock.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/target/target_core_tmr.c       | 11 -----------
 drivers/target/target_core_transport.c |  1 -
 include/target/target_core_base.h      |  1 -
 3 files changed, 13 deletions(-)

diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c
index 88d2a7839876..6fa037ffc119 100644
--- a/drivers/target/target_core_tmr.c
+++ b/drivers/target/target_core_tmr.c
@@ -71,10 +71,6 @@ static int target_check_cdb_and_preempt(struct list_head *list,
 static bool __target_check_io_state(struct se_cmd *se_cmd,
 				    struct se_session *tmr_sess, bool tas)
 {
-	struct se_session *sess = se_cmd->se_sess;
-
-	lockdep_assert_held(&sess->sess_cmd_lock);
-
 	/*
 	 * If command already reached CMD_T_COMPLETE state within
 	 * target_complete_cmd() or CMD_T_FABRIC_STOP due to shutdown,
@@ -137,9 +133,7 @@ void core_tmr_abort_task(
 			pr_err("ABORT_TASK: Found referenced %s task_tag: %llu\n",
 			       se_cmd->se_tfo->fabric_name, ref_tag);
 
-			spin_lock(&se_sess->sess_cmd_lock);
 			rc = __target_check_io_state(se_cmd, se_sess, 0);
-			spin_unlock(&se_sess->sess_cmd_lock);
 			if (!rc)
 				continue;
 
@@ -206,10 +200,7 @@ static void core_tmr_drain_tmr_list(
 		if (WARN_ON_ONCE(!sess))
 			continue;
 
-		spin_lock(&sess->sess_cmd_lock);
 		rc = __target_check_io_state(cmd, sess, 0);
-		spin_unlock(&sess->sess_cmd_lock);
-
 		if (!rc) {
 			printk("LUN_RESET TMR: non-zero kref_get_unless_zero\n");
 			continue;
@@ -310,9 +301,7 @@ static void core_tmr_drain_state_list(
 			if (WARN_ON_ONCE(!sess))
 				continue;
 
-			spin_lock(&sess->sess_cmd_lock);
 			rc = __target_check_io_state(cmd, tmr_sess, tas);
-			spin_unlock(&sess->sess_cmd_lock);
 			if (!rc)
 				continue;
 
diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index 692104325b38..c260cb60f5cb 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -277,7 +277,6 @@ void transport_init_session(struct se_session *se_sess)
 {
 	INIT_LIST_HEAD(&se_sess->sess_list);
 	INIT_LIST_HEAD(&se_sess->sess_acl_list);
-	spin_lock_init(&se_sess->sess_cmd_lock);
 }
 EXPORT_SYMBOL(transport_init_session);
 
diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
index 0a5b51f8e5e8..845db96f50dd 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -636,7 +636,6 @@ struct se_session {
 	void			*fabric_sess_ptr;
 	struct list_head	sess_list;
 	struct list_head	sess_acl_list;
-	spinlock_t		sess_cmd_lock;
 	void			*sess_cmd_map;
 	struct sbitmap_queue	sess_tag_pool;
 	struct target_cmd_counter *cmd_cnt;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 18/18] scsi: target: Move tag pr_debug to before we do a put on the cmd
  2023-03-09 22:32 [PATCH 00/18] target: TMF and recovery fixes Mike Christie
                   ` (16 preceding siblings ...)
  2023-03-09 22:33 ` [PATCH 17/18] scsi: target: Remove sess_cmd_lock Mike Christie
@ 2023-03-09 22:33 ` Mike Christie
  17 siblings, 0 replies; 30+ messages in thread
From: Mike Christie @ 2023-03-09 22:33 UTC (permalink / raw)
  To: mlombard, martin.petersen, mgurtovoy, sagi, d.bogdanov,
	linux-scsi, target-devel
  Cc: Mike Christie

We have to print the tag before we do the put, because the put could
free the cmd.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/target/target_core_transport.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index c260cb60f5cb..a39ffb8133f2 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -2970,11 +2970,12 @@ int transport_generic_free_cmd(struct se_cmd *cmd, int wait_for_tasks)
 		if (cmd->se_lun)
 			transport_lun_remove_cmd(cmd);
 	}
-	if (aborted)
+	if (aborted) {
+		pr_debug("Detected CMD_T_ABORTED for ITT: %llu\n", cmd->tag);
 		cmd->free_compl = &compl;
+	}
 	ret = target_put_sess_cmd(cmd);
 	if (aborted) {
-		pr_debug("Detected CMD_T_ABORTED for ITT: %llu\n", cmd->tag);
 		wait_for_completion(&compl);
 		ret = 1;
 	}
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 07/18] scsi: target: Treat CMD_T_FABRIC_STOP like CMD_T_STOP
  2023-03-09 22:33 ` [PATCH 07/18] scsi: target: Treat CMD_T_FABRIC_STOP like CMD_T_STOP Mike Christie
@ 2023-03-15 10:47   ` Dmitry Bogdanov
  2023-03-15 22:54     ` Mike Christie
  0 siblings, 1 reply; 30+ messages in thread
From: Dmitry Bogdanov @ 2023-03-15 10:47 UTC (permalink / raw)
  To: Mike Christie
  Cc: mlombard, martin.petersen, mgurtovoy, sagi, linux-scsi, target-devel

On Thu, Mar 09, 2023 at 04:33:01PM -0600, Mike Christie wrote:
> 
> iscsit will set CMD_T_FABRIC_STOP on running commands when its transport
> connection is down and it can't send/recv IO (tx/rx threads are killed
> or the cleanup thread is run from the one that's up). It will then loop
> over running commands and wait for LIO core to complete them or clean
> them up if they were on an internal queue waiting to be sent or ackd.

The current usage of CMD_T_FABRIC_STOP and CMD_T_ABORTED is to
distinguish whether a command will be aborted or finished at connection
release. Technically, that determines who is in charge of decreasing the
command's kref.

The current usage of CMD_T_FABRIC_STOP and CMD_T_ABORTED is race free:
it checks and *changes* the state under a lock. They are mutually
exclusive.
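
For reference, a minimal sketch of that check-and-change pattern under the
lock, as in the conn-release code touched later in this series
(illustrative only):

        spin_lock_irq(&se_cmd->t_state_lock);
        if (se_cmd->transport_state & CMD_T_ABORTED) {
                /* the abort path owns the final put on this cmd */
        } else {
                /* conn release owns it: mark it stopped for the fabric */
                se_cmd->transport_state |= CMD_T_FABRIC_STOP;
        }
        spin_unlock_irq(&se_cmd->t_state_lock);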

> Currently, CMD_T_FABRIC_STOP only stops TMRs from operating on the
> command but for isert we need to prevent LIO core from calling into
> iscsit callouts when the connection is being brought down. If LIO core
> queues commands to iscsit and it ends up adding to an internal queue
> instead of passing back to the driver then we can end up hanging waiting
> on command completion that never occurs because it's stuck on the internal
> list (the tx thread is stopped at this time, so it will never loop over
> the response list and call into isert). We also want to sync up on a
> point where we no longer call into isert so it can cleanup it's structs.

If the fabric driver knows that responses will not be completed by the HW,
then the fabric driver shall complete such responses itself.
Please do not shift this responsibility to LIO core.

> This has LIO core treat CMD_T_FABRIC_STOP like CMD_T_STOP during
> command execution and also fixes the locking around the
> target_cmd_interrupted calls so fabric modules can make sure cmds are
> never marked both CMD_T_COMPLETE and CMD_T_STOP|CMD_T_FABRIC_STOP.

CMD_T_STOP is some ancient logic that is used to move responses from a failed
connection to a new one during recovery in ERL=2.
I believe that the CMD_T_STOP logic was reused at connection release just
to reduce conn/session use-after-free cases at command release.

Thanks to this patchset, all commands in the connection are waited on for
completion in iscsit_release_commands_from_conn(). Is there any
sense in using the CMD_T_STOP mechanism there now? I believe it's time to
remove it and become like other fabric drivers - just wait for commands
in an async manner. For connection release, CMD_T_STOP is definitely
superfluous and error prone now.

Long story short: at connection release with ERL=0, I propose to
avoid the CMD_T_STOP logic completely instead of reusing it.
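
A minimal sketch of the async-wait pattern referred to above, done per
connection as the series already does elsewhere (illustrative, not a
concrete patch):

        /* Stop new cmds from taking refs on the conn's counter, then
         * wait for the outstanding ones to drop theirs.
         */
        target_stop_cmd_counter(conn->cmd_cnt);
        target_wait_for_cmds(conn->cmd_cnt);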

> 
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> ---
>  drivers/target/target_core_sbc.c       |  2 +-
>  drivers/target/target_core_transport.c | 27 +++++++++++++++-----------
>  2 files changed, 17 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/target/target_core_sbc.c b/drivers/target/target_core_sbc.c
> index c1cf37a1b4ce..ff1ae779543f 100644
> --- a/drivers/target/target_core_sbc.c
> +++ b/drivers/target/target_core_sbc.c
> @@ -457,7 +457,7 @@ static sense_reason_t compare_and_write_callback(struct se_cmd *cmd, bool succes
>                  * we don't have to perform the write operation.
>                  */
>                 WARN_ON(!(cmd->transport_state &
> -                       (CMD_T_ABORTED | CMD_T_STOP)));
> +                       (CMD_T_ABORTED | CMD_T_STOP | CMD_T_FABRIC_STOP)));
>                 goto out;
>         }
>         /*
> diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
> index 86adff2a86ed..1c23079a5d7f 100644
> --- a/drivers/target/target_core_transport.c
> +++ b/drivers/target/target_core_transport.c
> @@ -737,8 +737,8 @@ static int transport_cmd_check_stop_to_fabric(struct se_cmd *cmd)
>          * Determine if frontend context caller is requesting the stopping of
>          * this command for frontend exceptions.
>          */
> -       if (cmd->transport_state & CMD_T_STOP) {
> -               pr_debug("%s:%d CMD_T_STOP for ITT: 0x%08llx\n",
> +       if (cmd->transport_state & (CMD_T_STOP | CMD_T_FABRIC_STOP)) {
> +               pr_debug("%s:%d CMD_T_STOP|CMD_T_FABRIC_STOP for ITT: 0x%08llx\n",
>                         __func__, __LINE__, cmd->tag);

For example, this snippet forbids the kref decrement for CMD_T_FABRIC_STOP
commands although it is supposed to happen - that is a decrement from
the core, meaning that the core no longer needs this command.
> 
>                 spin_unlock_irqrestore(&cmd->t_state_lock, flags);
> @@ -889,7 +889,7 @@ static bool target_cmd_interrupted(struct se_cmd *cmd)
>                 INIT_WORK(&cmd->work, target_abort_work);
>                 queue_work(target_completion_wq, &cmd->work);
>                 return true;
> -       } else if (cmd->transport_state & CMD_T_STOP) {
> +       } else if (cmd->transport_state & (CMD_T_STOP | CMD_T_FABRIC_STOP)) {
>                 if (cmd->transport_complete_callback)
>                         cmd->transport_complete_callback(cmd, false, &post_ret);
>                 complete_all(&cmd->t_transport_stop_comp);
> @@ -907,13 +907,15 @@ void target_complete_cmd_with_sense(struct se_cmd *cmd, u8 scsi_status,
>         int success, cpu;
>         unsigned long flags;
> 
> -       if (target_cmd_interrupted(cmd))
> +       spin_lock_irqsave(&cmd->t_state_lock, flags);
> +       if (target_cmd_interrupted(cmd)) {
> +               spin_unlock_irqrestore(&cmd->t_state_lock, flags);
>                 return;
> +       }
> 
>         cmd->scsi_status = scsi_status;
>         cmd->sense_reason = sense_reason;
> 
> -       spin_lock_irqsave(&cmd->t_state_lock, flags);
>         switch (cmd->scsi_status) {
>         case SAM_STAT_CHECK_CONDITION:
>                 if (cmd->se_cmd_flags & SCF_TRANSPORT_TASK_SENSE)
> @@ -2277,10 +2279,12 @@ void target_execute_cmd(struct se_cmd *cmd)
>          *
>          * If the received CDB has already been aborted stop processing it here.
>          */
> -       if (target_cmd_interrupted(cmd))
> +       spin_lock_irq(&cmd->t_state_lock);
> +       if (target_cmd_interrupted(cmd)) {
> +               spin_unlock_irq(&cmd->t_state_lock);
>                 return;
> +       }
> 
> -       spin_lock_irq(&cmd->t_state_lock);
>         cmd->t_state = TRANSPORT_PROCESSING;
>         cmd->transport_state |= CMD_T_ACTIVE | CMD_T_SENT;
>         spin_unlock_irq(&cmd->t_state_lock);
> @@ -2847,9 +2851,9 @@ transport_generic_new_cmd(struct se_cmd *cmd)
>          * Determine if frontend context caller is requesting the stopping of
>          * this command for frontend exceptions.
>          */
> -       if (cmd->transport_state & CMD_T_STOP &&
> +       if (cmd->transport_state & (CMD_T_STOP | CMD_T_FABRIC_STOP) &&
>             !cmd->se_tfo->write_pending_must_be_called) {
> -               pr_debug("%s:%d CMD_T_STOP for ITT: 0x%08llx\n",
> +               pr_debug("%s:%d CMD_T_STOP|CMD_T_FABRIC_STOPfor ITT: 0x%08llx\n",
>                          __func__, __LINE__, cmd->tag);
> 
>                 spin_unlock_irqrestore(&cmd->t_state_lock, flags);
> @@ -2880,11 +2884,12 @@ static void transport_write_pending_qf(struct se_cmd *cmd)
>         bool stop;
> 
>         spin_lock_irqsave(&cmd->t_state_lock, flags);
> -       stop = (cmd->transport_state & (CMD_T_STOP | CMD_T_ABORTED));
> +       stop = (cmd->transport_state &
> +               (CMD_T_STOP | CMD_T_FABRIC_STOP | CMD_T_ABORTED));
>         spin_unlock_irqrestore(&cmd->t_state_lock, flags);
> 
>         if (stop) {
> -               pr_debug("%s:%d CMD_T_STOP|CMD_T_ABORTED for ITT: 0x%08llx\n",
> +               pr_debug("%s:%d CMD_T_STOP|CMD_T_FABRIC_STOP|CMD_T_ABORTED for ITT: 0x%08llx\n",
>                         __func__, __LINE__, cmd->tag);
>                 complete_all(&cmd->t_transport_stop_comp);
>                 return;
> --
> 2.31.1
> 
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 11/18] IB/isert: Fix use after free during conn cleanup
  2023-03-09 22:33 ` [PATCH 11/18] IB/isert: Fix use after free during conn cleanup Mike Christie
@ 2023-03-15 15:21   ` Sagi Grimberg
  0 siblings, 0 replies; 30+ messages in thread
From: Sagi Grimberg @ 2023-03-15 15:21 UTC (permalink / raw)
  To: Mike Christie, mlombard, martin.petersen, mgurtovoy, d.bogdanov,
	linux-scsi, target-devel

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 13/18] scsi: target: Fix multiple LUN_RESET handling
  2023-03-09 22:33 ` [PATCH 13/18] scsi: target: Fix multiple LUN_RESET handling Mike Christie
@ 2023-03-15 16:13   ` Dmitry Bogdanov
  2023-03-15 16:44     ` Mike Christie
  0 siblings, 1 reply; 30+ messages in thread
From: Dmitry Bogdanov @ 2023-03-15 16:13 UTC (permalink / raw)
  To: Mike Christie
  Cc: mlombard, martin.petersen, mgurtovoy, sagi, linux-scsi, target-devel

On Thu, Mar 09, 2023 at 04:33:07PM -0600, Mike Christie wrote:
> 
> This fixes a bug where an initiator thinks a LUN_RESET has cleaned
> up running commands when it hasn't. The bug was added in:
> 
> commit 51ec502a3266 ("target: Delete tmr from list before processing")
> 
> The problem occurs when:
> 
> 1. We have N IO cmds running in the target layer spread over 2 sessions.
> 2. The initiator sends a LUN_RESET for each session.
> 3. session1's LUN_RESET loops over all the running commands from both
> sessions and moves them to its local drain_task_list.
> 4. session2's LUN_RESET does not see the LUN_RESET from session1 because
> the commit above has it remove itself. session2 also does not see any
> commands since the other reset moved them off the state lists.
> 5. sessions2's LUN_RESET will then complete with a successful response.
> 6. sessions2's inititor believes the running commands on its session are
> now cleaned up due to the successful response and cleans up the running
> commands from its side. It then restarts them.
> 7. The commands do eventually complete on the backend and the target
> starts to return aborted task statuses for them. The initiator will
> either throw a invalid ITT error or might accidentally lookup a new task
> if the ITT has been reallocated already.
> 
> This fixes the bug by reverting the patch, and also serializes the
> execution of LUN_RESETs and Preempt and Aborts. The latter is necessary
> because it turns out the commit accidentally fixed a bug where if there
> are 2 LUN RESETs executing they can see each other on the dev_tmr_list,
> put the other one on their local drain list, then end up waiting on each
> other resulting in a deadlock.

If the LUN_RESET is not in the TMR list anymore, there is no need to serialize
core_tmr_drain_tmr_list.

> 
> Fixes: 51ec502a3266 ("target: Delete tmr from list before processing")
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> ---
>  drivers/target/target_core_device.c    | 15 ++++++--
>  drivers/target/target_core_tmr.c       | 15 ++++----
>  drivers/target/target_core_transport.c | 50 ++++++++++++++++++++++++--
>  include/target/target_core_base.h      |  5 ++-
>  4 files changed, 74 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/target/target_core_device.c b/drivers/target/target_core_device.c
> index f6e58410ec3f..c9f75ed1566b 100644
> --- a/drivers/target/target_core_device.c
> +++ b/drivers/target/target_core_device.c
> @@ -179,7 +179,16 @@ int transport_lookup_tmr_lun(struct se_cmd *se_cmd)
>         se_tmr->tmr_dev = rcu_dereference_raw(se_lun->lun_se_dev);
> 
>         spin_lock_irqsave(&se_tmr->tmr_dev->se_tmr_lock, flags);
> -       list_add_tail(&se_tmr->tmr_list, &se_tmr->tmr_dev->dev_tmr_list);
> +       switch (se_tmr->function) {
> +       case TMR_ABORT_TASK:
> +               list_add_tail(&se_tmr->tmr_list,
> +                             &se_tmr->tmr_dev->generic_tmr_list);
> +               break;
> +       case TMR_LUN_RESET:
> +               list_add_tail(&se_tmr->tmr_list,
> +                             &se_tmr->tmr_dev->lun_reset_tmr_list);
> +               break;
> +       }
>         spin_unlock_irqrestore(&se_tmr->tmr_dev->se_tmr_lock, flags);
> 
>         return 0;
> @@ -761,7 +770,8 @@ struct se_device *target_alloc_device(struct se_hba *hba, const char *name)
>         dev->hba_index = hba->hba_index;
> 
>         INIT_LIST_HEAD(&dev->dev_sep_list);
> -       INIT_LIST_HEAD(&dev->dev_tmr_list);
> +       INIT_LIST_HEAD(&dev->generic_tmr_list);
> +       INIT_LIST_HEAD(&dev->lun_reset_tmr_list);
>         INIT_LIST_HEAD(&dev->delayed_cmd_list);
>         INIT_LIST_HEAD(&dev->qf_cmd_list);
>         spin_lock_init(&dev->delayed_cmd_lock);
> @@ -782,6 +792,7 @@ struct se_device *target_alloc_device(struct se_hba *hba, const char *name)
>         spin_lock_init(&dev->t10_alua.lba_map_lock);
> 
>         INIT_WORK(&dev->delayed_cmd_work, target_do_delayed_work);
> +       mutex_init(&dev->lun_reset_mutex);
> 
>         dev->t10_wwn.t10_dev = dev;
>         /*
> diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c
> index 2b95b4550a63..88d2a7839876 100644
> --- a/drivers/target/target_core_tmr.c
> +++ b/drivers/target/target_core_tmr.c
> @@ -184,13 +184,11 @@ static void core_tmr_drain_tmr_list(
>         unsigned long flags;
>         bool rc;
>         /*
> -        * Release all pending and outgoing TMRs aside from the received
> -        * LUN_RESET tmr..
> +        * Release all pending and outgoing TMRs except for LUN_RESETS.
>          */
>         spin_lock_irqsave(&dev->se_tmr_lock, flags);
> -       if (tmr)
> -               list_del_init(&tmr->tmr_list);
> -       list_for_each_entry_safe(tmr_p, tmr_pp, &dev->dev_tmr_list, tmr_list) {
> +       list_for_each_entry_safe(tmr_p, tmr_pp, &dev->generic_tmr_list,
> +                                tmr_list) {
>                 cmd = tmr_p->task_cmd;
>                 if (!cmd) {
>                         pr_err("Unable to locate struct se_cmd for TMR\n");
> @@ -379,14 +377,19 @@ int core_tmr_lun_reset(
>                                 tmr_nacl->initiatorname);
>                 }
>         }
> +
> +       /* Serialize LUN RESET TMRs and preempt and aborts */
> +       mutex_lock(&dev->lun_reset_mutex);
> +
>         pr_debug("LUN_RESET: %s starting for [%s], tas: %d\n",
>                 (preempt_and_abort_list) ? "Preempt" : "TMR",
>                 dev->transport->name, tas);
> -
>         core_tmr_drain_tmr_list(dev, tmr, preempt_and_abort_list);
>         core_tmr_drain_state_list(dev, prout_cmd, tmr_sess, tas,
>                                 preempt_and_abort_list);
> 
> +       mutex_unlock(&dev->lun_reset_mutex);
> +
>         /*
>          * Clear any legacy SPC-2 reservation when called during
>          * LOGICAL UNIT RESET
> diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
> index 1c23079a5d7f..3c732b1b5389 100644
> --- a/drivers/target/target_core_transport.c
> +++ b/drivers/target/target_core_transport.c
> @@ -3574,6 +3574,7 @@ static void target_tmr_work(struct work_struct *work)
>         struct se_cmd *cmd = container_of(work, struct se_cmd, work);
>         struct se_device *dev = cmd->se_dev;
>         struct se_tmr_req *tmr = cmd->se_tmr_req;
> +       bool sched_reset = false;
>         int ret;
> 
>         if (cmd->transport_state & CMD_T_ABORTED)
> @@ -3596,6 +3597,22 @@ static void target_tmr_work(struct work_struct *work)
>                         target_dev_ua_allocate(dev, 0x29,
>                                                ASCQ_29H_BUS_DEVICE_RESET_FUNCTION_OCCURRED);
>                 }
> +
> +               /*
> +                * If this is the last reset the device can be freed after we
> +                * run transport_cmd_check_stop_to_fabric. Figure out if there
> +                * are other resets that need to be scheduled while we know we
> +                * have a refcount on the device.
> +                */
> +               spin_lock_irq(&dev->se_tmr_lock);

tmr->tmr_list is removed from the list at the very end of the se_cmd
lifecycle, so any number of LUN_RESETs can be in lun_reset_tmr_list. And all
of them can be finished but not yet removed from the list.
 
You may delete the lun_reset here while also nulling tmr->tmr_dev:
+			list_del_init(&cmd->se_tmr_req->tmr_list);
+			cmd->se_tmr_req->tmr_dev = NULL;

Then the check below will be just 
+			if (!list_empty(&dev->lun_reset_tmr_list))
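
Combined, the suggestion would look roughly like this in target_tmr_work()'s
TMR_LUN_RESET case (an untested sketch of the proposed change):

        spin_lock_irq(&dev->se_tmr_lock);
        list_del_init(&cmd->se_tmr_req->tmr_list);
        cmd->se_tmr_req->tmr_dev = NULL;
        if (!list_empty(&dev->lun_reset_tmr_list))
                sched_reset = true;
        else
                dev->dev_flags &= ~DF_RESETTING_LUN;
        spin_unlock_irq(&dev->se_tmr_lock);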

> +               if (list_first_entry(&dev->lun_reset_tmr_list,
> +                                    struct se_tmr_req, tmr_list) !=
> +                   list_last_entry(&dev->lun_reset_tmr_list,
> +                                   struct se_tmr_req, tmr_list))
> +                       sched_reset = true;
> +               else
> +                       dev->dev_flags &= ~DF_RESETTING_LUN;
> +               spin_unlock_irq(&dev->se_tmr_lock);
>                 break;
>         case TMR_TARGET_WARM_RESET:
>                 tmr->response = TMR_FUNCTION_REJECTED;
> @@ -3617,15 +3634,26 @@ static void target_tmr_work(struct work_struct *work)
> 
>         transport_lun_remove_cmd(cmd);
>         transport_cmd_check_stop_to_fabric(cmd);
> +
> +       if (!sched_reset)
> +               return;
> +
> +       spin_lock_irq(&dev->se_tmr_lock);
> +       tmr = list_first_entry(&dev->lun_reset_tmr_list, struct se_tmr_req,
> +                              tmr_list);

And this list_first_entry will return the next LUN_RESET as you
expected.

> +       spin_unlock_irq(&dev->se_tmr_lock);
> +
> +       INIT_WORK(&tmr->task_cmd->work, target_tmr_work);
> +       schedule_work(&tmr->task_cmd->work);
>         return;
> 
>  aborted:
>         target_handle_abort(cmd);
>  }
> 
> -int transport_generic_handle_tmr(
> -       struct se_cmd *cmd)
> +int transport_generic_handle_tmr(struct se_cmd *cmd)
>  {
> +       struct se_device *dev = cmd->se_dev;
>         unsigned long flags;
>         bool aborted = false;
> 
> @@ -3646,8 +3674,26 @@ int transport_generic_handle_tmr(
>                 return 0;
>         }
> 
> +       spin_lock_irqsave(&dev->se_tmr_lock, flags);
> +       if (cmd->se_tmr_req->function == TMR_LUN_RESET) {
> +               /*
> +                * We only allow one reset to execute at a time to prevent
> +                * one reset waiting on another, and to make sure one reset
> +                * does not claim all the cmds causing the other reset to
> +                * return early.
> +                */
> +               if (dev->dev_flags & DF_RESETTING_LUN) {
> +                       spin_unlock_irqrestore(&dev->se_tmr_lock, flags);
> +                       goto done;
> +               }
> +
> +               dev->dev_flags |= DF_RESETTING_LUN;

Not a good choice of flag variable. It is used at configuration time and
not under a lock. The configfs dev/alias file can be changed at any time
and could race with LUN_RESET.

> +       }
> +       spin_unlock_irqrestore(&dev->se_tmr_lock, flags);
> +
>         INIT_WORK(&cmd->work, target_tmr_work);
>         schedule_work(&cmd->work);
> +done:
>         return 0;
>  }
>  EXPORT_SYMBOL(transport_generic_handle_tmr);
> diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
> index bd299790e99c..0a5b51f8e5e8 100644
> --- a/include/target/target_core_base.h
> +++ b/include/target/target_core_base.h
> @@ -804,6 +804,7 @@ struct se_device {
>  #define DF_USING_UDEV_PATH                     0x00000008
>  #define DF_USING_ALIAS                         0x00000010
>  #define DF_READ_ONLY                           0x00000020
> +#define DF_RESETTING_LUN                       0x00000040
>         u8                      transport_flags;
>         /* Physical device queue depth */
>         u32                     queue_depth;
> @@ -840,7 +841,8 @@ struct se_device {
>         /* Used for SPC-3 Persistent Reservations */
>         struct t10_pr_registration *dev_pr_res_holder;
>         struct list_head        dev_sep_list;
> -       struct list_head        dev_tmr_list;
> +       struct list_head        generic_tmr_list;
> +       struct list_head        lun_reset_tmr_list;
>         struct work_struct      qf_work_queue;
>         struct work_struct      delayed_cmd_work;
>         struct list_head        delayed_cmd_list;
> @@ -872,6 +874,7 @@ struct se_device {
>         struct rcu_head         rcu_head;
>         int                     queue_cnt;
>         struct se_device_queue  *queues;
> +       struct mutex            lun_reset_mutex;
>  };
> 
>  struct target_opcode_descriptor {
> --
> 2.31.1
> 
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 13/18] scsi: target: Fix multiple LUN_RESET handling
  2023-03-15 16:13   ` Dmitry Bogdanov
@ 2023-03-15 16:44     ` Mike Christie
  2023-03-15 19:11       ` Dmitry Bogdanov
  0 siblings, 1 reply; 30+ messages in thread
From: Mike Christie @ 2023-03-15 16:44 UTC (permalink / raw)
  To: Dmitry Bogdanov
  Cc: mlombard, martin.petersen, mgurtovoy, sagi, linux-scsi, target-devel

On 3/15/23 11:13 AM, Dmitry Bogdanov wrote:
> On Thu, Mar 09, 2023 at 04:33:07PM -0600, Mike Christie wrote:
>>
>> This fixes a bug where an initiator thinks a LUN_RESET has cleaned
>> up running commands when it hasn't. The bug was added in:
>>
>> commit 51ec502a3266 ("target: Delete tmr from list before processing")
>>
>> The problem occurs when:
>>
>> 1. We have N IO cmds running in the target layer spread over 2 sessions.
>> 2. The initiator sends a LUN_RESET for each session.
>> 3. session1's LUN_RESET loops over all the running commands from both
>> sessions and moves them to its local drain_task_list.
>> 4. session2's LUN_RESET does not see the LUN_RESET from session1 because
>> the commit above has it remove itself. session2 also does not see any
>> commands since the other reset moved them off the state lists.
>> 5. sessions2's LUN_RESET will then complete with a successful response.
>> 6. sessions2's inititor believes the running commands on its session are
>> now cleaned up due to the successful response and cleans up the running
>> commands from its side. It then restarts them.
>> 7. The commands do eventually complete on the backend and the target
>> starts to return aborted task statuses for them. The initiator will
>> either throw a invalid ITT error or might accidentally lookup a new task
>> if the ITT has been reallocated already.
>>
>> This fixes the bug by reverting the patch, and also serializes the
>> execution of LUN_RESETs and Preempt and Aborts. The latter is necessary
>> because it turns out the commit accidentally fixed a bug where if there
>> are 2 LUN RESETs executing they can see each other on the dev_tmr_list,
>> put the other one on their local drain list, then end up waiting on each
>> other resulting in a deadlock.
> 
> If LUN_RESET is not in TMR list anymore there is no need to serialize
> core_tmr_drain_tmr_list.

Ah shoot yeah I miswrote that. I meant I needed the serialization for my
bug not yours.


>>
>>         if (cmd->transport_state & CMD_T_ABORTED)
>> @@ -3596,6 +3597,22 @@ static void target_tmr_work(struct work_struct *work)
>>                         target_dev_ua_allocate(dev, 0x29,
>>                                                ASCQ_29H_BUS_DEVICE_RESET_FUNCTION_OCCURRED);
>>                 }
>> +
>> +               /*
>> +                * If this is the last reset the device can be freed after we
>> +                * run transport_cmd_check_stop_to_fabric. Figure out if there
>> +                * are other resets that need to be scheduled while we know we
>> +                * have a refcount on the device.
>> +                */
>> +               spin_lock_irq(&dev->se_tmr_lock);
> 
> tmr->tmr_list is removed from the list in the very end of se_cmd lifecycle
> so any number of LUN_RESETs can be in lun_reset_tmr_list. And all of them 
> can be finished but not yet removed from the list. 

Don't we remove it from the list a little later in this function when
we call transport_lun_remove_cmd?

>  
> You may delete lun_reset here with nulling tmr->tmr_dev:
> +			list_del_init(&cmd->se_tmr_req->tmr_list);
> +			cmd->se_tmr_req->tmr_dev = NULL;
> 
> Then the check below will be just 
> +			if (!list_empty(dev->lun_reset_tmr_list))

I could go either way on this. Normally it's best to just have the one
place where we handle something like the deletion and clearing. If I'm
correct then it's already done a little later in this function so we
are ok.

On the other hand, yeah my test is kind of gross.


>>
>> +       spin_lock_irqsave(&dev->se_tmr_lock, flags);
>> +       if (cmd->se_tmr_req->function == TMR_LUN_RESET) {
>> +               /*
>> +                * We only allow one reset to execute at a time to prevent
>> +                * one reset waiting on another, and to make sure one reset
>> +                * does not claim all the cmds causing the other reset to
>> +                * return early.
>> +                */
>> +               if (dev->dev_flags & DF_RESETTING_LUN) {
>> +                       spin_unlock_irqrestore(&dev->se_tmr_lock, flags);
>> +                       goto done;
>> +               }
>> +
>> +               dev->dev_flags |= DF_RESETTING_LUN;
> 
> Not a good choice of flag variable. It is used at configuration time and
> not under a lock. The configfs file dev/alias can be changed at any time
> and could race with LUN_RESET.

I didn't see any places where one writer can overwrite other flags. Are
you just saying that in general it could happen? If so, would you also not
want dev->transport_flags to be used then?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 13/18] scsi: target: Fix multiple LUN_RESET handling
  2023-03-15 16:44     ` Mike Christie
@ 2023-03-15 19:11       ` Dmitry Bogdanov
  2023-03-15 21:42         ` Mike Christie
  0 siblings, 1 reply; 30+ messages in thread
From: Dmitry Bogdanov @ 2023-03-15 19:11 UTC (permalink / raw)
  To: Mike Christie
  Cc: mlombard, martin.petersen, mgurtovoy, sagi, linux-scsi, target-devel

On Wed, Mar 15, 2023 at 11:44:48AM -0500, Mike Christie wrote:
> 
> On 3/15/23 11:13 AM, Dmitry Bogdanov wrote:
> > On Thu, Mar 09, 2023 at 04:33:07PM -0600, Mike Christie wrote:
> >>
> >> This fixes a bug where an initiator thinks a LUN_RESET has cleaned
> >> up running commands when it hasn't. The bug was added in:
> >>
> >> commit 51ec502a3266 ("target: Delete tmr from list before processing")
> >>
> >> The problem occurs when:
> >>
> >> 1. We have N IO cmds running in the target layer spread over 2 sessions.
> >> 2. The initiator sends a LUN_RESET for each session.
> >> 3. session1's LUN_RESET loops over all the running commands from both
> >> sessions and moves them to its local drain_task_list.
> >> 4. session2's LUN_RESET does not see the LUN_RESET from session1 because
> >> the commit above has it remove itself. session2 also does not see any
> >> commands since the other reset moved them off the state lists.
> >> 5. session2's LUN_RESET will then complete with a successful response.
> >> 6. session2's initiator believes the running commands on its session are
> >> now cleaned up due to the successful response and cleans up the running
> >> commands from its side. It then restarts them.
> >> 7. The commands do eventually complete on the backend and the target
> >> starts to return aborted task statuses for them. The initiator will
> >> either throw an invalid ITT error or might accidentally lookup a new task
> >> if the ITT has been reallocated already.
> >>
> >> This fixes the bug by reverting the patch, and also serializes the
> >> execution of LUN_RESETs and Preempt and Aborts. The latter is necessary
> >> because it turns out the commit accidentally fixed a bug where if there
> >> are 2 LUN RESETs executing they can see each other on the dev_tmr_list,
> >> put the other one on their local drain list, then end up waiting on each
> >> other resulting in a deadlock.
> >
> > If LUN_RESET is not in TMR list anymore there is no need to serialize
> > core_tmr_drain_tmr_list.
> 
> Ah shoot yeah I miswrote that. I meant I needed the serialization for my
> bug not yours.

I still did not get why you are wrapping core_tmr_drain_*_list with a mutex.
general_tmr_list has only aborts now and they do not wait for other aborts.

> 
> >>
> >>         if (cmd->transport_state & CMD_T_ABORTED)
> >> @@ -3596,6 +3597,22 @@ static void target_tmr_work(struct work_struct *work)
> >>                         target_dev_ua_allocate(dev, 0x29,
> >>                                                ASCQ_29H_BUS_DEVICE_RESET_FUNCTION_OCCURRED);
> >>                 }
> >> +
> >> +               /*
> >> +                * If this is the last reset the device can be freed after we
> >> +                * run transport_cmd_check_stop_to_fabric. Figure out if there
> >> +                * are other resets that need to be scheduled while we know we
> >> +                * have a refcount on the device.
> >> +                */
> >> +               spin_lock_irq(&dev->se_tmr_lock);
> >
> > tmr->tmr_list is removed from the list in the very end of se_cmd lifecycle
> > so any number of LUN_RESETs can be in lun_reset_tmr_list. And all of them
> > can be finished but not yet removed from the list.
> 
> Don't we remove it from the list a little later in this function when
> we call transport_lun_remove_cmd?

OMG, yes, of course, you are right. I messed up something.

But I have concerns still:
transport_lookup_tmr_lun (where LUN_RESET is added to the list) and
transport_generic_handle_tmr(where LUN_RESET is scheduled to handle)
are not serialized. And below you can start the second LUN_RESET while
transport_generic_handle_tmr is not yet called for it. The _handle_tmr
could be delayed long enough to handle that second
LUN_RESET and to clear the flag. _handle_tmr will then start the work
again.

Is it worth doing that list management? Is it not enough to just wrap
the core_tmr_lun_reset() call in target_tmr_work with a mutex?


> >
> > You may delete lun_reset here with nulling tmr->tmr_dev:
> > +                     list_del_init(&cmd->se_tmr_req->tmr_list);
> > +                     cmd->se_tmr_req->tmr_dev = NULL;
> >
> > Then the check below will be just
> > +                     if (!list_empty(dev->lun_reset_tmr_list))
> 
> I could go either way on this. Normally it's best to just have the one
> place where we handle something like the deletion and clearing. If I'm
> correct then it's already done a little later in this function so we
> are ok.
> 
> On the other hand, yeah my test is kind of gross.
> 
> 
> >>
> >> +       spin_lock_irqsave(&dev->se_tmr_lock, flags);
> >> +       if (cmd->se_tmr_req->function == TMR_LUN_RESET) {
> >> +               /*
> >> +                * We only allow one reset to execute at a time to prevent
> >> +                * one reset waiting on another, and to make sure one reset
> >> +                * does not claim all the cmds causing the other reset to
> >> +                * return early.
> >> +                */
> >> +               if (dev->dev_flags & DF_RESETTING_LUN) {
> >> +                       spin_unlock_irqrestore(&dev->se_tmr_lock, flags);
> >> +                       goto done;
> >> +               }
> >> +
> >> +               dev->dev_flags |= DF_RESETTING_LUN;
> >
> > Not a good choice of flag variable. It is used at configuration time and
> > not under a lock. The configfs file dev/alias can be changed at any time
> > and could race with LUN_RESET.
> 
> I didn't see any places where one writer can overwrite other flags. Are
> you just saying that in general it could happen? If so, would you also not
> want dev->transport_flags to be used then?

Yes, in general, bit setting is not atomic; a write of one bit can
clear another bit being written in parallel.
Better to have a separate variable used only under the lock.
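
To make that concrete, a rough sketch of the idea (illustration only, not a
patch; dev_reset_active is a hypothetical new se_device field, not something
that exists today):

/*
 * Sketch only: dev_reset_active would be a dedicated se_device field
 * that is read and written exclusively under se_tmr_lock, so it cannot
 * race with the non-atomic dev_flags updates done from configfs.
 */
static bool example_try_start_lun_reset(struct se_device *dev)
{
	unsigned long flags;
	bool start = false;

	spin_lock_irqsave(&dev->se_tmr_lock, flags);
	if (!dev->dev_reset_active) {
		dev->dev_reset_active = true;
		start = true;
	}
	spin_unlock_irqrestore(&dev->se_tmr_lock, flags);

	return start;
}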




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 13/18] scsi: target: Fix multiple LUN_RESET handling
  2023-03-15 19:11       ` Dmitry Bogdanov
@ 2023-03-15 21:42         ` Mike Christie
  2023-03-16 10:39           ` Dmitry Bogdanov
  0 siblings, 1 reply; 30+ messages in thread
From: Mike Christie @ 2023-03-15 21:42 UTC (permalink / raw)
  To: Dmitry Bogdanov
  Cc: mlombard, martin.petersen, mgurtovoy, sagi, linux-scsi, target-devel

On 3/15/23 2:11 PM, Dmitry Bogdanov wrote:
> On Wed, Mar 15, 2023 at 11:44:48AM -0500, Mike Christie wrote:
>>
>> On 3/15/23 11:13 AM, Dmitry Bogdanov wrote:
>>> On Thu, Mar 09, 2023 at 04:33:07PM -0600, Mike Christie wrote:
>>>>
>>>> This fixes a bug where an initiator thinks a LUN_RESET has cleaned
>>>> up running commands when it hasn't. The bug was added in:
>>>>
>>>> commit 51ec502a3266 ("target: Delete tmr from list before processing")
>>>>
>>>> The problem occurs when:
>>>>
>>>> 1. We have N IO cmds running in the target layer spread over 2 sessions.
>>>> 2. The initiator sends a LUN_RESET for each session.
>>>> 3. session1's LUN_RESET loops over all the running commands from both
>>>> sessions and moves them to its local drain_task_list.
>>>> 4. session2's LUN_RESET does not see the LUN_RESET from session1 because
>>>> the commit above has it remove itself. session2 also does not see any
>>>> commands since the other reset moved them off the state lists.
> >>>> 5. session2's LUN_RESET will then complete with a successful response.
> >>>> 6. session2's initiator believes the running commands on its session are
> >>>> now cleaned up due to the successful response and cleans up the running
> >>>> commands from its side. It then restarts them.
> >>>> 7. The commands do eventually complete on the backend and the target
> >>>> starts to return aborted task statuses for them. The initiator will
> >>>> either throw an invalid ITT error or might accidentally lookup a new task
>>>> if the ITT has been reallocated already.
>>>>
>>>> This fixes the bug by reverting the patch, and also serializes the
>>>> execution of LUN_RESETs and Preempt and Aborts. The latter is necessary
>>>> because it turns out the commit accidentally fixed a bug where if there
>>>> are 2 LUN RESETs executing they can see each other on the dev_tmr_list,
>>>> put the other one on their local drain list, then end up waiting on each
>>>> other resulting in a deadlock.
>>>
>>> If LUN_RESET is not in TMR list anymore there is no need to serialize
>>> core_tmr_drain_tmr_list.
>>
>> Ah shoot yeah I miswrote that. I meant I needed the serialization for my
>> bug not yours.
> 
> I still did not get why you are wrapping core_tmr_drain_*_list with a mutex.
> general_tmr_list has only aborts now and they do not wait for other aborts.

Do you mean I don't need the mutex for the bug I originally hit that's described
at the beginning? If you're saying I don't need it for 2 resets running at the same
time, I agree. I thought I needed it if we have a RESET and Preempt and Abort:

1. You have 2 sessions. There are no TMRs initially.
2. session1 gets Preempt and Abort. It calls core_tmr_drain_state_list
and takes all the cmds from both sessions and puts them on the local
drain_task_list list.
3. session1 or 2 gets a LUN_RESET, it sees no cmds on the device's
state_lists, and returns success.
4. The initiator thinks the commands were cleaned up by the LUN_RESET.

- It could end up re-using the ITT while the original task being cleaned up is
still running. Then depending on which session got what and if TAS was set, if
the original command completes first then the initiator would think the second
command failed with SAM_STAT_TASK_ABORTED.

- If there was no TAS or the RESET and Preempt and Abort were on the same session
then we could still hit a bug. We get the RESET response, the initiator might
retry the cmds or fail and the app might retry. The retry might go down a completely
different path on the target (like if hw queue1 was blocked and had the original
command, but this retry goes down hw queue2 due to being received on a different
CPU, so it completes right away). We do some new IO. Then hw queue1 unblocks and
overwrites the new IO.

With the mutex, the LUN_RESET will wait for the Preempt and Abort
which is waiting on the running commands. I could have had Preempt
and Abort create a tmr, and queue a work and go through that path
but I thought it looked uglier faking it.
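
For illustration, a minimal sketch of how that serialization would look (not
the actual patch; lun_reset_mutex is a hypothetical per-se_device mutex and
the core_tmr_lun_reset() arguments are simplified):

/*
 * Sketch: both the LUN_RESET worker and the PR Preempt and Abort path
 * take the same hypothetical per-device mutex around their drain, so a
 * LUN_RESET cannot return success while a Preempt and Abort is still
 * waiting on the commands it pulled off the state lists.
 */
static int example_handle_lun_reset(struct se_device *dev,
				    struct se_tmr_req *tmr)
{
	int ret;

	mutex_lock(&dev->lun_reset_mutex);
	ret = core_tmr_lun_reset(dev, tmr, NULL, NULL);
	mutex_unlock(&dev->lun_reset_mutex);
	return ret;
}

static int example_handle_preempt_and_abort(struct se_device *dev,
					    struct list_head *abort_list,
					    struct se_cmd *pr_cmd)
{
	int ret;

	mutex_lock(&dev->lun_reset_mutex);
	ret = core_tmr_lun_reset(dev, NULL, abort_list, pr_cmd);
	mutex_unlock(&dev->lun_reset_mutex);
	return ret;
}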


> 
>>
>>>>
>>>>         if (cmd->transport_state & CMD_T_ABORTED)
>>>> @@ -3596,6 +3597,22 @@ static void target_tmr_work(struct work_struct *work)
>>>>                         target_dev_ua_allocate(dev, 0x29,
>>>>                                                ASCQ_29H_BUS_DEVICE_RESET_FUNCTION_OCCURRED);
>>>>                 }
>>>> +
>>>> +               /*
>>>> +                * If this is the last reset the device can be freed after we
>>>> +                * run transport_cmd_check_stop_to_fabric. Figure out if there
>>>> +                * are other resets that need to be scheduled while we know we
>>>> +                * have a refcount on the device.
>>>> +                */
>>>> +               spin_lock_irq(&dev->se_tmr_lock);
>>>
>>> tmr->tmr_list is removed from the list in the very end of se_cmd lifecycle
>>> so any number of LUN_RESETs can be in lun_reset_tmr_list. And all of them
>>> can be finished but not yet removed from the list.
>>
>> Don't we remove it from the list a little later in this function when
>> we call transport_lun_remove_cmd?
> 
> OMG, yes, of course, you are right. I messed up something.
> 
> But I have concerns still:
> transport_lookup_tmr_lun (where LUN_RESET is added to the list) and
> transport_generic_handle_tmr(where LUN_RESET is scheduled to handle)
> are not serialized. And below you can start the second LUN_RESET while
> transport_generic_handle_tmr is not yet called for it. The _handle_tmr
> could be delayed long enough to handle that second
> LUN_RESET and to clear the flag. _handle_tmr will then start the work
> again.

Ah yeah, nice catch.

> 
> Is it worth doing that list management? Is it not enough to just wrap
> the core_tmr_lun_reset() call in target_tmr_work with a mutex?

I can just do the mutex.

Was trying to reduce how many threads we use, but the big win is for aborts.
Will work on that type of thing in a separate patchset.


> Better to have a separate variable used only under the lock.
>
Will fix.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 07/18] scsi: target: Treat CMD_T_FABRIC_STOP like CMD_T_STOP
  2023-03-15 10:47   ` Dmitry Bogdanov
@ 2023-03-15 22:54     ` Mike Christie
  2023-03-16  0:01       ` michael.christie
  0 siblings, 1 reply; 30+ messages in thread
From: Mike Christie @ 2023-03-15 22:54 UTC (permalink / raw)
  To: Dmitry Bogdanov
  Cc: mlombard, martin.petersen, mgurtovoy, sagi, linux-scsi, target-devel

On 3/15/23 5:47 AM, Dmitry Bogdanov wrote:
> On Thu, Mar 09, 2023 at 04:33:01PM -0600, Mike Christie wrote:
>>
>> iscsit will set CMD_T_FABRIC_STOP on running commands when its transport
>> connection is down and it can't send/recv IO (tx/rx threads are killed
>> or the cleanup thread is run from the one thats up). It will then loop
>> over running commands and wait for LIO core to complete them or clean
>> them up if they were on an internal queue waiting to be sent or ackd.
> 
> The current usage of CMD_T_FABRIC_STOP and CMD_T_ABORTED is to
> distinguish whether a command will be aborted or finished at connection
> release. Technically that means who is in charge of decreasing the
> command's kref.
>
> The current usage of CMD_T_FABRIC_STOP and CMD_T_ABORTED is race free -
> it checks and *changes* the state under a lock. They are mutually
> exclusive.

I wasn't sure about that because __transport_wait_for_tasks will always
set CMD_T_FABRIC_STOP if fabric_stop is true. It will do it even if CMD_T_ABORTED
is set, so you can have them both set. I changed that in patch 14 so we do what
you described above.
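
As a rough sketch of what keeping the two bits mutually exclusive looks like
(illustration only, not the patch 14 code; example_mark_fabric_stop is a
made-up helper name):

/*
 * Check and change transport_state under t_state_lock so a TMR abort
 * and a connection teardown cannot both claim the same command: if
 * CMD_T_ABORTED is already set we back off instead of also setting
 * CMD_T_FABRIC_STOP.
 */
static bool example_mark_fabric_stop(struct se_cmd *cmd)
{
	unsigned long flags;
	bool claimed = false;

	spin_lock_irqsave(&cmd->t_state_lock, flags);
	if (!(cmd->transport_state & CMD_T_ABORTED)) {
		cmd->transport_state |= CMD_T_FABRIC_STOP;
		claimed = true;
	}
	spin_unlock_irqrestore(&cmd->t_state_lock, flags);

	return claimed;
}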


> 
>> Currently, CMD_T_FABRIC_STOP only stops TMRs from operating on the
>> command but for isert we need to prevent LIO core from calling into
>> iscsit callouts when the connection is being brought down. If LIO core
>> queues commands to iscsit and it ends up adding to an internal queue
>> instead of passing back to the driver then we can end up hanging waiting
>> on command completion that never occurs because it's stuck on the internal
>> list (the tx thread is stopped at this time, so it will never loop over
>> the response list and call into isert). We also want to sync up on a
>> point where we no longer call into isert so it can cleanup it's structs.
> 
> If the fabric driver knows that responses will not be completed by HW
> then the fabric driver shall itself complete such responses.
> Please do not shift this responsibility to LIO core.


This patch makes it so the fabric driver can tell LIO core to not do
the last put, so the fabric driver can do some cleanup on the cmd when
it completes on the backend but before we free it. Basically for isert
iscsit wants to call aborted_task to clean up the command. iscsit tcp will
also re-call __iscsit_free_cmd to cleanup queue_status calls that were done.

Are you saying that is shifting the responsibility to complete the cmd on LIO
core? I think it's the opposite since the fabric driver is doing the last
put and cleanup.

However, are you saying that just adding the check is putting it on LIO core?
If so, we can always call check_stop_free in transport_cmd_check_stop_to_fabric.
The fabric driver's release_cmd then has to figure out what needs to be cleaned
up then. I can make that work. It's more complicated than just what I wrote
because of how isert handles failed cmds and tmrs, but it's doable.
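
For example, a sketch of the check_stop_free based approach (illustration
only; example_fabric_cmd and example_cleanup_cmd are made-up stand-ins for
the fabric's private command struct and cleanup):

/*
 * If transport_cmd_check_stop_to_fabric always calls check_stop_free,
 * the fabric callout can do its extra cleanup and then drop the final
 * reference itself instead of LIO core doing the last put.
 */
static int example_check_stop_free(struct se_cmd *se_cmd)
{
	struct example_fabric_cmd *cmd = container_of(se_cmd,
					struct example_fabric_cmd, se_cmd);

	/* fabric-private cleanup, e.g. flushing internally queued responses */
	example_cleanup_cmd(cmd);

	/* fabric does the last put */
	return target_put_sess_cmd(se_cmd);
}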


> 
>> This has LIO core treat CMD_T_FABRIC_STOP like CMD_T_STOP during
>> command execution and also fixes the locking around the
>> target_cmd_interrupted calls so fabric modules can make sure cmds are
>> never marked both CMD_T_COMPLETE and CMD_T_STOP|CMD_T_FABRIC_STOP.
> 
> CMD_T_STOP is some ancient logic that is used to move responses from a failed
> connection to a new one during recovery in ERL=2.
> I believe that the CMD_T_STOP logic was reused at connection release just
> to reduce conn/session use-after-free cases at command release.
>
> Thanks to this patchset all commands in the connection are waited on for
> completion in iscsit_release_commands_from_conn(). Is there any
> sense in using the CMD_T_STOP mechanism there now?


CMD_T_STOP does 2 things for us in iscsit_release_commands_from_conn:
1. It waits for the cmd to complete so we don't release the conn/session
from under a cmd.
2. It prevents transport_cmd_check_stop_to_fabric from doing the last put
if the command is completing normally (not aborted). iscsit does it instead
so we can do some extra cleanup.

For iscsit tcp, we don't need 1 or 2. Funnily enough, if you had never brought
up isert in the other patchset, we could just do target_wait_for_cmds
and be done :)

For isert, we don't need 1. We are using it for #2 due to how the driver
handles failed cmds and tmrs. I either need some check so I can tell
transport_cmd_check_stop_to_fabric that iscsit wants to do the last put,
or we can make it release_cmd based.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 07/18] scsi: target: Treat CMD_T_FABRIC_STOP like CMD_T_STOP
  2023-03-15 22:54     ` Mike Christie
@ 2023-03-16  0:01       ` michael.christie
  0 siblings, 0 replies; 30+ messages in thread
From: michael.christie @ 2023-03-16  0:01 UTC (permalink / raw)
  To: Dmitry Bogdanov
  Cc: mlombard, martin.petersen, mgurtovoy, sagi, linux-scsi, target-devel

On 3/15/23 5:54 PM, Mike Christie wrote:
> However, are you saying that just adding the check is putting it on LIO core?
> If so, we can always call check_stop_free in transport_cmd_check_stop_to_fabric.
> The fabric driver's release_cmd then has to figure out what needs to be cleaned
> up then. I can make that work. It's more complicated than just what I wrote
> because of how isert handles failed cmds and tmrs, but it's doable.

I missed the point of your mail before. CMD_T_STOP sucks so don't replicate it
with another bit :) Agree with that :)

I'll work on a patch like I described above.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 13/18] scsi: target: Fix multiple LUN_RESET handling
  2023-03-15 21:42         ` Mike Christie
@ 2023-03-16 10:39           ` Dmitry Bogdanov
  2023-03-16 16:03             ` Mike Christie
  2023-03-16 16:07             ` Mike Christie
  0 siblings, 2 replies; 30+ messages in thread
From: Dmitry Bogdanov @ 2023-03-16 10:39 UTC (permalink / raw)
  To: Mike Christie
  Cc: mlombard, martin.petersen, mgurtovoy, sagi, linux-scsi, target-devel

On Wed, Mar 15, 2023 at 04:42:19PM -0500, Mike Christie wrote:
> 
> On 3/15/23 2:11 PM, Dmitry Bogdanov wrote:
> > On Wed, Mar 15, 2023 at 11:44:48AM -0500, Mike Christie wrote:
> >>
> >> On 3/15/23 11:13 AM, Dmitry Bogdanov wrote:
> >>> On Thu, Mar 09, 2023 at 04:33:07PM -0600, Mike Christie wrote:
> >>>>
> >>>> This fixes a bug where an initiator thinks a LUN_RESET has cleaned
> >>>> up running commands when it hasn't. The bug was added in:
> >>>>
> >>>> commit 51ec502a3266 ("target: Delete tmr from list before processing")
> >>>>
> >>>> The problem occurs when:
> >>>>
> >>>> 1. We have N IO cmds running in the target layer spread over 2 sessions.
> >>>> 2. The initiator sends a LUN_RESET for each session.
> >>>> 3. session1's LUN_RESET loops over all the running commands from both
> >>>> sessions and moves them to its local drain_task_list.
> >>>> 4. session2's LUN_RESET does not see the LUN_RESET from session1 because
> >>>> the commit above has it remove itself. session2 also does not see any
> >>>> commands since the other reset moved them off the state lists.
> >>>>>> 5. session2's LUN_RESET will then complete with a successful response.
> >>>>>> 6. session2's initiator believes the running commands on its session are
> >>>>>> now cleaned up due to the successful response and cleans up the running
> >>>>>> commands from its side. It then restarts them.
> >>>>>> 7. The commands do eventually complete on the backend and the target
> >>>>>> starts to return aborted task statuses for them. The initiator will
> >>>>>> either throw an invalid ITT error or might accidentally lookup a new task
> >>>> if the ITT has been reallocated already.
> >>>>
> >>>> This fixes the bug by reverting the patch, and also serializes the
> >>>> execution of LUN_RESETs and Preempt and Aborts. The latter is necessary
> >>>> because it turns out the commit accidentally fixed a bug where if there
> >>>> are 2 LUN RESETs executing they can see each other on the dev_tmr_list,
> >>>> put the other one on their local drain list, then end up waiting on each
> >>>> other resulting in a deadlock.
> >>>
> >>> If LUN_RESET is not in TMR list anymore there is no need to serialize
> >>> core_tmr_drain_tmr_list.
> >>
> >> Ah shoot yeah I miswrote that. I meant I needed the serialization for my
> >> bug not yours.
> >
> > I still did not get why you are wrapping core_tmr_drain_*_list with a mutex.
> > general_tmr_list has only aborts now and they do not wait for other aborts.
> 
> Do you mean I don't need the mutex for the bug I originally hit that's described
> at the beginning? If you're saying I don't need it for 2 resets running at the same
> time, I agree. I thought I needed it if we have a RESET and Preempt and Abort:
> 
> 1. You have 2 sessions. There are no TMRs initially.
> 2. session1 gets Preempt and Abort. It calls core_tmr_drain_state_list
> and takes all the cmds from both sessions and puts them on the local
> drain_task_list list.
> 3. session1 or 2 gets a LUN_RESET, it sees no cmds on the device's
> state_lists, and returns success.
> 4. The initiator thinks the commands were cleaned up by the LUN_RESET.
> 
> - It could end up re-using the ITT while the original task being cleaned up is
> still running. Then depending on which session got what and if TAS was set, if
> the original command completes first then the initiator would think the second
> command failed with SAM_STAT_TASK_ABORTED.
> 
> - If there was no TAS or the RESET and Preempt and Abort were on the same session
> then we could still hit a bug. We get the RESET response, the initiator might
> retry the cmds or fail and the app might retry. The retry might go down a completely
> different path on the target (like if hw queue1 was blocked and had the original
> command, but this retry goes down hw queue2 due to being received on a different
> CPU, so it completes right away). We do some new IO. Then hw queue1 unblocks and
> overwrites the new IO.
> 
> With the mutex, the LUN_RESET will wait for the Preempt and Abort
> which is waiting on the running commands. I could have had Preempt
> and Abort create a tmr, and queue a work and go through that path
> but I thought it looked uglier faking it.

Thank you for the explanation. But I think you are not right here.
Preempt and Abort is used to change the reservation holder and abort
the preempted session's commands. A preempted session is not allowed to send
any new messages; they will be failed anyway.
So we are safe here. Or did I miss something?

> >>
> >>>>
> >>>>         if (cmd->transport_state & CMD_T_ABORTED)
> >>>> @@ -3596,6 +3597,22 @@ static void target_tmr_work(struct work_struct *work)
> >>>>                         target_dev_ua_allocate(dev, 0x29,
> >>>>                                                ASCQ_29H_BUS_DEVICE_RESET_FUNCTION_OCCURRED);
> >>>>                 }
> >>>> +
> >>>> +               /*
> >>>> +                * If this is the last reset the device can be freed after we
> >>>> +                * run transport_cmd_check_stop_to_fabric. Figure out if there
> >>>> +                * are other resets that need to be scheduled while we know we
> >>>> +                * have a refcount on the device.
> >>>> +                */
> >>>> +               spin_lock_irq(&dev->se_tmr_lock);
> >>>
> >>> tmr->tmr_list is removed from the list in the very end of se_cmd lifecycle
> >>> so any number of LUN_RESETs can be in lun_reset_tmr_list. And all of them
> >>> can be finished but not yet removed from the list.
> >>
> >> Don't we remove it from the list a little later in this function when
> >> we call transport_lun_remove_cmd?
> >
> > OMG, yes, of course, you are right. I messed up something.
> >
> > But I have concerns still:
> > transport_lookup_tmr_lun (where LUN_RESET is added to the list) and
> > transport_generic_handle_tmr(where LUN_RESET is scheduled to handle)
> > are not serialized. And below you can start the second LUN_RESET while
> > transport_generic_handle_tmr is not yet called for it. The _handle_tmr
> > could be delayed long enough to handle that second
> > LUN_RESET and to clear the flag. _handle_tmr will then start the work
> > again.
> 
> Ah yeah, nice catch.
> 
> >
> > Is it worth doing that list management? Is it not enough to just wrap
> > the core_tmr_lun_reset() call in target_tmr_work with a mutex?
> 
> I can just do the mutex.
> 
> Was trying to reduce how many threads we use, but the big win is for aborts.
> Will work on that type of thing in a separate patchset.

Considering that (if) I am right about PreemptAndAbort,
to address the multiple LUN_RESET issue it's enough to wrap
core_tmr_lun_reset and skip all LUN_RESETs in target_drain_tmr_list.
Without any new lists. That would be as simple a patch as possible.

> 
> > Better to have a separate variable used only under the lock.
> >
> Will fix.
> 
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 13/18] scsi: target: Fix multiple LUN_RESET handling
  2023-03-16 10:39           ` Dmitry Bogdanov
@ 2023-03-16 16:03             ` Mike Christie
  2023-03-16 16:07             ` Mike Christie
  1 sibling, 0 replies; 30+ messages in thread
From: Mike Christie @ 2023-03-16 16:03 UTC (permalink / raw)
  To: Dmitry Bogdanov
  Cc: mlombard, martin.petersen, mgurtovoy, sagi, linux-scsi, target-devel

On 3/16/23 5:39 AM, Dmitry Bogdanov wrote:
> On Wed, Mar 15, 2023 at 04:42:19PM -0500, Mike Christie wrote:
>> On 3/15/23 2:11 PM, Dmitry Bogdanov wrote:
>>> On Wed, Mar 15, 2023 at 11:44:48AM -0500, Mike Christie wrote:
>>>> On 3/15/23 11:13 AM, Dmitry Bogdanov wrote:
>>>>> On Thu, Mar 09, 2023 at 04:33:07PM -0600, Mike Christie wrote:
>>>>>> This fixes a bug where an initiator thinks a LUN_RESET has cleaned
>>>>>> up running commands when it hasn't. The bug was added in:
>>>>>>
>>>>>> commit 51ec502a3266 ("target: Delete tmr from list before processing")
>>>>>>
>>>>>> The problem occurs when:
>>>>>>
>>>>>> 1. We have N IO cmds running in the target layer spread over 2 sessions.
>>>>>> 2. The initiator sends a LUN_RESET for each session.
>>>>>> 3. session1's LUN_RESET loops over all the running commands from both
>>>>>> sessions and moves them to its local drain_task_list.
>>>>>> 4. session2's LUN_RESET does not see the LUN_RESET from session1 because
>>>>>> the commit above has it remove itself. session2 also does not see any
>>>>>> commands since the other reset moved them off the state lists.
>>>>>> 5. session2's LUN_RESET will then complete with a successful response.
>>>>>> 6. session2's initiator believes the running commands on its session are
>>>>>> now cleaned up due to the successful response and cleans up the running
>>>>>> commands from its side. It then restarts them.
>>>>>> 7. The commands do eventually complete on the backend and the target
>>>>>> starts to return aborted task statuses for them. The initiator will
>>>>>> either throw an invalid ITT error or might accidentally lookup a new task
>>>>>> if the ITT has been reallocated already.
>>>>>>
>>>>>> This fixes the bug by reverting the patch, and also serializes the
>>>>>> execution of LUN_RESETs and Preempt and Aborts. The latter is necessary
>>>>>> because it turns out the commit accidentally fixed a bug where if there
>>>>>> are 2 LUN RESETs executing they can see each other on the dev_tmr_list,
>>>>>> put the other one on their local drain list, then end up waiting on each
>>>>>> other resulting in a deadlock.
>>>>> If LUN_RESET is not in TMR list anymore there is no need to serialize
>>>>> core_tmr_drain_tmr_list.
>>>> Ah shoot yeah I miswrote that. I meant I needed the serialization for my
>>>> bug not yours.
>>> I still did not get why you are wrapping core_tmr_drain_*_list with a mutex.
>>> general_tmr_list has only aborts now and they do not wait for other aborts.
>> Do you mean I don't need the mutex for the bug I originally hit that's described
>> at the beginning? If you're saying I don't need it for 2 resets running at the same
>> time, I agree. I thought I needed it if we have a RESET and Preempt and Abort:
>>
>> 1. You have 2 sessions. There are no TMRs initially.
>> 2. session1 gets Preempt and Abort. It calls core_tmr_drain_state_list
>> and takes all the cmds from both sessions and puts them on the local
>> drain_task_list list.
>> 3. session1 or 2 gets a LUN_RESET, it sees no cmds on the device's
>> state_lists, and returns success.
>> 4. The initiator thinks the commands were cleaned up by the LUN_RESET.
>>
>> - It could end up re-using the ITT while the original task being cleaned up is
>> still running. Then depending on which session got what and if TAS was set, if
>> the original command completes first then the initiator would think the second
>> command failed with SAM_STAT_TASK_ABORTED.
>>
>> - If there was no TAS or the RESET and Preempt and Abort were on the same session
>> then we could still hit a bug. We get the RESET response, the initiator might
>> retry the cmds or fail and the app might retry. The retry might go down a completely
>> different path on the target (like if hw queue1 was blocked and had the original
>> command, but this retry goes down hw queue2 due to being received on a different
>> CPU, so it completes right away). We do some new IO. Then hw queue1 unblocks and
>> overwrites the new IO.
>>
>> With the mutex, the LUN_RESET will wait for the Preempt and Abort
>> which is waiting on the running commands. I could have had Preempt
>> and Abort create a tmr, and queue a work and go through that path
>> but I thought it looked uglier faking it.
> Thank you for the explanation. But I think you are not right here.
> Preempt and Abort is used to change the reservation holder and abort
> the preempted session's commands. A preempted session is not allowed to send
> any new messages; they will be failed anyway.

For the ITT bug, a preempted session can still send commands like INQUIRY,
TURS, RTPG, PR-in, etc. If those commands have the same ITT as the command
the Preempt and Abort is waiting on, we can hit the bug.

Also in general for the ITT bug, even if the new cmd was going to be failed
due to a conflict, it's not right. Eventually the command the Preempt and Abort
is waiting on completes. The initiator is going to end up logging a message
the user almost never sees about getting a command response but no running
command, dropping the connection, and bugging people like us :)

For the second issue, if the LUN_RESET came after the Preempt and Abort on
the same session, the RESET doesn't clear the registrations and reservation.
So the initiator is going to be sending IO down that specific path, so those
commands will be executing.

Agree with you for the no TAS and RESET and Preempt and Abort running
on different sessions case. I was thinking the path that got preempted
could later get registered and start sending IO, but I don't think that
makes sense.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 13/18] scsi: target: Fix multiple LUN_RESET handling
  2023-03-16 10:39           ` Dmitry Bogdanov
  2023-03-16 16:03             ` Mike Christie
@ 2023-03-16 16:07             ` Mike Christie
  1 sibling, 0 replies; 30+ messages in thread
From: Mike Christie @ 2023-03-16 16:07 UTC (permalink / raw)
  To: Dmitry Bogdanov
  Cc: mlombard, martin.petersen, mgurtovoy, sagi, linux-scsi, target-devel

On 3/16/23 5:39 AM, Dmitry Bogdanov wrote:
> Considering that (if) I am right about PreemptAndAbort,
> to address the multiple LUN_RESET issue it's enough to wrap
> core_tmr_lun_reset and skip all LUN_RESETs in target_drain_tmr_list.
> Without any new lists. That would be as simple a patch as possible.

I think even if you are not correct about PreemptAndAbort, we can
just do the mutex and skip LUN_RESETs in target_drain_tmr_list :)
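
As a rough sketch of what that skip would look like (illustration only, not
the real core_tmr_drain_tmr_list; the checks on each tmr's command state are
omitted):

/*
 * With resets serialized by a mutex, a draining reset only pulls
 * ABORT_TASK tmrs off dev_tmr_list and leaves other LUN_RESETs alone,
 * so two resets can no longer end up waiting on each other.
 */
static void example_drain_tmr_list(struct se_device *dev,
				   struct se_tmr_req *tmr,
				   struct list_head *drain_tmr_list)
{
	struct se_tmr_req *tmr_p, *tmr_pp;
	unsigned long flags;

	spin_lock_irqsave(&dev->se_tmr_lock, flags);
	list_for_each_entry_safe(tmr_p, tmr_pp, &dev->dev_tmr_list, tmr_list) {
		if (tmr_p == tmr)
			continue;	/* the reset doing the drain */
		if (tmr_p->function == TMR_LUN_RESET)
			continue;	/* left for its own worker */
		list_move_tail(&tmr_p->tmr_list, drain_tmr_list);
	}
	spin_unlock_irqrestore(&dev->se_tmr_lock, flags);
}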

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2023-03-16 16:07 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-09 22:32 [PATCH 00/18] target: TMF and recovery fixes Mike Christie
2023-03-09 22:32 ` [PATCH 01/18] scsi: target: Move sess cmd counter to new struct Mike Christie
2023-03-09 22:32 ` [PATCH 02/18] scsi: target: Move cmd counter allocation Mike Christie
2023-03-09 22:32 ` [PATCH 03/18] scsi: target: Pass in cmd counter to use during cmd setup Mike Christie
2023-03-09 22:32 ` [PATCH 04/18] scsi: target: iscsit/isert: Alloc per conn cmd counter Mike Christie
2023-03-09 22:32 ` [PATCH 05/18] scsi: target: iscsit: stop/wait on cmds during conn close Mike Christie
2023-03-09 22:33 ` [PATCH 06/18] scsi: target: Drop t_state_lock use in compare_and_write_post Mike Christie
2023-03-09 22:33 ` [PATCH 07/18] scsi: target: Treat CMD_T_FABRIC_STOP like CMD_T_STOP Mike Christie
2023-03-15 10:47   ` Dmitry Bogdanov
2023-03-15 22:54     ` Mike Christie
2023-03-16  0:01       ` michael.christie
2023-03-09 22:33 ` [PATCH 08/18] scsi: target: iscsit: Add helper to check when cmd has failed Mike Christie
2023-03-09 22:33 ` [PATCH 09/18] scsi: target: iscsit: Cleanup isert commands at conn closure Mike Christie
2023-03-09 22:33 ` [PATCH 10/18] IB/isert: Fix hang in target_wait_for_cmds Mike Christie
2023-03-09 22:33 ` [PATCH 11/18] IB/isert: Fix use after free during conn cleanup Mike Christie
2023-03-15 15:21   ` Sagi Grimberg
2023-03-09 22:33 ` [PATCH 12/18] scsi: target: iscsit: free cmds before session free Mike Christie
2023-03-09 22:33 ` [PATCH 13/18] scsi: target: Fix multiple LUN_RESET handling Mike Christie
2023-03-15 16:13   ` Dmitry Bogdanov
2023-03-15 16:44     ` Mike Christie
2023-03-15 19:11       ` Dmitry Bogdanov
2023-03-15 21:42         ` Mike Christie
2023-03-16 10:39           ` Dmitry Bogdanov
2023-03-16 16:03             ` Mike Christie
2023-03-16 16:07             ` Mike Christie
2023-03-09 22:33 ` [PATCH 14/18] scsi: target: Don't set CMD_T_FABRIC_STOP for aborted tasks Mike Christie
2023-03-09 22:33 ` [PATCH 15/18] scsi: target: iscsit: Fix TAS handling during conn cleanup Mike Christie
2023-03-09 22:33 ` [PATCH 16/18] scsi: target: drop tas arg from __transport_wait_for_tasks Mike Christie
2023-03-09 22:33 ` [PATCH 17/18] scsi: target: Remove sess_cmd_lock Mike Christie
2023-03-09 22:33 ` [PATCH 18/18] scsi: target: Move tag pr_debug to before we do a put on the cmd Mike Christie

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).