target-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mike Christie <michael.christie@oracle.com>
To: mlombard@redhat.com, martin.petersen@oracle.com,
	d.bogdanov@yadro.com, linux-scsi@vger.kernel.org,
	target-devel@vger.kernel.org
Cc: Mike Christie <michael.christie@oracle.com>
Subject: [PATCH v5 7/9] scsi: target: Fix multiple LUN_RESET handling
Date: Sat, 18 Mar 2023 20:56:18 -0500	[thread overview]
Message-ID: <20230319015620.96006-8-michael.christie@oracle.com> (raw)
In-Reply-To: <20230319015620.96006-1-michael.christie@oracle.com>

This fixes a bug where an initiator thinks a LUN_RESET has cleaned
up running commands when it hasn't. The bug was added in:

commit 51ec502a3266 ("target: Delete tmr from list before processing")

The problem occurs when:

1. We have N IO cmds running in the target layer spread over 2 sessions.
2. The initiator sends a LUN_RESET for each session.
3. session1's LUN_RESET loops over all the running commands from both
sessions and moves them to its local drain_task_list.
4. session2's LUN_RESET does not see the LUN_RESET from session1 because
the commit above has it remove itself. session2 also does not see any
commands since the other reset moved them off the state lists.
5. sessions2's LUN_RESET will then complete with a successful response.
6. sessions2's inititor believes the running commands on its session are
now cleaned up due to the successful response and cleans up the running
commands from its side. It then restarts them.
7. The commands do eventually complete on the backend and the target
starts to return aborted task statuses for them. The initiator will
either throw a invalid ITT error or might accidentally lookup a new task
if the ITT has been reallocated already.

This fixes the bug by reverting the patch, and serializes the execution of
LUN_RESETs and Preempt and Aborts.

The patch also prevents us from waiting on LUN_RESETs in
core_tmr_drain_tmr_list, because it turns out the original patch fixed a
bug that was not mentioned. For LUN_RESET1 core_tmr_drain_tmr_list can see
a second LUN_RESET and wait on it. Then the second reset will run
core_tmr_drain_tmr_list and see the first reset and wait on it resulting
in a deadlock.

Fixes: 51ec502a3266 ("target: Delete tmr from list before processing")
Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/target/target_core_device.c |  1 +
 drivers/target/target_core_tmr.c    | 26 +++++++++++++++++++++++---
 include/target/target_core_base.h   |  1 +
 3 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/drivers/target/target_core_device.c b/drivers/target/target_core_device.c
index f6e58410ec3f..aeb03136773d 100644
--- a/drivers/target/target_core_device.c
+++ b/drivers/target/target_core_device.c
@@ -782,6 +782,7 @@ struct se_device *target_alloc_device(struct se_hba *hba, const char *name)
 	spin_lock_init(&dev->t10_alua.lba_map_lock);
 
 	INIT_WORK(&dev->delayed_cmd_work, target_do_delayed_work);
+	mutex_init(&dev->lun_reset_mutex);
 
 	dev->t10_wwn.t10_dev = dev;
 	/*
diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c
index 2b95b4550a63..4718db628222 100644
--- a/drivers/target/target_core_tmr.c
+++ b/drivers/target/target_core_tmr.c
@@ -188,14 +188,23 @@ static void core_tmr_drain_tmr_list(
 	 * LUN_RESET tmr..
 	 */
 	spin_lock_irqsave(&dev->se_tmr_lock, flags);
-	if (tmr)
-		list_del_init(&tmr->tmr_list);
 	list_for_each_entry_safe(tmr_p, tmr_pp, &dev->dev_tmr_list, tmr_list) {
+		if (tmr_p == tmr)
+			continue;
+
 		cmd = tmr_p->task_cmd;
 		if (!cmd) {
 			pr_err("Unable to locate struct se_cmd for TMR\n");
 			continue;
 		}
+
+		/*
+		 * We only execute one LUN_RESET at a time so we can't wait
+		 * on them below.
+		 */
+		if (tmr_p->function == TMR_LUN_RESET)
+			continue;
+
 		/*
 		 * If this function was called with a valid pr_res_key
 		 * parameter (eg: for PROUT PREEMPT_AND_ABORT service action
@@ -379,14 +388,25 @@ int core_tmr_lun_reset(
 				tmr_nacl->initiatorname);
 		}
 	}
+
+
+	/*
+	 * We only allow one reset or preempt and abort to execute at a time
+	 * to prevent one call from claiming all the cmds causing a second
+	 * call from returning while cmds it should have waited on are still
+	 * running.
+	 */
+	mutex_lock(&dev->lun_reset_mutex);
+
 	pr_debug("LUN_RESET: %s starting for [%s], tas: %d\n",
 		(preempt_and_abort_list) ? "Preempt" : "TMR",
 		dev->transport->name, tas);
-
 	core_tmr_drain_tmr_list(dev, tmr, preempt_and_abort_list);
 	core_tmr_drain_state_list(dev, prout_cmd, tmr_sess, tas,
 				preempt_and_abort_list);
 
+	mutex_unlock(&dev->lun_reset_mutex);
+
 	/*
 	 * Clear any legacy SPC-2 reservation when called during
 	 * LOGICAL UNIT RESET
diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
index bd299790e99c..8cc42ad65c92 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -872,6 +872,7 @@ struct se_device {
 	struct rcu_head		rcu_head;
 	int			queue_cnt;
 	struct se_device_queue	*queues;
+	struct mutex		lun_reset_mutex;
 };
 
 struct target_opcode_descriptor {
-- 
2.25.1


  parent reply	other threads:[~2023-03-19  1:56 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-19  1:56 [PATCH v5 0/9] target: TMF and recovery fixes Mike Christie
2023-03-19  1:56 ` [PATCH v5 1/9] scsi: target: Move sess cmd counter to new struct Mike Christie
2023-03-19  1:56 ` [PATCH v5 2/9] scsi: target: Move cmd counter allocation Mike Christie
2023-03-19  1:56 ` [PATCH v5 3/9] scsi: target: Pass in cmd counter to use during cmd setup Mike Christie
2023-03-19  1:56 ` [PATCH v5 4/9] scsi: target: iscsit/isert: Alloc per conn cmd counter Mike Christie
2023-03-19  1:56 ` [PATCH v5 5/9] scsi: target: iscsit: stop/wait on cmds during conn close Mike Christie
2023-03-19  1:56 ` [PATCH v5 6/9] scsi: target: iscsit: free cmds before session free Mike Christie
2023-03-19  1:56 ` Mike Christie [this message]
2023-03-19  1:56 ` [PATCH v5 8/9] scsi: target: iscsit: Fix TAS handling during conn cleanup Mike Christie
2023-03-19  1:56 ` [PATCH v5 9/9] scsi: target: iscsi: handle abort for WRITE_PENDING cmds Mike Christie
2023-03-24 21:40 ` [PATCH v5 0/9] target: TMF and recovery fixes Martin K. Petersen
2023-04-03  2:15 ` Martin K. Petersen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230319015620.96006-8-michael.christie@oracle.com \
    --to=michael.christie@oracle.com \
    --cc=d.bogdanov@yadro.com \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=mlombard@redhat.com \
    --cc=target-devel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).