All of lore.kernel.org
 help / color / mirror / Atom feed
From: James Smart <jsmart2021@gmail.com>
To: linux-scsi@vger.kernel.org
Cc: James Smart <jsmart2021@gmail.com>, Justin Tee <justin.tee@broadcom.com>
Subject: [PATCH 06/26] lpfc: Fix SCSI I/O completion and abort handler deadlock
Date: Tue, 12 Apr 2022 15:19:48 -0700	[thread overview]
Message-ID: <20220412222008.126521-7-jsmart2021@gmail.com> (raw)
In-Reply-To: <20220412222008.126521-1-jsmart2021@gmail.com>

During stress I/O tests with 500+ vports, hard LOCKUP call traces are
observed.

CPU A:
 native_queued_spin_lock_slowpath+0x192
 _raw_spin_lock_irqsave+0x32
 lpfc_handle_fcp_err+0x4c6
 lpfc_fcp_io_cmd_wqe_cmpl+0x964
 lpfc_sli4_fp_handle_cqe+0x266
 __lpfc_sli4_process_cq+0x105
 __lpfc_sli4_hba_process_cq+0x3c
 lpfc_cq_poll_hdler+0x16
 irq_poll_softirq+0x76
 __softirqentry_text_start+0xe4
 irq_exit+0xf7
 do_IRQ+0x7f

CPU B:
 native_queued_spin_lock_slowpath+0x5b
 _raw_spin_lock+0x1c
 lpfc_abort_handler+0x13e
 scmd_eh_abort_handler+0x85
 process_one_work+0x1a7
 worker_thread+0x30
 kthread+0x112
 ret_from_fork+0x1f

Diagram of lockup:

CPUA                            CPUB
----                            ----
lpfc_cmd->buf_lock
                            phba->hbalock
                            lpfc_cmd->buf_lock
phba->hbalock

Fix by reordering the taking of the lpfc_cmd->buf_lock and phba->hbalock
in lpfc_abort_handler routine so that it tries to take the
lpfc_cmd->buf_lock first before phba->hbalock.

Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
---
 drivers/scsi/lpfc/lpfc_scsi.c | 33 +++++++++++++++------------------
 1 file changed, 15 insertions(+), 18 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c
index ae340850d94f..c3daf7a3e123 100644
--- a/drivers/scsi/lpfc/lpfc_scsi.c
+++ b/drivers/scsi/lpfc/lpfc_scsi.c
@@ -5865,25 +5865,25 @@ lpfc_abort_handler(struct scsi_cmnd *cmnd)
 	if (!lpfc_cmd)
 		return ret;
 
-	spin_lock_irqsave(&phba->hbalock, flags);
+	/* Guard against IO completion being called at same time */
+	spin_lock_irqsave(&lpfc_cmd->buf_lock, flags);
+
+	spin_lock(&phba->hbalock);
 	/* driver queued commands are in process of being flushed */
 	if (phba->hba_flag & HBA_IOQ_FLUSH) {
 		lpfc_printf_vlog(vport, KERN_WARNING, LOG_FCP,
 			"3168 SCSI Layer abort requested I/O has been "
 			"flushed by LLD.\n");
 		ret = FAILED;
-		goto out_unlock;
+		goto out_unlock_hba;
 	}
 
-	/* Guard against IO completion being called at same time */
-	spin_lock(&lpfc_cmd->buf_lock);
-
 	if (!lpfc_cmd->pCmd) {
 		lpfc_printf_vlog(vport, KERN_WARNING, LOG_FCP,
 			 "2873 SCSI Layer I/O Abort Request IO CMPL Status "
 			 "x%x ID %d LUN %llu\n",
 			 SUCCESS, cmnd->device->id, cmnd->device->lun);
-		goto out_unlock_buf;
+		goto out_unlock_hba;
 	}
 
 	iocb = &lpfc_cmd->cur_iocbq;
@@ -5891,7 +5891,7 @@ lpfc_abort_handler(struct scsi_cmnd *cmnd)
 		pring_s4 = phba->sli4_hba.hdwq[iocb->hba_wqidx].io_wq->pring;
 		if (!pring_s4) {
 			ret = FAILED;
-			goto out_unlock_buf;
+			goto out_unlock_hba;
 		}
 		spin_lock(&pring_s4->ring_lock);
 	}
@@ -5924,8 +5924,8 @@ lpfc_abort_handler(struct scsi_cmnd *cmnd)
 			 "3389 SCSI Layer I/O Abort Request is pending\n");
 		if (phba->sli_rev == LPFC_SLI_REV4)
 			spin_unlock(&pring_s4->ring_lock);
-		spin_unlock(&lpfc_cmd->buf_lock);
-		spin_unlock_irqrestore(&phba->hbalock, flags);
+		spin_unlock(&phba->hbalock);
+		spin_unlock_irqrestore(&lpfc_cmd->buf_lock, flags);
 		goto wait_for_cmpl;
 	}
 
@@ -5946,15 +5946,13 @@ lpfc_abort_handler(struct scsi_cmnd *cmnd)
 	if (ret_val != IOCB_SUCCESS) {
 		/* Indicate the IO is not being aborted by the driver. */
 		lpfc_cmd->waitq = NULL;
-		spin_unlock(&lpfc_cmd->buf_lock);
-		spin_unlock_irqrestore(&phba->hbalock, flags);
 		ret = FAILED;
-		goto out;
+		goto out_unlock_hba;
 	}
 
 	/* no longer need the lock after this point */
-	spin_unlock(&lpfc_cmd->buf_lock);
-	spin_unlock_irqrestore(&phba->hbalock, flags);
+	spin_unlock(&phba->hbalock);
+	spin_unlock_irqrestore(&lpfc_cmd->buf_lock, flags);
 
 	if (phba->cfg_poll & DISABLE_FCP_RING_INT)
 		lpfc_sli_handle_fast_ring_event(phba,
@@ -5989,10 +5987,9 @@ lpfc_abort_handler(struct scsi_cmnd *cmnd)
 out_unlock_ring:
 	if (phba->sli_rev == LPFC_SLI_REV4)
 		spin_unlock(&pring_s4->ring_lock);
-out_unlock_buf:
-	spin_unlock(&lpfc_cmd->buf_lock);
-out_unlock:
-	spin_unlock_irqrestore(&phba->hbalock, flags);
+out_unlock_hba:
+	spin_unlock(&phba->hbalock);
+	spin_unlock_irqrestore(&lpfc_cmd->buf_lock, flags);
 out:
 	lpfc_printf_vlog(vport, KERN_WARNING, LOG_FCP,
 			 "0749 SCSI Layer I/O Abort Request Status x%x ID %d "
-- 
2.26.2


  parent reply	other threads:[~2022-04-12 23:34 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-12 22:19 [PATCH 00/26] lpfc: Update lpfc to revision 14.2.0.2 James Smart
2022-04-12 22:19 ` [PATCH 01/26] lpfc: Tweak message log categories for ELS/FDMI/NVME Rescan James Smart
2022-04-12 22:19 ` [PATCH 02/26] lpfc: Move cfg_log_verbose check before calling lpfc_dmp_dbg James Smart
2022-04-12 22:19 ` [PATCH 03/26] lpfc: Fix diagnostic fw logging after a function reset James Smart
2022-04-12 22:19 ` [PATCH 04/26] lpfc: Zero SLI4 fcp_cmnd buffer's fcpCntl0 field James Smart
2022-04-12 22:19 ` [PATCH 05/26] lpfc: Requeue SCSI I/O to upper layer when fw reports link down James Smart
2022-04-12 22:19 ` James Smart [this message]
2022-04-12 22:19 ` [PATCH 07/26] lpfc: Clear fabric topology flag before initiating a new FLOGI James Smart
2022-04-12 22:19 ` [PATCH 08/26] lpfc: Fix null pointer dereference after failing to issue FLOGI and PLOGI James Smart
2022-04-12 22:19 ` [PATCH 09/26] lpfc: Protect memory leak for NPIV ports sending PLOGI_RJT James Smart
2022-04-12 22:19 ` [PATCH 10/26] lpfc: Update fc_prli_sent outstanding only after guaranteed IOCB submit James Smart
2022-04-12 22:19 ` [PATCH 11/26] lpfc: Transition to NPR state upon LOGO cmpl if link down or aborted James Smart
2022-04-12 22:19 ` [PATCH 12/26] lpfc: Remove unnecessary NULL pointer assignment for ELS_RDF path James Smart
2022-04-12 22:19 ` [PATCH 13/26] lpfc: Move MI module parameter check to handle dynamic disable James Smart
2022-04-12 22:19 ` [PATCH 14/26] lpfc: Correct CRC32 calculation for congestion stats James Smart
2022-04-12 22:19 ` [PATCH 15/26] lpfc: Fix call trace observed during I/O with CMF enabled James Smart
2022-04-12 22:19 ` [PATCH 16/26] lpfc: Revise FDMI reporting of supported port speed for trunk groups James Smart
2022-04-12 22:19 ` [PATCH 17/26] lpfc: Remove false FDMI NVME FC-4 support for NPIV ports James Smart
2022-04-12 22:20 ` [PATCH 18/26] lpfc: Register for Application Services FC-4 type in Fabric topology James Smart
2022-04-12 22:20 ` [PATCH 19/26] lpfc: Introduce FC_RSCN_MEMENTO flag for tracking post RSCN completion James Smart
2022-04-12 22:20 ` [PATCH 20/26] lpfc: Fix field overload in lpfc_iocbq data structure James Smart
2022-04-13 16:25   ` kernel test robot
2022-04-12 22:20 ` [PATCH 21/26] lpfc: Refactor cleanup of mailbox commands James Smart
2022-04-12 22:20 ` [PATCH 22/26] lpfc: Change FA-PWWN detection methodology James Smart
2022-04-12 22:20 ` [PATCH 23/26] lpfc: Update stat accounting for READ_STATUS mbox command James Smart
2022-04-12 22:20 ` [PATCH 24/26] lpfc: Expand setting ELS_ID field in ELS_REQUEST64_WQE James Smart
2022-04-12 22:20 ` [PATCH 25/26] lpfc: Update lpfc version to 14.2.0.2 James Smart
2022-04-12 22:20 ` [PATCH 26/26] lpfc: Copyright updates for 14.2.0.2 patches James Smart
2022-04-19  2:50 ` [PATCH 00/26] lpfc: Update lpfc to revision 14.2.0.2 Martin K. Petersen
2022-04-26  4:00 ` Martin K. Petersen
2022-04-18 18:53 [PATCH 20/26] lpfc: Fix field overload in lpfc_iocbq data structure kernel test robot
2022-04-22 14:51 ` Dan Carpenter
2022-04-22 14:51 ` Dan Carpenter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220412222008.126521-7-jsmart2021@gmail.com \
    --to=jsmart2021@gmail.com \
    --cc=justin.tee@broadcom.com \
    --cc=linux-scsi@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.