* [PATCH v2 00/15] lpfc: Update lpfc to revision 12.8.0.7
From: James Smart @ 2021-01-04 18:02 UTC (permalink / raw)
  To: linux-scsi; +Cc: James Smart

Update lpfc to revision 12.8.0.7

This patch set contains fixes and a cleanup of trace logging.

The patches were cut against Martin's 5.11/scsi-queue tree.

---
v2:
 Reword description on patch 7 as it wasn't quite right:
   Prevent duplicate requests to unregister with cpuhp framework

James Smart (15):
  lpfc: Fix PLOGI S_ID of 0 on pt2pt config
  lpfc: Fix auto sli_mode and its effect on CONFIG_PORT for SLI3
  lpfc: Refresh ndlp when a new PRLI is received in the PRLI issue state
  lpfc: Fix crash when a fabric node is released prematurely.
  lpfc: Use the nvme-fc transport supplied timeout for LS requests
  lpfc: Fix FW reset action if IOs are outstanding
  lpfc: Prevent duplicate requests to unregister with cpuhp framework
  lpfc: Fix error log messages being logged following scsi task mgnt
  lpfc: Fix target reset failing
  lpfc: Fix NVME recovery after mailbox timeout
  lpfc: Fix vport create logging
  lpfc: Fix crash when nvmet transport calls host_release
  lpfc: Implement health checking when aborting io
  lpfc: Enhancements to LOG_TRACE_EVENT for better readability
  lpfc: Update lpfc version to 12.8.0.7

 drivers/scsi/lpfc/lpfc.h           |   4 +-
 drivers/scsi/lpfc/lpfc_attr.c      |   9 +-
 drivers/scsi/lpfc/lpfc_crtn.h      |   6 +-
 drivers/scsi/lpfc/lpfc_disc.h      |  15 +-
 drivers/scsi/lpfc/lpfc_els.c       |  47 +++---
 drivers/scsi/lpfc/lpfc_hbadisc.c   |  21 ++-
 drivers/scsi/lpfc/lpfc_init.c      | 241 +++++++++++++++++++----------
 drivers/scsi/lpfc/lpfc_nportdisc.c |  21 ++-
 drivers/scsi/lpfc/lpfc_nvme.c      |  45 +++---
 drivers/scsi/lpfc/lpfc_nvmet.c     |  33 +++-
 drivers/scsi/lpfc/lpfc_scsi.c      |  58 ++++++-
 drivers/scsi/lpfc/lpfc_sli.c       | 141 +++++++++++------
 drivers/scsi/lpfc/lpfc_version.h   |   2 +-
 drivers/scsi/lpfc/lpfc_vport.c     |   2 +-
 14 files changed, 436 insertions(+), 209 deletions(-)

-- 
2.26.2



* [PATCH v2 01/15] lpfc: Fix PLOGI S_ID of 0 on pt2pt config
From: James Smart @ 2021-01-04 18:02 UTC (permalink / raw)
  To: linux-scsi; +Cc: James Smart, Dick Kennedy

Under some pt2pt situations, the other end of the link may issue a LOGO
after successfully completing PLOGI and assigning addresses to the port.
The driver may then attempt a new PLOGI to re-create the login, but the
LOGO handling has already cleared the address back to 0. Once this
happens, the other end, which may itself be address 0, gets confused, and
the situation cannot be resolved without administrative action to bounce
the link.

Fix by assuming that address assignment only occurs on the 1st PLOGI
after link up and that, regardless of login state, the address assignment
sticks. The FC standards aren't particularly clear in this situation
(they only describe the initial PLOGI), but nothing contradicts this
assumption, and the behavior of the devices tested appears to conform to
this understanding.

Thus, don't reset the port address to 0 as part of LOGO handling. Port
addresses will only reset on link down.
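The address lifetime rule above can be modeled with a small state sketch. This is a simplified illustration, not lpfc code: `struct pt2pt_port` and the `port_*` functions are hypothetical, and only the "assign on first PLOGI, keep across LOGO, clear on link down" behavior from this patch is modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, heavily simplified model of a pt2pt port; not lpfc code. */
struct pt2pt_port {
	uint32_t my_did;    /* N_Port ID assigned during the first PLOGI */
	int      logged_in; /* 1 while a login with the remote port exists */
};

/* First PLOGI after link up assigns the address; later PLOGIs keep it. */
static void port_plogi_cmpl(struct pt2pt_port *p, uint32_t assigned_did)
{
	if (p->my_did == 0)
		p->my_did = assigned_did;
	p->logged_in = 1;
}

/* LOGO tears down the login but, per this patch, keeps the address. */
static void port_logo_cmpl(struct pt2pt_port *p)
{
	p->logged_in = 0;
}

/* Only a link down returns the port address to 0. */
static void port_link_down(struct pt2pt_port *p)
{
	p->logged_in = 0;
	p->my_did = 0;
}
```

A LOGO followed by a re-PLOGI therefore leaves `my_did` untouched; only `port_link_down()` makes the port eligible for a new assignment.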

Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
---
 drivers/scsi/lpfc/lpfc_els.c | 31 +++++--------------------------
 1 file changed, 5 insertions(+), 26 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_els.c b/drivers/scsi/lpfc/lpfc_els.c
index 96c087b8b474..e099caa04535 100644
--- a/drivers/scsi/lpfc/lpfc_els.c
+++ b/drivers/scsi/lpfc/lpfc_els.c
@@ -2815,7 +2815,6 @@ lpfc_cmpl_els_logo(struct lpfc_hba *phba, struct lpfc_iocbq *cmdiocb,
 	struct lpfc_nodelist *ndlp = (struct lpfc_nodelist *) cmdiocb->context1;
 	struct lpfc_vport *vport = ndlp->vport;
 	IOCB_t *irsp;
-	struct lpfcMboxq *mbox;
 	unsigned long flags;
 	uint32_t skip_recovery = 0;
 
@@ -2884,31 +2883,11 @@ lpfc_cmpl_els_logo(struct lpfc_hba *phba, struct lpfc_iocbq *cmdiocb,
 	lpfc_els_free_iocb(phba, cmdiocb);
 	lpfc_nlp_put(ndlp);
 
-	/* If we are in pt2pt mode, we could rcv new S_ID on PLOGI */
-	if ((vport->fc_flag & FC_PT2PT) &&
-		!(vport->fc_flag & FC_PT2PT_PLOGI)) {
-		phba->pport->fc_myDID = 0;
-
-		if ((vport->cfg_enable_fc4_type == LPFC_ENABLE_BOTH) ||
-		    (vport->cfg_enable_fc4_type == LPFC_ENABLE_NVME)) {
-			if (phba->nvmet_support)
-				lpfc_nvmet_update_targetport(phba);
-			else
-				lpfc_nvme_update_localport(phba->pport);
-		}
-
-		mbox = mempool_alloc(phba->mbox_mem_pool, GFP_KERNEL);
-		if (mbox) {
-			lpfc_config_link(phba, mbox);
-			mbox->mbox_cmpl = lpfc_sli_def_mbox_cmpl;
-			mbox->vport = vport;
-			if (lpfc_sli_issue_mbox(phba, mbox, MBX_NOWAIT) ==
-				MBX_NOT_FINISHED) {
-				mempool_free(mbox, phba->mbox_mem_pool);
-				skip_recovery = 1;
-			}
-		}
-	}
+	/* At this point, the LOGO processing is complete. NOTE: For a
+	 * pt2pt topology, we are assuming the NPortID will only change
+	 * on link up processing. For a LOGO / PLOGI initiated by the
+	 * Initiator, we are assuming the NPortID is not going to change.
+	 */
 
 	/*
 	 * If the node is a target, the handling attempts to recover the port.
-- 
2.26.2



* [PATCH v2 02/15] lpfc: Fix auto sli_mode and its effect on CONFIG_PORT for SLI3
From: James Smart @ 2021-01-04 18:02 UTC (permalink / raw)
  To: linux-scsi; +Cc: James Smart, Dick Kennedy

A very long time ago, there was a feature: auto sli mode. It gave the
user the ability to auto-select the SLI mode (SLI2 or SLI3) to run the
port in, or even to force SLI2 mode if configured. Because of the
convoluted logic, the CONFIG_PORT mbox command ends up being called 2 or
3 times when it should be called only once. Additionally, the driver no
longer supports SLI-2, so only SLI-3 mode should be allowed.

The following changes were made:
- Force the module parameter to SLI3 only.
- Rip out redundant CONFIG_PORT mbox commands.
- Force the CONFIG_PORT mbox command to be issued at the beginning of the
  enable ISR routine.
- Add changes for offline to online behavior.
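The "issue CONFIG_PORT exactly once" scheme can be sketched as a flag handshake. This is a hedged simplification: `struct hba`, `config_port()`, `board_reset()`, `enable_intr()` and `hba_setup()` are hypothetical stand-ins for `struct lpfc_hba`, `lpfc_sli_config_port()`, the reset paths, `lpfc_sli_enable_intr()` and `lpfc_sli_hba_setup()`; only the `HBA_NEEDS_CFG_PORT` value is taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define HBA_NEEDS_CFG_PORT 0x2000000 /* value taken from the patch */

/* Hypothetical stand-in for struct lpfc_hba; only what the sketch needs. */
struct hba {
	uint32_t hba_flag;
	int      config_port_calls; /* counts simulated CONFIG_PORT mailboxes */
};

/* Stand-in for lpfc_sli_config_port(); always succeeds here. */
static int config_port(struct hba *h)
{
	h->config_port_calls++;
	return 0;
}

/* Any board reset or chipset init means the port must be configured again. */
static void board_reset(struct hba *h)
{
	h->hba_flag |= HBA_NEEDS_CFG_PORT;
}

/* Like the enable ISR path: CONFIG_PORT first, before any MSI setup. */
static int enable_intr(struct hba *h)
{
	if (config_port(h))
		return -1;
	h->hba_flag &= ~HBA_NEEDS_CFG_PORT;
	return 0;
}

/* Like the HBA setup path: only issue CONFIG_PORT if it is still needed. */
static int hba_setup(struct hba *h)
{
	if (h->hba_flag & HBA_NEEDS_CFG_PORT) {
		if (config_port(h))
			return -EIO;
		h->hba_flag &= ~HBA_NEEDS_CFG_PORT;
	}
	return 0;
}
```

On a normal bring-up, `enable_intr()` consumes the flag, so `hba_setup()` does not send a second CONFIG_PORT; after a reset without re-enabling interrupts, `hba_setup()` sends it once.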

Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
---
 drivers/scsi/lpfc/lpfc.h      |  1 +
 drivers/scsi/lpfc/lpfc_attr.c |  7 ++----
 drivers/scsi/lpfc/lpfc_init.c | 20 ++++++++-------
 drivers/scsi/lpfc/lpfc_sli.c  | 46 +++++++++--------------------------
 4 files changed, 26 insertions(+), 48 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc.h b/drivers/scsi/lpfc/lpfc.h
index a54c8da30273..7875552c07d3 100644
--- a/drivers/scsi/lpfc/lpfc.h
+++ b/drivers/scsi/lpfc/lpfc.h
@@ -779,6 +779,7 @@ struct lpfc_hba {
 					 */
 #define HBA_FLOGI_ISSUED	0x100000 /* FLOGI was issued */
 #define HBA_DEFER_FLOGI		0x800000 /* Defer FLOGI till read_sparm cmpl */
+#define HBA_NEEDS_CFG_PORT	0x2000000 /* SLI3 - needs a CONFIG_PORT mbox */
 
 	uint32_t fcp_ring_in_use; /* When polling test if intr-hndlr active*/
 	struct lpfc_dmabuf slim2p;
diff --git a/drivers/scsi/lpfc/lpfc_attr.c b/drivers/scsi/lpfc/lpfc_attr.c
index 4528166dee36..f8bb6a4f780c 100644
--- a/drivers/scsi/lpfc/lpfc_attr.c
+++ b/drivers/scsi/lpfc/lpfc_attr.c
@@ -3441,11 +3441,8 @@ unsigned long lpfc_no_hba_reset[MAX_HBAS_NO_RESET] = {
 module_param_array(lpfc_no_hba_reset, ulong, &lpfc_no_hba_reset_cnt, 0444);
 MODULE_PARM_DESC(lpfc_no_hba_reset, "WWPN of HBAs that should not be reset");
 
-LPFC_ATTR(sli_mode, 0, 0, 3,
-	"SLI mode selector:"
-	" 0 - auto (SLI-3 if supported),"
-	" 2 - select SLI-2 even on SLI-3 capable HBAs,"
-	" 3 - select SLI-3");
+LPFC_ATTR(sli_mode, 3, 3, 3,
+	"SLI mode selector: 3 - select SLI-3");
 
 LPFC_ATTR_R(enable_npiv, 1, 0, 1,
 	"Enable NPIV functionality");
diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index ac67f420ec26..1f0a62ecfad8 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -10728,17 +10728,19 @@ lpfc_sli_enable_intr(struct lpfc_hba *phba, uint32_t cfg_mode)
 	uint32_t intr_mode = LPFC_INTR_ERROR;
 	int retval;
 
+	/* Need to issue conf_port mbox cmd before conf_msi mbox cmd */
+	retval = lpfc_sli_config_port(phba, LPFC_SLI_REV3);
+	if (retval)
+		return intr_mode;
+	phba->hba_flag &= ~HBA_NEEDS_CFG_PORT;
+
 	if (cfg_mode == 2) {
-		/* Need to issue conf_port mbox cmd before conf_msi mbox cmd */
-		retval = lpfc_sli_config_port(phba, LPFC_SLI_REV3);
+		/* Now, try to enable MSI-X interrupt mode */
+		retval = lpfc_sli_enable_msix(phba);
 		if (!retval) {
-			/* Now, try to enable MSI-X interrupt mode */
-			retval = lpfc_sli_enable_msix(phba);
-			if (!retval) {
-				/* Indicate initialization to MSI-X mode */
-				phba->intr_type = MSIX;
-				intr_mode = 2;
-			}
+			/* Indicate initialization to MSI-X mode */
+			phba->intr_type = MSIX;
+			intr_mode = 2;
 		}
 	}
 
diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
index 95caad764fb7..735fa1d484eb 100644
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -4359,6 +4359,8 @@ lpfc_sli_brdready_s3(struct lpfc_hba *phba, uint32_t mask)
 	if (lpfc_readl(phba->HSregaddr, &status))
 		return 1;
 
+	phba->hba_flag |= HBA_NEEDS_CFG_PORT;
+
 	/*
 	 * Check status register every 100ms for 5 retries, then every
 	 * 500ms for 5, then every 2.5 sec for 5, then reset board and
@@ -4687,6 +4689,7 @@ lpfc_sli_brdreset(struct lpfc_hba *phba)
 	/* perform board reset */
 	phba->fc_eventTag = 0;
 	phba->link_events = 0;
+	phba->hba_flag |= HBA_NEEDS_CFG_PORT;
 	if (phba->pport) {
 		phba->pport->fc_myDID = 0;
 		phba->pport->fc_prevDID = 0;
@@ -5020,6 +5023,8 @@ lpfc_sli_chipset_init(struct lpfc_hba *phba)
 		return -EIO;
 	}
 
+	phba->hba_flag |= HBA_NEEDS_CFG_PORT;
+
 	/* Clear all interrupt enable conditions */
 	writel(0, phba->HCregaddr);
 	readl(phba->HCregaddr); /* flush */
@@ -5316,45 +5321,18 @@ int
 lpfc_sli_hba_setup(struct lpfc_hba *phba)
 {
 	uint32_t rc;
-	int  mode = 3, i;
+	int  i;
 	int longs;
 
-	switch (phba->cfg_sli_mode) {
-	case 2:
-		if (phba->cfg_enable_npiv) {
-			lpfc_printf_log(phba, KERN_ERR, LOG_TRACE_EVENT,
-				"1824 NPIV enabled: Override sli_mode "
-				"parameter (%d) to auto (0).\n",
-				phba->cfg_sli_mode);
-			break;
-		}
-		mode = 2;
-		break;
-	case 0:
-	case 3:
-		break;
-	default:
-		lpfc_printf_log(phba, KERN_ERR, LOG_TRACE_EVENT,
-				"1819 Unrecognized sli_mode parameter: %d.\n",
-				phba->cfg_sli_mode);
-
-		break;
+	/* Enable ISR already does config_port because of config_msi mbx */
+	if (phba->hba_flag & HBA_NEEDS_CFG_PORT) {
+		rc = lpfc_sli_config_port(phba, LPFC_SLI_REV3);
+		if (rc)
+			return -EIO;
+		phba->hba_flag &= ~HBA_NEEDS_CFG_PORT;
 	}
 	phba->fcp_embed_io = 0;	/* SLI4 FC support only */
 
-	rc = lpfc_sli_config_port(phba, mode);
-
-	if (rc && phba->cfg_sli_mode == 3)
-		lpfc_printf_log(phba, KERN_ERR, LOG_TRACE_EVENT,
-				"1820 Unable to select SLI-3.  "
-				"Not supported by adapter.\n");
-	if (rc && mode != 2)
-		rc = lpfc_sli_config_port(phba, 2);
-	else if (rc && mode == 2)
-		rc = lpfc_sli_config_port(phba, 3);
-	if (rc)
-		goto lpfc_sli_hba_setup_error;
-
 	/* Enable PCIe device Advanced Error Reporting (AER) if configured */
 	if (phba->cfg_aer_support == 1 && !(phba->hba_flag & HBA_AER_ENABLED)) {
 		rc = pci_enable_pcie_error_reporting(phba->pcidev);
-- 
2.26.2



* [PATCH v2 03/15] lpfc: Refresh ndlp when a new PRLI is received in the PRLI issue state
From: James Smart @ 2021-01-04 18:02 UTC (permalink / raw)
  To: linux-scsi; +Cc: James Smart, Dick Kennedy

Testing with target ports coming and going, the driver eventually reached
a state where it no longer discovered the target. When the driver has
issued a PRLI and receives a PRLI from the target, it does not properly
update the node's initiator/target role flags. Thus, when a subsequent
RSCN is received for a target loss, the driver misidentifies the target
as an initiator and does not initiate LUN scanning.

Fix by always refreshing the ndlp with the latest PRLI state information
whenever a PRLI is processed. Also clear the ndlp flags when processing
a PLOGI so that nothing carries over through a re-login.
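The role-flag lifecycle can be sketched as "PLOGI clears, PRLI sets". This is an illustrative model only: the bit values, `struct node`, `rcv_plogi()` and `rcv_prli()` are hypothetical and do not reproduce lpfc's `NLP_*` definitions or its full PRLI parsing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values for illustration; lpfc defines its own. */
#define FCP_TARGET     0x1
#define FCP_INITIATOR  0x2
#define NVME_TARGET    0x4
#define NVME_INITIATOR 0x8
#define ALL_ROLES (FCP_TARGET | FCP_INITIATOR | NVME_TARGET | NVME_INITIATOR)

struct node {
	uint32_t type; /* role bits learned from PRLI exchanges */
};

/* PLOGI starts a fresh login, so roles from an earlier login are dropped. */
static void rcv_plogi(struct node *n)
{
	n->type &= ~ALL_ROLES;
}

/* Every processed PRLI records the roles it advertises, in any state. */
static void rcv_prli(struct node *n, uint32_t prli_roles)
{
	n->type |= prli_roles & ALL_ROLES;
}
```

With both steps in place, a port that logged in earlier as an initiator and comes back as a target ends up flagged only as a target.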

Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
---
 drivers/scsi/lpfc/lpfc_nportdisc.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/scsi/lpfc/lpfc_nportdisc.c b/drivers/scsi/lpfc/lpfc_nportdisc.c
index 1ac855640fc5..4961a8a55844 100644
--- a/drivers/scsi/lpfc/lpfc_nportdisc.c
+++ b/drivers/scsi/lpfc/lpfc_nportdisc.c
@@ -471,6 +471,15 @@ lpfc_rcv_plogi(struct lpfc_vport *vport, struct lpfc_nodelist *ndlp,
 		 */
 		if (!(ndlp->nlp_type & NLP_FABRIC) &&
 		    !(phba->nvmet_support)) {
+			/* Clear ndlp info, since follow up PRLI may have
+			 * updated ndlp information
+			 */
+			ndlp->nlp_type &= ~(NLP_FCP_TARGET | NLP_FCP_INITIATOR);
+			ndlp->nlp_type &= ~(NLP_NVME_TARGET | NLP_NVME_INITIATOR);
+			ndlp->nlp_fcp_info &= ~NLP_FCP_2_DEVICE;
+			ndlp->nlp_nvme_info &= ~NLP_NVME_NSLER;
+			ndlp->nlp_flag &= ~NLP_FIRSTBURST;
+
 			lpfc_els_rsp_acc(vport, ELS_CMD_PLOGI, cmdiocb,
 					 ndlp, NULL);
 			return 1;
@@ -499,6 +508,7 @@ lpfc_rcv_plogi(struct lpfc_vport *vport, struct lpfc_nodelist *ndlp,
 	ndlp->nlp_type &= ~(NLP_FCP_TARGET | NLP_FCP_INITIATOR);
 	ndlp->nlp_type &= ~(NLP_NVME_TARGET | NLP_NVME_INITIATOR);
 	ndlp->nlp_fcp_info &= ~NLP_FCP_2_DEVICE;
+	ndlp->nlp_nvme_info &= ~NLP_NVME_NSLER;
 	ndlp->nlp_flag &= ~NLP_FIRSTBURST;
 
 	login_mbox = NULL;
@@ -2107,6 +2117,7 @@ lpfc_rcv_prli_prli_issue(struct lpfc_vport *vport, struct lpfc_nodelist *ndlp,
 
 	if (!lpfc_rcv_prli_support_check(vport, ndlp, cmdiocb))
 		return ndlp->nlp_state;
+	lpfc_rcv_prli(vport, ndlp, cmdiocb);
 	lpfc_els_rsp_prli_acc(vport, cmdiocb, ndlp);
 	return ndlp->nlp_state;
 }
-- 
2.26.2



* [PATCH v2 04/15] lpfc: Fix crash when a fabric node is released prematurely.
From: James Smart @ 2021-01-04 18:02 UTC (permalink / raw)
  To: linux-scsi; +Cc: James Smart, Dick Kennedy

The driver's management of the fabric controller (aka pseudo-scsi
initiator) node in SLI3 mode is causing this crash. The crash occurs
because of a node reference imbalance that frees the fabric controller
node while devloss is outstanding from the SCSI transport. This is
triggered by an odd behavior where the switch reacts to a rejected RDP
request with a PLOGI and nothing else, not even a LOGO. The driver
ACKs the PLOGI and, after successfully registering the RPI, incorrectly
registers the fabric controller node with the SCSI transport because it
still has the NLP_FC4_FCP flag set from the fabric controller PRLI. If a
LIP is issued, the driver attempts to clean up on Link Up and ends up
executing too many puts.

Fix by detecting the fabric node type and clearing out the node's internal
flags that triggered a SCSI transport registration and the subsequent
dev_loss event. The driver cannot count on any persistence from
fabric controller nodes.
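The registration rule can be sketched as a predicate plus a guarded PRLI update. The predicate has the same shape as `lpfc_valid_xpt_node()` in the patch; the DID constants are the well-known FC addresses (F_Port 0xFFFFFE, directory/name server 0xFFFFFC, management server 0xFFFFFA, fabric controller 0xFFFFFD), while `struct node`, the `FC4_*` bit values and `rcv_fcp_prli()` are hypothetical simplifications.

```c
#include <assert.h>
#include <stdint.h>

/* Well-known FC addresses; lpfc names the first three Fabric_DID,
 * NameServer_DID and FDMI_DID. */
#define FABRIC_DID     0xFFFFFEu
#define NAMESERVER_DID 0xFFFFFCu
#define FDMI_DID       0xFFFFFAu

#define FC4_FCP  0x1 /* hypothetical bit values for the sketch */
#define FC4_NVME 0x2

struct node {
	uint32_t did;
	uint32_t fc4_type;  /* FC-4 types learned via PRLI */
	int      is_fabric; /* stands in for nlp_type & NLP_FABRIC */
};

/* Register with (or unregister from) the transport only for these nodes. */
static int valid_xpt_node(const struct node *n)
{
	return n->fc4_type != 0 || n->did == FABRIC_DID ||
	       n->did == NAMESERVER_DID || n->did == FDMI_DID;
}

/* An FCP PRLI from a fabric controller must not mark the node as FCP. */
static void rcv_fcp_prli(struct node *n)
{
	if (!n->is_fabric)
		n->fc4_type |= FC4_FCP;
}
```

A fabric controller at 0xFFFFFD that sends an FCP PRLI keeps `fc4_type == 0`, so it never qualifies for SCSI transport registration and no dev_loss event is armed for it.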

Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
---
 drivers/scsi/lpfc/lpfc_hbadisc.c   | 18 +++++++++++++-----
 drivers/scsi/lpfc/lpfc_nportdisc.c |  8 +++++++-
 2 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_hbadisc.c b/drivers/scsi/lpfc/lpfc_hbadisc.c
index 2b6b5fc671fe..bcb5bf7e19dc 100644
--- a/drivers/scsi/lpfc/lpfc_hbadisc.c
+++ b/drivers/scsi/lpfc/lpfc_hbadisc.c
@@ -73,6 +73,16 @@ static void lpfc_unregister_fcfi_cmpl(struct lpfc_hba *, LPFC_MBOXQ_t *);
 static int lpfc_fcf_inuse(struct lpfc_hba *);
 static void lpfc_mbx_cmpl_read_sparam(struct lpfc_hba *, LPFC_MBOXQ_t *);
 
+static int
+lpfc_valid_xpt_node(struct lpfc_nodelist *ndlp)
+{
+	if (ndlp->nlp_fc4_type ||
+	    ndlp->nlp_DID == Fabric_DID ||
+	    ndlp->nlp_DID == NameServer_DID ||
+	    ndlp->nlp_DID == FDMI_DID)
+		return 1;
+	return 0;
+}
 /* The source of a terminate rport I/O is either a dev_loss_tmo
  * event or a call to fc_remove_host.  While the rport should be
  * valid during these downcalls, the transport can call twice
@@ -4318,7 +4328,8 @@ lpfc_nlp_state_cleanup(struct lpfc_vport *vport, struct lpfc_nodelist *ndlp,
 	/* FCP and NVME Transport interface */
 	if ((old_state == NLP_STE_MAPPED_NODE ||
 	     old_state == NLP_STE_UNMAPPED_NODE)) {
-		if (ndlp->rport) {
+		if (ndlp->rport &&
+		    lpfc_valid_xpt_node(ndlp)) {
 			vport->phba->nport_event_cnt++;
 			lpfc_unregister_remote_port(ndlp);
 		}
@@ -4340,10 +4351,7 @@ lpfc_nlp_state_cleanup(struct lpfc_vport *vport, struct lpfc_nodelist *ndlp,
 
 	if (new_state ==  NLP_STE_MAPPED_NODE ||
 	    new_state == NLP_STE_UNMAPPED_NODE) {
-		if (ndlp->nlp_fc4_type ||
-		    ndlp->nlp_DID == Fabric_DID ||
-		    ndlp->nlp_DID == NameServer_DID ||
-		    ndlp->nlp_DID == FDMI_DID) {
+		if (lpfc_valid_xpt_node(ndlp)) {
 			vport->phba->nport_event_cnt++;
 			/*
 			 * Tell the fc transport about the port, if we haven't
diff --git a/drivers/scsi/lpfc/lpfc_nportdisc.c b/drivers/scsi/lpfc/lpfc_nportdisc.c
index 4961a8a55844..0d0d2ca1a5d8 100644
--- a/drivers/scsi/lpfc/lpfc_nportdisc.c
+++ b/drivers/scsi/lpfc/lpfc_nportdisc.c
@@ -1021,7 +1021,12 @@ lpfc_rcv_prli(struct lpfc_vport *vport, struct lpfc_nodelist *ndlp,
 			ndlp->nlp_fc4_type |= NLP_FC4_NVME;
 			lpfc_nlp_set_state(vport, ndlp, NLP_STE_UNMAPPED_NODE);
 		}
-		if (npr->prliType == PRLI_FCP_TYPE)
+
+		/* Fabric Controllers send FCP PRLI as an initiator but should
+		 * not get recognized as FCP type and registered with transport.
+		 */
+		if (npr->prliType == PRLI_FCP_TYPE &&
+		    !(ndlp->nlp_type & NLP_FABRIC))
 			ndlp->nlp_fc4_type |= NLP_FC4_FCP;
 	}
 	if (rport) {
@@ -2044,6 +2049,7 @@ lpfc_cmpl_reglogin_reglogin_issue(struct lpfc_vport *vport,
 		 * must complete PRLI.
 		 */
 		if (ndlp->nlp_type & NLP_FABRIC) {
+			ndlp->nlp_fc4_type &= ~NLP_FC4_FCP;
 			ndlp->nlp_prev_state = NLP_STE_REG_LOGIN_ISSUE;
 			lpfc_nlp_set_state(vport, ndlp, NLP_STE_UNMAPPED_NODE);
 		}
-- 
2.26.2



* [PATCH v2 05/15] lpfc: Use the nvme-fc transport supplied timeout for LS requests
From: James Smart @ 2021-01-04 18:02 UTC (permalink / raw)
  To: linux-scsi; +Cc: James Smart, Dick Kennedy

When lpfc generates a GEN_REQUEST wqe for an nvme LS (such as Create
Association), the timeout is set to R_A_TOV without regard to the
timeout value supplied by the nvme-fc transport. The driver should be
setting the timeout to the value passed into the routine. Additionally,
the caller should be setting the timeout to the value in the LS request
set by the nvme transport; instead, it unconditionally sets it to a
driver-defined value. So the driver actually overrode the value twice.

Fix by using the timeout provided to the routine and, in the caller,
setting the timeout to the LS request timeout value.
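The corrected data flow is a plain pass-through. In this sketch, `struct ls_req`, `struct wqe`, `gen_req()` and `issue_ls_req()` are hypothetical mirrors of the nvme-fc LS request, the GEN_REQUEST WQE, `lpfc_nvme_gen_req()` and `__lpfc_nvme_ls_req()`; the timeout is assumed to be expressed in seconds.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirrors of the request and WQE fields involved. */
struct ls_req {
	uint32_t timeout; /* assumed seconds, chosen by the nvme-fc transport */
};

struct wqe {
	uint32_t tmo; /* timeout programmed into the work queue entry */
};

/* After the fix: program the caller's timeout, not R_A_TOV - 1. */
static void gen_req(struct wqe *w, uint32_t tmo)
{
	w->tmo = tmo;
}

/* After the fix: the caller forwards the transport's value unchanged. */
static void issue_ls_req(struct wqe *w, const struct ls_req *req)
{
	gen_req(w, req->timeout);
}
```

Neither layer substitutes its own constant any more, so whatever the transport chose is what the hardware sees.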

Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
---
 drivers/scsi/lpfc/lpfc_nvme.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_nvme.c b/drivers/scsi/lpfc/lpfc_nvme.c
index 1cb82fa6a60e..fd4a1cf0e4a6 100644
--- a/drivers/scsi/lpfc/lpfc_nvme.c
+++ b/drivers/scsi/lpfc/lpfc_nvme.c
@@ -458,7 +458,7 @@ lpfc_nvme_gen_req(struct lpfc_vport *vport, struct lpfc_dmabuf *bmp,
 	bf_set(wqe_xri_tag, &wqe->gen_req.wqe_com, genwqe->sli4_xritag);
 
 	/* Word 7 */
-	bf_set(wqe_tmo, &wqe->gen_req.wqe_com, (vport->phba->fc_ratov-1));
+	bf_set(wqe_tmo, &wqe->gen_req.wqe_com, tmo);
 	bf_set(wqe_class, &wqe->gen_req.wqe_com, CLASS3);
 	bf_set(wqe_cmnd, &wqe->gen_req.wqe_com, CMD_GEN_REQUEST64_WQE);
 	bf_set(wqe_ct, &wqe->gen_req.wqe_com, SLI4_CT_RPI);
@@ -615,7 +615,7 @@ __lpfc_nvme_ls_req(struct lpfc_vport *vport, struct lpfc_nodelist *ndlp,
 
 	ret = lpfc_nvme_gen_req(vport, bmp, pnvme_lsreq->rqstaddr,
 				pnvme_lsreq, gen_req_cmp, ndlp, 2,
-				LPFC_NVME_LS_TIMEOUT, 0);
+				pnvme_lsreq->timeout, 0);
 	if (ret != WQE_SUCCESS) {
 		lpfc_printf_vlog(vport, KERN_ERR, LOG_TRACE_EVENT,
 				 "6052 NVMEx REQ: EXIT. issue ls wqe failed "
-- 
2.26.2



* [PATCH v2 06/15] lpfc: Fix FW reset action if IOs are outstanding
From: James Smart @ 2021-01-04 18:02 UTC (permalink / raw)
  To: linux-scsi; +Cc: James Smart, Dick Kennedy

If the port is configured for NVME and has any outstanding IOs when a FW
reset is requested, the outstanding I/Os are not properly cleaned up.
This causes the FW download request to fail.

Fix by clearing the LPFC_SLI_ACTIVE flag to signify that the I/O must be
manually flushed by the driver on port reset.
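The decision can be sketched as follows. This is a simplified model: `SLI_ACTIVE`, `enum mbx_action`, `struct hba` and `port_reset()` are hypothetical stand-ins for `LPFC_SLI_ACTIVE`, `LPFC_MBX_WAIT`/`LPFC_MBX_NO_WAIT`, `struct lpfc_hba` and the reset path, and the driver's `hbalock` around the flag update is omitted here.

```c
#include <assert.h>
#include <stdint.h>

#define SLI_ACTIVE 0x200 /* hypothetical bit; lpfc uses LPFC_SLI_ACTIVE */

enum mbx_action { MBX_WAIT, MBX_NO_WAIT };

struct hba {
	uint32_t sli_flag;
	int      manual_flush; /* 1 when the driver must flush I/O itself */
};

/* Simplified reset path; the driver updates sli_flag under hbalock. */
static void port_reset(struct hba *h, enum mbx_action action)
{
	/* With NO_WAIT the port is already reset and not functional. */
	if (action == MBX_NO_WAIT)
		h->sli_flag &= ~SLI_ACTIVE;

	/* An inactive SLI layer tells teardown that outstanding I/O will
	 * not be completed by the port and must be flushed manually. */
	h->manual_flush = !(h->sli_flag & SLI_ACTIVE);
}
```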

Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
---
 drivers/scsi/lpfc/lpfc_init.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index 1f0a62ecfad8..593b175702eb 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -1833,6 +1833,16 @@ lpfc_sli4_port_sta_fn_reset(struct lpfc_hba *phba, int mbx_action,
 		lpfc_printf_log(phba, KERN_ERR, LOG_TRACE_EVENT,
 				"2887 Reset Needed: Attempting Port "
 				"Recovery...\n");
+
+	/* If we are no wait, the HBA has been reset and is not
+	 * functional, thus we should clear LPFC_SLI_ACTIVE flag.
+	 */
+	if (mbx_action == LPFC_MBX_NO_WAIT) {
+		spin_lock_irq(&phba->hbalock);
+		phba->sli.sli_flag &= ~LPFC_SLI_ACTIVE;
+		spin_unlock_irq(&phba->hbalock);
+	}
+
 	lpfc_offline_prep(phba, mbx_action);
 	lpfc_sli_flush_io_rings(phba);
 	lpfc_offline(phba);
-- 
2.26.2



* [PATCH v2 07/15] lpfc: Prevent duplicate requests to unregister with cpuhp framework
From: James Smart @ 2021-01-04 18:02 UTC (permalink / raw)
  To: linux-scsi; +Cc: James Smart, Dick Kennedy

In the lpfc offline routine, called for various reasons such as sysfs
attribute, driver unload, or port error, the driver is calling
__lpfc_cpuhp_remove() to destroy the hot plug data. If the offline
routine is called while the driver is in the process of being unloaded,
a request using lpfc_cpuhp_remove() is also made from
lpfc_sli4_hba_unset(). The cpuhp elements are no longer valid when the
second removal request is made.

Fix by only calling the cpuhp removal once when the adapter is in the
process of unloading.
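The "remove once" rule can be sketched with a registration flag. `OFFLINE_MODE`, `struct hba`, `cpuhp_remove()`, `offline()` and `unload()` are hypothetical stand-ins for `FC_OFFLINE_MODE`, `struct lpfc_hba`, `__lpfc_cpuhp_remove()`, `lpfc_offline()` and the unload path through `lpfc_sli4_hba_unset()`; the `bad_removals` counter only models the invalid second removal the patch avoids.

```c
#include <assert.h>
#include <stdint.h>

#define OFFLINE_MODE 0x1 /* stands in for FC_OFFLINE_MODE */

struct hba {
	uint32_t fc_flag;
	int      cpuhp_registered;
	int      bad_removals; /* removals attempted after already removed */
};

static void cpuhp_remove(struct hba *h)
{
	if (!h->cpuhp_registered) {
		h->bad_removals++; /* models touching already-invalid state */
		return;
	}
	h->cpuhp_registered = 0;
}

/* Offline path: skip removal while unloading (OFFLINE_MODE clear). */
static void offline(struct hba *h)
{
	if (h->fc_flag & OFFLINE_MODE)
		cpuhp_remove(h);
}

/* Unload path: offline first, then hba_unset performs the one removal. */
static void unload(struct hba *h)
{
	offline(h);
	cpuhp_remove(h); /* stands in for the call from hba_unset */
}
```

An administrative offline (flag set) still removes the cpuhp state, while unload leaves it to the later `hba_unset` step, so each path removes it exactly once.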

Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
---
 drivers/scsi/lpfc/lpfc_init.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index 593b175702eb..af926768bcae 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -3602,7 +3602,11 @@ lpfc_offline(struct lpfc_hba *phba)
 			spin_unlock_irq(shost->host_lock);
 		}
 	lpfc_destroy_vport_work_array(phba, vports);
-	__lpfc_cpuhp_remove(phba);
+	/* If OFFLINE flag is clear (i.e. unloading), cpuhp removal is handled
+	 * in hba_unset
+	 */
+	if (phba->pport->fc_flag & FC_OFFLINE_MODE)
+		__lpfc_cpuhp_remove(phba);
 
 	if (phba->cfg_xri_rebalancing)
 		lpfc_destroy_multixri_pools(phba);
-- 
2.26.2



* [PATCH v2 08/15] lpfc: Fix error log messages being logged following scsi task mgnt
From: James Smart @ 2021-01-04 18:02 UTC (permalink / raw)
  To: linux-scsi; +Cc: James Smart, Dick Kennedy

A successful task mgmt command is logging errors, making it look like
problems were encountered.  This is due to log messages for the
device/target and bus reset handlers having the LOG_TRACE_EVENT flag set.

Fix by adjusting the event flag such that the call to the logging routine
only receives a LOG_TRACE_EVENT if a prior call actually failed.
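The selection reduces to a single conditional on the reset status. In this sketch `SUCCESS`/`FAILED` use the SCSI midlayer's values, while the `LOG_FCP`/`LOG_TRACE_EVENT` values are assumed for illustration (the logic only needs them to differ), and `reset_log_mask()` is a hypothetical helper; the patch open-codes the same choice in each handler.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed log-mask values for illustration. */
#define LOG_FCP         0x00000040u
#define LOG_TRACE_EVENT 0x80000000u

/* SCSI midlayer eh return codes. */
#define SUCCESS 0x2002
#define FAILED  0x2003

/* Routine FCP-level logging on success, trace event only on failure. */
static uint32_t reset_log_mask(int status)
{
	return status == SUCCESS ? LOG_FCP : LOG_TRACE_EVENT;
}
```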

Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
---
 drivers/scsi/lpfc/lpfc_scsi.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c
index 3b989f720937..78f34b1af980 100644
--- a/drivers/scsi/lpfc/lpfc_scsi.c
+++ b/drivers/scsi/lpfc/lpfc_scsi.c
@@ -5849,6 +5849,7 @@ lpfc_device_reset_handler(struct scsi_cmnd *cmnd)
 	uint64_t lun_id = cmnd->device->lun;
 	struct lpfc_scsi_event_header scsi_event;
 	int status;
+	u32 logit = LOG_FCP;
 
 	rdata = lpfc_rport_data_from_scsi_device(cmnd->device);
 	if (!rdata || !rdata->pnode) {
@@ -5880,8 +5881,10 @@ lpfc_device_reset_handler(struct scsi_cmnd *cmnd)
 
 	status = lpfc_send_taskmgmt(vport, cmnd, tgt_id, lun_id,
 						FCP_LUN_RESET);
+	if (status != SUCCESS)
+		logit =  LOG_TRACE_EVENT;
 
-	lpfc_printf_vlog(vport, KERN_ERR, LOG_TRACE_EVENT,
+	lpfc_printf_vlog(vport, KERN_ERR, logit,
 			 "0713 SCSI layer issued Device Reset (%d, %llu) "
 			 "return x%x\n", tgt_id, lun_id, status);
 
@@ -5920,6 +5923,7 @@ lpfc_target_reset_handler(struct scsi_cmnd *cmnd)
 	uint64_t lun_id = cmnd->device->lun;
 	struct lpfc_scsi_event_header scsi_event;
 	int status;
+	u32 logit = LOG_FCP;
 
 	rdata = lpfc_rport_data_from_scsi_device(cmnd->device);
 	if (!rdata || !rdata->pnode) {
@@ -5959,8 +5963,10 @@ lpfc_target_reset_handler(struct scsi_cmnd *cmnd)
 
 	status = lpfc_send_taskmgmt(vport, cmnd, tgt_id, lun_id,
 					FCP_TARGET_RESET);
+	if (status != SUCCESS)
+		logit =  LOG_TRACE_EVENT;
 
-	lpfc_printf_vlog(vport, KERN_ERR, LOG_TRACE_EVENT,
+	lpfc_printf_vlog(vport, KERN_ERR, logit,
 			 "0723 SCSI layer issued Target Reset (%d, %llu) "
 			 "return x%x\n", tgt_id, lun_id, status);
 
@@ -5996,6 +6002,7 @@ lpfc_bus_reset_handler(struct scsi_cmnd *cmnd)
 	struct lpfc_scsi_event_header scsi_event;
 	int match;
 	int ret = SUCCESS, status, i;
+	u32 logit = LOG_FCP;
 
 	scsi_event.event_type = FC_REG_SCSI_EVENT;
 	scsi_event.subcategory = LPFC_EVENT_BUSRESET;
@@ -6056,8 +6063,10 @@ lpfc_bus_reset_handler(struct scsi_cmnd *cmnd)
 	status = lpfc_reset_flush_io_context(vport, 0, 0, LPFC_CTX_HOST);
 	if (status != SUCCESS)
 		ret = FAILED;
+	if (ret == FAILED)
+		logit =  LOG_TRACE_EVENT;
 
-	lpfc_printf_vlog(vport, KERN_ERR, LOG_TRACE_EVENT,
+	lpfc_printf_vlog(vport, KERN_ERR, logit,
 			 "0714 SCSI layer issued Bus Reset Data: x%x\n", ret);
 	return ret;
 }
@@ -6086,7 +6095,7 @@ lpfc_host_reset_handler(struct scsi_cmnd *cmnd)
 	struct lpfc_hba *phba = vport->phba;
 	int rc, ret = SUCCESS;
 
-	lpfc_printf_vlog(vport, KERN_ERR, LOG_TRACE_EVENT,
+	lpfc_printf_vlog(vport, KERN_ERR, LOG_FCP,
 			 "3172 SCSI layer issued Host Reset Data:\n");
 
 	lpfc_offline_prep(phba, LPFC_MBX_WAIT);
-- 
2.26.2



* [PATCH v2 09/15] lpfc: Fix target reset failing
From: James Smart @ 2021-01-04 18:02 UTC (permalink / raw)
  To: linux-scsi; +Cc: James Smart, Dick Kennedy

Target reset fails because the target rejects it as an invalid command.

The Target Reset TMF was made obsolete by T10 some time ago, but it is
still in use. Newer devices reject the TMF, causing the reset handler to
escalate to adapter resets.

Fix by translating a rejected Target Reset TMF into a LOGO and re-PLOGI
with the target device. This provides the same semantic action, although
if the device also carries nvme traffic, that traffic is terminated as
well (it remains recoverable).
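The fallback is a small handshake between the reset handler and the LOGO completion. The sketch below is a single-threaded userspace model, not driver code: `struct node_model`, the `model_*` functions and the `SKETCH_*` constants are hypothetical stand-ins for `ndlp->upcall_flags`, `ndlp->logo_waitq`, `lpfc_target_reset_handler()` and `lpfc_cmpl_els_logo()`. The `wait_event_timeout()` sleep is replaced by a boolean that says whether the LOGO completion arrives in time. Following the description above, it takes the LOGO path only when the TMF fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the Target Reset fallback; not lpfc code. */
enum { SKETCH_SUCCESS = 0, SKETCH_FAILED = 1 };
#define SKETCH_WAIT_FOR_LOGO 0x2u  /* role of NLP_WAIT_FOR_LOGO */

struct node_model {
	uint32_t upcall_flags;     /* role of ndlp->upcall_flags */
	bool waiter_registered;    /* role of ndlp->logo_waitq != NULL */
	bool logo_requested;       /* role of NLP_ISSUE_LOGO + unreg_rpi */
	bool logo_timed_out;       /* bounded wait expired */
};

/* LOGO completion side: clear the wait flag and report whether a
 * registered waiter should be woken (the driver calls wake_up()). */
static bool model_logo_cmpl(struct node_model *n)
{
	bool wake = false;

	if (n->upcall_flags & SKETCH_WAIT_FOR_LOGO) {
		n->upcall_flags &= ~SKETCH_WAIT_FOR_LOGO;
		wake = n->waiter_registered;
	}
	return wake;
}

/* Reset side. 'logo_arrives' simulates whether the LOGO completion runs
 * before the devloss-based timeout; the driver sleeps instead. */
static int model_target_reset(struct node_model *n, int tmf_status,
			      bool logo_arrives)
{
	if (tmf_status == SKETCH_SUCCESS)
		return SKETCH_SUCCESS;

	/* Only one waiter per node; a concurrent reset simply fails. */
	if ((n->upcall_flags & SKETCH_WAIT_FOR_LOGO) || n->waiter_registered)
		return SKETCH_FAILED;

	n->waiter_registered = true;
	n->logo_requested = true;
	n->upcall_flags |= SKETCH_WAIT_FOR_LOGO;

	if (logo_arrives)
		(void)model_logo_cmpl(n);

	/* Wait over: a flag that is still set means the wait timed out. */
	if (n->upcall_flags & SKETCH_WAIT_FOR_LOGO) {
		n->logo_timed_out = true;
		n->upcall_flags &= ~SKETCH_WAIT_FOR_LOGO;
	}
	n->waiter_registered = false;

	/* Recovery continues via re-PLOGI, so report success. */
	return SKETCH_SUCCESS;
}
```

The model reports success even when the LOGO wait times out. This matches the diff, which logs message 0725 in that case and still sets status to SUCCESS.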

Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
---
 drivers/scsi/lpfc/lpfc_disc.h |  3 +++
 drivers/scsi/lpfc/lpfc_els.c  |  7 +++++++
 drivers/scsi/lpfc/lpfc_scsi.c | 38 +++++++++++++++++++++++++++++++++--
 3 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_disc.h b/drivers/scsi/lpfc/lpfc_disc.h
index ea07afcb750a..4cea61b63fdf 100644
--- a/drivers/scsi/lpfc/lpfc_disc.h
+++ b/drivers/scsi/lpfc/lpfc_disc.h
@@ -135,14 +135,17 @@ struct lpfc_nodelist {
 	struct lpfc_scsicmd_bkt *lat_data;	/* Latency data */
 	uint32_t fc4_prli_sent;
 	uint32_t fc4_xpt_flags;
+	uint32_t upcall_flags;
 #define NLP_WAIT_FOR_UNREG    0x1
 #define SCSI_XPT_REGD         0x2
 #define NVME_XPT_REGD         0x4
+#define NLP_WAIT_FOR_LOGO     0x2
 
 
 	uint32_t nvme_fb_size; /* NVME target's supported byte cnt */
 #define NVME_FB_BIT_SHIFT 9    /* PRLI Rsp first burst in 512B units. */
 	uint32_t nlp_defer_did;
+	wait_queue_head_t *logo_waitq;
 };
 
 struct lpfc_node_rrq {
diff --git a/drivers/scsi/lpfc/lpfc_els.c b/drivers/scsi/lpfc/lpfc_els.c
index e099caa04535..c944f220406e 100644
--- a/drivers/scsi/lpfc/lpfc_els.c
+++ b/drivers/scsi/lpfc/lpfc_els.c
@@ -2817,6 +2817,7 @@ lpfc_cmpl_els_logo(struct lpfc_hba *phba, struct lpfc_iocbq *cmdiocb,
 	IOCB_t *irsp;
 	unsigned long flags;
 	uint32_t skip_recovery = 0;
+	int wake_up_waiter = 0;
 
 	/* we pass cmdiocb to state machine which needs rspiocb as well */
 	cmdiocb->context_un.rsp_iocb = rspiocb;
@@ -2824,6 +2825,10 @@ lpfc_cmpl_els_logo(struct lpfc_hba *phba, struct lpfc_iocbq *cmdiocb,
 	irsp = &(rspiocb->iocb);
 	spin_lock_irq(&ndlp->lock);
 	ndlp->nlp_flag &= ~NLP_LOGO_SND;
+	if (ndlp->upcall_flags & NLP_WAIT_FOR_LOGO) {
+		wake_up_waiter = 1;
+		ndlp->upcall_flags &= ~NLP_WAIT_FOR_LOGO;
+	}
 	spin_unlock_irq(&ndlp->lock);
 
 	lpfc_debugfs_disc_trc(vport, LPFC_DISC_TRC_ELS_CMD,
@@ -2889,6 +2894,8 @@ lpfc_cmpl_els_logo(struct lpfc_hba *phba, struct lpfc_iocbq *cmdiocb,
 	 * Initiator, we are assuming the NPortID is not going to change.
 	 */
 
+	if (wake_up_waiter && ndlp->logo_waitq)
+		wake_up(ndlp->logo_waitq);
 	/*
 	 * If the node is a target, the handling attempts to recover the port.
 	 * For any other port type, the rpi is unregistered as an implicit
diff --git a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c
index 78f34b1af980..4ee890de556b 100644
--- a/drivers/scsi/lpfc/lpfc_scsi.c
+++ b/drivers/scsi/lpfc/lpfc_scsi.c
@@ -5924,6 +5924,8 @@ lpfc_target_reset_handler(struct scsi_cmnd *cmnd)
 	struct lpfc_scsi_event_header scsi_event;
 	int status;
 	u32 logit = LOG_FCP;
+	unsigned long flags;
+	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(waitq);
 
 	rdata = lpfc_rport_data_from_scsi_device(cmnd->device);
 	if (!rdata || !rdata->pnode) {
@@ -5942,10 +5944,10 @@ lpfc_target_reset_handler(struct scsi_cmnd *cmnd)
 		lpfc_printf_vlog(vport, KERN_ERR, LOG_TRACE_EVENT,
 			"0722 Target Reset rport failure: rdata x%px\n", rdata);
 		if (pnode) {
-			spin_lock_irq(&pnode->lock);
+			spin_lock_irqsave(&pnode->lock, flags);
 			pnode->nlp_flag &= ~NLP_NPR_ADISC;
 			pnode->nlp_fcp_info &= ~NLP_FCP_2_DEVICE;
-			spin_unlock_irq(&pnode->lock);
+			spin_unlock_irqrestore(&pnode->lock, flags);
 		}
 		lpfc_reset_flush_io_context(vport, tgt_id, lun_id,
 					  LPFC_CTX_TGT);
@@ -5965,6 +5967,38 @@ lpfc_target_reset_handler(struct scsi_cmnd *cmnd)
 					FCP_TARGET_RESET);
 	if (status != SUCCESS)
 		logit =  LOG_TRACE_EVENT;
+	spin_lock_irqsave(&pnode->lock, flags);
+	if (status != SUCCESS &&
+	    (!(pnode->upcall_flags & NLP_WAIT_FOR_LOGO)) &&
+	     !pnode->logo_waitq) {
+		pnode->logo_waitq = &waitq;
+		pnode->nlp_fcp_info &= ~NLP_FCP_2_DEVICE;
+		pnode->nlp_flag |= NLP_ISSUE_LOGO;
+		pnode->upcall_flags |= NLP_WAIT_FOR_LOGO;
+		spin_unlock_irqrestore(&pnode->lock, flags);
+		lpfc_unreg_rpi(vport, pnode);
+		wait_event_timeout(waitq,
+				   (!(pnode->upcall_flags & NLP_WAIT_FOR_LOGO)),
+				    msecs_to_jiffies(vport->cfg_devloss_tmo *
+				    1000));
+
+		if (pnode->upcall_flags & NLP_WAIT_FOR_LOGO) {
+			lpfc_printf_vlog(vport, KERN_ERR, LOG_TRACE_EVENT,
+				"0725 SCSI layer TGTRST failed & LOGO TMO "
+				" (%d, %llu) return x%x\n", tgt_id,
+				 lun_id, status);
+			spin_lock_irqsave(&pnode->lock, flags);
+			pnode->upcall_flags &= ~NLP_WAIT_FOR_LOGO;
+		} else {
+			spin_lock_irqsave(&pnode->lock, flags);
+		}
+		pnode->logo_waitq = NULL;
+		spin_unlock_irqrestore(&pnode->lock, flags);
+		status = SUCCESS;
+	} else {
+		status = FAILED;
+		spin_unlock_irqrestore(&pnode->lock, flags);
+	}
 
 	lpfc_printf_vlog(vport, KERN_ERR, logit,
 			 "0723 SCSI layer issued Target Reset (%d, %llu) "
-- 
2.26.2



* [PATCH v2 10/15] lpfc: Fix NVME recovery after mailbox timeout
From: James Smart @ 2021-01-04 18:02 UTC (permalink / raw)
  To: linux-scsi; +Cc: James Smart, Dick Kennedy

If a mailbox command times out, the SLI port is deemed in error and the
port is reset.  The hba cleanup does not return I/Os to the NVMe layer
before the port is unregistered. This is because the hba has been marked
offline (!SLI_ACTIVE) and cleanup is done by the mailbox timeout handler
rather than by a general adapter reset routine. The mailbox timeout
handler only cleaned up scsi ios.

Fix by reworking the mailbox handler to:
- After handling the mailbox error, detect whether the board has already
  failed (possibly due to another error) and, if so, leave cleanup to
  the other handler.
- If the mailbox command timeout is the first to detect the port error,
  continue with the board cleanup and mark the adapter offline
  (!SLI_ACTIVE). Remove the scsi-only io cleanup routine; the generic
  reset adapter routine that is subsequently invoked will clean up the
  ios.
- Have the reset adapter routine flush all nvme and scsi ios if the
  adapter has been marked failed (!SLI_ACTIVE).
- Rework the nvme io terminate routine to take the status code with
  which to fail the io, and have cleaned-up ios call the wqe completion
  routine. Currently it bypasses the wqe cleanup and calls the nvme io
  completion directly. The wqe completion routine takes care of data
  structure and node cleanup, then calls the nvme io completion
  handler.
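The last bullet changes who cleans up a cancelled NVMe io. Below is a minimal sketch of the resulting dispatch, modelled on the new `lpfc_sli_cancel_iocbs()`/`lpfc_nvme_cancel_iocb()` logic. It uses hypothetical types (`struct fake_req`, `struct fake_wcqe`) and helpers instead of the real iocbq and WCQE structures. A request with a WQE completion receives a synthetic completion carrying the given status and parameter, with XB set only while the SLI port is active, so that its normal completion handler performs the cleanup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; not the lpfc structures. */
struct fake_wcqe {
	uint32_t status;
	uint32_t parameter;
	bool     xb;            /* exchange busy: io goes to the abort list */
};

struct fake_req;
typedef void (*wqe_cmpl_fn)(struct fake_req *, const struct fake_wcqe *);
typedef void (*iocb_cmpl_fn)(struct fake_req *);

struct fake_req {
	bool         is_nvme;
	wqe_cmpl_fn  wqe_cmpl;
	iocb_cmpl_fn iocb_cmpl;
	uint32_t     ulp_status, ulp_word4;  /* legacy IOCB status fields */
	bool         released;
};

static void release_req(struct fake_req *r) { r->released = true; }

/* Example completion handlers that record what they were given. */
static struct fake_wcqe last_wcqe;
static int wqe_cmpl_calls, iocb_cmpl_calls;

static void record_wqe_cmpl(struct fake_req *r, const struct fake_wcqe *w)
{
	last_wcqe = *w;
	wqe_cmpl_calls++;
	release_req(r);         /* the completion handler owns cleanup */
}

static void record_iocb_cmpl(struct fake_req *r)
{
	iocb_cmpl_calls++;
	release_req(r);
}

/* Fail one request with (stat, param), preferring the WQE completion
 * path so the regular completion handler does buffer/node cleanup. */
static void cancel_one(struct fake_req *r, uint32_t stat, uint32_t param,
		       bool sli_active)
{
	if (r->wqe_cmpl) {
		if (r->is_nvme) {
			struct fake_wcqe w = { .status = stat,
					       .parameter = param,
					       .xb = sli_active };
			r->wqe_cmpl(r, &w);
		} else {
			release_req(r);
		}
	} else if (r->iocb_cmpl) {
		r->ulp_status = stat;
		r->ulp_word4 = param;
		r->iocb_cmpl(r);
	} else {
		release_req(r);
	}
}
```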

Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
---
 drivers/scsi/lpfc/lpfc_crtn.h |  4 ++--
 drivers/scsi/lpfc/lpfc_init.c |  8 ++++++--
 drivers/scsi/lpfc/lpfc_nvme.c | 33 +++++++++++++++++----------------
 drivers/scsi/lpfc/lpfc_sli.c  | 20 ++++++++++++--------
 4 files changed, 37 insertions(+), 28 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_crtn.h b/drivers/scsi/lpfc/lpfc_crtn.h
index f78e52a18b0b..e70db9ec7da4 100644
--- a/drivers/scsi/lpfc/lpfc_crtn.h
+++ b/drivers/scsi/lpfc/lpfc_crtn.h
@@ -255,7 +255,6 @@ void lpfc_nvmet_ctxbuf_post(struct lpfc_hba *phba,
 int lpfc_nvmet_rcv_unsol_abort(struct lpfc_vport *vport,
 			       struct fc_frame_header *fc_hdr);
 void lpfc_nvmet_wqfull_process(struct lpfc_hba *phba, struct lpfc_queue *wq);
-void lpfc_sli_flush_nvme_rings(struct lpfc_hba *phba);
 void lpfc_nvme_wait_for_io_drain(struct lpfc_hba *phba);
 void lpfc_sli4_build_dflt_fcf_record(struct lpfc_hba *, struct fcf_record *,
 			uint16_t);
@@ -598,7 +597,8 @@ void lpfc_release_io_buf(struct lpfc_hba *phba, struct lpfc_io_buf *ncmd,
 void lpfc_io_ktime(struct lpfc_hba *phba, struct lpfc_io_buf *ncmd);
 void lpfc_wqe_cmd_template(void);
 void lpfc_nvmet_cmd_template(void);
-void lpfc_nvme_cancel_iocb(struct lpfc_hba *phba, struct lpfc_iocbq *pwqeIn);
+void lpfc_nvme_cancel_iocb(struct lpfc_hba *phba, struct lpfc_iocbq *pwqeIn,
+			   uint32_t stat, uint32_t param);
 extern int lpfc_enable_nvmet_cnt;
 extern unsigned long long lpfc_enable_nvmet[];
 extern int lpfc_no_hba_reset_cnt;
diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index af926768bcae..c2619d56be12 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -6191,10 +6191,14 @@ lpfc_reset_hba(struct lpfc_hba *phba)
 		phba->link_state = LPFC_HBA_ERROR;
 		return;
 	}
-	if (phba->sli.sli_flag & LPFC_SLI_ACTIVE)
+
+	/* If not LPFC_SLI_ACTIVE, force all IO to be flushed */
+	if (phba->sli.sli_flag & LPFC_SLI_ACTIVE) {
 		lpfc_offline_prep(phba, LPFC_MBX_WAIT);
-	else
+	} else {
 		lpfc_offline_prep(phba, LPFC_MBX_NO_WAIT);
+		lpfc_sli_flush_io_rings(phba);
+	}
 	lpfc_offline(phba);
 	lpfc_sli_brdrestart(phba);
 	lpfc_online(phba);
diff --git a/drivers/scsi/lpfc/lpfc_nvme.c b/drivers/scsi/lpfc/lpfc_nvme.c
index fd4a1cf0e4a6..e72c4cd3a97a 100644
--- a/drivers/scsi/lpfc/lpfc_nvme.c
+++ b/drivers/scsi/lpfc/lpfc_nvme.c
@@ -2596,14 +2596,17 @@ lpfc_nvme_wait_for_io_drain(struct lpfc_hba *phba)
 }
 
 void
-lpfc_nvme_cancel_iocb(struct lpfc_hba *phba, struct lpfc_iocbq *pwqeIn)
+lpfc_nvme_cancel_iocb(struct lpfc_hba *phba, struct lpfc_iocbq *pwqeIn,
+		      uint32_t stat, uint32_t param)
 {
 #if (IS_ENABLED(CONFIG_NVME_FC))
 	struct lpfc_io_buf *lpfc_ncmd;
 	struct nvmefc_fcp_req *nCmd;
-	struct lpfc_nvme_fcpreq_priv *freqpriv;
+	struct lpfc_wcqe_complete wcqe;
+	struct lpfc_wcqe_complete *wcqep = &wcqe;
 
-	if (!pwqeIn->context1) {
+	lpfc_ncmd = (struct lpfc_io_buf *)pwqeIn->context1;
+	if (!lpfc_ncmd) {
 		lpfc_sli_release_iocbq(phba, pwqeIn);
 		return;
 	}
@@ -2613,31 +2616,29 @@ lpfc_nvme_cancel_iocb(struct lpfc_hba *phba, struct lpfc_iocbq *pwqeIn)
 		lpfc_sli_release_iocbq(phba, pwqeIn);
 		return;
 	}
-	lpfc_ncmd = (struct lpfc_io_buf *)pwqeIn->context1;
 
 	spin_lock(&lpfc_ncmd->buf_lock);
-	if (!lpfc_ncmd->nvmeCmd) {
+	nCmd = lpfc_ncmd->nvmeCmd;
+	if (!nCmd) {
 		spin_unlock(&lpfc_ncmd->buf_lock);
 		lpfc_release_nvme_buf(phba, lpfc_ncmd);
 		return;
 	}
+	spin_unlock(&lpfc_ncmd->buf_lock);
 
-	nCmd = lpfc_ncmd->nvmeCmd;
 	lpfc_printf_log(phba, KERN_INFO, LOG_NVME_IOERR,
 			"6194 NVME Cancel xri %x\n",
 			lpfc_ncmd->cur_iocbq.sli4_xritag);
 
-	nCmd->transferred_length = 0;
-	nCmd->rcv_rsplen = 0;
-	nCmd->status = NVME_SC_INTERNAL;
-	freqpriv = nCmd->private;
-	freqpriv->nvme_buf = NULL;
-	lpfc_ncmd->nvmeCmd = NULL;
-
-	spin_unlock(&lpfc_ncmd->buf_lock);
-	nCmd->done(nCmd);
+	wcqep->word0 = 0;
+	bf_set(lpfc_wcqe_c_status, wcqep, stat);
+	wcqep->parameter = param;
+	wcqep->word3 = 0; /* xb is 0 */
 
 	/* Call release with XB=1 to queue the IO into the abort list. */
-	lpfc_release_nvme_buf(phba, lpfc_ncmd);
+	if (phba->sli.sli_flag & LPFC_SLI_ACTIVE)
+		bf_set(lpfc_wcqe_c_xb, wcqep, 1);
+
+	(pwqeIn->wqe_cmpl)(phba, pwqeIn, wcqep);
 #endif
 }
diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
index 735fa1d484eb..dedea5de7d78 100644
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -1532,15 +1532,19 @@ lpfc_sli_cancel_iocbs(struct lpfc_hba *phba, struct list_head *iocblist,
 
 	while (!list_empty(iocblist)) {
 		list_remove_head(iocblist, piocb, struct lpfc_iocbq, list);
-		if (!piocb->iocb_cmpl) {
+		if (piocb->wqe_cmpl) {
 			if (piocb->iocb_flag & LPFC_IO_NVME)
-				lpfc_nvme_cancel_iocb(phba, piocb);
+				lpfc_nvme_cancel_iocb(phba, piocb,
+						      ulpstatus, ulpWord4);
 			else
 				lpfc_sli_release_iocbq(phba, piocb);
-		} else {
+
+		} else if (piocb->iocb_cmpl) {
 			piocb->iocb.ulpStatus = ulpstatus;
 			piocb->iocb.un.ulpWord[4] = ulpWord4;
 			(piocb->iocb_cmpl) (phba, piocb, piocb);
+		} else {
+			lpfc_sli_release_iocbq(phba, piocb);
 		}
 	}
 	return;
@@ -8269,8 +8273,10 @@ lpfc_mbox_timeout_handler(struct lpfc_hba *phba)
 
 	struct lpfc_sli *psli = &phba->sli;
 
-	/* If the mailbox completed, process the completion and return */
-	if (lpfc_sli4_process_missed_mbox_completions(phba))
+	/* If the mailbox completed, process the completion */
+	lpfc_sli4_process_missed_mbox_completions(phba);
+
+	if (!(psli->sli_flag & LPFC_SLI_ACTIVE))
 		return;
 
 	if (pmbox != NULL)
@@ -8311,8 +8317,6 @@ lpfc_mbox_timeout_handler(struct lpfc_hba *phba)
 	psli->sli_flag &= ~LPFC_SLI_ACTIVE;
 	spin_unlock_irq(&phba->hbalock);
 
-	lpfc_sli_abort_fcp_rings(phba);
-
 	lpfc_printf_log(phba, KERN_ERR, LOG_TRACE_EVENT,
 			"0345 Resetting board due to mailbox timeout\n");
 
@@ -11783,7 +11787,7 @@ lpfc_sli_validate_fcp_iocb(struct lpfc_iocbq *iocbq, struct lpfc_vport *vport,
 	struct lpfc_io_buf *lpfc_cmd;
 	int rc = 1;
 
-	if (iocbq->vport != vport)
+	if (!iocbq || iocbq->vport != vport)
 		return rc;
 
 	if (!(iocbq->iocb_flag &  LPFC_IO_FCP) ||
-- 
2.26.2



* [PATCH v2 11/15] lpfc: Fix vport create logging
From: James Smart @ 2021-01-04 18:02 UTC (permalink / raw)
  To: linux-scsi; +Cc: James Smart, Dick Kennedy

When testing with large numbers of npiv vports and link bounces, the
driver floods the messages file, even with log_verbose = 0.

The new LOG_TRACE_EVENT messages still generate entries in the
messages file.

Fix by converting the vport create msg from LOG_TRACE_EVENT to
LOG_VPORT.

Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
---
 drivers/scsi/lpfc/lpfc_vport.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/lpfc/lpfc_vport.c b/drivers/scsi/lpfc/lpfc_vport.c
index a99fdfba7d27..ccf7b6cd0bd8 100644
--- a/drivers/scsi/lpfc/lpfc_vport.c
+++ b/drivers/scsi/lpfc/lpfc_vport.c
@@ -478,7 +478,7 @@ lpfc_vport_create(struct fc_vport *fc_vport, bool disable)
 	rc = VPORT_OK;
 
 out:
-	lpfc_printf_vlog(vport, KERN_ERR, LOG_TRACE_EVENT,
+	lpfc_printf_vlog(vport, KERN_ERR, LOG_VPORT,
 			 "1825 Vport Created.\n");
 	lpfc_host_attrib_init(lpfc_shost_from_vport(vport));
 error_out:
-- 
2.26.2



* [PATCH v2 12/15] lpfc: Fix crash when nvmet transport calls host_release
From: James Smart @ 2021-01-04 18:02 UTC (permalink / raw)
  To: linux-scsi; +Cc: James Smart, Dick Kennedy

When lpfc is running in NVMET mode and supports the NVME-1 addendum
changes, a LIP on a bound NVME Initiator, or a LIP on the lpfc NVMET's
own link, resulted in an Oops in lpfc_nvmet_host_release.

The fix requires lpfc NVMET to hold an additional reference on any
node structure that acts as the hosthandle for the NVMET transport.
This reference is taken only once, prior to the upcall of an
unsolicited LS_REQ, and is released when the NVMET transport releases
the hosthandle during the host_release downcall.
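The reference rule amounts to "one extra reference per node while it is a hosthandle". The following userspace model is illustrative only. The plain `refcnt` counter stands in for `ndlp->kref`, `SKETCH_XPT_HAS_HH` stands in for `NLP_XPT_HAS_HH`, and the `model_*` names are hypothetical. The transport's asynchronous `host_release` downcall is collapsed into a direct call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_XPT_HAS_HH 0x8u   /* role of NLP_XPT_HAS_HH */

struct hh_node {
	int      refcnt;         /* role of ndlp->kref */
	uint32_t xpt_flags;      /* role of ndlp->fc4_xpt_flags */
};

/* After each successfully handled unsolicited LS request: take the extra
 * reference only the first time the node becomes a hosthandle. */
static void model_ls_req_handled(struct hh_node *n)
{
	if (!(n->xpt_flags & SKETCH_XPT_HAS_HH)) {
		n->xpt_flags |= SKETCH_XPT_HAS_HH;
		n->refcnt++;
	}
}

/* Transport's host_release downcall: drop the flag and the reference. */
static void model_host_release(struct hh_node *n)
{
	n->xpt_flags &= ~SKETCH_XPT_HAS_HH;
	n->refcnt--;
}

/* Invalidate only real hosthandles; otherwise the resulting host_release
 * would drop a reference that was never taken (refcount imbalance). */
static bool model_invalidate_host(struct hh_node *n)
{
	if (!(n->xpt_flags & SKETCH_XPT_HAS_HH))
		return false;
	model_host_release(n);   /* asynchronous in the real transport */
	return true;
}
```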

Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
---
 drivers/scsi/lpfc/lpfc_disc.h  | 16 ++++++++++------
 drivers/scsi/lpfc/lpfc_nvmet.c | 33 ++++++++++++++++++++++++++++-----
 drivers/scsi/lpfc/lpfc_sli.c   | 29 +++++++++++++++++++++++++----
 3 files changed, 63 insertions(+), 15 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_disc.h b/drivers/scsi/lpfc/lpfc_disc.h
index 4cea61b63fdf..8ce13ef3cac3 100644
--- a/drivers/scsi/lpfc/lpfc_disc.h
+++ b/drivers/scsi/lpfc/lpfc_disc.h
@@ -77,6 +77,13 @@ struct lpfc_node_rrqs {
 	unsigned long xri_bitmap[XRI_BITMAP_ULONGS];
 };
 
+enum lpfc_fc4_xpt_flags {
+	NLP_WAIT_FOR_UNREG = 0x1,
+	SCSI_XPT_REGD      = 0x2,
+	NVME_XPT_REGD      = 0x4,
+	NLP_XPT_HAS_HH     = 0x8,
+};
+
 struct lpfc_nodelist {
 	struct list_head nlp_listp;
 	struct lpfc_name nlp_portname;
@@ -134,13 +141,10 @@ struct lpfc_nodelist {
 	unsigned long *active_rrqs_xri_bitmap;
 	struct lpfc_scsicmd_bkt *lat_data;	/* Latency data */
 	uint32_t fc4_prli_sent;
-	uint32_t fc4_xpt_flags;
-	uint32_t upcall_flags;
-#define NLP_WAIT_FOR_UNREG    0x1
-#define SCSI_XPT_REGD         0x2
-#define NVME_XPT_REGD         0x4
-#define NLP_WAIT_FOR_LOGO     0x2
+	u32 upcall_flags;
+#define	NLP_WAIT_FOR_LOGO 0x2
 
+	enum lpfc_fc4_xpt_flags fc4_xpt_flags;
 
 	uint32_t nvme_fb_size; /* NVME target's supported byte cnt */
 #define NVME_FB_BIT_SHIFT 9    /* PRLI Rsp first burst in 512B units. */
diff --git a/drivers/scsi/lpfc/lpfc_nvmet.c b/drivers/scsi/lpfc/lpfc_nvmet.c
index a71df8788fff..bb2a4a0d1295 100644
--- a/drivers/scsi/lpfc/lpfc_nvmet.c
+++ b/drivers/scsi/lpfc/lpfc_nvmet.c
@@ -1367,17 +1367,22 @@ static void
 lpfc_nvmet_host_release(void *hosthandle)
 {
 	struct lpfc_nodelist *ndlp = hosthandle;
-	struct lpfc_hba *phba = NULL;
+	struct lpfc_hba *phba = ndlp->phba;
 	struct lpfc_nvmet_tgtport *tgtp;
 
-	phba = ndlp->phba;
 	if (!phba->targetport || !phba->targetport->private)
 		return;
 
 	lpfc_printf_log(phba, KERN_ERR, LOG_NVME,
-			"6202 NVMET XPT releasing hosthandle x%px\n",
-			hosthandle);
+			"6202 NVMET XPT releasing hosthandle x%px "
+			"DID x%x xflags x%x refcnt %d\n",
+			hosthandle, ndlp->nlp_DID, ndlp->fc4_xpt_flags,
+			kref_read(&ndlp->kref));
 	tgtp = (struct lpfc_nvmet_tgtport *)phba->targetport->private;
+	spin_lock_irq(&ndlp->lock);
+	ndlp->fc4_xpt_flags &= ~NLP_XPT_HAS_HH;
+	spin_unlock_irq(&ndlp->lock);
+	lpfc_nlp_put(ndlp);
 	atomic_set(&tgtp->state, 0);
 }
 
@@ -3644,15 +3649,33 @@ lpfc_nvme_unsol_ls_issue_abort(struct lpfc_hba *phba,
 void
 lpfc_nvmet_invalidate_host(struct lpfc_hba *phba, struct lpfc_nodelist *ndlp)
 {
+	u32 ndlp_has_hh;
 	struct lpfc_nvmet_tgtport *tgtp;
 
-	lpfc_printf_log(phba, KERN_INFO, LOG_NVME | LOG_NVME_ABTS,
+	lpfc_printf_log(phba, KERN_INFO,
+			LOG_NVME | LOG_NVME_ABTS | LOG_NVME_DISC,
 			"6203 Invalidating hosthandle x%px\n",
 			ndlp);
 
 	tgtp = (struct lpfc_nvmet_tgtport *)phba->targetport->private;
 	atomic_set(&tgtp->state, LPFC_NVMET_INV_HOST_ACTIVE);
 
+	spin_lock_irq(&ndlp->lock);
+	ndlp_has_hh = ndlp->fc4_xpt_flags & NLP_XPT_HAS_HH;
+	spin_unlock_irq(&ndlp->lock);
+
+	/* Do not invalidate any nodes that do not have a hosthandle.
+	 * The host_release callbk will cause a node reference
+	 * count imbalance and a crash.
+	 */
+	if (!ndlp_has_hh) {
+		lpfc_printf_log(phba, KERN_INFO,
+				LOG_NVME | LOG_NVME_ABTS | LOG_NVME_DISC,
+				"6204 Skip invalidate on node x%px DID x%x\n",
+				ndlp, ndlp->nlp_DID);
+		return;
+	}
+
 #if (IS_ENABLED(CONFIG_NVME_TARGET_FC))
 	/* Need to get the nvmet_fc_target_port pointer here.*/
 	nvmet_fc_invalidate_host(phba->targetport, ndlp);
diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
index dedea5de7d78..176706aaebf5 100644
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -3011,23 +3011,44 @@ lpfc_nvme_unsol_ls_handler(struct lpfc_hba *phba, struct lpfc_iocbq *piocb)
 	axchg->payload = nvmebuf->dbuf.virt;
 	INIT_LIST_HEAD(&axchg->list);
 
-	if (phba->nvmet_support)
+	if (phba->nvmet_support) {
 		ret = lpfc_nvmet_handle_lsreq(phba, axchg);
-	else
+		spin_lock_irq(&ndlp->lock);
+		if (!ret && !(ndlp->fc4_xpt_flags & NLP_XPT_HAS_HH)) {
+			ndlp->fc4_xpt_flags |= NLP_XPT_HAS_HH;
+			spin_unlock_irq(&ndlp->lock);
+
+			/* This reference is a single occurrence to hold the
+			 * node valid until the nvmet transport calls
+			 * host_release.
+			 */
+			if (!lpfc_nlp_get(ndlp))
+				goto out_fail;
+
+			lpfc_printf_log(phba, KERN_ERR, LOG_NODE,
+					"6206 NVMET unsol ls_req ndlp %p "
+					"DID x%x xflags x%x refcnt %d\n",
+					ndlp, ndlp->nlp_DID,
+					ndlp->fc4_xpt_flags,
+					kref_read(&ndlp->kref));
+		} else {
+			spin_unlock_irq(&ndlp->lock);
+		}
+	} else {
 		ret = lpfc_nvme_handle_lsreq(phba, axchg);
+	}
 
 	/* if zero, LS was successfully handled. If non-zero, LS not handled */
 	if (!ret)
 		return;
 
+out_fail:
 	lpfc_printf_log(phba, KERN_ERR, LOG_TRACE_EVENT,
 			"6155 Drop NVME LS from DID %06X: SID %06X OXID x%X "
 			"NVMe%s handler failed %d\n",
 			did, sid, oxid,
 			(phba->nvmet_support) ? "T" : "I", ret);
 
-out_fail:
-
 	/* recycle receive buffer */
 	lpfc_in_buf_free(phba, &nvmebuf->dbuf);
 
-- 
2.26.2



* [PATCH v2 13/15] lpfc: Implement health checking when aborting io
From: James Smart @ 2021-01-04 18:02 UTC (permalink / raw)
  To: linux-scsi; +Cc: James Smart, Dick Kennedy

Several errors have occurred where the adapter stops or fails but does
not raise the register values the driver uses to detect failure, so the
driver is unaware of the failure. The failure typically results in io
timeouts, the io timeout handler failing (after several seconds), and
the error handler escalating its recovery policy, which causes more
errors. Eventually things spiral to the point where the driver cannot
perform recovery, because other recovery operations are still
outstanding, and it becomes unusable.

Resolve the situation by having the io timeout handler (actually an ELS,
scsi io, nvme ls, or nvme io timeout), in addition to aborting the io,
issue a mailbox command and look for a response from the hardware.
If the mailbox command fails, the adapter is marked offline and the
adapter reset handler is invoked to clean up.

The new io timeout test is limited to one test every 5s. If multiple io
timeouts occur concurrently, only the first generates the mailbox
command. Further testing occurs only when a timeout happens after the
5s interval following the last mailbox command has expired.
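The rate limiting comes from separating "request a check" from "issue a check". The model below is simplified and uses hypothetical `model_*` functions and `SKETCH_HBEAT_*` flags in place of `lpfc_issue_hb_tmo()`, `lpfc_hb_timeout_handler()`, `lpfc_hb_mbox_cmpl()` and `HBA_HBEAT_INP`/`HBA_HBEAT_TMO`. It omits locking, the "recent io completions" shortcut and the mailbox-timeout reset path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_HBEAT_INP 0x1u  /* heartbeat mailbox in flight */
#define SKETCH_HBEAT_TMO 0x2u  /* an io timeout asked for a health check */

struct hba_model {
	uint32_t flags;
	bool     hb_enabled;    /* role of cfg_enable_hba_heartbeat */
	int      mbox_issued;   /* heartbeat mailboxes sent so far */
};

/* Called from timeout paths: only records the request, so a burst of
 * concurrent timeouts sets one flag instead of sending many mailboxes. */
static void model_issue_hb_tmo(struct hba_model *h)
{
	if (h->hb_enabled)
		return;         /* the periodic heartbeat already checks */
	h->flags |= SKETCH_HBEAT_TMO;
}

/* Periodic timer tick (LPFC_HB_MBOX_INTERVAL in the driver). */
static void model_hb_timer(struct hba_model *h)
{
	bool want = h->hb_enabled || (h->flags & SKETCH_HBEAT_TMO);

	if (want && !(h->flags & SKETCH_HBEAT_INP)) {
		h->flags |= SKETCH_HBEAT_INP;
		h->mbox_issued++;
	}
}

/* Heartbeat mailbox completion: the port answered; clear both flags. */
static void model_hb_cmpl(struct hba_model *h)
{
	h->flags &= ~(SKETCH_HBEAT_INP | SKETCH_HBEAT_TMO);
}
```

Three back-to-back timeouts followed by one timer tick produce a single heartbeat mailbox; no further mailbox is sent until the next tick after a new timeout request.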

Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
---
 drivers/scsi/lpfc/lpfc.h           |   3 +-
 drivers/scsi/lpfc/lpfc_attr.c      |   2 +
 drivers/scsi/lpfc/lpfc_crtn.h      |   2 +
 drivers/scsi/lpfc/lpfc_els.c       |   9 ++
 drivers/scsi/lpfc/lpfc_hbadisc.c   |   3 +
 drivers/scsi/lpfc/lpfc_init.c      | 177 +++++++++++++++++------------
 drivers/scsi/lpfc/lpfc_nportdisc.c |   2 +
 drivers/scsi/lpfc/lpfc_nvme.c      |   8 ++
 drivers/scsi/lpfc/lpfc_scsi.c      |   3 +
 drivers/scsi/lpfc/lpfc_sli.c       |  44 ++++++-
 10 files changed, 178 insertions(+), 75 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc.h b/drivers/scsi/lpfc/lpfc.h
index 7875552c07d3..6ba5fa08c47a 100644
--- a/drivers/scsi/lpfc/lpfc.h
+++ b/drivers/scsi/lpfc/lpfc.h
@@ -780,6 +780,8 @@ struct lpfc_hba {
 #define HBA_FLOGI_ISSUED	0x100000 /* FLOGI was issued */
 #define HBA_DEFER_FLOGI		0x800000 /* Defer FLOGI till read_sparm cmpl */
 #define HBA_NEEDS_CFG_PORT	0x2000000 /* SLI3 - needs a CONFIG_PORT mbox */
+#define HBA_HBEAT_INP		0x4000000 /* mbox HBEAT is in progress */
+#define HBA_HBEAT_TMO		0x8000000 /* HBEAT initiated after timeout */
 
 	uint32_t fcp_ring_in_use; /* When polling test if intr-hndlr active*/
 	struct lpfc_dmabuf slim2p;
@@ -1136,7 +1138,6 @@ struct lpfc_hba {
 	unsigned long last_completion_time;
 	unsigned long skipped_hb;
 	struct timer_list hb_tmofunc;
-	uint8_t hb_outstanding;
 	struct timer_list rrq_tmr;
 	enum hba_temp_state over_temp_state;
 	/*
diff --git a/drivers/scsi/lpfc/lpfc_attr.c b/drivers/scsi/lpfc/lpfc_attr.c
index f8bb6a4f780c..bdd9a29f4201 100644
--- a/drivers/scsi/lpfc/lpfc_attr.c
+++ b/drivers/scsi/lpfc/lpfc_attr.c
@@ -1788,6 +1788,8 @@ lpfc_board_mode_store(struct device *dev, struct device_attribute *attr,
 	else if (strncmp(buf, "pci_bus_reset", sizeof("pci_bus_reset") - 1)
 		 == 0)
 		status = lpfc_reset_pci_bus(phba);
+	else if (strncmp(buf, "heartbeat", sizeof("heartbeat") - 1) == 0)
+		lpfc_issue_hb_tmo(phba);
 	else if (strncmp(buf, "trunk", sizeof("trunk") - 1) == 0)
 		status = lpfc_set_trunking(phba, (char *)buf + sizeof("trunk"));
 	else
diff --git a/drivers/scsi/lpfc/lpfc_crtn.h b/drivers/scsi/lpfc/lpfc_crtn.h
index e70db9ec7da4..a0aad4896a45 100644
--- a/drivers/scsi/lpfc/lpfc_crtn.h
+++ b/drivers/scsi/lpfc/lpfc_crtn.h
@@ -359,6 +359,8 @@ lpfc_sli_abort_taskmgmt(struct lpfc_vport *, struct lpfc_sli_ring *,
 
 void lpfc_mbox_timeout(struct timer_list *t);
 void lpfc_mbox_timeout_handler(struct lpfc_hba *);
+int lpfc_issue_hb_mbox(struct lpfc_hba *phba);
+void lpfc_issue_hb_tmo(struct lpfc_hba *phba);
 
 struct lpfc_nodelist *lpfc_findnode_did(struct lpfc_vport *, uint32_t);
 struct lpfc_nodelist *lpfc_findnode_wwpn(struct lpfc_vport *,
diff --git a/drivers/scsi/lpfc/lpfc_els.c b/drivers/scsi/lpfc/lpfc_els.c
index c944f220406e..d1bb99220495 100644
--- a/drivers/scsi/lpfc/lpfc_els.c
+++ b/drivers/scsi/lpfc/lpfc_els.c
@@ -1428,6 +1428,9 @@ lpfc_els_abort_flogi(struct lpfc_hba *phba)
 							   NULL);
 		}
 	}
+	/* Make sure HBA is alive */
+	lpfc_issue_hb_tmo(phba);
+
 	spin_unlock_irq(&phba->hbalock);
 
 	return 0;
@@ -8127,6 +8130,9 @@ lpfc_els_timeout_handler(struct lpfc_vport *vport)
 		spin_unlock_irq(&phba->hbalock);
 	}
 
+	/* Make sure HBA is alive */
+	lpfc_issue_hb_tmo(phba);
+
 	if (!list_empty(&pring->txcmplq))
 		if (!(phba->pport->load_flag & FC_UNLOADING))
 			mod_timer(&vport->els_tmofunc,
@@ -8226,6 +8232,9 @@ lpfc_els_flush_cmd(struct lpfc_vport *vport)
 		lpfc_sli_issue_abort_iotag(phba, pring, piocb, NULL);
 		spin_unlock_irqrestore(&phba->hbalock, iflags);
 	}
+	/* Make sure HBA is alive */
+	lpfc_issue_hb_tmo(phba);
+
 	if (!list_empty(&abort_list))
 		lpfc_printf_vlog(vport, KERN_ERR, LOG_TRACE_EVENT,
 				 "3387 abort list for txq not empty\n");
diff --git a/drivers/scsi/lpfc/lpfc_hbadisc.c b/drivers/scsi/lpfc/lpfc_hbadisc.c
index bcb5bf7e19dc..f890b5b7e6ca 100644
--- a/drivers/scsi/lpfc/lpfc_hbadisc.c
+++ b/drivers/scsi/lpfc/lpfc_hbadisc.c
@@ -5619,6 +5619,9 @@ lpfc_free_tx(struct lpfc_hba *phba, struct lpfc_nodelist *ndlp)
 	}
 	spin_unlock_irq(&phba->hbalock);
 
+	/* Make sure HBA is alive */
+	lpfc_issue_hb_tmo(phba);
+
 	/* Cancel all the IOCBs from the completions list */
 	lpfc_sli_cancel_iocbs(phba, &completions, IOSTAT_LOCAL_REJECT,
 			      IOERR_SLI_ABORTED);
diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index c2619d56be12..dbd7e40f67f9 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -591,7 +591,7 @@ lpfc_config_port_post(struct lpfc_hba *phba)
 	/* Set up heart beat (HB) timer */
 	mod_timer(&phba->hb_tmofunc,
 		  jiffies + msecs_to_jiffies(1000 * LPFC_HB_MBOX_INTERVAL));
-	phba->hb_outstanding = 0;
+	phba->hba_flag &= ~(HBA_HBEAT_INP | HBA_HBEAT_TMO);
 	phba->last_completion_time = jiffies;
 	/* Set up error attention (ERATT) polling timer */
 	mod_timer(&phba->eratt_poll,
@@ -1204,10 +1204,10 @@ lpfc_hb_mbox_cmpl(struct lpfc_hba * phba, LPFC_MBOXQ_t * pmboxq)
 	unsigned long drvr_flag;
 
 	spin_lock_irqsave(&phba->hbalock, drvr_flag);
-	phba->hb_outstanding = 0;
+	phba->hba_flag &= ~(HBA_HBEAT_INP | HBA_HBEAT_TMO);
 	spin_unlock_irqrestore(&phba->hbalock, drvr_flag);
 
-	/* Check and reset heart-beat timer is necessary */
+	/* Check and reset heart-beat timer if necessary */
 	mempool_free(pmboxq, phba->mbox_mem_pool);
 	if (!(phba->pport->fc_flag & FC_OFFLINE_MODE) &&
 		!(phba->link_state == LPFC_HBA_ERROR) &&
@@ -1380,6 +1380,60 @@ static void lpfc_hb_mxp_handler(struct lpfc_hba *phba)
 	}
 }
 
+/**
+ * lpfc_issue_hb_mbox - Issues heart-beat mailbox command
+ * @phba: pointer to lpfc hba data structure.
+ *
+ * If a HB mbox is not already in progress, this routine will allocate
+ * a LPFC_MBOXQ_t, populate it with a MBX_HEARTBEAT (0x31) command,
+ * and issue it. The HBA_HBEAT_INP flag means the command is in progress.
+ **/
+int
+lpfc_issue_hb_mbox(struct lpfc_hba *phba)
+{
+	LPFC_MBOXQ_t *pmboxq;
+	int retval;
+
+	/* Is a Heartbeat mbox already in progress */
+	if (phba->hba_flag & HBA_HBEAT_INP)
+		return 0;
+
+	pmboxq = mempool_alloc(phba->mbox_mem_pool, GFP_KERNEL);
+	if (!pmboxq)
+		return -ENOMEM;
+
+	lpfc_heart_beat(phba, pmboxq);
+	pmboxq->mbox_cmpl = lpfc_hb_mbox_cmpl;
+	pmboxq->vport = phba->pport;
+	retval = lpfc_sli_issue_mbox(phba, pmboxq, MBX_NOWAIT);
+
+	if (retval != MBX_BUSY && retval != MBX_SUCCESS) {
+		mempool_free(pmboxq, phba->mbox_mem_pool);
+		return -ENXIO;
+	}
+	phba->hba_flag |= HBA_HBEAT_INP;
+
+	return 0;
+}
+
+/**
+ * lpfc_issue_hb_tmo - Signals heartbeat timer to issue mbox command
+ * @phba: pointer to lpfc hba data structure.
+ *
+ * The heartbeat timer (every 5 sec) will fire. If the HBA_HBEAT_TMO
+ * flag is set, it will force a MBX_HEARTBEAT mbox command, regardless
+ * of the value of lpfc_enable_hba_heartbeat.
+ * If lpfc_enable_hba_heartbeat is set, the timeout routine will always
+ * try to issue a MBX_HEARTBEAT mbox command.
+ **/
+void
+lpfc_issue_hb_tmo(struct lpfc_hba *phba)
+{
+	if (phba->cfg_enable_hba_heartbeat)
+		return;
+	phba->hba_flag |= HBA_HBEAT_TMO;
+}
+
 /**
  * lpfc_hb_timeout_handler - The HBA-timer timeout handler
  * @phba: pointer to lpfc hba data structure.
@@ -1400,9 +1454,9 @@ void
 lpfc_hb_timeout_handler(struct lpfc_hba *phba)
 {
 	struct lpfc_vport **vports;
-	LPFC_MBOXQ_t *pmboxq;
 	struct lpfc_dmabuf *buf_ptr;
-	int retval, i;
+	int retval = 0;
+	int i, tmo;
 	struct lpfc_sli *psli = &phba->sli;
 	LIST_HEAD(completions);
 
@@ -1424,24 +1478,6 @@ lpfc_hb_timeout_handler(struct lpfc_hba *phba)
 		(phba->pport->fc_flag & FC_OFFLINE_MODE))
 		return;
 
-	spin_lock_irq(&phba->pport->work_port_lock);
-
-	if (time_after(phba->last_completion_time +
-			msecs_to_jiffies(1000 * LPFC_HB_MBOX_INTERVAL),
-			jiffies)) {
-		spin_unlock_irq(&phba->pport->work_port_lock);
-		if (!phba->hb_outstanding)
-			mod_timer(&phba->hb_tmofunc,
-				jiffies +
-				msecs_to_jiffies(1000 * LPFC_HB_MBOX_INTERVAL));
-		else
-			mod_timer(&phba->hb_tmofunc,
-				jiffies +
-				msecs_to_jiffies(1000 * LPFC_HB_MBOX_TIMEOUT));
-		return;
-	}
-	spin_unlock_irq(&phba->pport->work_port_lock);
-
 	if (phba->elsbuf_cnt &&
 		(phba->elsbuf_cnt == phba->elsbuf_prev_cnt)) {
 		spin_lock_irq(&phba->hbalock);
@@ -1461,37 +1497,43 @@ lpfc_hb_timeout_handler(struct lpfc_hba *phba)
 
 	/* If there is no heart beat outstanding, issue a heartbeat command */
 	if (phba->cfg_enable_hba_heartbeat) {
-		if (!phba->hb_outstanding) {
+		/* If IOs are completing, no need to issue a MBX_HEARTBEAT */
+		spin_lock_irq(&phba->pport->work_port_lock);
+		if (time_after(phba->last_completion_time +
+				msecs_to_jiffies(1000 * LPFC_HB_MBOX_INTERVAL),
+				jiffies)) {
+			spin_unlock_irq(&phba->pport->work_port_lock);
+			if (phba->hba_flag & HBA_HBEAT_INP)
+				tmo = (1000 * LPFC_HB_MBOX_TIMEOUT);
+			else
+				tmo = (1000 * LPFC_HB_MBOX_INTERVAL);
+			goto out;
+		}
+		spin_unlock_irq(&phba->pport->work_port_lock);
+
+		/* Check if a MBX_HEARTBEAT is already in progress */
+		if (phba->hba_flag & HBA_HBEAT_INP) {
+			/*
+			 * If heart beat timeout called with HBA_HBEAT_INP set
+			 * we need to give the hb mailbox cmd a chance to
+			 * complete or TMO.
+			 */
+			lpfc_printf_log(phba, KERN_WARNING, LOG_INIT,
+				"0459 Adapter heartbeat still outstanding: "
+				"last compl time was %d ms.\n",
+				jiffies_to_msecs(jiffies
+					 - phba->last_completion_time));
+			tmo = (1000 * LPFC_HB_MBOX_TIMEOUT);
+		} else {
 			if ((!(psli->sli_flag & LPFC_SLI_MBOX_ACTIVE)) &&
 				(list_empty(&psli->mboxq))) {
-				pmboxq = mempool_alloc(phba->mbox_mem_pool,
-							GFP_KERNEL);
-				if (!pmboxq) {
-					mod_timer(&phba->hb_tmofunc,
-						 jiffies +
-						 msecs_to_jiffies(1000 *
-						 LPFC_HB_MBOX_INTERVAL));
-					return;
-				}
 
-				lpfc_heart_beat(phba, pmboxq);
-				pmboxq->mbox_cmpl = lpfc_hb_mbox_cmpl;
-				pmboxq->vport = phba->pport;
-				retval = lpfc_sli_issue_mbox(phba, pmboxq,
-						MBX_NOWAIT);
-
-				if (retval != MBX_BUSY &&
-					retval != MBX_SUCCESS) {
-					mempool_free(pmboxq,
-							phba->mbox_mem_pool);
-					mod_timer(&phba->hb_tmofunc,
-						jiffies +
-						msecs_to_jiffies(1000 *
-						LPFC_HB_MBOX_INTERVAL));
-					return;
+				retval = lpfc_issue_hb_mbox(phba);
+				if (retval) {
+					tmo = (1000 * LPFC_HB_MBOX_INTERVAL);
+					goto out;
 				}
 				phba->skipped_hb = 0;
-				phba->hb_outstanding = 1;
 			} else if (time_before_eq(phba->last_completion_time,
 					phba->skipped_hb)) {
 				lpfc_printf_log(phba, KERN_INFO, LOG_INIT,
@@ -1502,30 +1544,23 @@ lpfc_hb_timeout_handler(struct lpfc_hba *phba)
 			} else
 				phba->skipped_hb = jiffies;
 
-			mod_timer(&phba->hb_tmofunc,
-				 jiffies +
-				 msecs_to_jiffies(1000 * LPFC_HB_MBOX_TIMEOUT));
-			return;
-		} else {
-			/*
-			* If heart beat timeout called with hb_outstanding set
-			* we need to give the hb mailbox cmd a chance to
-			* complete or TMO.
-			*/
-			lpfc_printf_log(phba, KERN_WARNING, LOG_INIT,
-					"0459 Adapter heartbeat still out"
-					"standing:last compl time was %d ms.\n",
-					jiffies_to_msecs(jiffies
-						 - phba->last_completion_time));
-			mod_timer(&phba->hb_tmofunc,
-				jiffies +
-				msecs_to_jiffies(1000 * LPFC_HB_MBOX_TIMEOUT));
+			tmo = (1000 * LPFC_HB_MBOX_TIMEOUT);
+			goto out;
 		}
 	} else {
-			mod_timer(&phba->hb_tmofunc,
-				jiffies +
-				msecs_to_jiffies(1000 * LPFC_HB_MBOX_INTERVAL));
+		/* Check to see if we want to force a MBX_HEARTBEAT */
+		if (phba->hba_flag & HBA_HBEAT_TMO) {
+			retval = lpfc_issue_hb_mbox(phba);
+			if (retval)
+				tmo = (1000 * LPFC_HB_MBOX_INTERVAL);
+			else
+				tmo = (1000 * LPFC_HB_MBOX_TIMEOUT);
+			goto out;
+		}
+		tmo = (1000 * LPFC_HB_MBOX_INTERVAL);
 	}
+out:
+	mod_timer(&phba->hb_tmofunc, jiffies + msecs_to_jiffies(tmo));
 }
 
 /**
@@ -2989,7 +3024,7 @@ lpfc_stop_hba_timers(struct lpfc_hba *phba)
 		del_timer_sync(&phba->rrq_tmr);
 		phba->hba_flag &= ~HBA_RRQ_ACTIVE;
 	}
-	phba->hb_outstanding = 0;
+	phba->hba_flag &= ~(HBA_HBEAT_INP | HBA_HBEAT_TMO);
 
 	switch (phba->pci_dev_grp) {
 	case LPFC_PCI_DEV_LP:
diff --git a/drivers/scsi/lpfc/lpfc_nportdisc.c b/drivers/scsi/lpfc/lpfc_nportdisc.c
index 0d0d2ca1a5d8..135d8e8a42ba 100644
--- a/drivers/scsi/lpfc/lpfc_nportdisc.c
+++ b/drivers/scsi/lpfc/lpfc_nportdisc.c
@@ -250,6 +250,8 @@ lpfc_els_abort(struct lpfc_hba *phba, struct lpfc_nodelist *ndlp)
 			lpfc_sli_issue_abort_iotag(phba, pring, iocb, NULL);
 			spin_unlock_irq(&phba->hbalock);
 	}
+	/* Make sure HBA is alive */
+	lpfc_issue_hb_tmo(phba);
 
 	INIT_LIST_HEAD(&abort_list);
 
diff --git a/drivers/scsi/lpfc/lpfc_nvme.c b/drivers/scsi/lpfc/lpfc_nvme.c
index e72c4cd3a97a..5e990f4c1ca6 100644
--- a/drivers/scsi/lpfc/lpfc_nvme.c
+++ b/drivers/scsi/lpfc/lpfc_nvme.c
@@ -1847,6 +1847,10 @@ lpfc_nvme_fcp_abort(struct nvme_fc_local_port *pnvme_lport,
 
 	spin_unlock(&lpfc_nbuf->buf_lock);
 	spin_unlock_irqrestore(&phba->hbalock, flags);
+
+	/* Make sure HBA is alive */
+	lpfc_issue_hb_tmo(phba);
+
 	if (ret_val != WQE_SUCCESS) {
 		lpfc_printf_vlog(vport, KERN_ERR, LOG_TRACE_EVENT,
 				 "6137 Failed abts issue_wqe with status x%x "
@@ -2593,6 +2597,10 @@ lpfc_nvme_wait_for_io_drain(struct lpfc_hba *phba)
 			}
 		}
 	}
+
+	/* Make sure HBA is alive */
+	lpfc_issue_hb_tmo(phba);
+
 }
 
 void
diff --git a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c
index 4ee890de556b..1b0e1df9545f 100644
--- a/drivers/scsi/lpfc/lpfc_scsi.c
+++ b/drivers/scsi/lpfc/lpfc_scsi.c
@@ -5479,6 +5479,9 @@ lpfc_abort_handler(struct scsi_cmnd *cmnd)
 						     lpfc_sli_abort_fcp_cmpl);
 	}
 
+	/* Make sure HBA is alive */
+	lpfc_issue_hb_tmo(phba);
+
 	if (ret_val != IOCB_SUCCESS) {
 		/* Indicate the IO is not being aborted by the driver. */
 		lpfc_cmd->waitq = NULL;
diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
index 176706aaebf5..d51d86959bbc 100644
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -4246,6 +4246,8 @@ lpfc_sli_abort_iocb_ring(struct lpfc_hba *phba, struct lpfc_sli_ring *pring)
 			lpfc_sli_issue_abort_iotag(phba, pring, iocb, NULL);
 		spin_unlock_irq(&phba->hbalock);
 	}
+	/* Make sure HBA is alive */
+	lpfc_issue_hb_tmo(phba);
 
 	/* Cancel all the IOCBs from the completions list */
 	lpfc_sli_cancel_iocbs(phba, &completions, IOSTAT_LOCAL_REJECT,
@@ -8044,7 +8046,7 @@ lpfc_sli4_hba_setup(struct lpfc_hba *phba)
 	/* Start heart beat timer */
 	mod_timer(&phba->hb_tmofunc,
 		  jiffies + msecs_to_jiffies(1000 * LPFC_HB_MBOX_INTERVAL));
-	phba->hb_outstanding = 0;
+	phba->hba_flag &= ~(HBA_HBEAT_INP | HBA_HBEAT_TMO);
 	phba->last_completion_time = jiffies;
 
 	/* start eq_delay heartbeat */
@@ -11218,6 +11220,9 @@ lpfc_sli_host_down(struct lpfc_vport *vport)
 	}
 	spin_unlock_irqrestore(&phba->hbalock, flags);
 
+	/* Make sure HBA is alive */
+	lpfc_issue_hb_tmo(phba);
+
 	/* Cancel all the IOCBs from the completions list */
 	lpfc_sli_cancel_iocbs(phba, &completions, IOSTAT_LOCAL_REJECT,
 			      IOERR_SLI_DOWN);
@@ -13029,7 +13034,21 @@ lpfc_sli_sp_intr_handler(int irq, void *dev_id)
 				spin_unlock_irqrestore(
 						&phba->pport->work_port_lock,
 						iflag);
-				lpfc_mbox_cmpl_put(phba, pmb);
+
+				/* Do NOT queue MBX_HEARTBEAT to the worker
+				 * thread for processing.
+				 */
+				if (pmbox->mbxCommand == MBX_HEARTBEAT) {
+					/* Process mbox now */
+					phba->sli.mbox_active = NULL;
+					phba->sli.sli_flag &=
+						~LPFC_SLI_MBOX_ACTIVE;
+					if (pmb->mbox_cmpl)
+						pmb->mbox_cmpl(phba, pmb);
+				} else {
+					/* Queue to worker thread to process */
+					lpfc_mbox_cmpl_put(phba, pmb);
+				}
 			}
 		} else
 			spin_unlock_irqrestore(&phba->hbalock, iflag);
@@ -13625,7 +13644,26 @@ lpfc_sli4_sp_handle_mbox_event(struct lpfc_hba *phba, struct lpfc_mcqe *mcqe)
 	phba->pport->work_port_events &= ~WORKER_MBOX_TMO;
 	spin_unlock_irqrestore(&phba->pport->work_port_lock, iflags);
 
-	/* There is mailbox completion work to do */
+	/* Do NOT queue MBX_HEARTBEAT to the worker thread for processing. */
+	if (pmbox->mbxCommand == MBX_HEARTBEAT) {
+		spin_lock_irqsave(&phba->hbalock, iflags);
+		/* Release the mailbox command posting token */
+		phba->sli.sli_flag &= ~LPFC_SLI_MBOX_ACTIVE;
+		phba->sli.mbox_active = NULL;
+		if (bf_get(lpfc_trailer_consumed, mcqe))
+			lpfc_sli4_mq_release(phba->sli4_hba.mbx_wq);
+		spin_unlock_irqrestore(&phba->hbalock, iflags);
+
+		/* Post the next mbox command, if there is one */
+		lpfc_sli4_post_async_mbox(phba);
+
+		/* Process cmpl now */
+		if (pmb->mbox_cmpl)
+			pmb->mbox_cmpl(phba, pmb);
+		return false;
+	}
+
+	/* There is mailbox completion work to queue to the worker thread */
 	spin_lock_irqsave(&phba->hbalock, iflags);
 	__lpfc_mbox_cmpl_put(phba, pmb);
 	phba->work_ha |= HA_MBATT;
-- 
2.26.2


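The heartbeat rework in patch 13 comes down to choosing the next timer delay from a few flags. The following is a minimal user-space model of that decision, not driver code. `struct hb_state`, `try_issue_hb()`, `issue_hb_tmo()` and `hb_next_tmo_ms()` are hypothetical names. The 5 s interval comes from the kernel-doc above, and the 30 s value for LPFC_HB_MBOX_TIMEOUT is an assumption. The `skipped_hb` bookkeeping and the clearing of HBA_HBEAT_TMO on completion are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* 5 s is from the kernel-doc; 30 s is an assumed value. */
#define HB_MBOX_INTERVAL 5  /* models LPFC_HB_MBOX_INTERVAL, seconds */
#define HB_MBOX_TIMEOUT  30 /* models LPFC_HB_MBOX_TIMEOUT, seconds */

/* Hypothetical snapshot of the state the timer handler looks at. */
struct hb_state {
	bool enable_hba_heartbeat;  /* cfg_enable_hba_heartbeat */
	bool hbeat_inp;             /* HBA_HBEAT_INP: MBX_HEARTBEAT outstanding */
	bool hbeat_tmo;             /* HBA_HBEAT_TMO: forced heartbeat requested */
	bool io_completed_recently; /* last completion within one interval */
	bool mbox_idle;             /* no active mbox and empty mbox queue */
	int issue_rc;               /* result the mbox issue would return, 0 = ok */
	bool issued;                /* out: a heartbeat issue was attempted */
};

/* Stand-in for lpfc_issue_hb_mbox(), which sets HBA_HBEAT_INP on success. */
static int try_issue_hb(struct hb_state *s)
{
	s->issued = true;
	if (s->issue_rc == 0)
		s->hbeat_inp = true;
	return s->issue_rc;
}

/* Models lpfc_issue_hb_tmo(): only request a forced heartbeat when the
 * periodic heartbeat is disabled. */
static void issue_hb_tmo(struct hb_state *s)
{
	if (s->enable_hba_heartbeat)
		return;
	s->hbeat_tmo = true;
}

/* Returns the delay, in ms, before the heartbeat timer should fire again. */
static unsigned int hb_next_tmo_ms(struct hb_state *s)
{
	assert(s);
	if (s->enable_hba_heartbeat) {
		/* I/O is completing: the HBA is alive, no need to poke it. */
		if (s->io_completed_recently)
			return 1000u * (s->hbeat_inp ? HB_MBOX_TIMEOUT : HB_MBOX_INTERVAL);
		/* A heartbeat is already outstanding: give it time to finish. */
		if (s->hbeat_inp)
			return 1000u * HB_MBOX_TIMEOUT;
		if (s->mbox_idle && try_issue_hb(s))
			return 1000u * HB_MBOX_INTERVAL; /* issue failed: retry soon */
		return 1000u * HB_MBOX_TIMEOUT;
	}
	/* Heartbeat disabled: only act on a forced request. */
	if (s->hbeat_tmo)
		return 1000u * (try_issue_hb(s) ? HB_MBOX_INTERVAL : HB_MBOX_TIMEOUT);
	return 1000u * HB_MBOX_INTERVAL;
}
```

The model shows why the abort paths can call lpfc_issue_hb_tmo() unconditionally. When lpfc_enable_hba_heartbeat is on, the request is ignored, because the timer already issues MBX_HEARTBEAT whenever no I/O has completed within the last interval.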
* [PATCH v2 14/15] lpfc: Enhancements to LOG_TRACE_EVENT for better readability
  2021-01-04 18:02 [PATCH v2 00/15] lpfc: Update lpfc to revision 12.8.0.7 James Smart
                   ` (12 preceding siblings ...)
  2021-01-04 18:02 ` [PATCH v2 13/15] lpfc: Implement health checking when aborting io James Smart
@ 2021-01-04 18:02 ` James Smart
  2021-01-04 18:02 ` [PATCH v2 15/15] lpfc: Update lpfc version to 12.8.0.7 James Smart
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 23+ messages in thread
From: James Smart @ 2021-01-04 18:02 UTC
  To: linux-scsi; +Cc: James Smart, Dick Kennedy

While testing the recent discovery node rework, several items were seen
that could be done better with respect to the new trace event logic.

1) In the following message:
      kernel: lpfc 0000:44:00.0: start 35 end 35 cnt 0
   If cnt is zero, there is no reason to display this first message of
   the dump, which only gives start/end positioning.

   Fix by not displaying the message if cnt is 0.

2) If the driver is loaded with module log verbosity off, and verbosity
   is later enabled via sysfs for a single NPIV host instance, messages
   are enabled on all instances. This is because the trace log verbosity
   check (lpfc_dmp_dbg) looks only at the phba. It should look at both
   the phba and the vports.

   Fix by checking both the phba and its vports.

3) In the following messages:
       2904 Firmware Dump Image Present on Adapter
       2887 Reset Needed: Attempting Port Recovery...
   These messages are not necessary for the trace event log, which is
   primarily for discovery.

   Fix by changing the log level of these 2 messages to LOG_SLI.

Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
---
 drivers/scsi/lpfc/lpfc_init.c | 20 +++++++++++++++++++-
 drivers/scsi/lpfc/lpfc_sli.c  |  2 +-
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index dbd7e40f67f9..71f340dd4fbd 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -1865,7 +1865,7 @@ lpfc_sli4_port_sta_fn_reset(struct lpfc_hba *phba, int mbx_action,
 
 	/* need reset: attempt for port recovery */
 	if (en_rn_msg)
-		lpfc_printf_log(phba, KERN_ERR, LOG_TRACE_EVENT,
+		lpfc_printf_log(phba, KERN_ERR, LOG_SLI,
 				"2887 Reset Needed: Attempting Port "
 				"Recovery...\n");
 
@@ -14177,15 +14177,32 @@ void lpfc_dmp_dbg(struct lpfc_hba *phba)
 	int i;
 	int j = 0;
 	unsigned long rem_nsec;
+	struct lpfc_vport **vports;
 
+	/* Don't dump messages if we explicitly set log_verbose for the
+	 * physical port or any vport.
+	 */
 	if (phba->cfg_log_verbose)
 		return;
 
+	vports = lpfc_create_vport_work_array(phba);
+	if (vports != NULL) {
+		for (i = 0; i <= phba->max_vpi && vports[i] != NULL; i++) {
+			if (vports[i]->cfg_log_verbose) {
+				lpfc_destroy_vport_work_array(phba, vports);
+				return;
+			}
+		}
+	}
+	lpfc_destroy_vport_work_array(phba, vports);
+
 	if (atomic_cmpxchg(&phba->dbg_log_dmping, 0, 1) != 0)
 		return;
 
 	start_idx = (unsigned int)atomic_read(&phba->dbg_log_idx) % DBG_LOG_SZ;
 	dbg_cnt = (unsigned int)atomic_read(&phba->dbg_log_cnt);
+	if (!dbg_cnt)
+		goto out;
 	temp_idx = start_idx;
 	if (dbg_cnt >= DBG_LOG_SZ) {
 		dbg_cnt = DBG_LOG_SZ;
@@ -14215,6 +14232,7 @@ void lpfc_dmp_dbg(struct lpfc_hba *phba)
 			 rem_nsec / 1000,
 			 phba->dbg_log[temp_idx].log);
 	}
+out:
 	atomic_set(&phba->dbg_log_cnt, 0);
 	atomic_set(&phba->dbg_log_dmping, 0);
 }
diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
index d51d86959bbc..fa1a714a78f0 100644
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -7491,7 +7491,7 @@ static void lpfc_sli4_dip(struct lpfc_hba *phba)
 			return;
 
 		if (bf_get(lpfc_sliport_status_dip, &reg_data))
-			lpfc_printf_log(phba, KERN_ERR, LOG_TRACE_EVENT,
+			lpfc_printf_log(phba, KERN_ERR, LOG_SLI,
 					"2904 Firmware Dump Image Present"
 					" on Adapter");
 	}
-- 
2.26.2

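Items 1) and 2) of patch 14 amount to two early exits in front of a ring-buffer dump. Below is a hedged sketch of that pattern, not the driver's lpfc_dmp_dbg(). The buffer layout, the tiny DBG_LOG_SZ of 8, and the names `ring_log()`, `any_verbose()` and `ring_dump()` are hypothetical. The index arithmetic is a generic oldest-first walk, not a copy of the driver's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define DBG_LOG_SZ  8  /* hypothetical, tiny on purpose */
#define DBG_MSG_LEN 64

struct dbg_ring {
	char log[DBG_LOG_SZ][DBG_MSG_LEN];
	size_t next; /* slot for the next message, always < DBG_LOG_SZ */
	size_t cnt;  /* messages logged since the last dump */
};

static void ring_log(struct dbg_ring *r, const char *msg)
{
	snprintf(r->log[r->next], DBG_MSG_LEN, "%s", msg);
	r->next = (r->next + 1) % DBG_LOG_SZ;
	r->cnt++;
}

/* Item 2): verbose logging on the physical port OR any vport means the
 * messages already went to the kernel log, so skip the dump. */
static bool any_verbose(bool phys_verbose, const bool *vport_verbose,
			size_t nvports)
{
	if (phys_verbose)
		return true;
	for (size_t i = 0; i < nvports; i++)
		if (vport_verbose[i])
			return true;
	return false;
}

/* Points out[] at the most recent messages, oldest first, and returns how
 * many there are. The count is reset only when the verbose checks pass. */
static size_t ring_dump(struct dbg_ring *r, bool phys_verbose,
			const bool *vport_verbose, size_t nvports,
			const char *out[DBG_LOG_SZ])
{
	size_t n, start, i;

	assert(r->next < DBG_LOG_SZ);
	if (any_verbose(phys_verbose, vport_verbose, nvports))
		return 0;

	n = r->cnt;
	if (n == 0) /* item 1): no "start/end cnt 0" line for an empty dump */
		goto out;
	if (n > DBG_LOG_SZ)
		n = DBG_LOG_SZ; /* older entries were overwritten */
	start = (r->next + DBG_LOG_SZ - n) % DBG_LOG_SZ;
	for (i = 0; i < n; i++)
		out[i] = r->log[(start + i) % DBG_LOG_SZ];
out:
	r->cnt = 0;
	return n;
}
```

Checking the vports before taking the "dumping" guard mirrors the patch. A verbose instance suppresses the dump without consuming the pending count.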

* [PATCH v2 15/15] lpfc: Update lpfc version to 12.8.0.7
  2021-01-04 18:02 [PATCH v2 00/15] lpfc: Update lpfc to revision 12.8.0.7 James Smart
                   ` (13 preceding siblings ...)
  2021-01-04 18:02 ` [PATCH v2 14/15] lpfc: Enhancements to LOG_TRACE_EVENT for better readability James Smart
@ 2021-01-04 18:02 ` James Smart
  2021-01-08  4:02 ` [PATCH v2 00/15] lpfc: Update lpfc to revision 12.8.0.7 Martin K. Petersen
  2021-01-13  5:48 ` Martin K. Petersen
  16 siblings, 0 replies; 23+ messages in thread
From: James Smart @ 2021-01-04 18:02 UTC
  To: linux-scsi; +Cc: James Smart, Dick Kennedy

Update lpfc version to 12.8.0.7

Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
---
 drivers/scsi/lpfc/lpfc_version.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/lpfc/lpfc_version.h b/drivers/scsi/lpfc/lpfc_version.h
index 234dca60995b..fade044c8f15 100644
--- a/drivers/scsi/lpfc/lpfc_version.h
+++ b/drivers/scsi/lpfc/lpfc_version.h
@@ -20,7 +20,7 @@
  * included with this package.                                     *
  *******************************************************************/
 
-#define LPFC_DRIVER_VERSION "12.8.0.6"
+#define LPFC_DRIVER_VERSION "12.8.0.7"
 #define LPFC_DRIVER_NAME		"lpfc"
 
 /* Used for SLI 2/3 */
-- 
2.26.2


* Re: [PATCH v2 00/15] lpfc: Update lpfc to revision 12.8.0.7
  2021-01-04 18:02 [PATCH v2 00/15] lpfc: Update lpfc to revision 12.8.0.7 James Smart
                   ` (14 preceding siblings ...)
  2021-01-04 18:02 ` [PATCH v2 15/15] lpfc: Update lpfc version to 12.8.0.7 James Smart
@ 2021-01-08  4:02 ` Martin K. Petersen
  2021-01-13  5:48 ` Martin K. Petersen
  16 siblings, 0 replies; 23+ messages in thread
From: Martin K. Petersen @ 2021-01-08  4:02 UTC
  To: James Smart; +Cc: linux-scsi


James,

> Update lpfc to revision 12.8.0.7

Applied to 5.12/scsi-staging, thanks!

-- 
Martin K. Petersen	Oracle Linux Engineering

* Re: [PATCH v2 00/15] lpfc: Update lpfc to revision 12.8.0.7
  2021-01-04 18:02 [PATCH v2 00/15] lpfc: Update lpfc to revision 12.8.0.7 James Smart
                   ` (15 preceding siblings ...)
  2021-01-08  4:02 ` [PATCH v2 00/15] lpfc: Update lpfc to revision 12.8.0.7 Martin K. Petersen
@ 2021-01-13  5:48 ` Martin K. Petersen
  16 siblings, 0 replies; 23+ messages in thread
From: Martin K. Petersen @ 2021-01-13  5:48 UTC
  To: James Smart, linux-scsi; +Cc: Martin K . Petersen

On Mon, 4 Jan 2021 10:02:25 -0800, James Smart wrote:

> Update lpfc to revision 12.8.0.7
> 
> This patch set contains fixes and a cleanup of trace logging.
> 
> The patches were cut against Martin's 5.11/scsi-queue tree

Applied to 5.12/scsi-queue, thanks!

[01/15] lpfc: Fix PLOGI S_ID of 0 on pt2pt config
        https://git.kernel.org/mkp/scsi/c/8e062ce305ad
[02/15] lpfc: Fix auto sli_mode and its effect on CONFIG_PORT for SLI3
        https://git.kernel.org/mkp/scsi/c/d2f2547efd39
[03/15] lpfc: Refresh ndlp when a new PRLI is received in the PRLI issue state
        https://git.kernel.org/mkp/scsi/c/ecf041fe9895
[04/15] lpfc: Fix crash when a fabric node is released prematurely.
        https://git.kernel.org/mkp/scsi/c/07aaefdf75c5
[05/15] lpfc: Use the nvme-fc transport supplied timeout for LS requests
        https://git.kernel.org/mkp/scsi/c/c33b1609344f
[06/15] lpfc: Fix FW reset action if IOs are outstanding
        https://git.kernel.org/mkp/scsi/c/3ba6216aaded
[07/15] lpfc: Prevent duplicate requests to unregister with cpuhp framework
        https://git.kernel.org/mkp/scsi/c/f0871ab68a8b
[08/15] lpfc: Fix error log messages being logged following scsi task mgnt
        https://git.kernel.org/mkp/scsi/c/da09ae4864e1
[09/15] lpfc: Fix target reset failing
        https://git.kernel.org/mkp/scsi/c/31051249f12e
[10/15] lpfc: Fix NVME recovery after mailbox timeout
        https://git.kernel.org/mkp/scsi/c/9ec58ec7d41a
[11/15] lpfc: Fix vport create logging
        https://git.kernel.org/mkp/scsi/c/ff8a44bff5ef
[12/15] lpfc: Fix crash when nvmet transport calls host_release
        https://git.kernel.org/mkp/scsi/c/243156c0108d
[13/15] lpfc: Implement health checking when aborting io
        https://git.kernel.org/mkp/scsi/c/a22d73b655a8
[14/15] lpfc: Enhancements to LOG_TRACE_EVENT for better readability
        https://git.kernel.org/mkp/scsi/c/0b3ad32e2646
[15/15] lpfc: Update lpfc version to 12.8.0.7
        https://git.kernel.org/mkp/scsi/c/181dd9a4c2c6

-- 
Martin K. Petersen	Oracle Linux Engineering

* Re: [PATCH v2 02/15] lpfc: Fix auto sli_mode and its effect on CONFIG_PORT for SLI3
  2021-01-04 18:02 ` [PATCH v2 02/15] lpfc: Fix auto sli_mode and its effect on CONFIG_PORT for SLI3 James Smart
@ 2021-06-07 11:06   ` Daniel Wagner
  2021-06-07 15:12     ` James Smart
  0 siblings, 1 reply; 23+ messages in thread
From: Daniel Wagner @ 2021-06-07 11:06 UTC
  To: James Smart; +Cc: linux-scsi, Dick Kennedy

Hi James,

On Mon, Jan 04, 2021 at 10:02:27AM -0800, James Smart wrote:
> A very long time ago, there was a feature: auto sli mode. It gave the
> user the ability to auto select the SLI mode (SLI2 or SLI3) to run the
> port in, or even force SLI2 mode if configured.  Because of the
> convoluted logic, the CONFIG_PORT mbox command ends up being called 2 or
> 3 times. It should have been called only once.  Additionally, the driver
> no longer supports SLI-2, so only SLI-3 mode should be allowed.
> 
> The following changes were made:
> - Force module parameter to SLI3 only.
> - Rip out redundant CONFIG_PORT mbox commands.
> - Force CONFIG_PORT mbox command to be in beginning of enable ISR routine.
> - Added changes for offline to online behavior

We got a regression report for this patch. The problem seems to be
related to older Emulex HBAs. In this case the symptom is that one port
is not enabled. Reverting this patch fixed the problem. This was
observed with:

  Emulex LPe11000 FV2.72X2 DV12.8.0.7 HN:FR2AS6AP2-0001 OS:Linux

Here are some ramblings from my debugging:

In the logs I found:

> 0000:0b:00.0: 0:0431 Failed to enable interrupt.
> 0000:0b:00.0: 0:0431 Failed to enable interrupt.
> 0000:0b:00.0: 0:0431 Failed to enable interrupt.

cfg_sli_mode used to be 0 (auto), and the config port setup
used to try mode = 3 first and then fall back to mode = 2:

> -       rc = lpfc_sli_config_port(phba, mode);
> -
> -       if (rc && phba->cfg_sli_mode == 3)
> -               lpfc_printf_log(phba, KERN_ERR, LOG_TRACE_EVENT,
> -                               "1820 Unable to select SLI-3.  "
> -                               "Not supported by adapter.\n");
> -       if (rc && mode != 2)
> -               rc = lpfc_sli_config_port(phba, 2);

The port config is now done in lpfc_sli_enable_intr(), which is
hardcoded to LPFC_SLI_REV3. I think this fails and the HBA_NEEDS_CFG_PORT
flag is not cleared, hence the new code in lpfc_sli_hba_setup() tries
to configure the port again with:

> +       /* Enable ISR already does config_port because of config_msi mbx */
> +       if (phba->hba_flag & HBA_NEEDS_CFG_PORT) {
> +               rc = lpfc_sli_config_port(phba, LPFC_SLI_REV3);
> +               if (rc)
> +                       return -EIO;
> +               phba->hba_flag &= ~HBA_NEEDS_CFG_PORT;

Though I think this should be something like

   lpfc_sli_config_port(phba, LPFC_SLI_REV2);

for the specific case.

HTH!

Thanks,
Daniel

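For reference, the pre-patch auto selection quoted above can be modeled as "try the preferred revision, then fall back to SLI-2". This is a sketch only. `struct fake_hba`, `config_port()` and `select_sli_mode()` are hypothetical stand-ins for lpfc_sli_config_port() and its caller, and the NPIV override and the other parameter checks from the original code are omitted.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical adapter model: bit n set means firmware accepts SLI-n. */
struct fake_hba {
	unsigned int supported_revs;
};

/* Stand-in for lpfc_sli_config_port(): 0 on success, nonzero on failure. */
static int config_port(const struct fake_hba *h, int rev)
{
	return (h->supported_revs & (1u << rev)) ? 0 : -1;
}

/* Models the removed auto logic: sli_mode 0 (auto) or 3 tries SLI-3 first,
 * sli_mode 2 asks for SLI-2, and a failed non-SLI-2 attempt falls back to
 * SLI-2. Returns the SLI revision configured, or -1 if none worked. */
static int select_sli_mode(const struct fake_hba *h, int cfg_sli_mode)
{
	int mode = (cfg_sli_mode == 2) ? 2 : 3;
	int rc;

	assert(cfg_sli_mode == 0 || cfg_sli_mode == 2 || cfg_sli_mode == 3);
	rc = config_port(h, mode);
	if (rc && cfg_sli_mode == 3)
		fprintf(stderr, "1820 Unable to select SLI-3.  "
				"Not supported by adapter.\n");
	if (rc && mode != 2) {
		rc = config_port(h, 2);
		mode = 2;
	}
	return rc ? -1 : mode;
}
```

If the adapter really only accepts SLI-2 (Daniel's suspicion for the LPe11000 report), this model still lands on SLI-2 with sli_mode=0. Hardcoding LPFC_SLI_REV3 removes exactly that fallback path.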
* Re: [PATCH v2 02/15] lpfc: Fix auto sli_mode and its effect on CONFIG_PORT for SLI3
  2021-06-07 11:06   ` Daniel Wagner
@ 2021-06-07 15:12     ` James Smart
  2021-06-15 12:45       ` Daniel Wagner
  0 siblings, 1 reply; 23+ messages in thread
From: James Smart @ 2021-06-07 15:12 UTC
  To: Daniel Wagner; +Cc: linux-scsi, Dick Kennedy

On 6/7/2021 4:06 AM, Daniel Wagner wrote:
> Hi James,
> 
> On Mon, Jan 04, 2021 at 10:02:27AM -0800, James Smart wrote:
>> A very long time ago, there was a feature: auto sli mode. It gave the
>> user the ability to auto select the SLI mode (SLI2 or SLI3) to run the
>> port in, or even force SLI2 mode if configured.  Because of the
>> convoluted logic, the CONFIG_PORT mbox command ends up being called 2 or
>> 3 times. It should have been called only once.  Additionally, the driver
>> no longer supports SLI-2, so only SLI-3 mode should be allowed.
>>
>> The following changes were made:
>> - Force module parameter to SLI3 only.
>> - Rip out redundant CONFIG_PORT mbox commands.
>> - Force CONFIG_PORT mbox command to be in beginning of enable ISR routine.
>> - Added changes for offline to online behavior
> 
> We got a regression report for this patch. The problem seems to be
> related to older Emulex HBAs. In this case the symptom is that one port
> is not enabled. Reverting this patch fixed the problem. This was
> observed with:
> 
>    Emulex LPe11000 FV2.72X2 DV12.8.0.7 HN:FR2AS6AP2-0001 OS:Linux
> 
> Here are some ramblings from my debugging:
> 
> In the logs I found:
> 
>> 0000:0b:00.0: 0:0431 Failed to enable interrupt.
>> 0000:0b:00.0: 0:0431 Failed to enable interrupt.
>> 0000:0b:00.0: 0:0431 Failed to enable interrupt.
> 
> cfg_sli_mode used to be 0 (auto), and the config port setup
> used to try mode = 3 first and then fall back to mode = 2:
> 
>> -       rc = lpfc_sli_config_port(phba, mode);
>> -
>> -       if (rc && phba->cfg_sli_mode == 3)
>> -               lpfc_printf_log(phba, KERN_ERR, LOG_TRACE_EVENT,
>> -                               "1820 Unable to select SLI-3.  "
>> -                               "Not supported by adapter.\n");
>> -       if (rc && mode != 2)
>> -               rc = lpfc_sli_config_port(phba, 2);
> 
> The port config is now done in lpfc_sli_enable_intr(), which is
> hardcoded to LPFC_SLI_REV3. I think this fails and the HBA_NEEDS_CFG_PORT
> flag is not cleared, hence the new code in lpfc_sli_hba_setup() tries
> to configure the port again with:
> 
>> +       /* Enable ISR already does config_port because of config_msi mbx */
>> +       if (phba->hba_flag & HBA_NEEDS_CFG_PORT) {
>> +               rc = lpfc_sli_config_port(phba, LPFC_SLI_REV3);
>> +               if (rc)
>> +                       return -EIO;
>> +               phba->hba_flag &= ~HBA_NEEDS_CFG_PORT;
> 
> Though I think this should be something like
> 
>     lpfc_sli_config_port(phba, LPFC_SLI_REV2);
> 
> for the specific case.
> 
> HTH!
> 
> Thanks,
> Daniel
> 

Ouch. What you are describing is likely true, but SLI-2 firmware is
*extremely* old: 2 decades or more. If a change won't work on the first
shot, it likely isn't worth the effort to try to fix it. Other
functionality may be hanging on by a thread. That adapter certainly
runs SLI-3 (even that is 10-15 years old), so the best solution is a
firmware upgrade that picks up the SLI-3 interface. Is that possible?

Given that the error message you quoted was an interrupt-enable failure,
that may be a clue. It may well be that the adapter has SLI-3 firmware
and is failing when setting the interrupt vector type. The older
adapters supported MSI and INTx; SLI-2 may have been limited to INTx
only. There used to be hiccups on some platforms with MSI support (the
platform said it supported MSI, but it was broken), which is why the
driver had "set it, test it, revert it" logic. I believe the driver has
an lpfc_use_msi module parameter that, when set to 0, should use only
INTx, which may be what the SLI-2 downgrade is effectively doing. Try
setting that and see if the card loads the SLI-3 image and runs.


-- james

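The "set it, test it, revert it" interrupt logic mentioned above can be pictured as walking down from the configured vector type to INTx. This is a minimal sketch, assuming lpfc_use_msi encodes 2 = MSI-X, 1 = MSI, 0 = INTx (the thread only states that 0 means INTx only). `struct platform`, `set_and_test()` and `pick_intr_mode()` are hypothetical and do not touch any real PCI API.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed encoding of lpfc_use_msi; the thread only confirms 0 = INTx. */
enum intr_mode { INTR_INTX = 0, INTR_MSI = 1, INTR_MSIX = 2 };

/* Hypothetical platform model: works[m] says whether vector type m really
 * delivers interrupts (a platform may advertise MSI that is broken). */
struct platform {
	bool works[3];
};

/* "Set it, test it": enable the mode and check that an interrupt arrives.
 * Here the outcome simply comes from the platform model. */
static bool set_and_test(const struct platform *p, enum intr_mode m)
{
	return p->works[m];
}

/* "Revert it": step down from the configured mode toward INTx until one
 * passes the test. Returns the mode in use, or -1 if even INTx fails. */
static int pick_intr_mode(const struct platform *p, int use_msi)
{
	assert(use_msi >= INTR_INTX && use_msi <= INTR_MSIX);
	for (int m = use_msi; m >= INTR_INTX; m--)
		if (set_and_test(p, (enum intr_mode)m))
			return m;
	return -1;
}
```

Forcing the starting point to INTx skips the MSI test entirely. That is why lpfc_use_msi=0 is a cheap way to rule out a broken-MSI platform before suspecting the SLI revision.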
* Re: [PATCH v2 02/15] lpfc: Fix auto sli_mode and its effect on CONFIG_PORT for SLI3
  2021-06-07 15:12     ` James Smart
@ 2021-06-15 12:45       ` Daniel Wagner
  2021-06-18  8:52         ` Daniel Wagner
  0 siblings, 1 reply; 23+ messages in thread
From: Daniel Wagner @ 2021-06-15 12:45 UTC
  To: James Smart; +Cc: linux-scsi, Dick Kennedy

Hi James,

On Mon, Jun 07, 2021 at 08:12:22AM -0700, James Smart wrote:
> Ouch. What you are describing is likely true, but SLI-2 firmware is
> *extremely* old: 2 decades or more. If a change won't work on the first
> shot, it likely isn't worth the effort to try to fix it. Other
> functionality may be hanging on by a thread. That adapter certainly
> runs SLI-3 (even that is 10-15 years old), so the best solution is a
> firmware upgrade that picks up the SLI-3 interface. Is that possible?

I forwarded the info.

> Given that the error message you quoted was an interrupt-enable failure,
> that may be a clue. It may well be that the adapter has SLI-3 firmware
> and is failing when setting the interrupt vector type. The older
> adapters supported MSI and INTx; SLI-2 may have been limited to INTx
> only. There used to be hiccups on some platforms with MSI support (the
> platform said it supported MSI, but it was broken), which is why the
> driver had "set it, test it, revert it" logic. I believe the driver has
> an lpfc_use_msi module parameter that, when set to 0, should use only
> INTx, which may be what the SLI-2 downgrade is effectively doing. Try
> setting that and see if the card loads the SLI-3 image and runs.

I haven't heard back yet if the lpfc_use_msi=0 setting fixes the problem
(waiting for the next maintenance window for the experiment).

Thanks,
Daniel

* Re: [PATCH v2 02/15] lpfc: Fix auto sli_mode and its effect on CONFIG_PORT for SLI3
  2021-06-15 12:45       ` Daniel Wagner
@ 2021-06-18  8:52         ` Daniel Wagner
  2021-12-14 13:19           ` [PATCH] lpfc: Reintroduce old IRQ probe logic Daniel Wagner
  0 siblings, 1 reply; 23+ messages in thread
From: Daniel Wagner @ 2021-06-18  8:52 UTC
  To: James Smart; +Cc: linux-scsi, Dick Kennedy

Hi James,

On Tue, Jun 15, 2021 at 02:45:02PM +0200, Daniel Wagner wrote:
> I haven't heard back yet if the lpfc_use_msi=0 setting fixes the problem
> (waiting for the next maintenance window for the experiment).

lpfc_use_msi=0 did not help in this case. Only a complete revert of the
patch fixes the issue.

Anyway, I was thinking about adding a workaround to our downstream
version. I figured that sli_mode is unused and we could abuse it to
select the SLI version for lpfc_sli_config_port(). Something like:


diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
index 1ad1beb2a8a8..cf8538ca8402 100644
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -5355,7 +5355,10 @@ lpfc_sli_hba_setup(struct lpfc_hba *phba)
 
        /* Enable ISR already does config_port because of config_msi mbx */
        if (phba->hba_flag & HBA_NEEDS_CFG_PORT) {
-               rc = lpfc_sli_config_port(phba, LPFC_SLI_REV3);
+               if (phba->cfg_sli_mode == 2)
+                       rc = lpfc_sli_config_port(phba, LPFC_SLI_REV2);
+               else
+                       rc = lpfc_sli_config_port(phba, LPFC_SLI_REV3);
                if (rc)
                        return -EIO;
                phba->hba_flag &= ~HBA_NEEDS_CFG_PORT;


Thanks,
Daniel

* [PATCH] lpfc: Reintroduce old IRQ probe logic
  2021-06-18  8:52         ` Daniel Wagner
@ 2021-12-14 13:19           ` Daniel Wagner
  0 siblings, 0 replies; 23+ messages in thread
From: Daniel Wagner @ 2021-12-14 13:19 UTC
  To: linux-scsi; +Cc: James Smart, Dick Kennedy, Daniel Wagner

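This brings back the original probing logic by re-adding the dropped
code to lpfc_sli_hba_setup(). It also retries CONFIG_PORT with SLI-2 in
lpfc_sli_enable_intr() if SLI-3 fails, and accepts sli_mode values 0-3
again.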

Fixes: d2f2547efd39 ("scsi: lpfc: Fix auto sli_mode and its effect on CONFIG_PORT for SLI3")
Signed-off-by: Daniel Wagner <dwagner@suse.de>
---
Hi James,

After some back and forth, this version of the patch 'fixes' the problem
with older hardware. I am just posting it for reference.

Daniel

See also:
https://lore.kernel.org/linux-scsi/20210618085257.ouah6xsjv3akkjhz@beryllium.lan/

 drivers/scsi/lpfc/lpfc_attr.c |  2 +-
 drivers/scsi/lpfc/lpfc_init.c |  8 ++++++--
 drivers/scsi/lpfc/lpfc_sli.c  | 34 +++++++++++++++++++++++++++++++++-
 3 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_attr.c b/drivers/scsi/lpfc/lpfc_attr.c
index 7a7f17d71811..5730ac6462fa 100644
--- a/drivers/scsi/lpfc/lpfc_attr.c
+++ b/drivers/scsi/lpfc/lpfc_attr.c
@@ -3632,7 +3632,7 @@ unsigned long lpfc_no_hba_reset[MAX_HBAS_NO_RESET] = {
 module_param_array(lpfc_no_hba_reset, ulong, &lpfc_no_hba_reset_cnt, 0444);
 MODULE_PARM_DESC(lpfc_no_hba_reset, "WWPN of HBAs that should not be reset");
 
-LPFC_ATTR(sli_mode, 3, 3, 3,
+LPFC_ATTR(sli_mode, 3, 0, 3,
 	"SLI mode selector: 3 - select SLI-3");
 
 LPFC_ATTR_R(enable_npiv, 1, 0, 1,
diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index 2fe7d9d885d9..3f3734127a7f 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -12112,8 +12112,12 @@ lpfc_sli_enable_intr(struct lpfc_hba *phba, uint32_t cfg_mode)
 
 	/* Need to issue conf_port mbox cmd before conf_msi mbox cmd */
 	retval = lpfc_sli_config_port(phba, LPFC_SLI_REV3);
-	if (retval)
-		return intr_mode;
+	if (retval) {
+		/* Try SLI-2 before erroring out */
+		retval = lpfc_sli_config_port(phba, LPFC_SLI_REV2);
+		if (retval)
+			return intr_mode;
+	}
 	phba->hba_flag &= ~HBA_NEEDS_CFG_PORT;
 
 	if (cfg_mode == 2) {
diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
index 513a78d08b1d..5f32b5243302 100644
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -5602,7 +5602,39 @@ lpfc_sli_hba_setup(struct lpfc_hba *phba)
 
 	/* Enable ISR already does config_port because of config_msi mbx */
 	if (phba->hba_flag & HBA_NEEDS_CFG_PORT) {
-		rc = lpfc_sli_config_port(phba, LPFC_SLI_REV3);
+		int mode = 3;
+
+		switch (phba->cfg_sli_mode) {
+		case 2:
+			if (phba->cfg_enable_npiv) {
+				lpfc_printf_log(phba, KERN_ERR, LOG_TRACE_EVENT,
+						"1824 NPIV enabled: Override sli_mode "
+						"parameter (%d) to auto (0).\n",
+						phba->cfg_sli_mode);
+				break;
+			}
+			mode = 2;
+			break;
+		case 0:
+		case 3:
+			break;
+		default:
+			lpfc_printf_log(phba, KERN_ERR, LOG_TRACE_EVENT,
+					"1819 Unrecognized sli_mode parameter: %d.\n",
+					phba->cfg_sli_mode);
+			break;
+		}
+
+		rc = lpfc_sli_config_port(phba, mode);
+
+		if (rc && phba->cfg_sli_mode == 3)
+			lpfc_printf_log(phba, KERN_ERR, LOG_TRACE_EVENT,
+					"1820 Unable to select SLI-3.  "
+					"Not supported by adapter.\n");
+		if (rc && mode != 2)
+			rc = lpfc_sli_config_port(phba, 2);
+		else if (rc && mode == 2)
+			rc = lpfc_sli_config_port(phba, 3);
 		if (rc)
 			return -EIO;
 		phba->hba_flag &= ~HBA_NEEDS_CFG_PORT;
-- 
2.29.2


end of thread, other threads:[~2021-12-14 13:19 UTC | newest]

Thread overview: 23+ messages
2021-01-04 18:02 [PATCH v2 00/15] lpfc: Update lpfc to revision 12.8.0.7 James Smart
2021-01-04 18:02 ` [PATCH v2 01/15] lpfc: Fix PLOGI S_ID of 0 on pt2pt config James Smart
2021-01-04 18:02 ` [PATCH v2 02/15] lpfc: Fix auto sli_mode and its effect on CONFIG_PORT for SLI3 James Smart
2021-06-07 11:06   ` Daniel Wagner
2021-06-07 15:12     ` James Smart
2021-06-15 12:45       ` Daniel Wagner
2021-06-18  8:52         ` Daniel Wagner
2021-12-14 13:19           ` [PATCH] lpfc: Reintroduce old IRQ probe logic Daniel Wagner
2021-01-04 18:02 ` [PATCH v2 03/15] lpfc: Refresh ndlp when a new PRLI is received in the PRLI issue state James Smart
2021-01-04 18:02 ` [PATCH v2 04/15] lpfc: Fix crash when a fabric node is released prematurely James Smart
2021-01-04 18:02 ` [PATCH v2 05/15] lpfc: Use the nvme-fc transport supplied timeout for LS requests James Smart
2021-01-04 18:02 ` [PATCH v2 06/15] lpfc: Fix FW reset action if IOs are outstanding James Smart
2021-01-04 18:02 ` [PATCH v2 07/15] lpfc: Prevent duplicate requests to unregister with cpuhp framework James Smart
2021-01-04 18:02 ` [PATCH v2 08/15] lpfc: Fix error log messages being logged following scsi task mgnt James Smart
2021-01-04 18:02 ` [PATCH v2 09/15] lpfc: Fix target reset failing James Smart
2021-01-04 18:02 ` [PATCH v2 10/15] lpfc: Fix NVME recovery after mailbox timeout James Smart
2021-01-04 18:02 ` [PATCH v2 11/15] lpfc: Fix vport create logging James Smart
2021-01-04 18:02 ` [PATCH v2 12/15] lpfc: Fix crash when nvmet transport calls host_release James Smart
2021-01-04 18:02 ` [PATCH v2 13/15] lpfc: Implement health checking when aborting io James Smart
2021-01-04 18:02 ` [PATCH v2 14/15] lpfc: Enhancements to LOG_TRACE_EVENT for better readability James Smart
2021-01-04 18:02 ` [PATCH v2 15/15] lpfc: Update lpfc version to 12.8.0.7 James Smart
2021-01-08  4:02 ` [PATCH v2 00/15] lpfc: Update lpfc to revision 12.8.0.7 Martin K. Petersen
2021-01-13  5:48 ` Martin K. Petersen
