* [PATCH 00/11] lpfc: Update lpfc to revision 12.6.0.1
@ 2019-11-05  0:56 James Smart
  2019-11-05  0:56 ` [PATCH 01/11] lpfc: Fix duplicate unreg_rpi error in port offline flow James Smart
                   ` (11 more replies)
  0 siblings, 12 replies; 13+ messages in thread
From: James Smart @ 2019-11-05  0:56 UTC (permalink / raw)
  To: linux-scsi; +Cc: James Smart

Update lpfc to revision 12.6.0.1

This series contains a set of fixes and 3 patches of new functionality.
One of the fixes corrects a typo in one of the patches in the last
set (dynamic fw log).

The patches were cut against Martin's 5.5/scsi-queue tree.

James Smart (11):
  lpfc: Fix duplicate unreg_rpi error in port offline flow
  lpfc: Fix configuration of BB credit recovery in service parameters
  lpfc: Fix kernel crash at lpfc_nvme_info_show during remote port
    bounce
  lpfc: Fix unexpected error messages during RSCN handling
  lpfc: Fix dynamic fw log enablement check
  lpfc: Sync with FC-NVMe-2 SLER change to require Conf with SLER
  lpfc: Clarify FAWWN error message
  lpfc: Add registration for CPU Offline/Online events
  lpfc: Change default IRQ model on AMD architectures
  lpfc: Add enablement of multiple adapter dumps
  lpfc: Update lpfc version to 12.6.0.1

 drivers/scsi/lpfc/lpfc.h           |  28 ++
 drivers/scsi/lpfc/lpfc_attr.c      | 174 +++++++---
 drivers/scsi/lpfc/lpfc_crtn.h      |   6 +
 drivers/scsi/lpfc/lpfc_els.c       |  21 +-
 drivers/scsi/lpfc/lpfc_hbadisc.c   |  17 +-
 drivers/scsi/lpfc/lpfc_hw4.h       |  10 +
 drivers/scsi/lpfc/lpfc_init.c      | 630 ++++++++++++++++++++++++++++++-------
 drivers/scsi/lpfc/lpfc_nportdisc.c |   4 +-
 drivers/scsi/lpfc/lpfc_sli.c       | 198 +++++++++++-
 drivers/scsi/lpfc/lpfc_sli4.h      |  40 ++-
 drivers/scsi/lpfc/lpfc_version.h   |   2 +-
 11 files changed, 953 insertions(+), 177 deletions(-)

-- 
2.13.7



* [PATCH 01/11] lpfc: Fix duplicate unreg_rpi error in port offline flow
  2019-11-05  0:56 [PATCH 00/11] lpfc: Update lpfc to revision 12.6.0.1 James Smart
@ 2019-11-05  0:56 ` James Smart
  2019-11-05  0:56 ` [PATCH 02/11] lpfc: Fix configuration of BB credit recovery in service parameters James Smart
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: James Smart @ 2019-11-05  0:56 UTC (permalink / raw)
  To: linux-scsi; +Cc: James Smart, Dick Kennedy

If the driver receives a login that is later LOGO'd by the remote
port (aka ndlp), the driver, upon completion of the LOGO ACC
transmission, will log out the node and unregister the rpi that is
being used for the node.  As part of the unreg, the node's rpi
value is replaced by the LPFC_RPI_ALLOC_ERROR value.  If the port is
subsequently offlined, the offline walks the nodes and ensures they
are logged out, which possibly entails unreg'ing their rpi values.
This path does not validate the node's rpi value, thus doesn't detect that
it has been unreg'd already.  The replaced rpi value is then used when
accessing the rpi bitmask array which tracks active rpi values.
As the LPFC_RPI_ALLOC_ERROR value is not a valid index for the bitmask,
it may fault the system.
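
As a minimal sketch of the failure mode (illustrative only, not driver
code; it assumes LPFC_RPI_ALLOC_ERROR is a sentinel well outside the
valid rpi range):

        /* node's rpi was already unreg'd and replaced by the sentinel */
        u16 rpi = LPFC_RPI_ALLOC_ERROR;

        /* Using the sentinel as a bit number indexes well past the end
         * of the allocated rpi_bmask, so the read/clear may fault.
         */
        test_and_clear_bit(rpi, phba->sli4_hba.rpi_bmask);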

Revise the rpi release code to detect when the rpi value is the replaced
LPFC_RPI_ALLOC_ERROR value and skip further release steps.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
---
 drivers/scsi/lpfc/lpfc_sli.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
index 294f041961a8..660f96218b25 100644
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -18247,6 +18247,13 @@ lpfc_sli4_alloc_rpi(struct lpfc_hba *phba)
 static void
 __lpfc_sli4_free_rpi(struct lpfc_hba *phba, int rpi)
 {
+	/*
+	 * if the rpi value indicates a prior unreg has already
+	 * been done, skip the unreg.
+	 */
+	if (rpi == LPFC_RPI_ALLOC_ERROR)
+		return;
+
 	if (test_and_clear_bit(rpi, phba->sli4_hba.rpi_bmask)) {
 		phba->sli4_hba.rpi_count--;
 		phba->sli4_hba.max_cfg_param.rpi_used--;
-- 
2.13.7



* [PATCH 02/11] lpfc: Fix configuration of BB credit recovery in service parameters
  2019-11-05  0:56 [PATCH 00/11] lpfc: Update lpfc to revision 12.6.0.1 James Smart
  2019-11-05  0:56 ` [PATCH 01/11] lpfc: Fix duplicate unreg_rpi error in port offline flow James Smart
@ 2019-11-05  0:56 ` James Smart
  2019-11-05  0:57 ` [PATCH 03/11] lpfc: Fix kernel crash at lpfc_nvme_info_show during remote port bounce James Smart
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: James Smart @ 2019-11-05  0:56 UTC (permalink / raw)
  To: linux-scsi; +Cc: James Smart, Dick Kennedy

The driver today reads service parameters from the firmware and
then overwrites the firmware-provided values with values of its own.
Some switch features require preliminary, switch-specific FLOGIs
that are done prior to the actual fabric FLOGI for traffic.
The fw will perform those FLOGIs and will revise the service parameters
for the features configured. As the driver later overwrites those values
with its own, it misconfigures things like BBSCN use.

Correct by eliminating the driver-overwrite of firmware values. The driver
correctly re-reads the service parameters after each link up to obtain the
latest values from firmware.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
---
 drivers/scsi/lpfc/lpfc_hbadisc.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_hbadisc.c b/drivers/scsi/lpfc/lpfc_hbadisc.c
index 40075b391546..88507aa4e920 100644
--- a/drivers/scsi/lpfc/lpfc_hbadisc.c
+++ b/drivers/scsi/lpfc/lpfc_hbadisc.c
@@ -1138,7 +1138,6 @@ void
 lpfc_mbx_cmpl_local_config_link(struct lpfc_hba *phba, LPFC_MBOXQ_t *pmb)
 {
 	struct lpfc_vport *vport = pmb->vport;
-	uint8_t bbscn = 0;
 
 	if (pmb->u.mb.mbxStatus)
 		goto out;
@@ -1165,17 +1164,11 @@ lpfc_mbx_cmpl_local_config_link(struct lpfc_hba *phba, LPFC_MBOXQ_t *pmb)
 	/* Start discovery by sending a FLOGI. port_state is identically
 	 * LPFC_FLOGI while waiting for FLOGI cmpl
 	 */
-	if (vport->port_state != LPFC_FLOGI) {
-		if (phba->bbcredit_support && phba->cfg_enable_bbcr) {
-			bbscn = bf_get(lpfc_bbscn_def,
-				       &phba->sli4_hba.bbscn_params);
-			vport->fc_sparam.cmn.bbRcvSizeMsb &= 0xf;
-			vport->fc_sparam.cmn.bbRcvSizeMsb |= (bbscn << 4);
-		}
+	if (vport->port_state != LPFC_FLOGI)
 		lpfc_initial_flogi(vport);
-	} else if (vport->fc_flag & FC_PT2PT) {
+	else if (vport->fc_flag & FC_PT2PT)
 		lpfc_disc_start(vport);
-	}
+
 	return;
 
 out:
-- 
2.13.7



* [PATCH 03/11] lpfc: Fix kernel crash at lpfc_nvme_info_show during remote port bounce
  2019-11-05  0:56 [PATCH 00/11] lpfc: Update lpfc to revision 12.6.0.1 James Smart
  2019-11-05  0:56 ` [PATCH 01/11] lpfc: Fix duplicate unreg_rpi error in port offline flow James Smart
  2019-11-05  0:56 ` [PATCH 02/11] lpfc: Fix configuration of BB credit recovery in service parameters James Smart
@ 2019-11-05  0:57 ` James Smart
  2019-11-05  0:57 ` [PATCH 04/11] lpfc: Fix unexpected error messages during RSCN handling James Smart
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: James Smart @ 2019-11-05  0:57 UTC (permalink / raw)
  To: linux-scsi; +Cc: James Smart, Dick Kennedy

When reading the sysfs nvme_info file while a remote port leaves and comes
back, a NULL pointer is encountered. The issue is due to ndlp list
corruption, as nvme_info_show does not use the same lock as the
rest of the code.

Correct by removing the rcu_xxx_lock calls and replacing them with the
host_lock and phba->hbalock spinlocks that are used by the rest of the driver.
Given we're called from sysfs, we are safe to use _irq rather than
_irqsave.
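
As an aside on that choice, a generic kernel locking sketch (not code
from this patch):

        /* Process context (e.g. a sysfs ->show handler) runs with
         * interrupts enabled, so the non-saving variants are safe:
         */
        spin_lock_irq(shost->host_lock);
        /* ... walk vport->fc_nodes ... */
        spin_unlock_irq(shost->host_lock);

        /* Paths that may already run with irqs disabled must save
         * and restore the irq state instead:
         */
        unsigned long flags;

        spin_lock_irqsave(&vport->phba->hbalock, flags);
        /* ... */
        spin_unlock_irqrestore(&vport->phba->hbalock, flags);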

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
---
 drivers/scsi/lpfc/lpfc_attr.c | 40 ++++++++++++++++++++--------------------
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_attr.c b/drivers/scsi/lpfc/lpfc_attr.c
index 2d090ea2b736..3f30bc02da9e 100644
--- a/drivers/scsi/lpfc/lpfc_attr.c
+++ b/drivers/scsi/lpfc/lpfc_attr.c
@@ -176,7 +176,6 @@ lpfc_nvme_info_show(struct device *dev, struct device_attribute *attr,
 	int i;
 	int len = 0;
 	char tmp[LPFC_MAX_NVME_INFO_TMP_LEN] = {0};
-	unsigned long iflags = 0;
 
 	if (!(vport->cfg_enable_fc4_type & LPFC_ENABLE_NVME)) {
 		len = scnprintf(buf, PAGE_SIZE, "NVME Disabled\n");
@@ -347,7 +346,6 @@ lpfc_nvme_info_show(struct device *dev, struct device_attribute *attr,
 	if (strlcat(buf, "\nNVME Initiator Enabled\n", PAGE_SIZE) >= PAGE_SIZE)
 		goto buffer_done;
 
-	rcu_read_lock();
 	scnprintf(tmp, sizeof(tmp),
 		  "XRI Dist lpfc%d Total %d IO %d ELS %d\n",
 		  phba->brd_no,
@@ -355,7 +353,7 @@ lpfc_nvme_info_show(struct device *dev, struct device_attribute *attr,
 		  phba->sli4_hba.io_xri_max,
 		  lpfc_sli4_get_els_iocb_cnt(phba));
 	if (strlcat(buf, tmp, PAGE_SIZE) >= PAGE_SIZE)
-		goto rcu_unlock_buf_done;
+		goto buffer_done;
 
 	/* Port state is only one of two values for now. */
 	if (localport->port_id)
@@ -371,15 +369,17 @@ lpfc_nvme_info_show(struct device *dev, struct device_attribute *attr,
 		  wwn_to_u64(vport->fc_nodename.u.wwn),
 		  localport->port_id, statep);
 	if (strlcat(buf, tmp, PAGE_SIZE) >= PAGE_SIZE)
-		goto rcu_unlock_buf_done;
+		goto buffer_done;
+
+	spin_lock_irq(shost->host_lock);
 
 	list_for_each_entry(ndlp, &vport->fc_nodes, nlp_listp) {
 		nrport = NULL;
-		spin_lock_irqsave(&vport->phba->hbalock, iflags);
+		spin_lock(&vport->phba->hbalock);
 		rport = lpfc_ndlp_get_nrport(ndlp);
 		if (rport)
 			nrport = rport->remoteport;
-		spin_unlock_irqrestore(&vport->phba->hbalock, iflags);
+		spin_unlock(&vport->phba->hbalock);
 		if (!nrport)
 			continue;
 
@@ -398,39 +398,39 @@ lpfc_nvme_info_show(struct device *dev, struct device_attribute *attr,
 
 		/* Tab in to show lport ownership. */
 		if (strlcat(buf, "NVME RPORT       ", PAGE_SIZE) >= PAGE_SIZE)
-			goto rcu_unlock_buf_done;
+			goto unlock_buf_done;
 		if (phba->brd_no >= 10) {
 			if (strlcat(buf, " ", PAGE_SIZE) >= PAGE_SIZE)
-				goto rcu_unlock_buf_done;
+				goto unlock_buf_done;
 		}
 
 		scnprintf(tmp, sizeof(tmp), "WWPN x%llx ",
 			  nrport->port_name);
 		if (strlcat(buf, tmp, PAGE_SIZE) >= PAGE_SIZE)
-			goto rcu_unlock_buf_done;
+			goto unlock_buf_done;
 
 		scnprintf(tmp, sizeof(tmp), "WWNN x%llx ",
 			  nrport->node_name);
 		if (strlcat(buf, tmp, PAGE_SIZE) >= PAGE_SIZE)
-			goto rcu_unlock_buf_done;
+			goto unlock_buf_done;
 
 		scnprintf(tmp, sizeof(tmp), "DID x%06x ",
 			  nrport->port_id);
 		if (strlcat(buf, tmp, PAGE_SIZE) >= PAGE_SIZE)
-			goto rcu_unlock_buf_done;
+			goto unlock_buf_done;
 
 		/* An NVME rport can have multiple roles. */
 		if (nrport->port_role & FC_PORT_ROLE_NVME_INITIATOR) {
 			if (strlcat(buf, "INITIATOR ", PAGE_SIZE) >= PAGE_SIZE)
-				goto rcu_unlock_buf_done;
+				goto unlock_buf_done;
 		}
 		if (nrport->port_role & FC_PORT_ROLE_NVME_TARGET) {
 			if (strlcat(buf, "TARGET ", PAGE_SIZE) >= PAGE_SIZE)
-				goto rcu_unlock_buf_done;
+				goto unlock_buf_done;
 		}
 		if (nrport->port_role & FC_PORT_ROLE_NVME_DISCOVERY) {
 			if (strlcat(buf, "DISCSRVC ", PAGE_SIZE) >= PAGE_SIZE)
-				goto rcu_unlock_buf_done;
+				goto unlock_buf_done;
 		}
 		if (nrport->port_role & ~(FC_PORT_ROLE_NVME_INITIATOR |
 					  FC_PORT_ROLE_NVME_TARGET |
@@ -438,14 +438,14 @@ lpfc_nvme_info_show(struct device *dev, struct device_attribute *attr,
 			scnprintf(tmp, sizeof(tmp), "UNKNOWN ROLE x%x",
 				  nrport->port_role);
 			if (strlcat(buf, tmp, PAGE_SIZE) >= PAGE_SIZE)
-				goto rcu_unlock_buf_done;
+				goto unlock_buf_done;
 		}
 
 		scnprintf(tmp, sizeof(tmp), "%s\n", statep);
 		if (strlcat(buf, tmp, PAGE_SIZE) >= PAGE_SIZE)
-			goto rcu_unlock_buf_done;
+			goto unlock_buf_done;
 	}
-	rcu_read_unlock();
+	spin_unlock_irq(shost->host_lock);
 
 	if (!lport)
 		goto buffer_done;
@@ -505,11 +505,11 @@ lpfc_nvme_info_show(struct device *dev, struct device_attribute *attr,
 		  atomic_read(&lport->cmpl_fcp_err));
 	strlcat(buf, tmp, PAGE_SIZE);
 
-	/* RCU is already unlocked. */
+	/* host_lock is already unlocked. */
 	goto buffer_done;
 
- rcu_unlock_buf_done:
-	rcu_read_unlock();
+ unlock_buf_done:
+	spin_unlock_irq(shost->host_lock);
 
  buffer_done:
 	len = strnlen(buf, PAGE_SIZE);
-- 
2.13.7



* [PATCH 04/11] lpfc: Fix unexpected error messages during RSCN handling
  2019-11-05  0:56 [PATCH 00/11] lpfc: Update lpfc to revision 12.6.0.1 James Smart
                   ` (2 preceding siblings ...)
  2019-11-05  0:57 ` [PATCH 03/11] lpfc: Fix kernel crash at lpfc_nvme_info_show during remote port bounce James Smart
@ 2019-11-05  0:57 ` James Smart
  2019-11-05  0:57 ` [PATCH 05/11] lpfc: Fix dynamic fw log enablement check James Smart
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: James Smart @ 2019-11-05  0:57 UTC (permalink / raw)
  To: linux-scsi; +Cc: James Smart, Dick Kennedy

During heavy RSCN activity with log_verbose = 0, we see these messages:
2754 PRLI failure DID:521245 Status:x9/xb2c00, data: x0
0231 RSCN timeout Data: x0 x3
0230 Unexpected timeout, hba link state x5

This is due to delayed RSCN activity.

Correct by restarting the discovery timeout whenever an RSCN is
received, thereby avoiding the timeout and thus the messages.

Filter PRLI responses such that the message severity depends on whether
the error is expected for the configuration. For example, in point-to-point
mode with both FC4 types enabled, no GFT_ID is sent to learn the remote
port's FC4 support, so a PRLI failure can be expected and is logged as
informational; on a fabric, failures are not expected and are raised to
an error level.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
---
 drivers/scsi/lpfc/lpfc_els.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_els.c b/drivers/scsi/lpfc/lpfc_els.c
index 9a1b7f331718..9a570c15b2a1 100644
--- a/drivers/scsi/lpfc/lpfc_els.c
+++ b/drivers/scsi/lpfc/lpfc_els.c
@@ -2236,6 +2236,7 @@ lpfc_cmpl_els_prli(struct lpfc_hba *phba, struct lpfc_iocbq *cmdiocb,
 	struct Scsi_Host  *shost = lpfc_shost_from_vport(vport);
 	IOCB_t *irsp;
 	struct lpfc_nodelist *ndlp;
+	char *mode;
 
 	/* we pass cmdiocb to state machine which needs rspiocb as well */
 	cmdiocb->context_un.rsp_iocb = rspiocb;
@@ -2273,8 +2274,17 @@ lpfc_cmpl_els_prli(struct lpfc_hba *phba, struct lpfc_iocbq *cmdiocb,
 			goto out;
 		}
 
+		/* If we don't send GFT_ID to Fabric, a PRLI error
+		 * could be expected.
+		 */
+		if ((vport->fc_flag & FC_FABRIC) ||
+		    (vport->cfg_enable_fc4_type != LPFC_ENABLE_BOTH))
+			mode = KERN_ERR;
+		else
+			mode = KERN_INFO;
+
 		/* PRLI failed */
-		lpfc_printf_vlog(vport, KERN_ERR, LOG_ELS,
+		lpfc_printf_vlog(vport, mode, LOG_ELS,
 				 "2754 PRLI failure DID:%06X Status:x%x/x%x, "
 				 "data: x%x\n",
 				 ndlp->nlp_DID, irsp->ulpStatus,
@@ -6465,7 +6475,7 @@ lpfc_els_rcv_rscn(struct lpfc_vport *vport, struct lpfc_iocbq *cmdiocb,
 	uint32_t payload_len, length, nportid, *cmd;
 	int rscn_cnt;
 	int rscn_id = 0, hba_id = 0;
-	int i;
+	int i, tmo;
 
 	pcmd = (struct lpfc_dmabuf *) cmdiocb->context2;
 	lp = (uint32_t *) pcmd->virt;
@@ -6571,6 +6581,13 @@ lpfc_els_rcv_rscn(struct lpfc_vport *vport, struct lpfc_iocbq *cmdiocb,
 
 		spin_lock_irq(shost->host_lock);
 		vport->fc_flag |= FC_RSCN_DEFERRED;
+
+		/* Restart disctmo if its already running */
+		if (vport->fc_flag & FC_DISC_TMO) {
+			tmo = ((phba->fc_ratov * 3) + 3);
+			mod_timer(&vport->fc_disctmo,
+				  jiffies + msecs_to_jiffies(1000 * tmo));
+		}
 		if ((rscn_cnt < FC_MAX_HOLD_RSCN) &&
 		    !(vport->fc_flag & FC_RSCN_DISCOVERY)) {
 			vport->fc_flag |= FC_RSCN_MODE;
-- 
2.13.7



* [PATCH 05/11] lpfc: Fix dynamic fw log enablement check
  2019-11-05  0:56 [PATCH 00/11] lpfc: Update lpfc to revision 12.6.0.1 James Smart
                   ` (3 preceding siblings ...)
  2019-11-05  0:57 ` [PATCH 04/11] lpfc: Fix unexpected error messages during RSCN handling James Smart
@ 2019-11-05  0:57 ` James Smart
  2019-11-05  0:57 ` [PATCH 06/11] lpfc: Sync with FC-NVMe-2 SLER change to require Conf with SLER James Smart
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: James Smart @ 2019-11-05  0:57 UTC (permalink / raw)
  To: linux-scsi; +Cc: James Smart, Dick Kennedy

The recently posted patch had a typo that incorrectly tested the receiving
function.

Fix the typo (change == to !=).

Fixes: 95bfc6d8ad86 ("scsi: lpfc: Make FW logging dynamically configurable")
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>

---
The prior patch is in the 12.6.0.0 patch set and is only in the 5.5/scsi-queue git branch
---
 drivers/scsi/lpfc/lpfc_attr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/lpfc/lpfc_attr.c b/drivers/scsi/lpfc/lpfc_attr.c
index 3f30bc02da9e..e7f6581935bf 100644
--- a/drivers/scsi/lpfc/lpfc_attr.c
+++ b/drivers/scsi/lpfc/lpfc_attr.c
@@ -5981,7 +5981,7 @@ lpfc_ras_fwlog_buffsize_set(struct lpfc_hba  *phba, uint val)
 	if (phba->cfg_ras_fwlog_buffsize == val)
 		return 0;
 
-	if (phba->cfg_ras_fwlog_func == PCI_FUNC(phba->pcidev->devfn))
+	if (phba->cfg_ras_fwlog_func != PCI_FUNC(phba->pcidev->devfn))
 		return -EINVAL;
 
 	spin_lock_irq(&phba->hbalock);
-- 
2.13.7



* [PATCH 06/11] lpfc: Sync with FC-NVMe-2 SLER change to require Conf with SLER
  2019-11-05  0:56 [PATCH 00/11] lpfc: Update lpfc to revision 12.6.0.1 James Smart
                   ` (4 preceding siblings ...)
  2019-11-05  0:57 ` [PATCH 05/11] lpfc: Fix dynamic fw log enablement check James Smart
@ 2019-11-05  0:57 ` James Smart
  2019-11-05  0:57 ` [PATCH 07/11] lpfc: Clarify FAWWN error message James Smart
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: James Smart @ 2019-11-05  0:57 UTC (permalink / raw)
  To: linux-scsi; +Cc: James Smart, Dick Kennedy

Prior to the last FC-NVME-2 draft, SLER and CONF were independent.
SLER now requires CONF to be set.

Revise the NVME PRLI checking to look for both in order to enable SLER.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
---
 drivers/scsi/lpfc/lpfc_nportdisc.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/lpfc/lpfc_nportdisc.c b/drivers/scsi/lpfc/lpfc_nportdisc.c
index 64b7aeeea337..3bbe77c36a05 100644
--- a/drivers/scsi/lpfc/lpfc_nportdisc.c
+++ b/drivers/scsi/lpfc/lpfc_nportdisc.c
@@ -2121,7 +2121,9 @@ lpfc_cmpl_prli_prli_issue(struct lpfc_vport *vport, struct lpfc_nodelist *ndlp,
 		if (bf_get_be32(prli_init, nvpr))
 			ndlp->nlp_type |= NLP_NVME_INITIATOR;
 
-		if (phba->nsler && bf_get_be32(prli_nsler, nvpr))
+		if (phba->nsler && bf_get_be32(prli_nsler, nvpr) &&
+		    bf_get_be32(prli_conf, nvpr))
+
 			ndlp->nlp_nvme_info |= NLP_NVME_NSLER;
 		else
 			ndlp->nlp_nvme_info &= ~NLP_NVME_NSLER;
-- 
2.13.7



* [PATCH 07/11] lpfc: Clarify FAWWN error message
  2019-11-05  0:56 [PATCH 00/11] lpfc: Update lpfc to revision 12.6.0.1 James Smart
                   ` (5 preceding siblings ...)
  2019-11-05  0:57 ` [PATCH 06/11] lpfc: Sync with FC-NVMe-2 SLER change to require Conf with SLER James Smart
@ 2019-11-05  0:57 ` James Smart
  2019-11-05  0:57 ` [PATCH 08/11] lpfc: Add registration for CPU Offline/Online events James Smart
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: James Smart @ 2019-11-05  0:57 UTC (permalink / raw)
  To: linux-scsi; +Cc: James Smart, Dick Kennedy

Current message on FAWWN events is rather cryptic.

Expand the message to clarify its meaning.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
---
 drivers/scsi/lpfc/lpfc_hbadisc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_hbadisc.c b/drivers/scsi/lpfc/lpfc_hbadisc.c
index 88507aa4e920..85ada3deb47d 100644
--- a/drivers/scsi/lpfc/lpfc_hbadisc.c
+++ b/drivers/scsi/lpfc/lpfc_hbadisc.c
@@ -3452,8 +3452,8 @@ lpfc_mbx_cmpl_read_topology(struct lpfc_hba *phba, LPFC_MBOXQ_t *pmb)
 				phba->pport->port_state, vport->fc_flag);
 		else if (attn_type == LPFC_ATT_UNEXP_WWPN)
 			lpfc_printf_log(phba, KERN_ERR, LOG_LINK_EVENT,
-				"1313 Link Down UNEXP WWPN Event x%x received "
-				"Data: x%x x%x x%x x%x x%x\n",
+				"1313 Link Down Unexpected FA WWPN Event x%x "
+				"received Data: x%x x%x x%x x%x x%x\n",
 				la->eventTag, phba->fc_eventTag,
 				phba->pport->port_state, vport->fc_flag,
 				bf_get(lpfc_mbx_read_top_mm, la),
-- 
2.13.7



* [PATCH 08/11] lpfc: Add registration for CPU Offline/Online events
  2019-11-05  0:56 [PATCH 00/11] lpfc: Update lpfc to revision 12.6.0.1 James Smart
                   ` (6 preceding siblings ...)
  2019-11-05  0:57 ` [PATCH 07/11] lpfc: Clarify FAWWN error message James Smart
@ 2019-11-05  0:57 ` James Smart
  2019-11-05  0:57 ` [PATCH 09/11] lpfc: Change default IRQ model on AMD architectures James Smart
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: James Smart @ 2019-11-05  0:57 UTC (permalink / raw)
  To: linux-scsi; +Cc: James Smart, Dick Kennedy

The recent affinitization didn't address cpu offlining/onlining.
If an interrupt vector is shared and the low-order cpu owning
the vector is offlined then, as interrupts are managed, the vector
is taken offline. This causes the other cpus sharing the vector
to hang as they can't get io completions.

Correct by registering callbacks with the system for Offline/Online
events. When a cpu is taken offline, its eq, which is tied to an
interrupt vector, is found. If the cpu is the "owner" of the vector
and if the eq/vector is shared by other cpus, the eq is placed
into a polled mode.  Additionally, code paths that perform io
submission on the "sharing cpus" will check the eq state and poll
for completion after submission of new io to a wq that uses the eq.

Similarly, when a cpu comes back online and owns an offlined vector,
the eq is taken out of polled mode and rearmed to start driving
interrupts for the eq.
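
For readers unfamiliar with the multi-instance hotplug API, the
registration pattern used below distills to roughly the following
(error handling and driver specifics omitted; see the full diff for
the real code):

        #include <linux/cpuhotplug.h>

        static enum cpuhp_state lpfc_cpuhp_state;

        /* Callbacks get the per-adapter hlist_node registered below;
         * hlist_entry_safe() recovers the owning hba from it.
         */
        static int lpfc_cpu_online(unsigned int cpu, struct hlist_node *node);
        static int lpfc_cpu_offline(unsigned int cpu, struct hlist_node *node);

        /* module init: allocate one dynamic state shared by all HBAs */
        error = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN,
                                        "lpfc/sli4:online",
                                        lpfc_cpu_online, lpfc_cpu_offline);
        if (error >= 0)
                lpfc_cpuhp_state = error;

        /* per-adapter probe/online: hook this hba into the state */
        cpuhp_state_add_instance_nocalls(lpfc_cpuhp_state, &phba->cpuhp);

        /* per-adapter offline/unset: unhook it again */
        cpuhp_state_remove_instance_nocalls(lpfc_cpuhp_state, &phba->cpuhp);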

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
---
 drivers/scsi/lpfc/lpfc.h      |   7 ++
 drivers/scsi/lpfc/lpfc_crtn.h |   6 ++
 drivers/scsi/lpfc/lpfc_init.c | 202 +++++++++++++++++++++++++++++++++++++++++-
 drivers/scsi/lpfc/lpfc_sli.c  | 164 ++++++++++++++++++++++++++++++++--
 drivers/scsi/lpfc/lpfc_sli4.h |  21 ++++-
 5 files changed, 388 insertions(+), 12 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc.h b/drivers/scsi/lpfc/lpfc.h
index 4559f1700c6d..88d5fd99f5ec 100644
--- a/drivers/scsi/lpfc/lpfc.h
+++ b/drivers/scsi/lpfc/lpfc.h
@@ -1215,6 +1215,13 @@ struct lpfc_hba {
 	uint64_t ktime_seg10_min;
 	uint64_t ktime_seg10_max;
 #endif
+
+	struct hlist_node cpuhp;	/* used for cpuhp per hba callback */
+	struct timer_list cpuhp_poll_timer;
+	struct list_head poll_list;	/* slowpath eq polling list */
+#define LPFC_POLL_HB	1		/* slowpath heartbeat */
+#define LPFC_POLL_FASTPATH	0	/* called from fastpath */
+#define LPFC_POLL_SLOWPATH	1	/* called from slowpath */
 };
 
 static inline struct Scsi_Host *
diff --git a/drivers/scsi/lpfc/lpfc_crtn.h b/drivers/scsi/lpfc/lpfc_crtn.h
index 6e09fd98a922..d91aa5330306 100644
--- a/drivers/scsi/lpfc/lpfc_crtn.h
+++ b/drivers/scsi/lpfc/lpfc_crtn.h
@@ -215,6 +215,12 @@ irqreturn_t lpfc_sli_fp_intr_handler(int, void *);
 irqreturn_t lpfc_sli4_intr_handler(int, void *);
 irqreturn_t lpfc_sli4_hba_intr_handler(int, void *);
 
+inline void lpfc_sli4_cleanup_poll_list(struct lpfc_hba *phba);
+int lpfc_sli4_poll_eq(struct lpfc_queue *q, uint8_t path);
+void lpfc_sli4_poll_hbtimer(struct timer_list *t);
+void lpfc_sli4_start_polling(struct lpfc_queue *q);
+void lpfc_sli4_stop_polling(struct lpfc_queue *q);
+
 void lpfc_read_rev(struct lpfc_hba *, LPFC_MBOXQ_t *);
 void lpfc_sli4_swap_str(struct lpfc_hba *, LPFC_MBOXQ_t *);
 void lpfc_config_ring(struct lpfc_hba *, int, LPFC_MBOXQ_t *);
diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index ec9dbc042a41..888ad32d5267 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -40,6 +40,7 @@
 #include <linux/irq.h>
 #include <linux/bitops.h>
 #include <linux/crash_dump.h>
+#include <linux/cpuhotplug.h>
 
 #include <scsi/scsi.h>
 #include <scsi/scsi_device.h>
@@ -66,9 +67,13 @@
 #include "lpfc_version.h"
 #include "lpfc_ids.h"
 
+static enum cpuhp_state lpfc_cpuhp_state;
 /* Used when mapping IRQ vectors in a driver centric manner */
 static uint32_t lpfc_present_cpu;
 
+static void __lpfc_cpuhp_remove(struct lpfc_hba *phba);
+static void lpfc_cpuhp_remove(struct lpfc_hba *phba);
+static void lpfc_cpuhp_add(struct lpfc_hba *phba);
 static void lpfc_get_hba_model_desc(struct lpfc_hba *, uint8_t *, uint8_t *);
 static int lpfc_post_rcv_buf(struct lpfc_hba *);
 static int lpfc_sli4_queue_verify(struct lpfc_hba *);
@@ -3379,6 +3384,8 @@ lpfc_online(struct lpfc_hba *phba)
 	if (phba->cfg_xri_rebalancing)
 		lpfc_create_multixri_pools(phba);
 
+	lpfc_cpuhp_add(phba);
+
 	lpfc_unblock_mgmt_io(phba);
 	return 0;
 }
@@ -3542,6 +3549,7 @@ lpfc_offline(struct lpfc_hba *phba)
 			spin_unlock_irq(shost->host_lock);
 		}
 	lpfc_destroy_vport_work_array(phba, vports);
+	__lpfc_cpuhp_remove(phba);
 
 	if (phba->cfg_xri_rebalancing)
 		lpfc_destroy_multixri_pools(phba);
@@ -9255,6 +9263,8 @@ lpfc_sli4_queue_destroy(struct lpfc_hba *phba)
 	}
 	spin_unlock_irq(&phba->hbalock);
 
+	lpfc_sli4_cleanup_poll_list(phba);
+
 	/* Release HBA eqs */
 	if (phba->sli4_hba.hdwq)
 		lpfc_sli4_release_hdwq(phba);
@@ -11058,6 +11068,170 @@ lpfc_cpu_affinity_check(struct lpfc_hba *phba, int vectors)
 }
 
 /**
+ * lpfc_cpuhp_get_eq - gather the eqs that must be polled for an offlining cpu
+ *
+ * @phba:   pointer to lpfc hba data structure.
+ * @cpu:    cpu going offline
+ * @eqlist: list to fill in with the eqs that need to be polled
+ */
+static void
+lpfc_cpuhp_get_eq(struct lpfc_hba *phba, unsigned int cpu,
+		  struct list_head *eqlist)
+{
+	struct lpfc_vector_map_info *map;
+	const struct cpumask *maskp;
+	struct lpfc_queue *eq;
+	unsigned int i;
+	cpumask_t tmp;
+	u16 idx;
+
+	for (idx = 0; idx < phba->cfg_irq_chann; idx++) {
+		maskp = pci_irq_get_affinity(phba->pcidev, idx);
+		if (!maskp)
+			continue;
+		/*
+		 * if irq is not affinitized to the cpu going
+		 * then we don't need to poll the eq attached
+		 * to it.
+		 */
+		if (!cpumask_and(&tmp, maskp, cpumask_of(cpu)))
+			continue;
+		/* get the cpus that are online and are affini-
+		 * tized to this irq vector.  If the count is
+		 * more than 1 then cpuhp is not going to shut-
+		 * down this vector.  Since this cpu has not
+		 * gone offline yet, we need >1.
+		 */
+		cpumask_and(&tmp, maskp, cpu_online_mask);
+		if (cpumask_weight(&tmp) > 1)
+			continue;
+
+		/* Now that we have an irq to shutdown, get the eq
+		 * mapped to this irq.  Note: multiple hdwq's in
+		 * the software can share an eq, but eventually
+		 * only eq will be mapped to this vector
+		 */
+		for_each_possible_cpu(i) {
+			map = &phba->sli4_hba.cpu_map[i];
+			if (!(map->irq == pci_irq_vector(phba->pcidev, idx)))
+				continue;
+			eq = phba->sli4_hba.hdwq[map->hdwq].hba_eq;
+			list_add(&eq->_poll_list, eqlist);
+			/* 1 is good enough. others will be a copy of this */
+			break;
+		}
+	}
+}
+
+static void __lpfc_cpuhp_remove(struct lpfc_hba *phba)
+{
+	if (phba->sli_rev != LPFC_SLI_REV4)
+		return;
+
+	cpuhp_state_remove_instance_nocalls(lpfc_cpuhp_state,
+					    &phba->cpuhp);
+	/*
+	 * unregistering the instance doesn't stop the polling
+	 * timer. Wait for the poll timer to retire.
+	 */
+	synchronize_rcu();
+	del_timer_sync(&phba->cpuhp_poll_timer);
+}
+
+static void lpfc_cpuhp_remove(struct lpfc_hba *phba)
+{
+	if (phba->pport->fc_flag & FC_OFFLINE_MODE)
+		return;
+
+	__lpfc_cpuhp_remove(phba);
+}
+
+static void lpfc_cpuhp_add(struct lpfc_hba *phba)
+{
+	if (phba->sli_rev != LPFC_SLI_REV4)
+		return;
+
+	rcu_read_lock();
+
+	if (!list_empty(&phba->poll_list)) {
+		timer_setup(&phba->cpuhp_poll_timer, lpfc_sli4_poll_hbtimer, 0);
+		mod_timer(&phba->cpuhp_poll_timer,
+			  jiffies + msecs_to_jiffies(LPFC_POLL_HB));
+	}
+
+	rcu_read_unlock();
+
+	cpuhp_state_add_instance_nocalls(lpfc_cpuhp_state,
+					 &phba->cpuhp);
+}
+
+static int __lpfc_cpuhp_checks(struct lpfc_hba *phba, int *retval)
+{
+	if (phba->pport->load_flag & FC_UNLOADING) {
+		*retval = -EAGAIN;
+		return true;
+	}
+
+	if (phba->sli_rev != LPFC_SLI_REV4) {
+		*retval = 0;
+		return true;
+	}
+
+	/* proceed with the hotplug */
+	return false;
+}
+
+static int lpfc_cpu_offline(unsigned int cpu, struct hlist_node *node)
+{
+	struct lpfc_hba *phba = hlist_entry_safe(node, struct lpfc_hba, cpuhp);
+	struct lpfc_queue *eq, *next;
+	LIST_HEAD(eqlist);
+	int retval;
+
+	if (!phba) {
+		WARN_ONCE(!phba, "cpu: %u. phba:NULL", raw_smp_processor_id());
+		return 0;
+	}
+
+	if (__lpfc_cpuhp_checks(phba, &retval))
+		return retval;
+
+	lpfc_cpuhp_get_eq(phba, cpu, &eqlist);
+
+	/* start polling on these eq's */
+	list_for_each_entry_safe(eq, next, &eqlist, _poll_list) {
+		list_del_init(&eq->_poll_list);
+		lpfc_sli4_start_polling(eq);
+	}
+
+	return 0;
+}
+
+static int lpfc_cpu_online(unsigned int cpu, struct hlist_node *node)
+{
+	struct lpfc_hba *phba = hlist_entry_safe(node, struct lpfc_hba, cpuhp);
+	struct lpfc_queue *eq, *next;
+	unsigned int n;
+	int retval;
+
+	if (!phba) {
+		WARN_ONCE(!phba, "cpu: %u. phba:NULL", raw_smp_processor_id());
+		return 0;
+	}
+
+	if (__lpfc_cpuhp_checks(phba, &retval))
+		return retval;
+
+	list_for_each_entry_safe(eq, next, &phba->poll_list, _poll_list) {
+		n = lpfc_find_cpu_handle(phba, eq->hdwq, LPFC_FIND_BY_HDWQ);
+		if (n == cpu)
+			lpfc_sli4_stop_polling(eq);
+	}
+
+	return 0;
+}
+
+/**
  * lpfc_sli4_enable_msix - Enable MSI-X interrupt mode to SLI-4 device
  * @phba: pointer to lpfc hba data structure.
  *
@@ -11460,6 +11634,9 @@ lpfc_sli4_hba_unset(struct lpfc_hba *phba)
 	/* Wait for completion of device XRI exchange busy */
 	lpfc_sli4_xri_exchange_busy_wait(phba);
 
+	/* per-phba callback de-registration for hotplug event */
+	lpfc_cpuhp_remove(phba);
+
 	/* Disable PCI subsystem interrupt */
 	lpfc_sli4_disable_intr(phba);
 
@@ -12752,6 +12929,9 @@ lpfc_pci_probe_one_s4(struct pci_dev *pdev, const struct pci_device_id *pid)
 	/* Enable RAS FW log support */
 	lpfc_sli4_ras_setup(phba);
 
+	INIT_LIST_HEAD(&phba->poll_list);
+	cpuhp_state_add_instance_nocalls(lpfc_cpuhp_state, &phba->cpuhp);
+
 	return 0;
 
 out_free_sysfs_attr:
@@ -13569,11 +13749,24 @@ lpfc_init(void)
 	/* Initialize in case vector mapping is needed */
 	lpfc_present_cpu = num_present_cpus();
 
+	error = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN,
+					"lpfc/sli4:online",
+					lpfc_cpu_online, lpfc_cpu_offline);
+	if (error < 0)
+		goto cpuhp_failure;
+	lpfc_cpuhp_state = error;
+
 	error = pci_register_driver(&lpfc_driver);
-	if (error) {
-		fc_release_transport(lpfc_transport_template);
-		fc_release_transport(lpfc_vport_transport_template);
-	}
+	if (error)
+		goto unwind;
+
+	return error;
+
+unwind:
+	cpuhp_remove_multi_state(lpfc_cpuhp_state);
+cpuhp_failure:
+	fc_release_transport(lpfc_transport_template);
+	fc_release_transport(lpfc_vport_transport_template);
 
 	return error;
 }
@@ -13590,6 +13783,7 @@ lpfc_exit(void)
 {
 	misc_deregister(&lpfc_mgmt_dev);
 	pci_unregister_driver(&lpfc_driver);
+	cpuhp_remove_multi_state(lpfc_cpuhp_state);
 	fc_release_transport(lpfc_transport_template);
 	fc_release_transport(lpfc_vport_transport_template);
 	idr_destroy(&lpfc_hba_index);
diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
index 660f96218b25..5f097328d236 100644
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -515,7 +515,8 @@ lpfc_sli4_eqcq_flush(struct lpfc_hba *phba, struct lpfc_queue *eq)
 }
 
 static int
-lpfc_sli4_process_eq(struct lpfc_hba *phba, struct lpfc_queue *eq)
+lpfc_sli4_process_eq(struct lpfc_hba *phba, struct lpfc_queue *eq,
+		     uint8_t rearm)
 {
 	struct lpfc_eqe *eqe;
 	int count = 0, consumed = 0;
@@ -549,8 +550,8 @@ lpfc_sli4_process_eq(struct lpfc_hba *phba, struct lpfc_queue *eq)
 	eq->queue_claimed = 0;
 
 rearm_and_exit:
-	/* Always clear and re-arm the EQ */
-	phba->sli4_hba.sli4_write_eq_db(phba, eq, consumed, LPFC_QUEUE_REARM);
+	/* Always clear the EQ. */
+	phba->sli4_hba.sli4_write_eq_db(phba, eq, consumed, rearm);
 
 	return count;
 }
@@ -7948,7 +7949,7 @@ lpfc_sli4_process_missed_mbox_completions(struct lpfc_hba *phba)
 
 	if (mbox_pending)
 		/* process and rearm the EQ */
-		lpfc_sli4_process_eq(phba, fpeq);
+		lpfc_sli4_process_eq(phba, fpeq, LPFC_QUEUE_REARM);
 	else
 		/* Always clear and re-arm the EQ */
 		sli4_hba->sli4_write_eq_db(phba, fpeq, 0, LPFC_QUEUE_REARM);
@@ -10113,10 +10114,13 @@ lpfc_sli_issue_iocb(struct lpfc_hba *phba, uint32_t ring_number,
 		    struct lpfc_iocbq *piocb, uint32_t flag)
 {
 	struct lpfc_sli_ring *pring;
+	struct lpfc_queue *eq;
 	unsigned long iflags;
 	int rc;
 
 	if (phba->sli_rev == LPFC_SLI_REV4) {
+		eq = phba->sli4_hba.hdwq[piocb->hba_wqidx].hba_eq;
+
 		pring = lpfc_sli4_calc_ring(phba, piocb);
 		if (unlikely(pring == NULL))
 			return IOCB_ERROR;
@@ -10124,6 +10128,8 @@ lpfc_sli_issue_iocb(struct lpfc_hba *phba, uint32_t ring_number,
 		spin_lock_irqsave(&pring->ring_lock, iflags);
 		rc = __lpfc_sli_issue_iocb(phba, ring_number, piocb, flag);
 		spin_unlock_irqrestore(&pring->ring_lock, iflags);
+
+		lpfc_sli4_poll_eq(eq, LPFC_POLL_FASTPATH);
 	} else {
 		/* For now, SLI2/3 will still use hbalock */
 		spin_lock_irqsave(&phba->hbalock, iflags);
@@ -14302,7 +14308,7 @@ lpfc_sli4_hba_intr_handler(int irq, void *dev_id)
 		lpfc_sli4_mod_hba_eq_delay(phba, fpeq, LPFC_MAX_AUTO_EQ_DELAY);
 
 	/* process and rearm the EQ */
-	ecount = lpfc_sli4_process_eq(phba, fpeq);
+	ecount = lpfc_sli4_process_eq(phba, fpeq, LPFC_QUEUE_REARM);
 
 	if (unlikely(ecount == 0)) {
 		fpeq->EQ_no_entry++;
@@ -14362,6 +14368,147 @@ lpfc_sli4_intr_handler(int irq, void *dev_id)
 	return (hba_handled == true) ? IRQ_HANDLED : IRQ_NONE;
 } /* lpfc_sli4_intr_handler */
 
+void lpfc_sli4_poll_hbtimer(struct timer_list *t)
+{
+	struct lpfc_hba *phba = from_timer(phba, t, cpuhp_poll_timer);
+	struct lpfc_queue *eq;
+	int i = 0;
+
+	rcu_read_lock();
+
+	list_for_each_entry_rcu(eq, &phba->poll_list, _poll_list)
+		i += lpfc_sli4_poll_eq(eq, LPFC_POLL_SLOWPATH);
+	if (!list_empty(&phba->poll_list))
+		mod_timer(&phba->cpuhp_poll_timer,
+			  jiffies + msecs_to_jiffies(LPFC_POLL_HB));
+
+	rcu_read_unlock();
+}
+
+inline int lpfc_sli4_poll_eq(struct lpfc_queue *eq, uint8_t path)
+{
+	struct lpfc_hba *phba = eq->phba;
+	int i = 0;
+
+	/*
+	 * Unlocking an irq is one of the entry point to check
+	 * for re-schedule, but we are good for io submission
+	 * path as midlayer does a get_cpu to glue us in. Flush
+	 * out the invalidate queue so we can see the updated
+	 * value for flag.
+	 */
+	smp_rmb();
+
+	if (READ_ONCE(eq->mode) == LPFC_EQ_POLL)
+		/* We will not likely get the completion for the caller
+		 * during this iteration but i guess that's fine.
+		 * Future io's coming on this eq should be able to
+		 * pick it up.  As for the case of single io's, they
+		 * will be handled through a sched from polling timer
+		 * function which is currently triggered every 1msec.
+		 */
+		i = lpfc_sli4_process_eq(phba, eq, LPFC_QUEUE_NOARM);
+
+	return i;
+}
+
+static inline void lpfc_sli4_add_to_poll_list(struct lpfc_queue *eq)
+{
+	struct lpfc_hba *phba = eq->phba;
+
+	if (list_empty(&phba->poll_list)) {
+		timer_setup(&phba->cpuhp_poll_timer, lpfc_sli4_poll_hbtimer, 0);
+		/* kickstart slowpath processing for this eq */
+		mod_timer(&phba->cpuhp_poll_timer,
+			  jiffies + msecs_to_jiffies(LPFC_POLL_HB));
+	}
+
+	list_add_rcu(&eq->_poll_list, &phba->poll_list);
+	synchronize_rcu();
+}
+
+static inline void lpfc_sli4_remove_from_poll_list(struct lpfc_queue *eq)
+{
+	struct lpfc_hba *phba = eq->phba;
+
+	/* Disable slowpath processing for this eq.  Kick start the eq
+	 * by RE-ARMING the eq's ASAP
+	 */
+	list_del_rcu(&eq->_poll_list);
+	synchronize_rcu();
+
+	if (list_empty(&phba->poll_list))
+		del_timer_sync(&phba->cpuhp_poll_timer);
+}
+
+inline void lpfc_sli4_cleanup_poll_list(struct lpfc_hba *phba)
+{
+	struct lpfc_queue *eq, *next;
+
+	list_for_each_entry_safe(eq, next, &phba->poll_list, _poll_list)
+		list_del(&eq->_poll_list);
+
+	INIT_LIST_HEAD(&phba->poll_list);
+	synchronize_rcu();
+}
+
+static inline void
+__lpfc_sli4_switch_eqmode(struct lpfc_queue *eq, uint8_t mode)
+{
+	if (mode == eq->mode)
+		return;
+	/*
+	 * currently this function is only called during a hotplug
+	 * event and the cpu on which this function is executing
+	 * is going offline.  By now the hotplug has instructed
+	 * the scheduler to remove this cpu from cpu active mask.
+	 * So we don't need to work about being put aside by the
+	 * scheduler for a high priority process.  Yes, the inte-
+	 * rrupts could come but they are known to retire ASAP.
+	 */
+
+	/* Disable polling in the fastpath */
+	WRITE_ONCE(eq->mode, mode);
+	/* flush out the store buffer */
+	smp_wmb();
+
+	/*
+	 * Add this eq to the polling list and start polling. For
+	 * a grace period both interrupt handler and poller will
+	 * try to process the eq _but_ that's fine.  We have a
+	 * synchronization mechanism in place (queue_claimed) to
+	 * deal with it.  This is just a draining phase for int-
+	 * errupt handler (not eq's) as we have guranteed through
+	 * barrier that all the cpu's have seen the new CQ_POLLED
+	 * state. which will effectively disable the REARMING of
+	 * the EQ.  The whole idea is eq's die off eventually as
+	 * we are not rearming EQ's anymore.
+	 */
+	mode ? lpfc_sli4_add_to_poll_list(eq) :
+	       lpfc_sli4_remove_from_poll_list(eq);
+}
+
+void lpfc_sli4_start_polling(struct lpfc_queue *eq)
+{
+	__lpfc_sli4_switch_eqmode(eq, LPFC_EQ_POLL);
+}
+
+void lpfc_sli4_stop_polling(struct lpfc_queue *eq)
+{
+	struct lpfc_hba *phba = eq->phba;
+
+	__lpfc_sli4_switch_eqmode(eq, LPFC_EQ_INTERRUPT);
+
+	/* Kick start for the pending io's in h/w.
+	 * Once we switch back to interrupt processing on a eq
+	 * the io path completion will only arm eq's when it
+	 * receives a completion.  But since eq's are in disa-
+	 * rmed state it doesn't receive a completion.  This
+	 * creates a deadlock scenaro.
+	 */
+	phba->sli4_hba.sli4_write_eq_db(phba, eq, 0, LPFC_QUEUE_REARM);
+}
+
 /**
  * lpfc_sli4_queue_free - free a queue structure and associated memory
  * @queue: The queue structure to free.
@@ -14436,6 +14583,7 @@ lpfc_sli4_queue_alloc(struct lpfc_hba *phba, uint32_t page_size,
 		return NULL;
 
 	INIT_LIST_HEAD(&queue->list);
+	INIT_LIST_HEAD(&queue->_poll_list);
 	INIT_LIST_HEAD(&queue->wq_list);
 	INIT_LIST_HEAD(&queue->wqfull_list);
 	INIT_LIST_HEAD(&queue->page_list);
@@ -19757,6 +19905,8 @@ lpfc_sli4_issue_wqe(struct lpfc_hba *phba, struct lpfc_sli4_hdw_queue *qp,
 
 		lpfc_sli_ringtxcmpl_put(phba, pring, pwqe);
 		spin_unlock_irqrestore(&pring->ring_lock, iflags);
+
+		lpfc_sli4_poll_eq(qp->hba_eq, LPFC_POLL_FASTPATH);
 		return 0;
 	}
 
@@ -19777,6 +19927,8 @@ lpfc_sli4_issue_wqe(struct lpfc_hba *phba, struct lpfc_sli4_hdw_queue *qp,
 		}
 		lpfc_sli_ringtxcmpl_put(phba, pring, pwqe);
 		spin_unlock_irqrestore(&pring->ring_lock, iflags);
+
+		lpfc_sli4_poll_eq(qp->hba_eq, LPFC_POLL_FASTPATH);
 		return 0;
 	}
 
@@ -19805,6 +19957,8 @@ lpfc_sli4_issue_wqe(struct lpfc_hba *phba, struct lpfc_sli4_hdw_queue *qp,
 		}
 		lpfc_sli_ringtxcmpl_put(phba, pring, pwqe);
 		spin_unlock_irqrestore(&pring->ring_lock, iflags);
+
+		lpfc_sli4_poll_eq(qp->hba_eq, LPFC_POLL_FASTPATH);
 		return 0;
 	}
 	return WQE_ERROR;
diff --git a/drivers/scsi/lpfc/lpfc_sli4.h b/drivers/scsi/lpfc/lpfc_sli4.h
index bbe24c19b1d9..ef32159248d7 100644
--- a/drivers/scsi/lpfc/lpfc_sli4.h
+++ b/drivers/scsi/lpfc/lpfc_sli4.h
@@ -133,6 +133,23 @@ struct lpfc_rqb {
 struct lpfc_queue {
 	struct list_head list;
 	struct list_head wq_list;
+
+	/*
+	 * If interrupts are in effect on _all_ the eq's the footprint
+	 * of polling code is zero (except mode). This memory is chec-
+	 * ked for every io to see if the io needs to be polled and
+	 * while completion to check if the eq's needs to be rearmed.
+	 * Keep in same cacheline as the queue ptr to avoid cpu fetch
+	 * stalls. Using 1B memory will leave us with 7B hole. Fill
+	 * it with other frequently used members.
+	 */
+	uint16_t last_cpu;	/* most recent cpu */
+	uint16_t hdwq;
+	uint8_t	 qe_valid;
+	uint8_t  mode;	/* interrupt or polling */
+#define LPFC_EQ_INTERRUPT	0
+#define LPFC_EQ_POLL		1
+
 	struct list_head wqfull_list;
 	enum lpfc_sli4_queue_type type;
 	enum lpfc_sli4_queue_subtype subtype;
@@ -240,10 +257,8 @@ struct lpfc_queue {
 	struct delayed_work	sched_spwork;
 
 	uint64_t isr_timestamp;
-	uint16_t hdwq;
-	uint16_t last_cpu;	/* most recent cpu */
-	uint8_t	qe_valid;
 	struct lpfc_queue *assoc_qp;
+	struct list_head _poll_list;
 	void **q_pgs;	/* array to index entries per page */
 };
 
-- 
2.13.7



* [PATCH 09/11] lpfc: Change default IRQ model on AMD architectures
  2019-11-05  0:56 [PATCH 00/11] lpfc: Update lpfc to revision 12.6.0.1 James Smart
                   ` (7 preceding siblings ...)
  2019-11-05  0:57 ` [PATCH 08/11] lpfc: Add registration for CPU Offline/Online events James Smart
@ 2019-11-05  0:57 ` James Smart
  2019-11-05  0:57 ` [PATCH 10/11] lpfc: Add enablement of multiple adapter dumps James Smart
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: James Smart @ 2019-11-05  0:57 UTC (permalink / raw)
  To: linux-scsi; +Cc: James Smart, Dick Kennedy

The current driver attempts to allocate an interrupt vector
per cpu using the system's managed IRQ allocator (flag
PCI_IRQ_AFFINITY). The system IRQ allocator will either provide
the per-cpu vector or return fewer vectors. When fewer vectors are
returned, they are evenly spread between the numa nodes on the system.
When run on an AMD architecture, if interrupts occur on a cpu that
is not in the same numa node as the adapter generating the
interrupt, there is a significant performance penalty.
Thus, whether 1:1 vector allocation is used or the "balanced" vectors
land in the other numa nodes, performance can be hit significantly.

A much more performant model is to allocate interrupts only on
the cpus that are in the numa node where the adapter resides.
I/O completion is still performed by the cpu where the I/O
was generated. Unfortunately, there is no flag to request that the
managed IRQ subsystem allocate vectors only for the cpus in the
same numa node as the adapter.

On AMD architectures, revert the irq allocation to the normal style
(non-managed) and then use irq_set_affinity_hint() to set the cpu
affinity and disable user-space rebalancing.

Tie the support into CPU offline/online. If the cpu being offlined
owns a vector, the vector is re-affinitized to one of the other
cpus on the same numa node. If there are no more cpus on the
numa node, the vector has all affinity removed and the system
determines where it's serviced. Similarly, when the cpu that owned
a vector comes back online, the vector is re-affinitized to that cpu.
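
As a rough sketch of that scheme (illustrative, not the literal patch
code; it borrows the driver's phba and lpfc_next_online_numa_cpu()
names):

        int vecs, idx;
        unsigned int cpu;

        /* Plain (non-managed) MSI-X vectors: without PCI_IRQ_AFFINITY
         * the kernel neither spreads the vectors across nodes nor
         * shuts them down when their cpus go offline.
         */
        vecs = pci_alloc_irq_vectors(phba->pcidev, 1,
                                     phba->cfg_irq_chann, PCI_IRQ_MSIX);

        for (idx = 0; idx < vecs; idx++) {
                /* Pin each vector to an online cpu on the adapter's
                 * NUMA node; the hint also keeps user-space irqbalance
                 * from moving the vector elsewhere.
                 */
                cpu = lpfc_next_online_numa_cpu(&phba->sli4_hba.numa_mask,
                                                idx);
                irq_set_affinity_hint(pci_irq_vector(phba->pcidev, idx),
                                      cpumask_of(cpu));
        }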

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
---
 drivers/scsi/lpfc/lpfc.h      |  21 ++
 drivers/scsi/lpfc/lpfc_attr.c | 132 +++++++++++--
 drivers/scsi/lpfc/lpfc_init.c | 450 +++++++++++++++++++++++++++++++-----------
 drivers/scsi/lpfc/lpfc_sli4.h |  19 +-
 4 files changed, 484 insertions(+), 138 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc.h b/drivers/scsi/lpfc/lpfc.h
index 88d5fd99f5ec..935f98804198 100644
--- a/drivers/scsi/lpfc/lpfc.h
+++ b/drivers/scsi/lpfc/lpfc.h
@@ -837,6 +837,7 @@ struct lpfc_hba {
 	uint32_t cfg_fcp_mq_threshold;
 	uint32_t cfg_hdw_queue;
 	uint32_t cfg_irq_chann;
+	uint32_t cfg_irq_numa;
 	uint32_t cfg_suppress_rsp;
 	uint32_t cfg_nvme_oas;
 	uint32_t cfg_nvme_embed_cmd;
@@ -1312,6 +1313,26 @@ lpfc_phba_elsring(struct lpfc_hba *phba)
 }
 
 /**
+ * lpfc_next_online_numa_cpu - Finds next online CPU on NUMA node
+ * @numa_mask: Pointer to phba's numa_mask member.
+ * @start: starting cpu index
+ *
+ * Note: If no valid cpu found, then nr_cpu_ids is returned.
+ *
+ **/
+static inline unsigned int
+lpfc_next_online_numa_cpu(const struct cpumask *numa_mask, unsigned int start)
+{
+	unsigned int cpu_it;
+
+	for_each_cpu_wrap(cpu_it, numa_mask, start) {
+		if (cpu_online(cpu_it))
+			break;
+	}
+
+	return cpu_it;
+}
+/**
  * lpfc_sli4_mod_hba_eq_delay - update EQ delay
  * @phba: Pointer to HBA context object.
  * @q: The Event Queue to update.
diff --git a/drivers/scsi/lpfc/lpfc_attr.c b/drivers/scsi/lpfc/lpfc_attr.c
index e7f6581935bf..4ff82b36a37a 100644
--- a/drivers/scsi/lpfc/lpfc_attr.c
+++ b/drivers/scsi/lpfc/lpfc_attr.c
@@ -5331,7 +5331,7 @@ lpfc_fcp_cpu_map_show(struct device *dev, struct device_attribute *attr,
 			len += scnprintf(buf + len, PAGE_SIZE - len,
 					"CPU %02d not present\n",
 					phba->sli4_hba.curr_disp_cpu);
-		else if (cpup->irq == LPFC_VECTOR_MAP_EMPTY) {
+		else if (cpup->eq == LPFC_VECTOR_MAP_EMPTY) {
 			if (cpup->hdwq == LPFC_VECTOR_MAP_EMPTY)
 				len += scnprintf(
 					buf + len, PAGE_SIZE - len,
@@ -5344,10 +5344,10 @@ lpfc_fcp_cpu_map_show(struct device *dev, struct device_attribute *attr,
 			else
 				len += scnprintf(
 					buf + len, PAGE_SIZE - len,
-					"CPU %02d EQ %04d hdwq %04d "
+					"CPU %02d EQ None hdwq %04d "
 					"physid %d coreid %d ht %d ua %d\n",
 					phba->sli4_hba.curr_disp_cpu,
-					cpup->eq, cpup->hdwq, cpup->phys_id,
+					cpup->hdwq, cpup->phys_id,
 					cpup->core_id,
 					(cpup->flag & LPFC_CPU_MAP_HYPER),
 					(cpup->flag & LPFC_CPU_MAP_UNASSIGN));
@@ -5362,7 +5362,7 @@ lpfc_fcp_cpu_map_show(struct device *dev, struct device_attribute *attr,
 					cpup->core_id,
 					(cpup->flag & LPFC_CPU_MAP_HYPER),
 					(cpup->flag & LPFC_CPU_MAP_UNASSIGN),
-					cpup->irq);
+					lpfc_get_irq(cpup->eq));
 			else
 				len += scnprintf(
 					buf + len, PAGE_SIZE - len,
@@ -5373,7 +5373,7 @@ lpfc_fcp_cpu_map_show(struct device *dev, struct device_attribute *attr,
 					cpup->core_id,
 					(cpup->flag & LPFC_CPU_MAP_HYPER),
 					(cpup->flag & LPFC_CPU_MAP_UNASSIGN),
-					cpup->irq);
+					lpfc_get_irq(cpup->eq));
 		}
 
 		phba->sli4_hba.curr_disp_cpu++;
@@ -5744,7 +5744,7 @@ LPFC_ATTR_RW(nvme_embed_cmd, 1, 0, 2,
  * the driver will advertise it supports to the SCSI layer.
  *
  *      0    = Set nr_hw_queues by the number of CPUs or HW queues.
- *      1,128 = Manually specify the maximum nr_hw_queue value to be set,
+ *      1,256 = Manually specify nr_hw_queue value to be advertised,
  *
  * Value range is [0,256]. Default value is 8.
  */
@@ -5762,30 +5762,130 @@ LPFC_ATTR_R(fcp_mq_threshold, LPFC_FCP_MQ_THRESHOLD_DEF,
  * A hardware IO queue maps (qidx) to a specific driver CQ/WQ.
  *
  *      0    = Configure the number of hdw queues to the number of active CPUs.
- *      1,128 = Manually specify how many hdw queues to use.
+ *      1,256 = Manually specify how many hdw queues to use.
  *
- * Value range is [0,128]. Default value is 0.
+ * Value range is [0,256]. Default value is 0.
  */
 LPFC_ATTR_R(hdw_queue,
 	    LPFC_HBA_HDWQ_DEF,
 	    LPFC_HBA_HDWQ_MIN, LPFC_HBA_HDWQ_MAX,
 	    "Set the number of I/O Hardware Queues");
 
+static inline void
+lpfc_assign_default_irq_numa(struct lpfc_hba *phba)
+{
+#if IS_ENABLED(CONFIG_X86)
+	/* If AMD architecture, then default is LPFC_IRQ_CHANN_NUMA */
+	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
+		phba->cfg_irq_numa = 1;
+	else
+		phba->cfg_irq_numa = 0;
+#else
+	phba->cfg_irq_numa = 0;
+#endif
+}
+
 /*
  * lpfc_irq_chann: Set the number of IRQ vectors that are available
  * for Hardware Queues to utilize.  This also will map to the number
  * of EQ / MSI-X vectors the driver will create. This should never be
  * more than the number of Hardware Queues
  *
- *      0     = Configure number of IRQ Channels to the number of active CPUs.
- *      1,128 = Manually specify how many IRQ Channels to use.
+ *	0		= Configure number of IRQ Channels to:
+ *			  if AMD architecture, number of CPUs on HBA's NUMA node
+ *			  otherwise, number of active CPUs.
+ *	[1,256]		= Manually specify how many IRQ Channels to use.
  *
- * Value range is [0,128]. Default value is 0.
+ * Value range is [0,256]. Default value is [0].
  */
-LPFC_ATTR_R(irq_chann,
-	    LPFC_HBA_HDWQ_DEF,
-	    LPFC_HBA_HDWQ_MIN, LPFC_HBA_HDWQ_MAX,
-	    "Set the number of I/O IRQ Channels");
+static uint lpfc_irq_chann = LPFC_IRQ_CHANN_DEF;
+module_param(lpfc_irq_chann, uint, 0444);
+MODULE_PARM_DESC(lpfc_irq_chann, "Set number of interrupt vectors to allocate");
+
+/* lpfc_irq_chann_init - Set the hba irq_chann initial value
+ * @phba: lpfc_hba pointer.
+ * @val: contains the initial value
+ *
+ * Description:
+ * Validates the initial value is within range and assigns it to the
+ * adapter. If not in range, an error message is posted and the
+ * default value is assigned.
+ *
+ * Returns:
+ * zero if value is in range and is set
+ * -EINVAL if value was out of range
+ **/
+static int
+lpfc_irq_chann_init(struct lpfc_hba *phba, uint32_t val)
+{
+	const struct cpumask *numa_mask;
+
+	if (phba->cfg_use_msi != 2) {
+		lpfc_printf_log(phba, KERN_INFO, LOG_INIT,
+				"8532 use_msi = %u ignoring cfg_irq_numa\n",
+				phba->cfg_use_msi);
+		phba->cfg_irq_numa = 0;
+		phba->cfg_irq_chann = LPFC_IRQ_CHANN_MIN;
+		return 0;
+	}
+
+	/* Check if default setting was passed */
+	if (val == LPFC_IRQ_CHANN_DEF)
+		lpfc_assign_default_irq_numa(phba);
+
+	if (phba->cfg_irq_numa) {
+		numa_mask = &phba->sli4_hba.numa_mask;
+
+		if (cpumask_empty(numa_mask)) {
+			lpfc_printf_log(phba, KERN_INFO, LOG_INIT,
+					"8533 Could not identify NUMA node, "
+					"ignoring cfg_irq_numa\n");
+			phba->cfg_irq_numa = 0;
+			phba->cfg_irq_chann = LPFC_IRQ_CHANN_MIN;
+		} else {
+			phba->cfg_irq_chann = cpumask_weight(numa_mask);
+			lpfc_printf_log(phba, KERN_INFO, LOG_INIT,
+					"8543 lpfc_irq_chann set to %u "
+					"(numa)\n", phba->cfg_irq_chann);
+		}
+	} else {
+		if (val > LPFC_IRQ_CHANN_MAX) {
+			lpfc_printf_log(phba, KERN_INFO, LOG_INIT,
+					"8545 lpfc_irq_chann attribute cannot "
+					"be set to %u, allowed range is "
+					"[%u,%u]\n",
+					val,
+					LPFC_IRQ_CHANN_MIN,
+					LPFC_IRQ_CHANN_MAX);
+			phba->cfg_irq_chann = LPFC_IRQ_CHANN_MIN;
+			return -EINVAL;
+		}
+		phba->cfg_irq_chann = val;
+	}
+
+	return 0;
+}
+
+/**
+ * lpfc_irq_chann_show - Display value of irq_chann
+ * @dev: class converted to a Scsi_host structure.
+ * @attr: device attribute, not used.
+ * @buf: on return contains a string with the list sizes
+ *
+ * Returns: size of formatted string.
+ **/
+static ssize_t
+lpfc_irq_chann_show(struct device *dev, struct device_attribute *attr,
+		    char *buf)
+{
+	struct Scsi_Host *shost = class_to_shost(dev);
+	struct lpfc_vport *vport = (struct lpfc_vport *)shost->hostdata;
+	struct lpfc_hba *phba = vport->phba;
+
+	return scnprintf(buf, PAGE_SIZE, "%u\n", phba->cfg_irq_chann);
+}
+
+static DEVICE_ATTR_RO(lpfc_irq_chann);
 
 /*
 # lpfc_enable_hba_reset: Allow or prevent HBA resets to the hardware.
@@ -7190,6 +7290,7 @@ lpfc_get_hba_function_mode(struct lpfc_hba *phba)
 void
 lpfc_get_cfgparam(struct lpfc_hba *phba)
 {
+	lpfc_hba_log_verbose_init(phba, lpfc_log_verbose);
 	lpfc_fcp_io_sched_init(phba, lpfc_fcp_io_sched);
 	lpfc_ns_query_init(phba, lpfc_ns_query);
 	lpfc_fcp2_no_tgt_reset_init(phba, lpfc_fcp2_no_tgt_reset);
@@ -7296,7 +7397,6 @@ lpfc_get_cfgparam(struct lpfc_hba *phba)
 	phba->cfg_soft_wwpn = 0L;
 	lpfc_sg_seg_cnt_init(phba, lpfc_sg_seg_cnt);
 	lpfc_hba_queue_depth_init(phba, lpfc_hba_queue_depth);
-	lpfc_hba_log_verbose_init(phba, lpfc_log_verbose);
 	lpfc_aer_support_init(phba, lpfc_aer_support);
 	lpfc_sriov_nr_virtfn_init(phba, lpfc_sriov_nr_virtfn);
 	lpfc_request_firmware_upgrade_init(phba, lpfc_req_fw_upgrade);
diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index 888ad32d5267..28e6a763f106 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -40,6 +40,7 @@
 #include <linux/irq.h>
 #include <linux/bitops.h>
 #include <linux/crash_dump.h>
+#include <linux/cpu.h>
 #include <linux/cpuhotplug.h>
 
 #include <scsi/scsi.h>
@@ -5996,6 +5997,35 @@ static void lpfc_log_intr_mode(struct lpfc_hba *phba, uint32_t intr_mode)
 }
 
 /**
+ * lpfc_cpumask_of_node_init - initializes cpumask of phba's NUMA node
+ * @phba: Pointer to HBA context object.
+ *
+ **/
+static void
+lpfc_cpumask_of_node_init(struct lpfc_hba *phba)
+{
+	unsigned int cpu, numa_node;
+	struct cpumask *numa_mask = NULL;
+
+#ifdef CONFIG_NUMA
+	numa_node = phba->pcidev->dev.numa_node;
+#else
+	numa_node = NUMA_NO_NODE;
+#endif
+	numa_mask = &phba->sli4_hba.numa_mask;
+
+	cpumask_clear(numa_mask);
+
+	/* Check if we're a NUMA architecture */
+	if (!cpumask_of_node(numa_node))
+		return;
+
+	for_each_possible_cpu(cpu)
+		if (cpu_to_node(cpu) == numa_node)
+			cpumask_set_cpu(cpu, numa_mask);
+}
+
+/**
  * lpfc_enable_pci_dev - Enable a generic PCI device.
  * @phba: pointer to lpfc hba data structure.
  *
@@ -6438,6 +6468,7 @@ lpfc_sli4_driver_resource_setup(struct lpfc_hba *phba)
 	phba->sli4_hba.num_present_cpu = lpfc_present_cpu;
 	phba->sli4_hba.num_possible_cpu = num_possible_cpus();
 	phba->sli4_hba.curr_disp_cpu = 0;
+	lpfc_cpumask_of_node_init(phba);
 
 	/* Get all the module params for configuring this host */
 	lpfc_get_cfgparam(phba);
@@ -6973,6 +7004,7 @@ lpfc_sli4_driver_resource_unset(struct lpfc_hba *phba)
 	phba->sli4_hba.num_possible_cpu = 0;
 	phba->sli4_hba.num_present_cpu = 0;
 	phba->sli4_hba.curr_disp_cpu = 0;
+	cpumask_clear(&phba->sli4_hba.numa_mask);
 
 	/* Free memory allocated for fast-path work queue handles */
 	kfree(phba->sli4_hba.hba_eq_hdl);
@@ -10686,7 +10718,6 @@ lpfc_find_cpu_handle(struct lpfc_hba *phba, uint16_t id, int match)
 		 */
 		if ((match == LPFC_FIND_BY_EQ) &&
 		    (cpup->flag & LPFC_CPU_FIRST_IRQ) &&
-		    (cpup->irq != LPFC_VECTOR_MAP_EMPTY) &&
 		    (cpup->eq == id))
 			return cpu;
 
@@ -10724,6 +10755,75 @@ lpfc_find_hyper(struct lpfc_hba *phba, int cpu,
 }
 #endif
 
+/*
+ * lpfc_assign_eq_map_info - Assigns eq for vector_map structure
+ * @phba: pointer to lpfc hba data structure.
+ * @eqidx: index for eq and irq vector
+ * @flag: flags to set for vector_map structure
+ * @cpu: cpu used to index vector_map structure
+ *
+ * The routine assigns eq info into vector_map structure
+ */
+static inline void
+lpfc_assign_eq_map_info(struct lpfc_hba *phba, uint16_t eqidx, uint16_t flag,
+			unsigned int cpu)
+{
+	struct lpfc_vector_map_info *cpup = &phba->sli4_hba.cpu_map[cpu];
+	struct lpfc_hba_eq_hdl *eqhdl = lpfc_get_eq_hdl(eqidx);
+
+	cpup->eq = eqidx;
+	cpup->flag |= flag;
+
+	lpfc_printf_log(phba, KERN_INFO, LOG_INIT,
+			"3336 Set Affinity: CPU %d irq %d eq %d flag x%x\n",
+			cpu, eqhdl->irq, cpup->eq, cpup->flag);
+}
+
+/**
+ * lpfc_cpu_map_array_init - Initialize cpu_map structure
+ * @phba: pointer to lpfc hba data structure.
+ *
+ * The routine initializes the cpu_map array structure
+ */
+static void
+lpfc_cpu_map_array_init(struct lpfc_hba *phba)
+{
+	struct lpfc_vector_map_info *cpup;
+	struct lpfc_eq_intr_info *eqi;
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		cpup = &phba->sli4_hba.cpu_map[cpu];
+		cpup->phys_id = LPFC_VECTOR_MAP_EMPTY;
+		cpup->core_id = LPFC_VECTOR_MAP_EMPTY;
+		cpup->hdwq = LPFC_VECTOR_MAP_EMPTY;
+		cpup->eq = LPFC_VECTOR_MAP_EMPTY;
+		cpup->flag = 0;
+		eqi = per_cpu_ptr(phba->sli4_hba.eq_info, cpu);
+		INIT_LIST_HEAD(&eqi->list);
+		eqi->icnt = 0;
+	}
+}
+
+/**
+ * lpfc_hba_eq_hdl_array_init - Initialize hba_eq_hdl structure
+ * @phba: pointer to lpfc hba data structure.
+ *
+ * The routine initializes the hba_eq_hdl array structure
+ */
+static void
+lpfc_hba_eq_hdl_array_init(struct lpfc_hba *phba)
+{
+	struct lpfc_hba_eq_hdl *eqhdl;
+	int i;
+
+	for (i = 0; i < phba->cfg_irq_chann; i++) {
+		eqhdl = lpfc_get_eq_hdl(i);
+		eqhdl->irq = LPFC_VECTOR_MAP_EMPTY;
+		eqhdl->phba = phba;
+	}
+}
+
 /**
  * lpfc_cpu_affinity_check - Check vector CPU affinity mappings
  * @phba: pointer to lpfc hba data structure.
@@ -10742,22 +10842,10 @@ lpfc_cpu_affinity_check(struct lpfc_hba *phba, int vectors)
 	int max_core_id, min_core_id;
 	struct lpfc_vector_map_info *cpup;
 	struct lpfc_vector_map_info *new_cpup;
-	const struct cpumask *maskp;
 #ifdef CONFIG_X86
 	struct cpuinfo_x86 *cpuinfo;
 #endif
 
-	/* Init cpu_map array */
-	for_each_possible_cpu(cpu) {
-		cpup = &phba->sli4_hba.cpu_map[cpu];
-		cpup->phys_id = LPFC_VECTOR_MAP_EMPTY;
-		cpup->core_id = LPFC_VECTOR_MAP_EMPTY;
-		cpup->hdwq = LPFC_VECTOR_MAP_EMPTY;
-		cpup->eq = LPFC_VECTOR_MAP_EMPTY;
-		cpup->irq = LPFC_VECTOR_MAP_EMPTY;
-		cpup->flag = 0;
-	}
-
 	max_phys_id = 0;
 	min_phys_id = LPFC_VECTOR_MAP_EMPTY;
 	max_core_id = 0;
@@ -10793,65 +10881,6 @@ lpfc_cpu_affinity_check(struct lpfc_hba *phba, int vectors)
 			min_core_id = cpup->core_id;
 	}
 
-	for_each_possible_cpu(i) {
-		struct lpfc_eq_intr_info *eqi =
-			per_cpu_ptr(phba->sli4_hba.eq_info, i);
-
-		INIT_LIST_HEAD(&eqi->list);
-		eqi->icnt = 0;
-	}
-
-	/* This loop sets up all CPUs that are affinitized with a
-	 * irq vector assigned to the driver. All affinitized CPUs
-	 * will get a link to that vectors IRQ and EQ.
-	 *
-	 * NULL affinity mask handling:
-	 * If irq count is greater than one, log an error message.
-	 * If the null mask is received for the first irq, find the
-	 * first present cpu, and assign the eq index to ensure at
-	 * least one EQ is assigned.
-	 */
-	for (idx = 0; idx <  phba->cfg_irq_chann; idx++) {
-		/* Get a CPU mask for all CPUs affinitized to this vector */
-		maskp = pci_irq_get_affinity(phba->pcidev, idx);
-		if (!maskp) {
-			if (phba->cfg_irq_chann > 1)
-				lpfc_printf_log(phba, KERN_ERR, LOG_INIT,
-						"3329 No affinity mask found "
-						"for vector %d (%d)\n",
-						idx, phba->cfg_irq_chann);
-			if (!idx) {
-				cpu = cpumask_first(cpu_present_mask);
-				cpup = &phba->sli4_hba.cpu_map[cpu];
-				cpup->eq = idx;
-				cpup->irq = pci_irq_vector(phba->pcidev, idx);
-				cpup->flag |= LPFC_CPU_FIRST_IRQ;
-			}
-			break;
-		}
-
-		i = 0;
-		/* Loop through all CPUs associated with vector idx */
-		for_each_cpu_and(cpu, maskp, cpu_present_mask) {
-			/* Set the EQ index and IRQ for that vector */
-			cpup = &phba->sli4_hba.cpu_map[cpu];
-			cpup->eq = idx;
-			cpup->irq = pci_irq_vector(phba->pcidev, idx);
-
-			/* If this is the first CPU thats assigned to this
-			 * vector, set LPFC_CPU_FIRST_IRQ.
-			 */
-			if (!i)
-				cpup->flag |= LPFC_CPU_FIRST_IRQ;
-			i++;
-
-			lpfc_printf_log(phba, KERN_INFO, LOG_INIT,
-					"3336 Set Affinity: CPU %d "
-					"irq %d eq %d flag x%x\n",
-					cpu, cpup->irq, cpup->eq, cpup->flag);
-		}
-	}
-
 	/* After looking at each irq vector assigned to this pcidev, its
 	 * possible to see that not ALL CPUs have been accounted for.
 	 * Next we will set any unassigned (unaffinitized) cpu map
@@ -10877,7 +10906,7 @@ lpfc_cpu_affinity_check(struct lpfc_hba *phba, int vectors)
 			for (i = 0; i < phba->sli4_hba.num_present_cpu; i++) {
 				new_cpup = &phba->sli4_hba.cpu_map[new_cpu];
 				if (!(new_cpup->flag & LPFC_CPU_MAP_UNASSIGN) &&
-				    (new_cpup->irq != LPFC_VECTOR_MAP_EMPTY) &&
+				    (new_cpup->eq != LPFC_VECTOR_MAP_EMPTY) &&
 				    (new_cpup->phys_id == cpup->phys_id))
 					goto found_same;
 				new_cpu = cpumask_next(
@@ -10890,7 +10919,6 @@ lpfc_cpu_affinity_check(struct lpfc_hba *phba, int vectors)
 found_same:
 			/* We found a matching phys_id, so copy the IRQ info */
 			cpup->eq = new_cpup->eq;
-			cpup->irq = new_cpup->irq;
 
 			/* Bump start_cpu to the next slot to minmize the
 			 * chance of having multiple unassigned CPU entries
@@ -10902,9 +10930,10 @@ lpfc_cpu_affinity_check(struct lpfc_hba *phba, int vectors)
 
 			lpfc_printf_log(phba, KERN_INFO, LOG_INIT,
 					"3337 Set Affinity: CPU %d "
-					"irq %d from id %d same "
+					"eq %d from peer cpu %d same "
 					"phys_id (%d)\n",
-					cpu, cpup->irq, new_cpu, cpup->phys_id);
+					cpu, cpup->eq, new_cpu,
+					cpup->phys_id);
 		}
 	}
 
@@ -10928,7 +10957,7 @@ lpfc_cpu_affinity_check(struct lpfc_hba *phba, int vectors)
 			for (i = 0; i < phba->sli4_hba.num_present_cpu; i++) {
 				new_cpup = &phba->sli4_hba.cpu_map[new_cpu];
 				if (!(new_cpup->flag & LPFC_CPU_MAP_UNASSIGN) &&
-				    (new_cpup->irq != LPFC_VECTOR_MAP_EMPTY))
+				    (new_cpup->eq != LPFC_VECTOR_MAP_EMPTY))
 					goto found_any;
 				new_cpu = cpumask_next(
 					new_cpu, cpu_present_mask);
@@ -10938,13 +10967,12 @@ lpfc_cpu_affinity_check(struct lpfc_hba *phba, int vectors)
 			/* We should never leave an entry unassigned */
 			lpfc_printf_log(phba, KERN_ERR, LOG_INIT,
 					"3339 Set Affinity: CPU %d "
-					"irq %d UNASSIGNED\n",
-					cpup->hdwq, cpup->irq);
+					"eq %d UNASSIGNED\n",
+					cpup->hdwq, cpup->eq);
 			continue;
 found_any:
 			/* We found an available entry, copy the IRQ info */
 			cpup->eq = new_cpup->eq;
-			cpup->irq = new_cpup->irq;
 
 			/* Bump start_cpu to the next slot to minmize the
 			 * chance of having multiple unassigned CPU entries
@@ -10956,8 +10984,8 @@ lpfc_cpu_affinity_check(struct lpfc_hba *phba, int vectors)
 
 			lpfc_printf_log(phba, KERN_INFO, LOG_INIT,
 					"3338 Set Affinity: CPU %d "
-					"irq %d from id %d (%d/%d)\n",
-					cpu, cpup->irq, new_cpu,
+					"eq %d from peer cpu %d (%d/%d)\n",
+					cpu, cpup->eq, new_cpu,
 					new_cpup->phys_id, new_cpup->core_id);
 		}
 	}
@@ -10978,9 +11006,9 @@ lpfc_cpu_affinity_check(struct lpfc_hba *phba, int vectors)
 		idx++;
 		lpfc_printf_log(phba, KERN_ERR, LOG_INIT,
 				"3333 Set Affinity: CPU %d (phys %d core %d): "
-				"hdwq %d eq %d irq %d flg x%x\n",
+				"hdwq %d eq %d flg x%x\n",
 				cpu, cpup->phys_id, cpup->core_id,
-				cpup->hdwq, cpup->eq, cpup->irq, cpup->flag);
+				cpup->hdwq, cpup->eq, cpup->flag);
 	}
 	/* Finally we need to associate a hdwq with each cpu_map entry
 	 * This will be 1 to 1 - hdwq to cpu, unless there are less
@@ -11056,9 +11084,9 @@ lpfc_cpu_affinity_check(struct lpfc_hba *phba, int vectors)
  logit:
 		lpfc_printf_log(phba, KERN_ERR, LOG_INIT,
 				"3335 Set Affinity: CPU %d (phys %d core %d): "
-				"hdwq %d eq %d irq %d flg x%x\n",
+				"hdwq %d eq %d flg x%x\n",
 				cpu, cpup->phys_id, cpup->core_id,
-				cpup->hdwq, cpup->eq, cpup->irq, cpup->flag);
+				cpup->hdwq, cpup->eq, cpup->flag);
 	}
 
 	/* The cpu_map array will be used later during initialization
@@ -11078,10 +11106,8 @@ static void
 lpfc_cpuhp_get_eq(struct lpfc_hba *phba, unsigned int cpu,
 		  struct list_head *eqlist)
 {
-	struct lpfc_vector_map_info *map;
 	const struct cpumask *maskp;
 	struct lpfc_queue *eq;
-	unsigned int i;
 	cpumask_t tmp;
 	u16 idx;
 
@@ -11111,15 +11137,8 @@ lpfc_cpuhp_get_eq(struct lpfc_hba *phba, unsigned int cpu,
 		 * the software can share an eq, but eventually
 		 * only eq will be mapped to this vector
 		 */
-		for_each_possible_cpu(i) {
-			map = &phba->sli4_hba.cpu_map[i];
-			if (!(map->irq == pci_irq_vector(phba->pcidev, idx)))
-				continue;
-			eq = phba->sli4_hba.hdwq[map->hdwq].hba_eq;
-			list_add(&eq->_poll_list, eqlist);
-			/* 1 is good enough. others will be a copy of this */
-			break;
-		}
+		eq = phba->sli4_hba.hba_eq_hdl[idx].eq;
+		list_add(&eq->_poll_list, eqlist);
 	}
 }
 
@@ -11181,6 +11200,99 @@ static int __lpfc_cpuhp_checks(struct lpfc_hba *phba, int *retval)
 	return false;
 }
 
+/**
+ * lpfc_irq_set_aff - set IRQ affinity
+ * @eqhdl: EQ handle
+ * @cpu: cpu to set affinity
+ *
+ **/
+static inline void
+lpfc_irq_set_aff(struct lpfc_hba_eq_hdl *eqhdl, unsigned int cpu)
+{
+	cpumask_clear(&eqhdl->aff_mask);
+	cpumask_set_cpu(cpu, &eqhdl->aff_mask);
+	irq_set_status_flags(eqhdl->irq, IRQ_NO_BALANCING);
+	irq_set_affinity_hint(eqhdl->irq, &eqhdl->aff_mask);
+}
+
+/**
+ * lpfc_irq_clear_aff - clear IRQ affinity
+ * @eqhdl: EQ handle
+ *
+ **/
+static inline void
+lpfc_irq_clear_aff(struct lpfc_hba_eq_hdl *eqhdl)
+{
+	cpumask_clear(&eqhdl->aff_mask);
+	irq_clear_status_flags(eqhdl->irq, IRQ_NO_BALANCING);
+	irq_set_affinity_hint(eqhdl->irq, &eqhdl->aff_mask);
+}
+
+/**
+ * lpfc_irq_rebalance - rebalances IRQ affinity according to cpuhp event
+ * @phba: pointer to HBA context object.
+ * @cpu: cpu going offline/online
+ * @offline: true, cpu is going offline. false, cpu is coming online.
+ *
+ * If cpu is going offline, we'll make a best effort to find the next
+ * online cpu on the phba's NUMA node and migrate all offlining IRQ affinities.
+ *
+ * If cpu is coming online, reaffinitize the IRQ back to the onlining cpu.
+ *
+ * Note: Call only if cfg_irq_numa is enabled, otherwise rely on
+ *	 PCI_IRQ_AFFINITY to auto-manage IRQ affinity.
+ *
+ **/
+static void
+lpfc_irq_rebalance(struct lpfc_hba *phba, unsigned int cpu, bool offline)
+{
+	struct lpfc_vector_map_info *cpup;
+	struct cpumask *aff_mask;
+	unsigned int cpu_select, cpu_next, idx;
+	const struct cpumask *numa_mask;
+
+	if (!phba->cfg_irq_numa)
+		return;
+
+	numa_mask = &phba->sli4_hba.numa_mask;
+
+	if (!cpumask_test_cpu(cpu, numa_mask))
+		return;
+
+	cpup = &phba->sli4_hba.cpu_map[cpu];
+
+	if (!(cpup->flag & LPFC_CPU_FIRST_IRQ))
+		return;
+
+	if (offline) {
+		/* Find next online CPU on NUMA node */
+		cpu_next = cpumask_next_wrap(cpu, numa_mask, cpu, true);
+		cpu_select = lpfc_next_online_numa_cpu(numa_mask, cpu_next);
+
+		/* Found a valid CPU */
+		if ((cpu_select < nr_cpu_ids) && (cpu_select != cpu)) {
+			/* Go through each eqhdl and ensure offlining
+			 * cpu aff_mask is migrated
+			 */
+			for (idx = 0; idx < phba->cfg_irq_chann; idx++) {
+				aff_mask = lpfc_get_aff_mask(idx);
+
+				/* Migrate affinity */
+				if (cpumask_test_cpu(cpu, aff_mask))
+					lpfc_irq_set_aff(lpfc_get_eq_hdl(idx),
+							 cpu_select);
+			}
+		} else {
+			/* Rely on irqbalance if no online CPUs left on NUMA */
+			for (idx = 0; idx < phba->cfg_irq_chann; idx++)
+				lpfc_irq_clear_aff(lpfc_get_eq_hdl(idx));
+		}
+	} else {
+		/* Migrate affinity back to this CPU */
+		lpfc_irq_set_aff(lpfc_get_eq_hdl(cpup->eq), cpu);
+	}
+}
+
 static int lpfc_cpu_offline(unsigned int cpu, struct hlist_node *node)
 {
 	struct lpfc_hba *phba = hlist_entry_safe(node, struct lpfc_hba, cpuhp);
@@ -11196,6 +11308,8 @@ static int lpfc_cpu_offline(unsigned int cpu, struct hlist_node *node)
 	if (__lpfc_cpuhp_checks(phba, &retval))
 		return retval;
 
+	lpfc_irq_rebalance(phba, cpu, true);
+
 	lpfc_cpuhp_get_eq(phba, cpu, &eqlist);
 
 	/* start polling on these eq's */
@@ -11222,6 +11336,8 @@ static int lpfc_cpu_online(unsigned int cpu, struct hlist_node *node)
 	if (__lpfc_cpuhp_checks(phba, &retval))
 		return retval;
 
+	lpfc_irq_rebalance(phba, cpu, false);
+
 	list_for_each_entry_safe(eq, next, &phba->poll_list, _poll_list) {
 		n = lpfc_find_cpu_handle(phba, eq->hdwq, LPFC_FIND_BY_HDWQ);
 		if (n == cpu)
@@ -11236,7 +11352,24 @@ static int lpfc_cpu_online(unsigned int cpu, struct hlist_node *node)
  * @phba: pointer to lpfc hba data structure.
  *
  * This routine is invoked to enable the MSI-X interrupt vectors to device
- * with SLI-4 interface spec.
+ * with SLI-4 interface spec.  It also allocates MSI-X vectors and maps them
+ * to cpus on the system.
+ *
+ * When cfg_irq_numa is enabled, the adapter allocates only as many vectors
+ * as there are cpus on the same numa node as the adapter.  The vectors are
+ * allocated without requesting OS affinity mapping.  A vector will be
+ * allocated and assigned to each online and offline cpu.  If the cpu is
+ * online, then affinity will be set to that cpu.  If the cpu is offline, then
+ * affinity will be set to the nearest peer cpu within the numa node that is
+ * online.  If there are no online cpus within the numa node, affinity is not
+ * assigned and the OS may do as it pleases. Note: cpu vector affinity mapping
+ * is consistent with the way cpu online/offline is handled when cfg_irq_numa is
+ * configured.
+ *
+ * If numa mode is not enabled and there is more than 1 vector allocated, then
+ * the driver relies on the managed irq interface, where the OS assigns each
+ * vector's cpu affinity.  The driver will then use that affinity mapping to
+ * set up its cpu mapping table.
  *
  * Return codes
  * 0 - successful
@@ -11247,13 +11380,31 @@ lpfc_sli4_enable_msix(struct lpfc_hba *phba)
 {
 	int vectors, rc, index;
 	char *name;
+	const struct cpumask *numa_mask = NULL;
+	unsigned int cpu = 0, cpu_cnt = 0, cpu_select = nr_cpu_ids;
+	struct lpfc_hba_eq_hdl *eqhdl;
+	const struct cpumask *maskp;
+	bool first;
+	unsigned int flags = PCI_IRQ_MSIX;
 
 	/* Set up MSI-X multi-message vectors */
 	vectors = phba->cfg_irq_chann;
 
-	rc = pci_alloc_irq_vectors(phba->pcidev,
-				1,
-				vectors, PCI_IRQ_MSIX | PCI_IRQ_AFFINITY);
+	if (phba->cfg_irq_numa) {
+		numa_mask = &phba->sli4_hba.numa_mask;
+		cpu_cnt = cpumask_weight(numa_mask);
+		vectors = min(phba->cfg_irq_chann, cpu_cnt);
+
+		/* cpu: iterates over numa_mask (offline and online cpus);
+		 * cpu_select: iterates over online numa_mask cpus for affinity
+		 */
+		cpu = cpumask_first(numa_mask);
+		cpu_select = lpfc_next_online_numa_cpu(numa_mask, cpu);
+	} else {
+		flags |= PCI_IRQ_AFFINITY;
+	}
+
+	rc = pci_alloc_irq_vectors(phba->pcidev, 1, vectors, flags);
 	if (rc < 0) {
 		lpfc_printf_log(phba, KERN_INFO, LOG_INIT,
 				"0484 PCI enable MSI-X failed (%d)\n", rc);
@@ -11263,23 +11414,61 @@ lpfc_sli4_enable_msix(struct lpfc_hba *phba)
 
 	/* Assign MSI-X vectors to interrupt handlers */
 	for (index = 0; index < vectors; index++) {
-		name = phba->sli4_hba.hba_eq_hdl[index].handler_name;
+		eqhdl = lpfc_get_eq_hdl(index);
+		name = eqhdl->handler_name;
 		memset(name, 0, LPFC_SLI4_HANDLER_NAME_SZ);
 		snprintf(name, LPFC_SLI4_HANDLER_NAME_SZ,
 			 LPFC_DRIVER_HANDLER_NAME"%d", index);
 
-		phba->sli4_hba.hba_eq_hdl[index].idx = index;
-		phba->sli4_hba.hba_eq_hdl[index].phba = phba;
+		eqhdl->idx = index;
 		rc = request_irq(pci_irq_vector(phba->pcidev, index),
 			 &lpfc_sli4_hba_intr_handler, 0,
-			 name,
-			 &phba->sli4_hba.hba_eq_hdl[index]);
+			 name, eqhdl);
 		if (rc) {
 			lpfc_printf_log(phba, KERN_WARNING, LOG_INIT,
 					"0486 MSI-X fast-path (%d) "
 					"request_irq failed (%d)\n", index, rc);
 			goto cfg_fail_out;
 		}
+
+		eqhdl->irq = pci_irq_vector(phba->pcidev, index);
+
+		if (phba->cfg_irq_numa) {
+			/* If found a neighboring online cpu, set affinity */
+			if (cpu_select < nr_cpu_ids)
+				lpfc_irq_set_aff(eqhdl, cpu_select);
+
+			/* Assign EQ to cpu_map */
+			lpfc_assign_eq_map_info(phba, index,
+						LPFC_CPU_FIRST_IRQ,
+						cpu);
+
+			/* Iterate to next offline or online cpu in numa_mask */
+			cpu = cpumask_next(cpu, numa_mask);
+
+			/* Find next online cpu in numa_mask to set affinity */
+			cpu_select = lpfc_next_online_numa_cpu(numa_mask, cpu);
+		} else if (vectors == 1) {
+			cpu = cpumask_first(cpu_present_mask);
+			lpfc_assign_eq_map_info(phba, index, LPFC_CPU_FIRST_IRQ,
+						cpu);
+		} else {
+			maskp = pci_irq_get_affinity(phba->pcidev, index);
+
+			first = true;
+			/* Loop through all CPUs associated with vector index */
+			for_each_cpu_and(cpu, maskp, cpu_present_mask) {
+				/* If this is the first CPU that's assigned to
+				 * this vector, set LPFC_CPU_FIRST_IRQ.
+				 */
+				lpfc_assign_eq_map_info(phba, index,
+							first ?
+							LPFC_CPU_FIRST_IRQ : 0,
+							cpu);
+				if (first)
+					first = false;
+			}
+		}
 	}
 
 	if (vectors != phba->cfg_irq_chann) {
@@ -11295,9 +11484,12 @@ lpfc_sli4_enable_msix(struct lpfc_hba *phba)
 
 cfg_fail_out:
 	/* free the irq already requested */
-	for (--index; index >= 0; index--)
-		free_irq(pci_irq_vector(phba->pcidev, index),
-				&phba->sli4_hba.hba_eq_hdl[index]);
+	for (--index; index >= 0; index--) {
+		eqhdl = lpfc_get_eq_hdl(index);
+		lpfc_irq_clear_aff(eqhdl);
+		irq_set_affinity_hint(eqhdl->irq, NULL);
+		free_irq(eqhdl->irq, eqhdl);
+	}
 
 	/* Unconfigure MSI-X capability structure */
 	pci_free_irq_vectors(phba->pcidev);
@@ -11324,6 +11516,8 @@ static int
 lpfc_sli4_enable_msi(struct lpfc_hba *phba)
 {
 	int rc, index;
+	unsigned int cpu;
+	struct lpfc_hba_eq_hdl *eqhdl;
 
 	rc = pci_alloc_irq_vectors(phba->pcidev, 1, 1,
 				   PCI_IRQ_MSI | PCI_IRQ_AFFINITY);
@@ -11345,9 +11539,15 @@ lpfc_sli4_enable_msi(struct lpfc_hba *phba)
 		return rc;
 	}
 
+	eqhdl = lpfc_get_eq_hdl(0);
+	eqhdl->irq = pci_irq_vector(phba->pcidev, 0);
+
+	cpu = cpumask_first(cpu_present_mask);
+	lpfc_assign_eq_map_info(phba, 0, LPFC_CPU_FIRST_IRQ, cpu);
+
 	for (index = 0; index < phba->cfg_irq_chann; index++) {
-		phba->sli4_hba.hba_eq_hdl[index].idx = index;
-		phba->sli4_hba.hba_eq_hdl[index].phba = phba;
+		eqhdl = lpfc_get_eq_hdl(index);
+		eqhdl->idx = index;
 	}
 
 	return 0;
@@ -11380,7 +11580,9 @@ lpfc_sli4_enable_intr(struct lpfc_hba *phba, uint32_t cfg_mode)
 		retval = 0;
 		if (!retval) {
 			/* Now, try to enable MSI-X interrupt mode */
+			get_online_cpus();
 			retval = lpfc_sli4_enable_msix(phba);
+			put_online_cpus();
 			if (!retval) {
 				/* Indicate initialization to MSI-X mode */
 				phba->intr_type = MSIX;
@@ -11405,15 +11607,21 @@ lpfc_sli4_enable_intr(struct lpfc_hba *phba, uint32_t cfg_mode)
 				     IRQF_SHARED, LPFC_DRIVER_NAME, phba);
 		if (!retval) {
 			struct lpfc_hba_eq_hdl *eqhdl;
+			unsigned int cpu;
 
 			/* Indicate initialization to INTx mode */
 			phba->intr_type = INTx;
 			intr_mode = 0;
 
+			eqhdl = lpfc_get_eq_hdl(0);
+			eqhdl->irq = pci_irq_vector(phba->pcidev, 0);
+
+			cpu = cpumask_first(cpu_present_mask);
+			lpfc_assign_eq_map_info(phba, 0, LPFC_CPU_FIRST_IRQ,
+						cpu);
 			for (idx = 0; idx < phba->cfg_irq_chann; idx++) {
-				eqhdl = &phba->sli4_hba.hba_eq_hdl[idx];
+				eqhdl = lpfc_get_eq_hdl(idx);
 				eqhdl->idx = idx;
-				eqhdl->phba = phba;
 			}
 		}
 	}
@@ -11435,14 +11643,14 @@ lpfc_sli4_disable_intr(struct lpfc_hba *phba)
 	/* Disable the currently initialized interrupt mode */
 	if (phba->intr_type == MSIX) {
 		int index;
+		struct lpfc_hba_eq_hdl *eqhdl;
 
 		/* Free up MSI-X multi-message vectors */
 		for (index = 0; index < phba->cfg_irq_chann; index++) {
-			irq_set_affinity_hint(
-				pci_irq_vector(phba->pcidev, index),
-				NULL);
-			free_irq(pci_irq_vector(phba->pcidev, index),
-					&phba->sli4_hba.hba_eq_hdl[index]);
+			eqhdl = lpfc_get_eq_hdl(index);
+			lpfc_irq_clear_aff(eqhdl);
+			irq_set_affinity_hint(eqhdl->irq, NULL);
+			free_irq(eqhdl->irq, eqhdl);
 		}
 	} else {
 		free_irq(phba->pcidev->irq, phba);
@@ -12848,6 +13056,12 @@ lpfc_pci_probe_one_s4(struct pci_dev *pdev, const struct pci_device_id *pid)
 	phba->pport = NULL;
 	lpfc_stop_port(phba);
 
+	/* Init cpu_map array */
+	lpfc_cpu_map_array_init(phba);
+
+	/* Init hba_eq_hdl array */
+	lpfc_hba_eq_hdl_array_init(phba);
+
 	/* Configure and enable interrupt */
 	intr_mode = lpfc_sli4_enable_intr(phba, cfg_mode);
 	if (intr_mode == LPFC_INTR_ERROR) {
diff --git a/drivers/scsi/lpfc/lpfc_sli4.h b/drivers/scsi/lpfc/lpfc_sli4.h
index ef32159248d7..d963ca871383 100644
--- a/drivers/scsi/lpfc/lpfc_sli4.h
+++ b/drivers/scsi/lpfc/lpfc_sli4.h
@@ -41,8 +41,13 @@
 
 /* Multi-queue arrangement for FCP EQ/CQ/WQ tuples */
 #define LPFC_HBA_HDWQ_MIN	0
-#define LPFC_HBA_HDWQ_MAX	128
-#define LPFC_HBA_HDWQ_DEF	0
+#define LPFC_HBA_HDWQ_MAX	256
+#define LPFC_HBA_HDWQ_DEF	LPFC_HBA_HDWQ_MIN
+
+/* irq_chann range, values */
+#define LPFC_IRQ_CHANN_MIN	0
+#define LPFC_IRQ_CHANN_MAX	256
+#define LPFC_IRQ_CHANN_DEF	LPFC_IRQ_CHANN_MIN
 
 /* FCP MQ queue count limiting */
 #define LPFC_FCP_MQ_THRESHOLD_MIN	0
@@ -467,11 +472,17 @@ struct lpfc_hba;
 #define LPFC_SLI4_HANDLER_NAME_SZ	16
 struct lpfc_hba_eq_hdl {
 	uint32_t idx;
+	uint16_t irq;
 	char handler_name[LPFC_SLI4_HANDLER_NAME_SZ];
 	struct lpfc_hba *phba;
 	struct lpfc_queue *eq;
+	struct cpumask aff_mask;
 };
 
+#define lpfc_get_eq_hdl(eqidx) (&phba->sli4_hba.hba_eq_hdl[eqidx])
+#define lpfc_get_aff_mask(eqidx) (&phba->sli4_hba.hba_eq_hdl[eqidx].aff_mask)
+#define lpfc_get_irq(eqidx) (phba->sli4_hba.hba_eq_hdl[eqidx].irq)
+
 /*BB Credit recovery value*/
 struct lpfc_bbscn_params {
 	uint32_t word0;
@@ -561,11 +572,10 @@ struct lpfc_sli4_lnk_info {
 #define LPFC_SLI4_HANDLER_CNT		(LPFC_HBA_IO_CHAN_MAX+ \
 					 LPFC_FOF_IO_CHAN_NUM)
 
-/* Used for IRQ vector to CPU mapping */
+/* Used for tracking CPU mapping attributes */
 struct lpfc_vector_map_info {
 	uint16_t	phys_id;
 	uint16_t	core_id;
-	uint16_t	irq;
 	uint16_t	eq;
 	uint16_t	hdwq;
 	uint16_t	flag;
@@ -908,6 +918,7 @@ struct lpfc_sli4_hba {
 	struct lpfc_vector_map_info *cpu_map;
 	uint16_t num_possible_cpu;
 	uint16_t num_present_cpu;
+	struct cpumask numa_mask;
 	uint16_t curr_disp_cpu;
 	struct lpfc_eq_intr_info __percpu *eq_info;
 	uint32_t conf_trunk;
-- 
2.13.7


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 10/11] lpfc: Add enablement of multiple adapter dumps
  2019-11-05  0:56 [PATCH 00/11] lpfc: Update lpfc to revision 12.6.0.1 James Smart
                   ` (8 preceding siblings ...)
  2019-11-05  0:57 ` [PATCH 09/11] lpfc: Change default IRQ model on AMD architectures James Smart
@ 2019-11-05  0:57 ` James Smart
  2019-11-05  0:57 ` [PATCH 11/11] lpfc: Update lpfc version to 12.6.0.1 James Smart
  2019-11-06  4:45 ` [PATCH 00/11] lpfc: Update lpfc to revision 12.6.0.1 Martin K. Petersen
  11 siblings, 0 replies; 13+ messages in thread
From: James Smart @ 2019-11-05  0:57 UTC (permalink / raw)
  To: linux-scsi; +Cc: James Smart, Dick Kennedy

Some adapters can hold multiple adapter dumps on the adapter flash.
Some adapters default to enabling this feature while others default
to single-dump mode.

Make the behavior uniform by enabling dual dump by default.
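
For reference, a condensed sketch of the flow the lpfc_sli.c hunk below
implements (error-path logging elided; all names are taken from the diff):

	/* Request dual dump mode via the SET_FEATURES mailbox command */
	lpfc_set_features(phba, mboxq, LPFC_SET_DUAL_DUMP);
	rc = lpfc_sli_issue_mbox(phba, mboxq, MBX_POLL);

	/* The firmware echoes the resulting mode back in the dd field */
	dd = bf_get(lpfc_mbx_set_feature_dd, &mboxq->u.mqe.un.set_feature);
	if (rc == MBX_SUCCESS && dd == LPFC_ENABLE_DUAL_DUMP)
		lpfc_printf_log(phba, KERN_ERR, LOG_SLI | LOG_INIT,
				"6448 Dual Dump is enabled\n");

The request is best-effort: if the mailbox command fails or the firmware
does not grant dual dump, port setup logs the result and continues in
single-dump mode.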

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
---
 drivers/scsi/lpfc/lpfc_hw4.h | 10 ++++++++++
 drivers/scsi/lpfc/lpfc_sli.c | 27 ++++++++++++++++++++++++++-
 2 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/lpfc/lpfc_hw4.h b/drivers/scsi/lpfc/lpfc_hw4.h
index 9daa2b494b5c..25cdcbc2b02f 100644
--- a/drivers/scsi/lpfc/lpfc_hw4.h
+++ b/drivers/scsi/lpfc/lpfc_hw4.h
@@ -3530,6 +3530,7 @@ struct lpfc_sli4_parameters {
 
 #define LPFC_SET_UE_RECOVERY		0x10
 #define LPFC_SET_MDS_DIAGS		0x11
+#define LPFC_SET_DUAL_DUMP		0x1e
 struct lpfc_mbx_set_feature {
 	struct mbox_header header;
 	uint32_t feature;
@@ -3544,6 +3545,15 @@ struct lpfc_mbx_set_feature {
 #define lpfc_mbx_set_feature_mds_deep_loopbk_SHIFT  1
 #define lpfc_mbx_set_feature_mds_deep_loopbk_MASK   0x00000001
 #define lpfc_mbx_set_feature_mds_deep_loopbk_WORD   word6
+#define lpfc_mbx_set_feature_dd_SHIFT		0
+#define lpfc_mbx_set_feature_dd_MASK		0x00000001
+#define lpfc_mbx_set_feature_dd_WORD		word6
+#define lpfc_mbx_set_feature_ddquery_SHIFT	1
+#define lpfc_mbx_set_feature_ddquery_MASK	0x00000001
+#define lpfc_mbx_set_feature_ddquery_WORD	word6
+#define LPFC_DISABLE_DUAL_DUMP		0
+#define LPFC_ENABLE_DUAL_DUMP		1
+#define LPFC_QUERY_OP_DUAL_DUMP		2
 	uint32_t word7;
 #define lpfc_mbx_set_feature_UERP_SHIFT 0
 #define lpfc_mbx_set_feature_UERP_MASK  0x0000ffff
diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
index 5f097328d236..b59bb75fbc78 100644
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -6203,6 +6203,14 @@ lpfc_set_features(struct lpfc_hba *phba, LPFC_MBOXQ_t *mbox,
 		mbox->u.mqe.un.set_feature.feature = LPFC_SET_MDS_DIAGS;
 		mbox->u.mqe.un.set_feature.param_len = 8;
 		break;
+	case LPFC_SET_DUAL_DUMP:
+		bf_set(lpfc_mbx_set_feature_dd,
+		       &mbox->u.mqe.un.set_feature, LPFC_ENABLE_DUAL_DUMP);
+		bf_set(lpfc_mbx_set_feature_ddquery,
+		       &mbox->u.mqe.un.set_feature, 0);
+		mbox->u.mqe.un.set_feature.feature = LPFC_SET_DUAL_DUMP;
+		mbox->u.mqe.un.set_feature.param_len = 4;
+		break;
 	}
 
 	return;
@@ -7200,7 +7208,7 @@ lpfc_post_rq_buffer(struct lpfc_hba *phba, struct lpfc_queue *hrq,
 int
 lpfc_sli4_hba_setup(struct lpfc_hba *phba)
 {
-	int rc, i, cnt, len;
+	int rc, i, cnt, len, dd;
 	LPFC_MBOXQ_t *mboxq;
 	struct lpfc_mqe *mqe;
 	uint8_t *vpd;
@@ -7451,6 +7459,23 @@ lpfc_sli4_hba_setup(struct lpfc_hba *phba)
 	phba->sli3_options |= (LPFC_SLI3_NPIV_ENABLED | LPFC_SLI3_HBQ_ENABLED);
 	spin_unlock_irq(&phba->hbalock);
 
+	/* Always try to enable the dual dump feature if we can */
+	lpfc_set_features(phba, mboxq, LPFC_SET_DUAL_DUMP);
+	rc = lpfc_sli_issue_mbox(phba, mboxq, MBX_POLL);
+	dd = bf_get(lpfc_mbx_set_feature_dd, &mboxq->u.mqe.un.set_feature);
+	if ((rc == MBX_SUCCESS) && (dd == LPFC_ENABLE_DUAL_DUMP))
+		lpfc_printf_log(phba, KERN_ERR, LOG_SLI | LOG_INIT,
+				"6448 Dual Dump is enabled\n");
+	else
+		lpfc_printf_log(phba, KERN_INFO, LOG_SLI | LOG_INIT,
+				"6447 Dual Dump Mailbox x%x (x%x/x%x) failed, "
+				"rc:x%x dd:x%x\n",
+				bf_get(lpfc_mqe_command, &mboxq->u.mqe),
+				lpfc_sli_config_mbox_subsys_get(
+					phba, mboxq),
+				lpfc_sli_config_mbox_opcode_get(
+					phba, mboxq),
+				rc, dd);
 	/*
 	 * Allocate all resources (xri,rpi,vpi,vfi) now.  Subsequent
 	 * calls depends on these resources to complete port setup.
-- 
2.13.7


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 11/11] lpfc: Update lpfc version to 12.6.0.1
  2019-11-05  0:56 [PATCH 00/11] lpfc: Update lpfc to revision 12.6.0.1 James Smart
                   ` (9 preceding siblings ...)
  2019-11-05  0:57 ` [PATCH 10/11] lpfc: Add enablement of multiple adapter dumps James Smart
@ 2019-11-05  0:57 ` James Smart
  2019-11-06  4:45 ` [PATCH 00/11] lpfc: Update lpfc to revision 12.6.0.1 Martin K. Petersen
  11 siblings, 0 replies; 13+ messages in thread
From: James Smart @ 2019-11-05  0:57 UTC (permalink / raw)
  To: linux-scsi; +Cc: James Smart, Dick Kennedy

Update lpfc version to 12.6.0.1

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
---
 drivers/scsi/lpfc/lpfc_version.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/lpfc/lpfc_version.h b/drivers/scsi/lpfc/lpfc_version.h
index ed1b10b0161e..1532390770f5 100644
--- a/drivers/scsi/lpfc/lpfc_version.h
+++ b/drivers/scsi/lpfc/lpfc_version.h
@@ -20,7 +20,7 @@
  * included with this package.                                     *
  *******************************************************************/
 
-#define LPFC_DRIVER_VERSION "12.6.0.0"
+#define LPFC_DRIVER_VERSION "12.6.0.1"
 #define LPFC_DRIVER_NAME		"lpfc"
 
 /* Used for SLI 2/3 */
-- 
2.13.7


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 00/11] lpfc: Update lpfc to revision 12.6.0.1
  2019-11-05  0:56 [PATCH 00/11] lpfc: Update lpfc to revision 12.6.0.1 James Smart
                   ` (10 preceding siblings ...)
  2019-11-05  0:57 ` [PATCH 11/11] lpfc: Update lpfc version to 12.6.0.1 James Smart
@ 2019-11-06  4:45 ` Martin K. Petersen
  11 siblings, 0 replies; 13+ messages in thread
From: Martin K. Petersen @ 2019-11-06  4:45 UTC (permalink / raw)
  To: James Smart; +Cc: linux-scsi


James,

> Update lpfc to revision 12.6.0.1

Applied to 5.5/scsi-queue.

Sad to see yet another driver with AMD workarounds. It really seems like
this should be addressed in the core managed IRQ code.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 13+ messages in thread

