linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/6] mpt3sas: Hot-Plug Surprise removal support on IOC.
@ 2018-09-26  4:22 Suganath Prabu S
  2018-09-26  4:22 ` [PATCH v4 1/6] mpt3sas: Introduce mpt3sas_base_pci_device_is_available Suganath Prabu S
                   ` (5 more replies)
  0 siblings, 6 replies; 24+ messages in thread
From: Suganath Prabu S @ 2018-09-26  4:22 UTC (permalink / raw)
  To: linux-scsi, linux-pci
  Cc: lukas, andy.shevchenko, Sathya.Prakash, sreekanth.reddy,
	Suganath Prabu S

Posting below set of patches to support PCIe Hot Plug surprise removal,
and few defect fixes.

This is NOT the normal PCIe Hot Plug support, whereby the user informs the
OS that a hot removal is desired, the OS does an orderly shutdown of the
driver on the device, special hot plug circuitry removes power from the
PCIe slot, then the user can remove the device and replace it
(where orderly bring-up of the device is done).

With a true surprise removal (just removing HBA from a slot)
there is a possibility to get all kinds of PCIe transaction errors,
Below patches addresses those issues and remove HBA without bringing
the system down.

For surprise removal detection, driver does a PCI
read of IOC's vendor field in IOC's PCI configuration space.
If the read value is 0xFFFFFFFF this indicates that the device
might have hot removed and the device will be removed from driver.

V1 changes:
In Patch 0001 - unlock mutex, if active reset is in progress.

V2 changes:
Replaced mpt3sas_base_pci_device_is_unplugged with
pci_device_is_present.

V3 Change Set:
Simplified function "mpt3sas_base_pci_device_is_available" and
made inline

V4 Change set:
Reframe split strings in print statement, to avoid
warning from checkpatch.pl.

Suganath Prabu S (6):
  mpt3sas: Introduce      mpt3sas_base_pci_device_is_available
  mpt3sas: Separate out      mpt3sas_wait_for_ioc_to_operational
  mpt3sas: Introdude  _scsih_get_shost_and_ioc.
  mpt3sas: Fix Sync cache command failure during driver      unload.
  mpt3sas: Fix driver modifying NVRAM/persistent data.
  mpt3sas: Bump driver version to 27.100.00.00.

 drivers/scsi/mpt3sas/mpt3sas_base.c      | 133 +++++++++++++++-------
 drivers/scsi/mpt3sas/mpt3sas_base.h      |  11 +-
 drivers/scsi/mpt3sas/mpt3sas_config.c    |  32 +-----
 drivers/scsi/mpt3sas/mpt3sas_ctl.c       |  26 +----
 drivers/scsi/mpt3sas/mpt3sas_scsih.c     | 189 +++++++++++++++++++++++++++----
 drivers/scsi/mpt3sas/mpt3sas_transport.c |  82 +++-----------
 6 files changed, 296 insertions(+), 177 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v4 1/6] mpt3sas: Introduce  mpt3sas_base_pci_device_is_available
  2018-09-26  4:22 [PATCH v4 0/6] mpt3sas: Hot-Plug Surprise removal support on IOC Suganath Prabu S
@ 2018-09-26  4:22 ` Suganath Prabu S
  2018-09-26 21:32   ` Bjorn Helgaas
  2018-09-26  4:22 ` [PATCH v4 2/6] mpt3sas: Separate out mpt3sas_wait_for_ioc_to_operational Suganath Prabu S
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 24+ messages in thread
From: Suganath Prabu S @ 2018-09-26  4:22 UTC (permalink / raw)
  To: linux-scsi, linux-pci
  Cc: lukas, andy.shevchenko, Sathya.Prakash, sreekanth.reddy,
	Suganath Prabu S

* Driver uses "pci_device_is_present" to check whether
If Hot unplugged:
the outstanding IOs with 'DID_NO_CONNECT' before removing the drives
attached to the HBA.
"DID_NO_CONNECT" status and free the smid, if driver detects that
HBA is hot unplugged.

* In the hard reset flush out all the outstanding IOs even if diag reset
fails and also if driver detects that HBA is hot unplugged.

v1 change set:
==============
unlock mutex before goto "out_unlocked",
if active reset is in progress.

v2 change set:
==============
1) Use pci_device_is_present instead of
mpt3sas_base_pci_device_is_unplugged.
2) As suggested by Lukas, removed using
watchdog thread for checking hba hot unplug(Patch 02 of V1).
Added Hot unplug checks in scan finish and reset paths.

v3 Change Set:
=============
Simplified function "mpt3sas_base_pci_device_is_available" and
made inline.

v4 Changes:
===========
Dont split strings in print statement.

Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
---
 drivers/scsi/mpt3sas/mpt3sas_base.c  | 39 ++++++++++++++++++++++++++++
 drivers/scsi/mpt3sas/mpt3sas_base.h  |  3 ++-
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 50 ++++++++++++++++++++++++++++++++----
 3 files changed, 86 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 59d7844..c880e72 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -543,6 +543,20 @@ static int mpt3sas_remove_dead_ioc_func(void *arg)
 }
 
 /**
+ * mpt3sas_base_pci_device_is_available - check whether pci device is
+ *			available for any transactions with FW
+ *
+ * @ioc: per adapter object
+ *
+ * Return 1 if pci device state is up and running else return 0.
+ */
+inline bool
+mpt3sas_base_pci_device_is_available(struct MPT3SAS_ADAPTER *ioc)
+{
+	return !ioc->pci_error_recovery && pci_device_is_present(ioc->pdev);
+}
+
+/**
  * _base_fault_reset_work - workq handling ioc fault conditions
  * @work: input argument, used to derive ioc
  *
@@ -6122,6 +6136,11 @@ _base_diag_reset(struct MPT3SAS_ADAPTER *ioc)
 
 	count = 0;
 	do {
+		if (!pci_device_is_present(ioc->pdev)) {
+			ioc->remove_host = 1;
+			pr_err(MPT3SAS_FMT "Hba Hot unplugged\n", ioc->name);
+			goto out;
+		}
 		/* Write magic sequence to WriteSequence register
 		 * Loop until in diagnostic mode
 		 */
@@ -6853,6 +6872,13 @@ mpt3sas_wait_for_commands_to_complete(struct MPT3SAS_ADAPTER *ioc)
 
 	ioc->pending_io_count = 0;
 
+	if (!mpt3sas_base_pci_device_is_available(ioc)) {
+		pr_err(MPT3SAS_FMT
+		    "%s: pci error recovery reset or pci device unplug occurred\n",
+		    ioc->name, __func__);
+		return;
+	}
+
 	ioc_state = mpt3sas_base_get_iocstate(ioc, 0);
 	if ((ioc_state & MPI2_IOC_STATE_MASK) != MPI2_IOC_STATE_OPERATIONAL)
 		return;
@@ -6899,6 +6925,19 @@ mpt3sas_base_hard_reset_handler(struct MPT3SAS_ADAPTER *ioc,
 	/* wait for an active reset in progress to complete */
 	mutex_lock(&ioc->reset_in_progress_mutex);
 
+	if (!mpt3sas_base_pci_device_is_available(ioc)) {
+		pr_err(MPT3SAS_FMT
+		    "%s: pci error recovery reset or pci device unplug occurred\n",
+		    ioc->name, __func__);
+		if (!pci_device_is_present(ioc->pdev))
+			ioc->schedule_dead_ioc_flush_running_cmds(ioc);
+		r = 0;
+		mutex_unlock(&ioc->reset_in_progress_mutex);
+		goto out_unlocked;
+	}
+
+	mpt3sas_halt_firmware(ioc);
+
 	spin_lock_irqsave(&ioc->ioc_reset_in_progress_lock, flags);
 	ioc->shost_recovery = 1;
 	spin_unlock_irqrestore(&ioc->ioc_reset_in_progress_lock, flags);
diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h b/drivers/scsi/mpt3sas/mpt3sas_base.h
index 96dc15e..a802ad4 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.h
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.h
@@ -1474,7 +1474,8 @@ void mpt3sas_base_update_missing_delay(struct MPT3SAS_ADAPTER *ioc,
 	u16 device_missing_delay, u8 io_missing_delay);
 
 int mpt3sas_port_enable(struct MPT3SAS_ADAPTER *ioc);
-
+inline bool  mpt3sas_base_pci_device_is_available(
+	struct MPT3SAS_ADAPTER *ioc);
 void
 mpt3sas_wait_for_commands_to_complete(struct MPT3SAS_ADAPTER *ioc);
 
diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 53133cf..566a550 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -2846,9 +2846,19 @@ scsih_abort(struct scsi_cmnd *scmd)
 		"attempting task abort! scmd(%p)\n", scmd);
 	_scsih_tm_display_info(ioc, scmd);
 
+	if (!pci_device_is_present(ioc->pdev) || ioc->remove_host) {
+		sdev_printk(KERN_INFO, scmd->device, "%s scmd(%p)\n",
+		    ((ioc->remove_host) ? ("shost is getting removed!") :
+		    ("pci device been removed!")), scmd);
+		if (st && st->smid)
+			mpt3sas_base_free_smid(ioc, st->smid);
+		scmd->result = DID_NO_CONNECT << 16;
+		r = FAST_IO_FAIL;
+		goto out;
+	}
+
 	sas_device_priv_data = scmd->device->hostdata;
-	if (!sas_device_priv_data || !sas_device_priv_data->sas_target ||
-	    ioc->remove_host) {
+	if (!sas_device_priv_data || !sas_device_priv_data->sas_target) {
 		sdev_printk(KERN_INFO, scmd->device,
 			"device been deleted! scmd(%p)\n", scmd);
 		scmd->result = DID_NO_CONNECT << 16;
@@ -2918,6 +2928,15 @@ scsih_dev_reset(struct scsi_cmnd *scmd)
 		"attempting device reset! scmd(%p)\n", scmd);
 	_scsih_tm_display_info(ioc, scmd);
 
+	if (!pci_device_is_present(ioc->pdev) || ioc->remove_host) {
+		sdev_printk(KERN_INFO, scmd->device, "%s scmd(%p)\n",
+		    ((ioc->remove_host) ? ("shost is getting removed!") :
+		    ("pci device been removed!")), scmd);
+		scmd->result = DID_NO_CONNECT << 16;
+		r = FAST_IO_FAIL;
+		goto out;
+	}
+
 	sas_device_priv_data = scmd->device->hostdata;
 	if (!sas_device_priv_data || !sas_device_priv_data->sas_target ||
 	    ioc->remove_host) {
@@ -2995,9 +3014,17 @@ scsih_target_reset(struct scsi_cmnd *scmd)
 		scmd);
 	_scsih_tm_display_info(ioc, scmd);
 
+	if ((!pci_device_is_present(ioc->pdev)) || ioc->remove_host) {
+		sdev_printk(KERN_INFO, scmd->device, "%s scmd(%p)\n",
+		    ((ioc->remove_host) ? ("shost is getting removed!") :
+		    ("pci device been removed!")), scmd);
+		scmd->result = DID_NO_CONNECT << 16;
+		r = FAST_IO_FAIL;
+		goto out;
+	}
+
 	sas_device_priv_data = scmd->device->hostdata;
-	if (!sas_device_priv_data || !sas_device_priv_data->sas_target ||
-	    ioc->remove_host) {
+	if (!sas_device_priv_data || !sas_device_priv_data->sas_target) {
 		starget_printk(KERN_INFO, starget, "target been deleted! scmd(%p)\n",
 			scmd);
 		scmd->result = DID_NO_CONNECT << 16;
@@ -4474,7 +4501,9 @@ _scsih_flush_running_cmds(struct MPT3SAS_ADAPTER *ioc)
 		st = scsi_cmd_priv(scmd);
 		mpt3sas_base_clear_st(ioc, st);
 		scsi_dma_unmap(scmd);
-		if (ioc->pci_error_recovery || ioc->remove_host)
+
+		if ((!mpt3sas_base_pci_device_is_available(ioc))
+				|| ioc->remove_host)
 			scmd->result = DID_NO_CONNECT << 16;
 		else
 			scmd->result = DID_RESET << 16;
@@ -9726,6 +9755,9 @@ _scsih_ir_shutdown(struct MPT3SAS_ADAPTER *ioc)
 	if (list_empty(&ioc->raid_device_list))
 		return;
 
+	if (!pci_device_is_present(ioc->pdev))
+		return;
+
 	mutex_lock(&ioc->scsih_cmds.mutex);
 
 	if (ioc->scsih_cmds.status != MPT3_CMD_NOT_USED) {
@@ -10247,6 +10279,14 @@ scsih_scan_finished(struct Scsi_Host *shost, unsigned long time)
 {
 	struct MPT3SAS_ADAPTER *ioc = shost_priv(shost);
 
+	if (!pci_device_is_present(ioc->pdev)) {
+		complete(&ioc->port_enable_cmds.done);
+		ioc->port_enable_cmds.status = MPT3_CMD_NOT_USED;
+		ioc->is_driver_loading = 0;
+		ioc->remove_host = 1;
+		return 1;
+	}
+
 	if (disable_discovery > 0) {
 		ioc->is_driver_loading = 0;
 		ioc->wait_for_discovery_to_complete = 0;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v4 2/6] mpt3sas: Separate out  mpt3sas_wait_for_ioc_to_operational
  2018-09-26  4:22 [PATCH v4 0/6] mpt3sas: Hot-Plug Surprise removal support on IOC Suganath Prabu S
  2018-09-26  4:22 ` [PATCH v4 1/6] mpt3sas: Introduce mpt3sas_base_pci_device_is_available Suganath Prabu S
@ 2018-09-26  4:22 ` Suganath Prabu S
  2018-09-26 21:03   ` Bjorn Helgaas
  2018-09-26  4:22 ` [PATCH v4 3/6] mpt3sas: Introdude _scsih_get_shost_and_ioc Suganath Prabu S
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 24+ messages in thread
From: Suganath Prabu S @ 2018-09-26  4:22 UTC (permalink / raw)
  To: linux-scsi, linux-pci
  Cc: lukas, andy.shevchenko, Sathya.Prakash, sreekanth.reddy,
	Suganath Prabu S

Introduce mpt3sas_wait_for_ioc_to_operational.

This section of code "wait for IOC to be operational"
is used in many places across the driver,
and hence moved this section of code in to the function
"mpt3sas_wait_for_ioc_to_operational".

Also added HBA hot unplug checks, and this returns with
error code EFAULT, if it detects HBA is hot unplugged
or IOC is not in operational state.

V2 change set:
used pci_device_is_present instead of
mpt3sas_base_pci_device_is_unplugged

v4 Change set:
Dont split strings in print statement

Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
---
 drivers/scsi/mpt3sas/mpt3sas_base.c      | 92 +++++++++++++++++++-------------
 drivers/scsi/mpt3sas/mpt3sas_base.h      |  4 ++
 drivers/scsi/mpt3sas/mpt3sas_config.c    | 28 +++-------
 drivers/scsi/mpt3sas/mpt3sas_ctl.c       | 26 ++-------
 drivers/scsi/mpt3sas/mpt3sas_transport.c | 75 +++++---------------------
 5 files changed, 81 insertions(+), 144 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
index c880e72..9f1d8fb 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -5176,6 +5176,53 @@ _base_send_ioc_reset(struct MPT3SAS_ADAPTER *ioc, u8 reset_type, int timeout)
 }
 
 /**
+ * mpt3sas_wait_for_ioc_to_operational - IOC's operational
+ *		state and HBA hot unplug status are checked here.
+ * @ioc: per adapter object
+ * @wait_count: timeout in seconds
+ *
+ * Return:  Returns EFAULT, if HBA is hot unplugged or IOC is
+ * not in operational state, within the wait_count.
+ * And returns 0, If not hot unplugged Or ioc is in
+ * operational state.
+ */
+
+int
+mpt3sas_wait_for_ioc_to_operational(struct MPT3SAS_ADAPTER *ioc,
+	int wait_count)
+{
+	int wait_state_count = 0;
+	u32 ioc_state;
+
+	if (!pci_device_is_present(ioc->pdev))
+		return -EFAULT;
+
+	ioc_state = mpt3sas_base_get_iocstate(ioc, 1);
+	while (ioc_state != MPI2_IOC_STATE_OPERATIONAL) {
+
+		if (!pci_device_is_present(ioc->pdev))
+			return -EFAULT;
+
+		if (wait_state_count++ == wait_count) {
+			pr_err(MPT3SAS_FMT
+			    "%s: failed due to ioc not operational\n",
+			    ioc->name, __func__);
+			return -EFAULT;
+		}
+		ssleep(1);
+		ioc_state = mpt3sas_base_get_iocstate(ioc, 1);
+		pr_info(MPT3SAS_FMT
+		    "%s: waiting for operational state(count=%d)\n",
+		     ioc->name, __func__, wait_state_count);
+	}
+	if (wait_state_count)
+		pr_info(MPT3SAS_FMT "%s: ioc is operational\n",
+		    ioc->name, __func__);
+
+	return 0;
+}
+
+/**
  * _base_handshake_req_reply_wait - send request thru doorbell interface
  * @ioc: per adapter object
  * @request_bytes: request length
@@ -5316,11 +5363,9 @@ mpt3sas_base_sas_iounit_control(struct MPT3SAS_ADAPTER *ioc,
 	Mpi2SasIoUnitControlRequest_t *mpi_request)
 {
 	u16 smid;
-	u32 ioc_state;
 	u8 issue_reset = 0;
 	int rc;
 	void *request;
-	u16 wait_state_count;
 
 	dinitprintk(ioc, pr_info(MPT3SAS_FMT "%s\n", ioc->name,
 	    __func__));
@@ -5334,22 +5379,10 @@ mpt3sas_base_sas_iounit_control(struct MPT3SAS_ADAPTER *ioc,
 		goto out;
 	}
 
-	wait_state_count = 0;
-	ioc_state = mpt3sas_base_get_iocstate(ioc, 1);
-	while (ioc_state != MPI2_IOC_STATE_OPERATIONAL) {
-		if (wait_state_count++ == 10) {
-			pr_err(MPT3SAS_FMT
-			    "%s: failed due to ioc not operational\n",
-			    ioc->name, __func__);
-			rc = -EFAULT;
-			goto out;
-		}
-		ssleep(1);
-		ioc_state = mpt3sas_base_get_iocstate(ioc, 1);
-		pr_info(MPT3SAS_FMT
-			"%s: waiting for operational state(count=%d)\n",
-			ioc->name, __func__, wait_state_count);
-	}
+	rc = mpt3sas_wait_for_ioc_to_operational(ioc,
+					IOC_OPERATIONAL_WAIT_COUNT);
+	if (rc)
+		goto out;
 
 	smid = mpt3sas_base_get_smid(ioc, ioc->base_cb_idx);
 	if (!smid) {
@@ -5416,11 +5449,9 @@ mpt3sas_base_scsi_enclosure_processor(struct MPT3SAS_ADAPTER *ioc,
 	Mpi2SepReply_t *mpi_reply, Mpi2SepRequest_t *mpi_request)
 {
 	u16 smid;
-	u32 ioc_state;
 	u8 issue_reset = 0;
 	int rc;
 	void *request;
-	u16 wait_state_count;
 
 	dinitprintk(ioc, pr_info(MPT3SAS_FMT "%s\n", ioc->name,
 	    __func__));
@@ -5434,23 +5465,10 @@ mpt3sas_base_scsi_enclosure_processor(struct MPT3SAS_ADAPTER *ioc,
 		goto out;
 	}
 
-	wait_state_count = 0;
-	ioc_state = mpt3sas_base_get_iocstate(ioc, 1);
-	while (ioc_state != MPI2_IOC_STATE_OPERATIONAL) {
-		if (wait_state_count++ == 10) {
-			pr_err(MPT3SAS_FMT
-			    "%s: failed due to ioc not operational\n",
-			    ioc->name, __func__);
-			rc = -EFAULT;
-			goto out;
-		}
-		ssleep(1);
-		ioc_state = mpt3sas_base_get_iocstate(ioc, 1);
-		pr_info(MPT3SAS_FMT
-			"%s: waiting for operational state(count=%d)\n",
-			ioc->name,
-		    __func__, wait_state_count);
-	}
+	rc = mpt3sas_wait_for_ioc_to_operational(ioc,
+					IOC_OPERATIONAL_WAIT_COUNT);
+	if (rc)
+		goto out;
 
 	smid = mpt3sas_base_get_smid(ioc, ioc->base_cb_idx);
 	if (!smid) {
diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h b/drivers/scsi/mpt3sas/mpt3sas_base.h
index a802ad4..f0351a2 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.h
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.h
@@ -139,6 +139,8 @@
 #define DEFAULT_NUM_FWCHAIN_ELEMTS	8
 
 #define FW_IMG_HDR_READ_TIMEOUT	15
+
+#define IOC_OPERATIONAL_WAIT_COUNT	10
 /*
  * NVMe defines
  */
@@ -1481,6 +1483,8 @@ mpt3sas_wait_for_commands_to_complete(struct MPT3SAS_ADAPTER *ioc);
 
 u8 mpt3sas_base_check_cmd_timeout(struct MPT3SAS_ADAPTER *ioc,
 	u8 status, void *mpi_request, int sz);
+int mpt3sas_wait_for_ioc_to_operational(struct MPT3SAS_ADAPTER *ioc,
+	int wait_count);
 
 /* scsih shared API */
 struct scsi_cmnd *mpt3sas_scsih_scsi_lookup_get(struct MPT3SAS_ADAPTER *ioc,
diff --git a/drivers/scsi/mpt3sas/mpt3sas_config.c b/drivers/scsi/mpt3sas/mpt3sas_config.c
index d29a2dc..5713a2d 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_config.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_config.c
@@ -303,11 +303,10 @@ _config_request(struct MPT3SAS_ADAPTER *ioc, Mpi2ConfigRequest_t
 	void *config_page, u16 config_page_sz)
 {
 	u16 smid;
-	u32 ioc_state;
 	Mpi2ConfigRequest_t *config_request;
 	int r;
 	u8 retry_count, issue_host_reset = 0;
-	u16 wait_state_count;
+
 	struct config_request mem;
 	u32 ioc_status = UINT_MAX;
 
@@ -365,26 +364,11 @@ _config_request(struct MPT3SAS_ADAPTER *ioc, Mpi2ConfigRequest_t
 		pr_info(MPT3SAS_FMT "%s: attempting retry (%d)\n",
 		    ioc->name, __func__, retry_count);
 	}
-	wait_state_count = 0;
-	ioc_state = mpt3sas_base_get_iocstate(ioc, 1);
-	while (ioc_state != MPI2_IOC_STATE_OPERATIONAL) {
-		if (wait_state_count++ == MPT3_CONFIG_PAGE_DEFAULT_TIMEOUT) {
-			pr_err(MPT3SAS_FMT
-			    "%s: failed due to ioc not operational\n",
-			    ioc->name, __func__);
-			ioc->config_cmds.status = MPT3_CMD_NOT_USED;
-			r = -EFAULT;
-			goto free_mem;
-		}
-		ssleep(1);
-		ioc_state = mpt3sas_base_get_iocstate(ioc, 1);
-		pr_info(MPT3SAS_FMT
-			"%s: waiting for operational state(count=%d)\n",
-			ioc->name, __func__, wait_state_count);
-	}
-	if (wait_state_count)
-		pr_info(MPT3SAS_FMT "%s: ioc is operational\n",
-		    ioc->name, __func__);
+
+	r = mpt3sas_wait_for_ioc_to_operational(ioc,
+				MPT3_CONFIG_PAGE_DEFAULT_TIMEOUT);
+	if (r)
+		goto free_mem;
 
 	smid = mpt3sas_base_get_smid(ioc, ioc->config_cb_idx);
 	if (!smid) {
diff --git a/drivers/scsi/mpt3sas/mpt3sas_ctl.c b/drivers/scsi/mpt3sas/mpt3sas_ctl.c
index 5e8c059..a46039c 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_ctl.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_ctl.c
@@ -652,7 +652,6 @@ _ctl_do_mpt_command(struct MPT3SAS_ADAPTER *ioc, struct mpt3_ioctl_command karg,
 	MPI2DefaultReply_t *mpi_reply;
 	Mpi26NVMeEncapsulatedRequest_t *nvme_encap_request = NULL;
 	struct _pcie_device *pcie_device = NULL;
-	u32 ioc_state;
 	u16 smid;
 	u8 timeout;
 	u8 issue_reset;
@@ -665,7 +664,6 @@ _ctl_do_mpt_command(struct MPT3SAS_ADAPTER *ioc, struct mpt3_ioctl_command karg,
 	dma_addr_t data_in_dma = 0;
 	size_t data_in_sz = 0;
 	long ret;
-	u16 wait_state_count;
 	u16 device_handle = MPT3SAS_INVALID_DEVICE_HANDLE;
 	u8 tr_method = MPI26_SCSITASKMGMT_MSGFLAGS_PROTOCOL_LVL_RST_PCIE;
 
@@ -678,26 +676,10 @@ _ctl_do_mpt_command(struct MPT3SAS_ADAPTER *ioc, struct mpt3_ioctl_command karg,
 		goto out;
 	}
 
-	wait_state_count = 0;
-	ioc_state = mpt3sas_base_get_iocstate(ioc, 1);
-	while (ioc_state != MPI2_IOC_STATE_OPERATIONAL) {
-		if (wait_state_count++ == 10) {
-			pr_err(MPT3SAS_FMT
-			    "%s: failed due to ioc not operational\n",
-			    ioc->name, __func__);
-			ret = -EFAULT;
-			goto out;
-		}
-		ssleep(1);
-		ioc_state = mpt3sas_base_get_iocstate(ioc, 1);
-		pr_info(MPT3SAS_FMT
-			"%s: waiting for operational state(count=%d)\n",
-			ioc->name,
-		    __func__, wait_state_count);
-	}
-	if (wait_state_count)
-		pr_info(MPT3SAS_FMT "%s: ioc is operational\n",
-		    ioc->name, __func__);
+	ret = mpt3sas_wait_for_ioc_to_operational(ioc,
+					IOC_OPERATIONAL_WAIT_COUNT);
+	if (ret)
+		goto out;
 
 	mpi_request = kzalloc(ioc->request_sz, GFP_KERNEL);
 	if (!mpi_request) {
diff --git a/drivers/scsi/mpt3sas/mpt3sas_transport.c b/drivers/scsi/mpt3sas/mpt3sas_transport.c
index f8cc267..b10d73e 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_transport.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_transport.c
@@ -299,7 +299,6 @@ _transport_expander_report_manufacture(struct MPT3SAS_ADAPTER *ioc,
 	struct rep_manu_request *manufacture_request;
 	int rc;
 	u16 smid;
-	u32 ioc_state;
 	void *psge;
 	u8 issue_reset = 0;
 	void *data_out = NULL;
@@ -307,7 +306,6 @@ _transport_expander_report_manufacture(struct MPT3SAS_ADAPTER *ioc,
 	dma_addr_t data_in_dma;
 	size_t data_in_sz;
 	size_t data_out_sz;
-	u16 wait_state_count;
 
 	if (ioc->shost_recovery || ioc->pci_error_recovery) {
 		pr_info(MPT3SAS_FMT "%s: host reset in progress!\n",
@@ -325,25 +323,10 @@ _transport_expander_report_manufacture(struct MPT3SAS_ADAPTER *ioc,
 	}
 	ioc->transport_cmds.status = MPT3_CMD_PENDING;
 
-	wait_state_count = 0;
-	ioc_state = mpt3sas_base_get_iocstate(ioc, 1);
-	while (ioc_state != MPI2_IOC_STATE_OPERATIONAL) {
-		if (wait_state_count++ == 10) {
-			pr_err(MPT3SAS_FMT
-			    "%s: failed due to ioc not operational\n",
-			    ioc->name, __func__);
-			rc = -EFAULT;
-			goto out;
-		}
-		ssleep(1);
-		ioc_state = mpt3sas_base_get_iocstate(ioc, 1);
-		pr_info(MPT3SAS_FMT
-			"%s: waiting for operational state(count=%d)\n",
-			ioc->name, __func__, wait_state_count);
-	}
-	if (wait_state_count)
-		pr_info(MPT3SAS_FMT "%s: ioc is operational\n",
-		    ioc->name, __func__);
+	rc = mpt3sas_wait_for_ioc_to_operational(ioc,
+					IOC_OPERATIONAL_WAIT_COUNT);
+	if (rc)
+		goto out;
 
 	smid = mpt3sas_base_get_smid(ioc, ioc->transport_cb_idx);
 	if (!smid) {
@@ -1089,13 +1072,11 @@ _transport_get_expander_phy_error_log(struct MPT3SAS_ADAPTER *ioc,
 	struct phy_error_log_reply *phy_error_log_reply;
 	int rc;
 	u16 smid;
-	u32 ioc_state;
 	void *psge;
 	u8 issue_reset = 0;
 	void *data_out = NULL;
 	dma_addr_t data_out_dma;
 	u32 sz;
-	u16 wait_state_count;
 
 	if (ioc->shost_recovery || ioc->pci_error_recovery) {
 		pr_info(MPT3SAS_FMT "%s: host reset in progress!\n",
@@ -1113,25 +1094,10 @@ _transport_get_expander_phy_error_log(struct MPT3SAS_ADAPTER *ioc,
 	}
 	ioc->transport_cmds.status = MPT3_CMD_PENDING;
 
-	wait_state_count = 0;
-	ioc_state = mpt3sas_base_get_iocstate(ioc, 1);
-	while (ioc_state != MPI2_IOC_STATE_OPERATIONAL) {
-		if (wait_state_count++ == 10) {
-			pr_err(MPT3SAS_FMT
-			    "%s: failed due to ioc not operational\n",
-			    ioc->name, __func__);
-			rc = -EFAULT;
-			goto out;
-		}
-		ssleep(1);
-		ioc_state = mpt3sas_base_get_iocstate(ioc, 1);
-		pr_info(MPT3SAS_FMT
-			"%s: waiting for operational state(count=%d)\n",
-			ioc->name, __func__, wait_state_count);
-	}
-	if (wait_state_count)
-		pr_info(MPT3SAS_FMT "%s: ioc is operational\n",
-		    ioc->name, __func__);
+	rc = mpt3sas_wait_for_ioc_to_operational(ioc,
+					IOC_OPERATIONAL_WAIT_COUNT);
+	if (rc)
+		goto out;
 
 	smid = mpt3sas_base_get_smid(ioc, ioc->transport_cb_idx);
 	if (!smid) {
@@ -1402,13 +1368,11 @@ _transport_expander_phy_control(struct MPT3SAS_ADAPTER *ioc,
 	struct phy_control_reply *phy_control_reply;
 	int rc;
 	u16 smid;
-	u32 ioc_state;
 	void *psge;
 	u8 issue_reset = 0;
 	void *data_out = NULL;
 	dma_addr_t data_out_dma;
 	u32 sz;
-	u16 wait_state_count;
 
 	if (ioc->shost_recovery || ioc->pci_error_recovery) {
 		pr_info(MPT3SAS_FMT "%s: host reset in progress!\n",
@@ -1426,25 +1390,10 @@ _transport_expander_phy_control(struct MPT3SAS_ADAPTER *ioc,
 	}
 	ioc->transport_cmds.status = MPT3_CMD_PENDING;
 
-	wait_state_count = 0;
-	ioc_state = mpt3sas_base_get_iocstate(ioc, 1);
-	while (ioc_state != MPI2_IOC_STATE_OPERATIONAL) {
-		if (wait_state_count++ == 10) {
-			pr_err(MPT3SAS_FMT
-			    "%s: failed due to ioc not operational\n",
-			    ioc->name, __func__);
-			rc = -EFAULT;
-			goto out;
-		}
-		ssleep(1);
-		ioc_state = mpt3sas_base_get_iocstate(ioc, 1);
-		pr_info(MPT3SAS_FMT
-			"%s: waiting for operational state(count=%d)\n",
-			ioc->name, __func__, wait_state_count);
-	}
-	if (wait_state_count)
-		pr_info(MPT3SAS_FMT "%s: ioc is operational\n",
-		    ioc->name, __func__);
+	rc = mpt3sas_wait_for_ioc_to_operational(ioc,
+					IOC_OPERATIONAL_WAIT_COUNT);
+	if (rc)
+		goto out;
 
 	smid = mpt3sas_base_get_smid(ioc, ioc->transport_cb_idx);
 	if (!smid) {
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v4 3/6] mpt3sas: Introdude  _scsih_get_shost_and_ioc.
  2018-09-26  4:22 [PATCH v4 0/6] mpt3sas: Hot-Plug Surprise removal support on IOC Suganath Prabu S
  2018-09-26  4:22 ` [PATCH v4 1/6] mpt3sas: Introduce mpt3sas_base_pci_device_is_available Suganath Prabu S
  2018-09-26  4:22 ` [PATCH v4 2/6] mpt3sas: Separate out mpt3sas_wait_for_ioc_to_operational Suganath Prabu S
@ 2018-09-26  4:22 ` Suganath Prabu S
  2018-09-26 21:09   ` Bjorn Helgaas
  2018-09-26 21:33   ` Bjorn Helgaas
  2018-09-26  4:22 ` [PATCH v4 4/6] mpt3sas: Fix Sync cache command failure during driver unload Suganath Prabu S
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 24+ messages in thread
From: Suganath Prabu S @ 2018-09-26  4:22 UTC (permalink / raw)
  To: linux-scsi, linux-pci
  Cc: lukas, andy.shevchenko, Sathya.Prakash, sreekanth.reddy,
	Suganath Prabu S

The code for getting shost and IOC is redundant so
moved that to function "scsih_get_shost_and_ioc".
Also checks for NULL are added to IOC and shost.

Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
---
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 98 ++++++++++++++++++++++++++++++------
 1 file changed, 82 insertions(+), 16 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 566a550..f6e92eb 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -9809,6 +9809,35 @@ _scsih_ir_shutdown(struct MPT3SAS_ADAPTER *ioc)
 }
 
 /**
+ * _scsih_get_shost_and_ioc - get shost and ioc
+ *			and verify whether they are NULL or not
+ * @pdev: PCI device struct
+ * @shost: address of scsi host pointer
+ * @ioc: address of HBA adapter pointer
+ *
+ * Return zero if *shost and *ioc are not NULL otherwise return error number.
+ */
+static int
+_scsih_get_shost_and_ioc(struct pci_dev *pdev,
+	struct Scsi_Host **shost, struct MPT3SAS_ADAPTER **ioc)
+{
+	*shost = pci_get_drvdata(pdev);
+	if (*shost == NULL) {
+		dev_err(&pdev->dev, "pdev's driver data is null\n");
+		return -ENXIO;
+	}
+
+	*ioc = shost_priv(*shost);
+	if (*ioc == NULL) {
+		dev_err(&pdev->dev, "shost's private data is null\n");
+		return -ENXIO;
+	}
+
+	return 0;
+}
+
+
+/**
  * scsih_remove - detach and remove add host
  * @pdev: PCI device struct
  *
@@ -9816,8 +9845,8 @@ _scsih_ir_shutdown(struct MPT3SAS_ADAPTER *ioc)
  */
 static void scsih_remove(struct pci_dev *pdev)
 {
-	struct Scsi_Host *shost = pci_get_drvdata(pdev);
-	struct MPT3SAS_ADAPTER *ioc = shost_priv(shost);
+	struct Scsi_Host *shost = NULL;
+	struct MPT3SAS_ADAPTER *ioc = NULL;
 	struct _sas_port *mpt3sas_port, *next_port;
 	struct _raid_device *raid_device, *next;
 	struct MPT3SAS_TARGET *sas_target_priv_data;
@@ -9825,6 +9854,10 @@ static void scsih_remove(struct pci_dev *pdev)
 	struct workqueue_struct	*wq;
 	unsigned long flags;
 
+	if (_scsih_get_shost_and_ioc(pdev, &shost, &ioc)) {
+		dev_err(&pdev->dev, "unable to remove device\n");
+		return;
+	}
 	ioc->remove_host = 1;
 
 	mpt3sas_wait_for_commands_to_complete(ioc);
@@ -9898,11 +9931,16 @@ static void scsih_remove(struct pci_dev *pdev)
 static void
 scsih_shutdown(struct pci_dev *pdev)
 {
-	struct Scsi_Host *shost = pci_get_drvdata(pdev);
-	struct MPT3SAS_ADAPTER *ioc = shost_priv(shost);
+	struct Scsi_Host *shost = NULL;
+	struct MPT3SAS_ADAPTER *ioc = NULL;
 	struct workqueue_struct	*wq;
 	unsigned long flags;
 
+	if (_scsih_get_shost_and_ioc(pdev, &shost, &ioc)) {
+		dev_err(&pdev->dev, "unable to shutdown device\n");
+		return;
+	}
+
 	ioc->remove_host = 1;
 
 	mpt3sas_wait_for_commands_to_complete(ioc);
@@ -10727,10 +10765,16 @@ out_add_shost_fail:
 static int
 scsih_suspend(struct pci_dev *pdev, pm_message_t state)
 {
-	struct Scsi_Host *shost = pci_get_drvdata(pdev);
-	struct MPT3SAS_ADAPTER *ioc = shost_priv(shost);
+	struct Scsi_Host *shost = NULL;
+	struct MPT3SAS_ADAPTER *ioc = NULL;
 	pci_power_t device_state;
+	int rc;
 
+	rc = _scsih_get_shost_and_ioc(pdev, &shost, &ioc);
+	if (rc) {
+		dev_err(&pdev->dev, "unable to suspend device\n");
+		return rc;
+	}
 	mpt3sas_base_stop_watchdog(ioc);
 	flush_scheduled_work();
 	scsi_block_requests(shost);
@@ -10754,11 +10798,17 @@ scsih_suspend(struct pci_dev *pdev, pm_message_t state)
 static int
 scsih_resume(struct pci_dev *pdev)
 {
-	struct Scsi_Host *shost = pci_get_drvdata(pdev);
-	struct MPT3SAS_ADAPTER *ioc = shost_priv(shost);
+	struct Scsi_Host *shost = NULL;
+	struct MPT3SAS_ADAPTER *ioc = NULL;
 	pci_power_t device_state = pdev->current_state;
 	int r;
 
+	r = _scsih_get_shost_and_ioc(pdev, &shost, &ioc);
+	if (r) {
+		dev_err(&pdev->dev, "unable to resume device\n");
+		return r;
+	}
+
 	pr_info(MPT3SAS_FMT
 		"pdev=0x%p, slot=%s, previous operating state [D%d]\n",
 		ioc->name, pdev, pci_name(pdev), device_state);
@@ -10790,9 +10840,13 @@ scsih_resume(struct pci_dev *pdev)
 static pci_ers_result_t
 scsih_pci_error_detected(struct pci_dev *pdev, pci_channel_state_t state)
 {
-	struct Scsi_Host *shost = pci_get_drvdata(pdev);
-	struct MPT3SAS_ADAPTER *ioc = shost_priv(shost);
+	struct Scsi_Host *shost = NULL;
+	struct MPT3SAS_ADAPTER *ioc = NULL;
 
+	if (_scsih_get_shost_and_ioc(pdev, &shost, &ioc)) {
+		dev_err(&pdev->dev, "device unavailable\n");
+		return PCI_ERS_RESULT_DISCONNECT;
+	}
 	pr_info(MPT3SAS_FMT "PCI error: detected callback, state(%d)!!\n",
 	    ioc->name, state);
 
@@ -10827,10 +10881,14 @@ scsih_pci_error_detected(struct pci_dev *pdev, pci_channel_state_t state)
 static pci_ers_result_t
 scsih_pci_slot_reset(struct pci_dev *pdev)
 {
-	struct Scsi_Host *shost = pci_get_drvdata(pdev);
-	struct MPT3SAS_ADAPTER *ioc = shost_priv(shost);
+	struct Scsi_Host *shost = NULL;
+	struct MPT3SAS_ADAPTER *ioc = NULL;
 	int rc;
 
+	if (_scsih_get_shost_and_ioc(pdev, &shost, &ioc)) {
+		dev_err(&pdev->dev, "unable to perform slot reset\n");
+		return PCI_ERS_RESULT_DISCONNECT;
+	}
 	pr_info(MPT3SAS_FMT "PCI error: slot reset callback!!\n",
 	     ioc->name);
 
@@ -10863,9 +10921,13 @@ scsih_pci_slot_reset(struct pci_dev *pdev)
 static void
 scsih_pci_resume(struct pci_dev *pdev)
 {
-	struct Scsi_Host *shost = pci_get_drvdata(pdev);
-	struct MPT3SAS_ADAPTER *ioc = shost_priv(shost);
+	struct Scsi_Host *shost = NULL;
+	struct MPT3SAS_ADAPTER *ioc = NULL;
 
+	if (_scsih_get_shost_and_ioc(pdev, &shost, &ioc)) {
+		dev_err(&pdev->dev, "unable to resume device\n");
+		return;
+	}
 	pr_info(MPT3SAS_FMT "PCI error: resume callback!!\n", ioc->name);
 
 	pci_cleanup_aer_uncorrect_error_status(pdev);
@@ -10880,9 +10942,13 @@ scsih_pci_resume(struct pci_dev *pdev)
 static pci_ers_result_t
 scsih_pci_mmio_enabled(struct pci_dev *pdev)
 {
-	struct Scsi_Host *shost = pci_get_drvdata(pdev);
-	struct MPT3SAS_ADAPTER *ioc = shost_priv(shost);
+	struct Scsi_Host *shost = NULL;
+	struct MPT3SAS_ADAPTER *ioc = NULL;
 
+	if (_scsih_get_shost_and_ioc(pdev, &shost, &ioc)) {
+		dev_err(&pdev->dev, "unable to enable mmio\n");
+		return PCI_ERS_RESULT_DISCONNECT;
+	}
 	pr_info(MPT3SAS_FMT "PCI error: mmio enabled callback!!\n",
 	    ioc->name);
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v4 4/6] mpt3sas: Fix Sync cache command failure during driver  unload.
  2018-09-26  4:22 [PATCH v4 0/6] mpt3sas: Hot-Plug Surprise removal support on IOC Suganath Prabu S
                   ` (2 preceding siblings ...)
  2018-09-26  4:22 ` [PATCH v4 3/6] mpt3sas: Introdude _scsih_get_shost_and_ioc Suganath Prabu S
@ 2018-09-26  4:22 ` Suganath Prabu S
  2018-09-26  4:22 ` [PATCH v4 5/6] mpt3sas: Fix driver modifying NVRAM/persistent data Suganath Prabu S
  2018-09-26  4:22 ` [PATCH v4 6/6] mpt3sas: Bump driver version to 27.100.00.00 Suganath Prabu S
  5 siblings, 0 replies; 24+ messages in thread
From: Suganath Prabu S @ 2018-09-26  4:22 UTC (permalink / raw)
  To: linux-scsi, linux-pci
  Cc: lukas, andy.shevchenko, Sathya.Prakash, sreekanth.reddy,
	Suganath Prabu S

This is to fix Sync cache and start stop command
 failures with DID_NO_CONNECT during driver unload.

1) Release drives first from SML, then remove internally
in driver.
2) And allow sync cache and Start stop commands to firmware,
even when remove_host flag is set

v2 Changeset:
Replaced this function mpt3sas_base_pci_device_is_unplugged
with pci_device_is_present

Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
---
 drivers/scsi/mpt3sas/mpt3sas_scsih.c     | 41 ++++++++++++++++++++++++++++++--
 drivers/scsi/mpt3sas/mpt3sas_transport.c |  7 ++++--
 2 files changed, 44 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index f6e92eb..5d15d06 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -3806,6 +3806,43 @@ _scsih_tm_tr_complete(struct MPT3SAS_ADAPTER *ioc, u16 smid, u8 msix_index,
 	return _scsih_check_for_pending_tm(ioc, smid);
 }
 
+/** _scsih_allow_scmd_to_device - check whether scmd needs to
+ *				 issue to IOC or not.
+ * @ioc: per adapter object
+ * @scmd: pointer to scsi command object
+ *
+ * Returns true if scmd can be issued to IOC otherwise returns false.
+ */
+inline bool _scsih_allow_scmd_to_device(struct MPT3SAS_ADAPTER *ioc,
+	struct scsi_cmnd *scmd)
+{
+
+	if (ioc->pci_error_recovery)
+		return false;
+
+	if (ioc->hba_mpi_version_belonged == MPI2_VERSION) {
+		if (ioc->remove_host)
+			return false;
+
+		return true;
+	}
+
+
+	if (ioc->remove_host) {
+		if (!pci_device_is_present(ioc->pdev))
+			return false;
+
+		switch (scmd->cmnd[0]) {
+		case SYNCHRONIZE_CACHE:
+		case START_STOP:
+			return true;
+		default:
+			return false;
+		}
+	}
+
+	return true;
+}
 
 /**
  * _scsih_sas_control_complete - completion routine
@@ -4640,7 +4677,7 @@ scsih_qcmd(struct Scsi_Host *shost, struct scsi_cmnd *scmd)
 		return 0;
 	}
 
-	if (ioc->pci_error_recovery || ioc->remove_host) {
+	if (!(_scsih_allow_scmd_to_device(ioc, scmd))) {
 		scmd->result = DID_NO_CONNECT << 16;
 		scmd->scsi_done(scmd);
 		return 0;
@@ -9874,6 +9911,7 @@ static void scsih_remove(struct pci_dev *pdev)
 
 	/* release all the volumes */
 	_scsih_ir_shutdown(ioc);
+	sas_remove_host(shost);
 	list_for_each_entry_safe(raid_device, next, &ioc->raid_device_list,
 	    list) {
 		if (raid_device->starget) {
@@ -9916,7 +9954,6 @@ static void scsih_remove(struct pci_dev *pdev)
 		ioc->sas_hba.num_phys = 0;
 	}
 
-	sas_remove_host(shost);
 	mpt3sas_base_detach(ioc);
 	spin_lock(&gioc_lock);
 	list_del(&ioc->list);
diff --git a/drivers/scsi/mpt3sas/mpt3sas_transport.c b/drivers/scsi/mpt3sas/mpt3sas_transport.c
index b10d73e..742da74 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_transport.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_transport.c
@@ -817,10 +817,13 @@ mpt3sas_transport_port_remove(struct MPT3SAS_ADAPTER *ioc, u64 sas_address,
 			    mpt3sas_port->remote_identify.sas_address,
 			    mpt3sas_phy->phy_id);
 		mpt3sas_phy->phy_belongs_to_port = 0;
-		sas_port_delete_phy(mpt3sas_port->port, mpt3sas_phy->phy);
+		if (!ioc->remove_host)
+			sas_port_delete_phy(mpt3sas_port->port,
+						mpt3sas_phy->phy);
 		list_del(&mpt3sas_phy->port_siblings);
 	}
-	sas_port_delete(mpt3sas_port->port);
+	if (!ioc->remove_host)
+		sas_port_delete(mpt3sas_port->port);
 	kfree(mpt3sas_port);
 }
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v4 5/6] mpt3sas: Fix driver modifying NVRAM/persistent data.
  2018-09-26  4:22 [PATCH v4 0/6] mpt3sas: Hot-Plug Surprise removal support on IOC Suganath Prabu S
                   ` (3 preceding siblings ...)
  2018-09-26  4:22 ` [PATCH v4 4/6] mpt3sas: Fix Sync cache command failure during driver unload Suganath Prabu S
@ 2018-09-26  4:22 ` Suganath Prabu S
  2018-09-26  4:22 ` [PATCH v4 6/6] mpt3sas: Bump driver version to 27.100.00.00 Suganath Prabu S
  5 siblings, 0 replies; 24+ messages in thread
From: Suganath Prabu S @ 2018-09-26  4:22 UTC (permalink / raw)
  To: linux-scsi, linux-pci
  Cc: lukas, andy.shevchenko, Sathya.Prakash, sreekanth.reddy,
	Suganath Prabu S

* If EEDPTagMode field in manufacturing page11 is set,
unset it. This is needed to fix a hardware bug
in SAS3/SAS2 cards, So, skipping EEDPTagMode changes
in Manufacturing page11 for SAS35 controllers.

* Fix driver modifying NVRAM/persistent data in
Manufacturing page11 along with current copy. Driver should
change only current copy of Manufacturing page11

Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
---
 drivers/scsi/mpt3sas/mpt3sas_base.c   | 2 +-
 drivers/scsi/mpt3sas/mpt3sas_config.c | 4 ----
 2 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 9f1d8fb..a43579e 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -4122,7 +4122,7 @@ _base_static_config_pages(struct MPT3SAS_ADAPTER *ioc)
 	 * flag unset in NVDATA.
 	 */
 	mpt3sas_config_get_manufacturing_pg11(ioc, &mpi_reply, &ioc->manu_pg11);
-	if (ioc->manu_pg11.EEDPTagMode == 0) {
+	if ((!ioc->is_gen35_ioc) && (ioc->manu_pg11.EEDPTagMode == 0)) {
 		pr_err("%s: overriding NVDATA EEDPTagMode setting\n",
 		    ioc->name);
 		ioc->manu_pg11.EEDPTagMode &= ~0x3;
diff --git a/drivers/scsi/mpt3sas/mpt3sas_config.c b/drivers/scsi/mpt3sas/mpt3sas_config.c
index 5713a2d..f2a326a 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_config.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_config.c
@@ -676,10 +676,6 @@ mpt3sas_config_set_manufacturing_pg11(struct MPT3SAS_ADAPTER *ioc,
 	r = _config_request(ioc, &mpi_request, mpi_reply,
 	    MPT3_CONFIG_PAGE_DEFAULT_TIMEOUT, config_page,
 	    sizeof(*config_page));
-	mpi_request.Action = MPI2_CONFIG_ACTION_PAGE_WRITE_NVRAM;
-	r = _config_request(ioc, &mpi_request, mpi_reply,
-	    MPT3_CONFIG_PAGE_DEFAULT_TIMEOUT, config_page,
-	    sizeof(*config_page));
  out:
 	return r;
 }
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v4 6/6] mpt3sas: Bump driver version to 27.100.00.00.
  2018-09-26  4:22 [PATCH v4 0/6] mpt3sas: Hot-Plug Surprise removal support on IOC Suganath Prabu S
                   ` (4 preceding siblings ...)
  2018-09-26  4:22 ` [PATCH v4 5/6] mpt3sas: Fix driver modifying NVRAM/persistent data Suganath Prabu S
@ 2018-09-26  4:22 ` Suganath Prabu S
  5 siblings, 0 replies; 24+ messages in thread
From: Suganath Prabu S @ 2018-09-26  4:22 UTC (permalink / raw)
  To: linux-scsi, linux-pci
  Cc: lukas, andy.shevchenko, Sathya.Prakash, sreekanth.reddy,
	Suganath Prabu S

Modify driver version to 27.100.00.00
(which is equivalent to PH8 OOB driver)

Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
---
 drivers/scsi/mpt3sas/mpt3sas_base.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h b/drivers/scsi/mpt3sas/mpt3sas_base.h
index f0351a2..b880d79 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.h
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.h
@@ -74,8 +74,8 @@
 #define MPT3SAS_DRIVER_NAME		"mpt3sas"
 #define MPT3SAS_AUTHOR "Avago Technologies <MPT-FusionLinux.pdl@avagotech.com>"
 #define MPT3SAS_DESCRIPTION	"LSI MPT Fusion SAS 3.0 Device Driver"
-#define MPT3SAS_DRIVER_VERSION		"26.100.00.00"
-#define MPT3SAS_MAJOR_VERSION		26
+#define MPT3SAS_DRIVER_VERSION		"27.100.00.00"
+#define MPT3SAS_MAJOR_VERSION		27
 #define MPT3SAS_MINOR_VERSION		100
 #define MPT3SAS_BUILD_VERSION		0
 #define MPT3SAS_RELEASE_VERSION	00
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 2/6] mpt3sas: Separate out mpt3sas_wait_for_ioc_to_operational
  2018-09-26  4:22 ` [PATCH v4 2/6] mpt3sas: Separate out mpt3sas_wait_for_ioc_to_operational Suganath Prabu S
@ 2018-09-26 21:03   ` Bjorn Helgaas
  2018-09-27 10:31     ` Suganath Prabu Subramani
  0 siblings, 1 reply; 24+ messages in thread
From: Bjorn Helgaas @ 2018-09-26 21:03 UTC (permalink / raw)
  To: Suganath Prabu S
  Cc: linux-scsi, linux-pci, lukas, andy.shevchenko, Sathya.Prakash,
	sreekanth.reddy

On Wed, Sep 26, 2018 at 09:52:35AM +0530, Suganath Prabu S wrote:
> Introduce mpt3sas_wait_for_ioc_to_operational.
> 
> This section of code "wait for IOC to be operational"
> is used in many places across the driver,
> and hence moved this section of code in to the function
> "mpt3sas_wait_for_ioc_to_operational".
> 
> Also added HBA hot unplug checks, and this returns with
> error code EFAULT, if it detects HBA is hot unplugged
> or IOC is not in operational state.

This should be two patches:

  1) Factor out the "wait for IOC" code.  This should not cause any
     functional changes (I didn't verify in your code, but this is the
     objective).

  2) Add the hot unplug checks.

This makes the patches much easier to review.

> V2 change set:
> used pci_device_is_present instead of
> mpt3sas_base_pci_device_is_unplugged
> 
> v4 Change set:
> Dont split strings in print statement

I don't know the convention in drivers/scsi, but in driver/pci, I
remove this sort of v2/v3 commentary from the changelog because it's
really not of interest after the patch is merged.  It's nice to have
it in the email, but I think if you put it after a line containing
only "--" it will be in the email but not the changelog.

> Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
> ---
>  drivers/scsi/mpt3sas/mpt3sas_base.c      | 92 +++++++++++++++++++-------------
>  drivers/scsi/mpt3sas/mpt3sas_base.h      |  4 ++
>  drivers/scsi/mpt3sas/mpt3sas_config.c    | 28 +++-------
>  drivers/scsi/mpt3sas/mpt3sas_ctl.c       | 26 ++-------
>  drivers/scsi/mpt3sas/mpt3sas_transport.c | 75 +++++---------------------
>  5 files changed, 81 insertions(+), 144 deletions(-)
> 
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
> index c880e72..9f1d8fb 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_base.c
> +++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
> @@ -5176,6 +5176,53 @@ _base_send_ioc_reset(struct MPT3SAS_ADAPTER *ioc, u8 reset_type, int timeout)
>  }
>  
>  /**
> + * mpt3sas_wait_for_ioc_to_operational - IOC's operational
> + *		state and HBA hot unplug status are checked here.
> + * @ioc: per adapter object
> + * @wait_count: timeout in seconds

"wait_count" is really a timeout and maybe should be named "timeout".

> + * Return:  Returns EFAULT, if HBA is hot unplugged or IOC is
> + * not in operational state, within the wait_count.
> + * And returns 0, If not hot unplugged Or ioc is in
> + * operational state.

I think you mean something like:

  Waits up to wait_count seconds for the IOC to become operational.
  Returns 0 if IOC is present and operational; otherwise returns
  -EFAULT.

> + */
> +
> +int
> +mpt3sas_wait_for_ioc_to_operational(struct MPT3SAS_ADAPTER *ioc,
> +	int wait_count)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 3/6] mpt3sas: Introdude  _scsih_get_shost_and_ioc.
  2018-09-26  4:22 ` [PATCH v4 3/6] mpt3sas: Introdude _scsih_get_shost_and_ioc Suganath Prabu S
@ 2018-09-26 21:09   ` Bjorn Helgaas
  2018-10-01  7:27     ` Suganath Prabu Subramani
  2018-09-26 21:33   ` Bjorn Helgaas
  1 sibling, 1 reply; 24+ messages in thread
From: Bjorn Helgaas @ 2018-09-26 21:09 UTC (permalink / raw)
  To: Suganath Prabu S
  Cc: linux-scsi, linux-pci, lukas, andy.shevchenko, Sathya.Prakash,
	sreekanth.reddy

On Wed, Sep 26, 2018 at 09:52:36AM +0530, Suganath Prabu S wrote:
> The code for getting shost and IOC is redundant so
> moved that to function "scsih_get_shost_and_ioc".
> Also checks for NULL are added to IOC and shost.
> 
> Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
> ---
>  drivers/scsi/mpt3sas/mpt3sas_scsih.c | 98 ++++++++++++++++++++++++++++++------
>  1 file changed, 82 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> index 566a550..f6e92eb 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> @@ -9809,6 +9809,35 @@ _scsih_ir_shutdown(struct MPT3SAS_ADAPTER *ioc)
>  }
>  
>  /**
> + * _scsih_get_shost_and_ioc - get shost and ioc
> + *			and verify whether they are NULL or not
> + * @pdev: PCI device struct
> + * @shost: address of scsi host pointer
> + * @ioc: address of HBA adapter pointer
> + *
> + * Return zero if *shost and *ioc are not NULL otherwise return error number.
> + */
> +static int
> +_scsih_get_shost_and_ioc(struct pci_dev *pdev,
> +	struct Scsi_Host **shost, struct MPT3SAS_ADAPTER **ioc)
> +{
> +	*shost = pci_get_drvdata(pdev);
> +	if (*shost == NULL) {
> +		dev_err(&pdev->dev, "pdev's driver data is null\n");
> +		return -ENXIO;
> +	}
> +
> +	*ioc = shost_priv(*shost);
> +	if (*ioc == NULL) {
> +		dev_err(&pdev->dev, "shost's private data is null\n");
> +		return -ENXIO;

I think it's better to omit NULL pointer checks like these because
there should not be a path where we can execute this code when these
pointers are NULL.

If there *is* such a path, I think that's a serious bug and it's
better to oops when we try to dereference the NULL pointer.  If we
just return an error code, it's likely the bug will be ignored and
never fixed.

Bjorn

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 1/6] mpt3sas: Introduce mpt3sas_base_pci_device_is_available
  2018-09-26  4:22 ` [PATCH v4 1/6] mpt3sas: Introduce mpt3sas_base_pci_device_is_available Suganath Prabu S
@ 2018-09-26 21:32   ` Bjorn Helgaas
  2018-09-27  7:03     ` Lukas Wunner
  0 siblings, 1 reply; 24+ messages in thread
From: Bjorn Helgaas @ 2018-09-26 21:32 UTC (permalink / raw)
  To: Suganath Prabu S
  Cc: linux-scsi, linux-pci, lukas, andy.shevchenko, Sathya.Prakash,
	sreekanth.reddy, linux-kernel

[+cc LKML]

On Wed, Sep 26, 2018 at 09:52:34AM +0530, Suganath Prabu S wrote:
> * Driver uses "pci_device_is_present" to check whether
> If Hot unplugged:
> the outstanding IOs with 'DID_NO_CONNECT' before removing the drives
> attached to the HBA.

This sentence needs a verb.

> "DID_NO_CONNECT" status and free the smid, if driver detects that
> HBA is hot unplugged.

This sentence also needs a verb.

> * In the hard reset flush out all the outstanding IOs even if diag reset
> fails and also if driver detects that HBA is hot unplugged.
> 
> v1 change set:
> ==============
> unlock mutex before goto "out_unlocked",
> if active reset is in progress.
> 
> v2 change set:
> ==============
> 1) Use pci_device_is_present instead of
> mpt3sas_base_pci_device_is_unplugged.
> 2) As suggested by Lukas, removed using
> watchdog thread for checking hba hot unplug(Patch 02 of V1).
> Added Hot unplug checks in scan finish and reset paths.
> 
> v3 Change Set:
> =============
> Simplified function "mpt3sas_base_pci_device_is_available" and
> made inline.
> 
> v4 Changes:
> ===========
> Dont split strings in print statement.
> 
> Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
> ---
>  drivers/scsi/mpt3sas/mpt3sas_base.c  | 39 ++++++++++++++++++++++++++++
>  drivers/scsi/mpt3sas/mpt3sas_base.h  |  3 ++-
>  drivers/scsi/mpt3sas/mpt3sas_scsih.c | 50 ++++++++++++++++++++++++++++++++----
>  3 files changed, 86 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
> index 59d7844..c880e72 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_base.c
> +++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
> @@ -543,6 +543,20 @@ static int mpt3sas_remove_dead_ioc_func(void *arg)
>  }
>  
>  /**
> + * mpt3sas_base_pci_device_is_available - check whether pci device is
> + *			available for any transactions with FW
> + *
> + * @ioc: per adapter object
> + *
> + * Return 1 if pci device state is up and running else return 0.
> + */
> +inline bool
> +mpt3sas_base_pci_device_is_available(struct MPT3SAS_ADAPTER *ioc)
> +{
> +	return !ioc->pci_error_recovery && pci_device_is_present(ioc->pdev);
> +}
> +
> +/**
>   * _base_fault_reset_work - workq handling ioc fault conditions
>   * @work: input argument, used to derive ioc
>   *
> @@ -6122,6 +6136,11 @@ _base_diag_reset(struct MPT3SAS_ADAPTER *ioc)
>  
>  	count = 0;
>  	do {
> +		if (!pci_device_is_present(ioc->pdev)) {
> +			ioc->remove_host = 1;
> +			pr_err(MPT3SAS_FMT "Hba Hot unplugged\n", ioc->name);

You capitalized as "HBA" above.

> +			goto out;
> +		}
>  		/* Write magic sequence to WriteSequence register
>  		 * Loop until in diagnostic mode
>  		 */
> @@ -6853,6 +6872,13 @@ mpt3sas_wait_for_commands_to_complete(struct MPT3SAS_ADAPTER *ioc)
>  
>  	ioc->pending_io_count = 0;
>  
> +	if (!mpt3sas_base_pci_device_is_available(ioc)) {
> +		pr_err(MPT3SAS_FMT
> +		    "%s: pci error recovery reset or pci device unplug occurred\n",
> +		    ioc->name, __func__);
> +		return;
> +	}
> +
>  	ioc_state = mpt3sas_base_get_iocstate(ioc, 0);

This is a good example of why I don't like pci_device_is_present(): it
is fundamentally racy and gives a false sense of security.  Here we
*think* we're making the code safer, but in fact we could have this
sequence:

  mpt3sas_base_pci_device_is_available()    # returns true
  # device is removed
  ioc_state = mpt3sas_base_get_iocstate()

In this case the readl() inside mpt3sas_base_get_iocstate() will
probably return 0xffffffff data, and we assume that's valid and
continue on our merry way, pretending that "ioc_state" makes sense
when it really doesn't.

>  	if ((ioc_state & MPI2_IOC_STATE_MASK) != MPI2_IOC_STATE_OPERATIONAL)
>  		return;

Bjorn

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 3/6] mpt3sas: Introdude  _scsih_get_shost_and_ioc.
  2018-09-26  4:22 ` [PATCH v4 3/6] mpt3sas: Introdude _scsih_get_shost_and_ioc Suganath Prabu S
  2018-09-26 21:09   ` Bjorn Helgaas
@ 2018-09-26 21:33   ` Bjorn Helgaas
  1 sibling, 0 replies; 24+ messages in thread
From: Bjorn Helgaas @ 2018-09-26 21:33 UTC (permalink / raw)
  To: Suganath Prabu S
  Cc: linux-scsi, linux-pci, lukas, andy.shevchenko, Sathya.Prakash,
	sreekanth.reddy

Trivial nits/questions.  In subject:

  s/Introdude/Introduce/
  s/  / /   (remove double space)
  s/\.//    (remove trailing period, also appears in patches 4, 5, 6)

On Wed, Sep 26, 2018 at 09:52:36AM +0530, Suganath Prabu S wrote:
> The code for getting shost and IOC is redundant so
> moved that to function "scsih_get_shost_and_ioc".
> Also checks for NULL are added to IOC and shost.
> 
> Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>

Per [1], it's good to use full names, e.g., spell out "S" when that
makes sense.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst#n456

> ---
>  drivers/scsi/mpt3sas/mpt3sas_scsih.c | 98 ++++++++++++++++++++++++++++++------
>  1 file changed, 82 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> index 566a550..f6e92eb 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> @@ -9809,6 +9809,35 @@ _scsih_ir_shutdown(struct MPT3SAS_ADAPTER *ioc)
>  }
>  
>  /**
> + * _scsih_get_shost_and_ioc - get shost and ioc
> + *			and verify whether they are NULL or not
> + * @pdev: PCI device struct
> + * @shost: address of scsi host pointer

There seems to be a convention to capitalize "IOC" and "SCSI" when
used in English text, i.e., when they're not variable names or part of
a function name.

> + * @ioc: address of HBA adapter pointer
> + *
> + * Return zero if *shost and *ioc are not NULL otherwise return error number.
> + */
> +static int
> +_scsih_get_shost_and_ioc(struct pci_dev *pdev,
> +	struct Scsi_Host **shost, struct MPT3SAS_ADAPTER **ioc)
> +{
> +	*shost = pci_get_drvdata(pdev);
> +	if (*shost == NULL) {
> +		dev_err(&pdev->dev, "pdev's driver data is null\n");

I don't see any other uses of dev_printk() in this file; existing
output seems to use sdev_printk(), pr_info(), etc.  Just pointing this
out to make sure dev_printk() is really what you want here.

> +		return -ENXIO;
> +	}

Bjorn

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 1/6] mpt3sas: Introduce mpt3sas_base_pci_device_is_available
  2018-09-26 21:32   ` Bjorn Helgaas
@ 2018-09-27  7:03     ` Lukas Wunner
  2018-09-27 18:47       ` Bjorn Helgaas
  0 siblings, 1 reply; 24+ messages in thread
From: Lukas Wunner @ 2018-09-27  7:03 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Suganath Prabu S, linux-scsi, linux-pci, andy.shevchenko,
	Sathya.Prakash, sreekanth.reddy, linux-kernel

On Wed, Sep 26, 2018 at 04:32:41PM -0500, Bjorn Helgaas wrote:
> On Wed, Sep 26, 2018 at 09:52:34AM +0530, Suganath Prabu S wrote:
> > @@ -6853,6 +6872,13 @@ mpt3sas_wait_for_commands_to_complete(struct MPT3SAS_ADAPTER *ioc)
> >  
> >  	ioc->pending_io_count = 0;
> >  
> > +	if (!mpt3sas_base_pci_device_is_available(ioc)) {
> > +		pr_err(MPT3SAS_FMT
> > +		    "%s: pci error recovery reset or pci device unplug occurred\n",
> > +		    ioc->name, __func__);
> > +		return;
> > +	}
> > +
> >  	ioc_state = mpt3sas_base_get_iocstate(ioc, 0);
> 
> This is a good example of why I don't like pci_device_is_present(): it
> is fundamentally racy and gives a false sense of security.  Here we
> *think* we're making the code safer, but in fact we could have this
> sequence:
> 
>   mpt3sas_base_pci_device_is_available()    # returns true
>   # device is removed
>   ioc_state = mpt3sas_base_get_iocstate()
> 
> In this case the readl() inside mpt3sas_base_get_iocstate() will
> probably return 0xffffffff data, and we assume that's valid and
> continue on our merry way, pretending that "ioc_state" makes sense
> when it really doesn't.

The function does the following:

	ioc_state = mpt3sas_base_get_iocstate(ioc, 0);
	if ((ioc_state & MPI2_IOC_STATE_MASK) != MPI2_IOC_STATE_OPERATIONAL)
		return;

where MPI2_IOC_STATE_MASK is 0xF0000000 and MPI2_IOC_STATE_OPERATIONAL
is 0x20000000.  If the device is removed after the call to
mpt3sas_base_pci_device_is_available(), the result of the bitwise "and"
operation would be 0xF0000000, which is unequal to 0x20000000.
Hence this looks safe.

I agree that pci_device_is_present() (and the pci_dev_is_disconnected()
it calls) must be used judiciously, but here it seems to have been done
correctly.

One thing to be aware of is that a return value of "true" from
pci_dev_is_disconnected() is definitive and can be trusted.
On the other hand a return value of "false" is more like a fuzzy
"likely not disconnected, but can't give any guarantees".
So the boolean return value is kind of the problem here.
Boolean logic doesn't really fit these "definitive if true,
not definitive if false" semantics.

However being able to get the definitive answer in the disconnected
case is valuable:  pciehp is the only entity that can determine
surprise removal authoritatively and unambiguously (albeit with
a latency).  All the other tools that we have at our disposal don't
have that quality:  E.g. checking the Vendor ID is ambiguous because
it returns a valid value if a device was quickly replaced with another
one.  Also, all ones may be returned in the case of an Uncorrectable
Error, but the device may revert to valid responses if the error can
be recovered.  (Please correct me if I'm wrong.)

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 2/6] mpt3sas: Separate out mpt3sas_wait_for_ioc_to_operational
  2018-09-26 21:03   ` Bjorn Helgaas
@ 2018-09-27 10:31     ` Suganath Prabu Subramani
  0 siblings, 0 replies; 24+ messages in thread
From: Suganath Prabu Subramani @ 2018-09-27 10:31 UTC (permalink / raw)
  To: helgaas
  Cc: linux-scsi, linux-pci, lukas, Andy Shevchenko, Sathya Prakash,
	Sreekanth Reddy

Hi Bjorn,

Thanks for reviewing.

On Thu, Sep 27, 2018 at 2:33 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Wed, Sep 26, 2018 at 09:52:35AM +0530, Suganath Prabu S wrote:
> > Introduce mpt3sas_wait_for_ioc_to_operational.
> >
> > This section of code "wait for IOC to be operational"
> > is used in many places across the driver,
> > and hence moved this section of code in to the function
> > "mpt3sas_wait_for_ioc_to_operational".
> >
> > Also added HBA hot unplug checks, and this returns with
> > error code EFAULT, if it detects HBA is hot unplugged
> > or IOC is not in operational state.
>
> This should be two patches:
>
>   1) Factor out the "wait for IOC" code.  This should not cause any
>      functional changes (I didn't verify in your code, but this is the
>      objective).
>
>   2) Add the hot unplug checks.
>
> This makes the patches much easier to review.
> Will move hot unplug check to separate patch.
> > V2 change set:
> > used pci_device_is_present instead of
> > mpt3sas_base_pci_device_is_unplugged
> >
> > v4 Change set:
> > Dont split strings in print statement
>
> I don't know the convention in drivers/scsi, but in driver/pci, I
> remove this sort of v2/v3 commentary from the changelog because it's
> really not of interest after the patch is merged.  It's nice to have
> it in the email, but I think if you put it after a line containing
> only "--" it will be in the email but not the changelog.
>Thanks, Will add change sets after --
> > Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
> > ---
> >  drivers/scsi/mpt3sas/mpt3sas_base.c      | 92 +++++++++++++++++++-------------
> >  drivers/scsi/mpt3sas/mpt3sas_base.h      |  4 ++
> >  drivers/scsi/mpt3sas/mpt3sas_config.c    | 28 +++-------
> >  drivers/scsi/mpt3sas/mpt3sas_ctl.c       | 26 ++-------
> >  drivers/scsi/mpt3sas/mpt3sas_transport.c | 75 +++++---------------------
> >  5 files changed, 81 insertions(+), 144 deletions(-)
> >
> > diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
> > index c880e72..9f1d8fb 100644
> > --- a/drivers/scsi/mpt3sas/mpt3sas_base.c
> > +++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
> > @@ -5176,6 +5176,53 @@ _base_send_ioc_reset(struct MPT3SAS_ADAPTER *ioc, u8 reset_type, int timeout)
> >  }
> >
> >  /**
> > + * mpt3sas_wait_for_ioc_to_operational - IOC's operational
> > + *           state and HBA hot unplug status are checked here.
> > + * @ioc: per adapter object
> > + * @wait_count: timeout in seconds
>
> "wait_count" is really a timeout and maybe should be named "timeout".
>Yes, wait_count is timeout, will rename wait_count to timeout
> > + * Return:  Returns EFAULT, if HBA is hot unplugged or IOC is
> > + * not in operational state, within the wait_count.
> > + * And returns 0, If not hot unplugged Or ioc is in
> > + * operational state.
>
> I think you mean something like:
>
>   Waits up to wait_count seconds for the IOC to become operational.
>   Returns 0 if IOC is present and operational; otherwise returns
>   -EFAULT.
> Yes.
> > + */
> > +
> > +int
> > +mpt3sas_wait_for_ioc_to_operational(struct MPT3SAS_ADAPTER *ioc,
> > +     int wait_count)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 1/6] mpt3sas: Introduce mpt3sas_base_pci_device_is_available
  2018-09-27  7:03     ` Lukas Wunner
@ 2018-09-27 18:47       ` Bjorn Helgaas
  2018-09-27 19:09         ` Lukas Wunner
  0 siblings, 1 reply; 24+ messages in thread
From: Bjorn Helgaas @ 2018-09-27 18:47 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Suganath Prabu S, linux-scsi, linux-pci, andy.shevchenko,
	Sathya.Prakash, sreekanth.reddy, linux-kernel,
	Benjamin Herrenschmidt, Russell Currey, Sam Bobroff,
	Oliver O'Halloran

[+cc Ben, Russell, Sam, Oliver in case they have pertinent experience from
powerpc error handling; thread begins at
https://lore.kernel.org/linux-pci/1537935759-14754-1-git-send-email-suganath-prabu.subramani@broadcom.com/]

On Thu, Sep 27, 2018 at 09:03:27AM +0200, Lukas Wunner wrote:
> On Wed, Sep 26, 2018 at 04:32:41PM -0500, Bjorn Helgaas wrote:
> > On Wed, Sep 26, 2018 at 09:52:34AM +0530, Suganath Prabu S wrote:
> > > @@ -6853,6 +6872,13 @@ mpt3sas_wait_for_commands_to_complete(struct MPT3SAS_ADAPTER *ioc)
> > >  
> > >  	ioc->pending_io_count = 0;
> > >  
> > > +	if (!mpt3sas_base_pci_device_is_available(ioc)) {
> > > +		pr_err(MPT3SAS_FMT
> > > +		    "%s: pci error recovery reset or pci device unplug occurred\n",
> > > +		    ioc->name, __func__);
> > > +		return;
> > > +	}
> > > +
> > >  	ioc_state = mpt3sas_base_get_iocstate(ioc, 0);
> > 
> > This is a good example of why I don't like pci_device_is_present(): it
> > is fundamentally racy and gives a false sense of security.  Here we
> > *think* we're making the code safer, but in fact we could have this
> > sequence:
> > 
> >   mpt3sas_base_pci_device_is_available()    # returns true
> >   # device is removed
> >   ioc_state = mpt3sas_base_get_iocstate()
> > 
> > In this case the readl() inside mpt3sas_base_get_iocstate() will
> > probably return 0xffffffff data, and we assume that's valid and
> > continue on our merry way, pretending that "ioc_state" makes sense
> > when it really doesn't.
> 
> The function does the following:
> 
> 	ioc_state = mpt3sas_base_get_iocstate(ioc, 0);
> 	if ((ioc_state & MPI2_IOC_STATE_MASK) != MPI2_IOC_STATE_OPERATIONAL)
> 		return;
> 
> where MPI2_IOC_STATE_MASK is 0xF0000000 and MPI2_IOC_STATE_OPERATIONAL
> is 0x20000000.  If the device is removed after the call to
> mpt3sas_base_pci_device_is_available(), the result of the bitwise "and"
> operation would be 0xF0000000, which is unequal to 0x20000000.
> Hence this looks safe.

I agree this particular case is technically safe, but figuring that
out requires an unreasonable amount of analysis.  And there's no hint
in the code that we need to be concerned about whether the readl()
returns valid data, so the need for the analysis won't even occur to
most readers.

I don't feel good about encouraging this style of adding an explicit
test for whether the device is available, followed by a completely
implicit test that accidentally happens to correctly handle a device
that was removed after the explicit test.

If we instead added a test for ~0 after the readl(), we would avoid
the race and give the reader a clue that *any* read from the device
can potentially fail without advance warning.

> I agree that pci_device_is_present() (and the pci_dev_is_disconnected()
> it calls) must be used judiciously, but here it seems to have been done
> correctly.
> 
> One thing to be aware of is that a return value of "true" from
> pci_dev_is_disconnected() is definitive and can be trusted.
> On the other hand a return value of "false" is more like a fuzzy
> "likely not disconnected, but can't give any guarantees".
> So the boolean return value is kind of the problem here.
> Boolean logic doesn't really fit these "definitive if true,
> not definitive if false" semantics.
> 
> However being able to get the definitive answer in the disconnected
> case is valuable:  pciehp is the only entity that can determine
> surprise removal authoritatively and unambiguously (albeit with
> a latency).  All the other tools that we have at our disposal don't
> have that quality:  E.g. checking the Vendor ID is ambiguous because
> it returns a valid value if a device was quickly replaced with another
> one.  Also, all ones may be returned in the case of an Uncorrectable
> Error, but the device may revert to valid responses if the error can
> be recovered.  (Please correct me if I'm wrong.)

I think everything you said above is true, but I'm not yet convinced
that it's being applied usefully in mpt3sas.

  bool pci_dev_is_disconnected(pdev)       # "true" is definitive
  {
    return test_bit(PCI_DEV_DISCONNECTED, &dev->priv_flags);
  }

  bool pci_device_is_present(pdev)         # "false" is definitive
  {
    if (pci_dev_is_disconnected(pdev))
      return false;
    return pci_bus_read_dev_vendor_id(...);
  }

  mpt3sas_base_pci_device_is_available(ioc)  # "false" is definitive
  {
    return !ioc->pci_error_recovery && pci_device_is_present(ioc->pdev);
  }

  mpt3sas_wait_for_commands_to_complete(...)
  {
    ...
 +  if (!mpt3sas_base_pci_device_is_available(ioc))
 +    return;

    # In the definitive case, we returned above.  If we get here, we
    # think the device *might* be present.  The readl() inside
    # mpt3sas_base_get_iocstate() might fail (returning ~0 data).
    # It happens that (~0 & MPI2_IOC_STATE_MASK) != MPI2_IOC_STATE_OPERATIONAL
    # so we will return below if the device was removed after the
    # check above.

    ioc_state = mpt3sas_base_get_iocstate(ioc, 0);
    if ((ioc_state & MPI2_IOC_STATE_MASK) != MPI2_IOC_STATE_OPERATIONAL)
      return;
    ...


I think this code is unreasonably complicated.  The multiple layers
and negations make it very difficult to figure out what's definitive.

I'm not sure how mpt3sas benefits from adding
mpt3sas_base_pci_device_is_available() here, other than the fact that
it avoids an MMIO read to the device in most cases.  I think it would
be simpler and better to make mpt3sas_base_get_iocstate() smarter
about handling ~0 values from the readl().

Bjorn

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 1/6] mpt3sas: Introduce mpt3sas_base_pci_device_is_available
  2018-09-27 18:47       ` Bjorn Helgaas
@ 2018-09-27 19:09         ` Lukas Wunner
  2018-10-01  6:57           ` Suganath Prabu Subramani
  0 siblings, 1 reply; 24+ messages in thread
From: Lukas Wunner @ 2018-09-27 19:09 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Suganath Prabu S, linux-scsi, linux-pci, andy.shevchenko,
	Sathya.Prakash, sreekanth.reddy, linux-kernel,
	Benjamin Herrenschmidt, Russell Currey, Sam Bobroff,
	Oliver O'Halloran

On Thu, Sep 27, 2018 at 01:47:46PM -0500, Bjorn Helgaas wrote:
>   mpt3sas_wait_for_commands_to_complete(...)
>   {
>     ...
>  +  if (!mpt3sas_base_pci_device_is_available(ioc))
>  +    return;
> 
>     # In the definitive case, we returned above.  If we get here, we
>     # think the device *might* be present.  The readl() inside
>     # mpt3sas_base_get_iocstate() might fail (returning ~0 data).
>     # It happens that (~0 & MPI2_IOC_STATE_MASK) != MPI2_IOC_STATE_OPERATIONAL
>     # so we will return below if the device was removed after the
>     # check above.
> 
>     ioc_state = mpt3sas_base_get_iocstate(ioc, 0);
>     if ((ioc_state & MPI2_IOC_STATE_MASK) != MPI2_IOC_STATE_OPERATIONAL)
>       return;
>     ...
> 
> 
> I think this code is unreasonably complicated.  The multiple layers
> and negations make it very difficult to figure out what's definitive.

I agree there's room for improvement.


> I'm not sure how mpt3sas benefits from adding
> mpt3sas_base_pci_device_is_available() here, other than the fact that
> it avoids an MMIO read to the device in most cases.  I think it would
> be simpler and better to make mpt3sas_base_get_iocstate() smarter
> about handling ~0 values from the readl().

Avoiding an MMIO read when it's known to run into a Completion Timeout
buys about 17 ms.  If the function is executed many times (I don't know
if that's the case here, I'm referring to the general case), or if bailing
out of it early avoids significant amounts of further I/O, then checking
for disconnectedness makes sense.

The 17 ms number is from this talk:
https://www.youtube.com/watch?v=GJ6B0xzgvlM&t=832

It's the typical Completion Timeout on an Intel chip, but it can be up to
50 ms in the default setting or up to 64 s if so configured in the Device
Control 2 register (PCIe r4.0 sec 7.8.16).

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 1/6] mpt3sas: Introduce mpt3sas_base_pci_device_is_available
  2018-09-27 19:09         ` Lukas Wunner
@ 2018-10-01  6:57           ` Suganath Prabu Subramani
  2018-10-01 20:40             ` Bjorn Helgaas
  0 siblings, 1 reply; 24+ messages in thread
From: Suganath Prabu Subramani @ 2018-10-01  6:57 UTC (permalink / raw)
  To: lukas
  Cc: helgaas, linux-scsi, linux-pci, Andy Shevchenko, Sathya Prakash,
	Sreekanth Reddy, linux-kernel, benh, ruscur, sbobroff, oohall

Hi Lucas and Bjorn,
Thanks for reviewing and providing useful comments.

On Fri, Sep 28, 2018 at 12:40 AM Lukas Wunner <lukas@wunner.de> wrote:
>
> On Thu, Sep 27, 2018 at 01:47:46PM -0500, Bjorn Helgaas wrote:
> >   mpt3sas_wait_for_commands_to_complete(...)
> >   {
> >     ...
> >  +  if (!mpt3sas_base_pci_device_is_available(ioc))
> >  +    return;
> >
> >     # In the definitive case, we returned above.  If we get here, we
> >     # think the device *might* be present.  The readl() inside
> >     # mpt3sas_base_get_iocstate() might fail (returning ~0 data).
> >     # It happens that (~0 & MPI2_IOC_STATE_MASK) != MPI2_IOC_STATE_OPERATIONAL
> >     # so we will return below if the device was removed after the
> >     # check above.
> >
> >     ioc_state = mpt3sas_base_get_iocstate(ioc, 0);
> >     if ((ioc_state & MPI2_IOC_STATE_MASK) != MPI2_IOC_STATE_OPERATIONAL)
> >       return;
> >     ...
> >
> >
> > I think this code is unreasonably complicated.  The multiple layers
> > and negations make it very difficult to figure out what's definitive.
>
> I agree there's room for improvement.
>
>
> > I'm not sure how mpt3sas benefits from adding
> > mpt3sas_base_pci_device_is_available() here, other than the fact that
> > it avoids an MMIO read to the device in most cases.  I think it would
> > be simpler and better to make mpt3sas_base_get_iocstate() smarter
> > about handling ~0 values from the readl().
>
> Avoiding an MMIO read when it's known to run into a Completion Timeout
> buys about 17 ms.  If the function is executed many times (I don't know
> if that's the case here, I'm referring to the general case), or if bailing
> out of it early avoids significant amounts of further I/O, then checking
> for disconnectedness makes sense.
>
> The 17 ms number is from this talk:
> https://www.youtube.com/watch?v=GJ6B0xzgvlM&t=832
>
> It's the typical Completion Timeout on an Intel chip, but it can be up to
> 50 ms in the default setting or up to 64 s if so configured in the Device
> Control 2 register (PCIe r4.0 sec 7.8.16).
>

This function is called only during controller reset, system shutdown
and driver unload operations.
For this case we can remove mpt3sas_base_pci_device_is_available() check,
since we can make the device removal from readl in
mpt3sas_base_get_iocstate() as you suggested.
But we need mpt3sas_base_pci_device_is_available() in other places
like while submitting the
IO/TMs to the HBA firmware etc where driver is not checking for the
IOC state (through readl() operation)
 to gracefully handle the HBA hot-unplug operation.

> Thanks,
>
> Lukas

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 3/6] mpt3sas: Introdude _scsih_get_shost_and_ioc.
  2018-09-26 21:09   ` Bjorn Helgaas
@ 2018-10-01  7:27     ` Suganath Prabu Subramani
  2018-10-01 13:40       ` Bjorn Helgaas
  0 siblings, 1 reply; 24+ messages in thread
From: Suganath Prabu Subramani @ 2018-10-01  7:27 UTC (permalink / raw)
  To: helgaas
  Cc: linux-scsi, linux-pci, lukas, Andy Shevchenko, Sathya Prakash,
	Sreekanth Reddy

On Thu, Sep 27, 2018 at 2:39 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Wed, Sep 26, 2018 at 09:52:36AM +0530, Suganath Prabu S wrote:
> > The code for getting shost and IOC is redundant so
> > moved that to function "scsih_get_shost_and_ioc".
> > Also checks for NULL are added to IOC and shost.
> >
> > Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
> > ---
> >  drivers/scsi/mpt3sas/mpt3sas_scsih.c | 98 ++++++++++++++++++++++++++++++------
> >  1 file changed, 82 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> > index 566a550..f6e92eb 100644
> > --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> > +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> > @@ -9809,6 +9809,35 @@ _scsih_ir_shutdown(struct MPT3SAS_ADAPTER *ioc)
> >  }
> >
> >  /**
> > + * _scsih_get_shost_and_ioc - get shost and ioc
> > + *                   and verify whether they are NULL or not
> > + * @pdev: PCI device struct
> > + * @shost: address of scsi host pointer
> > + * @ioc: address of HBA adapter pointer
> > + *
> > + * Return zero if *shost and *ioc are not NULL otherwise return error number.
> > + */
> > +static int
> > +_scsih_get_shost_and_ioc(struct pci_dev *pdev,
> > +     struct Scsi_Host **shost, struct MPT3SAS_ADAPTER **ioc)
> > +{
> > +     *shost = pci_get_drvdata(pdev);
> > +     if (*shost == NULL) {
> > +             dev_err(&pdev->dev, "pdev's driver data is null\n");
> > +             return -ENXIO;
> > +     }
> > +
> > +     *ioc = shost_priv(*shost);
> > +     if (*ioc == NULL) {
> > +             dev_err(&pdev->dev, "shost's private data is null\n");
> > +             return -ENXIO;
>
> I think it's better to omit NULL pointer checks like these because
> there should not be a path where we can execute this code when these
> pointers are NULL.
>
> If there *is* such a path, I think that's a serious bug and it's
> better to oops when we try to dereference the NULL pointer.  If we
> just return an error code, it's likely the bug will be ignored and
> never fixed.
>
We have added the ioc and shost checks, because of kernel panic in
below scenario.
Have 3 HBA's in system and perform below operation.
1) Run “poweroff”.
2) Immediate hot unplug HBA.
We have observed, kernel's shutdown process has removed all the 3 HBA
devices smoothly,
and also user have unplugged the HBA device during this time. PCI
subsystem's hot-plug thread is also trying to remove
the hot plugged PCI device which is already removed/cleaned by the
shutdown process. (Which is not expected for the
already removed device) Due to this kernel panic is observed. And we
are not sure whether it
has to fixed from pciehp layer, so we added sanity checks in driver.

[ 1745.605510] BUG: unable to handle kernel NULL pointer dereference
at 0000000000000a98
[ 1745.606554] IP: [<ffffffffa054c750>] scsih_remove+0x20/0x2d0 [mpt3sas]
[ 1745.607609] PGD 0
[ 1745.608621] Oops: 0000 [#1] SMP
[ 1745.622989] CPU: 0 PID: 668 Comm: kworker/0:2 Tainted: G
O    4.4.55-1.el7.elrepo.x86_64 #1
[ 1745.624800] Hardware name: PRO-MNU65930231
PRO-NME69559126/BRD-PRO55212588, BIOS 0.51e 05/08/2017
[ 1745.626673] Workqueue: pciehp-3 pciehp_power_thread
[ 1745.628566] task: ffff881fe50dd880 ti: ffff881fe88e4000 task.ti:
ffff881fe88e4000
[ 1745.630530] RIP: 0010:[<ffffffffa054c750>]  [<ffffffffa054c750>]
scsih_remove+0x20/0x2d0 [mpt3sas]
[ 1745.632577] RSP: 0018:ffff881fe88e7c98  EFLAGS: 00010292
[ 1745.634639] RAX: 0000000000000001 RBX: ffff881feef5c000 RCX: 0000000000000000
[ 1745.636718] RDX: 0000000000000000 RSI: 0000000000000202 RDI: ffff881feef5c000
[ 1745.638832] RBP: ffff881fe88e7cc8 R08: 0000000000000000 R09: 0000000180220002
[ 1745.640972] R10: 00000000eaf4a401 R11: ffffea007fabd280 R12: 0000000000000000
[ 1745.643136] R13: ffffffffa0576020 R14: ffff881fe9af8240 R15: 0000000000000000
[ 1745.645320] FS:  0000000000000000(0000) GS:ffff881ffde00000(0000)
knlGS:0000000000000000
[ 1745.647572] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1745.649833] CR2: 0000000000000a98 CR3: 0000001fe76df000 CR4: 00000000003406f0
[ 1745.652147] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1745.654476] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1745.656825] Stack:
[ 1745.659138]  ffff881fe88e7cc8 ffff881feef5c098 ffff881feef5c000
ffffffffa0576020
[ 1745.661562]  ffff881fe9af8240 0000000000000000 ffff881fe88e7cf0
ffffffff8137f9d9
[ 1745.663990]  ffff881feef5c098 ffffffffa0576088 ffff881feef5c000
ffff881fe88e7d10
[ 1745.666428] Call Trace:
[ 1745.668830]  [<ffffffff8137f9d9>] pci_device_remove+0x39/0xc0
[ 1745.671256]  [<ffffffff8147db36>] __device_release_driver+0x96/0x130
[ 1745.673664]  [<ffffffff8147dbf3>] device_release_driver+0x23/0x30
[ 1745.676071]  [<ffffffff8137851c>] pci_stop_bus_device+0x8c/0xa0
[ 1745.678485]  [<ffffffff81378612>] pci_stop_and_remove_bus_device+0x12/0x20
[ 1745.680909]  [<ffffffff81392eea>] pciehp_unconfigure_device+0xaa/0x1b0
[ 1745.683331]  [<ffffffff813929e2>] pciehp_disable_slot+0x52/0xd0
[ 1745.685767]  [<ffffffff81392aed>] pciehp_power_thread+0x8d/0xb0
[ 1745.688210]  [<ffffffff8109728f>] process_one_work+0x14f/0x400
[ 1745.690633]  [<ffffffff81097b04>] worker_thread+0x114/0x470
[ 1745.693080]  [<ffffffff810979f0>] ? rescuer_thread+0x310/0x310
[ 1745.695518]  [<ffffffff8109d648>] kthread+0xd8/0xf0
[ 1745.697936]  [<ffffffff8109d570>] ? kthread_park+0x60/0x60
[ 1745.700359]  [<ffffffff816f854f>] ret_from_fork+0x3f/0x70
[ 1745.702747]  [<ffffffff8109d570>] ? kthread_park+0x60/0x60

> Bjorn

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 3/6] mpt3sas: Introdude _scsih_get_shost_and_ioc.
  2018-10-01  7:27     ` Suganath Prabu Subramani
@ 2018-10-01 13:40       ` Bjorn Helgaas
  2018-10-08  6:47         ` Suganath Prabu Subramani
  0 siblings, 1 reply; 24+ messages in thread
From: Bjorn Helgaas @ 2018-10-01 13:40 UTC (permalink / raw)
  To: Suganath Prabu Subramani
  Cc: linux-scsi, linux-pci, lukas, Andy Shevchenko, Sathya Prakash,
	Sreekanth Reddy, linux-kernel

[+cc LKML]

On Mon, Oct 01, 2018 at 12:57:01PM +0530, Suganath Prabu Subramani wrote:
> On Thu, Sep 27, 2018 at 2:39 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Wed, Sep 26, 2018 at 09:52:36AM +0530, Suganath Prabu S wrote:
> > > The code for getting shost and IOC is redundant so
> > > moved that to function "scsih_get_shost_and_ioc".
> > > Also checks for NULL are added to IOC and shost.
> > >
> > > Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
> > > ---
> > >  drivers/scsi/mpt3sas/mpt3sas_scsih.c | 98 ++++++++++++++++++++++++++++++------
> > >  1 file changed, 82 insertions(+), 16 deletions(-)
> > >
> > > diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> > > index 566a550..f6e92eb 100644
> > > --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> > > +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> > > @@ -9809,6 +9809,35 @@ _scsih_ir_shutdown(struct MPT3SAS_ADAPTER *ioc)
> > >  }
> > >
> > >  /**
> > > + * _scsih_get_shost_and_ioc - get shost and ioc
> > > + *                   and verify whether they are NULL or not
> > > + * @pdev: PCI device struct
> > > + * @shost: address of scsi host pointer
> > > + * @ioc: address of HBA adapter pointer
> > > + *
> > > + * Return zero if *shost and *ioc are not NULL otherwise return error number.
> > > + */
> > > +static int
> > > +_scsih_get_shost_and_ioc(struct pci_dev *pdev,
> > > +     struct Scsi_Host **shost, struct MPT3SAS_ADAPTER **ioc)
> > > +{
> > > +     *shost = pci_get_drvdata(pdev);
> > > +     if (*shost == NULL) {
> > > +             dev_err(&pdev->dev, "pdev's driver data is null\n");
> > > +             return -ENXIO;
> > > +     }
> > > +
> > > +     *ioc = shost_priv(*shost);
> > > +     if (*ioc == NULL) {
> > > +             dev_err(&pdev->dev, "shost's private data is null\n");
> > > +             return -ENXIO;
> >
> > I think it's better to omit NULL pointer checks like these because
> > there should not be a path where we can execute this code when these
> > pointers are NULL.
> >
> > If there *is* such a path, I think that's a serious bug and it's
> > better to oops when we try to dereference the NULL pointer.  If we
> > just return an error code, it's likely the bug will be ignored and
> > never fixed.
> >
> We have added the ioc and shost checks, because of kernel panic in
> below scenario.
> Have 3 HBA's in system and perform below operation.
> 1) Run “poweroff”.
> 2) Immediate hot unplug HBA.
> We have observed, kernel's shutdown process has removed all the 3
> HBA devices smoothly, and also user have unplugged the HBA device
> during this time. PCI subsystem's hot-plug thread is also trying to
> remove the hot plugged PCI device which is already removed/cleaned
> by the shutdown process. (Which is not expected for the already
> removed device) Due to this kernel panic is observed. And we are not
> sure whether it has to fixed from pciehp layer, so we added sanity
> checks in driver.

This is a great example.  scsih_remove() is the mpt3sas_driver.remove
method.  It sounds like it's getting called twice for the same HBA.
I think that's a PCI core defect and we should fix it there instead of
trying to work around it in every driver.

The trace below is from v4.4.55.  Can you reproduce the same problem
with a current tree, e.g., v4.19-rc?  There have been many changes
since v4.4 that may have fixed this problem.

> [ 1745.605510] BUG: unable to handle kernel NULL pointer dereference
> at 0000000000000a98
> [ 1745.606554] IP: [<ffffffffa054c750>] scsih_remove+0x20/0x2d0 [mpt3sas]
> [ 1745.607609] PGD 0
> [ 1745.608621] Oops: 0000 [#1] SMP
> [ 1745.622989] CPU: 0 PID: 668 Comm: kworker/0:2 Tainted: G
> O    4.4.55-1.el7.elrepo.x86_64 #1
> [ 1745.624800] Hardware name: PRO-MNU65930231
> PRO-NME69559126/BRD-PRO55212588, BIOS 0.51e 05/08/2017
> [ 1745.626673] Workqueue: pciehp-3 pciehp_power_thread
> [ 1745.628566] task: ffff881fe50dd880 ti: ffff881fe88e4000 task.ti:
> ffff881fe88e4000
> [ 1745.630530] RIP: 0010:[<ffffffffa054c750>]  [<ffffffffa054c750>]
> scsih_remove+0x20/0x2d0 [mpt3sas]
> [ 1745.632577] RSP: 0018:ffff881fe88e7c98  EFLAGS: 00010292
> [ 1745.634639] RAX: 0000000000000001 RBX: ffff881feef5c000 RCX: 0000000000000000
> [ 1745.636718] RDX: 0000000000000000 RSI: 0000000000000202 RDI: ffff881feef5c000
> [ 1745.638832] RBP: ffff881fe88e7cc8 R08: 0000000000000000 R09: 0000000180220002
> [ 1745.640972] R10: 00000000eaf4a401 R11: ffffea007fabd280 R12: 0000000000000000
> [ 1745.643136] R13: ffffffffa0576020 R14: ffff881fe9af8240 R15: 0000000000000000
> [ 1745.645320] FS:  0000000000000000(0000) GS:ffff881ffde00000(0000)
> knlGS:0000000000000000
> [ 1745.647572] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1745.649833] CR2: 0000000000000a98 CR3: 0000001fe76df000 CR4: 00000000003406f0
> [ 1745.652147] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 1745.654476] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 1745.656825] Stack:
> [ 1745.659138]  ffff881fe88e7cc8 ffff881feef5c098 ffff881feef5c000
> ffffffffa0576020
> [ 1745.661562]  ffff881fe9af8240 0000000000000000 ffff881fe88e7cf0
> ffffffff8137f9d9
> [ 1745.663990]  ffff881feef5c098 ffffffffa0576088 ffff881feef5c000
> ffff881fe88e7d10
> [ 1745.666428] Call Trace:
> [ 1745.668830]  [<ffffffff8137f9d9>] pci_device_remove+0x39/0xc0
> [ 1745.671256]  [<ffffffff8147db36>] __device_release_driver+0x96/0x130
> [ 1745.673664]  [<ffffffff8147dbf3>] device_release_driver+0x23/0x30
> [ 1745.676071]  [<ffffffff8137851c>] pci_stop_bus_device+0x8c/0xa0
> [ 1745.678485]  [<ffffffff81378612>] pci_stop_and_remove_bus_device+0x12/0x20
> [ 1745.680909]  [<ffffffff81392eea>] pciehp_unconfigure_device+0xaa/0x1b0
> [ 1745.683331]  [<ffffffff813929e2>] pciehp_disable_slot+0x52/0xd0
> [ 1745.685767]  [<ffffffff81392aed>] pciehp_power_thread+0x8d/0xb0
> [ 1745.688210]  [<ffffffff8109728f>] process_one_work+0x14f/0x400
> [ 1745.690633]  [<ffffffff81097b04>] worker_thread+0x114/0x470
> [ 1745.693080]  [<ffffffff810979f0>] ? rescuer_thread+0x310/0x310
> [ 1745.695518]  [<ffffffff8109d648>] kthread+0xd8/0xf0
> [ 1745.697936]  [<ffffffff8109d570>] ? kthread_park+0x60/0x60
> [ 1745.700359]  [<ffffffff816f854f>] ret_from_fork+0x3f/0x70
> [ 1745.702747]  [<ffffffff8109d570>] ? kthread_park+0x60/0x60

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 1/6] mpt3sas: Introduce mpt3sas_base_pci_device_is_available
  2018-10-01  6:57           ` Suganath Prabu Subramani
@ 2018-10-01 20:40             ` Bjorn Helgaas
  2018-10-02 14:04               ` Bjorn Helgaas
  0 siblings, 1 reply; 24+ messages in thread
From: Bjorn Helgaas @ 2018-10-01 20:40 UTC (permalink / raw)
  To: Suganath Prabu Subramani
  Cc: lukas, linux-scsi, linux-pci, Andy Shevchenko, Sathya Prakash,
	Sreekanth Reddy, linux-kernel, benh, ruscur, sbobroff, oohall

On Mon, Oct 01, 2018 at 12:27:28PM +0530, Suganath Prabu Subramani wrote:
> On Fri, Sep 28, 2018 at 12:40 AM Lukas Wunner <lukas@wunner.de> wrote:
> > On Thu, Sep 27, 2018 at 01:47:46PM -0500, Bjorn Helgaas wrote:

> > > I'm not sure how mpt3sas benefits from adding
> > > mpt3sas_base_pci_device_is_available() here, other than the fact that
> > > it avoids an MMIO read to the device in most cases.  I think it would
> > > be simpler and better to make mpt3sas_base_get_iocstate() smarter
> > > about handling ~0 values from the readl().
> >
> > Avoiding an MMIO read when it's known to run into a Completion Timeout
> > buys about 17 ms.  If the function is executed many times (I don't know
> > if that's the case here, I'm referring to the general case), or if bailing
> > out of it early avoids significant amounts of further I/O, then checking
> > for disconnectedness makes sense.

I agree that if we know the device is gone, it's helpful to avoid
further I/O to it.  The misleading pattern used in this patch is:

  R1) Check for device presence
  R2) Read from device
  R3) Consume data from device

That promotes the misconception that "the device is present, so we got
valid data from it."  But in fact the device may disappear between R1
and R2, so the following pattern is better:

  A) Read from device
  B) Check data for validity, e.g., look for ~0
  C) Consume data if valid

If we're writing to the device, we don't expect any response, and it
makes sense to do:

  W1) Check for device presence
  W2) If device present, write to device

Obviously the device can disappear between W1 and W2, so this can't
eliminate *all* writes to non-existent devices, but if we want to
reduce them and we're only doing writes to the device (with no reads),
this is the best we can do.

We can learn that the device is gone in several ways: pciehp might
notice the link is down, the driver might notice a ~0 return, etc.

The driver writer knows the structure of communication with the
device, so he/she should know the appropriate places to check for
valid data from reads and where to check to minimize the number of
writes to a non-existent device.

> This function is called only during controller reset, system shutdown
> and driver unload operations.
> For this case we can remove mpt3sas_base_pci_device_is_available() check,
> since we can make the device removal from readl in
> mpt3sas_base_get_iocstate() as you suggested.

> But we need mpt3sas_base_pci_device_is_available() in other places
> like while submitting the
> IO/TMs to the HBA firmware etc where driver is not checking for the
> IOC state (through readl() operation)
> to gracefully handle the HBA hot-unplug operation.

This is the W1/W2 case above, and what you can do is constrained by
the device model, i.e., where it requires reads from the device.  I
agree you may want to check whether the device is absent to minimize
writes after a device has been removed.

I think the names "pci_device_is_present()" and
"mpt3sas_base_pci_device_is_available()" contribute to the problem
because they make promises that can't be kept -- all we can say is
that the device *was* present, but we know whether it is *still*
present.  I think it would be better if the interfaces were something
like "pci_device_is_absent()" because that gives a result we can rely
on.  If that returns true, we know the device is definitely gone.

Bjorn

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 1/6] mpt3sas: Introduce mpt3sas_base_pci_device_is_available
  2018-10-01 20:40             ` Bjorn Helgaas
@ 2018-10-02 14:04               ` Bjorn Helgaas
  2018-10-08  6:44                 ` Suganath Prabu Subramani
  0 siblings, 1 reply; 24+ messages in thread
From: Bjorn Helgaas @ 2018-10-02 14:04 UTC (permalink / raw)
  To: Suganath Prabu Subramani
  Cc: lukas, linux-scsi, linux-pci, Andy Shevchenko, Sathya Prakash,
	Sreekanth Reddy, linux-kernel, benh, ruscur, sbobroff, oohall

On Mon, Oct 01, 2018 at 03:40:51PM -0500, Bjorn Helgaas wrote:
> I think the names "pci_device_is_present()" and
> "mpt3sas_base_pci_device_is_available()" contribute to the problem
> because they make promises that can't be kept -- all we can say is
> that the device *was* present, but we know whether it is *still*
> present.

Oops, I meant "we DON'T know whether it is still present."

> I think it would be better if the interfaces were something
> like "pci_device_is_absent()" because that gives a result we can rely
> on.  If that returns true, we know the device is definitely gone.
> 
> Bjorn

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 1/6] mpt3sas: Introduce mpt3sas_base_pci_device_is_available
  2018-10-02 14:04               ` Bjorn Helgaas
@ 2018-10-08  6:44                 ` Suganath Prabu Subramani
  2018-10-12  5:47                   ` Sreekanth Reddy
  2018-10-12 23:43                   ` Bjorn Helgaas
  0 siblings, 2 replies; 24+ messages in thread
From: Suganath Prabu Subramani @ 2018-10-08  6:44 UTC (permalink / raw)
  To: helgaas
  Cc: lukas, linux-scsi, linux-pci, Andy Shevchenko, Sathya Prakash,
	Sreekanth Reddy, linux-kernel, benh, ruscur, sbobroff, Oliver

On Tue, Oct 2, 2018 at 7:34 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Mon, Oct 01, 2018 at 03:40:51PM -0500, Bjorn Helgaas wrote:
> > I think the names "pci_device_is_present()" and
> > "mpt3sas_base_pci_device_is_available()" contribute to the problem
> > because they make promises that can't be kept -- all we can say is
> > that the device *was* present, but we know whether it is *still*
> > present.

Bjorn,

In the patch we are using '!' (i.e. not operation) of pci_device_is_present(),
which is logically same as pci_device_is absent, and it is
same for mpt3sas_base_pci_device_is_available().
My understanding is that, you want us to rename these functions for
better readability
Is that correct ?


> Oops, I meant "we DON'T know whether it is still present."
>
> > I think it would be better if the interfaces were something
> > like "pci_device_is_absent()" because that gives a result we can rely
> > on.  If that returns true, we know the device is definitely gone.
> >
> > Bjorn

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 3/6] mpt3sas: Introdude _scsih_get_shost_and_ioc.
  2018-10-01 13:40       ` Bjorn Helgaas
@ 2018-10-08  6:47         ` Suganath Prabu Subramani
  0 siblings, 0 replies; 24+ messages in thread
From: Suganath Prabu Subramani @ 2018-10-08  6:47 UTC (permalink / raw)
  To: helgaas
  Cc: linux-scsi, linux-pci, lukas, Andy Shevchenko, Sathya Prakash,
	Sreekanth Reddy, linux-kernel

Hi Bjorn,
We tried to recreate the issue with latest kernel multiple
times.(Without including this patch,)
but we were not able to recreate it. On 4.4 kernel we hit this issue
time long back.
So we are not sure that this issue is really fixed with latest kernel or not.
We prefer to keep this patch as this only adds sanity check.
We are also okay, if you suggest us to completely remove this patch.

Thanks,
Suganath Prabu
On Mon, Oct 1, 2018 at 7:11 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> [+cc LKML]
>
> On Mon, Oct 01, 2018 at 12:57:01PM +0530, Suganath Prabu Subramani wrote:
> > On Thu, Sep 27, 2018 at 2:39 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Wed, Sep 26, 2018 at 09:52:36AM +0530, Suganath Prabu S wrote:
> > > > The code for getting shost and IOC is redundant so
> > > > moved that to function "scsih_get_shost_and_ioc".
> > > > Also checks for NULL are added to IOC and shost.
> > > >
> > > > Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
> > > > ---
> > > >  drivers/scsi/mpt3sas/mpt3sas_scsih.c | 98 ++++++++++++++++++++++++++++++------
> > > >  1 file changed, 82 insertions(+), 16 deletions(-)
> > > >
> > > > diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> > > > index 566a550..f6e92eb 100644
> > > > --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> > > > +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> > > > @@ -9809,6 +9809,35 @@ _scsih_ir_shutdown(struct MPT3SAS_ADAPTER *ioc)
> > > >  }
> > > >
> > > >  /**
> > > > + * _scsih_get_shost_and_ioc - get shost and ioc
> > > > + *                   and verify whether they are NULL or not
> > > > + * @pdev: PCI device struct
> > > > + * @shost: address of scsi host pointer
> > > > + * @ioc: address of HBA adapter pointer
> > > > + *
> > > > + * Return zero if *shost and *ioc are not NULL otherwise return error number.
> > > > + */
> > > > +static int
> > > > +_scsih_get_shost_and_ioc(struct pci_dev *pdev,
> > > > +     struct Scsi_Host **shost, struct MPT3SAS_ADAPTER **ioc)
> > > > +{
> > > > +     *shost = pci_get_drvdata(pdev);
> > > > +     if (*shost == NULL) {
> > > > +             dev_err(&pdev->dev, "pdev's driver data is null\n");
> > > > +             return -ENXIO;
> > > > +     }
> > > > +
> > > > +     *ioc = shost_priv(*shost);
> > > > +     if (*ioc == NULL) {
> > > > +             dev_err(&pdev->dev, "shost's private data is null\n");
> > > > +             return -ENXIO;
> > >
> > > I think it's better to omit NULL pointer checks like these because
> > > there should not be a path where we can execute this code when these
> > > pointers are NULL.
> > >
> > > If there *is* such a path, I think that's a serious bug and it's
> > > better to oops when we try to dereference the NULL pointer.  If we
> > > just return an error code, it's likely the bug will be ignored and
> > > never fixed.
> > >
> > We have added the ioc and shost checks, because of kernel panic in
> > below scenario.
> > Have 3 HBA's in system and perform below operation.
> > 1) Run “poweroff”.
> > 2) Immediate hot unplug HBA.
> > We have observed, kernel's shutdown process has removed all the 3
> > HBA devices smoothly, and also user have unplugged the HBA device
> > during this time. PCI subsystem's hot-plug thread is also trying to
> > remove the hot plugged PCI device which is already removed/cleaned
> > by the shutdown process. (Which is not expected for the already
> > removed device) Due to this kernel panic is observed. And we are not
> > sure whether it has to fixed from pciehp layer, so we added sanity
> > checks in driver.
>
> This is a great example.  scsih_remove() is the mpt3sas_driver.remove
> method.  It sounds like it's getting called twice for the same HBA.
> I think that's a PCI core defect and we should fix it there instead of
> trying to work around it in every driver.
>
> The trace below is from v4.4.55.  Can you reproduce the same problem
> with a current tree, e.g., v4.19-rc?  There have been many changes
> since v4.4 that may have fixed this problem.
>
> > [ 1745.605510] BUG: unable to handle kernel NULL pointer dereference
> > at 0000000000000a98
> > [ 1745.606554] IP: [<ffffffffa054c750>] scsih_remove+0x20/0x2d0 [mpt3sas]
> > [ 1745.607609] PGD 0
> > [ 1745.608621] Oops: 0000 [#1] SMP
> > [ 1745.622989] CPU: 0 PID: 668 Comm: kworker/0:2 Tainted: G
> > O    4.4.55-1.el7.elrepo.x86_64 #1
> > [ 1745.624800] Hardware name: PRO-MNU65930231
> > PRO-NME69559126/BRD-PRO55212588, BIOS 0.51e 05/08/2017
> > [ 1745.626673] Workqueue: pciehp-3 pciehp_power_thread
> > [ 1745.628566] task: ffff881fe50dd880 ti: ffff881fe88e4000 task.ti:
> > ffff881fe88e4000
> > [ 1745.630530] RIP: 0010:[<ffffffffa054c750>]  [<ffffffffa054c750>]
> > scsih_remove+0x20/0x2d0 [mpt3sas]
> > [ 1745.632577] RSP: 0018:ffff881fe88e7c98  EFLAGS: 00010292
> > [ 1745.634639] RAX: 0000000000000001 RBX: ffff881feef5c000 RCX: 0000000000000000
> > [ 1745.636718] RDX: 0000000000000000 RSI: 0000000000000202 RDI: ffff881feef5c000
> > [ 1745.638832] RBP: ffff881fe88e7cc8 R08: 0000000000000000 R09: 0000000180220002
> > [ 1745.640972] R10: 00000000eaf4a401 R11: ffffea007fabd280 R12: 0000000000000000
> > [ 1745.643136] R13: ffffffffa0576020 R14: ffff881fe9af8240 R15: 0000000000000000
> > [ 1745.645320] FS:  0000000000000000(0000) GS:ffff881ffde00000(0000)
> > knlGS:0000000000000000
> > [ 1745.647572] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 1745.649833] CR2: 0000000000000a98 CR3: 0000001fe76df000 CR4: 00000000003406f0
> > [ 1745.652147] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [ 1745.654476] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [ 1745.656825] Stack:
> > [ 1745.659138]  ffff881fe88e7cc8 ffff881feef5c098 ffff881feef5c000
> > ffffffffa0576020
> > [ 1745.661562]  ffff881fe9af8240 0000000000000000 ffff881fe88e7cf0
> > ffffffff8137f9d9
> > [ 1745.663990]  ffff881feef5c098 ffffffffa0576088 ffff881feef5c000
> > ffff881fe88e7d10
> > [ 1745.666428] Call Trace:
> > [ 1745.668830]  [<ffffffff8137f9d9>] pci_device_remove+0x39/0xc0
> > [ 1745.671256]  [<ffffffff8147db36>] __device_release_driver+0x96/0x130
> > [ 1745.673664]  [<ffffffff8147dbf3>] device_release_driver+0x23/0x30
> > [ 1745.676071]  [<ffffffff8137851c>] pci_stop_bus_device+0x8c/0xa0
> > [ 1745.678485]  [<ffffffff81378612>] pci_stop_and_remove_bus_device+0x12/0x20
> > [ 1745.680909]  [<ffffffff81392eea>] pciehp_unconfigure_device+0xaa/0x1b0
> > [ 1745.683331]  [<ffffffff813929e2>] pciehp_disable_slot+0x52/0xd0
> > [ 1745.685767]  [<ffffffff81392aed>] pciehp_power_thread+0x8d/0xb0
> > [ 1745.688210]  [<ffffffff8109728f>] process_one_work+0x14f/0x400
> > [ 1745.690633]  [<ffffffff81097b04>] worker_thread+0x114/0x470
> > [ 1745.693080]  [<ffffffff810979f0>] ? rescuer_thread+0x310/0x310
> > [ 1745.695518]  [<ffffffff8109d648>] kthread+0xd8/0xf0
> > [ 1745.697936]  [<ffffffff8109d570>] ? kthread_park+0x60/0x60
> > [ 1745.700359]  [<ffffffff816f854f>] ret_from_fork+0x3f/0x70
> > [ 1745.702747]  [<ffffffff8109d570>] ? kthread_park+0x60/0x60

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PATCH v4 1/6] mpt3sas: Introduce mpt3sas_base_pci_device_is_available
  2018-10-08  6:44                 ` Suganath Prabu Subramani
@ 2018-10-12  5:47                   ` Sreekanth Reddy
  2018-10-12 23:43                   ` Bjorn Helgaas
  1 sibling, 0 replies; 24+ messages in thread
From: Sreekanth Reddy @ 2018-10-12  5:47 UTC (permalink / raw)
  To: Suganath Prabu Subramani, helgaas
  Cc: lukas, linux-scsi, linux-pci, Andy Shevchenko,
	Sathya Prakash Veerichetty, linux-kernel, benh, ruscur, sbobroff,
	Oliver

HI Bjorn,

Please provide your valuable suggestion/reply here.

Thank you,
Sreekanth

-----Original Message-----
From: Suganath Prabu Subramani
[mailto:suganath-prabu.subramani@broadcom.com]
Sent: Monday, October 8, 2018 12:15 PM
To: helgaas@kernel.org
Cc: lukas@wunner.de; linux-scsi@vger.kernel.org; linux-pci@vger.kernel.org;
Andy Shevchenko; Sathya Prakash; Sreekanth Reddy;
linux-kernel@vger.kernel.org; benh@kernel.crashing.org; ruscur@russell.cc;
sbobroff@linux.ibm.com; Oliver
Subject: Re: [PATCH v4 1/6] mpt3sas: Introduce
mpt3sas_base_pci_device_is_available

On Tue, Oct 2, 2018 at 7:34 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Mon, Oct 01, 2018 at 03:40:51PM -0500, Bjorn Helgaas wrote:
> > I think the names "pci_device_is_present()" and
> > "mpt3sas_base_pci_device_is_available()" contribute to the problem
> > because they make promises that can't be kept -- all we can say is
> > that the device *was* present, but we know whether it is *still*
> > present.

Bjorn,

In the patch we are using '!' (i.e. not operation) of
pci_device_is_present(),
which is logically same as pci_device_is absent, and it is
same for mpt3sas_base_pci_device_is_available().
My understanding is that, you want us to rename these functions for
better readability
Is that correct ?


> Oops, I meant "we DON'T know whether it is still present."
>
> > I think it would be better if the interfaces were something
> > like "pci_device_is_absent()" because that gives a result we can rely
> > on.  If that returns true, we know the device is definitely gone.
> >
> > Bjorn

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 1/6] mpt3sas: Introduce mpt3sas_base_pci_device_is_available
  2018-10-08  6:44                 ` Suganath Prabu Subramani
  2018-10-12  5:47                   ` Sreekanth Reddy
@ 2018-10-12 23:43                   ` Bjorn Helgaas
  1 sibling, 0 replies; 24+ messages in thread
From: Bjorn Helgaas @ 2018-10-12 23:43 UTC (permalink / raw)
  To: Suganath Prabu Subramani
  Cc: lukas, linux-scsi, linux-pci, Andy Shevchenko, Sathya Prakash,
	Sreekanth Reddy, linux-kernel, benh, ruscur, sbobroff, Oliver

On Mon, Oct 08, 2018 at 12:14:40PM +0530, Suganath Prabu Subramani wrote:
> On Tue, Oct 2, 2018 at 7:34 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Mon, Oct 01, 2018 at 03:40:51PM -0500, Bjorn Helgaas wrote:
> > > I think the names "pci_device_is_present()" and
> > > "mpt3sas_base_pci_device_is_available()" contribute to the
> > > problem because they make promises that can't be kept -- all we
> > > can say is that the device *was* present, but we don't know
> > > whether it is *still* present.
> 
> In the patch we are using '!' (i.e. not operation) of
> pci_device_is_present(), which is logically same as pci_device_is
> absent, and it is same for mpt3sas_base_pci_device_is_available().
>
> My understanding is that, you want us to rename these functions for
> better readability.  Is that correct ?

I think "pci_device_is_present()" is a poor name, but I'm not
suggesting you fix it in this patch.  Changing that would be a PCI
core change that would also touch the tg3, igb, nvme, and now mpt3sas
drivers that use it.

My personal opinion is that you should do something like the patch
below.  The main point is that you should for the device being
unreachable *at the places where you might learn that*.

If you WRITE to a device that has been removed, the write will mostly
likely get dropped and you won't learn anything.  But when you READ
from a device that has been removed, you'll most likely get ~0 data
back, and you can check for that.

Of course, it's also possible that pci_device_is_present() can tell
you something.  So you *could* sprinkle those all over, as in your
patch.  But you end up with code like this:

  if (!pci_device_is_present())
    return;

  writel();
  readl();

Here, the device might be removed right after pci_device_is_present()
returns true.  You do the writel() and the readl() anyway, so you
really haven't gained anything.  And if the readl() returned ~0, you
assume it's valid data when it's not.

It's better to do this:

  writel();
  val = readl();
  if (val == ~0) {
    /* device is unreachable, clean things up */
    ...
  }

(Obviously it's possible that ~0 is a valid value for whatever you
read from the device.  That depends on the device and you have to use
your knowledge of the hardware.  If you read ~0 from a register where
that might be a valid value, you can read from a different register
for with ~0 is *not* a valid value.)

I don't claim the patch below is complete because I don't know
anything about your hardware and what sort of registers or ring
buffers it uses.  You still have to arrange to flush or complete
whatever is outstanding when you learn the device is gone.

Bjorn


diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 59d7844ee022..37b09362b31a 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -6145,6 +6145,11 @@ _base_diag_reset(struct MPT3SAS_ADAPTER *ioc)
 		drsprintk(ioc, pr_info(MPT3SAS_FMT
 			"wrote magic sequence: count(%d), host_diagnostic(0x%08x)\n",
 		    ioc->name, count, host_diagnostic));
+		if (host_diagnostic == ~0) {
+			ioc->remove_host = 1;
+			pr_err(MPT3SAS_FMT "HBA unreachable\n", ioc->name);
+			goto out;
+		}
 
 	} while ((host_diagnostic & MPI2_DIAG_DIAG_WRITE_ENABLE) == 0);
 
@@ -6237,6 +6242,11 @@ _base_make_ioc_ready(struct MPT3SAS_ADAPTER *ioc, enum reset_type type)
 	ioc_state = mpt3sas_base_get_iocstate(ioc, 0);
 	dhsprintk(ioc, pr_info(MPT3SAS_FMT "%s: ioc_state(0x%08x)\n",
 	    ioc->name, __func__, ioc_state));
+	if (ioc_state == ~0) {
+		pr_err(MPT3SAS_FMT "%s: HBA unreachable (ioc_state=0x%x)\n",
+			ioc->name, __func__, ioc_state);
+		return -EFAULT;
+	}
 
 	/* if in RESET state, it should move to READY state shortly */
 	count = 0;
@@ -6251,6 +6261,11 @@ _base_make_ioc_ready(struct MPT3SAS_ADAPTER *ioc, enum reset_type type)
 			}
 			ssleep(1);
 			ioc_state = mpt3sas_base_get_iocstate(ioc, 0);
+			if (ioc_state == ~0) {
+				pr_err(MPT3SAS_FMT "%s: HBA unreachable (ioc_state=0x%x)\n",
+					ioc->name, __func__, ioc_state);
+				return -EFAULT;
+			}
 		}
 	}
 
@@ -6854,6 +6869,11 @@ mpt3sas_wait_for_commands_to_complete(struct MPT3SAS_ADAPTER *ioc)
 	ioc->pending_io_count = 0;
 
 	ioc_state = mpt3sas_base_get_iocstate(ioc, 0);
+	if (ioc_state == ~0) {
+		pr_err(MPT3SAS_FMT "%s: HBA unreachable\n",
+		       ioc->name, __func__);
+		return;
+	}
 	if ((ioc_state & MPI2_IOC_STATE_MASK) != MPI2_IOC_STATE_OPERATIONAL)
 		return;
 
@@ -6909,6 +6929,14 @@ mpt3sas_base_hard_reset_handler(struct MPT3SAS_ADAPTER *ioc,
 	    MPT3_DIAG_BUFFER_IS_RELEASED))) {
 		is_trigger = 1;
 		ioc_state = mpt3sas_base_get_iocstate(ioc, 0);
+		if (ioc_state == ~0) {
+			pr_err(MPT3SAS_FMT "%s: HBA unreachable\n",
+			       ioc->name, __func__);
+			ioc->schedule_dead_ioc_flush_running_cmds(ioc);
+			r = 0;
+			mutex_unlock(&ioc->reset_in_progress_mutex);
+			goto out_unlocked;
+		}
 		if ((ioc_state & MPI2_IOC_STATE_MASK) == MPI2_IOC_STATE_FAULT)
 			is_fault = 1;
 	}
diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 53133cfd420f..d0d4c8d94a10 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -2653,6 +2653,11 @@ mpt3sas_scsih_issue_tm(struct MPT3SAS_ADAPTER *ioc, u16 handle, u64 lun,
 	}
 
 	ioc_state = mpt3sas_base_get_iocstate(ioc, 0);
+	if (ioc_state == ~0) {
+		pr_info(MPT3SAS_FMT "%s: HBA unreachable\n",
+                    __func__, ioc->name);
+		return FAILED;
+	}
 	if (ioc_state & MPI2_DOORBELL_USED) {
 		dhsprintk(ioc, pr_info(MPT3SAS_FMT
 			"unexpected doorbell active!\n", ioc->name));

^ permalink raw reply related	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2018-10-12 23:43 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-26  4:22 [PATCH v4 0/6] mpt3sas: Hot-Plug Surprise removal support on IOC Suganath Prabu S
2018-09-26  4:22 ` [PATCH v4 1/6] mpt3sas: Introduce mpt3sas_base_pci_device_is_available Suganath Prabu S
2018-09-26 21:32   ` Bjorn Helgaas
2018-09-27  7:03     ` Lukas Wunner
2018-09-27 18:47       ` Bjorn Helgaas
2018-09-27 19:09         ` Lukas Wunner
2018-10-01  6:57           ` Suganath Prabu Subramani
2018-10-01 20:40             ` Bjorn Helgaas
2018-10-02 14:04               ` Bjorn Helgaas
2018-10-08  6:44                 ` Suganath Prabu Subramani
2018-10-12  5:47                   ` Sreekanth Reddy
2018-10-12 23:43                   ` Bjorn Helgaas
2018-09-26  4:22 ` [PATCH v4 2/6] mpt3sas: Separate out mpt3sas_wait_for_ioc_to_operational Suganath Prabu S
2018-09-26 21:03   ` Bjorn Helgaas
2018-09-27 10:31     ` Suganath Prabu Subramani
2018-09-26  4:22 ` [PATCH v4 3/6] mpt3sas: Introdude _scsih_get_shost_and_ioc Suganath Prabu S
2018-09-26 21:09   ` Bjorn Helgaas
2018-10-01  7:27     ` Suganath Prabu Subramani
2018-10-01 13:40       ` Bjorn Helgaas
2018-10-08  6:47         ` Suganath Prabu Subramani
2018-09-26 21:33   ` Bjorn Helgaas
2018-09-26  4:22 ` [PATCH v4 4/6] mpt3sas: Fix Sync cache command failure during driver unload Suganath Prabu S
2018-09-26  4:22 ` [PATCH v4 5/6] mpt3sas: Fix driver modifying NVRAM/persistent data Suganath Prabu S
2018-09-26  4:22 ` [PATCH v4 6/6] mpt3sas: Bump driver version to 27.100.00.00 Suganath Prabu S

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).