All of lore.kernel.org
 help / color / mirror / Atom feed
* mpt2sas,mpt3sas watchdog device removal
@ 2013-05-15 17:24 Joe Lawrence
  2013-05-15 17:26 ` [PATCH] mpt2sas,mpt3sas: make watchdog instantiated device removal safe Joe Lawrence
  2013-05-15 17:29 ` mpt2sas,mpt3sas watchdog device removal Joe Lawrence
  0 siblings, 2 replies; 12+ messages in thread
From: Joe Lawrence @ 2013-05-15 17:24 UTC (permalink / raw)
  To: linux-scsi
  Cc: James E.J. Bottomley, Sreekanth Reddy, Desai, Kashyap,
	Nagalakshmi Nandigama, Bjorn Helgaas

This is a followup to the earlier discussion of MPT watchdog device
removal calling directly into PCI core API:

[1] http://thread.gmane.org/gmane.linux.scsi/80629

I've tested two safer, alternative methods of removing MPT hosts from
the SCSI topology. Both involve wrapping the existing MPT .remove
routine, ensuring mutual exclusion between regularly scheduled PCI
device removal and the drivers' periodic watchdog thread. Both changes
tested well against surprise PCI removal of LSI SAS 9211-8i HBAs while
driving direct IO out to attached disks.

The first version is straightforward, essentially just adding a common
mutex and checking that the driver still cares about a given PCI device
before removing.

The second version is an attempt to detach only from the SCSI topology as
soon as possible. Later PCI removal cleans up the rest of the resources.

Neither patch maps cleanly to the MPT fusion driver as its watchdog
thread resides in the mptbase module. The code currently uses PCI core
API pci_remove_bus_device to route around the driver's module
dependencies to call from mptbase to mptsas:

Module                  Size  Used by
mptsas                 62366  8                      << PCI .remove
mptscsih               38803  1 mptsas
mptbase                99878  2 mptsas,mptscsih      << watchdog

The fusion driver has devised other means of calling from mptbase to
mptscsih, for example, via the schedule_dead_ioc_flush_running_cmds
function pointer. The two changes I explored were made with a relatively
light hand, so I didn't know how best to proceed with a MPT fusion
patch.

Comments on either removal patch strategy welcome. It would also be
great if we had documentation guiding the SCSI LLDs in how to safely and
completely remove attached hosts in hotplug and defective HW scenarios.

Regards,

-- Joe

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH] mpt2sas,mpt3sas: make watchdog instantiated device removal safe
  2013-05-15 17:24 mpt2sas,mpt3sas watchdog device removal Joe Lawrence
@ 2013-05-15 17:26 ` Joe Lawrence
  2013-05-17 15:29   ` Bjorn Helgaas
  2013-05-15 17:29 ` mpt2sas,mpt3sas watchdog device removal Joe Lawrence
  1 sibling, 1 reply; 12+ messages in thread
From: Joe Lawrence @ 2013-05-15 17:26 UTC (permalink / raw)
  To: linux-scsi
  Cc: James E.J. Bottomley, Sreekanth Reddy, Desai, Kashyap,
	Nagalakshmi Nandigama, Bjorn Helgaas

>From 9fc1a958ad48718216fbdc19405297dd11d11539 Mon Sep 17 00:00:00 2001
From: Joe Lawrence <joe.lawrence@stratus.com>
Date: Tue, 14 May 2013 15:41:17 -0400
Subject: [PATCH] mpt2sas,mpt3sas: make watchdog instantiated device removal
 safe

Calling pci_stop_and_remove_bus_device from a SCSI LLD may introduce
device removal races with PCI callback functions such as device removal.

Modify the MPT PCI callbacks to verify Scsi_Host attachment under a
common mutex to prevent racing the driver's watchdog device removal.
(PCI callbacks can trust pci_dev existence.)

To protect the watchdog device removal thread from racing PCI removal,
first wrapper the existing PCI device .remove function. The watchdog
should directly call this function, not PCI hotplug API. Under the same
common mutex, this routine should verify that pci_dev is on driver's
ioc_list before proceeding with device removal. (Asynchronous driver
watchdog kthread cannot trust pci_dev existence.)

Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com>
---
 drivers/scsi/mpt2sas/mpt2sas_base.c  | 24 +++------
 drivers/scsi/mpt2sas/mpt2sas_base.h  |  2 +
 drivers/scsi/mpt2sas/mpt2sas_scsih.c | 96 ++++++++++++++++++++++++++++--------
 drivers/scsi/mpt3sas/mpt3sas_base.c  | 22 +++------
 drivers/scsi/mpt3sas/mpt3sas_base.h  |  2 +
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 95 ++++++++++++++++++++++++++++-------
 6 files changed, 171 insertions(+), 70 deletions(-)

diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.c b/drivers/scsi/mpt2sas/mpt2sas_base.c
index bcb23d2..eb24ddd 100644
--- a/drivers/scsi/mpt2sas/mpt2sas_base.c
+++ b/drivers/scsi/mpt2sas/mpt2sas_base.c
@@ -116,24 +116,16 @@ module_param_call(mpt2sas_fwfault_debug, _scsih_set_fwfault_debug,
 
 /**
  *  mpt2sas_remove_dead_ioc_func - kthread context to remove dead ioc
- * @arg: input argument, used to derive ioc
+ * @pdev: PCI device struct
  *
- * Return 0 if controller is removed from pci subsystem.
- * Return -1 for other case.
+ * Returns 0.
  */
-static int mpt2sas_remove_dead_ioc_func(void *arg)
+static int
+mpt2sas_remove_dead_ioc_func(void *arg)
 {
-		struct MPT2SAS_ADAPTER *ioc = (struct MPT2SAS_ADAPTER *)arg;
-		struct pci_dev *pdev;
-
-		if ((ioc == NULL))
-			return -1;
-
-		pdev = ioc->pdev;
-		if ((pdev == NULL))
-			return -1;
-		pci_stop_and_remove_bus_device(pdev);
-		return 0;
+	/* mpt2sas_scsih_detach_pci will validate pci_dev */
+	mpt2sas_scsih_detach_pci((struct pci_dev *)arg);
+	return 0;
 }
 
 
@@ -192,7 +184,7 @@ _base_fault_reset_work(struct work_struct *work)
 		 */
 		ioc->remove_host = 1;
 		/*Remove the Dead Host */
-		p = kthread_run(mpt2sas_remove_dead_ioc_func, ioc,
+		p = kthread_run(mpt2sas_remove_dead_ioc_func, ioc->pdev,
 		    "mpt2sas_dead_ioc_%d", ioc->id);
 		if (IS_ERR(p)) {
 			printk(MPT2SAS_ERR_FMT
diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.h b/drivers/scsi/mpt2sas/mpt2sas_base.h
index 4caaac1..d88515d 100644
--- a/drivers/scsi/mpt2sas/mpt2sas_base.h
+++ b/drivers/scsi/mpt2sas/mpt2sas_base.h
@@ -1079,6 +1079,8 @@ void mpt2sas_port_enable_complete(struct MPT2SAS_ADAPTER *ioc);
 
 void mpt2sas_scsih_reset_handler(struct MPT2SAS_ADAPTER *ioc, int reset_phase);
 
+void mpt2sas_scsih_detach_pci(struct pci_dev *pdev);
+
 /* config shared API */
 u8 mpt2sas_config_done(struct MPT2SAS_ADAPTER *ioc, u16 smid, u8 msix_index,
     u32 reply);
diff --git a/drivers/scsi/mpt2sas/mpt2sas_scsih.c b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
index c6bdc92..8bff162 100644
--- a/drivers/scsi/mpt2sas/mpt2sas_scsih.c
+++ b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
@@ -78,6 +78,8 @@ static int _scsih_scan_finished(struct Scsi_Host *shost, unsigned long time);
 LIST_HEAD(mpt2sas_ioc_list);
 
 /* local parameters */
+DEFINE_MUTEX(_mpt2sas_pci_mutex);
+
 static u8 scsi_io_cb_idx = -1;
 static u8 tm_cb_idx = -1;
 static u8 ctl_cb_idx = -1;
@@ -7660,11 +7662,17 @@ _scsih_ir_shutdown(struct MPT2SAS_ADAPTER *ioc)
 static void
 _scsih_shutdown(struct pci_dev *pdev)
 {
-	struct Scsi_Host *shost = pci_get_drvdata(pdev);
-	struct MPT2SAS_ADAPTER *ioc = shost_priv(shost);
+	struct Scsi_Host *shost;
+	struct MPT2SAS_ADAPTER *ioc;
 	struct workqueue_struct	*wq;
 	unsigned long flags;
 
+	mutex_lock(&_mpt2sas_pci_mutex);
+	shost = pci_get_drvdata(pdev);
+	if (!shost)
+		goto out;
+	ioc = shost_priv(shost);
+
 	ioc->remove_host = 1;
 	_scsih_fw_event_cleanup_queue(ioc);
 
@@ -7677,26 +7685,41 @@ _scsih_shutdown(struct pci_dev *pdev)
 
 	_scsih_ir_shutdown(ioc);
 	mpt2sas_base_detach(ioc);
+out:
+	mutex_unlock(&_mpt2sas_pci_mutex);
 }
 
 /**
- * _scsih_remove - detach and remove add host
+ * mpt2sas_scsih_detach_pci - detach shost from PCI device
  * @pdev: PCI device struct
  *
- * Routine called when unloading the driver.
+ * Routine called by pci driver .remove callback and watchdog-created
+ *   mpt2sas_remove_dead_ioc_func kthread.
  * Return nothing.
  */
-static void
-_scsih_remove(struct pci_dev *pdev)
+void
+mpt2sas_scsih_detach_pci(struct pci_dev *pdev)
 {
-	struct Scsi_Host *shost = pci_get_drvdata(pdev);
-	struct MPT2SAS_ADAPTER *ioc = shost_priv(shost);
+	struct Scsi_Host *shost;
+	struct MPT2SAS_ADAPTER *ioc;
 	struct _sas_port *mpt2sas_port, *next_port;
 	struct _raid_device *raid_device, *next;
 	struct MPT2SAS_TARGET *sas_target_priv_data;
 	struct workqueue_struct	*wq;
 	unsigned long flags;
 
+	mutex_lock(&_mpt2sas_pci_mutex);
+	/* verify driver still knows about this pdev */
+	list_for_each_entry(ioc, &mpt2sas_ioc_list, list) {
+		if (ioc->pdev == pdev)
+			goto remove;
+	}
+	goto out;
+remove:
+	shost = pci_get_drvdata(pdev);
+	if (!shost)
+		goto out;
+
 	ioc->remove_host = 1;
 	_scsih_fw_event_cleanup_queue(ioc);
 
@@ -7750,6 +7773,21 @@ _scsih_remove(struct pci_dev *pdev)
 	list_del(&ioc->list);
 	scsi_remove_host(shost);
 	scsi_host_put(shost);
+out:
+	mutex_unlock(&_mpt2sas_pci_mutex);
+}
+
+/**
+ * _scsih_remove - detach and remove host
+ * @pdev: PCI device struct
+ *
+ * Routine called when unloading the driver and hotplug remove.
+ * Return nothing.
+ */
+static void
+_scsih_remove(struct pci_dev *pdev)
+{
+	mpt2sas_scsih_detach_pci(pdev);
 }
 
 /**
@@ -8159,10 +8197,16 @@ _scsih_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 static int
 _scsih_suspend(struct pci_dev *pdev, pm_message_t state)
 {
-	struct Scsi_Host *shost = pci_get_drvdata(pdev);
-	struct MPT2SAS_ADAPTER *ioc = shost_priv(shost);
+	struct Scsi_Host *shost;
+	struct MPT2SAS_ADAPTER *ioc;
 	pci_power_t device_state;
 
+	mutex_lock(&_mpt2sas_pci_mutex);
+	shost = pci_get_drvdata(pdev);
+	if (!shost)
+		goto out;
+	ioc = shost_priv(shost);
+
 	mpt2sas_base_stop_watchdog(ioc);
 	scsi_block_requests(shost);
 	device_state = pci_choose_state(pdev, state);
@@ -8174,6 +8218,9 @@ _scsih_suspend(struct pci_dev *pdev, pm_message_t state)
 	pci_save_state(pdev);
 	pci_disable_device(pdev);
 	pci_set_power_state(pdev, device_state);
+out:
+	mutex_unlock(&_mpt2sas_pci_mutex);
+
 	return 0;
 }
 
@@ -8186,10 +8233,17 @@ _scsih_suspend(struct pci_dev *pdev, pm_message_t state)
 static int
 _scsih_resume(struct pci_dev *pdev)
 {
-	struct Scsi_Host *shost = pci_get_drvdata(pdev);
-	struct MPT2SAS_ADAPTER *ioc = shost_priv(shost);
-	pci_power_t device_state = pdev->current_state;
-	int r;
+	struct Scsi_Host *shost;
+	struct MPT2SAS_ADAPTER *ioc;
+	pci_power_t device_state;
+	int r = 0;
+
+	mutex_lock(&_mpt2sas_pci_mutex);
+	shost = pci_get_drvdata(pdev);
+	if (!shost)
+		goto out;
+	ioc = shost_priv(shost);
+	device_state = pdev->current_state;
 
 	printk(MPT2SAS_INFO_FMT "pdev=0x%p, slot=%s, previous "
 	    "operating state [D%d]\n", ioc->name, pdev,
@@ -8200,13 +8254,15 @@ _scsih_resume(struct pci_dev *pdev)
 	pci_restore_state(pdev);
 	ioc->pdev = pdev;
 	r = mpt2sas_base_map_resources(ioc);
-	if (r)
-		return r;
+	if (!r) {
+		mpt2sas_base_hard_reset_handler(ioc, CAN_SLEEP, SOFT_RESET);
+		scsi_unblock_requests(shost);
+		mpt2sas_base_start_watchdog(ioc);
+	}
+out:
+	mutex_unlock(&_mpt2sas_pci_mutex);
 
-	mpt2sas_base_hard_reset_handler(ioc, CAN_SLEEP, SOFT_RESET);
-	scsi_unblock_requests(shost);
-	mpt2sas_base_start_watchdog(ioc);
-	return 0;
+	return r;
 }
 #endif /* CONFIG_PM */
 
diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 1836003..9d4f314 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -111,23 +111,15 @@ module_param_call(mpt3sas_fwfault_debug, _scsih_set_fwfault_debug,
 
 /**
  *  mpt3sas_remove_dead_ioc_func - kthread context to remove dead ioc
- * @arg: input argument, used to derive ioc
+ * @pdev: PCI device struct
  *
- * Return 0 if controller is removed from pci subsystem.
- * Return -1 for other case.
+ * Returns 0.
  */
-static int mpt3sas_remove_dead_ioc_func(void *arg)
+static int
+mpt3sas_remove_dead_ioc_func(void *arg)
 {
-	struct MPT3SAS_ADAPTER *ioc = (struct MPT3SAS_ADAPTER *)arg;
-	struct pci_dev *pdev;
-
-	if ((ioc == NULL))
-		return -1;
-
-	pdev = ioc->pdev;
-	if ((pdev == NULL))
-		return -1;
-	pci_stop_and_remove_bus_device(pdev);
+	/* mpt3sas_scsih_detach_pci will validate pci_dev */
+	mpt3sas_scsih_detach_pci((struct pci_dev *)arg);
 	return 0;
 }
 
@@ -173,7 +165,7 @@ _base_fault_reset_work(struct work_struct *work)
 		 */
 		ioc->remove_host = 1;
 		/*Remove the Dead Host */
-		p = kthread_run(mpt3sas_remove_dead_ioc_func, ioc,
+		p = kthread_run(mpt3sas_remove_dead_ioc_func, ioc->pdev,
 		    "mpt3sas_dead_ioc_%d", ioc->id);
 		if (IS_ERR(p))
 			pr_err(MPT3SAS_FMT
diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h b/drivers/scsi/mpt3sas/mpt3sas_base.h
index 994656c..225c84f 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.h
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.h
@@ -1009,6 +1009,8 @@ struct _sas_device *mpt3sas_scsih_sas_device_find_by_sas_address(
 
 void mpt3sas_port_enable_complete(struct MPT3SAS_ADAPTER *ioc);
 
+void mpt3sas_scsih_detach_pci(struct pci_dev *pdev);
+
 /* config shared API */
 u8 mpt3sas_config_done(struct MPT3SAS_ADAPTER *ioc, u16 smid, u8 msix_index,
 	u32 reply);
diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index dcbf7c8..5b8c365 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -81,6 +81,8 @@ static int _scsih_scan_finished(struct Scsi_Host *shost, unsigned long time);
 LIST_HEAD(mpt3sas_ioc_list);
 
 /* local parameters */
+DEFINE_MUTEX(_mpt3sas_pci_mutex);
+
 static u8 scsi_io_cb_idx = -1;
 static u8 tm_cb_idx = -1;
 static u8 ctl_cb_idx = -1;
@@ -7365,22 +7367,36 @@ _scsih_ir_shutdown(struct MPT3SAS_ADAPTER *ioc)
 }
 
 /**
- * _scsih_remove - detach and remove add host
+ * mpt3sas_scsih_detach_pci - detach shost from PCI device
  * @pdev: PCI device struct
  *
- * Routine called when unloading the driver.
+ * Routine called by pci driver .remove callback and watchdog-created
+ *   mpt3sas_remove_dead_ioc_func kthread.
  * Return nothing.
  */
-static void _scsih_remove(struct pci_dev *pdev)
+void
+mpt3sas_scsih_detach_pci(struct pci_dev *pdev)
 {
-	struct Scsi_Host *shost = pci_get_drvdata(pdev);
-	struct MPT3SAS_ADAPTER *ioc = shost_priv(shost);
+	struct Scsi_Host *shost;
+	struct MPT3SAS_ADAPTER *ioc;
 	struct _sas_port *mpt3sas_port, *next_port;
 	struct _raid_device *raid_device, *next;
 	struct MPT3SAS_TARGET *sas_target_priv_data;
 	struct workqueue_struct	*wq;
 	unsigned long flags;
 
+	mutex_lock(&_mpt3sas_pci_mutex);
+	/* verify driver still knows about this pdev */
+	list_for_each_entry(ioc, &mpt3sas_ioc_list, list) {
+		if (ioc->pdev == pdev)
+			goto remove;
+	}
+	goto out;
+remove:
+	shost = pci_get_drvdata(pdev);
+	if (!shost)
+		goto out;
+
 	ioc->remove_host = 1;
 	_scsih_fw_event_cleanup_queue(ioc);
 
@@ -7434,6 +7450,21 @@ static void _scsih_remove(struct pci_dev *pdev)
 	list_del(&ioc->list);
 	scsi_remove_host(shost);
 	scsi_host_put(shost);
+out:
+	mutex_unlock(&_mpt3sas_pci_mutex);
+}
+
+/**
+ * _scsih_remove - detach and remove host
+ * @pdev: PCI device struct
+ *
+ * Routine called when unloading the driver and hotplug remove.
+ * Return nothing.
+ */
+static void
+_scsih_remove(struct pci_dev *pdev)
+{
+	mpt3sas_scsih_detach_pci(pdev);
 }
 
 /**
@@ -7445,11 +7476,17 @@ static void _scsih_remove(struct pci_dev *pdev)
 static void
 _scsih_shutdown(struct pci_dev *pdev)
 {
-	struct Scsi_Host *shost = pci_get_drvdata(pdev);
-	struct MPT3SAS_ADAPTER *ioc = shost_priv(shost);
+	struct Scsi_Host *shost;
+	struct MPT3SAS_ADAPTER *ioc;
 	struct workqueue_struct	*wq;
 	unsigned long flags;
 
+	mutex_lock(&_mpt3sas_pci_mutex);
+	shost = pci_get_drvdata(pdev);
+	if (!shost)
+		goto out;
+	ioc = shost_priv(shost);
+
 	ioc->remove_host = 1;
 	_scsih_fw_event_cleanup_queue(ioc);
 
@@ -7462,6 +7499,8 @@ _scsih_shutdown(struct pci_dev *pdev)
 
 	_scsih_ir_shutdown(ioc);
 	mpt3sas_base_detach(ioc);
+out:
+	mutex_unlock(&_mpt3sas_pci_mutex);
 }
 
 
@@ -7854,10 +7893,16 @@ _scsih_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 static int
 _scsih_suspend(struct pci_dev *pdev, pm_message_t state)
 {
-	struct Scsi_Host *shost = pci_get_drvdata(pdev);
-	struct MPT3SAS_ADAPTER *ioc = shost_priv(shost);
+	struct Scsi_Host *shost;
+	struct MPT3SAS_ADAPTER *ioc;
 	pci_power_t device_state;
 
+	mutex_lock(&_mpt3sas_pci_mutex);
+	shost = pci_get_drvdata(pdev);
+	if (!shost)
+		goto out;
+	ioc = shost_priv(shost);
+
 	mpt3sas_base_stop_watchdog(ioc);
 	flush_scheduled_work();
 	scsi_block_requests(shost);
@@ -7869,6 +7914,9 @@ _scsih_suspend(struct pci_dev *pdev, pm_message_t state)
 	pci_save_state(pdev);
 	mpt3sas_base_free_resources(ioc);
 	pci_set_power_state(pdev, device_state);
+out:
+	mutex_unlock(&_mpt3sas_pci_mutex);
+
 	return 0;
 }
 
@@ -7881,10 +7929,17 @@ _scsih_suspend(struct pci_dev *pdev, pm_message_t state)
 static int
 _scsih_resume(struct pci_dev *pdev)
 {
-	struct Scsi_Host *shost = pci_get_drvdata(pdev);
-	struct MPT3SAS_ADAPTER *ioc = shost_priv(shost);
-	pci_power_t device_state = pdev->current_state;
-	int r;
+	struct Scsi_Host *shost;
+	struct MPT3SAS_ADAPTER *ioc;
+	pci_power_t device_state;
+	int r = 0;
+
+	mutex_lock(&_mpt3sas_pci_mutex);
+	shost = pci_get_drvdata(pdev);
+	if (!shost)
+		goto out;
+	ioc = shost_priv(shost);
+	device_state = pdev->current_state;
 
 	pr_info(MPT3SAS_FMT
 		"pdev=0x%p, slot=%s, previous operating state [D%d]\n",
@@ -7895,13 +7950,15 @@ _scsih_resume(struct pci_dev *pdev)
 	pci_restore_state(pdev);
 	ioc->pdev = pdev;
 	r = mpt3sas_base_map_resources(ioc);
-	if (r)
-		return r;
+	if (!r) {
+		mpt3sas_base_hard_reset_handler(ioc, CAN_SLEEP, SOFT_RESET);
+		scsi_unblock_requests(shost);
+		mpt3sas_base_start_watchdog(ioc);
+	}
+out:
+	mutex_unlock(&_mpt3sas_pci_mutex);
 
-	mpt3sas_base_hard_reset_handler(ioc, CAN_SLEEP, SOFT_RESET);
-	scsi_unblock_requests(shost);
-	mpt3sas_base_start_watchdog(ioc);
-	return 0;
+	return r;
 }
 #endif /* CONFIG_PM */
 
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: mpt2sas,mpt3sas watchdog device removal
  2013-05-15 17:24 mpt2sas,mpt3sas watchdog device removal Joe Lawrence
  2013-05-15 17:26 ` [PATCH] mpt2sas,mpt3sas: make watchdog instantiated device removal safe Joe Lawrence
@ 2013-05-15 17:29 ` Joe Lawrence
  2013-05-17 15:29   ` Bjorn Helgaas
  2013-07-16 12:00   ` Reddy, Sreekanth
  1 sibling, 2 replies; 12+ messages in thread
From: Joe Lawrence @ 2013-05-15 17:29 UTC (permalink / raw)
  To: linux-scsi
  Cc: James E.J. Bottomley, Sreekanth Reddy, Desai, Kashyap,
	Nagalakshmi Nandigama, Bjorn Helgaas

>From 84ac7a35ebd61e84d4254eae78bb967de17254c2 Mon Sep 17 00:00:00 2001
From: Joe Lawrence <joe.lawrence@stratus.com>
Date: Wed, 15 May 2013 12:52:31 -0400
Subject: [PATCH] mpt2sas,mpt3sas: make watchdog instantiated device removal
 safe

Calling pci_stop_and_remove_bus_device from a SCSI LLD may introduce
device removal races with PCI callback functions.

Simplify the mpt2sas watchdog mechanism by separating PCI device removal
from SCSI midlayer detachment. When the watchdog wishes to remove a SCSI
device from the topology, detach from the SCSI midlayer, leaving the PCI
device alone. Adjust various pci_driver callbacks to account for a
potentially SCSI detached PCI device.

Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com>
---
 drivers/scsi/mpt2sas/mpt2sas_base.c  | 43 +---------------
 drivers/scsi/mpt2sas/mpt2sas_base.h  |  2 +
 drivers/scsi/mpt2sas/mpt2sas_scsih.c | 97 +++++++++++++++++++++---------------
 drivers/scsi/mpt3sas/mpt3sas_base.c  | 43 +---------------
 drivers/scsi/mpt3sas/mpt3sas_base.h  |  2 +
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 65 +++++++++++++++---------
 6 files changed, 105 insertions(+), 147 deletions(-)

diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.c b/drivers/scsi/mpt2sas/mpt2sas_base.c
index bcb23d2..cc1bf28 100644
--- a/drivers/scsi/mpt2sas/mpt2sas_base.c
+++ b/drivers/scsi/mpt2sas/mpt2sas_base.c
@@ -57,7 +57,6 @@
 #include <linux/sort.h>
 #include <linux/io.h>
 #include <linux/time.h>
-#include <linux/kthread.h>
 #include <linux/aer.h>
 
 #include "mpt2sas_base.h"
@@ -115,29 +114,6 @@ module_param_call(mpt2sas_fwfault_debug, _scsih_set_fwfault_debug,
     param_get_int, &mpt2sas_fwfault_debug, 0644);
 
 /**
- *  mpt2sas_remove_dead_ioc_func - kthread context to remove dead ioc
- * @arg: input argument, used to derive ioc
- *
- * Return 0 if controller is removed from pci subsystem.
- * Return -1 for other case.
- */
-static int mpt2sas_remove_dead_ioc_func(void *arg)
-{
-		struct MPT2SAS_ADAPTER *ioc = (struct MPT2SAS_ADAPTER *)arg;
-		struct pci_dev *pdev;
-
-		if ((ioc == NULL))
-			return -1;
-
-		pdev = ioc->pdev;
-		if ((pdev == NULL))
-			return -1;
-		pci_stop_and_remove_bus_device(pdev);
-		return 0;
-}
-
-
-/**
  * _base_fault_reset_work - workq handling ioc fault conditions
  * @work: input argument, used to derive ioc
  * Context: sleep.
@@ -152,7 +128,6 @@ _base_fault_reset_work(struct work_struct *work)
 	unsigned long	 flags;
 	u32 doorbell;
 	int rc;
-	struct task_struct *p;
 
 	spin_lock_irqsave(&ioc->ioc_reset_in_progress_lock, flags);
 	if (ioc->shost_recovery || ioc->pci_error_recovery)
@@ -186,23 +161,7 @@ _base_fault_reset_work(struct work_struct *work)
 		 * command back from HW.
 		 */
 		ioc->schedule_dead_ioc_flush_running_cmds(ioc);
-		/*
-		 * Set remove_host flag early since kernel thread will
-		 * take some time to execute.
-		 */
-		ioc->remove_host = 1;
-		/*Remove the Dead Host */
-		p = kthread_run(mpt2sas_remove_dead_ioc_func, ioc,
-		    "mpt2sas_dead_ioc_%d", ioc->id);
-		if (IS_ERR(p)) {
-			printk(MPT2SAS_ERR_FMT
-			"%s: Running mpt2sas_dead_ioc thread failed !!!!\n",
-			ioc->name, __func__);
-		} else {
-		    printk(MPT2SAS_ERR_FMT
-			"%s: Running mpt2sas_dead_ioc thread success !!!!\n",
-			ioc->name, __func__);
-		}
+		mpt2sas_scsih_detach(ioc);
 
 		return; /* don't rearm timer */
 	}
diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.h b/drivers/scsi/mpt2sas/mpt2sas_base.h
index 4caaac1..94d0e98 100644
--- a/drivers/scsi/mpt2sas/mpt2sas_base.h
+++ b/drivers/scsi/mpt2sas/mpt2sas_base.h
@@ -817,6 +817,7 @@ struct MPT2SAS_ADAPTER {
 	u8		broadcast_aen_busy;
 	u16		broadcast_aen_pending;
 	u8		shost_recovery;
+	u8		shost_attached;
 
 	struct mutex	reset_in_progress_mutex;
 	spinlock_t 	ioc_reset_in_progress_lock;
@@ -1078,6 +1079,7 @@ struct _sas_device *mpt2sas_scsih_sas_device_find_by_sas_address(
 void mpt2sas_port_enable_complete(struct MPT2SAS_ADAPTER *ioc);
 
 void mpt2sas_scsih_reset_handler(struct MPT2SAS_ADAPTER *ioc, int reset_phase);
+void mpt2sas_scsih_detach(struct MPT2SAS_ADAPTER *ioc);
 
 /* config shared API */
 u8 mpt2sas_config_done(struct MPT2SAS_ADAPTER *ioc, u16 smid, u8 msix_index,
diff --git a/drivers/scsi/mpt2sas/mpt2sas_scsih.c b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
index c6bdc92..a06662c 100644
--- a/drivers/scsi/mpt2sas/mpt2sas_scsih.c
+++ b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
@@ -7652,51 +7652,23 @@ _scsih_ir_shutdown(struct MPT2SAS_ADAPTER *ioc)
 }
 
 /**
- * _scsih_shutdown - routine call during system shutdown
- * @pdev: PCI device struct
- *
- * Return nothing.
- */
-static void
-_scsih_shutdown(struct pci_dev *pdev)
-{
-	struct Scsi_Host *shost = pci_get_drvdata(pdev);
-	struct MPT2SAS_ADAPTER *ioc = shost_priv(shost);
-	struct workqueue_struct	*wq;
-	unsigned long flags;
-
-	ioc->remove_host = 1;
-	_scsih_fw_event_cleanup_queue(ioc);
-
-	spin_lock_irqsave(&ioc->fw_event_lock, flags);
-	wq = ioc->firmware_event_thread;
-	ioc->firmware_event_thread = NULL;
-	spin_unlock_irqrestore(&ioc->fw_event_lock, flags);
-	if (wq)
-		destroy_workqueue(wq);
-
-	_scsih_ir_shutdown(ioc);
-	mpt2sas_base_detach(ioc);
-}
-
-/**
- * _scsih_remove - detach and remove add host
- * @pdev: PCI device struct
+ * mpt2sas_scsih_detach - detach from SCSI midlayer and free resources
+ * @ioc: per adapter object
  *
- * Routine called when unloading the driver.
  * Return nothing.
  */
-static void
-_scsih_remove(struct pci_dev *pdev)
+void
+mpt2sas_scsih_detach(struct MPT2SAS_ADAPTER *ioc)
 {
-	struct Scsi_Host *shost = pci_get_drvdata(pdev);
-	struct MPT2SAS_ADAPTER *ioc = shost_priv(shost);
 	struct _sas_port *mpt2sas_port, *next_port;
 	struct _raid_device *raid_device, *next;
 	struct MPT2SAS_TARGET *sas_target_priv_data;
 	struct workqueue_struct	*wq;
 	unsigned long flags;
 
+	if (!ioc->shost_attached)
+		return;
+
 	ioc->remove_host = 1;
 	_scsih_fw_event_cleanup_queue(ioc);
 
@@ -7745,11 +7717,52 @@ _scsih_remove(struct pci_dev *pdev)
 		ioc->sas_hba.num_phys = 0;
 	}
 
-	sas_remove_host(shost);
+	sas_remove_host(ioc->shost);
+	scsi_remove_host(ioc->shost);
+	ioc->shost_attached = 0;
+}
+
+/**
+ * _scsih_shutdown - routine call during system shutdown
+ * @pdev: PCI device struct
+ *
+ * Return nothing.
+ */
+static void
+_scsih_shutdown(struct pci_dev *pdev)
+{
+	struct Scsi_Host *shost = pci_get_drvdata(pdev);
+	struct MPT2SAS_ADAPTER *ioc = shost_priv(shost);
+
+	ioc->remove_host = 1;
+
+	mpt2sas_base_stop_watchdog(ioc);
+	mpt2sas_scsih_detach(ioc);
+	mpt2sas_base_detach(ioc);
+}
+
+/**
+ * _scsih_remove - detach and remove add host
+ * @pdev: PCI device struct
+ *
+ * Routine called when unloading the driver.
+ * Return nothing.
+ */
+static void
+_scsih_remove(struct pci_dev *pdev)
+{
+	struct Scsi_Host *shost = pci_get_drvdata(pdev);
+	struct MPT2SAS_ADAPTER *ioc = shost_priv(shost);
+
+	ioc->remove_host = 1;
+
+	mpt2sas_base_stop_watchdog(ioc);
+	mpt2sas_scsih_detach(ioc);
+
 	mpt2sas_base_detach(ioc);
 	list_del(&ioc->list);
-	scsi_remove_host(shost);
-	scsi_host_put(shost);
+
+	scsi_host_put(ioc->shost);
 }
 
 /**
@@ -8022,6 +8035,7 @@ _scsih_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	INIT_LIST_HEAD(&ioc->list);
 	list_add_tail(&ioc->list, &mpt2sas_ioc_list);
 	ioc->shost = shost;
+	ioc->shost_attached = 1;
 	ioc->id = mpt_ids++;
 	sprintf(ioc->name, "%s%d", MPT2SAS_DRIVER_NAME, ioc->id);
 	ioc->pdev = pdev;
@@ -8144,6 +8158,7 @@ _scsih_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	list_del(&ioc->list);
 	scsi_remove_host(shost);
  out_add_shost_fail:
+	ioc->shost_attached = 0;
 	scsi_host_put(shost);
 	return -ENODEV;
 }
@@ -8164,7 +8179,8 @@ _scsih_suspend(struct pci_dev *pdev, pm_message_t state)
 	pci_power_t device_state;
 
 	mpt2sas_base_stop_watchdog(ioc);
-	scsi_block_requests(shost);
+	if (ioc->shost_attached)
+		scsi_block_requests(shost);
 	device_state = pci_choose_state(pdev, state);
 	printk(MPT2SAS_INFO_FMT "pdev=0x%p, slot=%s, entering "
 	    "operating state [D%d]\n", ioc->name, pdev,
@@ -8204,7 +8220,8 @@ _scsih_resume(struct pci_dev *pdev)
 		return r;
 
 	mpt2sas_base_hard_reset_handler(ioc, CAN_SLEEP, SOFT_RESET);
-	scsi_unblock_requests(shost);
+	if (ioc->shost_attached)
+		scsi_unblock_requests(shost);
 	mpt2sas_base_start_watchdog(ioc);
 	return 0;
 }
diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 1836003..f8c25a1 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -56,7 +56,6 @@
 #include <linux/dma-mapping.h>
 #include <linux/io.h>
 #include <linux/time.h>
-#include <linux/kthread.h>
 #include <linux/aer.h>
 
 
@@ -110,28 +109,6 @@ module_param_call(mpt3sas_fwfault_debug, _scsih_set_fwfault_debug,
 	param_get_int, &mpt3sas_fwfault_debug, 0644);
 
 /**
- *  mpt3sas_remove_dead_ioc_func - kthread context to remove dead ioc
- * @arg: input argument, used to derive ioc
- *
- * Return 0 if controller is removed from pci subsystem.
- * Return -1 for other case.
- */
-static int mpt3sas_remove_dead_ioc_func(void *arg)
-{
-	struct MPT3SAS_ADAPTER *ioc = (struct MPT3SAS_ADAPTER *)arg;
-	struct pci_dev *pdev;
-
-	if ((ioc == NULL))
-		return -1;
-
-	pdev = ioc->pdev;
-	if ((pdev == NULL))
-		return -1;
-	pci_stop_and_remove_bus_device(pdev);
-	return 0;
-}
-
-/**
  * _base_fault_reset_work - workq handling ioc fault conditions
  * @work: input argument, used to derive ioc
  * Context: sleep.
@@ -146,8 +123,6 @@ _base_fault_reset_work(struct work_struct *work)
 	unsigned long	 flags;
 	u32 doorbell;
 	int rc;
-	struct task_struct *p;
-
 
 	spin_lock_irqsave(&ioc->ioc_reset_in_progress_lock, flags);
 	if (ioc->shost_recovery)
@@ -167,22 +142,8 @@ _base_fault_reset_work(struct work_struct *work)
 		 * command back from HW.
 		 */
 		ioc->schedule_dead_ioc_flush_running_cmds(ioc);
-		/*
-		 * Set remove_host flag early since kernel thread will
-		 * take some time to execute.
-		 */
-		ioc->remove_host = 1;
-		/*Remove the Dead Host */
-		p = kthread_run(mpt3sas_remove_dead_ioc_func, ioc,
-		    "mpt3sas_dead_ioc_%d", ioc->id);
-		if (IS_ERR(p))
-			pr_err(MPT3SAS_FMT
-			"%s: Running mpt3sas_dead_ioc thread failed !!!!\n",
-			ioc->name, __func__);
-		else
-			pr_err(MPT3SAS_FMT
-			"%s: Running mpt3sas_dead_ioc thread success !!!!\n",
-			ioc->name, __func__);
+		mpt3sas_scsih_detach(ioc);
+
 		return; /* don't rearm timer */
 	}
 
diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h b/drivers/scsi/mpt3sas/mpt3sas_base.h
index 994656c..9fbf56c 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.h
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.h
@@ -731,6 +731,7 @@ struct MPT3SAS_ADAPTER {
 	u8		broadcast_aen_busy;
 	u16		broadcast_aen_pending;
 	u8		shost_recovery;
+	u8		shost_attached;
 
 	struct mutex	reset_in_progress_mutex;
 	spinlock_t	ioc_reset_in_progress_lock;
@@ -990,6 +991,7 @@ int mpt3sas_port_enable(struct MPT3SAS_ADAPTER *ioc);
 u8 mpt3sas_scsih_event_callback(struct MPT3SAS_ADAPTER *ioc, u8 msix_index,
 	u32 reply);
 void mpt3sas_scsih_reset_handler(struct MPT3SAS_ADAPTER *ioc, int reset_phase);
+void mpt3sas_scsih_detach(struct MPT3SAS_ADAPTER *ioc);
 
 int mpt3sas_scsih_issue_tm(struct MPT3SAS_ADAPTER *ioc, u16 handle,
 	uint channel, uint id, uint lun, u8 type, u16 smid_task,
diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index dcbf7c8..9ae0914 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -7365,22 +7365,23 @@ _scsih_ir_shutdown(struct MPT3SAS_ADAPTER *ioc)
 }
 
 /**
- * _scsih_remove - detach and remove add host
- * @pdev: PCI device struct
+ * mpt3sas_scsih_detach - detach from SCSI midlayer and free resources
+ * @ioc: per adapter object
  *
- * Routine called when unloading the driver.
  * Return nothing.
  */
-static void _scsih_remove(struct pci_dev *pdev)
+void
+mpt3sas_scsih_detach(struct MPT3SAS_ADAPTER *ioc)
 {
-	struct Scsi_Host *shost = pci_get_drvdata(pdev);
-	struct MPT3SAS_ADAPTER *ioc = shost_priv(shost);
 	struct _sas_port *mpt3sas_port, *next_port;
 	struct _raid_device *raid_device, *next;
 	struct MPT3SAS_TARGET *sas_target_priv_data;
 	struct workqueue_struct	*wq;
 	unsigned long flags;
 
+	if (!ioc->shost_attached)
+		return;
+
 	ioc->remove_host = 1;
 	_scsih_fw_event_cleanup_queue(ioc);
 
@@ -7429,11 +7430,9 @@ static void _scsih_remove(struct pci_dev *pdev)
 		ioc->sas_hba.num_phys = 0;
 	}
 
-	sas_remove_host(shost);
-	mpt3sas_base_detach(ioc);
-	list_del(&ioc->list);
-	scsi_remove_host(shost);
-	scsi_host_put(shost);
+	sas_remove_host(ioc->shost);
+	scsi_remove_host(ioc->shost);
+	ioc->shost_attached = 0;
 }
 
 /**
@@ -7447,23 +7446,37 @@ _scsih_shutdown(struct pci_dev *pdev)
 {
 	struct Scsi_Host *shost = pci_get_drvdata(pdev);
 	struct MPT3SAS_ADAPTER *ioc = shost_priv(shost);
-	struct workqueue_struct	*wq;
-	unsigned long flags;
 
 	ioc->remove_host = 1;
-	_scsih_fw_event_cleanup_queue(ioc);
-
-	spin_lock_irqsave(&ioc->fw_event_lock, flags);
-	wq = ioc->firmware_event_thread;
-	ioc->firmware_event_thread = NULL;
-	spin_unlock_irqrestore(&ioc->fw_event_lock, flags);
-	if (wq)
-		destroy_workqueue(wq);
 
-	_scsih_ir_shutdown(ioc);
+	mpt3sas_base_stop_watchdog(ioc);
+	mpt3sas_scsih_detach(ioc);
 	mpt3sas_base_detach(ioc);
 }
 
+/**
+ * _scsih_remove - detach and remove add host
+ * @pdev: PCI device struct
+ *
+ * Routine called when unloading the driver.
+ * Return nothing.
+ */
+static void
+_scsih_remove(struct pci_dev *pdev)
+{
+	struct Scsi_Host *shost = pci_get_drvdata(pdev);
+	struct MPT3SAS_ADAPTER *ioc = shost_priv(shost);
+
+	ioc->remove_host = 1;
+
+	mpt3sas_base_stop_watchdog(ioc);
+	mpt3sas_scsih_detach(ioc);
+
+	mpt3sas_base_detach(ioc);
+	list_del(&ioc->list);
+
+	scsi_host_put(ioc->shost);
+}
 
 /**
  * _scsih_probe_boot_devices - reports 1st device
@@ -7735,6 +7748,7 @@ _scsih_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	INIT_LIST_HEAD(&ioc->list);
 	list_add_tail(&ioc->list, &mpt3sas_ioc_list);
 	ioc->shost = shost;
+	ioc->shost_attached = 1;
 	ioc->id = mpt_ids++;
 	sprintf(ioc->name, "%s%d", MPT3SAS_DRIVER_NAME, ioc->id);
 	ioc->pdev = pdev;
@@ -7839,6 +7853,7 @@ _scsih_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	list_del(&ioc->list);
 	scsi_remove_host(shost);
  out_add_shost_fail:
+	ioc->shost_attached = 0;
 	scsi_host_put(shost);
 	return -ENODEV;
 }
@@ -7860,7 +7875,8 @@ _scsih_suspend(struct pci_dev *pdev, pm_message_t state)
 
 	mpt3sas_base_stop_watchdog(ioc);
 	flush_scheduled_work();
-	scsi_block_requests(shost);
+	if (ioc->shost_attached)
+		scsi_block_requests(shost);
 	device_state = pci_choose_state(pdev, state);
 	pr_info(MPT3SAS_FMT
 		"pdev=0x%p, slot=%s, entering operating state [D%d]\n",
@@ -7899,7 +7915,8 @@ _scsih_resume(struct pci_dev *pdev)
 		return r;
 
 	mpt3sas_base_hard_reset_handler(ioc, CAN_SLEEP, SOFT_RESET);
-	scsi_unblock_requests(shost);
+	if (ioc->shost_attached)
+		scsi_unblock_requests(shost);
 	mpt3sas_base_start_watchdog(ioc);
 	return 0;
 }
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] mpt2sas,mpt3sas: make watchdog instantiated device removal safe
  2013-05-15 17:26 ` [PATCH] mpt2sas,mpt3sas: make watchdog instantiated device removal safe Joe Lawrence
@ 2013-05-17 15:29   ` Bjorn Helgaas
  0 siblings, 0 replies; 12+ messages in thread
From: Bjorn Helgaas @ 2013-05-17 15:29 UTC (permalink / raw)
  To: Joe Lawrence
  Cc: linux-scsi, James E.J. Bottomley, Sreekanth Reddy, Desai,
	Kashyap, Nagalakshmi Nandigama, linux-pci

[+cc linux-pci]

On Wed, May 15, 2013 at 11:26 AM, Joe Lawrence <joe.lawrence@stratus.com> wrote:
> From 9fc1a958ad48718216fbdc19405297dd11d11539 Mon Sep 17 00:00:00 2001
> From: Joe Lawrence <joe.lawrence@stratus.com>
> Date: Tue, 14 May 2013 15:41:17 -0400
> Subject: [PATCH] mpt2sas,mpt3sas: make watchdog instantiated device removal
>  safe
>
> Calling pci_stop_and_remove_bus_device from a SCSI LLD may introduce
> device removal races with PCI callback functions such as device removal.
>
> Modify the MPT PCI callbacks to verify Scsi_Host attachment under a
> common mutex to prevent racing the driver's watchdog device removal.
> (PCI callbacks can trust pci_dev existence.)
>
> To protect the watchdog device removal thread from racing PCI removal,
> first wrapper the existing PCI device .remove function. The watchdog
> should directly call this function, not PCI hotplug API. Under the same
> common mutex, this routine should verify that pci_dev is on driver's
> ioc_list before proceeding with device removal. (Asynchronous driver
> watchdog kthread cannot trust pci_dev existence.)

I think it's a mistake to use the strategy of verifying that a pci_dev
still exists.  The driver should be written so that if you have a
pci_dev pointer, you are guaranteed that the pci_dev is still valid.
The driver should clean up any asynchronous threads or pending work
items that keep a pci_dev pointer in its .remove() method, which is
before the pci_dev is deallocated.

If a driver keeps a pointer to a struct pci_dev after .remove()
returns, that's a bug.

Bjorn

> Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com>
> ---
>  drivers/scsi/mpt2sas/mpt2sas_base.c  | 24 +++------
>  drivers/scsi/mpt2sas/mpt2sas_base.h  |  2 +
>  drivers/scsi/mpt2sas/mpt2sas_scsih.c | 96 ++++++++++++++++++++++++++++--------
>  drivers/scsi/mpt3sas/mpt3sas_base.c  | 22 +++------
>  drivers/scsi/mpt3sas/mpt3sas_base.h  |  2 +
>  drivers/scsi/mpt3sas/mpt3sas_scsih.c | 95 ++++++++++++++++++++++++++++-------
>  6 files changed, 171 insertions(+), 70 deletions(-)
>
> diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.c b/drivers/scsi/mpt2sas/mpt2sas_base.c
> index bcb23d2..eb24ddd 100644
> --- a/drivers/scsi/mpt2sas/mpt2sas_base.c
> +++ b/drivers/scsi/mpt2sas/mpt2sas_base.c
> @@ -116,24 +116,16 @@ module_param_call(mpt2sas_fwfault_debug, _scsih_set_fwfault_debug,
>
>  /**
>   *  mpt2sas_remove_dead_ioc_func - kthread context to remove dead ioc
> - * @arg: input argument, used to derive ioc
> + * @pdev: PCI device struct
>   *
> - * Return 0 if controller is removed from pci subsystem.
> - * Return -1 for other case.
> + * Returns 0.
>   */
> -static int mpt2sas_remove_dead_ioc_func(void *arg)
> +static int
> +mpt2sas_remove_dead_ioc_func(void *arg)
>  {
> -               struct MPT2SAS_ADAPTER *ioc = (struct MPT2SAS_ADAPTER *)arg;
> -               struct pci_dev *pdev;
> -
> -               if ((ioc == NULL))
> -                       return -1;
> -
> -               pdev = ioc->pdev;
> -               if ((pdev == NULL))
> -                       return -1;
> -               pci_stop_and_remove_bus_device(pdev);
> -               return 0;
> +       /* mpt2sas_scsih_detach_pci will validate pci_dev */
> +       mpt2sas_scsih_detach_pci((struct pci_dev *)arg);
> +       return 0;
>  }
>
>
> @@ -192,7 +184,7 @@ _base_fault_reset_work(struct work_struct *work)
>                  */
>                 ioc->remove_host = 1;
>                 /*Remove the Dead Host */
> -               p = kthread_run(mpt2sas_remove_dead_ioc_func, ioc,
> +               p = kthread_run(mpt2sas_remove_dead_ioc_func, ioc->pdev,
>                     "mpt2sas_dead_ioc_%d", ioc->id);
>                 if (IS_ERR(p)) {
>                         printk(MPT2SAS_ERR_FMT
> diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.h b/drivers/scsi/mpt2sas/mpt2sas_base.h
> index 4caaac1..d88515d 100644
> --- a/drivers/scsi/mpt2sas/mpt2sas_base.h
> +++ b/drivers/scsi/mpt2sas/mpt2sas_base.h
> @@ -1079,6 +1079,8 @@ void mpt2sas_port_enable_complete(struct MPT2SAS_ADAPTER *ioc);
>
>  void mpt2sas_scsih_reset_handler(struct MPT2SAS_ADAPTER *ioc, int reset_phase);
>
> +void mpt2sas_scsih_detach_pci(struct pci_dev *pdev);
> +
>  /* config shared API */
>  u8 mpt2sas_config_done(struct MPT2SAS_ADAPTER *ioc, u16 smid, u8 msix_index,
>      u32 reply);
> diff --git a/drivers/scsi/mpt2sas/mpt2sas_scsih.c b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
> index c6bdc92..8bff162 100644
> --- a/drivers/scsi/mpt2sas/mpt2sas_scsih.c
> +++ b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
> @@ -78,6 +78,8 @@ static int _scsih_scan_finished(struct Scsi_Host *shost, unsigned long time);
>  LIST_HEAD(mpt2sas_ioc_list);
>
>  /* local parameters */
> +DEFINE_MUTEX(_mpt2sas_pci_mutex);
> +
>  static u8 scsi_io_cb_idx = -1;
>  static u8 tm_cb_idx = -1;
>  static u8 ctl_cb_idx = -1;
> @@ -7660,11 +7662,17 @@ _scsih_ir_shutdown(struct MPT2SAS_ADAPTER *ioc)
>  static void
>  _scsih_shutdown(struct pci_dev *pdev)
>  {
> -       struct Scsi_Host *shost = pci_get_drvdata(pdev);
> -       struct MPT2SAS_ADAPTER *ioc = shost_priv(shost);
> +       struct Scsi_Host *shost;
> +       struct MPT2SAS_ADAPTER *ioc;
>         struct workqueue_struct *wq;
>         unsigned long flags;
>
> +       mutex_lock(&_mpt2sas_pci_mutex);
> +       shost = pci_get_drvdata(pdev);
> +       if (!shost)
> +               goto out;
> +       ioc = shost_priv(shost);
> +
>         ioc->remove_host = 1;
>         _scsih_fw_event_cleanup_queue(ioc);
>
> @@ -7677,26 +7685,41 @@ _scsih_shutdown(struct pci_dev *pdev)
>
>         _scsih_ir_shutdown(ioc);
>         mpt2sas_base_detach(ioc);
> +out:
> +       mutex_unlock(&_mpt2sas_pci_mutex);
>  }
>
>  /**
> - * _scsih_remove - detach and remove add host
> + * mpt2sas_scsih_detach_pci - detach shost from PCI device
>   * @pdev: PCI device struct
>   *
> - * Routine called when unloading the driver.
> + * Routine called by pci driver .remove callback and watchdog-created
> + *   mpt2sas_remove_dead_ioc_func kthread.
>   * Return nothing.
>   */
> -static void
> -_scsih_remove(struct pci_dev *pdev)
> +void
> +mpt2sas_scsih_detach_pci(struct pci_dev *pdev)
>  {
> -       struct Scsi_Host *shost = pci_get_drvdata(pdev);
> -       struct MPT2SAS_ADAPTER *ioc = shost_priv(shost);
> +       struct Scsi_Host *shost;
> +       struct MPT2SAS_ADAPTER *ioc;
>         struct _sas_port *mpt2sas_port, *next_port;
>         struct _raid_device *raid_device, *next;
>         struct MPT2SAS_TARGET *sas_target_priv_data;
>         struct workqueue_struct *wq;
>         unsigned long flags;
>
> +       mutex_lock(&_mpt2sas_pci_mutex);
> +       /* verify driver still knows about this pdev */
> +       list_for_each_entry(ioc, &mpt2sas_ioc_list, list) {
> +               if (ioc->pdev == pdev)
> +                       goto remove;
> +       }
> +       goto out;
> +remove:
> +       shost = pci_get_drvdata(pdev);
> +       if (!shost)
> +               goto out;
> +
>         ioc->remove_host = 1;
>         _scsih_fw_event_cleanup_queue(ioc);
>
> @@ -7750,6 +7773,21 @@ _scsih_remove(struct pci_dev *pdev)
>         list_del(&ioc->list);
>         scsi_remove_host(shost);
>         scsi_host_put(shost);
> +out:
> +       mutex_unlock(&_mpt2sas_pci_mutex);
> +}
> +
> +/**
> + * _scsih_remove - detach and remove host
> + * @pdev: PCI device struct
> + *
> + * Routine called when unloading the driver and hotplug remove.
> + * Return nothing.
> + */
> +static void
> +_scsih_remove(struct pci_dev *pdev)
> +{
> +       mpt2sas_scsih_detach_pci(pdev);
>  }
>
>  /**
> @@ -8159,10 +8197,16 @@ _scsih_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  static int
>  _scsih_suspend(struct pci_dev *pdev, pm_message_t state)
>  {
> -       struct Scsi_Host *shost = pci_get_drvdata(pdev);
> -       struct MPT2SAS_ADAPTER *ioc = shost_priv(shost);
> +       struct Scsi_Host *shost;
> +       struct MPT2SAS_ADAPTER *ioc;
>         pci_power_t device_state;
>
> +       mutex_lock(&_mpt2sas_pci_mutex);
> +       shost = pci_get_drvdata(pdev);
> +       if (!shost)
> +               goto out;
> +       ioc = shost_priv(shost);
> +
>         mpt2sas_base_stop_watchdog(ioc);
>         scsi_block_requests(shost);
>         device_state = pci_choose_state(pdev, state);
> @@ -8174,6 +8218,9 @@ _scsih_suspend(struct pci_dev *pdev, pm_message_t state)
>         pci_save_state(pdev);
>         pci_disable_device(pdev);
>         pci_set_power_state(pdev, device_state);
> +out:
> +       mutex_unlock(&_mpt2sas_pci_mutex);
> +
>         return 0;
>  }
>
> @@ -8186,10 +8233,17 @@ _scsih_suspend(struct pci_dev *pdev, pm_message_t state)
>  static int
>  _scsih_resume(struct pci_dev *pdev)
>  {
> -       struct Scsi_Host *shost = pci_get_drvdata(pdev);
> -       struct MPT2SAS_ADAPTER *ioc = shost_priv(shost);
> -       pci_power_t device_state = pdev->current_state;
> -       int r;
> +       struct Scsi_Host *shost;
> +       struct MPT2SAS_ADAPTER *ioc;
> +       pci_power_t device_state;
> +       int r = 0;
> +
> +       mutex_lock(&_mpt2sas_pci_mutex);
> +       shost = pci_get_drvdata(pdev);
> +       if (!shost)
> +               goto out;
> +       ioc = shost_priv(shost);
> +       device_state = pdev->current_state;
>
>         printk(MPT2SAS_INFO_FMT "pdev=0x%p, slot=%s, previous "
>             "operating state [D%d]\n", ioc->name, pdev,
> @@ -8200,13 +8254,15 @@ _scsih_resume(struct pci_dev *pdev)
>         pci_restore_state(pdev);
>         ioc->pdev = pdev;
>         r = mpt2sas_base_map_resources(ioc);
> -       if (r)
> -               return r;
> +       if (!r) {
> +               mpt2sas_base_hard_reset_handler(ioc, CAN_SLEEP, SOFT_RESET);
> +               scsi_unblock_requests(shost);
> +               mpt2sas_base_start_watchdog(ioc);
> +       }
> +out:
> +       mutex_unlock(&_mpt2sas_pci_mutex);
>
> -       mpt2sas_base_hard_reset_handler(ioc, CAN_SLEEP, SOFT_RESET);
> -       scsi_unblock_requests(shost);
> -       mpt2sas_base_start_watchdog(ioc);
> -       return 0;
> +       return r;
>  }
>  #endif /* CONFIG_PM */
>
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
> index 1836003..9d4f314 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_base.c
> +++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
> @@ -111,23 +111,15 @@ module_param_call(mpt3sas_fwfault_debug, _scsih_set_fwfault_debug,
>
>  /**
>   *  mpt3sas_remove_dead_ioc_func - kthread context to remove dead ioc
> - * @arg: input argument, used to derive ioc
> + * @pdev: PCI device struct
>   *
> - * Return 0 if controller is removed from pci subsystem.
> - * Return -1 for other case.
> + * Returns 0.
>   */
> -static int mpt3sas_remove_dead_ioc_func(void *arg)
> +static int
> +mpt3sas_remove_dead_ioc_func(void *arg)
>  {
> -       struct MPT3SAS_ADAPTER *ioc = (struct MPT3SAS_ADAPTER *)arg;
> -       struct pci_dev *pdev;
> -
> -       if ((ioc == NULL))
> -               return -1;
> -
> -       pdev = ioc->pdev;
> -       if ((pdev == NULL))
> -               return -1;
> -       pci_stop_and_remove_bus_device(pdev);
> +       /* mpt3sas_scsih_detach_pci will validate pci_dev */
> +       mpt3sas_scsih_detach_pci((struct pci_dev *)arg);
>         return 0;
>  }
>
> @@ -173,7 +165,7 @@ _base_fault_reset_work(struct work_struct *work)
>                  */
>                 ioc->remove_host = 1;
>                 /*Remove the Dead Host */
> -               p = kthread_run(mpt3sas_remove_dead_ioc_func, ioc,
> +               p = kthread_run(mpt3sas_remove_dead_ioc_func, ioc->pdev,
>                     "mpt3sas_dead_ioc_%d", ioc->id);
>                 if (IS_ERR(p))
>                         pr_err(MPT3SAS_FMT
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h b/drivers/scsi/mpt3sas/mpt3sas_base.h
> index 994656c..225c84f 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_base.h
> +++ b/drivers/scsi/mpt3sas/mpt3sas_base.h
> @@ -1009,6 +1009,8 @@ struct _sas_device *mpt3sas_scsih_sas_device_find_by_sas_address(
>
>  void mpt3sas_port_enable_complete(struct MPT3SAS_ADAPTER *ioc);
>
> +void mpt3sas_scsih_detach_pci(struct pci_dev *pdev);
> +
>  /* config shared API */
>  u8 mpt3sas_config_done(struct MPT3SAS_ADAPTER *ioc, u16 smid, u8 msix_index,
>         u32 reply);
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> index dcbf7c8..5b8c365 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> @@ -81,6 +81,8 @@ static int _scsih_scan_finished(struct Scsi_Host *shost, unsigned long time);
>  LIST_HEAD(mpt3sas_ioc_list);
>
>  /* local parameters */
> +DEFINE_MUTEX(_mpt3sas_pci_mutex);
> +
>  static u8 scsi_io_cb_idx = -1;
>  static u8 tm_cb_idx = -1;
>  static u8 ctl_cb_idx = -1;
> @@ -7365,22 +7367,36 @@ _scsih_ir_shutdown(struct MPT3SAS_ADAPTER *ioc)
>  }
>
>  /**
> - * _scsih_remove - detach and remove add host
> + * mpt3sas_scsih_detach_pci - detach shost from PCI device
>   * @pdev: PCI device struct
>   *
> - * Routine called when unloading the driver.
> + * Routine called by pci driver .remove callback and watchdog-created
> + *   mpt3sas_remove_dead_ioc_func kthread.
>   * Return nothing.
>   */
> -static void _scsih_remove(struct pci_dev *pdev)
> +void
> +mpt3sas_scsih_detach_pci(struct pci_dev *pdev)
>  {
> -       struct Scsi_Host *shost = pci_get_drvdata(pdev);
> -       struct MPT3SAS_ADAPTER *ioc = shost_priv(shost);
> +       struct Scsi_Host *shost;
> +       struct MPT3SAS_ADAPTER *ioc;
>         struct _sas_port *mpt3sas_port, *next_port;
>         struct _raid_device *raid_device, *next;
>         struct MPT3SAS_TARGET *sas_target_priv_data;
>         struct workqueue_struct *wq;
>         unsigned long flags;
>
> +       mutex_lock(&_mpt3sas_pci_mutex);
> +       /* verify driver still knows about this pdev */
> +       list_for_each_entry(ioc, &mpt3sas_ioc_list, list) {
> +               if (ioc->pdev == pdev)
> +                       goto remove;
> +       }
> +       goto out;
> +remove:
> +       shost = pci_get_drvdata(pdev);
> +       if (!shost)
> +               goto out;
> +
>         ioc->remove_host = 1;
>         _scsih_fw_event_cleanup_queue(ioc);
>
> @@ -7434,6 +7450,21 @@ static void _scsih_remove(struct pci_dev *pdev)
>         list_del(&ioc->list);
>         scsi_remove_host(shost);
>         scsi_host_put(shost);
> +out:
> +       mutex_unlock(&_mpt3sas_pci_mutex);
> +}
> +
> +/**
> + * _scsih_remove - detach and remove host
> + * @pdev: PCI device struct
> + *
> + * Routine called when unloading the driver and hotplug remove.
> + * Return nothing.
> + */
> +static void
> +_scsih_remove(struct pci_dev *pdev)
> +{
> +       mpt3sas_scsih_detach_pci(pdev);
>  }
>
>  /**
> @@ -7445,11 +7476,17 @@ static void _scsih_remove(struct pci_dev *pdev)
>  static void
>  _scsih_shutdown(struct pci_dev *pdev)
>  {
> -       struct Scsi_Host *shost = pci_get_drvdata(pdev);
> -       struct MPT3SAS_ADAPTER *ioc = shost_priv(shost);
> +       struct Scsi_Host *shost;
> +       struct MPT3SAS_ADAPTER *ioc;
>         struct workqueue_struct *wq;
>         unsigned long flags;
>
> +       mutex_lock(&_mpt3sas_pci_mutex);
> +       shost = pci_get_drvdata(pdev);
> +       if (!shost)
> +               goto out;
> +       ioc = shost_priv(shost);
> +
>         ioc->remove_host = 1;
>         _scsih_fw_event_cleanup_queue(ioc);
>
> @@ -7462,6 +7499,8 @@ _scsih_shutdown(struct pci_dev *pdev)
>
>         _scsih_ir_shutdown(ioc);
>         mpt3sas_base_detach(ioc);
> +out:
> +       mutex_unlock(&_mpt3sas_pci_mutex);
>  }
>
>
> @@ -7854,10 +7893,16 @@ _scsih_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  static int
>  _scsih_suspend(struct pci_dev *pdev, pm_message_t state)
>  {
> -       struct Scsi_Host *shost = pci_get_drvdata(pdev);
> -       struct MPT3SAS_ADAPTER *ioc = shost_priv(shost);
> +       struct Scsi_Host *shost;
> +       struct MPT3SAS_ADAPTER *ioc;
>         pci_power_t device_state;
>
> +       mutex_lock(&_mpt3sas_pci_mutex);
> +       shost = pci_get_drvdata(pdev);
> +       if (!shost)
> +               goto out;
> +       ioc = shost_priv(shost);
> +
>         mpt3sas_base_stop_watchdog(ioc);
>         flush_scheduled_work();
>         scsi_block_requests(shost);
> @@ -7869,6 +7914,9 @@ _scsih_suspend(struct pci_dev *pdev, pm_message_t state)
>         pci_save_state(pdev);
>         mpt3sas_base_free_resources(ioc);
>         pci_set_power_state(pdev, device_state);
> +out:
> +       mutex_unlock(&_mpt3sas_pci_mutex);
> +
>         return 0;
>  }
>
> @@ -7881,10 +7929,17 @@ _scsih_suspend(struct pci_dev *pdev, pm_message_t state)
>  static int
>  _scsih_resume(struct pci_dev *pdev)
>  {
> -       struct Scsi_Host *shost = pci_get_drvdata(pdev);
> -       struct MPT3SAS_ADAPTER *ioc = shost_priv(shost);
> -       pci_power_t device_state = pdev->current_state;
> -       int r;
> +       struct Scsi_Host *shost;
> +       struct MPT3SAS_ADAPTER *ioc;
> +       pci_power_t device_state;
> +       int r = 0;
> +
> +       mutex_lock(&_mpt3sas_pci_mutex);
> +       shost = pci_get_drvdata(pdev);
> +       if (!shost)
> +               goto out;
> +       ioc = shost_priv(shost);
> +       device_state = pdev->current_state;
>
>         pr_info(MPT3SAS_FMT
>                 "pdev=0x%p, slot=%s, previous operating state [D%d]\n",
> @@ -7895,13 +7950,15 @@ _scsih_resume(struct pci_dev *pdev)
>         pci_restore_state(pdev);
>         ioc->pdev = pdev;
>         r = mpt3sas_base_map_resources(ioc);
> -       if (r)
> -               return r;
> +       if (!r) {
> +               mpt3sas_base_hard_reset_handler(ioc, CAN_SLEEP, SOFT_RESET);
> +               scsi_unblock_requests(shost);
> +               mpt3sas_base_start_watchdog(ioc);
> +       }
> +out:
> +       mutex_unlock(&_mpt3sas_pci_mutex);
>
> -       mpt3sas_base_hard_reset_handler(ioc, CAN_SLEEP, SOFT_RESET);
> -       scsi_unblock_requests(shost);
> -       mpt3sas_base_start_watchdog(ioc);
> -       return 0;
> +       return r;
>  }
>  #endif /* CONFIG_PM */
>
> --
> 1.8.1.4
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: mpt2sas,mpt3sas watchdog device removal
  2013-05-15 17:29 ` mpt2sas,mpt3sas watchdog device removal Joe Lawrence
@ 2013-05-17 15:29   ` Bjorn Helgaas
  2013-05-17 21:42     ` Joe Lawrence
  2013-07-16 12:00   ` Reddy, Sreekanth
  1 sibling, 1 reply; 12+ messages in thread
From: Bjorn Helgaas @ 2013-05-17 15:29 UTC (permalink / raw)
  To: Joe Lawrence
  Cc: linux-scsi, James E.J. Bottomley, Sreekanth Reddy, Desai,
	Kashyap, Nagalakshmi Nandigama, linux-pci

[+cc linux-pci]

On Wed, May 15, 2013 at 11:29 AM, Joe Lawrence <joe.lawrence@stratus.com> wrote:
> From 84ac7a35ebd61e84d4254eae78bb967de17254c2 Mon Sep 17 00:00:00 2001
> From: Joe Lawrence <joe.lawrence@stratus.com>
> Date: Wed, 15 May 2013 12:52:31 -0400
> Subject: [PATCH] mpt2sas,mpt3sas: make watchdog instantiated device removal
>  safe
>
> Calling pci_stop_and_remove_bus_device from a SCSI LLD may introduce
> device removal races with PCI callback functions.
>
> Simplify the mpt2sas watchdog mechanism by separating PCI device removal
> from SCSI midlayer detachment. When the watchdog wishes to remove a SCSI
> device from the topology, detach from the SCSI midlayer, leaving the PCI
> device alone. Adjust various pci_driver callbacks to account for a
> potentially SCSI detached PCI device.

I don't know the details of the SCSI detachment, but this approach
looks much cleaner to me.

Thanks for doing all this work, Joe.  I know this isn't finished, but
it looks like a great step in making these drivers simpler and more
reliable.

Bjorn

> Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com>
> ---
>  drivers/scsi/mpt2sas/mpt2sas_base.c  | 43 +---------------
>  drivers/scsi/mpt2sas/mpt2sas_base.h  |  2 +
>  drivers/scsi/mpt2sas/mpt2sas_scsih.c | 97 +++++++++++++++++++++---------------
>  drivers/scsi/mpt3sas/mpt3sas_base.c  | 43 +---------------
>  drivers/scsi/mpt3sas/mpt3sas_base.h  |  2 +
>  drivers/scsi/mpt3sas/mpt3sas_scsih.c | 65 +++++++++++++++---------
>  6 files changed, 105 insertions(+), 147 deletions(-)
>
> diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.c b/drivers/scsi/mpt2sas/mpt2sas_base.c
> index bcb23d2..cc1bf28 100644
> --- a/drivers/scsi/mpt2sas/mpt2sas_base.c
> +++ b/drivers/scsi/mpt2sas/mpt2sas_base.c
> @@ -57,7 +57,6 @@
>  #include <linux/sort.h>
>  #include <linux/io.h>
>  #include <linux/time.h>
> -#include <linux/kthread.h>
>  #include <linux/aer.h>
>
>  #include "mpt2sas_base.h"
> @@ -115,29 +114,6 @@ module_param_call(mpt2sas_fwfault_debug, _scsih_set_fwfault_debug,
>      param_get_int, &mpt2sas_fwfault_debug, 0644);
>
>  /**
> - *  mpt2sas_remove_dead_ioc_func - kthread context to remove dead ioc
> - * @arg: input argument, used to derive ioc
> - *
> - * Return 0 if controller is removed from pci subsystem.
> - * Return -1 for other case.
> - */
> -static int mpt2sas_remove_dead_ioc_func(void *arg)
> -{
> -               struct MPT2SAS_ADAPTER *ioc = (struct MPT2SAS_ADAPTER *)arg;
> -               struct pci_dev *pdev;
> -
> -               if ((ioc == NULL))
> -                       return -1;
> -
> -               pdev = ioc->pdev;
> -               if ((pdev == NULL))
> -                       return -1;
> -               pci_stop_and_remove_bus_device(pdev);
> -               return 0;
> -}
> -
> -
> -/**
>   * _base_fault_reset_work - workq handling ioc fault conditions
>   * @work: input argument, used to derive ioc
>   * Context: sleep.
> @@ -152,7 +128,6 @@ _base_fault_reset_work(struct work_struct *work)
>         unsigned long    flags;
>         u32 doorbell;
>         int rc;
> -       struct task_struct *p;
>
>         spin_lock_irqsave(&ioc->ioc_reset_in_progress_lock, flags);
>         if (ioc->shost_recovery || ioc->pci_error_recovery)
> @@ -186,23 +161,7 @@ _base_fault_reset_work(struct work_struct *work)
>                  * command back from HW.
>                  */
>                 ioc->schedule_dead_ioc_flush_running_cmds(ioc);
> -               /*
> -                * Set remove_host flag early since kernel thread will
> -                * take some time to execute.
> -                */
> -               ioc->remove_host = 1;
> -               /*Remove the Dead Host */
> -               p = kthread_run(mpt2sas_remove_dead_ioc_func, ioc,
> -                   "mpt2sas_dead_ioc_%d", ioc->id);
> -               if (IS_ERR(p)) {
> -                       printk(MPT2SAS_ERR_FMT
> -                       "%s: Running mpt2sas_dead_ioc thread failed !!!!\n",
> -                       ioc->name, __func__);
> -               } else {
> -                   printk(MPT2SAS_ERR_FMT
> -                       "%s: Running mpt2sas_dead_ioc thread success !!!!\n",
> -                       ioc->name, __func__);
> -               }
> +               mpt2sas_scsih_detach(ioc);
>
>                 return; /* don't rearm timer */
>         }
> diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.h b/drivers/scsi/mpt2sas/mpt2sas_base.h
> index 4caaac1..94d0e98 100644
> --- a/drivers/scsi/mpt2sas/mpt2sas_base.h
> +++ b/drivers/scsi/mpt2sas/mpt2sas_base.h
> @@ -817,6 +817,7 @@ struct MPT2SAS_ADAPTER {
>         u8              broadcast_aen_busy;
>         u16             broadcast_aen_pending;
>         u8              shost_recovery;
> +       u8              shost_attached;
>
>         struct mutex    reset_in_progress_mutex;
>         spinlock_t      ioc_reset_in_progress_lock;
> @@ -1078,6 +1079,7 @@ struct _sas_device *mpt2sas_scsih_sas_device_find_by_sas_address(
>  void mpt2sas_port_enable_complete(struct MPT2SAS_ADAPTER *ioc);
>
>  void mpt2sas_scsih_reset_handler(struct MPT2SAS_ADAPTER *ioc, int reset_phase);
> +void mpt2sas_scsih_detach(struct MPT2SAS_ADAPTER *ioc);
>
>  /* config shared API */
>  u8 mpt2sas_config_done(struct MPT2SAS_ADAPTER *ioc, u16 smid, u8 msix_index,
> diff --git a/drivers/scsi/mpt2sas/mpt2sas_scsih.c b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
> index c6bdc92..a06662c 100644
> --- a/drivers/scsi/mpt2sas/mpt2sas_scsih.c
> +++ b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
> @@ -7652,51 +7652,23 @@ _scsih_ir_shutdown(struct MPT2SAS_ADAPTER *ioc)
>  }
>
>  /**
> - * _scsih_shutdown - routine call during system shutdown
> - * @pdev: PCI device struct
> - *
> - * Return nothing.
> - */
> -static void
> -_scsih_shutdown(struct pci_dev *pdev)
> -{
> -       struct Scsi_Host *shost = pci_get_drvdata(pdev);
> -       struct MPT2SAS_ADAPTER *ioc = shost_priv(shost);
> -       struct workqueue_struct *wq;
> -       unsigned long flags;
> -
> -       ioc->remove_host = 1;
> -       _scsih_fw_event_cleanup_queue(ioc);
> -
> -       spin_lock_irqsave(&ioc->fw_event_lock, flags);
> -       wq = ioc->firmware_event_thread;
> -       ioc->firmware_event_thread = NULL;
> -       spin_unlock_irqrestore(&ioc->fw_event_lock, flags);
> -       if (wq)
> -               destroy_workqueue(wq);
> -
> -       _scsih_ir_shutdown(ioc);
> -       mpt2sas_base_detach(ioc);
> -}
> -
> -/**
> - * _scsih_remove - detach and remove add host
> - * @pdev: PCI device struct
> + * mpt2sas_scsih_detach - detach from SCSI midlayer and free resources
> + * @ioc: per adapter object
>   *
> - * Routine called when unloading the driver.
>   * Return nothing.
>   */
> -static void
> -_scsih_remove(struct pci_dev *pdev)
> +void
> +mpt2sas_scsih_detach(struct MPT2SAS_ADAPTER *ioc)
>  {
> -       struct Scsi_Host *shost = pci_get_drvdata(pdev);
> -       struct MPT2SAS_ADAPTER *ioc = shost_priv(shost);
>         struct _sas_port *mpt2sas_port, *next_port;
>         struct _raid_device *raid_device, *next;
>         struct MPT2SAS_TARGET *sas_target_priv_data;
>         struct workqueue_struct *wq;
>         unsigned long flags;
>
> +       if (!ioc->shost_attached)
> +               return;
> +
>         ioc->remove_host = 1;
>         _scsih_fw_event_cleanup_queue(ioc);
>
> @@ -7745,11 +7717,52 @@ _scsih_remove(struct pci_dev *pdev)
>                 ioc->sas_hba.num_phys = 0;
>         }
>
> -       sas_remove_host(shost);
> +       sas_remove_host(ioc->shost);
> +       scsi_remove_host(ioc->shost);
> +       ioc->shost_attached = 0;
> +}
> +
> +/**
> + * _scsih_shutdown - routine call during system shutdown
> + * @pdev: PCI device struct
> + *
> + * Return nothing.
> + */
> +static void
> +_scsih_shutdown(struct pci_dev *pdev)
> +{
> +       struct Scsi_Host *shost = pci_get_drvdata(pdev);
> +       struct MPT2SAS_ADAPTER *ioc = shost_priv(shost);
> +
> +       ioc->remove_host = 1;
> +
> +       mpt2sas_base_stop_watchdog(ioc);
> +       mpt2sas_scsih_detach(ioc);
> +       mpt2sas_base_detach(ioc);
> +}
> +
> +/**
> + * _scsih_remove - detach and remove add host
> + * @pdev: PCI device struct
> + *
> + * Routine called when unloading the driver.
> + * Return nothing.
> + */
> +static void
> +_scsih_remove(struct pci_dev *pdev)
> +{
> +       struct Scsi_Host *shost = pci_get_drvdata(pdev);
> +       struct MPT2SAS_ADAPTER *ioc = shost_priv(shost);
> +
> +       ioc->remove_host = 1;
> +
> +       mpt2sas_base_stop_watchdog(ioc);
> +       mpt2sas_scsih_detach(ioc);
> +
>         mpt2sas_base_detach(ioc);
>         list_del(&ioc->list);
> -       scsi_remove_host(shost);
> -       scsi_host_put(shost);
> +
> +       scsi_host_put(ioc->shost);
>  }
>
>  /**
> @@ -8022,6 +8035,7 @@ _scsih_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>         INIT_LIST_HEAD(&ioc->list);
>         list_add_tail(&ioc->list, &mpt2sas_ioc_list);
>         ioc->shost = shost;
> +       ioc->shost_attached = 1;
>         ioc->id = mpt_ids++;
>         sprintf(ioc->name, "%s%d", MPT2SAS_DRIVER_NAME, ioc->id);
>         ioc->pdev = pdev;
> @@ -8144,6 +8158,7 @@ _scsih_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>         list_del(&ioc->list);
>         scsi_remove_host(shost);
>   out_add_shost_fail:
> +       ioc->shost_attached = 0;
>         scsi_host_put(shost);
>         return -ENODEV;
>  }
> @@ -8164,7 +8179,8 @@ _scsih_suspend(struct pci_dev *pdev, pm_message_t state)
>         pci_power_t device_state;
>
>         mpt2sas_base_stop_watchdog(ioc);
> -       scsi_block_requests(shost);
> +       if (ioc->shost_attached)
> +               scsi_block_requests(shost);
>         device_state = pci_choose_state(pdev, state);
>         printk(MPT2SAS_INFO_FMT "pdev=0x%p, slot=%s, entering "
>             "operating state [D%d]\n", ioc->name, pdev,
> @@ -8204,7 +8220,8 @@ _scsih_resume(struct pci_dev *pdev)
>                 return r;
>
>         mpt2sas_base_hard_reset_handler(ioc, CAN_SLEEP, SOFT_RESET);
> -       scsi_unblock_requests(shost);
> +       if (ioc->shost_attached)
> +               scsi_unblock_requests(shost);
>         mpt2sas_base_start_watchdog(ioc);
>         return 0;
>  }
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
> index 1836003..f8c25a1 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_base.c
> +++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
> @@ -56,7 +56,6 @@
>  #include <linux/dma-mapping.h>
>  #include <linux/io.h>
>  #include <linux/time.h>
> -#include <linux/kthread.h>
>  #include <linux/aer.h>
>
>
> @@ -110,28 +109,6 @@ module_param_call(mpt3sas_fwfault_debug, _scsih_set_fwfault_debug,
>         param_get_int, &mpt3sas_fwfault_debug, 0644);
>
>  /**
> - *  mpt3sas_remove_dead_ioc_func - kthread context to remove dead ioc
> - * @arg: input argument, used to derive ioc
> - *
> - * Return 0 if controller is removed from pci subsystem.
> - * Return -1 for other case.
> - */
> -static int mpt3sas_remove_dead_ioc_func(void *arg)
> -{
> -       struct MPT3SAS_ADAPTER *ioc = (struct MPT3SAS_ADAPTER *)arg;
> -       struct pci_dev *pdev;
> -
> -       if ((ioc == NULL))
> -               return -1;
> -
> -       pdev = ioc->pdev;
> -       if ((pdev == NULL))
> -               return -1;
> -       pci_stop_and_remove_bus_device(pdev);
> -       return 0;
> -}
> -
> -/**
>   * _base_fault_reset_work - workq handling ioc fault conditions
>   * @work: input argument, used to derive ioc
>   * Context: sleep.
> @@ -146,8 +123,6 @@ _base_fault_reset_work(struct work_struct *work)
>         unsigned long    flags;
>         u32 doorbell;
>         int rc;
> -       struct task_struct *p;
> -
>
>         spin_lock_irqsave(&ioc->ioc_reset_in_progress_lock, flags);
>         if (ioc->shost_recovery)
> @@ -167,22 +142,8 @@ _base_fault_reset_work(struct work_struct *work)
>                  * command back from HW.
>                  */
>                 ioc->schedule_dead_ioc_flush_running_cmds(ioc);
> -               /*
> -                * Set remove_host flag early since kernel thread will
> -                * take some time to execute.
> -                */
> -               ioc->remove_host = 1;
> -               /*Remove the Dead Host */
> -               p = kthread_run(mpt3sas_remove_dead_ioc_func, ioc,
> -                   "mpt3sas_dead_ioc_%d", ioc->id);
> -               if (IS_ERR(p))
> -                       pr_err(MPT3SAS_FMT
> -                       "%s: Running mpt3sas_dead_ioc thread failed !!!!\n",
> -                       ioc->name, __func__);
> -               else
> -                       pr_err(MPT3SAS_FMT
> -                       "%s: Running mpt3sas_dead_ioc thread success !!!!\n",
> -                       ioc->name, __func__);
> +               mpt3sas_scsih_detach(ioc);
> +
>                 return; /* don't rearm timer */
>         }
>
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h b/drivers/scsi/mpt3sas/mpt3sas_base.h
> index 994656c..9fbf56c 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_base.h
> +++ b/drivers/scsi/mpt3sas/mpt3sas_base.h
> @@ -731,6 +731,7 @@ struct MPT3SAS_ADAPTER {
>         u8              broadcast_aen_busy;
>         u16             broadcast_aen_pending;
>         u8              shost_recovery;
> +       u8              shost_attached;
>
>         struct mutex    reset_in_progress_mutex;
>         spinlock_t      ioc_reset_in_progress_lock;
> @@ -990,6 +991,7 @@ int mpt3sas_port_enable(struct MPT3SAS_ADAPTER *ioc);
>  u8 mpt3sas_scsih_event_callback(struct MPT3SAS_ADAPTER *ioc, u8 msix_index,
>         u32 reply);
>  void mpt3sas_scsih_reset_handler(struct MPT3SAS_ADAPTER *ioc, int reset_phase);
> +void mpt3sas_scsih_detach(struct MPT3SAS_ADAPTER *ioc);
>
>  int mpt3sas_scsih_issue_tm(struct MPT3SAS_ADAPTER *ioc, u16 handle,
>         uint channel, uint id, uint lun, u8 type, u16 smid_task,
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> index dcbf7c8..9ae0914 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> @@ -7365,22 +7365,23 @@ _scsih_ir_shutdown(struct MPT3SAS_ADAPTER *ioc)
>  }
>
>  /**
> - * _scsih_remove - detach and remove add host
> - * @pdev: PCI device struct
> + * mpt3sas_scsih_detach - detach from SCSI midlayer and free resources
> + * @ioc: per adapter object
>   *
> - * Routine called when unloading the driver.
>   * Return nothing.
>   */
> -static void _scsih_remove(struct pci_dev *pdev)
> +void
> +mpt3sas_scsih_detach(struct MPT3SAS_ADAPTER *ioc)
>  {
> -       struct Scsi_Host *shost = pci_get_drvdata(pdev);
> -       struct MPT3SAS_ADAPTER *ioc = shost_priv(shost);
>         struct _sas_port *mpt3sas_port, *next_port;
>         struct _raid_device *raid_device, *next;
>         struct MPT3SAS_TARGET *sas_target_priv_data;
>         struct workqueue_struct *wq;
>         unsigned long flags;
>
> +       if (!ioc->shost_attached)
> +               return;
> +
>         ioc->remove_host = 1;
>         _scsih_fw_event_cleanup_queue(ioc);
>
> @@ -7429,11 +7430,9 @@ static void _scsih_remove(struct pci_dev *pdev)
>                 ioc->sas_hba.num_phys = 0;
>         }
>
> -       sas_remove_host(shost);
> -       mpt3sas_base_detach(ioc);
> -       list_del(&ioc->list);
> -       scsi_remove_host(shost);
> -       scsi_host_put(shost);
> +       sas_remove_host(ioc->shost);
> +       scsi_remove_host(ioc->shost);
> +       ioc->shost_attached = 0;
>  }
>
>  /**
> @@ -7447,23 +7446,37 @@ _scsih_shutdown(struct pci_dev *pdev)
>  {
>         struct Scsi_Host *shost = pci_get_drvdata(pdev);
>         struct MPT3SAS_ADAPTER *ioc = shost_priv(shost);
> -       struct workqueue_struct *wq;
> -       unsigned long flags;
>
>         ioc->remove_host = 1;
> -       _scsih_fw_event_cleanup_queue(ioc);
> -
> -       spin_lock_irqsave(&ioc->fw_event_lock, flags);
> -       wq = ioc->firmware_event_thread;
> -       ioc->firmware_event_thread = NULL;
> -       spin_unlock_irqrestore(&ioc->fw_event_lock, flags);
> -       if (wq)
> -               destroy_workqueue(wq);
>
> -       _scsih_ir_shutdown(ioc);
> +       mpt3sas_base_stop_watchdog(ioc);
> +       mpt3sas_scsih_detach(ioc);
>         mpt3sas_base_detach(ioc);
>  }
>
> +/**
> + * _scsih_remove - detach and remove add host
> + * @pdev: PCI device struct
> + *
> + * Routine called when unloading the driver.
> + * Return nothing.
> + */
> +static void
> +_scsih_remove(struct pci_dev *pdev)
> +{
> +       struct Scsi_Host *shost = pci_get_drvdata(pdev);
> +       struct MPT3SAS_ADAPTER *ioc = shost_priv(shost);
> +
> +       ioc->remove_host = 1;
> +
> +       mpt3sas_base_stop_watchdog(ioc);
> +       mpt3sas_scsih_detach(ioc);
> +
> +       mpt3sas_base_detach(ioc);
> +       list_del(&ioc->list);
> +
> +       scsi_host_put(ioc->shost);
> +}
>
>  /**
>   * _scsih_probe_boot_devices - reports 1st device
> @@ -7735,6 +7748,7 @@ _scsih_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>         INIT_LIST_HEAD(&ioc->list);
>         list_add_tail(&ioc->list, &mpt3sas_ioc_list);
>         ioc->shost = shost;
> +       ioc->shost_attached = 1;
>         ioc->id = mpt_ids++;
>         sprintf(ioc->name, "%s%d", MPT3SAS_DRIVER_NAME, ioc->id);
>         ioc->pdev = pdev;
> @@ -7839,6 +7853,7 @@ _scsih_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>         list_del(&ioc->list);
>         scsi_remove_host(shost);
>   out_add_shost_fail:
> +       ioc->shost_attached = 0;
>         scsi_host_put(shost);
>         return -ENODEV;
>  }
> @@ -7860,7 +7875,8 @@ _scsih_suspend(struct pci_dev *pdev, pm_message_t state)
>
>         mpt3sas_base_stop_watchdog(ioc);
>         flush_scheduled_work();
> -       scsi_block_requests(shost);
> +       if (ioc->shost_attached)
> +               scsi_block_requests(shost);
>         device_state = pci_choose_state(pdev, state);
>         pr_info(MPT3SAS_FMT
>                 "pdev=0x%p, slot=%s, entering operating state [D%d]\n",
> @@ -7899,7 +7915,8 @@ _scsih_resume(struct pci_dev *pdev)
>                 return r;
>
>         mpt3sas_base_hard_reset_handler(ioc, CAN_SLEEP, SOFT_RESET);
> -       scsi_unblock_requests(shost);
> +       if (ioc->shost_attached)
> +               scsi_unblock_requests(shost);
>         mpt3sas_base_start_watchdog(ioc);
>         return 0;
>  }
> --
> 1.8.1.4
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: mpt2sas,mpt3sas watchdog device removal
  2013-05-17 15:29   ` Bjorn Helgaas
@ 2013-05-17 21:42     ` Joe Lawrence
  2013-10-10 21:59       ` Bjorn Helgaas
  0 siblings, 1 reply; 12+ messages in thread
From: Joe Lawrence @ 2013-05-17 21:42 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-scsi, James E.J. Bottomley, Sreekanth Reddy, Desai,
	Kashyap, Nagalakshmi Nandigama, linux-pci

On Fri, 17 May 2013 09:29:06 -0600
Bjorn Helgaas <bhelgaas@google.com> wrote:

> [+cc linux-pci]
> 
> On Wed, May 15, 2013 at 11:29 AM, Joe Lawrence
> <joe.lawrence@stratus.com> wrote:
> > From 84ac7a35ebd61e84d4254eae78bb967de17254c2 Mon Sep 17 00:00:00
> > 2001 From: Joe Lawrence <joe.lawrence@stratus.com>
> > Date: Wed, 15 May 2013 12:52:31 -0400
> > Subject: [PATCH] mpt2sas,mpt3sas: make watchdog instantiated device
> > removal safe
> >
> > Calling pci_stop_and_remove_bus_device from a SCSI LLD may introduce
> > device removal races with PCI callback functions.
> >
> > Simplify the mpt2sas watchdog mechanism by separating PCI device
> > removal from SCSI midlayer detachment. When the watchdog wishes to
> > remove a SCSI device from the topology, detach from the SCSI
> > midlayer, leaving the PCI device alone. Adjust various pci_driver
> > callbacks to account for a potentially SCSI detached PCI device.
> 
> I don't know the details of the SCSI detachment, but this approach
> looks much cleaner to me.

Thanks for commenting, Bjorn.  I think this approach more closely
represents what this watchdog is trying to accomplish.  

Off list, Sreekanth from LSI tested and noticed a few issues with this
patch:

 - mpt2sas_base_stop_watchdog is called twice: The call from
   mpt2sas_base_detach is safe, but now unnecessary (as a call was
   added earlier up in the PCI driver callbacks to ensure that the
   watchdog was out of the way.) This second invocation can be removed.

 - If the watchdog detects a bad IOC, the watchdog remains running:
   The watchdog workqueue isn't cleaned up until
   mpt2sas_base_stop_watchdog is called, so in the case that the
   watchdog removes the device from SCSI topo, the workqueue will
   remain unused until PCI .remove/.shutdown cleans it up. Perhaps a
   single watchdog that iterates over all adapters would be simpler?

Finally, if SCSI topo detachment is all that is interesting here, would
it make more sense to move the watchdog into the MPT "scsi" code?  I
haven't looked at the code yet, but this might make an MPT fusion patch
easier (due to dependencies between its "scsi" and "base" modules).

Regards,

-- Joe

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: mpt2sas,mpt3sas watchdog device removal
  2013-05-15 17:29 ` mpt2sas,mpt3sas watchdog device removal Joe Lawrence
  2013-05-17 15:29   ` Bjorn Helgaas
@ 2013-07-16 12:00   ` Reddy, Sreekanth
  2013-07-16 12:03     ` James Bottomley
  1 sibling, 1 reply; 12+ messages in thread
From: Reddy, Sreekanth @ 2013-07-16 12:00 UTC (permalink / raw)
  To: Joe Lawrence, linux-scsi; +Cc: James E.J. Bottomley

James,

This patch seem to be fine. Please consider this patch.

Regards,
Sreekanth.

-----Original Message-----
From: Joe Lawrence [mailto:joe.lawrence@stratus.com]
Sent: Wednesday, May 15, 2013 11:00 PM
To: linux-scsi@vger.kernel.org
Cc: James E.J. Bottomley; Reddy, Sreekanth; Desai, Kashyap; Nandigama, Nagalakshmi; Bjorn Helgaas
Subject: Re: mpt2sas,mpt3sas watchdog device removal

>From 84ac7a35ebd61e84d4254eae78bb967de17254c2 Mon Sep 17 00:00:00 2001
From: Joe Lawrence <joe.lawrence@stratus.com>
Date: Wed, 15 May 2013 12:52:31 -0400
Subject: [PATCH] mpt2sas,mpt3sas: make watchdog instantiated device removal  safe

Calling pci_stop_and_remove_bus_device from a SCSI LLD may introduce device removal races with PCI callback functions.

Simplify the mpt2sas watchdog mechanism by separating PCI device removal from SCSI midlayer detachment. When the watchdog wishes to remove a SCSI device from the topology, detach from the SCSI midlayer, leaving the PCI device alone. Adjust various pci_driver callbacks to account for a potentially SCSI detached PCI device.

Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com>
---
 drivers/scsi/mpt2sas/mpt2sas_base.c  | 43 +---------------  drivers/scsi/mpt2sas/mpt2sas_base.h  |  2 +  drivers/scsi/mpt2sas/mpt2sas_scsih.c | 97 +++++++++++++++++++++---------------
 drivers/scsi/mpt3sas/mpt3sas_base.c  | 43 +---------------  drivers/scsi/mpt3sas/mpt3sas_base.h  |  2 +  drivers/scsi/mpt3sas/mpt3sas_scsih.c | 65 +++++++++++++++---------
 6 files changed, 105 insertions(+), 147 deletions(-)

diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.c b/drivers/scsi/mpt2sas/mpt2sas_base.c
index bcb23d2..cc1bf28 100644
--- a/drivers/scsi/mpt2sas/mpt2sas_base.c
+++ b/drivers/scsi/mpt2sas/mpt2sas_base.c
@@ -57,7 +57,6 @@
 #include <linux/sort.h>
 #include <linux/io.h>
 #include <linux/time.h>
-#include <linux/kthread.h>
 #include <linux/aer.h>

 #include "mpt2sas_base.h"
@@ -115,29 +114,6 @@ module_param_call(mpt2sas_fwfault_debug, _scsih_set_fwfault_debug,
     param_get_int, &mpt2sas_fwfault_debug, 0644);

 /**
- *  mpt2sas_remove_dead_ioc_func - kthread context to remove dead ioc
- * @arg: input argument, used to derive ioc
- *
- * Return 0 if controller is removed from pci subsystem.
- * Return -1 for other case.
- */
-static int mpt2sas_remove_dead_ioc_func(void *arg) -{
-               struct MPT2SAS_ADAPTER *ioc = (struct MPT2SAS_ADAPTER *)arg;
-               struct pci_dev *pdev;
-
-               if ((ioc == NULL))
-                       return -1;
-
-               pdev = ioc->pdev;
-               if ((pdev == NULL))
-                       return -1;
-               pci_stop_and_remove_bus_device(pdev);
-               return 0;
-}
-
-
-/**
  * _base_fault_reset_work - workq handling ioc fault conditions
  * @work: input argument, used to derive ioc
  * Context: sleep.
@@ -152,7 +128,6 @@ _base_fault_reset_work(struct work_struct *work)
        unsigned long    flags;
        u32 doorbell;
        int rc;
-       struct task_struct *p;

        spin_lock_irqsave(&ioc->ioc_reset_in_progress_lock, flags);
        if (ioc->shost_recovery || ioc->pci_error_recovery) @@ -186,23 +161,7 @@ _base_fault_reset_work(struct work_struct *work)
                 * command back from HW.
                 */
                ioc->schedule_dead_ioc_flush_running_cmds(ioc);
-               /*
-                * Set remove_host flag early since kernel thread will
-                * take some time to execute.
-                */
-               ioc->remove_host = 1;
-               /*Remove the Dead Host */
-               p = kthread_run(mpt2sas_remove_dead_ioc_func, ioc,
-                   "mpt2sas_dead_ioc_%d", ioc->id);
-               if (IS_ERR(p)) {
-                       printk(MPT2SAS_ERR_FMT
-                       "%s: Running mpt2sas_dead_ioc thread failed !!!!\n",
-                       ioc->name, __func__);
-               } else {
-                   printk(MPT2SAS_ERR_FMT
-                       "%s: Running mpt2sas_dead_ioc thread success !!!!\n",
-                       ioc->name, __func__);
-               }
+               mpt2sas_scsih_detach(ioc);

                return; /* don't rearm timer */
        }
diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.h b/drivers/scsi/mpt2sas/mpt2sas_base.h
index 4caaac1..94d0e98 100644
--- a/drivers/scsi/mpt2sas/mpt2sas_base.h
+++ b/drivers/scsi/mpt2sas/mpt2sas_base.h
@@ -817,6 +817,7 @@ struct MPT2SAS_ADAPTER {
        u8              broadcast_aen_busy;
        u16             broadcast_aen_pending;
        u8              shost_recovery;
+       u8              shost_attached;

        struct mutex    reset_in_progress_mutex;
        spinlock_t      ioc_reset_in_progress_lock;
@@ -1078,6 +1079,7 @@ struct _sas_device *mpt2sas_scsih_sas_device_find_by_sas_address(
 void mpt2sas_port_enable_complete(struct MPT2SAS_ADAPTER *ioc);

 void mpt2sas_scsih_reset_handler(struct MPT2SAS_ADAPTER *ioc, int reset_phase);
+void mpt2sas_scsih_detach(struct MPT2SAS_ADAPTER *ioc);

 /* config shared API */
 u8 mpt2sas_config_done(struct MPT2SAS_ADAPTER *ioc, u16 smid, u8 msix_index, diff --git a/drivers/scsi/mpt2sas/mpt2sas_scsih.c b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
index c6bdc92..a06662c 100644
--- a/drivers/scsi/mpt2sas/mpt2sas_scsih.c
+++ b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
@@ -7652,51 +7652,23 @@ _scsih_ir_shutdown(struct MPT2SAS_ADAPTER *ioc)  }

 /**
- * _scsih_shutdown - routine call during system shutdown
- * @pdev: PCI device struct
- *
- * Return nothing.
- */
-static void
-_scsih_shutdown(struct pci_dev *pdev)
-{
-       struct Scsi_Host *shost = pci_get_drvdata(pdev);
-       struct MPT2SAS_ADAPTER *ioc = shost_priv(shost);
-       struct workqueue_struct *wq;
-       unsigned long flags;
-
-       ioc->remove_host = 1;
-       _scsih_fw_event_cleanup_queue(ioc);
-
-       spin_lock_irqsave(&ioc->fw_event_lock, flags);
-       wq = ioc->firmware_event_thread;
-       ioc->firmware_event_thread = NULL;
-       spin_unlock_irqrestore(&ioc->fw_event_lock, flags);
-       if (wq)
-               destroy_workqueue(wq);
-
-       _scsih_ir_shutdown(ioc);
-       mpt2sas_base_detach(ioc);
-}
-
-/**
- * _scsih_remove - detach and remove add host
- * @pdev: PCI device struct
+ * mpt2sas_scsih_detach - detach from SCSI midlayer and free resources
+ * @ioc: per adapter object
  *
- * Routine called when unloading the driver.
  * Return nothing.
  */
-static void
-_scsih_remove(struct pci_dev *pdev)
+void
+mpt2sas_scsih_detach(struct MPT2SAS_ADAPTER *ioc)
 {
-       struct Scsi_Host *shost = pci_get_drvdata(pdev);
-       struct MPT2SAS_ADAPTER *ioc = shost_priv(shost);
        struct _sas_port *mpt2sas_port, *next_port;
        struct _raid_device *raid_device, *next;
        struct MPT2SAS_TARGET *sas_target_priv_data;
        struct workqueue_struct *wq;
        unsigned long flags;

+       if (!ioc->shost_attached)
+               return;
+
        ioc->remove_host = 1;
        _scsih_fw_event_cleanup_queue(ioc);

@@ -7745,11 +7717,52 @@ _scsih_remove(struct pci_dev *pdev)
                ioc->sas_hba.num_phys = 0;
        }

-       sas_remove_host(shost);
+       sas_remove_host(ioc->shost);
+       scsi_remove_host(ioc->shost);
+       ioc->shost_attached = 0;
+}
+
+/**
+ * _scsih_shutdown - routine call during system shutdown
+ * @pdev: PCI device struct
+ *
+ * Return nothing.
+ */
+static void
+_scsih_shutdown(struct pci_dev *pdev)
+{
+       struct Scsi_Host *shost = pci_get_drvdata(pdev);
+       struct MPT2SAS_ADAPTER *ioc = shost_priv(shost);
+
+       ioc->remove_host = 1;
+
+       mpt2sas_base_stop_watchdog(ioc);
+       mpt2sas_scsih_detach(ioc);
+       mpt2sas_base_detach(ioc);
+}
+
+/**
+ * _scsih_remove - detach and remove add host
+ * @pdev: PCI device struct
+ *
+ * Routine called when unloading the driver.
+ * Return nothing.
+ */
+static void
+_scsih_remove(struct pci_dev *pdev)
+{
+       struct Scsi_Host *shost = pci_get_drvdata(pdev);
+       struct MPT2SAS_ADAPTER *ioc = shost_priv(shost);
+
+       ioc->remove_host = 1;
+
+       mpt2sas_base_stop_watchdog(ioc);
+       mpt2sas_scsih_detach(ioc);
+
        mpt2sas_base_detach(ioc);
        list_del(&ioc->list);
-       scsi_remove_host(shost);
-       scsi_host_put(shost);
+
+       scsi_host_put(ioc->shost);
 }

 /**
@@ -8022,6 +8035,7 @@ _scsih_probe(struct pci_dev *pdev, const struct pci_device_id *id)
        INIT_LIST_HEAD(&ioc->list);
        list_add_tail(&ioc->list, &mpt2sas_ioc_list);
        ioc->shost = shost;
+       ioc->shost_attached = 1;
        ioc->id = mpt_ids++;
        sprintf(ioc->name, "%s%d", MPT2SAS_DRIVER_NAME, ioc->id);
        ioc->pdev = pdev;
@@ -8144,6 +8158,7 @@ _scsih_probe(struct pci_dev *pdev, const struct pci_device_id *id)
        list_del(&ioc->list);
        scsi_remove_host(shost);
  out_add_shost_fail:
+       ioc->shost_attached = 0;
        scsi_host_put(shost);
        return -ENODEV;
 }
@@ -8164,7 +8179,8 @@ _scsih_suspend(struct pci_dev *pdev, pm_message_t state)
        pci_power_t device_state;

        mpt2sas_base_stop_watchdog(ioc);
-       scsi_block_requests(shost);
+       if (ioc->shost_attached)
+               scsi_block_requests(shost);
        device_state = pci_choose_state(pdev, state);
        printk(MPT2SAS_INFO_FMT "pdev=0x%p, slot=%s, entering "
            "operating state [D%d]\n", ioc->name, pdev, @@ -8204,7 +8220,8 @@ _scsih_resume(struct pci_dev *pdev)
                return r;

        mpt2sas_base_hard_reset_handler(ioc, CAN_SLEEP, SOFT_RESET);
-       scsi_unblock_requests(shost);
+       if (ioc->shost_attached)
+               scsi_unblock_requests(shost);
        mpt2sas_base_start_watchdog(ioc);
        return 0;
 }
diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 1836003..f8c25a1 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -56,7 +56,6 @@
 #include <linux/dma-mapping.h>
 #include <linux/io.h>
 #include <linux/time.h>
-#include <linux/kthread.h>
 #include <linux/aer.h>


@@ -110,28 +109,6 @@ module_param_call(mpt3sas_fwfault_debug, _scsih_set_fwfault_debug,
        param_get_int, &mpt3sas_fwfault_debug, 0644);

 /**
- *  mpt3sas_remove_dead_ioc_func - kthread context to remove dead ioc
- * @arg: input argument, used to derive ioc
- *
- * Return 0 if controller is removed from pci subsystem.
- * Return -1 for other case.
- */
-static int mpt3sas_remove_dead_ioc_func(void *arg) -{
-       struct MPT3SAS_ADAPTER *ioc = (struct MPT3SAS_ADAPTER *)arg;
-       struct pci_dev *pdev;
-
-       if ((ioc == NULL))
-               return -1;
-
-       pdev = ioc->pdev;
-       if ((pdev == NULL))
-               return -1;
-       pci_stop_and_remove_bus_device(pdev);
-       return 0;
-}
-
-/**
  * _base_fault_reset_work - workq handling ioc fault conditions
  * @work: input argument, used to derive ioc
  * Context: sleep.
@@ -146,8 +123,6 @@ _base_fault_reset_work(struct work_struct *work)
        unsigned long    flags;
        u32 doorbell;
        int rc;
-       struct task_struct *p;
-

        spin_lock_irqsave(&ioc->ioc_reset_in_progress_lock, flags);
        if (ioc->shost_recovery)
@@ -167,22 +142,8 @@ _base_fault_reset_work(struct work_struct *work)
                 * command back from HW.
                 */
                ioc->schedule_dead_ioc_flush_running_cmds(ioc);
-               /*
-                * Set remove_host flag early since kernel thread will
-                * take some time to execute.
-                */
-               ioc->remove_host = 1;
-               /*Remove the Dead Host */
-               p = kthread_run(mpt3sas_remove_dead_ioc_func, ioc,
-                   "mpt3sas_dead_ioc_%d", ioc->id);
-               if (IS_ERR(p))
-                       pr_err(MPT3SAS_FMT
-                       "%s: Running mpt3sas_dead_ioc thread failed !!!!\n",
-                       ioc->name, __func__);
-               else
-                       pr_err(MPT3SAS_FMT
-                       "%s: Running mpt3sas_dead_ioc thread success !!!!\n",
-                       ioc->name, __func__);
+               mpt3sas_scsih_detach(ioc);
+
                return; /* don't rearm timer */
        }

diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h b/drivers/scsi/mpt3sas/mpt3sas_base.h
index 994656c..9fbf56c 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.h
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.h
@@ -731,6 +731,7 @@ struct MPT3SAS_ADAPTER {
        u8              broadcast_aen_busy;
        u16             broadcast_aen_pending;
        u8              shost_recovery;
+       u8              shost_attached;

        struct mutex    reset_in_progress_mutex;
        spinlock_t      ioc_reset_in_progress_lock;
@@ -990,6 +991,7 @@ int mpt3sas_port_enable(struct MPT3SAS_ADAPTER *ioc);
 u8 mpt3sas_scsih_event_callback(struct MPT3SAS_ADAPTER *ioc, u8 msix_index,
        u32 reply);
 void mpt3sas_scsih_reset_handler(struct MPT3SAS_ADAPTER *ioc, int reset_phase);
+void mpt3sas_scsih_detach(struct MPT3SAS_ADAPTER *ioc);

 int mpt3sas_scsih_issue_tm(struct MPT3SAS_ADAPTER *ioc, u16 handle,
        uint channel, uint id, uint lun, u8 type, u16 smid_task, diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index dcbf7c8..9ae0914 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -7365,22 +7365,23 @@ _scsih_ir_shutdown(struct MPT3SAS_ADAPTER *ioc)  }

 /**
- * _scsih_remove - detach and remove add host
- * @pdev: PCI device struct
+ * mpt3sas_scsih_detach - detach from SCSI midlayer and free resources
+ * @ioc: per adapter object
  *
- * Routine called when unloading the driver.
  * Return nothing.
  */
-static void _scsih_remove(struct pci_dev *pdev)
+void
+mpt3sas_scsih_detach(struct MPT3SAS_ADAPTER *ioc)
 {
-       struct Scsi_Host *shost = pci_get_drvdata(pdev);
-       struct MPT3SAS_ADAPTER *ioc = shost_priv(shost);
        struct _sas_port *mpt3sas_port, *next_port;
        struct _raid_device *raid_device, *next;
        struct MPT3SAS_TARGET *sas_target_priv_data;
        struct workqueue_struct *wq;
        unsigned long flags;

+       if (!ioc->shost_attached)
+               return;
+
        ioc->remove_host = 1;
        _scsih_fw_event_cleanup_queue(ioc);

@@ -7429,11 +7430,9 @@ static void _scsih_remove(struct pci_dev *pdev)
                ioc->sas_hba.num_phys = 0;
        }

-       sas_remove_host(shost);
-       mpt3sas_base_detach(ioc);
-       list_del(&ioc->list);
-       scsi_remove_host(shost);
-       scsi_host_put(shost);
+       sas_remove_host(ioc->shost);
+       scsi_remove_host(ioc->shost);
+       ioc->shost_attached = 0;
 }

 /**
@@ -7447,23 +7446,37 @@ _scsih_shutdown(struct pci_dev *pdev)  {
        struct Scsi_Host *shost = pci_get_drvdata(pdev);
        struct MPT3SAS_ADAPTER *ioc = shost_priv(shost);
-       struct workqueue_struct *wq;
-       unsigned long flags;

        ioc->remove_host = 1;
-       _scsih_fw_event_cleanup_queue(ioc);
-
-       spin_lock_irqsave(&ioc->fw_event_lock, flags);
-       wq = ioc->firmware_event_thread;
-       ioc->firmware_event_thread = NULL;
-       spin_unlock_irqrestore(&ioc->fw_event_lock, flags);
-       if (wq)
-               destroy_workqueue(wq);

-       _scsih_ir_shutdown(ioc);
+       mpt3sas_base_stop_watchdog(ioc);
+       mpt3sas_scsih_detach(ioc);
        mpt3sas_base_detach(ioc);
 }

+/**
+ * _scsih_remove - detach and remove add host
+ * @pdev: PCI device struct
+ *
+ * Routine called when unloading the driver.
+ * Return nothing.
+ */
+static void
+_scsih_remove(struct pci_dev *pdev)
+{
+       struct Scsi_Host *shost = pci_get_drvdata(pdev);
+       struct MPT3SAS_ADAPTER *ioc = shost_priv(shost);
+
+       ioc->remove_host = 1;
+
+       mpt3sas_base_stop_watchdog(ioc);
+       mpt3sas_scsih_detach(ioc);
+
+       mpt3sas_base_detach(ioc);
+       list_del(&ioc->list);
+
+       scsi_host_put(ioc->shost);
+}

 /**
  * _scsih_probe_boot_devices - reports 1st device @@ -7735,6 +7748,7 @@ _scsih_probe(struct pci_dev *pdev, const struct pci_device_id *id)
        INIT_LIST_HEAD(&ioc->list);
        list_add_tail(&ioc->list, &mpt3sas_ioc_list);
        ioc->shost = shost;
+       ioc->shost_attached = 1;
        ioc->id = mpt_ids++;
        sprintf(ioc->name, "%s%d", MPT3SAS_DRIVER_NAME, ioc->id);
        ioc->pdev = pdev;
@@ -7839,6 +7853,7 @@ _scsih_probe(struct pci_dev *pdev, const struct pci_device_id *id)
        list_del(&ioc->list);
        scsi_remove_host(shost);
  out_add_shost_fail:
+       ioc->shost_attached = 0;
        scsi_host_put(shost);
        return -ENODEV;
 }
@@ -7860,7 +7875,8 @@ _scsih_suspend(struct pci_dev *pdev, pm_message_t state)

        mpt3sas_base_stop_watchdog(ioc);
        flush_scheduled_work();
-       scsi_block_requests(shost);
+       if (ioc->shost_attached)
+               scsi_block_requests(shost);
        device_state = pci_choose_state(pdev, state);
        pr_info(MPT3SAS_FMT
                "pdev=0x%p, slot=%s, entering operating state [D%d]\n", @@ -7899,7 +7915,8 @@ _scsih_resume(struct pci_dev *pdev)
                return r;

        mpt3sas_base_hard_reset_handler(ioc, CAN_SLEEP, SOFT_RESET);
-       scsi_unblock_requests(shost);
+       if (ioc->shost_attached)
+               scsi_unblock_requests(shost);
        mpt3sas_base_start_watchdog(ioc);
        return 0;
 }
--
1.8.1.4




^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: mpt2sas,mpt3sas watchdog device removal
  2013-07-16 12:00   ` Reddy, Sreekanth
@ 2013-07-16 12:03     ` James Bottomley
  2013-07-16 15:21       ` Joe Lawrence
  0 siblings, 1 reply; 12+ messages in thread
From: James Bottomley @ 2013-07-16 12:03 UTC (permalink / raw)
  To: Reddy, Sreekanth; +Cc: Joe Lawrence, linux-scsi

On Tue, 2013-07-16 at 17:30 +0530, Reddy, Sreekanth wrote:
> James,
> 
> This patch seem to be fine. Please consider this patch.

Where's the new version?  The one that has all of this fixed:

> Off list, Sreekanth from LSI tested and noticed a few issues with this
> patch:
> 
>  - mpt2sas_base_stop_watchdog is called twice: The call from
>    mpt2sas_base_detach is safe, but now unnecessary (as a call was
>    added earlier up in the PCI driver callbacks to ensure that the
>    watchdog was out of the way.) This second invocation can be
> removed.
> 
>  - If the watchdog detects a bad IOC, the watchdog remains running:
>    The watchdog workqueue isn't cleaned up until
>    mpt2sas_base_stop_watchdog is called, so in the case that the
>    watchdog removes the device from SCSI topo, the workqueue will
>    remain unused until PCI .remove/.shutdown cleans it up. Perhaps a
>    single watchdog that iterates over all adapters would be simpler?
> 
> Finally, if SCSI topo detachment is all that is interesting here,
> would
> it make more sense to move the watchdog into the MPT "scsi" code?  I
> haven't looked at the code yet, but this might make an MPT fusion
> patch
> easier (due to dependencies between its "scsi" and "base" modules).

?

James



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: mpt2sas,mpt3sas watchdog device removal
  2013-07-16 12:03     ` James Bottomley
@ 2013-07-16 15:21       ` Joe Lawrence
  2013-07-17 12:03         ` Reddy, Sreekanth
  0 siblings, 1 reply; 12+ messages in thread
From: Joe Lawrence @ 2013-07-16 15:21 UTC (permalink / raw)
  To: James Bottomley; +Cc: Reddy, Sreekanth, linux-scsi

On Tue, 16 Jul 2013 16:03:38 +0400
James Bottomley <James.Bottomley@hansenpartnership.com> wrote:

> On Tue, 2013-07-16 at 17:30 +0530, Reddy, Sreekanth wrote:
> > James,
> > 
> > This patch seem to be fine. Please consider this patch.
> 
> Where's the new version?  The one that has all of this fixed:
> 
> > Off list, Sreekanth from LSI tested and noticed a few issues with this
> > patch:
> > 
> >  - mpt2sas_base_stop_watchdog is called twice: The call from
> >    mpt2sas_base_detach is safe, but now unnecessary (as a call was
> >    added earlier up in the PCI driver callbacks to ensure that the
> >    watchdog was out of the way.) This second invocation can be
> > removed.
> > 
> >  - If the watchdog detects a bad IOC, the watchdog remains running:
> >    The watchdog workqueue isn't cleaned up until
> >    mpt2sas_base_stop_watchdog is called, so in the case that the
> >    watchdog removes the device from SCSI topo, the workqueue will
> >    remain unused until PCI .remove/.shutdown cleans it up. Perhaps a
> >    single watchdog that iterates over all adapters would be simpler?
> > 
> > Finally, if SCSI topo detachment is all that is interesting here,
> > would
> > it make more sense to move the watchdog into the MPT "scsi" code?  I
> > haven't looked at the code yet, but this might make an MPT fusion
> > patch
> > easier (due to dependencies between its "scsi" and "base" modules).

This patch fizzled out in May as other work took priority.  If LSI is
still interested in these changes, I can dust off my notes and
test/rebase for the 3.11 series.

A few of the issues quoted above are easily fixed, however I remember
having an outstanding question of how to best clean up the driver's
per device watchdog workqueue:

The way the MPT drivers are working right now is that the watchdog
workqueue function _base_fault_reset_work() initiates a PCI device
removal via kthread.  The PCI callback kthread context then tears down
the device and cancel/flush/destroys the watchdog workqueue.

This patch eliminated the kthread and its call into PCI API, simply
detaching from the SCSI midlayer.  In my opinion, the kthread
complicated device removal and introduced potential races if the
watchdog tried removing the device at the same time an ordinary device
removal request occurred. 

At the time, the best solution I had was to leave the unused workqueue
around until its PCI device was removed.

Regards,

-- Joe

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: mpt2sas,mpt3sas watchdog device removal
  2013-07-16 15:21       ` Joe Lawrence
@ 2013-07-17 12:03         ` Reddy, Sreekanth
  0 siblings, 0 replies; 12+ messages in thread
From: Reddy, Sreekanth @ 2013-07-17 12:03 UTC (permalink / raw)
  To: Joe Lawrence, James Bottomley; +Cc: linux-scsi

Hi Joe,

>-----Original Message-----
>From: Joe Lawrence [mailto:joe.lawrence@stratus.com]
>Sent: Tuesday, July 16, 2013 8:52 PM
>To: James Bottomley
>Cc: Reddy, Sreekanth; linux-scsi@vger.kernel.org
>Subject: Re: mpt2sas,mpt3sas watchdog device removal
>
>On Tue, 16 Jul 2013 16:03:38 +0400
>James Bottomley <James.Bottomley@hansenpartnership.com> wrote:
>
>> On Tue, 2013-07-16 at 17:30 +0530, Reddy, Sreekanth wrote:
>> > James,
>> >
>> > This patch seem to be fine. Please consider this patch.
>>
>> Where's the new version?  The one that has all of this fixed:
>>
>> > Off list, Sreekanth from LSI tested and noticed a few issues with
>> > this
>> > patch:
>> >
>> >  - mpt2sas_base_stop_watchdog is called twice: The call from
>> >    mpt2sas_base_detach is safe, but now unnecessary (as a call was
>> >    added earlier up in the PCI driver callbacks to ensure that the
>> >    watchdog was out of the way.) This second invocation can be
>> > removed.
>> >
>> >  - If the watchdog detects a bad IOC, the watchdog remains running:
>> >    The watchdog workqueue isn't cleaned up until
>> >    mpt2sas_base_stop_watchdog is called, so in the case that the
>> >    watchdog removes the device from SCSI topo, the workqueue will
>> >    remain unused until PCI .remove/.shutdown cleans it up. Perhaps a
>> >    single watchdog that iterates over all adapters would be simpler?
>> >
>> > Finally, if SCSI topo detachment is all that is interesting here,
>> > would it make more sense to move the watchdog into the MPT "scsi"
>> > code?  I haven't looked at the code yet, but this might make an MPT
>> > fusion patch easier (due to dependencies between its "scsi" and
>> > "base" modules).
>
>This patch fizzled out in May as other work took priority.  If LSI is still
>interested in these changes, I can dust off my notes and test/rebase for the
>3.11 series.
 

It would be grate help if you post the updated patch which will accommodate the fixes for above comments.   
 
>
>A few of the issues quoted above are easily fixed, however I remember
>having an outstanding question of how to best clean up the driver's per device
>watchdog workqueue:
>
>The way the MPT drivers are working right now is that the watchdog
>workqueue function _base_fault_reset_work() initiates a PCI device removal
>via kthread.  The PCI callback kthread context then tears down the device and
>cancel/flush/destroys the watchdog workqueue.
>
>This patch eliminated the kthread and its call into PCI API, simply detaching
>from the SCSI midlayer.  In my opinion, the kthread complicated device
>removal and introduced potential races if the watchdog tried removing the
>device at the same time an ordinary device removal request occurred.
>
>At the time, the best solution I had was to leave the unused workqueue
>around until its PCI device was removed.
>
>Regards,
>
>-- Joe

Regards,
Sreekanth


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: mpt2sas,mpt3sas watchdog device removal
  2013-05-17 21:42     ` Joe Lawrence
@ 2013-10-10 21:59       ` Bjorn Helgaas
  2013-10-21 14:24         ` Joe Lawrence
  0 siblings, 1 reply; 12+ messages in thread
From: Bjorn Helgaas @ 2013-10-10 21:59 UTC (permalink / raw)
  To: Joe Lawrence
  Cc: linux-scsi, James E.J. Bottomley, Sreekanth Reddy, Desai,
	Kashyap, Nagalakshmi Nandigama, linux-pci

On Fri, May 17, 2013 at 3:42 PM, Joe Lawrence <joe.lawrence@stratus.com> wrote:
> On Fri, 17 May 2013 09:29:06 -0600
> Bjorn Helgaas <bhelgaas@google.com> wrote:
>
>> [+cc linux-pci]
>>
>> On Wed, May 15, 2013 at 11:29 AM, Joe Lawrence
>> <joe.lawrence@stratus.com> wrote:
>> > From 84ac7a35ebd61e84d4254eae78bb967de17254c2 Mon Sep 17 00:00:00
>> > 2001 From: Joe Lawrence <joe.lawrence@stratus.com>
>> > Date: Wed, 15 May 2013 12:52:31 -0400
>> > Subject: [PATCH] mpt2sas,mpt3sas: make watchdog instantiated device
>> > removal safe
>> >
>> > Calling pci_stop_and_remove_bus_device from a SCSI LLD may introduce
>> > device removal races with PCI callback functions.
>> >
>> > Simplify the mpt2sas watchdog mechanism by separating PCI device
>> > removal from SCSI midlayer detachment. When the watchdog wishes to
>> > remove a SCSI device from the topology, detach from the SCSI
>> > midlayer, leaving the PCI device alone. Adjust various pci_driver
>> > callbacks to account for a potentially SCSI detached PCI device.
>>
>> I don't know the details of the SCSI detachment, but this approach
>> looks much cleaner to me.
>
> Thanks for commenting, Bjorn.  I think this approach more closely
> represents what this watchdog is trying to accomplish.
>
> Off list, Sreekanth from LSI tested and noticed a few issues with this
> patch:
>
>  - mpt2sas_base_stop_watchdog is called twice: The call from
>    mpt2sas_base_detach is safe, but now unnecessary (as a call was
>    added earlier up in the PCI driver callbacks to ensure that the
>    watchdog was out of the way.) This second invocation can be removed.
>
>  - If the watchdog detects a bad IOC, the watchdog remains running:
>    The watchdog workqueue isn't cleaned up until
>    mpt2sas_base_stop_watchdog is called, so in the case that the
>    watchdog removes the device from SCSI topo, the workqueue will
>    remain unused until PCI .remove/.shutdown cleans it up. Perhaps a
>    single watchdog that iterates over all adapters would be simpler?
>
> Finally, if SCSI topo detachment is all that is interesting here, would
> it make more sense to move the watchdog into the MPT "scsi" code?  I
> haven't looked at the code yet, but this might make an MPT fusion patch
> easier (due to dependencies between its "scsi" and "base" modules).

Hi Joe,

I noticed this when looking through old email.  I can't remember if
anything happened with this.  It seems like nice work; did it go
anywhere?

Bjorn

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: mpt2sas,mpt3sas watchdog device removal
  2013-10-10 21:59       ` Bjorn Helgaas
@ 2013-10-21 14:24         ` Joe Lawrence
  0 siblings, 0 replies; 12+ messages in thread
From: Joe Lawrence @ 2013-10-21 14:24 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-scsi, James E.J. Bottomley, Sreekanth Reddy, Desai,
	Kashyap, Nagalakshmi Nandigama, linux-pci

On Thu, 10 Oct 2013 15:59:28 -0600
Bjorn Helgaas <bhelgaas@google.com> wrote:

> On Fri, May 17, 2013 at 3:42 PM, Joe Lawrence <joe.lawrence@stratus.com> wrote:
> > On Fri, 17 May 2013 09:29:06 -0600
> > Bjorn Helgaas <bhelgaas@google.com> wrote:
> >
> >> [+cc linux-pci]
> >>
> >> On Wed, May 15, 2013 at 11:29 AM, Joe Lawrence
> >> <joe.lawrence@stratus.com> wrote:
> >> > From 84ac7a35ebd61e84d4254eae78bb967de17254c2 Mon Sep 17 00:00:00
> >> > 2001 From: Joe Lawrence <joe.lawrence@stratus.com>
> >> > Date: Wed, 15 May 2013 12:52:31 -0400
> >> > Subject: [PATCH] mpt2sas,mpt3sas: make watchdog instantiated device
> >> > removal safe
> >> >
> >> > Calling pci_stop_and_remove_bus_device from a SCSI LLD may introduce
> >> > device removal races with PCI callback functions.
> >> >
> >> > Simplify the mpt2sas watchdog mechanism by separating PCI device
> >> > removal from SCSI midlayer detachment. When the watchdog wishes to
> >> > remove a SCSI device from the topology, detach from the SCSI
> >> > midlayer, leaving the PCI device alone. Adjust various pci_driver
> >> > callbacks to account for a potentially SCSI detached PCI device.
> >>
> >> I don't know the details of the SCSI detachment, but this approach
> >> looks much cleaner to me.
> >
> > Thanks for commenting, Bjorn.  I think this approach more closely
> > represents what this watchdog is trying to accomplish.
> >
> > Off list, Sreekanth from LSI tested and noticed a few issues with this
> > patch:
> >
> >  - mpt2sas_base_stop_watchdog is called twice: The call from
> >    mpt2sas_base_detach is safe, but now unnecessary (as a call was
> >    added earlier up in the PCI driver callbacks to ensure that the
> >    watchdog was out of the way.) This second invocation can be removed.
> >
> >  - If the watchdog detects a bad IOC, the watchdog remains running:
> >    The watchdog workqueue isn't cleaned up until
> >    mpt2sas_base_stop_watchdog is called, so in the case that the
> >    watchdog removes the device from SCSI topo, the workqueue will
> >    remain unused until PCI .remove/.shutdown cleans it up. Perhaps a
> >    single watchdog that iterates over all adapters would be simpler?
> >
> > Finally, if SCSI topo detachment is all that is interesting here, would
> > it make more sense to move the watchdog into the MPT "scsi" code?  I
> > haven't looked at the code yet, but this might make an MPT fusion patch
> > easier (due to dependencies between its "scsi" and "base" modules).
> 
> Hi Joe,
> 
> I noticed this when looking through old email.  I can't remember if
> anything happened with this.  It seems like nice work; did it go
> anywhere?

Hello Bjorn,

Sorry for the late reply, I was on vacation last week.

This patchset started as an attempt to push upstream some of the
changes that Stratus makes to the fusion/mpt* drivers.  I had hoped to
come up with a patchset that fixed hotplug vs. watchdog removal in each
of the three LSI drivers.  Unfortunately this was more of a spare time
project and other work took priority.

There is a chance that Stratus maybe revisiting the fusion driver in
the near future.  I will keep this patchset in mind if that happens.

Regards,

-- Joe


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2013-10-21 14:32 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-05-15 17:24 mpt2sas,mpt3sas watchdog device removal Joe Lawrence
2013-05-15 17:26 ` [PATCH] mpt2sas,mpt3sas: make watchdog instantiated device removal safe Joe Lawrence
2013-05-17 15:29   ` Bjorn Helgaas
2013-05-15 17:29 ` mpt2sas,mpt3sas watchdog device removal Joe Lawrence
2013-05-17 15:29   ` Bjorn Helgaas
2013-05-17 21:42     ` Joe Lawrence
2013-10-10 21:59       ` Bjorn Helgaas
2013-10-21 14:24         ` Joe Lawrence
2013-07-16 12:00   ` Reddy, Sreekanth
2013-07-16 12:03     ` James Bottomley
2013-07-16 15:21       ` Joe Lawrence
2013-07-17 12:03         ` Reddy, Sreekanth

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.