linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] mpt3sas: Reference counting fixes from in-flight mpt2sas
@ 2015-08-26  4:09 Nicholas A. Bellinger
  2015-08-26  4:09 ` [PATCH 1/2] mpt3sas: Refcount sas_device objects and fix unsafe list usage Nicholas A. Bellinger
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Nicholas A. Bellinger @ 2015-08-26  4:09 UTC (permalink / raw)
  To: linux-scsi
  Cc: linux-kernel, James Bottomley, Calvin Owens, Christoph Hellwig,
	Sreekanth Reddy, MPT-FusionLinux.pdl, Nicholas Bellinger

From: Nicholas Bellinger <nab@linux-iscsi.org>

Hi James & Co,

This series is a mpt3sas forward port of Calvin Owens' in-flight
reference counting bugfixes for mpt2sas LLD code here:

[PATCH v4 0/2] Fixes for memory corruption in mpt2sas
http://marc.info/?l=linux-scsi&m=143951695904115&w=2

The differences between mpt2sas + mpt3sas in this area are very,
very small, and is required to address a NULL pointer dereference
OOPsen in _scsih_probe_sas() -> mpt3sas_transport_port_add() ->
sas_port_add_phy() that I've been hitting occasionally on boot
during initial LUN scan.

So far this code has been tested on v3.14.47 with a small cluster
using SAS3008 HBAs plus a few preceeding upstream mpt3sas patches
so these patches apply cleanly, and with the changes in place the
original OOPsen appears to be resolved.

This patch series is cut atop v4.2-rc1 code, and barring any
objections from Avago folks et al., should be considered along
with Calvin's mpt2sas patch set for v4.3-rc1 code.

Thank you,

--nab

Nicholas Bellinger (2):
  mpt3sas: Refcount sas_device objects and fix unsafe list usage
  mpt3sas: Refcount fw_events and fix unsafe list usage

 drivers/scsi/mpt3sas/mpt3sas_base.h      |  23 +-
 drivers/scsi/mpt3sas/mpt3sas_scsih.c     | 592 ++++++++++++++++++++++---------
 drivers/scsi/mpt3sas/mpt3sas_transport.c |  12 +-
 3 files changed, 449 insertions(+), 178 deletions(-)

-- 
1.9.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/2] mpt3sas: Refcount sas_device objects and fix unsafe list usage
  2015-08-26  4:09 [PATCH 0/2] mpt3sas: Reference counting fixes from in-flight mpt2sas Nicholas A. Bellinger
@ 2015-08-26  4:09 ` Nicholas A. Bellinger
  2015-08-27  2:20   ` Calvin Owens
  2015-08-26  4:09 ` [PATCH 2/2] mpt3sas: Refcount fw_events " Nicholas A. Bellinger
  2015-08-26 23:54 ` [PATCH 0/2] mpt3sas: Reference counting fixes from in-flight mpt2sas Calvin Owens
  2 siblings, 1 reply; 15+ messages in thread
From: Nicholas A. Bellinger @ 2015-08-26  4:09 UTC (permalink / raw)
  To: linux-scsi
  Cc: linux-kernel, James Bottomley, Calvin Owens, Christoph Hellwig,
	Sreekanth Reddy, MPT-FusionLinux.pdl, Nicholas Bellinger

From: Nicholas Bellinger <nab@linux-iscsi.org>

These objects can be referenced concurrently throughout the driver, we
need a way to make sure threads can't delete them out from under
each other. This patch adds the refcount, and refactors the code to use
it.

Additionally, we cannot iterate over the sas_device_list without
holding the lock, or we risk corrupting random memory if items are
added or deleted as we iterate. This patch refactors _scsih_probe_sas()
to use the sas_device_list in a safe way.

This patch is a port of Calvin's PATCH-v4 for mpt2sas code.

Cc: Calvin Owens <calvinowens@fb.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Sreekanth Reddy <sreekanth.reddy@avagotech.com>
Cc: MPT-FusionLinux.pdl <MPT-FusionLinux.pdl@avagotech.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
---
 drivers/scsi/mpt3sas/mpt3sas_base.h      |  23 +-
 drivers/scsi/mpt3sas/mpt3sas_scsih.c     | 476 +++++++++++++++++++++----------
 drivers/scsi/mpt3sas/mpt3sas_transport.c |  12 +-
 3 files changed, 355 insertions(+), 156 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h b/drivers/scsi/mpt3sas/mpt3sas_base.h
index afa8816..fe29ac0 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.h
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.h
@@ -209,6 +209,7 @@ struct Mpi2ManufacturingPage11_t {
  * @flags: MPT_TARGET_FLAGS_XXX flags
  * @deleted: target flaged for deletion
  * @tm_busy: target is busy with TM request.
+ * @sdev: The sas_device associated with this target
  */
 struct MPT3SAS_TARGET {
 	struct scsi_target *starget;
@@ -218,6 +219,7 @@ struct MPT3SAS_TARGET {
 	u32	flags;
 	u8	deleted;
 	u8	tm_busy;
+	struct _sas_device *sdev;
 };
 
 
@@ -294,6 +296,7 @@ struct _internal_cmd {
  * @responding: used in _scsih_sas_device_mark_responding
  * @fast_path: fast path feature enable bit
  * @pfa_led_on: flag for PFA LED status
+ * @refcount: reference count for deletion
  *
  */
 struct _sas_device {
@@ -315,8 +318,24 @@ struct _sas_device {
 	u8	responding;
 	u8	fast_path;
 	u8	pfa_led_on;
+	struct kref refcount;
 };
 
+static inline void sas_device_get(struct _sas_device *s)
+{
+	kref_get(&s->refcount);
+}
+
+static inline void sas_device_free(struct kref *r)
+{
+	kfree(container_of(r, struct _sas_device, refcount));
+}
+
+static inline void sas_device_put(struct _sas_device *s)
+{
+	kref_put(&s->refcount, sas_device_free);
+}
+
 /**
  * struct _raid_device - raid volume link list
  * @list: sas device list
@@ -1043,7 +1062,9 @@ struct _sas_node *mpt3sas_scsih_expander_find_by_handle(
 	struct MPT3SAS_ADAPTER *ioc, u16 handle);
 struct _sas_node *mpt3sas_scsih_expander_find_by_sas_address(
 	struct MPT3SAS_ADAPTER *ioc, u64 sas_address);
-struct _sas_device *mpt3sas_scsih_sas_device_find_by_sas_address(
+struct _sas_device *mpt3sas_get_sdev_by_addr(
+	 struct MPT3SAS_ADAPTER *ioc, u64 sas_address);
+struct _sas_device *__mpt3sas_get_sdev_by_addr(
 	struct MPT3SAS_ADAPTER *ioc, u64 sas_address);
 
 void mpt3sas_port_enable_complete(struct MPT3SAS_ADAPTER *ioc);
diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 5a97e32..7142e3b 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -518,8 +518,61 @@ _scsih_determine_boot_device(struct MPT3SAS_ADAPTER *ioc,
 	}
 }
 
+static struct _sas_device *
+__mpt3sas_get_sdev_from_target(struct MPT3SAS_ADAPTER *ioc,
+	struct MPT3SAS_TARGET *tgt_priv)
+{
+	struct _sas_device *ret;
+
+	assert_spin_locked(&ioc->sas_device_lock);
+
+	ret = tgt_priv->sdev;
+	if (ret)
+		sas_device_get(ret);
+
+	return ret;
+}
+
+static struct _sas_device *
+mpt3sas_get_sdev_from_target(struct MPT3SAS_ADAPTER *ioc,
+	struct MPT3SAS_TARGET *tgt_priv)
+{
+	struct _sas_device *ret;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ioc->sas_device_lock, flags);
+	ret = __mpt3sas_get_sdev_from_target(ioc, tgt_priv);
+	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+
+	return ret;
+}
+
+struct _sas_device *
+__mpt3sas_get_sdev_by_addr(struct MPT3SAS_ADAPTER *ioc,
+	u64 sas_address)
+{
+	struct _sas_device *sas_device;
+
+	assert_spin_locked(&ioc->sas_device_lock);
+
+	list_for_each_entry(sas_device, &ioc->sas_device_list, list)
+		if (sas_device->sas_address == sas_address)
+			goto found_device;
+
+	list_for_each_entry(sas_device, &ioc->sas_device_init_list, list)
+		if (sas_device->sas_address == sas_address)
+			goto found_device;
+
+	return NULL;
+
+found_device:
+	sas_device_get(sas_device);
+	return sas_device;
+}
+
+
 /**
- * mpt3sas_scsih_sas_device_find_by_sas_address - sas device search
+ * mpt3sas_get_sdev_by_addr - sas device search
  * @ioc: per adapter object
  * @sas_address: sas address
  * Context: Calling function should acquire ioc->sas_device_lock
@@ -528,24 +581,44 @@ _scsih_determine_boot_device(struct MPT3SAS_ADAPTER *ioc,
  * object.
  */
 struct _sas_device *
-mpt3sas_scsih_sas_device_find_by_sas_address(struct MPT3SAS_ADAPTER *ioc,
+mpt3sas_get_sdev_by_addr(struct MPT3SAS_ADAPTER *ioc,
 	u64 sas_address)
 {
 	struct _sas_device *sas_device;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ioc->sas_device_lock, flags);
+	sas_device = __mpt3sas_get_sdev_by_addr(ioc,
+			sas_address);
+	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+
+	return sas_device;
+}
+
+static struct _sas_device *
+__mpt3sas_get_sdev_by_handle(struct MPT3SAS_ADAPTER *ioc, u16 handle)
+{
+	struct _sas_device *sas_device;
+
+	assert_spin_locked(&ioc->sas_device_lock);
 
 	list_for_each_entry(sas_device, &ioc->sas_device_list, list)
-		if (sas_device->sas_address == sas_address)
-			return sas_device;
+		if (sas_device->handle == handle)
+			goto found_device;
 
 	list_for_each_entry(sas_device, &ioc->sas_device_init_list, list)
-		if (sas_device->sas_address == sas_address)
-			return sas_device;
+		if (sas_device->handle == handle)
+			goto found_device;
 
 	return NULL;
+
+found_device:
+	sas_device_get(sas_device);
+	return sas_device;
 }
 
 /**
- * _scsih_sas_device_find_by_handle - sas device search
+ * mpt3sas_get_sdev_by_handle - sas device search
  * @ioc: per adapter object
  * @handle: sas device handle (assigned by firmware)
  * Context: Calling function should acquire ioc->sas_device_lock
@@ -554,19 +627,16 @@ mpt3sas_scsih_sas_device_find_by_sas_address(struct MPT3SAS_ADAPTER *ioc,
  * object.
  */
 static struct _sas_device *
-_scsih_sas_device_find_by_handle(struct MPT3SAS_ADAPTER *ioc, u16 handle)
+mpt3sas_get_sdev_by_handle(struct MPT3SAS_ADAPTER *ioc, u16 handle)
 {
 	struct _sas_device *sas_device;
+	unsigned long flags;
 
-	list_for_each_entry(sas_device, &ioc->sas_device_list, list)
-		if (sas_device->handle == handle)
-			return sas_device;
-
-	list_for_each_entry(sas_device, &ioc->sas_device_init_list, list)
-		if (sas_device->handle == handle)
-			return sas_device;
+	spin_lock_irqsave(&ioc->sas_device_lock, flags);
+	sas_device = __mpt3sas_get_sdev_by_handle(ioc, handle);
+	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
 
-	return NULL;
+	return sas_device;
 }
 
 /**
@@ -575,7 +645,7 @@ _scsih_sas_device_find_by_handle(struct MPT3SAS_ADAPTER *ioc, u16 handle)
  * @sas_device: the sas_device object
  * Context: This function will acquire ioc->sas_device_lock.
  *
- * Removing object and freeing associated memory from the ioc->sas_device_list.
+ * If sas_device is on the list, remove it and decrement its reference count.
  */
 static void
 _scsih_sas_device_remove(struct MPT3SAS_ADAPTER *ioc,
@@ -585,10 +655,15 @@ _scsih_sas_device_remove(struct MPT3SAS_ADAPTER *ioc,
 
 	if (!sas_device)
 		return;
-
+	/*
+	 * The lock serializes access to the list, but we still need to verify
+	 * that nobody removed the entry while we were waiting on the lock.
+	 */
 	spin_lock_irqsave(&ioc->sas_device_lock, flags);
-	list_del(&sas_device->list);
-	kfree(sas_device);
+	if (!list_empty(&sas_device->list)) {
+		list_del_init(&sas_device->list);
+		sas_device_put(sas_device);
+	}
 	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
 }
 
@@ -609,12 +684,17 @@ _scsih_device_remove_by_handle(struct MPT3SAS_ADAPTER *ioc, u16 handle)
 		return;
 
 	spin_lock_irqsave(&ioc->sas_device_lock, flags);
-	sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
-	if (sas_device)
-		list_del(&sas_device->list);
+	sas_device = __mpt3sas_get_sdev_by_handle(ioc, handle);
+	if (sas_device) {
+		list_del_init(&sas_device->list);
+		sas_device_put(sas_device);
+	}
 	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
-	if (sas_device)
+
+	if (sas_device) {
 		_scsih_remove_device(ioc, sas_device);
+		sas_device_put(sas_device);
+	}
 }
 
 /**
@@ -635,13 +715,17 @@ mpt3sas_device_remove_by_sas_address(struct MPT3SAS_ADAPTER *ioc,
 		return;
 
 	spin_lock_irqsave(&ioc->sas_device_lock, flags);
-	sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
-	    sas_address);
-	if (sas_device)
-		list_del(&sas_device->list);
+	sas_device = __mpt3sas_get_sdev_by_addr(ioc, sas_address);
+	if (sas_device) {
+		list_del_init(&sas_device->list);
+		sas_device_put(sas_device);
+	}
 	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
-	if (sas_device)
+
+	if (sas_device) {
 		_scsih_remove_device(ioc, sas_device);
+		sas_device_put(sas_device);
+	}
 }
 
 /**
@@ -664,6 +748,7 @@ _scsih_sas_device_add(struct MPT3SAS_ADAPTER *ioc,
 		(unsigned long long)sas_device->sas_address));
 
 	spin_lock_irqsave(&ioc->sas_device_lock, flags);
+	sas_device_get(sas_device);
 	list_add_tail(&sas_device->list, &ioc->sas_device_list);
 	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
 
@@ -705,6 +790,7 @@ _scsih_sas_device_init_add(struct MPT3SAS_ADAPTER *ioc,
 		(unsigned long long)sas_device->sas_address));
 
 	spin_lock_irqsave(&ioc->sas_device_lock, flags);
+	sas_device_get(sas_device);
 	list_add_tail(&sas_device->list, &ioc->sas_device_init_list);
 	_scsih_determine_boot_device(ioc, sas_device, 0);
 	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
@@ -1083,12 +1169,15 @@ _scsih_change_queue_depth(struct scsi_device *sdev, int qdepth)
 		goto not_sata;
 	if ((sas_target_priv_data->flags & MPT_TARGET_FLAGS_VOLUME))
 		goto not_sata;
+
 	spin_lock_irqsave(&ioc->sas_device_lock, flags);
-	sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
-	   sas_device_priv_data->sas_target->sas_address);
-	if (sas_device && sas_device->device_info &
-	    MPI2_SAS_DEVICE_INFO_SATA_DEVICE)
-		max_depth = MPT3SAS_SATA_QUEUE_DEPTH;
+	sas_device = __mpt3sas_get_sdev_from_target(ioc, sas_target_priv_data);
+	if (sas_device) {
+		if (sas_device->device_info & MPI2_SAS_DEVICE_INFO_SATA_DEVICE)
+			max_depth = MPT3SAS_SATA_QUEUE_DEPTH;
+
+		sas_device_put(sas_device);
+	}
 	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
 
  not_sata:
@@ -1145,12 +1234,13 @@ _scsih_target_alloc(struct scsi_target *starget)
 	/* sas/sata devices */
 	spin_lock_irqsave(&ioc->sas_device_lock, flags);
 	rphy = dev_to_rphy(starget->dev.parent);
-	sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
+	sas_device = __mpt3sas_get_sdev_by_addr(ioc,
 	   rphy->identify.sas_address);
 
 	if (sas_device) {
 		sas_target_priv_data->handle = sas_device->handle;
 		sas_target_priv_data->sas_address = sas_device->sas_address;
+		sas_target_priv_data->sdev = sas_device;
 		sas_device->starget = starget;
 		sas_device->id = starget->id;
 		sas_device->channel = starget->channel;
@@ -1200,13 +1290,21 @@ _scsih_target_destroy(struct scsi_target *starget)
 
 	spin_lock_irqsave(&ioc->sas_device_lock, flags);
 	rphy = dev_to_rphy(starget->dev.parent);
-	sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
-	   rphy->identify.sas_address);
+	sas_device = __mpt3sas_get_sdev_from_target(ioc, sas_target_priv_data);
 	if (sas_device && (sas_device->starget == starget) &&
 	    (sas_device->id == starget->id) &&
 	    (sas_device->channel == starget->channel))
 		sas_device->starget = NULL;
 
+	if (sas_device) {
+		/*
+		 * Corresponding get() is in _scsih_target_alloc()
+		 */
+		sas_target_priv_data->sdev = NULL;
+		sas_device_put(sas_device);
+
+		sas_device_put(sas_device);
+	}
 	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
 
  out:
@@ -1262,7 +1360,7 @@ _scsih_slave_alloc(struct scsi_device *sdev)
 
 	if (!(sas_target_priv_data->flags & MPT_TARGET_FLAGS_VOLUME)) {
 		spin_lock_irqsave(&ioc->sas_device_lock, flags);
-		sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
+		sas_device = __mpt3sas_get_sdev_by_addr(ioc,
 					sas_target_priv_data->sas_address);
 		if (sas_device && (sas_device->starget == NULL)) {
 			sdev_printk(KERN_INFO, sdev,
@@ -1270,6 +1368,10 @@ _scsih_slave_alloc(struct scsi_device *sdev)
 				__func__, __LINE__);
 			sas_device->starget = starget;
 		}
+
+		if (sas_device)
+			sas_device_put(sas_device);
+
 		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
 	}
 
@@ -1304,10 +1406,14 @@ _scsih_slave_destroy(struct scsi_device *sdev)
 
 	if (!(sas_target_priv_data->flags & MPT_TARGET_FLAGS_VOLUME)) {
 		spin_lock_irqsave(&ioc->sas_device_lock, flags);
-		sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
-		   sas_target_priv_data->sas_address);
+		sas_device = __mpt3sas_get_sdev_from_target(ioc,
+				sas_target_priv_data);
 		if (sas_device && !sas_target_priv_data->num_luns)
 			sas_device->starget = NULL;
+
+		if (sas_device)
+			sas_device_put(sas_device);
+
 		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
 	}
 
@@ -1743,7 +1849,7 @@ _scsih_slave_configure(struct scsi_device *sdev)
 	}
 
 	spin_lock_irqsave(&ioc->sas_device_lock, flags);
-	sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
+	sas_device = __mpt3sas_get_sdev_by_addr(ioc,
 	   sas_device_priv_data->sas_target->sas_address);
 	if (!sas_device) {
 		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
@@ -1777,12 +1883,12 @@ _scsih_slave_configure(struct scsi_device *sdev)
 		ds, (unsigned long long)
 	    sas_device->enclosure_logical_id, sas_device->slot);
 
+	sas_device_put(sas_device);
 	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
 
 	if (!ssp_target)
 		_scsih_display_sata_capabilities(ioc, handle, sdev);
 
-
 	_scsih_change_queue_depth(sdev, qdepth);
 
 	if (ssp_target) {
@@ -2173,8 +2279,7 @@ _scsih_tm_display_info(struct MPT3SAS_ADAPTER *ioc, struct scsi_cmnd *scmd)
 		    device_str, (unsigned long long)priv_target->sas_address);
 	} else {
 		spin_lock_irqsave(&ioc->sas_device_lock, flags);
-		sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
-		    priv_target->sas_address);
+		sas_device = __mpt3sas_get_sdev_from_target(ioc, priv_target);
 		if (sas_device) {
 			if (priv_target->flags &
 			    MPT_TARGET_FLAGS_RAID_COMPONENT) {
@@ -2193,6 +2298,8 @@ _scsih_tm_display_info(struct MPT3SAS_ADAPTER *ioc, struct scsi_cmnd *scmd)
 			    "enclosure_logical_id(0x%016llx), slot(%d)\n",
 			   (unsigned long long)sas_device->enclosure_logical_id,
 			    sas_device->slot);
+
+			sas_device_put(sas_device);
 		}
 		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
 	}
@@ -2268,10 +2375,11 @@ _scsih_dev_reset(struct scsi_cmnd *scmd)
 {
 	struct MPT3SAS_ADAPTER *ioc = shost_priv(scmd->device->host);
 	struct MPT3SAS_DEVICE *sas_device_priv_data;
-	struct _sas_device *sas_device;
-	unsigned long flags;
+	struct _sas_device *sas_device = NULL;
 	u16	handle;
 	int r;
+	struct scsi_target *starget = scmd->device->sdev_target;
+	struct MPT3SAS_TARGET *target_priv_data = starget->hostdata;
 
 	sdev_printk(KERN_INFO, scmd->device,
 		"attempting device reset! scmd(%p)\n", scmd);
@@ -2291,12 +2399,10 @@ _scsih_dev_reset(struct scsi_cmnd *scmd)
 	handle = 0;
 	if (sas_device_priv_data->sas_target->flags &
 	    MPT_TARGET_FLAGS_RAID_COMPONENT) {
-		spin_lock_irqsave(&ioc->sas_device_lock, flags);
-		sas_device = _scsih_sas_device_find_by_handle(ioc,
-		   sas_device_priv_data->sas_target->handle);
+		sas_device = mpt3sas_get_sdev_from_target(ioc,
+				target_priv_data);
 		if (sas_device)
 			handle = sas_device->volume_handle;
-		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
 	} else
 		handle = sas_device_priv_data->sas_target->handle;
 
@@ -2313,6 +2419,10 @@ _scsih_dev_reset(struct scsi_cmnd *scmd)
  out:
 	sdev_printk(KERN_INFO, scmd->device, "device reset: %s scmd(%p)\n",
 	    ((r == SUCCESS) ? "SUCCESS" : "FAILED"), scmd);
+
+	if (sas_device)
+		sas_device_put(sas_device);
+
 	return r;
 }
 
@@ -2327,11 +2437,11 @@ _scsih_target_reset(struct scsi_cmnd *scmd)
 {
 	struct MPT3SAS_ADAPTER *ioc = shost_priv(scmd->device->host);
 	struct MPT3SAS_DEVICE *sas_device_priv_data;
-	struct _sas_device *sas_device;
-	unsigned long flags;
+	struct _sas_device *sas_device = NULL;
 	u16	handle;
 	int r;
 	struct scsi_target *starget = scmd->device->sdev_target;
+	struct MPT3SAS_TARGET *target_priv_data = starget->hostdata;
 
 	starget_printk(KERN_INFO, starget, "attempting target reset! scmd(%p)\n",
 		scmd);
@@ -2351,12 +2461,10 @@ _scsih_target_reset(struct scsi_cmnd *scmd)
 	handle = 0;
 	if (sas_device_priv_data->sas_target->flags &
 	    MPT_TARGET_FLAGS_RAID_COMPONENT) {
-		spin_lock_irqsave(&ioc->sas_device_lock, flags);
-		sas_device = _scsih_sas_device_find_by_handle(ioc,
-		   sas_device_priv_data->sas_target->handle);
+		sas_device = mpt3sas_get_sdev_from_target(ioc,
+				target_priv_data);
 		if (sas_device)
 			handle = sas_device->volume_handle;
-		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
 	} else
 		handle = sas_device_priv_data->sas_target->handle;
 
@@ -2373,6 +2481,10 @@ _scsih_target_reset(struct scsi_cmnd *scmd)
  out:
 	starget_printk(KERN_INFO, starget, "target reset: %s scmd(%p)\n",
 	    ((r == SUCCESS) ? "SUCCESS" : "FAILED"), scmd);
+
+	if (sas_device)
+		sas_device_put(sas_device);
+
 	return r;
 }
 
@@ -2683,15 +2795,15 @@ _scsih_block_io_to_children_attached_to_ex(struct MPT3SAS_ADAPTER *ioc,
 
 	list_for_each_entry(mpt3sas_port,
 	   &sas_expander->sas_port_list, port_list) {
-		if (mpt3sas_port->remote_identify.device_type ==
-		    SAS_END_DEVICE) {
+		if (mpt3sas_port->remote_identify.device_type == SAS_END_DEVICE) {
 			spin_lock_irqsave(&ioc->sas_device_lock, flags);
-			sas_device =
-			    mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
-			   mpt3sas_port->remote_identify.sas_address);
-			if (sas_device)
+			sas_device = __mpt3sas_get_sdev_by_addr(ioc,
+					mpt3sas_port->remote_identify.sas_address);
+			if (sas_device) {
 				set_bit(sas_device->handle,
-				    ioc->blocking_handles);
+					ioc->blocking_handles);
+				sas_device_put(sas_device);
+			}
 			spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
 		}
 	}
@@ -2759,7 +2871,7 @@ _scsih_tm_tr_send(struct MPT3SAS_ADAPTER *ioc, u16 handle)
 {
 	Mpi2SCSITaskManagementRequest_t *mpi_request;
 	u16 smid;
-	struct _sas_device *sas_device;
+	struct _sas_device *sas_device = NULL;
 	struct MPT3SAS_TARGET *sas_target_priv_data = NULL;
 	u64 sas_address = 0;
 	unsigned long flags;
@@ -2792,7 +2904,7 @@ _scsih_tm_tr_send(struct MPT3SAS_ADAPTER *ioc, u16 handle)
 		return;
 
 	spin_lock_irqsave(&ioc->sas_device_lock, flags);
-	sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
+	sas_device = __mpt3sas_get_sdev_by_handle(ioc, handle);
 	if (sas_device && sas_device->starget &&
 	    sas_device->starget->hostdata) {
 		sas_target_priv_data = sas_device->starget->hostdata;
@@ -2814,14 +2926,14 @@ _scsih_tm_tr_send(struct MPT3SAS_ADAPTER *ioc, u16 handle)
 	if (!smid) {
 		delayed_tr = kzalloc(sizeof(*delayed_tr), GFP_ATOMIC);
 		if (!delayed_tr)
-			return;
+			goto out;
 		INIT_LIST_HEAD(&delayed_tr->list);
 		delayed_tr->handle = handle;
 		list_add_tail(&delayed_tr->list, &ioc->delayed_tr_list);
 		dewtprintk(ioc, pr_info(MPT3SAS_FMT
 		    "DELAYED:tr:handle(0x%04x), (open)\n",
 		    ioc->name, handle));
-		return;
+		goto out;
 	}
 
 	dewtprintk(ioc, pr_info(MPT3SAS_FMT
@@ -2835,6 +2947,9 @@ _scsih_tm_tr_send(struct MPT3SAS_ADAPTER *ioc, u16 handle)
 	mpi_request->TaskType = MPI2_SCSITASKMGMT_TASKTYPE_TARGET_RESET;
 	mpt3sas_base_put_smid_hi_priority(ioc, smid);
 	mpt3sas_trigger_master(ioc, MASTER_TRIGGER_DEVICE_REMOVAL);
+out:
+	if (sas_device)
+		sas_device_put(sas_device);
 }
 
 /**
@@ -3685,7 +3800,6 @@ _scsih_scsi_ioc_info(struct MPT3SAS_ADAPTER *ioc, struct scsi_cmnd *scmd,
 	char *desc_scsi_state = ioc->tmp_string;
 	u32 log_info = le32_to_cpu(mpi_reply->IOCLogInfo);
 	struct _sas_device *sas_device = NULL;
-	unsigned long flags;
 	struct scsi_target *starget = scmd->device->sdev_target;
 	struct MPT3SAS_TARGET *priv_target = starget->hostdata;
 	char *device_str = NULL;
@@ -3813,9 +3927,7 @@ _scsih_scsi_ioc_info(struct MPT3SAS_ADAPTER *ioc, struct scsi_cmnd *scmd,
 		pr_warn(MPT3SAS_FMT "\t%s wwid(0x%016llx)\n", ioc->name,
 		    device_str, (unsigned long long)priv_target->sas_address);
 	} else {
-		spin_lock_irqsave(&ioc->sas_device_lock, flags);
-		sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
-		    priv_target->sas_address);
+		sas_device = mpt3sas_get_sdev_from_target(ioc, priv_target);
 		if (sas_device) {
 			pr_warn(MPT3SAS_FMT
 				"\tsas_address(0x%016llx), phy(%d)\n",
@@ -3825,8 +3937,9 @@ _scsih_scsi_ioc_info(struct MPT3SAS_ADAPTER *ioc, struct scsi_cmnd *scmd,
 			    "\tenclosure_logical_id(0x%016llx), slot(%d)\n",
 			    ioc->name, (unsigned long long)
 			    sas_device->enclosure_logical_id, sas_device->slot);
+
+			sas_device_put(sas_device);
 		}
-		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
 	}
 
 	pr_warn(MPT3SAS_FMT
@@ -3878,7 +3991,7 @@ _scsih_turn_on_pfa_led(struct MPT3SAS_ADAPTER *ioc, u16 handle)
 	Mpi2SepRequest_t mpi_request;
 	struct _sas_device *sas_device;
 
-	sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
+	sas_device = mpt3sas_get_sdev_by_handle(ioc, handle);
 	if (!sas_device)
 		return;
 
@@ -3893,7 +4006,7 @@ _scsih_turn_on_pfa_led(struct MPT3SAS_ADAPTER *ioc, u16 handle)
 	    &mpi_request)) != 0) {
 		pr_err(MPT3SAS_FMT "failure at %s:%d/%s()!\n", ioc->name,
 		__FILE__, __LINE__, __func__);
-		return;
+		goto out;
 	}
 	sas_device->pfa_led_on = 1;
 
@@ -3902,8 +4015,10 @@ _scsih_turn_on_pfa_led(struct MPT3SAS_ADAPTER *ioc, u16 handle)
 			"enclosure_processor: ioc_status (0x%04x), loginfo(0x%08x)\n",
 			ioc->name, le16_to_cpu(mpi_reply.IOCStatus),
 		    le32_to_cpu(mpi_reply.IOCLogInfo)));
-		return;
+		goto out;
 	}
+out:
+	sas_device_put(sas_device);
 }
 /**
  * _scsih_turn_off_pfa_led - turn off Fault LED
@@ -3986,19 +4101,17 @@ _scsih_smart_predicted_fault(struct MPT3SAS_ADAPTER *ioc, u16 handle)
 
 	/* only handle non-raid devices */
 	spin_lock_irqsave(&ioc->sas_device_lock, flags);
-	sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
+	sas_device = __mpt3sas_get_sdev_by_handle(ioc, handle);
 	if (!sas_device) {
-		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
-		return;
+		goto out_unlock;
 	}
 	starget = sas_device->starget;
 	sas_target_priv_data = starget->hostdata;
 
 	if ((sas_target_priv_data->flags & MPT_TARGET_FLAGS_RAID_COMPONENT) ||
-	   ((sas_target_priv_data->flags & MPT_TARGET_FLAGS_VOLUME))) {
-		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
-		return;
-	}
+	   ((sas_target_priv_data->flags & MPT_TARGET_FLAGS_VOLUME)))
+		goto out_unlock;
+
 	starget_printk(KERN_WARNING, starget, "predicted fault\n");
 	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
 
@@ -4012,7 +4125,7 @@ _scsih_smart_predicted_fault(struct MPT3SAS_ADAPTER *ioc, u16 handle)
 	if (!event_reply) {
 		pr_err(MPT3SAS_FMT "failure at %s:%d/%s()!\n",
 		    ioc->name, __FILE__, __LINE__, __func__);
-		return;
+		goto out;
 	}
 
 	event_reply->Function = MPI2_FUNCTION_EVENT_NOTIFICATION;
@@ -4029,6 +4142,14 @@ _scsih_smart_predicted_fault(struct MPT3SAS_ADAPTER *ioc, u16 handle)
 	event_data->SASAddress = cpu_to_le64(sas_target_priv_data->sas_address);
 	mpt3sas_ctl_add_to_event_log(ioc, event_reply);
 	kfree(event_reply);
+out:
+	if (sas_device)
+		sas_device_put(sas_device);
+	return;
+
+out_unlock:
+	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+	goto out;
 }
 
 /**
@@ -4772,13 +4893,11 @@ _scsih_check_device(struct MPT3SAS_ADAPTER *ioc,
 
 	spin_lock_irqsave(&ioc->sas_device_lock, flags);
 	sas_address = le64_to_cpu(sas_device_pg0.SASAddress);
-	sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
+	sas_device = __mpt3sas_get_sdev_by_addr(ioc,
 	    sas_address);
 
-	if (!sas_device) {
-		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
-		return;
-	}
+	if (!sas_device)
+		goto out_unlock;
 
 	if (unlikely(sas_device->handle != handle)) {
 		starget = sas_device->starget;
@@ -4796,20 +4915,24 @@ _scsih_check_device(struct MPT3SAS_ADAPTER *ioc,
 		pr_err(MPT3SAS_FMT
 			"device is not present handle(0x%04x), flags!!!\n",
 			ioc->name, handle);
-		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
-		return;
+		goto out_unlock;
 	}
 
 	/* check if there were any issues with discovery */
 	if (_scsih_check_access_status(ioc, sas_address, handle,
-	    sas_device_pg0.AccessStatus)) {
-		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
-		return;
-	}
+	    sas_device_pg0.AccessStatus))
+		goto out_unlock;
 
 	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
 	_scsih_ublock_io_device(ioc, sas_address);
+	if (sas_device)
+		sas_device_put(sas_device);
+	return;
 
+out_unlock:
+	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+	if (sas_device)
+		sas_device_put(sas_device);
 }
 
 /**
@@ -4834,7 +4957,6 @@ _scsih_add_device(struct MPT3SAS_ADAPTER *ioc, u16 handle, u8 phy_num,
 	u32 ioc_status;
 	u64 sas_address;
 	u32 device_info;
-	unsigned long flags;
 
 	if ((mpt3sas_config_get_sas_device_pg0(ioc, &mpi_reply, &sas_device_pg0,
 	    MPI2_SAS_DEVICE_PGAD_FORM_HANDLE, handle))) {
@@ -4870,13 +4992,11 @@ _scsih_add_device(struct MPT3SAS_ADAPTER *ioc, u16 handle, u8 phy_num,
 	    sas_device_pg0.AccessStatus))
 		return -1;
 
-	spin_lock_irqsave(&ioc->sas_device_lock, flags);
-	sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
-	    sas_address);
-	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
-
-	if (sas_device)
+	sas_device = mpt3sas_get_sdev_by_addr(ioc, sas_address);
+	if (sas_device) {
+		sas_device_put(sas_device);
 		return -1;
+	}
 
 	sas_device = kzalloc(sizeof(struct _sas_device),
 	    GFP_KERNEL);
@@ -4886,6 +5006,7 @@ _scsih_add_device(struct MPT3SAS_ADAPTER *ioc, u16 handle, u8 phy_num,
 		return 0;
 	}
 
+	kref_init(&sas_device->refcount);
 	sas_device->handle = handle;
 	if (_scsih_get_sas_address(ioc,
 	    le16_to_cpu(sas_device_pg0.ParentDevHandle),
@@ -4917,6 +5038,7 @@ _scsih_add_device(struct MPT3SAS_ADAPTER *ioc, u16 handle, u8 phy_num,
 	else
 		_scsih_sas_device_add(ioc, sas_device);
 
+	sas_device_put(sas_device);
 	return 0;
 }
 
@@ -4965,8 +5087,6 @@ _scsih_remove_device(struct MPT3SAS_ADAPTER *ioc,
 		ioc->name, __func__,
 	    sas_device->handle, (unsigned long long)
 	    sas_device->sas_address));
-
-	kfree(sas_device);
 }
 
 #ifdef CONFIG_SCSI_MPT3SAS_LOGGING
@@ -5291,25 +5411,25 @@ _scsih_sas_device_status_change_event(struct MPT3SAS_ADAPTER *ioc,
 
 	spin_lock_irqsave(&ioc->sas_device_lock, flags);
 	sas_address = le64_to_cpu(event_data->SASAddress);
-	sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
-	    sas_address);
+	sas_device = __mpt3sas_get_sdev_by_addr(ioc, sas_address);
 
-	if (!sas_device || !sas_device->starget) {
-		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
-		return;
-	}
+	if (!sas_device || !sas_device->starget)
+		goto out;
 
 	target_priv_data = sas_device->starget->hostdata;
-	if (!target_priv_data) {
-		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
-		return;
-	}
+	if (!target_priv_data)
+		goto out;
 
 	if (event_data->ReasonCode ==
 	    MPI2_EVENT_SAS_DEV_STAT_RC_INTERNAL_DEVICE_RESET)
 		target_priv_data->tm_busy = 1;
 	else
 		target_priv_data->tm_busy = 0;
+
+out:
+	if (sas_device)
+		sas_device_put(sas_device);
+
 	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
 }
 
@@ -5793,7 +5913,7 @@ _scsih_sas_pd_expose(struct MPT3SAS_ADAPTER *ioc,
 	u16 handle = le16_to_cpu(element->PhysDiskDevHandle);
 
 	spin_lock_irqsave(&ioc->sas_device_lock, flags);
-	sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
+	sas_device = __mpt3sas_get_sdev_by_handle(ioc, handle);
 	if (sas_device) {
 		sas_device->volume_handle = 0;
 		sas_device->volume_wwid = 0;
@@ -5812,6 +5932,8 @@ _scsih_sas_pd_expose(struct MPT3SAS_ADAPTER *ioc,
 	/* exposing raid component */
 	if (starget)
 		starget_for_each_device(starget, NULL, _scsih_reprobe_lun);
+
+	sas_device_put(sas_device);
 }
 
 /**
@@ -5840,7 +5962,7 @@ _scsih_sas_pd_hide(struct MPT3SAS_ADAPTER *ioc,
 		    &volume_wwid);
 
 	spin_lock_irqsave(&ioc->sas_device_lock, flags);
-	sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
+	sas_device = __mpt3sas_get_sdev_by_handle(ioc, handle);
 	if (sas_device) {
 		set_bit(handle, ioc->pd_handles);
 		if (sas_device->starget && sas_device->starget->hostdata) {
@@ -5860,6 +5982,8 @@ _scsih_sas_pd_hide(struct MPT3SAS_ADAPTER *ioc,
 	_scsih_ir_fastpath(ioc, handle, element->PhysDiskNum);
 	if (starget)
 		starget_for_each_device(starget, (void *)1, _scsih_reprobe_lun);
+
+	sas_device_put(sas_device);
 }
 
 /**
@@ -5892,7 +6016,6 @@ _scsih_sas_pd_add(struct MPT3SAS_ADAPTER *ioc,
 	Mpi2EventIrConfigElement_t *element)
 {
 	struct _sas_device *sas_device;
-	unsigned long flags;
 	u16 handle = le16_to_cpu(element->PhysDiskDevHandle);
 	Mpi2ConfigReply_t mpi_reply;
 	Mpi2SasDevicePage0_t sas_device_pg0;
@@ -5902,11 +6025,10 @@ _scsih_sas_pd_add(struct MPT3SAS_ADAPTER *ioc,
 
 	set_bit(handle, ioc->pd_handles);
 
-	spin_lock_irqsave(&ioc->sas_device_lock, flags);
-	sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
-	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+	sas_device = mpt3sas_get_sdev_by_handle(ioc, handle);
 	if (sas_device) {
 		_scsih_ir_fastpath(ioc, handle, element->PhysDiskNum);
+		sas_device_put(sas_device);
 		return;
 	}
 
@@ -6183,7 +6305,6 @@ _scsih_sas_ir_physical_disk_event(struct MPT3SAS_ADAPTER *ioc,
 	u16 handle, parent_handle;
 	u32 state;
 	struct _sas_device *sas_device;
-	unsigned long flags;
 	Mpi2ConfigReply_t mpi_reply;
 	Mpi2SasDevicePage0_t sas_device_pg0;
 	u32 ioc_status;
@@ -6212,12 +6333,12 @@ _scsih_sas_ir_physical_disk_event(struct MPT3SAS_ADAPTER *ioc,
 	case MPI2_RAID_PD_STATE_HOT_SPARE:
 
 		set_bit(handle, ioc->pd_handles);
-		spin_lock_irqsave(&ioc->sas_device_lock, flags);
-		sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
-		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
 
-		if (sas_device)
+		sas_device = mpt3sas_get_sdev_by_handle(ioc, handle);
+		if (sas_device) {
+			sas_device_put(sas_device);
 			return;
+		}
 
 		if ((mpt3sas_config_get_sas_device_pg0(ioc, &mpi_reply,
 		    &sas_device_pg0, MPI2_SAS_DEVICE_PGAD_FORM_HANDLE,
@@ -6673,6 +6794,7 @@ _scsih_remove_unresponding_sas_devices(struct MPT3SAS_ADAPTER *ioc)
 	struct _raid_device *raid_device, *raid_device_next;
 	struct list_head tmp_list;
 	unsigned long flags;
+	LIST_HEAD(head);
 
 	pr_info(MPT3SAS_FMT "removing unresponding devices: start\n",
 	    ioc->name);
@@ -6680,14 +6802,29 @@ _scsih_remove_unresponding_sas_devices(struct MPT3SAS_ADAPTER *ioc)
 	/* removing unresponding end devices */
 	pr_info(MPT3SAS_FMT "removing unresponding devices: end-devices\n",
 	    ioc->name);
+
+	/*
+	 * Iterate, pulling off devices marked as non-responding. We become the
+	 * owner for the reference the list had on any object we prune.
+	 */
+	spin_lock_irqsave(&ioc->sas_device_lock, flags);
 	list_for_each_entry_safe(sas_device, sas_device_next,
-	    &ioc->sas_device_list, list) {
+				 &ioc->sas_device_list, list) {
 		if (!sas_device->responding)
-			mpt3sas_device_remove_by_sas_address(ioc,
-			    sas_device->sas_address);
+			list_move_tail(&sas_device->list, &head);
 		else
 			sas_device->responding = 0;
 	}
+	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+
+	/*
+	 * Now, uninitialize and remove the unresponding devices we pruned.
+	 */
+	list_for_each_entry_safe(sas_device, sas_device_next, &head, list) {
+		_scsih_remove_device(ioc, sas_device);
+		list_del_init(&sas_device->list);
+		sas_device_put(sas_device);
+	}
 
 	/* removing unresponding volumes */
 	if (ioc->ir_firmware) {
@@ -6841,11 +6978,11 @@ _scsih_scan_for_devices_after_reset(struct MPT3SAS_ADAPTER *ioc)
 		}
 		phys_disk_num = pd_pg0.PhysDiskNum;
 		handle = le16_to_cpu(pd_pg0.DevHandle);
-		spin_lock_irqsave(&ioc->sas_device_lock, flags);
-		sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
-		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
-		if (sas_device)
+		sas_device = mpt3sas_get_sdev_by_handle(ioc, handle);
+		if (sas_device) {
+			sas_device_put(sas_device);
 			continue;
+		}
 		if (mpt3sas_config_get_sas_device_pg0(ioc, &mpi_reply,
 		    &sas_device_pg0, MPI2_SAS_DEVICE_PGAD_FORM_HANDLE,
 		    handle) != 0)
@@ -6966,12 +7103,12 @@ _scsih_scan_for_devices_after_reset(struct MPT3SAS_ADAPTER *ioc)
 		if (!(_scsih_is_end_device(
 		    le32_to_cpu(sas_device_pg0.DeviceInfo))))
 			continue;
-		spin_lock_irqsave(&ioc->sas_device_lock, flags);
-		sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
+		sas_device = mpt3sas_get_sdev_by_addr(ioc,
 		    le64_to_cpu(sas_device_pg0.SASAddress));
-		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
-		if (sas_device)
+		if (sas_device) {
+			sas_device_put(sas_device);
 			continue;
+		}
 		parent_handle = le16_to_cpu(sas_device_pg0.ParentDevHandle);
 		if (!_scsih_get_sas_address(ioc, parent_handle, &sas_address)) {
 			pr_info(MPT3SAS_FMT "\tBEFORE adding end device: " \
@@ -7598,6 +7735,48 @@ _scsih_probe_raid(struct MPT3SAS_ADAPTER *ioc)
 	}
 }
 
+static struct _sas_device *get_next_sas_device(struct MPT3SAS_ADAPTER *ioc)
+{
+	struct _sas_device *sas_device = NULL;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ioc->sas_device_lock, flags);
+	if (!list_empty(&ioc->sas_device_init_list)) {
+		sas_device = list_first_entry(&ioc->sas_device_init_list,
+				struct _sas_device, list);
+		sas_device_get(sas_device);
+	}
+	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+
+	return sas_device;
+}
+
+static void sas_device_make_active(struct MPT3SAS_ADAPTER *ioc,
+		struct _sas_device *sas_device)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&ioc->sas_device_lock, flags);
+
+	/*
+	 * Since we dropped the lock during the call to port_add(), we need to
+	 * be careful here that somebody else didn't move or delete this item
+	 * while we were busy with other things.
+	 *
+	 * If it was on the list, we need a put() for the reference the list
+	 * had. Either way, we need a get() for the destination list.
+	 */
+	if (!list_empty(&sas_device->list)) {
+		list_del_init(&sas_device->list);
+		sas_device_put(sas_device);
+	}
+
+	sas_device_get(sas_device);
+	list_add_tail(&sas_device->list, &ioc->sas_device_list);
+
+	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+}
+
 /**
  * _scsih_probe_sas - reporting sas devices to sas transport
  * @ioc: per adapter object
@@ -7607,17 +7786,13 @@ _scsih_probe_raid(struct MPT3SAS_ADAPTER *ioc)
 static void
 _scsih_probe_sas(struct MPT3SAS_ADAPTER *ioc)
 {
-	struct _sas_device *sas_device, *next;
-	unsigned long flags;
-
-	/* SAS Device List */
-	list_for_each_entry_safe(sas_device, next, &ioc->sas_device_init_list,
-	    list) {
+	struct _sas_device *sas_device;
 
+	while ((sas_device = get_next_sas_device(ioc))) {
 		if (!mpt3sas_transport_port_add(ioc, sas_device->handle,
-		    sas_device->sas_address_parent)) {
-			list_del(&sas_device->list);
-			kfree(sas_device);
+				sas_device->sas_address_parent)) {
+			_scsih_sas_device_remove(ioc, sas_device);
+			sas_device_put(sas_device);
 			continue;
 		} else if (!sas_device->starget) {
 			/*
@@ -7628,17 +7803,16 @@ _scsih_probe_sas(struct MPT3SAS_ADAPTER *ioc)
 			 */
 			if (!ioc->is_driver_loading) {
 				mpt3sas_transport_port_remove(ioc,
-				    sas_device->sas_address,
-				    sas_device->sas_address_parent);
-				list_del(&sas_device->list);
-				kfree(sas_device);
+						sas_device->sas_address,
+						sas_device->sas_address_parent);
+				_scsih_sas_device_remove(ioc, sas_device);
+				sas_device_put(sas_device);
 				continue;
 			}
 		}
 
-		spin_lock_irqsave(&ioc->sas_device_lock, flags);
-		list_move_tail(&sas_device->list, &ioc->sas_device_list);
-		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+		sas_device_make_active(ioc, sas_device);
+		sas_device_put(sas_device);
 	}
 }
 
diff --git a/drivers/scsi/mpt3sas/mpt3sas_transport.c b/drivers/scsi/mpt3sas/mpt3sas_transport.c
index efb98af..2f9d038 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_transport.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_transport.c
@@ -1306,15 +1306,17 @@ _transport_get_enclosure_identifier(struct sas_rphy *rphy, u64 *identifier)
 	int rc;
 
 	spin_lock_irqsave(&ioc->sas_device_lock, flags);
-	sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
+	sas_device = __mpt3sas_get_sdev_by_addr(ioc,
 	    rphy->identify.sas_address);
 	if (sas_device) {
 		*identifier = sas_device->enclosure_logical_id;
 		rc = 0;
+		sas_device_put(sas_device);
 	} else {
 		*identifier = 0;
 		rc = -ENXIO;
 	}
+
 	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
 	return rc;
 }
@@ -1334,12 +1336,14 @@ _transport_get_bay_identifier(struct sas_rphy *rphy)
 	int rc;
 
 	spin_lock_irqsave(&ioc->sas_device_lock, flags);
-	sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
+	sas_device = __mpt3sas_get_sdev_by_addr(ioc,
 	    rphy->identify.sas_address);
-	if (sas_device)
+	if (sas_device) {
 		rc = sas_device->slot;
-	else
+		sas_device_put(sas_device);
+	} else {
 		rc = -ENXIO;
+	}
 	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
 	return rc;
 }
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/2] mpt3sas: Refcount fw_events and fix unsafe list usage
  2015-08-26  4:09 [PATCH 0/2] mpt3sas: Reference counting fixes from in-flight mpt2sas Nicholas A. Bellinger
  2015-08-26  4:09 ` [PATCH 1/2] mpt3sas: Refcount sas_device objects and fix unsafe list usage Nicholas A. Bellinger
@ 2015-08-26  4:09 ` Nicholas A. Bellinger
  2015-08-27  2:06   ` Calvin Owens
  2015-08-26 23:54 ` [PATCH 0/2] mpt3sas: Reference counting fixes from in-flight mpt2sas Calvin Owens
  2 siblings, 1 reply; 15+ messages in thread
From: Nicholas A. Bellinger @ 2015-08-26  4:09 UTC (permalink / raw)
  To: linux-scsi
  Cc: linux-kernel, James Bottomley, Calvin Owens, Christoph Hellwig,
	Sreekanth Reddy, MPT-FusionLinux.pdl, Nicholas Bellinger

From: Nicholas Bellinger <nab@linux-iscsi.org>

The fw_event_work struct is concurrently referenced at shutdown, so
add a refcount to protect it, and refactor the code to use it.

Additionally, refactor _scsih_fw_event_cleanup_queue() such that it
no longer iterates over the list without holding the lock, since
_firmware_event_work() concurrently deletes items from the list.

This patch is a port of Calvin's PATCH-v4 for mpt2sas code.

Cc: Calvin Owens <calvinowens@fb.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Sreekanth Reddy <sreekanth.reddy@avagotech.com>
Cc: MPT-FusionLinux.pdl <MPT-FusionLinux.pdl@avagotech.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
---
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 116 ++++++++++++++++++++++++++++-------
 1 file changed, 94 insertions(+), 22 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 7142e3b..4cb7e89 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -175,6 +175,7 @@ struct sense_info {
  * @VP_ID: virtual port id
  * @ignore: flag meaning this event has been marked to ignore
  * @event: firmware event MPI2_EVENT_XXX defined in mpt2_ioc.h
+ * @refcount: reference count for fw_event_work
  * @event_data: reply event data payload follows
  *
  * This object stored on ioc->fw_event_list.
@@ -191,9 +192,37 @@ struct fw_event_work {
 	u8			VP_ID;
 	u8			ignore;
 	u16			event;
+	struct kref		refcount;
 	char			event_data[0] __aligned(4);
 };
 
+static void fw_event_work_free(struct kref *r)
+{
+	kfree(container_of(r, struct fw_event_work, refcount));
+}
+
+static void fw_event_work_get(struct fw_event_work *fw_work)
+{
+	kref_get(&fw_work->refcount);
+}
+
+static void fw_event_work_put(struct fw_event_work *fw_work)
+{
+	kref_put(&fw_work->refcount, fw_event_work_free);
+}
+
+static struct fw_event_work *alloc_fw_event_work(int len)
+{
+	struct fw_event_work *fw_event;
+
+	fw_event = kzalloc(sizeof(*fw_event) + len, GFP_ATOMIC);
+	if (!fw_event)
+		return NULL;
+
+	kref_init(&fw_event->refcount);
+	return fw_event;
+}
+
 /* raid transport support */
 static struct raid_template *mpt3sas_raid_template;
 
@@ -2542,32 +2571,36 @@ _scsih_fw_event_add(struct MPT3SAS_ADAPTER *ioc, struct fw_event_work *fw_event)
 		return;
 
 	spin_lock_irqsave(&ioc->fw_event_lock, flags);
+	fw_event_work_get(fw_event);
 	INIT_LIST_HEAD(&fw_event->list);
 	list_add_tail(&fw_event->list, &ioc->fw_event_list);
 	INIT_WORK(&fw_event->work, _firmware_event_work);
+	fw_event_work_get(fw_event);
 	queue_work(ioc->firmware_event_thread, &fw_event->work);
 	spin_unlock_irqrestore(&ioc->fw_event_lock, flags);
 }
 
 /**
- * _scsih_fw_event_free - delete fw_event
+ * _scsih_fw_event_del_from_list - delete fw_event from the list
  * @ioc: per adapter object
  * @fw_event: object describing the event
  * Context: This function will acquire ioc->fw_event_lock.
  *
- * This removes firmware event object from link list, frees associated memory.
+ * If the fw_event is on the fw_event_list, remove it and do a put.
  *
  * Return nothing.
  */
 static void
-_scsih_fw_event_free(struct MPT3SAS_ADAPTER *ioc, struct fw_event_work
+_scsih_fw_event_del_from_list(struct MPT3SAS_ADAPTER *ioc, struct fw_event_work
 	*fw_event)
 {
 	unsigned long flags;
 
 	spin_lock_irqsave(&ioc->fw_event_lock, flags);
-	list_del(&fw_event->list);
-	kfree(fw_event);
+	if (!list_empty(&fw_event->list)) {
+		list_del_init(&fw_event->list);
+		fw_event_work_put(fw_event);
+	}
 	spin_unlock_irqrestore(&ioc->fw_event_lock, flags);
 }
 
@@ -2587,14 +2620,14 @@ mpt3sas_send_trigger_data_event(struct MPT3SAS_ADAPTER *ioc,
 
 	if (ioc->is_driver_loading)
 		return;
-	fw_event = kzalloc(sizeof(*fw_event) + sizeof(*event_data),
-			   GFP_ATOMIC);
+	fw_event = alloc_fw_event_work(sizeof(*event_data));
 	if (!fw_event)
 		return;
 	fw_event->event = MPT3SAS_PROCESS_TRIGGER_DIAG;
 	fw_event->ioc = ioc;
 	memcpy(fw_event->event_data, event_data, sizeof(*event_data));
 	_scsih_fw_event_add(ioc, fw_event);
+	fw_event_work_put(fw_event);
 }
 
 /**
@@ -2610,12 +2643,13 @@ _scsih_error_recovery_delete_devices(struct MPT3SAS_ADAPTER *ioc)
 
 	if (ioc->is_driver_loading)
 		return;
-	fw_event = kzalloc(sizeof(struct fw_event_work), GFP_ATOMIC);
+	fw_event = alloc_fw_event_work(0);
 	if (!fw_event)
 		return;
 	fw_event->event = MPT3SAS_REMOVE_UNRESPONDING_DEVICES;
 	fw_event->ioc = ioc;
 	_scsih_fw_event_add(ioc, fw_event);
+	fw_event_work_put(fw_event);
 }
 
 /**
@@ -2629,12 +2663,29 @@ mpt3sas_port_enable_complete(struct MPT3SAS_ADAPTER *ioc)
 {
 	struct fw_event_work *fw_event;
 
-	fw_event = kzalloc(sizeof(struct fw_event_work), GFP_ATOMIC);
+	fw_event = alloc_fw_event_work(0);
 	if (!fw_event)
 		return;
 	fw_event->event = MPT3SAS_PORT_ENABLE_COMPLETE;
 	fw_event->ioc = ioc;
 	_scsih_fw_event_add(ioc, fw_event);
+	fw_event_work_put(fw_event);
+}
+
+static struct fw_event_work *dequeue_next_fw_event(struct MPT3SAS_ADAPTER *ioc)
+{
+	unsigned long flags;
+	struct fw_event_work *fw_event = NULL;
+
+	spin_lock_irqsave(&ioc->fw_event_lock, flags);
+	if (!list_empty(&ioc->fw_event_list)) {
+		fw_event = list_first_entry(&ioc->fw_event_list,
+				struct fw_event_work, list);
+		list_del_init(&fw_event->list);
+	}
+	spin_unlock_irqrestore(&ioc->fw_event_lock, flags);
+
+	return fw_event;
 }
 
 /**
@@ -2649,17 +2700,25 @@ mpt3sas_port_enable_complete(struct MPT3SAS_ADAPTER *ioc)
 static void
 _scsih_fw_event_cleanup_queue(struct MPT3SAS_ADAPTER *ioc)
 {
-	struct fw_event_work *fw_event, *next;
+	struct fw_event_work *fw_event;
 
 	if (list_empty(&ioc->fw_event_list) ||
 	     !ioc->firmware_event_thread || in_interrupt())
 		return;
 
-	list_for_each_entry_safe(fw_event, next, &ioc->fw_event_list, list) {
-		if (cancel_delayed_work_sync(&fw_event->delayed_work)) {
-			_scsih_fw_event_free(ioc, fw_event);
-			continue;
-		}
+	while ((fw_event = dequeue_next_fw_event(ioc))) {
+		/*
+		 * Wait on the fw_event to complete. If this returns 1, then
+		 * the event was never executed, and we need a put for the
+		 * reference the delayed_work had on the fw_event.
+		 *
+		 * If it did execute, we wait for it to finish, and the put will
+		 * happen from _firmware_event_work()
+		 */
+		if (cancel_work_sync(&fw_event->work))
+			fw_event_work_put(fw_event);
+
+		fw_event_work_put(fw_event);
 	}
 }
 
@@ -4071,13 +4130,14 @@ _scsih_send_event_to_turn_on_pfa_led(struct MPT3SAS_ADAPTER *ioc, u16 handle)
 {
 	struct fw_event_work *fw_event;
 
-	fw_event = kzalloc(sizeof(struct fw_event_work), GFP_ATOMIC);
+	fw_event = alloc_fw_event_work(0);
 	if (!fw_event)
 		return;
 	fw_event->event = MPT3SAS_TURN_ON_PFA_LED;
 	fw_event->device_handle = handle;
 	fw_event->ioc = ioc;
 	_scsih_fw_event_add(ioc, fw_event);
+	fw_event_work_put(fw_event);
 }
 
 /**
@@ -7200,10 +7260,11 @@ mpt3sas_scsih_reset_handler(struct MPT3SAS_ADAPTER *ioc, int reset_phase)
 static void
 _mpt3sas_fw_work(struct MPT3SAS_ADAPTER *ioc, struct fw_event_work *fw_event)
 {
+	_scsih_fw_event_del_from_list(ioc, fw_event);
+
 	/* the queue is being flushed so ignore this event */
-	if (ioc->remove_host ||
-	    ioc->pci_error_recovery) {
-		_scsih_fw_event_free(ioc, fw_event);
+	if (ioc->remove_host || ioc->pci_error_recovery) {
+		fw_event_work_put(fw_event);
 		return;
 	}
 
@@ -7214,8 +7275,17 @@ _mpt3sas_fw_work(struct MPT3SAS_ADAPTER *ioc, struct fw_event_work *fw_event)
 			fw_event->event_data);
 		break;
 	case MPT3SAS_REMOVE_UNRESPONDING_DEVICES:
-		while (scsi_host_in_recovery(ioc->shost) || ioc->shost_recovery)
+		while (scsi_host_in_recovery(ioc->shost) ||
+				ioc->shost_recovery) {
+			/*
+			 * If we're unloading, bail. Otherwise, this can become
+			 * an infinite loop.
+			 */
+			if (ioc->remove_host)
+				goto out;
+
 			ssleep(1);
+		}
 		_scsih_remove_unresponding_sas_devices(ioc);
 		_scsih_scan_for_devices_after_reset(ioc);
 		break;
@@ -7260,7 +7330,8 @@ _mpt3sas_fw_work(struct MPT3SAS_ADAPTER *ioc, struct fw_event_work *fw_event)
 		_scsih_sas_ir_operation_status_event(ioc, fw_event);
 		break;
 	}
-	_scsih_fw_event_free(ioc, fw_event);
+out:
+	fw_event_work_put(fw_event);
 }
 
 /**
@@ -7376,7 +7447,7 @@ mpt3sas_scsih_event_callback(struct MPT3SAS_ADAPTER *ioc, u8 msix_index,
 	}
 
 	sz = le16_to_cpu(mpi_reply->EventDataLength) * 4;
-	fw_event = kzalloc(sizeof(*fw_event) + sz, GFP_ATOMIC);
+	fw_event = alloc_fw_event_work(sz);
 	if (!fw_event) {
 		pr_err(MPT3SAS_FMT "failure at %s:%d/%s()!\n",
 		    ioc->name, __FILE__, __LINE__, __func__);
@@ -7389,6 +7460,7 @@ mpt3sas_scsih_event_callback(struct MPT3SAS_ADAPTER *ioc, u8 msix_index,
 	fw_event->VP_ID = mpi_reply->VP_ID;
 	fw_event->event = event;
 	_scsih_fw_event_add(ioc, fw_event);
+	fw_event_work_put(fw_event);
 	return 1;
 }
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] mpt3sas: Reference counting fixes from in-flight mpt2sas
  2015-08-26  4:09 [PATCH 0/2] mpt3sas: Reference counting fixes from in-flight mpt2sas Nicholas A. Bellinger
  2015-08-26  4:09 ` [PATCH 1/2] mpt3sas: Refcount sas_device objects and fix unsafe list usage Nicholas A. Bellinger
  2015-08-26  4:09 ` [PATCH 2/2] mpt3sas: Refcount fw_events " Nicholas A. Bellinger
@ 2015-08-26 23:54 ` Calvin Owens
  2015-08-26 23:58   ` Nicholas A. Bellinger
  2 siblings, 1 reply; 15+ messages in thread
From: Calvin Owens @ 2015-08-26 23:54 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: linux-scsi, linux-kernel, James Bottomley, Christoph Hellwig,
	Sreekanth Reddy, MPT-FusionLinux.pdl, Nicholas Bellinger,
	kernel-team

On Wednesday 08/26 at 04:09 +0000, Nicholas A. Bellinger wrote:
> From: Nicholas Bellinger <nab@linux-iscsi.org>
> 
> Hi James & Co,
> 
> This series is a mpt3sas forward port of Calvin Owens' in-flight
> reference counting bugfixes for mpt2sas LLD code here:
> 
> [PATCH v4 0/2] Fixes for memory corruption in mpt2sas
> http://marc.info/?l=linux-scsi&m=143951695904115&w=2

Nicholas,

I'm glad to hear these fixes were helpful! I was planning on porting to
mpt3sas in the near future, thanks for saving me the trouble :)

Calvin

> The differences between mpt2sas + mpt3sas in this area are very,
> very small, and is required to address a NULL pointer dereference
> OOPsen in _scsih_probe_sas() -> mpt3sas_transport_port_add() ->
> sas_port_add_phy() that I've been hitting occasionally on boot
> during initial LUN scan.
> 
> So far this code has been tested on v3.14.47 with a small cluster
> using SAS3008 HBAs plus a few preceeding upstream mpt3sas patches
> so these patches apply cleanly, and with the changes in place the
> original OOPsen appears to be resolved.
> 
> This patch series is cut atop v4.2-rc1 code, and barring any
> objections from Avago folks et al., should be considered along
> with Calvin's mpt2sas patch set for v4.3-rc1 code.
> 
> Thank you,
> 
> --nab
> 
> Nicholas Bellinger (2):
>   mpt3sas: Refcount sas_device objects and fix unsafe list usage
>   mpt3sas: Refcount fw_events and fix unsafe list usage
> 
>  drivers/scsi/mpt3sas/mpt3sas_base.h      |  23 +-
>  drivers/scsi/mpt3sas/mpt3sas_scsih.c     | 592 ++++++++++++++++++++++---------
>  drivers/scsi/mpt3sas/mpt3sas_transport.c |  12 +-
>  3 files changed, 449 insertions(+), 178 deletions(-)
> 
> -- 
> 1.9.1
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] mpt3sas: Reference counting fixes from in-flight mpt2sas
  2015-08-26 23:54 ` [PATCH 0/2] mpt3sas: Reference counting fixes from in-flight mpt2sas Calvin Owens
@ 2015-08-26 23:58   ` Nicholas A. Bellinger
  2015-08-27  5:07     ` Sreekanth Reddy
  0 siblings, 1 reply; 15+ messages in thread
From: Nicholas A. Bellinger @ 2015-08-26 23:58 UTC (permalink / raw)
  To: Calvin Owens
  Cc: Nicholas A. Bellinger, linux-scsi, linux-kernel, James Bottomley,
	Christoph Hellwig, Sreekanth Reddy, MPT-FusionLinux.pdl,
	kernel-team

On Wed, 2015-08-26 at 16:54 -0700, Calvin Owens wrote:
> On Wednesday 08/26 at 04:09 +0000, Nicholas A. Bellinger wrote:
> > From: Nicholas Bellinger <nab@linux-iscsi.org>
> > 
> > Hi James & Co,
> > 
> > This series is a mpt3sas forward port of Calvin Owens' in-flight
> > reference counting bugfixes for mpt2sas LLD code here:
> > 
> > [PATCH v4 0/2] Fixes for memory corruption in mpt2sas
> > http://marc.info/?l=linux-scsi&m=143951695904115&w=2
> 
> Nicholas,
> 
> I'm glad to hear these fixes were helpful! I was planning on porting to
> mpt3sas in the near future, thanks for saving me the trouble :)
> 

Would you mind giving this series the once over, and add your
Reviewed-by as well..?

Sreekanth + Avago folks, can you please do the same for Calvin's mpt2sas
series and this series..?

This has been a long-standing issue with mpt*sas LLD code and really
needs to be addressed for upstream.

Thank you,

--nab


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] mpt3sas: Refcount fw_events and fix unsafe list usage
  2015-08-26  4:09 ` [PATCH 2/2] mpt3sas: Refcount fw_events " Nicholas A. Bellinger
@ 2015-08-27  2:06   ` Calvin Owens
  0 siblings, 0 replies; 15+ messages in thread
From: Calvin Owens @ 2015-08-27  2:06 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: linux-scsi, linux-kernel, James Bottomley, Christoph Hellwig,
	Sreekanth Reddy, MPT-FusionLinux.pdl, Nicholas Bellinger

On Wednesday 08/26 at 04:09 +0000, Nicholas A. Bellinger wrote:
> From: Nicholas Bellinger <nab@linux-iscsi.org>
> 
> The fw_event_work struct is concurrently referenced at shutdown, so
> add a refcount to protect it, and refactor the code to use it.
> 
> Additionally, refactor _scsih_fw_event_cleanup_queue() such that it
> no longer iterates over the list without holding the lock, since
> _firmware_event_work() concurrently deletes items from the list.
> 
> This patch is a port of Calvin's PATCH-v4 for mpt2sas code.
> 
> Cc: Calvin Owens <calvinowens@fb.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Sreekanth Reddy <sreekanth.reddy@avagotech.com>
> Cc: MPT-FusionLinux.pdl <MPT-FusionLinux.pdl@avagotech.com>
> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
> ---
>  drivers/scsi/mpt3sas/mpt3sas_scsih.c | 116 ++++++++++++++++++++++++++++-------
>  1 file changed, 94 insertions(+), 22 deletions(-)

Looks good, thanks again for this.

Reviewed-by: Calvin Owens <calvinowens@fb.com>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] mpt3sas: Refcount sas_device objects and fix unsafe list usage
  2015-08-26  4:09 ` [PATCH 1/2] mpt3sas: Refcount sas_device objects and fix unsafe list usage Nicholas A. Bellinger
@ 2015-08-27  2:20   ` Calvin Owens
  0 siblings, 0 replies; 15+ messages in thread
From: Calvin Owens @ 2015-08-27  2:20 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: linux-scsi, linux-kernel, James Bottomley, Christoph Hellwig,
	Sreekanth Reddy, MPT-FusionLinux.pdl, Nicholas Bellinger

On Wednesday 08/26 at 04:09 +0000, Nicholas A. Bellinger wrote:
> From: Nicholas Bellinger <nab@linux-iscsi.org>
> 
> These objects can be referenced concurrently throughout the driver, we
> need a way to make sure threads can't delete them out from under
> each other. This patch adds the refcount, and refactors the code to use
> it.
> 
> Additionally, we cannot iterate over the sas_device_list without
> holding the lock, or we risk corrupting random memory if items are
> added or deleted as we iterate. This patch refactors _scsih_probe_sas()
> to use the sas_device_list in a safe way.
> 
> This patch is a port of Calvin's PATCH-v4 for mpt2sas code.
> 
> Cc: Calvin Owens <calvinowens@fb.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Sreekanth Reddy <sreekanth.reddy@avagotech.com>
> Cc: MPT-FusionLinux.pdl <MPT-FusionLinux.pdl@avagotech.com>
> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
> ---
>  drivers/scsi/mpt3sas/mpt3sas_base.h      |  23 +-
>  drivers/scsi/mpt3sas/mpt3sas_scsih.c     | 476 +++++++++++++++++++++----------
>  drivers/scsi/mpt3sas/mpt3sas_transport.c |  12 +-
>  3 files changed, 355 insertions(+), 156 deletions(-)

Looks good, thanks again for this.

Reviewed-by: Calvin Owens <calvinowens@fb.com>
 
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h b/drivers/scsi/mpt3sas/mpt3sas_base.h
> index afa8816..fe29ac0 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_base.h
> +++ b/drivers/scsi/mpt3sas/mpt3sas_base.h
> @@ -209,6 +209,7 @@ struct Mpi2ManufacturingPage11_t {
>   * @flags: MPT_TARGET_FLAGS_XXX flags
>   * @deleted: target flaged for deletion
>   * @tm_busy: target is busy with TM request.
> + * @sdev: The sas_device associated with this target
>   */
>  struct MPT3SAS_TARGET {
>  	struct scsi_target *starget;
> @@ -218,6 +219,7 @@ struct MPT3SAS_TARGET {
>  	u32	flags;
>  	u8	deleted;
>  	u8	tm_busy;
> +	struct _sas_device *sdev;
>  };
>  
>  
> @@ -294,6 +296,7 @@ struct _internal_cmd {
>   * @responding: used in _scsih_sas_device_mark_responding
>   * @fast_path: fast path feature enable bit
>   * @pfa_led_on: flag for PFA LED status
> + * @refcount: reference count for deletion
>   *
>   */
>  struct _sas_device {
> @@ -315,8 +318,24 @@ struct _sas_device {
>  	u8	responding;
>  	u8	fast_path;
>  	u8	pfa_led_on;
> +	struct kref refcount;
>  };
>  
> +static inline void sas_device_get(struct _sas_device *s)
> +{
> +	kref_get(&s->refcount);
> +}
> +
> +static inline void sas_device_free(struct kref *r)
> +{
> +	kfree(container_of(r, struct _sas_device, refcount));
> +}
> +
> +static inline void sas_device_put(struct _sas_device *s)
> +{
> +	kref_put(&s->refcount, sas_device_free);
> +}
> +
>  /**
>   * struct _raid_device - raid volume link list
>   * @list: sas device list
> @@ -1043,7 +1062,9 @@ struct _sas_node *mpt3sas_scsih_expander_find_by_handle(
>  	struct MPT3SAS_ADAPTER *ioc, u16 handle);
>  struct _sas_node *mpt3sas_scsih_expander_find_by_sas_address(
>  	struct MPT3SAS_ADAPTER *ioc, u64 sas_address);
> -struct _sas_device *mpt3sas_scsih_sas_device_find_by_sas_address(
> +struct _sas_device *mpt3sas_get_sdev_by_addr(
> +	 struct MPT3SAS_ADAPTER *ioc, u64 sas_address);
> +struct _sas_device *__mpt3sas_get_sdev_by_addr(
>  	struct MPT3SAS_ADAPTER *ioc, u64 sas_address);
>  
>  void mpt3sas_port_enable_complete(struct MPT3SAS_ADAPTER *ioc);
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> index 5a97e32..7142e3b 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> @@ -518,8 +518,61 @@ _scsih_determine_boot_device(struct MPT3SAS_ADAPTER *ioc,
>  	}
>  }
>  
> +static struct _sas_device *
> +__mpt3sas_get_sdev_from_target(struct MPT3SAS_ADAPTER *ioc,
> +	struct MPT3SAS_TARGET *tgt_priv)
> +{
> +	struct _sas_device *ret;
> +
> +	assert_spin_locked(&ioc->sas_device_lock);
> +
> +	ret = tgt_priv->sdev;
> +	if (ret)
> +		sas_device_get(ret);
> +
> +	return ret;
> +}
> +
> +static struct _sas_device *
> +mpt3sas_get_sdev_from_target(struct MPT3SAS_ADAPTER *ioc,
> +	struct MPT3SAS_TARGET *tgt_priv)
> +{
> +	struct _sas_device *ret;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&ioc->sas_device_lock, flags);
> +	ret = __mpt3sas_get_sdev_from_target(ioc, tgt_priv);
> +	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> +
> +	return ret;
> +}
> +
> +struct _sas_device *
> +__mpt3sas_get_sdev_by_addr(struct MPT3SAS_ADAPTER *ioc,
> +	u64 sas_address)
> +{
> +	struct _sas_device *sas_device;
> +
> +	assert_spin_locked(&ioc->sas_device_lock);
> +
> +	list_for_each_entry(sas_device, &ioc->sas_device_list, list)
> +		if (sas_device->sas_address == sas_address)
> +			goto found_device;
> +
> +	list_for_each_entry(sas_device, &ioc->sas_device_init_list, list)
> +		if (sas_device->sas_address == sas_address)
> +			goto found_device;
> +
> +	return NULL;
> +
> +found_device:
> +	sas_device_get(sas_device);
> +	return sas_device;
> +}
> +
> +
>  /**
> - * mpt3sas_scsih_sas_device_find_by_sas_address - sas device search
> + * mpt3sas_get_sdev_by_addr - sas device search
>   * @ioc: per adapter object
>   * @sas_address: sas address
>   * Context: Calling function should acquire ioc->sas_device_lock
> @@ -528,24 +581,44 @@ _scsih_determine_boot_device(struct MPT3SAS_ADAPTER *ioc,
>   * object.
>   */
>  struct _sas_device *
> -mpt3sas_scsih_sas_device_find_by_sas_address(struct MPT3SAS_ADAPTER *ioc,
> +mpt3sas_get_sdev_by_addr(struct MPT3SAS_ADAPTER *ioc,
>  	u64 sas_address)
>  {
>  	struct _sas_device *sas_device;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&ioc->sas_device_lock, flags);
> +	sas_device = __mpt3sas_get_sdev_by_addr(ioc,
> +			sas_address);
> +	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> +
> +	return sas_device;
> +}
> +
> +static struct _sas_device *
> +__mpt3sas_get_sdev_by_handle(struct MPT3SAS_ADAPTER *ioc, u16 handle)
> +{
> +	struct _sas_device *sas_device;
> +
> +	assert_spin_locked(&ioc->sas_device_lock);
>  
>  	list_for_each_entry(sas_device, &ioc->sas_device_list, list)
> -		if (sas_device->sas_address == sas_address)
> -			return sas_device;
> +		if (sas_device->handle == handle)
> +			goto found_device;
>  
>  	list_for_each_entry(sas_device, &ioc->sas_device_init_list, list)
> -		if (sas_device->sas_address == sas_address)
> -			return sas_device;
> +		if (sas_device->handle == handle)
> +			goto found_device;
>  
>  	return NULL;
> +
> +found_device:
> +	sas_device_get(sas_device);
> +	return sas_device;
>  }
>  
>  /**
> - * _scsih_sas_device_find_by_handle - sas device search
> + * mpt3sas_get_sdev_by_handle - sas device search
>   * @ioc: per adapter object
>   * @handle: sas device handle (assigned by firmware)
>   * Context: Calling function should acquire ioc->sas_device_lock
> @@ -554,19 +627,16 @@ mpt3sas_scsih_sas_device_find_by_sas_address(struct MPT3SAS_ADAPTER *ioc,
>   * object.
>   */
>  static struct _sas_device *
> -_scsih_sas_device_find_by_handle(struct MPT3SAS_ADAPTER *ioc, u16 handle)
> +mpt3sas_get_sdev_by_handle(struct MPT3SAS_ADAPTER *ioc, u16 handle)
>  {
>  	struct _sas_device *sas_device;
> +	unsigned long flags;
>  
> -	list_for_each_entry(sas_device, &ioc->sas_device_list, list)
> -		if (sas_device->handle == handle)
> -			return sas_device;
> -
> -	list_for_each_entry(sas_device, &ioc->sas_device_init_list, list)
> -		if (sas_device->handle == handle)
> -			return sas_device;
> +	spin_lock_irqsave(&ioc->sas_device_lock, flags);
> +	sas_device = __mpt3sas_get_sdev_by_handle(ioc, handle);
> +	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>  
> -	return NULL;
> +	return sas_device;
>  }
>  
>  /**
> @@ -575,7 +645,7 @@ _scsih_sas_device_find_by_handle(struct MPT3SAS_ADAPTER *ioc, u16 handle)
>   * @sas_device: the sas_device object
>   * Context: This function will acquire ioc->sas_device_lock.
>   *
> - * Removing object and freeing associated memory from the ioc->sas_device_list.
> + * If sas_device is on the list, remove it and decrement its reference count.
>   */
>  static void
>  _scsih_sas_device_remove(struct MPT3SAS_ADAPTER *ioc,
> @@ -585,10 +655,15 @@ _scsih_sas_device_remove(struct MPT3SAS_ADAPTER *ioc,
>  
>  	if (!sas_device)
>  		return;
> -
> +	/*
> +	 * The lock serializes access to the list, but we still need to verify
> +	 * that nobody removed the entry while we were waiting on the lock.
> +	 */
>  	spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -	list_del(&sas_device->list);
> -	kfree(sas_device);
> +	if (!list_empty(&sas_device->list)) {
> +		list_del_init(&sas_device->list);
> +		sas_device_put(sas_device);
> +	}
>  	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>  }
>  
> @@ -609,12 +684,17 @@ _scsih_device_remove_by_handle(struct MPT3SAS_ADAPTER *ioc, u16 handle)
>  		return;
>  
>  	spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -	sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
> -	if (sas_device)
> -		list_del(&sas_device->list);
> +	sas_device = __mpt3sas_get_sdev_by_handle(ioc, handle);
> +	if (sas_device) {
> +		list_del_init(&sas_device->list);
> +		sas_device_put(sas_device);
> +	}
>  	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> -	if (sas_device)
> +
> +	if (sas_device) {
>  		_scsih_remove_device(ioc, sas_device);
> +		sas_device_put(sas_device);
> +	}
>  }
>  
>  /**
> @@ -635,13 +715,17 @@ mpt3sas_device_remove_by_sas_address(struct MPT3SAS_ADAPTER *ioc,
>  		return;
>  
>  	spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -	sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> -	    sas_address);
> -	if (sas_device)
> -		list_del(&sas_device->list);
> +	sas_device = __mpt3sas_get_sdev_by_addr(ioc, sas_address);
> +	if (sas_device) {
> +		list_del_init(&sas_device->list);
> +		sas_device_put(sas_device);
> +	}
>  	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> -	if (sas_device)
> +
> +	if (sas_device) {
>  		_scsih_remove_device(ioc, sas_device);
> +		sas_device_put(sas_device);
> +	}
>  }
>  
>  /**
> @@ -664,6 +748,7 @@ _scsih_sas_device_add(struct MPT3SAS_ADAPTER *ioc,
>  		(unsigned long long)sas_device->sas_address));
>  
>  	spin_lock_irqsave(&ioc->sas_device_lock, flags);
> +	sas_device_get(sas_device);
>  	list_add_tail(&sas_device->list, &ioc->sas_device_list);
>  	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>  
> @@ -705,6 +790,7 @@ _scsih_sas_device_init_add(struct MPT3SAS_ADAPTER *ioc,
>  		(unsigned long long)sas_device->sas_address));
>  
>  	spin_lock_irqsave(&ioc->sas_device_lock, flags);
> +	sas_device_get(sas_device);
>  	list_add_tail(&sas_device->list, &ioc->sas_device_init_list);
>  	_scsih_determine_boot_device(ioc, sas_device, 0);
>  	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> @@ -1083,12 +1169,15 @@ _scsih_change_queue_depth(struct scsi_device *sdev, int qdepth)
>  		goto not_sata;
>  	if ((sas_target_priv_data->flags & MPT_TARGET_FLAGS_VOLUME))
>  		goto not_sata;
> +
>  	spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -	sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> -	   sas_device_priv_data->sas_target->sas_address);
> -	if (sas_device && sas_device->device_info &
> -	    MPI2_SAS_DEVICE_INFO_SATA_DEVICE)
> -		max_depth = MPT3SAS_SATA_QUEUE_DEPTH;
> +	sas_device = __mpt3sas_get_sdev_from_target(ioc, sas_target_priv_data);
> +	if (sas_device) {
> +		if (sas_device->device_info & MPI2_SAS_DEVICE_INFO_SATA_DEVICE)
> +			max_depth = MPT3SAS_SATA_QUEUE_DEPTH;
> +
> +		sas_device_put(sas_device);
> +	}
>  	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>  
>   not_sata:
> @@ -1145,12 +1234,13 @@ _scsih_target_alloc(struct scsi_target *starget)
>  	/* sas/sata devices */
>  	spin_lock_irqsave(&ioc->sas_device_lock, flags);
>  	rphy = dev_to_rphy(starget->dev.parent);
> -	sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> +	sas_device = __mpt3sas_get_sdev_by_addr(ioc,
>  	   rphy->identify.sas_address);
>  
>  	if (sas_device) {
>  		sas_target_priv_data->handle = sas_device->handle;
>  		sas_target_priv_data->sas_address = sas_device->sas_address;
> +		sas_target_priv_data->sdev = sas_device;
>  		sas_device->starget = starget;
>  		sas_device->id = starget->id;
>  		sas_device->channel = starget->channel;
> @@ -1200,13 +1290,21 @@ _scsih_target_destroy(struct scsi_target *starget)
>  
>  	spin_lock_irqsave(&ioc->sas_device_lock, flags);
>  	rphy = dev_to_rphy(starget->dev.parent);
> -	sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> -	   rphy->identify.sas_address);
> +	sas_device = __mpt3sas_get_sdev_from_target(ioc, sas_target_priv_data);
>  	if (sas_device && (sas_device->starget == starget) &&
>  	    (sas_device->id == starget->id) &&
>  	    (sas_device->channel == starget->channel))
>  		sas_device->starget = NULL;
>  
> +	if (sas_device) {
> +		/*
> +		 * Corresponding get() is in _scsih_target_alloc()
> +		 */
> +		sas_target_priv_data->sdev = NULL;
> +		sas_device_put(sas_device);
> +
> +		sas_device_put(sas_device);
> +	}
>  	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>  
>   out:
> @@ -1262,7 +1360,7 @@ _scsih_slave_alloc(struct scsi_device *sdev)
>  
>  	if (!(sas_target_priv_data->flags & MPT_TARGET_FLAGS_VOLUME)) {
>  		spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -		sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> +		sas_device = __mpt3sas_get_sdev_by_addr(ioc,
>  					sas_target_priv_data->sas_address);
>  		if (sas_device && (sas_device->starget == NULL)) {
>  			sdev_printk(KERN_INFO, sdev,
> @@ -1270,6 +1368,10 @@ _scsih_slave_alloc(struct scsi_device *sdev)
>  				__func__, __LINE__);
>  			sas_device->starget = starget;
>  		}
> +
> +		if (sas_device)
> +			sas_device_put(sas_device);
> +
>  		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>  	}
>  
> @@ -1304,10 +1406,14 @@ _scsih_slave_destroy(struct scsi_device *sdev)
>  
>  	if (!(sas_target_priv_data->flags & MPT_TARGET_FLAGS_VOLUME)) {
>  		spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -		sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> -		   sas_target_priv_data->sas_address);
> +		sas_device = __mpt3sas_get_sdev_from_target(ioc,
> +				sas_target_priv_data);
>  		if (sas_device && !sas_target_priv_data->num_luns)
>  			sas_device->starget = NULL;
> +
> +		if (sas_device)
> +			sas_device_put(sas_device);
> +
>  		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>  	}
>  
> @@ -1743,7 +1849,7 @@ _scsih_slave_configure(struct scsi_device *sdev)
>  	}
>  
>  	spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -	sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> +	sas_device = __mpt3sas_get_sdev_by_addr(ioc,
>  	   sas_device_priv_data->sas_target->sas_address);
>  	if (!sas_device) {
>  		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> @@ -1777,12 +1883,12 @@ _scsih_slave_configure(struct scsi_device *sdev)
>  		ds, (unsigned long long)
>  	    sas_device->enclosure_logical_id, sas_device->slot);
>  
> +	sas_device_put(sas_device);
>  	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>  
>  	if (!ssp_target)
>  		_scsih_display_sata_capabilities(ioc, handle, sdev);
>  
> -
>  	_scsih_change_queue_depth(sdev, qdepth);
>  
>  	if (ssp_target) {
> @@ -2173,8 +2279,7 @@ _scsih_tm_display_info(struct MPT3SAS_ADAPTER *ioc, struct scsi_cmnd *scmd)
>  		    device_str, (unsigned long long)priv_target->sas_address);
>  	} else {
>  		spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -		sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> -		    priv_target->sas_address);
> +		sas_device = __mpt3sas_get_sdev_from_target(ioc, priv_target);
>  		if (sas_device) {
>  			if (priv_target->flags &
>  			    MPT_TARGET_FLAGS_RAID_COMPONENT) {
> @@ -2193,6 +2298,8 @@ _scsih_tm_display_info(struct MPT3SAS_ADAPTER *ioc, struct scsi_cmnd *scmd)
>  			    "enclosure_logical_id(0x%016llx), slot(%d)\n",
>  			   (unsigned long long)sas_device->enclosure_logical_id,
>  			    sas_device->slot);
> +
> +			sas_device_put(sas_device);
>  		}
>  		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>  	}
> @@ -2268,10 +2375,11 @@ _scsih_dev_reset(struct scsi_cmnd *scmd)
>  {
>  	struct MPT3SAS_ADAPTER *ioc = shost_priv(scmd->device->host);
>  	struct MPT3SAS_DEVICE *sas_device_priv_data;
> -	struct _sas_device *sas_device;
> -	unsigned long flags;
> +	struct _sas_device *sas_device = NULL;
>  	u16	handle;
>  	int r;
> +	struct scsi_target *starget = scmd->device->sdev_target;
> +	struct MPT3SAS_TARGET *target_priv_data = starget->hostdata;
>  
>  	sdev_printk(KERN_INFO, scmd->device,
>  		"attempting device reset! scmd(%p)\n", scmd);
> @@ -2291,12 +2399,10 @@ _scsih_dev_reset(struct scsi_cmnd *scmd)
>  	handle = 0;
>  	if (sas_device_priv_data->sas_target->flags &
>  	    MPT_TARGET_FLAGS_RAID_COMPONENT) {
> -		spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -		sas_device = _scsih_sas_device_find_by_handle(ioc,
> -		   sas_device_priv_data->sas_target->handle);
> +		sas_device = mpt3sas_get_sdev_from_target(ioc,
> +				target_priv_data);
>  		if (sas_device)
>  			handle = sas_device->volume_handle;
> -		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>  	} else
>  		handle = sas_device_priv_data->sas_target->handle;
>  
> @@ -2313,6 +2419,10 @@ _scsih_dev_reset(struct scsi_cmnd *scmd)
>   out:
>  	sdev_printk(KERN_INFO, scmd->device, "device reset: %s scmd(%p)\n",
>  	    ((r == SUCCESS) ? "SUCCESS" : "FAILED"), scmd);
> +
> +	if (sas_device)
> +		sas_device_put(sas_device);
> +
>  	return r;
>  }
>  
> @@ -2327,11 +2437,11 @@ _scsih_target_reset(struct scsi_cmnd *scmd)
>  {
>  	struct MPT3SAS_ADAPTER *ioc = shost_priv(scmd->device->host);
>  	struct MPT3SAS_DEVICE *sas_device_priv_data;
> -	struct _sas_device *sas_device;
> -	unsigned long flags;
> +	struct _sas_device *sas_device = NULL;
>  	u16	handle;
>  	int r;
>  	struct scsi_target *starget = scmd->device->sdev_target;
> +	struct MPT3SAS_TARGET *target_priv_data = starget->hostdata;
>  
>  	starget_printk(KERN_INFO, starget, "attempting target reset! scmd(%p)\n",
>  		scmd);
> @@ -2351,12 +2461,10 @@ _scsih_target_reset(struct scsi_cmnd *scmd)
>  	handle = 0;
>  	if (sas_device_priv_data->sas_target->flags &
>  	    MPT_TARGET_FLAGS_RAID_COMPONENT) {
> -		spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -		sas_device = _scsih_sas_device_find_by_handle(ioc,
> -		   sas_device_priv_data->sas_target->handle);
> +		sas_device = mpt3sas_get_sdev_from_target(ioc,
> +				target_priv_data);
>  		if (sas_device)
>  			handle = sas_device->volume_handle;
> -		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>  	} else
>  		handle = sas_device_priv_data->sas_target->handle;
>  
> @@ -2373,6 +2481,10 @@ _scsih_target_reset(struct scsi_cmnd *scmd)
>   out:
>  	starget_printk(KERN_INFO, starget, "target reset: %s scmd(%p)\n",
>  	    ((r == SUCCESS) ? "SUCCESS" : "FAILED"), scmd);
> +
> +	if (sas_device)
> +		sas_device_put(sas_device);
> +
>  	return r;
>  }
>  
> @@ -2683,15 +2795,15 @@ _scsih_block_io_to_children_attached_to_ex(struct MPT3SAS_ADAPTER *ioc,
>  
>  	list_for_each_entry(mpt3sas_port,
>  	   &sas_expander->sas_port_list, port_list) {
> -		if (mpt3sas_port->remote_identify.device_type ==
> -		    SAS_END_DEVICE) {
> +		if (mpt3sas_port->remote_identify.device_type == SAS_END_DEVICE) {
>  			spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -			sas_device =
> -			    mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> -			   mpt3sas_port->remote_identify.sas_address);
> -			if (sas_device)
> +			sas_device = __mpt3sas_get_sdev_by_addr(ioc,
> +					mpt3sas_port->remote_identify.sas_address);
> +			if (sas_device) {
>  				set_bit(sas_device->handle,
> -				    ioc->blocking_handles);
> +					ioc->blocking_handles);
> +				sas_device_put(sas_device);
> +			}
>  			spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>  		}
>  	}
> @@ -2759,7 +2871,7 @@ _scsih_tm_tr_send(struct MPT3SAS_ADAPTER *ioc, u16 handle)
>  {
>  	Mpi2SCSITaskManagementRequest_t *mpi_request;
>  	u16 smid;
> -	struct _sas_device *sas_device;
> +	struct _sas_device *sas_device = NULL;
>  	struct MPT3SAS_TARGET *sas_target_priv_data = NULL;
>  	u64 sas_address = 0;
>  	unsigned long flags;
> @@ -2792,7 +2904,7 @@ _scsih_tm_tr_send(struct MPT3SAS_ADAPTER *ioc, u16 handle)
>  		return;
>  
>  	spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -	sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
> +	sas_device = __mpt3sas_get_sdev_by_handle(ioc, handle);
>  	if (sas_device && sas_device->starget &&
>  	    sas_device->starget->hostdata) {
>  		sas_target_priv_data = sas_device->starget->hostdata;
> @@ -2814,14 +2926,14 @@ _scsih_tm_tr_send(struct MPT3SAS_ADAPTER *ioc, u16 handle)
>  	if (!smid) {
>  		delayed_tr = kzalloc(sizeof(*delayed_tr), GFP_ATOMIC);
>  		if (!delayed_tr)
> -			return;
> +			goto out;
>  		INIT_LIST_HEAD(&delayed_tr->list);
>  		delayed_tr->handle = handle;
>  		list_add_tail(&delayed_tr->list, &ioc->delayed_tr_list);
>  		dewtprintk(ioc, pr_info(MPT3SAS_FMT
>  		    "DELAYED:tr:handle(0x%04x), (open)\n",
>  		    ioc->name, handle));
> -		return;
> +		goto out;
>  	}
>  
>  	dewtprintk(ioc, pr_info(MPT3SAS_FMT
> @@ -2835,6 +2947,9 @@ _scsih_tm_tr_send(struct MPT3SAS_ADAPTER *ioc, u16 handle)
>  	mpi_request->TaskType = MPI2_SCSITASKMGMT_TASKTYPE_TARGET_RESET;
>  	mpt3sas_base_put_smid_hi_priority(ioc, smid);
>  	mpt3sas_trigger_master(ioc, MASTER_TRIGGER_DEVICE_REMOVAL);
> +out:
> +	if (sas_device)
> +		sas_device_put(sas_device);
>  }
>  
>  /**
> @@ -3685,7 +3800,6 @@ _scsih_scsi_ioc_info(struct MPT3SAS_ADAPTER *ioc, struct scsi_cmnd *scmd,
>  	char *desc_scsi_state = ioc->tmp_string;
>  	u32 log_info = le32_to_cpu(mpi_reply->IOCLogInfo);
>  	struct _sas_device *sas_device = NULL;
> -	unsigned long flags;
>  	struct scsi_target *starget = scmd->device->sdev_target;
>  	struct MPT3SAS_TARGET *priv_target = starget->hostdata;
>  	char *device_str = NULL;
> @@ -3813,9 +3927,7 @@ _scsih_scsi_ioc_info(struct MPT3SAS_ADAPTER *ioc, struct scsi_cmnd *scmd,
>  		pr_warn(MPT3SAS_FMT "\t%s wwid(0x%016llx)\n", ioc->name,
>  		    device_str, (unsigned long long)priv_target->sas_address);
>  	} else {
> -		spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -		sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> -		    priv_target->sas_address);
> +		sas_device = mpt3sas_get_sdev_from_target(ioc, priv_target);
>  		if (sas_device) {
>  			pr_warn(MPT3SAS_FMT
>  				"\tsas_address(0x%016llx), phy(%d)\n",
> @@ -3825,8 +3937,9 @@ _scsih_scsi_ioc_info(struct MPT3SAS_ADAPTER *ioc, struct scsi_cmnd *scmd,
>  			    "\tenclosure_logical_id(0x%016llx), slot(%d)\n",
>  			    ioc->name, (unsigned long long)
>  			    sas_device->enclosure_logical_id, sas_device->slot);
> +
> +			sas_device_put(sas_device);
>  		}
> -		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>  	}
>  
>  	pr_warn(MPT3SAS_FMT
> @@ -3878,7 +3991,7 @@ _scsih_turn_on_pfa_led(struct MPT3SAS_ADAPTER *ioc, u16 handle)
>  	Mpi2SepRequest_t mpi_request;
>  	struct _sas_device *sas_device;
>  
> -	sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
> +	sas_device = mpt3sas_get_sdev_by_handle(ioc, handle);
>  	if (!sas_device)
>  		return;
>  
> @@ -3893,7 +4006,7 @@ _scsih_turn_on_pfa_led(struct MPT3SAS_ADAPTER *ioc, u16 handle)
>  	    &mpi_request)) != 0) {
>  		pr_err(MPT3SAS_FMT "failure at %s:%d/%s()!\n", ioc->name,
>  		__FILE__, __LINE__, __func__);
> -		return;
> +		goto out;
>  	}
>  	sas_device->pfa_led_on = 1;
>  
> @@ -3902,8 +4015,10 @@ _scsih_turn_on_pfa_led(struct MPT3SAS_ADAPTER *ioc, u16 handle)
>  			"enclosure_processor: ioc_status (0x%04x), loginfo(0x%08x)\n",
>  			ioc->name, le16_to_cpu(mpi_reply.IOCStatus),
>  		    le32_to_cpu(mpi_reply.IOCLogInfo)));
> -		return;
> +		goto out;
>  	}
> +out:
> +	sas_device_put(sas_device);
>  }
>  /**
>   * _scsih_turn_off_pfa_led - turn off Fault LED
> @@ -3986,19 +4101,17 @@ _scsih_smart_predicted_fault(struct MPT3SAS_ADAPTER *ioc, u16 handle)
>  
>  	/* only handle non-raid devices */
>  	spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -	sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
> +	sas_device = __mpt3sas_get_sdev_by_handle(ioc, handle);
>  	if (!sas_device) {
> -		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> -		return;
> +		goto out_unlock;
>  	}
>  	starget = sas_device->starget;
>  	sas_target_priv_data = starget->hostdata;
>  
>  	if ((sas_target_priv_data->flags & MPT_TARGET_FLAGS_RAID_COMPONENT) ||
> -	   ((sas_target_priv_data->flags & MPT_TARGET_FLAGS_VOLUME))) {
> -		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> -		return;
> -	}
> +	   ((sas_target_priv_data->flags & MPT_TARGET_FLAGS_VOLUME)))
> +		goto out_unlock;
> +
>  	starget_printk(KERN_WARNING, starget, "predicted fault\n");
>  	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>  
> @@ -4012,7 +4125,7 @@ _scsih_smart_predicted_fault(struct MPT3SAS_ADAPTER *ioc, u16 handle)
>  	if (!event_reply) {
>  		pr_err(MPT3SAS_FMT "failure at %s:%d/%s()!\n",
>  		    ioc->name, __FILE__, __LINE__, __func__);
> -		return;
> +		goto out;
>  	}
>  
>  	event_reply->Function = MPI2_FUNCTION_EVENT_NOTIFICATION;
> @@ -4029,6 +4142,14 @@ _scsih_smart_predicted_fault(struct MPT3SAS_ADAPTER *ioc, u16 handle)
>  	event_data->SASAddress = cpu_to_le64(sas_target_priv_data->sas_address);
>  	mpt3sas_ctl_add_to_event_log(ioc, event_reply);
>  	kfree(event_reply);
> +out:
> +	if (sas_device)
> +		sas_device_put(sas_device);
> +	return;
> +
> +out_unlock:
> +	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> +	goto out;
>  }
>  
>  /**
> @@ -4772,13 +4893,11 @@ _scsih_check_device(struct MPT3SAS_ADAPTER *ioc,
>  
>  	spin_lock_irqsave(&ioc->sas_device_lock, flags);
>  	sas_address = le64_to_cpu(sas_device_pg0.SASAddress);
> -	sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> +	sas_device = __mpt3sas_get_sdev_by_addr(ioc,
>  	    sas_address);
>  
> -	if (!sas_device) {
> -		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> -		return;
> -	}
> +	if (!sas_device)
> +		goto out_unlock;
>  
>  	if (unlikely(sas_device->handle != handle)) {
>  		starget = sas_device->starget;
> @@ -4796,20 +4915,24 @@ _scsih_check_device(struct MPT3SAS_ADAPTER *ioc,
>  		pr_err(MPT3SAS_FMT
>  			"device is not present handle(0x%04x), flags!!!\n",
>  			ioc->name, handle);
> -		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> -		return;
> +		goto out_unlock;
>  	}
>  
>  	/* check if there were any issues with discovery */
>  	if (_scsih_check_access_status(ioc, sas_address, handle,
> -	    sas_device_pg0.AccessStatus)) {
> -		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> -		return;
> -	}
> +	    sas_device_pg0.AccessStatus))
> +		goto out_unlock;
>  
>  	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>  	_scsih_ublock_io_device(ioc, sas_address);
> +	if (sas_device)
> +		sas_device_put(sas_device);
> +	return;
>  
> +out_unlock:
> +	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> +	if (sas_device)
> +		sas_device_put(sas_device);
>  }
>  
>  /**
> @@ -4834,7 +4957,6 @@ _scsih_add_device(struct MPT3SAS_ADAPTER *ioc, u16 handle, u8 phy_num,
>  	u32 ioc_status;
>  	u64 sas_address;
>  	u32 device_info;
> -	unsigned long flags;
>  
>  	if ((mpt3sas_config_get_sas_device_pg0(ioc, &mpi_reply, &sas_device_pg0,
>  	    MPI2_SAS_DEVICE_PGAD_FORM_HANDLE, handle))) {
> @@ -4870,13 +4992,11 @@ _scsih_add_device(struct MPT3SAS_ADAPTER *ioc, u16 handle, u8 phy_num,
>  	    sas_device_pg0.AccessStatus))
>  		return -1;
>  
> -	spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -	sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> -	    sas_address);
> -	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> -
> -	if (sas_device)
> +	sas_device = mpt3sas_get_sdev_by_addr(ioc, sas_address);
> +	if (sas_device) {
> +		sas_device_put(sas_device);
>  		return -1;
> +	}
>  
>  	sas_device = kzalloc(sizeof(struct _sas_device),
>  	    GFP_KERNEL);
> @@ -4886,6 +5006,7 @@ _scsih_add_device(struct MPT3SAS_ADAPTER *ioc, u16 handle, u8 phy_num,
>  		return 0;
>  	}
>  
> +	kref_init(&sas_device->refcount);
>  	sas_device->handle = handle;
>  	if (_scsih_get_sas_address(ioc,
>  	    le16_to_cpu(sas_device_pg0.ParentDevHandle),
> @@ -4917,6 +5038,7 @@ _scsih_add_device(struct MPT3SAS_ADAPTER *ioc, u16 handle, u8 phy_num,
>  	else
>  		_scsih_sas_device_add(ioc, sas_device);
>  
> +	sas_device_put(sas_device);
>  	return 0;
>  }
>  
> @@ -4965,8 +5087,6 @@ _scsih_remove_device(struct MPT3SAS_ADAPTER *ioc,
>  		ioc->name, __func__,
>  	    sas_device->handle, (unsigned long long)
>  	    sas_device->sas_address));
> -
> -	kfree(sas_device);
>  }
>  
>  #ifdef CONFIG_SCSI_MPT3SAS_LOGGING
> @@ -5291,25 +5411,25 @@ _scsih_sas_device_status_change_event(struct MPT3SAS_ADAPTER *ioc,
>  
>  	spin_lock_irqsave(&ioc->sas_device_lock, flags);
>  	sas_address = le64_to_cpu(event_data->SASAddress);
> -	sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> -	    sas_address);
> +	sas_device = __mpt3sas_get_sdev_by_addr(ioc, sas_address);
>  
> -	if (!sas_device || !sas_device->starget) {
> -		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> -		return;
> -	}
> +	if (!sas_device || !sas_device->starget)
> +		goto out;
>  
>  	target_priv_data = sas_device->starget->hostdata;
> -	if (!target_priv_data) {
> -		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> -		return;
> -	}
> +	if (!target_priv_data)
> +		goto out;
>  
>  	if (event_data->ReasonCode ==
>  	    MPI2_EVENT_SAS_DEV_STAT_RC_INTERNAL_DEVICE_RESET)
>  		target_priv_data->tm_busy = 1;
>  	else
>  		target_priv_data->tm_busy = 0;
> +
> +out:
> +	if (sas_device)
> +		sas_device_put(sas_device);
> +
>  	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>  }
>  
> @@ -5793,7 +5913,7 @@ _scsih_sas_pd_expose(struct MPT3SAS_ADAPTER *ioc,
>  	u16 handle = le16_to_cpu(element->PhysDiskDevHandle);
>  
>  	spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -	sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
> +	sas_device = __mpt3sas_get_sdev_by_handle(ioc, handle);
>  	if (sas_device) {
>  		sas_device->volume_handle = 0;
>  		sas_device->volume_wwid = 0;
> @@ -5812,6 +5932,8 @@ _scsih_sas_pd_expose(struct MPT3SAS_ADAPTER *ioc,
>  	/* exposing raid component */
>  	if (starget)
>  		starget_for_each_device(starget, NULL, _scsih_reprobe_lun);
> +
> +	sas_device_put(sas_device);
>  }
>  
>  /**
> @@ -5840,7 +5962,7 @@ _scsih_sas_pd_hide(struct MPT3SAS_ADAPTER *ioc,
>  		    &volume_wwid);
>  
>  	spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -	sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
> +	sas_device = __mpt3sas_get_sdev_by_handle(ioc, handle);
>  	if (sas_device) {
>  		set_bit(handle, ioc->pd_handles);
>  		if (sas_device->starget && sas_device->starget->hostdata) {
> @@ -5860,6 +5982,8 @@ _scsih_sas_pd_hide(struct MPT3SAS_ADAPTER *ioc,
>  	_scsih_ir_fastpath(ioc, handle, element->PhysDiskNum);
>  	if (starget)
>  		starget_for_each_device(starget, (void *)1, _scsih_reprobe_lun);
> +
> +	sas_device_put(sas_device);
>  }
>  
>  /**
> @@ -5892,7 +6016,6 @@ _scsih_sas_pd_add(struct MPT3SAS_ADAPTER *ioc,
>  	Mpi2EventIrConfigElement_t *element)
>  {
>  	struct _sas_device *sas_device;
> -	unsigned long flags;
>  	u16 handle = le16_to_cpu(element->PhysDiskDevHandle);
>  	Mpi2ConfigReply_t mpi_reply;
>  	Mpi2SasDevicePage0_t sas_device_pg0;
> @@ -5902,11 +6025,10 @@ _scsih_sas_pd_add(struct MPT3SAS_ADAPTER *ioc,
>  
>  	set_bit(handle, ioc->pd_handles);
>  
> -	spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -	sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
> -	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> +	sas_device = mpt3sas_get_sdev_by_handle(ioc, handle);
>  	if (sas_device) {
>  		_scsih_ir_fastpath(ioc, handle, element->PhysDiskNum);
> +		sas_device_put(sas_device);
>  		return;
>  	}
>  
> @@ -6183,7 +6305,6 @@ _scsih_sas_ir_physical_disk_event(struct MPT3SAS_ADAPTER *ioc,
>  	u16 handle, parent_handle;
>  	u32 state;
>  	struct _sas_device *sas_device;
> -	unsigned long flags;
>  	Mpi2ConfigReply_t mpi_reply;
>  	Mpi2SasDevicePage0_t sas_device_pg0;
>  	u32 ioc_status;
> @@ -6212,12 +6333,12 @@ _scsih_sas_ir_physical_disk_event(struct MPT3SAS_ADAPTER *ioc,
>  	case MPI2_RAID_PD_STATE_HOT_SPARE:
>  
>  		set_bit(handle, ioc->pd_handles);
> -		spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -		sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
> -		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>  
> -		if (sas_device)
> +		sas_device = mpt3sas_get_sdev_by_handle(ioc, handle);
> +		if (sas_device) {
> +			sas_device_put(sas_device);
>  			return;
> +		}
>  
>  		if ((mpt3sas_config_get_sas_device_pg0(ioc, &mpi_reply,
>  		    &sas_device_pg0, MPI2_SAS_DEVICE_PGAD_FORM_HANDLE,
> @@ -6673,6 +6794,7 @@ _scsih_remove_unresponding_sas_devices(struct MPT3SAS_ADAPTER *ioc)
>  	struct _raid_device *raid_device, *raid_device_next;
>  	struct list_head tmp_list;
>  	unsigned long flags;
> +	LIST_HEAD(head);
>  
>  	pr_info(MPT3SAS_FMT "removing unresponding devices: start\n",
>  	    ioc->name);
> @@ -6680,14 +6802,29 @@ _scsih_remove_unresponding_sas_devices(struct MPT3SAS_ADAPTER *ioc)
>  	/* removing unresponding end devices */
>  	pr_info(MPT3SAS_FMT "removing unresponding devices: end-devices\n",
>  	    ioc->name);
> +
> +	/*
> +	 * Iterate, pulling off devices marked as non-responding. We become the
> +	 * owner for the reference the list had on any object we prune.
> +	 */
> +	spin_lock_irqsave(&ioc->sas_device_lock, flags);
>  	list_for_each_entry_safe(sas_device, sas_device_next,
> -	    &ioc->sas_device_list, list) {
> +				 &ioc->sas_device_list, list) {
>  		if (!sas_device->responding)
> -			mpt3sas_device_remove_by_sas_address(ioc,
> -			    sas_device->sas_address);
> +			list_move_tail(&sas_device->list, &head);
>  		else
>  			sas_device->responding = 0;
>  	}
> +	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> +
> +	/*
> +	 * Now, uninitialize and remove the unresponding devices we pruned.
> +	 */
> +	list_for_each_entry_safe(sas_device, sas_device_next, &head, list) {
> +		_scsih_remove_device(ioc, sas_device);
> +		list_del_init(&sas_device->list);
> +		sas_device_put(sas_device);
> +	}
>  
>  	/* removing unresponding volumes */
>  	if (ioc->ir_firmware) {
> @@ -6841,11 +6978,11 @@ _scsih_scan_for_devices_after_reset(struct MPT3SAS_ADAPTER *ioc)
>  		}
>  		phys_disk_num = pd_pg0.PhysDiskNum;
>  		handle = le16_to_cpu(pd_pg0.DevHandle);
> -		spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -		sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
> -		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> -		if (sas_device)
> +		sas_device = mpt3sas_get_sdev_by_handle(ioc, handle);
> +		if (sas_device) {
> +			sas_device_put(sas_device);
>  			continue;
> +		}
>  		if (mpt3sas_config_get_sas_device_pg0(ioc, &mpi_reply,
>  		    &sas_device_pg0, MPI2_SAS_DEVICE_PGAD_FORM_HANDLE,
>  		    handle) != 0)
> @@ -6966,12 +7103,12 @@ _scsih_scan_for_devices_after_reset(struct MPT3SAS_ADAPTER *ioc)
>  		if (!(_scsih_is_end_device(
>  		    le32_to_cpu(sas_device_pg0.DeviceInfo))))
>  			continue;
> -		spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -		sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> +		sas_device = mpt3sas_get_sdev_by_addr(ioc,
>  		    le64_to_cpu(sas_device_pg0.SASAddress));
> -		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> -		if (sas_device)
> +		if (sas_device) {
> +			sas_device_put(sas_device);
>  			continue;
> +		}
>  		parent_handle = le16_to_cpu(sas_device_pg0.ParentDevHandle);
>  		if (!_scsih_get_sas_address(ioc, parent_handle, &sas_address)) {
>  			pr_info(MPT3SAS_FMT "\tBEFORE adding end device: " \
> @@ -7598,6 +7735,48 @@ _scsih_probe_raid(struct MPT3SAS_ADAPTER *ioc)
>  	}
>  }
>  
> +static struct _sas_device *get_next_sas_device(struct MPT3SAS_ADAPTER *ioc)
> +{
> +	struct _sas_device *sas_device = NULL;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&ioc->sas_device_lock, flags);
> +	if (!list_empty(&ioc->sas_device_init_list)) {
> +		sas_device = list_first_entry(&ioc->sas_device_init_list,
> +				struct _sas_device, list);
> +		sas_device_get(sas_device);
> +	}
> +	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> +
> +	return sas_device;
> +}
> +
> +static void sas_device_make_active(struct MPT3SAS_ADAPTER *ioc,
> +		struct _sas_device *sas_device)
> +{
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&ioc->sas_device_lock, flags);
> +
> +	/*
> +	 * Since we dropped the lock during the call to port_add(), we need to
> +	 * be careful here that somebody else didn't move or delete this item
> +	 * while we were busy with other things.
> +	 *
> +	 * If it was on the list, we need a put() for the reference the list
> +	 * had. Either way, we need a get() for the destination list.
> +	 */
> +	if (!list_empty(&sas_device->list)) {
> +		list_del_init(&sas_device->list);
> +		sas_device_put(sas_device);
> +	}
> +
> +	sas_device_get(sas_device);
> +	list_add_tail(&sas_device->list, &ioc->sas_device_list);
> +
> +	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> +}
> +
>  /**
>   * _scsih_probe_sas - reporting sas devices to sas transport
>   * @ioc: per adapter object
> @@ -7607,17 +7786,13 @@ _scsih_probe_raid(struct MPT3SAS_ADAPTER *ioc)
>  static void
>  _scsih_probe_sas(struct MPT3SAS_ADAPTER *ioc)
>  {
> -	struct _sas_device *sas_device, *next;
> -	unsigned long flags;
> -
> -	/* SAS Device List */
> -	list_for_each_entry_safe(sas_device, next, &ioc->sas_device_init_list,
> -	    list) {
> +	struct _sas_device *sas_device;
>  
> +	while ((sas_device = get_next_sas_device(ioc))) {
>  		if (!mpt3sas_transport_port_add(ioc, sas_device->handle,
> -		    sas_device->sas_address_parent)) {
> -			list_del(&sas_device->list);
> -			kfree(sas_device);
> +				sas_device->sas_address_parent)) {
> +			_scsih_sas_device_remove(ioc, sas_device);
> +			sas_device_put(sas_device);
>  			continue;
>  		} else if (!sas_device->starget) {
>  			/*
> @@ -7628,17 +7803,16 @@ _scsih_probe_sas(struct MPT3SAS_ADAPTER *ioc)
>  			 */
>  			if (!ioc->is_driver_loading) {
>  				mpt3sas_transport_port_remove(ioc,
> -				    sas_device->sas_address,
> -				    sas_device->sas_address_parent);
> -				list_del(&sas_device->list);
> -				kfree(sas_device);
> +						sas_device->sas_address,
> +						sas_device->sas_address_parent);
> +				_scsih_sas_device_remove(ioc, sas_device);
> +				sas_device_put(sas_device);
>  				continue;
>  			}
>  		}
>  
> -		spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -		list_move_tail(&sas_device->list, &ioc->sas_device_list);
> -		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> +		sas_device_make_active(ioc, sas_device);
> +		sas_device_put(sas_device);
>  	}
>  }
>  
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_transport.c b/drivers/scsi/mpt3sas/mpt3sas_transport.c
> index efb98af..2f9d038 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_transport.c
> +++ b/drivers/scsi/mpt3sas/mpt3sas_transport.c
> @@ -1306,15 +1306,17 @@ _transport_get_enclosure_identifier(struct sas_rphy *rphy, u64 *identifier)
>  	int rc;
>  
>  	spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -	sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> +	sas_device = __mpt3sas_get_sdev_by_addr(ioc,
>  	    rphy->identify.sas_address);
>  	if (sas_device) {
>  		*identifier = sas_device->enclosure_logical_id;
>  		rc = 0;
> +		sas_device_put(sas_device);
>  	} else {
>  		*identifier = 0;
>  		rc = -ENXIO;
>  	}
> +
>  	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>  	return rc;
>  }
> @@ -1334,12 +1336,14 @@ _transport_get_bay_identifier(struct sas_rphy *rphy)
>  	int rc;
>  
>  	spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -	sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> +	sas_device = __mpt3sas_get_sdev_by_addr(ioc,
>  	    rphy->identify.sas_address);
> -	if (sas_device)
> +	if (sas_device) {
>  		rc = sas_device->slot;
> -	else
> +		sas_device_put(sas_device);
> +	} else {
>  		rc = -ENXIO;
> +	}
>  	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>  	return rc;
>  }
> -- 
> 1.9.1
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] mpt3sas: Reference counting fixes from in-flight mpt2sas
  2015-08-26 23:58   ` Nicholas A. Bellinger
@ 2015-08-27  5:07     ` Sreekanth Reddy
  2015-08-27  7:07       ` Nicholas A. Bellinger
  2015-08-27 14:40       ` James Bottomley
  0 siblings, 2 replies; 15+ messages in thread
From: Sreekanth Reddy @ 2015-08-27  5:07 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: Calvin Owens, Nicholas A. Bellinger, linux-scsi, linux-kernel,
	James Bottomley, Christoph Hellwig, MPT-FusionLinux.pdl,
	kernel-team

HI Nicholas & Calvin,

Thanks for the patchset. Sure We will review and we do some unit
testing on this patch series. Currently my bandwidth is occupied with
some internal activity, so by end of next week I will acknowledge this
series if all the thing are fine with this patch series.

Thanks,
Sreekanth

On Thu, Aug 27, 2015 at 5:28 AM, Nicholas A. Bellinger
<nab@linux-iscsi.org> wrote:
> On Wed, 2015-08-26 at 16:54 -0700, Calvin Owens wrote:
>> On Wednesday 08/26 at 04:09 +0000, Nicholas A. Bellinger wrote:
>> > From: Nicholas Bellinger <nab@linux-iscsi.org>
>> >
>> > Hi James & Co,
>> >
>> > This series is a mpt3sas forward port of Calvin Owens' in-flight
>> > reference counting bugfixes for mpt2sas LLD code here:
>> >
>> > [PATCH v4 0/2] Fixes for memory corruption in mpt2sas
>> > http://marc.info/?l=linux-scsi&m=143951695904115&w=2
>>
>> Nicholas,
>>
>> I'm glad to hear these fixes were helpful! I was planning on porting to
>> mpt3sas in the near future, thanks for saving me the trouble :)
>>
>
> Would you mind giving this series the once over, and add your
> Reviewed-by as well..?
>
> Sreekanth + Avago folks, can you please do the same for Calvin's mpt2sas
> series and this series..?
>
> This has been a long-standing issue with mpt*sas LLD code and really
> needs to be addressed for upstream.
>
> Thank you,
>
> --nab
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] mpt3sas: Reference counting fixes from in-flight mpt2sas
  2015-08-27  5:07     ` Sreekanth Reddy
@ 2015-08-27  7:07       ` Nicholas A. Bellinger
  2015-08-27 14:40       ` James Bottomley
  1 sibling, 0 replies; 15+ messages in thread
From: Nicholas A. Bellinger @ 2015-08-27  7:07 UTC (permalink / raw)
  To: Sreekanth Reddy
  Cc: Calvin Owens, Nicholas A. Bellinger, linux-scsi, linux-kernel,
	James Bottomley, Christoph Hellwig, MPT-FusionLinux.pdl,
	kernel-team

Hi Sreekanth,

On Thu, 2015-08-27 at 10:37 +0530, Sreekanth Reddy wrote:
> HI Nicholas & Calvin,
> 
> Thanks for the patchset. Sure We will review and we do some unit
> testing on this patch series. Currently my bandwidth is occupied with
> some internal activity, so by end of next week I will acknowledge this
> series if all the thing are fine with this patch series.
> 

Thanks for the quick response.

So Calvin's mpt2sas series has been merged into scsi.git/for-next.

Since this series is AFAICT functionally identical to Calvin's changes,
I assume JEJB will be merging it up for v4.3-rc1 soon too.

Also, we're probably not the only ones hitting this class of OOPsen, so
it will certainly need to be picked up for stable at some point, and any
further regressions will need to be addressed.

Please keep the list posted of your testing progress.

For reference, here are prerequisites need to apply atop v3.14.y:

4dc06fd mpt3sas: delay scsi_add_host call to work with scsi-mq
35b6236 mpt3sas: combine fw_event_work and its event_data
62c4da4 mpt3sas: correct scsi_{target,device} hostdata allocation

--nab


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] mpt3sas: Reference counting fixes from in-flight mpt2sas
  2015-08-27  5:07     ` Sreekanth Reddy
  2015-08-27  7:07       ` Nicholas A. Bellinger
@ 2015-08-27 14:40       ` James Bottomley
  2015-08-27 14:48         ` Sreekanth Reddy
  2015-08-27 19:15         ` Nicholas A. Bellinger
  1 sibling, 2 replies; 15+ messages in thread
From: James Bottomley @ 2015-08-27 14:40 UTC (permalink / raw)
  To: Sreekanth Reddy
  Cc: Nicholas A. Bellinger, Calvin Owens, Nicholas A. Bellinger,
	linux-scsi, linux-kernel, Christoph Hellwig, MPT-FusionLinux.pdl,
	kernel-team

On Thu, 2015-08-27 at 10:37 +0530, Sreekanth Reddy wrote:
> HI Nicholas & Calvin,
> 
> Thanks for the patchset. Sure We will review and we do some unit
> testing on this patch series. Currently my bandwidth is occupied with
> some internal activity, so by end of next week I will acknowledge this
> series if all the thing are fine with this patch series.

Calvin responded to your review feedback and that series has been
outstanding for a while, so I'm not going to drop it from the misc tree.
However, I will reorder to make it ready for the second push. You have
until Friday week to find a problem with it.

James



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] mpt3sas: Reference counting fixes from in-flight mpt2sas
  2015-08-27 14:40       ` James Bottomley
@ 2015-08-27 14:48         ` Sreekanth Reddy
  2015-08-27 19:15         ` Nicholas A. Bellinger
  1 sibling, 0 replies; 15+ messages in thread
From: Sreekanth Reddy @ 2015-08-27 14:48 UTC (permalink / raw)
  To: James Bottomley
  Cc: Nicholas A. Bellinger, Calvin Owens, Nicholas A. Bellinger,
	linux-scsi, linux-kernel, Christoph Hellwig, MPT-FusionLinux.pdl,
	kernel-team, Krishnaraddi Mankani

Ok James, by 9/4/2015 we will provide our feedback over this patch series.

Thanks,
Sreekanth


On Thu, Aug 27, 2015 at 8:10 PM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> On Thu, 2015-08-27 at 10:37 +0530, Sreekanth Reddy wrote:
>> HI Nicholas & Calvin,
>>
>> Thanks for the patchset. Sure We will review and we do some unit
>> testing on this patch series. Currently my bandwidth is occupied with
>> some internal activity, so by end of next week I will acknowledge this
>> series if all the thing are fine with this patch series.
>
> Calvin responded to your review feedback and that series has been
> outstanding for a while, so I'm not going to drop it from the misc tree.
> However, I will reorder to make it ready for the second push. You have
> until Friday week to find a problem with it.
>
> James
>
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] mpt3sas: Reference counting fixes from in-flight mpt2sas
  2015-08-27 14:40       ` James Bottomley
  2015-08-27 14:48         ` Sreekanth Reddy
@ 2015-08-27 19:15         ` Nicholas A. Bellinger
  2015-08-28 20:25           ` James Bottomley
  1 sibling, 1 reply; 15+ messages in thread
From: Nicholas A. Bellinger @ 2015-08-27 19:15 UTC (permalink / raw)
  To: James Bottomley
  Cc: Sreekanth Reddy, Calvin Owens, Nicholas A. Bellinger, linux-scsi,
	linux-kernel, Christoph Hellwig, MPT-FusionLinux.pdl,
	kernel-team

On Thu, 2015-08-27 at 07:40 -0700, James Bottomley wrote:
> On Thu, 2015-08-27 at 10:37 +0530, Sreekanth Reddy wrote:
> > HI Nicholas & Calvin,
> > 
> > Thanks for the patchset. Sure We will review and we do some unit
> > testing on this patch series. Currently my bandwidth is occupied with
> > some internal activity, so by end of next week I will acknowledge this
> > series if all the thing are fine with this patch series.
> 
> Calvin responded to your review feedback and that series has been
> outstanding for a while, so I'm not going to drop it from the misc tree.
> However, I will reorder to make it ready for the second push. You have
> until Friday week to find a problem with it.
> 

James, as mentioned this series is functionally identical to Calvin's
mpt2sas series.

Please consider merging it to scsi.git/for-next, so both series are
together and in-sync.

--nab


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] mpt3sas: Reference counting fixes from in-flight mpt2sas
  2015-08-27 19:15         ` Nicholas A. Bellinger
@ 2015-08-28 20:25           ` James Bottomley
  2015-08-30  7:22             ` Nicholas A. Bellinger
  0 siblings, 1 reply; 15+ messages in thread
From: James Bottomley @ 2015-08-28 20:25 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: Sreekanth Reddy, Calvin Owens, Nicholas A. Bellinger, linux-scsi,
	linux-kernel, Christoph Hellwig, MPT-FusionLinux.pdl,
	kernel-team

On Thu, 2015-08-27 at 12:15 -0700, Nicholas A. Bellinger wrote:
> On Thu, 2015-08-27 at 07:40 -0700, James Bottomley wrote:
> > On Thu, 2015-08-27 at 10:37 +0530, Sreekanth Reddy wrote:
> > > HI Nicholas & Calvin,
> > > 
> > > Thanks for the patchset. Sure We will review and we do some unit
> > > testing on this patch series. Currently my bandwidth is occupied with
> > > some internal activity, so by end of next week I will acknowledge this
> > > series if all the thing are fine with this patch series.
> > 
> > Calvin responded to your review feedback and that series has been
> > outstanding for a while, so I'm not going to drop it from the misc tree.
> > However, I will reorder to make it ready for the second push. You have
> > until Friday week to find a problem with it.
> > 
> 
> James, as mentioned this series is functionally identical to Calvin's
> mpt2sas series.
> 
> Please consider merging it to scsi.git/for-next, so both series are
> together and in-sync.

Unfortunately, the driver isn't, thanks to drift between v2 and v3 of
the mpt_sas code bases.  This patch is also dangerous: the early
versions left unremoved objects lying around, so  getting some stress
testing from avago is very useful.  At this point in the cycle, the risk
vs reward of doing a blind upport to mpt3_sas is just too great and the
time for review and stress testing too limited within the merge window.

James



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] mpt3sas: Reference counting fixes from in-flight mpt2sas
  2015-08-28 20:25           ` James Bottomley
@ 2015-08-30  7:22             ` Nicholas A. Bellinger
  2015-08-30 16:14               ` James Bottomley
  0 siblings, 1 reply; 15+ messages in thread
From: Nicholas A. Bellinger @ 2015-08-30  7:22 UTC (permalink / raw)
  To: James Bottomley
  Cc: Sreekanth Reddy, Calvin Owens, Nicholas A. Bellinger, linux-scsi,
	linux-kernel, Christoph Hellwig, MPT-FusionLinux.pdl,
	kernel-team

On Fri, 2015-08-28 at 13:25 -0700, James Bottomley wrote:
> On Thu, 2015-08-27 at 12:15 -0700, Nicholas A. Bellinger wrote:
> > On Thu, 2015-08-27 at 07:40 -0700, James Bottomley wrote:
> > > On Thu, 2015-08-27 at 10:37 +0530, Sreekanth Reddy wrote:
> > > > HI Nicholas & Calvin,
> > > > 
> > > > Thanks for the patchset. Sure We will review and we do some unit
> > > > testing on this patch series. Currently my bandwidth is occupied with
> > > > some internal activity, so by end of next week I will acknowledge this
> > > > series if all the thing are fine with this patch series.
> > > 
> > > Calvin responded to your review feedback and that series has been
> > > outstanding for a while, so I'm not going to drop it from the misc tree.
> > > However, I will reorder to make it ready for the second push. You have
> > > until Friday week to find a problem with it.
> > > 
> > 
> > James, as mentioned this series is functionally identical to Calvin's
> > mpt2sas series.
> > 
> > Please consider merging it to scsi.git/for-next, so both series are
> > together and in-sync.
> 
> Unfortunately, the driver isn't, thanks to drift between v2 and v3 of
> the mpt_sas code bases.  This patch is also dangerous: the early
> versions left unremoved objects lying around, so  getting some stress
> testing from avago is very useful.  At this point in the cycle, the risk
> vs reward of doing a blind upport to mpt3_sas is just too great and the
> time for review and stress testing too limited within the merge window.

To clarify, this series is Calvin's latest -v4 mpt2sas changes that
you've already merged into for-next, and that have been applied (by
hand) to v4.2-rc1 mpt3sas code.

If you look closer, this series is an obvious bug-fix for a class of
long-standing bugs within mpt*sas, and I don't see how keeping the
broken list_head dereferences in one LLD, but not the other makes any
sense at this point. 

Unfortunately, the mpt3sas patches you've merged this week add yet more
bogus mpt3sas_scsih_sas_device_find_by_sas_address() usage.  Really,
adding more broken code to mpt3sas can't possibly be better than just
merging this bug-fix series.

Here's are two cases that required fixing to apply this series atop
latest scsi.git/for-next:

diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 85ff0dd..897153b 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -2866,7 +2874,7 @@ _scsih_block_io_device(struct MPT3SAS_ADAPTER *ioc, u16 handle)
        struct scsi_device *sdev;
        struct _sas_device *sas_device;
 
-       sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
+       sas_device = __mpt3sas_get_sdev_by_handle(ioc, handle);
        if (!sas_device)
                return;
 
@@ -2882,6 +2890,8 @@ _scsih_block_io_device(struct MPT3SAS_ADAPTER *ioc, u16 handle)
                        continue;
                _scsih_internal_device_block(sdev, sas_device_priv_data);
        }
+
+       sas_device_put(sas_device);
 }
 
 /**
diff --git a/drivers/scsi/mpt3sas/mpt3sas_transport.c b/drivers/scsi/mpt3sas/mpt3sas_transport.c
index 18f1de5..6074b11 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_transport.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_transport.c
@@ -734,7 +734,7 @@ mpt3sas_transport_port_add(struct MPT3SAS_ADAPTER *ioc, u16 handle,
        rphy->identify = mpt3sas_port->remote_identify;
 
        if (mpt3sas_port->remote_identify.device_type == SAS_END_DEVICE) {
-               sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
+               sas_device = __mpt3sas_get_sdev_by_addr(ioc,
                                    mpt3sas_port->remote_identify.sas_address);
                if (!sas_device) {
                        dfailprintk(ioc, printk(MPT3SAS_FMT
@@ -750,8 +750,10 @@ mpt3sas_transport_port_add(struct MPT3SAS_ADAPTER *ioc, u16 handle,
                    ioc->name, __FILE__, __LINE__, __func__);
        }
 
-       if (mpt3sas_port->remote_identify.device_type == SAS_END_DEVICE)
+       if (mpt3sas_port->remote_identify.device_type == SAS_END_DEVICE) {
                sas_device->pend_sas_rphy_add = 0;
+               sas_device_put(sas_device);
+       }
 
        if ((ioc->logging_level & MPT_DEBUG_TRANSPORT))
                dev_printk(KERN_INFO, &rphy->dev,

Also, I'm currently using the -v1 series on v3.14.47 atop 40 nodes with
12 HDDs per HBA. (480 total), and the number of HBAs using this series
will double over the next week.  The specific hardware setup is:

  LSI Logic / Symbios Logic SAS3008 PCI-Express Fusion-MPT SAS-3 (rev 02)

Thus far, it has resolved the original OOPsen bug that would appear
occasionally during boot with a failing HDD.  So far, no other new
regressions have appeared.

That said, I'll be posting the updated -v2 atop current scsi/for-next
shortly, and will push to target-pending/for-next-merge for now to be
picked up for 0-day + linux-next.

Please consider picking it up for v4.3-rc1, otherwise I'll plan to push
to Linus with Sreekanth's ACK, barring any new regressions or other
specific -v2 code comments.

--nab


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] mpt3sas: Reference counting fixes from in-flight mpt2sas
  2015-08-30  7:22             ` Nicholas A. Bellinger
@ 2015-08-30 16:14               ` James Bottomley
  0 siblings, 0 replies; 15+ messages in thread
From: James Bottomley @ 2015-08-30 16:14 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: Sreekanth Reddy, Calvin Owens, Nicholas A. Bellinger, linux-scsi,
	linux-kernel, Christoph Hellwig, MPT-FusionLinux.pdl,
	kernel-team

On Sun, 2015-08-30 at 00:22 -0700, Nicholas A. Bellinger wrote:
> On Fri, 2015-08-28 at 13:25 -0700, James Bottomley wrote:
> > On Thu, 2015-08-27 at 12:15 -0700, Nicholas A. Bellinger wrote:
> > > On Thu, 2015-08-27 at 07:40 -0700, James Bottomley wrote:
> > > > On Thu, 2015-08-27 at 10:37 +0530, Sreekanth Reddy wrote:
> > > > > HI Nicholas & Calvin,
> > > > > 
> > > > > Thanks for the patchset. Sure We will review and we do some unit
> > > > > testing on this patch series. Currently my bandwidth is occupied with
> > > > > some internal activity, so by end of next week I will acknowledge this
> > > > > series if all the thing are fine with this patch series.
> > > > 
> > > > Calvin responded to your review feedback and that series has been
> > > > outstanding for a while, so I'm not going to drop it from the misc tree.
> > > > However, I will reorder to make it ready for the second push. You have
> > > > until Friday week to find a problem with it.
> > > > 
> > > 
> > > James, as mentioned this series is functionally identical to Calvin's
> > > mpt2sas series.
> > > 
> > > Please consider merging it to scsi.git/for-next, so both series are
> > > together and in-sync.
> > 
> > Unfortunately, the driver isn't, thanks to drift between v2 and v3 of
> > the mpt_sas code bases.  This patch is also dangerous: the early
> > versions left unremoved objects lying around, so  getting some stress
> > testing from avago is very useful.  At this point in the cycle, the risk
> > vs reward of doing a blind upport to mpt3_sas is just too great and the
> > time for review and stress testing too limited within the merge window.
> 
> To clarify, this series is Calvin's latest -v4 mpt2sas changes that
> you've already merged into for-next, and that have been applied (by
> hand) to v4.2-rc1 mpt3sas code.
> 
> If you look closer, this series is an obvious bug-fix for a class of
> long-standing bugs within mpt*sas, and I don't see how keeping the
> broken list_head dereferences in one LLD, but not the other makes any
> sense at this point. 

Look, Nic, the original patches went through four reviews with issues
uncovered and fixed at most of them.  They're now sitting at the head of
the misc tree while they get stress tested so they can easily be ejected
without affecting the rest of the tree should anything fail.

They're hardly "an obvious bug fix" they actually introduce a separated
lifetime for _sas_device and fw_event_work.

Separated lifetime patches should always be a last resort because they
introduce a whole new set of lifetime handling rules for the objects.
Get one of them wrong and either you free it too early (oops) or pin it
forever.  The general rule of thumb is never do separated lifetime
objects unless you really really have to.  Ideally pin them to existing
core objects.  The only reason I'm considering this is because no one
can see how to pin _sas_device to the natural core object (scsi_device).
The firmware event one doesn't make sense to me at all so I'm relying on
the reviewers.  Ideally a fw event is allocated in the event setup and
freed in the event fire (or cancel)  I really don't see we need a
separated lifetime object here.  you have a callback to mediate exactly
whether the event fired or not in the cancel, so there's no ambiguity
about ownership.

Simply because of this, your upport to mp3sas can't go in without the
same level of review and testing.

> Unfortunately, the mpt3sas patches you've merged this week add yet more
> bogus mpt3sas_scsih_sas_device_find_by_sas_address() usage.  Really,
> adding more broken code to mpt3sas can't possibly be better than just
> merging this bug-fix series.

Well, it had two reviewers per patch, one of whom was Martin Petersen
who doesn't give reviewed by lightly.  Funnily enough it seems the list
lost your review feedback for this patch set.

James



^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2015-08-30 16:14 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-08-26  4:09 [PATCH 0/2] mpt3sas: Reference counting fixes from in-flight mpt2sas Nicholas A. Bellinger
2015-08-26  4:09 ` [PATCH 1/2] mpt3sas: Refcount sas_device objects and fix unsafe list usage Nicholas A. Bellinger
2015-08-27  2:20   ` Calvin Owens
2015-08-26  4:09 ` [PATCH 2/2] mpt3sas: Refcount fw_events " Nicholas A. Bellinger
2015-08-27  2:06   ` Calvin Owens
2015-08-26 23:54 ` [PATCH 0/2] mpt3sas: Reference counting fixes from in-flight mpt2sas Calvin Owens
2015-08-26 23:58   ` Nicholas A. Bellinger
2015-08-27  5:07     ` Sreekanth Reddy
2015-08-27  7:07       ` Nicholas A. Bellinger
2015-08-27 14:40       ` James Bottomley
2015-08-27 14:48         ` Sreekanth Reddy
2015-08-27 19:15         ` Nicholas A. Bellinger
2015-08-28 20:25           ` James Bottomley
2015-08-30  7:22             ` Nicholas A. Bellinger
2015-08-30 16:14               ` James Bottomley

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).