All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH-v2 0/2] mpt3sas: Reference counting fixes from for-next mpt2sas
@ 2015-08-30  7:54 Nicholas A. Bellinger
  2015-08-30  7:54 ` [PATCH-v2 1/2] mpt3sas: Refcount sas_device objects and fix unsafe list usage Nicholas A. Bellinger
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Nicholas A. Bellinger @ 2015-08-30  7:54 UTC (permalink / raw)
  To: linux-scsi
  Cc: linux-kernel, James Bottomley, Calvin Owens, Christoph Hellwig,
	Sreekanth Reddy, MPT-FusionLinux.pdl, kernel-team,
	Nicholas Bellinger

From: Nicholas Bellinger <nab@linux-iscsi.org>

Hi all,

This series is a mpt3sas LLD forward port of Calvin Owens' for-next
reference counting bugfix series for mpt2sas LLD code.

His latest patch series can be found here:

[PATCH v4 0/2] Fixes for memory corruption in mpt2sas
http://marc.info/?l=linux-scsi&m=143951695904115&w=2

The differences between mpt2sas + mpt3sas in this area are very
small, and the changes are required to address a class of long
standing NULL pointer dereference bugs through-out both LLDs.

With Calvin's changes in place for mpt3sas, the original NULL
pointer dereference OOPsen during probe with a failed HDD appear
to be resolved, and so far no new regressions have been reported
with -v1 series code.

The -v1 series code for mpt3sas has been tested on v3.14.47 with
40x SAS3008 HBAs, with three preceeding upstream mpt3sas patches:

4dc06fd mpt3sas: delay scsi_add_host call to work with scsi-mq
35b6236 mpt3sas: combine fw_event_work and its event_data
62c4da4 mpt3sas: correct scsi_{target,device} hostdata allocation

Please review.

--nab

-v2 changes:
  - Fix _scsih_block_io_device() v4.3-rc0 brekage
  - Fix mpt3sas_transport_port_add() v4.3-rc0 breakage

Nicholas Bellinger (2):
  mpt3sas: Refcount sas_device objects and fix unsafe list usage
  mpt3sas: Refcount fw_events and fix unsafe list usage

 drivers/scsi/mpt3sas/mpt3sas_base.h      |  25 +-
 drivers/scsi/mpt3sas/mpt3sas_scsih.c     | 595 ++++++++++++++++++++++---------
 drivers/scsi/mpt3sas/mpt3sas_transport.c |  18 +-
 3 files changed, 458 insertions(+), 180 deletions(-)

-- 
1.9.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH-v2 1/2] mpt3sas: Refcount sas_device objects and fix unsafe list usage
  2015-08-30  7:54 [PATCH-v2 0/2] mpt3sas: Reference counting fixes from for-next mpt2sas Nicholas A. Bellinger
@ 2015-08-30  7:54 ` Nicholas A. Bellinger
  2015-09-08 11:55   ` Sreekanth Reddy
  2015-08-30  7:54 ` [PATCH-v2 2/2] mpt3sas: Refcount fw_events " Nicholas A. Bellinger
  2015-09-04 18:47 ` [PATCH-v2 0/2] mpt3sas: Reference counting fixes from for-next mpt2sas Nicholas A. Bellinger
  2 siblings, 1 reply; 11+ messages in thread
From: Nicholas A. Bellinger @ 2015-08-30  7:54 UTC (permalink / raw)
  To: linux-scsi
  Cc: linux-kernel, James Bottomley, Calvin Owens, Christoph Hellwig,
	Sreekanth Reddy, MPT-FusionLinux.pdl, kernel-team,
	Nicholas Bellinger

From: Nicholas Bellinger <nab@linux-iscsi.org>

These objects can be referenced concurrently throughout the driver, we
need a way to make sure threads can't delete them out from under
each other. This patch adds the refcount, and refactors the code to use
it.

Additionally, we cannot iterate over the sas_device_list without
holding the lock, or we risk corrupting random memory if items are
added or deleted as we iterate. This patch refactors _scsih_probe_sas()
to use the sas_device_list in a safe way.

This patch is a port of Calvin's PATCH-v4 for mpt2sas code, atop
mpt3sas changes in scsi.git/for-next.

Cc: Calvin Owens <calvinowens@fb.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Sreekanth Reddy <sreekanth.reddy@avagotech.com>
Cc: MPT-FusionLinux.pdl <MPT-FusionLinux.pdl@avagotech.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
---
 drivers/scsi/mpt3sas/mpt3sas_base.h      |  25 +-
 drivers/scsi/mpt3sas/mpt3sas_scsih.c     | 479 +++++++++++++++++++++----------
 drivers/scsi/mpt3sas/mpt3sas_transport.c |  18 +-
 3 files changed, 364 insertions(+), 158 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h b/drivers/scsi/mpt3sas/mpt3sas_base.h
index f0e462b..9a73c8b 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.h
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.h
@@ -248,6 +248,7 @@ struct Mpi2ManufacturingPage11_t {
  * @flags: MPT_TARGET_FLAGS_XXX flags
  * @deleted: target flaged for deletion
  * @tm_busy: target is busy with TM request.
+ * @sdev: The sas_device associated with this target
  */
 struct MPT3SAS_TARGET {
 	struct scsi_target *starget;
@@ -257,6 +258,7 @@ struct MPT3SAS_TARGET {
 	u32	flags;
 	u8	deleted;
 	u8	tm_busy;
+	struct	_sas_device *sdev;
 };
 
 
@@ -335,6 +337,9 @@ struct _internal_cmd {
  * @pfa_led_on: flag for PFA LED status
  * @pend_sas_rphy_add: flag to check if device is in sas_rphy_add()
  *	addition routine.
+ * @enclosure_level: used for enclosure services
+ * @connector_name: ASCII value from pg0.ConnectorName
+ * @refcount: reference count for deletion
  */
 struct _sas_device {
 	struct list_head list;
@@ -358,8 +363,24 @@ struct _sas_device {
 	u8	pend_sas_rphy_add;
 	u8	enclosure_level;
 	u8	connector_name[4];
+	struct kref refcount;
 };
 
+static inline void sas_device_get(struct _sas_device *s)
+{
+	kref_get(&s->refcount);
+}
+
+static inline void sas_device_free(struct kref *r)
+{
+	kfree(container_of(r, struct _sas_device, refcount));
+}
+
+static inline void sas_device_put(struct _sas_device *s)
+{
+	kref_put(&s->refcount, sas_device_free);
+}
+
 /**
  * struct _raid_device - raid volume link list
  * @list: sas device list
@@ -1090,7 +1111,9 @@ struct _sas_node *mpt3sas_scsih_expander_find_by_handle(
 	struct MPT3SAS_ADAPTER *ioc, u16 handle);
 struct _sas_node *mpt3sas_scsih_expander_find_by_sas_address(
 	struct MPT3SAS_ADAPTER *ioc, u64 sas_address);
-struct _sas_device *mpt3sas_scsih_sas_device_find_by_sas_address(
+struct _sas_device *mpt3sas_get_sdev_by_addr(
+	 struct MPT3SAS_ADAPTER *ioc, u64 sas_address);
+struct _sas_device *__mpt3sas_get_sdev_by_addr(
 	struct MPT3SAS_ADAPTER *ioc, u64 sas_address);
 
 void mpt3sas_port_enable_complete(struct MPT3SAS_ADAPTER *ioc);
diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 8ccef38..897153b 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -518,8 +518,61 @@ _scsih_determine_boot_device(struct MPT3SAS_ADAPTER *ioc,
 	}
 }
 
+static struct _sas_device *
+__mpt3sas_get_sdev_from_target(struct MPT3SAS_ADAPTER *ioc,
+	struct MPT3SAS_TARGET *tgt_priv)
+{
+	struct _sas_device *ret;
+
+	assert_spin_locked(&ioc->sas_device_lock);
+
+	ret = tgt_priv->sdev;
+	if (ret)
+		sas_device_get(ret);
+
+	return ret;
+}
+
+static struct _sas_device *
+mpt3sas_get_sdev_from_target(struct MPT3SAS_ADAPTER *ioc,
+	struct MPT3SAS_TARGET *tgt_priv)
+{
+	struct _sas_device *ret;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ioc->sas_device_lock, flags);
+	ret = __mpt3sas_get_sdev_from_target(ioc, tgt_priv);
+	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+
+	return ret;
+}
+
+struct _sas_device *
+__mpt3sas_get_sdev_by_addr(struct MPT3SAS_ADAPTER *ioc,
+	u64 sas_address)
+{
+	struct _sas_device *sas_device;
+
+	assert_spin_locked(&ioc->sas_device_lock);
+
+	list_for_each_entry(sas_device, &ioc->sas_device_list, list)
+		if (sas_device->sas_address == sas_address)
+			goto found_device;
+
+	list_for_each_entry(sas_device, &ioc->sas_device_init_list, list)
+		if (sas_device->sas_address == sas_address)
+			goto found_device;
+
+	return NULL;
+
+found_device:
+	sas_device_get(sas_device);
+	return sas_device;
+}
+
+
 /**
- * mpt3sas_scsih_sas_device_find_by_sas_address - sas device search
+ * mpt3sas_get_sdev_by_addr - sas device search
  * @ioc: per adapter object
  * @sas_address: sas address
  * Context: Calling function should acquire ioc->sas_device_lock
@@ -528,24 +581,44 @@ _scsih_determine_boot_device(struct MPT3SAS_ADAPTER *ioc,
  * object.
  */
 struct _sas_device *
-mpt3sas_scsih_sas_device_find_by_sas_address(struct MPT3SAS_ADAPTER *ioc,
+mpt3sas_get_sdev_by_addr(struct MPT3SAS_ADAPTER *ioc,
 	u64 sas_address)
 {
 	struct _sas_device *sas_device;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ioc->sas_device_lock, flags);
+	sas_device = __mpt3sas_get_sdev_by_addr(ioc,
+			sas_address);
+	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+
+	return sas_device;
+}
+
+static struct _sas_device *
+__mpt3sas_get_sdev_by_handle(struct MPT3SAS_ADAPTER *ioc, u16 handle)
+{
+	struct _sas_device *sas_device;
+
+	assert_spin_locked(&ioc->sas_device_lock);
 
 	list_for_each_entry(sas_device, &ioc->sas_device_list, list)
-		if (sas_device->sas_address == sas_address)
-			return sas_device;
+		if (sas_device->handle == handle)
+			goto found_device;
 
 	list_for_each_entry(sas_device, &ioc->sas_device_init_list, list)
-		if (sas_device->sas_address == sas_address)
-			return sas_device;
+		if (sas_device->handle == handle)
+			goto found_device;
 
 	return NULL;
+
+found_device:
+	sas_device_get(sas_device);
+	return sas_device;
 }
 
 /**
- * _scsih_sas_device_find_by_handle - sas device search
+ * mpt3sas_get_sdev_by_handle - sas device search
  * @ioc: per adapter object
  * @handle: sas device handle (assigned by firmware)
  * Context: Calling function should acquire ioc->sas_device_lock
@@ -554,19 +627,16 @@ mpt3sas_scsih_sas_device_find_by_sas_address(struct MPT3SAS_ADAPTER *ioc,
  * object.
  */
 static struct _sas_device *
-_scsih_sas_device_find_by_handle(struct MPT3SAS_ADAPTER *ioc, u16 handle)
+mpt3sas_get_sdev_by_handle(struct MPT3SAS_ADAPTER *ioc, u16 handle)
 {
 	struct _sas_device *sas_device;
+	unsigned long flags;
 
-	list_for_each_entry(sas_device, &ioc->sas_device_list, list)
-		if (sas_device->handle == handle)
-			return sas_device;
-
-	list_for_each_entry(sas_device, &ioc->sas_device_init_list, list)
-		if (sas_device->handle == handle)
-			return sas_device;
+	spin_lock_irqsave(&ioc->sas_device_lock, flags);
+	sas_device = __mpt3sas_get_sdev_by_handle(ioc, handle);
+	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
 
-	return NULL;
+	return sas_device;
 }
 
 /**
@@ -575,7 +645,7 @@ _scsih_sas_device_find_by_handle(struct MPT3SAS_ADAPTER *ioc, u16 handle)
  * @sas_device: the sas_device object
  * Context: This function will acquire ioc->sas_device_lock.
  *
- * Removing object and freeing associated memory from the ioc->sas_device_list.
+ * If sas_device is on the list, remove it and decrement its reference count.
  */
 static void
 _scsih_sas_device_remove(struct MPT3SAS_ADAPTER *ioc,
@@ -601,10 +671,15 @@ _scsih_sas_device_remove(struct MPT3SAS_ADAPTER *ioc,
 		   "removing enclosure level(0x%04x), connector name( %s)\n",
 		   ioc->name, sas_device->enclosure_level,
 		   sas_device->connector_name);
-
+	/*
+	 * The lock serializes access to the list, but we still need to verify
+	 * that nobody removed the entry while we were waiting on the lock.
+	 */
 	spin_lock_irqsave(&ioc->sas_device_lock, flags);
-	list_del(&sas_device->list);
-	kfree(sas_device);
+	if (!list_empty(&sas_device->list)) {
+		list_del_init(&sas_device->list);
+		sas_device_put(sas_device);
+	}
 	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
 }
 
@@ -625,12 +700,17 @@ _scsih_device_remove_by_handle(struct MPT3SAS_ADAPTER *ioc, u16 handle)
 		return;
 
 	spin_lock_irqsave(&ioc->sas_device_lock, flags);
-	sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
-	if (sas_device)
-		list_del(&sas_device->list);
+	sas_device = __mpt3sas_get_sdev_by_handle(ioc, handle);
+	if (sas_device) {
+		list_del_init(&sas_device->list);
+		sas_device_put(sas_device);
+	}
 	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
-	if (sas_device)
+
+	if (sas_device) {
 		_scsih_remove_device(ioc, sas_device);
+		sas_device_put(sas_device);
+	}
 }
 
 /**
@@ -651,13 +731,17 @@ mpt3sas_device_remove_by_sas_address(struct MPT3SAS_ADAPTER *ioc,
 		return;
 
 	spin_lock_irqsave(&ioc->sas_device_lock, flags);
-	sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
-	    sas_address);
-	if (sas_device)
-		list_del(&sas_device->list);
+	sas_device = __mpt3sas_get_sdev_by_addr(ioc, sas_address);
+	if (sas_device) {
+		list_del_init(&sas_device->list);
+		sas_device_put(sas_device);
+	}
 	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
-	if (sas_device)
+
+	if (sas_device) {
 		_scsih_remove_device(ioc, sas_device);
+		sas_device_put(sas_device);
+	}
 }
 
 /**
@@ -692,6 +776,7 @@ _scsih_sas_device_add(struct MPT3SAS_ADAPTER *ioc,
 		    sas_device->enclosure_level, sas_device->connector_name));
 
 	spin_lock_irqsave(&ioc->sas_device_lock, flags);
+	sas_device_get(sas_device);
 	list_add_tail(&sas_device->list, &ioc->sas_device_list);
 	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
 
@@ -745,6 +830,7 @@ _scsih_sas_device_init_add(struct MPT3SAS_ADAPTER *ioc,
 		    sas_device->connector_name));
 
 	spin_lock_irqsave(&ioc->sas_device_lock, flags);
+	sas_device_get(sas_device);
 	list_add_tail(&sas_device->list, &ioc->sas_device_init_list);
 	_scsih_determine_boot_device(ioc, sas_device, 0);
 	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
@@ -1123,12 +1209,15 @@ _scsih_change_queue_depth(struct scsi_device *sdev, int qdepth)
 		goto not_sata;
 	if ((sas_target_priv_data->flags & MPT_TARGET_FLAGS_VOLUME))
 		goto not_sata;
+
 	spin_lock_irqsave(&ioc->sas_device_lock, flags);
-	sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
-	   sas_device_priv_data->sas_target->sas_address);
-	if (sas_device && sas_device->device_info &
-	    MPI2_SAS_DEVICE_INFO_SATA_DEVICE)
-		max_depth = MPT3SAS_SATA_QUEUE_DEPTH;
+	sas_device = __mpt3sas_get_sdev_from_target(ioc, sas_target_priv_data);
+	if (sas_device) {
+		if (sas_device->device_info & MPI2_SAS_DEVICE_INFO_SATA_DEVICE)
+			max_depth = MPT3SAS_SATA_QUEUE_DEPTH;
+
+		sas_device_put(sas_device);
+	}
 	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
 
  not_sata:
@@ -1185,12 +1274,13 @@ _scsih_target_alloc(struct scsi_target *starget)
 	/* sas/sata devices */
 	spin_lock_irqsave(&ioc->sas_device_lock, flags);
 	rphy = dev_to_rphy(starget->dev.parent);
-	sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
+	sas_device = __mpt3sas_get_sdev_by_addr(ioc,
 	   rphy->identify.sas_address);
 
 	if (sas_device) {
 		sas_target_priv_data->handle = sas_device->handle;
 		sas_target_priv_data->sas_address = sas_device->sas_address;
+		sas_target_priv_data->sdev = sas_device;
 		sas_device->starget = starget;
 		sas_device->id = starget->id;
 		sas_device->channel = starget->channel;
@@ -1240,13 +1330,21 @@ _scsih_target_destroy(struct scsi_target *starget)
 
 	spin_lock_irqsave(&ioc->sas_device_lock, flags);
 	rphy = dev_to_rphy(starget->dev.parent);
-	sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
-	   rphy->identify.sas_address);
+	sas_device = __mpt3sas_get_sdev_from_target(ioc, sas_target_priv_data);
 	if (sas_device && (sas_device->starget == starget) &&
 	    (sas_device->id == starget->id) &&
 	    (sas_device->channel == starget->channel))
 		sas_device->starget = NULL;
 
+	if (sas_device) {
+		/*
+		 * Corresponding get() is in _scsih_target_alloc()
+		 */
+		sas_target_priv_data->sdev = NULL;
+		sas_device_put(sas_device);
+
+		sas_device_put(sas_device);
+	}
 	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
 
  out:
@@ -1302,7 +1400,7 @@ _scsih_slave_alloc(struct scsi_device *sdev)
 
 	if (!(sas_target_priv_data->flags & MPT_TARGET_FLAGS_VOLUME)) {
 		spin_lock_irqsave(&ioc->sas_device_lock, flags);
-		sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
+		sas_device = __mpt3sas_get_sdev_by_addr(ioc,
 					sas_target_priv_data->sas_address);
 		if (sas_device && (sas_device->starget == NULL)) {
 			sdev_printk(KERN_INFO, sdev,
@@ -1310,6 +1408,10 @@ _scsih_slave_alloc(struct scsi_device *sdev)
 				__func__, __LINE__);
 			sas_device->starget = starget;
 		}
+
+		if (sas_device)
+			sas_device_put(sas_device);
+
 		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
 	}
 
@@ -1344,10 +1446,14 @@ _scsih_slave_destroy(struct scsi_device *sdev)
 
 	if (!(sas_target_priv_data->flags & MPT_TARGET_FLAGS_VOLUME)) {
 		spin_lock_irqsave(&ioc->sas_device_lock, flags);
-		sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
-		   sas_target_priv_data->sas_address);
+		sas_device = __mpt3sas_get_sdev_from_target(ioc,
+				sas_target_priv_data);
 		if (sas_device && !sas_target_priv_data->num_luns)
 			sas_device->starget = NULL;
+
+		if (sas_device)
+			sas_device_put(sas_device);
+
 		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
 	}
 
@@ -1783,7 +1889,7 @@ _scsih_slave_configure(struct scsi_device *sdev)
 	}
 
 	spin_lock_irqsave(&ioc->sas_device_lock, flags);
-	sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
+	sas_device = __mpt3sas_get_sdev_by_addr(ioc,
 	   sas_device_priv_data->sas_target->sas_address);
 	if (!sas_device) {
 		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
@@ -1823,12 +1929,12 @@ _scsih_slave_configure(struct scsi_device *sdev)
 		     ds, sas_device->enclosure_level,
 		     sas_device->connector_name);
 
+	sas_device_put(sas_device);
 	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
 
 	if (!ssp_target)
 		_scsih_display_sata_capabilities(ioc, handle, sdev);
 
-
 	_scsih_change_queue_depth(sdev, qdepth);
 
 	if (ssp_target) {
@@ -2219,8 +2325,7 @@ _scsih_tm_display_info(struct MPT3SAS_ADAPTER *ioc, struct scsi_cmnd *scmd)
 		    device_str, (unsigned long long)priv_target->sas_address);
 	} else {
 		spin_lock_irqsave(&ioc->sas_device_lock, flags);
-		sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
-		    priv_target->sas_address);
+		sas_device = __mpt3sas_get_sdev_from_target(ioc, priv_target);
 		if (sas_device) {
 			if (priv_target->flags &
 			    MPT_TARGET_FLAGS_RAID_COMPONENT) {
@@ -2246,8 +2351,9 @@ _scsih_tm_display_info(struct MPT3SAS_ADAPTER *ioc, struct scsi_cmnd *scmd)
 				"enclosure level(0x%04x),connector name(%s)\n",
 				 sas_device->enclosure_level,
 				 sas_device->connector_name);
+
+			sas_device_put(sas_device);
 		}
-		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
 	}
 }
 
@@ -2321,10 +2427,11 @@ _scsih_dev_reset(struct scsi_cmnd *scmd)
 {
 	struct MPT3SAS_ADAPTER *ioc = shost_priv(scmd->device->host);
 	struct MPT3SAS_DEVICE *sas_device_priv_data;
-	struct _sas_device *sas_device;
-	unsigned long flags;
+	struct _sas_device *sas_device = NULL;
 	u16	handle;
 	int r;
+	struct scsi_target *starget = scmd->device->sdev_target;
+	struct MPT3SAS_TARGET *target_priv_data = starget->hostdata;
 
 	sdev_printk(KERN_INFO, scmd->device,
 		"attempting device reset! scmd(%p)\n", scmd);
@@ -2344,12 +2451,10 @@ _scsih_dev_reset(struct scsi_cmnd *scmd)
 	handle = 0;
 	if (sas_device_priv_data->sas_target->flags &
 	    MPT_TARGET_FLAGS_RAID_COMPONENT) {
-		spin_lock_irqsave(&ioc->sas_device_lock, flags);
-		sas_device = _scsih_sas_device_find_by_handle(ioc,
-		   sas_device_priv_data->sas_target->handle);
+		sas_device = mpt3sas_get_sdev_from_target(ioc,
+				target_priv_data);
 		if (sas_device)
 			handle = sas_device->volume_handle;
-		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
 	} else
 		handle = sas_device_priv_data->sas_target->handle;
 
@@ -2366,6 +2471,10 @@ _scsih_dev_reset(struct scsi_cmnd *scmd)
  out:
 	sdev_printk(KERN_INFO, scmd->device, "device reset: %s scmd(%p)\n",
 	    ((r == SUCCESS) ? "SUCCESS" : "FAILED"), scmd);
+
+	if (sas_device)
+		sas_device_put(sas_device);
+
 	return r;
 }
 
@@ -2380,11 +2489,11 @@ _scsih_target_reset(struct scsi_cmnd *scmd)
 {
 	struct MPT3SAS_ADAPTER *ioc = shost_priv(scmd->device->host);
 	struct MPT3SAS_DEVICE *sas_device_priv_data;
-	struct _sas_device *sas_device;
-	unsigned long flags;
+	struct _sas_device *sas_device = NULL;
 	u16	handle;
 	int r;
 	struct scsi_target *starget = scmd->device->sdev_target;
+	struct MPT3SAS_TARGET *target_priv_data = starget->hostdata;
 
 	starget_printk(KERN_INFO, starget, "attempting target reset! scmd(%p)\n",
 		scmd);
@@ -2404,12 +2513,10 @@ _scsih_target_reset(struct scsi_cmnd *scmd)
 	handle = 0;
 	if (sas_device_priv_data->sas_target->flags &
 	    MPT_TARGET_FLAGS_RAID_COMPONENT) {
-		spin_lock_irqsave(&ioc->sas_device_lock, flags);
-		sas_device = _scsih_sas_device_find_by_handle(ioc,
-		   sas_device_priv_data->sas_target->handle);
+		sas_device = mpt3sas_get_sdev_from_target(ioc,
+				target_priv_data);
 		if (sas_device)
 			handle = sas_device->volume_handle;
-		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
 	} else
 		handle = sas_device_priv_data->sas_target->handle;
 
@@ -2426,6 +2533,10 @@ _scsih_target_reset(struct scsi_cmnd *scmd)
  out:
 	starget_printk(KERN_INFO, starget, "target reset: %s scmd(%p)\n",
 	    ((r == SUCCESS) ? "SUCCESS" : "FAILED"), scmd);
+
+	if (sas_device)
+		sas_device_put(sas_device);
+
 	return r;
 }
 
@@ -2763,7 +2874,7 @@ _scsih_block_io_device(struct MPT3SAS_ADAPTER *ioc, u16 handle)
 	struct scsi_device *sdev;
 	struct _sas_device *sas_device;
 
-	sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
+	sas_device = __mpt3sas_get_sdev_by_handle(ioc, handle);
 	if (!sas_device)
 		return;
 
@@ -2779,6 +2890,8 @@ _scsih_block_io_device(struct MPT3SAS_ADAPTER *ioc, u16 handle)
 			continue;
 		_scsih_internal_device_block(sdev, sas_device_priv_data);
 	}
+
+	sas_device_put(sas_device);
 }
 
 /**
@@ -2804,15 +2917,15 @@ _scsih_block_io_to_children_attached_to_ex(struct MPT3SAS_ADAPTER *ioc,
 
 	list_for_each_entry(mpt3sas_port,
 	   &sas_expander->sas_port_list, port_list) {
-		if (mpt3sas_port->remote_identify.device_type ==
-		    SAS_END_DEVICE) {
+		if (mpt3sas_port->remote_identify.device_type == SAS_END_DEVICE) {
 			spin_lock_irqsave(&ioc->sas_device_lock, flags);
-			sas_device =
-			    mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
-			   mpt3sas_port->remote_identify.sas_address);
-			if (sas_device)
+			sas_device = __mpt3sas_get_sdev_by_addr(ioc,
+					mpt3sas_port->remote_identify.sas_address);
+			if (sas_device) {
 				set_bit(sas_device->handle,
-				    ioc->blocking_handles);
+					ioc->blocking_handles);
+				sas_device_put(sas_device);
+			}
 			spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
 		}
 	}
@@ -2880,7 +2993,7 @@ _scsih_tm_tr_send(struct MPT3SAS_ADAPTER *ioc, u16 handle)
 {
 	Mpi2SCSITaskManagementRequest_t *mpi_request;
 	u16 smid;
-	struct _sas_device *sas_device;
+	struct _sas_device *sas_device = NULL;
 	struct MPT3SAS_TARGET *sas_target_priv_data = NULL;
 	u64 sas_address = 0;
 	unsigned long flags;
@@ -2913,7 +3026,7 @@ _scsih_tm_tr_send(struct MPT3SAS_ADAPTER *ioc, u16 handle)
 		return;
 
 	spin_lock_irqsave(&ioc->sas_device_lock, flags);
-	sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
+	sas_device = __mpt3sas_get_sdev_by_handle(ioc, handle);
 	if (sas_device && sas_device->starget &&
 	    sas_device->starget->hostdata) {
 		sas_target_priv_data = sas_device->starget->hostdata;
@@ -2947,14 +3060,14 @@ _scsih_tm_tr_send(struct MPT3SAS_ADAPTER *ioc, u16 handle)
 	if (!smid) {
 		delayed_tr = kzalloc(sizeof(*delayed_tr), GFP_ATOMIC);
 		if (!delayed_tr)
-			return;
+			goto out;
 		INIT_LIST_HEAD(&delayed_tr->list);
 		delayed_tr->handle = handle;
 		list_add_tail(&delayed_tr->list, &ioc->delayed_tr_list);
 		dewtprintk(ioc, pr_info(MPT3SAS_FMT
 		    "DELAYED:tr:handle(0x%04x), (open)\n",
 		    ioc->name, handle));
-		return;
+		goto out;
 	}
 
 	dewtprintk(ioc, pr_info(MPT3SAS_FMT
@@ -2968,6 +3081,9 @@ _scsih_tm_tr_send(struct MPT3SAS_ADAPTER *ioc, u16 handle)
 	mpi_request->TaskType = MPI2_SCSITASKMGMT_TASKTYPE_TARGET_RESET;
 	mpt3sas_base_put_smid_hi_priority(ioc, smid);
 	mpt3sas_trigger_master(ioc, MASTER_TRIGGER_DEVICE_REMOVAL);
+out:
+	if (sas_device)
+		sas_device_put(sas_device);
 }
 
 /**
@@ -3818,7 +3934,6 @@ _scsih_scsi_ioc_info(struct MPT3SAS_ADAPTER *ioc, struct scsi_cmnd *scmd,
 	char *desc_scsi_state = ioc->tmp_string;
 	u32 log_info = le32_to_cpu(mpi_reply->IOCLogInfo);
 	struct _sas_device *sas_device = NULL;
-	unsigned long flags;
 	struct scsi_target *starget = scmd->device->sdev_target;
 	struct MPT3SAS_TARGET *priv_target = starget->hostdata;
 	char *device_str = NULL;
@@ -3946,9 +4061,7 @@ _scsih_scsi_ioc_info(struct MPT3SAS_ADAPTER *ioc, struct scsi_cmnd *scmd,
 		pr_warn(MPT3SAS_FMT "\t%s wwid(0x%016llx)\n", ioc->name,
 		    device_str, (unsigned long long)priv_target->sas_address);
 	} else {
-		spin_lock_irqsave(&ioc->sas_device_lock, flags);
-		sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
-		    priv_target->sas_address);
+		sas_device = mpt3sas_get_sdev_from_target(ioc, priv_target);
 		if (sas_device) {
 			pr_warn(MPT3SAS_FMT
 				"\tsas_address(0x%016llx), phy(%d)\n",
@@ -3967,8 +4080,9 @@ _scsih_scsi_ioc_info(struct MPT3SAS_ADAPTER *ioc, struct scsi_cmnd *scmd,
 				  " connector name( %s)\n", ioc->name,
 				  sas_device->enclosure_level,
 				  sas_device->connector_name);
+
+			sas_device_put(sas_device);
 		}
-		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
 	}
 
 	pr_warn(MPT3SAS_FMT
@@ -4020,7 +4134,7 @@ _scsih_turn_on_pfa_led(struct MPT3SAS_ADAPTER *ioc, u16 handle)
 	Mpi2SepRequest_t mpi_request;
 	struct _sas_device *sas_device;
 
-	sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
+	sas_device = mpt3sas_get_sdev_by_handle(ioc, handle);
 	if (!sas_device)
 		return;
 
@@ -4035,7 +4149,7 @@ _scsih_turn_on_pfa_led(struct MPT3SAS_ADAPTER *ioc, u16 handle)
 	    &mpi_request)) != 0) {
 		pr_err(MPT3SAS_FMT "failure at %s:%d/%s()!\n", ioc->name,
 		__FILE__, __LINE__, __func__);
-		return;
+		goto out;
 	}
 	sas_device->pfa_led_on = 1;
 
@@ -4044,8 +4158,10 @@ _scsih_turn_on_pfa_led(struct MPT3SAS_ADAPTER *ioc, u16 handle)
 			"enclosure_processor: ioc_status (0x%04x), loginfo(0x%08x)\n",
 			ioc->name, le16_to_cpu(mpi_reply.IOCStatus),
 		    le32_to_cpu(mpi_reply.IOCLogInfo)));
-		return;
+		goto out;
 	}
+out:
+	sas_device_put(sas_device);
 }
 /**
  * _scsih_turn_off_pfa_led - turn off Fault LED
@@ -4128,19 +4244,17 @@ _scsih_smart_predicted_fault(struct MPT3SAS_ADAPTER *ioc, u16 handle)
 
 	/* only handle non-raid devices */
 	spin_lock_irqsave(&ioc->sas_device_lock, flags);
-	sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
+	sas_device = __mpt3sas_get_sdev_by_handle(ioc, handle);
 	if (!sas_device) {
-		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
-		return;
+		goto out_unlock;
 	}
 	starget = sas_device->starget;
 	sas_target_priv_data = starget->hostdata;
 
 	if ((sas_target_priv_data->flags & MPT_TARGET_FLAGS_RAID_COMPONENT) ||
-	   ((sas_target_priv_data->flags & MPT_TARGET_FLAGS_VOLUME))) {
-		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
-		return;
-	}
+	   ((sas_target_priv_data->flags & MPT_TARGET_FLAGS_VOLUME)))
+		goto out_unlock;
+
 	if (sas_device->enclosure_handle != 0)
 		starget_printk(KERN_INFO, starget, "predicted fault, "
 			"enclosure logical id(0x%016llx), slot(%d)\n",
@@ -4163,7 +4277,7 @@ _scsih_smart_predicted_fault(struct MPT3SAS_ADAPTER *ioc, u16 handle)
 	if (!event_reply) {
 		pr_err(MPT3SAS_FMT "failure at %s:%d/%s()!\n",
 		    ioc->name, __FILE__, __LINE__, __func__);
-		return;
+		goto out;
 	}
 
 	event_reply->Function = MPI2_FUNCTION_EVENT_NOTIFICATION;
@@ -4180,6 +4294,14 @@ _scsih_smart_predicted_fault(struct MPT3SAS_ADAPTER *ioc, u16 handle)
 	event_data->SASAddress = cpu_to_le64(sas_target_priv_data->sas_address);
 	mpt3sas_ctl_add_to_event_log(ioc, event_reply);
 	kfree(event_reply);
+out:
+	if (sas_device)
+		sas_device_put(sas_device);
+	return;
+
+out_unlock:
+	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+	goto out;
 }
 
 /**
@@ -4933,13 +5055,11 @@ _scsih_check_device(struct MPT3SAS_ADAPTER *ioc,
 
 	spin_lock_irqsave(&ioc->sas_device_lock, flags);
 	sas_address = le64_to_cpu(sas_device_pg0.SASAddress);
-	sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
+	sas_device = __mpt3sas_get_sdev_by_addr(ioc,
 	    sas_address);
 
-	if (!sas_device) {
-		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
-		return;
-	}
+	if (!sas_device)
+		goto out_unlock;
 
 	if (unlikely(sas_device->handle != handle)) {
 		starget = sas_device->starget;
@@ -4967,20 +5087,24 @@ _scsih_check_device(struct MPT3SAS_ADAPTER *ioc,
 		pr_err(MPT3SAS_FMT
 			"device is not present handle(0x%04x), flags!!!\n",
 			ioc->name, handle);
-		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
-		return;
+		goto out_unlock;
 	}
 
 	/* check if there were any issues with discovery */
 	if (_scsih_check_access_status(ioc, sas_address, handle,
-	    sas_device_pg0.AccessStatus)) {
-		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
-		return;
-	}
+	    sas_device_pg0.AccessStatus))
+		goto out_unlock;
 
 	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
 	_scsih_ublock_io_device(ioc, sas_address);
+	if (sas_device)
+		sas_device_put(sas_device);
+	return;
 
+out_unlock:
+	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+	if (sas_device)
+		sas_device_put(sas_device);
 }
 
 /**
@@ -5005,7 +5129,6 @@ _scsih_add_device(struct MPT3SAS_ADAPTER *ioc, u16 handle, u8 phy_num,
 	u32 ioc_status;
 	u64 sas_address;
 	u32 device_info;
-	unsigned long flags;
 
 	if ((mpt3sas_config_get_sas_device_pg0(ioc, &mpi_reply, &sas_device_pg0,
 	    MPI2_SAS_DEVICE_PGAD_FORM_HANDLE, handle))) {
@@ -5041,13 +5164,11 @@ _scsih_add_device(struct MPT3SAS_ADAPTER *ioc, u16 handle, u8 phy_num,
 	    sas_device_pg0.AccessStatus))
 		return -1;
 
-	spin_lock_irqsave(&ioc->sas_device_lock, flags);
-	sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
-	    sas_address);
-	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
-
-	if (sas_device)
+	sas_device = mpt3sas_get_sdev_by_addr(ioc, sas_address);
+	if (sas_device) {
+		sas_device_put(sas_device);
 		return -1;
+	}
 
 	sas_device = kzalloc(sizeof(struct _sas_device),
 	    GFP_KERNEL);
@@ -5057,6 +5178,7 @@ _scsih_add_device(struct MPT3SAS_ADAPTER *ioc, u16 handle, u8 phy_num,
 		return 0;
 	}
 
+	kref_init(&sas_device->refcount);
 	sas_device->handle = handle;
 	if (_scsih_get_sas_address(ioc,
 	    le16_to_cpu(sas_device_pg0.ParentDevHandle),
@@ -5098,6 +5220,7 @@ _scsih_add_device(struct MPT3SAS_ADAPTER *ioc, u16 handle, u8 phy_num,
 	else
 		_scsih_sas_device_add(ioc, sas_device);
 
+	sas_device_put(sas_device);
 	return 0;
 }
 
@@ -5506,25 +5629,25 @@ _scsih_sas_device_status_change_event(struct MPT3SAS_ADAPTER *ioc,
 
 	spin_lock_irqsave(&ioc->sas_device_lock, flags);
 	sas_address = le64_to_cpu(event_data->SASAddress);
-	sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
-	    sas_address);
+	sas_device = __mpt3sas_get_sdev_by_addr(ioc, sas_address);
 
-	if (!sas_device || !sas_device->starget) {
-		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
-		return;
-	}
+	if (!sas_device || !sas_device->starget)
+		goto out;
 
 	target_priv_data = sas_device->starget->hostdata;
-	if (!target_priv_data) {
-		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
-		return;
-	}
+	if (!target_priv_data)
+		goto out;
 
 	if (event_data->ReasonCode ==
 	    MPI2_EVENT_SAS_DEV_STAT_RC_INTERNAL_DEVICE_RESET)
 		target_priv_data->tm_busy = 1;
 	else
 		target_priv_data->tm_busy = 0;
+
+out:
+	if (sas_device)
+		sas_device_put(sas_device);
+
 	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
 }
 
@@ -6008,7 +6131,7 @@ _scsih_sas_pd_expose(struct MPT3SAS_ADAPTER *ioc,
 	u16 handle = le16_to_cpu(element->PhysDiskDevHandle);
 
 	spin_lock_irqsave(&ioc->sas_device_lock, flags);
-	sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
+	sas_device = __mpt3sas_get_sdev_by_handle(ioc, handle);
 	if (sas_device) {
 		sas_device->volume_handle = 0;
 		sas_device->volume_wwid = 0;
@@ -6027,6 +6150,8 @@ _scsih_sas_pd_expose(struct MPT3SAS_ADAPTER *ioc,
 	/* exposing raid component */
 	if (starget)
 		starget_for_each_device(starget, NULL, _scsih_reprobe_lun);
+
+	sas_device_put(sas_device);
 }
 
 /**
@@ -6055,7 +6180,7 @@ _scsih_sas_pd_hide(struct MPT3SAS_ADAPTER *ioc,
 		    &volume_wwid);
 
 	spin_lock_irqsave(&ioc->sas_device_lock, flags);
-	sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
+	sas_device = __mpt3sas_get_sdev_by_handle(ioc, handle);
 	if (sas_device) {
 		set_bit(handle, ioc->pd_handles);
 		if (sas_device->starget && sas_device->starget->hostdata) {
@@ -6075,6 +6200,8 @@ _scsih_sas_pd_hide(struct MPT3SAS_ADAPTER *ioc,
 	_scsih_ir_fastpath(ioc, handle, element->PhysDiskNum);
 	if (starget)
 		starget_for_each_device(starget, (void *)1, _scsih_reprobe_lun);
+
+	sas_device_put(sas_device);
 }
 
 /**
@@ -6107,7 +6234,6 @@ _scsih_sas_pd_add(struct MPT3SAS_ADAPTER *ioc,
 	Mpi2EventIrConfigElement_t *element)
 {
 	struct _sas_device *sas_device;
-	unsigned long flags;
 	u16 handle = le16_to_cpu(element->PhysDiskDevHandle);
 	Mpi2ConfigReply_t mpi_reply;
 	Mpi2SasDevicePage0_t sas_device_pg0;
@@ -6117,11 +6243,10 @@ _scsih_sas_pd_add(struct MPT3SAS_ADAPTER *ioc,
 
 	set_bit(handle, ioc->pd_handles);
 
-	spin_lock_irqsave(&ioc->sas_device_lock, flags);
-	sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
-	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+	sas_device = mpt3sas_get_sdev_by_handle(ioc, handle);
 	if (sas_device) {
 		_scsih_ir_fastpath(ioc, handle, element->PhysDiskNum);
+		sas_device_put(sas_device);
 		return;
 	}
 
@@ -6398,7 +6523,6 @@ _scsih_sas_ir_physical_disk_event(struct MPT3SAS_ADAPTER *ioc,
 	u16 handle, parent_handle;
 	u32 state;
 	struct _sas_device *sas_device;
-	unsigned long flags;
 	Mpi2ConfigReply_t mpi_reply;
 	Mpi2SasDevicePage0_t sas_device_pg0;
 	u32 ioc_status;
@@ -6427,12 +6551,12 @@ _scsih_sas_ir_physical_disk_event(struct MPT3SAS_ADAPTER *ioc,
 	case MPI2_RAID_PD_STATE_HOT_SPARE:
 
 		set_bit(handle, ioc->pd_handles);
-		spin_lock_irqsave(&ioc->sas_device_lock, flags);
-		sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
-		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
 
-		if (sas_device)
+		sas_device = mpt3sas_get_sdev_by_handle(ioc, handle);
+		if (sas_device) {
+			sas_device_put(sas_device);
 			return;
+		}
 
 		if ((mpt3sas_config_get_sas_device_pg0(ioc, &mpi_reply,
 		    &sas_device_pg0, MPI2_SAS_DEVICE_PGAD_FORM_HANDLE,
@@ -6906,6 +7030,7 @@ _scsih_remove_unresponding_sas_devices(struct MPT3SAS_ADAPTER *ioc)
 	struct _raid_device *raid_device, *raid_device_next;
 	struct list_head tmp_list;
 	unsigned long flags;
+	LIST_HEAD(head);
 
 	pr_info(MPT3SAS_FMT "removing unresponding devices: start\n",
 	    ioc->name);
@@ -6913,14 +7038,29 @@ _scsih_remove_unresponding_sas_devices(struct MPT3SAS_ADAPTER *ioc)
 	/* removing unresponding end devices */
 	pr_info(MPT3SAS_FMT "removing unresponding devices: end-devices\n",
 	    ioc->name);
+
+	/*
+	 * Iterate, pulling off devices marked as non-responding. We become the
+	 * owner for the reference the list had on any object we prune.
+	 */
+	spin_lock_irqsave(&ioc->sas_device_lock, flags);
 	list_for_each_entry_safe(sas_device, sas_device_next,
-	    &ioc->sas_device_list, list) {
+				 &ioc->sas_device_list, list) {
 		if (!sas_device->responding)
-			mpt3sas_device_remove_by_sas_address(ioc,
-			    sas_device->sas_address);
+			list_move_tail(&sas_device->list, &head);
 		else
 			sas_device->responding = 0;
 	}
+	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+
+	/*
+	 * Now, uninitialize and remove the unresponding devices we pruned.
+	 */
+	list_for_each_entry_safe(sas_device, sas_device_next, &head, list) {
+		_scsih_remove_device(ioc, sas_device);
+		list_del_init(&sas_device->list);
+		sas_device_put(sas_device);
+	}
 
 	/* removing unresponding volumes */
 	if (ioc->ir_firmware) {
@@ -7074,11 +7214,11 @@ _scsih_scan_for_devices_after_reset(struct MPT3SAS_ADAPTER *ioc)
 		}
 		phys_disk_num = pd_pg0.PhysDiskNum;
 		handle = le16_to_cpu(pd_pg0.DevHandle);
-		spin_lock_irqsave(&ioc->sas_device_lock, flags);
-		sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
-		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
-		if (sas_device)
+		sas_device = mpt3sas_get_sdev_by_handle(ioc, handle);
+		if (sas_device) {
+			sas_device_put(sas_device);
 			continue;
+		}
 		if (mpt3sas_config_get_sas_device_pg0(ioc, &mpi_reply,
 		    &sas_device_pg0, MPI2_SAS_DEVICE_PGAD_FORM_HANDLE,
 		    handle) != 0)
@@ -7199,12 +7339,12 @@ _scsih_scan_for_devices_after_reset(struct MPT3SAS_ADAPTER *ioc)
 		if (!(_scsih_is_end_device(
 		    le32_to_cpu(sas_device_pg0.DeviceInfo))))
 			continue;
-		spin_lock_irqsave(&ioc->sas_device_lock, flags);
-		sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
+		sas_device = mpt3sas_get_sdev_by_addr(ioc,
 		    le64_to_cpu(sas_device_pg0.SASAddress));
-		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
-		if (sas_device)
+		if (sas_device) {
+			sas_device_put(sas_device);
 			continue;
+		}
 		parent_handle = le16_to_cpu(sas_device_pg0.ParentDevHandle);
 		if (!_scsih_get_sas_address(ioc, parent_handle, &sas_address)) {
 			pr_info(MPT3SAS_FMT "\tBEFORE adding end device: " \
@@ -7831,6 +7971,48 @@ _scsih_probe_raid(struct MPT3SAS_ADAPTER *ioc)
 	}
 }
 
+static struct _sas_device *get_next_sas_device(struct MPT3SAS_ADAPTER *ioc)
+{
+	struct _sas_device *sas_device = NULL;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ioc->sas_device_lock, flags);
+	if (!list_empty(&ioc->sas_device_init_list)) {
+		sas_device = list_first_entry(&ioc->sas_device_init_list,
+				struct _sas_device, list);
+		sas_device_get(sas_device);
+	}
+	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+
+	return sas_device;
+}
+
+static void sas_device_make_active(struct MPT3SAS_ADAPTER *ioc,
+		struct _sas_device *sas_device)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&ioc->sas_device_lock, flags);
+
+	/*
+	 * Since we dropped the lock during the call to port_add(), we need to
+	 * be careful here that somebody else didn't move or delete this item
+	 * while we were busy with other things.
+	 *
+	 * If it was on the list, we need a put() for the reference the list
+	 * had. Either way, we need a get() for the destination list.
+	 */
+	if (!list_empty(&sas_device->list)) {
+		list_del_init(&sas_device->list);
+		sas_device_put(sas_device);
+	}
+
+	sas_device_get(sas_device);
+	list_add_tail(&sas_device->list, &ioc->sas_device_list);
+
+	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+}
+
 /**
  * _scsih_probe_sas - reporting sas devices to sas transport
  * @ioc: per adapter object
@@ -7840,17 +8022,13 @@ _scsih_probe_raid(struct MPT3SAS_ADAPTER *ioc)
 static void
 _scsih_probe_sas(struct MPT3SAS_ADAPTER *ioc)
 {
-	struct _sas_device *sas_device, *next;
-	unsigned long flags;
-
-	/* SAS Device List */
-	list_for_each_entry_safe(sas_device, next, &ioc->sas_device_init_list,
-	    list) {
+	struct _sas_device *sas_device;
 
+	while ((sas_device = get_next_sas_device(ioc))) {
 		if (!mpt3sas_transport_port_add(ioc, sas_device->handle,
-		    sas_device->sas_address_parent)) {
-			list_del(&sas_device->list);
-			kfree(sas_device);
+				sas_device->sas_address_parent)) {
+			_scsih_sas_device_remove(ioc, sas_device);
+			sas_device_put(sas_device);
 			continue;
 		} else if (!sas_device->starget) {
 			/*
@@ -7861,17 +8039,16 @@ _scsih_probe_sas(struct MPT3SAS_ADAPTER *ioc)
 			 */
 			if (!ioc->is_driver_loading) {
 				mpt3sas_transport_port_remove(ioc,
-				    sas_device->sas_address,
-				    sas_device->sas_address_parent);
-				list_del(&sas_device->list);
-				kfree(sas_device);
+						sas_device->sas_address,
+						sas_device->sas_address_parent);
+				_scsih_sas_device_remove(ioc, sas_device);
+				sas_device_put(sas_device);
 				continue;
 			}
 		}
 
-		spin_lock_irqsave(&ioc->sas_device_lock, flags);
-		list_move_tail(&sas_device->list, &ioc->sas_device_list);
-		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+		sas_device_make_active(ioc, sas_device);
+		sas_device_put(sas_device);
 	}
 }
 
diff --git a/drivers/scsi/mpt3sas/mpt3sas_transport.c b/drivers/scsi/mpt3sas/mpt3sas_transport.c
index 70fd019..6074b11 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_transport.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_transport.c
@@ -734,7 +734,7 @@ mpt3sas_transport_port_add(struct MPT3SAS_ADAPTER *ioc, u16 handle,
 	rphy->identify = mpt3sas_port->remote_identify;
 
 	if (mpt3sas_port->remote_identify.device_type == SAS_END_DEVICE) {
-		sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
+		sas_device = __mpt3sas_get_sdev_by_addr(ioc,
 				    mpt3sas_port->remote_identify.sas_address);
 		if (!sas_device) {
 			dfailprintk(ioc, printk(MPT3SAS_FMT
@@ -750,8 +750,10 @@ mpt3sas_transport_port_add(struct MPT3SAS_ADAPTER *ioc, u16 handle,
 		    ioc->name, __FILE__, __LINE__, __func__);
 	}
 
-	if (mpt3sas_port->remote_identify.device_type == SAS_END_DEVICE)
+	if (mpt3sas_port->remote_identify.device_type == SAS_END_DEVICE) {
 		sas_device->pend_sas_rphy_add = 0;
+		sas_device_put(sas_device);
+	}
 
 	if ((ioc->logging_level & MPT_DEBUG_TRANSPORT))
 		dev_printk(KERN_INFO, &rphy->dev,
@@ -1324,15 +1326,17 @@ _transport_get_enclosure_identifier(struct sas_rphy *rphy, u64 *identifier)
 	int rc;
 
 	spin_lock_irqsave(&ioc->sas_device_lock, flags);
-	sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
+	sas_device = __mpt3sas_get_sdev_by_addr(ioc,
 	    rphy->identify.sas_address);
 	if (sas_device) {
 		*identifier = sas_device->enclosure_logical_id;
 		rc = 0;
+		sas_device_put(sas_device);
 	} else {
 		*identifier = 0;
 		rc = -ENXIO;
 	}
+
 	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
 	return rc;
 }
@@ -1352,12 +1356,14 @@ _transport_get_bay_identifier(struct sas_rphy *rphy)
 	int rc;
 
 	spin_lock_irqsave(&ioc->sas_device_lock, flags);
-	sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
+	sas_device = __mpt3sas_get_sdev_by_addr(ioc,
 	    rphy->identify.sas_address);
-	if (sas_device)
+	if (sas_device) {
 		rc = sas_device->slot;
-	else
+		sas_device_put(sas_device);
+	} else {
 		rc = -ENXIO;
+	}
 	spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
 	return rc;
 }
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH-v2 2/2] mpt3sas: Refcount fw_events and fix unsafe list usage
  2015-08-30  7:54 [PATCH-v2 0/2] mpt3sas: Reference counting fixes from for-next mpt2sas Nicholas A. Bellinger
  2015-08-30  7:54 ` [PATCH-v2 1/2] mpt3sas: Refcount sas_device objects and fix unsafe list usage Nicholas A. Bellinger
@ 2015-08-30  7:54 ` Nicholas A. Bellinger
  2015-09-04 18:47 ` [PATCH-v2 0/2] mpt3sas: Reference counting fixes from for-next mpt2sas Nicholas A. Bellinger
  2 siblings, 0 replies; 11+ messages in thread
From: Nicholas A. Bellinger @ 2015-08-30  7:54 UTC (permalink / raw)
  To: linux-scsi
  Cc: linux-kernel, James Bottomley, Calvin Owens, Christoph Hellwig,
	Sreekanth Reddy, MPT-FusionLinux.pdl, kernel-team,
	Nicholas Bellinger

From: Nicholas Bellinger <nab@linux-iscsi.org>

The fw_event_work struct is concurrently referenced at shutdown, so
add a refcount to protect it, and refactor the code to use it.

Additionally, refactor _scsih_fw_event_cleanup_queue() such that it
no longer iterates over the list without holding the lock, since
_firmware_event_work() concurrently deletes items from the list.

This patch is a port of Calvin's PATCH-v4 for mpt2sas code.

Cc: Calvin Owens <calvinowens@fb.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Sreekanth Reddy <sreekanth.reddy@avagotech.com>
Cc: MPT-FusionLinux.pdl <MPT-FusionLinux.pdl@avagotech.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
---
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 116 ++++++++++++++++++++++++++++-------
 1 file changed, 94 insertions(+), 22 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 897153b..0431cd0 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -175,6 +175,7 @@ struct sense_info {
  * @VP_ID: virtual port id
  * @ignore: flag meaning this event has been marked to ignore
  * @event: firmware event MPI2_EVENT_XXX defined in mpt2_ioc.h
+ * @refcount: reference count for fw_event_work
  * @event_data: reply event data payload follows
  *
  * This object stored on ioc->fw_event_list.
@@ -191,9 +192,37 @@ struct fw_event_work {
 	u8			VP_ID;
 	u8			ignore;
 	u16			event;
+	struct kref		refcount;
 	char			event_data[0] __aligned(4);
 };
 
+static void fw_event_work_free(struct kref *r)
+{
+	kfree(container_of(r, struct fw_event_work, refcount));
+}
+
+static void fw_event_work_get(struct fw_event_work *fw_work)
+{
+	kref_get(&fw_work->refcount);
+}
+
+static void fw_event_work_put(struct fw_event_work *fw_work)
+{
+	kref_put(&fw_work->refcount, fw_event_work_free);
+}
+
+static struct fw_event_work *alloc_fw_event_work(int len)
+{
+	struct fw_event_work *fw_event;
+
+	fw_event = kzalloc(sizeof(*fw_event) + len, GFP_ATOMIC);
+	if (!fw_event)
+		return NULL;
+
+	kref_init(&fw_event->refcount);
+	return fw_event;
+}
+
 /* raid transport support */
 static struct raid_template *mpt3sas_raid_template;
 
@@ -2594,32 +2623,36 @@ _scsih_fw_event_add(struct MPT3SAS_ADAPTER *ioc, struct fw_event_work *fw_event)
 		return;
 
 	spin_lock_irqsave(&ioc->fw_event_lock, flags);
+	fw_event_work_get(fw_event);
 	INIT_LIST_HEAD(&fw_event->list);
 	list_add_tail(&fw_event->list, &ioc->fw_event_list);
 	INIT_WORK(&fw_event->work, _firmware_event_work);
+	fw_event_work_get(fw_event);
 	queue_work(ioc->firmware_event_thread, &fw_event->work);
 	spin_unlock_irqrestore(&ioc->fw_event_lock, flags);
 }
 
 /**
- * _scsih_fw_event_free - delete fw_event
+ * _scsih_fw_event_del_from_list - delete fw_event from the list
  * @ioc: per adapter object
  * @fw_event: object describing the event
  * Context: This function will acquire ioc->fw_event_lock.
  *
- * This removes firmware event object from link list, frees associated memory.
+ * If the fw_event is on the fw_event_list, remove it and do a put.
  *
  * Return nothing.
  */
 static void
-_scsih_fw_event_free(struct MPT3SAS_ADAPTER *ioc, struct fw_event_work
+_scsih_fw_event_del_from_list(struct MPT3SAS_ADAPTER *ioc, struct fw_event_work
 	*fw_event)
 {
 	unsigned long flags;
 
 	spin_lock_irqsave(&ioc->fw_event_lock, flags);
-	list_del(&fw_event->list);
-	kfree(fw_event);
+	if (!list_empty(&fw_event->list)) {
+		list_del_init(&fw_event->list);
+		fw_event_work_put(fw_event);
+	}
 	spin_unlock_irqrestore(&ioc->fw_event_lock, flags);
 }
 
@@ -2639,14 +2672,14 @@ mpt3sas_send_trigger_data_event(struct MPT3SAS_ADAPTER *ioc,
 
 	if (ioc->is_driver_loading)
 		return;
-	fw_event = kzalloc(sizeof(*fw_event) + sizeof(*event_data),
-			   GFP_ATOMIC);
+	fw_event = alloc_fw_event_work(sizeof(*event_data));
 	if (!fw_event)
 		return;
 	fw_event->event = MPT3SAS_PROCESS_TRIGGER_DIAG;
 	fw_event->ioc = ioc;
 	memcpy(fw_event->event_data, event_data, sizeof(*event_data));
 	_scsih_fw_event_add(ioc, fw_event);
+	fw_event_work_put(fw_event);
 }
 
 /**
@@ -2662,12 +2695,13 @@ _scsih_error_recovery_delete_devices(struct MPT3SAS_ADAPTER *ioc)
 
 	if (ioc->is_driver_loading)
 		return;
-	fw_event = kzalloc(sizeof(struct fw_event_work), GFP_ATOMIC);
+	fw_event = alloc_fw_event_work(0);
 	if (!fw_event)
 		return;
 	fw_event->event = MPT3SAS_REMOVE_UNRESPONDING_DEVICES;
 	fw_event->ioc = ioc;
 	_scsih_fw_event_add(ioc, fw_event);
+	fw_event_work_put(fw_event);
 }
 
 /**
@@ -2681,12 +2715,29 @@ mpt3sas_port_enable_complete(struct MPT3SAS_ADAPTER *ioc)
 {
 	struct fw_event_work *fw_event;
 
-	fw_event = kzalloc(sizeof(struct fw_event_work), GFP_ATOMIC);
+	fw_event = alloc_fw_event_work(0);
 	if (!fw_event)
 		return;
 	fw_event->event = MPT3SAS_PORT_ENABLE_COMPLETE;
 	fw_event->ioc = ioc;
 	_scsih_fw_event_add(ioc, fw_event);
+	fw_event_work_put(fw_event);
+}
+
+static struct fw_event_work *dequeue_next_fw_event(struct MPT3SAS_ADAPTER *ioc)
+{
+	unsigned long flags;
+	struct fw_event_work *fw_event = NULL;
+
+	spin_lock_irqsave(&ioc->fw_event_lock, flags);
+	if (!list_empty(&ioc->fw_event_list)) {
+		fw_event = list_first_entry(&ioc->fw_event_list,
+				struct fw_event_work, list);
+		list_del_init(&fw_event->list);
+	}
+	spin_unlock_irqrestore(&ioc->fw_event_lock, flags);
+
+	return fw_event;
 }
 
 /**
@@ -2701,17 +2752,25 @@ mpt3sas_port_enable_complete(struct MPT3SAS_ADAPTER *ioc)
 static void
 _scsih_fw_event_cleanup_queue(struct MPT3SAS_ADAPTER *ioc)
 {
-	struct fw_event_work *fw_event, *next;
+	struct fw_event_work *fw_event;
 
 	if (list_empty(&ioc->fw_event_list) ||
 	     !ioc->firmware_event_thread || in_interrupt())
 		return;
 
-	list_for_each_entry_safe(fw_event, next, &ioc->fw_event_list, list) {
-		if (cancel_delayed_work_sync(&fw_event->delayed_work)) {
-			_scsih_fw_event_free(ioc, fw_event);
-			continue;
-		}
+	while ((fw_event = dequeue_next_fw_event(ioc))) {
+		/*
+		 * Wait on the fw_event to complete. If this returns 1, then
+		 * the event was never executed, and we need a put for the
+		 * reference the delayed_work had on the fw_event.
+		 *
+		 * If it did execute, we wait for it to finish, and the put will
+		 * happen from _firmware_event_work()
+		 */
+		if (cancel_work_sync(&fw_event->work))
+			fw_event_work_put(fw_event);
+
+		fw_event_work_put(fw_event);
 	}
 }
 
@@ -4214,13 +4273,14 @@ _scsih_send_event_to_turn_on_pfa_led(struct MPT3SAS_ADAPTER *ioc, u16 handle)
 {
 	struct fw_event_work *fw_event;
 
-	fw_event = kzalloc(sizeof(struct fw_event_work), GFP_ATOMIC);
+	fw_event = alloc_fw_event_work(0);
 	if (!fw_event)
 		return;
 	fw_event->event = MPT3SAS_TURN_ON_PFA_LED;
 	fw_event->device_handle = handle;
 	fw_event->ioc = ioc;
 	_scsih_fw_event_add(ioc, fw_event);
+	fw_event_work_put(fw_event);
 }
 
 /**
@@ -7436,10 +7496,11 @@ mpt3sas_scsih_reset_handler(struct MPT3SAS_ADAPTER *ioc, int reset_phase)
 static void
 _mpt3sas_fw_work(struct MPT3SAS_ADAPTER *ioc, struct fw_event_work *fw_event)
 {
+	_scsih_fw_event_del_from_list(ioc, fw_event);
+
 	/* the queue is being flushed so ignore this event */
-	if (ioc->remove_host ||
-	    ioc->pci_error_recovery) {
-		_scsih_fw_event_free(ioc, fw_event);
+	if (ioc->remove_host || ioc->pci_error_recovery) {
+		fw_event_work_put(fw_event);
 		return;
 	}
 
@@ -7450,8 +7511,17 @@ _mpt3sas_fw_work(struct MPT3SAS_ADAPTER *ioc, struct fw_event_work *fw_event)
 			fw_event->event_data);
 		break;
 	case MPT3SAS_REMOVE_UNRESPONDING_DEVICES:
-		while (scsi_host_in_recovery(ioc->shost) || ioc->shost_recovery)
+		while (scsi_host_in_recovery(ioc->shost) ||
+				ioc->shost_recovery) {
+			/*
+			 * If we're unloading, bail. Otherwise, this can become
+			 * an infinite loop.
+			 */
+			if (ioc->remove_host)
+				goto out;
+
 			ssleep(1);
+		}
 		_scsih_remove_unresponding_sas_devices(ioc);
 		_scsih_scan_for_devices_after_reset(ioc);
 		break;
@@ -7496,7 +7566,8 @@ _mpt3sas_fw_work(struct MPT3SAS_ADAPTER *ioc, struct fw_event_work *fw_event)
 		_scsih_sas_ir_operation_status_event(ioc, fw_event);
 		break;
 	}
-	_scsih_fw_event_free(ioc, fw_event);
+out:
+	fw_event_work_put(fw_event);
 }
 
 /**
@@ -7612,7 +7683,7 @@ mpt3sas_scsih_event_callback(struct MPT3SAS_ADAPTER *ioc, u8 msix_index,
 	}
 
 	sz = le16_to_cpu(mpi_reply->EventDataLength) * 4;
-	fw_event = kzalloc(sizeof(*fw_event) + sz, GFP_ATOMIC);
+	fw_event = alloc_fw_event_work(sz);
 	if (!fw_event) {
 		pr_err(MPT3SAS_FMT "failure at %s:%d/%s()!\n",
 		    ioc->name, __FILE__, __LINE__, __func__);
@@ -7625,6 +7696,7 @@ mpt3sas_scsih_event_callback(struct MPT3SAS_ADAPTER *ioc, u8 msix_index,
 	fw_event->VP_ID = mpi_reply->VP_ID;
 	fw_event->event = event;
 	_scsih_fw_event_add(ioc, fw_event);
+	fw_event_work_put(fw_event);
 	return 1;
 }
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH-v2 0/2] mpt3sas: Reference counting fixes from for-next mpt2sas
  2015-08-30  7:54 [PATCH-v2 0/2] mpt3sas: Reference counting fixes from for-next mpt2sas Nicholas A. Bellinger
  2015-08-30  7:54 ` [PATCH-v2 1/2] mpt3sas: Refcount sas_device objects and fix unsafe list usage Nicholas A. Bellinger
  2015-08-30  7:54 ` [PATCH-v2 2/2] mpt3sas: Refcount fw_events " Nicholas A. Bellinger
@ 2015-09-04 18:47 ` Nicholas A. Bellinger
  2 siblings, 0 replies; 11+ messages in thread
From: Nicholas A. Bellinger @ 2015-09-04 18:47 UTC (permalink / raw)
  To: Sreekanth Reddy
  Cc: linux-scsi, linux-kernel, James Bottomley, Calvin Owens,
	Christoph Hellwig, MPT-FusionLinux.pdl, nab, Martin K. Petersen

Hi Sreekanth,

(Adding MKP CC')

On Sun, 2015-08-30 at 07:54 +0000, Nicholas A. Bellinger wrote:
> From: Nicholas Bellinger <nab@linux-iscsi.org>
> 
> Hi all,
> 
> This series is a mpt3sas LLD forward port of Calvin Owens' for-next
> reference counting bugfix series for mpt2sas LLD code.
> 
> His latest patch series can be found here:
> 
> [PATCH v4 0/2] Fixes for memory corruption in mpt2sas
> http://marc.info/?l=linux-scsi&m=143951695904115&w=2
> 
> The differences between mpt2sas + mpt3sas in this area are very
> small, and the changes are required to address a class of long
> standing NULL pointer dereference bugs through-out both LLDs.
> 
> With Calvin's changes in place for mpt3sas, the original NULL
> pointer dereference OOPsen during probe with a failed HDD appear
> to be resolved, and so far no new regressions have been reported
> with -v1 series code.
> 
> The -v1 series code for mpt3sas has been tested on v3.14.47 with
> 40x SAS3008 HBAs, with three preceeding upstream mpt3sas patches:
> 
> 4dc06fd mpt3sas: delay scsi_add_host call to work with scsi-mq
> 35b6236 mpt3sas: combine fw_event_work and its event_data
> 62c4da4 mpt3sas: correct scsi_{target,device} hostdata allocation
> 
> Please review.
> 
> --nab
> 
> -v2 changes:
>   - Fix _scsih_block_io_device() v4.3-rc0 brekage
>   - Fix mpt3sas_transport_port_add() v4.3-rc0 breakage
> 
> Nicholas Bellinger (2):
>   mpt3sas: Refcount sas_device objects and fix unsafe list usage
>   mpt3sas: Refcount fw_events and fix unsafe list usage
> 
>  drivers/scsi/mpt3sas/mpt3sas_base.h      |  25 +-
>  drivers/scsi/mpt3sas/mpt3sas_scsih.c     | 595 ++++++++++++++++++++++---------
>  drivers/scsi/mpt3sas/mpt3sas_transport.c |  18 +-
>  3 files changed, 458 insertions(+), 180 deletions(-)
> 

Have you been able to verify this port of Calvin's changes to mpt3sas
code..?

On my end, we're up to 60x machines with SAS3008 HBAs w/ 12 HDDs each
running the original -v1 series on v3.14.47 code.

They have been running continuous I/O stress tests and node reboot
tests, and after two weeks we've still not run into any of the original
NULL pointer dereference OOPsen, or any other new regressions.

So at this point I think they are looking safe to merge for -rc1.

Can you please give your Reviewed-by -> Acked-by..?

Thank you,

--nab


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH-v2 1/2] mpt3sas: Refcount sas_device objects and fix unsafe list usage
  2015-08-30  7:54 ` [PATCH-v2 1/2] mpt3sas: Refcount sas_device objects and fix unsafe list usage Nicholas A. Bellinger
@ 2015-09-08 11:55   ` Sreekanth Reddy
  2015-09-09  6:21     ` Nicholas A. Bellinger
  2015-09-09 14:29     ` Chaitra Basappa
  0 siblings, 2 replies; 11+ messages in thread
From: Sreekanth Reddy @ 2015-09-08 11:55 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: linux-scsi, linux-kernel, James Bottomley, Calvin Owens,
	Christoph Hellwig, MPT-FusionLinux.pdl, kernel-team,
	Nicholas Bellinger, Chaitra Basappa

On Sun, Aug 30, 2015 at 1:24 PM, Nicholas A. Bellinger
<nab@daterainc.com> wrote:
> From: Nicholas Bellinger <nab@linux-iscsi.org>
>
> These objects can be referenced concurrently throughout the driver, we
> need a way to make sure threads can't delete them out from under
> each other. This patch adds the refcount, and refactors the code to use
> it.
>
> Additionally, we cannot iterate over the sas_device_list without
> holding the lock, or we risk corrupting random memory if items are
> added or deleted as we iterate. This patch refactors _scsih_probe_sas()
> to use the sas_device_list in a safe way.
>
> This patch is a port of Calvin's PATCH-v4 for mpt2sas code, atop
> mpt3sas changes in scsi.git/for-next.
>
> Cc: Calvin Owens <calvinowens@fb.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Sreekanth Reddy <sreekanth.reddy@avagotech.com>
> Cc: MPT-FusionLinux.pdl <MPT-FusionLinux.pdl@avagotech.com>
> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
> ---
>  drivers/scsi/mpt3sas/mpt3sas_base.h      |  25 +-
>  drivers/scsi/mpt3sas/mpt3sas_scsih.c     | 479 +++++++++++++++++++++----------
>  drivers/scsi/mpt3sas/mpt3sas_transport.c |  18 +-
>  3 files changed, 364 insertions(+), 158 deletions(-)
>
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h b/drivers/scsi/mpt3sas/mpt3sas_base.h
> index f0e462b..9a73c8b 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_base.h
> +++ b/drivers/scsi/mpt3sas/mpt3sas_base.h
> @@ -248,6 +248,7 @@ struct Mpi2ManufacturingPage11_t {
>   * @flags: MPT_TARGET_FLAGS_XXX flags
>   * @deleted: target flaged for deletion
>   * @tm_busy: target is busy with TM request.
> + * @sdev: The sas_device associated with this target
>   */
>  struct MPT3SAS_TARGET {
>         struct scsi_target *starget;
> @@ -257,6 +258,7 @@ struct MPT3SAS_TARGET {
>         u32     flags;
>         u8      deleted;
>         u8      tm_busy;
> +       struct  _sas_device *sdev;
>  };
>
>
> @@ -335,6 +337,9 @@ struct _internal_cmd {
>   * @pfa_led_on: flag for PFA LED status
>   * @pend_sas_rphy_add: flag to check if device is in sas_rphy_add()
>   *     addition routine.
> + * @enclosure_level: used for enclosure services
> + * @connector_name: ASCII value from pg0.ConnectorName
> + * @refcount: reference count for deletion
>   */
>  struct _sas_device {
>         struct list_head list;
> @@ -358,8 +363,24 @@ struct _sas_device {
>         u8      pend_sas_rphy_add;
>         u8      enclosure_level;
>         u8      connector_name[4];
> +       struct kref refcount;
>  };
>
> +static inline void sas_device_get(struct _sas_device *s)
> +{
> +       kref_get(&s->refcount);
> +}
> +
> +static inline void sas_device_free(struct kref *r)
> +{
> +       kfree(container_of(r, struct _sas_device, refcount));
> +}
> +
> +static inline void sas_device_put(struct _sas_device *s)
> +{
> +       kref_put(&s->refcount, sas_device_free);
> +}
> +
>  /**
>   * struct _raid_device - raid volume link list
>   * @list: sas device list
> @@ -1090,7 +1111,9 @@ struct _sas_node *mpt3sas_scsih_expander_find_by_handle(
>         struct MPT3SAS_ADAPTER *ioc, u16 handle);
>  struct _sas_node *mpt3sas_scsih_expander_find_by_sas_address(
>         struct MPT3SAS_ADAPTER *ioc, u64 sas_address);
> -struct _sas_device *mpt3sas_scsih_sas_device_find_by_sas_address(
> +struct _sas_device *mpt3sas_get_sdev_by_addr(
> +        struct MPT3SAS_ADAPTER *ioc, u64 sas_address);
> +struct _sas_device *__mpt3sas_get_sdev_by_addr(
>         struct MPT3SAS_ADAPTER *ioc, u64 sas_address);
>
>  void mpt3sas_port_enable_complete(struct MPT3SAS_ADAPTER *ioc);
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> index 8ccef38..897153b 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> @@ -518,8 +518,61 @@ _scsih_determine_boot_device(struct MPT3SAS_ADAPTER *ioc,
>         }
>  }
>
> +static struct _sas_device *
> +__mpt3sas_get_sdev_from_target(struct MPT3SAS_ADAPTER *ioc,
> +       struct MPT3SAS_TARGET *tgt_priv)
> +{
> +       struct _sas_device *ret;
> +
> +       assert_spin_locked(&ioc->sas_device_lock);
> +
> +       ret = tgt_priv->sdev;
> +       if (ret)
> +               sas_device_get(ret);
> +
> +       return ret;
> +}
> +
> +static struct _sas_device *
> +mpt3sas_get_sdev_from_target(struct MPT3SAS_ADAPTER *ioc,
> +       struct MPT3SAS_TARGET *tgt_priv)
> +{
> +       struct _sas_device *ret;
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&ioc->sas_device_lock, flags);
> +       ret = __mpt3sas_get_sdev_from_target(ioc, tgt_priv);
> +       spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> +
> +       return ret;
> +}
> +
> +struct _sas_device *
> +__mpt3sas_get_sdev_by_addr(struct MPT3SAS_ADAPTER *ioc,
> +       u64 sas_address)
> +{
> +       struct _sas_device *sas_device;
> +
> +       assert_spin_locked(&ioc->sas_device_lock);
> +
> +       list_for_each_entry(sas_device, &ioc->sas_device_list, list)
> +               if (sas_device->sas_address == sas_address)
> +                       goto found_device;
> +
> +       list_for_each_entry(sas_device, &ioc->sas_device_init_list, list)
> +               if (sas_device->sas_address == sas_address)
> +                       goto found_device;
> +
> +       return NULL;
> +
> +found_device:
> +       sas_device_get(sas_device);
> +       return sas_device;
> +}
> +
> +
>  /**
> - * mpt3sas_scsih_sas_device_find_by_sas_address - sas device search
> + * mpt3sas_get_sdev_by_addr - sas device search
>   * @ioc: per adapter object
>   * @sas_address: sas address
>   * Context: Calling function should acquire ioc->sas_device_lock
> @@ -528,24 +581,44 @@ _scsih_determine_boot_device(struct MPT3SAS_ADAPTER *ioc,
>   * object.
>   */
>  struct _sas_device *
> -mpt3sas_scsih_sas_device_find_by_sas_address(struct MPT3SAS_ADAPTER *ioc,
> +mpt3sas_get_sdev_by_addr(struct MPT3SAS_ADAPTER *ioc,
>         u64 sas_address)
>  {
>         struct _sas_device *sas_device;
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&ioc->sas_device_lock, flags);
> +       sas_device = __mpt3sas_get_sdev_by_addr(ioc,
> +                       sas_address);
> +       spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> +
> +       return sas_device;
> +}
> +
> +static struct _sas_device *
> +__mpt3sas_get_sdev_by_handle(struct MPT3SAS_ADAPTER *ioc, u16 handle)
> +{
> +       struct _sas_device *sas_device;
> +
> +       assert_spin_locked(&ioc->sas_device_lock);
>
>         list_for_each_entry(sas_device, &ioc->sas_device_list, list)
> -               if (sas_device->sas_address == sas_address)
> -                       return sas_device;
> +               if (sas_device->handle == handle)
> +                       goto found_device;
>
>         list_for_each_entry(sas_device, &ioc->sas_device_init_list, list)
> -               if (sas_device->sas_address == sas_address)
> -                       return sas_device;
> +               if (sas_device->handle == handle)
> +                       goto found_device;
>
>         return NULL;
> +
> +found_device:
> +       sas_device_get(sas_device);
> +       return sas_device;
>  }
>
>  /**
> - * _scsih_sas_device_find_by_handle - sas device search
> + * mpt3sas_get_sdev_by_handle - sas device search
>   * @ioc: per adapter object
>   * @handle: sas device handle (assigned by firmware)
>   * Context: Calling function should acquire ioc->sas_device_lock
> @@ -554,19 +627,16 @@ mpt3sas_scsih_sas_device_find_by_sas_address(struct MPT3SAS_ADAPTER *ioc,
>   * object.
>   */
>  static struct _sas_device *
> -_scsih_sas_device_find_by_handle(struct MPT3SAS_ADAPTER *ioc, u16 handle)
> +mpt3sas_get_sdev_by_handle(struct MPT3SAS_ADAPTER *ioc, u16 handle)
>  {
>         struct _sas_device *sas_device;
> +       unsigned long flags;
>
> -       list_for_each_entry(sas_device, &ioc->sas_device_list, list)
> -               if (sas_device->handle == handle)
> -                       return sas_device;
> -
> -       list_for_each_entry(sas_device, &ioc->sas_device_init_list, list)
> -               if (sas_device->handle == handle)
> -                       return sas_device;
> +       spin_lock_irqsave(&ioc->sas_device_lock, flags);
> +       sas_device = __mpt3sas_get_sdev_by_handle(ioc, handle);
> +       spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>
> -       return NULL;
> +       return sas_device;
>  }
>
>  /**
> @@ -575,7 +645,7 @@ _scsih_sas_device_find_by_handle(struct MPT3SAS_ADAPTER *ioc, u16 handle)
>   * @sas_device: the sas_device object
>   * Context: This function will acquire ioc->sas_device_lock.
>   *
> - * Removing object and freeing associated memory from the ioc->sas_device_list.
> + * If sas_device is on the list, remove it and decrement its reference count.
>   */
>  static void
>  _scsih_sas_device_remove(struct MPT3SAS_ADAPTER *ioc,
> @@ -601,10 +671,15 @@ _scsih_sas_device_remove(struct MPT3SAS_ADAPTER *ioc,
>                    "removing enclosure level(0x%04x), connector name( %s)\n",
>                    ioc->name, sas_device->enclosure_level,
>                    sas_device->connector_name);
> -
> +       /*
> +        * The lock serializes access to the list, but we still need to verify
> +        * that nobody removed the entry while we were waiting on the lock.
> +        */
>         spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -       list_del(&sas_device->list);
> -       kfree(sas_device);
> +       if (!list_empty(&sas_device->list)) {
> +               list_del_init(&sas_device->list);
> +               sas_device_put(sas_device);
> +       }
>         spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>  }
>
> @@ -625,12 +700,17 @@ _scsih_device_remove_by_handle(struct MPT3SAS_ADAPTER *ioc, u16 handle)
>                 return;
>
>         spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -       sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
> -       if (sas_device)
> -               list_del(&sas_device->list);
> +       sas_device = __mpt3sas_get_sdev_by_handle(ioc, handle);
> +       if (sas_device) {
> +               list_del_init(&sas_device->list);
> +               sas_device_put(sas_device);
> +       }
>         spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> -       if (sas_device)
> +
> +       if (sas_device) {
>                 _scsih_remove_device(ioc, sas_device);
> +               sas_device_put(sas_device);
> +       }
>  }
>
>  /**
> @@ -651,13 +731,17 @@ mpt3sas_device_remove_by_sas_address(struct MPT3SAS_ADAPTER *ioc,
>                 return;
>
>         spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -       sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> -           sas_address);
> -       if (sas_device)
> -               list_del(&sas_device->list);
> +       sas_device = __mpt3sas_get_sdev_by_addr(ioc, sas_address);
> +       if (sas_device) {
> +               list_del_init(&sas_device->list);
> +               sas_device_put(sas_device);
> +       }
>         spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> -       if (sas_device)
> +
> +       if (sas_device) {
>                 _scsih_remove_device(ioc, sas_device);
> +               sas_device_put(sas_device);
> +       }
>  }
>
>  /**
> @@ -692,6 +776,7 @@ _scsih_sas_device_add(struct MPT3SAS_ADAPTER *ioc,
>                     sas_device->enclosure_level, sas_device->connector_name));
>
>         spin_lock_irqsave(&ioc->sas_device_lock, flags);
> +       sas_device_get(sas_device);
>         list_add_tail(&sas_device->list, &ioc->sas_device_list);
>         spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>
> @@ -745,6 +830,7 @@ _scsih_sas_device_init_add(struct MPT3SAS_ADAPTER *ioc,
>                     sas_device->connector_name));
>
>         spin_lock_irqsave(&ioc->sas_device_lock, flags);
> +       sas_device_get(sas_device);
>         list_add_tail(&sas_device->list, &ioc->sas_device_init_list);
>         _scsih_determine_boot_device(ioc, sas_device, 0);
>         spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> @@ -1123,12 +1209,15 @@ _scsih_change_queue_depth(struct scsi_device *sdev, int qdepth)
>                 goto not_sata;
>         if ((sas_target_priv_data->flags & MPT_TARGET_FLAGS_VOLUME))
>                 goto not_sata;
> +
>         spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -       sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> -          sas_device_priv_data->sas_target->sas_address);
> -       if (sas_device && sas_device->device_info &
> -           MPI2_SAS_DEVICE_INFO_SATA_DEVICE)
> -               max_depth = MPT3SAS_SATA_QUEUE_DEPTH;
> +       sas_device = __mpt3sas_get_sdev_from_target(ioc, sas_target_priv_data);
> +       if (sas_device) {
> +               if (sas_device->device_info & MPI2_SAS_DEVICE_INFO_SATA_DEVICE)
> +                       max_depth = MPT3SAS_SATA_QUEUE_DEPTH;
> +
> +               sas_device_put(sas_device);
> +       }
>         spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>
>   not_sata:
> @@ -1185,12 +1274,13 @@ _scsih_target_alloc(struct scsi_target *starget)
>         /* sas/sata devices */
>         spin_lock_irqsave(&ioc->sas_device_lock, flags);
>         rphy = dev_to_rphy(starget->dev.parent);
> -       sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> +       sas_device = __mpt3sas_get_sdev_by_addr(ioc,
>            rphy->identify.sas_address);
>
>         if (sas_device) {
>                 sas_target_priv_data->handle = sas_device->handle;
>                 sas_target_priv_data->sas_address = sas_device->sas_address;
> +               sas_target_priv_data->sdev = sas_device;
>                 sas_device->starget = starget;
>                 sas_device->id = starget->id;
>                 sas_device->channel = starget->channel;
> @@ -1240,13 +1330,21 @@ _scsih_target_destroy(struct scsi_target *starget)
>
>         spin_lock_irqsave(&ioc->sas_device_lock, flags);
>         rphy = dev_to_rphy(starget->dev.parent);
> -       sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> -          rphy->identify.sas_address);
> +       sas_device = __mpt3sas_get_sdev_from_target(ioc, sas_target_priv_data);
>         if (sas_device && (sas_device->starget == starget) &&
>             (sas_device->id == starget->id) &&
>             (sas_device->channel == starget->channel))
>                 sas_device->starget = NULL;
>
> +       if (sas_device) {
> +               /*
> +                * Corresponding get() is in _scsih_target_alloc()
> +                */
> +               sas_target_priv_data->sdev = NULL;
> +               sas_device_put(sas_device);
> +
> +               sas_device_put(sas_device);
> +       }
>         spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>
>   out:
> @@ -1302,7 +1400,7 @@ _scsih_slave_alloc(struct scsi_device *sdev)
>
>         if (!(sas_target_priv_data->flags & MPT_TARGET_FLAGS_VOLUME)) {
>                 spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -               sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> +               sas_device = __mpt3sas_get_sdev_by_addr(ioc,
>                                         sas_target_priv_data->sas_address);
>                 if (sas_device && (sas_device->starget == NULL)) {
>                         sdev_printk(KERN_INFO, sdev,
> @@ -1310,6 +1408,10 @@ _scsih_slave_alloc(struct scsi_device *sdev)
>                                 __func__, __LINE__);
>                         sas_device->starget = starget;
>                 }
> +
> +               if (sas_device)
> +                       sas_device_put(sas_device);
> +
>                 spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>         }
>
> @@ -1344,10 +1446,14 @@ _scsih_slave_destroy(struct scsi_device *sdev)
>
>         if (!(sas_target_priv_data->flags & MPT_TARGET_FLAGS_VOLUME)) {
>                 spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -               sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> -                  sas_target_priv_data->sas_address);
> +               sas_device = __mpt3sas_get_sdev_from_target(ioc,
> +                               sas_target_priv_data);
>                 if (sas_device && !sas_target_priv_data->num_luns)
>                         sas_device->starget = NULL;
> +
> +               if (sas_device)
> +                       sas_device_put(sas_device);
> +
>                 spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>         }
>
> @@ -1783,7 +1889,7 @@ _scsih_slave_configure(struct scsi_device *sdev)
>         }
>
>         spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -       sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> +       sas_device = __mpt3sas_get_sdev_by_addr(ioc,
>            sas_device_priv_data->sas_target->sas_address);
>         if (!sas_device) {
>                 spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> @@ -1823,12 +1929,12 @@ _scsih_slave_configure(struct scsi_device *sdev)
>                      ds, sas_device->enclosure_level,
>                      sas_device->connector_name);
>
> +       sas_device_put(sas_device);
>         spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>
>         if (!ssp_target)
>                 _scsih_display_sata_capabilities(ioc, handle, sdev);
>
> -
>         _scsih_change_queue_depth(sdev, qdepth);
>
>         if (ssp_target) {
> @@ -2219,8 +2325,7 @@ _scsih_tm_display_info(struct MPT3SAS_ADAPTER *ioc, struct scsi_cmnd *scmd)
>                     device_str, (unsigned long long)priv_target->sas_address);
>         } else {
>                 spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -               sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> -                   priv_target->sas_address);
> +               sas_device = __mpt3sas_get_sdev_from_target(ioc, priv_target);
>                 if (sas_device) {
>                         if (priv_target->flags &
>                             MPT_TARGET_FLAGS_RAID_COMPONENT) {
> @@ -2246,8 +2351,9 @@ _scsih_tm_display_info(struct MPT3SAS_ADAPTER *ioc, struct scsi_cmnd *scmd)
>                                 "enclosure level(0x%04x),connector name(%s)\n",
>                                  sas_device->enclosure_level,
>                                  sas_device->connector_name);
> +
> +                       sas_device_put(sas_device);
>                 }
> -               spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>         }
>  }
>
> @@ -2321,10 +2427,11 @@ _scsih_dev_reset(struct scsi_cmnd *scmd)
>  {
>         struct MPT3SAS_ADAPTER *ioc = shost_priv(scmd->device->host);
>         struct MPT3SAS_DEVICE *sas_device_priv_data;
> -       struct _sas_device *sas_device;
> -       unsigned long flags;
> +       struct _sas_device *sas_device = NULL;
>         u16     handle;
>         int r;
> +       struct scsi_target *starget = scmd->device->sdev_target;
> +       struct MPT3SAS_TARGET *target_priv_data = starget->hostdata;
>
>         sdev_printk(KERN_INFO, scmd->device,
>                 "attempting device reset! scmd(%p)\n", scmd);
> @@ -2344,12 +2451,10 @@ _scsih_dev_reset(struct scsi_cmnd *scmd)
>         handle = 0;
>         if (sas_device_priv_data->sas_target->flags &
>             MPT_TARGET_FLAGS_RAID_COMPONENT) {
> -               spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -               sas_device = _scsih_sas_device_find_by_handle(ioc,
> -                  sas_device_priv_data->sas_target->handle);
> +               sas_device = mpt3sas_get_sdev_from_target(ioc,
> +                               target_priv_data);
>                 if (sas_device)
>                         handle = sas_device->volume_handle;
> -               spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>         } else
>                 handle = sas_device_priv_data->sas_target->handle;
>
> @@ -2366,6 +2471,10 @@ _scsih_dev_reset(struct scsi_cmnd *scmd)
>   out:
>         sdev_printk(KERN_INFO, scmd->device, "device reset: %s scmd(%p)\n",
>             ((r == SUCCESS) ? "SUCCESS" : "FAILED"), scmd);
> +
> +       if (sas_device)
> +               sas_device_put(sas_device);
> +
>         return r;
>  }
>
> @@ -2380,11 +2489,11 @@ _scsih_target_reset(struct scsi_cmnd *scmd)
>  {
>         struct MPT3SAS_ADAPTER *ioc = shost_priv(scmd->device->host);
>         struct MPT3SAS_DEVICE *sas_device_priv_data;
> -       struct _sas_device *sas_device;
> -       unsigned long flags;
> +       struct _sas_device *sas_device = NULL;
>         u16     handle;
>         int r;
>         struct scsi_target *starget = scmd->device->sdev_target;
> +       struct MPT3SAS_TARGET *target_priv_data = starget->hostdata;
>
>         starget_printk(KERN_INFO, starget, "attempting target reset! scmd(%p)\n",
>                 scmd);
> @@ -2404,12 +2513,10 @@ _scsih_target_reset(struct scsi_cmnd *scmd)
>         handle = 0;
>         if (sas_device_priv_data->sas_target->flags &
>             MPT_TARGET_FLAGS_RAID_COMPONENT) {
> -               spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -               sas_device = _scsih_sas_device_find_by_handle(ioc,
> -                  sas_device_priv_data->sas_target->handle);
> +               sas_device = mpt3sas_get_sdev_from_target(ioc,
> +                               target_priv_data);
>                 if (sas_device)
>                         handle = sas_device->volume_handle;
> -               spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>         } else
>                 handle = sas_device_priv_data->sas_target->handle;
>
> @@ -2426,6 +2533,10 @@ _scsih_target_reset(struct scsi_cmnd *scmd)
>   out:
>         starget_printk(KERN_INFO, starget, "target reset: %s scmd(%p)\n",
>             ((r == SUCCESS) ? "SUCCESS" : "FAILED"), scmd);
> +
> +       if (sas_device)
> +               sas_device_put(sas_device);
> +
>         return r;
>  }
>
> @@ -2763,7 +2874,7 @@ _scsih_block_io_device(struct MPT3SAS_ADAPTER *ioc, u16 handle)
>         struct scsi_device *sdev;
>         struct _sas_device *sas_device;
>

[Sreekanth] Here sas_device_lock spin lock needs to be acquired before calling
                  __mpt3sas_get_sdev_by_addr() function.

> -       sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
> +       sas_device = __mpt3sas_get_sdev_by_handle(ioc, handle);
>         if (!sas_device)
>                 return;
>
> @@ -2779,6 +2890,8 @@ _scsih_block_io_device(struct MPT3SAS_ADAPTER *ioc, u16 handle)
>                         continue;
>                 _scsih_internal_device_block(sdev, sas_device_priv_data);
>         }
> +
> +       sas_device_put(sas_device);
>  }
>
>  /**
> @@ -2804,15 +2917,15 @@ _scsih_block_io_to_children_attached_to_ex(struct MPT3SAS_ADAPTER *ioc,
>
>         list_for_each_entry(mpt3sas_port,
>            &sas_expander->sas_port_list, port_list) {
> -               if (mpt3sas_port->remote_identify.device_type ==
> -                   SAS_END_DEVICE) {
> +               if (mpt3sas_port->remote_identify.device_type == SAS_END_DEVICE) {
>                         spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -                       sas_device =
> -                           mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> -                          mpt3sas_port->remote_identify.sas_address);
> -                       if (sas_device)
> +                       sas_device = __mpt3sas_get_sdev_by_addr(ioc,
> +                                       mpt3sas_port->remote_identify.sas_address);
> +                       if (sas_device) {
>                                 set_bit(sas_device->handle,
> -                                   ioc->blocking_handles);
> +                                       ioc->blocking_handles);
> +                               sas_device_put(sas_device);
> +                       }
>                         spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>                 }
>         }
> @@ -2880,7 +2993,7 @@ _scsih_tm_tr_send(struct MPT3SAS_ADAPTER *ioc, u16 handle)
>  {
>         Mpi2SCSITaskManagementRequest_t *mpi_request;
>         u16 smid;
> -       struct _sas_device *sas_device;
> +       struct _sas_device *sas_device = NULL;
>         struct MPT3SAS_TARGET *sas_target_priv_data = NULL;
>         u64 sas_address = 0;
>         unsigned long flags;
> @@ -2913,7 +3026,7 @@ _scsih_tm_tr_send(struct MPT3SAS_ADAPTER *ioc, u16 handle)
>                 return;
>
>         spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -       sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
> +       sas_device = __mpt3sas_get_sdev_by_handle(ioc, handle);
>         if (sas_device && sas_device->starget &&
>             sas_device->starget->hostdata) {
>                 sas_target_priv_data = sas_device->starget->hostdata;
> @@ -2947,14 +3060,14 @@ _scsih_tm_tr_send(struct MPT3SAS_ADAPTER *ioc, u16 handle)
>         if (!smid) {
>                 delayed_tr = kzalloc(sizeof(*delayed_tr), GFP_ATOMIC);
>                 if (!delayed_tr)
> -                       return;
> +                       goto out;
>                 INIT_LIST_HEAD(&delayed_tr->list);
>                 delayed_tr->handle = handle;
>                 list_add_tail(&delayed_tr->list, &ioc->delayed_tr_list);
>                 dewtprintk(ioc, pr_info(MPT3SAS_FMT
>                     "DELAYED:tr:handle(0x%04x), (open)\n",
>                     ioc->name, handle));
> -               return;
> +               goto out;
>         }
>
>         dewtprintk(ioc, pr_info(MPT3SAS_FMT
> @@ -2968,6 +3081,9 @@ _scsih_tm_tr_send(struct MPT3SAS_ADAPTER *ioc, u16 handle)
>         mpi_request->TaskType = MPI2_SCSITASKMGMT_TASKTYPE_TARGET_RESET;
>         mpt3sas_base_put_smid_hi_priority(ioc, smid);
>         mpt3sas_trigger_master(ioc, MASTER_TRIGGER_DEVICE_REMOVAL);
> +out:
> +       if (sas_device)
> +               sas_device_put(sas_device);
>  }
>
>  /**
> @@ -3818,7 +3934,6 @@ _scsih_scsi_ioc_info(struct MPT3SAS_ADAPTER *ioc, struct scsi_cmnd *scmd,
>         char *desc_scsi_state = ioc->tmp_string;
>         u32 log_info = le32_to_cpu(mpi_reply->IOCLogInfo);
>         struct _sas_device *sas_device = NULL;
> -       unsigned long flags;
>         struct scsi_target *starget = scmd->device->sdev_target;
>         struct MPT3SAS_TARGET *priv_target = starget->hostdata;
>         char *device_str = NULL;
> @@ -3946,9 +4061,7 @@ _scsih_scsi_ioc_info(struct MPT3SAS_ADAPTER *ioc, struct scsi_cmnd *scmd,
>                 pr_warn(MPT3SAS_FMT "\t%s wwid(0x%016llx)\n", ioc->name,
>                     device_str, (unsigned long long)priv_target->sas_address);
>         } else {
> -               spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -               sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> -                   priv_target->sas_address);
> +               sas_device = mpt3sas_get_sdev_from_target(ioc, priv_target);
>                 if (sas_device) {
>                         pr_warn(MPT3SAS_FMT
>                                 "\tsas_address(0x%016llx), phy(%d)\n",
> @@ -3967,8 +4080,9 @@ _scsih_scsi_ioc_info(struct MPT3SAS_ADAPTER *ioc, struct scsi_cmnd *scmd,
>                                   " connector name( %s)\n", ioc->name,
>                                   sas_device->enclosure_level,
>                                   sas_device->connector_name);
> +
> +                       sas_device_put(sas_device);
>                 }
> -               spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>         }
>
>         pr_warn(MPT3SAS_FMT
> @@ -4020,7 +4134,7 @@ _scsih_turn_on_pfa_led(struct MPT3SAS_ADAPTER *ioc, u16 handle)
>         Mpi2SepRequest_t mpi_request;
>         struct _sas_device *sas_device;
>
> -       sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
> +       sas_device = mpt3sas_get_sdev_by_handle(ioc, handle);
>         if (!sas_device)
>                 return;
>
> @@ -4035,7 +4149,7 @@ _scsih_turn_on_pfa_led(struct MPT3SAS_ADAPTER *ioc, u16 handle)
>             &mpi_request)) != 0) {
>                 pr_err(MPT3SAS_FMT "failure at %s:%d/%s()!\n", ioc->name,
>                 __FILE__, __LINE__, __func__);
> -               return;
> +               goto out;
>         }
>         sas_device->pfa_led_on = 1;
>
> @@ -4044,8 +4158,10 @@ _scsih_turn_on_pfa_led(struct MPT3SAS_ADAPTER *ioc, u16 handle)
>                         "enclosure_processor: ioc_status (0x%04x), loginfo(0x%08x)\n",
>                         ioc->name, le16_to_cpu(mpi_reply.IOCStatus),
>                     le32_to_cpu(mpi_reply.IOCLogInfo)));
> -               return;
> +               goto out;
>         }
> +out:
> +       sas_device_put(sas_device);
>  }
>  /**
>   * _scsih_turn_off_pfa_led - turn off Fault LED
> @@ -4128,19 +4244,17 @@ _scsih_smart_predicted_fault(struct MPT3SAS_ADAPTER *ioc, u16 handle)
>
>         /* only handle non-raid devices */
>         spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -       sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
> +       sas_device = __mpt3sas_get_sdev_by_handle(ioc, handle);
>         if (!sas_device) {
> -               spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> -               return;
> +               goto out_unlock;
>         }
>         starget = sas_device->starget;
>         sas_target_priv_data = starget->hostdata;
>
>         if ((sas_target_priv_data->flags & MPT_TARGET_FLAGS_RAID_COMPONENT) ||
> -          ((sas_target_priv_data->flags & MPT_TARGET_FLAGS_VOLUME))) {
> -               spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> -               return;
> -       }
> +          ((sas_target_priv_data->flags & MPT_TARGET_FLAGS_VOLUME)))
> +               goto out_unlock;
> +
>         if (sas_device->enclosure_handle != 0)
>                 starget_printk(KERN_INFO, starget, "predicted fault, "
>                         "enclosure logical id(0x%016llx), slot(%d)\n",
> @@ -4163,7 +4277,7 @@ _scsih_smart_predicted_fault(struct MPT3SAS_ADAPTER *ioc, u16 handle)
>         if (!event_reply) {
>                 pr_err(MPT3SAS_FMT "failure at %s:%d/%s()!\n",
>                     ioc->name, __FILE__, __LINE__, __func__);
> -               return;
> +               goto out;
>         }
>
>         event_reply->Function = MPI2_FUNCTION_EVENT_NOTIFICATION;
> @@ -4180,6 +4294,14 @@ _scsih_smart_predicted_fault(struct MPT3SAS_ADAPTER *ioc, u16 handle)
>         event_data->SASAddress = cpu_to_le64(sas_target_priv_data->sas_address);
>         mpt3sas_ctl_add_to_event_log(ioc, event_reply);
>         kfree(event_reply);
> +out:
> +       if (sas_device)
> +               sas_device_put(sas_device);
> +       return;
> +
> +out_unlock:
> +       spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> +       goto out;
>  }
>
>  /**
> @@ -4933,13 +5055,11 @@ _scsih_check_device(struct MPT3SAS_ADAPTER *ioc,
>
>         spin_lock_irqsave(&ioc->sas_device_lock, flags);
>         sas_address = le64_to_cpu(sas_device_pg0.SASAddress);
> -       sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> +       sas_device = __mpt3sas_get_sdev_by_addr(ioc,
>             sas_address);
>
> -       if (!sas_device) {
> -               spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> -               return;
> -       }
> +       if (!sas_device)
> +               goto out_unlock;
>
>         if (unlikely(sas_device->handle != handle)) {
>                 starget = sas_device->starget;
> @@ -4967,20 +5087,24 @@ _scsih_check_device(struct MPT3SAS_ADAPTER *ioc,
>                 pr_err(MPT3SAS_FMT
>                         "device is not present handle(0x%04x), flags!!!\n",
>                         ioc->name, handle);
> -               spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> -               return;
> +               goto out_unlock;
>         }
>
>         /* check if there were any issues with discovery */
>         if (_scsih_check_access_status(ioc, sas_address, handle,
> -           sas_device_pg0.AccessStatus)) {
> -               spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> -               return;
> -       }
> +           sas_device_pg0.AccessStatus))
> +               goto out_unlock;
>
>         spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>         _scsih_ublock_io_device(ioc, sas_address);
> +       if (sas_device)
> +               sas_device_put(sas_device);
> +       return;
>
> +out_unlock:
> +       spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> +       if (sas_device)
> +               sas_device_put(sas_device);
>  }
>
>  /**
> @@ -5005,7 +5129,6 @@ _scsih_add_device(struct MPT3SAS_ADAPTER *ioc, u16 handle, u8 phy_num,
>         u32 ioc_status;
>         u64 sas_address;
>         u32 device_info;
> -       unsigned long flags;
>
>         if ((mpt3sas_config_get_sas_device_pg0(ioc, &mpi_reply, &sas_device_pg0,
>             MPI2_SAS_DEVICE_PGAD_FORM_HANDLE, handle))) {
> @@ -5041,13 +5164,11 @@ _scsih_add_device(struct MPT3SAS_ADAPTER *ioc, u16 handle, u8 phy_num,
>             sas_device_pg0.AccessStatus))
>                 return -1;
>
> -       spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -       sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> -           sas_address);
> -       spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> -
> -       if (sas_device)
> +       sas_device = mpt3sas_get_sdev_by_addr(ioc, sas_address);
> +       if (sas_device) {
> +               sas_device_put(sas_device);
>                 return -1;
> +       }
>
>         sas_device = kzalloc(sizeof(struct _sas_device),
>             GFP_KERNEL);
> @@ -5057,6 +5178,7 @@ _scsih_add_device(struct MPT3SAS_ADAPTER *ioc, u16 handle, u8 phy_num,
>                 return 0;
>         }
>
> +       kref_init(&sas_device->refcount);
>         sas_device->handle = handle;
>         if (_scsih_get_sas_address(ioc,
>             le16_to_cpu(sas_device_pg0.ParentDevHandle),
> @@ -5098,6 +5220,7 @@ _scsih_add_device(struct MPT3SAS_ADAPTER *ioc, u16 handle, u8 phy_num,
>         else
>                 _scsih_sas_device_add(ioc, sas_device);
>
> +       sas_device_put(sas_device);
>         return 0;
>  }
>
> @@ -5506,25 +5629,25 @@ _scsih_sas_device_status_change_event(struct MPT3SAS_ADAPTER *ioc,
>
>         spin_lock_irqsave(&ioc->sas_device_lock, flags);
>         sas_address = le64_to_cpu(event_data->SASAddress);
> -       sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> -           sas_address);
> +       sas_device = __mpt3sas_get_sdev_by_addr(ioc, sas_address);
>
> -       if (!sas_device || !sas_device->starget) {
> -               spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> -               return;
> -       }
> +       if (!sas_device || !sas_device->starget)
> +               goto out;
>
>         target_priv_data = sas_device->starget->hostdata;
> -       if (!target_priv_data) {
> -               spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> -               return;
> -       }
> +       if (!target_priv_data)
> +               goto out;
>
>         if (event_data->ReasonCode ==
>             MPI2_EVENT_SAS_DEV_STAT_RC_INTERNAL_DEVICE_RESET)
>                 target_priv_data->tm_busy = 1;
>         else
>                 target_priv_data->tm_busy = 0;
> +
> +out:
> +       if (sas_device)
> +               sas_device_put(sas_device);
> +
>         spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>  }
>
> @@ -6008,7 +6131,7 @@ _scsih_sas_pd_expose(struct MPT3SAS_ADAPTER *ioc,
>         u16 handle = le16_to_cpu(element->PhysDiskDevHandle);
>
>         spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -       sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
> +       sas_device = __mpt3sas_get_sdev_by_handle(ioc, handle);
>         if (sas_device) {
>                 sas_device->volume_handle = 0;
>                 sas_device->volume_wwid = 0;
> @@ -6027,6 +6150,8 @@ _scsih_sas_pd_expose(struct MPT3SAS_ADAPTER *ioc,
>         /* exposing raid component */
>         if (starget)
>                 starget_for_each_device(starget, NULL, _scsih_reprobe_lun);
> +
> +       sas_device_put(sas_device);
>  }
>
>  /**
> @@ -6055,7 +6180,7 @@ _scsih_sas_pd_hide(struct MPT3SAS_ADAPTER *ioc,
>                     &volume_wwid);
>
>         spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -       sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
> +       sas_device = __mpt3sas_get_sdev_by_handle(ioc, handle);
>         if (sas_device) {
>                 set_bit(handle, ioc->pd_handles);
>                 if (sas_device->starget && sas_device->starget->hostdata) {
> @@ -6075,6 +6200,8 @@ _scsih_sas_pd_hide(struct MPT3SAS_ADAPTER *ioc,
>         _scsih_ir_fastpath(ioc, handle, element->PhysDiskNum);
>         if (starget)
>                 starget_for_each_device(starget, (void *)1, _scsih_reprobe_lun);
> +
> +       sas_device_put(sas_device);
>  }
>
>  /**
> @@ -6107,7 +6234,6 @@ _scsih_sas_pd_add(struct MPT3SAS_ADAPTER *ioc,
>         Mpi2EventIrConfigElement_t *element)
>  {
>         struct _sas_device *sas_device;
> -       unsigned long flags;
>         u16 handle = le16_to_cpu(element->PhysDiskDevHandle);
>         Mpi2ConfigReply_t mpi_reply;
>         Mpi2SasDevicePage0_t sas_device_pg0;
> @@ -6117,11 +6243,10 @@ _scsih_sas_pd_add(struct MPT3SAS_ADAPTER *ioc,
>
>         set_bit(handle, ioc->pd_handles);
>
> -       spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -       sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
> -       spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> +       sas_device = mpt3sas_get_sdev_by_handle(ioc, handle);
>         if (sas_device) {
>                 _scsih_ir_fastpath(ioc, handle, element->PhysDiskNum);
> +               sas_device_put(sas_device);
>                 return;
>         }
>
> @@ -6398,7 +6523,6 @@ _scsih_sas_ir_physical_disk_event(struct MPT3SAS_ADAPTER *ioc,
>         u16 handle, parent_handle;
>         u32 state;
>         struct _sas_device *sas_device;
> -       unsigned long flags;
>         Mpi2ConfigReply_t mpi_reply;
>         Mpi2SasDevicePage0_t sas_device_pg0;
>         u32 ioc_status;
> @@ -6427,12 +6551,12 @@ _scsih_sas_ir_physical_disk_event(struct MPT3SAS_ADAPTER *ioc,
>         case MPI2_RAID_PD_STATE_HOT_SPARE:
>
>                 set_bit(handle, ioc->pd_handles);
> -               spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -               sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
> -               spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>
> -               if (sas_device)
> +               sas_device = mpt3sas_get_sdev_by_handle(ioc, handle);
> +               if (sas_device) {
> +                       sas_device_put(sas_device);
>                         return;
> +               }
>
>                 if ((mpt3sas_config_get_sas_device_pg0(ioc, &mpi_reply,
>                     &sas_device_pg0, MPI2_SAS_DEVICE_PGAD_FORM_HANDLE,
> @@ -6906,6 +7030,7 @@ _scsih_remove_unresponding_sas_devices(struct MPT3SAS_ADAPTER *ioc)
>         struct _raid_device *raid_device, *raid_device_next;
>         struct list_head tmp_list;
>         unsigned long flags;
> +       LIST_HEAD(head);
>
>         pr_info(MPT3SAS_FMT "removing unresponding devices: start\n",
>             ioc->name);
> @@ -6913,14 +7038,29 @@ _scsih_remove_unresponding_sas_devices(struct MPT3SAS_ADAPTER *ioc)
>         /* removing unresponding end devices */
>         pr_info(MPT3SAS_FMT "removing unresponding devices: end-devices\n",
>             ioc->name);
> +
> +       /*
> +        * Iterate, pulling off devices marked as non-responding. We become the
> +        * owner for the reference the list had on any object we prune.
> +        */
> +       spin_lock_irqsave(&ioc->sas_device_lock, flags);
>         list_for_each_entry_safe(sas_device, sas_device_next,
> -           &ioc->sas_device_list, list) {
> +                                &ioc->sas_device_list, list) {
>                 if (!sas_device->responding)
> -                       mpt3sas_device_remove_by_sas_address(ioc,
> -                           sas_device->sas_address);
> +                       list_move_tail(&sas_device->list, &head);
>                 else
>                         sas_device->responding = 0;
>         }
> +       spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> +
> +       /*
> +        * Now, uninitialize and remove the unresponding devices we pruned.
> +        */
> +       list_for_each_entry_safe(sas_device, sas_device_next, &head, list) {
> +               _scsih_remove_device(ioc, sas_device);
> +               list_del_init(&sas_device->list);
> +               sas_device_put(sas_device);
> +       }
>
>         /* removing unresponding volumes */
>         if (ioc->ir_firmware) {
> @@ -7074,11 +7214,11 @@ _scsih_scan_for_devices_after_reset(struct MPT3SAS_ADAPTER *ioc)
>                 }
>                 phys_disk_num = pd_pg0.PhysDiskNum;
>                 handle = le16_to_cpu(pd_pg0.DevHandle);
> -               spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -               sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
> -               spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> -               if (sas_device)
> +               sas_device = mpt3sas_get_sdev_by_handle(ioc, handle);
> +               if (sas_device) {
> +                       sas_device_put(sas_device);
>                         continue;
> +               }
>                 if (mpt3sas_config_get_sas_device_pg0(ioc, &mpi_reply,
>                     &sas_device_pg0, MPI2_SAS_DEVICE_PGAD_FORM_HANDLE,
>                     handle) != 0)
> @@ -7199,12 +7339,12 @@ _scsih_scan_for_devices_after_reset(struct MPT3SAS_ADAPTER *ioc)
>                 if (!(_scsih_is_end_device(
>                     le32_to_cpu(sas_device_pg0.DeviceInfo))))
>                         continue;
> -               spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -               sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> +               sas_device = mpt3sas_get_sdev_by_addr(ioc,
>                     le64_to_cpu(sas_device_pg0.SASAddress));
> -               spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> -               if (sas_device)
> +               if (sas_device) {
> +                       sas_device_put(sas_device);
>                         continue;
> +               }
>                 parent_handle = le16_to_cpu(sas_device_pg0.ParentDevHandle);
>                 if (!_scsih_get_sas_address(ioc, parent_handle, &sas_address)) {
>                         pr_info(MPT3SAS_FMT "\tBEFORE adding end device: " \
> @@ -7831,6 +7971,48 @@ _scsih_probe_raid(struct MPT3SAS_ADAPTER *ioc)
>         }
>  }
>
> +static struct _sas_device *get_next_sas_device(struct MPT3SAS_ADAPTER *ioc)
> +{
> +       struct _sas_device *sas_device = NULL;
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&ioc->sas_device_lock, flags);
> +       if (!list_empty(&ioc->sas_device_init_list)) {
> +               sas_device = list_first_entry(&ioc->sas_device_init_list,
> +                               struct _sas_device, list);
> +               sas_device_get(sas_device);
> +       }
> +       spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> +
> +       return sas_device;
> +}
> +
> +static void sas_device_make_active(struct MPT3SAS_ADAPTER *ioc,
> +               struct _sas_device *sas_device)
> +{
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&ioc->sas_device_lock, flags);
> +
> +       /*
> +        * Since we dropped the lock during the call to port_add(), we need to
> +        * be careful here that somebody else didn't move or delete this item
> +        * while we were busy with other things.
> +        *
> +        * If it was on the list, we need a put() for the reference the list
> +        * had. Either way, we need a get() for the destination list.
> +        */
> +       if (!list_empty(&sas_device->list)) {
> +               list_del_init(&sas_device->list);
> +               sas_device_put(sas_device);
> +       }
> +
> +       sas_device_get(sas_device);
> +       list_add_tail(&sas_device->list, &ioc->sas_device_list);
> +
> +       spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> +}
> +
>  /**
>   * _scsih_probe_sas - reporting sas devices to sas transport
>   * @ioc: per adapter object
> @@ -7840,17 +8022,13 @@ _scsih_probe_raid(struct MPT3SAS_ADAPTER *ioc)
>  static void
>  _scsih_probe_sas(struct MPT3SAS_ADAPTER *ioc)
>  {
> -       struct _sas_device *sas_device, *next;
> -       unsigned long flags;
> -
> -       /* SAS Device List */
> -       list_for_each_entry_safe(sas_device, next, &ioc->sas_device_init_list,
> -           list) {
> +       struct _sas_device *sas_device;
>
> +       while ((sas_device = get_next_sas_device(ioc))) {
>                 if (!mpt3sas_transport_port_add(ioc, sas_device->handle,
> -                   sas_device->sas_address_parent)) {
> -                       list_del(&sas_device->list);
> -                       kfree(sas_device);
> +                               sas_device->sas_address_parent)) {
> +                       _scsih_sas_device_remove(ioc, sas_device);
> +                       sas_device_put(sas_device);
>                         continue;
>                 } else if (!sas_device->starget) {
>                         /*
> @@ -7861,17 +8039,16 @@ _scsih_probe_sas(struct MPT3SAS_ADAPTER *ioc)
>                          */
>                         if (!ioc->is_driver_loading) {
>                                 mpt3sas_transport_port_remove(ioc,
> -                                   sas_device->sas_address,
> -                                   sas_device->sas_address_parent);
> -                               list_del(&sas_device->list);
> -                               kfree(sas_device);
> +                                               sas_device->sas_address,
> +                                               sas_device->sas_address_parent);
> +                               _scsih_sas_device_remove(ioc, sas_device);
> +                               sas_device_put(sas_device);
>                                 continue;
>                         }
>                 }
>
> -               spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -               list_move_tail(&sas_device->list, &ioc->sas_device_list);
> -               spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> +               sas_device_make_active(ioc, sas_device);
> +               sas_device_put(sas_device);
>         }
>  }
>
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_transport.c b/drivers/scsi/mpt3sas/mpt3sas_transport.c
> index 70fd019..6074b11 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_transport.c
> +++ b/drivers/scsi/mpt3sas/mpt3sas_transport.c
> @@ -734,7 +734,7 @@ mpt3sas_transport_port_add(struct MPT3SAS_ADAPTER *ioc, u16 handle,
>         rphy->identify = mpt3sas_port->remote_identify;
>
>         if (mpt3sas_port->remote_identify.device_type == SAS_END_DEVICE) {

[Sreekanth] Here also sas_device_lock spin lock needs to be acquired
before calling
                  __mpt3sas_get_sdev_by_addr() function.


> -               sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> +               sas_device = __mpt3sas_get_sdev_by_addr(ioc,
>                                     mpt3sas_port->remote_identify.sas_address);
>                 if (!sas_device) {
>                         dfailprintk(ioc, printk(MPT3SAS_FMT
> @@ -750,8 +750,10 @@ mpt3sas_transport_port_add(struct MPT3SAS_ADAPTER *ioc, u16 handle,
>                     ioc->name, __FILE__, __LINE__, __func__);
>         }
>
> -       if (mpt3sas_port->remote_identify.device_type == SAS_END_DEVICE)
> +       if (mpt3sas_port->remote_identify.device_type == SAS_END_DEVICE) {
>                 sas_device->pend_sas_rphy_add = 0;
> +               sas_device_put(sas_device);
> +       }
>
>         if ((ioc->logging_level & MPT_DEBUG_TRANSPORT))
>                 dev_printk(KERN_INFO, &rphy->dev,
> @@ -1324,15 +1326,17 @@ _transport_get_enclosure_identifier(struct sas_rphy *rphy, u64 *identifier)
>         int rc;
>
>         spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -       sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> +       sas_device = __mpt3sas_get_sdev_by_addr(ioc,
>             rphy->identify.sas_address);
>         if (sas_device) {
>                 *identifier = sas_device->enclosure_logical_id;
>                 rc = 0;
> +               sas_device_put(sas_device);
>         } else {
>                 *identifier = 0;
>                 rc = -ENXIO;
>         }
> +
>         spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>         return rc;
>  }
> @@ -1352,12 +1356,14 @@ _transport_get_bay_identifier(struct sas_rphy *rphy)
>         int rc;
>
>         spin_lock_irqsave(&ioc->sas_device_lock, flags);
> -       sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> +       sas_device = __mpt3sas_get_sdev_by_addr(ioc,
>             rphy->identify.sas_address);
> -       if (sas_device)
> +       if (sas_device) {
>                 rc = sas_device->slot;
> -       else
> +               sas_device_put(sas_device);
> +       } else {
>                 rc = -ENXIO;
> +       }
>         spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
>         return rc;
>  }
> --
> 1.9.1
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH-v2 1/2] mpt3sas: Refcount sas_device objects and fix unsafe list usage
  2015-09-08 11:55   ` Sreekanth Reddy
@ 2015-09-09  6:21     ` Nicholas A. Bellinger
  2015-09-09 14:29     ` Chaitra Basappa
  1 sibling, 0 replies; 11+ messages in thread
From: Nicholas A. Bellinger @ 2015-09-09  6:21 UTC (permalink / raw)
  To: Sreekanth Reddy
  Cc: Nicholas A. Bellinger, linux-scsi, linux-kernel, James Bottomley,
	Calvin Owens, Christoph Hellwig, MPT-FusionLinux.pdl,
	kernel-team, Chaitra Basappa, Martin K. Petersen

Hi Sreekanth,

On Tue, 2015-09-08 at 17:25 +0530, Sreekanth Reddy wrote:
> On Sun, Aug 30, 2015 at 1:24 PM, Nicholas A. Bellinger
> <nab@daterainc.com> wrote:
> > From: Nicholas Bellinger <nab@linux-iscsi.org>
> >
> > These objects can be referenced concurrently throughout the driver, we
> > need a way to make sure threads can't delete them out from under
> > each other. This patch adds the refcount, and refactors the code to use
> > it.
> >
> > Additionally, we cannot iterate over the sas_device_list without
> > holding the lock, or we risk corrupting random memory if items are
> > added or deleted as we iterate. This patch refactors _scsih_probe_sas()
> > to use the sas_device_list in a safe way.
> >
> > This patch is a port of Calvin's PATCH-v4 for mpt2sas code, atop
> > mpt3sas changes in scsi.git/for-next.
> >
> > Cc: Calvin Owens <calvinowens@fb.com>
> > Cc: Christoph Hellwig <hch@infradead.org>
> > Cc: Sreekanth Reddy <sreekanth.reddy@avagotech.com>
> > Cc: MPT-FusionLinux.pdl <MPT-FusionLinux.pdl@avagotech.com>
> > Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
> > ---
> >  drivers/scsi/mpt3sas/mpt3sas_base.h      |  25 +-
> >  drivers/scsi/mpt3sas/mpt3sas_scsih.c     | 479 +++++++++++++++++++++----------
> >  drivers/scsi/mpt3sas/mpt3sas_transport.c |  18 +-
> >  3 files changed, 364 insertions(+), 158 deletions(-)


<SNIP>

> > diff --git a/drivers/scsi/mpt3sas/mpt3sas_transport.c b/drivers/scsi/mpt3sas/mpt3sas_transport.c
> > index 70fd019..6074b11 100644
> > --- a/drivers/scsi/mpt3sas/mpt3sas_transport.c
> > +++ b/drivers/scsi/mpt3sas/mpt3sas_transport.c
> > @@ -734,7 +734,7 @@ mpt3sas_transport_port_add(struct MPT3SAS_ADAPTER *ioc, u16 handle,
> >         rphy->identify = mpt3sas_port->remote_identify;
> >
> >         if (mpt3sas_port->remote_identify.device_type == SAS_END_DEVICE) {
> 
> [Sreekanth] Here also sas_device_lock spin lock needs to be acquired
> before calling
>                   __mpt3sas_get_sdev_by_addr() function.
> 

Thanks for your review.

Fixed with the following incremental patch atop for-next-merge code.

Please review.

Thank you,

>From 4004584675d24d8e886b03c013074cdfb76a0864 Mon Sep 17 00:00:00 2001
From: Nicholas Bellinger <nab@linux-iscsi.org>
Date: Tue, 8 Sep 2015 23:05:49 -0700
Subject: [PATCH] mpt3sas: Add missing ioc->sas_device_lock in
 mpt3sas_transport_port_add

This patch adds the missing struct MPT3SAS_ADAPTER->sas_device_lock
usage in __mpt3sas_get_sdev_by_addr() lookup to avoid a NULL pointer
dereference when &ioc->sas_device_list or &ioc->sas_device_init_list
changes from below without a proper sas_device_get(sas_device)
reference held.

Reported-by: Sreekanth Reddy <sreekanth.reddy@avagotech.com>
Cc: Calvin Owens <calvinowens@fb.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
---
 drivers/scsi/mpt3sas/mpt3sas_transport.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_transport.c b/drivers/scsi/mpt3sas/mpt3sas_transport.c
index 6074b11..cc55907 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_transport.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_transport.c
@@ -734,8 +734,10 @@ mpt3sas_transport_port_add(struct MPT3SAS_ADAPTER *ioc, u16 handle,
 	rphy->identify = mpt3sas_port->remote_identify;
 
 	if (mpt3sas_port->remote_identify.device_type == SAS_END_DEVICE) {
+		spin_lock_irqsave(&ioc->sas_device_lock, flags);
 		sas_device = __mpt3sas_get_sdev_by_addr(ioc,
 				    mpt3sas_port->remote_identify.sas_address);
+		spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
 		if (!sas_device) {
 			dfailprintk(ioc, printk(MPT3SAS_FMT
 				"failure at %s:%d/%s()!\n",
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* RE: [PATCH-v2 1/2] mpt3sas: Refcount sas_device objects and fix unsafe list usage
  2015-09-08 11:55   ` Sreekanth Reddy
  2015-09-09  6:21     ` Nicholas A. Bellinger
@ 2015-09-09 14:29     ` Chaitra Basappa
  2015-09-09 22:03       ` Nicholas A. Bellinger
  1 sibling, 1 reply; 11+ messages in thread
From: Chaitra Basappa @ 2015-09-09 14:29 UTC (permalink / raw)
  To: Sreekanth Reddy, Nicholas A. Bellinger
  Cc: linux-scsi, linux-kernel, James Bottomley, Calvin Owens,
	Christoph Hellwig, PDL-MPT-FUSIONLINUX, kernel-team,
	Nicholas Bellinger

From: Sreekanth Reddy [mailto:sreekanth.reddy@avagotech.com]
Sent: Tuesday, September 08, 2015 5:26 PM
To: Nicholas A. Bellinger
Cc: linux-scsi; linux-kernel; James Bottomley; Calvin Owens; Christoph
Hellwig; MPT-FusionLinux.pdl; kernel-team; Nicholas Bellinger; Chaitra
Basappa
Subject: Re: [PATCH-v2 1/2] mpt3sas: Refcount sas_device objects and fix
unsafe list usage

On Sun, Aug 30, 2015 at 1:24 PM, Nicholas A. Bellinger <nab@daterainc.com>
wrote:
> From: Nicholas Bellinger <nab@linux-iscsi.org>
>
> These objects can be referenced concurrently throughout the driver, we
> need a way to make sure threads can't delete them out from under each
> other. This patch adds the refcount, and refactors the code to use it.
>
> Additionally, we cannot iterate over the sas_device_list without
> holding the lock, or we risk corrupting random memory if items are
> added or deleted as we iterate. This patch refactors
> _scsih_probe_sas() to use the sas_device_list in a safe way.
>
> This patch is a port of Calvin's PATCH-v4 for mpt2sas code, atop
> mpt3sas changes in scsi.git/for-next.
>
> Cc: Calvin Owens <calvinowens@fb.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Sreekanth Reddy <sreekanth.reddy@avagotech.com>
> Cc: MPT-FusionLinux.pdl <MPT-FusionLinux.pdl@avagotech.com>
> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
> ---
>  drivers/scsi/mpt3sas/mpt3sas_base.h      |  25 +-
>  drivers/scsi/mpt3sas/mpt3sas_scsih.c     | 479
> +++++++++++++++++++++----------
>  drivers/scsi/mpt3sas/mpt3sas_transport.c |  18 +-
>  3 files changed, 364 insertions(+), 158 deletions(-)
>
> @@ -2763,7 +2874,7 @@ _scsih_block_io_device(struct MPT3SAS_ADAPTER *ioc,
> u16 handle)
>         struct scsi_device *sdev;
>         struct _sas_device *sas_device;
>

[Sreekanth] Here sas_device_lock spin lock needs to be acquired before
calling
                  __mpt3sas_get_sdev_by_addr() function.

[Chaitra]Here instead of calling " __mpt3sas_get_sdev_by_handle()" function
calling
	"mpt3sas_get_sdev_by_handle()" function will fixes "invalid page access"
type of kernel panic

> -       sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
> +       sas_device = __mpt3sas_get_sdev_by_handle(ioc, handle);
>         if (!sas_device)
>                 return;
>
> @@ -2779,6 +2890,8 @@ _scsih_block_io_device(struct MPT3SAS_ADAPTER *ioc,
> u16 handle)
>                         continue;
>                 _scsih_internal_device_block(sdev, sas_device_priv_data);
>         }
> +
> +       sas_device_put(sas_device);
>  }
>


Regards,
Chaitra

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH-v2 1/2] mpt3sas: Refcount sas_device objects and fix unsafe list usage
  2015-09-09 14:29     ` Chaitra Basappa
@ 2015-09-09 22:03       ` Nicholas A. Bellinger
  2015-09-11  6:55         ` Nicholas A. Bellinger
  0 siblings, 1 reply; 11+ messages in thread
From: Nicholas A. Bellinger @ 2015-09-09 22:03 UTC (permalink / raw)
  To: Chaitra Basappa
  Cc: Sreekanth Reddy, Nicholas A. Bellinger, linux-scsi, linux-kernel,
	James Bottomley, Calvin Owens, Christoph Hellwig,
	PDL-MPT-FUSIONLINUX, kernel-team

On Wed, 2015-09-09 at 19:59 +0530, Chaitra Basappa wrote:
> From: Sreekanth Reddy [mailto:sreekanth.reddy@avagotech.com]
> Sent: Tuesday, September 08, 2015 5:26 PM
> To: Nicholas A. Bellinger
> Cc: linux-scsi; linux-kernel; James Bottomley; Calvin Owens; Christoph
> Hellwig; MPT-FusionLinux.pdl; kernel-team; Nicholas Bellinger; Chaitra
> Basappa
> Subject: Re: [PATCH-v2 1/2] mpt3sas: Refcount sas_device objects and fix
> unsafe list usage
> 
> On Sun, Aug 30, 2015 at 1:24 PM, Nicholas A. Bellinger <nab@daterainc.com>
> wrote:
> > From: Nicholas Bellinger <nab@linux-iscsi.org>
> >
> > These objects can be referenced concurrently throughout the driver, we
> > need a way to make sure threads can't delete them out from under each
> > other. This patch adds the refcount, and refactors the code to use it.
> >
> > Additionally, we cannot iterate over the sas_device_list without
> > holding the lock, or we risk corrupting random memory if items are
> > added or deleted as we iterate. This patch refactors
> > _scsih_probe_sas() to use the sas_device_list in a safe way.
> >
> > This patch is a port of Calvin's PATCH-v4 for mpt2sas code, atop
> > mpt3sas changes in scsi.git/for-next.
> >
> > Cc: Calvin Owens <calvinowens@fb.com>
> > Cc: Christoph Hellwig <hch@infradead.org>
> > Cc: Sreekanth Reddy <sreekanth.reddy@avagotech.com>
> > Cc: MPT-FusionLinux.pdl <MPT-FusionLinux.pdl@avagotech.com>
> > Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
> > ---
> >  drivers/scsi/mpt3sas/mpt3sas_base.h      |  25 +-
> >  drivers/scsi/mpt3sas/mpt3sas_scsih.c     | 479
> > +++++++++++++++++++++----------
> >  drivers/scsi/mpt3sas/mpt3sas_transport.c |  18 +-
> >  3 files changed, 364 insertions(+), 158 deletions(-)
> >
> > @@ -2763,7 +2874,7 @@ _scsih_block_io_device(struct MPT3SAS_ADAPTER *ioc,
> > u16 handle)
> >         struct scsi_device *sdev;
> >         struct _sas_device *sas_device;
> >
> 
> [Sreekanth] Here sas_device_lock spin lock needs to be acquired before
> calling
>                   __mpt3sas_get_sdev_by_addr() function.
> 
> [Chaitra]Here instead of calling " __mpt3sas_get_sdev_by_handle()" function
> calling
> 	"mpt3sas_get_sdev_by_handle()" function will fixes "invalid page access"
> type of kernel panic
> 
> > -       sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
> > +       sas_device = __mpt3sas_get_sdev_by_handle(ioc, handle);
> >         if (!sas_device)
> >                 return;
> >

Whoops, missed this comment in _scsih_block_io_device() from Sreekanth's
earlier reply.

Here's the updated incremental patch atop target-pending/for-next-merge
to use the protected callers for both cases.

Please review + ACK ASAP.

Thank you,

--nab

>From 8edb1554f7c2eb73cf70c9856aec01e786b9bcf9 Mon Sep 17 00:00:00 2001
From: Nicholas Bellinger <nab@linux-iscsi.org>
Date: Tue, 8 Sep 2015 23:05:49 -0700
Subject: [PATCH] mpt3sas: Fix unprotected list lookup in v4.3-rc0 changes

This patch adds the missing mpt3sas_get_sdev_by_addr() protected
lookup usage in mpt3sas_transport_port_add() to avoid a NULL pointer
dereference when &ioc->sas_device_list or &ioc->sas_device_init_list
changes from below without a proper sas_device_get(sas_device)
reference held.

Also, use the protected mpt3sas_get_sdev_by_handle() lookup within
_scsih_block_io_device() as well.

Reported-by: Sreekanth Reddy <sreekanth.reddy@avagotech.com>
Reported-by: Chaitra Basappa <chaitra.basappa@avagotech.com>
Cc: Calvin Owens <calvinowens@fb.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
---
 drivers/scsi/mpt3sas/mpt3sas_scsih.c     | 2 +-
 drivers/scsi/mpt3sas/mpt3sas_transport.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 0431cd0..9e68432 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -2933,7 +2933,7 @@ _scsih_block_io_device(struct MPT3SAS_ADAPTER *ioc, u16 handle)
 	struct scsi_device *sdev;
 	struct _sas_device *sas_device;
 
-	sas_device = __mpt3sas_get_sdev_by_handle(ioc, handle);
+	sas_device = mpt3sas_get_sdev_by_handle(ioc, handle);
 	if (!sas_device)
 		return;
 
diff --git a/drivers/scsi/mpt3sas/mpt3sas_transport.c b/drivers/scsi/mpt3sas/mpt3sas_transport.c
index 6074b11..ca36d7e 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_transport.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_transport.c
@@ -734,7 +734,7 @@ mpt3sas_transport_port_add(struct MPT3SAS_ADAPTER *ioc, u16 handle,
 	rphy->identify = mpt3sas_port->remote_identify;
 
 	if (mpt3sas_port->remote_identify.device_type == SAS_END_DEVICE) {
-		sas_device = __mpt3sas_get_sdev_by_addr(ioc,
+		sas_device = mpt3sas_get_sdev_by_addr(ioc,
 				    mpt3sas_port->remote_identify.sas_address);
 		if (!sas_device) {
 			dfailprintk(ioc, printk(MPT3SAS_FMT
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH-v2 1/2] mpt3sas: Refcount sas_device objects and fix unsafe list usage
  2015-09-09 22:03       ` Nicholas A. Bellinger
@ 2015-09-11  6:55         ` Nicholas A. Bellinger
  2015-09-11 17:50           ` James Bottomley
  0 siblings, 1 reply; 11+ messages in thread
From: Nicholas A. Bellinger @ 2015-09-11  6:55 UTC (permalink / raw)
  To: Chaitra Basappa
  Cc: Sreekanth Reddy, Nicholas A. Bellinger, linux-scsi, linux-kernel,
	James Bottomley, Calvin Owens, Christoph Hellwig,
	PDL-MPT-FUSIONLINUX, kernel-team

On Wed, 2015-09-09 at 15:03 -0700, Nicholas A. Bellinger wrote:
> On Wed, 2015-09-09 at 19:59 +0530, Chaitra Basappa wrote:
> > From: Sreekanth Reddy [mailto:sreekanth.reddy@avagotech.com]
> > Sent: Tuesday, September 08, 2015 5:26 PM
> > To: Nicholas A. Bellinger
> > Cc: linux-scsi; linux-kernel; James Bottomley; Calvin Owens; Christoph
> > Hellwig; MPT-FusionLinux.pdl; kernel-team; Nicholas Bellinger; Chaitra
> > Basappa
> > Subject: Re: [PATCH-v2 1/2] mpt3sas: Refcount sas_device objects and fix
> > unsafe list usage
> > 
> > On Sun, Aug 30, 2015 at 1:24 PM, Nicholas A. Bellinger <nab@daterainc.com>
> > wrote:
> > > From: Nicholas Bellinger <nab@linux-iscsi.org>
> > >
> > > These objects can be referenced concurrently throughout the driver, we
> > > need a way to make sure threads can't delete them out from under each
> > > other. This patch adds the refcount, and refactors the code to use it.
> > >
> > > Additionally, we cannot iterate over the sas_device_list without
> > > holding the lock, or we risk corrupting random memory if items are
> > > added or deleted as we iterate. This patch refactors
> > > _scsih_probe_sas() to use the sas_device_list in a safe way.
> > >
> > > This patch is a port of Calvin's PATCH-v4 for mpt2sas code, atop
> > > mpt3sas changes in scsi.git/for-next.
> > >
> > > Cc: Calvin Owens <calvinowens@fb.com>
> > > Cc: Christoph Hellwig <hch@infradead.org>
> > > Cc: Sreekanth Reddy <sreekanth.reddy@avagotech.com>
> > > Cc: MPT-FusionLinux.pdl <MPT-FusionLinux.pdl@avagotech.com>
> > > Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
> > > ---
> > >  drivers/scsi/mpt3sas/mpt3sas_base.h      |  25 +-
> > >  drivers/scsi/mpt3sas/mpt3sas_scsih.c     | 479
> > > +++++++++++++++++++++----------
> > >  drivers/scsi/mpt3sas/mpt3sas_transport.c |  18 +-
> > >  3 files changed, 364 insertions(+), 158 deletions(-)
> > >
> > > @@ -2763,7 +2874,7 @@ _scsih_block_io_device(struct MPT3SAS_ADAPTER *ioc,
> > > u16 handle)
> > >         struct scsi_device *sdev;
> > >         struct _sas_device *sas_device;
> > >
> > 
> > [Sreekanth] Here sas_device_lock spin lock needs to be acquired before
> > calling
> >                   __mpt3sas_get_sdev_by_addr() function.
> > 
> > [Chaitra]Here instead of calling " __mpt3sas_get_sdev_by_handle()" function
> > calling
> > 	"mpt3sas_get_sdev_by_handle()" function will fixes "invalid page access"
> > type of kernel panic
> > 
> > > -       sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
> > > +       sas_device = __mpt3sas_get_sdev_by_handle(ioc, handle);
> > >         if (!sas_device)
> > >                 return;
> > >
> 
> Whoops, missed this comment in _scsih_block_io_device() from Sreekanth's
> earlier reply.
> 
> Here's the updated incremental patch atop target-pending/for-next-merge
> to use the protected callers for both cases.
> 
> Please review + ACK ASAP.

The mpt3sas -v2 series + v4.3-rc0 breakage incremental patch here made
it into linux-next-09102015, and at this point I don't see a scenario
where keeping around the broken list_head dereferences makes sense.

So that said, I'd like to send a target-pending/for-next-merge PULL
request out to Linus in the next 48 hours.

Any objections from Avago folks..?

Thank you,

--nab


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH-v2 1/2] mpt3sas: Refcount sas_device objects and fix unsafe list usage
  2015-09-11  6:55         ` Nicholas A. Bellinger
@ 2015-09-11 17:50           ` James Bottomley
  2015-09-11 20:01             ` Nicholas A. Bellinger
  0 siblings, 1 reply; 11+ messages in thread
From: James Bottomley @ 2015-09-11 17:50 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: Chaitra Basappa, Sreekanth Reddy, Nicholas A. Bellinger,
	linux-scsi, linux-kernel, Calvin Owens, Christoph Hellwig,
	PDL-MPT-FUSIONLINUX, kernel-team

On Thu, 2015-09-10 at 23:55 -0700, Nicholas A. Bellinger wrote:
> On Wed, 2015-09-09 at 15:03 -0700, Nicholas A. Bellinger wrote:
> > On Wed, 2015-09-09 at 19:59 +0530, Chaitra Basappa wrote:
> > > From: Sreekanth Reddy [mailto:sreekanth.reddy@avagotech.com]
> > > Sent: Tuesday, September 08, 2015 5:26 PM
> > > To: Nicholas A. Bellinger
> > > Cc: linux-scsi; linux-kernel; James Bottomley; Calvin Owens; Christoph
> > > Hellwig; MPT-FusionLinux.pdl; kernel-team; Nicholas Bellinger; Chaitra
> > > Basappa
> > > Subject: Re: [PATCH-v2 1/2] mpt3sas: Refcount sas_device objects and fix
> > > unsafe list usage
> > > 
> > > On Sun, Aug 30, 2015 at 1:24 PM, Nicholas A. Bellinger <nab@daterainc.com>
> > > wrote:
> > > > From: Nicholas Bellinger <nab@linux-iscsi.org>
> > > >
> > > > These objects can be referenced concurrently throughout the driver, we
> > > > need a way to make sure threads can't delete them out from under each
> > > > other. This patch adds the refcount, and refactors the code to use it.
> > > >
> > > > Additionally, we cannot iterate over the sas_device_list without
> > > > holding the lock, or we risk corrupting random memory if items are
> > > > added or deleted as we iterate. This patch refactors
> > > > _scsih_probe_sas() to use the sas_device_list in a safe way.
> > > >
> > > > This patch is a port of Calvin's PATCH-v4 for mpt2sas code, atop
> > > > mpt3sas changes in scsi.git/for-next.
> > > >
> > > > Cc: Calvin Owens <calvinowens@fb.com>
> > > > Cc: Christoph Hellwig <hch@infradead.org>
> > > > Cc: Sreekanth Reddy <sreekanth.reddy@avagotech.com>
> > > > Cc: MPT-FusionLinux.pdl <MPT-FusionLinux.pdl@avagotech.com>
> > > > Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
> > > > ---
> > > >  drivers/scsi/mpt3sas/mpt3sas_base.h      |  25 +-
> > > >  drivers/scsi/mpt3sas/mpt3sas_scsih.c     | 479
> > > > +++++++++++++++++++++----------
> > > >  drivers/scsi/mpt3sas/mpt3sas_transport.c |  18 +-
> > > >  3 files changed, 364 insertions(+), 158 deletions(-)
> > > >
> > > > @@ -2763,7 +2874,7 @@ _scsih_block_io_device(struct MPT3SAS_ADAPTER *ioc,
> > > > u16 handle)
> > > >         struct scsi_device *sdev;
> > > >         struct _sas_device *sas_device;
> > > >
> > > 
> > > [Sreekanth] Here sas_device_lock spin lock needs to be acquired before
> > > calling
> > >                   __mpt3sas_get_sdev_by_addr() function.
> > > 
> > > [Chaitra]Here instead of calling " __mpt3sas_get_sdev_by_handle()" function
> > > calling
> > > 	"mpt3sas_get_sdev_by_handle()" function will fixes "invalid page access"
> > > type of kernel panic
> > > 
> > > > -       sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
> > > > +       sas_device = __mpt3sas_get_sdev_by_handle(ioc, handle);
> > > >         if (!sas_device)
> > > >                 return;
> > > >
> > 
> > Whoops, missed this comment in _scsih_block_io_device() from Sreekanth's
> > earlier reply.
> > 
> > Here's the updated incremental patch atop target-pending/for-next-merge
> > to use the protected callers for both cases.
> > 
> > Please review + ACK ASAP.
> 
> The mpt3sas -v2 series + v4.3-rc0 breakage incremental patch here made
> it into linux-next-09102015, and at this point I don't see a scenario
> where keeping around the broken list_head dereferences makes sense.

I already explained the dangers of what the patch does.  Separated
lifetime objects need to be treated very carefully.  Rushing this in to
-rc1 without an Avago soak test is irresponsible.  Two issues have
already turned up in this thanks to inspection and as a bug fix it's not
bound by the merge window anyway so there's no reason to rush it into
-rc1 without the proper testing.

The reason for wanting to do this right is not to create a bisection
black hole: if we create an unreliable base storage driver by rushing
this into -rc1 it makes bisection very difficult for people who use mpt3
gear because they won't know if it's the bug they're chasing or the one
we introduced which they can't avoid because they have to use a storage
driver to boot the kernel.


> So that said, I'd like to send a target-pending/for-next-merge PULL
> request out to Linus in the next 48 hours.

How about no: it's not a target patch, it's an initiator patch, which
makes it my decision not yours.  The Maintainers are being responsive,
so there's no reason to override their request for a soak test, even if
you are the patch author.  It will get pushed once they confirm.

James




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH-v2 1/2] mpt3sas: Refcount sas_device objects and fix unsafe list usage
  2015-09-11 17:50           ` James Bottomley
@ 2015-09-11 20:01             ` Nicholas A. Bellinger
  0 siblings, 0 replies; 11+ messages in thread
From: Nicholas A. Bellinger @ 2015-09-11 20:01 UTC (permalink / raw)
  To: James Bottomley
  Cc: Chaitra Basappa, Sreekanth Reddy, Nicholas A. Bellinger,
	linux-scsi, linux-kernel, Calvin Owens, Christoph Hellwig,
	PDL-MPT-FUSIONLINUX, kernel-team

On Fri, 2015-09-11 at 10:50 -0700, James Bottomley wrote:
> On Thu, 2015-09-10 at 23:55 -0700, Nicholas A. Bellinger wrote:
> > On Wed, 2015-09-09 at 15:03 -0700, Nicholas A. Bellinger wrote:
> > > On Wed, 2015-09-09 at 19:59 +0530, Chaitra Basappa wrote:
> > > > From: Sreekanth Reddy [mailto:sreekanth.reddy@avagotech.com]
> > > > Sent: Tuesday, September 08, 2015 5:26 PM
> > > > To: Nicholas A. Bellinger
> > > > Cc: linux-scsi; linux-kernel; James Bottomley; Calvin Owens; Christoph
> > > > Hellwig; MPT-FusionLinux.pdl; kernel-team; Nicholas Bellinger; Chaitra
> > > > Basappa
> > > > Subject: Re: [PATCH-v2 1/2] mpt3sas: Refcount sas_device objects and fix
> > > > unsafe list usage
> > > > 
> > > > On Sun, Aug 30, 2015 at 1:24 PM, Nicholas A. Bellinger <nab@daterainc.com>
> > > > wrote:
> > > > > From: Nicholas Bellinger <nab@linux-iscsi.org>
> > > > >
> > > > > These objects can be referenced concurrently throughout the driver, we
> > > > > need a way to make sure threads can't delete them out from under each
> > > > > other. This patch adds the refcount, and refactors the code to use it.
> > > > >
> > > > > Additionally, we cannot iterate over the sas_device_list without
> > > > > holding the lock, or we risk corrupting random memory if items are
> > > > > added or deleted as we iterate. This patch refactors
> > > > > _scsih_probe_sas() to use the sas_device_list in a safe way.
> > > > >
> > > > > This patch is a port of Calvin's PATCH-v4 for mpt2sas code, atop
> > > > > mpt3sas changes in scsi.git/for-next.
> > > > >
> > > > > Cc: Calvin Owens <calvinowens@fb.com>
> > > > > Cc: Christoph Hellwig <hch@infradead.org>
> > > > > Cc: Sreekanth Reddy <sreekanth.reddy@avagotech.com>
> > > > > Cc: MPT-FusionLinux.pdl <MPT-FusionLinux.pdl@avagotech.com>
> > > > > Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
> > > > > ---
> > > > >  drivers/scsi/mpt3sas/mpt3sas_base.h      |  25 +-
> > > > >  drivers/scsi/mpt3sas/mpt3sas_scsih.c     | 479
> > > > > +++++++++++++++++++++----------
> > > > >  drivers/scsi/mpt3sas/mpt3sas_transport.c |  18 +-
> > > > >  3 files changed, 364 insertions(+), 158 deletions(-)
> > > > >
> > > > > @@ -2763,7 +2874,7 @@ _scsih_block_io_device(struct MPT3SAS_ADAPTER *ioc,
> > > > > u16 handle)
> > > > >         struct scsi_device *sdev;
> > > > >         struct _sas_device *sas_device;
> > > > >
> > > > 
> > > > [Sreekanth] Here sas_device_lock spin lock needs to be acquired before
> > > > calling
> > > >                   __mpt3sas_get_sdev_by_addr() function.
> > > > 
> > > > [Chaitra]Here instead of calling " __mpt3sas_get_sdev_by_handle()" function
> > > > calling
> > > > 	"mpt3sas_get_sdev_by_handle()" function will fixes "invalid page access"
> > > > type of kernel panic
> > > > 
> > > > > -       sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
> > > > > +       sas_device = __mpt3sas_get_sdev_by_handle(ioc, handle);
> > > > >         if (!sas_device)
> > > > >                 return;
> > > > >
> > > 
> > > Whoops, missed this comment in _scsih_block_io_device() from Sreekanth's
> > > earlier reply.
> > > 
> > > Here's the updated incremental patch atop target-pending/for-next-merge
> > > to use the protected callers for both cases.
> > > 
> > > Please review + ACK ASAP.
> > 
> > The mpt3sas -v2 series + v4.3-rc0 breakage incremental patch here made
> > it into linux-next-09102015, and at this point I don't see a scenario
> > where keeping around the broken list_head dereferences makes sense.
> 
> I already explained the dangers of what the patch does.  Separated
> lifetime objects need to be treated very carefully.  Rushing this in to
> -rc1 without an Avago soak test is irresponsible.  Two issues have
> already turned up in this thanks to inspection and as a bug fix it's not
> bound by the merge window anyway so there's no reason to rush it into
> -rc1 without the proper testing.
> 

It's not being 'rushed in'.  The changes have being run continuously on
60+ HBAs w/ 720+ HDDs using v3.14.y code for the last 3 weeks.

Calvin reviewed the code, the Avago folks have commented on the code,
and it's in linux-next.

Currently there are no outstanding comments to be addressed for -v2.  

> The reason for wanting to do this right is not to create a bisection
> black hole: if we create an unreliable base storage driver by rushing
> this into -rc1 it makes bisection very difficult for people who use mpt3
> gear because they won't know if it's the bug they're chasing or the one
> we introduced which they can't avoid because they have to use a storage
> driver to boot the kernel.
> 

As mentioned, there is not a scenario where keeping this broken list
handling code around for -rc1 makes any sense, when the exact same
change for mpt2sas minus two simple cases is (I assume) going to Linus
shortly.

Do you have any specific code comments on the series, or not..?

> 
> > So that said, I'd like to send a target-pending/for-next-merge PULL
> > request out to Linus in the next 48 hours.
> 
> How about no: it's not a target patch, it's an initiator patch, which
> makes it my decision not yours.  The Maintainers are being responsive,
> so there's no reason to override their request for a soak test, even if
> you are the patch author.  It will get pushed once they confirm.
> 

Believe me, fixing up LLD code and arguing with you over why utterly
broken list handling code should be left as-is for another few weeks in
mainline is the last thing I'd like to be spending time on right now.

Unfortunately, I've got 100+ HBAs that use this driver, and the current
state of broken list handling is completely unacceptable.

--nab


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2015-09-11 20:01 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-08-30  7:54 [PATCH-v2 0/2] mpt3sas: Reference counting fixes from for-next mpt2sas Nicholas A. Bellinger
2015-08-30  7:54 ` [PATCH-v2 1/2] mpt3sas: Refcount sas_device objects and fix unsafe list usage Nicholas A. Bellinger
2015-09-08 11:55   ` Sreekanth Reddy
2015-09-09  6:21     ` Nicholas A. Bellinger
2015-09-09 14:29     ` Chaitra Basappa
2015-09-09 22:03       ` Nicholas A. Bellinger
2015-09-11  6:55         ` Nicholas A. Bellinger
2015-09-11 17:50           ` James Bottomley
2015-09-11 20:01             ` Nicholas A. Bellinger
2015-08-30  7:54 ` [PATCH-v2 2/2] mpt3sas: Refcount fw_events " Nicholas A. Bellinger
2015-09-04 18:47 ` [PATCH-v2 0/2] mpt3sas: Reference counting fixes from for-next mpt2sas Nicholas A. Bellinger

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.