* [PATCH v2 00/28] libsas: eh reworks (ata-eh vs discovery, races, ...)
@ 2011-12-23  2:58 Dan Williams
  2011-12-23  2:58 ` [PATCH v2 01/28] libsas: remove unused ata_task_resp fields Dan Williams
                   ` (27 more replies)
  0 siblings, 28 replies; 37+ messages in thread
From: Dan Williams @ 2011-12-23  2:58 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide

v1 here: http://marc.info/?l=linux-scsi&m=132408929808366&w=2

Resending all patches, since 16 of them were either modified or are new
in this set.

Changes since v1:
1/ The changes to kernel/workqueue.c (to track unchained work during a
   drain_workqueue() operation) have been dropped.  Instead this
   functionality has been pushed down into libsas.  "[PATCH v2 07/28]
   libsas: introduce sas_drain_work()"

2/ Extended "[PATCH v2 09/28] libsas: prevent domain rediscovery
   competing with ata error handling" to fix a deadlock encountered while
   removing a device.  Device removal issues cache-flush i/o, which makes
   libsas dependent on the completion of eh, which in turn means that
   libsas must not hold eh_mutex over a removal event.

3/ New patch "[PATCH v2 27/28] libsas: fix sas_find_local_phy(), take
   phy references" addresses hitting the BUG_ON(!exphy) in this routine.
   Nothing prevents eh from still being in flight after libsas has removed a
   device from the domain, so the BUG_ON is bogus.

4/ A small collection of dev->gone related fixups: patches 25, 26, and 28.

5/ Picked up a few acked-by and reviewed-by's from Jack, but did not
   include his tested-by across the set given the changes since v1.

  git://git.kernel.org/pub/scm/linux/kernel/git/djbw/isci.git libsas


Note that I can still occasionally produce the following, which appears
to be a use-after-free of the request_queue, but at least the hotplug /
eh bugs are starting to be dominated by upper layer issues:

[ 8756.193230] general protection fault: 0000 [#1] SMP 
[ 8756.199203] CPU 3 
[ 8756.201260] Modules linked in: isci libsas libata scsi_transport_sas sd_mod ipv6 md_mod i2c_i801 i2c_core [last unloaded: scsi_transport_sas]
[ 8756.216991] 
[ 8756.219041] Pid: 2986, comm: dd Not tainted 3.2.0-rc5+
[ 8756.232102] RIP: 0010:[<ffffffff8106942f>]  [<ffffffff8106942f>] __lock_acquire+0xe3/0xcf1
[ 8756.242131] RSP: 0018:ffff880032b83948  EFLAGS: 00010002
[ 8756.248457] RAX: 6b6b6b6b6b6b6b6b RBX: ffff8800350d0000 RCX: 0000000000000000
[ 8756.256824] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8800380c26a8
[ 8756.265189] RBP: ffff880032b839f8 R08: 0000000000000002 R09: 0000000000000001
[ 8756.273558] R10: ffffffff816123e0 R11: ffffea00005df200 R12: ffff8800350d0000
[ 8756.281925] R13: 0000000000000002 R14: 0000000000000000 R15: ffff8800380c26a8
[ 8756.290292] FS:  00007f4cfa9af700(0000) GS:ffff88003a6c0000(0000) knlGS:0000000000000000
[ 8756.300110] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 8756.306921] CR2: 00007fff1fc37c78 CR3: 0000000023337000 CR4: 00000000000406e0
[ 8756.315285] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8756.323653] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 8756.332032] Process dd (pid: 2986, threadinfo ffff880032b82000, task ffff8800350d0000)
[ 8756.341645] Stack:
[ 8756.344275]  ffff880000000000 ffffffff00000001 0000000000000000 000000000000000c
[ 8756.353363]  ffff880032b839b8 ffffffff81069159 ffff8800350d06a0 0000000000001000
[ 8756.362460]  0000000000000000 ffff8800350d0000 ffff8800350d06a0 0000000000000002
[ 8756.371558] Call Trace:
[ 8756.374681]  [<ffffffff81069159>] ? mark_lock+0x2d/0x220
[ 8756.381007]  [<ffffffff81069828>] ? __lock_acquire+0x4dc/0xcf1
[ 8756.387911]  [<ffffffff8103412d>] ? try_to_wake_up+0x2e/0x1d9
[ 8756.394725]  [<ffffffff8106a4d6>] lock_acquire+0xec/0x117
[ 8756.401147]  [<ffffffff8103412d>] ? try_to_wake_up+0x2e/0x1d9
[ 8756.407955]  [<ffffffff81497be7>] _raw_spin_lock_irqsave+0x4e/0x61
[ 8756.415255]  [<ffffffff8103412d>] ? try_to_wake_up+0x2e/0x1d9
[ 8756.422073]  [<ffffffff8111f83c>] ? bdi_start_background_writeback+0x52/0x78
[ 8756.430348]  [<ffffffff8103412d>] try_to_wake_up+0x2e/0x1d9
[ 8756.436970]  [<ffffffff81034301>] wake_up_process+0x15/0x17
[ 8756.443591]  [<ffffffff8111f855>] bdi_start_background_writeback+0x6b/0x78


Updated overview of the patchkit:

Patches 1 - 6, 10: miscellaneous cleanups
We seem to have been leaking struct domain_device objects for a while,
and the event locks in libsas are not required.  Calling
->lldd_dev_found() twice is removed in favor of libata notifying aic94xx
of the dma parameters directly.

Patch 7: sas_drain_work() implementation as noted above

Patch 8: uplevel sas_ata ata lock management

Patches 11 - 17, 27: completion races and libsas-eh vs libata-eh
Close races between eh completion and "late" completion by the lldd.
Where possible, defer error handling to libata.  After these updates an
lldd can use sas_ata_schedule_reset() to ensure that the reset is
performed in libata context and not libsas-eh which has no link recovery
logic.

Patches 9, 18 - 24: libata link management
These patches aim to make sure all sources of reset of ata devices
occur in libata-eh context.  While libata-eh is active domain
rediscovery is held off.

Patches 25 - 26, 28: dev->gone related fixups

---
[PATCH v2 01/28] libsas: remove unused ata_task_resp fields
[PATCH v2 02/28] libsas: kill sas_slave_destroy
[PATCH v2 03/28] libsas: fix domain_device leak
[PATCH v2 04/28] libsas: fix leak of dev->sata_dev.identify_[packet_]device
[PATCH v2 05/28] libsas: replace event locks with atomic bitops
[PATCH v2 06/28] libsas: convert ha->state to flags
[PATCH v2 07/28] libsas: introduce sas_drain_work()
[PATCH v2 08/28] libsas: remove ata_port.lock management duties from lldds
[PATCH v2 09/28] libsas: prevent domain rediscovery competing with ata error handling
[PATCH v2 10/28] libsas: use ->set_dmamode to notify lldds of NCQ parameters
[PATCH v2 11/28] libsas: kill invocation of scsi_eh_finish_cmd from sas_ata_task_done
[PATCH v2 12/28] libsas: close error handling vs sas_ata_task_done() race
[PATCH v2 13/28] libsas: prevent double completion of scmds from eh
[PATCH v2 14/28] libsas: fix timeout vs completion race
[PATCH v2 15/28] libsas: let libata handle command timeouts
[PATCH v2 16/28] libsas: defer SAS_TASK_NEED_DEV_RESET commands to libata
[PATCH v2 17/28] libsas: use libata-eh-reset for sata rediscovery fis transmit failures
[PATCH v2 18/28] libsas: perform sas-transport resets in shost->workq context
[PATCH v2 19/28] libsas: execute transport link resets with libata-eh via host workqueue
[PATCH v2 20/28] libsas: sas_phy_enable via transport_sas_phy_reset
[PATCH v2 21/28] libsas: Remove redundant phy state notification calls.
[PATCH v2 22/28] libsas: add mutex for SMP task execution
[PATCH v2 23/28] libsas: async ata-eh
[PATCH v2 24/28] libsas: poll for ata device readiness after reset
[PATCH v2 25/28] libsas: don't mark expanders as gone when a child device is removed
[PATCH v2 26/28] libsas: check for 'gone' expanders in smp_execute_task()
[PATCH v2 27/28] libsas: fix sas_find_local_phy(), take phy references
[PATCH v2 28/28] libsas: don't recover 'gone' devices in sas_ata_hard_reset()

 Documentation/scsi/libsas.txt       |   15 -
 drivers/ata/libata-eh.c             |    1 
 drivers/ata/libata.h                |    1 
 drivers/scsi/aic94xx/aic94xx.h      |    2 
 drivers/scsi/aic94xx/aic94xx_dev.c  |   38 +-
 drivers/scsi/aic94xx/aic94xx_init.c |    5 
 drivers/scsi/aic94xx/aic94xx_tmf.c  |    9 
 drivers/scsi/isci/host.c            |    8 
 drivers/scsi/isci/init.c            |    1 
 drivers/scsi/isci/request.c         |    3 
 drivers/scsi/isci/task.c            |   15 -
 drivers/scsi/isci/task.h            |   36 --
 drivers/scsi/libsas/sas_ata.c       |  670 ++++++++++++++---------------------
 drivers/scsi/libsas/sas_discover.c  |  143 ++++++-
 drivers/scsi/libsas/sas_event.c     |   62 +++
 drivers/scsi/libsas/sas_expander.c  |  107 ++++--
 drivers/scsi/libsas/sas_init.c      |  190 +++++++++-
 drivers/scsi/libsas/sas_internal.h  |   71 ++--
 drivers/scsi/libsas/sas_phy.c       |   12 -
 drivers/scsi/libsas/sas_port.c      |   24 +
 drivers/scsi/libsas/sas_scsi_host.c |  299 +++++++---------
 drivers/scsi/mvsas/mv_init.c        |    1 
 drivers/scsi/mvsas/mv_sas.c         |   11 -
 drivers/scsi/pm8001/pm8001_init.c   |    1 
 drivers/scsi/pm8001/pm8001_sas.c    |   29 +-
 drivers/scsi/scsi_transport_sas.c   |   59 +++
 include/linux/libata.h              |    1 
 include/scsi/libsas.h               |   58 ++-
 include/scsi/sas_ata.h              |   26 +
 include/scsi/scsi_transport_sas.h   |   12 +
 30 files changed, 1063 insertions(+), 847 deletions(-)


* [PATCH v2 01/28] libsas: remove unused ata_task_resp fields
  2011-12-23  2:58 [PATCH v2 00/28] libsas: eh reworks (ata-eh vs discovery, races, ...) Dan Williams
@ 2011-12-23  2:58 ` Dan Williams
  2011-12-23  2:58 ` [PATCH v2 02/28] libsas: kill sas_slave_destroy Dan Williams
                   ` (26 subsequent siblings)
  27 siblings, 0 replies; 37+ messages in thread
From: Dan Williams @ 2011-12-23  2:58 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide, Sergei Shtylyov, Jack Wang

Commit 1e34c838 "[SCSI] libsas: remove spurious sata control register
read/write" removed the routines to fake the presence of the sata
control registers.  Now remove the unused data structure fields to kill
any remaining confusion.

Cc: Sergei Shtylyov <sshtylyov@mvista.com>
Acked-by: Jack Wang <jack_wang@usish.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/libsas/sas_ata.c |    4 ----
 include/scsi/libsas.h         |    7 -------
 2 files changed, 0 insertions(+), 11 deletions(-)

diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
index db9238f..83118d0 100644
--- a/drivers/scsi/libsas/sas_ata.c
+++ b/drivers/scsi/libsas/sas_ata.c
@@ -121,10 +121,6 @@ static void sas_ata_task_done(struct sas_task *task)
 			if (unlikely(link->eh_info.err_mask))
 				qc->flags |= ATA_QCFLAG_FAILED;
 		}
-
-		dev->sata_dev.sstatus = resp->sstatus;
-		dev->sata_dev.serror = resp->serror;
-		dev->sata_dev.scontrol = resp->scontrol;
 	} else {
 		ac = sas_to_ata_err(stat);
 		if (ac) {
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index 6a308d4..6e64b03 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -171,9 +171,6 @@ struct sata_device {
 	struct ata_port *ap;
 	struct ata_host ata_host;
 	struct ata_taskfile tf;
-	u32 sstatus;
-	u32 serror;
-	u32 scontrol;
 };
 
 /* ---------- Domain device ---------- */
@@ -487,10 +484,6 @@ enum exec_status {
 struct ata_task_resp {
 	u16  frame_len;
 	u8   ending_fis[24];	  /* dev to host or data-in */
-	u32  sstatus;
-	u32  serror;
-	u32  scontrol;
-	u32  sactive;
 };
 
 #define SAS_STATUS_BUF_SIZE 96



* [PATCH v2 02/28] libsas: kill sas_slave_destroy
  2011-12-23  2:58 [PATCH v2 00/28] libsas: eh reworks (ata-eh vs discovery, races, ...) Dan Williams
  2011-12-23  2:58 ` [PATCH v2 01/28] libsas: remove unused ata_task_resp fields Dan Williams
@ 2011-12-23  2:58 ` Dan Williams
  2011-12-23  2:58 ` [PATCH v2 03/28] libsas: fix domain_device leak Dan Williams
                   ` (25 subsequent siblings)
  27 siblings, 0 replies; 37+ messages in thread
From: Dan Williams @ 2011-12-23  2:58 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide

Per commit 3e4ec344 "libata: kill ATA_FLAG_DISABLED", needing to set
ATA_DEV_NONE is a holdover from before libsas converted to the
"new-style" ata-eh.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/aic94xx/aic94xx_init.c |    1 -
 drivers/scsi/isci/init.c            |    1 -
 drivers/scsi/libsas/sas_scsi_host.c |    9 ---------
 drivers/scsi/mvsas/mv_init.c        |    1 -
 drivers/scsi/pm8001/pm8001_init.c   |    1 -
 include/scsi/libsas.h               |    1 -
 6 files changed, 0 insertions(+), 14 deletions(-)

diff --git a/drivers/scsi/aic94xx/aic94xx_init.c b/drivers/scsi/aic94xx/aic94xx_init.c
index d5ff142..8db4e72 100644
--- a/drivers/scsi/aic94xx/aic94xx_init.c
+++ b/drivers/scsi/aic94xx/aic94xx_init.c
@@ -68,7 +68,6 @@ static struct scsi_host_template aic94xx_sht = {
 	.queuecommand		= sas_queuecommand,
 	.target_alloc		= sas_target_alloc,
 	.slave_configure	= sas_slave_configure,
-	.slave_destroy		= sas_slave_destroy,
 	.scan_finished		= asd_scan_finished,
 	.scan_start		= asd_scan_start,
 	.change_queue_depth	= sas_change_queue_depth,
diff --git a/drivers/scsi/isci/init.c b/drivers/scsi/isci/init.c
index a97edab..f988c16 100644
--- a/drivers/scsi/isci/init.c
+++ b/drivers/scsi/isci/init.c
@@ -146,7 +146,6 @@ static struct scsi_host_template isci_sht = {
 	.queuecommand			= sas_queuecommand,
 	.target_alloc			= sas_target_alloc,
 	.slave_configure		= sas_slave_configure,
-	.slave_destroy			= sas_slave_destroy,
 	.scan_finished			= isci_host_scan_finished,
 	.scan_start			= isci_host_scan_start,
 	.change_queue_depth		= sas_change_queue_depth,
diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index b6e233d..e95e5e1 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -797,14 +797,6 @@ int sas_slave_configure(struct scsi_device *scsi_dev)
 	return 0;
 }
 
-void sas_slave_destroy(struct scsi_device *scsi_dev)
-{
-	struct domain_device *dev = sdev_to_domain_dev(scsi_dev);
-
-	if (dev_is_sata(dev))
-		sas_to_ata_dev(dev)->class = ATA_DEV_NONE;
-}
-
 int sas_change_queue_depth(struct scsi_device *sdev, int depth, int reason)
 {
 	struct domain_device *dev = sdev_to_domain_dev(sdev);
@@ -1108,7 +1100,6 @@ EXPORT_SYMBOL_GPL(sas_request_addr);
 EXPORT_SYMBOL_GPL(sas_queuecommand);
 EXPORT_SYMBOL_GPL(sas_target_alloc);
 EXPORT_SYMBOL_GPL(sas_slave_configure);
-EXPORT_SYMBOL_GPL(sas_slave_destroy);
 EXPORT_SYMBOL_GPL(sas_change_queue_depth);
 EXPORT_SYMBOL_GPL(sas_change_queue_type);
 EXPORT_SYMBOL_GPL(sas_bios_param);
diff --git a/drivers/scsi/mvsas/mv_init.c b/drivers/scsi/mvsas/mv_init.c
index 6f58919..d45878b 100644
--- a/drivers/scsi/mvsas/mv_init.c
+++ b/drivers/scsi/mvsas/mv_init.c
@@ -60,7 +60,6 @@ static struct scsi_host_template mvs_sht = {
 	.queuecommand		= sas_queuecommand,
 	.target_alloc		= sas_target_alloc,
 	.slave_configure	= sas_slave_configure,
-	.slave_destroy		= sas_slave_destroy,
 	.scan_finished		= mvs_scan_finished,
 	.scan_start		= mvs_scan_start,
 	.change_queue_depth	= sas_change_queue_depth,
diff --git a/drivers/scsi/pm8001/pm8001_init.c b/drivers/scsi/pm8001/pm8001_init.c
index c21a216..bd165ea 100644
--- a/drivers/scsi/pm8001/pm8001_init.c
+++ b/drivers/scsi/pm8001/pm8001_init.c
@@ -62,7 +62,6 @@ static struct scsi_host_template pm8001_sht = {
 	.queuecommand		= sas_queuecommand,
 	.target_alloc		= sas_target_alloc,
 	.slave_configure	= sas_slave_configure,
-	.slave_destroy		= sas_slave_destroy,
 	.scan_finished		= pm8001_scan_finished,
 	.scan_start		= pm8001_scan_start,
 	.change_queue_depth	= sas_change_queue_depth,
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index 6e64b03..2b14348 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -625,7 +625,6 @@ extern int sas_queuecommand(struct Scsi_Host * ,struct scsi_cmnd *);
 extern int sas_target_alloc(struct scsi_target *);
 extern int sas_slave_alloc(struct scsi_device *);
 extern int sas_slave_configure(struct scsi_device *);
-extern void sas_slave_destroy(struct scsi_device *);
 extern int sas_change_queue_depth(struct scsi_device *, int new_depth,
 				  int reason);
 extern int sas_change_queue_type(struct scsi_device *, int qt);



* [PATCH v2 03/28] libsas: fix domain_device leak
  2011-12-23  2:58 [PATCH v2 00/28] libsas: eh reworks (ata-eh vs discovery, races, ...) Dan Williams
  2011-12-23  2:58 ` [PATCH v2 01/28] libsas: remove unused ata_task_resp fields Dan Williams
  2011-12-23  2:58 ` [PATCH v2 02/28] libsas: kill sas_slave_destroy Dan Williams
@ 2011-12-23  2:58 ` Dan Williams
  2011-12-23  2:58 ` [PATCH v2 04/28] libsas: fix leak of dev->sata_dev.identify_[packet_]device Dan Williams
                   ` (24 subsequent siblings)
  27 siblings, 0 replies; 37+ messages in thread
From: Dan Williams @ 2011-12-23  2:58 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide, Jack Wang

Arrange for the deallocation of a struct domain_device object when it no
longer has:
1/ any children
2/ references by any scsi_targets
3/ references by an lldd

The comment about domain_device lifetime in
Documentation/scsi/libsas.txt is stale as it appears mainline never had
a version of a struct domain_device that was registered as a kobject.
We now manage domain_device reference counts on behalf of external
agents.
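
As a rough userspace illustration of the lifetime rule described above
(not the libsas code itself), the sketch below uses a plain, non-atomic
counter in place of struct kref.  All names in it (ref_sketch,
dev_sketch, dev_sketch_alloc, dev_sketch_get, dev_sketch_put,
live_devices) are hypothetical.  It only shows that a device is freed
when its last reference is dropped, and that a child keeps its parent
alive until the child itself is freed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct kref (not atomic here). */
struct ref_sketch {
	int count;
};

struct dev_sketch {
	struct dev_sketch *parent;	/* a child pins its parent */
	struct ref_sketch ref;
};

static int live_devices;	/* demo-only counter used to observe frees */

static struct dev_sketch *dev_sketch_alloc(struct dev_sketch *parent)
{
	struct dev_sketch *dev = calloc(1, sizeof(*dev));

	if (!dev)
		return NULL;
	dev->ref.count = 1;		/* initial reference, owned by the caller */
	if (parent) {
		parent->ref.count++;	/* the child holds a parent reference */
		dev->parent = parent;
	}
	live_devices++;
	return dev;
}

/* Take a reference on behalf of an external agent (target, driver, ...). */
static void dev_sketch_get(struct dev_sketch *dev)
{
	dev->ref.count++;
}

/* Drop a reference; free the device when the last one goes away. */
static void dev_sketch_put(struct dev_sketch *dev)
{
	struct dev_sketch *parent;

	assert(dev->ref.count > 0);
	if (--dev->ref.count)
		return;

	parent = dev->parent;
	free(dev);
	live_devices--;
	if (parent)
		dev_sketch_put(parent);	/* release the reference taken at alloc */
}
```

In this sketch an expander whose own reference has been dropped still
survives for as long as one of its children does, which mirrors the
"no children" condition in the list above.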

Reviewed-by: Jack Wang <jack_wang@usish.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 Documentation/scsi/libsas.txt       |   15 ---------------
 drivers/scsi/libsas/sas_discover.c  |   36 +++++++++++++++++++++++------------
 drivers/scsi/libsas/sas_expander.c  |   10 ++++++----
 drivers/scsi/libsas/sas_internal.h  |   19 ++++++++++++++++++
 drivers/scsi/libsas/sas_scsi_host.c |   16 ++++++----------
 include/scsi/libsas.h               |    1 +
 6 files changed, 56 insertions(+), 41 deletions(-)

diff --git a/Documentation/scsi/libsas.txt b/Documentation/scsi/libsas.txt
index aa54f54..3cc9c78 100644
--- a/Documentation/scsi/libsas.txt
+++ b/Documentation/scsi/libsas.txt
@@ -398,21 +398,6 @@ struct sas_task {
 	task_done -- callback when the task has finished execution
 };
 
-When an external entity, entity other than the LLDD or the
-SAS Layer, wants to work with a struct domain_device, it
-_must_ call kobject_get() when getting a handle on the
-device and kobject_put() when it is done with the device.
-
-This does two things:
-     A) implements proper kfree() for the device;
-     B) increments/decrements the kref for all players:
-     domain_device
-	all domain_device's ... (if past an expander)
-	    port
-		host adapter
-		     pci device
-			 and up the ladder, etc.
-
 DISCOVERY
 ---------
 
diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
index 54a5199..4e64930 100644
--- a/drivers/scsi/libsas/sas_discover.c
+++ b/drivers/scsi/libsas/sas_discover.c
@@ -36,8 +36,6 @@
 
 void sas_init_dev(struct domain_device *dev)
 {
-        INIT_LIST_HEAD(&dev->siblings);
-        INIT_LIST_HEAD(&dev->dev_list_node);
         switch (dev->dev_type) {
         case SAS_END_DEV:
                 break;
@@ -73,14 +71,14 @@ static int sas_get_port_device(struct asd_sas_port *port)
 	struct sas_rphy *rphy;
 	struct domain_device *dev;
 
-	dev = kzalloc(sizeof(*dev), GFP_KERNEL);
+	dev = sas_alloc_device();
 	if (!dev)
 		return -ENOMEM;
 
 	spin_lock_irqsave(&port->phy_list_lock, flags);
 	if (list_empty(&port->phy_list)) {
 		spin_unlock_irqrestore(&port->phy_list_lock, flags);
-		kfree(dev);
+		sas_put_device(dev);
 		return -ENODEV;
 	}
 	phy = container_of(port->phy_list.next, struct asd_sas_phy, port_phy_el);
@@ -130,7 +128,7 @@ static int sas_get_port_device(struct asd_sas_port *port)
 	}
 
 	if (!rphy) {
-		kfree(dev);
+		sas_put_device(dev);
 		return -ENODEV;
 	}
 	rphy->identify.phy_identifier = phy->phy->identify.phy_identifier;
@@ -173,6 +171,7 @@ int sas_notify_lldd_dev_found(struct domain_device *dev)
 			       dev_name(sas_ha->dev),
 			       SAS_ADDR(dev->sas_addr), res);
 		}
+		kref_get(&dev->kref);
 	}
 	return res;
 }
@@ -184,8 +183,10 @@ void sas_notify_lldd_dev_gone(struct domain_device *dev)
 	struct Scsi_Host *shost = sas_ha->core.shost;
 	struct sas_internal *i = to_sas_internal(shost->transportt);
 
-	if (i->dft->lldd_dev_gone)
+	if (i->dft->lldd_dev_gone) {
 		i->dft->lldd_dev_gone(dev);
+		sas_put_device(dev);
+	}
 }
 
 /* ---------- Common/dispatchers ---------- */
@@ -219,6 +220,20 @@ out_err2:
 
 /* ---------- Device registration and unregistration ---------- */
 
+void sas_free_device(struct kref *kref)
+{
+	struct domain_device *dev = container_of(kref, typeof(*dev), kref);
+
+	if (dev->parent)
+		sas_put_device(dev->parent);
+
+	/* remove the phys and ports, everything else should be gone */
+	if (dev->dev_type == EDGE_DEV || dev->dev_type == FANOUT_DEV)
+		kfree(dev->ex_dev.ex_phy);
+
+	kfree(dev);
+}
+
 static void sas_unregister_common_dev(struct asd_sas_port *port, struct domain_device *dev)
 {
 	sas_notify_lldd_dev_gone(dev);
@@ -230,6 +245,8 @@ static void sas_unregister_common_dev(struct asd_sas_port *port, struct domain_d
 	spin_lock_irq(&port->dev_list_lock);
 	list_del_init(&dev->dev_list_node);
 	spin_unlock_irq(&port->dev_list_lock);
+
+	sas_put_device(dev);
 }
 
 void sas_unregister_dev(struct asd_sas_port *port, struct domain_device *dev)
@@ -239,11 +256,6 @@ void sas_unregister_dev(struct asd_sas_port *port, struct domain_device *dev)
 		sas_rphy_delete(dev->rphy);
 		dev->rphy = NULL;
 	}
-	if (dev->dev_type == EDGE_DEV || dev->dev_type == FANOUT_DEV) {
-		/* remove the phys and ports, everything else should be gone */
-		kfree(dev->ex_dev.ex_phy);
-		dev->ex_dev.ex_phy = NULL;
-	}
 	sas_unregister_common_dev(port, dev);
 }
 
@@ -322,7 +334,7 @@ static void sas_discover_domain(struct work_struct *work)
 		list_del_init(&dev->dev_list_node);
 		spin_unlock_irq(&port->dev_list_lock);
 
-		kfree(dev); /* not kobject_register-ed yet */
+		sas_put_device(dev);
 		port->port_dev = NULL;
 	}
 
diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index 1b831c5..15d2239 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -657,10 +657,11 @@ static struct domain_device *sas_ex_discover_end_dev(
 	if (phy->attached_sata_host || phy->attached_sata_ps)
 		return NULL;
 
-	child = kzalloc(sizeof(*child), GFP_KERNEL);
+	child = sas_alloc_device();
 	if (!child)
 		return NULL;
 
+	kref_get(&parent->kref);
 	child->parent = parent;
 	child->port   = parent->port;
 	child->iproto = phy->attached_iproto;
@@ -762,7 +763,7 @@ static struct domain_device *sas_ex_discover_end_dev(
 	sas_port_delete(phy->port);
  out_err:
 	phy->port = NULL;
-	kfree(child);
+	sas_put_device(child);
 	return NULL;
 }
 
@@ -809,7 +810,7 @@ static struct domain_device *sas_ex_discover_expander(
 			    phy->attached_phy_id);
 		return NULL;
 	}
-	child = kzalloc(sizeof(*child), GFP_KERNEL);
+	child = sas_alloc_device();
 	if (!child)
 		return NULL;
 
@@ -835,6 +836,7 @@ static struct domain_device *sas_ex_discover_expander(
 	child->rphy = rphy;
 	edev = rphy_to_expander_device(rphy);
 	child->dev_type = phy->attached_dev_type;
+	kref_get(&parent->kref);
 	child->parent = parent;
 	child->port = port;
 	child->iproto = phy->attached_iproto;
@@ -858,7 +860,7 @@ static struct domain_device *sas_ex_discover_expander(
 		spin_lock_irq(&parent->port->dev_list_lock);
 		list_del(&child->dev_list_node);
 		spin_unlock_irq(&parent->port->dev_list_lock);
-		kfree(child);
+		sas_put_device(child);
 		return NULL;
 	}
 	list_add_tail(&child->siblings, &parent->ex_dev.children);
diff --git a/drivers/scsi/libsas/sas_internal.h b/drivers/scsi/libsas/sas_internal.h
index 14e21b5..0d43408 100644
--- a/drivers/scsi/libsas/sas_internal.h
+++ b/drivers/scsi/libsas/sas_internal.h
@@ -76,6 +76,8 @@ struct domain_device *sas_find_dev_by_rphy(struct sas_rphy *rphy);
 
 void sas_hae_reset(struct work_struct *work);
 
+void sas_free_device(struct kref *kref);
+
 #ifdef CONFIG_SCSI_SAS_HOST_SMP
 extern int sas_smp_host_handler(struct Scsi_Host *shost, struct request *req,
 				struct request *rsp);
@@ -161,4 +163,21 @@ static inline void sas_add_parent_port(struct domain_device *dev, int phy_id)
 	sas_port_add_phy(ex->parent_port, ex_phy->phy);
 }
 
+static inline struct domain_device *sas_alloc_device(void)
+{
+	struct domain_device *dev = kzalloc(sizeof(*dev), GFP_KERNEL);
+
+	if (dev) {
+		INIT_LIST_HEAD(&dev->siblings);
+		INIT_LIST_HEAD(&dev->dev_list_node);
+		kref_init(&dev->kref);
+	}
+	return dev;
+}
+
+static inline void sas_put_device(struct domain_device *dev)
+{
+	kref_put(&dev->kref, sas_free_device);
+}
+
 #endif /* _SAS_INTERNAL_H_ */
diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index e95e5e1..2a163c7 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -737,16 +737,10 @@ struct domain_device *sas_find_dev_by_rphy(struct sas_rphy *rphy)
 	return found_dev;
 }
 
-static inline struct domain_device *sas_find_target(struct scsi_target *starget)
-{
-	struct sas_rphy *rphy = dev_to_rphy(starget->dev.parent);
-
-	return sas_find_dev_by_rphy(rphy);
-}
-
 int sas_target_alloc(struct scsi_target *starget)
 {
-	struct domain_device *found_dev = sas_find_target(starget);
+	struct sas_rphy *rphy = dev_to_rphy(starget->dev.parent);
+	struct domain_device *found_dev = sas_find_dev_by_rphy(rphy);
 	int res;
 
 	if (!found_dev)
@@ -758,6 +752,7 @@ int sas_target_alloc(struct scsi_target *starget)
 			return res;
 	}
 
+	kref_get(&found_dev->kref);
 	starget->hostdata = found_dev;
 	return 0;
 }
@@ -1047,7 +1042,7 @@ int sas_slave_alloc(struct scsi_device *scsi_dev)
 
 void sas_target_destroy(struct scsi_target *starget)
 {
-	struct domain_device *found_dev = sas_find_target(starget);
+	struct domain_device *found_dev = starget->hostdata;
 
 	if (!found_dev)
 		return;
@@ -1055,7 +1050,8 @@ void sas_target_destroy(struct scsi_target *starget)
 	if (dev_is_sata(found_dev))
 		ata_sas_port_destroy(found_dev->sata_dev.ap);
 
-	return;
+	starget->hostdata = NULL;
+	sas_put_device(found_dev);
 }
 
 static void sas_parse_addr(u8 *sas_addr, const char *p)
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index 2b14348..7ecb5c1 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -206,6 +206,7 @@ struct domain_device {
 
         void *lldd_dev;
 	int gone;
+	struct kref kref;
 };
 
 struct sas_discovery_event {



* [PATCH v2 04/28] libsas: fix leak of dev->sata_dev.identify_[packet_]device
  2011-12-23  2:58 [PATCH v2 00/28] libsas: eh reworks (ata-eh vs discovery, races, ...) Dan Williams
                   ` (2 preceding siblings ...)
  2011-12-23  2:58 ` [PATCH v2 03/28] libsas: fix domain_device leak Dan Williams
@ 2011-12-23  2:58 ` Dan Williams
  2011-12-23  2:58 ` [PATCH v2 05/28] libsas: replace event locks with atomic bitops Dan Williams
                   ` (23 subsequent siblings)
  27 siblings, 0 replies; 37+ messages in thread
From: Dan Williams @ 2011-12-23  2:58 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide, Jack Wang

These are never freed in the nominal path.  Because a domain_device has
a different lifetime than a sas_rphy, we need a dev->rphy-independent
way of identifying sata devices.
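
To illustrate the point with a minimal userspace sketch (hypothetical
types and names, not the libsas structures): once the rphy pointer has
been cleared at unregistration, a check that dereferences it is no
longer usable at final-free time, whereas a check keyed on the device
type stored in the device itself still works.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified device types for this sketch only. */
enum type_sketch {
	SKETCH_END_DEV,
	SKETCH_SATA_DEV,
	SKETCH_SATA_PM,
	SKETCH_SATA_PM_PORT,
};

struct rphy_sketch {
	unsigned int target_port_protocols;
};

struct sata_dev_sketch {
	enum type_sketch type;
	struct rphy_sketch *rphy;	/* may already be NULL when freeing */
};

/* Depends only on state stored in the device, never on dev->rphy. */
static int dev_is_sata_by_type(const struct sata_dev_sketch *dev)
{
	return dev->type == SKETCH_SATA_DEV ||
	       dev->type == SKETCH_SATA_PM ||
	       dev->type == SKETCH_SATA_PM_PORT;
}
```

A device with rphy == NULL can still be classified here, which is what
the free path needs in order to release the identify buffers.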

Reviewed-by: Jack Wang <jack_wang@usish.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/libsas/sas_discover.c |    6 ++++++
 include/scsi/sas_ata.h             |    3 ++-
 2 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
index 4e64930..dc52b1f 100644
--- a/drivers/scsi/libsas/sas_discover.c
+++ b/drivers/scsi/libsas/sas_discover.c
@@ -30,6 +30,7 @@
 
 #include <scsi/scsi_transport.h>
 #include <scsi/scsi_transport_sas.h>
+#include <scsi/sas_ata.h>
 #include "../scsi_sas_internal.h"
 
 /* ---------- Basic task processing for discovery purposes ---------- */
@@ -231,6 +232,11 @@ void sas_free_device(struct kref *kref)
 	if (dev->dev_type == EDGE_DEV || dev->dev_type == FANOUT_DEV)
 		kfree(dev->ex_dev.ex_phy);
 
+	if (dev_is_sata(dev)) {
+		kfree(dev->sata_dev.identify_device);
+		kfree(dev->sata_dev.identify_packet_device);
+	}
+
 	kfree(dev);
 }
 
diff --git a/include/scsi/sas_ata.h b/include/scsi/sas_ata.h
index 9c159f7..7d5013f 100644
--- a/include/scsi/sas_ata.h
+++ b/include/scsi/sas_ata.h
@@ -32,7 +32,8 @@
 
 static inline int dev_is_sata(struct domain_device *dev)
 {
-	return (dev->rphy->identify.target_port_protocols & SAS_PROTOCOL_SATA);
+	return dev->dev_type == SATA_DEV || dev->dev_type == SATA_PM ||
+	       dev->dev_type == SATA_PM_PORT;
 }
 
 int sas_ata_init_host_and_port(struct domain_device *found_dev,



* [PATCH v2 05/28] libsas: replace event locks with atomic bitops
  2011-12-23  2:58 [PATCH v2 00/28] libsas: eh reworks (ata-eh vs discovery, races, ...) Dan Williams
                   ` (3 preceding siblings ...)
  2011-12-23  2:58 ` [PATCH v2 04/28] libsas: fix leak of dev->sata_dev.identify_[packet_]device Dan Williams
@ 2011-12-23  2:58 ` Dan Williams
  2011-12-23  2:59 ` [PATCH v2 06/28] libsas: convert ha->state to flags Dan Williams
                   ` (22 subsequent siblings)
  27 siblings, 0 replies; 37+ messages in thread
From: Dan Williams @ 2011-12-23  2:58 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide

The locks only served to make sure the pending event bitmask was updated
consistently.
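
For readers unfamiliar with the pattern, here is a small userspace
sketch using C11 atomics.  The helpers test_and_set_bit_sketch() and
clear_bit_sketch() are hypothetical analogues of the kernel's
test_and_set_bit() and clear_bit(), and the "queued" counter merely
stands in for queueing a work item.  It is meant to show why a lock is
not needed just to keep a pending bitmask consistent, under the
assumption that the mask is the only state being protected.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Atomically set bit nr and report whether it was already set. */
static bool test_and_set_bit_sketch(unsigned int nr, atomic_ulong *mask)
{
	unsigned long bit = 1UL << nr;

	/* atomic_fetch_or() returns the value held before the OR */
	return (atomic_fetch_or(mask, bit) & bit) != 0;
}

/* Atomically clear bit nr. */
static void clear_bit_sketch(unsigned int nr, atomic_ulong *mask)
{
	atomic_fetch_and(mask, ~(1UL << nr));
}

static int queued;	/* demo-only: how many times work was "queued" */

/* Queue an event at most once while it is still pending. */
static void queue_event_sketch(unsigned int event, atomic_ulong *pending)
{
	if (!test_and_set_bit_sketch(event, pending))
		queued++;	/* stands in for queueing the work item */
}
```

Only the caller that flips the bit from 0 to 1 queues the work; the
handler clears the bit when it starts running, so a later event can be
queued again.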

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/libsas/sas_discover.c |   10 +++-------
 drivers/scsi/libsas/sas_event.c    |    8 +++-----
 drivers/scsi/libsas/sas_init.c     |    3 +--
 drivers/scsi/libsas/sas_internal.h |   32 +++++++-------------------------
 drivers/scsi/libsas/sas_phy.c      |   12 ++++--------
 drivers/scsi/libsas/sas_port.c     |   15 +++++----------
 include/scsi/libsas.h              |    3 ---
 7 files changed, 23 insertions(+), 60 deletions(-)

diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
index dc52b1f..ed04118 100644
--- a/drivers/scsi/libsas/sas_discover.c
+++ b/drivers/scsi/libsas/sas_discover.c
@@ -295,8 +295,7 @@ static void sas_discover_domain(struct work_struct *work)
 		container_of(work, struct sas_discovery_event, work);
 	struct asd_sas_port *port = ev->port;
 
-	sas_begin_event(DISCE_DISCOVER_DOMAIN, &port->disc.disc_event_lock,
-			&port->disc.pending);
+	clear_bit(DISCE_DISCOVER_DOMAIN, &port->disc.pending);
 
 	if (port->port_dev)
 		return;
@@ -355,8 +354,7 @@ static void sas_revalidate_domain(struct work_struct *work)
 		container_of(work, struct sas_discovery_event, work);
 	struct asd_sas_port *port = ev->port;
 
-	sas_begin_event(DISCE_REVALIDATE_DOMAIN, &port->disc.disc_event_lock,
-			&port->disc.pending);
+	clear_bit(DISCE_REVALIDATE_DOMAIN, &port->disc.pending);
 
 	SAS_DPRINTK("REVALIDATING DOMAIN on port %d, pid:%d\n", port->id,
 		    task_pid_nr(current));
@@ -379,8 +377,7 @@ int sas_discover_event(struct asd_sas_port *port, enum discover_event ev)
 
 	BUG_ON(ev >= DISC_NUM_EVENTS);
 
-	sas_queue_event(ev, &disc->disc_event_lock, &disc->pending,
-			&disc->disc_work[ev].work, port->ha);
+	sas_queue_event(ev, &disc->pending, &disc->disc_work[ev].work, port->ha);
 
 	return 0;
 }
@@ -400,7 +397,6 @@ void sas_init_disc(struct sas_discovery *disc, struct asd_sas_port *port)
 		[DISCE_REVALIDATE_DOMAIN] = sas_revalidate_domain,
 	};
 
-	spin_lock_init(&disc->disc_event_lock);
 	disc->pending = 0;
 	for (i = 0; i < DISC_NUM_EVENTS; i++) {
 		INIT_WORK(&disc->disc_work[i].work, sas_event_fns[i]);
diff --git a/drivers/scsi/libsas/sas_event.c b/drivers/scsi/libsas/sas_event.c
index 9db30fb..9c084bc 100644
--- a/drivers/scsi/libsas/sas_event.c
+++ b/drivers/scsi/libsas/sas_event.c
@@ -30,7 +30,7 @@ static void notify_ha_event(struct sas_ha_struct *sas_ha, enum ha_event event)
 {
 	BUG_ON(event >= HA_NUM_EVENTS);
 
-	sas_queue_event(event, &sas_ha->event_lock, &sas_ha->pending,
+	sas_queue_event(event, &sas_ha->pending,
 			&sas_ha->ha_events[event].work, sas_ha);
 }
 
@@ -40,7 +40,7 @@ static void notify_port_event(struct asd_sas_phy *phy, enum port_event event)
 
 	BUG_ON(event >= PORT_NUM_EVENTS);
 
-	sas_queue_event(event, &ha->event_lock, &phy->port_events_pending,
+	sas_queue_event(event, &phy->port_events_pending,
 			&phy->port_events[event].work, ha);
 }
 
@@ -50,7 +50,7 @@ static void notify_phy_event(struct asd_sas_phy *phy, enum phy_event event)
 
 	BUG_ON(event >= PHY_NUM_EVENTS);
 
-	sas_queue_event(event, &ha->event_lock, &phy->phy_events_pending,
+	sas_queue_event(event, &phy->phy_events_pending,
 			&phy->phy_events[event].work, ha);
 }
 
@@ -62,8 +62,6 @@ int sas_init_events(struct sas_ha_struct *sas_ha)
 
 	int i;
 
-	spin_lock_init(&sas_ha->event_lock);
-
 	for (i = 0; i < HA_NUM_EVENTS; i++) {
 		INIT_WORK(&sas_ha->ha_events[i].work, sas_ha_event_fns[i]);
 		sas_ha->ha_events[i].ha = sas_ha;
diff --git a/drivers/scsi/libsas/sas_init.c b/drivers/scsi/libsas/sas_init.c
index d81c3b1..a435876 100644
--- a/drivers/scsi/libsas/sas_init.c
+++ b/drivers/scsi/libsas/sas_init.c
@@ -97,8 +97,7 @@ void sas_hae_reset(struct work_struct *work)
 		container_of(work, struct sas_ha_event, work);
 	struct sas_ha_struct *ha = ev->ha;
 
-	sas_begin_event(HAE_RESET, &ha->event_lock,
-			&ha->pending);
+	clear_bit(HAE_RESET, &ha->pending);
 }
 
 int sas_register_ha(struct sas_ha_struct *sas_ha)
diff --git a/drivers/scsi/libsas/sas_internal.h b/drivers/scsi/libsas/sas_internal.h
index 0d43408..7fe4ede 100644
--- a/drivers/scsi/libsas/sas_internal.h
+++ b/drivers/scsi/libsas/sas_internal.h
@@ -92,36 +92,18 @@ static inline int sas_smp_host_handler(struct Scsi_Host *shost,
 }
 #endif
 
-static inline void sas_queue_event(int event, spinlock_t *lock,
-				   unsigned long *pending,
+static inline void sas_queue_event(int event, unsigned long *pending,
 				   struct work_struct *work,
 				   struct sas_ha_struct *sas_ha)
 {
-	unsigned long flags;
+	if (!test_and_set_bit(event, pending)) {
+		unsigned long flags;
 
-	spin_lock_irqsave(lock, flags);
-	if (test_bit(event, pending)) {
-		spin_unlock_irqrestore(lock, flags);
-		return;
+		spin_lock_irqsave(&sas_ha->state_lock, flags);
+		if (sas_ha->state != SAS_HA_UNREGISTERED)
+			scsi_queue_work(sas_ha->core.shost, work);
+		spin_unlock_irqrestore(&sas_ha->state_lock, flags);
 	}
-	__set_bit(event, pending);
-	spin_unlock_irqrestore(lock, flags);
-
-	spin_lock_irqsave(&sas_ha->state_lock, flags);
-	if (sas_ha->state != SAS_HA_UNREGISTERED) {
-		scsi_queue_work(sas_ha->core.shost, work);
-	}
-	spin_unlock_irqrestore(&sas_ha->state_lock, flags);
-}
-
-static inline void sas_begin_event(int event, spinlock_t *lock,
-				   unsigned long *pending)
-{
-	unsigned long flags;
-
-	spin_lock_irqsave(lock, flags);
-	__clear_bit(event, pending);
-	spin_unlock_irqrestore(lock, flags);
 }
 
 static inline void sas_fill_in_rphy(struct domain_device *dev,
diff --git a/drivers/scsi/libsas/sas_phy.c b/drivers/scsi/libsas/sas_phy.c
index e0f5018..dcfd4a9 100644
--- a/drivers/scsi/libsas/sas_phy.c
+++ b/drivers/scsi/libsas/sas_phy.c
@@ -36,8 +36,7 @@ static void sas_phye_loss_of_signal(struct work_struct *work)
 		container_of(work, struct asd_sas_event, work);
 	struct asd_sas_phy *phy = ev->phy;
 
-	sas_begin_event(PHYE_LOSS_OF_SIGNAL, &phy->ha->event_lock,
-			&phy->phy_events_pending);
+	clear_bit(PHYE_LOSS_OF_SIGNAL, &phy->phy_events_pending);
 	phy->error = 0;
 	sas_deform_port(phy, 1);
 }
@@ -48,8 +47,7 @@ static void sas_phye_oob_done(struct work_struct *work)
 		container_of(work, struct asd_sas_event, work);
 	struct asd_sas_phy *phy = ev->phy;
 
-	sas_begin_event(PHYE_OOB_DONE, &phy->ha->event_lock,
-			&phy->phy_events_pending);
+	clear_bit(PHYE_OOB_DONE, &phy->phy_events_pending);
 	phy->error = 0;
 }
 
@@ -63,8 +61,7 @@ static void sas_phye_oob_error(struct work_struct *work)
 	struct sas_internal *i =
 		to_sas_internal(sas_ha->core.shost->transportt);
 
-	sas_begin_event(PHYE_OOB_ERROR, &phy->ha->event_lock,
-			&phy->phy_events_pending);
+	clear_bit(PHYE_OOB_ERROR, &phy->phy_events_pending);
 
 	sas_deform_port(phy, 1);
 
@@ -95,8 +92,7 @@ static void sas_phye_spinup_hold(struct work_struct *work)
 	struct sas_internal *i =
 		to_sas_internal(sas_ha->core.shost->transportt);
 
-	sas_begin_event(PHYE_SPINUP_HOLD, &phy->ha->event_lock,
-			&phy->phy_events_pending);
+	clear_bit(PHYE_SPINUP_HOLD, &phy->phy_events_pending);
 
 	phy->error = 0;
 	i->dft->lldd_control_phy(phy, PHY_FUNC_RELEASE_SPINUP_HOLD, NULL);
diff --git a/drivers/scsi/libsas/sas_port.c b/drivers/scsi/libsas/sas_port.c
index 42fd1f2..a47c7a7 100644
--- a/drivers/scsi/libsas/sas_port.c
+++ b/drivers/scsi/libsas/sas_port.c
@@ -213,8 +213,7 @@ void sas_porte_bytes_dmaed(struct work_struct *work)
 		container_of(work, struct asd_sas_event, work);
 	struct asd_sas_phy *phy = ev->phy;
 
-	sas_begin_event(PORTE_BYTES_DMAED, &phy->ha->event_lock,
-			&phy->port_events_pending);
+	clear_bit(PORTE_BYTES_DMAED, &phy->port_events_pending);
 
 	sas_form_port(phy);
 }
@@ -227,8 +226,7 @@ void sas_porte_broadcast_rcvd(struct work_struct *work)
 	unsigned long flags;
 	u32 prim;
 
-	sas_begin_event(PORTE_BROADCAST_RCVD, &phy->ha->event_lock,
-			&phy->port_events_pending);
+	clear_bit(PORTE_BROADCAST_RCVD, &phy->port_events_pending);
 
 	spin_lock_irqsave(&phy->sas_prim_lock, flags);
 	prim = phy->sas_prim;
@@ -244,8 +242,7 @@ void sas_porte_link_reset_err(struct work_struct *work)
 		container_of(work, struct asd_sas_event, work);
 	struct asd_sas_phy *phy = ev->phy;
 
-	sas_begin_event(PORTE_LINK_RESET_ERR, &phy->ha->event_lock,
-			&phy->port_events_pending);
+	clear_bit(PORTE_LINK_RESET_ERR, &phy->port_events_pending);
 
 	sas_deform_port(phy, 1);
 }
@@ -256,8 +253,7 @@ void sas_porte_timer_event(struct work_struct *work)
 		container_of(work, struct asd_sas_event, work);
 	struct asd_sas_phy *phy = ev->phy;
 
-	sas_begin_event(PORTE_TIMER_EVENT, &phy->ha->event_lock,
-			&phy->port_events_pending);
+	clear_bit(PORTE_TIMER_EVENT, &phy->port_events_pending);
 
 	sas_deform_port(phy, 1);
 }
@@ -268,8 +264,7 @@ void sas_porte_hard_reset(struct work_struct *work)
 		container_of(work, struct asd_sas_event, work);
 	struct asd_sas_phy *phy = ev->phy;
 
-	sas_begin_event(PORTE_HARD_RESET, &phy->ha->event_lock,
-			&phy->port_events_pending);
+	clear_bit(PORTE_HARD_RESET, &phy->port_events_pending);
 
 	sas_deform_port(phy, 1);
 }
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index 7ecb5c1..de63a66 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -215,7 +215,6 @@ struct sas_discovery_event {
 };
 
 struct sas_discovery {
-	spinlock_t disc_event_lock;
 	struct sas_discovery_event disc_work[DISC_NUM_EVENTS];
 	unsigned long    pending;
 	u8     fanout_sas_addr[8];
@@ -272,7 +271,6 @@ struct asd_sas_event {
  */
 struct asd_sas_phy {
 /* private: */
-	/* protected by ha->event_lock */
 	struct asd_sas_event   port_events[PORT_NUM_EVENTS];
 	struct asd_sas_event   phy_events[PHY_NUM_EVENTS];
 
@@ -337,7 +335,6 @@ enum sas_ha_state {
 
 struct sas_ha_struct {
 /* private: */
-	spinlock_t       event_lock;
 	struct sas_ha_event ha_events[HA_NUM_EVENTS];
 	unsigned long	 pending;
 



* [PATCH v2 06/28] libsas: convert ha->state to flags
  2011-12-23  2:58 [PATCH v2 00/28] libsas: eh reworks (ata-eh vs discovery, races, ...) Dan Williams
                   ` (4 preceding siblings ...)
  2011-12-23  2:58 ` [PATCH v2 05/28] libsas: replace event locks with atomic bitops Dan Williams
@ 2011-12-23  2:59 ` Dan Williams
  2011-12-23  2:59 ` [PATCH v2 07/28] libsas: introduce sas_drain_work() Dan Williams
                   ` (21 subsequent siblings)
  27 siblings, 0 replies; 37+ messages in thread
From: Dan Williams @ 2011-12-23  2:59 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide

In preparation for adding new states (SAS_HA_DRAINING, SAS_HA_FROZEN),
convert ha->state into a set of flags.
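
As a rough user-space illustration of the conversion (not kernel code): the
enum values stop being mutually exclusive states and become bit numbers in an
unsigned long, so several conditions can hold at once.  The helpers
flag_set(), flag_clear() and flag_test() are hypothetical, non-atomic
stand-ins for the kernel's set_bit(), clear_bit() and test_bit().

```c
#include <assert.h>
#include <stdbool.h>

/* Bit numbers, mirroring how enum sas_ha_state is used after this patch.
 * HA_DRAINING models one of the states the changelog says will follow. */
enum ha_state_bit {
	HA_REGISTERED,
	HA_DRAINING,
};

/* Hypothetical, non-atomic stand-ins for set_bit/clear_bit/test_bit. */
static void flag_set(int nr, unsigned long *word)
{
	*word |= 1UL << nr;
}

static void flag_clear(int nr, unsigned long *word)
{
	*word &= ~(1UL << nr);
}

static bool flag_test(int nr, const unsigned long *word)
{
	return (*word >> nr) & 1UL;
}
```

With a plain enum, "registered and draining" could not be expressed without
inventing a combined state; with flags each condition is tested on its own.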

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/libsas/sas_init.c     |    4 ++--
 drivers/scsi/libsas/sas_internal.h |    2 +-
 include/scsi/libsas.h              |    3 +--
 3 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/libsas/sas_init.c b/drivers/scsi/libsas/sas_init.c
index a435876..da244e6 100644
--- a/drivers/scsi/libsas/sas_init.c
+++ b/drivers/scsi/libsas/sas_init.c
@@ -112,7 +112,7 @@ int sas_register_ha(struct sas_ha_struct *sas_ha)
 	else if (sas_ha->lldd_queue_size == -1)
 		sas_ha->lldd_queue_size = 128; /* Sanity */
 
-	sas_ha->state = SAS_HA_REGISTERED;
+	set_bit(SAS_HA_REGISTERED, &sas_ha->state);
 	spin_lock_init(&sas_ha->state_lock);
 
 	error = sas_register_phys(sas_ha);
@@ -160,7 +160,7 @@ int sas_unregister_ha(struct sas_ha_struct *sas_ha)
 	/* Set the state to unregistered to avoid further
 	 * events to be queued */
 	spin_lock_irqsave(&sas_ha->state_lock, flags);
-	sas_ha->state = SAS_HA_UNREGISTERED;
+	clear_bit(SAS_HA_REGISTERED, &sas_ha->state);
 	spin_unlock_irqrestore(&sas_ha->state_lock, flags);
 	scsi_flush_work(sas_ha->core.shost);
 
diff --git a/drivers/scsi/libsas/sas_internal.h b/drivers/scsi/libsas/sas_internal.h
index 7fe4ede..1fd84b3 100644
--- a/drivers/scsi/libsas/sas_internal.h
+++ b/drivers/scsi/libsas/sas_internal.h
@@ -100,7 +100,7 @@ static inline void sas_queue_event(int event, unsigned long *pending,
 		unsigned long flags;
 
 		spin_lock_irqsave(&sas_ha->state_lock, flags);
-		if (sas_ha->state != SAS_HA_UNREGISTERED)
+		if (test_bit(SAS_HA_REGISTERED, &sas_ha->state))
 			scsi_queue_work(sas_ha->core.shost, work);
 		spin_unlock_irqrestore(&sas_ha->state_lock, flags);
 	}
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index de63a66..8e402d5 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -330,7 +330,6 @@ struct sas_ha_event {
 
 enum sas_ha_state {
 	SAS_HA_REGISTERED,
-	SAS_HA_UNREGISTERED
 };
 
 struct sas_ha_struct {
@@ -338,7 +337,7 @@ struct sas_ha_struct {
 	struct sas_ha_event ha_events[HA_NUM_EVENTS];
 	unsigned long	 pending;
 
-	enum sas_ha_state state;
+	unsigned long	  state;
 	spinlock_t 	  state_lock;
 
 	struct scsi_core core;



* [PATCH v2 07/28] libsas: introduce sas_drain_work()
  2011-12-23  2:58 [PATCH v2 00/28] libsas: eh reworks (ata-eh vs discovery, races, ...) Dan Williams
                   ` (5 preceding siblings ...)
  2011-12-23  2:59 ` [PATCH v2 06/28] libsas: convert ha->state to flags Dan Williams
@ 2011-12-23  2:59 ` Dan Williams
  2011-12-23  2:59 ` [PATCH v2 08/28] libsas: remove ata_port.lock management duties from lldds Dan Williams
                   ` (20 subsequent siblings)
  27 siblings, 0 replies; 37+ messages in thread
From: Dan Williams @ 2011-12-23  2:59 UTC (permalink / raw)
  To: linux-scsi; +Cc: Tejun Heo, linux-ide

When an lldd invokes ->notify_port_event() it can trigger a chain of libsas
events to:

  1/ form the port and find the direct attached device

  2/ if the attached device is an expander, perform domain discovery

A call to flush_workqueue() will only flush the initial port formation work.
Currently libsas users need to call scsi_flush_work() once per step, up to
the max depth of the chain (which will grow from 2 to 3 when ata discovery is
moved to its own discovery event).  Instead of open-coding multiple calls,
switch to drain_workqueue() to flush sas work.

drain_workqueue() does not handle new work submitted during the drain, so
libsas needs a bit of infrastructure to hold off unchained work submissions
while a drain is in flight.  An lldd ->notify() event is considered
'unchained', while a sas_discover_event() is 'chained'.  As Tejun notes:

  "For now, I think it would be best to add private wrapper in libsas to
   support deferring unchained work items while draining."
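
The deferral scheme can be sketched as a single-threaded user-space model.
This is only an illustration under simplifying assumptions: there is no
locking, work items are plain integers, and every name (struct model_ha,
model_queue_work(), model_chain_work(), model_drain_finish()) is hypothetical
rather than part of libsas.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WORK 8

struct model_ha {
	bool registered;
	bool draining;
	int queued[MAX_WORK];	/* stands in for the host workqueue */
	size_t nr_queued;
	int deferred[MAX_WORK];	/* stands in for ha->defer_q */
	size_t nr_deferred;
};

/* Unchained submission: dropped when unregistered, parked while draining. */
static void model_queue_work(struct model_ha *ha, int work)
{
	if (!ha->registered)
		return;
	if (ha->draining) {
		if (ha->nr_deferred < MAX_WORK)
			ha->deferred[ha->nr_deferred++] = work;
	} else if (ha->nr_queued < MAX_WORK) {
		ha->queued[ha->nr_queued++] = work;
	}
}

/* Chained submission: always goes straight to the workqueue. */
static void model_chain_work(struct model_ha *ha, int work)
{
	if (ha->nr_queued < MAX_WORK)
		ha->queued[ha->nr_queued++] = work;
}

/* End of a drain: everything queued has run, so empty the queue, clear the
 * draining flag, then resubmit whatever was parked.  Returns nonzero if
 * anything had been deferred. */
static int model_drain_finish(struct model_ha *ha)
{
	size_t n = ha->nr_deferred;
	int had_deferred = n != 0;
	size_t i;

	ha->draining = false;
	ha->nr_queued = 0;
	ha->nr_deferred = 0;
	for (i = 0; i < n; i++)
		model_queue_work(ha, ha->deferred[i]);
	return had_deferred;
}
```

A nonzero return tells the caller that more work appeared while it was
draining, so it may want to drain again before assuming the queue is idle.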

Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/aic94xx/aic94xx_init.c |    2 +
 drivers/scsi/isci/host.c            |    8 ++---
 drivers/scsi/libsas/sas_discover.c  |   21 +++++++++++++-
 drivers/scsi/libsas/sas_event.c     |   54 +++++++++++++++++++++++++++++++++++
 drivers/scsi/libsas/sas_init.c      |    9 ++++--
 drivers/scsi/libsas/sas_internal.h  |   14 ---------
 drivers/scsi/mvsas/mv_sas.c         |    2 +
 drivers/scsi/pm8001/pm8001_sas.c    |    4 ++-
 include/scsi/libsas.h               |    4 +++
 9 files changed, 92 insertions(+), 26 deletions(-)

diff --git a/drivers/scsi/aic94xx/aic94xx_init.c b/drivers/scsi/aic94xx/aic94xx_init.c
index 8db4e72..2b3717f 100644
--- a/drivers/scsi/aic94xx/aic94xx_init.c
+++ b/drivers/scsi/aic94xx/aic94xx_init.c
@@ -971,7 +971,7 @@ static int asd_scan_finished(struct Scsi_Host *shost, unsigned long time)
 	if (time < HZ)
 		return 0;
 	/* Wait for discovery to finish */
-	scsi_flush_work(shost);
+	sas_drain_work(SHOST_TO_SAS_HA(shost));
 	return 1;
 }
 
diff --git a/drivers/scsi/isci/host.c b/drivers/scsi/isci/host.c
index e7fe9c4..e7e5d06 100644
--- a/drivers/scsi/isci/host.c
+++ b/drivers/scsi/isci/host.c
@@ -650,15 +650,13 @@ static void isci_host_start_complete(struct isci_host *ihost, enum sci_status co
 
 int isci_host_scan_finished(struct Scsi_Host *shost, unsigned long time)
 {
-	struct isci_host *ihost = SHOST_TO_SAS_HA(shost)->lldd_ha;
+	struct sas_ha_struct *ha = SHOST_TO_SAS_HA(shost);
+	struct isci_host *ihost = ha->lldd_ha;
 
 	if (test_bit(IHOST_START_PENDING, &ihost->flags))
 		return 0;
 
-	/* todo: use sas_flush_discovery once it is upstream */
-	scsi_flush_work(shost);
-
-	scsi_flush_work(shost);
+	sas_drain_work(ha);
 
 	dev_dbg(&ihost->pdev->dev,
 		"%s: ihost->status = %d, time = %ld\n",
diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
index ed04118..32e0117 100644
--- a/drivers/scsi/libsas/sas_discover.c
+++ b/drivers/scsi/libsas/sas_discover.c
@@ -367,6 +367,25 @@ static void sas_revalidate_domain(struct work_struct *work)
 
 /* ---------- Events ---------- */
 
+static void sas_chain_work(struct sas_ha_struct *ha, struct work_struct *work)
+{
+	/* chained work is not subject to SAS_HA_DRAINING or SAS_HA_REGISTERED */
+	scsi_queue_work(ha->core.shost, work);
+}
+
+static void sas_chain_event(int event, unsigned long *pending,
+			    struct work_struct *work,
+			    struct sas_ha_struct *ha)
+{
+	if (!test_and_set_bit(event, pending)) {
+		unsigned long flags;
+
+		spin_lock_irqsave(&ha->state_lock, flags);
+		sas_chain_work(ha, work);
+		spin_unlock_irqrestore(&ha->state_lock, flags);
+	}
+}
+
 int sas_discover_event(struct asd_sas_port *port, enum discover_event ev)
 {
 	struct sas_discovery *disc;
@@ -377,7 +396,7 @@ int sas_discover_event(struct asd_sas_port *port, enum discover_event ev)
 
 	BUG_ON(ev >= DISC_NUM_EVENTS);
 
-	sas_queue_event(ev, &disc->pending, &disc->disc_work[ev].work, port->ha);
+	sas_chain_event(ev, &disc->pending, &disc->disc_work[ev].work, port->ha);
 
 	return 0;
 }
diff --git a/drivers/scsi/libsas/sas_event.c b/drivers/scsi/libsas/sas_event.c
index 9c084bc..3ff73af 100644
--- a/drivers/scsi/libsas/sas_event.c
+++ b/drivers/scsi/libsas/sas_event.c
@@ -22,10 +22,64 @@
  *
  */
 
+#include <linux/export.h>
 #include <scsi/scsi_host.h>
 #include "sas_internal.h"
 #include "sas_dump.h"
 
+static void sas_queue_work(struct sas_ha_struct *ha, struct work_struct *work)
+{
+	if (!test_bit(SAS_HA_REGISTERED, &ha->state))
+		return;
+
+	if (test_bit(SAS_HA_DRAINING, &ha->state))
+		list_add(&work->entry, &ha->defer_q);
+	else
+		scsi_queue_work(ha->core.shost, work);
+}
+
+static void sas_queue_event(int event, unsigned long *pending,
+			    struct work_struct *work,
+			    struct sas_ha_struct *ha)
+{
+	if (!test_and_set_bit(event, pending)) {
+		unsigned long flags;
+
+		spin_lock_irqsave(&ha->state_lock, flags);
+		sas_queue_work(ha, work);
+		spin_unlock_irqrestore(&ha->state_lock, flags);
+	}
+}
+
+int sas_drain_work(struct sas_ha_struct *ha)
+{
+	struct workqueue_struct *wq = ha->core.shost->work_q;
+	struct work_struct *w, *_w;
+	int ret;
+
+	mutex_lock(&ha->drain_mutex);
+
+	set_bit(SAS_HA_DRAINING, &ha->state);
+	/* flush submitters */
+	spin_lock_irq(&ha->state_lock);
+	spin_unlock_irq(&ha->state_lock);
+
+	drain_workqueue(wq);
+
+	spin_lock_irq(&ha->state_lock);
+	clear_bit(SAS_HA_DRAINING, &ha->state);
+	ret = !list_empty(&ha->defer_q);
+	list_for_each_entry_safe(w, _w, &ha->defer_q, entry) {
+		list_del_init(&w->entry);
+		sas_queue_work(ha, w);
+	}
+	spin_unlock_irq(&ha->state_lock);
+	mutex_unlock(&ha->drain_mutex);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(sas_drain_work);
+
 static void notify_ha_event(struct sas_ha_struct *sas_ha, enum ha_event event)
 {
 	BUG_ON(event >= HA_NUM_EVENTS);
diff --git a/drivers/scsi/libsas/sas_init.c b/drivers/scsi/libsas/sas_init.c
index da244e6..572b943 100644
--- a/drivers/scsi/libsas/sas_init.c
+++ b/drivers/scsi/libsas/sas_init.c
@@ -114,6 +114,8 @@ int sas_register_ha(struct sas_ha_struct *sas_ha)
 
 	set_bit(SAS_HA_REGISTERED, &sas_ha->state);
 	spin_lock_init(&sas_ha->state_lock);
+	mutex_init(&sas_ha->drain_mutex);
+	INIT_LIST_HEAD(&sas_ha->defer_q);
 
 	error = sas_register_phys(sas_ha);
 	if (error) {
@@ -157,12 +159,13 @@ int sas_unregister_ha(struct sas_ha_struct *sas_ha)
 {
 	unsigned long flags;
 
-	/* Set the state to unregistered to avoid further
-	 * events to be queued */
+	/* Set the state to unregistered to avoid further unchained
+	 * events to be queued
+	 */
 	spin_lock_irqsave(&sas_ha->state_lock, flags);
 	clear_bit(SAS_HA_REGISTERED, &sas_ha->state);
 	spin_unlock_irqrestore(&sas_ha->state_lock, flags);
-	scsi_flush_work(sas_ha->core.shost);
+	sas_drain_work(sas_ha);
 
 	sas_unregister_ports(sas_ha);
 
diff --git a/drivers/scsi/libsas/sas_internal.h b/drivers/scsi/libsas/sas_internal.h
index 1fd84b3..948ea64 100644
--- a/drivers/scsi/libsas/sas_internal.h
+++ b/drivers/scsi/libsas/sas_internal.h
@@ -92,20 +92,6 @@ static inline int sas_smp_host_handler(struct Scsi_Host *shost,
 }
 #endif
 
-static inline void sas_queue_event(int event, unsigned long *pending,
-				   struct work_struct *work,
-				   struct sas_ha_struct *sas_ha)
-{
-	if (!test_and_set_bit(event, pending)) {
-		unsigned long flags;
-
-		spin_lock_irqsave(&sas_ha->state_lock, flags);
-		if (test_bit(SAS_HA_REGISTERED, &sas_ha->state))
-			scsi_queue_work(sas_ha->core.shost, work);
-		spin_unlock_irqrestore(&sas_ha->state_lock, flags);
-	}
-}
-
 static inline void sas_fill_in_rphy(struct domain_device *dev,
 				    struct sas_rphy *rphy)
 {
diff --git a/drivers/scsi/mvsas/mv_sas.c b/drivers/scsi/mvsas/mv_sas.c
index a4884a5..b118e63 100644
--- a/drivers/scsi/mvsas/mv_sas.c
+++ b/drivers/scsi/mvsas/mv_sas.c
@@ -308,7 +308,7 @@ int mvs_scan_finished(struct Scsi_Host *shost, unsigned long time)
 	if (mvs_prv->scan_finished == 0)
 		return 0;
 
-	scsi_flush_work(shost);
+	sas_drain_work(sha);
 	return 1;
 }
 
diff --git a/drivers/scsi/pm8001/pm8001_sas.c b/drivers/scsi/pm8001/pm8001_sas.c
index fb3dc99..13811c7 100644
--- a/drivers/scsi/pm8001/pm8001_sas.c
+++ b/drivers/scsi/pm8001/pm8001_sas.c
@@ -234,12 +234,14 @@ void pm8001_scan_start(struct Scsi_Host *shost)
 
 int pm8001_scan_finished(struct Scsi_Host *shost, unsigned long time)
 {
+	struct sas_ha_struct *ha = SHOST_TO_SAS_HA(shost);
+
 	/* give the phy enabling interrupt event time to come in (1s
 	* is empirically about all it takes) */
 	if (time < HZ)
 		return 0;
 	/* Wait for discovery to finish */
-	scsi_flush_work(shost);
+	sas_drain_work(ha);
 	return 1;
 }
 
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index 8e402d5..42900fa 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -330,6 +330,7 @@ struct sas_ha_event {
 
 enum sas_ha_state {
 	SAS_HA_REGISTERED,
+	SAS_HA_DRAINING,
 };
 
 struct sas_ha_struct {
@@ -337,6 +338,8 @@ struct sas_ha_struct {
 	struct sas_ha_event ha_events[HA_NUM_EVENTS];
 	unsigned long	 pending;
 
+	struct list_head  defer_q; /* work queued while draining */
+	struct mutex	  drain_mutex;
 	unsigned long	  state;
 	spinlock_t 	  state_lock;
 
@@ -657,6 +660,7 @@ int sas_eh_bus_reset_handler(struct scsi_cmnd *cmd);
 extern void sas_target_destroy(struct scsi_target *);
 extern int sas_slave_alloc(struct scsi_device *);
 extern int sas_ioctl(struct scsi_device *sdev, int cmd, void __user *arg);
+extern int sas_drain_work(struct sas_ha_struct *ha);
 
 extern int sas_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,
 			   struct request *req);



* [PATCH v2 08/28] libsas: remove ata_port.lock management duties from lldds
  2011-12-23  2:58 [PATCH v2 00/28] libsas: eh reworks (ata-eh vs discovery, races, ...) Dan Williams
                   ` (6 preceding siblings ...)
  2011-12-23  2:59 ` [PATCH v2 07/28] libsas: introduce sas_drain_work() Dan Williams
@ 2011-12-23  2:59 ` Dan Williams
  2011-12-23  2:59 ` [PATCH v2 09/28] libsas: prevent domain rediscovery competing with ata error handling Dan Williams
                   ` (19 subsequent siblings)
  27 siblings, 0 replies; 37+ messages in thread
From: Dan Williams @ 2011-12-23  2:59 UTC (permalink / raw)
  To: linux-scsi; +Cc: Xiangliang Yu, linux-ide, Christoph Hellwig, Jack Wang

Each libsas driver (mvsas, pm8001, and isci) has invented a different
method for managing the ap->lock.  The lock is held by the ata
->queuecommand() path.  mvsas drops it prior to acquiring any internal
locks, which allows it to hold its internal lock across calls to
task->task_done().  This capability is important as it is the only way
the driver can flush task->task_done() instances to guarantee that it no
longer has any in-flight references to a domain_device at
->lldd_dev_gone() time.

Move that scheme into libsas: sas_ata_qc_issue() now drops ap->lock on
entry and retakes it before returning, so the lldds no longer touch it.

This assumes ->queuecommand() is always called with irqs enabled, which was
the assumption mvsas was making prior to the conversion.
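
The "drop on entry, retake on the single exit path" shape can be sketched in
user space as follows.  This is a toy model, not the libsas code: the lock is
a boolean that asserts on misuse, and struct toy_port, toy_lldd_execute() and
toy_qc_issue() are hypothetical names chosen for the illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define ERR_SYSTEM 1

/* Toy lock that only tracks whether it is held. */
struct toy_lock { bool held; };

static void toy_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = true;
}

static void toy_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

struct toy_port {
	struct toy_lock lock;	/* stands in for ap->lock */
	bool dev_gone;
};

/* Hypothetical low-level submit.  It may take driver-internal locks, so it
 * must never be entered with the port lock held. */
static int toy_lldd_execute(struct toy_port *port, bool fail)
{
	assert(!port->lock.held);
	return fail ? -1 : 0;
}

/* Called with port->lock held; always returns with it held again. */
static int toy_qc_issue(struct toy_port *port, bool lldd_fails)
{
	int ret = ERR_SYSTEM;

	toy_release(&port->lock);

	if (port->dev_gone)
		goto out;

	ret = toy_lldd_execute(port, lldd_fails);
	if (ret)
		ret = ERR_SYSTEM;
out:
	toy_acquire(&port->lock);
	return ret;
}
```

Funnelling every return through one label is what keeps the caller's view of
the lock balanced no matter which early exit is taken.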

Cc: Christoph Hellwig <hch@lst.de>
Cc: Jack Wang <jack_wang@usish.com>
Cc: Xiangliang Yu <yuxiangl@marvell.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/isci/request.c         |    3 +--
 drivers/scsi/isci/task.c            |    6 ++----
 drivers/scsi/isci/task.h            |   36 -----------------------------------
 drivers/scsi/libsas/sas_ata.c       |   31 ++++++++++++++++++------------
 drivers/scsi/libsas/sas_scsi_host.c |    6 ++----
 drivers/scsi/mvsas/mv_sas.c         |    6 ------
 drivers/scsi/pm8001/pm8001_sas.c    |    6 +-----
 7 files changed, 24 insertions(+), 70 deletions(-)

diff --git a/drivers/scsi/isci/request.c b/drivers/scsi/isci/request.c
index 192cb48..83383ef 100644
--- a/drivers/scsi/isci/request.c
+++ b/drivers/scsi/isci/request.c
@@ -3649,8 +3649,7 @@ int isci_request_execute(struct isci_host *ihost, struct isci_remote_device *ide
 		/* Cause this task to be scheduled in the SCSI error
 		 * handler thread.
 		 */
-		isci_execpath_callback(ihost, task,
-				       sas_task_abort);
+		sas_task_abort(task);
 
 		/* Change the status, since we are holding
 		 * the I/O until it is managed by the SCSI
diff --git a/drivers/scsi/isci/task.c b/drivers/scsi/isci/task.c
index 66ad3dc..5901a0e 100644
--- a/drivers/scsi/isci/task.c
+++ b/drivers/scsi/isci/task.c
@@ -96,8 +96,7 @@ static void isci_task_refuse(struct isci_host *ihost, struct sas_task *task,
 			__func__, task, response, status);
 
 		task->lldd_task = NULL;
-
-		isci_execpath_callback(ihost, task, task->task_done);
+		task->task_done(task);
 		break;
 
 	case isci_perform_aborted_io_completion:
@@ -117,8 +116,7 @@ static void isci_task_refuse(struct isci_host *ihost, struct sas_task *task,
 			"%s: Error - task = %p, response=%d, "
 			"status=%d\n",
 			__func__, task, response, status);
-
-		isci_execpath_callback(ihost, task, sas_task_abort);
+		sas_task_abort(task);
 		break;
 
 	default:
diff --git a/drivers/scsi/isci/task.h b/drivers/scsi/isci/task.h
index bc78c0a..df8d440 100644
--- a/drivers/scsi/isci/task.h
+++ b/drivers/scsi/isci/task.h
@@ -322,40 +322,4 @@ isci_task_set_completion_status(
 	return task_notification_selection;
 
 }
-/**
-* isci_execpath_callback() - This function is called from the task
-* execute path when the task needs to callback libsas about the submit-time
-* task failure.  The callback occurs either through the task's done function
-* or through sas_task_abort.  In the case of regular non-discovery SATA/STP I/O
-* requests, libsas takes the host lock before calling execute task.  Therefore
-* in this situation the host lock must be managed before calling the func.
-*
-* @ihost: This parameter is the controller to which the I/O request was sent.
-* @task: This parameter is the I/O request.
-* @func: This parameter is the function to call in the correct context.
-* @status: This parameter is the status code for the completed task.
-*
-*/
-static inline void isci_execpath_callback(struct isci_host *ihost,
-					  struct sas_task  *task,
-					  void (*func)(struct sas_task *))
-{
-	struct domain_device *dev = task->dev;
-
-	if (dev_is_sata(dev) && task->uldd_task) {
-		unsigned long flags;
-
-		/* Since we are still in the submit path, and since
-		 * libsas takes the host lock on behalf of SATA
-		 * devices before I/O starts (in the non-discovery case),
-		 * we need to unlock before we can call the callback function.
-		 */
-		raw_local_irq_save(flags);
-		spin_unlock(dev->sata_dev.ap->lock);
-		func(task);
-		spin_lock(dev->sata_dev.ap->lock);
-		raw_local_irq_restore(flags);
-	} else
-		func(task);
-}
 #endif /* !defined(_SCI_TASK_H_) */
diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
index 83118d0..0489001 100644
--- a/drivers/scsi/libsas/sas_ata.c
+++ b/drivers/scsi/libsas/sas_ata.c
@@ -166,23 +166,26 @@ qc_already_gone:
 
 static unsigned int sas_ata_qc_issue(struct ata_queued_cmd *qc)
 {
-	int res;
+	unsigned int si;
+	unsigned int xfer = 0;
 	struct sas_task *task;
-	struct domain_device *dev = qc->ap->private_data;
+	struct scatterlist *sg;
+	int ret = AC_ERR_SYSTEM;
+	struct ata_port *ap = qc->ap;
+	struct domain_device *dev = ap->private_data;
 	struct sas_ha_struct *sas_ha = dev->port->ha;
 	struct Scsi_Host *host = sas_ha->core.shost;
 	struct sas_internal *i = to_sas_internal(host->transportt);
-	struct scatterlist *sg;
-	unsigned int xfer = 0;
-	unsigned int si;
+
+	spin_unlock_irq(ap->lock);
 
 	/* If the device fell off, no sense in issuing commands */
 	if (dev->gone)
-		return AC_ERR_SYSTEM;
+		goto out;
 
 	task = sas_alloc_task(GFP_ATOMIC);
 	if (!task)
-		return AC_ERR_SYSTEM;
+		goto out;
 	task->dev = dev;
 	task->task_proto = SAS_PROTOCOL_STP;
 	task->task_done = sas_ata_task_done;
@@ -227,21 +230,23 @@ static unsigned int sas_ata_qc_issue(struct ata_queued_cmd *qc)
 		ASSIGN_SAS_TASK(qc->scsicmd, task);
 
 	if (sas_ha->lldd_max_execute_num < 2)
-		res = i->dft->lldd_execute_task(task, 1, GFP_ATOMIC);
+		ret = i->dft->lldd_execute_task(task, 1, GFP_ATOMIC);
 	else
-		res = sas_queue_up(task);
+		ret = sas_queue_up(task);
 
 	/* Examine */
-	if (res) {
-		SAS_DPRINTK("lldd_execute_task returned: %d\n", res);
+	if (ret) {
+		SAS_DPRINTK("lldd_execute_task returned: %d\n", ret);
 
 		if (qc->scsicmd)
 			ASSIGN_SAS_TASK(qc->scsicmd, NULL);
 		sas_free_task(task);
-		return AC_ERR_SYSTEM;
+		ret = AC_ERR_SYSTEM;
 	}
 
-	return 0;
+ out:
+	spin_lock_irq(ap->lock);
+	return ret;
 }
 
 static bool sas_ata_qc_fill_rtf(struct ata_queued_cmd *qc)
diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index 2a163c7..fd60465 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -198,11 +198,9 @@ int sas_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
 	}
 
 	if (dev_is_sata(dev)) {
-		unsigned long flags;
-
-		spin_lock_irqsave(dev->sata_dev.ap->lock, flags);
+		spin_lock_irq(dev->sata_dev.ap->lock);
 		res = ata_sas_queuecmd(cmd, dev->sata_dev.ap);
-		spin_unlock_irqrestore(dev->sata_dev.ap->lock, flags);
+		spin_unlock_irq(dev->sata_dev.ap->lock);
 		return res;
 	}
 
diff --git a/drivers/scsi/mvsas/mv_sas.c b/drivers/scsi/mvsas/mv_sas.c
index b118e63..cd88223 100644
--- a/drivers/scsi/mvsas/mv_sas.c
+++ b/drivers/scsi/mvsas/mv_sas.c
@@ -893,9 +893,6 @@ static int mvs_task_exec(struct sas_task *task, const int num, gfp_t gfp_flags,
 
 	mvi = ((struct mvs_device *)task->dev->lldd_dev)->mvi_info;
 
-	if ((dev->dev_type == SATA_DEV) && (dev->sata_dev.ap != NULL))
-		spin_unlock_irq(dev->sata_dev.ap->lock);
-
 	spin_lock_irqsave(&mvi->lock, flags);
 	rc = mvs_task_prep(task, mvi, is_tmf, tmf, &pass);
 	if (rc)
@@ -906,9 +903,6 @@ static int mvs_task_exec(struct sas_task *task, const int num, gfp_t gfp_flags,
 				(MVS_CHIP_SLOT_SZ - 1));
 	spin_unlock_irqrestore(&mvi->lock, flags);
 
-	if ((dev->dev_type == SATA_DEV) && (dev->sata_dev.ap != NULL))
-		spin_lock_irq(dev->sata_dev.ap->lock);
-
 	return rc;
 }
 
diff --git a/drivers/scsi/pm8001/pm8001_sas.c b/drivers/scsi/pm8001/pm8001_sas.c
index 13811c7..5add18c 100644
--- a/drivers/scsi/pm8001/pm8001_sas.c
+++ b/drivers/scsi/pm8001/pm8001_sas.c
@@ -342,7 +342,7 @@ static int pm8001_task_exec(struct sas_task *task, const int num,
 	struct pm8001_ccb_info *ccb;
 	u32 tag = 0xdeadbeef, rc, n_elem = 0;
 	u32 n = num;
-	unsigned long flags = 0, flags_libsas = 0;
+	unsigned long flags = 0;
 
 	if (!dev->port) {
 		struct task_status_struct *tsm = &t->task_status;
@@ -366,11 +366,7 @@ static int pm8001_task_exec(struct sas_task *task, const int num,
 				ts->stat = SAS_PHY_DOWN;
 
 				spin_unlock_irqrestore(&pm8001_ha->lock, flags);
-				spin_unlock_irqrestore(dev->sata_dev.ap->lock,
-						flags_libsas);
 				t->task_done(t);
-				spin_lock_irqsave(dev->sata_dev.ap->lock,
-					flags_libsas);
 				spin_lock_irqsave(&pm8001_ha->lock, flags);
 				if (n > 1)
 					t = list_entry(t->list.next,



* [PATCH v2 09/28] libsas: prevent domain rediscovery competing with ata error handling
  2011-12-23  2:58 [PATCH v2 00/28] libsas: eh reworks (ata-eh vs discovery, races, ...) Dan Williams
                   ` (7 preceding siblings ...)
  2011-12-23  2:59 ` [PATCH v2 08/28] libsas: remove ata_port.lock management duties from lldds Dan Williams
@ 2011-12-23  2:59 ` Dan Williams
  2011-12-23  2:59 ` [PATCH v2 10/28] libsas: use ->set_dmamode to notify lldds of NCQ parameters Dan Williams
                   ` (18 subsequent siblings)
  27 siblings, 0 replies; 37+ messages in thread
From: Dan Williams @ 2011-12-23  2:59 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide, Christoph Hellwig

libata error handling provides for a timeout for link recovery.  libsas
must not rescan for previously known devices in this interval; otherwise
it may remove a device that is simply waiting for its link to recover.
Let libata-eh make the determination of when the link is stable and
prevent libsas (host workqueue) from taking action while this
determination is pending.

Using a mutex (ha->eh_mutex) to block ata eh while rediscovery is
running requires any discovery action that may block on eh be moved to
its own context outside the lock.  Probing ATA devices explicitly waits
on ata-eh and the cache-flush-io issued during device removal may also
pend awaiting eh completion.  Essentially any rphy add/remove activity
needs to run outside the lock.

This adds a new cleanup state for domain devices to libsas:
'allocated-not-probed'.  In this state dev->rphy points to a rphy that
is known to have been through a sas_rphy_add() event.  At
sas_unregister_dev() time check if this device is still pending probe
and clean up accordingly.
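
The split between discovery and a later probe step can be sketched as a small
user-space state machine.  It is a simplified model under stated assumptions:
no locking, no lists, and all names (enum toy_dev_state, toy_discover(),
toy_probe_all(), toy_unregister()) are hypothetical rather than libsas
symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_dev_state { TOY_PENDING_PROBE, TOY_PROBED, TOY_REMOVED };

struct toy_dev {
	enum toy_dev_state state;
	bool add_fails;		/* makes the modelled add step fail */
};

/* Discovery context: only record the device, never wait on eh here. */
static void toy_discover(struct toy_dev *dev)
{
	dev->state = TOY_PENDING_PROBE;
}

/* Separate probe context: performs the add step that may block on eh.
 * Returns the number of devices that were probed successfully. */
static size_t toy_probe_all(struct toy_dev *devs, size_t n)
{
	size_t probed = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (devs[i].state != TOY_PENDING_PROBE)
			continue;
		if (devs[i].add_fails) {
			devs[i].state = TOY_REMOVED;
		} else {
			devs[i].state = TOY_PROBED;
			probed++;
		}
	}
	return probed;
}

/* Removal has to cope with a device that never reached the probe step.
 * Returns true when a full teardown is needed, false when the device was
 * still pending probe (or already removed). */
static bool toy_unregister(struct toy_dev *dev)
{
	bool was_probed = dev->state == TOY_PROBED;

	dev->state = TOY_REMOVED;
	return was_probed;
}
```

The point of the extra state is that removal can tell a fully probed device
apart from one that was discovered but never probed, and pick the matching
cleanup path.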

Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/libsas/sas_ata.c      |   45 +++++++++++++++++++++++++++---
 drivers/scsi/libsas/sas_discover.c |   54 ++++++++++++++++++++++++++++++++----
 drivers/scsi/libsas/sas_expander.c |    5 +--
 drivers/scsi/libsas/sas_init.c     |    2 +
 drivers/scsi/libsas/sas_internal.h |    1 +
 drivers/scsi/libsas/sas_port.c     |    2 +
 drivers/scsi/scsi_transport_sas.c  |   18 ++++++++++--
 include/scsi/libsas.h              |   10 +++++--
 include/scsi/sas_ata.h             |    5 +++
 include/scsi/scsi_transport_sas.h  |    1 +
 10 files changed, 125 insertions(+), 18 deletions(-)

diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
index 0489001..4b91c74 100644
--- a/drivers/scsi/libsas/sas_ata.c
+++ b/drivers/scsi/libsas/sas_ata.c
@@ -753,6 +753,35 @@ static int sas_discover_sata_pm(struct domain_device *dev)
 	return -ENODEV;
 }
 
+void sas_probe_sata(struct work_struct *work)
+{
+	struct domain_device *dev, *n;
+	struct sas_discovery_event *ev =
+		container_of(work, struct sas_discovery_event, work);
+	struct asd_sas_port *port = ev->port;
+
+	clear_bit(DISCE_PROBE, &port->disc.pending);
+
+	list_for_each_entry_safe(dev, n, &port->disco_list, disco_list_node) {
+		int err;
+
+		spin_lock_irq(&port->dev_list_lock);
+		list_add_tail(&dev->dev_list_node, &port->dev_list);
+		spin_unlock_irq(&port->dev_list_lock);
+
+		err = sas_rphy_add(dev->rphy);
+
+		if (err) {
+			SAS_DPRINTK("%s: for %s device %16llx returned %d\n",
+				    __func__, dev->parent ? "exp-attached" :
+							    "direct-attached",
+				    SAS_ADDR(dev->sas_addr), err);
+			sas_unregister_dev(port, dev);
+		} else
+			list_del_init(&dev->disco_list_node);
+	}
+}
+
 /**
  * sas_discover_sata -- discover an STP/SATA domain device
  * @dev: pointer to struct domain_device of interest
@@ -789,10 +818,15 @@ int sas_discover_sata(struct domain_device *dev)
 		break;
 	}
 	sas_notify_lldd_dev_gone(dev);
-	if (!res) {
-		sas_notify_lldd_dev_found(dev);
-		res = sas_rphy_add(dev->rphy);
-	}
+
+	if (res)
+		return res;
+
+	res = sas_notify_lldd_dev_found(dev);
+	if (res)
+		return res;
+
+	sas_discover_event(dev->port, DISCE_PROBE);
 
 	return res;
 }
@@ -800,7 +834,9 @@ int sas_discover_sata(struct domain_device *dev)
 void sas_ata_strategy_handler(struct Scsi_Host *shost)
 {
 	struct scsi_device *sdev;
+	struct sas_ha_struct *sas_ha = SHOST_TO_SAS_HA(shost);
 
+	mutex_lock(&sas_ha->eh_mutex);
 	shost_for_each_device(sdev, shost) {
 		struct domain_device *ddev = sdev_to_domain_dev(sdev);
 		struct ata_port *ap = ddev->sata_dev.ap;
@@ -811,6 +847,7 @@ void sas_ata_strategy_handler(struct Scsi_Host *shost)
 		ata_port_printk(ap, KERN_DEBUG, "sas eh calling libata port error handler");
 		ata_scsi_port_error_handler(shost, ap);
 	}
+	mutex_unlock(&sas_ha->eh_mutex);
 }
 
 int sas_ata_timed_out(struct scsi_cmnd *cmd, struct sas_task *task,
diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
index 32e0117..eca9927 100644
--- a/drivers/scsi/libsas/sas_discover.c
+++ b/drivers/scsi/libsas/sas_discover.c
@@ -148,9 +148,14 @@ static int sas_get_port_device(struct asd_sas_port *port)
 	port->disc.max_level = 0;
 
 	dev->rphy = rphy;
-	spin_lock_irq(&port->dev_list_lock);
-	list_add_tail(&dev->dev_list_node, &port->dev_list);
-	spin_unlock_irq(&port->dev_list_lock);
+
+	if (dev_is_sata(dev))
+		list_add_tail(&dev->disco_list_node, &port->disco_list);
+	else {
+		spin_lock_irq(&port->dev_list_lock);
+		list_add_tail(&dev->dev_list_node, &port->dev_list);
+		spin_unlock_irq(&port->dev_list_lock);
+	}
 
 	return 0;
 }
@@ -255,14 +260,42 @@ static void sas_unregister_common_dev(struct asd_sas_port *port, struct domain_d
 	sas_put_device(dev);
 }
 
-void sas_unregister_dev(struct asd_sas_port *port, struct domain_device *dev)
+static void sas_destruct_devices(struct work_struct *work)
 {
-	if (dev->rphy) {
+	struct domain_device *dev, *n;
+	struct sas_discovery_event *ev =
+		container_of(work, struct sas_discovery_event, work);
+	struct asd_sas_port *port = ev->port;
+
+	clear_bit(DISCE_DESTRUCT, &port->disc.pending);
+
+	list_for_each_entry_safe(dev, n, &port->destroy_list, dev_list_node) {
 		sas_remove_children(&dev->rphy->dev);
 		sas_rphy_delete(dev->rphy);
 		dev->rphy = NULL;
+		sas_unregister_common_dev(port, dev);
+	}
+}
+
+void sas_unregister_dev(struct asd_sas_port *port, struct domain_device *dev)
+{
+	if (!list_empty(&dev->disco_list_node)) {
+		/* this rphy never saw sas_rphy_add */
+		list_del_init(&dev->disco_list_node);
+		sas_rphy_free(dev->rphy);
+		dev->rphy = NULL;
+		sas_unregister_common_dev(port, dev);
+	}
+
+	if (dev->rphy) {
+		sas_rphy_unlink(dev->rphy);
+
+		spin_lock_irq(&port->dev_list_lock);
+		list_move_tail(&dev->dev_list_node, &port->destroy_list);
+		spin_unlock_irq(&port->dev_list_lock);
+
+		sas_discover_event(dev->port, DISCE_DESTRUCT);
 	}
-	sas_unregister_common_dev(port, dev);
 }
 
 void sas_unregister_domain_devices(struct asd_sas_port *port)
@@ -271,6 +304,8 @@ void sas_unregister_domain_devices(struct asd_sas_port *port)
 
 	list_for_each_entry_safe_reverse(dev, n, &port->dev_list, dev_list_node)
 		sas_unregister_dev(port, dev);
+	list_for_each_entry_safe(dev, n, &port->disco_list, disco_list_node)
+		sas_unregister_dev(port, dev);
 
 	port->port->rphy = NULL;
 
@@ -335,6 +370,7 @@ static void sas_discover_domain(struct work_struct *work)
 		sas_rphy_free(dev->rphy);
 		dev->rphy = NULL;
 
+		list_del_init(&dev->disco_list_node);
 		spin_lock_irq(&port->dev_list_lock);
 		list_del_init(&dev->dev_list_node);
 		spin_unlock_irq(&port->dev_list_lock);
@@ -358,8 +394,12 @@ static void sas_revalidate_domain(struct work_struct *work)
 
 	SAS_DPRINTK("REVALIDATING DOMAIN on port %d, pid:%d\n", port->id,
 		    task_pid_nr(current));
+
+	/* prevent rediscovery from finding sata links in recovery */
+	mutex_lock(&port->ha->eh_mutex);
 	if (port->port_dev)
 		res = sas_ex_revalidate_domain(port->port_dev);
+	mutex_unlock(&port->ha->eh_mutex);
 
 	SAS_DPRINTK("done REVALIDATING DOMAIN on port %d, pid:%d, res 0x%x\n",
 		    port->id, task_pid_nr(current), res);
@@ -414,6 +454,8 @@ void sas_init_disc(struct sas_discovery *disc, struct asd_sas_port *port)
 	static const work_func_t sas_event_fns[DISC_NUM_EVENTS] = {
 		[DISCE_DISCOVER_DOMAIN] = sas_discover_domain,
 		[DISCE_REVALIDATE_DOMAIN] = sas_revalidate_domain,
+		[DISCE_PROBE] = sas_probe_sata,
+		[DISCE_DESTRUCT] = sas_destruct_devices,
 	};
 
 	disc->pending = 0;
diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index 15d2239..c3846cf 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -704,9 +704,7 @@ static struct domain_device *sas_ex_discover_end_dev(
 
 		child->rphy = rphy;
 
-		spin_lock_irq(&parent->port->dev_list_lock);
-		list_add_tail(&child->dev_list_node, &parent->port->dev_list);
-		spin_unlock_irq(&parent->port->dev_list_lock);
+		list_add_tail(&child->disco_list_node, &parent->port->disco_list);
 
 		res = sas_discover_sata(child);
 		if (res) {
@@ -756,6 +754,7 @@ static struct domain_device *sas_ex_discover_end_dev(
 	sas_rphy_free(child->rphy);
 	child->rphy = NULL;
 
+	list_del(&child->disco_list_node);
 	spin_lock_irq(&parent->port->dev_list_lock);
 	list_del(&child->dev_list_node);
 	spin_unlock_irq(&parent->port->dev_list_lock);
diff --git a/drivers/scsi/libsas/sas_init.c b/drivers/scsi/libsas/sas_init.c
index 572b943..0cca72a 100644
--- a/drivers/scsi/libsas/sas_init.c
+++ b/drivers/scsi/libsas/sas_init.c
@@ -104,6 +104,7 @@ int sas_register_ha(struct sas_ha_struct *sas_ha)
 {
 	int error = 0;
 
+	mutex_init(&sas_ha->eh_mutex);
 	spin_lock_init(&sas_ha->phy_port_lock);
 	sas_hash_addr(sas_ha->hashed_sas_addr, sas_ha->sas_addr);
 
@@ -168,6 +169,7 @@ int sas_unregister_ha(struct sas_ha_struct *sas_ha)
 	sas_drain_work(sas_ha);
 
 	sas_unregister_ports(sas_ha);
+	sas_drain_work(sas_ha);
 
 	if (sas_ha->lldd_max_execute_num > 1) {
 		sas_shutdown_queue(sas_ha);
diff --git a/drivers/scsi/libsas/sas_internal.h b/drivers/scsi/libsas/sas_internal.h
index 948ea64..e21c245 100644
--- a/drivers/scsi/libsas/sas_internal.h
+++ b/drivers/scsi/libsas/sas_internal.h
@@ -138,6 +138,7 @@ static inline struct domain_device *sas_alloc_device(void)
 	if (dev) {
 		INIT_LIST_HEAD(&dev->siblings);
 		INIT_LIST_HEAD(&dev->dev_list_node);
+		INIT_LIST_HEAD(&dev->disco_list_node);
 		kref_init(&dev->kref);
 	}
 	return dev;
diff --git a/drivers/scsi/libsas/sas_port.c b/drivers/scsi/libsas/sas_port.c
index a47c7a7..e8e68d0 100644
--- a/drivers/scsi/libsas/sas_port.c
+++ b/drivers/scsi/libsas/sas_port.c
@@ -277,6 +277,8 @@ static void sas_init_port(struct asd_sas_port *port,
 	memset(port, 0, sizeof(*port));
 	port->id = i;
 	INIT_LIST_HEAD(&port->dev_list);
+	INIT_LIST_HEAD(&port->disco_list);
+	INIT_LIST_HEAD(&port->destroy_list);
 	spin_lock_init(&port->phy_list_lock);
 	INIT_LIST_HEAD(&port->phy_list);
 	port->ha = sas_ha;
diff --git a/drivers/scsi/scsi_transport_sas.c b/drivers/scsi/scsi_transport_sas.c
index 9d9330a..9421bae 100644
--- a/drivers/scsi/scsi_transport_sas.c
+++ b/drivers/scsi/scsi_transport_sas.c
@@ -1603,6 +1603,20 @@ sas_rphy_delete(struct sas_rphy *rphy)
 EXPORT_SYMBOL(sas_rphy_delete);
 
 /**
+ * sas_rphy_unlink  -  unlink SAS remote PHY
+ * @rphy:	SAS remote phy to unlink from its parent port
+ *
+ * Removes port reference to an rphy
+ */
+void sas_rphy_unlink(struct sas_rphy *rphy)
+{
+	struct sas_port *parent = dev_to_sas_port(rphy->dev.parent);
+
+	parent->rphy = NULL;
+}
+EXPORT_SYMBOL(sas_rphy_unlink);
+
+/**
  * sas_rphy_remove  -  remove SAS remote PHY
  * @rphy:	SAS remote phy to remove
  *
@@ -1612,7 +1626,6 @@ void
 sas_rphy_remove(struct sas_rphy *rphy)
 {
 	struct device *dev = &rphy->dev;
-	struct sas_port *parent = dev_to_sas_port(dev->parent);
 
 	switch (rphy->identify.device_type) {
 	case SAS_END_DEVICE:
@@ -1626,10 +1639,9 @@ sas_rphy_remove(struct sas_rphy *rphy)
 		break;
 	}
 
+	sas_rphy_unlink(rphy);
 	transport_remove_device(dev);
 	device_del(dev);
-
-	parent->rphy = NULL;
 }
 EXPORT_SYMBOL(sas_rphy_remove);
 
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index 42900fa..8ada830 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -86,7 +86,9 @@ enum discover_event {
 	DISCE_DISCOVER_DOMAIN   = 0U,
 	DISCE_REVALIDATE_DOMAIN = 1,
 	DISCE_PORT_GONE         = 2,
-	DISC_NUM_EVENTS 	= 3,
+	DISCE_PROBE		= 3,
+	DISCE_DESTRUCT		= 4,
+	DISC_NUM_EVENTS		= 5,
 };
 
 /* ---------- Expander Devices ---------- */
@@ -188,6 +190,7 @@ struct domain_device {
         struct asd_sas_port *port;        /* shortcut to root of the tree */
 
         struct list_head dev_list_node;
+	struct list_head disco_list_node;
 
         enum sas_protocol    iproto;
         enum sas_protocol    tproto;
@@ -223,7 +226,6 @@ struct sas_discovery {
 	int    max_level;
 };
 
-
 /* The port struct is Class:RW, driver:RO */
 struct asd_sas_port {
 /* private: */
@@ -233,6 +235,8 @@ struct asd_sas_port {
 	struct domain_device *port_dev;
 	spinlock_t dev_list_lock;
 	struct list_head dev_list;
+	struct list_head disco_list;
+	struct list_head destroy_list;
 	enum   sas_linkrate linkrate;
 
 	struct sas_phy *phy;
@@ -343,6 +347,8 @@ struct sas_ha_struct {
 	unsigned long	  state;
 	spinlock_t 	  state_lock;
 
+	struct mutex eh_mutex;
+
 	struct scsi_core core;
 
 /* public: */
diff --git a/include/scsi/sas_ata.h b/include/scsi/sas_ata.h
index 7d5013f..557fc9a 100644
--- a/include/scsi/sas_ata.h
+++ b/include/scsi/sas_ata.h
@@ -45,6 +45,7 @@ int sas_ata_timed_out(struct scsi_cmnd *cmd, struct sas_task *task,
 		      enum blk_eh_timer_return *rtn);
 int sas_ata_eh(struct Scsi_Host *shost, struct list_head *work_q,
 	       struct list_head *done_q);
+void sas_probe_sata(struct work_struct *work);
 
 #else
 
@@ -78,6 +79,10 @@ static inline int sas_ata_eh(struct Scsi_Host *shost, struct list_head *work_q,
 	return 0;
 }
 
+static inline void sas_probe_sata(struct work_struct *work)
+{
+}
+
 #endif
 
 #endif /* _SAS_ATA_H_ */
diff --git a/include/scsi/scsi_transport_sas.h b/include/scsi/scsi_transport_sas.h
index ffeebc3..6d14daa 100644
--- a/include/scsi/scsi_transport_sas.h
+++ b/include/scsi/scsi_transport_sas.h
@@ -194,6 +194,7 @@ void sas_rphy_free(struct sas_rphy *);
 extern int sas_rphy_add(struct sas_rphy *);
 extern void sas_rphy_remove(struct sas_rphy *);
 extern void sas_rphy_delete(struct sas_rphy *);
+extern void sas_rphy_unlink(struct sas_rphy *);
 extern int scsi_is_sas_rphy(const struct device *);
 
 struct sas_port *sas_port_alloc(struct device *, int);



* [PATCH v2 10/28] libsas: use ->set_dmamode to notify lldds of NCQ parameters
  2011-12-23  2:58 [PATCH v2 00/28] libsas: eh reworks (ata-eh vs discovery, races, ...) Dan Williams
                   ` (8 preceding siblings ...)
  2011-12-23  2:59 ` [PATCH v2 09/28] libsas: prevent domain rediscovery competing with ata error handling Dan Williams
@ 2011-12-23  2:59 ` Dan Williams
  2011-12-23  2:59 ` [PATCH v2 11/28] libsas: kill invocation of scsi_eh_finish_cmd from sas_ata_task_done Dan Williams
                   ` (17 subsequent siblings)
  27 siblings, 0 replies; 37+ messages in thread
From: Dan Williams @ 2011-12-23  2:59 UTC (permalink / raw)
  To: linux-scsi; +Cc: Xiangliang Yu, linux-ide, Luben Tuikov, Jack Wang

sas_discover_sata() notifies lldds of sata devices twice: once to allow
the 'identify' to be sent, and a second time to allow aic94xx (the only
libsas driver that cares about sata_dev.identify) to set up NCQ
parameters before the device becomes known to the midlayer.  Replace
this double notification and intervening 'identify' with an explicit
->lldd_ata_set_dmamode notification.  With this change all ata internal
commands are issued by libata, so we no longer need sas_issue_ata_cmd().

The data from the identify command only needs to be cached in one
location so ata_device.id replaces domain_device.sata_dev.identify.
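
For reference, the NCQ parameters in question come from two IDENTIFY
DEVICE words.  The sketch below mirrors the open-coded check that this
patch removes from asd_init_sata() (bit 8 of word 76 for NCQ support, the
low five bits of word 75 plus one for the depth) and the tag mask written
to SATA_TAG_ALLOC_MASK.  The model_* names are hypothetical, and the id
array is assumed to be in CPU byte order already.

```c
#include <stdint.h>

/* IDENTIFY DEVICE word indices used by the check removed from asd_init_sata(). */
#define MODEL_ID_QUEUE_DEPTH 75
#define MODEL_ID_SATA_CAP    76

/* Return the NCQ queue depth (1..32), or 0 if the device does not report NCQ. */
static unsigned int model_ncq_depth(const uint16_t *id)
{
	if (!(id[MODEL_ID_SATA_CAP] & 0x100))
		return 0;
	return (id[MODEL_ID_QUEUE_DEPTH] & 0x1F) + 1;
}

/* Build a mask with one bit set per usable tag, as in (1ULL << qdepth) - 1. */
static uint32_t model_tag_mask(unsigned int depth)
{
	return (uint32_t)((1ULL << depth) - 1);
}
```

The 64-bit shift matters for a depth of 32, where a 32-bit shift would be
undefined; the truncated result is then a full 32-bit mask.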

Cc: Jack Wang <jack_wang@usish.com>
Cc: Xiangliang Yu <yuxiangl@marvell.com>
Cc: Luben Tuikov <ltuikov@yahoo.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/aic94xx/aic94xx.h      |    2 
 drivers/scsi/aic94xx/aic94xx_dev.c  |   38 +++-
 drivers/scsi/aic94xx/aic94xx_init.c |    2 
 drivers/scsi/libsas/sas_ata.c       |  324 ++---------------------------------
 drivers/scsi/libsas/sas_discover.c  |    5 -
 include/scsi/libsas.h               |    4 
 6 files changed, 49 insertions(+), 326 deletions(-)

diff --git a/drivers/scsi/aic94xx/aic94xx.h b/drivers/scsi/aic94xx/aic94xx.h
index 2863a9d..66cda66 100644
--- a/drivers/scsi/aic94xx/aic94xx.h
+++ b/drivers/scsi/aic94xx/aic94xx.h
@@ -80,6 +80,8 @@ void asd_invalidate_edb(struct asd_ascb *ascb, int edb_id);
 
 int  asd_execute_task(struct sas_task *, int num, gfp_t gfp_flags);
 
+void asd_set_dmamode(struct domain_device *dev);
+
 /* ---------- TMFs ---------- */
 int  asd_abort_task(struct sas_task *);
 int  asd_abort_task_set(struct domain_device *, u8 *lun);
diff --git a/drivers/scsi/aic94xx/aic94xx_dev.c b/drivers/scsi/aic94xx/aic94xx_dev.c
index 2e2ddec..64136c56 100644
--- a/drivers/scsi/aic94xx/aic94xx_dev.c
+++ b/drivers/scsi/aic94xx/aic94xx_dev.c
@@ -109,26 +109,37 @@ static int asd_init_sata_tag_ddb(struct domain_device *dev)
 	return 0;
 }
 
-static int asd_init_sata(struct domain_device *dev)
+void asd_set_dmamode(struct domain_device *dev)
 {
 	struct asd_ha_struct *asd_ha = dev->port->ha->lldd_ha;
+	struct ata_device *ata_dev = sas_to_ata_dev(dev);
 	int ddb = (int) (unsigned long) dev->lldd_dev;
 	u32 qdepth = 0;
-	int res = 0;
 
-	asd_ddbsite_write_word(asd_ha, ddb, ATA_CMD_SCBPTR, 0xFFFF);
-	if ((dev->dev_type == SATA_DEV || dev->dev_type == SATA_PM_PORT) &&
-	    dev->sata_dev.identify_device &&
-	    dev->sata_dev.identify_device[10] != 0) {
-		u16 w75 = le16_to_cpu(dev->sata_dev.identify_device[75]);
-		u16 w76 = le16_to_cpu(dev->sata_dev.identify_device[76]);
-
-		if (w76 & 0x100) /* NCQ? */
-			qdepth = (w75 & 0x1F) + 1;
+	if (dev->dev_type == SATA_DEV || dev->dev_type == SATA_PM_PORT) {
+		if (ata_id_has_ncq(ata_dev->id))
+			qdepth = ata_id_queue_depth(ata_dev->id);
 		asd_ddbsite_write_dword(asd_ha, ddb, SATA_TAG_ALLOC_MASK,
 					(1ULL<<qdepth)-1);
 		asd_ddbsite_write_byte(asd_ha, ddb, NUM_SATA_TAGS, qdepth);
 	}
+
+	if (qdepth > 0)
+		if (asd_init_sata_tag_ddb(dev) != 0) {
+			unsigned long flags;
+
+			spin_lock_irqsave(dev->sata_dev.ap->lock, flags);
+			ata_dev->flags |= ATA_DFLAG_NCQ_OFF;
+			spin_unlock_irqrestore(dev->sata_dev.ap->lock, flags);
+		}
+}
+
+static int asd_init_sata(struct domain_device *dev)
+{
+	struct asd_ha_struct *asd_ha = dev->port->ha->lldd_ha;
+	int ddb = (int) (unsigned long) dev->lldd_dev;
+
+	asd_ddbsite_write_word(asd_ha, ddb, ATA_CMD_SCBPTR, 0xFFFF);
 	if (dev->dev_type == SATA_DEV || dev->dev_type == SATA_PM ||
 	    dev->dev_type == SATA_PM_PORT) {
 		struct dev_to_host_fis *fis = (struct dev_to_host_fis *)
@@ -136,9 +147,8 @@ static int asd_init_sata(struct domain_device *dev)
 		asd_ddbsite_write_byte(asd_ha, ddb, SATA_STATUS, fis->status);
 	}
 	asd_ddbsite_write_word(asd_ha, ddb, NCQ_DATA_SCB_PTR, 0xFFFF);
-	if (qdepth > 0)
-		res = asd_init_sata_tag_ddb(dev);
-	return res;
+
+	return 0;
 }
 
 static int asd_init_target_ddb(struct domain_device *dev)
diff --git a/drivers/scsi/aic94xx/aic94xx_init.c b/drivers/scsi/aic94xx/aic94xx_init.c
index 2b3717f..eea988a 100644
--- a/drivers/scsi/aic94xx/aic94xx_init.c
+++ b/drivers/scsi/aic94xx/aic94xx_init.c
@@ -1009,6 +1009,8 @@ static struct sas_domain_function_template aic94xx_transport_functions = {
 	.lldd_clear_nexus_ha	= asd_clear_nexus_ha,
 
 	.lldd_control_phy	= asd_control_phy,
+
+	.lldd_ata_set_dmamode	= asd_set_dmamode,
 };
 
 static const struct pci_device_id aic94xx_pci_table[] __devinitdata = {
diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
index 4b91c74..dd35a98 100644
--- a/drivers/scsi/libsas/sas_ata.c
+++ b/drivers/scsi/libsas/sas_ata.c
@@ -362,6 +362,17 @@ static void sas_ata_post_internal(struct ata_queued_cmd *qc)
 	}
 }
 
+
+static void sas_ata_set_dmamode(struct ata_port *ap, struct ata_device *ata_dev)
+{
+	struct domain_device *dev = ap->private_data;
+	struct sas_internal *i =
+		to_sas_internal(dev->port->ha->core.shost->transportt);
+
+	if (i->dft->lldd_ata_set_dmamode)
+		i->dft->lldd_ata_set_dmamode(dev);
+}
+
 static struct ata_port_operations sas_sata_ops = {
 	.prereset		= ata_std_prereset,
 	.softreset		= sas_ata_soft_reset,
@@ -375,6 +386,7 @@ static struct ata_port_operations sas_sata_ops = {
 	.qc_fill_rtf		= sas_ata_qc_fill_rtf,
 	.port_start		= ata_sas_port_start,
 	.port_stop		= ata_sas_port_stop,
+	.set_dmamode		= sas_ata_set_dmamode,
 };
 
 static struct ata_port_info sata_port_info = {
@@ -437,163 +449,6 @@ void sas_ata_task_abort(struct sas_task *task)
 	complete(waiting);
 }
 
-static void sas_task_timedout(unsigned long _task)
-{
-	struct sas_task *task = (void *) _task;
-	unsigned long flags;
-
-	spin_lock_irqsave(&task->task_state_lock, flags);
-	if (!(task->task_state_flags & SAS_TASK_STATE_DONE))
-		task->task_state_flags |= SAS_TASK_STATE_ABORTED;
-	spin_unlock_irqrestore(&task->task_state_lock, flags);
-
-	complete(&task->completion);
-}
-
-static void sas_disc_task_done(struct sas_task *task)
-{
-	if (!del_timer(&task->timer))
-		return;
-	complete(&task->completion);
-}
-
-#define SAS_DEV_TIMEOUT 10
-
-/**
- * sas_execute_task -- Basic task processing for discovery
- * @task: the task to be executed
- * @buffer: pointer to buffer to do I/O
- * @size: size of @buffer
- * @dma_dir: DMA direction.  DMA_xxx
- */
-static int sas_execute_task(struct sas_task *task, void *buffer, int size,
-			    enum dma_data_direction dma_dir)
-{
-	int res = 0;
-	struct scatterlist *scatter = NULL;
-	struct task_status_struct *ts = &task->task_status;
-	int num_scatter = 0;
-	int retries = 0;
-	struct sas_internal *i =
-		to_sas_internal(task->dev->port->ha->core.shost->transportt);
-
-	if (dma_dir != DMA_NONE) {
-		scatter = kzalloc(sizeof(*scatter), GFP_KERNEL);
-		if (!scatter)
-			goto out;
-
-		sg_init_one(scatter, buffer, size);
-		num_scatter = 1;
-	}
-
-	task->task_proto = task->dev->tproto;
-	task->scatter = scatter;
-	task->num_scatter = num_scatter;
-	task->total_xfer_len = size;
-	task->data_dir = dma_dir;
-	task->task_done = sas_disc_task_done;
-	if (dma_dir != DMA_NONE &&
-	    sas_protocol_ata(task->task_proto)) {
-		task->num_scatter = dma_map_sg(task->dev->port->ha->dev,
-					       task->scatter,
-					       task->num_scatter,
-					       task->data_dir);
-	}
-
-	for (retries = 0; retries < 5; retries++) {
-		task->task_state_flags = SAS_TASK_STATE_PENDING;
-		init_completion(&task->completion);
-
-		task->timer.data = (unsigned long) task;
-		task->timer.function = sas_task_timedout;
-		task->timer.expires = jiffies + SAS_DEV_TIMEOUT*HZ;
-		add_timer(&task->timer);
-
-		res = i->dft->lldd_execute_task(task, 1, GFP_KERNEL);
-		if (res) {
-			del_timer(&task->timer);
-			SAS_DPRINTK("executing SAS discovery task failed:%d\n",
-				    res);
-			goto ex_err;
-		}
-		wait_for_completion(&task->completion);
-		res = -ECOMM;
-		if (task->task_state_flags & SAS_TASK_STATE_ABORTED) {
-			int res2;
-			SAS_DPRINTK("task aborted, flags:0x%x\n",
-				    task->task_state_flags);
-			res2 = i->dft->lldd_abort_task(task);
-			SAS_DPRINTK("came back from abort task\n");
-			if (!(task->task_state_flags & SAS_TASK_STATE_DONE)) {
-				if (res2 == TMF_RESP_FUNC_COMPLETE)
-					continue; /* Retry the task */
-				else
-					goto ex_err;
-			}
-		}
-		if (task->task_status.stat == SAM_STAT_BUSY ||
-			   task->task_status.stat == SAM_STAT_TASK_SET_FULL ||
-			   task->task_status.stat == SAS_QUEUE_FULL) {
-			SAS_DPRINTK("task: q busy, sleeping...\n");
-			schedule_timeout_interruptible(HZ);
-		} else if (task->task_status.stat == SAM_STAT_CHECK_CONDITION) {
-			struct scsi_sense_hdr shdr;
-
-			if (!scsi_normalize_sense(ts->buf, ts->buf_valid_size,
-						  &shdr)) {
-				SAS_DPRINTK("couldn't normalize sense\n");
-				continue;
-			}
-			if ((shdr.sense_key == 6 && shdr.asc == 0x29) ||
-			    (shdr.sense_key == 2 && shdr.asc == 4 &&
-			     shdr.ascq == 1)) {
-				SAS_DPRINTK("device %016llx LUN: %016llx "
-					    "powering up or not ready yet, "
-					    "sleeping...\n",
-					    SAS_ADDR(task->dev->sas_addr),
-					    SAS_ADDR(task->ssp_task.LUN));
-
-				schedule_timeout_interruptible(5*HZ);
-			} else if (shdr.sense_key == 1) {
-				res = 0;
-				break;
-			} else if (shdr.sense_key == 5) {
-				break;
-			} else {
-				SAS_DPRINTK("dev %016llx LUN: %016llx "
-					    "sense key:0x%x ASC:0x%x ASCQ:0x%x"
-					    "\n",
-					    SAS_ADDR(task->dev->sas_addr),
-					    SAS_ADDR(task->ssp_task.LUN),
-					    shdr.sense_key,
-					    shdr.asc, shdr.ascq);
-			}
-		} else if (task->task_status.resp != SAS_TASK_COMPLETE ||
-			   task->task_status.stat != SAM_STAT_GOOD) {
-			SAS_DPRINTK("task finished with resp:0x%x, "
-				    "stat:0x%x\n",
-				    task->task_status.resp,
-				    task->task_status.stat);
-			goto ex_err;
-		} else {
-			res = 0;
-			break;
-		}
-	}
-ex_err:
-	if (dma_dir != DMA_NONE) {
-		if (sas_protocol_ata(task->task_proto))
-			dma_unmap_sg(task->dev->port->ha->dev,
-				     task->scatter, task->num_scatter,
-				     task->data_dir);
-		kfree(scatter);
-	}
-out:
-	return res;
-}
-
-/* ---------- SATA ---------- */
-
 static void sas_get_ata_command_set(struct domain_device *dev)
 {
 	struct dev_to_host_fis *fis =
@@ -637,122 +492,6 @@ static void sas_get_ata_command_set(struct domain_device *dev)
 		dev->sata_dev.command_set = ATAPI_COMMAND_SET;
 }
 
-/**
- * sas_issue_ata_cmd -- Basic SATA command processing for discovery
- * @dev: the device to send the command to
- * @command: the command register
- * @features: the features register
- * @buffer: pointer to buffer to do I/O
- * @size: size of @buffer
- * @dma_dir: DMA direction.  DMA_xxx
- */
-static int sas_issue_ata_cmd(struct domain_device *dev, u8 command,
-			     u8 features, void *buffer, int size,
-			     enum dma_data_direction dma_dir)
-{
-	int res = 0;
-	struct sas_task *task;
-	struct dev_to_host_fis *d2h_fis = (struct dev_to_host_fis *)
-		&dev->frame_rcvd[0];
-
-	res = -ENOMEM;
-	task = sas_alloc_task(GFP_KERNEL);
-	if (!task)
-		goto out;
-
-	task->dev = dev;
-
-	task->ata_task.fis.fis_type = 0x27;
-	task->ata_task.fis.command = command;
-	task->ata_task.fis.features = features;
-	task->ata_task.fis.device = d2h_fis->device;
-	task->ata_task.retry_count = 1;
-
-	res = sas_execute_task(task, buffer, size, dma_dir);
-
-	sas_free_task(task);
-out:
-	return res;
-}
-
-#define ATA_IDENTIFY_DEV         0xEC
-#define ATA_IDENTIFY_PACKET_DEV  0xA1
-#define ATA_SET_FEATURES         0xEF
-#define ATA_FEATURE_PUP_STBY_SPIN_UP 0x07
-
-/**
- * sas_discover_sata_dev -- discover a STP/SATA device (SATA_DEV)
- * @dev: STP/SATA device of interest (ATA/ATAPI)
- *
- * The LLDD has already been notified of this device, so that we can
- * send FISes to it.  Here we try to get IDENTIFY DEVICE or IDENTIFY
- * PACKET DEVICE, if ATAPI device, so that the LLDD can fine-tune its
- * performance for this device.
- */
-static int sas_discover_sata_dev(struct domain_device *dev)
-{
-	int     res;
-	__le16  *identify_x;
-	u8      command;
-
-	identify_x = kzalloc(512, GFP_KERNEL);
-	if (!identify_x)
-		return -ENOMEM;
-
-	if (dev->sata_dev.command_set == ATA_COMMAND_SET) {
-		dev->sata_dev.identify_device = identify_x;
-		command = ATA_IDENTIFY_DEV;
-	} else {
-		dev->sata_dev.identify_packet_device = identify_x;
-		command = ATA_IDENTIFY_PACKET_DEV;
-	}
-
-	res = sas_issue_ata_cmd(dev, command, 0, identify_x, 512,
-				DMA_FROM_DEVICE);
-	if (res)
-		goto out_err;
-
-	/* lives on the media? */
-	if (le16_to_cpu(identify_x[0]) & 4) {
-		/* incomplete response */
-		SAS_DPRINTK("sending SET FEATURE/PUP_STBY_SPIN_UP to "
-			    "dev %llx\n", SAS_ADDR(dev->sas_addr));
-		if (!(identify_x[83] & cpu_to_le16(1<<6)))
-			goto cont1;
-		res = sas_issue_ata_cmd(dev, ATA_SET_FEATURES,
-					ATA_FEATURE_PUP_STBY_SPIN_UP,
-					NULL, 0, DMA_NONE);
-		if (res)
-			goto cont1;
-
-		schedule_timeout_interruptible(5*HZ); /* More time? */
-		res = sas_issue_ata_cmd(dev, command, 0, identify_x, 512,
-					DMA_FROM_DEVICE);
-		if (res)
-			goto out_err;
-	}
-cont1:
-	/* XXX Hint: register this SATA device with SATL.
-	   When this returns, dev->sata_dev->lu is alive and
-	   present.
-	sas_satl_register_dev(dev);
-	*/
-
-	sas_fill_in_rphy(dev, dev->rphy);
-
-	return 0;
-out_err:
-	dev->sata_dev.identify_packet_device = NULL;
-	dev->sata_dev.identify_device = NULL;
-	kfree(identify_x);
-	return res;
-}
-
-static int sas_discover_sata_pm(struct domain_device *dev)
-{
-	return -ENODEV;
-}
-
 void sas_probe_sata(struct work_struct *work)
 {
 	struct domain_device *dev, *n;
@@ -786,49 +525,26 @@ void sas_probe_sata(struct work_struct *work)
  * sas_discover_sata -- discover an STP/SATA domain device
  * @dev: pointer to struct domain_device of interest
  *
- * First we notify the LLDD of this device, so we can send frames to
- * it.  Then depending on the type of device we call the appropriate
- * discover functions.  Once device discover is done, we notify the
- * LLDD so that it can fine-tune its parameters for the device, by
- * removing it and then adding it.  That is, the second time around,
- * the driver would have certain fields, that it is looking at, set.
- * Finally we initialize the kobj so that the device can be added to
- * the system at registration time.  Devices directly attached to a HA
- * port, have no parents.  All other devices do, and should have their
- * "parent" pointer set appropriately before calling this function.
+ * Devices directly attached to a HA port, have no parents.  All other
+ * devices do, and should have their "parent" pointer set appropriately
+ * before calling this function.
  */
 int sas_discover_sata(struct domain_device *dev)
 {
 	int res;
 
-	sas_get_ata_command_set(dev);
-
-	res = sas_notify_lldd_dev_found(dev);
-	if (res)
-		return res;
-
-	switch (dev->dev_type) {
-	case SATA_DEV:
-		res = sas_discover_sata_dev(dev);
-		break;
-	case SATA_PM:
-		res = sas_discover_sata_pm(dev);
-		break;
-	default:
-		break;
-	}
-	sas_notify_lldd_dev_gone(dev);
+	if (dev->dev_type == SATA_PM)
+		return -ENODEV;
 
-	if (res)
-		return res;
+	sas_get_ata_command_set(dev);
+	sas_fill_in_rphy(dev, dev->rphy);
 
 	res = sas_notify_lldd_dev_found(dev);
 	if (res)
 		return res;
 
 	sas_discover_event(dev->port, DISCE_PROBE);
-
-	return res;
+	return 0;
 }
 
 void sas_ata_strategy_handler(struct Scsi_Host *shost)
diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
index eca9927..3905143 100644
--- a/drivers/scsi/libsas/sas_discover.c
+++ b/drivers/scsi/libsas/sas_discover.c
@@ -237,11 +237,6 @@ void sas_free_device(struct kref *kref)
 	if (dev->dev_type == EDGE_DEV || dev->dev_type == FANOUT_DEV)
 		kfree(dev->ex_dev.ex_phy);
 
-	if (dev_is_sata(dev)) {
-		kfree(dev->sata_dev.identify_device);
-		kfree(dev->sata_dev.identify_packet_device);
-	}
-
 	kfree(dev);
 }
 
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index 8ada830..94845c3 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -164,9 +164,6 @@ enum ata_command_set {
 struct sata_device {
         enum   ata_command_set command_set;
         struct smp_resp        rps_resp; /* report_phy_sata_resp */
-        __le16 *identify_device;
-        __le16 *identify_packet_device;
-
         u8     port_no;        /* port number, if this is a PM (Port) */
         struct list_head children; /* PM Ports if this is a PM */
 
@@ -604,6 +601,7 @@ struct sas_domain_function_template {
 	int (*lldd_clear_task_set)(struct domain_device *, u8 *lun);
 	int (*lldd_I_T_nexus_reset)(struct domain_device *);
 	int (*lldd_ata_soft_reset)(struct domain_device *);
+	void (*lldd_ata_set_dmamode)(struct domain_device *);
 	int (*lldd_lu_reset)(struct domain_device *, u8 *lun);
 	int (*lldd_query_task)(struct sas_task *);
 



* [PATCH v2 11/28] libsas: kill invocation of scsi_eh_finish_cmd from sas_ata_task_done
  2011-12-23  2:58 [PATCH v2 00/28] libsas: eh reworks (ata-eh vs discovery, races, ...) Dan Williams
                   ` (9 preceding siblings ...)
  2011-12-23  2:59 ` [PATCH v2 10/28] libsas: use ->set_dmamode to notify lldds of NCQ parameters Dan Williams
@ 2011-12-23  2:59 ` Dan Williams
  2011-12-23  2:59 ` [PATCH v2 12/28] libsas: close error handling vs sas_ata_task_done() race Dan Williams
                   ` (16 subsequent siblings)
  27 siblings, 0 replies; 37+ messages in thread
From: Dan Williams @ 2011-12-23  2:59 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide, Darrick J. Wong

Prior to the conversion to the new-style libata-eh, sas_ata_task_done()
may have been the last opportunity to clean up the scmd, but now
libata-eh explicitly handles this case.  The call also races against
sas-eh: if a lldd completes a task after SAS_TASK_STATE_ABORTED is set,
it could trigger a spurious decrement of shost->host_failed.  Current
lldds have the band-aid of checking SAS_TASK_STATE_ABORTED before calling
->task_done(), but it is better to just let the scmds escalate to libata
for race-free cleanup.

Cc: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/libsas/sas_ata.c |   14 --------------
 1 files changed, 0 insertions(+), 14 deletions(-)

diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
index dd35a98..649b04b 100644
--- a/drivers/scsi/libsas/sas_ata.c
+++ b/drivers/scsi/libsas/sas_ata.c
@@ -145,20 +145,6 @@ static void sas_ata_task_done(struct sas_task *task)
 	ata_qc_complete(qc);
 	spin_unlock_irqrestore(dev->sata_dev.ap->lock, flags);
 
-	/*
-	 * If the sas_task has an ata qc, a scsi_cmnd and the aborted
-	 * flag is set, then we must have come in via the libsas EH
-	 * functions.  When we exit this function, we need to put the
-	 * scsi_cmnd on the list of finished errors.  The ata_qc_complete
-	 * call cleans up the libata side of things but we're protected
-	 * from the scsi_cmnd going away because the scsi_cmnd is owned
-	 * by the EH, making libata's call to scsi_done a NOP.
-	 */
-	spin_lock_irqsave(&task->task_state_lock, flags);
-	if (qc->scsicmd && task->task_state_flags & SAS_TASK_STATE_ABORTED)
-		scsi_eh_finish_cmd(qc->scsicmd, &sas_ha->eh_done_q);
-	spin_unlock_irqrestore(&task->task_state_lock, flags);
-
 qc_already_gone:
 	list_del_init(&task->list);
 	sas_free_task(task);


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 12/28] libsas: close error handling vs sas_ata_task_done() race
  2011-12-23  2:58 [PATCH v2 00/28] libsas: eh reworks (ata-eh vs discovery, races, ...) Dan Williams
                   ` (10 preceding siblings ...)
  2011-12-23  2:59 ` [PATCH v2 11/28] libsas: kill invocation of scsi_eh_finish_cmd from sas_ata_task_done Dan Williams
@ 2011-12-23  2:59 ` Dan Williams
  2011-12-23  2:59 ` [PATCH v2 13/28] libsas: prevent double completion of scmds from eh Dan Williams
                   ` (15 subsequent siblings)
  27 siblings, 0 replies; 37+ messages in thread
From: Dan Williams @ 2011-12-23  2:59 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide, Darrick J. Wong

Since sas_ata does not implement ->freeze(), completions for scmds and
internal commands can still arrive concurrently with
ata_scsi_cmd_error_handler() and sas_ata_post_internal() respectively.
By the time either of those is called libata has committed to completing
the qc, and the ATA_PFLAG_FROZEN flag tells sas_ata_task_done() it has
lost the race.

In the sas_ata_post_internal() case we take on the additional
responsibility of freeing the sas_task to close the race with
sas_ata_task_done() freeing the task while sas_ata_post_internal()
is in the process of invoking ->lldd_abort_task().
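
For readers following along, the decision sas_ata_task_done() makes once
it sees a frozen port can be modelled in a few lines of userspace C.
This is only a sketch: mock_port, done_action and mock_task_done() are
hypothetical stand-ins, not libsas or libata symbols, and the real check
runs under ap->lock, which the model leaves out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes of the completion path; not kernel symbols. */
enum done_action {
	DONE_COMPLETE_QC,     /* normal path: translate status, complete the qc */
	DONE_FREE_TASK_ONLY,  /* frozen, scmd present: eh owns the qc, free the task */
	DONE_LEAVE_TO_ABORT,  /* frozen, internal command: abort path owns the task */
};

/* Stand-in for the port; 'frozen' plays the role of ATA_PFLAG_FROZEN. */
struct mock_port {
	bool frozen;
};

/* Mirrors the three-way branch added to sas_ata_task_done(). */
static enum done_action mock_task_done(const struct mock_port *ap,
				       bool has_scsicmd)
{
	if (!ap->frozen)
		return DONE_COMPLETE_QC;

	/* lost the race with libata / sas_ata_post_internal() */
	return has_scsicmd ? DONE_FREE_TASK_ONLY : DONE_LEAVE_TO_ABORT;
}
```

In the frozen, no-scmd case the completion path must not touch the
sas_task at all, which is why sas_ata_internal_abort() frees it.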

Cc: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/libsas/sas_ata.c       |   84 +++++++++++++++++++++++++++++++----
 drivers/scsi/libsas/sas_scsi_host.c |   44 ------------------
 include/scsi/libsas.h               |    1 
 3 files changed, 75 insertions(+), 54 deletions(-)

diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
index 649b04b..11d049d 100644
--- a/drivers/scsi/libsas/sas_ata.c
+++ b/drivers/scsi/libsas/sas_ata.c
@@ -100,15 +100,31 @@ static void sas_ata_task_done(struct sas_task *task)
 	enum ata_completion_errors ac;
 	unsigned long flags;
 	struct ata_link *link;
+	struct ata_port *ap;
 
 	if (!qc)
 		goto qc_already_gone;
 
-	dev = qc->ap->private_data;
+	ap = qc->ap;
+	dev = ap->private_data;
 	sas_ha = dev->port->ha;
-	link = &dev->sata_dev.ap->link;
+	link = &ap->link;
+
+	spin_lock_irqsave(ap->lock, flags);
+	/* check if we lost the race with libata/sas_ata_post_internal() */
+	if (unlikely(ap->pflags & ATA_PFLAG_FROZEN)) {
+		spin_unlock_irqrestore(ap->lock, flags);
+		if (qc->scsicmd)
+			goto qc_already_gone;
+		else {
+			/* if eh is not involved and the port is frozen then the
+			 * ata internal abort process has taken responsibility
+			 * for this sas_task
+			 */
+			return;
+		}
+	}
 
-	spin_lock_irqsave(dev->sata_dev.ap->lock, flags);
 	if (stat->stat == SAS_PROTO_RESPONSE || stat->stat == SAM_STAT_GOOD ||
 	    ((stat->stat == SAM_STAT_CHECK_CONDITION &&
 	      dev->sata_dev.command_set == ATAPI_COMMAND_SET))) {
@@ -143,7 +159,7 @@ static void sas_ata_task_done(struct sas_task *task)
 	if (qc->scsicmd)
 		ASSIGN_SAS_TASK(qc->scsicmd, NULL);
 	ata_qc_complete(qc);
-	spin_unlock_irqrestore(dev->sata_dev.ap->lock, flags);
+	spin_unlock_irqrestore(ap->lock, flags);
 
 qc_already_gone:
 	list_del_init(&task->list);
@@ -320,6 +336,54 @@ static int sas_ata_soft_reset(struct ata_link *link, unsigned int *class,
 	return ret;
 }
 
+/*
+ * notify the lldd to forget the sas_task for this internal ata command
+ * that bypasses scsi-eh
+ */
+static void sas_ata_internal_abort(struct sas_task *task)
+{
+	struct sas_internal *si =
+		to_sas_internal(task->dev->port->ha->core.shost->transportt);
+	unsigned long flags;
+	int res;
+
+	spin_lock_irqsave(&task->task_state_lock, flags);
+	if (task->task_state_flags & SAS_TASK_STATE_ABORTED ||
+	    task->task_state_flags & SAS_TASK_STATE_DONE) {
+		spin_unlock_irqrestore(&task->task_state_lock, flags);
+		SAS_DPRINTK("%s: Task %p already finished.\n", __func__,
+			    task);
+		goto out;
+	}
+	task->task_state_flags |= SAS_TASK_STATE_ABORTED;
+	spin_unlock_irqrestore(&task->task_state_lock, flags);
+
+	res = si->dft->lldd_abort_task(task);
+
+	spin_lock_irqsave(&task->task_state_lock, flags);
+	if (task->task_state_flags & SAS_TASK_STATE_DONE ||
+	    res == TMF_RESP_FUNC_COMPLETE) {
+		spin_unlock_irqrestore(&task->task_state_lock, flags);
+		goto out;
+	}
+
+	/* XXX we are not prepared to deal with ->lldd_abort_task()
+	 * failures.  TODO: lldds need to unconditionally forget about
+	 * aborted ata tasks, otherwise we (likely) leak the sas task
+	 * here
+	 */
+	SAS_DPRINTK("%s: Task %p leaked.\n", __func__, task);
+
+	if (!(task->task_state_flags & SAS_TASK_STATE_DONE))
+		task->task_state_flags &= ~SAS_TASK_STATE_ABORTED;
+	spin_unlock_irqrestore(&task->task_state_lock, flags);
+
+	return;
+ out:
+	list_del_init(&task->list);
+	sas_free_task(task);
+}
+
 static void sas_ata_post_internal(struct ata_queued_cmd *qc)
 {
 	if (qc->flags & ATA_QCFLAG_FAILED)
@@ -327,10 +391,12 @@ static void sas_ata_post_internal(struct ata_queued_cmd *qc)
 
 	if (qc->err_mask) {
 		/*
-		 * Find the sas_task and kill it.  By this point,
-		 * libata has decided to kill the qc, so we needn't
-		 * bother with sas_ata_task_done.  But we still
-		 * ought to abort the task.
+		 * Find the sas_task and kill it.  By this point, libata
+		 * has decided to kill the qc and has frozen the port.
+		 * In this state sas_ata_task_done() will no longer free
+		 * the sas_task, so we need to notify the lldd (via
+		 * ->lldd_abort_task) that the task is dead and free it
+		 *  ourselves.
 		 */
 		struct sas_task *task = qc->lldd_task;
 		unsigned long flags;
@@ -343,7 +409,7 @@ static void sas_ata_post_internal(struct ata_queued_cmd *qc)
 			spin_unlock_irqrestore(&task->task_state_lock, flags);
 
 			task->uldd_task = NULL;
-			__sas_task_abort(task);
+			sas_ata_internal_abort(task);
 		}
 	}
 }
diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index fd60465..5e9fa99 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -957,49 +957,6 @@ void sas_shutdown_queue(struct sas_ha_struct *sas_ha)
 }
 
 /*
- * Call the LLDD task abort routine directly.  This function is intended for
- * use by upper layers that need to tell the LLDD to abort a task.
- */
-int __sas_task_abort(struct sas_task *task)
-{
-	struct sas_internal *si =
-		to_sas_internal(task->dev->port->ha->core.shost->transportt);
-	unsigned long flags;
-	int res;
-
-	spin_lock_irqsave(&task->task_state_lock, flags);
-	if (task->task_state_flags & SAS_TASK_STATE_ABORTED ||
-	    task->task_state_flags & SAS_TASK_STATE_DONE) {
-		spin_unlock_irqrestore(&task->task_state_lock, flags);
-		SAS_DPRINTK("%s: Task %p already finished.\n", __func__,
-			    task);
-		return 0;
-	}
-	task->task_state_flags |= SAS_TASK_STATE_ABORTED;
-	spin_unlock_irqrestore(&task->task_state_lock, flags);
-
-	if (!si->dft->lldd_abort_task)
-		return -ENODEV;
-
-	res = si->dft->lldd_abort_task(task);
-
-	spin_lock_irqsave(&task->task_state_lock, flags);
-	if ((task->task_state_flags & SAS_TASK_STATE_DONE) ||
-	    (res == TMF_RESP_FUNC_COMPLETE))
-	{
-		spin_unlock_irqrestore(&task->task_state_lock, flags);
-		task->task_done(task);
-		return 0;
-	}
-
-	if (!(task->task_state_flags & SAS_TASK_STATE_DONE))
-		task->task_state_flags &= ~SAS_TASK_STATE_ABORTED;
-	spin_unlock_irqrestore(&task->task_state_lock, flags);
-
-	return -EAGAIN;
-}
-
-/*
  * Tell an upper layer that it needs to initiate an abort for a given task.
  * This should only ever be called by an LLDD.
  */
@@ -1097,7 +1054,6 @@ EXPORT_SYMBOL_GPL(sas_slave_configure);
 EXPORT_SYMBOL_GPL(sas_change_queue_depth);
 EXPORT_SYMBOL_GPL(sas_change_queue_type);
 EXPORT_SYMBOL_GPL(sas_bios_param);
-EXPORT_SYMBOL_GPL(__sas_task_abort);
 EXPORT_SYMBOL_GPL(sas_task_abort);
 EXPORT_SYMBOL_GPL(sas_phy_reset);
 EXPORT_SYMBOL_GPL(sas_phy_enable);
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index 94845c3..d100503 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -657,7 +657,6 @@ void sas_unregister_dev(struct asd_sas_port *port, struct domain_device *);
 void sas_init_dev(struct domain_device *);
 
 void sas_task_abort(struct sas_task *);
-int __sas_task_abort(struct sas_task *);
 int sas_eh_device_reset_handler(struct scsi_cmnd *cmd);
 int sas_eh_bus_reset_handler(struct scsi_cmnd *cmd);
 


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 13/28] libsas: prevent double completion of scmds from eh
  2011-12-23  2:58 [PATCH v2 00/28] libsas: eh reworks (ata-eh vs discovery, races, ...) Dan Williams
                   ` (11 preceding siblings ...)
  2011-12-23  2:59 ` [PATCH v2 12/28] libsas: close error handling vs sas_ata_task_done() race Dan Williams
@ 2011-12-23  2:59 ` Dan Williams
  2011-12-23  2:59 ` [PATCH v2 14/28] libsas: fix timeout vs completion race Dan Williams
                   ` (14 subsequent siblings)
  27 siblings, 0 replies; 37+ messages in thread
From: Dan Williams @ 2011-12-23  2:59 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide

We invoke task->task_done() to free the task in the eh case, but at this
point we are prepared for scsi_eh_flush_done_q() to finish off the scmd.

Introduce sas_end_task() to capture the final response status from the
lldd and free the task.

Also take the opportunity to kill this warning.
drivers/scsi/libsas/sas_scsi_host.c: In function ‘sas_end_task’:
drivers/scsi/libsas/sas_scsi_host.c:102:3: warning: case value ‘2’ not in enumerated type ‘enum exec_status’ [-Wswitch]
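
The warning comes from switching on an enum-typed value with a case
label whose value is not one of the enumerators.  A standalone sketch of
the aliasing trick follows; every MOCK_* name is made up for the
example, and the only thing borrowed from the kernel is that the CHECK
CONDITION status code is 2.

```c
#include <assert.h>

/* Same numeric value as SAM_STAT_CHECK_CONDITION; the name is hypothetical. */
#define MOCK_STAT_CHECK_CONDITION 0x02

enum mock_exec_status {
	/* alias so that value 2 is a member of the enumerated type */
	MOCK_ALIAS_CHECK_CONDITION = MOCK_STAT_CHECK_CONDITION,

	MOCK_DEV_NO_RESPONSE = 0x80,
	MOCK_DATA_UNDERRUN,
};

/* Returns 1 for the status that carries sense data, 0 otherwise. */
static int mock_is_sense_status(enum mock_exec_status s)
{
	switch (s) {
	case MOCK_STAT_CHECK_CONDITION:	/* value 2, covered by the alias */
		return 1;
	case MOCK_DEV_NO_RESPONSE:
	case MOCK_DATA_UNDERRUN:
		return 0;
	}
	return 0;
}
```

Without the alias enumerator, gcc's -Wswitch would be expected to flag
the first case label in the same way as the message quoted above.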

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/libsas/sas_scsi_host.c |   61 +++++++++++++++++++----------------
 include/scsi/libsas.h               |    5 ++-
 2 files changed, 37 insertions(+), 29 deletions(-)

diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index 5e9fa99..6ee9826 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -49,27 +49,12 @@
 #include <linux/scatterlist.h>
 #include <linux/libata.h>
 
-/* ---------- SCSI Host glue ---------- */
-
-static void sas_scsi_task_done(struct sas_task *task)
+/* record final status and free the task */
+static void sas_end_task(struct scsi_cmnd *sc, struct sas_task *task)
 {
 	struct task_status_struct *ts = &task->task_status;
-	struct scsi_cmnd *sc = task->uldd_task;
 	int hs = 0, stat = 0;
 
-	if (unlikely(task->task_state_flags & SAS_TASK_STATE_ABORTED)) {
-		/* Aborted tasks will be completed by the error handler */
-		SAS_DPRINTK("task done but aborted\n");
-		return;
-	}
-
-	if (unlikely(!sc)) {
-		SAS_DPRINTK("task_done called with non existing SCSI cmnd!\n");
-		list_del_init(&task->list);
-		sas_free_task(task);
-		return;
-	}
-
 	if (ts->resp == SAS_TASK_UNDELIVERED) {
 		/* transport error */
 		hs = DID_NO_CONNECT;
@@ -124,10 +109,32 @@ static void sas_scsi_task_done(struct sas_task *task)
 			break;
 		}
 	}
-	ASSIGN_SAS_TASK(sc, NULL);
+
 	sc->result = (hs << 16) | stat;
+	ASSIGN_SAS_TASK(sc, NULL);
 	list_del_init(&task->list);
 	sas_free_task(task);
+}
+
+static void sas_scsi_task_done(struct sas_task *task)
+{
+	struct scsi_cmnd *sc = task->uldd_task;
+
+	if (unlikely(task->task_state_flags & SAS_TASK_STATE_ABORTED)) {
+		/* Aborted tasks will be completed by the error handler */
+		SAS_DPRINTK("task done but aborted\n");
+		return;
+	}
+
+	if (unlikely(!sc)) {
+		SAS_DPRINTK("task_done called with non existing SCSI cmnd!\n");
+		list_del_init(&task->list);
+		sas_free_task(task);
+		return;
+	}
+
+	ASSIGN_SAS_TASK(sc, NULL);
+	sas_end_task(sc, task);
 	sc->scsi_done(sc);
 }
 
@@ -236,18 +243,16 @@ static void sas_eh_finish_cmd(struct scsi_cmnd *cmd)
 	struct sas_task *task = TO_SAS_TASK(cmd);
 	struct sas_ha_struct *sas_ha = SHOST_TO_SAS_HA(cmd->device->host);
 
-	/* remove the aborted task flag to allow the task to be
-	 * completed now. At this point, we only get called following
-	 * an actual abort of the task, so we should be guaranteed not
-	 * to be racing with any completions from the LLD (hence we
-	 * don't need the task state lock to clear the flag) */
-	task->task_state_flags &= ~SAS_TASK_STATE_ABORTED;
-	/* Now call task_done.  However, task will be free'd after
-	 * this */
-	task->task_done(task);
+	/* At this point, we only get called following an actual abort
+	 * of the task, so we should be guaranteed not to be racing with
+	 * any completions from the LLD.  Task is freed after this.
+	 */
+	sas_end_task(cmd, task);
+
 	/* now finish the command and move it on to the error
 	 * handler done list, this also takes it off the
-	 * error handler pending list */
+	 * error handler pending list.
+	 */
 	scsi_eh_finish_cmd(cmd, &sas_ha->eh_done_q);
 }
 
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index d100503..6e9ad20 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -447,7 +447,10 @@ enum service_response {
 };
 
 enum exec_status {
-	/* The SAM_STAT_.. codes fit in the lower 6 bits */
+	/* The SAM_STAT_.. codes fit in the lower 6 bits, alias some of
+	 * them here to silence 'case value not in enumerated type' warnings
+	 */
+	__SAM_STAT_CHECK_CONDITION = SAM_STAT_CHECK_CONDITION,
 
 	SAS_DEV_NO_RESPONSE = 0x80,
 	SAS_DATA_UNDERRUN,


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 14/28] libsas: fix timeout vs completion race
  2011-12-23  2:58 [PATCH v2 00/28] libsas: eh reworks (ata-eh vs discovery, races, ...) Dan Williams
                   ` (12 preceding siblings ...)
  2011-12-23  2:59 ` [PATCH v2 13/28] libsas: prevent double completion of scmds from eh Dan Williams
@ 2011-12-23  2:59 ` Dan Williams
  2011-12-23  2:59 ` [PATCH v2 15/28] libsas: let libata handle command timeouts Dan Williams
                   ` (13 subsequent siblings)
  27 siblings, 0 replies; 37+ messages in thread
From: Dan Williams @ 2011-12-23  2:59 UTC (permalink / raw)
  To: linux-scsi; +Cc: Tejun Heo, linux-ide, Christoph Hellwig, Darrick J. Wong

Until we have told the lldd to forget a task, a timed-out operation can
return from the hardware at any time.  Since completion frees the task
we need to make sure that no tasks run their normal completion handler
once eh has decided to manage the task.  Similar to
ata_scsi_cmd_error_handler(), freeze completions to let eh judge the
outcome of the race.

Task collector mode is problematic because it presents a situation where
a task can be timed out and aborted before the lldd has even seen it.
For this case we need to guarantee that a task that an lldd has been
told to forget does not get queued after the lldd says "never seen it".
We achieve this with the ->task_queue_flush mutex, rather than having
sas_scsi_timed_out() add more time.
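
The completion-vs-eh arbitration described in the first paragraph can be
pictured with a small single-threaded model.  It is a sketch under loud
assumptions: the mock_* types and helpers are invented for illustration,
and the serialization that dev->done_lock provides in the patch is only
noted in comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mock_task { int id; };

/* Stand-in for an scmd; 'task' plays the role of TO_SAS_TASK(cmd). */
struct mock_cmd { struct mock_task *task; };

/* Stand-in for the ha; 'frozen' plays the role of SAS_HA_FROZEN. */
struct mock_ha { bool frozen; };

/*
 * Completion side.  Returns the task if the completion path owns it,
 * or NULL if eh froze the ha first.  The patch does this under
 * dev->done_lock.
 */
static struct mock_task *mock_claim_for_completion(struct mock_ha *ha,
						   struct mock_cmd *cmd)
{
	struct mock_task *task = cmd->task;

	if (ha->frozen)
		return NULL;	/* leave the task for eh */
	cmd->task = NULL;	/* like ASSIGN_SAS_TASK(sc, NULL) */
	return task;
}

/* eh side, step 1: stop new completions from claiming tasks. */
static void mock_eh_freeze(struct mock_ha *ha)
{
	ha->frozen = true;
}

/* eh side, step 2: a NULL result means the completion path won. */
static struct mock_task *mock_eh_lookup(struct mock_cmd *cmd)
{
	return cmd->task;
}
```

Exactly one side ends up with a non-NULL task pointer for a given
command, which is the property that keeps the task from being freed
twice.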

Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/libsas/sas_ata.c       |   35 ++++--------
 drivers/scsi/libsas/sas_internal.h  |    1 
 drivers/scsi/libsas/sas_scsi_host.c |  104 +++++++++++++++++------------------
 include/scsi/libsas.h               |    3 +
 include/scsi/sas_ata.h              |    8 ---
 5 files changed, 68 insertions(+), 83 deletions(-)

diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
index 11d049d..2bedc5af 100644
--- a/drivers/scsi/libsas/sas_ata.c
+++ b/drivers/scsi/libsas/sas_ata.c
@@ -93,21 +93,30 @@ static enum ata_completion_errors sas_to_ata_err(struct task_status_struct *ts)
 static void sas_ata_task_done(struct sas_task *task)
 {
 	struct ata_queued_cmd *qc = task->uldd_task;
-	struct domain_device *dev;
+	struct domain_device *dev = task->dev;
 	struct task_status_struct *stat = &task->task_status;
 	struct ata_task_resp *resp = (struct ata_task_resp *)stat->buf;
-	struct sas_ha_struct *sas_ha;
+	struct sas_ha_struct *sas_ha = dev->port->ha;
 	enum ata_completion_errors ac;
 	unsigned long flags;
 	struct ata_link *link;
 	struct ata_port *ap;
 
+	spin_lock_irqsave(&dev->done_lock, flags);
+	if (test_bit(SAS_HA_FROZEN, &sas_ha->state))
+		task = NULL;
+	else if (qc && qc->scsicmd)
+		ASSIGN_SAS_TASK(qc->scsicmd, NULL);
+	spin_unlock_irqrestore(&dev->done_lock, flags);
+
+	/* check if libsas-eh got to the task before us */
+	if (unlikely(!task))
+		return;
+
 	if (!qc)
 		goto qc_already_gone;
 
 	ap = qc->ap;
-	dev = ap->private_data;
-	sas_ha = dev->port->ha;
 	link = &ap->link;
 
 	spin_lock_irqsave(ap->lock, flags);
@@ -156,8 +165,6 @@ static void sas_ata_task_done(struct sas_task *task)
 	}
 
 	qc->lldd_task = NULL;
-	if (qc->scsicmd)
-		ASSIGN_SAS_TASK(qc->scsicmd, NULL);
 	ata_qc_complete(qc);
 	spin_unlock_irqrestore(ap->lock, flags);
 
@@ -618,22 +625,6 @@ void sas_ata_strategy_handler(struct Scsi_Host *shost)
 	mutex_unlock(&sas_ha->eh_mutex);
 }
 
-int sas_ata_timed_out(struct scsi_cmnd *cmd, struct sas_task *task,
-		      enum blk_eh_timer_return *rtn)
-{
-	struct domain_device *ddev = cmd_to_domain_dev(cmd);
-
-	if (!dev_is_sata(ddev) || task)
-		return 0;
-
-	/* we're a sata device with no task, so this must be a libata
-	 * eh timeout.  Ideally should hook into libata timeout
-	 * handling, but there's no point, it just wants to activate
-	 * the eh thread */
-	*rtn = BLK_EH_NOT_HANDLED;
-	return 1;
-}
-
 int sas_ata_eh(struct Scsi_Host *shost, struct list_head *work_q,
 	       struct list_head *done_q)
 {
diff --git a/drivers/scsi/libsas/sas_internal.h b/drivers/scsi/libsas/sas_internal.h
index e21c245..f60658e 100644
--- a/drivers/scsi/libsas/sas_internal.h
+++ b/drivers/scsi/libsas/sas_internal.h
@@ -140,6 +140,7 @@ static inline struct domain_device *sas_alloc_device(void)
 		INIT_LIST_HEAD(&dev->dev_list_node);
 		INIT_LIST_HEAD(&dev->disco_list_node);
 		kref_init(&dev->kref);
+		spin_lock_init(&dev->done_lock);
 	}
 	return dev;
 }
diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index 6ee9826..f15e33a 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -119,9 +119,19 @@ static void sas_end_task(struct scsi_cmnd *sc, struct sas_task *task)
 static void sas_scsi_task_done(struct sas_task *task)
 {
 	struct scsi_cmnd *sc = task->uldd_task;
+	struct domain_device *dev = task->dev;
+	struct sas_ha_struct *ha = dev->port->ha;
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->done_lock, flags);
+	if (test_bit(SAS_HA_FROZEN, &ha->state))
+		task = NULL;
+	else
+		ASSIGN_SAS_TASK(sc, NULL);
+	spin_unlock_irqrestore(&dev->done_lock, flags);
 
-	if (unlikely(task->task_state_flags & SAS_TASK_STATE_ABORTED)) {
-		/* Aborted tasks will be completed by the error handler */
+	if (unlikely(!task)) {
+		/* task will be completed by the error handler */
 		SAS_DPRINTK("task done but aborted\n");
 		return;
 	}
@@ -133,7 +143,6 @@ static void sas_scsi_task_done(struct sas_task *task)
 		return;
 	}
 
-	ASSIGN_SAS_TASK(sc, NULL);
 	sas_end_task(sc, task);
 	sc->scsi_done(sc);
 }
@@ -298,6 +307,7 @@ enum task_disposition {
 	TASK_IS_DONE,
 	TASK_IS_ABORTED,
 	TASK_IS_AT_LU,
+	TASK_IS_NOT_AT_HA,
 	TASK_IS_NOT_AT_LU,
 	TASK_ABORT_FAILED,
 };
@@ -314,19 +324,18 @@ static enum task_disposition sas_scsi_find_task(struct sas_task *task)
 		struct scsi_core *core = &ha->core;
 		struct sas_task *t, *n;
 
+		mutex_lock(&core->task_queue_flush);
 		spin_lock_irqsave(&core->task_queue_lock, flags);
-		list_for_each_entry_safe(t, n, &core->task_queue, list) {
+		list_for_each_entry_safe(t, n, &core->task_queue, list)
 			if (task == t) {
 				list_del_init(&t->list);
-				spin_unlock_irqrestore(&core->task_queue_lock,
-						       flags);
-				SAS_DPRINTK("%s: task 0x%p aborted from "
-					    "task_queue\n",
-					    __func__, task);
-				return TASK_IS_ABORTED;
+				break;
 			}
-		}
 		spin_unlock_irqrestore(&core->task_queue_lock, flags);
+		mutex_unlock(&core->task_queue_flush);
+
+		if (task == t)
+			return TASK_IS_NOT_AT_HA;
 	}
 
 	for (i = 0; i < 5; i++) {
@@ -499,8 +508,7 @@ try_bus_reset:
 }
 
 static int sas_eh_handle_sas_errors(struct Scsi_Host *shost,
-				    struct list_head *work_q,
-				    struct list_head *done_q)
+				    struct list_head *work_q)
 {
 	struct scsi_cmnd *cmd, *n;
 	enum task_disposition res = TASK_IS_DONE;
@@ -511,7 +519,16 @@ static int sas_eh_handle_sas_errors(struct Scsi_Host *shost,
 
 Again:
 	list_for_each_entry_safe(cmd, n, work_q, eh_entry) {
-		struct sas_task *task = TO_SAS_TASK(cmd);
+		struct domain_device *dev = cmd_to_domain_dev(cmd);
+		struct sas_task *task;
+
+		spin_lock_irqsave(&dev->done_lock, flags);
+		/* by this point the lldd has either observed
+		 * SAS_HA_FROZEN and is leaving the task alone, or has
+		 * won the race with eh and decided to complete it
+		 */
+		task = TO_SAS_TASK(cmd);
+		spin_unlock_irqrestore(&dev->done_lock, flags);
 
 		if (!task)
 			continue;
@@ -534,6 +551,14 @@ Again:
 		cmd->eh_eflags = 0;
 
 		switch (res) {
+		case TASK_IS_NOT_AT_HA:
+			SAS_DPRINTK("%s: task 0x%p is not at ha: %s\n",
+				    __func__, task,
+				    cmd->retries ? "retry" : "aborted");
+			if (cmd->retries)
+				cmd->retries--;
+			sas_eh_finish_cmd(cmd);
+			continue;
 		case TASK_IS_DONE:
 			SAS_DPRINTK("%s: task 0x%p is done\n", __func__,
 				    task);
@@ -635,7 +660,8 @@ void sas_scsi_recover_host(struct Scsi_Host *shost)
 	 * Deal with commands that still have SAS tasks (i.e. they didn't
 	 * complete via the normal sas_task completion mechanism)
 	 */
-	if (sas_eh_handle_sas_errors(shost, &eh_work_q, &ha->eh_done_q))
+	set_bit(SAS_HA_FROZEN, &ha->state);
+	if (sas_eh_handle_sas_errors(shost, &eh_work_q))
 		goto out;
 
 	/*
@@ -649,6 +675,10 @@ void sas_scsi_recover_host(struct Scsi_Host *shost)
 			scsi_eh_ready_devs(shost, &eh_work_q, &ha->eh_done_q);
 
 out:
+	clear_bit(SAS_HA_FROZEN, &ha->state);
+	if (ha->lldd_max_execute_num > 1)
+		wake_up_process(ha->core.queue_thread);
+
 	/* now link into libata eh --- if we have any ata devices */
 	sas_ata_strategy_handler(shost);
 
@@ -660,43 +690,7 @@ out:
 
 enum blk_eh_timer_return sas_scsi_timed_out(struct scsi_cmnd *cmd)
 {
-	struct sas_task *task = TO_SAS_TASK(cmd);
-	unsigned long flags;
-	enum blk_eh_timer_return rtn;
-
-	if (sas_ata_timed_out(cmd, task, &rtn))
-		return rtn;
-
-	if (!task) {
-		cmd->request->timeout /= 2;
-		SAS_DPRINTK("command 0x%p, task 0x%p, gone: %s\n",
-			    cmd, task, (cmd->request->timeout ?
-			    "BLK_EH_RESET_TIMER" : "BLK_EH_NOT_HANDLED"));
-		if (!cmd->request->timeout)
-			return BLK_EH_NOT_HANDLED;
-		return BLK_EH_RESET_TIMER;
-	}
-
-	spin_lock_irqsave(&task->task_state_lock, flags);
-	BUG_ON(task->task_state_flags & SAS_TASK_STATE_ABORTED);
-	if (task->task_state_flags & SAS_TASK_STATE_DONE) {
-		spin_unlock_irqrestore(&task->task_state_lock, flags);
-		SAS_DPRINTK("command 0x%p, task 0x%p, timed out: "
-			    "BLK_EH_HANDLED\n", cmd, task);
-		return BLK_EH_HANDLED;
-	}
-	if (!(task->task_state_flags & SAS_TASK_AT_INITIATOR)) {
-		spin_unlock_irqrestore(&task->task_state_lock, flags);
-		SAS_DPRINTK("command 0x%p, task 0x%p, not at initiator: "
-			    "BLK_EH_RESET_TIMER\n",
-			    cmd, task);
-		return BLK_EH_RESET_TIMER;
-	}
-	task->task_state_flags |= SAS_TASK_STATE_ABORTED;
-	spin_unlock_irqrestore(&task->task_state_lock, flags);
-
-	SAS_DPRINTK("command 0x%p, task 0x%p, timed out: BLK_EH_NOT_HANDLED\n",
-		    cmd, task);
+	scmd_printk(KERN_DEBUG, cmd, "command %p timed out\n", cmd);
 
 	return BLK_EH_NOT_HANDLED;
 }
@@ -861,9 +855,11 @@ static void sas_queue(struct sas_ha_struct *sas_ha)
 	int res;
 	struct sas_internal *i = to_sas_internal(core->shost->transportt);
 
+	mutex_lock(&core->task_queue_flush);
 	spin_lock_irqsave(&core->task_queue_lock, flags);
 	while (!kthread_should_stop() &&
-	       !list_empty(&core->task_queue)) {
+	       !list_empty(&core->task_queue) &&
+	       !test_bit(SAS_HA_FROZEN, &sas_ha->state)) {
 
 		can_queue = sas_ha->lldd_queue_size - core->task_queue_size;
 		if (can_queue >= 0) {
@@ -899,6 +895,7 @@ static void sas_queue(struct sas_ha_struct *sas_ha)
 		}
 	}
 	spin_unlock_irqrestore(&core->task_queue_lock, flags);
+	mutex_unlock(&core->task_queue_flush);
 }
 
 /**
@@ -925,6 +922,7 @@ int sas_init_queue(struct sas_ha_struct *sas_ha)
 	struct scsi_core *core = &sas_ha->core;
 
 	spin_lock_init(&core->task_queue_lock);
+	mutex_init(&core->task_queue_flush);
 	core->task_queue_size = 0;
 	INIT_LIST_HEAD(&core->task_queue);
 
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index 6e9ad20..ede0a3e 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -174,6 +174,7 @@ struct sata_device {
 
 /* ---------- Domain device ---------- */
 struct domain_device {
+	spinlock_t done_lock;
         enum sas_dev_type dev_type;
 
         enum sas_linkrate linkrate;
@@ -317,6 +318,7 @@ struct asd_sas_phy {
 struct scsi_core {
 	struct Scsi_Host *shost;
 
+	struct mutex	  task_queue_flush;
 	spinlock_t        task_queue_lock;
 	struct list_head  task_queue;
 	int               task_queue_size;
@@ -332,6 +334,7 @@ struct sas_ha_event {
 enum sas_ha_state {
 	SAS_HA_REGISTERED,
 	SAS_HA_DRAINING,
+	SAS_HA_FROZEN,
 };
 
 struct sas_ha_struct {
diff --git a/include/scsi/sas_ata.h b/include/scsi/sas_ata.h
index 557fc9a..9f7a23d 100644
--- a/include/scsi/sas_ata.h
+++ b/include/scsi/sas_ata.h
@@ -41,8 +41,6 @@ int sas_ata_init_host_and_port(struct domain_device *found_dev,
 
 void sas_ata_task_abort(struct sas_task *task);
 void sas_ata_strategy_handler(struct Scsi_Host *shost);
-int sas_ata_timed_out(struct scsi_cmnd *cmd, struct sas_task *task,
-		      enum blk_eh_timer_return *rtn);
 int sas_ata_eh(struct Scsi_Host *shost, struct list_head *work_q,
 	       struct list_head *done_q);
 void sas_probe_sata(struct work_struct *work);
@@ -67,12 +65,6 @@ static inline void sas_ata_strategy_handler(struct Scsi_Host *shost)
 {
 }
 
-static inline int sas_ata_timed_out(struct scsi_cmnd *cmd,
-				    struct sas_task *task,
-				    enum blk_eh_timer_return *rtn)
-{
-	return 0;
-}
 static inline int sas_ata_eh(struct Scsi_Host *shost, struct list_head *work_q,
 			     struct list_head *done_q)
 {


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 15/28] libsas: let libata handle command timeouts
  2011-12-23  2:58 [PATCH v2 00/28] libsas: eh reworks (ata-eh vs discovery, races, ...) Dan Williams
                   ` (13 preceding siblings ...)
  2011-12-23  2:59 ` [PATCH v2 14/28] libsas: fix timeout vs completion race Dan Williams
@ 2011-12-23  2:59 ` Dan Williams
  2011-12-23  2:59 ` [PATCH v2 16/28] libsas: defer SAS_TASK_NEED_DEV_RESET commands to libata Dan Williams
                   ` (12 subsequent siblings)
  27 siblings, 0 replies; 37+ messages in thread
From: Dan Williams @ 2011-12-23  2:59 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide, Andrzej Jakowski

If libsas-eh successfully aborts an ata command it hides the timeout
condition (AC_ERR_TIMEOUT) from libata.  The command likely completes
with the all-zero task->task_status it started with.  Instead, interpret
a TMF_RESP_FUNC_COMPLETE as the end of the sas_task but keep the scmd
around for libata-eh to handle.
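
The queueing side of this can be sketched in userspace with a tiny
circular list.  Everything named mock_* below is hypothetical; it only
imitates the shape of sas_eh_defer_cmd() plus the splice back onto the
work queue, and does not model the sas_task itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal circular doubly linked list, loosely modelled on list_head. */
struct mock_node { struct mock_node *prev, *next; };

static void mock_list_init(struct mock_node *h) { h->prev = h->next = h; }

static void mock_list_add_tail(struct mock_node *n, struct mock_node *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void mock_list_move_tail(struct mock_node *n, struct mock_node *h)
{
	n->prev->next = n->next;	/* unlink from the current list */
	n->next->prev = n->prev;
	mock_list_add_tail(n, h);
}

/* Append everything on src to the tail of dst and leave src empty. */
static void mock_list_splice_tail_init(struct mock_node *src,
				       struct mock_node *dst)
{
	if (src->next == src)
		return;
	src->next->prev = dst->prev;
	dst->prev->next = src->next;
	src->prev->next = dst;
	dst->prev = src->prev;
	mock_list_init(src);
}

static int mock_list_len(const struct mock_node *h)
{
	int n = 0;

	for (const struct mock_node *p = h->next; p != h; p = p->next)
		n++;
	return n;
}

struct mock_cmd {
	struct mock_node entry;	/* plays the role of cmd->eh_entry */
	bool is_sata;		/* plays the role of dev_is_sata(dev) */
};

/* sas commands are finished; sata commands are parked for libata-eh. */
static void mock_eh_defer_cmd(struct mock_cmd *cmd, struct mock_node *done_q,
			      struct mock_node *ata_q)
{
	mock_list_move_tail(&cmd->entry, cmd->is_sata ? ata_q : done_q);
}
```

After the sas pass, splicing ata_q back onto work_q leaves only the sata
commands there, so the "list_empty(work_q)" return value tells the
caller whether anything remains for later stages.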

Tested-by: Andrzej Jakowski <andrzej.jakowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/libsas/sas_init.c      |    1 +
 drivers/scsi/libsas/sas_scsi_host.c |   22 ++++++++++++++++++++--
 include/scsi/libsas.h               |    3 ++-
 3 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/libsas/sas_init.c b/drivers/scsi/libsas/sas_init.c
index 0cca72a..ffa9869 100644
--- a/drivers/scsi/libsas/sas_init.c
+++ b/drivers/scsi/libsas/sas_init.c
@@ -146,6 +146,7 @@ int sas_register_ha(struct sas_ha_struct *sas_ha)
 	}
 
 	INIT_LIST_HEAD(&sas_ha->eh_done_q);
+	INIT_LIST_HEAD(&sas_ha->eh_ata_q);
 
 	return 0;
 
diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index f15e33a..1804069 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -265,6 +265,22 @@ static void sas_eh_finish_cmd(struct scsi_cmnd *cmd)
 	scsi_eh_finish_cmd(cmd, &sas_ha->eh_done_q);
 }
 
+static void sas_eh_defer_cmd(struct scsi_cmnd *cmd)
+{
+	struct sas_task *task = TO_SAS_TASK(cmd);
+	struct domain_device *dev = task->dev;
+	struct sas_ha_struct *ha = dev->port->ha;
+
+	if (!dev_is_sata(dev)) {
+		sas_eh_finish_cmd(cmd);
+		return;
+	}
+
+	/* report the timeout to libata */
+	sas_end_task(cmd, task);
+	list_move_tail(&cmd->eh_entry, &ha->eh_ata_q);
+}
+
 static void sas_scsi_clear_queue_lu(struct list_head *error_q, struct scsi_cmnd *my_cmd)
 {
 	struct scsi_cmnd *cmd, *n;
@@ -562,12 +578,12 @@ Again:
 		case TASK_IS_DONE:
 			SAS_DPRINTK("%s: task 0x%p is done\n", __func__,
 				    task);
-			sas_eh_finish_cmd(cmd);
+			sas_eh_defer_cmd(cmd);
 			continue;
 		case TASK_IS_ABORTED:
 			SAS_DPRINTK("%s: task 0x%p is aborted\n",
 				    __func__, task);
-			sas_eh_finish_cmd(cmd);
+			sas_eh_defer_cmd(cmd);
 			continue;
 		case TASK_IS_AT_LU:
 			SAS_DPRINTK("task 0x%p is at LU: lu recover\n", task);
@@ -635,12 +651,14 @@ Again:
 			goto clear_q;
 		}
 	}
+	list_splice_tail_init(&ha->eh_ata_q, work_q);
 	return list_empty(work_q);
 clear_q:
 	SAS_DPRINTK("--- Exit %s -- clear_q\n", __func__);
 	list_for_each_entry_safe(cmd, n, work_q, eh_entry)
 		sas_eh_finish_cmd(cmd);
 
+	list_splice_tail_init(&ha->eh_ata_q, work_q);
 	return list_empty(work_q);
 }
 
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index ede0a3e..3ac12eb 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -377,7 +377,8 @@ struct sas_ha_struct {
 
 	void *lldd_ha;		  /* not touched by sas class code */
 
-	struct list_head eh_done_q;
+	struct list_head eh_done_q;  /* complete via scsi_eh_flush_done_q */
+	struct list_head eh_ata_q; /* scmds to promote from sas to ata eh */
 };
 
 #define SHOST_TO_SAS_HA(_shost) (*(struct sas_ha_struct **)(_shost)->hostdata)


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 16/28] libsas: defer SAS_TASK_NEED_DEV_RESET commands to libata
  2011-12-23  2:58 [PATCH v2 00/28] libsas: eh reworks (ata-eh vs discovery, races, ...) Dan Williams
                   ` (14 preceding siblings ...)
  2011-12-23  2:59 ` [PATCH v2 15/28] libsas: let libata handle command timeouts Dan Williams
@ 2011-12-23  2:59 ` Dan Williams
  2011-12-23  2:59 ` [PATCH v2 17/28] libsas: use libata-eh-reset for sata rediscovery fis transmit failures Dan Williams
                   ` (11 subsequent siblings)
  27 siblings, 0 replies; 37+ messages in thread
From: Dan Williams @ 2011-12-23  2:59 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide, Darrick J. Wong

lldds use the SAS_TASK_NEED_DEV_RESET interface to request that eh
perform a reset.  In the sata device case, defer the commands that
triggered the reset to libata-eh context so it can perform its pre- and
post-reset management.

In the sas_ata_post_internal() case the reset request is falling on deaf
ears as the sas_task is immediately destroyed without any reset action.
Since it is currently a nop, and likely superfluous given the conversion
to new-style libata-eh, just drop the request.

Cc: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/libsas/sas_ata.c       |   14 ++++----------
 drivers/scsi/libsas/sas_scsi_host.c |    4 ++--
 2 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
index 2bedc5af..21faf5a 100644
--- a/drivers/scsi/libsas/sas_ata.c
+++ b/drivers/scsi/libsas/sas_ata.c
@@ -406,18 +406,12 @@ static void sas_ata_post_internal(struct ata_queued_cmd *qc)
 		 *  ourselves.
 		 */
 		struct sas_task *task = qc->lldd_task;
-		unsigned long flags;
 
 		qc->lldd_task = NULL;
-		if (task) {
-			/* Should this be a AT(API) device reset? */
-			spin_lock_irqsave(&task->task_state_lock, flags);
-			task->task_state_flags |= SAS_TASK_NEED_DEV_RESET;
-			spin_unlock_irqrestore(&task->task_state_lock, flags);
-
-			task->uldd_task = NULL;
-			sas_ata_internal_abort(task);
-		}
+		if (!task)
+			return;
+		task->uldd_task = NULL;
+		sas_ata_internal_abort(task);
 	}
 }
 
diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index 1804069..57a3484 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -288,7 +288,7 @@ static void sas_scsi_clear_queue_lu(struct list_head *error_q, struct scsi_cmnd
 	list_for_each_entry_safe(cmd, n, error_q, eh_entry) {
 		if (cmd->device->sdev_target == my_cmd->device->sdev_target &&
 		    cmd->device->lun == my_cmd->device->lun)
-			sas_eh_finish_cmd(cmd);
+			sas_eh_defer_cmd(cmd);
 	}
 }
 
@@ -594,7 +594,7 @@ Again:
 					    "recovered\n",
 					    SAS_ADDR(task->dev),
 					    cmd->device->lun);
-				sas_eh_finish_cmd(cmd);
+				sas_eh_defer_cmd(cmd);
 				sas_scsi_clear_queue_lu(work_q, cmd);
 				goto Again;
 			}



* [PATCH v2 17/28] libsas: use libata-eh-reset for sata rediscovery fis transmit failures
  2011-12-23  2:58 [PATCH v2 00/28] libsas: eh reworks (ata-eh vs discovery, races, ...) Dan Williams
                   ` (15 preceding siblings ...)
  2011-12-23  2:59 ` [PATCH v2 16/28] libsas: defer SAS_TASK_NEED_DEV_RESET commands to libata Dan Williams
@ 2011-12-23  2:59 ` Dan Williams
  2011-12-23  3:00 ` [PATCH v2 18/28] libsas: perform sas-transport resets in shost->workq context Dan Williams
                   ` (10 subsequent siblings)
  27 siblings, 0 replies; 37+ messages in thread
From: Dan Williams @ 2011-12-23  2:59 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide

Since sata devices can take several seconds to recover the link on reset,
the 0.5 seconds that libsas currently waits may not be enough.  Instead,
if we are rediscovering a phy that was previously attached to a sata
device, let libata handle any resets to encourage the device to transmit
the initial fis.

Once sas_ata_hard_reset() and lldds learn how to honor 'deadline', libsas
should stop encountering phys in an intermediate state.  Until then this
will loop until the fis is transmitted or ->attached_sas_addr gets
cleared.  In the more likely initial discovery case we keep the existing
behavior.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
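A minimal userspace sketch of the decision this patch adds to
sas_ex_phy_discover_helper(), assuming the discover response is reduced
to the two fields the helper tests.  struct discover_resp, enum
recovery_action and pick_recovery() are hypothetical names used only for
illustration, they are not libsas interfaces.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two discover response fields tested. */
struct discover_resp {
	int attached_dev_type;   /* 0: no device type reported yet */
	bool attached_sata_dev;  /* expander sees a sata device on the phy */
};

enum recovery_action {
	RECOVERY_NONE,          /* initial fis arrived, nothing to recover */
	RECOVERY_LIBATA_RESET,  /* known sata device: hand off to libata-eh */
	RECOVERY_LINK_RESET,    /* unknown device: link reset, wait, retry */
};

/*
 * A sata device is attached but no device type is reported: the initial
 * dev to host fis was not transmitted.  A device that was already
 * discovered on this phy is handed to libata, otherwise the existing
 * link-reset-and-wait behavior is kept.
 */
static enum recovery_action pick_recovery(const struct discover_resp *dr,
					  bool known_ata_dev)
{
	if (!(dr->attached_dev_type == 0 && dr->attached_sata_dev))
		return RECOVERY_NONE;
	if (known_ata_dev)
		return RECOVERY_LIBATA_RESET;
	return RECOVERY_LINK_RESET;
}
```

In the patch itself the "known device" input comes from sas_ex_to_ata()
and the libata branch calls sas_ata_schedule_reset() before leaving the
retry loop.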
 drivers/scsi/libsas/sas_ata.c      |   19 ++++++++++++++++
 drivers/scsi/libsas/sas_expander.c |   44 ++++++++++++++++++++++++++++++++----
 include/scsi/sas_ata.h             |    6 ++++-
 3 files changed, 63 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
index 21faf5a..dc1ff15 100644
--- a/drivers/scsi/libsas/sas_ata.c
+++ b/drivers/scsi/libsas/sas_ata.c
@@ -664,3 +664,22 @@ int sas_ata_eh(struct Scsi_Host *shost, struct list_head *work_q,
 
 	return rtn;
 }
+
+void sas_ata_schedule_reset(struct domain_device *dev)
+{
+	struct ata_eh_info *ehi;
+	struct ata_port *ap;
+	unsigned long flags;
+
+	if (!dev_is_sata(dev))
+		return;
+
+	ap = dev->sata_dev.ap;
+	ehi = &ap->link.eh_info;
+
+	spin_lock_irqsave(ap->lock, flags);
+	ehi->err_mask |= AC_ERR_TIMEOUT;
+	ehi->action |= ATA_EH_RESET;
+	ata_port_schedule_eh(ap);
+	spin_unlock_irqrestore(ap->lock, flags);
+}
diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index c3846cf..ed26a23 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -28,6 +28,7 @@
 
 #include "sas_internal.h"
 
+#include <scsi/sas_ata.h>
 #include <scsi/scsi_transport.h>
 #include <scsi/scsi_transport_sas.h>
 #include "../scsi_sas_internal.h"
@@ -226,12 +227,35 @@ static void sas_set_ex_phy(struct domain_device *dev, int phy_id,
 	return;
 }
 
+/* check if we have an existing attached ata device on this expander phy */
+static struct domain_device *sas_ex_to_ata(struct domain_device *ex_dev, int phy_id)
+{
+	struct ex_phy *ex_phy = &ex_dev->ex_dev.ex_phy[phy_id];
+	struct domain_device *dev;
+	struct sas_rphy *rphy;
+
+	if (!ex_phy->port)
+		return NULL;
+
+	rphy = ex_phy->port->rphy;
+	if (!rphy)
+		return NULL;
+
+	dev = sas_find_dev_by_rphy(rphy);
+
+	if (dev && dev_is_sata(dev))
+		return dev;
+
+	return NULL;
+}
+
 #define DISCOVER_REQ_SIZE  16
 #define DISCOVER_RESP_SIZE 56
 
 static int sas_ex_phy_discover_helper(struct domain_device *dev, u8 *disc_req,
 				      u8 *disc_resp, int single)
 {
+	struct domain_device *ata_dev = sas_ex_to_ata(dev, single);
 	int i, res;
 
 	disc_req[9] = single;
@@ -242,20 +266,30 @@ static int sas_ex_phy_discover_helper(struct domain_device *dev, u8 *disc_req,
 				       disc_resp, DISCOVER_RESP_SIZE);
 		if (res)
 			return res;
-		/* This is detecting a failure to transmit initial
-		 * dev to host FIS as described in section G.5 of
-		 * sas-2 r 04b */
 		dr = &((struct smp_resp *)disc_resp)->disc;
 		if (memcmp(dev->sas_addr, dr->attached_sas_addr,
 			  SAS_ADDR_SIZE) == 0) {
 			sas_printk("Found loopback topology, just ignore it!\n");
 			return 0;
 		}
+
+		/* This is detecting a failure to transmit initial
+		 * dev to host FIS as described in section J.5 of
+		 * sas-2 r16
+		 */
 		if (!(dr->attached_dev_type == 0 &&
 		      dr->attached_sata_dev))
 			break;
-		/* In order to generate the dev to host FIS, we
-		 * send a link reset to the expander port */
+
+		/* In order to generate the dev to host FIS, we send a
+		 * link reset to the expander port.  If a device was
+		 * previously detected on this port we ask libata to
+		 * manage the reset and link recovery.
+		 */
+		if (ata_dev) {
+			sas_ata_schedule_reset(ata_dev);
+			break;
+		}
 		sas_smp_phy_control(dev, single, PHY_FUNC_LINK_RESET, NULL);
 		/* Wait for the reset to trigger the negotiation */
 		msleep(500);
diff --git a/include/scsi/sas_ata.h b/include/scsi/sas_ata.h
index 9f7a23d..c0bcd30 100644
--- a/include/scsi/sas_ata.h
+++ b/include/scsi/sas_ata.h
@@ -44,7 +44,7 @@ void sas_ata_strategy_handler(struct Scsi_Host *shost);
 int sas_ata_eh(struct Scsi_Host *shost, struct list_head *work_q,
 	       struct list_head *done_q);
 void sas_probe_sata(struct work_struct *work);
-
+void sas_ata_schedule_reset(struct domain_device *dev);
 #else
 
 
@@ -75,6 +75,10 @@ static inline void sas_probe_sata(struct work_struct *work)
 {
 }
 
+static inline void sas_ata_schedule_reset(struct domain_device *dev)
+{
+}
+
 #endif
 
 #endif /* _SAS_ATA_H_ */



* [PATCH v2 18/28] libsas: perform sas-transport resets in shost->workq context
  2011-12-23  2:58 [PATCH v2 00/28] libsas: eh reworks (ata-eh vs discovery, races, ...) Dan Williams
                   ` (16 preceding siblings ...)
  2011-12-23  2:59 ` [PATCH v2 17/28] libsas: use libata-eh-reset for sata rediscovery fis transmit failures Dan Williams
@ 2011-12-23  3:00 ` Dan Williams
  2011-12-23  3:00 ` [PATCH v2 19/28] libsas: execute transport link resets with libata-eh via host workqueue Dan Williams
                   ` (9 subsequent siblings)
  27 siblings, 0 replies; 37+ messages in thread
From: Dan Williams @ 2011-12-23  3:00 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide

Extend the sas transport class to allow transport users to attach extra
data to a sas_phy (->hostdata).  Use this area in libsas to move resets
to workq context in preparation for scheduling ata device resets through
libata-eh.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
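A single-threaded userspace sketch of the queue-then-drain pattern that
queue_phy_reset() uses: stash the request in per-phy data, queue the
work item, drain the queue, then read back the result.  struct work,
struct workqueue, wq_queue(), wq_drain() and do_reset() are hypothetical
stand-ins, and the locking done in the patch (event_lock, state_lock) is
left out because nothing here runs concurrently.

```c
#include <stddef.h>

struct work {
	void (*fn)(struct work *w);
	struct work *next;
};

struct workqueue {
	struct work *head, *tail;
};

/* Append a work item to the tail of the queue. */
static void wq_queue(struct workqueue *wq, struct work *w)
{
	w->next = NULL;
	if (wq->tail)
		wq->tail->next = w;
	else
		wq->head = w;
	wq->tail = w;
}

/* Run every queued item in order; returns once the queue is empty. */
static void wq_drain(struct workqueue *wq)
{
	while (wq->head) {
		struct work *w = wq->head;

		wq->head = w->next;
		if (!wq->head)
			wq->tail = NULL;
		w->fn(w);
	}
}

/* Per-phy data, modelled on struct sas_phy_data from the patch. */
struct phy_data {
	int hard_reset;
	int reset_result;
	struct work reset_work;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical reset: hard resets fail with -5, link resets succeed. */
static int do_reset(int hard_reset)
{
	return hard_reset ? -5 : 0;
}

static void phy_reset_work(struct work *w)
{
	struct phy_data *d = container_of(w, struct phy_data, reset_work);

	d->reset_result = do_reset(d->hard_reset);
}

/* Caller blocks until the queued reset has run, then returns its result. */
static int queue_phy_reset(struct workqueue *wq, struct phy_data *d,
			   int hard_reset)
{
	d->reset_result = 0;
	d->hard_reset = hard_reset;
	d->reset_work.fn = phy_reset_work;
	wq_queue(wq, &d->reset_work);
	wq_drain(wq);
	return d->reset_result;
}
```

The point of the indirection is that the reset runs in the same context
as the other queued libsas work instead of in the caller's context.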
 drivers/scsi/libsas/sas_event.c    |    2 +
 drivers/scsi/libsas/sas_init.c     |   58 +++++++++++++++++++++++++++++++++++-
 drivers/scsi/libsas/sas_internal.h |   10 ++++++
 drivers/scsi/scsi_transport_sas.c  |   18 +++++++++++
 include/scsi/scsi_transport_sas.h  |    5 ++-
 5 files changed, 89 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/libsas/sas_event.c b/drivers/scsi/libsas/sas_event.c
index 3ff73af..3a75c81 100644
--- a/drivers/scsi/libsas/sas_event.c
+++ b/drivers/scsi/libsas/sas_event.c
@@ -27,7 +27,7 @@
 #include "sas_internal.h"
 #include "sas_dump.h"
 
-static void sas_queue_work(struct sas_ha_struct *ha, struct work_struct *work)
+void sas_queue_work(struct sas_ha_struct *ha, struct work_struct *work)
 {
 	if (!test_bit(SAS_HA_REGISTERED, &ha->state))
 		return;
diff --git a/drivers/scsi/libsas/sas_init.c b/drivers/scsi/libsas/sas_init.c
index ffa9869..39d2899 100644
--- a/drivers/scsi/libsas/sas_init.c
+++ b/drivers/scsi/libsas/sas_init.c
@@ -290,9 +290,65 @@ int sas_set_phy_speed(struct sas_phy *phy,
 	return ret;
 }
 
+static void sas_phy_release(struct sas_phy *phy)
+{
+	kfree(phy->hostdata);
+	phy->hostdata = NULL;
+}
+
+static void phy_reset_work(struct work_struct *work)
+{
+	struct sas_phy_data *d = container_of(work, typeof(*d), reset_work);
+
+	d->reset_result = sas_phy_reset(d->phy, d->hard_reset);
+}
+
+static int sas_phy_setup(struct sas_phy *phy)
+{
+	struct sas_phy_data *d = kzalloc(sizeof(*d), GFP_KERNEL);
+
+	if (!d)
+		return -ENOMEM;
+
+	mutex_init(&d->event_lock);
+	INIT_WORK(&d->reset_work, phy_reset_work);
+	d->phy = phy;
+	phy->hostdata = d;
+
+	return 0;
+}
+
+static int queue_phy_reset(struct sas_phy *phy, int hard_reset)
+{
+	struct Scsi_Host *shost = dev_to_shost(phy->dev.parent);
+	struct sas_ha_struct *ha = SHOST_TO_SAS_HA(shost);
+	struct sas_phy_data *d = phy->hostdata;
+	int rc;
+
+	if (!d)
+		return -ENOMEM;
+
+	/* libsas workqueue coordinates ata-eh reset with discovery */
+	mutex_lock(&d->event_lock);
+	d->reset_result = 0;
+	d->hard_reset = hard_reset;
+
+	spin_lock_irq(&ha->state_lock);
+	sas_queue_work(ha, &d->reset_work);
+	spin_unlock_irq(&ha->state_lock);
+
+	sas_drain_work(ha);
+	rc = d->reset_result;
+	mutex_unlock(&d->event_lock);
+
+	return rc;
+}
+
 static struct sas_function_template sft = {
 	.phy_enable = sas_phy_enable,
-	.phy_reset = sas_phy_reset,
+	.phy_reset = queue_phy_reset,
+	.phy_setup = sas_phy_setup,
+	.phy_release = sas_phy_release,
 	.set_phy_speed = sas_set_phy_speed,
 	.get_linkerrors = sas_get_linkerrors,
 	.smp_handler = sas_smp_handler,
diff --git a/drivers/scsi/libsas/sas_internal.h b/drivers/scsi/libsas/sas_internal.h
index f60658e..2a873fa 100644
--- a/drivers/scsi/libsas/sas_internal.h
+++ b/drivers/scsi/libsas/sas_internal.h
@@ -38,6 +38,15 @@
 #define TO_SAS_TASK(_scsi_cmd)  ((void *)(_scsi_cmd)->host_scribble)
 #define ASSIGN_SAS_TASK(_sc, _t) do { (_sc)->host_scribble = (void *) _t; } while (0)
 
+struct sas_phy_data {
+	/* let reset be performed in sas_queue_work() context */
+	struct sas_phy *phy;
+	struct mutex event_lock;
+	int hard_reset;
+	int reset_result;
+	struct work_struct reset_work;
+};
+
 void sas_scsi_recover_host(struct Scsi_Host *shost);
 
 int sas_show_class(enum sas_class class, char *buf);
@@ -64,6 +73,7 @@ void sas_porte_broadcast_rcvd(struct work_struct *work);
 void sas_porte_link_reset_err(struct work_struct *work);
 void sas_porte_timer_event(struct work_struct *work);
 void sas_porte_hard_reset(struct work_struct *work);
+void sas_queue_work(struct sas_ha_struct *ha, struct work_struct *work);
 
 int sas_notify_lldd_dev_found(struct domain_device *);
 void sas_notify_lldd_dev_gone(struct domain_device *);
diff --git a/drivers/scsi/scsi_transport_sas.c b/drivers/scsi/scsi_transport_sas.c
index 9421bae..ab3bd0b 100644
--- a/drivers/scsi/scsi_transport_sas.c
+++ b/drivers/scsi/scsi_transport_sas.c
@@ -652,9 +652,21 @@ sas_phy_linkerror_attr(running_disparity_error_count);
 sas_phy_linkerror_attr(loss_of_dword_sync_count);
 sas_phy_linkerror_attr(phy_reset_problem_count);
 
+static int sas_phy_setup(struct transport_container *tc, struct device *dev,
+			 struct device *cdev)
+{
+	struct sas_phy *phy = dev_to_phy(dev);
+	struct Scsi_Host *shost = dev_to_shost(phy->dev.parent);
+	struct sas_internal *i = to_sas_internal(shost->transportt);
+
+	if (i->f->phy_setup)
+		i->f->phy_setup(phy);
+
+	return 0;
+}
 
 static DECLARE_TRANSPORT_CLASS(sas_phy_class,
-		"sas_phy", NULL, NULL, NULL);
+		"sas_phy", sas_phy_setup, NULL, NULL);
 
 static int sas_phy_match(struct attribute_container *cont, struct device *dev)
 {
@@ -678,7 +690,11 @@ static int sas_phy_match(struct attribute_container *cont, struct device *dev)
 static void sas_phy_release(struct device *dev)
 {
 	struct sas_phy *phy = dev_to_phy(dev);
+	struct Scsi_Host *shost = dev_to_shost(phy->dev.parent);
+	struct sas_internal *i = to_sas_internal(shost->transportt);
 
+	if (i->f->phy_release)
+		i->f->phy_release(phy);
 	put_device(dev->parent);
 	kfree(phy);
 }
diff --git a/include/scsi/scsi_transport_sas.h b/include/scsi/scsi_transport_sas.h
index 6d14daa..42817fa 100644
--- a/include/scsi/scsi_transport_sas.h
+++ b/include/scsi/scsi_transport_sas.h
@@ -75,7 +75,8 @@ struct sas_phy {
 	/* for the list of phys belonging to a port */
 	struct list_head	port_siblings;
 
-	struct work_struct      reset_work;
+	/* available to the lldd */
+	void			*hostdata;
 };
 
 #define dev_to_phy(d) \
@@ -169,6 +170,8 @@ struct sas_function_template {
 	int (*get_bay_identifier)(struct sas_rphy *);
 	int (*phy_reset)(struct sas_phy *, int);
 	int (*phy_enable)(struct sas_phy *, int);
+	int (*phy_setup)(struct sas_phy *);
+	void (*phy_release)(struct sas_phy *);
 	int (*set_phy_speed)(struct sas_phy *, struct sas_phy_linkrates *);
 	int (*smp_handler)(struct Scsi_Host *, struct sas_rphy *, struct request *);
 };



* [PATCH v2 19/28] libsas: execute transport link resets with libata-eh via host workqueue
  2011-12-23  2:58 [PATCH v2 00/28] libsas: eh reworks (ata-eh vs discovery, races, ...) Dan Williams
                   ` (17 preceding siblings ...)
  2011-12-23  3:00 ` [PATCH v2 18/28] libsas: perform sas-transport resets in shost->workq context Dan Williams
@ 2011-12-23  3:00 ` Dan Williams
  2011-12-23  3:00 ` [PATCH v2 20/28] libsas: sas_phy_enable via transport_sas_phy_reset Dan Williams
                   ` (8 subsequent siblings)
  27 siblings, 0 replies; 37+ messages in thread
From: Dan Williams @ 2011-12-23  3:00 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide

Link resets leave ata affiliations intact, so arrange for libsas to make
an effort to avoid dropping the device due to a slow-to-recover link.
Towards this end carry out reset in the host workqueue so that it can
check for ata devices and kick the reset request to libata.  Hard
resets, in contrast, bypass libata since they are meant for associating
an ata device with another initiator in the domain (tears down
affiliations).

Add a new transport_sas_phy_reset() since the current sas_phy_reset() is
a utility function for libsas lldds, which are not prepared for it to
loop back into eh.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
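A minimal sketch of how transport_sas_phy_reset() routes a request,
assuming the topology lookups have already been reduced to booleans.
enum reset_path and route_phy_reset() are hypothetical names used only
for illustration.

```c
#include <stdbool.h>

enum reset_path {
	RESET_VIA_LIBATA_EH,  /* schedule ata eh and wait for it to finish */
	RESET_VIA_LLDD,       /* local phy: lldd_control_phy() */
	RESET_VIA_SMP,        /* expander phy: sas_smp_phy_control() */
};

/*
 * Link resets on a phy with a known sata device go to libata so the
 * device is not dropped over a slow-to-recover link.  Hard resets, and
 * anything that is not sata, bypass libata.
 */
static enum reset_path route_phy_reset(bool local_phy, bool sata_attached,
				       bool hard_reset)
{
	if (sata_attached && !hard_reset)
		return RESET_VIA_LIBATA_EH;
	return local_phy ? RESET_VIA_LLDD : RESET_VIA_SMP;
}
```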
 drivers/ata/libata-eh.c            |    1 +
 drivers/ata/libata.h               |    1 -
 drivers/scsi/libsas/sas_ata.c      |   11 +++++++
 drivers/scsi/libsas/sas_expander.c |    2 +
 drivers/scsi/libsas/sas_init.c     |   56 +++++++++++++++++++++++++++++++++++-
 drivers/scsi/libsas/sas_internal.h |    1 +
 include/linux/libata.h             |    1 +
 include/scsi/sas_ata.h             |    4 +++
 8 files changed, 74 insertions(+), 3 deletions(-)

diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index a9b2820..c61316e 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -863,6 +863,7 @@ void ata_port_wait_eh(struct ata_port *ap)
 		goto retry;
 	}
 }
+EXPORT_SYMBOL_GPL(ata_port_wait_eh);
 
 static int ata_eh_nr_in_flight(struct ata_port *ap)
 {
diff --git a/drivers/ata/libata.h b/drivers/ata/libata.h
index 773de97..78c356d 100644
--- a/drivers/ata/libata.h
+++ b/drivers/ata/libata.h
@@ -150,7 +150,6 @@ extern void ata_eh_acquire(struct ata_port *ap);
 extern void ata_eh_release(struct ata_port *ap);
 extern enum blk_eh_timer_return ata_scsi_timed_out(struct scsi_cmnd *cmd);
 extern void ata_scsi_error(struct Scsi_Host *host);
-extern void ata_port_wait_eh(struct ata_port *ap);
 extern void ata_eh_fastdrain_timerfn(unsigned long arg);
 extern void ata_qc_schedule_eh(struct ata_queued_cmd *qc);
 extern void ata_dev_disable(struct ata_device *dev);
diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
index dc1ff15..3b7c362 100644
--- a/drivers/scsi/libsas/sas_ata.c
+++ b/drivers/scsi/libsas/sas_ata.c
@@ -683,3 +683,14 @@ void sas_ata_schedule_reset(struct domain_device *dev)
 	ata_port_schedule_eh(ap);
 	spin_unlock_irqrestore(ap->lock, flags);
 }
+
+void sas_ata_wait_eh(struct domain_device *dev)
+{
+	struct ata_port *ap;
+
+	if (!dev_is_sata(dev))
+		return;
+
+	ap = dev->sata_dev.ap;
+	ata_port_wait_eh(ap);
+}
diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index ed26a23..9d2bb32 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -228,7 +228,7 @@ static void sas_set_ex_phy(struct domain_device *dev, int phy_id,
 }
 
 /* check if we have an existing attached ata device on this expander phy */
-static struct domain_device *sas_ex_to_ata(struct domain_device *ex_dev, int phy_id)
+struct domain_device *sas_ex_to_ata(struct domain_device *ex_dev, int phy_id)
 {
 	struct ex_phy *ex_phy = &ex_dev->ex_dev.ex_phy[phy_id];
 	struct domain_device *dev;
diff --git a/drivers/scsi/libsas/sas_init.c b/drivers/scsi/libsas/sas_init.c
index 39d2899..f261e97 100644
--- a/drivers/scsi/libsas/sas_init.c
+++ b/drivers/scsi/libsas/sas_init.c
@@ -28,6 +28,7 @@
 #include <linux/init.h>
 #include <linux/device.h>
 #include <linux/spinlock.h>
+#include <scsi/sas_ata.h>
 #include <scsi/scsi_host.h>
 #include <scsi/scsi_device.h>
 #include <scsi/scsi_transport.h>
@@ -195,6 +196,59 @@ static int sas_get_linkerrors(struct sas_phy *phy)
 	return sas_smp_get_phy_events(phy);
 }
 
+/**
+ * transport_sas_phy_reset - reset a phy and permit libata to manage the link
+ *
+ * phy reset request via sysfs in host workqueue context so we know we
+ * can block on eh and safely traverse the domain_device topology
+ */
+static int transport_sas_phy_reset(struct sas_phy *phy, int hard_reset)
+{
+	int ret;
+	enum phy_func reset_type;
+
+	if (hard_reset)
+		reset_type = PHY_FUNC_HARD_RESET;
+	else
+		reset_type = PHY_FUNC_LINK_RESET;
+
+	if (scsi_is_sas_phy_local(phy)) {
+		struct Scsi_Host *shost = dev_to_shost(phy->dev.parent);
+		struct sas_ha_struct *sas_ha = SHOST_TO_SAS_HA(shost);
+		struct asd_sas_phy *asd_phy = sas_ha->sas_phy[phy->number];
+		struct sas_internal *i =
+			to_sas_internal(sas_ha->core.shost->transportt);
+		struct domain_device *dev = NULL;
+
+		if (asd_phy->port)
+			dev = asd_phy->port->port_dev;
+
+		/* validate that dev has been probed */
+		if (dev)
+			dev = sas_find_dev_by_rphy(dev->rphy);
+
+		if (dev && dev_is_sata(dev) && !hard_reset) {
+			sas_ata_schedule_reset(dev);
+			sas_ata_wait_eh(dev);
+			ret = 0;
+		} else
+			ret = i->dft->lldd_control_phy(asd_phy, reset_type, NULL);
+	} else {
+		struct sas_rphy *rphy = dev_to_rphy(phy->dev.parent);
+		struct domain_device *ddev = sas_find_dev_by_rphy(rphy);
+		struct domain_device *ata_dev = sas_ex_to_ata(ddev, phy->number);
+
+		if (ata_dev && !hard_reset) {
+			sas_ata_schedule_reset(ata_dev);
+			sas_ata_wait_eh(ata_dev);
+			ret = 0;
+		} else
+			ret = sas_smp_phy_control(ddev, phy->number, reset_type, NULL);
+	}
+
+	return ret;
+}
+
 int sas_phy_enable(struct sas_phy *phy, int enable)
 {
 	int ret;
@@ -300,7 +354,7 @@ static void phy_reset_work(struct work_struct *work)
 {
 	struct sas_phy_data *d = container_of(work, typeof(*d), reset_work);
 
-	d->reset_result = sas_phy_reset(d->phy, d->hard_reset);
+	d->reset_result = transport_sas_phy_reset(d->phy, d->hard_reset);
 }
 
 static int sas_phy_setup(struct sas_phy *phy)
diff --git a/drivers/scsi/libsas/sas_internal.h b/drivers/scsi/libsas/sas_internal.h
index 2a873fa..984de1c 100644
--- a/drivers/scsi/libsas/sas_internal.h
+++ b/drivers/scsi/libsas/sas_internal.h
@@ -83,6 +83,7 @@ int sas_smp_phy_control(struct domain_device *dev, int phy_id,
 int sas_smp_get_phy_events(struct sas_phy *phy);
 
 struct domain_device *sas_find_dev_by_rphy(struct sas_rphy *rphy);
+struct domain_device *sas_ex_to_ata(struct domain_device *ex_dev, int phy_id);
 
 void sas_hae_reset(struct work_struct *work);
 
diff --git a/include/linux/libata.h b/include/linux/libata.h
index cafc09a..aa42704 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -1147,6 +1147,7 @@ static inline int ata_acpi_cbl_80wire(struct ata_port *ap,
  * EH - drivers/ata/libata-eh.c
  */
 extern void ata_port_schedule_eh(struct ata_port *ap);
+extern void ata_port_wait_eh(struct ata_port *ap);
 extern int ata_link_abort(struct ata_link *link);
 extern int ata_port_abort(struct ata_port *ap);
 extern int ata_port_freeze(struct ata_port *ap);
diff --git a/include/scsi/sas_ata.h b/include/scsi/sas_ata.h
index c0bcd30..da3f377 100644
--- a/include/scsi/sas_ata.h
+++ b/include/scsi/sas_ata.h
@@ -45,6 +45,7 @@ int sas_ata_eh(struct Scsi_Host *shost, struct list_head *work_q,
 	       struct list_head *done_q);
 void sas_probe_sata(struct work_struct *work);
 void sas_ata_schedule_reset(struct domain_device *dev);
+void sas_ata_wait_eh(struct domain_device *dev);
 #else
 
 
@@ -79,6 +80,9 @@ static inline void sas_ata_schedule_reset(struct domain_device *dev)
 {
 }
 
+static inline void sas_ata_wait_eh(struct domain_device *dev)
+{
+}
 #endif
 
 #endif /* _SAS_ATA_H_ */



* [PATCH v2 20/28] libsas: sas_phy_enable via transport_sas_phy_reset
  2011-12-23  2:58 [PATCH v2 00/28] libsas: eh reworks (ata-eh vs discovery, races, ...) Dan Williams
                   ` (18 preceding siblings ...)
  2011-12-23  3:00 ` [PATCH v2 19/28] libsas: execute transport link resets with libata-eh via host workqueue Dan Williams
@ 2011-12-23  3:00 ` Dan Williams
  2011-12-23  3:00 ` [PATCH v2 21/28] libsas: Remove redundant phy state notification calls Dan Williams
                   ` (7 subsequent siblings)
  27 siblings, 0 replies; 37+ messages in thread
From: Dan Williams @ 2011-12-23  3:00 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide

Execute the link-reset triggered by sas_phy_enable via
transport_sas_phy_reset so that it can be managed by libata.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/libsas/sas_init.c      |   56 ++++++++++++++++++++++++++++++-----
 drivers/scsi/libsas/sas_internal.h  |    3 ++
 drivers/scsi/libsas/sas_scsi_host.c |    1 -
 include/scsi/libsas.h               |    1 -
 4 files changed, 51 insertions(+), 10 deletions(-)

diff --git a/drivers/scsi/libsas/sas_init.c b/drivers/scsi/libsas/sas_init.c
index f261e97..ca080a8 100644
--- a/drivers/scsi/libsas/sas_init.c
+++ b/drivers/scsi/libsas/sas_init.c
@@ -249,15 +249,15 @@ static int transport_sas_phy_reset(struct sas_phy *phy, int hard_reset)
 	return ret;
 }
 
-int sas_phy_enable(struct sas_phy *phy, int enable)
+static int sas_phy_enable(struct sas_phy *phy, int enable)
 {
 	int ret;
-	enum phy_func command;
+	enum phy_func cmd;
 
 	if (enable)
-		command = PHY_FUNC_LINK_RESET;
+		cmd = PHY_FUNC_LINK_RESET;
 	else
-		command = PHY_FUNC_DISABLE;
+		cmd = PHY_FUNC_DISABLE;
 
 	if (scsi_is_sas_phy_local(phy)) {
 		struct Scsi_Host *shost = dev_to_shost(phy->dev.parent);
@@ -266,15 +266,21 @@ int sas_phy_enable(struct sas_phy *phy, int enable)
 		struct sas_internal *i =
 			to_sas_internal(sas_ha->core.shost->transportt);
 
-		if (!enable) {
+		if (enable)
+			ret = transport_sas_phy_reset(phy, 0);
+		else {
 			sas_phy_disconnected(asd_phy);
 			sas_ha->notify_phy_event(asd_phy, PHYE_LOSS_OF_SIGNAL);
+			ret = i->dft->lldd_control_phy(asd_phy, cmd, NULL);
 		}
-		ret = i->dft->lldd_control_phy(asd_phy, command, NULL);
 	} else {
 		struct sas_rphy *rphy = dev_to_rphy(phy->dev.parent);
 		struct domain_device *ddev = sas_find_dev_by_rphy(rphy);
-		ret = sas_smp_phy_control(ddev, phy->number, command, NULL);
+
+		if (enable)
+			ret = transport_sas_phy_reset(phy, 0);
+		else
+			ret = sas_smp_phy_control(ddev, phy->number, cmd, NULL);
 	}
 	return ret;
 }
@@ -357,6 +363,13 @@ static void phy_reset_work(struct work_struct *work)
 	d->reset_result = transport_sas_phy_reset(d->phy, d->hard_reset);
 }
 
+static void phy_enable_work(struct work_struct *work)
+{
+	struct sas_phy_data *d = container_of(work, typeof(*d), enable_work);
+
+	d->enable_result = sas_phy_enable(d->phy, d->enable);
+}
+
 static int sas_phy_setup(struct sas_phy *phy)
 {
 	struct sas_phy_data *d = kzalloc(sizeof(*d), GFP_KERNEL);
@@ -366,6 +379,7 @@ static int sas_phy_setup(struct sas_phy *phy)
 
 	mutex_init(&d->event_lock);
 	INIT_WORK(&d->reset_work, phy_reset_work);
+	INIT_WORK(&d->enable_work, phy_enable_work);
 	d->phy = phy;
 	phy->hostdata = d;
 
@@ -398,8 +412,34 @@ static int queue_phy_reset(struct sas_phy *phy, int hard_reset)
 	return rc;
 }
 
+static int queue_phy_enable(struct sas_phy *phy, int enable)
+{
+	struct Scsi_Host *shost = dev_to_shost(phy->dev.parent);
+	struct sas_ha_struct *ha = SHOST_TO_SAS_HA(shost);
+	struct sas_phy_data *d = phy->hostdata;
+	int rc;
+
+	if (!d)
+		return -ENOMEM;
+
+	/* libsas workqueue coordinates ata-eh reset with discovery */
+	mutex_lock(&d->event_lock);
+	d->enable_result = 0;
+	d->enable = enable;
+
+	spin_lock_irq(&ha->state_lock);
+	sas_queue_work(ha, &d->enable_work);
+	spin_unlock_irq(&ha->state_lock);
+
+	sas_drain_work(ha);
+	rc = d->enable_result;
+	mutex_unlock(&d->event_lock);
+
+	return rc;
+}
+
 static struct sas_function_template sft = {
-	.phy_enable = sas_phy_enable,
+	.phy_enable = queue_phy_enable,
 	.phy_reset = queue_phy_reset,
 	.phy_setup = sas_phy_setup,
 	.phy_release = sas_phy_release,
diff --git a/drivers/scsi/libsas/sas_internal.h b/drivers/scsi/libsas/sas_internal.h
index 984de1c..cde1a84 100644
--- a/drivers/scsi/libsas/sas_internal.h
+++ b/drivers/scsi/libsas/sas_internal.h
@@ -45,6 +45,9 @@ struct sas_phy_data {
 	int hard_reset;
 	int reset_result;
 	struct work_struct reset_work;
+	int enable;
+	int enable_result;
+	struct work_struct enable_work;
 };
 
 void sas_scsi_recover_host(struct Scsi_Host *shost);
diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index 57a3484..b849dcd 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -1077,7 +1077,6 @@ EXPORT_SYMBOL_GPL(sas_change_queue_type);
 EXPORT_SYMBOL_GPL(sas_bios_param);
 EXPORT_SYMBOL_GPL(sas_task_abort);
 EXPORT_SYMBOL_GPL(sas_phy_reset);
-EXPORT_SYMBOL_GPL(sas_phy_enable);
 EXPORT_SYMBOL_GPL(sas_eh_device_reset_handler);
 EXPORT_SYMBOL_GPL(sas_eh_bus_reset_handler);
 EXPORT_SYMBOL_GPL(sas_slave_alloc);
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index 3ac12eb..7ba7095 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -629,7 +629,6 @@ extern int sas_unregister_ha(struct sas_ha_struct *);
 
 int sas_set_phy_speed(struct sas_phy *phy,
 		      struct sas_phy_linkrates *rates);
-int sas_phy_enable(struct sas_phy *phy, int enabled);
 int sas_phy_reset(struct sas_phy *phy, int hard_reset);
 int sas_queue_up(struct sas_task *task);
 extern int sas_queuecommand(struct Scsi_Host * ,struct scsi_cmnd *);



* [PATCH v2 21/28] libsas: Remove redundant phy state notification calls.
  2011-12-23  2:58 [PATCH v2 00/28] libsas: eh reworks (ata-eh vs discovery, races, ...) Dan Williams
                   ` (19 preceding siblings ...)
  2011-12-23  3:00 ` [PATCH v2 20/28] libsas: sas_phy_enable via transport_sas_phy_reset Dan Williams
@ 2011-12-23  3:00 ` Dan Williams
  2011-12-23  3:00 ` [PATCH v2 22/28] libsas: add mutex for SMP task execution Dan Williams
                   ` (6 subsequent siblings)
  27 siblings, 0 replies; 37+ messages in thread
From: Dan Williams @ 2011-12-23  3:00 UTC (permalink / raw)
  To: linux-scsi
  Cc: Xiangliang Yu, linux-ide, Jeff Skirvin, Luben Tuikov, Jack Wang

From: Jeff Skirvin <jeffrey.d.skirvin@intel.com>

In the case of an explicit sas_phy_enable call to disable a phy,
the LLDD provides the calls to sas_phy_disconnected and the
PHYE_LOSS_OF_SIGNAL event.

NOTE: This assumes that the lldd(s) generate the notification, which
appears to be the case, but has only been verified on isci.

Cc: Jack Wang <jack_wang@usish.com>
Cc: Xiangliang Yu <yuxiangl@marvell.com>
Cc: Luben Tuikov <ltuikov@yahoo.com>
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/libsas/sas_init.c |    5 +----
 1 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/libsas/sas_init.c b/drivers/scsi/libsas/sas_init.c
index ca080a8..075927e 100644
--- a/drivers/scsi/libsas/sas_init.c
+++ b/drivers/scsi/libsas/sas_init.c
@@ -268,11 +268,8 @@ static int sas_phy_enable(struct sas_phy *phy, int enable)
 
 		if (enable)
 			ret = transport_sas_phy_reset(phy, 0);
-		else {
-			sas_phy_disconnected(asd_phy);
-			sas_ha->notify_phy_event(asd_phy, PHYE_LOSS_OF_SIGNAL);
+		else
 			ret = i->dft->lldd_control_phy(asd_phy, cmd, NULL);
-		}
 	} else {
 		struct sas_rphy *rphy = dev_to_rphy(phy->dev.parent);
 		struct domain_device *ddev = sas_find_dev_by_rphy(rphy);



* [PATCH v2 22/28] libsas: add mutex for SMP task execution
  2011-12-23  2:58 [PATCH v2 00/28] libsas: eh reworks (ata-eh vs discovery, races, ...) Dan Williams
                   ` (20 preceding siblings ...)
  2011-12-23  3:00 ` [PATCH v2 21/28] libsas: Remove redundant phy state notification calls Dan Williams
@ 2011-12-23  3:00 ` Dan Williams
  2011-12-23  3:00 ` [PATCH v2 23/28] libsas: async ata-eh Dan Williams
                   ` (5 subsequent siblings)
  27 siblings, 0 replies; 37+ messages in thread
From: Dan Williams @ 2011-12-23  3:00 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide, Jeff Skirvin

From: Jeff Skirvin <jeffrey.d.skirvin@intel.com>

SAS does not tag SMP requests, and at least one lldd (isci) does not permit
more than one in-flight request at a time.  Serialize smp_execute_task()
per expander with a new ex_dev.cmd_mutex.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
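A userspace sketch of the locking shape this patch gives
smp_execute_task(): take the per-expander mutex once, turn every early
exit inside the retry loop into a break, and unlock at a single point.
It uses pthreads in place of the kernel mutex, and the retry rule is
simplified: struct expander, smp_execute() and flaky_send() are
hypothetical, with a positive return standing for a status that is
retried.

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

struct expander {
	pthread_mutex_t cmd_mutex;  /* one SMP request in flight at a time */
};

/* Hypothetical transport: returns 1 (retry) until *arg drops to 0. */
static int flaky_send(void *arg)
{
	int *failures_left = arg;

	if (*failures_left > 0) {
		(*failures_left)--;
		return 1;
	}
	return 0;
}

/*
 * Up to three attempts under the mutex.  0 and negative results end the
 * loop at once; a positive result is retried.  Every path leaves the
 * loop via break, so the unlock below always runs.
 */
static int smp_execute(struct expander *ex, int (*send)(void *), void *arg)
{
	int res = -EIO;
	int retry;

	pthread_mutex_lock(&ex->cmd_mutex);
	for (retry = 0; retry < 3; retry++) {
		void *task = malloc(16);  /* stands in for sas_alloc_task() */

		if (!task) {
			res = -ENOMEM;
			break;
		}
		res = send(arg);
		free(task);
		if (res <= 0)
			break;
	}
	pthread_mutex_unlock(&ex->cmd_mutex);

	return res;
}
```

Replacing "return -ENOMEM" and "goto ex_err" with break is what keeps
the lock and unlock balanced without adding an unlock to each exit.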
 drivers/scsi/libsas/sas_discover.c |    1 +
 drivers/scsi/libsas/sas_expander.c |   29 ++++++++++++++++-------------
 include/scsi/libsas.h              |    2 ++
 3 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
index 3905143..6e5fdfd 100644
--- a/drivers/scsi/libsas/sas_discover.c
+++ b/drivers/scsi/libsas/sas_discover.c
@@ -43,6 +43,7 @@ void sas_init_dev(struct domain_device *dev)
         case EDGE_DEV:
         case FANOUT_DEV:
                 INIT_LIST_HEAD(&dev->ex_dev.children);
+		mutex_init(&dev->ex_dev.cmd_mutex);
                 break;
         case SATA_DEV:
         case SATA_PM:
diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index 9d2bb32..fd77ea3 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -72,11 +72,13 @@ static int smp_execute_task(struct domain_device *dev, void *req, int req_size,
 	struct sas_internal *i =
 		to_sas_internal(dev->port->ha->core.shost->transportt);
 
+	mutex_lock(&dev->ex_dev.cmd_mutex);
 	for (retry = 0; retry < 3; retry++) {
 		task = sas_alloc_task(GFP_KERNEL);
-		if (!task)
-			return -ENOMEM;
-
+		if (!task) {
+			res = -ENOMEM;
+			break;
+		}
 		task->dev = dev;
 		task->task_proto = dev->tproto;
 		sg_init_one(&task->smp_task.smp_req, req, req_size);
@@ -94,7 +96,7 @@ static int smp_execute_task(struct domain_device *dev, void *req, int req_size,
 		if (res) {
 			del_timer(&task->timer);
 			SAS_DPRINTK("executing SMP task failed:%d\n", res);
-			goto ex_err;
+			break;
 		}
 
 		wait_for_completion(&task->completion);
@@ -104,21 +106,23 @@ static int smp_execute_task(struct domain_device *dev, void *req, int req_size,
 			i->dft->lldd_abort_task(task);
 			if (!(task->task_state_flags & SAS_TASK_STATE_DONE)) {
 				SAS_DPRINTK("SMP task aborted and not done\n");
-				goto ex_err;
+				break;
 			}
 		}
 		if (task->task_status.resp == SAS_TASK_COMPLETE &&
 		    task->task_status.stat == SAM_STAT_GOOD) {
 			res = 0;
 			break;
-		} if (task->task_status.resp == SAS_TASK_COMPLETE &&
-		      task->task_status.stat == SAS_DATA_UNDERRUN) {
+		}
+		if (task->task_status.resp == SAS_TASK_COMPLETE &&
+		    task->task_status.stat == SAS_DATA_UNDERRUN) {
 			/* no error, but return the number of bytes of
 			 * underrun */
 			res = task->task_status.residual;
 			break;
-		} if (task->task_status.resp == SAS_TASK_COMPLETE &&
-		      task->task_status.stat == SAS_DATA_OVERRUN) {
+		}
+		if (task->task_status.resp == SAS_TASK_COMPLETE &&
+		    task->task_status.stat == SAS_DATA_OVERRUN) {
 			res = -EMSGSIZE;
 			break;
 		} else {
@@ -131,11 +135,10 @@ static int smp_execute_task(struct domain_device *dev, void *req, int req_size,
 			task = NULL;
 		}
 	}
-ex_err:
+	mutex_unlock(&dev->ex_dev.cmd_mutex);
+
 	BUG_ON(retry == 3 && task != NULL);
-	if (task != NULL) {
-		sas_free_task(task);
-	}
+	sas_free_task(task);
 	return res;
 }
 
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index 7ba7095..6e1c640 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -153,6 +153,8 @@ struct expander_device {
 
 	struct ex_phy *ex_phy;
 	struct sas_port *parent_port;
+
+	struct mutex cmd_mutex;
 };
 
 /* ---------- SATA device ---------- */



* [PATCH v2 23/28] libsas: async ata-eh
  2011-12-23  2:58 [PATCH v2 00/28] libsas: eh reworks (ata-eh vs discovery, races, ...) Dan Williams
                   ` (21 preceding siblings ...)
  2011-12-23  3:00 ` [PATCH v2 22/28] libsas: add mutex for SMP task execution Dan Williams
@ 2011-12-23  3:00 ` Dan Williams
  2011-12-23  3:00 ` [PATCH v2 24/28] libsas: poll for ata device readiness after reset Dan Williams
                   ` (4 subsequent siblings)
  27 siblings, 0 replies; 37+ messages in thread
From: Dan Williams @ 2011-12-23  3:00 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide

Once sas_ata_hard_reset() starts honoring the 'deadline' parameter, a
pathological configuration could take 25 seconds per ata device to
recover, with the devices handled one after another.  Run per-port
recoveries in parallel.
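
The schedule-then-synchronize shape can be sketched in userspace as
below.  This is only an illustration under stated assumptions: the
fake_* names are hypothetical, pthreads stand in for the kernel async
infrastructure, and the per-port handler is a stub.  The point is that
all recoveries are started before any of them is waited for, so the
total wait is bounded by the slowest port rather than the sum.

```c
#include <pthread.h>

#define FAKE_MAX_PORTS 16

/* Hypothetical per-port state; not libata's struct ata_port. */
struct fake_port {
	int recovered;
};

/* Stub for one port's error handler; it only touches its own port. */
static void *fake_port_eh(void *arg)
{
	struct fake_port *port = arg;

	port->recovered = 1;
	return NULL;
}

/*
 * Start recovery for every port, then wait for all of them.
 * Returns the number of ports that were recovered.
 */
static int fake_recover_all(struct fake_port *ports, int nports)
{
	pthread_t tid[FAKE_MAX_PORTS];
	int started[FAKE_MAX_PORTS];
	int i, done = 0;

	if (nports > FAKE_MAX_PORTS)
		nports = FAKE_MAX_PORTS;

	/* schedule phase: kick off every port without waiting */
	for (i = 0; i < nports; i++) {
		ports[i].recovered = 0;
		started[i] = pthread_create(&tid[i], NULL, fake_port_eh,
					    &ports[i]) == 0;
		if (!started[i])
			fake_port_eh(&ports[i]);	/* run inline instead */
	}

	/* synchronize phase: nothing returns until every port is done */
	for (i = 0; i < nports; i++) {
		if (started[i])
			pthread_join(tid[i], NULL);
		done += ports[i].recovered;
	}
	return done;
}
```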

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/libsas/sas_ata.c |   17 ++++++++++++++---
 1 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
index 3b7c362..0c67577 100644
--- a/drivers/scsi/libsas/sas_ata.c
+++ b/drivers/scsi/libsas/sas_ata.c
@@ -23,6 +23,7 @@
 
 #include <linux/scatterlist.h>
 #include <linux/slab.h>
+#include <linux/async.h>
 
 #include <scsi/sas_ata.h>
 #include "sas_internal.h"
@@ -600,22 +601,32 @@ int sas_discover_sata(struct domain_device *dev)
 	return 0;
 }
 
+static void async_sas_ata_eh(void *data, async_cookie_t cookie)
+{
+	struct domain_device *dev = data;
+	struct ata_port *ap = dev->sata_dev.ap;
+	struct sas_ha_struct *ha = dev->port->ha;
+
+	ata_port_printk(ap, KERN_DEBUG, "sas eh calling libata port error handler");
+	ata_scsi_port_error_handler(ha->core.shost, ap);
+}
+
 void sas_ata_strategy_handler(struct Scsi_Host *shost)
 {
 	struct scsi_device *sdev;
 	struct sas_ha_struct *sas_ha = SHOST_TO_SAS_HA(shost);
+	LIST_HEAD(async);
 
 	mutex_lock(&sas_ha->eh_mutex);
 	shost_for_each_device(sdev, shost) {
 		struct domain_device *ddev = sdev_to_domain_dev(sdev);
-		struct ata_port *ap = ddev->sata_dev.ap;
 
 		if (!dev_is_sata(ddev))
 			continue;
 
-		ata_port_printk(ap, KERN_DEBUG, "sas eh calling libata port error handler");
-		ata_scsi_port_error_handler(shost, ap);
+		async_schedule_domain(async_sas_ata_eh, ddev, &async);
 	}
+	async_synchronize_full_domain(&async);
 	mutex_unlock(&sas_ha->eh_mutex);
 }
 



* [PATCH v2 24/28] libsas: poll for ata device readiness after reset
  2011-12-23  2:58 [PATCH v2 00/28] libsas: eh reworks (ata-eh vs discovery, races, ...) Dan Williams
                   ` (22 preceding siblings ...)
  2011-12-23  3:00 ` [PATCH v2 23/28] libsas: async ata-eh Dan Williams
@ 2011-12-23  3:00 ` Dan Williams
  2011-12-29  6:18   ` Jack Wang
  2011-12-23  3:00 ` [PATCH v2 25/28] libsas: don't mark expanders as gone when a child device is removed Dan Williams
                   ` (3 subsequent siblings)
  27 siblings, 1 reply; 37+ messages in thread
From: Dan Williams @ 2011-12-23  3:00 UTC (permalink / raw)
  To: linux-scsi; +Cc: Tejun Heo, linux-ide

Use ata_wait_after_reset() to poll for link recovery after a reset.
This, combined with sas_ha->eh_mutex, prevents expander rediscovery from
probing phys in an intermediate state.  Local discovery does not have a
mechanism to filter link status changes during this timeout, so it
remains the responsibility of lldds to prevent premature port teardown.
Once all lldds support ->lldd_ata_check_ready(), however, that could be
used as a gate to local port teardown.

The signature FIS is re-transmitted when the link comes back, so we
should be revalidating the ata device class, but that is left to a future
patch.
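
The polling contract used by the check_ready callbacks in this patch
(positive means ready, zero means keep polling, negative means give up)
can be sketched as follows.  This is a simplified userspace model, not
libata's implementation: the fake_* names are hypothetical, the deadline
is counted in polls instead of jiffies, and returning -EBUSY on expiry
is this sketch's own choice.

```c
#include <errno.h>
#include <stddef.h>

#ifndef ECOMM
#define ECOMM 70	/* fallback for platforms whose errno.h lacks it */
#endif

/* >0: ready, 0: not ready yet, <0: stop waiting with this error */
typedef int (*fake_check_ready_t)(void *ctx);

/*
 * Poll check_ready until it reports ready, reports an error, or
 * max_polls attempts have been made.  Returns 0 when ready, the
 * callback's negative error, or -EBUSY when the deadline expires.
 */
static int fake_wait_ready(fake_check_ready_t check_ready, void *ctx,
			   int max_polls)
{
	int poll;

	for (poll = 0; poll < max_polls; poll++) {
		int ready = check_ready(ctx);

		if (ready > 0)
			return 0;
		if (ready < 0)
			return ready;
		/* a real implementation would sleep between polls */
	}
	return -EBUSY;
}

/* Example callback: the link comes back after a number of polls. */
struct fake_link {
	int polls_until_ready;
};

static int fake_link_check_ready(void *ctx)
{
	struct fake_link *link = ctx;

	if (link->polls_until_ready > 0) {
		link->polls_until_ready--;
		return 0;
	}
	return 1;
}

/* Example callback: the expander is unreachable, so break the wait. */
static int fake_unreachable_check_ready(void *ctx)
{
	(void)ctx;
	return -ECOMM;
}
```

The early negative return is what lets an unreachable expander end the
wait immediately instead of consuming the whole deadline.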

Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/libsas/sas_ata.c      |  104 +++++++++++++++++++++++++-----------
 drivers/scsi/libsas/sas_expander.c |   10 ++-
 drivers/scsi/libsas/sas_internal.h |    3 +
 include/scsi/libsas.h              |    1 
 4 files changed, 83 insertions(+), 35 deletions(-)

diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
index 0c67577..e174a73 100644
--- a/drivers/scsi/libsas/sas_ata.c
+++ b/drivers/scsi/libsas/sas_ata.c
@@ -267,39 +267,84 @@ static bool sas_ata_qc_fill_rtf(struct ata_queued_cmd *qc)
 	return true;
 }
 
-static int sas_ata_hard_reset(struct ata_link *link, unsigned int *class,
-			       unsigned long deadline)
+static struct sas_internal *dev_to_sas_internal(struct domain_device *dev)
+{
+	return to_sas_internal(dev->port->ha->core.shost->transportt);
+}
+
+static int smp_ata_check_ready(struct ata_link *link)
 {
+	int res;
+	u8 addr[8];
 	struct ata_port *ap = link->ap;
 	struct domain_device *dev = ap->private_data;
-	struct sas_internal *i =
-		to_sas_internal(dev->port->ha->core.shost->transportt);
-	int res = TMF_RESP_FUNC_FAILED;
-	int ret = 0;
+	struct domain_device *ex_dev = dev->parent;
+	struct sas_phy *phy = sas_find_local_phy(dev);
 
-	if (i->dft->lldd_I_T_nexus_reset)
-		res = i->dft->lldd_I_T_nexus_reset(dev);
+	res = sas_get_phy_attached_sas_addr(ex_dev, phy->number, addr);
+	/* break the wait early if the expander is unreachable,
+	 * otherwise keep polling
+	 */
+	if (res == -ECOMM)
+		return res;
+	if (res != SMP_RESP_FUNC_ACC || SAS_ADDR(addr) == 0)
+		return 0;
+	else
+		return 1;
+}
 
-	if (res != TMF_RESP_FUNC_COMPLETE) {
-		SAS_DPRINTK("%s: Unable to reset I T nexus?\n", __func__);
-		ret = -EAGAIN;
+static int local_ata_check_ready(struct ata_link *link)
+{
+	struct ata_port *ap = link->ap;
+	struct domain_device *dev = ap->private_data;
+	struct sas_internal *i = dev_to_sas_internal(dev);
+
+	if (i->dft->lldd_ata_check_ready)
+		return i->dft->lldd_ata_check_ready(dev);
+	else {
+		/* lldd's that don't implement 'ready' checking get the
+		 * old default behavior of not coordinating reset
+		 * recovery with libata
+		 */
+		return 1;
 	}
+}
 
+static int sas_ata_hard_reset(struct ata_link *link, unsigned int *class,
+			      unsigned long deadline)
+{
+	int ret = 0, res;
+	struct ata_port *ap = link->ap;
+	int (*check_ready)(struct ata_link *link);
+	struct domain_device *dev = ap->private_data;
+	struct sas_phy *phy = sas_find_local_phy(dev);
+	struct sas_internal *i = dev_to_sas_internal(dev);
+
+	res = i->dft->lldd_I_T_nexus_reset(dev);
+
+	if (res != TMF_RESP_FUNC_COMPLETE)
+		SAS_DPRINTK("%s: Unable to reset ata device?\n", __func__);
+
+	if (scsi_is_sas_phy_local(phy))
+		check_ready = local_ata_check_ready;
+	else
+		check_ready = smp_ata_check_ready;
+
+	ret = ata_wait_after_reset(link, deadline, check_ready);
+	if (ret && ret != -EAGAIN)
+		ata_link_err(link, "COMRESET failed (errno=%d)\n", ret);
+
+	/* XXX: if the class changes during the reset the upper layer
+	 * should be informed, if the device has gone away we assume
+	 * libsas will eventually delete it
+	 */
 	switch (dev->sata_dev.command_set) {
-		case ATA_COMMAND_SET:
-			SAS_DPRINTK("%s: Found ATA device.\n", __func__);
-			*class = ATA_DEV_ATA;
-			break;
-		case ATAPI_COMMAND_SET:
-			SAS_DPRINTK("%s: Found ATAPI device.\n", __func__);
-			*class = ATA_DEV_ATAPI;
-			break;
-		default:
-			SAS_DPRINTK("%s: Unknown SATA command set: %d.\n",
-				    __func__,
-				    dev->sata_dev.command_set);
-			*class = ATA_DEV_UNKNOWN;
-			break;
+	case ATA_COMMAND_SET:
+		*class = ATA_DEV_ATA;
+		break;
+	case ATAPI_COMMAND_SET:
+		*class = ATA_DEV_ATAPI;
+		break;
 	}
 
 	ap->cbl = ATA_CBL_SATA;
@@ -311,8 +356,7 @@ static int sas_ata_soft_reset(struct ata_link *link, unsigned int *class,
 {
 	struct ata_port *ap = link->ap;
 	struct domain_device *dev = ap->private_data;
-	struct sas_internal *i =
-		to_sas_internal(dev->port->ha->core.shost->transportt);
+	struct sas_internal *i = dev_to_sas_internal(dev);
 	int res = TMF_RESP_FUNC_FAILED;
 	int ret = 0;
 
@@ -350,8 +394,7 @@ static int sas_ata_soft_reset(struct ata_link *link, unsigned int *class,
  */
 static void sas_ata_internal_abort(struct sas_task *task)
 {
-	struct sas_internal *si =
-		to_sas_internal(task->dev->port->ha->core.shost->transportt);
+	struct sas_internal *si = dev_to_sas_internal(task->dev);
 	unsigned long flags;
 	int res;
 
@@ -420,8 +463,7 @@ static void sas_ata_post_internal(struct ata_queued_cmd *qc)
 static void sas_ata_set_dmamode(struct ata_port *ap, struct ata_device *ata_dev)
 {
 	struct domain_device *dev = ap->private_data;
-	struct sas_internal *i =
-		to_sas_internal(dev->port->ha->core.shost->transportt);
+	struct sas_internal *i = dev_to_sas_internal(dev);
 
 	if (i->dft->lldd_ata_set_dmamode)
 		i->dft->lldd_ata_set_dmamode(dev);
diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index fd77ea3..5e1eec9 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -125,7 +125,11 @@ static int smp_execute_task(struct domain_device *dev, void *req, int req_size,
 		    task->task_status.stat == SAS_DATA_OVERRUN) {
 			res = -EMSGSIZE;
 			break;
-		} else {
+		}
+		if (task->task_status.resp == SAS_TASK_UNDELIVERED &&
+		    task->task_status.stat == SAS_DEVICE_UNKNOWN)
+			break;
+		else {
 			SAS_DPRINTK("%s: task to dev %016llx response: 0x%x "
 				    "status 0x%x\n", __func__,
 				    SAS_ADDR(dev->sas_addr),
@@ -1648,8 +1652,8 @@ static int sas_get_phy_change_count(struct domain_device *dev,
 	return res;
 }
 
-static int sas_get_phy_attached_sas_addr(struct domain_device *dev,
-					 int phy_id, u8 *attached_sas_addr)
+int sas_get_phy_attached_sas_addr(struct domain_device *dev, int phy_id,
+				  u8 *attached_sas_addr)
 {
 	int res;
 	struct smp_resp *disc_resp;
diff --git a/drivers/scsi/libsas/sas_internal.h b/drivers/scsi/libsas/sas_internal.h
index cde1a84..c6317c1 100644
--- a/drivers/scsi/libsas/sas_internal.h
+++ b/drivers/scsi/libsas/sas_internal.h
@@ -87,7 +87,8 @@ int sas_smp_get_phy_events(struct sas_phy *phy);
 
 struct domain_device *sas_find_dev_by_rphy(struct sas_rphy *rphy);
 struct domain_device *sas_ex_to_ata(struct domain_device *ex_dev, int phy_id);
-
+int sas_get_phy_attached_sas_addr(struct domain_device *dev, int phy_id,
+				  u8 *attached_sas_addr);
 void sas_hae_reset(struct work_struct *work);
 
 void sas_free_device(struct kref *kref);
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index 6e1c640..3c9849c 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -610,6 +610,7 @@ struct sas_domain_function_template {
 	int (*lldd_clear_task_set)(struct domain_device *, u8 *lun);
 	int (*lldd_I_T_nexus_reset)(struct domain_device *);
 	int (*lldd_ata_soft_reset)(struct domain_device *);
+	int (*lldd_ata_check_ready)(struct domain_device *);
 	void (*lldd_ata_set_dmamode)(struct domain_device *);
 	int (*lldd_lu_reset)(struct domain_device *, u8 *lun);
 	int (*lldd_query_task)(struct sas_task *);



* [PATCH v2 25/28] libsas: don't mark expanders as gone when a child device is removed
  2011-12-23  2:58 [PATCH v2 00/28] libsas: eh reworks (ata-eh vs discovery, races, ...) Dan Williams
                   ` (23 preceding siblings ...)
  2011-12-23  3:00 ` [PATCH v2 24/28] libsas: poll for ata device readiness after reset Dan Williams
@ 2011-12-23  3:00 ` Dan Williams
  2011-12-23  3:00 ` [PATCH v2 26/28] libsas: check for 'gone' expanders in smp_execute_task() Dan Williams
                   ` (2 subsequent siblings)
  27 siblings, 0 replies; 37+ messages in thread
From: Dan Williams @ 2011-12-23  3:00 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide

Commit 56dd2c06 "[SCSI] libsas: Don't issue commands to devices that
have been hot-removed" marked the parent device of an end-device as gone
when all the phys to the end device have been deleted.

The expander device, however, is still present until its own parent is
removed, so stop marking it gone.  This change has no functional effect
until the smp_execute_task() path is taught to check ->gone.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/libsas/sas_expander.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index 5e1eec9..d9c2769 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -1820,7 +1820,6 @@ static void sas_unregister_devs_sas_addr(struct domain_device *parent,
 				break;
 			}
 		}
-		parent->gone = 1;
 		sas_disable_routing(parent, phy->attached_sas_addr);
 	}
 	memset(phy->attached_sas_addr, 0, SAS_ADDR_SIZE);



* [PATCH v2 26/28] libsas: check for 'gone' expanders in smp_execute_task()
  2011-12-23  2:58 [PATCH v2 00/28] libsas: eh reworks (ata-eh vs discovery, races, ...) Dan Williams
                   ` (24 preceding siblings ...)
  2011-12-23  3:00 ` [PATCH v2 25/28] libsas: don't mark expanders as gone when a child device is removed Dan Williams
@ 2011-12-23  3:00 ` Dan Williams
  2012-01-09 19:04   ` Dan Williams
  2011-12-23  3:00 ` [PATCH v2 27/28] libsas: fix sas_find_local_phy(), take phy references Dan Williams
  2011-12-23  3:00 ` [PATCH v2 28/28] libsas: don't recover 'gone' devices in sas_ata_hard_reset() Dan Williams
  27 siblings, 1 reply; 37+ messages in thread
From: Dan Williams @ 2011-12-23  3:00 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide

No sense in issuing or retrying commands to an expander that has been
removed.
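
A userspace sketch of the check, for illustration only (the fake_*
names are hypothetical and a pthread mutex stands in for cmd_mutex).
Because the check sits inside a region that holds the command mutex,
the sketch leaves the retry loop with break on every path, so the
unlock after the loop always runs.

```c
#include <errno.h>
#include <pthread.h>

#ifndef ECOMM
#define ECOMM 70	/* fallback for platforms whose errno.h lacks it */
#endif

/* Hypothetical device model; not struct domain_device. */
struct fake_dev {
	pthread_mutex_t cmd_mutex;
	int gone;		/* set once the device has been removed */
	int attempts;		/* requests actually issued */
	int fail_first;		/* how many attempts fail before one succeeds */
};

static void fake_dev_init(struct fake_dev *dev)
{
	pthread_mutex_init(&dev->cmd_mutex, NULL);
	dev->gone = 0;
	dev->attempts = 0;
	dev->fail_first = 0;
}

/* Retry up to three times, but stop as soon as the device is gone. */
static int fake_execute(struct fake_dev *dev)
{
	int retry, res = -EIO;

	pthread_mutex_lock(&dev->cmd_mutex);
	for (retry = 0; retry < 3; retry++) {
		if (dev->gone) {
			res = -ECOMM;
			break;		/* not return: the mutex is held */
		}
		dev->attempts++;
		if (dev->attempts > dev->fail_first) {
			res = 0;
			break;
		}
	}
	pthread_mutex_unlock(&dev->cmd_mutex);
	return res;
}

/* Returns 1 if the mutex is free, i.e. the last caller released it. */
static int fake_mutex_is_free(struct fake_dev *dev)
{
	if (pthread_mutex_trylock(&dev->cmd_mutex) != 0)
		return 0;
	pthread_mutex_unlock(&dev->cmd_mutex);
	return 1;
}
```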

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/libsas/sas_expander.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index d9c2769..e2efc6c 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -74,6 +74,9 @@ static int smp_execute_task(struct domain_device *dev, void *req, int req_size,
 
 	mutex_lock(&dev->ex_dev.cmd_mutex);
 	for (retry = 0; retry < 3; retry++) {
+		res = -ECOMM;
+		if (dev->gone)
+			break;
 		task = sas_alloc_task(GFP_KERNEL);
 		if (!task) {
 			res = -ENOMEM;



* [PATCH v2 27/28] libsas: fix sas_find_local_phy(), take phy references
  2011-12-23  2:58 [PATCH v2 00/28] libsas: eh reworks (ata-eh vs discovery, races, ...) Dan Williams
                   ` (25 preceding siblings ...)
  2011-12-23  3:00 ` [PATCH v2 26/28] libsas: check for 'gone' expanders in smp_execute_task() Dan Williams
@ 2011-12-23  3:00 ` Dan Williams
  2011-12-27  9:21   ` Jack Wang
  2011-12-23  3:00 ` [PATCH v2 28/28] libsas: don't recover 'gone' devices in sas_ata_hard_reset() Dan Williams
  27 siblings, 1 reply; 37+ messages in thread
From: Dan Williams @ 2011-12-23  3:00 UTC (permalink / raw)
  To: linux-scsi; +Cc: Xiangliang Yu, linux-ide, Luben Tuikov, Jack Wang

In the direct-attached case this routine returns the phy on which this
device was first discovered.  That is broken if we want to support
wide-targets, as this phy reference can become stale even though the
port is still active.

In the expander-attached case this routine tries to look up the phy by
scanning the attached sas addresses of the parent expander, and BUG_ONs
if it can't find it.  However, since eh and the libsas workqueue run
independently, we can still be attempting device recovery via eh after
libsas has recorded the device as detached.  This is even easier to hit
now that eh is blocked while device domain rediscovery takes place, and
now that libata is fed more timed out commands, increasing the chances
that it will try to recover the ata device.

Arrange for dev->phy to always point to a last known good phy.  It may
be stale after the port is torn down, but it will catch up for wide port
reconfigurations, and never be NULL.
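
The "last known good phy" bookkeeping can be sketched in userspace as
below.  This is an illustration under assumptions, not the patch's
code: the fake_* names are hypothetical, a pthread mutex stands in for
phy_port_lock, and a C11 atomic counter stands in for the device
reference count (nothing is freed when it drops).  The device pins one
phy at a time, replaces it only when the port still has a member to
offer, and hands out an extra reference to each caller.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical phy with a bare reference count. */
struct fake_phy {
	atomic_int refcount;
	int number;
};

/* Hypothetical device that remembers its last known good phy. */
struct fake_device {
	pthread_mutex_t lock;	/* protects the phy pointer */
	struct fake_phy *phy;	/* pinned by one reference while set */
};

static void fake_phy_init(struct fake_phy *phy, int number)
{
	atomic_init(&phy->refcount, 1);	/* the owner's reference */
	phy->number = number;
}

static void fake_device_init(struct fake_device *dev)
{
	pthread_mutex_init(&dev->lock, NULL);
	dev->phy = NULL;
}

static struct fake_phy *fake_phy_get(struct fake_phy *phy)
{
	if (phy)
		atomic_fetch_add(&phy->refcount, 1);
	return phy;
}

static void fake_phy_put(struct fake_phy *phy)
{
	if (phy)
		atomic_fetch_sub(&phy->refcount, 1);
}

/*
 * Record candidate as the last seen phy.  A NULL candidate (the port
 * has no members left) leaves the old, possibly stale, phy in place.
 */
static void fake_device_set_phy(struct fake_device *dev,
				struct fake_phy *candidate)
{
	struct fake_phy *new_phy = fake_phy_get(candidate);

	if (!new_phy)
		return;
	pthread_mutex_lock(&dev->lock);
	fake_phy_put(dev->phy);		/* drop the pin on the old phy */
	dev->phy = new_phy;
	pthread_mutex_unlock(&dev->lock);
}

/* Take a reference for the caller; pair with fake_phy_put(). */
static struct fake_phy *fake_get_local_phy(struct fake_device *dev)
{
	struct fake_phy *phy;

	pthread_mutex_lock(&dev->lock);
	phy = fake_phy_get(dev->phy);
	pthread_mutex_unlock(&dev->lock);
	return phy;
}
```

A caller that obtained the phy before a reconfiguration keeps a valid
reference to the old phy until it drops it, even though the device has
already moved on to a new one.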

Q: How is pm8001_I_T_nexus_reset getting away with not performing reset
   on direct attached sata devices?

Cc: Jack Wang <jack_wang@usish.com>
Cc: Xiangliang Yu <yuxiangl@marvell.com>
Cc: Luben Tuikov <ltuikov@yahoo.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/aic94xx/aic94xx_tmf.c  |    9 ++++++--
 drivers/scsi/isci/task.c            |    9 +++++---
 drivers/scsi/libsas/sas_ata.c       |    7 +++++-
 drivers/scsi/libsas/sas_discover.c  |   24 ++++++++++++++++++++++
 drivers/scsi/libsas/sas_expander.c  |    5 ++++-
 drivers/scsi/libsas/sas_internal.h  |    1 +
 drivers/scsi/libsas/sas_port.c      |    7 +++---
 drivers/scsi/libsas/sas_scsi_host.c |   38 +++++++++++++++++------------------
 drivers/scsi/mvsas/mv_sas.c         |    3 ++-
 drivers/scsi/pm8001/pm8001_sas.c    |   19 +++++++++++-------
 drivers/scsi/scsi_transport_sas.c   |   23 +++++++++++++++++++++
 include/scsi/libsas.h               |    9 ++++++--
 include/scsi/scsi_transport_sas.h   |    6 ++++++
 13 files changed, 116 insertions(+), 44 deletions(-)

diff --git a/drivers/scsi/aic94xx/aic94xx_tmf.c b/drivers/scsi/aic94xx/aic94xx_tmf.c
index 0add73b..50b914f 100644
--- a/drivers/scsi/aic94xx/aic94xx_tmf.c
+++ b/drivers/scsi/aic94xx/aic94xx_tmf.c
@@ -181,7 +181,7 @@ static int asd_clear_nexus_I_T(struct domain_device *dev,
 int asd_I_T_nexus_reset(struct domain_device *dev)
 {
 	int res, tmp_res, i;
-	struct sas_phy *phy = sas_find_local_phy(dev);
+	struct sas_phy *phy = sas_get_local_phy(dev);
 	/* Standard mandates link reset for ATA  (type 0) and
 	 * hard reset for SSP (type 1) */
 	int reset_type = (dev->dev_type == SATA_DEV ||
@@ -201,7 +201,7 @@ int asd_I_T_nexus_reset(struct domain_device *dev)
 	for (i = 0 ; i < 3; i++) {
 		tmp_res = asd_clear_nexus_I_T(dev, NEXUS_PHASE_RESUME);
 		if (tmp_res == TC_RESUME)
-			return res;
+			goto out;
 		msleep(500);
 	}
 
@@ -211,7 +211,10 @@ int asd_I_T_nexus_reset(struct domain_device *dev)
 	dev_printk(KERN_ERR, &phy->dev,
 		   "Failed to resume nexus after reset 0x%x\n", tmp_res);
 
-	return TMF_RESP_FUNC_FAILED;
+	res = TMF_RESP_FUNC_FAILED;
+ out:
+	sas_put_local_phy(phy);
+	return res;
 }
 
 static int asd_clear_nexus_I_T_L(struct domain_device *dev, u8 *lun)
diff --git a/drivers/scsi/isci/task.c b/drivers/scsi/isci/task.c
index 5901a0e..a6ab49a 100644
--- a/drivers/scsi/isci/task.c
+++ b/drivers/scsi/isci/task.c
@@ -1332,7 +1332,7 @@ isci_task_request_complete(struct isci_host *ihost,
 static int isci_reset_device(struct isci_host *ihost,
 			     struct isci_remote_device *idev)
 {
-	struct sas_phy *phy = sas_find_local_phy(idev->domain_dev);
+	struct sas_phy *phy = sas_get_local_phy(idev->domain_dev);
 	enum sci_status status;
 	unsigned long flags;
 	int rc;
@@ -1347,8 +1347,8 @@ static int isci_reset_device(struct isci_host *ihost,
 		dev_dbg(&ihost->pdev->dev,
 			 "%s: sci_remote_device_reset(%p) returned %d!\n",
 			 __func__, idev, status);
-
-		return TMF_RESP_FUNC_FAILED;
+		rc = TMF_RESP_FUNC_FAILED;
+		goto out;
 	}
 	spin_unlock_irqrestore(&ihost->scic_lock, flags);
 
@@ -1369,7 +1369,8 @@ static int isci_reset_device(struct isci_host *ihost,
 	}
 
 	dev_dbg(&ihost->pdev->dev, "%s: idev %p complete.\n", __func__, idev);
-
+ out:
+	sas_put_local_phy(phy);
 	return rc;
 }
 
diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
index e174a73..96f316f 100644
--- a/drivers/scsi/libsas/sas_ata.c
+++ b/drivers/scsi/libsas/sas_ata.c
@@ -279,9 +279,10 @@ static int smp_ata_check_ready(struct ata_link *link)
 	struct ata_port *ap = link->ap;
 	struct domain_device *dev = ap->private_data;
 	struct domain_device *ex_dev = dev->parent;
-	struct sas_phy *phy = sas_find_local_phy(dev);
+	struct sas_phy *phy = sas_get_local_phy(dev);
 
 	res = sas_get_phy_attached_sas_addr(ex_dev, phy->number, addr);
+	sas_put_local_phy(phy);
 	/* break the wait early if the expander is unreachable,
 	 * otherwise keep polling
 	 */
@@ -314,10 +315,10 @@ static int sas_ata_hard_reset(struct ata_link *link, unsigned int *class,
 			      unsigned long deadline)
 {
 	int ret = 0, res;
+	struct sas_phy *phy;
 	struct ata_port *ap = link->ap;
 	int (*check_ready)(struct ata_link *link);
 	struct domain_device *dev = ap->private_data;
-	struct sas_phy *phy = sas_find_local_phy(dev);
 	struct sas_internal *i = dev_to_sas_internal(dev);
 
 	res = i->dft->lldd_I_T_nexus_reset(dev);
@@ -325,10 +326,12 @@ static int sas_ata_hard_reset(struct ata_link *link, unsigned int *class,
 	if (res != TMF_RESP_FUNC_COMPLETE)
 		SAS_DPRINTK("%s: Unable to reset ata device?\n", __func__);
 
+	phy = sas_get_local_phy(dev);
 	if (scsi_is_sas_phy_local(phy))
 		check_ready = local_ata_check_ready;
 	else
 		check_ready = smp_ata_check_ready;
+	sas_put_local_phy(phy);
 
 	ret = ata_wait_after_reset(link, deadline, check_ready);
 	if (ret && ret != -EAGAIN)
diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
index 6e5fdfd..c765218 100644
--- a/drivers/scsi/libsas/sas_discover.c
+++ b/drivers/scsi/libsas/sas_discover.c
@@ -147,6 +147,7 @@ static int sas_get_port_device(struct asd_sas_port *port)
 	memset(port->disc.eeds_a, 0, SAS_ADDR_SIZE);
 	memset(port->disc.eeds_b, 0, SAS_ADDR_SIZE);
 	port->disc.max_level = 0;
+	sas_device_set_phy(dev, port->port);
 
 	dev->rphy = rphy;
 
@@ -234,6 +235,9 @@ void sas_free_device(struct kref *kref)
 	if (dev->parent)
 		sas_put_device(dev->parent);
 
+	sas_port_put_phy(dev->phy);
+	dev->phy = NULL;
+
 	/* remove the phys and ports, everything else should be gone */
 	if (dev->dev_type == EDGE_DEV || dev->dev_type == FANOUT_DEV)
 		kfree(dev->ex_dev.ex_phy);
@@ -307,6 +311,26 @@ void sas_unregister_domain_devices(struct asd_sas_port *port)
 
 }
 
+void sas_device_set_phy(struct domain_device *dev, struct sas_port *port)
+{
+	struct sas_ha_struct *ha;
+	struct sas_phy *new_phy;
+
+	if (!dev)
+		return;
+
+	ha = dev->port->ha;
+	new_phy = sas_port_get_phy(port);
+
+	/* pin and record last seen phy */
+	spin_lock_irq(&ha->phy_port_lock);
+	if (new_phy) {
+		sas_port_put_phy(dev->phy);
+		dev->phy = new_phy;
+	}
+	spin_unlock_irq(&ha->phy_port_lock);
+}
+
 /* ---------- Discovery and Revalidation ---------- */
 
 /**
diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index e2efc6c..e47599b 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -721,6 +721,7 @@ static struct domain_device *sas_ex_discover_end_dev(
 		}
 	}
 	sas_ex_get_linkrate(parent, child, phy);
+	sas_device_set_phy(child, phy->port);
 
 #ifdef CONFIG_SCSI_SAS_ATA
 	if ((phy->attached_tproto & SAS_PROTOCOL_STP) || phy->attached_sata_dev) {
@@ -1808,7 +1809,7 @@ static void sas_unregister_devs_sas_addr(struct domain_device *parent,
 {
 	struct expander_device *ex_dev = &parent->ex_dev;
 	struct ex_phy *phy = &ex_dev->ex_phy[phy_id];
-	struct domain_device *child, *n;
+	struct domain_device *child, *n, *found = NULL;
 	if (last) {
 		list_for_each_entry_safe(child, n,
 			&ex_dev->children, siblings) {
@@ -1820,6 +1821,7 @@ static void sas_unregister_devs_sas_addr(struct domain_device *parent,
 					sas_unregister_ex_tree(parent->port, child);
 				else
 					sas_unregister_dev(parent->port, child);
+				found = child;
 				break;
 			}
 		}
@@ -1828,6 +1830,7 @@ static void sas_unregister_devs_sas_addr(struct domain_device *parent,
 	memset(phy->attached_sas_addr, 0, SAS_ADDR_SIZE);
 	if (phy->port) {
 		sas_port_delete_phy(phy->port, phy->phy);
+		sas_device_set_phy(found, phy->port);
 		if (phy->port->num_phys == 0)
 			sas_port_delete(phy->port);
 		phy->port = NULL;
diff --git a/drivers/scsi/libsas/sas_internal.h b/drivers/scsi/libsas/sas_internal.h
index c6317c1..eec945e 100644
--- a/drivers/scsi/libsas/sas_internal.h
+++ b/drivers/scsi/libsas/sas_internal.h
@@ -85,6 +85,7 @@ int sas_smp_phy_control(struct domain_device *dev, int phy_id,
 			enum phy_func phy_func, struct sas_phy_linkrates *);
 int sas_smp_get_phy_events(struct sas_phy *phy);
 
+void sas_device_set_phy(struct domain_device *dev, struct sas_port *port);
 struct domain_device *sas_find_dev_by_rphy(struct sas_rphy *rphy);
 struct domain_device *sas_ex_to_ata(struct domain_device *ex_dev, int phy_id);
 int sas_get_phy_attached_sas_addr(struct domain_device *dev, int phy_id,
diff --git a/drivers/scsi/libsas/sas_port.c b/drivers/scsi/libsas/sas_port.c
index e8e68d0..36e2905 100644
--- a/drivers/scsi/libsas/sas_port.c
+++ b/drivers/scsi/libsas/sas_port.c
@@ -108,9 +108,6 @@ static void sas_form_port(struct asd_sas_phy *phy)
 	port->num_phys++;
 	port->phy_mask |= (1U << phy->id);
 
-	if (!port->phy)
-		port->phy = phy->phy;
-
 	if (*(u64 *)port->attached_sas_addr == 0) {
 		port->class = phy->class;
 		memcpy(port->attached_sas_addr, phy->attached_sas_addr,
@@ -175,8 +172,10 @@ void sas_deform_port(struct asd_sas_phy *phy, int gone)
 		sas_unregister_domain_devices(port);
 		sas_port_delete(port->port);
 		port->port = NULL;
-	} else
+	} else {
 		sas_port_delete_phy(port->port, phy->phy);
+		sas_device_set_phy(dev, port->port);
+	}
 
 	if (si->dft->lldd_port_deformed)
 		si->dft->lldd_port_deformed(phy);
diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index b849dcd..59a227d 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -439,30 +439,26 @@ static int sas_recover_I_T(struct domain_device *dev)
 	return res;
 }
 
-/* Find the sas_phy that's attached to this device */
-struct sas_phy *sas_find_local_phy(struct domain_device *dev)
+/* take a reference on the last known good phy for this device */
+struct sas_phy *sas_get_local_phy(struct domain_device *dev)
 {
-	struct domain_device *pdev = dev->parent;
-	struct ex_phy *exphy = NULL;
-	int i;
+	struct sas_ha_struct *ha = dev->port->ha;
+	struct sas_phy *phy;
+	unsigned long flags;
 
-	/* Directly attached device */
-	if (!pdev)
-		return dev->port->phy;
+	/* a published domain device always has a valid phy, it may be
+	 * stale, but it is never NULL
+	 */
+	BUG_ON(!dev->phy);
 
-	/* Otherwise look in the expander */
-	for (i = 0; i < pdev->ex_dev.num_phys; i++)
-		if (!memcmp(dev->sas_addr,
-			    pdev->ex_dev.ex_phy[i].attached_sas_addr,
-			    SAS_ADDR_SIZE)) {
-			exphy = &pdev->ex_dev.ex_phy[i];
-			break;
-		}
+	spin_lock_irqsave(&ha->phy_port_lock, flags);
+	phy = dev->phy;
+	get_device(&phy->dev);
+	spin_unlock_irqrestore(&ha->phy_port_lock, flags);
 
-	BUG_ON(!exphy);
-	return exphy->phy;
+	return phy;
 }
-EXPORT_SYMBOL_GPL(sas_find_local_phy);
+EXPORT_SYMBOL_GPL(sas_get_local_phy);
 
 /* Attempt to send a LUN reset message to a device */
 int sas_eh_device_reset_handler(struct scsi_cmnd *cmd)
@@ -489,7 +485,7 @@ int sas_eh_device_reset_handler(struct scsi_cmnd *cmd)
 int sas_eh_bus_reset_handler(struct scsi_cmnd *cmd)
 {
 	struct domain_device *dev = cmd_to_domain_dev(cmd);
-	struct sas_phy *phy = sas_find_local_phy(dev);
+	struct sas_phy *phy = sas_get_local_phy(dev);
 	int res;
 
 	res = sas_phy_reset(phy, 1);
@@ -497,6 +493,8 @@ int sas_eh_bus_reset_handler(struct scsi_cmnd *cmd)
 		SAS_DPRINTK("Bus reset of %s failed 0x%x\n",
 			    kobject_name(&phy->dev.kobj),
 			    res);
+	sas_put_local_phy(phy);
+
 	if (res == TMF_RESP_FUNC_SUCC || res == TMF_RESP_FUNC_COMPLETE)
 		return SUCCESS;
 
diff --git a/drivers/scsi/mvsas/mv_sas.c b/drivers/scsi/mvsas/mv_sas.c
index cd88223..b68a653 100644
--- a/drivers/scsi/mvsas/mv_sas.c
+++ b/drivers/scsi/mvsas/mv_sas.c
@@ -1474,10 +1474,11 @@ static int mvs_debug_issue_ssp_tmf(struct domain_device *dev,
 static int mvs_debug_I_T_nexus_reset(struct domain_device *dev)
 {
 	int rc;
-	struct sas_phy *phy = sas_find_local_phy(dev);
+	struct sas_phy *phy = sas_get_local_phy(dev);
 	int reset_type = (dev->dev_type == SATA_DEV ||
 			(dev->tproto & SAS_PROTOCOL_STP)) ? 0 : 1;
 	rc = sas_phy_reset(phy, reset_type);
+	sas_put_local_phy(phy);
 	msleep(2000);
 	return rc;
 }
diff --git a/drivers/scsi/pm8001/pm8001_sas.c b/drivers/scsi/pm8001/pm8001_sas.c
index 5add18c..b111018 100644
--- a/drivers/scsi/pm8001/pm8001_sas.c
+++ b/drivers/scsi/pm8001/pm8001_sas.c
@@ -873,12 +873,14 @@ int pm8001_I_T_nexus_reset(struct domain_device *dev)
 
 	pm8001_dev = dev->lldd_dev;
 	pm8001_ha = pm8001_find_ha_by_dev(dev);
-	phy = sas_find_local_phy(dev);
+	phy = sas_get_local_phy(dev);
 
 	if (dev_is_sata(dev)) {
 		DECLARE_COMPLETION_ONSTACK(completion_setstate);
-		if (scsi_is_sas_phy_local(phy))
-			return 0;
+		if (scsi_is_sas_phy_local(phy)) {
+			rc = 0;
+			goto out;
+		}
 		rc = sas_phy_reset(phy, 1);
 		msleep(2000);
 		rc = pm8001_exec_internal_task_abort(pm8001_ha, pm8001_dev ,
@@ -887,12 +889,14 @@ int pm8001_I_T_nexus_reset(struct domain_device *dev)
 		rc = PM8001_CHIP_DISP->set_dev_state_req(pm8001_ha,
 			pm8001_dev, 0x01);
 		wait_for_completion(&completion_setstate);
-	} else{
-	rc = sas_phy_reset(phy, 1);
-	msleep(2000);
+	} else {
+		rc = sas_phy_reset(phy, 1);
+		msleep(2000);
 	}
 	PM8001_EH_DBG(pm8001_ha, pm8001_printk(" for device[%x]:rc=%d\n",
 		pm8001_dev->device_id, rc));
+ out:
+	sas_put_local_phy(phy);
 	return rc;
 }
 
@@ -904,10 +908,11 @@ int pm8001_lu_reset(struct domain_device *dev, u8 *lun)
 	struct pm8001_device *pm8001_dev = dev->lldd_dev;
 	struct pm8001_hba_info *pm8001_ha = pm8001_find_ha_by_dev(dev);
 	if (dev_is_sata(dev)) {
-		struct sas_phy *phy = sas_find_local_phy(dev);
+		struct sas_phy *phy = sas_get_local_phy(dev);
 		rc = pm8001_exec_internal_task_abort(pm8001_ha, pm8001_dev ,
 			dev, 1, 0);
 		rc = sas_phy_reset(phy, 1);
+		sas_put_local_phy(phy);
 		rc = PM8001_CHIP_DISP->set_dev_state_req(pm8001_ha,
 			pm8001_dev, 0x01);
 		msleep(2000);
diff --git a/drivers/scsi/scsi_transport_sas.c b/drivers/scsi/scsi_transport_sas.c
index ab3bd0b..7d69a25 100644
--- a/drivers/scsi/scsi_transport_sas.c
+++ b/drivers/scsi/scsi_transport_sas.c
@@ -1060,6 +1060,29 @@ int scsi_is_sas_port(const struct device *dev)
 EXPORT_SYMBOL(scsi_is_sas_port);
 
 /**
+ * sas_port_get_phy - try to take a reference on a port member
+ * @port: port to check
+ */
+struct sas_phy *sas_port_get_phy(struct sas_port *port)
+{
+	struct sas_phy *phy;
+
+	mutex_lock(&port->phy_list_mutex);
+	if (list_empty(&port->phy_list))
+		phy = NULL;
+	else {
+		struct list_head *ent = port->phy_list.next;
+
+		phy = list_entry(ent, typeof(*phy), port_siblings);
+		get_device(&phy->dev);
+	}
+	mutex_unlock(&port->phy_list_mutex);
+
+	return phy;
+}
+EXPORT_SYMBOL(sas_port_get_phy);
+
+/**
  * sas_port_add_phy - add another phy to a port to form a wide port
  * @port:	port to add the phy to
  * @phy:	phy to add
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index 3c9849c..571e7fc 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -188,6 +188,7 @@ struct domain_device {
         struct domain_device *parent;
         struct list_head siblings; /* devices on the same level */
         struct asd_sas_port *port;        /* shortcut to root of the tree */
+	struct sas_phy *phy;
 
         struct list_head dev_list_node;
 	struct list_head disco_list_node;
@@ -239,7 +240,6 @@ struct asd_sas_port {
 	struct list_head destroy_list;
 	enum   sas_linkrate linkrate;
 
-	struct sas_phy *phy;
 	struct work_struct work;
 
 /* public: */
@@ -424,6 +424,11 @@ static inline unsigned int to_sas_gpio_od(int device, int bit)
 	return 3 * device + bit;
 }
 
+static inline void sas_put_local_phy(struct sas_phy *phy)
+{
+	put_device(&phy->dev);
+}
+
 #ifdef CONFIG_SCSI_SAS_HOST_SMP
 int try_test_sas_gpio_gp_bit(unsigned int od, u8 *data, u8 index, u8 count);
 #else
@@ -679,7 +684,7 @@ extern int sas_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,
 
 extern void sas_ssp_task_response(struct device *dev, struct sas_task *task,
 				  struct ssp_response_iu *iu);
-struct sas_phy *sas_find_local_phy(struct domain_device *dev);
+struct sas_phy *sas_get_local_phy(struct domain_device *dev);
 
 int sas_request_addr(struct Scsi_Host *shost, u8 *addr);
 
diff --git a/include/scsi/scsi_transport_sas.h b/include/scsi/scsi_transport_sas.h
index 42817fa..98b3a20 100644
--- a/include/scsi/scsi_transport_sas.h
+++ b/include/scsi/scsi_transport_sas.h
@@ -209,6 +209,12 @@ void sas_port_add_phy(struct sas_port *, struct sas_phy *);
 void sas_port_delete_phy(struct sas_port *, struct sas_phy *);
 void sas_port_mark_backlink(struct sas_port *);
 int scsi_is_sas_port(const struct device *);
+struct sas_phy *sas_port_get_phy(struct sas_port *port);
+static inline void sas_port_put_phy(struct sas_phy *phy)
+{
+	if (phy)
+		put_device(&phy->dev);
+}
 
 extern struct scsi_transport_template *
 sas_attach_transport(struct sas_function_template *);
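
The get/put pairing that this patch adds to each LLDD reset handler can be modeled in userspace. The following is a minimal sketch, not the kernel code: the ref_* names are hypothetical stand-ins for struct sas_phy, sas_get_local_phy() and sas_put_local_phy(), the refcount is a plain int rather than a struct device reference, and the phy_port_lock locking is omitted. It shows why the early "return 0" in pm8001_I_T_nexus_reset() had to become "goto out".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sas_phy. */
struct ref_phy {
	int refcount;	/* models the embedded struct device refcount */
	int is_local;	/* models scsi_is_sas_phy_local() */
	int resets;	/* resets issued, for illustration only */
};

/* Hypothetical stand-in for struct domain_device. */
struct ref_dev {
	struct ref_phy *phy;	/* pinned "last known good" phy */
	int is_sata;
};

/* Models sas_get_local_phy(): the caller owns one reference on return. */
static struct ref_phy *ref_get_local_phy(struct ref_dev *dev)
{
	dev->phy->refcount++;
	return dev->phy;
}

/* Models sas_put_local_phy(): drops the caller's reference. */
static void ref_put_local_phy(struct ref_phy *phy)
{
	phy->refcount--;
}

/* Shaped like the reworked reset handler: one exit path drops the ref. */
static int ref_nexus_reset(struct ref_dev *dev)
{
	struct ref_phy *phy = ref_get_local_phy(dev);
	int rc;

	if (dev->is_sata && phy->is_local) {
		rc = 0;		/* early exit must still release the phy */
		goto out;
	}
	phy->resets++;		/* stands in for sas_phy_reset() */
	rc = 0;
out:
	ref_put_local_phy(phy);
	return rc;
}
```

Whichever branch is taken, the refcount after the call equals the refcount before it, which is the invariant each converted caller in the patch has to preserve.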



* [PATCH v2 28/28] libsas: don't recover 'gone' devices in sas_ata_hard_reset()
  2011-12-23  2:58 [PATCH v2 00/28] libsas: eh reworks (ata-eh vs discovery, races, ...) Dan Williams
                   ` (26 preceding siblings ...)
  2011-12-23  3:00 ` [PATCH v2 27/28] libsas: fix sas_find_local_phy(), take phy references Dan Williams
@ 2011-12-23  3:00 ` Dan Williams
  2011-12-27  9:23   ` Jack Wang
  27 siblings, 1 reply; 37+ messages in thread
From: Dan Williams @ 2011-12-23  3:00 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide

Commands that time out when a disk is forcibly removed may trigger
libata to attempt recovery of the device.  If libsas has decided to
remove the device, don't permit libata to continue issuing resets to its
last known phy.

The primary motivation for this patch is hotplug testing by writing 0 to
/sys/class/sas_phy/phyX/enable.  Without this check, the test leads to
libata issuing a reset and re-enabling the device that is supposed to be
torn down.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/libsas/sas_ata.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
index 96f316f..2814731 100644
--- a/drivers/scsi/libsas/sas_ata.c
+++ b/drivers/scsi/libsas/sas_ata.c
@@ -321,6 +321,9 @@ static int sas_ata_hard_reset(struct ata_link *link, unsigned int *class,
 	struct domain_device *dev = ap->private_data;
 	struct sas_internal *i = dev_to_sas_internal(dev);
 
+	if (dev->gone)
+		return -ENODEV;
+
 	res = i->dft->lldd_I_T_nexus_reset(dev);
 
 	if (res != TMF_RESP_FUNC_COMPLETE)
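
The effect of the three added lines can be sketched in isolation. This is a userspace model under stated assumptions: gone_dev, gone_lldd_reset() and gone_hard_reset() are hypothetical names, and the model only captures the ordering (check dev->gone before calling into the LLDD), not the rest of sas_ata_hard_reset().

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the parts of struct domain_device used here. */
struct gone_dev {
	int gone;		/* set once libsas has decided to remove the device */
	int resets_issued;	/* counts calls into the (modeled) LLDD reset */
};

/* Stands in for i->dft->lldd_I_T_nexus_reset(dev). */
static int gone_lldd_reset(struct gone_dev *dev)
{
	dev->resets_issued++;
	return 0;
}

/* Models the guarded entry of the hard reset handler. */
static int gone_hard_reset(struct gone_dev *dev)
{
	if (dev->gone)
		return -ENODEV;	/* never touch the phy of a departing device */

	return gone_lldd_reset(dev);
}
```

With gone set, the handler fails with -ENODEV and the reset counter stays at zero, so a phy disabled through sysfs is not re-enabled by error recovery.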



* RE: [PATCH v2 27/28] libsas: fix sas_find_local_phy(), take phy references
  2011-12-23  3:00 ` [PATCH v2 27/28] libsas: fix sas_find_local_phy(), take phy references Dan Williams
@ 2011-12-27  9:21   ` Jack Wang
  2011-12-28 18:45     ` Dan Williams
  0 siblings, 1 reply; 37+ messages in thread
From: Jack Wang @ 2011-12-27  9:21 UTC (permalink / raw)
  To: 'Dan Williams', linux-scsi
  Cc: 'Xiangliang Yu', linux-ide, 'Luben Tuikov'

> 
> In the direct-attached case this routine returns the phy on which this
> device was first discovered, which is broken if we want to support
> wide-targets, as this phy reference can become stale even though the
> port is still active.
> 
> In the expander-attached case this routine tries to look up the phy by
> scanning the attached sas addresses of the parent expander, and BUG_ONs
> if it can't find it.  However, since eh and the libsas workqueue run
> independently, we can still be attempting device recovery via eh after
> libsas has recorded the device as detached.  This is even easier to hit
> now that eh is blocked while device domain rediscovery takes place, and
> that libata is fed more timed out commands, increasing the chances that
> it will try to recover the ata device.
> 
> Arrange for dev->phy to always point to a last known good phy.  It may be
> stale after the port is torn down, but it will catch up for wide port
> reconfigurations, and never be NULL.
> 
> Q: How is pm8001_I_T_nexus_reset getting away with not performing reset
>    on direct attached sata devices?
> 
[Jack Wang] 
We found that a reset may sometimes cause some SATA disks not to be found,
and in fact not only for direct-attached SATA devices.

I wonder why we always reset the SATA device at probe time; for pm8001
direct-attached SATA, the firmware reports the initial SATA FIS when the phy
is ready.


> Cc: Jack Wang <jack_wang@usish.com>
> Cc: Xiangliang Yu <yuxiangl@marvell.com>
> Cc: Luben Tuikov <ltuikov@yahoo.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/scsi/aic94xx/aic94xx_tmf.c  |    9 ++++++--
>  drivers/scsi/isci/task.c            |    9 +++++---
>  drivers/scsi/libsas/sas_ata.c       |    7 +++++-
>  drivers/scsi/libsas/sas_discover.c  |   24 ++++++++++++++++++++++
>  drivers/scsi/libsas/sas_expander.c  |    5 ++++-
>  drivers/scsi/libsas/sas_internal.h  |    1 +
>  drivers/scsi/libsas/sas_port.c      |    7 +++---
>  drivers/scsi/libsas/sas_scsi_host.c |   38 +++++++++++++++++------------------
>  drivers/scsi/mvsas/mv_sas.c         |    3 ++-
>  drivers/scsi/pm8001/pm8001_sas.c    |   19 +++++++++++-------
>  drivers/scsi/scsi_transport_sas.c   |   23 +++++++++++++++++++++
>  include/scsi/libsas.h               |    9 ++++++--
>  include/scsi/scsi_transport_sas.h   |    6 ++++++
>  13 files changed, 116 insertions(+), 44 deletions(-)
> 
> diff --git a/drivers/scsi/aic94xx/aic94xx_tmf.c
> b/drivers/scsi/aic94xx/aic94xx_tmf.c
> index 0add73b..50b914f 100644
> --- a/drivers/scsi/aic94xx/aic94xx_tmf.c
> +++ b/drivers/scsi/aic94xx/aic94xx_tmf.c
> @@ -181,7 +181,7 @@ static int asd_clear_nexus_I_T(struct domain_device *dev,
>  int asd_I_T_nexus_reset(struct domain_device *dev)
>  {
>  	int res, tmp_res, i;
> -	struct sas_phy *phy = sas_find_local_phy(dev);
> +	struct sas_phy *phy = sas_get_local_phy(dev);
>  	/* Standard mandates link reset for ATA  (type 0) and
>  	 * hard reset for SSP (type 1) */
>  	int reset_type = (dev->dev_type == SATA_DEV ||
> @@ -201,7 +201,7 @@ int asd_I_T_nexus_reset(struct domain_device *dev)
>  	for (i = 0 ; i < 3; i++) {
>  		tmp_res = asd_clear_nexus_I_T(dev, NEXUS_PHASE_RESUME);
>  		if (tmp_res == TC_RESUME)
> -			return res;
> +			goto out;
>  		msleep(500);
>  	}
> 
> @@ -211,7 +211,10 @@ int asd_I_T_nexus_reset(struct domain_device *dev)
>  	dev_printk(KERN_ERR, &phy->dev,
>  		   "Failed to resume nexus after reset 0x%x\n", tmp_res);
> 
> -	return TMF_RESP_FUNC_FAILED;
> +	res = TMF_RESP_FUNC_FAILED;
> + out:
> +	sas_put_local_phy(phy);
> +	return res;
>  }
> 
>  static int asd_clear_nexus_I_T_L(struct domain_device *dev, u8 *lun)
> diff --git a/drivers/scsi/isci/task.c b/drivers/scsi/isci/task.c
> index 5901a0e..a6ab49a 100644
> --- a/drivers/scsi/isci/task.c
> +++ b/drivers/scsi/isci/task.c
> @@ -1332,7 +1332,7 @@ isci_task_request_complete(struct isci_host *ihost,
>  static int isci_reset_device(struct isci_host *ihost,
>  			     struct isci_remote_device *idev)
>  {
> -	struct sas_phy *phy = sas_find_local_phy(idev->domain_dev);
> +	struct sas_phy *phy = sas_get_local_phy(idev->domain_dev);
>  	enum sci_status status;
>  	unsigned long flags;
>  	int rc;
> @@ -1347,8 +1347,8 @@ static int isci_reset_device(struct isci_host *ihost,
>  		dev_dbg(&ihost->pdev->dev,
>  			 "%s: sci_remote_device_reset(%p) returned %d!\n",
>  			 __func__, idev, status);
> -
> -		return TMF_RESP_FUNC_FAILED;
> +		rc = TMF_RESP_FUNC_FAILED;
> +		goto out;
>  	}
>  	spin_unlock_irqrestore(&ihost->scic_lock, flags);
> 
> @@ -1369,7 +1369,8 @@ static int isci_reset_device(struct isci_host *ihost,
>  	}
> 
>  	dev_dbg(&ihost->pdev->dev, "%s: idev %p complete.\n", __func__, idev);
> -
> + out:
> +	sas_put_local_phy(phy);
>  	return rc;
>  }
> 
> diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
> index e174a73..96f316f 100644
> --- a/drivers/scsi/libsas/sas_ata.c
> +++ b/drivers/scsi/libsas/sas_ata.c
> @@ -279,9 +279,10 @@ static int smp_ata_check_ready(struct ata_link *link)
>  	struct ata_port *ap = link->ap;
>  	struct domain_device *dev = ap->private_data;
>  	struct domain_device *ex_dev = dev->parent;
> -	struct sas_phy *phy = sas_find_local_phy(dev);
> +	struct sas_phy *phy = sas_get_local_phy(dev);
> 
>  	res = sas_get_phy_attached_sas_addr(ex_dev, phy->number, addr);
> +	sas_put_local_phy(phy);
>  	/* break the wait early if the expander is unreachable,
>  	 * otherwise keep polling
>  	 */
> @@ -314,10 +315,10 @@ static int sas_ata_hard_reset(struct ata_link *link, unsigned int *class,
>  			      unsigned long deadline)
>  {
>  	int ret = 0, res;
> +	struct sas_phy *phy;
>  	struct ata_port *ap = link->ap;
>  	int (*check_ready)(struct ata_link *link);
>  	struct domain_device *dev = ap->private_data;
> -	struct sas_phy *phy = sas_find_local_phy(dev);
>  	struct sas_internal *i = dev_to_sas_internal(dev);
> 
>  	res = i->dft->lldd_I_T_nexus_reset(dev);
> @@ -325,10 +326,12 @@ static int sas_ata_hard_reset(struct ata_link *link, unsigned int *class,
>  	if (res != TMF_RESP_FUNC_COMPLETE)
>  		SAS_DPRINTK("%s: Unable to reset ata device?\n", __func__);
> 
> +	phy = sas_get_local_phy(dev);
>  	if (scsi_is_sas_phy_local(phy))
>  		check_ready = local_ata_check_ready;
>  	else
>  		check_ready = smp_ata_check_ready;
> +	sas_put_local_phy(phy);
> 
>  	ret = ata_wait_after_reset(link, deadline, check_ready);
>  	if (ret && ret != -EAGAIN)
> diff --git a/drivers/scsi/libsas/sas_discover.c
> b/drivers/scsi/libsas/sas_discover.c
> index 6e5fdfd..c765218 100644
> --- a/drivers/scsi/libsas/sas_discover.c
> +++ b/drivers/scsi/libsas/sas_discover.c
> @@ -147,6 +147,7 @@ static int sas_get_port_device(struct asd_sas_port *port)
>  	memset(port->disc.eeds_a, 0, SAS_ADDR_SIZE);
>  	memset(port->disc.eeds_b, 0, SAS_ADDR_SIZE);
>  	port->disc.max_level = 0;
> +	sas_device_set_phy(dev, port->port);
> 
>  	dev->rphy = rphy;
> 
> @@ -234,6 +235,9 @@ void sas_free_device(struct kref *kref)
>  	if (dev->parent)
>  		sas_put_device(dev->parent);
> 
> +	sas_port_put_phy(dev->phy);
> +	dev->phy = NULL;
> +
>  	/* remove the phys and ports, everything else should be gone */
>  	if (dev->dev_type == EDGE_DEV || dev->dev_type == FANOUT_DEV)
>  		kfree(dev->ex_dev.ex_phy);
> @@ -307,6 +311,26 @@ void sas_unregister_domain_devices(struct asd_sas_port *port)
> 
>  }
> 
> +void sas_device_set_phy(struct domain_device *dev, struct sas_port *port)
> +{
> +	struct sas_ha_struct *ha;
> +	struct sas_phy *new_phy;
> +
> +	if (!dev)
> +		return;
> +
> +	ha = dev->port->ha;
> +	new_phy = sas_port_get_phy(port);
> +
> +	/* pin and record last seen phy */
> +	spin_lock_irq(&ha->phy_port_lock);
> +	if (new_phy) {
> +		sas_port_put_phy(dev->phy);
> +		dev->phy = new_phy;
> +	}
> +	spin_unlock_irq(&ha->phy_port_lock);
> +}
> +
>  /* ---------- Discovery and Revalidation ---------- */
> 
>  /**
> diff --git a/drivers/scsi/libsas/sas_expander.c
> b/drivers/scsi/libsas/sas_expander.c
> index e2efc6c..e47599b 100644
> --- a/drivers/scsi/libsas/sas_expander.c
> +++ b/drivers/scsi/libsas/sas_expander.c
> @@ -721,6 +721,7 @@ static struct domain_device *sas_ex_discover_end_dev(
>  		}
>  	}
>  	sas_ex_get_linkrate(parent, child, phy);
> +	sas_device_set_phy(child, phy->port);
> 
>  #ifdef CONFIG_SCSI_SAS_ATA
>  	if ((phy->attached_tproto & SAS_PROTOCOL_STP) || phy->attached_sata_dev) {
> @@ -1808,7 +1809,7 @@ static void sas_unregister_devs_sas_addr(struct domain_device *parent,
>  {
>  	struct expander_device *ex_dev = &parent->ex_dev;
>  	struct ex_phy *phy = &ex_dev->ex_phy[phy_id];
> -	struct domain_device *child, *n;
> +	struct domain_device *child, *n, *found = NULL;
>  	if (last) {
>  		list_for_each_entry_safe(child, n,
>  			&ex_dev->children, siblings) {
> @@ -1820,6 +1821,7 @@ static void sas_unregister_devs_sas_addr(struct domain_device *parent,
>  					sas_unregister_ex_tree(parent->port, child);
>  				else
>  					sas_unregister_dev(parent->port, child);
> +				found = child;
>  				break;
>  			}
>  		}
> @@ -1828,6 +1830,7 @@ static void sas_unregister_devs_sas_addr(struct domain_device *parent,
>  	memset(phy->attached_sas_addr, 0, SAS_ADDR_SIZE);
>  	if (phy->port) {
>  		sas_port_delete_phy(phy->port, phy->phy);
> +		sas_device_set_phy(found, phy->port);
>  		if (phy->port->num_phys == 0)
>  			sas_port_delete(phy->port);
>  		phy->port = NULL;
> diff --git a/drivers/scsi/libsas/sas_internal.h
> b/drivers/scsi/libsas/sas_internal.h
> index c6317c1..eec945e 100644
> --- a/drivers/scsi/libsas/sas_internal.h
> +++ b/drivers/scsi/libsas/sas_internal.h
> @@ -85,6 +85,7 @@ int sas_smp_phy_control(struct domain_device *dev, int phy_id,
>  			enum phy_func phy_func, struct sas_phy_linkrates *);
>  int sas_smp_get_phy_events(struct sas_phy *phy);
> 
> +void sas_device_set_phy(struct domain_device *dev, struct sas_port *port);
>  struct domain_device *sas_find_dev_by_rphy(struct sas_rphy *rphy);
>  struct domain_device *sas_ex_to_ata(struct domain_device *ex_dev, int phy_id);
>  int sas_get_phy_attached_sas_addr(struct domain_device *dev, int phy_id,
> diff --git a/drivers/scsi/libsas/sas_port.c
> b/drivers/scsi/libsas/sas_port.c
> index e8e68d0..36e2905 100644
> --- a/drivers/scsi/libsas/sas_port.c
> +++ b/drivers/scsi/libsas/sas_port.c
> @@ -108,9 +108,6 @@ static void sas_form_port(struct asd_sas_phy *phy)
>  	port->num_phys++;
>  	port->phy_mask |= (1U << phy->id);
> 
> -	if (!port->phy)
> -		port->phy = phy->phy;
> -
>  	if (*(u64 *)port->attached_sas_addr == 0) {
>  		port->class = phy->class;
>  		memcpy(port->attached_sas_addr, phy->attached_sas_addr,
> @@ -175,8 +172,10 @@ void sas_deform_port(struct asd_sas_phy *phy, int gone)
>  		sas_unregister_domain_devices(port);
>  		sas_port_delete(port->port);
>  		port->port = NULL;
> -	} else
> +	} else {
>  		sas_port_delete_phy(port->port, phy->phy);
> +		sas_device_set_phy(dev, port->port);
> +	}
> 
>  	if (si->dft->lldd_port_deformed)
>  		si->dft->lldd_port_deformed(phy);
> diff --git a/drivers/scsi/libsas/sas_scsi_host.c
> b/drivers/scsi/libsas/sas_scsi_host.c
> index b849dcd..59a227d 100644
> --- a/drivers/scsi/libsas/sas_scsi_host.c
> +++ b/drivers/scsi/libsas/sas_scsi_host.c
> @@ -439,30 +439,26 @@ static int sas_recover_I_T(struct domain_device *dev)
>  	return res;
>  }
> 
> -/* Find the sas_phy that's attached to this device */
> -struct sas_phy *sas_find_local_phy(struct domain_device *dev)
> +/* take a reference on the last known good phy for this device */
> +struct sas_phy *sas_get_local_phy(struct domain_device *dev)
>  {
> -	struct domain_device *pdev = dev->parent;
> -	struct ex_phy *exphy = NULL;
> -	int i;
> +	struct sas_ha_struct *ha = dev->port->ha;
> +	struct sas_phy *phy;
> +	unsigned long flags;
> 
> -	/* Directly attached device */
> -	if (!pdev)
> -		return dev->port->phy;
> +	/* a published domain device always has a valid phy, it may be
> +	 * stale, but it is never NULL
> +	 */
> +	BUG_ON(!dev->phy);
> 
> -	/* Otherwise look in the expander */
> -	for (i = 0; i < pdev->ex_dev.num_phys; i++)
> -		if (!memcmp(dev->sas_addr,
> -			    pdev->ex_dev.ex_phy[i].attached_sas_addr,
> -			    SAS_ADDR_SIZE)) {
> -			exphy = &pdev->ex_dev.ex_phy[i];
> -			break;
> -		}
> +	spin_lock_irqsave(&ha->phy_port_lock, flags);
> +	phy = dev->phy;
> +	get_device(&phy->dev);
> +	spin_unlock_irqrestore(&ha->phy_port_lock, flags);
> 
> -	BUG_ON(!exphy);
> -	return exphy->phy;
> +	return phy;
>  }
> -EXPORT_SYMBOL_GPL(sas_find_local_phy);
> +EXPORT_SYMBOL_GPL(sas_get_local_phy);
> 
>  /* Attempt to send a LUN reset message to a device */
>  int sas_eh_device_reset_handler(struct scsi_cmnd *cmd)
> @@ -489,7 +485,7 @@ int sas_eh_device_reset_handler(struct scsi_cmnd *cmd)
>  int sas_eh_bus_reset_handler(struct scsi_cmnd *cmd)
>  {
>  	struct domain_device *dev = cmd_to_domain_dev(cmd);
> -	struct sas_phy *phy = sas_find_local_phy(dev);
> +	struct sas_phy *phy = sas_get_local_phy(dev);
>  	int res;
> 
>  	res = sas_phy_reset(phy, 1);
> @@ -497,6 +493,8 @@ int sas_eh_bus_reset_handler(struct scsi_cmnd *cmd)
>  		SAS_DPRINTK("Bus reset of %s failed 0x%x\n",
>  			    kobject_name(&phy->dev.kobj),
>  			    res);
> +	sas_put_local_phy(phy);
> +
>  	if (res == TMF_RESP_FUNC_SUCC || res == TMF_RESP_FUNC_COMPLETE)
>  		return SUCCESS;
> 
> diff --git a/drivers/scsi/mvsas/mv_sas.c b/drivers/scsi/mvsas/mv_sas.c
> index cd88223..b68a653 100644
> --- a/drivers/scsi/mvsas/mv_sas.c
> +++ b/drivers/scsi/mvsas/mv_sas.c
> @@ -1474,10 +1474,11 @@ static int mvs_debug_issue_ssp_tmf(struct domain_device *dev,
>  static int mvs_debug_I_T_nexus_reset(struct domain_device *dev)
>  {
>  	int rc;
> -	struct sas_phy *phy = sas_find_local_phy(dev);
> +	struct sas_phy *phy = sas_get_local_phy(dev);
>  	int reset_type = (dev->dev_type == SATA_DEV ||
>  			(dev->tproto & SAS_PROTOCOL_STP)) ? 0 : 1;
>  	rc = sas_phy_reset(phy, reset_type);
> +	sas_put_local_phy(phy);
>  	msleep(2000);
>  	return rc;
>  }
> diff --git a/drivers/scsi/pm8001/pm8001_sas.c
> b/drivers/scsi/pm8001/pm8001_sas.c
> index 5add18c..b111018 100644
> --- a/drivers/scsi/pm8001/pm8001_sas.c
> +++ b/drivers/scsi/pm8001/pm8001_sas.c
> @@ -873,12 +873,14 @@ int pm8001_I_T_nexus_reset(struct domain_device *dev)
> 
>  	pm8001_dev = dev->lldd_dev;
>  	pm8001_ha = pm8001_find_ha_by_dev(dev);
> -	phy = sas_find_local_phy(dev);
> +	phy = sas_get_local_phy(dev);
> 
>  	if (dev_is_sata(dev)) {
>  		DECLARE_COMPLETION_ONSTACK(completion_setstate);
> -		if (scsi_is_sas_phy_local(phy))
> -			return 0;
> +		if (scsi_is_sas_phy_local(phy)) {
> +			rc = 0;
> +			goto out;
> +		}
>  		rc = sas_phy_reset(phy, 1);
>  		msleep(2000);
>  		rc = pm8001_exec_internal_task_abort(pm8001_ha, pm8001_dev ,
> @@ -887,12 +889,14 @@ int pm8001_I_T_nexus_reset(struct domain_device *dev)
>  		rc = PM8001_CHIP_DISP->set_dev_state_req(pm8001_ha,
>  			pm8001_dev, 0x01);
>  		wait_for_completion(&completion_setstate);
> -	} else{
> -	rc = sas_phy_reset(phy, 1);
> -	msleep(2000);
> +	} else {
> +		rc = sas_phy_reset(phy, 1);
> +		msleep(2000);
>  	}
>  	PM8001_EH_DBG(pm8001_ha, pm8001_printk(" for device[%x]:rc=%d\n",
>  		pm8001_dev->device_id, rc));
> + out:
> +	sas_put_local_phy(phy);
>  	return rc;
>  }
> 
> @@ -904,10 +908,11 @@ int pm8001_lu_reset(struct domain_device *dev, u8 *lun)
>  	struct pm8001_device *pm8001_dev = dev->lldd_dev;
>  	struct pm8001_hba_info *pm8001_ha = pm8001_find_ha_by_dev(dev);
>  	if (dev_is_sata(dev)) {
> -		struct sas_phy *phy = sas_find_local_phy(dev);
> +		struct sas_phy *phy = sas_get_local_phy(dev);
>  		rc = pm8001_exec_internal_task_abort(pm8001_ha, pm8001_dev ,
>  			dev, 1, 0);
>  		rc = sas_phy_reset(phy, 1);
> +		sas_put_local_phy(phy);
>  		rc = PM8001_CHIP_DISP->set_dev_state_req(pm8001_ha,
>  			pm8001_dev, 0x01);
>  		msleep(2000);
> diff --git a/drivers/scsi/scsi_transport_sas.c
> b/drivers/scsi/scsi_transport_sas.c
> index ab3bd0b..7d69a25 100644
> --- a/drivers/scsi/scsi_transport_sas.c
> +++ b/drivers/scsi/scsi_transport_sas.c
> @@ -1060,6 +1060,29 @@ int scsi_is_sas_port(const struct device *dev)
>  EXPORT_SYMBOL(scsi_is_sas_port);
> 
>  /**
> + * sas_port_get_phy - try to take a reference on a port member
> + * @port: port to check
> + */
> +struct sas_phy *sas_port_get_phy(struct sas_port *port)
> +{
> +	struct sas_phy *phy;
> +
> +	mutex_lock(&port->phy_list_mutex);
> +	if (list_empty(&port->phy_list))
> +		phy = NULL;
> +	else {
> +		struct list_head *ent = port->phy_list.next;
> +
> +		phy = list_entry(ent, typeof(*phy), port_siblings);
> +		get_device(&phy->dev);
> +	}
> +	mutex_unlock(&port->phy_list_mutex);
> +
> +	return phy;
> +}
> +EXPORT_SYMBOL(sas_port_get_phy);
> +
> +/**
>   * sas_port_add_phy - add another phy to a port to form a wide port
>   * @port:	port to add the phy to
>   * @phy:	phy to add
> diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
> index 3c9849c..571e7fc 100644
> --- a/include/scsi/libsas.h
> +++ b/include/scsi/libsas.h
> @@ -188,6 +188,7 @@ struct domain_device {
>          struct domain_device *parent;
>          struct list_head siblings; /* devices on the same level */
>          struct asd_sas_port *port;        /* shortcut to root of the tree */
> +	struct sas_phy *phy;
> 
>          struct list_head dev_list_node;
>  	struct list_head disco_list_node;
> @@ -239,7 +240,6 @@ struct asd_sas_port {
>  	struct list_head destroy_list;
>  	enum   sas_linkrate linkrate;
> 
> -	struct sas_phy *phy;
>  	struct work_struct work;
> 
>  /* public: */
> @@ -424,6 +424,11 @@ static inline unsigned int to_sas_gpio_od(int device, int bit)
>  	return 3 * device + bit;
>  }
> 
> +static inline void sas_put_local_phy(struct sas_phy *phy)
> +{
> +	put_device(&phy->dev);
> +}
> +
>  #ifdef CONFIG_SCSI_SAS_HOST_SMP
>  int try_test_sas_gpio_gp_bit(unsigned int od, u8 *data, u8 index, u8 count);
>  #else
> @@ -679,7 +684,7 @@ extern int sas_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,
> 
>  extern void sas_ssp_task_response(struct device *dev, struct sas_task *task,
>  				  struct ssp_response_iu *iu);
> -struct sas_phy *sas_find_local_phy(struct domain_device *dev);
> +struct sas_phy *sas_get_local_phy(struct domain_device *dev);
> 
>  int sas_request_addr(struct Scsi_Host *shost, u8 *addr);
> 
> diff --git a/include/scsi/scsi_transport_sas.h
> b/include/scsi/scsi_transport_sas.h
> index 42817fa..98b3a20 100644
> --- a/include/scsi/scsi_transport_sas.h
> +++ b/include/scsi/scsi_transport_sas.h
> @@ -209,6 +209,12 @@ void sas_port_add_phy(struct sas_port *, struct sas_phy *);
>  void sas_port_delete_phy(struct sas_port *, struct sas_phy *);
>  void sas_port_mark_backlink(struct sas_port *);
>  int scsi_is_sas_port(const struct device *);
> +struct sas_phy *sas_port_get_phy(struct sas_port *port);
> +static inline void sas_port_put_phy(struct sas_phy *phy)
> +{
> +	if (phy)
> +		put_device(&phy->dev);
> +}
> 
>  extern struct scsi_transport_template *
>  sas_attach_transport(struct sas_function_template *);
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
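
For readers following the "last known good phy" rule from the changelog quoted above, here is a minimal userspace sketch of what sas_device_set_phy() does with its references. It is an illustration under assumptions, not the kernel code: the pin_* names are hypothetical, a port is reduced to a pointer to its first member phy, the refcount is a plain int, and the phy_list_mutex and phy_port_lock locking is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct sas_phy, struct sas_port, struct domain_device. */
struct pin_phy { int refcount; };
struct pin_port { struct pin_phy *first; /* NULL when the port has no phys */ };
struct pin_dev { struct pin_phy *phy; /* pinned last seen phy */ };

/* Models sas_port_get_phy(): reference on the first port member, or NULL. */
static struct pin_phy *pin_port_get_phy(struct pin_port *port)
{
	if (!port->first)
		return NULL;
	port->first->refcount++;
	return port->first;
}

/* Models sas_port_put_phy(): tolerates NULL. */
static void pin_port_put_phy(struct pin_phy *phy)
{
	if (phy)
		phy->refcount--;
}

/* Models sas_device_set_phy(): swap the pin only when a new phy exists. */
static void pin_device_set_phy(struct pin_dev *dev, struct pin_port *port)
{
	struct pin_phy *new_phy;

	if (!dev)
		return;

	new_phy = pin_port_get_phy(port);
	if (new_phy) {
		pin_port_put_phy(dev->phy);	/* release the previous pin */
		dev->phy = new_phy;
	}
	/* an empty port leaves the (possibly stale) old pin in place */
}
```

Because the old pin is only released when a replacement was obtained, dev->phy can go stale but never returns to NULL once it has been set, which is what the BUG_ON(!dev->phy) in sas_get_local_phy() relies on.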



* RE: [PATCH v2 28/28] libsas: don't recover 'gone' devices in sas_ata_hard_reset()
  2011-12-23  3:00 ` [PATCH v2 28/28] libsas: don't recover 'gone' devices in sas_ata_hard_reset() Dan Williams
@ 2011-12-27  9:23   ` Jack Wang
  0 siblings, 0 replies; 37+ messages in thread
From: Jack Wang @ 2011-12-27  9:23 UTC (permalink / raw)
  To: 'Dan Williams', linux-scsi; +Cc: linux-ide

I pulled the new v2 patchset and it works fine.
I found another bug in libsas and will send a fix out later.

Jack

> [PATCH v2 28/28] libsas: don't recover 'gone' devices in sas_ata_hard_reset()
> 
> Commands that time out when a disk is forcibly removed may trigger
> libata to attempt recovery of the device.  If libsas has decided to
> remove the device, don't permit libata to continue issuing resets to its
> last known phy.
> 
> The primary motivation for this patch is hotplug testing by writing 0 to
> /sys/class/sas_phy/phyX/enable.  Without this check, the test leads to
> libata issuing a reset and re-enabling the device that is supposed to be
> torn down.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/scsi/libsas/sas_ata.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
> index 96f316f..2814731 100644
> --- a/drivers/scsi/libsas/sas_ata.c
> +++ b/drivers/scsi/libsas/sas_ata.c
> @@ -321,6 +321,9 @@ static int sas_ata_hard_reset(struct ata_link *link, unsigned int *class,
>  	struct domain_device *dev = ap->private_data;
>  	struct sas_internal *i = dev_to_sas_internal(dev);
> 
> +	if (dev->gone)
> +		return -ENODEV;
> +
>  	res = i->dft->lldd_I_T_nexus_reset(dev);
> 
>  	if (res != TMF_RESP_FUNC_COMPLETE)
> 



* Re: [PATCH v2 27/28] libsas: fix sas_find_local_phy(), take phy references
  2011-12-27  9:21   ` Jack Wang
@ 2011-12-28 18:45     ` Dan Williams
  2011-12-29  6:18       ` Jack Wang
  0 siblings, 1 reply; 37+ messages in thread
From: Dan Williams @ 2011-12-28 18:45 UTC (permalink / raw)
  To: Jack Wang; +Cc: linux-scsi, Xiangliang Yu, linux-ide, Luben Tuikov

On Tue, Dec 27, 2011 at 1:21 AM, Jack Wang <jack_wang@usish.com> wrote:
>>
>> In the direct-attached case this routine returns the phy on which this
>> device was first discovered, which is broken if we want to support
>> wide-targets, as this phy reference can become stale even though the
>> port is still active.
>>
>> In the expander-attached case this routine tries to look up the phy by
>> scanning the attached sas addresses of the parent expander, and BUG_ONs
>> if it can't find it.  However, since eh and the libsas workqueue run
>> independently, we can still be attempting device recovery via eh after
>> libsas has recorded the device as detached.  This is even easier to hit
>> now that eh is blocked while device domain rediscovery takes place, and
>> that libata is fed more timed out commands, increasing the chances that
>> it will try to recover the ata device.
>>
>> Arrange for dev->phy to always point to a last known good phy.  It may be
>> stale after the port is torn down, but it will catch up for wide port
>> reconfigurations, and never be NULL.
>>
>> Q: How is pm8001_I_T_nexus_reset getting away with not performing reset
>>    on direct attached sata devices?
>>
> [Jack Wang]
> We found that a reset may sometimes cause some SATA disks not to be found,
> and in fact not only for direct-attached SATA devices.
>
> I wonder why we always reset the SATA device at probe time; for pm8001
> direct-attached SATA, the firmware reports the initial SATA FIS when the phy
> is ready.
>

We need to get the drive to a known state.  Do these problems still
happen in your tests with the "wait for ready" checking?  That's
supposed to allow enough time for the signature fis to be transmitted.
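
As a rough illustration of the "wait for ready" idea, the sketch below polls a check_ready callback using the return convention seen in smp_ata_check_ready(): positive means ready, zero means keep polling, negative means give up early. Everything here is an assumption made for the example: poll_until_ready() and slow_disk are hypothetical, the deadline is a poll count instead of jiffies, and -EBUSY as the timeout value is this sketch's choice rather than a statement about ata_wait_after_reset().

```c
#include <assert.h>
#include <errno.h>

/* Callback convention: > 0 ready, 0 not ready yet, < 0 hard error. */
typedef int (*check_ready_fn)(void *ctx);

/* Poll until the device reports ready, errors out, or the budget runs out. */
static int poll_until_ready(check_ready_fn check_ready, void *ctx, int max_polls)
{
	int i;

	for (i = 0; i < max_polls; i++) {
		int ready = check_ready(ctx);

		if (ready > 0)
			return 0;	/* e.g. signature FIS has arrived */
		if (ready < 0)
			return ready;	/* e.g. expander unreachable */
		/* a real implementation would sleep between polls here */
	}
	return -EBUSY;			/* budget exhausted, still not ready */
}

/* Hypothetical device that needs a few polls before it reports ready. */
struct slow_disk { int polls_left; };

static int slow_disk_ready(void *ctx)
{
	struct slow_disk *d = ctx;

	if (d->polls_left > 0) {
		d->polls_left--;
		return 0;
	}
	return 1;
}
```

A disk that needs three polls succeeds when given a budget of ten and times out when given a budget of two, which is the distinction between "reset lost the disk" and "reset did not wait long enough" that the question above is probing.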

I'll take a closer look at your libsas fix, because I believe we are
still seeing failures to rediscover all attached devices even with
these patches.

--
Dan


* RE: [PATCH v2 24/28] libsas: poll for ata device readiness after reset
  2011-12-23  3:00 ` [PATCH v2 24/28] libsas: poll for ata device readiness after reset Dan Williams
@ 2011-12-29  6:18   ` Jack Wang
  2012-02-19 22:06     ` James Bottomley
  0 siblings, 1 reply; 37+ messages in thread
From: Jack Wang @ 2011-12-29  6:18 UTC (permalink / raw)
  To: 'Dan Williams', linux-scsi; +Cc: 'Tejun Heo', linux-ide

> @@ -267,39 +267,84 @@ static bool sas_ata_qc_fill_rtf(struct ata_queued_cmd *qc)
>  	return true;
>  }
> 
> -static int sas_ata_hard_reset(struct ata_link *link, unsigned int *class,
> -			       unsigned long deadline)
> +static struct sas_internal *dev_to_sas_internal(struct domain_device *dev)
> +{
> +	return to_sas_internal(dev->port->ha->core.shost->transportt);
> +}
> +
> +static int smp_ata_check_ready(struct ata_link *link)
>  {
> +	int res;
> +	u8 addr[8];
>  	struct ata_port *ap = link->ap;
>  	struct domain_device *dev = ap->private_data;
> -	struct sas_internal *i =
> -		to_sas_internal(dev->port->ha->core.shost->transportt);
> -	int res = TMF_RESP_FUNC_FAILED;
> -	int ret = 0;
> +	struct domain_device *ex_dev = dev->parent;
> +	struct sas_phy *phy = sas_find_local_phy(dev);
> 
> -	if (i->dft->lldd_I_T_nexus_reset)
> -		res = i->dft->lldd_I_T_nexus_reset(dev);
> +	res = sas_get_phy_attached_sas_addr(ex_dev, phy->number, addr);
> +	/* break the wait early if the expander is unreachable,
> +	 * otherwise keep polling
> +	 */
> +	if (res == -ECOMM)
> +		return res;
> +	if (res != SMP_RESP_FUNC_ACC || SAS_ADDR(addr) == 0)
[Jack Wang] 
This check may not guarantee that the FIS has been received by the expander.
Should we use sas_ex_phy_discover instead?  We would still need to teach
sas_ex_phy_discover_helper to return the right code.
> +		return 0;
> +	else
> +		return 1;
> +}
> 
> -	if (res != TMF_RESP_FUNC_COMPLETE) {
> -		SAS_DPRINTK("%s: Unable to reset I T nexus?\n", __func__);
> -		ret = -EAGAIN;
> +static int local_ata_check_ready(struct ata_link *link)
> +{
> +	struct ata_port *ap = link->ap;
> +	struct domain_device *dev = ap->private_data;
> +	struct sas_internal *i = dev_to_sas_internal(dev);
> +
> +	if (i->dft->lldd_ata_check_ready)
> +		return i->dft->lldd_ata_check_ready(dev);
> +	else {
> +		/* lldd's that don't implement 'ready' checking get the
> +		 * old default behavior of not coordinating reset
> +		 * recovery with libata
> +		 */
> +		return 1;
>  	}
> +}
> 
> +static int sas_ata_hard_reset(struct ata_link *link, unsigned int *class,
> +			      unsigned long deadline)
> +{
> +	int ret = 0, res;
> +	struct ata_port *ap = link->ap;
> +	int (*check_ready)(struct ata_link *link);
> +	struct domain_device *dev = ap->private_data;
> +	struct sas_phy *phy = sas_find_local_phy(dev);
> +	struct sas_internal *i = dev_to_sas_internal(dev);
> +
> +	res = i->dft->lldd_I_T_nexus_reset(dev);
> +
> +	if (res != TMF_RESP_FUNC_COMPLETE)
> +		SAS_DPRINTK("%s: Unable to reset ata device?\n", __func__);
> +
> +	if (scsi_is_sas_phy_local(phy))
> +		check_ready = local_ata_check_ready;
> +	else
> +		check_ready = smp_ata_check_ready;
> +
> +	ret = ata_wait_after_reset(link, deadline, check_ready);
> +	if (ret && ret != -EAGAIN)
> +		ata_link_err(link, "COMRESET failed (errno=%d)\n", ret);
> +
> +	/* XXX: if the class changes during the reset the upper layer
> +	 * should be informed, if the device has gone away we assume
> +	 * libsas will eventually delete it
> +	 */
>  	switch (dev->sata_dev.command_set) {
> -		case ATA_COMMAND_SET:
> -			SAS_DPRINTK("%s: Found ATA device.\n", __func__);
> -			*class = ATA_DEV_ATA;
> -			break;
> -		case ATAPI_COMMAND_SET:
> -			SAS_DPRINTK("%s: Found ATAPI device.\n", __func__);
> -			*class = ATA_DEV_ATAPI;
> -			break;
> -		default:
> -			SAS_DPRINTK("%s: Unknown SATA command set: %d.\n",
> -				    __func__,
> -				    dev->sata_dev.command_set);
> -			*class = ATA_DEV_UNKNOWN;
> -			break;
> +	case ATA_COMMAND_SET:
> +		*class = ATA_DEV_ATA;
> +		break;
> +	case ATAPI_COMMAND_SET:
> +		*class = ATA_DEV_ATAPI;
> +		break;
>  	}
> 
>  	ap->cbl = ATA_CBL_SATA;
> @@ -311,8 +356,7 @@ static int sas_ata_soft_reset(struct ata_link *link, unsigned int *class,
>  {
>  	struct ata_port *ap = link->ap;
>  	struct domain_device *dev = ap->private_data;
> -	struct sas_internal *i =
> -		to_sas_internal(dev->port->ha->core.shost->transportt);
> +	struct sas_internal *i = dev_to_sas_internal(dev);
>  	int res = TMF_RESP_FUNC_FAILED;
>  	int ret = 0;
> 
> @@ -350,8 +394,7 @@ static int sas_ata_soft_reset(struct ata_link *link, unsigned int *class,
>   */
>  static void sas_ata_internal_abort(struct sas_task *task)
>  {
> -	struct sas_internal *si =
> -		to_sas_internal(task->dev->port->ha->core.shost->transportt);
> +	struct sas_internal *si = dev_to_sas_internal(task->dev);
>  	unsigned long flags;
>  	int res;
> 
> @@ -420,8 +463,7 @@ static void sas_ata_post_internal(struct ata_queued_cmd *qc)
>  static void sas_ata_set_dmamode(struct ata_port *ap, struct ata_device *ata_dev)
>  {
>  	struct domain_device *dev = ap->private_data;
> -	struct sas_internal *i =
> -		to_sas_internal(dev->port->ha->core.shost->transportt);
> +	struct sas_internal *i = dev_to_sas_internal(dev);
> 
>  	if (i->dft->lldd_ata_set_dmamode)
>  		i->dft->lldd_ata_set_dmamode(dev);
> diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
> index fd77ea3..5e1eec9 100644
> --- a/drivers/scsi/libsas/sas_expander.c
> +++ b/drivers/scsi/libsas/sas_expander.c
> @@ -125,7 +125,11 @@ static int smp_execute_task(struct domain_device *dev, void *req, int req_size,
>  		    task->task_status.stat == SAS_DATA_OVERRUN) {
>  			res = -EMSGSIZE;
>  			break;
> -		} else {
> +		}
> +		if (task->task_status.resp == SAS_TASK_UNDELIVERED &&
> +		    task->task_status.stat == SAS_DEVICE_UNKNOWN)
> +			break;
> +		else {
>  			SAS_DPRINTK("%s: task to dev %016llx response: 0x%x "
>  				    "status 0x%x\n", __func__,
>  				    SAS_ADDR(dev->sas_addr),
> @@ -1648,8 +1652,8 @@ static int sas_get_phy_change_count(struct domain_device *dev,
>  	return res;
>  }
> 
> -static int sas_get_phy_attached_sas_addr(struct domain_device *dev,
> -					 int phy_id, u8 *attached_sas_addr)
> +int sas_get_phy_attached_sas_addr(struct domain_device *dev, int phy_id,
> +				  u8 *attached_sas_addr)
>  {
>  	int res;
>  	struct smp_resp *disc_resp;
> diff --git a/drivers/scsi/libsas/sas_internal.h b/drivers/scsi/libsas/sas_internal.h
> index cde1a84..c6317c1 100644
> --- a/drivers/scsi/libsas/sas_internal.h
> +++ b/drivers/scsi/libsas/sas_internal.h
> @@ -87,7 +87,8 @@ int sas_smp_get_phy_events(struct sas_phy *phy);
> 
>  struct domain_device *sas_find_dev_by_rphy(struct sas_rphy *rphy);
>  struct domain_device *sas_ex_to_ata(struct domain_device *ex_dev, int phy_id);
> -
> +int sas_get_phy_attached_sas_addr(struct domain_device *dev, int phy_id,
> +				  u8 *attached_sas_addr);
>  void sas_hae_reset(struct work_struct *work);
> 
>  void sas_free_device(struct kref *kref);
> diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
> index 6e1c640..3c9849c 100644
> --- a/include/scsi/libsas.h
> +++ b/include/scsi/libsas.h
> @@ -610,6 +610,7 @@ struct sas_domain_function_template {
>  	int (*lldd_clear_task_set)(struct domain_device *, u8 *lun);
>  	int (*lldd_I_T_nexus_reset)(struct domain_device *);
>  	int (*lldd_ata_soft_reset)(struct domain_device *);
> +	int (*lldd_ata_check_ready)(struct domain_device *);
>  	void (*lldd_ata_set_dmamode)(struct domain_device *);
>  	int (*lldd_lu_reset)(struct domain_device *, u8 *lun);
>  	int (*lldd_query_task)(struct sas_task *);
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH v2 27/28] libsas: fix sas_find_local_phy(), take phy references
  2011-12-28 18:45     ` Dan Williams
@ 2011-12-29  6:18       ` Jack Wang
  0 siblings, 0 replies; 37+ messages in thread
From: Jack Wang @ 2011-12-29  6:18 UTC (permalink / raw)
  To: 'Dan Williams'
  Cc: linux-scsi, 'Xiangliang Yu', linux-ide, 'Luben Tuikov'

> 
> On Tue, Dec 27, 2011 at 1:21 AM, Jack Wang <jack_wang@usish.com> wrote:
> >>
> >> In the direct-attached case this routine returns the phy on which this
> >> device was first discovered.  Which is broken if we want to support
> >> wide-targets, as this phy reference can become stale even though the
> >> port is still active.
> >>
> >> In the expander-attached case this routine tries to lookup the phy by
> >> scanning the attached sas addresses of the parent expander, and BUG_ONs
> >> if it can't find it.  However since eh and the libsas workqueue run
> >> independently we can still be attempting device recovery via eh after
> >> libsas has recorded the device as detached.  This is even easier to hit
> >> now that eh is blocked while device domain rediscovery takes place, and
> >> that libata is fed more timed out commands increasing the chances that
> >> it will try to recover the ata device.
> >>
> >> Arrange for dev->phy to always point to a last known good phy, it may be
> >> stale after the port is torn down, but it will catch up for wide port
> >> reconfigurations, and never be NULL.
> >>
> >> Q: How is pm8001_I_T_nexus_reset getting away with not performing reset
> >>    on direct attached sata devices?
> >>
> > [Jack Wang]
> > We found that a reset can sometimes leave some SATA disks undiscoverable,
> > and in fact not only direct-attached SATA devices.
> >
> > I wonder why we always reset the SATA device at probe time; for pm8001
> > direct-attached SATA, the firmware reports the initial SATA FIS when the
> > phy is ready.
> >
> 
> We need to get the drive to a known state.  Do these problems still
> happen in your tests with the "wait for ready" checking?  That's
> supposed to allow enough time for the signature fis to be transmitted.
> 
> I'll take a closer look at your libsas fix, because I believe we are
> still seeing failures to rediscover all attached devices even with
> these patches.
> 
> --
> Dan
[Jack Wang] 
I have only tested SATA behind an expander; I will add lldd_ata_check_ready
and test it later.
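
The "wait for ready" contract discussed above can be modelled outside the
kernel. The sketch below is a simplified user-space model, not libata or
libsas code: wait_ready(), struct fake_phy and fake_smp_check_ready() are
hypothetical names, and a poll budget stands in for the jiffies deadline.
It only shows the return convention used by smp_ata_check_ready() in the
patch: positive means ready, zero means keep polling, and a negative errno
such as -ECOMM ends the wait early.

```c
#include <errno.h>

/* Callback convention: >0 ready, 0 not ready yet, <0 fatal error. */
typedef int (*check_ready_fn)(void *ctx);

/* Hypothetical stand-in for an expander phy being polled after reset. */
struct fake_phy {
	int polls_until_attached;	/* polls left before an address shows up */
	int expander_gone;		/* non-zero: expander is unreachable */
};

static int fake_smp_check_ready(void *ctx)
{
	struct fake_phy *p = ctx;

	if (p->expander_gone)
		return -ECOMM;		/* break the wait early */
	if (p->polls_until_attached > 0) {
		p->polls_until_attached--;
		return 0;		/* no attached address yet, keep polling */
	}
	return 1;			/* attached address seen, treat as ready */
}

/* Poll until ready, a fatal error, or the poll budget runs out. */
static int wait_ready(check_ready_fn check, void *ctx, int max_polls)
{
	for (int i = 0; i < max_polls; i++) {
		int rc = check(ctx);

		if (rc > 0)
			return 0;	/* device is ready */
		if (rc < 0)
			return rc;	/* propagate the fatal error */
	}
	return -EBUSY;			/* budget exhausted, still not ready */
}
```

An lldd that leaves lldd_ata_check_ready unset corresponds to a callback
that always returns 1, which is the "old default behavior" described in the
comment in local_ata_check_ready().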




^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 26/28] libsas: check for 'gone' expanders in smp_execute_task()
  2011-12-23  3:00 ` [PATCH v2 26/28] libsas: check for 'gone' expanders in smp_execute_task() Dan Williams
@ 2012-01-09 19:04   ` Dan Williams
  0 siblings, 0 replies; 37+ messages in thread
From: Dan Williams @ 2012-01-09 19:04 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-ide

On Thu, Dec 22, 2011 at 7:00 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> No sense in issuing or retrying commands to an expander that has been
> removed.
>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/scsi/libsas/sas_expander.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
> index d9c2769..e2efc6c 100644
> --- a/drivers/scsi/libsas/sas_expander.c
> +++ b/drivers/scsi/libsas/sas_expander.c
> @@ -74,6 +74,9 @@ static int smp_execute_task(struct domain_device *dev, void *req, int req_size,
>
>        mutex_lock(&dev->ex_dev.cmd_mutex);
>        for (retry = 0; retry < 3; retry++) {
> +               if (dev->gone)
> +                       return -ECOMM;
> +

Test results are just now tripping up on this obvious deadlock.  Will resend.
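
The deadlock is presumably the early return in the hunk above: it leaves
smp_execute_task() with dev->ex_dev.cmd_mutex still held, so the next caller
blocks forever. A minimal user-space sketch of that pattern follows. The
names model_mutex, model_dev, smp_execute_leaky() and smp_execute_fixed()
are hypothetical, a counter stands in for the kernel mutex, and the "fixed"
variant is only one possible shape for a fix, not the resent patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for a kernel mutex: it only counts how often it is held. */
struct model_mutex { int held; };

static void model_lock(struct model_mutex *m)
{
	m->held++;
}

static void model_unlock(struct model_mutex *m)
{
	assert(m->held > 0);
	m->held--;
}

/* Hypothetical, stripped-down domain_device with only the fields used here. */
struct model_dev {
	struct model_mutex cmd_mutex;
	bool gone;
};

/* Shape of the hunk as posted: the early return skips the unlock. */
static int smp_execute_leaky(struct model_dev *dev)
{
	model_lock(&dev->cmd_mutex);
	for (int retry = 0; retry < 3; retry++) {
		if (dev->gone)
			return -ECOMM;	/* cmd_mutex is still held here */
		break;			/* pretend the SMP request succeeded */
	}
	model_unlock(&dev->cmd_mutex);
	return 0;
}

/* One possible fix: record the error and fall through to the unlock. */
static int smp_execute_fixed(struct model_dev *dev)
{
	int res = 0;

	model_lock(&dev->cmd_mutex);
	for (int retry = 0; retry < 3; retry++) {
		if (dev->gone) {
			res = -ECOMM;
			break;
		}
		break;			/* pretend the SMP request succeeded */
	}
	model_unlock(&dev->cmd_mutex);
	return res;
}
```

With a 'gone' device the leaky variant returns -ECOMM and leaves the hold
count at one, while the fixed variant returns the same error with the count
back at zero.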

--
Dan

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH v2 24/28] libsas: poll for ata device readiness after reset
  2011-12-29  6:18   ` Jack Wang
@ 2012-02-19 22:06     ` James Bottomley
  2012-02-20  1:08       ` Jack Wang
  0 siblings, 1 reply; 37+ messages in thread
From: James Bottomley @ 2012-02-19 22:06 UTC (permalink / raw)
  To: Jack Wang
  Cc: 'Dan Williams', linux-scsi, 'Tejun Heo', linux-ide

On Thu, 2011-12-29 at 14:18 +0800, Jack Wang wrote:
> > @@ -267,39 +267,84 @@ static bool sas_ata_qc_fill_rtf(struct ata_queued_cmd *qc)
> >  	return true;
> >  }
> > 
> > -static int sas_ata_hard_reset(struct ata_link *link, unsigned int *class,
> > -			       unsigned long deadline)
> > +static struct sas_internal *dev_to_sas_internal(struct domain_device *dev)
> > +{
> > +	return to_sas_internal(dev->port->ha->core.shost->transportt);
> > +}
> > +
> > +static int smp_ata_check_ready(struct ata_link *link)
> >  {
> > +	int res;
> > +	u8 addr[8];
> >  	struct ata_port *ap = link->ap;
> >  	struct domain_device *dev = ap->private_data;
> > -	struct sas_internal *i =
> > -		to_sas_internal(dev->port->ha->core.shost->transportt);
> > -	int res = TMF_RESP_FUNC_FAILED;
> > -	int ret = 0;
> > +	struct domain_device *ex_dev = dev->parent;
> > +	struct sas_phy *phy = sas_find_local_phy(dev);
> > 
> > -	if (i->dft->lldd_I_T_nexus_reset)
> > -		res = i->dft->lldd_I_T_nexus_reset(dev);
> > +	res = sas_get_phy_attached_sas_addr(ex_dev, phy->number, addr);
> > +	/* break the wait early if the expander is unreachable,
> > +	 * otherwise keep polling
> > +	 */
> > +	if (res == -ECOMM)
> > +		return res;
> > +	if (res != SMP_RESP_FUNC_ACC || SAS_ADDR(addr) == 0)
> [Jack Wang]
> This check may not guarantee that the FIS has been received by the expander.
> Should we use sas_ex_phy_discover instead? We would still need to teach
> sas_ex_phy_discover_helper to return the right code.

So the concern seems valid, do we have a fix yet?

James



^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH v2 24/28] libsas: poll for ata device readiness after reset
  2012-02-19 22:06     ` James Bottomley
@ 2012-02-20  1:08       ` Jack Wang
  0 siblings, 0 replies; 37+ messages in thread
From: Jack Wang @ 2012-02-20  1:08 UTC (permalink / raw)
  To: 'James Bottomley'
  Cc: 'Dan Williams', linux-scsi, 'Tejun Heo', linux-ide

> 
> So the concern seems valid, do we have a fix yet?
> 
> James
> 
> 
[Jack Wang] The patch has already been sent out by Dan in the new eh handling rework v9.




^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2012-02-20  1:11 UTC | newest]

Thread overview: 37+ messages
2011-12-23  2:58 [PATCH v2 00/28] libsas: eh reworks (ata-eh vs discovery, races, ...) Dan Williams
2011-12-23  2:58 ` [PATCH v2 01/28] libsas: remove unused ata_task_resp fields Dan Williams
2011-12-23  2:58 ` [PATCH v2 02/28] libsas: kill sas_slave_destroy Dan Williams
2011-12-23  2:58 ` [PATCH v2 03/28] libsas: fix domain_device leak Dan Williams
2011-12-23  2:58 ` [PATCH v2 04/28] libsas: fix leak of dev->sata_dev.identify_[packet_]device Dan Williams
2011-12-23  2:58 ` [PATCH v2 05/28] libsas: replace event locks with atomic bitops Dan Williams
2011-12-23  2:59 ` [PATCH v2 06/28] libsas: convert ha->state to flags Dan Williams
2011-12-23  2:59 ` [PATCH v2 07/28] libsas: introduce sas_drain_work() Dan Williams
2011-12-23  2:59 ` [PATCH v2 08/28] libsas: remove ata_port.lock management duties from lldds Dan Williams
2011-12-23  2:59 ` [PATCH v2 09/28] libsas: prevent domain rediscovery competing with ata error handling Dan Williams
2011-12-23  2:59 ` [PATCH v2 10/28] libsas: use ->set_dmamode to notify lldds of NCQ parameters Dan Williams
2011-12-23  2:59 ` [PATCH v2 11/28] libsas: kill invocation of scsi_eh_finish_cmd from sas_ata_task_done Dan Williams
2011-12-23  2:59 ` [PATCH v2 12/28] libsas: close error handling vs sas_ata_task_done() race Dan Williams
2011-12-23  2:59 ` [PATCH v2 13/28] libsas: prevent double completion of scmds from eh Dan Williams
2011-12-23  2:59 ` [PATCH v2 14/28] libsas: fix timeout vs completion race Dan Williams
2011-12-23  2:59 ` [PATCH v2 15/28] libsas: let libata handle command timeouts Dan Williams
2011-12-23  2:59 ` [PATCH v2 16/28] libsas: defer SAS_TASK_NEED_DEV_RESET commands to libata Dan Williams
2011-12-23  2:59 ` [PATCH v2 17/28] libsas: use libata-eh-reset for sata rediscovery fis transmit failures Dan Williams
2011-12-23  3:00 ` [PATCH v2 18/28] libsas: perform sas-transport resets in shost->workq context Dan Williams
2011-12-23  3:00 ` [PATCH v2 19/28] libsas: execute transport link resets with libata-eh via host workqueue Dan Williams
2011-12-23  3:00 ` [PATCH v2 20/28] libsas: sas_phy_enable via transport_sas_phy_reset Dan Williams
2011-12-23  3:00 ` [PATCH v2 21/28] libsas: Remove redundant phy state notification calls Dan Williams
2011-12-23  3:00 ` [PATCH v2 22/28] libsas: add mutex for SMP task execution Dan Williams
2011-12-23  3:00 ` [PATCH v2 23/28] libsas: async ata-eh Dan Williams
2011-12-23  3:00 ` [PATCH v2 24/28] libsas: poll for ata device readiness after reset Dan Williams
2011-12-29  6:18   ` Jack Wang
2012-02-19 22:06     ` James Bottomley
2012-02-20  1:08       ` Jack Wang
2011-12-23  3:00 ` [PATCH v2 25/28] libsas: don't mark expanders as gone when a child device is removed Dan Williams
2011-12-23  3:00 ` [PATCH v2 26/28] libsas: check for 'gone' expanders in smp_execute_task() Dan Williams
2012-01-09 19:04   ` Dan Williams
2011-12-23  3:00 ` [PATCH v2 27/28] libsas: fix sas_find_local_phy(), take phy references Dan Williams
2011-12-27  9:21   ` Jack Wang
2011-12-28 18:45     ` Dan Williams
2011-12-29  6:18       ` Jack Wang
2011-12-23  3:00 ` [PATCH v2 28/28] libsas: don't recover 'gone' devices in sas_ata_hard_reset() Dan Williams
2011-12-27  9:23   ` Jack Wang
